Project number: 2019C-718 Date: 2020/05/07
Research project: Speech recognition enhancement targeted for non-native speakers, using dual supervised learning and policy gradient methodology
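Researcher affiliation (at the time) Position Name
(Principal investigator) Faculty of Science and Engineering, Information, Production and Systems Research Center Research Associate ラジコヲスキ カツペル パエル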
(Collaborating researcher) Waseda University, Graduate School of Information, Production and Systems Professor Osamu Yoshie
(Collaborating researcher) Warsaw University, The Faculty of Electronics and Information Technology Professor Robert Nowak
Summary of research results
Automatic speech recognition (ASR) systems can achieve high accuracy, depending on the methodology applied and the datasets used. Accuracy drops significantly, however, when the same ASR system is used by a non-native speaker of the language to be recognized. The main reason is the set of pronunciation and accent features that the speaker carries over from their mother tongue. At the same time, the limited volume of labeled non-native speech data makes it difficult to train sufficiently accurate ASR systems for non-native speakers from the ground up. In this research we addressed the problem using dual supervised learning and style transfer. We designed a pipeline that modifies the speech of a non-native speaker so that it more closely resembles native speech. The publications cover accent modification experiments under different experimental setups and with different approaches. The experiments were conducted on English spoken by Japanese speakers (UME-ERJ dataset). The results show a significant relative improvement in speech recognition accuracy. Our methodology can be used as a real-time wrapper around any existing ASR system, which reduces the need to train new systems for non-native speech and thus overcomes the obstacle of data scarcity.
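The wrapper idea described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `AccentAdaptedASR`, the callables `convert` and `recognize`, and the helper `relative_improvement` are hypothetical names introduced here, and the trained accent-modification model is replaced by an arbitrary function. The sketch only shows how a speech-modification step could sit in front of an unchanged recognizer, and one common way a relative accuracy improvement might be computed (the report does not state the exact formula used).

```python
from typing import Callable, Sequence

# A waveform is modeled as a plain sequence of samples (assumption for this sketch).
Waveform = Sequence[float]


class AccentAdaptedASR:
    """Hypothetical wrapper: modify the speech first, then call an existing recognizer."""

    def __init__(
        self,
        convert: Callable[[Waveform], Waveform],
        recognize: Callable[[Waveform], str],
    ) -> None:
        # convert: stands in for a trained non-native-to-native speech modification model.
        # recognize: any existing ASR system, used as-is without retraining.
        self.convert = convert
        self.recognize = recognize

    def transcribe(self, audio: Waveform) -> str:
        # The recognizer only ever sees the modified audio.
        return self.recognize(self.convert(audio))


def relative_improvement(baseline_accuracy: float, adapted_accuracy: float) -> float:
    """Relative change in accuracy with respect to the baseline (assumed definition)."""
    if baseline_accuracy <= 0:
        raise ValueError("baseline_accuracy must be positive")
    return (adapted_accuracy - baseline_accuracy) / baseline_accuracy
```

Because the recognizer is passed in as a plain callable, the same wrapper could in principle be placed in front of different ASR back ends without changing them, which is the property the summary emphasizes.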