Project number: 2019C-718
Date: 2020/05/07
Research project: Speech recognition enhancement targeted for non-native speakers, using dual supervised learning and policy gradient methodology
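Researcher | Affiliation (at the time) | Position | Name |
---|---|---|---|
(Principal investigator) | Faculty of Science and Engineering, Information, Production and Systems Research Center | Research Associate | Radzikowski Kacper Pawel (ラジコヲスキ カツペル パエル) |
(Collaborating researcher) | Waseda University, Graduate School of Information, Production and Systems | Professor | Osamu Yoshie |
(Collaborating researcher) | Warsaw University, The Faculty of Electronics and Information Technology | Professor | Robert Nowak |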
- Summary of research results
- Automatic speech recognition (ASR) systems can achieve high accuracy, depending on the methodology applied and the datasets used. Accuracy drops significantly, however, when the same ASR system is used by a non-native speaker of the target language. The main reason is the pronunciation and accent features carried over from the speaker's mother tongue. At the same time, the limited volume of labeled non-native speech data makes it difficult to train sufficiently accurate ASR systems for non-native speakers from the ground up. In this research, we addressed the problem using dual supervised learning and style transfer. We designed a pipeline that modifies the speech of a non-native speaker so that it more closely resembles native speech. The resulting publications cover accent-modification experiments with different experimental setups and approaches. The experiments were conducted on English spoken by Japanese speakers (UME-ERJ dataset). The results show a significant relative improvement in speech recognition accuracy. Our methodology can be used as a real-time wrapper around any existing ASR system, which reduces the need to train new models for non-native speech and thus sidesteps the obstacle of data scarcity.
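
The wrapper idea in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `AccentAdaptedASR`, the `convert` and `recognize` callables, and the `relative_improvement` helper are hypothetical names introduced here, and the accent-modification model is represented only by a placeholder function. The relative-improvement formula is one common definition; the report does not state which formula was used.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical type aliases: a waveform is a plain sequence of float samples.
Waveform = Sequence[float]
AccentConverter = Callable[[Waveform], Waveform]
Recognizer = Callable[[Waveform], str]


@dataclass
class AccentAdaptedASR:
    """Wraps an existing recognizer with an accent-modification front end."""

    convert: AccentConverter  # placeholder for a trained accent-modification model
    recognize: Recognizer     # any existing ASR system, used without retraining

    def transcribe(self, audio: Waveform) -> str:
        # Step 1: modify the non-native speech so it resembles native speech more closely.
        native_like = self.convert(audio)
        # Step 2: pass the modified audio to the unchanged ASR back end.
        return self.recognize(native_like)


def relative_improvement(baseline_acc: float, adapted_acc: float) -> float:
    """Relative accuracy gain over the baseline (assumed definition)."""
    if baseline_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (adapted_acc - baseline_acc) / baseline_acc


if __name__ == "__main__":
    # Toy stand-ins only: a real setup would plug in trained models here.
    asr = AccentAdaptedASR(
        convert=lambda audio: [s * 0.5 for s in audio],
        recognize=lambda audio: f"{len(audio)} samples",
    )
    print(asr.transcribe([0.2, 0.4, -0.1]))
```

Because the recognizer is only called, never modified, the same front end could in principle sit in front of any ASR back end, which is the property the summary highlights.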