Project number: 2025R-026
Date: 2026/04/05
Research topic: Facilitative effects of Japanese disfluencies for listeners in Japanese
| Role | Researcher affiliation (at the time) | Position | Name |
|---|---|---|---|
| (Principal investigator) | Faculty of Science and Engineering, School of Creative Science and Engineering | Professor | Rose, Ralph Leon |
| (Co-researcher) | Center for English Language Education, Faculty of Science and Engineering | Associate Professor | Sugawara, Ayaka |
| (Co-researcher) | Faculty of Science and Engineering, Information and Communications | Graduate student | Yang, Yaxi |
- Summary of research results
To lay the foundation for future work on the use of hesitation phenomena in crosslinguistic communication (e.g., between Japanese and English) via speech AI, the project focused on organizing the conceptual framework for such research and carrying out investigations of AI-generated speech. Initially, this involved building the case for more inclusive speech AI. While this naturally includes making speech AI capable of serving those with speech pathologies, we argued that these interests align with similar interests in making speech AI capable of serving nonnative speakers of a language, whose disfluency patterns may prevent smooth spoken communication with conventional AI tools, comparable to the difficulties that those with speech pathologies face. We reported and discussed our position with participants at the Speech AI for All workshop at the CHI international conference.

Next, we examined speech AI directly in order to better understand how disfluencies are produced and/or handled by speech language models. In particular, we wanted to know how natural the disfluencies produced by AI speech models are. We performed an experiment in which we prompted speech models to generate speech that intentionally includes disfluencies. We found that while some speech models can produce filled pauses (e.g., um/uh in English, e-to/ano- in Japanese) in ways that sound objectively natural, their production of other kinds of disfluencies (e.g., repairs, repeats, prolongations) is clearly unnatural: rather than sounding hesitative, these productions actually show signs of being emphatic.

In related but separate work, we studied how speech AI agents perceive disfluencies in human speech recordings. Human listeners are able to use disfluencies to comprehend a speaker's message more effectively, including its syntactic structure, so we aimed to answer the question of whether AI agents show the same pattern of results.
We found that current speech language models do use disfluency information in a way similar to humans, though perhaps not to the same depth. The results of this perception study, as well as of the production study above, have been submitted to international conferences for presentation.

Finally, since our study of disfluencies looks closely at their acoustic nature, we also examined how acoustic measures such as jitter, shimmer, and harmonicity influence judgments of second-language fluency. The results of this study have also been submitted for presentation at an international conference.
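As background for the acoustic measures mentioned above, the standard "local" (cycle-to-cycle) definitions of jitter and shimmer can be sketched as follows. This is an illustrative sketch only, not the project's analysis code; in practice such measures are typically extracted with dedicated tools such as Praat.

```python
def local_jitter(periods):
    """Jitter (local): mean absolute difference between consecutive
    pitch periods, divided by the mean period."""
    diffs = [abs(t1 - t0) for t0, t1 in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def local_shimmer(amplitudes):
    """Shimmer (local): the same ratio computed over the peak
    amplitudes of consecutive cycles instead of their periods."""
    diffs = [abs(a1 - a0) for a0, a1 in zip(amplitudes, amplitudes[1:])]
    return (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))

# Example: slightly irregular voicing around 100 Hz (periods in seconds).
periods = [0.0100, 0.0102, 0.0098, 0.0101, 0.0099]
print(f"jitter (local) = {local_jitter(periods):.4f}")    # small but nonzero
print(f"perfectly periodic: {local_jitter([0.01] * 5)}")  # 0.0
```

Higher values indicate greater cycle-to-cycle irregularity in pitch (jitter) or loudness (shimmer), which is why such measures are candidate acoustic correlates of perceived fluency.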