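Project number: 2021C-131 Date: 2022/04/06
Research project: Transforming Data-Driven Language Understanding through Annotation Based on Collective Intelligence
Researcher affiliation (at the time) / Position / Name
(Principal investigator) Faculty of Science and Engineering, School of Fundamental Science and Engineering / Professor / Daisuke Kawahara (河原 大輔)
Summary of research results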
We used the wisdom of crowds to build probabilistically annotated corpora, aiming at a breakthrough in natural language understanding. We drew on the wisdom of crowds through crowdsourcing: five to ten crowdworkers annotated each instance for two tasks, syntactic parsing and discourse relation analysis. The collected annotations were then converted to probabilities using the EM algorithm. As a result, we confirmed that the higher the level of the task, the more the probability values of the annotation labels varied. We also verified that the resulting probabilistic multi-label annotations were plausible. In the future, we plan to enlarge the probabilistically annotated corpora and to develop analyzers trained on them. We expect this to dramatically improve the accuracy of natural language analysis and understanding tasks such as syntactic parsing and anaphora resolution.
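The summary does not state which EM formulation was used to turn crowd annotations into probabilities. As an illustration only, the sketch below assumes a Dawid-Skene style model, in which EM alternates between estimating each worker's confusion matrix and each item's label posterior. The function name `dawid_skene`, the `smoothing` parameter, the label names, and the toy votes are all hypothetical and are not taken from the project.

```python
def dawid_skene(annotations, labels, n_iter=50, smoothing=0.01):
    """Estimate per-item label probabilities from crowd annotations.

    annotations: list of (item, worker, label) triples.
    Returns {item: {label: probability}}.
    Assumption: a Dawid-Skene style model, not necessarily the project's.
    """
    items = sorted({i for i, _, _ in annotations})
    workers = sorted({w for _, w, _ in annotations})

    # Initialise posteriors with each item's vote proportions.
    post = {i: {k: 0.0 for k in labels} for i in items}
    for i, _, l in annotations:
        post[i][l] += 1.0
    for i in items:
        total = sum(post[i].values())
        for k in labels:
            post[i][k] /= total

    for _ in range(n_iter):
        # M-step: class priors and per-worker confusion matrices,
        # conf[w][k][l] = P(worker w answers l | true label is k).
        prior = {k: sum(post[i][k] for i in items) / len(items) for k in labels}
        conf = {w: {k: {l: smoothing for l in labels} for k in labels}
                for w in workers}
        for i, w, l in annotations:
            for k in labels:
                conf[w][k][l] += post[i][k]
        for w in workers:
            for k in labels:
                row_total = sum(conf[w][k].values())
                for l in labels:
                    conf[w][k][l] /= row_total

        # E-step: recompute each item's posterior over true labels.
        new_post = {i: dict(prior) for i in items}
        for i, w, l in annotations:
            for k in labels:
                new_post[i][k] *= conf[w][k][l]
        for i in items:
            total = sum(new_post[i].values())
            for k in labels:
                new_post[i][k] /= total
        post = new_post

    return post


# Toy data (invented): five workers label three sentence pairs.
labels = ["cause", "contrast"]
votes = {
    "s1": ["cause", "cause", "cause", "cause", "cause"],
    "s2": ["contrast", "contrast", "contrast", "contrast", "contrast"],
    "s3": ["cause", "cause", "cause", "contrast", "contrast"],
}
annotations = [(item, f"w{n + 1}", label)
               for item, item_votes in votes.items()
               for n, label in enumerate(item_votes)]

posteriors = dawid_skene(annotations, labels)
for item in sorted(posteriors):
    print(item, {k: round(p, 3) for k, p in posteriors[item].items()})
```

Keeping the full posterior for each item, rather than collapsing it to a majority vote, is what yields a probabilistic multi-label annotation of the kind described above. Because the model also estimates worker reliability, the resulting probabilities can differ from raw vote proportions.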