Special Research Projects Report (Print out of Special Research Projects)

Title number: 2023R-020 Date: 2024/04/05

Research topic: Automatic generation of vocabulary quiz items for large-scale testing

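	Affiliation (at the time)	Position	Name
(Principal investigator)	Faculty of Science and Engineering, School of Creative Science and Engineering	Professor	ローズ　ラルフ　レオン
(Collaborating researcher)	Waseda University, Faculty of Science and Engineering	Associate Professor	折田奈甫
(Collaborating researcher)	Waseda University, Faculty of Science and Engineering	Associate Professor	菅原彩加
(Collaborating researcher)	Waseda University, Faculty of Science and Engineering	Lecturer	ワンチャオ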

Summary of research results: Work on the project during AY2023 proceeded in two main areas. The first built on an AY2021 Tokutei Kadai project in which we asked language teaching experts to review a large set of automatically generated multiple-choice cloze vocabulary quiz questions, check each question for well-formedness, and revise items deemed not well-formed. Due to constraints at the time, only about half of the 5,000 items were checked. This year, the remainder of the set was checked, yielding a large dataset. This is a unique dataset in the field in that it is not simply a set of well-formed items, but also contains expert judgments on what is and is not well-formed. This information is highly useful for machine learning approaches to the automatic generation of such vocabulary questions. In fact, the dataset is already being used in a collaborative effort with researchers at the Intelligent Media Processing Group at Osaka Metropolitan University.

The second main area of work was building a new generator that capitalizes on recent advances in the generative capabilities of large language models, particularly OpenAI’s GPT-3.5 Turbo. The generator was developed in Python and has been published on GitHub for public dissemination and sharing as the repository “VocQGen” (vocabulary quiz generator). VocQGen uses GPT-3.5 Turbo together with various Python libraries that control part-of-speech correspondence in order to generate multiple-choice cloze vocabulary questions. Initial results were promising: in a small-scale investigation, well over 90% of the items were judged to be well-formed and suitable for use with our target learners (university students).
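
To illustrate what part-of-speech correspondence means for a multiple-choice cloze item, the following is a minimal sketch and not VocQGen's actual code. It assumes a tiny hand-written lexicon (`TOY_POS`) in place of a real part-of-speech tagger, and the names `ClozeItem` and `make_cloze_item` are hypothetical. The target word is blanked out of a sentence, and the distractors are drawn only from words that share its part of speech.

```python
import random
import re
from dataclasses import dataclass

# Hypothetical toy lexicon (an assumption for illustration only).
# A real system would obtain part-of-speech tags from a tagger.
TOY_POS = {
    "analyze": "VERB", "construct": "VERB",
    "estimate": "VERB", "indicate": "VERB",
    "factor": "NOUN", "method": "NOUN",
    "significant": "ADJ",
}


@dataclass
class ClozeItem:
    stem: str          # sentence with the target word replaced by a blank
    options: list      # answer choices (key plus distractors)
    answer_index: int  # position of the correct answer in options


def make_cloze_item(sentence, target, lexicon=TOY_POS, n_distractors=3, seed=0):
    """Build one cloze item whose distractors share the target's POS."""
    pattern = rf"\b{re.escape(target)}\b"
    if not re.search(pattern, sentence):
        raise ValueError(f"target {target!r} not found in sentence")
    if target not in lexicon:
        raise ValueError(f"no part-of-speech entry for {target!r}")

    # Blank out the first occurrence of the target word.
    stem = re.sub(pattern, "_____", sentence, count=1)

    # Keep only candidates with the same part of speech as the target.
    candidates = sorted(
        w for w, pos in lexicon.items() if pos == lexicon[target] and w != target
    )
    if len(candidates) < n_distractors:
        raise ValueError("not enough same-POS distractor candidates")

    rng = random.Random(seed)  # seeded so the item is reproducible
    options = rng.sample(candidates, n_distractors) + [target]
    rng.shuffle(options)
    return ClozeItem(stem=stem, options=options, answer_index=options.index(target))


item = make_cloze_item("Researchers analyze the data carefully.", "analyze")
print(item.stem)
print(item.options)
```

Restricting distractors to the target's part of speech keeps every option grammatically plausible in the blank, so the item tests word meaning and not syntax.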

Further developments were made using retrieval-augmented generation (RAG) to advance VocQGen’s generative capability. At the end of the project year, generated items were submitted to a large-scale evaluation with 68 university students (our target users) and 8 teachers (as language testing experts). The results of this evaluation are still being analyzed, but preliminary results suggest that VocQGen generates items that are more reliable than ones that were produced manually (by an experienced language teacher for actual use in testing).
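
The report does not state what VocQGen retrieves or how, so the following is only a generic sketch of the retrieval-augmented pattern, under stated assumptions. A small in-memory corpus is searched for sentences containing the target word, and the hits are placed in a prompt that a language model could then complete. The names `retrieve` and `build_prompt`, the corpus, and the prompt wording are all hypothetical, and no model is called.

```python
import re


def tokenize(text):
    """Lowercase a sentence and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def retrieve(target, corpus, k=2):
    """Return up to k corpus sentences that contain the target word."""
    hits = [s for s in corpus if target.lower() in tokenize(s)]
    return hits[:k]


def build_prompt(target, corpus, k=2):
    """Assemble a generation prompt augmented with retrieved examples."""
    examples = retrieve(target, corpus, k)
    lines = [f"Write one new sentence that uses the word '{target}'."]
    if examples:
        lines.append("Example sentences for reference:")
        lines.extend(f"- {s}" for s in examples)
    return "\n".join(lines)


# Hypothetical mini-corpus used only for this illustration.
corpus = [
    "Engineers estimate the cost before construction begins.",
    "The method was tested on several samples.",
    "We estimate that the error is small.",
]
print(build_prompt("estimate", corpus))
```

Grounding the prompt in retrieved sentences gives the model concrete usage to imitate, which is the general motivation for retrieval-augmented generation.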

VocQGen is publicly available via GitHub, and other researchers and developers are welcome to make use of it. Future work involves providing a web interface so that teachers, for example, could quickly generate vocabulary quizzes for their students.