
Research Team Led by Professor Choi Hye-Won Gains Attention with Humanities-based AI Research

  • Date: 2021.11.26



Improving AI speech recognition performance by collecting non-native Korean speech data from speakers of 65 languages

Humanities-based AI research expected to further contribute to the globalization of the Korean Wave




The “Language and Artificial Intelligence” research team led by Professor Choi Hye-Won of the Division of English Language and Literature has conducted a non-native Korean speech data project for the 2021 Open AI Dataset Project (AI-Hub), which was supervised by the Ministry of Science and ICT and supported by the National Information Society Agency. The project is being implemented as part of the government’s “Digital New Deal” policy and is regarded as the core project of the “Data Dam.” It aims to promote Korea’s leap forward as an AI powerhouse and to drive intelligent innovation across the state and society, while supporting the construction and development of AI learning data for the purpose of job creation.


The research team received 200 million KRW of the 1.9 billion KRW in government funding assigned to the consortium it formed with five other domestic institutions. The funding covers planning, designing, and collecting non-native Korean speech data, as well as analyzing the pronunciation errors in the data, through the end of 2021. The project aimed to enhance AI recognition of Korean speech by building non-native Korean speech data for AI learning.


AI voice recognition technology is widely used in daily life, embedded in smartphones, AI speakers, navigation systems, and automatic translators. In practice, however, minority speakers such as the elderly, dialect users, and foreigners, whose speech has atypical and distinct phonetic characteristics, are typically excluded from the main target groups of AI voice recognition. Because the performance of recent AI models based on deep learning is determined by the volume of data, languages with relatively little data are difficult to reflect. Recognizing this issue, the research team at Ewha attempted to supplement the limited volume of low-resource data, namely Korean speech by foreigners, using a data collection technique based on error analysis annotation. To this end, the team formulated a data construction strategy based on a comparative study of the speakers’ native phonetic and phonological systems, and collected Korean speech data from speakers of 65 languages from 80 countries, including English, Chinese, Japanese, Vietnamese, and Thai.


Currently, 2.5 million foreigners, who make up five percent of the population, live in South Korea and speak Korean. The recent global popularity of the Korean Wave, manifested by cultural assets such as Parasite, Squid Game, and BTS, has drastically increased interest in the Korean language. As the number of foreigners learning Korean in and outside Korea grows exponentially, the study by Professor Choi’s team is meaningful because it is expected to help meet the growing demand for Korean language education.


Going forward, the study results are expected to contribute to research on improving the recognition rate of foreigners’ Korean speech and on developing models for it, and to help multicultural families, foreign workers, and foreign tourists use various AI voice support services with greater ease. The findings are also expected to be put to meaningful use in a wide range of learning tools, including learning apps for efficient study of the Korean language that assist the rapidly growing number of Korean learners.