Golden speakers: a new technology to teach pronunciation in a second language

Adult learners of a second language (L2) often continue to speak with the accent associated with their first language. This can subject them to discrimination and make them less confident when interacting with others.

L2 learners rarely receive formal training in pronunciation, largely because effective training must be tailored to each learner's individual needs. To address this problem, Dr. Ricardo Gutierrez-Osuna, professor in the Department of Computer Science and Engineering and director of the Perception Sensing and Instrumentation Lab at Texas A&M University, is working with linguists at Iowa State University to develop technology that provides better pronunciation practice.

“Paraphrasing my colleague John Levis, pronunciation is viewed as the Cinderella of language learning, which typically emphasizes grammar and vocabulary, whereas pronunciation is expected to just happen,” Gutierrez said.

One of the primary difficulties with pronunciation is that it is hard to teach at scale, for example in a classroom of 300 students. Traditional classroom instruction cannot easily serve such diverse learners, because what each individual needs to work on depends largely on his or her first language.

Gutierrez and his collaborators are working to develop algorithms that synthesize a personalized "golden speaker" for each learner. The golden speaker is the learner's own voice, but without the non-native accent usually present. The idea is that learners can more easily perceive the differences between their actual and ideal pronunciations when both are heard in their own voice, which should help them improve their pronunciation.

The National Science Foundation recently awarded Gutierrez and his team two grants to develop and evaluate golden speakers. The first grant, from the NSF Robust Intelligence program, will fund the underlying technology: the speech representations and speech-processing algorithms needed to produce high-quality golden-speaker voices.

Much as a color can be described by RGB coordinates, a weighted mix of red, green and blue, the teacher's voice and the L2 learner's voice can each be described as a weighted sum of base sounds that together make up that person's individual way of speaking. The golden speaker program records both the teacher and the learner speaking a sentence or phrase. It then analyzes the recorded fragments and recombines the base sounds so that the learner's voice follows the teacher's pronunciation.
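The base-sound idea can be illustrated with a small sketch. This is an assumption-laden illustration, not the team's actual algorithm: it follows one common approach from the voice-conversion literature (exemplar-based conversion with paired dictionaries), and it assumes the teacher's and learner's recordings have already been time-aligned so that base sound k in the teacher's list and base sound k in the learner's list correspond to the same speech unit. A teacher frame is expressed as a non-negative weighted sum of the teacher's base sounds, and the same weights are then applied to the learner's base sounds. The function names (`estimate_weights`, `recombine`, `golden_frame`) and the three-number "spectra", chosen to echo the RGB analogy, are hypothetical; real systems use far richer spectral features.

```python
from typing import List

Vector = List[float]


def estimate_weights(frame: Vector, atoms: List[Vector], iters: int = 2000) -> Vector:
    """Return non-negative weights h with frame ~= sum_k h[k] * atoms[k].

    Uses Lee-Seung multiplicative updates for the squared-error objective.
    They keep every weight non-negative, which assumes the frame and the
    atoms (e.g. spectral magnitudes) are themselves non-negative.
    """
    k, dim = len(atoms), len(frame)
    # W^T x: how strongly each base sound correlates with the frame.
    wtx = [sum(atoms[j][d] * frame[d] for d in range(dim)) for j in range(k)]
    # W^T W: Gram matrix describing how the base sounds overlap.
    gram = [[sum(atoms[i][d] * atoms[j][d] for d in range(dim)) for j in range(k)]
            for i in range(k)]
    h = [1.0] * k  # any strictly positive starting point works
    for _ in range(iters):
        wtwh = [sum(gram[i][j] * h[j] for j in range(k)) for i in range(k)]
        # Small epsilon avoids division by zero for unused base sounds.
        h = [h[i] * wtx[i] / (wtwh[i] + 1e-12) for i in range(k)]
    return h


def recombine(weights: Vector, atoms: List[Vector]) -> Vector:
    """Weighted sum of base sounds: sum_k weights[k] * atoms[k]."""
    dim = len(atoms[0])
    return [sum(w * atom[d] for w, atom in zip(weights, atoms)) for d in range(dim)]


def golden_frame(teacher_frame: Vector, teacher_atoms: List[Vector],
                 learner_atoms: List[Vector], iters: int = 2000) -> Vector:
    """Hypothetical helper: the teacher's weights applied to the learner's base sounds."""
    weights = estimate_weights(teacher_frame, teacher_atoms, iters)
    return recombine(weights, learner_atoms)


if __name__ == "__main__":
    # Toy three-dimensional "spectra", echoing the RGB analogy.
    teacher_atoms = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]]
    learner_atoms = [[2.0, 1.0, 0.0], [0.0, 1.0, 2.0], [1.0, 0.0, 1.0]]
    # This frame equals 0.5*atom0 + 0.2*atom1 + 0.3*atom2 of the teacher.
    teacher_frame = [0.8, 0.7, 0.5]
    print(golden_frame(teacher_frame, teacher_atoms, learner_atoms))
```

Because the weights come from the teacher while the base sounds come from the learner, the output keeps the learner's own vocal character but follows the teacher's pronunciation, which is the intuition behind a golden speaker.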

The second grant, from the NSF Cyberlearning program, will create a web-based learning environment that allows L2 learners to build their golden speakers by selecting among different regional U.S. accents or among voices they identify with, such as actors, world leaders, or inspirational figures. The investigators will use this learning environment to study how golden speakers facilitate pronunciation training.

“The goal is to create these engaging pronunciation exercises in a computer format so that students can practice on their own time, at their own pace, and in the comfort of their own home, with their alter ego: their own voices but with a native accent,” Gutierrez said.

To bring these projects to fruition, Gutierrez is collaborating with professors John Levis and Evgeny Chukharev-Hudilainen of the Applied Linguistics program at Iowa State University, as well as current Ph.D. students Chris Liberatore and Guanlong Zhao. Former Ph.D. students Dr. Daniel Felps and Dr. Sandesh Aryal also contributed to this research.

These two projects could have a significant impact on society by improving communication in critical sectors such as higher education, technology and healthcare, which attract large numbers of non-native speakers of English.