From law-enforcement operations and witness protection to privacy concerns raised by deepfakes, the ability to anonymize speech is sought after by both government agencies and individuals concerned with their privacy. A new research project aims to enable users to alter their voice in real time using the power of AI.
Dr. Ricardo Gutierrez-Osuna, professor in the Department of Computer Science and Engineering, has received a contract to further develop AI voice modification software. The collaboration includes researchers from Texas A&M (Ph.D. student Waris Quamer), Honeywell’s Drs. Tor Finseth and Nichola Lubold, Professor John Hansen from the University of Texas at Dallas, and Professor Zhiyao Duan from the University of Rochester.
As a non-native English speaker, Gutierrez-Osuna began working on voice modification software nearly 20 years ago. At the time, his goal was to create a system that would allow second-language learners to listen to their own voices but with a native accent, making it easier for them to learn correct pronunciations. As time has passed, the need for voice anonymization software has increased. Gutierrez-Osuna feels that his work has now come full circle, from modifying speech to help people learn new languages to creating speech anonymization AI that protects those speakers.
“The main goal of this project is to get somebody to speak into a microphone, and with a small delay of a few milliseconds, transform that voice so it sounds like somebody else, someone of a different sex, different age, different voice timbre, different emotional content, different speech hesitations, and eventually different word choices,” said Gutierrez-Osuna.
While voice anonymization software already exists, the speech signal it produces contains small distortions. These distortions can reveal that the speaker’s voice has been altered. Researchers are currently working to reduce these distortions and make the anonymized speech sound as natural and authentic as possible.
If successful, this project will allow users to make their voice unrecognizable by using AI to transform speech at streaming rates. The system will allow users to change their voice biometrics, making them harder to identify, whether by human listeners or by automatic speaker verification systems.
One obstacle the researchers face is public concern about AI being used to alter voices.
“Of course, as with any technology that you develop, someone out there may use it for purposes that you didn’t intend or foresee. When I started working on this, I had no idea that deepfakes were going to be a thing in 20 years,” said Gutierrez-Osuna.
The researchers believe that this work poses no threat to public privacy and may help alleviate some of these concerns. As the new AI software continues to be tested, researchers may use the system to enhance deepfake countermeasures, helping prevent other technologies from being used for malicious purposes.