Machine listening: Making speech recognition systems more inclusive

May 1, 2024

Interactions with voice technology, such as Amazon’s Alexa, Apple’s Siri, and Google Assistant, can make life easier by increasing efficiency and productivity. However, errors in generating and understanding speech during these interactions are common. When using these devices, speakers often style-shift their speech from their normal patterns into a louder and slower register, called technology-directed speech.

Research on technology-directed speech typically focuses on mainstream varieties of U.S. English without considering speaker groups that are more consistently misunderstood by technology. In JASA Express Letters, published on behalf of the Acoustical Society of America by AIP Publishing, researchers from Google Research, the University of California, Davis, and Stanford University set out to address this gap.

One group commonly misunderstood by voice technology is people who speak African American English, or AAE. Because the rate of automatic speech recognition errors can be higher for AAE speakers, downstream effects of linguistic discrimination in technology may result.

“Across all automatic speech recognition systems, four out of every ten words spoken by Black men were being transcribed incorrectly,” said co-author Zion Mengesha. “This impacts fairness for African American English speakers in every institution using voice technology, including health care and employment.”

“We saw an opportunity to better understand this problem by talking to Black users and understanding their emotional, behavioral, and linguistic responses when engaging with voice technology,” said co-author Courtney Heldreth.

The team designed an experiment to test how AAE speakers adapt their speech when imagining talking to a voice assistant, compared to talking to a friend, family member, or stranger. The study tested familiar human, unfamiliar human, and voice assistant-directed speech conditions by comparing speech rate and pitch variation. Study participants included 19 adults identifying as Black or African American who had experienced issues with voice technology. Each participant asked a series of questions to a voice assistant. The same questions were repeated as if speaking to a familiar person and, again, to a stranger. Each question was recorded, for a total of 153 recordings.

Analysis of the recordings showed that the speakers made two consistent adjustments when talking to voice technology compared to talking to another person: a slower rate of speech and less pitch variation (more monotone speech).

“These findings suggest that people have mental models of how to talk to technology,” said co-author Michelle Cohn. “A set ‘mode’ that they engage to be better understood, in light of disparities in speech recognition systems.”

There are other groups misunderstood by voice technology, such as second-language speakers. The researchers hope to expand the language varieties explored in human-computer interaction experiments and address barriers in technology so that it can support everyone who wants to use it.
