AI deepfake voices now indistinguishable from human speech

People were asked to evaluate which voices sounded most realistic and which sounded most dominant or trustworthy.

(Photo by Catherine Breslin via Unsplash)

By Stephen Beech

AI-generated “deepfake” voices are now indistinguishable from real human voices, warns new research.

The study shows that the average listener can no longer distinguish between computer-simulated voices and those of real human beings.

Many people still think of AI-generated speech as sounding “fake” or unconvincing and easily told apart from human voices, say scientists.

But the Queen Mary University of London (QMUL) study shows that AI voice technology has now reached a stage where it can create “voice clones” or deepfakes which sound just as realistic as human recordings.

The study, published in the journal PLOS One, compared real human voices with two different types of synthetic voices, generated using state-of-the-art AI voice synthesis tools.

Some were “cloned” from voice recordings of real humans, intended to mimic them, while others were generated from a large voice model and did not have a specific human counterpart.

Study participants were asked to evaluate which voices sounded most realistic and which sounded most dominant or trustworthy.

(Photo by Solen Feyissa via Pexels)

The research team also looked at whether AI-generated voices had become “hyperreal,” given that some studies have shown that AI-generated images of faces are now judged to be human more often than images of real human faces.

While the study did not find a “hyperrealism effect” from the AI voices, it did show that voice clones can sound as real as human voices, making it difficult for listeners to distinguish between them.

Both types of AI-generated voices were evaluated as more dominant than human voices, and some were also perceived as more trustworthy.

Study co-leader Dr. Nadine Lavan, senior lecturer in psychology at QMUL, said: “AI-generated voices are all around us now.

“We’ve all spoken to Alexa or Siri, or had our calls taken by automated customer service systems.

“Those things don’t quite sound like real human voices, but it was only a matter of time until AI technology began to produce naturalistic, human-sounding speech.

“Our study shows that this time has come, and we urgently need to understand how people perceive these realistic voices.”

(Photo by Pawel Czerwinski via Unsplash)

Dr. Lavan pointed out how easily and quickly the team had been able to create clones, or deepfakes, of real voices, with the consent of the speakers, using commercially available software.

She said: “The process required minimal expertise, only a few minutes of voice recordings, and almost no money.

“It just shows how accessible and sophisticated AI voice technology has become.”

Dr. Lavan said the pace of improvement has been “very rapid” and carries implications for ethics, copyright, and security, especially in areas such as fake news, fraud, and impersonation.

But she added: “The ability to generate realistic voices at scale opens up exciting opportunities.

“There might be applications for improved accessibility, education, and communication, where bespoke high-quality synthetic voices can enhance user experience.”
