DeepMind Uses WaveNet technology to Reunite Speech-impaired Users with Their Original Voices

Enterprise & IT, Dec 18, 2019

A recent project undertaken by DeepMind with Google, as part of Google’s Euphonia project, demonstrates an early proof of concept of how text-to-speech technologies can synthesise a natural-sounding voice using minimal recorded speech data.

Losing one’s voice can be socially devastating. Today, the main option available to people who want to preserve their voice is message banking, wherein people with amyotrophic lateral sclerosis (ALS, commonly known as Lou Gehrig’s disease) can digitally record and store personally meaningful phrases using their natural inflection and intonation. But message banking lacks flexibility, resulting in a static dataset of phrases.

DeepMind has been collaborating with Google and people like ALS campaigner Tim Shaw to help develop technologies that can make it easier for people with speech difficulties to communicate. The challenges are two-fold. Firstly, the technology must be able to recognise the speech of people with non-standard pronunciation, something Google AI has been researching through Project Euphonia. Secondly, people should ideally be able to communicate using their original voice. Stephen Hawking, who also suffered from ALS, communicated with a famously unnatural-sounding text-to-speech synthesiser. Thus, the second challenge is customising text-to-speech technology to the user’s natural speaking voice.

With WaveNet and Tacotron, DeepMind has seen tremendous breakthroughs in the quality of text-to-speech systems. However, whilst it is possible to create natural-sounding voices that sound like specific people in certain contexts, developing synthetic voices requires many hours of studio recording time with a very specific script, a luxury that many people with ALS simply don’t have. Creating machine learning models that require less training data is an active area of research at DeepMind, and is crucial for use cases such as this one, where a voice has to be recreated from just a handful of audio recordings. DeepMind helped do this by harnessing the WaveNet work and the novel approaches demonstrated in a paper, Sample Efficient Adaptive Text-to-Speech (TTS).

Thanks to Tim’s time in the media spotlight, resulting in about thirty minutes of high-quality audio recordings, DeepMind's researchers were able to apply the methodologies from WaveNet and TTS to recreate his former voice.

Following a six-month effort, Google’s AI team visited Tim and his family to show him the results of their work. The meeting was captured for the new YouTube Originals learning series, “The Age of A.I.” hosted by Robert Downey Jr. Tim and his family were able to hear his old voice for the first time in years, as the model – trained on Tim’s NFL audio recordings – read out the letter he’d recently written to his younger self.

“I don’t remember that voice,” Tim remarked. His father responded, “We do.” Later, Tim recounted: “It has been so long since I’ve sounded like that, I feel like a new person. I felt like a missing part was put back in place. It’s amazing. I’m just thankful that there are people in this world that will push the envelope to help other people.”

How the technology works

WaveNet is a generative model trained on many hours of speech and text data from diverse speakers. It can then be fed arbitrary new text to be synthesized into a natural-sounding spoken sentence.

DeepMind has already illustrated that it’s possible to train a new voice with minutes, rather than hours, of voice recordings through a process called fine-tuning. This involves first training a large WaveNet model on up to thousands of speakers, which takes a few days, until it can produce the basics of natural-sounding speech. Then, the researchers take the small corpus of data for the target speaker and adapt the model, adjusting its weights to create a single model that matches the target speaker. The concept of fine-tuning is similar to how people learn. For example, if you are attempting to learn calculus, you should first understand the foundations of basic algebra, and then apply these simpler concepts to help solve more complex equations.
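The two-stage idea can be illustrated with a deliberately tiny sketch. This is not WaveNet or DeepMind's code: it assumes a toy linear model in which each "speaker" is a weight vector close to a shared one, and every name here (gradient_descent, make_speaker_data, w_shared and so on) is hypothetical. It only shows why starting from pretrained weights helps when the target speaker has very little data.

```python
import numpy as np

def mse(w, X, y):
    """Mean squared error of the linear model X @ w against targets y."""
    return float(np.mean((X @ w - y) ** 2))

def gradient_descent(w_init, X, y, lr=0.05, steps=20):
    """Run plain gradient descent on the MSE loss, starting from w_init."""
    w = w_init.copy()
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def make_speaker_data(rng, w_speaker, n):
    """Draw n toy (features, target) pairs for one hypothetical speaker."""
    X = rng.normal(size=(n, w_speaker.size))
    y = X @ w_speaker + 0.01 * rng.normal(size=n)
    return X, y

rng = np.random.default_rng(0)
dim = 8
w_shared = rng.normal(size=dim)  # assumed structure common to all speakers

# Stage 1: pretrain one model on pooled data from many speakers.
pool_X, pool_y = [], []
for _ in range(50):
    w_speaker = w_shared + 0.1 * rng.normal(size=dim)
    X, y = make_speaker_data(rng, w_speaker, n=200)
    pool_X.append(X)
    pool_y.append(y)
pretrained = gradient_descent(
    np.zeros(dim), np.vstack(pool_X), np.concatenate(pool_y), steps=500
)

# Stage 2: fine-tune the pretrained weights on a small target-speaker corpus.
w_target = w_shared + 0.1 * rng.normal(size=dim)
X_small, y_small = make_speaker_data(rng, w_target, n=20)
finetuned = gradient_descent(pretrained, X_small, y_small)

# Baseline: same small corpus and step budget, but starting from zero weights.
scratch = gradient_descent(np.zeros(dim), X_small, y_small)

loss_pretrained = mse(pretrained, X_small, y_small)
loss_finetuned = mse(finetuned, X_small, y_small)
loss_scratch = mse(scratch, X_small, y_small)
print("pretrained only:", loss_pretrained)
print("fine-tuned:     ", loss_finetuned)
print("from scratch:   ", loss_scratch)
```

In this toy setting the fine-tuned model fits the target speaker better than both the unadapted pretrained model and a model trained from scratch on the same twenty samples, because most of what it needs was already learned from the pooled speakers.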

Later, the researchers migrated from WaveNet to WaveRNN, which is a more efficient text-to-speech model co-developed by Google AI and DeepMind. WaveNet requires a second distillation step to speed it up to serve requests in real-time, which makes fine-tuning more challenging. WaveRNN, on the other hand, does not require a second training step and can synthesize speech much faster than a WaveNet model that has not been distilled.

In addition to speeding up the models by switching to WaveRNN, DeepMind’s researchers collaborated with Google AI to improve the quality of the models. Google AI researchers demonstrated that a similar fine-tuning approach could be applied to the related Google Tacotron model, which DeepMind uses in conjunction with WaveRNN to synthesise realistic voices. By combining these technologies, trained on audio clips of Tim Shaw from his NFL days, the researchers were able to generate an authentic-sounding voice that resembles how Tim sounded before his speech degraded. While the voice is not yet perfect, lacking the expressiveness, quirks, and controllability of a real voice, the combination of WaveRNN and Tacotron may help people like Tim preserve an important part of their identity, and one day the technology could be integrated into speech-generation devices.

Tags: deepmind