DeepMind Uses WaveNet Technology to Reunite Speech-impaired Users with Their Original Voices


Enterprise & IT, Dec 18, 2019

A recent project undertaken by DeepMind with Google, as part of Google’s Project Euphonia, demonstrates an early proof of concept of how text-to-speech technologies can synthesise a natural-sounding voice using minimal recorded speech data.

Losing one’s voice can be socially devastating. Today, the main option available to people who want to preserve their voice is message banking, wherein people with amyotrophic lateral sclerosis (ALS, commonly known as Lou Gehrig’s disease) can digitally record and store personally meaningful phrases using their natural inflection and intonation. But message banking lacks flexibility, resulting in a static dataset of phrases.

DeepMind has been collaborating with Google and people like ALS campaigner Tim Shaw to help develop technologies that make it easier for people with speech difficulties to communicate. The challenge is two-fold. Firstly, the technology must be able to recognise the speech of people with non-standard pronunciation, something Google AI has been researching through Project Euphonia. Secondly, people should ideally be able to communicate using their original voice. Stephen Hawking, who also had ALS, communicated with a famously unnatural-sounding text-to-speech synthesiser. Thus, the second challenge is customising text-to-speech technology to the user’s natural speaking voice.

With WaveNet and Tacotron, DeepMind has seen tremendous breakthroughs in the quality of text-to-speech systems. However, whilst it is possible to create natural-sounding voices that sound like specific people in certain contexts, developing synthetic voices requires many hours of studio recording time with a very specific script, a luxury that many people with ALS simply don’t have. Creating machine learning models that require less training data is an active area of research at DeepMind, and is crucial for use cases such as this, where a voice must be recreated from just a handful of audio recordings. DeepMind helped do this by building on the WaveNet work and the novel approaches demonstrated in the paper Sample Efficient Adaptive Text-to-Speech (TTS).

Thanks to Tim’s time in the media spotlight, which left about thirty minutes of high-quality audio recordings, DeepMind's researchers were able to apply the methodologies from the WaveNet and adaptive TTS work to recreate his former voice.

Following a six-month effort, Google’s AI team visited Tim and his family to show him the results of their work. The meeting was captured for the new YouTube Originals learning series, “The Age of A.I.” hosted by Robert Downey Jr. Tim and his family were able to hear his old voice for the first time in years, as the model – trained on Tim’s NFL audio recordings – read out the letter he’d recently written to his younger self.

“I don’t remember that voice,” Tim remarked. His father responded, “We do.” Later, Tim recounted: “It has been so long since I’ve sounded like that, I feel like a new person. I felt like a missing part was put back in place. It’s amazing. I’m just thankful that there are people in this world that will push the envelope to help other people.”

How the technology works

WaveNet is a generative model trained on many hours of speech and text data from diverse speakers. It can then be fed arbitrary new text to be synthesized into a natural-sounding spoken sentence.

DeepMind has already illustrated that it’s possible to train a new voice with minutes, rather than hours, of voice recordings through a process called fine-tuning. This involves first training a large WaveNet model on up to thousands of speakers, which takes a few days, until it can produce the basics of natural-sounding speech. Then, the researchers take the small corpus of data for the target speaker and adapt the model, adjusting its weights to produce a single model that matches the target speaker. The concept of fine-tuning is similar to how people learn. For example, if you are attempting to learn calculus, you should first understand the foundations of basic algebra, and then apply these simpler concepts to help solve more complex equations.
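The two-stage idea can be illustrated with a deliberately tiny sketch. This is not DeepMind's code and has nothing of WaveNet's architecture in it: each "speaker" is assumed to be a noisy straight line y = w*x + b, the "model" is just the pair (w, b), and all names (`make_speaker`, `gradient_descent`, the learning rates and step counts) are hypothetical choices made for illustration. It only shows the general pattern of pretraining on pooled data from many speakers and then continuing training briefly on a small target set.

```python
import random


def mse(params, data):
    """Mean squared error of the line y = w*x + b on (x, y) pairs."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def gradient_descent(params, data, lr, steps):
    """Full-batch gradient descent on the mean squared error."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def make_speaker(rng, w, b, n):
    """Toy stand-in for one speaker: n noisy samples of y = w*x + b."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, w * x + b + rng.gauss(0.0, 0.01)) for x in xs]


rng = random.Random(0)

# Stage 0: data. Many speakers with plenty of samples each, plus one
# target speaker for whom only a handful of training samples exist.
pool = []
for _ in range(20):
    speaker_w = 2.0 + rng.uniform(-0.3, 0.3)
    speaker_b = rng.uniform(-0.3, 0.3)
    pool += make_speaker(rng, speaker_w, speaker_b, n=50)
target_train = make_speaker(rng, w=2.4, b=0.5, n=10)
target_test = make_speaker(rng, w=2.4, b=0.5, n=100)

# Stage 1: pretrain a single model on the pooled multi-speaker data.
base = gradient_descent((0.0, 0.0), pool, lr=0.1, steps=500)

# Stage 2: fine-tune, i.e. start from the pretrained weights and take a
# small number of steps on the target speaker's few samples.
tuned = gradient_descent(base, target_train, lr=0.05, steps=20)

# Baseline: the same small training budget, but starting from scratch.
scratch = gradient_descent((0.0, 0.0), target_train, lr=0.05, steps=20)

print("pretrained only :", mse(base, target_test))
print("fine-tuned      :", mse(tuned, target_test))
print("from scratch    :", mse(scratch, target_test))
```

In this toy setting the fine-tuned model fits the target speaker better than either the pretrained model alone or a model trained from scratch with the same small budget, because it starts close to a good solution. How well that carries over to a real speech model depends on factors the sketch does not capture.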

Later, the researchers migrated from WaveNet to WaveRNN, a more efficient text-to-speech model co-developed by Google AI and DeepMind. WaveNet requires a second distillation step to make it fast enough to serve requests in real time, which makes fine-tuning more challenging. WaveRNN, on the other hand, does not require a second training step and can synthesize speech much faster than a WaveNet model that has not been distilled.

In addition to speeding up the models by switching to WaveRNN, DeepMind's researchers collaborated with Google AI to improve the quality of the models. Google AI researchers demonstrated that a similar fine-tuning approach could be applied to the related Google Tacotron model, which DeepMind uses in conjunction with WaveRNN to synthesise realistic voices. By combining these technologies, trained on audio clips of Tim Shaw from his NFL days, the researchers were able to generate an authentic-sounding voice that resembles how Tim sounded before his speech degraded. While the voice is not yet perfect, lacking the expressiveness, quirks, and controllability of a real voice, the combination of WaveRNN and Tacotron may help people like Tim preserve an important part of their identity, and one day the technology could be integrated into speech-generation devices.
