Google Improves Voice Search

Enterprise & IT | Sep 24, 2015

Google has taken voice search in a new direction by adopting new neural network acoustic models which, the company says, deliver greatly improved speech recognition accuracy. The new models are trained using Connectionist Temporal Classification (CTC) and sequence discriminative training techniques. They are a special extension of recurrent neural networks (RNNs) that is more accurate, especially in noisy environments, and significantly faster.

In a traditional speech recognizer, the waveform of the user's speech is split into small consecutive slices, or "frames", of 10 milliseconds of audio. Each frame is analyzed for its frequency content, and the resulting feature vector is passed through an acoustic model such as a DNN (Deep Neural Network), which outputs a probability distribution over all the phonemes (sounds) in the model. A Hidden Markov Model (HMM) helps impose some temporal structure on this sequence of probability distributions. The result is then combined with other knowledge sources, such as a Pronunciation Model that links sequences of sounds to valid words in the target language and a Language Model that expresses how likely given word sequences are in that language. The recognizer then reconciles all this information to determine the sentence the user is speaking. If the user speaks the word "museum", for example (/m j u z i @ m/ in phonetic notation), it may be hard to tell where the /j/ sound ends and where the /u/ starts, but the recognizer does not actually care where exactly that transition happens: all it cares about is that these sounds were spoken.
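To make the front end of this pipeline concrete, here is a minimal sketch of the framing step in plain Python: it cuts a waveform into non-overlapping 10 ms frames and computes each frame's magnitude spectrum with a naive DFT. The 16 kHz sample rate, the non-overlapping frames and the function names are assumptions for illustration; production front ends typically use overlapping windows, an FFT and mel-scaled features. The sketch stops before the acoustic model, HMM, pronunciation model and language model.

```python
import cmath
import math

SAMPLE_RATE = 16_000  # assumed sample rate in Hz (not stated in the article)
FRAME_MS = 10         # frame length described in the article
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000  # 160 samples per 10 ms frame


def split_into_frames(samples, frame_len=FRAME_LEN):
    """Cut the waveform into consecutive, non-overlapping frames.

    A trailing partial frame is dropped.
    """
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def magnitude_spectrum(frame):
    """Naive DFT: magnitude of each frequency bin from 0 Hz up to Nyquist."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def feature_vectors(samples):
    """One frequency-content feature vector per 10 ms frame."""
    return [magnitude_spectrum(frame) for frame in split_into_frames(samples)]


if __name__ == "__main__":
    # 50 ms of a pure 1 kHz tone, i.e. 800 samples at 16 kHz
    tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(800)]
    feats = feature_vectors(tone)
    peak_bin = max(range(len(feats[0])), key=lambda k: feats[0][k])
    print(len(feats), len(feats[0]), peak_bin * SAMPLE_RATE / FRAME_LEN)
```

With these assumptions, the 50 ms test tone yields five frames of 81 bins each, and the peak lands in bin 10, since each bin spans 16000 / 160 = 100 Hz.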

Google's improved acoustic models rely on Recurrent Neural Networks (RNNs). RNNs have feedback loops in their topology, which allow them to model temporal dependencies: when the user speaks the /u/ in the previous example, their articulatory apparatus is coming from a /j/ sound, which was itself preceded by an /m/ sound. Try saying it out loud ("museum"): it flows very naturally in one breath, and RNNs can capture that. The type of RNN used here is a Long Short-Term Memory (LSTM) RNN, which, through memory cells and a sophisticated gating mechanism, retains information better than other RNNs. According to Google, adopting such models alone already improved the quality of its recognizer significantly.
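The gating idea can be illustrated with a toy LSTM cell whose input, hidden state and memory cell are single numbers. This is a minimal sketch with hypothetical, hand-picked weights, not Google's trained model. It only shows how the input, forget and output gates decide what the memory cell keeps from earlier time steps.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical, hand-picked parameters (w: input weight, u: recurrent weight,
# b: bias) for a toy LSTM whose input, hidden state and cell are scalars.
PARAMS = {
    "input": (1.0, 0.5, 0.0),
    "forget": (0.5, 0.5, 1.0),  # positive bias keeps the forget gate mostly open
    "output": (1.0, 0.5, 0.0),
    "cand": (1.0, 0.5, 0.0),    # candidate value that may be written to the cell
}


def lstm_step(x, h_prev, c_prev, params=PARAMS):
    """One LSTM time step: returns the new hidden state h and memory cell c."""

    def pre(name):
        w, u, b = params[name]
        return w * x + u * h_prev + b

    i = sigmoid(pre("input"))   # how much new information to write
    f = sigmoid(pre("forget"))  # how much of the old cell to keep
    o = sigmoid(pre("output"))  # how much of the cell to expose
    g = math.tanh(pre("cand"))  # candidate content
    c = f * c_prev + i * g      # memory cell update carries context forward
    h = o * math.tanh(c)
    return h, c


def run(sequence):
    """Feed a sequence through the cell, carrying h and c from step to step."""
    h, c = 0.0, 0.0
    outputs = []
    for x in sequence:
        h, c = lstm_step(x, h, c)
        outputs.append(h)
    return outputs
```

Calling `run([1.0, 0.0])` and `run([-1.0, 0.0])` gives different final outputs even though the last input is identical, because the memory cell carries context forward, much as the /u/ in "museum" is shaped by the sounds before it.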

The next step was to train the models to recognize phonemes in an utterance without requiring them to make a prediction for each time instant. With Connectionist Temporal Classification, the models are trained to output a sequence of "spikes" that reveals the sequence of sounds in the waveform. The models are free to place these spikes anywhere, as long as the resulting sequence of sounds is correct.
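The "any placement is fine" property of CTC can be sketched in a few lines. `collapse` applies the standard CTC mapping from a frame-by-frame path to a label sequence (merge repeated symbols, then drop blanks), and `ctc_probability` uses the CTC forward recursion to sum the probability of every path that collapses to the target. The phoneme symbols, the `_` blank marker and the per-frame probability dictionaries are illustrative assumptions; real systems work with log probabilities to avoid underflow.

```python
import itertools

BLANK = "_"  # the CTC "blank" symbol: "no new sound starts in this frame"


def collapse(path):
    """CTC mapping: merge repeated symbols, then drop blanks."""
    merged = [symbol for symbol, _ in itertools.groupby(path)]
    return [symbol for symbol in merged if symbol != BLANK]


def ctc_probability(probs, target):
    """Total probability of `target`, summed over every frame-level alignment.

    probs: one dict per frame, mapping symbol -> probability.
    Runs the CTC forward recursion over the blank-extended target.
    """
    ext = [BLANK]
    for symbol in target:
        ext += [symbol, BLANK]

    alpha = [0.0] * len(ext)
    alpha[0] = probs[0][BLANK]
    if len(ext) > 1:
        alpha[1] = probs[0][ext[1]]

    for frame in probs[1:]:
        new = [0.0] * len(ext)
        for j, symbol in enumerate(ext):
            total = alpha[j]              # stay on the same symbol
            if j >= 1:
                total += alpha[j - 1]     # advance by one position
            # Skipping a blank is allowed only between two different symbols.
            if j >= 2 and symbol != BLANK and symbol != ext[j - 2]:
                total += alpha[j - 2]
            new[j] = total * frame[symbol]
        alpha = new

    # A valid path may end on the last symbol or on the trailing blank.
    return alpha[-1] + (alpha[-2] if len(ext) > 1 else 0.0)
```

For example, `collapse(["_", "m", "m", "_", "j", "j"])` gives `["m", "j"]`, while `["m", "_", "m"]` keeps two separate /m/ sounds because the blank separates them.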

The tricky part, though, was making this work in real time. After many iterations, Google's Speech Team managed to train streaming, unidirectional models that consume the incoming audio in larger chunks than conventional models but perform actual computations less often. This reduced the amount of computation and made the recognizer much faster. The team also added artificial noise and reverberation to the training data, making the recognizer more robust to ambient noise.
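The article does not detail Google's exact recipe, so the following is only a hedged sketch of two techniques that match the description. First, several 10 ms feature frames are stacked and the window advances by the same stride, so the model runs less often on larger chunks. Second, clean training audio is augmented with simulated reverberation (convolution with a room impulse response) and additive noise at a chosen signal-to-noise ratio. The stack size and stride of 3, the white Gaussian noise and all function names are assumptions.

```python
import math
import random


def stack_and_subsample(frames, stack=3, stride=3):
    """Concatenate `stack` consecutive feature frames, advancing `stride` frames.

    With 10 ms input frames and stride=3, the model would run once per 30 ms
    instead of once per 10 ms. stack=3 and stride=3 are illustrative values.
    """
    out = []
    for start in range(0, len(frames) - stack + 1, stride):
        window = frames[start:start + stack]
        out.append([value for frame in window for value in frame])
    return out


def add_reverb(samples, impulse_response):
    """Simulate reverberation by convolving with a (hypothetical) room impulse response.

    The output is truncated to the original length.
    """
    out = [0.0] * (len(samples) + len(impulse_response) - 1)
    for i, x in enumerate(samples):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out[:len(samples)]


def add_noise(samples, snr_db, rng=random):
    """Add white Gaussian noise scaled so the result has the target SNR in dB."""
    signal_power = sum(x * x for x in samples) / len(samples)
    noise = [rng.gauss(0.0, 1.0) for _ in samples]
    noise_power = sum(n * n for n in noise) / len(noise)
    # Scale so that signal_power / scaled_noise_power == 10 ** (snr_db / 10).
    scale = math.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return [x + scale * n for x, n in zip(samples, noise)]
```

Stacking trades a little buffering delay for far fewer model evaluations, while the augmentation functions can be applied on the fly so each training epoch sees a different noisy, reverberant version of the same utterance.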

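The new acoustic model was smart: it learned to delay its phoneme predictions so it could listen further ahead in the audio and make better predictions. That, however, would have meant extra latency for users. Google solved this problem by training the model to output its phoneme predictions much closer to the ground-truth timing of the speech.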
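The announcement does not say exactly how the timing constraint was imposed. As a hedged illustration only, the sketch below measures how far behind a reference alignment each CTC spike lands, which is the kind of delay such a constraint would bound. The reference frame positions (e.g. from a forced alignment), the 10 ms frame size, the `max_delay_ms` threshold and all function names are hypothetical.

```python
BLANK = "_"  # CTC blank symbol


def spike_frames(frame_labels):
    """Return (symbol, frame_index) for the first frame of each non-blank run."""
    spikes = []
    prev = BLANK
    for t, symbol in enumerate(frame_labels):
        if symbol != BLANK and symbol != prev:
            spikes.append((symbol, t))
        prev = symbol
    return spikes


def emission_delays(frame_labels, reference, frame_ms=10):
    """Delay in ms of each predicted spike relative to its reference frame.

    reference: hypothetical list of (symbol, frame_index) ground-truth timings.
    The predicted phoneme sequence must match the reference sequence.
    """
    spikes = spike_frames(frame_labels)
    if [s for s, _ in spikes] != [s for s, _ in reference]:
        raise ValueError("predicted phoneme sequence does not match the reference")
    return [(t - ref_t) * frame_ms for (_, t), (_, ref_t) in zip(spikes, reference)]


def within_window(frame_labels, reference, max_delay_ms):
    """True if every spike lands at most max_delay_ms after its reference timing."""
    return all(d <= max_delay_ms for d in emission_delays(frame_labels, reference))
```

For a reference of /m/, /j/, /u/ at frames 0, 2 and 4, a model that emits its spikes at frames 3, 5 and 7 is 30 ms late on every phoneme and would fail a 20 ms window, while one emitting at frames 0, 2 and 4 passes.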

Google's new acoustic models are now used for voice searches and commands in the Google app (on Android and iOS), and for dictation on Android devices.
