Google Brings Voice Search to Zulu, Afrikaans and South African English
Google today introduced Voice Search support for Zulu and
Afrikaans, as well as South African-accented English.
Google defines underrepresented languages as those which, while
spoken by millions, have little presence in electronic and
physical media, e.g., webpages, newspapers and magazines.
Underrepresented languages have also often received little
attention from the speech research community. Their phonetics,
grammar, acoustics, etc., haven't been extensively studied,
making the development of ASR (automatic speech recognition)
voice search systems challenging.
Google believes that the speech research community needs to start working on many of these underrepresented languages to advance progress and build speech recognition, translation and other Natural Language Processing (NLP) technologies. The development of NLP technologies in these languages is critical for enabling information access for everybody. Indeed, these technologies have the potential to break language barriers.
Google has collaborated with the Multilingual Speech Technology group at South Africa's North-West University, led by Prof. Etienne Barnard (also of the Meraka Research Institute), an authority in speech technology for South African languages. The company's development effort was spearheaded by Charl van Heerden, a South African intern and a student of Prof. Barnard. With the help of Prof. Barnard's team, Google collected acoustic data in the three languages and developed lexicons and grammars, which van Heerden and others used to build the three Voice Search systems. A team of language specialists traveled to several cities, collecting audio samples from hundreds of speakers in multiple acoustic conditions such as street noise and background speech. Speakers were asked to read typical search queries into an Android app specifically designed for audio data collection.
For Zulu, Google faced the additional challenge of having few text sources on the web. Google often analyzes the search queries from local versions of Google to build its lexicons and language models. For Zulu, however, there weren't enough queries to build a useful language model. Furthermore, because Zulu has few online data sources, native speakers have learned to use a mix of Zulu and English when searching for information on the web. For its Zulu Voice Search product, Google therefore had to build a truly hybrid recognizer that allows free mixing of both languages. Its phonetic inventory covers both English and Zulu, and its grammars allow natural switching from Zulu to English, emulating speaker behavior.
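To make the idea of a hybrid language model concrete, here is a minimal sketch. It is not Google's implementation. It assumes a toy bigram model with add-alpha smoothing, trained on a handful of made-up queries that mix Zulu and English words. The lexicon, the queries, and the function names (`train_bigrams`, `bigram_prob`, `count_switches`) are all hypothetical and serve only to show how a single vocabulary spanning both languages lets the model score switches between them.

```python
from collections import Counter, defaultdict

# Hypothetical lexicon: each word is tagged with its language (illustrative only).
LEXICON = {
    "izindaba": "zu",     # news
    "umculo": "zu",       # music
    "ukudla": "zu",       # food
    "weather": "en",
    "durban": "en",
    "restaurants": "en",
    "download": "en",
}

# Made-up training queries that switch freely between the two languages.
QUERIES = [
    "izindaba durban",
    "weather durban",
    "download umculo",
    "ukudla restaurants durban",
]


def train_bigrams(queries):
    """Count word bigrams, with <s> and </s> marking query boundaries."""
    counts = defaultdict(Counter)
    for query in queries:
        tokens = ["<s>"] + query.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts


def bigram_prob(counts, prev, cur, vocab_size, alpha=1.0):
    """Add-alpha smoothed estimate of P(cur | prev)."""
    total = sum(counts[prev].values())
    return (counts[prev][cur] + alpha) / (total + alpha * vocab_size)


def count_switches(query):
    """Number of adjacent word pairs whose language tags differ."""
    langs = [LEXICON[word] for word in query.split()]
    return sum(1 for a, b in zip(langs, langs[1:]) if a != b)


counts = train_bigrams(QUERIES)
vocab_size = len(LEXICON) + 1  # +1 for the end-of-query token </s>

# An English word followed by a Zulu word gets a normal probability,
# because both languages share one vocabulary and one set of counts.
p_switch = bigram_prob(counts, "download", "umculo", vocab_size)
```

Because both languages share one model, a query such as "download umculo" is scored like any other word sequence, with no separate step to detect or change the language. A production recognizer would also need a shared pronunciation lexicon and acoustic models, which this sketch leaves out.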
This is Google's first release of Voice Search in a native African language.