Microsoft is tapping academia to improve speech recognition for 3 Indian languages

Shweta Sharma

7 Sep, 2018

The Indian arm of tech giant Microsoft is granting researchers access to speech data in three local languages in its quest to build more robust speech-recognition systems.

The company said in a statement that the release comprises speech training and test data for Telugu, Tamil and Gujarati, including audio and the corresponding transcripts.

The Indian Language Speech Corpus is being provided through the Microsoft Research Open Data initiative, a collection of free datasets from Microsoft Research intended to advance state-of-the-art research in areas such as natural language processing, computer vision and domain-specific sciences.

According to the company, this is the largest publicly available corpus of Indian-language speech data, and researchers and other members of the academic community can use it to build Indian-language speech recognition for voice-based applications.

“We believe India’s increasing digital literacy needs to be supported by a multilingual digital world,” said Sundar Srinivasan, general manager of artificial intelligence & research at Microsoft India. “Using our technology expertise, we want to accelerate innovation in voice-based computing for India by supporting researchers and academia.”

Microsoft’s Indian Language Speech Corpus was tested at Interspeech 2018, which is touted as the world’s largest and most comprehensive conference on the science and technology of spoken language processing.

In a Low Resource Speech Recognition Challenge, participants used data from Microsoft’s Indian Language Speech Corpus to build automatic speech recognition (ASR) systems. They were able to create high-quality speech recognition models using this data.

Microsoft has been working with Indian languages since the launch of Project Bhasha in 1998, allowing users to input localised text using the Indian Language Input tool.

Microsoft also recently announced support for email addresses in multiple Indian languages across most of its email apps and services.
