How an Indian startup stole a march on tech giants in the Indic language voice game

Photo Credit: Thinkstock

In the small but growing world of Indian entrepreneurs, Subodh Kumar’s career trajectory may sound all too common. He graduated from an elite engineering college, studied management at a top B-school, worked with global companies from New York to Hong Kong and came back to Bengaluru to start his own venture. It was then that his choice of work set his startup apart from the crowd.

Instead of joining the ranks of entrepreneurs launching food-tech, e-commerce, fintech or other consumer-focussed ventures, Kumar—an IIT Kharagpur and IIM Bangalore alumnus—started an artificial intelligence-based speech recognition startup along with IIT peers Sanjeev Kumar and Kishore Mundra. Earlier this week, the startup—Liv.ai—was acquired by Flipkart. The deal is significant for India’s biggest e-commerce company as it hopes Liv.ai’s speech-to-text technology in 10 Indic languages including Hindi, Punjabi and Tamil will help it expand its user base by adding customers who can’t use English to shop online.

How did the idea of setting up an AI-based speech recognition startup come about?


In fact, Subodh and Sanjeev didn’t start with a concrete business idea in mind when they returned to India. Former Microsoft executive Subodh and former Qualcomm executive Sanjeev began by researching artificial intelligence, deep learning and phonology, the study of how sound patterns differ across languages. Both were interested in solving problems related to voice in India.

“At the time people in the US were talking about how users could talk to TVs in English. At home, I saw my parents struggling with smartphones as they were not conversant in English and the touch interface was proving to be a challenge for them,” says Subodh. “I realised we could solve this problem and we started conducting research on how we can let Indian users interact with smart devices in their own language.”

Identifying the problem


Most tech companies, including giants such as Google, Microsoft and Amazon, realise that the touch interface in smartphones is not always intuitive, making voice a better option. As the world moves towards voice computing, the tech giants have been trying to add as many Indic languages as possible to their digital assistants—Google Assistant, Microsoft Cortana or Amazon Alexa—to help users do everything from shopping online to booking tables at a restaurant or making travel plans.

India, with a population of 1.3 billion, provides plenty of opportunities for these tech companies to grow. But it also offers challenges, with its dozens of languages and hundreds of dialects.

“These companies know that the road to the next one billion users leads through India. Half of those new users would want to use their devices in their own language, and not English,” says Navkender Singh, a senior analyst with market research firm IDC.


“These companies need data for these people to offer better customer services and experiences but they will not be able to do so if their AI-powered assistants don't understand the language they are speaking,” he adds.

In response to TechCircle’s queries for this article, Google reaffirmed that the Assistant in its mobile and smart home speaker avatars understands only Hindi and English. In its reply, Amazon said Alexa currently only supports Indian English and can recognise proper nouns (names of celebrities, places, songs and movie names) in Indian languages.

It’s this problem of dealing in multiple languages that Liv.ai, operated by Liv Artificial Intelligence Pvt. Ltd, has cracked. And that’s why it came on Flipkart’s radar.


Making a match

Some would argue that Flipkart, now majority-owned by US-based Walmart Inc., could have developed its own speech recognition technology instead of buying Liv.ai. Indeed, the matter was discussed at the highest levels within the company before the acquisition was finalised, according to a Flipkart executive who didn’t wish to be named.

Another option that Flipkart could have explored was to tie up with big tech companies such as Google or Microsoft for speech recognition technology. In a bid to take on Amazon, Walmart already has a partnership with Google in the US under which consumers can ask Google Assistant to shop for them.


However, Flipkart didn’t opt for such a tie-up since these companies didn’t have the Indic language ability. Also, tying up with a global tech company might have required Flipkart to share Indian consumer payments and behavioural data, which it likely wanted to avoid, according to Sandy Shen, research director at Gartner.

“Flipkart is looking to compete against Google and Amazon, so won’t be using their solutions for sure. We can expect to see some personal assistant services coming in the future in both the software and a smart device, for example a wireless speaker,” Shen said. “When the solution is sophisticated enough, Flipkart can develop that into a platform-as-a-service offering for third parties.”

Shen also said that Flipkart could have developed its own natural language processing engine and that part of that solution requires the speech-to-text and text-to-speech conversion technology. “If there is a good solution in the market at a reasonable price, it makes sense to buy rather than build,” she said.


A good solution

For Subodh and his team, however, building Indic language capabilities wasn’t an easy task. Subodh says they faced many challenges, including the lack of powerful hardware. Another problem was the absence of an open-source dataflow programming library such as TensorFlow, which is needed to develop a natural language understanding (NLU) system, that is, algorithms that teach machines how to understand context in languages.

“We had to build everything from scratch. In order to train our systems we needed powerful hardware. So, we decided to use Nvidia GPUs instead of CPUs,” he says, referring to graphics processing units that can process data much faster than the commonly used central processing units. Subodh said the Nvidia GPUs used the so-called Maxwell architecture, whose redesigned streaming multiprocessors improved energy efficiency.

Sanjeev’s knowledge of the phonology and grammar of different languages helped, too. “We needed to understand the phonetics of languages to solve the dialects part of speech recognition. Sanjeev mathematically formulated algorithms for different languages so that machines could learn faster using a technique similar to deep learning,” Subodh said. Deep learning is a machine learning technique that uses layered neural networks loosely modelled on the way humans learn.
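To illustrate the kind of learning described above, here is a minimal sketch of a small neural network trained by gradient descent on made-up "acoustic feature" vectors for two hypothetical phoneme classes. This is purely illustrative: the data, network size and training loop are assumptions for the example, not Liv.ai's actual models or code.

```python
import numpy as np

# Toy data: 2-D "acoustic feature" vectors for two made-up phoneme classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.3, (50, 2)),   # class 0
               rng.normal(+1.0, 0.3, (50, 2))])  # class 1
y = np.array([0] * 50 + [1] * 50).reshape(-1, 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 8 units: a minimal "deep" model.
W1 = rng.normal(0, 0.5, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)

lr = 0.5
losses = []
for epoch in range(200):
    # Forward pass: features -> hidden layer -> class probability.
    h = sigmoid(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    losses.append(loss)

    # Backward pass: hand-derived gradients for this tiny net.
    dz2 = (p - y) / len(X)
    dW2 = h.T @ dz2; db2 = dz2.sum(0)
    dh = dz2 @ W2.T * h * (1 - h)
    dW1 = X.T @ dh; db1 = dh.sum(0)

    # Gradient-descent update.
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

accuracy = np.mean((p > 0.5) == y)
```

Real speech recognisers are vastly larger and operate on audio spectrograms rather than two numbers, but the core loop, a forward pass, a loss, gradients and an update, is the same, which is why fast GPUs matter so much for training.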

To be sure, all tech giants are also working on newer speech delivery techniques. Google is developing a technology called Wavenet to make its Assistant sound more like a human while Microsoft has come up with a bot that can make phone calls for the Chinese market.

For software, Subodh said that since dataflow programming libraries such as TensorFlow didn't exist when they started, they took help from Theano—a similar library they found on software development platform GitHub.
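What libraries such as Theano and TensorFlow provide is a way to describe a computation as a graph of operations and have derivatives worked out automatically. The following toy sketch of that idea is written from scratch in plain Python; the `Node` class and its methods are invented for this example and are not the Theano or TensorFlow API.

```python
class Node:
    """A node in a tiny dataflow graph. Each node records how it was
    computed so gradients can flow backwards through the graph, which
    is the core idea behind libraries like Theano and TensorFlow."""
    def __init__(self, value, parents=(), grad_fns=()):
        self.value = value
        self.parents = parents      # nodes this one was computed from
        self.grad_fns = grad_fns    # local derivative w.r.t. each parent
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, (self, other),
                    (lambda g: g, lambda g: g))

    def __mul__(self, other):
        return Node(self.value * other.value, (self, other),
                    (lambda g, o=other: g * o.value,
                     lambda g, s=self: g * s.value))

    def backward(self, grad=1.0):
        # Accumulate the incoming gradient, then push it to parents.
        self.grad += grad
        for parent, fn in zip(self.parents, self.grad_fns):
            parent.backward(fn(grad))

# Build the graph for f(x, w) = x*w + w, then differentiate it.
x = Node(3.0)
w = Node(2.0)
out = x * w + w
out.backward()
# out.value is 8.0; df/dw = x + 1 = 4.0, so w.grad is 4.0.
```

Because the library, not the programmer, derives the gradients, researchers can experiment with new network architectures quickly, which is exactly what a small team building speech models from scratch needed.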

Liv.ai had another problem to solve. It needed data to train its NLU systems before it could come out with a final product. Subodh was reluctant to reveal how his company collected data.

However, analysts and industry executives say one way of collecting large amounts of data in multiple languages is tying up with telecom companies. Another is asking volunteers to record audio in these languages.

For instance, Google, which offers Indic language support in its Search and Translate products, gathers data by holding events where it asks participants to translate phrases from one language to another. Google Translate can even translate text in images: if you can't read a billboard because it is in an Indic language, you can point your phone's camera at it through the app and see the text rendered in English or another language of your choice.

Rijul Jain, a senior executive at Astarc Ventures, which had invested in Liv.ai at an early stage, said voice is a big area and many startups are trying to challenge the incumbents on specific fronts. While some are focussing on a particular set of languages, others could be working on specialised verticals such as financial applications or e-commerce. Still others could be focussing on products such as a mobile app or a home assistant, he said.

"Each focus area requires different sets of data for the product to be accurate and so requires a different data collection strategy," Jain said.

Like Astarc Ventures, another investor who bet early on Liv.ai was Amod Malviya, who had invested in his personal capacity. Malviya, a former Flipkart chief technology officer, clarified that he had not played matchmaker for the acquisition.

Malviya said he had invested in Liv.ai because he thought the startup's technology was far superior to any other technology in the market at the time. “I knew what they had managed to do by then would prove to be a game-changer.”
