How Google plans to better translate Indian languages using little or no data

6 Mar, 2018

Google is working on a new translation and machine learning model for languages with limited or no data sets to train the neural engines which handle artificial intelligence tasks, a top executive has said.

Barak Turovsky, head of product & design at Google Translate and Machine Learning, told Mint that Google Neural Machine Translation (GNMT) is exploring low-resource training. 

"This is particularly exciting for Indian languages, where we leveraged 'low-resource training' to overcome a severe shortage of training data," Turovsky was quoted as saying. "As a result, we achieved a pretty amazing quality improvement for Indian and other languages, and are working on expanding this approach to more languages and use cases."

Turovsky said that his unit is also looking at another method called "zero-shot translation" to train its neural networks. 

"We are now working on an approach of leveraging multilanguage training to offer translations for language pairs in which we have no training data," he said. "For example, in one (machine translation) model across English, Japanese and Korean, all our training data is between English and Japanese and (English and) Korean. But we would like to translate between Japanese and Korean, but we don’t have any training data for this language pair." 

The Google executive said that using multi-language training, GNMT can still translate between two languages, in the same manner that a human being who speaks English, Japanese and Korean, can translate between Japanese and Korean. 
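As a rough illustration of how such multi-language training data might be prepared, the sketch below follows the approach described in Google's published research on multilingual translation, where a token naming the desired target language is added to the start of each source sentence so that one shared model can serve every pair. Whether this matches the production setup is an assumption; the toy sentences and the helper names `tag_example`, `build_training_set` and `seen_pairs` are hypothetical.

```python
def tag_example(source_text, target_lang):
    """Prefix the source sentence with a token naming the desired target language."""
    return f"<2{target_lang}> {source_text}"

# Toy parallel corpus (illustrative only): every pair includes English,
# so there is no direct Japanese <-> Korean data.
# Each entry is (source_lang, target_lang, source_text, target_text).
parallel = [
    ("en", "ja", "Hello", "こんにちは"),
    ("ja", "en", "こんにちは", "Hello"),
    ("en", "ko", "Hello", "안녕하세요"),
    ("ko", "en", "안녕하세요", "Hello"),
]

def build_training_set(corpus):
    """Return (model_input, expected_output) pairs for a single shared model."""
    return [(tag_example(src, tgt_lang), tgt) for _, tgt_lang, src, tgt in corpus]

def seen_pairs(corpus):
    """Return the set of (source_lang, target_lang) pairs present in the corpus."""
    return {(src_lang, tgt_lang) for src_lang, tgt_lang, _, _ in corpus}

training_set = build_training_set(parallel)

# A zero-shot request: Japanese to Korean, a pair absent from the training data.
# The input is formatted exactly like the training inputs, only the token differs.
zero_shot_input = tag_example("こんにちは", "ko")
```

Because the model only ever sees the target-language token and a source sentence, a Japanese-to-Korean request looks no different in form from the requests it was trained on, which is what makes translation between an unseen pair possible in principle.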

"This is a very promising development, which will benefit Indian languages that generally suffer from lack of training data," Turovsky was quoted as saying.

He also said that Google has launched translations for 96 languages, including 11 Indic languages, in the first 18 months since the company started working on neural networks in 2016.

Among the tricky aspects of translation are jokes and idioms, which mean different things in different cultures. Turovsky said that 10 million language enthusiasts from across the world have contributed more than 700 million translated words to help Google Translate improve its quality and help users communicate better in their languages.

He added that nearly 15% of all translation requests across Google Translate's apps and services are served with the help of the company's Translate community.

To boost its Indian language support, Google has run "Translatathons" in India, with one for Hindi in 2014 and another for Indic languages such as Bengali, Tamil, Telugu, Marathi, Punjabi, Kannada and Malayalam in 2015.

In a separate development, Google has previewed its new quantum processor named Bristlecone at the annual American Physical Society meeting in Los Angeles.

"The purpose of this gate-based superconducting system is to provide a testbed for research into system error rates and scalability of our qubit technology, as well as applications in quantum simulation, optimization, and machine learning," Julian Kelly, Research Scientist, Quantum AI Lab, wrote in a blog post.

In the world of quantum computing, the common belief is that a quantum computer needs chips running on at least 49 qubits, or quantum bits, to achieve almost error-free results, or quantum supremacy.

However, Bristlecone operates on chips that each have 72 qubits, a number trailed most closely by IBM's 50-qubit computer in its testing lab, followed by a 20-qubit offering on its cloud.

But Kelly also noted that the success of quantum computers doesn't rely on the number of qubits alone.

"Operating a device such as Bristlecone at low system error requires harmony between a full stack of technology ranging from software and control electronics to the processor itself. Getting this right requires careful systems engineering over several iterations," he explained.