Using artificial intelligence, researchers at social networking giant Facebook have developed a faster and more accurate way of translating low-density languages such as Urdu and Burmese, according to a report in Forbes.
The report said that the research could prove significant for Facebook, as it uses automatic language translation to help its users around the world read posts in their preferred language.
The breakthrough will be presented at the Empirical Methods in Natural Language Processing (EMNLP) conference to be held in Belgium next month, the report added.
The Facebook AI Research (FAIR) team was able to train a machine translation (MT) system by feeding it large pieces of text in various languages from public websites such as Wikipedia. These pieces of text were independent of one another, not translations of each other. Collections of such unrelated text, each in a single language, are known as monolingual corpora.
Existing machine translation systems can achieve near human-level performance for some languages, but to learn they require access to a parallel corpus: vast quantities of the same sentences in different languages.
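To make the distinction concrete, the two kinds of training data can be sketched as simple Python structures. This is only an illustrative toy: the sentences, the variable names and the `is_aligned` helper are hypothetical and are not taken from Facebook's system or the report.

```python
# Hypothetical toy data, for illustration only.

# Parallel corpus: every entry pairs a sentence with its translation.
parallel_corpus = [
    ("Good morning.", "Bom dia."),
    ("Thank you.", "Obrigado."),
]

# Monolingual corpora: independent text per language,
# with no alignment between the two lists.
monolingual_corpora = {
    "en": ["The weather is nice today.", "She reads a book."],
    "pt": ["O trem chegou atrasado.", "Eu gosto de café."],
}


def is_aligned(corpus):
    """Return True if every item is a (source, target) sentence pair."""
    return all(isinstance(item, tuple) and len(item) == 2 for item in corpus)
```

In the parallel corpus each English sentence comes with its Portuguese counterpart, which is what a bilingual person would have to produce. In the monolingual corpora the English and Portuguese sentences have nothing to do with one another.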
Antoine Bordes, a research scientist and the head of FAIR's Paris research lab, was quoted as saying that building a parallel corpus is complicated as people fluent in two languages are required to create it.
“For instance, if you wanted to build a parallel corpus of Portuguese/Nepali, you would need to find people fluent in these two languages, which would be very difficult,” said Bordes.
According to the report, most machine translation systems use both monolingual corpora and a parallel corpus to learn.
Bordes added that the novelty in their approach is that they can train MT systems from monolingual corpora only, and don't require a parallel corpus.
“Potentially, given a book written in an alien language, we could use our model to translate it into English,” he said.