Cracking the Human Language Code
Using machine learning and big data, a former particle physicist wants to teach you a foreign language in 200 hours―for free.
This article was first published on www.ca.com and has been republished with the author’s permission.
During his time as a researcher at the Large Hadron Collider in Switzerland, Estonian theoretical physicist Mait Müntel felt guilty that he had spent nine years in the country and not yet learned the lingua franca. So, he bought a book that promised to get him up to speed in French in 24 hours.
“In the first chapter, there was a conversation: ‘Hello, what are you doing?’ ‘I’m working in a garbage recycling factory,’” Müntel recalls. “And I thought, A garbage recycling factory―how statistically relevant is that? If I had learned ‘beer’, it would have been a million times more useful.”
Müntel’s frustration ultimately grew into the language learning app Lingvist, which debuted in 2014 with the claim that it can teach a new tongue in 200 hours, a promise Müntel says is borne out through the company’s own internal data. It’s an increasingly competitive field: Hundreds of startups, collectively valued at hundreds of millions of dollars, have staged a coup against a handful of old market leaders. Rosetta Stone, the once indomitable ruler of them all, has watched half of its value slip away from its 2013 peak.
Mining Pop Culture
Lingvist offers free instruction in French and English, and Müntel says the app has 350,000 registered users, all of whom “have just somehow found us” without active user acquisition. The course material is built around the most frequently used words, which the team identifies by endlessly crunching data from as many books, websites and films as it can get its hands on to determine how people actually talk. While building his prototype for the program at CERN, Müntel downloaded the subtitle files of about 40,000 movies translated from French to English and analyzed them for word frequency. “If you had looked at those movies one after another, you would spend two or three thousand years,” he says.
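As a rough illustration of the kind of frequency analysis Müntel describes, here is a minimal Python sketch that counts words across subtitle files. It assumes the files use the common SubRip (`.srt`) format and applies deliberately crude tokenization: formatting tags such as `<i>` are stripped, and words are split on anything that is not a letter, so an elided form like `s'il` becomes `s` and `il`. The function names are hypothetical; this is not Lingvist's actual pipeline.

```python
import re
from collections import Counter

# SubRip metadata lines: the cue number and the "start --> end" timestamp.
CUE_NUMBER = re.compile(r"^\d+$")
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}")
# Formatting tags such as <i>...</i> that often appear in subtitle text.
TAG = re.compile(r"<[^>]+>")
# A run of Unicode letters (word characters minus digits and underscore).
WORD = re.compile(r"[^\W\d_]+")


def dialogue_lines(srt_text):
    """Yield only the spoken-text lines of one .srt file, tags removed."""
    for line in srt_text.splitlines():
        line = line.strip().lstrip("\ufeff")  # also drop a leading byte-order mark
        if not line or CUE_NUMBER.match(line) or TIMESTAMP.match(line):
            continue
        yield TAG.sub("", line)


def word_frequencies(srt_texts):
    """Count lowercased words across many .srt files."""
    counts = Counter()
    for text in srt_texts:
        for line in dialogue_lines(text):
            counts.update(word.lower() for word in WORD.findall(line))
    return counts


sample = """1
00:00:01,000 --> 00:00:03,000
Une bière, s'il vous plaît.

2
00:00:04,000 --> 00:00:06,000
<i>Une bière ?</i> D'accord.
"""

print(word_frequencies([sample]).most_common(2))  # [('une', 2), ('bière', 2)]
```

Running `most_common(n)` over tens of thousands of files would yield the kind of frequency-ranked vocabulary the article describes; a production system would presumably need proper French tokenization and lemmatization on top of this.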
Learners work through those high-frequency words with the help of an adaptive learning backend that learns when each user is likely to forget something and then reintroduces the material at the moment it is at risk of slipping away. The personalization matters: Too much repetition and you’re wasting time; too little and you’re not learning much at all.
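The article does not spell out how Lingvist models forgetting. As a hedged sketch of the general idea, the Python below assumes a simple exponential forgetting curve: the probability of recalling a word decays as p = 2^(-t/h), where t is the hours since the last review and h is a per-word "half-life". A word is due again once p would fall below a threshold, and each answer adjusts h. The class name, the 24-hour starting half-life, the 0.9 threshold and the double-or-halve update rule are all illustrative assumptions, not Lingvist's algorithm.

```python
import math
from dataclasses import dataclass


@dataclass
class WordMemory:
    """Hypothetical memory state for one learner and one word."""

    half_life_hours: float = 24.0  # assumed starting half-life

    def recall_probability(self, hours_since_review: float) -> float:
        # Exponential forgetting curve: p = 2^(-t / h); p = 0.5 when t == h.
        return 2.0 ** (-hours_since_review / self.half_life_hours)

    def hours_until_due(self, threshold: float = 0.9) -> float:
        # Solve 2^(-t / h) = threshold for t, i.e. t = -h * log2(threshold).
        if not 0.0 < threshold < 1.0:
            raise ValueError("threshold must be between 0 and 1")
        return -self.half_life_hours * math.log2(threshold)

    def record_answer(self, correct: bool) -> None:
        # Assumed update rule: a correct answer doubles the half-life so the
        # word comes back later; a miss halves it (never below one hour).
        if correct:
            self.half_life_hours *= 2.0
        else:
            self.half_life_hours = max(1.0, self.half_life_hours / 2.0)


memory = WordMemory()
print(f"due in {memory.hours_until_due():.1f} h")  # due in 3.6 h
memory.record_answer(correct=True)
print(f"due in {memory.hours_until_due():.1f} h")  # due in 7.3 h
```

The `threshold` parameter is where the repetition trade-off shows up: raising it schedules reviews sooner (more repetition), while lowering it spaces them further apart (less repetition, more forgetting).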
Müntel’s machine-learning algorithm, initially built with the ROOT data analysis framework used by scientists at CERN, so impressed Jaan Tallinn, the Estonian computer scientist who co-created Skype, that Tallinn stopped work on his own app to become a co-founder of Lingvist and one of its early investors. Since then, they’ve hired a small army of linguists to stock the product with content, write translations and test additional theories of how people best pick up new languages. “There are 60-plus language learning theories, most of them being contradictory,” Müntel says. The solution? “Constant A/B testing.”
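The article does not say how Lingvist evaluates its experiments. One textbook way to judge whether a teaching variant B beats variant A on a yes/no outcome, for example whether a learner still recalls a word a day later, is a two-proportion z-test, sketched below in Python using only the standard library. The metric, the function name and the counts in the usage line are hypothetical.

```python
import math
from typing import NamedTuple


class ABResult(NamedTuple):
    rate_a: float
    rate_b: float
    z: float
    p_value: float


def two_proportion_z_test(successes_a: int, trials_a: int,
                          successes_b: int, trials_b: int) -> ABResult:
    """Two-sided z-test for a difference between two success rates."""
    if trials_a <= 0 or trials_b <= 0:
        raise ValueError("each variant needs at least one trial")
    rate_a = successes_a / trials_a
    rate_b = successes_b / trials_b
    # Under the null hypothesis both variants share one pooled success rate.
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    if std_err == 0:  # all successes or all failures: no detectable difference
        return ABResult(rate_a, rate_b, 0.0, 1.0)
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return ABResult(rate_a, rate_b, z, p_value)


# Made-up counts, purely to show the call: learners who recalled a word
# correctly a day later under lesson variant A versus variant B.
result = two_proportion_z_test(400, 1000, 460, 1000)
print(f"z = {result.z:.2f}, p = {result.p_value:.4f}")
```

The normal approximation behind this test is only reasonable with fairly large samples, and running many comparisons at once, as "constant A/B testing" implies, raises the chance of false positives unless the thresholds are adjusted.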
Tallinn and Müntel see a possible B2B product down the line, to help companies retool their employees to quickly become multilingual. More broadly, the nuts and bolts of Lingvist’s machine learning could be deployed in learning programs beyond languages, both for in-house corporate training and in MOOCs and other online education platforms such as Udacity. Tallinn says the company could one day roll out APIs for non-language purposes, but it’s not yet a priority.
“A lot of Lingvist’s backend is just learning how people learn,” he says. “That said, it’s important for a startup to stay focused. So for now, Lingvist is solely in the language business.”