How artificial intelligence is helping keep Indigenous languages alive

Communities in North America and New Zealand are working on teaching algorithms to understand Indigenous languages. But what happens when corporations get involved, asks Annalee Newitz 

By Annalee Newitz

27 September 2023

Books written in Quechua sit behind a student at a primary school in Licapa, Peru

Associated Press/Alamy

LEARNING a language used to mean sitting in a classroom and memorising hundreds of words. But thanks to apps like Duolingo, you can take a quick French lesson on your phone between meetings. Or you can use Google Translate to help you read French instead. As miraculous as these apps are, they don’t work for everyone – especially if you speak an Indigenous language in regions like North America or New Zealand, where European settlers made a concerted and often violent effort to replace local languages with their own.

This was something that Michael Running Wolf thought about a lot when he was working as an engineer at Amazon on Alexa, the home assistant that responds to voice commands. Now a PhD student studying natural language processing at McGill University in Canada, Running Wolf grew up in Montana on the Northern Cheyenne Indian Reservation, where he learned the Cheyenne language by talking to his grandmother. When writing code for Alexa, he realised that its speech recognition algorithms would never work for his tribe’s language. That is because many North American Indigenous languages have a fundamentally different structure from English or Mandarin – two of the main languages that speech recognition software is designed for.

Many Indigenous languages in North America are polysynthetic, which means that words are built from many smaller meaningful parts and change form depending on context. In English, we might say “the full green cup is mine”, but in a polysynthetic language you could express that phrase with a single word. “There are an infinite number of words,” Running Wolf explains. “Words are created on the fly and are highly contextual.” Teaching Alexa to recognise these words would require a new algorithm.

But that isn’t the only problem. To learn English, Alexa was trained on tens of thousands of hours of spoken English. There are no such data sets for most Indigenous languages, which might have only a few hours of recorded speech available to engineers.

The answers started to come together after Running Wolf met Keoni Mahelona at an Indigenous AI conference. Working with the Māori community, Mahelona had bootstrapped a speech-recognition model using only 310 hours of te reo, the Māori language spoken in New Zealand. Mahelona is the chief technical officer of Te Hiku Media, which runs a radio station broadcasting in te reo – making it the perfect resource for an audio data set. Now that he has shown it is possible to build a language model with only hundreds of hours of data, there is a pathway for other Indigenous languages. Mahelona is currently working with a team on Papa Reo, a platform for te reo speech-recognition apps that will make it easier for New Zealanders to speak to their devices in te reo, depending on their needs.

Running Wolf has a different goal. He wants AI that can help people practise speaking their tribes’ languages. He imagines kids supplementing their school work by talking to an AI, which will recognise when they have mispronounced a word and gently correct them. He is also working with his wife, Caroline Running Wolf, a PhD student at the University of British Columbia, who is designing augmented reality (AR) experiences for people who want to learn their native languages in context. Working with members of the Kwagu’ł community on North Vancouver Island, she is creating an AR game that requires people to gather materials for a traditional potlatch feast in a virtual recreation of their ancestral lands. It is fun to steer a canoe and find ingredients while chatting in the Kwak’wala language, but it also “teaches cultural protocols”, she says. “It’s language in context,” she adds, which is the best way to learn.

With so many Indigenous language apps on the horizon, it would seem that Alexa might soon be chatting with Running Wolf’s grandmother in Cheyenne. But that is the opposite of what most Indigenous developers want. Mahelona has one message to large companies that intend to monetise Indigenous languages: “Just don’t.” He says they can work with Māori-run organisations like his own to access Indigenous languages through licensing or other agreements. The reasons for this go back to the days of colonisation: “Extremely wealthy corporations will profit off a language that was once beaten out of our grandparents by the government,” he says. “It’s another kick in the face.”

Running Wolf thinks similarly. He has heard that some companies have tried to get audio data on the Cheyenne reservation by offering people a few bucks to speak into a recorder. That isn’t the right way to do it. “There’s nothing evil in extracting data,” he says. “[But] I would like a model that considers the economic development of the community.”

 

What I’m reading

Deb Chachra’s brilliant engineering manifesto, How Infrastructure Works.

What I’m watching

The silliest, cutest pirates ever in the live-action version of the anime One Piece.

What I’m working on

Brushing up on my French in Duolingo.

 

Annalee Newitz is a science journalist and author. Their latest novel is The Terraformers and they are the co-host of the Hugo-winning podcast Our Opinions Are Correct. You can follow them @annaleen and their website is techsploitation.com