Should countries build their own AIs?

AI will soon touch many parts of our lives. But it doesn’t have to be controlled by big tech companies

By Chris Stokel-Walker
9 June 2023

The generative AI revolution is here, and it is expected to increase global GDP by 7% in the next decade. For now, though, most of those gains look set to be swept up by a handful of private companies that dominate the sector, with OpenAI and Google leading the pack.

This poses problems for governments as they grapple with the prospect of integrating AI into the way they operate. It's likely that AI will soon touch many parts of our lives, but that AI doesn't need to be controlled by the likes of OpenAI and Google.

The Tony Blair Institute for Global Change, a London-based think tank, recently began advocating for the U.K. to create its own sovereign AI model — an initiative that some British media outlets have dubbed “ChatGB.” The idea is to create a British-flavored tech backbone that underpins large swaths of public services, free from the control of major U.S.-based platforms. Being “entirely dependent on external providers,” says the Institute, would be a “risk to our national security and economic competitiveness.”

Sovereign AIs stand in stark contrast to the most prominent tools of the moment. The large language models that underpin tools like OpenAI’s ChatGPT are built using data scraped from across the internet, and their inner workings are controlled by private enterprises.

In a 100-page "technical report" accompanying the release of GPT-4, its latest large language model, OpenAI declined to share information about how the model was trained or what data it was trained on, citing safety risks and "the competitive landscape" (read: "we don't want competitors to see how we built our tech"). The decision was widely criticized. Yet the company could publish its code and scrub its data sets of anything that would put individuals' privacy or safety at risk. That kind of transparency would allow experts to audit the model and identify any risks it might pose.

Developing a sovereign AI would allow countries to know how their model was trained and what data it was trained on, according to Benedict Macon-Cooney, the chief policy strategist at the Tony Blair Institute.

"It allows you to, to some extent, instill your values in the model," said Sasha Luccioni, a research scientist at Hugging Face, an open source AI platform and research group. "Each model does encode values." Indeed, while 96% of the planet lives outside the United States, most big tech products are developed by a tiny, relatively elite group of people in the U.S. who tend to build technology encoded with libertarian, Silicon Valley-style ideals.

That's been true for social media historically, and it is also coming through with AI: A 2022 academic paper by researchers from Hugging Face showed that the ghost in the AI machine has an American accent, meaning that most of the training data, and most of the people coding the model itself, are American. "The cultural stereotypes that are encoded are very, very American," said Luccioni. But with a sovereign AI model, Luccioni says, "you can choose sources that come from your country, and you can choose the dialects that come from your country."

That's vital given the preponderance of English-language models and the paucity of AI models in other languages. While more than 7,000 languages are spoken worldwide, most of the internet, on which these models are trained, is written in English. "English is the dominant language, because of British imperialism and because of American trade," said Aliya Bhatia, a policy analyst at the Center for Democracy & Technology, who recently published a paper on the issue. "These models are trained on a predominant model of English language data and carry over these assumptions and values that are encoded into the English language, specifically the American English language."

A big exception, of course, is China. Models developed by Chinese companies are sovereign almost by default, because they are built largely on data drawn from the Chinese internet, where the information ecosystem is heavily influenced by the state and the Communist Party. China's economy is also big enough to sustain the independent development of robust tools. "I think the goal isn't necessarily that everything be made in China or innovated in China, but it's to avoid reliance on foreign countries," said Graham Webster, a research scholar and the editor-in-chief of the DigiChina Project at Stanford University's Cyber Policy Center.

There are lots of ways to develop such models, according to Macon-Cooney, of the Blair Institute, some of which could become highly specific to government interests. "You can actually build large language models around specific ideas," he explained. "One practical example where a government might want to do that is building a policy AI." The model would be fed previously published policy papers going back decades, many of them ideas that were scrapped by one government only to be revived by a later one. That would build up the model's understanding of policy, which could then be used to reduce the workload on public servants. Similar models could be developed for education or health, says Macon-Cooney. "You just need to find a use case for your actual specific outcome, which the government needs to do," he said. "Then begin to build up that capability, feed in the right learnings, and build that expertise up in-house."

The European Union is a prime example of a supranational organization that could draw on its vast data reserves to build its own sovereign AI, says Luccioni. "They have a lot of underexploited data," she said, pointing to the multilingual corpus of the European Parliament's hearings, for instance. The same is true of India, where the vast volumes of data collected by the controversial Aadhaar digital identification system could be used to develop an AI model. India's ministers have already hinted they are doing just that and have confirmed in interviews that AI will soon be layered into the Aadhaar system. In a multilingual country like India, that comes with its own problems. "We're seeing a large push towards Hindi becoming a national language, at the expense of the regional and linguistic diversity of the country," said Bhatia.

Developing your own AI costs a lot of money, which Macon-Cooney says governments might struggle with. "If you look at the economics side of this, I think there is a deep question of whether a government can actually begin to spend, let alone actually begin to get that expertise, in house," he said. The U.K. announced, in its March 2023 budget, a plan to spend $1.1 billion on a new exascale supercomputer that would be put to work developing AI. A month later, it added $124 million to fund an AI taskforce that will be supported by the Alan Turing Institute, a government-affiliated research center named after Alan Turing, one of the pioneers of AI.

One solution to the money problem is to collaborate. "Sovereign initiatives can't really work because any one nation or one organization is, unless they're very, very rich, going to have trouble getting the talent, the compute and the data necessary for training language models," Luccioni said. "It really makes a lot of sense for people to pool resources."

But working together can nullify the reason sovereign AIs are so attractive in the first place.

Luccioni believes that the European Union will struggle to develop a sovereign AI because of the number of stakeholders who would have to coalesce around a single position before the model could even be built. "What happens if there's 13% Basque in the data and 21% Finnish?" she asked. "It's going to come with a lot of red tape that companies don't have, and so it's going to be hard to be as agile as OpenAI." Finland, for its part, has launched a sovereign AI project, called Aurora, that is meant to streamline the provision of a range of services to citizens. But progress has been slow, mostly due to the project's scale.

There's also the challenge of securing the underlying hardware. While the U.K. has announced $1.1 billion in funding for the development of its exascale computer, that pales in comparison with what OpenAI has. "They have 27 times the size just to run ChatGPT than the whole of the British state has itself," Macon-Cooney said. "So one private lab is many, many magnitudes bigger than the government." That could push governments looking to develop sovereign models back into the arms of the same old tech companies, this time as suppliers of the cloud computing needed to train the models, which comes with its own problems.

And even if you can bring down the computing power — and the associated costs — needed to run a sovereign AI model, you still need the expertise. Governments may struggle to attract talent in an industry dominated by private sector companies that can likely pay more and offer more opportunities to innovate.

“The U.K. will be blown out of the water unless it begins to think quite deliberately about how it builds this up,” said Macon-Cooney.

Luccioni sees some signs of promise for countries looking to develop their own AIs, with talented developers wanting to work differently. “I know a lot of my friends who are working at big research companies and big tech companies are getting really frustrated by the closed nature of them,” she said. “A lot of them are talking about going back to academia — or even government.”

The story you just read is a small piece of a complex and an ever-changing storyline that Coda covers relentlessly and with singular focus. But we can’t do it without your help. Show your support for journalism that stays on the story by becoming a member today. Coda Story is a 501(c)3 U.S. non-profit. Your contribution to Coda Story is tax deductible.