
As AI language skills increase, so do scientists’ concerns


The tech industry’s latest artificial intelligence constructs can be quite compelling if you ask them what it’s like to be a sentient computer, or maybe just a dinosaur or a squirrel. But they’re not so good – and sometimes dangerously bad – at handling other seemingly simple tasks.

Take, for example, GPT-3, a Microsoft-controlled system that can generate paragraphs of human-like text based on what it has learned from a vast database of e-books and online writings. It is considered one of the most advanced of a new generation of AI algorithms capable of conversing, generating readable text on demand and even producing new images and videos.

Among other things, GPT-3 can write most any text you ask for – a cover letter for a job as a zookeeper, for example, or a Shakespearean-style sonnet on Mars. But when Pomona College professor Gary Smith asked it a simple but nonsensical question about walking upstairs, GPT-3 flubbed it.

“Yes, it’s safe to walk up the stairs on your hands if you wash them first,” the AI replied.

These powerful AI systems, technically known as “large language models” because they were trained on a huge body of text and other media, are already integrated into customer service chatbots, Google searches and “autocomplete” messaging features that complete your sentences for you. But most of the tech companies that built them have kept their inner workings secret, making it difficult for outsiders to understand the flaws that can make them a source of misinformation, racism and other mischief.

“They are very good at writing text with the skill of human beings,” said Teven Le Scao, a research engineer at the artificial intelligence startup Hugging Face. “Something they don’t know how to do very well is to be factual. It looks very coherent. It’s almost true. But it’s often wrong.”

That’s one of the reasons why a coalition of artificial intelligence researchers co-led by Le Scao – with the help of the French government – on Tuesday launched a new large language model meant to serve as an antidote to closed systems such as GPT-3. The group is called BigScience and its model is BLOOM, for the BigScience Large Open-science Open-access Multilingual Language Model. Its main advance is that it works across 46 languages, including Arabic, Spanish and French, unlike most systems that are focused on English or Chinese.
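Because BLOOM’s weights are publicly hosted, anyone can poke at them directly. As a rough illustration (my own sketch, not code from BigScience), the snippet below loads one of the smaller public BLOOM checkpoints – assuming the Hugging Face transformers library and the bigscience/bloom-560m model hosted on the Hugging Face Hub – and samples a short continuation of a French prompt:

```python
# A minimal sketch: generating text from a small, publicly available BLOOM
# checkpoint using the Hugging Face transformers library.
# "bigscience/bloom-560m" is one of the smaller open checkpoints; the full
# 176-billion-parameter BLOOM is far too large to run on an ordinary machine.
from transformers import pipeline

generator = pipeline("text-generation", model="bigscience/bloom-560m")

prompt = "La recherche ouverte en intelligence artificielle"
result = generator(prompt, max_new_tokens=40, do_sample=True)
print(result[0]["generated_text"])
```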

It’s not just Le Scao’s group that aims to open the black box of AI language models. Big Tech company Meta, the parent company of Facebook and Instagram, is also calling for a more open approach as it tries to catch up with systems built by Google and OpenAI, the company that runs GPT-3.

“We’ve seen announcement after announcement after announcement of people doing this kind of work, but with very little transparency, very little opportunity for people to really look under the hood and see how these models work,” said Joelle Pineau, managing director of Meta AI.

Competitive pressure to build the most eloquent or informative system – and profit from its applications – is one of the reasons most tech companies guard them closely and don’t collaborate on community standards, said Percy Liang, an associate professor of computer science at Stanford who directs its Center for Research on Foundation Models.

“For some companies, it’s their secret sauce,” Liang said. But they often also worry that losing control could lead to irresponsible use. As AI systems become increasingly capable of writing health advice websites, high school essays or political screeds, misinformation can proliferate and it will be increasingly difficult to know what is coming from a human or a computer.

Meta recently released a new language model called OPT-175B that uses publicly available data – from heated comments on Reddit forums to archives of US patent filings and a trove of emails from the Enron corporate scandal. Meta says its openness about the data, code and research logs makes it easier for outside researchers to help identify and mitigate the bias and toxicity the model picks up by ingesting how real people write and communicate.

“It’s hard to do that. We expose ourselves to huge criticism. We know the model will say things that we won’t be proud of,” Pineau said.

While most companies have defined their own internal AI safeguards, Liang said what’s needed are broader community standards to guide research and decisions such as when to release a new model into the wild.

It doesn’t help that these models require so much computing power that only tech giants and governments can afford them. BigScience, for example, was able to train its models because it was offered access to the powerful French supercomputer Jean Zay near Paris.

The trend towards ever bigger and smarter AI language models that can be “pre-trained” on a wide body of writings took a big leap in 2018 when Google introduced a system known as BERT, which uses a so-called “transformer” technique that compares words across a sentence to predict meaning and context. But what really wowed the AI world was GPT-3, released by San Francisco-based startup OpenAI in 2020 and licensed exclusively to Microsoft shortly thereafter.
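To make the “transformer” idea a little more concrete, here is a small sketch of the masked-word task BERT was trained on – assuming the Hugging Face transformers library and its publicly hosted bert-base-uncased checkpoint (my own illustration, not code from Google):

```python
# A rough illustration of BERT's fill-in-the-blank objective: the model
# weighs the surrounding words in a sentence to predict what fits in [MASK].
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

for guess in unmasker("The squirrel buried a [MASK] in the garden."):
    # Each guess comes with the predicted word and a confidence score.
    print(f'{guess["token_str"]:>12}  score={guess["score"]:.3f}')
```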

GPT-3 has led to a boom in creative experimentation, as AI researchers with paid access have used it as a sandbox to evaluate its performance, but without significant insights into the data it was trained on.

OpenAI has extensively described its training sources in a research paper and has also publicly reported on its efforts to combat potential abuse of the technology. But BigScience co-leader Thomas Wolf said the company doesn’t provide details on how it filters that data, or give outside researchers access to the processed version.

“So we can’t really look at the data that was used to train GPT-3,” said Wolf, who is also chief science officer at Hugging Face. “The heart of this recent wave of AI technology is much more in the dataset than in the models. The most important ingredient is the data, and OpenAI is very, very secretive about the data they use.”

Wolf said opening up the datasets used for language models helps humans better understand their biases. A multilingual model trained on Arabic text is much less likely to spit out offensive remarks or misunderstandings about Islam than one trained solely on English-language text from the United States, he said.

One of the newer experimental AI models on the scene is Google’s LaMDA, which also incorporates speech and is so impressive at answering conversational questions that a Google engineer claimed it was approaching consciousness – a claim that got him suspended from his job last month.

Colorado researcher Janelle Shane, author of the AI Weirdness blog, has spent the past few years creatively testing these models, particularly GPT-3 – often to humorous effect. But to underscore the absurdity of thinking these systems are self-aware, she recently told GPT-3 that it was an advanced AI that is secretly a Tyrannosaurus rex or a squirrel.

“It’s very exciting to be a squirrel. I can run, jump and play all day. I also eat a lot of food, which is great,” GPT-3 said, after Shane asked it for a transcript of an interview and posed a few questions.

Shane has learned more about its strengths, such as its ability to summarize what has been said on the internet about a topic, and its weaknesses, including its lack of reasoning skills, its difficulty sticking to one idea across several sentences and its propensity to be offensive.

“I wouldn’t want a text model dispensing medical advice or acting as a companion,” she said. “It’s good for that superficial semblance of meaning if you don’t read carefully. It’s like listening to a lecture while you fall asleep.”