Smarter AIs Are Actually More Likely To Fabricate Facts Instead Of Turning Down Questions They’re Unable To Answer

With every new release, large language models (LLMs) are becoming increasingly intelligent and powerful.
That means they can provide more accurate information. But new research has suggested that smarter AI chatbots are actually becoming less reliable because they are more likely to fabricate facts instead of turning down questions they are unable to answer.
In a new study, researchers examined some of the industry’s leading LLMs, including OpenAI’s GPT, Meta’s LLaMA, and an open-source model called BLOOM, developed by the research group BigScience.
The researchers discovered that the newer models’ responses were more accurate in many cases, but across the board, they were less trustworthy and produced a greater proportion of wrong answers than older models.
“They are answering almost everything these days. And that means more correct, but also more incorrect [answers],” said José Hernández-Orallo, a co-author of the study and a researcher at the Valencian Research Institute for Artificial Intelligence in Spain.
But according to Mike Hicks, a philosopher of science and technology at the University of Glasgow in Scotland, the AI was simply getting better at pretending it was more knowledgeable than it actually was.
The models were quizzed on topics like math and geography. They were also asked to perform tasks, such as listing information in a specific order.
Overall, the bigger, more powerful models gave the most accurate responses, but on harder questions, their accuracy dropped and they were more prone to error.
Some of the biggest liars were OpenAI’s GPT-4 and o1. They would answer almost every question they were asked.

But for the most part, all of the studied LLMs seemed to be trending in that direction. In addition, none of the LLaMA family of models reached 60 percent accuracy even on the easiest questions.
In short, the bigger and more sophisticated the AI models became, the larger the percentage of wrong answers they gave.
The researchers suggest that people are ignoring how AI models mess up easy questions because they are so impressed by the accuracy with which they handle more complex questions.
The research was also revealing about how humans perceive AI responses. A group of participants was asked to judge whether the chatbots’ answers were accurate or inaccurate, and they got it wrong between 10 and 40 percent of the time.
Programming the LLMs to be less willing to answer everything is likely the simplest solution to the issues, according to the researchers.
“You can put a threshold, and when the question is challenging, [get the chatbot to] say, ‘No, I don’t know,’” said Hernández-Orallo.
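For illustration only, here is a minimal sketch of what such a threshold might look like. The study does not publish code, so the answer_with_confidence helper below is a hypothetical stand-in for however a real system would produce an answer along with a confidence score (for example, from the model’s output probabilities).

```python
# Minimal sketch of threshold-based abstention; not the study authors' code.
# answer_with_confidence is a hypothetical placeholder for a real LLM call
# that also returns some confidence score for its answer.

def answer_with_confidence(question: str) -> tuple[str, float]:
    # Placeholder: a real implementation would query a model and derive a
    # confidence value, e.g., from its output token probabilities.
    canned = {"What is 2 + 2?": ("4", 0.98)}
    return canned.get(question, ("A guessed answer", 0.35))

def guarded_answer(question: str, threshold: float = 0.7) -> str:
    """Return the model's answer only if its confidence clears the threshold."""
    answer, confidence = answer_with_confidence(question)
    if confidence < threshold:
        return "I don't know."  # abstain instead of guessing
    return answer

if __name__ == "__main__":
    print(guarded_answer("What is 2 + 2?"))              # confident -> "4"
    print(guarded_answer("Who won the 2042 World Cup?")) # low confidence -> "I don't know."
```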
However, AI companies may not be too eager to program their chatbots in such a way, as it might reveal the technology’s limitations to the public.
The details of the study were published in the journal Nature.