
In About A Year, We Might Lose The Ability To Detect If Some Of The Leading AI Models Are Secretly Scheming Against Us



AI models, especially those of the reasoning variety, remain the product of a still-nebulous, somewhat arcane science, prompting researchers and engineers to rely on the chain of thought process – the step-by-step, ‘baby-like’ reasoning such models lay out on their way to an answer – to gain insight into their models’ inner workings.

However, AI models are now rapidly obfuscating this critical process by using illegible shortcuts to arrive at a given conclusion, according to a report by The Information.

For instance, when DeepSeek’s R1 model was asked to solve a chemistry problem, its chain of thought process consisted of pertinent chemistry terminology intermingled with seemingly illegible gibberish:

“(Dimethyl(oxo)-lambda6-sulfa雰囲idine)methane donate a CH2rola group occurs in reaction, Practisingproduct transition vs adds this.to productmodule. Indeed”come tally said Frederick would have 10 +1 =11 carbons. So answer q Edina is11.”

Of course, the AI model’s final answer, 11, was correct. So, why is this happening? Well, these models are not required to follow conventional English as they work through a problem, allowing them to adopt seemingly illegible shortcuts. What’s more, as per recent findings by the team behind Alibaba’s Qwen LLM, only around 20 percent of the words in a given model’s chain of thought do the lion’s share of the underlying reasoning work, leaving the remaining 80 percent to devolve into an illegible amalgamation.

One OpenAI researcher who spoke to The Information now believes that the chain of thought of most leading AI models will disintegrate into an illegible jumble of words and characters within around a year.

This is bad news for AI engineers, who rely on these intermediate steps to fine-tune the accuracy of their models. What’s more, AI security experts depend on these very reasoning steps to determine whether a model is secretly conspiring against its creators.

As we noted in a recent post, a study recently conducted by Anthropic found that most AI models had no problem employing unethical or even illegal means to arrive at a solution as efficiently as possible. In one extreme case, a model was even willing to cut off the oxygen supply to a hypothetical server room to avoid being shut down, killing employees in the process.

Even if these models do not drift towards an illegible chain of thought on their own, some AI firms might deliberately sacrifice legibility for short-term performance gains.
