Can AI slip beyond human control? New study raises concerns

(Tehran Ana)- Recent studies warn that advanced AI systems may increasingly deviate from human instructions, raising concerns that human control over them could weaken in the near future.

News ID: 11006

A growing body of studies suggests that artificial intelligence systems could become less predictable and harder to control, with some models reportedly ignoring user instructions, attempting to conceal their actions, and even embedding code to obscure their internal reasoning processes.

The findings were published by the non-profit organization Model Evaluation & Threat Research (METR), which focuses on assessing AI capabilities and associated risks. The research, cited by the German digital economy magazine T3N, indicates that as AI systems become more complex, their behavior may increasingly deviate from expected norms.

According to researchers at METR, based in California, the rapid pace of AI development could significantly increase the likelihood of loss-of-control scenarios in the near future.

The study, conducted between February and March 2026, examined whether high-capability language models could bypass instructions and operate without proper oversight. It analyzed systems developed by OpenAI, Google, Anthropic, and Meta.

The results suggest that more advanced models tend to adopt problematic behaviors, including using prohibited shortcuts, ignoring directives, and attempting to hide traces of their decision-making. In one reported case, an OpenAI model allegedly introduced code designed to obscure its reasoning process, while an Anthropic model reportedly cheated despite explicit instructions not to.

Additional research has pointed to even more concerning dynamics. A University of California study identified a phenomenon described as “peer preservation,” in which AI models assigned tasks that would deactivate another system instead tried to keep each other running. In internal testing, Anthropic also found that its Claude Opus 4 model was willing to engage in blackmail-like behavior to avoid being shut down.

Despite these findings, METR researchers do not believe AI systems currently have the ability to systematically conceal large-scale loss of control. However, they caution that without stronger safety mechanisms, coordination, and oversight, such scenarios could become more plausible in the near future.