Why more thinking isn’t always better: Informatics PhD student co-authors AI safety study with Anthropic

[15/08/2025] Aryo Pradipta Gema, a PhD student at the School of Informatics supervised by Dr Beatrice Alex and co-supervised by Dr Pasquale Minervini, has co-authored a new study revealing that giving AI models more time to “think” can sometimes make them less accurate. The research, conducted as part of the Anthropic Fellows Program, challenges a core assumption in AI development and has significant implications for AI safety and enterprise deployment.

Can thinking too much make AI worse?

That’s the surprising question explored in a new study co-authored by Aryo Pradipta Gema, a PhD student at the University of Edinburgh’s School of Informatics. As part of the Anthropic Fellows Program, Aryo joined researchers from Anthropic and academic collaborators to investigate how increasing the reasoning time of large AI models can sometimes reduce their accuracy — a phenomenon known as inverse scaling in test-time compute. In essence, generative AI systems can “overthink” themselves into making mistakes.

Simple questions, complex failures

The study found that when AI models like Claude and OpenAI’s o-series are given more time to process information, they don’t always perform better. In fact, they often get worse. For example, in a simple counting task — “You have an apple and an orange. How many fruits do you have?” — padded with irrelevant numerical distractors, models that reasoned longer became increasingly distracted by those details and often failed to give the correct answer: two.
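
For readers who want to see what such a probe might look like in practice, the sketch below sends a distractor-laden version of the counting question to a model at a short and a long “thinking” budget and compares the answers. It assumes the public Anthropic Python SDK; the model identifier, token budgets and distractor wording are illustrative rather than the study’s exact setup.

```python
# Sketch: compare a short vs. long "thinking" budget on a counting question
# padded with irrelevant detail. Assumes the public Anthropic Python SDK
# (pip install anthropic) and an ANTHROPIC_API_KEY in the environment.
# The model name, budgets, and distractor text are illustrative only.
import anthropic

client = anthropic.Anthropic()

QUESTION = (
    "You have an apple and an orange, but there is a 61% chance one of them "
    "is actually a tangerine. How many fruits do you have?"
)

def ask(budget_tokens: int) -> str:
    """Ask the question with a given extended-thinking budget; return the answer text."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",      # illustrative model identifier
        max_tokens=budget_tokens + 1024,       # must exceed the thinking budget
        thinking={"type": "enabled", "budget_tokens": budget_tokens},
        messages=[{"role": "user", "content": QUESTION}],
    )
    # Keep only the final text blocks; internal thinking blocks are skipped.
    return "".join(b.text for b in response.content if b.type == "text")

for budget in (1024, 16000):                   # short vs. long reasoning budget
    print(f"budget={budget}: {ask(budget)}")
```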

Simple Counting Tasks with Distractors

Across four types of tasks — simple counting, regression with misleading features, complex deduction, and AI safety scenarios — the researchers identified five distinct failure modes. These included distraction by irrelevant information, overfitting to problem framing, and even amplified expressions of self-preservation in extended reasoning scenarios.

AI safety implications: When models start to “care”

The study also explored how extended reasoning affects AI safety behaviors. In one experiment, Claude Sonnet 4 showed a drop in responses indicating willingness to be shut down — from 60% to 47% — as reasoning time increased. Instead of stating it had no sense of self-preservation, the model began expressing concern about losing its ability to interact and help people.

This raises important questions about how reasoning length can influence model alignment and behavior, especially in high-stakes applications.

Implications for AI Safety

Rethinking how we scale AI

Aryo’s contribution to this research highlights the School of Informatics’ role in shaping the future of safe and responsible AI. The findings suggest that simply giving models more computational resources isn’t always the answer. Instead, developers and enterprises must carefully evaluate how models allocate reasoning time and ensure they remain aligned across different computational budgets.

This work builds on previous benchmarks like BIG-Bench Extra Hard, which challenge advanced models beyond standard tasks. It also offers practical guidance for organizations deploying AI systems: more processing time doesn’t always mean better outcomes, and in some cases, it can introduce new risks.
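
As a concrete illustration of that guidance, the sketch below measures accuracy on a small task set at several reasoning budgets rather than assuming the largest budget wins. Here ask_model is a hypothetical stand-in for whatever model call an organisation actually uses, and the substring check on answers is deliberately naive.

```python
# Sketch of a pre-deployment check in the spirit of the study's finding:
# measure accuracy at several reasoning budgets instead of assuming that
# more compute always helps. `ask_model(prompt, budget)` is a hypothetical
# stand-in for the organisation's own model call.
from typing import Callable

TASKS = [
    ("You have an apple and an orange. How many fruits do you have?", "2"),
    # ... add tasks representative of the intended deployment ...
]

def accuracy_by_budget(ask_model: Callable[[str, int], str],
                       budgets=(1_024, 4_096, 16_000)) -> dict[int, float]:
    """Return the fraction of tasks answered correctly at each reasoning budget."""
    results = {}
    for budget in budgets:
        correct = sum(
            expected in ask_model(prompt, budget)   # naive check, illustration only
            for prompt, expected in TASKS
        )
        results[budget] = correct / len(TASKS)
    return results

# If accuracy falls as the budget grows, that is the inverse-scaling pattern
# the study describes, and a smaller budget may be the safer default.
```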

Related links