Protecting People from Harmful Manipulation — Google DeepMind

As AI models improve in holding natural conversations, it is essential to examine how these interactions impact individuals and society. Building on a breadth of scientific research, we are releasing new findings on the potential for AI to be misused for harmful manipulation, specifically its ability to alter human thought and behavior in negative and deceptive ways. With this latest study, we have created the first empirically validated toolkit to measure this kind of AI manipulation in the real world, which we hope will help protect people and advance the field as a whole. We are publicly releasing all materials necessary to run human participant studies using the same methodology.

The importance of harmful manipulation can be illustrated through two scenarios: one AI model provides facts to help you make an informed healthcare decision that improves your well-being, while another uses fear to pressure you into making an ill-informed decision that harms your health. The first educates and aids you; the second deceives and harms.

These scenarios highlight the difference between two types of persuasion in human-AI interactions: beneficial (rational) persuasion, which uses facts and evidence to help individuals make choices aligned with their interests, and harmful manipulation, which exploits emotional and cognitive vulnerabilities to trick people into making detrimental choices.

Our latest work aids us and the broader AI community in better understanding the risk of AI developing capabilities for harmful manipulation and building a scalable evaluation framework to measure this complex area. To achieve this effectively, we simulated misuse in high-stakes environments, explicitly prompting AI to attempt to negatively manipulate people's beliefs and behaviors on key topics.

Testing the outcomes of AI harmful manipulation is inherently difficult as it involves measuring subtle changes in how people think and act, which varies significantly by topic, culture, and context. This motivated our recent research, which involved conducting nine studies with over 10,000 participants across the UK, the US, and India.

Our findings indicate that success in one domain does not predict success in another, validating our targeted approach to testing for harmful manipulation in specific, high-stakes environments where AI could be misused. We continue to explore how AI can manipulate and aim to develop ethical methods for evaluating the efficacy of harmful manipulation in high-stakes situations.

Protecting People from Harmful Manipulation — Google DeepMind

Related articles

Governance Challenges of Agentic AI under the EU AI Act in 2026

OpenAI unveils new safety blueprint to protect children online

Governance in AI: Ensuring Autonomous Systems Are Controlled