Introducing Gemma Scope 2 for Analyzing Language Model Behavior

Google DeepMind has announced Gemma Scope 2, a new toolkit for interpreting language models. These tools will help researchers gain a deeper understanding of the internal decision-making processes of language models, which remain opaque despite their impressive capabilities.

Gemma Scope 2 supports all Gemma 3 models, ranging from 270 million to 27 billion parameters, and allows researchers to track potential risks within the models' 'brains'. This is the largest release of interpretability tools from the AI lab to date: producing it involved storing around 110 petabytes of data and training more than 1 trillion parameters in total.

With Gemma Scope 2, researchers will be able to debug unexpected model behaviors and audit AI agents, accelerating the development of safety measures against issues such as jailbreaks, hallucinations, and bias.

The new toolkit includes sparse autoencoders and transcoders, which let researchers look inside the models and see how their 'thoughts' are formed and how those thoughts relate to the model's behavior. This is crucial for studying questions such as discrepancies between the reasoning a model states and its internal state.
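
To make the idea of autoencoders and transcoders more concrete, here is a minimal sketch in plain Python. It is not the Gemma Scope 2 API and does not load any released weights: the class name ToySparseCoder, the dimensions, the threshold, and the random untrained weights are all assumptions chosen for illustration. The sketch shows the general shape of the technique: a model activation is expanded into a much wider vector of latents, most of which are zeroed out, and the few that remain active are what a researcher would try to interpret. In common usage, an autoencoder is trained to reconstruct the same activation it reads, while a transcoder is trained to predict a different activation (for example, the output of a layer from its input); the code path is the same and only the training target differs.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class ToySparseCoder:
    """Hypothetical toy sparse autoencoder / transcoder.

    Weights are random and untrained; this only illustrates the data flow,
    not the behavior of any released Gemma Scope 2 component.
    """

    def __init__(self, d_in, d_latent, d_out, threshold=0.0, seed=0):
        rng = random.Random(seed)
        # Encoder maps the activation into a wider latent space.
        self.w_enc = [[rng.gauss(0, 1) for _ in range(d_in)]
                      for _ in range(d_latent)]
        # Decoder maps the sparse latents back to an activation-sized vector.
        self.w_dec = [[rng.gauss(0, 1) for _ in range(d_latent)]
                      for _ in range(d_out)]
        self.threshold = threshold

    def encode(self, activation):
        pre = matvec(self.w_enc, activation)
        # Keep only latents above the threshold; the rest become exactly zero.
        return [p if p > self.threshold else 0.0 for p in pre]

    def decode(self, latents):
        return matvec(self.w_dec, latents)


# A made-up 4-dimensional "activation" standing in for a model's hidden state.
activation = [0.5, -1.0, 0.25, 2.0]

coder = ToySparseCoder(d_in=4, d_latent=16, d_out=4)
latents = coder.encode(activation)
reconstruction = coder.decode(latents)

# Indices of the latents that fired for this input.
active = [i for i, value in enumerate(latents) if value > 0.0]
print("active latent indices:", active)
print("reconstruction:", reconstruction)
```

In a trained version, the encoder and decoder weights would be fitted so that the reconstruction stays close to the target activation while only a small number of latents fire for any given input, which is what makes individual latents candidates for human-readable concepts.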

Gemma Scope 2 also offers enhanced tools for analyzing complex internal processes, including new training techniques that help uncover more useful concepts and address shortcomings of the previous version. The chatbot behavior analysis tools will assist in exploring complex multi-step actions, such as failure mechanisms and the fidelity of reasoning chains.
