Google DeepMind Develops LLM to Automate Game Theory Algorithms
Google DeepMind's research introduces AlphaEvolve, an agent that uses LLMs to automate the design of algorithms for Multi-Agent Reinforcement Learning (MARL) in imperfect-information games. Traditionally, designing such algorithms relied heavily on intuition and time-consuming trial and error. AlphaEvolve replaces this manual process with automated search, freeing researchers to focus on higher-level research questions.
The team applies AlphaEvolve to two established paradigms: Counterfactual Regret Minimization (CFR) and Policy Space Response Oracles (PSRO). In both cases, the system discovers new algorithm variants that perform competitively against existing hand-designed algorithms. All experiments were conducted using the OpenSpiel framework.
CFR is an iterative algorithm that minimizes regret: at each iteration it accumulates counterfactual regret for each action and derives a new policy via regret matching, playing actions in proportion to their positive accumulated regret. AlphaEvolve enhances this process by evolving the update rule itself, allowing the algorithm to adapt to various game conditions.
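The regret-matching step described above can be sketched in a few lines. This is a minimal illustration of the standard rule, not DeepMind's implementation:

```python
def regret_matching(cumulative_regret):
    """Derive the current policy from accumulated counterfactual regret.

    Actions are played in proportion to their positive accumulated
    regret; if no action has positive regret, play uniformly at random.
    """
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    if total > 0:
        return [r / total for r in positive]
    n = len(cumulative_regret)
    return [1.0 / n] * n


# Actions with negative accumulated regret get probability zero:
# regret_matching([2.0, -1.0, 2.0]) -> [0.5, 0.0, 0.5]
```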
AlphaEvolve is a distributed evolutionary system that uses LLMs to mutate source code rather than numerical parameters. In the process, algorithms are selected based on their performance, and their code is modified by the LLM, leading to the creation of new candidates that are then tested in various games.
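The select-mutate-evaluate loop can be sketched as follows. The `llm_mutate` and `evaluate` callbacks are placeholders for illustration: in the real system the first would ask an LLM to rewrite a candidate algorithm's source code, and the second would score the candidate across a suite of games.

```python
import random

def evolve(population, generations, llm_mutate, evaluate, population_size=16):
    """Sketch of an AlphaEvolve-style loop: sample strong candidates,
    have an LLM mutate their code, and keep the best performers."""
    scored = [(evaluate(code), code) for code in population]
    for _ in range(generations):
        # Select a parent with bias toward higher-scoring candidates.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        parent = random.choice(scored[: max(1, len(scored) // 2)])[1]
        # The LLM proposes a modified version of the parent's code.
        child = llm_mutate(parent)
        scored.append((evaluate(child), child))
        # Truncate the population back to its fixed size.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        scored = scored[:population_size]
    return scored[0][1]  # best-performing candidate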
One of the discovered algorithms is VAD-CFR, which adapts its discounting to the volatility of the learning process. This volatility-adaptive discounting lets the algorithm forget unstable history more quickly and track the current dynamics more closely. VAD-CFR outperforms existing hand-designed algorithms in most of the tested games.
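The core idea of volatility-adaptive discounting can be illustrated with a hypothetical sketch; the actual VAD-CFR update rule is not reproduced here, and the `base` and `sensitivity` parameters below are invented for illustration:

```python
def volatility_adaptive_discount(recent_regrets, base=0.99, sensitivity=0.5):
    """Hypothetical sketch of the idea behind VAD-CFR: when recent regret
    updates fluctuate strongly, shrink the discount factor so unstable
    history is forgotten faster."""
    n = len(recent_regrets)
    mean = sum(recent_regrets) / n
    variance = sum((r - mean) ** 2 for r in recent_regrets) / n
    # Higher volatility -> smaller discount factor -> faster forgetting.
    return base / (1.0 + sensitivity * variance)
```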
Another discovered variant is AOD-CFR, which combines a linear schedule for discounting accumulated regrets with optimistic policy updates. These results highlight the potential of LLMs for developing complex algorithms and open new horizons for automation in artificial intelligence research.
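A linear discounting schedule (in the spirit of Linear CFR, which the description resembles) can be sketched as a single accumulation step; the exact AOD-CFR update is more involved and is not specified here:

```python
def linear_discount_update(cumulative_regret, instant_regret, t):
    """One accumulation step with a linear discounting schedule: at
    iteration t, previously accumulated regret is scaled by t / (t + 1),
    so after T iterations the contribution of iteration t is weighted
    in proportion to t, i.e. older iterations carry linearly less weight.
    """
    scale = t / (t + 1)
    return [scale * r + ri for r, ri in zip(cumulative_regret, instant_regret)]
```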