Discussing GPT-5.4 and Self-Improving AI
This week saw two significant events in the world of artificial intelligence that initially appear unrelated but tell the same story. On Wednesday, OpenAI released GPT-5.4, its new work-oriented model, while on Sunday, Andrej Karpathy published results from his autoresearch experiment, demonstrating that AI agents can autonomously find real improvements in neural network training.
New GPT-5.4 Model
Released on March 5, GPT-5.4 brings a range of new features, including tool usage, search capabilities, and an expanded context window of 1 million tokens. Per-token pricing has gone up, but the model's improved token efficiency (it needs fewer tokens to complete the same work) largely offsets the increase.
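The trade-off between a higher per-token price and better token efficiency comes down to simple arithmetic: the cost of a task is the price per token multiplied by the number of tokens the task consumes. The sketch below uses made-up numbers purely for illustration. They are not GPT-5.4's actual prices or token counts, and the helper name `task_cost` is hypothetical.

```python
# Hypothetical figures for illustration only; NOT GPT-5.4's real prices or token usage.

def task_cost(price_per_million_tokens: float, tokens_per_task: int) -> float:
    """Dollar cost of one task, given a price quoted per million tokens."""
    return price_per_million_tokens * tokens_per_task / 1_000_000

# Older model: cheaper per token, but uses more tokens for the same task.
old_cost = task_cost(price_per_million_tokens=10.0, tokens_per_task=50_000)

# Newer model: 20% higher price per token, but 20% fewer tokens per task.
new_cost = task_cost(price_per_million_tokens=12.0, tokens_per_task=40_000)

print(f"old: ${old_cost:.2f} per task, new: ${new_cost:.2f} per task")
```

Under these assumed numbers, a 20% price increase combined with a 20% drop in tokens per task leaves the per-task cost slightly lower, which is the sense in which efficiency can "offset" a price rise.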
Performance Comparison
On broad benchmarks, GPT-5.4 performs strongly but is not a clear leader: on the Intelligence Index it ties with Gemini 3.1 Pro Preview, and on LiveBench it holds only a narrow lead. Its results on task-specific evaluations are more striking:
- On GDPval, GPT-5.4 achieved 83.0% compared to 70.9% for GPT-5.2.
- On spreadsheet modeling tasks, it scored 87.3% versus 68.4% for GPT-5.2.
- On OSWorld-Verified for desktop navigation, it reached 75.0%, surpassing the human baseline.
Andrej Karpathy's Experiment
Another important highlight this week is Andrej Karpathy's autoresearch experiment. He reported that his LLM agent found about 20 changes that significantly improved the training process, reducing the training time by 11%.
If an agent can effectively explore tuning parameters and architectural details, it could become a valuable tool in the research process, even if this falls short of inventing an entirely new paradigm.
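One simple way to picture this kind of automated exploration is a keep-if-better loop: propose a change to the training configuration, measure the result, and keep the change only if it helps. The sketch below is a generic greedy hill-climbing search under heavy assumptions. It is not Karpathy's autoresearch code. Random perturbations stand in for the LLM agent's proposals, and `evaluate` is a hypothetical toy function standing in for a real training run. The names `evaluate`, `propose`, `search`, and the `lr`/`width` settings are all illustrative.

```python
import random

def evaluate(config: dict) -> float:
    """Hypothetical stand-in for a training run; lower is better
    (think 'minutes to reach a target loss'). Optimum at lr=0.003, width=512."""
    return 100.0 + ((config["lr"] - 0.003) / 0.001) ** 2 + ((config["width"] - 512) / 64) ** 2

def propose(config: dict, rng: random.Random) -> dict:
    """Randomly perturb one setting; a real agent would propose edits with an LLM."""
    candidate = dict(config)
    if rng.random() < 0.5:
        candidate["lr"] = config["lr"] * rng.choice([0.5, 0.8, 1.25, 2.0])
    else:
        candidate["width"] = max(64, config["width"] + rng.choice([-64, 64]))
    return candidate

def search(baseline: dict, steps: int, seed: int = 0):
    """Greedy loop: accept a proposed change only if it strictly improves the score."""
    rng = random.Random(seed)
    best, best_score = dict(baseline), evaluate(baseline)
    accepted = []  # history of accepted changes and their scores
    for _ in range(steps):
        candidate = propose(best, rng)
        score = evaluate(candidate)
        if score < best_score:
            accepted.append((candidate, score))
            best, best_score = candidate, score
    return best, best_score, accepted

baseline = {"lr": 0.006, "width": 256}
best, best_score, accepted = search(baseline, steps=200)
base_score = evaluate(baseline)
print(f"accepted {len(accepted)} changes; "
      f"score {base_score:.1f} -> {best_score:.1f} "
      f"({1 - best_score / base_score:.1%} lower)")
```

The greedy acceptance rule keeps the search conservative: every accepted change must beat the current best, so the final configuration is never worse than the baseline. The cost is that each proposal requires a full evaluation, which for real training runs is the expensive part.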