Discussing GPT-5.4 and Self-Improving AI

Two significant events in artificial intelligence this week appear unrelated at first but tell the same story. On Wednesday, OpenAI released GPT-5.4, its new work-oriented model, and on Sunday, Andrej Karpathy published results from his autoresearch experiment, showing that AI agents can autonomously find real improvements in neural network training.

New GPT-5.4 Model

Released on March 5, GPT-5.4 adds features such as tool use, search capabilities, and an expanded context window of 1 million tokens. Pricing has gone up, but improved token efficiency largely offsets the increase.
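To make the pricing point concrete, here is a minimal sketch of the arithmetic. The prices and token counts are hypothetical placeholders, not OpenAI's published figures. The only point is the relationship: the cost of a task is the per-token price times the tokens consumed, so a higher price can be cancelled out by using fewer tokens.

```python
def task_cost(price_per_million_tokens: float, tokens_used: int) -> float:
    """Cost in dollars of one task: per-token price times tokens consumed."""
    return price_per_million_tokens * tokens_used / 1_000_000


def break_even_token_ratio(old_price: float, new_price: float) -> float:
    """Fraction of the old token count at which the new model costs the same."""
    return old_price / new_price


# Hypothetical numbers for illustration only (not real pricing).
old_cost = task_cost(price_per_million_tokens=10.0, tokens_used=50_000)
new_cost = task_cost(price_per_million_tokens=12.5, tokens_used=40_000)
ratio = break_even_token_ratio(old_price=10.0, new_price=12.5)

print(f"old: ${old_cost:.2f}, new: ${new_cost:.2f}, break-even ratio: {ratio:.2f}")
```

Under these assumed numbers, a 25% price increase is fully offset if the new model needs only 80% of the tokens for the same task.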

Performance Comparison

On benchmarks, GPT-5.4 performs strongly but is not a clear leader. For instance, it ties with Gemini 3.1 Pro Preview on the Intelligence Index and leads only narrowly on LiveBench.

  • On GDPval, GPT-5.4 achieved 83.0% compared to 70.9% for GPT-5.2.
  • In spreadsheet modeling tasks, it scored 87.3% against 68.4% for GPT-5.2.
  • On OSWorld-Verified for desktop navigation, it reached 75.0%, surpassing the human baseline.
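The score pairs quoted above can be read as either an absolute gain in percentage points or a relative gain over the earlier score. The short sketch below computes both from the two pairs that include a baseline. The helper name `gains` is just an illustrative choice.

```python
def gains(new_score: float, old_score: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = new_score - old_score
    relative = absolute / old_score * 100
    return absolute, relative


# Scores quoted in the list above, in percent: (new, baseline).
scores = {
    "GDPval": (83.0, 70.9),
    "Spreadsheet modeling": (87.3, 68.4),
}

for name, (new, old) in scores.items():
    absolute, relative = gains(new, old)
    print(f"{name}: +{absolute:.1f} points ({relative:.1f}% relative)")
```

This works out to roughly 12.1 points (about 17% relative) on GDPval and 18.9 points (about 28% relative) on spreadsheet modeling.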

Andrej Karpathy's Experiment

The week's other highlight is Andrej Karpathy's autoresearch experiment. He reported that his LLM agent found about 20 changes that significantly improved the training process, cutting training time by 11%.

If an agent can effectively explore tuning parameters and architectural details, it could become a valuable research tool, even if the result does not amount to an entirely new paradigm.
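One way to picture this kind of exploration is as a greedy search: propose a single change, measure the result, and keep the change only if the metric improves. The sketch below is an illustrative assumption, not Karpathy's actual autoresearch code. The `evaluate` function is a toy stand-in for a real training run, and the parameter names `lr` and `warmup` are hypothetical.

```python
import random


def evaluate(config: dict) -> float:
    """Toy stand-in for a training run: returns a 'training time' to minimize.

    Hypothetical objective with its minimum at lr=0.03, warmup=200.
    """
    return (config["lr"] - 0.03) ** 2 * 1e4 + (config["warmup"] - 200) ** 2 / 1e4 + 1.0


def propose(config: dict, rng: random.Random) -> dict:
    """Perturb one randomly chosen parameter (a real agent would reason about the change)."""
    candidate = dict(config)
    key = rng.choice(sorted(candidate))
    candidate[key] *= rng.uniform(0.8, 1.25)
    return candidate


def greedy_search(start: dict, steps: int, seed: int = 0) -> tuple[dict, float, int]:
    """Keep a proposed change only if it lowers the measured metric."""
    rng = random.Random(seed)
    best, best_score = dict(start), evaluate(start)
    accepted = 0
    for _ in range(steps):
        candidate = propose(best, rng)
        score = evaluate(candidate)
        if score < best_score:  # accept only strict improvements
            best, best_score = candidate, score
            accepted += 1
    return best, best_score, accepted


start = {"lr": 0.1, "warmup": 1000.0}
best, best_score, accepted = greedy_search(start, steps=200)
print(f"baseline={evaluate(start):.2f} best={best_score:.2f} accepted={accepted}")
```

Because a change is accepted only when the metric improves, the final score can never be worse than the baseline. The accepted changes play the role of the roughly 20 improvements described above, though a real agent would choose its proposals by reasoning rather than at random.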
