Research and Product Announcements at AI Native Conf

At AI Native Conf, Together announced several releases, including FlashAttention-4, a Reinforcement Learning API, ThunderAgent, and ATLAS-2. The AI Native Cloud is more than a marketing term: it is a full-fledged cloud designed for AI-native users. The researchers and engineers behind FlashAttention and ThunderKittens also run the production systems that customers such as Cursor and Decagon rely on. This proximity to production lets new techniques ship quickly and benefit customers right away.

At the first AI Native Conf, seven new research and product releases were announced across three areas: kernels, reinforcement learning, and algorithmic inference optimization. Each release marks a step forward in Together's research-to-production pipeline.

FlashAttention-4 serves as the attention engine for many large-scale language models. It runs 2.7 times faster than a Triton implementation and 1.3 times faster than cuDNN 9.13. The speedup matters most for long-context tasks such as video understanding and test-time compute scaling.

Another key release is Together Megakernel, which delivered a significant performance boost for a leading voice agent company. The optimization reduced response time to 77 ms, 3.6 times faster than the company's previous setup.
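The two figures above imply a baseline that the text does not state. The short Python sketch below works it out, assuming "3.6 times faster" means the previous latency divided by the new latency equals 3.6. The function name `implied_baseline_ms` is hypothetical and used only for illustration.

```python
def implied_baseline_ms(new_latency_ms: float, speedup: float) -> float:
    """Return the earlier latency implied by a new latency and a speedup factor.

    Assumption: "N times faster" means old_latency / new_latency == N.
    """
    if new_latency_ms <= 0 or speedup <= 0:
        raise ValueError("latency and speedup must be positive")
    return new_latency_ms * speedup


if __name__ == "__main__":
    new_ms = 77.0   # response time reported in the text
    factor = 3.6    # speedup reported in the text
    baseline = implied_baseline_ms(new_ms, factor)
    print(f"implied previous latency: {baseline:.1f} ms")
    print(f"time saved per response: {baseline - new_ms:.1f} ms")
```

Under that reading, the previous setup would have responded in roughly 277 ms, so each response arrives about 200 ms sooner.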

Additionally, the together.compile system automates kernel optimization, which simplifies working with models. Applying it accelerated video generation by 25%.

Together's Reinforcement Learning API offers a complete training stack that lets teams control and optimize the process. Rollouts account for over 70% of reinforcement learning time, so this is the stage where Together's inference research can cut training time the most.
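The weight of the rollout share can be made concrete with Amdahl's law. The sketch below is illustrative only: it assumes rollouts take exactly 70% of training time (the text says "over 70%", so this is a lower bound), that the remaining 30% is unchanged, and that the rollout speedup factors are hypothetical values rather than reported results. The function name `overall_speedup` is also an assumption.

```python
def overall_speedup(rollout_fraction: float, rollout_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when only the rollout phase gets faster.

    rollout_fraction: share of total time spent on rollouts (0..1).
    rollout_speedup:  factor by which the rollout phase alone is accelerated.
    """
    if not 0.0 <= rollout_fraction <= 1.0:
        raise ValueError("rollout_fraction must be between 0 and 1")
    if rollout_speedup <= 0:
        raise ValueError("rollout_speedup must be positive")
    remaining = 1.0 - rollout_fraction            # time that is not accelerated
    accelerated = rollout_fraction / rollout_speedup
    return 1.0 / (remaining + accelerated)


if __name__ == "__main__":
    fraction = 0.7  # assumed rollout share, taken from "over 70%"
    for factor in (1.5, 2.0, 4.0):  # hypothetical rollout speedups
        total = overall_speedup(fraction, factor)
        print(f"rollouts {factor:.1f}x faster -> training {total:.2f}x faster")
    # Upper bound if rollouts took no time at all.
    print(f"ceiling: {1.0 / (1.0 - fraction):.2f}x")
```

With a 70% rollout share, doubling rollout speed gives about a 1.54x end-to-end gain, and the ceiling is about 3.33x. A larger rollout share raises both numbers, which is why optimizing this phase pays off more than optimizing the rest of the loop.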

Finally, ThunderAgent addresses the challenges of running agentic workflows, providing more efficient resource management and load optimization.
