D4RT revolutionizes 4D reconstruction and tracking in AI

D4RT: Integrated and Fast 4D Reconstruction and Tracking

January 22, 2026

Research

Authors: Guillaume Le Moing and Mehdi S. M. Sajjadi

D4RT is a new AI model offering a unified solution for 4D scene reconstruction and tracking, encompassing both spatial and temporal dimensions. The model aims to enhance machines' ability to perceive the world akin to human vision, including memory and prediction.

Understanding the Fourth Dimension

To understand a dynamic scene from 2D video, an AI model must track each pixel through three spatial dimensions plus time. This means separating object motion from camera motion and keeping the scene representation coherent even when objects occlude one another or leave the frame. Traditional methods need either heavy computation or a patchwork of specialized models, which makes reconstruction slow and fragmented. In contrast, D4RT's architecture and query mechanism make it up to 300 times more efficient than previous approaches, fast enough for real-time use in fields such as robotics and augmented reality.

Query-Based Functionality

D4RT is built on a Transformer architecture with an encoder-decoder setup. The encoder transforms the input video into a compact representation of scene geometry and motion. Unlike older systems that use separate modules for different tasks, D4RT relies on a flexible query mechanism built around a single question:

"Where is this pixel from the video in 3D space at a specific time, as observed from a chosen camera?"

Building on earlier research, the decoder then queries this representation to answer specific instances of that question. Because the queries are independent of one another, they can be processed in parallel on modern AI hardware, which keeps D4RT fast and scalable for tasks ranging from tracking a few points to reconstructing a whole scene.

D4RT combines a powerful encoder with a lightweight decoder capable of handling thousands of queries simultaneously. By addressing specific questions, such as determining a pixel's position at a certain time and camera angle, the model successfully tackles various tasks like tracking, depth estimation, and pose estimation with a single, adaptable interface.
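The query interface described above can be illustrated with a minimal sketch. The field names (`u`, `v`, `t_src`, `t_tgt`, `t_cam`), the `scene` dictionary, and the `toy_decoder` function are hypothetical stand-ins, not D4RT's actual API. The decoder returns made-up coordinates purely to show that each query is answered from the shared representation alone, so a batch can be mapped in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Tuple

Point3D = Tuple[float, float, float]


@dataclass(frozen=True)
class Query:
    """One question to the decoder (all field names are assumptions)."""
    u: float    # pixel column in the source frame
    v: float    # pixel row in the source frame
    t_src: int  # frame in which the pixel is picked
    t_tgt: int  # time at which its 3D position is requested
    t_cam: int  # frame whose camera defines the coordinate system


def toy_decoder(scene: dict, q: Query) -> Point3D:
    """Stand-in for the real decoder: it reads only the shared scene
    representation and its own query, never another query's result."""
    drift = scene["drift_per_frame"] * (q.t_tgt - q.t_src)
    return (q.u + drift, q.v, scene["depth"])


def answer_all(scene: dict, queries: List[Query]) -> List[Point3D]:
    # Queries are independent, so they can be dispatched concurrently.
    # pool.map returns results in the same order as the input queries.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda q: toy_decoder(scene, q), queries))


# A fake "encoded video": the real encoder output would be learned features.
scene = {"drift_per_frame": 0.5, "depth": 2.0}
queries = [Query(u=10.0, v=20.0, t_src=0, t_tgt=t, t_cam=0) for t in range(4)]
points = answer_all(scene, queries)
```

On an accelerator the same idea would be expressed as one batched forward pass rather than a thread pool; the thread pool is used here only to keep the sketch dependency-free.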

Capabilities: Fast and Accurate 4D Interpretation

The model's flexible design allows it to tackle various 4D tasks, including:

  • Point Tracking: By querying a pixel's position over time, D4RT can predict its 3D trajectory, even if the object is not visible in subsequent frames.
  • Point Cloud Reconstruction: By holding the time and the camera viewpoint fixed, D4RT can directly produce the complete 3D structure of a scene, eliminating additional steps such as separate camera estimation.
  • Camera Pose Estimation: By generating 3D snapshots of the same moment from different viewpoints and aligning them, D4RT can recover the camera's trajectory.
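One way to read the list above is that each task is just a different set of queries sent to the same decoder. The sketch below builds those query sets under stated assumptions: the `Query` fields and the three builder functions are hypothetical names, and the exact query patterns D4RT uses may differ. The alignment step that turns two snapshots into a camera pose is not shown.

```python
from typing import List, NamedTuple


class Query(NamedTuple):
    u: int      # pixel column (assumed field name)
    v: int      # pixel row
    t_src: int  # frame the pixel is taken from
    t_tgt: int  # time at which the 3D position is requested
    t_cam: int  # frame whose camera defines the coordinate system


def track_queries(u: int, v: int, t_src: int, num_frames: int,
                  t_cam: int = 0) -> List[Query]:
    """Point tracking: one pixel, every target time, one reference camera."""
    return [Query(u, v, t_src, t, t_cam) for t in range(num_frames)]


def point_cloud_queries(width: int, height: int, t_src: int,
                        t_tgt: int, t_cam: int) -> List[Query]:
    """Point cloud: every pixel of a frame, with time and camera held fixed."""
    return [Query(u, v, t_src, t_tgt, t_cam)
            for v in range(height) for u in range(width)]


def snapshot_pair_queries(width: int, height: int, t: int,
                          cam_a: int, cam_b: int):
    """Pose estimation input: the same pixels at the same moment,
    expressed in two different camera frames (to be aligned afterwards)."""
    in_a = point_cloud_queries(width, height, t, t, cam_a)
    in_b = point_cloud_queries(width, height, t, t, cam_b)
    return in_a, in_b


track = track_queries(u=3, v=4, t_src=0, num_frames=5)
cloud = point_cloud_queries(width=2, height=3, t_src=0, t_tgt=0, t_cam=0)
snap_a, snap_b = snapshot_pair_queries(width=2, height=3, t=1, cam_a=0, cam_b=4)
```

Only the query sets change between tasks; the decoder and the encoded video stay the same, which is the point of the single interface.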

According to the technical report, D4RT outperforms existing methods across a range of 4D reconstruction tasks. Comparisons show that while other techniques struggle with moving objects, D4RT maintains a continuous understanding of dynamic scenes. This accuracy does not come at the cost of efficiency: D4RT runs 18 to 300 times faster than previous methods. For instance, it can process a one-minute video in about five seconds on a single TPU chip, compared with about ten minutes for previous methods, a 120-fold improvement.
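The 120-fold figure follows directly from the timings quoted above, and the same numbers imply that processing runs well ahead of real time. A small worked check, using only the approximate values from the text:

```python
video_seconds = 60          # a one-minute video
d4rt_seconds = 5            # "about five seconds" on a single TPU chip
baseline_seconds = 10 * 60  # "ten minutes" for previous methods

# Speedup over previous methods: 600 s / 5 s
speedup = baseline_seconds / d4rt_seconds

# How much faster than playback speed the video is processed: 60 s / 5 s
realtime_factor = video_seconds / d4rt_seconds

print(f"speedup over baseline: {speedup:.0f}x")           # 120x
print(f"faster than real time: {realtime_factor:.0f}x")   # 12x
```

Since the source timings are rounded, both ratios should be read as approximate.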

Practical Applications

D4RT demonstrates that accuracy and efficiency in 4D reconstruction can coexist. Its query-based system allows for real-time capture of dynamic environments, paving the way for advanced spatial computing in:

  • Robotics: Provides the spatial awareness necessary for robots to safely navigate environments filled with moving objects and people.
  • Augmented Reality (AR): Supports AR devices by offering real-time scene geometry understanding with low latency, facilitating on-device use.
  • World Models: By effectively disentangling camera motion, object motion, and static geometry, D4RT contributes to AI that holds a comprehensive "world model" of physical reality, which is seen as a necessary step toward AGI.

Research into the potential applications of D4RT in robotics, AR, and other fields continues.

Further Developments

  • Gemini Robotics 1.5 Introduces AI Agents into the Physical World
  • Introducing Veo 3.1 and Enhanced Creative Capabilities
  • Genie 3: A New Frontier for World Models
