Meet MaxToki: The AI That Predicts How Your Cells Age
Most foundation models in biology have a fundamental blind spot: they see cells as frozen snapshots. Give a model a single-cell transcriptome — a readout of which genes are active in a cell at a given moment — and it can tell you a lot about what that cell is doing right now. However, it can’t tell you where that cell is headed. This limitation matters enormously when studying aging. Age-related diseases like heart disease, Alzheimer’s dementia, and pulmonary fibrosis don’t happen overnight. They unfold across decades, driven by slow, progressive shifts in gene network states. To understand and eventually reverse these trajectories, you need a model that thinks in time — not just in snapshots. That’s exactly what MaxToki is designed to do.
The research team includes scientists from the Gladstone Institute of Cardiovascular Disease, the Gladstone Institute of Data Science and Biotechnology, and the Gladstone Institute of Neurological Disease, together with the University of California San Francisco, the University of California Berkeley, and NVIDIA, among others. MaxToki is a transformer decoder model, the same architectural family behind large language models, but trained on single-cell RNA sequencing data. The model comes in two sizes: 217 million and 1 billion parameters.
The key representational choice is the rank value encoding. Rather than feeding raw transcript counts into the model, each cell’s transcriptome is represented as a ranked list of genes, ordered by their relative expression. This nonparametric approach deprioritizes ubiquitously expressed housekeeping genes and amplifies genes like transcription factors that have high dynamic range across distinct cell states. Training happened in two stages. Stage 1 used Genecorpus-175M — approximately 175 million single-cell transcriptomes from publicly available data.
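The rank value encoding can be illustrated with a short sketch. The function below is a hypothetical implementation, not MaxToki's released code: it sorts a cell's genes by descending expression and keeps only the resulting ordered gene identifiers, discarding the raw counts. The optional `gene_medians` argument stands in for a corpus-wide normalization that downweights ubiquitously high housekeeping genes, an assumption about how the deprioritization described above could work.

```python
import numpy as np

def rank_value_encode(expression, gene_ids, gene_medians=None, max_len=2048):
    """Represent a cell as gene IDs ranked by relative expression.

    expression:   per-gene expression values for one cell
    gene_ids:     gene names aligned with `expression`
    gene_medians: optional corpus-wide medians; dividing by them
                  deprioritizes ubiquitously expressed genes
    max_len:      truncate to the model's context budget
    """
    expr = np.asarray(expression, dtype=float)
    if gene_medians is not None:
        expr = expr / np.asarray(gene_medians, dtype=float)
    order = np.argsort(-expr)            # highest relative expression first
    order = order[expr[order] > 0]       # drop unexpressed genes entirely
    return [gene_ids[i] for i in order[:max_len]]

# Toy example with three genes: the counts vanish, only the ranking survives.
expr = np.array([5.0, 0.0, 12.3])
genes = ["GAPDH", "SOX2", "GATA4"]
print(rank_value_encode(expr, genes))  # ['GATA4', 'GAPDH']
```

Because only the ordering is kept, the encoding is nonparametric: it is insensitive to batch-dependent scaling of the raw counts, which is one reason rank-based inputs transfer well across datasets.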
Stage 2 extended the context length from 4,096 to 16,384 tokens, so the model can process multiple cells in sequence and reason temporally across a trajectory. Stage 2 training used Genecorpus-Aging-22M: approximately 22 million single-cell transcriptomes spanning a range of age groups. Combined across both stages, MaxToki trained on nearly 1 trillion gene tokens.
The most architecturally novel contribution of MaxToki is its prompting strategy. A prompt consists of a context trajectory, meaning two or three cell states plus the timelapses between them, followed by a query: either a target cell state or an elapsed time. The model then performs one of two tasks: predict the timelapse needed to reach the query cell, or generate the transcriptome of the cell that would arise after the query duration. This design produced dramatically lower prediction errors compared to traditional trajectory-inference methods.
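To make the two query modes concrete, here is a token-level sketch of how such a trajectory prompt could be assembled. The special tokens (`<cell>`, `<dt:…>`, `<predict_dt>`, `<generate_cell>`) and the overall layout are assumptions for illustration; they are not MaxToki's actual vocabulary.

```python
def build_trajectory_prompt(context_cells, timelapses, query_cell=None, query_dt=None):
    """Assemble a trajectory prompt from rank-encoded cells.

    context_cells: list of 2-3 rank-encoded gene-token lists
    timelapses:    elapsed time between consecutive context cells
    Exactly one of the query arguments is supplied:
      query_cell -> ask the model to predict the elapsed time (task 1)
      query_dt   -> ask the model to generate the future cell  (task 2)
    """
    assert (query_cell is None) != (query_dt is None), "provide exactly one query"
    assert len(timelapses) == len(context_cells) - 1

    tokens = []
    for cell, dt in zip(context_cells, timelapses + [None]):
        tokens += ["<cell>"] + list(cell) + ["</cell>"]
        if dt is not None:                 # timelapse separates consecutive cells
            tokens.append(f"<dt:{dt}>")

    if query_cell is not None:             # task 1: how long to reach this state?
        tokens += ["<predict_dt>", "<cell>"] + list(query_cell) + ["</cell>"]
    else:                                  # task 2: what state arises after dt?
        tokens += [f"<dt:{query_dt}>", "<generate_cell>"]
    return tokens

# Task 2 example: two context cells 3 months apart; ask for the cell 12 months on.
prompt = build_trajectory_prompt(
    [["GATA4", "NKX2-5"], ["TNNT2", "MYH7"]],
    timelapses=["3m"],
    query_dt="12m",
)
```

The decoder then continues the sequence autoregressively: after `<generate_cell>` it emits ranked gene tokens for the predicted future transcriptome, and after `<predict_dt>` it emits a timelapse token.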