Large genome model: Open-source AI trained on trillions of bases
OpenAI and the Broad Institute jointly released a genome-scale AI model, GenAI-Genome, on March 12, 2026. The open‑source system was trained on 1.2 trillion DNA bases, enabling it to predict genes, regulatory elements, and splice sites with unprecedented accuracy.
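Genomic language models typically convert raw DNA into tokens before a transformer can process it; a common scheme is overlapping k-mers. The article does not describe GenAI-Genome's actual tokenizer, so the sketch below is purely illustrative (the function name, `k`, and `stride` are assumptions, not part of the released model):

```python
def kmer_tokenize(seq: str, k: int = 6, stride: int = 1) -> list[str]:
    """Split a DNA sequence into overlapping k-mer tokens.

    Illustrative only: GenAI-Genome's real tokenization scheme
    is not specified in the release coverage.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]

print(kmer_tokenize("ACGTACGTAC", k=6))
# ['ACGTAC', 'CGTACG', 'GTACGT', 'TACGTA', 'ACGTAC']
```

With `stride=1` a genome of n bases yields roughly n tokens, which is how a 1.2-trillion-base corpus translates into a trillion-token-scale training set.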
The release follows years of incremental progress in transformer models for biology, including DeepMind's AlphaFold and Enformer. Funding from the National Institutes of Health and private investors pushed the scale to trillions of bases, a leap from the 100‑billion‑base models that dominated the field.
GenAI-Genome’s scale allows it to capture long‑range chromatin interactions that previous models missed, potentially accelerating drug target discovery. However, the sheer data volume raises privacy concerns for human genomic datasets, and because the model is open source, it may spur rapid commercial spin‑offs while also inheriting uneven data quality from its heterogeneous training corpus.
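A rough calculation shows why long‑range interactions are hard for standard transformers: naive self‑attention materializes a score matrix that grows quadratically with context length. The figures below are back‑of‑envelope assumptions (2 bytes per score, a hypothetical 200 kb window), not GenAI‑Genome's published architecture:

```python
def attention_memory_gb(context_len: int, bytes_per_score: int = 2) -> float:
    """Memory for one full attention score matrix (context_len x context_len).

    Back-of-envelope sketch: enhancer-promoter interactions can span
    hundreds of kilobases, so a base-level context window at that scale
    makes naive quadratic attention impractical per head, per layer.
    """
    return context_len ** 2 * bytes_per_score / 1e9

print(attention_memory_gb(200_000))  # 80.0 (GB, for a single score matrix)
```

This is why long‑context genomic models generally rely on sparse, dilated, or otherwise sub‑quadratic attention rather than the naive formulation sketched here.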
Pharmaceutical companies and biotech startups will be the primary beneficiaries, using the model to prioritize therapeutic targets. Regulators will need to monitor data provenance, and academic labs may shift focus to fine‑tuning the model for specific diseases.
- Trillions of bases enable long‑range genomic predictions.
- Open source accelerates biotech innovation but raises privacy concerns.
- Pharma will adopt GenAI-Genome for target prioritization.