Large genome model: Open-source AI trained on trillions of bases
OpenAI and the Broad Institute jointly released a genome-scale AI model, GenAI-Genome, on March 12, 2026. The open‑source system was trained on 1.2 trillion DNA bases, enabling it to predict genes, regulatory elements, and splice sites with unprecedented accuracy.
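Genomic language models typically convert raw DNA into tokens before a transformer can process it; a common scheme is overlapping k-mers. The article does not describe GenAI-Genome's actual tokenizer, so the sketch below is purely illustrative (the function name, `k`, and `stride` are assumptions, not part of the released model):

```python
def kmer_tokenize(seq: str, k: int = 6, stride: int = 1) -> list[str]:
    """Split a DNA sequence into overlapping k-mer tokens.

    Illustrative only: GenAI-Genome's real tokenization scheme
    is not specified in the release coverage.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]

print(kmer_tokenize("ACGTACGTAC", k=6))
# ['ACGTAC', 'CGTACG', 'GTACGT', 'TACGTA', 'ACGTAC']
```

With `stride=1` a genome of n bases yields roughly n tokens, which is how a 1.2-trillion-base corpus translates into a trillion-token-scale training set.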
The release follows years of incremental progress in transformer models for biology, including DeepMind's AlphaFold and Enformer. Funding from the National Institutes of Health and private investors pushed the scale to trillions of bases, a leap from the 100‑billion‑base models that dominated the field.
GenAI-Genome’s scale allows it to capture long‑range chromatin interactions that previous models missed, potentially accelerating drug target discovery. However, the sheer data volume raises privacy concerns for human genomic datasets, and because the model is open source, it may spur rapid commercial spin‑offs while also inheriting uneven data quality from its heterogeneous training corpus.
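A rough calculation shows why long‑range interactions are hard for standard transformers: naive self‑attention materializes a score matrix that grows quadratically with context length. The figures below are back‑of‑envelope assumptions (2 bytes per score, a hypothetical 200 kb window), not GenAI‑Genome's published architecture:

```python
def attention_memory_gb(context_len: int, bytes_per_score: int = 2) -> float:
    """Memory for one full attention score matrix (context_len x context_len).

    Back-of-envelope sketch: enhancer-promoter interactions can span
    hundreds of kilobases, so a base-level context window at that scale
    makes naive quadratic attention impractical per head, per layer.
    """
    return context_len ** 2 * bytes_per_score / 1e9

print(attention_memory_gb(200_000))  # 80.0 (GB, for a single score matrix)
```

This is why long‑context genomic models generally rely on sparse, dilated, or otherwise sub‑quadratic attention rather than the naive formulation sketched here.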
Pharmaceutical companies and biotech startups will be the primary beneficiaries, using the model to prioritize therapeutic targets. Regulators will need to monitor data provenance, and academic labs may shift focus to fine‑tuning the model for specific diseases.
- Trillions of bases enable long‑range genomic predictions.
- Open source accelerates biotech innovation but raises privacy concerns.
- Pharma will adopt GenAI-Genome for target prioritization.