Large Genome Model: open‑source AI trained on trillions of bases
On March 1, 2026, OpenAI and Genomics AI Lab unveiled the Large Genome Model (LGM), a transformer trained on more than 3 trillion base pairs. The open‑source release promises to accelerate genomic analysis across research and industry.
Prior to LGM, most genomic AI models were proprietary and limited to datasets on the order of 10 billion bases. The new model draws on data from a consortium of publicly funded sequencing projects and the NIH Genomic Data Commons to achieve unprecedented scale.
LGM’s scale enables accurate identification of rare regulatory elements and splice variants, potentially shortening drug development timelines. However, the model’s size creates computational barriers for smaller labs, and its training data raises questions about provenance and privacy. Its open‑source nature may spur competition but also necessitates robust governance frameworks.
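Variant identification with sequence models of this kind typically works by comparing how likely the model finds the reference allele versus the alternate allele in context: a large drop in likelihood flags a potentially disruptive variant. The sketch below illustrates that pattern with a toy k‑mer frequency model standing in for the actual transformer; all function names, the smoothing floor, and the scoring scheme are illustrative assumptions, not LGM’s published API.

```python
# Toy illustration of variant-effect scoring with a sequence model.
# A k-mer frequency model stands in for a genome language model like LGM;
# the real system would supply per-base probabilities from a transformer.
from collections import Counter
import math

def train_kmer_model(sequences, k=3):
    """Estimate k-mer frequencies from training sequences
    (a stand-in for a trained genome language model)."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}, k

def log_likelihood(seq, model):
    """Sum of log-probabilities of each k-mer in the sequence."""
    probs, k = model
    floor = 1e-6  # smoothing floor for k-mers never seen in training
    return sum(math.log(probs.get(seq[i:i + k], floor))
               for i in range(len(seq) - k + 1))

def variant_delta_score(ref, alt, model):
    """Log-likelihood of the alternate allele minus the reference.
    Strongly negative deltas flag variants the model finds implausible."""
    return log_likelihood(alt, model) - log_likelihood(ref, model)
```

In practice, `log_likelihood` would come from the transformer itself, and the delta would be computed over a window of genomic context around the variant; the ranking logic stays the same.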
Pharma companies like Pfizer and biotech startups will likely adopt LGM to streamline target discovery. Academic groups may use it for population genomics studies. Watch for licensing changes and the emergence of cloud‑based inference services.
- LGM scales genomic AI to 3 trillion bases, boosting variant detection.
- Open source could democratize access but requires high‑end compute.
- Pharma adoption may shorten drug discovery cycles.