Arc Institute and NVIDIA, working with researchers from Stanford University, UC Berkeley and UC San Francisco, recently launched Evo2, the world's largest biological artificial intelligence model. The model was trained on 9.3 trillion nucleotides drawn from more than 128,000 genomes, a scale comparable to that of the most powerful generative AI language models.
Evo2's deep learning capabilities let it quickly identify patterns in the gene sequences of different organisms, work that would otherwise take researchers years. The model can accurately identify mutations that cause human disease and can design new genomes comparable in length to a simple bacterial genome. Evo2's development team said it would release details of the model on February 19, 2025, and launch a user-friendly interface called Evo Designer. Evo2's code has been published on Arc's GitHub and integrated into NVIDIA's BioNeMo framework to support scientific research.
Compared with its predecessor Evo1, Evo2 expands the scope of its training data to cover bacteria, archaea, viruses, and eukaryotes such as humans and plants. The researchers say the development of Evo2 marks an important moment for generative biology, because it enables machines to “read, write, and think” in the language of nucleotides.
At the technical level, Evo2 was trained on the NVIDIA DGX Cloud AI platform using more than 2,000 NVIDIA H100 GPUs. The model can process up to 1 million nucleotides at a time, which lets it capture relationships between distant parts of a genome. Its new AI architecture, "StripedHyena2", enables Evo2 to process 30 times more data than Evo1.
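To give a sense of what a 1-million-nucleotide context means in practice, the sketch below splits a long sequence into windows that each fit within that limit. It is only an illustration: the function name `context_windows`, the `overlap` parameter, and the idea of windowing at all are assumptions made for this example, not part of Evo2's published code or API.

```python
from typing import Iterator, Tuple

# Context length quoted in the article; everything else here is hypothetical.
MAX_CONTEXT = 1_000_000


def context_windows(
    sequence: str, size: int = MAX_CONTEXT, overlap: int = 0
) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) pairs covering `sequence` in windows of at most `size`.

    Consecutive windows share `overlap` nucleotides, so a feature that
    straddles a boundary still appears whole in at least one window
    (as long as it is no longer than `overlap`).
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    start = 0
    while start < len(sequence):
        yield start, sequence[start:start + size]
        if start + size >= len(sequence):
            break  # this window already reaches the end of the sequence
        start += step


# Toy usage with a tiny window so the result is easy to inspect.
toy = "ACGTACGTAC"
windows = list(context_windows(toy, size=4, overlap=1))
```

With the default `size`, any sequence of up to 1 million nucleotides comes back as a single window, which is the case the article describes: the whole region is seen at once rather than in pieces.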
Evo2 has broad application prospects. For example, it excels at analyzing genetic changes related to protein function and organism adaptability. In tests on variants of the BRCA1 gene, Evo2 predicted the effects of mutations with more than 90% accuracy. Findings like these could save considerable laboratory time and money and advance the development of new drugs.
In addition, Evo2 can help design new biological tools and treatments. For example, scientists could design gene therapies that target only specific cells, avoiding side effects. The research team believes that more specialized AI models can be built on top of Evo2 in the future, opening up further possibilities for genomic research and bioengineering.
To address ethical and security risks, the researchers excluded pathogens that are harmful to humans and other complex organisms from Evo2's dataset, with the aim of developing and deploying the technology responsibly.
Evo2 details: https://arcinstitute.org/news/blog/evo2