Researchers at Columbia University have developed a groundbreaking artificial intelligence model named the General Expression Transformer (GET), designed to predict how the genes within a cell influence its behaviour. This advancement offers significant potential to enhance understanding of cancer and various genetic diseases, with implications for the creation of cell-specific gene therapies.

Using methods akin to those behind large language models such as ChatGPT, GET has been trained to decipher the complex rules that govern gene expression. Gene expression is the fundamental biological process by which genes are switched on, and it determines both the types of proteins a cell produces and the quantities of each. These proteins are critical to diverse bodily functions, from movement to cognitive processes.

The model is still at an early stage, but it could come to play a role comparable to that of AlphaFold2, an AI system that accurately predicts protein structures and whose developers were recently awarded a share of the Nobel Prize in Chemistry. Raul Rabadan, director of the Program for Mathematical Genomics at Columbia and a key author of the research, which was published in the journal Nature, said: “Biology is being transformed into something that is a predictive science. We’re seeing a revolution in biology.”

Mark Gerstein, a professor at Yale School of Medicine, noted that scientists have worked systematically over the last 15 to 20 years to predict gene regulation from extensive datasets. These datasets span a variety of human cell types and measure both gene expression and the interactions of transcription factors, proteins that are pivotal for gene regulation.

In a departure from prior research, GET was trained on data from normal human tissue rather than from abnormal cells associated with cancers. Xi Fu, a graduate student in Rabadan’s lab, led this effort, analysing data from more than 1.3 million cells spanning 213 cell types in the human body. The results showed that GET could predict gene expression in a specific cell type, such as astrocytes from the central nervous system, even when that type was not part of its training dataset.

Mike Pazin from the National Human Genome Research Institute praised the model's ability to learn and apply knowledge across cell types: “In some ways, it’s like if I handed somebody a bunch of books in English, and said: ‘Okay, now this is in Russian. What does it say?’ … I would be like, ‘Wow, is that possible?’”

Jian Ma, a professor of computational biology at Carnegie Mellon University, explained the importance of this work in tackling one of biology’s core challenges: understanding how identical genomes can give rise to a vast array of behaviours across different cell types. He remarked that although all the cells in a human body share the same DNA, each cell type expresses a distinct set of genes, drawn from about 20,000 genes that can be selectively turned on or off.

The implications of mastering this ‘regulatory grammar’ are significant for human health, according to Yang E. Li from Washington University School of Medicine. “We want to learn the grammar and prioritise the key players in different cell types,” he stated, highlighting how disruptions in this grammar can lead to various diseases.

Researchers are optimistic that GET’s predictive capabilities could aid the design of targeted gene therapies that correct genetic mutations in affected cells without disturbing other cell types. Rabadan noted that the model could help create gene therapies that express genes solely in the affected cell types, which could change how such diseases are treated.

Accurate predictions of gene regulation may also help scientists cut down the overwhelming number of experiments needed to work out the effects of genetic mutations, especially in complex diseases such as cancer, which can involve thousands of mutations. Rabadan pointed out how difficult it is for scientists to determine which of those mutations are significant, and suggested that AI models like GET could make this research more streamlined.

Source: Noah Wire Services