Researchers at the Technical University of Munich (TUM) and Helmholtz Munich have demonstrated advances in the use of artificial intelligence, specifically self-supervised learning, to analyse vast quantities of biological data derived from individual cells. The approach aims to deepen understanding of human cellular functions, particularly how cells differ between healthy individuals and those with diseases.

Our bodies are composed of approximately 75 billion cells, and discerning the specific roles of these cells is crucial for medical research and treatment development. The study focuses on single-cell technology, which allows tissues to be examined at the cellular level. This technology enables researchers to identify the distinct functions of various cell types and to investigate how factors such as smoking, lung cancer, and COVID-19 alter cellular structures.

The research tackles the challenge of interpreting the immense datasets generated through these investigations. Machine learning, particularly self-supervised learning, has emerged as a viable method for analysing unlabelled data—data that has not been pre-classified into specific categories. According to Fabian Theis, who holds the Chair of Mathematical Modelling of Biological Systems at TUM, self-supervised learning proves beneficial because it leverages the abundance of unlabelled data to derive meaningful insights without necessitating prior categorisation.

Two core techniques underpin self-supervised learning: masked learning, where parts of the input data are concealed and the model learns to reconstruct the omitted portions, and contrastive learning, which involves teaching the model to distinguish between similar and dissimilar data. In their recent study, published in Nature Machine Intelligence, the research team employed these techniques to analyse over 20 million individual cells, comparing the effectiveness of self-supervised learning against traditional learning methods.
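As a rough illustration of the two techniques described above, the following Python sketch shows how a masked-learning objective and a contrastive objective might be written for a cells-by-genes expression matrix. It is not the study's implementation: the function names (mask_genes, masked_reconstruction_loss, contrastive_loss), the masking fraction, the use of zero as the masking value, and the temperature are all assumptions chosen for illustration.

```python
import numpy as np


def mask_genes(expression, mask_fraction=0.5, rng=None):
    """Hide a random subset of entries in a cells x genes matrix.

    Returns the masked copy and the boolean mask (True = hidden).
    Hypothetical helper; zero is used as the masking value by assumption.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(expression.shape) < mask_fraction
    masked = np.where(mask, 0.0, expression)
    return masked, mask


def masked_reconstruction_loss(predicted, target, mask):
    """Mean squared error computed only on the hidden entries."""
    if not mask.any():
        return 0.0
    diff = (predicted - target)[mask]
    return float(np.mean(diff ** 2))


def contrastive_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style loss: row i of anchors should match row i of positives.

    Every other row in the batch acts as a dissimilar (negative) example.
    Assumes no embedding row is all zeros.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

In a training loop, a model would receive the masked matrix and be scored with masked_reconstruction_loss on the entries it could not see, while contrastive_loss would be applied to embeddings of two views of the same cells so that matching pairs score higher than mismatched ones.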

The researchers found that self-supervised learning notably improves performance in transfer tasks, where insights from larger datasets inform the analysis of smaller ones. The study also produced promising results for zero-shot cell predictions, in which a pre-trained model is applied to a task without additional task-specific training. The team concluded that masked learning is particularly well suited to the large datasets associated with single-cell genomics.

With these findings, the researchers are progressing towards the development of virtual cells—sophisticated computer models that encapsulate the diversity of cells across various datasets. These models are anticipated to facilitate the analysis of cellular changes related to diseases, thereby offering significant potential for future biomedical research. The study provides critical insights into the efficient training and optimisation of these models, marking a noteworthy step forward in the intersection of artificial intelligence and biomedicine.

For more information, individuals can contact Professor Fabian Theis at the Technical University of Munich.

Source: Noah Wire Services