Using machine learning to uncover genetic mutation mechanisms

Our genes contain recipes to make the proteins that build and regulate our bodies. Our genetic information is stored in our DNA. Genetic mutations occur when there is a permanent alteration, inherited or acquired, in the DNA sequence. Mutations may lead to protein malfunction, often causing disease or counteracting medicines. Despite the wealth of data on disease-related mutations, very little is known about how genetic mutations affect our health.

A researcher in the Department of Electrical and Computer Engineering at Texas A&M University and the TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering is developing computational frameworks to study the mechanisms by which disease-related genetic mutations can cause health problems.

Assistant Professor Dr. Yang Shen has received the Maximizing Investigators' Research Award (MIRA) for Early Stage Investigators (ESI) from the National Institutes of Health for his research project titled “Unraveling Molecular and Systems-Level Mechanisms of Human Disease-Associated Protein Mutations.” The $1.67 million project will span five years.

“High-throughput technologies have endowed us with a wealth of data about genetic mutations and resulting health problems,” said Shen. “However, there is a great need for effective and efficient methods to discover knowledge, that is, to determine why the mutations cause such problems and how they can be addressed therapeutically.”

The interdisciplinary project aims to develop novel, multiscale computational frameworks that are rigorous and generalizable to help close the ever-increasing gap between observable data and mechanistic knowledge and to help develop effective therapeutic strategies.

“Specifically, a computational framework will predict and explain mechanistically how the consequence of a protein mutation ripples through an individual’s 1D sequence and 3D shape, its interactions with other proteins and the emergent behaviors of many,” said Shen. “And an inverse computational framework will design mutagenesis experiments and drug candidates for desired health effects following the mechanisms discovered. The interplay of these computational frameworks will enable computations and experiments to feed each other iteratively in the pursuit of knowledge discovery.”

Shen says the successful completion of the project will make algorithmic contributions to mathematical optimization, machine learning and graph theory. The project is also expected to provide new insights into the pathobiology of diseases and facilitate a systems-based approach to the design of therapeutic strategies.

Shen’s MIRA for ESI is the first award of its kind in the College of Engineering at Texas A&M. It is also one of the first two such awards in the Texas A&M System, along with an award this year to Dr. Jonathan T. Sczepanski, an assistant professor in the Department of Chemistry.

Shen joined Texas A&M in 2015. His research interests in bioinformatics include optimization and learning algorithms for modeling biological molecules, systems and data. Applications of his research include protein docking, protein and drug design, systems and synthetic biology, and omics. His research projects have also been funded by the National Science Foundation and other federal agencies.
