
Diseases like cancer are tragic on their own, but even more so when they evolve to resist the very treatments meant to eliminate them.
At Texas A&M University, researchers are fighting back. Dr. Yang Shen, an associate professor in the Department of Electrical and Computer Engineering, and his team are collaborating with clinicians to unravel genetic mutations that cause diseases or confer drug resistance.
Shen’s work was recognized in 2017 when he received the Maximizing Investigators' Research Award (MIRA) for Early Stage Investigators from the National Institute of General Medical Sciences. He has now been honored with the MIRA for Established Investigators to build on that earlier research.
“We’ve made translational impacts in medical research while advancing computational methods, including multimodal deep learning and generative AI,” Shen said. “AI4Science and AI4Health communities are quickly emerging across disciplinary boundaries. This is truly an exciting era to witness and contribute. It’s great to be part of something bigger than myself.”
Shen’s ultimate goal is to advance and apply computational methods to help solve biomedical challenges such as therapeutic resistance before tragedy strikes.
“Instead of waiting for and reacting to resistance mutations, you could be one step ahead, not just responding to the mutations afterward, but anticipating the mutations beforehand. And you can even be two steps ahead, therapeutically prepared for not only the already known but also the anticipated resistance mutations,” Shen said.
Looking Back
“Seven years ago, we had a very ambitious plan, and seven years later, we have not only achieved that vision but surpassed it,” Shen said. “We're now entering the next phase.”
Thanks to biotechnologies like whole genome sequencing, genetic and phenotypic data are accumulating at unprecedented scale and complexity. However, mechanistic knowledge is often the missing link between genetic mutations and disease phenotypes. Using computational methods to delve into this public data, Shen and his team gained insight into how an organism's genotype, the specific combination of genes it carries, influences its phenotype, or observable traits, and how that knowledge can translate to diagnostics or therapeutics.
First, they want to be one step ahead and anticipate unknown genetic variants that may cause the same disease phenotype.
“Our vision is to make sure that we can mechanistically predict how a genetic mutation expresses its impact from the atomic, molecular, cellular and tissue level through the hierarchy of life to cause disease or health outcomes,” Shen said.
“Once we discover a certain mechanistic hypothesis, how can we confirm it? We go backwards. Design experiments that can test the hypothesis we generated in the first stage,” Shen added. “It’s come full circle – hypothesis generation and test. That was the vision back then, and we are much closer to that vision now.”
Collaboration
Shen and his team have collaborations centered around different aspects of cancer, especially breast cancer, with institutions such as the University of Chicago and Northwestern University.
They also have a decade-long collaboration with clinical scientists at Memorial Sloan Kettering Cancer Center in New York City. Twelve years ago, these clinicians were among the first groups to discover certain mutations in the ESR1 gene (estrogen receptor alpha) that caused breast cancer to resist hormone therapy. Shen’s computer modeling work explained how such mutations in the estrogen receptor alpha protein could have caused resistance to therapeutics.
“Collaborations with clinical scientists made me aware of the biomedical challenges I wanted to address,” Shen said. “It motivated me to develop computational methods to address those needs.”
In the past year, Shen and his clinical collaborators published a paper in the Journal of Clinical Investigation, unraveling new types of ESR1 mutations. These mutations occur at completely different locations in the estrogen receptor alpha protein than those discovered over a decade ago. They indicate different mechanisms and manifest themselves biologically in different ways.

Machine Learning
In the published study, Shen used machine learning to identify distinctions between two types of mutations: the previously discovered type and the new type. The machine-learned features represent the mechanistic hypothesis. Following the identified hypothesis, the team computationally designed experiments to verify such mechanisms.
“To address challenges arising from the biomedical demands, one needs to not only apply existing machine learning methods but also develop new ones,” Shen said. “The biomedical demands inspire foundational advances in machine learning.”
One way of meeting these demands is by using graphs to represent how things are connected, from molecules of bonded atoms to tissues of interacting cells. Graph-supervised learning uses labeled data about graphs to “supervise” models and make predictions.
For instance, it can predict if a molecule is active or inactive for a specified drug target. However, if you only have experimental data for a small pool of molecules, it is difficult to make reliable predictions from that model. Small-data challenges are prevalent in the biomedical field where experimental and clinical measurements are often expensive and slow.
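To make the idea concrete, here is a minimal sketch of graph-supervised learning in plain Python. It is not the team's method: the toy molecules, the "active"/"inactive" labels, and the helper names (`make_graph`, `featurize`, `predict`) are all invented for illustration, and a simple nearest-neighbor rule stands in for the neural networks used in practice.

```python
from collections import Counter

def make_graph(atoms, bonds):
    """Represent a molecule as a graph: atoms are nodes, bonds are undirected edges."""
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    return {"atoms": atoms, "neighbors": neighbors}

def featurize(graph):
    """Count each atom together with its sorted neighboring atoms."""
    counts = Counter()
    for i, atom in enumerate(graph["atoms"]):
        env = "".join(sorted(graph["atoms"][j] for j in graph["neighbors"][i]))
        counts[f"{atom}({env})"] += 1
    return counts

def similarity(c1, c2):
    """Size of the multiset intersection of two feature counters."""
    return sum((c1 & c2).values())

def predict(train, graph):
    """Return the label of the most similar labeled graph (1-nearest neighbor)."""
    feats = featurize(graph)
    best = max(train, key=lambda ex: similarity(featurize(ex[0]), feats))
    return best[1]

# Hypothetical labeled data: heavy-atom skeletons with made-up labels.
train = [
    (make_graph(["C", "C", "O"], [(0, 1), (1, 2)]), "active"),
    (make_graph(["C", "C", "C"], [(0, 1), (1, 2)]), "inactive"),
]
query = make_graph(["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])

print(predict(train, query))  # prints "active"
```

With only two labeled graphs, the prediction depends entirely on which example happens to look most similar, which is the small-data problem described above.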
Shen’s team and their collaborators at the University of Texas at Austin are pioneering a new type of machine learning called graph self-supervised learning. When labeled data is scarce, graph self-supervised learning is a game changer because it doesn’t require labels. Instead, it automatically learns from abundant unlabeled graph data to make predictions for a wide range of applications. Shen’s team further extended self-supervised learning to multiple data modalities, such as graphs and texts.
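One way to see how a model can learn without labels is a masking task: hide one atom in a graph and ask the model to guess it from its neighbors, so the data supplies its own training signal. The sketch below illustrates only that general idea and is not the team's algorithm. The toy graphs and the function names (`masked_examples`, `pretrain`, `predict_masked`) are assumptions, and a simple count table stands in for a neural network.

```python
from collections import Counter, defaultdict

def masked_examples(atoms, bonds):
    """Yield (context, target) pairs: hide one atom, keep its sorted neighbor atoms."""
    neighbors = defaultdict(list)
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    for i, atom in enumerate(atoms):
        context = tuple(sorted(atoms[j] for j in neighbors[i]))
        yield context, atom

def pretrain(unlabeled_graphs):
    """Count which hidden atom appears with each neighbor context. No labels are used."""
    table = defaultdict(Counter)
    for atoms, bonds in unlabeled_graphs:
        for context, target in masked_examples(atoms, bonds):
            table[context][target] += 1
    return table

def predict_masked(table, context):
    """Guess a hidden atom from its neighbors, or None if the context was never seen."""
    counts = table.get(tuple(sorted(context)))
    return counts.most_common(1)[0][0] if counts else None

# Hypothetical unlabeled graphs: three-atom chains with no activity labels.
unlabeled = [
    (["C", "C", "O"], [(0, 1), (1, 2)]),
    (["C", "C", "C"], [(0, 1), (1, 2)]),
    (["C", "O", "C"], [(0, 1), (1, 2)]),
    (["C", "C", "N"], [(0, 1), (1, 2)]),
]
table = pretrain(unlabeled)
print(predict_masked(table, ["O"]))  # prints "C"
```

In practice, what a model learns from such label-free tasks is typically reused for downstream predictions where only a few labeled examples exist.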
Another way of meeting the demands, particularly to stay “two steps ahead,” is to design proteins and small molecules toward personalized medicine. Sensing the increasing impact of AI in generating images and texts, Shen’s team has been an early adopter and active developer of generative AI techniques for designing proteins and small molecules.
“It's not just our success story, but also a success story of the National Institute of General Medical Sciences. Their vision is remarkable in how much they are willing to commit to both early-stage and established investigators with this unique funding mechanism to enhance breakthroughs,” Shen said. “And none of this would be possible without my wonderful students and collaborators.”