Decoding tuberculosis: Using analysis to better understand disease, develop new treatments


Mycobacterium tuberculosis (Mtb) is the causative agent of tuberculosis (TB), a serious lung disease that is widespread throughout the world. About one-third of the human population is infected with Mtb, and curing the disease takes at least six months of daily drug treatment. Unfortunately, drug-resistant strains are becoming increasingly prevalent.

Dr. Thomas Ioerger, associate professor in the Department of Computer Science and Engineering at Texas A&M University, is working on an interdisciplinary project to better understand the biology and genetic makeup of Mtb. His research has the potential to prevent a significant number of TB deaths each year, and by analyzing the genetic makeup of Mtb, Ioerger and the research community are one step closer to managing this threatening disease.

He is working to uncover how the bacterium interacts with its host, for example how it avoids immune defenses during infection, in order to develop new and more effective drugs against the disease. A key technological advance over the last decade has been the use of high-throughput DNA sequencing in these experiments, which makes it possible to sequence and analyze large stretches of DNA, or entire genomes.

Texas A&M is home to several DNA sequencers, which are operated by the Genomics and Bioinformatics Center at the Texas A&M AgriLife Extension Service. The Genomics and Bioinformatics Center is directed by Dr. Charlie Johnson, who is also executive director of the Center for Bioinformatics and Genomics Systems Engineering, a joint AgriLife and Texas A&M Engineering Experiment Station (TEES) center.

Ioerger is the principal investigator for the Bioinformatics and Data Dissemination Core of a National Institutes of Health (NIH) program project called FLUTE, which stands for Functionalizing Lists of Unknown Tuberculosis Entities. He is collaborating with research scientists at Harvard Medical School, the University of Massachusetts Medical School and Weill Cornell Medical College.

Ioerger's colleagues in FLUTE use genetic techniques, such as making mutants of a laboratory strain of Mtb with specific genes knocked out, and then run experiments to evaluate the effect of each knockout on growth in various conditions. His collaborators send DNA samples from these experiments to Texas A&M, where the samples are prepared in the lab of his local collaborator, Dr. James C. Sacchettini in the Department of Biochemistry and Biophysics, and then sequenced. Ioerger's group focuses on analyzing the resulting genomic data and on algorithms for inferring the functions of the target proteins.

Nearly half of the protein-coding genes of Mtb have functions that are unknown. One of the main methods FLUTE relies on is transposon sequencing (Tn-Seq), or the sequencing of transposon mutant libraries. A transposon is a DNA sequence that can excise itself and move to a new position within a genome. In Tn-Seq, transposons are inserted at random locations in the genome, disrupting whichever gene they land in.

A mutant library is a collection of mutants in which a large number of the genes have been disrupted, a different one in each mutant. These libraries have been a great resource for investigators seeking to understand the biological functions of individual genes, including those involved in metabolism, antibiotic susceptibility and pathogenesis (the development of the disease).

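By sequencing these transposon insertion libraries, researchers obtain a profile of the areas of the genome that received transposon insertions. The premise of Tn-Seq is that essential areas of the genome cannot tolerate being disrupted by a transposon: a mutant with an insertion in an essential gene does not survive, so that gene shows few or no insertions in the sequenced library.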
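The premise above can be illustrated with a minimal Python sketch. It is not the FLUTE pipeline or any published tool; the gene names, coordinates, insertion positions and function names are all hypothetical, chosen only to show the idea of counting insertions per gene and flagging genes that received none.

```python
from bisect import bisect_left, bisect_right

def insertions_per_gene(insertion_positions, genes):
    """Count how many insertion positions fall inside each gene.

    insertion_positions: genome coordinates where a transposon was observed.
    genes: dict mapping gene name -> (start, end), inclusive coordinates.
    All inputs here are made-up illustration data.
    """
    positions = sorted(insertion_positions)
    counts = {}
    for name, (start, end) in genes.items():
        # Number of sorted positions p with start <= p <= end.
        counts[name] = bisect_right(positions, end) - bisect_left(positions, start)
    return counts

def candidate_essential(counts):
    """Return the genes with zero insertions, as candidates only."""
    return sorted(name for name, n in counts.items() if n == 0)

# Hypothetical toy genome with three genes and five observed insertions.
genes = {"geneA": (100, 400), "geneB": (450, 900), "geneC": (950, 1200)}
insertion_positions = [120, 310, 388, 1001, 1150]

counts = insertions_per_gene(insertion_positions, genes)
candidates = candidate_essential(counts)
print(counts)
print(candidates)
```

A gene with zero insertions is only a candidate: a short gene can be missed by chance, which is why a statistical test is needed before calling it essential.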

Tn-Seq has become widely used in microbiology labs around the world for studying the functions of proteins in many different bacterial organisms. It was largely developed in the laboratories of Ioerger’s collaborators, Dr. Eric J. Rubin, professor of immunology and infectious diseases at Harvard University T.H. Chan School of Public Health; and Dr. Christopher M. Sassetti, professor at the University of Massachusetts Medical School. Both are involved in FLUTE and have published many joint papers with Ioerger.

Ioerger and his colleagues are developing computational tools for the statistical analysis of Tn-Seq data. Propelled by the desire for a statistically rigorous way to tell which genes are essential based on Tn-Seq data, Ioerger and his former doctoral student and current postdoctoral researcher, Michael DeJesus, initially developed a Bayesian statistical inference method that utilizes the Extreme Value Distribution to assess the significance of gaps in the genome, or regions that did not receive transposon insertions. They have also developed statistical algorithms for comparative analysis of Tn-Seq datasets and identification of genetic interactions.
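To give a feel for why an extreme value distribution is relevant to gaps, here is a simplified Python sketch. It is not the Bayesian method described above and not Transit's implementation. It assumes, for illustration only, that each potential insertion site is empty independently with probability p_empty, and it uses the standard approximation that the longest run of empty sites among n_sites then follows a Gumbel-type law: P(longest run >= r) is roughly 1 - exp(-n_sites * (1 - p_empty) * p_empty^r). The function names and the example data are hypothetical.

```python
import math

def empty_runs(site_counts):
    """Return (start_index, length) for each maximal run of empty sites.

    site_counts: insertion counts at consecutive potential insertion sites.
    """
    runs = []
    start = None
    for i, count in enumerate(site_counts):
        if count == 0:
            if start is None:
                start = i          # a new gap begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:          # gap extends to the last site
        runs.append((start, len(site_counts) - start))
    return runs

def gap_pvalue(run_length, n_sites, p_empty):
    """Approximate chance that a gap this long appears somewhere by luck.

    Assumes independent sites; the number of runs of at least run_length
    is treated as Poisson with mean n_sites * (1 - p_empty) * p_empty**r.
    """
    expected_runs = n_sites * (1.0 - p_empty) * p_empty ** run_length
    return -math.expm1(-expected_runs)   # equals 1 - exp(-expected_runs)

# Hypothetical data: 1 = site hit by a transposon, 0 = empty site.
site_counts = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
p_empty = site_counts.count(0) / len(site_counts)
for start, length in empty_runs(site_counts):
    print(start, length, gap_pvalue(length, len(site_counts), p_empty))
```

Short gaps are expected by chance and get a probability near 1, while a long gap in a densely hit genome gets a small one, which is the intuition behind treating unusually long gaps as evidence of essentiality.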

Several years ago, Ioerger and his students packaged these algorithms together in a software package called Transit, which is now distributed to the public and is used by many labs around the world for Tn-Seq analysis. More recently, they have used these tools to identify proteins involved in pathways like cell-wall biosynthesis, cholesterol metabolism, trehalose biosynthesis, biotin biosynthesis and iron acquisition.

Ioerger also has funding from the Bill and Melinda Gates Foundation, which supports research on TB. Through the Gates Foundation, Ioerger's group uses DNA sequencing to help identify the protein targets of inhibitory compounds from high-throughput screens, as part of a large drug-discovery pipeline involving many universities and industry partners (pharmaceutical companies).

While this work has more applied goals of developing new drugs, it builds on all the basic science learned about the physiology of Mycobacterium tuberculosis, including knowledge of the functions of genes and essentiality in different conditions through Tn-Seq and other methods.