When a visitor enters Alex Tropsha’s office on the third floor of Beard Hall near UNC-Chapel Hill’s medical campus, Tropsha says “hello” and then immediately apologizes.
“I’m in the middle of submitting a big grant to the NIH,” he shares. “But I will make you the best cup of coffee you’ve ever had while you wait.”
He leans to his left, places a cup on his premium Italian espresso machine, and hits a button. The electric sound of hot water being forced into thick coffee grounds fills the room.
Meanwhile, Tropsha types furiously, giving the grant one final review.
“My favorite keystroke is F7,” he admits. “That’s the spell check.”
Finally, he hits “submit,” smiles, and offers a piece of chocolate. His bubbly nature is infectious, and he’s excited to talk about his research: cheminformatics. The field combines chemistry and computer science to analyze chemical data. More specifically, he focuses on creating new data sets for drug discovery — finding new molecules to treat diseases — and drug repurposing, identifying new uses for existing drugs.
“I’m always thinking about how to process data, how to build models of data, and how to prove that models have predictive power,” Tropsha says.
Tropsha has been at UNC-Chapel Hill since 1989. He started as a postdoctoral researcher and, a few years later, became director of the Laboratory for Molecular Modeling within the UNC Eshelman School of Pharmacy.
Today, he is the K.H. Lee Distinguished Professor in Eshelman and also holds appointments in the computer science and biomedical engineering departments, the computing institute RENCI, and the UNC School of Data Science and Society.
Impact Report
Only 22% of known human diseases have an approved drug treatment, according to the nonprofit Every Cure. Alex Tropsha is using AI to repurpose existing drugs and identify new drugs for diseases without treatments.
Alex’s research is funded by grants from the Advanced Research Projects Agency for Health and the National Institutes of Health. This funding supports vital research, advancing treatments for the people of North Carolina and beyond.
“Even though I’m a member of five departments, the underlying core is the same: There is always a need to collect, curate, understand, process, procure, build models for, and forecast new data,” he says. “That’s the commonality across multiple disciplines.”
In 2024, Tropsha became one of the first researchers at Carolina to receive an Advanced Research Projects Agency for Health (ARPA-H) grant — federal funding focused on accelerating health outcomes. The project brings together five research groups from across the country to build a tool that uses artificial intelligence (AI) to improve drug repurposing.
“It has been estimated that only 22% of known human diseases have at least one approved drug treatment,” Tropsha says. “If this effort is successful, it will dramatically affect finding cures — especially for rare and neglected diseases that Big Pharma, effectively, cannot afford to work on. It will collectively impact over 10% of people with rare diseases and is the most exciting and, potentially, most impactful project of my life.”
Data-driven drug discovery
The field of cheminformatics started forming in the late 1950s, when researchers began modeling the relationship between chemicals’ structures and their properties.
In 1965, the American Chemical Society received funding from the National Science Foundation to create a database of this information called the CAS Registry for use by scientists all over the world. Today, it identifies more than 204 million organic and inorganic substances, from caffeine to aspirin. Tropsha has been using information like this since he began his own research on the topic in the late 1970s.
At that time, he began working on his master’s in chemistry at Moscow State University, focusing on data generation and analysis. He wanted to create reliable, computational models of chemical data that could forecast new uses or new compounds with a desired property.
“My father wanted me to be a physician, but at some point, I realized I was more passionate about research that supports medicine,” Tropsha shares. “Chemistry sounded like a natural way of thinking about designing and discovering medicines that can cure diseases.”
Tropsha continued working on this topic for his PhD through 1986 and came to Carolina three years later. His colleague Frank Brown, who was an adjunct professor in the UNC Eshelman School of Pharmacy at the time, officially coined the term “cheminformatics” in 1998, solidifying Tropsha’s place in the pioneering effort to define this new area of study.
An artificial intelligence approach
Beyond cheminformatics, Tropsha is an expert in AI. In 2018, he and his colleagues published one of the first papers on generative chemical AI, demonstrating that they could computationally design new chemical entities with desired properties.
For one of his current projects, Tropsha is integrating diverse data sets into structured formats called knowledge graphs, which organize data in a way that allows machines to understand and use it. He does this using an elementary unit called a semantic triplet, composed of a subject, a predicate, and an object. For example, if a drug treats a particular disease, the semantic triplet would be “drug X treats disease Y,” where “drug X” is the subject, “treats” is the predicate, and “disease Y” is the object.
“We know drug X treats cancer, but can it also treat asthma? Can we employ a knowledge graph in support of this hypothesis?” he asks. “Once this knowledge is organized, we can start using machine learning and AI methods to understand the existing connectivity between different concepts across multiple disciplines and, for instance, forecast new connections between existing drugs and diseases.”
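To make the idea concrete, here is a minimal Python sketch of how semantic triplets might be stored and used to suggest a new drug-disease link. Everything in it is an illustrative assumption: the entity names, the `targets` and `associated_with` predicates, and the shared-gene rule are hypothetical stand-ins, not the team's data or method, which relies on machine learning over far larger graphs.

```python
from collections import defaultdict

# Hypothetical toy triplets in (subject, predicate, object) form.
TRIPLES = [
    ("drug_X", "treats", "cancer"),
    ("drug_X", "targets", "gene_A"),
    ("gene_A", "associated_with", "cancer"),
    ("gene_A", "associated_with", "asthma"),
    ("drug_Z", "targets", "gene_B"),
    ("gene_B", "associated_with", "asthma"),
]


def build_index(triples):
    """Group objects by (subject, predicate) so lookups are cheap."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].add(obj)
    return index


def candidate_diseases(drug, index):
    """Map each disease the drug does not already treat to its linking genes."""
    known = index.get((drug, "treats"), set())
    evidence = defaultdict(set)
    for gene in index.get((drug, "targets"), set()):
        for disease in index.get((gene, "associated_with"), set()):
            if disease not in known:
                evidence[disease].add(gene)
    return dict(evidence)


index = build_index(TRIPLES)
hypotheses = candidate_diseases("drug_X", index)
print(hypotheses)  # {'asthma': {'gene_A'}}
```

In this toy graph, drug X is already known to treat cancer, so the only new hypothesis is asthma, supported by the shared gene. A result like this is a lead to test, not evidence that the drug works.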
Tropsha has built on this work in collaboration with RENCI via an open-source knowledge graph called ROBOKOP. Drawing on information from large biomedical databases, ROBOKOP works like a roadmap, answering questions by examining connections between concepts such as drugs, diseases, and genes.
What genes are involved in disease X? What drugs target those genes? What side effects do those drugs have? ROBOKOP can provide the answers or create new data-supported hypotheses about these connections.
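Those three questions amount to a chain of hops through the graph. The sketch below walks such a chain over a handful of made-up edges. It does not use ROBOKOP's actual data or query interface; the edge list, predicate names, and helper functions are all hypothetical.

```python
# Hypothetical edges in (subject, predicate, object) form.
EDGES = [
    ("gene_A", "involved_in", "disease_X"),
    ("gene_B", "involved_in", "disease_X"),
    ("drug_1", "targets", "gene_A"),
    ("drug_2", "targets", "gene_B"),
    ("drug_3", "targets", "gene_C"),
    ("drug_1", "has_side_effect", "nausea"),
    ("drug_2", "has_side_effect", "headache"),
    ("drug_2", "has_side_effect", "nausea"),
]


def subjects(predicate, obj):
    """All subjects linked to obj by predicate, in sorted order."""
    return sorted(s for s, p, o in EDGES if p == predicate and o == obj)


def objects(subject, predicate):
    """All objects linked from subject by predicate, in sorted order."""
    return sorted(o for s, p, o in EDGES if s == subject and p == predicate)


def answer_chain(disease):
    """Hop from disease to genes, genes to drugs, and drugs to side effects."""
    report = {}
    for gene in subjects("involved_in", disease):
        for drug in subjects("targets", gene):
            report[drug] = {
                "via_gene": gene,
                "side_effects": objects(drug, "has_side_effect"),
            }
    return report


report = answer_chain("disease_X")
for drug, details in report.items():
    print(drug, details)
```

Here drug 3 never appears in the report because its target, gene C, is not linked to disease X. Each answer also carries the path that produced it, so the reasoning behind a suggestion can be inspected.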
“Knowledge graphs are excellent at bringing together heterogeneous information into a single system so that it can be more easily explored,” says Chris Bizon, director of data science and analytics at RENCI. “Two previously unconnected pieces of information will sometimes produce an ‘aha’ moment or unexpected discovery that wouldn’t be obvious otherwise.”
Bizon is Tropsha’s co-principal investigator for the ARPA-H project, and ROBOKOP is a key component. The team intends to build models and research tools to evaluate every possible drug-disease pair for the likelihood that the drug may treat the disease. They will do this for approximately 2,700 drugs and 18,500 diseases, more than 14,000 of which have no drugs associated with them at all. Then, they will select the top candidates for preclinical and clinical evaluations by Every Cure, the parent organization for the ARPA-H grant.
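The scale of that task is easy to work out: roughly 2,700 drugs times 18,500 diseases gives about 49.95 million pairs to score. The Python sketch below does that arithmetic and shows one plausible way to keep only the highest-scoring pairs. The drug and disease names, the scores, and the `top_candidates` helper are placeholders for illustration, not the project's models.

```python
import heapq
from itertools import product

N_DRUGS = 2_700      # approximate figure cited in the article
N_DISEASES = 18_500  # approximate figure cited in the article
print(f"{N_DRUGS * N_DISEASES:,} drug-disease pairs")  # 49,950,000 drug-disease pairs


def top_candidates(drugs, diseases, score, k):
    """Return the k highest-scoring (score, drug, disease) tuples."""
    scored = ((score(drug, disease), drug, disease)
              for drug, disease in product(drugs, diseases))
    return heapq.nlargest(k, scored)


# Placeholder scores standing in for a trained model's predictions.
TOY_SCORES = {
    ("drug_1", "disease_A"): 0.91,
    ("drug_1", "disease_B"): 0.12,
    ("drug_2", "disease_A"): 0.40,
    ("drug_2", "disease_B"): 0.77,
}


def toy_score(drug, disease):
    """Look up the hypothetical likelihood that the drug treats the disease."""
    return TOY_SCORES[(drug, disease)]


best = top_candidates(["drug_1", "drug_2"], ["disease_A", "disease_B"], toy_score, k=2)
print(best)  # [(0.91, 'drug_1', 'disease_A'), (0.77, 'drug_2', 'disease_B')]
```

Because `heapq.nlargest` consumes the pairs as a stream, a ranking like this never has to hold all of the scored pairs in memory at once.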
The number of diseases available to analyze has grown over the past year of the project because the team now accounts for disease variants: a drug may treat one variant but not another. For example, COVID has multiple variants, including Alpha, Beta, Delta, and Omicron.
“The entire chemical disease matrix is much bigger than what we considered originally,” Tropsha confirms. “And it includes the challenge of discovering new medications as well as repurposing existing medications for both known and new diseases.”
A team effort
Tropsha and Bizon are focused on the modeling side of this project and are just one part of an incredibly large team.
Carolina geneticist Melissa Haendel is working with them to accurately code the information in ROBOKOP. Their collaborators at the University of Pennsylvania are testing the drug-disease connections identified through the knowledge graph and, so far, have treated one patient with positive results. Every Cure has recruited a team of clinicians to assess how knowledge gained from the project can be applied in clinics.
“We need to adopt an end-to-end approach,” Tropsha says. “We produce analytical suggestions for the data, and our methodologists are making assumptions. We need to know what the clinical team is thinking about so we can modify the process to make our predictions more intelligent and acceptable for the medical world. It’s a feedback loop.”
Tropsha credits much of the project’s success to the culture of collaboration at Carolina.
“I think everybody shares this culture of openness and a willingness to collaborate and contribute in a non-territorial way, which is very important for multi-investigator applications,” he says.
When asked what’s kept him here for 36 years, he points to the university’s growth in data science, including the creation of the UNC School of Data Science and Society in 2022.
“For me, it’s not about looking for greener pastures, but making the pasture around me greener,” Tropsha says. “My science has evolved with the university and now data science is our bread and butter.”