An international team of researchers, including researchers from Baylor College of Medicine, has developed a new analytical tool that effectively and efficiently predicts genes that can cause disease by producing truncated or altered proteins. These proteins take on a new or different function, in contrast to genes that cause disease by losing their function.

The tool allowed the researchers to identify 252 candidate ‘disease genes.’ Some of these genes had already been studied in other labs where it was shown that they most likely cause disease by producing defective proteins, which supports the effectiveness of this novel tool. The study appears in the American Journal of Human Genetics.

“Genes can cause disease because of mutations that result in loss-of-function; that is when the gene is not producing the protein it encodes. But genes also can cause disease when mutations result in the production of a defective protein with a new function – a gain-of-function mutation – that may interfere with the function of the normal protein,” said corresponding author Dr. Claudia M.B. Carvalho, assistant professor of molecular and human genetics at Baylor College of Medicine.

In 2015, Carvalho and her colleagues proposed that a gain-of-function mutation in the gene DVL1 is a common cause of dominant Robinow syndrome, a genetically heterogeneous condition for which there was no molecular explanation. They found DVL1 variants that produce a defective protein missing a piece at one end.

“These findings suggested that other genes also can cause disease, not by loss of function, but by gain of function,” Carvalho said. “We wanted to know which genes might mechanistically behave in a similar manner.”

“This was not an easy task,” said first author Dr. Zeynep Coban-Akdemir, a bioinformatics, genetics and genomics postdoctoral associate in molecular and human genetics at Baylor. “There are computational tools to predict loss of function but not to predict genes that may cause disease through gain of function, so we started this project to do that.”

The researchers began by identifying clues that a gene may cause disease by a gain-of-function mechanism.

“The clue usually is the location of the mutation in the gene. If the mutation, in this case one called premature termination codon (PTC), happens in the very end of the gene, then this usually predicts that the defective gene and messenger RNA (mRNA) will likely escape the cell’s surveillance mechanisms, which then leads to the production of a defective protein and disease by gain of function,” Coban-Akdemir said.

But if the PTC mutation occurs in the middle or at the beginning of the gene, the surveillance mechanism is usually predicted to work: the mRNA is destroyed and the mutated gene produces no protein. In this case, disease results from loss of function of the gene.
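The position rule described above can be sketched in a few lines of Python. This is an illustration only, not the tool the researchers built: the function name, the 0-based transcript coordinates and the 50-nucleotide boundary (a commonly cited rule of thumb for where the surveillance mechanism stops acting) are all assumptions, and the study may define "the very end of the gene" differently.

```python
from typing import Sequence

# Assumption: a commonly cited heuristic boundary, not a value taken from the study.
NMD_BOUNDARY_NT = 50


def escapes_nmd(ptc_pos: int, exon_lengths: Sequence[int],
                boundary_nt: int = NMD_BOUNDARY_NT) -> bool:
    """Predict whether a PTC at 0-based transcript position `ptc_pos`
    escapes mRNA surveillance (hypothetical helper for illustration).

    `exon_lengths` lists exon lengths in transcript order.
    """
    if not exon_lengths:
        raise ValueError("exon_lengths must not be empty")
    transcript_len = sum(exon_lengths)
    if not 0 <= ptc_pos < transcript_len:
        raise ValueError("ptc_pos lies outside the transcript")

    # Assumption: with no exon-exon junction, treat the PTC as escaping.
    if len(exon_lengths) == 1:
        return True

    # Start of the last exon, i.e. the position of the last exon-exon junction.
    last_junction = transcript_len - exon_lengths[-1]

    # PTCs in the last exon, or within `boundary_nt` upstream of the last
    # junction, are predicted to escape; earlier PTCs are predicted to
    # trigger destruction of the mRNA.
    return ptc_pos >= last_junction - boundary_nt


exons = [200, 300, 150]           # made-up exon lengths
print(escapes_nmd(100, exons))    # early PTC: predicted loss of function
print(escapes_nmd(600, exons))    # PTC in last exon: predicted to escape
```

Under these assumptions, the first call describes a mutation expected to lead to loss of function and the second a mutation that could yield a truncated protein.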

“We began working with control datasets, which include the genes of large numbers of people without disease. We reasoned that if PTC mutations in a particular gene accumulated near the beginning or the middle of the gene, and not at the end, then that gene would likely be intolerant to the mutations at the end of the gene,” Coban-Akdemir said. “Genes with this characteristic became our candidates for genes that could be causing disease by a gain-of-function mechanism.”
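The reasoning about control datasets can be illustrated with a rough sketch. It asks, for each gene, whether PTC variants seen in healthy people are rarer in the end region than expected if they were spread evenly along the gene. Everything here is an assumption made for illustration: the data layout, the gene names, the even-spread null model, the one-sided binomial test and the 0.05 cutoff are not taken from the study, and a real analysis would also need to correct for testing many genes.

```python
from math import comb
from typing import Dict, List, NamedTuple


class GenePTCs(NamedTuple):
    """Hypothetical per-gene summary of PTC variants seen in controls."""
    coding_length: int        # length of the coding sequence
    escape_start: int         # first position of the end (escape) region
    ptc_positions: List[int]  # 0-based positions of PTC variants in controls


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def end_depletion_pvalue(gene: GenePTCs) -> float:
    """One-sided p-value that the end region holds fewer control PTCs
    than expected if PTCs were spread evenly along the gene."""
    n = len(gene.ptc_positions)
    if n == 0:
        return 1.0  # no control PTCs: nothing to test
    # Expected share of PTCs in the end region under the even-spread assumption.
    p_end = (gene.coding_length - gene.escape_start) / gene.coding_length
    k = sum(pos >= gene.escape_start for pos in gene.ptc_positions)
    return binom_cdf(k, n, p_end)


def candidate_genes(genes: Dict[str, GenePTCs], alpha: float = 0.05) -> List[str]:
    """Names of genes whose end region looks depleted of control PTCs."""
    return sorted(name for name, g in genes.items()
                  if end_depletion_pvalue(g) < alpha)


# Made-up example data: GENE_A has no control PTCs in its end region,
# GENE_B has roughly the number expected by chance.
genes = {
    "GENE_A": GenePTCs(1000, 700, [i * 30 for i in range(20)]),
    "GENE_B": GenePTCs(1000, 700, [i * 40 for i in range(14)]
                       + [710, 760, 800, 850, 900, 950]),
}
print(candidate_genes(genes))
```

In this toy data only GENE_A is flagged, because healthy people carry PTC variants throughout the gene except at its end.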

Next, the researchers turned to another database, this one containing the genes of people with diseases, and investigated whether the candidate genes they had previously identified appeared in this cohort. The genes identified this way can potentially cause disease by gain of function.

“The importance of this work is that there was already a way to analyze genes that caused disease because of loss of function, so here we designed and tested a tool that allows us to make predictions about which of the many genetic alterations found in patients are most likely playing a role in their disease by gain of function,” Carvalho said. “Once we identify the genes, we may conduct further studies to determine ways to help the patients.”

“This is an incredible example of what a fantastic benchtop experimental scientist and a terrific computational scientist can do when they put their intellects together and study the wonderful BigData generated by the Baylor College of Medicine Human Genome Sequencing Center,” said senior author Dr. Jim Lupski, Cullen Professor of Molecular and Human Genetics at Baylor, principal investigator at the Baylor Hopkins Center for Mendelian Genomics and faculty with the Baylor genetics and genomics graduate training program. “I look forward to reading about work emanating from professor Carvalho and Dr. Coban-Akdemir’s study for years to come.”

Other contributors to this work include Janson J. White, Xiaofei Song, Shalini N. Jhangiani, Jawid M. Fatih, Tomasz Gambin, Yavuz Bayram, Ivan K. Chinn, Ender Karaca, Jaya Punetha, Cecilia Poli, the Baylor-Hopkins Center for Mendelian Genomics, Eric Boerwinkle, Chad A. Shaw, Jordan S. Orange, Richard Gibbs and Tuuli Lappalainen.

Financial support was provided by a joint National Human Genome Research Institute and National Heart, Lung, and Blood Institute US NIH grant (UM1HG006542) to the Baylor Hopkins Center for Mendelian Genomics.