The Knowledge Integration Toolkit, or KnIT, which previously predicted unknown protein-protein interactions, is now fully automated and runs without human involvement, said researchers from Baylor College of Medicine and IBM Watson Health.
The new process will be presented this week at the Knowledge Discovery and Data Mining Conference in Sydney, Australia. The work extends a study presented at the conference last year, which showed that KnIT accurately predicted proteins that bind and chemically modify interactions of p53, an important tumor suppressor.
“We built a tool based on IBM’s cognitive technologies that can read the medical literature – over 24,000,000 published and public paper abstracts – and relate papers associated with word signatures in an effort to accelerate our understanding of the functional properties of proteins,” said Dr. Olivier Lichtarge, director of the Center of Computational and Integrative Biomedical Research at Baylor and the principal investigator on the study. “This latest study moves much beyond that and goes beyond just p53.”
Unlike the version used in the previous study, KnIT is now fully automated and demonstrably scalable, according to the researchers, who are now using the technology to target other proteins.
“Our findings validate that KnIT, which is based on IBM’s Watson technology, can make accurate predictions that will eventually help researchers focus lab resources on the most promising areas for discovery,” said Scott Spangler, Principal Data Scientist, IBM Research. “Because the underlying principles and techniques we employed are general, this approach could be applied to a variety of scientific and engineering problems to mine existing knowledge and identify the next frontiers of discovery.”
The tool extracts information from the scientific literature and automatically identifies direct and indirect references to protein interactions, knowledge that can be represented in network form, the team said.
It then reasons over this network to predict new, previously unknown interactions, they said.
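The network reasoning described above can be illustrated with a minimal sketch. It is not KnIT's actual method, which the researchers do not detail here. It assumes a simple co-mention network and a common-neighbor score, and every name in it (`build_network`, `predict_interactions`, the toy protein labels) is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_network(abstracts):
    """Link every pair of proteins mentioned together in the same abstract."""
    network = defaultdict(set)
    for proteins in abstracts:
        for a, b in combinations(sorted(set(proteins)), 2):
            network[a].add(b)
            network[b].add(a)
    return network


def predict_interactions(network, target):
    """Rank proteins not yet linked to `target` by shared neighbors.

    This common-neighbor score is an illustrative stand-in, not the
    scoring used by KnIT.
    """
    known = network.get(target, set())
    scores = {}
    for candidate, neighbors in network.items():
        if candidate == target or candidate in known:
            continue
        shared = len(known & neighbors)
        if shared:
            scores[candidate] = shared
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data: each inner list holds the proteins mentioned in one abstract.
TOY_ABSTRACTS = [["T", "K1"], ["T", "K2"], ["K1", "X"], ["K2", "X"], ["K1", "Y"]]
predictions = predict_interactions(build_network(TOY_ABSTRACTS), "T")
print(predictions)
```

In this toy network, protein `X` shares two neighbors with the target `T` and `Y` shares one, so `X` is ranked as the more promising candidate.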
The accuracy and scope of KnIT's knowledge extractions are validated in two ways: by comparison with structured data sources assembled by humans, and by retrospective studies that use only the literature available before a given date to predict discoveries reported later.
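A retrospective study of this kind can be sketched as follows. This is an illustration under stated assumptions, not the researchers' evaluation code: the function `retrospective_precision`, the common-neighbor ranking, and the dated toy records are all hypothetical.

```python
from collections import defaultdict


def retrospective_precision(dated_pairs, target, cutoff_year, top_k):
    """Predict from pre-cutoff data, then score against later reports.

    `dated_pairs` holds (protein_a, protein_b, year_first_reported) tuples.
    Returns the fraction of the top_k predictions for `target` that were
    reported in or after `cutoff_year`.
    """
    before = defaultdict(set)
    later = set()
    for a, b, year in dated_pairs:
        if year < cutoff_year:
            before[a].add(b)
            before[b].add(a)
        elif target in (a, b):
            later.add(b if a == target else a)

    known = before.get(target, set())
    later -= known  # only count interactions that were new after the cutoff

    # Illustrative common-neighbor ranking over the pre-cutoff network.
    scores = {
        c: len(known & nbrs)
        for c, nbrs in before.items()
        if c != target and c not in known
    }
    ranked = sorted(
        (c for c, s in scores.items() if s > 0),
        key=lambda c: (-scores[c], c),
    )[:top_k]
    if not ranked:
        return 0.0
    return sum(1 for c in ranked if c in later) / len(ranked)


# Toy records; the years are invented for illustration only.
TOY_PAIRS = [
    ("T", "K1", 2001), ("T", "K2", 2002),
    ("K1", "X", 2003), ("K2", "X", 2003), ("K1", "Y", 2003),
    ("T", "X", 2006), ("T", "Z", 2007),
]
precision = retrospective_precision(TOY_PAIRS, "T", cutoff_year=2005, top_k=2)
print(precision)
```

Here the two predictions made from pre-2005 data are `X` and `Y`. Only `X` is reported later, so the precision is 0.5. Interactions such as `T`-`Z`, which the pre-cutoff network gives no evidence for, do not count against the predictions in this particular metric.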
The KnIT methodology is a step towards automated hypothesis generation from text, with potential application to other scientific domains.
Co-authors on the report include Angela D. Wilkins, Benjamin J. Bachman, Ilya B. Novikov, María E. Terrón-Díaz, Anbu K. Adikesavan, Sam Regenbogen, Christie M. Buchovecky, Andreas M. Lisewski, Houyin Zhang and Lawrence Donehower, all of Baylor; Meena Nagarajan, Shenghua Bao, Peter J. Haas, Sumit Bhatia, Jacques J. Labrie, Linda Kato, Ana Lelescu, Stephen Boyer, Griff Weber, Ying Chen and Scott Spangler, all of IBM; and Curtis R. Pickering of The University of Texas MD Anderson Cancer Center.
This work was funded by the McNair Medical Institute of The Robert and Janice McNair Foundation, the Defense Advanced Research Projects Agency (DARPA N66001-14-1-4027), the National Science Foundation (NSF DBI-1356569, NSF DBI-0851393) and the National Institutes of Health (NIH-GM079656, NIH-GM066099), and was supported in part by the IBM Accelerated Discovery Lab.