Information Hyperlinked over Proteins

Information Hyperlinked over Proteins (or iHOP) is an online text mining service that provides a gene-guided network to access PubMed abstracts. The service was established by Robert Hoffmann and Alfonso Valencia in 2004.

The concept underlying iHOP is that by using genes and proteins as hyperlinks between sentences and abstracts, the information in PubMed can be converted into one navigable resource. Navigating across interrelated sentences within this network, rather than using conventional keyword searches, allows for stepwise and controlled acquisition of information. Moreover, this literature network can be superimposed upon experimental interaction data to facilitate the simultaneous analysis of novel and existing knowledge. As of September 2014, the network presented in iHOP contains 28.4 million sentences and 110,000 genes from over 2,700 organisms, including the model organisms Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Danio rerio, Arabidopsis thaliana, Saccharomyces cerevisiae and Escherichia coli.
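
The idea of using genes as hyperlinks between sentences can be illustrated with a minimal sketch. It is not iHOP's actual implementation: the three toy sentences, the small gene dictionary and the function names (genes_in, build_index, linked_genes) are all assumptions made for illustration, and real gene-name recognition is considerably harder than the exact token matching used here.

```python
from collections import defaultdict

# Toy sentences standing in for PubMed text (written for illustration only).
SENTENCES = [
    "SNF1 phosphorylates MIG1 in response to glucose limitation.",
    "MIG1 represses transcription of SUC2.",
    "SUC2 encodes invertase.",
]

# Hypothetical gene dictionary; a real system must handle synonyms and ambiguity.
GENES = {"SNF1", "MIG1", "SUC2"}


def genes_in(sentence):
    """Return the known gene symbols mentioned in a sentence."""
    tokens = {tok.strip(".,;()") for tok in sentence.split()}
    return tokens & GENES


def build_index(sentences):
    """Map each gene to the indices of the sentences that mention it."""
    index = defaultdict(list)
    for i, sentence in enumerate(sentences):
        for gene in sorted(genes_in(sentence)):
            index[gene].append(i)
    return index


def linked_genes(gene, sentences, index):
    """Return the genes reachable from `gene` through one shared sentence."""
    linked = set()
    for i in index.get(gene, []):
        linked |= genes_in(sentences[i])
    linked.discard(gene)
    return linked


INDEX = build_index(SENTENCES)
```

Starting from one gene, a reader would look up its sentences in INDEX, then follow any co-mentioned gene returned by linked_genes to that gene's own sentences, which is the stepwise navigation described above.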

The iHOP system has shown that, by navigating from gene to gene, distant medical and biological concepts may be connected by only a small number of genes; on average, the shortest path between two genes was found to involve four intermediary genes.
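
As a rough illustration of what a shortest path with intermediary genes means, the sketch below runs a breadth-first search over a small co-mention graph. The graph, the gene names (GENE_A to GENE_E) and the function names are hypothetical, and breadth-first search is simply one standard way to find shortest paths in an unweighted graph; the source does not state which method was used for the iHOP analysis.

```python
from collections import deque

# Hypothetical co-mention graph: an edge means two genes share a sentence.
GRAPH = {
    "GENE_A": {"GENE_B"},
    "GENE_B": {"GENE_A", "GENE_C"},
    "GENE_C": {"GENE_B", "GENE_D"},
    "GENE_D": {"GENE_C"},
    "GENE_E": set(),  # never co-mentioned with another gene
}


def shortest_path(graph, start, goal):
    """Breadth-first search; return the genes on a shortest path, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        gene = queue.popleft()
        for neighbour in graph.get(gene, ()):
            if neighbour in previous:
                continue
            previous[neighbour] = gene
            if neighbour == goal:
                # Walk back from the goal to the start, then reverse.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


def intermediaries(path):
    """Count the genes strictly between the two endpoints of a path."""
    return max(len(path) - 2, 0)
```

In this toy graph, the path from GENE_A to GENE_D passes through two intermediary genes, while GENE_E cannot be reached at all.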

The iHOP system architecture consists of two separate parts: the 'iHOP factory' and the web application. The iHOP factory manages the PubMed source data (text and gene data) and organises it within a PostgreSQL relational database. The iHOP factory also produces the relevant XML output for display by the web application.
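
The split between a data-managing factory and a display layer can be sketched as follows. This is an assumption-laden illustration, not iHOP's code: an in-memory SQLite database stands in for PostgreSQL so that the example is self-contained, and the table layout, XML element names, placeholder identifiers and function names are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_database():
    """Create a toy relational store of sentences and gene mentions."""
    conn = sqlite3.connect(":memory:")  # SQLite used here in place of PostgreSQL
    conn.executescript(
        """
        CREATE TABLE sentence (id INTEGER PRIMARY KEY, pmid TEXT, body TEXT);
        CREATE TABLE mention (sentence_id INTEGER REFERENCES sentence(id), gene TEXT);
        """
    )
    # Placeholder rows; identifiers and sentences are invented.
    conn.executemany(
        "INSERT INTO sentence VALUES (?, ?, ?)",
        [
            (1, "PMID-1", "GENE_A binds GENE_B."),
            (2, "PMID-2", "GENE_B activates GENE_C."),
        ],
    )
    conn.executemany(
        "INSERT INTO mention VALUES (?, ?)",
        [(1, "GENE_A"), (1, "GENE_B"), (2, "GENE_B"), (2, "GENE_C")],
    )
    return conn


def gene_page_xml(conn, gene):
    """Return an XML document listing every sentence that mentions `gene`."""
    root = ET.Element("gene", name=gene)
    rows = conn.execute(
        "SELECT s.id, s.pmid, s.body FROM sentence s "
        "JOIN mention m ON m.sentence_id = s.id "
        "WHERE m.gene = ? ORDER BY s.id",
        (gene,),
    )
    for sentence_id, pmid, body in rows:
        node = ET.SubElement(root, "sentence", id=str(sentence_id), pmid=pmid)
        node.text = body
    return ET.tostring(root, encoding="unicode")
```

A web application in this arrangement would only need to render the XML returned by gene_page_xml, leaving storage and querying to the factory side.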

iHOP is free to use and is licensed under a Creative Commons BY-ND license.