Sun Kim, Soo-Yong Shin, In-Hee Lee, Soo-Jin Kim, Ram Sriram, Byoung-Tak Zhang.
Abstract
Protein-protein interaction (PPI) extraction has been an important research topic in the bio-text mining area, since PPI information is critical for understanding biological processes. However, there are very few open systems available on the Web, and most of them focus on keyword searching based on predefined PPIs. PIE (Protein Interaction information Extraction system) is a configurable Web service to extract PPIs from literature, including user-provided papers as well as PubMed articles. After abstracts or papers are provided, the prediction results are displayed in an easily readable form with essential, yet compact features. The PIE interface supports further features such as PDF file extraction, a PubMed search tool and network communication, which are useful for biologists and bio-system developers. The PIE system utilizes natural language processing techniques and machine learning methodologies to predict PPI sentences, which results in high-precision performance for Web users. PIE is freely available at http://bi.snu.ac.kr/pie/.
Year: 2008 PMID: 18508809 PMCID: PMC2447724 DOI: 10.1093/nar/gkn281
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Overview of PIE. The PIE system consists of several modules. ‘Article Filter’ and ‘Sentence Filter’ decide whether given articles or sentences contain PPI information. ‘Search Engine’ retrieves stored information such as learning data (Article DB) and protein names (Protein DB). ‘Interaction DB’ is the database of interaction-related words. ‘XML–RPC Module’ is responsible for RPC communication with other PPI services. ‘Web Interface Module’ manages the whole process of PPI prediction for Web users. Prediction results contain links to the iHOP service to provide further protein information. For PubMed search, PIE retrieves PubMed articles using the NCBI E-Utilities.
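The PubMed retrieval step mentioned in the caption can be illustrated with a minimal sketch of how requests to the public NCBI E-Utilities endpoints are formed. This is not PIE's actual code: the function names `build_esearch_url` and `build_efetch_url` are hypothetical, and only the URL construction is shown, so no network access is needed to run it.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-Utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, retmax=20, min_year=None, max_year=None):
    """Build an ESearch URL that returns PubMed IDs matching `term`.

    Hypothetical helper: the year filter is applied only when both
    bounds are given, using the publication date (`pdat`).
    """
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    if min_year is not None and max_year is not None:
        params["datetype"] = "pdat"
        params["mindate"] = str(min_year)
        params["maxdate"] = str(max_year)
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def build_efetch_url(pmids):
    """Build an EFetch URL that returns plain-text abstracts for `pmids`."""
    params = {
        "db": "pubmed",
        "id": ",".join(str(p) for p in pmids),
        "rettype": "abstract",
        "retmode": "text",
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    print(build_esearch_url("protein interaction extraction", retmax=5,
                            min_year=2000, max_year=2008))
    print(build_efetch_url([18508809]))
```

A client would fetch the first URL to obtain a list of PMIDs and then pass those IDs to the second to download the abstracts for PPI prediction.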
Figure 2. An example of PIE prediction results. PIE provides a user-friendly and intuitive interface. (A) Input. Web users can upload papers as a file or copy and paste text. A PubMed tool is provided for PubMed article searches. PIE allows multiple PubMed articles to be selected for PPI prediction in two ways: manual selection and automatic selection. (B) PubMed search. Article search using the PubMed service is available for common use. The search results can be narrowed by options such as number of results, publication years and journals. The ‘I'm feeling lucky’ button performs automatic article selection, which works similarly to common PPI extraction tools. (C) Output. Prediction results are listed in the center box, highlighting PPI sentences based on their probabilities. Colors of sentences represent their probabilities: ‘Red’ for high probability and ‘Green’ for moderate probability. According to the protein DB and the interaction DB, protein names and interaction-related words are indicated by bold and italic fonts, respectively. In particular, protein names are linked to the iHOP service to provide further information. Users can leave feedback to update PIE performance by selecting a ‘No Feedback,’ ‘Agree,’ ‘Partly Disagree’ or ‘Disagree’ button.
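The probability-based coloring described in panel (C) could be sketched as below. The source does not state the cut-offs PIE uses, so the thresholds `high=0.9` and `moderate=0.5` and the function name `highlight_color` are assumptions chosen purely for illustration.

```python
def highlight_color(probability, high=0.9, moderate=0.5):
    """Map a PPI sentence probability to a highlight color.

    The thresholds are hypothetical; the source only says that red marks
    high-probability and green marks moderate-probability sentences.
    Returns None for sentences that would not be highlighted.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability >= high:
        return "red"
    if probability >= moderate:
        return "green"
    return None


if __name__ == "__main__":
    for p in (0.97, 0.62, 0.10):
        print(p, highlight_color(p))
```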
Figure 3. ROC curves for test data. The performance of PIE was measured using independent test sets. The PIE options were set to use simplified tags and the protein dictionary. In all cases, the true positive rate (TPR) increases rapidly at low false positive rate (FPR), implying that the system makes high-precision predictions for high-probability sentences.
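For readers unfamiliar with how such a curve is built, the sketch below computes ROC points from per-sentence probabilities and gold labels by sweeping the decision threshold from high to low. It is a generic illustration, not the evaluation code used for PIE; the names `roc_points` and `auc` and the toy data are assumptions.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points for a threshold swept from high to low.

    `labels` holds 1 for PPI sentences and 0 otherwise. Sentences with
    equal scores are processed together so ties yield a single point.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Count every sentence whose score equals the current threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Toy data for illustration only, not results from the paper.
    pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
    print(pts, auc(pts))
```

A curve that climbs steeply near FPR = 0, as the caption describes, means the highest-scoring sentences are almost all true PPI sentences.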