
NASCENT: an automatic protein interaction network generation tool for non-model organisms.

Daniel Banky, Rafael Ordog, Vince Grolmusz.

Abstract

Large quantities of reliable protein interaction data are available for model organisms in public depositories (e.g., MINT, DIP, HPRD, IntAct). Most data correspond to experiments with the proteins of Saccharomyces cerevisiae, Drosophila melanogaster, Homo sapiens, Caenorhabditis elegans, Escherichia coli and Mus musculus. For other important organisms, data availability is poor or non-existent. Here we present NASCENT, a fully automatic web-based tool, also available as a downloadable Java program, capable of modeling and generating protein interaction networks even for non-model organisms. The tool models protein interaction networks through gene-name mapping and outputs the resulting network both in graphical form and in computer-readable graph formats that popular network modeling software can use directly. AVAILABILITY: http://nascent.pitgroup.org.


Keywords: interaction; model; network; protein; tool

Year: 2009 PMID: 19707301 PMCID: PMC2720673 DOI: 10.6026/97320630003361

Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063

Background

Experimentally validated, high-quality protein-protein interaction data are deposited in numerous information sources on the Web, for example in databases such as MINT [1], HPRD [2], DIP [3] and IntAct [4]. Most of the data were acquired for popular model organisms such as Saccharomyces cerevisiae [5], Drosophila melanogaster, Caenorhabditis elegans [6], Escherichia coli, Mus musculus and Homo sapiens [7,8,9]. Numerous other organisms of importance are completely missing from these depositories, or have only very little publicly deposited data (e.g., Mycobacterium tuberculosis). Modeling protein interaction networks for organisms poorly represented in the large depositories is therefore an important task.

In the present work we describe a web-based tool, called NASCENT, capable of automatically modeling protein interaction networks from the rich experimental data deposited in IntAct [4]. In NASCENT, the user designates a source organism and a target organism. The source organism, preferably one with a large amount of deposited protein-protein interaction data, serves as the input for modeling the interactions in the target organism. The organisms need to be identified by their NCBI taxonomy IDs; an autocompleting tool helps the user find the ID from the scientific Latin name of the species. The interactions are mapped by matching the genes of the expressed proteins of the two organisms, as described in the Methodology section. NASCENT uses the Swiss-Prot database [10] to map the gene names of different organisms. NASCENT is a scalable tool that integrates the constantly updated source database (IntAct [4]) and mapping database (UniProtKB/Swiss-Prot [10]): it applies weekly updates to the internal database queried by the tool.

The graphical interface of NASCENT is intended to give a quick overview of the generated network. The nodes are labeled by the UniProt primary accession numbers of the proteins [10], and the graph can be drawn in one of seven layouts (random, force-directed, Fruchterman-Reingold force-directed, node-link tree, balloon tree, radial tree and circle). For easier navigation through complex networks, when the mouse cursor moves over a protein code, that node turns red and its neighbors turn yellow. Network nodes can also be moved around and grouped easily with the mouse.

NASCENT can export networks in SIF, text and GraphML formats for use in other network tools. JPEG export can also be chosen at http://nascent.pitgroup.org. A faster, downloadable standalone Java program is also available there.
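To make the SIF export concrete, the following is a minimal sketch of how an edge list could be written in the simple interaction format read by Cytoscape, where each line has the form "node, interaction type, node" separated by tabs. It is not NASCENT's own code: the function name `to_sif`, the interaction label `pp` (a common convention for protein-protein edges) and the node identifiers are assumptions made for illustration.

```python
def to_sif(edges, interaction="pp"):
    """Serialize undirected edges as SIF lines: node<TAB>type<TAB>node.

    Each undirected edge is written once, with its endpoints in sorted
    order, so (a, b) and (b, a) do not produce two lines.
    """
    lines = []
    seen = set()
    for a, b in edges:
        key = tuple(sorted((a, b)))
        if key in seen:
            continue
        seen.add(key)
        lines.append(f"{key[0]}\t{interaction}\t{key[1]}")
    return "\n".join(lines)


# Placeholder node labels; a real export would use UniProt accessions.
example_edges = [("ACC_A", "ACC_B"), ("ACC_B", "ACC_A"), ("ACC_A", "ACC_C")]
sif_text = to_sif(example_edges)
print(sif_text)
```

A file written this way can be imported by tools that accept SIF; GraphML carries the same edges in an XML structure and would need a separate writer.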

Methodology

The network construction algorithm is summarized in Figure 2. When the user designates a source species, the network of that species is retrieved from the local, regularly updated mirror of the IntAct database [4]. If the user checks the box requesting inclusion of the phylogenetic subtree on the source side, then the protein interaction data of all descendant subspecies of the source organism are also included. The result is the local copy of the interaction network of the source species.
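The source-side retrieval with the optional subtree can be sketched as follows. This is an illustrative sketch only, assuming the taxonomy is available as a parent-to-children dictionary and the interactions as a dictionary keyed by taxonomy ID; the names `collect_subtree` and `source_network`, and the small integer IDs, are hypothetical and do not come from NASCENT or from NCBI.

```python
def collect_subtree(taxon_id, children):
    """Return taxon_id together with all of its descendants."""
    result = []
    stack = [taxon_id]
    while stack:
        current = stack.pop()
        result.append(current)
        stack.extend(children.get(current, []))
    return result


def source_network(taxon_id, interactions_by_taxon, children, include_subtree):
    """Gather the undirected edges of a taxon, optionally with its subtree."""
    taxa = collect_subtree(taxon_id, children) if include_subtree else [taxon_id]
    edges = set()
    for taxon in taxa:
        for a, b in interactions_by_taxon.get(taxon, []):
            edges.add(frozenset((a, b)))  # frozenset ignores edge direction
    return edges


# Toy data: taxon 100 has children 101 and 102; 101 has child 103.
children = {100: [101, 102], 101: [103]}
interactions_by_taxon = {
    100: [("a", "b")],
    101: [("b", "c")],
    103: [("c", "d")],
}
only_species = source_network(100, interactions_by_taxon, children, False)
with_subtree = source_network(100, interactions_by_taxon, children, True)
print(len(only_species), len(with_subtree))
```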

Figure 2. Flow chart of the network construction by NASCENT.

Next, the protein-gene correspondence is computed using the UniProt database [10], and the genes and proteins of the target organism are selected from the same database. If the user checks the box requesting inclusion of the phylogenetic subtree for the target organism, then the dictionary of genes and protein accession codes of the descendant subspecies of the target organism is also included. Finally, the proteins of the source and target organisms are matched by gene name; this step is called gene mapping in Figure 2.
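The gene mapping step can be illustrated with a small sketch. It assumes that each organism's dictionary is a list of (accession, gene name) pairs and that gene names are compared by exact string match; the text does not say whether NASCENT normalizes gene names (for example by case), so that choice, like the function names and placeholder identifiers, is an assumption.

```python
from collections import defaultdict


def index_by_gene(entries):
    """Map each gene name to the set of accessions that carry it."""
    index = defaultdict(set)
    for accession, gene in entries:
        index[gene].add(accession)
    return index


def map_proteins(source_entries, target_entries):
    """Map each source accession to the target accessions sharing a gene name."""
    target_index = index_by_gene(target_entries)
    mapping = defaultdict(set)
    for accession, gene in source_entries:
        if gene in target_index:
            mapping[accession] |= target_index[gene]
    return dict(mapping)


# Placeholder accessions and gene names, not real database entries.
source_entries = [("SRC_A", "geneA"), ("SRC_B", "geneB"), ("SRC_C", "geneC")]
target_entries = [("TGT_A1", "geneA"), ("TGT_A2", "geneA"), ("TGT_B1", "geneB")]
mapping = map_proteins(source_entries, target_entries)
print(mapping)
```

Source proteins whose gene name has no counterpart in the target dictionary (here `SRC_C`) simply receive no mapping, which is how an empty result can arise for sparsely annotated targets.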

Multiple correspondences are handled as follows

Suppose that the source organism is X and the target organism is Y. If the interaction edge A-B, connecting proteins A and B, is present in the source organism X, protein A corresponds to proteins A1, A2 and A3 in organism Y, and protein B corresponds to protein B1 in Y, then all three edges A1-B1, A2-B1 and A3-B1 are added to the network of the target.

The graphical user interface was built with the Prefuse toolkit (http://prefuse.org). The Java installer was created with install4j from ej-technologies (http://www.ej-technologies.com).
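The multiple-correspondence rule for the edge A-B described above can be written as a short sketch. It assumes that the rule generalizes to every pairing of the target counterparts of the two endpoints (a Cartesian product) and that edges with an unmapped endpoint are dropped; the text states only the three-to-one example, so both points, and the function name `transfer_edges`, are assumptions.

```python
from itertools import product


def transfer_edges(source_edges, mapping):
    """Expand each source edge into all pairings of its mapped endpoints."""
    target_edges = set()
    for a, b in source_edges:
        # mapping.get(..., ()) yields no pairings when an endpoint is unmapped
        for a_t, b_t in product(mapping.get(a, ()), mapping.get(b, ())):
            target_edges.add(frozenset((a_t, b_t)))
    return target_edges


# The example from the text: A maps to A1, A2, A3 and B maps to B1.
mapping = {"A": {"A1", "A2", "A3"}, "B": {"B1"}}
target = transfer_edges([("A", "B")], mapping)
print(sorted(tuple(sorted(edge)) for edge in target))
```

Storing edges as sets keeps the target network free of duplicates when several source edges map onto the same pair of target proteins.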

Caveats

NASCENT returns an empty graph for target organisms with very little data in the Swiss-Prot database. For example, if one tries to generate the network of Canis lupus from the network of Mus musculus with the “Include subtree” box next to the target organism unchecked, the resulting graph will be empty, since there is relatively little data on the grey wolf in UniProt. However, if the “Include subtree” box is checked, the data of all subspecies are screened, including those of the domestic dog, so the graph will not be empty. We recommend checking that box whenever NASCENT returns an empty graph. At present, only NCBI Taxonomy IDs of species can be entered; the codes of subspecies will not generate output, except when all subspecies of a given species are screened, as in the example above. Note that generating large force-directed layouts is resource-hungry.

Utility

The most useful application is generating protein interaction networks for important non-model organisms that have a large amount of data in UniProt. The generated network can be exported into popular graph drawing and network analysis software, since text, SIF and GraphML output formats are offered. For a quick review, the Java applet can be launched by clicking “Show Graph” on the Results page. There the nodes can be moved around, and the neighbors of a node are highlighted automatically when the mouse passes over it.

References

1. The Database of Interacting Proteins: 2004 update.

Authors: Lukasz Salwinski; Christopher S Miller; Adam J Smith; Frank K Pettit; James U Bowie; David Eisenberg
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

2. Towards a proteome-scale map of the human protein-protein interaction network.

Authors: Jean-François Rual; Kavitha Venkatesan; Tong Hao; Tomoko Hirozane-Kishikawa; Amélie Dricot; Ning Li; Gabriel F Berriz; Francis D Gibbons; Matija Dreze; Nono Ayivi-Guedehoussou; Niels Klitgord; Christophe Simon; Mike Boxem; Stuart Milstein; Jennifer Rosenberg; Debra S Goldberg; Lan V Zhang; Sharyl L Wong; Giovanni Franklin; Siming Li; Joanna S Albala; Janghoo Lim; Carlene Fraughton; Estelle Llamosas; Sebiha Cevik; Camille Bex; Philippe Lamesch; Robert S Sikorski; Jean Vandenhaute; Huda Y Zoghbi; Alex Smolyar; Stephanie Bosak; Reynaldo Sequerra; Lynn Doucette-Stamm; Michael E Cusick; David E Hill; Frederick P Roth; Marc Vidal
Journal: Nature Date: 2005-09-28 Impact factor: 49.962

3. Comparison of human protein-protein interaction maps.

Authors: Matthias E Futschik; Gautam Chaurasia; Hanspeter Herzel
Journal: Bioinformatics Date: 2007-01-19 Impact factor: 6.937

4. A map of the interactome network of the metazoan C. elegans.

Authors: Siming Li; Christopher M Armstrong; Nicolas Bertin; Hui Ge; Stuart Milstein; Mike Boxem; Pierre-Olivier Vidalain; Jing-Dong J Han; Alban Chesneau; Tong Hao; Debra S Goldberg; Ning Li; Monica Martinez; Jean-François Rual; Philippe Lamesch; Lai Xu; Muneesh Tewari; Sharyl L Wong; Lan V Zhang; Gabriel F Berriz; Laurent Jacotot; Philippe Vaglio; Jérôme Reboul; Tomoko Hirozane-Kishikawa; Qianru Li; Harrison W Gabel; Ahmed Elewa; Bridget Baumgartner; Debra J Rose; Haiyuan Yu; Stephanie Bosak; Reynaldo Sequerra; Andrew Fraser; Susan E Mango; William M Saxton; Susan Strome; Sander Van Den Heuvel; Fabio Piano; Jean Vandenhaute; Claude Sardet; Mark Gerstein; Lynn Doucette-Stamm; Kristin C Gunsalus; J Wade Harper; Michael E Cusick; Frederick P Roth; David E Hill; Marc Vidal
Journal: Science Date: 2004-01-02 Impact factor: 47.728

5. Functional organization of the yeast proteome by a yeast interactome map.

Authors: André X C N Valente; Seth B Roberts; Gregory A Buck; Yuan Gao
Journal: Proc Natl Acad Sci U S A Date: 2009-01-21 Impact factor: 11.205

6. IntAct--open source resource for molecular interaction data.

Authors: S Kerrien; Y Alam-Faruque; B Aranda; I Bancarz; A Bridge; C Derow; E Dimmer; M Feuermann; A Friedrichsen; R Huntley; C Kohler; J Khadake; C Leroy; A Liban; C Lieftink; L Montecchi-Palazzi; S Orchard; J Risse; K Robbe; B Roechert; D Thorneycroft; Y Zhang; R Apweiler; H Hermjakob
Journal: Nucleic Acids Res Date: 2006-12-01 Impact factor: 16.971

7. MINT: the Molecular INTeraction database.

Authors: Andrew Chatr-aryamontri; Arnaud Ceol; Luisa Montecchi Palazzi; Giuliano Nardelli; Maria Victoria Schneider; Luisa Castagnoli; Gianni Cesareni
Journal: Nucleic Acids Res Date: 2006-11-29 Impact factor: 16.971

8. Human Protein Reference Database--2009 update.

Authors: T S Keshava Prasad; Renu Goel; Kumaran Kandasamy; Shivakumar Keerthikumar; Sameer Kumar; Suresh Mathivanan; Deepthi Telikicherla; Rajesh Raju; Beema Shafreen; Abhilash Venugopal; Lavanya Balakrishnan; Arivusudar Marimuthu; Sutopa Banerjee; Devi S Somanathan; Aimy Sebastian; Sandhya Rani; Somak Ray; C J Harrys Kishore; Sashi Kanth; Mukhtar Ahmed; Manoj K Kashyap; Riaz Mohmood; Y L Ramachandra; V Krishna; B Abdul Rahiman; Sujatha Mohan; Prathibha Ranganathan; Subhashri Ramabadran; Raghothama Chaerkady; Akhilesh Pandey
Journal: Nucleic Acids Res Date: 2008-11-06 Impact factor: 16.971

9. UniHI 4: new tools for query, analysis and visualization of the human protein-protein interactome.

Authors: Gautam Chaurasia; Soniya Malhotra; Jenny Russ; Sigrid Schnoegl; Christian Hänig; Erich E Wanker; Matthias E Futschik
Journal: Nucleic Acids Res Date: 2008-11-04 Impact factor: 16.971

10. The Universal Protein Resource (UniProt) 2009.

Authors:
Journal: Nucleic Acids Res Date: 2008-10-04 Impact factor: 16.971
