AtPID: a genome-scale resource for genotype-phenotype associations in Arabidopsis.

Qi Lv1,2, Yiheng Lan1, Yan Shi1, Huan Wang1, Xia Pan1, Peng Li3, Tieliu Shi4.   

Abstract

AtPID (Arabidopsis thaliana Protein Interactome Database, available at http://www.megabionet.org/atpid) is an integrated database resource for protein interaction networks and functional annotation. In the past few years, we collected 5564 mutants with significant morphological alterations and manually curated them into 167 plant ontology (PO) morphology categories. These single- and multiple-gene mutants were indexed and linked to 3919 genes. After integrating these genotype-phenotype associations with the comprehensive protein interaction network in AtPID, we developed a Naïve Bayes method and predicted 4457 novel high-confidence gene-PO pairs involving 1369 genes as a complement. Along with the accumulated novel data for protein interaction and functional annotation, and the updated visualization toolkits, we present a genome-scale resource for genotype-phenotype associations for Arabidopsis in AtPID 5.0. In our updated website, all the new genotype-phenotype associations from mutants, the protein network, and the protein annotation information can be displayed together in a comprehensive network view, which will support systematic studies of plant protein function and genotype-phenotype associations.
© The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2016        PMID: 27899679      PMCID: PMC5210528          DOI: 10.1093/nar/gkw1029

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Protein functional annotation and protein networks, including protein–protein interactions and regulatory relations, are essential for understanding the underlying mechanisms of a biological system. AtPID is a comprehensive data resource for protein interactions and functional annotation, developed using Arabidopsis thaliana as the model system. In 2005, we began collecting protein–protein interactions (PPIs) from the literature and released AtPID 1.0, which included only a limited set of curated PPIs and protein functional annotations from TAIR (The Arabidopsis Information Resource) (1). Because of the increasing demand for a comprehensive PPI network from related research communities, we extended the PPI network with several computational methods and released AtPID 2.0 in 2006. To further increase coverage and reduce false positives in the predicted dataset, we manually curated more PPIs from the literature and developed a Naïve Bayes-based classifier to integrate and evaluate all the predicted PPIs. This work led to AtPID 3.0 in 2008, a rich source of information for system-level understanding of gene function and biological processes (2). To better serve research communities studying the mechanisms of various physiological activities, in AtPID 4.0 we annotated the Arabidopsis proteins with further information (e.g. functional annotation, subcellular localization, tissue-specific expression, phosphorylation information, SNP phenotype and mutant phenotype) and interaction qualifications (e.g. transcriptional regulation, complex assembly and functional collaboration) via additional literature text mining and integration of other resources (3) (Table 1).
Table 1. A comprehensive comparison of the different versions of the AtPID database

Column groups: Function annotation (columns 2-5); Molecule interaction (columns 6-9).

Version of AtPID | Protein functional description | Subcellular localization | Mutant | Phenotype annotation | Curated PPIs | Predicted PPIs | Curated transcriptional regulations | Predicted transcriptional regulations
AtPID 3 (2) | 32 000 | - | - | | 4666 | 23 396 | |
AtPID 4 (3) | 40 000 | 10 429 | 5121 mutants, 3431 genes | | 5565 | 98 174 | 8070 |
AtPID 5 (Current Version) | 40 000 | 11 052 | 5609 mutants, 3916 genes | 8202 mutant-PO associations | 45 382 | 118 556 | 9435 | 31 991

Compared with other organisms, plants have unique advantages for mutagenesis and tissue culture. A large number of characterized stable Arabidopsis mutants have been reported in the research literature, and large-scale seed and mutant resources have been built for genome annotation and functional studies, e.g. the uNASC Database (The European Arabidopsis Stock Centre), RAPID (RIKEN Arabidopsis Phenome Information Database), the CSHL Database (the Arabidopsis Genetrap Website at Cold Spring Harbor Lab), the Chloroplast Function Database, the SeedGenes Database, the AGRICOLA Database (Systematic RNAi knockouts in Arabidopsis), Araport (the Arabidopsis Information Portal) and TAIR (4–10). Mutant phenotypes are especially critical for functional studies of plants. Although great efforts have been made to collect such data in plants, mutant phenotypes are still largely under-annotated. AtPID is committed to collecting more mutants with significant morphological alterations and to annotating all mutant phenotypes in a systematic way. The Plant Ontology is a controlled vocabulary (ontology) that describes plant anatomy and morphology and stages of development for all plants (11). To index and annotate all the mutants in AtPID within a standard semantic framework, we cooperated with the Shanghai Society for Plant Biology and annotated all the mutants to more specific downstream PO categories. In this update, the AtPID 5.0 database greatly expands the information on PPIs and mutant phenotypes obtained from published literature (12–14), public databases and computational approaches. The mutant phenotype data were carefully curated by biologists. In addition, novel associations between genes and phenotypes were predicted with a Naïve Bayes method.
Furthermore, we developed a more comprehensive visualization toolkit to view all the interactions at the PPI, transcriptional regulation and genotype–phenotype levels within the same framework, and to map the other annotation information in our database onto selected genes. These improvements and updates will help researchers exploit the information in our database more effectively and comprehensively.

RESULTS

Summary of new data in the updated AtPID 5.0

Compared with other widely used PPI resources (Table 2), the updated database indexes 45 382 curated PPIs and 118 556 predicted PPIs from literature mining, public databases or computational approaches. These numbers have increased substantially owing to the rapid growth and maturation of biomedical natural language processing and to large-scale experiments for functional studies (15–17). We also generated a comprehensive chloroplast proteomics dataset in Arabidopsis by large-scale proteomics experiments and indexed all 3134 credible chloroplast proteins into our annotation system. Furthermore, we systematically annotated 31 991 TFBS associations to 6891 genes based on the integration of expression profiling and cis-regulatory element information. This update greatly enriches the protein annotations in our database by tracking recent research progress in related areas and will assist functional experiments and systematic studies.
Table 2. Numbers of interactions in AtPID 5.0 compared with other widely used data resources

PPI-related database | Description of the PPI database | Curated PPIs | Predicted PPIs
AtPID 5.0 | An integrated database resource for protein interaction network and functional annotation proteome (http://www.megabionet.org/atpid) | 45 382 | 118 556
PAIR | The predicted Arabidopsis interactome resource (http://www.cls.zju.edu.cn/pair/) (24) | 5990 | 137 986
TAIR | A database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana (http://www.arabidopsis.org) (25) | 6503 |
BioGRID | An interaction repository with data compiled through comprehensive curation efforts (http://thebiogrid.org/) (26) | 42 216 |
STRING | A database of predicted functional associations between proteins (http://string-db.org/) (27) | | >1 000 000

Comprehensive annotation of genotype–phenotype associations

Using text mining and database integration, the previous version (AtPID 4.1) collected 5121 mutants with significant observable phenotypes related to 3431 genes. In the past few years, through in-depth cooperation with the Shanghai Society for Plant Biology, we collected 488 new mutants and systematically annotated the phenotypes of all existing and newly curated mutants to 167 standardized plant ontology categories (Figure 1A). Comprehensive collection of phenotype data can support studies of phenotype mechanisms, as has been done in the systematic exploration of disease associations (18,19). Strategies and algorithms have been developed to predict gene-related functions by integrating data at multiple levels (18–20). We integrated three types of information (PPIs, co-expression from expression profiling, and GO annotation) with a Naïve Bayes method. PPIs were quantified by the extended Czekanowski–Dice distance (21), and missing values were filled in using orthologs from experimental PPIs of 14 other species in the STRING database (22). Shared Smallest Biological Processes (SSBPs) were used to describe the likelihood of gene interactions based on GO annotation (23). Co-expression of gene pairs was computed over the microarrays mentioned above to predict regulatory interactions. The pairwise correlation coefficients among the three types of information were low (PPIs-GO: 0.05; PPIs-co-expression: 0.08; co-expression-GO: −0.03), suggesting that the features are approximately independent of each other and satisfy the independence assumption of the Naïve Bayes method. The Naïve Bayes model was built with the e1071 package in R. The model achieved an average AUC of 0.72. Finally, the prediction contains 4457 novel gene-PO pairs involving 1369 genes, which can supplement the known mutant information.
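As a rough illustration of the prediction step described above, the following Python sketch computes a Czekanowski–Dice distance on a toy interaction map, combines three feature scores with a Gaussian Naïve Bayes model, and scores the result with a rank-based AUC. It is not the authors' implementation: the original analysis used the e1071 package in R and an extended form of the distance that is not specified in this text. The standard (non-extended) distance, the Gaussian likelihood, all function names and the toy numbers are therefore assumptions made for illustration only.

```python
import math
from statistics import mean, pvariance


def czekanowski_dice(neighbors, a, b):
    """Standard Czekanowski-Dice distance between proteins a and b.

    `neighbors` maps a protein to the set of its interaction partners.
    Each protein is added to its own partner set (assumed convention;
    the extended variant used in the paper is not reproduced here).
    """
    int_a = set(neighbors.get(a, ())) | {a}
    int_b = set(neighbors.get(b, ())) | {b}
    return len(int_a ^ int_b) / (len(int_a | int_b) + len(int_a & int_b))


def fit_gaussian_nb(rows, labels):
    """Estimate a class prior and per-feature (mean, variance) for each class."""
    model = {}
    for cls in set(labels):
        cls_rows = [r for r, y in zip(rows, labels) if y == cls]
        # A small floor keeps every variance strictly positive.
        stats = [(mean(col), max(pvariance(col), 1e-6)) for col in zip(*cls_rows)]
        model[cls] = (len(cls_rows) / len(rows), stats)
    return model


def posterior_positive(model, row):
    """Posterior probability of class 1, treating features as independent."""
    log_joint = {}
    for cls, (prior, stats) in model.items():
        total = math.log(prior)
        for x, (mu, var) in zip(row, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        log_joint[cls] = total
    peak = max(log_joint.values())  # subtract the peak for numerical stability
    weights = {cls: math.exp(v - peak) for cls, v in log_joint.items()}
    return weights[1] / sum(weights.values())


def auc(scores, labels):
    """Chance that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented toy interaction map; protein names are placeholders.
    ppi = {"p1": {"p2", "p3"}, "p2": {"p1", "p3", "p4"},
           "p3": {"p1", "p2"}, "p4": {"p2"}}
    print(czekanowski_dice(ppi, "p1", "p2"))

    # Invented feature rows: (PPI score, GO-based score, co-expression).
    rows = [(0.9, 0.8, 0.7), (0.8, 0.9, 0.6), (0.7, 0.7, 0.8),
            (0.2, 0.1, 0.3), (0.1, 0.3, 0.2), (0.3, 0.2, 0.1)]
    labels = [1, 1, 1, 0, 0, 0]
    model = fit_gaussian_nb(rows, labels)
    scores = [posterior_positive(model, r) for r in rows]
    print(auc(scores, labels))
```

In practice the AUC would be estimated on held-out gene-PO pairs (for example by cross-validation) rather than on the training rows used in this toy run.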
Figure 1.

The overview and network display of the curated genotype–phenotype associations in AtPID 5.0. Top left corner (A) exhibits the top level Plant Ontology (PO) entries in Arabidopsis and the annotated gene numbers related to this PO. Bottom right corner (B) shows the flower-associated network. (C) Node with mouse hovering annotation.

User-friendly visualization toolkit for the comprehensive genotype–phenotype network

To display the phenotype annotation information, we redeveloped the network visualization application (Figure 1B and C) in JavaScript. The new application inherits all the functions of the previous Java applet and adds phenotype as a new node type. It has better compatibility and performance owing to the optimized database structure and network generation methods, and it presents the network in a more interactive and comprehensive way. All the protein annotation information and protein relations in AtPID 5.0 can be presented simultaneously in the same view, and users can extend the network by double-clicking any node on the border of the current network. Combining genotype–phenotype associations with protein interaction information presents existing knowledge about selected proteins in an intuitive way, helping biologists understand functional relations, test their hypotheses and design new studies.
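The expandable network view described above can be pictured as a graph with typed nodes and typed edges, in which double-clicking a border node adds its direct neighbors to the current view. The Python sketch below illustrates that idea only. The actual application is written in JavaScript, and the class, function, node and edge names used here are hypothetical rather than taken from AtPID.

```python
from collections import defaultdict


class TypedNetwork:
    """Undirected graph whose nodes and edges carry a type label."""

    def __init__(self):
        self.node_type = {}             # node name -> e.g. "protein" or "phenotype"
        self.links = defaultdict(dict)  # node name -> {neighbor: edge type}

    def add_node(self, name, kind):
        self.node_type[name] = kind

    def add_edge(self, a, b, kind):
        # Store the edge in both directions so lookups are symmetric.
        self.links[a][b] = kind
        self.links[b][a] = kind

    def neighbors(self, name):
        """Return a copy of {neighbor: edge type} for one node."""
        return dict(self.links.get(name, {}))


def expand(network, visible, node):
    """Return the visible node set after revealing the direct neighbors of node."""
    return set(visible) | {node} | set(network.neighbors(node))


def border_nodes(network, visible):
    """Visible nodes that still have at least one hidden neighbor."""
    return {n for n in visible if set(network.neighbors(n)) - set(visible)}


if __name__ == "__main__":
    # Placeholder genes and a placeholder phenotype node, not real AtPID records.
    net = TypedNetwork()
    for gene in ("gene_a", "gene_b", "gene_c"):
        net.add_node(gene, "protein")
    net.add_node("flower", "phenotype")
    net.add_edge("gene_a", "gene_b", "ppi")
    net.add_edge("gene_b", "gene_c", "transcriptional regulation")
    net.add_edge("gene_a", "flower", "genotype-phenotype")

    view = expand(net, set(), "gene_a")    # initial query
    print(sorted(view), sorted(border_nodes(net, view)))
    view = expand(net, view, "gene_b")     # simulated double-click on a border node
    print(sorted(view), sorted(border_nodes(net, view)))
```

Keeping phenotype terms as ordinary nodes with their own type label lets protein interactions, regulatory relations and genotype–phenotype links share one view, which matches the single-framework display described in the text.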

CONCLUSIONS

Here, we provide a substantially improved resource for genotype–phenotype associations, which can support experimental design and facilitate genome-wide systematic studies in Arabidopsis. AtPID 5.0 also provides illustrations of the functional annotation and protein network through a friendly web-based interface. We have greatly extended the current annotation information in AtPID 5.0 through literature curation, bioinformatics predictions and high-throughput experimental data; for example, we generated a comprehensive chloroplast proteomics dataset in Arabidopsis by large-scale proteomics experiments and indexed all the data as evidence for subcellular localization. We will continue to accumulate more genome-wide data to better serve the research community.
  27 in total

1.  A trial of phenome analysis using 4000 Ds-insertional mutants in gene-coding regions of Arabidopsis.

Authors:  Takashi Kuromori; Takuji Wada; Asako Kamiya; Masahiro Yuguchi; Takuro Yokouchi; Yuko Imura; Hiroko Takabe; Tetsuya Sakurai; Kenji Akiyama; Takashi Hirayama; Kiyotaka Okada; Kazuo Shinozaki
Journal:  Plant J       Date:  2006-06-30       Impact factor: 6.417

2.  Systematic identification of human mitochondrial disease genes through integrative genomics.

Authors:  Sarah Calvo; Mohit Jain; Xiaohui Xie; Sunil A Sheth; Betty Chang; Olga A Goldberger; Antonella Spinazzola; Massimo Zeviani; Steven A Carr; Vamsi K Mootha
Journal:  Nat Genet       Date:  2006-04-02       Impact factor: 38.330

Review 3.  Protein networks in disease.

Authors:  Trey Ideker; Roded Sharan
Journal:  Genome Res       Date:  2008-04       Impact factor: 9.043

4.  Gene trap lines define domains of gene regulation in Arabidopsis petals and stamens.

Authors:  Naomi Nakayama; Juana M Arroyo; Joseph Simorowski; Bruce May; Robert Martienssen; Vivian F Irish
Journal:  Plant Cell       Date:  2005-07-29       Impact factor: 11.277

5.  Versatile gene-specific sequence tags for Arabidopsis functional genomics: transcript profiling and reverse genetics applications.

Authors:  Pierre Hilson; Joke Allemeersch; Thomas Altmann; Sébastien Aubourg; Alexandra Avon; Jim Beynon; Rishikesh P Bhalerao; Frédérique Bitton; Michel Caboche; Bernard Cannoot; Vasil Chardakov; Cécile Cognet-Holliger; Vincent Colot; Mark Crowe; Caroline Darimont; Steffen Durinck; Holger Eickhoff; Andéol Falcon de Longevialle; Edward E Farmer; Murray Grant; Martin T R Kuiper; Hans Lehrach; Céline Léon; Antonio Leyva; Joakim Lundeberg; Claire Lurin; Yves Moreau; Wilfried Nietfeld; Javier Paz-Ares; Philippe Reymond; Pierre Rouzé; Goran Sandberg; Maria Dolores Segura; Carine Serizet; Alexandra Tabrett; Ludivine Taconnat; Vincent Thareau; Paul Van Hummelen; Steven Vercruysse; Marnik Vuylsteke; Magdalena Weingartner; Peter J Weisbeek; Valtteri Wirta; Floyd R A Wittink; Marc Zabeau; Ian Small
Journal:  Genome Res       Date:  2004-10       Impact factor: 9.043

6.  The BioGRID interaction database: 2015 update.

Authors:  Andrew Chatr-Aryamontri; Bobby-Joe Breitkreutz; Rose Oughtred; Lorrie Boucher; Sven Heinicke; Daici Chen; Chris Stark; Ashton Breitkreutz; Nadine Kolas; Lara O'Donnell; Teresa Reguly; Julie Nixon; Lindsay Ramage; Andrew Winter; Adnane Sellam; Christie Chang; Jodi Hirschman; Chandra Theesfeld; Jennifer Rust; Michael S Livstone; Kara Dolinski; Mike Tyers
Journal:  Nucleic Acids Res       Date:  2014-11-26       Impact factor: 19.160

7.  Araport: the Arabidopsis information portal.

Authors:  Vivek Krishnakumar; Matthew R Hanlon; Sergio Contrino; Erik S Ferlanti; Svetlana Karamycheva; Maria Kim; Benjamin D Rosen; Chia-Yi Cheng; Walter Moreira; Stephen A Mock; Joseph Stubbs; Julie M Sullivan; Konstantinos Krampis; Jason R Miller; Gos Micklem; Matthew Vaughn; Christopher D Town
Journal:  Nucleic Acids Res       Date:  2014-11-20       Impact factor: 16.971

8.  Sustainable funding for biocuration: The Arabidopsis Information Resource (TAIR) as a case study of a subscription-based funding model.

Authors:  Leonore Reiser; Tanya Z Berardini; Donghui Li; Robert Muller; Emily M Strait; Qian Li; Yarik Mezheritsky; Andrey Vetushko; Eva Huala
Journal:  Database (Oxford)       Date:  2016-03-17       Impact factor: 3.451

9.  Clustering proteins from interaction networks for the prediction of cellular functions.

Authors:  Christine Brun; Carl Herrmann; Alain Guénoche
Journal:  BMC Bioinformatics       Date:  2004-07-13       Impact factor: 3.169

10.  OpenDMAP: an open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-type-specific gene expression.

Authors:  Lawrence Hunter; Zhiyong Lu; James Firby; William A Baumgartner; Helen L Johnson; Philip V Ogren; K Bretonnel Cohen
Journal:  BMC Bioinformatics       Date:  2008-01-31       Impact factor: 3.169
