Toshinori Endo, Keisuke Ueno, Kouki Yonezawa, Katsuhiko Mineta, Kohji Hotta, Yutaka Satou, Lixy Yamada, Michio Ogasawara, Hiroki Takahashi, Ayako Nakajima, Mia Nakachi, Mamoru Nomura, Junko Yaguchi, Yasunori Sasakura, Chisato Yamasaki, Miho Sera, Akiyasu C Yoshizawa, Tadashi Imanishi, Hisaaki Taniguchi, Kazuo Inaba.
Abstract
The Ciona intestinalis protein database (CIPRO) is an integrated protein database for the tunicate species C. intestinalis. The database is unique in two respects: first, because of its phylogenetic position, Ciona is a suitable model for understanding vertebrate evolution; and second, the database includes original large-scale transcriptomic and proteomic data. Ciona intestinalis has also been a favorite of developmental biologists. Therefore, large amounts of data exist on its development and morphology, along with a recent genome sequence and gene expression data. The CIPRO database aims to collect these published data and to provide unique information from unpublished experimental data, such as 3D expression profiling, 2D-PAGE and mass spectrometry-based large-scale analyses at various developmental stages, curated annotation data and various bioinformatic data, to facilitate research in diverse areas, including developmental, comparative and evolutionary biology. For medical and evolutionary research, homologs in humans and major model organisms are intentionally included. The current database is based on a recently developed KH model containing 36,034 unique sequences, but for higher usability it covers all 89,683 known and predicted proteins from all gene models for this species. Of these sequences, more than 10,000 proteins have been manually annotated. Furthermore, to establish a community-supported protein database, these annotations are open to evaluation by users through the CIPRO website. CIPRO 2.5 is freely accessible at http://cipro.ibio.jp/2.5.
Year: 2010 PMID: 21071393 PMCID: PMC3013717 DOI: 10.1093/nar/gkq1144
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Similarity and the number of shared sequences among proteomes contained in CIPRO. Five proteomes and one local protein data set for Ciona intestinalis contained in CIPRO are shown. Branch lengths represent approximate distances between data sets and their internal nodes, based on the proportion of sequences shared with their neighbors. The numbers in cyan circles show the number of sequences shared by descendant nodes.
Figure 2. A sample data view for protein entry KH.C2.187.v3.A.SL2-2. Sequence and functional information are shown on the left side of the window and experimental data are summarized as informative graphics in the right panel. Components indicate (1) protein short name and description, (2) amino acid length, deduced molecular weight, deduced isoelectric point, existence of stop codon and amino acid sequence, (3) link to BLAST search site at NCBI with the sequence filled in the query form, (4) homolog and motif information, (5) miscellaneous literal information including disease, automatic annotation, phylogeny, hits to KEGG Ortholog Cluster and duplicated sequences, (6) identical sequence entries, (7) experimental results, (8) graphical results of bioinformatics analyses, (9) user annotation facility and (10) user comments in which formatted text with links and pictures can be integrated.
Criteria for automated annotation
| Category | Criteria | Notation | Unique entries |
|---|---|---|---|
| I | ≥50% identity, ≥50% coverage | HOMOLOGOUS TO | 10,170 |
| II | ≥25% identity | SIMILAR TO | 11,077 |
| III | Found a motif or domain in databases | XXX domain containing proteins | 18,927 |
| IV | Predicted proteins with evolutionary conservation | Conserved hypothetical proteins | 6,372 |
| V | Predicted proteins longer than or equal to 80 amino acids | Hypothetical proteins | 28,430 |
| VI | Predicted proteins shorter than 80 amino acids | Hypothetical short proteins | 14,697 |
The higher category always takes precedence for the annotation of each protein.
(a) Homology to predicted proteins is not counted in this category.
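The precedence rule in the table can be read as a simple cascade in which each protein receives the first category whose criterion it meets. The following Python sketch illustrates that reading only; it is not CIPRO's implementation. The `ProteinEvidence` class, its field names and the `annotation_category` function are hypothetical, and the sketch assumes that identity and coverage are percentages taken from a single best database hit, which the table does not specify.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ProteinEvidence:
    """Hypothetical per-protein summary; field names are illustrative, not CIPRO's schema."""
    length: int                       # amino acid length
    identity: Optional[float] = None  # percent identity of the best hit, None if no hit (assumption)
    coverage: Optional[float] = None  # percent coverage of the best hit, None if no hit (assumption)
    has_domain: bool = False          # a motif or domain was found in databases
    conserved: bool = False           # predicted protein with evolutionary conservation


def annotation_category(p: ProteinEvidence) -> Tuple[str, str]:
    """Return (category, notation), testing categories in order from I to VI."""
    has_hit = p.identity is not None
    if has_hit and p.identity >= 50 and p.coverage is not None and p.coverage >= 50:
        return "I", "HOMOLOGOUS TO"
    if has_hit and p.identity >= 25:
        return "II", "SIMILAR TO"
    if p.has_domain:
        return "III", "XXX domain containing proteins"
    if p.conserved:
        return "IV", "Conserved hypothetical proteins"
    if p.length >= 80:
        return "V", "Hypothetical proteins"
    return "VI", "Hypothetical short proteins"


# A hit with 31.5% identity fails category I but satisfies category II.
print(annotation_category(ProteinEvidence(length=120, identity=31.5, coverage=90.0)))
# → ('II', 'SIMILAR TO')
```

Because the checks run from category I downward, a protein that has both a strong homolog and a recognized domain is reported under category I, which matches the statement that the higher category takes precedence.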