Atsushi Kurotani, Yutaka Yamada, Kazuo Shinozaki, Yutaka Kuroda, Tetsuya Sakurai.
Abstract
Arabidopsis thaliana is an important model species for studies of plant gene functions. Research on Arabidopsis has resulted in the generation of high-quality genome sequences, annotations and related post-genomic studies. The amount of annotation, such as gene-coding regions and structures, is steadily growing in the field of plant research. In contrast to the genomic resources for animals and microorganisms, some gene functions remain difficult to characterize in plant genomics studies. The acquisition of information on protein structure can help elucidate the corresponding gene function because proteins encoded in the genome possess highly specific structures and functions. In this study, we calculated multiple physicochemical and secondary structural parameters of protein sequences, including length, hydrophobicity, the amount of secondary structure, the number of intrinsically disordered regions (IDRs) and the predicted presence of transmembrane helices and signal peptides, using a total of 208,333 protein sequences from the genomes of six representative plant species: Arabidopsis thaliana, Glycine max (soybean), Populus trichocarpa (poplar), Oryza sativa (rice), Physcomitrella patens (moss) and Cyanidioschyzon merolae (alga). Using the PASS tool and the Rosetta Stone method, we annotated the presence of novel functional regions in 1,732 protein sequences that included unannotated sequences from the Arabidopsis and rice proteomes. These results were organized into the Plant Protein Annotation Suite database (Plant-PrAS), which can be freely accessed online at http://plant-pras.riken.jp/.
Keywords: Database; Gene function; Physicochemical property; Plant protein; Protein property
Year: 2014 PMID: 25435546 PMCID: PMC4301743 DOI: 10.1093/pcp/pcu176
Source DB: PubMed Journal: Plant Cell Physiol ISSN: 0032-0781 Impact factor: 4.927
Detection of novel functional regions in the unannotated protein sequences of Arabidopsis and rice by means of Plant-PrAS (Plant Protein Annotation Suite database)
| Plant species | Unannotated sequences | Pfam(+) a | Pfam(–): PASS(+) b | Pfam(–): Rosetta Stone composite(+) c | Pfam(–): Rosetta Stone component(+) d |
|---|---|---|---|---|---|
| Arabidopsis | 5,180 | 312 | 111 | 421 | 63 |
| MSU Rice | 15,322 | 640 | 111 | 280 | 225 |
| RAP-DB (rice) | 14,716 | 1,518 | 301 | 307 | 412 |
| Total | 35,218 | 2,470 | 523 | 1,008 | 700 |
a The number of protein hits in the Pfam database.
b The number of proteins whose functional regions were detected by PASS but not by Pfam [Pfam(–)].
c The number of proteins whose functional regions were detected as Rosetta Stone composites with Pfam(–).
d The number of proteins whose functional regions were detected as Rosetta Stone components with Pfam(–).
Fig. 1. Search interfaces of Plant-PrAS. (A) The ‘Property Search’ page, where a user can search by multiple protein sequence properties. (B) The ‘Keyword Search’ function, which finds records of interest by keyword. (C) ‘ID Search’, which finds records of interest by IDs from public databases.
Fig. 2. Examples of search results in Plant-PrAS. (A) The results of Property Search. (B) The results of Keyword or ID Search.
Fig. 3. Typical examples of the annotation details of proteins in Plant-PrAS. (A) Basic information on a protein in Plant-PrAS. (B) Physical and sequence properties. (C) Structural properties. (D) The detected functional regions. (E) Functional annotation. (F) Modifications and subcellular localization. (G) Summary with average, median and percentile values in relation to proteins from the same species (as a background distribution).
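The summary in panel (G) places one protein's property against the distribution of that property across proteins of the same species. The following is a minimal sketch of such a summary, not the Plant-PrAS implementation: the function name, the example lengths and the percentile convention (share of background values less than or equal to the query value) are all assumptions made for illustration.

```python
from bisect import bisect_right
from statistics import mean, median


def summarize_against_background(value, background):
    """Summarize `value` relative to a background distribution.

    Hypothetical helper: the percentile is taken here as the share of
    background values <= value, which is only one of several conventions.
    """
    ordered = sorted(background)
    rank = bisect_right(ordered, value)  # count of background values <= value
    return {
        "mean": mean(ordered),
        "median": median(ordered),
        "percentile": 100.0 * rank / len(ordered),
    }


# Made-up protein lengths standing in for one species' background distribution.
background_lengths = [120, 250, 310, 400, 520, 640, 800, 950]
summary = summarize_against_background(400, background_lengths)
print(summary)
```

The same helper could be applied to any numeric property, such as hydrophobicity or the number of IDRs, provided the background is restricted to proteins from the same species.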
The percentage of soluble proteins (among all proteins) in Arabidopsis and rice
| Species | Category | No. of sequences (soluble/total) | Percentage of soluble proteins |
|---|---|---|---|
| Arabidopsis | Annotated | 7,545/21,146 | 35.7% |
| | Unannotated | 2,389/5,180 | 46.1% |
| MSU Rice | Annotated | 8,432/24,765 | 34.0% |
| | Unannotated | 8,177/15,322 | 53.4% |
| RAP-DB (rice) | Annotated | 7,579/21,192 | 35.8% |
| | Unannotated | 7,746/14,716 | 52.7% |
In both Arabidopsis and rice, the proportion of soluble proteins is higher among unannotated proteins than among annotated proteins (P < 0.05 in the t-test of the differences between annotated and unannotated proteins).
Proteins that have a solubility score >0.5 according to the SOLpro software were regarded as soluble proteins.
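The classification rule and the percentages in the table can be illustrated with a short sketch. It assumes that per-protein solubility scores are already available as plain floats; the function names are hypothetical, and no SOLpro interface is called or implied. The counts used at the end are the Arabidopsis rows of the table.

```python
SOLUBILITY_THRESHOLD = 0.5  # cutoff stated in the table footnote


def is_soluble(score, threshold=SOLUBILITY_THRESHOLD):
    """A protein counts as soluble when its score is strictly above the cutoff."""
    return score > threshold


def percent_soluble(scores):
    """Percentage of proteins classified as soluble from a list of scores."""
    soluble = sum(1 for s in scores if is_soluble(s))
    return 100.0 * soluble / len(scores)


def percent_from_counts(soluble, total):
    """Percentage from soluble/total counts, rounded to one decimal place."""
    return round(100.0 * soluble / total, 1)


# Made-up scores: a score of exactly 0.5 is not counted as soluble.
example_scores = [0.2, 0.5, 0.7, 0.9]
print(percent_soluble(example_scores))

# Counts taken from the Arabidopsis rows of the table.
print(percent_from_counts(7545, 21146))  # annotated
print(percent_from_counts(2389, 5180))   # unannotated
```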