Atsushi Kurotani, Yutaka Yamada, Tetsuya Sakurai.
Abstract
Algae are smaller organisms than land plants and offer clear research advantages over terrestrial species, including rapid production, short generation times and varied commercial applications. Studies investigating the practical development of effective algal production are therefore important and will improve our understanding of both aquatic and terrestrial plants. In this study, we estimated multiple physicochemical and secondary structural properties, the predicted presence of post-translational modification (PTM) sites, and subcellular localization for a total of 510,123 protein sequences from the proteomes of 31 algal and three land plant species. Algal species were broadly selected from green and red algae, glaucophytes, oomycetes, diatoms and other microalgal groups. The results were deposited in the Algal Protein Annotation Suite database (Alga-PrAS; http://alga-pras.riken.jp/), which can be freely accessed online.
Keywords: Algae; Comparative analysis; Database; Gene function; Protein properties
Year: 2017 PMID: 28069893 PMCID: PMC5444574 DOI: 10.1093/pcp/pcw212
Source DB: PubMed Journal: Plant Cell Physiol ISSN: 0032-0781 Impact factor: 4.927
Percentages of sequences annotated by the KOG, Pfam, UniProtKB, GO and PDB databases
| Class | KOG (%) | Pfam (%) | UniProtKB (%) | GO (%) | PDB (%) | Total (%) |
|---|---|---|---|---|---|---|
| Land plants | 34.2 | 67.9 | 70.7 | 44.9 | 47.4 | 77.3 |
| Algae | 26.9 | 54.6 | 46.1 | 34.7 | 36.6 | 60.3 |
| Green algae | 31.8 | 60.7 | 55.9 | 38.9 | 41.7 | 67.3 |
| Red algae | 34.6 | 61.5 | 55.7 | 41.0 | 44.0 | 67.1 |
| Glaucophyceae | 14.0 | 31.4 | 25.7 | 19.5 | 19.8 | 37.0 |
| Oomycetes | 28.6 | 57.8 | 49.6 | 37.8 | 38.5 | 64.2 |
| Diatoms | 25.1 | 53.7 | 39.8 | 33.7 | 34.2 | 57.9 |
| Other microalgae | 22.8 | 50.8 | 39.1 | 31.2 | 33.4 | 56.0 |
| All species | 28.0 | 56.5 | 49.6 | 36.2 | 38.2 | 62.8 |
a Poor annotations, such as ‘poorly characterized’ in KOG, ‘domain of unknown function (DUF)’ in Pfam, and ‘Uncharacterized protein’, ‘Putative uncharacterized’, ‘Unnamed product’ and ID-only descriptions in UniProtKB, were excluded from the hits.
b Total values were calculated by combining the results of KOG, Pfam, UniProtKB, GO and PDB.
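The per-database percentages and the combined "Total" column can be illustrated with a short sketch. It assumes, as one plausible reading of footnote b, that "combining" means a sequence counts toward Total if it has an informative hit in at least one database. The exclusion terms, the `(hit ID, description)` input shape and all function names are hypothetical and are not taken from Alga-PrAS.

```python
"""Hypothetical sketch: per-database and combined ("Total") annotation rates."""
from typing import Dict, Iterable, Set, Tuple

# Assumed list of uninformative descriptions, modeled on footnote a; the
# database's actual exclusion rules may differ.
POOR_ANNOTATIONS = (
    "poorly characterized",
    "domain of unknown function",
    "uncharacterized protein",
    "putative uncharacterized",
    "unnamed product",
)


def is_informative(description: str, hit_id: str = "") -> bool:
    """Return False for blank, ID-only or 'poor' descriptions."""
    text = description.strip().lower()
    if not text or text == hit_id.strip().lower():
        return False  # empty, or the description is only the hit's ID
    return not any(term in text for term in POOR_ANNOTATIONS)


def annotated_ids(hits: Dict[str, Tuple[str, str]]) -> Set[str]:
    """hits maps query sequence ID -> (best-hit ID, best-hit description)."""
    return {q for q, (hit_id, desc) in hits.items() if is_informative(desc, hit_id)}


def percent_annotated(
    all_ids: Iterable[str],
    per_db_hits: Dict[str, Dict[str, Tuple[str, str]]],
) -> Dict[str, float]:
    ids = set(all_ids)
    if not ids:
        raise ValueError("no sequences given")
    rates: Dict[str, float] = {}
    union: Set[str] = set()
    for db, hits in per_db_hits.items():
        good = annotated_ids(hits) & ids
        rates[db] = 100.0 * len(good) / len(ids)
        union |= good  # assumed "Total": informative hit in at least one database
    rates["Total"] = 100.0 * len(union) / len(ids)
    return rates


# Illustrative, made-up sequence IDs and hits.
ids = ["p1", "p2", "p3", "p4"]
hits = {
    "Pfam": {"p1": ("PF00069", "Protein kinase domain"),
             "p2": ("PF09999", "Domain of unknown function (DUF1234)")},
    "UniProtKB": {"p3": ("A0A000", "A0A000"),
                  "p4": ("P12345", "Ribosomal protein L7")},
}
rates = percent_annotated(ids, hits)
```

Here the DUF hit and the ID-only description are filtered out, so each database annotates one of four sequences while the union covers two. This is why the Total column in the table can exceed every individual database column.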
List of calculated protein properties in this study
| Classification of protein properties | Sub-classification of protein properties |
|---|---|
| Physicochemical properties | Protein length |
| | Percentage of charged residues |
| | Percentage of nonpolar residues |
| | Percentage of acidic residues |
| | Percentage of basic residues |
| | Grand average of hydropathicity (GRAVY) |
| | Isoelectric point (pI) |
| | Probability of protein solubility |
| Structural properties | Percentage of beta-pleated sheet secondary structure |
| | Percentage of disordered residues |
| | Number of long disordered regions |
| | Presence of a signal peptide cleavage site |
| | Number of transmembrane helices |
| | Number of S–S bonds |
| | Number of domain linkers |
| | Number of internal repeats |
| | Number of PEST regions |
| Post-translational modifications (PTMs) and subcellular localization | Number of Ser, Thr and Tyr phosphorylation sites |
| | Number of O-linked glycosylation sites |
| | Number of N-linked glycosylation sites |
| | Number of ubiquitination sites |
| | Protein subcellular localization sites |
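Several of the physicochemical properties listed above are simple sequence statistics. The sketch below computes length, residue-class percentages and GRAVY. GRAVY is the mean Kyte-Doolittle hydropathy value over the residues. The residue groupings (for example, whether His counts as basic) are assumptions for illustration, and the tools actually used to build Alga-PrAS are not reproduced here.

```python
"""Minimal sketch of sequence-level physicochemical properties."""

# Kyte-Doolittle hydropathy values, the scale conventionally used for GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Residue groupings below are illustrative assumptions, not necessarily the
# definitions used for Alga-PrAS.
ACIDIC = frozenset("DE")
BASIC = frozenset("KRH")
CHARGED = ACIDIC | BASIC
NONPOLAR = frozenset("AVLIMFWPG")


def percent_of(seq: str, group: frozenset) -> float:
    """Percentage of residues in seq that belong to group."""
    return 100.0 * sum(aa in group for aa in seq) / len(seq)


def gravy(seq: str) -> float:
    """Mean Kyte-Doolittle hydropathy over the standard residues in seq."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq if aa in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("no standard amino acids in sequence")
    return sum(values) / len(values)


def physicochemical_summary(seq: str) -> dict:
    seq = seq.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    return {
        "length": len(seq),
        "charged_pct": percent_of(seq, CHARGED),
        "nonpolar_pct": percent_of(seq, NONPOLAR),
        "acidic_pct": percent_of(seq, ACIDIC),
        "basic_pct": percent_of(seq, BASIC),
        "gravy": gravy(seq),
    }


summary = physicochemical_summary("MKDEAL")
```

For the toy peptide `MKDEAL`, half of the residues fall in the assumed charged set, and GRAVY is (1.9 − 3.9 − 3.5 − 3.5 + 1.8 + 3.8) / 6 ≈ −0.57. The negative value indicates a hydrophilic sequence on this scale.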
Fig. 1. Property Search interface. (A) Users can search by multiple protein properties on the Property Search page. (B) Example of a summary table of Property Search results.
Fig. 2. Typical example of an annotation detail page. (A) Basic information on a protein in Alga-PrAS. (B) Summary with average, median and percentile values relative to proteins from the same species (upper portion) and to proteins in the same OrthoMCL cluster (lower portion). (C) Structural properties. (D) Sequence window highlighting the positions of predicted regions or sites.
Fig. 3. ID Search and Keyword Search interfaces. (A) ID Search, which accepts arbitrary IDs entered in the text box as a query. (B) Keyword Search, which searches the descriptions assigned from public databases. (C) Example of Keyword Search results for the species Chlamydomonas reinhardtii, descriptions from Pfam, Swiss-Prot and TrEMBL, and the keywords ‘induced’ and ‘responsive’.
Fig. 4. Sequence Search interface. (A) Sequence Search accepts protein or nucleic acid sequences in FASTA format as a query, with an optional e-value cutoff. (B) Example of Sequence Search results. The BLAST and PASS result tables show that the conserved region spans amino acids 6 to 94 of the query protein sequence.
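The e-value cutoff described for Sequence Search can be sketched as a filter over BLAST hits. This example assumes BLAST+ tabular output (`-outfmt 6`, default 12 columns). It is not how Alga-PrAS processes results internally, and the PASS search is not modeled. The subject IDs and scores in the usage lines are made up.

```python
"""Sketch: apply an e-value cutoff to BLAST+ tabular (-outfmt 6) hits."""
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Hit:
    query: str
    subject: str
    pident: float
    qstart: int
    qend: int
    evalue: float
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Parse default outfmt 6 columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[6]), int(f[7]),
                        float(f[10]), float(f[11])))
    return hits


def filter_by_evalue(hits: Iterable[Hit], cutoff: float = 1e-5) -> List[Hit]:
    """Keep hits at or below the cutoff, best (smallest e-value) first."""
    return sorted((h for h in hits if h.evalue <= cutoff), key=lambda h: h.evalue)


# Made-up example lines.
lines = [
    "query1\tsubjA\t85.2\t89\t13\t0\t6\t94\t10\t98\t3e-40\t150.0",
    "query1\tsubjB\t30.1\t40\t28\t1\t120\t160\t5\t45\t0.5\t25.0",
]
kept = filter_by_evalue(parse_outfmt6(lines), cutoff=1e-5)
```

Only the first hit passes the cutoff. Its `qstart`/`qend` fields give the aligned query region, analogous to the residue 6 to 94 span reported in panel B.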
Fig. 5. Example search for candidate G protein-coupled receptors (GPCRs). The Property Search settings are as follows: ‘Chlamydomonas reinhardtii’ in the Species field, ‘7’ in Membrane (number of transmembrane helices), ‘not hit’ in Signal, Pfam, UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, KOG and Gene Ontology, and ‘0%’ in PDB (A). The search identified 10 protein sequences as candidate GPCRs (B). Clicking ‘C. reinhardtii’ in the Species column of the summary table displays the accession IDs retrieved by this search (C).
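The GPCR search in Fig. 5 combines several property conditions. The sketch below expresses the same conditions as a filter over in-memory records. The `ProteinRecord` fields, the encoding of "no PDB hit" as 0% identity, and the accessions are hypothetical and do not reflect Alga-PrAS's data model.

```python
"""Hypothetical sketch of the Fig. 5 property filter for GPCR candidates."""
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

# Databases that must report 'not hit' (per the Fig. 5 settings).
NO_HIT_DBS = ("Pfam", "Swiss-Prot", "TrEMBL", "KOG", "GO")


@dataclass
class ProteinRecord:
    accession: str
    species: str
    tm_helices: int                 # number of predicted transmembrane helices
    has_signal_peptide: bool
    db_hits: Dict[str, bool] = field(default_factory=dict)  # db name -> hit?
    pdb_identity: float = 0.0       # assumed: 0.0 means no PDB hit


def gpcr_candidates(records: Iterable[ProteinRecord],
                    species: str = "Chlamydomonas reinhardtii") -> List[str]:
    """Return accessions matching the Fig. 5 settings."""
    return [
        r.accession
        for r in records
        if r.species == species
        and r.tm_helices == 7
        and not r.has_signal_peptide
        and not any(r.db_hits.get(db, False) for db in NO_HIT_DBS)
        and r.pdb_identity == 0.0
    ]


# Made-up records: only acc1 satisfies every condition.
records = [
    ProteinRecord("acc1", "Chlamydomonas reinhardtii", 7, False),
    ProteinRecord("acc2", "Chlamydomonas reinhardtii", 7, False, {"Pfam": True}),
    ProteinRecord("acc3", "Chlamydomonas reinhardtii", 7, True),
    ProteinRecord("acc4", "Other species", 7, False),
]
candidates = gpcr_candidates(records)
```

Requiring "not hit" in every annotation database restricts the results to seven-helix proteins that lack an existing functional description. This makes the search useful for surfacing novel GPCR candidates rather than known ones.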
Preference of protein disorder and PTMs in species-specific protein clusters and common protein clusters for each taxonomic class
| Taxonomic class | Category | Disorder | S-pho/400 aa | T-pho/400 aa | Y-pho/400 aa | O-gly/400 aa | N-gly/400 aa | Ubi/400 aa |
|---|---|---|---|---|---|---|---|---|
| Land plants | Specific | 16% | 1.3 | 0.5 | 0.5 | 0.9 | 1.3 | 1.1 |
| | Common | 13% | 0.7 | 0.3 | 0.4 | 0.6 | 1.2 | 0.7 |
| | S/C ratio | 1.2 | 2.0 | 1.7 | 1.1 | 1.4 | 1.1 | 1.5 |
| Green algae | Specific | 20% | 2.4 | 1.2 | 0.6 | 1.8 | 0.9 | 0.9 |
| | Common | 12% | 0.6 | 0.3 | 0.5 | 0.8 | 0.9 | 0.6 |
| | S/C ratio | 1.7 | 4.0 | 3.6 | 1.4 | 2.4 | 0.9 | 1.6 |
| Red algae | Specific | 12% | 1.7 | 0.9 | 0.6 | 1.4 | 1.0 | 0.9 |
| | Common | 14% | 0.7 | 0.4 | 0.5 | 0.8 | 1.0 | 0.6 |
| | S/C ratio | 0.9 | 2.3 | 2.1 | 1.3 | 1.7 | 1.0 | 1.5 |
| Glaucophyceae | Specific | 14% | 2.3 | 1.0 | 0.5 | 1.8 | 0.8 | 0.8 |
| | Common | 10% | 0.5 | 0.3 | 0.3 | 0.9 | 1.0 | 0.6 |
| | S/C ratio | 1.4 | 4.9 | 3.6 | 1.5 | 2.0 | 0.8 | 1.4 |
| Oomycetes | Specific | 14% | 1.3 | 0.7 | 0.6 | 0.9 | 1.3 | 0.8 |
| | Common | 12% | 0.6 | 0.3 | 0.4 | 0.7 | 1.1 | 0.7 |
| | S/C ratio | 1.1 | 2.3 | 2.2 | 1.3 | 1.4 | 1.1 | 1.2 |
| Diatoms | Specific | 20% | 1.8 | 0.8 | 0.6 | 1.0 | 2.1 | 1.8 |
| | Common | 10% | 0.3 | 0.2 | 0.4 | 0.6 | 1.2 | 0.7 |
| | S/C ratio | 2.0 | 5.4 | 4.7 | 1.7 | 1.9 | 1.7 | 2.7 |
| Other microalgae | Specific | 16% | 2.1 | 0.9 | 0.6 | 1.2 | 1.0 | 1.4 |
| | Common | 11% | 0.7 | 0.4 | 0.4 | 0.8 | 0.9 | 0.7 |
| | S/C ratio | 1.4 | 3.1 | 2.7 | 1.4 | 1.6 | 1.1 | 2.1 |
a The Specific category (species-specific protein clusters) comprises OrthoMCL clusters containing proteins from only one species.
b The Common category (common protein clusters) comprises clusters containing proteins from all 34 species used in this study.
c Ratio of specific to common values.
d Average number of predicted PTM sites, normalized per 400 amino acids (aa).
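Footnotes a to d describe three computations. First, each OrthoMCL cluster is classified by how many species it contains. Second, per-protein PTM site counts are normalized to 400 aa. Third, the Specific average is divided by the Common average. A minimal sketch follows, assuming per-protein normalization followed by averaging, as footnote d suggests. The function names and example proteins are hypothetical.

```python
"""Hypothetical sketch of footnotes a-d: cluster categories, per-400-aa
normalization and the S/C ratio."""
from statistics import mean
from typing import Iterable, List, Optional

N_SPECIES = 34  # 31 algal + 3 land plant species in this study


def cluster_category(species_in_cluster: Iterable[str],
                     n_species: int = N_SPECIES) -> Optional[str]:
    """Classify an OrthoMCL cluster by how many species it contains."""
    n = len(set(species_in_cluster))
    if n == 1:
        return "Specific"
    if n == n_species:
        return "Common"
    return None  # partially shared clusters fall in neither category


def sites_per_400aa(n_sites: int, length: int) -> float:
    """Number of predicted PTM sites normalized per 400 amino acids."""
    if length <= 0:
        raise ValueError("protein length must be positive")
    return n_sites * 400 / length


def sc_ratio(specific: List[float], common: List[float]) -> float:
    """Ratio of the average Specific value to the average Common value."""
    return mean(specific) / mean(common)


# Illustrative, made-up proteins: (length in aa, predicted phosphorylation sites)
specific_proteins = [(200, 3), (400, 2)]
common_proteins = [(400, 1), (800, 4)]
spec = [sites_per_400aa(s, length) for length, s in specific_proteins]
comm = [sites_per_400aa(s, length) for length, s in common_proteins]
ratio = sc_ratio(spec, comm)
```

With these toy values the Specific average is 4.0 sites/400 aa and the Common average is 1.5, giving an S/C ratio of about 2.7. Note that the published ratios were presumably computed from unrounded averages, so they need not equal the quotient of the rounded table entries.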
List of protein sequence resources in this study
| Classification | Species | Proteome resources | References for genomic analysis |
|---|---|---|---|
| Green algae | | | |
| | | JGI Genome Portal | |
| | | JGI Genome Portal | |
| | | JGI Genome Portal | |
| | | JGI Genome Portal | |
| | | NCBI | |
| | | JGI Genome Portal | |
| | | JGI Genome Portal | |
| | | NCBI | |
| | | JGI Genome Portal | |
| | | JGI Genome Portal | |
| | | NCBI | |
| Red algae | | | |
| | | NCBI | |
| | | NRIFS | |
| | | NCBI | |
| Glaucophyceae | | | |
| Oomycetes | | JGI Genome Portal | |
| | | JGI Genome Portal | |
| | | Superfamily database | |
| | | JGI Genome Portal | |
| Diatoms | | JGI Genome Portal | |
| | | JGI Genome Portal | |
| | | JGI Genome Portal | |
| Other algal species | | JGI Genome Portal | |
| | | JGI Genome Portal | |
| | | OIST | |
| | | NCBI | |
| | | NCBI | |
| | | JGI Genome Portal | |
| Land plants | | TAIR | |
| | | JGI Genome Portal | |
| | | JGI Genome Portal | |
a–f Other algal species (Aureococcus anophagefferens, Ectocarpus siliculosus, Symbiodinium minutum, Emiliania huxleyi, Guillardia theta and Bigelowiella natans) belong to Pelagophyceae, Phaeophyceae, Dinophyceae, Haptophyceae, Cryptophyceae and Chlorarachniophyceae, respectively.
g http://www.plantmorphogenesis.bio.titech.ac.jp/~algae_genome_project/klebsormidium/index.html (Hori et al. 2014).
h http://genome.jgi.doe.gov (Nordberg et al. 2014).
i http://www.ncbi.nlm.nih.gov (Pruitt et al. 2007, Pruitt et al. 2012).
j http://merolae.biol.s.u-tokyo.ac.jp (Matsuzaki et al. 2004).
k http://nrifs.fra.affrc.go.jp/ResearchCenter/5_AG/genomes/nori/index.html (Nakamura et al. 2013).
l http://cyanophora.rutgers.edu/porphyridium (Bhattacharya et al. 2013).
m http://cyanophora.rutgers.edu/cyanophora/home.php (Price et al. 2012).
n http://supfam.org/SUPERFAMILY (Oates et al. 2015).
o http://marinegenomics.oist.jp/symb/viewer/info?project_id=21 (Shoguchi et al. 2013).
p https://www.arabidopsis.org (Swarbreck et al. 2008).