Shinichi Sunagawa, Michael K DeSalvo, Christian R Voolstra, Alejandro Reyes-Bermudez, Mónica Medina.
Abstract
The amount of genomic sequence information continues to grow at an exponential rate, while the identification and characterization of genes without known homologs remains a major challenge. For non-model organisms with limited resources for manipulative studies, high-throughput transcriptomic data combined with bioinformatics methods provide a powerful approach to obtain initial insights into the function of unknown genes. In this study, we report the identification and characterization of a novel family of putatively secreted, small, cysteine-rich proteins herein named Small Cysteine-Rich Proteins (SCRiPs). Their discovery in expressed sequence tag (EST) libraries from the coral Montastraea faveolata required the performance of an iterative search strategy based on BLAST and Hidden-Markov-Model algorithms. While a discernible homolog could neither be identified in the genome of the sea anemone Nematostella vectensis, nor in a large EST dataset from the symbiotic sea anemone Aiptasia pallida, we identified SCRiP sequences in multiple scleractinian coral species. Therefore, we postulate that this gene family is an example of lineage-specific gene expansion in reef-building corals. Previously published gene expression microarray data suggest that a sub-group of SCRiPs is highly responsive to thermal stress. Furthermore, data from microarray experiments investigating developmental gene expression in the coral Acropora millepora suggest that different SCRiPs may play distinct roles in the development of corals. The function of these proteins remains to be elucidated, but our results from in silico, transcriptomic, and phylogenetic analyses provide initial insights into the evolution of SCRiPs, a novel, taxonomically restricted gene family that may be responsible for a lineage-specific trait in scleractinian corals.
Year: 2009 PMID: 19283069 PMCID: PMC2652719 DOI: 10.1371/journal.pone.0004865
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of iterative search strategy used to identify Mfav-SCRiP sequences.
| Step | Screening method | Query against | Sequences used | Identification of |
| 1 | HMM | six-frame-translated EST library sequences | Conserved domains of β-defensins (pfam00711) | Mfav-SCRiP1 |
| 2 | tBLASTn | EST library sequences | Mfav-SCRiP1 | Mfav-SCRiP2, Mfav-SCRiP3a, Mfav-SCRiP4 |
| 3 | HMM | six-frame-translated EST library sequences | Mfav-SCRiP1-4 | Mfav-SCRiP3b, Mfav-SCRiP5 |
| 4 | tBLASTn | EST library sequences | Mfav-SCRiP1-6 | Mfav-SCRiP6, Mfav-SCRiP7, Mfav-SCRiP8 |
| 5 | HMM | six-frame-translated EST library sequences | Mfav-SCRiP1-8 | no additional sequences |
| 6 | tBLASTn | nt (NCBI), est_others (NCBI) | all Mfav-SCRiPs | Amil-SCRiP1, Amil-SCRiP2, Amil-SCRiP3, Mcap-SCRiP1a, Mcap-SCRiP1b |
likely to be a longer isoform of SCRiPs.
possibly a pseudogene.
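The alternation of HMM and tBLASTn rounds in the table can be pictured as a loop that keeps feeding newly found sequences back in as queries until a round returns nothing new. The following is only a minimal sketch of that control flow, not the authors' pipeline: the function names `iterative_search` and `make_toy_search` and the toy "similarity" maps are hypothetical stand-ins, and a real run would call HMM and BLAST tools in place of the dictionary lookups.

```python
from typing import Callable, Dict, Iterable, Sequence, Set

SearchFn = Callable[[Set[str]], Set[str]]


def iterative_search(seeds: Iterable[str],
                     searches: Sequence[SearchFn],
                     max_rounds: int = 10) -> Set[str]:
    """Alternate search methods, re-querying with all known hits,
    and stop at the first round that finds nothing new."""
    known = set(seeds)
    for round_no in range(max_rounds):
        search = searches[round_no % len(searches)]  # e.g. HMM, tBLASTn, HMM, ...
        new_hits = search(known) - known
        if not new_hits:
            break
        known |= new_hits
    return known


def make_toy_search(neighbors: Dict[str, Set[str]]) -> SearchFn:
    """Hypothetical stand-in for a real search: looks hits up in a dict."""
    return lambda queries: {hit for q in queries for hit in neighbors.get(q, set())}


# Toy data (invented for illustration only).
toy_hmm = make_toy_search({"seed": {"A"}, "A": {"B"}})
toy_blast = make_toy_search({"A": {"C"}, "B": {"D"}})
found = iterative_search(["seed"], [toy_hmm, toy_blast])
```

In this toy run each method recovers sequences the other one missed, which is the motivation for alternating them. Note that the sketch keeps the seed in the result set, whereas in the table the step 1 seed is a β-defensin domain model rather than a SCRiP.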
Physicochemical properties of SCRiPs.
| SCRiP | Preproprotein, aa (MW) | Proprotein, aa (MW) | Mature protein, aa (MW) | Mature protein, IP / DE / RK (net) |
| Mfav-SCRiP1 | 79 (8,451) | 58 (6,203) | 41 (4,316) | 3.89 / 2 / 0 (−2) |
| Mfav-SCRiP2 | 68 (7,769) | - | 44 (5,219) | 3.80 / 11 / 1 (−10) |
| Mfav-SCRiP4 | 81 (8,954) | 58 (6,642) | 48 (5,464) | 3.99 / 8 / 2 (−6) |
| Mfav-SCRiP5 | 68 (7,735) | - | 44 (5,184) | 3.89 / 8 / 2 (−6) |
| Mfav-SCRiP6 | 81 (9,330) | 58 (7,019) | 47 (5,761) | 3.99 / 11 / 3 (−8) |
| Mfav-SCRiP8 | 74 (8,107) | 53 (5,869) | 41 (4,380) | 5.53 / 4 / 3 (−1) |
| Amil-SCRiP2 | 79 (8,907) | 58 (6,675) | 42 (4,756) | 5.15 / 5 / 2 (−3) |
| Amil-SCRiP3 | 83 (9,222) | 62 (7,041) | 42 (4,531) | 6.02 / 3 / 2 (−1) |
| Mcap-SCRiP1a | 81 (8,919) | 62 (6,893) | 40 (4,326) | 7.71 / 2 / 3 (+1) |
| Mcap-SCRiP1b | 81 (8,892) | 62 (6,889) | 40 (4,326) | 7.71 / 2 / 3 (+1) |
Number of amino acids (aa), molecular weight (MW), isoelectric point (IP), number of negative (DE) and positive amino acids (RK), and net-charge (net) of amino acid residues are shown for SCRiP members identified in this study. In addition to aa and MW for the complete protein (preproprotein), data are shown for both the proprotein, i.e. the product after cleavage of the signal peptide and the mature protein, i.e. the product after proprotein convertase processing. D – aspartate, E – glutamate, R – arginine, K – lysine.
only sequence with β-defensin motif.
no proprotein convertase cleavage site.
mature proteins are identical.
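The DE, RK and net columns follow directly from residue counts: the net value is the number of positive residues (R, K) minus the number of negative residues (D, E). A minimal sketch of that arithmetic is shown here; the function name `charge_summary` and the example peptide are hypothetical, and this is a simple residue tally rather than a pH-dependent charge or isoelectric point calculation.

```python
from typing import Tuple


def charge_summary(seq: str) -> Tuple[int, int, int]:
    """Return (DE count, RK count, net) for a protein sequence.

    net = (R + K) - (D + E), matching the sign convention in the table,
    e.g. DE = 2 and RK = 0 give a net value of -2.
    """
    seq = seq.upper()
    negative = seq.count("D") + seq.count("E")
    positive = seq.count("R") + seq.count("K")
    return negative, positive, positive - negative


# Invented example peptide, not a SCRiP sequence.
de, rk, net = charge_summary("ACDEECKGC")
```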
Figure 1. Multiple sequence alignments of SCRiPs identified in this study.
Predicted signal peptides are shown in upper case and conserved amino acid sites are indicated by an asterisk (100% similarity) or a dot (90% similarity) below the alignment. The N-terminal signal peptide region and C-terminal cysteine-rich domain are underlined. Boxed amino acids show potential cleavage sites for proprotein convertases (PCs), which recognize the consensus motif [R/K]-[R/K], or [R/K]-(X)n-[R/K], where n = 2, 4 or 6.
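The consensus motif in the caption can be scanned for with a few lines of code. The sketch below only reports positions where two basic residues (R or K) are adjacent or separated by 2, 4 or 6 residues; it is an illustration of the motif definition, not a cleavage-site predictor, and the function name `pc_motif_sites` and the example peptide are hypothetical.

```python
from typing import List, Tuple


def pc_motif_sites(seq: str,
                   spacers: Tuple[int, ...] = (0, 2, 4, 6)) -> List[Tuple[int, int, int]]:
    """Find [R/K]-[R/K] and [R/K]-(X)n-[R/K] motifs (n = 2, 4 or 6).

    Returns (start, end, n) triples with 0-based, inclusive positions of
    the two basic residues; n = 0 denotes the adjacent [R/K]-[R/K] case.
    """
    seq = seq.upper()
    basic = {"R", "K"}
    hits = []
    for start, residue in enumerate(seq):
        if residue not in basic:
            continue
        for n in spacers:
            end = start + n + 1
            if end < len(seq) and seq[end] in basic:
                hits.append((start, end, n))
    return hits


# Invented example peptide, not a SCRiP sequence.
sites = pc_motif_sites("MAKRSAAR")
```

Because the motif is short and degenerate, matches of this kind are expected by chance in many proteins, so hits are best read as candidates.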
Figure 2. Un-rooted maximum likelihood tree of SCRiP sequences identified in this study.
The un-rooted maximum likelihood tree was constructed using the proposed general time reversible (GTR) nucleotide substitution model with discrete gamma. Bootstrap replicate (n = 1000) recovery rates are shown at the internal nodes, when clusters were supported at a level of >80%. Scale = nucleotide substitutions per site.
Figure 3. Gene expression microarray data of SCRiPs in thermally stressed Montastraea faveolata (A) and different developmental stages in Acropora millepora (B).
(A) Cluster of most down-regulated genes in thermally stressed M. faveolata showing fold changes, rank of most down-regulated genes, and clone IDs according to data available under GEO accession: GSE10630. (B) Heat map of log2-transformed signal intensities of Amil-SCRiP gene expression shown as ratios of successive developmental stages. Pre-settlement = ratio of pre-settlement over prawn chip; Post-settlement = ratio of post-settlement over pre-settlement; Adult = ratio of adult over post-settlement. CloneID annotations are available under GEO accession: GSE11251. Note that same numbering does not imply orthology between SCRiPs from different coral species.
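The stage-over-stage ratios described for panel (B) can be illustrated with a short calculation. This sketch assumes one signal intensity per stage for a single clone and computes the log2 ratio of each stage over the preceding one; the function name `successive_log2_ratios` and the intensity values are hypothetical, and the normalization applied to the published data is not reproduced here.

```python
import math
from typing import Dict

# Stage order as given in the caption.
STAGES = ["prawn chip", "pre-settlement", "post-settlement", "adult"]


def successive_log2_ratios(intensity: Dict[str, float]) -> Dict[str, float]:
    """log2(later / earlier) for each pair of successive stages.

    The result is keyed by the later stage, matching the caption's labels.
    """
    return {
        later: math.log2(intensity[later] / intensity[earlier])
        for earlier, later in zip(STAGES, STAGES[1:])
    }


# Invented intensities for one clone, for illustration only.
ratios = successive_log2_ratios({
    "prawn chip": 100.0,
    "pre-settlement": 400.0,
    "post-settlement": 200.0,
    "adult": 200.0,
})
```

A value of 2 means a fourfold increase relative to the previous stage, -1 a halving, and 0 no change.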
Summary of SCRiP sequences identified in this study.
| SCRiP | Evidence | Accession numbers |
| Mfav-SCRiP1 | 7 cDNA clones, gDNA | BK006525 |
| Mfav-SCRiP2 | 1 cDNA clone, RACE clones | BK006526 |
| Mfav-SCRiP3a | 1 cDNA clone | BK006527 |
| Mfav-SCRiP3b | 1 cDNA clone | BK006528 |
| Mfav-SCRiP4 | 3 cDNA clones | BK006529 |
| Mfav-SCRiP5 | 1 cDNA clone, 4 RACE clones | BK006530 |
| Mfav-SCRiP6 | 5 cDNA clones | BK006531 |
| Mfav-SCRiP7 | 1 cDNA clone and RACE clones | BK006532 |
| Mfav-SCRiP8 | 1 cDNA clone and RACE clones | BK006533 |
| Amil-SCRiP1 | single EST read | BK006534 |
| Amil-SCRiP2 | 6 cDNA clones | BK006535 |
| Amil-SCRiP3 | 27 cDNA clones | BK006536 |
| Mcap-SCRiP1a | GenBank entry | BK006537 |
| Mcap-SCRiP1b | GenBank entry | BK006538 |