Chun-lin Su, Ya-Ting Chao, Shao-Hua Yen, Chun-Yi Chen, Wan-Chieh Chen, Yao-Chien Alex Chang, Ming-Che Shih.
Abstract
A specialized orchid database, named Orchidstra (URL: http://orchidstra.abrc.sinica.edu.tw), has been constructed to collect, annotate and share genomic information for orchid functional genomics studies. The Orchidaceae is a large family of Angiosperms that exhibits extraordinary biodiversity in terms of both the number of species and their distribution worldwide. Orchids exhibit many unique biological features; however, investigation of these traits is currently constrained due to the limited availability of genomic information. Transcriptome information for five orchid species and one commercial hybrid has been included in the Orchidstra database. Altogether, these comprise >380,000 non-redundant orchid transcript sequences, of which >110,000 are protein-coding genes. Sequences from the transcriptome shotgun assembly (TSA) were obtained either from output reads from next-generation sequencing technologies assembled into contigs, or from conventional cDNA library approaches. An annotation pipeline using Gene Ontology, KEGG and Pfam was built to assign gene descriptions and functional annotation to protein-coding genes. Deep sequencing of small RNA was also performed for Phalaenopsis aphrodite to search for microRNAs (miRNAs), extending the information archived for this species to miRNA annotation, precursors and putative target genes. The P. aphrodite transcriptome information was further used to design probes for an oligonucleotide microarray, and expression profiling analysis was carried out. The intensities of hybridized probes derived from microarray assays of various tissues were incorporated into the database as part of the functional evidence. In the future, the content of the Orchidstra database will be expanded with transcriptome data and genomic information from more orchid species.
Year: 2013 PMID: 23324169 PMCID: PMC3583029 DOI: 10.1093/pcp/pct004
Source DB: PubMed Journal: Plant Cell Physiol ISSN: 0032-0781 Impact factor: 4.927
Fig. 1 Overview of the information process pipeline for next-generation sequencing data. (A) Flow chart of the sequence data process pipeline (for details see the Materials and Methods). (B) Data content of Phalaenopsis aphrodite in the Orchidstra database. Expressed transcripts (TSA for transcriptome shotgun assembly) were cross-linked to miRNA (SR for small RNA) through the identification of target genes and precursors.
Statistics of transcriptome shotgun assemblies (TSAs) in the Orchidstra database
| Orchid species | No. of coding TSAs | No. of nc TSAs | No. of total TSAs | Average length | N50 | Tissue source | Data source | Expression profiling |
|---|---|---|---|---|---|---|---|---|
| | 42,573 | 191,233 | 233,806 | 875 | 405 | R, L, S, F, PE, ST, FB, IN, PC | TSA | Yes |
| | 31,515 | 51,550 | 83,065 | 783 | 533 | R, L, F, PE | TSA | NA |
| | 26,786 | 20,900 | 47,686 | 668 | 619 | L, F, PB, IN, FB | TSA, cDNA | NA |
| | 10,515 | 3,302 | 13,817 | 639 | 669 | L, S, AB | cDNA | NA |
| | 2,401 | 0 | 2,401 | 631 | 684 | FB | cDNA | NA |
| | 1,143 | 0 | 1,143 | 686 | 740 | FB | cDNA | NA |
| Total | 114,933 | 266,985 | 381,918 | 773 | 494 | | | |
nc TSA, non-coding TSA.
Average length of the nucleotide sequence of protein-coding TSAs.
N50 of total assembled TSAs.
Code for tissue sources: R, root; L, leaf; S, stem; F, open flower; PE, pedicel; ST, stalk; IN, inflorescence; FB, flower bud; PB, pseudobulb; AB, auxiliary bud; PC, protocorm.
Data source: TSA, transcriptome shotgun assembly sequence; cDNA, reverse transcriptase-mediated cDNA library.
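The N50 column can be read as follows: with the assembled sequences sorted from longest to shortest, N50 is the length of the sequence at which the running total first reaches half of the summed assembly length. The sketch below illustrates that common definition only; it is not taken from the Orchidstra pipeline, and the function name `n50` and the example lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in nucleotides).

    Uses the common definition: the length L such that sequences of
    length >= L together cover at least half of the total assembly.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths, for illustration only (not Orchidstra data).
example_lengths = [1200, 900, 600, 400, 300, 200]
print(n50(example_lengths))
```

Because N50 is computed over all assembled TSAs while the average length covers protein-coding TSAs only, the two columns are not directly comparable within a row.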
Statistics of functional affiliates in the Orchidstra database
| Orchid species | No. of coding TSAs | No. of TSAs with Pfam | No. of TSAs with GO | No. of TSAs with KEGG | No. of TSAs with rice homolog | No. of TSAs with At homolog |
|---|---|---|---|---|---|---|
| | 42,573 | 24,084 | 16,701 | 15,216 | 23,002 | 24,205 |
| | 31,515 | 20,731 | 16,229 | 15,932 | 22,686 | 21,833 |
| | 26,786 | 12,189 | 24,283 | 12,562 | 16,671 | 18,579 |
| | 10,515 | 7,429 | 7,706 | 1,388 | 7,981 | 7,761 |
| | 2,401 | 1,667 | 1,335 | 1,675 | 1,852 | 1,805 |
| | 1,143 | 839 | 746 | 1,093 | 949 | 934 |
| Total | 114,933 | 66,939 | 67,000 | 47,866 | 73,141 | 75,117 |
Rice homologs were obtained from Blast against MSU Rice Genome Annotation Project Release 7, and At (Arabidopsis thaliana) homologs were from the Arabidopsis TAIR10 release.
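To put the totals of the functional-affiliates table in proportion, each annotation count can be expressed as a share of the 114,933 protein-coding TSAs. The following is a minimal sketch of that arithmetic using only the Total row of the table; the names `CODING_TSAS`, `ANNOTATED` and `coverage_percent` are illustrative and not part of the database.

```python
# Total row of the functional-affiliates table (values copied from the table).
CODING_TSAS = 114_933
ANNOTATED = {
    "Pfam": 66_939,
    "GO": 67_000,
    "KEGG": 47_866,
    "Rice homolog": 73_141,
    "At homolog": 75_117,
}


def coverage_percent(annotated, total):
    """Return each annotation count as a percentage of the total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return {source: 100 * count / total for source, count in annotated.items()}


coverage = coverage_percent(ANNOTATED, CODING_TSAS)
for source, pct in coverage.items():
    print(f"{source}: {pct:.1f}% of coding TSAs")
```

A single TSA may carry several annotation types, so these shares overlap and should not be summed.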
Fig. 2 Examples of functional annotation in the Orchidstra database. (A) KEGG pathway in graphic view. Steroid biosynthesis is demonstrated here. The EC number within a colored box indicates genes with P. aphrodite identity. (B) Comparison of expression profiles of multiple genes shows tissue expression patterns.
Fig. 3 Comparison of Arabidopsis and rice homologs of expressed TSAs among four orchid species in the Orchidstra database. (A) Number of Arabidopsis homologs. (B) Number of rice homologs. The number in brackets indicates the total number of homologs found in the species. Pa, Phalaenopsis aphrodite; Ep, Erycina pusilla; Og, Oncidium Gower Ramsey; and Dn, Dendrobium nobile.
Fig. 4 Sequence comparisons of EIF5A genes. (A) Multiple sequence alignment reveals high amino acid identity of EIF5A between plant species. (B) Phylogenetic analysis of EIF5A from various species. Contig id is used for EIF5A of orchid species that can be found in the Orchidstra database, while other species are given a GenBank id.