Yue Ying Tan, Rimantas Kodzius, Boon-Hui Tay, Alice Tay, Sydney Brenner, Byrappa Venkatesh.
Abstract
Cartilaginous fishes are the most ancient group of living jawed vertebrates (gnathostomes) and are, therefore, an important reference group for understanding the evolution of vertebrates. The elephant shark (Callorhinchus milii), a holocephalan cartilaginous fish, has been identified as a model cartilaginous fish genome because of its compact genome (∼910 Mb), and a genome project has been initiated to obtain its whole genome sequence. In this study, we have generated and sequenced full-length enriched cDNA libraries of the elephant shark using the 'oligo-capping' method and Sanger sequencing. A total of 6,778 full-length protein-coding cDNA and 10,701 full-length noncoding cDNA were sequenced from six tissues (gills, intestine, kidney, liver, spleen, and testis) of the elephant shark. Analysis of their polyadenylation signals showed that polyadenylation usage in elephant shark is similar to that in mammals. Furthermore, both coding and noncoding transcripts of the elephant shark use the same proportion of canonical polyadenylation sites. Besides BLASTX searches, protein-coding transcripts were annotated by Gene Ontology, InterPro domain, and KEGG pathway analyses. By comparing elephant shark genes to bony vertebrate genes, we identified several ancient genes present in elephant shark but differentially lost in tetrapods or teleosts. Only ∼6% of elephant shark noncoding cDNA showed similarity to known noncoding RNAs (ncRNAs). The rest are either highly divergent ncRNAs or novel ncRNAs. In addition to full-length transcripts, 30,375 5'-ESTs and 41,317 3'-ESTs were sequenced and annotated. The clones and transcripts generated in this study are valuable resources for annotating transcription start sites, exon-intron boundaries, and UTRs of genes in the elephant shark genome, and for the functional characterization of protein sequences.
These resources will also be useful for annotating genes in other cartilaginous fishes whose genomes have been targeted for whole genome sequencing.
Year: 2012; PMID: 23056606; PMCID: PMC3466250; DOI: 10.1371/journal.pone.0047174
Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.240
Figure 1. Elephant shark cDNA sequence analysis pipeline.
Full-length cDNA (FLcDNA) generated from various tissues.
| Tissue | Total FLcDNA | Protein-coding sequences | Noncoding sequences (known) | Noncoding sequences (novel) |
| Gills | 2,864 | 711 | 124 | 2,029 |
| Intestine | 3,442 | 1,419 | 98 | 1,925 |
| Kidney | 3,262 | 885 | 142 | 2,235 |
| Liver | 2,341 | 1,222 | 96 | 1,023 |
| Spleen | 3,285 | 1,955 | 54 | 1,276 |
| Testis | 2,285 | 586 | 82 | 1,617 |
| Total | 17,479 | 6,778 | 590 | 10,111 |
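The table can be cross-checked against the abstract with a few lines of Python. This is a minimal sketch that assumes the last two columns are the known and novel noncoding subsets, which is the reading under which they add up to the abstract's 10,701 noncoding cDNA. The dictionary layout and function names are illustrative only.

```python
# Per-tissue counts transcribed from the FLcDNA table:
# tissue -> (total, protein-coding, noncoding known, noncoding novel)
FLCDNA = {
    "Gills": (2864, 711, 124, 2029),
    "Intestine": (3442, 1419, 98, 1925),
    "Kidney": (3262, 885, 142, 2235),
    "Liver": (2341, 1222, 96, 1023),
    "Spleen": (3285, 1955, 54, 1276),
    "Testis": (2285, 586, 82, 1617),
}
# Values printed in the table's Total row.
REPORTED_TOTALS = (17479, 6778, 590, 10111)

def mismatched_rows(table):
    """Tissues whose three category counts do not add up to the row total."""
    return [tissue for tissue, (total, coding, known, novel) in table.items()
            if coding + known + novel != total]

def column_sums(table):
    """Sum each of the four columns across all tissues."""
    return tuple(sum(column) for column in zip(*table.values()))

sums = column_sums(FLCDNA)
noncoding = sums[2] + sums[3]
print(mismatched_rows(FLCDNA))
print(sums, noncoding)
# Share of noncoding cDNA matching known ncRNAs (the abstract's "~6%").
print(f"{REPORTED_TOTALS[2] / noncoding:.1%}")
```

Every tissue row is internally consistent, and known plus novel gives 10,701 whichever totals are used. Note that summing the per-tissue columns yields 596 known and 10,105 novel transcripts, six off from each value in the Total row. Using the reported 590, the known fraction is about 5.5%.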
Sets of 5′-ESTs and 3′-ESTs generated from various tissues.
| Tissue | 5'-ESTs | 5'-ESTs with similarity to known proteins | 3'-ESTs | 3'-ESTs with similarity to known proteins | Other ESTs with similarity to known proteins |
| Gills | 4,795 | 2,082 | 6,174 | 1,436 | 5,043 |
| Intestine | 3,271 | 2,000 | 3,497 | 1,372 | 5,378 |
| Kidney | 5,470 | 2,361 | 7,401 | 1,654 | 6,375 |
| Liver | 4,147 | 3,209 | 5,653 | 3,732 | 6,773 |
| Spleen | 4,908 | 2,845 | 6,068 | 2,096 | 5,688 |
| Testis | 7,784 | 2,233 | 12,524 | 2,766 | 9,016 |
| Total | 30,375 | 14,730 | 41,317 | 13,056 | 38,273 |
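One way to compare the libraries is by the fraction of ESTs that have protein hits. The sketch below computes that fraction per tissue and pooled across tissues, using the counts from the table. The rightmost "Other ESTs" column is left out because its header does not say which set of ESTs it is counted against. The function names are hypothetical.

```python
# Counts transcribed from the EST table:
# tissue -> (5'-ESTs, 5'-ESTs with hits, 3'-ESTs, 3'-ESTs with hits)
ESTS = {
    "Gills": (4795, 2082, 6174, 1436),
    "Intestine": (3271, 2000, 3497, 1372),
    "Kidney": (5470, 2361, 7401, 1654),
    "Liver": (4147, 3209, 5653, 3732),
    "Spleen": (4908, 2845, 6068, 2096),
    "Testis": (7784, 2233, 12524, 2766),
}

def hit_rates(table):
    """Per tissue, the fraction of 5'- and 3'-ESTs similar to known proteins."""
    return {tissue: (hit5 / n5, hit3 / n3)
            for tissue, (n5, hit5, n3, hit3) in table.items()}

def overall_rates(table):
    """Pooled fractions, computed from the column sums across tissues."""
    n5, hit5, n3, hit3 = (sum(column) for column in zip(*table.values()))
    return hit5 / n5, hit3 / n3

for tissue, (r5, r3) in hit_rates(ESTS).items():
    print(f"{tissue:<10} 5': {r5:.1%}  3': {r3:.1%}")
r5, r3 = overall_rates(ESTS)
print(f"{'All':<10} 5': {r5:.1%}  3': {r3:.1%}")
```

Pooled over tissues, 48.5% of 5'-ESTs and 31.6% of 3'-ESTs match known proteins. Among 5'-ESTs, liver has the highest hit rate (about 77%) and testis the lowest (about 29%).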
Types of polyadenylation signals observed in protein-coding and noncoding full-length cDNA transcripts of elephant shark.
| Polyadenylation signal type | Protein-coding transcripts (number) | Protein-coding transcripts (%) | Noncoding transcripts (number) | Noncoding transcripts (%) |
| AAUAAA | 754 | 64.3% | 3,825 | 61.4% |
| AUUAAA | 199 | 17.0% | 1,145 | 18.4% |
| AGUAAA | 14 | 1.2% | 120 | 1.9% |
| UAUAAA | 33 | 2.8% | 164 | 2.6% |
| UUUAAA | 12 | 1.0% | 63 | 1.0% |
| CAUAAA | 11 | 0.9% | 47 | 0.8% |
| AAGAAA | 3 | 0.3% | 36 | 0.6% |
| AAUACA | 9 | 0.8% | 54 | 0.9% |
| GAUAAA | 10 | 0.9% | 45 | 0.7% |
| AAUAUA | 9 | 0.8% | 56 | 0.9% |
| AAAACA | 3 | 0.3% | 16 | 0.3% |
| ACUAAA | 1 | 0.1% | 19 | 0.3% |
| AAUGAA | 1 | 0.1% | 22 | 0.4% |
| AAAAAG | 2 | 0.2% | 13 | 0.2% |
| AAUAGA | 2 | 0.2% | 7 | 0.1% |
| Not identifiable | 110 | 9.4% | 597 | 9.6% |
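This page does not describe how the signals were called. The sketch below rests on an assumption: the polyadenylation signal typically lies about 10 to 30 nt upstream of the cleavage site, so a common approach is to scan a short window just before the poly(A) tail for AAUAAA and its variants. The 50-nt window and the rule of ranking variants in table order are hypothetical parameters, not the authors' settings. The toy sequences are invented for illustration.

```python
from collections import Counter

# Hexamers in the order listed in the polyadenylation-signal table.
# Using this order as a search priority is an assumption of this sketch.
PAS_VARIANTS = ["AAUAAA", "AUUAAA", "AGUAAA", "UAUAAA", "UUUAAA", "CAUAAA",
                "AAGAAA", "AAUACA", "GAUAAA", "AAUAUA", "AAAACA", "ACUAAA",
                "AAUGAA", "AAAAAG", "AAUAGA"]

def find_pas(utr3, window=50):
    """Return the highest-priority hexamer found in the last `window` nt of a
    3' UTR (poly(A) tail already trimmed), or None if none is present.
    DNA input is converted to the RNA alphabet. The 50-nt default is
    hypothetical."""
    region = utr3.upper().replace("T", "U")[-window:]
    for hexamer in PAS_VARIANTS:
        if hexamer in region:
            return hexamer
    return None

def tally(utrs, window=50):
    """Count signal types across transcripts and express each as a percentage
    of all transcripts, mirroring the table's Number and Percentage columns."""
    counts = Counter(find_pas(u, window) or "Not identifiable" for u in utrs)
    total = sum(counts.values())
    return {sig: (n, 100 * n / total) for sig, n in counts.most_common()}

# Toy 3' UTR ends (made up for illustration, not elephant shark data).
toy = ["GCAUGCAAUAAAGCUUGCA", "CCGAUUAAACGUAGCUAG", "GGCCGCGGCCGCUAGC"]
for sig, (n, pct) in tally(toy).items():
    print(sig, n, f"{pct:.1f}%")
```

The table's percentages come from the same kind of calculation. Dividing each count by its column total (1,173 protein-coding and 6,229 noncoding transcripts) reproduces them, for example 754/1,173 ≈ 64.3% and 3,825/6,229 ≈ 61.4%.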
Figure 2. Tissue-wise occurrence of Gene Ontology terms for full-length protein-coding cDNA (Molecular Function).
Figure 3. Tissue-wise occurrence of Gene Ontology terms for full-length protein-coding cDNA (Biological Process).
KEGG ontology categorization for full-length protein-coding cDNA from various tissues.
| KEGG category (number of KO terms) | Gills | Intestine | Kidney | Liver | Spleen | Testis |
| Cellular Processes | | | | | | |
| Cell Communication | 18 | 9 | 11 | 6 | 9 | 19 |
| Cell Growth and Death | 6 | 6 | 4 | 2 | 7 | 5 |
| Cell Motility | 15 | 9 | 4 | 3 | 5 | 7 |
| Transport and Catabolism | 30 | 16 | 37 | 27 | 40 | 31 |
| Environmental Information Processing | | | | | | |
| Signal Transduction | 29 | 19 | 17 | 23 | 37 | 22 |
| Signaling Molecules and Interaction | 20 | 24 | 7 | 5 | 18 | 2 |
| Membrane Transport | - | - | - | - | - | 2 |
| Genetic Information Processing | | | | | | |
| Folding, Sorting and Degradation | 27 | 31 | 21 | 15 | 30 | 28 |
| Replication and Repair | 2 | 1 | 4 | 2 | 1 | 7 |
| Transcription | 6 | 8 | 7 | 8 | 15 | 21 |
| Translation | 129 | 495 | 196 | 233 | 335 | 201 |
| Metabolism | | | | | | |
| Amino Acid Metabolism | 13 | 14 | 24 | 37 | 12 | 11 |
| Biosynthesis of Other Secondary Metabolites | - | 2 | 2 | 3 | 3 | - |
| Carbohydrate Metabolism | 11 | 7 | 29 | 23 | 16 | 21 |
| Energy Metabolism | 31 | 87 | 88 | 51 | 26 | 22 |
| Glycan Biosynthesis and Metabolism | 5 | 5 | 6 | 3 | 4 | 2 |
| Lipid Metabolism | 10 | 3 | 10 | 51 | 10 | 13 |
| Metabolism of Cofactors and Vitamins | 6 | 9 | 9 | 15 | 22 | 5 |
| Metabolism of Other Amino Acids | 10 | 24 | 28 | 30 | 5 | 9 |
| Metabolism of Terpenoids and Polyketides | - | 1 | 1 | 6 | - | 3 |
| Nucleotide Metabolism | 1 | 4 | 6 | 5 | 2 | 7 |
| Xenobiotics Biodegradation and Metabolism | 11 | 23 | 26 | 43 | 6 | 9 |
| Organismal Systems | | | | | | |
| Circulatory System | 22 | 53 | 40 | 17 | 5 | 3 |
| Development | 2 | - | - | 2 | 4 | 2 |
| Digestive System | 3 | 162 | 5 | 32 | 26 | 7 |
| Endocrine System | 14 | 124 | 26 | 35 | 20 | 14 |
| Environmental Adaptation | 2 | - | 1 | 1 | - | - |
| Excretory System | 5 | 10 | 11 | 5 | 11 | 10 |
| Immune System | 38 | 29 | 15 | 45 | 32 | 16 |
| Nervous System | 9 | 7 | 8 | 9 | 17 | 12 |
| Sensory System | 3 | - | 1 | 1 | 1 | 4 |
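Translation dominates every column of this table, so tissue-specific signals are easier to see once it is set aside. The sketch below parses the category rows and ranks categories within a tissue. It treats a "-" cell as zero, which is an interpretation and is not stated in the table. `parse` and `top_categories` are hypothetical helper names.

```python
# KO-term counts transcribed from the KEGG table, one row per category,
# in the column order given by TISSUES. "-" cells are read as 0 (assumption).
TISSUES = ["Gills", "Intestine", "Kidney", "Liver", "Spleen", "Testis"]
ROWS = """\
Cell Communication|18|9|11|6|9|19
Cell Growth and Death|6|6|4|2|7|5
Cell Motility|15|9|4|3|5|7
Transport and Catabolism|30|16|37|27|40|31
Signal Transduction|29|19|17|23|37|22
Signaling Molecules and Interaction|20|24|7|5|18|2
Membrane Transport|-|-|-|-|-|2
Folding, Sorting and Degradation|27|31|21|15|30|28
Replication and Repair|2|1|4|2|1|7
Transcription|6|8|7|8|15|21
Translation|129|495|196|233|335|201
Amino Acid Metabolism|13|14|24|37|12|11
Biosynthesis of Other Secondary Metabolites|-|2|2|3|3|-
Carbohydrate Metabolism|11|7|29|23|16|21
Energy Metabolism|31|87|88|51|26|22
Glycan Biosynthesis and Metabolism|5|5|6|3|4|2
Lipid Metabolism|10|3|10|51|10|13
Metabolism of Cofactors and Vitamins|6|9|9|15|22|5
Metabolism of Other Amino Acids|10|24|28|30|5|9
Metabolism of Terpenoids and Polyketides|-|1|1|6|-|3
Nucleotide Metabolism|1|4|6|5|2|7
Xenobiotics Biodegradation and Metabolism|11|23|26|43|6|9
Circulatory System|22|53|40|17|5|3
Development|2|-|-|2|4|2
Digestive System|3|162|5|32|26|7
Endocrine System|14|124|26|35|20|14
Environmental Adaptation|2|-|1|1|-|-
Excretory System|5|10|11|5|11|10
Immune System|38|29|15|45|32|16
Nervous System|9|7|8|9|17|12
Sensory System|3|-|1|1|1|4
"""

def parse(rows):
    """Map category name -> list of KO-term counts in TISSUES order."""
    table = {}
    for line in rows.strip().splitlines():
        name, *cells = line.split("|")
        table[name] = [0 if cell == "-" else int(cell) for cell in cells]
    return table

def top_categories(table, tissue, n=3, exclude=("Translation",)):
    """Categories with the most KO terms in one tissue, skipping `exclude`.
    Ties keep table order because Python's sort is stable."""
    i = TISSUES.index(tissue)
    candidates = [(name, row[i]) for name, row in table.items()
                  if name not in exclude]
    return sorted(candidates, key=lambda item: item[1], reverse=True)[:n]

kegg = parse(ROWS)
for tissue in TISSUES:
    print(tissue, top_categories(kegg, tissue))
```

With Translation excluded, the leading category is Immune System in gills, Digestive System in intestine, Energy Metabolism in kidney, and Transport and Catabolism in spleen and testis. In liver, Energy Metabolism and Lipid Metabolism tie at 51.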
Families of ncRNA genes present in various tissues of elephant shark.
| RNA type | Number |
| 5_8S_rRNA | 2 |
| SSU_rRNA_eukarya | 135 |
| tRNA | 386 |
| Mir-598 | 15 |
| snoRNA | 11 |
| U1 | 3 |
| U2 | 3 |
| U3 | 2 |
| U4 | 9 |
| U5 | 2 |
| U6 | 1 |
| 7SK | 11 |
| Clostridiales-1 RNA | 6 |
| CsrB/RsmB RNA family | 1 |
| Metazoa_SRP | 3 |
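The family counts sum to 590, which matches the known noncoding total in the Total row of the FLcDNA table. The sketch below checks that sum and shows how the families break down. Grouping U1, U2, U4, U5 and U6 together as spliceosomal snRNAs is standard biology added here for illustration; the table itself does not make that grouping.

```python
# Family counts transcribed from the ncRNA table (names kept as printed).
NCRNA_FAMILIES = {
    "5_8S_rRNA": 2, "SSU_rRNA_eukarya": 135, "tRNA": 386, "Mir-598": 15,
    "snoRNA": 11, "U1": 3, "U2": 3, "U3": 2, "U4": 9, "U5": 2, "U6": 1,
    "7SK": 11, "Clostridiales-1 RNA": 6, "CsrB/RsmB RNA family": 1,
    "Metazoa_SRP": 3,
}
# Major-spliceosome snRNAs (U3 is a snoRNA, so it is not included).
SPLICEOSOMAL = {"U1", "U2", "U4", "U5", "U6"}

total = sum(NCRNA_FAMILIES.values())
spliceosomal = sum(NCRNA_FAMILIES[name] for name in SPLICEOSOMAL)
share = {name: n / total for name, n in NCRNA_FAMILIES.items()}
print(total, spliceosomal)
print(f"tRNA share: {share['tRNA']:.1%}")
```

tRNAs account for about 65% of the ncRNA transcripts that match known families.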
Ancient genes differentially lost in teleost fishes.
| Elephant shark GenBank accession | Expression pattern of elephant shark gene | Mouse ID/ Gene | Description | Function in mouse | GO term (Biological Process) | Protein domains | Expression data |
| JX052286 | Gills | MGI:2143261 | Alpha-1,4-N-acetylglucosaminyltransferase | Mice homozygous for a knock-out allele exhibit gastric adenocarcinoma with increased cell proliferation, angiogenesis, inflammation and gastric mucosal thickness. | Glycoprotein biosynthetic process | Alpha 1,4-glycosyltransferase domain; Glycosyltransferase, DXD sugar-binding motif | Gastric gland mucous cells, duodenal Brunner's glands |
| JX052312 | Gills, intestine, spleen | MGI:96493 | Immunoglobulin joining chain | Formation of polymeric Igs and their transport into secretions | Humoral immune response | Immunoglobulin J chain | Blood, intestine, spleen |
| JX052629 | All six tissues | MGI:96120 | High mobility group nucleosomal binding domain 1 | Mice homozygous for a knock-out allele display partial embryonic lethality, increased cellular sensitivity to ultraviolet- and gamma-irradiation, increased tumor incidence and metastatic potential, increased incidence of ionizing radiation-induced tumors, and abnormal cell cycle checkpoint function. | Chromatin organization, post-embryonic camera-type eye morphogenesis, etc. | High mobility group nucleosome-binding domain-containing family | Ubiquitous expression |
| JX052670 | Intestine, kidney, liver | MGI:99692 | Metallothionein 4 | Organization and assembly of metal-thiolate clusters | Cellular metal ion homeostasis, response to cadmium ion | Metallothionein domain, vertebrate; Metallothionein superfamily, eukaryotic | Decidua, embryo, liver, tongue |
| JX052805 | Intestine | MGI:88421 | Colipase, pancreatic | Homozygous mutation of this gene results in increased mortality before weaning. Surviving mutants are growth retarded and remain smaller than wild-type into adulthood with decreased body fat, impaired fat absorption, elevated cholesterol, and reduced triglycerides. | Digestion, lipid catabolic process, etc. | Colipase; Colipase, conserved site; Colipase, C-terminal; Colipase, N-terminal | Pancreas |
| JX053121 | Liver | MGI:2685264 | Tigger transposable element derived 4 | Regulation of transcription | Biological process | DDE superfamily endonuclease, CENP-B-like; DNA binding HTH domain, Psq-type; Homeodomain-like; HTH CenpB-type DNA-binding domain | Diaphragm, tongue, skeletal muscle, vertebral axis muscle system |
Ancient genes differentially lost in tetrapods.
| Elephant shark GenBank accession | Expression pattern of elephant shark gene | Zebrafish ID/ Gene | Description | Function in zebrafish | GO term (Biological Process) | Protein domains | Expression data |
| JX052809 | Intestine | ZDB-GENE-081104-302 si:dkey-10c21.1 | Hypothetical protein | Not known | Regulation of apoptotic process | CARD (caspase activation and recruitment domain), a protein-protein interaction domain; DD_superfamily (the Death Domain superfamily of protein-protein interaction domains) | Not known |
| JX052773 | Intestine | ZDB-GENE-050419-230 si:dkey-246j7.1 | FYVE and coiled-coil domain-containing protein 1 | Not known | Not known | FYVE domain (zinc-binding domain that targets proteins to membrane lipids via interaction with phosphatidylinositol-3-phosphate, PI3P; present in Fab1, YOTB, Vac1, and EEA1); SMC_prok_B (chromosome segregation protein SMC, common bacterial type); RUN domain | Not known |
| JX052420 | Gills, intestine, spleen | ZDB-GENE-091204-344 si:dkey-25o1.5 | Chemokine CCL-C24j | Small cytokines, including a number of secreted growth factors and interferons involved in mitogenic, chemotactic, and inflammatory activity; distinguished from other cytokines by their receptors, which are G-protein coupled receptors; divided. | Immune response | Chemokine interleukin-8-like domain | Not known |
| JX053182 | Spleen |