Jian Xu, Peifeng Ji, Baosen Wang, Lan Zhao, Jian Wang, Zixia Zhao, Yan Zhang, Jiongtang Li, Peng Xu, Xiaowen Sun.
Abstract
BACKGROUND: Amur ide (Leuciscus waleckii) is an economically and ecologically important species in Northern Asia. The Dali Nor population, which inhabits Dali Nor Lake, a typical saline-alkaline lake in Inner Mongolia, is well known for its adaptation to extremely high alkalinity. Genome information is needed for conservation and aquaculture, as well as to better understand the genetics of stress tolerance. The objective of this study was to sequence the transcriptome of Amur ide and obtain a well-assembled transcriptome.
Year: 2013 PMID: 23573207 PMCID: PMC3613414 DOI: 10.1371/journal.pone.0059703
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Statistics of transcriptome sequencing, assembly and annotation of Amur ide.
| Stage | Statistic | Value |
| Sequencing | Number of reads (101-bp paired-end) | 99,883,236 |
| | Total bases | 9.99 Gb |
| | Cleaned reads | 87,740,916 |
| Assembly | Number of contigs | 53,632 |
| | Maximum contig length | 9,691 bp |
| | Minimum contig length | 107 bp |
| | Average contig length | 647 bp |
| | N50 length | 1,094 bp |
| Annotation | Contigs with BLAST results | 30,866 |
| | Unigenes with BLAST results | 19,338 |
| | Contigs with GO terms | 13,717 |
| | Unigenes with GO terms | 10,674 |
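The N50 length reported above is a standard assembly statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembled bases. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50` and the toy lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2            # half of all assembled bases
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            # This contig pushes the cumulative length past 50%.
            return length
    raise ValueError("no contig lengths supplied")


# Toy lengths (hypothetical, not taken from the Amur ide assembly).
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))  # 400
```

Because N50 weights contigs by length, it can sit well above the average contig length, as it does in the table (1,094 bp versus 647 bp).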
Figure 1. Length distribution of assembled contigs of Amur ide.
Summary of BLASTX search results of Amur ide transcriptome.
| Database | Amur ide hits | Unique proteins | % of total unique proteins |
| NR | 30,866 | 19,338 | |
| UTR | 3,822 | 2,497 | |
| Refseq/Ensembl | | | |
| Zebrafish | 31,790 | 15,759 | 57.8% of 27,271 |
| Medaka | 27,524 | 13,419 | 54.4% of 24,661 |
| Tetraodon | 27,096 | 12,952 | 56.0% of 23,118 |
| Three-spined stickleback | 27,996 | 14,047 | 50.9% of 27,576 |
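The last column of the table can be reproduced by dividing the number of unique proteins matched by the total number of proteins in each reference set. A short sketch of that arithmetic follows; the dictionary layout and the helper name `coverage_percent` are illustrative assumptions, while the counts are copied from the table.

```python
# (unique proteins matched by Amur ide contigs, total proteins in the reference set)
references = {
    "Zebrafish": (15759, 27271),
    "Medaka": (13419, 24661),
    "Tetraodon": (12952, 23118),
    "Three-spined stickleback": (14047, 27576),
}


def coverage_percent(matched, total):
    """Percentage of a reference protein set matched, rounded to one decimal."""
    return round(100 * matched / total, 1)


for species, (matched, total) in references.items():
    print(f"{species}: {coverage_percent(matched, total)}% of {total:,}")
```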
Figure 2. Distribution of ortholog hit ratio and its relationship with ortholog length.
Ortholog hit ratios were calculated for contigs with BLASTX results. A ratio of 1.0 indicates the gene is likely fully assembled.
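The caption does not spell out the formula. A common definition, assumed here, is the number of ortholog residues covered by the best BLASTX hit divided by the full length of the ortholog. The sketch below uses that assumption; the function and parameter names are hypothetical.

```python
def ortholog_hit_ratio(subject_start, subject_end, ortholog_length):
    """Fraction of the ortholog covered by the aligned region of a contig.

    subject_start and subject_end are 1-based, inclusive alignment
    coordinates on the ortholog protein (assumed convention);
    ortholog_length is its full length in residues.
    """
    if ortholog_length <= 0:
        raise ValueError("ortholog_length must be positive")
    covered = abs(subject_end - subject_start) + 1
    return covered / ortholog_length


# A hit spanning the whole 300-residue ortholog gives 1.0 (likely fully assembled).
print(ortholog_hit_ratio(1, 300, 300))   # 1.0
# A hit covering residues 51 to 200 gives 0.5 (partial transcript).
print(ortholog_hit_ratio(51, 200, 300))  # 0.5
```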
Figure 3. Gene Ontology (GO) categories of the unigenes.
Distribution of the GO categories assigned to the Amur ide transcriptome. Unique transcripts (unigenes) were annotated in three categories: cellular components, molecular functions, and biological processes.
KEGG biochemical mappings for Amur ide.
| KEGG categories represented | Unique sequences |
| Metabolism | |
| Carbohydrate Metabolism | 826 (597) |
| Amino Acid Metabolism | 188 (140) |
| Energy Metabolism | 174 (122) |
| Nucleotide Metabolism | 142 (96) |
| Metabolism of Cofactors and Vitamins | 114 (82) |
| Lipid Metabolism | 207 (152) |
| Glycan Biosynthesis and Metabolism | 132 (101) |
| Metabolism of Other Amino Acids | 78 (50) |
| Xenobiotics Biodegradation and Metabolism | 78 (45) |
| Biosynthesis of Secondary Metabolites | 15 (12) |
| Biosynthesis of Polyketides and Nonribosomal Peptides | 21 (18) |
| Genetic Information Processing | |
| Replication and Repair | 110 (88) |
| Folding, Sorting and Degradation | 366 (294) |
| Transcription | 176 (146) |
| Translation | 367 (284) |
| Environmental Information Processing | |
| Signal Transduction | 378 (275) |
| Signaling Molecules and Interaction | 143 (107) |
| Membrane Transport | 17 (16) |
| Cellular Processes | |
| Cell Motility | 104 (64) |
| Cell Growth and Death | 178 (141) |
| Transport and Catabolism | 348 (258) |
| Cell Communication | 206 (137) |
| Organismal Systems | |
| Immune System | 318 (240) |
| Endocrine System | 202 (149) |
| Development | 80 (48) |
| Circulatory System | 139 (98) |
| Digestive System | 78 (56) |
| Excretory System | 200 (134) |
| Nervous System | 22 (15) |
| Sensory System | 117 (85) |
| Environmental Adaptation | 33 (23) |
Unique sequences are the non-redundant sequences assigned to a particular KEGG category.
Figure 4. Length distribution of identified ORFs.
Figure 5. Length distribution of putative full-length cDNAs.
Statistics of microsatellites identified from Amur ide transcriptome.
| Total number of contigs | 53,632 |
| Microsatellites identified | 10,395 |
| Di-nucleotide repeats | 4,316 |
| Tri-nucleotide repeats | 749 |
| Tetra-nucleotide repeats | 40 |
| Penta-nucleotide repeats | 12 |
| Number of contigs containing microsatellites | 8,447 |
| Number of microsatellites with sufficient flanking sequences | 4,120 |
Classification of SNPs identified from Amur ide transcriptome.
| SNP classification | Number of SNPs |
| 5′ UTR | 646 |
| 3′ UTR | 5,159 |
| Coding region | 10,408 |
| Coding region: synonymous | 4,335 |
| Coding region: non-synonymous | 6,073 |
| Non-synonymous: pre-terminated | 265 |
| Non-synonymous: skip-stop-codon | 214 |
| Non-synonymous: mis-sense | 5,594 |
| Undefined | 18,086 |
| Total | 34,299 |
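The counts in this table are nested. Reading synonymous and non-synonymous as subdivisions of the coding region, and pre-terminated, skip-stop-codon and mis-sense as subdivisions of non-synonymous (an interpretation inferred from the totals, not stated in the table), every level sums to its parent. The sketch below checks this; the variable names and the helper `is_consistent` are illustrative.

```python
# Counts copied from the SNP classification table.
top_level = {"5' UTR": 646, "3' UTR": 5159, "Coding region": 10408, "Undefined": 18086}
coding = {"synonymous": 4335, "non-synonymous": 6073}
non_synonymous = {"pre-terminated": 265, "skip-stop-codon": 214, "mis-sense": 5594}


def is_consistent(parent_total, children):
    """True when the child categories add up exactly to the parent count."""
    return parent_total == sum(children.values())


print(is_consistent(34299, top_level))                             # True
print(is_consistent(top_level["Coding region"], coding))           # True
print(is_consistent(coding["non-synonymous"], non_synonymous))     # True
```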
Figure 6. Distribution of SNP non-synonymous (dN) and synonymous (dS) substitution.
The solid red line is the null expectation dN = dS. The filled red circles represent unigenes with dN/dS > 1. The dashed blue line shows the slope (0.428), calculated as the overall average dN divided by the overall average dS across all contigs.
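The two quantities in this caption can be sketched as follows: the slope is the mean dN over all contigs divided by the mean dS over all contigs, and a unigene is flagged when its own dN/dS exceeds 1. This is a minimal illustration, not the authors' implementation; the function names and the toy dN and dS values are hypothetical.

```python
def overall_slope(pairs):
    """Mean dN divided by mean dS over a non-empty list of (dN, dS) pairs."""
    mean_dn = sum(dn for dn, _ in pairs) / len(pairs)
    mean_ds = sum(ds for _, ds in pairs) / len(pairs)
    return mean_dn / mean_ds


def positively_selected(unigenes):
    """Names of unigenes whose dN/dS exceeds 1 (dS must be non-zero)."""
    return [name for name, (dn, ds) in unigenes.items() if ds > 0 and dn / ds > 1]


# Toy values (hypothetical, not taken from the Amur ide data).
toy = {
    "contigA": (0.004, 0.002),  # dN/dS = 2.0
    "contigB": (0.001, 0.005),  # dN/dS = 0.2
    "contigC": (0.002, 0.004),  # dN/dS = 0.5
}
print(overall_slope(list(toy.values())))
print(positively_selected(toy))  # ['contigA']
```

The ratio of averages differs from the average of per-contig ratios, which is why a single slope below 1 can coexist with individual unigenes above the dN = dS line.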
Unigenes showing positive selection (dN/dS > 1) associated with stress adaptation or immune response.
| Unigenes | Uniprot ID | Description | dN/dS |
| contig7468 | sp|P20702|ITAX_HUMAN | Integrin alpha-X | 4.18 |
| contig256 | sp|P11364|TCB_FLV | T-cell receptor beta chain T17T-22 | 3.57 |
| contig9830 | sp|Q95118|IL2RG_BOVIN | Cytokine receptor common subunit gamma | 3.46 |
| contig344 | sp|P49616|UPAR_RAT | Urokinase plasminogen activator surface receptor | 3.30 |
| contig1256 | sp|P40189|IL6RB_HUMAN | Interleukin-6 receptor subunit beta | 2.78 |
| contig3025 | sp|P11911|CD79A_MOUSE | B-cell antigen receptor complex-associated protein alpha chain | 2.64 |
| contig1148 | sp|P48284|CAH4_RAT | Carbonic anhydrase 4 | 2.59 |
| contig3210 | sp|P08317|IL8_CHICK | Interleukin-8 | 2.59 |
| contig6695 | sp|P08294|SODE_HUMAN | Extracellular superoxide dismutase [Cu-Zn] | 2.30 |
| contig11187 | sp|P01873|MUCM_MOUSE | Ig mu chain C region membrane-bound form | 2.22 |
| contig12382 | sp|Q9NVE5|UBP40_HUMAN | Ubiquitin carboxyl-terminal hydrolase 40 | 1.96 |
| contig13125 | sp|P13387|EGFR_CHICK | Epidermal growth factor receptor | 1.86 |
| contig21350 | sp|Q66S61|MBL2_CALJA | Mannose-binding protein C | 1.77 |
| contig941 | sp|P10820|PERF_MOUSE | Perforin-1 | 1.74 |
| contig59 | sp|P06314|KV404_HUMAN | Ig kappa chain V-IV region B17 | 1.60 |
| contig21332 | sp|A7M9B2|YCF1_CUSRE | Putative membrane protein ycf1 | 1.55 |
| contig21463 | sp|Q6PIU2|NCEH1_HUMAN | Neutral cholesterol ester hydrolase 1 | 1.54 |
| contig421 | sp|P04114|APOB_HUMAN | Apolipoprotein B-100 | 1.54 |
| contig11654 | sp|P20759|IGHG1_RAT | Ig gamma-1 chain C region | 1.41 |
| contig255 | sp|Q28085|CFAH_BOVIN | Complement factor H | 1.32 |
| contig895 | sp|P30568|GSTA_PLEPL | Glutathione S-transferase A | 1.31 |
| contig2382 | sp|Q8BK26|FBX44_MOUSE | F-box only protein 44 | 1.31 |
| contig15120 | sp|Q91009|NTRK1_CHICK | High affinity nerve growth factor receptor | 1.27 |
| contig7962 | sp|Q9MZV7|CASP1_CANFA | Caspase-1 | 1.25 |
| contig1236 | sp|O95415|BRI3_HUMAN | Brain protein I3 | 1.18 |
| contig10013 | sp|Q80SU7|GVIN1_MOUSE | Interferon-induced very large GTPase 1 | 1.17 |
| contig1492 | sp|P19181|HV05_CARAU | Ig heavy chain V region 5A | 1.11 |
| contig1317 | sp|P15684|AMPN_RAT | Aminopeptidase N | 1.06 |
| contig11745 | sp|Q96G23|CERS2_HUMAN | Ceramide synthase 2 | 1.02 |
| contig13871 | sp|P29533|VCAM1_MOUSE | Vascular cell adhesion protein 1 | 1.02 |
| contig3775 | sp|P50283|CD7_MOUSE | T-cell antigen CD7 | 1.01 |
| contig17067 | sp|Q16787|LAMA3_HUMAN | Laminin subunit alpha-3 | 1.01 |