
Transcriptome analysis of silver carp (Hypophthalmichthys molitrix) by paired-end RNA sequencing.

Beide Fu, Shunping He.

Abstract

The silver carp (Hypophthalmichthys molitrix) is among the most intensively pond-cultured fish species and is used in the wild to counteract water bloom in China. However, little genomic information is available for this species, especially regarding its ability to grow rapidly even in water contaminated with high concentrations of poisonous microcystin. In this study, we performed de novo transcriptome assembly and analysis of the 17.10 million short-read sequences produced by Illumina paired-end sequencing. Using an improved multiple k-mer contig assembly method coupled with further scaffolding, 85 759 sequences were obtained. There were 23 044 sequences annotated with 3423 gene ontology terms, for 104 196 term occurrences across the three corresponding organizing principles. A total of 38 200 assembled sequences were involved in 218 predicted Kyoto Encyclopedia of Genes and Genomes metabolic pathways. We also recovered 41 of 44 genes involved in the biosynthesis of glutathione. Of these, five genes were identified as having experienced positive selection between silver carp and zebrafish, as determined by the likelihood ratio test. This report is the first annotated review of the silver carp transcriptome. These data will be of interest to researchers investigating the evolution and biological processes of the silver carp. This work also provides an archive for future studies of recent speciation and evolution of Cyprinidae fishes and can be used in comparative studies of other fishes.

Year:  2012        PMID: 22279088      PMCID: PMC3325077          DOI: 10.1093/dnares/dsr046

Source DB:  PubMed          Journal:  DNA Res        ISSN: 1340-2838            Impact factor:   4.458


Introduction

The transcriptome is the subset of genes active in a selected tissue and species. Understanding the dynamics of the transcriptome is essential for interpreting phenotypic variation caused by combinations of genotypic and environmental factors.[1] Massively parallel sequencing of RNA[2] (RNA-Seq) has made it possible to characterize the transcriptome with unprecedented sensitivity and depth, and it has already revolutionized the way the transcriptome is studied. The latest paired-end RNA-Seq techniques have further improved sequencing efficiency and extended short-read lengths, permitting a deeper understanding of the transcriptome.[3] RNA-Seq is independent of prior knowledge and does not require design work, which reduces the required staff, cost and time and makes low-cost transcriptome studies feasible for non-model organisms. RNA-Seq has been applied to many model organisms[4-12] for the discovery of splice variants, RNA editing sites and new microRNAs, but fewer studies have been conducted in non-model fish.[13,14] The Actinopterygii are, in terms of numbers, the dominant class of vertebrates, comprising nearly 96% of the 26 000 species of fish.[15] However, genomic information for this group is scarce: only six genomes[16-19] and several transcriptomes[20,21] are available. This has hindered research into these valuable species. Cyprinidae is the largest family of freshwater fish in the Actinopterygii.[15] The endemic clade of East Asian Cyprinidae displays a tremendous diversity of phenotypic and ecological traits. This clade is an ideal model system for the study of rapid radiations and evolutionary adaptation over short periods of time.[22] Silver carp (Hypophthalmichthys molitrix) of the family Cyprinidae are among the most intensively pond-cultured species in China. 
As one of the four major Chinese carps, silver carp reached a breeding production of 3 million tons in China in 2009.[23] Aside from their great importance to the fishery economy, silver carp have been found useful in counteracting cyanobacteria blooms in China.[24,25] Silver carp are also a good model for the study of speciation because of their split from bighead carp (Hypophthalmichthys nobilis), which occurred only ∼3 Mya.[26] However, the lack of genomic resources such as a genome sequence, transcriptome sequences and molecular markers has made it difficult to study silver carp breeding, the mechanism behind its ability to counteract water bloom and its evolution. When no genome sequence is available, transcriptome sequencing is an effective way to obtain large numbers of molecular markers and to identify transcripts involved in specific biological processes. In this study, we present the first silver carp transcriptome obtained by massively parallel mRNA sequencing. We performed Illumina sequencing of the heart, liver, brain, spleen and kidney tissues to characterize the H. molitrix transcriptome. A database (Silver Carp Base) is under construction and we expect that it will provide the first picture of the transcriptome of this species. The database will be updated as additional data become available.

Materials and methods

Ethics statement

All experimental protocols were approved by the ethics committee of Institute of HydroBiology, Chinese Academy of Sciences.

Organ collection and RNA isolation

A wild silver carp was collected from the middle reach of the Yangtze River. To obtain the whole transcriptome, RNA from five organs (heart, liver, brain, spleen and kidneys) was extracted using TRIzol reagent (Invitrogen, Carlsbad, CA, USA). After quality examination by electrophoresis and with a BioPhotometer plus 6132 (Eppendorf, Germany), RNAs from the different organs were mixed together at equivalent concentrations. Total RNA was extracted in accordance with the manufacturer's protocol and treated with RNase-free DNase I (New England Biolabs) for 30 min at 37°C to remove residual DNA.

cDNA library preparation and sequencing

Beads with oligo(dT) were used to purify poly(A) mRNA from total RNA. The mRNA was then fragmented using an RNA fragmentation kit (Ambion). First-strand cDNA was synthesized using random hexamer primers and reverse transcriptase (Invitrogen), followed by second-strand cDNA synthesis. The paired-end cDNA library was then prepared in accordance with Illumina's protocols with an insert size of 200 bp and sequenced with 75 bp paired-end reads. The Illumina GA processing pipeline v0.2.2.6 was used for image analysis and base calling.

De novo assembly of silver carp transcriptome

As no single k-mer length is optimal for all de novo transcriptome assemblies, the multiple k-mer method was used to obtain longer silver carp mRNA sequences, which are very useful in subsequent analysis steps. Our method is a modification of the ‘additive Multi-k’ method described by Yann Surget-Groba.[27] After removing reads containing the sequencing adapter and reads of low quality, paired-end reads were subjected to de novo assembly using ABySS[28] with k-mer lengths of 58, 54, 52, 50, 48, 46, 44, 42, 40, 38 and 34. The reads left unused at higher k-mer lengths were not discarded before running the assembly for a lower k-mer length. The output of each k-mer assembly was scaffolded separately with SSPACE.[29] When all the results were pooled, some contigs and scaffolds appeared in two or more assemblies, causing redundancy. These were removed using CD-HIT-EST,[30] and the longest possible contigs and scaffolds were retained. Finally, the STM+ method[27] was used to perform translation mapping scaffolding with the Danio rerio proteome[31] serving as a reference.

Sequence annotation

The assembled sequences were searched against the NCBI Nr (non-redundant) protein database and the Swiss-Prot database using BLASTX[32] with an E-value cut-off of 1e−5. To shorten the search time, searches were limited to the first 10 significant hits for each query. Gene names were assigned to each sequence according to its best BLAST hit (highest score). The Blast2GO suite[33] was used for functional annotation of the assembled sequences: gene ontology (GO) terms were mapped to sequences with BLAST hits, and annotation used an E-value < 1e−5, an annotation cut-off > 55 and a GO weight > 5. Assembled sequences were thus assigned to primary and sub-GO functional categories.

Simple sequence repeat markers discovery

A microsatellite identification tool (MISA)[34] (http://pgrc.ipkgatersleben.de/misa/) was used to identify and localize microsatellite motifs. We searched for all types of simple sequence repeats (SSRs) from mono- to hexanucleotides using the following parameters: at least 10 repeats for mono-, 6 repeats for di- and 5 repeats for tri-, tetra-, penta- and hexanucleotide motifs. Both perfect (i.e. SSRs containing a single repeat motif such as ‘ATC’) and compound (i.e. composed of two or more motifs separated by <100 bp) SSRs were identified.
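The repeat thresholds above can be illustrated with a small regular-expression scan. This is only a sketch of the perfect-SSR criteria (at least 10, 6 and 5 repeats for mono-, di- and tri- to hexanucleotide motifs); it is not MISA itself, it does not detect compound SSRs, and the names `MIN_REPEATS`, `is_primitive` and `find_ssrs` are hypothetical.

```python
import re

# Minimum repeat counts per motif length, taken from the parameters in the text.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """Return True if the motif is not itself a repetition of a shorter unit
    (e.g. 'AA' or 'ATAT' are rejected so they are not counted twice)."""
    n = len(motif)
    return not any(n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n))


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Simplification: matches are non-overlapping per motif length, so rare
    edge cases that a dedicated tool handles may be missed.
    """
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # One captured motif followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)


# Toy sequence (invented for illustration) with one mono-, one di- and one
# trinucleotide repeat separated by short spacers.
example = "GT" + "A" * 10 + "GT" + "AC" * 6 + "GT" + "ATC" * 5 + "GG"
ssrs = find_ssrs(example)
```

Each tuple gives the 0-based start position, the repeat motif and the number of repeats, which is enough to tabulate motif frequencies of the kind reported in the Results.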

Positive selection

All 44 sequences involved in the biosynthesis of glutathione (GSH) were downloaded from the Kyoto Encyclopedia of Genes and Genomes (KEGG) and used as queries in a BLAST search against the 85 759 assembled sequences. Only the reciprocal best BLAST hit was kept to form a sequence pair with its corresponding query. To determine whether a sequence pair had undergone positive selection, a likelihood ratio test (LRT) was performed between the site models NSsites 7 and NSsites 8 (M7 and M8) of codeml in PAML.[35]
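The reciprocal best-hit filter described above can be sketched as follows. The sketch assumes that BLAST results from both search directions have already been reduced to (query, subject, bit score) tuples; the function names and the toy identifiers are hypothetical and are not taken from the study.

```python
def best_hits(hits):
    """Map each query to the subject with the highest bit score.

    hits: iterable of (query_id, subject_id, bit_score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward, reverse):
    """Keep (a, b) pairs where b is the best hit of a and a is the best hit of b."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


# Toy data (invented): zebrafish CDS searched against scaffolds, and the reverse.
forward = [("dre:1", "sc_A", 300.0), ("dre:1", "sc_B", 250.0), ("dre:2", "sc_B", 180.0)]
reverse = [("sc_A", "dre:1", 295.0), ("sc_B", "dre:1", 240.0), ("sc_B", "dre:2", 170.0)]
pairs = reciprocal_best_hits(forward, reverse)
```

Here `dre:2` is dropped because its best scaffold, `sc_B`, points back to `dre:1` rather than to `dre:2`, so only mutually best pairs go forward to the selection test.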

Results

De novo assembly with multiple k-mer lengths and sequence validation

We prepared mixed cDNAs from the heart, liver, brain, spleen and kidneys of silver carp at equivalent concentrations. One lane of an Illumina Genome Analyzer was run and ∼17.10 million 75 bp paired-end reads were obtained. After removing the low-quality reads, we used a modified version of a previously published procedure[27] (see Materials and methods) to assemble the reads into a non-redundant consensus. The bioinformatics workflow is depicted in the flowchart shown in Supplementary Fig. 1S. Short read data have been deposited in NCBI's Short Read Archive at http://www.ncbi.nlm.nih.gov/sra under the accession SRP008133. To assemble the paired-end reads into contigs, we used ABySS[28] with different k-mer lengths (Table 1). Although ABySS already uses paired-end information, a great improvement was found after scaffolding the contigs with SSPACE[29] (Table 2). After pooling all the scaffolds obtained from multiple k-mers, 3 930 925 sequences were collected. Using CD-HIT-EST,[30] scaffolds were grouped into clusters that were analyzed for consensus. Finally, 85 796 sequences ranging from 200 to 13 880 bp were collected. The length distribution of all the sequences is shown in Fig. 1.
Table 1.

Summary statistics of the assemblies used to assess the performance of the Multi-K de novo assembly method

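Method | k-mer | Contig > 100 | N50 | Max length | Total length (Mb) | Average contig size
single K | 58 | 3328 | 65 | 3324 | 2.076 | 87
single K | 54 | 22 397 | 159 | 5597 | 8.197 | 127
single K | 52 | 37 207 | 233 | 6087 | 12.602 | 140
single K | 50 | 51 041 | 239 | 8297 | 18.059 | 142
single K | 48 | 61 097 | 241 | 8045 | 23.245 | 144
single K | 46 | 69 717 | 242 | 8358 | 28.235 | 145
single K | 44 | 77 038 | 239 | 11 062 | 33.133 | 142
single K | 42 | 82 806 | 236 | 10 004 | 37.633 | 139
single K | 40 | 87 673 | 233 | 7322 | 41.928 | 135
single K | 38 | 91 694 | 228 | 10 092 | 45.964 | 130
single K | 34 | 97 153 | 220 | 13 873 | 53.206 | 119
multi K | | 118 764 | 257 | 13 880 | 58.075 | 159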

These statistics correspond to the set of contigs > 100 bp. k-mer, required length of identical overlap match between two reads by ABySS; N50, contig length–weighted median; max length, length of the longest contig; total length, summed length of all contigs > 100 bp.

Table 2.

Summary statistics of the scaffolds produced by SSPACE

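k-mer | Scaffold > 100 | N50 | Max length | Total length (Mb) | Average scaffold size
58 | 2805 | 65 | 4835 | 2.074 | 92
54 | 19 184 | 241 | 7069 | 8.165 | 135
52 | 31 314 | 279 | 11 041 | 12.561 | 151
50 | 41 815 | 301 | 11 669 | 18.033 | 155
48 | 49 814 | 325 | 12 403 | 23.239 | 159
46 | 57 241 | 332 | 10 964 | 28.259 | 160
44 | 63 799 | 324 | 11 062 | 33.196 | 156
42 | 69 856 | 314 | 10 950 | 37.804 | 152
40 | 74 827 | 302 | 12 140 | 42.046 | 146
38 | 79 408 | 291 | 11 339 | 46.097 | 139
34 | 87 408 | 270 | 13 880 | 53.831 | 127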

These statistics correspond to the set of scaffolds > 100 bp. k-mer, required length of identical overlap match between two reads by ABySS; N50, scaffold length–weighted median; max length, length of the longest scaffold; total length, summed length of all scaffolds > 100 bp.
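The N50 statistic reported in Tables 1 and 2 is a length-weighted median. A minimal sketch of the usual calculation is given below; it assumes the conventional definition (the length at which the cumulative sum of lengths, taken from the longest sequence downwards, first reaches half of the total), and the function name `n50` is hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are sorted from longest to shortest; N50 is the length of the
    sequence at which the running total first reaches half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no lengths supplied")


# Toy example (invented lengths in bp, not data from the study).
toy_n50 = n50([100, 200, 300, 400])
```

Because long sequences dominate the running total, N50 responds more strongly than the plain average to the joining of contigs during scaffolding.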

Figure 1.

Length distributions of scaffolds assembled by a multiple k-mer method.

To determine the expression level of the transcripts, we mapped the raw reads to the assembled sequences with SOAP[36]; the RPKM values (Reads Per Kilobase of exon model per Million mapped reads) of all the transcripts are shown in Supplementary Table S1. Figure 2 depicts the relationship between RPKM and transcript size. Transcript length increased with coverage depth and reached an asymptote at an average coverage of ∼50.
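For reference, RPKM normalizes a transcript's read count by both its length and the sequencing depth. The sketch below assumes the standard definition, RPKM = 10^9 × C / (N × L), where C is the number of reads mapped to the transcript, N the total number of mapped reads and L the transcript length in bp; the function name and the example numbers are illustrative and do not come from Supplementary Table S1.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    Equivalent to (read_count / (length / 1e3)) / (total_mapped_reads / 1e6).
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Invented example: 500 reads on a 2000 bp transcript, 10 million mapped reads.
example_rpkm = rpkm(500, 2000, 10_000_000)
```

Dividing by length is what allows short and long transcripts to be compared on one expression scale.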
Figure 2.

The relationship of RPKM versus the transcript size. RPKM, Reads Per Kilobase of exon model per Million mapped reads.

Until now, no general criteria have been proposed as standards for evaluating the quality of a transcriptome assembly. We used three factors to assess how well the assembled sequences represent the actual transcriptome population: (i) gene coverage, (ii) transcript sequence quality and (iii) completeness. Gene coverage was judged by comparison with the sequence information available for silver carp. All 13 mitochondrial protein-coding genes and 203 of 217 proteins in the NCBI database were present in our assembled scaffolds. We compared our assembled scaffolds with the zebrafish transcriptome (ENSEMBL Zv61) and found that 40 509 of 41 759 (85.9%) zebrafish transcripts have matches in assembled scaffolds. At the same time, 19 893 reciprocal best-hit BLAST matches with the zebrafish transcriptome were identified using an E-value of 1e−5. Transcriptome quality was assessed by comparing the mitochondrial protein-coding genes found in the assembled sequences with the mitochondrion sequence in GenBank (NC_010156). Across the BLAST matches between contigs and mitochondrial coding sequences, 10 185 of 10 522 aligned nucleotides (96.8%) were identical, suggesting very good transcriptome sequence quality. The observed 3.2% sequence difference might be due to high intraspecific genetic variability. Finally, in terms of sequence completeness, the relative number of full-length sequences among the 19 893 reciprocal best-hit BLAST matches to the zebrafish transcriptome was estimated. Strictly, a sequence is considered full length if it contains the complete 5′- and 3′-UTR of the mRNA. 
In this study, we used a less stringent but broadly adopted definition, considering a sequence to be full length if it comprised at least the complete coding sequence (CDS).[21] We mapped the 19 893 sequences to their corresponding CDS in the zebrafish transcript, and if the CDS was fully covered by the assembled sequence, we regarded the sequence as full length. Under this criterion, 1937 sequences (9.7%) were validated as full length, 1635 sequences (8.2%) covered more than 95% of the zebrafish CDS and 2394 sequences (12.0%) covered more than 75% of the zebrafish CDS. Because a pseudo-stop codon usually appears in a chimeric or truncated transcript, we translated the nucleotide sequences to protein sequences to verify the completeness of the transcripts. After translation, 1377 sequences were validated as full length and 2109 sequences covered more than 95% of the zebrafish CDS. In addition to the computational methods given above, a reverse transcription polymerase chain reaction (RT–PCR) assay was used to validate the quality of the assembled transcriptome. Primers for 22 transcripts with different expression levels (RPKM ranging from 336 to 10 507) were designed and all the cDNAs were amplified. Of the 22 primer pairs, 10 targeted silver carp-specific transcripts that did not have an NCBI Nr BLAST hit (see Sequence annotation). The primer information and RT–PCR results are shown in Supplementary Table S2 and Fig. S2.

Several complementary methods were used to annotate the assembled sequences. First, the assembled sequences were searched against the Nr protein database using BLASTX with an E-value of 1e−5. Of the 85 796 assembled sequences, 54 198 (63.2%) had significant matches (Supplementary Table S3). Most of the sequences had their top BLAST hit in zebrafish (44 999 sequences; 83.0%). In addition, 18 536 (34.2%) sequences matched predicted proteins and 81 (0.1%) matched unknown proteins. 
Second, silver carp sequences that had matches in the Nr database were given GO annotations with the UniProt database. Of these, 23 044 were assigned one or more of 3423 GO terms, for a total of 104 196 term occurrences. As many as 17 451 sequences were involved in biological processes and could be divided into cellular process (13 382 sequences; 76.7%), metabolic process (10 846; 62.2%), biological regulation (5032; 28.9%), multicellular organismal process (4386; 25.1%), pigmentation (4296; 24.6%), localization (4162; 23.8%), developmental process (4088; 23.4%), establishment of localization (3448; 19.8%), cellular component organization (2793; 16.0%) and response to stimulus (2275; 13.0%). Other types of function occurred at <10% each (Fig. 3).
Figure 3.

Functional classification of silver carp transcriptome and comparison with zebrafish transcriptome. (A) GO: biological process. (B) GO: molecular function. (C) GO: cellular component. In some cases, one transcript may have multiple functions. Grey, silver carp; black, zebrafish.

GO analysis also showed that 15 799 sequences were associated with a cellular component, including cell (15 114; 95.6%), cell part (15 114; 95.6%), organelle (8265; 52.3%), organelle part (3624; 22.9%) and macromolecular complex (3042; 19.2%). Moreover, 17 837 sequences showed potential molecular function, such as binding (13 027; 73%), catalytic activity (8473; 47.5%), molecular transducer activity (1648; 9.2%) and transporter activity (1406; 7.8%). Detailed information about the functional classification is shown in Supplementary Table S3. The representation of GO categories in the silver carp transcriptome set was similar to that of the zebrafish GO database, but there were a few differences in each of the three main GO categories (Fig. 3). After correcting for multiple tests, we found that 30 of 37 comparisons were significantly over- or underrepresented in comparison with the zebrafish records. For example, among the biological processes, pigmentation (GO: 0043473) was underrepresented in the silver carp transcriptome, while localization (GO: 0051179) and response to stimulus (GO: 0050896) were overrepresented. Meanwhile, annotation of the 85 759 sequences using the Clusters of Orthologous Groups of proteins (COG) database yielded good results for 14 840 putative proteins. The COG-annotated putative proteins were classified functionally into at least 25 molecular families, including biochemical metabolism, signal transduction, cellular structure and immune defense, in accordance with the categories observed in the GO annotation (Fig. 4).
Figure 4.

COG annotations of putative proteins. All putative proteins were aligned to COG database and can be classified functionally into at least 25 molecular families.


Metabolic pathways by KEGG analysis

A total of 38 200 assembled sequences were found to be involved in 218 predicted KEGG metabolic pathways. The number of sequences ranged from 3 to 4510 (Supplementary Table S4). The top 20 pathways with the greatest number of sequences are shown in Table 3, and the greatest number of transcripts was found in the metabolic pathways. The top 10 metabolic pathways were: purine metabolism (789), pyrimidine metabolism (473), oxidative phosphorylation (436), inositol phosphate metabolism (435), glycerophospholipid metabolism (371), riboflavin metabolism (347), glycolysis/gluconeogenesis (341), lysine degradation (337), pyruvate metabolism (239) and starch and sucrose metabolism (218) (Supplementary Table S5).
Table 3.

The top 20 pathways with highest sequence numbers

Num | Pathway | All genes with pathway annotation (38 200) | Pathway ID
1 | Metabolic pathways | 4510 (11.81%) | ko01100
2 | Pathways in cancer | 1790 (4.69%) | ko05200
3 | Regulation of actin cytoskeleton | 1634 (4.28%) | ko04810
4 | Focal adhesion | 1518 (3.97%) | ko04510
5 | MAPK signaling pathway | 1463 (3.83%) | ko04010
6 | Endocytosis | 1345 (3.52%) | ko04144
7 | Tight junction | 1256 (3.29%) | ko04530
8 | Adherens junction | 1073 (2.81%) | ko04520
9 | Phagosome | 1034 (2.71%) | ko04145
10 | Dilated cardiomyopathy | 1027 (2.69%) | ko05414
11 | Vascular smooth muscle contraction | 1014 (2.65%) | ko04270
12 | Complement and coagulation cascades | 1005 (2.63%) | ko04610
13 | Hypertrophic cardiomyopathy (HCM) | 957 (2.51%) | ko05410
14 | Chemokine signaling pathway | 955 (2.5%) | ko04062
15 | Calcium signaling pathway | 942 (2.47%) | ko04020
16 | Axon guidance | 939 (2.46%) | ko04360
17 | Insulin signaling pathway | 912 (2.39%) | ko04910
18 | Huntington's disease | 907 (2.37%) | ko05016
19 | Leukocyte transendothelial migration | 869 (2.27%) | ko04670
20 | Protein processing in endoplasmic reticulum | 864 (2.26%) | ko04141

Positive selection of genes involved in GSH synthesis

Microcystins (MCs) are cyclic non-ribosomal peptides produced by cyanobacteria. They are cyanotoxins and can be very toxic to fishes and other animals, including humans. With the increasing frequency of water bloom outbreaks in many countries, the task of eliminating them has become both more urgent and more difficult. Recently, silver carp and bighead carp have been used to counteract cyanobacteria in many lakes in China.[24,37] Despite the hepatotoxicity of MCs, the body weight of silver carp increases very fast in bodies of water that are full of MCs.[38] The high tolerance of silver carp to MCs might be due to a high basal GSH level in the liver or to increased GSH synthesis.[39] To understand the mechanism behind the high tolerance of silver carp to MCs, we evaluated whether the genes involved in glutathione synthesis were under positive selection in silver carp. From the KEGG database, 44 zebrafish genes involved in glutathione synthesis were obtained with their full CDS regions (PATHWAY dre00480). After searching against the whole set of transcriptome sequences, we found that most of the zebrafish CDS had been recovered (Table 4). Sequence pairs were constructed from each zebrafish CDS and its corresponding best-hit silver carp scaffold and were then tested for positive selection. For these 44 genes, the average number of codons was 356 (range, 137–966). The F3 × 4 model of codon frequencies was used, and models M7 and M8 were compared to determine which sequence pairs were under positive selection. Where the log-likelihood value under M8 was much higher than the corresponding value under M7, model M8 fits the sequence pair better than model M7. The LRT showed that five sequence pairs had P-values <0.05. They are thought to have experienced positive selection between silver carp and zebrafish (Table 5).
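The M7 versus M8 comparison can be made concrete with a short calculation. The sketch below assumes the common practice of comparing twice the log-likelihood difference with a chi-square distribution with 2 degrees of freedom (M8 adds two parameters, p0 and ω, to M7); the text does not state which degrees of freedom were used, so this is an assumption. For 2 degrees of freedom the chi-square tail probability has the closed form exp(−x/2), so no statistics library is needed. The log-likelihoods are those listed in Table 5; the function name is hypothetical.

```python
import math


def lrt_m7_vs_m8(lnl_m7, lnl_m8):
    """Likelihood ratio test of M8 (beta & omega) against M7 (beta).

    Returns (statistic, p_value), assuming a chi-square null distribution
    with 2 degrees of freedom, whose survival function is exp(-x / 2).
    """
    statistic = 2.0 * (lnl_m8 - lnl_m7)
    if statistic <= 0:
        return statistic, 1.0
    return statistic, math.exp(-statistic / 2.0)


# Log-likelihoods (M7, M8) as listed in Table 5.
table5 = {
    "dre_322533": (-4530.079761, -4519.934204),
    "dre_406703": (-1299.159488, -1295.384925),
    "dre_563972": (-1399.109283, -1390.997418),
    "dre_79381": (-1105.063900, -1100.372249),
    "dre_799288": (-1334.096144, -1328.432914),
}
results = {gene: lrt_m7_vs_m8(m7, m8) for gene, (m7, m8) in table5.items()}
```

Under these assumptions all five pairs fall below the 0.05 threshold, in line with the statement above.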
Table 4.

Sequences recovered in the glutathione synthesizing pathway

Gene id | Description | Length | Matched
dre:100002145 | Gamma-glutamyltranspeptidase | 2082 | 0
dre:100006589 | Isocitrate dehydrogenase 1 (NADP+) | 1290 | 1254
dre:100124622 | Glutathione S-transferase | 672 | 514
dre:100330864 | Ribonucleoside-diphosphate reductase subunit M2-like | 1161 | 1057
dre:100333757 | Gamma-glutamyltransferase 5-like | 1521 | 1162
dre:114426 | Ornithine decarboxylase | 1386 | 1370
dre:30733 | Ribonucleotide reductase M2 polypeptide | 1161 | 1057
dre:30740 | Ribonucleotide reductase M1 polypeptide | 2385 | 2385
dre:322533 | Alanyl (membrane) aminopeptidase b | 2898 | 1238
dre:324366 | Glutathione S-transferase M | 660 | 658
dre:324900 | Protein-disulfide reductase (glutathione) | 519 | 515
dre:326857 | Glutamate-cysteine ligase, catalytic subunit | 1896 | 1896
dre:333974 | Glutamate-cysteine ligase, modifier subunit | 822 | 736
dre:352926 | Glutathione peroxidase 1a | 576 | 565
dre:352928 | Glutathione peroxidase 4a | 561 | 561
dre:352929 | Glutathione peroxidase 4b | 576 | 576
dre:386951 | Isocitrate dehydrogenase 2 (NADP+), mitochondrial | 1350 | 1350
dre:394009 | Spermidine synthase | 870 | 749
dre:406278 | Gamma-glutamylcyclotransferase | 663 | 600
dre:406703 | Glutathione S-transferase, alpha-like | 672 | 571
dre:406736 | Glutathione S-transferase | 453 | 425
dre:406762 | Phosphogluconate hydrogenase | 1536 | 1418
dre:431762 | Glutathione S-transferase | 459 | 451
dre:436833 | Glutathione S-transferase kappa 1 | 690 | 680
dre:436894 | Glutathione S-transferase | 723 | 722
dre:449784 | Microsomal glutathione S-transferase 1 | 465 | 244
dre:450084 | Glutathione synthetase | 1428 | 1385
dre:552981 | Glutathione peroxidase 7 | 561 | 0
dre:553169 | Glutathione S-transferase pi 2 | 627 | 625
dre:553575 | Glutathione reductase (NADPH) | 1278 | 1257
dre:555478 | Aminopeptidase N | 2883 | 763
dre:562854 | Leucine aminopeptidase 3 | 1554 | 1320
dre:563972 | Glutathione S-transferase theta 1a | 729 | 729
dre:566746 | Gamma-glutamyltranspeptidase | 1773 | 82
dre:567275 | Glutathione S-transferase | 423 | 423
dre:568744 | Glutathione S-transferase M3 | 660 | 658
dre:569014 | Gamma-glutamyltranspeptidase 1-like | 1725 | 281
dre:570579 | Glucose-6-phosphate dehydrogenase | 1572 | 1572
dre:571365 | Glutathione S-transferase | 660 | 658
dre:723997 | Microsomal glutathione S-transferase 2 | 411 | 0
dre:79381 | Glutathione S-transferase pi | 627 | 626
dre:798788 | Glutathione peroxidase 3 | 669 | 529
dre:799288 | Glutathione S-transferase | 672 | 512
dre:80872 | Spermine synthase | 1083 | 1057
Table 5.

Genes determined to be under positive selection

Gene id | Model | Log likelihood | dN/dS | Estimates of parameters | Sites under selection (P > 0.95)
dre_322533 | M7 (beta) | −4530.079761 | 0.3750 | p = 0.00500 and q = 0.00810 | 164, 167, 256
 | M8 (beta & ω) | −4519.934204 | 0.7805 | p0 = 0.93767, p = 0.04957, q = 0.14698 and w = 8.71914 |
dre_406703 | M7 (beta) | −1299.159488 | 0.1663 | p = 0.13888 and q = 0.68894 | No
 | M8 (beta & ω) | −1295.384925 | 16.1965 | p0 = 0.94248, p = 9.04140, q = 99.00000 and w = 280.21751 |
dre_563972 | M7 (beta) | −1399.109283 | 0.3750 | p = 0.00542 and q = 0.00912 | 224, 226, 227, 233
 | M8 (beta & ω) | −1390.997418 | 2.5343 | p0 = 0.91793, p = 9.32406, q = 32.38965 and w = 28.38710 |
dre_79381 | M7 (beta) | −1105.063900 | 0.1428 | p = 0.03221 and q = 0.18627 | 129, 174
 | M8 (beta & ω) | −1100.372249 | 14.0364 | p0 = 0.97094, p = 7.50265, q = 99.00000 and w = 480.64973 |
dre_799288 | M7 (beta) | −1334.096144 | 0.1833 | p = 0.12802 and q = 0.56863 | No
 | M8 (beta & ω) | −1328.432914 | 13.0557 | p0 = 0.92145, p = 9.04080, q = 99.00000 and w = 165.23531 |

Identification of SSR or microsatellites

Because SSRs, or microsatellite markers, are used in many animal breeding applications, the 85 796 sequences were searched for SSR markers. We obtained 13 327 SSR markers in 9636 sequences with MISA.[34] Mononucleotide repeats were the most abundant (7693, 57.3%), followed by dinucleotide repeats (3733, 28.0%) and trinucleotide repeats (1538, 11.5%). Other types of repeat unit occurred at <2% each. SSR markers were divided into two groups: perfect SSR markers (a single repeat motif such as ‘AGC’) and compound SSR markers (composed of two or more SSR markers separated by <100 bp). A total of 1206 (9.0%) compound SSR markers were identified. After excluding the mononucleotide repeats, the frequency of each SSR motif was calculated. Among the dinucleotide repeat motifs, AC/GT was the most abundant, with 69.2%; the most frequent trinucleotide motif was ATC/GAT, with 27.8%, and the most frequent tetranucleotide motif was AGAT/ATCT, with 21.1%.

Discussion

The transcriptome is the complete repertoire of expressed RNA transcripts in the cell and its characterization is essential to understanding the functional complexity of the genome. Using next-generation sequencing technology, we were able to sequence and annotate the transcriptome of silver carp. This is the most comprehensive study of silver carp transcriptome data to date. The transcriptome sequences obtained in this study improve our understanding of the genetic makeup of silver carp, which until now has been very limited. The Illumina sequencing yielded 17.10 million paired-end reads for silver carp. The 85 769 sequences produced here may be useful for further research into silver carp functional genomics. The overall GC content of the silver carp transcriptome was 39.2%, which was lower than that of the zebrafish cDNA library (Ensembl 61).[31] However, when we removed the assembled sequences that contained gaps, the GC content rose to 45.5%, which was similar to that of the zebrafish cDNA library (46.2%). Because a complete genome sequence is lacking, the quality of transcriptome analysis in non-model species relies largely on the contigs and scaffolds assembled from the raw reads. After reassembling the transcriptomes of two mosquitoes with known genomes using a de novo assembler, Gibbons et al.[40] found that short reads can be used to assemble transcriptomes of non-model organisms. 
Although the development of short-read assemblers[28,41,42] has made it possible to handle ever larger numbers of reads, de novo assembly of transcriptomes from short reads without a reference genome is still difficult for transcripts with highly variable coverage.[43] A higher k-mer length will theoretically generate a more contiguous assembly of highly expressed RNAs, while poorly expressed transcripts will be more easily obtained if a lower k-mer length is used.[41] Therefore, an approach for de novo assembly of the transcriptome using various k-mer lengths is highly desirable and has been proven useful.[27] The final assembly statistics indicate that the multiple k-mer method used in this study outperforms all single k-mer assemblies (Table 1). Among the single k-mer assemblies, the average length and N50 were highest when the k-mer was set to 46, making it the best single k-mer assembly. However, the number of contigs >100 bp and the total length of the multiple k-mer assembly were roughly twice those of the best single k-mer ABySS assembly. This marked increase was accompanied by a higher N50 and average contig size, indicating a substantial improvement in contiguity. The multiple k-mer method assembled the Illumina reads into contigs, but the positional information in paired-end reads was not used at all. In this study, we improved upon the method described by Surget-Groba et al.[27] The results showed that using SSPACE to scaffold the contigs produced by each k-mer could produce longer sequences. The statistics before and after scaffolding (Tables 1 and 2) indicate that a higher average sequence length can be obtained by joining two contigs originating from the two ends of a DNA fragment. The maximum and average lengths of the sequences were both extended after scaffolding. To assess the quality of the assembled transcriptome, we used both computational and experimental assays to validate the transcripts generated. 
The RT–PCR results confirmed that our method is reliable for the recovery of both highly and poorly expressed transcripts.
Both gene annotation and KEGG pathway analyses are useful for predicting potential genes and their functions at the whole-transcriptome level. In the silver carp transcriptome, as revealed by this study, the predominant gene clusters are associated with the cell, cell part and organelle (cellular component); binding and catalytic activity (molecular function); and metabolic process and cellular process (biological process). Similar results were found in Sus scrofa,[44] European eel[21] and rainbow trout.[45] However, in the chickpea transcriptome, sequences were found to be mainly involved in protein metabolism (biological process), chloroplasts (cellular component) and transferase activity (molecular function). This suggests a remarkable difference between animals and plants. KEGG analysis showed that more than 44.5% of transcripts were involved in 218 known metabolic or signaling pathways, including cell adherence, migration, apoptosis and immune-related processes. The KEGG pathway analysis and gene annotation may be useful for further investigation of gene function in the future. Although there are differences in GO annotations between our silver carp transcriptome and the available database for zebrafish, concordance in the overall patterns suggests that our library was widely sampled and provides a good representation.
One previous study reported that GSH can conjugate with MC on its sulfhydryl, which is the first step in the detoxification of a cyanobacterial toxin in aquatic organisms.[46] The glutathione S-transferase (GST) gene plays important roles both in the biosynthesis of GST and catalysis of the reaction between GSH and MC. M8 assumes 11 site classes: 10 classes for the beta distribution and 1 class for the positively selected sites.
Therefore, it is suitable for detecting positive selection in sequence pairs. Although the value of dN/dS estimated under this model might not be precise, the LRT is most likely reliable. Positive selection pressure on the GST gene might be the result of the adaptation of silver carp to the eutrophied bodies of water in the middle and lower reaches of the Yangtze River.
Genetic markers are of great importance to understanding genetic variation and to identifying genes and quantitative trait loci for traits of interest in molecular breeding applications. Until now, only a small number of genetic markers have been available for silver carp.[47,48] One of the main reasons for this is the lack of genome sequence information. Alternatively, transcriptomes have been used for the discovery of genetic markers.[44,49,50] Although markers developed from transcriptomes are less polymorphic, they have been found to be very useful in trait mapping[51] and comparative genomics studies.[52] It has been reported that SSRs comprise 3% of the human genome, with the largest proportion of them being dinucleotide repeats (0.5%).[53] In this study, 28% of the 13 327 silver carp SSRs were found to be dinucleotide repeats, followed by trinucleotide repeats (11.5%). The most common dinucleotide repeats were AC and AG, in contrast to those found in the human genome (AC and AT). A similar difference was found in trinucleotide repeats, with ATC and AGG being most common in silver carp and AAT and AAC being most common in human.[53]
In conclusion, we have determined the transcriptome of silver carp by high-throughput Illumina paired-end sequencing. Our study obtained 85 759 scaffolds and demonstrated some important features of the silver carp transcriptome, such as gene annotation and KEGG pathway analysis, as shown by cross-transcriptome analysis. In addition, we identified 13 324 SSRs that can serve as reliable genetic markers.
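The di- and trinucleotide repeat classes discussed above can be located with a simple pattern search. The sketch below is illustrative only and does not reproduce the tool or parameters used in this study. The function name `find_ssrs` and the minimum repeat counts (six for dinucleotides, five for trinucleotides) are assumptions chosen for the example.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem repeats.
MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR in seq."""
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # A motif of length k followed by at least n - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if len(set(motif)) == 1:
                continue  # homopolymer run, not a di- or trinucleotide repeat
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits


# Toy sequence with one (AC)7 repeat and one (ATC)5 repeat.
toy_seq = "TT" + "AC" * 7 + "GGG" + "ATC" * 5 + "TT"
toy_hits = find_ssrs(toy_seq)
```

Tallying the motifs returned for every assembled sequence would give motif frequencies of the kind compared with the human genome above. A fuller treatment would also merge equivalent motifs, for example by treating AC, CA, GT and TG as one class.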
We also found five genes under positive selection between silver carp and zebrafish. This study will help improve the understanding of the recent speciation and adaptation of Cyprinidae and provides useful resources and markers for future functional genomic research.
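The likelihood ratio test behind the positive selection result can be sketched as follows. This example assumes that the null model is M7 (beta), the usual partner of M8, so the test statistic is compared with a chi-square distribution with two degrees of freedom. The null model is not named in this section, and the log-likelihood values in the example are invented for illustration. For two degrees of freedom the chi-square tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def lrt_df2(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by 2 parameters.

    lnl_null: log-likelihood under the null model (assumed here to be M7).
    lnl_alt:  log-likelihood under the alternative model (M8).
    Returns (statistic, p_value).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.exp(-stat / 2.0)  # chi-square survival function, exact for df = 2
    return stat, p_value


# Hypothetical log-likelihoods for one gene, not values from the paper.
stat, p = lrt_df2(-1234.5, -1230.0)
selected = p < 0.05
```

A gene would be reported as a candidate for positive selection when the p-value falls below the chosen threshold, ideally after correcting for the number of genes tested.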

Supplementary Data

Supplementary Data are available at www.dnaresearch.oxfordjournals.org.

Funding

This research was supported by grants from the National Basic Research Program of China (973 Program 2010CB126302), the National Natural Science Foundation of China (31090254 and U1036603) and the Chinese Academy of Sciences (KSCX2-EW-Q-12).
  48 in total

1.  Initial sequencing and analysis of the human genome.

Authors:  E S Lander; L M Linton; B Birren; C Nusbaum; M C Zody; J Baldwin; K Devon; K Dewar; M Doyle; W FitzHugh; R Funke; D Gage; K Harris; A Heaford; J Howland; L Kann; J Lehoczky; R LeVine; P McEwan; K McKernan; J Meldrim; J P Mesirov; C Miranda; W Morris; J Naylor; C Raymond; M Rosetti; R Santos; A Sheridan; C Sougnez; Y Stange-Thomann; N Stojanovic; A Subramanian; D Wyman; J Rogers; J Sulston; R Ainscough; S Beck; D Bentley; J Burton; C Clee; N Carter; A Coulson; R Deadman; P Deloukas; A Dunham; I Dunham; R Durbin; L French; D Grafham; S Gregory; T Hubbard; S Humphray; A Hunt; M Jones; C Lloyd; A McMurray; L Matthews; S Mercer; S Milne; J C Mullikin; A Mungall; R Plumb; M Ross; R Shownkeen; S Sims; R H Waterston; R K Wilson; L W Hillier; J D McPherson; M A Marra; E R Mardis; L A Fulton; A T Chinwalla; K H Pepin; W R Gish; S L Chissoe; M C Wendl; K D Delehaunty; T L Miner; A Delehaunty; J B Kramer; L L Cook; R S Fulton; D L Johnson; P J Minx; S W Clifton; T Hawkins; E Branscomb; P Predki; P Richardson; S Wenning; T Slezak; N Doggett; J F Cheng; A Olsen; S Lucas; C Elkin; E Uberbacher; M Frazier; R A Gibbs; D M Muzny; S E Scherer; J B Bouck; E J Sodergren; K C Worley; C M Rives; J H Gorrell; M L Metzker; S L Naylor; R S Kucherlapati; D L Nelson; G M Weinstock; Y Sakaki; A Fujiyama; M Hattori; T Yada; A Toyoda; T Itoh; C Kawagoe; H Watanabe; Y Totoki; T Taylor; J Weissenbach; R Heilig; W Saurin; F Artiguenave; P Brottier; T Bruls; E Pelletier; C Robert; P Wincker; D R Smith; L Doucette-Stamm; M Rubenfield; K Weinstock; H M Lee; J Dubois; A Rosenthal; M Platzer; G Nyakatura; S Taudien; A Rump; H Yang; J Yu; J Wang; G Huang; J Gu; L Hood; L Rowen; A Madan; S Qin; R W Davis; N A Federspiel; A P Abola; M J Proctor; R M Myers; J Schmutz; M Dickson; J Grimwood; D R Cox; M V Olson; R Kaul; C Raymond; N Shimizu; K Kawasaki; S Minoshima; G A Evans; M Athanasiou; R Schultz; B A Roe; F Chen; H Pan; J Ramser; H Lehrach; R Reinhardt; W R McCombie; M de la Bastide; N Dedhia; 
H Blöcker; K Hornischer; G Nordsiek; R Agarwala; L Aravind; J A Bailey; A Bateman; S Batzoglou; E Birney; P Bork; D G Brown; C B Burge; L Cerutti; H C Chen; D Church; M Clamp; R R Copley; T Doerks; S R Eddy; E E Eichler; T S Furey; J Galagan; J G Gilbert; C Harmon; Y Hayashizaki; D Haussler; H Hermjakob; K Hokamp; W Jang; L S Johnson; T A Jones; S Kasif; A Kaspryzk; S Kennedy; W J Kent; P Kitts; E V Koonin; I Korf; D Kulp; D Lancet; T M Lowe; A McLysaght; T Mikkelsen; J V Moran; N Mulder; V J Pollara; C P Ponting; G Schuler; J Schultz; G Slater; A F Smit; E Stupka; J Szustakowki; D Thierry-Mieg; J Thierry-Mieg; L Wagner; J Wallis; R Wheeler; A Williams; Y I Wolf; K H Wolfe; S P Yang; R F Yeh; F Collins; M S Guyer; J Peterson; A Felsenfeld; K A Wetterstrand; A Patrinos; M J Morgan; P de Jong; J J Catanese; K Osoegawa; H Shizuya; S Choi; Y J Chen; J Szustakowki
Journal:  Nature       Date:  2001-02-15       Impact factor: 49.962

2.  Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes.

Authors:  Samuel Aparicio; Jarrod Chapman; Elia Stupka; Nik Putnam; Jer-Ming Chia; Paramvir Dehal; Alan Christoffels; Sam Rash; Shawn Hoon; Arian Smit; Maarten D Sollewijn Gelpke; Jared Roach; Tania Oh; Isaac Y Ho; Marie Wong; Chris Detter; Frans Verhoef; Paul Predki; Alice Tay; Susan Lucas; Paul Richardson; Sarah F Smith; Melody S Clark; Yvonne J K Edwards; Norman Doggett; Andrey Zharkikh; Sean V Tavtigian; Dmitry Pruss; Mary Barnstead; Cheryl Evans; Holly Baden; Justin Powell; Gustavo Glusman; Lee Rowen; Leroy Hood; Y H Tan; Greg Elgar; Trevor Hawkins; Byrappa Venkatesh; Daniel Rokhsar; Sydney Brenner
Journal:  Science       Date:  2002-07-25       Impact factor: 47.728

3.  Scaffolding pre-assembled contigs using SSPACE.

Authors:  Marten Boetzer; Christiaan V Henkel; Hans J Jansen; Derek Butler; Walter Pirovano
Journal:  Bioinformatics       Date:  2010-12-12       Impact factor: 6.937

4.  The genome sequence of Atlantic cod reveals a unique immune system.

Authors:  Bastiaan Star; Alexander J Nederbragt; Sissel Jentoft; Unni Grimholt; Martin Malmstrøm; Tone F Gregers; Trine B Rounge; Jonas Paulsen; Monica H Solbakken; Animesh Sharma; Ola F Wetten; Anders Lanzén; Roger Winer; James Knight; Jan-Hinnerk Vogel; Bronwen Aken; Oivind Andersen; Karin Lagesen; Ave Tooming-Klunderud; Rolf B Edvardsen; Kirubakaran G Tina; Mari Espelund; Chirag Nepal; Christopher Previti; Bård Ove Karlsen; Truls Moum; Morten Skage; Paul R Berg; Tor Gjøen; Heiner Kuhl; Jim Thorsen; Ketil Malde; Richard Reinhardt; Lei Du; Steinar D Johansen; Steve Searle; Sigbjørn Lien; Frank Nilsen; Inge Jonassen; Stig W Omholt; Nils Chr Stenseth; Kjetill S Jakobsen
Journal:  Nature       Date:  2011-08-10       Impact factor: 49.962

5.  Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing.

Authors:  Qun Pan; Ofer Shai; Leo J Lee; Brendan J Frey; Benjamin J Blencowe
Journal:  Nat Genet       Date:  2008-11-02       Impact factor: 38.330

6.  Next-generation DNA sequencing.

Authors:  Jay Shendure; Hanlee Ji
Journal:  Nat Biotechnol       Date:  2008-10       Impact factor: 54.908

7.  A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome.

Authors:  Marc Sultan; Marcel H Schulz; Hugues Richard; Alon Magen; Andreas Klingenhoff; Matthias Scherf; Martin Seifert; Tatjana Borodina; Aleksey Soldatov; Dmitri Parkhomchuk; Dominic Schmidt; Sean O'Keeffe; Stefan Haas; Martin Vingron; Hans Lehrach; Marie-Laure Yaspo
Journal:  Science       Date:  2008-07-03       Impact factor: 47.728

8.  Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.).

Authors:  T Thiel; W Michalek; R K Varshney; A Graner
Journal:  Theor Appl Genet       Date:  2002-09-14       Impact factor: 5.699

9.  Sequencing and characterization of the guppy (Poecilia reticulata) transcriptome.

Authors:  Bonnie A Fraser; Cameron J Weadick; Ilana Janowitz; F Helen Rodd; Kimberly A Hughes
Journal:  BMC Genomics       Date:  2011-04-20       Impact factor: 3.969

10.  Analysis of muscle and ovary transcriptome of Sus scrofa: assembly, annotation and marker discovery.

Authors:  Qinghua Nie; Meixia Fang; Xinzheng Jia; Wei Zhang; Xiaoning Zhou; Xiaomei He; Xiquan Zhang
Journal:  DNA Res       Date:  2011-07-05       Impact factor: 4.458

