Literature DB >> 25338101

In-depth transcriptome analysis of the red swamp crayfish Procambarus clarkii.

Huaishun Shen1, Yacheng Hu1, Yuanchao Ma1, Xin Zhou2, Zenghong Xu2, Yan Shui2, Chunyan Li3, Peng Xu3, Xiaowen Sun3.   

Abstract

The red swamp crayfish Procambarus clarkii is a highly adaptable, tolerant, and fecund freshwater crayfish that inhabits a wide range of aquatic environments. It is an important crustacean model organism that is used in many research fields, including animal behavior, environmental stress and toxicity, and studies of viral infection. Despite its widespread use, knowledge of the crayfish genome is very limited and insufficient for meaningful research. This is the use of next-generation sequencing techniques to analyze the crayfish transcriptome. A total of 324.97 million raw reads of 100 base pairs were generated, and a total of 88,463 transcripts were assembled de novo using Trinity software, producing 55,278 non-redundant transcripts. Comparison of digital gene expression between four different tissues revealed differentially expressed genes, in which more overexpressed genes were found in the hepatopancreas than in other tissues, and more underexpressed genes were found in the testis and the ovary than in other tissues. Gene ontology (GO) and KEGG enrichment analysis of differentially expressed genes revealed that metabolite- and immune-related pathway genes were enriched in the hepatopancreas, and DNA replication-related pathway genes were enriched in the ovary and the testis, which is consistent with the important role of the hepatopancreas in metabolism, immunity, and the stress response, and with that of the ovary and the testis in reproduction. It was also found that 14 vitellogenin transcripts were highly expressed specifically in the hepatopancreas, and 6 transcripts were highly expressed specifically in the ovary, but no vitellogenin transcripts were highly expressed in both the hepatopancreas and the ovary. These results provide new insight into the role of vitellogenin in crustaceans. In addition, 243,764 SNP sites and 43,205 microsatellite sequences were identified in the sequencing data. We believe that our results provide an important genome resource for the crayfish.

Entities:  

Mesh:

Substances:

Year:  2014        PMID: 25338101      PMCID: PMC4206422          DOI: 10.1371/journal.pone.0110548

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

The red swamp crayfish Procambarus clarkii is a freshwater crayfish species that is native to parts of Mexico and the United States [1], but is also commonly found outside its natural range in Asia, Africa, Europe, and elsewhere in the Americas, where it is often considered to be an invasive pest [2]. P. clarkii was introduced to China from Japan in the 1930s [3]. Crayfish farming began in in the 18th century in Louisiana in the USA, where the species was cultivated in rice fields. Crayfish have been farmed extensively in China since the 1990s, and China is now the world's leading crayfish producer [4]. P. clarkii is a highly adaptable, tolerant, and fecund freshwater crayfish that can inhabit a wide range of aquatic environments, including those with moderate salinity, low oxygen levels, extreme temperatures, and pollution [2], [5]. Because of these characteristics, in addition to its economic role, the crayfish has become an important crustacean model organism in research on viral infection [6]–[10], animal behavior [11]–[17], and environmental stress and toxicity [18]–[23]. Despite great interest in this organism, knowledge of the crayfish genome is very limited, and gene discovery has been performed on a relatively small scale. Only 330 expressed sequences (EST) and 547 nucleotide sequences have been deposited in GenBank (accessed on Jul 29, 2014) for the crayfish, which is fewer than the close relative Pacifastacus leniusculus (1063 EST and 1100 nucleotide sequences) and far less fewer than other economically important crustaceans, such as the freshwater prawn Macrobrachium nipponense, the giant freshwater prawn Macrobrachium rosenbergii, the pacific white shrimp Litopenaeus vannamei, and others. Furthermore, only a very few genetic markers have been discovered for P. clarkii [3], [24]–[26]. The traditional methodology to explore expressed sequence tags (ESTs) involves construction of a cDNA library followed by Sanger sequencing, which is time-consuming and inefficient. Normally, the numbers of ESTs generated using this method is no more than ten thousand [27]. In recent years, next-generation sequencing technologies from companies such as 454 Life Sciences, Illumina, and Applied Biosystems (SOLiD sequencing) have been widely used to explore genomic information in model and non-model organisms. In comparison to traditional Sanger sequencing technology, next-generation sequencing technologies are superior in many aspects, and in general they are able to provide enormous amounts of sequence data with a greater breadth and depth of information, in shorter times and at a significantly lower cost [28]–[30]. The expressed sequences generated using next-generation sequencing technologies are often on the order of thousands or hundreds of thousands of sequences, which are ten-fold or one-hundred-fold greater than the number identified by traditional technologies. Crustaceans studied using next-generation sequencing technologies include the giant freshwater prawn Macrobrachium rosenbergii [31], the orient river prawn Macrobrachium nipponense [32], the Chinese mitten crab Eriocheir sinensis [33]–[36], the pacific white shrimp Litopenaeus vannamei [37]–[40], the Chinese shrimp Fenneropenaeus chinensis [41], the pandalid shrimp Pandalus latirostris [42], and the crab Portunus trituberculatus [43]–[44]. These data have significantly enriched our genetic and genomic knowledge of crustaceans. In this study, hi-seq sequencing technology was used to sequence the transcriptomes of 4 major organs in the crayfish: hepatopancreas, muscle, ovary, and testis. This data was used to generate expressed sequence data, simple sequence repeat markers, and SNP markers that represent a resource for trait mapping, as well as differential organ gene expression profiles, to better understand the functions of the studied organs in the crayfish. We believe that the data obtained from this study represent an import resource for crayfish research into gene function, molecular events associated with breeding, and other areas.

Material and Methods

Ethics statement

This study was approved by the Animal Care and Use Committee of the Center for Applied Aquatic Genomics at Chinese Academy of Fishery Sciences.

Animal collection

P. clarkii weighing approximately 10–20 g were collected from a crayfish farm in Xuyi, Jiangsu Province, China. Collected crayfish were cultured in water tanks with adequate aeration at 20°C and a natural photoperiod, and were fed with a commercial crayfish diet once per day. Four tissue types (hepatopancreas, muscle, ovary, and testis) were collected, and each group of tissues contained samples from approximately ten crayfish. The tissue samples were frozen immediately in liquid nitrogen, and stored at −80°C.

RNA isolation and Illumina sequencing

Total RNA from various tissues was isolated using the RNeasy Plus Mini Kit (Qiagen, Valencia, CA, USA) according to the manufacturer's protocol, and treated with RNase-free DNase I (Qiagen) to remove genomic DNA. RNA integrity was evaluated by 1.5% agarose gel electrophoresis. RNA concentrations were measured and purity was determined using a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA). RNA-seq library preparation and sequencing was carried out by the Genomic Analysis Lab of The Institute of Genetics and Developmental Biology of the Chinese Academy of Sciences (Beijing, China). Approximately 5 µg of DNase-treated total RNA was used to construct a cDNA library following the protocols of the Illumina TruSeq RNA Sample Preparation Kit (Illumina, San Diego, CA 92122, USA). The cDNA libraries were amplified by PCR and contained TruSeq indexes 1–4 within the adaptors. Amplified libraries yielded approximately 500 ng of cDNA with an average length of approximately 270 base pairs (bp). Finally, the libraries were sequenced on an Illumina HiSeq 2000 instrument with 100 bp paired-end (PE) reads.

De novo assembly and transcriptome analysis

Raw reads, which were generated by the Illumina/Solexa sequencer, were first trimmed by removing adapter sequences. Low quality reads (quality scores less than 20) were trimmed and short length reads (<10 bp) were removed [45]–[46]. The resulting high-quality reads were used in subsequent assembly. The Crayfish transcriptome was de novo assembled using Trinity software (vision 2013.02.25) with the default parameters [47]. In brief, three steps were performed. First, data was processed by Inchworm, in which the high-quality reads were combined to form longer fragments called contigs. Second, data was processed by Chrysalis, in which sequences were obtained by connecting contigs in such a manner that they could not be extended on either end, which resulted in de Bruijn graphs. Finally, de Bruijn graphs were further treated by Butterfly to obtain transcripts.

Transcriptome annotation and gene ontology analysis

All transcripts were compared with the NCBI non-redundant (nr) protein database, GO database, COG database, and KEGG database for functional annotation using BLAST software with an e-value cutoff of 1e-5 [48]. Functional annotation was performed with gene ontology (GO) terms (www.geneontology.org) that were analyzed using Blast2go software (http://www.blast2go.com/b2ghome) [49]. The COG and KEGG pathway annotations were performed using Blastall software against the COG and KEGG databases [48].

Differentially expressed genes

To obtain expression levels for every transcript in different tissues, cleaned reads were first mapped to all transcripts using Bowtie software [50]–[51], then the FPKM (fragments per kilobase of exon per million fragments mapped) value of every transcript was obtained using RSEM (RNASeq by expectation maximization, http://deweylab.biostat.wisc.edu/rsem/) software. Differentially expressed genes were identified using edgeR (empirical analysis of digital gene expression data in R, http://www.bioconductor.org/packages/release/bioc/html/edgeR.html) software [52]–[53]. For this analysis step, the filtering threshold was set as an FDR (false discovery rate) <0.5.

RT-PCR amplification of transcripts

To validate the assembly of the crayfish transcriptome, 20 selected transcripts were used for expression analysis by RT-PCR. Total RNA was prepared from the four tissues (hepatopancreas, muscle, ovary, and testis) of crayfish using Trizol reagent in accordance with the manufacturer's instructions. Total RNA was treated with RQ1 RNase-free DNase (Promega, Madison, WI, USA) to avoid genomic DNA contamination, and then reverse transcribed using M-MLV reverse transcriptase (Promega, USA) according to the manufacturer's instructions. The synthesized cDNA was used as a template for PCR. The following PCR program was used: denaturation at 94°C for 3 min, 35 amplification cycles of 94°C for 30 s, 58°C for 30 s, and 72°C for 30 s, and a final extension at 72°C for 10 min. The PCR products were determined by 1.5% agarose gel electrophoresis using DNA markers. The expression of the 18S RNA gene of Procambrus clarkii (accession number: EU920952.1) was selected as reference gene, using the primer pair Pc18S-F (5′-ATCACGTCTCTGACCGCAAG-3′) and Pc18S-R (5′-GACACTTGAAAGATGCGGCG-3′).

SNP and microsatellite sequence identification

The raw reads were exported in FASTQ format to allow them to be imported into software for SNP calling. SAMtools (http://samtools.sourceforge.net/) and VarScan (http://varscan.sourceforge.net/) software were applied to align reads to the reference transcriptome and to detect SNPs [54]–[55]. For this analysis step, the filtering threshold was set as a quality score no less than 20. Msatcommander software was used to identify microsatellites from assembled contigs, as well as for primer design [56]. The mononucleotide repeats were ignored by modifying the configuration file. The repeat thresholds for di-, tri-, tetra-, penta-, and hexa-nucleotide motifs were set as 8, 5, 5, 5, and 5, respectively. Only microsatellite sequences with flanking sequences longer than 50 bp on both sides were collected for future marker development.

Results and Discussion

Illumina sequencing from crayfish tissues

Illumina-based RNA-sequencing (RNA-Seq) was performed with samples of four tissue types from crayfish. A total of 324.97 million paired-end reads were generated with a read length of 100 bp, of which 102.46 million reads were from the hepatopancreas, 83.51 million reads were from muscle, 84.94 million reads were from the ovary, and 36.06 million reads were from the testis. All raw sequence data were deposited in the NCBI Sequence Read Archive (SRA) under accession code SRP044128. After trimming of low quality reads and short reads, a total of 306.73 million high-quality sequences (94.73%) were obtained (Table S1 in File S1), and these sequences were used for further analysis.

De novo assembly of the transcriptome

At present, P. clarkii has no reference genome sequence, therefore a de novo assembly strategy was utilized, in which the crayfish transcriptome was de novo assembled by Trinity software (version 2013.02.25) using the default parameters. De novo assembly of 306.73 million high-quality sequences generated a total of 88,463 transcripts that ranged from 351 to 34,708 bp in length, with an average length of 1655.49 bp (Table 1). The length distribution of transcripts is shown in Figure 1. Most of the transcripts (23.71%) were 401–600 bp in length, 11.51% ranged from 601–800 bp, and 10.90% ranged from 1–400 bp. These 88,463 transcripts yielded a total of 50,219 non-redundant transcripts because of alternative splicing; therefore, it was possible to match two or more transcripts to one gene. Our sequence data provided a large number of transcripts as compared to publicly available data from the Genbank database, which represent a convenient source of information for future full length cDNA cloning and gene function research in P. clarkii. Prior to this study, only 330 EST and 547 nucleotide sequences were listed in the GenBank database. Our study supplied 50,219 non-redundant transcripts, which has significantly enriched our knowledge of the P. clarkii genome and will facilitate further study of the functions of P. clarkii genes.
Table 1

Summary of Illumina Hiseq2000 assembly and analysis of P. clarkii transcriptomic sequences.

TypeNumber
Total genes50219
Total transcripts88463
Total residues146449732
Average length1655.49
Largest transcript34708
Smallest transcript351
Figure 1

Length distribution of assembled transcripts of P. clarkii.

Functional annotation

Protein coding sequences of transcripts were predicted using a tool supplied by Trinity software (http://trinityrnaseq.sourceforge.net/analysis/extract_proteins_from_trinity_transcripts.html). Of the 88,463 transcripts, 42,905 were found to contain open reading frames (ORFs), with an average protein coding length of 552.78 bp and a mean nucleotide length of 2551.34 bp. These isosequences likely represent genes that play essential roles in P. clarkii biological processes. 88,463 transcripts were compared with the NCBI non-redundant (nr) protein database, GO database, COG database, and KEGG database for functional annotation using BlastX with an e-value cutoff of 1e-5 (Table S2 in File S1). A total of 31,763 transcripts (35.91% of all transcripts) had significant hits in at least one of these databases, which corresponded to 11,222 genes (21.63% of all genes). A total of 30,779 transcripts (96.90% of all annotated transcripts) had significant hits in the nr protein database, which corresponded to 10,862 genes (96.79% of all annotated genes). The gene names of top BLAST hits were assigned to each transcript with significant hits, and 3110 transcripts from P. clarkii were best matched with genes from Daphnia pulex, 2675 transcripts were best matched with genes from Tribolium castaneum, and 1837 transcripts were best matched with genes from Pediculus humanus (Figure S1). Daphnia pulex is a primitive water flea, Tribolium castaneum is a type of beetle, and Pediculus humanus is a louse species that infests humans. Thus, the genes from P. clarkii were most similar to those known from crustaceans and insects, and the distribution of significant BLAST hits over different organisms reflects the phylogenetic relationship between P. clarkii and other species. GO analysis was conducted on annotated transcripts using blast2go software. A total of 15,457 transcripts, corresponding to 5890 genes, were assigned at least one GO term for biological processes, molecular functions, and cellular components, and the output of the GO annotations was plotted (Figure 2). Terms from the molecular function term group made up the majority of significant terms (12,842 transcripts, 88.08%), followed by the biological process group (10,241 transcripts, 66.25%) and the cellular component group (7,406 transcripts, 47.91%). For biological processes, genes involved in cellular processes (GO: 0009987, 8,148 transcripts), metabolic processes (GO: 0008152, 6,622 transcripts), and single-organism process (GO: 0044699, 5,623 transcripts) were highly represented. For molecular functions, binding (GO: 0005488, 8,313 transcripts) and catalytic activity (GO: 0003824, 6,781 transcripts) were the most represented GO terms. For cellular components, cells (GO: 0005623, 5,778 transcripts), cell part (GO: 0044464, 5,778 transcripts), and organelles (GO: 0043226, 3,632 transcripts) were the most represented terms. There were 9 identified terms that contained fewer than 10 transcripts.
Figure 2

Gene ontology (GO) classification of transcripts of P. clarkii.

GO terms were processed by Blast2Go and categorized at level 2 under three main categories (biological process, cellular component, and molecular function).

Gene ontology (GO) classification of transcripts of P. clarkii.

GO terms were processed by Blast2Go and categorized at level 2 under three main categories (biological process, cellular component, and molecular function). Transcripts were also compared with the COG database, and 6,034 transcripts (2,386 genes) were matched to database entries. These transcripts were classified into 25 functional categories (Figure 3), among which the largest group (2031 transcripts) was signal transduction mechanisms, followed by general function prediction only (1718 transcripts), transcription (914 transcripts), posttranslational modification, protein turnover, and chaperones (782 transcripts). Genes matched to the nuclear structure category (17 transcripts) represented the smallest group.
Figure 3

Cluster of orthologous groups (COG) classification of putative proteins.

In addition, a KEGG pathway analysis was performed on all assembled transcripts as an alternative approach for functional categorization and annotation. A total of 14,596 transcripts, corresponding to 5414 genes, were categorized into functional groups, in which the metabolism group was the most well represented, with 7821 transcripts, followed by the human disease group (7597 transcripts), organismal systems group (7179 transcripts), environmental information processing group (3460 transcripts), cellular processes group (3420 transcripts), and genetic information processing group (2481 transcripts) (Table S3 in File S1). Each functional group was made up of genes from different KEGG pathways. In addition, the number of transcripts in each KEGG pathway was counted, and the most abundant 20 KEGG pathways are shown in Figure 4. In brief, 2016 transcripts were categorized into metabolic pathways, followed by pathways for biosynthesis of secondary metabolites (552 transcripts), cancer (435 transcripts), focal adhesion (431 transcripts), and endocytosis (419 transcripts).
Figure 4

KEGG Classification of the genes.

14596 transcripts were assigned to 311 KEGG pathways. The top 20 most abundant KEGG pathways are shown.

KEGG Classification of the genes.

14596 transcripts were assigned to 311 KEGG pathways. The top 20 most abundant KEGG pathways are shown.

Differential analysis of gene expression profiles between tissues

The expression levels of whole transcripts in the hepatopancreas, the testis, the ovary, and muscle were evaluated (Table S4 and S5 in File S1). Transcriptomic analysis of these tissues showed that more genes were overexpressed in the hepatopancreas as compared with genes expressed in the other three tissues, while more genes were underexpressed in the testis and the ovary compared with genes expressed in the hepatopancreas and muscle (Figure 5). Interestingly, although more genes were overexpressed in the hepatopancreas, fewer genes were expressed in the hepatopancreas (32999 genes) than in the muscle (37873 genes), the ovary (45083 genes), and the testis (39503 genes). This result may be due to the crucial role of the hepatopancreas in growth, which resulted in genes for metabolism being actively and highly expressed, while the ovary and the testis are reproductive organs, and thus more functional molecules are needed in reserve, but are not highly expressed.
Figure 5

Differentially expressed genes analysis of four different crayfish tissues.

hep: hepatopancreas; mu: muscle; ov: ovary; te: testis.

Differentially expressed genes analysis of four different crayfish tissues.

hep: hepatopancreas; mu: muscle; ov: ovary; te: testis.

Enriched pathways in the hepatopancreas

Metabolism is the basic physiological process that sustains living organisms, and it includes multiple reactions, such as the synthesis of digestive enzymes, secretion, digestion, nutrient absorption, excretion, lipid and glycogen storage, and mobilization [31]. In crustaceans, the hepatopancreas is the major metabolic organ. KEGG pathway enrichment analysis showed that regulation of amino acids, carbohydrates, lipids, and glycan metabolism were significantly enriched in the hepatopancreas compared to the other three tissues (Table 2). For example, 59 transcripts were identified in the fatty acid metabolism pathway (ko00071), and 26 of these transcripts (21 genes) were found to be significantly overexpressed in the hepatopancreas compared to muscle, including ACOX1 (acyl-CoA oxidase [EC:1.3.3.6]), ACADS (butyryl-CoA dehydrogenase [EC:1.3.8.1]), ALDH7A1 (aldehyde dehydrogenase family 7 member A1 [EC:1.2.1.3, 1.2.1.8, 1.2.1.31]), and ACAA2 (acetyl-CoA acyltransferase 2 [EC:2.3.1.16]), and only 7 transcripts (3 genes) were found to be significantly underexpressed. It was also found that the expression levels of genes in the fatty acid metabolism pathway in the hepatopancreas were significantly higher than those in the ovary and the testis, indicating that active fatty acid metabolism takes place in the hepatopancreas of P. clarkii (Table S6 in File S1).
Table 2

KEGG pathways enriched in 4 crayfish tissues with Bonferroni-corrected p-values.

tissue 1tissue 2#TermSample numberBackground numberCorrected P-Value
hepato-pancreasmuscleLysosome1482727.27E-07
Galactose metabolism45661.96E-05
Peroxisome791385.42E-05
Regulation of actin cytoskeleton1693425.45E-05
Rheumatoid arthritis39580.000139
Methane metabolism34490.000183
PPAR signaling pathway49790.000186
Bacterial invasion of epithelial cells901690.000266
Starch and sucrose metabolism641120.000267
Focal adhesion1843890.000278
Arachidonic acid metabolism611070.000422
Basal cell carcinoma38600.000693
Sphingolipid metabolism56980.000766
Glutathione metabolism581030.000904
Glycolysis/Gluconeogenesis561000.001221
Other glycan degradation42710.001747
Retinol metabolism22310.001909
Metabolism of xenobiotics by cytochrome P45037610.00197
Alanine, aspartate and glutamate metabolism32510.002217
Pentose phosphate pathway35580.003069
Type II diabetes mellitus30480.003427
Shigellosis661280.005031
Phagosome1002080.005674
Fatty acid metabolism34580.006587
Viral myocarditis811650.007891
Drug metabolism - other enzymes35610.008885
Nitrogen metabolism22340.009292
Endometrial cancer531020.012052
Pyruvate metabolism39710.013601
Pentose and glucuronate interconversions24390.014534
Amino sugar and nucleotide sugar metabolism611240.026912
Caffeine metabolism9110.028532
Antigen processing and presentation33610.037375
Proximal tubule bicarbonate reclamation21350.038357
Cytokine-cytokine receptor interaction16250.044251
Drug metabolism - cytochrome P45030550.044251
hepato-pancreasovaryRibosome85910
Lysosome1882724.53E-08
Drug metabolism - cytochrome P45049552.00E-07
Metabolism of xenobiotics by cytochrome P45051617.90E-06
Oxidative phosphorylation961355.70E-05
Glutathione metabolism761036.91E-05
Rheumatoid arthritis44580.004021
Sphingolipid metabolism68980.004644
Retinol metabolism26310.004991
Lysine degradation921400.006405
Pentose and glucuronate interconversions31390.006589
Fatty acid metabolism42580.017066
Ascorbate and aldarate metabolism18210.023131
Cell cycle1161880.023826
Arginine and proline metabolism59880.023826
Arachidonic acid metabolism701070.023826
Parkinson's disease821290.031582
PPAR signaling pathway53790.036184
Melanoma41590.039756
hepato-pancreastestisLysosome1632724.82E-07
Peroxisome911382.02E-06
Amino sugar and nucleotide sugar metabolism831242.14E-06
Fatty acid metabolism44581.18E-05
Rheumatoid arthritis43584.29E-05
Starch and sucrose metabolism721127.17E-05
Oxidative phosphorylation841357.17E-05
Metabolism of xenobiotics by cytochrome P45044617.95E-05
Drug metabolism - cytochrome P45040550.000155
Glutathione metabolism651030.0004
Drug metabolism - other enzymes42610.000658
Retinol metabolism24310.002129
Galactose metabolism43660.003549
Pentose and glucuronate interconversions28390.004508
Valine, leucine and isoleucine degradation41630.004508
Sphingolipid metabolism59980.004508
Aminobenzoate degradation16190.004508
Butanoate metabolism30430.004508
Other glycan degradation45710.004508
beta-Alanine metabolism26360.004797
Alanine, aspartate and glutamate metabolism34510.006232
Arginine and proline metabolism53880.007429
DNA replication42670.008497
PPAR signaling pathway48790.009379
Arachidonic acid metabolism621070.009379
Histidine metabolism19250.009451
Cysteine and methionine metabolism36560.009451
Two-component system16200.009557
Nitrogen metabolism24340.010345
Parkinson's disease721290.013287
Antigen processing and presentation38610.014212
Tyrosine metabolism28430.023795
Propanoate metabolism29450.024472
Glycosaminoglycan degradation30470.024998
Pyrimidine metabolism891680.026852
Glycine, serine and threonine metabolism44750.029221
Synthesis and degradation of ketone bodies15200.032987
Glycolysis/Gluconeogenesis561000.032987
Tryptophan metabolism32520.035613
Meiosis - yeast631150.035613
Nucleotide excision repair43740.036095
Collecting duct acid secretion17240.041595
Pyruvate metabolism41710.048078
ovarymuscleRibosome81910
Protein processing in endoplasmic reticulum1342400.001548
Type II diabetes mellitus35480.001548
Glycolysis/Gluconeogenesis631000.001843
DNA replication43670.014867
Colorectal cancer611020.014867
Insulin signaling pathway1232300.019284
Cell cycle1021880.024154
Focal adhesion1963890.024154
Mismatch repair32490.027405
Homologous recombination38610.02987
Fanconi anemia pathway601050.033469
Basal cell carcinoma37600.036003
Thyroid cancer14180.04045
Melanoma36590.044224
Prostate cancer51890.044446
testismuscleRNA polymerase47690.000652
Focal adhesion1993890.002043
Glycolysis/Gluconeogenesis611000.002043
Carbon fixation in photosynthetic organisms29420.007085
Bladder cancer19250.011212
Basal cell carcinoma38600.011212
Cytosolic DNA-sensing pathway30450.011555
Pyrimidine metabolism901680.018107
Colorectal cancer581020.021629
Regulation of actin cytoskeleton1683420.03891
ovarytestisRibosome80910
Enriched pathways in xenobiotic metabolism, heavy metal and oxidative stress, and the innate immune system were also found in the hepatopancreas. The crayfish is well-known for its ability to survive in polluted environments, including water polluted by heavy metals, pesticides, and other chemicals, and also to tolerate hypoxia. The hepatopancreas is the primary site for accumulation and detoxification of xenobiotic pollutants in lysosomes. The R cells in the hepatopancreas perform biotransformation using enzymes, such as cytochrome p450, to sequester and detoxify xenobiotic pollutants [31]. Compared to the other tissues, several pathways were significantly enriched in the hepatopancreas: “lysosome”, “peroxisome”, “metabolism of xenobiotics by cytochrome p450”, and “drug metabolism - cytochrome P450” (Table 2). Peroxisomes are essential organelles that play key roles in redox signaling and lipid homeostasis [57]. Here, 146 transcripts were identified in the peroxisome pathway (ko04146), of which 71 transcripts were expressed at significantly higher levels in the hepatopancreas than in muscle tissue, including SOD2 (superoxide dismutase, Fe-Mn family [EC:1.15.1.1]), CAT (catalase [EC:1.11.1.6]), DDO (D-aspartate oxidase [EC:1.4.3.1]), DAO (D-amino-acid oxidase [EC:1.4.3.3]), SCP2 (sterol carrier protein 2 [EC:2.3.1.176]), and PIPOX (sarcosine oxidase/L-pipecolate oxidase [EC:1.5.3.7, 1.5.3.1]) (Figure S2, Table S7 in File S1). Only 11 transcripts were expressed at significantly lower levels in the hepatopancreas than in muscle tissue. Methyl farnesoate and ecdysteroids are important hormones in crustaceans. Methyl farnesoate, which is synthesized in the mandibular organ (MO), is an insect juvenile hormone homologue that is believed to act as a juvenile hormone in crustaceans [58]. Juvenile hormones are involved in many biological processes, including development and reproduction. The major function of ecdysteroids is to control molting, but they are also involved in reproduction [59]. Here, genes in the insect hormone biosynthesis pathway were identified from the P. clarkii transcriptome. Among these genes, genes in the juvenile hormone synthesis pathway were significantly overexpressed in the hepatopancreas compared to the other three tissues (Table S8 in File S1). In particular, seven of nine transcripts that encoded a CYP15A1 (cytochrome P450, family 15, subfamily A, polypeptide 1) homologue were highly expressed in the hepatopancreas, in which the expression levels of these transcripts were more than 100-fold greater than in the other tissues. These results indicate that the hepatopancreas is an important site for genes that are responsible for the synthesis of juvenile hormone. In contrast, genes in the ecdysteroid synthesis pathways did not show the same trend, and the differences in expression of these genes were not significant between the examined tissues.

Enriched pathways in the ovary and the testis

The ovary and the testis are the major reproductive organs, in which the processes of oogenesis, sperm genesis, DNA replication, and meiosis occur frequently. As expected, KEGG pathway analysis showed that the pathways of “DNA replication”, “cell cycle”, “mismatch repair”, “homologous recombination”, were significantly enriched in the ovary compared to muscle and the hepatopancreas. The pathways of “DNA replication”, “pyrimidine metabolism”, “meiosis-yeast”, and “Nucleotide excision repair” were significantly enriched in the testis compared to muscle and the hepatopancreas. For example, 51 transcripts were identified in the DNA replication pathway (ko03030), of which 32 transcripts (26 genes) were expressed at significantly greater levels in the ovary than in the hepatopancreas, and of which only 2 genes were expressed at significantly lower levels in the ovary than in the hepatopancreas. Analysis showed that 26 transcripts (22 genes) were expressed at significantly higher levels in the testis than in the hepatopancreas, and only 6 transcripts (4 genes) were expressed at significantly lower levels in the testis than in the hepatopancreas (Table S9 in File S1). In oviparous animals, vitellin is the major yolk protein that provides nutrition during embryonic development. The precursor of vitellin is vitellogenin (Vg). It is believed that extraovarian Vg is synthesized in the hepatopancreas and secreted in the hemolymph, where it is sequestered into developing oocytes by the Vg receptor (VgR) through receptor-mediated endocytosis [60]. It has been reported that multiple genes encode vitellogenin in various crustaceans, such as the shrimp Metapenaeus ensis, the freshwater water flea Daphnia magna, and the banana shrimp Penaeus merguiensis [61]–[63]. Here 29 transcripts (20 genes) were determined to encode vitellogenin, of which 14 transcripts were highly expressed specifically in the hepatopancreas, 6 transcripts were highly expressed specifically in the ovary, and no transcript was found to be highly expressed in the testis or in muscle. Indeed, vitellogenin was extremely difficult to detect in muscle (Table S10 in File S1), suggesting that vitellogenin is synthesized in the hepatopancreas and the ovary of P. clarkii. This result is consistent with previous reports in the Chinese mitten-handed crab Eriocheir sinensis, the tiger shrimp Penaeus monodon, the blue crab Callinectes sapidus, the freshwater crayfish Cherax quadricarinatus, the freshwater prawn Macrobrachium rosenbergii, the green mud crab Scylla paramamosain, and other species of shrimp and crab [64]–[68]. Interestingly, no transcript among the 20 transcripts determined to encode vitellogenin was found to be highly expressed in both the hepatopancreas and the ovary. Thus, expression of the identified vitellogenin transcripts was tissue-specific, including 14 transcripts that were hepatopancreas-specific and 6 transcripts that were ovary-specific. It has been reported that MeVg1, one of two vitellogenin genes in the shrimp Metapenaeus ensis, is expressed only in the ovary and the hepatopancreas, while the other vitellogenin gene, MeVg2, is expressed exclusively in the hepatopancreas [62], [69]. These results provide new insight into the expression of vitellogenin genes in the hepatopancreas and the ovary, and provide the basis for future studies on the manner in which vitellogenin genes collaboratively perform their specific functions at different developmental stages in the ovary. The vitellogenin receptor is located in the cell membrane of oocytes and mediates vitellogenin absorption by oocytes through receptor-mediated endocytosis (RME) [70]. Unlike vitellogenin in P. clarkii, which was highly expressed in the hepatopancreas and in the ovary, the vitellogenin receptor gene was highly expressed in the ovary only, which is consistent with its ovary-specific expression pattern in the shrimp Penaeus monodon and in the freshwater prawn Macrobrachium rosenbergii (Table S10 in File S1) [60], [64].

RT-PCR Assays

To validate the assembled transcripts and their expression profiles in the 4 collected tissue types, 20 transcript sequences were selected for RT-PCR (reverse transcription polymerase chain reaction) amplification. Their putative gene names, primer sequences, and expected PCR product sizes are shown in Table S11 in File S1. All 20 primer pairs gave amplification products of the expected sizes (Figure 6). For G09 and G13, in addition to the expected PCR products, larger PCR unexpected products were also found in the ovary. Analysis of the FPKM levels of these 20 selected transcript sequences showed that 8 sequences (G01–G08) were specifically expressed in the hepatopancreas, 3 sequences (G09–G11) were specifically expressed in muscle, 3 sequences (G12–G14) were specifically expressed in the ovary, 1 sequence (G15) was specifically expressed in the testis, and the other 5 sequences (G16–G20) were highly expressed in 3 or 4 tissue types (Table S11 in File S1). RT-PCR analysis showed that, with the exception of sequence G11, which was indicated by FPKM analysis to be specifically expressed in muscle, but was indicated by RT-PCR to be highly expressed in both muscle and testis, the expression modes of the other 19 sequences in the 4 tissue types were consistent with their FPKM levels (Figure 6). The evaluation and validation of the assembled transcripts verified the high accuracy of Illumina paired-end sequencing and de novo assembly, and thus indicated that our study could be useful for further research into gene function.
Figure 6

RT-PCR amplification and agarose gel (1.5%) electrophoresis of 20 transcripts.

G01–G20: names of transcripts, represented transcript_ID given in Table S10 in File S1; 18S: 18S rRNA transcript; He: hepatopancreas; Mu: muscle; Ov: ovary; Te: testis; M: DNA marker.

RT-PCR amplification and agarose gel (1.5%) electrophoresis of 20 transcripts.

G01–G20: names of transcripts, represented transcript_ID given in Table S10 in File S1; 18S: 18S rRNA transcript; He: hepatopancreas; Mu: muscle; Ov: ovary; Te: testis; M: DNA marker.

SNP identification

Single-nucleotide polymorphisms (SNPs) are the most common type of variation in the genome. SNPs were identified by alignments of multiple sequences used for contig assembly. After excluding those that had a base mutation frequency of less than 1%, a total of 243,764 SNPs were obtained (Figure 7). The proportions of transition substitutions were 34.44% for C:G→T:A and 31.74% for T:A→C:G, compared with smaller proportions of transversion for C:G→A:T (8.49%), C:G→G:C (6.42%), T:A→A:T (11.05%) and T:A→G:C (7.86%). The total transition:transversion ratio was 1.96∶1. Differences in base structure and the numbers of hydrogen bonds between different bases resulted in a large proportion of transition type SNPs and a small proportion of transversion type SNPs. The ovary had the most SNPs (94023 SNPs), followed by the testis (62601 SNPs), hepatopancreas (54855 SNPs), and muscle (32285 SNPs). Statistics for identified SNPs in the crayfish transcriptome are shown in Figure 7.
Figure 7

Classification of SNPs identified in the crayfish transcriptome.

Microsatellite sequence identification

Microsatellite sequences, or simple sequence repeats (SSRs), are polymorphic loci present in genomic DNA that consist of repeated core sequences of 2–6 base pairs in length [71]. A total of 27,451 SSRs were initially identified from 29,534 transcripts, including 36.92% trinucleotide repeats, 24.14% di-nucleotide repeats, and 2.48% tetra/penta/hexanucleotide repeats (Figure 8). In addition, a total of 4775 SSRs (27.12%) were found that were more than 15 base pairs in length. Among the tri-nucleotide repeat motifs, (AGC/GCT)n (3,693 SSRs, 23.16%) and (ACC/GGT)n (3368 SSRs, 22.13%) were the most common types, and appeared significantly more than the other types of tri-nucleotide repeat motifs (Figure 8). After removing the microsatellites that lacked sufficient flanking sequences for primer design, 16953 unique sequences with microsatellites possessed sufficient flanking sequences on both sides of the microsatellites to allow the design of primers for genotyping.
Figure 8

Distribution of simple sequence repeat (SSR) nucleotide classes among different nucleotide types found in the transcriptome of P. clarkii.

Conclusions

This is the report on the transcriptome of P. clarkii using de novo assembly techniques with next-generation sequencing. We identified 50,219 non-redundant transcripts that will provide the basis for future studies on crayfish gene function. We also explored gene expression patterns in four different tissues from P. clarkii, and a number of candidate novel genes were identified that may be involved in important physiological processes and are worthy of further investigation. In addition, a large number of predicted SNPs and SSRs were reported that provide a basis for further genetic analysis and crayfish breeding. The hit species distribution based on BLASTx. (TIF) Click here for additional data file. Transcripts identified as differentially expressed between muscle tissue and the hepatopancreas in the peroxisome pathway (ko04146). Genes for which expression levels in muscle were higher than those in the hepatopancreas are shown with a red frame, and genes for which expression levels in muscle were lower than those in the hepatopancreas are shown with a green frame. (RAR) Click here for additional data file. Tables S1–S11. Table S1, Statistics for P. clarkii sequencing data. Table S2, Summary of BLASTX search results for the P. clarkii transcriptome. Table S3, Number of transcripts (genes) for each KEGG functional group. Table S4, Annotation and FPKM values for transcripts with ORFs in 4 tissue types. Table S5, Annotation and FPKM values of transcripts without ORFs in 4 tissue types. Table S6, Transcripts identified in the fatty acid metabolism pathway (ko00071) and differential expression analysis between the hepatopancreas and muscle tissue. Table S7, Transcripts identified in the peroxisome pathway (ko04146) and differential expression analysis between the hepatopancreas and muscle tissue. Table S8, Transcripts identified in the insect hormone biosynthesis pathway (ko00981). Table S9, Transcripts identified in the DNA replication pathway (ko03030) and differential expression analysis between the ovary and the hepatopancreas and between the testis and the hepatopancreas. Table S10, Transcripts encoding vitellogenin and the vitellogenin receptor identified from the P. clarkii transcriptome. Table S11, Information on the 20 transcript sequences selected for RT-PCR. (XLSX) Click here for additional data file.
  66 in total

1.  Microsatellites and kinship.

Authors:  D C Queller; J E Strassmann; C R Hughes
Journal:  Trends Ecol Evol       Date:  1993-08       Impact factor: 17.712

2.  A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data.

Authors:  Heng Li
Journal:  Bioinformatics       Date:  2011-09-08       Impact factor: 6.937

3.  Gene discovery from an ovary cDNA library of oriental river prawn Macrobrachium nipponense by ESTs annotation.

Authors:  Ping Wu; Dan Qi; Liqiao Chen; Hao Zhang; Xiaowei Zhang; Jian Guang Qin; Songnian Hu
Journal:  Comp Biochem Physiol Part D Genomics Proteomics       Date:  2008-12-24       Impact factor: 2.674

4.  Antiviral immunity in the red swamp crayfish, Procambarus clarkii: hemocyte production, proliferation and apoptosis.

Authors:  Abdel-Hakim S El-Din; Kohar G Varjabedian; Rewaida A Abdel-Gaber; Marwa M Mohamed
Journal:  J Egypt Soc Parasitol       Date:  2013-04

5.  Dynamics of vitellogenin mRNA expression during vitellogenesis in the banana shrimp Penaeus (Fenneropenaeus) merguiensis using real-time PCR.

Authors:  Pharima Phiriyangkul; Peerapong Puengyam; Ingrid B Jakobsen; Prapaporn Utarabhand
Journal:  Mol Reprod Dev       Date:  2007-09       Impact factor: 2.609

6.  Transcriptome analysis of the oriental river prawn, Macrobrachium nipponense using 454 pyrosequencing for discovery of genes and markers.

Authors:  Keyi Ma; Gaofeng Qiu; Jianbin Feng; Jiale Li
Journal:  PLoS One       Date:  2012-06-20       Impact factor: 3.240

7.  Transcriptome analysis of Litopenaeus vannamei in response to white spot syndrome virus infection.

Authors:  Xiaohan Chen; Digang Zeng; Xiuli Chen; Daxiang Xie; Yongzhen Zhao; Chunling Yang; Yongmei Li; Ning Ma; Ming Li; Qiong Yang; Zhenping Liao; Hui Wang
Journal:  PLoS One       Date:  2013-08-26       Impact factor: 3.240

8.  Functional categorization of transcriptome in the species Symphysodon aequifasciatus Pellegrin 1904 (Perciformes: Cichlidae) exposed to benzo[a]pyrene and phenanthrene.

Authors:  Renato de Souza Pinto Lemgruber; Nislanha Ana dos Anjos Marshall; Andrea Ghelfi; Daniel Barros Fagundes; Adalberto Luis Val
Journal:  PLoS One       Date:  2013-12-03       Impact factor: 3.240

9.  Transcriptome analysis on Chinese shrimp Fenneropenaeus chinensis during WSSV acute infection.

Authors:  Shihao Li; Xiaojun Zhang; Zheng Sun; Fuhua Li; Jianhai Xiang
Journal:  PLoS One       Date:  2013-03-19       Impact factor: 3.240

10.  Discovery of four natural clones in a crayfish species Procambarus clarkii.

Authors:  G H Yue; G L Wang; B Q Zhu; C M Wang; Z Y Zhu; L C Lo
Journal:  Int J Biol Sci       Date:  2008-08-28       Impact factor: 6.580

View more
  23 in total

1.  A High-Density Genetic Map and QTL Fine Mapping for Growth- and Sex-Related Traits in Red Swamp Crayfish (Procambarus clarkii).

Authors:  Xin-Fen Guo; Yu-Lin Zhou; Min Liu; Zhi Li; Li Zhou; Zhong-Wei Wang; Jian-Fang Gui
Journal:  Front Genet       Date:  2022-02-15       Impact factor: 4.599

2.  The nucleic acid-binding protein PcCNBP is transcriptionally regulated during the immune response in red swamp crayfish Procambarus clarkii.

Authors:  Aldo Nicosia; Salvatore Costa; Marcello Tagliavia; Teresa Maggio; Monica Salamone; Giorgia Adamo; Maria Antonietta Ragusa; Carmelo Bennici; Tiziana Masullo; Salvatore Mazzola; Fabrizio Gianguzza; Angela Cuttitta
Journal:  Cell Stress Chaperones       Date:  2016-03-03       Impact factor: 3.667

3.  Transcriptome analysis of the threatened snail Ellobium chinense reveals candidate genes for adaptation and identifies SSRs for conservation genetics.

Authors:  Se Won Kang; Bharat Bhusan Patnaik; So Young Park; Hee-Ju Hwang; Jong Min Chung; Min Kyu Sang; Hye Rin Min; Jie Eun Park; Jiyeon Seong; Yong Hun Jo; Mi Young Noh; Jong Dae Lee; Ki Yoon Jung; Hong Seog Park; Yeon Soo Han; Jun Sang Lee; Yong Seok Lee
Journal:  Genes Genomics       Date:  2017-12-01       Impact factor: 1.839

4.  Genome survey sequencing of red swamp crayfish Procambarus clarkii.

Authors:  Linlin Shi; Shaokui Yi; Yanhe Li
Journal:  Mol Biol Rep       Date:  2018-06-21       Impact factor: 2.316

5.  Developing metabolomics-based bioassessment: crayfish metabolome sensitivity to food and dissolved oxygen stress.

Authors:  Natalie M Izral; Robert B Brua; Joseph M Culp; Adam G Yates
Journal:  Environ Sci Pollut Res Int       Date:  2018-10-25       Impact factor: 4.223

Review 6.  A Comprehensive Review on Crustaceans' Immune System With a Focus on Freshwater Crayfish in Relation to Crayfish Plague Disease.

Authors:  Younes Bouallegui
Journal:  Front Immunol       Date:  2021-05-13       Impact factor: 7.561

7.  De novo assembly of a transcriptome from juvenile blue crabs (Callinectes sapidus) following exposure to surrogate Macondo crude oil.

Authors:  Bree K Yednock; Timothy J Sullivan; Joseph E Neigel
Journal:  BMC Genomics       Date:  2015-07-11       Impact factor: 3.969

8.  Identification and Characterization of Reference Genes for Normalizing Expression Data from Red Swamp Crawfish Procambarus clarkii.

Authors:  Hucheng Jiang; Zhaojun Qian; Wei Lu; Huaiyu Ding; Hongwei Yu; Hui Wang; Jiale Li
Journal:  Int J Mol Sci       Date:  2015-09-08       Impact factor: 5.923

9.  Similarities between decapod and insect neuropeptidomes.

Authors:  Jan A Veenstra
Journal:  PeerJ       Date:  2016-05-26       Impact factor: 2.984

10.  In-depth comparative transcriptome analysis of intestines of red swamp crayfish, Procambarus clarkii, infected with WSSV.

Authors:  Zhiqiang Du; Yanhui Jin; Daming Ren
Journal:  Sci Rep       Date:  2016-06-10       Impact factor: 4.379

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.