Hyungtaek Jung, Byung-Ha Yoon, Woo-Jin Kim, Dong-Wook Kim, David A Hurwood, Russell E Lyons, Krishna R Salin, Heui-Soo Kim, Ilseon Baek, Vincent Chand, Peter B Mather.
Abstract
The giant freshwater prawn, Macrobrachium rosenbergii, a sexually dimorphic decapod crustacean, is currently the world's most economically important cultured freshwater crustacean species. Despite its economic importance, few genomic resources are available for this species, and this has limited exploration of the molecular mechanisms that control sex differentiation in M. rosenbergii and, more widely, in freshwater prawns. Here, we present the first hybrid transcriptome from M. rosenbergii, applying RNA-Seq technologies to identify genes with potential functional roles in reproduction-related traits. A total of 13,733,210 combined raw reads (1720 Mbp) were obtained from Ion-Torrent PGM and 454 FLX. The data were analysed with three state-of-the-art assemblers, the CLC Genomic Workbench, Trans-ABySS, and Trinity, which use single or multiple k-mer methods. The influence of multiple k-mers on assembly performance was assessed to gain insight into transcriptome assembly from short reads. After optimisation, de novo assembly resulted in 44,407 contigs with a mean length of 437 bp, and the assembled transcripts were functionally annotated and screened for single nucleotide polymorphisms and simple sequence repeat motifs. Gene expression analysis was also used to compare expression patterns between ovary and testis tissue libraries and to identify genes with potential roles in reproduction and sex differentiation. The large transcript set assembled here represents the most comprehensive set of transcriptomic resources yet developed for reproduction traits in M. rosenbergii. The large number of predicted genetic markers should constitute an invaluable resource for future genetic research on M. rosenbergii and can be applied more widely to other freshwater prawn species in the genus Macrobrachium.
Keywords: Macrobrachium rosenbergii; crustacean; de novo assembly; hybrid transcriptome; optimization; prawn; reproduction
Year: 2016 PMID: 27164098 PMCID: PMC4881516 DOI: 10.3390/ijms17050690
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Overview of sequencing reads and reads after processing.
| Statistics | 454 FLX: All | 454 FLX: Muscle | 454 FLX: Ovary | 454 FLX: Testis | Ion-Torrent: All | Ion-Torrent: Muscle | Ion-Torrent: Ovary d | Ion-Torrent: Testis |
|---|---|---|---|---|---|---|---|---|
| TNB before processing (Mbp) a | 244.37 | 114.38 | 86.06 | 43.94 | 1475.49 | 302.71 | 751.09 | 421.69 |
| Total number of reads | 787,731 | 367,379 | 279,393 | 140,959 | 12,945,479 | 3,027,258 | 6,477,802 | 3,440,419 |
| Average read length (bp) | 310 | 311 | 308 | 311 | 113 | 100 | 116 | 123 |
| GC percentage (%) | 48.30 | 50.45 | 44.88 | 49.40 | 42.26 | 48.60 | 47.10 | 46.70 |
| TNB after trimming and processing (Mbp) a | 234.03 | 109.88 | 81.90 | 42.25 | 1269.81 | 252.13 | 644.89 | 374.80 |
| TNR used for assembly b | 754,374 (95.77%) | 352,936 (96.07%) | 265,907 (95.17%) | 135,531 (96.15%) | 11,141,087 (86.06%) | 2,521,259 (83.29%) | 5,561,961 (85.86%) | 3,057,867 (88.88%) |
| ARL after trimming and processing (bp) | 261 | 267 | 258 | 258 | 94 | 87 | 95 | 100 |
| GC percentage (%) c | 48.60 | 49.50 | 44.40 | 49.60 | 42.90 | 47.10 | 38.90 | 46.70 |
a Total number of bases (TNB); b Total number of reads (TNR); c Average read length (ARL); d Two 316 chip runs. Reads were trimmed and processed at Q20, with length cut-offs of 50 bp for 454 FLX and 30 bp for Ion-Torrent.
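The percentages in the "TNR used for assembly" row can be reproduced from the read counts in the same table. The following is a minimal sketch of that arithmetic; the function name `retained_percentage` is introduced here for illustration only and is not part of the original analysis.

```python
def retained_percentage(reads_before: int, reads_after: int) -> float:
    """Percentage of raw reads that survive trimming and filtering."""
    if reads_before <= 0:
        raise ValueError("reads_before must be positive")
    return 100.0 * reads_after / reads_before


# Read counts copied from the "All" columns of the table above.
flx_all = retained_percentage(787_731, 754_374)
ion_all = retained_percentage(12_945_479, 11_141_087)

print(f"454 FLX retained:     {flx_all:.2f}%")
print(f"Ion-Torrent retained: {ion_all:.2f}%")
```

The two values agree with the 95.77% and 86.06% reported for the combined 454 FLX and Ion-Torrent datasets.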
Figure 1. Comparison of de novo assemblies using multiple k-values. Trinity was run at its default k-value of 25. (a) Number of contigs; (b) Maximum contig length (bp); (c) N50 length (bp), and (d) Average contig length (bp).
Assembly results with best k-mer setting from the M. rosenbergii transcriptome 454 FLX and Ion-Torrent datasets using different de novo assemblers.
| Assembler | Parameter | All | Muscle | Ovary | Testis |
|---|---|---|---|---|---|
| Trans-ABySS ( | Total bases of reads (bp) | 1,294,937,751 | 317,222,362 | 603,806,324 | 343,353,395 |
| | No. of total contigs | 40,225 | 4556 | 25,482 | 5335 |
| | Total bases of contigs (bp) | 13,707,795 | 1,629,728 | 8,896,935 | 1,788,051 |
| | No. of contigs ≥ 1000 bp | 366 | 83 | 284 | 51 |
| | Contig N50 (bp) | 348 | 357 | 357 | 333 |
| | Mean contig length (bp) | 341 | 358 | 349 | 335 |
| | Largest contig (bp) | 2750 | 4194 | 2387 | 2054 |
| | † CEGMA/Mapping (%) | 50.11/77.71 | | | |
| Trinity ( | Total bases of reads (bp) | 1,294,937,751 | 317,222,362 | 603,806,324 | 343,353,395 |
| | No. of total contigs | 78,007 | 11,337 | 70,352 | 17,260 |
| | Total bases of contigs (bp) | 27,946,414 | 4,152,262 | 23,845,796 | 5,580,093 |
| | No. of contigs ≥ 1000 bp | 912 | 182 | 439 | 99 |
| | Contig N50 (bp) | 368 | 370 | 344 | 322 |
| | Mean contig length (bp) | 358 | 366 | 339 | 323 |
| | Largest contig (bp) | 4480 | 3223 | 3350 | 2694 |
| | † CEGMA/Mapping (%) | 51.81/77.27 | | | |
| CLC Bio ( | Total bases of reads (bp) | 1,294,937,751 | 317,222,362 | 603,806,324 | 343,353,395 |
| | No. of total contigs | 44,407 | 8397 | 35,847 | 10,604 |
| | Total bases of contigs (bp) | 19,415,235 | 3,826,821 | 15,013,223 | 4,260,690 |
| | No. of contigs ≥ 1000 bp | 9259 | 1871 | 6860 | 1770 |
| | Contig N50 (bp) | 438 | 448 | 417 | 399 |
| | Mean contig length (bp) | 437 | 456 | 419 | 402 |
| | Largest contig (bp) | 9495 | 8037 | 5139 | 3978 |
| | † CEGMA/Mapping (%) | 53.09/79.26 | | | |
† CEGMA (Core Eukaryotic Genes Mapping Approach) and CLC Genomic Workbench Mapping.
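The contig statistics reported above (N50, mean length, largest contig, count of long contigs) can all be derived from a list of contig lengths. The sketch below shows one common way to compute them. The function names and the toy lengths are assumptions for illustration and are not taken from the study.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def assembly_summary(lengths, long_cutoff=1000):
    """Collect the per-assembly statistics used in the table above."""
    return {
        "contigs": len(lengths),
        "total_bases": sum(lengths),
        "long_contigs": sum(1 for n in lengths if n >= long_cutoff),
        "n50": n50(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "largest": max(lengths),
    }


# Toy contig lengths (hypothetical, not from the paper).
toy_lengths = [1200, 800, 500, 300, 200]
summary = assembly_summary(toy_lengths)
print(summary)
```

Because N50 is weighted by bases rather than by contig count, it is less sensitive than the mean to a large number of very short contigs, which is why the two are usually reported together.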
Figure 2. Summary of gene ontology (GO) terms for all combined contigs in M. rosenbergii. Contigs generated with the CLC Genomic Workbench were searched for sequence similarity and annotated using WEGO.
Top 20 differentially expressed genes from ovary and testis tissues in M. rosenbergii. Hits required an E-value < 10^−5; hypothetical and predicted proteins were excluded.
| Contig Number | Putative Function | Length (bp) | E-value | Log2 Fold Change | Average Log2 Counts Per Million | p-value | False Discovery Rate |
|---|---|---|---|---|---|---|---|
| | kazal-type protease inhibitor | 976 | 4.82 × 10^−85 | −16.66 | 13.23 | 6.17 × 10^−10 | 1.30 × 10^−05 |
| | kazal-type proteinase inhibitor | 631 | 2.18 × 10^−22 | −14.96 | 11.53 | 1.19 × 10^−08 | 4.68 × 10^−05 |
| | male reproductive-related protein a | 629 | 1.12 × 10^−16 | −15.43 | 12.00 | 5.23 × 10^−09 | 3.67 × 10^−05 |
| | | 342 | 3.20 × 10^−24 | −13.97 | 10.54 | 6.58 × 10^−08 | 0.000104 |
| | | 432 | 2.54 × 10^−08 | −13.97 | 10.5 | 6.60 × 10^−08 | 0.000104 |
| | | 281 | 1.49 × 10^−23 | −13.71 | 10.28 | 1.03 × 10^−07 | 0.000129 |
| | | 369 | 6.13 × 10^−08 | −12.50 | 9.06 | 8.37 × 10^−07 | 0.00046 |
| | metalloproteinase-like | 1210 | 3.65 × 10^−52 | −15.01 | 11.58 | 1.08 × 10^−08 | 4.68 × 10^−05 |
| | | 511 | 2.19 × 10^−32 | −13.91 | 10.48 | 7.24 × 10^−08 | 0.000109 |
| | periaxin-like protein | 1872 | 4.31 × 10^−08 | −14.57 | 11.14 | 2.30 × 10^−08 | 6.46 × 10^−05 |
| | cysteine-rich motor neuron 1 | 742 | 1.09 × 10^−07 | −14.52 | 11.09 | 2.54 × 10^−08 | 6.66 × 10^−05 |
| | male reproductive-related protein mar-mrr | 254 | 2.48 × 10^−08 | −14.33 | 10.90 | 3.54 × 10^−08 | 8.06 × 10^−05 |
| | | 272 | 1.26 × 10^−42 | −14.22 | 10.79 | 4.28 × 10^−08 | 8.18 × 10^−05 |
| | | 280 | 9.61 × 10^−43 | −12.58 | 9.14 | 7.33 × 10^−07 | 0.000428 |
| | blastula protease-10 | 1528 | 1.22 × 10^−34 | −13.98 | 10.55 | 6.45 × 10^−08 | 0.000104 |
| | matrix metalloproteinase-9 | 1103 | 7.60 × 10^−20 | −13.88 | 10.45 | 7.65 × 10^−08 | 0.000111 |
| | keratin associated protein | 771 | 4.77 × 10^−19 | −13.76 | 10.33 | 9.47 × 10^−08 | 0.000126 |
| | 3d domain protein | 510 | 6.23 × 10^−06 | −13.67 | 10.24 | 1.10 × 10^−07 | 0.000129 |
| | lpxtg-motif cell wall anchor domain protein | 1231 | 5.25 × 10^−23 | −13.65 | 10.22 | 1.13 × 10^−07 | 0.000129 |
| | serine proteinase inhibitor | 324 | 1.44 × 10^−16 | −13.63 | 10.20 | 1.18 × 10^−07 | 0.00013 |
| | insulin-like androgenic gland factor | 614 | 1.13 × 10^−110 | −13.45 | 10.01 | 1.62 × 10^−07 | 0.000159 |
| | von willebrand factor d and egf domain-containing | 1632 | 1.12 × 10^−27 | −13.36 | 9.93 | 1.87 × 10^−07 | 0.000175 |
| | apolipoprotein d-like | 513 | 5.53 × 10^−08 | −13.12 | 9.69 | 2.85 × 10^−07 | 0.000239 |
| | hemolectin cg7002-pa | 2221 | 8.78 × 10^−25 | −13.05 | 9.62 | 3.21 × 10^−07 | 0.000252 |
| | epididymal sperm-binding protein 1-like | 1043 | 3.63 × 10^−16 | −12.63 | 9.20 | 6.67 × 10^−07 | 0.000412 |
| | 2 RNA ligase family protein | 364 | 3.61 × 10^−06 | −12.50 | 9.06 | 8.41 × 10^−07 | 0.00046 |
| | keratin associated protein | 668 | 3.21 × 10^−11 | −12.49 | 9.05 | 8.52 × 10^−07 | 0.00046 |
| | fibronectin 1b | 677 | 1.57 × 10^−06 | −12.41 | 8.97 | 9.84 × 10^−07 | 0.00052 |
Figure 3. Differential gene expression analysis comparing ovary and testis tissues from M. rosenbergii. Red dots indicate significantly differentially expressed genes (transcripts or contigs), and the red curve shows average expression strength.
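The differential expression table reports a log2 fold change, an average log2 counts per million, and a false discovery rate for each contig. The sketch below illustrates how such quantities are commonly defined, using a plain Benjamini-Hochberg adjustment for the false discovery rate. It is a simplified illustration and not the software used in the study; the pseudo-count `prior`, the sign convention (first sample over second), and all function names are assumptions.

```python
import math


def log2_cpm(count, library_size, prior=0.5):
    """log2 counts per million; a small pseudo-count avoids log(0)."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return math.log2((count + prior) / library_size * 1e6)


def log2_fold_change(count_a, count_b, lib_a, lib_b, prior=0.5):
    """Difference of log2 CPM values (sample A relative to sample B)."""
    return log2_cpm(count_a, lib_a, prior) - log2_cpm(count_b, lib_b, prior)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy inputs (hypothetical, not from the paper).
toy_pvalues = [0.01, 0.04, 0.03, 0.005]
toy_fdr = benjamini_hochberg(toy_pvalues)
toy_lfc = log2_fold_change(8, 2, 1_000_000, 1_000_000, prior=0.0)
print(toy_fdr, toy_lfc)
```

Under this convention a strongly negative log2 fold change means the contig is far more abundant in the second library than in the first.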
Summary of putative single nucleotide polymorphism (SNP) and Indel distribution.
| 44,897 | 7290 | 7421 | 7249 | 7083 | 2046 | 2024 | 1304 | 2445 | 2070 | 1905 | 1335 | 2725 | |
| 100 (%) | 16.2 | 16.5 | 16.1 | 15.7 | 4.5 | 4.5 | 2.9 | 5.4 | 4.6 | 4.2 | 2.9 | 6.0 | |
| 8739 | 4369 | 4121 | 207 | 42 | |||||||||
| 100 (%) | 49.99 | 47.16 | 2.37 | 0.48 | |||||||||
Summary of putative simple sequence repeat (SSR) nucleotide classes among different nucleotide types. Both contig and singleton sequences were used to predict SSR loci.
| SSR Types | Total: Discovered Motifs | Total: Designed Primers | Contigs: Discovered Motifs | Contigs: Designed Primers | Singletons: Discovered Motifs | Singletons: Designed Primers |
|---|---|---|---|---|---|---|
| Di-nucleotide | 21,433 | 6643 | 3754 | 977 | 17,679 | 5666 |
| AT/TA | 7598 | 2414 | 1323 | 358 | 6275 | 2056 |
| CA/AC | 5775 | 1752 | 1018 | 259 | 4757 | 1493 |
| CG/GC | 455 | 133 | 50 | 11 | 405 | 122 |
| CT/TC | 0 | 0 | 0 | 0 | 0 | 0 |
| GA/AG | 7605 | 2344 | 1363 | 349 | 6242 | 1995 |
| GT/TG | 0 | 0 | 0 | 0 | 0 | 0 |
| Tri-nucleotide | 7181 | 2218 | 1432 | 375 | 5749 | 1843 |
| AAC/ACA/CAA | 538 | 174 | 80 | 20 | 458 | 154 |
| ACG/CGA/GAC | 199 | 55 | 59 | 13 | 140 | 42 |
| ACT/CTA/TAC | 385 | 120 | 64 | 19 | 321 | 101 |
| AGC/GCA/CAG | 750 | 224 | 225 | 60 | 525 | 164 |
| AGG/GAG/GGA | 631 | 181 | 228 | 52 | 403 | 129 |
| AGT/GTA/TAG | 0 | 0 | 0 | 0 | 0 | 0 |
| CAT/ATC/TCA | 1327 | 402 | 228 | 53 | 1099 | 349 |
| CCA/CAC/ACC | 174 | 49 | 97 | 26 | 77 | 23 |
| CCG/CGC/GCC | 40 | 13 | 27 | 7 | 13 | 6 |
| CCT/CTC/TCC | 0 | 0 | 0 | 0 | 0 | 0 |
| CGG/GGC/GCG | 0 | 0 | 0 | 0 | 0 | 0 |
| CTG/TGC/GCT | 0 | 0 | 0 | 0 | 0 | 0 |
| CTT/TTC/TCT | 0 | 0 | 0 | 0 | 0 | 0 |
| GAA/AAG/AGA | 1616 | 513 | 215 | 61 | 1401 | 452 |
| GAT/ATG/TGA | 0 | 0 | 0 | 0 | 0 | 0 |
| GTT/TGT/TTG | 0 | 0 | 0 | 0 | 0 | 0 |
| TAA/ATA/AAT | 1521 | 487 | 209 | 64 | 1312 | 423 |
| TCG/CGT/GTC | 0 | 0 | 0 | 0 | 0 | 0 |
| TGG/GTG/GGT | 0 | 0 | 0 | 0 | 0 | 0 |
| TTA/TAT/ATT | 0 | 0 | 0 | 0 | 0 | 0 |
| ≥Tetra-nucleotide | 570 | 179 | 52 | 13 | 518 | 166 |
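The SSR table groups motifs that are rotations of one another (for example AAC/ACA/CAA) into a single class. The sketch below shows one way to detect di- and tri-nucleotide repeats with a regular expression and to assign each hit to such a rotation class. The minimum repeat counts in `MIN_REPEATS` and the function names are assumptions chosen for illustration, since the thresholds used in the study are not stated here.

```python
import re
from collections import Counter

# Assumed minimum number of tandem copies per motif length (illustrative only).
MIN_REPEATS = {2: 6, 3: 5}


def canonical_motif(motif):
    """Lexicographically smallest rotation, so AAC, ACA and CAA share one key."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def find_ssrs(sequence, min_repeats=None):
    """Return (start, repeat_unit, copies) for each perfect tandem repeat."""
    thresholds = MIN_REPEATS if min_repeats is None else min_repeats
    sequence = sequence.upper()
    hits = []
    for unit_len, min_copies in thresholds.items():
        # One captured unit followed by at least (min_copies - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for match in pattern.finditer(sequence):
            unit = match.group(1)
            if len(set(unit)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits


# Toy sequence (hypothetical): seven AC copies and five GAA copies.
toy_sequence = "GG" + "AC" * 7 + "TT" + "GAA" * 5 + "C"
toy_hits = find_ssrs(toy_sequence)
toy_classes = Counter(canonical_motif(unit) for _, unit, _ in toy_hits)
print(toy_hits, toy_classes)
```

Grouping by rotation only, as done here, keeps complementary motifs in separate classes; whether a motif and its reverse complement are merged is a separate design choice.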