Yannick Cogne1, Davide Degli-Esposti2, Olivier Pible1, Duarte Gouveia1, Adeline François2, Olivier Bouchez3, Camille Eché3, Alex Ford4, Olivier Geffard2, Jean Armengaud5, Arnaud Chaumot2, Christine Almunia1.
Abstract
Gammarids are amphipods distributed worldwide in fresh and marine waters. They play an important role in aquatic ecosystems and are well-established sentinel species in ecotoxicology. In this study, we sequenced the transcriptomes of one male and one female individual for each of seven taxonomic groups belonging to the two genera Gammarus and Echinogammarus: Gammarus fossarum A, G. fossarum B, G. fossarum C, Gammarus wautieri, Gammarus pulex, Echinogammarus berilloni, and Echinogammarus marinus. These taxa were chosen to explore the molecular diversity of transcribed genes in genotyped individuals from these groups. The transcriptomes were assembled de novo and annotated. The high quality of the assemblies was confirmed by BUSCO comparison against the arthropod dataset. The 14 RNA-Seq-derived protein sequence databases proposed here will be a significant resource for proteogenomics studies of these ecotoxicologically relevant non-model organisms. These transcriptomes represent reliable reference sequences for whole-transcriptome and proteome studies on other gammarids, for primer design to clone specific genes or monitor their expression, and for analyses of molecular differences between gammarid species.
Year: 2019 PMID: 31562330 PMCID: PMC6764967 DOI: 10.1038/s41597-019-0192-5
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Sampling information and number of reads for each sample before and after filtering by mean quality for the 14 transcriptomes.
| Species | Code Name | Sex | River | City | Country | GPS | Number of raw reads | Number of reads after filtering |
|---|---|---|---|---|---|---|---|---|
| Echinogammarus berilloni | EGSF | Female | Saucats | Saucats | France | 44°39′34″N, 0°34′25″W | 80 482 966 | 80 277 434 |
| Echinogammarus berilloni | EGSM | Male | Saucats | Saucats | France | 44°39′34″N, 0°34′25″W | 90 372 154 | 90 118 242 |
| Echinogammarus marinus | EGUF | Female | sea coast | Portsmouth | UK | 50°47′41″N, 1°01′50″W | 85 032 246 | 84 652 454 |
| Echinogammarus marinus | EGUM | Male | sea coast | Portsmouth | UK | 50°47′41″N, 1°01′50″W | 70 768 994 | 70 540 528 |
| Gammarus fossarum A | GFAF | Female | Seebach | Fellering | France | 47°53′31″N, 6°58′53″E | 81 959 830 | 81 543 116 |
| Gammarus fossarum A | GFAM | Male | Seebach | Fellering | France | 47°53′31″N, 6°58′53″E | 95 167 986 | 94 695 372 |
| Gammarus fossarum B | GFBF | Female | Pollon | Saint-Maurice-de-Rémens | France | 45°57′21″N, 5°15′44″E | 96 361 300 | 96 093 396 |
| Gammarus fossarum B | GFBM | Male | Pollon | Saint-Maurice-de-Rémens | France | 45°57′21″N, 5°15′44″E | 85 125 996 | 84 758 816 |
| Gammarus fossarum C | GFCF | Female | Pollon | Saint-Maurice-de-Rémens | France | 45°57′21″N, 5°15′44″E | 78 459 708 | 77 977 148 |
| Gammarus fossarum C | GFCM | Male | Pollon | Saint-Maurice-de-Rémens | France | 45°57′21″N, 5°15′44″E | 75 598 166 | 75 407 534 |
| Gammarus pulex | GPCF | Female | Pollon | Saint-Maurice-de-Rémens | France | 45°57′21″N, 5°15′44″E | 84 202 086 | 83 965 920 |
| Gammarus pulex | GPCM | Male | Pollon | Saint-Maurice-de-Rémens | France | 45°57′21″N, 5°15′44″E | 89 235 492 | 89 025 410 |
| Gammarus wautieri | GWF | Female | Galaveyson | Le Grand Serre | France | 45°16′27″N, 5°07′08″E | 80 192 262 | 79 695 588 |
| Gammarus wautieri | GWM | Male | Galaveyson | Le Grand Serre | France | 45°16′27″N, 5°07′08″E | 63 959 618 | 63 638 482 |
*Müller type.
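The effect of the quality filter can be read off the table as the share of reads retained per sample. The following is a minimal sketch, not part of the original workflow: the read counts are copied from the table above, and the names `READS` and `retained_percent` are illustrative assumptions.

```python
# Read counts copied from the sampling table: code name -> (raw reads, reads after filtering).
READS = {
    "EGSF": (80_482_966, 80_277_434),
    "EGSM": (90_372_154, 90_118_242),
    "EGUF": (85_032_246, 84_652_454),
    "EGUM": (70_768_994, 70_540_528),
    "GFAF": (81_959_830, 81_543_116),
    "GFAM": (95_167_986, 94_695_372),
    "GFBF": (96_361_300, 96_093_396),
    "GFBM": (85_125_996, 84_758_816),
    "GFCF": (78_459_708, 77_977_148),
    "GFCM": (75_598_166, 75_407_534),
    "GPCF": (84_202_086, 83_965_920),
    "GPCM": (89_235_492, 89_025_410),
    "GWF": (80_192_262, 79_695_588),
    "GWM": (63_959_618, 63_638_482),
}


def retained_percent(raw: int, filtered: int) -> float:
    """Percentage of raw reads that survive the mean-quality filter."""
    return 100.0 * filtered / raw


# Hypothetical helper table: one retained percentage per sample.
RETAINED = {code: retained_percent(raw, kept) for code, (raw, kept) in READS.items()}

if __name__ == "__main__":
    for code, pct in sorted(RETAINED.items()):
        print(f"{code}: {pct:.2f}% of reads retained")
```

With these counts, every sample keeps more than 99% of its raw reads, so the filter removes only a small fraction of the data.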
Assembly quality metrics.
| Metric | EGSF | EGSM | EGUF | EGUM | GFAF | GFAM | GFBF | GFBM | GFCF | GFCM | GPCF | GPCM | GWF | GWM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| n_seqs | 166,100 | 211,358 | 162,914 | 133,658 | 182,439 | 383,876 | 325,379 | 344,409 | 280,883 | 324,661 | 245,224 | 257,575 | 214,232 | 183,988 |
| largest | 21,406 | 28,082 | 25,426 | 29,815 | 11,828 | 22,574 | 26,858 | 21,757 | 29,633 | 25,029 | 17,350 | 17,019 | 27,829 | 22,483 |
| n_bases | 178,852,651 | 228,738,512 | 168,030,154 | 142,457,935 | 118,459,292 | 283,956,781 | 259,691,927 | 263,406,154 | 226,877,323 | 236,552,608 | 198,832,295 | 180,448,306 | 186,939,687 | 144,019,426 |
| mean_len | 1076.8 | 1082.2 | 1031.4 | 1065.8 | 649.3 | 739.7 | 798.1 | 764.8 | 807.7 | 728.6 | 810.8 | 700.6 | 872.6 | 782.8 |
| n_over_1k | 42,496 | 54,408 | 44,211 | 38,307 | 31,373 | 76,176 | 66,143 | 67,014 | 57,497 | 58,066 | 50,528 | 44,353 | 49,495 | 38,349 |
| n_over_10k | 498 | 827 | 348 | 324 | 5 | 156 | 345 | 308 | 303 | 202 | 232 | 29 | 311 | 101 |
| n_with_orf | 35,470 | 58,284 | 38,503 | 30,621 | 32,784 | 78,940 | 62,829 | 65,479 | 53,123 | 56,151 | 40,810 | 33,985 | 46,313 | 40,639 |
| mean_orf (%) | 41.0 | 47.9 | 46.0 | 43.3 | 51.9 | 54.2 | 50.5 | 50.7 | 49.3 | 50.2 | 45.3 | 44.1 | 50.2 | 51.2 |
| n90 | 355 | 361 | 340 | 357 | 270 | 282 | 285 | 284 | 289 | 273 | 283 | 271 | 310 | 300 |
| n50 | 2646 | 2594 | 2278 | 2299 | 963 | 1240 | 1518 | 1354 | 1555 | 1290 | 1622 | 1187 | 1703 | 1328 |
| n10 | 7494 | 7850 | 6812 | 6736 | 2978 | 4256 | 5442 | 5071 | 5593 | 4958 | 5522 | 4319 | 5767 | 4539 |
| gc(%) | 42.7 | 42.5 | 43.6 | 43.4 | 42.6 | 41.8 | 43.6 | 43.0 | 43.8 | 43.4 | 43.5 | 42.4 | 43.3 | 43.1 |
| RMBT(%)* | 91.7 | 94.4 | 89.6 | 91.9 | 88.2 | 83.9 | 90.1 | 82.7 | 87.7 | 86.1 | 81.9 | 84.8 | 87.2 | 86.5 |
| G-RMBT(%)* | 80.7 | 86.8 | 75.2 | 73.5 | 76.5 | 65.1 | 82.0 | 61.9 | 75.8 | 70.0 | 63.9 | 66.4 | 75.4 | 70.6 |
| Score# | 0.16 | 0.16 | 0.11 | 0.11 | 0.18 | 0.12 | 0.13 | 0.10 | 0.12 | 0.11 | 0.10 | 0.12 | 0.14 | 0.15 |
*RMBT means Reads Mapping Back on the Transcriptome; G-RMBT means Good Reads Mapping Back on the Transcriptome.
#Score calculated by Transrate.
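For readers unfamiliar with the contiguity metrics in the table, the sketch below shows the conventional definition of Nx statistics (n10, n50, n90) and how mean_len follows from n_bases and n_seqs. It is an illustration under that conventional definition, not Transrate's own code, and the function names `nx` and `mean_length` are assumptions introduced here.

```python
def nx(lengths, fraction):
    """Conventional Nx: the contig length L such that contigs of length >= L
    together contain at least `fraction` of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    raise ValueError("lengths must be non-empty")


def mean_length(n_bases, n_seqs):
    """Mean contig length, i.e. total assembled bases divided by contig count."""
    return n_bases / n_seqs


if __name__ == "__main__":
    # Toy contig lengths, purely illustrative (not data from the study).
    toy = [2, 2, 2, 3, 3, 4, 8, 8]
    print("N50:", nx(toy, 0.5))
    print("N90:", nx(toy, 0.9))
    # Cross-check against the EGSF column: n_bases / n_seqs should match mean_len.
    print("EGSF mean length:", round(mean_length(178_852_651, 166_100), 1))
```

Because n10 is reached after fewer of the longest contigs than n50 or n90, the table shows n10 > n50 > n90 for every assembly.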
Accessions for the 14 transcriptomes.
| Code Name | Transcriptome accession | Read accession | BioProject | BioSample |
|---|---|---|---|---|
| EGSF | GHCT01000000 | | PRJNA497972 | SAMN10259946 |
| EGSM | GHCU01000000 | | PRJNA497972 | SAMN10259947 |
| EGUF | GHCW01000000 | | PRJNA497972 | SAMN10259948 |
| EGUM | GHCV01000000 | | PRJNA497972 | SAMN10259949 |
| GFAF | GHCX01000000 | | PRJNA497972 | SAMN10259934 |
| GFAM | GHCY01000000 | | PRJNA497972 | SAMN10259935 |
| GFBF | GHCZ01000000 | | PRJNA497972 | SAMN10259936 |
| GFBM | GHDA01000000 | | PRJNA497972 | SAMN10259937 |
| GFCF | GHDC01000000 | | PRJNA497972 | SAMN10259938 |
| GFCM | GHDB01000000 | | PRJNA497972 | SAMN10259939 |
| GPCF | GHCP01000000 | | PRJNA497972 | SAMN10259940 |
| GPCM | GHCQ01000000 | | PRJNA497972 | SAMN10259941 |
| GWF | GHCR01000000 | | PRJNA497972 | SAMN10259944 |
| GWM | GHCN01000000 | | PRJNA497972 | SAMN10259945 |
Fig. 1 BUSCO assessment results for the 14 assembled transcriptomes.
| Measurement(s) | transcription profiling assay |
| Technology Type(s) | RNA sequencing |
| Factor Type(s) | sex • species |
| Sample Characteristic - Organism | Gammarus • Echinogammarus |
| Sample Characteristic - Environment | habitat |