André Gomes-Dos-Santos, André M Machado, L Filipe C Castro, Vincent Prié, Amílcar Teixeira, Manuel Lopes-Lima, Elsa Froufe.
Abstract
Genomic tools applied to non-model organisms are critical for designing successful conservation strategies for particularly threatened groups. Freshwater mussels of the order Unionida are among the most vulnerable taxa, yet almost no genetic resources are available for them. Here, we present the gill transcriptomes of five European freshwater mussels of high conservation concern: Margaritifera margaritifera, Unio crassus, Unio pictorum, Unio mancus and Unio delphinus. The final assemblies, with N50 values ranging from 1069 to 1895 bp and total BUSCO scores above 90% (Eukaryote and Metazoan databases), were structurally and functionally annotated and made available. The transcriptomes produced here represent a valuable resource for future studies on the biology of these species and can ultimately guide their conservation.
Year: 2022 PMID: 35963883 PMCID: PMC9376081 DOI: 10.1038/s41597-022-01613-x
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
Fig. 1 Maps of the five species’ potential distributions produced by overlapping points of recent presence records (obtained from Lopes-Lima et al.[10]) with the Hydrobasin level 5 polygons[59]. Overlapping distribution polygons between Unio mancus and Unio crassus are represented by a light purple shade in the left panel. Overlapping distribution polygons between Unio pictorum and Margaritifera margaritifera are represented by an orange shade in the right panel.
MIxS descriptors for the five freshwater mussel species.
| Sample | | | | | |
|---|---|---|---|---|---|
| Investigation_type | Eukaryote | Eukaryote | Eukaryote | Eukaryote | Eukaryote |
| Project_name | Gill transcriptome of five European freshwater mussel species | | | | |
| Lat_lon | 41.862414; −6.931596 | 45.515500; 15.473240 | 45.515500; 15.473240 | 41.710606; 8.828512 | 41.564361; −7.258665 |
| Geo_loc_name | Portugal | Croatia | Croatia | France | North of Portugal |
| Collection_date | 7/6/2021 | 7/12/2019 | 7/12/2019 | 4/21/2021 | 3/20/2021 |
| Env_package | Water | Water | Water | Water | Water |
| Seq_meth | Illumina HiSeq 4000 | Illumina HiSeq 4000 | Illumina HiSeq 4000 | Illumina HiSeq 4000 | Illumina HiSeq 4000 |
| Assembly method | Trinity | Trinity | Trinity | Trinity | Trinity |
| Collector | Amilcar Teixeira | Manuel Lopes-Lima | Manuel Lopes-Lima | Vincent Prié | Amilcar Teixeira |
| Sex | Undetermined | Undetermined | Undetermined | Undetermined | Undetermined |
| Maturity | Mature | Mature | Mature | Mature | Mature |
Fig. 2 Bioinformatics pipeline applied for the transcriptome assembly and annotation. Auxiliary representative figures were created with BioRender.com.
Fig. 3 FastQC quality report of the trimmed and decontaminated RNA-seq reads (after Centrifuge) for each species. (a) Margaritifera margaritifera; (b) Unio crassus; (c) Unio pictorum; (d) Unio mancus; and (e) Unio delphinus.
Basic statistics of raw sequencing datasets and percentages of removed reads at each step of the preassembly processing strategy.
| Raw Reads | | | | | |
|---|---|---|---|---|---|
| Raw sequencing reads | 131051306 | 132002266 | 104108396 | 100704688 | 112439686 |
| Trimmomatic reads removed | 1524256 (1.16%) | 1761532 (1.33%) | 937250 (0.90%) | 714904 (0.71%) | 1074338 (0.96%) |
| Centrifuge reads removed | 157718 (0.12%) | 118410 (0.090%) | 101442 (0.097%) | 145422 (0.14%) | 250936 (0.22%) |
| Reads used in assembly | 129369332 (98.72%) | 130122324 (98.56%) | 103069704 (99.00%) | 99844362 (99.15%) | 111114412 (98.82%) |
Transrate and BUSCO scores of redundant and non-redundant gill transcriptome assemblies for each species.
| Basic Statistics | Total Transcriptome | Non-redundant Transcriptome | Total Transcriptome | Non-redundant Transcriptome | Total Transcriptome | Non-redundant Transcriptome | Total Transcriptome | Non-redundant Transcriptome | Total Transcriptome | Non-redundant Transcriptome |
|---|---|---|---|---|---|---|---|---|---|---|
| Number of transcripts | 1694677 | 470852 | 1304611 | 169668 | 232124 | 68670 | 234695 | 65620 | 280001 | 82542 |
| n bases | 1052464277 | 442302372 | 1002862692 | 262637793 | 189129150 | 83762650 | 198791465 | 89666570 | 224567067 | 103248722 |
| Mean transcript length (bp) | 621.02389 | 939.36603 | 768.67926 | 1547.94894 | 814.75652 | 1219.7852 | 847.00815 | 1366.44881 | 802.01073 | 1250.86286 |
| Number of transcripts over 1 K nt | 214128 | 134690 | 235872 | 104192 | 53293 | 28701 | 54754 | 31276 | 62078 | 35904 |
| Number of transcripts over 10 K nt | 1189 | 261 | 1905 | 453 | 7 | 5 | 33 | 15 | 24 | 12 |
| N90 transcript length (bp) | 284 | 499 | 313 | 816 | 314 | 582 | 322 | 659 | 309 | 612 |
| N70 transcript length (bp) | 462 | 759 | 589 | 1324 | 697 | 1037 | 732 | 1168 | 677 | 1047 |
| N50 transcript length (bp) | 773 | 1069 | 1187 | 1889 | 1447 | 1688 | 1569 | 1895 | 1400 | 1669 |
| N30 transcript length (bp) | 1475 | 1619 | 2409 | 2864 | 2438 | 2589 | 2635 | 2870 | 2426 | 2600 |
| N10 transcript length (bp) | 3783 | 3281 | 5504 | 5458 | 4073 | 4174 | 4427 | 4592 | 4108 | 4252 |
| GC content (fraction) | 0.36365 | 0.35712 | 0.35352 | 0.34896 | 0.35511 | 0.35179 | 0.35899 | 0.35468 | 0.36814 | 0.36893 |
| BUSCO Complete (Single + Duplicated)* | 93.7/94.5 | 85.8/89.4 | 97.1/98.1 | 92.1/93.1 | 87.5/83.1 | 83.8/79.7 | 89.8/88.2 | 85.2/83.9 | 92.1/88.3 | 89.1/84.8 |
| BUSCO Single* | 45.5/47.4 | 83.8/85.8 | 44.6/43.6 | 90.8/90.5 | 58.1/57.8 | 80.5/77.8 | 62.7/64.6 | 82.2/82.7 | 62.7/64.0 | 81.2/80.8 |
| BUSCO Duplicated* | 48.2/47.1 | 2.0/3.6 | 52.5/54.5 | 1.3/2.6 | 29.4/25.3 | 3.3/1.9 | 27.1/23.6 | 3.0/1.2 | 29.4/24.3 | 7.9/4.0 |
| BUSCO Fragmented* | 4.0/4.5 | 8.3/6.1 | 2.3/1.6 | 3.6/3.9 | 7.9/10.2 | 6.9/7.4 | 6.6/8.0 | 7.6/6.4 | 5.6/7.8 | 5.0/6.1 |
| BUSCO Missing* | 2.3/1.0 | 5.9/4.5 | 0.6/0.3 | 4.3/3.0 | 4.6/6.7 | 9.3/12.9 | 3.6/3.8 | 7.2/9.7 | 2.3/3.9 | 5.9/9.1 |
| Total BUSCOs Found* | 97.7/99.0 | 94.1/95.5 | 99.4/99.7 | 95.7/97.0 | 95.4/93.3 | 90.7/87.1 | 96.4/96.8 | 92.8/90.3 | 97.7/96.1 | 94.1/90.4 |
*Values are given as Euk/Met. Euk: dataset with 303 genes of the Eukaryota library profile. Met: dataset with 978 genes of the Metazoa library profile.
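The N10 to N90 transcript lengths in the assembly statistics follow the usual Nx definition: the length L such that transcripts of length L or longer together contain at least x% of all assembled bases. The snippet below is a minimal sketch of that calculation, not the code used by the authors or by Transrate; the function name `nx` and the toy length list are hypothetical and chosen only for illustration.

```python
def nx(lengths, x):
    """Return the Nx statistic for a collection of sequence lengths.

    Nx is the length L such that sequences of length >= L together
    contain at least x percent of the total number of bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in the interval (0, 100]")

    ordered = sorted(lengths, reverse=True)  # longest sequences first
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length


# Toy transcript lengths (made up for illustration, not taken from the paper).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
for x in (10, 50, 90):
    print(f"N{x} = {nx(toy_lengths, x)}")
```

Because longer transcripts are accumulated first, N10 is never smaller than N50, which in turn is never smaller than N90; the reported values follow the same ordering.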
Structural and functional annotation statistics for the final gill transcriptome assemblies for each species.
| Structural annotation | | | | | |
|---|---|---|---|---|---|
| Number of transcripts | 470852 | 169668 | 68670 | 65620 | 82542 |
| Number of CDSs | 56730 | 35069 | 19830 | 19881 | 28216 |
| Number of exons | 56730 | 35069 | 19830 | 19881 | 28216 |
| Total gene length | 442302372 | 262637793 | 83762650 | 89666570 | 103248722 |
| Total CDS length | 41461605 | 34346592 | 17039142 | 18840849 | 22564185 |
| Total exon length | 95381543 | 85666986 | 36059402 | 41076667 | 48847415 |
| Mean gene length | 939 | 1547 | 1219 | 1366 | 1250 |
| Mean CDS length | 730 | 979 | 859 | 947 | 799 |
| Mean exon length | 1681 | 2442 | 1818 | 2066 | 1731 |
| Blast-p/x/n hits (NCBI-RefSeq; NCBI-nr; NCBI-nt) | 71046 | 51937 | 24194 | 24775 | 32688 |
| CDD | 6295 | 6475 | 4357 | 4693 | 5542 |
| Coils | 4943 | 4558 | 2815 | 2930 | 3821 |
| GO | 10784 | 9966 | 7243 | 7701 | 10272 |
| Gene3D | 15077 | 13342 | 9681 | 9975 | 13499 |
| Hamap | 270 | 266 | 221 | 229 | 254 |
| InterPro | 19126 | 16611 | 12116 | 12524 | 16717 |
| KEGG | 909 | 874 | 575 | 625 | 802 |
| MetaCyc | 835 | 781 | 581 | 574 | 777 |
| MobiDBLite | 10629 | 8238 | 5225 | 5737 | 6786 |
| PIRSF | 628 | 687 | 484 | 556 | 582 |
| PRINTS | 2609 | 2645 | 1961 | 2232 | 2589 |
| Pfam | 15788 | 14394 | 10591 | 11116 | 14428 |
| ProSitePatterns | 3585 | 3546 | 2445 | 2708 | 3346 |
| ProSiteProfiles | 9079 | 8323 | 5716 | 6034 | 7612 |
| Reactome | 3717 | 3515 | 2580 | 2732 | 3564 |
| SFLD | 69 | 72 | 54 | 60 | 67 |
| SMART | 7138 | 6869 | 4534 | 4958 | 6036 |
| SUPERFAMILY | 15070 | 13240 | 9376 | 9729 | 13190 |
| TIGRFAM | 757 | 751 | 552 | 617 | 815 |
| Total | 25267 | 20432 | 14723 | 14971 | 20637 |
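The mean lengths in the structural annotation statistics can be cross-checked against the totals and counts in the same table. The sketch below assumes that each mean is the total length divided by the number of features and truncated to an integer; that assumption is inferred from the reported numbers rather than stated in the source. The variable names are hypothetical, and the five entries follow the table's column order.

```python
# Values copied from the structural annotation table, in column order.
n_transcripts = [470852, 169668, 68670, 65620, 82542]
total_gene_length = [442302372, 262637793, 83762650, 89666570, 103248722]
reported_mean_gene = [939, 1547, 1219, 1366, 1250]

n_cds = [56730, 35069, 19830, 19881, 28216]
total_cds_length = [41461605, 34346592, 17039142, 18840849, 22564185]
reported_mean_cds = [730, 979, 859, 947, 799]

# Assumption: means are truncated (floor division), not rounded.
computed_mean_gene = [t // n for t, n in zip(total_gene_length, n_transcripts)]
computed_mean_cds = [t // n for t, n in zip(total_cds_length, n_cds)]

for label, computed, reported in (
    ("gene", computed_mean_gene, reported_mean_gene),
    ("CDS", computed_mean_cds, reported_mean_cds),
):
    for column, (c, r) in enumerate(zip(computed, reported), start=1):
        status = "ok" if c == r else "mismatch"
        print(f"mean {label} length, column {column}: computed {c}, reported {r} ({status})")
```

A mismatch on any column would point to a transcription error in the table rather than a problem with the assemblies themselves.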
| Measurement(s) | transcriptomics |
| Technology Type(s) | Illumina sequencing |
| Sample Characteristic - Organism | Margaritifera margaritifera • Unio crassus • Unio delphinus • Unio mancus • Unio pictorum |
| Sample Characteristic - Location | Europe |