Sébastien Renaut, Davide Guerra, Walter R Hoeh, Donald T Stewart, Arthur E Bogan, Fabrizio Ghiselli, Liliana Milani, Marco Passamonti, Sophie Breton.
Abstract
Freshwater mussels (Bivalvia: Unionida) serve an important role as aquatic ecosystem engineers but are one of the most critically imperilled groups of animals. Here, we used a combination of sequencing strategies to assemble and annotate a draft genome of Venustaconcha ellipsiformis, which will serve as a valuable genomic resource given the ecological value and unique "doubly uniparental inheritance" mode of mitochondrial DNA transmission of freshwater mussels. The genome described here was obtained by combining high-coverage short reads (65× genome coverage of Illumina paired-end and 11× genome coverage of mate-pair sequences) with low-coverage Pacific Biosciences long reads (0.3× genome coverage). Briefly, the final scaffold assembly accounted for a total size of 1.54 Gb (366,926 scaffolds, N50 = 6.5 kb, with 2.3% of "N" nucleotides), representing 86% of the predicted genome size of 1.80 Gb, while over one third of the genome (37.5%) consisted of repeated elements and >85% of the core eukaryotic genes were recovered. Given the repeated genetic bottlenecks of V. ellipsiformis populations as a result of glaciation events, heterozygosity was also found to be remarkably low (0.6%), in contrast to most other sequenced bivalve species. Finally, we reassembled the full mitochondrial genome and found six polymorphic sites with respect to the previously published reference. This resource opens the way to comparative genomics studies to identify genes related to the unique adaptations of freshwater mussels and their distinctive mitochondrial inheritance mechanism.
Year: 2018 PMID: 29878181 PMCID: PMC6054159 DOI: 10.1093/gbe/evy117
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
DNA Sequencing Strategy
| Type | Insert Size (bp) | Read Length (bp) | Raw No. Reads (paired) | Raw Total Length (Mb) | Trimmed No. Reads (paired) | Trimmed Total Length (Mb) | Trimmed Total Length (% raw) | Read Length (bp, trimmed) | Coverage |
|---|---|---|---|---|---|---|---|---|---|
| Illumina paired-end | 300 | 2×100 | 189,876,842 | 37,975 | 185,721,156 | 36,274 | 95.5 | 97.6 | |
| Illumina paired-end | 300 | 2×100 | 195,394,768 | 39,079 | 191,002,987 | 37,319 | 95.5 | 97.7 | |
| Illumina paired-end | 300 | 2×100 | 178,820,287 | 35,764 | 174,954,230 | 34,224 | 95.6 | 98.9 | |
| Illumina paired-end (total) | | | 564,091,897 | 112,818 | 551,678,373 | 107,818 | 95.6 | 98.1 | 65× |
| Illumina mate-pair | 5,000 | 2×100 | 97,801,148 | 19,560 | 94,350,168 | 18,717 | 95.7 | 99.3 | 11× |
| PacBio | | 4,406.4 (average) | 103,096 | 454 | | | | | 0.27× |
| | | 1,170.9 (average) | 285,260 | 334 | | | | | |
| | | 301–50,048 (min–max) | | | | | | | |
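
As a rough cross-check of the Coverage column, fold coverage can be approximated as total sequenced bases divided by genome size. The sketch below assumes the 1.80 Gb k-mer estimate from the abstract as the denominator and takes the library names from the abstract's description of the data. The source does not state which genome size the published coverage values were computed against, so the results need not match the table exactly.

```python
# Sketch: fold coverage = total sequenced bases / genome size.
GENOME_SIZE_MB = 1_800  # assumption: the 1.80 Gb k-mer estimate from the abstract

# Total lengths (Mb) copied from the sequencing table; keys are illustrative names.
total_length_mb = {
    "paired-end (trimmed)": 107_818,
    "mate-pair (trimmed)": 18_717,
    "PacBio": 454,
}


def coverage(total_mb: float, genome_mb: float = GENOME_SIZE_MB) -> float:
    """Return fold coverage for a library containing total_mb megabases."""
    return total_mb / genome_mb


for library, mb in total_length_mb.items():
    print(f"{library}: {coverage(mb):.2f}x")
```

Passing a different `genome_mb` shows how sensitive the reported depth is to the genome-size estimate used.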
Fig. 1.—k-mer distribution (k = 21) as calculated by genomescope (Vurture et al. 2017). Blue bars represent the observed k-mer distribution; black line represents the modelled distribution without the error k-mers (red line) and up to a maximum k-mer coverage specified in the model (yellow line). Length, estimated genome length; Uniq, unique portion of the genome (nonrepetitive elements); Rep, repetitive portion of the genome; Het, genome heterozygosity.
Genome Size, Heterozygosity, and Repeat Elements
| Subclass | Order | Family | Species | Estimated Genome Size (Gb) | Heterozygosity (%) | % of Repeated Elements |
|---|---|---|---|---|---|---|
| Palaeoheterodonta | Unionida | Unionidae | V. ellipsiformis | 1.80 | 0.63 | 37.81 |
| Heterodonta | Veneroida | Veneridae | R. philippinarum | 1.37 | high | 26.38 |
| Pteriomorphia | Mytiloida | Mytilidae | | 1.64 | 1.24 | 47.90 |
| | | | | 2.38 | 2.02 | 62.00 |
| | | | | 1.60 | high | 36.13 |
| | | | | 1.67 | 2.3 | 33.00 |
| | Ostreoida | Ostreidae | C. gigas | 0.55 | 1.95 | 36.00 |
| | | Pectinidae | | 0.95 | 0.8 | 32.10 |
| | | | | 1.43 | 0.45 | 38.87 |
| | Pterioida | Pteriidae | P. fucata | 1.15 | high | 37.00 |
| Pteriomorphia mean (SD) | | | | 1.39 (0.58) | 1.29 (0.70) | 41.43 (10.29) |
| Mytiloida mean (SD) | | | | 1.87 (0.44) | 1.52 (0.68) | 48.68 (12.95) |
| Ostreoida mean (SD) | | | | 0.98 (0.44) | 1.07 (0.78) | 35.66 (3.40) |
| Pectinidae mean (SD) | | | | 1.19 (0.34) | 0.63 (0.25) | 35.49 (4.79) |
| All subclasses mean (SD) | | | | 1.41 (0.51) | 1.20 (0.69) | 39.35 (10.23) |
Note.—Estimates of genome size, heterozygosity, and percentage of repeated elements in the currently available bivalve nuclear genomes. Data for each single species were retrieved from the literature: P. yessoensis (highly inbred individual, Wang et al. 2017), V. ellipsiformis (wild, recurrent population bottlenecks, this study), C. farreri (selective breeding in aquaculture, Li et al. 2017), B. platifrons (recurrent population bottlenecks in the wild, Sun et al. 2017), C. gigas (highly inbred individual, Zhang et al. 2012), M. philippinarum (large wild population, Sun et al. 2017), L. fortunei (invasive worldwide, Uliano-Silva et al. 2018), R. philippinarum (selective breeding in aquaculture, Mun et al. 2017), P. fucata (selective breeding in aquaculture, Takeuchi et al. 2012), M. galloprovincialis (large wild population, Murgarella et al. 2016). The genome size for V. ellipsiformis was based on k-mer analysis (see Materials and Methods). Mean and standard deviation (SD) values are also shown for the taxa comprising more than one species and for all subclasses, that is, the class Bivalvia. Note that all species are marine, except for V. ellipsiformis and L. fortunei (freshwater).
“high”: no rate was calculated, but high heterozygosity was documented.
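
The mean (SD) rows can be checked against the per-species values. The sketch below does this for genome size in the two groups whose member rows are unambiguous in the table (the Pectinidae rows, and the three Ostreoida rows). It assumes the sample standard deviation (n − 1 denominator); the source does not state which estimator was used, but this one agrees with the tabulated values for these two groups.

```python
from statistics import mean, stdev

# Estimated genome sizes (Gb) copied from the table rows.
genome_size_gb = {
    "Pectinidae": [0.95, 1.43],
    "Ostreoida": [0.55, 0.95, 1.43],
}


def summarize(values: list[float]) -> str:
    """Format values as 'mean (SD)' using the sample SD (assumed estimator)."""
    return f"{mean(values):.2f} ({stdev(values):.2f})"


for taxon, sizes in genome_size_gb.items():
    print(f"{taxon} mean (SD): {summarize(sizes)}")
```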
Assembly Statistics (ABySS2.0)
| Assembly | n (millions) | n: 1,000 | L50 | Min | N80 | N50 | N20 | Max | Sum (Mb) |
|---|---|---|---|---|---|---|---|---|---|
| | 39.8 | 347,879 | 101,624 | 1,000 | 1,361 | 2,181 | 3,891 | 25,883 | 707 |
| | 18.5 | 444,734 | 127,617 | 1,000 | 1,485 | 2,452 | 4,273 | 25,944 | 984 |
| | 14.0 | 551,875 | 141,012 | 1,000 | 1,704 | 3,117 | 5,817 | 39,408 | 1,449 |
| | 13.7 | 423,853 | 92,607 | 1,000 | 2,303 | 5,477 | 9,099 | 45,260 | 1,539 |
| | 13.7 | 410,237 | 86,661 | 1,000 | 2,391 | 5,708 | 9,893 | 47,610 | 1,548 |
| | 13.6 | 366,926 | 58,906 | 1,000 | 2,534 | 6,523 | 16,660 | 298,135 | 1,549 |
Assembly: raw unitigs = raw assembly, not taking paired-end information into account; unitigs = after filtering, merging, and popping bubbles in the de Bruijn graph; contigs = unitigs with paired-end information mapped; scaffolds = contigs with mate-pair information mapped; long scaffolds = scaffolds with PacBio/transcriptome information integrated. n = number of contigs; n: 1,000 = number of contigs with a minimum length of 1,000 bp; L50 = minimum number of sequences required to represent 50% of the entire assembly; min = minimum length of sequences analyzed; N80, N50, N20 = weighted median statistics such that 80/50/20% of the entire assembly is contained in contigs equal to or larger than this value in bp; max = maximum contig size in bp; sum = sum of all contigs of size > min.
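
The N80/N50/N20 and L50 definitions above can be illustrated with a minimal sketch. The function name `nx_lx` and the toy contig lengths are hypothetical and are not taken from the study; the logic follows the definitions given in the table note, not any particular assembler's implementation.

```python
def nx_lx(lengths: list[int], percent: int = 50) -> tuple[int, int]:
    """Return (Nx, Lx) for a list of sequence lengths.

    Nx: length such that `percent`% of the total assembly is contained in
        sequences at least this long.
    Lx: smallest number of sequences needed to reach that share.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length, count
    raise ValueError("lengths must be a non-empty list of positive integers")


toy_lengths = [80, 70, 50, 40, 30, 20, 10]  # hypothetical contigs, total 300 bp
n50, l50 = nx_lx(toy_lengths, 50)
n80, _ = nx_lx(toy_lengths, 80)
n20, _ = nx_lx(toy_lengths, 20)
print(f"N80={n80} N50={n50} N20={n20} L50={l50}")
```

As in the table, N80 ≤ N50 ≤ N20 always holds, because a larger share of the assembly can only be reached by including shorter sequences.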
Assembly and Annotation Statistics for the Long Scaffold Assembly
| Long_scaffolds | Long_scaffolds (>1 kb scaffolds broken based on | Long_scaffolds (>1 kb scaffolds, masked assembly) | |
|---|---|---|---|
| 13,635,758 | 821,266 | 374,245 | |
| 371,706 | 549,364 | 374,245 | |
| 94,238 | 50,209 | 95,019 | |
| 26,952 | 5,151 | 27,030 | |
| 5,073 | 23 | 4,976 | |
| 1,456 | 0 | 1,427 | |
| 2,638,723,663 | 1,554,026,338 | 1,596,234,060 | |
| 1,590,292,198 | 1,425,294,273 | 1,596,234,060 | |
| 1,000,983,904 | 360,423,103 | 1,003,000,325 | |
| 541,545,133 | 64,766,821 | 538,648,016 | |
| 231,252,884 | 687,249 | 226,147,564 | |
| 107,178,666 | 0 | 104,739,660 | |
| 371,706 | 821,266 | 374,245 | |
| 313,274 | 44,597 | 313,274 | |
| 1,590,292,198 | 1,554,026,338 | 1,596,234,060 | |
| 1,800,000,000 | 1,800,000,000 | 1,800,000,000 | |
| 34.19 | 34.19 | 33.49 | |
| 6,656 | 2,812 | 6,627 | |
| 2,293.33 | 13.17 | 39,200.22 | |
| 201,068 | 277,765 | 123,457 | |
| 74,820 | 82,359 | 41,697 | |
| 18,539 | 14,338 | 11,897 | |
| 6,511 | 3,289 | 4,375 | |
| 29,031 | 14,198 | 25,544 |
Note.—All statistics are based on scaffolds of size ≥1 kb, unless otherwise noted (e.g., “No. scaffolds [≥0 bp]” and “Total length [≥0 bp]” include all scaffolds).
Analysis of Genome Completeness Using busco 3.0.2 (Benchmarking Universal Single-Copy Orthologs, Simao et al. 2015)
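| | Metazoa | Eukaryota |
|---|---|---|
| Complete | 664 (68%) | 185 (61%) |
| Complete and single-copy | 652 (67%) | 181 (60%) |
| Complete and duplicated | 12 (1%) | 4 (1%) |
| Fragmented | 207 (21%) | 76 (25%) |
| Missing | 107 (11%) | 42 (14%) |
| Total BUSCO groups searched | 978 | 303 |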
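
The BUSCO counts are internally consistent, which the sketch below checks. It assumes the six rows are, in order, the standard BUSCO categories (complete, complete single-copy, complete duplicated, fragmented, missing, total searched); the dictionary keys are illustrative names. It also assumes that "recovered" in the abstract means complete plus fragmented, which the source does not state explicitly.

```python
# Counts copied from the table; the category names are an assumption.
busco = {
    "Metazoa": {"complete": 664, "single": 652, "duplicated": 12,
                "fragmented": 207, "missing": 107, "total": 978},
    "Eukaryota": {"complete": 185, "single": 181, "duplicated": 4,
                  "fragmented": 76, "missing": 42, "total": 303},
}


def is_consistent(c: dict[str, int]) -> bool:
    """Check that the categories add up the way BUSCO summaries do."""
    return (c["single"] + c["duplicated"] == c["complete"]
            and c["complete"] + c["fragmented"] + c["missing"] == c["total"])


def recovered_fraction(c: dict[str, int]) -> float:
    """Share of searched genes found as complete or fragmented (assumed meaning)."""
    return (c["complete"] + c["fragmented"]) / c["total"]


for lineage, counts in busco.items():
    print(f"{lineage}: consistent={is_consistent(counts)}, "
          f"recovered={recovered_fraction(counts):.1%}")
```

Under these assumptions the Eukaryota set gives about 86% recovered, in line with the ">85% of the core eukaryotic genes" figure in the abstract.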
Fig. 2.—Mitochondrial coverage based on sequence alignment and annotation (from NCBI). Six nucleotide positions were identified in the legend as fixed for an alternative allele compared with the reference of Breton (2009).