| Literature DB >> 34849805 |
Luis J Chueca (1,2), Tilman Schell (1), Markus Pfenninger (3,4).
Abstract
Among molluscs, land snails are a scientifically and economically interesting group comprising edible species, alien species and agricultural pests. Yet, despite their high diversity, few genome drafts are publicly available. Here, we present the draft genome assembly of the land snail Candidula unifasciata, a species widely distributed across central Europe that belongs to the Geomitridae, a highly diversified family in the Western Palearctic region. We performed whole-genome sequencing, assembly and annotation of an adult specimen based on PacBio and Oxford Nanopore long reads as well as Illumina data. The resulting genome draft is about 1.29 Gb, with an N50 of 246 kb. More than 60% of the assembled genome was identified as repetitive elements. In total, 22,464 protein-coding genes were identified in the genome, of which 62.27% were functionally annotated. This is the first assembled and annotated genome for a geomitrid snail and will serve as a reference for further evolutionary, genomic and population genetic studies of this important and interesting group.
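The abstract summarizes assembly contiguity with the N50 statistic. The sketch below computes N50 and its companion L50 from a list of contig or scaffold lengths, using the standard definition. It is not the code used by Quast or any other tool in the software table. The function name `n50_l50` and the toy lengths are hypothetical.

```python
from typing import Sequence, Tuple


def n50_l50(lengths: Sequence[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig or scaffold lengths.

    N50 is the length of the shortest sequence among the longest sequences
    that together cover at least half of the total assembly length;
    L50 is the number of sequences in that set.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length, count
    raise AssertionError("unreachable: the loop always reaches half the total")


# Toy lengths (hypothetical, not taken from the C. unifasciata assembly).
print(n50_l50([100, 80, 60, 40, 20]))  # (80, 2)
```

Sorting longest-first makes the cumulative sum cross the halfway point at the N50 sequence. The number of sequences consumed up to that point is L50.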
Keywords: de novo assembly; Geomitridae; annotation; land snails; long reads; molluscs; repeats
Year: 2021 PMID: 34849805 PMCID: PMC8496239 DOI: 10.1093/g3journal/jkab180
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. (A) Picture of an adult specimen of C. unifasciata, copyright © Luis J. Chueca. (B) Estimated extent of occurrence of C. unifasciata in Europe.
Genome assembly and annotation statistics for C. unifasciata and comparison with other land snail genomes
| Statistic | C. unifasciata | Comparison genome 1 | Comparison genome 2 |
|---|---|---|---|
| Total length (bp) | 1,286,461,591 | 3,490,924,950 | 1,850,322,141 |
|  | 11,756 | 28,698 | 8,122 |
|  | 246,413 | 330,079 | 721,038 |
|  | 1,602 | 3,071 | 697 |
|  | 205,769 | 337,823 | 584,695 |
|  | 2,034 | 2,964 | 903 |
|  | 8,586 | 28,537 | 921 |
|  | 7,180 | 26,580 | 189 |
|  | 246,413 | 333,110 | 59,589,303 |
|  | 940 | 3,035 | 13 |
|  | 341,667 | 341,704 | 58,752,149 |
|  | 1,188 | 2,930 | 15 |
| GC (%) | 40.69 | 41.25 | 38.77 |
| BUSCO complete (S: single-copy; D: duplicated) | 92.4% (S: 85.3%; D: 7.1%) | 89.0% (S: 73.9%; D: 15.1%) | 91.5% (S: 84.6%; D: 6.9%) |
| BUSCO fragmented | 1.6% | 3.4% | 2.5% |
| BUSCO missing | 6.0% | 7.6% | 6.0% |
| BUSCO complete (S: single-copy; D: duplicated) | 94.5% (S: 86.0%; D: 8.5%) | 71.7% (S: 59.5%; D: 12.2%) | 95.6% (S: 86.8%; D: 8.8%) |
| BUSCO fragmented | 2.6% | 10.4% | 1.9% |
| BUSCO missing | 2.9% | 17.9% | 2.5% |
|  | — | — |  |
| BUSCO complete (S: single-copy; D: duplicated) | 94.7% (S: 52.6%; D: 42.1%) |  |  |
| BUSCO fragmented | 3.8% |  |  |
| BUSCO missing | 1.5% |  |  |
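BUSCO reports complete orthologs (C), split into single-copy (S) and duplicated (D), plus fragmented (F) and missing (M). So C = S + D and C + F + M = 100%, up to rounding. The sketch below parses the complete-BUSCO cell format used in the table and checks both identities. The function names and the 0.15-point rounding tolerance are hypothetical choices, not part of BUSCO.

```python
import re
from typing import Tuple

# One captured number, e.g. "92.4" or "7".
_NUM = r"(\d+(?:\.\d+)?)"
# Matches cells such as "92.4% (S : 85.3%; D: 7.1%)" or "94.5%(S : 86.0%; D : 8.5%)".
COMPLETE_CELL = re.compile(
    _NUM + r"%\s*\(S\s*:\s*" + _NUM + r"%;\s*D\s*:\s*" + _NUM + r"%\)"
)


def parse_complete(cell: str) -> Tuple[float, float, float]:
    """Split a 'C% (S: s%; D: d%)' cell into the floats (C, S, D)."""
    match = COMPLETE_CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized BUSCO cell: {cell!r}")
    c, s, d = (float(g) for g in match.groups())
    return c, s, d


def busco_consistent(complete_cell: str, fragmented: float, missing: float,
                     tol: float = 0.15) -> bool:
    """Check C = S + D and C + F + M = 100 within a rounding tolerance."""
    c, s, d = parse_complete(complete_cell)
    return abs(c - (s + d)) <= tol and abs(c + fragmented + missing - 100.0) <= tol


# First BUSCO block, C. unifasciata column.
print(busco_consistent("92.4% (S : 85.3%; D: 7.1%)", 1.6, 6.0))  # True
```

Each reported value is rounded to one decimal, so three rounded terms can drift by up to about 0.15 points. That is why the check uses a tolerance rather than exact equality.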
Figure 2. (A) GenomeScope k-mer profile plot for the C. unifasciata genome. (B) Coverage histogram for the final assembly based on the Illumina reads.
Repeat statistics
| Assembly | LINE | SINE | LTR | DNA | Unclassified | SmRNA | Others | Total (%) |
|---|---|---|---|---|---|---|---|---|
| C. unifasciata | 1,253,318 | 427,509 | 11,975 | 298,828 | 1,334,718 | 413,197 | 708,740 | 61.1 |
| C. nemoralis | 2,820,864 | 342,120 | 209,476 | 443,363 | 4,400,828 | 444,489 | 1,267,814 | 76.4 |
De novo and homology-based repeat annotations as reported by RepeatMasker and RepeatModeler for C. unifasciata, compared with C. nemoralis. The repeat families included are long interspersed nuclear elements (LINEs), short interspersed nuclear elements (SINEs), long terminal repeat (LTR) retrotransposons, DNA transposons (DNA), unclassified (unknown) repeat families, small RNA repeats (SmRNA), and others (small but classified repeat groups). The last column gives the total percentage of base pairs annotated as repeats.
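The last column is a percentage of base pairs. Repeat annotations from several sources can overlap, so a base should be counted once even if several annotations cover it. The sketch below merges overlapping intervals per sequence before summing covered bases. It assumes 0-based, half-open `(seqid, start, end)` intervals in the BED convention. The function name and toy data are hypothetical, and this is not RepeatMasker's own summary code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def repeat_fraction(intervals: Iterable[Tuple[str, int, int]], genome_size: int) -> float:
    """Percentage of genome bases covered by at least one repeat interval.

    Intervals are assumed to be 0-based and half-open: (seqid, start, end).
    """
    by_seq: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for seqid, start, end in intervals:
        by_seq[seqid].append((start, end))

    covered = 0
    for spans in by_seq.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:  # overlapping or touching: extend the current block
                cur_end = max(cur_end, end)
            else:  # gap: close the current block and start a new one
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return 100.0 * covered / genome_size


# Toy annotations (hypothetical): two overlapping hits on s1 are counted once.
hits = [("s1", 0, 10), ("s1", 5, 20), ("s1", 30, 40), ("s2", 0, 5)]
print(repeat_fraction(hits, genome_size=100))  # 35.0
```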
Figure 3. Blob plot showing read coverage, GC content and size of each scaffold. Blob size corresponds to scaffold size, and color corresponds to taxonomic assignment based on a BLAST search against the nt database.
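A blob plot positions each scaffold by its GC content and read coverage. The sketch below covers the GC axis: it reads scaffolds from FASTA text and reports each one's GC proportion. It excludes N (gap) bases from the denominator, which is an assumption and may differ from how BlobTools computes GC. The function names and toy records are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; the name is the first word after '>'."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (N and other codes ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else float("nan")


# Toy FASTA (hypothetical scaffolds, wrapped over several lines).
fasta = [">scf1 example", "ACGT", "GGCC", ">scf2", "AATTNN"]
for name, seq in read_fasta(fasta):
    print(name, len(seq), gc_content(seq))
# scf1 8 0.75
# scf2 6 0.0
```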
Annotation statistics of the predicted protein-coding genes for C. unifasciata genome
| Statistic | C. unifasciata |
|---|---|
| Protein-coding genes | 22,464 |
|  | 22,464 |
|  | 147,783 |
|  | 147,783 |
|  | 1 |
|  | 6.58 |
|  | 11,931 |
|  | 11,931 |
|  | 129 |
|  | 2,025 |
|  | 129 |
|  | 379,573,459 |
|  | 26,582,739 |
|  | 3,562 |
|  | 21,231 (94.51%) |
| Functionally annotated genes | 13,221 (62.27%) |
|  | 5,069 (22.56%) |
|  | 16,809 (74.83%) |
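Gene-model summaries like these are usually tallied from the annotation file: counts of genes, transcripts, exons and CDS features, plus ratios such as transcripts per gene and exons per transcript. The sketch below assumes a GFF3 file whose feature types are `gene`, `mRNA`, `exon` and `CDS`. That is a common layout, but the feature types in this annotation are not stated here. The function name and toy records are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize_gff3(lines: Iterable[str]) -> Dict[str, float]:
    """Count GFF3 feature types and derive simple per-gene ratios."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a feature line (e.g. embedded FASTA)
        counts[fields[2]] += 1  # column 3 holds the feature type
    genes, mrnas, exons = counts["gene"], counts["mRNA"], counts["exon"]
    return {
        "genes": genes,
        "mRNAs": mrnas,
        "exons": exons,
        "CDS": counts["CDS"],
        "mRNAs_per_gene": mrnas / genes if genes else 0.0,
        "exons_per_mRNA": exons / mrnas if mrnas else 0.0,
    }


# Toy annotation (hypothetical): one gene, one transcript with two exons.
gff = [
    "##gff-version 3",
    "scf1\tsrc\tgene\t1\t1000\t.\t+\t.\tID=g1",
    "scf1\tsrc\tmRNA\t1\t1000\t.\t+\t.\tID=m1;Parent=g1",
    "scf1\tsrc\texon\t1\t200\t.\t+\t.\tParent=m1",
    "scf1\tsrc\texon\t500\t1000\t.\t+\t.\tParent=m1",
    "scf1\tsrc\tCDS\t50\t200\t.\t+\t0\tParent=m1",
]
print(summarize_gff3(gff))
```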
Software used in this work, with package versions and source availability. All URLs were last accessed on 07-06-2021.
| Name | Version | URL |
|---|---|---|
| Flye | 2.6 |  |
| wtdbg2 | 2.5 |  |
| Canu | 1.9 |  |
| Racon | 1.4.3 |  |
| Pilon | 1.23 |  |
| Quast | 5.0.2 |  |
| BUSCO | 3.0.2 |  |
| BlobTools | 1.1.1 |  |
| LINKS | 1.8.7 |  |
| Rascaf | 1.0.2 |  |
| Long-Read Gapcloser | 1.0 |  |
| FastQC | 0.11.9 |  |
| Trimmomatic | 0.39 |  |
| MultiQC | 1.9 |  |
| GenomeScope | 1.0 |  |
| Trinity | 2.9.1 |  |
| GeMoMa | 1.6.4 |  |
| MMseqs2 | 5877873 |  |
| Augustus | 3.3.3 |  |
| TransDecoder | 5.5.0 |  |
| SNAP | 2006-07-28 | — |
| Exonerate | 2.2.0 |  |
| PASA | 2.4.1 |  |
| EVidenceModeler | 1.1.1 |  |
| guppy | 4.0.11 | https://github.com/nanoporetech/pyguppyclient |
| NanoPlot | 1.28.1 |  |
| NanoFilt | 2.6.0 |  |
| backmap.pl | 0.3 |  |
| SAMtools | 1.10 |  |
| BWA | 0.7.17 |  |
| minimap2 | 2.17 |  |
| Qualimap | 2.2.1 |  |
| bedtools | 2.28.0 |  |
| Rscript | 3.6.3 |  |
| RepeatModeler | 2.0 |  |
| RepeatMasker | 4.1.0 |  |
| HISAT2 | 2.1.0 |  |