Ronghua Li (1,2,3), Weijia Zhang (1), Junkai Lu (1,2), Zhouyi Zhang (1), Changkao Mu (1,2), Weiwei Song (1,2), Hervé Migaud (1,2,3), Chunlin Wang (1,2), Michaël Bekaert (3)
Abstract
The hard-shelled mussel (Mytilus coruscus) is an economically important shellfish that has been cultivated for the last decade. Owing to over-exploitation, most mussel stocks have declined dramatically. Efforts to study this species' natural distribution, genetics, breeding, and cultivation have been hindered by the lack of a high-quality reference genome. To address this, we produced a high-quality hybrid reference genome of M. coruscus, using long reads to assemble the genome and high-accuracy short reads to correct sequence errors. The genome was assembled into 10,484 scaffolds with a total length of 1.90 Gb and a scaffold N50 of 898 kb. Ab initio annotation of the M. coruscus genome assembly identified a total of 42,684 genes. This accurate reference genome provides an essential resource for genome-scale selective breeding of M. coruscus. More importantly, it will also help to decipher speciation and local adaptation in Mytilus species.
Keywords: Mytilus coruscus; genome assembly and annotation; hard-shelled mussel; mitochondria; sequencing; synteny
Year: 2020 PMID: 32457802 PMCID: PMC7227121 DOI: 10.3389/fgene.2020.00440
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Genome size and quality estimations. (A) Genome assembly workflow. (B) Distribution of 31-mers used to estimate genome size and heterozygosity. The heterozygous and homozygous k-mer depth peaks are clearly marked, indicating a highly complex genome with high heterozygosity (1.64%). (C) BUSCO assessment (Metazoa database; 954 BUSCO genes): 96.44% of the genes were recovered.
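Panel B rests on the usual idea behind k-mer-based genome size estimation: total k-mer occurrences divided by the depth of the homozygous peak. The sketch below illustrates that estimator on a hypothetical toy histogram. It is not the authors' pipeline: the source names neither the tool nor the full histogram, and the function names and the error cut-off `min_depth` are assumptions. It also shows why the heterozygous peak is expected at half the homozygous depth, a relationship the reported peaks (18.87 and 37.74) follow.

```python
# Simplified k-mer-spectrum genome size estimator (illustrative; not the authors' tool).
# hist maps k-mer depth -> number of distinct k-mers observed at that depth.

def genome_size_from_kmers(hist: dict[int, int], hom_peak_depth: float,
                           min_depth: int = 5) -> float:
    """Haploid genome size ~ total k-mer occurrences / homozygous peak depth.

    Depths below min_depth are treated as sequencing-error k-mers and
    excluded; the cut-off of 5 is an illustrative assumption.
    """
    total_kmers = sum(depth * count for depth, count in hist.items()
                      if depth >= min_depth)
    return total_kmers / hom_peak_depth


def expected_het_peak(hom_peak_depth: float) -> float:
    """k-mers spanning a heterozygous site occur on only one haplotype of a
    diploid genome, so they are expected at about half the homozygous depth."""
    return hom_peak_depth / 2


# Hypothetical toy histogram: error k-mers at depth 1, a heterozygous peak at
# depth 19 and a homozygous peak at depth 38.
toy_hist = {1: 10_000, 19: 100, 38: 450}
print(genome_size_from_kmers(toy_hist, hom_peak_depth=38))  # 500.0
print(expected_het_peak(37.74))  # 18.87
```

Real tools fit a mixture model to the whole spectrum rather than dividing by a single peak, which is why this toy version cannot be expected to reproduce the table's estimates from the summary rows alone.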
Sequencing data statistics.
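| Data type | Statistic | Value |
|---|---|---|
| Long reads | Total number of reads | 11,312,815 |
| Long reads | Total number of bases | 161,041,744,749 |
| Long reads | N50 read length | 21,771 nt |
| Long reads | Maximum read length | 259,852 nt |
| Long reads | Coverage | 87× |
| PE short reads | Total number of read pairs | 288,220,402 |
| PE short reads | Total number of bases | 86,466,120,600 |
| PE short reads | Read length | 150 nt |
| PE short reads | Coverage | 46× |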
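The sequencing totals can be cross-checked with simple arithmetic. This minimal sketch uses hypothetical helper names and assumes fixed-length, untrimmed 150-nt reads with two reads per pair. Under that assumption the reported short-read base count is reproduced exactly. The depth helper divides bases by whatever genome size the user supplies, because the source does not state which genome size underlies the Coverage rows.

```python
# Back-of-envelope checks on sequencing yield (illustrative sketch).

def paired_end_bases(read_pairs: int, read_length: int) -> int:
    """Bases in a paired-end library: two reads per pair.
    Assumes fixed-length, untrimmed reads."""
    return read_pairs * 2 * read_length


def sequencing_depth(total_bases: int, genome_size: int) -> float:
    """Average depth = sequenced bases / genome size (denominator chosen by the caller)."""
    return total_bases / genome_size


short_bases = paired_end_bases(288_220_402, 150)
print(short_bases)  # 86466120600, matching the short-read base total

# Mean long-read length from the long-read totals.
mean_long_read = 161_041_744_749 / 11_312_815
print(round(mean_long_read))  # 14235 (nt)

# Depth against the 1,903,825,920-nt scaffold span; the table may use another genome size.
print(round(sequencing_depth(short_bases, 1_903_825_920), 1))
```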
Statistics of the genome assembly of Mytilus coruscus.
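| Category | Statistic | Value |
|---|---|---|
| K-mer analysis | K-mer count (k = 31) | 4,311,539,104 |
| K-mer analysis | Heterozygous peak (k-mer depth) | 18.87 |
| K-mer analysis | Homozygous peak (k-mer depth) | 37.74 |
| K-mer analysis | Estimated genome size | 1,567,289,679 nt |
| K-mer analysis | Estimated repeats | 530,204,285 nt |
| K-mer analysis | Estimated heterozygosity | 1.64% |
| Contigs | Largest contig | 11,437,774 nt |
| Contigs | Total length | 1,903,799,720 nt |
| Contigs | N50 | 664,188 nt |
| Scaffolds | Largest scaffold | 13,847,550 nt |
| Scaffolds | Total length | 1,903,825,920 nt |
| Scaffolds | N50 | 898,347 nt |
| Scaffolds | GC | 32.22% |
| Scaffolds | N's per 100 kbp | 1.38 |
| Read mapping | Mapped | 98.42% |
| Read mapping | Properly paired | 77.04% |
| Read mapping | Avg. coverage depth | 138× |
| Read mapping | Coverage over 10× | 99.48% |
| Completeness | BUSCO recovered | 96.44% |
| Annotation | Predicted rRNA genes | 278 |
| Annotation | Predicted gene models | 92,615 |
| Annotation | Predicted protein-coding genes | 42,684 |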
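Several rows above (Largest contig, N50, GC, N's per 100 kbp) resemble the fields of an assembly-QC report such as QUAST's, although the source does not name the tool. The simplified reimplementations below illustrate what the rows measure. Treating the difference between scaffold and contig total length (26,200 nt) as gap Ns is an assumption. It gives 1.38 N's per 100 kbp, which is consistent with the table. Counting GC over unambiguous bases only is also an assumption.

```python
# Contiguity and composition metrics in the style of the assembly table (sketch).

def n50(lengths: list[int]) -> int:
    """Length at which the cumulative sum, taken from longest to shortest
    sequence, first reaches half of all assembled bases."""
    if not lengths:
        raise ValueError("no sequences")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("not reached for non-negative lengths")


def ns_per_100kbp(n_bases: int, total_length: int) -> float:
    """Number of N bases per 100,000 assembled bases."""
    return n_bases / total_length * 100_000


def gc_percent(seq: str) -> float:
    """GC content over A/C/G/T bases only (Ns excluded, an assumption)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100 * gc / acgt if acgt else 0.0


print(n50([5, 4, 3, 2, 1]))  # 4

# Assumption: scaffold minus contig total length equals the number of gap Ns.
gap_ns = 1_903_825_920 - 1_903_799_720  # 26,200
print(round(ns_per_100kbp(gap_ns, 1_903_825_920), 2))  # 1.38
print(gc_percent("ACGTNNGCAT"))  # 50.0
```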
RepeatMasker statistics.
| Repeat class | Number of elements | Length occupied (bp) | % of sequence |
|---|---|---|---|
| SINEs | 2,854 | 525,572 | 0.03 |
| ALUs | 0 | 0 | 0.00 |
| MIRs | 0 | 0 | 0.00 |
| LINEs | 437,682 | 160,984,195 | 8.46 |
| LINE1 | 812 | 607,529 | 0.03 |
| LINE2 | 13,314 | 5,148,240 | 0.27 |
| L3/CR1 | 7,119 | 3,117,407 | 0.16 |
| LTR elements | 35,692 | 25,465,347 | 1.34 |
| ERVL | 0 | 0 | 0.00 |
| ERVL-MaLRs | 0 | 0 | 0.00 |
| ERV classI | 0 | 0 | 0.00 |
| ERV classII | 675 | 176,007 | 0.01 |
| DNA elements | 74,846 | 21,072,684 | 1.11 |
| hAT-Charlie | 0 | 0 | 0.00 |
| TcMar-Tigger | 0 | 0 | 0.00 |
| Unclassified | 3,215,437 | 784,518,335 | 41.21 |
| Small RNA | 0 | 0 | 0.00 |
| Satellites | 1,170 | 118,198 | 0.01 |
| Simple repeats | 307,099 | 12,840,131 | 0.67 |
| Low complexity | 56,444 | 2,732,946 | 0.14 |
| Total repeats | | 1,005,864,117 | 52.83 |
Repeats fragmented by insertions or deletions have been counted as one element.
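The "% of sequence" column appears to equal the masked length divided by the 1,903,825,920-nt scaffold span. The sketch below recomputes a few rows under that assumption, which the source does not state explicitly, and reproduces the tabulated values. The helper name is hypothetical.

```python
# Recompute the "% of sequence" column from masked base pairs (sketch).
ASSEMBLY_SPAN_NT = 1_903_825_920  # scaffold total length from the assembly statistics


def percent_of_sequence(masked_bp: int, span: int = ASSEMBLY_SPAN_NT) -> float:
    """Masked bases as a percentage of the assembly, rounded like the table."""
    return round(100 * masked_bp / span, 2)


masked = {
    "SINEs": 525_572,
    "LINEs": 160_984_195,
    "Unclassified": 784_518_335,
    "Total repeats": 1_005_864_117,
}
for repeat_class, bp in masked.items():
    print(f"{repeat_class}: {percent_of_sequence(bp)}%")
# SINEs: 0.03%
# LINEs: 8.46%
# Unclassified: 41.21%
# Total repeats: 52.83%
```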
Comparison between Mytilus spp. assemblies.
| Statistic | M. galloprovincialis | M. coruscus |
|---|---|---|
| Number of scaffolds | 1,002,334 | 10,484 |
| Span | 1,500,149,602 nt | 1,903,825,920 nt |
| Longest scaffold | 67,529 nt | 13,847,550 nt |
| Shortest scaffold | 200 nt | 3,201 nt |
| N50 | 2,931 nt | 898,347 nt |
| GC | 31.71% | 32.22% |
| Syntenic scaffolds | 281,841 (28.12%) | 7,365 (70.25%) |
M. galloprovincialis sequences deposited and reported by Murgarella et al.
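The syntenic percentages appear to be the number of syntenic scaffolds divided by each assembly's total scaffold count. Recomputing them under that reading, with a hypothetical helper, reproduces both values. Because the denominator is a scaffold count, the percentage is strongly affected by how fragmented each assembly is.

```python
def syntenic_percent(syntenic_scaffolds: int, total_scaffolds: int) -> float:
    """Percentage of an assembly's scaffolds counted as syntenic, rounded like the table."""
    return round(100 * syntenic_scaffolds / total_scaffolds, 2)


print(syntenic_percent(281_841, 1_002_334))  # 28.12 (M. galloprovincialis)
print(syntenic_percent(7_365, 10_484))       # 70.25 (M. coruscus)
```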
Figure 2. Genome comparisons. (A) Circos (Krzywinski et al., 2009) plot of the longest synteny blocks between M. galloprovincialis (287 scaffolds, dark gray) and M. coruscus (104 scaffolds, blue). (B) Annotated mitochondrial genome of M. coruscus.