Jean-Baptiste Ledoux, Fernando Cruz, Jèssica Gómez-Garrido, Regina Antoni, Julie Blanc, Daniel Gómez-Gras, Silvija Kipson, Paula López-Sendino, Agostinho Antunes, Cristina Linares, Marta Gut, Tyler Alioto, Joaquim Garrabou.
Abstract
The octocoral Paramuricea clavata is a habitat-forming anthozoan with a key ecological role in rocky benthic and biodiversity-rich communities in the Mediterranean and Eastern Atlantic. Shallow populations of P. clavata in the North-Western Mediterranean are severely affected by warming-induced mass mortality events (MMEs). These MMEs have differentially impacted individuals and populations of P. clavata (i.e., varied levels of tissue necrosis and mortality rates) over thousands of kilometers of coastal areas. The eco-evolutionary processes, including genetic factors, contributing to these differential responses remain to be characterized. Here, we sequenced a P. clavata individual with short and long read technologies, producing 169.98 Gb of Illumina paired-end and 3.55 Gb of Oxford Nanopore Technologies (ONT) reads. We obtained a de novo genome assembly accounting for 607 Mb in 64,145 scaffolds. The contig and scaffold N50s are 19.15 Kb and 23.92 Kb, respectively. Despite the low contiguity of the assembly, its gene completeness is relatively high, including 75.8% complete and 9.4% fragmented genes out of the 978 metazoan genes contained in the metazoa_odb9 database. A total of 62,652 protein-coding genes have been annotated. This assembly is one of the few octocoral genomes currently available. This is undoubtedly a valuable resource for characterizing the genetic bases of the differential responses to thermal stress and for the identification of thermo-resistant individuals and populations. Overall, having the genome of P. clavata will facilitate studies of various aspects of its evolutionary ecology and the elaboration of effective conservation plans, such as active restoration, to overcome the threats of global change.
Keywords: Oxford Nanopore Technologies; Paramuricea clavata; de novo assembly; genome annotation; global warming; long read sequencing; mass mortality events; octocoral; temperate habitat-forming anthozoan; whole genome sequencing
Year: 2020 PMID: 32660973 PMCID: PMC7467007 DOI: 10.1534/g3.120.401371
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. The red gorgonian Paramuricea clavata (Risso 1826): a) whole colony; b) close-up of the polyps.
Figure 5. Genome annotation workflow.
Whole genome sequencing output
| Library | Read length (bp) | Fragment length (bp) | Yield (Gb) | Coverage | error r1 (%) | error r2 (%) |
|---|---|---|---|---|---|---|
| PE400 | 251 | 395 | 169.98 | 242.76 | 0.29 | 0.46 |
| Oxford Nanopore (1D reads) | 2,677 | — | 3.55 | 5.07 | 16.12 | — |
Sequencing coverage has been estimated assuming the largest genome size estimate: 700.18 Mb.
The Oxford Nanopore values correspond to the 1D reads produced by two MinION runs. The read N50 is 2,677 bp and the N90 is 1,253 bp. The Illumina error rate was estimated from differences with respect to the phiX control, while the ONT error rate was estimated from differences with respect to the lambda phage control sequence.
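The coverage column follows from dividing the sequencing yield by the assumed genome size. The snippet below is a minimal sketch of that arithmetic, using only the yields and the 700.18 Mb estimate quoted above; the function name `fold_coverage` is illustrative and not taken from the paper. The results agree with the table to within rounding of the reported yields.

```python
# Largest genome size estimate (Mb), as stated in the table footnote.
GENOME_SIZE_MB = 700.18


def fold_coverage(yield_gb: float, genome_size_mb: float = GENOME_SIZE_MB) -> float:
    """Return fold coverage: total sequenced bases divided by genome size."""
    return (yield_gb * 1000.0) / genome_size_mb  # convert Gb to Mb first


illumina_cov = fold_coverage(169.98)  # PE400 library
ont_cov = fold_coverage(3.55)         # Oxford Nanopore 1D reads

print(f"PE400: {illumina_cov:.2f}x  ONT: {ont_cov:.2f}x")
```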
Genome properties
| Genome Property | k = 21 (min) | k = 21 (max) | k = 57 (min) | k = 57 (max) |
|---|---|---|---|---|
| Homozygous (%) | 98.40 | 98.46 | 99.09 | 99.11 |
| Heterozygous (%) | 1.53 | 1.59 | 0.89 | 0.91 |
| Genome Haploid Length (bp) | 610,087,141 | 612,695,236 | 697,960,926 | 700,178,345 |
| Genome Repeat Length (bp) | 344,214,541 | 345,686,043 | 293,794,366 | 294,727,750 |
| Genome Unique Length (bp) | 265,872,600 | 267,009,193 | 404,166,560 | 405,450,595 |
| Model Fit (%) | 52.23 | 94.81 | 66.57 | 96.50 |
| Read Error Rate (%) | 0.61 | 0.61 | 0.38 | 0.38 |
| Genome with Repeats (%) | 56.42 | 56.42 | 42.09 | 42.09 |
Estimated from the Illumina pre-processed PE400 2x251bp reads by GenomeScope v.2.0 using a diploid model (P = 2) with k = 21 and k = 57, respectively.
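The length rows of this table are internally related: the repeat and unique lengths add up to the haploid length, and the "Genome with Repeats" percentage is the repeat length over the haploid length. The sketch below checks this for the minimum estimates reported above; the dictionary layout is just an illustrative choice.

```python
# Minimum estimates from the genome properties table (bp).
estimates = {
    "k=21": {"haploid": 610_087_141, "repeat": 344_214_541, "unique": 265_872_600},
    "k=57": {"haploid": 697_960_926, "repeat": 293_794_366, "unique": 404_166_560},
}

repeat_pct = {}
for k, e in estimates.items():
    # Repeat and unique portions partition the haploid genome length.
    assert e["repeat"] + e["unique"] == e["haploid"]
    repeat_pct[k] = 100.0 * e["repeat"] / e["haploid"]
    print(f"{k}: repeats cover {repeat_pct[k]:.2f}% of the haploid genome")
```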
Figure 2. 57-mer analysis of the sequenced genome. All 57-mers in the PE400 library were counted, and the number of distinct 57-mers (k-mer species) at each depth from 1 to 250 is shown in this plot. The main homozygous peak at depth 124 corresponds to unique homozygous sequence, and a tall heterozygous peak lies at half that depth (62). Finally, the thick long tail starting at depth 180 corresponds to repetitive k-mers in the genome. The high peak at very low depths, caused by sequencing errors, has been truncated to facilitate representation.
Figure 3. k-mer profile and model fit as estimated with GenomeScope v.2.0 from PE400 using a k-mer length of 57 bp. Note that the model finds an excess of repetitive sequence in the rightmost tail of the distribution after depth 180.
Figure 4. Alignment of the complete mitochondrial assembly against the NCBI reference genome (NC_034749.1) using DNAdiff v1.2 (MUMMER 3.22 package (Kurtz )). The figure, produced with Mummerplot v3.5 (MUMMER 3.22 package), shows the location of the three mismatches found: one indel at position 9,389 plus two SNVs at positions 7,977 and 17,155, respectively.
Sequence contaminants
| Species | % reads covered | No. reads covered | No. reads assigned | NCBI TaxID |
|---|---|---|---|---|
| A. mediterranea | 7.86 | 393,091 | 393,091 | 314275 |
| | 0.47 | 23,547 | 23,547 | 2162 |
| | 0.36 | 17,802 | 0 | 42374 |
| | 0.35 | 17,613 | 0 | 5664 |
| | 0.16 | 7,871 | 0 | 28901 |
| | 0.15 | 7,378 | 0 | 78579 |
| | 0.14 | 6,921 | 6,921 | 1419814 |
| | 0.1 | 5,153 | 0 | 4932 |
| | 0.07 | 3,293 | 0 | 68886 |
| | 0.06 | 2,824 | 0 | 35720 |
| | 0.04 | 2,018 | 2,018 | 28985 |
| | 0.03 | 1,475 | 1,475 | 10372 |
| | 0.03 | 1,282 | 1,282 | 317858 |
| | 0.02 | 902 | 0 | 1768 |
| | 0.02 | 787 | 0 | 2188 |
| | 0.02 | 783 | 783 | 1732201 |
The total number of read pairs screened was 5 million.
The count for TaxID 314275 includes 10 matches to A. mediterranea (Taxonomy ID: 314275) and 393,081 matches to A. mediterranea U8 (Taxonomy ID: 1300257). The 393,081 matches to the U8 strain therefore represent 7.86% of the total reads.
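The "% reads covered" column can be reproduced from the read counts and the 5 million read pairs screened. The following is a small sketch of that calculation for a few rows of the table, keyed by NCBI TaxID; the helper name `percent_of_screened` is hypothetical.

```python
TOTAL_SCREENED = 5_000_000  # read pairs screened, per the table footnote

# Reads covered for a few rows of the contaminants table, keyed by NCBI TaxID.
reads_covered = {314275: 393_091, 2162: 23_547, 1419814: 6_921}


def percent_of_screened(n_reads: int, total: int = TOTAL_SCREENED) -> float:
    """Return the share of screened read pairs, as a percentage."""
    return 100.0 * n_reads / total


percentages = {taxid: percent_of_screened(n) for taxid, n in reads_covered.items()}
for taxid, pct in percentages.items():
    print(f"TaxID {taxid}: {pct:.2f}% of screened reads")
```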
Contiguity of the assembly (pcla8)
| Metric | Contig length (bp) | Contig number | Scaffold length (bp) | Scaffold number |
|---|---|---|---|---|
| N0 | 205,335 | 1 | 239,170 | 1 |
| N5 | 81,130 | 290 | 96,122 | 248 |
| N10 | 61,509 | 728 | 73,949 | 613 |
| N15 | 50,628 | 1,276 | 61,101 | 1,069 |
| N20 | 42,871 | 1,928 | 51,904 | 1,609 |
| N25 | 36,585 | 2,695 | 45,142 | 2,238 |
| N30 | 31,872 | 3,584 | 39,216 | 2,958 |
| N35 | 27,922 | 4,602 | 34,575 | 3,785 |
| N40 | 24,634 | 5,760 | 30,608 | 4,718 |
| N45 | 21,732 | 7,072 | 27,035 | 5,774 |
| N50 | 19,152 | 8,558 | 23,918 | 6,967 |
| N55 | 16,816 | 10,247 | 21,053 | 8,322 |
| N60 | 14,652 | 12,176 | 18,511 | 9,861 |
| N65 | 12,776 | 14,393 | 16,101 | 11,618 |
| N70 | 10,959 | 16,956 | 13,818 | 13,654 |
| N75 | 9,226 | 19,970 | 11,705 | 16,037 |
| N80 | 7,541 | 23,599 | 9,613 | 18,893 |
| N85 | 5,898 | 28,129 | 7,593 | 22,437 |
| N90 | 4,201 | 34,184 | 5,486 | 27,108 |
| N95 | 2,396 | 43,517 | 3,181 | 34,239 |
| N100 | 128 | 79,709 | 203 | 64,145 |
| Total | 606,183,174 | 79,709 | 606,969,498 | 64,145 |
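Contiguity statistics such as the N50 quoted in the abstract are defined as the length of the shortest sequence in the smallest set of longest sequences that together cover a given fraction of the assembly. The function below is a minimal sketch of that standard definition, run on made-up toy lengths rather than the actual assembly; `nx` is an illustrative name, not a tool used by the authors.

```python
from typing import Iterable, Tuple


def nx(lengths: Iterable[int], x: float = 50.0) -> Tuple[int, int]:
    """Return (Nx, Lx) for a collection of sequence lengths.

    Nx is the length of the shortest sequence among the longest sequences
    that together cover at least x% of the total; Lx is how many there are.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100.0
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("no sequence lengths supplied")


# Toy example (hypothetical lengths, total = 54).
toy = [2, 3, 4, 5, 6, 7, 8, 9, 10]
n50, l50 = nx(toy, 50)  # 10 + 9 + 8 = 27 reaches half of 54
print(n50, l50)
```

With `x=0` the function returns the longest sequence and a count of 1, and with `x=100` it returns the shortest sequence and the total count, which mirrors the first and last data rows of the contiguity table.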
Repeat annotation
| Repeat type | Bases covered | % genome |
|---|---|---|
| LINE | 33,645,094 | 5.54 |
| SINE | 9,570,518 | 1.58 |
| DNA | 65,436,874 | 10.78 |
| LTR | 8,780,644 | 1.45 |
| RC | 1,374,121 | 0.23 |
| Satellite | 1,422,244 | 0.23 |
| snRNA | 86,345 | 0.01 |
| Simple repeat | 1,303,146 | 0.21 |
| Unknown | 196,068,308 | 32.30 |
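The "% genome" column is consistent with dividing the bases covered by the total scaffold length of the assembly (606,969,498 bp). That choice of denominator is an assumption inferred from the numbers, not something the table states. A short sketch of the check:

```python
# Assumed denominator: total scaffold length of the assembly (bp).
ASSEMBLY_LENGTH = 606_969_498

# Bases covered per repeat type, and the percentages reported in the table.
repeats = {
    "LINE": (33_645_094, 5.54),
    "SINE": (9_570_518, 1.58),
    "DNA": (65_436_874, 10.78),
    "LTR": (8_780_644, 1.45),
    "RC": (1_374_121, 0.23),
    "Satellite": (1_422_244, 0.23),
    "snRNA": (86_345, 0.01),
    "Simple repeat": (1_303_146, 0.21),
    "Unknown": (196_068_308, 32.30),
}

computed = {name: 100.0 * bases / ASSEMBLY_LENGTH for name, (bases, _) in repeats.items()}
for name, (_, reported) in repeats.items():
    print(f"{name:14s} computed {computed[name]:6.2f}%  reported {reported:6.2f}%")
```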
Genome annotation statistics
| Statistic | Value |
|---|---|
| Number of protein-coding genes | 62,652 |
| Median gene length (bp) | 1,634 |
| Number of transcripts | 70,788 |
| Number of exons | 288,028 |
| Number of coding exons | 277,455 |
| Coding GC content | 41.58% |
| Median UTR length (bp) | 571 |
| Median intron length (bp) | 382 |
| Exons/transcript | 4.92 |
| Transcripts/gene | 1.13 |
| Multi-exonic transcripts | 65.1% |
| Gene density (gene/Mb) | 103.22 |
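Two of the derived rows can be recomputed from the counts in this table. The sketch below assumes that gene density is expressed per megabase of the total scaffold length (606,969,498 bp), which reproduces the reported value; variable names are illustrative.

```python
N_GENES = 62_652
N_TRANSCRIPTS = 70_788
ASSEMBLY_LENGTH_BP = 606_969_498  # assumed denominator: total scaffold length

# Genes per megabase of assembled sequence.
gene_density = N_GENES / (ASSEMBLY_LENGTH_BP / 1_000_000)

# Average number of annotated transcripts per protein-coding gene.
transcripts_per_gene = N_TRANSCRIPTS / N_GENES

print(f"Gene density: {gene_density:.2f} genes/Mb")
print(f"Transcripts per gene: {transcripts_per_gene:.2f}")
```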