Martin Helmkampf, M Renee Bellinger, Scott M Geib, Sheina B Sim, Misaki Takabayashi.
Abstract
The rice coral, Montipora capitata, is widely distributed throughout the Indo-Pacific and is one of the most important reef-building species in the Hawaiian Islands. Here, we describe a de novo assembly of its genome based on a linked-read sequencing approach developed by 10x Genomics. The final draft assembly consisted of 27,870 scaffolds with an N50 size of 186 kb and contained a fairly complete set (81%) of metazoan benchmarking (BUSCO) genes. Based on haploid assembly size (615 Mb) and read k-mer profiles, we estimated the genome size to fall between 600 and 700 Mb, although the high fraction of repetitive sequence introduced considerable uncertainty. Repeat analysis indicated that 42% of the assembly consisted of interspersed, mostly unclassified repeats, and almost 3% of tandem repeats. We also identified 36,691 protein-coding genes with a median coding sequence length of 807 bp, together spanning 7% of the assembly. The high repeat content and heterozygosity of the genome made assembly challenging, requiring additional steps to merge haplotypes and resulting in higher than expected fragmentation at the scaffold level. Despite these challenges, the assembly turned out to be comparable in most quality measures to other available coral genomes while being considerably more cost-effective, especially compared with long-read sequencing methods. Provided high-molecular-weight DNA is available, linked-read technology may thus serve as a valuable alternative capable of providing quality genome assemblies of nonmodel organisms.
Keywords: Montipora capitata; coral; genome assembly; linked-read sequencing
Year: 2019 PMID: 31243452 PMCID: PMC6668484 DOI: 10.1093/gbe/evz135
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1.—Mature colony of the rice coral Montipora capitata (brown, foreground) at Wai‘ōpae tide pools, Eastern Hawai‘i Island. Photograph by Julia Stewart.
Fig. 2.—k-mer profile (k = 21) of the Montipora capitata genome raw reads as calculated by Jellyfish and GenomeScope. Light gray bars show the observed distribution, whereas the dark gray, blue, and red lines indicate the modeled distributions of k-mers representing the full genome, the unique fraction of the genome, and sequencing errors, respectively. Estimated genome characteristics include the genome size (length), nonrepetitive portion of the genome (unique), repetitive portion of the genome (repetitive), and genome heterozygosity (het).
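The idea behind a k-mer based genome size estimate can be illustrated with a much cruder calculation than the mixture model GenomeScope fits. The sketch below is not the authors' pipeline: it assumes a k-mer histogram is already available as a mapping from multiplicity to number of distinct k-mers, drops low-multiplicity k-mers as likely sequencing errors using a hypothetical `min_depth` cutoff, and divides the remaining k-mer total by the depth of the main peak. In a highly heterozygous or repetitive genome such as this one, a single-peak estimate like this can be badly off, which is part of why a fitted model is preferable.

```python
def estimate_genome_size(hist, min_depth=5):
    """Crude genome size estimate from a k-mer histogram.

    hist maps k-mer multiplicity -> number of distinct k-mers seen that often.
    min_depth is a hypothetical cutoff below which k-mers are treated as errors.
    """
    kept = {depth: n for depth, n in hist.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers left after applying min_depth")
    # Depth with the most distinct k-mers is taken as the main coverage peak.
    peak_depth = max(kept, key=kept.get)
    # Total k-mer occurrences divided by coverage approximates genome length.
    total_kmers = sum(depth * n for depth, n in kept.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram (made-up numbers, not data from the paper).
toy_hist = {1: 1000, 2: 200, 9: 50, 10: 100, 11: 50, 20: 10}
size, peak = estimate_genome_size(toy_hist)
print(f"peak depth {peak}, estimated size {size:.0f} bp")
```

Here the k-mers at multiplicity 20 stand in for two-copy repeats: they contribute twice their distinct count to the estimated length, which is how repeat content enters the estimate.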
Table 1.—Basic Assembly Statistics for the Montipora capitata Genome, in Comparison to Other Coral Genome Assemblies (GenBank Accessions Are Given Where Available)
| Organism | Assembly | Platform | Length (Mb) | Number | Largest (kb) | N50 Size (kb) | %GC | Ns |
|---|---|---|---|---|---|---|---|---|
| Montipora capitata | This study | Illumina (10×) | 572 / 615 | 49,761 / 27,870 | 226 / 2,051 | 24.3 / 185.5 | 39.5 | 6,936 |
| | | PacBio + Illumina | — / 886 | — / 3,043 | — / 3,469 | — / 540.6 | 39.6 | — |
| | GCA_000222465.2 | 454 + Illumina | 379 / 447 | 54,028 / 2,420 | 98 / 2,550 | 11.0 / 483.6 | 39.0 | 15,243 |
| | GCA_002042975.1 | Illumina | 356 / 486 | 55,201 / 1,932 | 151 / 4,772 | 12.5 / 1,162.4 | 39.0 | 26,685 |
| | GCA_003704095.1 | Illumina | 225 / 234 | 50,903 / 4,392 | 214 / 2,168 | 26.0 / 326.1 | 37.8 | 3,673 |
| | GCA_900290455.1 | Unknown | 332 / 470 | 81,420 / 14,982 | 66 / 1,193 | 5.3 / 137.2 | 38.9 | 29,322 |
| | GCA_002571385.1 | Illumina | 358 / 400 | 37,778 / 5,687 | 250 / 2,970 | 20.5 / 457.5 | 38.5 | 10,536 |
Note.—Values separated by a slash refer to contigs (first) and scaffolds (second). “Ns” indicates the number of Ns per 100 kb. All statistics were computed directly from the assemblies rather than taken from published values, to ensure comparability.
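As an illustration of how statistics of this kind can be computed directly from an assembly, the following sketch derives sequence count, total length, largest sequence, N50, %GC, and Ns per 100 kb from a list of sequences. It is a minimal example under stated assumptions rather than the procedure used in the study: the function name `assembly_stats` is hypothetical, sequences are assumed to contain only A, C, G, T, and N, and %GC is computed over non-N bases, which may differ from the convention behind the table.

```python
def assembly_stats(seqs):
    """Basic statistics for a list of contig or scaffold sequences (strings)."""
    if not seqs:
        raise ValueError("need at least one sequence")
    lengths = sorted((len(s) for s in seqs), reverse=True)
    total = sum(lengths)

    # N50: length of the sequence at which the running sum over the
    # length-sorted sequences first reaches half of the total length.
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            n50 = length
            break

    joined = "".join(seqs).upper()
    n_count = joined.count("N")
    gc_count = joined.count("G") + joined.count("C")
    called = total - n_count  # assumption: everything that is not N is A/C/G/T

    return {
        "number": len(lengths),
        "total": total,
        "largest": lengths[0],
        "n50": n50,
        "gc_percent": 100 * gc_count / called if called else 0.0,
        "ns_per_100kb": 100_000 * n_count / total,
    }


# Toy sequences (made up for illustration, not from any assembly).
toy = ["ACGT" * 10, "GGCC" * 5 + "NNNN", "AT" * 3]
stats = assembly_stats(toy)
print(stats)
```

For a real assembly the sequences would be read from a FASTA file and the lengths reported in kb or Mb.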
Fig. 3.—Completeness of the Montipora capitata genome assembly (first column) assessed by the recovery of 978 metazoan benchmarking genes using BUSCO. For comparison, previously published coral genome assemblies were included in the analysis (see table 1; “RU” designates the Montipora capitata genome assembly by Shumaker et al. [2019]). Columns indicate the percentage of genes identified as single-copy (S), duplicated (D), fragmented (F), and missing (M).
Table 2.—Repetitive Elements Identified in the Montipora capitata Genome Assembly
| | Number | Total Length (Mb) | Fraction of Assembly (%) |
|---|---|---|---|
| Tandem repeats | 160,985 | 16.5 | 2.7 |
| Interspersed repeats | 1,028,006 | 257.4 | 41.9 |
| DNA elements | 31,667 | 11.6 | 1.9 |
| LTR elements | 14,753 | 10.0 | 1.6 |
| Non-LTR elements | 110,728 | 42.6 | 6.9 |
| Unclassified | 870,858 | 193.2 | 31.4 |
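The fractions in this table can be cross-checked against the lengths with a few lines of arithmetic. The sketch below assumes the denominator is the 615 Mb scaffold-level assembly size quoted in the abstract (the table itself does not state it) and also checks that the four interspersed repeat classes add up to the interspersed total.

```python
ASSEMBLY_MB = 615  # assumption: scaffold assembly size from the abstract

repeats_mb = {
    "Tandem repeats": 16.5,
    "Interspersed repeats": 257.4,
    "DNA elements": 11.6,
    "LTR elements": 10.0,
    "Non-LTR elements": 42.6,
    "Unclassified": 193.2,
}

# Fraction of the assembly covered by each class, rounded as in the table.
fractions = {
    name: round(100 * mb / ASSEMBLY_MB, 1) for name, mb in repeats_mb.items()
}

# The four classes listed under interspersed repeats should sum to its total.
subclasses = ["DNA elements", "LTR elements", "Non-LTR elements", "Unclassified"]
subclass_total = sum(repeats_mb[name] for name in subclasses)

for name, pct in fractions.items():
    print(f"{name}: {pct}%")
print(f"sum of interspersed classes: {subclass_total:.1f} Mb")
```

Under that assumption the computed percentages reproduce the reported column, and the class lengths sum to the 257.4 Mb interspersed total.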
Table 3.—Protein-Coding Gene Features Annotated in the Genomes of Montipora capitata and Other Coral Species
| | Montipora capitata (this study) | | | | | |
|---|---|---|---|---|---|---|
| No. genes | 36,691 | 63,227 | 26,060 | 25,916 | 19,935 | 24,833 |
| Gene length (bp) | 1,408 | 1,722 | 4,208 | 5,115 | 4,626 | 4,944 |
| CDS length (bp) | 807 | 831 | 1,032 | 1,068 | 1,173 | 1,167 |
| Exon length (bp) | 151 | 134 | 135 | 141 | 127 | 131 |
| Intron length (bp) | 793 | 879 | 585 | 583 | 473 | 572 |
| No. exons/gene | 2 | 2 | 4 | 4 | 5 | 5 |
| CDS total (Mb) | 40.5 | 73.5 | 36.4 | 39.6 | 32.7 | 41.0 |
Note.—Statistics were calculated from GFF files (present study and M. capitata, Shumaker et al. 2019) or taken from GenBank annotation reports (release 100, for corresponding assemblies see table 1). Montipora capitata gene models (first column) include coding exons only, so gene and exon length estimates are not directly comparable with annotations incorporating UTRs. Length estimates and number of exons per gene are median values.
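A median CDS length of the kind reported above can be derived from a GFF file by summing the CDS segments belonging to each gene model and taking the median of those sums. The sketch below is one hedged way to do this, not the authors' script: it assumes GFF3-style tab-separated lines in which each `CDS` feature carries a `Parent=` attribute naming its gene model, and the function name `median_cds_length` is hypothetical. Real annotation files vary in attribute conventions and may need adjusted parsing.

```python
import statistics
from collections import defaultdict


def median_cds_length(gff_lines):
    """Median summed CDS length per parent feature from GFF3-style lines."""
    cds_len = defaultdict(int)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        start, end = int(cols[3]), int(cols[4])
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        parent = attrs.get("Parent")
        if parent is None:
            continue
        # GFF coordinates are 1-based and inclusive.
        cds_len[parent] += end - start + 1
    return statistics.median(cds_len.values())


# Toy annotation (made-up records for illustration).
toy_gff = [
    "##gff-version 3",
    "chr1\tsrc\tCDS\t1\t300\t.\t+\t0\tID=c1;Parent=g1",
    "chr1\tsrc\tCDS\t401\t700\t.\t+\t0\tID=c2;Parent=g1",
    "chr1\tsrc\tCDS\t1001\t1900\t.\t-\t0\tID=c3;Parent=g2",
    "chr1\tsrc\tCDS\t2010\t2159\t.\t+\t0\tID=c4;Parent=g3",
]
median_len = median_cds_length(toy_gff)
print(median_len)
```

In the toy data the three gene models have CDS lengths of 600, 900, and 150 bp, so the median is 600 bp.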