Darlon V Lantican, Susan R Strickler, Alma O Canama, Roanne R Gardoce, Lukas A Mueller, Hayde F Galvez.
Abstract
We report the first whole genome sequence (WGS) assembly and annotation of a dwarf coconut variety, 'Catigan Green Dwarf' (CATD). The genome sequence was generated on the PacBio SMRT sequencing platform at 15X coverage of the expected genome size of 2.15 Gbp and was corrected using an assembly of 50X Illumina paired-end MiSeq reads of the same genome. The draft genome was then improved with Dovetail Chicago sequencing, yielding a scaffold assembly with a total length of 2.1 Gbp in 7,998 scaffolds and an N50 of 570,487 bp. The final assembly covers about 97.6% of the estimated genome size of 'CATD', based on homozygous k-mer peak analysis. A total of 34,958 high-confidence gene models were predicted and functionally annotated, including genes associated with economically important traits such as pest/disease resistance, drought tolerance, and coconut oil biosynthesis, as well as putative transcription factors. The assembled genome was used to infer evolutionary relationships within the palm family based on genomic variation and synteny of coding gene sequences. The data show that at least three rounds of whole genome duplication occurred and are shared by the Arecaceae members examined. A total of 7,139 unique SSR markers were designed as a resource for marker-based breeding. In addition, we discovered 58,503 variants in coconut by aligning Hainan Tall (HAT) WGS reads to the non-repetitive regions of the assembled CATD genome. The gene markers and genome-wide SSR markers established here will facilitate the development of varieties with resilience to climate change, resistance to pests and diseases, and improved oil yield and quality.
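The completeness figure compares assembly length with a genome size estimated from a k-mer depth histogram. A minimal sketch of the usual arithmetic follows: genome size ≈ total k-mers outside the error tail ÷ depth of the homozygous peak. It assumes a two-column histogram (depth, number of distinct k-mers), for example the output of `jellyfish histo`, and assumes the tallest peak after the error valley is the homozygous peak, which is reasonable for a low-heterozygosity genome. The function names and the valley heuristic are illustrative; the authors' exact tool and settings are not stated here.

```python
def find_error_valley(histogram):
    """Return the depth at the first local minimum of a k-mer histogram.

    Depths at or below this value are treated as sequencing-error k-mers.
    histogram: dict mapping k-mer depth -> number of distinct k-mers at that depth.
    """
    depths = sorted(histogram)
    for prev, cur in zip(depths, depths[1:]):
        if histogram[cur] > histogram[prev]:  # counts start rising again
            return prev
    raise ValueError("no peak found after the error tail")


def estimate_genome_size(histogram):
    """Estimate genome size as total k-mers / homozygous peak depth.

    Assumes the tallest peak above the error valley is the homozygous peak.
    """
    valley = find_error_valley(histogram)
    solid = {d: n for d, n in histogram.items() if d > valley}
    peak_depth = max(solid, key=solid.get)
    total_kmers = sum(depth * count for depth, count in solid.items())
    return total_kmers / peak_depth


# Toy histogram: an error tail at depths 1-3 and a main peak at depth 10.
toy = {1: 1000, 2: 200, 3: 50, 8: 100, 9: 300, 10: 500, 11: 300, 12: 100}
print(estimate_genome_size(toy))  # 1300.0
```

Completeness is then the assembly length divided by this estimate.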
Keywords: Cocos nucifera L.; Dovetail Chicago sequencing; Illumina MiSeq sequencing; PacBio SMRT sequencing; SSR and SNP markers; dwarf coconut; genome assembly; hybrid assembly
Year: 2019 PMID: 31167834 PMCID: PMC6686914 DOI: 10.1534/g3.119.400215
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Statistical summary of the ‘Catigan Green Dwarf’ (CATD) coconut assembly using various sequencing technologies and corresponding bioinformatics pipelines
| PARAMETERS | SPARSE (ILLUMINA MISEQ) | SPARSE + DBG2OLC (ILLUMINA MISEQ + PACBIO SMRT) | HIRISE PIPELINE + PBJELLY (DRAFT ASSEMBLY + DOVETAIL CHICAGO) |
|---|---|---|---|
| Genome Coverage | 73.9% | 88.3% | 97.6% |
| Sequence Count | 482,724 | 25,020 | 7,998 |
| Total Length | 1.59 Gbp | 1.9 Gbp | 2.102 Gbp |
| N50 | 5,247 bp | 119 kbp | 570,487 bp |
| Longest Sequence | 57,454 bp | 1,725,761 bp | 8,779,653 bp |
| Shortest Sequence | 801 bp | 906 bp | 1,912 bp |
| Average Length | 3,295.14 bp | 76,510 bp | 570,487 bp |
| GC Content | — | — | 37.64% |
| N Content | — | — | 0.285% |
| Number of Gaps | — | — | 12,106 |
| Complete BUSCOs (assembly) | — | — | 1,322 (91.8%) |
| Alignment Rate (‘CATD’ Illumina MiSeq WGS reads) | — | — | 96.96% |
| Alignment Rate (Quality-trimmed RNAseq reads - SRR1173229) | — | — | 95.7% |
| Number of gene models | — | — | 34,958 |
| Average gene length | — | — | 7,724.72 bp |
| Average exon length | — | — | 267.36 bp |
| Average intron length | — | — | 1,448.73 bp |
| Average number of exons per gene | — | — | 5.34 |
| Average number of introns per gene | — | — | 4.34 |
| Average protein length | — | — | 373.18 aa |
| Complete BUSCOs (predicted gene models) | — | — | 85.3% |
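N50, longest, shortest, and average length in the table are all simple functions of the list of sequence lengths. A minimal sketch, assuming the lengths have already been read from the assembly FASTA (the function name is illustrative, not part of the authors' pipeline). N50 is the length L such that sequences of length ≥ L together hold at least half the total assembly length; the average is total length ÷ sequence count. For example, 1.59 Gbp over 482,724 sequences gives about 3,294 bp, matching the SPARSE column, and 2.102 Gbp over 7,998 scaffolds gives about 262.8 kbp.

```python
def assembly_stats(lengths):
    """Summarize an assembly from its sequence lengths (bp).

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no sequences given")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # crossed the halfway point
            n50 = length
            break
    return {
        "count": len(ordered),
        "total": total,
        "n50": n50,
        "longest": ordered[0],
        "shortest": ordered[-1],
        "mean": total / len(ordered),
    }


# n50 = 400, mean = 300.0 for this toy assembly
print(assembly_stats([100, 200, 300, 400, 500]))
```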
Figure 1. Insertion time distributions of intact LTR retrotransposons in the ‘CATD’ coconut genome, estimated using the Jukes-Cantor model (Jukes and Cantor 1969) for noncoding sequences and a mutation rate of 1.3 × 10⁻⁸ mutations per site per year (Ma and Bennetzen 2004).
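The caption names the ingredients of the dating but not the formula. A minimal sketch, assuming the conventional approach for intact LTR retrotransposons: the proportion p of differing sites between the 5' and 3' LTRs of one element is corrected with Jukes-Cantor, K = -3/4 ln(1 - 4p/3), and the insertion time is T = K / (2r), with r = 1.3 × 10⁻⁸ from the caption. The factor of 2 reflects the usual assumption that the two LTRs are identical at insertion and then mutate independently. Function names are hypothetical.

```python
import math

# Neutral rate from the Figure 1 caption (Ma and Bennetzen 2004).
MUTATION_RATE = 1.3e-8  # substitutions per site per year


def jukes_cantor_distance(p):
    """Correct an observed proportion of differing sites p for multiple hits."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ltr_insertion_time(mismatches, aligned_sites, rate=MUTATION_RATE):
    """Estimate years since insertion from the 5'/3' LTR alignment of one element.

    Both LTRs start identical and diverge independently, hence the factor of 2.
    """
    p = mismatches / aligned_sites
    return jukes_cantor_distance(p) / (2.0 * rate)


# Example: 26 differences over 1,000 aligned LTR sites.
print(f"{ltr_insertion_time(26, 1000) / 1e6:.2f} million years")  # 1.02 million years
```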
Figure 2. Syntenic dotplots between dwarf coconut var. Catigan Green Dwarf (CATD) and tall coconut var. Hainan Tall (a), CATD and date palm (P. dactylifera; b), and CATD and oil palm (E. guineensis; c). Dotplot axes are in nucleotides, with a square axis relationship. Scaffolds on the y-axis of (a) and (b) are arranged identically, in order of scaffold number. Scaffolds on the y-axis of (c) are sorted by Syntenic Path Assembly (SPA) using the oil palm pseudomolecules as reference. The plots were generated with the legacy version of the CoGe SynMap tool (Lyons et al.).
Figure 3. Histogram of synonymous substitution rates (Ks) for syntenic gene pairs between dwarf coconut and other closely related sequenced genomes. Syntenic gene pairs were identified with DAGChainer and colored by their synonymous substitution rate as calculated by CodeML within the CoGe SynMap tool (Lyons et al.). Syntenic gene pairs derived from speciation (orthologs) and from shared whole genome duplication events (α, β, and γ) are also labeled.
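CodeML estimates Ks by maximum likelihood, which is not reproduced here. Once Ks values have been exported for the syntenic pairs, the histogram itself is a binning step. A minimal sketch, assuming a log10 scale (a common choice for Ks plots) and illustrative cut-offs that drop near-zero and likely saturated values; the thresholds and function name are assumptions, not SynMap defaults. Peaks in the resulting counts correspond to the kinds of speciation and α, β, γ duplication signals labeled in the figure.

```python
import math
from collections import Counter


def log10_ks_histogram(ks_values, bin_width=0.1, min_ks=1e-4, max_ks=5.0):
    """Bin Ks values on a log10 scale.

    Values <= min_ks (near-identical pairs, where log10 is unstable) and
    values > max_ks (likely saturated) are skipped; both cut-offs are
    illustrative assumptions. Returns sorted (bin_start_log10, count) pairs.
    """
    counts = Counter()
    for ks in ks_values:
        if min_ks < ks <= max_ks:
            counts[math.floor(math.log10(ks) / bin_width)] += 1
    return [(round(i * bin_width, 6), n) for i, n in sorted(counts.items())]


print(log10_ks_histogram([0.12, 0.125, 1.5, 0.0, 10.0]))  # [(-1.0, 2), (0.1, 1)]
```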
Figure 4. Maximum likelihood phylogenetic tree generated with IQ-TREE from the sequence alignment of all predicted resistance gene analogs (RGAs) characterized in the ‘CATD’ genome assembly. The JTT amino acid substitution model (Jones et al.) with empirical amino acid frequencies (+F) and FreeRate (+R9) rate heterogeneity across sites (Yang 1995; Soubrier et al.) was used to build the tree, and branch support was assessed with 1,000 replicates of ultrafast bootstrapping (Hoang et al.) and SH-aLRT (Guindon et al. 2010) tests. Red branches denote TM-CC, blue NBS-containing, and green TX and TN RGAs.
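The caption's settings map onto IQ-TREE command-line options. The sketch below only assembles the command. It assumes IQ-TREE 1.6-style flags (`-s` alignment, `-m` model, `-bb` ultrafast bootstrap replicates, `-alrt` SH-aLRT replicates), 1,000 replicates for both support tests, and a hypothetical alignment file name. The authors' exact invocation is not given in the caption.

```python
import os
import shlex
import shutil
import subprocess


def build_iqtree_command(alignment, model="JTT+F+R9", ufboot=1000, alrt=1000,
                         executable="iqtree"):
    """Return an IQ-TREE 1.6-style argument list for the Figure 4 settings.

    -s     input alignment
    -m     JTT + empirical amino acid frequencies (+F) + FreeRate, 9 categories (+R9)
    -bb    ultrafast bootstrap replicates
    -alrt  SH-aLRT replicates
    """
    return [executable, "-s", alignment, "-m", model,
            "-bb", str(ufboot), "-alrt", str(alrt)]


if __name__ == "__main__":
    aln = "rga_alignment.fasta"  # hypothetical file name
    cmd = build_iqtree_command(aln)
    print(shlex.join(cmd))  # iqtree -s rga_alignment.fasta -m JTT+F+R9 -bb 1000 -alrt 1000
    # Only run if IQ-TREE is installed and the alignment exists.
    if shutil.which(cmd[0]) and os.path.exists(aln):
        subprocess.run(cmd, check=True)
```

With IQ-TREE 2 the executable is usually `iqtree2`, and the documented ultrafast bootstrap flag is `-B`.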
Summary, characteristics, and distribution of sequence variations between ‘Hainan Tall’ and ‘Catigan Green Dwarf’ (CATD) genomes. Variant locations are given relative to the ‘CATD’ assembly, which served as the reference for read mapping.
| Variants | Non-repeat region (intergenic + genic) | Genic region (intron + exon) | Exonic region |
|---|---|---|---|
| Number of SNPs | 57,872 | 21,066 | 5,552 |
| Number of Transversions | 40,233 | 7,192 | 1,664 |
| Number of Transitions | 17,639 | 13,875 | 3,888 |
| Ts/Tv ratio | 2.2809 | 1.93 | 2.34 |
| Number of InDels (1-6 bp) | 631 | 143 | 48 |
| Single-base InDels | 392 | 70 | 17 |
| Dinucleotide (2-bp) InDels | 128 | 32 | 12 |
| 3- to 6-bp InDels | 111 | 41 | 19 |
| Total Number of Variants | 58,503 | 21,209 | 5,600 |
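Transitions are purine-to-purine (A/G) or pyrimidine-to-pyrimidine (C/T) substitutions; all other single-base changes are transversions, and Ts/Tv is transitions ÷ transversions. A minimal sketch of that classification, assuming SNPs are given as (ref, alt) base pairs; function names are illustrative. Applied to the table's own counts, transitions ÷ transversions reproduces the genic (13,875 ÷ 7,192 ≈ 1.93) and exonic (3,888 ÷ 1,664 ≈ 2.34) ratios, while the non-repeat entry of 2.2809 equals 40,233 ÷ 17,639, the listed transversions over transitions.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def is_transition(ref, alt):
    """True for A<->G or C<->T substitutions, False for transversions."""
    ref, alt = ref.upper(), alt.upper()
    pair = {ref, alt}
    if ref == alt or not pair <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv(snps):
    """Count transitions and transversions in (ref, alt) pairs; return (ts, tv, ts/tv)."""
    ts = sum(is_transition(ref, alt) for ref, alt in snps)
    tv = len(snps) - ts
    return ts, tv, (ts / tv if tv else float("inf"))


print(ts_tv([("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]))  # (2, 2, 1.0)

# Ratios recomputed from the table's counts (transitions / transversions).
for region, ts_count, tv_count in [("Genic", 13_875, 7_192), ("Exonic", 3_888, 1_664)]:
    print(region, round(ts_count / tv_count, 2))
```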
Figure 5. Occurrence of sequence variations in the non-repeat region of coconut, based on alignment of ‘HAT’ WGS reads to the assembled ‘CATD’ genome. (a) Distribution of the types of coconut SNPs (transversions and transitions) detected; (b) frequency of occurrence of each SNP type and length (bp) of the InDels identified. Negative values denote deletions and positive values insertions relative to the assembled ‘CATD’ genome sequence.
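In VCF-style records an InDel's signed length is len(ALT) - len(REF), which gives the sign convention of panel (b): negative for deletions and positive for insertions relative to ‘CATD’. A minimal sketch, assuming biallelic records whose alleles share a leading anchor base (as VCF uses) and grouping sizes the way the variant table's rows do (1 bp, 2 bp, 3 to 6 bp). Function names are illustrative.

```python
from collections import Counter


def signed_indel_length(ref, alt):
    """len(alt) - len(ref): negative for deletions, positive for insertions."""
    if len(ref) == len(alt):
        raise ValueError(f"{ref}>{alt} is not an indel")
    return len(alt) - len(ref)


def indel_size_classes(allele_pairs, max_len=6):
    """Tally indels into 1 bp, 2 bp and 3..max_len bp classes; longer ones are skipped."""
    counts = Counter()
    for ref, alt in allele_pairs:
        size = abs(signed_indel_length(ref, alt))
        if size == 1:
            counts["1 bp"] += 1
        elif size == 2:
            counts["2 bp"] += 1
        elif size <= max_len:
            counts[f"3-{max_len} bp"] += 1
    return counts


pairs = [("A", "AT"), ("ATG", "A"), ("A", "ACGTA"), ("AC", "A")]
print([signed_indel_length(r, a) for r, a in pairs])  # [1, -2, 4, -1]
print(indel_size_classes(pairs))
```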