Wirulda Pootakham1, Chutima Sonthirod1, Chaiwat Naktang1, Wasitthee Kongkachana1, Sonicha U-Thoomporn1, Phakamas Phetchawang1, Chatree Maknual2, Darunee Jiumjamrassil2, Tamanai Pravinvongvuthi2, Sithichoke Tangphatsornruang1.
Abstract
Mangroves are of great ecological and economic importance, providing shelter for a wide range of species and nursery habitats for commercially important marine species. Ceriops zippeliana (yellow mangrove) belongs to the Rhizophoraceae family and is widely distributed in tropical and subtropical coastal communities. In this study, we present a high-quality assembly of the C. zippeliana genome. We constructed an initial draft assembly of 240,139,412 bases with an N50 contig length of 564,761 bases using the 10x Genomics linked-read technology. This assembly was further scaffolded with RagTag using a chromosome-scale assembly of a closely related Ceriops species as a reference. The final assembly contained 243,228,612 bases with an N50 scaffold length of 10,559,178 bases. The size of the final assembly was close to the estimates obtained from DNA flow cytometry (248 Mb) and k-mer distribution analysis (246 Mb). We predicted a total of 23,474 gene models and 21,724 protein-coding genes in the C. zippeliana genome, of which 16,002 were assigned gene ontology terms. We recovered 97.1% of the highly conserved orthologs based on the Benchmarking Universal Single-Copy Orthologs analysis. The phylogenetic analysis based on single-copy orthologous genes indicated that C. zippeliana and Ceriops tagal diverged approximately 10.2 million years ago (MYA), and that their last common ancestor and Kandelia obovata diverged approximately 29.9 MYA. The high-quality assembly of C. zippeliana presented in this work provides a useful genomic resource for studying mangroves' unique adaptations to stressful intertidal habitats and for developing sustainable mangrove forest restoration and conservation programs.
Keywords: Ceriops zippeliana; 10x Genomics; RagTag; chromosome-scale genome assembly; mangrove
Year: 2022 PMID: 35106563 PMCID: PMC8982413 DOI: 10.1093/g3journal/jkac025
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Fig. 1. Morphology and distribution of C. zippeliana. a) Leaves, b) Fruits, c) Trunk, and d) Distribution map of C. zippeliana in Thailand.
Ceriops zippeliana genome assembly statistics.
| | 10x Genomics | 10x Genomics + RagTag scaffolding |
|---|---|---|
| N50 contig/scaffold size (bases) | 564,761 | 10,559,178 |
| L50 contig/scaffold number | 121 | 10 |
| N75 contig/scaffold size (bases) | 172,342 | 8,031,761 |
| L75 contig/scaffold number | 291 | 17 |
| N90 contig/scaffold size (bases) | 2,565 | 2,690 |
| L90 contig/scaffold number | 4,376 | 3,243 |
| Assembly size (bases) | 240,139,412 | 243,228,612 |
| Number of scaffolds | 27,067 | 25,704 |
| Number of scaffolds ≥100 kb | 358 | 29 |
| Number of scaffolds ≥1 Mb | 45 | 18 |
| Number of scaffolds ≥10 Mb | 0 | 12 |
| Longest scaffold (bases) | 2,579,863 | 15,634,258 |
| %N | 1.62 | 1.66 |
| GC content (%) | 35.24 | 35.24 |
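
The Nx/Lx values in the table above follow the usual definition: sort the sequences from longest to shortest and walk down the list until the running total reaches x% of the assembly size; Nx is the length of the sequence at which this happens and Lx is how many sequences were needed. The following is a minimal sketch of that definition, not the tool used by the authors. The function name `nx_lx` and the toy lengths are illustrative assumptions.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a collection of contig or scaffold lengths.

    fraction=0.5 gives N50/L50, 0.75 gives N75/L75, 0.9 gives N90/L90.
    """
    ordered = sorted(lengths, reverse=True)   # longest sequence first
    threshold = fraction * sum(ordered)       # bases that must be covered
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy lengths (hypothetical, not taken from the assembly); total = 300
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = nx_lx(toy_lengths, 0.5)
n90, l90 = nx_lx(toy_lengths, 0.9)
```

Read this way, the drop in L50 from 121 to 10 after RagTag scaffolding means that ten scaffolds, rather than 121 contigs, are enough to cover half of the assembly.
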
Fig. 2. Genome size estimation. a) 19-mer estimate of the genome size. The x-axis is depth coverage (X), and the y-axis is the total number of k-mers with a given frequency. b) Histograms of relative DNA contents obtained after analysis of nuclei isolated from young leaf tissues of (I) C. zippeliana and (II) A. thaliana (used as a reference standard).
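
A k-mer histogram such as the one in panel a) is commonly turned into a genome size estimate by dividing the total number of k-mer occurrences by the depth at the main (homozygous) peak. The sketch below illustrates only that simple relationship. It is an assumption that this resembles the authors' procedure; the function name `estimate_genome_size`, the `min_depth` error cutoff, and the toy histogram are all hypothetical.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing errors and ignored
    (an assumed cutoff, to be chosen by inspecting the real histogram).
    Returns (estimated_size, peak_depth).
    """
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no histogram bins at or above min_depth")
    peak_depth = max(usable, key=usable.get)            # main coverage peak
    total_kmers = sum(d * n for d, n in usable.items())  # k-mer occurrences
    return total_kmers / peak_depth, peak_depth


# Toy histogram (hypothetical values, not the C. zippeliana data)
toy_hist = {1: 1000, 2: 200, 8: 50, 9: 80, 10: 100, 11: 80, 12: 50}
size, peak = estimate_genome_size(toy_hist)
```

Dedicated tools fit a mixture model to the histogram instead of taking a single peak, so this sketch should be read as the underlying arithmetic only.
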
Fig. 3. Genomic landscape of C. zippeliana. (I) A physical map of 18 chromosomes numbered according to size (Mb scale). (II) Repeat density represented by the fraction of genomic regions covered by repetitive sequences in 250-kb windows. (III) Gene density represented by the number of genes in 250-kb windows. (IV) GC content represented by the percentage of G + C bases in 250-kb windows. (V) Syntenic regions in the genome are illustrated by connected lines.
Repeat contents in the C. zippeliana genome assembly.
| Types of repeats | Bases (Mb) | % of the assembly | % of total repeats |
|---|---|---|---|
| DNA transposons | 0.97 | 0.41 | 1.03 |
| Retrotransposons | | | |
| LINE | 6.74 | 2.81 | 7.15 |
| SINE | 0.14 | 0.06 | 0.15 |
| LTR: | 26.00 | 10.82 | 27.60 |
| LTR: | 13.23 | 5.50 | 14.04 |
| LTR: Others | 1.8 | 0.75 | 1.91 |
| Simple sequence repeats | 4.16 | 1.73 | 4.41 |
| Unclassified elements | 41.18 | 17.15 | 43.71 |
| Total | 94.22 | 39.23 | — |
Annotation statistics for C. zippeliana.
| | |
|---|---|
| Number of predicted gene models | 23,474 |
| Total gene length (Mb) | 70.26 |
| Average gene size (nt) | 2,993 |
| Average number of exons/gene | 5.415 |
| Total exon length (Mb) | 29.33 |
| Average exon length (nt) | 230.8 |
| GC content of exons (%) | 45.35 |
| Average number of introns/gene | 4.41 |
| Total intron length (Mb) | 40.95 |
| Average intron length (nt) | 395.2 |
| GC content of introns (%) | 34.85 |
Functional annotations of C. zippeliana protein-coding genes.
| Database | Number of genes annotated (% of all predicted genes) |
|---|---|
| NR | 21,724 (92.54%) |
| Swissprot | 16,978 (72.33%) |
| GO | 16,002 (68.17%) |
| EC | 6,953 (29.62%) |
| KEGG | 4,177 (17.79%) |
| Unannotated genes | 1,750 (7.46%) |
| Total (predicted gene models) | 23,474 |
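
The percentages in this table are each database's gene count divided by the 23,474 predicted gene models. The short check below recomputes them from the counts reported in the table; the variable names are illustrative.

```python
TOTAL_GENE_MODELS = 23_474  # predicted gene models, from the table

# Counts of annotated genes per database, copied from the table
annotated = {
    "NR": 21_724,
    "Swissprot": 16_978,
    "GO": 16_002,
    "EC": 6_953,
    "KEGG": 4_177,
}

# Percentage of all predicted gene models, rounded to two decimals
percentages = {
    db: round(100 * count / TOTAL_GENE_MODELS, 2)
    for db, count in annotated.items()
}

# The unannotated count in the table equals the gene models without an NR hit
unannotated = TOTAL_GENE_MODELS - annotated["NR"]
unannotated_pct = round(100 * unannotated / TOTAL_GENE_MODELS, 2)
```

The recomputed values match the table, and the 1,750 unannotated genes equal the total minus the NR-annotated genes, which suggests that every gene annotated by the other databases also has an NR hit.
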
Noncoding RNAs in the C. zippeliana genome.
| Type | Number | Length (nt) | Mean length (nt) |
|---|---|---|---|
| rRNA | 310 | 93,420 | 301.35 |
| microRNA | 3,536 | 345,228 | 97.63 |
| snRNA | 12,566 | 789,329 | 62.81 |
| snoRNA | 12,349 | 767,418 | 62.14 |
| splicing | 217 | 21,911 | 100.97 |
| tRNA | 404 | 29,208 | 72.30 |
| other ncRNA | 5,331 | 435,191 | 81.63 |
Fig. 4. Comparative analysis and phylogenetic tree of C. zippeliana and other plant species. a) A maximum-likelihood phylogenetic tree of six Rhizophoreae mangroves (indicated with asterisks) and other plant species based on single-copy orthologous protein sequences. Numbers at each node represent the estimated divergence time in MYA with gray bars showing the corresponding 95% equal-tail credibility intervals. The number of genes in expanded and contracted families is indicated in green and red, respectively. b) Distribution of 4DTv distances between orthologous genes (solid line) and paralogous genes (dotted line) in C. zippeliana, B. parviflora, K. obovata, and R. apiculata.
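
The 4DTv distance in panel b) is the transversion rate at fourfold degenerate third-codon positions, that is, sites where any nucleotide encodes the same amino acid. The sketch below computes the raw (uncorrected) value for one pair of codon-aligned, gap-free coding sequences under the standard genetic code. It is a simplified illustration and not the authors' pipeline: published analyses usually apply a substitution-model correction, and the function name `raw_4dtv` and the example sequences are hypothetical.

```python
# First two bases of the eight fourfold degenerate codon families
# in the standard genetic code (Leu, Val, Ser, Pro, Thr, Ala, Arg, Gly)
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}


def raw_4dtv(cds_a, cds_b):
    """Uncorrected 4DTv for two codon-aligned, gap-free coding sequences.

    Returns transversions / fourfold degenerate sites, or None if the
    pair shares no comparable fourfold degenerate site.
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be codon-aligned and equal length")
    sites = transversions = 0
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        # Both codons must belong to the same fourfold degenerate family
        if codon_a[:2] != codon_b[:2] or codon_a[:2] not in FOURFOLD_PREFIXES:
            continue
        base_a, base_b = codon_a[2], codon_b[2]
        if base_a not in "ACGT" or base_b not in "ACGT":
            continue  # skip ambiguous bases
        sites += 1
        # A transversion swaps a purine for a pyrimidine or vice versa
        if (base_a in PURINES) != (base_b in PURINES):
            transversions += 1
    return transversions / sites if sites else None


# Hypothetical aligned sequences: three fourfold sites, one transversion
example = raw_4dtv("GCTGGACTTATG", "GCAGGGCTCATG")
```

Computing this value for many ortholog or paralog pairs and plotting the distribution gives curves of the kind shown in the figure, where peaks mark speciation or duplication events.
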