Fan Jiang¹, Sen Wang¹, Hengchao Wang¹, Anqi Wang¹, Dong Xu¹, Hangwei Liu¹, Boyuan Yang¹, Lihua Yuan¹, Lihong Lei¹, Rong Chen¹, Weihua Li², Wei Fan¹.
Abstract
Ipomoea cairica is a perennial creeper that has been widely introduced as a garden ornamental across tropical, subtropical, and temperate regions. Because it grows extremely fast and spreads easily, it has been listed as an invasive species in many countries. Here, we constructed a chromosome-level reference genome of Ipomoea cairica using Pacific Biosciences HiFi and Hi-C sequencing, with an assembly size of 733.0 Mb, a contig N50 of 43.8 Mb, a scaffold N50 of 45.7 Mb, and a Benchmarking Universal Single-Copy Orthologs (BUSCO) complete rate of 98.0%. Hi-C scaffolding assigned 97.9% of the assembled contig sequence (by length) to 15 pseudo-chromosomes. Telomeric repeat analysis revealed that 7 of the 15 pseudo-chromosomes are gapless and telomere-to-telomere. The transposable element content of Ipomoea cairica is 73.4%, markedly higher than that of other Ipomoea species. A total of 38,115 protein-coding genes were predicted, with a BUSCO complete rate of 98.5%, comparable to that of the genome assembly, and 92.6% of the genes were functionally annotated. In addition, we identified 3,039 tRNA genes and 2,403 rRNA genes in the assembled genome. Phylogenetic analysis showed that Ipomoea cairica forms a clade with Ipomoea aquatica, and that the 2 species diverged 8.1 million years ago. Through comparative genome analysis, we reconfirmed that a whole-genome triplication event specific to the Convolvulaceae family occurred in the common ancestor of the genera Ipomoea and Cuscuta. This high-quality reference genome of Ipomoea cairica will greatly facilitate studies of the molecular mechanisms underlying its rapid growth and invasiveness.
Keywords: Convolvulaceae; Hi-C sequencing; Ipomoea cairica; PacBio sequencing; chromosome-level assembly
Year: 2022 PMID: 35894697 PMCID: PMC9434287 DOI: 10.1093/g3journal/jkac187
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.542
Fig. 1. Assessment of the genome assembly for I. cairica. a) Leaves and flowers of the sequenced I. cairica plant. b) Distribution of k-mer frequencies in the sequencing reads (k = 17). Peak 1 reflects the heterozygous regions, peak 2 the unique regions, and peak 3 the repetitive regions of the genome. c) Hi-C heatmap of the genome assembly. The genome was divided into 1-Mb nonoverlapping bins, valid Hi-C interaction links were counted between every pair of bins, and color represents log2(number of links). d) View of the pseudo-chromosomes. Thick lines represent contigs, and thin lines represent the links between 2 contigs. Chromosome ends assembled with telomere-specific repeats (AAACCCT) are highlighted with solid circles.
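The k-mer spectrum in panel b and the binned contact map in panel c can both be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: `kmer_spectrum`, `hic_bin_matrix`, and `log2_matrix` are hypothetical helper names, k-mers are counted in canonical form (a common convention, assumed here), and Hi-C read pairs are assumed to be already filtered to valid links and placed on one shared coordinate axis.

```python
import math
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = frozenset("ACGT")


def kmer_spectrum(reads, k=17):
    """Map multiplicity -> number of distinct canonical k-mers with that count.

    A canonical k-mer is the lexicographically smaller of a k-mer and its
    reverse complement; k-mers containing any non-ACGT character are skipped.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if not set(kmer) <= _BASES:
                continue
            rc = kmer.translate(_COMPLEMENT)[::-1]
            counts[min(kmer, rc)] += 1
    # Histogram of the counts: multiplicity (x-axis) vs. distinct k-mers (y-axis).
    return Counter(counts.values())


def hic_bin_matrix(links, genome_len, bin_size=1_000_000):
    """Count valid Hi-C links between every pair of fixed-size bins.

    `links` holds (pos1, pos2) pairs in one shared coordinate system
    (assumption: pseudo-chromosomes laid end to end).
    """
    n_bins = math.ceil(genome_len / bin_size)
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in links:
        b1, b2 = pos1 // bin_size, pos2 // bin_size
        matrix[b1][b2] += 1
        if b1 != b2:  # mirror off-diagonal links so the matrix stays symmetric
            matrix[b2][b1] += 1
    return matrix


def log2_matrix(matrix):
    """Heatmap color values: log2(links); empty bin pairs become NaN."""
    return [[math.log2(c) if c > 0 else math.nan for c in row] for row in matrix]
```

With 1-Mb bins, a link between positions 100 and 1,500,000 lands in cell (0, 1) and its mirror (1, 0), while both ends of a short-range link fall on the diagonal.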
Summary of the genomic sequencing data for I. cairica.
| Type | Sequencing platform | Read number | Base number (Gb) | Read length (bp) | Sequencing depth (x) |
|---|---|---|---|---|---|
| PacBio | PacBio Sequel II | 4,249,439 | 53.9 | 13,000 (N50) | 74 |
| Illumina | Illumina NovaSeq 6000 | 190,729,545 | 57.2 | 150 | 78 |
| Hi-C | Illumina NovaSeq 6000 | 299,790,699 | 89.9 | 150 | 123 |
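The sequencing depth column is consistent with dividing each base number by the assembled size (733,045,748 bp, the scaffold total in the assembly statistics). The sketch below reproduces 74×, 78×, and 123× under that assumption; the authors may instead have used an estimated genome size, and `sequencing_depth` is a hypothetical helper.

```python
# Assumption: depth = sequenced bases / assembled genome size (scaffold total).
ASSEMBLY_SIZE_BP = 733_045_748

# Base numbers (Gb) copied from the sequencing summary table.
BASES_GB = {"PacBio": 53.9, "Illumina": 57.2, "Hi-C": 89.9}


def sequencing_depth(base_number_gb, genome_size_bp=ASSEMBLY_SIZE_BP):
    """Average fold coverage: total sequenced bases divided by genome size."""
    return base_number_gb * 1e9 / genome_size_bp


# Rounded to whole-fold coverage, as in the table.
depths = {platform: round(sequencing_depth(gb)) for platform, gb in BASES_GB.items()}
```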
Statistics of the genome assembly for I. cairica.
| Genome assembly | Contigs | Scaffolds |
|---|---|---|
| Total length (bp) | 733,042,748 | 733,045,748 |
| Total number | 78 | 75 |
| Maximum (bp) | 65,792,717 | 65,792,717 |
| Minimum (bp) | 50,179 | 50,179 |
| N50 (bp) | 43,753,511 | 45,705,626 |
| N60 (bp) | 42,974,940 | 44,359,965 |
| N70 (bp) | 41,122,154 | 42,974,940 |
| N80 (bp) | 36,554,351 | 42,688,284 |
| N90 (bp) | 23,457,119 | 41,122,154 |
| BUSCO complete rate (%) | 98.0 | 98.0 |
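Nx values summarize contiguity: sort the sequences from longest to shortest and report the length at which the running total first reaches x% of the assembly. A minimal Python sketch of that definition follows; `nx` is a hypothetical helper, and reaching exactly x% counts as crossing the threshold (`>=`), one common convention.

```python
def nx(lengths, x=50):
    """Return the Nx length: sum lengths from longest to shortest and report
    the length at which the running total first reaches x% of the total."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    threshold = sum(lengths) * x / 100
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= threshold:
            return length
```

Applied to the 15 pseudo-chromosome lengths alone, `nx(lengths, 50)` also returns 45,705,626 (Chr07), because the unplaced scaffolds add only about 15.7 Mb to the total.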
Statistics of the pseudo-chromosomes.
| ID | Length (bp) | Contig no. | Gaps (bp) | G + C (%) |
|---|---|---|---|---|
| Chr01 | 65,792,717 | 1 | 0 | 36.61 |
| Chr02 | 58,719,702 | 2 | 1,000 | 35.81 |
| Chr03 | 57,176,273 | 1 | 0 | 36.61 |
| Chr04 | 51,515,914 | 2 | 1,000 | 37.23 |
| Chr05 | 48,098,784 | 1 | 0 | 36.11 |
| Chr06 | 47,573,477 | 1 | 0 | 36.77 |
| Chr07 | 45,705,626 | 1 | 0 | 36.35 |
| Chr08 | 44,607,093 | 1 | 0 | 36.17 |
| Chr09 | 44,359,965 | 1 | 0 | 36.19 |
| Chr10 | 43,753,511 | 1 | 0 | 36.43 |
| Chr11 | 42,974,940 | 1 | 0 | 36.99 |
| Chr12 | 42,688,284 | 2 | 1,000 | 35.88 |
| Chr13 | 42,619,119 | 1 | 0 | 36.64 |
| Chr14 | 41,122,154 | 1 | 0 | 36.42 |
| Chr15 | 40,670,372 | 1 | 0 | 36.04 |
| Total | 717,377,931 | 18 | 3,000 | 36.42 |
Each gap is represented by a run of 1,000 Ns.
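The contig count, gap length, and G + C columns, together with the telomere-to-telomere status reported in the abstract, can all be derived from each pseudo-chromosome sequence. The sketch below assumes gaps are runs of N (as the table note states), computes G + C over A/C/G/T bases only (an assumption, since the table does not give the denominator), and calls a chromosome telomere-to-telomere when it is gapless and both terminal windows contain enough copies of the AAACCCT repeat from Fig. 1 or its reverse complement. The window size and `min_copies` threshold are hypothetical parameters, not values from the source.

```python
import re

_GAP = re.compile(r"N+")  # a run of Ns marks one scaffolding gap
# Telomeric repeat shown in Fig. 1 plus its reverse complement.
TELOMERE_MOTIFS = ("AAACCCT", "AGGGTTT")


def chrom_stats(seq):
    """Length, contig count, total gap length, and G + C % of one sequence."""
    seq = seq.upper()
    gap_bp = sum(len(run) for run in _GAP.findall(seq))
    contigs = [piece for piece in _GAP.split(seq) if piece]
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return {
        "length": len(seq),
        "contigs": len(contigs),
        "gap_bp": gap_bp,
        "gc_pct": 100 * gc / acgt if acgt else 0.0,
    }


def has_telomere(window, min_copies=10):
    """True if the window holds at least `min_copies` telomeric repeat units."""
    return sum(window.count(m) for m in TELOMERE_MOTIFS) >= min_copies


def is_t2t(seq, window=10_000, min_copies=10):
    """Gapless, with telomeric repeats in both terminal windows."""
    seq = seq.upper()
    if "N" in seq:
        return False
    return (has_telomere(seq[:window], min_copies)
            and has_telomere(seq[-window:], min_copies))
```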
Statistics of transposable element content in various classes.
| TE class | Length (bp) | % of genome |
|---|---|---|
| LTR | 356,815,754 | 48.7 |
| DNA elements | 160,314,758 | 21.9 |
| MITE | 9,774,250 | 1.3 |
| LINE | 9,331,501 | 1.3 |
| SINE | 1,252,236 | 0.2 |
| RC | 392,761 | 0.1 |
| Others | 2,129 | 0.0 |
| Total | 537,883,389 | 73.4 |
LTR, long terminal repeat; MITE, miniature inverted-repeat transposable element; LINE, long interspersed nuclear element; SINE, short interspersed nuclear element; RC, rolling-circle transposable element.
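Each percentage is a class length divided by the genome size. Using the scaffold total of 733,045,748 bp as the denominator (an assumption; the source does not name it) reproduces every value in the table. The class lengths also sum exactly to the reported total, which suggests overlapping annotations were resolved before summing. `percent_of_genome` is a hypothetical helper.

```python
# Class lengths (bp) copied from the transposable element table.
TE_LENGTH_BP = {
    "LTR": 356_815_754,
    "DNA elements": 160_314_758,
    "MITE": 9_774_250,
    "LINE": 9_331_501,
    "SINE": 1_252_236,
    "RC": 392_761,
    "Others": 2_129,
}
ASSEMBLY_SIZE_BP = 733_045_748  # assumed denominator: scaffold total length


def percent_of_genome(length_bp, genome_bp=ASSEMBLY_SIZE_BP):
    """Share of the genome covered by `length_bp`, rounded to 0.1%."""
    return round(100 * length_bp / genome_bp, 1)


te_pct = {cls: percent_of_genome(bp) for cls, bp in TE_LENGTH_BP.items()}
te_total_bp = sum(TE_LENGTH_BP.values())
```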
Comparison of gene sets between I. cairica and other Ipomoea species.
| Gene prediction | I. cairica | | | | |
|---|---|---|---|---|---|
| Gene number | 38,115 | 29,606 | 35,151 | 32,301 | 31,426 |
| Average exon number | 4.68 | 5.17 | 4.90 | 4.95 | 5.03 |
| Average exon length (bp) | 236 | 233 | 273 | 248 | 248 |
| Total exon length (bp) | 42,156,936 | 35,698,410 | 47,058,378 | 39,785,558 | 39,374,739 |
| Average CDS length (bp) | 1,106 | 1,205 | 1,338 | 1,231 | 1,252 |
| BUSCO assessment (%) | | | | | |
| Complete | 98.5 | 95.9 | 99.3 | 95.6 | 96.6 |
| Complete and single copy | 93.3 | 88.8 | 94.2 | 90.6 | 92.1 |
| Complete and duplication | 5.2 | 7.1 | 5.1 | 5.0 | 4.5 |
| Fragmented | 0.9 | 2.2 | 0.1 | 2.0 | 1.4 |
| Missing | 0.6 | 1.9 | 0.6 | 2.4 | 2.0 |
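Two internal checks hold for this table. First, in all five columns the Average CDS length row equals Total exon length divided by Gene number, rounded down, which suggests (an inference, not stated in the source) that the exon totals count the coding exons of one representative transcript per gene. Second, the BUSCO rows satisfy Complete = single copy + duplicated and Complete + Fragmented + Missing = 100 within rounding. The sketch encodes both; only the first column is named (I. cairica), so the others are kept by position, and the helper names and tolerance are hypothetical.

```python
# Columns in table order; only the first (I. cairica) is named in the source.
GENE_NUMBER = [38_115, 29_606, 35_151, 32_301, 31_426]
TOTAL_EXON_BP = [42_156_936, 35_698_410, 47_058_378, 39_785_558, 39_374_739]
REPORTED_AVG_CDS = [1_106, 1_205, 1_338, 1_231, 1_252]

# Per column: complete, single copy, duplicated, fragmented, missing (%).
BUSCO = [
    (98.5, 93.3, 5.2, 0.9, 0.6),
    (95.9, 88.8, 7.1, 2.2, 1.9),
    (99.3, 94.2, 5.1, 0.1, 0.6),
    (95.6, 90.6, 5.0, 2.0, 2.4),
    (96.6, 92.1, 4.5, 1.4, 2.0),
]


def average_cds_length(total_exon_bp, gene_number):
    """Mean coding length per gene, rounded down (matches every column)."""
    return total_exon_bp // gene_number


def busco_consistent(complete, single, duplicated, fragmented, missing, tol=0.15):
    """Complete = single + duplicated and C + F + M = 100, within rounding."""
    return (abs(complete - (single + duplicated)) <= tol
            and abs(complete + fragmented + missing - 100) <= tol)
```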
Fig. 2. Genome evolution analysis for I. cairica. a) Phylogenetic tree constructed with RAxML from the concatenated protein sequences of 391 single-copy genes. The outgroup species, C. canephora, is not shown. Scale bar: substitutions per amino acid site. b) Divergence times estimated with MCMCtree from the PAML package, using a calibration of 79–91 MYA for the split between C. canephora and Solanales species. Node labels indicate the estimated divergence times.
Fig. 3. Comparisons between the genomes of I. cairica and other Ipomoea species. Pairwise alignments of the genome sequence of I. cairica against a) I. nil, b) I. trifida, c) I. triloba, and d) I. aquatica, performed using Minimap2 with the parameter “-x asm5”.
Fig. 4. Circular (a) and dot (b) plots showing intraspecies chromosome synteny in the I. cairica genome. Collinear blocks containing more than 10 syntenic gene pairs are plotted, and examples of the triplicated regions formed by the whole-genome triplication (WGT) event in the Ipomoea ancestor are highlighted with rectangles.
Fig. 5. Ks distributions of orthologous and paralogous genes for I. cairica and related species. a) Ks distributions within the genomes of I. cairica, I. aquatica, I. nil, I. trifida, and I. triloba. b) Ks distributions within the genomes of I. cairica, C. australis, and S. lycopersicum are shown as solid lines, and those between I. cairica and each of C. australis and S. lycopersicum are shown as dashed lines.
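Peaks in Ks distributions such as those in Fig. 5 are often converted into approximate ages with T = Ks / (2r), where r is the synonymous substitution rate per site per year. The caption gives no rate, so `rate_per_site_per_year` must be supplied by the reader. The bin width, the Ks cutoff, and the helper names are hypothetical choices for this sketch, and a histogram mode is only a rough stand-in for the mixture-model fits often used in practice.

```python
def ks_peak(ks_values, bin_width=0.05, max_ks=3.0):
    """Centre of the most populated Ks histogram bin.

    Values <= 0 or >= max_ks are dropped, since very high Ks tends to be saturated.
    """
    n_bins = round(max_ks / bin_width)
    counts = [0] * n_bins
    for ks in ks_values:
        if 0 < ks < max_ks:
            counts[min(int(ks / bin_width), n_bins - 1)] += 1
    if not any(counts):
        raise ValueError("no Ks values inside (0, max_ks)")
    best = max(range(n_bins), key=counts.__getitem__)
    return (best + 0.5) * bin_width


def ks_to_mya(ks, rate_per_site_per_year):
    """T = Ks / (2r) in million years; both lineages diverge after the split."""
    return ks / (2 * rate_per_site_per_year) / 1e6
```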