M Renee Bellinger, Erin M Datlof, Karen E Selph, Timothy J Gallaher, Matthew L Knope.
Abstract
The plant genus Bidens (Asteraceae or Compositae; Coreopsideae) is a species-rich and circumglobally distributed taxon. The 19 hexaploid species endemic to the Hawaiian Islands are considered an iconic example of adaptive radiation, and many of them are imperiled and of high conservation concern. Until now, no genomic resources were available for this genus, which may serve as a model system for understanding the evolutionary genomics of explosive plant diversification. Here, we present a high-quality reference genome for the Hawai'i Island endemic species B. hawaiensis A. Gray, reconstructed from long-read, high-fidelity sequences generated on a Pacific Biosciences Sequel II System. The haplotype-aware draft genome assembly consisted of ~6.67 gigabases (Gb), close to the holoploid genome size estimate of 7.56 Gb (±0.44 SD) determined by flow cytometry. After removal of alternate haplotigs and contaminant filtering, the consensus haploid reference genome comprised 15 904 contigs containing ~3.48 Gb, with a contig N50 value of 422 594 bp. The high interspersed repeat content of the genome, approximately 74%, along with hexaploid status, contributed to assembly fragmentation. Both the haplotype-aware and consensus haploid assemblies recovered >96% of Benchmarking Universal Single-Copy Orthologs. Yet, the removal of alternate haplotigs did not substantially reduce the proportion of duplicated benchmarking genes (~79% vs. ~68%). This reference genome will support future work on the speciation process during adaptive radiation, including resolving evolutionary relationships, determining the genomic basis of trait evolution, and supporting ongoing conservation efforts. © The American Genetic Association. 2022.
Keywords: Asteraceae; PacBio HiFi; flow cytometry; koʻokoʻolau; monoploid genomes; polyploid
Year: 2022 PMID: 35575077 PMCID: PMC9113482 DOI: 10.1093/jhered/esab077
Source DB: PubMed Journal: J Hered ISSN: 0022-1503 Impact factor: 2.679
A list of programs, versions, parameters, and datasets used to produce and select a Bidens hawaiensis reference genome assembly
| Purpose | Software | Settings and associated programs | Data input or result |
|---|---|---|---|
| Reference-free genome profiling | GenomeScope2 | K-mer = 17; kcov = 13 | HiFi and HiSeq sequences |
| Genome assembly | Canu v2.0 | correctedErrorRate = 0.015 batOptions = “-eg 0.01 -eM 0.01 -dg 6 -db 6 -dr 1 -ca 50 -cp 5” -pacbio-corrected | HiFi; |
| | Canu v2.0 | correctedErrorRate = 0.015 batOptions = “-eg 0.01 -eM 0.01 -dg 6 -db 6 -dr 1 -ca 50 -cp 5” contigFilter = “2 0 1.0 0.5 0” -pacbio-corrected | HiFi; |
| | Canu v2.0, hi_canu fork | correctedErrorRate = 0.015 batOptions = “-eg 0.01 -eM 0.01 -dg 6 -db 6 -dr 1 -ca 50 -cp 5” pacbio-hifi | HiFi; |
| Duplicate purge for haploid consensus | minimap2 for Purge_Dups | assembly-reference mapping parameter (-x) = asm20; self-self mapping parameter (-x) = asm5 | All assemblies |
| | Purge_Dups | purge_dups calcuts -l 2 -m 9 -u 30 | Asm1, Asm2 |
| | Purge_Dups | purge_dups calcuts -l 2 -m 6 -u 27 | HiCanu |
| Assembly metrics | QUAST v5.0.2 | Default settings | All assemblies |
| Assembly completeness | bwa-MEM 0.7.17, SAMtools v1.9 | Read mapping: bwa mem -M | All haploid consensus assemblies |
| | BUSCO 4.0.5 | Augustus v3.2.3, Blast+ v2.2.31, HMMER v3.2, OrthoDB odb10 eudicot database eudicots_odb10.2019-11-20 | All assemblies |
| Genome architecture | RepeatModeler 2.0 | -q, custom repeat library | All haploid consensus assemblies |
| | RepeatScout v1.0.06 | Default settings | All haploid consensus assemblies |
| | RECON v. 1.08 | Default settings | All haploid consensus assemblies |
| | RepeatMasker version open-4-1-1 | -s | All haploid consensus assemblies |
| | RMBlastN 2.10.0 | Dfam v3.3 (download date 2020-11-09) | All haploid consensus assemblies |
| Organelle contig identification | MitoFinder v 1.4 | --new-genes, arwen | HiCanu primary and alternate contigs |
| | mummer-4.0.0beta2 | --maxmatch -l 100 | HiCanu primary and alternate contigs |
Analyses included: reference-free genome profiling; draft genome assembly (Asm1, Asm2, and HiCanu, result indicated by bold) using Pacific Biosciences HiFi sequence data and 3 sets of assembly parameters; removal of alternate contigs to produce consensus haploid assemblies; assembly benchmarking; repetitive content analysis; and identification of organelle contigs (HiCanu assembly only).
Figure 1. A profile of Bidens hawaiensis. (a) A B. hawaiensis flower photographed from the plant used for flow cytometry and genome sequencing. (b) An example flow cytometry data plot from one replicate of a sample with both Lycopersicon esculentum and B. hawaiensis. Shown are the number of nuclei as a function of the DNA fluorescence (arbitrary units) from propidium iodide staining. Data for the mean peak position of each replicate sample (n = 10) are available from Supplementary Table 2. (c, d) K-mer spectrum analysis of HiFi data. (c) The 3 monoploid genomes of B. hawaiensis can be detected as peaks in the coverage ∗ frequency plot. (d) A plot showing that a dominant k-mer frequency peak occurs at approximately 12x coverage, and that numerous k-mers with modest coverage were miscategorized as sequencing errors.
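The k-mer spectrum in panels (c, d) is a histogram of how many distinct k-mers occur at each coverage. The sketch below is a toy illustration of that idea only, not the pipeline used in the study: the function names `canonical` and `kmer_spectrum` are hypothetical, the default k = 17 is taken from the settings table above, and collapsing each k-mer with its reverse complement is an assumption borrowed from common k-mer counting practice.

```python
from collections import Counter

# Translation table used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_spectrum(reads, k=17):
    """Map each coverage value to the number of distinct canonical k-mers seen that often."""
    kmer_counts = Counter()
    for read in reads:
        seq = read.upper()
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                kmer_counts[canonical(kmer)] += 1
    # Invert the table: coverage -> number of distinct k-mers at that coverage.
    return Counter(kmer_counts.values())


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "ACGTAC"]
    print(dict(kmer_spectrum(toy_reads, k=3)))
```

Profilers such as GenomeScope2 take a histogram of this kind as input. For real read sets a dedicated k-mer counter is the practical choice, since a Python dictionary will not scale to gigabase genomes.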
Descriptive statistics for 3 Bidens hawaiensis draft genome assemblies (Draft) and associated haploid consensus (Haploid) assemblies produced by duplicate purging
| Type | Assembly | Contigs | Total # assembled bases | N50 | BUSCO score (S+D) | Total # mapped short reads | Concordantly mapped short reads |
|---|---|---|---|---|---|---|---|
| Draft | Asm1 | 52 363 | 6 354 280 161 | 304 617 | 96.7% | n/a | n/a |
| | Asm2 | 54 599 | 6 401 020 890 | 298 996 | 96.3% | n/a | n/a |
| | HiCanu | 64 728 | 6 669 162 087 | 187 741 | 96.4% | n/a | n/a |
| Haploid | Asm1 | 10 514 | 2 748 720 504 | 707 276 | 95.6% | 28 048 504 (99.3%) | 16 684 169 (59.5%) |
| | Asm2 | 10 913 | 2 756 276 744 | 701 909 | 95.2% | 28 049 029 (99.3%) | 16 700 646 (59.5%) |
| | HiCanu | 15 958 | 3 478 010 306 | 421 857 | 96.6% | 27 967 161 (99.6%) | 18 436 392 (65.9%) |
Assembly quality metrics include numbers of contigs, assembly length, N50, BUSCO scores (single and duplicate genes, S+D), and for duplicate purged assemblies, the total numbers of mapped and concordantly mapped Illumina short reads.
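Two of these metrics are simple to reproduce by hand. The following is a minimal sketch, not the QUAST or SAMtools implementation: `n50` and `percent` are hypothetical helper names, and the contig lengths in the usage example are made up for illustration. N50 is the contig length at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the assembled bases. The concordant-read percentages in the table appear to be expressed relative to the total mapped reads, which the second helper checks against the tabulated counts.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; the N50 is the length of the
    contig at which the running total first reaches half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running_total = 0
    for length in lengths:
        running_total += length
        if running_total >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


def percent(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    # Made-up contig lengths, for illustration only.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))

    # Concordantly mapped reads as a share of total mapped reads (haploid HiCanu row).
    print(percent(18_436_392, 27_967_161))
```

Because N50 depends only on the length distribution, it rises when many short alternate haplotigs are removed, which is consistent with the higher N50 values of the haploid assemblies in the table.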
Figure 2. Comparative assessments of genome completeness and size. The genome assembly comparisons include the hexaploid koʻokoʻolau (Bidens hawaiensis; haploid consensus assembly), 12 diploid Asteraceae, and 5 polyploid eudicots (ploidy number in parentheses). (a) Benchmarking Universal Single-Copy Ortholog (BUSCO) analysis with the eudicot ODB10 database containing 2326 genes. Shown are the proportion and numbers of BUSCO gene recoveries categorized as complete and single-copy (S), complete and duplicated (D), fragmented (F), or missing (M). (b) Total number of assembled bases for each genome. See Supplementary Table 1 for scientific names, genome accessions (when available), and genome references.