Patrick Driguez, Salim Bougouffa, Karen Carty, Alexander Putra, Kamel Jabbari, Muppala Reddy, Richard Soppe, Ming Sin Cheung, Yoshinori Fukasawa, Luca Ermini.
Abstract
Currently, different sequencing platforms are used to generate plant genomes and no workflow has been properly developed to optimize time, cost, and assembly quality. We present LeafGo, a complete de novo plant genome workflow, that starts from tissue and produces genomes with modest laboratory and bioinformatic resources in approximately 7 days and using one long-read sequencing technology. LeafGo is optimized with ten different plant species, three of which are used to generate high-quality chromosome-level assemblies without any scaffolding technologies. Finally, we report the diploid genomes of Eucalyptus rudis and E. camaldulensis and the allotetraploid genome of Arachis hypogaea.
Keywords: Arachis; Chromosome-level draft genome; Eucalyptus; Genomic standardized workflow; High molecular weight DNA extraction; Long-read sequencing; Peanut
Year: 2021 PMID: 34479618 PMCID: PMC8414726 DOI: 10.1186/s13059-021-02475-z
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Plant long-read sequencing workflow. Asterisk indicates sequencing time depends on genome size and ploidy. Seven days completion time is based on a diploid organism with a haploid genome < 0.6–1 gigabases
Fig. 2 DNA extraction and long-read sequencing output from study plant species. Yield (A) and absorbance ratios (B) of extracted HMW DNA from ten study plant species. Subread N50 of CLR libraries on Sequel I and Sequel II for seven of the ten study plant species (C). The total throughput (D), subread N50 length (E), and Q20 yield (F) for HiFi libraries sequenced on Sequel II for five study plant species. The average for all data is plotted along the margin. CLR sequencing was not completed for A. hypogaea, B. rapa, and S. melongena
Genome assembly statistics for two Eucalyptus species and A. hypogaea. Assembly statistics were calculated with QUAST. CLR-based assemblies were polished for three cycles as detailed in “Methods”. Results are based on the purged assemblies (see “Methods”).
| Plant | Type | Size^d (Mb), ≥ 1 Mb | Size^d (Mb), total | No. contigs^d, ≥ 1 Mb | No. contigs^d, total | N50 (Mb)/L50 | N90 (Mb)/L90 | Longest contig (Mb) | Alternative size (Mb) |
|---|---|---|---|---|---|---|---|---|---|
| E. rudis^a | HiFi (~ 40×) | 531 | 549 | 26 | 331 | 36.0/7 | 7.3/15 | 61.8 | 425 |
| E. rudis^a | CLR (~ 50×) | 506 | 518 | 44 | 138 | 16.3/11 | 5.2/30 | 33.7 | 399 |
| E. camaldulensis^b | HiFi (~ 51×) | 525 | 532 | 14 | 149 | 41.4/5 | 23.2/12 | 69.1 | 520 |
| E. camaldulensis^b | CLR (~ 230×) | 516 | 523 | 28 | 77 | 29.3/7 | 8.5/19 | 58.1 | 570 |
| A. hypogaea^c | HiFi (~ 74×) | 2,564 | 2,623 | 114 | 1,417 | 42.3/22 | 10.37/69 | 90.3 | 51 |

^a E. rudis: genome size unknown; coverage was estimated from the assembly size
^b E. camaldulensis: reference genome size, 558.6 Mb (AC: GCA_014182705.1)
^c A. hypogaea: reference genome size, 2557 Mb (AC: GCF_003086295.2)
^d Metric calculated with (1) a minimum contig length cut-off of 1 Mb or (2) no cut-off
N50: the length of the shortest contig in the set of longest contigs whose cumulative length reaches 50% of the assembled size; L50: the number of contigs in that set. N90 and L90: the same quantities at 90% of the assembled size
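The N50/L50 and N90/L90 definitions in the table footnotes can be illustrated with a short calculation. The following is a minimal sketch, not the QUAST implementation: the function name `nx_lx`, its parameters, and the toy contig lengths are assumptions made for illustration only and are not data from the study. The optional `min_length` argument mimics the effect of a minimum contig length cut-off such as the 1 Mb threshold used in the table.

```python
from typing import List, Tuple


def nx_lx(contig_lengths: List[int], percent: int, min_length: int = 0) -> Tuple[int, int]:
    """Return (Nx, Lx) for contigs at least `min_length` long.

    Nx is the length of the shortest contig in the set of longest contigs
    whose cumulative length reaches `percent`% of the retained assembly size;
    Lx is the number of contigs in that set.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")

    # Apply the length cut-off, then walk from the longest contig down.
    kept = sorted((n for n in contig_lengths if n >= min_length), reverse=True)
    if not kept:
        raise ValueError("no contigs pass the length cut-off")

    total = sum(kept)
    running = 0
    for count, length in enumerate(kept, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length, count

    raise AssertionError("unreachable: the full set always reaches the target")


# Hypothetical contig lengths (arbitrary units), invented for this example.
toy_contigs = [40, 25, 15, 8, 7, 5]

n50, l50 = nx_lx(toy_contigs, 50)
n90, l90 = nx_lx(toy_contigs, 90)
print(f"N50/L50 = {n50}/{l50}, N90/L90 = {n90}/{l90}")

# Same metrics after discarding contigs shorter than 8 units.
n90_cut, l90_cut = nx_lx(toy_contigs, 90, min_length=8)
print(f"N90/L90 with cut-off = {n90_cut}/{l90_cut}")
```

Because the cut-off changes the total against which the 50% and 90% targets are measured, the same assembly can give different values with and without it, which is why the table reports size and contig counts both ways.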
Fig. 3 Chord diagrams of E. camaldulensis, E. rudis, and A. hypogaea de novo assemblies mapped against a reference genome. Alignment of E. camaldulensis (A), E. rudis (B), and A. hypogaea (C) HiFi assemblies against E. grandis (A and B) and A. hypogaea (C) reference genomes