Florian Trigodet, Karen Lolans, Emily Fogarty, Alon Shaiber, Hilary G Morrison, Luis Barreiro, Bana Jabri, A Murat Eren.
Abstract
By offering extremely long contiguous characterization of individual DNA molecules, rapidly emerging long-read sequencing strategies offer comprehensive insights into the organization of genetic information in genomes and metagenomes. However, successful long-read sequencing experiments demand high concentrations of highly purified DNA of high molecular weight (HMW), which limits the utility of established DNA extraction kits designed for short-read sequencing. The challenges associated with input DNA quality intensify further when working with complex environmental samples of low microbial biomass, which requires new protocols that are tailored to study metagenomes with long-read sequencing. Here, we use human tongue scrapings to benchmark six HMW DNA extraction strategies that are based on commercially available kits, phenol-chloroform (PC) extraction and agarose encasement followed by agarase digestion. A typical end goal of HMW DNA extractions is to obtain the longest possible reads during sequencing, which is often achieved by PC extractions, as demonstrated in sequencing of cultured cells. Yet our analyses that consider overall read-size distribution, assembly performance and the number of circularized elements found in sequencing results suggest that column-based kits with enzyme supplementation, rather than PC methods, may be more appropriate for long-read sequencing of metagenomes.
Keywords: high-molecular-weight DNA; long-read sequencing; metagenomics; nanopore
Year: 2022 PMID: 35068060 PMCID: PMC9177515 DOI: 10.1111/1755-0998.13588
Source DB: PubMed Journal: Mol Ecol Resour ISSN: 1755-098X Impact factor: 8.678
Summary of DNA concentration and quality metrics. Technical replicates are denoted as “XX_1” and “XX_2”
| Method | Metric | PB_1 | PB_2 | PE_1 | PE_2 | UC_1 | UC_2 | GT_1 | GT_2 | PC_1 | PC_2 | AE_1 | AE_2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Qubit | DNA concentration (µg ml–1) | 2.34 | 1.71 | 4.08 | 4.21 | 30.7 | 36.2 | 110 | 110 | 75.8 | 77.9 | 90.6 | 95.5 |
| Nanodrop | A260/A280 | 6.01 | 5.53 | 1.75 | 2.27 | 1.95 | 1.74 | 1.74 | 1.82 | 1.97 | 1.95 | 1.8 | 1.76 |
| Nanodrop | A260/A230 | 0.26 | 0.16 | 0.33 | 0.08 | 1.98 | 2.02 | 1.86 | 2.27 | 2.07 | 2.07 | 1.56 | 1.36 |
A260/A280: primary measure of nucleic acid purity. The expected value for “pure” DNA is ~1.8, while values >2 indicate RNA contamination.
A260/A230: secondary measure of nucleic acid purity, which evaluates residual chemical contamination (phenol, guanidine HCl, carbohydrate carryover). Expected values are in the range of 1.8–2.2. Significantly lower values indicate chemical contamination and are undesired.
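The purity thresholds in the footnotes above can be expressed as a simple check. The following is a minimal sketch, not part of the study's workflow: the function name `assess_purity` is hypothetical, and treating 2.0 and 1.8 as hard cut-offs is an assumption, since the footnotes only describe approximate expected values.

```python
def assess_purity(a260_a280, a260_a230):
    """Return a list of warnings for one sample's Nanodrop ratios.

    The cut-offs are assumptions derived from the table footnotes:
    A260/A280 above 2 suggests RNA contamination, and A260/A230
    below the expected 1.8-2.2 range suggests chemical carryover.
    """
    flags = []
    if a260_a280 > 2.0:
        flags.append("A260/A280 > 2: possible RNA contamination")
    if a260_a230 < 1.8:
        flags.append("A260/A230 < 1.8: possible chemical contamination")
    return flags


# Values taken from the table above.
for sample, r280, r230 in [("PB_1", 6.01, 0.26), ("PC_1", 1.97, 2.07), ("AE_1", 1.8, 1.56)]:
    print(sample, assess_purity(r280, r230))
```

Applied to the table, such a check would flag both PowerSoil replicates on both ratios and the AE replicates on A260/A230 only, while the PC replicates pass.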
FIGURE 1. Agarose gel electrophoresis of genomic DNA isolated from a pool of tongue dorsum samples. Genomic DNA was electrophoresed on a 0.8% (w/v) agarose gel. PB, PE and UC with replicates (22 ng input, left panel) and GT, PC and AE replicates (44 ng input, right panel) are shown. Different DNA inputs were used based on overall sample availability. λ‐HindIII, Lambda DNA, digested with the restriction endonuclease HindIII, was used to assess fragment size distribution. PB, DNeasy PowerSoil with modified bead beating; PE, DNeasy PowerSoil with enzymatic treatment; UC, DNeasy UltraClean Microbial Kit; GT, Qiagen Genomic Tip 20/G with enzymatic treatment; PC, phenol–chloroform; AE, agarose encasement. The designations “_1” and “_2” indicate replicate 1 and replicate 2, respectively.
The impact of HMW DNA extraction protocol on proportional read numbers and sequence lengths according to read type (microbial vs. human)
| | Qiagen Genomic Tip (GT) | | Phenol–chloroform (PC) | | Agarose encasement (AE) | |
|---|---|---|---|---|---|---|
| | GT_1 | GT_2 | PC_1 | PC_2 | AE_1 | AE_2 |
| All reads | ||||||
| Number of reads | 2,052,300 | 3,681,083 | 2,008,795 | 4,338,575 | 739,186 | 962,065 |
| Sequencing yield (Gbp) | 1.440 | 1.959 | 1.311 | 1.972 | 0.767 | 1.133 |
| Human reads | ||||||
| Number of reads | 1,584,229 | 2,674,013 | 1,664,447 | 3,433,184 | 490,377 | 585,733 |
| % | 77.19 | 72.64 | 82.86 | 79.13 | 66.34 | 60.88 |
| Sequencing yield (Gbp) | 0.908 | 1.199 | 1.026 | 1.536 | 0.405 | 0.463 |
| % | 63.07 | 61.23 | 78.26 | 77.88 | 52.79 | 40.91 |
| Microbial reads | ||||||
| Number of reads | 468,071 | 1,007,070 | 344,348 | 905,391 | 248,809 | 376,332 |
| % | 22.81 | 27.36 | 17.14 | 20.87 | 33.66 | 39.12 |
| Sequencing yield (Gbp) | 0.532 | 0.759 | 0.285 | 0.436 | 0.362 | 0.669 |
| % | 36.93 | 38.77 | 21.74 | 22.12 | 47.21 | 59.09 |
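The read-count percentages in the table above follow directly from the counts. As a minimal sketch (the helper name `percent` is hypothetical), the GT_1 column can be reproduced as follows. Recomputing the yield percentages from the rounded Gbp values can differ in the last digit (for example 0.908 / 1.440 gives 63.06 rather than the reported 63.07), presumably because the reported values were derived from unrounded yields.

```python
def percent(part, total):
    """Percentage of `part` in `total`, rounded to two decimals."""
    return round(100.0 * part / total, 2)


# GT_1 read counts from the table above.
gt1_total = 2_052_300
gt1_human = 1_584_229
gt1_microbial = 468_071

human_pct = percent(gt1_human, gt1_total)
microbial_pct = percent(gt1_microbial, gt1_total)
print(human_pct, microbial_pct)
```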
FIGURE 2. The impact of DNA extraction protocol on the distribution of human (lighter colour) and microbial (darker colour) read lengths from MinION sequencing. These histograms visualize the total cumulative length (total number of nucleotides sequenced) per range of individual read lengths. The x‐axis represents the read length on a log scale and the y‐axis represents the cumulative length for a given size bin (bar width). The main panel shows the size distribution of reads >2500 bp for GT (green), PC (yellow) and AE (blue), while the inset panel shows the size distribution of all reads, using the same data. Results are outlined vertically by extraction method (replicate 1, top panel; replicate 2, lower panel).
Microbial read size distribution. All percentages are relative to the total reads (or sequencing yield) of the quality filtered reads, prior to removal of human reads
| | Qiagen Genomic Tip (GT) | | Phenol–chloroform (PC) | | Agarose encasement (AE) | |
|---|---|---|---|---|---|---|
| | GT_1 | GT_2 | PC_1 | PC_2 | AE_1 | AE_2 |
| All reads | ||||||
| Number of reads | 2,052,300 | 3,681,083 | 2,008,795 | 4,338,575 | 739,186 | 962,065 |
| Sequencing yield (Gbp) | 1.440 | 1.959 | 1.311 | 1.972 | 0.767 | 1.133 |
| All microbial reads | ||||||
| Number of reads | 468,071 | 1,007,070 | 344,348 | 905,391 | 248,809 | 376,332 |
| % | 22.81 | 27.36 | 17.14 | 20.87 | 33.66 | 39.12 |
| Sequencing yield (Gbp) | 0.532 | 0.759 | 0.285 | 0.436 | 0.362 | 0.669 |
| % | 36.93 | 38.77 | 21.74 | 22.12 | 47.21 | 59.09 |
| N50 (bp) | 2810 | 1929 | 1116 | 449 | 2524 | 3649 |
| L50 (number of reads) | 38,513 | 62,126 | 34,874 | 155,142 | 38,768 | 52,182 |
| Median length (bp) | 468 | 326 | 427 | 307 | 782 | 837 |
| Longest microbial reads (bp) | 73,029 | 90,424 | 163,320 | 180,460 | 68,189 | 116,730 |
| The top hit for the longest microbial read on NCBI’s nr database (identity/alignment) | | | | | | |
| Microbial reads >2.5 kb | ||||||
| Number of reads | 43,173 | 49,572 | 13,752 | 10,182 | 39,270 | 82,221 |
| % | 2.10 | 1.35 | 0.68 | 0.23 | 5.31 | 8.55 |
| Sequencing yield (Gbp) | 0.278 | 0.352 | 0.109 | 0.117 | 0.182 | 0.426 |
| % | 19.32 | 17.98 | 8.29 | 5.92 | 23.77 | 37.58 |
Showing the second longest read as the first longest read (92,515 bp) had no hits on NCBI.
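The N50 and L50 rows in the table above summarize the read-length distribution. The source does not state how they were computed, so the following is a minimal sketch of the conventional definitions: sort reads from longest to shortest, accumulate lengths until half of the total yield is reached, then report the length of the last read added (N50) and the number of reads used (L50). The function name `n50_l50` is hypothetical.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of read lengths in bp."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # `length` is the N50; `count` reads were needed, the L50.
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy example: total is 54 bp, and the three longest reads (10 + 9 + 8)
# reach the halfway point of 27 bp.
print(n50_l50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Under this definition a handful of very long reads does not raise the N50 much if most of the yield sits in short reads, which is consistent with PC showing the longest individual reads but the lowest N50 values in the table.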
Comparison of sequencing run read metrics between untreated and BluePippin size‐selected samples. All percentages are relative to the total reads (or sequencing yield) of the quality filtered reads
| | Untreated | | BluePippin high‐pass size selection | |
|---|---|---|---|---|
| | GT_1 | GT_2 | GT_1 SS | GT_2 SS |
| All reads | ||||
| Total reads | 2,052,300 | 3,681,083 | 221,344 | 430,986 |
| Sequencing yield (Gbp) | 1.440 | 1.959 | 0.410 | 0.450 |
| Human reads | ||||
| Number of reads | 1,584,229 | 2,674,013 | 117,071 | 282,113 |
| % | 77.19 | 72.64 | 52.89 | 65.46 |
| Sequencing yield (Gbp) | 0.908 | 1.199 | 0.104 | 0.162 |
| % | 63.07 | 61.23 | 25.31 | 35.92 |
| Microbial reads | ||||
| Number of reads | 468,071 | 1,007,070 | 104,273 | 148,873 |
| % | 22.81 | 27.36 | 47.11 | 34.54 |
| Sequencing yield (Gbp) | 0.532 | 0.759 | 0.306 | 0.289 |
| % | 36.93 | 38.77 | 74.69 | 64.08 |
| N50 | 2810 | 1929 | 6106 | 5594 |
| Microbial reads >2.5 kb | ||||
| Number of reads | 43,173 | 49,572 | 40,248 | 34,949 |
| % | 2.10 | 1.35 | 18.18 | 8.11 |
| Sequencing yield (Gbp) | 0.278 | 0.352 | 0.252 | 0.215 |
| % | 19.32 | 17.98 | 61.26 | 47.74 |
| Microbial reads >20 kb | ||||
| Number of reads | 1269 | 2294 | 300 | 407 |
| % | 0.06 | 0.06 | 0.14 | 0.09 |
| Sequencing yield (Gbp) | 0.035 | 0.067 | 0.008 | 0.011 |
| % | 2.42 | 3.41 | 1.86 | 2.49 |
GT, Qiagen Genomic Tip 20/G with enzymatic treatment; SS, size selection.
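The “>2.5 kb” and “>20 kb” rows above report the share of reads and of sequencing yield above a length threshold, relative to all quality-filtered reads. A minimal sketch of that calculation follows. The function name `share_above` and its parameters are hypothetical, and passing the denominators explicitly is a design assumption made so that the microbial subset can be compared against the full run.

```python
def share_above(microbial_lengths, total_reads, total_bp, threshold_bp=2500):
    """Percentage of reads and of yield contributed by microbial reads
    longer than `threshold_bp`, relative to the whole run."""
    long_reads = [n for n in microbial_lengths if n > threshold_bp]
    pct_reads = 100 * len(long_reads) / total_reads
    pct_yield = 100 * sum(long_reads) / total_bp
    return pct_reads, pct_yield


# Toy run: 10 reads and 20,000 bp in total, of which three reads are microbial.
print(share_above([1000, 3000, 4000], total_reads=10, total_bp=20_000))
```

For the read-count share, the same arithmetic reproduces the table: 43,173 of 2,052,300 reads in GT_1 is about 2.10%.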
Results of Prodigal gene calling and HMM hits for single‐copy core genes (SCGs) and ribosomal RNAs (rRNAs). Analysis was performed after removal of human reads
| | Qiagen Genomic Tip (GT) | | Phenol–chloroform (PC) | | Agarose encasement (AE) | |
|---|---|---|---|---|---|---|
| | GT_1 | GT_2 | PC_1 | PC_2 | AE_1 | AE_2 |
| Number of genes | 676,577 | 994,050 | 391,675 | 665,986 | 440,845 | 771,653 |
| Bacterial SCGs | 2893 | 3131 | 1309 | 1210 | 2136 | 3559 |
| rRNAs (per 1000 genes) | 901 (1.33) | 1107 (1.11) | 295 (0.75) | 307 (0.46) | 655 (1.49) | 1355 (1.76) |
| Bacterial 16S rRNA | 373 | 462 | 120 | 125 | 249 | 547 |
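The values in parentheses in the rRNA row normalize rRNA hits by the number of called genes, which makes samples of different sequencing depth comparable. As a minimal sketch (the helper name `per_thousand_genes` is hypothetical), the normalization is:

```python
def per_thousand_genes(hits, genes):
    """Number of hits per 1000 called genes, rounded to two decimals."""
    return round(1000 * hits / genes, 2)


# rRNA hits and gene counts from the table above.
gt1_rate = per_thousand_genes(901, 676_577)
pc2_rate = per_thousand_genes(307, 665_986)
print(gt1_rate, pc2_rate)
```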
FIGURE 3. Relative abundance of 16S rRNA at the genus level. We used the Human Oral Microbiome Database (HOMD) to assign taxonomy to the 16S rRNA from the MinION reads. For the short‐read metagenomes, we used the taxonomy of the ribosomal gene S7 with the Genome Taxonomy Database (GTDB). We processed the 16S rRNA amplicons with the Minimum Entropy Decomposition (MED) algorithm and used Silva version 132 to assign taxonomy. Genera representing less than 1% of a sample were pooled as rare (in light grey). Samples noted as TD correspond to an additional sampling performed 2 weeks after the initial pool of samples used for the long‐read extractions. PB, DNeasy PowerSoil with modified bead beating; GT, Qiagen Genomic Tip 20/G with enzymatic treatment; PC, phenol–chloroform; AE, agarose encasement. The designations “_1” and “_2” indicate replicate 1 and replicate 2, respectively.
Flye assembly statistics. Assemblies were polished using Pilon and short reads from the extraction PB_2
| | Qiagen Genomic Tip (GT) | | Phenol–chloroform (PC) | | Agarose encasement (AE) | |
|---|---|---|---|---|---|---|
| | GT_1 | GT_2 | PC_1 | PC_2 | AE_1 | AE_2 |
| Total length (bp) | 28,002,213 | 35,034,984 | 10,630,822 | 11,909,082 | 17,557,138 | 39,151,308 |
| Number of contigs | 401 | 466 | 137 | 106 | 483 | 855 |
| No. of contigs >5 kb | 369 | 412 | 130 | 101 | 467 | 775 |
| No. of contigs >10 kb | 347 | 383 | 124 | 100 | 440 | 714 |
| No. of contigs >20 kb | 299 | 336 | 107 | 97 | 354 | 564 |
| No. of contigs >50 kb | 159 | 194 | 73 | 67 | 84 | 203 |
| No. of contigs >100 kb | 63 | 84 | 33 | 35 | 21 | 69 |
| Longest contig (bp) | 1,025,627 | 1,176,789 | 504,505 | 676,998 | 304,763 | 875,613 |
| Shortest contig (bp) | 512 | 528 | 2103 | 1025 | 913 | 511 |
| N50 (bp) | 129,677 | 155,366 | 122,828 | 166,496 | 44,461 | 71,755 |
| Number of genes | 32,540 | 41,818 | 11,628 | 13,025 | 21,392 | 48,784 |
| Single‐copy core genes | ||||||
| Bacteria_71 | 845 | 1072 | 298 | 339 | 551 | 1163 |
| Archaea_76 | 428 | 554 | 139 | 167 | 283 | 614 |
| Protista_83 | 34 | 48 | 11 | 14 | 26 | 54 |
| Ribosomal RNAs | 75 | 116 | 30 | 43 | 44 | 100 |
| No. of expected bacterial genomes | 12 | 12 | 5 | 5 | 7 | 15 |
| Circular contigs | ||||||
| Number of circular contigs | 20 | 32 | 8 | 5 | 9 | 24 |
| Max. length (bp) | 86,329 | 155,366 | 155,422 | 155,411 | 24,494 | 88,098 |