Melissa B Duhaime, Li Deng, Bonnie T Poulos, Matthew B Sullivan.
Abstract
Metagenomics generates and tests hypotheses about dynamics and mechanistic drivers in wild populations, yet commonly suffers from insufficient (< 1 ng) starting genomic material for sequencing. Current solutions for amplifying sufficient DNA for metagenomics analyses include linear amplification for deep sequencing (LADS), which requires more DNA than is normally available, linker-amplified shotgun libraries (LASLs), which are prohibitively low throughput, and whole-genome amplification, which is significantly biased and thus non-quantitative. Here, we adapt the LASL approach to next-generation sequencing by offering an alternate polymerase for challenging samples, developing a more efficient sizing step, integrating a 'reconditioning PCR' step to increase yield and minimize late-cycle PCR artefacts, and empirically documenting the quantitative capability of the optimized method with both laboratory isolate and wild community viral DNA. Our optimized linker amplification method requires as little as 1 pg of DNA and is the most precise and accurate available, with G + C content amplification biases less than 1.5-fold, even for complex samples as diverse as a wild virus community. While optimized here for 454 sequencing, this linker amplification method can be used to prepare metagenomics libraries for sequencing with next-generation platforms, including Illumina and Ion Torrent, the first of which we tested and present data for here.
Year: 2012 PMID: 22713159 PMCID: PMC3466414 DOI: 10.1111/j.1462-2920.2012.02791.x
Source DB: PubMed Journal: Environ Microbiol ISSN: 1462-2912 Impact factor: 5.491
Fig. 1. Linker amplification (LA) method schema. This study assesses an optimized LA method, with particular focus on providing new bar-codes in the linker ligation step to facilitate pooling of samples, as well as quantitative evaluation of the impact of amplification on resulting isolate and community DNA genomic sequencing.
Table 1. Summary of treatments studied in linker amplification sequence analysis
| Pool | Treatment | Input DNA (ng) | PCR cycles | Barcode (5′–3′) | Linker | Reads (post-QC) |
|---|---|---|---|---|---|---|
| 1 | cyc15A | 10 | 15 | CGACA | CCA CAC AGA TCA CGA AGC ATA C | 4 306 |
| | cyc15B | | | CATAG | | 1 626 |
| | cyc15C | | | ATGTA | | 7 582 |
| | cyc18A | 1 | 18 | CGTGT | | 8 072 |
| | cyc18B | | | ACGTG | | 11 889 |
| | cyc18C | | | TGAGT | | 12 739 |
| | cyc20A | 0.1 | 20 | CTCTA | | 8 729 |
| | cyc20B | | | ACTCT | | 3 |
| | cyc20C | | | TGCTG | | 5 186 |
| | cyc25A | 0.01 | 25 | CTATG | | 8 722 |
| | cyc25B | | | AGCAT | | 8 006 |
| | cyc25C | | | TCGCA | | 6 091 |
| | cyc30A | 0.001 | 30 | CTGAG | | 7 250 |
| | cyc30B | | | ATCAG | | 8 201 |
| | cyc30C | | | TCATA | | 10 992 |
| 2 | cyc15rA | 10 | 15 + 3 | CGACA | CCA CAC AGA TCA CGA AGC ATA C | 1 287 |
| | cyc15rB | | | CATAG | | 2 088 |
| | cyc15rC | | | ATGTA | | 1 710 |
| | cyc18rA | 1 | 18 + 3 | CGTGT | | 1 183 |
| | cyc18rB | | | ACGTG | | 436 |
| | cyc18rC | | | TGAGT | | 900 |
| | cyc20rA | 0.1 | 20 + 3 | CTCTA | | 4 431 |
| | cyc20rB | | | ACTCT | | 4 |
| | cyc20rC | | | TGCTG | | 1 194 |
| | cyc25rA | 0.01 | 25 + 3 | CTATG | | 2 768 |
| | cyc25rB | | | AGCAT | | 1 157 |
| | cyc25rC | | | TCGCA | | 1 527 |
| | cyc30rA | 0.001 | 30 + 3 | CTGAG | | 1 775 |
| | cyc30rB | | | ATCAG | | 5 195 |
| | cyc30rC | | | TCATA | | 1 252 |
| | unamp | n/a | No amp | None | ACG AGT GCG TAT ATC GCG AGT CAT | 30 279 |
| 1 | B2cyc15A | 10 | 15 | CGACA | CCA CAC AGA TCA CGA AGC ATA C | 222 421 |
| | B2cyc25A | 0.1 | 25 | CAGAT | | 212 093 |
| | B2cyc15rA | 10 | 15 + 3 | ACGTG | | 119 144 |
| | B2cyc25rA | 0.1 | 25 + 3 | TACGA | | 111 680 |
| 2 | B2cyc15B | 10 | 15 | CGACA | CCA CAC AGA TCA CGA AGC ATA C | 261 245 |
| | B2cyc25B | 0.1 | 25 | CAGAT | | 340 488 |
| 3 | B2cyc15C | 10 | 15 | CGACA | CCA CAC AGA TCA CGA AGC ATA C | 246 313 |
| | B2cyc25C | 0.1 | 25 | CAGAT | | 310 311 |
| 4 | unamp A | n/a | No amp | None | ACG AGT GCG TAT ATC GCG AGT CAT | 132 639 |
| 5 | unamp B | n/a | No amp | None | None | 160 879 |
Triplicates are differentiated as A, B and C; reconditioned samples are identified with an ‘r’. When text in a row is blank, refer to the previously listed text; for example, input DNA for each of cyc15A, cyc15B, and cyc15C is 10 ng.
In the PCR cycles column, '+ 3' denotes the three additional cycles of the reconditioning PCR.
n/a, not applicable; No amp, no amplification.
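The blank-cell convention described in the table notes can be made explicit when the table is parsed for analysis. The following is a minimal sketch, assuming each row has already been split into seven cells in column order and that a blank cell is an empty string; the function name `forward_fill` and the placeholder linker string are hypothetical and not part of the study.

```python
from typing import List


def forward_fill(rows: List[List[str]]) -> List[List[str]]:
    """Replace each blank cell with the value from the same column of the row above."""
    filled: List[List[str]] = []
    previous = None
    for row in rows:
        if previous is not None:
            # Only empty strings count as blank; "n/a" and "None" are literal values.
            row = [cell if cell != "" else previous[i] for i, cell in enumerate(row)]
        else:
            row = list(row)
        filled.append(row)
        previous = row
    return filled


# Three rows from the table; "LINKER" stands in for the full linker sequence.
rows = [
    ["1", "cyc15A", "10", "15", "CGACA", "LINKER", "4 306"],
    ["", "cyc15B", "", "", "CATAG", "", "1 626"],
    ["", "cyc15C", "", "", "ATGTA", "", "7 582"],
]
filled = forward_fill(rows)
```

After filling, every row carries its own pool, input DNA, cycle count, and linker, so rows can be filtered or grouped independently.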
Fig. 2. A. Comparison of sheared H105/1 genomic DNA versus unsheared. DNA was run on Agilent chip (DNA 7500 ladder). B. Comparison of size-fractionation methods. Size-fractionation of sheared DNA targeting the 400–600 bp range. DNA was run on Agilent chip (high-sensitivity ladder).
Fig. 3. Quantitative evaluation of resulting isolate genome sequencing data. A. Read depth of treatments, as mapped to the Phage H105/1 reference genome: 15–30 PCR cycles, with (red) and without (blue) reconditioning, and unamplified (black) genomic DNA. Counts are scaled by the total number of nucleotides per treatment (Table 1) and multiplied by a factor (1 222 442, the average number of nucleotides in all treatments), to scale to a relatable ‘read depth’ value. B. H105/1 genome ‘GC-bias curve’ representing the relationship between %G + C and read depth, as calculated in a 500 bp sliding window across the genome. Colour scale shared between (A) and (B).
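The scaling and sliding-window calculation described in this caption can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names, the window step size, and the choice to average depth within each window are assumptions; only the scaling factor (1 222 442) and the 500 bp window come from the caption.

```python
from typing import List, Tuple

SCALE_FACTOR = 1_222_442  # average nucleotides per treatment, as given in the caption


def scaled_depth(raw_depth: List[int], total_nt: int,
                 scale: float = SCALE_FACTOR) -> List[float]:
    """Divide per-position counts by the treatment's total nucleotides, then rescale."""
    return [d / total_nt * scale for d in raw_depth]


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc_depth(genome: str, depth: List[float],
                     window: int = 500, step: int = 1) -> List[Tuple[float, float]]:
    """Return (%G+C, mean depth) for each window; the step size is an assumption."""
    if len(genome) != len(depth):
        raise ValueError("genome and depth must have the same length")
    points = []
    for start in range(0, len(genome) - window + 1, step):
        window_seq = genome[start:start + window]
        mean_depth = sum(depth[start:start + window]) / window
        points.append((gc_percent(window_seq), mean_depth))
    return points


# Toy example with a 4 bp window so the result is easy to check by hand.
toy_points = sliding_gc_depth("GGCCAATT", [1.0] * 8, window=4, step=4)
```

Plotting the returned pairs for each treatment, with %G + C on one axis and mean depth on the other, would give a curve of the kind shown in panel B.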
Fig. 4. Biosphere 2 ocean metagenome ‘GC-bias curve’ representing the %G + C of each read per treatment, relative to the average %G + C of the unamplified treatments.
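One way to read "relative to the average %G + C of the unamplified treatments" is as a per-read difference from that average. The sketch below assumes this interpretation (a ratio would be equally plausible) and uses hypothetical function names and toy reads; it is not taken from the study.

```python
from typing import List


def read_gc_percent(read: str) -> float:
    """Percentage of G and C bases in a single read."""
    read = read.upper()
    if not read:
        return 0.0
    return 100.0 * (read.count("G") + read.count("C")) / len(read)


def mean_gc_percent(reads: List[str]) -> float:
    """Average of the per-read %G+C values."""
    return sum(read_gc_percent(r) for r in reads) / len(reads)


def relative_gc(treatment_reads: List[str], unamplified_reads: List[str]) -> List[float]:
    """Per-read %G+C minus the unamplified average (assumed definition of 'relative')."""
    baseline = mean_gc_percent(unamplified_reads)
    return [read_gc_percent(r) - baseline for r in treatment_reads]


# Toy reads: the unamplified baseline is 50 %G+C.
offsets = relative_gc(["GGCC", "AATT"], ["GCAT", "GGAT"])
```

A distribution of offsets centred on zero would indicate that amplification did not shift the G + C content of the reads.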
Fig. 5. Protein cluster diversity in the Biosphere 2 ocean viral community. A. Rarefaction curve representing the relative sequence diversity of each treatment, as measured by protein clusters (with > 20 sequence members) derived from all amplified and unamplified treatments. Higher diversity in the amplified treatments is likely due to a preferential amplification of the rare biosphere by the LA TaKaRa-HS DNA polymerase used in PCR. B. Boxplot representing the range, across treatments, of the percentage of original reads that are singletons. The two outliers at 51% are the unamplified treatments.
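The two summaries in this caption can be illustrated with a short sketch, assuming each read has already been assigned a protein cluster identifier. The function names are hypothetical, and the rarefaction here uses a single seeded shuffle for brevity, whereas a real analysis would typically average over many random subsamples; the > 20-member cluster filter from the caption is not applied.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def percent_singletons(cluster_ids: Sequence[str]) -> float:
    """Percentage of reads whose cluster contains no other read."""
    counts = Counter(cluster_ids)
    singles = sum(1 for size in counts.values() if size == 1)
    return 100.0 * singles / len(cluster_ids)


def rarefaction_curve(cluster_ids: Sequence[str], depths: Sequence[int],
                      seed: int = 0) -> List[Tuple[int, int]]:
    """Count distinct clusters among the first n reads of one shuffled ordering."""
    rng = random.Random(seed)
    shuffled = list(cluster_ids)
    rng.shuffle(shuffled)
    return [(n, len(set(shuffled[:n]))) for n in depths]


# Toy assignment: four reads in three clusters, two of them singletons.
toy_ids = ["a", "a", "b", "c"]
toy_singletons = percent_singletons(toy_ids)
toy_curve = rarefaction_curve(toy_ids, depths=[0, 4])
```

Comparing curves from treatments subsampled to the same depths keeps differences in sequencing effort from being mistaken for differences in diversity.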