Samuel O Oyola, Thomas D Otto, Yong Gu, Gareth Maslen, Magnus Manske, Susana Campino, Daniel J Turner, Bronwyn Macinnis, Dominic P Kwiatkowski, Harold P Swerdlow, Michael A Quail.
Abstract
BACKGROUND: Massively parallel sequencing technology is revolutionizing approaches to genomic and genetic research. Since its advent, the scale and efficiency of Next-Generation Sequencing (NGS) have rapidly improved. In spite of this success, sequencing genomes or genomic regions with extremely biased base composition is still a great challenge for the currently available NGS platforms. The genomes of some important pathogenic organisms, such as Plasmodium falciparum (high AT content) and Mycobacterium tuberculosis (high GC content), display extremes of base composition. The standard library-preparation procedures that employ PCR amplification have been shown to cause uneven read coverage, particularly across AT- and GC-rich regions, leading to problems in genome assembly and variation analyses. Alternative library-preparation approaches that omit PCR amplification require large quantities of starting material and hence are not suitable for small amounts of DNA/RNA such as those from clinical isolates. We have developed and optimized library-preparation procedures suitable for low-quantity starting material and tolerant of extremely high AT content sequences.
Year: 2012 PMID: 22214261 PMCID: PMC3312816 DOI: 10.1186/1471-2164-13-1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Screening for tolerance to an AT-rich template using conventional PCR amplification. Top panel: PCR amplification of a 540 bp locus (Pf3D7_11:1294982-1295521) with a relatively balanced (70% AT) base composition (positive control) in the presence or absence of TMAC. Bottom panel: PCR amplification of a 1217 bp locus (Pf3D7_01:55900-57116) with extreme AT content (84%) in the presence or absence of TMAC. M, 100 bp DNA ladder (NEB); (1) PWO master; (2) PWO master + TMAC; (3) PfuULTRA; (4) PfuULTRA + TMAC; (5) Kapa HiFi; (6) Kapa HiFi + TMAC; (7) AccuPrime Taq HiFi; (8) AccuPrime Taq HiFi + TMAC; (9) AccuPrime pfx SuperMix; (10) Phusion; (11) Phusion + TMAC; (12) Platinum HiFi; (13) Platinum HiFi + TMAC; (14) Platinum pfx; (15) Platinum pfx + TMAC; (16) Ex Taq; (17) Ex Taq + TMAC; (18) Kapa2G Robust; (19) Kapa2G Robust + TMAC.
Figure 2. A plot of genome coverage against normalized average depth. Duplicate data sets were normalized and pooled. Variance in coverage above and below the normalized average depth (red vertical line) across the genome is shown. Deviation of a sample curve from the average depth indicates how evenly coverage depth is distributed across the genome: the closer the sample curve is to the vertical line, the more even the coverage. The theoretical curve represents average normalized depth at 100% genome coverage. A) Coverage by libraries made from P. falciparum 3D7 (1 normalized depth represents 21×). B) Coverage by libraries made from the clinical isolate PK0076 (1 normalized depth represents 11×). Kapa HiFi, Kapa2G and Platinum pfx enzymes were used in the presence of TMAC.
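The kind of curve described in Figure 2 can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the curve reports the fraction of positions whose depth, divided by the mean depth, reaches a given threshold. The function name `normalized_coverage_curve` and the toy depth values are hypothetical.

```python
from statistics import mean

def normalized_coverage_curve(depths, thresholds):
    """Fraction of positions whose normalized depth is >= each threshold.

    Normalized depth = per-base depth / mean depth, so 1.0 is the average.
    (Assumed definition; the caption does not spell out the exact formula.)
    """
    avg = mean(depths)
    if avg == 0:
        raise ValueError("average depth is zero; cannot normalize")
    normalized = [d / avg for d in depths]
    n = len(normalized)
    return [sum(1 for x in normalized if x >= t) / n for t in thresholds]

# Toy per-base depths (hypothetical values, not data from the study).
even = [20, 21, 22, 21, 20, 22, 21, 21]    # depth stays close to the mean
uneven = [0, 2, 60, 45, 1, 0, 58, 2]       # same mean, very uneven depth
thresholds = [0.5, 1.0, 1.5]

even_curve = normalized_coverage_curve(even, thresholds)
uneven_curve = normalized_coverage_curve(uneven, thresholds)
print(even_curve, uneven_curve)
```

In the even toy sample every position reaches half the mean depth and none exceeds 1.5 times the mean, so the curve drops steeply around 1.0. The uneven sample has the same mean depth but a flat curve, which corresponds to a sample curve lying far from the vertical line.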
Figure 3. GC profile analysis of sequenced data. The GC content distributions for the different library preparation methods are shown alongside theoretical data for comparison. A) GC content analysis of libraries prepared from P. falciparum 3D7 with mapped reads normalized to 21× genome coverage. B) GC content of libraries prepared from a clinical isolate (PK0076) with mapped reads normalized to 11× genome coverage. Libraries with GC content above 19.4% (the GC content of the P. falciparum 3D7 reference genome) indicate amplification bias towards templates with neutral GC composition. C) Artemis [9,10] screen view of coverage (mapped reads normalized to 21× genome coverage) for a PCR-free library and four other libraries under test on P. falciparum 3D7 chromosome 1 (zoomed in to show coverage on the GC-rich telomere). Kapa HiFi, Kapa2G and Platinum pfx enzymes were used in the presence of TMAC. See additional file 1, Figure S2 A & B for coverage on the entire chromosome 1 and the AT-rich locus.
Average GC content
| Sample | Measure | PCR-free | Kapa HiFi | Kapa2G | AccuPrime | Platinum | RPA | T7 | Phusion |
|---|---|---|---|---|---|---|---|---|---|
| 3D7 | Av. %GC content | 19.50 | 20.35 | 21.44 | 21.93 | 21.47 | 20.75 | 22.63 | 23.80 |
| PK0076 | Av. %GC content | 21.92 | 19.79 | 21.07 | 21.37 | 21.91 | 19.66 | 20.95 | 22.87 |
Calculated values of average GC content for each library preparation. Deviation from 19.4% (the value for the unamplified genome) indicates the effect of amplification bias. See Figure 3 A & B for a global graphical presentation of the GC content distribution. 3D7, P. falciparum strain 3D7; PK0076, P. falciparum clinical isolate.
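The deviation described in the caption can be computed with a short sketch like the one below. It assumes that the eight values in the first data row of the table above (the row whose PCR-free value is 19.50) map, in order, onto the eight library columns. The helper names `gc_percent` and `deviation_from_reference` and the toy reads are hypothetical, and the pooled-base averaging is one plausible definition, not necessarily the one the authors used.

```python
REFERENCE_GC = 19.4  # % GC of the unamplified genome, as stated in the caption

def gc_percent(reads):
    """Pooled %GC over all A/C/G/T bases in the reads (other symbols ignored)."""
    gc = at = 0
    for read in reads:
        s = read.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    return 100.0 * gc / total

def deviation_from_reference(avg_gc, reference=REFERENCE_GC):
    """Signed difference (percentage points) from the reference GC content."""
    return round(avg_gc - reference, 2)

# Values copied from the first data row of the table (assumed column order).
first_row = {
    "PCR-free": 19.50, "Kapa HiFi": 20.35, "Kapa2G": 21.44, "AccuPrime": 21.93,
    "Platinum": 21.47, "RPA": 20.75, "T7": 22.63, "Phusion": 23.80,
}
deviations = {lib: deviation_from_reference(v) for lib, v in first_row.items()}

# Toy reads (hypothetical) to show the GC calculation itself.
toy_reads = ["ATATATATGC", "AATTAATTAA", "ATGCATATAT"]
toy_gc = gc_percent(toy_reads)

for lib, dev in sorted(deviations.items(), key=lambda kv: kv[1]):
    print(f"{lib}: {dev:+.2f}")
```

Under these assumptions, the PCR-free library sits closest to the reference value and Phusion furthest above it.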
Figure 4. Box plots showing coverage analysis of chromosome 11. (i) P. falciparum 3D7; mapped reads normalized to 21× genome coverage (1 normalized depth represents 21×). (ii) Clinical isolate PK0076; mapped reads normalized to 11× genome coverage (1 normalized depth represents 11×). Subplots B, C and D in both (i) and (ii) show coverage of sub-regions of P. falciparum 3D7 chromosome 11. A) Coverage depth variability plotted for each library on the entire chromosome. B) Distribution of base coverage depth for each library over gene Pf11_0074 and its neighboring introns. C) Distribution of base coverage depth at positions 259985-260864 (extreme AT region). D) Distribution of base coverage depth at positions 29092-30361 (VAR gene and introns). The top and bottom sides of each box represent the 75th and 25th percentiles of the base coverage-depth distribution, respectively. The middle line represents the 50th percentile. A narrow box indicates less variation in coverage depth across that locus, and a wide box indicates more. Kapa HiFi, Kapa2G and Platinum pfx enzymes were used in the presence of TMAC. All P. falciparum 3D7 and most clinical isolate libraries were prepared in duplicate, and each replicate's data were plotted independently as shown.
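The box boundaries described in the Figure 4 caption (25th, 50th and 75th percentiles) can be reproduced for any list of per-base depths with the standard library. The sketch below is illustrative only: `box_stats` is a hypothetical helper, the depth values are made up, and the "inclusive" quantile method is an assumption, since the caption does not say which percentile convention was used.

```python
from statistics import quantiles

def box_stats(depths):
    """Return the quartiles that define a box plot, plus the box height (IQR)."""
    q1, median, q3 = quantiles(depths, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}

# Hypothetical per-base depths over one locus for two libraries.
even_library = [19, 20, 20, 21, 21, 21, 22, 22, 23]
uneven_library = [0, 5, 10, 15, 20, 25, 30, 35, 40]

even_box = box_stats(even_library)
uneven_box = box_stats(uneven_library)
print(even_box)
print(uneven_box)
```

The first library yields a narrow box (small interquartile range), the second a wide one, matching the caption's reading that a narrow box indicates less variation in depth across the locus.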
Ranking of P. falciparum 3D7-amplified and PCR-free libraries.
| Amplification | Coverage score | MM score | Rank ALL |
|---|---|---|---|
| PCR-free | 0.99 | 0.95 | 0.97 |
| Kapa HiFi | 0.8 | 0.58 | 0.69 |
| Platinum pfx | 0.67 | 0.59 | 0.63 |
| AccuPrime Taq HiFi | 0.59 | 0.56 | 0.58 |
| RPA | 0.52 | 0.6 | 0.56 |
| Kapa2G Robust | 0.59 | 0.57 | 0.58 |
| Phusion | 0.22 | 0.5 | 0.36 |
| T7 | 0.74 | 0.43 | 0.58 |
Ranking was based on genome coverage and accuracy. Coverage score: PCR-free and Kapa HiFi had the lowest percentage of the genome covered at 0 or <5×, thereby scoring the highest. Mismatch (MM) score: loci of low complexity were used to generate the mismatch ranking (see Materials and Methods under the Fidelity section). The numbers of true-positive mismatches, false-positive mismatches, deletions and insertions obtained for each library were each used to generate a score, and these scores were then combined to generate an overall mismatch rank. Using these criteria, RPA produced the highest mismatch score among the amplified libraries and T7 the lowest. See additional file 1, Table S3 for raw ranking data.
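The source does not state how the "Rank ALL" column is derived. The tabulated values are, however, consistent with the arithmetic mean of the coverage and MM scores to within rounding, and the sketch below checks that reading. Treat the averaging rule as an assumption rather than the authors' documented method; the variable names are hypothetical.

```python
# (coverage score, MM score, reported Rank ALL) copied from the table above.
scores = {
    "PCR-free": (0.99, 0.95, 0.97),
    "Kapa HiFi": (0.80, 0.58, 0.69),
    "Platinum pfx": (0.67, 0.59, 0.63),
    "AccuPrime Taq HiFi": (0.59, 0.56, 0.58),
    "RPA": (0.52, 0.60, 0.56),
    "Kapa2G Robust": (0.59, 0.57, 0.58),
    "Phusion": (0.22, 0.50, 0.36),
    "T7": (0.74, 0.43, 0.58),
}

def combined(coverage, mm):
    """Assumed combination rule: unweighted mean of the two scores."""
    return (coverage + mm) / 2

recomputed = {lib: combined(c, m) for lib, (c, m, _) in scores.items()}

# Largest gap between the recomputed mean and the reported Rank ALL value.
max_gap = max(abs(recomputed[lib] - rep) for lib, (_, _, rep) in scores.items())

# Libraries ordered from best to worst by the recomputed score.
ordering = sorted(recomputed, key=recomputed.get, reverse=True)

# Best and worst MM score among amplified (non PCR-free) libraries.
amplified = [lib for lib in scores if lib != "PCR-free"]
best_mm = max(amplified, key=lambda lib: scores[lib][1])
worst_mm = min(amplified, key=lambda lib: scores[lib][1])

print(ordering, round(max_gap, 3), best_mm, worst_mm)
```

Sorting by the recomputed score places PCR-free first, Kapa HiFi first among the amplified libraries, and Phusion last. Several libraries share a reported value of 0.58, so their relative order depends on digits not shown in the table.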