Patrick Denis Browne, Tue Kjærgaard Nielsen, Witold Kot, Anni Aggerholm, M Thomas P Gilbert, Lara Puetz, Morten Rasmussen, Athanasios Zervas, Lars Hestbjerg Hansen.
Abstract
BACKGROUND: Metagenomic sequencing is a well-established tool in the modern biosciences. While it promises unparalleled insights into the genetic content of the biological samples studied, conclusions drawn are at risk from biases inherent to the DNA sequencing methods, including inaccurate abundance estimates as a function of genomic guanine-cytosine (GC) content.
Keywords: GC bias; Illumina; Oxford Nanopore; PacBio; high-throughput sequencing; metagenomics
Year: 2020 PMID: 32052832 PMCID: PMC7016772 DOI: 10.1093/gigascience/giaa008
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Sources of datasets for GC-bias analysis in metagenome sequencing
| Accession No. (relevant supplementary data) | Sequencing technology | Library preparation kit | Environment | Source | Total contigs >10 kb | Assembly length >10 kb | N50 >10 kb | No. PCR cycles |
|---|---|---|---|---|---|---|---|---|
| ERR526087 ( | HiSeq 2000 | Paired-End Genomic DNA Sample Prep Kit (Illumina) | Human faeces (female) | [ | 2880 | 71.9 Mb | 29,679 | 10–12 |
| SRR5035895 ( | MiSeq | NEBNext Ultra | Kelp-associated biofilm | [ | 217 | 3.77 Mb | 18,496 | 4–12 |
| SRS049959 ( | GA II | Paired-End Genomic DNA Sample Prep Kit (Illumina) | Human faeces (male) | NIH Human Microbiome Project | 1409 | 21.6 Mb | 14,775 | 10–12 |
| SRR8570466 ( | NextSeq | Nextera | Moving bed biofilm reactors with effluent wastewater | [ | 5496 | 109 Mb | 20,186 | 8 |
| SRR7521238 ( | HiSeq 2500 | NEBNext | Intestinal contents of a turkey vulture | [ | 1256 | 26.9 Mb | 22,974 | 14 |
Assembly statistics are presented for contigs larger than 10 kb only. The number of PCR cycles used during library preparation was inferred from the library preparation kit's instructions when it could not be found in the referenced publications.
Figure 1: Coverage biases in the sequencing of Fusobacterium sp. C1. The circle plot shows, numbered from the inside: GC content (Ring 1); positions of CDSs, rRNAs, and tRNAs (Ring 2); positions of the PCR targets for ddPCR and the 5.3-kb PCR products (Ring 3); and coverages of Nanopore, MiSeq, NextSeq, HiSeq, and PacBio reads (Rings 4–8, respectively). The GC content plot is centred on the median GC content, with GC contents greater than the median extending outwards. The coverage data are plotted in 50-nt windows, with separate linear scales for each dataset.
Figure 2: Coverage biases in MiSeq datasets from many bacteria with different GC contents. Dot plots show local GC content and normalized relative coverages in 500-nt windows (see Methods for explanation) of MiSeq data from a variety of bacteria with different average GC contents. Error bars indicate ±1 standard deviation of normalized coverage. The intensity of the blue in the dots is a log-transformed heat map of the number of 500-nt windows averaged into that datapoint. The datapoint with the most windows in each plot has maximum blue. The vertical green line marks the average GC content of each assembly. The average normalized coverage value is indicated with a horizontal dashed red line.
Figure 3: GC biases in NextSeq, PacBio, Nanopore, and HiSeq data. The dot plots are as described in Fig. 2.
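The windowed quantities plotted in Figs 2 and 3 can be illustrated with a minimal Python sketch. The Methods section is not reproduced in this record, so the normalization used here (mean depth of each 500-nt window divided by the mean depth over all windows) is an assumption, not the authors' confirmed procedure. The function names `windowed_gc_and_coverage` and `summarize_by_gc` are hypothetical, and binning windows by GC content rounded to the nearest percent is likewise an illustrative choice.

```python
from collections import defaultdict
from statistics import mean, pstdev

WINDOW = 500  # window size in nt, as stated in the figure captions


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def windowed_gc_and_coverage(seq, depth, window=WINDOW):
    """Return (GC percent, normalized coverage) for each full window.

    ASSUMPTION: normalized coverage = window mean depth / overall mean depth.
    A trailing partial window is ignored.
    """
    if len(seq) != len(depth):
        raise ValueError("sequence and per-base depth must have equal length")
    n_full = len(seq) // window
    if n_full == 0:
        return []
    overall = mean(depth[: n_full * window])
    if overall == 0:
        raise ValueError("overall mean depth is zero; cannot normalize")
    points = []
    for i in range(n_full):
        start, end = i * window, (i + 1) * window
        gc_percent = 100 * gc_fraction(seq[start:end])
        points.append((gc_percent, mean(depth[start:end]) / overall))
    return points


def summarize_by_gc(points):
    """Group windows by rounded GC percent.

    Returns {gc_percent: (mean coverage, standard deviation, window count)},
    i.e. the dot position, error bar, and heat-map count for one datapoint.
    """
    bins = defaultdict(list)
    for gc_percent, coverage in points:
        bins[round(gc_percent)].append(coverage)
    return {gc: (mean(v), pstdev(v), len(v)) for gc, v in sorted(bins.items())}


# Toy input: an AT-only window at depth 10 followed by a GC-only window at depth 30.
toy_seq = "AT" * 250 + "GC" * 250
toy_depth = [10] * 500 + [30] * 500
print(windowed_gc_and_coverage(toy_seq, toy_depth))
# [(0.0, 0.5), (100.0, 1.5)]
```

In this toy input the GC-rich window sits at 1.5 times the mean depth and the AT-rich window at 0.5 times, which is the kind of deviation from the dashed line at 1.0 that the dot plots are designed to expose.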
Primer pairs used for ddPCR
| Product | Forward primer | Reverse primer | Product size |
|---|---|---|---|
| ATP synthase β-subunit | TGCTAAGGGACATGGAGGAC | AAGTCATCGGCTGGTACGTA | 414 bp |
| SSU ribosomal protein S3 | CGGAAGAAAAGGTGCTGAAAT | CTACGCTTCTCCTCCTTCCC | 424 bp |
| SSU ribosomal RNA | GCAGCAGTGGGGAATATTGG | CTGTTTGCTACCCACGCTTT | 413 bp |
Primers used to amplify 5.3-kb regions with different GC contents from the Fusobacterium sp. C1 genome
| Primer name | Primer sequence | Orientation | Region |
|---|---|---|---|
| NormA_F | TACTAGCTCCACTTTTAATACCTG | Forward | 1,350,019–1,350,042 |
| NormA_R | GCTCTTCTTATTTCACCTTCATCT | Reverse | complement (1,355,348–1,355,371) |
| RNA_F | CTGTCTTTGCAAACCTTTCTATT | Forward | 1,317,778–1,317,800 |
| RNA_R | ATTTGGCTTCTTGTGTTTTAGTT | Reverse | complement (1,323,108–1,323,130) |
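As a quick consistency check on the table above, the following Python sketch recomputes the primer lengths and the expected amplicon sizes from the listed coordinates. It assumes the coordinates are 1-based and inclusive (GenBank-style), which this record does not state explicitly. The helper names `span_length` and `amplicon_length` are hypothetical.

```python
# Primer sequences and coordinates copied from the table above.
# ASSUMPTION: coordinates are 1-based and inclusive (GenBank-style).
PRIMERS = {
    "NormA_F": ("TACTAGCTCCACTTTTAATACCTG", 1_350_019, 1_350_042),
    "NormA_R": ("GCTCTTCTTATTTCACCTTCATCT", 1_355_348, 1_355_371),
    "RNA_F": ("CTGTCTTTGCAAACCTTTCTATT", 1_317_778, 1_317_800),
    "RNA_R": ("ATTTGGCTTCTTGTGTTTTAGTT", 1_323_108, 1_323_130),
}


def span_length(start, end):
    """Length in nt of an inclusive 1-based coordinate range."""
    return end - start + 1


def amplicon_length(forward, reverse):
    """Product size from the forward primer's start to the reverse primer's end."""
    return span_length(PRIMERS[forward][1], PRIMERS[reverse][2])


# Each primer sequence should be exactly as long as its annotated region.
primer_lengths_match = {
    name: len(seq) == span_length(start, end)
    for name, (seq, start, end) in PRIMERS.items()
}

print(primer_lengths_match)
print(amplicon_length("NormA_F", "NormA_R"))  # 5353
print(amplicon_length("RNA_F", "RNA_R"))      # 5353
```

Under that coordinate assumption both primer pairs span 5,353 nt, which agrees with the 5.3-kb product size given in the table title.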