Tehmina Bharucha, Clarissa Oeser, Francois Balloux, Julianne R Brown, Ellen C Carbo, Andre Charlett, Charles Y Chiu, Eric C J Claas, Marcus C de Goffau, Jutte J C de Vries, Marc Eloit, Susan Hopkins, Jim F Huggett, Duncan MacCannell, Sofia Morfopoulou, Avindra Nath, Denise M O'Sullivan, Lauren B Reoma, Liam P Shaw, Igor Sidorov, Patricia J Simner, Le Van Tan, Emma C Thomson, Lucy van Dorp, Michael R Wilson, Judith Breuer, Nigel Field.
Abstract
The term metagenomics refers to the use of sequencing methods to simultaneously identify genomic material from all organisms present in a sample, with the advantage of greater taxonomic resolution than culture or other methods. Applications include pathogen detection and discovery, species characterisation, antimicrobial resistance detection, virulence profiling, and study of the microbiome and microecological factors affecting health. However, metagenomics involves complex and multistep processes and there are important technical and methodological challenges that require careful consideration to support valid inference. We co-ordinated a multidisciplinary, international expert group to establish reporting guidelines that address specimen processing, nucleic acid extraction, sequencing platforms, bioinformatics considerations, quality assurance, limits of detection, power and sample size, confirmatory testing, causality criteria, cost, and ethical issues. The guidance recognises that metagenomics research requires pragmatism and caution in interpretation, and that this field is rapidly evolving.
Year: 2020 PMID: 32768390 PMCID: PMC7406238 DOI: 10.1016/S1473-3099(20)30199-7
Source DB: PubMed Journal: Lancet Infect Dis ISSN: 1473-3099 Impact factor: 25.071
Figure 1: Sources of uncertainty diagram highlighting potential contributing sources
For simplicity, this figure considers the sequencing of DNA from an environment and does not consider the process beyond the data output from the sequencer. The arrows pointing towards the central black arrow show the experimental process from left to right and the sources of variability that could contribute uncertainty. Conceptually, it is clear how some of these factors contribute to systematic effects (bias). However, they also contribute to the random error (variance) that will influence the precision of a potential finding. QC=quality control.
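The distinction between systematic effects (bias) and random error (variance) can be illustrated with a minimal sketch. It assumes a hypothetical spike-in experiment in which the true read fraction of one organism is known and the same sample is processed in replicate. The function name `bias_and_variance` and all numbers are illustrative assumptions, not data from the guidelines.

```python
from statistics import mean, variance


def bias_and_variance(estimates, true_value):
    """Return (bias, sample variance) for replicate estimates of a known quantity.

    Bias is the mean estimate minus the true value (systematic effect).
    Variance is the sample variance across replicates (random error).
    """
    return mean(estimates) - true_value, variance(estimates)


# Hypothetical replicate read fractions for one organism whose true
# fraction in the sample is assumed to be 0.10 (illustrative values only).
replicates = [0.06, 0.08, 0.07, 0.09, 0.05]
bias, var = bias_and_variance(replicates, true_value=0.10)

print(f"bias = {bias:.3f}, variance = {var:.5f}")
```

In this toy setting, a consistently low mean would point to a systematic loss somewhere in the workflow, whereas the spread across replicates limits how precisely any single run can be interpreted.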
Figure 2: The importance of reference database choice, design, and versioning in taxonomic profiling of clinical metagenomics samples
(A) Schematic representation of a typical clinical metagenomics sample, with assigned species shown as coloured DNA and grey denoting DNA derived from the host, contaminants, unidentified taxa, or taxa sequenced at low depth. The pie chart shows the full metagenomic composition, and the bar shows the species composition excluding host DNA and contaminants. (B) Taxonomic profiling based on database 1. Species confidently assigned are highlighted by colours, with unassigned species shown in grey. Using database 1, species A, B, and D are correctly assigned. Species that are misassigned are outlined with a circle. In this instance, sequences from species C are assigned to the closely related species C' because species C has no representative in the reference database. Additionally, the reference database contains a partially contaminated sequence for species E, to which contaminant sequences in the test clinical metagenomics sample are misassigned. This affects the inference of species composition shown in the bar. (C) The addition of species F to database 2 allows assignment of a greater proportion of the species present in the original clinical metagenomics sample. Quality control and improvement of reference species E, now species E (QC), removes the spurious assignment of contaminant species. Species C is still misassigned to species C', its closest representative in the database. (D) Updating the reference database to include species C results in the correct assignment of sequences to species C rather than species C'. Species F is taxonomically reassigned to species X, leading to a change in the assigned species name despite no change in the data in the reference or query datasets. In all cases, the pink sequences present in the original clinical metagenomics sample remain unassigned because this species is not present in any of the three reference databases.
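The effect of database content and versioning described in panels B to D can be mimicked with a toy sketch. It is not a real classifier. It assumes each read's true origin is known and that a read from a species missing from the database is absorbed by a listed close relative if one is present. The read counts, the `closest_relative` mapping, and the function names are hypothetical, and the contaminated species E reference is left out for brevity.

```python
from collections import Counter


def assign(read_species, database, closest_relative):
    """Return the label a read receives, or 'unassigned'.

    read_species: true origin of the read (known only in this toy setting).
    database: dict mapping a reference species to the name it is reported under.
    closest_relative: dict mapping a species to a related species that may
        absorb its reads when the true species is missing from the database.
    """
    if read_species in database:
        return database[read_species]
    relative = closest_relative.get(read_species)
    if relative in database:
        return database[relative]
    return "unassigned"


def profile(reads, database, closest_relative):
    """Count the reads reported under each label for a given database."""
    return Counter(assign(r, database, closest_relative) for r in reads)


# Hypothetical sample: host and contaminant reads are assumed already removed.
reads = ["A"] * 4 + ["B"] * 3 + ["C"] * 2 + ["D"] * 1 + ["F"] * 2 + ["pink"] * 1
closest_relative = {"C": "C'"}

db1 = {"A": "A", "B": "B", "C'": "C'", "D": "D"}  # lacks C, F, and pink
db2 = {**db1, "F": "F"}                           # species F added
db3 = {**db2, "C": "C", "F": "X"}                 # C added, F renamed to X

p1 = profile(reads, db1, closest_relative)
p2 = profile(reads, db2, closest_relative)
p3 = profile(reads, db3, closest_relative)

for name, p in [("database 1", p1), ("database 2", p2), ("database 3", p3)]:
    print(name, dict(p))
```

The same reads yield three different profiles. Species C appears as C' until it is added to the database, species F changes name to X without any change in the data, and the pink reads stay unassigned throughout. This is why the database and its version need to be reported alongside any taxonomic result.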