Casper S Poulsen¹, Claus T Ekstrøm², Frank M Aarestrup¹, Sünje J Pamp¹,³
Abstract
Metagenomics is increasingly used to describe microbial communities in biological specimens. Ideally, the steps involved in processing the specimens should not alter the microbiome composition in a way that could lead to false interpretations of the inferred microbial community composition. Common steps in sample preparation include sample collection, storage, DNA isolation, library preparation, and DNA sequencing. Here, we assess the effect of three library preparation kits and two DNA sequencing platforms. Of the library preparation kits, one involved a PCR step (Nextera), and two were PCR free (NEXTflex and KAPA). We sequenced the libraries on the Illumina HiSeq and NextSeq platforms. As example microbiomes, we assessed two pig fecal samples and two sewage samples, aliquots of which were either processed immediately or stored at −80°C before processing. All DNA isolations were performed in duplicate, totaling 80 samples, excluding controls. We found that both library preparation and sequencing platform had systematic effects on the inferred microbial community composition. The sequencing platform introduced more variation than either library preparation or freezing of the samples. The results highlight that all sample processing steps need to be considered when comparing studies. Standardization of sample processing is key to generating comparable data within a study, and comparisons of differently generated data, such as in a meta-analysis, should be performed cautiously.
IMPORTANCE Previous research has reported effects of sample storage conditions and DNA isolation procedures on metagenomics-based microbiome composition; however, the effect of library preparation and DNA sequencing in metagenomics has not been thoroughly assessed. Here, we provide evidence that library preparation and sequencing platform introduce systematic biases in the metagenomics-based characterization of microbial communities. These findings suggest that library preparation and sequencing are important parameters to keep consistent when aiming to detect small changes in microbiome community structure. Overall, we recommend that all samples in a microbiome study be processed in the same way to limit unwanted variation that could lead to false conclusions. Furthermore, if we are to obtain more holistic insight from microbiome data generated around the world, we will need to provide more detailed sample metadata, including information about the sample processing procedures, together with the DNA sequencing data in public repositories.
Keywords: DNA sequencing; library preparation; metadata; metagenomics; microbial communities; microbiome
Year: 2022 PMID: 35289669 PMCID: PMC9045301 DOI: 10.1128/spectrum.00090-22
Source DB: PubMed Journal: Microbiol Spectr ISSN: 2165-0497
FIG 1 Study design and comparison between sample groups. (A) Two pig feces samples and two sewage samples were processed either directly or after storage at −80°C for 64 h. DNA isolation was performed in duplicate for each sample and storage condition. Library preparation and sequencing were performed in four different combinations: NEXTflex PCR-Free library preparation with sequencing on a HiSeq (NEXTflex HiSeq), KAPA PCR-free library preparation with sequencing on a HiSeq (KAPA HiSeq), NEXTflex PCR-Free library preparation with sequencing on a NextSeq (NEXTflex NextSeq), and Nextera library preparation with sequencing on a NextSeq (Nextera NextSeq). The latter strategy was performed twice (Nextera 1 NextSeq and Nextera 2 NextSeq). The setup resulted in a total of 80 metagenomes plus 5 negative controls (i.e., DNA extraction controls). (B) Boxplots display pairwise Aitchison distances between different groupings of samples. Within each group, dots representing the distances are colored according to the sample within which the comparison was made. Blue dots represent distances between two different samples.
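The factorial design in panel A can be enumerated to check the metagenome count. The Python sketch below assumes the factors are fully crossed, as the legend describes; the short labels are illustrative rather than the authors' identifiers, and the 5 negative controls are not included.

```python
from itertools import product

# Hypothetical labels for the design factors described in the FIG 1 legend.
SAMPLES = ["pig feces 1", "pig feces 2", "sewage 1", "sewage 2"]
STORAGE = ["direct", "-80C for 64 h"]
REPLICATES = [1, 2]  # duplicate DNA isolations per sample and storage condition
RUNS = [  # (library preparation, sequencing platform); Nextera was run twice
    ("NEXTflex", "HiSeq"),
    ("KAPA", "HiSeq"),
    ("NEXTflex", "NextSeq"),
    ("Nextera 1", "NextSeq"),
    ("Nextera 2", "NextSeq"),
]


def design():
    """Return one (sample, storage, replicate, library, platform) tuple per metagenome."""
    return [
        (sample, storage, rep, lib, platform)
        for sample, storage, rep, (lib, platform) in product(SAMPLES, STORAGE, REPLICATES, RUNS)
    ]


metagenomes = design()
# 4 samples x 2 storage conditions x 2 isolations = 16 DNA extracts, each run 5 ways
print(len(metagenomes))  # prints 80
```

Of the 80 metagenomes, 32 come from the two HiSeq strategies and 48 from the three NextSeq strategies, so platform and library kit are not fully crossed: Nextera appears only on the NextSeq and KAPA only on the HiSeq.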
FIG 2 Principal-component analysis (PCA) performed separately for each sample matrix. Euclidean distances were calculated after centered log-ratio (CLR) transformation of the count data (Aitchison distances). The variance explained by the first two axes is given in the axis labels. Points representing the same DNA sample processed in different ways are connected with dotted lines.
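The Aitchison distance is the ordinary Euclidean distance between CLR-transformed count vectors, where CLR subtracts the mean log count (the log of the geometric mean) from each log count. A minimal Python sketch, assuming a pseudocount of 1 to handle zero counts; the legend does not state how zeros were treated, and the function names are hypothetical.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's count vector.

    The pseudocount avoids log(0); the default of 1.0 is an assumption
    for illustration, not the authors' setting.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [x - mean_log for x in logs]


def aitchison(a, b, pseudocount=1.0):
    """Aitchison distance: Euclidean distance between the CLR vectors of two samples."""
    ca, cb = clr(a, pseudocount), clr(b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))


# Two samples with the same relative composition but different sequencing depth
print(aitchison([10, 20, 30], [20, 40, 60], pseudocount=0.0))  # ~0 (floating-point noise)
```

Because CLR works on log ratios, multiplying every count in a sample by a constant (for example, deeper sequencing of the same library) leaves the CLR vector unchanged when no pseudocount is needed; adding a pseudocount makes this invariance only approximate.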
Effect of sample origin (pig feces 1, pig feces 2, sewage 1, and sewage 2) and of sample processing parameters (storage, library preparation, DNA sequencing platform) on the inferred microbial community composition
| Sample(s) included | Sample origin | Storage | Library preparation | Sequencing platform |
|---|---|---|---|---|
| All | <10⁻⁵ (81.9) | 6.4 × 10⁻² (0.5) | 4.2 × 10⁻² (1.0) | 3.0 × 10⁻⁴ (1.8) |
| Pig feces | <10⁻⁵ (21.1) | 3.8 × 10⁻³ (3.3) | 5.7 × 10⁻⁴ (6.2) | <10⁻⁵ (19.1) |
| Sewage | <10⁻⁵ (61.7) | 2.5 × 10⁻² (2.9) | 3.0 × 10⁻² (4.1) | 4.4 × 10⁻³ (4.5) |
| Pig feces 1 | NA | 2.8 × 10⁻³ (9.7) | 2.8 × 10⁻² (8.9) | <10⁻⁵ (26.2) |
| Pig feces 2 | NA | 0.17 (2.7) | 5.4 × 10⁻³ (12.3) | <10⁻⁵ (25.3) |
| Sewage 1 | NA | <10⁻⁵ (15.1) | 3.6 × 10⁻⁴ (14.4) | <10⁻⁵ (12.8) |
| Sewage 2 | NA | <10⁻⁵ (14.0) | 6.0 × 10⁻⁵ (17.8) | <10⁻⁵ (19.6) |
Statistical tests were performed by permutational multivariate analysis of variance (PERMANOVA), which partitions the sum of squared distances among the explanatory variables and assesses significance by permuting the samples. Each cell gives the P value for the parameter, tested on different sample sets (all, pig feces, sewage, pig feces 1, pig feces 2, sewage 1, and sewage 2).
Values in parentheses are the proportion of the variation (%) explained by the parameter in the PERMANOVA.
NA, not applicable; no P value can be obtained for sample origin when the data are subset to a single sample (pig feces 1, pig feces 2, sewage 1, or sewage 2).
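For a single grouping factor, the PERMANOVA statistic can be computed directly from a distance matrix such as the Aitchison distances. The Python sketch below is a minimal one-factor version using the standard sums-of-squares formulation for distance matrices: it returns the pseudo-F ratio, R² (the proportion of variation explained, shown as a percentage in the table), and a permutation P value. It is not the authors' implementation; how the several factors in the table were combined in one model is not described here, and the function names are hypothetical.

```python
import random


def _scaled_ss(dist, idx):
    """Sum of squared pairwise distances among the samples in idx, divided by len(idx)."""
    total = 0.0
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            total += dist[idx[a]][idx[b]] ** 2
    return total / len(idx)


def pseudo_f(dist, groups):
    """One-factor PERMANOVA pseudo-F and R^2 from a square distance matrix."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    ss_total = _scaled_ss(dist, list(range(n)))
    ss_within = sum(
        _scaled_ss(dist, [i for i, g in enumerate(groups) if g == lab]) for lab in labels
    )
    ss_between = ss_total - ss_within
    f = (ss_between / (a - 1)) / (ss_within / (n - a))
    return f, ss_between / ss_total  # R^2 = share of variation explained by the factor


def permanova(dist, groups, permutations=999, seed=0):
    """Permutation P value: fraction of label shuffles with pseudo-F >= observed."""
    rng = random.Random(seed)
    f_obs, r2 = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled)[0] >= f_obs:
            hits += 1
    # The observed labelling counts as one permutation, so P is never exactly 0.
    return f_obs, r2, (hits + 1) / (permutations + 1)


if __name__ == "__main__":
    # Toy data: six points on a line forming two well-separated groups.
    points = [0, 1, 2, 10, 11, 12]
    dist = [[abs(x - y) for y in points] for x in points]
    groups = ["A", "A", "A", "B", "B", "B"]
    f, r2, p = permanova(dist, groups)
    print(f"pseudo-F = {f:.1f}, R2 = {100 * r2:.1f}%, P = {p:.3f}")
```

On the toy data R² is about 97%, yet P stays near 0.1: only 2 of the 20 ways to split six samples into two groups of three reproduce the observed separation, so very small designs cannot reach very small P values.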
FIG 3 Heatmaps of the 30 most abundant genera, shown separately for pig feces and sewage samples. Complete-linkage clustering was used to build dendrograms for both genera and samples; genera were clustered by Spearman correlation and samples by Aitchison distance. Cell values are CLR-transformed genus counts standardized to zero mean and unit variance. Genus names are annotated according to cell wall structure as Gram positive (G+) or Gram negative (G−), or as belonging to the Archaea (Ar). (A) Heatmap of all pig feces samples, in which the first branching of the sample dendrogram separates the samples by sequencing platform. The third cluster of genera contained only Gram-negative genera. (B) Heatmap of all sewage samples. The fourth cluster consisted mainly of Gram-positive genera, although a few Gram-positive genera were also present in the other clusters. For an explanation of the colors, see panel A.
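The heatmap preprocessing can be sketched in Python under assumptions the legend does not state: each genus row (already CLR-transformed) is standardized with the sample standard deviation (n − 1), the distance between genera is taken as 1 − Spearman's ρ, and a naive complete-linkage loop stands in for whatever clustering routine the authors used. All function names are hypothetical.

```python
import math


def zscore(values):
    """Standardize one genus row to zero mean and unit variance (n - 1 denominator assumed)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks (rows must not be constant)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def genus_distances(rows):
    """Hypothetical genus-genus distance matrix: 1 - Spearman's rho."""
    n = len(rows)
    return [[0.0 if i == j else 1.0 - spearman(rows[i], rows[j]) for j in range(n)]
            for i in range(n)]


def complete_linkage(dist):
    """Naive agglomerative clustering; returns merges as (cluster_a, cluster_b, height).

    Leaves are numbered 0..n-1 and each merged cluster gets the next free id.
    """
    clusters = {i: [i] for i in range(len(dist))}
    merges = []
    next_id = len(dist)
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for ai in range(len(keys)):
            for bi in range(ai + 1, len(keys)):
                a, b = keys[ai], keys[bi]
                # Complete linkage: the distance between the two farthest members.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges
```

Because Spearman correlation depends only on ranks, the per-row standardization changes the displayed colors but not the genus dendrogram.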