Kyle D Brumfield, Anwar Huq, Rita R Colwell, James L Olds, Menu B Leddy.
Abstract
Microorganisms are ubiquitous in the biosphere, playing a crucial role in both the biogeochemistry of the planet and human health. However, identifying these microorganisms and defining their function are challenging. Two widely used approaches in comparative metagenomics, 16S amplicon sequencing and whole genome shotgun sequencing (WGS), have provided access to DNA sequencing analysis to identify microorganisms and evaluate diversity and abundance in various environments. However, advances in parallel high-throughput DNA sequencing in the past decade have introduced major hurdles, namely standardization of methods, data storage, reproducible interoperability of results, and data sharing. The National Ecological Observatory Network (NEON), established by the National Science Foundation, enables all researchers to address queries on a regional to continental scale around a variety of environmental challenges and provides high-quality, integrated, and standardized data from field sites across the U.S. As the amount of metagenomic data continues to grow, standardized procedures that allow results across projects to be assessed and compared are becoming increasingly important in the field of metagenomics. We demonstrate the feasibility of using publicly available NEON soil metagenomic sequencing datasets in combination with the open access Metagenomics Rapid Annotation using Subsystem Technology (MG-RAST) server to illustrate advantages of WGS compared to 16S amplicon sequencing. Four WGS and four 16S amplicon sequence datasets were selected for comparison; all came from surface soil samples collected at the same locations in Colorado between April and July 2014 and prepared by NEON investigators using standardized protocols. The dominant bacterial phyla detected across samples agreed between sequencing methodologies. However, WGS yielded greater microbial resolution and increased accuracy, and it allowed identification of more genera of bacteria, archaea, viruses, and eukaryota, as well as putative functional genes, that would have gone undetected using 16S amplicon sequencing. NEON open data will be useful for future studies characterizing and quantifying complex ecological processes associated with changing aquatic and terrestrial ecosystems.
Year: 2020 PMID: 32053657 PMCID: PMC7018008 DOI: 10.1371/journal.pone.0228899
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Whole genome and 16S amplicon metagenomic datasets examined in this study.
| MG-RAST ID | NEON Data Product ID | NCBI BioProject ID | Sequencing Method | Collection Date (M/D/Y) | Collection Location |
|---|---|---|---|---|---|
| mgm4637825.3 | NEON Soil Metagenomes (DP1.10107.001) | PRJNA406974 | WGS | 4/15/14 | 40°49'06.4"N |
| mgm4637821.3 | NEON Soil Metagenomes (DP1.10107.001) | PRJNA406974 | WGS | 4/15/14 | 40°49'06.3"N |
| mgm4637831.3 | NEON Soil Metagenomes (DP1.10107.001) | PRJNA406974 | WGS | 7/15/14 | 40°48'45.9"N |
| mgm4637826.3 | NEON Soil Metagenomes (DP1.10107.001) | PRJNA406974 | WGS | 7/16/14 | 40°49'06.4"N |
| mgm4783766.3 | NEON Soil Marker Gene Sequences (DP1.10108.001) | PRJNA393362 | 16S Amplicon | 4/15/14 | 40°51'02.8"N |
| mgm4783759.3 | NEON Soil Marker Gene Sequences (DP1.10108.001) | PRJNA393362 | 16S Amplicon | 4/15/14 | 40°51'02.9"N |
| mgm4778732.3 | NEON Soil Marker Gene Sequences (DP1.10108.001) | PRJNA393362 | 16S Amplicon | 7/15/14 | 40°51'03.0"N |
| mgm4778744.3 | NEON Soil Marker Gene Sequences (DP1.10108.001) | PRJNA393362 | 16S Amplicon | 7/16/14 | 40°49'02.6"N |
Sequence breakdown of quality predicted protein features, and total taxonomic hits of WGS and 16S amplicon sequencing samples included in this study. The Archaea through Other Sequences columns give the taxonomic hits distribution as hit counts, with relative abundance (%) in parentheses.
| MG-RAST ID | Sequencing Method | Sequence Count Post QC | Mean Sequence Length | Identified Protein Features | Identified rRNA Features | Total Taxonomic Hits | Archaea | Bacteria | Eukaryota | Viruses | Other Sequences |
|---|---|---|---|---|---|---|---|---|---|---|---|
| mgm4637825.3 | WGS | 11,623,197 | 158 ± 14 bp | 3,637,507 | 4,365 | 3,349,527 | 27,103 (0.81%) | 3,280,081 (97.93%) | 36,533 (1.09%) | 385 (0.01%) | 5,425 (0.16%) |
| mgm4637821.3 | WGS | 11,088,780 | 162 ± 16 bp | 3,575,354 | 3,603 | 3,285,741 | 24,348 (0.74%) | 3,236,387 (98.50%) | 19,219 (0.58%) | 324 (0.01%) | 5,463 (0.17%) |
| mgm4637831.3 | WGS | 5,704,956 | 162 ± 15 bp | 1,748,119 | 2,264 | 1,621,138 | 11,851 (0.73%) | 1,594,072 (98.33%) | 12,309 (0.76%) | 279 (0.02%) | 2,627 (0.16%) |
| mgm4637826.3 | WGS | 5,663,984 | 159 ± 14 bp | 1,823,419 | 2,588 | 1,679,821 | 11,619 (0.69%) | 1,651,457 (98.31%) | 14,006 (0.83%) | 240 (0.01%) | 2,499 (0.16%) |
| mgm4783766.3 | 16S | 2,827 | 253 ± 2 bp | N/A | 2,752 | 18,197 | 713 (5.57%) | 10,548 (82.43%) | 946 (7.39%) | 0 | 5,990 (4.61%) |
| mgm4783759.3 | 16S | 2,420 | 253 ± 2 bp | N/A | 2,765 | 9,728 | 631 (6.49%) | 8,262 (84.93%) | 491 (5.05%) | 0 | 644 (3.53%) |
| mgm4778732.3 | 16S | 5,132 | 253 ± 2 bp | N/A | 5,043 | 23,807 | 737 (3.10%) | 21,054 (88.44%) | 1,309 (5.50%) | 0 | 707 (2.96%) |
| mgm4778744.3 | 16S | 2,880 | 253 ± 3 bp | N/A | 3,643 | 10,860 | 393 (3.62%) | 9,626 (88.64%) | 657 (6.05%) | 0 | 184 (1.69%) |
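
The relative abundance reported for each domain in the table above is that domain's hit count divided by the sample's total taxonomic hits. The following is a minimal Python sketch of that arithmetic, using the counts from the first WGS row (mgm4637825.3). The function name `relative_abundance` is illustrative only; it is not part of MG-RAST or of the study's workflow.

```python
def relative_abundance(hits, total=None):
    """Return each category's share of the total hits as a percentage.

    hits  -- mapping of category name to hit count
    total -- total taxonomic hits; defaults to the sum of the counts
    """
    if total is None:
        total = sum(hits.values())
    if total <= 0:
        raise ValueError("total must be positive")
    # Round to two decimal places, as in the table.
    return {name: round(100 * count / total, 2) for name, count in hits.items()}


# Hit counts copied from the mgm4637825.3 (WGS) row of the table.
wgs_hits = {
    "Archaea": 27_103,
    "Bacteria": 3_280_081,
    "Eukaryota": 36_533,
    "Viruses": 385,
    "Other Sequences": 5_425,
}

total_hits = sum(wgs_hits.values())  # 3,349,527, the Total Taxonomic Hits value
percentages = relative_abundance(wgs_hits)

for domain, pct in percentages.items():
    print(f"{domain}: {pct:.2f}%")
```

For this row the sketch reproduces the published percentages (0.81%, 97.93%, 1.09%, 0.01%, and 0.16%). The same check can be run on any other row by swapping in its counts.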