Nicholas A Bokulich, Jai Ram Rideout, William G Mercurio, Arron Shiffer, Benjamin Wolfe, Corinne F Maurice, Rachel J Dutton, Peter J Turnbaugh, Rob Knight, J Gregory Caporaso.
Abstract
Mock communities are an important tool for validating, optimizing, and comparing bioinformatics methods for microbial community analysis. We present mockrobiota, a public resource for sharing, validating, and documenting mock community data resources, available at http://caporaso-lab.github.io/mockrobiota/. The materials contained in mockrobiota include data set and sample metadata, expected composition data (taxonomy or gene annotations or reference sequences for mock community members), and links to raw data (e.g., raw sequence data) for each mock community data set. mockrobiota does not supply physical sample materials directly, but the data set metadata included for each mock community indicate whether physical sample materials are available. At the time of this writing, mockrobiota contains 11 mock community data sets with known species compositions, including bacterial, archaeal, and eukaryotic mock communities, analyzed by high-throughput marker gene sequencing.
IMPORTANCE The availability of standard and public mock community data will facilitate ongoing method optimizations, enable comparisons across studies that share source data, increase transparency and access, and eliminate redundancy. These data sets are also valuable resources for bioinformatics teaching and training. This dynamic resource is intended to expand and evolve to meet the changing needs of the omics community.
Year: 2016 PMID: 27822553 PMCID: PMC5080401 DOI: 10.1128/mSystems.00062-16
Source DB: PubMed Journal: mSystems ISSN: 2379-5077 Impact factor: 6.496
Marker gene sequencing mock communities (MCs) currently available in mockrobiota
| Data set | Target region | Read length (nucleotides) | Method | Sample count(s) | Strain count | Original citation |
|---|---|---|---|---|---|---|
| Mock-1 | 16S | 100 | HiSeq | 1 E | 48 | |
| Mock-2 | 16S | 150 | MiSeq | 1 E | 48 | |
| Mock-3 | 16S | 250 | MiSeq | 2 E, 2 S | 22 | |
| Mock-4 | 16S | 150 | MiSeq | 2 E, 2 S | 22 | |
| Mock-5 | 16S | 250 | MiSeq | 2 E, 2 S | 22 | |
| Mock-6 | 16S | 100 | GAIIx | 3 E | 67 | |
| Mock-7 | 16S | 100 | HiSeq | 3 E | 67 | |
| Mock-8 | 16S | 100 | HiSeq | 3 E | 67 | |
| Mock-9 | ITS | 100 | HiSeq | 3 E | 16 | |
| Mock-10 | ITS | 100 | HiSeq | 3 E | 16 | |
| Mock-11 | 18S | 90 | HiSeq | 1 E | 12 | |
Target region: marker gene sequence target. 16S, 16S rRNA gene; ITS, internal transcribed spacer; 18S, 18S rRNA gene.
Sample count(s): number of MC samples contained in each MC data set. E, samples with even abundance ratios among strains; S, samples with staggered (uneven) abundance ratios.
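Choosing a data set from the table above amounts to filtering on a few columns. The following is a minimal sketch, assuming the rows have been transcribed by hand into Python tuples; `DATASETS`, `parse_sample_counts`, and `select` are hypothetical names for illustration and are not part of mockrobiota.

```python
import re

# Rows transcribed from the table above:
# (data set, target region, read length in nt, method, sample counts, strain count)
DATASETS = [
    ("Mock-1", "16S", 100, "HiSeq", "1 E", 48),
    ("Mock-2", "16S", 150, "MiSeq", "1 E", 48),
    ("Mock-3", "16S", 250, "MiSeq", "2 E, 2 S", 22),
    ("Mock-4", "16S", 150, "MiSeq", "2 E, 2 S", 22),
    ("Mock-5", "16S", 250, "MiSeq", "2 E, 2 S", 22),
    ("Mock-6", "16S", 100, "GAIIx", "3 E", 67),
    ("Mock-7", "16S", 100, "HiSeq", "3 E", 67),
    ("Mock-8", "16S", 100, "HiSeq", "3 E", 67),
    ("Mock-9", "ITS", 100, "HiSeq", "3 E", 16),
    ("Mock-10", "ITS", 100, "HiSeq", "3 E", 16),
    ("Mock-11", "18S", 90, "HiSeq", "1 E", 12),
]


def parse_sample_counts(text):
    """Turn a cell such as '2 E, 2 S' into {'E': 2, 'S': 2}."""
    return {kind: int(n) for n, kind in re.findall(r"(\d+)\s*([ES])", text)}


def select(datasets, target=None, min_read_length=0, need_staggered=False):
    """Return the names of data sets that satisfy every given criterion."""
    hits = []
    for name, tgt, length, _method, samples, _strains in datasets:
        counts = parse_sample_counts(samples)
        if target is not None and tgt != target:
            continue
        if length < min_read_length:
            continue
        if need_staggered and counts.get("S", 0) == 0:
            continue
        hits.append(name)
    return hits


# 16S data sets with reads of at least 250 nt that include staggered samples
print(select(DATASETS, target="16S", min_read_length=250, need_staggered=True))
# ['Mock-3', 'Mock-5']
```

The same helper can select by target alone, for example `select(DATASETS, target="ITS")` for the fungal ITS data sets.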
FIG 1 Example usage of mockrobiota MC resource for marker gene and metagenome sequencing pipelines. MC data sets are selected on the basis of multiple input criteria, including data set metadata, sample metadata, and represented taxa. Raw data (e.g., fastq) are demultiplexed, sequences are dereplicated or clustered as operational taxonomic units (OTUs) (marker gene data) or assembled/scaffolded to template genomes (metagenome data), and representative sequences are annotated (e.g., by taxonomy or gene). Observed taxonomic/gene annotations and abundances are compared to the expected composition (expected taxonomic assignments/gene annotations and abundances) of that MC, e.g., to generate precision and recall scores or correlations between observed and expected values. QC, quality control.
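The final comparison step in the caption above can be sketched in a few lines. This is a minimal illustration, assuming both profiles are available as dictionaries mapping a taxonomy string to a relative abundance. The function names are hypothetical, the `observed` values are invented for the example, and presence/absence precision and recall plus a Pearson correlation are only one possible choice of scores.

```python
from math import sqrt


def precision_recall(expected, observed):
    """Taxon-level precision and recall based on presence/absence."""
    exp_taxa = {t for t, a in expected.items() if a > 0}
    obs_taxa = {t for t, a in observed.items() if a > 0}
    true_pos = len(exp_taxa & obs_taxa)
    precision = true_pos / len(obs_taxa) if obs_taxa else 0.0
    recall = true_pos / len(exp_taxa) if exp_taxa else 0.0
    return precision, recall


def pearson_r(expected, observed):
    """Pearson correlation of abundances over the union of taxa."""
    taxa = sorted(set(expected) | set(observed))
    x = [expected.get(t, 0.0) for t in taxa]
    y = [observed.get(t, 0.0) for t in taxa]
    n = len(taxa)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant profile
    return cov / sqrt(var_x * var_y)


# Expected composition (shortened lineage strings for readability)
expected = {
    "g__Staphylococcus;s__aureus": 0.2,
    "g__Staphylococcus;s__epidermidis": 0.2,
    "g__Streptococcus;s__": 0.4,
    "g__Streptococcus;s__agalactiae": 0.2,
}
# Invented pipeline output: one expected taxon missed, one spurious taxon reported
observed = {
    "g__Staphylococcus;s__aureus": 0.25,
    "g__Staphylococcus;s__epidermidis": 0.15,
    "g__Streptococcus;s__": 0.5,
    "g__Escherichia;s__coli": 0.1,
}

precision, recall = precision_recall(expected, observed)
print(precision, recall)  # 0.75 0.75
print(pearson_r(expected, observed))
```

Precision penalizes the spurious taxon and recall penalizes the missed one, while the correlation reflects how closely the reported abundances track the expected ones.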
Example source composition
| Taxonomy | Sample 1 |
|---|---|
| | 0.200 |
| | 0.200 |
| | 0.200 |
| | 0.200 |
| | 0.200 |
Example expected composition, annotated with Greengenes 13_8 reference taxonomy
| Taxonomy | Sample 1 |
|---|---|
| k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Staphylococcaceae;g__Staphylococcus;s__aureus | 0.200 |
| k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Staphylococcaceae;g__Staphylococcus;s__epidermidis | 0.200 |
| k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus;s__ | 0.400 |
| k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus;s__agalactiae | 0.200 |
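Expected compositions in this format can be compared at a coarser taxonomic rank by truncating each lineage string and summing the abundances of rows that become identical. The sketch below assumes semicolon-delimited Greengenes-style lineages as in the table above; `collapse` is a hypothetical helper, not a mockrobiota function. The 0.400 row with an empty species label (`s__`) is consistent with the same kind of summing, with two members receiving the same annotation, although that reading is an inference from the tables.

```python
from collections import defaultdict

# Expected composition copied from the table above
EXPECTED = {
    "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Staphylococcaceae;g__Staphylococcus;s__aureus": 0.200,
    "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Staphylococcaceae;g__Staphylococcus;s__epidermidis": 0.200,
    "k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus;s__": 0.400,
    "k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus;s__agalactiae": 0.200,
}


def collapse(composition, level):
    """Sum abundances of lineages that agree on their first `level` ranks.

    level=1 keeps only the kingdom, level=6 keeps ranks down to genus,
    level=7 leaves seven-rank lineages unchanged.
    """
    totals = defaultdict(float)
    for lineage, abundance in composition.items():
        ranks = lineage.split(";")
        totals[";".join(ranks[:level])] += abundance
    return dict(totals)


genus_level = collapse(EXPECTED, 6)
for lineage, abundance in genus_level.items():
    # Print only the last retained rank and a rounded abundance
    print(lineage.split(";")[-1], round(abundance, 3))
```

Collapsing both the expected and the observed profile to the same rank before scoring avoids penalizing a pipeline for species-level labels that the reference taxonomy cannot resolve.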