Florent E Angly, Dana Willner, Forest Rohwer, Philip Hugenholtz, Gene W Tyson.
Abstract
We introduce Grinder (http://sourceforge.net/projects/biogrinder/), an open-source bioinformatic tool to simulate amplicon and shotgun (genomic, metagenomic, transcriptomic and metatranscriptomic) datasets from reference sequences. This is the first tool to simulate amplicon datasets (e.g. 16S rRNA) widely used by microbial ecologists. Grinder can create sequence libraries with a specific community structure, α and β diversities and experimental biases (e.g. chimeras, gene copy number variation) for commonly used sequencing platforms. This versatility allows the creation of simple to complex read datasets necessary for hypothesis testing when developing bioinformatic software, benchmarking existing tools or designing sequence-based experiments. Grinder is particularly useful for simulating clinical or environmental microbial communities and complements the use of in vitro mock communities.
Year: 2012 PMID: 22434876 PMCID: PMC3384353 DOI: 10.1093/nar/gks251
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Flowchart of the processes and parameters used by Grinder to generate related (A) amplicon and (B) shotgun libraries.
Figure 2. Grinder PCR amplicon selection process. All possible combinations of degenerate primer matches on the template DNA are considered. By default, Grinder will extract the shortest amplicon.
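The selection rule in the Figure 2 caption can be illustrated with a minimal Python sketch. This is not Grinder's own code (Grinder is written in Perl); the function names `find_amplicons` and `shortest_amplicon` are hypothetical, and the sketch assumes exact matching of IUPAC-degenerate primers on the forward strand only, with no mismatches allowed. It enumerates every pairing of a forward-primer site with a downstream reverse-primer site and keeps the shortest resulting amplicon.

```python
import re

# IUPAC degenerate nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also handles degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a (possibly degenerate) sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Compile a primer into a regex; the lookahead reports overlapping sites."""
    pattern = "".join(IUPAC[base] for base in primer.upper())
    return re.compile("(?=(" + pattern + "))")


def find_amplicons(template, forward, reverse):
    """List every amplicon bounded by a forward site and a reverse site.

    Assumption: `reverse` is given 5'->3', so its site on the template
    strand is its reverse complement.
    """
    template = template.upper()
    starts = [m.start() for m in primer_regex(forward).finditer(template)]
    rev_site = reverse_complement(reverse)
    ends = [m.start() + len(reverse)
            for m in primer_regex(rev_site).finditer(template)]
    min_len = len(forward) + len(reverse)  # primers must not overlap
    return [template[s:e] for s in starts for e in ends if e - s >= min_len]


def shortest_amplicon(template, forward, reverse):
    """Pick the shortest candidate amplicon, or None if there is no match."""
    amplicons = find_amplicons(template, forward, reverse)
    return min(amplicons, key=len) if amplicons else None
```

For example, on a toy template with two forward sites and two reverse sites, `find_amplicons` yields four candidates and `shortest_amplicon` returns the one spanning the innermost pair of sites.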
Comparison of existing sequencing read simulators
| Name | References | Lic. | Homepage | Lang. | Interf. | Dataset types | Paired-end | Sequencing technologies | Qual. scores | Distinguishing features |
|---|---|---|---|---|---|---|---|---|---|---|
| Grinder | Angly | GPL | sf.net/projects/biogrinder | Perl | CLI, API, GUI | Amplicon, (meta)genomic, (meta)transcriptomic | Yes | Sanger, 454, Illumina | Yes | Species abundance models, α and β diversity, MIDs, FASTQ output, multimeras, genome length and gene copy number bias |
| GemSIM | McElroy KE (unpublished data) | GPL | sf.net/projects/gemsim | Python | CLI | (Meta)genomic | Yes | Sanger, 454, Illumina | Yes | Haplotypes, FASTQ and SAM output |
| Mason | Holtgrewe | GPL | | C++ | CLI | Genomic | Yes | Sanger, 454, Illumina | Yes | Haplotypes, speed-focused |
| Flowsim | Balzer | GPL | biohaskell.org/Applications/FlowSim | Haskell | CLI | Genomic | No | 454 | Yes | Targets 454 simulation: SFF flowgram output, artificial replicates |
| MetaSim | Richter | Prop. | ab.inf.uni-tuebingen.de/software/metasim | Java | CLI, GUI | (Meta)genomic | Yes | Sanger, 454, Illumina | No | Genome evolution model |
| FASIM | Hur | Prop. | | C | CLI | Genomic | No | Sanger | No | Biased sampling model, chimeras, chromatograms |
| CelSim | Myers | Prop. | – | Awk, Perl | CLI | Genomic | No | Sanger | No | Repeat and variants generation |
| GenFrag | Engle and Burks | Prop. | – | C | CLI | Genomic | No | Sanger | No | First read simulator |
Lic, License; Prop, proprietary; Lang, Programming language; Interf, Interfaces; Sim, Simulation; Qual, Quality.
Figure 3. Analysis of 10 MID-containing 16S rRNA gene amplicon libraries generated by Grinder that share 20% of their phylotypes. (A) Histogram of read lengths for the libraries and curve representing their expected normal distribution. (B) Log-log plot of phylotype rank-abundance in the MID1 amplicon library, with and without simulated sequencing errors, using 97% and 100% identity for OTU clustering in QIIME. (C) Heatmap comparison of the OTU distribution in the amplicon libraries analyzed with QIIME at 97% identity OTU clustering.
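A rank-abundance curve such as the one in panel (B) can be derived from per-read OTU assignments along the following lines. This is only an illustrative Python sketch, not the QIIME or Grinder procedure behind the figure; the function name `rank_abundance` and the input format (one OTU label per read) are assumptions.

```python
from collections import Counter


def rank_abundance(otu_labels):
    """Return (rank, relative abundance) pairs, most abundant OTU first.

    `otu_labels` is assumed to hold one OTU label per read.
    """
    counts = Counter(otu_labels)
    total = sum(counts.values())
    ranked = sorted(counts.values(), reverse=True)
    return [(rank, n / total) for rank, n in enumerate(ranked, start=1)]
```

Plotting both columns on logarithmic axes gives a log-log rank-abundance view of the kind described in the caption.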