
Next generation sequencing data of a defined microbial mock community.

Esther Singer¹, Bill Andreopoulos¹, Robert M Bowers¹, Janey Lee¹, Shweta Deshpande¹, Jennifer Chiniquy¹, Doina Ciobanu¹, Hans-Peter Klenk², Matthew Zane¹, Christopher Daum¹, Alicia Clum¹, Jan-Fang Cheng¹, Alex Copeland¹, Tanja Woyke¹.

Abstract

Generating sequence data of a defined community composed of organisms with complete reference genomes is indispensable for benchmarking new genome sequence analysis methods, including assembly and binning tools. Moreover, the validation of new sequencing library protocols and platforms, which assesses critical components such as sequencing errors and biases, relies on such datasets. Here we report the next-generation metagenomic sequence data of a defined mock community (Mock Bacteria ARchaea Community; MBARC-26), composed of 23 bacterial and 3 archaeal strains with finished genomes. These strains span 10 phyla and 14 classes, cover a range of GC contents, genome sizes and repeat contents, and encompass a diverse abundance profile. Short-read Illumina and long-read PacBio SMRT sequences of this mock community are described. These data represent a valuable resource for the scientific community, enabling extensive benchmarking and comparative evaluation of bioinformatics tools without the need to simulate data. As such, these data can aid in improving our current sequence data analysis toolkit and spur interest in the development of new tools.


Year: 2016 PMID: 27673566 PMCID: PMC5037974 DOI: 10.1038/sdata.2016.81

Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444

Background & Summary

By definition, benchmark studies aim to provide standards that can be used to evaluate the performance of a process. The field of nucleic acid sequencing and sequence data processing has witnessed immense developments towards optimizing the balance of sequencing cost, precision and overall applicability to real-world questions. This progress has routinely relied on experimental setups of defined nature to critically rate novel approaches. In recent years, mock communities have assisted in a variety of laboratory and computational test experiments, resulting in quantitative and qualitative evaluation of the methods studied. For example, mock communities were generated for the comparison of DNA extraction methods[1-3], for the development of a dual-index sequencing and curation pipeline for Illumina MiSeq generated amplicon sequence data[4-8], and to evaluate the Ion Torrent sequencing platform for gene-targeted studies[9,10]. Similarly, Pabinger et al.[11-13] used a mock community to benchmark MEMOSys, a web-based platform for metabolic models. The Jumpstart Consortium Human Microbiome Project (HMP) Data Generation Working Group established a standardized protocol for ensuring high-throughput consistency of 16S rRNA gene amplification and sequencing protocols by implementing a synthetic mock community of 21 known organisms before finalizing their HMP 16S 454 protocol[14-16]. The HMP DNA and sequence data resources have enabled not only comprehensive characterization of the human microbiota, e.g.[17-19], but also the use and development of a variety of advanced analysis tools. For example, the chimera screening tools UCHIME and Chimera Slayer[1,3], the OTU construction pipeline UPARSE[4,6-8], and fine-tuned workflows for amplicon gene studies[9] used HMP data generated from mock communities.
In contrast to the HMP mock, the synthetic community described here, MBARC-26 (Mock Bacteria ARchaea Community), is composed of organisms isolated from heterogeneous soil and aquatic environments as well as from human, bovine and frog hosts (Table 1). MBARC-26 consists of 23 bacterial and 3 archaeal strains, belonging to the phyla Acidobacteria, Actinobacteria, Bacteroidetes, Deinococcus-Thermus, Firmicutes, (Alpha- and Gamma-)Proteobacteria, Spirochaetes, Thermotogae, Verrucomicrobia and Euryarchaeota. Genome sizes span 1.8–6.5 Mbp, GC contents range from 28.4% to 72.7%, and repeat content ranges from 0% to 18.3% (Fig. 1, Table 1). All genomes are available as finished genome sequences in GenBank (Table 1). MBARC-26 DNA was shotgun sequenced on the Illumina HiSeq 2000 and PacBio RSII sequencing platforms (Table 2). We provide detailed descriptions of organism characteristics (Table 1) and of sample processing, including DNA extraction and quantification, sequencing library creation, and sequencing procedures (Table 2). Data statistics encompass sequencing throughput characteristics (Table 2), community structure according to read mapping to reference genomes and according to molarity (Fig. 2, Supplementary Table 1, Supplementary Fig. 1), quantitative comparison between Illumina and PacBio datasets (Fig. 3a, Table 1, Supplementary Figs 2 and 3), % genome coverage and fold coverage by sequencing platform (Fig. 3b), and GC content analysis (Supplementary Fig. 3). Due to inherent sequencing technology differences[11,13], these two datasets are characterized by platform-, run mode-, and chemistry-specific read length, data throughput, GC and amplification bias, and error rate. We point out that our quantitative results depend directly on the respective sample preparation and sequencing methods used, as these have been shown to critically affect community representation[14,20].

Table 1

Genome statistics of each mock community member.

Organism	Isolation source	GenBank Accession ID	Genome size [bp]	GC [%]	% repeats	# of scaffolds	# of 16S copies
Genome size includes chromosomes and plasmids. All genomes are available as finished sequences. Phylum associations for each strain are abbreviated as follows: AD—Acidobacteria, AT—Actinobacteria, B—Bacteroidetes, D—Deinococcus-Thermus, E—Euryarchaeota, F—Firmicutes, P—Proteobacteria, S—Spirochaetes, T—Thermotogae, V—Verrucomicrobia. Isolation sources were obtained from literature on respective strains, where available. GC content is based on genome size. Genomes without NCBI repeat region annotation are denoted with an *.
Terriglobus roseus DSM 18391 (AD)	Soil	NC_018014	5227858	60.3	18.3	1	2
Corynebacterium glutamicum ATCC 13032 (AT)	Sewage	NC_003450	3309401	53.8	NA*	1	6
Nocardiopsis dassonvillei DSM 43111 (AT)	Soil	NC_014211	6543312	72.7	0.2	2	5
Olsenella uli DSM 7084 (AT)	Human gingival crevice	NC_014363	2051896	64.7	0.46	1	1
Segniliparus rotundus DSM 44985 (AT)	Human sputum	NC_014168	3157527	66.8	0.92	1	1
Echinicola vietnamensis DSM 17526 (B)	Seawater collected in a mussel farm	NC_019904	5608040	44.8	4.34	1	4
Meiothermus silvanus DSM 9946 (D)	Hot spring (50 °C)	NC_014212	3721669	62.7	6.54	3	2
Clostridium perfringens ATCC 13124 (F)	Bovine	NC_008261	3256683	28.4	2.02	1	20
Clostridium thermocellum ATCC 27405 (F)	Various	NC_009012	3843301	39	7.51	1	4
Desulfosporosinus acidiphilus SJ4 DSM 22704 (F)	Pond sediment	NC_018068	4991181	42.1	4.08	3	9
Desulfosporosinus meridiei DSM 13257 (F)	Aquifer groundwater	NC_018515	4873567	41.8	2.89	1	11
Desulfotomaculum gibsoniae DSM 7213 (F)	Freshwater mud	NC_021184	4855529	45.5	5.99	1	8
Streptococcus pyogenes M1 GAS SF370 (F)	Infected wound	NC_002737	1852441	38.5	NA*	1	6
Thermobacillus composti KWC4, DSM 18247 (F)	Composting reactor	NC_019897	4355525	60.1	7.14	2	5
Escherichia coli K-12, MG1655 (P)	Human stool	NC_000913	4639675	50.8	6.7	1	7
Frateuria aurantia DSM 6220 (P)	Lilium auratum	NC_017033	3603458	63.4	1.32	1	4
Hirschia baltica ATCC 49814 (P)	Brackish water	NC_012982	3540114	45.2	0.45	2	2
Pseudomonas stutzeri RCH2 (P)	Cr-contaminated aquifer	NC_019936	4600489	62.5	1.83	4	4
Salmonella bongori NCTC 12419 (P)	African frog	NC_015761	4460105	51.3	2.36	1	7
Salmonella enterica subsp. arizonae serovar RSK2980 (P)	Animal tissue	NC_010067	4600800	51.4	2.42	1	7
Spirochaeta smaragdinae DSM 11293 (S)	Oil field	NC_014364	4653970	49	2.01	1	2
Fervidobacterium pennivorans DSM 9078 (T)	Hot mud of spa	NC_017095	2166381	39	4.04	1	2
Coraliomargarita akajimensis DSM 45221 (V)	Seawater	NC_014008	3750771	53.6	1.07	1	2
Halovivax ruber XH-70 (E)	Saline lake	CP003050.1	3223876	64.3	NA*	1	2
Natronobacterium gregoryi SP2 (E)	Solar saltworks	NC_019792.1	3788356	62.2	4.22	1	3
Natronococcus occultus DSM 3396 (E)	Lake	NC_019974.1	4314118	64.7	0.91	3	4

Figure 1

Characteristics of MBARC-26 community.

Community members display diversity in phylogenetic distribution and relatedness (a), genome size (b), GC content (c), and repeat content normalized by genome size (d). Shades of the same color in (a) denote the same phylum association: Green—Proteobacteria, blue—Actinobacteria, purple—Firmicutes, yellow—Euryarchaeota.

Table 2

Sequence Statistics by sequencing platform.

Platform	Illumina	PacBio
Model	HiSeq-HO 2000	RS II
Library chemistry	TruSeq paired-end cluster kit v3	SMRTbell template preparation kit
Sequencing chemistry	TruSeq SBS sequencing kit 200 cycles v3	P4C2
Run mode	2x150	1x120 min
# of raw reads	355,875,608	300,584
# of filtered reads	347,963,988	53,654
Average insert size [bp]	219±43	1,041±576
Average quality score (filtered reads)	Read 1: 33.47, Read 2: 32.04	0.976

Figure 2

MBARC-26 community composition and relative abundance distribution, as based on Illumina and PacBio read mapping and mean DNA molarity.

Mock community members are grouped and arranged in order of % mapped sequences (Illumina). The observed discrepancy between molarity and % mapped PacBio and Illumina sequences for T. composti is likely due to contamination, as T. composti was previously found to occur as a laboratory contaminant in various shotgun metagenome datasets (unpublished data). The smaller discrepancies are expected given the spread in DNA quantification and platform biases. Colors denote phylum association as defined in Fig. 1.

Figure 3

Quantitative comparison of MBARC-26 Illumina and PacBio shotgun sequence datasets.

(a) Community representation according to % mapped sequences for each mock community member in the PacBio (x-axis) and Illumina (y-axis) shotgun sequence datasets. (b) Percent chromosome coverage and fold coverage of each mock community genome by sequencing platform using unassembled sequences. Colors denote phylum association as defined in Fig. 1.

To date, several studies have already used MBARC-26 and taken advantage of its purposefully selected characteristics. The availability of complete reference genomes and the relative abundance spread of individual constituents enabled the determination of lower limits of various metagenome library preparation protocols[14]. MBARC-26 was also used to develop a new full-length 16S rRNA gene amplicon sequencing protocol called PhyloTags[17], and allowed for quantitative comparison of amplicon to shotgun sequence data and evaluation of bias associated with GC content. Using the MBARC-26 Illumina metagenome dataset and corresponding single-cell sequence data, Bremges et al. developed MeCorS, a metagenome-enabled single-cell read correction tool[21]. To further encourage the use of this mock community, we report the release of molarity and shotgun sequence datasets of MBARC-26. Perpetual community efforts to develop improved DNA sequence analysis software with various applications for shotgun sequence data require standardized and well-characterized data for benchmark experiments. MBARC-26 was validated according to the specific sample processing tools using a variety of commonly used quality control methods, is accompanied by data statistics, and is meant to enable method development and evaluation while supporting reproducibility of research findings.

Methods

These methods are expanded from descriptions in our previous work[17].

Cultivation and DNA extraction

DNA from Escherichia coli, Salmonella bongori, Salmonella enterica, Clostridium perfringens, Clostridium thermocellum and Streptococcus pyogenes was purchased from the American Type Culture Collection (ATCC, Manassas, VA, USA). DNA from Fervidobacterium pennivorans, Thermobacillus composti and Corynebacterium glutamicum was extracted using phenol–chloroform extraction, as described previously[22]. DNA from Desulfosporosinus acidiphilus, Desulfosporosinus meridiei, Desulfotomaculum gibsoniae, Echinicola vietnamensis, Frateuria aurantia, Natronococcus occultus, Olsenella uli and Terriglobus roseus was isolated using the Jetflex Genomic DNA Purification Kit (Genomed GmbH, Loehne, Germany). DNA from Hirschia baltica was extracted using the Blood and Cell Culture DNA Maxi Kit (Qiagen, Valencia, CA, USA). DNA from Meiothermus silvanus, Nocardiopsis dassonvillei and Segniliparus rotundus was extracted using the Qiagen Genomic 500 DNA Kit (Qiagen, Hilden, Germany). DNA from Pseudomonas stutzeri was isolated using the Wizard Genomic DNA Purification Kit (Promega Corp., Madison, WI, USA). DNA from Coraliomargarita akajimensis, Halovivax ruber, Natronobacterium gregoryi and Spirochaeta smaragdinae was extracted using the Masterpure Gram-Positive DNA Purification Kit (Epicentre, Madison, WI, USA). All DNA extracts were quantified using the PicoGreen assay and the Qubit 2.0 fluorometer (Invitrogen, Carlsbad, CA, USA) (Supplementary Fig. 1). Each sample was quantified in quadruplicate. Samples were pooled at varying ratios to generate the mock community (Fig. 2, Supplementary Table 1).

Library creation and sequencing

For Illumina library creation, 100 ng of genomic DNA of MBARC-26, brought up to a total of 100 μl using TE, was sheared to 300 bp using the Covaris LE220 (Covaris, Inc., Woburn, MA, USA) and size-selected using AMPure XP beads (Beckman Coulter, Brea, CA, USA): 60 μl of beads were added to 100 μl of sample. The sample was then incubated at room temperature (RT) for 5 min. Beads were pelleted using a magnetic particle concentrator (MPC) (Thermo Fisher Scientific, South San Francisco, CA, USA) until the liquid was clear. The supernatant was removed and transferred to a new tube. 30 μl of AMPure XP beads were then added for the second bead size selection. The mixture was pulse vortexed, quickly spun and incubated at RT for 5 min. Beads were again pelleted using the MPC until the liquid was clear. The supernatant was then discarded without disturbing the beads and 200 μl of freshly prepared 75% ethanol (EtOH) was added, followed by a 30 s incubation to wash the beads. The EtOH was discarded and the wash step was repeated for a total of two washes. Afterwards, the sample was placed on a thermocycler (Eppendorf, Hamburg, Germany) with the lid open and incubated at 37 °C until the beads were dry and residual EtOH had evaporated. The beads were re-suspended in 53 μl of EB buffer (Qiagen, Redwood City, CA, USA), vortexed, quickly spun and incubated at RT for 1 min. Beads were pelleted using the MPC until the liquid was clear. 50 μl of supernatant was then transferred to a new tube. The DNA fragment size was assessed using the Agilent Bioanalyzer 2100 High Sensitivity Kit (Agilent Technologies, Palo Alto, CA, USA) before proceeding to end repair.
The fragments were treated with the Kapa Library Preparation Kit ORIGIN (Kapa Biosystems, Wilmington, MA, USA) for the following steps: For end-repair 26 μl MilliQ water, 9 μl 10X End Repair Buffer, and 5 μl End Repair Enzyme were combined in a 1.5 ml tube. The cocktail was vortexed and quickly spun, then stored on ice. 40 μl of End Repair cocktail was added to the 50 μl DNA sample. The mixture was vortexed and quickly spun, before incubation at 30 °C for 30 min in a thermocycler (Eppendorf, Hamburg, Germany). After incubation, 126 μl of AMPure XP beads (Beckman Coulter, Brea, CA, USA) were added to 90 μl of End Repair sample, pulse vortexed, quickly spun, and incubated at RT for 5 min. Beads were pelleted using a MPC until liquid was clear. The supernatant was then discarded without disturbing the beads. The beads were washed twice with 200 μl of freshly prepared 75% EtOH with an incubation time of 30 s. After washing, the sample was incubated at 37 °C in a thermocycler with the lid open until residual EtOH had evaporated. For DNA elution, 17.5 μl of EB buffer was added. The sample was vortexed, quickly spun, and incubated at RT for 1 min, before beads were pelleted on a MPC. 15 μl of supernatant was then transferred to a new tube. For A-tailing, 9 μl of MilliQ water, 3 μl of 10X A-Tailing Buffer and 3 μl of A-Tailing Enzyme were combined in this order in a 1.5 ml tube. The cocktail was vortexed and quickly spun. 15 μl of the A-Tailing cocktail was added to the 15 μl sample. The mixture was vortexed and quickly spun. The samples were then incubated in a thermocycler at 30 °C for 30 min, followed by 5 min at 70 °C. Adaptor ligation was immediately performed thereafter: 9 μl of 5X Ligation Buffer and 5 μl of ligase were combined in a 1.5 ml tube, vortexed and spun. The mixture was pulse vortexed and quickly spun. 
14 μl of adaptor ligation cocktail were added to the 30 μl sample, before 1 μl of 18 μM adaptor was added to the ligation mixture for a final concentration of 400 nM. The mixture was incubated in a thermocycler at 20 °C for 15 min. After adaptor ligation, 5 μl of EB Buffer was added to 45 μl of adaptor-ligated sample. The sample was size-selected and washed twice with 45 μl of AMPure XP beads as described previously. After the first clean-up step, the sample was eluted with 52 μl of EB Buffer and 45 μl of supernatant was transferred to a clean tube. After the second clean-up step, the sample was eluted with 25 μl of EB Buffer. 23 μl of supernatant was transferred to a clean tube. The sample was quality-controlled and quantified using an Agilent Bioanalyzer 2100 High Sensitivity Kit. The prepared Illumina library was further quantified by using the Kapa Biosystems next-generation sequencing library qPCR kit according to the manufacturer’s guidelines (Kapa Biosystems, Wilmington, MA, USA). The amplification products were run on a Roche LightCycler 480 real-time PCR instrument for quantification (Roche Holding AG, Basel, Switzerland). The quantified library was then prepared for sequencing on the Illumina HiSeq sequencing platform (Illumina, Inc., San Diego, CA, USA). First, the TruSeq paired-end cluster kit, v3, and Illumina’s cBot instrument were used to generate a clustered flowcell for sequencing (Illumina, Inc., San Diego, CA, USA). Sequencing of the flowcell was performed on the Illumina HiSeq2000 sequencer using a TruSeq SBS sequencing kit 200 cycles, v3, following a 2x150 indexed run recipe (Illumina, Inc., San Diego, CA, USA) (Table 2). This resulted in 355,875,608 raw reads. For PacBio library creation, 5 μg of gDNA was sheared using a Covaris LE220 to generate 2 kb fragments (Covaris, Inc., Woburn, MA, USA). The sheared DNA fragments were then prepared according to the SMRTbell template preparation kit guidelines (Pacific Biosciences, Menlo Park, CA, USA). 
Briefly, DNA fragments were treated with DNA damage repair mix, end-repaired, and 5’ phosphorylated. PacBio hairpin adapters were then ligated to the fragments to create SMRTbell templates for sequencing. The SMRTbell templates were purified using exonuclease treatments and size-selected using AMPure PB beads (Pacific Biosciences, Menlo Park, CA, USA) (Table 2). Sequencing primers were annealed and v. P4 sequencing polymerase was bound to the SMRTbell templates. The prepared SMRTbell template libraries were then sequenced on a Pacific Biosciences RSII sequencer using v. C2 chemistry and 1x120 min sequencing movie run times (Pacific Biosciences, Menlo Park, CA, USA). This resulted in 300,584 raw reads (Table 2).
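The adaptor concentration quoted for the Illumina ligation step can be checked with a simple dilution calculation (C1·V1 = C2·V2). The following is a minimal sketch using only the volumes stated in the protocol text; the function and variable names are illustrative and not part of the original protocol.

```python
def final_concentration_nM(stock_uM: float, stock_ul: float, total_ul: float) -> float:
    """Dilution C1*V1 = C2*V2; returns the final concentration in nM."""
    return stock_uM * 1000.0 * stock_ul / total_ul


# Volumes as stated in the protocol text.
sample_ul = 30.0             # 15 ul eluate + 15 ul A-tailing cocktail
ligation_cocktail_ul = 14.0  # 9 ul 5X Ligation Buffer + 5 ul ligase
adaptor_ul = 1.0
adaptor_stock_uM = 18.0

total_ul = sample_ul + ligation_cocktail_ul + adaptor_ul
adaptor_nM = final_concentration_nM(adaptor_stock_uM, adaptor_ul, total_ul)

print(f"total volume: {total_ul:.0f} ul, adaptor: {adaptor_nM:.0f} nM")
```

With a 45 μl reaction volume, 1 μl of 18 μM adaptor gives the 400 nM final concentration stated above.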

Sequence QC

Illumina shotgun reads were filtered using BBDuk (filterk=27, trimk=27; http://jgi.doe.gov/data-and-tools/bb-tools/) to remove Illumina adapters, known Illumina artifacts and phiX, and to quality-trim both ends to Q12. Resulting reads containing more than one ‘N’, or with quality scores (before trimming) averaging less than 8 over the read, or with a length under 40 bp after trimming, were discarded. Remaining reads were mapped to a masked version of human HG19, dog, cat, and mouse with BBMap (http://jgi.doe.gov/data-and-tools/bb-tools/), discarding all hits exceeding 93% identity. This resulted in 347,963,988 filtered reads with an average insert size of 219±43 bp. Quality filtering and error correction of PacBio sequences were performed using the RS_ReadsOfInsert protocol v. 2.3.0 in SMRT Portal (minimum subread length: 50 bp; minimum read quality: 75%). This resulted in 53,654 quality-filtered subreads with an average read length of 1,041±576 bp.
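To make the Illumina discard criteria concrete, the sketch below applies the three thresholds stated above (mean quality before trimming of at least 8, at most one ‘N’, at least 40 bp after trimming) to a single read. It is an illustration only and not BBDuk's implementation: the end-trimming rule is a deliberately simplified assumption, the mean is taken arithmetically over Phred scores, and adapter, artifact and contaminant removal are not modelled. All function names are hypothetical.

```python
from typing import List, Optional, Tuple


def trim_ends(seq: str, quals: List[int], min_q: int = 12) -> Tuple[str, List[int]]:
    """Simplified end trimming: drop bases from both ends while quality < min_q."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def filter_read(
    seq: str,
    quals: List[int],
    min_q: int = 12,
    max_n: int = 1,
    min_avg_q: float = 8.0,
    min_len: int = 40,
) -> Optional[Tuple[str, List[int]]]:
    """Return the trimmed read, or None if it fails any discard criterion."""
    # Mean quality is evaluated on the untrimmed read.
    if not quals or sum(quals) / len(quals) < min_avg_q:
        return None
    trimmed_seq, trimmed_quals = trim_ends(seq, quals, min_q)
    # Reads with more than one ambiguous base are discarded.
    if trimmed_seq.upper().count("N") > max_n:
        return None
    # Reads shorter than the minimum length after trimming are discarded.
    if len(trimmed_seq) < min_len:
        return None
    return trimmed_seq, trimmed_quals


# Toy reads (not real data): 5 low-quality leading bases are trimmed, read is kept.
kept = filter_read("A" * 50, [5] * 5 + [30] * 45)
# Toy read with two ambiguous bases: discarded.
dropped = filter_read("NN" + "A" * 48, [30] * 50)
print(len(kept[0]) if kept else None, dropped)
```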

Mapping, repeat regions, and phylogenetic tree construction

High-quality Illumina and PacBio sequences were mapped to their bacterial and archaeal reference genomes using BBMap (bbmap.sh with ambig=toss for Illumina; mapPacBio.sh with ambig=toss for PacBio). Numbers of mapped sequences were normalized to the respective whole genome and chromosome lengths of reference organisms (Supplementary Table 1). In the PacBio dataset, 2,105 (3.92%) and 3,777 (7.04%) sequences remained unmapped when mapped against genome and chromosome references, respectively. In the Illumina dataset, 8,981,844 (2.58%) and 18,088,260 (5.20%) sequences remained unmapped when mapped against genome and chromosome references, respectively. Repeat regions reported here were retrieved from NCBI GenBank[23] on May 16, 2016. They include tandem, inverted, flanking, terminal, direct and dispersed repeat types. For phylogenetic tree construction, full-length 16S rRNA gene sequences were aligned using the SINA aligner[24], including 10 neighbors at 95% minimum identity for classification against the SILVA, RDP, Greengenes, LTP, and EMBL databases[25]. The alignment was masked using the SILVA-compatible 1,349 Lane mask[26]. Tree construction was performed using FastTree[27].
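The unmapped percentages reported above follow directly from the filtered read counts given under Sequence QC, as the short sketch below reproduces. The second function illustrates one plausible reading of the length normalization (mapped reads per base, rescaled to fractions); the exact formula used for Supplementary Table 1 is not stated here, so that part is an assumption, and the counts and lengths in the toy example are hypothetical.

```python
from typing import Dict


def percent(part: int, total: int) -> float:
    """Share of `part` in `total`, as a percentage."""
    return 100.0 * part / total


def length_normalized_fractions(
    mapped: Dict[str, int], reference_bp: Dict[str, int]
) -> Dict[str, float]:
    """Assumed normalization: reads per reference base, rescaled to sum to 1."""
    density = {name: mapped[name] / reference_bp[name] for name in mapped}
    total = sum(density.values())
    return {name: value / total for name, value in density.items()}


# Filtered read counts and unmapped counts as reported in the text.
FILTERED = {"PacBio": 53_654, "Illumina": 347_963_988}
UNMAPPED = {
    ("PacBio", "genome"): 2_105,
    ("PacBio", "chromosome"): 3_777,
    ("Illumina", "genome"): 8_981_844,
    ("Illumina", "chromosome"): 18_088_260,
}

for (platform, reference), count in UNMAPPED.items():
    share = percent(count, FILTERED[platform])
    print(f"{platform} vs {reference} references: {share:.2f}% unmapped")

# Hypothetical toy values, not taken from the dataset.
toy_counts = {"organism_A": 200, "organism_B": 100}
toy_lengths = {"organism_A": 4_000_000, "organism_B": 1_000_000}
toy_fractions = length_normalized_fractions(toy_counts, toy_lengths)
print(toy_fractions)
```

In the toy example, organism_B has half as many mapped reads but a genome one quarter the size, so its length-normalized share is twice that of organism_A.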

Data Records

Filtered shotgun sequences generated on the Illumina and PacBio platforms are publicly available through NCBI (Data Citation 1 and Data Citation 2).

Technical Validation

To assess the quality of genomic DNA received, we used the PicoGreen assay and the Qubit 2.0 fluorometer (Invitrogen, Carlsbad, CA, USA). Each sample was quantified in quadruplicate. Samples were pooled at varying ratios to generate the mock community (Supplementary Fig. 1). Both shotgun sequence datasets were screened for adapters and artifacts, and filtered according to quality scores (Illumina: Q12; PacBio: 75%), number of ‘N’ bases, read length (Illumina: min 40 bp; PacBio: min 50 bp), and contaminant sequences related to human, dog, cat, and mouse.

Additional information

How to cite: Singer, E. et al. Next generation sequencing data of a defined microbial mock community. Sci. Data 3:160081 doi: 10.1038/sdata.2016.81 (2016).

24 in total

1. Organismal, genetic, and transcriptional variation in the deeply sequenced gut microbiomes of identical twins.

Authors: Peter J Turnbaugh; Christopher Quince; Jeremiah J Faith; Alice C McHardy; Tanya Yatsunenko; Faheem Niazi; Jason Affourtit; Michael Egholm; Bernard Henrissat; Rob Knight; Jeffrey I Gordon
Journal: Proc Natl Acad Sci U S A Date: 2010-04-02 Impact factor: 11.205

2. UPARSE: highly accurate OTU sequences from microbial amplicon reads.

Authors: Robert C Edgar
Journal: Nat Methods Date: 2013-08-18 Impact factor: 28.547

3. FastTree 2--approximately maximum-likelihood trees for large alignments.

Authors: Morgan N Price; Paramvir S Dehal; Adam P Arkin
Journal: PLoS One Date: 2010-03-10 Impact factor: 3.240

4. Evaluation of the Ion Torrent Personal Genome Machine for Gene-Targeted Studies Using Amplicons of the Nitrogenase Gene nifH.

Authors: Bangzhou Zhang; C Ryan Penton; Chao Xue; Qiong Wang; Tianling Zheng; James M Tiedje
Journal: Appl Environ Microbiol Date: 2015-04-24 Impact factor: 4.792

5. Evaluation of 16S rDNA-based community profiling for human microbiome research.

Authors:
Journal: PLoS One Date: 2012-06-13 Impact factor: 3.240

6. SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes.

Authors: Elmar Pruesse; Jörg Peplies; Frank Oliver Glöckner
Journal: Bioinformatics Date: 2012-05-03 Impact factor: 6.937

7. Comparison of DNA extraction methods for microbial community profiling with an application to pediatric bronchoalveolar lavage samples.

Authors: Dana Willner; Joshua Daly; David Whiley; Keith Grimwood; Claire E Wainwright; Philip Hugenholtz
Journal: PLoS One Date: 2012-04-13 Impact factor: 3.240

8. The advantages of SMRT sequencing.

Authors: Richard J Roberts; Mauricio O Carneiro; Michael C Schatz
Journal: Genome Biol Date: 2013-07-03 Impact factor: 13.583

9. MeCorS: Metagenome-enabled error correction of single cell sequencing reads.

Authors: Andreas Bremges; Esther Singer; Tanja Woyke; Alexander Sczyrba
Journal: Bioinformatics Date: 2016-03-15 Impact factor: 6.937

10. High-resolution phylogenetic microbial community profiling.

Authors: Esther Singer; Brian Bushnell; Devin Coleman-Derr; Brett Bowman; Robert M Bowers; Asaf Levy; Esther A Gies; Jan-Fang Cheng; Alex Copeland; Hans-Peter Klenk; Steven J Hallam; Philip Hugenholtz; Susannah G Tringe; Tanja Woyke
Journal: ISME J Date: 2016-02-09 Impact factor: 10.302
