Breaking Barriers with Bread: Using the Sourdough Starter Microbiome to Teach High-Throughput Sequencing Techniques.

Benjamin H Holt^1, Alison Buchan^2, Jennifer M DeBruyn^2,3, Heidi Goodrich-Blair^2, Elizabeth McPherson^2, Veronica A Brown^2,4.

Abstract

Widespread use of high-throughput sequencing (HTS) in the life sciences has produced a demand for undergraduate and graduate institutions to offer classes exposing students to all aspects of HTS (sample acquisition, laboratory work, sequencing technologies, bioinformatics, and statistical analyses). Despite the increase in demand, many challenges exist for these types of classes. We advocate for the use of the sourdough starter microbiome for implementing meta-amplicon sequencing. The relatively small community, dominated by a few taxa, enables potential contaminants to be easily identified, while between-sample differences can be quickly assessed statistically. Finally, bioinformatic pipelines and statistical analyses can be carried out on personal student laptops or in a teaching computer lab. In two semesters adopting this system, 12 of 14 students were able to effectively capture the sourdough starter microbiome, using the instructor's paired sample as reference.

Keywords: high-throughput sequencing; microbiome; sourdough

Year: 2022 PMID: 36061316 PMCID: PMC9429883 DOI: 10.1128/jmbe.00306-21

Source DB: PubMed Journal: J Microbiol Biol Educ ISSN: 1935-7877

INTRODUCTION

Researchers in life science disciplines are increasingly addressing biological questions using high-throughput sequencing (HTS) of nucleic acids. While technological advancements and reduced costs have made HTS more accessible, many early career biologists lack familiarity with creating, handling, and analyzing large sequencing data sets (1). To help mitigate this barrier, academic institutions now offer classes that include HTS concepts and techniques, where students learn basic molecular biology skills, sequencing technologies, bioinformatics, and/or data analyses (2). One of the most common applications of HTS is meta-amplicon sequencing of microbiomes, that is, the assemblages of prokaryotes, fungi, and other microscopic eukaryotes associated with particular environments. Meta-amplicon sequencing is ideal for HTS teaching experiences because it offers students greater hands-on opportunities than genome sequencing, it is generally more cost efficient than transcriptomics, and open source bioinformatic pipelines are available for data analysis. Here, we advocate that analysis of the microbiome of sourdough starters is effective for teaching HTS meta-amplicon sequencing, expanding student knowledge regarding contributions of microbes to everyday lives, and generating data that advance the understanding of sourdough microbiome community structure. Sourdough starters comprise flour, water or milk, and a consortium of “wild” microbes used to leaven bread via CO2 production (3). These consortia are relatively simple communities, containing only a few fungal and prokaryote members, offering two key advantages: (i) small data sets to facilitate analysis and (ii) easy identification of contaminants and sequencing errors. Nutrient source, storage, and geographic regions contribute to differences in microbial composition among starters (4, 5), allowing students to apply bioinformatics and statistics to analyze these differences.
We have incorporated HTS meta-amplicon sequencing of sourdough microbiomes in an upper-level microbiology class at the University of Tennessee, Knoxville (UTK), where students carry out the entire HTS meta-amplicon process. The class has been held twice, with a total of 14 senior-level undergraduate and graduate students from different backgrounds, including life and agricultural sciences.

PROCEDURE

The workflow for 16S/ITS meta-amplicon sequencing follows the Illumina 16S Metagenomic Library Preparation protocol for Illumina MiSeq and contains four steps: library preparation, sequencing, bioinformatics, and analysis. Library preparation involves DNA extraction and a two-step polymerase chain reaction (PCR) indexing method with two post-PCR purification steps. Following sequencing, FASTQ files are run through a bioinformatic pipeline, using students’ personal computers or campus computer labs.

SAFETY ISSUES

Students and instructors wear lab coats and nitrile gloves during laboratory procedures: gloves prevent contamination of samples; lab coats are standard protocol for microbiology labs. Ethidium bromide (EtBr) is used during gel electrophoresis; EtBr is mutagenic and can cause severe skin, eye, and lung irritation. To minimize contact, gels are stained in a staining box containing diluted EtBr rather than by adding concentrated EtBr directly to the gels. Alternative nucleic acid stains (e.g., Midori Green) could be used to mitigate this risk.

METHODS

Forty-one sourdough starter samples were solicited from UTK representatives. Samples were stored at −20°C until use, and metadata, such as flour type and starter age, were recorded. To compare student results with those of a more experienced user, the instructor replicated every sample in parallel. DNA was extracted using the DNeasy PowerSoil Kit (Qiagen). Extracted DNA was amplified using fungal (ITS [6]) and prokaryotic (16S rRNA [7]) primers. Libraries were prepared and sequenced on the Illumina MiSeq at the UTK Genomics Core (Fig. 1). Students included extraction and PCR blanks consisting of water in the place of template. ZymoBIOMICS Microbial Community Standard (Zymo Research, Irvine, CA) served as a positive control for lab work and bioinformatics.

FIG 1

Outline of the lab work used in the high-throughput sequencing (HTS) class, which closely follows the Illumina 16S Metagenomic Library Preparation protocol.

Sequencing reads are automatically demultiplexed as they are processed from the UTK MiSeq such that each student receives their individual, respective sample sequences from the pooled sequencing reaction. Students then use Cutadapt to remove primer sequences before moving to an existing DADA2 bioinformatic pipeline (Steps 3–5, Fig. 2; Text S1 in the supplemental material) in R v4.0.3 (8, 9). R has gained popularity in data science due to a large support community, highly customizable syntax, open-source availability, ease of installation, and plethora of available packages (10, 11). The RStudio IDE makes visualizations and code annotations easy for students to grasp. After taxonomic identification of the resulting sequences is complete, students discuss the biological significance of their results and are encouraged to develop their own hypotheses for statistical analysis. Sequencing results are shared among students for diversity analyses. This portion could be easily expanded and incorporated into future classes focused on statistical analyses of HTS data. Students visualize quality profiles of sequencing reads to discuss the expected quality in HTS results.
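To make the idea of a read quality profile concrete, the following is a minimal sketch in Python, not the R/DADA2 code used in the class. It computes the mean Phred score at each read position from FASTQ text. The helper names (`read_fastq`, `mean_quality_per_cycle`) and the two toy reads are hypothetical, and the sketch assumes Phred+33 quality encoding and four-line FASTQ records.

```python
from io import StringIO
from statistics import mean


def read_fastq(handle):
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records."""
    while True:
        header = handle.readline().strip()
        if not header:
            break  # end of input
        seq = handle.readline().strip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().strip()
        yield header[1:], seq, qual


def mean_quality_per_cycle(records, offset=33):
    """Mean Phred score at each read position (Phred+33 encoding assumed)."""
    per_cycle = []
    for _, _, qual in records:
        for i, ch in enumerate(qual):
            if i == len(per_cycle):
                per_cycle.append([])
            per_cycle[i].append(ord(ch) - offset)
    return [mean(scores) for scores in per_cycle]


# Two made-up reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
toy_fastq = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\nII##\n"
profile = mean_quality_per_cycle(read_fastq(StringIO(toy_fastq)))
print(profile)
```

In this toy input the second read loses quality in its last two positions, so the mean score drops toward the 3′ end, which is the kind of pattern students would look for when discussing expected read quality.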

FIG 2

Outline of bioinformatic steps used in the high-throughput sequencing class. Major hindrances include the wide range of computational experience and variety of operating systems. In Step 6, OTU refers to operational taxonomic unit, while ASV refers to amplicon sequence variant, both of which are ways of clustering sequence variants.

Student experimental outcomes were evaluated by assessing the similarities between student and instructor samples (Table 1). Sequence contaminants associated with the instructor’s samples were removed using the “decontam” package in R to create an idealized sample (12, 13). Sequence variants in student samples absent from the instructor sample were flagged as contaminants. A student sample was deemed “good” if >75% of the reads matched reads in the instructor’s replicate. Of the 66 pairs of student–instructor samples, 10 samples, derived from 2 of 14 students, were dropped prior to the bioinformatic pipeline due to lab work issues (mixing up labels or lab errors). Six samples failed to meet the 75% match criterion, and 2 samples failed to provide quality sequence reads for both instructor and student. Of the 48 samples meeting the “good” criterion, the mean proportion of reads shared with the instructor was 98%, indicating that most (12 of 14) students effectively captured the sourdough starter microbiome of at least one of their samples. These 48 samples were used to assess α and β diversity questions (Text S1). Since molecular work often involves failed reactions, students were not graded on accuracy of results, and troubleshooting of issues was discussed in class.
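As an illustration of what the α and β diversity comparisons mentioned above involve, here is a minimal Python sketch. The text does not state which indices were used in class, so the Shannon index (α diversity) and Bray-Curtis dissimilarity (β diversity) are assumptions chosen because they are common; the function names, taxon labels, and counts are all hypothetical.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts.values())
    return -sum(
        (c / total) * math.log(c / total) for c in counts.values() if c > 0
    )


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 1 - 2 * sum(min counts) / (total_a + total_b)."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    return 1 - 2 * shared / (sum(a.values()) + sum(b.values()))


# Made-up read counts per taxon for two hypothetical starters.
starter_1 = {"taxon_A": 50, "taxon_B": 50}
starter_2 = {"taxon_A": 50, "taxon_C": 50}

alpha_1 = shannon(starter_1)              # within-sample diversity
beta_12 = bray_curtis(starter_1, starter_2)  # between-sample dissimilarity
print(alpha_1, beta_12)
```

Because sourdough communities contain only a few taxa, count tables like these stay small, so calculations of this kind run quickly on a student laptop.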

TABLE 1

Results of sourdough microbiome sequencing libraries from 14 students over two semesters

Student	Good/total^a	% reads detected in instructor’s pair^b	No. of SV not in instructor’s sample (potential contaminants)^b,c
1	1/2	99.4	41.0
2	1/3	99.2	28.0
3	6/6	98.3	48.3
4	4/4	98.6	46.5
5	8/8	99.5	17.9
6	7/8	99.6	26.3
7	6/6	98.7	45.3
8	2/2	87.2	23.5
9	3/3	99.4	41.7
10	6/6	98.9	25.5
11	5/5	98.6	50.2
12	1/3	99.4	55.0
13	0/4	NA—sample mix-up	NA
14	0/6	NA—amplifications failed	NA

^a A sample was deemed “good” if 75% or more reads were detected in the instructor’s paired sample.

^b Means are reported if the denominator of column two is greater than one.

^c SV refers to sequence variants, and the mean number of SV present in the student sample but absent from the instructor’s samples is reported. Importantly, while students often had sequences not detected in the paired instructor sample, these constituted a small proportion of total reads retained.
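The two quantities reported in Table 1 can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the authors' code: the function name `compare_to_instructor`, the sequence variant labels, and the read counts are hypothetical, and the threshold is applied as "75% or more", following the table footnote.

```python
def compare_to_instructor(student, instructor, threshold=0.75):
    """Compare a student sample with the instructor's paired sample.

    `student` and `instructor` map sequence variant (SV) labels to read counts.
    Returns the fraction of student reads whose SV also occurs in the
    instructor sample, whether the sample counts as "good", and the SVs
    found only in the student sample (potential contaminants).
    """
    total = sum(student.values())
    if total == 0:
        raise ValueError("student sample has no reads")
    matched = sum(n for sv, n in student.items() if sv in instructor)
    flagged = sorted(sv for sv in student if sv not in instructor)
    fraction = matched / total
    return {
        "fraction_matched": fraction,
        "good": fraction >= threshold,  # assumption: >= as in the footnote
        "flagged": flagged,
    }


# Made-up counts: SV3 appears only in the student's library.
student_sample = {"SV1": 900, "SV2": 80, "SV3": 20}
instructor_sample = {"SV1": 1000, "SV2": 50}

result = compare_to_instructor(student_sample, instructor_sample)
print(result)
```

In this toy case one sequence variant is flagged, yet it accounts for only a small share of the student's reads, mirroring the footnote's point that flagged sequences made up a small proportion of the reads retained.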

CONCLUSIONS

Sourdough starter microbiomes are an effective model system for teaching meta-amplicon sequencing and analysis of derived data. As these systems represent relatively low diversity microbiomes, contaminants are readily identifiable, allowing for easy assessment of student technical success. Furthermore, the bioinformatic pipeline and data analyses can be completed relatively quickly on personal laptops. Moreover, genetic variation is evident in these populations, providing opportunities for students to consider how seemingly subtle differences in nucleotide composition result in significant differences in community composition but not overall function. Students can easily relate to the sourdough starter as a part of everyday life, and this connection facilitates hypothesis development and interpretation of the outcomes of their microbiome sequencing projects.

9 in total

1. Sourdough products for convenient use in baking. (Review)

Authors: Markus J Brandt
Journal: Food Microbiol Date: 2006-09-08 Impact factor: 5.516

2. Biology: The big challenges of big data.

Authors: Vivien Marx
Journal: Nature Date: 2013-06-13 Impact factor: 49.962

3. DADA2: High-resolution sample inference from Illumina amplicon data.

Authors: Benjamin J Callahan; Paul J McMurdie; Michael J Rosen; Andrew W Han; Amy Jo A Johnson; Susan P Holmes
Journal: Nat Methods Date: 2016-05-23 Impact factor: 28.547

4. Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies.

Authors: Anna Klindworth; Elmar Pruesse; Timmy Schweer; Jörg Peplies; Christian Quast; Matthias Horn; Frank Oliver Glöckner
Journal: Nucleic Acids Res Date: 2012-08-28 Impact factor: 16.971

5. Microbes, metagenomes and marine mammals: enabling the next generation of scientist to enter the genomic era.

Authors: Robert Alan Edwards; John Matthew Haggerty; Noriko Cassman; Julia Christine Busch; Kristen Aguinaldo; Sowmya Chinta; Meredith Houle Vaughn; Robert Morey; Timothy T Harkins; Clotilde Teiling; Karin Fredrikson; Elizabeth Ann Dinsdale
Journal: BMC Genomics Date: 2013-09-04 Impact factor: 3.969

6. The Populus holobiont: dissecting the effects of plant niches and genotype on the microbiome.

Authors: M A Cregger; A M Veach; Z K Yang; M J Crouch; R Vilgalys; G A Tuskan; C W Schadt
Journal: Microbiome Date: 2018-02-12 Impact factor: 14.650

7. Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data.

Authors: Nicole M Davis; Diana M Proctor; Susan P Holmes; David A Relman; Benjamin J Callahan
Journal: Microbiome Date: 2018-12-17 Impact factor: 14.650

8. The diversity and function of sourdough starter microbiomes.

Authors: Elizabeth A Landis; Angela M Oliverio; Erin A McKenney; Lauren M Nichols; Nicole Kfoury; Megan Biango-Daniels; Leonora K Shell; Anne A Madden; Lori Shapiro; Shravya Sakunala; Kinsey Drake; Albert Robbat; Matthew Booker; Robert R Dunn; Noah Fierer; Benjamin E Wolfe
Journal: Elife Date: 2021-01-26 Impact factor: 8.140

9. phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data.

Authors: Paul J McMurdie; Susan Holmes
Journal: PLoS One Date: 2013-04-22 Impact factor: 3.240
