
Turning Participatory Microbiome Research into Usable Data: Lessons from the American Gut Project.

Justine W Debelius¹, Yoshiki Vázquez-Baeza², Daniel McDonald¹, Zhenjiang Xu¹, Elaine Wolfe¹, Rob Knight³.

Abstract

The role of the human microbiome is the subject of continued investigation resulting in increased understanding. However, current microbiome research has only scratched the surface of the variety of healthy microbiomes. Public participation in science through crowdsourcing and crowdfunding microbiome research provides a novel opportunity for both participants and investigators. However, turning participatory science into publishable data can be challenging. Clear communication with the participant base and among researchers can ameliorate some challenges. Three major aspects need to be considered: recruitment and ongoing interaction, sample collection, and data analysis. Usable data can be maximized through diligent participant interaction, careful survey design, and maintaining an open source pipeline. While participatory science will complement rather than replace traditional avenues, it presents new opportunities for studies in the microbiome and beyond.


Year: 2016 PMID: 27047589 PMCID: PMC4798814 DOI: 10.1128/jmbe.v17i1.1034

Source DB: PubMed Journal: J Microbiol Biol Educ ISSN: 1935-7877

INTRODUCTION

The human microbiome is a poorly understood, but critical, component of health. Community structure is influenced by many factors, including genetics, diet, and xenobiotic and antibiotic use (4, 5, 9, 15). The gut microbiome, in particular, plays an important role in metabolism, immune development, and endocrine and neurological signaling (10, 16). Dysbiotic gut communities have been associated with a host of human diseases including obesity, inflammatory bowel disease, type I and type II diabetes, autism, multiple sclerosis, and malnutrition (3, 16). The gut microbiome can predict risk for conditions like Crohn’s disease (8). Fecal material transplant may also transmit clinical phenotypes in some cases: one report suggested a donor transmitted a risk for obesity to her human recipient along with her stool, while trans-species transmission of obesity is well established (1, 17). Human microbiome work has primarily focused on case-control studies of a few dozen to a few hundred individuals. Budget restrictions and strict disease focus by funding agencies often limit the size and scope of investigation. Although studies supported by traditional mechanisms have led to considerable advances, there are also major pitfalls with the traditional approach. Small cohorts create inconsistent observations among studies. Trends in community structure are often shared between studies, but the individual taxa driving these trends often are not. For example, studies have found correlations between obesity and both an increase and a decrease in Methanobrevibacter smithii (19). Meta-analysis can ameliorate inconsistencies due to data analysis, although it cannot correct for differences due to sample handling or the characteristics of the control and clinical groups (14). The problem is compounded by the absence of effective, mathematically justified ways to quantify effect size or the signal-to-noise landscape in the microbiome. 
Even previous efforts to define “healthy” microbiomes have been small compared with cohorts used for other types of studies, such as genome-wide association studies. The Human Microbiome Project (HMP) focused on 252 healthy professional students in their twenties and thirties living in two regions of the United States (11). The HMP contributed valuable information about the microbiome, including the variation in taxonomic abundance in healthy adults and the lack of a core healthy microbiome. However, the HMP did not answer all the open questions about the healthy microbiome. For instance, the cohort was not well suited to describe how the microbial communities change between age groups or what a healthy microbiome looks like relative to dietary or lifestyle choices. Public participation in microbiome research, through crowdsourcing and crowdfunding, may provide some potential solutions to these problems. Both models transform science into a public, participatory area, rather than a practice for experts in semi-isolation. Crowdfunding involves the public in science by asking for a monetary investment in a project. Lay people can determine what they consider worthy or unworthy of funding, whether it be comparative studies of the cat microbiome (https://fundrazr.com/campaigns/410aC4/ab/f4vYF9?) or a qualitative survey of the best burritos in San Francisco (https://experiment.com/projects/qualitative-survey-of-burritos-in-san-francisco). Crowdsourcing engages the public in collecting the data to be analyzed. Individuals participate by contributing data or samples. 
This typically involves the contribution of observational data, such as bird sightings or flu symptoms, in projects like Flu Near You (https://flunearyou.org), but may also involve crowdsourced data collection, modeled by the Personal Genome Project (www.personalgenomes.org), or even crowdsourcing data analysis, through platforms like the online games Foldit (https://fold.it/portal/) and EteRNA (http://eterna.cmu.edu/web/). Crowdsourcing may open opportunities to access populations, areas, or information that are difficult for a finite group of researchers to access. It also offers opportunities in exploratory science: the wealth of data allows for a degree of exploration that can be more difficult in traditionally sourced studies, where participant recruitment is more focused. The American Gut Project (www.americangut.org) is a crowdfunded, crowdsourced microbiome project run through the University of California at San Diego, which was initiated as a collaboration between the Earth Microbiome Project and the Human Food Project. Participants provide a physical sample (fecal, oral, skin, pet, or environmental), answer a survey about their health, lifestyle, and diet, and make a monetary contribution that covers the cost of microbial DNA sequencing. Individual participants receive a report describing their results. De-identified data are also deposited in a public repository. We have used American Gut to draw conclusions about factors that affect participant health in the human microbiome (Debelius, McDonald, et al., in preparation). Here, we present three stages that have been important for aggregating the American Gut results and presenting usable data.

CRITICAL CONSIDERATIONS FOR GETTING USEFUL DATA FROM CROWDSOURCING

Communication is central to successful science, especially crowdsourced science. There is an added complexity in disseminating research to the general public, because complex concepts must be translated into messages that can be readily digested by individuals without specific domain knowledge. The inherent difficulty is magnified in participatory science, as there is a continual interaction with members of the general public. The challenge of communication can be broken down into three major areas critical to crowdfunding: participant recruitment and retention, data collection (both sample and metadata collection and quality), and data dissemination.

Participant recruitment and retention

The first area, recruitment and retention, is a crowdsourced project’s initial and primary interaction. At the outset, members of the public are unlikely to be interested in your project if they are unable to understand why you are doing the project in the first place. They also want to know how they benefit by participating. In the case of the American Gut Project, one of our goals was to provide an avenue through which members of the general public could engage in cutting-edge research and, in turn, learn about the organisms that inhabit their bodies. Participatory science can be self-selecting, and this may create a biased cohort, rather than a true representation of the population. Gut microbiome research, for example, may be more likely to attract individuals with diagnosed gastrointestinal conditions, such as inflammatory bowel disease (IBD). In the American Gut, we see a six-fold enrichment in participants with IBD compared with the US population as a whole (Debelius, McDonald et al., in preparation). Sponsors have also contributed funds to provide kits for participants in other populations of interest, including children with autism spectrum disorder. These sub-studies may lead to an understanding of compositional patterns associated with these specific populations. Other, less explicit biases may also appear in the data. The role of the Internet in participatory science cannot be discounted, meaning that participation is likely linked to Internet access (6). Coupling crowdfunding to crowdsourcing may limit the participant population to those able to afford the cost. The financial burden may also create self-selection, even for those with the available disposable income. Many of the early American Gut participants were individuals who emphasized the importance of diet in health, and therefore tended toward more extreme dietary choices. 
These implicit biases in the population may be hard to identify, and harder to correct (although they are less important for the original goal of the project in terms of identifying the diversity of types of microbiome “out there in the wild”). Decoupling crowdsourcing and crowdfunding, at least for some cohorts, may help ameliorate some biases in data. A second important interaction with participants arises when, inevitably, participants have questions. Mismanagement of the participant base can be a major reason projects fail (7). The help burden stems not just from the number of questions coming in, but also from the number of personnel hours necessary to answer these questions. Given the nature of crowdfunding, the rate at which a project will grow is not known in advance, which makes scoping personnel effort difficult and risky (e.g., if the project “fails”). In microbiome research, the potential for health discoveries adds a new level of complexity. Participants and backers may choose to engage in a project in which the research personally benefits them. For the American Gut Project, despite all our efforts at dispelling the notion that the data generated have current medical value, we still frequently receive questions along the lines of “I have condition X. Given my microbiome, what do you recommend I do?”

Sample collection and quality

Once participants are recruited, the next major hurdle is collecting their data. In microbiome studies, this typically involves a physical sample, or set of physical samples, and information about the participant and sample. Physical sample collection poses a challenge for biologically-based projects. The sampling protocol needs to be simple and safe. However, even simple protocols can be complicated for novices without clear instructions. The unfortunate reality is that people are bad at following instructions (for example, we anticipate that few of the readers of this article read their cell phone manual cover to cover). Explicit, succinct, and engaging instructions are vital to minimize variability in how instructions are followed. To this end, the American Gut Project took two approaches. The first is an eye-catching “quick instructions” sheet that gives a rundown of the necessary steps. In addition, detailed instructions are provided, including video examples on the website. During the course of the project so far, it has been necessary to revise the instructions, based on feedback from participants, and address obvious issues with sample collection. Notably, we discovered that the amount of fecal matter to send in was ambiguous, leading us to provide graphic examples of good and bad samples. As we refined the instructions, we encountered fewer questions, and higher quality samples were returned.

Metadata collection and quality

The human microbiome is contextually dependent, making it impossible to understand a microbiome community without information about its host (12, 18). Therefore, participant and sample metadata (i.e., contextual information) are also an important consideration in participatory microbiome research. The goal of metadata collection is to maximize the amount of accurate, usable data that can be collected for every sample. Survey design and implementation can support or impede this end. Although it is possible to analyze a few dozen free response fields for a small number of samples, it is prohibitive to analyze large numbers of free-response fields for large numbers of samples. Free response fields are also more likely to contain human error: in the American Gut dataset, individuals have reported chicken as their most common carbohydrate, which would be surprising if true (standard nutritional data for chicken breast report zero carbohydrates). Questions with controlled vocabulary, such as multiple-choice questions or fields limited to accept bounded numeric responses, can help improve accuracy. It may also be important to consider the level of detail that is possible to record in a survey. Controlled vocabulary represents one of these trade-offs. Another is the decision of whether or not to pursue information about a specific medical condition. The American Gut has addressed these issues with triggered response questions, condition-specific surveys, and the option to follow up with participants. Metadata errors are inevitable—whether in self-reported data or well-funded clinical studies (2). There are two major considerations with error reporting: how the errors are identified and the way the errors are corrected or removed. Identifying obvious errors can be easy. In the American Gut, participants who reported birth dates prior to the start of the twentieth century were identified as obvious errors. 
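As a rough illustration of the survey-design points above, the following Python sketch checks one response against a controlled vocabulary and a bounded numeric field. The field names (`diet_type`, `birth_year`), the vocabulary, and the exact bounds are hypothetical and are not taken from the American Gut survey; only the idea of flagging birth dates before the twentieth century comes from the text.

```python
# Hypothetical controlled vocabulary; not the actual American Gut survey options.
DIET_TYPES = {"omnivore", "vegetarian", "vegan", "other"}

# Assumed lower bound: the text flags birth dates before the twentieth century.
MIN_BIRTH_YEAR = 1900


def validate_response(response: dict, survey_year: int) -> list:
    """Return a list of human-readable problems found in one survey response."""
    problems = []

    # Controlled vocabulary: only accept answers from the fixed set.
    diet = response.get("diet_type")
    if diet not in DIET_TYPES:
        problems.append(f"diet_type {diet!r} is not in the controlled vocabulary")

    # Bounded numeric field: birth year must fall in a plausible range.
    birth_year = response.get("birth_year")
    if not isinstance(birth_year, int):
        problems.append("birth_year is missing or not an integer")
    elif not MIN_BIRTH_YEAR <= birth_year <= survey_year:
        problems.append(
            f"birth_year {birth_year} is outside {MIN_BIRTH_YEAR}-{survey_year}"
        )

    return problems
```

In practice such checks would more likely sit in the survey form itself, so that an out-of-range answer is rejected at entry rather than found during analysis.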
There are also profound differences between adult microbial communities based on body site, which can help when participants forget which sample was collected on which swab (11). However, other errors can be more difficult to identify. In certain American Gut analyses, we noticed that alcohol had a larger effect than antibiotic use, and that infants (birth to three years of age) had microbiomes that were more diverse than those of older children, in contrast with previous publications (20). When we examined the infant data further, we identified several individuals with age listed as less than three years of age but self-reported height over four feet and reported drinking more than once a week, leading us to question the age data. In a large dataset, it can be useful to remove clearly erroneous information, especially if the correct answer is difficult to determine. Age values that are likely incorrect, given the rest of the contextual information, are therefore removed from analysis within the American Gut data. Mislabeled body sites can be corrected, even against a high background mislabeling rate, using a supervised learning technique, due to the strength of the association between body site and community structure (13). The same associations may be true for other parameters as we continue to collect data.
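The cross-field check described above (a reported age under three years alongside a height over four feet or regular alcohol use) can be sketched in Python as follows. This is an illustrative sketch only: the field names (`age_years`, `height_cm`, `alcohol_per_week`) and the record layout are assumptions, and the thresholds simply restate the example in the text, not the project's actual filtering rules.

```python
FEET_TO_CM = 30.48

# Thresholds restate the example in the text; they are not the project's actual rules.
MAX_INFANT_AGE_YEARS = 3
MAX_PLAUSIBLE_INFANT_HEIGHT_CM = 4 * FEET_TO_CM


def age_is_suspect(record: dict) -> bool:
    """Flag a reported infant age that conflicts with other fields in the record."""
    age = record.get("age_years")
    if age is None or age >= MAX_INFANT_AGE_YEARS:
        return False
    height = record.get("height_cm")
    too_tall = height is not None and height > MAX_PLAUSIBLE_INFANT_HEIGHT_CM
    # "More than once a week" is modeled as a weekly count above 1 (assumed encoding).
    drinks = (record.get("alcohol_per_week") or 0) > 1
    return too_tall or drinks


def drop_suspect_ages(records: list) -> list:
    """Return copies of the records with implausible ages replaced by None."""
    cleaned = []
    for record in records:
        copy = dict(record)
        if age_is_suspect(copy):
            copy["age_years"] = None
        cleaned.append(copy)
    return cleaned
```

Setting the age to `None` and keeping the rest of the record matches the approach in the text, where the doubtful value is removed from analysis and no attempt is made to guess a corrected one.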

Data analysis and dissemination

Data dissemination and communication are the final step in the scientific process. In a traditional scientific model, this has taken the form of publication in grant reports, scientific journals, and the deposition of data to repositories. Participatory science opens questions about data ownership, dissemination, and communication. Rather than delivering results to a grant committee of peers, scientists instead must communicate results to a wider community. In crowdsourced projects, individualized results may be offered as an incentive for participation. When the project focuses on characterizing human biology, it may be challenging to balance providing novel results with avoiding presenting information that could be interpreted as a medical diagnosis. In crowdfunded projects, regular updates showing progress are important to continued investment and re-investment (7); for a scientific project, this can mean everything from a blog with regular updates to a public release of data and analysis techniques. Providing aggregated crowdsourced data to the general public can also crowdsource the analysis. It sends a clear message that the data are owned by the public. Large datasets present opportunities for exploration, new technique development, and technique refinement. Providing the dataset to a collaborator network early on fosters opportunities for new analyses and directions. Collaborations that play on the strengths and expertise of each group can accelerate the rate of discovery. Making the full dataset available through open access mechanisms early in the analysis process is one of the simplest ways to disseminate data to multiple collaborators at a variety of institutions. However, data release can raise privacy concerns. Institutional Review Board (IRB) protocols must make it clear how participants’ de-identified data can and will be used. 
Participants’ de-identified microbial DNA sequence data and per-sample and per-individual metadata will be made publicly available if that is a goal of the project. Releasing data into repositories without monitoring may make dissemination easier, but it can also mean that after participants withdraw, their data cannot be retracted. Additionally, extensive care has to be taken to avoid compromising the anonymity of the participants. Such steps include separating clearly identifying participant data from survey information; limiting access to raw survey answers; and removing identifying information from publicly available survey results, even inadvertently identifying information. To this end, the surveyed data must be validated against possible identification threats; for example, a combination of date of birth and zip code could provide an attacker with the identified personal information of a participant.
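One way to screen survey data for the kind of identification threat described above is a k-anonymity-style uniqueness check: count how many records share each combination of quasi-identifying fields and flag those whose combination is rare. The sketch below is a generic illustration and not the procedure used by the American Gut Project; the field names (`dob`, `zip`) and the threshold `k` are assumptions.

```python
from collections import Counter


def risky_records(records, quasi_identifiers, k=2):
    """Return records whose quasi-identifier combination is shared by fewer than k records."""

    def key(record):
        # Build a hashable key from the selected fields (missing fields become None).
        return tuple(record.get(field) for field in quasi_identifiers)

    counts = Counter(key(record) for record in records)
    return [record for record in records if counts[key(record)] < k]
```

Records flagged this way could then be generalized before release (for example, reporting an age range in place of a date of birth) or withheld from the public dataset.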

PROSPECTS

Crowdfunding and crowdsourcing, while powerful ways to fund projects, recruit participants, and raise public awareness and interest, are novel approaches and have their own pitfalls. The nature of a crowdfunded project requires different approaches from traditional study designs and considerations, especially with respect to public relations and communication. Defining the intention and standing of the project is vital when individuals have a personal and financial stake. Communication of the project expectations, what participants can expect to receive, and progress of the project and of the participants’ specific samples, especially if there is a waiting period between financial contribution and tangible results, cannot be overlooked. The participants themselves also must be considered. The topic of the crowdfunded research project is almost certainly expected to draw in a specific subset of the population, leading to potentially biased sampling. The financial aspect of participation may exclude an additional subset due to inability to afford participation (although this can be ameliorated by supplementing crowdfunding with philanthropic contributions and/or foundation support). Additionally, how to reduce and respond to errors in the data must be considered. Data dissemination, in the form of individualized results, and sharing analysis tasks can also benefit or hinder projects. In summary, citizen science provides a new opportunity for microbiome research. While it is unlikely to replace grant funding from government and private agencies, it may act as an additional mechanism for answering questions that are difficult to explore through traditional means.

References (18 in total)

Review 1. Microbial endocrinology: the interplay between the microbiota and the endocrine system.

Authors: Hadar Neuman; Justine W Debelius; Rob Knight; Omry Koren
Journal: FEMS Microbiol Rev Date: 2015-02-19 Impact factor: 16.408

Review 2. The Inadmissibility of What We Eat in America and NHANES Dietary Data in Nutrition and Obesity Research and the Scientific Formulation of National Dietary Guidelines.

Authors: Edward Archer; Gregory Pavela; Carl J Lavie
Journal: Mayo Clin Proc Date: 2015-06-09 Impact factor: 7.616

3. Human genetics shape the gut microbiome.

Authors: Julia K Goodrich; Jillian L Waters; Angela C Poole; Jessica L Sutter; Omry Koren; Ran Blekhman; Michelle Beaumont; William Van Treuren; Rob Knight; Jordana T Bell; Timothy D Spector; Andrew G Clark; Ruth E Ley
Journal: Cell Date: 2014-11-06 Impact factor: 41.582

4. Xenobiotics shape the physiology and gene expression of the active human gut microbiome.

Authors: Corinne Ferrier Maurice; Henry Joseph Haiser; Peter James Turnbaugh
Journal: Cell Date: 2013-01-17 Impact factor: 41.582

Review 5. Interactions between the microbiota and the immune system.

Authors: Lora V Hooper; Dan R Littman; Andrew J Macpherson
Journal: Science Date: 2012-06-06 Impact factor: 47.728

6. Gut microbiota from twins discordant for obesity modulate metabolism in mice.

Authors: Vanessa K Ridaura; Jeremiah J Faith; Federico E Rey; Jiye Cheng; Alexis E Duncan; Andrew L Kau; Nicholas W Griffin; Vincent Lombard; Bernard Henrissat; James R Bain; Michael J Muehlbauer; Olga Ilkayeva; Clay F Semenkovich; Katsuhiko Funai; David K Hayashi; Barbara J Lyle; Margaret C Martini; Luke K Ursell; Jose C Clemente; William Van Treuren; William A Walters; Rob Knight; Christopher B Newgard; Andrew C Heath; Jeffrey I Gordon
Journal: Science Date: 2013-09-06 Impact factor: 47.728

7. Structure, function and diversity of the healthy human microbiome.

Authors:
Journal: Nature Date: 2012-06-13 Impact factor: 49.962

8. Human gut microbiome viewed across age and geography.

Authors: Tanya Yatsunenko; Federico E Rey; Mark J Manary; Indi Trehan; Maria Gloria Dominguez-Bello; Monica Contreras; Magda Magris; Glida Hidalgo; Robert N Baldassano; Andrey P Anokhin; Andrew C Heath; Barbara Warner; Jens Reeder; Justin Kuczynski; J Gregory Caporaso; Catherine A Lozupone; Christian Lauber; Jose Carlos Clemente; Dan Knights; Rob Knight; Jeffrey I Gordon
Journal: Nature Date: 2012-05-09 Impact factor: 49.962

9. Unlocking the potential of metagenomics through replicated experimental design.

Authors: Rob Knight; Janet Jansson; Dawn Field; Noah Fierer; Narayan Desai; Jed A Fuhrman; Phil Hugenholtz; Daniel van der Lelie; Folker Meyer; Rick Stevens; Mark J Bailey; Jeffrey I Gordon; George A Kowalchuk; Jack A Gilbert
Journal: Nat Biotechnol Date: 2012-06-07 Impact factor: 54.908

10. Meta-analyses of human gut microbes associated with obesity and IBD.

Authors: William A Walters; Zech Xu; Rob Knight
Journal: FEBS Lett Date: 2014-10-13 Impact factor: 4.124
