Literature DB >> 32566740

Characterizing dysbiosis of gut microbiome in PD: evidence for overabundance of opportunistic pathogens.

Zachary D Wallen1, Mary Appah1, Marissa N Dean1, Cheryl L Sesler1, Stewart A Factor2, Eric Molho3, Cyrus P Zabetian4, David G Standaert1, Haydeh Payami1.   

Abstract

In Parkinson's disease (PD), gastrointestinal features are common and often precede the motor signs. Braak and colleagues proposed that PD may start in the gut, triggered by a pathogen, and spread to the brain. Numerous studies have examined the gut microbiome in PD; all found it to be altered, but found inconsistent results on associated microorganisms. Studies to date have been small (N = 20 to 306) and are difficult to compare or combine due to varied methodology. We conducted a microbiome-wide association study (MWAS) with two large datasets for internal replication (N = 333 and 507). We used uniform methodology when possible, interrogated confounders, and applied two statistical tests for concordance, followed by correlation network analysis to infer interactions. Fifteen genera were associated with PD at a microbiome-wide significance level, in both datasets, with both methods, with or without covariate adjustment. The associations were not independent, rather they represented three clusters of co-occurring microorganisms. Cluster 1 was composed of opportunistic pathogens and all were elevated in PD. Cluster 2 was short-chain fatty acid (SCFA)-producing bacteria and all were reduced in PD. Cluster 3 was carbohydrate-metabolizing probiotics and were elevated in PD. Depletion of anti-inflammatory SCFA-producing bacteria and elevated levels of probiotics are confirmatory. Overabundance of opportunistic pathogens is an original finding and their identity provides a lead to experimentally test their role in PD.
© The Author(s) 2020.

Entities:  

Keywords:  Genomics; Parkinson's disease

Year:  2020        PMID: 32566740      PMCID: PMC7293233          DOI: 10.1038/s41531-020-0112-6

Source DB:  PubMed          Journal:  NPJ Parkinsons Dis        ISSN: 2373-8057


Introduction

Parkinson’s disease (PD) is a common, progressive, and debilitating disease, which currently cannot be prevented or cured. With the exception of rare genetic forms, the cause of PD is unknown. Many susceptibility loci[1] and environmental risk factors[2] have been identified, but each has a modest effect on risk and none is sufficient to cause disease. Gene–environment interaction studies have not been able to identify a causative combination[3-6]. The triggers that cause PD are unknown. The emerging information about the importance of the gut microbiome in human health and disease[7], together with the well-established connection between PD and the gut including common and early occurrence of constipation[8], inflammation[9], and increased gut membrane permeability[10], have raised the possibility that microorganisms in the gut may play a role in PD pathogenesis and prompted a fast growing literature on studies conducted in humans and animal models[11-30]. Every study that has compared the global composition of the gut microbiome in PD vs. controls found it to be significantly altered; in contrast, attempts to identify PD-associated microorganisms have produced inconsistent results[31,32]. Low reproducibility has been attributed to small sample sizes (missing true associations due to low power), relaxed statistical thresholds (inflating false-positive results), and publishing without a replication dataset (required for genomic studies). Differences in methods of sample collection, transportation and storage, DNA extraction, sequencing, bioinformatics, and statistics can all contribute to inter-study variations. The choice of taxonomic resolution for analysis (PD has been tested at all levels from phylum to species) and the inconsistent taxonomic assignments and nomenclature used in various reference databases add to the confusion when comparing results. Last but not least is confounding by heterogeneity in the populations that were studied: PD is heterogenous and so is the microbiome. PD subtypes cannot be readily identified; thus, patient populations are inevitably varied. A myriad of factors can affect the microbiome ranging from diet, health, and medication to cultural habits, lifestyles, race, and geography[33,34]. Identifying microorganisms involved in the dysbiosis of the microbiome is essential for understanding their role in disease. We conducted a hypothesis-free microbiome-wide association study (MWAS) modeled after and using the standards of rigors that are used in genome-wide association studies (GWAS), but with analytic methods that are appropriate for the high-dimensionality and compositionality of the microbiome data. We used two datasets to allow internal replication. The sample sizes in prior PD-microbiome studies have ranged from 10 to 197 PD cases and 10 to 130 controls[32]. The largest published study (197 cases and 130 controls) is the dataset 1 in the present study, re-analyzed here with a more advanced bioinformatics pipeline than we previously published[16]. In addition, we present an unpublished independent dataset with 323 cases of PD and 184 controls, analyzed in parallel to dataset 1. Two large datasets allowed for internal replication and power to detect both rare and common signals. We standardized data collection and processing as much as possible across the two datasets, and for variations that could not be handled in study design, we used statistical techniques to make appropriate adjustments. We used two different statistical tests for MWAS and focused only on results that were reproducibly significant across methods and across datasets. We employed correlation network analysis to infer interactions among PD-associated microorganisms. We were able to confirm some of the previously reported associations with common taxa, and, in addition, identified associations with rare microorganisms that are commensal, but can become opportunistic pathogens in immune-compromised hosts.

Results

Dramatic difference between datasets

We discovered a remarkable difference between the two datasets, despite efforts to standardize data collection and analysis (Fig. 1). All subjects lived in the United States. Diagnosis, subject selection, and data collection were performed by the NeuroGenetics Research Consortium (NGRC) investigators at the four NGRC-affiliated movement disorder clinics, using standardized methods. Dataset 1 (212 PD and 136 controls) was collected in Seattle, WA, Albany, NY, and Atlanta, GA, in 2014. Dataset 2 (323 PD and 184 controls) was collected in Birmingham, AL, during 2015–2017. We used uniform protocols for sample collection, transportation, and storage for the two datasets. Stool was collected using the same kit, DNA was extracted using the same chemistry, and the 16S rRNA gene V4 region was sequenced using the same primers, but in different laboratories, resulting in 10× greater sequence depth in dataset 2 than dataset 1. The same pipeline was used on the two datasets to process the sequences and assign taxonomic classification. Yet, principal component analysis (PCA)[35] revealed the composition of the microbiome of the samples to be strikingly different in the two datasets (Fig. 1) and the difference was statistically significant (P < 1E − 5, tested using permutational multivariate analysis of variance (PERMANOVA)). The separation of datasets was evident in cases and in controls, in the same pattern. Greater sequence depth in dataset 2 was a significant contributor to this disparity, but not the sole explanation, because the difference between datasets was still significant once sequence depth was adjusted for (PERMANOVA P < 1E − 5). For all statistical tests (global composition, MWAS, correlations, and network analysis), the two datasets were analyzed separately for two reasons: (i) for independent validation and (ii) to avoid confounding by mixing two clearly different datasets.
Fig. 1

The gut microbiome compositions of the two dataset differed significantly.

Principal component (PC) analysis was used to generate the graphs for PD cases (left, N = 522), controls (middle, N = 316), and cases and controls combined (right, N = 838), where each point represents the composition of the gut microbiome of one individual and distances indicate degree of similarity to other individuals. Percentages on the x-axis and y-axis correspond to the percent variation in gut microbiome compositions explained by PC1 and PC2. The difference between dataset 1 and dataset 2 was formally tested using PERMANOVA and was significant (P < 1E-5). Dataset 1: red (Albany, NY), purple (Seattle, WA), and green (Atlanta, GA). Dataset 2: blue (Birmingham, AL).

The gut microbiome compositions of the two dataset differed significantly.

Principal component (PC) analysis was used to generate the graphs for PD cases (left, N = 522), controls (middle, N = 316), and cases and controls combined (right, N = 838), where each point represents the composition of the gut microbiome of one individual and distances indicate degree of similarity to other individuals. Percentages on the x-axis and y-axis correspond to the percent variation in gut microbiome compositions explained by PC1 and PC2. The difference between dataset 1 and dataset 2 was formally tested using PERMANOVA and was significant (P < 1E-5). Dataset 1: red (Albany, NY), purple (Seattle, WA), and green (Atlanta, GA). Dataset 2: blue (Birmingham, AL).

Metadata and confounders

Metadata were collected using two self-administered questionnaires and medical records (Supplementary Table 1). An Environmental and Family History Questionnaire (EFQ)[4,36] was used to collect data relevant to PD. A Gut Microbiome Questionnaire (GMQ)[16] was completed immediately after stool collection and gathered data relevant to the microbiome including diet, gastrointestinal problems, medical conditions, and use of medications. PD medications that subjects were taking at the time of stool collection were extracted from medical records by clinical investigators. The aim of this study was to identify reproducible signals of association between PD and microbiota, and to that end, metadata were used as potential confounders, not as research questions. For example, we did not set out to test the effects of constipation, levodopa, or any of the 47 variables listed in Supplementary Table 1 on the microbiome, because, although of interest, that was not the primary aim of the study and doing so would have reduced the power for the primary aim. To identify which of the variables might confound the study, we tested the distribution of each variable in cases vs. controls and those that differed at a conservative uncorrected P < 0.05 in at least one dataset were tagged as potential confounders (Supplementary Table 1). These included, most notably, constipation in the past 3 months (more common in PD, P = 6E − 16 dataset 1, P = 6E − 10 dataset 2) and gastrointestinal discomfort on the day of stool collection (more common in PD, P = 2E − 9 dataset 1, P = 4E − 6 dataset 2), as well as sex and age, body mass index (BMI), weight loss, fruits or vegetable intake, alcohol use, and stool sample travel time. These variables and geographic site were tested along with case–control status in PERMANOVA (global composition test) and those that were significant were used as covariates in analysis of composition of microbiomes (ANCOM) (differential abundance test for MWAS). Thus, the results on both the global composition test and PD-associated taxa in MWAS have been adjusted for known potential confounders, except PD medications, which had to be handled differently because of collinearity with PD (see section on “Cause of disease or consequence of medication”).

Global composition of microbiome

First, we tested the difference between PD and controls in the global composition of the gut microbiome (β-diversity, Table 1). Case vs. control status was tested once by itself, once with all potential confounders in the model in a marginal test where each variable was tested while being adjusted for all others in the model, and once stratified by PD medication (Table 1). To gauge the effect of distance metric on the results, all tests were repeated with Aitchison[35], generalized UniFrac (GUniFrac)[37], and Canberra[38] distances. Tests were conducted using PERMANOVA[39] with 99,999 permutations limiting maximum achievable significance to P = 1E − 5.
Table 1

Effect of PD and other key variables on the global composition of gut microbiome.

Dataset 1Dataset 2
AitchisonGUniFracCanberraAitchisonGUniFracCanberra
%varP%varP%varP%varP%varP%varP
Model A. All PD vs. Control0.71<1E − 051.38<1E − 050.57<1E − 050.56<1E − 050.89<1E − 050.38<1E − 05
Model B. PD and confounders
Geography (Seattle, Atlanta, Albany)0.992E − 031.100.020.842E − 03
PD (case vs. control)0.581E − 031.127E − 050.534E − 050.48<1E − 50.629E − 050.322E − 05
Sex (male vs. female)0.519E − 030.520.080.492E − 040.482E − 050.492E − 030.342E − 05
Age (continuous)0.450.040.765E − 030.430.010.45<1E − 50.621E − 040.343E − 05
GI discomfort on day of stool collection (yes vs. no)0.450.040.400.260.439E − 030.240.20.220.390.230.2
Fruits or vegetables daily (yes vs. no)0.380.30.550.050.420.02
Constipation in the past 3 months (yes vs. no)0.340.770.380.350.370.390.260.060.380.020.240.05
BMI (continuous)0.400.210.480.120.390.130.333E − 030.340.040.276E − 03
Drinks alcohol (yes vs. no)0.350.660.310.640.370.350.260.070.280.150.240.1
Lost >10 pounds in past year (yes vs. no)0.340.710.360.420.360.640.200.870.150.910.210.71
Stool sample travel time (continuous)0.350.660.700.010.360.580.230.260.30.090.240.11
Model C. Removing PD medications
PD not on levodopa vs. control0.930.011.120.040.780.020.480.170.540.160.470.11
PD not on COMT inhibitors vs. control0.669E − 051.27<1E − 050.56<1E − 050.55<1E − 050.88<1E − 050.38<1E − 05
PD not on anticholinergics vs. control0.73<1E − 051.31<1E − 050.58<1E − 050.57<1E − 050.922E − 050.39<1E − 05
PD not on MAO-B inhibitors vs. control0.81<1E − 051.503E − 050.66<1E − 050.71<1E − 051.07<1E − 050.45<1E − 05
PD not on dopamine agonists vs. control0.812E − 041.513E − 050.70<1E − 050.571E − 040.803E − 040.444E − 05
PD not on amantadine vs. control0.733E − 051.37<1E − 050.60<1E − 050.483E − 050.743E − 050.37<1E − 05
PD not on any PD drug vs. control1.000.070.890.220.820.060.480.580.520.370.480.79

Model A tested PD vs. control without any other variable in the model. Sample size for Model A was 201 cases and 132 controls in dataset 1 and 323 cases and 184 controls in dataset 2. Model B included 11 variables (including case/control) and each variable was tested while adjusting for the other 10, without priority. Model B included subset of samples that had complete data on all 11 variables: N = 160 cases and 111 controls in dataset 1 and 283 cases, and 167 controls in dataset 2. For Model C, patients were stratified by each PD medication they were taking at the time of stool collection; those not on medication (varying N for different medications, see Supplementary Table 1) were tested against controls (N = 132 in dataset 1 and 184 in dataset 2). Power was low for patients not on L-dopa (N patients <50) and patients not on any PD medication (<20) due to small sample sizes, but not for other medications (N patients not on medication = 88–179 in dataset 1 and 153–312 in dataset 2). All analyses were repeated with three different distance measures: Aitchison, Canberra, and GUniFrac (generalized UniFrac). % var was the inter-individual variation explained by each variable. P-value was calculated using 99,999 permutations, setting the highest achievable significance at P = 1E − 05.

Effect of PD and other key variables on the global composition of gut microbiome. Model A tested PD vs. control without any other variable in the model. Sample size for Model A was 201 cases and 132 controls in dataset 1 and 323 cases and 184 controls in dataset 2. Model B included 11 variables (including case/control) and each variable was tested while adjusting for the other 10, without priority. Model B included subset of samples that had complete data on all 11 variables: N = 160 cases and 111 controls in dataset 1 and 283 cases, and 167 controls in dataset 2. For Model C, patients were stratified by each PD medication they were taking at the time of stool collection; those not on medication (varying N for different medications, see Supplementary Table 1) were tested against controls (N = 132 in dataset 1 and 184 in dataset 2). Power was low for patients not on L-dopa (N patients <50) and patients not on any PD medication (<20) due to small sample sizes, but not for other medications (N patients not on medication = 88–179 in dataset 1 and 153–312 in dataset 2). All analyses were repeated with three different distance measures: Aitchison, Canberra, and GUniFrac (generalized UniFrac). % var was the inter-individual variation explained by each variable. P-value was calculated using 99,999 permutations, setting the highest achievable significance at P = 1E − 05. PD microbiomes differed significantly from control microbiomes, in both datasets, with every distance metric measured (P < 1E − 5, Table 1). The PD effect was significant and independent of all analyzed confounders, including geography, constipation, gastrointestinal discomfort, sex, age, BMI, fruit or vegetable intake, alcohol use, and stool sample travel time. Results were in agreement with population studies in detecting significant effects of sex, age, BMI, gastrointestinal issues, and diet on the microbiome[33,34], and with other PD studies in detecting evidence for dysbiosis in PD[11-30].

Identification of PD-associated microorganisms

To identify PD-associated microorganisms, we conducted MWAS, testing differences between cases and controls in the relative abundances of genera. We conducted MWAS on each dataset separately to test whether results replicate and also to avoid confounding by the heterogeneity between datasets. Each dataset was tested with two methods to test analytic concordance: once using ANCOM[40] and again using Kruskal–Wallis (KW) rank sum test[41]. We chose ANCOM, because among the numerous methods that have been proposed, ANCOM singularly met three key criteria: incorporates compositionality of the eco-system, allows covariate adjustment, and keeps false-positive rate low while maintaining power[40,42]. Differential abundance was tested hypothesis-free microbiome-wide: ANCOM included all 445 genera detected in dataset 1 and 561 genera in dataset 2; KW included 109 genera in dataset 1 and 163 in dataset 2 (excluding unassigned genera and genera present in <10% of samples). In ANCOM, dataset-specific covariates were included and adjusted for (see MWAS section in Methods). Resulting significance metrics were corrected for multiple testing, using false discovery rate (FDR)-corrected P-values to calculate W in ANCOM and Benjamini–Hochberg FDR in KW. We detected association signals for 15 genera that were microbiome-wide significant by both methods and reproduced robustly in the two datasets, with or without covariate adjustment (Table 2 and Fig. 2). Five genera had higher abundances in PD than in controls: Porphyromonas, Prevotella, Corynebacterium_1, Bifidobacterium, and Lactobacillus. Ten genera had lower abundances in PD than controls: Faecalibacterium, Agathobacter, Blautia, Roseburia, Fusicatenibacter, Lachnospira, Butyricicoccus, Lachnospiraceae_ND3007_group, Lachnospiraceae_UCG-004, and Oscillospira. Complete MWAS results are in Supplementary Tables 2–5.
Table 2

PD-associated genera identified via MWAS.

PD-associated generaMWAS significant In Dataset 1MWAS significant In Dataset 2ClusterPubMed
PhylumClassOrderFamilyGenusMRAFCANCOM (W)KW (FDR)MRAFCANCOM (W)KW (FDR)
BacteroidetesBacteroidiaBacteroidalesPorphyromonadaceaePorphyromonas0.0014.204061E − 030.0012.944682E − 021Opp path
BacteroidetesBacteroidiaBacteroidalesPrevotellaceaePrevotella0.0022.564006E − 030.0014.394632E − 021Opp path
ActinobacteriaActinobacteriaCorynebacterialesCorynebacteriaceaeCorynebacterium_10.0011.963601E − 020.0022.534658E − 031Opp path
FirmicutesClostridiaClostridialesRuminococcaceaeFaecalibacterium0.060.634111E − 030.040.665353E − 032SCFA
FirmicutesClostridiaClostridialesLachnospiraceaeAgathobacter0.040.534412E − 040.020.565456E − 052SCFA
FirmicutesClostridiaClostridialesLachnospiraceaeBlautia0.020.684102E − 030.020.795334E − 022SCFA
FirmicutesClostridiaClostridialesLachnospiraceaeRoseburia0.020.483914E − 030.010.605413E − 042SCFA
FirmicutesClostridiaClostridialesLachnospiraceaeFusicatenibacter0.0040.563882E − 020.0050.695213E − 022SCFA
FirmicutesClostridiaClostridialesLachnospiraceaeLachnospira0.0040.804261E − 030.0050.685211E − 022SCFA
FirmicutesClostridiaClostridialesRuminococcaceaeButyricicoccus0.0020.663827E − 030.0020.685056E − 022SCFA
FirmicutesClostridiaClostridialesLachnospiraceaeLachnospiraceae_ND30070.0010.374182E − 040.0010.595386E − 042NC
FirmicutesClostridiaClostridialesLachnospiraceaeLachnospiraceae_UCG-0040.0010.483842E − 020.0010.385441E − 052NC
FirmicutesClostridiaClostridialesRuminococcaceaeOscillospira0.00060.653672E − 020.00050.645251E − 022NC
ActinobacteriaActinobacteriaBifidobacterialesBifidobacteriaceaeBifidobacterium0.011.834101E − 030.012.725536E − 073Probiotic
FirmicutesBacilliLactobacillalesLactobacillaceaeLactobacillus0.00046.614072E − 040.0041.574581E − 023Probiotic

MWAS was conducted in two datasets independently, testing differential abundance of genera in PD vs. controls, using two statistical methods (ANCOM and KW). The 15 genera shown are those that achieved microbiome-wide significance for association with PD in both datasets and by both methods, with (ANCOM) and without (KW) covariate adjustment (see “Methods” for covariates). Sample size: ANCOM included subset of subjects for whom complete data were available on all covariates tested: N = 171 cases and 117 controls in dataset 1 and 306 cases and 177 controls in dataset 2. KW included all subjects: N = 201 cases and 132 controls in dataset 1, and 323 cases and 184 controls in dataset 2. Clusters were identified hypothesis-free using correlation network analysis (Fig. 3). PubMed search was conducted after analyses were completed using genus and species name as search term (Supplementary Table 6). Function (opportunistic pathogen, SCFA, probiotic) was taken strictly from PubMed and is likely oversimplified. Microbiota have been studied under a narrow lens of what is already known about them. Opportunistic pathogens are often looked for in clinical specimen with infection, SCFA bacteria are studied intensively for their anti-inflammatory and other protective effects, and probiotics are understudied but highly advertised. The full function of the microbiota are not yet fully understood. In comparing results across published studies, note that a “genus” classified by one study may not be the same as the genus by the same name in another study. Taxonomic classifications and nomenclature are not standardized across reference databases, e.g., “Prevotella”, as annotated in some databases including NCBI, is further divided by SILVA (used here) into several non-monophyletic groups that SILVA calls, Prevotella_2, Prevotella_6, Prevotella_7, Prevotella_9, and Prevotella (see Discussion).

ANCOM analysis of composition of microbiomes. FC fold change in patients (MRA in patients/MRA in controls). FDR Benjamini–Hochberg false discovery rate (multiple testing corrected P-value). KW Kruskal–Wallis. MWAS microbiome-wide association study. MRA mean relative abundance in controls. NC not uncultured (uncharacterized). Opp path opportunistic pathogen (often commensal microorganism that can become pathogenic in immune-compromised individuals). Probiotic carbohydrate-metabolizing bacteria commonly known as probiotics. SCFA short-chain fatty acid-producing bacteria. W ANCOM score indicating the number of times a genus achieved FDR 0.05 as compared with other genera (maximum W possible: 444 in dataset 1, 560 in dataset 2, threshold 0.8 was used for significance, all shown genera were above significance threshold).

Fig. 2

Differential abundances of 15 PD-associated genera replicated in two datasets.

Relative abundances in PD cases (blue) and controls (orange) were plotted as log10 scale on the y-axis. Sample size was 201 cases and 132 controls in dataset 1, and 323 cases and 184 controls in dataset 2. Each dot represents a sample, plotted according to the relative abundance of the genus in the sample. The notch in each box indicates the confidence interval of the median. The bottom, middle, and top boundaries of each box represent the first, second (median), and third quartiles of the relative abundances. The whiskers (lines extending from the top and bottom of the box and ending in horizontal cap) extend to points within 1.5 times the interquartile range. The points extending above the whiskers are outliers.

PD-associated genera identified via MWAS. MWAS was conducted in two datasets independently, testing differential abundance of genera in PD vs. controls, using two statistical methods (ANCOM and KW). The 15 genera shown are those that achieved microbiome-wide significance for association with PD in both datasets and by both methods, with (ANCOM) and without (KW) covariate adjustment (see “Methods” for covariates). Sample size: ANCOM included subset of subjects for whom complete data were available on all covariates tested: N = 171 cases and 117 controls in dataset 1 and 306 cases and 177 controls in dataset 2. KW included all subjects: N = 201 cases and 132 controls in dataset 1, and 323 cases and 184 controls in dataset 2. Clusters were identified hypothesis-free using correlation network analysis (Fig. 3). PubMed search was conducted after analyses were completed using genus and species name as search term (Supplementary Table 6). Function (opportunistic pathogen, SCFA, probiotic) was taken strictly from PubMed and is likely oversimplified. Microbiota have been studied under a narrow lens of what is already known about them. Opportunistic pathogens are often looked for in clinical specimen with infection, SCFA bacteria are studied intensively for their anti-inflammatory and other protective effects, and probiotics are understudied but highly advertised. The full function of the microbiota are not yet fully understood. In comparing results across published studies, note that a “genus” classified by one study may not be the same as the genus by the same name in another study. Taxonomic classifications and nomenclature are not standardized across reference databases, e.g., “Prevotella”, as annotated in some databases including NCBI, is further divided by SILVA (used here) into several non-monophyletic groups that SILVA calls, Prevotella_2, Prevotella_6, Prevotella_7, Prevotella_9, and Prevotella (see Discussion).
Fig. 3

Correlation network analysis mapped PD-associated genera to three polymicrobial clusters.

Pairwise correlations in relative abundances were calculated for all genera microbiome-wide and were used to detect clusters of co-occurring microorganisms. To display, we used an arbitrary correlation coefficient threshold at r ≥ |0.4| to connect the genera that were correlated. All correlations noted were significant at P < 3E − 4 (the limit for 3000 permutations). Here we show the result for PD cases in dataset 2, because it had larger sample size (N = 323 cases) and greater sequencing depth than dataset 1 (see Supplementary Fig. 1 for cases and controls in dataset 1 and dataset 2). a Algorithm-detected clusters shown in different colors. b The algorithm-detected clusters, as in a but shown in gray, and PD-associated genera highlighted in blue (if increased in PD) or red (if decreased in PD). c Zoomed in version of b. The 15 PD-associated genera fell in three clusters. Cluster 1 was a tightly correlated cluster of microorganisms (r approaching 0.8), which included Porphyromonas, Prevotella, and Corynebacterium_1 (all elevated in PD). Cluster 2 included the ten genera that were reduced in PD, eight of which are shown connected at r ≥ 0.4, and two are unconnected but correlated significantly (P = 3E − 4) with the others in the cluster at r = 0.25 and r = 0.35. Lactobacillus and Bifidobacterium (correlated at r = 0.33 (P < 3E − 4)) were denoted cluster 3. For unconnected genera (r < 0.4), the proximity between nodules does not imply relatedness, e.g., Oscillospira (M) falls closer to Lactobacillus (N) than to Roseburia (G) but it is correlated significantly with Roseburia (r = 0.25, P < 3E − 4) and not with Lactobacillus (r = 0.04, P = 0.44).

ANCOM analysis of composition of microbiomes. FC fold change in patients (MRA in patients/MRA in controls). FDR Benjamini–Hochberg false discovery rate (multiple testing corrected P-value). KW Kruskal–Wallis. MWAS microbiome-wide association study. MRA mean relative abundance in controls. NC not uncultured (uncharacterized). Opp path opportunistic pathogen (often commensal microorganism that can become pathogenic in immune-compromised individuals). Probiotic carbohydrate-metabolizing bacteria commonly known as probiotics. SCFA short-chain fatty acid-producing bacteria. W ANCOM score indicating the number of times a genus achieved FDR 0.05 as compared with other genera (maximum W possible: 444 in dataset 1, 560 in dataset 2, threshold 0.8 was used for significance, all shown genera were above significance threshold).

Differential abundances of 15 PD-associated genera replicated in two datasets.

Relative abundances in PD cases (blue) and controls (orange) were plotted as log10 scale on the y-axis. Sample size was 201 cases and 132 controls in dataset 1, and 323 cases and 184 controls in dataset 2. Each dot represents a sample, plotted according to the relative abundance of the genus in the sample. The notch in each box indicates the confidence interval of the median. The bottom, middle, and top boundaries of each box represent the first, second (median), and third quartiles of the relative abundances. The whiskers (lines extending from the top and bottom of the box and ending in horizontal cap) extend to points within 1.5 times the interquartile range. The points extending above the whiskers are outliers.

Correlation network analysis

We questioned whether the 15 association signals were independent. We used hypothesis-free correlation network analysis[43] to infer ecological networks of interacting organisms microbiome-wide (Fig. 3 and Supplementary Fig. 1). The PD-associated genera mapped to three polymicrobial clusters. Porphyromonas, Prevotella, and Corynebacterium_1, which were elevated in PD, mapped to a community of highly correlated organisms, which we denoted as cluster 1. Cluster 1 was the most distinct cluster in the microbiome with correlations reaching r = 0.82 (P < 3E − 4), the highest in the microbiome in our data. The ten genera that were depleted in PD formed cluster 2, where eight of them clustered at r ≥ 0.4 (P < 3E − 4), and the remaining two (Oscillospira and Lachnospiraceae_UCG-004), clustered with the others at r = 0.25 (P < 3E − 4) and r = 0.35 (P < 3E − 4). Lactobacillus and Bifidobacterium, both elevated in PD, were correlated with each other at r = 0.33 (P < 3E − 4), which we denoted as cluster 3. Correlations within each cluster were all in the positive direction, i.e., members of clusters 1 tended to increase in abundance together, cluster 2 decreased together, and cluster 3 increased together.

Correlation network analysis mapped PD-associated genera to three polymicrobial clusters.

Pairwise correlations in relative abundances were calculated for all genera microbiome-wide and were used to detect clusters of co-occurring microorganisms. To display, we used an arbitrary correlation coefficient threshold at r ≥ |0.4| to connect the genera that were correlated. All correlations noted were significant at P < 3E − 4 (the limit for 3000 permutations). Here we show the result for PD cases in dataset 2, because it had larger sample size (N = 323 cases) and greater sequencing depth than dataset 1 (see Supplementary Fig. 1 for cases and controls in dataset 1 and dataset 2). a Algorithm-detected clusters shown in different colors. b The algorithm-detected clusters, as in a but shown in gray, and PD-associated genera highlighted in blue (if increased in PD) or red (if decreased in PD). c Zoomed in version of b. The 15 PD-associated genera fell in three clusters. Cluster 1 was a tightly correlated cluster of microorganisms (r approaching 0.8), which included Porphyromonas, Prevotella, and Corynebacterium_1 (all elevated in PD). Cluster 2 included the ten genera that were reduced in PD, eight of which are shown connected at r ≥ 0.4, and two are unconnected but correlated significantly (P = 3E − 4) with the others in the cluster at r = 0.25 and r = 0.35. Lactobacillus and Bifidobacterium (correlated at r = 0.33 (P < 3E − 4)) were denoted cluster 3. For unconnected genera (r < 0.4), the proximity between nodules does not imply relatedness, e.g., Oscillospira (M) falls closer to Lactobacillus (N) than to Roseburia (G) but it is correlated significantly with Roseburia (r = 0.25, P < 3E − 4) and not with Lactobacillus (r = 0.04, P = 0.44).

Functional characteristics

Analyses so far were all hypothesis-free, data-driven, and blind to the functional relevance of the microorganisms. Having identified the associations and their corresponding clusters, we broke the blind by searching PubMed. PubMed results on functional characteristics converged on clusters defined by agnostic network analysis. PubMed results suggest genera in cluster 1 are opportunistic pathogens. Porphyromonas and Prevotella are anaerobic, Gram-negative bacteria with lipopolysaccharides (endotoxins) in their outer membrane. They are commensal to the human gastrointestinal and urogenital tracts. Corynebacterium are aerobic, Gram-positive, and have a higher abundance in the skin microbiota than the gut. Although commensal and often harmless, Porphyromonas, Prevotella, and Corynebacterium are opportunistic pathogens capable of causing infections in immune-compromised individuals or if they gain access to sterile sites via compromised membranes, post surgery, bites, or wounds[44-46]. Many, but not all species of Porphyromonas, Prevotella, and Corynebacterium are pathogens. Corynebacterium diphtheriae is the leading cause of diphtheria. Porphyromonas gingivalis causes periodontal disease. We did not detect C. diphtheriae and P. gingivalis was extremely rare in our samples. We were interested in knowing the species that made up these three genera in our PD samples. The bioinformatic pipeline used in our study (DADA2 with SILVA as reference database) assigned the detected sequences (amplicon sequence variants (ASVs)) to species if the sequences were 100% identical; otherwise, the ASV was unassigned to species. To confirm and expand on DADA2-SILVA assignments, we blasted all the ASVs that made up each of the three genera against the NCBI 16S rRNA database, focusing only on matches that were >99–100% identical to a species with high statistical confidence. In PD patients, we found that 80% of Corynebacterium_1 was composed of one unique ASV with 100% identity to Corynebacterium amycolatum and Corynebacterium lactis; 96% of Porphyromonas was composed of ASVs that matched Porphyromonas asaccharolytica, Porphyromonas bennonis, Porphyromonas somerae, or Porphyromonas uenonis with >99–100% identity, and 98% of Prevotella was composed of ASVs that matched Prevotella bivia, Prevotella buccalis, Prevotella disiens, or Prevotella timonensis with >99–100% identity (83% of Prevotella matched P. bivia, P. buccalis, P. disiens, or P. timonensis at 100% identity). We conducted a PubMed search for each of these ten species, using genus and species name as the key word (ex. Corynebacterium amycolatum), with search filters as follows: Humans, English, and Title/Abstract. Excluding method papers, PubMed returned 104 articles that addressed function, characteristics, or relevance to human health, and every article was about the microorganism (search term) as a pathogen in clinical specimens from various infections (Supplementary Table 6). Clinical specimen from chronic wounds, infections, and inflammations are often polymicrobial[44-46]. Porphyromonas, Prevotella, Corynebacterium, and other members of cluster 1 are often observed together in these polymicrobial infections[44-46]. With the newly acquired knowledge on the potential biological significance of cluster 1, we questioned whether this polymicrobial group as a whole may be relevant to PD. The co-occurring organisms in cluster 1 (defined by correlation r ≥ 0.4) were Anaerococcus, Campylobacter, Ezakiella, Finegoldia, Murdochiella, Peptoniphilus, Porphyromonas, Prevotella, and Varibaculum in dataset 1, and Anaerococcus, Campylobacter, Corynebacterium_1, Ezakiella, Fastidiosipila, Finegoldia, Lawsonella, Mobiluncus, Mogibacterium, Murdochiella, Negativicoccus, Peptoniphilus, Porphyromonas, Prevotella, Prevotella_6, S5-A14a, Varibaculum, and unclassified Corynebacteriaceae in dataset 2. Most of these organisms are rare and may have been missed in MWAS. We conducted another MWAS where we collapsed the nonsignificant members of cluster 1 into one group (partial cluster 1), leaving Porphyromonas, Prevotella, and Corynebacterium_1 as individual genera along with the rest of the genera in MWAS. As expected, we recaptured all 15 PD-associated genera, as well as an additional signal for the partial cluster 1 that was ANCOM and KW significant in both datasets (dataset 1: 2.9-fold increased abundance in PD, ANCOM W = 392, KW FDR = 0.03; dataset 2: 2.5-fold increased abundance in PD, ANCOM W = 480, KW FDR = 0.002). Most (possibly all) genera in cluster 2 produce short-chain fatty acids (SCFAs). Of the ten PD-associated genera in cluster 2, three (Oscillospira, Lachnospiraceae_UCG-004, and Lachnospiraceae_ND3007_group) have been detected only by sequencing and not yet been cultured. The rest (Agathobacter, Blautia, Butyricicoccus, Faecalibacterium, Fusicatenibacter, Lachnospira, and Roseburia) are all anaerobic, Gram-positive bacteria in the Ruminococcaceae and Lachnospiraceae families. They are best known for producing SCFAs, mainly butyrate, which help maintain integrity of the gut membrane and have anti-inflammatory properties[47,48]. The literature on genera in cluster 3 suggest they are probiotic, but with the potential of becoming opportunistic pathogens and immunogenic. Lactobacillus[49] and Bifidobacteria[50] are anaerobic Gram-positive bacteria. They are among the ubiquitous inhabitants of the human gastrointestinal microbiome. They metabolize carbohydrates in plants and dairy, and are considered probiotic for their health benefits[51,52], although they have also been implicated as cause of infection and excessive immune stimulation in susceptible individuals[52,53].

Cause of disease or consequence of medication

Human association studies are powerful tools for identifying disease-relevant leads and to generate hypotheses that can then be tested experimentally. Even if we find a strong candidate that blurs the line between association and causality, we cannot prove that it preceded PD, because there are decades of preclinical and prodromal disease, and we do not know when it all begins. Although cause cannot be proven in these studies, we can sometimes tease out consequence. Medications have profound effects on the microbiome[33]. Levodopa is the most commonly used PD medication (>85% of PD patients were on varying doses of levodopa). To gauge if the association of PD with any of the 15 genera was a consequence of levodopa treatment, we tested whether the change in the differential abundance of the 15 genera correlated with increasing levodopa dose. We found no significant evidence to suggest that the increasing abundance of Porphyromonas, Prevotella, or Corynebacterium_1 (cluster 1) correlated with levodopa therapy. We did find significant evidence (two-sided P-value < 0.05) in dataset 2 to suggest that increasing doses of levodopa were correlated with decreasing levels of SCFA-producing organisms (Faecalibacterium P = 0.01, Agathobacter P = 0.02, Blautia P = 5E − 4, Roseburia P = 0.02, Fusicatenibacter P = 0.01, Lachnospira P = 5E − 3, Lachnospiraceae_ND3007_group P = 5E − 3, Lachnospiraceae_UCG-004 P = 0.03). A similar pattern was present in dataset 1, albeit most did not reach statistical significance possibly due to the smaller sample size of dataset 1. We also detected significant correlation between increasing levodopa dose and increasing levels of Bifidobacterium (dataset 1 P = 5E − 3, dataset 2 P = 2E − 6) and Lactobacillus (dataset 2 P = 4E − 3). These data suggest that the increase in abundance of cluster 1 (opportunistic pathogens) is independent of levodopa, but that the reduction in cluster 2 (SCFA) and increase in cluster 3 (probiotics), if not solely a consequence of medication, worsen with increasing doses of levodopa.

Discussion

To summarize, we first confirmed that the gut microbiome is altered in PD and showed that the PD effect on the global composition of the gut microbiome is independent of the effects of sex, age, BMI, constipation, gastrointestinal discomfort, geography, and diet. Next, using hypothesis-free microbiome-wide association studies we identified 15 PD-associated genera that achieved microbiome-wide significance in both datasets, with two methods, and with or without covariate adjustment. The 15 association signals were robust to the dramatic population-specific differences in the composition of microbiomes of the two datasets. We used hypothesis-free correlation network analysis to infer interactions and to identify communities of co-occurring microorganisms. Using this agnostic approach, we learned that the 15 PD-associated genera represent three polymicrobial clusters. Review of the literature revealed that the clusters, as defined by agnostic network analysis, also share functional characteristics. Our results suggest the gut microbiomes of persons with PD can present with (1) an overabundance of a polymicrobial cluster of opportunistic pathogens, (2) reduced levels of SCFA-producing bacteria, and/or (3) elevated levels of carbohydrate metabolizers commonly known as probiotics. Our data align with and expand on PD-microbiome literature. Reduced levels of SCFA-producing bacteria[12,14,16,18,19,21,26,27] and elevated levels of probiotic bacteria in PD[14,16,18,21,25-27] have been reported before, and thus are confirmatory. Overabundance of opportunistic pathogens, however, had not been reported before. We suspect the reason we were able to detect these microorganisms is because they are rare (Fig. 2) and we had a much larger sample size and power than prior studies. The microorganisms identified in prior PD studies were among the more abundant microorganisms in the gut. There have been two systematic reviews of PD-microbiome studies, which clearly show the vast disparity in the findings, but also reveal few findings that have emerged in more than one study[31,32]. The most recent review highlighted six associations that were significant in more than one study: Faecalibacterium, Roseburia, Bifidobacterium, Lactobacillus, Akkemansia, and Prevotella[32]. We confirmed the reduction in Faecalibacterium and Roseburia (cluster 2), and the increase in Bifidobacterium and Lactobacillus (cluster 3). We also confirmed increased Akkermansia in both datasets but it was only significant in dataset 1. Prevotella results are interesting, with Scheperjans et al.[11] and Petrov et al.[18] reporting it decreased in PD, whereas we find it elevated in both datasets. The apparent inconsistency may be simply because what is being referred to as “Prevotella” is not the same in these studies. We all used different taxonomic classification: Scheperjans et al.[11] reported at the family level (Prevotellaceae), we at genus level (Prevotella), and Petrov et al.[18] at species level (Prevotella copri). The SILVA database we used here, classified family Prevotellaceae into 11 genera. The more common genera in the Prevotellaceae family (Paraprevotella, Prevotella_9, and Prevotella_7) did in fact have lower frequencies in PD than in controls, as Scheperjans et al.[11] observed, but the difference was not significant in our datasets (FDR > 0.6 in both datasets). Species P. copri, which Petrov et al.[18] found reduced in PD, was the main species of the Prevotella_9 genus, which was reduced in our PD samples as well but not significantly (FDR > 0.8 in both datasets). We found instead elevated levels of the less common genus Prevotella (FDR = 0.006 in dataset 1 and FDR = 0.02 in dataset 2). These findings suggest family Prevotellaceae may be heterogenous in its association with PD. When comparing studies, another important consideration is the reference database: there are many and they have varied phylogenetic resolution and nomenclature. For example, genus Corynebacterium in NCBI is divided into two non-monophyletic genera in SILVA: Corynebacterium_1 and Corynebacterium. Similarly, what is called genus Prevotella in NCBI, is divided into multiple non-monophyletic genera in SILVA (we detected Prevotella, Prevotella_2, Prevotella_6, Prevotella_7, and Prevotella_9). The varying resolution at which the tests are conducted and the reference databases used cause confusion in the literature. The evidence for overabundance of opportunistic pathogens in PD gut microbiome was potentially the most exciting finding of this study. Braak et al.[54,55] originally hypothesized that non-inherited forms of PD are caused by a pathogen that can pass through the mucosal barrier of the gastrointestinal tract and spread to the brain through the enteric nervous system. Although many aspects of Braak’s hypothesis have gained support in recent years, there is no direct evidence that a pathogen is involved. Presence of α-synuclein in the gastrointestinal tract has been documented in persons with established Lewy body disease[56], as well as those with rapid eye movement sleep behavior disorder, which is considered prodromal PD[57]. Epidemiological studies suggest that truncal vagotomy if conducted decades before onset of PD reduces risk of developing PD[58,59]. In a mouse model, α-synuclein fibrils injected into the gut induced α-synuclein pathology which spread to the brain resulting in Parkinsonian neurodegeneration and behavioral phenotype; whereas truncal vagotomy and α-synuclein deficiency prevented the gut-to-brain spread and the associated neurodegeneration[60]. Human studies unrelated to PD have shown that infection in the gut or the olfactory system induce α-synuclein expression, and the increased abundance of α-synuclein mobilizes the immune system to fight the pathogen[61,62]. It was also shown in a genetic model of PD (pink1 knockout mice) that intestinal infection by pathogens elicits activation of cytotoxic T cells in the periphery and the brain, and leads to deterioration of dopaminergic cells and motor impairment, suggesting that intestinal infection acts as a triggering event in PD[63]. Despite the increasing evidence linking the gut, α-synuclein, and inflammation to PD, there was no direct evidence that a pathogen is responsible for the pathology. Here, we present evidence from human samples indicating an overabundance of opportunistic pathogens in the gut microbiome of persons with PD. The three genera that rose to significance (Porphyromonas, Prevotella, or Corynebacterium_1) represented a larger polymicrobial cluster of opportunistic pathogens that co-occur in controls as well as in patients (although at much lower abundances in healthy gut). Per literature, these opportunist pathogens are often harmless, but can grow and cause infections if the immune system is compromised or if they penetrate sterile sites through, e.g., compromised membranes[44-46]. The exciting question is whether these are Braak’s pathogens capable of triggering PD, or they are irrelevant to PD but are able to penetrate the gut and grow, because the gut lining is compromised in PD. We re-emphasize that no claims can be made on function based solely on association. The knowledge on the function of microorganisms in the gut is currently limited. Although there may be a large body of literature, each organism has been studied with a narrow lens. Organisms that are known to be opportunistic pathogens are being looked for in clinical specimen, whether they have other critical functions is not known. The identity of these microorganisms will enable experimental studies to determine if and how they play a role in PD. Our second main finding was a polymicrobial cluster of ten genera whose relative abundances were reduced in PD. All ten genera belong to the Lachnospiraceae and Ruminococcaceae families, well-known for producing SCFA. Several studies had found reduced levels of different SCFA-producing bacteria in PD patients[12,14,16,18,19,21,26,27]. Our finding is therefore confirmatory and expands on the list of PD-associated genera in these two taxonomic families. We and others noted that the decreasing levels of Lachnospiraceae correlate with increasing daily dose of levodopa, disease duration[12], disease severity and motor impairment[26], which suggest SCFA-producing microorganisms diminish as a consequence of medication and/or advancing disease. SCFA promote gastrointestinal motility, maintain integrity of the gut lining, and control inflammation in the gut and the brain[47,48,64-66], all of which are compromised in PD. It is important to note, however, that reduced levels of SCFA in the gut has been documented in many inflammatory disorders[67-71], and is not specific to PD. We also found elevated levels of Bifidobacterium and Lactobacillus in PD, which are generally considered as probiotics. Increased Bifidobacterium and Lactobacillus have been noted in some of the prior PD studies, albeit not consistently[14,16,18,21,25-27]. Both are ubiquitous inhabitants of human gut and metabolize carbohydrates derived from plants and dairy[49,50]. We found a significant correlation between increasing levodopa dose and increasing Bifidobacterium and Lactobacillus levels. Lactobacillus produce a bacterial enzyme that metabolizes levodopa into dopamine before it can reach the brain, reducing efficacy of the drug and requiring higher doses, which in feedback causes further growth of the bacteria[72,73]. Ironically, Bifidobacterium and Lactobacillus are sold in stores as probiotics, and a clinical trial has reported fermented milk, which contained Bifidobacterium, Lactobacillus, and fiber, among other active ingredients, improved constipation in PD[74]. Although generally believed to be safe, and possibly beneficial for the healthy population, they can act as opportunistic pathogens and cause infection and excessive immune stimulation in immune-compromised individuals[52,53]. It is important to understand why Bifidobacterium and Lactobacillus are elevated in PD and if they are beneficial (a compensatory mechanism to overcome the dysbiosis) or detrimental (feedback of levodopa). There were limitations in this study that should be considered in designing follow-up studies. The sample size, although the largest PD-microbiome study to date, was not sufficiently powered to detect rare microorganisms. If PD is indeed associated with polymicrobial clusters of rare opportunistic pathogens, larger sample sizes are needed to tease out the microorganisms individually. In addition to larger sample size, identifying the microorganisms will require shotgun metagenomic sequencing. The 16S amplicon sequencing used here was sufficient for exploratory MWAS, but did not provide the resolution to species, strain and gene level. We also lacked ability to detect viruses and fungi. Since this study was launched in 2014, the field has advanced rapidly. To maintain uniformity in data collection, we did not change the method of stool collection mid-study from sterile swabs to preservative solutions, but employed the latest advances if they could be applied to both datasets uniformly, notably in bioinformatics and statistics, and took analytic measures to identify potential confounders. We made certain decisions for data analyses, such as using stringent criteria to declare significance, and the choice of parameters used to define networks and clusters. We have made both the raw data and summary statistics publicly available so they can be analyzed with any methods and specifications. In conclusion, we uncovered robust and reproducible signals, which reaffirm (SCFA and probiotics) and generate leads (opportunistic pathogens) for experimentation into cause and effect, disease progression, and therapeutic targets. This study was limited by its singular and precise focus and intentionally conservative analytic execution. There is more to be learned with larger sample sizes with greater power, longitudinal studies to track change from prodromal to advanced disease, and by next-generation metagenome sequencing to broaden the scope from bacteria and archaea to include viruses and fungi, and improve the resolution to strain and gene level.

Methods

Subjects and data collection

The study was approved by institutional review boards for ethical conduct of human subject research at all participating institutions, namely New York State Department of Health, University of Alabama at Birmingham, VA Puget Sound Health Care System, Emory University, and Albany Medical Center. All subjects provided written informed consent for their participation. Subjects characteristics are provided in Supplementary Table 1. Subjects were enrolled by NGRC investigators, using standardized methods, at four NGRC-affiliated movement disorder clinics in the United States. Dataset 1 was collected in Seattle, WA, Albany, NY, and Atlanta, GA, in 2014 and included 212 persons with PD and 136 controls[16]. Dataset 2 was collected in Birmingham, AL, during 2015–2017 and included 323 PD and 184 controls (unpublished). PD was diagnosed by a movement disorder specialist using UK Brain Bank criteria[75], and controls were self-reported free of neurological disease. Each individual represents a distinct and unique data point (no repeated measurements were used). Metadata are provided in Supplementary Table 1. Data were collected using two self-administered questionnaires: an EFQ and GMQ[4,16,36]. EFQ covered sex, age, ancestry, and lifetime exposure data on PD-related risk factors. GMQ covered information pertinent to microbiome analysis and was filled out immediately after stool sample collection. PD medications that subjects were taking at the time of sample collection were extracted from medical records by clinical investigators. Stool samples were collected by the subjects at home using DNA/RNA-free sterile cotton swabs (BD BBL CultureSwab Sterile/Media-free Swabs, Fisher Scientific, Pittsburgh, PA). The sample was collected from excreted stool (the kit was not a rectal swab), thus minimizing contamination by skin microbiota, water, and urine. The stool samples were shipped immediately via standard US postal service at ambient temperature and stored at −20 °C upon arrival. The collection kit chosen was the most reasonable option at the time (2014). We did not use stabilizing solution, because collection kits with stabilizing solutions (e.g., OMNIgene GUT by DNA Genotek) were first introduced in 2015–2016. Immediate freezing was not feasible because we could not collect stool from over 800 participants, most of whom suffer constipation, while in clinic, nor was it acceptable to the participants to place their stool in their home freezer before shipping. We tested the effect of stool sample travel time on the results as follows. Subjects recorded the collection date and we recorded when it was placed in −20 °C freezer, the difference was calculated as the stool sample travel time. We tested the stool sample travel time in cases vs. controls (Supplementary Table 1). We adjusted the PERMANOVA and MWAS for stool sample travel time.

DNA extraction and sequencing

DNA extraction and sequencing of datasets were done in different laboratories (the Knight Lab at University of California San Diego for dataset 1[16] and HudsonAlpha Institute for Biotechnology for dataset 2), keeping methods uniform as possible. Negative controls were included in both datasets. DNA was extracted using MoBio PowerMag Soil DNA Isolation Kit for dataset 1 and MoBio PowerSoil DNA Isolation Kit for dataset 2, both kits using equivalent chemistries (MoBio Industries, Carlsbad, CA). Case and control samples were randomized on plates for sequencing to avoid batch effect. Hypervariable region 4 (V4) of the bacterial/archaeal 16S rRNA gene was PCR amplified using primers 515F (5′-GTGCCAGCMGCCGCGGTAA-3′) and 806R (5′-GGACTACHVGGGTWTCTAAT-3′) and sequenced using Illumina MiSeq. For dataset 1, paired-end 150 bp was used and all samples were sequenced in one run. For dataset 2, paired-end 250 bp was used and samples were sequenced in six runs. Sequence files were de-multiplexed using QIIME2 (core distribution 2018.6)[76] for dataset 1 and Illumina’s BCL2FASTQ software on BaseSpace for dataset 2. Fifteen samples in dataset 1 had low sequencing counts and were excluded for present analysis.

Bioinformatics

Forward and reverse primers were trimmed from the 5′-end of sequences using cutadapt v 1.16[77]. After primer trimming, only sequences with lengths of 147–151 bp in dataset 1 and 230–233 bp in dataset 2 were retained. DADA2 R package v 1.8[78] was used for the remaining bioinformatics with default parameters unless when specified. Sequences were quality trimmed and filtered using the filterAndTrim function: trimming 3′-ends to 147 bp (forward) and 147 bp (reverse) in dataset 1, and 228 bp (forward) and 203 bp (reverse) in dataset 2, and removing sequences if they exceeded a maximum of two expected errors. ASVs were inferred and ASV tables were constructed as follows. For each sequencing run (a) a model for sequencing error was constructed using the learnErrors function specifying that all bases in all sequences be used for constructing the model, (b) sequences were de-replicated to find unique sequences using the derepFastq function, (c) ASVs were inferred from de-replicated sequences using the dada function, (d) forward and reverse sequences were merged using the mergePairs function, and (e) sequences with <250 bp or >256 bp were removed. This resulted in one ASV table for dataset 1 and six ASV tables for dataset 2. The six ASV tables of dataset 2 were merged using the mergeSequenceTables function. Chimeras were detected and removed using the removeBimeraDenovo function. The following data transformation procedures were used to account for variable sequence depth. Sequence counts were normalized to relative abundances (calculated by dividing the number of sequences that were assigned to a unique ASV or to a genus by the total sequence count in the sample) for PERMANOVA when using Canberra or GUniFrac distance, for MWAS when using KW, and for testing correlation with levodopa drug dose. Centered-log ratio (clr) transformation (using the transform function of the microbiome v 1.2.1 R package (http://microbiome.github.com/microbiome)) was used for PCA and for PERMANOVA when using Aitchison distance. Log ratios (implemented internally in ANCOM and SparCC) were used when using ANCOM for MWAS and for correlation network analysis. Earlier microbiome studies (including our first study conducted with dataset 1)[16] often used rarefaction to normalize the sequence count. Although not as efficient as the other methods due to data loss[79], for added assurance, we rarefied the data, repeated the MWAS with ANCOM and were able to recover all 15 significant PD-associated genera. Taxonomic assignments were made using SILVA (v 132) in DADA2. MWAS and correlation network analysis were conducted at genus level. To define genera, first each unique ASV was assigned to a genus using the assignTaxonomy function, which performs DADA2’s native implementation of the Ribosomal Database Project naive Bayesian classifier[80], using SILVA v 132 as reference and a bootstrap confidence of 80%. Then, each genus (including the unclassified genera) was formed by agglomerating all ASVs that were assigned to that genus using the tax_glom function in phyloseq. Post MWAS, we explored PD-associated genera at the species level. DADA2 pipeline assigns ASVs to species only if the sequences match 100%. We used the addSpecies function in DADA2 with SILVA as reference and addMultiple=TRUE, first finding 100% matches, then filtering out those matches that did not correspond to the genus given by the assignTaxonomy function. To confirm and expand on DADA2-SILVA species assignments, we BLASTed ASVs against the NCBI 16S rRNA gene sequence database (downloaded on 12/3/2019), and extracted taxonomic designations with the most significant E-value. Nucleotide BLAST search was performed using the BLAST + executables v 2.9.0 with default parameters[81] (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/). A phylogenetic tree of ASVs was constructed for each dataset, as described by Callahan et al.[82]. Briefly, multiple sequence alignment of ASVs was performed using the AlignSeqs function from the DECIPHER R package v 2.8.1[83]. Aligned ASVs were then used to build a phylogenetic tree using the phangorn R package v 2.5.3[84]. A phyloseq object was created for each dataset for use in conducting statistical analyses. For each dataset, the ASV table, taxonomic assignments, phylogenetic tree and metadata were merged into a single file, using phyloseq function in phyloseq R package v 1.24.2[85].

Data analysis and statistics

PCA was performed on the clr transformed ASV data[35] using the ordinate function in phyloseq. PC1 and PC2 were plotted using the plot_ordination function in phyloseq (Fig. 1). We interrogated 47 variables as potential confounders (Supplementary Table 1). In each dataset, we first tested the distribution of each variable in cases vs controls, using Fisher’s exact test (fisher.test function in R) for categorical variables, and Mann–Whitney U (wilcox.test function in R) for quantitative variables. Variables that differed between cases and control at uncorrected two-sided P < 0.05 were tagged as potential confounders, and were then included in PERMANOVA, along with case–control status, and tested for their effects on microbiome composition (Table 1). As PERMANOVA was conducted using marginal effects model without rank (see below), simultaneous inclusion of case–control and other variables allowed testing the association of each variable with microbiome composition while adjusting for all other variables in the model. Thus, PD effect on microbiome composition (β-diversity) was adjusted for variables that differed between cases and controls. Next, variables that were associated with microbiome composition at PERMANOVA P < 0.05 were included as covariates in MWAS. Thus, variables that could have led to spurious taxa-disease association because they differed between cases and controls and were also associated with microbiome, were adjusted for in MWAS. PD medications (also potential confounders) were present only in PD cases and could not be included as covariates in PERMANOVA or MWAS. To gauge the effect of PD on β-diversity independent of each medication, we performed PERMANOVA using cases not on PD medication vs. controls (Table 1). The potential confounding effect of medication on differential abundance of genera was tested post MWAS. For each genus whose relative abundance was associated with PD, we tested the correlation between relative abundance of the genus with daily dose of levodopa (mg/day) using Spearman correlation (two-sided P-value) implemented in the cor.test function in R. To investigate changes in the global composition of microbiome (β-diversity) PERMANOVA was used to identify variables that had a significant effect on β-diversity (Table 1). Tests were conducted using adonis2 function in vegan v 2.5.3 (https://CRAN.R-project.org/package=vegan). P-values were generated by 99,999 permutations which caps at P < 1E − 5 as highest significance. Three models were tested as follows: (Model A) PD vs. control: [Distance ~ case/control] (Model B) PD vs. control and all variables tagged as potential confounders: Dataset 1: [Distance ~ case/control + sex + age + geography + BMI + loss of 10 lbs in past year + gastrointestinal discomfort on day of stool collection + constipation in past 3 months + alcohol use + fruits or vegetables daily + stool sample travel time] Dataset 2: [Distance ~ case/control + sex + age + BMI + loss of 10 lbs in past year + gastrointestinal discomfort on day of stool collection + constipation in past 3 months + alcohol use + stool sample travel time] (Model C) Subset of PD cases not on a given PD medication vs controls: [Distance ~ case/control] where distance (a measure of (dis)similarity between pairs of samples), age (in years), BMI (kg/m2), and stool sample travel time (in days) were continuous variables and the remaining variables were categorical. We tested marginal effects, so that each variable was tested while being adjusted for all others in the model, without priority. To gauge the effect of the distance measure on the results, all three models were tested using Aitchison[35], GUniFrac[37], and Canberra[38] distances. Aitchison distances were calculated by first transforming the ASV data using clr, and then calculating the Euclidean distances using the vegdist function. To calculate GUniFrac distances, unrooted ASV phylogenetic trees were rooted using the root function in the ape v 5.3 R package[86] specifying the unique ASV with the highest raw count as the root, then data were transformed to relative abundances and distances were calculated using the GUniFrac function in the R package GUniFrac v 1.1[37], specifying α to be 0.5. To calculate Canberra distances, data were transformed to relative abundances and distances were calculated using the vegdist function in vegan. We conducted MWAS to identify the genera whose abundances differed in cases vs. controls. We chose genus classification, because it is the highest resolution attainable with high confidence from 16S sequencing. For statistical analysis of MWAS, we used ANCOM (Table 2 and Supplementary Tables 2–3). We chose ANCOM, because it incorporates compositionality of the microbiome data, has low false-positive rate, and allows covariate adjustment[40,42]. ANCOM was run using ANCOM.main function from the ANCOMv2 R code (https://sites.google.com/site/siddharthamandal1985/research). All genera that were detected in each dataset were included in ANCOM MWAS. Sequence counts were transformed to log ratios, as implemented in ANCOM. Case/control status was specified as the main variable. For each dataset, the variables that were significant at P < 0.05 in PERMANOVA were included as covariates to be adjusted, as follows: Dataset 1: [Genus ~ case/control + sex + age + geography + gastrointestinal discomfort on day of stool collection + fruits or vegetables daily + stool sample travel time] Dataset 2: [Genus ~ case/control + sex + age + BMI + constipation in past 3 months] where genus (ASV counts assigned to a genus, transformed to log ratios by ANCOM), age (in years), BMI (kg/m2), and stool sample travel time (in days) were continuous variables and the remaining variables were categorical. We used the taxa-wise FDR option (multcorr = 2) and set significance level to FDR < 0.05 to generate W statistics, and threshold of 0.8 for declaring an association as significant. For comparison, we repeated the MWAS using KW as statistical test (Table 2 and Supplementary Tables 4–5). For KW, genera counts were transformed to genera relative abundances. Unclassified genera, and genera present in <10% of samples were excluded from KW MWAS. KW does not allow covariate adjustment. The kruskal.test function from the stats R package was used to test for significance. P-values were two-sided and corrected for multiple testing using Benjamini–Hochberg FDR method implemented in the p.adjust function from stats package. To visualize the distribution of genera that were significant in MWAS (Fig. 2), boxplots were created using ggplot2 v 3.1.0 (https://ggplot2.tidyverse.org) with a pseudo-count of 1 added to counts before transforming to relative abundances to avoid taking the log of zero during plotting. Correlation network analysis was performed for each dataset, and for cases and controls separately (Fig. 3 and Supplementary Fig. 1). Pairwise correlations were calculated between all genera, microbiome-wide, using log-ratio transformed relative abundances as implemented in the SparCC[43] (https://bitbucket.org/yonatanf/sparcc). Significance of each correlation was determined by pseudo P-values based on 3000 permutations. Correlation networks were visualized by plotting all genera, microbiome-wide, and connecting correlated genera with an edge, using the program Gephi v 0.9.2[87]. We chose a minimum correlation (r) of 0.4 to connect two genera with an edge to create the graphic. All correlations r ≥ 0.4 were significant at P < 3E − 4, which is the maximum significance attainable with 3000 permutations. To better visualize networks of connected genera, we first used the force-directed algorithm, Force Atlas 2[88], then a community detection algorithm[89] as implemented in Gephi’s modularity function. Supplementary-materials Reporting-summary
  83 in total

Review 1.  Advances in environmental epidemiology.

Authors:  Caroline M Tanner
Journal:  Mov Disord       Date:  2010       Impact factor: 10.338

Review 2.  Risk and safety of probiotics.

Authors:  Shira Doron; David R Snydman
Journal:  Clin Infect Dis       Date:  2015-05-15       Impact factor: 9.079

3.  A comparison of clinical and pathological features of young- and old-onset Parkinson's disease.

Authors:  W R Gibb; A J Lees
Journal:  Neurology       Date:  1988-09       Impact factor: 9.910

4.  Short chain fatty acids and gut microbiota differ between patients with Parkinson's disease and age-matched controls.

Authors:  Marcus M Unger; Jörg Spiegel; Klaus-Ulrich Dillmann; David Grundmann; Hannah Philippeit; Jan Bürmann; Klaus Faßbender; Andreas Schwiertz; Karl-Herbert Schäfer
Journal:  Parkinsonism Relat Disord       Date:  2016-08-26       Impact factor: 4.891

5.  Gut microbiota in patients with Parkinson's disease in southern China.

Authors:  Aiqun Lin; Wenxia Zheng; Yan He; Wenli Tang; Xiaobo Wei; Rongni He; Wei Huang; Yuying Su; Yaowei Huang; Hongwei Zhou; Huifang Xie
Journal:  Parkinsonism Relat Disord       Date:  2018-08       Impact factor: 4.891

6.  phangorn: phylogenetic analysis in R.

Authors:  Klaus Peter Schliep
Journal:  Bioinformatics       Date:  2010-12-17       Impact factor: 6.937

7.  Meta-analyses on prevalence of selected Parkinson's nonmotor symptoms before and after diagnosis.

Authors:  Honglei Chen; Edward J Zhao; Wen Zhang; Yi Lu; Rui Liu; Xuemei Huang; Anna J Ciesielski-Jones; Michele A Justice; Deborah S Cousins; Shyamal Peddada
Journal:  Transl Neurodegener       Date:  2015-01-08       Impact factor: 8.014

8.  Alterations in fecal short-chain fatty acids in patients with irritable bowel syndrome: A systematic review and meta-analysis.

Authors:  Qinghua Sun; Qiong Jia; Lijin Song; Liping Duan
Journal:  Medicine (Baltimore)       Date:  2019-02       Impact factor: 1.817

9.  Intestinal Dysbiosis and Lowered Serum Lipopolysaccharide-Binding Protein in Parkinson's Disease.

Authors:  Satoru Hasegawa; Sae Goto; Hirokazu Tsuji; Tatsuya Okuno; Takashi Asahara; Koji Nomoto; Akihide Shibata; Yoshiro Fujisawa; Tomomi Minato; Akira Okamoto; Kinji Ohno; Masaaki Hirayama
Journal:  PLoS One       Date:  2015-11-05       Impact factor: 3.240

10.  Bioconductor workflow for microbiome data analysis: from raw reads to community analyses.

Authors:  Ben J Callahan; Kris Sankaran; Julia A Fukuyama; Paul J McMurdie; Susan P Holmes
Journal:  F1000Res       Date:  2016-06-24
View more
  41 in total

Review 1.  A Comprehensive Review on the Role of the Gut Microbiome in Human Neurological Disorders.

Authors:  Shokufeh Ghasemian Sorboni; Hanieh Shakeri Moghaddam; Reza Jafarzadeh-Esfehani; Saman Soleimanpour
Journal:  Clin Microbiol Rev       Date:  2022-01-05       Impact factor: 26.132

Review 2.  The microbiome-gut-brain axis in Parkinson disease - from basic research to the clinic.

Authors:  Ai Huey Tan; Shen Yang Lim; Anthony E Lang
Journal:  Nat Rev Neurol       Date:  2022-06-24       Impact factor: 44.711

Review 3.  Central and Peripheral Inflammation: Connecting the Immune Responses of Parkinson's Disease.

Authors:  Gregory P Williams; Aubrey M Schonhoff; Alessandro Sette; Cecilia S Lindestam Arlehamn
Journal:  J Parkinsons Dis       Date:  2022       Impact factor: 5.520

4.  Oral and gut dysbiosis leads to functional alterations in Parkinson's disease.

Authors:  Sungyang Jo; Woorim Kang; Yun Su Hwang; Seung Hyun Lee; Kye Won Park; Mi Sun Kim; Hyunna Lee; Hyung Jeong Yoon; Yoo Kyoung Park; Mauricio Chalita; Je Hee Lee; Hojun Sung; Jae-Yun Lee; Jin-Woo Bae; Sun Ju Chung
Journal:  NPJ Parkinsons Dis       Date:  2022-07-07

Review 5.  Neurodegenerative disorders and gut-brain interactions.

Authors:  Alpana Singh; Ted M Dawson; Subhash Kulkarni
Journal:  J Clin Invest       Date:  2021-07-01       Impact factor: 19.456

6.  To the Gut Microbiome and Beyond: The Brain-First or Body-First Hypothesis in Parkinson's Disease.

Authors:  Nathan D Nuzum; Amy Loughman; Ewa A Szymlek-Gay; Wei-Peng Teo; Ashlee M Hendy; Helen Macpherson
Journal:  Front Microbiol       Date:  2022-03-30       Impact factor: 5.640

Review 7.  Diet, Microbiota and Brain Health: Unraveling the Network Intersecting Metabolism and Neurodegeneration.

Authors:  Francesco Gentile; Pietro Emiliano Doneddu; Nilo Riva; Eduardo Nobile-Orazio; Angelo Quattrini
Journal:  Int J Mol Sci       Date:  2020-10-10       Impact factor: 5.923

8.  The impact of device-assisted therapies on the gut microbiome in Parkinson's disease.

Authors:  Michal Lubomski; Xiangnan Xu; Jean Y H Yang; Carolyn M Sue; Ryan L Davis; Andrew J Holmes
Journal:  J Neurol       Date:  2021-06-15       Impact factor: 6.682

Review 9.  Gut Microbiota Approach-A New Strategy to Treat Parkinson's Disease.

Authors:  Jing Liu; Fei Xu; Zhiyan Nie; Lei Shao
Journal:  Front Cell Infect Microbiol       Date:  2020-10-22       Impact factor: 5.293

10.  A High Protein Calorie Restriction Diet Alters the Gut Microbiome in Obesity.

Authors:  Tien S Dong; Kayti Luu; Venu Lagishetty; Farzaneh Sedighian; Shih-Lung Woo; Benjamin W Dreskin; William Katzka; Candace Chang; Yi Zhou; Nerea Arias-Jayo; Julianne Yang; Aaron Ahdoot; Zhaoping Li; Joseph R Pisegna; Jonathan P Jacobs
Journal:  Nutrients       Date:  2020-10-21       Impact factor: 5.717

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.