Thomas H Hampton1, Devin Thomas2, Christopher van der Gast3, George A O'Toole1, Bruce A Stanton1.
Abstract
Microbial communities in the airways of persons with CF (pwCF) are variable, may include genera that are not typically associated with CF, and their composition can be difficult to correlate with long-term disease outcomes. Leveraging two large data sets characterizing sputum communities of 167 pwCF and associated metadata, we identified five bacterial community types. These communities explain 24% of the variability in lung function in this cohort, far more than single factors like Simpson diversity, which explains only 4%. Subjects with Pseudomonas-dominated communities tended to be older and have reduced percent predicted FEV1 (ppFEV1) compared to subjects with Streptococcus-dominated communities, consistent with previous findings. To assess the predictive power of these five communities in a longitudinal setting, we used random forests to classify 346 additional samples from 24 subjects observed for 8 years on average in a range of clinical states. Subjects with mild disease were more likely to be observed at baseline, that is, not in the context of a pulmonary exacerbation, and community structure in these subjects was more self-similar over time, as measured by Bray-Curtis distance. Interestingly, we found that subjects with mild disease were more likely to remain in a mixed Pseudomonas community, providing some support for the climax-attack model of the CF airway. In contrast, patients with worse outcomes were more likely to show shifts among community types. Our results suggest that bacterial community instability may be a risk factor for lung function decline and indicate the need to understand the factors that drive shifts in community composition.
IMPORTANCE While much research supports a polymicrobial view of the CF airway, one in which the community is seen as the pathogenic unit, only controlled experiments using model bacterial communities can unravel the mechanistic role played by different communities. This report uses a large data set to identify a small number of communities as a starting point in the development of tractable model systems. We describe a set of five communities that explain 24% of the variability in lung function in our data set, far more than single factors like Simpson diversity, which explained only 4%. In addition, we report that patients with severe disease experienced more shifts among community types, suggesting that bacterial community instability may be a risk factor for lung function decline. Together, these findings provide a proof of principle for selecting bacterial community model systems.
Keywords: cystic fibrosis; lung infection; microbial communities
Year: 2021 PMID: 34232099 PMCID: PMC8549546 DOI: 10.1128/Spectrum.00029-21
Source DB: PubMed Journal: Microbiol Spectr ISSN: 2165-0497
FIG 1 Determining cluster numbers. (A) Gap statistic for different numbers of k clusters, ranging from 1 to 10. Larger values indicate greater separation between clusters. An optimal number of clusters based on gap statistic is shown by a dashed vertical line. (B) Principal coordinate analysis of 5 clusters, based on the Bray-Curtis dissimilarity matrix. The legend showing the color code of each cluster is on the right. Together, the first two principal coordinates capture >70% of the variance in these data. Each dot represents one of 167 samples in the cross-sectional analysis described in the Materials and Methods. (C) Adjusted R squared from linear models of lung function (ppFEV1) using membership in a given cluster as the independent variable. Models were run for different numbers of k clusters, ranging from 2 to 10.
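The Bray-Curtis dissimilarity underlying panel B can be illustrated with a minimal Python sketch. The function names `bray_curtis` and `distance_matrix` and the example abundance vectors are hypothetical and are not taken from the study; the sketch only shows the standard definition (sum of absolute differences divided by the sum of all abundances) and is not the authors' analysis code.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors.

    Returns 0 for identical samples and 1 for samples sharing no taxa.
    """
    if len(u) != len(v):
        raise ValueError("abundance vectors must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def distance_matrix(samples):
    """Pairwise Bray-Curtis dissimilarities for a list of abundance vectors."""
    n = len(samples)
    return [[bray_curtis(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical read counts for three genera in two sputum samples.
sample_a = [70, 20, 10]
sample_b = [10, 20, 70]
print(bray_curtis(sample_a, sample_b))
```

A matrix built this way is the usual input to principal coordinate analysis and to clustering; the ordination and gap-statistic steps themselves are not sketched here.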
FIG 2 Microbial clusters are the best predictor of airway function. The ability of linear models to predict ppFEV1, as estimated by adjusted R squared. Individual factors or combinations (denoted by “+”) of covariates are shown on the y axis: Simpson diversity only, age only, Pseudomonas only, the combination of Simpson and age, or Simpson and age and the relative abundance of other classic CF pathogens, compared to membership in one of the 5 clusters.
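For a model whose only predictor is cluster membership, the adjusted R squared can be computed directly from group means, because the least-squares fitted value for each sample is the mean of its cluster. The sketch below assumes this one-factor case; the function name `adjusted_r2_by_group` and the example values are hypothetical, and the models with continuous covariates (age, Simpson diversity) are not covered.

```python
from collections import defaultdict


def adjusted_r2_by_group(values, groups):
    """Adjusted R^2 of a linear model with one categorical predictor.

    values: outcome per sample (for example ppFEV1)
    groups: group label per sample (for example cluster mnemonic)
    """
    n = len(values)
    by_group = defaultdict(list)
    for y, g in zip(values, groups):
        by_group[g].append(y)
    k = len(by_group)  # number of groups; the model has k - 1 predictors
    if n - k <= 0:
        raise ValueError("need more samples than groups")

    grand_mean = sum(values) / n
    sst = sum((y - grand_mean) ** 2 for y in values)  # total sum of squares
    if sst == 0:
        raise ValueError("outcome has no variance")

    sse = 0.0  # residual sum of squares around each group mean
    for ys in by_group.values():
        mean_g = sum(ys) / len(ys)
        sse += sum((y - mean_g) ** 2 for y in ys)

    return 1 - (sse / (n - k)) / (sst / (n - 1))


# Hypothetical outcome values for two well-separated groups.
print(adjusted_r2_by_group([1, 2, 3, 7, 8, 9], ["a", "a", "a", "b", "b", "b"]))
```

Unlike plain R squared, the adjusted value can be negative when the grouping carries no information, which is why it is a fairer basis for comparing models with different numbers of predictors.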
FIG 3 Analysis of five identified clusters. (A) Mean abundances of the top 10 genera in each of the 5 clusters. Mnemonic abbreviations: Str.D, Streptococcus dominated; Pa.D, Pseudomonas dominated; Oth.D, Other dominated; Pa.M1, Pseudomonas mixed community number 1; Pa.M2, Pseudomonas mixed community number 2. The number of samples that belonged to each cluster appears beneath each cluster mnemonic. (B) Abundances of the top 10 genera for samples belonging to cluster 3 (Oth.D) in each of the 37 samples belonging to this group. (C) Maximum abundance of 10 genera achieving the highest relative abundance in any of the 167 samples, as a function of cluster.
TABLE 1 Mapping the clusters back to the data sets
| Cluster no. | Mnemonic | Carmody | Cuthbertson |
|---|---|---|---|
| 1 | Str.D | 45 | 1 |
| 2 | Pa.D | 0 | 27 |
| 3 | Oth.D | 19 | 18 |
| 4 | Pa.M1 | 19 | 8 |
| 5 | Pa.M2 | 1 | 29 |
FIG 4 Association of communities with lung function and age. (A) Association of communities with lung function. Subject-lung function profiles of 5 distinct communities. Box and whisker plot comparing ppFEV1 between Streptococcus-dominated (Str.D), Pseudomonas-dominated (Pa.D), communities dominated by other genera (Oth.D), Pseudomonas mixed community number 1 (Pa.M1), and Pseudomonas mixed community number 2 (Pa.M2), as shown in Fig. 3A. Communities sharing a common letter are not significantly different; those with no letter in common differ significantly (Tukey honest significant difference, P < 0.05). For example, subjects with Str.D communities had higher ppFEV1 than those with any other community type. (B) Association of communities with age. Subject-age profiles of 5 distinct communities. Box and whisker plot comparing age between Streptococcus-dominated (Str.D), Pseudomonas-dominated (Pa.D), communities dominated by other genera (Oth.D), Pseudomonas mixed community number 1 (Pa.M1), and Pseudomonas mixed community number 2 (Pa.M2), as shown in Fig. 3A. Communities sharing a common letter are not significantly different; those with no letter in common differ significantly (Tukey honest significant difference, P < 0.05). For example, subjects with Str.D communities were younger than those with Pa.D communities or Pa.M2 communities but did not differ significantly in age from subjects with Oth.D or Pa.M1 communities.
TABLE 2 Characteristics of 24 subjects with at least 10 longitudinal samples
| Isolate | Aggressiveness | n | Period (yr) | Median age | Median FEV1 | Clinical states | FEV1/yr |
|---|---|---|---|---|---|---|---|
| 75 | Moderate/Severe | 16 | 8.99 | 14.9 | 32.5 | R, E, T, T, E, R, B, B, R, T, E, B, B, R, B, R | −2.23 |
| 139 | Moderate/Severe | 15 | 6.64 | 15.1 | 34 | B, B, B, T, R, T, B, B, B, B, R, B, T, T, R | −5.17 |
| 94 | Moderate/Severe | 13 | 9.16 | 17.2 | 39 | T, T, E, E, T, B, E, B, B, B, B, R, R | 1.56 |
| 229 | Moderate/Severe | 11 | 7.27 | 19.2 | 34 | R, R, R, B, B, T, R, B, T, B, B | −4.50 |
| 52 | Moderate/Severe | 18 | 9.74 | 23.0 | 43.5 | T, E, R, R, T, R, R, B, R, R, B, E, E, B, B, B, E, T, R, R | −1.84 |
| 179 | Moderate/Severe | 24 | 8.66 | 23.8 | 41 | B, E, T, T, B, E, T, R, B, B, E, B, R, R, E, R, T, B, T, E, R, B, E, E | −3.08 |
| 159 | Mild | 11 | 7.97 | 24.3 | 82 | B, B, B, E, B, B, B, B, B, B, R | −1.20 |
| 87 | Mild | 17 | 7.54 | 25.0 | 82 | T, T, R, B, E, B, B, B, B, B, R, T, R, B, R, B, B | 1.84 |
| 160 | Mild | 13 | 8.01 | 25.2 | 73 | B, T, B, B, R, B, R, B, R, B, T, E, B | 0.79 |
| 142 | Moderate/Severe | 20 | 10.14 | 25.4 | 67 | B, T, T, T, T, E, B, R, T, T, E, E, R, B, T, E, B, R, T, E | −1.16 |
| 141 | Mild | 12 | 8.91 | 26.3 | 85.5 | R, E, R, B, B, B, R, T, B, R, B, T | −0.20 |
| 124 | Moderate/Severe | 18 | 10.24 | 26.3 | 54.5 | B, B, T, B, T, B, E, T, B, E, B, E, B, E, T, B, E, B, B | −1.00 |
| 147 | Moderate/Severe | 18 | 8.17 | 27.2 | 36.5 | B, R, E, E, B, B, B, B, B, T, R, T, T, T, B, B, R, E | −3.73 |
| 209 | Moderate/Severe | 16 | 7.87 | 28.0 | 37.5 | T, B, T, B, E, T, R, E, R, R, R, T, R, R, B, R | −6.97 |
| 53 | Mild | 10 | 5.35 | 28.6 | 69.5 | E, R, T, E, B, E, B, E, B, NA, NA | 1.55 |
| 182 | Mild | 14 | 5.69 | 29.0 | 83.5 | T, B, R, E, E, B, B, B, B, B, B, E, E, E | −2.28 |
| 118 | Mild | 10 | 7.33 | 30.3 | 58.5 | E, T, B, R, B, B, B, T, R, E | −1.33 |
| 244 | Mild | 10 | 6.42 | 30.8 | 74.5 | E, R, B, B, R, B, B, B, E, B | −2.23 |
| 222 | Mild | 10 | 7.47 | 33.5 | 97 | B, R, B, T, B, T, B, B, B, NA | −0.21 |
| 136 | Mild | 15 | 7.6 | 36.8 | 57 | B, R, B, E, B, B, T, R, B, R, E, E, B, T, B | −2.99 |
| 119 | Mild | 14 | 7.72 | 37.1 | 36 | E, B, T, B, B, B, B, T, B, B, B, R, B, B | −3.42 |
| 195 | Mild | 11 | 8.24 | 41.4 | 54 | E, B, E, E, B, E, B, T, R, B, R | 0.45 |
| 50 | Mild | 10 | 6.27 | 43.5 | 70.5 | B, E, R, B, E, B, B, B, B, R | −2.27 |
| 117 | Mild | 20 | 9.72 | 49.6 | 51 | E, E, B, R, B, E, B, E, T, T, B, T, B, E, B, E, T, T, B, B | −0.31 |
Disease aggressiveness for each subject was based on publicly available metadata; n represents the number of sputum microbiomes available for each subject, observed over the period, given in years.
Clinical states observed: B, baseline; E, exacerbation; T, treatment; R, recovery.
Linear regression estimates of the annual change in FEV1 percent predicted for each subject are shown at far right. The table is sorted by increasing median age.
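The FEV1/yr column is described as a per-subject linear regression estimate. A minimal sketch of such an estimate, assuming ordinary least squares of ppFEV1 on age in years, is given below; the function name `annual_change` and the example measurements are hypothetical, and the authors' exact model specification is not given in this record.

```python
def annual_change(ages, fev1):
    """Least-squares slope of FEV1 percent predicted against age (per year)."""
    if len(ages) != len(fev1):
        raise ValueError("ages and fev1 must have the same length")
    n = len(ages)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = sum(ages) / n
    mean_y = sum(fev1) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    if sxx == 0:
        raise ValueError("all observations share the same age")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, fev1))
    return sxy / sxx


# Hypothetical subject losing two percentage points per year.
print(annual_change([10, 11, 12], [80, 78, 76]))
```

A negative slope corresponds to declining lung function, matching the sign convention of the FEV1/yr values in the table.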
FIG 5 Association of clusters across samples and disease state. (A) Frequency of observing various community types in subjects designated having mild or moderate/severe disease. Subjects with mild disease (n = 14, left) are compared to subjects with moderate/severe disease (n = 10, right). “Subject identifier” corresponds to “isolate” in Table 2, a field provided in the Carmody (18) data set. Community types identified in these longitudinal samples were deemed similar to those in Fig. 3A by the random forest classifier. (B) Frequency of observing various clinical states for the 14 subjects with mild disease aggressiveness (left) and the 10 subjects with moderate/severe disease (right), as shown in Table 2. Clinical states are defined as B, baseline; E, exacerbation; T, treatment; and R, recovery. Community types identified in these longitudinal samples were deemed similar to those in Fig. 3A by the random forest classifier.
FIG 6 Support for the climax-attack model. (A) Median within-subject pairwise distances (Bray-Curtis dissimilarity) for n = 14 subjects with mild disease aggressiveness (left) and n = 10 subjects with moderate/severe disease (right) as shown in Table 2. Subjects with mild disease had samples that were significantly more similar to each other (* signifies P < 0.05, Wilcoxon rank sum exact test). (B) Markov chain diagram for community transitions in subjects with mild disease (see Table 2). Each node in the diagram (orange circles) represents the state of being observed with a specific bacterial community. Edges leading from each node represent possible transitions, including the possibility of observing the same community again (loop). Numbers on arrows are estimates of the probability of taking a specific path. Subjects with mild disease were significantly more likely to remain in Pa.M1 than subjects with moderate/severe disease (red arrow; * indicates P = 0.03, Fisher’s exact test, odds ratio 1.78). (C) Markov chain diagram for community transitions in subjects with moderate/severe disease (see Table 2). Each node in the diagram (orange circles) represents the state of being observed with a specific bacterial community. Edges leading from each node represent possible transitions, including the possibility of observing the same community again (loop). Numbers on arrows are estimates of the probability of taking a specific path.
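The edge probabilities in a Markov chain diagram of this kind can be estimated by counting transitions between consecutive samples and normalizing each row. The sketch below assumes that every pair of consecutive samples from the same subject counts as one transition, regardless of the time between them; that assumption, the function name `transition_probabilities`, and the example sequences are hypothetical and are not taken from the study.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities.

    sequences: one list of community labels per subject, in sampling order.
    Returns {state: {next_state: probability}} for states with at least
    one observed outgoing transition.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair each sample with the next sample from the same subject.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1

    probabilities = {}
    for state, row in counts.items():
        total = sum(row.values())
        probabilities[state] = {nxt: c / total for nxt, c in row.items()}
    return probabilities


# Hypothetical community sequences for two subjects.
example = [["Pa.M1", "Pa.M1", "Pa.D"], ["Pa.M1", "Pa.M1"]]
print(transition_probabilities(example))
```

The probability on a loop, such as Pa.M1 to Pa.M1, is the estimated chance of observing the same community again at the next sample. Comparing such counts between disease groups would require a separate test, for example Fisher's exact test on the underlying counts as named in the caption.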