Briony Jones, Tim Goodall, Paul B L George, Hyun S Gweon, Jeremy Puissant, Daniel S Read, Bridget A Emmett, David A Robinson, Davey L Jones, Robert I Griffiths.
Abstract
High-throughput 16S rRNA gene sequencing surveys have enabled new insights into the diversity of soil bacteria and have furthered understanding of the ecological drivers of bacterial abundances across landscapes. However, current analytical approaches are of limited use for formally synthesizing the ecological attributes of the taxa discovered, because derived taxonomic units are typically unique to individual studies and sequence identification databases characterize only taxonomy. To address this, we used sequences obtained from a large nationwide soil survey (GB Countryside Survey, henceforth CS) to create a comprehensive soil-specific 16S reference database, coupled with ecological information derived from survey metadata. Specifically, we modeled taxon responses to soil pH at the operational taxonomic unit (OTU) level using hierarchical logistic regression (HOF) models, to provide information on both the shape of landscape-scale pH-abundance responses and the pH optima (the pH at which OTU abundance is maximal). We found that most of the soil OTUs examined exhibited a non-flat relationship with soil pH. Further, the pH optima could not be generalized by broad taxonomy, highlighting the need for tools and databases that synthesize ecological traits at finer taxonomic resolution. We further demonstrate the utility of the database by testing it against geographically dispersed query 16S datasets, evaluating efficacy by quantifying matches and by measuring accuracy in predicting the pH responses of query sequences from a separate large soil survey. We found that the CS database provided good coverage of dominant taxa, and that the taxa indicating soil pH in a query dataset corresponded with the pH classifications of their top matches in the CS database. Furthermore, we were able to predict query dataset community structure using predicted abundances of dominant taxa, based on query soil pH data and the HOF models of matched CS database taxa.
The database with associated HOF model outputs is released as an online portal for querying single sequences of interest (https://shiny-apps.ceh.ac.uk/ID-TaxER/), and flat files are made available for use in bioinformatic pipelines. Further development of advanced informatics infrastructures that incorporate modeled ecological attributes along with new functional genomic information will likely facilitate large-scale exploration and prediction of soil microbial functional biodiversity under current and future environmental change scenarios.
Keywords: 16S database; HOF modeling; amplicon 16S rRNA; countryside survey; ecological responses; soil bacteria communities; traits
Year: 2021 PMID: 34349739 PMCID: PMC8326369 DOI: 10.3389/fmicb.2021.682886
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
FIGURE 1. Coverage of bacterial 97% OTUs within the Countryside Survey (CS) dataset. Sample-based richness accumulation curves were calculated across 1,006 CS soil samples (“All sites”) and within specific habitats. Standard deviations were calculated from random permutations of the data.
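The caption does not say how the permutations were run. As a minimal sketch, assuming a presence/absence matrix of samples by OTUs, a sample-based accumulation curve can be computed by shuffling the sample order many times and counting how many distinct OTUs have been seen after each added sample; the mean and standard deviation across shuffles give the curve and its spread. This mirrors the "random" method of the R vegan function `specaccum`, but the function name `accumulation_curve` and its parameters `n_perm` and `seed` are hypothetical, not the authors' code.

```python
import numpy as np


def accumulation_curve(presence, n_perm=100, seed=0):
    """Sample-based OTU richness accumulation curve (hypothetical helper).

    presence: boolean array (n_samples, n_otus), True where an OTU was detected.
    Returns (mean, sd), each of length n_samples: cumulative richness after
    1..n_samples samples, summarized over n_perm random sample orderings.
    """
    rng = np.random.default_rng(seed)
    presence = np.asarray(presence, dtype=bool)
    n_samples = presence.shape[0]
    curves = np.empty((n_perm, n_samples))
    for i in range(n_perm):
        order = rng.permutation(n_samples)
        # Running OR down the shuffled samples: an OTU counts once it has been seen
        seen = np.logical_or.accumulate(presence[order], axis=0)
        curves[i] = seen.sum(axis=1)
    return curves.mean(axis=0), curves.std(axis=0)
```

The last point always equals the total richness of the matrix and has zero standard deviation, because every permutation ends with all samples included.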
TABLE 1. Validating the use of the CS OTU sequences as a database by querying with independent datasets.
| Query dataset | Description | Matched to CS database | Primers and 16S region | Accession |
| --- | --- | --- | --- | --- |
| 1 | Grassland and arable soils, Britain | 67.26% | 341f/806r V3-V4 | PRJEB36119 |
| 2 | All habitat soils survey, Wales | 58.49% | 515f/806rB V4 | PRJEB27883 |
| 3 | Thames River, Britain | 33.2% | 341f/806r V3-V4 | Unpublished, see |
FIGURE 2. The CS database provides good coverage of dominant taxa within a query dataset. Query OTU reference sequences (dataset 1, Table 1) were ranked by decreasing abundance and grouped into 1,000 bins (e.g., the 1,000th bin contains the least abundant OTUs), and the proportion of OTUs in each bin matching the CS database was calculated and is displayed on the y-axis. The proportion of matches to the CS database (>97% similarity) declines as query taxa become rarer, despite the comprehensive nature of the CS database.
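The binning behind Figure 2 can be sketched as follows, assuming you already have, for each query OTU, its total read count and a flag saying whether its best hit in the CS database exceeds 97% similarity (how that hit is obtained, for example with BLAST or VSEARCH, is outside this sketch). OTUs are sorted by decreasing abundance, split into `n_bins` near-equal bins, and the match rate of each bin is returned. The function name and arguments are illustrative assumptions.

```python
import numpy as np


def match_rate_by_rank_bin(abundance, matched, n_bins=1000):
    """Share of query OTUs matching the reference within abundance-rank bins.

    abundance: total read count per query OTU.
    matched:   True where the OTU's best reference hit is > 97% similar.
    Bin 0 holds the most abundant OTUs; bin n_bins - 1 holds the rarest.
    """
    abundance = np.asarray(abundance, dtype=float)
    matched = np.asarray(matched, dtype=bool)
    if abundance.shape != matched.shape:
        raise ValueError("abundance and matched must have the same length")
    if abundance.size < n_bins:
        raise ValueError("need at least n_bins OTUs so that no bin is empty")
    # Rank OTUs from most to least abundant (stable sort keeps ties in input order)
    order = np.argsort(-abundance, kind="stable")
    # Split the ranked match flags into n_bins contiguous, near-equal bins
    bins = np.array_split(matched[order], n_bins)
    return np.array([b.mean() for b in bins])
```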
FIGURE 3. Examples of the five HOF model types. HOF models were generated by fitting Countryside Survey OTU abundances to soil pH (pH range 3.63 to 8.75). The five HOF models used were: (I) no trend, no change in abundance across the pH gradient; (II) monotonic, an increase or decrease in abundance along the pH gradient; (III) plateau, an increase or decrease in abundance along the pH gradient that plateaus; (IV) symmetrical unimodal, abundance increases and decreases across the gradient at an equal rate; (V) skewed unimodal, abundance increases and decreases across the gradient at unequal rates.
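The caption describes the model shapes but not their equations. The sketch below assumes the Huisman-Olff-Fresco parameterisation used by the eHOF R package, where `M` is the maximum attainable response and `a`, `b`, `c`, `d` are fitted parameters, and it finds a pH optimum by grid search over the CS pH range (3.63 to 8.75). The names `hof` and `ph_optimum` are hypothetical; this is not the authors' fitting code, and estimating the parameters themselves is not shown.

```python
import numpy as np


def hof(model, x, a, b=0.0, c=0.0, d=0.0, M=1.0):
    """HOF response curves I-V (assumed eHOF-style parameterisation)."""
    x = np.asarray(x, dtype=float)
    rise_fall = 1.0 / (1.0 + np.exp(a + b * x))
    if model == "I":    # no trend: constant response across the gradient
        return M / (1.0 + np.exp(a)) * np.ones_like(x)
    if model == "II":   # monotonic increase or decrease
        return M * rise_fall
    if model == "III":  # monotonic, levelling off at a plateau below M
        return M * rise_fall / (1.0 + np.exp(c))
    if model == "IV":   # symmetric unimodal: same slope b on both flanks
        return M * rise_fall / (1.0 + np.exp(c - b * x))
    if model == "V":    # skewed unimodal: slopes b and d differ
        return M * rise_fall / (1.0 + np.exp(c - d * x))
    raise ValueError(f"unknown HOF model {model!r}")


def ph_optimum(model, params, ph_min=3.63, ph_max=8.75, n=5121):
    """Grid search (step 0.001 by default) for the pH of maximal response."""
    grid = np.linspace(ph_min, ph_max, n)
    return grid[np.argmax(hof(model, grid, **params))]
```

For model IV the optimum also has a closed form, x = (c - a)/(2b), which the grid search reproduces: with a = -8, b = 1, c = 4 it returns a value within one grid step of pH 6.0. For model I the curve is flat, so `argmax` simply returns the lowest grid pH; a flat response has no meaningful optimum.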
Percentage of 13,781 CS OTUs fitted to each HOF model.
| HOF model | Share of OTUs |
| --- | --- |
| V (Skewed Unimodal) | 45.76% |
| III (Plateau) | 24.13% |
| IV (Unimodal) | 23.52% |
| II (Monotonic) | 6.11% |
| I (No trend) | 0.49% |
Percentage of 13,781 CS OTUs classified into different pH response groups.
| pH response group | Share of OTUs |
| --- | --- |
| Mid (5.2 < Optimum < 7) | 34.8% |
| Neutral (Optimum > 7) | 31.62% |
| Acid (Optimum < 5.2) | 23.08% |
| Mid to Neutral (5.2 < Optimum 1 < 7 and Optimum 2 > 7) | 7.41% |
| Acid to Neutral (Optimum 1 < 5.2 and Optimum 2 > 7) | 1.52% |
| Acid to Mid (Optimum 1 < 5.2 and 5.2 < Optimum 2 < 7) | 1.14% |
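The group boundaries in the table translate directly into a small classifier. This sketch assumes each OTU is described either by a single optimum or by two optima (`opt1`, `opt2`). The table does not say how values exactly at pH 5.2 or 7 were assigned, so here they fall into the mid class, and two optima in the same zone collapse to that zone; both choices are assumptions. The names `ph_zone` and `ph_group` are hypothetical.

```python
ZONE_ORDER = ("Acid", "Mid", "Neutral")


def ph_zone(ph):
    """Single-optimum class: acid < 5.2, neutral > 7, mid otherwise (boundaries assumed)."""
    if ph < 5.2:
        return "Acid"
    if ph > 7:
        return "Neutral"
    return "Mid"


def ph_group(opt1, opt2=None):
    """pH response group from one optimum, or from two optima that may span zones."""
    if opt2 is None:
        return ph_zone(opt1)
    # Order the two zones from acid to neutral regardless of argument order
    low, high = sorted((ph_zone(opt1), ph_zone(opt2)), key=ZONE_ORDER.index)
    # Both optima in the same zone: report that zone alone (the table has no such row)
    return low if low == high else f"{low} to {high}"
```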
FIGURE 4. The phylogenetic distribution of bacterial pH optima. A phylogenetic tree of all OTUs present in >100 samples (6,385 OTUs in total), with each OTU annotated in the outer ring according to its pH classification based on HOF model optima.
FIGURE 5. Validating the pH models using a query dataset. Taxa strongly responsive to soil pH were identified in query dataset 1 (Table 1) and then matched to the CS database to evaluate the utility of the approach. (A) NMDS ordination plot of the query dataset, with pH groupings denoted by color (red = pH < 5.2; green = 5.2 < pH < 7; blue = pH > 7). (B) Indicator species analyses on the query dataset revealed 477 OTUs strongly associated with the three pH classes (“Observed pH class”). The y-axis values and point colors denote, respectively, the predicted pH optimum and the predicted pH class after matching to the CS database. (C) The relative abundances of the 100 most abundant taxa in the query dataset were predicted using the CS HOF models of matched taxa and subjected to NMDS ordination. The predicted abundances of these taxa reliably predicted the first-axis NMDS scores of the observed data.
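The prediction step in panel C can be sketched as: evaluate each matched taxon's HOF curve at each query sample's measured pH, then normalize each sample to relative abundances. For brevity this assumes every taxon is described by model V (in the eHOF-style parameterisation assumed here, model IV is the special case `d = b`); a fuller version would dispatch on whichever model was selected for each CS match. The names `hof_v` and `predict_community` and the parameter values in the usage note are hypothetical.

```python
import numpy as np


def hof_v(x, a, b, c, d, M=1.0):
    """Skewed unimodal HOF model V (assumed eHOF-style parameterisation)."""
    return M / ((1.0 + np.exp(a + b * x)) * (1.0 + np.exp(c - d * x)))


def predict_community(sample_ph, taxon_params):
    """Predicted relative abundances (samples x taxa) from soil pH alone.

    sample_ph:    measured soil pH of each query sample.
    taxon_params: one dict of model V parameters per matched taxon, standing in
                  for the fitted HOF model of that taxon's CS database match.
    """
    ph = np.asarray(sample_ph, dtype=float)
    # One column per taxon: its modeled response at every sample's pH
    pred = np.column_stack([hof_v(ph, **p) for p in taxon_params])
    # Normalize each sample (row) so predicted abundances sum to 1
    return pred / pred.sum(axis=1, keepdims=True)
```

For example, with an acid taxon (a = -9, b = 2, c = 9, d = 2; optimum pH 4.5) and a neutral taxon (a = -15, b = 2, c = 15, d = 2; optimum pH 7.5), a sample at pH 4.5 is predicted to be dominated by the acid taxon and a sample at pH 7.8 by the neutral one. The predicted matrix could then be ordinated, as the caption reports with NMDS, and compared with the observed ordination; that step is not shown here.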