Qiong Wang, John F Quensen, Jordan A Fish, Tae Kwon Lee, Yanni Sun, James M Tiedje, James R Cole.
Abstract
Biological nitrogen fixation is an important component of sustainable soil fertility and a key component of the nitrogen cycle. We used targeted metagenomics to study the nitrogen fixation-capable terrestrial bacterial community by targeting the gene for nitrogenase reductase (nifH). We obtained 1.1 million nifH 454 amplicon sequences from 222 soil samples collected from 4 National Ecological Observatory Network (NEON) sites in Alaska, Hawaii, Utah, and Florida. To accurately detect and correct frameshifts caused by indel sequencing errors, we developed FrameBot, a tool for frameshift correction and nearest-neighbor classification, and compared its accuracy to that of two other rapid frameshift correction tools. We found FrameBot was, in general, more accurate as long as a reference protein sequence with 80% or greater identity to a query was available, as was the case for virtually all nifH reads for the 4 NEON sites. Frameshifts were present in 12.7% of the reads. Those nifH sequences related to the Proteobacteria phylum were most abundant, followed by those for Cyanobacteria in the Alaska and Utah sites. Predominant genera with nifH sequences similar to reads included Azospirillum, Bradyrhizobium, and Rhizobium, the latter two without obvious plant hosts at the sites. Surprisingly, 80% of the sequences had greater than 95% amino acid identity to known nifH gene sequences. These samples were grouped by site and correlated with soil environmental factors, especially drainage, light intensity, mean annual temperature, and mean annual precipitation. FrameBot was tested successfully on three ecofunctional genes but should be applicable to any.
IMPORTANCE: High-throughput phylogenetic analysis of microbial communities using rRNA-targeted sequencing is now commonplace; however, such data often allow little inference with respect to either the presence or the diversity of genes involved in most important ecological processes. To study the gene pool for these processes, it is more straightforward to assess the genes directly responsible for the ecological function (ecofunctional genes). However, analyzing these genes involves technical challenges beyond those seen for rRNA. In particular, frameshift errors cause garbled downstream protein translations. Our FrameBot tool described here both corrects frameshift errors in query reads and determines their closest matching protein sequences in a set of reference sequences. We validated this new tool with sequences from defined communities and demonstrated the tool's utility on nifH gene fragments sequenced from soils in well-characterized and major terrestrial ecosystem types.
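To make the point about garbled downstream translations concrete, the following is a minimal Python sketch. It is not FrameBot's algorithm; the toy sequence, the `translate` helper, and the position of the inserted base are all assumptions chosen for illustration. It shows a single homopolymer insertion shifting the reading frame, and the translation being restored once the extra base is removed.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate DNA in frame 0, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical error-free fragment (not a real nifH sequence).
reference = "ATGGCTAAAGGTGGTATCGGTAAATCC"

# Simulated indel error: one extra A in the AAA homopolymer after position 9.
read = reference[:9] + "A" + reference[9:]

# Correction in this toy case: drop the known extra base to restore the frame.
corrected = read[:9] + read[10:]

print(translate(reference))  # MAKGGIGKS
print(translate(read))       # MAKRWYR*I (garbled after the insertion)
print(translate(corrected))  # MAKGGIGKS
```

In practice the position of the error is unknown, which is why a tool such as FrameBot relies on comparison with reference protein sequences to locate and correct it.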
Year: 2013 PMID: 24045641 PMCID: PMC3781835 DOI: 10.1128/mBio.00592-13
Source DB: PubMed Journal: mBio Impact factor: 7.867
Frameshifts detected with FrameBot for regions from three genes amplified from defined communities
| Gene | Strain | No. of sequences passing FrameBot | No. of sequences with frameshifts | No. of frameshifts |
|---|---|---|---|---|
| | | 5,726 | 772 | 1,056 |
| | | 4,053 | 309 | 399 |
| | | 9,140 | 1,947 | 2,224 |
| | | 407 | 206 | 325 |
| | | 291 | 99 | 154 |
| | | 724 | 146 | 242 |
| | | 1,028 | 719 | 1,026 |
Number of reads with a FrameBot best match of greater than 30% identity to a (known) defined community sequence.
Number of reads with one or more frameshifts detected by FrameBot.
Total number of frameshifts detected by FrameBot.
nifH, nitrogenase reductase.
but, butyryl-CoA: acetate CoA-transferase.
bphA, biphenyl dioxygenase alpha subunit.
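The counts in the table above can be turned into per-row frameshift rates. The sketch below is illustrative only: rows are identified by their order in the table because the gene and strain labels are not available here, and the function and key names are assumptions.

```python
# (sequences passing FrameBot, sequences with frameshifts, total frameshifts),
# listed in the same order as the table rows.
ROWS = [
    (5726, 772, 1056),
    (4053, 309, 399),
    (9140, 1947, 2224),
    (407, 206, 325),
    (291, 99, 154),
    (724, 146, 242),
    (1028, 719, 1026),
]


def frameshift_stats(passing, with_frameshifts, total_frameshifts):
    """Return the share of affected reads and the frameshifts per affected read."""
    return {
        "pct_reads_with_frameshifts": 100.0 * with_frameshifts / passing,
        "frameshifts_per_affected_read": total_frameshifts / with_frameshifts,
    }


STATS = [frameshift_stats(*row) for row in ROWS]

for index, stats in enumerate(STATS, start=1):
    print(
        f"row {index}: "
        f"{stats['pct_reads_with_frameshifts']:.1f}% of reads affected, "
        f"{stats['frameshifts_per_affected_read']:.2f} frameshifts per affected read"
    )
```

The spread between rows is wide, from under 10% of reads affected to roughly 70%, which suggests that the frameshift rate depends strongly on the amplified region.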
FIG 1 FrameBot performance using reference sequences at various percentages of identity to query sequences. Target protein sequences were chosen from the FunGene site (http://fungene.cme.msu.edu) at various distances from the known defined community sequences. The error rates at 100% identity represent baseline sequencing errors. The test genes are nifH (nitrogenase reductase) (A); bphA (biphenyl dioxygenase alpha subunit) (B); and but (butyryl-CoA: acetate CoA-transferase) (C). Dotted lines represent the overall error rates for FragGeneScan and HMMFrame on the same amplicon data. The error rate from HMMFrame for nifH shown here (0.36%) is calculated from an HMM trained on the group I, II, and III sequences from the augmented Zehr reference set. When trained on the entire augmented Zehr reference set, the error rate rose to 0.67%, and when trained on the group I-only sequences, the error rate was 0.34%.
Characteristics of NEON sites and samples used for nifH amplicon analysis
| Site parameter | Alaska (AK) | Florida (FL) | Hawaii (HI) | Utah (UT) |
|---|---|---|---|---|
| Ecological region | Boreal forest/taiga | Subtropical/dry forest | Subtropical/lower montane wet forest | Grassland/shrubland |
| Soil type | Goldstream silt loam | Candler fine sand | Akaka silty clay loam | Taylorsflat loam |
| Soil taxonomy | Coarse-silty mixed, superactive, subgelic typic Histoturbels | Hyperthermic, uncoated Lamellic Quartzipsamments | Hydrous, ferrihydritic, isothermic Acrudoxic Hydrudands | Fine-loamy, mixed, superactive, mesic, xeric Haplocalcids |
| Drainage | Very poor | Excessive | Moderately well | Well |
| MAP (mm) | 260 | 750 | 4,000 | 274 |
| MAT (°C) | −3 | 20 | 16 | 8.9 |
| Latitude, longitude | 65.15N 147.49W | 29.69N 81.99W | 19.93N 155.28W | 40.17N 112.45W |
| Light intensity | 3 | 4 | 2 | 5 |
| % OM | 18.3 ± 11.6 | 1.2 ± 0.3 | 51.4 ± 12.3 | 1.5 ± 0.3 |
| pH | 4.6 ± 0.8 | 5.0 ± 0.6 | 4.9 ± 0.7 | 8.0 ± 0.3 |
| Water content (g) | 5.5 ± 1.6 | 0.25 ± 0.18 | 0.74 ± 0.07 | 0.52 ± 0.56 |
| Ca (ppm) | 724 ± 337 | 121 ± 60 | 564 ± 401 | 5,110 ± 651 |
| Na (ppm) | 15.7 ± 2.9 | 10.7 ± 1.6 | 33.1 ± 8.5 | 30.1 ± 4.6 |
| Microbial biomass (mg C/kg) | 4.7 ± 3 | 1.0 ± 0.5 | 31.1 ± 18.7 | 8.9 ± 3.5 |
| No. of samples | 26 | 17 | 171 | 8 |
| No. of sequences | 125,294 | 79,619 | 896,824 | 19,105 |
MAP (mean annual precipitation) and MAT (mean annual temperature) were measured for each site; % OM (organic matter) was calculated as the average of the % OM of samples from the same site. The % OM, water content, Ca, Na, and microbial biomass values are means ± standard deviations.
Ordered 1 to 5 by perceived relative sunlight exposure.
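As a quick consistency check, the per-site sample and sequence counts in the table can be summed and compared with the totals quoted in the abstract (222 samples, about 1.1 million sequences). This is a small Python sketch; the dictionary layout and variable names are assumptions.

```python
# Per-site counts taken from the "No. of samples" and "No. of sequences" rows.
SITES = {
    "AK": {"samples": 26, "sequences": 125_294},
    "FL": {"samples": 17, "sequences": 79_619},
    "HI": {"samples": 171, "sequences": 896_824},
    "UT": {"samples": 8, "sequences": 19_105},
}

total_samples = sum(site["samples"] for site in SITES.values())
total_sequences = sum(site["sequences"] for site in SITES.values())

print(f"total samples: {total_samples}")
print(f"total sequences: {total_sequences:,}")

# Sampling depth is uneven across sites, so report per-site shares as well.
for name, site in SITES.items():
    share = 100.0 * site["samples"] / total_samples
    per_sample = site["sequences"] / site["samples"]
    print(f"{name}: {share:.1f}% of samples, {per_sample:,.0f} sequences per sample")
```

The Hawaii site contributes most of the samples, which is worth keeping in mind when comparing site-level averages.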
FIG 2 Relative abundances of NEON reads grouped by nearest matches at the phylum and class levels, averaged for each site (observatory) as indicated by state. The three most dominant genera in alphaproteobacteria are also shown. Other, all phyla with less than 0.5% nearest matches from any site.
FIG 3 Principal component analysis of NEON samples. (A) PC1 and PC2. (B) PC2 and PC3. The input data were standardized using the Wisconsin square root normalization as implemented in R. Ellipses represent 1 standard deviation of the points from the centroid. The soil environmental variables were fitted to the ordination using the envfit method from the labdsv R package. Arrows were plotted for variables with significance of fit ≤ 0.01.
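The standardization named in the FIG 3 caption can be sketched as follows, assuming it refers to the common idiom of a square root transform followed by Wisconsin double standardization (each taxon divided by its maximum across samples, then each sample divided by its total). The original analysis was done in R; this pure Python version, the function name, and the toy count matrix are illustrative assumptions, not the authors' code.

```python
import math


def wisconsin_sqrt(counts):
    """Square-root transform, then Wisconsin double standardization.

    `counts` is a list of samples (rows), each a list of taxon counts (columns).
    """
    rooted = [[math.sqrt(value) for value in row] for row in counts]
    n_cols = len(rooted[0])

    # Step 1: divide each taxon (column) by its maximum across samples.
    col_max = [max(row[j] for row in rooted) for j in range(n_cols)]
    by_max = [
        [value / col_max[j] if col_max[j] > 0 else 0.0 for j, value in enumerate(row)]
        for row in rooted
    ]

    # Step 2: divide each sample (row) by its total.
    standardized = []
    for row in by_max:
        total = sum(row)
        standardized.append([value / total if total > 0 else 0.0 for value in row])
    return standardized


# Hypothetical 3-sample by 3-taxon count matrix.
toy_counts = [
    [100, 0, 4],
    [25, 9, 16],
    [0, 36, 1],
]

for row in wisconsin_sqrt(toy_counts):
    print([round(value, 3) for value in row])
```

After this step every sample sums to 1, so abundant and rare taxa contribute on a comparable scale to the ordination.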
Number of Zehr (augmented) reference sequences matching the Poly primers and NEON reads
| Group(s) | Total | No. matching primers | No. matching reads |
|---|---|---|---|
| I | 183 | 168 | 145 |
| II | 88 | 44 | 21 |
| III | 48 | 17 | 8 |
| IV and V | 356 | 13 | 0 |
Group I consists primarily of aerobes, including cyanobacteria and proteobacteria, with typical Mo-dependent nitrogenases; group II consists of anaerobes and Archaea, and group III consists of those with alternative metal nitrogenases (15).
At most two mismatches to forward and reverse primers. A total of 118 group I reference sequences had 0 or 1 mismatch.
Reference sequences with at least 95% amino acid identity to one or more reads from the NEON samples.
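The table above is easier to compare across groups when expressed as percentages of each group's reference sequences. The following Python sketch does that; the dictionary layout and key names are assumptions, and the numbers are copied from the table.

```python
# group: (total references, references matching the primers, references matched by reads)
GROUPS = {
    "I": (183, 168, 145),
    "II": (88, 44, 21),
    "III": (48, 17, 8),
    "IV and V": (356, 13, 0),
}

COVERAGE = {
    group: {
        "pct_matching_primers": 100.0 * primers / total,
        "pct_matching_reads": 100.0 * reads / total,
    }
    for group, (total, primers, reads) in GROUPS.items()
}

for group, values in COVERAGE.items():
    print(
        f"group {group}: "
        f"{values['pct_matching_primers']:.1f}% match the primers, "
        f"{values['pct_matching_reads']:.1f}% are matched by NEON reads"
    )
```

Group I references are covered far better than the others by both the primers and the reads, while groups IV and V are barely matched by the primers and not at all by the reads.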