David G Weissbrodt, Noam Shani, Lucas Sinclair, Grégory Lefebvre, Pierre Rossi, Julien Maillard, Jacques Rougemont, Christof Holliger.
Abstract
BACKGROUND: In molecular microbial ecology, massive sequencing is gradually replacing classical fingerprinting techniques such as terminal-restriction fragment length polymorphism (T-RFLP) combined with cloning-sequencing for the characterization of microbiomes. Here, a bioinformatics methodology for pyrosequencing-based T-RF identification (PyroTRF-ID) was developed to combine pyrosequencing and T-RFLP approaches for the description of microbial communities. The strength of this methodology lies in the identification of T-RFs by comparison of experimental and digital T-RFLP profiles obtained from the same samples. DNA extracts were subjected to amplification of the 16S rRNA gene pool, T-RFLP with the HaeIII restriction enzyme, 454 tag-encoded FLX amplicon pyrosequencing, and PyroTRF-ID analysis. Digital T-RFLP profiles were generated from the denoised full pyrosequencing datasets, and the sequences contributing to each digital T-RF were classified into taxonomic bins using the Greengenes reference database. The method was tested on bacterial communities found both in chloroethene-contaminated groundwater samples and in aerobic granular sludge biofilms originating from wastewater treatment systems.
Year: 2012 PMID: 23270314 PMCID: PMC3566925 DOI: 10.1186/1471-2180-12-306
Source DB: PubMed Journal: BMC Microbiol ISSN: 1471-2180 Impact factor: 3.605
Figure 1. Data workflow in the PyroTRF-ID bioinformatics methodology. Experimental pyrosequencing and T-RFLP input datasets (black parallelograms), reference input databases (white parallelograms), data processing (white rectangles), output files (grey sheets).
Combinations of algorithms tested for the processing of pyrosequencing datasets for dT-RFLP profiling in PyroTRF-ID
| Combination | PHRED quality cutoff | Minimum read length | Denoising | SW mapping score cutoff | In silico restriction |
| --- | --- | --- | --- | --- | --- |
| 1) Standard dT-RFLPd | >20e | >300 bp | Yes | >150f | Yes |
| 2) Filtered dT-RFLPe | >20 | >300 bp | No | >150 | Yes |
| 3) Raw dT-RFLPd | >20 | >300 bp | No | No (0)g | Yes |
a PHRED score $= -10 \log_{10} P_{\mathrm{error}}$, with $P_{\mathrm{error}} = 10^{-\mathrm{PHRED}/10}$ the probability that a base was called incorrectly. For all trials, the raw pyrosequencing datasets were systematically filtered according to the PHRED quality score. Only sequences with a PHRED score above 20 were retained, which corresponds to a $P_{\mathrm{error}}$ of 1/100.
b A SW mapping score of 150 was set as the cutoff. When sequences had been denoised beforehand, no denoised sequence was rejected at the mapping stage. Processing without filtering by the SW mapping score was done by setting the cutoff to 0.
c The processed sequences were digested in silico with the restriction enzyme.
d The first combination with denoising was defined as the standard PyroTRF-ID procedure.
e In the second combination, only a filtering method at the mapping stage was considered.
f In the third combination, raw datasets of sequences obtained after PHRED-filtering of the pyrosequencing datasets were digested without post-processing.
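The PHRED relation in footnote a and the >20 and >300 bp cutoffs listed in the table can be illustrated with a short Python sketch. The function names are hypothetical, and the assumption that a read is judged by its mean per-base quality is ours; the source does not state how the per-read score is aggregated in PyroTRF-ID.

```python
import math


def phred_to_error(phred):
    """Probability that a base was called incorrectly: 10^(-PHRED/10)."""
    return 10 ** (-phred / 10)


def error_to_phred(p_error):
    """Inverse relation: PHRED = -10 * log10(P_error)."""
    return -10 * math.log10(p_error)


def passes_quality_filter(qualities, min_phred=20, min_length=300):
    """Keep a read if it is longer than min_length and its quality is above min_phred.

    ASSUMPTION: the read-level score is the mean of the per-base PHRED scores.
    """
    if not qualities:
        return False
    mean_phred = sum(qualities) / len(qualities)
    return len(qualities) > min_length and mean_phred > min_phred


if __name__ == "__main__":
    print(phred_to_error(20))                    # error probability at the cutoff
    print(passes_quality_filter([30] * 350))     # long, high-quality read
    print(passes_quality_filter([30] * 200))     # too short
```

With these cutoffs, a read is discarded either because it is too short or because its quality is too low; both thresholds are exposed as parameters so that other settings can be tried.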
Phylogenetic annotation of identified T-RFs
| eT-RF (bp) | dT-RF (bp) | Shifted dT-RF (bp) | Counts | Relative contribution to T-RF (%) | Phylogenetic affiliation | Reference OTU | GenBank accession | SW mapping score | Normalized SW mapping score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n.a. (32)j | 39 | 34 | 550 | 70.6 | F: | 4015 | GQ396926 | 386 | 0.960 |
| | | | (276) | (35.0) | (G: | (4045) | (EU834762) | (452) | (0.983) |
| | | | (128) | (16.0) | (G: | (4035) | (EU834761) | (385) | (0.955) |
| | | | 112 | 14.3 | O: | 1151 | AY468464 | 434 | 1.000 |
| | | | 46 | 5.9 | F: | 2718 | AY212706 | 448 | 1.000 |
| | | | 37 | 4.8 | S: | 3160 | AB200295 | 363 | 0.917 |
| | | | 18 | 2.3 | O: | 1229 | GU454872 | 394 | 0.990 |
| | | | 5 | 0.6 | C: | 3370 | AY098896 | 403 | 0.906 |
| | | | 4 | 0.5 | O: | 2549 | EU429497 | 360 | 0.981 |
| | | | 4 | 0.5 | O: | 3246 | DQ228369 | 302 | 0.765 |
| | | | 1 | 0.1 | O: | 991 | EU104248 | 180 | 0.636 |
| 194 | 198 | 193 | 10 | 90.9 | G: | 3011 | AJ864847 | 384 | 1.000 |
| | | | 1 | 9.1 | F: | 4035 | EF027004 | 303 | 0.819 |
| 214 | 219 | 214 | 769 | 99.6 | S: | 3160 | AB200295 | 371 | 0.949 |
| | | | 1 | 0.1 | G: | 3158 | DQ066958 | 368 | 0.958 |
| | | | 1 | 0.1 | G: | 3156 | DQ413103 | 321 | 0.988 |
| | | | 1 | 0.1 | G: | 3136 | EU937892 | 278 | 0.753 |
| 220 | 225 | 220 | 50 | 92.6 | O: | 2580 | NR025302 | | |
| | | | (31) | (57.0) | (G: | | | | |
| | | | 2 | 3.7 | S: | 3160 | AB200295 | 206 | 0.703 |
| | | | 1 | 1.9 | F: | 2656 | AF236001 | 229 | 0.636 |
| | | | 1 | 1.9 | P: | 2235 | DQ413080 | 284 | 1.000 |
| 216 | 221 | 216 | 10 | 34.5 | S: | 3160 | AF502230 | 296 | 0.773 |
| | | | 8 | 27.6 | G: | 3136 | GU183579 | 364 | 0.948 |
| | | | 6 | 20.7 | C: | 1317 | EU104216 | 202 | 0.598 |
| | | | 3 | 10.3 | G: | 3158 | CU922545 | 360 | 0.909 |
| | | | 1 | 3.4 | G: | 2580 | L20802 | 281 | 0.829 |
| | | | 1 | 3.4 | G: | 3156 | DQ413103 | 273 | 0.898 |
| 223 | 228 | 223 | 44 | | F: | 418 | AF255629 | | |
| | | | | | (G: | | | | |
| | | | 15 | 24.6 | F: | 2656 | AF236001 | 298 | 0.674 |
| | | | 1 | 1.6 | F: | 441 | GQ009478 | 228 | 0.544 |
| | | | 1 | 1.6 | O: | 268 | GQ009478 | 153 | 0.447 |
| 239 | 243 | 238 | 275 | 98.9 | C: | 3370 | EU529737 | 446 | 0.982 |
| | | | 2 | 0.7 | G: | 4092 | AB476706 | 350 | 0.926 |
| | | | 1 | 0.4 | P: | 975 | EU332819 | 275 | 0.846 |
| 249 | 253 | 249 | 9 | 100.0 | S: | 3160 | AB200295 | 228 | 0.752 |
| 255 | 258 | 253 | 7 | 100.0 | O: | 1171 | FJ793188 | 355 | 0.989 |
| 260 | 263 | 258 | 16 | 94.1 | G: | 2360 | GQ487996 | 389 | 0.982 |
| | | | 1 | 5.9 | O: | 1171 | FJ536916 | 251 | 0.640 |
| 260 | 264 | 259 | 38 | 97.4 | O: | 1170 | EU104185 | 267 | 0.706 |
| | | | 1 | 2.6 | G: | 2360 | GQ487996 | 319 | 0.788 |
| 297 | 302 | 297 | 26 | 100.0 | G: | 1359 | NC009972 | 339 | 0.867 |
| 307 | 311 | 306 | 38 | 97.4 | P: | 975 | CU921283 | 218 | 0.472 |
| | | | 1 | 2.6 | O: | 1171 | EU104210 | 196 | 0.525 |
| 321 | 323 | 318 | 17 | 100.0 | G: | 1208 | EU104191 | 367 | 0.968 |
| 393 | 397 | 392 | 33 | 100.0 | G: | 3173 | CU466777 | 262 | 0.663 |
| 63 | 69 | 64 | 93 | 85.3 | F: | 3686 | AB354618 | 432 | 0.915 |
| | | | 14 | 12.8 | F: | 3681 | GU454947 | 290 | 0.816 |
| | | | 1 | 0.9 | F: | 3510 | AM902494 | 168 | 0.542 |
| | | | 1 | 0.9 | P: candidate phylum OP3 | 2388 | GQ356152 | 187 | 0.488 |
| 165 | 168 | 163 | 143 | 100.0 | G: | 1368 | EF059529 | 448 | 0.953 |
| 190 | 193 | 191 | 12 | 54.6 | F: | 3177 | AJ389624 | 379 | 0.945 |
| | | | 4 | 13.6 | F: | 2880 | AY785128 | 263 | 0.555 |
| | | | 2 | 9.1 | F: | 2872 | DQ811848 | 343 | 0.771 |
| | | | 2 | 9.1 | C: | 2451 | AY921822 | 337 | 0.926 |
| | | | 1 | 4.6 | F: | 2793 | AY625147 | 294 | 0.679 |
| | | | 1 | 4.6 | F: | 2641 | AB374390 | 328 | 0.877 |
| 198 | 201 | 196 | 140 | 98.6 | G: | 3215 | FJ810587 | 473 | 1.000 |
| | | | 2 | 1.4 | F: | 3039 | FN428768 | 311 | 0.814 |
| 210 | 214 | 209 | 233 | 98.3 | F: | 1367 | EU679418 | 262 | 0.665 |
| | | | 2 | 0.8 | O: | 3009 | AM777991 | 367 | 0.927 |
| | | | 1 | 0.4 | F: | 4130 | EU073764 | 295 | 0.848 |
| | | | 1 | 0.4 | P: candidate phylum TM7 | 4379 | DQ404736 | 277 | 0.723 |
| 216 | 221 | 216 | 1010 | 90.9 | F: | 3080 | EU802012 | 353 | 0.869 |
| | | | 94 | 8.5 | G: | 3050 | DQ628925 | 369 | 0.920 |
| | | | 3 | 0.3 | G: | 3093 | AY212692 | 291 | 0.744 |
| | | | 1 | 0.1 | G: | 3158 | GQ340363 | 296 | 0.765 |
| | | | 1 | 0.1 | F: | 2005 | AJ863357 | 338 | 0.833 |
| | | | 1 | 0.1 | C: | 1315 | AB179693 | 229 | 0.511 |
| | | | 1 | 0.1 | C: | 949 | EU644115 | 372 | 0.907 |
| 243 | 247 | 243 | 389 | 99.7 | F: | 1367 | EU679418 | 255 | 0.631 |
| | | | 1 | 0.3 | F: | 1321 | AB447642 | 253 | 0.806 |
a Experimental (eT-RF) and digital T-RFs (dT-RF).
b Digital T-RF obtained after having shifted the digital dataset with the most probable average cross-correlation lag.
c Number of reads of the target phylotype that contribute to the T-RF.
d Diverse bacterial affiliates can contribute to the same T-RF.
e Phylogenetic affiliation of the T-RF (K: kingdom, P: phylum, C: class, O: order, F: family, G: genus, S: species). Only the last identified phylogenetic branch is presented here.
f Reference operational taxonomic unit (OTU) from the Greengenes public database that gave the best SW mapping score. In the Greengenes taxonomy, OTUs refer to the terminal levels at which sequences are classified.
g GenBank accession numbers provided by Greengenes for reference sequences.
h Best SW mapping score obtained. SW scores consider nucleotide positions and gaps. The highest SW mapping score that can be obtained for a read is the length of the read itself.
i SW mapping score normalized by the read length, as an estimation of the percentage of identity.
j After observing the dT-RF at 34 bp, we returned to the raw eT-RFLP data and found an important eT-RF at 32 bp. However, Rossi et al. [8] considered that T-RFs below 50 bp are inconsistent and lack precision in sizing. This peak was therefore initially not taken into account in the original eT-RFLP profiles.
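Footnotes h and i describe a Smith-Waterman (SW) mapping score whose maximum equals the read length, and a normalized score obtained by dividing by the read length. The sketch below illustrates this with a minimal local-alignment scorer. The scoring scheme (+1 per match, -1 per mismatch, -1 per gap) is an assumption chosen so that a perfect match scores the read length; the source does not give the actual parameters or the mapping tool used, and the function names are hypothetical.

```python
def smith_waterman_score(read, reference, match=1, mismatch=-1, gap=-1):
    """Best local alignment score between read and reference (linear gap penalty).

    ASSUMPTION: match/mismatch/gap values are illustrative, not those of PyroTRF-ID.
    """
    previous = [0] * (len(reference) + 1)
    best = 0
    for i in range(1, len(read) + 1):
        current = [0] * (len(reference) + 1)
        for j in range(1, len(reference) + 1):
            step = match if read[i - 1] == reference[j - 1] else mismatch
            current[j] = max(
                0,                      # start a new local alignment
                previous[j - 1] + step,  # extend with a match or mismatch
                previous[j] + gap,       # gap in the reference
                current[j - 1] + gap,    # gap in the read
            )
            best = max(best, current[j])
        previous = current
    return best


def normalized_sw_score(read, reference):
    """SW score divided by read length, used as a rough estimate of identity."""
    if not read:
        raise ValueError("read must not be empty")
    return smith_waterman_score(read, reference) / len(read)


if __name__ == "__main__":
    print(normalized_sw_score("ACGT", "TTACGTTT"))       # read found exactly in the reference
    print(normalized_sw_score("ACGTACGT", "ACGTTCGT"))   # one mismatch over 8 bases
```

Under this scheme, a row such as a score of 386 with a normalized value of 0.960 would correspond to a read of roughly 400 bases.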
T-RF diversity for single phylogenetic descriptions
| Phylogenetic affiliation | dT-RF (bp) | Shifted dT-RF (bp) | Counts | Relative contribution to T-RF (%) | Reference OTU | GenBank accession | SW mapping score | Normalized SW mapping score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | 39 | 34 | 37 | 4.8 | 3160 | AB200295 | 363 | 0.917 |
| | 199 | 194 | 1 | 25.0 | 3160 | AB200295 | 248 | 0.648 |
| | 205 | 200 | 3 | 100.0 | 3160 | AF204247 | 314 | 0.858 |
| | 210 | 205 | 1 | 100.0 | 3160 | AF204247 | 211 | 0.699 |
| | 218 | 213 | 11 | 91.7 | 3160 | AB200295 | 356 | 0.942 |
| | 220 | 215 | 6 | 37.5 | 3160 | AF502230 | 318 | 0.817 |
| | 221 | 216 | 1 | 7.7 | 3160 | AF502230 | 276 | 0.865 |
| | 225 | 220 | 2 | 3.7 | 3160 | AB200295 | 206 | 0.703 |
| | 252 | 247 | 3 | 100.0 | 3160 | AB200295 | 305 | 0.762 |
| | 253 | 248 | 9 | 100.0 | 3160 | AB200295 | 228 | 0.752 |
| | 257 | 252 | 1 | 20.0 | 3160 | AF502230 | 241 | 0.660 |
| | 166 | 161 | 1 | 100.0 | 1368 | EF059529 | 290 | 0.775 |
| | 169 | 164 | 2 | 100.0 | 1368 | EF059529 | 331 | 0.768 |
| | 170 | 165 | 2 | 100.0 | 1368 | EF059529 | 241 | 0.693 |
| | 171 | 166 | 1 | 50.0 | 1368 | EF059529 | 303 | 0.783 |
| | 173 | 168 | 1 | 100.0 | 1368 | EF059529 | 241 | 0.717 |
| | 176 | 171 | 1 | 100.0 | 1369 | DQ833317 | 211 | 0.687 |
| | 179 | 174 | 1 | 100.0 | 1369 | DQ833317 | 193 | 0.629 |
| | 188 | 183 | 4 | 66.7 | 1369 | DQ833340 | 464 | 0.947 |
a Digital T-RF obtained after having shifted the digital dataset with the most probable average cross-correlation lag.
b Number of reads of the target phylotype that contribute to the T-RF.
c Diverse bacterial affiliates can contribute to the same T-RF.
d Reference OTU from the Greengenes public database obtained after mapping.
e GenBank accession numbers provided by Greengenes for reference sequences.
f Best SW mapping score obtained.
g SW mapping score normalized by the read length.
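Footnotes b and c relate the number of reads of a phylotype to its relative contribution to a T-RF. A minimal sketch of that arithmetic follows, assuming the relative contribution is simply the phylotype's read count divided by all reads assigned to the same T-RF. The function name is hypothetical; the example counts are taken from the 194 bp T-RF in the phylogenetic annotation table (10 reads for OTU 3011 and 1 read for OTU 4035).

```python
def relative_contributions(read_counts):
    """Percentage of the reads of one T-RF contributed by each affiliation.

    read_counts maps an affiliation label (here a reference OTU id) to a read count.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("the T-RF has no contributing reads")
    return {label: 100 * count / total for label, count in read_counts.items()}


if __name__ == "__main__":
    # Read counts for the shifted dT-RF of 193 bp (eT-RF 194 bp) in the table above.
    trf_194 = {"OTU 3011": 10, "OTU 4035": 1}
    for label, percent in relative_contributions(trf_194).items():
        print(label, round(percent, 1))
```

The rounded values reproduce the 90.9% and 9.1% reported for that T-RF, which supports this reading of the column.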
Figure 2. Density plots displaying the distribution of T-RFs along the 0–500 bp domain with different endonucleases. The effect of the restriction endonucleases HaeIII, AluI, MspI, HhaI, RsaI and TaqI was tested on pyrosequencing datasets collected from the samples GRW01 (A) and AGS01 (B). Histograms represent the number of T-RFs produced per class of 50 bp (read on the left y-axes). Thick black lines represent the cumulated number of T-RFs over the 500-bp fingerprints (read on the right y-axes). The total cumulated number of T-RFs corresponds to the richness index. The number given in brackets corresponds to Shannon's diversity index.
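The in silico digestion behind these profiles can be sketched as follows. The recognition sites and cut positions are the standard ones for these six enzymes (HaeIII GG^CC, AluI AG^CT, MspI C^CGG, HhaI GCG^C, RsaI GT^AC, TaqI T^CGA). Everything else is an assumption for illustration: the function names are hypothetical, each read is taken to start at the labeled 5' end, and reads without a restriction site are simply skipped.

```python
from collections import Counter

# Recognition site and cut offset (bases of the site kept on the 5' fragment).
ENZYMES = {
    "HaeIII": ("GGCC", 2),
    "AluI": ("AGCT", 2),
    "MspI": ("CCGG", 1),
    "HhaI": ("GCGC", 3),
    "RsaI": ("GTAC", 2),
    "TaqI": ("TCGA", 1),
}


def terminal_fragment_length(sequence, enzyme):
    """Length of the 5' terminal fragment, or None if the enzyme does not cut."""
    site, cut_offset = ENZYMES[enzyme]
    position = sequence.upper().find(site)
    if position == -1:
        return None
    return position + cut_offset


def digital_profile(reads, enzyme):
    """Relative abundance (%) of each digital T-RF length among the reads that are cut."""
    lengths = [terminal_fragment_length(read, enzyme) for read in reads]
    counts = Counter(length for length in lengths if length is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {length: 100 * n / total for length, n in counts.items()}


if __name__ == "__main__":
    reads = ["AAGGCCTT", "AAGGCCAA", "AGGCCATT", "TTTTTTTT"]
    for enzyme in ENZYMES:
        print(enzyme, digital_profile(reads, enzyme))
```

Running the same reads through several enzymes, as in the loop above, is one way to compare how many distinct T-RFs each endonuclease yields.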
Figure 3. Mirror plot displaying the cross-correlation between digital and experimental T-RFLP profiles. This mirror plot was generated for the complex bacterial community of sample GRW01. Comparison of mirror plots constructed with raw (A) and denoised sequences (B). Relative abundances are displayed up to 5% absolute values. For those T-RFs exceeding these limits, the actual relative abundance is displayed beside the peak.
Cross-correlations between experimental and standard digital T-RFLP profiles
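| Sample | Optimal shift (bp) | Maximum cross-correlation coefficient | eT-RFs (number) | eT-RFs with matching dT-RFs (number) | eT-RFs with matching dT-RFs (%) |
| --- | --- | --- | --- | --- | --- |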
| GRW01d | −4 | 0.62 | 88 | 58 | 66 |
| GRW02d | −5 | 0.69 | 50 | 23 | 46 |
| GRW03d | −4 | 0.44 | 76 | 62 | 82 |
| GRW04d | −5 | 0.71 | 44 | 24 | 44 |
| GRW05d | −5 | 0.35 | 75 | 56 | 75 |
| GRW06d | −6 | 0.51 | 87 | 70 | 81 |
| GRW07e | −6 | 0.70 | 57 | 17 | 30 |
| GRW08e | −4 | 0.59 | 54 | 43 | 80 |
| GRW09e | −4 | 0.69 | 71 | 66 | 93 |
| GRW10e | −5 | 0.68 | 70 | 22 | 31 |
| | |||||
| AGS01e | −5 | 0.75 | 48 | 31 | 65 |
| AGS02e,f | −5 | 0.90 | 38 | 22 | 58 |
| AGS03e,f | −5 | 0.90 | 38 | 19 | 50 |
| AGS04e | −5 | 0.72 | 52 | 24 | 46 |
| AGS05e | −4 | 0.67 | 43 | 29 | 67 |
| AGS06e,f | −5 | 0.91 | 38 | 19 | 50 |
| AGS07e | −5 | 0.80 | 38 | 31 | 82 |
a Shift leading to optimal matching of the digital to the experimental T-RFLP profile.
b Maximum cross-correlation coefficients obtained after matching of the digital to the experimental T-RFLP profile.
c Number and percentage of experimental T-RFs having corresponding digital T-RFs.
d Samples GRW01-06 were pyrosequenced with the HighRA method.
e Samples GRW07-10 and AGS01-07 were pyrosequenced with the LowRA method.
f Samples AGS02, AGS03, and AGS06 are triplicates from the same DNA extract.
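The shift and maximum cross-correlation coefficient in footnotes a and b can be illustrated with a small lag search. This is a sketch under stated assumptions, not the PyroTRF-ID implementation: profiles are represented as dictionaries mapping T-RF size (bp) to relative abundance, the coefficient is taken to be the Pearson correlation between the two profiles on a 0–500 bp axis, and lags between -10 and +10 bp are scanned. All names are hypothetical.

```python
import math


def to_vector(profile, max_bp=500):
    """Spread a {size_bp: abundance} profile onto a fixed 0..max_bp axis."""
    vector = [0.0] * (max_bp + 1)
    for size, abundance in profile.items():
        if 0 <= size <= max_bp:
            vector[size] += abundance
    return vector


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return covariance / math.sqrt(var_x * var_y)


def best_lag(experimental, digital, max_lag=10, max_bp=500):
    """Return (lag, coefficient) for the shift of the digital profile that fits best."""
    target = to_vector(experimental, max_bp)
    best = (0, float("-inf"))
    for lag in range(-max_lag, max_lag + 1):
        shifted = {size + lag: abundance for size, abundance in digital.items()}
        coefficient = pearson(target, to_vector(shifted, max_bp))
        if coefficient > best[1]:
            best = (lag, coefficient)
    return best


if __name__ == "__main__":
    experimental = {34: 50.0, 214: 30.0, 216: 20.0}  # toy eT-RFLP profile
    digital = {39: 50.0, 219: 30.0, 221: 20.0}       # same peaks, 5 bp longer
    print(best_lag(experimental, digital))
```

A negative lag means that the digital T-RFs are longer than the experimental ones and must be shifted down, which matches the sign convention of the shifts listed in the table.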
Figure 4. Assessment of the impact of data processing on dT-RFLP profiles, and comparison with eT-RFLP profiles. Richness and Shannon's H′ diversity indices were calculated to quantify the impact of the pyrosequencing data processing parameters on the resulting dT-RFLP profiles. Two examples are given for samples pyrosequenced with the HighRA (GRW01) and LowRA methods (GRW07).
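Both indices can be computed directly from a T-RFLP profile. The sketch below assumes a profile stored as a dictionary mapping T-RF size to abundance, takes richness as the number of T-RFs with non-zero abundance, and uses the standard definition $H' = -\sum_i p_i \ln p_i$ with $p_i$ the relative abundance of T-RF $i$. The function names are hypothetical, and the natural logarithm is an assumption, since the source does not state the base.

```python
import math


def richness(profile):
    """Number of T-RFs with non-zero abundance."""
    return sum(1 for abundance in profile.values() if abundance > 0)


def shannon_index(profile):
    """Shannon's H' = -sum(p_i * ln(p_i)) over T-RFs with non-zero abundance."""
    total = sum(abundance for abundance in profile.values() if abundance > 0)
    if total == 0:
        return 0.0
    h = 0.0
    for abundance in profile.values():
        if abundance > 0:
            p = abundance / total
            h -= p * math.log(p)
    return h


if __name__ == "__main__":
    even_profile = {34: 25.0, 194: 25.0, 214: 25.0, 216: 25.0}
    skewed_profile = {34: 97.0, 194: 1.0, 214: 1.0, 216: 1.0}
    print(richness(even_profile), shannon_index(even_profile))
    print(richness(skewed_profile), shannon_index(skewed_profile))
```

The two toy profiles have the same richness but different H′, which is why the two indices are reported together when comparing processing settings.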
Figure 5. Number of bacterial affiliations contributing to T-RFs. The absolute (A) and relative numbers (B) of T-RFs that comprised one to several bacterial affiliations are given for the samples GRW01 and AGS01.