Stephen J Salipante, Dhruba J Sengupta, Christopher Rosenthal, Gina Costa, Jessica Spangler, Elizabeth H Sims, Michael A Jacobs, Samuel I Miller, Daniel R Hoogestraat, Brad T Cookson, Connor McCoy, Frederick A Matsen, Jay Shendure, Clarence C Lee, Timothy T Harkins, Noah G Hoffman.
Abstract
Classifying individual bacterial species comprising complex, polymicrobial patient specimens remains a challenge for culture-based and molecular microbiology techniques in common clinical use. We therefore adapted practices from metagenomics research to rapidly catalog the bacterial composition of clinical specimens directly from patients, without need for prior culture. We have combined a semiconductor deep sequencing protocol that produces reads spanning 16S ribosomal RNA gene variable regions 1 and 2 (∼360 bp) with a de-noising pipeline that significantly improves the fraction of error-free sequences. The resulting sequences can be used to perform accurate genus- or species-level taxonomic assignment. We explore the microbial composition of challenging, heterogeneous clinical specimens by deep sequencing, culture-based strain typing, and Sanger sequencing of bulk PCR product. We report that deep sequencing can catalog bacterial species in mixed specimens from which usable data cannot be obtained by conventional clinical methods. Deep sequencing a collection of sputum samples from cystic fibrosis (CF) patients reveals well-described CF pathogens in specimens where they were not detected by standard clinical culture methods, especially for low-prevalence or fastidious bacteria. We also found that sputa submitted for CF diagnostic workup can be divided into a limited number of groups based on the phylogenetic composition of the airway microbiota, suggesting that metagenomic profiling may prove useful as a clinical diagnostic strategy in the future. The described method is sufficiently rapid (theoretically compatible with same-day turnaround times) and inexpensive for routine clinical use.
Year: 2013 PMID: 23734239 PMCID: PMC3666980 DOI: 10.1371/journal.pone.0065226
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Distribution of read lengths and sequence errors.
(A) Kernel density plot of read lengths obtained by extended-length ion semiconductor sequencing. Each line represents results from an independent library; the black line indicates the library containing controls for error rate calculations and sensitivity studies. The vertical line marks the cutoff for full-length sequences. (B) Error rates for unprocessed and de-noised sequence reads, stratified by error type and reference organism. (C) Cumulative proportion of unprocessed and de-noised sequence reads at defined error counts. For unprocessed reads, the fraction of sequences represented at a particular error count reflects the number of reads; for de-noised sequences, it reflects the total number of reads contributing to clusters.
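The weighting described for panel C can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `cumulative_error_fraction`, the `max_errors` parameter, and the toy error counts are all assumptions introduced here. Unprocessed reads each count once, whereas each de-noised sequence is weighted by the number of reads that contributed to its cluster.

```python
from collections import Counter

def cumulative_error_fraction(error_counts, weights=None, max_errors=5):
    """Cumulative fraction of reads with <= k errors, for k = 0..max_errors.

    error_counts: errors observed in each sequence (hypothetical input).
    weights: reads represented by each sequence; defaults to 1 per sequence,
             which corresponds to unprocessed reads.
    """
    if weights is None:
        weights = [1] * len(error_counts)
    total = sum(weights)
    per_count = Counter()
    for errors, weight in zip(error_counts, weights):
        per_count[errors] += weight
    fractions = []
    running = 0
    for k in range(max_errors + 1):
        running += per_count[k]
        fractions.append(running / total)
    return fractions

# Toy data (invented for illustration): eight unprocessed reads...
raw = cumulative_error_fraction([0, 1, 1, 2, 3, 0, 4, 1])
# ...and three de-noised clusters built from 6, 1, and 1 reads respectively.
denoised = cumulative_error_fraction([0, 1, 2], weights=[6, 1, 1])
```

In this toy case the error-free fraction rises from 0.25 for unprocessed reads to 0.75 after de-noising, because most reads fall into the error-free cluster.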
Figure 2. Recovery of low-prevalence species in polymicrobial specimens and reproducibility.
The fraction of de-noised sequence reads with highest pairwise alignment scores to the indicated reference sequence among four replicates of sequencing a mixture of reference organisms. Replicates 3 and 4 were generated from 1/10 and 1/100 the template DNA of the other replicates, respectively. The number of de-noised reads (black) or unprocessed reads (red) contributing to each analysis is indicated on the x-axis.
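The quantity plotted here, the fraction of reads whose best alignment is to each reference, can be illustrated with a short sketch. It assumes alignment scores have already been computed by some aligner; the names `best_reference` and `reference_fractions`, the reference labels, and the scores are hypothetical and do not come from the paper.

```python
from collections import Counter

def best_reference(scores):
    """Return the reference with the highest alignment score.

    scores: dict mapping reference name -> pairwise alignment score.
    Ties resolve to the first reference listed (a simplification).
    """
    return max(scores, key=scores.get)

def reference_fractions(read_scores, weights=None):
    """Fraction of reads assigned to each reference by best score.

    weights: reads represented by each sequence (cluster size for
             de-noised sequences); defaults to 1 per sequence.
    """
    if weights is None:
        weights = [1] * len(read_scores)
    assigned = Counter()
    for scores, weight in zip(read_scores, weights):
        assigned[best_reference(scores)] += weight
    total = sum(weights)
    return {ref: count / total for ref, count in assigned.items()}

# Invented scores for four sequences against two hypothetical references.
toy_scores = [
    {"ref_A": 350, "ref_B": 310},
    {"ref_A": 300, "ref_B": 340},
    {"ref_A": 355, "ref_B": 320},
    {"ref_A": 352, "ref_B": 301},
]
fractions = reference_fractions(toy_scores)
```

Computing the same dictionary for each replicate and comparing the values per reference gives a simple view of reproducibility across template dilutions.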
Uncultured clinical specimens and sequencing results.
| | Deep Sequencing | | | | | |
| Specimen Name/Clinical Sanger Sequencing results | Species name | % of total Reads | Number of Reads | Number of De-noised Clusters | Maximum % Identity | Minimum % Identity |
| Brain 1/ | | 36.86 | 11269 | 5 | 99.69 | 99.07 |
| No diagnosis (multiple templates) | No match ≥99% | 34.43 | 10526 | 29 | | |
| | | 28.55 | 8728 | 11 | 99.68 | 99.05 |
| | | 0.17 | 52 | 2 | 99.08 | 99.07 |
| Brain 2/ | | 99.01 | 6874 | 9 | 99.68 | 99.01 |
| No diagnosis (multiple templates) | | 0.69 | 48 | 1 | 100 | 99.31 |
| | No match ≥99% | 0.3 | 21 | 1 | | |
| Brain 3/ | No match ≥99% | 44.44 | 6155 | 33 | | |
| No diagnosis (multiple templates) | | 31.62 | 4379 | 4 | 99.37 | 99.05 |
| | | 15.6 | 2161 | 3 | 99.68 | 99.37 |
| | | 6.28 | 870 | 1 | 99.69 | 99.08 |
| | | 2.06 | 286 | 2 | 99.41 | 99.12 |
| Brain 4/ | No match ≥99% | 64.12 | 11410 | 24 | | |
| No diagnosis (multiple templates) | | 25.06 | 4459 | 12 | 99.68 | 99.05 |
| | | 10.71 | 1905 | 2 | 99.69 | 99.07 |
| | | 0.12 | 21 | 1 | 99.08 | 99.08 |
| Lymphnode/ | | 23.6 | 2742 | 1 | 99.7 | 99.05 |
| | No match ≥99% | 22.36 | 2599 | 17 | | |
| | | 17.16 | 1994 | 2 | 100 | 99.32 |
| | | 10.55 | 1226 | 3 | 100 | 99.07 |
| | | 5.65 | 657 | 2 | 99.36 | 99.36 |
| | | 5.22 | 607 | 3 | 100 | 99.04 |
| | | 2.95 | 343 | 1 | 99.03 | 99 |
| | | 2.62 | 304 | 1 | 99.68 | 99.05 |
| | | 2.36 | 274 | 1 | 99.71 | 99.41 |
| | | 2 | 232 | 1 | 99.68 | 99.05 |
| | | 1.59 | 185 | 2 | 100 | 99.07 |
| | | 0.69 | 80 | 2 | 99.68 | 99.03 |
| | | 0.64 | 74 | 1 | 99.68 | 99.04 |
| | | 0.46 | 54 | 1 | 99.36 | 99.04 |
| | | 0.31 | 36 | 1 | 99.69 | 99.69 |
| | | 0.31 | 36 | 1 | 100 | 99.38 |
| | | 0.25 | 29 | 1 | 99.69 | 99.69 |
| | | 0.24 | 28 | 1 | 99.69 | 99.69 |
| | | 0.22 | 25 | 1 | 99.69 | 99.69 |
| | | 0.22 | 25 | 1 | 99.66 | 99.32 |
| | | 0.22 | 25 | 1 | 99.67 | 99.02 |
| | | 0.2 | 23 | 1 | 99.68 | 99.05 |
| | | 0.2 | 23 | 1 | 99.05 | 99.05 |
100% identity against reference sequence.
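The "% of total Reads" column appears to be each row's read count divided by the specimen's total read count. The sketch below checks this for the Brain 2 specimen using the three read counts listed in the table; the helper name `percent_of_total` is an assumption introduced for illustration.

```python
def percent_of_total(read_counts, ndigits=2):
    """Express each read count as a percentage of the specimen total."""
    total = sum(read_counts)
    return [round(100 * n / total, ndigits) for n in read_counts]

# Read counts for the three Brain 2 rows, in table order.
brain2_reads = [6874, 48, 21]
brain2_percent = percent_of_total(brain2_reads)
# Matches the tabulated values 99.01, 0.69, and 0.3.
```

The same calculation reproduces the Brain 1 percentages from its four read counts (11269, 10526, 8728, 52), which suggests the percentages are computed within each specimen rather than across the whole run.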
CF Pathogens identified by Microbiological Culture and Deep Sequencing.
| Organism | Culture Only | Culture and Deep Sequencing | Deep Sequencing Only | Total Cases |
|
| 4 | 1 | 5 | |
|
| 1 | 1 | ||
|
| 1 | 1 | ||
|
| 1 | 1 | ||
|
| 1 | 4 | 5 | |
|
| 2 | 2 | ||
|
| 1 | 1 | ||
|
| 1 | 1 | 2 | |
|
| 1 | 1 | ||
|
| 1 | 1 | ||
|
| 2 | 36 | 8 | 46 |
|
| 1 | 1 | ||
|
| 2 | 2 | ||
|
| 2 | 1 | 3 | |
|
| 8 | 20 | 4 | 32 |
|
| 3 | 5 | 10 | 18 |
|
| 1 | 3 | 4 | |
|
| 1 |
| 1 | |
| All Organisms | 22 (17.3%) | 72 (56.7%) | 33 (26%) | 127 (100%) |
For one case, a single colony of Klebsiella pneumoniae was detected by culture.
45 patients had consensus sequences with best matches against both Streptococcus pneumoniae (pathogen) and Streptococcus mitis (normal microbiota). Because such consensus sequences cannot distinguish between these organisms, these instances were not counted.
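The "All Organisms" row can be rechecked, and per-method detection rates derived from it, with a few lines of arithmetic. This is a sketch only: the function `detection_summary` and its key names are assumptions made here, and the two combined detection rates are derived from the tabulated totals rather than reported in the source.

```python
def detection_summary(culture_only, both, sequencing_only):
    """Summarise pathogen detections by method as percentages of all cases."""
    total = culture_only + both + sequencing_only

    def pct(n):
        return round(100 * n / total, 1)

    return {
        "total": total,
        "culture_only_pct": pct(culture_only),
        "both_pct": pct(both),
        "sequencing_only_pct": pct(sequencing_only),
        # Derived values: every case found by a method, alone or jointly.
        "detected_by_culture_pct": pct(culture_only + both),
        "detected_by_sequencing_pct": pct(both + sequencing_only),
    }

# Totals from the "All Organisms" row of the table.
summary = detection_summary(culture_only=22, both=72, sequencing_only=33)
```

With these totals, the three category percentages come out at 17.3, 56.7, and 26.0, matching the table. Culture accounts for 94 of 127 detections (74.0%) and deep sequencing for 105 of 127 (82.7%).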
Figure 3. Metagenomic content and phylogenetic clustering of 66 CF sputum samples.
Taxonomic names (family, genus, species, or a combination of species where appropriate) appearing with a relative abundance of at least 15% of de-noised reads in one or more specimens are indicated in the legend. Any taxonomic name that failed to meet this threshold was assigned the label “Other”. Organisms considered to be components of normal oropharyngeal microbiota by culture were not further speciated, in accordance with standard procedures in the clinical laboratory, and were assigned the general label “Contaminating oropharyngeal flora”. Taxonomic labels apply to parts B and C. (A) Phylogenetic “squash” clustering of CF bacterial composition. Samples are color-coded according to group (indicated in Roman numerals). Samples colored grey are ungrouped. (B) Classification performed by analysis of de-noised deep sequencing reads using pplacer (top panel) and culture (bottom panel). The relative number of each species (by read count or colony abundance, respectively) is represented by the height of corresponding bars. Phylogenetic “squash” clustering of specimens from deep sequence data is represented as a cladogram, with specimens colored as in part A. (C) Consensus microbiota profile of phylogenetic groups, averaged from all members of the group. Relative abundance of species, as estimated by the fraction of contributory reads, is indicated.
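The labelling rule and the group averaging described in this legend can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the function names, the taxon labels (`taxon_A` and so on), and the read counts are invented, and the squash clustering step that defines the groups is not reproduced.

```python
def relative_abundance(counts):
    """Convert read counts per taxon into fractions of the specimen total."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

def names_meeting_threshold(samples, threshold=0.15):
    """Names reaching the threshold in one or more specimens."""
    keep = set()
    for counts in samples.values():
        for name, frac in relative_abundance(counts).items():
            if frac >= threshold:
                keep.add(name)
    return keep

def collapse_to_other(counts, keep):
    """Pool read counts of names outside `keep` under the label 'Other'."""
    collapsed = {}
    for name, n in counts.items():
        label = name if name in keep else "Other"
        collapsed[label] = collapsed.get(label, 0) + n
    return collapsed

def consensus_profile(profiles):
    """Average relative abundances over all members of a group."""
    names = set().union(*profiles)
    return {
        name: sum(p.get(name, 0.0) for p in profiles) / len(profiles)
        for name in names
    }

# Invented read counts for two specimens assumed to share a group.
samples = {
    "specimen_1": {"taxon_A": 70, "taxon_B": 20, "taxon_C": 6, "taxon_D": 4},
    "specimen_2": {"taxon_A": 10, "taxon_B": 80, "taxon_C": 4, "taxon_D": 6},
}
keep = names_meeting_threshold(samples)
profiles = [
    relative_abundance(collapse_to_other(counts, keep))
    for counts in samples.values()
]
group_profile = consensus_profile(profiles)
```

Pooling counts before normalising keeps each collapsed profile summing to one, so the averaged group profile also sums to one. In this toy data `taxon_C` and `taxon_D` never reach 15% in any specimen and are therefore reported together as "Other".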