Robert Schlaberg, Keith E Simmon, Mark A Fisher.
Abstract
Sequencing of the 16S rRNA gene (16S) is a reference method for bacterial identification. Its expanded use has led to increased recognition of novel bacterial species. In most clinical laboratories, novel species are infrequently encountered, and their pathogenic potential is often difficult to assess. We reviewed partial 16S sequences from >26,000 clinical isolates, analyzed during February 2006–June 2010, and identified 673 that have <99% sequence identity with valid reference sequences and are thus possibly novel species. Of these 673 isolates, 111 may represent novel genera (<95% identity). Isolates from 95 novel taxa were recovered from multiple patients, indicating possible clinical relevance. Most repeatedly encountered novel taxa belonged to the genera Nocardia (14 novel taxa, 42 isolates) and Actinomyces (12 novel taxa, 52 isolates). This systematic approach for recognition of novel species with potential diagnostic or therapeutic relevance provides a basis for epidemiologic surveys and improvement of sequence databases and may lead to identification of new clinical entities.
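The identity cutoffs stated in the abstract can be written as a small classifier. This is a minimal sketch and not the authors' pipeline: the function name `classify_isolate`, the category labels, and the sample identity values are illustrative assumptions; only the 99% and 95% thresholds come from the text. Isolates below 95% identity also fall below 99%, so the genus-level category is a subset of the possibly novel species.

```python
from collections import Counter

# Thresholds taken from the abstract; everything else here is illustrative.
SPECIES_CUTOFF = 99.0  # <99% identity: possibly a novel species
GENUS_CUTOFF = 95.0    # <95% identity: possibly a novel genus


def classify_isolate(identity_pct: float) -> str:
    """Label an isolate by its percent identity to the best valid reference."""
    if not 0.0 <= identity_pct <= 100.0:
        raise ValueError(f"identity must be a percentage, got {identity_pct}")
    if identity_pct < GENUS_CUTOFF:
        return "possible novel genus"
    if identity_pct < SPECIES_CUTOFF:
        return "possible novel species"
    return "matches known species"


# Hypothetical identity values, not data from the study.
sample_identities = [99.6, 98.6, 97.9, 94.4, 91.8]
tally = Counter(classify_isolate(x) for x in sample_identities)
print(tally)
```

Both comparisons are strict, so an isolate at exactly 99.0% identity is not flagged, which matches the "<99%" wording of the abstract.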
Year: 2012 PMID: 22377371 PMCID: PMC3309591 DOI: 10.3201/eid1803.111481
Source DB: PubMed Journal: Emerg Infect Dis ISSN: 1080-6040 Impact factor: 6.883
Figure 1. Anatomical sites that yielded 673 unidentified clinical bacterial isolates. The x-axis indicates relative frequency. Numbers to the right of bars represent isolate counts. GI, gastrointestinal; CNS, central nervous system; resp, respiratory; inv, invasive; GU, genitourinary.
Figure 2. Sequence quality and number of ambiguous bases for 673 unidentified bacterial isolates. The median sequence length was 480 bases, with 84% of sequences in the range of 461 to 500 bases (A). The median phred sequence quality score was 45 (B). Most sequences had no ambiguous positions (n = 416, 61.8%). Up to 18 ambiguous positions were seen in isolates with multiple, nonidentical copies of the 16S rRNA gene (C). The x-axes indicate relative frequency. Numbers above columns represent isolate counts.
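To put the median phred score in perspective, the standard phred definition Q = -10 log10(P) can be inverted to give a per-base error probability. The sketch below is an illustration only: the function name `phred_to_error_prob` is hypothetical, and applying the median score uniformly to a read of median length is a simplifying assumption, not something the caption states. The last line rechecks the reported share of sequences without ambiguous positions.

```python
def phred_to_error_prob(q: float) -> float:
    """Invert the phred definition Q = -10 * log10(P) to get P."""
    return 10 ** (-q / 10)


median_q = 45      # median phred score reported in the caption
median_len = 480   # median sequence length in bases, from the caption

p = phred_to_error_prob(median_q)

# Simplifying assumption: every base in the read has the median quality.
expected_errors = p * median_len

# Share of the 673 sequences with no ambiguous positions (n = 416).
unambiguous_pct = round(100 * 416 / 673, 1)

print(f"per-base error probability: {p:.2e}")
print(f"expected errors per read:   {expected_errors:.4f}")
print(f"unambiguous sequences:      {unambiguous_pct}%")
```

Under that assumption, a typical read would be expected to contain far less than one miscalled base.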
Figure 3. Identities of 673 unidentified bacterial isolates to best match in BLASTn database with species-level (A) or genus-level annotation (B) and identity to best match in database, regardless of annotation status (C). The x-axes indicate relative frequency. Numbers to the right of bars represent isolate counts.
Taxonomic distribution, by order of best species-level matches, for 673 isolates of possibly novel species of bacteria
| Order | No. isolates |
|---|---|
| Actinomycetales | 294 |
| Bacillales | 61 |
| Pseudomonadales | 56 |
| Flavobacteriales | 41 |
| Burkholderiales | 39 |
| Lactobacillales | 38 |
| Enterobacteriales | 33 |
| Neisseriales | 15 |
| Pasteurellales | 14 |
| Rhizobiales | 14 |
| Clostridiales | 10 |
| Cardiobacteriales | 9 |
| Sphingomonadales | 8 |
| Caulobacterales | 7 |
| Rhodospirillales | 7 |
| Xanthomonadales | 7 |
| Fusobacteriales | 6 |
| Bacteroidales | 5 |
| Sphingobacteriales | 4 |
| Rhodocyclales | 2 |
| Desulfovibrionales | 1 |
| Micrococcineae | 1 |
| Rhodobacterales | 1 |
Tentative novel species of bacteria represented by multiple isolates (clusters) for the different families in the order Actinomycetales
| Taxon (family) | In clusters: no. (%) isolates | In clusters: identity, %* | No. clusters | Cluster size, min–max (med) | Not in clusters: no. isolates | Not in clusters: identity, %* | Total: no. isolates | Total: identity, % |
|---|---|---|---|---|---|---|---|---|
| | 52 (71) | 97.9 | 12 | 2–12 (3.5) | 21 | 96.5 | 73 | 97.5 |
| | 42 (79) | 98.6 | 14 | 2–9 (2) | 11 | 98.7 | 53 | 98.6 |
| | 40 (68) | 98.1 | 9 | 2–10 (3) | 19 | 97.6 | 59 | 97.9 |
| | 17 (81) | 98.2 | 2 | 2–15 | 4 | 97.8 | 21 | 98.2 |
| | 14 (45) | 97.3 | 6 | 2–3 (2) | 17 | 96.5 | 31 | 96.9 |
| | 9 (45) | 98.5 | 1 | 9 | 11 | 97.6 | 20 | 98.0 |
| | 2 (67) | 95.9 | 1 | 2 | 1 | 97.2 | 3 | 96.4 |
| | | | | | 6 | 97.8 | 6 | 97.8 |
| | | | | | 5 | 96.9 | 5 | 96.9 |
| | | | | | 4 | 97.4 | 4 | 97.4 |
| | | | | | 3 | 96.3 | 3 | 96.3 |
| | | | | | 3 | 97.7 | 3 | 97.7 |
| | | | | | 3 | 92.5 | 3 | 92.5 |
| | | | | | 2 | 98.5 | 2 | 98.5 |
| | | | | | 2 | 97.2 | 2 | 97.2 |
| | | | | | 2 | 97.8 | 2 | 97.8 |
| | | | | | 1 | 98.9 | 1 | 98.9 |
| | | | | | 1 | 93.7 | 1 | 93.7 |
| | | | | | 1 | 97.9 | 1 | 97.9 |
| | | | | | 1 | 96.8 | 1 | 96.8 |
| Total | 176 (60) | 98.1 | | | 118 | 97.1 | 294 | 97.7 |
*Sequence identity to the closest GenBank match with species designation (match %). Min, minimum; max, maximum; med, median.
Figure A1. Order-level taxonomic information for 673 bacterial isolates is summarized on the basis of the anatomical source. Numbers represent isolate count.
Tentative novel taxa represented by ≥5 clinical isolates*†
| Family | Identity, % | Initial cluster size | Reviewed cluster size | Gram stain morphology | Result |
|---|---|---|---|---|---|
| | 98.5 | 15 | 0 | GPR | |
| | 98.7 | 12 | 11 | GPR | 1 strain with >1% dissimilarity |
| | 91.8 | 12 | 12 | GPR | Belong to |
| | 96.4 | 11 | 11 | GNR | Most similar to |
| | 98.1 | 10 | 10 | GPR | Most similar to |
| | 98.6 | 10 | 5 | GPR | Most similar to |
| | 98.9 | 10 | 10 | GNR | Most similar to |
| | 98.5 | 9 | 0 | GPR | |
| | 98.9 | 9 | 9 | GPR | Most similar to |
| | 98.9 | 8 | 0 | GNR | Belong to |
| | 86.5 | 7 | 7 | GNR | Most similar to |
| | 96.9 | 7 | 7 | GPR | Most similar to |
| | 98.5 | 6 | 6 | GPR | Most similar to |
| | 90.8 | 5 | 5 | GPR | Most similar to |
| | 95.0 | 5 | 3+2 | GPR | 2 separate taxa |
| | 96.7 | 5 | 5 | GPC | Most similar to |
| | 97.3 | 5 | 5 | GNR | Most similar to |
| | 97.8 | 5 | 3+2 | GPR | 2 separate taxa |
| | 97.9 | 5 | 5 | GPC | Most similar to |
*GPR, gram-positive rods; GNR, gram-negative rods; GPC, gram-positive cocci. †Initial and reviewed cluster sizes indicate the number of isolates in each cluster before and after manual review; the outcome of manual review and the most similar valid species names are listed. Manual review was performed for all clusters with at least 5 isolates. Sequences were aligned with type strain sequences, and manual BLAST analysis was performed to calculate pairwise sequence identities.
Tentative novel species of bacteria represented by multiple isolates (clusters)
| Taxon | In clusters: no. (%) isolates | In clusters: identity, %* | No. clusters | Cluster size, min–max (med) | Not in clusters: no. isolates | Not in clusters: identity, %* | Total: no. isolates | Total: identity, % |
|---|---|---|---|---|---|---|---|---|
| | 176 (60) | 98.1 | 45 | 2–15 (3) | 118 | 97.1 | 294 | 97.7 |
| | 30 (54) | 97.1 | 8 | 2–11 (2.5) | 26 | 96.8 | 56 | 97.0 |
| | 28 (74) | 97.9 | 8 | 2–5 (3.5) | 10 | 95.1 | 38 | 97.1 |
| | 27 (44) | 96.5 | 7 | 2–12 (2) | 34 | 94.1 | 61 | 95.1 |
| | 20 (49) | 92.9 | 6 | 2–7 (3) | 21 | 95.6 | 41 | 94.3 |
| | 19 (58) | 98.3 | 4 | 2–10 (3.5) | 14 | 98.0 | 33 | 98.2 |
| | 15 (38) | 97.3 | 6 | 2–4 (2) | 24 | 97.6 | 39 | 97.4 |
| | 8 (89) | 98.9 | 1 | 8 | 1 | 97.5 | 9 | 98.7 |
| | 6 (43) | 93.5 | 2 | 2–4 | 8 | 97.0 | 14 | 95.5 |
| | 5 (33) | 97.9 | 2 | 2–3 | 10 | 96.9 | 15 | 97.3 |
| | 3 (50) | 98.7 | 1 | 3 | 3 | 93.5 | 6 | 96.1 |
| | 3 (43) | 97.2 | 1 | 3 | 4 | 96.9 | 7 | 97.0 |
| | 2 (20) | 97.7 | 1 | 2 | 8 | 94.3 | 10 | 95.0 |
| | 2 (14) | 98.9 | 1 | 2 | 12 | 96.2 | 14 | 96.6 |
| | 2 (25) | 97.9 | 1 | 2 | 6 | 97.1 | 8 | 97.3 |
| | 2 (25) | 95.5 | 1 | 2 | 5 | 97.1 | 7 | 96.7 |
| | | | | | 7 | 97.6 | 7 | 97.6 |
| | | | | | 5 | 92.3 | 5 | 92.3 |
| | | | | | 4 | 87.0 | 4 | 87.0 |
| | | | | | 2 | 95.5 | 2 | 95.5 |
| | | | | | 1 | 94.6 | 1 | 94.6 |
| | | | | | 1 | 97.4 | 1 | 97.4 |
| | | | | | 1 | 98.4 | 1 | 98.4 |
| Total | 348 (52) | | | | 325 | | 673 | |
*Sequence identity to the closest GenBank match with species designation (match %). Min, minimum; max, maximum; med, median.
Anatomical sites and possible novel bacterial isolates*
| Source | Identity, % | Best species-level match | Cluster | Gram stain morphology | Comment† |
|---|---|---|---|---|---|
| Tissue | 98.8 | | N | GNR | |
| Tissue | 97.2 | | N | GPR | |
| CSF | 98.3 | | Y | GPR | |
| Pericardial fluid | 94.4 | | N | GPC | |
| Tissue | 94.8 | | N | GNR | |
| CSF | 93.8 | | Y | GNR | |
| CSF | 98.3 | | Y | GPR | |
| Tissue | 98.6 | | Y | GNR | |
| CSF | 97.1 | | Y‡ | GNR | |
| Tissue | 97.7 | | Y‡ | GNR | |
| CSF | 97.2 | | Y | GPC | |
| Tissue | 96.2 | | Y | GPC | |
| CSF | 92.4 | | Y | GPR | |
| Synovial fluid | 91.2 | | N | GVR | |
| Tissue | 96.0 | | Y‡ | GPR | |
| Tissue | 95.9 | | Y‡ | GVR | |
| Tissue | 96.9 | | N | GNC | |
| Tissue | 97.9 | | Y | GNCB | |
| Biopsy specimen | 98.7 | | Y | GPR | |
| Brain | 98.9 | | Y | GPR | |
| Tissue | 98.9 | | N | GPR | |
| CSF | 96.1 | | N | GNR | |
| Tissue | 97.5 | | N | GVR | |
| Tissue | 95.2 | | Y‡ | GNR | |
| Tissue | 95.2 | | Y‡ | GNR | |
| CSF | 98.5 | | Y | GPR | |
| Tissue | 98.0 | | Y | GPC | |
| Valve | 96.8 | | Y | GPC | |
| CSF | 98.0 | | Y | GPC | |
| CSF | 96.4 | | N | GPR | |
| CSF | 96.8 | | N | GPC | |
| Tissue | 97.8 | | N | GPR | |
*GNR, gram-negative rods; GPR, gram-positive rods; CSF, cerebrospinal fluid; GPC, gram-positive cocci; GVR, gram-variable rods; GNC, gram-negative cocci; GNCB, gram-negative coccobacilli; Y, isolates belonging to tentative novel taxa represented multiple times in this study. †Results of manual review of BLASTn analysis. ‡These pairs of isolates belong to the same 3 respective clusters.