Abstract
Bacterial pathogens impose a heavy burden of disease on human populations worldwide. The gravest threats are posed by highly virulent respiratory pathogens, enteric pathogens, and HIV-associated infections. Tuberculosis alone is responsible for the deaths of 1.5 million people annually. Treatment options for bacterial pathogens are being steadily eroded by the evolution and spread of drug resistance. However, population-level whole genome sequencing offers new hope in the fight against pathogenic bacteria. By providing insights into bacterial evolution and disease etiology, these approaches pave the way for novel interventions and therapeutic targets. Sequencing populations of bacteria across the whole genome provides unprecedented resolution to investigate (i) within-host evolution, (ii) transmission history, and (iii) population structure. Moreover, advances in rapid benchtop sequencing herald a new era of real-time genomics in which sequencing and analysis can be deployed within hours in response to rapidly changing public health emergencies. The purpose of this review is to highlight the transformative effect of population genomics on bacteriology, and to consider the prospects for answering abiding questions such as why bacteria cause disease.
Year: 2012 PMID: 22969423 PMCID: PMC3435253 DOI: 10.1371/journal.ppat.1002874
Source DB: PubMed Journal: PLoS Pathog ISSN: 1553-7366 Impact factor: 6.823
Major bacterial causes of death: World and United States.
| Cause of Death | Total Deaths (Thousands) | % Communicable Disease Deaths | Key Bacterial Species |
|---|---|---|---|
| Global (2008 estimates) | | | |
| Lower respiratory infections | 3,742 | 30.6 | |
| Tuberculosis | 1,833 | 15.0 | |
| Tuberculosis: directly attributable | 1,250 | 10.2 | |
| Tuberculosis: HIV-associated | 583 | 4.8 | |
| Diarrhoeal disease | 1,687 | 13.8 | |
| Meningitis | 270 | 2.2 | |
| Pertussis | 194 | 1.6 | |
| Tetanus | 128 | 1.0 | |
| Syphilis | 81 | 0.7 | |
| Upper respiratory infections | 69 | 0.6 | |
| Chlamydia | 7 | 0.1 | |
| Other communicable disease | 4,231 | 34.5 | |
| United States of America (1999–2007) | | | |
| Sepsis | 280.3 | 48.17 | |
| Clostridium difficile infection | 30.2 | 5.19 | |
| Staphylococcal infection | 16.6 | 2.86 | |
| HIV-associated | 9.7 | 1.66 | |
| Tuberculosis | 8.8 | 1.50 | |
| Tuberculosis: directly attributable | 7.4 | 1.26 | |
| Tuberculosis: HIV-associated | 1.4 | 0.24 | |
| Streptococcal infection | 6.4 | 1.09 | |
| Meningococcal disease | 1.4 | 0.24 | |
| Legionnaires' disease | 0.7 | 0.12 | |
| Other bacterial disease | 17.6 | 4.57 | |
| Other communicable disease | 210.1 | 36.1 | |
The total number of deaths attributable to communicable diseases is shown for the world (2008 estimates) and United States (1999–2007), with key bacterial species highlighted. At the global level, the WHO classifications for causes of death are broad and usually encompass multiple etiological agents, not only bacterial species. The United States and some other countries classify deaths based on detailed ICD-10 four-digit codes that frequently specify the bacterial species responsible.
Estimated from the total number of HIV deaths assuming 26% are associated with tuberculosis [85].
Excluding other causes of death mentioned explicitly.
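Because the rows within each region are mutually exclusive and, through the "Other communicable disease" row, exhaustive, the percentage column can be reproduced from the death counts. The following is a minimal Python sketch for the global rows. It assumes the denominator is simply the sum of the listed top-level rows (the source does not state how its totals were computed) and that the 26% footnote applies to the global HIV-associated tuberculosis row; it then inverts that assumption to show the total HIV death count the table implies. All names are illustrative.

```python
# Global (2008) top-level rows from the table, in thousands of deaths.
GLOBAL_2008 = {
    "Lower respiratory infections": 3742,
    "Tuberculosis": 1833,
    "Diarrhoeal disease": 1687,
    "Meningitis": 270,
    "Pertussis": 194,
    "Tetanus": 128,
    "Syphilis": 81,
    "Upper respiratory infections": 69,
    "Chlamydia": 7,
    "Other communicable disease": 4231,
}

# Tuberculosis sub-rows; they split the Tuberculosis row and are not added to the total.
TB_DIRECT = 1250
TB_HIV = 583
# Assumption: the footnoted 26% refers to this global HIV-associated TB row.
HIV_TB_FRACTION = 0.26


def percent_shares(deaths: dict[str, float]) -> dict[str, float]:
    """Express each cause as a percentage of the summed total of all rows."""
    total = sum(deaths.values())
    return {cause: 100.0 * n / total for cause, n in deaths.items()}


def implied_hiv_deaths(hiv_tb_deaths: float, fraction: float = HIV_TB_FRACTION) -> float:
    """Invert the footnote's estimate: HIV-associated TB = fraction * total HIV deaths."""
    return hiv_tb_deaths / fraction


if __name__ == "__main__":
    total = sum(GLOBAL_2008.values())
    print(f"Total communicable disease deaths: {total:,} thousand")
    for cause, pct in percent_shares(GLOBAL_2008).items():
        print(f"{cause:32s} {pct:5.1f}%")
    # The two sub-rows should add up to the Tuberculosis row.
    assert TB_DIRECT + TB_HIV == GLOBAL_2008["Tuberculosis"]
    print(f"Implied HIV deaths: {implied_hiv_deaths(TB_HIV):,.0f} thousand")
```

Run as a script, the recomputed shares agree with the published column to within 0.1 percentage points, which is consistent with rounding. The implied figure of about 2,242 thousand HIV deaths is derived arithmetic, not a value reported in the source.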
Figure 1. An example workflow for high-throughput whole genome sequencing in bacteria.
Sample collection. A biological sample (e.g., blood) is collected.
Culture. Bacterial colonies are isolated from the sample by culturing on appropriate media.
DNA preparation. DNA is extracted from the colonies, and a DNA library is prepared for sequencing.
High-throughput sequencing. The sequencer yields millions of short reads, typically a few hundred nucleotides long or shorter. To reconstruct the genome, one of two approaches is generally adopted.
Mapping to reference genome. In reference-based mapping, the short reads are mapped (i.e., aligned) to a reference genome using an algorithm (e.g., [73], [74]). Ideally, the reference genome is complete, of high quality, and closely related to the sequenced isolate. The pie chart illustrates that not all reads necessarily map to the reference genome (e.g., because the isolate contains novel regions absent from the reference).
Filtering. Short reads cannot be mapped reliably to repetitive regions of the reference genome, so these regions are identified and filtered out. Sites that are problematic for other reasons (e.g., because too few reads have mapped or because the consensus nucleotide is ambiguous) are also filtered out. The pie chart illustrates that some portion of the reference genome is not called because of filtering. In the mapped genome, these positions receive an ambiguity code (N rather than A, C, G, or T).
De novo assembly of contigs. An alternative to mapping is de novo assembly, in which no reference genome is used. An algorithm (e.g., [75], [76]) assembles the short reads into longer sequences known as contigs. The number and length of contigs depend on general factors, such as read length and the total amount of sequence produced, as well as on local factors, such as the presence of repetitive regions. The pie chart shows an example of the proportion of all reads that assemble into contigs of a given length.
Alignment. For further analysis, local regions (e.g., genes) or whole genomes must be aligned using appropriate algorithms (e.g., [77]–[79]). Computationally, there is a trade-off between the length of the region and the number of sequences that can be aligned.
Sequence analysis. The two approaches produce, respectively, pairwise alignments against a reference (mapping) and multiple alignments of the sequences against one another (de novo assembly). These alignments can be analyzed directly or processed further to detect variants such as single nucleotide polymorphisms, insertions, and deletions. The pie charts are illustrative only and were produced from data in [27].
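The mapping, filtering, and variant-detection steps in Figure 1 can be illustrated with a toy Python sketch. It assumes reads have already been aligned and summarized as a per-position pileup (the string of read bases covering each reference position) and that repetitive positions have already been flagged. The function names and the thresholds `MIN_DEPTH` and `MIN_FRACTION` are hypothetical choices for illustration, not values from the review; real pipelines rely on dedicated mappers and variant callers that also weigh base and mapping qualities.

```python
from collections import Counter

# Hypothetical thresholds for this sketch; not values from the review.
MIN_DEPTH = 5       # fewer mapped reads than this -> position left uncalled
MIN_FRACTION = 0.9  # majority base must reach this share of reads, else ambiguous


def call_consensus(pileup: list[str], masked: set[int],
                   min_depth: int = MIN_DEPTH,
                   min_fraction: float = MIN_FRACTION) -> str:
    """Build a mapped genome from per-position read bases.

    pileup[i] holds the bases of all reads mapped over reference position i;
    masked holds positions flagged as repetitive. Filtered positions get 'N'.
    """
    calls = []
    for pos, bases in enumerate(pileup):
        if pos in masked:
            calls.append("N")  # repetitive region: short reads map unreliably
            continue
        counts = Counter(b for b in bases.upper() if b in "ACGT")
        depth = sum(counts.values())
        if depth < min_depth:
            calls.append("N")  # too few reads mapped here
            continue
        base, n = counts.most_common(1)[0]
        # An ambiguous consensus (no clear majority base) is also filtered out.
        calls.append(base if n / depth >= min_fraction else "N")
    return "".join(calls)


def called_fraction(consensus: str) -> float:
    """Share of reference positions that received an A, C, G, or T call."""
    return sum(c != "N" for c in consensus) / len(consensus)


def snps(reference: str, consensus: str) -> list[tuple[int, str, str]]:
    """Single nucleotide differences at positions called in the mapped genome."""
    return [(i, r, c) for i, (r, c) in enumerate(zip(reference, consensus))
            if c != "N" and r != c]


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    pileup = ["AAAAAA", "CCCCCC", "GGGGGG", "TTTTTT", "AA",
              "TTTTTT", "GGGAAA", "TTTTTT", "AAAAAA", "CCCCCC"]
    consensus = call_consensus(pileup, masked={3})
    print(consensus)                   # ACGNNTNTAC
    print(called_fraction(consensus))  # 0.7
    print(snps(reference, consensus))  # [(5, 'C', 'T')]
```

In the example, position 3 is masked as repetitive, position 4 has too few reads, and position 6 has no clear majority base, so all three become N. This mirrors the "not called due to filtering" slice of the pie chart. The single remaining difference from the reference is reported as a SNP.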
Figure 2. Whole genome sequencing reveals within-host evolution and recent transmission between patients.
Lieberman, Michel, and colleagues [35] sequenced the genomes of 112 isolates of Burkholderia dolosa from 14 cystic fibrosis patients involved in an outbreak in Boston, Massachusetts, in the 1990s. (A) The maximum likelihood tree relating the bacterial genomes, color-coded by patient, is broadly consistent with a single founding infection for each patient. (B) The sampling dates and the chronological accumulation of mutations implied a network of transmission events. (C) Interesting patterns emerged when bacteria isolated from different sites in the same patient were compared. For two patients (subjects K and N), multiple genotypes appeared to have been transmitted from the airways to the bloodstream during septicemia, either concurrently or over the course of the infection. By contrast, a single genotype appeared to have been transmitted from the airways to the bloodstream in subject H. Reproduced from [35] appearing in Nature Genetics (Volume 43, 2011).
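Panel (B) combines sampling dates with the accumulation of mutations to suggest who infected whom. The Python sketch below shows that reasoning with a deliberately simple, hypothetical heuristic. It is not the method used in [35]. Each patient's earliest isolate is linked to the isolate from another patient that was sampled earlier and differs at the fewest SNP positions. `Isolate`, `snp_distance`, `infer_sources`, and the example data are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Isolate:
    patient: str
    day: int                   # sampling date (days from an arbitrary origin)
    mutations: frozenset[int]  # positions of derived SNPs relative to the outbreak ancestor


def snp_distance(a: Isolate, b: Isolate) -> int:
    """Number of SNP positions at which two isolates differ."""
    return len(a.mutations ^ b.mutations)


def infer_sources(isolates: list[Isolate]) -> dict[str, Optional[str]]:
    """Toy heuristic: link each patient to the closest earlier-sampled donor."""
    # The earliest isolate per patient stands in for the time of infection.
    first: dict[str, Isolate] = {}
    for iso in sorted(isolates, key=lambda i: i.day):
        first.setdefault(iso.patient, iso)

    sources: dict[str, Optional[str]] = {}
    for patient, recipient in first.items():
        donors = [d for d in isolates
                  if d.patient != patient and d.day < recipient.day]
        if not donors:
            sources[patient] = None  # no earlier patient: treated as the index case
            continue
        # Fewest SNP differences wins; ties go to the most recently sampled donor.
        best = min(donors, key=lambda d: (snp_distance(d, recipient), -d.day))
        sources[patient] = best.patient
    return sources


if __name__ == "__main__":
    isolates = [
        Isolate("P1", 0, frozenset()),
        Isolate("P1", 100, frozenset({10})),
        Isolate("P2", 120, frozenset({10, 20})),
        Isolate("P3", 200, frozenset({10, 20, 30})),
        Isolate("P1", 210, frozenset({10, 40})),
    ]
    print(infer_sources(isolates))  # {'P1': None, 'P2': 'P1', 'P3': 'P2'}
```

Because mutations accumulate over time, an isolate that carries a subset of another's mutations and was sampled earlier is a plausible ancestor. The heuristic exploits only that signal. Unsampled intermediate hosts and the within-host diversity shown in panel (C) could easily mislead it, which is why formal methods model these explicitly.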
Figure 3. Patterns of historical transmission reconstructed by whole genome sequencing.
Bos, Schuenemann, and colleagues [47] combined ancient DNA techniques with whole genome sequencing to reconstruct a draft genome of Yersinia pestis, the bacterium responsible for the Black Death, from five teeth recovered from a 660-year-old burial ground. (A) Genealogical reconstruction shows that the bacteria responsible for the Black Death are positioned ancestral to modern Branch 1 Yersinia pestis, close to the most recent common ancestor of all modern Yersinia pestis pathogenic to humans. No derived mutations were observed in the ancient genome, suggesting that the ancient and modern Branch 1 bacteria are essentially equivalent and that differences between modern and 14th-century epidemiology probably do not result from genetic changes in the bacteria. (B) Geographical origin of the bacterial isolates. (C) Inferred geographical spread of the Black Death through Europe [80]. Reproduced from [47] appearing in Nature (Volume 478, 2011).
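The statement that no derived mutations were observed in the ancient genome depends on polarizing each variable site, that is, deciding which allele is ancestral. The minimal Python sketch below counts derived sites. It assumes the ancestral base at every aligned position has already been inferred (for example, from an outgroup) and that all genomes share one alignment. `derived_sites` and the toy sequences are hypothetical.

```python
def derived_sites(genome: str, ancestral: str) -> list[int]:
    """Aligned positions where a genome differs from the inferred ancestral base.

    Positions uncalled ('N') in either sequence cannot be polarized and are skipped.
    """
    if len(genome) != len(ancestral):
        raise ValueError("sequences must come from the same alignment")
    return [i for i, (g, a) in enumerate(zip(genome, ancestral))
            if g != "N" and a != "N" and g != a]


if __name__ == "__main__":
    ancestral = "ACGTACGTAC"  # hypothetical inferred ancestral states
    ancient = "ACGNACGTAC"    # hypothetical ancient draft genome (one uncalled site)
    modern = "ACGTTCGTAA"     # hypothetical modern genome
    print(derived_sites(ancient, ancestral))  # []
    print(derived_sites(modern, ancestral))   # [4, 9]
```

A genome with no derived sites carries no mutations on its own branch, so it sits on the lineage leading to the modern strains rather than off to one side. This matches the placement in panel (A). Uncalled positions in a draft ancient genome are skipped, so a derived mutation at such a site would go undetected.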