Eduardo Castro-Nallar, Nur A Hasan, Thomas A Cebula, Rita R Colwell, Richard A Robison, W Evan Johnson, Keith A Crandall.
Abstract
The post-genomic era is characterized by the direct acquisition and analysis of genomic data with many applications, including the enhancement of the understanding of microbial epidemiology and pathology. However, there are a number of molecular approaches to survey pathogen diversity, and the impact of these different approaches on parameter estimation and inference is not entirely clear. We sequenced whole genomes of the bacterial pathogens Burkholderia pseudomallei, Yersinia pestis, and Brucella spp. (60 new genomes), and combined them with 55 genomes from GenBank to address how different molecular survey approaches (whole genomes, SNPs, and MLST) impact downstream inferences on molecular evolutionary parameters, evolutionary relationships, and trait character associations. We selected isolates for sequencing to represent variability in collection date, geographic origin, and host range. We found that substitution rate estimates vary widely among approaches, and that SNP and genomic datasets yielded different but strongly supported phylogenies. MLST yielded poorly supported phylogenies, especially in our low diversity dataset, i.e., Y. pestis. Trait associations showed that B. pseudomallei and Y. pestis phylogenies are significantly associated with geography, irrespective of the molecular survey approach used, while Brucella spp. phylogeny appears to be strongly associated with geography and host origin. We contrast inferences made among monomorphic (clonal) and non-monomorphic bacteria, and between intra- and inter-specific datasets. We also discuss our results in light of underlying assumptions of different approaches.
Keywords: Biological weapons; Bioterrorism; Data type; Genomes; High-throughput sequencing; MLST; Molecular epidemiology; Phylogenomics; Phylogeography; SNP
Year: 2015 PMID: 25737810 PMCID: PMC4338773 DOI: 10.7717/peerj.761
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Summary of genomes sequenced and collected in this study.
Metadata on strain source, host, location, and date of collection are also provided when available.
| NCBI Accession | Species | Strain | Source | Host | Location | Date of collection |
|---|---|---|---|---|---|---|
| | | 5 | Public Health Laboratory Service, London | Sheep | Australia | 1949 |
| | | 6 | Public Health Laboratory Service, London | Human | Bangladesh | 1960 |
| | | 9 | Public Health Laboratory Service, London | Human | Pakistan | 1988 |
| | | 18 | Public Health Laboratory Service, London | Monkey | Indonesia | 1990 |
| | | 24 | Public Health Laboratory Service, London | Horse | France | 1976 |
| | | 25 | Public Health Laboratory Service, London | Soil | Madagascar | 1977 |
| | | 31 | Public Health Laboratory Service, London | Water drain | Kenya | 1992 |
| | | 33 | Public Health Laboratory Service, London | Manure | France | 1976 |
| | | 35 | Public Health Laboratory Service, London | Human | Vietnam | 1963 |
| | | 68 | Public Health Laboratory Service, London | Human | Fiji | 1992 |
| | | 91 | Public Health Laboratory Service, London | Sheep | Australia | 1984 |
| | | 104 | Public Health Laboratory Service, London | Goat | Australia | 1990 |
| | | 208 | Public Health Laboratory Service, London | Human | Ecuador | 1990 |
| | | 4075 | Public Health Laboratory Service, London | Human | Holland | 1999 |
| | | Darwin-035 | Royal Darwin Hospital | Human | Australia | 2003 |
| | | Darwin-051 | Royal Darwin Hospital | Dog | Australia | 1992 |
| | | Darwin-060 | Royal Darwin Hospital | Pig | Australia | 1992 |
| | | Darwin-077 | Royal Darwin Hospital | Bird | Australia | 1994 |
| | | Darwin-150 | Royal Darwin Hospital | Soil | Australia | 2006 |
| | | 80800117 | Utah Department of Health | Human | USA | 2008 |
| | | 1026b | hhayden@u.washington.edu | Human | Thailand | 1993 |
| | | 1106a | J. Craig Venter Institute (JCVI) | Human | Thailand | 1993 |
| | | MSHR346 | Joint Genome Institute/LANL Center | Human | Australia | 1996 |
| | | k96243 | Sanger Institute | Human | Thailand | 1996 |
| | | BPC006 | Third Military Medical University | Human | China | 2008 |
| | | 1106b | J. Craig Venter Institute (JCVI) | Human | Thailand | 1996 |
| | | 1710a | J. Craig Venter Institute (JCVI) | Human | Thailand | 1996 |
| | | 1710b | J. Craig Venter Institute (JCVI) | Human | Thailand | 1999 |
| | | 668 | J. Craig Venter Institute (JCVI) | Human | Australia | 1995 |
| | | Bp22 | Genome Institute of Singapore | Human | Singapore | 1989 |
| | | E264 | J. Craig Venter Institute (JCVI) | Soil | Thailand | 1994 |
| | | 1004, Strain 2032 | National Animal Disease Center | Bovine | Missouri, USA | 1990 |
| | | 1007, Strain 2045 | National Animal Disease Center | Bovine | Florida, USA | 1990 |
| | | 1019, Strain 2038 | National Animal Disease Center | Bovine | Tennessee, USA | 1990 |
| | | 1022, Strain 2073 | National Animal Disease Center | Bovine | Georgia, USA | 1990 |
| | | 1146, Strain 8-953 | National Animal Disease Center | Elk | Montana, USA | 1992 |
| | | 1668, Strain 00-666 | National Animal Disease Center | Elk | Wyoming, USA | 2000 |
| | | YELL-99-067 | Idaho National Engineering and Environmental Laboratory | Bison (amniotic fluid) | Wyoming, USA | 1999 |
| | | 1614, Strain | National Animal Disease Center | Bovine | Texas, USA | 2000 |
| | | 1107, Strain 1-107 | National Animal Disease Center | Canine | Missouri, USA | 1990 |
| | | 1253, Strain | National Animal Disease Center | Caprine | Unknown | 1994 |
| | | BA 4837 | New Mexico Department of Health | Human | New Mexico, USA | 2003 |
| | | 70000565 | Utah Department of Health | Blood, Human | Utah, USA | 2000 |
| | | 80600020 | Utah Department of Health | Blood, Human | Utah, USA | 2006 |
| | | 80800076 | Utah Department of Health | Human | California, USA | 2008 |
| | | 1156, Strain 5K33, ATCC#23459 | National Animal Disease Center | Desert wood rat | Unknown, USA | 1992 |
| | | 1117, Strain 1-507 | National Animal Disease Center | Ovine | Georgia, USA | 1991 |
| | | 1698, Strain | National Animal Disease Center | Ovine (semen) | Ft. Collins, Colorado, USA | 2001 |
| | | 70100304 | Utah Department of Health | Blood, Human | Utah, USA | 2001 |
| | | 1103, Strain 2483 | National Animal Disease Center | Porcine | South Carolina, USA | 1990 |
| | | 1108, Strain 1-138 | National Animal Disease Center | Porcine | New Jersey, USA | 1990 |
| | | A13334 | Macrogen | Bovine | Korea | Unknown |
| | | bv 1, 9-941 | USDA | Bovine | Wyoming, USA | Unknown |
| | | S19 | Crasta OR | Bovine | Unknown, USA | 1923 |
| | | ATCC 23365 | DOE Joint Genome Institute | Dog | Unknown | Unknown |
| | | HSK A52141 | National Veterinary Research and Quarantine | Dog | South Korea | Unknown |
| | | ATCC 23457 | Los Alamos National Lab | Human | India | 1963 |
| | | M28 | Chinese National Human Genome Center at Shanghai | Sheep | China | 1955 |
| | | bv 1, 16M | Integrated Genomics Inc | Goat | Unknown, USA | Unknown |
| | | bv. 1 Abortus 2308 | Lawrence Livermore National Lab | Standard laboratory strain | Unknown | Unknown |
| | | M5-90 | Chinese National Human Genome Center at Shanghai | Standard laboratory strain | Unknown | Unknown |
| | | bv. 3 NI | China Agricultural University | Bovine | Inner Mongolia, China | 2007 |
| | | CCM 4915 | Sudic S | Vole | Czech Republic | 2000 |
| | | ATCC 25840 | J. Craig Venter Institute | Sheep | Australia | 1960 |
| | | B2/94 | Zygmunt, M.S. | Seal | UK | 1994 |
| | | VBI22 | Harold R. Garner | Bovine, milk | Texas, USA | Unknown |
| | | bv 1, 1330 | J. Craig Venter Institute | Pig | Unknown, USA | 1950 |
| | | ATCC 23445 | Joint Genome Institute/LANL Center | Hare | UK | 1951 |
| | | ATCC 49188 | DOE Joint Genome Institute | Arsenical cattle-dipping fluid | Australia | 1988 |
| | | 4954 | New Mexico Department of Health | Human | NM, USA | 1987 |
| | | 1901b | New Mexico Department of Health | Human | NM, USA | 1983 |
| | | Java (D88) | Michigan State University | Unknown | Far East | Unknown |
| | | Kimberley (D17) | Michigan State University | Unknown | Near East | Unknown |
| | | KUMA (D11) | Michigan State University | Unknown | Manchuria, China | Unknown |
| | | TS (D5) | Michigan State University | Unknown | Far East | Unknown |
| | | 8607116 | New Mexico Department of Health | Dog | NM, USA | Unknown |
| | | 1866 | New Mexico Department of Health | Squirrel | NM, USA | Unknown |
| | | 4139 | New Mexico Department of Health | Cat | NM, USA | 1995 |
| | | 4412 | New Mexico Department of Health | Human | NM, USA | 1991 |
| | | 2965 | New Mexico Department of Health | Human | NM, USA | 1995 |
| | | 2055 | New Mexico Department of Health | Human | NM, USA | 1998 |
| | | 2106 | New Mexico Department of Health | Human | NM, USA | 2001 |
| | | 2772 | New Mexico Department of Health | Cat | NM, USA | 1984 |
| | | 3357 | New Mexico Department of Health | Mountain lion | NM, USA | 1999 |
| | | AS 2509 | New Mexico Department of Health | Rodent | NM, USA | 2004 |
| | | AS 200900596 | New Mexico Department of Health | Rabbit, liver/spleen | Santa Fe, NM, USA | 2009 |
| | | V-6486 | New Mexico Department of Health | Llama | Las Vegas, NM, USA | Unknown |
| | | KIM (D27) | Michigan State University | Human | Iran/Kurdistan | 1968 |
| | | AS200901509 | New Mexico Department of Health | Liver/spleen, prairie dog | Santa Fe, NM, USA | 2009 |
| | | A1122 | Los Alamos National Lab | Ground squirrel | California | 1939 |
| | | Angola | J. Craig Venter Institute (JCVI) | Human | Angola | Unknown |
| | | Antiqua | DOE Joint Genome Institute | Human | Congo | 1965 |
| | | B42003004 | J. Craig Venter Institute (JCVI) | Marmota baibacina | China | 2003 |
| | | CA88-4125 | DOE Joint Genome Institute | Human | California | 1988 |
| | | CO92 | Sanger Institute | Human/cat | Colorado | 1992 |
| | | D106004 | Chinese Center for Disease Control and Prevention | Apodemus chevrieri | Yulong County, China | 2006 |
| | | D182038 | Chinese Center for Disease Control and Prevention | Apodemus chevrieri | Yunnan, China | 1982 |
| | | E1979001 | J. Craig Venter Institute (JCVI) | Eothenomys miletus | China | 1979 |
| | | F1991016 | J. Craig Venter Institute (JCVI) | Flavus rattivecus | China | 1991 |
| | | FV-1 | The Translational Genomics Research Institute | Prairie dog | Arizona | 2001 |
| | | India 195 | DOE Joint Genome Institute | Human | India | Unknown |
| | | IP275 | The Institute for Genomic Research | Human | Madagascar | 1995 |
| | | IP31758 | J. Craig Venter Institute (JCVI) | Human | Russia | 1966 |
| | | K1973002 | J. Craig Venter Institute (JCVI) | Marmota himalaya | China | 1973 |
| | | KIM D27 | J. Craig Venter Institute (JCVI) | Human | Iran/Kurdistan | 1968 |
| | | KIM10+ | Genome Center of Wisconsin | Human | Iran/Kurdistan | 1968 |
| | | Medievalis str. Harbin 35 | Virginia Bioinformatics Institute | Human | China | 1940 |
| | | MG05-1020 | J. Craig Venter Institute (JCVI) | Human | Madagascar | 2005 |
| | | Microtus 91001 | Academy of Military Medical Sciences, The Institute of Microbiology and Epidemiology, China | Microtus brandti | China | 1970 |
| | | Nepal516 | Genome Center of Wisconsin | Human/soil | Nepal | 1967 |
| | | Pestoides A | DOE Joint Genome Institute | Human | FSU | 1960 |
| | | Pestoides F | DOE Joint Genome Institute | Human | FSU | Unknown |
| | | PEXU2 | Enteropathogen Resource Integration Center (ERIC) BRC | Rodent | Brazil | 1966 |
| | | UG05-045 | J. Craig Venter Institute (JCVI) | Human | Uganda | 2005 |
| | | Z176003 | CCDC | Marmota himalayana | Tibet | 1976 |
Genetic diversity and dataset length for different species and molecular survey approaches.
| Species | Dataset | theta (mean) | theta (variance) | pi (mean) | pi (variance) | Length |
|---|---|---|---|---|---|---|
| Burkholderia pseudomallei | MLST | 36.05 | 127.39 | 4.45E-03 | 5.15E-06 | 3518.00 |
| Burkholderia pseudomallei | SNP, chromosome I | 7807.03 | 5593613.54 | 9.58E-02 | 2.18E-03 | 31189.00 |
| Burkholderia pseudomallei | SNP, chromosome II | 4333.68 | 1724029.33 | 9.69E-02 | 2.24E-03 | 17313.00 |
| Burkholderia pseudomallei | Genome, chromosome I | 4666.60 | 1999005.02 | 5.67E-03 | 7.66E-06 | 289172.00 |
| Burkholderia pseudomallei | Genome, chromosome II | 2187.99 | 439709.21 | 7.14E-03 | 1.21E-05 | 108654.00 |
| Brucella spp. | MLST | 112.58 | 1205.64 | 1.00E-02 | 2.46E-05 | 4409.00 |
| Brucella spp. | SNP, chromosome I | 914.77 | 78109.73 | 8.01E-02 | 1.54E-03 | 3628.00 |
| Brucella spp. | SNP, chromosome II | 234.25 | 5161.72 | 7.66E-02 | 1.43E-03 | 929.00 |
| Brucella spp. | Genome, chromosome I | 635.85 | 37782.92 | 7.96E-03 | 1.52E-05 | 24110.00 |
| Brucella spp. | Genome, chromosome II | 2046.37 | 390305.58 | 1.62E-02 | 6.25E-05 | 36223.00 |
| Yersinia pestis | MLST | 36.17 | 120.42 | 4.97E-04 | 6.68E-08 | 20498.00 |
| Yersinia pestis | SNP | 3204.65 | 883301.99 | 5.60E-02 | 7.40E-04 | 14116.00 |
| Yersinia pestis | Genome | 527.40 | 24020.48 | 4.55E-04 | 4.94E-08 | 281149.00 |
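The theta and pi rows in the table above are standard summaries of genetic diversity. The record does not say which estimators or software produced them, so the following is only a minimal sketch under two assumptions: pi is taken as the mean number of pairwise differences per site, and theta as Watterson's estimator computed from the number of segregating sites. The function names and the toy alignment are hypothetical and are not data from the study.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns that contain more than one distinct base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs):
    """pi: mean number of pairwise differences, divided by alignment length."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total_diffs / (len(pairs) * length)


def watterson_theta(seqs):
    """Watterson's theta per alignment: S / a_n, with a_n = sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n


# Hypothetical toy alignment (four aligned sequences of equal length).
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]

print(segregating_sites(toy))
print(nucleotide_diversity(toy))
print(watterson_theta(toy))
```

Because pi is normalized by alignment length here while theta is not, the two statistics sit on different scales, which is one possible reading of why the tabulated theta values are so much larger than the pi values.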
Figure 1. Substitution rates for all datasets as estimated from different molecular survey approaches.
Genome/SNP chr I/II refers to estimates from different chromosomes. Burkholderia pseudomallei (A), Brucella spp. (B), and Yersinia pestis (C). Note different scale for species rates.
Figure 2. Median node ages in years.
Burkholderia pseudomallei (A), Brucella spp. (B), and Yersinia pestis (C) median estimates and their 95% highest posterior density (HPD) intervals according to molecular survey approach (only chromosome I shown; see Fig. S1). Nodes are numbered from youngest to oldest.
Figure 3. Burkholderia pseudomallei phylogenies by survey approach.
The MLST phylogeny (A) is less resolved and more poorly supported than the SNP (B) and genome (C) phylogenies (only chromosome I shown).
Topology distances among phylogenies inferred using different molecular survey approaches.
Bolded rows show tree comparisons between different chromosomes under the same molecular survey approach.
| Species | Tree comparison | Matching cluster | R–F cluster |
|---|---|---|---|
| Burkholderia pseudomallei | mlst vs. snp-I | 181 | 16 |
| Burkholderia pseudomallei | mlst vs. snp-II | 162 | 17 |
| Burkholderia pseudomallei | mlst vs. genome-I | 149 | 18 |
| Burkholderia pseudomallei | mlst vs. genome-II | 116 | 18 |
| Burkholderia pseudomallei | snp-I vs. genome-I | 56 | 17 |
| Burkholderia pseudomallei | snp-I vs. genome-II | 91 | 17 |
| Burkholderia pseudomallei | snp-II vs. genome-I | 47 | 19 |
| Burkholderia pseudomallei | snp-II vs. genome-II | 72 | 16 |
| Brucella spp. | mlst vs. snp-I | 34 | 5 |
| Brucella spp. | mlst vs. snp-II | 10 | 1.5 |
| Brucella spp. | mlst vs. genome-I | 73 | 4.5 |
| Brucella spp. | mlst vs. genome-II | 56 | 5 |
| Brucella spp. | snp-I vs. genome-I | 61 | 5.5 |
| Brucella spp. | snp-I vs. genome-II | 40 | 5 |
| Brucella spp. | snp-II vs. genome-I | 63 | 5 |
| Brucella spp. | snp-II vs. genome-II | 50 | 4.5 |
| Yersinia pestis | mlst vs. snp | 223 | 13.5 |
| Yersinia pestis | mlst vs. genome | 103 | 8 |
| Yersinia pestis | snp vs. genome | 124 | 8.5 |
Notes.
Genome/SNP-I/II, chromosome I or II; R–F Cluster, Robinson–Foulds for rooted trees metric.
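To make the R–F cluster column more concrete, here is a minimal sketch of a Robinson–Foulds distance for rooted trees based on clusters (the leaf sets below each internal node). The half-integer values in the table suggest the symmetric difference is halved, so the sketch assumes that convention; the record does not name the software used. Trees are written as nested tuples, and the function names and example trees are hypothetical.

```python
def clusters(tree):
    """Return the non-trivial clusters (leaf sets under internal nodes) of a rooted tree."""
    found = set()

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])          # a leaf label
        under = frozenset().union(*(leaves(child) for child in node))
        found.add(under)
        return under

    all_leaves = leaves(tree)
    found.discard(all_leaves)                 # the root cluster is shared by all trees
    return found


def rf_cluster_distance(tree_a, tree_b):
    """Half the number of clusters present in exactly one of the two trees (assumed convention)."""
    return len(clusters(tree_a) ^ clusters(tree_b)) / 2


# Hypothetical rooted trees on the same five leaves.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))

print(rf_cluster_distance(t1, t2))
```

Under this convention, a larger value means the two trees share fewer clades. The matching cluster metric in the adjacent column additionally weighs how different the unmatched clusters are and is not covered by this sketch.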
Trait-phylogeny association statistics.
Significant associations (p value <0.05) were found between traits (sampling location/host/time) and phylogenies inferred by using different data approaches. Association index (AI); Parsimony Score (PS); genome/SNP-I/II, chromosome I or II.
| Trait | Statistic | Burkholderia pseudomallei | Brucella spp. | Yersinia pestis |
|---|---|---|---|---|
| Sampling location | AI | MLST, genome-I, genome-II, SNP-I, SNP-II | MLST, genome-I, genome-II, SNP-I, SNP-II | MLST, genome, SNP |
| Sampling location | PS | MLST, genome-I, genome-II, SNP-II | MLST, SNP-II | MLST, genome, SNP |
| Host | AI | None | MLST, genome-I, genome-II, SNP-I, SNP-II | Genome, SNP |
| Host | PS | None | MLST, genome-I, genome-II, SNP-I, SNP-II | None |
| Time | AI | Genome-I, genome-II | None | Genome |
| Time | PS | None | None | Genome |
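As an illustration of the parsimony score (PS) named in the caption, the sketch below counts the minimum number of trait changes on a rooted binary tree with Fitch's algorithm. This is an assumption about how such a score is typically obtained, not a description of the study's pipeline, and it omits the significance test (for example, comparison against randomized tip labels) that a p value would require. The tree, trait labels, and function name are hypothetical.

```python
def parsimony_score(tree, trait):
    """Fitch small parsimony: minimum number of trait changes on a rooted binary tree."""
    changes = 0

    def states(node):
        nonlocal changes
        if not isinstance(node, tuple):
            return {trait[node]}              # leaf: its observed trait state
        left, right = (states(child) for child in node)
        common = left & right
        if common:
            return common                     # children agree: no change needed
        changes += 1                          # children disagree: one change
        return left | right

    states(tree)
    return changes


# Hypothetical rooted tree and two hypothetical trait assignments.
tree = ((("A", "B"), "C"), ("D", "E"))
clustered = {"A": "site1", "B": "site1", "C": "site1", "D": "site2", "E": "site2"}
scattered = {"A": "site1", "B": "site2", "C": "site1", "D": "site2", "E": "site1"}

print(parsimony_score(tree, clustered))
print(parsimony_score(tree, scattered))
```

A trait that tracks the phylogeny needs fewer changes than one scattered across the tips, which is the intuition behind reading a low PS as evidence of trait-phylogeny association.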