Tyler S Brown, Apurva Narechania, John R Walker, Paul J Planet, Pablo J Bifani, Sergios-Orestis Kolokotronis, Barry N Kreiswirth, Barun Mathema.
Abstract
BACKGROUND: Whole genome sequencing (WGS) has rapidly become an important research tool in tuberculosis epidemiology and is likely to replace many existing methods in public health microbiology in the near future. WGS-based methods may be particularly useful in areas with less diverse Mycobacterium tuberculosis populations, such as New York City, where conventional genotyping is often uninformative and field epidemiology is often difficult. This study applies four candidate strategies for WGS-based identification of emerging M. tuberculosis subpopulations, employing both phylogenomic and population genetics methods.
Keywords: Mycobacterium tuberculosis; Phylogenomics; Surveillance; Whole genome sequencing
Year: 2016 PMID: 27871225 PMCID: PMC5117616 DOI: 10.1186/s12864-016-3298-6
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Characteristics of the isolates sequenced in this study
| Isolate | Lineage | Location | Year | Reads | Mean read depth | Genome coverage (fraction) | Filtered SNPs | GenBank accession |
|---|---|---|---|---|---|---|---|---|
| BE_116771 | 1 | NJ | 1999 | 2,883,388 | 31.633 | 0.994 | 2065 | LKMF01000000 |
| BE3_11657 | 3 | NJ | 1999 | 2,934,017 | 63.728 | 0.996 | 1046 | LKDN01000000 |
| 001_13432 | 4 | NYC | 2000 | 3,027,826 | 75.362 | 0.996 | 1060 | LKDO01000000 |
| AH_14271 | 4 | NYC | 2001 | 4,045,981 | 95.556 | 0.998 | 1074 | LKDP01000000 |
| AH26_26663 | 4 | NYC | 2010 | 1,356,722 | 33.169 | 0.997 | 1044 | LJIQ01000000 |
| AH26_28866 | 4 | S. Africa | 2011 | 1,958,645 | 23.009 | 0.994 | 968 | LKMH01000000 |
| AU_8623 | 4 | NYC | 1998 | 6,294,738 | 71.734 | 0.996 | 983 | LKMG01000000 |
| BE_10225 | 4 | NJ | 1999 | 1,896,522 | 47.939 | 0.994 | 1049 | LJIK01000000 |
| BE_13443 | 4 | NYC | 2001 | 3,669,247 | 91.597 | 0.995 | 1071 | LKDQ01000000 |
| BE_14248 | 4 | NYC | 2001 | 2,921,250 | 70.22 | 0.995 | 1077 | LKDR01000000 |
| BE_7556 | 4 | NJ | 1997 | 3,580,176 | 41.681 | 0.995 | 961 | LJIL01000000 |
| C_913 | 4 | NYC | 1992 | 14,910,476 | 387.981 | 0.995 | 1101 | LKMI01000000 |
| C_10367 | 4 | NYC | 1999 | 2,082,141 | 51.722 | 0.995 | 1057 | LJIP01000000 |
| C_14229 | 4 | NYC | 2001 | 5,108,155 | 127.485 | 0.996 | 1128 | LKDS01000000 |
| C130 | 4 | NYC | 1991 | 2,704,576 | 64.086 | 0.996 | 1057 | LJIN01000000 |
| C24_20545 | 4 | NYC | 2005 | 2,856,851 | 71.852 | 0.997 | 1125 | LJIM01000000 |
| C28_9319 | 4 | NJ | 1998 | 2,179,814 | 53.922 | 0.995 | 1058 | LJIO01000000 |
| C28_9904 | 4 | NJ | 1999 | 1,632,080 | 39.971 | 0.994 | 1019 | LJIR01000000 |
| C30_19588 | 4 | NYC | 2004 | 4,631,768 | 115.895 | 0.996 | 1083 | LKDT01000000 |
| C34_13853 | 4 | NYC | 2001 | 2,008,488 | 48.272 | 0.995 | 1048 | LKHH01000000 |
| C4_16679 | 4 | NYC | 2002 | 3,966,004 | 96.262 | 0.996 | 1075 | LKIF01000000 |
| C49_20090 | 4 | NYC | 2005 | 1,966,774 | 47.421 | 0.995 | 1024 | LKIG01000000 |
| C53_20899 | 4 | NYC | 2006 | 3,105,243 | 74.778 | 0.994 | 1062 | LKIH01000000 |
| H_13559 | 4 | NYC | 2001 | 3,185,904 | 76.306 | 0.996 | 1041 | LKII01000000 |
| H_13571 | 4 | NYC | 2001 | 1,815,449 | 44.429 | 0.995 | 1021 | LKIJ01000000 |
| H_7300 | 4 | NYC | 1997 | 2,011,143 | 48.599 | 0.994 | 1020 | LKDL01000000 |
| H55_24991 | 4 | NYC | 2009 | 1,743,190 | 42.094 | 0.995 | 1041 | LKIK01000000 |
| H6_10443 | 4 | NJ | 1999 | 1,799,336 | 43.72 | 0.995 | 1019 | LKIL01000000 |
| H6_12226 | 4 | NJ | 2000 | 4,457,191 | 105.789 | 0.996 | 1074 | LKIM01000000 |
| H6_7420 | 4 | NJ | 1997 | 1,719,494 | 43.153 | 0.994 | 1041 | LKDM01000000 |
| I_15762 | 4 | NYC | 2002 | 2,785,341 | 66.101 | 0.995 | 1057 | LKIN01000000 |
| KI_19771 | 4 | NYC | 2004 | 2,884,815 | 69.225 | 0.995 | 1079 | LKIO01000000 |
| L_13621 | 4 | NYC | 2001 | 8,967,725 | 221.783 | 0.997 | 1098 | LKIP01000000 |
| V_13678 | 4 | NYC | 2001 | 1,517,250 | 35.643 | 0.997 | 1018 | LKIQ01000000 |
Genetic diversity and neutrality test statistics by lineage (L1-L4)
| Lineage | N | S | π | Tajima's D | Fu and Li's D | Fu and Li's F | R2 | H | Hn |
|---|---|---|---|---|---|---|---|---|---|
| L1 | 7 | 2238 | 1.7E-4 | −1.157 | −0.9802 | −1.0910 | 0.0812** | 457.62 | 1.180 |
| L2 | 9 | 2240 | 1.3E-4 | −1.627 | −1.66780 | −1.8172 | 0.0795** | 162.92 | 0.430 |
| L1/L2/L3 | 19 | 6249 | 2.7E-4 | −1.622 | −2.1420* | −2.2253* | 0.0631** | 635.80 | 0.671 |
| L4 | 47 | 4892 | 1.1E-4 | −2.205* | −4.3046** | −4.1559** | 0.0334** | 304.47 | 0.490 |
| L4: NYC | 21 | 2262 | 8.9E-5 | −1.649 | −2.5102* | −2.6277* | 0.0593** | 230.01 | 0.7874 |
| L4: S75 | 9 | 149 | 1.5E-5 | −0.498 | 0.07218 | −0.07462 | 0.1297 | −29.14 | −1.125 |
N, number of ingroup sequences; S, number of segregating sites; π, nucleotide diversity; k, average number of nucleotide differences; Tajima's D; R2, Ramos-Onsins and Rozas' R2; Fu and Li's D and F (calculated with M. bovis as an outgroup); H, Fay and Wu's H; Hn, Fay and Wu's normalized H. Statistical significance was assessed with 10,000 coalescent simulations (50,000 simulations for R2). *P < 0.05, **P < 0.005
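As an illustration of how one of the neutrality statistics defined above is obtained, the following is a minimal sketch of Tajima's D computed from the number of sequences, the number of segregating sites, and the average number of pairwise nucleotide differences, using the standard formula from Tajima (1989). The function name `tajimas_d` and the toy input values are assumptions made for this example; the record does not state which software produced the values in the table, and the sketch does not perform the coalescent simulations used for significance testing.

```python
import math


def tajimas_d(n, s, k):
    """Tajima's D (Tajima 1989).

    n: number of sequences in the sample
    s: number of segregating sites
    k: average number of pairwise nucleotide differences
    """
    if n < 2 or s < 1:
        raise ValueError("need at least 2 sequences and 1 segregating site")

    # Harmonic sums over 1..n-1
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))

    # Constants for the variance of (k - S/a1)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    # Difference between the two estimators of theta, scaled by its
    # estimated standard deviation
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy values (hypothetical, not taken from the table above)
print(round(tajimas_d(n=10, s=16, k=3.888889), 3))
```

A negative value indicates an excess of low-frequency variants relative to the neutral expectation, which is the direction seen for most rows in the table.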
Fig. 1 Synapomorphic polymorphisms by functional category and isolate subgroup. a Virulence and adaptation, b Regulatory and information pathways, c Conserved proteins without known function, d Cell wall and lipid metabolism, e Intermediary metabolism and respiration. L4 includes all (n = 47) Lineage 4 isolates included in this study, NYC-NJ (n = 32) includes L4 isolates collected in New York City or New Jersey, USA, including the S75 outbreak cluster, and S75 (n = 9) includes isolates belonging to the New Jersey outbreak cluster described in the text. Genes carrying diagnostic SNPs with known functions in virulence, growth, and/or adaptation are listed above each column, and of these genes, those with non-synonymous polymorphisms are highlighted in yellow. The total number of diagnostic SNPs unique to S75 (which includes those unique to L4 and NYC-NJ) is listed in the third column
Fig. 2 a Consensus network of 1000 maximum likelihood bootstrap replicates for Mycobacterium tuberculosis isolates from North America, Sub-Saharan Africa, and Asia (n = 71) based on 14,601 SNPs. Branches are color-coded by lineage. Isolates from the S75 cluster, identified in New Jersey in 1997–2001, are highlighted; b World map of isolate collection locations color-coded by lineage
Fig. 3 Unfolded site frequency spectra for isolates from the S75 outbreak cluster (L4:S75) and non-S75 isolates from the New York City area. Dark and light blue bars indicate the number of non-synonymous and synonymous SNPs at each SNP allele frequency (from singletons on the left to SNPs at fixation on the right)
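The unfolded site frequency spectrum described in the Fig. 3 caption can be tallied as in the following minimal sketch. It assumes each SNP has already been polarized against an outgroup (0 for the ancestral allele, 1 for the derived allele) and labeled by effect; the function name `unfolded_sfs`, the labels, and the toy data are hypothetical and are not taken from the study.

```python
def unfolded_sfs(sites, n):
    """Tally an unfolded site frequency spectrum per SNP class.

    sites: iterable of (label, alleles) pairs, where alleles is a sequence
           of 0 (ancestral) or 1 (derived), one entry per isolate
    n:     number of isolates
    Returns {label: [count of SNPs with 1, 2, ..., n derived copies]}.
    """
    spectra = {}
    for label, alleles in sites:
        if len(alleles) != n:
            raise ValueError("each site needs one allele per isolate")
        derived = sum(alleles)
        if derived == 0:
            continue  # no derived allele in this sample, so not counted
        spectrum = spectra.setdefault(label, [0] * n)
        spectrum[derived - 1] += 1  # index 0 = singletons, index n-1 = fixed
    return spectra


# Toy data for three isolates (hypothetical)
toy_sites = [
    ("synonymous", [1, 0, 0]),
    ("non-synonymous", [1, 1, 0]),
    ("synonymous", [1, 1, 1]),
    ("synonymous", [0, 0, 1]),
    ("non-synonymous", [0, 0, 0]),
]
print(unfolded_sfs(toy_sites, n=3))
```

Each list runs from singletons on the left to derived alleles at fixation on the right, matching the axis order described in the caption.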