Maha R Farhat, B Jesse Shapiro, Samuel K Sheppard, Caroline Colijn, Megan Murray.
Abstract
Whole genome sequencing is increasingly used to study phenotypic variation among infectious pathogens and to evaluate their relative transmissibility, virulence, and immunogenicity. To date, relatively little has been published on how pathogen strains should be selected, and how many, for studies associating phenotype and genotype. Identifying genetic associations in bacteria poses specific challenges because bacteria often comprise highly structured populations. Here we consider general methodological questions related to sampling and analysis, focusing on clonal to moderately recombining pathogens. We propose that a matched sampling scheme constitutes an efficient study design, and provide a power calculator based on phylogenetic convergence. We demonstrate this approach by applying it to genomic datasets for two microbial pathogens: Mycobacterium tuberculosis and Campylobacter species.
Year: 2014 PMID: 25484920 PMCID: PMC4256898 DOI: 10.1186/s13073-014-0101-7
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
PubMed search terms and inclusion and exclusion criteria
| Objective | Search terms | Inclusion criteria | Exclusion criteria |
|---|---|---|---|
| Identify studies of pathogen biology using whole genome sequencing and analysis | ‘genome sequencing’ AND ‘tuberculosis’ AND (‘drug resistance’ OR ‘virulence’ OR ‘immunogenicity’ OR ‘transmissibility’ OR ‘fitness’), in all PubMed fields | All abstracts describing the use of WGS data to identify genes related to pathogen immunogenicity, virulence, transmissibility, drug resistance, or fitness | (1) Review articles; (2) Studies that published new sequence data only; (3) Studies that did not study MTB bacteria and its biology; (4) Studies that only assess mutation rates in non-clinical settings |
Figure 1. Flow chart detailing literature search.
Literature search results
| First author | Year | Strains sequenced | Objective | Clinical strains | Matched control strains | Analysis method | Phenotype related genes identified (a) |
|---|---|---|---|---|---|---|---|
| Zhang | 2013 | 161 | Identify drug resistance genes | Yes | No | Phylogenetics and comparison of rates with Poisson distribution | Yes; list of genes provided |
| Farhat | 2013 | 124 | Identify drug resistance genes | Yes | No | Phylogenetics and convergence analysis | Yes; list of genes provided |
| Lin | 2013 | 2 | Identify drug resistance genes | Yes | No | Comparison with reference mycobacterial strains | No |
| Wu | 2013 | 4 | Identify Beijing-associated pathways | Yes | No | COG enrichment of genes with SNPs | General pathways rather than individual genes |
| Das | 2013 | 5 | Identify genes related to extrapulmonary TB | Yes | No | COG enrichment of genes with SNPs | General pathways rather than individual genes |
| Ilina | 2013 | 4 | Identify drug resistance genes | Yes | No | Comparison with reference mycobacterial strains | No |
| Abrahams | 2013 | - | Identify resistance target(s) for novel imidazole | No | Yes: spontaneous mutants resistant to drug and their sensitive ancestor | Identification of all mutations | Yes |
| Supply | 2013 | 5 | Identify genes associated with smooth TB phenotype | Yes | No | Comparison with reference mycobacterial strains | General pathways rather than individual genes |
| Hartkoorn | 2012 | - | Identify resistance target(s) for pyridomycin | No | Yes: spontaneous mutants resistant to drug and their sensitive ancestor | Identification of all mutations | Yes; acyl-carrier-protein |
| G. Sun | 2012 | 7 | Identify drug resistance genes | Yes | Yes: serial samples from the same patient | Identification of all mutations | No, but list of potential candidates with new fixed mutations provided |
| Grzegorzewicz | 2012 | - | Identify resistance target(s) for novel compound adamantyl urea | Yes | Yes: serial samples from the same patient | Identification of all mutations | Yes |
| Casali | 2012 | 59 | Identify drug resistance genes | Yes | No | Phylogenetic tree and parallel evolution and convergence | Yes |
| Tahlan | 2012 | - | Identify resistance target(s) for novel compound SQ109 | No | Yes: spontaneous mutants resistant to drug and their sensitive ancestor | Identification of all mutations | Yes |
| La Rosa | 2012 | - | Identify resistance target(s) for 1,5-diarylpyrrole derivative BM212 | No | Yes: spontaneous mutants resistant to drug and their sensitive ancestor | Identification of all mutations | Yes |
| Comas | 2011 | 10 | Identify drug resistance genes | Yes | Yes: serial samples from the same patient | Identification of all mutations in rpoC; assessment of convergence across different strain pairs | Yes; confirmed rpoC |
| Manjunatha | 2006 | - | Identify resistance target(s) for PA-824 | No | Yes: spontaneous mutants resistant to drug and their sensitive ancestor | Identification of all mutations | Yes; Rv3547 |
(a) The term ‘phenotype related genes’ is used loosely here to describe genes that are associated with but not necessarily causative of the phenotype.
Figure 2. Demonstration of the selection strategy. (A) Example initial MIRU-VNTR phylogeny constructed for selection of strains for sequencing and analysis. Grey circles represent strains with the phenotype of interest (ph+ strains); white circles represent strains without the phenotype of interest (ph- strains). The table with columns L1-5 gives the number of tandem repeats at each locus L. (B) Example of selection methodology: for each ph+ strain (grey circle), a neighboring ph- strain is selected such that the distance between the two strains in the phylogeny is minimized. Each control or study strain is sampled only once. The resultant tree of selected strains consists of matched study and control strains.
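The selection rule in panel (B) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function names, the toy repeat-count profiles, and the use of a greedy closest-pair-first rule are all assumptions. In particular, `vntr_distance` (the number of loci whose repeat counts differ) is a hypothetical stand-in for the distance measured on the MIRU-VNTR phylogeny.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Profile = Sequence[int]  # repeat counts at loci L1..L5 (toy data, assumed)


def vntr_distance(a: Profile, b: Profile) -> int:
    """Hypothetical stand-in for phylogenetic distance:
    the number of loci at which two repeat-count profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def match_strains(
    ph_pos: Dict[str, Profile],
    ph_neg: Dict[str, Profile],
    dist: Callable[[Profile, Profile], float] = vntr_distance,
) -> List[Tuple[str, str, float]]:
    """Greedily pair each ph+ strain with a nearby ph- strain.

    All candidate pairs are ranked by distance and accepted closest-first,
    so every study or control strain is used at most once.
    """
    candidates = sorted(
        (dist(p_prof, n_prof), p_name, n_name)
        for p_name, p_prof in ph_pos.items()
        for n_name, n_prof in ph_neg.items()
    )
    used_pos, used_neg = set(), set()
    pairs = []
    for d, p_name, n_name in candidates:
        if p_name not in used_pos and n_name not in used_neg:
            pairs.append((p_name, n_name, d))
            used_pos.add(p_name)
            used_neg.add(n_name)
    return pairs


# Toy example: two ph+ strains and three candidate ph- controls.
ph_pos = {"A": (2, 3, 1, 4, 2), "B": (5, 3, 2, 4, 1)}
ph_neg = {"X": (2, 3, 1, 4, 3), "Y": (5, 3, 2, 4, 1), "Z": (1, 1, 1, 1, 1)}
pairs = match_strains(ph_pos, ph_neg)
```

A greedy rule is simple but does not guarantee the smallest total distance across all pairs; an optimal assignment method could be substituted if that matters for a given dataset.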
Figure 3. Power of the matched convergence test to identify nucleotide sites associated with a phenotype of interest. The average genetic distance between matched strains was set to an intermediate level of s = 100 mutations. Colors represent increasing values of site effect size f.
Figure 4. Power of the matched convergence test to identify loci associated with a phenotype of interest. The average distance between matched strains was set at s = 100 mutations. Colors represent increasing values of locus effect size f.
Figure 5. Power of the matched convergence test at the locus level as a function of genetic distance (s) between matched strain pairs. Smaller s indicates closer genetic relatedness between strain pairs.
Figure 6. Phylogeny of MTB strains chosen for genotype-phenotype analysis. Dots indicate the presence of the drug resistant phenotype. The tree demonstrates the matching of strains with and without the drug resistance phenotype.
Figure 7. Distribution of SNPs/locus across the eight pairs of MTB genomes. Observed counts are represented by black bars. The dashed line represents the upper 95% confidence bound on a Poisson distribution with the observed number of mutations.
Figure 8. Phylogeny of Campylobacter strains. Branches highlighted in green lead up to the strain pairs chosen for genotype-phenotype association. Colored circles denote host specificity: red = cattle, green = chicken, purple = wild bird/non-host, orange = human.
Figure 9. Distribution of variants/locus across the eight pairs of genomes. Observed counts are represented by black bars. The dashed red line represents the upper 95% confidence bound on a Poisson distribution with the observed number of variants. Variant counts per locus for surE and Cj0294 are highlighted.
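The Poisson bound drawn in Figures 7 and 9 can be illustrated with a short calculation. The sketch below assumes, for simplicity, that the observed variants are spread evenly across loci, so the expected count per locus is the total divided by the number of loci. That even-rate assumption, the function names, and the toy counts are illustrative and are not taken from the paper, which may account for locus length or other factors.

```python
import math
from typing import Dict, List


def poisson_upper_bound(lam: float, q: float = 0.95) -> int:
    """Smallest k such that P(X <= k) >= q for X ~ Poisson(lam)."""
    k = 0
    pmf = math.exp(-lam)  # P(X = 0)
    cdf = pmf
    while cdf < q:
        k += 1
        pmf *= lam / k    # P(X = k) from P(X = k - 1)
        cdf += pmf
    return k


def flag_loci(counts: Dict[str, int], q: float = 0.95) -> List[str]:
    """Return loci whose variant count exceeds the upper Poisson bound,
    assuming (hypothetically) the same expected rate at every locus."""
    lam = sum(counts.values()) / len(counts)
    bound = poisson_upper_bound(lam, q)
    return [locus for locus, n in counts.items() if n > bound]


# Toy counts of variants per locus summed over matched pairs (made-up data).
counts = {"locusA": 0, "locusB": 1, "locusC": 7, "locusD": 0}
flagged = flag_loci(counts)
```

Loci returned by `flag_loci` carry more variants than the assumed null distribution would typically produce, which is the sense in which loci above the dashed line in the figures stand out.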