Practical Approaches for Detecting Selection in Microbial Genomes.

Abstract

Microbial genome evolution is shaped by a variety of selective pressures. Understanding how these processes occur can help to address important problems in microbiology by explaining observed differences in phenotypes, including virulence and resistance to antibiotics. Greater access to whole-genome sequencing provides microbiologists with the opportunity to perform large-scale analyses of selection in novel settings, such as within individual hosts. This tutorial aims to guide researchers through the fundamentals underpinning popular methods for measuring selection in pathogens. These methods are transferable to a wide variety of organisms, and the exercises provided are designed for researchers with any level of programming experience.

Year: 2016 PMID: 26867134 PMCID: PMC4750996 DOI: 10.1371/journal.pcbi.1004739

Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475

This is part of the PLOS Computational Biology Education collection.

Introduction

Whole-genome sequencing (WGS) of microbial samples is now affordable and fast, which has enabled its widespread use in both research and clinical practice [1-3]. Analysis of the genetic variation within WGS data can help characterize the selective pressures acting on microbial populations [4,5] and provide novel insight into infectious disease transmission [6], the emergence of antibiotic resistance [7,8], and the population dynamics of bacterial epidemics [9,10]. Selection acts on both existing and novel mutations that arise in individuals within a population by removing those mutations detrimental to the fitness of the individual and favoring those that are beneficial. This process can leave a signature across the genome sequences within the population that can reveal which regions are under functional constraint [5,11] or that are rapidly adapting to changes in the environment [12]. This tutorial aims to provide microbiologists possessing limited experience in population genetics analyses with (i) training in statistical methods for detecting selection, (ii) familiarity with the underlying theory, and (iii) an awareness of the assumptions and limitations of these methods. A wide variety of approaches are available to address many questions regarding microbial evolution, and deciding which to take will depend on numerous factors. These include the evolutionary processes acting on the sequences, level of genetic variation present within the data, and computational resources available to the researcher. Here, we provide one approach to performing a basic population genetics analysis of evolution and selection in non-recombining microbial populations and a supplementary exercise demonstrating how these methods can be applied to bacterial WGS data (S1 File). Further examples of where these methods have been employed to address a variety of evolutionary questions in microbial genomics are described in S1 Table. 
These methods are not robust to homologous recombination and are therefore applicable only when it is absent. It is also assumed that short-read sequence data have been aligned to a reference sequence and single nucleotide variants have been detected. The preceding steps in a typical bioinformatics pipeline are described in a number of recent reviews [13,14]. This guide is based on a workshop included as part of a course entitled “Genotype to Phenotype Mapping of Complex Traits” at the European Bioinformatics Institute at the Wellcome Trust Genome Campus (United Kingdom) in July 2014.

Step 1: Construction of a Phylogenetic Tree

Phylogenetic tree methods attempt to reconstruct the evolutionary relationships between a set of sampled sequences (Fig 1a and 1b). Construction of a phylogenetic tree can help to visualize the genetic relatedness between samples, infer the order of branching events, and provide one way to estimate important evolutionary parameters (such as the evolutionary rate, in Step 2). If sequences are sampled from multiple hosts, the phylogeny can also help to infer the transmission history during an epidemic [15-17]. Further details and examples of phylogenetic tree construction and interpretation can be found in several excellent resources (e.g., [18-20]).

Fig 1

Phylogenetic tree reconstruction and evolutionary rate estimation.

A phylogenetic tree comprises a collection of branches that connect sampled sequences at the tips (called taxa) with the most recent common ancestor of the sample. The point where each pair of branches join together is called a node. The lengths of these branches represent the evolutionary distance between sequences at either end, usually measured in numbers of substitutions per site, which can be calculated using the scale bar. The length of the vertical branches and rotation of branches around each node are arbitrary. The tree can be rooted using a divergent sequence (called an outgroup) (a), in which case the direction of substitutions can be inferred and each node represents the common ancestor of all descendant nodes and taxa. The node furthest from the tips is called the root. The tree can also be left unrooted and displayed radially (b) (tip labels have been omitted for visual clarity). Assuming the phylogeny has been rooted correctly, linear regression analysis can be used to test for a signal of a molecular clock by plotting the sampling time of each sequence against its evolutionary distance from the root of the tree. If the test is significant (c), the slope of the regression line (red) can provide an estimate of the evolutionary rate. The lack of any temporal signal (d) may occur if insufficient time has passed for substitutions to accumulate or if the molecular clock has been violated (for example, due to selection, recombination, or hypermutation).

Tree-building methods broadly fall into two categories. Distance-based methods use a clustering algorithm to sequentially group clusters of sequences, which makes them relatively fast. These include neighbor-joining (NJ) and unweighted pair group method with arithmetic mean (UPGMA). However, neither method explicitly models back mutations or multiple hits (successive substitutions at a single site). Character-based methods evaluate a set of plausible trees based on certain criteria.
This makes these methods slower, but information regarding the evolutionary history encoded by the characters is retained. These methods include maximum parsimony (MP), maximum likelihood (ML), and Bayesian methods. MP attempts to minimize the number of character changes across the tree. However, this can often underestimate the length of branches. ML and Bayesian methods are more popular, since they allow for specification of a probabilistic model of sequence evolution. These methods enable arbitrarily complex models of sequence evolution, but in the within-host context there may be limited data for reliable inference of highly-parameterized models, and simple models such as Jukes-Cantor may suffice. ML searches for the single tree with the greatest likelihood given the model, while Bayesian methods capture uncertainty in the tree by providing a distribution of trees that are likely given the data and explicit prior beliefs.

Many phylogenetic analyses assume that sequences have evolved independently and under a constant evolutionary rate. However, in the presence of selection, convergent evolution may occur, in which the same substitution arises on different branches, which can cause some sequences on the tree to be inferred as more closely related than they truly are.

A variety of programs are available for performing phylogenetic analyses of microbial populations. Several methods, including distance-based methods, can easily be carried out using the ape library in R. PhyML and RAxML are popular programs for ML analysis of small and larger datasets, respectively, while the Bayesian phylogenetics software BEAST is commonly employed for estimating time-calibrated trees [21-24].

Analysis of the observed number of substitutions between sequences alone is usually not sufficient to describe the underlying evolutionary process for a set of sequences.
Principled statistical inference of phylogenetic trees requires specification of a sequence substitution model, describing the base frequencies (f_i) and rate of change from allele i (rows) to allele j (columns) (r_ij) via entry Q_ij of a substitution rate matrix, Q. For example, for the general time-reversible nucleotide substitution model (GTR):

$$
Q = \begin{pmatrix}
\cdot & r_{AC} f_C & r_{AG} f_G & r_{AT} f_T \\
r_{AC} f_A & \cdot & r_{CG} f_G & r_{CT} f_T \\
r_{AG} f_A & r_{CG} f_C & \cdot & r_{GT} f_T \\
r_{AT} f_A & r_{CT} f_C & r_{GT} f_G & \cdot
\end{pmatrix}
$$

where rows and columns are ordered A, C, G, T, the rates are symmetric (r_ij = r_ji), and each diagonal entry is set so that its row sums to zero.

GTR provides a high degree of flexibility and biological complexity by allowing all rates and frequencies to vary [25]. In some cases, it may be more suitable to use the HKY85 model (e.g., to prevent over-parameterization of limited data). This model distinguishes between transitions and transversions via the transition/transversion rate ratio (κ) [26]. In the simplest case, the Jukes-Cantor (JC69) nucleotide substitution model assumes equal base frequencies and mutation rates [27]. Variation in the substitution rate across the genome can be modeled with a gamma distribution, which is often split into four discrete categories for computational efficiency [28].
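To make the relationship between these models concrete, the following minimal Python sketch builds an unscaled HKY85 rate matrix from a set of base frequencies and κ. The function names and the example frequencies are illustrative assumptions rather than part of any phylogenetics package, and the matrix is not normalized to one expected substitution per unit time, as dedicated software would usually do.

```python
BASES = "ACGT"
PURINES = {"A", "G"}


def is_transition(a, b):
    """True if a<->b is a purine<->purine or pyrimidine<->pyrimidine change."""
    return (a in PURINES) == (b in PURINES)


def hky85_rate_matrix(freqs, kappa):
    """Return the 4x4 HKY85 rate matrix Q as a nested list (rows/cols in ACGT order).

    freqs: dict mapping each base to its equilibrium frequency (should sum to 1).
    kappa: transition/transversion rate ratio.
    """
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                rate = kappa if is_transition(a, b) else 1.0
                q[i][j] = rate * freqs[b]
        q[i][i] = -sum(q[i])  # diagonal chosen so that each row sums to zero
    return q


# Equal frequencies with kappa = 1 give a JC69-like matrix (all off-diagonals equal).
equal = {b: 0.25 for b in BASES}
jc_like = hky85_rate_matrix(equal, kappa=1.0)

# Hypothetical AT-rich frequencies with transitions four times as fast as transversions.
hky = hky85_rate_matrix({"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}, kappa=4.0)
```

Setting κ to one and all frequencies to 0.25 recovers the JC69 structure, which illustrates why JC69 and HKY85 can be viewed as constrained versions of GTR.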

Step 2: Estimation of the Evolutionary Rate

The substitution or evolutionary rate parameter describes the frequency with which new mutations replace existing variants within a population (they become “fixed”). This parameter differs from the mutation rate, which describes the frequency with which mutations arise during DNA replication. The evolutionary rate can provide some indication of the adaptive potential of the population in response to environmental changes. It is often termed the “clock rate” in reference to the molecular clock hypothesis that substitutions arise regularly over time in a population [29]. The evolutionary rate is often assumed constant across all branches in the phylogenetic tree (a strict molecular clock), in which case the branch lengths are interpreted as proportional to the time that elapsed between the ancestor and descendant of each branch. Support for a strict clock can also be tested using the relative rates test, which compares the distance of each individual in a pair of taxa with a more distantly related taxon [30,31]. Otherwise, the evolutionary rate might be estimated per branch (a relaxed molecular clock [32]) to investigate differences in evolutionary rate across time or space [33,34]. If the sampling times of genome sequences are known, then the evolutionary rate can be calibrated in terms of substitutions per site per unit time. The evolutionary rate can be quickly estimated by plotting the sampling time of each isolate against the total branch distance to the root of the phylogenetic tree, provided the position of the root is accurate (Fig 1c and 1d). The date-randomization test repeatedly shuffles the sampling times across the tips to generate the rate distribution expected in the absence of any temporal signal. If the rate estimated with the correct sample times lies sufficiently outside this distribution, this is deemed as support for clock-like behavior. 
Bayesian phylogenetics approaches such as BEAST can model the evolutionary rate parameter on each branch of the tree, allowing estimation of the variation in evolutionary rate across branches and the uncertainty in parameter estimates [23,24]. Estimates of the evolutionary rate are often made under the assumption of neutral evolution. The presence of selection can distort branch lengths in the phylogenetic tree and lead to inaccurate estimates of the evolutionary rate.
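The root-to-tip regression and the date-randomization test described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a replacement for dedicated software: the function names and the sampling dates and distances are hypothetical, the root-to-tip distances are assumed to have been extracted from a correctly rooted tree beforehand, and the one-sided permutation p-value used here is only one of several ways to summarize a date-randomization test.

```python
import random


def slope_and_intercept(times, distances):
    """Ordinary least-squares fit of root-to-tip distance on sampling time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, distances))
    slope = sxy / sxx
    return slope, mean_d - slope * mean_t


def date_randomization_p(times, distances, n_perm=1000, seed=1):
    """Fraction of date-shuffled data sets whose slope is >= the observed slope."""
    rng = random.Random(seed)
    observed, _ = slope_and_intercept(times, distances)
    shuffled = list(times)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between tips and sampling dates
        if slope_and_intercept(shuffled, distances)[0] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: sampling year and root-to-tip distance (substitutions per site),
# simulated under a perfect clock of 1e-6 substitutions per site per year.
years = [2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014]
dists = [1.0e-6 * (y - 1990) for y in years]

rate, intercept = slope_and_intercept(years, dists)
p_value = date_randomization_p(years, dists)
```

Here the slope is the estimated evolutionary rate in substitutions per site per year, and a small permutation p-value indicates that the observed rate lies outside the distribution obtained when sampling dates carry no information.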

Step 3: Genome Annotation

Popular approaches to detecting selection rely on classification of substitutions according to their likely functional effect. This is discussed in Step 4, but first requires an interpretation of the genomic context in which substitutions occur, and this falls under the auspices of genome annotation. At its simplest, genome annotation involves prediction of coding sequences by identifying open reading frames (ORFs), which are regions of DNA sequence that encode a single polypeptide. However, sophisticated annotation pipelines now exist that perform a variety of functions that combine direct interpretation of the sequence with the borrowing or "lifting over" of annotations from other, better-studied reference genomes via searches for sequence similarity (homology). Annotation can be carried out using a variety of Web-based or locally installed systems (reviewed by [35]), such as XBASE [36], GeneMark [37], GLIMMER [38,39], BASys [40], RAST [41], and Prokka [42]. The accuracy of automated genome annotation is dependent on several factors, including the accuracy of reference genome databases and the pseudogene content and quality of the query genome, meaning that manual checking is often necessary [35].
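As a rough illustration of the simplest form of annotation mentioned above, the sketch below scans the three forward reading frames of a sequence for stretches that run from an ATG to the next in-frame stop codon. It is a toy under several assumptions (ATG as the only start codon, forward strand only, no scoring of coding potential, hypothetical function name and input sequence), and the dedicated annotation tools listed above do considerably more.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    min_codons is the minimum number of codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first in-frame start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None
    return orfs


toy = "CCATGAAATTTGGGTAGCC"  # hypothetical sequence containing one short ORF
orfs = find_orfs(toy)
```

The coordinates returned for each ORF define the codon positions needed to classify substitutions in Step 4.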

Step 4: Classification of Substitutions

In order to perform basic tests for selection, it is necessary to classify all substitutions. At the most basic level, this can involve distinguishing protein-altering (non-synonymous) from non-protein-altering (synonymous) substitutions in coding regions. More sophisticated classification may further distinguish protein-truncating (nonsense) and intergenic (outside a coding region) substitutions, and it may sub-classify substitutions in coding regions by the function of the gene or non-coding substitutions by the regulatory function of the region or the distance from a gene [43,44]. When classifying substitutions, it helps to reconstruct ancestral sequences at internal nodes of the tree, which is usually carried out using parsimony or a probabilistic model of sequence evolution that returns the most likely ancestral sequences [45-47]. The programs FastML and PAML use maximum likelihood to perform ancestral sequence reconstruction for nucleotide, codon, or amino acid sequences [47-50]. The simplest method of classifying amino acid substitutions is to assume no more than a single nucleotide in the triplet changes along a branch. However, a more sophisticated approach is required when multiple sites in a codon may have undergone substitution. For these reasons, ML methods have been developed for estimating the number of synonymous and non-synonymous substitutions along a branch, which also account for variation in transition rates and base frequency [51,52].
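The simplest classification scheme described above, in which at most one nucleotide per codon changes along a branch, can be sketched as follows. The function name is hypothetical, the standard genetic code is assumed, and the ancestral codon is assumed to be available from a prior ancestral sequence reconstruction.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order; '*' marks a stop codon.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitution(ancestral_codon, derived_codon):
    """Classify a change between two codons that differ at exactly one position."""
    anc, der = ancestral_codon.upper(), derived_codon.upper()
    n_diff = sum(a != b for a, b in zip(anc, der))
    if n_diff != 1:
        # Multiple changes per codon need a pathway-aware or ML-based method.
        raise ValueError("expected codons differing at exactly one nucleotide")
    aa_anc, aa_der = CODON_TABLE[anc], CODON_TABLE[der]
    if aa_anc == aa_der:
        return "synonymous"
    if aa_der == "*":
        return "nonsense"
    return "non-synonymous"


examples = [
    classify_substitution("CTG", "CTA"),  # Leu -> Leu
    classify_substitution("GAA", "GTA"),  # Glu -> Val
    classify_substitution("TGG", "TGA"),  # Trp -> stop
]
```

Codon pairs that differ at more than one position are rejected rather than guessed at, reflecting the point that such cases require the more sophisticated approaches cited above.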

Step 5: Testing for Selection

Selection can act on genetic variation in different ways. In a simple model of directional selection, a novel mutation may be favored if it confers some sort of selective advantage to the bacterium (positive selection) or it may be disfavored if the mutation is deleterious to the bacterium (purifying or negative selection). Both positive and negative selection can be measured at individual amino acids, across genes or over the entire genome. Here, we outline three approaches that can be applied to divergent microbial populations in the absence of recombination to detect selection acting on genes in the population since their most recent common ancestor. When applying these methods to clonally evolving bacteria, it’s also important to consider how the tight linkage across sites can affect estimates of selection (for reviews, see [53,54]).

A) Elevated substitution rates signal positive selection

Sites or genes are expected to mutate independently in microbial genomes within different individuals (or populations). Observing the recurrent emergence of the same substitution within different individuals is a signature of parallel or convergent evolution, most likely in response to a common selection pressure (Fig 2) [55]. For example, the selective pressure exerted on Mycobacterium tuberculosis by antimicrobial drugs during tuberculosis (TB) treatment is clearly identified by the frequent emergence of the same drug resistance point mutation within different patients [8]. Signals of positive selection may also manifest as numerous different substitutions across sites within a gene, given they are likely to have similar effects on the encoded protein [56]. The rpoB gene in Mycobacterium tuberculosis can mutate at several different sites within a “hot spot” region to confer resistance to the first-line anti-tuberculosis drug rifampicin [57].

Fig 2

Detecting selection from microbial sequence data.

The phylogeny shows the evolutionary history of 20 sequences sampled evenly from four divergent populations. dN/dS methods test for selection by comparing the rates of non-synonymous and synonymous substitution occurring between divergent lineages (i.e., only substitutions that have occurred on the black branches) with those expected under neutrality. In contrast, the McDonald-Kreitman test for selection compares the ratio of non-synonymous and synonymous polymorphisms that are present within populations (due to substitutions occurring on red branches) with the ratio of non-synonymous and synonymous fixed differences that are present between populations (due to substitutions occurring on black branches). The phylogeny can also be used to detect selection by identifying parallel evolution, whereby recurrent mutations occur at a site or across a gene during the evolutionary history of a sample (for example, substitution X on the phylogeny).

Under the null hypothesis of neutral evolution, constant mutation rates across genes, and no recombination (H0), the number of substitutions per gene is expected to follow a Poisson process. The number of substitutions expected per gene can be calculated by multiplying the per-site mutation rate and the length of the gene. Any significant increase in the substitution rate of a gene from that expected under H0 can be used as support for positive selection having acted on the gene. However, an elevated substitution rate within a gene of interest may be due to a number of other factors, including variation in the mutation rate across genes or recombination. Therefore, more commonly used methods for detecting positive selection look for a significant difference in the rate of substitutions that have a functional effect on the protein relative to those that do not.
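The Poisson calculation described above can be written as a short Python sketch. The function names and the numbers in the example are hypothetical, and the per-site rate is assumed to be the expected number of substitutions per site over the whole sampled history (for example, the genome-wide count divided by genome length).

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    below = sum(math.exp(-expected) * expected ** k / math.factorial(k)
                for k in range(observed))
    return max(0.0, 1.0 - below)


def excess_substitution_p(n_subs, gene_length, rate_per_site):
    """One-sided p-value for observing at least n_subs substitutions in a gene."""
    expected = rate_per_site * gene_length  # expected substitutions in this gene
    return poisson_upper_tail(n_subs, expected)


# Hypothetical numbers: 5 substitutions in a 1,500 bp gene when the genome-wide
# rate over the sampled history is 4e-4 substitutions per site (0.6 expected).
p = excess_substitution_p(5, 1500, 4e-4)
```

When this test is repeated for every gene in a genome, the resulting p-values would need a multiple-testing correction before any gene is reported as significant.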

B) Estimates of dN/dS

Comparison of the rate of non-synonymous substitution per non-synonymous site (dN) to the rate of synonymous substitution per synonymous site (dS) is a popular method of detecting selection between divergent populations [58,59]. Due to the redundancy of the genetic code, random mutations generate a greater number of non-synonymous than synonymous substitutions. In order to estimate dN/dS, the ratio of raw counts of non-synonymous and synonymous substitutions must be adjusted by the ratio that one would expect to see in the absence of any selection (i.e., under strict neutrality). The null hypothesis (H0) is that the ratio of non-synonymous and synonymous counts does not significantly differ from the ratio expected by chance (r). This means that when dN/dS is close to one, the sequence is inferred to be evolving strictly neutrally, in the absence of selection. Estimates >1 suggest that positive selection has acted on the sequence, while those <1 are indicative of negative selection. The estimate of dN/dS under the null hypothesis can be obtained via calculation of the codon substitution rate matrix, which describes the rate of substitution from one codon to another. The Nielsen and Yang (NY98) model of codon substitution is similar to the HKY85 model of nucleotide substitution, in that it allows both codon frequencies and the rates of transitions and transversions to vary [59]. Since there are many more codons than bases, the NY98 model is described by a (61 × 61) Q matrix (rather than the 4 × 4 matrix of a nucleotide model such as HKY85), which describes the rate of change between every pair of amino-acid-encoding codons (rather than nucleotides). The model includes the parameters ω, representing the value of dN/dS, and κ, the transition/transversion rate ratio.
Rather than drawing the entire rate matrix for the NY98 model, we can describe it for a given pair of codons i and j, as:

$$
Q_{ij} = \begin{cases}
0 & \text{if } i \text{ and } j \text{ differ at more than one nucleotide position} \\
f_j & \text{for a synonymous transversion} \\
\kappa f_j & \text{for a synonymous transition} \\
\omega f_j & \text{for a non-synonymous transversion} \\
\omega \kappa f_j & \text{for a non-synonymous transition}
\end{cases}
$$

The codon frequencies, f, are often estimated directly from the sequence data, while κ can be estimated using maximum likelihood approaches, such as those implemented in the phylogenetics software PhyML [21]. Either ω can be estimated formally and tested against the null hypothesis that it equals one under neutrality, or the expected ratio r of non-synonymous and synonymous counts can be computed under neutrality and compared to the observed counts from Step 4 to test for any signal of positive or negative selection.

However, application of dN/dS methods to microbial populations is complicated by several factors. Firstly, the test may be statistically underpowered for detecting non-neutral dN/dS per site if the number of substitutions expected at any individual position is small. Usually it is more powerful to sum substitutions across sites in the same gene to estimate a per-gene dN/dS, which can reveal whether selection has acted differently across genes. Secondly, the existence of sites subject to negative selection is highly likely in any functional protein-coding sequence, and these sites reduce the true value of dN/dS to below the value of one predicted under the strict neutrality hypothesis. The presence of sites subject to negative selection therefore reduces the statistical power to detect positive selection even when it is present. Thirdly, the dN/dS statistic assumes that differences between lineages are fixed (i.e., that lineages have been diverging for a long time), while substitutions between isolates sampled from closely related microbial populations (e.g., between hosts in an outbreak) are likely to represent segregating polymorphisms [60].
Within-population microbial variation has often arisen relatively recently, and because of this evolutionary time lag, selection may not yet have had time to purge deleterious mutations and fix beneficial mutations. Therefore, patterns of polymorphism are expected to appear more neutral (dN/dS closer to one) than patterns of fixation. Over time, slightly deleterious non-synonymous substitutions are purged from the population, so estimates of dN/dS tend to decrease as sampled microbial lineages diverge from their most recent common ancestor [60]. The McDonald-Kreitman test, described in the next section, takes advantage of this phenomenon by comparing the divergence between lineages with the polymorphism within them, giving it greater power to detect selection [61].
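The idea of adjusting raw counts by the ratio r expected under neutrality can be illustrated with a simple counting approximation. The sketch below is not the maximum likelihood NY98 procedure: it assumes that all nucleotide changes are equally likely (κ = 1), ignores codon frequencies and multiple hits, counts changes to a stop codon as non-synonymous, and uses hypothetical function names and input data. It also assumes the coding sequence is in frame, excludes the terminal stop codon, and has at least one synonymous substitution.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order; '*' marks a stop codon.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon):
    """Number of synonymous sites (0 to 3) in a codon, all mutations equally likely."""
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    total += 1.0 / 3.0  # one of three possible changes at this position
    return total


def dn_ds_from_counts(coding_seq, n_nonsyn, n_syn):
    """Return (dN/dS, r) from observed counts and the site composition of a gene."""
    codons = [coding_seq[i:i + 3].upper() for i in range(0, len(coding_seq) - 2, 3)]
    s_sites = sum(synonymous_sites(c) for c in codons)
    n_sites = 3 * len(codons) - s_sites
    expected_ratio = n_sites / s_sites  # r: neutral ratio of non-synonymous to synonymous counts
    return (n_nonsyn / n_syn) / expected_ratio, expected_ratio


# Hypothetical three-codon sequence with 2 non-synonymous and 1 synonymous substitution.
omega, r = dn_ds_from_counts("CTGGGGTTT", n_nonsyn=2, n_syn=1)
```

In this toy example r is greater than one, reflecting the statement above that random mutations generate more non-synonymous than synonymous changes, so a raw count ratio of 2 corresponds to a dN/dS below one.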

C) The McDonald-Kreitman test

The McDonald-Kreitman (MK) test detects non-neutral evolution by comparing the ratio of non-synonymous to synonymous polymorphisms within a species (Pn/Ps) to the ratio of non-synonymous to synonymous fixed differences between species (Dn/Ds) (Fig 2) [61]. It compares the ratios of raw counts without directly calculating a dN/dS ratio. Although it is often applied to test for selection within species, it can also be applied to sub-populations (e.g., comparing within and between host rates of substitution). The test is set up with a two-way contingency table (Table 1).

Table 1

Two-way contingency table used in the McDonald-Kreitman test.

	Fixed differences	Polymorphisms
Synonymous mutations	D_s	P_s
Non-synonymous mutations	D_n	P_n

Dn/Ds > Pn/Ps indicates an excess of non-synonymous changes among the fixed differences distinguishing the two groups, thus implying positive selection. Dn/Ds < Pn/Ps represents a paucity of non-synonymous fixed differences between groups, indicating their removal by purifying selection. The proportion of non-synonymous substitutions driven by positive selection (α) can be calculated for each gene individually, or a genome-wide estimate of α can be obtained by averaging these count data across genes [62]. The MK test is robust to variation in the mutation rate and evolutionary histories across sites in the genome [63]. However, the presence of mildly deleterious mutations that are not immediately purged from the population increases Pn/Ps and reduces estimates of α, leading to loss of power to detect positive selection. Extensions of the MK test attempt to remove the effect of mildly deleterious mutations by excluding polymorphisms segregating at low frequencies from the analysis [64,65].
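A minimal Python sketch of the MK test on the counts in Table 1 is given below. It uses a two-sided Fisher's exact test as one reasonable choice for a 2 × 2 table (other tests of independence could be substituted) and the commonly used estimator α = 1 − (Ds × Pn)/(Dn × Ps). The function names and the example counts are illustrative assumptions, and the estimator is undefined when Dn or Ps is zero.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1)
                        if prob(x) <= p_obs * (1 + 1e-9)))


def mk_test(d_n, d_s, p_n, p_s):
    """Return (alpha, p_value) for a McDonald-Kreitman table."""
    alpha = 1.0 - (d_s * p_n) / (d_n * p_s)
    # Rows follow Table 1: synonymous (D_s, P_s), then non-synonymous (D_n, P_n).
    p_value = fisher_exact_two_sided(d_s, p_s, d_n, p_n)
    return alpha, p_value


# Illustrative counts with an excess of non-synonymous fixed differences.
alpha, p_value = mk_test(d_n=7, d_s=17, p_n=2, p_s=42)
```

With these counts Dn/Ds exceeds Pn/Ps, so α is positive and the small p-value indicates a departure from neutrality in the direction of positive selection.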

Conclusions

This tutorial has demonstrated how basic population genetics methods can be applied to microbial WGS data to learn about their evolutionary history and the selective pressures acting on them. The methods presented here and in the accompanying exercise (S1 File) have not attempted to address analysis of selection in recombining bacteria. In analyses that rely on estimation of phylogenetic trees, homologous recombination and horizontal gene transfer risk causing false detection of positive selection [66-68]. Several methods are available for detecting such processes (for reviews see [69-71]), while new methods developed specifically for application to whole bacterial genomes are also now available [72-74].

Exercise: Practical approaches for detecting within-host selection in Burkholderia dolosa.

Compressed file containing all material for the exercise, including the description of the exercises and input data files. (ZIP)

Microbial genomics applied.

A selection of published analyses employing the methods described in Steps 1–5 to address a range of evolutionary questions across different microbial species. (PDF)

66 in total

1. A fast algorithm for joint reconstruction of ancestral amino acid sequences.

Authors: T Pupko; I Pe'er; R Shamir; D Graur
Journal: Mol Biol Evol Date: 2000-06 Impact factor: 16.240

2. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models.

Authors: Z Yang; R Nielsen
Journal: Mol Biol Evol Date: 2000-01 Impact factor: 16.240

3. Positive and negative selection on the human genome.

Authors: J C Fay; G J Wyckoff; C I Wu
Journal: Genetics Date: 2001-07 Impact factor: 4.562

Review 4. Recombination in evolutionary genomics.

Authors: David Posada; Keith A Crandall; Edward C Holmes
Journal: Annu Rev Genet Date: 2002-06-11 Impact factor: 16.830

Review 5. The evolutionary genomics of pathogen recombination.

Authors: Philip Awadalla
Journal: Nat Rev Genet Date: 2003-01 Impact factor: 53.242

6. Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites.

Authors: Maria Anisimova; Rasmus Nielsen; Ziheng Yang
Journal: Genetics Date: 2003-07 Impact factor: 4.562

7. Potential impact of recombination on sitewise approaches for detecting positive natural selection.

Authors: Daniel Shriner; David C Nickle; Mark A Jensen; James I Mullins
Journal: Genet Res Date: 2003-04 Impact factor: 1.588

8. Phylogeny for the faint of heart: a tutorial.

Authors: Sandra L Baldauf
Journal: Trends Genet Date: 2003-06 Impact factor: 11.639

9. Improved microbial gene identification with GLIMMER.

Authors: A L Delcher; D Harmon; S Kasif; O White; S L Salzberg
Journal: Nucleic Acids Res Date: 1999-12-01 Impact factor: 16.971

10. Adaptive protein evolution in Drosophila.

Authors: Nick G C Smith; Adam Eyre-Walker
Journal: Nature Date: 2002-02-28 Impact factor: 49.962
