Literature DB >> 23169001

d(N)/d(S) does not show positive selection drives separation of polar-tropical SAR11 populations.

Haiwei Luo, Austin L Hughes.   

Abstract

Entities:  

Mesh:

Year:  2012        PMID: 23169001      PMCID: PMC3531906          DOI: 10.1038/msb.2012.58

Source DB:  PubMed          Journal:  Mol Syst Biol        ISSN: 1744-4292            Impact factor:   11.429


× No keyword cloud information.
Large-scale sequencing of environmental microorganisms (including genomes, metagenomes, and metatranscriptomes) now allows researchers to apply sequence-based evolutionary genetics approaches to microbial ecology questions, including studying how microbes adapt to a diversity of natural environments. There is, however, a danger for misapplication and misinterpretation if the caveats and limitations of these evolutionary methods are not fully appreciated. Here, we highlight an example of such problems in an evolutionary analysis of potential positive selection within strains of SAR11 marine bacteria (Brown et al, 2012). We feel that these issues deserve wider recognition, particularly as evolutionary approaches become more widespread among environmental microbiologists. In the vast majority of genes, the ratio (ω) of non-synonymous substitutions per non-synonymous site (d) to synonymous substitutions per synonymous site (d) is <1, which is evidence of purifying selection, acting to remove deleterious mutations (most of which are non-synonymous). If ω is >1, then this may be indicative of positive selection favoring amino-acid replacements (Hughes and Nei, 1988, Hughes, 1999). If ω is not statistically different from 1, then there is evidence for neither purifying nor positive selection. Positive selection is not the only factor that can cause ω to be >1; since ω is a ratio, an unusually high ω value may indicate either an unusually high d or an unusually low d. For example, purifying selection on synonymous mutations can lead to a low d and thus high ω (Parmley and Hurst, 2007). Approaches for calculating d and d can be classified into two categories, based on whether the underlying nucleotide evolutionary model takes into account the base/codon frequency bias (Supplementary Table S1; Zhang et al, 2006). For instance, the Yang and Nielsen method (YN00) considers the base/codon frequency bias (Yang and Nielsen, 2000), whereas the Nei and Gojobori method (NG86) does not (Nei and Gojobori, 1986). Taking base/codon frequency into account can be important when nucleotide content biases are sufficiently strong to reduce the observed number of synonymous substitutions. In a recent publication, Brown et al (2012) calculated ω using the NG86 method over full gene alignments as well as over 60-codon sliding windows for orthologs shared by two SAR11 genomes, IMCC9063 from polar waters and HIMB114 from tropical waters. They identified 116 genes that had at least one window showing ω>1, and claimed that these genes are under positive selection (hereafter, 116 PS genes). They conducted a similar analysis with two other polar-tropical pairs of genomes, ACE_P3.2 (polar) versus HIMB114 (96 PS genes), and ACE_P1a.1 (polar) versus HTCC7211 (tropical) (50 PS genes). However, there are problems with the interpretation of ω in these analyses. Here, we present in-depth analyses of ω calculation for orthologous genes shared by IMCC9063 and HIMB114. We did not include the other two genome pairs of Brown et al (2012) because the ACE_P3.2 and ACE_P1a.1 sequences were derived from metagenomes, which could be subject to assembly errors. However, similar average nucleotide identity (ANI) between all pairs (74%, 70%, and 76%) suggests that conclusions from the IMCC9063/HIMB114 analysis will also be relevant to these pairs. The 1083 orthologous genes shared by IMCC9063 and HIMB114 (Supplementary Table S2) were aligned using the same approach as in Brown et al (2012). The median of G+C content at the third codon position (%GC3) is 0.20 among all orthologous genes and 0.19 among the 116 PS genes, suggesting a strong bias in base/codon usage. Using PAML software (Yang, 1997), the NG86 method (which was used by Brown et al and does not consider base/codon frequency bias) and YN00 method (which considers base/codon frequency bias) gave similar estimates of d (median=0.29 and 0.27, respectively), but very different estimates of d (median=1.47 and 3.64, respectively) (Supplementary Table S3). We hypothesized that the failure to account for base/codon frequency bias in genomes with evidence for strong bias causes the NG86 approach to underestimate d in these data sets. To test this hypothesis, we employed several other methods in common use for calculating ω (implemented in the KaKs_Calculator software of Zhang et al, 2006). Methods that assume equal base frequency (hereafter, ‘equal frequency methods'; these include NG86, LWL85, LPB93, MLWL04, and MLPB04) consistently produced greater estimates of ω and lower estimates of d than those which take into account the base/codon frequency bias (hereafter, ‘frequency bias methods'; these include YN00, GY94, MYN06, MS06, and MA06; these 10 methods are all referred in Zhang et al, 2006), but comparable results for d. For simplicity, only comparison of MS06 with NG86 was shown (Figure 1), but the results of other frequency bias methods were similar. The best-fit model for ω calculation among the 14 candidate DNA sequence evolutionary models (Supplementary Table S1; Zhang et al, 2006) was always one of the latter models (Supplementary Table S4) using corrected Akaike information criterion (AICC).
Figure 1

Comparison of the MS06 method and the NG86 method in estimating (A) d and d ratio (ω), (B) d, (C) d of 851 orthologous genes shared by IMCC9063 and HIMB114 with measurable ω. MS06 is a maximum likelihood method implementing automatic model selection based on AICC. The selected best-fit models for these data sets are always those taking into account the nucleotide/codon frequency bias. NG86 is an approximate method without considering the nucleotide/codon frequency bias; it is used by Brown et al (2012). Values of d » 1.0 and calculated ω using these d are probably meaningless, because synonymous sites are saturated with substitutions.

Thus, equal frequency methods do not provide reliable estimates of d and ω for base/codon frequency biased genomes, such as IMCC9063 and HIMB114. Further, because synonymous sites are saturated with substitutions when d » 1.0 (Ochman et al, 1999), the finding that the d of most orthologous genes was ≥3 using the frequency bias methods (Figure 1B) suggests that even these methods cannot accurately estimate ω at the level of genetic divergence of these genome pairs. Because stochastic errors can be substantial in a short sequence, the calculation of ω over 60-codon sliding windows by Brown et al (2012) was particularly problematic. Using this approach, d is significantly correlated with %GC3 in 76% of all genes and 70% of the 116 PS genes (Supplementary Table S3). Among the significantly correlated genes, 92% are positively correlated (Supplementary Table S3). We identified 107 genes, which contain at least one window of 60 codons showing ω>1 using the NG86 method employed by Brown et al (2012), including 92 that were found in the 116 PS genes. For all 2951 windows with ω>1 (Supplementary Table S5), the Spearman rank correlation between d and %GC3 was 0.506 (P<0.001). The phenomenon of the correlation between d and %GC3 suggests that the small sliding window analysis is not appropriate; G+C content is a significant determinant of the calculated d and thus affects ω for these windows. This is true whatever factors (including stochastic error, mutational bias, or natural selection) cause variation in G+C content. A final concern in the approach of Brown et al (2012) is that the statistical significance of cases in which ω>1 was not determined. A Fisher's exact test (implemented in the KaKs_Calculator) showed that only one ortholog pair (HIMB114_0556 and SAR11G3_00373) contained at least one 60-codon window showing ω>1 with statistical significance (Supplementary Table S6). Here, ω was estimated by NG86 using KaKs_Calculator, which is slightly different from that estimated by NG86 using PAML. This ortholog pair contains five such windows, all of which have extremely low %GC3, ranging from 0.08 to 0.11, and low d compared with other windows in the same gene, yet d is only slightly greater than the median, suggesting that the low G+C content reduced the number of observed synonymous substitutions in these windows, resulting in ω>1. In addition, this orthologous gene encodes a putative porphobilinogen deaminase; there is no evidence showing that it is involved in cold adaptation. In conclusion, the finding of Brown et al (2012) that positive selection drives biogeographic separation of polar and tropical phylotypes of SAR11 populations is based on an evolutionary model that does not account for the low G+C/high codon bias of the genomes being analyzed. Our analysis of evolutionary statistics using optimized evolutionary models does not support the contention that 116 genes shared by polar and tropical SAR11 populations have diverged as a result of positive selection.
  8 in total

1.  Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models.

Authors:  Z Yang; R Nielsen
Journal:  Mol Biol Evol       Date:  2000-01       Impact factor: 16.240

2.  Calibrating bacterial evolution.

Authors:  H Ochman; S Elwyn; N A Moran
Journal:  Proc Natl Acad Sci U S A       Date:  1999-10-26       Impact factor: 11.205

3.  Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions.

Authors:  M Nei; T Gojobori
Journal:  Mol Biol Evol       Date:  1986-09       Impact factor: 16.240

4.  PAML: a program package for phylogenetic analysis by maximum likelihood.

Authors:  Z Yang
Journal:  Comput Appl Biosci       Date:  1997-10

5.  Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection.

Authors:  A L Hughes; M Nei
Journal:  Nature       Date:  1988-09-08       Impact factor: 49.962

6.  How common are intragene windows with KA > KS owing to purifying selection on synonymous mutations?

Authors:  Joanna L Parmley; Laurence D Hurst
Journal:  J Mol Evol       Date:  2007-06-07       Impact factor: 2.395

7.  Global biogeography of SAR11 marine bacteria.

Authors:  Mark V Brown; Federico M Lauro; Matthew Z DeMaere; Les Muir; David Wilkins; Torsten Thomas; Martin J Riddle; Jed A Fuhrman; Cynthia Andrews-Pfannkoch; Jeffrey M Hoffman; Jeffrey B McQuaid; Andrew Allen; Stephen R Rintoul; Ricardo Cavicchioli
Journal:  Mol Syst Biol       Date:  2012-07-17       Impact factor: 11.429

8.  KaKs_Calculator: calculating Ka and Ks through model selection and model averaging.

Authors:  Zhang Zhang; Jun Li; Xiao-Qian Zhao; Jun Wang; Gane Ka-Shu Wong; Jun Yu
Journal:  Genomics Proteomics Bioinformatics       Date:  2006-11       Impact factor: 7.691

  8 in total
  5 in total

Review 1.  Evolutionary ecology of the marine Roseobacter clade.

Authors:  Haiwei Luo; Mary Ann Moran
Journal:  Microbiol Mol Biol Rev       Date:  2014-12       Impact factor: 11.056

2.  Evolutionary analysis of a streamlined lineage of surface ocean Roseobacters.

Authors:  Haiwei Luo; Brandon K Swan; Ramunas Stepanauskas; Austin L Hughes; Mary Ann Moran
Journal:  ISME J       Date:  2014-01-23       Impact factor: 10.302

3.  Homologous Recombination in Core Genomes Facilitates Marine Bacterial Adaptation.

Authors:  Ying Sun; Haiwei Luo
Journal:  Appl Environ Microbiol       Date:  2018-05-17       Impact factor: 4.792

4.  Genetic and phenotypic intra-species variation in Candida albicans.

Authors:  Matthew P Hirakawa; Diego A Martinez; Sharadha Sakthikumar; Matthew Z Anderson; Aaron Berlin; Sharvari Gujja; Qiandong Zeng; Ethan Zisson; Joshua M Wang; Joshua M Greenberg; Judith Berman; Richard J Bennett; Christina A Cuomo
Journal:  Genome Res       Date:  2014-12-11       Impact factor: 9.043

5.  The use of evolutionary approaches to understand single cell genomes.

Authors:  Haiwei Luo
Journal:  Front Microbiol       Date:  2015-03-10       Impact factor: 5.640

  5 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.