Chiara Pontremoli1, Diego Forni1, Rachele Cagliani1, Uberto Pozzoli1, Mario Clerici2,3, Manuela Sironi1. 1. Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842 Bosisio Parini, Italy. 2. Department of Physiopathology and Transplantation, University of Milan, 20090 Milan, Italy. 3. Don C. Gnocchi Foundation ONLUS, IRCCS, 20148 Milan, Italy.
Abstract
Telomeres protect the ends of eukaryotic chromosomes and are essential for cell viability. In mammals, telomere dynamics vary with life history traits (e.g. body mass and longevity), suggesting differential selection depending on physiological characteristics. Telomeres, in analogy to centromeric regions, also represent candidate meiotic drivers and subtelomeric DNA evolves rapidly. We analyzed the evolutionary history of mammalian genes implicated in telomere homeostasis (TEL genes). We detected widespread positive selection and we tested two alternative hypotheses: (i) fast evolution is driven by changes in life history traits; (ii) a conflict with selfish DNA elements at the female meiosis represents the underlying selective pressure. By accounting for the phylogenetic relationships among mammalian species, we show that life history traits do not contribute to shape diversity of TEL genes. Conversely, the evolutionary rate of TEL genes correlates with expression levels during meiosis and episodes of positive selection across mammalian species are associated with karyotype features (number of chromosome arms). We thus propose a telomere drive hypothesis, whereby (sub)telomeres and telomere-binding proteins are engaged in an intra-genomic conflict similar to the one described for centromeres.
Telomeres protect the ends of eukaryotic chromosomes and are essential for cell viability. In mammals, telomere dynamics vary with life history traits (e.g. body mass and longevity), suggesting differential selection depending on physiological characteristics. Telomeres, in analogy to centromeric regions, also represent candidate meiotic drivers and subtelomeric DNA evolves rapidly. We analyzed the evolutionary history of mammalian genes implicated in telomere homeostasis (TEL genes). We detected widespread positive selection and we tested two alternative hypotheses: (i) fast evolution is driven by changes in life history traits; (ii) a conflict with selfish DNA elements at the female meiosis represents the underlying selective pressure. By accounting for the phylogenetic relationships among mammalian species, we show that life history traits do not contribute to shape diversity of TEL genes. Conversely, the evolutionary rate of TEL genes correlates with expression levels during meiosis and episodes of positive selection across mammalian species are associated with karyotype features (number of chromosome arms). We thus propose a telomere drive hypothesis, whereby (sub)telomeres and telomere-binding proteins are engaged in an intra-genomic conflict similar to the one described for centromeres.
Telomeres, the protein/DNA complexes that preserve the ends of eukaryotic chromosomes, are essential for genomic stability and cell viability. Telomeres have two main functions: to protect the chromosome terminus from nuclease and DNA repair activities and to provide a mechanism of compensation for the inability of global genome DNA polymerases to replicate the 5′ end of a linear chromosome (1).Mammalian telomeres consist of hundreds to thousands of tandem repeats of the sequence 5′-TTAGGG-3′. The majority of these repeats are double-stranded, but the very end of each chromosome has a single-stranded G-rich 3′ overhang, which is important for telomerase-driven telomere extension (2). Mammalian telomerase is a ribonucleoprotein, with a catalytic core formed by the TERT protein component and by the TERC RNA component. In the holoenzyme, telomerase associates with several proteins, which contribute to the regulation, maturation, assembly, and localization of the telomerase complex (Figure 1).
Figure 1.
Mammalian telomeric complex. The TEL proteins we analyzed are shown to provide a general overview of their function. Gene products are colour-coded. INM = inner nuclear membrane.
Mammalian telomeric complex. The TEL proteins we analyzed are shown to provide a general overview of their function. Gene products are colour-coded. INM = inner nuclear membrane.Telomere function is also critically dependent on a complex of telomere-associated proteins known as shelterin, whose major function is to protect the chromosome ends (Figure 1).Telomere homeostasis is essential for cell viability, as telomere dysfunction leads to senescence, apoptosis and malignant transformation. In fact, mutations in genes encoding proteins required for telomere elongation, repair, and maintenance are associated with humangenetic disorders such as diskeratosis congenita, one of the primary complex telomere biology disorders (TBDs) (3). Notably, genome instability is often a hallmark of TBDs (3).In most human somatic cells, telomerase activity is undetectable and telomere length is progressively shortened during cell replication (4). As a result, somatic cells undergo replicative senescence, which plays an important role in suppressing tumorigenesis (5). Telomerase repression in somatic tissues is not an universal phenomenon among mammals, and several small-sized species, which are characterized by a short lifespan, express telomerase in somatic cells (6). In a study performed on >60 mammalian species, telomerase expression was found to co-evolve with body mass, whereas telomere length inversely correlated with lifespan (which is, in turn, strongly correlated with mass) (6). A commonly accepted explanation for these observations is that replicative senescence evolved to mitigate the increased cancer risk conferred by a large number of cells (large body mass) and long lifespans (5). Because lifespan is related to several life-history traits such as extrinsic mortality risk and reproductive strategies (5,7), telomere dynamics were suggested to be subject to natural selection depending on the ecological characteristics of different mammalian species (7). This clearly raises the possibility that genes involved in telomere elongation, maintenance, and repair evolved at different rates in distinct mammals and that natural selection shaped their diversity across species.Recent evidence indicated that the evolution of genes involved in the maintenance of telomere integrity was dominated by positive selection in Drosophila (8). Because Drosophila telomeres are not composed of 5′-TTAGGG-3′ repeats but, rather, they are generated by insertion of specific retrotransposons at the chromosome termini, the rapid evolution of telomere-integrity proteins may be driven by the need to avoid retrotrasposon over-proliferation (8). Alternatively, an intra-genomic conflict with telomeric selfish elements that distort meiotic segregation in females may explain the fast evolution of Drosophila proteins involved in telomere homeostasis (8). The latter possibility is reminiscent of the proposed evolutionary scenario for kinetochore proteins, which are engaged in a conflict with selfish centromeric DNA (centromere drive hypothesis) (9–12).Telomeres play an essential role in meiosis: at the early stages of meiosis I all the telomeres in the cell attach to the internal nuclear membrane (INM) and move to the centrosome in a ‘bouquet’ configuration. Bouquet formation is important to ensure homolog pairing and recombination (13). In mammals, three specialized telomere-associated proteins specifically expressed at meiosis (TERB1, TERB2 and MAJIN) determine attachment to the INM (14) (Figure 1).Although mammalian telomeres are invariably composed of 5′-TTAGGG-3′ repeats, subtelomeric DNA evolves rapidly (15), suggesting that proteins involved in telomere maintenance and elongation may take part in some form of intra-genomic conflict.Herein, we analyzed the evolutionary history of mammalian genes implicated in telomere protection, elongation and maintenance (TEL genes). We found pervasive positive selection at TEL genes and we used different approaches to determine the most likely underlying selective pressures.
MATERIALS AND METHODS
Gene selection
TEL genes were selected on the basis of the following GO terms (as of 1 June 2016): ‘Telomerase holoenzyme complex’ (GO:0005697) and ‘Telomere cap complex’ (GO: 0000782) filtering with ‘direct assay’ as evidence and ‘Homo sapiens’ as organism. We added RTEL1 and PARN as these genes were associated to telomere dysfunction and humangenetic diseases (3) (Figure 1). Mammalian homologs of humanTEL genes were included only if they represented one-to-one orthologs, as reported in the EnsemblCompara GeneTrees (16). The majority of HNRNPCmammalian orthologs represented one-to-many or many-to-many orthologs: for this reason HNRNPC was not considered. For the same reason, sequences from rodents and lagomorpha were not included for POT1, as these animals have two Pot1 paralogues (Pot1a and Pot1b) (17,18). The final list included 32 TEL genes (Supplementary Table S1). To obtain a list of ‘control’ (CTR) genes, for each TEL gene we randomly selected one human gene with a GC content and coding sequence length differing less than 5% from those of the humanTEL gene. Mammalian homologs of CTR genes were included only if they represented one-to-one orthologs (Supplementary Table S1).As a comparison, genes involved in centromere function (CEN genes) were selected from a recent review (19). In particular, we included CENPA and genes coding for proteins involved in the correct localization and stabilization of CENPA at centromeric nucleosomes.For all genes, coding sequence information for at least 53 mammalian species (Supplementary Tables S1 and S2) were retrieved from the NCBI database (http://www.ncbi.nlm.nih.gov/), from the Ensembl website (http://www.ensembl.org/index.html), and from the UCSC server (http://genome.ucsc.edu/).
Evolutionary analysis in mammals
Coding sequence alignments were generated using the RevTrans 2.0 utility (http://www.cbs.dtu.dk/services/RevTrans/, MAFFT v6.240 as an aligner) (20), a software which uses the protein sequence alignment as a scaffold for constructing the corresponding DNA multiple alignment. This latter was checked and edited using TrimAl (automated1 mode) (21); manual editing was used to correct a few misalignments in proximity of small gaps (http://phylemon.bioinfo.cipf.es/utilities.html).To evaluate the level of substitution saturation, we applied the Xia's index implemented in DAMBE (22) to all gene alignments. No statistical evidence of substitution saturation was detected for any alignment (Supplementary Table S3).All alignments were also screened for the presence of recombination using GARD (Genetic Algorithm Recombination Detection) (23). Evidence of recombination was detected only for WRAP53, HJURP and EHD4, whereas no breakpoints were detected for all the remaining genes. The coding alignments of WRAP53, HJURP and EHD4 were split on the basis of the recombination breakpoints and sub-regions were used as the input for molecular evolution analyses.The average nonsynonymous substitution(dN)/synonymous substitution (dS) rate and the dN–dS parameter were calculated using the single-likelihood ancestor counting (SLAC) method (24).Gene trees were generated using the phyML program with gamma-distributed rates, four substitution rate categories, and estimation of transition/transversion ratio and proportion of invariable sites (25).Evidence of positive selection was searched for using the codon-based codeml program implemented in the PAML (Phylogenetic Analysis by Maximum Likelihood) suite (26). We applied different site (NSsite) models. In particular, M7 is a null model that assumes that 0 < dN/dS < 1 and is beta distributed among sites; M8 is a positive selection model: it is the same as M7 but also includes a category of sites with dN/dS > 1; M8a is the same as M8, except that it does not allow positive selection, but only neutral evolution. To assess statistical significance, twice the difference of the likelihood (ΔlnL) for the models (M7 vs M8 and M8a vs M8) is compared to a χ2 distribution (2 degrees of freedom for the M7 versus M8 comparison, 1 degree of freedom for M8a versus M8). LRT tests were run with both F3 × 4 and F61 codon frequency models, with an initial value of ω = 0.4 (Supplementary Tables S4–S6). To further investigate whether substitution saturation was responsible for the high fraction of TEL and CEN positively selected genes, we used the PAML Free Ratio (FR) model to estimate dS for all branches of the gene/region phylogenies (27). A minority of branches had dS > 0.25 and really few had dS > 0.5 (Supplementary Tables S4 and S5). Only two gene phylogenies (TERT and DKC1) had branches with dS > 1. Analyses were thus re-run after removing 1 branch both for DKC1 and for TERT. Very similar results to those with the full dataset were obtained (Supplementary Table S4). For positively selected genes, results were validated using a different initial value for ω (ω = 1) (not shown).Positively selected sites were identified using the BEB (Bayes Empirical Bayes) method (with a posterior probability ≥ 0.90) (28), the Random effects likelihood (REL) (24) and the Fast Unconstrained Bayesian AppRoximation (FUBAR) (29) methods from the HyPhy package (30). Sites were declared as positively selected by REL or FUBAR if they showed a Bayes Factor ≥ 50 or a posterior probability ≥ 0.90, respectively. Finally, only sites identified by at least two methods were considered as positively selected.The PAML Free Ratio (FR) model was also used to estimate the value of dN/dS on the branches of the phylogenies of all gene sets (27). The FR model assumes different dN/dS for each lineage and can be compared with a null model with one dN/dS for the entire phylogeny. Statistical significance is assessed by comparing twice the ΔlnL of the two models with a χ2 distribution with degrees of freedom equal to the difference in model parameters.The branch model Clade model C (CmC) (31) was used to identify codons under divergent selection pressure among different clades in the phylogeny. In the CmC model, three classes of sites are assumed in the alignment: the first two represent evolutionary conserved and neutral evolving codons, while the third class represents codons under different selective pressures in the different clades selected a priori. This model is then compared against a null model proposed by Weadick and Chang that does not allow variation among lineages (32).
TEL genes mutations
The list of TEL genes mutations was obtained from the Human Gene Mutation Database (HGMD, http://www.hgmd.cf.ac.uk/ac/, last accessed December 2017) and from the Online Mendelian Inheritance in Man (OMIM, https://www.omim.org/, last accessed December 2017) database.
Phylogenetic regression analysis
Life-history traits were retrieved for all mammalian species of the AnAge database (n = 933). In particular, we focused on information about adult weight as a proxy of body mass and maximum recorded lifespan as a proxy for longevity. We also retrieved information about male and female sexual maturity (expressed in days) and interbirth intervals (called inter litter). A mammal supertree was retrieved from Bininda-Emonds et al. (33), and then pruned based on the AnAge species.We performed a continuous regression analysis within a maximum-likelihood PGLS (Phylogenetic Generalized Least Squares) framework with the BayesTrait software (34) using log10 transformed body mass and longevity values. BayesTrait incorporates the phylogenetic signal into the regression model by taking into account the shared ancestry specified by the phylogenetic tree. As expected, a positive and significant (P < 0.001) correlation between body mass and longevity was obtained (6).The regression coefficient and the intercept were used to plot the regression line, and the life-history traits (LHT) for the species analyzed in the positive selection analyses were then plotted as points. We assigned these species to a specific category if they fell inside, over, or under the plot region defined by the regression line plus its standard errors. These three categories were then used to create the groups for the CmC test.We applied the Coevol software (35) to evaluate the correlation of molecular variables with life history traits. This software models the correlated evolution of these traits by assuming a multivariate Brownian diffusion process. All parameters in the model are estimated in a Bayesian framework and the relationship between variables is returned as a posterior probability of covariation. We run Coevol with all LHTs simultaneously along with dS and dN/dS, and we analyzed the covariation among these parameters after controlling for the effect of body mass and longevity. All life-history trait values were log10-transformed before running the analyses.
Correlation with meiotic gene expression
Gene expression changes (fold-change) during female and male mouse meiosis were retrieved from previous works (36,37). The correlation between dN/dS and fold-changes was evaluated using Kendall's correlation. Kendall's correlation is a non-parametric test based on ranks. Ties were not present in any variable (either dN/dS or fold changes) for any dataset (TEL and CTR genes). Under these conditions, the power of Kendall's correlation depends uniquely on the sample size. Thus, irrespective of the expression level and range of fold changes we had equal power to detect correlation for the TEL and CTR datasets.
Comparison of the selective pattern of CEN and TEL genes
To analyze the pattern of selection at CEN and TEL genes and to compare it to that of CTR genes, we calculated pairwise dN/dS correlations for a set of internal branches and for all tip branches across the phylogenies. In particular, Kendall's correlation coefficient (tau) was calculated for all possible gene pairs. Next, we evaluated tau distribution among different gene sets: for instance, the TEL-TEL distribution was obtained by calculating tau values for all possible pairwise combinations of TEL genes, whereas the TEL+CEN - CTR distribution was obtained by calculating tau for all possible combinations of a TEL or CEN gene with a CTR gene. Distributions were compared by applying Kolmogorov-Smirnov tests.
Correlation with karyotype features
Information on karyotype features was retrieved from literature sources (38–43) and was available for 60 of the mammalian species that also had molecular evolutionary data. For these taxa, the median dN/dS for TEL, CEN and CTR genes with significant FR test was calculated and correlated with different karyotype features (i.e. aFN, diploid chromosome count, percentage of acrocentric/telocentric chromosomes, percentage of metacentric chromosomes). To assess the role of phylogenetic inertia in driving the correlation between dN/dS and aFN, we generated a concatenated alignment of TEL genes (with significant FR test) for the 60 species with available karyotype information. The alignment and aFN data were used as the input for Coevol, with body mass and longevity entered as covariates.
Evolutionary analysis in ciliates and yeasts
Tetrahymena genes that encode proteins involved in telomere elongation or maintenance were identified by literature searches (44–50). Tetrahymena thermophila coding sequences were derived from the NCBI database (http://www.ncbi.nlm.nih.gov/) or from the Tetrahymena genome database (http://ciliate.org/index.php/home/) (51). Orthologs in the Tetrahymena malaccensis, Tetrahymena elliotti and Tetrahymena borealis genomes were identified via the BLAST utility at the Tetrahymena genome database website. Genes with no orthologs in one or more Tetrahymena species were discarded. For 11 TEL genes, one single ortholog with high similarity to the T. thermophila sequence was identified in the other species. To obtain an equal number of control genes, we randomly selected one T. thermophila gene with the same coding sequence length and having orthologs in the three other Tetrahymena species. Coding sequences were aligned as described for mammalian sequences. Positive selection was tested by applying two methods, M7/M8 contrast, as described above (the more conservative M8a/M8 test was not performed as its power is expected to be very limited when four sequences only are included) and BUSTED (Branch-site Unrestricted Statistical Test for Episodic Diversification) (52). BUSTED allows ω to vary from branch to branch and it tests whether a gene has experienced positive selection at at least one site on at least one branch. Genes that showed significant evidences of positive selection (with either method) were analyzed for the presence of recombination using GARD. No evidence of recombination was detected.Yeast genes involved in telomere homeostasis were retrieved through Gene Ontology annotations for Saccharomyces cerevisiae (GO:0005697 and GO: 0000782). Yeast genes that encode meiosis-specific telomere-associated proteins and proteins involved in centromere specification were identified via literature searches (19,53). Evidence of positive selection (M7/M8 contrast) was derived from a previous genome-wide analysis (54).
RESULTS
TEL genes evolve at different rates depending on function
We analyzed the evolutionary history of 32 TEL genes and a matched list of CTR genes (see Methods for gene selection procedures) (Figure 1 and Supplementary Table S1) in a large phylogeny of placental mammals. Because they represent an epitome of fast-evolving essential genes that encode repetitive DNA-binding proteins, ten CEN genes were also included (Supplementary Table S1).We first calculated the average non-synonymous substitution/synonymous substitution rate (dN/dS, also referred to as ω). Comparison of CTR and TEL genes indicated that the latter have significantly higher average dN/dS (Wilcoxon Rank Sum Test for paired samples, two tailed, P = 0.025). In turn, average dN/dS was significantly higher for CEN genes than for TEL genes (Wilcoxon Rank Sum Test, two tailed, P = 0.037) (Supplementary Table S1).To gain further insight into the evolutionary pattern of TEL genes, a codon-wise measure of natural selection was obtained by calculating the dN–dS parameter (24). This metric was preferred over the conventional dN/dS because it is not rendered to infinite for dS values equal to 0. TEL genes were divided into six groups, based on their function (Figures 1, 2A and Supplementary Table S1). The distribution of dN–dS values was significantly different across gene groups (Kruskal–Wallis rank sum test, P < 10−15). CEN genes and meiosis-specific TEL genes had significantly higher dN–dS compared to CTR genes (Nemenyi post-hoc test, P < 10−15 for both comparisons) (Figure 2A). The same applied to genes encoding shelterin proteins (Nemenyi post-hoc test, P = 8.6 × 10−10), although fewer codons had dN–dS > 0 (Figure 2A). Conversely, genes belonging to other functional categories displayed a definitely higher proportions of constrained sites (dN–dS < 0) and the median dN–dS was either significantly lower than that of CTR genes (CTS complex, hTERC interaction, telomere regulation, Nemenyi post-hoc test, P < 10−8 for all tests) or not significantly different (Telomerase holoenzyme complex, Nemenyi post-hoc test, P = 0.92). Thus, TEL genes evolve at different rates, depending on their function, and they are generally more constrained than CEN loci, partially explaining their lower average dN/dS (Figure 2A).
Figure 2.
Distribution of dN–dS values. (A) Violin plots of dN–dS values calculated for all TEL, CEN and CTR genes. TEL genes are grouped based on their function, as in Figure 1. The dotted line represents a value dN–dS = 0 (neutral evolution). (B) Boxplot representation of dN–dS: TEL disease gene codons carrying missense mutations responsible for TBDs are compared with codons where no missense mutation has been described.
Distribution of dN–dS values. (A) Violin plots of dN–dS values calculated for all TEL, CEN and CTR genes. TEL genes are grouped based on their function, as in Figure 1. The dotted line represents a value dN–dS = 0 (neutral evolution). (B) Boxplot representation of dN–dS: TEL disease gene codons carrying missense mutations responsible for TBDs are compared with codons where no missense mutation has been described.Clearly, a major selective constraint acting on protein coding genes is related to the disease state or loss of fitness that amino acid replacements can determine by altering protein function. Among TEL genes, 13 have been associated with humanTBDs. Analysis of dN–dS indicated that disease genes are generally more constrained than the other TEL genes (Wilcoxon Rank Sum Test, two-tailed, P < 10−16) and, within these genes, sites where causative missense mutations were identified (n = 151) have lower dN–dS than sites where mutation have never been described (Wilcoxon Rank Sum test, two tailed, P = 0.00051) (Figure 2B).An important contribution of negative selection is expected for genes that play essential cellular functions. To formally test whether positive selection also drove the evolution of TEL genes in placental mammals, we applied likelihood ratio tests (LRT) implemented in the PAML suite (26,55). Null site models (M7 and M8a) were rejected in favor of the positive selection model (M8) for 18 genes. As a comparison, evidence of positive selection was detected for 9 CEN genes and for 8 CTR genes out of 32 (Supplementary Tables S4-S6). The proportion of CTR genes showing evidence of selection is consistent with a previous genome-wide analysis of 29 mammals that, using less stringent criteria than those we applied, reported ∼34% of genes with at least one positively selected codon (56).These results indicate that the proportion of TEL genes targeted by positive selection is significantly higher than that of CTR genes (Fisher's Exact Test, P = 0.021).In positively selected genes, the fraction of sites that were called as positively selected (see methods) was 1.00% for TEL, 1.47% for CEN and 0.57% for CTR (Figure 3, Supplementary Figure S1, Supplementary Tables S4 and S5). Thus, both TEL and CEN genes display a significantly higher fraction of positively selected sites compared to CTR genes (TEL: Fisher's Exact Test, P = 0.0025; CEN: Fisher's Exact Test, P = 9 × 10−7).
Figure 3.
Localization of positively selected sites. Domain representation of a subset of positively selected TEL proteins; positively selected sites are indicated in red. Colour-codes are as in Figure 1.
Localization of positively selected sites. Domain representation of a subset of positively selected TEL proteins; positively selected sites are indicated in red. Colour-codes are as in Figure 1.Analysis of selected sites in TEL proteins indicated that they often fall within DNA- and RNA-interacting domains. This is the case of sites in TERF1, TERF2 and SMG6 (Figure 3 and Supplementary Figure S1). In CTC1, STN1 and TEP1, several selected sites map within the OB-fold domains or within the TROVE domain (Supplementary Figure S1), which are responsible for the binding to ssDNA and RNA, respectively. Sites in telomeric DNA-binding domains were also detected in TERB1 (MYB domain) and MAJIN (Figure 3).
Life history traits do not explain positive selection at TEL genes
As mentioned above, telomerase activity and telomere length correlate with body mass and longevity in mammals (6). We thus tested whether the signatures of positive selection at TEL genes were driven by changes in these life history traits. To this aim, we applied a Phylogenetic Generalized Least Squares (PGLS) framework, which accounts for the shared ancestry among species in a phylogeny, to analyze mass and longevity (6). We retrieved body mass and longevity data for all the mammalian species present in the AnAge database (http://genomics.senescence.info/species/) and we calculated the regression line by incorporating the phylogenetic signal (Figure 4). We then defined whether each of the species analyzed for TEL genes deviated from this correlation (plus confidence intervals): we thus generated three categories of species based on the mass∼lifespan expectation (increased, unchanged, or decreased longevity based on body mass) (Figure 4).
Figure 4.
Correlation plot between body mass and longevity. A regression line (plus confidence intervals) between maximum lifespan and adult weight (log10-transformed) for 933 mammalian species is plotted in red. Species analyzed in this work are shown with black dots.
Correlation plot between body mass and longevity. A regression line (plus confidence intervals) between maximum lifespan and adult weight (log10-transformed) for 933 mammalian species is plotted in red. Species analyzed in this work are shown with black dots.We next applied the codeml clade models, which were developed to test for variation in selection pressure among ecologically divergent species (32). Specifically, these models allow site-specific divergence in selective constraint among clades of a phylogeny. We analyzed the different selective patterns in all the TEL genes found under positive selection (n = 18) in the three categories defined by the PGLS method. Results showed that, although there were some significant differences in dN/dS among groups, no group had a class of codons evolving with dN/dS higher than 1 for any gene (Supplementary Table S7).We next analyzed additional life history traits, namely age at female maturity, age at male maturity, and inter litter interval. Analyses were performed using Coevol, a Bayesian method that corrects for phylogenetic inertia (35). Body mass and longevity were used as covariates. Out of eighteen analyzed genes, only ACD showed a significant correlation of dN/dS with age at female and male maturity (Supplementary Table S8).Overall, these data suggest that positive selection at TEL genes is not mainly related to changes in life-history traits.
Evolutionary rates of TEL genes correlate with expression during female meiosis
Because meiosis-specific TEL genes displayed high average dN–dS values and were found to be positively selected, we investigated the relationship between evolutionary rates and TEL gene meiotic expression. Indeed, besides TERB1, TERB2 and MAJIN, several other TEL genes are expressed during gametogenesis.We thus used genome-wide RNA-seq data for fetal mouse ovaries to retrieve information on the expression of TEL and CTR genes before and during meiosis (36). Specifically, we obtained expression level changes (fold-change) for the leptotene (E14.5) and pachytene (E16.5) stages compared to a pre-meiotic (E12.5) stage. These values were correlated to average dN/dS. For TEL genes, a positive correlation was obtained for both the leptotene and the pachytene stages (Figure 5A). Positive and significant (or almost significant) correlations were also observed when TERB1, TERB2 and MAJIN were excluded from the analysis (leptotene stage, Kendall's tau = 0.25, P = 0.06; pachytene stage, Kendall's tau = 0.27, P = 0.045). No significant correlation was observed between dN/dS and increased meiotic expression for CTR genes (leptotene stage, Kendall's tau = 0.076, P = 0.55; pachytene stage, Kendall's tau = 0.12, P = 0.34).
Figure 5.
Evolutionary rates and expression in meiosis. Average dN/dS for all TEL genes is plotted against the log2 fold-change (FC) of gene expression in the leptotene or pachytene stages versus the pre-meiotic stage of mouse oogenesis (A) or spermatogenesis (B). Dots are colored based on the gene function (see Figure 1). Kendall's correlation coefficients are also reported.
Evolutionary rates and expression in meiosis. Average dN/dS for all TEL genes is plotted against the log2 fold-change (FC) of gene expression in the leptotene or pachytene stages versus the pre-meiotic stage of mouse oogenesis (A) or spermatogenesis (B). Dots are colored based on the gene function (see Figure 1). Kendall's correlation coefficients are also reported.To assess whether the same pattern was observed at the male meiosis, we retrieved expression changes of TEL and CTR genes during different stages of mouse male meiosis compared to pre-meiotic stages (6 days post partum, dpp). In particular, time periods that roughly correspond to the leptotene/zygotene stage (10 dpp) and pachytene stage (14 dpp) were analyzed (37). No correlation between average dN/dS and expression changes was observed for either TEL (Figure 5B) or CTR genes (leptotene/zygotene stage, Kendall's tau = –0.018, P = 0.89; pachytene stage, Kendall's tau = 0.032, P = 0.81).Overall, these data indicate that TEL genes that are up-regulated in female meiosis evolve faster than TEL genes which show no or limited expression changes. Conversely, TEL genes that are highly expressed during male gametogenesis have similar evolutionary rates as those showing low expression.
Similar selective patterns for CEN and TEL genes across the mammalian phylogeny
Because a high proportion of both TEL and CEN genes showed fast evolutionary rates, we investigated their selection pattern across the whole phylogeny of placental mammals. To this aim, we applied the free ratio (FR) model implemented in the PAML software (27). This model estimates a value of dN/dS for each lineage in the phylogeny and is compared with a null model that estimates a single dN/dS for all lineages. Results indicated that the FR model fitted the data better than the null model for 17 TEL and for 9 CEN genes (Supplementary Table S9). Thus, for these genes, the selective pressure has been acting differently across the phylogeny, with some branches showing dN/dS values higher than 1 (Supplementary Table S9). Twenty CTR genes also had a significant FR test (Supplementary Table S9).To formally test whether the 17 TEL and 9 CEN genes showed a similar selective pattern and whether such pattern is different from that of CTR genes, we calculated pairwise dN/dS correlations for tip branches and a sub-set of internal branches for gene pairs (Figure 6A). In particular, Kendall's correlation coefficients (tau) were calculated. Results indicated no difference among the distribution of tau for TEL-TEL and CEN-CEN gene pairs (Kolmogorov–Smirnov test, P = 0.84). These distributions were also not different from that obtained for TEL-CEN gene pairs (Kolmogorov-Smirnov test, P > 0.60 for TEL-TEL versus CEN-TEL and for CEN-CEN versus CEN-TEL). CTR-CTR gene pairs also had a similar distribution. This is not surprising, though, as features such as body mass and longevity influence dN/dS across mammalian species (57). However, the distribution obtained by considering TEL and CEN genes as a single group (TEL+CEN in Figure 6B) was significantly different from that calculated for TEL+CEN - CTR gene pairs (Kolmogorov-Smirnov test, P = 0.009) (Figure 6B), suggesting that TEL and CEN genes show similar selection patterns and that such patterns are different from those of CTR genes.
Figure 6.
Selective patterns across the mammalian phylogeny. (A) Phylogenetic trees are drawn to summarize the selection patterns of TEL and CEN genes. Colored branches were analyzed for genes showing a significant free ratio model. Branch thickness is proportional to the number of genes showing a dN/dS > 1. Branches are colored based on the mammalian orders as shown in the legend. Internal branches that were analyzed but do not define orders are in red. (B) Distributions of tau correlation coefficients. The distributions of pairwise dN/dS correlation coefficients calculated for gene pairs from different groups are reported, as described in the text. (C) Relationship between selection strength and karyotype. Correlation plot between species median dN/dS calculated for all genes selected in the free ratio model and the fundamental number of those species. Species are colored as in panel (A).
Selective patterns across the mammalian phylogeny. (A) Phylogenetic trees are drawn to summarize the selection patterns of TEL and CEN genes. Colored branches were analyzed for genes showing a significant free ratio model. Branch thickness is proportional to the number of genes showing a dN/dS > 1. Branches are colored based on the mammalian orders as shown in the legend. Internal branches that were analyzed but do not define orders are in red. (B) Distributions of tau correlation coefficients. The distributions of pairwise dN/dS correlation coefficients calculated for gene pairs from different groups are reported, as described in the text. (C) Relationship between selection strength and karyotype. Correlation plot between species median dN/dS calculated for all genes selected in the free ratio model and the fundamental number of those species. Species are colored as in panel (A).We next overlaid the signals of selection (proportion of genes showing dN/dS > 1) over the phylogenetic tree to obtain a glimpse of whether positive selection acted on specific lineages. Most of the branches leading to superoders/orders showed at least one gene with dN/dS > 1, both for TEL and for CEN genes (Figure 6A). In particular, the Xenarthra/Afrotheria and the Boroeutheria branches showed a relatively high number of CEN and TEL selected genes. The same observation applies to the branches leading to Ferungulata and primates. As for tip branches, selection appeared strong in equids and macaques. In general, weak selection signals were detected in rodents, especially for TEL genes (Figure 6A).
The strength of selection at TEL genes correlates with karyotype features
Mammalian karyotypes evolve rapidly, with the number of chromosomes/chromosome arms (and therefore of telomeres and centromeres) changing widely among species (58). We thus reasoned that karyotype features may correlate with the rate of evolution at TEL and/or CEN genes.We thus compared the karyotype fundamental number for autosomes (aFN, a measure of the number of chromosome arms) in different mammalian species with the strength of selective pressure. In particular, for all taxa shown in Figure 6A, we calculated the median dN/dS across all genes showing a significant free ratio model. For those same taxa we retrieved aFN information from literature sources (60 species had available information) (38–43). We found a positive significant correlation (Kendall's tau = 0.23, P = 0.012) between aFN and dN/dS for TEL genes (Figure 6C). No significant correlation was found when CEN and CTR genes were analyzed (CEN: Kendall's tau = 0.07, P = 0.40; CTR: Kendall's tau = –0.081, P = 0.37). Also, no significant correlation for either TEL, CEN or CTR genes was detected with the diploid number of chromosomes, the percentage of acrocentrics/telocentrics, or the percentage of metacentrics (not shown). Although the correlation between aFN and median dN/dS does not seem to be due to clade/order-specific effects (Figure 6C), we used Coevol on a concatenated alignment of TEL genes to correct for phylogenetic inertia (35). A correlation coefficient of 0.195 was obtained with good statistical support (posterior probability = 0.83).Thus, the overall rate of evolution is faster for TEL genes in species showing a high number of chromosome arms (i.e. a higher number of telomeres).
TEL and CEN genes in eukaryotes with diverse meiotic programs
If the selective pressure responsible for TEL and CEN gene evolution mainly derives from their roles during gametogenesis, eukaryotes with different meiotic programs may display distinct selection patterns, as previously noted for CENPA in ciliates and yeasts (59,60).Telomere biology has been deeply investigated in Tetrahymena species (ciliates that only carry out asymmetric meiosis) and several proteins that associate with the telomerase and/or promote telomere maintenance have been characterized (44–50). We exploited this information and the availability of four fully sequenced Tetrahymena genomes to search for orthologs of 18 Tetrahymena thermophila genes that encode proteins involved in telomere biology. For 11 genes, orthologs were identified in the three other Tetrahymena genomes (T. borealis, T. malaccensis and T. elliotti). An equal number of control genes were also selected (see methods).Positive selection tests have little power when only few sequences are analyzed, but they are not prone to false positives (61). We however used two methods, that are based on different assumptions of dN/dS variation, to test for positive selection: the M7/M8 LRT and BUSTED. Both methods detected positive selection at four TEL genes (three in common) (Table 1). In contrast, none of the control genes showed evidence of positive selection with either the M7/M8 LRT or with BUSTED (Supplementary Table S10). Overall, these data suggest that positive selection is relatively common at TEL genes in Tetrahymena.
Table 1.
Likelihood ratio test statistics for models of variable selective pressure among sites for Tetrahymena TEL genes
Systematic gene name
Standard gene name
Description
-2ΔlnLa
M7 vs M8 P valueb
BUSTED P value
References
TTHERM_00658760
TAP19 (p19)
Telomerase- (or TERT-) associated protein of 19 kDa
9.20 × 10−5
0.999
1
(48)
TTHERM_00429730
TTP80 (p80)
Telomerase- (or TERT-) associated protein of 80 kDa
3.09
0.214
0.12
(50)
TTHERM_00218760
Teb1 (p82)
TEB1 (TElomerase DNA Binding subunit 1; also known as Telomerase- (or TERT-) associated protein of 82 kDa)
7.79
0.020
0.84
(48)
TTHERM_00985160
TTP95 (p95)
Telomerase- (or TERT-) associated protein of 95 kDa
8.73
0.013
0.001
(50)
TTHERM_00013120
PAT1
Pot1 associated in Tetrahymena 1
0.001
0.999
0.68
(47,91,92)
TTHERM_000378989
POT1
Protection of telomeres 1
1.71
0.426
1
(47,49)
TTHERM_00726370
RLP1
Replication factor-A-like protein 1
7.23
0.027
0.016
(46)
TTHERM_00459400
RLP2
RPA-Like Protein 2
2.35
0.309
0.31
(45)
TTHERM_00106890
RFA1
Homolog of budding yeast RFA1 and mouse RPA1
10.86
0.004
0.0001
(48)
TTHERM_00439320
Teb3
TElomerase DNA Binding subunit 3
3.41
0.182
0.005
(93)
TTHERM_000523058
Tpt1
TPP1/Tpz1 in Tetrahymena thermophila, POT1 associated protein
4.46
0.107
0.076
(47,91,92)
aTwice the difference of likelihood for the two models compared.
b
P value of rejecting the neutral model (M7) in favor of the positive selection model (M8).
Likelihood ratio test statistics for models of variable selective pressure among sites for TetrahymenaTEL genesaTwice the difference of likelihood for the two models compared.b
P value of rejecting the neutral model (M7) in favor of the positive selection model (M8).Finally, we analyzed TEL and CEN genes in yeasts, organisms with symmetric meiosis only. In particular, we used data from a genome-wide analysis of positive selection across five Saccharomyces sensu stricto species (54). We retrieved 19 genes involved in telomere integrity and elongation from Gene Ontology classifications for Saccharomyces cerevisiae (see methods and Supplementary Table S11): 17 of these were included in the genome-wide analysis and none showed signals of positive selection (based on the M7/M8 LRT). The same applied to three genes that encode telomere-associated proteins specifically expressed in meiosis (Ndj1, Csm4, and Mps3) (53) (Supplementary Table S11). As for CEN genes, positive selection was not detected for the functional homolog of mammalianCENPA (encoded by Cse4), in agreement with previous observations (60). However, Scm3, encoding the functional homolog of the HJURP chaperone (19), showed statistically significant evidence of positive selection (M7/M8 LRT, P value = 0.0081) (Supplementary Table S11).
DISCUSSION
Herein we show that TEL genes evolve at different rates depending on their function and we detect a significant correlation between the evolutionary rates of TEL genes and their up-regulation during mouse female (but not male) meiosis. No evidence that selection at TEL genes is driven by life history traits was obtained. Conversely, episodes of positive selection across mammalian species were found to be associated with karyotype features.These data may be consistent with a telomere drive hypothesis, whereby telomeres and telomere-binding proteins are engaged in an intra-genomic conflict similar to the one described for centromeres. The centromere drive hypothesis posits that selfish centromeric DNA elements promote their preferential inclusion in the oocyte through the recruitment of kinetochore components (9–12,62). Because the effects of centromere drive are deleterious during the symmetric male gametogenesis, CEN genes are thought to be under selective pressure to evolve mutations that restore meiotic balance (9–12,62). However, as noted elsewhere (59), deleterious effects of meiotic drive may also occur at the female meiosis itself. Indeed, our results on germline expression suggest that in the case of TEL genes the whole conflict occurs at oogenesis. This is also supported by the observation that TEL genes evolve under positive selection in eukaryotes with asymmetric meiosis only, but not in those with symmetric meiosis alone. In this respect, we note that Tetrahymena species differ from most eukaryotes as these ciliates carry out programmed somatic chromosome breakage and de novo telomere addition (63). We cannot therefore conclude that the selective forces acting on Terahymena TEL genes are only related to meiotic drive. We also mention that the detection of positive selection at the yeastScm3 gene is not necessarily in contrast with the centromere-drive hypothesis. For instance, the 2 μm plasmid of S. cerevisiae recruits Cse4 to its partitioning locus STB to stably propagate itself (64). Because the association of Cse4 with STB is dependent on Scm3 (65), the latter may be involved in a genetic conflict with extrachromosomal selfish DNA elements (9,12).Clearly, a major difference between telomeres and centromeres is that centromeric DNA evolves rapidly whereas mammalian telomeres are invariably composed of 5′-TTAGGG-3′ repeats. The centromere drive hypothesis indicates that positive selection at CENPA (and other CEN proteins) affects its centromeric DNA binding preferences, thus suppressing centromere drive (9,62). However, this model was recently questioned by the observation that selected sites in the DrosophilaCENPA are unlikely to affect DNA binding specificity, but rather modulate its association with the CAL1 chaperone (66). Thus, positive selection at CENPA was proposed to modulate the efficiency of its centromeric deposition by CAL1 (66). Whatever the underlying mechanism, the role of centromeric sequences was recently supported by the observation that expanded satellite repeats act as selfish elements at mouse meiosis by incorporating more CENPA containing nucleosomes (67).Telomeric DNA is invariant in sequence but, in analogy to centromeric/pericentromeric chromosome regions, subtelomeric DNA is a preferential site for segmental duplications (68). Subtelomeres are enriched of subtelomeric repeat elements (SREs) and large variant alleles, mainly composed of SREs, have been described for many human chromosomes (15,68). Analysis of primate genomes indicated rapid evolution of subtelomeric regions, with intrachromosomal recombination playing an important role in this process (15). Notably, large blocks of subterminal heterochromatin are common in chimpanzee and gorilla chromosomes but are absent in human chromosomes. These blocks, which are composed primarily of specific sets of segmental duplications and subterminal satellites (69,70), were suggested to prolong the bouquet stage during meiotic prophase in chimpanzee (71).In humans, growing evidence also suggests that, at least in somatic cells, different chromosome arms display distinct telomere dynamics and that subtelomere-specific and haplotype-specific effects influence telomere length, indicating the presence of cis-acting regulators of telomere dynamics (72–76). Although such regulators are presently uncharacterized, they are likely to reside in the highly variable subtelomeric regions and may act as selfish elements by promoting their preferential segregation during female meiosis.Bouquet formation at meiosis is highly conserved across eukaryotes and it is necessary to assist homolog pairing, to resolve entanglements between non-homologs, and to ensure recombination (13). The recombination events are in turn crucial to ensure proper segregation of homologous chromosomes during the first meiotic metaphase. In mammals, shelterin is released during prophase and the TERB1/2-MAJIN complex establishes a direct link between telomeric DNA and the inner nuclear membrane (14). Mice lacking any of the three proteins of the complex display no overt somatic phenotype but are infertile due to meiotic arrest (14). Chromosome movements and bouquet formation are impaired in these mutants (14).Clearly, it remains to be clarified whether and how bouquet formation or other features associated to meiotic telomeres can be exploited by selfish DNA elements and how this may have negative effects on oogenesis. One possibility is that differences in telomere length (determined by subtelomeric sequences) influence the directionality of chromosome movements along the nuclear envelope, and thus affect chromosome positioning at later stages, eventually skewing segregation. Altered chromosome topology at the bouquet stage may in turn increase the probability of segregation errors for other homolog pairs (Figure 7A).
Figure 7.
Possible models of telomere drive. (A) Via binding to TEL proteins, a subtelomeric selfish DNA element influences the directionality of chromosome movements and thus affects chromosome topology at the bouquet stage. This, in turn, modifies chromosome positioning at later stages (not shown), skews segregation in favor of the selfish element, and results in increased probability of segregation errors for other homolog pairs (orange pair). Mutations in TEL proteins that reinstate correct chromosome movements are positively selected so that bouquet topology and segregation balance are restored. (B) A selfish DNA element recruits an excess of a rate-limiting protein component, thus curtailing the availability of this same component for other chromosomes. Protein binding avidity favors the transmission of the selfish elements but increases segregation errors for other homolog pairs. Mutations in TEL proteins that restore equal binding are selected for and segregation balance is re-established.
Possible models of telomere drive. (A) Via binding to TEL proteins, a subtelomeric selfish DNA element influences the directionality of chromosome movements and thus affects chromosome topology at the bouquet stage. This, in turn, modifies chromosome positioning at later stages (not shown), skews segregation in favor of the selfish element, and results in increased probability of segregation errors for other homolog pairs (orange pair). Mutations in TEL proteins that reinstate correct chromosome movements are positively selected so that bouquet topology and segregation balance are restored. (B) A selfish DNA element recruits an excess of a rate-limiting protein component, thus curtailing the availability of this same component for other chromosomes. Protein binding avidity favors the transmission of the selfish elements but increases segregation errors for other homolog pairs. Mutations in TEL proteins that restore equal binding are selected for and segregation balance is re-established.Alternatively, telomere length/subtelomere sequences may influence remodeling during bouquet formation and alter the rigidity of the telomeric region or modify the chromosome's biophysical properties. Indeed, recent evidence suggest that TERB1 recruits cohesin to telomeres, resulting in rigid structures reminiscent of centromeres (77), and two of the TERB1 selected sites we detected are located within the cohesin-binding region. In this scenario, selfish elements may hoard cohesin (or some rate-limiting protein component) away from other homologs (Figure 7B). Interestingly, decreased cohesion not only at centromeric regions, but also along distal chromosome arms is thought to contribute to cross-over terminalization, bivalent hypestretching, and eventually mis-segregation during aged mouse oogenesis (78).Another possible mechanism is that telomeres directly interact with spindle components. In fact, evidence from mouse oocytes indicates that telomere dysfunction leads to disruption of the meiotic spindle, misalignment of metaphase chromosomes, and a high incidence of lagging chromosomes (79).A recent study in Drosophila indicated that extreme telomere-length variations do not cause meiotic drive, and identified a weak driver linked to a centromere region (80). It should however be noted that Drosophila telomeres are not maintained by telomerase and fruitfly lineages have extremely variable and labile telomere lengths (80). Some aspects of telomere biology may thus totally differ from organisms that canonically rely on telomerase. Indeed, whereas crosses of Drosophila lines with short and long telomeres were fertile, experiments in mice indicated that large differences in telomere lengths at homologous chromosomes compromise meiotic pairing, synapsis, and recombination, especially at the female meiosis (81).In contrast to the Drosophila study, evidence of transmission ratio distortion at a telomeric (and at a centromeric) region was previously obtained in chicken (82). Interestingly, these animals, like most other birds, have extremely long telomeres (83).Finally, we mention that selfish elements may affect telomere features different from length. Like centromeres, telomeres and subtelomeres are assembled into heterochromatin domains (84). Epigenetic features may thus act as selfish elements, again a situation reminiscent of centromeric regions, which are mainly specified epigenetically (85).Comparison of TEL and CEN genes indicated that their evolutionary patterns across the mammalian phylogeny are similar and differ from those of control genes. This suggests that features directly (e.g. frequency of polyandry) or indirectly (e.g. climatic changes) connected to the ecology of meiotic drive (86) have played a role in shaping the diversity of these loci over evolutionary times. In fact, limited studies of simple organisms have indicated that, irrespective of their nature and molecular mechanisms, interaction between drivers and environmental factors can modulate the fitness of carrying populations/species (86). For TEL genes, however, we found that evolutionary rates are also related to the number of chromosome arms. A very simplistic interpretation of this finding is that more chromosome arms and, therefore more telomeres, in a karyotype imply a higher possible number of selfish DNA elements and stronger selection for driver suppressors. Another possibility is that the correlation we observed is accounted for by variations in recombination rate (which in turn correlates with the number of chromosomes/chromosome arms in mammals (87,88)). Finally, the correlation between chromosome arms and TEL gene evolutionary rates may depend on chromosome average size. Because genome size is relatively constant in mammals, a high number of chromosome arms implies that the karyotype is dominated by small chromosomes (88). At least in humans, small chromosomes also tend to display the highest probability of meiotic segregation errors at oogenesis, especially in aged women (89). Finally, we note that mammalian karyotypes evolve rapidly, partially because of meiotic drive that favors or disfavors the inclusion of Robertsonian fusions versus metacentrics in the oocyte (40,85,90). The correlation between chromosome arms and evolutionary rates at TEL genes may thus reflect more complex dynamics.Our study provides evidence that mammalian genes involved in telomere homeostasis are common targets of positive selection, which might ensue from an intra-genomic conflict akin to that described for centromeres and centromere-associated proteins. Centromeres and telomeres can potentially influence chromosome movement during meiosis and thus represent excellent candidates as meiotic drivers. These findings raise the possibility that telomere dynamics influence karyotype evolution and eventually speciation in mammals.
DATA AVAILABILITY
Coding sequence information were retrieved from the NCBI database (http://www.ncbi.nlm.nih.gov/), from the Ensemble website (http://www.ensembl.org/index.html), and from UCSC server (http://genome.ucsc.edu/), which provide access to biomedical and genomic information.The list of TEL genes mutations was obtained from the Human Gene Mutation Database (HGMD, http://www.hgmd.cf.ac.uk/ac/, last accessed December 2017) and from the Online Mendelian Inheritance in Man (OMIM, https://www.omim.org/, last accessed December 2017) database.Life-history traits were retrieved for all mammalian species of the AnAge database (http://genomics.senescence.info/species/).Click here for additional data file.
Authors: Ben Murrell; Sasha Moola; Amandla Mabona; Thomas Weighill; Daniel Sheward; Sergei L Kosakovsky Pond; Konrad Scheffler Journal: Mol Biol Evol Date: 2013-02-18 Impact factor: 16.240
Authors: Bethan Britt-Compton; Jan Rowson; Matthew Locke; Ian Mackenzie; David Kipling; Duncan M Baird Journal: Hum Mol Genet Date: 2006-01-18 Impact factor: 6.150
Authors: Kerstin Lindblad-Toh; Manuel Garber; Or Zuk; Michael F Lin; Brian J Parker; Stefan Washietl; Pouya Kheradpour; Jason Ernst; Gregory Jordan; Evan Mauceli; Lucas D Ward; Craig B Lowe; Alisha K Holloway; Michele Clamp; Sante Gnerre; Jessica Alföldi; Kathryn Beal; Jean Chang; Hiram Clawson; James Cuff; Federica Di Palma; Stephen Fitzgerald; Paul Flicek; Mitchell Guttman; Melissa J Hubisz; David B Jaffe; Irwin Jungreis; W James Kent; Dennis Kostka; Marcia Lara; Andre L Martins; Tim Massingham; Ida Moltke; Brian J Raney; Matthew D Rasmussen; Jim Robinson; Alexander Stark; Albert J Vilella; Jiayu Wen; Xiaohui Xie; Michael C Zody; Jen Baldwin; Toby Bloom; Chee Whye Chin; Dave Heiman; Robert Nicol; Chad Nusbaum; Sarah Young; Jane Wilkinson; Kim C Worley; Christie L Kovar; Donna M Muzny; Richard A Gibbs; Andrew Cree; Huyen H Dihn; Gerald Fowler; Shalili Jhangiani; Vandita Joshi; Sandra Lee; Lora R Lewis; Lynne V Nazareth; Geoffrey Okwuonu; Jireh Santibanez; Wesley C Warren; Elaine R Mardis; George M Weinstock; Richard K Wilson; Kim Delehaunty; David Dooling; Catrina Fronik; Lucinda Fulton; Bob Fulton; Tina Graves; Patrick Minx; Erica Sodergren; Ewan Birney; Elliott H Margulies; Javier Herrero; Eric D Green; David Haussler; Adam Siepel; Nick Goldman; Katherine S Pollard; Jakob S Pedersen; Eric S Lander; Manolis Kellis Journal: Nature Date: 2011-10-12 Impact factor: 49.962
Authors: Nicholas A Stover; Ravinder S Punia; Michael S Bowen; Steven B Dolins; Theodore G Clark Journal: Database (Oxford) Date: 2012-03-20 Impact factor: 3.451