Jiaqi Wu, Takahiro Yonezawa, Hirohisa Kishino.
Abstract
It is unknown what determines genetic diversity and how genetic diversity is associated with various biological traits. In this work, we provide insight into these issues. By comparing genetic variation of 14,671 mammalian gene trees with thousands of individual human, chimpanzee, gorilla, mouse, and dog/wolf genomes, we found that intraspecific genetic diversity can be predicted by long-term molecular evolutionary rates rather than de novo mutation rates. This relationship was established during the early stage of mammalian evolution. Moreover, we developed a method to detect fluctuations of species-specific selection on genes based on the deviations of intraspecific genetic diversity predicted from long-term rates. We showed that the evolution of epithelial cells, rather than connective tissue, mainly contributed to morphological evolution of different species. For humans, evolution of the immune system and selective sweeps caused by infectious diseases are the most representative examples of adaptive evolution.
Keywords: gene effect; gene-specific molecular evolutionary rates; genetic diversity; human-specific evolution; locus effect; long-term molecular evolutionary rates; species-specific evolution
Year: 2022 PMID: 35456514 PMCID: PMC9031814 DOI: 10.3390/genes13040708
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.141
Figure 1. Effect of long-term evolutionary rates and recombination on shaping genetic diversity. π0 represents the level of intrinsic genetic diversity. Selection on a genomic locus reduces the genetic diversity at that locus, and hitchhiking propagates the effect of selection to the surrounding region. We described the selective effect on a gene in terms of its long-term molecular evolutionary rate (see Materials and Methods). Parameter α describes the impact of recombination relative to the long-term rate in the model: in genomic regions with higher α values, recombination contributed more to the prediction of genetic diversity than in regions with lower α values. β describes the impact of the long-term rates of nearby genes: in genomic regions with higher β values, the long-term rates of genes contributed more to genetic diversity than in regions with lower β values. The three parameters were estimated for each chromosome and for each feature (3′-UTR, 5′-UTR, exon, or intron). In total, we obtained 88 estimates (22 chromosomes × 4 features) each for π0, α, and β. (a–c) Pie charts for π0, α, and β, respectively, representing the contributions of chromosome, feature, and residuals. The variances of π0, α, and β were decomposed by two-way ANOVA into among-chromosome variance, among-feature variance, and residual variance, the last representing the interaction between the two factors. (d–f) Among-chromosome variation of π0, α, and β, respectively, for each feature (3′-UTR, 5′-UTR, exon, or intron). (g) Hierarchical clustering of π0, α, and β for each chromosome and feature. The values of π0, α, and β were normalized so that the three parameters could be analyzed together.
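The variance decomposition behind panels (a–c) can be illustrated with a minimal sketch. It assumes a two-way layout without replication (one estimate per chromosome-feature cell, as in the 22 × 4 = 88 estimates), so that the residual term absorbs the chromosome-by-feature interaction. The function name `two_way_decomposition` and the toy values are hypothetical; they are not the authors' code or data.

```python
from statistics import mean

def two_way_decomposition(table):
    """Split the total sum of squares of a rows x columns table
    (one value per cell) into row, column, and residual parts."""
    n_rows, n_cols = len(table), len(table[0])
    grand = mean(v for row in table for v in row)
    row_means = [mean(row) for row in table]
    col_means = [mean(table[i][j] for i in range(n_rows)) for j in range(n_cols)]

    # Rows play the role of chromosomes, columns the role of features.
    ss_rows = n_cols * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n_rows * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    ss_resid = ss_total - ss_rows - ss_cols  # interaction plus noise
    return {"chromosome": ss_rows, "feature": ss_cols,
            "residual": ss_resid, "total": ss_total}

# Toy table: 3 hypothetical chromosomes (rows) x 4 features (columns).
toy = [
    [1.0, 1.2, 0.8, 1.5],
    [1.1, 1.4, 0.9, 1.7],
    [0.9, 1.1, 0.7, 1.3],
]
parts = two_way_decomposition(toy)
# Shares of the total, i.e. the quantities a pie chart would display.
shares = {k: v / parts["total"] for k, v in parts.items() if k != "total"}
```

The three shares sum to one, which is why they can be drawn as slices of a single pie.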
Figure 2. Bridging micro- and macroevolution at the molecular level. (a) Correlation between the human proportion of segregating sites (q) and the rate of mammalian molecular evolution (r) of 14,671 genes. cor* indicates the bias-corrected correlation. (b) Correlation between the human proportion of segregating sites (q) and the rate of molecular evolution (r) based on gene trees of different taxonomic clades. (c) Correlation between the proportion of human de novo mutations and the rate of mammalian molecular evolution (r). (d) Correlation between the proportion of human singletons and the rate of mammalian molecular evolution (r). (e,f) Principal component analysis (PCA) of q of five species and the long-term rate using 5560 single-copy genes.
Negative binomial regression of the numbers of human de novo mutations and singletons on the median branch length of the mammalian gene trees. The regression model has an intercept and one predictor, the log of the long-term rate.
| Term | De novo mutation: Estimate | De novo mutation: SE | De novo mutation: p-value | Singleton: Estimate | Singleton: SE | Singleton: p-value |
|---|---|---|---|---|---|---|
| Intercept | −9.938 | 0.307 | <2 × 10⁻¹⁶ | −2.177 | 0.026 | <2 × 10⁻¹⁶ |
| log(long-term rate) | −0.008 | 0.064 | 0.904 | 0.413 | 0.006 | <2 × 10⁻¹⁶ |
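As a reading aid, the significance values in the table can be approximately recovered from the estimates and standard errors. The sketch below assumes they come from two-sided Wald z-tests (estimate divided by SE, compared with a standard normal distribution). This is a common default for negative binomial regression output, but the source does not state it. The helper name `wald_p` is hypothetical.

```python
import math

def wald_p(estimate, se):
    """Two-sided p-value of a Wald z-test under a standard normal."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))

# Slope of log(long-term rate) for de novo mutations: z is about -0.125,
# so the p-value is close to the 0.904 in the table (small differences are
# expected because the table values are rounded).
p_de_novo = wald_p(-0.008, 0.064)

# Slope of log(long-term rate) for singletons: z is about 69,
# so the p-value is far below 2e-16.
p_singleton = wald_p(0.413, 0.006)
```

Under this reading, de novo mutation counts show no detectable dependence on the long-term rate, whereas singleton counts increase with it.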
Figure 3. Disease-associated gene set test. (a) Scatter plot, boxplot, and regression model of the human proportion of segregating sites (q) vs. mammalian molecular evolutionary rates of 14,671 genes. The blue curve represents the prediction by the negative binomial regression (see Methods). (b) Examples of significant disease-associated gene sets in humans revealed by analysis of 14,267 disease-associated gene sets. Genes related to the following diseases in humans are shown as examples: substance-related disorders, lymphopenia, and pneumonia.
Figure 4. Correspondence analysis of genes and disease-associated gene sets. (a) Correspondence analysis of 14,671 genes of five species, based on a per-gene statistic. (b,c) Correspondence analysis of 14,267 disease-associated gene sets of five species, based on a per-gene-set statistic. The highlighted diseases are those significant at a false discovery rate of 0.01. (d) Distribution of t-values of all diseases and of the significant disease-associated gene sets detected by correspondence analysis. The pale blue histogram indicates the distribution of the average t-values of five species for each disease.
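The caption does not say how the false discovery rate of 0.01 was controlled. As one common possibility, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of p-values. The function name `benjamini_hochberg` and the toy p-values are hypothetical and do not come from the paper.

```python
def benjamini_hochberg(p_values, fdr=0.01):
    """Return booleans marking which hypotheses are rejected by the
    Benjamini-Hochberg step-up procedure at the given FDR level."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * fdr.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            k_max = rank
    # Reject every hypothesis whose p-value ranks at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected

# Toy p-values for five hypothetical gene sets.
toy_p = [0.0001, 0.004, 0.03, 0.0015, 0.5]
significant = benjamini_hochberg(toy_p, fdr=0.01)
```

With thousands of gene sets tested at once, a per-test threshold would let many false positives through, which is the motivation for controlling the false discovery rate instead.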