Zackery A Ely1, Jiyun M Moon1, Gregory R Sliwoski1, Amandeep K Sangha2,3, Xing-Xing Shen1, Abigail L Labella1, Jens Meiler2,3, John A Capra1,4, Antonis Rokas1,4.
Abstract
Immunity genes have repeatedly experienced natural selection during mammalian evolution. Galectins are carbohydrate-binding proteins that regulate diverse immune responses, including maternal-fetal immune tolerance in placental pregnancy. Seven human galectins, four conserved across vertebrates and three specific to primates, are involved in placental development. To comprehensively study the molecular evolution of these galectins, both across mammals and within humans, we conducted a series of between- and within-species evolutionary analyses. By examining patterns of sequence evolution between species, we found that primate-specific galectins showed uniformly high substitution rates, whereas two of the four other galectins experienced accelerated evolution in primates. By examining human population genomic variation, we found that galectin genes and variants, including variants previously linked to immune diseases, showed signatures of recent positive selection in specific human populations. By examining one nonsynonymous variant in Galectin-8 previously associated with autoimmune diseases, we further discovered that it is tightly linked to three other nonsynonymous variants; surprisingly, the global frequency of this four-variant haplotype is ∼50%. To begin understanding the impact of this major haplotype on Galectin-8 protein structure, we modeled its 3D protein structure and found that it differed substantially from the reference protein structure. These results suggest that placentally expressed galectins experienced both ancient and more recent selection in a lineage- and population-specific manner. Furthermore, our discovery that the major Galectin-8 haplotype is structurally distinct from and more commonly found than the reference haplotype illustrates the significance of understanding the evolutionary processes that sculpted variants associated with human genetic disease.
Keywords: comparative modeling; galectins; human evolution; mammalian evolution; population genetics; positive selection
Year: 2019 PMID: 31504490 PMCID: PMC6751361 DOI: 10.1093/gbe/evz183
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1.—Galectin protein carbohydrate recognition domain (CRD) organization, classification, and sequence. (A) Classification of galectin subtypes based on CRD organization. Prototype galectins are defined by a single CRD, chimera type galectins are defined by a single CRD fused to an additional collagen-like domain, and tandem-repeat type galectins consist of two CRDs joined by a peptide linker of up to 70 amino acids. (B) Amino acid sequences of representatives of each of the three galectin classes. Residues directly involved in carbohydrate binding are highlighted in red and are mostly invariant across all galectin family members. The two residues highlighted in blue represent missense variants described in panel (C). (C) Population disparities and putative disease links of the common Galectin-3 single nucleotide variant, rs4652; the C allele, which is at 95% frequency in African genomes, is part of a codon for proline (Pro), whereas the A allele is part of a codon for threonine (Thr). Similarly, the A allele of the common Galectin-8 single nucleotide variant, rs1126407, is part of a codon for tyrosine (Tyr), whereas the T allele is part of a codon for phenylalanine (Phe).
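Table 1. Two Sets of Evolutionary Hypotheses Tested for the Rate of Evolution of Placental Galectin Genes

| Hypotheses for ancient galectins | Parameters |
|---|---|
| Null: Uniform rate for all branches | $\omega_{\text{outgroups}} = \omega_{\text{placental stem}} = \omega_{\text{placental clade}} = \omega_{\text{primate clade}}$ |
| 2: Unique rates for placental mammals and placental stem branch | $\omega_{\text{placental stem}} \neq \omega_{\text{outgroups}} \neq \omega_{\text{placental clade}} = \omega_{\text{primate clade}}$ |
| 3: Unique rate for placental mammals | $\omega_{\text{placental stem}} = \omega_{\text{outgroups}} \neq \omega_{\text{placental clade}} = \omega_{\text{primate clade}}$ |
| 4: Unique rate for placental stem branch | $\omega_{\text{placental stem}} \neq \omega_{\text{outgroups}} = \omega_{\text{placental clade}} = \omega_{\text{primate clade}}$ |
| 5: Unique rate for primates | $\omega_{\text{placental stem}} = \omega_{\text{outgroups}} = \omega_{\text{placental clade}} \neq \omega_{\text{primate clade}}$ |
| 6: Unique rates for placental mammals and primates | $\omega_{\text{placental stem}} = \omega_{\text{outgroups}} \neq \omega_{\text{placental clade}} \neq \omega_{\text{primate clade}}$ |

| Hypotheses for primate-specific galectins | Parameters |
|---|---|
| Null: Uniform rate for all branches | $\omega_{\text{primate stem}} = \omega_{\text{outgroups}} = \omega_{\text{Galectin-13}} = \omega_{\text{Galectin-14}} = \omega_{\text{Galectin-16}}$ |
| 2: Unique rate for Galectin-13 | $\omega_{\text{primate stem}} = \omega_{\text{outgroups}} = \omega_{\text{Galectin-14}} = \omega_{\text{Galectin-16}} \neq \omega_{\text{Galectin-13}}$ |
| 3: Unique rate for Galectin-14 | $\omega_{\text{primate stem}} = \omega_{\text{outgroups}} = \omega_{\text{Galectin-13}} = \omega_{\text{Galectin-16}} \neq \omega_{\text{Galectin-14}}$ |
| 4: Unique rate for Galectin-16 | $\omega_{\text{primate stem}} = \omega_{\text{outgroups}} = \omega_{\text{Galectin-13}} = \omega_{\text{Galectin-14}} \neq \omega_{\text{Galectin-16}}$ |
| 5: Unique rates for Galectin-13, Galectin-14, and Galectin-16 | $\omega_{\text{primate stem}} = \omega_{\text{outgroups}} \neq \omega_{\text{Galectin-13}} \neq \omega_{\text{Galectin-14}} \neq \omega_{\text{Galectin-16}}$ |
| 6: Unique rate for primate stem branch | $\omega_{\text{primate stem}} \neq \omega_{\text{outgroups}} = \omega_{\text{Galectin-13}} = \omega_{\text{Galectin-14}} = \omega_{\text{Galectin-16}}$ |
| 7: Unique rates for primate stem branch and primate clade | $\omega_{\text{primate stem}} \neq \omega_{\text{outgroups}} \neq \omega_{\text{Galectin-13}} = \omega_{\text{Galectin-14}} = \omega_{\text{Galectin-16}}$ |
| 8: Unique rate for primate clade | $\omega_{\text{primate stem}} = \omega_{\text{outgroups}} \neq \omega_{\text{Galectin-13}} = \omega_{\text{Galectin-14}} = \omega_{\text{Galectin-16}}$ |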
$F_{ST}$ Values, Phenotype Links, and Allele Ages of Placental Galectin Variants with Significant Population Differentiation and Putative Links to Disease or Other Phenotypes

| SNP | Gene | Simulated P Value | Empirical P Value | Functional Consequence | Highest Calculated $F_{ST}$ | PolyPhen Score | Putative Disease/Phenotype Link | ID and Age of Oldest Allele | ID and Age of Youngest Allele |
|---|---|---|---|---|---|---|---|---|---|
| rs4652 | LGALS3 | 0.004 | 0.002 | Missense (T > P) | 0.52 | 0 (benign) | Several | C; 105 | A; 15.8 |
| rs10403702 | | 0.03 | 0.017 | Missense (L > P) | 0.20 | 0 | Acute insulin response to glucose | C; 43 | T; human-specific allele |
| rs4830 | | 0.02 | 0.009 | Missense (C > S/R) | 0.32 | 0.02 | Epistatic effect in breast cancer | A; human-specific | T; human-specific |
| rs1009977 | | 0.01 | 0.008 | Intron variant | 0.35 | N/A | Cognitive function in old age | G; 29.4 | T; 15.8 |
| rs10498474 | | 0.02 | 0.012 | Intron variant | 0.24 | N/A | Body weight | A; 43 | G; human-specific |
| rs2273865 | | 0.048 | 0.029 | Nonsense variant | 0.14 | N/A | Differential expression of Galectin-8 isoforms | T; 105 | A; human-specific |
| rs4659682 | | 0.01 | 0.008 | Intron variant | 0.34 | N/A | ADHD-IV impulsivity | A; 105 | G; human-specific |
| rs4644 | | 0.056 | 0.032 | Missense (P > H) | 0.11 | 1 (deleterious) | Several | C; 105 | A; human-specific |
| rs1126407 | LGALS8 | 0.066 | 0.041 | Missense (F > Y) | 0.10 | 0.01 | Rheumatoid arthritis | A; 105 | T; 6.23 |
All ages are given in Myr; human-specific alleles are identified as such.
These SNPs are marginally nonsignificant according to simulated distributions but are linked to disease, so we opted to include them.
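As a rough illustration of what an $F_{ST}$ value measures, the sketch below computes Wright's $F_{ST} = (H_T - H_S)/H_T$ for one biallelic SNP from per-population allele frequencies. The estimator actually used for the table is not stated in this record, so this equal-weight form, the function name `wright_fst`, and the example frequencies are assumptions for illustration only.

```python
def wright_fst(freqs):
    """Wright's F_ST for one biallelic SNP.

    freqs: allele frequency of the same allele in each population.
    Populations are weighted equally (an assumption of this sketch).
    """
    if not freqs:
        raise ValueError("need at least one population")
    p_bar = sum(freqs) / len(freqs)
    # Expected heterozygosity in the pooled (total) population.
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # allele fixed or absent everywhere: no differentiation
    # Mean expected heterozygosity within populations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


# Hypothetical frequencies: an allele common in one population and
# at intermediate frequency in another.
print(round(wright_fst([0.95, 0.50]), 3))
```

Identical frequencies give 0 and fixed differences give 1, which is why large values in the table point to strong population differentiation.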
Table 2. Statistically Significant Differences in the Rate of Evolution of Placental Galectin Genes

| Gene | Hypothesis | Likelihood (−ln L) | P Value | ω Outgroups | ω Placental Clade | ω Placental Stem | ω Primates |
|---|---|---|---|---|---|---|---|
| LGALS1 | Null | 5,353.76 | N/A | 0.15 | 0.15 | 0.15 | 0.15 |
| | 2 | 5,344.95 | 1.5E-04 | 0.20 | 0.16 | 0.01 | 0.16 |
| | 4 | 5,345.53 | 5.0E-05 | 0.17 | 0.17 | 0.01 | 0.17 |
| LGALS3 | Null | 7,886.70 | N/A | 0.25 | 0.25 | 0.25 | 0.25 |
| | 2 | 7,834.11 | 1.44E-23 | 0.07 | 0.39 | 0.21 | 0.39 |
| | 3 | 7,834.84 | 2.33E-24 | 0.08 | 0.39 | 0.08 | 0.39 |
| | 4 | 7,884.15 | 2.39E-02 | 0.26 | 0.26 | 0.08 | 0.26 |
| | 5 | 7,873.09 | 1.82E-07 | 0.22 | 0.22 | 0.22 | 0.58 |
| | 6 | 7,831.69 | 1.29E-24 | 0.07 | 0.35 | 0.07 | 0.57 |
| LGALS8 | Null | 8,161.84 | N/A | 0.19 | 0.19 | 0.19 | 0.19 |
| | 5 | 8,153.47 | 4.29E-05 | 0.17 | 0.17 | 0.17 | 0.36 |
| | 6 | 8,153.36 | 2.1E-04 | 0.18 | 0.17 | 0.18 | 0.36 |
| LGALS9 | Null | 14,016.54 | N/A | 0.34 | 0.34 | 0.34 | 0.34 |
| | 2 | 14,007.70 | 1.4E-04 | 0.28 | 0.36 | 0.11 | 0.36 |
| | 3 | 14,010.44 | 4.74E-04 | 0.24 | 0.36 | 0.24 | 0.36 |
| | 4 | 14,009.79 | 2.36E-04 | 0.35 | 0.35 | 0.11 | 0.35 |
| | 5 | 14,014.56 | 0.0461 | 0.32 | 0.32 | 0.32 | 0.40 |
| | 6 | 14,009.85 | 0.0012 | 0.24 | 0.35 | 0.24 | 0.40 |
Numbers correspond to the hypotheses described in table 1.
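The P values above appear consistent with a likelihood ratio test of each alternative hypothesis against the null, although the test is not named in this record. The sketch below is a minimal illustration under that assumption: the statistic is twice the difference in −ln L, compared with a chi-square distribution whose degrees of freedom equal the number of extra ω classes. The helper names `chi2_sf` and `lrt_p_value` are hypothetical, and only 1 or 2 degrees of freedom are handled, which covers the hypotheses for ancient galectins.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (closed forms for df 1 and 2)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or 2 are handled in this sketch")


def lrt_p_value(neg_lnl_null, neg_lnl_alt, df):
    """Likelihood ratio test from two negative log-likelihoods."""
    stat = 2.0 * (neg_lnl_null - neg_lnl_alt)
    return chi2_sf(max(stat, 0.0), df)


# First gene block of the table: null versus hypothesis 4 (one extra
# omega class, df = 1) and versus hypothesis 2 (two extra classes, df = 2).
print(f"{lrt_p_value(5353.76, 5345.53, 1):.1e}")
print(f"{lrt_p_value(5353.76, 5344.95, 2):.1e}")
```

Under these assumptions the two calls land close to the tabulated values for hypotheses 4 and 2 in the first gene block.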
Fig. 2.—Changes in the evolutionary rate (measured by the ω ratio) in ancient placentally expressed galectin genes. The name of each ancient galectin gene is provided below each topology. The placental stem branch is located at the node representing the split between Laurasiatheria and other placental mammals (Glires and primates). All analyses were performed using codeML (Yang 2007). The complete set of hypotheses tested for each of the ancient galectin genes is provided in table 1 and their results in table 2. The full gene trees are provided in supplementary figure 8, Supplementary Material online. (A) LGALS1 corresponds to hypothesis 4. (B) LGALS3 corresponds to hypothesis 6. (C) LGALS8 corresponds to hypothesis 5. (D) LGALS9 corresponds to hypothesis 2.
Fig. 3.—Evolution and genomic organization of placental cluster galectins and related genes. (A) Cladogram of the placental cluster galectins. Bootstrap support values are included as branch labels. Taxa are labeled as the gene names reported by Ensembl. All alternative models failed to reject the null hypothesis of an equal evolutionary rate (ω=0.60) for the placental cluster galectins. (B) Illustration of genomic organization of galectin gene clusters in human, cow, and goat. Remarkably, even though the cow and goat galectin gene clusters originated independently from the human galectin cluster, all three clusters are flanked by the same two genes: EID2 on the 5′ end and DYRK1B on the 3′ end. Cow and goat genes without official gene names are abbreviated with the last three numbers of their Ensembl accession numbers: ENSBTAG00000015260, ENSBTAG00000047030, and ENSCHIG00000015792. A phylogram depicting relative branch lengths is provided in supplementary figure 1, Supplementary Material online.
Fig. 4.—Representative distributions of Tajima’s D and H12. Vertical lines indicate the significance cutoff corresponding to the 95th percentile of the null distribution. Red bars indicate where the actual values calculated for galectin genes fall along the distribution. Significantly negative values of Tajima’s D are interpreted as evidence of positive selection via hard selective sweeps, whereas significantly positive values of H12 are interpreted as evidence of positive selection via soft selective sweeps. (A) A significantly negative value of Tajima’s D is found for LGALS13. (B) A significantly positive value of H12 is found for LGALS3. (C) and (D) LGALS1 did not show significant results for these statistics.
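To make the two statistics in this caption concrete, the following sketch computes them from a small set of phased haplotypes using their standard textbook definitions. It is not the pipeline used for the figure: the function names and the toy haplotypes are hypothetical, and window choice, missing data, and the null distributions are all ignored.

```python
import math
from collections import Counter
from itertools import combinations


def h12(haplotypes):
    """H12: homozygosity with the two most frequent haplotypes pooled."""
    counts = sorted(Counter(haplotypes).values(), reverse=True)
    n = sum(counts)
    freqs = [c / n for c in counts]
    return sum(freqs[:2]) ** 2 + sum(f * f for f in freqs[2:])


def tajimas_d(haplotypes):
    """Tajima's D from equal-length haplotype strings (needs n >= 4)."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("variance term is zero for fewer than 4 sequences")
    length = len(haplotypes[0])
    # Number of segregating sites.
    s = sum(1 for i in range(length) if len({h[i] for h in haplotypes}) > 1)
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    # Mean number of pairwise differences.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy data: an excess of singletons gives a negative D.
print(tajimas_d(["1000", "0100", "0010", "0001"]) < 0)
# Toy data: two dominant haplotypes give a high H12.
print(h12(["A"] * 5 + ["B"] * 3 + ["C"] * 2))
```

In practice each observed value would be compared with a null distribution, as the vertical cutoff lines in the figure indicate.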
Fig. 5.—Protein haplotype frequencies of the canonical Galectin-8 isoform across and within five major human populations. (A) Global frequency of two common Galectin-8 protein haplotypes defined by four missense variants. The reference haplotype commonly used in Galectin-8 functional studies is not the most common protein haplotype. None of the 2,504 human genomes in the 1000 Genomes Project contains the hybrid haplotype. (B) Population frequencies of different Galectin-8 protein haplotypes mapped to five major continental populations defined in the 1000 Genomes Project: Africa, America (admixed populations), East Asia, Europe, and South Asia. Frequencies are based on data from the 1000 Genomes Project compiled in Ensembl for the canonical isoform. The Ensembl accession number of the corresponding LGALS8 transcript is ENST00000341872.10.
Fig. 6.—Protein structural differences between the central models of the major haplotype, the reference haplotype, and the hybrid haplotype. The protein structure for the reference haplotype is colored gray, the major haplotype purple, and the hybrid haplotype yellow. Variant residues are colored green. Cyan-colored structures illustrate the approximate position typically occupied by carbohydrate ligands that bind the carbohydrate recognition domain. RMSD values are given in units of angstroms. (A) Model based on the major haplotype superimposed with the model based on the reference haplotype (RMSD = 0.79). (B) Model based on the hybrid haplotype superimposed with the model based on the reference haplotype (RMSD = 0.70). (C) Model based on the hybrid haplotype superimposed with the model based on the major haplotype (RMSD = 0.74).
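For readers unfamiliar with the RMSD values quoted in this caption, the sketch below shows the basic calculation for two structures that have already been superimposed and whose atoms are paired one to one. The superposition method and atom selection behind the figure are not given in this record, so the function name `rmsd` and the toy coordinates are illustrative assumptions.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired 3D coordinates (same units in, same out)."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical atoms in angstroms; the second set is shifted 1 angstrom along x.
model_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
model_b = [(x + 1.0, y, z) for x, y, z in model_a]
print(rmsd(model_a, model_b))
```

A uniform shift is the simplest case; real comparisons first find the rotation and translation that minimize this quantity.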