Chao Zhang1, Anurag Verma1,2, Yuanqing Feng1, Marcelo C R Melo3,4,5, Michael McQuillan1, Matthew Hansen1, Anastasia Lucas1, Joseph Park1, Alessia Ranciaro1, Simon Thompson1, Meagan A Rubel1, Michael C Campbell6, William Beggs1, Jibril Hirbo7, Sununguko Wata Mpoloka8, Gaonyadiwe George Mokone9, Thomas Nyambo10, Dawit Wolde Meskel11, Gurja Belay11, Charles Fokunang12, Alfred K Njamnshi13,14, Sabah A Omar15, Scott M Williams16, Daniel J Rader1, Marylyn D Ritchie1, Cesar de la Fuente-Nunez3,4,5, Giorgio Sirugo2, Sarah A Tishkoff1,17,18.
Abstract
Human genomic diversity has been shaped by both ancient and ongoing challenges from viruses. The current coronavirus disease 2019 (COVID-19) pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has had a devastating impact on population health. However, genetic diversity and evolutionary forces impacting host genes related to SARS-CoV-2 infection are not well understood. We investigated global patterns of genetic variation and signatures of natural selection at host genes relevant to SARS-CoV-2 infection (angiotensin converting enzyme 2 [ACE2], transmembrane protease serine 2 [TMPRSS2], dipeptidyl peptidase 4 [DPP4], and lymphocyte antigen 6 complex locus E [LY6E]). We analyzed data from 2,012 ethnically diverse Africans and 15,977 individuals of European and African ancestry with electronic health records and integrated with global data from the 1000 Genomes Project. At ACE2, we identified 41 nonsynonymous variants that were rare in most populations, several of which impact protein function. However, three nonsynonymous variants (rs138390800, rs147311723, and rs145437639) were common among central African hunter-gatherers from Cameroon (minor allele frequency 0.083 to 0.164) and are on haplotypes that exhibit signatures of positive selection. We identify signatures of selection impacting variation at regulatory regions influencing ACE2 expression in multiple African populations. At TMPRSS2, we identified 13 amino acid changes that are adaptive and specific to the human lineage compared with the chimpanzee genome. Genetic variants that are targets of natural selection are associated with clinical phenotypes common in patients with COVID-19. Our study provides insights into global variation at host genes related to SARS-CoV-2 infection, which have been shaped by natural selection in some populations, possibly due to prior viral infections.
Keywords: African diversity; SARS-CoV-2/COVID-19; genetic variation; natural selection; phenotype association
Year: 2022 PMID: 35580180 PMCID: PMC9173769 DOI: 10.1073/pnas.2123000119
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 12.779
Fig. 1. Genetic variation at ACE2. (A) Location of coding variants and their MAF at ACE2 identified from the pooled dataset. (B) MAF of coding variants in diverse global ethnic groups. (C) The geographic distribution of the MAF for rs138390800 at ACE2 in diverse global ethnic groups is highlighted. Each pie denotes frequencies of alleles in the corresponding population. (D) Locations of identified nonsynonymous variants within the secondary structure of the ACE2 protein. (E) Six regulatory eQTLs located in an upstream enhancer of ACE2. RNA Pol2 ChIA-PET data and DNase-seq data of the large intestine, small intestine, lung, kidney, and heart are from ENCODE (68). (F) Haplotype frequencies of the six eQTLs in global populations. The six eQTLs were ordered by their genomic position on the chromosome (i.e., with the following order: rs4830977, rs4830978, rs5936010, rs4830979, rs4830980, and rs5934263). Haplotypes with frequency <0.01 are not shown.
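The quantities plotted in Fig. 1 (minor allele frequency and haplotype frequency with the <0.01 display filter) can be illustrated with a minimal Python sketch. It assumes phased haplotypes are available as strings with one character per eQTL in genomic order; the function names and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Return the frequency of the less common allele at a biallelic site."""
    counts = Counter(alleles)
    if len(counts) == 1:
        return 0.0  # monomorphic site: no minor allele observed
    if len(counts) != 2:
        raise ValueError("expected a biallelic site")
    return min(counts.values()) / len(alleles)


def haplotype_frequencies(haplotypes, min_freq=0.01):
    """Return {haplotype: frequency}, dropping haplotypes below min_freq.

    The default mirrors the caption's rule that haplotypes with
    frequency < 0.01 are not shown.
    """
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return {
        hap: n / total
        for hap, n in counts.items()
        if n / total >= min_freq
    }


# Toy data (hypothetical): 200 chromosomes, six sites per haplotype.
toy = ["CTGACA"] * 120 + ["TCAGTG"] * 79 + ["CTGACG"] * 1
shown = haplotype_frequencies(toy)
```

In this toy example the singleton haplotype has frequency 0.005 and is filtered out, so only two haplotypes remain in `shown`.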
Fig. 2. Natural selection signatures at ACE2 in the Cameroon CAHG populations. (A) Haplotypes over 150 kb flanking ACE2 in CAHG populations. The x axis denotes the genetic variant position, and the y axis represents haplotypes. Each haplotype (one horizontal line) is composed of the genetic variants (columns). Red dots indicate the derived allele, while green dots indicate the ancestral allele. Haplotypes surrounded by the upper left vertical black lines suggest that these haplotypes carry derived allele(s) of the labeled variant near the corresponding black line. For example, the first black line denotes all the haplotypes that have the derived allele at rs138390800 (dark red line). Haplotypes carrying rs138390800, rs147311723, rs145437639, and rs186029035 show more homozygosity than other haplotypes; 1, 2, 3, and 4 along the top of the plot denote positions for rs147311723, rs186029035, rs145437639, and rs138390800, respectively. (B) EHH of rs138390800, rs186029035, and rs147311723 (rs145437639 is in strong LD with rs147311723) at ACE2 in CAHG populations.
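The extended haplotype homozygosity (EHH) statistic in panel B can be sketched in a few lines of Python. This is the standard textbook definition (the probability that two randomly chosen chromosomes carrying the core allele are identical from the core site out to a given marker), not necessarily the exact software or settings used in the study. Haplotypes are assumed to be strings of `0` (ancestral) and `1` (derived); the function name and toy data are hypothetical.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_idx, core_allele, end_idx):
    """EHH between the core site and the marker at end_idx.

    Only haplotypes carrying core_allele at core_idx are considered.
    They are grouped by their sequence over the interval, and EHH is
    the fraction of carrier pairs that fall in the same group.
    """
    carriers = [h for h in haplotypes if h[core_idx] == core_allele]
    n = len(carriers)
    if n < 2:
        return float("nan")  # no pairs to compare
    lo, hi = sorted((core_idx, end_idx))
    groups = Counter(h[lo:hi + 1] for h in carriers)
    same_pairs = sum(comb(size, 2) for size in groups.values())
    return same_pairs / comb(n, 2)


# Toy data (hypothetical): five haplotypes, four sites, core at index 0.
toy = ["1100", "1100", "1101", "0110", "1000"]
decay = [ehh(toy, 0, "1", end) for end in range(4)]
```

EHH starts at 1 at the core site and decays with distance as recombination and mutation break up the carrier haplotypes. A derived allele whose EHH decays more slowly than that of the ancestral allele is the pattern described for the CAHG variants above.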
Fig. 3. Natural selection signatures at the upstream region of ACE2 in African populations. (A) iHS signals at the upstream region of ACE2 (chrX:15650000-15720000) in African populations. Each dot represents a SNP. Red dots denote SNPs that are significant (|iHS| > 2). The gray solid lines denote the gene body region of ACE2. Putatively causal tag SNPs are annotated in the plots. (B) Haplotype network over 150 kb flanking ACE2 in diverse ethnic populations. The network was constructed with SNPs that showed iHS signals in all populations and overlapped with DNase regions or eQTLs. The four functional candidates identified in Cameroon CAHG were also included in the networks. Each pie represents a haplotype, each color represents a geographical population, and the size of the pie is proportional to that haplotype frequency. The dashed line denotes the boundary of clade 1 and clade 2. Black ovals denote haplotypes containing the corresponding variants. (C) Haplotypes containing variants rs5936010, rs5934263, rs4830984, and rs4830986 are highlighted. Red pies denote haplotypes containing the derived allele of the corresponding variants, while green pies denote haplotypes containing the ancestral allele of the corresponding variants.
Fig. 4. Genetic variation at TMPRSS2. (A) Location of coding variants and their MAF at TMPRSS2 identified from the pooled dataset. (B) MAF of coding variants in diverse global ethnic groups. (C) The geographic distribution of the MAF for rs75603675 at TMPRSS2 in diverse global ethnic groups is highlighted. Each pie denotes frequencies of alleles in the corresponding population. (D) Two regulatory eQTLs located in the promoter region of the TMPRSS2 gene. DNase-seq data of the large intestine, small intestine, lung, kidney, and heart are from ENCODE (68).
Fig. 5. Natural selection signatures at TMPRSS2. (A) The result of the MK test for TMPRSS2 in the pooled dataset. Nonsyn indicates nonsynonymous variants; Syn indicates synonymous variants. “Fixed” denotes variants that were fixed between the human and the chimpanzee; “Poly” represents polymorphic variants within human populations. The transcript ENST00000398585.7 was used for the calculation. (B) Illustration of locations of variants that are divergent between the human and chimpanzee lineages on the TMPRSS2 protein domains. Boxes denote the protein domains of TMPRSS2. Red lines represent nonsynonymous variants that occurred in the corresponding domains of TMPRSS2, with the amino acids and positions of the human and the chimpanzee annotated at the bottom of the lines. Blue lines denote synonymous variants. LDLRA, LDL receptor class A; SRCR, scavenger receptor cysteine-rich domain 2; TM, transmembrane domain.
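The McDonald-Kreitman (MK) test in panel A compares a 2×2 table of Fixed/Poly against Nonsyn/Syn counts. The sketch below shows one common way to evaluate such a table: the neutrality index, the derived proportion of adaptive substitutions (alpha = 1 − NI), and a two-sided Fisher's exact test written with the standard library. The counts in the example are illustrative only and are not the TMPRSS2 values, which are not given in this record; the function names are hypothetical, and the study may have used a different significance test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed one.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )


def mk_test(dn, ds, pn, ps):
    """Return (neutrality index, alpha, Fisher P) for MK counts.

    dn, ds: fixed nonsynonymous / synonymous differences between species
    pn, ps: polymorphic nonsynonymous / synonymous variants within species
    """
    ni = (pn / ps) / (dn / ds)  # NI < 1 suggests excess fixed nonsyn changes
    alpha = 1 - ni              # estimated fraction of adaptive substitutions
    return ni, alpha, fisher_exact_two_sided(dn, ds, pn, ps)


# Illustrative counts only (not from the study).
ni, alpha, p_value = mk_test(dn=7, ds=17, pn=2, ps=42)
```

A neutrality index below 1 with a small P value indicates more fixed nonsynonymous differences than expected under neutrality, which is the direction of effect consistent with the adaptive amino acid changes reported at TMPRSS2.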
Fig. 6. Associations between genetic variations at four genes and clinical disease phenotypes. (A) Gene-based association result between coding variants at four genes and 12 disease classes. The disease classes are shown on the x axis, and the y axis represents the P values. (B) PheWAS plot of the eQTLs associated with four genes and ∼1,800 disease codes across 17 disease categories. The disease categories are shown on the x axis, and the y axis represents the −log10 of the P values. The colored dots represent eQTLs and the direction of effect of the association. The red dashed lines denote the 0.0001 cutoff, and the blue dashed lines represent the 0.001 cutoff.
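The y axis transformation and the two dashed cutoffs in panel B can be made concrete with a small Python sketch. The function names and labels are hypothetical, and whether the study treats a P value exactly equal to a cutoff as passing is an assumption here (strict inequality is used).

```python
import math


def neg_log10(p_value):
    """Map a P value to the -log10 scale used on the PheWAS y axis."""
    return -math.log10(p_value)


def cutoff_label(p_value, red_cutoff=1e-4, blue_cutoff=1e-3):
    """Say which dashed line (if any) an association clears.

    red_cutoff and blue_cutoff follow the caption (0.0001 and 0.001).
    Strict '<' is an assumption; the caption does not specify it.
    """
    if p_value < red_cutoff:
        return "passes 0.0001 cutoff"
    if p_value < blue_cutoff:
        return "passes 0.001 cutoff"
    return "below both lines"


# On the plot, the two cutoffs sit at roughly y = 4 and y = 3.
red_line_y = neg_log10(1e-4)
blue_line_y = neg_log10(1e-3)
```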
Associations of ACE2, DPP4, TMPRSS2, and LY6E with 12 disease classes derived from EHR data
| Disease phenotype | Gene | Cases | Controls | Carrier controls | Carrier cases | SKAT P | Burden P | Burden OR | Burden SE | 95% CI (log OR) | Dataset |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Hepatic encephalopathy | | 97 | 8,045 | 441 | 5 | 1.1E-12 | 0.0043 | 5.73 | 0.61 | 0.55 to 2.94 | AA |
| Respiratory syncytial virus infectious disease | | 56 | 6,392 | 85 | 1 | 6.8E-07 | 0.1221 | 6.06 | 1.17 | −0.48 to 4.09 | AA |
| Respiratory failure | | 199 | 6,392 | 11 | 2 | 2.3E-06 | 0.0124 | 7.31 | 0.80 | 0.43 to 3.55 | AA |
| Respiratory failure | | 199 | 6,392 | 351 | 12 | 9.0E-05 | 0.0509 | 3.10 | 0.58 | 0 to 2.26 | AA |
| Upper respiratory tract disease | | 144 | 6,392 | 85 | 3 | 2.5E-04 | 0.0978 | 4.16 | 0.86 | −0.26 to 3.11 | AA |
| Respiratory syncytial virus infectious disease | | 56 | 6,392 | 11 | 1 | 3.9E-04 | 0.0217 | 11.63 | 1.07 | 0.36 to 4.55 | AA |
| Hepatic coma | | 16 | 6,817 | 318 | 1 | 4.3E-31 | 0.0019 | 10.45 | 0.76 | 0.87 to 3.83 | EA |
| Respiratory syncytial virus infectious disease | | 40 | 5,859 | 274 | 3 | 2.3E-07 | 0.1650 | 3.61 | 0.92 | −0.53 to 3.1 | EA |
| Cirrhosis of liver | | 10 | 6,817 | 43 | 1 | 1.8E-04 | 0.0837 | 9.40 | 1.30 | −0.3 to 4.78 | EA |
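
The 95% CI column contains negative bounds, so it cannot be an interval for the odds ratio itself. The values are consistent with a Wald interval on the log-odds scale, ln(OR) ± 1.96 × SE. The Python sketch below checks this against the first row of the table; the function names are hypothetical, and the 1.96 multiplier is an assumption inferred from the numbers rather than stated in the record.

```python
import math


def log_or_interval(odds_ratio, se, z=1.96):
    """Wald 95% interval for ln(OR), given the OR and the SE of ln(OR)."""
    beta = math.log(odds_ratio)
    return beta - z * se, beta + z * se


def or_interval(odds_ratio, se, z=1.96):
    """The same interval converted back to the odds-ratio scale."""
    lo, hi = log_or_interval(odds_ratio, se, z)
    return math.exp(lo), math.exp(hi)


# First table row: hepatic encephalopathy, Burden OR 5.73, Burden SE 0.61.
log_lo, log_hi = log_or_interval(5.73, 0.61)
or_lo, or_hi = or_interval(5.73, 0.61)
```

For this row the log-scale bounds round to 0.55 and 2.94, matching the table. Other rows agree to within the rounding of the published OR and SE. An interval on the log scale that includes 0 corresponds to an odds-ratio interval that includes 1.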