
eLD: entropy-based linkage disequilibrium index between multiallelic sites.

Yukinori Okada

Abstract

Quantification of linkage disequilibrium (LD) is a critical step in studies investigating human genome variations. Commonly used LD indices such as r² handle LD of biallelic variants for two sites. As shown in a previously introduced LD index of ε, normalized entropy difference of the haplotype frequency between LD and linkage equilibrium (LE) could be utilized to estimate LD of biallelic variants for multiple sites. Here, we developed eLD (entropy-based Linkage Disequilibrium index between multiallelic sites) as publicly available software to calculate ε of multiallelic variants for two sites. Application of eLD could dissect complex LD structures among multiple HLA genes (e.g., strong LD among HLA-DRB1, HLA-DQA1, and HLA-DQB1 in East Asians). Use of eLD is not restricted to haplotype-based LD; it is also applicable to genotype-based LD. Therefore, eLD enables estimation of trans-regional LD of SNP genotypes at two unlinked loci, such as the nonlinear LD between functional missense variants of ADH1B (rs1229984 [Arg47His]) and ALDH2 (rs671 [Glu504Lys]).


Year:  2018        PMID: 30374405      PMCID: PMC6197273          DOI: 10.1038/s41439-018-0030-x

Source DB:  PubMed          Journal:  Hum Genome Var        ISSN: 2054-345X


Linkage disequilibrium (LD) is defined as the nonrandom distribution of alleles at different loci[1]. Quantitative assessment of LD in a population of interest is an important step in fine-mapping causal variants embedded in the disease risk loci identified by genome-wide association studies (GWAS)[2]. Population-specific features of LD are related to ethnically heterogeneous distributions of single-nucleotide polymorphisms (SNPs)[3]. The most widely used measurements of LD are r² and D′; both values quantify LD between biallelic variants (i.e., SNPs) for two sites, reflecting nonrandom distributions of four haplotypes consisting of pairwise combinations of the alleles. Specifically, r² can be interpreted as Pearson’s correlation measurement (R²) of allele distributions and is known to be proportional to χ² values of genotype–phenotype association statistics between two sites[1]. LD values can easily be calculated using publicly available software (e.g., PLINK and vcftools), or obtained as pre-calculated values from websites (e.g., HaploReg and LocusZoom). Nothnagel et al.[4] previously demonstrated that r² can also be interpreted as normalized entropy in haplotype frequencies, and introduced a novel LD index named ε (see definition in Supplementary Information). ε represents the normalized entropy difference between the haplotype frequencies observed under LD and those expected under the null hypothesis of no LD (i.e., linkage equilibrium [LE]). The value of ε ranges between 0 and 1, with larger values indicating stronger LD. Application of ε enabled LD quantification of biallelic variants for multiple sites (Fig. 1)[4], which was effective in selecting tag SNPs in an unbiased manner, free from ambiguous definitions of LD blocks[5].
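To make the idea of a normalized entropy difference concrete, the following is a minimal sketch, not the eLD implementation. It assumes ε = (S_LE − S_LD) / S_LE, where S_LD is the Shannon entropy of the observed two-site haplotype distribution and S_LE is the entropy expected under LE (the sum of the two single-site entropies). The exact definition used by eLD is given in the Supplementary Information, which is not reproduced here, so the normalization may differ. The function names `entropy` and `epsilon` are hypothetical.

```python
from collections import Counter
from math import log


def entropy(counts):
    """Shannon entropy (natural log) of a Counter of observations."""
    total = sum(counts.values())
    return -sum((c / total) * log(c / total) for c in counts.values() if c > 0)


def epsilon(pairs):
    """Assumed normalized entropy difference for two sites.

    `pairs` is a list of (allele_at_site1, allele_at_site2) tuples,
    one tuple per phased haplotype (or per individual for genotypes).
    """
    s_ld = entropy(Counter(pairs))                    # observed joint entropy
    s_le = (entropy(Counter(a for a, _ in pairs))     # entropy expected under LE:
            + entropy(Counter(b for _, b in pairs)))  # sum of single-site entropies
    if s_le == 0:                                     # both sites monomorphic
        return 0.0
    return (s_le - s_ld) / s_le


# Toy data (not from the study): alleles combine freely vs. always co-occur.
independent = [("A", "X"), ("A", "Y"), ("B", "X"), ("B", "Y")]
linked = [("A", "X"), ("B", "Y")]
eps_independent = epsilon(independent)
eps_linked = epsilon(linked)
```

Under this assumed normalization, the freely combining toy data give a value of essentially zero, and the perfectly linked toy data give a larger value. Because alleles are treated as arbitrary labels, the same function accepts any number of alleles per site.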
Fig. 1

eLD: entropy-based linkage disequilibrium index between multiallelic sites. While r² quantifies LD of biallelic variants for two sites, ε quantifies LD of biallelic variants for multiple sites[4] or multiallelic variants for two sites[6]. We developed eLD (entropy-based Linkage Disequilibrium index between multiallelic sites) as publicly available software to calculate ε of multiallelic variants for two sites (see the software URL).

We have recently extended ε to further quantify LD of multiallelic variants for two sites as described elsewhere (Fig. 1)[6]. Here, we developed eLD (entropy-based Linkage Disequilibrium index between multiallelic sites) as publicly available software to calculate the ε of multiallelic variants for two sites (see the software URL). Various multiallelic variants have important clinical impacts in terms of genotype–phenotype associations. Of these, polymorphisms of human leukocyte antigen (HLA) genes in the major histocompatibility complex (MHC) locus confer a wide spectrum of risk for a variety of human diseases. While elucidation of the complex LD structure of HLA genes has been challenging, application of ε clearly identified hidden LD relationships among the HLA genes[6]. For example, we observed relatively strong LD between HLA-C and HLA-B, among HLA-DRB1, HLA-DQA1, and HLA-DQB1, and between HLA-DPA1 and HLA-DPB1 (ε > 0.15; calculated using 4-digit classical alleles of a subset of the East Asian subjects [n = 300] enrolled in the original studies[6,7]; Fig. 2). Since estimation of the haplotype frequency could be biased when its distribution is sparse, eLD implements an option to combine the alleles with frequencies lower than a defined threshold (0.05 by default) into a single dummy allele.
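The rare-allele option described above can be illustrated with a short sketch. This is an assumption about how such pooling might look, not the eLD code: the function name `collapse_rare`, the dummy label `"RARE"`, and the toy allele labels are all hypothetical. Alleles whose frequency at a site falls below the threshold are replaced by one shared dummy allele before any haplotype frequencies are counted.

```python
from collections import Counter


def collapse_rare(alleles, threshold=0.05, dummy="RARE"):
    """Replace alleles with frequency below `threshold` by a single dummy allele.

    `alleles` is the list of allele labels observed at one site.
    The 0.05 default mirrors the default stated in the text; the label
    "RARE" is an illustrative choice.
    """
    counts = Counter(alleles)
    n = len(alleles)
    return [a if counts[a] / n >= threshold else dummy for a in alleles]


# Toy data (not from the study): two common and two rare alleles.
site = ["A"] * 60 + ["B"] * 37 + ["C"] * 2 + ["D"] * 1
pooled = collapse_rare(site)
pooled_counts = Counter(pooled)
```

Pooling is applied to each site separately, so the number of distinct two-site combinations shrinks and each remaining combination is supported by more observations.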
Fig. 2

Pairwise quantification of LD among the classical HLA gene variants. Pairwise values of the LD index ε among the 4-digit alleles of the classical HLA genes were evaluated using eLD. Phased HLA alleles were obtained from a subset of the East Asian subjects (n = 300) of the original studies[6, 7]. Strong LD between HLA-C and HLA-B, among HLA-DRB1, HLA-DQA1, and HLA-DQB1, and between HLA-DPA1 and HLA-DPB1 was observed (ε > 0.15).

One novel feature of eLD is empirical estimation of the value of ε under the null hypothesis of LE (= ε_NULL), in addition to calculation of the ε actually observed in a given data set (= ε_Observed). eLD calculates ε_NULL using a permutation approach: the connections of the alleles between the two sites are randomly shuffled, and ε_NULL is estimated as the mean of the ε values obtained across iterations (1000 iterations by default). Since the baseline value of ε_NULL depends on the number of alleles at each site, calculating ε_NULL alongside ε_Observed helps to evaluate the relative strength of LD at the observed sites.
Another feature of the software is that application of eLD is not restricted to haplotype-based LD; it is also applicable to genotype-based LD. Using eLD, one can estimate LD between loci where phasing of the haplotypes is theoretically difficult. As an illustrative example, we estimated trans-regional LD between two unlinked loci: ADH1B at 4q23 and ALDH2 at 12q24. ADH1B and ALDH2 harbor well-known functional missense variants at rs1229984 (Arg47His) and rs671 (Glu504Lys), respectively. Both of these SNPs have pleiotropic effects on a number of human complex traits, including dietary habits. Studies investigating natural selection pressure identified strong and significant positive selection on these missense variants in Japanese or other East Asian populations, which was closely linked to geographical heterogeneity in the allele frequency spectra of these SNPs even within a single population[8].
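The permutation estimate of ε_NULL described in this paragraph could be sketched as follows. This is an illustration under stated assumptions rather than the eLD implementation: ε is again assumed to be (S_LE − S_LD) / S_LE, the function names and the `seed` parameter are hypothetical, and the shuffle simply permutes the alleles at the second site so that their pairing with the first site is random while single-site frequencies are preserved.

```python
import random
from collections import Counter
from math import log


def entropy(items):
    """Shannon entropy (natural log) of the empirical distribution of `items`."""
    counts = Counter(items)
    n = sum(counts.values())
    return -sum((c / n) * log(c / n) for c in counts.values())


def epsilon(site1, site2):
    """Assumed normalized entropy difference between two equally long allele lists."""
    s_le = entropy(site1) + entropy(site2)   # entropy expected under LE
    s_ld = entropy(zip(site1, site2))        # entropy of the observed pairs
    return 0.0 if s_le == 0 else (s_le - s_ld) / s_le


def epsilon_null(site1, site2, n_iter=1000, seed=0):
    """Mean epsilon over `n_iter` random re-pairings of the two sites."""
    rng = random.Random(seed)                # fixed seed only for reproducibility
    shuffled = list(site2)
    total = 0.0
    for _ in range(n_iter):
        rng.shuffle(shuffled)                # break the link between the sites
        total += epsilon(site1, shuffled)
    return total / n_iter


# Toy data (not from the study): four alleles per site, perfectly paired.
site1 = ["a", "b", "c", "d"] * 50
site2 = list(site1)
eps_observed = epsilon(site1, site2)
eps_null = epsilon_null(site1, site2, n_iter=200)
```

Because finite samples never match the LE expectation exactly, the shuffled data yield a small positive baseline that grows with the number of alleles per site, which is why comparing the observed value against this baseline is informative.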
Here, using eLD, we calculated ε to estimate trans-regional LD between rs1229984 and rs671 (Fig. 3). We obtained genotypes for these SNPs from East Asian subjects within the 1000 Genomes Project (n = 504, phase 3 version 5), and found that ε_Observed (= 0.0053) was higher than ε_NULL (= 0.0024). As expected from natural selection pressure on these variants[8], rs1229984AA-rs671AA genotypes and rs1229984GG-rs671GG genotypes had increased frequencies compared to those expected under LE (≥1.21-fold), while rs1229984GG-rs671GA genotypes had decreased frequencies (0.58-fold) compared to those expected under LE. While Pearson’s correlation between genotypes can also evaluate trans-regional LD, nonlinear relationships of genotypes (such as the reduced frequency of rs1229984GG-rs671GA) would not have been reflected in this measurement.
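The fold changes quoted above compare the observed frequency of each genotype combination with the frequency expected under LE, i.e., the product of the two single-SNP genotype frequencies. A minimal sketch of that comparison is shown below; the function name `fold_changes` is hypothetical, and the input is toy data, not the 1000 Genomes genotypes.

```python
from collections import Counter


def fold_changes(geno1, geno2):
    """Observed / expected-under-LE frequency for each observed genotype pair.

    `geno1` and `geno2` are equally long lists of genotype labels
    (one entry per individual) at the two SNPs.
    Combinations never observed in the data are not returned.
    """
    n = len(geno1)
    joint = Counter(zip(geno1, geno2))           # observed pair counts
    marg1, marg2 = Counter(geno1), Counter(geno2)
    result = {}
    for (g1, g2), count in joint.items():
        observed = count / n
        expected = (marg1[g1] / n) * (marg2[g2] / n)   # product of marginals
        result[(g1, g2)] = observed / expected
    return result


# Toy data (not from the study): genotypes that always co-occur.
toy_snp1 = ["GG", "GG", "AA", "AA"]
toy_snp2 = ["GG", "GG", "AA", "AA"]
toy_fold = fold_changes(toy_snp1, toy_snp2)
```

A value above 1 marks a combination that is enriched relative to LE and a value below 1 marks a depleted one. Inspecting each cell in this way exposes nonlinear patterns that a single correlation coefficient would average out.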
Fig. 3

Trans-regional LD between functional missense variants of ADH1B and ALDH2. eLD can quantify genotype-based LD as well as haplotype-based LD and thus can estimate trans-regional LD between two unlinked loci without haplotype phasing. LD between genotypes of the two functional missense variants of ADH1B (rs1229984 [Arg47His]) and ALDH2 (rs671 [Glu504Lys]) was assessed by using eLD. Observed ε (= ε_Observed) and ε expected under the null hypothesis of LE (= ε_NULL) are indicated. Frequencies of the genotypes and genotype combinations are visualized as in the legend. We observed nonrandom and nonlinear distribution of the genotypes between the variants (i.e., reduced frequency in rs1229984GG-rs671GA).

In summary, we developed software, which we named eLD, that quantifies the entropy-based LD index ε for multiallelic variants at two sites, such as LD between highly polymorphic HLA genes. eLD also enables estimation of trans-regional LD of SNP genotypes, such as functional variants of ADH1B and ALDH2. We note that normalized entropy has increased potential to dissect complex dependencies among human genome variations (e.g., Y-chromosomal short tandem repeat [STR] marker selection[9]), and development of additional methodology is warranted.

Software availability

eLD is freely available at http://www.sg.med.osaka-u.ac.jp/tools.html with example data sets.
References

1.  Linkage disequilibrium--understanding the evolutionary past and mapping the medical future.

Authors:  Montgomery Slatkin
Journal:  Nat Rev Genet       Date:  2008-06

2.  From genome-wide associations to candidate causal variants by statistical fine-mapping.

Authors:  Daniel J Schaid; Wenan Chen; Nicholas B Larson
Journal:  Nat Rev Genet       Date:  2018-08

3.  Empirical estimation of genome-wide significance thresholds based on the 1000 Genomes Project data set.

Authors:  Masahiro Kanai; Toshihiro Tanaka; Yukinori Okada
Journal:  J Hum Genet       Date:  2016-06-16

4.  Entropy as a measure for linkage disequilibrium over multilocus haplotype blocks.

Authors:  M Nothnagel; R Fürst; K Rohde
Journal:  Hum Hered       Date:  2002

5.  The effect of single-nucleotide polymorphism marker selection on patterns of haplotype blocks and haplotype frequency estimates.

Authors:  Michael Nothnagel; Klaus Rohde
Journal:  Am J Hum Genet       Date:  2005-10-19

6.  Construction of a population-specific HLA imputation reference panel and its application to Graves' disease risk in Japanese.

Authors:  Yukinori Okada; Yukihide Momozawa; Kyota Ashikawa; Masahiro Kanai; Koichi Matsuda; Yoichiro Kamatani; Atsushi Takahashi; Michiaki Kubo
Journal:  Nat Genet       Date:  2015-06-01

7.  Risk for ACPA-positive rheumatoid arthritis is driven by shared HLA amino acid polymorphisms in Asian and European populations.

Authors:  Yukinori Okada; Kwangwoo Kim; Buhm Han; Nisha E Pillai; Rick T-H Ong; Woei-Yuh Saw; Ma Luo; Lei Jiang; Jian Yin; So-Young Bang; Hye-Soon Lee; Matthew A Brown; Sang-Cheol Bae; Huji Xu; Yik-Ying Teo; Paul I W de Bakker; Soumya Raychaudhuri
Journal:  Hum Mol Genet       Date:  2014-07-28

8.  Deep whole-genome sequencing reveals recent selection signatures linked to evolution and disease risk of Japanese.

Authors:  Yukinori Okada; Yukihide Momozawa; Saori Sakaue; Masahiro Kanai; Kazuyoshi Ishigaki; Masato Akiyama; Toshihiro Kishikawa; Yasumichi Arai; Takashi Sasaki; Kenjiro Kosaki; Makoto Suematsu; Koichi Matsuda; Kazuhiko Yamamoto; Michiaki Kubo; Nobuyoshi Hirose; Yoichiro Kamatani
Journal:  Nat Commun       Date:  2018-04-24

9.  Shannon's equivocation for forensic Y-STR marker selection.

Authors:  Sabine Siegert; Lutz Roewer; Michael Nothnagel
Journal:  Forensic Sci Int Genet       Date:  2015-02-09
