Madeline H Kowalski1, Huijun Qian2, Ziyi Hou3, Jonathan D Rosen1, Amanda L Tapia1, Yue Shan1, Deepti Jain4, Maria Argos5, Donna K Arnett6, Christy Avery7, Kathleen C Barnes8, Lewis C Becker9, Stephanie A Bien10, Joshua C Bis11, John Blangero12, Eric Boerwinkle13,14, Donald W Bowden15, Steve Buyske16, Jianwen Cai17, Michael H Cho18,19, Seung Hoan Choi20, Hélène Choquet21, L Adrienne Cupples22,23, Mary Cushman24, Michelle Daya8, Paul S de Vries14, Patrick T Ellinor20,25, Nauder Faraday9, Myriam Fornage26, Stacey Gabriel27, Santhi K Ganesh28,29, Misa Graff7, Namrata Gupta27, Jiang He30, Susan R Heckbert31,32, Bertha Hidalgo33, Chani J Hodonsky7, Marguerite R Irvin33, Andrew D Johnson23,34, Eric Jorgenson21, Robert Kaplan35, Sharon L R Kardia36, Tanika N Kelly30, Charles Kooperberg10, Jessica A Lasky-Su18,19, Ruth J F Loos37,38, Steven A Lubitz20,25, Rasika A Mathias9, Caitlin P McHugh4, Courtney Montgomery39, Jee-Young Moon35, Alanna C Morrison14, Nicholette D Palmer15, Nathan Pankratz40, George J Papanicolaou41, Juan M Peralta12, Patricia A Peyser36, Stephen S Rich42, Jerome I Rotter43, Edwin K Silverman18,19, Jennifer A Smith36, Nicholas L Smith31,32,44, Kent D Taylor43, Timothy A Thornton4, Hemant K Tiwari45, Russell P Tracy46, Tao Wang47, Scott T Weiss18,19, Lu-Chen Weng20, Kerri L Wiggins11, James G Wilson48, Lisa R Yanek9, Sebastian Zöllner49,50, Kari E North7,51, Paul L Auer52, Laura M Raffield53, Alexander P Reiner31, Yun Li1,53,54.
Abstract
Most genome-wide association and fine-mapping studies to date have been conducted in individuals of European descent, and genetic studies of populations of Hispanic/Latino and African ancestry are limited. In addition, these populations have more complex linkage disequilibrium structure. In order to better define the genetic architecture of these understudied populations, we leveraged >100,000 phased sequences available from deep-coverage whole genome sequencing through the multi-ethnic NHLBI Trans-Omics for Precision Medicine (TOPMed) program to impute genotypes into admixed African and Hispanic/Latino samples with genome-wide genotyping array data. We demonstrated that using TOPMed sequencing data as the imputation reference panel improves genotype imputation quality in these populations, which subsequently enhanced gene-mapping power for complex traits. For rare variants with minor allele frequency (MAF) < 0.5%, we observed a 2.3- to 6.1-fold increase in the number of well-imputed variants, with 11-34% improvement in average imputation quality, compared to the state-of-the-art 1000 Genomes Project Phase 3 and Haplotype Reference Consortium reference panels. Impressively, even for extremely rare variants with minor allele count <10 (including singletons) in the imputation target samples, average information content rescued was >86%. Subsequent association analyses of TOPMed reference panel-imputed genotype data with hematological traits (hemoglobin (HGB), hematocrit (HCT), and white blood cell count (WBC)) in ~21,600 African-ancestry and ~21,700 Hispanic/Latino individuals identified associations with two rare variants in the HBB gene (rs33930165 with higher WBC [p = 8.8 × 10⁻¹⁵] in African populations, rs11549407 with lower HGB [p = 1.5 × 10⁻¹²] and HCT [p = 8.8 × 10⁻¹⁰] in Hispanics/Latinos). By comparison, neither variant would have been genome-wide significant if either 1000 Genomes Project Phase 3 or Haplotype Reference Consortium reference panels had been used for imputation. Our findings highlight the utility of the TOPMed imputation reference panel for identification of novel rare variant associations not previously detected in similarly sized genome-wide studies of under-represented African and Hispanic/Latino populations.
Year: 2019 PMID: 31869403 PMCID: PMC6953885 DOI: 10.1371/journal.pgen.1008500
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 6.020
Fig 1. Comparison of imputation reference panels, for variants with MAF < 1%.
Imputation quality (measured by true R2 [Y-axis]) is plotted with progressively more stringent post-imputation filtering from left to right, with filtering according to estimated R2 (X-axis), for variants with MAF < 1%. Top panels are for the JHS cohort and bottom panels for the HCHS/SOL cohort. Three reference panels are shown: TOPMed (TOPMed freeze 5b), 1000G (the 1000 Genomes Phase 3), and HRC (the Haplotype Reference Consortium).
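The quantity plotted in the figure can be illustrated with a minimal sketch. It assumes that true R2 is the squared Pearson correlation between a variant's imputed dosages and its true genotypes (as defined in the table notes), and that the curve is the average true R2 among variants whose estimated R2 passes each cutoff. The function names `true_r2` and `filtering_curve` are hypothetical and are not taken from the authors' pipeline.

```python
from math import isnan


def true_r2(imputed_dosage, true_genotype):
    """Squared Pearson correlation between imputed dosages and true genotypes
    for a single variant (hypothetical helper, not the authors' code)."""
    n = len(imputed_dosage)
    mx = sum(imputed_dosage) / n
    my = sum(true_genotype) / n
    sxx = sum((x - mx) ** 2 for x in imputed_dosage)
    syy = sum((y - my) ** 2 for y in true_genotype)
    if sxx == 0 or syy == 0:
        # Correlation is undefined when either vector is constant.
        return float("nan")
    sxy = sum((x - mx) * (y - my) for x, y in zip(imputed_dosage, true_genotype))
    return sxy ** 2 / (sxx * syy)


def filtering_curve(est_r2, tr_r2, cutoffs):
    """For each estimated-R2 cutoff, average the true R2 of the variants that
    pass the filter (estimated R2 >= cutoff). Undefined true R2 values are skipped."""
    curve = []
    for c in cutoffs:
        kept = [t for e, t in zip(est_r2, tr_r2) if e >= c and not isnan(t)]
        curve.append(sum(kept) / len(kept) if kept else float("nan"))
    return curve
```

Each cutoff in `cutoffs` corresponds to one point along the X-axis, so a curve that rises from left to right means stricter filtering on estimated R2 retains variants with higher true R2.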
Number of well-imputed variants using TOPMed freeze 5b, 1000 Genomes Phase 3 (1000G) and Haplotype Reference Consortium (HRC).
| Imputation reference panel | Total number of variants in reference panel | Well-imputed variants (JHS) | Well-imputed variants (HCHS/SOL) | Well-imputed variants with MAF<0.5% (JHS) | avgTrueR2, MAF<0.5% (JHS) | Well-imputed variants with MAF<0.5% (HCHS/SOL) | avgTrueR2, MAF<0.5% (HCHS/SOL) | Well-imputed variants with MAF<0.05% (JHS) | avgTrueR2, MAF<0.05% (JHS) | Well-imputed variants with MAF<0.05% (HCHS/SOL) | avgTrueR2, MAF<0.05% (HCHS/SOL) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TOPMed5b | 88,062,238 | 51,467,522 | 57,845,194 | 33,355,468 | 89.89% | 44,439,594 | 89.99% | 16,205,279 | 88.64% | 28,230,718 | 75.21% |
| 1000G | 49,143,605 | 28,454,330 | 35,178,969 | 7,857,211 | 73.60% | 19,192,645 | 81.08% | 734,063 | 83.72% | 4,901,159 | 71.39% |
| HRC | 39,635,008 | 21,745,746 | 26,012,190 | 5,488,848 | 67.33% | 13,330,317 | 75.19% | 1,371,526 | 78.77% | 2,637,393 | 67.78% |
HCHS/SOL, Hispanic Community Health Study/Study of Latinos, JHS, Jackson Heart Study, MAF, minor allele frequency
The total number of well imputed variants is extrapolated from three selected 3 Mb regions: the 16-19 Mb region on each of chromosomes 3, 12, and 20. These regions were chosen arbitrarily across a range of chromosome sizes, avoiding centromere, telomere, and low-mappability regions. Imputation was carried out using all typed SNPs within +/-1 Mb (i.e., 15-20 Mb), and quality was evaluated in the core 3 Mb region. Post-imputation quality control was carried out separately in seven MAF categories: <0.05%, 0.05–0.2%, 0.2–0.5%, 0.5–1%, 1–3%, 3–5%, and >5%. In each MAF category, an estimated R2 threshold (estimated R2 is the standard imputation software metric, calculated as the ratio of the observed variance in imputed dosages to the expected variance based on allele frequencies) was selected to ensure that variants above the threshold have an average estimated R2 of at least 0.8. These variants constitute the well imputed variants. For variants with MAF<0.5% and MAF<0.05%, respectively, we additionally assessed avgTrueR2, the average true squared Pearson correlation between imputed genotypes and genotypes from available whole genome sequencing data (JHS) or genotyping array data (HCHS/SOL).
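The per-category threshold selection described above can be sketched as follows. This is one plausible reading under stated assumptions, not the authors' implementation: it picks the most lenient threshold for which the retained variants still average an estimated R2 of at least 0.8, and it assigns a MAF that falls exactly on a category boundary to the higher category. `MAF_EDGES`, `select_r2_threshold`, and `thresholds_by_maf_bin` are hypothetical names.

```python
from bisect import bisect_right

# Boundaries of the seven MAF categories, as fractions:
# <0.05%, 0.05-0.2%, 0.2-0.5%, 0.5-1%, 1-3%, 3-5%, >5%.
# Assumption: a MAF equal to a boundary goes into the higher category.
MAF_EDGES = [0.0005, 0.002, 0.005, 0.01, 0.03, 0.05]


def select_r2_threshold(est_r2_values, target_mean=0.8):
    """Return the smallest threshold t such that variants with estimated
    R2 >= t have a mean estimated R2 >= target_mean, or None if no
    threshold achieves this."""
    values = sorted(est_r2_values, reverse=True)
    best = None
    running_sum = 0.0
    for i, v in enumerate(values, start=1):
        running_sum += v
        # Only evaluate at the end of a run of tied values, because a
        # threshold of v keeps every variant whose estimated R2 equals v.
        is_last_of_tie = (i == len(values)) or (values[i] != v)
        if not is_last_of_tie:
            continue
        if running_sum / i >= target_mean:
            best = v
        else:
            # The running mean of a descending list never increases,
            # so no lower threshold can succeed.
            break
    return best


def thresholds_by_maf_bin(variants, target_mean=0.8):
    """variants: iterable of (maf, estimated_r2) pairs.
    Returns {maf_category_index: threshold or None}."""
    bins = {}
    for maf, r2 in variants:
        bins.setdefault(bisect_right(MAF_EDGES, maf), []).append(r2)
    return {b: select_r2_threshold(v, target_mean) for b, v in sorted(bins.items())}
```

Under this reading, the "well imputed" variants in a category are those whose estimated R2 is at or above the returned threshold. Rare-variant categories typically need a stricter threshold, because a larger share of their variants have low estimated R2.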
Imputation quality for rare variants (minor allele count ≤ 10) in the Jackson Heart Study (JHS).
| JHS MAC | #Variants | #QC+ | avgMAC | avgMAC_QC+ | avgEstR2 | avgTrueR2 |
|---|---|---|---|---|---|---|
| 1 | 8,673,112 | 6,236,211 | 29.3 | 31.0 | 86.9% | 92.0% |
| 2 | 5,488,071 | 4,502,844 | 37.2 | 39.0 | 89.0% | 86.7% |
| 3 | 3,865,676 | 3,304,749 | 46.7 | 48.4 | 90.3% | 86.2% |
| 4 | 2,786,048 | 2,425,855 | 59.1 | 60.9 | 91.1% | 86.4% |
| 5 | 2,058,252 | 1,809,190 | 73.7 | 75.8 | 91.6% | 86.9% |
| 6 | 1,570,124 | 1,377,280 | 91.0 | 93.9 | 92.0% | 87.5% |
| 7 | 1,223,738 | 1,088,972 | 110.3 | 112.4 | 92.3% | 88.1% |
| 8 | 992,012 | 890,572 | 127.3 | 129.0 | 92.5% | 88.5% |
| 9 | 836,222 | 753,584 | 145.7 | 147.4 | 92.8% | 89.1% |
| 10 | 713,541 | 643,909 | 163.4 | 165.0 | 93.0% | 89.5% |
MAC, minor allele count, #Variants, total number of variants with a given MAC in JHS which overlapped with the TOPMed freeze 5b reference panel, QC+, number of these variants which passed imputation quality control, avgMAC, the average minor allele count in the (TOPMed freeze 5b minus JHS) reference panel of these variants, avgMAC_QC+, the average minor allele count in the (TOPMed freeze 5b minus JHS) reference panel of variants which passed imputation quality control. avgEstR2, average estimated R2 for imputed variants (standard imputation software metric calculated based on the ratio of observed variance in imputed dosages over expected variance based on allele frequencies), avgTrueR2, average true squared Pearson correlation between imputed genotypes and genotypes from available whole genome sequencing data. Variants that did not have a MAC>5 in the full TOPMed freeze 5b reference panel were not evaluated.
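The estimated R2 metric described in the note above (observed variance of imputed dosages divided by the variance expected from the allele frequency) can be illustrated with a simplified sketch. It assumes genotype-level dosages in [0, 2] and an expected variance of 2p(1-p) under Hardy-Weinberg equilibrium. Imputation software may compute the metric differently in detail (for example, from haplotype-level dosages), so this is not the exact formula of any particular tool, and `estimated_r2` is a hypothetical name.

```python
def estimated_r2(dosages):
    """Simplified estimated R2 for one variant: variance of the imputed
    genotype dosages divided by 2p(1-p), where p is the allele frequency
    implied by the dosages. Assumes Hardy-Weinberg equilibrium."""
    n = len(dosages)
    mean = sum(dosages) / n
    p = mean / 2.0                      # estimated allele frequency
    expected_var = 2.0 * p * (1.0 - p)  # variance if genotypes were known exactly
    if expected_var == 0:
        return float("nan")             # monomorphic: the ratio is undefined
    observed_var = sum((d - mean) ** 2 for d in dosages) / n
    return observed_var / expected_var
```

When every dosage is a confident 0, 1, or 2 the ratio is close to 1. When the imputation is uncertain, dosages shrink toward the mean, the observed variance falls, and the ratio drops toward 0. This is why avgEstR2 can differ from avgTrueR2 in the table: the former is computed without reference to true genotypes.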
Novel variants detected in TOPMed freeze 5b imputed Hispanic/Latino and African ancestry cohorts, in association analyses with white blood cell count, hemoglobin, and hematocrit.
| Ancestry | rs# | Estimated R2 | Phenotype | Effect allele | EAF | β | SE | P-value | Replication β | Replication P-value | Gene | Annotation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| African ancestry | rs33930165 | 0.831–0.994 | WBC | T | 1.14% | 0.35 | 0.04 | 8.8E-15 | 0.27 | 4.6E-04 | HBB | missense (p.Glu7Lys) |
| Hispanic/Latino | rs11549407 | 0.862–1.000 | HCT | A | 0.03% | -1.66 | 0.27 | 8.8E-10 | NA | NA | HBB | stop gain (p.Gln40Ter) |
| Hispanic/Latino | rs11549407 | 0.862–1.000 | HGB | A | 0.03% | -1.92 | 0.27 | 1.5E-12 | NA | NA | HBB | stop gain (p.Gln40Ter) |
EAF, effect allele frequency, HCT, hematocrit, HGB, hemoglobin, WBC, white blood cell count.
Imputation R2 (estimated R2) range reported across all included imputed cohorts.
Association results adjusted for nearby known SNPs whenever applicable.
Association models for rs33930165 were adjusted for SNP rs2814778; removing potential minor allele homozygotes
Association models for rs11549407 were adjusted for SNPs rs334, rs33930165, and rs2213169. rs334 and rs2213169 did not pass variant quality filters in TOPMed freeze 5b and were not included in our main analyses. However, to follow up our novel results in the HBB locus, we phased the failed variants in freeze 5b and performed targeted imputation using TOPMed freeze 5b calls for rs334 and rs2213169.
NA: among TOPMed freeze 5b Hispanic/Latino individuals, MAC = 1, so association statistics are not available.