Yuanjing Ma¹, Hongmei Jiang¹, Sanjiv J Shah², Donna Arnett³, Marguerite R Irvin⁴, Yuan Luo².
Abstract
In this work, we proposed a process for selecting informative genetic variants in order to identify clinically meaningful subtypes of hypertensive patients. We studied 575 African American (AA) and 612 Caucasian hypertensive participants enrolled in the Hypertension Genetic Epidemiology Network (HyperGEN) study and analyzed each race-based group separately. All participants were genotyped for genome-wide association studies (GWAS) and underwent echocardiography. We applied a variety of statistical methods and filtering criteria, including generalized linear models, F statistics, burden tests, and deleterious-variant filtering, to select the most informative hypertension-related genetic variants. We then applied an unsupervised learning algorithm, non-negative matrix factorization (NMF), to identify hypertension subtypes with similar genetic characteristics, and used Kruskal–Wallis tests to assess whether these genetic-based subtypes are clinically meaningful. Two subgroups were identified in both the African American and the Caucasian HyperGEN participants. In both AAs and Caucasians, indices of cardiac mechanics differed significantly between hypertension subtypes. African Americans tend to carry more genetic variants than Caucasians, which makes distinguishing disease subtypes from genetic information relatively challenging in this group; nevertheless, the proposed process identified two subtypes whose cardiac mechanics have statistically different distributions. This research suggests a promising direction for using statistical methods to select genetic information and identify disease subgroups, which may inform the development and trial of novel targeted therapies.
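The abstract names generalized linear models as a selection step, and the tables label filters "Logistic (0.05)" and "Logistic (0.1)". One reading, assumed here and not confirmed by the text, is a per-variant logistic regression of a binary outcome on additively coded genotype (0/1/2 minor-allele counts), keeping variants whose Wald p-value falls below 0.05 or 0.1. The NumPy sketch below fits no covariates; the outcome, covariates, and test actually used are not stated, and the function names are hypothetical.

```python
import math
import numpy as np

def logistic_wald_pvalue(g, y, n_iter=25):
    """Fit logit P(y=1) = b0 + b1*g by Newton-Raphson (IRLS) and return the
    two-sided Wald p-value for b1."""
    X = np.column_stack([np.ones(len(g)), np.asarray(g, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        info = X.T @ (X * w[:, None])  # Fisher information X^T W X
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    z = beta[1] / math.sqrt(cov[1, 1])
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))

def screen_variants(G, y, alpha=0.05):
    """Return indices of columns of G (patients x variants, 0/1/2 coding)
    whose per-variant logistic Wald p-value is below alpha."""
    keep = []
    for j in range(G.shape[1]):
        g = G[:, j]
        if np.all(g == g[0]):
            continue  # monomorphic variant: no information, and the fit would be singular
        if logistic_wald_pvalue(g, y) < alpha:
            keep.append(j)
    return keep
```

Passing `alpha=0.1` instead of `0.05` mimics the looser "Logistic (0.1)" filter, which admits more variants; in the tables below, for example, the deleterious-variant rows for African Americans list 735 variants at 0.1 versus 370 at 0.05.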
Keywords: NMF; clustering algorithm; hypertension; subtype identification; variable selection
Year: 2020 PMID: 33121163 PMCID: PMC7693873 DOI: 10.3390/genes11111265
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Flow chart of the procedure to select informative variants and identify hypertension subgroups.
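The final step of this procedure groups patients with NMF, which factors a non-negative patients × variants matrix V into W H with a small rank k. Below is a minimal NumPy sketch using Lee–Seung multiplicative updates for the squared Frobenius loss, assuming genotypes coded as 0/1/2 minor-allele counts (non-negative, as NMF requires) and assigning each patient to the factor with the largest weight in its row of W. The paper's NMF variant, loss, initialization, and rank-selection procedure are not described here; k = 2 simply mirrors the two subgroups reported, and the function names are hypothetical.

```python
import numpy as np

def nmf(V, k, n_iter=1000, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (patients x features) as W @ H
    with Lee-Seung multiplicative updates for the squared Frobenius loss."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Strictly positive random start: an entry that reaches exactly zero stays zero.
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def assign_subtypes(V, k=2, **kwargs):
    """Label each patient (row of V) with the index of its dominant factor in W."""
    W, _ = nmf(np.asarray(V, dtype=float), k, **kwargs)
    return np.argmax(W, axis=1)
```

Because NMF solutions depend on the random start, one would typically repeat the fit over several seeds and check that the subgroup assignments are stable before comparing phenotypes between them.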
Summary of the number of variants in each functional class for African American patients, annotated using three gene annotation databases.
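| Function (AA, SNP array variants) | ANNOVAR gene-based annotation, database 1 | ANNOVAR gene-based annotation, database 2 | ANNOVAR gene-based annotation, database 3 |
|---|---|---|---|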
| nonsynonymous SNP | 10,554 | 11,073 | 11,404 |
| stopgain | 90 | 110 | 117 |
| stoploss | 12 | 17 | 17 |
| synonymous SNP | 17,802 | 18,231 | 18,511 |
| unknown | 553 | 10 | 9 |
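The abstract's "deleterious variant filtering" criterion is not specified in this record. One crude proxy, assumed purely for illustration, is to keep protein-altering classes (nonsynonymous SNP, stopgain, stoploss) and drop synonymous and unknown ones; the paper may instead rely on functional-prediction scores. The sketch applies that rule both to the counts in the table above and to hypothetical `(variant_id, function)` records; all names are hypothetical.

```python
# Classes treated as protein-altering in this sketch (an assumption, not the paper's rule).
PROTEIN_ALTERING = {"nonsynonymous SNP", "stopgain", "stoploss"}

# Counts copied from the table above (AA cohort), one entry per annotation database.
COUNTS = {
    "nonsynonymous SNP": [10554, 11073, 11404],
    "stopgain": [90, 110, 117],
    "stoploss": [12, 17, 17],
    "synonymous SNP": [17802, 18231, 18511],
    "unknown": [553, 10, 9],
}

def protein_altering_totals(counts):
    """Per annotation database, sum the counts of the protein-altering classes."""
    n_cols = len(next(iter(counts.values())))
    return [
        sum(v[c] for cls, v in counts.items() if cls in PROTEIN_ALTERING)
        for c in range(n_cols)
    ]

def keep_protein_altering(variants):
    """Keep the ids of (variant_id, function) pairs whose class is protein-altering."""
    return [vid for vid, func in variants if func in PROTEIN_ALTERING]
```

Applied to the table, `protein_altering_totals(COUNTS)` gives 10,656, 11,200, and 11,538 protein-altering variants for the three databases.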
Description of the 9 phenotypic/outcome variables used to assess the clinical significance of the genetic-based hypertension subtypes.
| Abbreviation | Full Name | Description |
|---|---|---|
| elateral | Lateral e’ velocity | Left ventricular early diastolic relaxation velocity, |
| eseptal | Septal e’ velocity | Left ventricular early diastolic relaxation velocity, |
| gcs | Global circumferential strain | Left ventricular circumferential strain, |
| gls | Global longitudinal strain | Left ventricular longitudinal strain, |
| grs | Global radial strain | Left ventricular radial strain, |
| sr_a | Strain rate, atrial | Left ventricular late (atrial) diastolic strain rate, |
| sr_e | Strain rate, early diastolic | Left ventricular early diastolic strain rate, |
| sr_s | Strain rate, systolic | Left ventricular systolic strain rate, |
| sseptal | Septal s’ velocity | Left ventricular systolic longitudinal velocity, |
| slateral | Lateral s’ velocity | Left ventricular systolic longitudinal velocity, |
Summary of Kruskal–Wallis test p-values for each phenotypic variable under different combinations of filters.
| Methods | elateral | eseptal | gcs | gls | grs | sr_a | sr_e | sr_s | sseptal |
|---|---|---|---|---|---|---|---|---|---|
| Logistic (0.05) + F + Rare + geneaggr | 0.11 | 0.97 | 0.80 | ||||||
| Logistic (0.05) + F + All + geneaggr | 0.51 | 0.32 | 0.18 | 0.11 | 0.16 | ||||
| Logistic (0.05) + del + geneaggr | 0.97 | 0.68 | 0.73 | 0.76 | 0.19 | 0.62 | 0.11 | ||
| Logistic (0.05) + del | 0.76 | 0.30 | 0.42 | 0.74 | 0.48 | 0.64 | 0.29 | ||
| Logistic (0.1) + del + geneaggr | 0.49 | 0.11 | 0.97 | 0.16 | 0.72 | 0.13 | 0.29 | 0.30 | 0.39 |
| Logistic (0.1) + del | 0.77 | 0.82 | 0.89 | 0.65 | 0.21 | 0.45 | 0.15 | ||
[a] The first row lists the names of the 9 phenotypic variables; [b] “Rare”: only rare variants are used; [c] “geneaggr”: variants within the same genetic region are aggregated; [d] “All”: all variants are used; [e] “del”: only deleterious variants are used.
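Each p-value in the table compares one phenotype between the two genetic subgroups with a Kruskal–Wallis test. A minimal pure-Python sketch of the two-group case follows; the function name is hypothetical, it uses average ranks for ties with the standard tie correction, and it relies on the usual chi-square approximation (1 degree of freedom for two groups), which suits group sizes like those reported here but not very small samples. In practice a library routine such as `scipy.stats.kruskal` would normally be used.

```python
import math

def kruskal_two_groups(a, b):
    """Kruskal-Wallis H test comparing two groups (e.g. two NMF subgroups).

    Returns (H, p), with p from the chi-square approximation with 1 df.
    """
    # Pool the observations, remembering which group (0 or 1) each came from.
    pooled = sorted((x, g) for g, xs in enumerate((a, b)) for x in xs)
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values pooled[i..j].
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # mean of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[t] = avg_rank
        size = j - i + 1
        tie_term += size ** 3 - size
        i = j + 1
    if tie_term == n ** 3 - n:
        raise ValueError("all observations are tied; H is undefined")
    rank_sums = [0.0, 0.0]
    for (_, g), r in zip(pooled, ranks):
        rank_sums[g] += r
    sizes = (len(a), len(b))
    h = 12.0 / (n * (n + 1)) * sum(rs ** 2 / m for rs, m in zip(rank_sums, sizes)) - 3.0 * (n + 1)
    # Tie correction; clamp tiny negative rounding error to zero.
    h = max(h / (1.0 - tie_term / (n ** 3 - n)), 0.0)
    # Survival function of chi-square with 1 df: P(X > h) = erfc(sqrt(h / 2)).
    p = math.erfc(math.sqrt(h / 2.0))
    return h, p
```

With only two groups, this test is equivalent to a two-sided Wilcoxon rank-sum (Mann–Whitney) test using the normal approximation without continuity correction.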
The first two data columns give, for each combination of filters, the number of phenotypic variables that differ significantly between subgroups, with the number of variants selected for the clustering analysis in parentheses. The last two columns give the number of hypertensive patients in each of the two subgroups under each combination of filters.
| Methods | Significant phenotypic variables (variants), African American | Significant phenotypic variables (variants), Caucasian | Hypertensive patients per subgroup, African American | Hypertensive patients per subgroup, Caucasian |
|---|---|---|---|---|
| Logistic (0.05) + F + Rare + geneaggr | 6 (472) | 0 | ||
| Logistic (0.05) + F + All + geneaggr | 4 (12,555) | 6 (675) | 430/145 | 583/29 |
| Logistic (0.05) + del + geneaggr | 2 (370; 339) | 6 (217; 213) | 318/257 | 485/127 |
| Logistic (0.05) + del | 2 (370) | 6 (217) | 445/130 | 485/127 |
| Logistic (0.1) + del + geneaggr | 0 (735; 643) | 6 (467; 379) | 398/177 | 486/126 |
| Logistic (0.1) + del | 2 (735) | 7 (467) | 273/302 | 484/128 |
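Rows marked geneaggr aggregate variants within the same genetic region before clustering. The sketch below shows one common burden-style collapse, assumed here rather than taken from the paper: sum each patient's minor-allele counts over all variants mapped to the same gene. The variant-to-gene mapping and gene labels are hypothetical inputs. Summing keeps the matrix non-negative, so the aggregated matrix remains a valid input for NMF.

```python
import numpy as np

def aggregate_by_gene(G, genes):
    """Collapse a patients x variants genotype matrix G into a patients x genes
    matrix by summing minor-allele counts over the variants of each gene.

    genes[j] is the gene (or region) label of column j of G.
    Returns the aggregated matrix and the sorted list of gene labels.
    """
    names = sorted(set(genes))
    col = {name: i for i, name in enumerate(names)}
    A = np.zeros((G.shape[0], len(names)), dtype=G.dtype)
    for j, name in enumerate(genes):
        A[:, col[name]] += G[:, j]
    return A, names
```

For example, two patients with genotypes `[[0, 1, 2], [1, 0, 0]]` over variants labeled `GENE_B`, `GENE_A`, `GENE_B` collapse to `[[1, 2], [0, 1]]` over the columns `GENE_A`, `GENE_B`.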
Figure 2. Box plots of the phenotypic/outcome variables whose distributions differ significantly between the two clusters in the African American cohort. The p-values are 0.0205 for gls and 0.0141 for sr_s.
Figure 3. Box plots of the phenotypic/outcome variables whose distributions differ significantly between the two clusters in the Caucasian cohort. The p-values for eseptal, elateral, gls, sr_a, sr_s, and sseptal are 0.0135, 0.0465, 0.0246, 0.0151, 0.0774, and 0.0879, respectively.