Literature DB >> 35602224

Estimating heritability of glycaemic response to metformin using nationwide electronic health records and population-sized pedigree.

Iris N Kalka^1,2, Amir Gavrieli^1,2, Smadar Shilo^1,2,3, Hagai Rossman^1,2, Nitzan Shalom Artzi^1,2, Nancy-Sarah Yacovzada^1,2,4, Eran Segal^1,2.

Abstract

Background: Variability of response to medication is a well-known phenomenon, determined by both environmental and genetic factors. Understanding the heritable component of the response to medication is of great interest but challenging due to several reasons, including small study cohorts and computational limitations.
Methods: Here, we study the heritability of variation in the glycaemic response to metformin, first-line therapeutic agent for type 2 diabetes (T2D), by leveraging 18 years of electronic health records (EHR) data from Israel's largest healthcare service provider, consisting of over five million patients of diverse ethnicities and socio-economic background. Our cohort consists of 80,788 T2D patients treated with metformin, with an accumulated number of 1,611,591 HbA1C measurements and 4,581,097 metformin prescriptions. We estimate the explained variance of glycated hemoglobin (HbA1c%) reduction due to inheritance by constructing a six-generation population-size pedigree from national registries and linking it to medical health records.
Results: Using Linear Mixed Model-based framework, a common-practice method for heritability estimation, we calculate a heritability measure of h 2 = 12.6 % (95% CI, 6.1 % - 19.1 % ) for absolute reduction of HbA1c% after metformin treatment in the entire cohort, h 2 = 21.0 % (95% CI, 7.8 % - 34.4 % ) for males and h 2 = 22.9 % (95% CI, 10.0 % - 35.7 % ) in females. Results remain unchanged after adjusting for pre-treatment HbA1c%, and in proportional reduction of HbA1c%. Conclusions: To the best of our knowledge, our work is the first to estimate heritability of drug response using solely EHR data combining a pedigree-based kinship matrix. We demonstrate that while response to metformin treatment has a heritable component, most of the variation is likely due to other factors, further motivating non-genetic analyses aimed at unraveling metformin's action mechanism.

Entities: Chemical

Keywords: Heritable quantitative trait; Type 2 diabetes

Year: 2021 PMID： 35602224 PMCID： PMC9053254 DOI： 10.1038/s43856-021-00058-4

Source DB: PubMed Journal: Commun Med (Lond) ISSN： 2730-664X

Introduction

During the past three decades, there has been a twofold increase diabetes prevalence in the general population (WHO), currently estimated to afflict one in every 16 adults[1]. Type 2 diabetes (T2D), which accounts for ~90% of the total diabetic population, is a major cause of morbidity and is among the top ten mortality causes in adults[2,3]. Metformin is the first-line oral agent for lowering blood sugar levels in T2D patients. Through inhibition of hepatic glucose production it reduces intestinal glucose absorption, and improves both glucose uptake and its utilization[4]. The significant role of metformin in T2D management is particularly remarkable since its mechanism is still not fully understood[5-7]. Glycaemic response to metformin is varied across patients[6,7], and remains unexplained by individual features. Some variation can be accounted for by personal characteristics including sex, age, and BMI, as well as features describing treatment strategies such as dosage and adherence[8]. In addition, a small fraction of the response variability is attributed to genetic variants, providing motivation to further explore heritable variance in metformin response[9]. Medication response variations are widely agreed upon to be determined by the interplay of environmental and genetic factors[10,11]. The effect of heritable factors has been suggested as early as 1908[12]. This notion led to the development of pharmacogenomics, which investigates genetic variants that account for differential drug responses and personal responses to treatments[13]. Traditionally heritability estimates are deciphered through twins and family studies, however, those are difficult to construct in the context of medication response. Drug response data, same diagnosis, and similar treatment are rarely available in multiple family members[14,15]. Moreover, because close relatives often share environment and not only genetics, such studies have difficulties in separating the genetic and environmental effects. Other types of studies estimating the effect of genetic variability in drug responses rely on small cohorts undergoing costly genetic tests and use genetic relatedness estimation methods[16-19]. Some of these studies employ methods such as genome-wide complex trait analysis, which requires a large cohort, ideally greater than 10,000, however, most such cohorts are limited in their size resulting in estimates with low statistical power and do not represent the true distribution of the population[20,21]. In studies bypassing genetic tests, such as family-linkage studies, information is highly sparse, and determining the response to medication by genetic and environmental factors is computationally challenging. Epigenetics may also play a role in the response to medication making the task even harder[22]. Metformin’s effect is routinely measured through glycaemic control assessments using either fasting glucose or HbA1c%[23]. The latter is an indicator of blood sugar levels over the course of three months[24], making it more reliable than the former, which is a snapshot of a single time point. Moreover, fasting glucose is affected by the strictness of fasting prior to the blood tests, an unrecorded measure, making fasting glucose more prone to mistakes. In this study, we used Electronic Health Records (EHRs) from the Clalit Healthcare database, Israel’s largest healthcare service provider[25]. This population-size EHR provides a real-world view of the internal variability in healthcare systems, where patients, diagnoses, and treatment plans vary considerably. In general, EHRs can contain medical information on millions of patients, however, data are sparse and noisy, and not cross-sectional[26]. Combined with pedigree information from Israel’s national registries this unique data allowed us to include the family medical history of first-order relatives and extended family members alike. Today, heritability estimation is typically performed using genotyping-based methods such as LD Score Regression from GWAS results[27]. Such models consider a matrix of standardized genotypes, estimating the heritability from the effects of the genetic variates that are accounted for. An alternative method was presented using Sparse Cholesky Factorization (Sci-LMM) package[28], a statistical modeling framework for analyzing population-size pedigrees. Sci-LMM replaces the genetic matrix in a Linear Mixed Model (LMMs)[29] with a vector that is sampled from the normal multi-dimensional distribution whose covariance matrix is a kinship matrix. The kinship matrix, commonly computed from genetic information, can be constructed from pedigree relationships solely from EHR information, without costly genetic testing. We estimated the heritability of absolute HbA1c% reduction in response to metformin to be (95% CI, 6.1–19.1%) for the entire cohort, (95% CI, 7.8–34.4%) for males and (95% CI, 10.0–35.7%) in females of the total explained variability.

Methods

Data

We used EHRs of Clalit Health Services (Clalit), Israel’s largest healthcare provider. Clalit’s data are heterogeneous in terms of geography and socioeconomics, including more than five million people (over half of Israel’s population) with longitudinal measurements dating back to 2002. EHRs are reflective of the members’ full clinical experience including diagnoses, lab test results, and medication prescribed and dispensed. Patients’ information is combined with national registries to provide demographics consisting of the date of birth, sex, parental information, and county of birth, from which ethnicity is inferred[30]. The full-study protocol was approved by the Clalit Helsinki Committee 0195-17-COM2, with exemption from informed consent as the study, is observational and used de-identified data.

Pedigree and kinship matrix construction

We obtained pedigree information through demographics of past and present patients as well as their parents, and then excluded cases where parental relationships and sex contradicted (e.g., a female father). We converted the entire pedigree to a directed graph using NetworkX[31], where nodes and edges corresponded to individuals and to parenthood respectively, and removed all edges of directed cycles, as these are not feasible[32]. Heritability estimates require a kinship matrix, also known as an Additive Relationship Matrix (ARM)[33], measuring the proportion of identical alleles between pairs of individuals. We approximated the ARM solely from pedigree information, under the assumption that alleles distribute uniformly, meaning each gene has an identical probability to be passed on[34]. For every pair of individuals and a unique shortest path between them through a shared ancestor, we increased their similarity by where is the number of edges in the path (Supplementary Fig. 1a–c). We decided against removing first-degree relatives in heritability estimates. Although some studies suggest it reduces estimation bias, we found it less relevant to our case[35].

Identification of T2D patients

In Israel, T2D is diagnosed based on plasma glucose criteria, in accordance with The American Diabetes Association standard of care[36]. Meeting any of the following criteria is sufficient for T2D diagnosis: (1) random plasma glucose mg/dL; (2) HbA1c% ; (3) two separate test samples of fasting plasma glucose mg/dL following no caloric intake for at least 8 h; (4) plasma glucose mg/dL 2 h after oral glucose-tolerance test (OGTT). Note that although fasting glucose could be used in the diagnosis of T2D, data is inaccurate as some non-fasting patients take the test as well. Also, OGTT tests are not performed regularly in clinics, making us disregard the corresponding criterion. Due to the nature of the Israeli healthcare system, it is a possibility that an individual was diagnosed with diabetes based on tests unavailable in our database (e.g., in hospital). Therefore, in addition to identifying T2D patients through test results, we made use of diagnoses data. Including all patients diagnosed with T2D according to the appropriate International Classification of Diseases, Ninth Revision (ICD-9) codes[37] (Supplementary Table 1 and Supplementary Fig. 2).

Cohort definition

Our cohort constitutes T2D patients treated for diabetes with metformin only after a diabetes diagnosis. We identified those from drug prescriptions with the fifth level Anatomical Therapeutic Chemical (ATC)[38] code of “A10BA02”. We defined the first metformin prescription date for every patient as index date, yielding a single unique date per individual by which all other dates were measured. We identified faulty metformin prescriptions consisting of more than three pills per day, and removed information from these prescriptions. We removed from our cohort individuals where the first metformin prescription was faulty. To establish glycaemic response to metformin we used HbA1c% blood concentration before and after metformin treatment initiation (Supplementary Fig. 1d). We defined baseline (pretreatment) HbA1c% as the latest test occurring 90 days prior to 14 days post index date. This interval was chosen in order to ensure a balance between measurements being within a red blood cell life cycle and metformin’s onset of action, which is within 2 weeks[39]. To ensure stability of results, we estimate heritability on several baseline time intervals for the entire cohort (Supplementary Table 2). We define the on-treatment HbA1c% as the closest test to the index date that is at least 90 days from both index date and baseline HbA1c% date, indicating hemoglobin turning rate. We discarded on-treatment HbA1c% tests later than 180 days from index date, as those are confounded by unmeasured variables. We defined the study participation period as the time from index date or baseline measurement date, whichever preceded, until the on-treatment measurement date. We ensured measuring the effect of metformin and eliminated cases of initial non-adherence by further screening patients who were treated throughout the entire study participation period[40]. We removed all patients who stopped metformin treatment before on-treatment HbA1c% test or who started taking metformin before being diagnosed with T2D, the majority of which were prescribed metformin while already diagnosed as pre-diabetic. We also exclude all patients who are prescribed any other anti-diabetic medication (ATC level 2 code of ‘A10’) apart from metformin to ensure the effect on HbA1c% levels can be attributed solely to metformin. We further removed all patients who were diagnosed with type 1 diabetes according to ICD-9 codes (Supplementary Table 3). In addition, we excluded individuals with abnormal estimated Glomerular Filtration Rate (eGFR) who should not be treated according to medical guidelines[41]. GFR is estimated using creatinine blood tests and reflects renal clearance and total clearance, which after oral administration of metformin decrease approximately in proportion to it[42] (Table 1).

Table 1

Inclusion exclusion criteria.

Inclusion	Exclusion
Type 2 diabetic treated with metformin only after diagnosis.	Abnormal estimated glomerular filtration rate (eGFR<30 mL/min/1.73 m²)
Baseline HbA1c% test exists 90 days prior to 30 days after first metformin prescription	Treated for diabetes with non-metformin drugs (ATC2 is “A10”)
On-treatment HbA1c% test exists 90–180 days after first metformin prescription	Baseline HbA1c% and on-treatment HbA1c% tests <90 days apart

Inclusion exclusion criteria.

Glycaemic response outcomes

We defined three phenotypes commonly used in metformin pharmacogenetics studies for measuring the response to metformin; absolute, proportional, and adjusted reduction in HbA1c%[16]. These were induced from the difference between the baseline and the on-treatment HbA1c% tests. The absolute reduction was defined as the absolute difference between on-treatment and baseline HbA1c%, proportional reduction was defined as the absolute reduction divided by the baseline HbA1c%. We trained a linear model to predict absolute reduction from pretreatment HbA1c% measurement, the number of days between pretreatment and on-treatment HbA1c% measurement dates and average metformin dose during the study (see further explanation below). The adjusted reduction was defined as the residuals from the linear model’s predicted phenotype to the true absolute reduction values. Since Linear Mixed Models assume normal distribution we performed the Kolmogorov–Smirnov goodness of fit test for all three phenotypes[43,44].

Height outcome

Being that the heritability estimate of height is well established and agreed upon in the literature; we used it as a positive control to validate our methods and data. We gathered height measurements recorded at adulthood (age 18 years). For patients who had multiple measurements, we considered the latest measurement only. We removed outlier measurements where Z score>4.

Heritability estimation

We computed heritability with the Sci-LMM Python package, which constructs and works with large-scale relationships matrices and fits them to the corresponding LMM within several hours. Our Identity By Descent (IBD) matrix (an identity-by-descent relationships-based matrix) was the ARM computed from the entire pedigree[28]. We used Haseman-Elston regression to compute the heritability measure , and we estimate the standard error via the average information restricted maximum likelihood (AI-REML) procedure[45,46]. We constructed the following features used either as covariates for our regression model or as means of subsampling the cohort: Demographics: Year of birth Age at index date Gender BMI: note that since is considered heritable we did not use it as a covariate in our regression. Measurements’ metadata: Baseline to index gap: number of days between baseline date to index date Index to on-treatment gap: number of days between index date and on-treatment date Baseline to on-treatment gap: number of months between on-treatment date and baseline date. Note that due to co-linearity with the two previous covariates, this covariate was not in use. Number of HbA1c% tests: the absolute number of HbA1c% tests performed up until the on-treatment date Lab test measurements: Estimated glomerular filtration rate (eGFR): We used MDRD GFR Equation:[47] where value is multiplied by 0.742 for females. Baseline HbA1c% Treatment metadata: Average dosage: weighted average of metformin doses where is the number of pills per day prescribed in prescription , and is the number of pills in prescription . Only issued prescriptions were accounted for. Adherence: since adherence is not reported, we capture it through four features representing the average number of days on metformin in four equal consecutive time intervals between index date and on-treatment date. We assumed that all dispensed prescriptions were also consumed by patients. In order to identify environmental variance, we computed the explained variance from only the covariates. We trained a linear regressor from covariates predicting absolute reduction. We had then computed the Pearson’s correlation of predicted and true reduction as well as the score.

Predicting outcome

We assessed the predictive potential family history could give to treatments, we predicted responses to metformin from both covariates and family information. We constructed family history features for each individual by computing mean absolute reduction from relatives. We computed four features considering either all relatives or only relatives from the same gender as the individuals, and taking only first-degree relatives or all available relatives. We predicted on-treatment HbA1c% for the entire cohort with the above-mentioned covariates, excluding adherence, as it is only available while on-treatment. Predictions were performed using XGBoost regression with 100-fold cross-validation and n_estimators=20[48]. We computed the mean squared error (MSE) on predicted outcome for the entire cohort as well as for only individuals who have any relatives within the cohort. We also predicted for these individuals the outcome using both covariates and family history features.

Statistics and reproducibility

All statistics were performed using Python 3.7 software. Statistical significance was determined by mentioned unpaired tests using Scipy 1.3.1. Experiments are reproducible with existing EHR records; however, this work was performed on data from Clalit Health Services which is not publicly available. Sample sizes are defined within the work.

Table 2

Baseline characteristics of the study cohort.

	Gender	Mean	STD	Median	FDR-corrected P value*	Availability, %
Demographics
Age, years	F	61.65	12.38	62	2E-89	100
Age, years	M	59.9	12.38	60	2E-89	100
BMI, Kg/m²	F	32.41	7.02	31.54	<1E-350	82.95
BMI, Kg/m²	M	30.36	6.04	29.65	<1E-350	83.17
Gender (is male)	Joint	0.49	0.5	0		100
Year of birth	F	1948.78	13.15	1949	1E-106	100
Year of birth	M	1950.82	13.18	1951		100
Lab tests measurements
Baseline HbA1c%	F	7.34	1.34	7	4E-290	100
Baseline HbA1c%	M	7.73	1.64	7.2	4E-290	100
eGFR**	Joint	93.54	24.5	90.95		87.47
Measurements’ metadata
Number of HbA1c% measurements	F	6.01	4.54	5	2E-27	100
Number of HbA1c% measurements	M	5.67	4.39	4	2E-27	100
Base to index gap, days	F	18.93	21.55	11	1E-42	100
Base to index gap, days	M	16.89	20.69	9	1E-42	100
Base to on-treatment gap, days	F	150.09	34.69	147	1E-29	100
Base to on-treatment gap, days	M	147.33	34.25	145	1E-29	100
Index to on-treatment gap, days	F	131.16	26.53	129	0.0002	100
Index to on-treatment gap, days	M	130.45	26.56	128	0.0002	100
Treatment metadata
Adherence 1, %	F	92.84	12.06	100	0.002	100
Adherence 1, %	M	93.11	11.92	100	0.002	100
Adherence 2, %	Joint	58.24	38.84	70.73	0.02	100
Adherence 3, %	F	55.69	40.4	69.05	0.02	100
Adherence 3, %	M	55.01	40.29	66.67		100
Adherence 4, %	F	53.9	40.56	65.52	0.0002	100
Adherence 4, %	M	52.8	40.42	62.5	0.0002	100
Average dose, Mg	F	1.32	0.52	1	2E-107	100
Average dose, Mg	M	1.4	0.56	1	2E-107	100

*P value for in ttest comparing male and female distributions. Appears only when the difference in distributions is significant.

**eGFR is in units defined in the formula above.

Availability of “Average dose while taking” and “Adherence” was calculated through the percent of available prescriptions, which were used to generate these features.

Table 3

Cohort Phenotypes Statistics.

	Gender	Mean	Standard deviation	Median	FDR-corrected P value
Height, m	F	1.62	0.07	1.62	<E-350
	M	1.75	0.07	1.75	<E-350
	Joint	1.68	0.10	1.68
HbA1c% absolute reduction	F	0.67	1.20	0.40	7E-308
	M	1.04	1.57	0.60	7E-308
	Joint	0.85	1.40	0.50
HbA1c% adjusted reduction	F	−0.16	0.92	−0.21	5E-121
	M	0.01	1.12	−0.09	5E-121
	Joint	−0.08	1.03	−0.15
HbA1c% proportional reduction	F	0.08	0.12	0.06	1E-303
	M	0.11	0.15	0.08	1E-303
	Joint	0.09	0.14	0.07

Table 4

Heritability estimates of metformin responses.

Cut	HbA1c absolute reduction		HbA1c-adjusted reduction		HbA1c proportional reduction		Number of patients
Cut	h²	CI	h²	CI	h²	CI	Number of patients
All cohort	0.126	[0.061, 0.191]	0.126	[0.061, 0.191]	0.138	[0.073, 0.203]	80,788
Male	0.21	[0.078, 0.343]	0.21	[0.078, 0.343]	0.232	[0.099, 0.364]	39,335
Female	0.229	[0.1, 0.357]	0.229	[0.1, 0.357]	0.223	[0.094, 0.351]	41,453

h2 estimates and their confidence intervals of our cohort including different subgroups.

49 in total

1. GCTA: a tool for genome-wide complex trait analysis.

Authors: Jian Yang; S Hong Lee; Michael E Goddard; Peter M Visscher
Journal: Am J Hum Genet Date: 2010-12-17 Impact factor: 11.025

2. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies.

Authors: Brendan K Bulik-Sullivan; Po-Ru Loh; Hilary K Finucane; Stephan Ripke; Jian Yang; Nick Patterson; Mark J Daly; Alkes L Price; Benjamin M Neale
Journal: Nat Genet Date: 2015-02-02 Impact factor: 38.330

3. Prediction of gestational diabetes based on nationwide electronic health records.

Authors: Nitzan Shalom Artzi; Smadar Shilo; Eran Hadar; Hagai Rossman; Shiri Barbash-Hazan; Avi Ben-Haroush; Ran D Balicer; Becca Feldman; Arnon Wiznitzer; Eran Segal
Journal: Nat Med Date: 2020-01-13 Impact factor: 53.440

4. The International Classification of Diseases: ninth revision (ICD-9)

Authors: V N Slee
Journal: Ann Intern Med Date: 1978-03 Impact factor: 25.391

5. Quantitative analysis of population-scale family trees with millions of relatives.

Authors: Joanna Kaplanis; Assaf Gordon; Tal Shor; Omer Weissbrod; Dan Geiger; Mary Wahl; Michael Gershovits; Barak Markus; Mona Sheikh; Melissa Gymrek; Gaurav Bhatia; Daniel G MacArthur; Alkes L Price; Yaniv Erlich
Journal: Science Date: 2018-03-01 Impact factor: 47.728

6. Common SNPs explain a large proportion of the heritability for human height.

Authors: Jian Yang; Beben Benyamin; Brian P McEvoy; Scott Gordon; Anjali K Henders; Dale R Nyholt; Pamela A Madden; Andrew C Heath; Nicholas G Martin; Grant W Montgomery; Michael E Goddard; Peter M Visscher
Journal: Nat Genet Date: 2010-06-20 Impact factor: 38.330

Review 7. Pharmacogenomics: the genetics of variable drug responses.

Authors: Dan M Roden; Russell A Wilke; Heyo K Kroemer; C Michael Stein
Journal: Circulation Date: 2011-04-19 Impact factor: 29.690

8. Association of Genetic and Environmental Factors With Autism in a 5-Country Cohort.

Authors: Dan Bai; Benjamin Hon Kei Yip; Gayle C Windham; Andre Sourander; Richard Francis; Rinat Yoffe; Emma Glasson; Behrang Mahjani; Auli Suominen; Helen Leonard; Mika Gissler; Joseph D Buxbaum; Kingsley Wong; Diana Schendel; Arad Kodesh; Michaeline Breshnahan; Stephen Z Levine; Erik T Parner; Stefan N Hansen; Christina Hultman; Abraham Reichenberg; Sven Sandin
Journal: JAMA Psychiatry Date: 2019-10-01 Impact factor: 21.596

9. A new equation to estimate glomerular filtration rate.

Authors: Andrew S Levey; Lesley A Stevens; Christopher H Schmid; Yaping Lucy Zhang; Alejandro F Castro; Harold I Feldman; John W Kusek; Paul Eggers; Frederick Van Lente; Tom Greene; Josef Coresh
Journal: Ann Intern Med Date: 2009-05-05 Impact factor: 25.391

10. Estimating the effects of second-line therapy for type 2 diabetes mellitus: retrospective cohort study.

Authors: Assaf Gottlieb; Chen Yanover; Amos Cahan; Yaara Goldschmidt
Journal: BMJ Open Diabetes Res Care Date: 2017-11-30