Mohsen Ghanbari1, Manfred Kayser2, Silvana C E Maas3,4, Athina Vidaki4, Alexander Teumer5,6,7, Ricardo Costeira8, Rory Wilson9,10, Jenny van Dongen11, Marian Beekman12, Uwe Völker6,13, Hans J Grabe14, Sonja Kunze9,10, Karl-Heinz Ladwig10, Joyce B J van Meurs3,15, André G Uitterlinden3,15, Trudy Voortman3, Dorret I Boomsma11, P Eline Slagboom12, Diana van Heemst16, Carla J H van der Kallen17,18, Leonard H van den Berg19, Melanie Waldenberger9,10,20, Henry Völzke5,6, Annette Peters9,10,20,21, Jordana T Bell8, M Arfan Ikram3.
Abstract
BACKGROUND: Information on long-term alcohol consumption is relevant for medical and public health research, disease therapy, and other areas. Recently, DNA methylation-based inference of alcohol consumption from blood was reported with high accuracy, but these results were based on employing the same dataset for model training and testing, which can lead to accuracy overestimation. Moreover, only subsets of alcohol consumption categories were used, which makes it impossible to extrapolate such models to the general population. By using data from eight population-based European cohorts (N = 4677), we internally and externally validated the previously reported biomarkers and models for epigenetic inference of alcohol consumption from blood and developed new models comprising all data from all categories.
Keywords: Alcohol inference; Blood; DNA methylation; Epigenetics; Inference; Prediction
Year: 2021 PMID: 34702360 PMCID: PMC8549335 DOI: 10.1186/s13148-021-01186-3
Source DB: PubMed Journal: Clin Epigenetics ISSN: 1868-7075 Impact factor: 6.551
Fig. 1 Use of study populations in each analysis. The 363 alcohol-associated CpGs previously identified by Liu et al. were replicated using data from 2042 participants of five cohort studies embedded within the BIOS consortium. An additional 841 participants from the KORA F4 study were combined with these 2042 participants; together they comprise our model building dataset. The model building dataset was used to train the prediction models and to test the reproducibility of the prediction models via internal cross-validation. The transportability of the models was tested in the external validation phase based on 1794 participants from three cohorts that were independent of the data used for model building and internal validation. Abbreviations: CODAM, Cohort on Diabetes and Atherosclerosis Maastricht; KORA, Cooperative Health Research in the Region of Augsburg study; LLS, Leiden Longevity Study; NTR, Netherlands Twin Register; PAN, Prospective ALS Study Netherlands; RS, Rotterdam Study; SHIP-Trend, Study of Health in Pomerania-Trend; TwinsUK, The TwinsUK Study; TwinsUK2, subset of the TwinsUK Study
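The internal cross-validation step in Fig. 1 can be sketched as a ten-fold split of the model building dataset. This is a minimal illustration only: the legend does not say how folds were drawn, so a plain shuffled split is assumed, and the function name `ten_fold_indices` and the seed are hypothetical.

```python
import random


def ten_fold_indices(n_samples, n_folds=10, seed=0):
    """Split indices 0..n_samples-1 into n_folds shuffled, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every n_folds-th shuffled index, so fold sizes differ by at most 1.
    return [indices[i::n_folds] for i in range(n_folds)]


# Model building dataset: 2042 BIOS participants plus 841 from KORA F4.
n_model_building = 2042 + 841
folds = ten_fold_indices(n_model_building)

for held_out in range(len(folds)):
    test_idx = folds[held_out]
    train_idx = [i for k, fold in enumerate(folds) if k != held_out for i in fold]
    # A model would be trained on train_idx and scored on test_idx here.
    assert len(train_idx) + len(test_idx) == n_model_building
```

Each participant is held out exactly once, so every prediction used for the internal accuracy estimate comes from a model that did not see that participant during training.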
Dataset characteristics used in model building, internal and external validation
| Study | N | Age (years), mean (SD) | Men, n (%) | BMI, mean (SD) | Alcohol (g/day), median (min, max) | Non-drinkers, n (%) | Light drinkers, n (%) | At-risk drinkers, n (%) | Heavy drinkers, n (%) |
|---|---|---|---|---|---|---|---|---|---|
| RS-II-3/III-2 | 611 | 67 (6) | 275 (45) | 27.8 (4) | 8.6 (1, 57) | 0 (0) | 545 (89) | 52 (9) | 14 (2) |
| CODAM | 159 | 66 (7) | 86 (54) | 28.9 (4) | 7.9 (0, 72) | 12 (8) | 117 (74) | 23 (14) | 7 (4) |
| NTR | 617 | 39 (14) | 188 (31) | 24.6 (4) | 5.1 (0, 69) | 195 (32) | 348 (56) | 44 (7) | 30 (5) |
| LLS | 491 | 58 (6) | 231 (47) | 25.3 (3) | 13.0 (0, 90) | 36 (7) | 309 (63) | 98 (20) | 49 (10) |
| PAN | 164 | 62 (9) | 100 (61) | 26.0 (4) | 11.0 (0, 77) | 1 (1) | 127 (77) | 20 (12) | 16 (10) |
| KORA F4 | 841 | 61 (9) | 415 (49) | 28.0 (5) | 7.6 (0, 150) | 251 (30) | 354 (42) | 133 (16) | 103 (12) |
| Total model building dataset | 2883 | 57 (14) | 1295 (45) | 26.7 (4) | 8.0 (0, 150) | 495 (17) | 1800 (62) | 370 (13) | 218 (8) |
| SHIP-Trend | 433 | 51 (14) | 205 (47) | 27.2 (4.1) | 3.6 (0, 82) | 47 (11) | 346 (80) | 28 (6) | 12 (3) |
| TwinsUK | 713 | 58 (10) | 0 (0) | 26.7 (5) | 2.3 (0, 101) | 187 (26) | 423 (59) | 67 (9) | 36 (5) |
| TwinsUK2 | 442 | 59 (9) | 0 (0) | 26.6 (5) | 5.3 (0, 94) | 36 (8) | 311 (70) | 46 (10) | 49 (11) |
| RS-III-1 | 648 | 59.6 (8) | 298 (46) | 27.7 (5) | 6.4 (0, 57) | 64 (10) | 495 (76) | 79 (12) | 10 (2) |
The total model building dataset was also used for internal ten-fold cross-validation. BMI, body mass index; CODAM, Cohort on Diabetes and Atherosclerosis Maastricht; KORA F4, The Cooperative Health Research in the Region of Augsburg study; LLS, Leiden Longevity Study; NTR, Netherlands Twin Register; PAN, Prospective ALS Study Netherlands; RS, Rotterdam Study; SD, standard deviation; SHIP-Trend, Study of Health in Pomerania-Trend; TwinsUK, The TwinsUK Study; TwinsUK2, subset of the TwinsUK Study. The alcohol categories were defined as follows: non-drinkers, participants with no alcohol consumption; light drinkers, an alcohol consumption of 0 < g per day ⩽28 in men and 0 < g per day ⩽14 in women; heavy drinkers, an alcohol consumption of ⩾42 g per day in men and ⩾28 g per day in women
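The category definitions in the table footnote can be written as a small lookup. This is a sketch under stated assumptions: the footnote gives thresholds only for non-, light, and heavy drinkers, so the at-risk band is assumed here to be whatever lies between the light and heavy cut-offs. The function name `drinking_category` and the `"M"`/`"F"` sex coding are hypothetical.

```python
def drinking_category(grams_per_day, sex):
    """Return the drinking category for a daily alcohol intake in grams.

    sex is "M" or "F" (hypothetical encoding, not taken from the paper).
    """
    if sex not in ("M", "F"):
        raise ValueError("sex must be 'M' or 'F'")
    if grams_per_day < 0:
        raise ValueError("grams_per_day must be non-negative")

    # Thresholds from the footnote: light upper bound and heavy lower bound.
    light_max, heavy_min = (28, 42) if sex == "M" else (14, 28)

    if grams_per_day == 0:
        return "non-drinker"
    if grams_per_day <= light_max:
        return "light"
    if grams_per_day >= heavy_min:
        return "heavy"
    # Assumption: at-risk drinkers fill the gap between light and heavy.
    return "at-risk"


print(drinking_category(30, "M"))
print(drinking_category(30, "F"))
```

The same intake can fall into different categories for men and women, which is why sex has to be passed alongside the intake.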
Fig. 2 Epigenetic inference of alcohol consumption from blood based on Liu et al. biomarkers and models. Prediction accuracy for alcohol consumption expressed as Area Under the Curve (AUC) for (A) heavy drinkers vs. non-drinkers and (B) heavy drinkers vs. light drinkers using the CpG marker sets from Liu et al. [12]. Data from participants who do not fit the inferred categories were excluded from the respective prediction models following the approach used by Liu et al. ‘Internal Validation’: Mean AUC and SD from internal validation using ten-fold cross-validation in our model building dataset. ‘External Validation’: AUCs from external validation by applying our models trained in the model building dataset to independent data from three external validation cohorts (Rotterdam Study, N = 648; SHIP-Trend, N = 433; and TwinsUK, N = 713 and N = 442). Based on interview or self-reported information, non-drinkers were defined as participants with no alcohol consumption; light drinkers with an alcohol consumption of 0 < g per day ⩽28 in men and 0 < g per day ⩽14 in women; and heavy drinkers with an alcohol consumption of ⩾42 g per day in men and ⩾28 g per day in women. Abbreviations: RS, The Rotterdam Study; SHIP, Study of Health in Pomerania-Trend cohort; TwinsUK, The TwinsUK Study; TwinsUK2, subset of the TwinsUK Study
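The evaluation described in this legend can be illustrated with a plain-Python AUC. The sketch below keeps only the two compared categories, computes the AUC as the share of positive-negative pairs ranked correctly, and summarizes per-fold AUCs as mean and SD. All scores and fold values are invented placeholders, the helper names are hypothetical, and the sample SD is an assumption because the legend does not say which SD estimator was used.

```python
from statistics import mean, stdev


def auc(scores, labels):
    """AUC: probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def subset_for_comparison(records, positive, negative):
    """Keep only participants in the two compared categories; label `positive` as 1."""
    kept = [(score, cat) for score, cat in records if cat in (positive, negative)]
    scores = [score for score, _ in kept]
    labels = [1 if cat == positive else 0 for _, cat in kept]
    return scores, labels


# Toy (predicted score, category) pairs, invented for illustration only.
records = [(0.9, "heavy"), (0.7, "heavy"), (0.6, "at-risk"),
           (0.4, "non-drinker"), (0.8, "light"), (0.2, "non-drinker")]

scores, labels = subset_for_comparison(records, "heavy", "non-drinker")
print(auc(scores, labels))  # panel A style comparison on the toy data

fold_aucs = [0.80, 0.75, 0.85]  # placeholder values, not results from the paper
print(mean(fold_aucs), stdev(fold_aucs))
```

Because participants outside the two compared categories are dropped before scoring, an AUC obtained this way describes separation between those two groups only.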
Fig. 3 Epigenetic inference of alcohol consumption from blood based on newly developed models including all categories. Prediction accuracy for alcohol consumption expressed as Area Under the Curve (AUC) for (A) heavy and at-risk drinkers vs. light and non-drinkers and (B) heavy, at-risk and light drinkers vs. non-drinkers. In these models, all available participants from all categories were included, in contrast to Fig. 2. ‘Internal Validation’: Mean AUC and SD from internal validation using ten-fold cross-validation in our model building dataset. ‘External Validation’: AUCs from external validation by applying our model trained in the model building dataset to independent data from three external validation cohorts (Rotterdam Study, N = 648; SHIP-Trend, N = 433; and TwinsUK, N = 713 and N = 442). For phenotype definition, see legend of Fig. 2. Abbreviations: RS, The Rotterdam Study; SHIP, Study of Health in Pomerania-Trend cohort; TwinsUK, The TwinsUK Study; TwinsUK2, subset of the TwinsUK Study
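The two groupings in Fig. 3 amount to collapsing four categories into a binary label without excluding anyone. A minimal sketch follows, with hypothetical names (`MODEL_POSITIVES`, `binary_label`); the category counts are those of the model building dataset in the table above.

```python
CATEGORIES = ("non-drinker", "light", "at-risk", "heavy")

# Positive class per Fig. 3 panel; every other category is the negative class.
MODEL_POSITIVES = {
    "A": {"heavy", "at-risk"},           # vs. light and non-drinkers
    "B": {"heavy", "at-risk", "light"},  # vs. non-drinkers
}


def binary_label(category, model):
    """Map a drinking category to 1 (positive) or 0 (negative) for panel A or B."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    return 1 if category in MODEL_POSITIVES[model] else 0


# Category counts in the model building dataset (N = 2883), from the table.
counts = {"non-drinker": 495, "light": 1800, "at-risk": 370, "heavy": 218}

for model in ("A", "B"):
    n_pos = sum(n for cat, n in counts.items() if binary_label(cat, model) == 1)
    n_neg = sum(n for cat, n in counts.items() if binary_label(cat, model) == 0)
    print(model, n_pos, n_neg)
```

Both groupings use all 2883 participants, but the class balance differs sharply between panels, which is worth keeping in mind when comparing AUCs across them.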