John H Warner, Qiwei Liang, Mohamadi Sarkar, Paul E Mendes, Hans J Roethig.
Abstract
BACKGROUND: This article describes the data mining analysis of a clinical exposure study of 3585 adult smokers and 1077 nonsmokers. The analysis focused on developing models for four biomarkers of potential harm (BOPH): white blood cell count (WBC), 24 h urine 8-epi-prostaglandin F2alpha (EPI8), 24 h urine 11-dehydro-thromboxane B2 (DEH11), and high-density lipoprotein cholesterol (HDL).
Year: 2010 PMID: 20233412 PMCID: PMC2846953 DOI: 10.1186/1471-2288-10-19
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Overview of variable groups (and number of variables) in the data mining data set
| Group | Variables |
|---|---|
| BOPH (4) | DEH11, EPI8, HDL, WBC |
| Special interest variables: biomarkers of exposure, BOE (9) | HPMA3, DHBMA, MHBMA, NICEQ, COTIN, OHP, TOTNN, ABP, COHB |
| Special interest variables: cumulative effects (2) | AGE, SMKYRS (years smoked; excluded from the analysis of nonsmokers) |
| Special interest variables: stratification (1) | Smoking Status |
| Exposure variables (4) | Measures of exposure to exhaust and chemicals, from questionnaire |
| Exposure variables (nonsmokers, 8) | Measures of exposure to secondhand smoke, from questionnaire |
| Exposure variables (smokers, 19) | Measures of tobacco exposure (number of cigarettes smoked, tar and nicotine content, menthol); included only in the analysis of smokers |
| Demographics (13) | Weight, gender, race, geographical location, income, etc. |
| Vital signs (5) | Respiratory rate, temperature, blood pressure, pulse |
| Clinical measures (2) | Measures of respiratory capacity: FVC, FEV1 |
| General health (20) | General health questions, from questionnaire |
| Lab values (22) | Clinical chemistry laboratory values |
| Creatinine clearance (1) | CRCL: 24 h urine creatinine/plasma creatinine |
| Lab value flags (6) | Lab value flags |
| Medical history indicators (15) | Medical history findings broken down into 15 categories |
| Concomitant medications (61) | Concomitant medications broken down into 61 categories |
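The creatinine clearance row defines CRCL as 24 h urine creatinine divided by plasma creatinine. The sketch below assumes urine creatinine in mg/24 h (the unit given for UCRCAL) and plasma creatinine in mg/dL. The plasma unit is an assumption, because this excerpt does not state it. Under those units, the ratio comes out in dL/day. The function names are hypothetical helpers, not the authors' code.

```python
def creatinine_clearance_dl_per_day(urine_creat_mg_per_24h: float,
                                    plasma_creat_mg_per_dl: float) -> float:
    """CRCL = 24 h urine creatinine / plasma creatinine.

    Assumed units: urine creatinine in mg/24 h, plasma creatinine in mg/dL,
    so (mg/day) / (mg/dL) leaves dL/day.
    """
    if plasma_creat_mg_per_dl <= 0:
        raise ValueError("plasma creatinine must be positive")
    return urine_creat_mg_per_24h / plasma_creat_mg_per_dl


def dl_per_day_to_ml_per_min(value: float) -> float:
    """Convert dL/day to mL/min (1 dL = 100 mL, 1 day = 1440 min)."""
    return value * 100.0 / 1440.0


# Hypothetical subject: 1500 mg creatinine in the 24 h collection, plasma 1.0 mg/dL
crcl = creatinine_clearance_dl_per_day(1500.0, 1.0)
print(crcl)                                      # 1500.0
print(round(dl_per_day_to_ml_per_min(crcl), 1))  # 104.2
```

The mL/min conversion is there only to relate the dL/day scale to the more familiar clinical unit.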
Variables appearing in the final models, names and abbreviations
| Names | Abbreviations | Units/category names |
|---|---|---|
| 24 h urine 11-Dehydro-thromboxane B2 | DEH11 | ng/24 hr urine |
| 24 h urine 8-epi-prostaglandin F2alpha | EPI8 | ng/24 hr urine |
| High-density lipoprotein cholesterol | HDL | mg/dL |
| White blood cell count | WBC | ×10³/µL |
| 24 h urine Nicotine equivalents | NICEQ | mg/24 hr urine |
| Serum Cotinine | COTIN | ng/mL |
| 24 h urine total 1-hydroxypyrene | OHP | ng/24 hr urine |
| 24 h urine total 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanol (total NNAL) | TOTNN | ng/24 hr urine |
| Creatinine clearance | CRCL | dL/day |
| 24 h urine creatinine | UCRCAL | mg/24 hr urine |
| Aspartate aminotransferase | AST | U/L |
| Alkaline phosphatase | ALKPH | U/L |
| Serum Triglycerides | TRIG | mg/dL |
| Platelet count | PLATE | ×10³/µL |
| High-sensitivity C-reactive protein | CRP | mg/L |
| Hemoglobin | HGB | g/dL |
| Age | AGE | Yrs |
| Weight | WEIGHTK | kg |
| Vitamin supplements | VITAMIN | yes, no |
| Alcohol consumption | DRINK | ≥ once/day, > once/week, once/week, < once/week, no |
| Non-steroidal anti-inflammatory drugs | NSAID | no, yes |
| Sex | SEX | female, male |
| Race | RACE | other, Native American, multiracial, Caucasian, Asian, Black |
| If greater than | IFGT | |
| If less than | IFLT | |
| Equal to | EQ | |
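Model terms in the parameter tables are named VARIABLE.OPERATOR.VALUE, using the abbreviations above. This excerpt does not define these operators precisely, so the sketch assumes the following readings:

- IFGT.c is the MARS-style hinge max(0, x − c).
- IFLT.c is the mirror hinge max(0, c − x).
- EQ.level is a 0/1 indicator.
- A bare variable name enters linearly.

`basis_from_name` is a hypothetical helper, not taken from the paper. A term such as DRINK.GT.1.PER.WK follows a pattern whose coding is not given here, so the sketch rejects it rather than guess.

```python
from typing import Callable, Mapping, Union

Value = Union[float, str]


def basis_from_name(term: str) -> Callable[[Mapping[str, Value]], float]:
    """Map a term name such as 'AST.IFGT.25' to a function of one subject's data.

    Operator meanings are assumptions (MARS-style hinges and 0/1 indicators).
    """
    parts = term.split(".")
    if len(parts) == 1:
        # Bare variable (e.g. 'CRCL', 'WEIGHTK'): enters the model linearly
        var = parts[0]
        return lambda row: float(row[var])
    var, op, arg = parts[0], parts[1], ".".join(parts[2:])
    if op == "IFGT":
        knot = float(arg)
        # Assumed hinge: zero below the knot, rises with slope 1 above it
        return lambda row: max(0.0, float(row[var]) - knot)
    if op == "IFLT":
        knot = float(arg)
        # Assumed mirror hinge: zero above the knot, rises as x falls below it
        return lambda row: max(0.0, knot - float(row[var]))
    if op == "EQ":
        level = arg.upper()
        # Assumed indicator; requires the coded value to match the level text
        return lambda row: 1.0 if str(row[var]).upper() == level else 0.0
    raise ValueError(f"unrecognised operator {op!r} in term {term!r}")


# Hypothetical subject values, for illustration only
row = {"AST": 30.0, "NSAID": "yes", "CRCL": 1500.0}
print(basis_from_name("AST.IFGT.25")(row))   # 5.0
print(basis_from_name("AST.IFLT.106")(row))  # 76.0
print(basis_from_name("NSAID.EQ.YES")(row))  # 1.0
print(basis_from_name("CRCL")(row))          # 1500.0
```

Under this reading, a fitted model's prediction would be its intercept plus the sum of each coefficient times the corresponding basis value.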
Parameter estimates in models for DEH11 and EPI8 for all subjects (analysis data set)
| a) Model for DEH11 | Estimate | Standard error | t value | p value |
|---|---|---|---|---|
| Intercept | -4045.3022 | 752.258 | -5.38 | <0.001 |
| COTIN.IFGT.11 | 1.4494 | 0.123 | 11.78 | <0.001 |
| UCRCAL.IFLT.3036 | 0.3531 | 0.0484 | 7.30 | <0.001 |
| CRCL.IFLT.4325 | 0.3698 | 0.0391 | 9.45 | <0.001 |
| AST.IFGT.25 | 17.9064 | 1.2832 | 13.95 | <0.001 |
| AST.IFGT.126 | -18.2428 | 2.3656 | -7.71 | <0.001 |
| ALKPH.IFLT.184 | 3.1360 | 0.6583 | 4.76 | <0.001 |
| ALKPH.IFGT.184 | 30.7979 | 3.6627 | 8.41 | <0.001 |
| NSAID.EQ.YES | -346.0560 | 38.2528 | -9.05 | <0.001 |
| VITAMIN.EQ.YES | -211.7183 | 29.8500 | -7.09 | <0.001 |
| b) Model for EPI8 | | | | |
| Intercept | 1383.0200 | 574.483 | 2.41 | 0.016 |
| TOTNN.IFLT.57 | 5.1813 | 0.7683 | 6.74 | <0.001 |
| TOTNN.IFGT.57 | 0.5548 | 0.0593 | 9.35 | <0.001 |
| TOTNN.IFGT.1452 | -1.4595 | 0.3895 | -3.75 | <0.001 |
| OHP.IFLT.473 | 0.8552 | 0.1212 | 7.05 | <0.001 |
| CRCL | 0.6268 | 0.0262 | 23.93 | <0.001 |
| AST.IFGT.22 | -135.4800 | 30.5048 | -4.44 | <0.001 |
| AST.IFGT.24 | 247.4960 | 59.5308 | 4.16 | <0.001 |
| AST.IFGT.26 | -108.4000 | 33.1568 | -3.27 | 0.001 |
| AST.IFLT.106 | 12.2512 | 2.2425 | 5.46 | <0.001 |
| WEIGHTK | 6.2845 | 0.7274 | 8.64 | <0.001 |
| VITAMIN.EQ.YES | -250.7700 | 28.4736 | -8.81 | <0.001 |
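In the table above, each t value should equal the estimate divided by its standard error. The sketch below recomputes t for every row and flags any that disagree with the reported value by more than 0.01. Small gaps at the second decimal are expected because the estimates are printed rounded. The p values use a two-sided normal approximation. That approximation is an assumption that seems reasonable with thousands of subjects; the paper's exact reference distribution is not stated in this excerpt. The split into DEH11 and EPI8 rows follows the table caption.

```python
import math

# (term, estimate, standard error, reported t), copied from the table above
DEH11 = [
    ("Intercept", -4045.3022, 752.258, -5.38),
    ("COTIN.IFGT.11", 1.4494, 0.123, 11.78),
    ("UCRCAL.IFLT.3036", 0.3531, 0.0484, 7.30),
    ("CRCL.IFLT.4325", 0.3698, 0.0391, 9.45),
    ("AST.IFGT.25", 17.9064, 1.2832, 13.95),
    ("AST.IFGT.126", -18.2428, 2.3656, -7.71),
    ("ALKPH.IFLT.184", 3.1360, 0.6583, 4.76),
    ("ALKPH.IFGT.184", 30.7979, 3.6627, 8.41),
    ("NSAID.EQ.YES", -346.0560, 38.2528, -9.05),
    ("VITAMIN.EQ.YES", -211.7183, 29.8500, -7.09),
]
EPI8 = [
    ("Intercept", 1383.0200, 574.483, 2.41),
    ("TOTNN.IFLT.57", 5.1813, 0.7683, 6.74),
    ("TOTNN.IFGT.57", 0.5548, 0.0593, 9.35),
    ("TOTNN.IFGT.1452", -1.4595, 0.3895, -3.75),
    ("OHP.IFLT.473", 0.8552, 0.1212, 7.05),
    ("CRCL", 0.6268, 0.0262, 23.93),
    ("AST.IFGT.22", -135.4800, 30.5048, -4.44),
    ("AST.IFGT.24", 247.4960, 59.5308, 4.16),
    ("AST.IFGT.26", -108.4000, 33.1568, -3.27),
    ("AST.IFLT.106", 12.2512, 2.2425, 5.46),
    ("WEIGHTK", 6.2845, 0.7274, 8.64),
    ("VITAMIN.EQ.YES", -250.7700, 28.4736, -8.81),
]


def t_value(estimate: float, se: float) -> float:
    """Wald t statistic: estimate divided by its standard error."""
    return estimate / se


def two_sided_p_normal(t: float) -> float:
    """Two-sided p value 2 * (1 - Phi(|t|)) from a standard normal (approximation)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def mismatches(rows, tol=0.01):
    """Terms whose recomputed t differs from the reported t by more than tol."""
    return [name for name, est, se, t in rows if abs(t_value(est, se) - t) > tol]


for label, rows in (("DEH11", DEH11), ("EPI8", EPI8)):
    print(label, "mismatches:", mismatches(rows))
    for name, est, se, _ in rows:
        t = t_value(est, se)
        print(f"  {name:18s} t={t:7.2f}  p~{two_sided_p_normal(t):.3g}")
```

Every row agrees with the reported t to within 0.01.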
Parameter estimates in models for WBC and HDL for all subjects (analysis data set)
| a) Model for WBC | Estimate | Standard error | t value | p value |
|---|---|---|---|---|
| Intercept | -6.4951 | 1.4436 | -4.50 | <0.001 |
| TOTNN.IFGT.51 | -0.0055 | 0.0026 | -2.17 | 0.030 |
| TOTNN.IFLT.471 | 0.0078 | 0.0023 | 3.36 | 0.001 |
| TOTNN.IFGT.471 | 0.0065 | 0.0026 | 2.49 | 0.013 |
| CRP.IFLT.2 | 0.3426 | 0.0548 | 6.25 | <0.001 |
| CRP.IFGT.2 | 0.0715 | 0.0096 | 7.48 | <0.001 |
| CRP.IFGT.20 | -0.0477 | 0.0169 | -2.82 | 0.005 |
| PLATE.IFLT.245 | 0.0106 | 0.0014 | 7.69 | <0.001 |
| PLATE.IFGT.245 | 0.007 | 0.0007 | 10.64 | <0.001 |
| HGB.IFLT.14 | 0.3242 | 0.0477 | 6.80 | <0.001 |
| TRIG.IFLT.171 | 0.0051 | 0.0008 | 6.32 | <0.001 |
| TRIG.IFGT.503 | 0.0025 | 0.0008 | 3.17 | 0.002 |
| RACE.EQ.BLACK | -0.679 | 0.0914 | -7.42 | <0.001 |
| b) Model for HDL | | | | |
| Intercept | 55.4292 | 2.948 | 18.80 | <0.001 |
| COTIN.IFGT.31 | -0.0291 | 0.0036 | -7.97 | <0.001 |
| COTIN.IFGT.193 | 0.0258 | 0.0069 | 3.75 | <0.001 |
| TRIG.IFGT.52 | -0.5819 | 0.0786 | -7.40 | <0.001 |
| TRIG.IFGT.64 | 0.483 | 0.0816 | 5.92 | <0.001 |
| TRIG.IFGT.177 | 0.0753 | 0.0086 | 8.76 | <0.001 |
| WEIGHT.IFLT.76 | -0.267 | 0.032 | -8.34 | <0.001 |
| WEIGHT.IFGT.76 | -0.0834 | 0.0157 | -5.30 | <0.001 |
| AGE.IFLT.55 | 0.3119 | 0.0197 | 15.83 | <0.001 |
| SEX.EQ.F | 6.72 | 0.4723 | 14.23 | <0.001 |
| DRINK.EQ.Y | 2.0567 | 0.5099 | 4.03 | <0.001 |
| DRINK.GT.1.PER.WK | 7.875 | 0.5393 | 14.60 | <0.001 |
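In the HDL model, serum triglycerides enter through three IFGT terms: TRIG.IFGT.52, TRIG.IFGT.64 and TRIG.IFGT.177. The sketch assumes each term is a MARS-style hinge max(0, TRIG − knot). That is an assumption, since the excerpt only glosses IFGT as "if greater than". Under that reading, the three terms add up to a piecewise-linear curve. Its slope above each knot is the running sum of the coefficients up to that knot. The constant and function names are hypothetical.

```python
# HDL-model TRIG terms (TRIG in mg/dL), copied from the table above: knot -> coefficient
TRIG_IFGT_TERMS = {52.0: -0.5819, 64.0: 0.483, 177.0: 0.0753}


def hinge_gt(x: float, knot: float) -> float:
    """Assumed meaning of '<VAR>.IFGT.<knot>': max(0, x - knot)."""
    return max(0.0, x - knot)


def trig_contribution(trig: float) -> float:
    """Combined contribution of the TRIG hinge terms to predicted HDL."""
    return sum(coef * hinge_gt(trig, knot) for knot, coef in TRIG_IFGT_TERMS.items())


def segment_slopes(terms: dict) -> list:
    """Slope above each knot: cumulative sum of the IFGT coefficients so far."""
    slopes, running = [], 0.0
    for knot in sorted(terms):
        running += terms[knot]
        slopes.append((knot, running))
    return slopes


for knot, slope in segment_slopes(TRIG_IFGT_TERMS):
    print(f"TRIG > {knot:g}: slope {slope:+.4f} HDL units per mg/dL TRIG")
print(f"contribution at TRIG = 150: {trig_contribution(150.0):.2f}")  # -15.49
```

Under the hinge assumption, the slopes work out as follows:

- Below 52 mg/dL: 0.
- Between 52 and 64 mg/dL: −0.5819.
- Between 64 and 177 mg/dL: −0.0989.
- Above 177 mg/dL: −0.0236.

The same cumulative-sum reading applies to the other hinge groups, such as COTIN in this model or AST in the DEH11 and EPI8 models.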
Figure 1. Variable importance plots for DEH11 and EPI8 for all subjects (analysis data set).
Figure 2. Variable importance plots for HDL and WBC for all subjects (analysis data set).
Summary of model fit statistics across models for all subjects
| Model | Random Forest R² | MARS GCV R² | Linear Regression R² | Linear Regression R² | Linear Regression R² |
|---|---|---|---|---|---|
| WBC | 0.28 | 0.27 | 0.29 | 0.29 | 0.31 |
| EPI8 | 0.40 | 0.42 | 0.41 | 0.35 | 0.38 |
| DEH11 | 0.28 | 0.28 | 0.29 | 0.25 | 0.27 |
| HDL | 0.40 | 0.37 | 0.39 | 0.40 | 0.41 |
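The summary table compares goodness of fit across methods. The sketch below shows two standard fit measures side by side. It is not the authors' code. The first is ordinary R² = 1 − RSS/TSS. The second is a GCV-based R² in the style of Friedman's MARS criterion. That criterion inflates the residual error by (1 − C/n)⁻², where C is the model's effective number of parameters, including any per-knot penalty. The intercept-only baseline uses C = 1. This excerpt does not give the GCV penalty used in the paper, so C is passed in by the caller. The toy data are hypothetical and not from the study.

```python
from typing import Sequence


def _rss(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Residual sum of squares."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Ordinary R^2 = 1 - RSS / TSS."""
    mean_y = sum(y) / len(y)
    tss = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - _rss(y, y_hat) / tss


def gcv(y: Sequence[float], y_hat: Sequence[float], effective_params: float) -> float:
    """Friedman-style GCV: (RSS / n) / (1 - C / n)^2, C = effective parameters."""
    n = len(y)
    return (_rss(y, y_hat) / n) / (1.0 - effective_params / n) ** 2


def gcv_r_squared(y: Sequence[float], y_hat: Sequence[float],
                  effective_params: float) -> float:
    """1 - GCV(model) / GCV(intercept-only model with C = 1)."""
    mean_y = sum(y) / len(y)
    return 1.0 - gcv(y, y_hat, effective_params) / gcv(y, [mean_y] * len(y), 1.0)


# Toy data (hypothetical, not from the study)
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]
print(round(r_squared(y, y_hat), 4))           # 0.99
print(round(gcv_r_squared(y, y_hat, 2.0), 4))  # 0.9822
```

For the same residuals, a larger effective-parameter count lowers the GCV-based value. This is why MARS GCV R² penalizes models with many knots relative to in-sample R².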