Michail Papathomas, John Molitor, Sylvia Richardson, Elio Riboli, Paolo Vineis.
Abstract
BACKGROUND: Profile regression is a Bayesian statistical approach designed for investigating the joint effect of multiple risk factors. It reduces dimensionality by using as its main unit of inference the exposure profiles of the subjects, that is, the sequence of covariate values that corresponds to each subject.
Year: 2010 PMID: 20920953 PMCID: PMC3018505 DOI: 10.1289/ehp.1002118
Source DB: PubMed Journal: Environ Health Perspect ISSN: 0091-6765 Impact factor: 9.031
Risk factors included in the profile regression analysis
| Risk factor | Categories |
|---|---|
| Exposure to pollution due to heavy traffic | 0, subject does not live on a main road |
| | 1, subject lives on a main road |
| Exposure to PM10 | 0, < 30 μg/m³ |
| | 1, 30–40 μg/m³ |
| | 2, > 40 to 50 μg/m³ |
| | 3, > 50 μg/m³ |
| Exposure to NO2 | 0, < 30 μg/m³ |
| | 1, 30–40 μg/m³ |
| | 2, > 40 μg/m³ |
| Physical activity at work | 0, sedentary occupation |
| | 1, standing occupation |
| | 2, manual work |
| | 3, heavy manual work |
| Physical activity at leisure | 0, 1, 2, with increasing activity from 0 to 2 |
| BMI | 0, normal weight |
| | 1, overweight |
| | 2, obese |
| Deletion polymorphism in GSTM1 | 0, wild type |
| | 1, deletion polymorphism |
| Polymorphism in XRCC1 | 0, wild type |
| | 1, heterozygous or homozygous variant |
| Information on bulky DNA adducts | 0, not detectable |
| | 1, < median |
| | 2, > median |
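To make the idea of an exposure profile concrete, the following is a minimal sketch of how one subject's profile could be stored as a sequence of category codes, using the number of categories per risk factor listed in the table above. The dictionary `LEVELS`, the short factor names (borrowed from the abbreviations used in the figures), and the helper functions are illustrative assumptions, not part of the authors' software.

```python
from math import prod

# Number of categories per risk factor, read off the table above.
# The short names are hypothetical labels chosen for this sketch.
LEVELS = {
    "mainroad": 2,
    "pm10": 4,
    "no2": 3,
    "pawork": 4,
    "paleis": 3,
    "bmi": 3,
    "gstm1": 2,
    "xrcc1": 2,
    "ralc": 3,
}


def is_valid_profile(profile):
    """Check that a profile has one in-range code for every risk factor."""
    if set(profile) != set(LEVELS):
        return False
    return all(0 <= profile[name] < LEVELS[name] for name in LEVELS)


def count_possible_profiles():
    """Number of distinct profiles the coding scheme allows."""
    return prod(LEVELS.values())


# A made-up subject: lives on a main road, PM10 30-40, NO2 < 30, and so on.
example_profile = {
    "mainroad": 1, "pm10": 1, "no2": 0, "pawork": 0, "paleis": 2,
    "bmi": 1, "gstm1": 0, "xrcc1": 1, "ralc": 2,
}
```

Under this coding there are 10,368 possible profiles, far more than the number of subjects analyzed, which illustrates why treating the whole profile as the unit of inference calls for some form of grouping.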
Figure 1. Profile regression output: 829 subjects analyzed with average risk. Abbreviations: bmi, BMI; gstm1, GSTM1 gene; mainroad, residential proximity to a main road; no2, exposure to NO2; pa-leis, physical activity at leisure; pa-work, physical activity at work; pm10, exposure to PM10; ralc, bulky DNA adducts; xrcc1, XRCC1 gene. For each covariate and each category, we provide the 95% credible interval for the difference between the probability φ(x) of attribute x in group k, and the corresponding average probability in the whole population. Credible intervals are presented as bars. Green indicates that zero is contained in the 95% credible interval; red (blue) indicates positive (negative) credible intervals that exclude zero.
MDR results (545 subjects)
| Model | Prediction accuracy | CVC |
|---|---|---|
| ralc | 0.50 | 7/10 |
| ralc, pm10 | 0.53 | 6/10 |
| paleis, pawork, no2 | 0.50 | 5/10 |
| paleis, pm10, ralc, bmi | 0.45 | 5/10 |
| paleis, ralc, bmi, no2, pawork | 0.48 | 6/10 |
| paleis, ralc, bmi, no2, pawork, x1 | 0.50 | 9/10 |
| paleis, ralc, bmi, no2, pawork, gstm1, x1 | 0.50 | 6/10 |
| paleis, ralc, bmi, no2, pawork, gstm1, x1, mainroad | 0.50 | 5/10 |
| paleis, ralc, bmi, no2, pawork, gstm1, x1, mainroad, pm10 | 0.50 | 10/10 |
Abbreviations: bmi, BMI; gstm1, GSTM1 gene; mainroad, residential proximity to a main road; no2, exposure to NO2; paleis, physical activity at leisure; pawork, physical activity at work; pm10, exposure to PM10; ralc, bulky DNA adducts; x1, XRCC1 gene.
Prediction accuracy: an estimate of the predictive ability of the corresponding p-factor combination, produced with 10-fold cross-validation.
CVC (cross-validation consistency): the number of times a specific p-factor combination was identified as best in the 10 testing sets during the 10-fold cross-validation procedure.
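As a rough illustration of the CVC column, the sketch below counts how often each p-factor combination is selected as best across the folds of a cross-validation run. The function name and the example fold results are hypothetical and chosen only to show the counting; they are not taken from the MDR software or from the study data.

```python
from collections import Counter


def cross_validation_consistency(best_per_fold):
    """Return (most frequent combination, times selected, number of folds).

    best_per_fold holds one collection of factor names per fold: the
    combination judged best in that fold. Order within a combination
    is ignored.
    """
    counts = Counter(frozenset(combo) for combo in best_per_fold)
    winner, hits = counts.most_common(1)[0]
    return sorted(winner), hits, len(best_per_fold)


# Hypothetical outcome of a 10-fold run for two-factor models.
folds = [("ralc", "pm10")] * 6 + [("pm10", "no2")] * 3 + [("bmi", "ralc")]
combo, hits, total = cross_validation_consistency(folds)
label = f"{hits}/{total}"  # same "k/10" form as the CVC column
```

A value of 10/10 means the same combination was chosen in every fold; low values suggest that the selection is unstable across the data splits.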
Figure 2. MDR graphical representation of how PM10 and relative adduct labeling combine to affect risk, derived with the standard MDR open-source software. Darker shading indicates combinations where the ratio of controls to cases is higher than 0.1156, the average control:case ratio in the sample.
Figure 3. Classification tree using the “gini” impurity criterion. Pruning is done with 5-fold cross-validation. The average risk in the sample is 0.1156. n denotes the number of subjects corresponding to a terminal node. Values for P(cancer) indicate the average probability that a member of the subgroup will be a case.
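For readers unfamiliar with the Gini criterion named in the Figure 3 caption, here is a minimal sketch of the standard definition: the impurity of a node is one minus the sum of squared class proportions, and a split is scored by how much it lowers the size-weighted impurity. The function names are assumptions for this illustration, and the code is not the tree software used for the figure.

```python
def gini(counts):
    """Gini impurity of a node given the class counts, e.g. [cases, controls]."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in counts)


def impurity_decrease(parent, left, right):
    """Reduction in size-weighted Gini impurity achieved by a binary split."""
    n = sum(parent)
    weighted_children = (
        (sum(left) / n) * gini(left) + (sum(right) / n) * gini(right)
    )
    return gini(parent) - weighted_children
```

For a two-class node with case proportion p, the impurity reduces to 2p(1 − p), so a node at the sample's average risk of 0.1156 has an impurity of about 0.204, while a node containing only cases or only controls has impurity 0.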