Yohann Mansiaux, Fabrice Carrat.
Abstract
BACKGROUND: Big data is steadily growing in epidemiology. We explored the performance of methods dedicated to big data analysis for detecting independent associations between exposures and a health outcome.
Year: 2014 PMID: 25154404 PMCID: PMC4146451 DOI: 10.1186/1471-2288-14-99
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Significant associations with RF, BRT, LASSO and UFMLR
| | | | | | | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Pre-seasonal HAI titer (log) | 0.004 | 0.003 | 0.001 | 0.001 | 0.001 | 0.002 | 0.002 | 0.001 |
| History of asthma^b | 0.009 | 0.012 | 0.001 | 0.001 | 0.009 | 0.019 | | 0.004 |
| Professional activity involves contact with ill people^b | 0.009 | 0.023 | 0.008 | | | | 0.001 | 0.001 |
| Age (years) | | 0.004 | 0.001 | 0.001 | | 0.001 | | |
| Daily frequency of hand washing (with soap or hand sanitizer) ≥ 5^b | 0.014 | 0.023 | 0.019 | | | | | 0.008 |
| Always or often covers mouth while coughing or sneezing^b | 0.033 | 0.023 | 0.003 | 0.001 | | | | |
| “Craftsman, shopkeeper, chief executive officer” (socio-professional group)^b | | 0.036 | 0.006 | | | | 0.047 | |
| History of chemotherapy^b | 0.048 | | 0.021 | | | | 0.027 | |
| Average living room temperature (°C) | | | 0.001 | 0.001 | | 0.031 | 0.047 | 0.002 |
| Presence of a dishwasher in the kitchen^b | | | 0.002 | 0.001 | 0.048 | 0.007 | | 0.007 |
| Sex = male^b | 0.048 | | | | | | | |
| Professional activity is primarily outdoors^b | 0.045 | | | | | | | |
| Age < 15 years^b | 0.016 | | | | | | | |
| Any respiratory disease^b | 0.030 | | | | | | | |
| Number of children (<15 years) in the household (n) | 0.004 | | | | | | | |
| Body mass index (kg/m²) | | 0.032 | | | | | | |
| Proportion of inhabitants > 15 years without a diploma (in IRIS zone^i) | | 0.024 | | | | | | |
| Proportion of habitations rented by inhabitants (in IRIS zone^i) | | 0.023 | | | | | | |
| Proportion of habitations owned by inhabitants (in IRIS zone^i) | | 0.049 | | | | | | |
| Habitation = house^b | | 0.016 | | | | | | |
| Duration of contacts with subjects aged between 60 and 99 years (log (min)) | | | 0.006 | | | | 0.023 | 0.019 |
| Number of subjects in the household (n) | | | 0.002 | 0.002 | | | | |
| Kitchen surface area per subject (m²) | | | 0.005 | 0.002 | | | | |
| Cardiac arrhythmia^b | | | 0.021 | | | | | |
| History of radiotherapy^b | | | 0.016 | | | | | |
| Daily consumption of green tea (n) | | | 0.040 | | | | | |
| Number of birds inside habitation (n) | | | 0.030 | | | | | |
| Number of rooms per subject in habitation (n) | | | 0.009 | | | | | |
| Number of children in the bedroom (n) | | | 0.010 | | | | | |
| Bedroom windows face: garden^b | | | 0.010 | | | | | |
| Duration of contacts at home (log (min)) | | | 0.029 | | | | | |
| Longitude of the habitation (degrees) | | | 0.027 | | | | | |
| Latitude of the habitation (degrees) | | | 0.028 | | | | | |
| Proportion of “farmer, primary sector” (socio-professional group) (in IRIS zone^i) | | | 0.004 | | | | | 0.005 |
| Kitchen filtration of area^b | | | | | | | 0.021 | 0.034 |
| Tiles flooring in the kitchen^b | | | | | | | 0.015 | 0.029 |
| Agricultural land near habitation^b | 0.048 |
^b Binary covariate. ^i IRIS zones are statistical block groups of about 2000 inhabitants defined by the French Institut National de la Statistique et des Etudes Economiques (INSEE).
Performances of RF, BRT, LASSO and UFMLR in the 500 simulated datasets
| | | | | | | | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| True Positive Rates (TPR) | 8 | 85% (55% - 100%) | 80% (51% - 100%) | 78% (52% - 100%) | 71% (41% - 100%) | 28% (3% - 54%) | 45% (20% - 70%) | 26% (0% - 65%) | 49% (15% - 84%) |
| | 4 | 86% (49% - 100%) | 80% (41% - 100%) | 77% (40% - 100%) | 69% (26% - 100%) | 24% (0% - 63%) | 41% (0% - 83%) | 24% (0% - 72%) | 46% (0% - 96%) |
| | 4 | 84% (46% - 100%) | 79% (41% - 100%) | 79% (41% - 100%) | 73% (32% - 100%) | 32% (0% - 74%) | 50% (6% - 93%) | 28% (0% - 76%) | 53% (5% - 100%) |
| | 4 | 90% (55% - 100%) | 82% (47% - 100%) | 82% (49% - 100%) | 77% (41% - 100%) | 35% (0% - 74%) | 49% (15% - 84%) | 29% (0% - 76%) | 55% (14% - 95%) |
| | 4 | 80% (34% - 100%) | 78% (35% - 100%) | 74% (34% - 100%) | 64% (20% - 100%) | 22% (0% - 57%) | 41% (8% - 74%) | 23% (0% - 69%) | 44% (0% - 90%) |
| False Positive Rates (FPR) | 292 | 4% (1% - 6%) | 4% (2% - 6%) | 9% (0% - 17%) | 4% (0% - 9%) | 2% (1% - 4%) | 3% (1% - 5%) | 4% (0% - 12%) | 9% (0% - 18%) |
| | 12 | 46% (7% - 85%) | 33% (3% - 63%) | 23% (0% - 48%) | 18% (0% - 41%) | 4% (0% - 15%) | 7% (0% - 22%) | 9% (0% - 39%) | 19% (0% - 60%) |
| | 280 | 2% (0% - 4%) | 3% (1% - 5%) | 8% (0% - 17%) | 3% (0% - 8%) | 2% (1% - 4%) | 3% (1% - 4%) | 4% (0% - 11%) | 8% (0% - 16%) |
| | 146 | 3% (0% - 6%) | 3% (1% - 6%) | 9% (0% - 18%) | 4% (0% - 9%) | 2% (0% - 5%) | 3% (0% - 6%) | 5% (0% - 13%) | 9% (0% - 18%) |
| | 146 | 4% (0% - 8%) | 5% (2% - 9%) | 9% (0% - 18%) | 3% (0% - 9%) | 2% (0% - 5%) | 3% (0% - 5%) | 4% (0% - 12%) | 9% (0% - 18%) |
Performance is shown as mean (95% confidence interval).
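As a rough illustration of the quantities in this table, the sketch below computes a True Positive Rate and a False Positive Rate for one simulated dataset and then summarizes rates across datasets. The function names, the covariate labels and the toy selection are hypothetical. The record does not say how the 95% interval was obtained, so the mean ± 1.96 SD interval clipped to [0, 1] used here is only an assumption.

```python
from statistics import mean, stdev


def tpr_fpr(selected, true_covariates, all_covariates):
    """Return (TPR, FPR) for one dataset.

    TPR = truly associated covariates that were selected / all truly associated.
    FPR = unassociated covariates that were selected / all unassociated.
    """
    selected = set(selected)
    true_covariates = set(true_covariates)
    noise = set(all_covariates) - true_covariates
    tpr = len(selected & true_covariates) / len(true_covariates)
    fpr = len(selected & noise) / len(noise)
    return tpr, fpr


def summarize(rates):
    """Mean and an approximate 95% interval.

    Assumption (not stated in the source): mean +/- 1.96 SD, clipped to [0, 1].
    """
    m = mean(rates)
    sd = stdev(rates) if len(rates) > 1 else 0.0
    return m, max(0.0, m - 1.96 * sd), min(1.0, m + 1.96 * sd)


# Hypothetical toy setup: 300 covariates, of which the first 8 are truly
# associated with the outcome (matching the 8 / 292 split in the table).
all_covariates = [f"x{i}" for i in range(300)]
true_covariates = all_covariates[:8]
selected = ["x0", "x1", "x2", "x3", "x4", "x5", "x100", "x200"]

tpr, fpr = tpr_fpr(selected, true_covariates, all_covariates)
print(f"TPR = {tpr:.2%}, FPR = {fpr:.2%}")
```

In this toy case, 6 of the 8 true covariates and 2 of the 292 noise covariates are selected. Applying `summarize` to the list of per-dataset rates would give one cell of the table under the stated assumption.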
Figure 1. Cumulative distribution curves of the True Positive Rates in the 500 simulated datasets. The y-axis shows the proportion of simulated datasets with True Positive Rates above or equal to the True Positive Rates on the x-axis.
Figure 2. Cumulative distribution curves of the False Positive Rates in the 500 simulated datasets. The y-axis shows the proportion of simulated datasets with False Positive Rates above or equal to the False Positive Rates on the x-axis.
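The curves described in the two figure captions plot, for each rate on the x-axis, the share of simulated datasets whose rate is at or above that value. A minimal sketch of that calculation follows. The function name and the toy rates are hypothetical and are not taken from the study.

```python
def proportion_at_or_above(rates, thresholds):
    """For each threshold, the proportion of datasets with rate >= threshold."""
    n = len(rates)
    return [sum(r >= t for r in rates) / n for t in thresholds]


# Hypothetical per-dataset True Positive Rates for four simulated datasets.
rates = [0.25, 0.5, 0.75, 1.0]
thresholds = [0.0, 0.5, 1.0]

curve = proportion_at_or_above(rates, thresholds)
print(curve)  # [1.0, 0.75, 0.25]
```

Because the comparison is "above or equal", the curve starts at 1 for a threshold of 0 and can only decrease as the threshold grows.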