Bienvenue Kouwaye, Fabrice Rossi, Noël Fonton, André Garcia, Simplice Dossou-Gbété, Mahouton Norbert Hounkonnou, Gilles Cottrell.
Abstract
Recent studies have highlighted the importance of local environmental factors in determining the fine-scale heterogeneity of malaria transmission and exposure to the vector. In this work, we compare a classical GLM with backward selection against several versions of an automatic LASSO-based algorithm with 2-level cross-validation, with the aim of building a predictive model of the space- and time-dependent individual exposure to the malaria vector, using entomological and environmental data from a cohort study in Benin. Although the GLM can outperform the LASSO model with appropriate engineering, the best model in terms of predictive power was found to be the LASSO-based model. Our approach can be adapted to different topics and may therefore help address prediction issues in other health science domains.
Year: 2017 PMID: 29088280 PMCID: PMC5663424 DOI: 10.1371/journal.pone.0187234
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
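The 2-level cross-validation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the two levels are an outer loop that holds out one group (here a house) to assess prediction and an inner loop that picks the LASSO penalty on the remaining groups. For brevity it uses a squared-loss LASSO fitted by coordinate descent, not a count model, and all function names and demo numbers are hypothetical rather than taken from the paper.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam (the LASSO proximal step)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_fit(X, y, lam, n_sweeps=200):
    """Squared-loss LASSO by cyclic coordinate descent on centered data.

    Minimises (1/2n) * sum((y - b0 - X.beta)^2) + lam * sum(|beta|).
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    scale = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    beta = [0.0] * p
    resid = [yi - y_mean for yi in y]  # residuals when beta = 0
    for _ in range(n_sweeps):
        for j in range(p):
            if scale[j] == 0.0:
                continue  # constant column: coefficient stays at 0
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + scale[j] * beta[j]
            new = soft_threshold(rho, lam) / scale[j]
            step = new - beta[j]
            for i in range(n):
                resid[i] -= Xc[i][j] * step
            beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta


def predict(model, X):
    intercept, beta = model
    return [intercept + sum(b * x for b, x in zip(beta, row)) for row in X]


def mse(y, y_hat):
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def split(groups, held_out):
    """Row indices outside and inside the held-out group."""
    train = [i for i, g in enumerate(groups) if g != held_out]
    test = [i for i, g in enumerate(groups) if g == held_out]
    return train, test


def choose_lambda(X, y, groups, lambdas):
    """Inner loop: leave-one-group-out error for each candidate penalty."""
    def cv_error(lam):
        errors = []
        for g in sorted(set(groups)):
            tr, te = split(groups, g)
            model = lasso_fit([X[i] for i in tr], [y[i] for i in tr], lam)
            errors.append(mse([y[i] for i in te], predict(model, [X[i] for i in te])))
        return sum(errors) / len(errors)
    return min(lambdas, key=cv_error)


def nested_cv(X, y, groups, lambdas):
    """Outer loop: predict each held-out group with a penalty tuned on the rest."""
    preds, chosen = [None] * len(y), {}
    for g in sorted(set(groups)):
        tr, te = split(groups, g)
        X_tr, y_tr = [X[i] for i in tr], [y[i] for i in tr]
        lam = choose_lambda(X_tr, y_tr, [groups[i] for i in tr], lambdas)
        model = lasso_fit(X_tr, y_tr, lam)
        for i, value in zip(te, predict(model, [X[i] for i in te])):
            preds[i] = value
        chosen[g] = lam
    return preds, chosen


# Made-up illustration data: two covariates per catch, four houses.
houses = ["h1", "h1", "h2", "h2", "h3", "h3", "h4", "h4"]
X_demo = [[10, 1], [40, 2], [5, 3], [60, 1], [20, 4], [35, 2], [0, 5], [80, 3]]
y_demo = [1, 6, 0, 9, 2, 5, 0, 12]
demo_preds, demo_lambdas = nested_cv(X_demo, y_demo, houses, [0.01, 0.1, 1.0])
print(demo_lambdas)
```

Because the penalty is tuned only on data that excludes the held-out house, the outer-loop predictions are not contaminated by the tuning step, which is the usual motivation for nesting the two loops.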
Description of original variables.
| Variable | Nature | Number of modalities | Modalities |
|---|---|---|---|
| Repellent | Non-numeric | 2 | Yes/ No |
| Bed-net | Non-numeric | 2 | Yes/ No |
| Type of roof | Non-numeric | 2 | Sheet metal/ Straw |
| Utensils | Non-numeric | 2 | Yes/ No |
| Presence of constructions | Non-numeric | 2 | Yes/ No |
| Type of soil | Non-numeric | 2 | Humid/ Dry |
| Water course | Non-numeric | 2 | Yes/ No |
| Season | Non-numeric | 4 | 1/2/3/4 |
| Village | Non-numeric | 9 | |
| House | Non-numeric | 41 | |
| Rainy days before mission | Numeric | Discrete | 0/2/⋯/9 |
| Rainy days during mission | Numeric | Discrete | 0/1/⋯/3 |
| Fragmentation Index | Numeric | Discrete | 26/⋯/71 |
| Openings | Numeric | Discrete | 1/⋯/5 |
| Number of inhabitants | Numeric | Discrete | 1/⋯/8 |
| Mean rainfall | Numeric | Quantitative | 0/⋯/82 |
| Vegetation | Numeric | Quantitative | 115.2/⋯/ 159.5 |
| Total Mosquitoes | Numeric | Discrete | 0/⋯/481 |
| Total Anopheles | Numeric | Discrete | 0/⋯/87 |
| Anopheles infected | Numeric | Discrete | 0/⋯/9 |
Season: 1, beginning of dry Season; 2, end of rainy Season; 3, beginning of rainy Season; 4, end of dry Season.
Description of recoded variables.
Variables marked with a star (*) are recoded.
| Variable | Nature | Number of modalities | Modalities |
|---|---|---|---|
| Repellent | Non-numeric | 2 | Yes/ No |
| Bed-net | Non-numeric | 2 | Yes/ No |
| Type of roof | Non-numeric | 2 | Sheet metal/ Straw |
| Utensils | Non-numeric | 2 | Yes/ No |
| Presence of constructions | Non-numeric | 2 | Yes/ No |
| Type of soil | Non-numeric | 2 | Humid/ Dry |
| Water course | Non-numeric | 2 | Yes/ No |
| Season | Non-numeric | 4 | 1/2/3/4 |
| Village* | Non-numeric | 9 | |
| House* | Non-numeric | 41 | |
| Rainy days before mission* | Non-numeric | 3 | Quartile |
| Rainy days during mission | Numeric | Discrete | 0/1/⋯/3 |
| Fragmentation index* | Non-numeric | 4 | Quartile |
| Openings* | Non-numeric | 4 | Quartile |
| Number of inhabitants* | Non-numeric | 3 | Quartile |
| Mean rainfall* | Non-numeric | 4 | Quartile |
| Vegetation* | Non-numeric | 4 | Quartile |
| Total Mosquitoes | Numeric | Discrete | 0/⋯/481 |
| Total Anopheles | Numeric | Discrete | 0/⋯/87 |
| Anopheles infected | Numeric | Discrete | 0/⋯/9 |
Season: 1, beginning of dry Season; 2, end of rainy Season; 3, beginning of rainy Season; 4, end of dry Season.
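The quartile recoding listed in the table can be sketched as follows. This is an illustration only: the source does not say which quantile convention was used, so the inclusive method of Python's `statistics.quantiles` and the function name `quartile_recode` are assumptions.

```python
from bisect import bisect_left
from statistics import quantiles


def quartile_recode(values):
    """Map each numeric value to a quartile class (1 = lowest).

    Duplicate cut points are collapsed, so a variable with many tied
    values can end up with fewer than four classes.
    """
    cuts = sorted(set(quantiles(values, n=4, method="inclusive")))
    # bisect_left counts the cut points strictly below the value
    return [bisect_left(cuts, v) + 1 for v in values]


# Made-up example values, not taken from the study data.
print(quartile_recode([1, 2, 3, 4, 5, 6, 7, 8]))
```

Turning a numeric covariate into classes like this lets a model pick up non-linear effects, at the cost of adding one indicator per class.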
Summary of predictions for B-GLM, LDLM, and LDLS on original variables.
| Threshold | Strategy | Mean | Quadratic risk | Absolute risk |
|---|---|---|---|---|
| - | B-GLM | 3.75 | 62.20 | 3.81 |
| 100 | LDLM | | | |
| | LDLS | 3.74 | 54.50 | 3.62 |
| 95 | LDLM | 3.74 | 72.01 | 4.42 |
| | LDLS | 3.74 | 72.03 | 4.40 |
| 90 | LDLM | 3.74 | 72.00 | 4.47 |
| | LDLS | 3.75 | 72.01 | 4.42 |
| 80 | LDLM | 3.75 | 74.00 | 4.71 |
| | LDLS | 3.72 | 73.02 | 4.52 |
| 75 | LDLM | 3.74 | 71.84 | 4.41 |
| | LDLS | 3.74 | 72.00 | 4.31 |
Summary of predictions for B-GLM, LDLM, and LDLS on recoded variables.
| Threshold | Strategy | Mean | Quadratic risk | Absolute risk |
|---|---|---|---|---|
| - | B-GLM | 3.75 | 62.29 | 3.88 |
| 100 | LDLM | 3.85 | 82.06 | 4.67 |
| | LDLS | 3.76 | 74.08 | 4.76 |
| 95 | LDLM | 3.84 | 81.06 | 4.61 |
| | LDLS | 3.76 | 74.08 | 4.76 |
| 90 | LDLM | 3.87 | 83.06 | 4.72 |
| | LDLS | 3.75 | 75.07 | 4.86 |
| 80 | LDLM | 3.87 | 84.06 | 4.81 |
| | LDLS | 3.75 | 75.07 | 4.86 |
| 75 | LDLM | 3.89 | 84.05 | 4.79 |
| | LDLS | 3.77 | 75.56 | 4.85 |
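The two risk columns in the prediction summaries can be computed as in the sketch below. It assumes that "quadratic risk" is the mean squared prediction error and "absolute risk" is the mean absolute prediction error; this excerpt does not give the definitions, so both the formulas and the function names are assumptions.

```python
def quadratic_risk(observed, predicted):
    """Mean squared difference between observed and predicted counts."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def absolute_risk(observed, predicted):
    """Mean absolute difference between observed and predicted counts."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Made-up counts for illustration only.
obs = [0, 2, 4]
pred = [1, 2, 2]
print(quadratic_risk(obs, pred), absolute_risk(obs, pred))
```

Under these definitions the quadratic risk penalises a few large misses much more heavily than the absolute risk does, so a quadratic risk far above the squared absolute risk points to occasional large errors.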
Frequency of original stable covariables.
| Variable | Frequency for LDLM (%) | Frequency for LDLS (%) |
|---|---|---|
| Season | 100 | 100 |
| Mean rainfall: Openings | 100 | 80 |
| Rainy days before mission: Number of inhabitants | 100 | - |
| Rainy days during mission: Vegetation | 100 | 95 |
| Season: Water course | 95 | - |
| Season: Type of soil | 95 | - |
| Season: Village | 95 | - |
| Mean rainfall: Vegetation | 95 | - |
| Rainy days during mission: Village | 90 | - |
| Season: Rainy days during mission | 80 | - |
| Season: Repellent | 75 | - |
| Season: Presence of constructions | 75 | - |
Number of original stable covariables for the strategies LDLM and LDLS.
| Threshold (%) | Number for LDLM | Number for LDLS |
|---|---|---|
| 100 | 4 | 1 |
| 95 | 8 | 2 |
| 90 | 9 | 2 |
| 80 | 10 | 3 |
| 75 | 12 | 3 |
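The counts above are consistent with a simple rule: a covariable is stable at a given threshold when its selection frequency is at least that percentage. The sketch below applies this rule to the frequencies reported for the original variables; the rule itself is inferred from the tables, and entries reported as "-" are assumed to fall below every threshold. The function name `count_stable` is hypothetical.

```python
# Selection frequencies (%) as (LDLM, LDLS); None stands for "-" in the table.
FREQUENCIES = {
    "Season": (100, 100),
    "Mean rainfall: Openings": (100, 80),
    "Rainy days before mission: Number of inhabitants": (100, None),
    "Rainy days during mission: Vegetation": (100, 95),
    "Season: Water course": (95, None),
    "Season: Type of soil": (95, None),
    "Season: Village": (95, None),
    "Mean rainfall: Vegetation": (95, None),
    "Rainy days during mission: Village": (90, None),
    "Season: Rainy days during mission": (80, None),
    "Season: Repellent": (75, None),
    "Season: Presence of constructions": (75, None),
}


def count_stable(frequencies, threshold, column):
    """Count covariables whose frequency in the given column meets the threshold."""
    return sum(
        1
        for freq in frequencies.values()
        if freq[column] is not None and freq[column] >= threshold
    )


for threshold in (100, 95, 90, 80, 75):
    ldlm = count_stable(FREQUENCIES, threshold, 0)
    ldls = count_stable(FREQUENCIES, threshold, 1)
    print(threshold, ldlm, ldls)
```

Lowering the threshold can only add covariables, which is why both columns are non-decreasing from 100% down to 75%.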
Number of recoded stable covariables for the strategies LDLM and LDLS.
| Threshold (%) | Number for LDLM | Number for LDLS |
|---|---|---|
| 100 | 31 | 11 |
| 95 | 39 | 11 |
| 90 | 44 | 16 |
| 80 | 50 | 22 |
| 75 | 52 | 29 |
Fig 1. Frequent variables.
The x-axis shows the variables including the interactions, and the y-axis shows the percentage of presence of the variables. The left figure corresponds to the LDLM strategy and the right figure corresponds to LDLS strategy. Each vertical band represents one variable.
Fig 2. Frequent variables.
The x-axis shows the variables including the interactions, and the y-axis shows the percentage of presence of the variables. The left figure corresponds to the LDLM strategy and the right figure corresponds to LDLS strategy. Each vertical band represents one variable.
Fig 3. Comparison between observed and predicted number of Anopheles in eight houses.
The line with “⋆” is for observed values, the line with “o” is for B-GLM and the line with “+” is for LOLO-DCV.