Cheikh Loucoubar, Richard Paul, Avner Bar-Hen, Augustin Huret, Adama Tall, Cheikh Sokhna, Jean-François Trape, Alioune Badara Ly, Joseph Faye, Abdoulaye Badiane, Gaoussou Diakhaby, Fatoumata Diène Sarr, Aliou Diop, Anavaj Sakuntabhai, Jean-François Bureau.
Abstract
Complex, high-dimensional data sets pose significant analytical challenges in the post-genomic era. Such data sets are not exclusive to genetic analyses and are also pertinent to epidemiology. There has been considerable effort to develop hypothesis-free data mining and machine learning methodologies. However, current methodologies lack exhaustivity and general applicability. Here we use a novel non-parametric, non-euclidean data mining tool, HyperCube®, to explore exhaustively a complex epidemiological malaria data set by searching for over density of events in m-dimensional space. Hotspots of over density correspond to strings of variables, rules, that determine, in this case, the occurrence of Plasmodium falciparum clinical malaria episodes. The data set contained 46,837 outcome events from 1,653 individuals and 34 explanatory variables. The best predictive rule contained 1,689 events from 148 individuals and was defined as: individuals present during 1992-2003, aged 1-5 years old, having hemoglobin AA, and having had previous Plasmodium malariae malaria parasite infection ≤10 times. These individuals had 3.71 times more P. falciparum clinical malaria episodes than the general population. We validated the rule in two different cohorts. We compared and contrasted the HyperCube® rule with the rules using variables identified by both traditional statistical methods and non-parametric regression tree methods. In addition, we tried all possible sub-stratified quantitative variables. No other model with equal or greater representativity gave a higher Relative Risk. Although three of the four variables in the rule were intuitive, the effect of number of P. malariae episodes was not. HyperCube® efficiently sub-stratified quantitative variables to optimize the rule and was able to identify interactions among the variables, tasks not easy to perform using standard data mining methods. 
Search of local over density in m-dimensional space, explained by easily interpretable rules, is thus seemingly ideal for generating hypotheses for large datasets to unravel the complexity inherent in biological systems.
Year: 2011 PMID: 21931645 PMCID: PMC3170284 DOI: 10.1371/journal.pone.0024085
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
List of explanatory categorical variables.
| Categorical (nominal) Variables | No of levels |
| House | 67 (36 in Dielmo and 31 in Ndiop) |
| Independent Family | 36 (12 in Dielmo and 24 in Ndiop) |
| Sex | 2 |
| Hemoglobin Type | 7 (5 in Dielmo and 7 in Ndiop) |
| ABO blood group | 4 |
| G6PD Haplotype (on 4 SNPs: G6PD-376 | 11 |
| PMI | 2 |
| POI | 2 |
G6PD: Glucose-6-phosphate dehydrogenase, PMI: Plasmodium malariae infection, POI: Plasmodium ovale infection.
*: Position on the gene.
List of explanatory continuous variables.
| Continuous Variables | Mean | Median | Min | Max |
| Age | 21.35 (23.14 in Dielmo and 19.46 in Ndiop) | 15.90 (17.06 in Dielmo and 14.97 in Ndiop) | 0 | 97.88 (97.88 in Dielmo and 83.25 in Ndiop) |
| Mean genetic relatedness (Pedigree-based) | 0.012 (0.012 in Dielmo and 0.012 in Ndiop) | 0.011 (0.012 in Dielmo and 0.008 in Ndiop) | 0.001 | 0.041 (0.028 in Dielmo and 0.041 in Ndiop) |
| Mean genetic relatedness IBD | 0.008 (0.008 in Dielmo and 0.007 in Ndiop) | 0.007 (0.008 in Dielmo and 0.007 in Ndiop) | 0.002 | 0.029 (0.025 in Dielmo and 0.029 in Ndiop) |
| No. of previous PMI | 2.53 (4.10 in Dielmo and 0.82 in Ndiop) | 1 (1 in Dielmo and 0 in Ndiop) | 0 | 44 (44 in Dielmo and 9 in Ndiop) |
| Time since first PMI (year) | 6.07 (6.67 in Dielmo and 5.03 in Ndiop) | 5.25 (5.95 in Dielmo and 4.32 in Ndiop) | 0 | 18.51 (18.51 in Dielmo and 15.25 in Ndiop) |
| No. of previous POI | 1.09 (1.33 in Dielmo and 0.83 in Ndiop) | 0 | 0 | 11 (11 in Dielmo and 10 in Ndiop) |
| Time since first POI (year) | 5.52 (6.20 in Dielmo and 4.72 in Ndiop) | 4.88 (5.55 in Dielmo and 4.25 in Ndiop) | 0 | 18.51 (18.51 in Dielmo and 15 in Ndiop) |
| Exposure (number of days present in the village) per trimester | 80.76 (81.65 in Dielmo and 79.87 in Ndiop) | 91 (91 in Dielmo and 90 in Ndiop) | 1 | 92 |
| Distance to animal enclosure (meters) | 322 in Dielmo and 147 in Ndiop | 271 in Dielmo and 139 in Ndiop | 1 in Dielmo and 2 in Ndiop | 765 in Dielmo and 393 in Ndiop |
| Distance to toilets (meters) | 326 in Dielmo and 149 in Ndiop | 280 in Dielmo and 143 in Ndiop | 1 in Dielmo and 2 in Ndiop | 774 in Dielmo and 401 in Ndiop |
| Distance to house's tree (meters) | 344 in Dielmo and 152 in Ndiop | 311 in Dielmo and 149 in Ndiop | 1 in Dielmo and 1 in Ndiop | 759 in Dielmo and 386 in Ndiop |
| Distance to wells (meters) | 365 in Dielmo and 195 in Ndiop | 453 in Dielmo and 174 in Ndiop | 17 in Dielmo and 17 in Ndiop | 719 in Dielmo and 483 in Ndiop |
| Distance to all (animals, toilets, house's tree, wells) together (meters) | 329 in Dielmo and 150 in Ndiop | 288 in Dielmo and 143 in Ndiop | 1 in Dielmo and 1 in Ndiop | 774 in Dielmo and 483 in Ndiop |
*IBD: Identity-By-Descent.
Parameters used and rules obtained from the HyperCube® analyses.
| Cohort | Total number of events | Learning Set | Validation Set | Purity | Lift | Size | Time of run | Coverage | Number of total rules | Number of minimized rules | Number of validated rules | Number of replicated rules |
| Dielmo | 23,832 | 11,893 | 11,939 | 0.73 | 4.00 | 400 | 27 h | 67% | 4,853 | 52 | 51 | 51 |
| Ndiop | 23,005 | 11,530 | 11,475 | 0.74 | 3.49 | 400 | 23 h | 72% | 6,860 | 36 | 36 | 36 |
Purity: prevalence of events {PFA = 1} in the rule; Lift: Relative Risk of belonging to the rule compared to the total population; Size: number of events in the rule; Coverage: percentage of events {PFA = 1} in all rules found by HyperCube® compared to the total number of events {PFA = 1} in the whole dataset.
Figure 1. Typical result from HyperCube®.
A) Table “Key Indicators” shows Lift: 1.39; Size: 1,689; Purity: 0.73. B) Graph comparing the proportion of events within the rule with the proportion in the entire population; pink: affected (PFA positive), green: unaffected (PFA negative). Both pink and green bars would reach the horizontal red line if the proportion of positive PFA were the same in the rule and in the entire population. C) Table “Rule space” shows the marginal contribution of each variable to the lift. Loss: partial decrease of lift when each variable (or risk factor) is removed from the rule; Coverage: percentage of events {PFA = 1} defined by the corresponding variable alone compared to the total number of events {PFA = 1} in the whole dataset; Size: increase in the number of events in the rule when the constraint defined on a variable is cancelled or the variable is dropped. D) Graphs showing the distribution (in blue) of each variable and the range of values (in green) within the rule.
Multivariate analysis of risk factors associated with clinical P. falciparum malaria attacks in Dielmo using the HyperCube® rule.
| Parameters | Level | DF | Estimate | SE | χ2 | Pr>χ2 | OR | Wald 95%CL |
| Intercept | | 1 | −3.43 | 0.16 | 483.4 | <.0001 | - | - |
| Age group (years) | 1 to 5 | 1 | 0.38 | 0.28 | 1.8 | 0.178 | 1.46 | [0.84–2.53] |
| Type of hemoglobin | AA | 1 | 0.38 | 0.07 | 27.8 | <.0001 | 1.46 | [1.27–1.68] |
| Year | After 1991 and Before 2004 | 1 | 1.80 | 0.15 | 139.4 | <.0001 | 6.07 | [4.50–8.19] |
| Number of previous PMI | ≤10 | 1 | 0.80 | 0.15 | 29.4 | <.0001 | 2.23 | [1.67–2.97] |
| Age group * Number of previous PMI | 1 to 5; ≤10 | 1 | 1.62 | 0.27 | 36.5 | <.0001 | 5.06 | [2.99–8.56] |
| Age group * Year | 1 to 5; Before 2004 | 1 | 0.77 | 0.10 | 55.8 | <.0001 | 2.15 | [1.76–2.63] |
| Number of previous PMI * Year | ≤10; Before 2004 | 1 | −1.38 | 0.16 | 72.2 | <.0001 | 0.25 | [0.18–0.35] |
DF: degrees of freedom; Estimate: effect of explanatory variable's levels on logit(Probability of {PFA = 1}); SE: standard error; χ2: chi-square DF = 1; OR: Odds ratio; CL: confidence limits.
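The columns of the multivariate table are linked by standard logistic-regression identities: the Wald χ2 is the squared ratio of the estimate to its standard error, the odds ratio is the exponential of the estimate, and the Wald 95% limits are the exponentials of the estimate ± 1.96 SE. The sketch below is not taken from the authors' analysis code; the function name `wald_summary` is illustrative, and it only checks these identities against the "Age group (years), 1 to 5" row (estimate 0.38, SE 0.28).

```python
import math

def wald_summary(estimate, se, z=1.96):
    """Return (chi2, odds_ratio, cl_low, cl_high) for one logit coefficient.

    z = 1.96 is the usual normal quantile for 95% Wald limits (an assumption;
    the source does not state the quantile used).
    """
    chi2 = (estimate / se) ** 2            # Wald chi-square, 1 degree of freedom
    odds_ratio = math.exp(estimate)        # OR = exp(beta)
    cl_low = math.exp(estimate - z * se)   # lower Wald limit on the OR scale
    cl_high = math.exp(estimate + z * se)  # upper Wald limit on the OR scale
    return chi2, odds_ratio, cl_low, cl_high

# Estimate and SE copied from the "Age group (years), 1 to 5" row.
chi2, odds_ratio, cl_low, cl_high = wald_summary(0.38, 0.28)
print(f"chi2={chi2:.1f} OR={odds_ratio:.2f} CL=[{cl_low:.2f}, {cl_high:.2f}]")
```

For this row the identities give values that agree with the tabulated χ2, OR and limits to the reported precision. Small discrepancies on other rows are to be expected, because the published estimates and standard errors are themselves rounded to two decimals.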
Number of positive/negative PFA events (P. falciparum malaria attacks) in subgroups of individuals in and out of the HyperCube® rule.
| | PFA positive | No PFA |
| In the rule | 1232 | 457 |
| Out of the rule | 7977 | 37171 |
| Total population | 9209 | 37628 |
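As a worked check, the size, purity and lift of the rule can be recomputed from the counts in this table, using the definitions given with the HyperCube® parameters (purity: prevalence of events {PFA = 1} in the rule; lift: risk in the rule relative to the total population). The variable names below are illustrative only.

```python
# Counts of events copied from the table above.
in_rule_pos, in_rule_neg = 1232, 457   # PFA positive / no PFA, in the rule
total_pos, total_neg = 9209, 37628     # PFA positive / no PFA, total population

size = in_rule_pos + in_rule_neg                  # number of events in the rule
purity = in_rule_pos / size                       # prevalence of PFA = 1 in the rule
prevalence = total_pos / (total_pos + total_neg)  # prevalence in the total population
lift = purity / prevalence                        # relative risk versus total population

print(f"size={size} purity={purity:.2f} lift={lift:.2f}")
```

The result reproduces the size of 1,689 events, the purity of 0.73 and the 3.71-fold excess of P. falciparum clinical episodes reported for the rule.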
Univariate logistic regression analysis of each categorical risk factor for clinical falciparum malaria (PFA) attacks in Dielmo.
No of person-trimesters: N = 23832.
| Variable | Level | PFA = 0, N (%) (N = 19475) | PFA = 1, N (%) (N = 4357) | Estimate (Std. Error) | Crude OR | Wald 95%CL | P-value | Global P-value |
| Age group (years) | [0–0.4] | 303 (84.17) | 57 (15.83) | Ref. | 1 | | | |
| | [0.4–6.7] | 2344 (46.72) | 2673 (53.28) | 1.80 (0.15) | 6.06 | [4.54–8.09] | <.0001 | |
| | [6.7–8.12] | 692 (67.13) | 338 (32.82) | 0.95 (0.16) | 2.6 | [1.9–3.55] | <.0001 | <.0001 |
| | [8.12–13.6] | 2943 (81.28) | 678 (18.72) | 0.20 (0.15) | 1.22 | [0.91–1.65] | 0.1782 | |
| | ≥13.6 | 13138 (95.58) | 608 (4.42) | −1.40 (0.15) | 0.25 | [0.18–0.33] | <.0001 | |
| | Missing data | 55 | 3 | - | - | - | - | |
| Sex | Male | 9663 (80.77) | 2301 (19.23) | Ref. | 1 | | | |
| | Female | 9812 (82.68) | 2056 (17.32) | −0.13 (0.03) | 0.88 | [0.82–0.94] | - | <.0001 |
| Blood group | O | 7597 (79.56) | 1952 (20.44) | Ref. | 1 | | | |
| | A | 5131 (83.65) | 1003 (16.35) | −0.27 (0.04) | 0.76 | [0.70–0.83] | <.0001 | |
| | AB | 920 (90.20) | 100 (9.80) | −0.86 (0.11) | 0.42 | [0.34–0.52] | <.0001 | <.0001 |
| | B | 4496 (82.40) | 960 (17.60) | −0.19 (0.04) | 0.83 | [0.76–0.91] | <.0001 | |
| | Missing data | 1331 | 342 | - | - | - | - | |
| Type of hemoglobin | AA | 16304 (81.28) | 3756 (18.72) | Ref. | 1 | | | |
| | AC/AS/SS | 2007 (87.53) | 286 (12.47) | −0.48 (0.07) | 0.62 | [0.54–0.70] | <.0001 | |
| | Missing data | 5196 | 1438 | - | - | - | - | |
| G6PD | Normal alleles | 6448 (84.0) | 1228 (16.0) | Ref. | 1 | | | |
| | Mutated allele | 7865 (82.30) | 1691 (17.70) | −0.12 (0.04) | 0.89 | [0.82–0.96] | 0.0032 | |
| | Missing data | 5162 | 1438 | - | - | - | - | |
| No. of previous PMI | ≤1 (median) | 9348 (81.99) | 2099 (18.34) | Ref. | 1 | | | |
| | >1 | 8983 (79.91) | 2258 (20.09) | 0.11 (0.03) | 1.12 | [1.04–1.20] | - | 0.0008 |
| | Missing data | 1144 | 0 | - | - | - | | |
| No. of previous POI | ≤0 (median) | 9946 (81.54) | 2251 (18.46) | Ref. | 1 | | | |
| | >0 | 8385 (79.93) | 2106 (20.07) | 0.10 (0.03) | 1.11 | [1.04–1.19] | - | 0.002 |
| | Missing data | 1144 | 0 | - | - | - | | |
Estimate: effect of explanatory variable's levels on logit(Probability of {PFA = 1}); SE: standard error; OR: Odds ratio; CL: confidence limits; Ref.: reference level.
Age and Exposure were categorized using CART and previous PMIs and previous POIs using median since CART did not find significant cut-off values.
Univariate logistic regression analysis of each temporal risk factor for clinical falciparum malaria (PFA) attacks in Dielmo.
No of person-trimesters: N = 23832.
| Variable | Level | PFA = 0, N (%) (N = 19475) | PFA = 1, N (%) (N = 4357) | Estimate (Std. Error) | Crude OR | Wald 95%CL | P-value | Global P-value |
| Year | 1990 | 587 (82.21) | 127 (17.79) | Ref. | 1 | | | |
| | 1991 | 740 (81.59) | 167 (18.41) | 0.04 (0.13) | 1.04 | [0.81–1.35] | 0.7457 | |
| | 1992 | 717 (77.18) | 212 (22.82) | 0.31 (0.13) | 1.37 | [1.07–1.75] | 0.0126 | |
| | 1993 | 790 (78.61) | 215 (21.39) | 0.23 (0.12) | 1.26 | [0.99–1.61] | 0.0653 | |
| | 1994 | 774 (75.44) | 252 (24.56) | 0.41 (0.12) | 1.50 | [1.19–1.91] | 0.0008 | |
| | 1995 | 796 (77.06) | 237 (22.94) | 0.32 (0.12) | 1.38 | [1.08–1.75] | 0.0093 | |
| | 1996 | 853 (72.23) | 328 (27.77) | 0.58 (0.12) | 1.78 | [1.41–2.24] | <.0001 | |
| | 1997 | 818 (73.3) | 298 (26.7) | 0.52 (0.12) | 1.68 | [1.33–2.13] | <.0001 | |
| | 1998 | 1179 (80.2) | 291 (19.8) | 0.13 (0.12) | 1.14 | [0.91–1.44] | 0.2632 | |
| | 1999 | 1137 (78.09) | 319 (21.91) | 0.26 (0.12) | 1.30 | [1.03–1.63] | 0.0258 | <.0001 |
| | 2000 | 1151 (76.84) | 347 (23.16) | 0.33 (0.12) | 1.39 | [1.11–1.75] | 0.0041 | |
| | 2001 | 1019 (77.91) | 289 (22.09) | 0.27 (0.12) | 1.31 | [1.04–1.65] | 0.0222 | |
| | 2002 | 1061 (80.75) | 253 (19.25) | 0.1 (0.12) | 1.10 | [0.87–1.40] | 0.4188 | |
| | 2003 | 1055 (80.47) | 256 (19.53) | 0.11 (0.12) | 1.12 | [0.89–1.42] | 0.3396 | |
| | 2004 | 1153 (87.81) | 160 (12.19) | −0.44 (0.13) | 0.64 | [0.50–0.83] | 0.0006 | |
| | 2005 | 1312 (91.11) | 128 (8.89) | −0.8 (0.13) | 0.45 | [0.35–0.59] | <.0001 | |
| | 2006 | 1228 (83.2) | 248 (16.8) | −0.07 (0.12) | 0.93 | [0.74–1.18] | 0.5663 | |
| | 2007 | 1495 (90.44) | 158 (9.56) | −0.72 (0.13) | 0.49 | [0.38–0.63] | <.0001 | |
| | 2008 | 1610 (95.72) | 72 (4.28) | −1.58 (0.16) | 0.21 | [0.15–0.28] | <.0001 | |
| Season | Jan–Mar | 4749 (82.62) | 999 (17.38) | Ref. | 1 | | | |
| | April–June | 4912 (82.03) | 1076 (17.97) | 0.04 (0.05) | 1.04 | [0.95–1.14] | 0.4029 | |
| | July–Sept | 4841 (80.38) | 1182 (19.62) | 0.15 (0.05) | 1.16 | [1.06–1.27] | 0.0017 | 0.0128 |
| | Oct–Dec | 4973 (81.89) | 1100 (18.11) | 0.05 (0.05) | 1.05 | [0.96–1.16] | 0.2973 | |
| Exposure | ≤66.5 days | 2978 (94.33) | 179 (5.67) | Ref. | 1 | | | |
| | >66.5 days | 15745 (81.57) | 3558 (18.43) | 1.32 (0.08) | 3.76 | [3.22–4.39] | - | <.0001 |
| | Missing data | 752 | 620 | - | - | - | | |
Estimate: effect of explanatory variable's levels on logit(Probability of {PFA = 1}); SE: standard error; OR: Odds ratio; CL: confidence limits; Ref.: reference level.
Age and Exposure were categorized using CART and previous PMIs and previous POIs using median since CART did not find significant cut-off values.
Univariate analysis of each risk factor (redefined in only two levels) for clinical P. falciparum malaria attacks (PFA) in Dielmo.
No of person-trimesters: N = 23832.
| Variable | Level | PFA = 0, N (%) = 19475 (81.72) | PFA = 1, N (%) = 4357 (18.28) | Estimate (Std. Error) | Crude OR | Wald 95%CL | P-value |
| Age group (years) | <0.4 or ≥8.12 | 16384 (92.42) | 1343 (7.58) | Ref. | 1 | | |
| | [0.4–8.12] | 3036 (50.21) | 3011 (49.79) | 2.49 (0.04) | 12.1 | [11.22–13.04] | <.0001 |
| | Missing data | 55 | 3 | - | - | - | |
| Blood group | A or B or AB | 10547 (83.64) | 2063 (16.36) | Ref. | 1 | | |
| | O | 7597 (79.56) | 1952 (20.44) | 0.27 (0.04) | 1.31 | [1.23–1.41] | <.0001 |
| | Missing data | 1331 | 342 | - | - | - | |
| Year | ≥2004 | 6798 (89.87) | 766 (10.13) | Ref. | 1 | | |
| | <2004 | 12677 (77.93) | 3591 (22.07) | 0.92 (0.04) | 2.51 | [2.31–2.73] | <.0001 |
| Semester | Jan–Jun | 9661 (82.32) | 2075 (17.68) | Ref. | 1 | | |
| | Jul–Dec | 9814 (81.13) | 2282 (18.87) | 0.08 (0.03) | 1.08 | [1.16–1.16] | 0.0179 |
Estimate: effect of explanatory variable's levels on logit(Probability of {PFA = 1}); SE: standard error; OR: Odds ratio; CL: confidence limits; Ref.: reference level.
Multivariate model selection for risk factors associated with clinical P. falciparum malaria attacks (PFA) in Dielmo using factors identified from univariate logistic analysis.
| Best model (with lowest AIC) when number of explanatory variables is equal to: | |||||||||||||
| Variables | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Forward | Backward | |
| Sex | Female | √ | NSE | 1 | |||||||||
| Age group (years) | 0.4 to 8.1 | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | 1 | NSR |
| Blood Type | O | √ | √ | 9 | NSR | ||||||||
| Type of hemoglobin | AA | √ | √ | √ | √ | 7 | NSR | ||||||
| G6PD | Normal | √ | √ | √ | √ | √ | √ | 6 | NSR | ||||
| Year | Before 2004 | √ | √ | √ | √ | √ | √ | √ | √ | √ | 2 | NSR | |
| Semester | Jul–Dec | √ | √ | √ | 8 | NSR | |||||||
| Exposure | >66.5 | √ | √ | √ | √ | √ | √ | √ | 4 | NSR | |||
| No. of previous PMI | ≤1 | √ | √ | √ | √ | √ | √ | √ | √ | 3 | NSR | ||
| No. of previous POI | ≤0 | √ | √ | √ | √ | √ | 5 | NSR | |||||
| RR | 2.53 | 2.96 | 3.22 | 3.22 | 3.15 | 2.98 | 3.08 | 3.27 | 3.32 | 2.95 | 3.32 | 3.32 | |
| (95% CI) | (2.45–2.61) | (2.86–3.05) | (3.10–3.35) | (3.09–3.37) | (2.93–3.38) | (2.71–3.27) | (2.79–3.40) | (2.89–3.70) | (2.83–3.91) | (2.18–3.99) | (2.83–3.91) | (2.83–3.91) | |
| p-value | <.0001 | <.0001 | <.0001 | <.0001 | <.0001 | <.0001 | <.0001 | <.0001 | <.0001 | <.0001 | <.0001 | <.0001 | |
| Size of subset defined by all risk factors | 6044 | 4277 | 2000 | 1520 | 507 | 316 | 261 | 143 | 78 | 31 | 78 | 78 | |
√ : For selected variables.
NSE: No (additional) effects met the 0.05 significance level for entry into the model.
NSR: No (additional) effects met the 0.05 significance level for removal from the model.
*: Both Forward and Backward methods selected the best (in terms of AIC) model with 9 explanatory variables.
Predictive values of modified HyperCube® rule.
| Variable | Size | RR | RR 95%CL | OR | OR 95%CL | χ2 | DF | Pr>χ2 |
| M.ref | 1689 | 3.71 | 3.58–3.84 | 11.02 | 9.87–12.29 | 2741 | 1 | <.0001 |
| M.ref−PMI | | 3.65 | 3.52–3.77 | 10.35 | 9.30–11.51 | 2705 | 1 | <.0001 |
| M.ref−Year | | 3.44 | 3.33–3.56 | 8.58 | 7.82–9.40 | 2843 | 1 | <.0001 |
| M.ref−Age | | 1.18 | 1.14–1.23 | 1.24 | 1.18–1.30 | 71 | 1 | <.0001 |
| M.ref−Hemoglobin | | 3.60 | 3.48–3.73 | 9.94 | 9.00–10.99 | 2898 | 1 | <.0001 |
| M.ref+Sex−PMI | 879 | 3.69 | 3.53–3.86 | 10.82 | 9.31–12.57 | 1475 | 1 | <.0001 |
| M.ref+Sex−Year | 1031 | 3.59 | 3.44–3.75 | 9.82 | 8.57–11.25 | 1592 | 1 | <.0001 |
| M.ref+Sex−Age | | 1.16 | 1.10–1.22 | 1.20 | 1.13–1.29 | 29 | 1 | <.0001 |
| M.ref+Sex−Hemoglobin | 990 | 3.62 | 3.46–3.78 | 10.06 | 8.75–11.56 | 1562 | 1 | <.0001 |
| M.ref+Blood Type−PMI | 784 | 3.61 | 3.44–3.79 | 10.03 | 8.58–11.72 | 1249 | 1 | <.0001 |
| M.ref+Blood Type−Year | 966 | 3.46 | 3.30–3.63 | 8.69 | 7.57–9.96 | 1351 | 1 | <.0001 |
| M.ref+Blood Type−Age | | 1.29 | 1.22–1.36 | 1.38 | 1.29–1.49 | 78 | 1 | <.0001 |
| M.ref+Blood Type−Hemoglobin | 852 | 3.66 | 3.50–3.83 | 10.48 | 9.01–12.19 | 1399 | 1 | <.0001 |
| M.ref+G6PD−PMI | 651 | 3.76 | 3.58–3.95 | 11.56 | 9.69–13.79 | 1162 | 1 | <.0001 |
| M.ref+G6PD−Year | 717 | 3.72 | 3.55–3.91 | 11.17 | 9.46–13.20 | 1244 | 1 | <.0001 |
| M.ref+G6PD−Age | | 1.17 | 1.11–1.23 | 1.22 | 1.13–1.31 | 30 | 1 | <.0001 |
| M.ref+G6PD−Hemoglobin | 661 | 3.84 | 3.66–4.02 | 12.59 | 10.53–15.05 | 1249 | 1 | <.0001 |
| M.ref+Semester−PMI | 884 | 3.77 | 3.62–3.94 | 11.76 | 10.09–13.69 | 1574 | 1 | <.0001 |
| M.ref+Semester−Year | 1117 | 3.56 | 3.41–3.72 | 9.54 | 8.38–10.86 | 1677 | 1 | <.0001 |
| M.ref+Semester−Age | | 1.23 | 1.17–1.30 | 1.31 | 1.23–1.40 | 64 | 1 | <.0001 |
| M.ref+Semester−Hemoglobin | 988 | 3.76 | 3.61–3.92 | 11.62 | 10.06–13.42 | 1734 | 1 | <.0001 |
| M.ref+Exposure−PMI | 1403 | 3.66 | 3.25–3.80 | 10.46 | 9.29–11.78 | 2228 | 1 | <.0001 |
| M.ref+Exposure−Year | | 3.44 | 3.31–3.57 | 8.51 | 7.69–9.42 | 2367 | 1 | <.0001 |
| M.ref+Exposure−Age | | 1.15 | 1.11–1.20 | 1.20 | 1.14–1.27 | 42 | 1 | <.0001 |
| M.ref+Exposure−Hemoglobin | 1535 | 3.62 | 3.49–3.76 | 10.14 | 9.05–11.35 | 2361 | 1 | <.0001 |
| M.ref+ | 729 | 3.88 | 3.71–4.06 | 13.13 | 11.06–15.60 | 1410 | 1 | <.0001 |
| M.ref+ | 759 | 3.87 | 3.71–4.05 | 13.05 | 11.02–15.44 | 1459 | 1 | <.0001 |
| M.ref+ | | 1.52 | 1.44–1.59 | 1.73 | 1.62–1.86 | 246 | 1 | <.0001 |
| M.ref+ | 768 | 3.85 | 3.69–4.03 | 12.79 | 10.82–15.10 | 1456 | 1 | <.0001 |
M.ref: reference model; Size: number of events; RR: risk ratio; OR: Odds ratio; χ2: chi-square DF = 1; CL: confidence limits.
Figure 2. Decision tree generated by Classification and Regression Tree (CART) analysis of risk factors determining the occurrence of P. falciparum malaria attacks (PFA) per trimester.
Figure shows the cut-off values identified by CART that divide the dataset into two. At each leaf are given the Relative Risk (RR) and the number of events associated with that leaf.
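To illustrate how a CART-style method picks a cut-off that divides a dataset into two, the sketch below searches for the single threshold that minimizes the weighted Gini impurity of the two resulting groups. It is a minimal sketch under stated assumptions: the ages and outcomes are hypothetical toy values (not study data), the function names are illustrative, and the CART software used for the figure may apply a different splitting criterion, pruning and stopping rules.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 outcomes."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_cutoff(values, labels):
    """Return (cutoff, weighted_gini) for the best single binary split."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [y for _, y in pairs]
    best = (None, float("inf"))
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no threshold can separate identical values
        left, right = ys[:i], ys[i:]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            # Place the cut-off midway between the two neighbouring values.
            best = ((xs[i - 1] + xs[i]) / 2, score)
    return best

# Hypothetical toy data: age in years and PFA outcome (1 = attack, 0 = none).
ages = [0.2, 1, 3, 5, 7, 10, 20, 40]
pfa = [0, 1, 1, 1, 1, 0, 0, 0]
cutoff, score = best_cutoff(ages, pfa)
print(cutoff, round(score, 3))
```

A full tree repeats this search recursively within each resulting group and over every explanatory variable, which is how several successive cut-offs on the same variable can arise.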
Figure 3. Effect on relative risk (RR) of modifying the ranges of continuous variables.
Graphs show RR for all other possible definitions of risk group on the explanatory variables, with equal or greater size than the HyperCube® rule. Y-axis indicates the RR. A) Only ranges of Age are modified: 102 choices among 4,851 possible choices had size equal or greater than 1,689 (size of the HyperCube® rule) and are plotted; B) Only ranges of previous PMIs are modified: 35 choices among 1,035 possible; C) Only ranges of Year are modified: 25 choices among 190 possible; D) Ranges of both Age and previous PMIs are modified simultaneously: 25,040 choices among 5,020,785 possible; E) Ranges of both Age and Year are modified simultaneously: 8,912 choices among 921,690 possible; F) Ranges of both previous PMIs and Year are modified simultaneously: 1,110 choices among 196,650 possible. Filled red triangle represents the RR of HyperCube®'s rule (HyperCube®'s risk group), empty black circles represent the RR of other choices of risk groups.
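The numbers of "possible choices" quoted in this caption are consistent with counting every range defined by two distinct cut points, that is C(k, 2) ranges for k candidate cut points, and with multiplying these counts when two variables are modified simultaneously. The values of k used below (99 for Age, 46 for previous PMIs, 20 for Year) are an assumption inferred from the quoted totals; the source does not state them.

```python
from math import comb

# Hypothetical numbers of candidate cut points per variable, chosen only
# because they reproduce the counts quoted in the caption.
cut_points = {"Age": 99, "previous PMIs": 46, "Year": 20}

# Number of ranges [low, high) obtainable from k cut points: C(k, 2).
ranges = {name: comb(k, 2) for name, k in cut_points.items()}
print(ranges)

# When two variables are modified simultaneously, the counts multiply.
pairs = {
    ("Age", "previous PMIs"): ranges["Age"] * ranges["previous PMIs"],
    ("Age", "Year"): ranges["Age"] * ranges["Year"],
    ("previous PMIs", "Year"): ranges["previous PMIs"] * ranges["Year"],
}
print(pairs)
```

Under this assumption the counts match panels A to F (4,851; 1,035; 190; 5,020,785; 921,690; 196,650), which shows how quickly an exhaustive search over sub-stratified quantitative variables grows as variables are combined.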
Effect size of each variable in the rule.
| | DIELMO | | NDIOP | | | |
| | All year | | July | | December | |
| | Loss | % Loss | Loss | % Loss | Loss | % Loss |
| | 3.71 | 100% | 2.35 | 100% | 3.78 | 100% |
| | −2.53 | −68.2% | −0.82 | −34.9% | −1.26 | −33.3% |
| | −0.67 | −18.1% | −0.7 | −29.8% | 0.05 | 1.3% |
| | −0.27 | −7.3% | −0.07 | −3.0% | −0.06 | −1.6% |
| | −0.11 | −3.0% | −0.07 | −3.0% | −0.09 | −2.4% |
| | −0.06 | −1.6% | −0.13 | −5.5% | −0.12 | −3.2% |
| | - | - | - | - | −1.43 | −37.8% |
| | −3.64 | −98% | −1.79 | −76% | −2.91 | −77% |
| | 0.07 | 1.9% | 0.56 | 23.8% | 0.87 | 23.0% |
Loss: partial decreases of lift when removing each variable from the rule.
Primer sequences, probes, restriction enzymes and rs numbers used for typing Glucose-6-phosphate dehydrogenase (G6PD) and ABO blood group single nucleotide polymorphisms.
| Polymorphism name | rs number | Genotyping method | Forward primer (5′-3′) | Reverse primer (5′-3′) | Probe (5′-3′) | Restriction enzyme |
| G6PD-202 | rs1050828 | PCR-RFLP | | | | |
| G6PD-376 | rs1050829 | PCR-RFLP | | | | |
| G6PD-542 | rs5030872 | TaqMan® | | | probe 1- | |
| | | | | | probe 2- | |
| G6PD-968 | rs76723693 | TaqMan® | | | probe 1- | |
| | | | | | probe 2- | |
| ABO-261 | rs8176719 | PCR-RFLP | | | | |
| ABO-297 | rs8176720 | TaqMan® | | | probe 1- | |
| | | | | | probe 2- | |
| ABO-467 | rs1053878 | PCR-RFLP | | | | |
| ABO-526 | rs7853989 | PCR-RFLP | | | | |
| ABO-771 | rs8176745 | SNaPshot® | | | | |