Rafael V Veiga, Helio J C Barbosa, Heder S Bernardino, João M Freitas, Caroline A Feitosa, Sheila M A Matos, Neuza M Alcântara-Neves, Maurício L Barreto.
Abstract
BACKGROUND: The prevalence of asthma and allergies has increased in recent decades, making them a serious global health problem. They are complex diseases with strong contextual influence, so advanced machine learning tools such as genetic programming could be important for understanding the causal mechanisms that explain these conditions. Here, we applied multiobjective grammar-based genetic programming (MGGP) to a dataset composed of 1047 subjects. The dataset contains information on the environmental, psychosocial, socioeconomic, nutritional and infectious factors collected from participating children. The objective of this work is to generate models that explain the occurrence of asthma and of two markers of allergy: presence of IgE antibody against common allergens, and skin prick test (SPT) positivity for common allergens.
Keywords: Allergy; Asthma; Classifier; Genetic programming; Multiobjective
Year: 2018 PMID: 29940834 PMCID: PMC6047363 DOI: 10.1186/s12859-018-2233-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Example of a derivation tree [46]
Fig. 2 Example of crossover operators of Grammar Guided GP [46]
Fig. 3 Example of mutation operators of Grammar Guided GP [46]
Fig. 4 Example of domination rank with two objectives: rank 1 is nondominated, rank 2 is dominated only by rank 1, and rank 3 is dominated by rank 1 and rank 2
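The ranking described in this caption can be illustrated with a minimal sketch. It assumes both objectives are minimized (for instance classification error and model complexity); the function names and the sample points are hypothetical and are not taken from the paper's implementation.

```python
def dominates(a, b):
    """Return True if a is no worse than b in every objective
    and strictly better in at least one (minimization assumed)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def domination_ranks(points):
    """Assign rank 1 to nondominated points, then rank 2 to the points
    that become nondominated once rank 1 is removed, and so on."""
    ranks = [0] * len(points)
    remaining = set(range(len(points)))
    rank = 1
    while remaining:
        # Points not dominated by any other point still in play.
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i])
                       for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# Hypothetical (classification error, complexity) pairs.
solutions = [(0.30, 10), (0.25, 14), (0.35, 10), (0.25, 22), (0.40, 31)]
ranks = domination_ranks(solutions)
print(ranks)
```

This quadratic peeling loop is chosen for readability; multiobjective algorithms typically use faster nondominated sorting procedures that produce the same ranks.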
Variables used to build Models
| Variables | Type | Freq % |
|---|---|---|
| Target variables | | |
| IgE (positives) | Boolean | 38.6 |
| SPT (positives) | Boolean | 30.3 |
| Asthma (positives) | Boolean | 22.9 |
| Input variables | | |
| Gender (males) | Boolean | 52.7 |
| Age | Categorical | |
| 4 and 5 | | 35.9 |
| 6 and 7 | | 35.1 |
| 8 to 11 | | 29.0 |
| Parental asthma (presence) | Boolean | 12.6 |
| HSV (positives) | Boolean | 54.9 |
| HZV (positives) | Boolean | 45.8 |
| EBV (positives) | Boolean | 88.4 |
| HAV (positives) | Boolean | 16.7 |
| | Boolean | 18.4 |
| | Boolean | 27.6 |
| | Boolean | 16.2 |
| | Boolean | 11.2 |
| Sibling number | Categorical | |
| none | | 18.9 |
| 1 | | 35.2 |
| 2 | | 24.0 |
| 3 or more | | 21.9 |
| Daycare ever (yes) | Boolean | 15.4 |
| Smoke at home (presence) | Boolean | 27.1 |
| Sewage disposal system (presence) | Boolean | 83.5 |
| Change bed linen ≥ 1 per week | Boolean | 45.0 |
| Cat at home (presence) | Boolean | 17.6 |
| Dog at home (presence) | Boolean | 39.8 |
| Mold/moisture at home (presence) | Boolean | 68.6 |
| Piped water system (presence) | Boolean | 91.9 |
| Paving of the street (absence) | Boolean | 35.1 |
| Fly at home (presence) | Boolean | 51.5 |
| Mother psychological disorder (suspect) | Boolean | 37.2 |
| Dietary patterns 1 to 4 | Categorical | Split by tertiles |
| Daily calories ( | Numerical | 2210(929) |
| BMI | Categorical | |
| Overweight / Obesity | | 12.2 |
| Eutrophic | | 75.1 |
| Slimness | | 12.7 |
| GNI | Categorical | Split by tertiles |
Fig. 5 ROC space for the training groups of the RL (logistic regression) algorithm, showing the difference between balanced and unbalanced data
Fig. 6 Classification error and complexity of the set of non-dominated solutions for a training group in the final generation of an MGGP execution. RL and C4.5 mark the classification errors of the RL and C4.5 algorithms for the same training group
Accuracy obtained in the test groups for different techniques, where RL is logistic regression, RF is random forest and * indicates that all executions converged to the same model
| | Mean | Median | sd | Min | Max |
|---|---|---|---|---|---|
| Asthma | | | | | |
| RL | 56.67 | 56.19 | 4.82 | 45.71 | 63.81 |
| C4.5 | 61.97 | 62.86 | 3.21 | 55.24 | 66.67 |
| RF | 61.81 | 62.38 | 3.78 | 53.33 | 68.57 |
| MGGP | 61.15 | 62.36 | 5.53 | 50.64 | 71.87 |
| SPT | | | | | |
| RL | 54.19 | 55.24 | 3.60 | 44.76 | 60.00 |
| C4.5 | 50.38 | 50.48 | 0.72 | 46.67 | 51.43 |
| RF | 57.87 | 57.62 | 2.91 | 52.38 | 65.71 |
| MGGP | 56.69 | 57.74 | 4.18 | 49.02 | 66.46 |
| IgE | | | | | |
| RL | 55.43 | 55.24 | 2.94 | 50.48 | 63.81 |
| C4.5 * | 53.33 | 53.33 | 0 | 53.33 | 53.33 |
| RF | 58.39 | 58.49 | 2.92 | 52.83 | 64.15 |
| MGGP | 58.39 | 58.57 | 3.05 | 48.39 | 63.26 |
Fig. 7 Average accuracy and 95% confidence interval in the test group for the asthma, SPT and IgE solutions obtained by the RL, C4.5 and RF algorithms and by MGGP at different ranges of complexity
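As an illustration of how an average accuracy and its 95% confidence interval can be computed over repeated executions, here is a minimal sketch. The source does not state which interval formula was used, so the normal approximation (mean ± 1.96 standard errors) is an assumption, and the accuracy values are hypothetical.

```python
import math
import statistics


def mean_ci95(values):
    """Mean and normal-approximation 95% confidence interval.

    Assumption: the interval is mean +/- 1.96 * sd / sqrt(n); the paper
    may have used a different method (for example a t-based interval).
    """
    m = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    half_width = 1.96 * sd / math.sqrt(len(values))
    return m, m - half_width, m + half_width


# Hypothetical test-group accuracies (%) from six executions.
accuracies = [61.0, 58.5, 63.2, 60.1, 59.4, 62.3]
mean, low, high = mean_ci95(accuracies)
print(f"{mean:.2f} ({low:.2f}; {high:.2f})")
```

With few executions, a t-based interval would be wider than this approximation.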
Examples of relations found by MGGP for IgE, SPT and asthma. The first column is the odds ratio over the whole database excluding the test group, the second column is the odds ratio in the test group, and the third column is the accuracy of the relation expressed. “{1}” means positive for the outcome and “{0}” means negative for the outcome
| OR (95% CI), all database without test group | OR (95% CI), test group | Accuracy (%) | Complexity | Important relations |
|---|---|---|---|---|
| Asthma | | | | |
| 2.42(1.96; 2.99) | 1.28(0.69; 2.41) | 53.1 | 10 | if((Dog at Home != 0) or (Age = 0)){ 1 }else{ 0 } |
| 2.48(2.01; 3.06) | 3.80(2.01; 7.55) | 66.0 | 10 | if((Cat at Home = 1) or (Age = 0)){ 1 }else{ 0 } |
| 2.64(2.11; 3.31) | 3.25(1.64; 6.42) | 63.0 | 10 | if((Mother Psychological disorder = 1) or (Age = 0)){ 1 }else{ 0 } |
| 2.33(1.89; 2.88) | 2.36(1.25; 4.43) | 60.5 | 10 | if((Dog at Home != 1) and (Mother Psychological disorder = 0)){ 0 }else{ 1 } |
| 3.25(2.62; 4.03) | 3.23(1.69; 6.14) | 64.2 | 14 | if((Age = 0) or ((Cat at Home = 1) and (Nutritional Factor3 <= 0))){ 1 }else{ 0 } |
| 3.26(2.63; 4.07) | 2.92(1.53; 5.56) | 63.0 | 14 | if(((Age > 0) or (Nutritional Factor2 = 2)) and (Cat at Home != 1)){ 0 }else{ 1 } |
| 3.59(2.87; 4.49) | 2.56(1.34; 4.88) | 61.1 | 18 | if(((Cat at Home = 1) and (Nutritional Factor3 <= 0)) or ((Nutritional Factor2 < 2) and (Age = 0))){ 1 }else{ 0 } |
| 3.86(3.10; 4.80) | 2.50(1.32; 4.73) | 61.1 | 22 | if(((Age != 0) or (((Mother Psychological disorder = 0) and (Dog at Home != 1)) and (HZV = 1))) and (Cat at Home != 1)){ 0 }else{ 1 } |
| 3.91(3.12; 4.93) | 1.73(0.93; 3.23) | 56.8 | 22 | if(((HSV != 1) and ((Linen Bed Exchange != 0) and (Age = 1))) xor ((Dog at Home != 0) or (Age <= 0))){ 1 }else{ 0 } |
| 4.45(3.57; 5.56) | 2.78(1.46; 5.28) | 62.3 | 31 | if(Cat at Home = 1){ 1 }else{ (if(((Alumbricoides != 0) or (Nutritional Factor2 < 2)) and ((Dog at Home != 0) or (((HZV != 1) or (Mother Psychological disorder = 1)) and (Age < 1)))){ 1 }else{ 0 }) } |
| 18.01(13.85; 23.60) | 3.77(1.91; 7.46) | 64.8 | 231 | too large to show |
| SPT | | | | |
| 2.03(1.62; 2.55) | 2.44(1.22; 4.86) | 60.3 | 10 | if((Linen Bed Exchange != 0) and (Ttrichiura != 1)){ 1 }else{ 0 } |
| 2.01(1.61; 2.51) | 1.58(0.81; 3.06) | 55.5 | 10 | if((Nutritional Factor4 > 0) and (Linen Bed Exchange != 0)){ 1 }else{ 0 } |
| 1.93(1.55; 2.41) | 2.92(1.46; 5.86) | 62.3 | 10 | if((Linen Bed Exchange = 0) or (BMI != 0)){ 0 }else{ 1 } |
| 2.11(1.67; 2.66) | 1.52(0.77; 2.99) | 54.8 | 10 | if((HSV = 0) and (Linen Bed Exchange != 0)){ 1 }else{ 0 } |
| 2.46(1.97; 3.08) | 2.06(1.06; 3.98) | 58.9 | 14 | if(((HSV = 0) or (Nutritional Factor3 >= 1)) and (Linen Bed Exchange != 0)){ 1 }else{ 0 } |
| 2.68(2.15; 3.35) | 2.45(1.26; 4.77) | 60.9 | 18 | if(((HSV = 0) or (daycare = 1)) and ((Nutritional Factor4 != 1) or (Linen Bed Exchange != 0))){ 1 }else{ 0 } |
| 2.45(1.96; 3.07) | 2.23(1.14; 4.38) | 60.0 | 20 | if((HSV != 0) and (Nutritional Factor3 = 0)){ (if(Nutritional Factor1 != 1){ 0 }else{ 1 }) }else{ (if(Linen Bed Exchange != 0){ 1 }else{ 0 }) } |
| 3.57(2.85; 4.49) | 2.31(1.19; 4.48) | 60.3 | 39 | if((Nutritional Factor4 < 1) xor (((Nutritional Factor2 >= 1) or ((num siblings = 1) and (Fly at Home = 0))) and (((Mother Psychological disorder != 1) or (num siblings >= 1)) xor (Linen Bed Exchange = 0)))){ (if((HSV != 1) or (Tgondi != 1)){ 1 }else{ 0 }) }else{ 0 } |
| 6.73(5.30; 8.59) | 2.35(1.14; 4.85) | 58.9 | 124 | too large to show |
| IgE | | | | |
| 1.76(1.39; 2.22) | 2.28(1.12; 4.63) | 60.0 | 10 | if((Gender != 1) or (Cat at Home != 0)){ 0 }else{ 1 } |
| 2.00(1.58; 2.53) | 1.87(0.93; 3.75) | 57.7 | 10 | if((Gender != 0) and (sewage disposal != 1)){ 1 }else{ 0 } |
| 1.63(1.28; 2.08) | 1.81(0.88; 3.72) | 56.9 | 10 | if((Nutritional Factor1 != 1) and (Gender != 0)){ 1 }else{ 0 } |
| 2.17(1.71; 2.75) | 2.37(1.11; 5.06) | 59.2 | 14 | if(((Tgondi = 0) and (Gender != 0)) and (sewage disposal != 1)){ 1 }else{ 0 } |
| 2.02(1.60; 2.57) | 1.98(0.99; 3.98) | 58.5 | 14 | if(((Gender != 1) and (Cat at Home = 0)) or (sewage disposal = 1)){ 0 }else{ 1 } |
| 2.39(1.88; 3.04) | 1.75(0.87; 3.52) | 56.9 | 18 | if(((Nutritional Factor1 = 1) or (Gender = 1)) and ((sewage disposal = 1) xor (Tgondi = 0))){ 1 }else{ 0 } |
| 2.46(1.94; 3.12) | 2.13(1.05; 4.31) | 59.2 | 22 | if(((Tgondi = 1) or (sewage disposal = 1)) or ((Gender != 1) and ((Nutritional Factor2 <= 0) xor (sewage disposal = 0)))){ 0 }else{ 1 } |
| 3.64(2.84; 4.69) | 2.92(1.43; 5.96) | 63.1 | 46 | if((((Tgondi != 1) or ((Nutritional Factor4 = 1) xor (paving of the street != 1))) xor (sewage disposal != 1)) or ((Gender != 1) and ((((Age <= 1) and (Nutritional Factor2 < 1)) xor (sewage disposal = 0)) xor (((Nutritional Factor1 <= 0) xor (Ttrichiura = 1)) and (num siblings > 2))))){ 0 }else{ 1 } |
| 3.89(3.05; 4.98) | 2.14(1.06; 4.35) | 59.2 | 58 | if((Tgondi = 0) xor ((Nutritional Factor4 <= 0) and (Age < 2))){ (if(((Gender = 1) xor (Ttrichiura = 1)) or ((sewage disposal != 0) xor (Nutritional Factor2 <= 0))){ (if((((BMI = 1) xor (HAV = 0)) or (Nutritional Factor1 >= 1)) xor (sewage disposal != 1)){ 0 }else{ 1 }) }else{ 0 }) }else{ (if(Gender != 0){ (if(HZV = 0){ 1 }else{ 0 }) }else{ 0 }) } |
For a variable, “0” means negative for that variable and “1” means positive.
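To make the table columns concrete, the following sketch evaluates one rule of the form listed above, `if((Cat at Home = 1) or (Age = 0)){ 1 }else{ 0 }`, on a small set of records and derives its accuracy and odds ratio. The records are hypothetical, the field names are illustrative, the coding of `Age = 0` as the youngest age category is an assumption, and the Woolf (log-based) confidence interval is also an assumption because the source does not say how its intervals were computed.

```python
import math


def rule(record):
    """Predict 1 (positive for asthma) if a cat is at home or age code is 0."""
    return 1 if (record["cat_at_home"] == 1 or record["age"] == 0) else 0


def confusion(records, outcome_key):
    """Count true/false positives and negatives of the rule."""
    tp = fp = fn = tn = 0
    for r in records:
        pred, truth = rule(r), r[outcome_key]
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def odds_ratio_ci(tp, fp, fn, tn, z=1.96):
    """Odds ratio with a Woolf 95% interval (all cells must be nonzero)."""
    odds = (tp * tn) / (fp * fn)
    se = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    return (odds,
            math.exp(math.log(odds) - z * se),
            math.exp(math.log(odds) + z * se))


# Hypothetical records: 20 children with made-up values.
records = (
    [{"cat_at_home": 1, "age": 1, "asthma": 1}] * 3
    + [{"cat_at_home": 0, "age": 0, "asthma": 1}] * 3
    + [{"cat_at_home": 0, "age": 0, "asthma": 0}] * 4
    + [{"cat_at_home": 0, "age": 1, "asthma": 1}] * 2
    + [{"cat_at_home": 0, "age": 2, "asthma": 0}] * 8
)

tp, fp, fn, tn = confusion(records, "asthma")
accuracy = (tp + tn) / len(records)
odds, low, high = odds_ratio_ci(tp, fp, fn, tn)
print(f"accuracy={accuracy:.1%}  OR={odds:.2f} ({low:.2f}; {high:.2f})")
```

An interval whose lower bound is above 1 indicates that predicted positives have higher odds of the outcome than predicted negatives; several test-group intervals in the table include 1.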
Grammar production rules (nonterminal symbols not recoverable). Surviving terminal alternatives: comparison operators <, <=, ==, >=, >, !=; equality operators ==, !=; ordering operators <, <=, >, >=; binary values 0 and 1.