Susanne Bornelöv, Annika Sääf, Erik Melén, Anna Bergström, Behrooz Torabi Moghadam, Ville Pulkkinen, Nathalie Acevedo, Christina Orsmark Pietras, Markus Ege, Charlotte Braun-Fahrländer, Josef Riedler, Gert Doekes, Michael Kabesch, Marianne van Hage, Juha Kere, Annika Scheynius, Cilla Söderhäll, Göran Pershagen, Jan Komorowski.
Abstract
Both genetic and environmental factors are important for the development of allergic diseases. However, a detailed understanding of how such factors act together is lacking. To elucidate the interplay between genetic and environmental factors in allergic diseases, we used a novel bioinformatics approach that combines feature selection and machine learning. In two studies, PARSIFAL (a European cross-sectional study of 3113 children) and BAMSE (a Swedish birth cohort including 2033 children), genetic variants as well as environmental and lifestyle factors were evaluated for their contribution to allergic phenotypes. Monte Carlo feature selection and rule-based models were used to identify and rank rules describing how combinations of genetic and environmental factors affect the risk of allergic diseases. Novel interactions between genes were suggested and replicated, such as between ORMDL3 and RORA, where certain genotype combinations gave odds ratios for current asthma of 2.1 (95% CI 1.2-3.6) and 3.2 (95% CI 2.0-5.0) in the BAMSE and PARSIFAL children, respectively. Several combinations of environmental factors appeared to be important for the development of allergic disease in children. For example, use of baby formula and antibiotics early in life was associated with an odds ratio of 7.4 (95% CI 4.5-12.0) for developing asthma. Furthermore, genetic variants together with environmental factors seemed to play a role in allergic diseases, such as the use of antibiotics early in life and COL29A1 variants for asthma, and farm living and NPSR1 variants for allergic eczema. Overall, combinations of environmental and lifestyle factors appeared more frequently in the models than combinations solely involving genes. In conclusion, a new bioinformatics approach is described for analyzing complex data, including extensive genetic and environmental information. Interactions identified with this approach could provide useful hints for further in-depth studies of etiological mechanisms and may also strengthen the basis for risk assessment and prevention.
Year: 2013 PMID: 24260339 PMCID: PMC3833974 DOI: 10.1371/journal.pone.0080080
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Overview of the epidemiologic studies BAMSE and PARSIFAL.
| | BAMSE | | PARSIFAL | |
|---|---|---|---|---|
| Total number | 2033 | | 3113 | |
| Boys (%) | 52 | | 51 | |
| Age (years; average) | 8.3 | | 9.0 | |
| Phenotypes (n = count) | Affected | Unaffected | Affected | Unaffected |
| Asthma | 293 | 1661 | 261 | 2801 |
| Allergic asthma | 158 | 1123 | 144 | 2058 |
| Non-allergic asthma | 135 | 1123 | 117 | 2058 |
| Current asthma | 131 | 1568 | 119 | 2663 |
| Wheeze | 226 | 1796 | 236 | 2849 |
| Eczema | 182 | 1775 | 399 | 2650 |
| Allergic eczema | 98 | 1190 | 190 | 1960 |
| Non-allergic eczema | 84 | 1190 | 209 | 1960 |
| Rhinoconjunctivitis | 313 | 1714 | 215 | 2868 |
| Atopic sensitization >3.5 kU/L | 349 | 1682 | 487 | 2625 |
| Atopic sensitization >0.35 kU/L | 717 | 1314 | 896 | 2214 |
Figure 1. Analysis methodology for factors related to childhood allergy in the epidemiologic studies BAMSE and PARSIFAL.
Allergy phenotypes were modeled to identify rules based on (A) genetic data and (B) genetic and environmental data. MCFS selected significant predictors of each phenotype, which were then used by ROSETTA to generate rules. The first model used 110 SNPs in BAMSE and PARSIFAL, while the second model included both genetic and exposure data in PARSIFAL, using BAMSE for validation when applicable.
Summary of the analyses on combinations of genetic variants using MCFS and rule generation in BAMSE (n=2033) and PARSIFAL (n=3113).
| Phenotype | Study | Factors | Cover | Accur | Rules | Val.Rules | Valid |
|---|---|---|---|---|---|---|---|
| Allergic asthma | BAMSE | 13 | 92.3% | 57.0% | 61 | 4 | 3 |
| | PARSIFAL | 8 | 79.6% | 53.4% | 18 | 0 | 0 |
| Non-allergic asthma | BAMSE | 16 | 92.7% | 58.2% | 70 | 2 | 2 |
| | PARSIFAL | 10 | 84.9% | 56.3% | 37 | 1 | 1 |
| Asthma | BAMSE | 9 | 47.2% | 56.6% | 21 | 2 | 2 |
| | PARSIFAL | 17 | 95.3% | 52.0% | 111 | 1 | 1 |
| Current asthma | BAMSE | 12 | 76.4% | 56.0% | 34 | 3 | 3 |
| | PARSIFAL | 9 | 94.4% | 56.5% | 53 | 4 | 3 |
| Atopic sensitization >3.5 kU/L | BAMSE | 4 | 4.2% | 67.7% | 3 | 1 | 1 |
| | PARSIFAL | 6 | 18.7% | 47.5% | 3 | 1 | 1 |
| Atopic sensitization >0.35 kU/L | BAMSE | 18 | 93.6% | 50.2% | 124 | 1 | 0 |
| | PARSIFAL | 21 | 93.9% | 49.2% | 184 | 0 | 0 |
| Allergic eczema | BAMSE | 5 | 33.2% | 57.1% | 11 | 1 | 1 |
| | PARSIFAL | 8 | 46.2% | 56.9% | 29 | 1 | 1 |
| Eczema | BAMSE | 8 | 49.8% | 54.5% | 17 | 0 | 0 |
| | PARSIFAL | 11 | 73.2% | 56.2% | 45 | 3 | 2 |
| Non-allergic eczema | BAMSE | 10 | 92.0% | 56.1% | 41 | 2 | 0 |
| | PARSIFAL | 7 | 23.5% | 58.8% | 7 | 2 | 2 |
| Rhinoconjunctivitis | BAMSE | 9 | 42.4% | 47.5% | 18 | 0 | 0 |
| | PARSIFAL | 5 | 16.4% | 61.5% | 7 | 1 | 1 |
| Wheeze | BAMSE | 18 | 90.5% | 57.4% | 121 | 6 | 5 |
| | PARSIFAL | 21 | 93.8% | 52.9% | 106 | 3 | 2 |
Eleven allergy phenotypes were modeled by combining Monte Carlo feature selection (MCFS) and rule generation using 110 SNPs in BAMSE and PARSIFAL. The table gives the number of significant factors identified by MCFS (Factors) and the estimated model coverage (Cover) and accuracy (Accur), i.e., the quality of the rules (described in Methods S1). “Rules” = total number of rules, “Val.Rules” = rules used for validation, and “Valid” = rules that passed validation.
Figure 2. SNP combinations with relevance for current asthma and wheeze in BAMSE and PARSIFAL.
The combination of specific genetic variants in (A) ORMDL3-RORA increases the risk for current asthma (rule 1), and in (B) ORMDL3-RORA-COL29A1 increases the risk for wheeze (rule 2). The risk for current asthma and wheeze increased with the number of risk genotypes described by the corresponding rule (C-D). ORs with 95% confidence intervals are shown. The major allele count (0, 1 or 2 copies of the major allele) is indicated below each gene. The reference category includes children who do not fulfill the rule.
Rule 1: IF ORMDL3_rs2305480=2[GG] AND RORA_rs17270362=1[AG] THEN current asthma.
Rule 2: IF COL29A1_rs11917356=2[AA] AND ORMDL3_rs7216389=0[TT] AND RORA_rs17270362=1[AG] THEN wheeze.
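The IF-THEN rules above are conjunctions of conditions on major-allele counts, so checking whether a child fulfills a rule is a simple lookup. The following is a minimal sketch of that idea, not the ROSETTA implementation; the dictionary encoding, the function names, and the example child are assumptions made for illustration only.

```python
from typing import Dict

# A rule is a conjunction of conditions: feature name -> required major-allele count.
Rule = Dict[str, int]

# The two rules quoted in the Figure 2 footnotes, transcribed as dictionaries.
CURRENT_ASTHMA_RULE: Rule = {"ORMDL3_rs2305480": 2, "RORA_rs17270362": 1}
WHEEZE_RULE: Rule = {
    "COL29A1_rs11917356": 2,
    "ORMDL3_rs7216389": 0,
    "RORA_rs17270362": 1,
}


def major_allele_count(genotype: str, major_allele: str) -> int:
    """Count copies (0, 1 or 2) of the major allele in a two-letter genotype."""
    return sum(1 for allele in genotype if allele == major_allele)


def conditions_met(child: Dict[str, int], rule: Rule) -> int:
    """Number of rule conditions the child fulfills (compare panels C-D)."""
    return sum(1 for feature, value in rule.items() if child.get(feature) == value)


def rule_matches(child: Dict[str, int], rule: Rule) -> bool:
    """True only when every condition of the rule is fulfilled."""
    return conditions_met(child, rule) == len(rule)


# Hypothetical child, not study data. A heterozygous genotype such as "AG"
# yields a count of 1 whichever allele is the major one.
child = {
    "ORMDL3_rs2305480": major_allele_count("GG", "G"),
    "RORA_rs17270362": major_allele_count("AG", "G"),
    "ORMDL3_rs7216389": 1,
    "COL29A1_rs11917356": 2,
}

print(rule_matches(child, CURRENT_ASTHMA_RULE))  # all conditions of rule 1 hold
print(conditions_met(child, WHEEZE_RULE))        # conditions of rule 2 fulfilled
```

Counting fulfilled conditions separately from full matches mirrors the distinction in the figure between children who fulfill the whole rule and those who carry only some of its risk genotypes.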
Summary of the analyses on combinations of genetic variants and environmental factors using MCFS and rule generation in PARSIFAL (n=3113).
| Phenotype | Factors | Cover | Accur | Rules | Val.Rules | Valid |
|---|---|---|---|---|---|---|
| Allergic asthma | 20 | 94.1% | 61.2% | 73 | 20 | 18 |
| Asthma | 16 | 93.2% | 62.6% | 66 | 16 | 14 |
| Non-allergic asthma | 16 | 95.8% | 64.0% | 72 | 16 | 16 |
| Current asthma | 19 | 93.0% | 63.2% | 39 | 12 | 10 |
| Atopic sensitization >3.5 kU/L | 14 | 88.4% | 64.0% | 51 | 7 | 7 |
| Atopic sensitization >0.35 kU/L | 3 | 17.3% | 59.0% | 3 | 0 | 0 |
| Allergic eczema | 24 | 95.0% | 67.4% | 83 | 17 | 17 |
| Eczema | 11 | 65.3% | 61.7% | 30 | 17 | 15 |
| Non-allergic eczema | 8 | 69.5% | 59.7% | 43 | 9 | 9 |
| Rhinoconjunctivitis | 18 | 87.1% | 63.8% | 41 | 14 | 13 |
| Wheeze | 16 | 82.1% | 60.8% | 59 | 15 | 13 |
Eleven allergy phenotypes were modeled by combining Monte Carlo feature selection (MCFS) and rule generation using genetic and environmental/lifestyle factors in PARSIFAL. The table gives the number of significant factors identified by MCFS (Factors) and the estimated model coverage (Cover) and accuracy (Accur), i.e., the quality of the rules (described in Methods S1). “Rules” = total number of rules, “Val.Rules” = rules used for validation, and “Valid” = rules that passed validation.
Figure 3. Visualization of co-occurring factors in rules for allergic eczema, asthma and atopic sensitization in PARSIFAL.
Rule networks for (A-B) allergic eczema, (C-D) asthma and (E-F) atopic sensitization; affected and unaffected, respectively. Conditions that occur in the rules are on the outer ring, and co-occurrences of conditions in the rules are illustrated by ribbons across the circle connecting the conditions. The ribbon color indicates high (red) to low (grey) scores. The width of the edges is proportional to the number of correctly classified children.
Figure 4. Combinations of genetic variants and/or environmental factors in relation to allergy and asthma in PARSIFAL.
Odds ratios are shown for the top-hit rules identified for (A) allergic eczema, affected (rules 1-10) and unaffected (rules 58-67); (B) asthma, affected (rules 1-10) and unaffected (rules 44-53); and (C) atopic sensitization, affected (rules 1-10) and unaffected (rules 37-46). The odds ratios were calculated for children who fulfill all conditions in the rule, using all other children as reference.
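The comparison described in this caption (children who fulfill every condition of a rule versus all other children) reduces to a 2×2 table of rule status by phenotype. The sketch below shows one common way to obtain an odds ratio with a 95% confidence interval from such a table. It assumes a Wald interval on the log odds ratio, which the source does not confirm as the method used, and the counts are hypothetical rather than taken from BAMSE or PARSIFAL.

```python
import math
from typing import Tuple


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: children fulfilling the rule, affected
    b: children fulfilling the rule, unaffected
    c: all other children, affected
    d: all other children, unaffected
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only (not study data).
or_est, ci_low, ci_high = odds_ratio_wald(a=20, b=80, c=100, d=1600)
print(f"OR = {or_est:.1f} (95% CI {ci_low:.1f}-{ci_high:.1f})")
```

With small cell counts, an exact or continuity-corrected interval may be preferable to the Wald approximation used here.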