J Gilberto Rodríguez-Escobedo, Christian A García-Sepúlveda, Juan C Cuevas-Tello.
Abstract
Killer-cell immunoglobulin-like receptors (KIRs) are membrane proteins expressed by cells of innate and adaptive immunity. The KIR system consists of 17 genes and 614 alleles arranged into different haplotypes. KIR genes modulate susceptibility to haematological malignancies, viral infections, and autoimmune diseases. Molecular epidemiology studies rely on traditional statistical methods to identify associations between KIR genes and disease. We have previously reported results obtained by applying support vector machines to identify such associations. However, rules specifying which haplotypes are associated with greater susceptibility to malignancies are lacking. Here we present the results of our investigation into the rules governing haematological malignancy susceptibility. We studied the different haplotypic combinations of 17 KIR genes in 300 healthy individuals and 43 patients with haematological malignancies (25 with leukaemia and 18 with lymphomas). We compare two machine learning algorithms against traditional statistical analysis and show that the "a priori" algorithm is capable of discovering patterns not revealed by previous algorithms and statistical approaches.
Year: 2015 PMID: 26495028 PMCID: PMC4606520 DOI: 10.1155/2015/141363
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Clinical data for the haematological cohort.
| | n | % |
|---|---|---|
| Gender | | |
| Male | 23 | 53 |
| Female | 20 | 46 |
| Diagnosis | | |
| Chronic myeloid leukaemia | 25 | 58 |
| Hodgkin's lymphoma | 18 | 42 |
| B symptoms | | |
| Present | 30 | 70 |
| Absent | 13 | 30 |
| ECOGᵃ | | |
| 0 | 3 | 7 |
| 1 | 16 | 37 |
| 2 | 20 | 46 |
| 3 | 3 | 7 |
| 4 | 1 | 2 |
ᵃ Eastern Cooperative Oncology Group (ECOG).
Figure 1. KIR gene features present in the healthy unrelated donor and haematological malignancy cohorts. KIR haplotype. A,—corresponds to group A homozygous haplotypes, whereas A Hp includes both homozygous and heterozygous group A haplotypes (and vice versa for B). cB01 haplotypes having KIR2DS3 but not KIR2DS5 are indicated as "cB01(s3)", and vice versa for those containing KIR2DS5 instead of KIR2DS3. The same applies to the cB03 and tB01 categories. Combinations of centromeric and telomeric motifs that are thought very likely to occur based on Pyo's 2010 criteria [11] are included at the bottom of the figure as extended haplotypes.
Study population. For visualization purposes, only the first five rows (disease, C = 1) and the last three rows (healthy, C = 0) are shown. The last column is the class. A ✓ indicates that the gene is present (1); an empty box indicates that it is absent (0).
| Id | 2DL1 | 2DL2 | 2DL3 | 2DL5 | 2DS1 | 2DS2 | 2DS3 | 2DS4 | 2DS5 | 2DP1 | 3DL1 | 3DS1 | Disease (class C) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 1 | ||||||
| 2 | ✓ | ✓ | ✓ | ✓ | ✓ | 1 | |||||||
| 3 | ✓ | ✓ | ✓ | ✓ | ✓ | 1 | |||||||
| 4 | ✓ | ✓ | ✓ | ✓ | ✓ | 1 | |||||||
| 5 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 1 | |||
| ⋮ | ⋮ | ||||||||||||
| 341 | ✓ | ✓ | ✓ | ✓ | ✓ | 0 | |||||||
| 342 | ✓ | ✓ | ✓ | ✓ | ✓ | 0 | |||||||
| 343 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0 |
Truth table, AND operator (∧).
| g1 | g2 | g1 ∧ g2 |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |
This table contains 20 records with two variables (g1 and g2); the class C is 0 when the donor is healthy and 1 when the donor is diseased.
| # | g1 | g2 | C |
|---|---|---|---|
| 1 | 1 | 1 | 1 |
| 2 | 0 | 0 | 0 |
| 3 | 0 | 1 | 0 |
| 4 | 1 | 1 | 1 |
| 5 | 0 | 0 | 0 |
| 6 | 1 | 0 | 0 |
| 7 | 0 | 1 | 0 |
| 8 | 1 | 1 | 1 |
| 9 | 0 | 0 | 0 |
| 10 | 0 | 1 | 0 |
| 11 | 1 | 1 | 1 |
| 12 | 0 | 0 | 0 |
| 13 | 1 | 0 | 0 |
| 14 | 0 | 1 | 0 |
| 15 | 0 | 1 | 0 |
| 16 | 1 | 1 | 1 |
| 17 | 1 | 1 | 1 |
| 18 | 0 | 0 | 0 |
| 19 | 0 | 1 | 0 |
| 20 | 0 | 0 | 0 |
Figure 2. Results from the example. (a) Statistical test. (b) J48 pruned tree. (c) Rules given by the a priori algorithm.
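The kind of rule reported in panel (c) can be illustrated on the 20-record example. The sketch below is a brute-force enumeration, not the level-wise candidate pruning of the Apriori algorithm itself, and the names `RECORDS`, `VARIABLES`, and `class_rules` are hypothetical. It lists every conjunction of variable tests whose matching records all belong to the target class.

```python
from itertools import combinations, product

# The 20 example records, transcribed as (g1, g2, C).
RECORDS = [
    (1, 1, 1), (0, 0, 0), (0, 1, 0), (1, 1, 1), (0, 0, 0),
    (1, 0, 0), (0, 1, 0), (1, 1, 1), (0, 0, 0), (0, 1, 0),
    (1, 1, 1), (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 1, 0),
    (1, 1, 1), (1, 1, 1), (0, 0, 0), (0, 1, 0), (0, 0, 0),
]
VARIABLES = ("g1", "g2")


def class_rules(records, target=1, min_count=1):
    """Return (rule, frequency) pairs for conjunctions that always imply `target`.

    Brute force over all variable subsets and value assignments; this is an
    illustrative stand-in for a real association-rule miner.
    """
    rules = []
    for size in range(1, len(VARIABLES) + 1):
        for idxs in combinations(range(len(VARIABLES)), size):
            for values in product((0, 1), repeat=size):
                matched = [r for r in records
                           if all(r[i] == v for i, v in zip(idxs, values))]
                hits = sum(1 for r in matched if r[-1] == target)
                # Keep the rule only if every matching record has the target class.
                if matched and hits == len(matched) and hits >= min_count:
                    rule = {VARIABLES[i]: v for i, v in zip(idxs, values)}
                    rules.append((rule, hits))
    return rules


rules = class_rules(RECORDS)
print(rules)
```

On this data the only surviving rule is g1 = 1 ∧ g2 = 1 ⇒ C = 1, which is the AND relationship the example was built around.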
Pseudocode 1
Figure 3. J48 decision tree.
Univariate statistical analysis.
| | 2DL1 | 2DL2 | 2DL3 | 2DL5 | 2DS1 | 2DS2 | 2DS3 | 2DS4 | 2DS5 | 2DP1 | 3DL1 | 3DS1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| p value | 0.752 | 0.0000087 | 0.467 | 0.214 | 0.421 | 0.271 | 0.131 | 0.199 | 0.946 | 0.921 | 0.042 | 0.888 |
| χ² | 0.100 | 19.764 | 0.530 | 1.547 | 0.649 | 1.213 | 2.281 | 1.649 | 0.005 | 0.010 | 4.128 | 0.020 |
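The two rows of this table are linked by the χ² distribution. As a minimal sketch, assuming each test has one degree of freedom (a 2×2 presence/absence by class table, which the record does not state explicitly), the p value follows from the statistic through the complementary error function. The function name `chi2_p_value_1df` is hypothetical.

```python
import math


def chi2_p_value_1df(chi2):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom.

    For 1 df, P(X > x) = erfc(sqrt(x / 2)). The 1-df assumption is ours.
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


# Statistics taken from the table for 2DL1, 2DL2 and 3DL1.
for gene, chi2 in (("2DL1", 0.100), ("2DL2", 19.764), ("3DL1", 4.128)):
    print(gene, chi2_p_value_1df(chi2))
```

Under that assumption the computed values agree with the tabulated p values for these three genes to the precision shown.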
Multivariate statistical analysis. Only the variable combinations associated with the haplotype cA01|tA01 are shown. A ✓ indicates that the variable is part of the combination; otherwise it is not taken into account.
| # | 2DL1 | 2DL2 | 2DL3 | 2DL5 | 2DS1 | 2DS2 | 2DS3 | 2DS4 | 2DS5 | 2DP1 | 3DL1 | 3DS1 | p value | χ² |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.00036 | 12.7 | ||||||
| 2 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.01918 | 5.4 | |||||
| 3 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.00053 | 11.9 | |||||
| 4 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.00022 | 13.5 | |||||
| 5 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | | |||||
| 6 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.04289 | 4.09 | ||||
| 7 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.01918 | 5.4 | ||||
| 8 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.00574 | 7.6 | ||||
| 9 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.01918 | 5.4 | ||||
| 10 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.00213 | 9.4 | ||||
| 11 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.00246 | 9.1 | ||||
| 12 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.04289 | 4.09 | |||
| 13 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.04289 | 4.09 | |||
| 14 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.01918 | 5.4 | |||
| 15 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.00574 | 7.6 | |||
| 16 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.04289 | 4.09 |
Rules generated by the a priori algorithm, represented in tabular form. Only the 24 rules with frequency 10 and class C = 1 are shown.
| # | Id | KIR2DL1 | KIR2DL2 | KIR2DL3 | KIR2DL5 | KIR2DS2 | KIR2DS4 | KIR2DP1 | KIR3DL1 | Frequency |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1476 | | 2DL2 = 1 | | 2DL5 = 1 | 2DS2 = 0 | 2DS4 = 1 | | | 10 |
| 2 | 1477 | | 2DL2 = 1 | | 2DL5 = 1 | 2DS2 = 0 | | | 3DL1 = 1 | 10 |
| 3 | 1528 | 2DL1 = 1 | 2DL2 = 1 | | 2DL5 = 1 | 2DS2 = 0 | 2DS4 = 1 | | | 10 |
| 4 | 1529 | 2DL1 = 1 | 2DL2 = 1 | | 2DL5 = 1 | 2DS2 = 0 | | | 3DL1 = 1 | 10 |
| 5 | 1558 | | 2DL2 = 1 | 2DL3 = 1 | 2DL5 = 1 | 2DS2 = 0 | 2DS4 = 1 | | | 10 |
| 6 | 1559 | | 2DL2 = 1 | 2DL3 = 1 | 2DL5 = 1 | 2DS2 = 0 | | | 3DL1 = 1 | 10 |
| 7 | 1560 | | 2DL2 = 1 | | 2DL5 = 1 | 2DS2 = 0 | 2DS4 = 1 | 2DP1 = 1 | | 10 |
| 8 | 1561 | | 2DL2 = 1 | | 2DL5 = 1 | 2DS2 = 0 | 2DS4 = 1 | | 3DL1 = 1 | 10 |
| 9 | 1562 | | 2DL2 = 1 | | 2DL5 = 1 | 2DS2 = 0 | | 2DP1 = 1 | 3DL1 = 1 | 10 |
| 10 | 1651 | 2DL1 = 1 | 2DL2 = 1 | 2DL3 = 1 | 2DL5 = 1 | 2DS2 = 0 | 2DS4 = 1 | | | 10 |
| 11 | 1652 | 2DL1 = 1 | 2DL2 = 1 | 2DL3 = 1 | 2DL5 = 1 | 2DS2 = 0 | | | 3DL1 = 1 | 10 |
| 12 | 1653 | 2DL1 = 1 | 2DL2 = 1 | | 2DL5 = 1 | 2DS2 = 0 | 2DS4 = 1 | 2DP1 = 1 | | 10 |
| 13 | 1654 | 2DL1 = 1 | 2DL2 = 1 | | 2DL5 = 1 | 2DS2 = 0 | 2DS4 = 1 | | 3DL1 = 1 | 10 |
| 14 | 1655 | 2DL1 = 1 | 2DL2 = 1 | | 2DL5 = 1 | 2DS2 = 0 | | 2DP1 = 1 | 3DL1 = 1 | 10 |
| 15 | 1681 | | 2DL2 = 1 | 2DL3 = 1 | 2DL5 = 1 | 2DS2 = 0 | 2DS4 = 1 | 2DP1 = 1 | | 10 |
| 16 | 1682 | | 2DL2 = 1 | 2DL3 = 1 | 2DL5 = 1 | 2DS2 = 0 | 2DS4 = 1 | | 3DL1 = 1 | 10 |
| 17 | 1683 | | 2DL2 = 1 | 2DL3 = 1 | 2DL5 = 1 | 2DS2 = 0 | | 2DP1 = 1 | 3DL1 = 1 | 10 |
| 18 | 1684 | | 2DL2 = 1 | | 2DL5 = 1 | 2DS2 = 0 | 2DS4 = 1 | 2DP1 = 1 | 3DL1 = 1 | 10 |
| 19 | 1784 | 2DL1 = 1 | 2DL2 = 1 | 2DL3 = 1 | 2DL5 = 1 | 2DS2 = 0 | 2DS4 = 1 | 2DP1 = 1 | | 10 |
| 20 | 1785 | 2DL1 = 1 | 2DL2 = 1 | 2DL3 = 1 | 2DL5 = 1 | 2DS2 = 0 | 2DS4 = 1 | | 3DL1 = 1 | 10 |
| 21 | 1786 | 2DL1 = 1 | 2DL2 = 1 | 2DL3 = 1 | 2DL5 = 1 | 2DS2 = 0 | | 2DP1 = 1 | 3DL1 = 1 | 10 |
| 22 | 1787 | 2DL1 = 1 | 2DL2 = 1 | | 2DL5 = 1 | 2DS2 = 0 | 2DS4 = 1 | 2DP1 = 1 | 3DL1 = 1 | 10 |
| 23 | 1806 | | 2DL2 = 1 | 2DL3 = 1 | 2DL5 = 1 | 2DS2 = 0 | 2DS4 = 1 | 2DP1 = 1 | 3DL1 = 1 | 10 |
| 24 | | | | | | | | | | |
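Each rule in this table is a conjunction of presence (1) and absence (0) conditions on KIR genes, and its frequency is the number of records that satisfy all of them. A minimal sketch of that evaluation follows, using rule 1 (Id 1476) from the table. The helper names and the three genotypes are hypothetical and invented for illustration; they are not study data. A gene missing from a genotype is assumed to be absent.

```python
# Rule 1 (Id 1476) from the table: gene -> required presence (1) or absence (0).
RULE_1476 = {"2DL2": 1, "2DL5": 1, "2DS2": 0, "2DS4": 1}


def matches(rule, genotype):
    """True when every gene named in the rule has the required value.

    Genes not listed in the genotype are treated as absent (assumption).
    """
    return all(genotype.get(gene, 0) == value for gene, value in rule.items())


def frequency(rule, genotypes):
    """Number of genotypes that satisfy the rule."""
    return sum(1 for g in genotypes if matches(rule, g))


# Hypothetical genotypes for illustration only (not taken from the cohort).
EXAMPLE_GENOTYPES = [
    {"2DL1": 1, "2DL2": 1, "2DL5": 1, "2DS2": 0, "2DS4": 1, "3DL1": 1},
    {"2DL1": 1, "2DL2": 1, "2DL5": 1, "2DS2": 1, "2DS4": 1, "3DL1": 1},
    {"2DL1": 1, "2DL3": 1, "2DS4": 1, "3DL1": 1},
]

print(frequency(RULE_1476, EXAMPLE_GENOTYPES))
```

Only the first invented genotype satisfies the rule: the second carries 2DS2 and the third lacks 2DL2 and 2DL5.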
(a) Multivariate statistical analysis
| | Disease | Healthy |
|---|---|---|
| Disease | 18 | 25 |
| Healthy | 46 | 254 |
p value = 0.00002; χ² = 17.4.
(b) A priori algorithm
| | Disease | Healthy |
|---|---|---|
| Disease | 10 | 33 |
| Healthy | 0 | 300 |
p value = 0.0; χ² = 71.86.
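The χ² values quoted for both 2×2 tables can be checked by hand. The sketch below assumes the Pearson statistic without continuity correction, which the record does not name. The function `pearson_chi2` is a hypothetical helper, not the authors' code.

```python
def pearson_chi2(table):
    """Pearson chi-square for a 2x2 table [[a, b], [c, d]], no continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    # Shortcut formula for 2x2 tables: n * (ad - bc)^2 / product of the margins.
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


MULTIVARIATE = [[18, 25], [46, 254]]   # table (a)
A_PRIORI = [[10, 33], [0, 300]]        # table (b)

chi2_a = pearson_chi2(MULTIVARIATE)
chi2_b = pearson_chi2(A_PRIORI)
print(round(chi2_a, 2), round(chi2_b, 2))
```

Under that assumption the results round to 17.4 and 71.86, the statistics reported for (a) and (b).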