Aneta Polewko-Klim, Wojciech Lesiński, Agnieszka Kitlas Golińska, Krzysztof Mnich, Maria Siwek, Witold R Rudnicki.
Abstract
Two categories of immune responses, innate and adaptive immunity, both have a polygenic background and a significant environmental component. The goal of the reported study was to define candidate genes and mutations for the immune traits of interest in chickens using machine learning-based sensitivity analysis of single-nucleotide polymorphisms (SNPs) located in candidate genes defined in quantitative trait loci regions. Here, adaptive immunity is represented by the specific antibody response toward keyhole limpet hemocyanin (KLH), whereas innate immunity is represented by natural antibodies toward lipopolysaccharide (LPS) and lipoteichoic acid (LTA). The analysis consisted of 3 basic steps: identification of candidate SNPs via feature selection, optimisation of the feature set using recursive feature elimination, and finally a gene-level sensitivity analysis for the final selection of models. The predictive model based on 5 genes (MAPK8IP3, CRLF3, UNC13D, IL9R, and PRKCB) explains 14.9% of the variance of the KLH adaptive response. The models obtained for LTA and LPS use more genes and have lower predictive power, explaining 7.8 and 4.5% of the total variance, respectively. In comparison, linear models built on genes identified by a standard statistical analysis explain 1.5, 0.5, and 0.3% of the variance for the KLH, LTA, and LPS responses, respectively. The present study shows that machine learning methods applied to systems with a complex interaction network can discover phenotype-genotype associations with much higher sensitivity than traditional statistical models. It adds to the evidence suggesting a role for MAPK8IP3 in the adaptive immune response and indicates that CRLF3 is involved in this process as well. Both findings need additional verification.
Keywords: chicken; immune response; machine learning; marker gene
Year: 2020 PMID: 33248550 PMCID: PMC7704721 DOI: 10.1016/j.psj.2020.08.059
Source DB: PubMed Journal: Poult Sci ISSN: 0032-5791 Impact factor: 3.352
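The abstract reports model quality as the percentage of phenotypic variance explained. The formula is not stated here; a common definition, assumed in the sketch below, is the out-of-sample coefficient of determination, 100 × (1 − SSE/SST), computed on predictions for individuals not used to train the model. The function name `percent_variance_explained` is hypothetical.

```python
def percent_variance_explained(y_true, y_pred):
    """Return 100 * (1 - SSE / SST) for held-out predictions.

    Assumption: 'variance explained' is the out-of-sample R^2, with SST
    measured around the mean of the observed values being predicted.
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    mean_y = sum(y_true) / len(y_true)
    sst = sum((y - mean_y) ** 2 for y in y_true)  # total sum of squares
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    if sst == 0:
        raise ValueError("observed values have zero variance")
    return 100.0 * (1.0 - sse / sst)


# Perfect predictions explain all variance; predicting the mean explains none.
print(percent_variance_explained([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]))  # 100.0
print(percent_variance_explained([1.0, 2.0, 3.0, 4.0], [2.5, 2.5, 2.5, 2.5]))  # 0.0
```

On held-out data the value can be negative when a model predicts worse than the observed mean, so low single-digit percentages still indicate some predictive signal.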
Figure 1. Boxplot for the KLH7 data set. The blue line depicts the mean KLH7 response calculated over all individuals and batches, and the red dots mark the mean KLH7 response in each batch.
Figure 2. Selection of relevant variables using random forest importance in a double cross-validation scheme. The external cross-validation loop is used to obtain a reliable estimate of model performance.
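The key property of the double (nested) cross-validation in Figure 2 is that the outer test fold never takes part in variable selection. The sketch below shows only that index bookkeeping in plain Python; it assumes 3 outer and 3 inner folds (the caption does not give the fold counts for this step), the function names are hypothetical, and the random forest importance ranking itself is not implemented.

```python
import random


def k_folds(indices, k, rng):
    """Shuffle indices and deal them into k folds of near-equal size."""
    idx = list(indices)
    rng.shuffle(idx)
    return [sorted(idx[i::k]) for i in range(k)]


def double_cv_splits(n, outer_k=3, inner_k=3, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for a double CV scheme.

    Variable selection (random forest importance in the study) should use only
    the inner splits of outer_train; outer_test is used once, to score the
    model built on the selected variables.
    """
    rng = random.Random(seed)
    for outer_test in k_folds(range(n), outer_k, rng):
        held_out = set(outer_test)
        outer_train = [i for i in range(n) if i not in held_out]
        inner_splits = []
        for inner_val in k_folds(outer_train, inner_k, rng):
            val = set(inner_val)
            inner_train = [i for i in outer_train if i not in val]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits


# Prints "8 4 [3, 3, 2]" once for each of the 3 outer folds.
for outer_train, outer_test, inner in double_cv_splits(12):
    print(len(outer_train), len(outer_test), [len(v) for _, v in inner])
```

Because selection happens strictly inside each outer training set, the outer scores are not inflated by the same individuals having guided the choice of SNPs.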
Sensitivity analysis of genes for KLH7.
Stage 1

| Group | Gene | SNPs | Mean | Mean diff. |
|---|---|---|---|---|
| Reference | | All SNPs | 11.2% | 0 |
| I | MAPK8IP3 | 14692425, | 8.1% | −3.1% |
| | CRLF3 | | 8.9% | −2.3% |
| | UNC13D | | 10.2% | −1.0% |
| II | IL9R | | 10.7% | −0.5% |
| | PRKCB | | 10.8% | −0.4% |
| | MAP2K3 | | 10.9% | −0.3% |
| | ST6GAL1 | | 10.9% | −0.3% |
| | CARD11 | 14071669, | 11.1% | −0.1% |
| | PTGER4 | | 11.1% | −0.1% |
| | GPC1 | 16651464 | 11.1% | −0.1% |
| III | SOX14 | 15947324 | 11.3% | 0.1% |
| | JAK2 | 14777688 | 11.3% | 0.1% |
| | PDGFA | 14070244 | 11.4% | 0.2% |
| | NLRC3 | 29005402 | 11.4% | 0.2% |
| | JMJD6 | 15820319 | 11.5% | 0.3% |
| | MAP2K4 | 15810344 | 11.7% | 0.5% |
| | SMURF1 | | 11.7% | 0.5% |
Stage 1: The numbers describe the performance of random forest models built without the indicated gene; the reference row displays the performance of the models containing all the genes. The Group column divides the genes into 3 groups. The SNPs that were present in all 297 sets in the previous step are displayed in boldface; the SNPs that were present in more than 90% of cases are displayed in italic. Stage 2: The effect of adding selected genes and combinations of genes to the base set consisting of the genes from group I.
Figure 3. Boxplot of gene sensitivity for the KLH7 trait (Table 1). The red vertical lines divide the genes into 3 groups by their influence on the models. The last plot describes a reference series with all the genes. The horizontal line marks the reference level, i.e., the median of the reference models.
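Each row of the sensitivity tables compares the mean performance of random forest models built without one gene against the reference models built on all genes, and the Mean diff. column is the difference of the two means. A minimal sketch of that arithmetic, using a few KLH7 values from the table, follows; the function name is hypothetical, and the assignment to groups I, II, and III (drawn from the boxplots in Figure 3) is not reproduced.

```python
def sensitivity_table(reference, without_gene):
    """Rank genes by the change in mean performance when each is left out.

    reference: mean performance (%) of models built on all genes.
    without_gene: {gene: mean performance (%) of models built without it}.
    A negative difference means the models got worse without the gene,
    i.e. the gene carries useful information.
    """
    rows = [(gene, round(mean - reference, 1)) for gene, mean in without_gene.items()]
    return sorted(rows, key=lambda row: row[1])  # most informative genes first


# A few KLH7 values copied from the table above (reference models: 11.2%).
klh7 = {"JMJD6": 11.5, "MAPK8IP3": 8.1, "UNC13D": 10.2, "CRLF3": 8.9}
for gene, diff in sensitivity_table(11.2, klh7):
    print(f"{gene:10s} {diff:+.1f}%")
```

The recomputed differences (−3.1, −2.3, −1.0, and +0.3 percentage points) match the Mean diff. column; rounding to one decimal removes floating-point noise from the subtraction.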
Sensitivity analysis of genes for LPS.
| Group | Gene | SNPs | Mean | Mean diff. |
|---|---|---|---|---|
| Reference | | All SNPs | 4.4% | 0 |
| I | ST6GAL1 | 15965697 | 3.1% | −1.3% |
| | TRAF7 | 14072516 | 3.0% | −1.4% |
| | ITGB4 | 14110474, 13507637 | 3.4% | −1.0% |
| | PTGER4 | 16102750 | 3.4% | −1.0% |
| | SPHK1 | 15039217 | 3.5% | −0.9% |
| | MAPK8IP3 | 15714774, 16001483 | 3.5% | −0.9% |
| | CRLF3 | 15826598, 15827424, 15040786 | 3.6% | −0.8% |
| | MAP2K4 | 15035880, 15810344, 14105858 | 3.6% | −0.8% |
| | PROCR | 15968294 | 3.6% | −0.8% |
The numbers describe the performance of random forest models built without the indicated gene. The reference row describes the series of models containing all the genes.
Sensitivity analysis of genes for LTA.
Stage 1

| Group | Gene | SNPs | Mean | Mean diff. |
|---|---|---|---|---|
| Reference | | All SNPs | 4.5% | 0.00% |
| I | CRLF3 | 15826598, | 2.3% | −2.2% |
| | MAPK8IP3 | | 3.0% | −1.5% |
| | TNFRSF13B | 14072943 | 3.4% | −1.1% |
| | SMURF1 | 14072521, 15725673 | 3.5% | −1.0% |
| | PDGFA | 14070244 | 3.6% | −0.9% |
| | PTGER4 | 16102750 | 3.8% | −0.7% |
| II | SOX14 | 10730793 | 3.9% | −0.6% |
| | FOXJ1 | 14110239 | 4.3% | −0.2% |
| | GPC1 | 15943775 | 4.3% | −0.2% |
| III | ITGB4 | 15821339, 14110474 | 4.5% | +0.0% |
| | SPHK1 | 15039217 | 4.7% | +0.2% |
| | IL9R | 15732513 | 4.7% | +0.2% |
| | MAP2K4 | 15035854, 15035880, 15810344 | 4.9% | +0.4% |
| | ST6GAL1 | 15965697 | 4.9% | +0.4% |
| | JMJD6 | 15820338 | 5.0% | +0.5% |
The numbers describe the performance of random forest models built without the indicated gene. The reference row describes the series of models containing all the genes.
The SNPs that were present in all 297 sets in the previous step are displayed in boldface.
Figure 4. Histograms of the performance of random forest models for the KLH7, LPS, and LTA phenotypic traits. Models were built using the optimal feature set for each trait. Histograms were generated from 1,000 iterations of 3-fold cross-validation.
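Figure 4 summarises performance as a distribution over 1,000 repetitions of 3-fold cross-validation rather than as a single number. The sketch below shows that resampling loop under stated assumptions: a one-predictor least-squares line replaces the random forest purely to keep the example dependency-free, each repetition reshuffles the individuals, and the out-of-fold predictions of one repetition are pooled into a single variance-explained value in %. All names are hypothetical and the data are simulated, not taken from the study.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in for a random forest)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def repeated_kfold_r2(xs, ys, k=3, n_repeats=1000, seed=0):
    """Return one out-of-fold variance explained (%) per repetition of k-fold CV."""
    rng = random.Random(seed)
    n = len(ys)
    scores = []
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # new random partition for every repetition
        preds = [0.0] * n
        for f in range(k):
            test = idx[f::k]
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
            for i in test:
                preds[i] = a + b * xs[i]
        # Pool the out-of-fold predictions of this repetition into one score.
        my = sum(ys) / n
        sst = sum((y - my) ** 2 for y in ys)
        sse = sum((y - p) ** 2 for y, p in zip(ys, preds))
        scores.append(100.0 * (1.0 - sse / sst))
    return scores


# Simulated data with a weak signal (not data from the study).
rng = random.Random(42)
xs = [rng.gauss(0.0, 1.0) for _ in range(90)]
ys = [0.3 * x + rng.gauss(0.0, 1.0) for x in xs]
scores = repeated_kfold_r2(xs, ys, k=3, n_repeats=1000)
print(f"median {statistics.median(scores):.1f}%, range {min(scores):.1f} to {max(scores):.1f}%")
```

Repeating the split with different shuffles shows how strongly the estimate depends on the particular partition, which is the spread the histograms visualise.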
Summary of the sensitivity analysis for all traits.
| Gene | Chromosome | KLH7 (P19) | KLH7 (S15) | LPS (P19) | LPS (S15) | LTA (P19) | LTA (S15) |
|---|---|---|---|---|---|---|---|
| EPHB1 | 9 | + | ++ | ||||
| GPC1 | 9 | ∗∗ | ∗∗ | ||||
| KLHL6 | 9 | + | + | + | |||
| PROCR | 9 | + | ∗∗∗ | ++ | |||
| SOX14 | 9 | ∗ | ∗∗∗ | ||||
| ST6GAL1 | 9 | ∗∗ | ∗∗∗ | ∗ | |||
| CARD11 | 14 | ∗∗ | ++ | + | |||
| IL9R | 14 | ∗∗∗ | ++ | + | ∗ | ++ | |
| MAP2K3 | 14 | ∗∗ | ++ | ||||
| MAPK8IP3 | 14 | ∗∗∗ | ++ | ∗∗∗ | + | ∗∗∗ | ++ |
| NLRC3 | 14 | ∗ | |||||
| PDGFA | 14 | ∗ | ++ | ∗∗∗ | |||
| PRKCB | 14 | ∗∗∗ | ++ | + | ++ | ||
| SMURF1 | 14 | ∗ | ∗∗∗ | ||||
| SOCS1 | 14 | + | |||||
| TNFRSF13B | 14 | ∗∗∗ | + | ||||
| TRAF7 | 14 | + | ∗∗∗ | ||||
| CRLF3 | 18 | ∗∗∗ | ∗∗∗ | ++ | ∗∗∗ | ||
| FOXJ1 | 18 | + | ++ | ∗∗ | ++ | ||
| ITGB4 | 18 | ++ | ∗∗∗ | ∗ | + | ||
| JMJD6 | 18 | ∗ | + | ∗ | ++ | ||
| MAP2K4 | 18 | ∗ | + | ∗∗∗ | ∗ | ||
| SPHK1 | 18 | ∗∗∗ | ∗ | ||||
| UNC13D | 18 | ∗∗∗ | ++ | ||||
| JAK2 | Z | ∗ | |||||
| PTGER4 | Z | ∗∗ | ∗∗∗ | ++ | ∗∗∗ | ++ | |
The results of the present study are denoted as P19 and are compared with the results of Siwek et al. (2015), denoted as S15. The following symbols are used. P19: ∗∗∗ gene included in the final model, ∗∗ positive sensitivity score, ∗ negative sensitivity score; S15: ++ gene significant in both the RMM and CAR analyses, + gene significant only in the CAR analysis.