| Literature DB >> 33286767 |
Luis Javier Herrera1, Carlos José Todero Peixoto2, Oresti Baños1, Juan Miguel Carceller3, Francisco Carrillo1, Alberto Guillén1.
Abstract
The study of cosmic rays remains one of the most challenging research fields in Physics. Among the many open questions in this area, determining the type of primary particle for each event remains one of the most important. Cosmic ray observatories have been trying to answer this question for at least six decades, but have not yet succeeded. The main obstacle is the impossibility of directly detecting high-energy primary events, which makes it necessary to use Monte Carlo models and simulations to characterize the generated particle cascades. This work presents the results attained using a simulated dataset produced with the Monte Carlo code CORSIKA, which simulates the interactions of high-energy particles with the atmosphere, producing a cascade of secondary particles that extends over a few kilometers (in diameter) at ground level. Using this simulated data, a set of machine learning classifiers has been designed and trained, and their computational cost and effectiveness compared when classifying the type of primary under ideal measuring conditions. Additionally, a feature selection algorithm has made it possible to identify the relevance of the considered features. The results confirm the importance of separating the electromagnetic and muonic components of the measured signal data for this problem. The obtained results are quite encouraging and open new lines of work for future, more restrictive simulations.
Keywords: cosmic rays; deep learning; feature selection; mass composition; ultra high energy
Year: 2020 PMID: 33286767 PMCID: PMC7597327 DOI: 10.3390/e22090998
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Classification report obtained by each classification approach with five and three features over the test dataset (standard deviations in parentheses).

| Classifier | trn. time (s), 5 feat. | Accuracy, 5 feat. | f1-score, 5 feat. | trn. time (s), 3 feat. | Accuracy, 3 feat. | f1-score, 3 feat. |
|---|---|---|---|---|---|---|
| ANN | 48,715 | 0.91 (0.015) | 0.92 (0.012) | 23,957 | 0.76 (0.14) | 0.77 (0.017) |
| XGBoost | 909 | 0.97 (0.002) | 0.97 (0.002) | 843 | 0.87 (0.002) | 0.87 (0.002) |
| SVMs | 9536 | 0.94 (0.003) | 0.94 (0.003) | 10,677 | 0.83 (0.004) | 0.83 (0.004) |
| KNN | 3.59 | 0.78 (0.003) | 0.79 (0.003) | 2.75 | 0.62 (0.006) | 0.63 (0.005) |
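The comparison in the report table can be sketched as follows. This is a minimal illustration, not the study's pipeline: a synthetic dataset stands in for the CORSIKA-simulated shower features, and scikit-learn's GradientBoostingClassifier replaces the XGBoost library so the example stays self-contained.

```python
# Hedged sketch: comparing classifier families on a synthetic stand-in dataset.
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Synthetic 5-feature binary problem (not the paper's data).
X, y = make_classification(n_samples=2000, n_features=5, n_informative=5,
                           n_redundant=0, random_state=0)
X_trn, X_tst, y_trn, y_tst = train_test_split(X, y, test_size=0.3,
                                              random_state=0)

classifiers = {
    "ANN": MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=500,
                         random_state=0),
    "GBoost": GradientBoostingClassifier(random_state=0),  # XGBoost stand-in
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
}

report = {}
for name, clf in classifiers.items():
    t0 = time.time()
    clf.fit(X_trn, y_trn)
    pred = clf.predict(X_tst)
    report[name] = (time.time() - t0,
                    accuracy_score(y_tst, pred),
                    f1_score(y_tst, pred, average="weighted"))

for name, (t, acc, f1) in report.items():
    print(f"{name}: trn. time {t:.2f} s, accuracy {acc:.2f}, f1 {f1:.2f}")
```

The training-time column in the table follows the same pattern: each model is fit on the same training split and timed, then scored on the held-out test set.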
Hyperparameters obtained for each classification approach with five and three features.
| Classifier | 5 Features | 3 Features |
|---|---|---|
| ANN | 2 layers, | 2 layers, |
| XGBoost | | |
| SVMs | | |
| KNN | | |
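The table above reports tuned hyperparameters per classifier (most values were lost in extraction). A grid search like the one below is a common way to obtain such values; the grid and procedure here are illustrative assumptions, not the study's actual search.

```python
# Hedged sketch: hyperparameter tuning via grid search on a synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic 5-feature stand-in for the simulated shower data.
X, y = make_classification(n_samples=500, n_features=5, n_informative=5,
                           n_redundant=0, random_state=0)

# Example grid for the SVM; the values are illustrative, not the study's.
grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}
search = GridSearchCV(SVC(), grid, cv=3).fit(X, y)
print(search.best_params_)
```

The same pattern applies to the other classifiers by swapping in their estimator and parameter grid (e.g. layer sizes for the ANN, tree depth and learning rate for XGBoost, the number of neighbors for KNN).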
Figure 1. Confusion matrix for the first test set returned by XGBoost classification with five features.
Figure 2. Confusion matrix for the first test set returned by XGBoost classification with three features.
Figure 3. Evolution of the test performance on the problem according to the ranking returned by the Markov Blanket Mutual Information Feature Selection (MBFS) algorithm using XGBoost. Hyperparameters of XGBoost were optimized for each feature subset size combination.
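The procedure behind Figure 3 can be sketched as: rank features, then evaluate the classifier on growing top-k subsets. In this illustrative sketch, scikit-learn's `mutual_info_classif` stands in for the paper's MBFS algorithm and GradientBoostingClassifier for XGBoost; the data is synthetic.

```python
# Hedged sketch: test performance as a function of ranked feature subset size.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score

# Synthetic dataset with 8 features, 4 of them informative (not the paper's data).
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           n_redundant=0, random_state=0)

# Rank features by mutual information with the class label (MBFS stand-in).
ranking = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1]

# Evaluate performance as the top-k ranked features are added one by one.
scores = []
for k in range(1, X.shape[1] + 1):
    subset = X[:, ranking[:k]]
    clf = GradientBoostingClassifier(random_state=0)
    scores.append(cross_val_score(clf, subset, y, cv=3).mean())
    print(f"top-{k} features: accuracy {scores[-1]:.3f}")
```

Plotting `scores` against k gives a curve of the same shape as Figure 3; the paper additionally re-optimizes XGBoost's hyperparameters at each subset size.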