Kurt Varmuza¹, Peter Filzmoser¹, Nicolas Fray², Hervé Cottin², Sihane Merouane³, Oliver Stenzel³, John Paquette³, Jochen Kissel³, Christelle Briois⁴, Donia Baklouti⁵, Anaïs Bardyn⁶, Sandra Siljeström⁷, Johan Silén⁸, Martin Hilchenbach³.
Abstract
The instrument COSIMA (COmetary Secondary Ion Mass Analyzer) on board the European Space Agency mission Rosetta collected and analyzed dust particles in the neighborhood of comet 67P/Churyumov-Gerasimenko. The chemical composition of the particle surfaces was characterized by time-of-flight secondary ion mass spectrometry. A set of 2213 spectra was selected; the relative abundances of CH-containing positive ions and of positive elemental ions define a multivariate data set with nine variables. Evaluation by complementary chemometric techniques shows that sample groups collected during two periods of the mission differ in composition. The first period was August to November 2014 (far from the Sun); the second period was January 2015 to February 2016 (nearer to the Sun). The data evaluation methods account for the compositional nature of the mass spectral data and comprise robust principal component analysis as well as classification with discriminant partial least squares regression, k-nearest neighbor search, and random forest decision trees. The results indicate that the relative abundances of the secondary ions C+ and Fe+ are highly important for the group separation, and they demonstrate an enhanced content of carbon-containing substances in samples collected during the period closer to the Sun.
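Of the classification methods named above, k-nearest neighbor (KNN) is the simplest to illustrate. The sketch below assumes Euclidean distance on the (already transformed) variable vectors and a plain majority vote; the distance measure and the value of k used in the study are not given here, and `knn_predict` is a hypothetical helper.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Assign `query` to the majority class among its k nearest training samples.

    Assumptions (illustrative only): Euclidean distance, unweighted votes,
    odd k to reduce ties in a two-class problem.
    """
    # Distance from the query to every training sample, paired with its label
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    # Count the class labels of the k closest samples and return the most common
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Toy two-class example with made-up 2-D points (not COSIMA data)
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = [1, 1, 1, 2, 2, 2]
print(knn_predict(train_x, train_y, (0.15, 0.1)))  # → 1
```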
Keywords: KNN classification; comet 67P/Churyumov‐Gerasimenko; random forest classification; time‐of‐flight secondary ion mass spectrometry; variable importance
Year: 2020 PMID: 32355406 PMCID: PMC7187198 DOI: 10.1002/cem.3218
Source DB: PubMed Journal: J Chemom ISSN: 0886-9383 Impact factor: 2.467
Class characteristics
| Characteristic | Class 1: Period 1, far from the Sun | Class 2: Period 2, near the Sun |
|---|---|---|
| Heliocentric distance (AU) | 2.93–3.57 | 1.24–2.48 |
| Earliest collection start date | August 11, 2014 | January 24, 2015 |
| Latest collection start date | November 21, 2014 | February 29, 2016 |
| Number of collection periods | 10 | 13 |
| Number of particles | 69 | 157 |
| Particle size (area in image), first to third quartile (μm²) | 1 600–10 600 | 2 200–8 800 |
| Particle size (area in image), median; maximum (μm²) | 9 200; 73 000 | 10 500; 133 000 |
| Number of spectra | 839 | 1 374 |
Predictive abilities obtained by the applied classification methods DPLS, KNN, and RF
| Method | Parameter | P1 (class 1, far from the Sun) | P2 (class 2, near the Sun) | P (mean) |
|---|---|---|---|---|
| DPLS | | 0.39 | 0.94 | 0.67 |
| KNN | | 0.73 | 0.87 | 0.80 |
| RF | 500 trees | 0.76 | 0.91 | 0.83 |
Note. Values are means of 100 repetitions of repeated cross-validation.
Abbreviations: DPLS: discriminant analysis with partial least squares; KNN: k‐nearest neighbor; RF: random forest.
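Reading P1 and P2 as the fraction of correctly assigned test spectra in class 1 and class 2, and P as their unweighted mean (the Figure 2 caption describes P as the mean of both), these values can be computed from lists of true and predicted labels. A minimal sketch; `predictive_abilities` is a hypothetical helper, not code from the study.

```python
def predictive_abilities(y_true, y_pred, classes=(1, 2)):
    """Return per-class predictive abilities and their unweighted mean.

    The predictive ability of class c is the fraction of samples truly in c
    that were also predicted as c.
    """
    per_class = {}
    for c in classes:
        # Predictions for the samples whose true class is c
        preds_c = [p for t, p in zip(y_true, y_pred) if t == c]
        per_class[c] = sum(p == c for p in preds_c) / len(preds_c)
    # Unweighted mean: each class counts equally regardless of its size
    mean_p = sum(per_class.values()) / len(per_class)
    return per_class, mean_p


y_true = [1, 1, 1, 1, 2, 2, 2, 2, 2]
y_pred = [1, 2, 2, 1, 2, 2, 2, 1, 2]
print(predictive_abilities(y_true, y_pred))  # → ({1: 0.5, 2: 0.8}, 0.65)
```

As a consistency check on the table, (0.39 + 0.94) / 2 = 0.665 ≈ 0.67 for DPLS and (0.73 + 0.87) / 2 = 0.80 for KNN, matching the last column.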
Figure 1. Robust principal component analysis (PCA) of a random sample of 200 spectra from each class; the variables were centered log-ratio-transformed because of their compositional nature.
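The centered log-ratio (clr) transformation replaces each relative abundance by the logarithm of its ratio to the geometric mean of all D parts of the same spectrum: clr(x)_i = ln(x_i / g(x)), with g(x) = (x_1 · x_2 · … · x_D)^(1/D). A minimal sketch for one spectrum, assuming all abundances are strictly positive (how zeros were handled in the study is not described here); the robust PCA applied afterwards is not shown.

```python
import math


def clr(parts):
    """Centered log-ratio transform of one composition (all parts must be > 0)."""
    if any(p <= 0 for p in parts):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean g(x)
    return [v - mean_log for v in logs]


# Scale invariance: a spectrum and a rescaled copy give the same clr vector,
# so normalizing the counts to a sum of 100 beforehand does not change the result.
a = clr([1.0, 2.0, 4.0])
b = clr([10.0, 20.0, 40.0])
print(all(abs(u - v) < 1e-12 for u, v in zip(a, b)))  # → True
print(abs(sum(a)) < 1e-12)  # → True (clr components sum to zero)
```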
Figure 2. Variation of the predictive abilities over 100 repetitions of repeated cross-validation. P1, predictive ability for samples from the first period (far from the Sun); P2, predictive ability for samples from the second period (near the Sun); P, the mean of both.
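Repeated cross-validation, as summarized in the caption, reruns a K-fold split with a fresh random partition in each repetition and records P each time, which yields the spread of values shown in the figure. The sketch below assumes K = 4 folds and uses a nearest-centroid rule only as a stand-in classifier; the fold count and the helper names `repeated_cv` and `nearest_centroid` are hypothetical, not taken from the study.

```python
import math
import random


def nearest_centroid(train_x, train_y, test_x):
    """Stand-in classifier: assign each test point to the class with the closest mean."""
    centroids = {}
    for c in set(train_y):
        members = [xi for xi, yi in zip(train_x, train_y) if yi == c]
        centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return [min(centroids, key=lambda k: math.dist(centroids[k], t)) for t in test_x]


def repeated_cv(x, y, fit_predict, n_folds=4, n_repeats=100, classes=(1, 2), seed=0):
    """Return the mean predictive ability P of each of `n_repeats` K-fold runs."""
    rng = random.Random(seed)
    order = list(range(len(y)))
    results = []
    for _ in range(n_repeats):
        rng.shuffle(order)  # new random partition into folds for every repetition
        y_pred = [None] * len(y)
        for f in range(n_folds):
            test_idx = order[f::n_folds]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            preds = fit_predict(
                [x[i] for i in train_idx],
                [y[i] for i in train_idx],
                [x[i] for i in test_idx],
            )
            for i, p in zip(test_idx, preds):
                y_pred[i] = p  # every sample is predicted exactly once per repetition
        # Per-class predictive abilities, then their unweighted mean P
        per_class = []
        for c in classes:
            preds_c = [p for t, p in zip(y, y_pred) if t == c]
            per_class.append(sum(p == c for p in preds_c) / len(preds_c))
        results.append(sum(per_class) / len(per_class))
    return results


# Toy data: two well-separated clouds (made-up numbers, not COSIMA data)
x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.2, 0.8), (0.8, 0.2), (0.3, 0.3)]
x += [(a + 10.0, b + 10.0) for a, b in x]
y = [1] * 8 + [2] * 8
print(repeated_cv(x, y, nearest_centroid, n_repeats=5))  # → [1.0, 1.0, 1.0, 1.0, 1.0]
```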
Figure 3. Boxplots showing the distributions of selected ion counts (normalized to a sum of 100) and ion count ratios for samples in class 1 (collected during the first period, far from the Sun; left, blue) and class 2 (second period, near the Sun; right, red); values higher than the 0.9 quantile are cut off.
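The preprocessing named in the caption can be sketched as follows: the ion counts of one spectrum are scaled to a sum of 100, a ratio is taken between two ions, and values above the 0.9 quantile are cut for display. Ion names and counts are illustrative, the C+/Fe+ pair is chosen only because the abstract singles out these ions, and reading "cut" as dropping the values from the plot is an assumption; `normalize_sum100` and `cut_above_q90` are hypothetical helpers.

```python
import statistics


def normalize_sum100(counts):
    """Scale the ion counts of one spectrum so that they add up to 100."""
    total = sum(counts.values())
    return {ion: 100.0 * n / total for ion, n in counts.items()}


def cut_above_q90(values):
    """Keep only values at or below the 0.9 quantile (assumed meaning of 'cut')."""
    q90 = statistics.quantiles(values, n=10)[-1]  # last of the nine decile cut points
    return [v for v in values if v <= q90]


spectrum = {"C+": 60, "Fe+": 20, "Na+": 120}  # hypothetical raw counts
norm = normalize_sum100(spectrum)
print(norm)  # → {'C+': 30.0, 'Fe+': 10.0, 'Na+': 60.0}
# A ratio of two ions is unaffected by the sum-100 normalization
print(norm["C+"] / norm["Fe+"], spectrum["C+"] / spectrum["Fe+"])  # → 3.0 3.0
print(len(cut_above_q90(list(range(1, 101)))))  # → 90
```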
Figure 4. Importance of the variables for class separation with multivariate methods. b_DPLS, standardized regression coefficient of a DPLS discriminant variable; MDA, mean decrease in accuracy from random forest classification.
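The MDA measure can be illustrated with permutation importance: shuffle the values of one variable across samples, predict again, and record how much the accuracy drops. Random forests usually compute this per tree on out-of-bag samples; the sketch below is a simplified, model-agnostic version on a single evaluation set, and `permutation_importance` and the toy predictor are hypothetical.

```python
import random


def permutation_importance(predict, x, y, n_repeats=10, seed=0):
    """Mean decrease in accuracy after permuting each variable in turn.

    `predict(rows)` must return one predicted label per row of `x`.
    """
    rng = random.Random(seed)

    def accuracy(rows):
        preds = predict(rows)
        return sum(p == t for p, t in zip(preds, y)) / len(y)

    baseline = accuracy(x)
    importances = []
    for j in range(len(x[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in x]
            rng.shuffle(column)  # break the link between variable j and the class
            permuted = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(x, column)]
            drops.append(baseline - accuracy(permuted))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy predictor that uses only variable 0; variable 1 is pure noise
def toy_predict(rows):
    return [1 if r[0] < 0.5 else 2 for r in rows]


x = [(0.1, 0.7), (0.2, 0.1), (0.3, 0.9), (0.4, 0.4), (0.6, 0.2), (0.7, 0.8), (0.8, 0.3), (0.9, 0.6)]
y = toy_predict(x)  # labels follow variable 0 exactly
imp = permutation_importance(toy_predict, x, y)
print(imp[1])  # → 0.0 (permuting an unused variable never changes the predictions)
```

A larger mean drop indicates that the classifier relies more heavily on that variable.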