Rolando A Gittens, Alejandro Almanza, Kelly L Bennett, Luis C Mejía, Javier E Sanchez-Galan, Fernando Merchan, Jonathan Kern, Matthew J Miller, Helen J Esser, Robert Hwang, May Dong, Luis F De León, Eric Álvarez, Jose R Loaiza.
Abstract
Matrix-assisted laser desorption/ionization (MALDI) time-of-flight mass spectrometry is an analytical method that detects macromolecules and can be used for proteomic fingerprinting and taxonomic identification of arthropods. The conventional MALDI approach uses fresh, laboratory-reared arthropod specimens to build a reference mass spectra library with the high-quality standards required for reliable identification. However, this may not be possible for arthropod groups that are difficult to rear under laboratory conditions, or for which only alcohol-preserved samples are available. Here, we generated MALDI mass spectra of highly abundant proteins from the legs of 18 Neotropical species of adult field-collected hard ticks, several of which had not been analyzed by mass spectrometry before. We then used their mass spectra as fingerprints to identify each tick species by applying machine learning and pattern recognition algorithms that combined unsupervised and supervised clustering approaches. Both Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) classification algorithms were able to identify spectra from different tick species, with LDA achieving the best performance when applied to field-collected specimens that did have an existing entry in a reference library of arthropod protein spectra. These findings contribute to the growing literature that supports mass spectrometry as a rapid and effective method to complement other well-established techniques for taxonomic identification of disease vectors, which is the first step in predicting and managing arthropod-borne pathogens.
Year: 2020 PMID: 33108372 PMCID: PMC7647123 DOI: 10.1371/journal.pntd.0008849
Source DB: PubMed Journal: PLoS Negl Trop Dis ISSN: 1935-2727
Fig 1. Optical micrographs of Neotropical hard ticks.
The image shows the dorsal and ventral sides for all 18 species of hard ticks in the genera Amblyomma, Dermacentor, Haemaphysalis, Ixodes, and Rhipicephalus used to generate protein spectra with our MALDI mass spectrometry approach.
Fig 2. Baseline-corrected and smoothed spectra for 18 species of ticks in the genera Amblyomma, Dermacentor, Haemaphysalis, Ixodes and Rhipicephalus.
Major ion peaks and their molecular weights are annotated in the range of 2,000 to 20,000 m/z for all species.
Description of specimens subjected to analysis with the MALDI mass spectrometry procedure.
| Species Name | # of specimens | Locality code | # of expected spectra | # of obtained spectra | MALDI automatic spectra acquisition rate (%) |
|---|---|---|---|---|---|
| | 4 | a | 12 | 6 | 50% |
| | 5 | a, b | 15 | 15 | 100% |
| | 4 | c | 12 | 9 | 75% |
| | 4 | d | 12 | 12 | 100% |
| | 4 | a | 12 | 10 | 83% |
| | 4 | a, e | 12 | 8 | 67% |
| | 4 | e | 12 | 11 | 92% |
| | 4 | e | 12 | 11 | 92% |
| | 3 | f | 9 | 9 | 100% |
| | 4 | g | 12 | 9 | 75% |
| | 5 | f | 15 | 9 | 60% |
| | 26 | e, g | 78 | 56 | 72% |
| | 4 | e | 12 | 12 | 100% |
| | 4 | e | 12 | 6 | 50% |
| | 4 | c | 12 | 9 | 75% |
| | 6 | a, e | 18 | 11 | 61% |
| | 10 | c, d | 30 | 30 | 100% |
| | 4 | a | 12 | 6 | 50% |
(a) = Panama: West Panama, Las Pavas; (b) = Panama: Colon, Madden Road; (c) = Panama: Colon, Achiote; (d) = Panama: West Panama, Capira; (e) = Panama: Colon, Barro Colorado Island; (f) = Panama: Colon, Sierra Llorona Lodge; (g) = Panama: Colon, Gamboa. (*) Indicates specimens that were stored fresh in silica gel upon collection (for more metadata about these samples, see S1 Table).
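The acquisition rate in the last column is the number of obtained spectra divided by the number expected. In every row the expected count equals three times the number of specimens; that factor of three is inferred from the table and is not stated in this section. A minimal Python sketch that recomputes the column under that assumption (the names `ROWS`, `SPECTRA_PER_SPECIMEN`, and `acquisition_rate` are illustrative):

```python
# (specimens, obtained spectra, reported rate in %) for each table row, in order.
ROWS = [
    (4, 6, 50), (5, 15, 100), (4, 9, 75), (4, 12, 100), (4, 10, 83), (4, 8, 67),
    (4, 11, 92), (4, 11, 92), (3, 9, 100), (4, 9, 75), (5, 9, 60), (26, 56, 72),
    (4, 12, 100), (4, 6, 50), (4, 9, 75), (6, 11, 61), (10, 30, 100), (4, 6, 50),
]

SPECTRA_PER_SPECIMEN = 3  # assumption inferred from the table, not stated in the text


def acquisition_rate(specimens, obtained):
    """Return the percentage of expected spectra that were actually obtained."""
    expected = specimens * SPECTRA_PER_SPECIMEN
    return round(100 * obtained / expected)


# Recompute the last column from the specimen and spectra counts.
recomputed = [acquisition_rate(specimens, obtained) for specimens, obtained, _ in ROWS]
```

Under this assumption the recomputed values match the reported percentages for all 18 rows.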
Performance of PCA and LDA clustering algorithms.
| Species Name | PCA Positive Identification Rate (%) | LDA Positive Identification Rate (%) | Spectra per Class | # Training Elements | # Test Elements |
|---|---|---|---|---|---|
| | 100.0% | 100.0% | 6 | 4000 | 2000 |
| | 100.0% | 99.6% | 15 | 12000 | 3000 |
| | 67.6% | 67.6% | 9 | 7000 | 2000 |
| | 99.1% | 99.6% | 12 | 9000 | 3000 |
| | 100.0% | 100.0% | 10 | 8000 | 2000 |
| | 100.0% | 100.0% | 8 | 6000 | 2000 |
| | 100.0% | 100.0% | 11 | 8000 | 3000 |
| | 99.8% | 99.0% | 11 | 8000 | 3000 |
| | 69.3% | 85.9% | 9 | 7000 | 2000 |
| | 99.8% | 100.0% | 9 | 7000 | 2000 |
| | 100.0% | 100.0% | 9 | 7000 | 2000 |
| | 97.8% | 97.8% | 56 | 44000 | 12000 |
| | 21.7% | 45.6% | 12 | 9000 | 3000 |
| | 90.9% | 97.8% | 6 | 4000 | 2000 |
| | 84.0% | 89.5% | 9 | 7000 | 2000 |
| | 96.8% | 98.8% | 11 | 8000 | 3000 |
| | 93.1% | 98.7% | 30 | 24000 | 6000 |
| | 100.0% | 100.0% | 6 | 4000 | 2000 |
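The training and test element counts follow from the Monte Carlo scheme described in the Fig 3 caption (1000 iterations, 80% training, 20% test). The sketch below assumes the 80% share is rounded down to a whole number of spectra per iteration; that rounding rule is an inference that happens to reproduce every row of the table, not something the source states. The names are illustrative.

```python
ITERATIONS = 1000  # Monte Carlo iterations, from the Fig 3 caption
TRAIN_PCT = 80     # percentage of spectra used for training in each iteration


def split_counts(spectra_per_class):
    """Return (training elements, test elements) accumulated over all iterations."""
    n_train = spectra_per_class * TRAIN_PCT // 100  # rounded down (assumed rule)
    n_test = spectra_per_class - n_train
    return n_train * ITERATIONS, n_test * ITERATIONS


# Distinct "Spectra per Class" values from the table and their reported counts.
REPORTED = {
    6: (4000, 2000), 8: (6000, 2000), 9: (7000, 2000), 10: (8000, 2000),
    11: (8000, 3000), 12: (9000, 3000), 15: (12000, 3000),
    30: (24000, 6000), 56: (44000, 12000),
}

recomputed = {n: split_counts(n) for n in REPORTED}
```

For example, a class with 11 spectra contributes 8 training and 3 test spectra per iteration, which gives 8000 and 3000 elements over 1000 iterations.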
Fig 3. Principal component analysis (PCA) of individual species plotted against first, second and third principal components (PC).
All species were classified using a Monte Carlo simulation with 1000 iterations, in which 80% of the samples were used as the training set (□) and the remaining 20% as the test set (• for positive identifications and + for negative ones). The cluster centroid of each species is also shown (⋄). The plots show (A) the training and test sets for species of the genera Dermacentor, Haemaphysalis, Ixodes and Rhipicephalus, and (B) only the test sets, for better visualization; and the training and test sets for (C) Amblyomma species alone and (D) Amblyomma together with Ixodes species. The unsupervised PCA algorithm had a global positive identification rate of 91.2%. These 3D plots represent only one of the 1000 Monte Carlo iterations performed with the algorithm.
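The repeated 80/20 evaluation described in this caption can be sketched in plain Python. This is only an illustration of the Monte Carlo loop: the PCA projection is omitted, the data are synthetic 2D points, and a nearest-centroid rule stands in for the authors' classifier, whose exact decision rule is not given here. All names (`monte_carlo_rates`, `demo`, `class_a`, `class_b`) are hypothetical.

```python
import math
import random


def monte_carlo_rates(points_by_class, iterations=1000, train_pct=80, seed=0):
    """Repeat a random per-class train/test split and score nearest-centroid labels."""
    rng = random.Random(seed)
    hits = {name: 0 for name in points_by_class}
    totals = {name: 0 for name in points_by_class}
    for _ in range(iterations):
        centroids, held_out = {}, {}
        for name, points in points_by_class.items():
            shuffled = points[:]
            rng.shuffle(shuffled)
            n_train = len(shuffled) * train_pct // 100
            train, held_out[name] = shuffled[:n_train], shuffled[n_train:]
            # Centroid of the training points for this class.
            centroids[name] = tuple(sum(axis) / n_train for axis in zip(*train))
        for name, points in held_out.items():
            for point in points:
                predicted = min(centroids, key=lambda k: math.dist(point, centroids[k]))
                hits[name] += predicted == name
                totals[name] += 1
    rates = {name: hits[name] / totals[name] for name in hits}
    return rates, totals


# Synthetic, well-separated classes (not tick spectra).
data_rng = random.Random(1)
demo = {
    "class_a": [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(10)],
    "class_b": [(data_rng.gauss(10, 0.5), data_rng.gauss(10, 0.5)) for _ in range(6)],
}
rates, totals = monte_carlo_rates(demo, iterations=100)
```

Each class is split separately, so a class with few spectra still contributes test points in every iteration. The per-class rate is the fraction of test points, pooled over all iterations, that were assigned to the correct class.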
Fig 4. Linear Discriminant Analysis (LDA) applied to spectra from tick species of the genera Amblyomma, Dermacentor, Haemaphysalis, Ixodes and Rhipicephalus.
The plots show (A) the training and test sets for species in the Dermacentor, Haemaphysalis, Ixodes and Rhipicephalus genera projected over the first three components of the LDA, as well as (B) only the test set for better visualization; and also the training and test sets for (C) the Amblyomma genus alone, as well as (D) the Amblyomma genus compared to the Ixodes genus. These 3D plots represent only one of the 1000 Monte Carlo iterations performed with the algorithm. The supervised LDA algorithm had a 94.2% global positive identification rate.
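The global positive identification rates are consistent with an average of the per-species rates weighted by the number of test elements. The source does not say how the global figure was computed, so the weighting below is an assumption; it is checked only against the per-species values in the performance table. The names are illustrative.

```python
# (PCA rate %, LDA rate %, test elements) for each row of the performance table.
ROWS = [
    (100.0, 100.0, 2000), (100.0, 99.6, 3000), (67.6, 67.6, 2000),
    (99.1, 99.6, 3000), (100.0, 100.0, 2000), (100.0, 100.0, 2000),
    (100.0, 100.0, 3000), (99.8, 99.0, 3000), (69.3, 85.9, 2000),
    (99.8, 100.0, 2000), (100.0, 100.0, 2000), (97.8, 97.8, 12000),
    (21.7, 45.6, 3000), (90.9, 97.8, 2000), (84.0, 89.5, 2000),
    (96.8, 98.8, 3000), (93.1, 98.7, 6000), (100.0, 100.0, 2000),
]


def weighted_rate(rates, weights):
    """Average the rates, weighting each by its number of test elements."""
    return sum(r * w for r, w in zip(rates, weights)) / sum(weights)


weights = [n for _, _, n in ROWS]
global_pca = weighted_rate([pca for pca, _, _ in ROWS], weights)
global_lda = weighted_rate([lda for _, lda, _ in ROWS], weights)
```

Rounded to one decimal place, `global_pca` is 91.2 and `global_lda` is 94.2, matching the rates reported in the Fig 3 and Fig 4 captions.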