Literature DB >> 35297608

Utilization of Machine Learning for the Differentiation of Positional NPS Isomers with Direct Analysis in Real Time Mass Spectrometry.

Jennifer L Bonetti^1,2, Saer Samanipour¹, Arian C van Asten^1,3.

Abstract

The differentiation of positional isomers is a well established analytical challenge for forensic laboratories. As more novel psychoactive substances (NPSs) are introduced to the illicit drug market, robust yet efficient methods of isomer identification are needed. Although current literature suggests that Direct Analysis in Real Time-Time-of-Flight mass spectrometry (DART-ToF) with in-source collision induced dissociation (is-CID) can be used to differentiate positional isomers, it is currently unclear whether this capability extends to positional isomers whose only structural difference is the precise location of a single substitution on an aromatic ring. The aim of this work was to determine whether chemometric analysis of DART-ToF data could offer forensic laboratories an alternative rapid and robust method of differentiating NPS positional ring isomers. To test the feasibility of this technique, three positional isomer sets (fluoroamphetamine, fluoromethamphetamine, and methylmethcathinone) were analyzed. Using a linear rail for consistent sample introduction, the three isomers of each type were analyzed 96 times over an eight-week timespan. The classification methods investigated included a univariate approach, the Welch t test at each included ion; a multivariate approach, linear discriminant analysis; and a machine learning approach, the Random Forest classifier. For each method, multiple validation techniques were used including restricting the classifier to data that was only generated on one day. Of these classification methods, the Random Forest algorithm was ultimately the most accurate and robust, consistently achieving out-of-bag error rates below 5%. At an inconclusive rate of approximately 5%, a success rate of 100% was obtained for isomer identification when applied to a randomly selected test set. The model was further tested with data acquired as a part of a different batch. The highest classification success rate was 93.9%, and error rates under 5% were consistently achieved.

Entities: Chemical

Mesh：

Year: 2022 PMID： 35297608 PMCID： PMC8968871 DOI： 10.1021/acs.analchem.1c04985

Source DB: PubMed Journal: Anal Chem ISSN： 0003-2700 Impact factor: 6.986

Introduction

The previous decade has been marked by a considerable increase in the complexity and diversity of novel psychoactive substances (NPSs) arriving on the illicit drug market. As of December 2020, over 1000 new NPSs have been reported, nearly double the number reported by the end of 2019 and nearly eight times as many as in 2009.[1] This onset of chemical complexity in casework has presented a substantial challenge to forensic chemists as techniques that have been the gold standard for decades (gas chromatography coupled with mass spectrometry (GC-MS) combined with nonspecific presumptive testing methods) are typically incapable of differentiating highly similar NPS structures. A common struggle is the conclusive identification of positional ring isomers, compounds whose structures only differ in the precise location of a substituent on an aromatic ring. This challenge is of particular interest because positional NPS isomers are not always controlled at the same level. In The Netherlands, 4-methylmethcathinone (4-MMC) is a List I compound and thus controlled as a “hard drug”, whereas its positional ring isomer 3-MMC was only recently classified as a List II chemical (“soft drug”)[2] and 2-MMC is currently not regulated at all. Many different approaches have been investigated for improving the ease with which forensic experts can confidently differentiate positional ring isomers. Some recent successful approaches include GC-IRD,[3] GC-VUV,[4−6] UPLC,[7] product ion spectrometry,[8] and infrared ion spectroscopy.[9] Since GC-MS is so commonly utilized in the testing of novel psychoactive substances in forensic laboratories, the analysis of these data has been improved using chemometric analysis as a means of enhancing the technique’s isomeric differentiation capabilities.[5,10−19] Some of these studies used a univariate approach where each m/z fragment is analyzed separately with an unequal variance t test.[10−12] Other studies have seen success with a multivariate approach such as principal component analysis (PCA) followed by linear discriminant analysis (LDA),[5,13] LDA with different variable selection techniques,[14] hierarchal clustering analysis,[15] and canonical discriminant analysis.[16,17] Another study used PCA solely as a means to highlight which m/z ions to further investigate for differentiation.[18] Despite the success of these currently available techniques, many require significant analysis time, either through additional sample preparation such as derivatization[19] or simply the more substantial time required for a typical GC analysis, particularly since a blank, sample, and standard would each require a separate injection. In contrast, the minimal sample preparation required for Direct Analysis in Real Time–Time-of-Flight mass spectrometry (DART-ToF) in combination with its rapid analysis time make it an underutilized resource for tackling the NPS challenge. Given its high throughput and intrinsic selectivity, DART-ToF has been successfully introduced into forensic laboratories.[20−22] The DART source was introduced in 2004,[23] and its many forensic applications have been extensively reviewed.[24] Several publications note the instrument’s ability to differentiate positional drug isomers, particularly when combined with in-source collision induced dissociation (is-CID), typically introduced through function switching of the Orifice 1 voltage.[25,26] However, these publications do not touch on aromatic ring isomers in particular, focusing instead on positional isomers that involve more substantial structural changes such as hydrocodone vs codeine[25] and cathinones with changes to the carbon backbone.[26] These structural changes result in clear visual differences in the DART-ToF spectra, allowing the forensic expert to confidently assign the correct structure. These notable differences are absent from the DART spectra of aromatic ring isomers, making differentiation via visual examination challenging, if not impossible. Further complicating the issue is the presence of ambient ionization which, while largely responsible for many of the strengths of the instrument, results in a considerable amount of variation from one analysis to the next. Therefore, if NPS ring isomer differentiation is to be attempted based on DART-ToF data, more advanced data analysis techniques must be investigated. The results of this study showcase the improved ability of the Random Forest classifier over other more common chemometric methods to highlight small but reproducible differences in DART-ToF spectra for the purpose of NPS isomer differentiation. Three sets of positional ring isomers were studied, including fluoroamphetamine (FA) and fluoromethamphetamine (FMA), both of which have para-isomers that have recently gained popularity in The Netherlands,[27,28] and the methylmethcathinones (MMCs). The structures are provided in Figure S1 of the Supporting Information. To our knowledge, this work is the first reported attempt to differentiate ring isomers of psychoactive substances with DART-ToF.

Experimental Section

Reagents and Materials

The nine isomers were purchased as primary standards in the form of hydrochloride salts from Cayman Chemical (Ann Arbor, MI) with the exception of 4-MMC, which was purchased from Sigma-Aldrich (St. Louis, MO). The isomer identity of each standard was confirmed using Fourier-transform infrared spectroscopy (FTIR). The solvents used were GC grade methanol and multipurpose use chloroform from Honeywell B&J Brand (Muskegon, MI). Solutions were prepared as the hydrochloride salt at a base concentration of 0.1 mg/mL. The phenethylamines were prepared in chloroform, while the cathinones were prepared in methanol per recommendations from Ciallella et al.[29] All solutions were stored at −20 °C when not in use. The polyethylene glycol (PEG) 600 (Tokyo Chemical Industry, Co. Portland, OR) calibrator solution was prepared at a concentration of 0.5 mg/mL in methanol. As described in the seminal DART-ToF publication, this solution results in a spectrum with fragments of known accurate mass across the m/z range and can therefore be used to calibrate the mass axis of the instrument.[23] A drug mix consisting of approximately 161:8:32 μg/mL of a cocaine base (Sigma-Aldrich), methamphetamine HCl (Sigma-Aldrich), and nefazadone (Alfa Aesar, Haverhill, MA) in methanol was utilized for quality assurance (QA) to ensure successful calibration for all analytical runs, which is consistent with current casework procedures.[21]

Data Acquisition

Data were acquired using a Direct Analysis in Real Time (Model DART SVP, IonSense, Saugus MA) AccuToF (Model JMS-T100LP, JEOL USA, Peabody MA) system equipped with a linear rail according to the analytical parameters listed in Table S1. The linear rail was utilized with QuickStrip cards from IonSense to minimize spectral variation caused by sample introduction. Over a period of 8 weeks in early 2021, four sample QuickStrip cards were analyzed per isomer type on a weekly basis. Each QuickStrip card had 12 separated mesh locations for sample deposition. The QuickStrip card setup is shown in Table S2. For the calibrator, drug mix, and solvent blank, 3 μL spots were deposited onto the appropriate mesh locations using a Gilson (Middleton, WI) M25 pipette. Each drug compound was run in triplicate on each QuickStrip card by placing 3 μL of the 0.1 mg/mL drug solutions in each sample location for two of the four QuickStrip cards and 6 μL per sample location for the other two QuickStrip cards. In total, this resulted in 96 analyses per compound, or 288 analyses per isomer type. Each QuickStrip card, containing nine separate drug analyses, had an elapsed run time of between 6 and 6.5 min. Therefore, the entire data set for each isomer type was generated within 3.5 h of instrumental run time over the analysis period. Each Orifice 1 voltage utilized via the function switching setting produced a “total ion chronogram” (response vs time). All spectra were collected using the MassCenter software (Jeol USA, Peabody, MA) by averaging the instrument response across the width of the appropriate peak. In the case of the drug samples, the spectra were background subtracted by subtracting the average response over the solvent blank region of the chronogram. All spectra were centroided with an abundance threshold of 120. The 30 V spectrum of the PEG calibrator solution for each QuickStrip card was used to calibrate the mass axis for the data file, and the 30 V QA drug mix spectrum was used to confirm successful calibration with an acceptance criteria of ±5 mDa for each test compound.[21] Sample spectra were collected from the 30, 60, and 90 V total ion chronograms and saved as .jsp files. A Perl script was used to convert all of the drug spectra for each isomer type into a single .csv file containing spectra for all three Orifice 1 voltages in terms of raw abundance in m/z bins of 0.025 Da width. The bins were named for the upper limit. For example, the 109.050 bin contained abundance values for m/z 109.025–109.050 Da. While the primary data for this study were acquired in early 2021, an external validation set of FMA data was generated, originally for exploratory purposes, using 1.0 mg/mL solutions in the latter half of 2020. This data set included 11 QuickStrip cards of data, set up in the same manner as Table S2 and using the same instrument parameters as seen in Table S1. The first three QuickStrip cards used 10 μL depositions, and the latter eight used 3 μL depositions. It was eventually determined that this concentration was too high with a potential risk of carryover between the samples, which is why this data set was not included in the main study. However, due to these differences from the main data set, both in terms of time between analyses and concentrations used, this data set was ultimately used to further test the robustness of the Random Forest classifier.

Data Analysis

Calculations

All variable selection, normalization, and subsequent data analysis was performed using R (R version 4.0.0, Rstudio Version 1.2.5042, Boston, MA). The packages utilized were dplyr, stringr, xlsx, ggplot2, ggpubr, randomForest, ggrepel, grid, gridExtra, gtable, MASS, reshape2, and matrixStats. The github repository containing all code is available online.[30]

Variable Selection and Normalization

To limit the data set to the m/z bins of interest, several steps were taken. To start, m/z bins were not specific to a given voltage, meaning that a single sample had three separate spectra of data, one for each voltage. Each m/z bin was investigated to determine if it met the “percent abundance threshold“ set for the data set. To do this, each spectrum was temporarily normalized to a base peak value of 100%. Any m/z bin for which at least 50 spectra did not meet the set percent abundance threshold was discarded. For the MMC isomers, the percent abundance thresholds were set at 1% and 10%. For the FA and FMA isomers, the 1% threshold resulted in minimal to no additional bins when compared to the 10% threshold. Therefore, the percent abundance thresholds were set at 0.3% and 10% for these two isomer types. This base peak normalization was only performed to identify m/z bins above the percent abundance threshold. As alternative normalization techniques were explored, the original raw abundance values were utilized in the remaining data analysis steps. The reduced data set was then separated by voltage for further variable reduction. It was common for some m/z bins to reach the overall percent abundance threshold but not substantially contribute to all voltages. Each voltage was examined to identify the most relevant m/z bins. A step-by-step illustration of this process can be found in Figure S3. The final resulting m/z bins for each voltage for each isomer type are shown in Table S3. Two types of normalization were investigated. In both cases, spectra were normalized per voltage. The first type was referred to as “ion current” in which the abundance for each m/z bin was normalized by the sum of the abundance values for all retained m/z bins in the same voltage. In other words, each abundance was normalized by the total ion current generated only by the selected m/z bins. The second normalization was referred to as “vector length”. In this case, the abundance values for a given voltage were considered as a vector, and this vector was normalized to a unit vector by dividing by its length. Therefore, the abundance for each m/z bin was normalized by the square root of the sum of the abundance values of all m/z bins at the same voltage. For both normalization procedures, only contributions from the final selected m/z bins were considered. An example of each normalization calculation procedure is shown in Table S4. After normalization, the m/z bins from all three voltages were combined into one row of data per sample, with one normalized abundance value per m/z bin and voltage combination. In this manner, each m/z bin/voltage combination was one separate variable. To preprocess the external validation data set, the same m/z bins that were identified for the main FMA data set were retained. Again, both vector length and ion current normalization methods were studied.

Multiway Analysis of Variance

A multiway Analysis of Variance (ANOVA) was performed on each of the final normalized m/z bins based on isomer identity, the week the analysis was performed, and the volume used for sample deposition. For the three isomer types, this was performed separately for both normalization types as well as for both percent abundance thresholds studied.

Univariate Analysis

The first method of analysis performed was based on the statistical comparison of electron impact (EI) mass spectral data as introduced by Bodnar Willard et al.[31] and applied in numerous studies[10−12] in which the unequal variance t test (Welch test) is applied at each m/z fragment. In this manner, it is determined whether the mean abundance at any given m/z fragment is distinguishable between an unknown sample and a known isomer. The comparison of an unknown to its correct isomer identity should result in a higher number of indistinguishable m/z fragments than the comparison of the unknown sample to an incorrect isomer identity. To apply this method, a leave-one-sample-out cross-validation approach was employed where “one” refers to each QuickStrip card’s worth of data. For each triplicate of data from the excluded QuickStrip card, the Welch test was performed at each m/z bin to compare the test triplicate to the remaining data for each known isomer separately. The number of indistinguishable m/z bins was counted for each comparison. This entire analysis was conducted 20 times, with a randomly selected confidence level between 90 and 99.999% and a randomly selected normalization technique. To test the accuracy of using this analysis as a classification method, it was investigated whether there exists a threshold of indistinguishable m/z bins that would only be reached by a comparison of a test sample with its correct isomer identity. Receiver Operator Characteristic (ROC) curves were generated by plotting the false positive rate (FPR) vs the true positive rate (TPR) based on an increasing number of indistinguishable m/z bins for each replicate analysis. In addition, ROC curves were also prepared using the average FPR and TPR of the 20 replicates in order to assess the general robustness and accuracy of this type of classification. An alternate method was attempted in which the results (number of indistinguishable m/z bins) for the comparisons of the test triplicate to each of the three isomers were compared to one another. In this case, rather than a set number of indistinguishable m/z bins needing to be achieved, the test set was classified as belonging to whichever isomer had the highest number of indistinguishable m/z bins provided the minimum difference when compared to the other two isomers exceeded a selected threshold. If this threshold difference was not met, the result was considered inconclusive. For both methods, a second analysis was performed wherein the training data set was only made up of the QuickStrip cards that were analyzed on the same day as the excluded test QuickStrip card. The accuracy of each m/z bin was assessed by dividing the sum of the true positives and true negatives by the total number of comparisons.

Multivariate Classification

The multivariate method applied was Linear Discriminant Analysis. As discussed above, multivariate methods including LDA have been used successfully for the differentiation of EI/MS data. In addition, Easter and Steiner used both PCA and LDA when validating the isomer differentiation potential of DART-ToF for pharmaceutical confirmation.[25] LDA was performed using the normalized abundance values for all included m/z bins with no additional variable reduction step. LDA assumes a multivariate normal distribution and equivalent variance for all classes to calculate the posterior probability that an unknown sample belongs to each class. The prior probability that an unknown sample could belong to a particular class given no experimental data was set to be equal for all isomer identities. The accuracy of this supervised classification method was assessed using leave-one-sample-out cross-validation as well as randomly separating the data set into 80% training and 20% test sets. For both validation methods, two types of conclusions were tested. First, the removed sample(s) were classified by simply assigning the sample(s) to the isomer with the highest calculated posterior probability. Subsequently, the posterior probabilities were compared to an acceptance criteria of sequentially increasing thresholds (0.50–0.95, stepwise by 0.05). If the highest calculated posterior probability for the classification of an excluded sample to an isomer was above the threshold, then this assignment was compared to the actual isomer identity to determine if the classification was correct or erroneous. If the posterior probability was below the threshold, the result was considered inconclusive. The validation procedure described above was also applied to the data collected on a single day. As with the full data set analysis, conclusions were based on both the isomer resulting in the highest posterior probability as well as by incrementally increasing thresholds in comparison to the posterior probabilities of the excluded sample(s). The influence of each m/z bin on both linear discriminant functions was determined by the scaling coefficient for each variable. A high absolute coefficient value indicated that the given m/z bin had a high impact on the transformation onto that particular axis.

Machine Learning Algorithm

Finally, a machine learning technique was utilized. The Random Forest classifier, introduced by Breiman in 2001, utilizes a large number of uncorrelated decision trees acting as an ensemble to result in a more robust classifier than any single decision tree by itself.[32] The lack of correlation is partly ensured through bootstrapping, wherein the number of samples in the data set for a given decision tree in the random forest is equal to the number of samples in the original data set. However, due to random sampling-with-replacement, the exact makeup of the data set will differ between trees. In addition, for any given node of a decision tree, the variables available for selection are selected from a limited random subset of the original variables. Once the “forest” of decision trees is generated, unknown samples are classified based on which class is the assigned result for the highest proportion of decision trees. For this study, default parameters in the random Forest package in R were utilized in order to explore the feasibility of applying this technique for the purpose of isomer classification. Therefore, the number of decision trees used was 500, and the number of variables in the subsets at each node was equal to the square root of the total number of variables. To assess the robustness of the classification, “out-of-bag” samples were used in which each sample was analyzed using the decision trees which were unaffected by this specific sample. Each of these trees gave a classification result, and the proportion of trees assigning the sample to each isomer was determined. Typically, the samples would be classified as whichever isomer was assigned the highest proportion of decision trees. To avoid making conclusions in cases where there was not a clear distinction between the isomers, the highest resulting proportion of trees was instead evaluated against an acceptance criterion of incrementally increasing thresholds (0.50–0.95, stepwise by 0.05) before reaching a conclusion. If the threshold was met, the classification was compared to the known isomer identity to determine if the classification was either correct or flawed. If the highest proportion of trees was below the threshold, the result was considered inconclusive. For both LDA and the Random Forest classifier, thresholds were introduced after the models were generated and thus did not change the models themselves or the output of the classifiers. They only served to determine whether a classification met an acceptance criterion in order to make a conclusion. Thus, while increasing thresholds could reduce the type I (false positive) error rate (and/or true positive ”success” rate) of a series of analyses, they could not serve to change the resulting classification of a sample from one isomer to another, only from a potentially erroneous conclusion to an inconclusive result. In addition to out-of-bag samples, the data set was also randomly separated into 80% training and 20% test sets. The test set samples were then analyzed by all 500 decision trees, and the same threshold comparison was performed. As with the univariate and multivariate classifiers, the analysis was also performed using only samples analyzed on the same day. Out-of-bag samples were again utilized as well as selecting each QuickStrip card as a test set and generating the forest using the remaining QuickStrip cards for a specific day. Permutation of the variables was applied as a sensitivity analysis in order to assess the impact of each m/z bin.[32] After determining the out-of-bag accuracy rate for the full data set (classifications based solely on the isomer assignment with the highest proportion of trees), the abundance values in each m/z bin were randomly shuffled among samples. After each permutation, the out-of-bag samples were classified again, and the accuracy of the classifier was re-established. The importance of each variable was determined by how much its permutation affected the accuracy and is reported as the Mean Decrease Accuracy (MDA). To assess the robustness of the classifier to the removal of low impact variables, a series of 12-fold cross validations were performed with removal of the least important variables. Half of the variables were removed sequentially until a single variable remained.

Results and Discussion

ANOVA, Distribution and Stability of Isomer Abundance

To illustrate the challenge in differentiating the DART-ToF spectra of these isomers, an example is shown in Figure S2, which was generated using the function switching technique and showing the data at three different voltages, the two highest of which (60 and 90 V) produce fragmentation patterns due to is-CID. The isomer spectra are visually indistinguishable, which is why three chemometric techniques were applied in an effort to statistically determine whether any latent differences exist that can be used for isomer classification. An example of the density distribution for six m/z bins of the FMA data set can be found in Figure . Mean abundance values with associated standard deviation for the lowest percent abundance threshold data sets with ion current normalization are shown in Tables S6–S8.

Figure 1

Density distribution of the top six most discriminating m/z bins for FMA isomers using 0.3% abunndance threshold and ion current normalization. Top six determined by increasing ANOVA p value based on isomer identity. In order to assess whether statistical differences existed in the data set, a multiway ANOVA was performed on each m/z bin based on isomer identity, analysis week, deposition volume, and the interactions between those three factors. The results of the multiway ANOVA showed that for nearly all m/z bins, the isomer identity had a significant effect on the mean abundance (p < 0.05). For example, for the FA data set with a 0.3% abundance threshold, the average p value was 3.63 × 10–2 with a standard deviation of 1.78 × 10–1 when ion current normalization was applied. Only one m/z bin (109.050 30 V) showed a nonsignificant difference based on isomer identity (p = 8.71 × 10–1). If removed, the average p value for this data set was 1.57 × 10–5 with a standard deviation of 5.42 × 10–5. Additional ANOVA results based on isomer identity, week, and volume are shown in Table S5. The sample deposition volume had a significant effect on the mean abundance for approximately half of the m/z bins for all three isomer types. This proportion of m/z bins was seen regardless of the percent abundance threshold applied, suggesting that it was not only the low abundance m/z bins that showed significant differences based on deposition volume. The analysis week had a significant impact on the mean abundance for most of the bins for all isomers, showing similar p value ranges and number of significant m/z bins to the isomer identity analysis. In addition, the interaction between isomer identity and the week of analysis also frequently had a significant effect on the mean abundance. To investigate this further, plots were generated for each bin showing the change in mean abundance per isomer over time with error bars showing the standard deviation. Although there were some changes week-to-week, when these plots were produced as a function of an accumulating data set (week 1; weeks 1 and 2; weeks 1, 2, and 3; etc.) the changes seemed to stabilize for most bins and most isomers after 4–5 weeks. As the sampling area of the instrument is open to the atmosphere, small changes in relative ion abundance are expected when performing analyses on different days due to small changes in the immediate environment. The collection of data over multiple days seems to ensure this variability is normalized, which will enhance the changes in abundance values due to isomer identity. An example of a week-to-week plot and its corresponding accumulating data set plot is shown in Figure S4 and Figure S5, respectively.

Univariate Classification

As seen in an example in Figure as well as in Figures S6 and S7, the Welch test method performed better than a random classifier but was ultimately not robust or accurate enough for use in a forensic laboratory. For each type of isomer, applying the lower percent abundance threshold consistently resulted in the best classification results as this produced a higher number of variables (m/z bins) for the comparison. The procedure of comparing the test QuickStrip card to a data set containing only other QuickStrip cards analyzed on the same day did perform marginally better than using the full data set, but overall the selectivity was still insufficient for casework. The results for the individual replicates of the Welch test analysis did not vary substantially from one another.

Figure 2

Receiver operator characteristic curves for an average of 20 replicates of Welch test analysis of FMA isomers, using a 0.3% abundance threshold for m/z bin selection using (a) full data set (AUC = 0.7131) and (b) same day restricted data set (AUC = 0.8346). Dotted line approximating a random classifier of AUC = 0.5 is shown.

Multivariate Classification

Linear Discriminant Analysis was more successful than the Welch test at discriminating the isomers, albeit with some limitations. As seen in Figure , as well as in Figures S8 and S9, the data sets with the lower percent abundance threshold and therefore more m/z bins to use for LDA showed more separation between the most similar isomers. For the FA and FMA isomers, the most challenging isomer pair was the 2- and 3-isomers (ortho and meta), whereas for the MMC isomers, the 3- and 4- (meta and para) isomers were most difficult to differentiate. For the MMC isomers with the 1% abundance threshold, the same-day analysis resulted in an error due to the bin containing m/z 129.075 90 V data producing a within-class variance below the default allowed tolerance (1.0 × 10–8). This bin was therefore removed for this analysis.

Figure 3

Linear discriminant analysis scores plots for training vs test sets of FMA data set using ion current normalization for (a) full data set 0.3% threshold, (b) full data set 10% threshold, (c) same day analysis (first week, first QuickStrip card as test set) 0.3% threshold, (d) same day analysis (first week, first QuickStrip card as test set) 10% threshold. Following the example set by Kranenburg et al., the likelihood ratios (LR) for the conclusions were calculated by dividing the highest posterior probability by the second highest posterior probability.[5] Calibration of LR values has been the focus of many recent forensic studies.[33−37] However, for this study, no thorough validation of the likelihood ratios was conducted due to the suspected overfitting of the LDA model. Thus, the following LR results should be used for indicative purposes only. For the example shown in Figure , the LR values ranged from 60 to 1.72 × 10120, with the LR of 60 arising from the only sample that was misclassified. However, when the percent abundance threshold was raised to 10%, there was less differentiation between the FMA isomers, particularly 2-FMA and 3-FMA. Using the same normalization, there were now nine misclassifications (LR range from 1.1 to 86) with considerable LR overlap with the successful classifications (1.2–9.5 × 10113). Twenty-eight of the correct classifications (approximately 48% of the total conclusions) resulted in LR values that overlapped with those resulting from erroneous conclusions. In some instances, same-day analysis showed excellent separation and 100% successful classification of the test set samples without the use of thresholds to rule out inconclusive samples, even with the higher percent abundance threshold data sets. However, this success was not consistent among all QuickStrip cards and all weeks, with several test sets giving error rates as high as 44.44% (four of nine samples on the test set QuickStrip card were misclassified). The misclassifications appeared to be due to a combination of overfitting of the LDA model and insufficient separation of especially similar isomers. In addition, some of the misclassified samples produced very high likelihood ratios. For example, within the FMA data set using the 0.3% abundance threshold and ion current normalization, a 3-FMA sample from the third QuickStrip card analyzed on week five was misclassified as 2-FMA with a likelihood ratio of 7.2 × 1036. These high error rates were only partially reduced with the use of thresholds, as discussed later. In all cases, any separation between the most similar isomers was only seen along the second linear discriminant function axis, which frequently explained a limited fraction of the between-to-within class variance present in the data set. This suggests a high degree of correlation between the variables, which was indeed found to exist as illustrated through the heat maps in Figures S10–S12.

Machine Learning and Classification of External Validation Data

As seen in Figure , even without the use of thresholds to reduce error rates, the Random Forest classifier already performed quite well and maintained a low error rate for all three isomer types even as the number of variables was decreased. For the FA and FMA data sets with the higher percent abundance threshold, these error rates were higher and more affected by the variable reduction. This suggests that there is important discriminating information present in the lower abundance m/z bins.

Figure 4

Random Forest 12-fold cross-validation error rates observed after removing the least important variables sequentially. The x axis is shown on a logarthmic scale. All examples are from data sets with the lowest percent abundance threshold applied and ion current normalization. (a) FA, (b) FMA, (c) MMC. When thresholds were incorporated, and the data set was split into 80% training and 20% test sets, the Random Forest classifier frequently achieved a 0% error rate associated with inconclusive rates of approximately 5% or lower, as seen in Table , particularly for the lower percent abundance threshold data sets. The phenethylamines in particular were more affected by the higher percent abundance thresholds, with the FMA 10% abundance threshold data set showing a test set error rate of 14% at a proportion of decision trees threshold of 50% when using an ion current normalization. However, that error rate was reduced to 5.2% at a decision tree threshold of 60%, increasing the inconclusive rate to 19%. In contrast, the highest error rate achieved with the MMC test set using the higher percent abundance threshold was only 5.2%, and the same data set achieved a 0% error rate at a proportion of decision trees threshold of 65% with an inconclusive rate of 14%. For all isomers, this suggests that low abundance ions have an impact on the ability of the method to differentiate the isomers.

Table 1

Classification Results for Random Forest 20% Test Sets Using Ion Current Normalization and Lowest Percent Abundance Threshold

		threshold
		0.50	0.55	0.60	0.65	0.70	0.75	0.80	0.85	0.90	0.95
FMA	success	0.983	0.966	0.948	0.948	0.914	0.896	0.862	0.758	0.603	0.362
	inconclusive	0	0.017	0.052	0.052	0.086	0.104	0.138	0.242	0.397	0.638
	error	0.017	0.017	0	0	0	0	0	0	0	0
FA	success	0.983	0.983	0.948	0.948	0.914	0.896	0.810	0.810	0.620	0.310
	inconclusive	0	0	0.052	0.052	0.086	0.104	0.190	0.190	0.380	0.690
	error	0.017	0.017	0	0	0	0	0	0	0	0
MMC	success	1	0.983	0.983	0.983	0.983	0.948	0.914	0.775	0.620	0.431
	inconclusive	0	0.017	0.017	0.017	0.017	0.052	0.086	0.225	0.380	0.0.569
	error	0	0	0	0	0	0	0	0	0	0

Since the Random Forest classifier gave the lowest consistent error rates for the main isomer data sets, the external validation data set was used to challenge the model. In addition to utilizing the thresholds of proportions of decision trees to determine whether a conclusion was made for each sample, it was also investigated whether making only one conclusion per triplicate rather than one per sample would further improve the accuracy of this method. This is possibly more in line with how forensic laboratories would integrate this method into their current workflow. The first method tested was that all three samples in the triplicate had to result in the same conclusion. If they did not, the result was inconclusive. If they did, the average proportion of decision trees among all three samples was compared to the thresholds. The second method allowed for only two of the three samples in the triplicate to reach the same conclusion (majority voting). If at least two samples gave the same isomer, the proportions of trees for the identifying samples were averaged and compared to the thresholds. Finally, the third method of reaching a conclusion was to only record a result for the sample in the triplicate which amassed the highest proportion of trees. This sample’s proportion of trees was then compared to the thresholds and used to reach a conclusion. Despite the differences between the main data set and the external validation data set, the Random Forest classifier built with the main data set was still successful in most instances at classifying the external validation data set. This success was typically further improved when using the triplicate data of each sample to arrive at a final conclusion as shown in Table . Interestingly, for most proportions of trees thresholds, the data set with the higher abundance threshold actually showed somewhat reduced error rates, although this frequently could be attributed to increased inconclusive rates. It is also important to note here that the external validation data set was only made up of 99 samples, or 33 triplicates. So, a 3% error rate when only one conclusion is made per triplicate is equivalent to only one misclassified triplicate.

Table 2

Random Forest Classification Results for External Validation FMA Data Set −0.3% Ion Current (Bottom Three Sections Show One Conclusion Made Per Triplicate Analysis)

		threshold
		0.50	0.55	0.60	0.65	0.70	0.75	0.80	0.85	0.90	0.95
all samples	success	0.828	0.657	0.515	0.444	0.384	0.313	0.040	0	0	0
	inconclusive	0.081	0.283	0.465	0.535	0.606	0.677	0.960	1	1	1
	error	0.091	0.061	0.020	0.020	0.010	0.010	0	0	0	0
all 3 match	success	0.727	0.576	0.485	0.394	0.364	0.333	0.030	0	0	0
	inconclusive	0.242	0.394	0.485	0.606	0.636	0.667	0.970	1	1	1
	error	0.030	0.030	0.030	0	0	0	0	0	0	0
2 or more	success	0.939	0.758	0.606	0.394	0.364	0.333	0.030	0	0	0
	inconclusive	0	0.182	0.364	0.606	0.636	0.667	0.970	1	1	1
	error	0.061	0.061	0.030	0	0	0	0	0	0	0
highest	success	0.879	0.788	0.667	0.576	0.424	0.394	0.091	0	0	0
	inconclusive	0	0.091	0.242	0.333	0.545	0.576	0.909	1	1	1
	error	0.121	0.121	0.091	0.030	0.030	0	0	0	0	0

Comparison of the Three Classification Methods

For all three chemometric methods, the introduction of thresholds served to reduce the error rates by labeling the less convincing classifications as inconclusive. However, while in some cases the LDA error rate decreased, in many instances changing the thresholds did not affect the results to the same extent as observed for the Welch test and the Random Forest classifier. This is likely due to some degree of overfitting, resulting in extremely high posterior probabilities even for incorrect classifications. Although LDA often resulted in low error rates, those rates could not be reduced to zero through threshold adjustment as was often feasible for the Random Forest classifier. The Welch t test typically gave the highest error rates of the three classifiers, and while this could be decreased to 0% with increasing thresholds, it always came at the cost of an extremely high inconclusive rate (>80%). For casework, this would mean that in a vast majority of the cases, follow-up analyses using other techniques would be required, effectively nullifying the benefits of using DART-ToF. Due to the speed of the instrumental technique and the ease with which multiple samples can be analyzed at one time, the inconclusive rates achieved by the Random Forest classifier, even when the higher percent abundance threshold data sets were used, would be manageable in a forensic casework laboratory. Plots showing a visual representation of these comparisons for the full data sets are shown in Figures S13–S15. The same day (excluded QuickStrip card treated as test set) analysis comparisons can be found in Figures S16–S18.

Variable Importance and Chemical Reasoning

Although each method naturally lends itself to a different means of assessing the contribution of m/z bin to the classification, the importance of each variable can still be investigated across all three methods. Plots showing variable importance per analysis method for the three isomers are depicted in Figures S19–S21. For the FA and FMA isomers, the bins at m/z 137.100 were consistently important variables, in all voltages, for the Random Forest classifier. This fragment, likely C9H10F, occurs through the removal of the amine from the protonated molecule, either through rearrangement and subsequent loss of NH3 (FA) or through the bond dissociation between the nitrogen and its α carbon (FMA).[38] For both phenethylamines, the 4-isomer (para) was easiest to distinguish from the remaining isomers, and this loss is one likely reason. The mean abundance of the bin at m/z 137.100 for all voltages was considerably higher for the para isomer for both FA and FMA and for both normalization methods. The presence of the fluorine as an electron-withdrawing group in the para position allows for stabilizing conjugation of this protonated fragment which makes this loss more favorable for this isomer.[39] The difference between the mean abundance of the isomers in this bin is less prominent when the Orifice 1 voltage is increased to 90 V as this induces collisions of sufficient energy for subsequent fragmentation. Fittingly, for the FMA isomers, this fragment was influential for the first linear discriminant function which differentiated the para-isomer from the others, but not the second function (which was most effective at distinguishing the more similar isomers). The 137.100 fragment was not as influential for either linear discriminant function for the FA isomers. Instead, another variable that was influential for all three classification methods was the m/z 135.075 (C9H8F) at both 60 and 90 V. This fragment likely is formed by the removal of H2 from the 137.100 fragment. The normalized mean abundances of all three isomers for this fragment at 90 V were separated from one another by more than one standard deviation with the highest abundance seen in the para isomer and the lowest in the ortho isomer. Interestingly, the [M + H – HF]+ fragment that is highly abundant in the chemical ionization mass spectra of fluoroamphetamines[8] was not present in high enough abundance to exceed even the lowest percent abundance threshold. For both the Random Forest classifier and the second linear discriminant function, the most important variable for the FMA isomers was the bin containing m/z 115.075 90 V data. This fragment was also particularly influential in the second linear discriminant function for the FA isomers and somewhat influential in the Random Forest classifier as well as the first linear discriminant function. This fragment’s accurate mass indicates a C9H7 ion, possibly a phenyl cyclopropyl ion, formed during the is-CID occurring at the highest Orifice 1 voltage. The ortho effect likely plays a role here as the mean abundance of this m/z bin was higher for the ortho isomer than the mean abundance of the other two isomers by more than one standard deviation for both normalization methods and both the FA and FMA isomers. For the MMC isomers, the bins at m/z 129.075 (C10H9) and m/z 128.075 (C10H8) provided a large influence on all three classification methods. In addition, the m/z 130.075 (C10H10) for both 60 and 90 V heavily influenced the second linear discriminant function. These fragments all likely formed from the loss of a methylamine from the protonated molecule (with varying amounts of unsaturation) and a water molecule. For both normalization methods, the ortho-isomer had a higher mean abundance for these three fragments compared with the mean abundances of the same fragments for the other two isomers, frequently differing by more than one standard deviation. This suggests the presence of an ortho effect favoring these fragmentation pathways through the formation of a six-member ring as a transition state.[40,41] This ortho effect seems particularly prevalent for the 129.075 fragment, which may be why it drives the first linear discriminant function, which easily differentiates the ortho isomer from the other two. Interestingly, one of the most influential m/z bins for the Random Forest classifier of the MMC isomers was the 160.125 (C11H14N) fragment in the 60 V Orifice 1 voltage. This peak likely originated due to the loss of a water molecule as a result of a rearrangement reaction along with the formation of an azirine-containing ion as proposed by Franski et al.[42] The normalized mean abundance of this variable did differ between the three isomers, but frequently not by more than one standard deviation. Although the aromatic substitution is not involved in this rearrangement, one might expect this fragment to have the highest mean abundance for the ortho isomer due to the ortho effect, but actually the para isomer showed the highest mean abundance, followed by ortho and then the meta isomer. This might be due to the fragment forming more readily for the ortho isomer (as it is more abundant in the 30 V data) and subsequently dissociating into the 128.075, 129.075, and 130.075 fragments. For all three isomer types, when the lower percent abundance threshold was utilized, the low abundance variables typically provided a high amount of influence on the classification models. This demonstrates the important contributions of the low abundance m/z bins in the correct identification of the NPS ring isomer. This suggests that while variable reduction is important to reducing the effects of mixture interference, it is beneficial to retain more m/z bins than just those that may be visually identified as prominent peaks.

Conclusions and Future Work

While the benefits and strengths of the DART-ToF instrument are plentiful, it is well understood that ambient sample ionization makes use of this data for positional ring isomer differentiation challenging due to the high variation between analyses and the absence of chromatographic separation prior to MS analysis. This study has shown that while all three types of chemometric classification methods show some success in tackling this challenge, a machine learning technique, such as the Random Forest classifier, appears to be most well-suited to handle this analytical problem by effectively utilizing minor differences in the mass spectra to assign the correct isomeric form. The two normalization techniques showed similar performance, although the ion current method seemed to slightly outperform the vector length procedure, particularly in terms of the inconclusive rate at which a 0% error rate was achieved. However, this improvement in performance was minimal, and both methods of normalization are worth considering in future studies. There are some limitations when it comes to incorporating this rapid and novel classification method into the analytical scheme for a forensic casework laboratory. The most difficult challenge to overcome is interference from possible mixture constituents. Typically, these classes of NPS are not found in mixtures with the same frequency as other common drugs of abuse, but it is still worth investigating. It is the hope of the authors that by restricting the analysis and normalization to only m/z fragments that are relevant to the isomer, as determined from the analysis of primary standards, extraneous fragments due to diluents or cutting agents may be ignored. In addition, a key benefit to the Random Forest classifier is that each tree is unique and does not include all variables. Both of these features may allow this model to still successfully differentiate NPS ring isomers in case mixtures. However, future studies must be conducted including casework samples to determine if the presence of common cutting agents would impact the fragmentation of the isomer itself. This type of work has been conducted on fentanyl and some fentanyl analogues, and minimal competitive ionization effects were seen, though it is important to note that this study focused on ionization for detection, not necessarily the reproducibility of fragmentation patterns in the presence of mixtures.[43] Since mixtures are a well-known challenge for DART-ToF in many instances; there has been some work performed in sample cleanup techniques such as solid-phase microextraction[44,45] and other reproducible sample introduction techniques such as thermal desorption.[43,46] It is possible that in the presence of challenging mixtures, the combination of appropriate sample clean up steps and data analysis tools such as those presented here could still result in successful NPS isomer classification. However, it is unlikely that this method could be used if a sample contains a mixture of two or more different isomeric forms (for example 2-FA and 4-FA). Although this is rare in casework, it is possible, and it needs to be investigated whether this method would be capable of determining that such a mixture is present or whether an erroneous conclusion would be reached. Other challenges in moving this work from proof-of-concept into use in forensic casework involve possible sample-to-sample variation. In this study, only one primary standard for each isomer was utilized, so while highly pure it is unknown if there may be impurities from the synthesis that could affect the analysis when compared to street samples. Similarly, differences in concentration could affect the results. However, this is why two different deposition volumes were included in the data set. Sample concentration much higher than used here would likely result in sample carryover, while sample concentrations lower would show a low TIC signal. Both instances would cause an experienced analyst to adjust the concentration to a more suitable level. An additional limitation that needs to be investigated is whether this model could be utilized to identify isomers that were analyzed on a different DART-ToF instrument than the one used to create the classifier model. For this study, all samples were analyzed on the same instrument, but it is the hope that eventually other laboratories could compare their data to one central, robust data set to identify their unknown samples without the need to create their own data sets. In practice, this would involve the laboratories analyzing their own standards to compare against the model data set to ensure correct classification is achieved, and subsequently an unknown sample could be confidently analyzed. Future work will involve the creation of a simple user interface to accommodate the use of a centralized model. In order for this to be feasible, another important challenge to investigate is model stability over time for the various DART-ToF setups. As the instruments are used and serviced, does excessive variation build up over time that would impede the success of the model? While these challenges have yet to be addressed, the success seen here with data taken all on one day suggests that if such a robust, centralized data set is not feasible, a quick, smaller data set could be easily generated in the course of the analysis and still produce rapid, successful ring isomer differentiation. Although the instrumental time required to generate the entire data set for each isomer was much smaller than for a comparably sized GC-MS data set, the time required for spectral acquisition after analysis is a potential limitation. Currently the software that is typically used with this instrument does not have the functionality to automatically extract multiple spectra from the total ion chronogram generated for each Orifice 1 voltage. There may however be commercially available software packages that can be explored to expedite this process. Once the model is constructed (either a large centralized model or a smaller lab-specific model), the authors anticipate that such a model can be used directly and for a prolonged period of time. This means that model creation is a one-time investment. The combination and preprocessing of the data is conducted through a computer script[30] and is performed nearly instantaneously immediately before comparison to the classification models. It is the hope that, with the proposed future step of a centralized model and user interface, this process would be made even simpler. In that case, isomer identification with DART-MS could be established in a similar time frame as a typical library search and in a very high throughput compared to GC-MS.

27 in total

1. Rapid Analysis of Trace Drugs and Metabolites Using a Thermal Desorption DART-MS Configuration.

Authors: Edward Sisco; Thomas P Forbes; Matthew E Staymates; Greg Gillen
Journal: Anal Methods Date: 2016-08-16 Impact factor: 2.896

2. Validation of the direct analysis in real time source for use in forensic drug screening.

Authors: Robert R Steiner; Robyn L Larson
Journal: J Forensic Sci Date: 2009-03-03 Impact factor: 1.832

3. Pharmaceutical identifier confirmation via DART-TOF.

Authors: Jacob L Easter; Robert R Steiner
Journal: Forensic Sci Int Date: 2014-04-01 Impact factor: 2.395

4. Resolution of isomeric new designer stimulants using gas chromatography - Vacuum ultraviolet spectroscopy and theoretical computations.

Authors: Ludovit Skultety; Petr Frycak; Changling Qiu; Jonathan Smuts; Lindsey Shear-Laude; Karel Lemr; James X Mao; Peter Kroll; Kevin A Schug; Angelica Szewczak; Cory Vaught; Ira Lurie; Vladimir Havlicek
Journal: Anal Chim Acta Date: 2017-03-22 Impact factor: 6.558

5. Differentiation of regioisomeric ring-substituted fluorophenethylamines with product ion spectrometry.

Authors: F Westphal; P Rösner; Th Junge
Journal: Forensic Sci Int Date: 2009-11-08 Impact factor: 2.395

6. Regioisomeric and enantiomeric analyses of 24 designer cathinones and phenethylamines using ultra high performance liquid chromatography and capillary electrophoresis with added cyclodextrins.

Authors: Li Li; Ira S Lurie
Journal: Forensic Sci Int Date: 2015-07-23 Impact factor: 2.395

7. Direct analysis in real time mass spectrometry (DART-MS) of "bath salt" cathinone drug mixtures.

Authors: Ashton D Lesiak; Rabi A Musah; Robert B Cody; Marek A Domin; A John Dane; Jason R E Shepard
Journal: Analyst Date: 2013-05-02 Impact factor: 4.616

8. Distinguishing drug isomers in the forensic laboratory: GC-VUV in addition to GC-MS for orthogonal selectivity and the use of library match scores as a new source of information.

Authors: Ruben F Kranenburg; Alan R García-Cicourel; Corina Kukurin; Hans-Gerd Janssen; Peter J Schoenmakers; Arian C van Asten
Journal: Forensic Sci Int Date: 2019-07-26 Impact factor: 2.395

9. Study of Substituted Phenethylamine Fragmentation Induced by Electrospray Ionization Mass Spectrometry and Its Application for Highly Sensitive Analysis of Neurotransmitters in Biological Samples.

Authors: Daiki Asakawa; Eiji Sugiyama; Hajime Mizuno; Kenichiro Todoroki
Journal: J Am Soc Mass Spectrom Date: 2021-07-22 Impact factor: 3.109