Literature DB >> 34591852

A comparative field evaluation of six medicine quality screening devices in Laos.

Céline Caillet1,2,3,4, Serena Vickers1,2,3, Stephen Zambrzycki5, Facundo M Fernández5, Vayouly Vidhamaly1,2,3, Kem Boutsamay1,2,3, Phonepasith Boupha1,2,3, Pimnara Peerawaranun4, Mavuto Mukaka2,4, Paul N Newton1,2,3,4.   

Abstract

BACKGROUND: Medicine quality screening devices hold great promise for post-market surveillance (PMS). However, there is little independent evidence on their field utility and usability to inform policy decisions. This pilot study in the Lao PDR tested the utility and usability of six devices in detecting substandard and falsified (SF) medicines.
METHODOLOGY/PRINCIPAL FINDINGS: Observational time and motion studies of inspections by 16 Lao medicine inspectors of 1) the stock of an Evaluation Pharmacy (EP), constructed to resemble a Lao pharmacy, and 2) a sample set of medicines (SSM) were conducted without and with six devices: four handheld spectrometers (two near-infrared: MicroPHAZIR RX and NIR-S-G1; two Raman: Progeny and TruScan RM), one portable mid-infrared spectrometer (4500a), and single-use paper analytical devices (PAD). User experiences were documented through interviews and focus group discussions. Significantly more samples were wrongly categorised as pass/fail with the PAD than with the other devices in EP inspections (p<0.05). For 3/6 devices (NIR-S-G1, MicroPHAZIR RX, 4500a), the numbers of samples wrongly classified in EP inspections were significantly lower than in initial visual inspections without devices. The NIR-S-G1 had the fastest testing time per sample (median 93.5 s, p<0.001). The time spent on EP visual inspection was significantly shorter when using a device than in inspections without devices, except with the 4500a, risking missed visual clues that samples are SF. The main user errors were selection of the wrong spectrometer reference libraries and misinterpretation of PAD results. Limitations included repeated inspections of the EP by the same inspectors with different devices and the small sample size of SF medicines.
CONCLUSIONS/SIGNIFICANCE: This pilot study suggests policy makers wishing to implement portable screening devices in PMS should be aware that overconfidence in devices may cause harm by reducing inspectors' investment in visual inspection. It also provides insight into the advantages/limitations of diverse screening devices in the hands of end-users.

Entities:  

Mesh:

Substances:

Year:  2021        PMID: 34591852      PMCID: PMC8483322          DOI: 10.1371/journal.pntd.0009674

Source DB:  PubMed          Journal:  PLoS Negl Trop Dis        ISSN: 1935-2727


Background

According to a recent World Health Organization (WHO) report, ~10.5% of medical products circulating in low- and middle-income countries (LMICs) are either substandard or falsified (SF) [1]. Falsified medicines are the result of criminal activity: purporting to be genuine, authorized medicines, they are deliberately and fraudulently mislabelled with respect to identity and/or source [2]. Substandard medicines are ‘authorized medical products that fail to meet either their quality standards or their specifications, or both’ [2]. Currently, medicine inspectors of national Medicines Regulatory Authorities (MRAs) in LMICs performing post-marketing surveillance (PMS) rely largely on their own senses and knowledge to detect circulating SF products [3]. A plethora of portable screening tools have been developed over the last decade [4,5], allowing some degree of objective analysis of medicines in the ‘field’. However, there are key gaps in the evidence base to inform national MRAs of the optimal choice of device to detect SF medical products [4,6]. This is the third paper in the Collection ‘A multi-phase evaluation of portable screening devices to assess medicines quality for national Medicines Regulatory Authorities’, evaluating devices in Laos. Six devices deemed ‘field suitable’ in the laboratory evaluation phase were evaluated in the hands of medicine inspectors from the Lao Bureau of Food and Drug Inspection (BFDI) of the Ministry of Health [7]. Medicine quality inspectors in Laos typically undertake routine inspections of pharmacies twice a year, focusing on adherence to legislation and drug registration. Occasionally, medicines are purchased from a selection of pharmacies for screening with the Minilab [8]. All samples that fail Minilab screening, and a further 10% of those that pass, are sent to the National Center for Food and Drug Analysis (NCFDA) [previously ‘Food and Drug Quality Control Center’], Vientiane, for pharmacopeial testing. We aimed to assess the utility and usability of six portable screening devices in the hands of Lao medicine inspectors during inspection of a simulated Evaluation Pharmacy.

Methods

Six devices were evaluated, together with the Minilab in line with its current use in Laos. An outline of the different steps of the evaluation is given in the figure below.

Outline of the field evaluation study.

EP: evaluation pharmacy; SSM: sample set of medicines.
*Rapid Diagnostic Tests (RDT) and single-use immunoassay devices were deemed field-suitable in the laboratory evaluation work [7] but could not be evaluated in the present study because the developers of the single-use immunoassay test were unable to supply sufficient samples of the devices within the timeframe of the project.
D, under development; FTIR, Fourier transform infrared; M, marketed; MIR, mid-infrared; MS, mass spectrometry; N, no; NIR, near-infrared; S, single-use device; TLC, thin-layer chromatography; Y, yes.
a The costs reported here do not include VAT, which may vary by country of purchase. Ordering several devices from a manufacturer may attract a reduced purchase cost.
b Unlike the other devices, the Minilab was evaluated by laboratory technicians involved in current routine quality control at the National Center for Food and Drug Analysis.
c At the time of the study the NIR unit was produced by Young Green Energy. It is now produced by InnoSpectra Corporation.
d The near-infrared sampling unit is marketed, but the smartphone application is not.

Study setting

An evaluation pharmacy (EP) was fashioned at Mahosot Hospital, Vientiane, to resemble a Lao Class 2 pharmacy [18,19] and stocked with genuine and falsified field-collected medicines (FCM) stated to contain 41 different active pharmaceutical ingredients (API) or API combinations. The participants were asked to focus on inspections of medicines containing seven targeted API: ofloxacin (OFLO), sulfamethoxazole-trimethoprim (SMTM), azithromycin (AZITH), amoxicillin-clavulanic acid (ACA), artemether-lumefantrine (AL), artesunate (ART) (intravenous/intramuscular formulation) and dihydroartemisinin-piperaquine (DHAP). Genuine medicines were obtained from manufacturers and distributors in Laos and Thailand. Falsified versions of the antimalarial Coartem, containing none of the stated artemether or lumefantrine API, were provided by collaborators. Ultra-performance liquid chromatography (UPLC) was used as the reference technique to determine the amount of API contained in the FCM, except for the falsified field-collected samples, which had been tested by mass spectrometry [20].

Device settings

Qualitative results obtained with the devices were based on pattern comparison between data from a known good quality reference medicine and the test sample. The PAD reacted chemically with the medicine ingredients, generating a colour pattern that was then visually compared with a reference photograph. The spectrometers computationally compared experimentally collected spectra with reference spectra of good quality medicines stored in the device’s database. Reference spectra were created for each brand tested in the study and for each ‘good quality’ simulated medicine during the laboratory evaluation phase of the project [7]; the protocol for the creation of reference spectra is available in the supporting information. Each acquired sample spectrum was given a score by the device software, resulting from the comparison with the good quality reference library entry, and the score had to meet a given threshold for the medicine to pass. For the NIR-S-G1 spectrometer, reference samples were sent to the developer, who prepared the reference libraries. The passing threshold values for the correlation coefficient or p-value initially set as defaults by the developers were used for the MicroPHAZIR RX, NIR-S-G1, Progeny, and TruScan RM spectrometers; these devices’ readouts directly told the user ‘pass’ or ‘fail’, which was recorded. For the 4500a MIR spectrometer we set the pass threshold for the correlation coefficient at >0.9, because the device does not output a direct pass/fail result but rather gives a list of matches with their associated correlation coefficients.
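The spectral comparison logic described above can be illustrated with a minimal sketch. This is not the devices' proprietary scoring algorithm: it assumes a simple Pearson correlation score and reuses the >0.9 threshold we set for the 4500a, with synthetic spectra standing in for real measurements.

```python
# Minimal sketch of qualitative spectral comparison against a reference
# library entry. Hypothetical score and threshold; real devices apply
# proprietary preprocessing and scoring, so this only shows the principle.
import numpy as np

def correlation_score(test_spectrum: np.ndarray, reference_spectrum: np.ndarray) -> float:
    """Pearson correlation between a test spectrum and a library reference."""
    return float(np.corrcoef(test_spectrum, reference_spectrum)[0, 1])

def classify(test_spectrum: np.ndarray, reference_spectrum: np.ndarray,
             threshold: float = 0.9) -> str:
    """Pass/fail decision analogous to the 4500a setup described above,
    where the correlation-coefficient pass threshold was set at >0.9."""
    return "pass" if correlation_score(test_spectrum, reference_spectrum) > threshold else "fail"

# Toy example: a noisy copy of the reference passes; a different spectrum fails.
rng = np.random.default_rng(0)
wavelengths = np.linspace(900, 1700, 256)            # arbitrary NIR-like axis (nm)
reference = np.exp(-((wavelengths - 1200) / 80) ** 2)
genuine = reference + rng.normal(0, 0.01, reference.size)
falsified = np.exp(-((wavelengths - 1450) / 60) ** 2)
print(classify(genuine, reference), classify(falsified, reference))  # pass fail
```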

Training medicine inspectors

Sixteen medicine inspectors, employees of the central Bureau of Food and Drug Inspection based in Vientiane Capital (n = 10) and of Vientiane district BFDI offices (n = 6), volunteered for the study. They were randomised to receive either ‘rudimentary’ training, a verbal briefing with the opportunity to rehearse use of the device on a few practice samples just prior to the EP inspection (5–10 min), or ‘intensive’ training, a verbal presentation and substantial practice with the device (1–2 h) plus the same rudimentary verbal training and practice just prior to the EP inspection. The training was given by Lao post-graduate research pharmacists, themselves trained by the lead chemist overseeing the laboratory evaluation phase.

Evaluation pharmacy inspections

To refine the study protocol, EP inspections without devices were piloted by three pharmacy students from the Faculty of Pharmacy (University of Health Sciences, Vientiane) prior to the initial visual inspections. Subsequently, four EP inspections by four different medicine inspectors were conducted per device between September and December 2017. Each medicine inspector performed one simulated inspection without any device as a baseline (‘initial visual inspection’) and up to three simulated EP medicine inspections with a device. The inspectors were randomly assigned to a combination of training and devices using the Excel RANDBETWEEN random number generator. The constraints were that no inspector would test more than one handheld spectrometer (Progeny, MicroPHAZIR RX or TruScan RM), because of the similarity of their operating procedures, and that only inspectors from the district offices would test the NIR-S-G1, because some Vientiane BFDI office inspectors had already used the NIR-S-G1 in a previous study by our research group. All inspections were carried out independently by the participant working alone, with one device per inspection. Inspectors were asked to test any suspicious samples containing the targeted API, assuming: no time limit (hence the inspectors were free to inspect as many samples as they wished), no budget restriction, that it was June 2015 (to avoid bias, because some of the medicines in the EP had already expired by the time of the evaluation), and that no blisters had tablets missing (as some tablets had been removed for analysis). They were encouraged to test samples through the blisters where appropriate. However, if they wished to perform testing requiring the opening of primary packaging, the observer provided inspectors with already unpackaged samples (in a small zip-lock bag) of the same batch number of the product, stored under the same conditions. If an inspector regarded all medicines as not suspicious at the end of the inspection, the inspector was asked to select a sample of 10% of those which did not look suspicious, or which passed the device test, for Minilab testing. To reduce recall bias among medicine inspectors inspecting the EP several times, brands were changed between inspections and samples were moved to different places.
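For illustration, the constrained randomisation described above can be sketched as rejection sampling. The allocation below is a simplified assumption (the device list, three devices per inspector, and the two stated constraints), not the study's actual Excel procedure.

```python
# Illustrative sketch of constrained random assignment of devices to an
# inspector, in the spirit of the RANDBETWEEN procedure described above.
import random

DEVICES = ["MicroPHAZIR RX", "NIR-S-G1", "Progeny", "TruScan RM", "4500a", "PAD"]
HANDHELD = {"MicroPHAZIR RX", "Progeny", "TruScan RM"}  # at most one per inspector

def assign(inspector_is_district: bool, rng: random.Random) -> list[str]:
    """Draw up to three devices for one inspector, respecting the constraints:
    no more than one of the three handheld spectrometers, and the NIR-S-G1
    only for district-office inspectors (who had not used it before)."""
    while True:
        picks = rng.sample(DEVICES, 3)
        if sum(d in HANDHELD for d in picks) > 1:
            continue                                   # reject: >1 handheld
        if "NIR-S-G1" in picks and not inspector_is_district:
            continue                                   # reject: NIR-S-G1 restricted
        return picks

rng = random.Random(2017)
print(assign(inspector_is_district=True, rng=rng))
```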

Sample set inspections

After each EP inspection, a pre-determined ‘sample set’ of medicines (SSM) was tested by each inspector with the device in an office outside the EP. These SSM tests facilitated direct comparison between the devices of the time taken to test a single sample and allowed observation of user errors. The samples consisted of FCM and ‘simulated’ samples made during the laboratory evaluation [7], presented as single tablets, with packaging removed, in transparent zip-lock plastic bags labelled with the brand name, manufacturer, and dosage. Three SSM of six samples each, of either AL, SMTM or OFLO, were prepared to ensure that no inspector assessed the same SSM more than once. However, eight FCM samples used for the creation of five spectrometer reference library entries, and one used as a test sample (an artemether-lumefantrine sample), were subsequently found to be out of specification by UPLC analyses, rendering these reference libraries unreliable [7]. We thus discarded the spectrometer results for five samples included in SSM inspections and eight samples included in the EP inspections. No brands were discarded from the analysis for the PAD, as it uses reference colour-code pictures provided by the device developers.

The test sample (SPS22) and the reference library samples (n = 4: SPS20, SPS21, SPS06, SPS07) that were subsequently discarded from the results because of unexpected out-of-specification API content on UPLC analysis are given in italics and highlighted in the corresponding table. The samples with out-of-specification reference library samples were still used for the PAD evaluation, as the PAD references are independent pictures provided by the device developers. G: genuine; F: falsified. *Simulated sample. ¥ ‘Look-alike’ medicines are defined as medicines stated to contain a specific API (not one of the seven API included in this study) but with tablets visually indistinguishable from genuine medicines, in order to mimic a falsified medicine with a wrong API; the actual medicine was Diabeta (chlorpropamide), but the tablets looked identical to Sulfatrim (SMTM) [21].

Baseline screening method: Minilab

Three laboratory technicians from the National Center for Food and Drug Analysis (NCFDA) of the Lao Ministry of Health, formally trained with the Minilab and involved in training provincial inspectors, tested the samples selected as suspicious by the medicine inspectors during the 16 initial visual inspections (without devices), a random set of 10% of the samples considered of good quality during the 16 initial visual inspections, and the medicines of the three SSM, in line with the current use of the Minilab in Laos. Each technician was assigned to test all samples of two or three API (e.g. technician A tested all the samples of SMTM and AL).

User satisfaction and focus group discussions

After completion of each SSM testing session, the medicine inspectors were asked five open-ended questions through face-to-face interviews in the Lao language. Two months after the last EP inspection, three focus group discussions (FGD), each with five medicine inspectors, were held to give further insight into the utility and usability of the tested devices in supporting PMS systems.

Outcomes

In the absence of spectrometer manufacturers’ guidelines, when a sample failed the first test with a device, medicine inspectors were instructed to operate a ‘best of three’ system for overall sample classification: three tests were performed with the device on the failing sample, and the more frequent of the ‘pass’/‘fail’ outcomes gave the overall sample classification. For the PAD, inspectors were instructed to re-run failing samples once, as recommended by the developer; if the sample failed again, it was classified as failing. For both the EP and the SSM inspections, medicine inspectors were asked to record the sample identifier, the pass/fail result of each single test, and the overall pass/fail classification on a recording sheet. Data analysis was performed using the inspector’s overall pass/fail classification of each sample. Time and motion studies were conducted by two observers (only one observer for the Minilab). In EP inspections, one observer unobtrusively, with no conversation allowed with the participant, recorded the times taken to perform specific tasks on a recording sheet; another observer recorded deviations from the device protocol (‘user errors’). For SSM, two observers recorded the times taken to perform specific tasks. The tasks recorded in the EP were sample visual inspection, testing with the device (‘sample testing’), and interpreting/recording the results. For SSM, no visual inspection was conducted by the inspectors, as the tablets were provided outside their packaging. In addition to the time to interpret/record the results, two phases were identified within SSM ‘sample testing’: ‘sampling’ (starting when use of the device, or removal of the tablet from its packaging to begin testing, started; ending when the process to obtain a result started) and ‘device testing’ (starting when the process to obtain a result started; ending when the result was obtained).
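The overall-classification rules can be made concrete with a short sketch. Function names are ours, and we assume the ‘best of three’ majority is taken over three repeat tests of a failing sample, as described above.

```python
# Sketch of the 'best of three' rule for spectrometers and the single-retest
# rule for the PAD. Assumed interpretation of the protocol, not device code.
from collections import Counter
from typing import Optional

def spectrometer_overall(first_result: str, retests: list[str]) -> str:
    """If the first test fails, three repeat tests are run and the majority
    of 'pass'/'fail' outcomes gives the overall sample classification."""
    if first_result == "pass":
        return "pass"
    assert len(retests) == 3, "best-of-three requires three repeat tests"
    return Counter(retests).most_common(1)[0][0]

def pad_overall(first_result: str, retest: Optional[str] = None) -> str:
    """PAD rule: a failing sample is re-run once; a second fail is final."""
    if first_result == "pass":
        return "pass"
    assert retest is not None, "a failing PAD sample must be re-run once"
    return "fail" if retest == "fail" else "pass"

print(spectrometer_overall("fail", ["fail", "pass", "fail"]))  # -> fail
print(pad_overall("fail", "pass"))                             # -> pass
```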

Data analysis

The median and interquartile range (IQR) of the number of samples wrongly classified, and the percentage (with 95% confidence intervals, CI) of samples wrongly classified over all the EP inspections per device, are presented. Fisher’s exact tests were used to compare the proportions of samples wrongly classified between device pairs. Wilcoxon rank-sum tests were used to compare the numbers of samples wrongly categorised with and without devices in EP inspections. For SSM, differences between devices in accuracy of sample classification were examined using mixed effects logistic regression yielding adjusted odds ratios, with training type (rudimentary/intensive) and sample set type as factors and inspectors as cluster-specific random effects. The total time spent in EP inspection, the time spent per phase during SSM, and the total time spent per sample in SSM testing are described using medians (IQR). Wilcoxon rank-sum tests were used to test the differences in times between the initial EP visual inspection and the EP inspection with each device. For SSM testing, differences in times between devices were examined using mixed effects generalised linear regression models to obtain the estimated device effects compared with the reference device, with training group and sample set as factors and inspectors and observers as cluster-specific random effects. The time data showed skewed distributions, and we therefore used times transformed to natural logarithms. All tests were performed at a 5% (0.05) significance level. Microsoft Excel 2013 and STATA version 14.0 were used for the analyses. The user errors observed during EP and SSM inspections are summarised as narratives by category of error (e.g. selection of the wrong reference library for spectrometers). The information from the face-to-face interviews and the FGD is summarised and presented as narratives highlighting common emerging themes. More details by device are provided in the supporting information.
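As an illustration of the time analysis, the sketch below fits a linear mixed model to natural-log-transformed times using Python's statsmodels in place of STATA, on simulated data. It includes inspectors as the random effect; the crossed observer random effect used in the paper is omitted for simplicity.

```python
# Hedged sketch of the ln(time) mixed effects analysis with simulated data.
# Device names are from the study; the time values below are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "device": rng.choice(["NIR-S-G1", "MicroPHAZIR RX", "PAD"], n),
    "training": rng.choice(["rudimentary", "intensive"], n),
    "inspector": rng.choice([f"insp{i}" for i in range(16)], n),
})
base = {"NIR-S-G1": 94, "MicroPHAZIR RX": 150, "PAD": 600}     # toy medians (s)
df["time_s"] = [base[d] * rng.lognormal(0, 0.3) for d in df["device"]]
df["log_time"] = np.log(df["time_s"])          # natural-log transform, as in the paper

# Linear mixed model: device and training as fixed effects,
# inspector as a cluster-specific random effect.
model = smf.mixedlm("log_time ~ C(device) + C(training)", df,
                    groups=df["inspector"]).fit()
print(model.summary())
```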

Results

Time results

Time to inspect the evaluation pharmacy

EP inspections with each device took significantly longer to complete than the initial visual inspections without devices (median 25 min 16 s) (p<0.05, Wilcoxon rank-sum), except for the NIR-S-G1 (32 min 33 s, p = 0.307). Visual inspection duration when using a device was significantly shorter for all devices than in initial visual inspections of the EP with no device, except for the 4500a FTIR (p = 0.061). In more than one-third of the inspections with devices (n = 9, 41%), inspectors spent less than one minute on sample visual inspection. Because one inspector did not perform the negative control of the PAD and the observers failed to record the calibration time with the MicroPHAZIR RX during one inspection, these data were excluded from the analyses.

Time spent inspecting evaluation pharmacy, by device.

Values in the figure are medians (IQR); 2,000 seconds is ~33 minutes and 8,000 seconds is ~2 hours 13 minutes.

Time per sample in sample set inspections

For SSM inspections, the median time to test one sample ranged from 94 s for the NIR-S-G1 to 2,063 s (34 min 23 s) for the Minilab. The Minilab and PAD took significantly longer total times per sample than the other devices (p<0.001). (P-values are from the mixed effects generalised linear regression model of ln(total time), adjusted by device and training and clustered by inspectors and observers.) The NIR-S-G1 had a significantly shorter total time per sample (median 94 s) than any other device tested (p<0.001); its sampling was significantly faster than for the other devices, as was its interpreting/recording time compared with all devices except the MicroPHAZIR RX (median 14 s vs 22 s, p = 0.78) (Table A in the supporting information). The MicroPHAZIR RX was significantly faster in testing one sample than all other devices except the NIR-S-G1. The Progeny was significantly slower in device testing and interpreting/recording times per sample than the MicroPHAZIR RX and the TruScan RM (p<0.001). The PAD and 4500a FTIR sampling times were not significantly different (4 min 2 s and 3 min 49 s, respectively, p = 0.059). Inspectors with rudimentary training did not spend longer testing one sample than inspectors with intensive training, adjusted for devices and sample set tested and clustered by inspectors and observers (p = 0.11, Table B in the supporting information).

Device accuracy

Over all EP inspections, samples were wrongly categorised at frequencies ranging from 0% with the TruScan RM and MicroPHAZIR RX to 37.9% (95% CI 20.7–57.7%) with the PAD. Significantly more samples were wrongly classified with the PAD than with all other devices (p<0.05). All incorrect classifications were genuine medicines classified as suspicious (false positives). (P-values are from Fisher’s exact tests: * p<0.05, ** p<0.01, *** p<0.001. a Not applicable, as no samples were wrongly categorised in inspections with the TruScan RM or MicroPHAZIR RX. b Artesunate samples were discarded from the analysis because the inspectors scanned samples through the glass vials, although the reference library had been created by scanning through a replacement plastic packaging.)

The median numbers of samples wrongly classified in EP inspections with the 4500a FTIR [1 (0.3–1)], MicroPHAZIR RX [0 (0–0)], NIR-S-G1 [1 (0.3–1)] and TruScan RM [0 (0–0)] were significantly lower than in initial visual inspections (p = 0.048, p = 0.008, p = 0.048 and p = 0.005, respectively). There were no statistically significant differences in the numbers of samples wrongly classified in EP inspections with the PAD [2 (1–5.3)] and Progeny [0 (0–1.5)] compared with initial visual inspections (p = 0.631 and p = 0.059, respectively). (Z statistics and p-values are from Wilcoxon rank-sum tests: * p<0.05, ** p<0.01, *** p<0.001. $ The numbers of samples wrongly categorised in initial inspections without devices used in the comparisons vary, because we included only the brands tested in initial inspections that each device was able to test; e.g. AL samples wrongly categorised during initial inspections were excluded for the PAD, as the PAD could not test samples containing AL. In both initial inspections without devices and inspections with devices, we excluded wrongly categorised samples from brands subsequently found to have reference library spectra obtained from poor quality reference samples (as per UPLC analyses), except for the PAD.)

For SSM inspections there were no significant differences between devices in wrongly classifying samples as suspicious or not suspicious, adjusted by training status and sample set tested and clustered by inspectors. (Odds ratios (95% CI) are from the mixed effects logit model, adjusted for the type of training received (rudimentary or intensive) and sample set type (OFLO, AL, SMTM), and clustered by inspectors. $ 95% CI for binomial distribution.) Over all SSM inspections, 10 of 18 (55.6%) misclassifications were false negative results for samples containing 50% of the stated OFLO or SMTM, two (11.1%) were false negatives for falsified FCM stated to contain AL, and six (33.3%) were false positives for samples of OFLO and SMTM. The two 50% API samples tested with both the MicroPHAZIR RX and NIR-S-G1 were correctly classified as suspicious, whereas the 4500a FTIR correctly classified 1/2, the Minilab 0/2, the PAD 3/4, the Progeny 0/3, and the TruScan RM 1/4. Inspectors with rudimentary training were not significantly more likely to wrongly classify samples than those with intensive training in SSM [OR 1.5 (95% CI 0.5–4.9)], adjusted by devices and sample set tested and clustered by inspectors.
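The pairwise device comparisons of misclassification proportions can be reproduced in outline with Fisher's exact test. The counts below are hypothetical, chosen only to roughly match the PAD's reported 37.9% rate; they are not the study's data.

```python
# Sketch of a pairwise proportion comparison as used for EP accuracy:
# Fisher's exact test on wrongly vs correctly classified sample counts.
from scipy.stats import fisher_exact

wrong_pad, total_pad = 11, 29          # hypothetical PAD counts (~37.9% wrong)
wrong_dev, total_dev = 1, 30           # hypothetical spectrometer counts
table = [[wrong_pad, total_pad - wrong_pad],
         [wrong_dev, total_dev - wrong_dev]]
odds_ratio, p_value = fisher_exact(table)
print(f"OR={odds_ratio:.2f}, p={p_value:.4f}")
```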

Observed user errors

The main observed user errors with the MicroPHAZIR RX, NIR-S-G1, Progeny, and TruScan RM were the selection of wrong comparator reference libraries, in 3.9%, 27.0%, 7.5% and 20.0% of scans in EP inspections and 0.0%, 27.8%, 0.0% and 25.8% of scans in SSM inspections, respectively. The inspectors recognised all their errors and repeated the tests during MicroPHAZIR RX and Progeny inspections, resulting in no overall misclassification of samples. However, in 16 of 17 scans (88.9%) using wrong reference libraries with the TruScan RM, and in 21/27 (77.8%) with the NIR-S-G1, the users did not realise the errors. In all these cases, the users selected the wrong brand (a different brand stated to contain the same API at the same strength) as the reference library. None of the 11 samples scanned with wrong reference libraries with the TruScan RM, for which the users did not recognise the errors, were misclassified. However, four of the 17 samples tested with a wrong reference library with the NIR-S-G1, for which the users did not recognise the errors, were incorrectly classified (all as false positives). Two of 29 samples tested (6.9%) in EP inspections and four of 24 samples tested (16.7%) in SSM inspections were misclassified as a result of errors in PAD interpretation. (*Three inspections only with the MicroPHAZIR RX, because the results of one inspection were discarded owing to an issue with the inbuilt reference library. $ Errors were recognised by the inspectors, who re-tested the samples without mistakes. ¥ Wrong selection of the reference library can happen only with the ‘application’ function; using the ‘analyse’ function, no user errors were observed in either EP or SSM inspections.)

User satisfaction

All spectrometers except the NIR-S-G1 were felt to be heavy and/or rather cumbersome. The portable 4500a FTIR spectrometer was perceived as suitable for inspections of manufacturing and distribution sites by inspectors, who liked the extra information given by the table of results (the table of matches, with its list of API and % match), as this was felt to increase confidence in the device results. However, the 4500a was identified as not suitable for routine pharmacy inspection because of its large size and the need to crush samples and clean the sampling window. The MicroPHAZIR RX was described as easy to use, reliable, comfortable, and fast. The sample window indicator, which shows the inspector whether the sample sufficiently covers the sample window to produce a reliable result, was cited as a helpful additional feature giving inspectors extra confidence in their sampling technique. The device froze during one of the four EP inspections and all the records were lost, which undermined that inspector’s confidence in relying on the device. The NIR-S-G1 was singled out by medicine inspectors as well suited to any level of the supply chain because of its small size, fast testing time, and easy-to-use smartphone application. However, the inability to create and update the reference library of comparators locally was perceived as a key limitation. Although medicine inspectors liked the PAD’s lack of reliance on electricity or sophisticated instrumentation, the need to prepare samples, the need for a working space to carry out the analysis, and the longer testing time were frequently raised as concerns regarding usability in pharmacies and at distributors’ sites. Difficulties in interpreting the results were often highlighted. Two medicine inspectors suggested that the PAD would be useful for testing raw materials at manufacturers. Medicine inspectors liked the ability of the Progeny to display more than a ‘pass’ or ‘fail’ result. It was felt to be quite slow to scan, and three inspectors commented that the touchscreen was not very responsive. Three inspectors stated that the supplied tablet holder was difficult to use with small tablets. It was noted as interesting for inspections at manufacturers, distributors, or border points, but of limited use in pharmacy outlets. The TruScan RM was perceived as easy and comfortable to use but with a slow device testing time (raised during face-to-face interviews). It was deemed more suitable for inspections of manufacturers’ plants, distributors, or border checkpoints than of pharmacy outlets.

Discussion

This pilot study provides insight into the performance and the advantages/limitations of six screening devices in the hands of Lao medicine inspectors in simulated inspections. The NIR-S-G1 was the fastest spectrometer at testing one sample, whilst the PAD and the Minilab took significantly longer to test one sample than all the spectrometers. Within the limited context of this study, the five spectrometers showed promising accuracy in identifying falsified and genuine medicines. Inspectors’ difficulties in reading and interpreting colour barcodes with the single-use PAD may explain its lower accuracy compared with the other devices in the evaluation pharmacy. Selection of the wrong reference libraries by inspectors was observed with all spectrometers, but these errors did not lead to final erroneous classifications of medicines, except in four cases with the NIR-S-G1. The findings also suggest that policy makers wishing to implement devices in PMS should be aware that overconfidence in devices risks harm by reducing inspectors’ investment in visual inspection.

Of the six devices studied, five were spectrometers, and there were no significant differences between them in the performance outcomes measured in EP or SSM inspections in our limited data set. Inspections of the EP with all spectrometers except the Progeny were more accurate than the initial visual inspection with no device. Most spectrometers, except the NIR-S-G1, were felt to be too heavy and cumbersome for pharmacy inspections. In one inspection, the MicroPHAZIR RX was used resting on the bench of the simulated pharmacy rather than as a ‘handheld’ device. The handheld NIR-S-G1 was perceived as light, handy and user-friendly, and thus suitable for routine inspections at any level of the supply chain. The TruScan RM, Progeny (using the ‘application’ function), MicroPHAZIR RX and NIR-S-G1 all gave a simple ‘pass/fail’ result, a feature appreciated by the medicine inspectors. The ‘matching’ values given by the Progeny (using the ‘analyse’ function) and the 4500a FTIR gave reassurance in the results.

Except for the 4500a FTIR, the spectrometers require the user to select the correct reference library entry for comparison with the tested spectrum. Of the samples tested with the NIR-S-G1 for which the inspector selected the wrong reference library, almost one quarter were falsely classified. Where the wrong reference libraries were selected with the TruScan RM, the Progeny, or the MicroPHAZIR RX, the inspector’s overall classification of the sample as suspicious or not suspicious was not compromised. Indeed, with the MicroPHAZIR RX and Progeny, the mistakes were all recognised by the inspectors, who repeated the analysis. In all cases with the TruScan RM, the library selection errors were not recognised by the inspectors, but the device gave the correct result and the samples were accurately classified overall. There appeared to be a lack of awareness that different brands of the same API may contain different excipients, requiring different reference libraries. Indeed, in all cases the wrong ‘brand’ library entries selected by the inspectors were those of medicines containing the same API(s) at the same strength as the tested medicine, such as selecting Sulfatrim instead of Vactrim (both containing SMTM).
In some instances, the result shown by the device was that which would be expected had the correct library been selected. For example, in the case of Sulfatrim vs Vactrim, the device gave a ‘pass’ to the Vactrim tablet being tested against the Sulfatrim reference library. It is likely that in some cases different brands with similar API and excipient compositions led to a correct overall classification of the samples because the medicines are chemically similar. Although little is known about the variability of response of portable spectrometers to different brands of the same API(s), our findings suggest that the Raman devices may be less susceptible to formulation-specific signature variations than the NIR devices [22]. Improving the reference library selection function in the NIR-S-G1, and using the inbuilt barcode reader (featured in the TruScan RM, MicroPHAZIR RX, and Progeny), are likely to reduce the risk of wrong library selection and thus the number of incorrect results.

The MicroPHAZIR RX and the NIR-S-G1 correctly classified all the 50% API medicines tested, whereas the other spectrometers correctly classified none to fewer than half of them. Identification with spectrometers of substandard samples containing between 70% and 90% of the stated API, and of those containing more than the upper specification limit, as commonly found in field surveys [23,24], should be further explored. Whilst the spectrometers were expected to give information on the dosage formulation of the tested samples, the PADs were designed to indicate whether specific API(s) or excipients were present, not to identify samples containing less than the stated amount of the expected API. Hence, caution is needed in interpreting their comparison with the spectrometers.

The PADs required the user to make a subjective judgement on the visual likeness of the test sample result to the reference result. This is likely to have contributed to the significantly higher number of samples wrongly categorised by the medicine inspectors compared with the other devices in the EP inspections. Problems with colour interpretation were also observed for the artemisinin derivative test (ADT), despite the high level of confidence in the results expressed by the laboratory technicians newly trained to use the ADT [25]. These problems, and issues of colour blindness, are likely to be greatly helped by automated smartphone interpretation with image analysis software such as ImageJ [26-28]; a hypothetical sketch of this kind of automated interpretation is given below.
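As an illustration of such automated interpretation (not the ImageJ workflow of the cited studies), lanes of a scanned PAD could be compared with a reference card by mean colour. The lane geometry, threshold, and file names below are assumptions made for illustration only.

```python
# Hypothetical sketch: flag PAD lanes whose mean colour deviates from the
# reference card by more than a Euclidean RGB distance threshold.
import numpy as np
from PIL import Image

def lane_mean_colors(card_path: str, n_lanes: int) -> np.ndarray:
    """Split the card image into vertical lanes and return mean RGB per lane."""
    img = np.asarray(Image.open(card_path).convert("RGB"), dtype=float)
    lanes = np.array_split(img, n_lanes, axis=1)
    return np.stack([lane.reshape(-1, 3).mean(axis=0) for lane in lanes])

def flag_suspicious(test_card: str, reference_card: str,
                    n_lanes: int = 12, max_distance: float = 40.0) -> list[int]:
    """Return indices of lanes differing from the reference by more than the
    (hypothetical) distance cut-off."""
    test = lane_mean_colors(test_card, n_lanes)
    ref = lane_mean_colors(reference_card, n_lanes)
    distances = np.linalg.norm(test - ref, axis=1)
    return [i for i, d in enumerate(distances) if d > max_distance]

# Example usage (assumed file names): flag_suspicious("test_pad.png", "reference_pad.png")
```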
In addition, although the inspectors were told to change the water used for the PAD before running a new sample if water contamination occurred, one inspector, with rudimentary training, did not. This supports the impression that the training given may have been insufficient. For the advantages of the PAD to be realised, training schemes with user proficiency testing, continuing education, certification and quality control will be necessary, as they will be for the spectrometers. Interestingly, although the PADs are not designed to identify samples with API content below specification, they identified three of four 50% API samples tested. Expert readers had previously identified 19/20 samples containing 40% of the stated amount of chloroquine as giving a ‘weak’ signal, compared with 18/18 full-strength formulations giving a ‘strong’ signal, but mixed results were observed in identifying samples with 70% of the stated API [29]. The PADs deserve more consideration as devices for semi-quantitative detection of substandard antimicrobials, which are probably important drivers of antimicrobial resistance. Indeed, samples containing low ceftriaxone concentrations were successfully identified with high sensitivity by a novice PAD user and ImageJ, showing promising results for identification of substandard samples containing less than 80% of the stated API [28]. Recently, the μPAD, a competitive enzyme assay on paper, has shown encouraging results in identifying falsified beta-lactams in the hands of five users unfamiliar with the device [30]. The semi-quantitative ChemoPADs, using features similar to the PAD and coupled with an image analyser, identified 11 of 20 substandard cisplatin samples found in Ethiopia [27].

As expected, devices requiring sample preparation and user data interpretation (4500a FTIR, PAD, Minilab) took the inspectors significantly longer per sample than those which do not. This was particularly pronounced for the PAD and the Minilab, but may be offset by their ability to run more than one sample concurrently. The NIR-S-G1 and MicroPHAZIR RX were the fastest devices for testing individual samples. At the time of the study, the NIR-S-G1 could not record sample details on the device and had no sample holder for unpackaged tablets. This contributed to its fast speed of analysis, but there may be inconsistency with tablets that are too small or do not sit flush against the sampling window.

The time spent on visual inspection in the EP was significantly shorter when using a device than in initial visual inspections alone, except for the 4500a FTIR (p = 0.061). Selection of suspicious samples by visual inspection may be key to identifying poor quality medicines, especially those with obvious defects such as discoloration or typographic errors [31]. Therefore, a reduction in visual inspection time may have negative consequences for finding suspicious medicine samples, and it is possible that device introduction could be counterproductive, depending on the prevalence of SF medicines that could be visually recognised. Instead of visually inspecting different blisters of the same brand, as in the inspection without a device, the inspectors appeared to test one packet of a brand with the device, taking that result as representative of all samples of that brand. This may be an artefact of the experimental set-up, as the EP was inspected without devices first. When questioned about why they chose not to do visual inspection, some inspectors replied that they would expect samples of the same brand of medicine in the same pharmacy to be from the same batch, and hence of identical quality. The paucity of visual inspection of samples could also be related to increased perceived time pressure to complete an extra task within the ‘normal’ pharmacy inspection time. Further work is needed to investigate these findings and the impact of device use on real-life inspection effectiveness.

Non-destructive testing of samples is preferable for pharmacy inspections [6]. The lack of budget to buy medicines to test, and the waste of samples for the pharmacy being inspected, were mentioned by medicine inspectors as pitfalls of destructive technologies. Even for non-destructive devices, testing can currently only be carried out through transparent packaging. Of the fourteen brands of the targeted EP medicines, ten were in opaque packaging and therefore had to be removed from the packaging (thus ‘destroyed’) prior to testing. Innovations in blister packs and packaging could facilitate accurate spectroscopic evaluation [4].
We have been unable to find discussion of the impact of different plastics on spectral acquisition. Spatially offset Raman spectroscopy (SORS) technology deserves investigation for scanning through opaque packaging [32,33]. Few have discussed training requirements for device users [34-39], and limited scientific evidence can be retrieved from studies that were not primarily designed for that purpose. All the inspectors in our study were able to successfully complete pharmacy inspection and sample set testing with the devices regardless of the training they had received. In our limited data set, inspectors with intensive training were not significantly more likely than those with less training to correctly classify the samples as suspicious or not.

Important limitations exist in our study; these, together with the difficulties encountered in performing such research, are listed below.

- For the spectrometers, only one unit of each device was evaluated. We therefore make no assessment of variability between different units of the same device type.
- Only six API/combinations of API, all antimicrobials and all sourced from one region, were evaluated. One API (DHAP) of the seven initially selected for investigation had to be removed from the analysis because poor quality samples were used in the construction of the device reference libraries.
- Only one parenteral formulation was investigated; all other samples were formulated as tablets. No testing of topical/liquid/capsule dosage forms was conducted.
- For laboratory-created spectrometer reference libraries (this does not include the NIR-S-G1, for which the developer created the reference library): manufacturer-set default values were used with no attempt to optimise these for the specific medicines tested, and there was limited consideration of batch-to-batch variability for field-collected medicines.
- The evaluation pharmacy included a small proportion of falsified medicines (3/~110 blisters stocked).
- Due to limited stock, some samples had exceeded their expiry date, and inspectors were specifically asked to overlook important normal cues for visual inspection (expiry date, inclusion on the national list of registered medicines, condition of packaging, storage conditions). Overlooking these cues during inspection of the evaluation pharmacy has limited resemblance to the inspectors’ standard practice.
- The field-study team did not receive any direct training from the manufacturers and followed protocols in a second language.
- The 4500a FTIR does not give a pass/fail result and requires interpretation. Bias may have been introduced into the measurement and comparison of effectiveness between the 4500a FTIR and the other devices because of incorrect instructions given to the inspectors, due to a misunderstanding by the trainer.
- Whenever possible, the EP stock consisted of complete blisters in original packaging. Medicine inspectors were encouraged to test samples through the blisters where appropriate; if needed, they were provided with already unpackaged samples. This was because of the limited number of samples available for the study in the EP, especially falsified medicines, and to preserve the complete blisters so as to avoid inspection bias introduced by progressively stocking more incomplete blisters in the pharmacy.
- To reduce recall bias among medicine inspectors inspecting the EP several times, some of the brands stocked in the pharmacy were changed between inspections and samples were moved to different places; the samples stocked in the pharmacy were thus not always consistent between inspections.
- Opinions from the inspectors were formed in the context of a ‘routine’ pharmacy inspection. Use of the devices in different contexts, e.g. by manufacturers, or in a basic laboratory such as might be found at provincial level, may have yielded different user opinions.
- Samples of parenteral artesunate powder were scanned with the TruScan RM and Progeny through the glass vials by the inspectors, although the reference library had been created by scanning through a replacement (plastic) packaging. These results were discarded from our analysis.
- We did not investigate whether the plastics of the primary packaging differed between the batches used for creation of the reference libraries and the tested samples. Samples were always stored in fridges away from light. Although this was not investigated in our work, we believe that the time between creation of the reference libraries ‘through the blister’ and the field study (6 months) was brief enough to minimise the risk of wrong conclusions due to degraded plastics.
- Some samples were found to be of poor quality by UPLC analysis, but the results were not available until after completion of the study. As a result, we did not have access to good reference library comparators, and it was decided to discard the results for the 13 affected samples.
- The number of inspections carried out with each device and the number of samples stocked, particularly of SF medicines, were limited.
- The devices used in this study may have evolved since the study was performed and their performance may have changed; the results of this study may not reflect these changes.
- A statistical interaction between ‘device’ and ‘inspector’ was not included in our statistical analyses, as this would have led to overparameterisation of the models; we included inspectors as a cluster-specific random effect.

As an independent public health investigation performed in a field setting, this exploratory study gives evidence on some aspects of the use of devices in the field, to facilitate MRAs’ decisions as to whether these new technologies are appropriate for screening diverse medicines in their countries. This article is part of a series of publications describing studies of multiple aspects of the use and implementation of devices in PMS, such as their costs, cost-effectiveness, and the barriers identified to their implementation (e.g. the difficulties of creating quality reference libraries for spectrometers) [7,40-42]. These should be considered when making decisions on the best devices to use in specific settings. Without further objective validation, device implementation should be cautious; the devices’ advantages, limitations, and cost-effectiveness [40] should be clearly understood and further investigated. However, the innovation of testing devices in an evaluation pharmacy holds promise for enhancing our understanding of their use between laboratory and real-life field scrutiny. With further work, such devices hold great promise to empower medicine inspectors globally.

Supporting information

UPLC confirmatory methods protocols.


Basic processes for qualitative spectral comparison and protocols for reference library creation. Text A. Illustration of the basic processes for qualitative spectral comparison. Fig A. Illustration of the process for reference library creation and spectral comparison analysis. Text B. The MicroPHAZIR RX spectrometer. Text C. The 4500a FTIR spectrometer. Text D. The Progeny spectrometer. Text E. The Truscan RM spectrometer. Text F. Difficulties encountered during reference library entries creation.


Instructing the trainers and inspectors training in the use of the devices.


Simulated Medicine Preparations. Text A. Details about simulated medicine preparation. Table A. Formulations of simulated medicine preparations.


Results of the evaluation by device.

Text A. 4500a FTIR Single Reflection. Table A. Results from evaluation pharmacy inspections with the 4500a FTIR by four inspectors. Table B. Results from sample set testing for the 4500a FTIR: AL and OFLO sample set tested twice by a total of 4 inspectors. Text B. MicroPHAZIR RX. Table A. Main errors made by the three inspectors during the evaluation pharmacy inspections with the MicroPHAZIR RX. Table B. Performance of the MicroPHAZIR RX during evaluation pharmacy inspections by three inspectors. Table C. Results from sample set testing with the MicroPHAZIR RX (SMTM, OFLO, and AL sample sets, each tested once by one inspector). Text C. Minilab. Table A. Results from Minilab testing of sample sets conducted by 3 FDQCC Lao technicians. Text D. NIR-S-G1 (Beta Version). Table A. Main errors made by four inspectors during the evaluation pharmacy inspections with the NIR-S-G1. Table B. Performance of the NIR-S-G1 during evaluation pharmacy inspections by four inspectors. Table C. Results from sample set testing with NIR-S-G1. Text E. Paper Analytical Devices (PAD). Table A. Performance of the PAD during evaluation pharmacy inspections by four inspectors. Figure A. Inspector record sheet (left) for an AZITH sample (in blue pen). Lane interpretation instructions for AZITH are given (right). Table B. Main errors made by four inspectors during the evaluation pharmacy inspections with the PAD. Table C. Results from sample set testing–Paper analytical devices. Text F. Progeny. Table A. Number of samples tested and scans performed using analyse or application functions during four inspections of the evaluation pharmacy with the Progeny. Table B. Performance of the Progeny during evaluation pharmacy inspections by four inspectors. Table C. Results from four sample sets tests (SMTM by two inspectors, OFLO and AL by one inspector each) with the Progeny. Text G. Truscan RM. Table A. Results from evaluation pharmacy inspections with Truscan RM by four inspectors. Table B. Performance of the Truscan RM during evaluation pharmacy inspections by four inspectors. Table C. Results from sample set testing with the Truscan RM.

Main characteristics and UPLC results of the medicines utilized in the evaluation.


User satisfaction questionnaire.


Outline of the focus group discussions (5 inspectors per group discussion).

Extract of the inspection record sheet for (A) Evaluation Pharmacy inspections and (B) Sample set of medicines inspections. Note that the record sheets were adapted for the PAD, Progeny, and 4500a FTIR, which are interpreted differently from the other devices (e.g. inspectors were asked to fill in which column of the PAD they read when interpreting the results). Table A. Evaluation pharmacy inspection. Table B. Sample set medicine inspections.

Time and motion study observer recording sheet for (A) Evaluation Pharmacy inspections and (B) Sample set of medicines inspections. Note that the record sheets were adapted for the Paper Analytical Devices. Table A. Evaluation pharmacy inspection. Table B. Sample set inspections.

Definition of the times measured in the evaluation pharmacy and sample set inspections.

Time spent inspecting the evaluation pharmacy by phase: (A) Wilcoxon rank-sum test results and (B) primary data. Table A. P-values of the Wilcoxon rank-sum test (times are not normally distributed) for the comparison between evaluation pharmacy inspections with the specified device and initial visual inspections. Table B. Time spent inspecting the evaluation pharmacy by phase, primary data.

Median (IQR) times (seconds) per sample per device in sample set testing and results of the mixed effects generalised linear regression model. Table A. Median (IQR) sampling, device testing and recording times (seconds) per sample per device in sample set testing.

Table B. Factors influencing the total time per sample [ln(total time)] in sample set testing—mixed effects generalised linear regression model (with inspectors and observers as random effects).

Factors influencing the wrong classification of samples in sample set testing—mixed effects logistic regression (with inspectors as cluster-specific random effects).

(PDF) Click here for additional data file. 8 Apr 2021 Dear Dr Caillet, Thank you very much for submitting your manuscript "A comparative field evaluation of six medicine quality screening devices in Laos" for consideration at PLOS Neglected Tropical Diseases. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments. The authors are advised to very carefully consider the reviewers' comments and suggestions if they decide to submit a revised manuscript for re-consideration. We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation. When you are ready to resubmit, please upload the following: [1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. [2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file). Important additional instructions are given below your reviewer comments. Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts. Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments. Sincerely, Thuy Le Associate Editor PLOS Neglected Tropical Diseases Ricardo Fujiwara Deputy Editor PLOS Neglected Tropical Diseases *********************** The authors are advised to very carefully consider the reviewers' comments and suggestions if they decide to submit a revised manuscript for re-consideration. Reviewer's Responses to Questions Key Review Criteria Required for Acceptance? As you describe the new analyses required for acceptance, please consider the following: Methods -Are the objectives of the study clearly articulated with a clear testable hypothesis stated? -Is the study design appropriate to address the stated objectives? -Is the population clearly described and appropriate for the hypothesis being tested? -Is the sample size sufficient to ensure adequate power to address the hypothesis being tested? -Were correct statistical analysis used to support conclusions? -Are there concerns about ethical or regulatory requirements being met? Reviewer #1: The methods are well applied and the study design was appropriate. However, several general comments regarding the method may be pointed out: 1. The different devices are compared in the same way, which is, in my opinion, a misleading representation of their applicability. 
Indeed, the devices do not provide the same kind of information and should be used in different situations. For example, the PADs only provide information on the presence (only qualitative and not quantitative information) of a specific API and a few common excipients. This device cannot, and is not intended to (at least to the extent of my knowledge), identify specific brands or detecting substandards. Therefore, I think that the devices should be compared regarding their intended use. This should be clarified and maybe specified for each device. 2. In my opinion, the different results regarding the performances should be counterbalance by the advantages and drawback of the different techniques. Indeed, PADs have a poorer capacity at detecting falsified medicines but there is no calibration phase and their price is very low compared to most Raman and NIR devices. This may also be part of the final decision for the choice of a device. 3. An important information is missing regarding the calibration and comparison algorithms used with the spectroscopic devices. This information may possibly be found elsewhere but it is not clear to the reader where to find it. However, this information is crucial when comparing the performances of the devices since a very wide panel of algorithms and calibration strategies may be envisaged. In addition, this calibration phase and the afferent difficulties should be emphasized since it constitutes a major limitation to the use of spectroscopic techniques. 4. Have the statistical models (used to transform the spectral information into a PASS/FAIL result) been validated? If yes, this information together with the validation results should be provided. Reviewer #2: The objectives were clearly articulated and the pilot study design was appropriate. The sample size should have been larger to increase statistical power. Also the authors do not describe sufficiently and clearly how the 'simulated' medicine samples were prepared and verified against their original products especially in term of their packaging and labeling. The study should have included more dosage forms, including tablet, capsule, injectable, suspension, etc. Overall, the study method is quite creative! Reviewer #3: Are the objectives of the study clearly articulated with a clear testable hypothesis stated? 5/10 -Is the study design appropriate to address the stated objectives? 5/10 -Is the population clearly described and appropriate for the hypothesis being tested? N/A -Is the sample size sufficient to ensure adequate power to address the hypothesis being tested? N/A -Were correct statistical analysis used to support conclusions? 4/10 -Are there concerns about ethical or regulatory requirements being met? N/A -------------------- Results -Does the analysis presented match the analysis plan? -Are the results clearly and completely presented? -Are the figures (Tables, Images) of sufficient quality for clarity? Reviewer #1: _Table 1: The microphazir is not a FT based NIR device but it rather use a Hadamard transform. It is therefore more comparable to the NIR-S-G1 than to a FT instrument (for more information see doi: 10.1177/0003702818809719) _The authors should add a table of abbreviations Reviewer #2: Results were clearly presented with appropriate graphical presentations Reviewer #3: - Does the analysis presented match the analysis plan? 5/10 -Are the results clearly and completely presented? 7/10 -Are the figures (Tables, Images) of sufficient quality for clarity? 
--------------------

Conclusions
- Are the conclusions supported by the data presented?
- Are the limitations of analysis clearly described?
- Do the authors discuss how these data can be helpful to advance our understanding of the topic under study?
- Is public health relevance addressed?

Reviewer #1:
- A supplementary limitation that should be listed is the fact that the devices tested may not correspond to the devices currently available, owing to software/hardware evolution.
- Line 144: a risk when analysing tablets through the blisters is the impact of the latter on the final decision. Indeed, in the reviewer's experience, the same brand of medicine may be blistered in plastics that exhibit different spectral signatures depending on the batch or supplier. In addition, degradation of the blister (e.g. due to exposure to UV light) may distort the spectral signature and lead to wrong conclusions.
- Table 2: In my opinion, it is risky to use laboratory-made tablets to calibrate an NIR device and then use this calibration set to analyse industrially manufactured tablets. The differences in excipient origin and particle size, but also in tablet shape, compression force, etc., will have an impact on the spectrum. This may lead to over-optimistic results, because the detection of the "falsified" or "substandard" tablet may be linked to these differences rather than to the API strength. This kind of information should be present in the validation of the methods.
- The authors should discuss the performance of the techniques on mono- versus multi-component formulations. Indeed, it is very hard for spectroscopic techniques to detect the absence of artemether in a lumefantrine/artemether formulation, whereas it is easier for colorimetric techniques.

Reviewer #2: The conclusions drawn from this pilot study emphasise the need for further investigations and studies in real-life settings, and proper validation of the technologies prior to their deployment for post-market surveillance by the regulatory agencies. The limitations of this pilot study are clearly explained. The study is very relevant to the current public health issues of substandard and falsified medicines that need to be addressed.

Reviewer #3:
- Are the conclusions supported by the data presented? 2/10
- Are the limitations of analysis clearly described? 8/10
- Do the authors discuss how these data can be helpful to advance our understanding of the topic under study? 8/10
- Is public health relevance addressed? 6/10

--------------------

Editorial and Data Presentation Modifications?

Use this section for editorial suggestions as well as relatively minor modifications of existing data that would enhance clarity. If the only modifications needed are minor and/or editorial, you may wish to recommend "Minor Revision" or "Accept".

Reviewer #1: Line 34: please specify "near infrared" for the PHAZIR and NIR-S-G1 to avoid confusion with the "mid infrared" 4500a device at line 35.

Reviewer #2: No

Reviewer #3: Please see comments to the authors

--------------------

Summary and General Comments

Use this section to provide overall comments, discuss strengths/weaknesses of the study, novelty, significance, general execution and scholarship. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. If requesting major revision, please articulate the new experiments that are needed.

Reviewer #1: The paper submitted by Caillet et al.
describes the comparison of six devices for screening the quality of medicines. The paper is well written and easy to read despite the large amount of data and results. It is one of the first papers to attempt an objective evaluation and comparison of screening devices in real (or near-realistic) conditions; this should be emphasised, and was well appreciated by the reviewer. The paper deserves publication in PLOS NTD, since it provides very useful information regarding the acceptability of these techniques by field inspectors. This kind of information is crucial when developing new devices or methods, to ensure their correct application and acceptance by end-users. Another interesting finding is the impact that the use of screening devices has on the usual visual inspection, potentially leading to sloppier inspection. Indeed, screening devices should be considered a supplementary tool to help inspectors detect falsifications, not the ultimate solution.

Reviewer #2: (No Response)

Reviewer #3: The objective of this original work is to evaluate the usefulness and ease of use of six portable screening devices in the hands of Lao medicine inspectors during an inspection of a simulated evaluation pharmacy. This is a topical issue in the fight against the use of counterfeit drugs in non-industrialised countries, affecting both human and animal health. The text is well written, the summary tables are clear, and even the limitations of the study are extremely well presented. No concerns about the form.

I wonder about the choices the authors made to support decisions on the use of one control device or another. First, I would like to come back to the two objectives: usefulness and ease of use. Could the term 'usefulness' be defined? If I understood correctly, the objective is still to discern the 'true' from the 'false' for drugs, in terms of sensitivity, specificity and/or false positives, etc. To summarise: are we evaluating the 'machine', the 'assistant', or both? If I understand correctly, the last option was retained. Then the question arises of the assistants, and more exactly of their training in terms of calibration (intra- and inter-examiner variability), as recommended by the WHO. It is impossible to evaluate the performance of the devices if the evaluation is biased from the beginning by not placing oneself in adequate human-resources conditions, unless one adopts a pragmatic rather than explanatory logic, which is not the case in this study.

The statistical analysis can be improved. We have a 'machine' effect, an 'examiner' effect (which varies within and between examiners) and a 'machine' x 'examiner' interaction effect. The time spent on each task is measured very precisely, yet it is not in itself part of a prioritised evaluation grid (absent from the materials and methods). In the same way, approaching the perception or satisfaction of users in the manner of a 'QALY study' is not logical; here again, a robust evaluation questionnaire should have been prepared beforehand.

Concerning the discussion and the conclusion: if I were a public health decision-maker, I would remain uncertain. The logic, as in any comparative trial, would be to prioritise the products or groups of products and produce operational recommendations; otherwise, we miss our target. My advice, but it is only advice, would be to restructure the article. The priority would be: which, in terms of sensitivity, specificity, etc., are the best or worst devices?
This should be done by adopting a pragmatic logic, i.e. a standard, 'basic' level of training, describing the nature of the training in a summary figure or table, and therefore taking the 'time' and 'perception' criteria out of the analysis to arrive at objective, more or less hierarchical 'machine' recommendations. The time factor could then be a secondary outcome or... a second article.

--------------------

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: Yes: Souly Phanouvong
Reviewer #3: No

Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc. For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility: To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Submitted filename: PNTD-D-20-02193_reviewer-sp.pdf

17 Jun 2021

Submitted filename: Answer to reviewers - First revision ADB article 3.docx

23 Jul 2021

Dear Dr Caillet,

We are pleased to inform you that your manuscript 'A comparative field evaluation of six medicine quality screening devices in Laos' has been provisionally accepted for publication in PLOS Neglected Tropical Diseases.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email. A member of our team will be in touch with a set of requests. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards.
Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript. Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be coordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Neglected Tropical Diseases.

Best regards,

Thuy Le, Associate Editor, PLOS Neglected Tropical Diseases
Ricardo Fujiwara, Deputy Editor, PLOS Neglected Tropical Diseases

***********************************************************

Reviewer's Responses to Questions

Key Review Criteria Required for Acceptance? As you describe the new analyses required for acceptance, please consider the following:

Methods
- Are the objectives of the study clearly articulated with a clear testable hypothesis stated?
- Is the study design appropriate to address the stated objectives?
- Is the population clearly described and appropriate for the hypothesis being tested?
- Is the sample size sufficient to ensure adequate power to address the hypothesis being tested?
- Were correct statistical analyses used to support conclusions?
- Are there concerns about ethical or regulatory requirements being met?

Reviewer #1: My comments are fully answered.

Reviewer #2: The authors have addressed my comments/suggestions provided on the earlier version satisfactorily. I have further comments on this revised version.

**********

Results
- Does the analysis presented match the analysis plan?
- Are the results clearly and completely presented?
- Are the figures (Tables, Images) of sufficient quality for clarity?

Reviewer #1: My comments are fully answered.

Reviewer #2: Results are well presented and figures are of sufficient quality for clarity.

**********

Conclusions
- Are the conclusions supported by the data presented?
- Are the limitations of analysis clearly described?
- Do the authors discuss how these data can be helpful to advance our understanding of the topic under study?
- Is public health relevance addressed?

Reviewer #1: My comments are fully answered.

Reviewer #2: Conclusions are supported by the data generated from the study. The limitations of the study are also well described. The study results would fairly help policy makers, regulatory officials, programme managers and public health personnel better understand the capabilities and limitations of the six field detection technologies, and thus inform their decisions in selecting which one(s) to use.

**********

Editorial and Data Presentation Modifications?

Use this section for editorial suggestions as well as relatively minor modifications of existing data that would enhance clarity. If the only modifications needed are minor and/or editorial, you may wish to recommend "Minor Revision" or "Accept".

Reviewer #1: My comments are fully answered.

Reviewer #2: No comment.

**********

Summary and General Comments

Use this section to provide overall comments, discuss strengths/weaknesses of the study, novelty, significance, general execution and scholarship. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. If requesting major revision, please articulate the new experiments that are needed.
Reviewer #1: The paper may be published in its revised version.

Reviewer #2: Overall, the study findings would contribute to the literature and to practices in the fight against substandard and falsified antimalarials and, potentially, other medicine classes.

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

23 Aug 2021

Dear Dr Caillet,

We are delighted to inform you that your manuscript, "A comparative field evaluation of six medicine quality screening devices in Laos," has been formally accepted for publication in PLOS Neglected Tropical Diseases.

We have now passed your article on to the PLOS Production Department, which will complete the rest of the publication process. All authors will receive a confirmation email upon publication.

The corresponding author will soon receive a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any scientific or typesetting errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Note: proofs for Front Matter articles (Editorial, Viewpoint, Symposium, Review, etc.) are generated on a different schedule and may not be made available as quickly.

Soon after your final files are uploaded, the early version of your manuscript will be published online, unless you opted out of this process. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Neglected Tropical Diseases.

Best regards,

Shaden Kamhawi, co-Editor-in-Chief, PLOS Neglected Tropical Diseases
Paul Brindley, co-Editor-in-Chief, PLOS Neglected Tropical Diseases
Table 1

Main characteristics of the devices included in the study*.

| Device name | Manufacturer or institution | Market status | Technology and main specifications | Handheld | Cost^a |
| --- | --- | --- | --- | --- | --- |
| 4500a FTIR Single Reflection Spectrometer | Agilent Technologies [9] | M | FTIR-MIR; spectral range 4,000–650 cm⁻¹ | N | US$31,067 |
| Minilab^b | Global Pharma Health Fund E.V. [10] | M | TLC, disintegration test | N | US$2,510 (without reference standards) |
| MicroPHAZIR RX analyser | ThermoFisher Scientific [11] | M | NIR, dispersive; wavelength range 1,600–2,400 nm | Y | US$47,500 |
| NIR-S-G1 Spectrometer | Young Green Energy / InnoSpectra^c [12,13] (Global Good Fund developed the smartphone application) | M^d | NIR, dispersive; wavelength range 900–1,700 nm | Y | US$1,199 (without smartphone) |
| Paper Analytical Device | University of Notre Dame [14] and Veripad [15] (Kenya, New York and Boston) | D | Paper-based colour test | Y (S) | US$3 |
| Progeny Spectrometer | Rigaku [16] | M | Raman, 1,064 nm laser | Y | (ex-demo model) |
| TruScan RM Spectrometer | ThermoFisher Scientific [17] | M | Raman, 785 nm laser | Y | US$62,500 (including chemometric software package and tablet holder) |

*Rapid Diagnostic Tests (RDT) and single-use immunoassay devices were deemed field-suitable in the laboratory evaluation work [7], but could not be evaluated in the present study because the developers of the single-use immunoassay test were unable to supply sufficient samples of the devices within the timeframe of the project.

D, Under development; FTIR, Fourier Transform Infrared; M, Marketed; MIR, Mid-Infrared; MS, Mass spectrometry; N, No; NIR, Near infrared; S, Single-use device; TLC, Thin-layer chromatography; Y, Yes.

a The costs reported here do not include VAT, which may vary by country of purchase. Ordering several devices from a manufacturer may attract a reduced purchase cost.

b Unlike other devices, the Minilab was evaluated by laboratory technicians involved in current routine quality control at the National Center for Food and Drug Analysis.

c At the time of the study the NIR unit was produced by Young Green Energy. It is now produced by InnoSpectra Corporation.

d The near-infrared sampling unit is marketed, but the smartphone application is not.

Table 2

Sample sets of medicines initially included in sample set testing.

| API | Study code | Brand name | Quality type and origin of the medicines |
| --- | --- | --- | --- |
| SMTM | SPS20† | Sulfatrim | G, field-collected |
| SMTM | SPS21† | Sulfatrim | G, field-collected |
| SMTM | SPS16 | Diabeta 250 | F, look-alike¥, field-collected |
| SMTM | SPS03 | N/A* | 100% API, simulated medicine |
| SMTM | SPS04 | N/A* | 50% API, simulated medicine |
| SMTM | SPS02 | N/A* | 0% API, simulated medicine |
| AL | SPS06† | IPCA | G, field-collected |
| AL | SPS07† | IPCA | F, field-collected |
| AL | SPS22† | Coartem | G, field-collected |
| AL | SPS09 | Coartem | G, field-collected |
| AL | SPS10 | Coartem | F, field-collected |
| AL | SPS11 | Coartem | F, field-collected |
| OFLO | SPS14 | Oflocee | G, field-collected |
| OFLO | SPS15 | Ofloxacin | G, field-collected |
| OFLO | SPS13 | Di-Flo | G, field-collected |
| OFLO | SPS05 | N/A* | 100% API, simulated medicine |
| OFLO | SPS01 | N/A* | 50% API, simulated medicine |
| OFLO | SPS02 | N/A* | 0% API, simulated medicine |

The test sample (SPS22) and the reference library samples (n = 4: SPS20, SPS21, SPS06, SPS07) that were subsequently discarded from the results because of unexpected out-of-specification API content on UPLC analysis are marked with †. These samples were still used for the PAD evaluation, as the PAD references are independent reference pictures provided by the device developers.

G: genuine; F: falsified.

*Simulated sample

¥ 'Look-alike' medicines are defined as medicines labelled as containing a specific API (not one of the seven APIs included in this study) whose tablets are visually indistinguishable from the genuine medicines, mimicking a falsified medicine with the wrong API. The actual medicine was Diabeta (chlorpropamide), but the tablets looked identical to Sulfatrim (SMTM) [21].

Table 3

Pairwise comparisons of the median total time taken per sample in sample set testing.

| | 4500a FTIR | MicroPHAZIR RX | Minilab | NIR-S-G1 | PAD | Progeny | Truscan RM |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 4500a FTIR | - | <0.001 | <0.001 | <0.001 | <0.001 | 0.004 | 0.009 |
| MicroPHAZIR RX | - | - | <0.001 | <0.001 | <0.001 | <0.001 | 0.002 |
| Minilab | - | - | - | <0.001 | <0.001 | <0.001 | <0.001 |
| NIR-S-G1 | - | - | - | - | <0.001 | <0.001 | <0.001 |
| PAD | - | - | - | - | - | <0.001 | <0.001 |
| Progeny | - | - | - | - | - | - | 0.51 |
| Truscan RM | - | - | - | - | - | - | - |
| Median total time per sample (IQR), sec | 316 (206–373) | 134 (98–170) | 2,063 (1,766–2,920) | 94 (61–112) | 620 (562–716) | 273 (163–302) | 148 (109–299) |

P-values from the mixed-effects generalised linear regression model of ln(total time), adjusted for device and training and clustered by inspector and observer, are presented.
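For readers who want to run a similar analysis on their own timing data, the model in this footnote can be sketched in a few lines. The minimal Python sketch below is not the authors' code; the file and column names (timings.csv, total_time_sec, device, training, inspector) are hypothetical. It fits a linear mixed model to ln(total time) with a random intercept per inspector; the paper additionally clusters by observer, which would require crossed random effects.

```python
# Minimal sketch of the ln(total time) mixed-effects model described above.
# Assumes a hypothetical CSV with one row per tested sample.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("timings.csv")                  # hypothetical input file
df["ln_time"] = np.log(df["total_time_sec"])     # model ln(total time), as in the footnote

# Fixed effects: device and training; random intercept: inspector.
# (The paper also clusters by observer, which would need crossed random effects.)
model = smf.mixedlm("ln_time ~ C(device) + C(training)", df, groups=df["inspector"])
result = model.fit()
print(result.summary())
```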

Table 4

Pairwise comparisons of the percentage of samples wrongly classified, out of the total samples tested with each device over all evaluation pharmacy inspections.

| | 4500a FTIR | MicroPHAZIR RX | NIR-S-G1 | PAD | Progeny^b | Truscan RM^b |
| --- | --- | --- | --- | --- | --- | --- |
| 4500a FTIR | - | 0.103 | 1.000 | 0.014* | 1.000 | 0.242 |
| MicroPHAZIR RX | - | - | 0.243 | <0.001*** | 0.167 | N/A^a |
| NIR-S-G1 | - | - | - | 0.005** | 1.000 | 0.269 |
| PAD | - | - | - | - | 0.023* | <0.001*** |
| Progeny | - | - | - | - | - | 0.225 |
| Truscan RM | - | - | - | - | - | - |
| % samples wrongly classified (95% CI) | 9.7 (2.0–25.8) | 0 (0–10.3) | 7.7 (1.6–20.9) | 37.9 (20.7–57.7) | 8.3 (1.0–27.0) | 0 (0–13.2) |

P-values of the Fisher’s exact test are presented

* p<0.05

**p<0.01

***p<0.001

a Not applicable as no samples were wrongly categorised in inspections with the Truscan RM or MicroPHAZIR RX

b Artesunate samples were discarded from the analysis because inspectors scanned the samples through the glass vials, whereas the reference library had been created by scanning through replacement (plastic) packaging.
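To illustrate the tests behind this table, the sketch below reconstructs one pairwise Fisher's exact test (PAD vs NIR-S-G1) from the percentages shown, plus one Clopper-Pearson-style exact binomial 95% CI for the 4500a FTIR row. The 2x2 counts are approximate back-calculations from the reported percentages, not the authors' raw data.

```python
# Illustrative sketch: one pairwise Fisher's exact test and a binomial 95% CI.
# Counts are back-calculated from the percentages in Table 4 (approximate).
from scipy.stats import fisher_exact
from statsmodels.stats.proportion import proportion_confint

# Rows: device; columns: (wrongly classified, correctly classified).
table = [[11, 18],   # PAD: ~37.9% of 29 samples wrong (hypothetical reconstruction)
         [3, 36]]    # NIR-S-G1: ~7.7% of 39 samples wrong
odds_ratio, p_value = fisher_exact(table)
print(f"OR = {odds_ratio:.2f}, p = {p_value:.4f}")

# Clopper-Pearson ("exact") 95% CI, here for a hypothetical 3/31 wrong.
low, high = proportion_confint(3, 31, alpha=0.05, method="beta")
print(f"9.7% wrong, 95% CI {100*low:.1f}-{100*high:.1f}%")
```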

Table 5

Comparison of the number of samples incorrectly classified in evaluation pharmacy inspections with devices vs initial visual inspection without device.

| Device | Z | p-value | Median (IQR) number of samples wrongly classified with the device | Median (IQR) number of samples wrongly classified in initial inspection^$ |
| --- | --- | --- | --- | --- |
| 4500a FTIR | -1.980 | 0.048* | 1.0 (0.3–1.0) | 2.0 (1.0–2.3) |
| MicroPHAZIR RX | 2.638 | 0.008** | 0 (0–0) | 2.0 (1.0–2.3) |
| NIR-S-G1 | 1.980 | 0.048* | 1.0 (0.3–1.0) | 2.0 (1.0–2.3) |
| PAD | -0.480 | 0.631 | 2.0 (1.0–5.3) | 2.0 (0.8–2.3) |
| Progeny | 1.891 | 0.059 | 0 (0–1.5) | 1.5 (1.0–2.3) |
| Truscan RM | 2.814 | 0.005** | 0 (0–0) | 1.5 (1.0–2.3) |

Z statistic and p-value of the Wilcoxon rank sum test are presented

* p<0.05

**p<0.01

***p<0.001

$ The numbers of samples wrongly categorised in initial inspections without devices used in the comparisons vary, because we included only brands tested in initial inspections that each device was able to test (e.g. AL samples wrongly categorised during initial inspections were excluded for the PAD, as the PAD could not test samples containing AL). In both initial inspections without devices and inspections with devices, we excluded samples wrongly categorised from brands subsequently found to have reference library spectra obtained from poor-quality reference samples (as per UPLC analyses), except for the PAD.
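The device-versus-initial-inspection comparison can be reproduced with a standard rank-sum test, as in the minimal sketch below. The per-inspection counts are hypothetical and for illustration only; they are not the study data.

```python
# Minimal sketch of the rank-sum test used in Table 5 (hypothetical counts).
from scipy.stats import ranksums

wrong_with_device = [0, 0, 1, 1]        # wrongly classified per inspection, with device
wrong_initial_visual = [2, 1, 2, 3]     # wrongly classified in initial visual inspection
stat, p = ranksums(wrong_with_device, wrong_initial_visual)
print(f"Z = {stat:.3f}, p = {p:.3f}")
```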

Table 6

Matrix of pairwise comparisons of accuracy of devices in classifying samples incorrectly during sample set inspections (test device in row vs reference device in column).

| | 4500a FTIR | MicroPHAZIR RX | Minilab | NIR-S-G1 | PAD | Progeny | Truscan RM |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 4500a FTIR | - | 1.2 (0.1–25.0) | 0.5 (0.0–7.5) | 0.5 (0.0–5.8) | 0.5 (0.0–4.9) | 0.3 (0.0–3.5) | 0.6 (0.1–7.5) |
| MicroPHAZIR RX | - | - | 0.4 (0.0–6.5) | 0.4 (0.0–5.3) | 0.4 (0.0–3.8) | 0.2 (0.0–2.8) | 0.5 (0.0–6.1) |
| Minilab | - | - | - | 0.9 (0.1–10.2) | 0.9 (0.1–6.5) | 0.6 (0.1–4.2) | 1.3 (0.2–10.8) |
| NIR-S-G1 | - | - | - | - | 1.0 (0.1–6.9) | 0.6 (0.1–3.0) | 1.4 (0.2–10.9) |
| PAD | - | - | - | - | - | 0.4 (0.1–2.5) | 1.4 (0.3–7.0) |
| Progeny | - | - | - | - | - | - | 2.3 (0.4–13.0) |
| % samples wrongly classified (95% CI)^$ | 5.6 (0.1–27.3) | 7.7 (0.2–36.0) | 11.8 (1.5–36.4) | 11.1 (1.4–34.7) | 20.8 (7.1–42.2) | 23.5 (6.8–49.9) | 15.0 (3.2–37.9) |

Odds ratios (95% CI) from the mixed-effects logit model, adjusted for the type of training received (rudimentary or intensive) and sample set type (OFLO, AL, SMTM), and clustered by inspector, are presented.

$ 95% CI based on the binomial distribution.
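A mixed-effects logit of this kind can be approximated with statsmodels' Bayesian mixed GLM, as in the sketch below (not the authors' code; the data file and column names are hypothetical). Exponentiating the fixed-effect posterior means gives odds ratios comparable in spirit to those in the matrix.

```python
# Rough sketch of a mixed-effects logistic model for P(sample wrongly classified),
# with device, training and sample-set type as fixed effects and a random
# intercept per inspector. Column names are hypothetical.
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

df = pd.read_csv("sample_set_results.csv")   # hypothetical: one row per tested sample
model = BinomialBayesMixedGLM.from_formula(
    "wrong ~ C(device) + C(training) + C(sample_set)",  # wrong: 1 if misclassified
    {"inspector": "0 + C(inspector)"},                  # random intercept per inspector
    df,
)
result = model.fit_vb()      # variational Bayes fit
print(result.summary())      # exponentiate fixed-effect means to obtain odds ratios
```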

Table 7

Observed user errors during EP and SSM inspections.

| Device | Wrong reference library in EP: scans, % (n/N) | EP: samples, % (n/N) | EP: samples misclassified, % (n/N) | Wrong reference library in SSM: scans, % (n/N) | SSM: samples, % (n/N) | SSM: samples misclassified, % (n/N) | Other errors: description | Other errors: samples misclassified, % (n/N) | Comments |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4500a FTIR | N/A | N/A | N/A | N/A | N/A | N/A | 5.9% (3/51) of scans in EP and 3.0% (1/33) in SSM were not renamed after acquisition in the device memory | 0 | Samples were recorded on paper by the inspectors, so errors did not result in sample misclassification, but could affect traceability in practice |
| MicroPHAZIR RX* | 3.9% (2/51) | 5.9% (2/34) | 0.0% (0/34) | 0.0% (0/33) | 0.0% (0/13) | 0.0% (0/13)$ | 5.9% (3/51) of scans in EP performed with tablets not inserted in the sample cover | 0$ | All errors made by the inspector with rudimentary training |
| NIR-S-G1 | 27.0% (17/63) | 28.2% (11/39) | 5.1% (2/39) | 27.8% (10/36) | 33.3% (6/18) | 11.1% (2/18) | None | None | |
| PAD | N/A | N/A | N/A | N/A | N/A | N/A | Reading-result errors for 24.1% (7/29) of samples tested in EP and 20.8% (5/24) in SSM | 6.9% (2/29) in EP and 16.7% (4/24) in SSM | In some cases the PAD showed wrong colours and the user also made an error of interpretation, leading to an overall correct classification (more details in S5 Text, Results of the evaluation by device) |
| PAD | | | | | | | None of the failing samples were rerun, despite clear instructions to rerun suspicious samples | Uncertain | |
| PAD | | | | | | | Use of the same visibly contaminated water for multiple PADs during one EP inspection (inspector with rudimentary training) | Uncertain | |
| Progeny¥ | 7.5% (4/53) | 13.3% (4/30) | 0.0% (0/30)$ | 0.0% (0/21) | 0.0% (0/13) | 0.0% | Deviation from study protocol: one inspector did not run the 'Application' test after running the 'Analyse' function | 0 | |
| Truscan RM | 20.0% (9/45) | 19.4% (6/31) | 0.0% (0/31) | 25.8% (8/31) | 20.0% (5/20) | 0.0% | None | None | Inspectors did not recognise that they had selected the wrong library entry, but the device returned the correct result |

* Only three inspections were performed with the MicroPHAZIR RX, because the results of one inspection were discarded owing to an issue with the inbuilt reference library.

$ Errors were recognised by the inspectors, who re-tested the samples without mistakes.

¥ Wrong selection of the reference library can happen only with the 'Application' function. With the 'Analyse' function, no user errors were observed in either EP or SSM inspections.

References: 15 in total

1.  The quality of drugs in private pharmacies in Lao PDR: a repeat study in 1997 and 1999.

Authors:  Lamphone Syhakhang; Cecilia Stålsby Lundborg; Björn Lindgren; Göran Tomson
Journal:  Pharm World Sci       Date:  2004-12

2.  Proficiency testing as a tool to assess the performance of visual TLC quantitation estimates.

Authors:  Peter Risha; Zera Msuya; Margareth Ndomondo-Sigonda; Thomas Layloff
Journal:  J AOAC Int       Date:  2006 Sep-Oct       Impact factor: 1.913

3.  Quality Assessment of 7 Cardiovascular Drugs in 10 Sub-Saharan Countries: The SEVEN Study.

Authors:  Marie Antignac; Bara Ibrahima Diop; Bernard Do; Roland N'Guetta; Ibrahim Ali Toure; Patrick Zabsonre; Xavier Jouven
Journal:  JAMA Cardiol       Date:  2017-02-01       Impact factor: 14.676

4.  Paper-Based Enzyme Competition Assay for Detecting Falsified β-Lactam Antibiotics.

Authors:  Katherine E Boehle; Cody S Carrell; Joseph Caraway; Charles S Henry
Journal:  ACS Sens       Date:  2018-06-26       Impact factor: 7.711

5.  Population awareness of risks related to medicinal product use in Vientiane Capital, Lao PDR: a cross-sectional study for public health improvement in low and middle income countries.

Authors:  Céline Caillet; Chanvilay Sichanh; Lamphone Syhakhang; Cyrille Delpierre; Chanthanom Manithip; Mayfong Mayxay; Maryse Lapeyre-Mestre; Paul N Newton; Anne Roussin
Journal:  BMC Public Health       Date:  2015-06-27       Impact factor: 3.295

6.  Role of Medicines of Unknown Identity in Adverse Drug Reaction-Related Hospitalizations in Developing Countries: Evidence from a Cross-Sectional Study in a Teaching Hospital in the Lao People's Democratic Republic.

Authors:  Céline Caillet; Chanvilay Sichanh; Gaëtan Assemat; Myriam Malet-Martino; Agnès Sommet; Haleh Bagheri; Noudy Sengxeu; Niphonh Mongkhonmath; Mayfong Mayxay; Lamphone Syhakhang; Maryse Lapeyre-Mestre; Paul N Newton; Anne Roussin
Journal:  Drug Saf       Date:  2017-09       Impact factor: 5.606

7.  Field detection devices for screening the quality of medicines: a systematic review.

Authors:  Serena Vickers; Matthew Bernier; Stephen Zambrzycki; Facundo M Fernandez; Paul N Newton; Céline Caillet
Journal:  BMJ Glob Health       Date:  2018-08-29

8.  A random survey of the prevalence of falsified and substandard antibiotics in the Lao PDR.

Authors:  Patricia Tabernero; Isabel Swamidoss; Mayfong Mayxay; Maniphone Khanthavong; Chindaphone Phonlavong; Chanthala Vilayhong; Sengchanh Yeuchaixiong; Chanvilay Sichanh; Sivong Sengaloundeth; Michael D Green; Paul N Newton
Journal:  J Antimicrob Chemother       Date:  2019-08-01       Impact factor: 5.790

9.  Global landscape assessment of screening technologies for medicine quality assurance: stakeholder perceptions and practices from ten countries.

Authors:  Lukas Roth; Ameena Nalim; Beth Turesson; Laura Krech
Journal:  Global Health       Date:  2018-04-25       Impact factor: 4.185

10.  Substandard Cisplatin Found While Screening the Quality of Anticancer Drugs From Addis Ababa, Ethiopia.

Authors:  Madeline S Eberle; Ayenew Ashenef; Heran Gerba; Patrick J Loehrer; Marya Lieberman
Journal:  JCO Glob Oncol       Date:  2020-03
Cited by: 1 in total

1.  An open-source smartphone app for the quantitative evaluation of thin-layer chromatographic analyses in medicine quality screening.

Authors:  Cathrin Hauk; Mark Boss; Julia Gabel; Simon Schäfermann; Hendrik P A Lensch; Lutz Heide
Journal:  Sci Rep       Date:  2022-08-04       Impact factor: 4.996

