
An automated alarm system for food safety by using electronic invoices.

Wan-Tzu Chang^1,2, Yen-Po Yeh^3,4, Hong-Yi Wu², Yu-Fen Lin³, Thai Son Dinh², Ie-Bin Lian^1,2.

Abstract

BACKGROUND: Invoices have been used in food product traceability; however, no study has addressed an automated alarm system for food safety that utilizes electronic invoice (e-invoice) big data. In this paper, we present an alarm system for edible oil manufacture that can help prevent a food safety crisis rather than trace problematic sources after a crisis.
MATERIALS AND METHODS: Using nearly 100 million labeled e-invoices from 2013‒2014 of 595 edible oil manufacturers, provided by the Ministry of Finance, we applied text mining, statistical and machine learning techniques to "train" the system for two functions: (1) to sieve the edible oil-related e-invoices of manufacturers who may also produce other merchandise and (2) to identify suspicious edible oil manufacturers based on irrational transactions in the sieved e-invoices.
RESULTS: The system was able to (1) accurately sieve the correct invoices with sensitivity >95% and specificity >98% via text classification and (2) identify problematic manufacturers with 100% accuracy via the Random Forest machine learning method, as well as with sensitivity >70% and specificity >99% via a simple decision-tree method.
CONCLUSION: E-invoices have a bright future in food safety applications. They can be used not only for product traceability but also for the prevention of adverse events by flagging suspicious manufacturers. Compulsory use of e-invoices in food production can increase the accuracy of this alarm system.

Substances:
Oils

Year: 2020 PMID: 31978198 PMCID: PMC6980643 DOI: 10.1371/journal.pone.0228035

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

The emerging use of rapidly collected, complex data in unprecedented quantities is ushering the world into the era of big data [1]. Although big data has the potential to afford new insights, improve decision making and governance, and enhance the quality and efficiency of products and services, its application in the food safety domain is still limited [2]. Food safety data and information comprise structured and unstructured data from multiple sectors such as environment, animal, agriculture, food, public health, trade and economy. Previous efforts have explored the predictive power of big data in foodborne illness surveillance, environmental microbial contamination of crops, food safety violations and interpretation of genomic data for tracking and tracing foodborne illnesses [2]. In one recent application, the Chicago Department of Public Health (CDPH) [3] used data collected from routine food inspections of over 15,000 retail food establishments to train a machine learning model that produces risk scores, which CDPH uses to prioritize its inspection schedule.

In past decades, a variety of food fraud incidents have been reported in many countries. Such incidents have had a profound impact on public health and consumer confidence in the safety of food [4]. In response, one of the main focuses of food fraud prevention has been on novel prediction models of food fraud using a big data approach, which consider different factors from within and outside the food supply chain [5-8]. The Bayesian network of Marvin et al. [6], published in 2016, is the first modelling approach or system for food fraud detection that we could find. The approach uses multiple factors related to food safety to predict an increased likelihood of safety incidents so that they can be prevented. Bouzembrak et al. [7] developed a food fraud tool called MedISys-FF that collects, processes and presents food fraud reports published globally in the media, and uses text mining of the articles to facilitate the development of control measures and to detect food fraud. Another approach, the ISAR-Tool (Import Screening for the Anticipation of Food Risks) [8], facilitates a descriptive analysis of the food commodities listed in the national trade statistics and enables automated detection of unexpected changes in volumes and prices that may indicate food fraud. Owing to the lack of information on business-to-business transactions between manufacturers, most of the above are holistic approaches that utilize summarized statistics or reports.

In 2013 and 2014, several food fraud scandals broke out in Taiwan; in particular, contamination and mislabeling of cooking oil were discovered [9], including the use of recycled cooking oil and low-quality lard. Although all companies involved were convicted, these incidents caused controversy and severe damage to Taiwan’s reputation for food safety. Although many methods for identifying adulterated oils and fats have been developed, they have not been adopted as official methods internationally because of their complexity and limited applications [10]. The Taiwan Food and Drug Administration had developed analytical methods and the “Sanitation Standard for Edible Oils” [11] to assess whether oil has been adulterated. However, the investigations into these incidents detected very few substandard samples. It is fair to say that there are currently few efficient methods for detecting refined adulterated oils among edible oils [12]. In fact, the local government had long been suspicious of these problematic manufacturers. Several surprise inspections found that the manufacturers were able to manipulate the chemical components of edible oil discreetly so that the composition of fatty acids and other indices met all the criteria.
The convictions of these manufacturers were based not on the analytical detection of adulterants but on their irrational business transaction invoices, i.e., the amount of materials bought for edible oil was disproportionate to the amount of products sold, because the manufacturers frequently used materials that are illegal for food production. This result suggested that effective prevention of illegal oil adulteration may need both on-site inspections and appropriate source management. The outbreak of a major food safety incident can result in costly public panic and damage to goodwill [13].

To address loopholes in existing food safety and traceability rules, the central government of Taiwan passed a law requiring edible oil manufacturers to use electronic invoicing (e-invoice) starting from October 2014 to make transactions of oil merchandise traceable [14]. The Taiwan Ministry of Finance launched an e-invoice system in 2009, and the number of e-invoices issued has increased yearly since then [15]. In 2018, 7.2 billion business-to-business (B2B) e-invoices were issued and stored by the Fiscal Information Agency (FIA) of the Ministry of Finance.

Using invoices in food product traceability systems is not a new idea [16-17]. To establish a traceability system, all related records of each transaction (i.e., invoices, receiving and shipping papers, etc.) should be kept for at least five years, regardless of whether the records are in paper, electronic or other form. Assessing the consistency of these records and identifying unusual and inappropriate trends is time-consuming and demands experienced manpower, because it is a heavy burden to inspect the large amount of detailed information concerning the date of purchase or supply, the names of products, the quantity received or supplied, and the names and addresses of the suppliers or distributors. Utilizing e-invoice big data provides an opportunity to overcome this difficulty.
Early detection of food fraud incidents via warning signs of suspicious transactions is a plausible approach. However, to the authors’ knowledge, no literature has addressed automated alarm systems for food safety that utilize e-invoice big data. An efficient alarm system for edible oil manufacture must be able to (1) sieve the edible oil-related e-invoices of manufacturers who may also produce other merchandise and (2) identify suspicious edible oil manufacturers based on irrational transactions in the e-invoices sieved in (1). Accordingly, this study has two aims:

(1) Sieving invoices related to edible oil: the e-invoice big data provided by the FIA can be used to build a classifier, based on text mining and machine learning methods, that automatically and accurately sieves edible oil-related invoices for each manufacturer and for future invoices.
(2) Identifying manufacturers with suspicious monthly transactions: an efficient classifier to identify suspicious manufacturers based on the related e-invoices was developed.

The above two functions were integrated into an automatic alarm system based on SAS® Enterprise Miner™ 14.3 [18]. In this study, edible oil refers specifically to cooking oil, which is plant, animal, or synthetic fat used in frying, baking, and other types of cooking, as well as in salad dressings.

Methods

Data processing

A bilateral agreement was signed between the FIA and the authors' institute granting permission to access the e-invoice data and to analyze it under the supervision of the FIA. The FIA then provided data from 99,926,514 e-invoices transacted from January 1, 2014 to October 30, 2017 by 595 edible oil manufacturers. All registered edible oil manufacturers in Taiwan with a capital amount of over 300 million Taiwan dollars (roughly 1 million USD) have been required to use an e-invoice system for all their transactions since 2014 [19], and 595 manufacturers qualified. Each invoice provided the business identities of the vendor and the vendee, together with the date, amount of money, and item name. However, quantity was not always listed. Given the importance placed on the protection of privacy, the manufacturer names were encrypted. However, according to the list we provided, the FIA sorted the manufacturers into three categories: A, 21 benchmark manufacturers; B, six problematic manufacturers; and C, 568 unspecified manufacturers. The A-manufacturers were those commended by the Taiwan Food and Drug Administration before 2014 for their good reputation in high-quality food manufacturing, and the B-manufacturers were those with reported adverse food safety incidents or convictions in 2013‒2014. Classes A, B and C have 10,958,095, 44,590 and 88,923,829 invoices, respectively (99,926,514 in total), comprising 110,448, 10,717 and 970,715 items, respectively (over 1 million different items in total). The dataset used in this study is owned by the FIA and was analyzed inside the FIA data center. To access the data, researchers can contact the Taiwan FIA to apply for authorization.

Sieving out edible oil related invoice by text mining

We used the following steps to determine the optimal classifier for sieving out new e-invoices. The steps are also summarized in a flow chart.

Step 1. Summarizing merchandise names

The 99,926,514 e-invoices were summarized into 1,011,596 different transaction items, and among them, 29,942 items were sieved using the keyword “oil”. However, these were not necessarily the edible oil-related items of interest. For example, the Chinese item names of soybean sauce and some cosmetic products also contain the keyword “oil”, and these should be filtered out. Therefore, we sieved further in the next steps.
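The keyword sieve in Step 1 can be sketched as follows. This is only an illustration: the item names are hypothetical English stand-ins for the Chinese merchandise names, and the function name `sieve_by_keyword` is an assumption, not part of the authors' SAS workflow.

```python
# Hypothetical item names standing in for the real (Chinese) merchandise names.
ITEM_NAMES = [
    "soybean oil 18L",
    "palm oil drum",
    "cosmetic essential oil",
    "rice 30kg",
]


def sieve_by_keyword(names, keyword="oil"):
    """Return the item names that contain the keyword (case-insensitive)."""
    key = keyword.lower()
    return [name for name in names if key in name.lower()]


candidates = sieve_by_keyword(ITEM_NAMES)
# The cosmetic product passes the sieve even though it is not edible oil,
# which is why a labeling and classification step is still needed.
```

A plain substring match is cheap enough to run over millions of item names, but it keeps false positives such as the cosmetic product above.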

Step 2. Labeling

We manually inspected the 29,942 items and labeled them as 7,847 “edible oil-related” and 22,095 “non-edible oil-related” products.

Step 3. Text mining to identify keywords/topics

The text mining function in SAS Enterprise Miner automates the keyword/topic process using the following steps: it identifies keywords and their instances in the data using Natural Language Processing (NLP); eliminates malicious or irrelevant text; and clusters the keywords into m topics, where m is a preset parameter. In this study, we applied m = 30, 60, 90, 120, 150 and 180. The authors held several consultation meetings with experts, including the Director-general of the Changhua County Health Bureau, who has rich experience in dealing with food safety issues, and managing teams from several manufacturers. For comparison with the above process, in which the topics/keywords were selected by artificial intelligence (AI), we also conducted an expert knowledge-based process: researchers selected 60 keywords that they agreed could best distinguish edible oil-related items from others.
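As a minimal sketch of how a keyword list can be turned into classifier inputs, the snippet below maps each item name to a binary vector with one entry per keyword. The five keywords are hypothetical placeholders (the 60 expert-selected keywords are not listed in the paper), and this does not reproduce the topic clustering performed by SAS Enterprise Miner.

```python
# Hypothetical keywords; the paper's 60 expert-selected keywords are not given.
KEYWORDS = ["soybean", "palm", "lard", "sauce", "cosmetic"]


def keyword_features(item_name, keywords=KEYWORDS):
    """Return a 0/1 vector marking which keywords occur in the item name."""
    name = item_name.lower()
    return [1 if keyword in name else 0 for keyword in keywords]


items = ["Soybean oil 18L", "Soy sauce", "Cosmetic essential oil"]
feature_matrix = [keyword_features(item) for item in items]
```

Each row of `feature_matrix` can then be passed, together with its manual label, to any supervised classifier.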

Step 4. Supervised machine learning

We used the topics or keywords from the above step as features for the following popular supervised learning classifiers: k-nearest neighbor (KNN), support vector machine (SVM), neural network, logistic regression and random forest (RF). KNN classifies an item by examining the labels of its k nearest neighbors and taking a vote [20]. SVMs and neural networks tend to perform better when dealing with multiple dimensions and continuous features [21]. RF is an ensemble classifier that performs well compared with other traditional classifiers [22]. We also compared them with a built-in classifier in SAS® EM™ 14.3 called Text Rule Builder. This step was conducted both with and without a feature selection procedure for comparison. The data were randomly divided into training and testing sets at an 8:2 ratio, and the results in Tables 1 and 2 are based on the 20% testing data. The classifiers were modeled using a 5-fold cross-validation strategy. The respective sensitivity and specificity were calculated: sensitivity refers to the probability of identifying edible oil-related e-invoices, and specificity refers to the probability of identifying non-related ones.
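The evaluation described above can be sketched in plain Python. The split function and the toy labels are illustrative assumptions; label 1 stands for "edible oil-related" and 0 for "non-related".

```python
import random


def split_80_20(records, seed=0):
    """Shuffle a copy of the records and split it 8:2 into train and test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def evaluate(y_true, y_pred):
    """Compute sensitivity, specificity and error rate for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),      # share of related items found
        "specificity": tn / (tn + fp),      # share of non-related items rejected
        "error_rate": (fp + fn) / len(pairs),  # share of all items misclassified
    }


train, test = split_80_20(list(range(10)))

# Toy example: 4 related and 6 non-related items, one error of each kind.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = evaluate(y_true, y_pred)
```

In the toy example, three of the four related items are found and five of the six non-related items are rejected, so two of the ten items are misclassified.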

Table 1

Performance of various classifiers on identifying correct invoices using different choice of m (number of topics).

Topics (m)	Metric*	KNN (k = 5)	SVM (linear)	Logistic Regression	Neural Network	Random Forest
m:30	Sensitivity	91.6%	81.1%	81.2%	88.3%	93.8%
	Specificity	97.9%	95.7%	96.3%	97%	98.1%
	Error rate	3.8%	8.1%	7.7%	5.3%	3%
m:60	Sensitivity	92%	84.5%	85.1%	91.6%	95.4%
	Specificity	97.2%	95.8%	96.1%	95.2%	98.4%
	Error rate	4.1%	7.2%	6.8%	5.1%	2.4%
m:90	Sensitivity	91.8%	87.7%	89.3%	91.2%	95.2%
	Specificity	97.2%	96.2%	96.4%	96.8%	98.4%
	Error rate	4.2%	6%	5.4%	4.7%	2.4%
m:120	Sensitivity	92.3%	90.1%	91%	93.5%	95.2%
	Specificity	97.5%	96.2%	96.9%	97.1%	98.5%
	Error rate	3.9%	5.4%	4.6%	3.9%	2.4%
m:150	Sensitivity	92.4%	90.4%	91.4%	93.2%	95.2%
	Specificity	97.3%	96.2%	97%	97.1%	98.6%
	Error rate	4%	5.3%	4.4%	4%	2.3%
m:180	Sensitivity	94.1%	92.1%	93.1%	93.6%	95.2%
	Specificity	97.9%	96.7%	97.6%	97.9%	98.5%
	Error rate	3.1%	4.5%	3.6%	3.2%	2.4%

*Sensitivity: probability of identifying an edible oil-related e-invoice

Specificity: probability of identifying a non-related e-invoice

Error rate: overall proportion of misclassified e-invoices

Table 2

Performance of various classifiers on identifying correct invoices using customized keywords.

Custom	Accuracy	KNN (k = 5)	SVM (linear)	Logistic Regression	Neural Network	Random Forest
(no feature selection)m:60	sensitivity	91.7%	92.7%	93.2%	93.6%	93.9%
	specificity	97.9%	98.2%	98.4%	98.3%	98.1%
	error rate	3.8%	3.2%	3%	3%	3%
(feature selection)m:60→31	sensitivity	89.7%	89.8%	90.2%	91%	90.5%
	specificity	97.7%	97.7%	97.8%	97.8%	97.9%
	error rate	4.4%	4.4%	4.2%	4.4%	4.1%

*Sensitivity: probability of identifying an edible oil-related e-invoice

Specificity: probability of identifying a non-related e-invoice

Error rate: overall proportion of misclassified e-invoices

Step 5. Model comparison

The above steps for sieving related invoices were integrated in SAS® EM™ 14.3. The sensitivity and specificity of each classifier were used to determine:

(1) which m is a better selection;
(2) how the feature selection procedure would improve accuracy;
(3) which classifier has the best performance, and whether applying ensemble techniques to several classifiers would improve performance [23].

Identifying manufacturers with irrational monthly transaction

Using the edible oil-related e-invoices, we summed up the monthly amounts of purchase and sales for each manufacturer. Ideally, each of the 21 A-labeled and 6 B-labeled manufacturers would then have 46 monthly records of purchase and sales amounts. In consultation meetings, experts including the Director-general of the Changhua County Health Bureau and managing teams from several A-manufacturers discussed the proper features for classifying "rational" vs "irrational" transactions. Based on the longitudinal features of the purchase and sales amounts and functions of them, we applied supervised learning methods including KNN, SVM, neural network, logistic regression and RF, as well as a simple discriminant function, to build optimal classifiers based on their sensitivity and specificity. Sensitivity refers to the probability of identifying a suspicious B manufacturer, while specificity refers to the probability of identifying a benchmark A manufacturer. The same 5-fold cross-validation strategy as in process (1) was applied here.
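The monthly summation can be sketched as a simple group-by over invoice records. The record layout, manufacturer IDs and amounts below are hypothetical; the real e-invoice schema is not described at this level of detail.

```python
from collections import defaultdict

# Hypothetical records: (manufacturer_id, date "YYYY-MM-DD", role, amount),
# where role is "buy" for purchases and "sell" for sales.
INVOICES = [
    ("M001", "2014-01-05", "buy", 1000.0),
    ("M001", "2014-01-20", "sell", 1500.0),
    ("M001", "2014-02-03", "buy", 800.0),
    ("M002", "2014-01-11", "sell", 300.0),
]


def monthly_totals(invoices):
    """Sum purchase and sales amounts per (manufacturer, month)."""
    totals = defaultdict(lambda: {"buy": 0.0, "sell": 0.0})
    for manufacturer, date, role, amount in invoices:
        month = date[:7]  # keep "YYYY-MM"
        totals[(manufacturer, month)][role] += amount
    return dict(totals)


totals = monthly_totals(INVOICES)
```

Each key of `totals` corresponds to one manufacturer month record, the unit on which the classifiers in this section operate.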

Results

Performance on sieving out correct invoices

Table 1 shows the performance of various classifiers in identifying correct edible oil-related invoices using different choices of m (number of topics). The results were based on the randomly selected 20% testing data. The error rate is the overall proportion of misclassified items. For SVM, we only listed the results for the linear kernel. For KNN, we only listed k = 5 because it performed better than other choices of the parameter. Increasing m slightly increased the sensitivity (se) and specificity (sp) across all methods. Overall, RF has the best performance with se >95% and sp >98%, followed by KNN (se >92% and sp >97%), neural network, logistic regression and SVM. Even logistic regression had se and sp equal to 91% and 96.9%, respectively, when m = 120. The text rule builder can generate an ordered set of rules with se >92% and sp >98%. Certain variations were tested, including adding feature selection steps before classification and combining several classifiers into one. However, they did not improve the original results, which were already sufficient.

Table 2 lists the performance of the same classifiers using 60 keywords selected based on expert knowledge. The results show se >93% and sp >98% for most classifiers. Compared with these results, the machine-automated selection scheme in Table 1 is at least comparable to the expert knowledge when more topics are used (m = 180, for example). This result indicates that, when applying this system to the monitoring of other food manufacturing, we can trust the keyword/topic selection ability of the machine without inputting additional expert knowledge. As with the previous results, reducing the number of keywords (from 60 to 31) by feature selection does not improve the performance.

Performance on identifying suspicious manufacturers

After summing up the monthly amounts of purchase and sales based on the edible oil-related e-invoices of each manufacturer, we further used the moving averages of the last three months to blur the artificial month boundaries (fuzzification) and filter out noise. In the supply chain, the purchase of a business identity from another identity is referred to as the downstream (D), in contrast to the sales part, which is referred to as the upstream (U). Therefore, we denote the purchase and sales amounts of the ith month after smoothing as D_i and U_i, respectively. We considered that purchases and sales occurring on the first day of the ith month were not much different from those on the last day of the previous month.

In total, 425 A-labeled manufacturer month records and 27 B-labeled manufacturer month records are available. A considerable number of B-labeled records are missing because the law making e-invoices compulsory took effect only in 2014, and thus several B-labeled manufacturers might not have used e-invoices before then. After the scandals broke out, certain B-labeled manufacturers were closed, and naturally their transaction records ceased to exist.

In addition to the original features D and U, we also tested various transformations and combinations of them in modeling classifiers to improve accuracy in classifying “good” and “bad” manufacturers. We found that the following two features are the most influential and performed relatively better in classification: X1 = ln(U_i/D_i) is the log-transformed turnover ratio of sales to purchase in the same month i, and X2 considers the lag effect between purchasing materials and selling products. Fig 4(A) shows the scatter plot of A- and B-labeled manufacturer month records, with 95% and 99% elliptic prediction regions, respectively. B-labeled records appear to cluster in the upper right corner.
Apart from KNN, SVM, neural network, logistic regression and RF, we also considered a naïve classifier in the form of a decision tree: a point is classified as B if X1 > 6 and X2 > 6, and as A otherwise. Table 3 shows the performance of the various classifiers in discriminating As from Bs. Evidently, RF had the best performance with se >96% and sp >99%. KNN had the second best performance, followed by the naïve decision tree with its simple criteria. The latter result is useful because the rule can be implemented easily.

We also applied the above system to classify the 1,969 unspecified C-labeled manufacturer month records. Fig 4(B) presents the scatter plot obtained by adding these unspecified Cs to Fig 4(A). A few Cs were also located in the upper right corner. Using the classifiers trained on As and Bs, roughly 15% of the Cs are classified as suspicious, and these records belong to 13 manufacturers. The results only indicate that 13 of the 568 unspecified C-manufacturers are suspicious enough to be inspected further; we do not know whether they are actually problematic.
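The smoothing, the two features and the naïve rule can be sketched as below. Several details are assumptions made for illustration only: the moving average is taken as a trailing three-month mean, X1 is taken as the natural log of smoothed sales over smoothed purchases in the same month, and X2 is taken as the natural log of smoothed sales over the previous month's smoothed purchases. The paper's exact formulas are not reproduced here, and all amounts must be positive for the logarithms to exist.

```python
import math


def moving_average_3(series):
    """Trailing 3-month mean; months without a full window are skipped."""
    return [sum(series[i - 2:i + 1]) / 3 for i in range(2, len(series))]


def lag_features(purchase, sales):
    """Return (x1, x2) per month from monthly purchase and sales amounts."""
    d = moving_average_3(purchase)  # smoothed purchases (D)
    u = moving_average_3(sales)     # smoothed sales (U)
    rows = []
    for i in range(1, len(d)):
        x1 = math.log(u[i] / d[i])      # same-month turnover ratio
        x2 = math.log(u[i] / d[i - 1])  # assumed one-month lag
        rows.append((x1, x2))
    return rows


def naive_rule(x1, x2, threshold=6.0):
    """Flag a manufacturer month record as B only if both features exceed the threshold."""
    return "B" if x1 > threshold and x2 > threshold else "A"


# Hypothetical series: sales 1.2 times purchases vs. sales 500 times purchases.
normal = lag_features([100.0] * 4, [120.0] * 4)
odd = lag_features([1.0] * 4, [500.0] * 4)
labels = [naive_rule(*row) for row in normal + odd]  # ["A", "B"]
```

A threshold of 6 on a natural-log ratio corresponds to sales of about 403 times purchases (e^6 ≈ 403.4), so only extreme imbalances are flagged.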

Table 3

Performance on identifying problematic manufacturers.

	Error rate	Sensitivity*	Specificity*
Logistic Regression	2.65%	66.67%	99.29%
KNN	2.21%	77.78%	99.06%
Neural Network	2.43%	66.67%	99.53%
Random Forest	0.44%	96.30%	99.76%
SVM (linear)	5.97%	0.00%	100.00%
Naïve decision tree	3.97%	70.37%	99.53%

*Sensitivity: probability of identifying a B manufacturer as suspicious

Specificity: probability of identifying an A manufacturer as a benchmark
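The relation between the three columns of Table 3 can be illustrated with a short calculation. It assumes the table was computed on the 425 A-labeled and 27 B-labeled manufacturer month records reported in the text; the function name is hypothetical.

```python
N_A = 425  # A-labeled manufacturer month records (benchmark)
N_B = 27   # B-labeled manufacturer month records (suspicious)


def error_rate(sensitivity, specificity, n_pos=N_B, n_neg=N_A):
    """Recover the overall error rate from sensitivity and specificity."""
    missed = round((1 - sensitivity) * n_pos)         # B records not flagged
    false_alarms = round((1 - specificity) * n_neg)   # A records wrongly flagged
    return (missed + false_alarms) / (n_pos + n_neg)


random_forest = error_rate(0.9630, 0.9976)  # 1 missed B + 1 false alarm
logistic = error_rate(0.6667, 0.9929)       # 9 missed Bs + 3 false alarms
```

Under this assumption, the Random Forest row corresponds to 2 errors in 452 records (0.44%) and the logistic regression row to 12 errors in 452 records (2.65%).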

Fig 4

(A) Scatter plot of A- and B-labeled manufacturer, (B) Scatter plot of A-, B- and C-labeled manufacturer, with 95% and 99% prediction regions.


Discussion

To establish a traceability system, all related records of each transaction (i.e., invoices, receiving and shipping papers, etc.) should be kept for several years, regardless of whether the records are in paper, electronic or other form. Assessing the consistency of these records and identifying unusual and inappropriate trends is time-consuming and demands experienced manpower, because it is a heavy burden to inspect the large amount of detailed information concerning the date of purchase or supply, the names of products, the quantity received or supplied, and the names and addresses of the suppliers or distributors. Furthermore, since the information provided by various suppliers differs with respect to food classifications, product categories and product naming conventions, few regulatory agencies can inspect all the players, upstream and downstream, in the entire food supply chain. Digitalized transactional big data that record users’ behavior have been applied to human health surveillance, for example, the prediction of flu trends [24]. However, we found no literature on applying electronic invoices to food safety.

This study utilizes B2B e-invoice big data to develop an early alert system for edible oil food safety. The system automates the processes of (1) sieving edible oil-related invoices, and (2) identifying suspicious manufacturers with irrational monthly transactions. Both processes are based on modeling classifiers via statistical and machine learning methods. In (1), SAS® EM™ 14.3 first automates the construction of m topics from the 29,942 merchandise titles of all e-invoices. The assessment of various classifiers shows that, based on the 29,942 pre-labeled items, all classifiers, including KNN, SVM, neural network, logistic regression, RF and text rule builder, can accurately identify the edible oil-related invoices with se >92% and sp >96% with m ≥ 150.
In particular, RF has the best performance at se = 95.2% and sp = 98.9% on the testing dataset. The better performance with a larger number of topics may be due to the diversity of merchandise names and the tendency of manufacturers to use catchy names to attract customers. In (2), RF also has the best performance with se = 96.3% and sp = 99.76%. However, a naïve decision tree with a simple rule, which classifies a manufacturer month record as suspicious if X1 > 6 and X2 > 6, also has a reasonably good performance with se = 70.37% and sp = 99.53%. An excessively large value of X1, for example X1 > 6, indicates that the turnover of product sales is more than e^6 (>400) times that of material purchases for the specific manufacturer in the same month, while X2 takes into account the lag effect between product sales and the purchase of materials in the previous month. With both X1 and X2 > 6, the manufacturer sold products worth more than 400 times the materials it had purchased within a 2-month period. One possible interpretation of this scenario is that a large part of the materials used in the products are not considered proper ingredients for edible oil manufacture. The se of the naïve decision tree can be increased by lowering the threshold of 6 to a smaller value; however, as a tradeoff, this will reduce its sp and result in extra false alarms.

The drawbacks of this study are as follows:

(1) Only 27 B-labeled records are available, because the law making e-invoices compulsory took effect only in October 2014. Thus, several B-labeled manufacturers did not use e-invoices before then. After the scandals broke out, certain B-labeled manufacturers closed down, and their transaction records became unavailable.
(2) The B2B e-invoice data cover only transactions among domestic companies. The records of materials that manufacturers directly import from overseas were not included in this study.
The Taiwanese government is currently working on integrating the Customs records into the FIA system, which will make the transaction records more complete in the future.

As modern food supply chains become more and more complex, the importance of food production transparency increases. Consumers' expectations and right to know have added a challenging dimension to this transparency. Transparency in food supply chains via traceability requires the implementation of new technologies such as the Internet of Things (IoT), blockchain and big data analytics [25-26]. In response to the food incidents in Taiwan since 2011 [10], Taiwan's Legislative Yuan passed the Act Governing Food Safety and Sanitation in 2013, and enacted new regulations in 2014 [9]. The amendment aims to provide information about food safety as a consumer right, with efficient trace management of materials based on the registration of their importation, the labeling of additives, the labeling of final products, etc. Besides that, it is also important for the government to be able to identify suspicious cases before they become public events. This alert system can automatically flag suspicious manufacturers for high-priority onsite inspection if implemented into the FIA e-invoice system. Big data analysis can also complement expensive inspections by unveiling numerous issues unseen by traditional methods. However, these promising results should prompt further studies to assess the effectiveness of its application in real-world situations. This study shows that e-invoices have a bright future in food safety applications, not only for product traceability but also for the prevention of adverse events.

23 Oct 2019

PONE-D-19-24654
An automated alarm system for food safety by using electronic invoices
PLOS ONE

Dear Prof Lian,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands.
Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. ============================== The reviewers commented on your manuscript and despite the fact that they considered your work of practical relevance, interesting and potentially novel, there are a couple of shortcomings that should be addressed before it can be acceptable for publication. Please, address the concerns raised by the reviewers. ============================== We would appreciate receiving your revised manuscript by Dec 07 2019 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. 
The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. We look forward to receiving your revised manuscript. Kind regards, Anderson de Souza Sant'Ana, PhD Academic Editor PLOS ONE Journal Requirements: 1. When submitting your revision, we need you to address these additional requirements. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. Thank you for stating the following in the Acknowledgments Section of your manuscript: "The work was funded by MOST 106-2314-B-O 18-001-MY2 from the Ministry of Science and Technology. The e-invoice data was provided by Fiscal Information Agency of Taiwan Ministry of Finance." We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: The author(s) received no specific funding for this work. 3. Please could you provide further details about the expert-knowledge based process in section 2.2., step 3: *If the 'researchers' involved were not the authors, please provide details of their positions and expertise and how they were recruited for this exercise. *Please clarify if you obtained consent from the researchers to participate in this study. 
*Please provide details of how the process was carried out: What questions were the researchers asked? How did they come to an agreement? Thank you for your attention to these points. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Partly Reviewer #2: Yes ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: I Don't Know Reviewer #2: Yes ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: No ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. 
Reviewer #1: Yes Reviewer #2: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: The paper presents a system to detect edible oil manufactures that are suspicious to sell adulterated oils. The system makes use of data mining techniques to detect anomalous individuals in a big sample. The aim of the paper is interesting and potentially new. Nevertheless, I have seen several shortcomings in the paper, described below. I recommend the authors considering the comments in order to improve the quality of the paper: 1) The contribution is not contextualized. I miss a section dealing with related work about, at least, systems to detect anomalous individuals using data mining techniques. There are many recent contributions in this aspect, for example in financial fraud detection, which is related to this paper. By doing a literature review, the authors can put it clear what the paper adds to knowledge and practice in this line of research. 2) In the Data section, I miss an explanation of how expert knowledge was extracted. Do you interview some key agents? How many? Which was the criterium to select the experts? 3) The selection of the training data should be also discussed. Why not selecting data from a certain time period (e.g., one year) as training data and the following period as testing data? By doing so, the results would show the system ability to predict suspicious manufactures in the future. 4) Which are the specific features influencing the most on the classification? It would be interesting to know and interpret these features. 5) Related to the previous comment, which would be the interpretation of the naïve decision tree in section 3.2? 
Do you have any explanation about why this rule identifies suspicious manufacturers? 6) Part of the stuff in the Discussion section (that dealing with figure 4(b)) would be better located in the Results section. I also miss the last column in Table 4, commented in the Discussion section. 7) I also recommend including a Conclusion section. Reviewer #2: I think the manuscript is very interesting and may be very useful to the government from Taiwan. In addition, the method may be inspiring for other countries. I have some suggestions, as follows: “Prevention is always better than remedy”- I think the sentence is very colloquial and should be replaced. “Early detection of food safety events via warning signs of suspicious transactions is thus needed.” – Please give more details about it. Does the e-invoice was already used to detect food adulteration? What are the possible clues to detect a food safety adulteration using the e-invoice system? “All registered edible oil manufacturers with a capital amount of over 1 million USD in Taiwan have been required to use an e-invoice system” – Can you include the reference about this requirement? What is FIC? Please include the meaning of abbreviations on their first use. I think the methods and results sections could be more detailed. The method could be improved to increase the interest from food safety/ food policy specialists and stakeholders I don’t think Table 1 is necessary. You could include the data in the text. “Researchers selected 60 keywords that they agreed could best distinguish edible oil-related items from others.” Can you give some examples? The discussion should be improved. Topics about food safety and consumer’s right could be included. Are there any other studies about using an automated system to detect frauds? The system implementation is expansive? “Prevention is always better than cure” – same as the sentence aforementioned Only one reference is from 2017 and one from 2016. 
The other references are from <2015. I think more recent studies can be included. More references about food issues could be included. ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: Yes: Juan M Hernandez Reviewer #2: Yes: Diogo Thimoteo da Cunha [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step. 18 Nov 2019 Journal Requirements: 1. When submitting your revision, we need you to address these additional requirements. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. 
The PLOS ONE style templates can be found at http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf Reply: We revised the manuscript according to PLOS ONE's requirements. 2. Thank you for stating the following in the Acknowledgments Section of your manuscript: "The work was funded by MOST 106-2314-B-O 18-001-MY2 from the Ministry of Science and Technology. The e-invoice data was provided by Fiscal Information Agency of Taiwan Ministry of Finance." We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: The author(s) received no specific funding for this work. Reply: We remove funding-related text from the manuscript and update the Funding Statement. “The work was funded by MOST 106-2314-B-018-001-MY2 from the Ministry of Science and Technology. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The e-invoice data was provided by Fiscal Information Agency of Taiwan Ministry of Finance. We also like to thank Changhua County Public Health Bureau for the constructive discussion.” 3. Please could you provide further details about the expert-knowledge based process in section 2.2., step 3: *If the 'researchers' involved were not the authors, please provide details of their positions and expertise and how they were recruited for this exercise. 
*Please clarify if you obtained consent from the researchers to participate in this study. *Please provide details of how the process was carried out: What questions were the researchers asked? How did they come to an agreement?

Reply: 1. All the researchers who were involved in data analysis and paper writing were authors. 2. Process: (I) The question “how to efficiently alert food safety issues by using e-invoice” was raised by one of the authors (Dr Yeh), who has served as Director-general of Changhua County Health Bureau for years and has rich experience in dealing with food safety issues involving the related manufacturers. (II) The authors also held two meetings with the management teams of some manufacturers to discuss the proper features for identifying irrational transactions. (III) The corresponding author (Dr Lian) signed an agreement with the Fiscal Information Agency (FIA) of the Ministry of Finance to access the e-invoice data and analyze it under the supervision of FIA.

In the 'Data processing' section, we added: “A bilateral agreement was signed by FIA and the institute of the authors (National Changhua University of Education) for permission of accessing the e-invoice data and analyzing it under the supervision of FIA.” “The dataset used in this study is owned by FIA, and was analyzed inside the FIA data center. To access the data researchers can contact Taiwan FIA to apply for the authorization.”

On page 11 we added: “The authors had held several consultation meetings with experts including the Director-general of Changhua County Health Bureau, who has rich experience in dealing with the food safety issues, and managing teams from several A-manufacturers, to discuss the proper features on classifying the "rational" vs "irrational" transactions.”

Reviewer #1: The paper presents a system to detect edible oil manufactures that are suspicious to sell adulterated oils. The system makes use of data mining techniques to detect anomalous individuals in a big sample.
The aim of the paper is interesting and potentially new. Nevertheless, I have seen several shortcomings in the paper, described below. I recommend the authors considering the comments in order to improve the quality of the paper: 1) The contribution is not contextualized. I miss a section dealing with related work about, at least, systems to detect anomalous individuals using data mining techniques. There are many recent contributions in this aspect, for example in financial fraud detection, which is related to this paper. By doing a literature review, the authors can put it clear what the paper adds to knowledge and practice in this line of research. Reply: Thank you for your suggestions. We have added some paragraphs at the introduction section to describe the background of this study regarding food safety and big data. We have also dealt with related works about prediction models of food fraud using big data approach. So it reads: “The emerging use of rapidly collected, complex data in unprecedented quantities is ushering the world into the era of big data [1]. Although utilization of big data has the potential to afford new insights, improve decision making and governance, and enhance the quality and efficiency of products and services, their application in the food safety domain is still limited [2]. Food safety data and information comprise structured and non-structured data from multiple sectors such as environment, animal, agriculture, food, public health, trade and economy. Previous efforts have explored the predictive power of big data in foodborne illness surveillance, environmental microbial contamination of crops, food safety violations and interpretation of genomic data for tracking and tracing foodborne illnesses [2-3]. In past decades, a variety of food fraud incidents have been reported in many countries. Such incidents have had a profound impact on public health and consumer confidence in the safety of food [4]. 
In response to these incidents, one of the main focuses of food fraud prevention has been on novel prediction models of food fraud using a big data approach, which considered different factors from within and outside the food supply chain [5-8].”

2) In the Data section, I miss an explanation of how expert knowledge was extracted. Do you interview some key agents? How many? Which was the criterium to select the experts?

Reply: One of the authors (Dr Yeh) has served as Director-general of Changhua County Health Bureau for years and has rich experience in dealing with food safety issues involving the related manufacturers. The authors also held two meetings with the management teams of some manufacturers to discuss the proper features for identifying irrational transactions. On page 11 we added: “The authors had held several consultation meetings with experts including the Director-general of Changhua County Health Bureau, who has rich experience in dealing with the food safety issues, and managing teams from several A-manufacturers, to discuss the proper features on classifying the "rational" vs "irrational" transactions.”

3) The selection of the training data should be also discussed. Why not selecting data from a certain time period (e.g., one year) as training data and the following period as testing data? By doing so, the results would show the system ability to predict suspicious manufactures in the future.

Reply: We used the 5-fold strategy with an 8:2 training-to-testing ratio. This is now mentioned on page 10. We cannot divide the data into training and testing sets by year because the law making e-invoice use compulsory took effect only in October 2014, so several B-labeled manufacturers did not use e-invoices before then. After the scandals broke out, certain B-labeled manufacturers closed down, and their transaction records became unavailable.

4) Which are the specific features influencing the most on the classification?
It would be interesting to know and interpret these features.

Reply: Among the features we tried, the combination of X1 and X2 performed relatively better: X1 is the log-transformed ratio of sales to purchases in the same month i, and X2 accounts for the lag between purchasing materials and selling products. We added the following interpretation in the Discussion on pages 16-17: “Excessively large values of X1, for example X1 > 6, indicate that the turnover of product sales is e^6 (>400) times that of material purchases for the specific manufacturer in the same month, while X2 takes into account the lag effect between product sales and materials purchased in the previous month. With both X1 and X2 > 6, the manufacturer had sold products of value 400 times more than the materials it had purchased within a 2-month period. One possible interpretation for this scenario is that a large part of the materials used in products are not considered as proper ingredients for edible-oil manufacture. The se of the naïve decision tree can be increased by lowering the threshold of 6 to a smaller value; however, as a tradeoff this will reduce its sp, resulting in extra false alarms.”

5) Related to the previous comment, which would be the interpretation of the naïve decision tree in section 3.2? Do you have any explanation about why this rule identifies suspicious manufacturers?

Reply: We have added the explanation given in our answer to the previous question.

6) Part of the stuff in the Discussion section (that dealing with figure 4(b)) would be better located in the Results section. I also miss the last column in Table 4, commented in the Discussion section.

Reply: We have moved the part dealing with Fig 4(B) to the Results section. We removed the words “the last column in Table 4”. The result is already shown in Fig 4.

7) I also recommend including a Conclusion section.

Reply: We added a Conclusion section in the abstract.
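For readers who want to see the rule from the reply to comment 4 in concrete form, the following is a minimal sketch and not the authors' implementation. It assumes that X1 is the natural log of same-month sales over purchases, that X2 is the natural log of a month's sales over the previous month's purchases (the reply only says X2 "considers the lag effect"), and that a manufacturer is flagged when any single month has both values above the threshold. The function names and the handling of months with zero turnover are hypothetical.

```python
import math

THRESHOLD = 6.0  # value quoted in the reply; e**6 is roughly 403


def log_ratio(sales, purchases):
    """Natural log of sales over purchases, or None when the ratio is undefined."""
    if sales <= 0 or purchases <= 0:
        return None
    return math.log(sales / purchases)


def monthly_features(sales, purchases):
    """Return (X1, X2) pairs for every month that has a preceding month.

    X1 compares sales and purchases within the same month.
    X2 (assumed form) compares sales with the previous month's purchases.
    """
    features = []
    for i in range(1, len(sales)):
        x1 = log_ratio(sales[i], purchases[i])
        x2 = log_ratio(sales[i], purchases[i - 1])
        features.append((x1, x2))
    return features


def is_suspicious(sales, purchases, threshold=THRESHOLD):
    """Flag a manufacturer if any month has both X1 and X2 above the threshold.

    Months where a ratio is undefined are left unflagged in this sketch.
    """
    return any(
        x1 is not None and x2 is not None and x1 > threshold and x2 > threshold
        for x1, x2 in monthly_features(sales, purchases)
    )


# Illustrative monthly turnover figures (made up for this example).
ordinary = is_suspicious([120.0, 130.0, 125.0], [100.0, 110.0, 105.0])
extreme = is_suspicious([50000.0, 60000.0, 55000.0], [100.0, 100.0, 100.0])
```

With these made-up figures, the ordinary manufacturer is not flagged, while the one whose sales are several hundred times its purchases is. Passing a smaller `threshold` flags more manufacturers, which mirrors the tradeoff described in the reply: higher sensitivity at the cost of more false alarms.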
Reviewer #2: I think the manuscript is very interesting and may be very useful to the government from Taiwan. In addition, the method may be inspiring for other countries. I have some suggestions, as follows: 1.“Prevention is always better than remedy”- I think the sentence is very colloquial and should be replaced. Reply: The sentence had been replaced by “The outbreak of major food safety incidence could result costly public panic and damage of goodwill [13].” 2.“Early detection of food safety events via warning signs of suspicious transactions is thus needed.” – Please give more details about it. Does the e-invoice was already used to detect food adulteration? What are the possible clues to detect a food safety adulteration using the e-invoice system? Reply: We rephrased the sentence and added a new paragraph on page 6 to explain: “To establish traceability system, all the related records (ie. invoices, receiving and shipping papers etc.) of each transaction should be kept and retained for a period of five years at least, regardless of whether the form of such records is in paper, electronic or otherwise. To assess the consistency of these records and to identify unusual and inappropriate trends is time-consuming and demanding for experienced manpower because of the heavy burden to inspect the large amount of detailed information concerning date of purchase or supply, name of products, quantity received or supplied, and name and address of the suppliers or distributors etc. Utilizing e-invoice big data provides an opportunity to overcome aforementioned difficulty. Early detection of food fraud incidents via warning signs of suspicious transactions is plausible approach.” 3.“All registered edible oil manufacturers with a capital amount of over 1 million USD in Taiwan have been required to use an e-invoice system” – Can you include the reference about this requirement? Reply: The reference was added on page 8 and in reference [19]. 19. 
Ministry of Health and Welfare/Food Drug Association (MOHW/FDA). Taiwan Food and Drug Administration 2015 Annual Report, page 115. https://www.fda.gov.tw/tc/includes/GetFile.ashx?id=f636694230125946085 Accessed 10.26.19.

4. What is FIC? Please include the meaning of abbreviations on their first use.

Reply: We corrected “FIC” to “FIA”; it is first mentioned with its full name on page 6. FIA is the abbreviation of the Fiscal Information Agency.

5. I think the methods and results sections could be more detailed. The method could be improved to increase the interest from food safety/ food policy specialists and stakeholders

Reply: We have added a description of the process, including the consultation meetings with some manufacturers and the agreement with FIA of the Ministry of Finance to access the e-invoice data and analyze it under the supervision of FIA. New paragraphs about food safety, food policy and consumers' rights were added on pages 4-6 of the Introduction as well as in the Discussion (pages 15-17), as requested by other comments.

6. I don’t think Table 1 is necessary. You could include the data in the text.

Reply: The content of Table 1 is now included in the text (page 8).

7. “Researchers selected 60 keywords that they agreed could best distinguish edible oil-related items from others.” Can you give some examples?

Reply: These keywords could be used as inclusion criteria, such as “Canola (kanola) oil”, “grapeseed”, “sunflower (seed)”, and “cold pressing”, or as exclusion criteria, such as “sauce”, “flavor”, “cosmetic”, and “skin”.

8. The discussion should be improved. Topics about food safety and consumer’s right could be included. Are there any other studies about using an automated system to detect frauds? The system implementation is expansive?

Reply: We added a paragraph in the Discussion on page 15: “To establish traceability system, all the related records (i.e. invoices, receiving and shipping papers etc.)
of each transaction should be kept and retained for a period of several years, regardless of whether the form of such records is in paper, electronic or otherwise. To assess the consistency of these records and to identify unusual and inappropriate trends is time-consuming and demanding for experienced manpower because it is a heavy burden to inspect the large amount of detailed information concerning date of purchase or supply, name of products, quantity received or supplied, and name and address of the suppliers or distributors etc. Furthermore, since information provided by various suppliers differed with respect to food classiﬁcations, product categories and product naming conventions, few regulatory agencies can inspect all the players, upstream and downstream, in the entire food supply chain. Digitalized transactional big data that recorded users’ behavior had been applied for human health surveillance, for example, prediction of flu trends [24]. However, no literature had been found on applying electronic invoice for food safety.” And on page 17: “As modern food supply chains become more and more complex, the importance of food production transparency increases. Consumer expectation and right to know has added a challenging dimension to this transparency. Transparency in food supply chains via traceability requires implementation of new technologies such as the Internet of Things (IoT), blockchain and Big Data analytics [25-26]. In response to the food incidences in Taiwan since 2011 [10], Taiwan's Legislative Yuan passed the Act Governing Food Safety and Sanitation in 2013, and enacted new regulations in 2014 [9]. The amendment aim to provide information about food safety as consumers right with efficient trace management of materials, which is based on the registration of their importation, the labeling of additives, the labeling of final products, etc.. 
Besides that, it is also important for government to be able to identify suspicious cases before becoming public event.”

9. “Prevention is always better than cure” – same as the sentence aforementioned

Reply: The sentence had been replaced by “The outbreak of major food safety incidence could result costly public panic and damage of goodwill [13].”

10. Only one reference is from 2017 and one from 2016. The other references are from <2015. I think more recent studies can be included. More references about food issues could be included.

Reply: We have now added the following references.

1. Wyber R, Vaillancourt S, Perry W, et al. Big data in global health: improving health in low- and middle-income countries. Bull World Health Organ. 2015;93(3):203-208. doi: 10.2471/BLT.14.139022
2. Marvin HJ, Janssen EM, Bouzembrak Y, Hendriksen PJ, Staats M. Big data in food safety: An overview. Critical Reviews in Food Science and Nutrition. 2017;57(11):2286-2295. doi: 10.1080/10408398.2016.1257481
3. Kannan V, Shapiro MA, Bilgic M. Hindsight Analysis of the Chicago Food Inspection Forecasting Model. Presented at the AAAI Fall Symposium Series (FSS) 2019: Artificial Intelligence in Government and Public Sector. Arlington, Virginia, USA.
4. Spink J, Moyer DC. Defining the Public Health Threat of Food Fraud. Journal of Food Science. 2011;76(9):R157-63. doi: 10.1111/j.1750-3841.2011.02417.x
5. Fritsche J. Recent Developments and Digital Perspectives in Food Safety and Authenticity. Journal of Agricultural and Food Chemistry. 2018;66(29):7562-7567. doi: 10.1021/acs.jafc.8b00843
6. Marvin HJP, Bouzembrak Y, Janssen EM, van der Fels-Klerx HJ, van Asselt ED, Kleter GA. A holistic approach to food safety risks: Food fraud as an example. Food Research International. 2016;89:463-470. doi: 10.1016/j.foodres.2016.08.028
7. Bouzembrak Y, Steen B, Neslo R, Linge J, Mojtahed V, Marvin HJP. Development of food fraud media monitoring system based on text mining. Food Control. 2018;93:283-296. doi: 10.1016/j.foodcont.2018.06.003
8. Verhaelen K, Bauer A, Günther F, Müller B, Nist M, Ülker Celik B, et al. Anticipation of food safety and fraud issues: ISAR - a new screening tool to monitor food prices and commodity flows. Food Control. 2018;94:93-101. doi: 10.1016/j.foodcont.2018.06.029
9. Richterich A. Using Transactional Big Data for Epidemiological Surveillance: Google Flu Trends and Ethical Implications of ‘Infodemiology’. In: Mittelstadt B, Floridi L (eds) The Ethics of Biomedical Big Data. Law, Governance and Technology Series. 2016;29:41-72. Springer, Cham.
10. Astill J, Dara RA, Campbell M, Farber JM, Fraser EDG, Sharif S, et al. Transparency in food supply chains: A review of enabling technology solutions. Trends in Food Science & Technology. 2019;91:240-247. doi: 10.1016/j.tifs.2019.07.024
11. Messer KD, Costanigro M, Kaiser HM. Labeling Food Processes: The Good, the Bad and the Ugly. Applied Economic Perspectives and Policy. 2017;39(3):407-427. doi: 10.1093/aepp/ppx028

Submitted filename: Response to Reviewers 20191119.docx

2 Jan 2020

PONE-D-19-24654R1
An automated alarm system for food safety by using electronic invoices
PLOS ONE

Dear Prof Lian,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================
Your revised manuscript has been reassessed by the reviewers and there are minor revisions to be done before it can be accepted for publication.
==============================

We would appreciate receiving your revised manuscript by Feb 16 2020 11:59PM.
We look forward to receiving your revised manuscript.

Kind regards,
Anderson de Souza Sant'Ana, PhD
Academic Editor
PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1.
If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #1: (No Response) Reviewer #2: All comments have been addressed ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Yes ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: Yes ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: No Reviewer #2: Yes ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. 
Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: The authors have followed some of my recommendations in the first review and now the method and results are clearer. Nevertheless, I need further insist in the first point of my previous review, about the contextualization of the contribution. The authors have framed the work in the literature around food fraud prevention using big data (references 5-8, 24-26). That’s fine, but simply mentioning some references in brackets does not help very much. Your study would be more valued if you make a comprehensive (not necessarily very long) review about the state of the art and what this paper adds to this body of research. In other words, I recommend commenting the cited references (perhaps including new ones), maybe in a new section of related work, highlighting the previous findings and what your study adds. I have also some other minor comments: 1) Is there any error when citing references 16-17 and 18? It seems reference 16 should be reference 18. 2) Details about the number and role of experts consulted are better located in epigraph “Step 3”, page 9. 3) The quality of the figures should be much improved. I cannot read Fig 3 and Fig 4B. Reviewer #2: The authors presented a enhanced version of the manuscript. All my comments were addressed. The manuscript can be accepted in my opinion. ********** 7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). 
If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

3 Jan 2020

Dear Sir:

We have revised the paper according to the reviewers’ comments, item by item.

Iebin Lian

_________________________________________

6. Review Comments to the Author

Reviewer #1: The authors have followed some of my recommendations from the first review, and the method and results are now clearer. Nevertheless, I need to insist further on the first point of my previous review, about the contextualization of the contribution. The authors have framed the work in the literature around food fraud prevention using big data (references 5-8, 24-26). That’s fine, but simply mentioning some references in brackets does not help very much.
Your study would be more valued if you made a comprehensive (not necessarily very long) review of the state of the art and of what this paper adds to this body of research. In other words, I recommend commenting on the cited references (perhaps including new ones), maybe in a new section of related work, highlighting the previous findings and what your study adds.

Reply: We thank you for the advice and have added descriptions of the most recent literature [3, 6-8] on pages 4 and 5.

I also have some other minor comments:

1) Is there an error in the citation of references 16-17 and 18? It seems reference 16 should be reference 18.

Reply: We have corrected the citations of references 16-18.

2) Details about the number and role of experts consulted are better located in epigraph “Step 3”, page 9.

Reply: Thank you for your suggestion. We have added some paragraphs at “Step 3” about the number and role of the experts consulted.

3) The quality of the figures should be much improved. I cannot read Fig 3 and Fig 4B.

Reply: The separate figure files have higher resolution than those converted to PDF.

Submitted filename: Response to Reviewers.docx

7 Jan 2020

An automated alarm system for food safety by using electronic invoices

PONE-D-19-24654R2

Dear Dr. Lian,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow.
To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Anderson de Souza Sant'Ana, PhD
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

8 Jan 2020

PONE-D-19-24654R2

An automated alarm system for food safety by using electronic invoices

Dear Dr. Lian:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff
on behalf of
Professor Anderson de Souza Sant'Ana
Academic Editor
PLOS ONE

7 in total

Review 1. Defining the public health threat of food fraud.

Authors: John Spink; Douglas C Moyer
Journal: J Food Sci Date: 2011 Nov-Dec Impact factor: 3.167

2. A holistic approach to food safety risks: Food fraud as an example.

Authors: Hans J P Marvin; Yamine Bouzembrak; Esmée M Janssen; H J van der Fels-Klerx; Esther D van Asselt; Gijs A Kleter
Journal: Food Res Int Date: 2016-08-25 Impact factor: 6.475

Review 3. Big data in food safety: An overview.

Authors: Hans J P Marvin; Esmée M Janssen; Yamine Bouzembrak; Peter J M Hendriksen; Martijn Staats
Journal: Crit Rev Food Sci Nutr Date: 2017-07-24 Impact factor: 11.176

4. Recent Developments and Digital Perspectives in Food Safety and Authenticity.

Authors: Jan Fritsche
Journal: J Agric Food Chem Date: 2018-07-11 Impact factor: 5.279

Review 5. Major food safety episodes in Taiwan: implications for the necessity of international collaboration on safety assessment and management.

Authors: Jih-Heng Li; Wen-Jing Yu; Yuan-Hui Lai; Ying-Chin Ko
Journal: Kaohsiung J Med Sci Date: 2012-07-07 Impact factor: 2.744

Review 6. Big data in global health: improving health in low- and middle-income countries.

Authors: Rosemary Wyber; Samuel Vaillancourt; William Perry; Priya Mannava; Temitope Folaranmi; Leo Anthony Celi
Journal: Bull World Health Organ Date: 2015-01-30 Impact factor: 9.408

7. Food suppliers' perceptions and practical implementation of food safety regulations in Taiwan.

Authors: Wen-Hwa Ko
Journal: J Food Drug Anal Date: 2015-07-22 Impact factor: 6.157
