
Validation of an algorithm to evaluate the appropriateness of outpatient antibiotic prescribing using big data of Chinese diagnosis text.

Houyu Zhao¹, Jiaming Bian², Li Wei³, Liuyi Li⁴, Yingqiu Ying⁵, Zeyu Zhang⁶, Xiaoying Yao¹, Lin Zhuo⁷, Bin Cao⁶, Mei Zhang², Siyan Zhan⁸.

Abstract

OBJECTIVE: We aimed to evaluate the validity of an algorithm to classify diagnoses according to the appropriateness of outpatient antibiotic use in the context of Chinese free text.
SETTING AND PARTICIPANTS: A random sample of 10 000 outpatient visits was selected between January and April 2018 from a national database for monitoring rational use of drugs, which included data from 194 secondary and tertiary hospitals in China.
RESEARCH DESIGN: Diagnoses for outpatient visits were classified as tier 1 if associated with at least one condition that 'always' justified antibiotic use; as tier 2 if associated with at least one condition that only 'sometimes' justified antibiotic use but no conditions that 'always' justified antibiotic use; or as tier 3 if associated with only conditions that never justified antibiotic use, using a tier-fashion method and regular expression (RE)-based algorithm.
MEASURES: Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) of the classification algorithm, using classification made by chart review as the standard reference, were calculated.
RESULTS: The sensitivities of the algorithm for classifying tier 1, tier 2 and tier 3 diagnoses were 98.2% (95% CI 96.4% to 99.3%), 98.4% (95% CI 97.6% to 99.1%) and 100.0% (95% CI 100.0% to 100.0%), respectively. The specificities were 100.0% (95% CI 100.0% to 100.0%), 100.0% (95% CI 99.9% to 100.0%) and 98.6% (95% CI 97.9% to 99.1%), respectively. The PPVs for classifying tier 1, tier 2 and tier 3 diagnoses were 100.0% (95% CI 99.1% to 100.0%), 99.7% (95% CI 99.2% to 99.9%) and 99.7% (95% CI 99.6% to 99.8%), respectively. The NPVs were 99.9% (95% CI 99.8% to 100.0%), 99.8% (95% CI 99.7% to 99.9%) and 100.0% (95% CI 99.8% to 100.0%), respectively.
CONCLUSIONS: The RE-based classification algorithm in the context of Chinese free text had sufficiently high validity for further evaluating the appropriateness of outpatient antibiotic prescribing. © Author(s) (or their employer(s)) 2020. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.


Keywords: antibiotics; drug utilisation; electronic health records; prescriptions; validation

Substances:
Anti-Bacterial Agents

Year: 2020 PMID: 32198296 PMCID: PMC7103794 DOI: 10.1136/bmjopen-2019-031191

Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 2.692

This is the first study to establish rules for evaluating the appropriateness of outpatient antibiotic prescriptions by using Chinese diagnosis text. The rule-based and tier-fashion algorithm for classifying diagnoses according to whether an antibiotic was indicated had high validity. The algorithm provides a feasible method to use big electronic medical records or claim data to evaluate the appropriateness of antibiotic use in China. Certain synonyms, abbreviations or acronyms of medical terms may have been omitted from the rules and therefore cannot be detected by the algorithm.

Introduction

Overuse of antibiotics and the consequential antimicrobial resistance (AMR) have been serious threats to public health worldwide.[1] China is one of the countries with the highest antibiotic consumption and, hence, a high prevalence of AMR.[2 3] Reducing unnecessary and inappropriate use of antibiotics is essential to reduce both AMR and adverse drug reactions,[4 5] which cause a large number of deaths and economic losses every year.[4 6 7] In China, the appropriateness of antibiotic use has been evaluated mainly through manual prescription review, which is time consuming and therefore not feasible for large-scale prescription data. The rapid implementation of electronic medical records (EMRs) across the whole country has enabled the routine collection of medical data available for research purposes.[8 9] However, a huge amount of these data, especially diagnosis information, is in the form of Chinese free text, which makes it difficult to evaluate the appropriateness of antibiotic use. Previous studies using big EMR or claim data for the evaluation of appropriate antibiotic prescribing applied an innovative tier-fashion method to classify and assign diagnoses to outpatient visits based on whether antibiotics are indicated for treatment.[4 10 11] The key aspect of this method is to classify all diseases into three tiers corresponding to whether the disease always, sometimes or never justifies antibiotic use[4 5 11]: tier 1 diagnoses are diseases for which antibiotics are almost always indicated, such as pneumonia and specific bacterial infections; tier 2 diagnoses are diseases for which antibiotics are only sometimes indicated, such as sinusitis and pharyngitis; and tier 3 diagnoses are all the other diseases for which antibiotics are not indicated or the indication is unclear, such as influenza and cancer.
However, most of these studies were conducted in the USA and the UK, where 20%–30% of outpatient antibiotics are estimated to be used inappropriately,[4 10 11] with little reliable evidence from China and other developing countries where antibiotic use is high.[2 12] The aim of this study was to establish and validate a regular expression (RE)-based algorithm for extracting and classifying outpatient diagnoses from Chinese free text using the tier-fashion method described above.

Methods

Data sources

We used a national database for monitoring the rational use of drugs. Online supplementary appendix 1 describes the recruitment process of the hospitals and the representativeness of the database. In total, there were 194 hospitals from 128 cities of 31 provinces, autonomous regions or municipalities in China. The database consisted of both outpatients’ and inpatients’ demographic characteristics, prescriptions, item costs and diagnoses from the EMRs of the sample hospitals between October 2014 and April 2018. The prescription for each outpatient visit consisted of three parts, which were recorded in different tables in the database. The preface mainly included basic information about the patient, such as gender and age, and the diagnosis; the main body consisted of a list of drug information, including drug name, dosage and usage; and the postscript included other information such as the doctor’s signature. All information generated during the same visit could be linked by a unique identifier consisting of the hospital code, patient identification number and the date of visit. Chemical drugs including antibiotics were coded according to the Anatomical Therapeutic Chemical classification system.[13] Outpatient diagnoses in the database were in the form of Chinese narrative free text. Several diagnoses could be written together and separated by punctuation marks. We treated multiple prescriptions and diagnoses from the same patient on the same day in the same hospital as one visit. Thus, multiple diagnoses and drugs could be linked to the same visit. At the time of the study, the database contained 239 million outpatient visits and 170 million diagnosis records. Fifty-five hospitals did not submit diagnosis records, and 88% of prescriptions from the remaining 139 hospitals could be linked to at least one valid diagnosis.
This proportion was comparable to previous studies.[5 11] In this study, antibiotics for systemic use were evaluated (see online supplementary appendix 2 for the list of antibiotics in the database).
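
As a rough illustration of the visit definition above, the following Python sketch groups records that share the same hospital code, patient identification number and visit date into one visit. The record layout and the field names (`hospital_code`, `patient_id`, `visit_date`, `drugs`, `diagnoses`) are hypothetical and are not taken from the database schema.

```python
from collections import defaultdict


def group_into_visits(records):
    """Merge records sharing (hospital, patient, date) into a single visit."""
    visits = defaultdict(lambda: {"drugs": [], "diagnoses": []})
    for rec in records:
        # Hypothetical composite key mirroring the unique identifier in the text.
        key = (rec["hospital_code"], rec["patient_id"], rec["visit_date"])
        visits[key]["drugs"].extend(rec.get("drugs", []))
        visits[key]["diagnoses"].extend(rec.get("diagnoses", []))
    return dict(visits)


records = [
    {"hospital_code": "H001", "patient_id": "P1", "visit_date": "2018-01-05",
     "drugs": ["cefdinir"], "diagnoses": ["肺炎"]},
    {"hospital_code": "H001", "patient_id": "P1", "visit_date": "2018-01-05",
     "drugs": ["azithromycin"], "diagnoses": []},
    {"hospital_code": "H002", "patient_id": "P1", "visit_date": "2018-01-05",
     "drugs": [], "diagnoses": ["咽炎"]},
]
visits = group_into_visits(records)
```

In this toy input the first two records collapse into one visit with two drugs, while the third record, from a different hospital, forms its own visit.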

Diagnosis classification

Outpatient diagnoses were processed in three steps (see figures 1 and 2 in online supplementary appendix 3): standard tiers classifying, RE establishing, and dictionary and pattern mapping.

Standard tiers classifying

In the first step, based on the standard descriptions and classification of the 2016 Chinese version of the International Statistical Classification of Diseases and Related Health Problems 10th Revision (ICD-10),[14] we classified all diseases into three tiers according to whether antibiotics are indicated: tier 1 if the disease almost always justifies use of antibiotics, such as pneumonia; tier 2 if the disease only sometimes justifies use of antibiotics, such as sinusitis; or tier 3 if the disease almost never justifies use of antibiotics, such as cancer. Two groups of researchers worked on this independently and in parallel. The first group classified the standard diagnoses according to the list of categories established in previously published research.[4 5 11] The other group, consisting of one clinician and one pharmacist, performed the same classification based on their clinical knowledge and experience. If there were any conflicts between the results of the two groups, recommendations for antibiotic use from the Guidelines for Clinical Application of Antibiotics[15] and UpToDate[16] were consulted and the final tiers of the diagnoses were determined. Online supplementary appendix 4 gives the basic rules that we used to classify the diagnoses. Then, the primary list of the tiers of diagnoses was reviewed by an expert in respiratory disease, an expert in infectious disease and an experienced pharmacist from three of the top hospitals in Beijing. Conflicts among the reviewers were discussed and the primary list was modified, if necessary. Finally, we classified 34 337 ICD codes, among which 2465 (7.2%) were classified as tier 1, 2608 (7.6%) as tier 2 and 29 264 (85.2%) as tier 3. The final list of the standard tiers of diagnoses (LiSToD) is available from https://www.researchgate.net/publication/336286590_LiSToD_The_list_of_the_standard_tiers_of_diagnoses_based_on_China-ICD-10.
Online supplementary appendix 5 gives a comparison with the classification schemes from previous studies. Furthermore, we set up more diagnostic subcategories according to the ICD-10 chapters and previous studies,[4 5 17 18] and finally all diagnoses were classified into 15, 21 and 13 different categories of tier 1, tier 2 and tier 3 diagnoses, respectively (see online supplementary appendix 6 for more details).

RE establishing

In the second step, we first established the dictionary and patterns of the clinical terms in the LiSToD. The clinical terms were derived from the following sources: (1) code string descriptions of the Chinese version of ICD-10 and corresponding synonyms or abbreviations; (2) pilot string searches for the diagnoses in the LiSToD; and (3) clinicians’ suggestions about the common abbreviations (in Chinese or in English) of infectious diseases. Similar patterns or key words of different descriptions of the same condition in the raw data were identified and used to establish the REs for information extraction. For example, for Helicobacter pylori infection (coded as A49.809), the standard Chinese name in the ICD-10 system is ‘幽门螺旋杆菌感染’, but it may be written in the raw diagnosis text as ‘幽门螺旋杆菌’, ‘幽门螺杆菌’, ‘幽门螺旋菌’, ‘幽门杆菌’, ‘幽门螺菌’, or even ‘幽门螺旋杆’, ‘幽门螺杆’, or as the English abbreviation ‘HP’, ‘Hp’, ‘hP’ or ‘hp’. Thus, after converting all letters to uppercase, the REs for matching diagnoses of H. pylori infection would be ‘(幽门螺?旋?杆?菌)|(幽门螺旋?杆)’ and ‘([^A-Z]HP[^A-Z])|([^A-Z]HP$)|(^HP[^A-Z])|(^HP$)’ (the second expression requires that the abbreviation ‘HP’ is neither preceded nor followed by any other letter). Finally, we established the lists of REs for diagnoses (REoD) in the LiSToD. Online supplementary appendix 7 gives more details of the rules we used for constructing REs.
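
The H. pylori example can be tried out with a short Python sketch. The two patterns are those quoted above, written with the ASCII caret `^`; the function name `mentions_h_pylori` and the use of Python's `re` module are illustrative assumptions, since the study itself implemented its REs in PL/SQL.

```python
import re

# Patterns quoted in the text for Helicobacter pylori infection.
HP_CHINESE = re.compile(r"(幽门螺?旋?杆?菌)|(幽门螺旋?杆)")
HP_ABBREV = re.compile(r"([^A-Z]HP[^A-Z])|([^A-Z]HP$)|(^HP[^A-Z])|(^HP$)")


def mentions_h_pylori(diagnosis):
    """Return True if the diagnosis text matches either H. pylori pattern."""
    text = diagnosis.upper()  # letters are uppercased before matching
    return bool(HP_CHINESE.search(text) or HP_ABBREV.search(text))


variants = ["幽门螺旋杆菌", "幽门螺杆菌", "幽门螺旋菌", "幽门杆菌", "幽门螺菌",
            "幽门螺旋杆", "幽门螺杆", "HP", "Hp", "hP", "hp"]
all_variants_match = all(mentions_h_pylori(v) for v in variants)
```

Because the abbreviation pattern forbids a neighbouring letter, a string such as ‘HPV’ is not matched, whereas ‘HP’ followed by a Chinese character is.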

Dictionary and pattern mapping

In the third step (figure 2 in online supplementary appendix 3), we first preprocessed the diagnosis text. All punctuation marks (except question marks and short dashes, which may indicate uncertainty and negation) and blanks were converted to semicolons, and all English letters were converted to single-byte, uppercase ones. The raw diagnosis text was then cut into segments, each containing a single disease. Finally, the first five diagnoses were extracted and mapped to the clinical terms in the LiSToD using the REoD. Initially, we tried to identify tier 1 diagnoses in each of the first five diagnoses. For accurate extraction of the clinical information, we then tried to detect modifiers in the identified tier 1 diagnoses, as the underlying meaning of free diagnosis text is significantly affected by other co-occurring concepts.[19] We detected three kinds of modifiers: (1) negation modifiers, such as ‘排除’ (except or exclude), ‘除外’ (except or exclude), ‘阴性’ (negative), ‘非’ (non or not) and the short dash ‘-’; (2) temporal information, such as ‘复查’ (re-examination) and ‘复诊’ (revisit); and (3) uncertainty modifiers, which indicated that the event may not have actually occurred, such as ‘待排除’ (to be excluded), ‘待查’ (unknown origin or to be examined), ‘可能’ (maybe or likely), ‘不能排除’ (cannot be excluded), ‘咨询’ (for consultation or consulting) and the question mark ‘?’. If negation modifiers were detected, the tier 1 diagnosis was changed to tier 3, whereas if temporal or uncertainty modifiers were detected, the tier of the diagnosis remained unchanged and a marker indicating uncertainty was added to the diagnosis. Further, if no tier 1 diagnosis or only negated tier 1 diagnoses were identified, we tried to identify tier 2 diagnoses. Modifiers were also detected for classified tier 2 diagnoses.
A negated tier 2 diagnosis was changed to tier 3, and an uncertainty marker was added when uncertainty information was detected. If no tier 1 or tier 2 diagnosis, or only negated tier 1 or tier 2 diagnoses, were identified, the diagnosis was classified as tier 3. After all of the first five diagnoses had been classified as tier 1, tier 2 or tier 3, with or without uncertainty, the tier-fashion method was applied to assign a single diagnosis to each visit. This means that, for multiple diagnoses in the same visit, priority was given to tier 1 diagnosis without uncertainty marker (tier 1A), followed by tier 1 diagnosis with uncertainty marker (tier 1B), then tier 2 diagnosis without uncertainty marker (tier 2A), then tier 2 diagnosis with uncertainty marker (tier 2B), and finally, tier 3 diagnosis. If multiple diagnoses from a single tier existed in the visit, the first-listed certain diagnosis was assigned. All the procedures above have been encapsulated into a PL/SQL package, which is accessible in online supplementary appendix 8.
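
The mapping step can be sketched in Python as follows. This is a simplified illustration, not the PL/SQL package from the appendix: the tier 1 and tier 2 term lists are tiny hypothetical stand-ins for the REoD, Unicode NFKC normalisation is assumed as the way to obtain single-byte letters, and the order in which modifiers are checked (uncertainty phrases removed before looking for negation, so that ‘不能排除’ is not read as ‘排除’) is an assumption.

```python
import re
import unicodedata

# Hypothetical, heavily abridged term lists; the study's REoD is far larger.
TIER1 = re.compile(r"肺炎")          # pneumonia
TIER2 = re.compile(r"鼻窦炎|咽炎")    # sinusitis, pharyngitis

# Modifier terms quoted in the text (temporal terms also add the marker).
UNCERTAIN = re.compile(r"待排除|不能排除|待查|可能|咨询|复查|复诊|\?")
NEGATION = re.compile(r"排除|除外|阴性|非|-")

PRIORITY = ["1A", "1B", "2A", "2B", "3"]


def preprocess(text):
    """Normalise the raw text and split it into single-diagnosis segments."""
    # NFKC folds full-width letters and punctuation to their ASCII forms.
    text = unicodedata.normalize("NFKC", text).upper()
    # Everything that is not a word character, '?' or '-' becomes ';'.
    text = re.sub(r"[^\w?\-]+", ";", text)
    return [seg for seg in text.split(";") if seg]


def classify_segment(seg):
    """Classify one diagnosis segment as 1A, 1B, 2A, 2B or 3."""
    for tier, pattern in (("1", TIER1), ("2", TIER2)):
        if not pattern.search(seg):
            continue
        uncertain = bool(UNCERTAIN.search(seg))
        # Assumption: drop uncertainty phrases before testing for negation.
        if NEGATION.search(UNCERTAIN.sub("", seg)):
            continue  # negated at this tier; try the next tier
        return tier + ("B" if uncertain else "A")
    return "3"


def classify_visit(text, max_diagnoses=5):
    """Assign one tier to a visit from its first `max_diagnoses` diagnoses."""
    tiers = [classify_segment(s) for s in preprocess(text)[:max_diagnoses]]
    return min(tiers, key=PRIORITY.index, default="3")
```

With these toy lists, `classify_visit("排除肺炎；鼻窦炎")` falls through the negated pneumonia to tier 2A, and a pneumonia written as the sixth diagnosis is ignored, which mirrors the cut-off at five diagnoses described above.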

Reference standard and validation

We selected a random sample of 10 000 outpatient visits that could be linked with diagnosis records during the 4-month period from 1 January 2018 to 30 April 2018 in the database. The sample was representative of the entire database (online supplementary appendix 9). Prescription review was used as the reference standard for classification validation, against which the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) for classifying different tiers of diagnosis were calculated. Two researchers who were trained in the classification scheme and blinded to drug exposure status independently reviewed the raw prescriptions and classified the diagnoses into tier 1, tier 2 or tier 3 according to the LiSToD, with or without uncertainty. Conflicting results of the two researchers were further discussed with a third clinician and the final tier of the diagnosis was determined.

Statistical analysis

We calculated the sensitivity, specificity, PPV and NPV for each tier of diagnosis and the Clopper-Pearson Exact method was used for calculating the 95% CIs. For tier X diagnosis (X was 1, 2, 3, or 1A, 1B, 2A, 2B), the sensitivity was the proportion of true tier X diagnosis that was classified as tier X by the RE algorithm; the specificity was the proportion of true non-tier X that was classified as non-tier X by the RE algorithm; the PPV was the proportion of tier X classified by the RE algorithm that was true tier X diagnosis; the NPV was the proportion of non-tier X classified by the RE algorithm that was true non-tier X diagnosis. The information extraction, and mapping the diagnosis text to the LiSToD using REoD were performed by using Oracle 11gR2 and PL/SQL developer V.11.0 (Oracle Corp., Redwood Shores, California, USA). Statistical analyses were performed by using SAS V.9.4 (SAS Institute).
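
The four measures and the exact interval can be illustrated with a standard-library Python sketch applied to the counts in table 2. The study used SAS for these calculations; the function names here are hypothetical, and the Clopper-Pearson limits are found by bisection on the binomial distribution rather than by a beta quantile function, which is a convenience choice for this sketch.

```python
import math

LABELS = ["1A", "1B", "2A", "2B", "3"]
# Table 2: rows = RE algorithm, columns = manual prescription review.
TABLE2 = [
    [348, 0, 0, 0, 0],
    [1, 41, 0, 0, 0],
    [2, 1, 1074, 0, 0],
    [0, 0, 4, 63, 0],
    [4, 0, 17, 1, 8428],
]


def counts(target):
    """Return (TP, FP, FN, TN), treating the labels in `target` as positive."""
    idx = {i for i, lab in enumerate(LABELS) if lab in target}
    tp = fp = fn = tn = 0
    for r, row in enumerate(TABLE2):
        for c, n in enumerate(row):
            if r in idx and c in idx:
                tp += n
            elif r in idx:
                fp += n
            elif c in idx:
                fn += n
            else:
                tn += n
    return tp, fp, fn, tn


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        total += math.exp(log_n_fact - math.lgamma(i + 1)
                          - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def _bisect(go_right, iterations=60):
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if go_right(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided confidence interval for a proportion x/n."""
    lower = 0.0 if x == 0 else _bisect(
        lambda p: 1 - binom_cdf(x - 1, n, p) < alpha / 2)
    upper = 1.0 if x == n else _bisect(
        lambda p: binom_cdf(x, n, p) > alpha / 2)
    return lower, upper


for name, target in [("Tier 1", {"1A", "1B"}), ("Tier 2", {"2A", "2B"}), ("Tier 3", {"3"})]:
    tp, fp, fn, tn = counts(target)
    lo, hi = clopper_pearson(tp, tp + fn)
    print(f"{name}: sensitivity {tp / (tp + fn):.1%} (95% CI {lo:.1%} to {hi:.1%}), "
          f"specificity {tn / (tn + fp):.1%}, PPV {tp / (tp + fp):.1%}, NPV {tn / (tn + fn):.1%}")
```

For tier 1, for instance, the counts are 390 true positives, 0 false positives, 7 false negatives and 9587 true negatives, giving a sensitivity of 390/397.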

Patient and public involvement

Patients and the public were not involved in the design or conduct of the study.

Results

Out of the 10 000 outpatient visits, 91.1% occurred in outpatient clinics and 92.9% occurred in tertiary-level hospitals. Further, 90.5% of the patients were adults and 51.0% were women. Most outpatient visits were from the Eastern and Western regions, which accounted for 54.8% and 30.2% of all the visits, respectively (table 1). One thousand and ninety-two visits (10.9%) ended with an antibiotic prescription, and 1417 individual antibiotics were prescribed. The most commonly used antibiotics in the dataset were cefdinir, azithromycin, levofloxacin, cefixime and moxifloxacin. In total, 9984 of the 10 000 visits could be linked to valid diagnoses; the diagnoses of the remaining 16 visits were pure numbers or punctuation that did not contain useful information. Among the 10 000 visits, 66% had just one diagnosis, over 85% had no more than two diagnoses, and only 2.2% had more than five diagnoses. Among all diagnosis records, the median length of the diagnosis text was 6 (IQR 4–10) characters; 73.1% contained fewer than 10 characters.

Table 1

Basic characteristics of the sample outpatient visits*

Subgroups	No of prescriptions	Percentage (%)
Type of patients
Outpatient clinic	9105	91.1
Emergency department	895	9.0
Age group
0–5	430	4.3
6–17	480	4.8
18–44	4180	41.8
45–64	3129	31.3
≥65	1742	17.4
Unknown	39	0.4
Gender of patients
Female	5099	51.0
Male	4893	48.9
Unknown	8	0.1
Hospital level
Second	711	7.1
Third	9289	92.9
Region of China†
Eastern	5476	54.8
Central	542	5.4
Western	3021	30.2
North-eastern	961	9.6
Top five most used antibiotics
Cefdinir	127	9.0
Azithromycin	126	8.9
Levofloxacin	111	7.8
Cefixime	103	7.3
Moxifloxacin	79	5.6
No of diagnoses
Non-valid diagnosis‡	16	0.2
One diagnosis	6597	66.0
Two diagnoses	1921	19.2
Three diagnoses	753	7.5
Four diagnoses	330	3.3
Five diagnoses	163	1.6
>5 diagnoses	220	2.2
Length of diagnosis text§
Non-valid diagnosis	16	0.2
1–4 characters	3645	36.5
5–9 characters	3662	36.6
10–14 characters	1477	14.8
15–19 characters	574	5.7
≥20 characters	626	6.3

*For rounding reasons, the sum of the percentages of some subgroups may not be exactly equal to 100%.

†Regions of China were divided according to the National Bureau of Statistics of China. http://www.stats.gov.cn/ztjc/zthd/sjtjr/dejtjkfr/tjkp/201106/t20110613_71947.htm

‡Diagnoses that contained only numbers, punctuation (eg, comma, semicolon, exclamation mark, etc) and other non-Chinese characters told nothing about the indication for antibiotics and were defined as invalid diagnoses.

§Whitespace and punctuation were not counted.

Among the 9984 visits with valid diagnoses, 3.9% (n=390), 11.5% (n=1144) and 84.6% (n=8450) were classified by the RE algorithm as tier 1, tier 2 and tier 3 diagnoses, respectively (table 2). The sensitivities of the RE algorithm were 98.2% (95% CI 96.4% to 99.3%), 98.4% (95% CI 97.6% to 99.1%) and 100.0% (95% CI 100.0% to 100.0%) for classifying tier 1, tier 2 and tier 3 diagnoses, respectively. The specificities were 100.0% (95% CI 100.0% to 100.0%), 100.0% (95% CI 99.9% to 100.0%) and 98.6% (95% CI 97.9% to 99.1%). The PPVs for classifying tier 1, tier 2 and tier 3 diagnoses were 100.0% (95% CI 99.1% to 100.0%), 99.7% (95% CI 99.2% to 99.9%) and 99.7% (95% CI 99.6% to 99.8%), respectively. The NPVs were 99.9% (95% CI 99.8% to 100.0%), 99.8% (95% CI 99.7% to 99.9%) and 100.0% (95% CI 99.8% to 100.0%), respectively, for tier 1, tier 2 and tier 3 diagnoses (table 3). In addition, the RE-based algorithm had sufficiently high accuracy in detecting the diagnosis modifiers, with all sensitivities, specificities, PPVs and NPVs for classifying diagnoses with uncertainty (tier 1B and tier 2B diagnoses) approaching or exceeding 95%. Online supplementary appendix 10 gives more detailed results for all 49 subcategories under the different diagnosis tiers.

Table 2

Confusion matrix for the regular expression (RE)-based diagnosis classification algorithm*

Classified by the RE algorithm (rows)	Classified by manual prescription review (columns)
Classified by the RE algorithm	Tier 1A	Tier 1B	Tier 2A	Tier 2B	Tier 3
Tier 1A	348	0	0	0	0
Tier 1B	1	41	0	0	0
Tier 2A	2	1	1074	0	0
Tier 2B	0	0	4	63	0
Tier 3	4	0	17	1	8428

*Tier 1A: tier 1 diagnoses without uncertainty. Tier 1B: tier 1 diagnoses with uncertainty. Tier 2A: tier 2 diagnoses without uncertainty. Tier 2B: tier 2 diagnoses with uncertainty.

Table 3

Validation of the regular expression-based classification algorithm

Diagnosis tiers*	Sensitivity (%, 95% CI)	Specificity (%, 95% CI)	PPV (%, 95% CI)	NPV (%, 95% CI)
Tier 1A	98.0 (96.0 to 99.2)	100.0 (100.0 to 100.0)	100.0 (98.9 to 100.0)	99.9 (99.9 to 100.0)
Tier 1B	97.6 (87.4 to 99.9)	100.0 (99.9 to 100.0)	97.6 (87.4 to 99.9)	100.0 (99.9 to 100.0)
Tier 2A	98.1 (97.1 to 98.8)	100.0 (99.9 to 100.0)	99.7 (99.2 to 99.9)	99.8 (99.6 to 99.9)
Tier 2B	98.4 (91.6 to 100.0)	100.0 (99.9 to 100.0)	94.0 (85.4 to 98.3)	100.0 (99.9 to 100.0)
Tier 1	98.2 (96.4 to 99.3)	100.0 (100.0 to 100.0)	100.0 (99.1 to 100.0)	99.9 (99.8 to 100.0)
Tier 2	98.4 (97.6 to 99.1)	100.0 (99.9 to 100.0)	99.7 (99.2 to 99.9)	99.8 (99.7 to 99.9)
Tier 3	100.0 (100.0 to 100.0)	98.6 (97.9 to 99.1)	99.7 (99.6 to 99.8)	100.0 (99.8 to 100.0)

*Tier 1A: tier 1 diagnoses without uncertainty. Tier 1B: tier 1 diagnoses with uncertainty. Tier 2A: tier 2 diagnoses without uncertainty. Tier 2B: tier 2 diagnoses with uncertainty.

NPV, negative predictive value; PPV, positive predictive value.

An analysis of the classification errors was performed (table 4). In total, diagnoses from 30 visits were inaccurately classified: 17 because the infectious disease was written after the fifth diagnosis; 12 because of inappropriate writing of the diagnoses (a single diagnosis was improperly split or multiple diagnoses were incorrectly concatenated together); and 1 because of the use of traditional Chinese characters, which were not considered when constructing the REs.

Table 4

Reasons for inaccurate classifications of the regular expression (RE)-based algorithm

Classified by the RE algorithm*	Classified by manual prescription review*	Reasons for inaccurate classifications†
Tier 1B	Tier 1A	Multiple diagnoses incorrectly concatenated together (n=1).
Tier 2A	Tier 1A	The infectious disease written after the fifth diagnosis (n=2).
Tier 2A	Tier 1B	The infectious disease written after the fifth diagnosis (n=1).
Tier 2B	Tier 2A	Multiple diagnoses incorrectly concatenated together (n=4).
Tier 3	Tier 1A	Single diagnosis improperly split (n=1); the infectious disease written after the fifth diagnosis (n=3).
Tier 3	Tier 2A	Multiple diagnoses incorrectly concatenated together (n=5); single diagnosis improperly split (n=2); the infectious disease written after the fifth diagnosis (n=10); traditional Chinese used (n=1).
Tier 3	Tier 2B	The infectious disease written after the fifth diagnosis (n=1).

*Tier 1A: tier 1 diagnoses without uncertainty. Tier 1B: tier 1 diagnoses with uncertainty. Tier 2A: tier 2 diagnoses without uncertainty. Tier 2B: tier 2 diagnoses with uncertainty.

†There was a visit (classified as tier 3 by computer and tier 2A by manual review) in which one diagnosis was divided into two diagnoses and the second part was written together with another one; thus, the total number of incorrect classification for different reasons was 31 in this table.


Discussion

We developed a rule-based approach with high validity which could be used for identifying whether the use of antibiotics in outpatient prescriptions is appropriate in the Chinese context. Our findings indicated that the sensitivities, specificities, PPVs and NPVs of the algorithm were all over 98% for classifying tier 1, tier 2 and tier 3 diagnosis according to whether antibiotics were indicated by using the previously proposed method.[4 5 11] To our knowledge, there is no current estimate of the appropriateness of outpatient antibiotic prescribing at the national level in China. This paper presents an approach to extract structured diagnosis information that represents the antibiotic usage to support subsequent pharmacoepidemiology studies, rather than depending on the manual review of prescriptions, which is very time consuming and involves high cost of labour. In addition, this approach can provide a method to evaluate antibiotic use for different categories of diseases by using large EMRs and administrative data. The study is the first to use rule-based natural language processing (NLP) to establish the classification system for evaluating inappropriate prescribing of antibiotics using Chinese diagnosis text. 
NLP has been used for over 30 years to identify key clinical information from unstructured and semistructured text.[19–21] As the amount of EMRs and administrative data is increasing rapidly and most of the information is stored in unprocessed and heterogeneous textual formats, NLP plays a crucial role in transforming narrative medical text into structured data.[22 23] The main approaches to NLP are rule based,[24] machine learning[25] or hybrid approaches.[8 26] Machine learning can offer easier portability solutions, while rule-based methods tend to provide reliable results.[24] NLP has been used for extracting and standardising information on drugs,[9 27] for automatic detection of diseases,[28 29] and for deidentification of protected health information.[8] However, as most NLP research has been performed in English, similar research in Chinese is relatively limited.[30-32] The primary reason for incorrect classification using our algorithm was that diagnoses of infectious diseases were not written or inputted in a priority order. In this study, 3 tier 1 and 11 tier 2 diagnoses were misclassified as tier 3 because the diagnoses of the infections were written after the fifth one. The Prescription Administrative Policy[33] and the Regulations on Prescription Review Management[34] issued by the Ministry of Health of China (now the National Health Commission) stipulate that no more than five kinds of drugs can be prescribed in the same prescription, and thus logically, if a clinician wants to prescribe antibiotics, he/she should mention the infectious diseases in the first five diagnoses; otherwise, the prescription will be considered irrational. In addition, clinicians tend to write the primary diagnosis first, with other diagnoses written as complications or comorbidities. This was reflected in our research, as nearly 90% of tier 1 and tier 2 diagnoses were in the first two diagnoses.
Thus, these 14 visits, misclassified because the infectious diagnosis was not written in priority order, may not actually represent misclassifications, and assigning a single diagnosis to a visit based on more than five diagnoses may introduce a risk of overdiagnosis. Another kind of error was due to structural ambiguity in how diagnoses were written. There were cases where the diagnosis text could be interpreted in contrary ways, since a single diagnosis was improperly split, or multiple diagnoses belonging to different tiers were incorrectly concatenated together. This resulted in incorrect word segmentation. In the first case, part of the key information was cut into another independent diagnosis, which may carry the opposite meaning, resulting in the mapping of incomplete information and misclassification. In the other case, since we performed the classification using a step-by-step process, the tier 1 or tier 2 diagnosis was detected in the first mapping process, while in the second process, involving detection of modifiers or lower tier diagnoses, negation was detected or another tier was mapped because of the coexistence of contrary information. Typos and traditional Chinese characters created another category of errors that may have led to misclassification, but this was very rare in this study. There are many homophonic and homomorphic words in the Chinese language, especially homophones; for example, ‘幽门’ (Chinese pronunciation ‘youmen’), which means pylorus in the diagnosis of H. pylori, may be inputted as ‘油门’ (Chinese pronunciation ‘youmen’), which means accelerator and has nothing to do with disease. Thus, typos and traditional Chinese characters are likely to occur in a larger amount of diagnosis text, and their frequency may depend on which type of input method is used by clinicians, since Chinese characters can be inputted through pronunciation-based (the Chinese Pinyin) and glyph-based (the five stroke) input methods, making this kind of error difficult for the rule-based method to detect.

Strengths and limitations

Our study had some strengths. The rule-based and tier-fashion algorithm that we used provided a feasible and validated method to evaluate the appropriateness of antibiotic prescribing by using big EMRs. It can process diagnoses extremely rapidly compared with manual prescription review, with less than 30 s needed to classify all the sample diagnoses. The algorithm was effective and had good extensibility and portability, as it is easy to add new REs to, or remove old ones from, the REoD we developed. Since REs are well supported by most common database management systems and statistical software, it is easy for other researchers to reuse our algorithm for conducting similar research using other kinds of data written in Chinese text. However, there were limitations in our study. Building rule-based systems is often time consuming.[20] In this study, the first two steps for establishing the LiSToD and REoD took us several months. The validity of our method depended heavily on whether the LiSToD included all the bacterial infections, and whether the REoD contained all possible patterns of diseases in the LiSToD. Medicine is a large and complex domain with rich synonyms and semantically similar and related concepts.[23] In addition, medical abbreviations and acronyms are common and can also be ambiguous, making it difficult to identify them.[20] Although we applied multiple strategies to make the list of synonyms, abbreviations and similar patterns of infectious disease as complete as possible, some variations of standard concepts in the ICD-10 may still have been missed. Since we can only extract features through the manually constructed word list, some omissions may occur in much larger diagnosis data.

Conclusions

In conclusion, to our knowledge, our study is the first to use a rule-based algorithm to establish a classification system for evaluating inappropriate prescribing of antibiotics using Chinese diagnosis text. Further studies focusing on antibiotics in China can apply this validated algorithm to evaluate the appropriateness of antibiotic use using big EMR or administrative data.

References (10 of 27 shown)

1. An ontology-based measure to compute semantic similarity in biomedicine.

Authors: Montserrat Batet; David Sánchez; Aida Valls
Journal: J Biomed Inform Date: 2010-09-15 Impact factor: 6.317

2. Meeting the challenge of antibiotic resistance.

Authors: Otto Cars; Liselotte Diaz Högberg; Mary Murray; Olle Nordberg; Satya Sivaraman; Cecilia Stålsby Lundborg; Anthony D So; Göran Tomson
Journal: BMJ Date: 2008-09-18

3. Natural Language Processing in Radiology: A Systematic Review.

Authors: Ewoud Pons; Loes M M Braun; M G Myriam Hunink; Jan A Kors
Journal: Radiology Date: 2016-05 Impact factor: 11.105

4. Natural language processing in pathology: a scoping review.

Authors: Gerard Burger; Ameen Abu-Hanna; Nicolette de Keizer; Ronald Cornet
Journal: J Clin Pathol Date: 2016-07-22 Impact factor: 3.411

5. European Surveillance of Antimicrobial Consumption (ESAC): disease-specific quality indicators for outpatient antibiotic prescribing.

Authors: Niels Adriaenssens; Samuel Coenen; Sarah Tonkin-Crine; Theo J M Verheij; Paul Little; Herman Goossens
Journal: BMJ Qual Saf Date: 2011-03-21 Impact factor: 7.035

6. Defining the appropriateness and inappropriateness of antibiotic prescribing in primary care.

Authors: David R M Smith; F Christiaan K Dolk; Koen B Pouwels; Morag Christie; Julie V Robotham; Timo Smieszek
Journal: J Antimicrob Chemother Date: 2018-02-01 Impact factor: 5.790

7. Prevalence of Inappropriate Antibiotic Prescriptions Among US Ambulatory Care Visits, 2010-2011.

Authors: Katherine E Fleming-Dutra; Adam L Hersh; Daniel J Shapiro; Monina Bartoces; Eva A Enns; Thomas M File; Jonathan A Finkelstein; Jeffrey S Gerber; David Y Hyun; Jeffrey A Linder; Ruth Lynfield; David J Margolis; Larissa S May; Daniel Merenstein; Joshua P Metlay; Jason G Newland; Jay F Piccirillo; Rebecca M Roberts; Guillermo V Sanchez; Katie J Suda; Ann Thomas; Teri Moser Woo; Rachel M Zetts; Lauri A Hicks
Journal: JAMA Date: 2016-05-03 Impact factor: 56.272

8. A Text Structuring Method for Chinese Medical Text Based on Temporal Information.

Authors: Runtong Zhang; Fuzhi Chu; Donghua Chen; Xiaopu Shang
Journal: Int J Environ Res Public Health Date: 2018-02-27 Impact factor: 3.390

9. Appropriateness of outpatient antibiotic prescribing among privately insured US patients: ICD-10-CM based cross sectional study.

Authors: Kao-Ping Chua; Michael A Fischer; Jeffrey A Linder
Journal: BMJ Date: 2019-01-16

10. Development and validation of method for defining conditions using Chinese electronic medical record.

Authors: Yuan Xu; Ning Li; Mingshan Lu; Robert P Myers; Elijah Dixon; Robin Walker; Libo Sun; Xiaofei Zhao; Hude Quan
Journal: BMC Med Inform Decis Mak Date: 2016-08-20 Impact factor: 2.796
