Literature DB >> 36175526

A tree-based scan statistic for zero-inflated count data in post-market drug safety surveillance.

Goeun Park1, Inkyung Jung2.   

Abstract

After new drugs enter the market, adverse events (AE) induced by their use must be tracked; rare AEs may not be detected during clinical trials. Some organizations have been collecting information on suspected drugs and AEs via a spontaneous reporting system to conduct post-market drug safety surveillance. These organizations use the information to detect a signal representing potential causality between drugs and AEs. The drug and AE data are often hierarchically structured. Accordingly, the tree-based scan statistic can be used as a statistical data mining method for signal detection. Most of the AE databases contain a large number of zero-count cells. Notably, not only an observational zero from the Poisson distribution, but also a true zero exists in zero-count cells. True zeros represent theoretically impossible observations or possible but unreported observations. The existing tree-based scan statistic assumes that all zeros are zero-valued observations from the Poisson distribution. Therefore, true zeros are not considered in the modeling, which can lead to bias in the inferences. In this study, we propose a tree-based scan statistic for zero-inflated count data in a hierarchical structure. According to our simulation study, in the presence of excess zeros, our proposed tree-based scan statistic provides better performance than the existing tree-based scan statistic. The two methods were illustrated using Korea Adverse Event Reporting System data from the Korea Institute of Drug Safety and Risk Management.
© 2022. The Author(s).

Entities:  

Mesh:

Year:  2022        PMID: 36175526      PMCID: PMC9522808          DOI: 10.1038/s41598-022-19998-5

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.996


Introduction

After new drugs enter the market, the adverse events (AE) induced by their use must be tracked because rare AEs may not be detected during clinical trials owing to short trial durations, limited sample sizes, or limited population representation. Once drugs are commercialized, they are used in different ways and by more people than those covered during clinical trials. Accordingly, drug safety must be monitored even after commercialization to identify AEs that may not have been identified previously[1-7]. Drug and vaccine safety monitoring systems have traditionally been based on spontaneous reporting systems, such as the US Food and Drug Administration’s Adverse Event Reporting System (AERS), the US Vaccine Adverse Event Reporting System (VAERS), and VigiBase, the World Health Organization’s (WHO) global Individual Case Safety Reports database. AERS is a large database supporting the US Food and Drug Administration’s program for monitoring drug safety; VAERS helps monitor vaccine-related AEs and is maintained by the US Center for Disease Control and Prevention and the US Food and Drug Administration; and VigiBase is managed by the Uppsala Monitoring Centre (UMC) on behalf of the WHO. VigiBase receives individual case safety reports from 80 countries. In South Korea, the Korea Institute of Drug Safety and Risk Management provides information on AEs collected through the Korea Adverse Event Reporting System (KAERS) to the UMC. These spontaneous reporting systems play an important role in detecting AE signals in post-market drug safety surveillance[8,9]. Disproportionality data mining methods have been used to analyze these databases to identify signs that certain drugs may be posing unrecognized safety hazards. Frequentist methods, such as the proportional reporting ratio[10], relative odds ratio[11], Yule’s test[12], chi-squared test[13], and likelihood ratio test (LRT)[14], and Bayesian methods, including the Bayesian confidence propagating neural network[15], multi-item gamma Poisson shrinker[16], and simplified Bayes (sB) methods[15-19] are often used to detect drugs with previously unrecognized AE[16,20-25]. In pharmacovigilance data, AE information uses adverse reaction terms, which have a hierarchical structure. For example, as shown in Fig. 1, the WHO Adverse Reaction Terminology (WHO-ART) developed for the WHO drug monitoring program has a four-level hierarchical structure. (https://www.who-umc.org) Owing to this type of structure, it is difficult to determine the level of AE definition that should be used during data mining. To solve the problem, tree-based scan statistics, which find signals at each level of AEs in the form of a hierarchical tree, have been proposed by Kulldorff et al.[26] and have been recently used by some researchers to detect AE signals[27-29]. The tree-based scan statistic is distinct from most disproportionality methods; it is based on scan statistical theory and uses a hierarchical diagnosis tree to simultaneously assess risk at any level of granularity, adjusting for a multiple testing problem in several overlapping evaluated groups[7,26,30].
Figure 1

WHO-ART structure.

WHO-ART structure. Most of these AE databases have large numbers of zero-count cells. For example, AERS data from 2006 to 2011 show that the percentage of zero-count cells by the drug ranges from 50 to 99.99%[31]. However, based on KAERS data from 2012 to 2016, the percentage of zero-count cells by the drug ranges from 75 to 100%. Zero-count cells may contain not only zero-valued observations from the Poisson distribution, but also true zeros, which represent theoretically impossible observations or possible but unreported observations. Data with a large number of zeros cannot be assumed to have a Poisson distribution as some zeros are true zeros. The distribution of such data is typically more dispersed than the Poisson distribution, resulting in equality between the variance and the mean of the distribution. To solve this problem, the zero-inflated Poisson (ZIP) model proposed by Lambert[32] can be used. Huang et al.[31,33] proposed a zero-inflated Poisson model based likelihood ratio test (ZIP-LRT) method as an extended version of LRT, a frequentist data mining method. Further, Hu et al.[24] developed the zero-inflated Poisson simplified Bayes method and the zero-inflated Poisson Dirichlet process method, which are Bayesian data mining methods. The existing tree-based scan statistic assumes all zero values are zero-valued observations from the Poisson distribution. As a result, true zeros are not considered in the modeling, which can lead to bias in the inferences. Therefore, in this study, we proposed a new tree-based scan statistic using the ZIP model for data with excess zeros in a hierarchical structure. In section “A tree-based scan statistic”, we introduce the existing tree-based scan statistic. In section “A tree-based scan statistic for zero inflated count data”, we propose a tree-based scan statistic for zero-inflated count data. In section “Simulation study”, a simulation study to evaluate the performance of the proposed method is presented. In section “Real data”, the two methods are compared through a real data example. Finally, in section “Conclusion and discussion”, we summarize the results and conclude with our recommendations.

Hierarchical diagnosis tree

The tree-based scan statistic uses hierarchical classification systems to represent clinical concepts, such as drugs, procedures, or diagnoses[30]. To code adverse drug reactions in postmarket drug surveillance, medical terminologies, such as Medical Dictionary for Regulatory Activities (MedDRA) and WHO-ART, are used. In the KEARS data, WHO-ART is used to code the AEs. WHO-ART is the terminology for coding clinical information related to pharmacotherapy and is commonly used for coding the AEs. When new drugs and new symptoms create new terms that incorporate them, the structure of the terms is updated to include the newly integrated terms while retaining their previous relationships and the existing structure of terms. WHO-ART has a four-level hierarchical structure, which consists of System Organ Class (SOC), High Level Terms (HLT), Preferred Terms (PT), and Included Terms (IT). The highest level, the SOC, corresponds to body systems and organs, which contain grouping terms. The HLT is used to group related or similar PTs, but all PTs are not grouped into the HLT. The PTs are principal terms used to describe AEs and the ITs are synonyms of the PTs, which help in the search for the PTs. An example of the WHO-ART is shown in Table 1.
Table 1

Example of WHO-ART.

SOCPTITDiagnosisHLT
0800Metabolic and nutritional disorders
08000363Acidosis0363
08000363003Bicarbonate reserve decreased0363
08000363004PH reduced0363
08000363005Acidosis Metabolic0363
08000363006Blood bicarbonate decreased0363
08000363007Blood PH decreased0363
08000363008Acidosis hyperchloraemic0363
08000364Acidosis lactic0363
08000364003Lactate blood increase0363
08000393Ketosis0363
08000393003Ketoacidosis0363
08000393004Acetonuria0363
08000393005Acetone breath0363
08000393006Acetonaemia0363
08000393007Diabetic ketoacidosis0363
08001465Acidosis respiratory0363
08001465002Blood carbon dioxide increased0363
Example of WHO-ART.

A tree-based scan statistic

Review of a tree-based scan statistic

The tree-based scan statistic is a statistical data mining method that has been used for signal detection in a hierarchically structured data, such as a classification system for coding AEs. This statistic searches signals at any level of AE definitions, called leaves. Each leaf contains information on the total number of patients with a specific AE and the number of patients with a specific AE from a certain drug. Mutually-related leaves are grouped into a higher level, called a node. Of note, a cut defines a branch of the tree where a node or a leaf may have more events than expected. The tree-based scan statistic method considers all possible cuts. For each cut, the total number of AEs from all drugs and a certain drug are respectively calculated for the leaves within that cut. The test statistic is generated by a likelihood function in which risk is estimated separately for the leaves defined by the cut and those outside of the cut[26,34,35]. Let be the observed number of patients with ith AE potentially caused by a certain drug in leaf and be the total observed number of patients with th AE in leaf . For a rare disease, with covariates ignored, is approximately Poisson distributed with mean , where is the probability that th AE is caused by a certain drug. For all leaves on the tree, let and where I is the number of all leaves in the tree. For each cut G, a leaf or a group of related leaves, let and . R is the rest of the leaves except those included in G. The following null hypothesis and the alternative hypothesis are considered. The null hypothesis suggests that the probability that AEs in a cut G due to a certain drug are not lower or higher than that of all AEs. The alternative hypothesis is that at least one cut is defined by a set G such that , where R is a group of the remaining leaves. Of note, the analysis is only concerned with C, as the total number of AEs represented by the tree is not of interest. In fact, only the relative distribution between the different AEs is relevant. The likelihood can then be expressed as using a multinomial distribution. As a maximum likelihood estimator (MLE) of is given G, a likelihood ratio test statistic is when ; otherwise, the statistic is 1. The log-likelihood ratio-based test statistic is given by where I() is the indicator function[26].

Hypothesis testing

To calculate the test statistic T, the likelihood of each possible cut was determined. The cut, which is maximizing the likelihood ratio value, is defined as the most likely cut; the likelihood ratio value is defined as the test statistic T. As the null distribution of the test statistic is unknown, it is produced using the Monte Carlo simulation[36]. Given the total number of patients with AEs from a certain drug, a large number of random data sets was created under the null hypothesis, and the test statistics for each random data set and the real data were calculated. The obtained test statistics for random datasets were compared to the test statistic for the real data. The P-value was calculated using the equation: rank/(1 + B), where rank is the relative position of the test statistic for the real data among the test statistics for the random data sets and B is the number of Monte Carlo replications.

A tree-based scan statistic for zero-inflated count data

In the presence of excess zero, the Poisson model tends to underestimate the observed dispersion. In this case, the ZIP model can be employed as one of the approaches to resolve the problem as this model is more flexible than the Poisson model. If the number of ith AE with a certain drug follows the ZIP model, with the probability p of a true zero and the average number of events , , the mean and variance can be expressed as and . It can also be expressed as ; thus, when p > 0. As the ZIP model has an additional parameter relative to the tree-based scan statistic, its mean is smaller than that of the Poisson model. Thus, the ZIP model correctly calculates a reduced number of ith AEs with a certain drug due to the presence of true zeros. Given the parameters and , the probability of is described as follows: For the tree-based ZIP scan statistic, the hypotheses of interest are the same as those in section “Review of a tree-based scan statistic”. The zeros are assumed to be known, whether or not they are true zeros, as it is difficult to find a closed form of MLE when the nature of each zero is unknown. As tree-based scan statistics are based on scan statistic theory, the methodology of Cançado et al.[37], who proposed a spatial scan statistical method for zero-inflated Poisson processes, was employed. We consider a vector where for a true zero in leaf and for an observational zero in leaf . s are Bernoulli random variables with the probability p of a true zero. Given a set of observations that are bivariate data such that , , the likelihood function for set G can be expressed as When s are known, the MLEs under the null hypothesis are . However, under the alternative hypothesis, the MLEs are . When s are unknown, an expectation–maximization (EM) algorithm is used to find the MLEs of and . In the expectation step (E-step), the expected value of , given , is calculated using the following formula: Under , is considered and in each cut G and the remaining leaves R, respectively. In the maximization step (M-step), the MLEs of , and are updated via the equations with replaced by when s are known. Until the maximum likelihood estimates for each possible cut G converge, the above E- and M-steps are performed repeatedly. To perform a faster calculation, we used the ‘zeroinfl’ function in the R package “pscl”[38]. For the possible candidate cuts, this process should be conducted and the most likely cut should be determined. The likelihood ratio for cut G can be expressed as Thereafter, the maximum likelihood ratio is defined as the test statistic, As it is impossible to know the null distribution of the likelihood ratio test statistic T, Monte Carlo hypothesis testing was conducted to assess statistical significance[37].

Simulation study

Data generating process and performance assessment measures

We conducted a simulation study to assess the performance of the proposed tree-based scan statistic for zero-inflated count data (TreeScan-ZIP) and the existing tree-based scan statistic (TreeScan-Poisson). For the simulation study, datasets with the hierarchical structure where AEs can be expressed in terms of WHO-ART SOCs and PTs were generated. Only 105 of the 1292 AEs in the PT terms were considered to reduce computation time. Different artificial true signals and true zeros were generated using a tree with 105 leaves and 9 nodes. The total numbers of patients with each AE varied from 10 to 4670. The total number of patients in all leaves of the tree was 19,920 and the total number of patients with AEs from a certain drug was 640. First, true zeros were randomly allocated using the Bernoulli distribution with the probability p, where p is the percentage of the true zero leaves. Thereafter, for each iteration, the total number of patients with AEs from a certain drug, that is , was randomly assigned to the leaves on the tree as multinomial, with probabilities proportional to the relative risk. The relative risk of ith leaf was computed as For true zero leaves, . If the ith leaf was not a true zero, the dataset was generated using a multinomial distribution. Under H, the vector follows a multinomial distribution with parameters and , where . Under , , where are the relative risks of all types of AEs. The relative risk of the randomly selected true signal leaves ranged from 3, 4, and 2 to 6; however, for the other leaves, except the true zero leaves, the relative risk was equal to 1. Based on the total number of cases, C = 640, we considered 0, 10, 30, 50, and 70 for the number of true zero leaves, and 1%, 3%, 5%, and 10% for the true signal leaves with the relative risk (RR). All possible combinations were simulated. To evaluate the performance of the two methods, we computed type I error, power, sensitivity, and positive predicted value (PPV). First, the critical value T* was obtained from 10,000 random datasets under H by the Monte Carlo replications for each scenario according to the number of true zeros (0, 10, 30, 50, 70). Thereafter, B random datasets were generated under and to calculate type I error, power, sensitivity, and PPV. For each of the B random datasets, test statistic was calculated using both methods. Thereafter, type I error and power were estimated using Sensitivity and PPV for each random datasets are expressed as Overall sensitivity and PPV were calculated as the average of sensitivity and PPV over random datasets, where .

Results

The results obtained using the simulated data are presented in Table 2. The type I errors for the TreeScan-Poisson and TreeScan-ZIP methods were close to 0.05, except when the data had a Poisson distribution. The type I error of the TreeScan-Poisson method was above the nominal significance level of 0.05, while the type I error of the TreeScan-ZIP method tended to be less than 0.05.
Table 2

Type I error, power, sensitivity and positive predictive value obtained by the two methods according to the number of true signals and relative risk.

True zeroTrue signalRRTreeScan-PoissonTreeScan-ZIP
Power*SensitivityPPVPower*SensitivityPPV
000.0430.043
230.0610.1440.2750.0610.1440.275
240.0860.2600.4900.0860.2610.491
2(3.8, 6)0.1250.3430.6540.1260.3390.648
630.8330.1760.9760.8330.1760.976
640.9880.1920.9840.9890.1920.983
6(3.8, 6)1.0000.2090.9871.0000.2090.986
731.0000.3790.9951.0000.3800.995
741.0000.4290.9971.0000.4290.997
7(3.8, 6)1.0000.4510.9971.0000.4510.997
1331.0000.2340.9961.0000.2350.996
1341.0000.2970.9981.0000.2990.998
13(3.8, 6)1.0000.3860.9991.0000.3890.999
1000.0520.046
230.0470.0000.0000.0700.1730.338
240.0420.0000.0000.1040.3040.581
2(3.8, 6)0.0400.0000.0000.1590.3880.732
630.0020.0690.4170.9520.1920.983
640.0850.1660.9931.0000.2410.988
5(3.8, 6)0.9010.2001.0001.0000.3880.992
731.0000.2861.0001.0000.3980.998
741.0000.2861.0001.0000.4340.999
8(3.8, 6)1.0000.2761.0001.0000.3890.999
1330.9960.1531.0001.0000.2500.998
1341.0000.1541.0001.0000.3180.999
13(3.8, 6)1.0000.1641.0001.0000.4041.000
3000.0510.049
230.0440.0000.0000.0710.2050.390
240.0370.0000.0000.1150.3380.629
2(3.8, 6)0.0350.0000.0000.1800.4040.742
630.0020.0000.0000.9820.2420.986
640.0000.0000.0001.0000.3390.991
5(3.8, 6)0.1740.2001.0001.0000.4520.993
830.8300.2321.0001.0000.3590.998
841.0000.2501.0001.0000.3880.999
8(3.8, 6)1.0000.2501.0001.0000.4210.999
1430.7330.1281.0001.0000.2580.998
1441.0000.1431.0001.0000.3260.999
14(3.8, 6)1.0000.1431.0001.0000.4071.000
5000.0520.051
230.0400.0000.0000.1030.2980.543
240.0330.0000.0000.1840.4230.761
2(3.8, 6)0.0230.0000.0000.3130.5190.875
630.0010.0000.0000.9980.3490.990
640.0000.1671.0001.0000.3970.993
6(3.8, 6)0.2580.1671.0001.0000.4040.995
830.9690.2471.0001.0000.4090.998
841.0000.2501.0001.0000.4640.999
8(3.8, 6)1.0000.2501.0001.0000.5291.000
1430.9080.1381.0001.0000.2871.000
1441.0000.1431.0001.0000.3621.000
14(3.8, 6)1.0000.1431.0001.0000.4661.000
7000.0500.049
230.0400.0000.0000.2260.4990.826
240.0350.0000.0000.4620.5860.926
2(3.8, 6)0.0330.0000.0000.7350.6640.967
630.0001.0000.4170.996
640.0130.1671.0001.0000.4990.997
6(3.8, 6)0.9690.1671.0001.0000.5270.999
731.0000.2861.0001.0000.4290.999
741.0000.2861.0001.0000.4770.999
7(3.8, 6)1.0000.2861.0001.0000.5860.999
1331.0000.1541.0001.0000.2490.999
1341.0000.1541.0001.0000.2940.999
13(3.8, 6)1.0000.1431.0001.0000.4661.000

* Type I error when the number of true signals is 0.

Type I error, power, sensitivity and positive predictive value obtained by the two methods according to the number of true signals and relative risk. * Type I error when the number of true signals is 0. When the data did not include true zeros (i.e., the data were generated from the Poisson distribution), the TreeScan-Poisson and TreeScan-ZIP methods produced similar power, sensitivity, and PPV estimates. The TreeScan-ZIP method was identified to produce higher power and sensitivity estimates than the TreeScan-Poisson method when the number of true zeros was greater than or equal to 10. In the presence of zero inflation, when the number of true signals was greater than or equal to 5 and the RR was high, the PPV of the TreeScan-Poisson method was 1.0. The TreeScan-Poisson method could detect highly significant cuts, resulting in a small number of detected signals, which indicated high PPV and low sensitivity. The TreeScan-ZIP method performed better than the TreeScan-Poisson in every dataset with true zero. The estimated power was almost 1.0 and the PPV was greater than 0.98 when the number of true zeros was greater than or equal to 10 and the number of true signals was greater than or equal to 5. The TreeScan-ZIP method was more sensitive than the TreeScan-Poisson method. The sensitivity and PPV of the TreeScan-ZIP method became higher with higher RR. When two true signals existed, both methods had a relatively low power; however, the power of the TreeScan-ZIP method increased as the number of true zeros and RR increased. The simulation study showed that in the presence of zero inflation, the TreeScan-ZIP method performed better than the TreeScan-Poisson method.

Real data

Korea adverse event reporting system data

KAERS is a spontaneous AE reporting system maintained by the Korea Institute of Drug Safety and Risk Management (https://www.drugsafe.or.kr). Consumers, Healthcare Professionals, Regional Pharmacovigilance Centers (RPVCs), and pharmaceutical companies can report suspected drug information and AE information using the KAERS. RPVCs evaluate causality between the suspected drug and AE and report them to KIDS. The information is then stored in the KAERS as an individual case safety report (ICSR), which contains information on suspected drug, AE, causal relationship, and demographic. The ICSRs are periodically summited to the WHO-UMC. Further, safety information obtained from KAERS data and signal analysis is periodically reported to the Ministry of Food and Drug Safety. For the real data analysis, data cleansing was performed. Because a certain drug and AE information can be reported multiple times depending on the dose and time of administration, if the same drug and AE were reported twice or more, only the first report was used. In the causality, only drug–AE pairs that received ratings of possible or above were included in this study. There are 6 levels of causality: certain, probable, possible, unlikely, conditional, and unassessable[39,40]. In KAERS database, AEs are coded by the WHO-ART. As more than half of the reports included information down to the PT level, and HLT may not exist, this study used two levels of hierarchy, SOC and PT, with the exception of the HLT and IT level. Data obtained between 2012 and 2016 from KAERS were used. During this period, 716,584 people reported experiencing AEs. There were 1.8 million drug reports on 1981 types of drugs and 1.1 million AE reports on 4078 types of AEs. Further, a total of 2.4 million unique drug-AE pairs were found. When removing pairs that had beneath the ‘possible’ threshold, the final dataset analyzed in this study included 1,077,060 drug-AE pairs representing 1292 types of AEs in PTs. Further, 1981 types of drugs were identified in 557,390 reports.

Paclitaxel and docetaxel

The two proposed methods were applied to detect the AE signals to the drug–AE pairs data from KAERS. Paclitaxel and docetaxel, which have the highest sales among all anticancer drugs in the world, were selected[41]. Of note, these are representatives of the new class of taxane drugs, which have emerged as a fundamental treatment for breast cancer. Paclitaxel and docetaxel have similar main structures and mechanisms of action[42]. Paclitaxel is used to treat a number of cancer types, including Kaposi sarcoma, breast cancer, ovarian cancer, lung cancer, cervical cancer, and pancreatic cancer (https://www.ashp.org/). Docetaxel is also used as to treat several cancer types, including breast cancer, non-small cell lung cancer, prostate cancer, head and neck cancer, and stomach cancer (https://www.cancer.gov/). The most frequently reported AEs related to taxene from MICROMEDEX® include cardiovascular effects, dermatologic effects, endocrine/metabolic effects gastrointestinal effects, hematologic effects, hepatic effects, immunologic effects, musculoskeletal effects, neurologic effects, ophthalmic effects, otic effects, renal effects, respiratory effects, and others (https://www.who.int/).

Paclitaxel

Nine signals were identified by the TreeScan-Poisson method and 30 signals were detected by the TreeScan-ZIP method (Table 3). The nine signals detected by the TreeScan-Poisson method were also detected by the TreeScan-ZIP method. The AEs corresponding to the signals found by both methods were related to the following SOCs: central & peripheral nervous system disorders (0410), respiratory system disorders (1100), white cell and reticuloendothelial system disorders (1220), and body as a whole—general disorders (1810). Further, their PTs were paresthesia (0410.0137), neuropathy peripheral (0410.1313), dyspnea (1100. 0514), granulocytopenia (1220.0572), leucopenia (1220.0908), chest pain (1810.0718), and temperature change sensations (1810.1705). The TreeScan-ZIP method detected signals related to 10 SOC terms. The nine signals detected by the two methods were included in the known AEs. However, some signals detected by TreeScan-ZIP alone were included in the known AEs.
Table 3

Results of signal detection of adverse events of paclitaxel by the two methods.

SOCPTDiagnosisMarginal totalObsTreeScan-PoissonTreeScan-ZIP
ExpO/Ep-valueExpO/Ep-value
0100Skin and appendages disorders352,94910201176.50.91.000868.31.21.000
0002ALOPECIA13,1939444.02.10.92932.52.90.001
0043SWEATING INCREASED13,43310044.82.20.89033.03.00.001
0828HYPOTRICHOSIS234130.816.70.8970.622.60.001
0200Musculo-skeletal system disorders50,667305168.91.80.639124.62.40.001
0073MYALGIA24,67924482.33.00.11960.74.00.001
0410Central & peripheral nervous system disorders233,135781777.11.01.000573.51.41.000
0117HYPOAESTHESIA2349327.84.10.9345.85.50.001
0130NEUROPATHY41988114.05.80.31810.37.80.001
0137PARAESTHESIA12,16521340.55.30.01529.97.10.001
1313NEUROPATHY PERIPHERAL563415018.88.00.01513.910.80.001
2082POLYNEUROPATHY18370.611.50.9980.515.50.001
0800Metabolic and nutritional disorders68,66992228.90.41.000168.90.51.000
0368CACHEXIA31844910.64.60.7547.86.30.001
1030Heart rate and rhythm disorders22,34620674.52.80.27055.03.70.001
0221PALPITATION11,7789939.32.50.81029.03.40.001
0224TACHYCARDIA45699415.26.20.17711.28.40.001
1040Vascular (extracardiac) disorders13,67110045.62.20.89733.63.00.001
0207FLUSHING53349117.85.10.31213.16.90.001
1100Respiratory system disorders137,936588459.81.30.961339.31.70.001
0514DYSPNOEA36,735410122.43.30.01090.44.50.001
0537RESPIRATORY INSUFFICIENCY1292184.34.20.9963.25.70.001
1220White cell and RES disorders92,5311085308.43.50.001227.64.80.001
0570AGRANULOCYTOSIS80898927.03.30.65619.94.50.001
0572GRANULOCYTOPENIA58,735674195.83.40.001144.54.70.001
0908LEUCOPENIA20,45631468.24.60.00850.36.20.001
1700Neoplasms63362521.11.21.00015.61.60.996
1345NEOPLASM MALIGNANT591132.06.60.9891.58.90.001
1810Body as a whole—general disorders220,6901295735.61.80.011542.92.40.001
0712ALLERGIC REACTION1394184.63.90.9983.45.20.001
0718CHEST PAIN25,85647986.25.60.00163.67.50.001
0730PAIN72215424.12.20.98717.83.00.001
1705TEMPERATURE CHANGED SENSATION920424530.78.00.00322.610.80.001
2237ANAPHYLACTIC REACTION36804412.33.60.8959.14.90.001
Results of signal detection of adverse events of paclitaxel by the two methods.

Docetaxel

The TreeScan-Poisson and the TreeScan-ZIP methods identified 9 and 56 signals, respectively (Table 4). All signals detected by the TreeScan-Poisson method were also detected by the TreeScan-ZIP method. The AEs corresponding to the signals found by both methods were related to the following SOCs: skin and appendages disorders (0100), musculo-skeletal system disorders (0200), central & peripheral nervous system disorders (0410), red blood cell disorders (1210), white cell and reticulo-endothelial system (RES) disorders (1220). Their PTs were alopecia (0100.0002), nail disorder (0100.0020), myalgia (0200.0073), sensory disturbance (0410.0148), anemia (1210.0544), and granulocytopenia (1220.0572). The TreeScan-ZIP method detected signals related to 18 SOC terms. All signals detected by the two methods were included in the known AEs. A few signals that were not detected by TreeScan, but were detected by TreeScan-ZIP, were included in known AEs, such as vision disorders, gastro-intestinal system disorders, liver and biliary system disorders, urinary system disorders, etc.
Table 4

Results of signal detection of adverse events of docetaxel by the two methods.

SOCPTDiagnosisMarginal totalObsTreeScan-PoissonTreeScan-ZIP
ExpO/Ep-valueExpO/Ep-value
0100Skin and appendages disorders352,94944205126.40.91.0003467.11.31.000
0002ALOPECIA13,1932212191.611.50.001129.617.10.001
0008DERMATITIS EXFOLIATIVE161112.34.71.0001.67.00.024
0020NAIL DISORDER324876047.216.10.00531.923.80.001
1199SKIN EXFOLIATION16735424.32.21.00016.43.30.001
1634NAIL DISCOLOURATION429916.214.60.9274.221.60.001
0200Musculo-skeletal system disorders50,6672387735.93.20.011497.74.80.001
0063ARTHRALGIA8201416119.13.50.81380.65.20.001
0073MYALGIA24,6791908358.45.30.003242.47.90.001
0410Central & peripheral nervous system disorders233,13520813386.10.61.0002290.10.91.000
0148SENSORY DISTURBANCE247376735.921.40.00324.331.60.001
1313NEUROPATHY PERIPHERAL563423081.82.80.98655.34.20.001
1532LOWER MOTOR NEURONE LESION117121.77.11.0001.110.40.001
0431Vision disorders17,634186256.10.71.000173.21.11.000
1049LACRIMATION ABNORMAL6471169.412.30.8796.418.30.001
1462EPIPHORA151222.210.01.0001.514.80.001
0433Special senses other, disorders469256368.18.30.12146.112.20.001
0267TASTE PERVERSION419555560.99.10.10341.213.50.001
0500Psychiatric disorders129,81912611885.50.71.0001275.21.01.000
0165ANOREXIA36,109690524.51.31.000354.71.90.001
0600Gastro-intestinal system disorders636,32068139242.10.71.0006250.71.11.000
0204CONSTIPATION45,356991658.81.50.988445.52.20.001
0269ANUS DISORDER321174.73.61.0003.25.40.005
0298HAEMORRHOIDS14424520.92.11.00014.23.20.005
0321PROCTITIS91241.318.20.9990.926.80.001
0327STOMATITIS10,870256157.91.61.000106.82.40.001
1014HAEMORRHAGE RECTUM655339.53.51.0006.45.10.001
1083GINGIVITIS135313319.76.80.94913.310.00.001
1351MUCOSITIS NOS497817072.32.40.99948.93.50.001
1376TOOTH ACHE10324215.02.81.00010.14.10.001
0700Liver and biliary system disorders52,619643764.30.81.000516.91.21.000
0360SGPT INCREASED12,811341186.11.80.998125.82.70.001
0800Metabolic and nutritional disorders68,669714997.40.71.000674.61.11.000
0381HYPERCHOLESTEROLAEMIA198213928.84.80.97819.57.10.001
0387HYPOCALCAEMIA243311235.33.20.99823.94.70.001
1040Vascular (extracardiac) disorders13,671255198.61.31.000134.31.90.001
0207FLUSHING533421677.52.80.98652.44.10.001
1413ERYTHROMELALGIA81271.222.90.9980.833.90.001
1100Respiratory system disorders137,93616442003.40.81.0001355.01.21.000
0523PHARYNGITIS18,340361266.41.41.000180.22.00.003
1210Red blood cell disorders30,1161675437.43.80.030295.85.70.001
0544ANAEMIA25,8891668376.04.40.011254.36.60.001
1220White cell and RES disorders92,53139691344.03.00.002909.04.40.001
0570AGRANULOCYTOSIS8089375117.53.20.89979.54.70.001
0572GRANULOCYTOPENIA58,7352474853.12.90.028577.04.30.001
0908LEUCOPENIA20,4561091297.13.70.167200.95.40.001
1300Urinary system disorders49,509301719.10.41.000486.30.61.000
0621RENAL PAIN466406.85.91.0004.68.70.001
1420Reproductive disorders, female10,695283155.31.81.000105.12.70.001
0636AMENORRHOEA80015111.613.00.7617.919.20.001
0669VAGINITIS463236.73.41.0004.55.10.001
1839BREAST PAIN500387.35.21.0004.97.70.001
1810Body as a whole—general disorders220,69049953205.41.60.3342167.92.30.001
0401OEDEMA PERIPHERAL9444607137.24.40.39892.86.50.001
0716ASTHENIA24,301456353.01.31.000238.71.90.006
0717BACK PAIN9781209142.11.51.00096.12.20.003
0718CHEST PAIN25,856928375.52.50.681254.03.70.001
0724FATIGUE14,561515211.52.40.933143.03.60.001
1705TEMPERATURE CHANGED SENSATION9204822133.76.10.07990.49.10.001
1765PALMAR-PLANTAR ERYTHRODYSAESTHESIA7415517107.74.80.44172.87.10.001
2101PAIN AXILLARY159202.38.71.0001.612.80.001
1820Application site disorders25,336150368.00.41.000248.90.61.000
0058INJECTION SITE REACTION338510649.22.21.00033.33.20.001
2000Secondary terms—events12,32292179.00.51.000121.00.80.001
1813SURGICAL SITE REACTION290684.216.10.9642.823.90.001
Results of signal detection of adverse events of docetaxel by the two methods.

Conclusion and discussion

This study sought to reveal how the tree-based scan statistic developed by Kulldorff et al.[26] can be extended for the zero-inflated count data. To consider a large number of zero cells, we proposed the TreeScan-ZIP method, which integrates a zero-inflated Poisson model into the TreeScan-Poisson method. Herein, a simulation study was conducted with different settings for the relative risk and the number of true zero leaves and true signal leaves. Based on the findings of the simulation study, the TreeScan-ZIP method performed better than the TreeScan-Poisson method in terms of power, sensitivity, and PPV, especially when the proportion of true zeros was high. The real data examples also supported the simulation results. The TreeScan-Poisson method may have missed many signals that were detected by the TreeScan-ZIP method in datasets with a large number of true zeros. If the TreeScan-ZIP method detects too many false positive signals, it may increase confusion in further investigation and utilize unnecessary energy. However, even the known AEs were not detected by the TreeScan-Poisson method. Although we do not know whether all signals detected by the TreeScan-ZIP method were true, it is safer to over-detect than to miss any signal in drug safety surveillance. The data used were extracted from spontaneous reporting systems, which is a limitation. As spontaneous reporting systems are based on self-reporting by people, such as consumers and healthcare professionals, underreporting or overreporting of AEs may easily occur. For example, only the number of cases reported can be known. Thus, whether the same AE occurred multiple times in the same person cannot be known. Cases of overreporting may thus lead to bias in the analysis. In this study, the TreeScan-ZIP method and TreeScan-Poisson method identified signals of AEs for a particular drug, and could identify drugs that are more frequently reported to be related to a particular AE. Cuts were made either above or below nodes in this study; however, more elaborate cuts, such as the combinational cuts proposed by Kulldorff et al.[7] can also be made. In this study, we used a two-level structure; however, structures with more than two levels or other spontaneous reporting system data with more delicate levels can be employed. Further studies could use a zero-inflated double Poisson or zero-inflated negative binomial model to accommodate large numbers of true zeros and overdispersion[43]. When a priory level of AE definition cannot be determined in the tree structure and the data have a large number of zeros, the proposed tree-based scan statistic can serve as a very useful method for detecting signals in the post-market drug safety surveillance.

Data availability

The KARES database is provided via the Korea Institute of Drug Safety and Risk management webpage. (https://open.drugsafe.or.kr/original/invitation.jsp) upon request.
  26 in total

1.  Multicenter epidemiologic and health services research on therapeutics in the HMO Research Network Center for Education and Research on Therapeutics.

Authors:  R Platt; R Davis; J Finkelstein; A S Go; J H Gurwitz; D Roblin; S Soumerai; D Ross-Degnan; S Andrade; M J Goodman; B Martinson; M A Raebel; D Smith; M Ulcickas-Yood; K A Chan
Journal:  Pharmacoepidemiol Drug Saf       Date:  2001 Aug-Sep       Impact factor: 2.890

Review 2.  Application of data mining techniques in pharmacovigilance.

Authors:  Andrew M Wilson; Lehana Thabane; Anne Holbrook
Journal:  Br J Clin Pharmacol       Date:  2004-02       Impact factor: 4.335

3.  Managing drug-risk information--what to do with all those new numbers.

Authors:  Jerry Avorn; Sebastian Schneeweiss
Journal:  N Engl J Med       Date:  2009-07-27       Impact factor: 91.245

4.  A Bayesian neural network method for adverse drug reaction signal generation.

Authors:  A Bate; M Lindquist; I R Edwards; S Olsson; R Orre; A Lansner; R M De Freitas
Journal:  Eur J Clin Pharmacol       Date:  1998-06       Impact factor: 2.953

5.  A Review of Statistical Methods for Safety Surveillance.

Authors:  Lan Huang; Ted Guo; Jyoti N Zalkikar; Ram C Tiwari
Journal:  Ther Innov Regul Sci       Date:  2014-01       Impact factor: 1.778

6.  Data Mining for Adverse Drug Events With a Propensity Score-matched Tree-based Scan Statistic.

Authors:  Shirley V Wang; Judith C Maro; Elande Baro; Rima Izem; Inna Dashevsky; James R Rogers; Michael Nguyen; Joshua J Gagne; Elisabetta Patorno; Krista F Huybrechts; Jacqueline M Major; Esther Zhou; Megan Reidy; Austin Cosgrove; Sebastian Schneeweiss; Martin Kulldorff
Journal:  Epidemiology       Date:  2018-11       Impact factor: 4.822

7.  Likelihood ratio test-based method for signal detection in drug classes using FDA's AERS database.

Authors:  Lan Huang; Jyoti Zalkikar; Ram C Tiwari
Journal:  J Biopharm Stat       Date:  2013       Impact factor: 1.051

8.  Drug safety data mining with a tree-based scan statistic.

Authors:  Martin Kulldorff; Inna Dashevsky; Taliser R Avery; Arnold K Chan; Robert L Davis; David Graham; Richard Platt; Susan E Andrade; Denise Boudreau; Margaret J Gunter; Lisa J Herrinton; Pamala A Pawloski; Marsha A Raebel; Douglas Roblin; Jeffrey S Brown
Journal:  Pharmacoepidemiol Drug Saf       Date:  2013-03-20       Impact factor: 2.890

9.  Bacillus Calmette-Guérin (BCG) vaccine safety surveillance in the Korea Adverse Event Reporting System using the tree-based scan statistic and conventional disproportionality-based algorithms.

Authors:  Ju Hwan Kim; Hyesung Lee; Ju-Young Shin
Journal:  Vaccine       Date:  2020-04-09       Impact factor: 3.641

10.  Data-mining for detecting signals of adverse drug reactions of fluoxetine using the Korea Adverse Event Reporting System (KAERS) database.

Authors:  Seonji Kim; Kyounghoon Park; Mi-Sook Kim; Bo Ram Yang; Hyun Jin Choi; Byung-Joo Park
Journal:  Psychiatry Res       Date:  2017-06-13       Impact factor: 3.222

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.