Sara K Quinney¹
¹Department of Obstetrics and Gynecology and Division of Clinical Pharmacology, Department of Medicine, Indiana University School of Medicine, School of Informatics and Computing, Indiana University Purdue University Indianapolis, Indianapolis, Indiana, USA.
Bioinformatics approaches applied to healthcare data are a crucial tool for identifying potential drug‐drug interactions (DDIs). Researchers must exercise caution in interpreting results from big data studies, as the structure and function of databases may introduce bias. Assessment of results may be hampered by the lack of gold standard positive and negative interacting drug pairs. However, integrating clinical data mining with mechanistic understanding of drug action can promote confidence in DDI predictions.

Adverse drug events (ADEs) are a major cause of morbidity and mortality, resulting in over 3.5 million outpatient clinic visits and 790,000 emergency department visits annually in the United States.1 DDIs increase the risk of ADEs, and ADE risk rises with drug burden: individuals taking five or more medications have a 1.88‐times greater risk of an ADE than individuals taking one drug.1 Some DDI mechanisms, such as cytochrome P450 induction or inhibition, can be detected during early drug development. However, many DDIs are not apparent in the controlled, relatively small clinical studies performed during drug development. Postmarketing pharmacovigilance studies that apply data mining and bioinformatics approaches to real‐world data have become vehicles for detecting novel drug‐drug interactions leading to adverse drug events (DDI‐ADE).

The expansion of big data resources in health care and science has provided a rich source of information for pharmacovigilance studies of DDIs. A variety of real‐world data have been used to identify and predict potential DDIs, including clinical, social media, physicochemical, and biological data. Many studies rely on electronic health records (EHRs), healthcare claims data, or spontaneous reporting systems (SRS), such as the US Food and Drug Administration's (FDA's) Adverse Event Reporting System (FAERS), to provide real‐world data on medication use and adverse events.
Other studies have applied natural language processing and machine learning algorithms to published literature, social media (e.g., Instagram and Twitter), or biological and chemical databases (e.g., KEGG, DrugBank, and PubChem) to predict DDIs. For a more detailed description of data and bioinformatics approaches utilized in translational DDI research, the reader is directed to recently published reviews.2, 3
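The commentary does not commit to a particular signal-detection statistic for SRS data. As one illustration only, the sketch below assumes a simple 2×2 disproportionality design. In this design, "both drugs appear on the same report" is treated as the exposure. The sketch computes a reporting odds ratio (ROR) with a log-scale Wald 95% confidence interval. The `PairCounts` structure and the counts are hypothetical. A real FAERS analysis would also require report deduplication and drug-name normalization. Published DDI methods often compare the pair against each drug alone rather than against all other reports.

```python
import math
from dataclasses import dataclass


@dataclass
class PairCounts:
    """2x2 report counts for one drug pair and one adverse event (hypothetical).

    a: reports listing both drugs AND the event
    b: reports listing both drugs, without the event
    c: all other reports WITH the event
    d: all other reports without the event
    """
    a: int
    b: int
    c: int
    d: int


def reporting_odds_ratio(t: PairCounts, z: float = 1.96) -> tuple[float, float, float]:
    """Return (ROR, lower, upper) using the log-scale Wald interval."""
    if min(t.a, t.b, t.c, t.d) == 0:
        # A zero cell makes the ROR or its standard error undefined.
        raise ValueError("zero cell; apply a continuity correction first")
    ror = (t.a * t.d) / (t.b * t.c)
    se = math.sqrt(1 / t.a + 1 / t.b + 1 / t.c + 1 / t.d)
    lower = math.exp(math.log(ror) - z * se)
    upper = math.exp(math.log(ror) + z * se)
    return ror, lower, upper


if __name__ == "__main__":
    counts = PairCounts(a=20, b=180, c=1_000, d=98_800)  # made-up counts
    ror, lo, hi = reporting_odds_ratio(counts)
    print(f"ROR={ror:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

A lower confidence bound above 1 is a common screening threshold. However, SRS report counts are shaped by reporting biases inherent in these systems, so such a signal is best treated as hypothesis generating.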
Clinical Databases
EHRs, claims databases, and other large healthcare databases provide a rich source of information regarding DDI‐ADE risk. However, big data also have limitations that must be considered during the design and interpretation of DDI‐ADE studies.

First, it is critical to understand the intended purpose of each database. SRS, such as FAERS and VigiBase, collect drug–adverse event reports from patients, healthcare professionals, and pharmaceutical companies. While SRS provide large quantities of data, the quality and detail of individual records can vary substantially. The use of EHRs is nearly universal in the United States and many other countries. Primary uses of EHRs include patient care delivery, management and support processes, patient self‐management, and financial and other administrative processes.4 EHR systems offer several advantages over SRS, including collection of information over time and review by healthcare professionals. Healthcare claims databases available from the Centers for Medicare & Medicaid Services (CMS) or from other insurers provide structured information on pharmacy and medical claims, including diagnosis codes. Like EHR data, claims data provide longitudinal information; unlike EHR data, they capture information across healthcare systems. However, patients may still be lost to follow‐up when they move between insurers.

Second, the structure of the database influences analyses. FAERS data are gathered from voluntary or mandatory reports to MedWatch. While MedWatch report forms are largely free text, most researchers use the structured FAERS data, and the conversion from unstructured to structured data may lead to loss of information. EHR data are collected from various sources and stored in either structured or unstructured (free text) formats.
Common structured data include laboratory results, diagnosis codes (e.g., International Classification of Diseases (ICD) codes), billing data (e.g., Current Procedural Terminology (CPT) codes), medication orders, and some demographic data. Unstructured data include clinical notes and reports from imaging or pathology studies. Health claims data are structured and based on specific billing codes.

Finally, one must be aware of biases inherent in the design of these databases.5 Because reporting to SRS is largely voluntary, particular events may be underreported or overreported. A number of factors can influence reporting rates. For instance, the time a drug has been on the market may substantially affect rates of adverse event reports, with more reports expected within the first few years after a drug comes to market. Media coverage can also raise awareness of a given adverse event, prompting individuals to report additional cases. Serious or extremely rare adverse events are also more likely to be reported. Conversely, when data already support a link between a drug and an adverse event, the event may be less likely to be reported.5 EHRs, on the other hand, were designed to improve clinical care, not as research tools. As a result, data may not be entered in a way that adequately addresses research questions. EHR data may also be incomplete: patients transfer between healthcare providers or systems, leading to missing data or loss to follow‐up. Importantly for DDI‐ADE studies, there are many deficiencies in the collection and reporting of medication use and adherence data in EHRs. Medication orders and pharmacy records capture only prescribing patterns and the delivery of prescription medications to patients. Refill data may provide an indication of patient persistence and adherence. However, this is an imperfect measure and is not suitable for evaluating medications, such as antibiotics, that are used to treat acute conditions.
Additionally, use of over‐the‐counter medications and supplements is assessed clinically only through patient interview and is inconsistently documented within EHRs. Thus, studies of DDIs using EHRs are largely limited to interactions between prescription medications. They reflect potential, but not necessarily actual, use of the drugs by patients. Like EHR data, claims data are collected longitudinally. Claims data better capture information across providers (i.e., various pharmacies and physician groups), but loss to follow‐up is common because individuals may move between insurance providers.
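Pharmacy and order records show only when a drug was supplied, so concurrent exposure in EHR or claims studies is inferred rather than observed. The sketch below assumes each fill covers the fill date plus (days' supply − 1) days, a common simplifying convention. It counts the days on which two drugs were supplied at the same time. The `Fill` record layout and the drug names are hypothetical. The result reflects potential rather than confirmed co-use, and it cannot see over-the-counter products or supplements.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Fill:
    """One pharmacy dispensing record (hypothetical layout)."""
    drug: str
    fill_date: date
    days_supply: int


def covered_days(fills: list[Fill], drug: str) -> set[date]:
    """Days on which `drug` was supplied, assuming coverage starts on the fill date."""
    days: set[date] = set()
    for f in fills:
        if f.drug != drug:
            continue
        for offset in range(f.days_supply):
            days.add(f.fill_date + timedelta(days=offset))
    return days


def concurrent_days(fills: list[Fill], drug_a: str, drug_b: str) -> int:
    """Days both drugs were supplied: potential, not confirmed, co-exposure."""
    return len(covered_days(fills, drug_a) & covered_days(fills, drug_b))


if __name__ == "__main__":
    history = [
        Fill("drug_A", date(2020, 1, 1), 30),   # covers Jan 1 to Jan 30
        Fill("drug_A", date(2020, 2, 5), 30),   # late refill: Jan 31 to Feb 4 uncovered
        Fill("drug_B", date(2020, 1, 25), 14),  # covers Jan 25 to Feb 7
    ]
    print(concurrent_days(history, "drug_A", "drug_B"))  # 9
```

In the example, the late refill of `drug_A` leaves a gap from January 31 to February 4, during which `drug_B` was supplied alone. Those days are therefore not counted as co-exposure. Whether the patient actually took either drug remains unknown from these records.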
Defining Positive and Negative Results
In evaluating DDIs identified using biomedical informatics approaches, it is important to consider how positive and negative results are defined. Data mining approaches often prioritize high sensitivity at the cost of specificity, which yields a large number of false positive findings. However, assessing a data mining approach by its identification of true positive (or gold standard) DDIs may also not be valid because of selective prescribing or reporting patterns. Medication alerts with higher severity are more likely to be acted upon by clinicians. In the case of DDIs, clinicians may be less likely to coprescribe drugs with a known high risk of interaction. Thus, it may be difficult to detect “known positive” interactions in clinical databases. This may lead to discrepancies between DDIs identified from literature, gene, or pathway analyses and those identified in clinical databases. Additionally, there is no gold‐standard definition of a “positive” or “negative” DDI, because no single resource is recognized as the standard for DDIs and DDI reports vary among compendia. At best, positive results from DDI studies can be grouped into three categories:

- previously reported (known) interactions;
- unknown but mechanistically plausible interactions; and
- unknown interactions with no obvious mechanism.

While previously reported interactions may be interpreted as “true positive DDIs,” it is nearly impossible to define a “true negative DDI.” For instance, we have identified a potential increased risk of myopathy in individuals taking azithromycin with zoledronate.6 A careful review of the literature and other drug interaction resources failed to uncover reported interactions or mechanisms that would increase the risk of myopathy with these drugs. However, these drugs may simply not have been evaluated previously in this context.
Yet another explanation for this finding may be that these drugs are commonly coadministered with other agents associated with increased myopathy risk. Without a definitive controlled study evaluating the two drugs in combination, it is not feasible to classify a potential DDI as a false positive.

The definitions of “true positive” and “true negative” may also be difficult to establish in the biomedical literature. An initial step in natural language processing and machine learning methods for text mining is manual annotation of a corpus to develop a data set of true positive and true negative phrases. These corpora are subsequently used to train and test algorithms prior to deployment. During the development of a full‐text corpus of clinical pharmacokinetic data, pairs of annotators in my lab reviewed a total of 23,372 phrases from 170 full texts. Annotators disagreed on the classification of nearly 20% of the phrases (S. Quinney, unpublished data). This is consistent with inter‐annotator agreement for complex concepts reported by other groups.7, 8, 9 Disagreements among annotators may be resolved by additional experts in the field (e.g., a third annotator), or the disputed phrases may be excluded from the corpus. However, exclusion may bias downstream analyses.
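Raw percent disagreement, such as the nearly 20% reported above, does not account for the agreement two annotators would reach by chance. The sketch below computes two common pairwise summaries: observed agreement and Cohen's kappa. The labels are hypothetical. The commentary does not state which agreement statistic the cited groups used.

```python
from collections import Counter


def percent_agreement(a: list[str], b: list[str]) -> float:
    """Fraction of items on which two annotators assigned the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and aligned")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa = (p_o - p_e) / (1 - p_e).

    p_o is observed agreement; p_e is chance agreement from each
    annotator's own label frequencies.
    """
    p_o = percent_agreement(a, b)
    n = len(a)
    freq_a, freq_b = Counter(a), Counter(b)
    labels = set(freq_a) | set(freq_b)
    p_e = sum((freq_a[k] / n) * (freq_b[k] / n) for k in labels)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical labels for ten phrases ("pos" = relevant, "neg" = not relevant)
    ann1 = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "pos", "neg"]
    ann2 = ["pos", "pos", "neg", "neg", "neg", "neg", "pos", "neg", "pos", "neg"]
    print(percent_agreement(ann1, ann2), round(cohens_kappa(ann1, ann2), 2))
```

In this example the annotators agree on 80% of phrases, yet kappa is about 0.58. With these label frequencies, they would be expected to agree on 52% of phrases by chance alone.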
Alternative Approaches to Support Findings from Clinical Data
These limitations should not discourage investigators from using big data resources, which provide a rich source of clinical data on drug‐associated ADEs. However, these approaches should not be used in isolation. Associations identified in EHRs or SRS do not equate to causation. Some medications may be associated with an adverse event because they are used to diagnose or treat it. For instance, we identified an association between gadolinium‐based contrast agents and myopathy in FAERS.6 Although these agents could potentially cause myopathy, a more likely explanation is that contrast agents were administered during imaging studies performed to diagnose the myopathy. To establish causation between DDIs and ADE risk, studies must incorporate additional methods. Some researchers have used pathway analyses to establish drug‐gene‐drug links among potential DDIs.2, 3, 7
In vitro exploration of potential DDI mechanisms, such as cytochrome P450 or transporter induction or inhibition, coupled with pharmacokinetic in vitro in vivo extrapolation approaches may also be employed.2
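In vitro-in vivo extrapolation for a reversible CYP inhibitor can be illustrated with a simplified hepatic form of the mechanistic static model. It predicts the ratio of victim-drug AUC with and without the inhibitor as AUCR = 1 / (f_m / (1 + [I]/K_i) + (1 − f_m)). Here [I] is the unbound inhibitor concentration at the enzyme, K_i is the unbound inhibition constant, and f_m is the fraction of victim clearance through the inhibited enzyme. This sketch ignores gut-wall metabolism, time-dependent inhibition, induction, and transporters. The parameter values are hypothetical.

```python
def auc_ratio_reversible(inhibitor_conc_unbound: float, ki_unbound: float, fm: float) -> float:
    """Predicted victim AUC ratio for reversible inhibition of one hepatic pathway.

    AUCR = 1 / (fm / (1 + [I]/Ki) + (1 - fm))

    Assumes unbound [I] and Ki in the same units, and no gut-wall or
    time-dependent inhibition, induction, or transporter effects.
    """
    if not 0.0 <= fm <= 1.0:
        raise ValueError("fm must be between 0 and 1")
    if inhibitor_conc_unbound < 0 or ki_unbound <= 0:
        raise ValueError("[I] must be >= 0 and Ki must be > 0")
    # Fraction of clearance still operating through the partially inhibited enzyme
    inhibited_fraction = fm / (1.0 + inhibitor_conc_unbound / ki_unbound)
    return 1.0 / (inhibited_fraction + (1.0 - fm))


if __name__ == "__main__":
    # Hypothetical: [I]u = 1 uM, Ki,u = 0.1 uM, 90% of clearance via the enzyme
    print(round(auc_ratio_reversible(1.0, 0.1, 0.9), 2))  # 5.5
```

As [I]/K_i grows, AUCR approaches 1/(1 − f_m), which is 10-fold in this example. The plausibility of a pharmacokinetic mechanism therefore depends on how heavily the victim drug relies on the inhibited pathway.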
Conclusion
Bioinformatics approaches are an important weapon in our arsenal for DDI assessment. Detection of potential DDIs through clinical data becomes a powerful discovery tool when it is used as part of a comprehensive approach that incorporates mechanistic understanding through drug‐gene interactions or in vitro investigations. However, researchers must be aware of limitations within these databases, such as reporting bias or the inability to distinguish temporal patterns. Collaboration with a clinical expert may provide insight into the database structure and the practices for entering data. Clinicians can also assist with interpretation of potential DDIs, especially ambiguous ones. Bioinformatics approaches for detecting potential DDIs may be beset with challenges and limitations. Even so, their potential for generating new knowledge and identifying novel DDIs among drugs with no known mechanism of interaction is exciting. These studies, when viewed as hypothesis generating, may uncover novel pathways of pharmacological action and DDIs, further propelling translational investigations.
Funding
S.K.Q. is funded in part by NIH/NIGMS R01GM104483 and R01GM117206.
Conflict of Interest
The author declared no competing interests for this work.
References
Bourgeois FT, Shannon MW, Valim C, Mandl KD. Pharmacoepidemiol Drug Saf. 2010 Sep.
Zhang R, Cairelli MJ, Fiszman M, Rosemblat G, Kilicoglu H, Rindflesch TC, Pakhomov SV, Melton GB. J Biomed Inform. 2014 Jan 19.
Chasioti D, Yao X, Zhang P, Lerner S, Quinney SK, Ning X, Li L, Shen L. IEEE J Biomed Health Inform. 2018 Oct 8.
Duke JD, Han X, Wang Z, Subhadarshini A, Karnik SD, Li X, Hall SD, Jin Y, Callaghan JT, Overhage MJ, Flockhart DA, Strother RM, Quinney SK, Li L. PLoS Comput Biol. 2012 Aug 9.