
Recommendations for patient similarity classes: results of the AMIA 2019 workshop on defining patient similarity.

Nathan D Seligson, Jeremy L Warner, William S Dalton, David Martin, Robert S Miller, Debra Patt, Kenneth L Kehl, Matvey B Palchuk, Gil Alterovitz, Laura K Wiley, Ming Huang, Feichen Shen, Yanshan Wang, Khoa A Nguyen, Anthony F Wong, Funda Meric-Bernstam, Elmer V Bernstam, James L Chen.

Abstract

Defining patient-to-patient similarity is essential for the development of precision medicine in clinical care and research. Conceptually, the identification of similar patient cohorts appears straightforward; however, universally accepted definitions remain elusive. Simultaneously, an explosion of vendors and published algorithms has emerged, all providing varied levels of functionality in identifying patient similarity categories. To provide clarity and a common framework for patient similarity, a workshop was convened at the American Medical Informatics Association 2019 Annual Meeting. This workshop included invited discussants from academia, the biotechnology industry, the FDA, and private practice oncology groups. Drawing from a broad range of backgrounds, workshop participants were able to coalesce around 4 major patient similarity classes: (1) feature, (2) outcome, (3) exposure, and (4) mixed-class. This perspective examines these 4 classes more critically and offers the medical informatics community a means of communicating their work on this important topic.


Keywords: patient matching; patients like me; personalized medicine; similar patients; precision medicine


Year: 2020 PMID: 32885823 PMCID: PMC7671612 DOI: 10.1093/jamia/ocaa159

Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497

INTRODUCTION

The premise of precision medicine is deceptively simple: patients with similar features have similar outcomes. While traditional clinical trial design creates strong evidence regarding the activity of a single intervention, it does not provide a basis for personalizing medical care for a specific patient. Finding similar patients furthers the pursuit of precision medicine by identifying key traits and features of patients that may indicate their clinical course. Patient matching provides the opportunity to improve patient care and clinical research by identifying and potentially controlling for key covariates that may help predict a patient’s outcome. Previously identified key challenges in the patient similarity space included data heterogeneity, data sharing, and algorithm selection. Significant progress has been made in these areas, especially within the field of oncology. Efforts by the cancer community, such as those of Minimal Common Oncology Data Element (mCODE) and Global Alliance for Genomics and Health (GA4GH), continue to develop and refine standards for parsing ever-evolving patient features. Further, data elements like tumor genomics and PD-L1 positivity have rapidly evolved to become commonplace in research and clinical care. In the domain of data sharing, consortia efforts, such as the Oncology Research Information Exchange Network and the American Association for Cancer Research Project GENIE, have amassed large volumes of clinico-genomic patient data. Continued publication of results of high-enrollment, multi-arm treatment trials, such as the tumor-agnostic National Cancer Institute Molecular Analysis for Therapy Choice and the lung cancer ALCHEMIST trials, has been eagerly awaited to evaluate the utility of their patient-matching criteria. Naturally, multi-dimensional patient-matching algorithms have proliferated, in part due to the variety of use cases, specific features, and outcome variables available.
While the science of patient matching has vastly improved, our ability to communicate about the type of patient similarity we endeavor to accomplish has become a significant challenge. Advances in multi-dimensional patient matching have been slow to develop, due in part to the heterogeneous interpretations that exist within similarity matching. A continued lack of consensus regarding terminology, methods, and data types has resulted in poor consistency among the findings of patient-matching studies.

DEFINING THE PROBLEM: AMBIGUOUS NOMENCLATURE IN THE PATIENT SIMILARITY SPACE

Identification of common language and methods among the many efforts to quantify and improve patient similarity is vital to improving precision patient care. In addition to a need for improved standardization of medical terminology and categorization, there is a further need for an accepted framework for synthesizing data elements into a generalizable “computable phenotype” as a basis for matching similar patients. Defining similar patients, therefore, may require disease- or task-specific methods. Shifts in the core features of a patient’s disease over time add further complexity to defining similarity. Features such as genomic similarity are distinct from features such as similar response to therapy. An example of temporal complexity can be seen in the treatment of cancer, where 2 patients diagnosed with early-stage disease may be quite similar early in their disease trajectories, but if 1 of those patients develops recurrent disease, that patient may subsequently be much more similar to a third patient who had advanced disease at diagnosis (Figure 1). Standardization of language and methodology when discussing patient similarity is vital to the progression of its study and implementation.

Figure 1.

Defining patient similarity. These diagrams represent the clinical courses of 3 hypothetical patients with non-small cell lung cancer. Patient A was diagnosed with early-stage disease, underwent surgery and adjuvant chemotherapy, and, so far, has not developed recurrent disease. Patient B had a trajectory that began similarly but developed cancer recurrence, leading their oncologist to order tumor genomic sequencing and prescribe immunotherapy. Patient C had metastatic disease at diagnosis, which was treated initially with chemotherapy and, subsequently, with immunotherapy. Any definition of similarity among these patients must necessarily be time-dependent; early in the cancer trajectory, patients A and B are most similar, but later in the trajectory, patients B and C are most similar.
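
The time dependence described in Figure 1 can be made concrete with a small sketch. The Python below is illustrative only and is not a method proposed by the workshop: the state labels, the month offsets, and the choice of Jaccard similarity are all assumptions introduced here, and each patient is reduced to a coarse set of descriptors that changes at assumed time points.

```python
from bisect import bisect_right

# Hypothetical state histories loosely based on Figure 1.
# Each entry is (months since diagnosis, descriptors valid from that month on).
# The month offsets and labels are invented for illustration.
STATES = {
    "A": [(0, {"localized", "curative_intent"})],
    "B": [(0, {"localized", "curative_intent"}),
          (18, {"advanced", "immunotherapy"})],
    "C": [(0, {"advanced", "chemotherapy"}),
          (8, {"advanced", "immunotherapy"})],
}


def state_at(patient, month):
    """Return the descriptor set in effect for a patient at a given month."""
    history = STATES[patient]
    starts = [start for start, _ in history]
    idx = bisect_right(starts, month) - 1
    if idx < 0:
        raise ValueError("month precedes the first recorded state")
    return history[idx][1]


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def most_similar_pair(month):
    """Return the pair of patients with the highest similarity at a month."""
    pairs = [("A", "B"), ("A", "C"), ("B", "C")]
    return max(
        pairs,
        key=lambda p: jaccard(state_at(p[0], month), state_at(p[1], month)),
    )


print(most_similar_pair(3))   # early in the trajectory
print(most_similar_pair(24))  # later in the trajectory
```

Under these assumed states, the most similar pair is A and B at month 3 and B and C at month 24, mirroring the caption.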

PATIENT SIMILARITY WORKSHOP DETAILS

To define a common framework for relating patient similarity, a workshop was convened at the American Medical Informatics Association (AMIA) 2019 Annual Meeting entitled: “What defines a patient like mine? A collaborative effort to provide clarity into the computational nomenclature of patient similarity, their requisite data categories, and associated algorithms.” Open to all registrants of the meeting, attendees participated in a series of focused presentations by expert discussants with academic, industry, and regulatory viewpoints. This perspective builds on the consensus recommendations presented by discussants and among attendees.

CONSENSUS RECOMMENDATIONS

Patient similarity can be divided into 4 classes: (1) feature; (2) outcome; (3) exposure; and (4) mixed-class (Figure 2; Table 1). Each class is characterized by its temporality (snapshot versus change over time) and by whether it describes an object or an action. By object, we refer to features that are properties of physical objects (eg, people or tumors), also commonly thought of as baseline characteristics or attributes. By action, we refer to processes performed (eg, various treatment modalities).

Figure 2.

Patient similarity categories. Classes of patient similarity proposed in this perspective. Drawing from a broad range of backgrounds, workshop participants were able to coalesce around 4 major patient similarity categories: (1) Feature, (2) Outcome, (3) Exposure, and (4) Mixed-Class.

Table 1.

Classes of patient similarity

Similarity Class	Temporality	Object or Action	Examples
Feature	Snapshot	Object	Disease type/status, past medical history, treatments received
Outcome	Snapshot	Object	Adverse event, treatment efficacy
Exposure	Change over time	Action	Prior lines of therapy define a cohort for study and reflect disease status
Mixed-class	Snapshot/change over time	Object/Action	Molecularly and disease-matched patients who exhibit a similar outcome to therapy
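
For readers who want to tag their own similarity measures with these classes, Table 1 can be transcribed into a small lookup structure. The sketch below is one possible encoding and is not part of the workshop output; the type names (Temporality, Kind, SimilarityClass) and the helper classes_matching are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Temporality(Enum):
    SNAPSHOT = "snapshot"
    CHANGE_OVER_TIME = "change over time"


class Kind(Enum):
    OBJECT = "object"
    ACTION = "action"


@dataclass(frozen=True)
class SimilarityClass:
    name: str
    temporality: frozenset
    kind: frozenset
    example: str


# Rows transcribed from Table 1; mixed-class spans both values in each column.
TABLE_1 = (
    SimilarityClass(
        "feature",
        frozenset({Temporality.SNAPSHOT}),
        frozenset({Kind.OBJECT}),
        "Disease type/status, past medical history, treatments received",
    ),
    SimilarityClass(
        "outcome",
        frozenset({Temporality.SNAPSHOT}),
        frozenset({Kind.OBJECT}),
        "Adverse event, treatment efficacy",
    ),
    SimilarityClass(
        "exposure",
        frozenset({Temporality.CHANGE_OVER_TIME}),
        frozenset({Kind.ACTION}),
        "Prior lines of therapy define a cohort for study",
    ),
    SimilarityClass(
        "mixed-class",
        frozenset({Temporality.SNAPSHOT, Temporality.CHANGE_OVER_TIME}),
        frozenset({Kind.OBJECT, Kind.ACTION}),
        "Molecularly and disease-matched patients with a similar outcome",
    ),
)


def classes_matching(temporality, kind):
    """Names of the classes compatible with a temporality and a kind."""
    return [
        c.name
        for c in TABLE_1
        if temporality in c.temporality and kind in c.kind
    ]


print(classes_matching(Temporality.CHANGE_OVER_TIME, Kind.ACTION))
```

Labeling a published similarity measure with one of these four names is the kind of shared vocabulary the table is meant to support.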

Class 1: feature similarity

Feature similarity can be considered as the state of a physical object at a point in time or over a short period, that is, a “snapshot.” This would include the mutational status of the tumor, the state of the disease, cancer stage, as well as more complex features, such as past medical history, previous therapies, and allergies. A common example of feature-based similarity in the biomedical informatics domain is the use of diagnostic billing codes to define groups of patients. Despite their demonstrated utility, abstracted features are nevertheless problematic due to their imprecision. Historically, feature similarity has been well studied and implemented in clinical practice. However, high-dimensional feature similarity quickly yields inaccurate or minimal similarities between patients, particularly when dimensionality exceeds the number of patients in a study. Methods to identify the features with the greatest predictive value for a given outcome are necessary to improve the utility of this class of patient similarity measures.
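
The loss of contrast in high dimensions can be illustrated numerically. The sketch below uses synthetic, uniformly random features rather than patient data, and the relative-contrast measure (farthest minus nearest distance, divided by nearest distance) is a choice made here for illustration, not a metric from the workshop.

```python
import math
import random


def relative_contrast(n_patients, n_features, seed=0):
    """Spread of distances from one query patient to a simulated cohort.

    Features are independent uniform random numbers (an assumption made
    purely for illustration). A value near zero means the nearest and
    farthest patients are almost equally distant from the query.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_features)]
    distances = []
    for _ in range(n_patients):
        other = [rng.random() for _ in range(n_features)]
        distances.append(math.dist(query, other))
    nearest, farthest = min(distances), max(distances)
    if nearest == 0:
        return math.inf
    return (farthest - nearest) / nearest


# Fixed cohort of 50 simulated patients, increasing feature count.
for n_features in (2, 20, 200, 2000):
    print(n_features, round(relative_contrast(50, n_features), 3))
```

With the cohort size held fixed, the contrast is expected to shrink as features are added, so the nearest and farthest "matches" become hard to tell apart. This is one motivation for selecting a smaller set of predictive features before computing similarity.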

Class 2: outcome similarity

Outcome similarity focuses on finding matches in time-based endpoints. These metrics try to answer the question, “How did the patient do?” Outcome measures used to match similar patients can also be considered a “snapshot” of a patient’s health. These outcome measures can be process measures of other related interventions, toxicities related to disease or treatment, or classic therapeutic benchmarking outcome measures, among others. Using these metrics, it may be possible to find similar patients for a control group, to support therapeutic benchmarking, and to derive granular, dynamic “features” of a patient that reflect “outcomes” of disease control. Outcome similarity metrics may ultimately be used to develop quintessential real-world evidence (RWE). Although RWE is not without its limitations, developing a granular understanding of how it complements data from existing clinical trials can help clinical trialists select patient populations, help companies prioritize research efforts, or reduce uncertainty for patients and practitioners surrounding treatment decisions. Pulling these outcome measures from systematically mapped sources of structured data reduces variability and enhances RWE as a modeling tool. Nevertheless, data in this space are inherently challenging to analyze and are highly subject to selection bias and confounding.

Class 3: exposure similarity

Exposure similarity identifies patients based on the presence or absence of therapeutic interventions or other exposures that affect their health status. These exogenously applied “actions” may include drugs, devices, surgery, radiation therapy, and environmental exposures. Feature similarity addresses patient and disease characteristics as baseline objects, and outcome similarity treats these objects as endpoints. In contrast, exposure similarity defines changes over time, adding a temporal dimension to patient similarity. In an observational cohort study of a therapeutic intervention, exposure similarity is used to define 1 or more groups for comparison. In clinical trials, exposure to prior lines of therapy is used as an inclusion criterion in order to enhance the precision of likely disease activity status and response to therapy. These prior lines of therapy are often described in the indications for approved drugs and biologics. Use of RWE as an external comparator for a single-arm trial places special emphasis on temporal issues as well as on exposure and feature similarity. The groups are not necessarily ascertained in the same temporal period, with the same background availability of therapeutic exposures, or with the same level of granularity regarding the details of the therapeutic interventions. As a result, secular trends in therapeutic patterns, differing availability of therapeutics, or differential ascertainment of the details of exposure may impact outcomes.

Class 4: mixed-class similarity

The last significant class of similarity is the interaction of the 3 previous classes, or mixed-class similarity. For example, the interaction of comorbidity status and diuretic therapy exposure in a patient creates a mixed metric that is more complex and more indicative of true patient similarity. In the case of the 3 cancer patients outlined in Figure 1, the interaction of baseline feature, exposure, and outcome yields vastly different similarity groupings over time. In modern clinical medicine, attempting to derive general phenotypes for patient matching may be extremely challenging; in effect, the “curse of dimensionality” implies that no 2 patients are similar in any meaningful way, given the near-infinite data necessary to accurately portray a patient. Mixed-class similarity represents a computational challenge that has yet to be well addressed. It is likely that computable similarity efforts that are task- and setting-dependent will improve its applicability.
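
One simple way to picture a task-dependent mixed-class measure is a weighted average of per-class scores. This is an assumption made for illustration, not a formula from the workshop: the function name, the example scores, and the two weight profiles below are hypothetical, and a weighted average is only one of many ways the classes could be combined.

```python
def mixed_class_similarity(scores, weights):
    """Weighted average of per-class similarity scores in [0, 1].

    `scores` and `weights` are dicts keyed by class name. The weights
    encode how much a given task cares about each class.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * scores[name] for name in weights) / total


# Hypothetical per-class scores for one pair of patients.
pair_scores = {"feature": 0.9, "outcome": 0.2, "exposure": 0.5}

# Hypothetical task profiles: cohort selection ignores outcome,
# therapeutic benchmarking emphasizes it.
cohort_selection = {"feature": 2, "outcome": 0, "exposure": 2}
benchmarking = {"feature": 1, "outcome": 3, "exposure": 0}

print(mixed_class_similarity(pair_scores, cohort_selection))
print(mixed_class_similarity(pair_scores, benchmarking))
```

The same pair of patients scores differently under the two profiles, which is the sense in which a mixed-class measure is task- and setting-dependent.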

OPPORTUNITIES FOR IMPROVEMENT

Ultimately, multiple sources of data derived from the previously discussed classes of patient similarity must be integrated to adequately construct patient cohorts that are similar in phenotype and genotype. Previous studies have demonstrated a preference for the study of molecular measures of patient similarity; however, multi-class phenotype calculation is also necessary. One approach to harmonizing the collection and sharing of data is the creation of networks between stakeholders in order to agree on key parameters, such as patient consent and data dictionaries. Because patients’ diseases are heterogeneous and evolve molecularly following treatment, sequential clinical and molecular analysis may be required to accurately assign patients to the most similar patient cohort. Approaches based on sequence alignment may provide promising solutions for matching patients while considering important temporal information. The application of machine learning (ML) to analyze observational cohorts also has the potential to improve clinical decision making but will require very large populations followed prospectively throughout the clinical course of each patient. Patient similarity will also be key to a type of ML called reinforcement learning (RL). In contrast to traditional supervised learning methods that usually rely on single-episode training, RL tackles clinical questions framed as sequential decision-making problems using sampled, evaluative, and delayed feedback. Identifying common health variables is a vital element of biomedical research. Currently utilized general ontologies for medical concepts (eg, SNOMED, ICD) provide mechanisms for structuring the often-unstructured data contained in health records. These coding systems have improved the structure of the medical record but lack the ability to define key clinical characteristics for many aspects of clinical care.
Newer frameworks, such as mCODE, are specifically designed to capture such key concepts and may serve to further standardize the language of medical data and provide a platform to improve the computation of patient similarity.
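
The sequence-alignment idea mentioned above can be sketched with a standard global alignment score (Needleman-Wunsch) over ordered treatment events. This is a minimal illustration under stated assumptions rather than a method endorsed by the workshop: the event labels and the scoring values (match +1, mismatch -1, gap -1) are hypothetical choices.

```python
def alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score for two event sequences."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per event.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two events
                score[i - 1][j] + gap,       # skip an event in seq_a
                score[i][j - 1] + gap,       # skip an event in seq_b
            )
    return score[-1][-1]


# Hypothetical treatment sequences in chronological order.
chemo_then_immuno = ["chemotherapy", "immunotherapy"]
immuno_then_chemo = ["immunotherapy", "chemotherapy"]

print(alignment_score(chemo_then_immuno, chemo_then_immuno))
print(alignment_score(chemo_then_immuno, immuno_then_chemo))
```

Here the same two therapies given in the opposite order score lower than an identical sequence, which is the temporal information that a purely set-based comparison would discard.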

CONCLUSION

In many respects, it is easier to sequence a whole cancer genome in 2020 than to readily and reproducibly define a group of “similar” patients. Similarity classes create a framework for defining groups of patients who are likely to have similar defining traits, outcomes, and/or temporal experiences. This aids clinicians in their treatment decisions and patients in anchoring themselves to a defined wellness or illness group. While every patient is unique and every journey is different, in practice, treatments are targeted toward a group of patients with similar characteristics for whom we would reasonably expect a similar response. This is the same reason nomenclature has moved from personalized medicine to precision medicine. This objective approach to similarity has major advantages. First, people want to develop kinship with patients facing medical issues similar to their own, as demonstrated through the development of cancer biomarker-defined patient advocacy groups (eg, ROS1ders, EGFR Resisters). These groups demonstrate how patient similarity can provide a community for patients while also serving as a launchpad for further research. Second, reproducible similarity metrics are also used in drug development, as industry and regulatory bodies approach drug approvals in defined patient cohorts, with biomarker- and prior treatment-specific indications granted by the FDA. Taken together, this perspective represents a nascent effort to bring together a variety of stakeholders in patient similarity to define common nomenclature. Communities that centralize stakeholders, such as AMIA, must continue to unify future clinical and research efforts in this space. We believe these classes will provide a clear and useful basis for communicating work surrounding patient similarity.

AUTHOR CONTRIBUTIONS

All authors contributed to the manuscript preparation and have approved the final version of the manuscript and agree to be accountable for all aspects of the work.
