Kevin Zhang (1), Dina Demner-Fushman (2)
(1) College of Medicine and Life Sciences, University of Toledo, Toledo, OH, USA.
(2) Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
Abstract
OBJECTIVE: To develop automated classification methods for eligibility criteria in ClinicalTrials.gov to facilitate patient-trial matching for specific populations such as persons living with HIV or pregnant women.
MATERIALS AND METHODS: We annotated 891 interventional cancer trials from ClinicalTrials.gov based on their eligibility for human immunodeficiency virus (HIV)-positive patients using their eligibility criteria. These annotations were used to develop classifiers based on regular expressions and machine learning (ML). After evaluating classification of cancer trials for eligibility of HIV-positive patients, we sought to evaluate the generalizability of our approach to more general diseases and conditions. We annotated the eligibility criteria for 1570 of the most recent interventional trials from ClinicalTrials.gov for HIV-positive and pregnancy eligibility, and the classifiers were retrained and reevaluated using these data.
RESULTS: On the cancer-HIV dataset, the baseline regex model, the bag-of-words ML classifier, and the ML classifier with named entity recognition (NER) achieved macro-averaged F2 scores of 0.77, 0.87, and 0.87, respectively; the addition of NER did not result in a significant performance improvement. On the general dataset, ML + NER achieved macro-averaged F2 scores of 0.91 and 0.85 for HIV and pregnancy, respectively.
DISCUSSION AND CONCLUSION: The eligibility status of specific patient populations, such as persons living with HIV and pregnant women, for clinical trials is of interest to both patients and clinicians. We show that it is feasible to develop a high-performing, automated trial classification system for eligibility status that can be integrated into consumer-facing search engines as well as patient-trial matching systems.
Published by Oxford University Press on behalf of the American Medical Informatics Association 2017. This work is written by US Government employees and is in the public domain in the US.
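The results above are reported as macro-averaged F2 scores. As a rough illustration of how such a score can be computed, the sketch below calculates a per-class F-beta score (beta = 2, which weights recall more heavily than precision) and then takes the unweighted mean across classes. The function names and the example class labels ("eligible", "ineligible") are hypothetical choices for this illustration; the abstract does not state the label set or the evaluation code that was used.

```python
def fbeta(precision, recall, beta=2.0):
    """F-beta score from precision and recall; beta=2 gives F2."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_fbeta(y_true, y_pred, beta=2.0):
    """Unweighted mean of per-class F-beta scores (macro average)."""
    # Consider every label that appears in the gold or predicted labels.
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        scores.append(fbeta(precision, recall, beta))
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only.
gold = ["eligible", "eligible", "ineligible", "ineligible"]
pred = ["eligible", "ineligible", "ineligible", "ineligible"]
print(round(macro_fbeta(gold, pred), 4))
```

Because the macro average gives each class equal weight regardless of its frequency, a classifier that performs poorly on a rare class is penalized as much as one that performs poorly on a common class.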