Janette Vazquez, Samir Abdelrahman, Loretta M Byrne, Michael Russell, Paul Harris, Julio C Facelli.
Abstract
INTRODUCTION: Lack of participation in clinical trials (CTs) is a major barrier to the evaluation of new pharmaceuticals and devices. Here we report the results of an analysis of a dataset from ResearchMatch, an online clinical registry, using supervised machine learning approaches and a deep learning approach to discover characteristics of individuals more likely to show an interest in participating in CTs.
Keywords: Supervised machine learning; clinical trial participation; clinical trial recruitment; convolutional neural network; deep learning
Year: 2020 PMID: 33948264 PMCID: PMC8057403 DOI: 10.1017/cts.2020.535
Source DB: PubMed Journal: J Clin Transl Sci ISSN: 2059-8661
Fig. 1. Pipeline of the analysis method.
Descriptive statistics of ResearchMatch dataset
| Demographic/Health Data | Total |
|---|---|
| Total instances | 841,377 |
| Unique users | 102,510 |
| Age, mean (SD) | 35.67 (±16.45) |
| Gender | |
| Male | 28,579 (27.88%) |
| Female | 73,627 (71.82%) |
| Transgender | 293 (0.29%) |
| No Answer | 11 (0.01%) |
| Response | |
| Yes (%) | 420,688 (50%) |
| No (%) | 154,018 (18.31%) |
| No Answer (%) | 266,670 (31.69%) |
| Race | |
| American Indian/Alaska Native (%) | 692 (0.68%) |
| Asian (%) | 3693 (3.60%) |
| Black or African American (%) | 11,628 (11.34%) |
| Multiracial (%) | 4847 (4.73%) |
| Native Hawaiian/Pacific Islander (%) | 215 (0.21%) |
| Other (%) | 2929 (2.86%) |
| White (%) | 78,492 (76.57%) |
| No Answer (%) | 14 (0.01%) |
| Ethnicity | |
| Hispanic (%) | 7795 (7.60%) |
| Non-Hispanic (%) | 94,626 (92.31%) |
| No Answer (%) | 89 (0.09%) |
| Tobacco Use | |
| Yes (%) | 16,540 (16.14%) |
| No (%) | 85,933 (83.83%) |
| No Answer (%) | 37 (0.03%) |
| VetStatus | |
| Non-Veteran (%) | 77,545 (75.65%) |
| Veteran (%) | 4433 (4.32%) |
| No Answer (%) | 20,532 (20.03%) |
| Multiple Birth Status | |
| Single (%) | 100,208 (97.75%) |
| Twin (%) | 2172 (2.12%) |
| Triplet (%) | 98 (0.10%) |
| No Answer (%) | 32 (0.03%) |
| Medical conditions | |
| No medical conditions (%) | 35,495 (34.63%) |
| Reported medical conditions (%) | 66,559 (64.93%) |
| No Answer (%) | 456 (0.44%) |
| Most frequent conditions | |
| 1 C0344315 Depression | 94,626 (92.31%) |
| 2 C0020538 Hypertension | 76,631 (74.75%) |
| 3 C1963064 Anxiety | 51,658 (50.39%) |
| Medication usage | |
| No medication use (%) | 38,329 (37.39%) |
| Reported medication use (%) | 63,296 (61.75%) |
| No Answer (%) | 885 (0.86%) |
| Most frequent medications | |
| 1 C0978787 Multivitamins tab | 29,293 (28.58%) |
| 2 C0162723 Zyrtec | 16,163 (15.77%) |
| 3 C0728762 Synthroid | 16,223 (15.83%) |
| Top 3 States | |
| 1 OH (%) | 12,751 (12.44%) |
| 2 TN (%) | 7953 (7.76%) |
| 3 NY (%) | 7803 (7.61%) |
| Willing to travel (in miles) | |
| 0 (%) | 296 (0.29%) |
| 50 (%) | 43,219 (42.16%) |
| 100 (%) | 17,902 (17.46%) |
| 200 (%) | 22,445 (21.90%) |
| 300 (%) | 2006 (1.96%) |
| 1000 (%) | 16,642 (16.23%) |
| Charge | |
| Guardian (%) | 6413 (6.26%) |
| Self (%) | 96,097 (93.74%) |
| How learn | |
| Facebook-Advertisement (%) | 738 (0.72%) |
| From a friend/colleague (%) | 6960 (6.79%) |
| From an organization (%) | 16,748 (16.34%) |
| From my physician (%) | 1198 (1.17%) |
| Health fair (%) | 554 (0.54%) |
| News release (%) | 1933 (1.89%) |
| Other promotion (%) | 10,039 (9.79%) |
| RM code (%) | 20,134 (19.64%) |
| Search Engine – Google (%) | 14,957 (14.59%) |
| No Answer (%) | 29,249 (28.53%) |
Standardized differences (SMD) and multicollinearity values for the ResearchMatch dataset. Standardized differences compare ‘yes’ and ‘no’ responders.
| Variable | SMD | Multicollinearity |
|---|---|---|
| Contact_date | 0.028 | 1.076947 |
| Age_at_account_created | 0.001 | 1.148036 |
| Race | 0.005 | 1.098512 |
| Ethnicity | 0.004 | 1.096648 |
| Vetstatus | 0.012 | 1.516087 |
| Gender | 0.003 | 1.092575 |
| Tobacco | 0.008 | 1.048358 |
| Twin | 0.001 | 1.060303 |
| State | 0.012 | 1.025191 |
| Parentstatus | 0.001 | 1.389189 |
| Willing_to_travel | 0.002 | 1.348529 |
| Charge | 0.004 | 1.089152 |
| Has_conditions | 0.007 | 2.467756 |
| Has_meds | 0.009 | 2.029087 |
| Guardian_account_created | 0.032 | 1.983791 |
| Last_login | 0.026 | 1.735373 |
| How_learn | 0.017 | 1.148880 |
| Condition | 0.001 | 2.071828 |
| Medication | 0.001 | 1.716581 |
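
The standardized differences above compare 'yes' and 'no' responders on each variable. As a rough illustration, the sketch below computes a standardized mean difference for one numeric variable using the common pooled-SD form. The source does not state which formula was used, so this form is an assumption, and the function name and sample ages are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Absolute SMD using the pooled-SD form.

    Assumption: the source does not say which SMD formula was applied.
    """
    # Pool the two sample variances by simple averaging.
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    if pooled_sd == 0:
        # Both groups are constant; report no difference.
        return 0.0
    return abs(mean(group_a) - mean(group_b)) / pooled_sd


# Hypothetical ages for 'yes' and 'no' responders (illustrative only,
# not taken from the ResearchMatch dataset).
ages_yes = [25, 32, 41, 38, 29, 55]
ages_no = [27, 30, 44, 36, 31, 53]

smd_age = standardized_mean_difference(ages_yes, ages_no)
print(f"SMD for age: {smd_age:.3f}")
```

By a widely used convention, an SMD below 0.1 is read as a negligible difference between groups; every value in the table falls under that threshold.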
Results for ResearchMatch dataset
| Machine Learning Classifiers | AUC – Validation | AUC – Testing | Accuracy | Recall | Precision |
|---|---|---|---|---|---|
| CNN | 0.8748 | 0.8105 | 0.7483 | 0.7738 | 0.7371 |
| RFC | 0.7284 | 0.7288 | 0.7284 | 0.7306 | 0.7271 |
| KNC | 0.7094 | 0.7091 | 0.7095 | 0.6037 | 0.7653 |
| Decision Tree | 0.7027 | 0.7047 | 0.7027 | 0.7179 | 0.6963 |
| ABC | 0.6804 | 0.6803 | 0.6804 | 0.6817 | 0.6796 |
| LR | 0.6394 | 0.6383 | 0.6394 | 0.6511 | 0.6358 |
| GNB | 0.5895 | 0.5872 | 0.5895 | 0.6743 | 0.5762 |
RFC, Random Forest Classifier; ABC, Adaboost Classifier; KNC, K-Nearest Neighbor; GNB, Gaussian Naïve Bayes; LR, Logistic Regression; CNN, Convolutional Neural Network.
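
For readers unfamiliar with the metrics in the results table, the following sketch shows how accuracy, recall, precision, and AUC can be computed for a binary classifier. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, recall, and precision from binary labels (1 = 'yes')."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): true responses and model scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
# Assumed decision threshold of 0.5 for turning scores into labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, round(auc, 4))
```

AUC is computed from the ranking of the scores rather than from thresholded labels, which is why a model's AUC can differ from its accuracy, as it does for the CNN row.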