Yizhao Ni, Andrew F Beck, Regina Taylor, Jenna Dyas, Imre Solti, Jacqueline Grupp-Phelan, Judith W Dexheimer.
Abstract
OBJECTIVE: (1) To develop an automated algorithm to predict a patient's response (ie, whether the patient agrees or declines) before he/she is approached with a clinical trial invitation; (2) to assess the algorithm's performance and its predictors on real-world patient recruitment data for a diverse set of clinical trials in a pediatric emergency department; and (3) to identify directions for future studies on predicting patients' participation responses.
Keywords: machine learning; patient-directed precision recruitment; predictive modeling; socioeconomic status
Year: 2016 PMID: 27121609 PMCID: PMC4926740 DOI: 10.1093/jamia/ocv216
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Figure 1: The overall processes of the study.
The List of Variables Collected from Multiple Sources
| Variable Name | Variable Description |
|---|---|
| Age (EI) | Patient’s age |
| Gender (EI) | Patient’s gender |
| Race (EI) | Patient’s race |
| Ethnicity (EI) | Patient’s ethnicity |
| Insurance type (EI) | Patient’s insurance type (eg, commercial, Medicare, and self-pay) |
| Arrival means (EI) | How the patient arrived (eg, walk-in, private car, other) |
| Arrival time (EI) | The time interval in which the patient arrived (in 2-h increments) |
| Arrival season (EI) | The season of the patient visit (spring, summer, autumn, winter) |
| Guardian presence (EI) | Is the patient escorted by his/her legal guardian? (yes/no) |
| Arrival complaint (EI) | Number of arrival complaints |
| Chief complaint (EI) | The category of the patient’s chief complaint (eg, abdominal pain) |
| Pain score (EI) | The pain score assessed by clinicians (normalized to a 0-10 scale) |
| Triage priority (EI) | Is this patient a triage priority? (yes/no) |
| Acuity (EI) | The acuity of the patient’s chief complaint (1 to 5: 1 = urgent, 5 = nonurgent) |
| Length of stay (EI) | Patient’s length of stay (in 30-min increments) |
| Disposition (EI) | The disposition of the patient (admit, discharge, transfer, other) |
| Poor (SES) | Percentage of persons within a census tract living below 100% of the poverty line |
| Extreme poor (SES) | Percentage of persons within a census tract living below 50% of the poverty line |
| Unemployment (SES) | Unemployment rate within a census tract for persons ≥16 years in the workforce |
| Income (SES) | Median household income within a census tract |
| Occupied house (SES) | Percentage of housing units that are occupied within a census tract |
| House value (SES) | Median value of owner-occupied houses within a census tract |
| Crowded house (SES) | Percentage of households with ≥1 person per room within a census tract |
| Rent house (SES) | Percentage of households who rent their home within a census tract |
| Own car (SES) | Percentage of households who do not own a car within a census tract |
| Marriage (SES) | Percentage of persons aged ≥15 years who have never married within a census tract |
| Education (SES) | Percentage of persons aged ≥25 years with less than 12th grade education within a census tract |
| Complexity (CTC) | Amount of information provided to the patient (simple, moderate, complex) |
| Time required (CTC) | Length of time required for the trial (brief, moderate, extensive) |
| Invasiveness (CTC) | Level of invasiveness of the trial (from 1 to 5: 1 indicating noninvasive and 5 highly invasive) |
| Incentive (CTC) | Amount of compensation |
| Conductor (CTC) | Conductor of the clinical trial (patient, parent, clinical research coordinator [CRC], nurse, or physician) |
| Trial type (CTC) | Type of the clinical trial (observation, intervention, other) |
| Randomization (CTC) | Is the trial a randomized trial? (yes/no) |
| Disease specific (CTC) | Is the trial a disease specific trial? (yes/no) |
| Multi-center (CTC) | Is the trial a multi-center trial? (yes/no) |
| Sample required (CTC) | Does the trial require samples (eg, blood sample)? (yes/no) |
| Follow-up visit (CTC) | Does the trial require follow-up visits? (yes/no) |
| Follow-up call (CTC) | Does the trial require follow-up calls? (yes/no) |
| Insurance restriction (CTC) | Does the trial only enroll Medicare or self-pay patients? (yes/no) |
| Sensitive topic (CTC) | Does the trial involve sensitive topics? (yes/no) |
“EI” in “Variable Name” indicates an “Encounter Information” variable, “SES” a socioeconomic status variable and “CTC” a clinical trial characteristics variable.
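The table lists the variables but not how they are encoded for modeling. The odds-ratio table later in this section reports contrasts such as "winter vs summer", which is consistent with reference-category (dummy) coding of categorical variables. A minimal sketch of that encoding, assuming reference coding; `dummy_code` is a hypothetical helper, not the authors' code:

```python
from typing import Dict, List, Sequence


def dummy_code(values: Sequence[str], reference: str) -> List[Dict[str, int]]:
    """Reference (treatment) coding: one 0/1 column per non-reference level.

    The reference level is encoded as all zeros, so each column's fitted
    coefficient is read as "level vs reference" (eg, "winter vs summer").
    """
    levels = sorted(set(values) - {reference})
    return [{f"{lvl} vs {reference}": int(v == lvl) for lvl in levels} for v in values]


# Hypothetical arrival seasons for five ED visits, with summer as reference.
seasons = ["summer", "winter", "autumn", "spring", "winter"]
rows = dummy_code(seasons, reference="summer")
for season, row in zip(seasons, rows):
    print(season, row)
```

Dropping the reference column avoids perfect collinearity with the model intercept, which is why k levels yield only k - 1 indicator columns.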
Figure 2: The consent and decline rates of the clinical trials.
Performance of Different Classification Algorithms with All Variables (Ten-Fold Cross-Validation)
| Classifier | Precision (%) | Recall (%) | F-measure (%) | AUC (%) | P-value* |
|---|---|---|---|---|---|
| BASELINE | 61.68 | 61.58 | 61.54 | 50.64 | 1.06E-9 |
| Logistic Regression | 70.82 | 92.02 | 80.04 | | 2.85E-1 |
| SVM + Linear Kernel | 70.22 | 92.02 | 79.65 | 69.91 | 2.83E-1 |
| SVM + RBF Kernel | 70.35 | | | 69.46 | N/A |
| Random Forest | | 79.31 | 75.76 | 72.13 | 5.56E-6 |
Bold numbers indicate the best results.
*P-values were calculated with a paired t-test on the F-measures from 10-fold cross-validation, comparing the best algorithm (SVM + RBF kernel) with each of the other algorithms.
N/A indicates that the two compared performances are identical, so no P-value is returned.
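The footnote describes a paired t-test on F-measures from 10-fold cross-validation. A minimal sketch of the test statistic, assuming each algorithm's F-measure is recorded per fold and the two algorithms are paired by fold; the fold scores below are invented for illustration, and `paired_t_statistic` is a hypothetical helper. In practice, `scipy.stats.ttest_rel` returns the same statistic together with a P-value.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t statistic of a paired t-test on fold-matched scores a[i] vs b[i]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 folds")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the per-fold differences
    if sd == 0:
        raise ValueError("all fold differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


# Hypothetical per-fold F-measures (%), NOT values from the study.
best = [81.2, 79.8, 80.5, 82.1, 78.9, 80.7, 81.5, 79.4, 80.9, 81.0]
other = [80.1, 79.5, 79.0, 81.3, 78.2, 80.9, 80.2, 78.8, 80.0, 80.6]

t = paired_t_statistic(best, other)
# Two-sided 5% critical value of Student's t with 10 - 1 = 9 degrees of freedom.
T_CRIT_DF9 = 2.262
print(f"t = {t:.3f}, significant at 0.05: {abs(t) > T_CRIT_DF9}")
```

Pairing by fold removes the fold-to-fold variation that both algorithms share, which is why a paired test rather than an unpaired one fits this comparison.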
Figure 3: Performance of the predictive models on individual clinical trials under the out-of-domain test simulation.
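The caption does not define the out-of-domain simulation. One plausible reading, stated here as an assumption, is that each trial in turn is held out for testing while the model is trained on patients from the remaining trials. A sketch of such leave-one-trial-out splits; the helper name and trial labels are hypothetical, and scikit-learn's `LeaveOneGroupOut` implements the same idea:

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def leave_one_trial_out(
    trial_ids: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held-out trial, train indices, test indices), one split per trial."""
    by_trial: Dict[Hashable, List[int]] = defaultdict(list)
    for i, trial in enumerate(trial_ids):
        by_trial[trial].append(i)
    for held_out, test_idx in by_trial.items():
        # Train only on approaches from the other trials.
        train_idx = [i for i, t in enumerate(trial_ids) if t != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical trial label for each of six patient approaches.
trials = ["A", "B", "A", "C", "B", "A"]
splits = list(leave_one_trial_out(trials))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

Because no patient from the held-out trial appears in training, per-trial scores estimate how the model would behave on a trial it has never seen.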
Performance of Logistic Regression with Different Variable Sets (Ten-Fold Cross-Validation)
| Set | EI | SES | CTC | Precision (%) | Recall (%) | F-measure (%) | AUC (%) | P-value* |
|---|---|---|---|---|---|---|---|---|
| 1 | √ | × | × | 65.33 | 81.95 | 72.69 | 61.25 | 3.39E-10 |
| 2 | × | √ | × | 61.61 | 91.80 | 73.72 | 52.15 | 1.05E-7 |
| 3 | × | × | √ | 70.64 | 90.12 | 79.20 | 72.22 | 9.60E-3 |
| 4 | √ | √ | × | 65.06 | 82.25 | 72.65 | 61.94 | 1.99E-8 |
| 5 | √ | × | √ | 70.70 | 90.50 | 79.38 | 72.23 | 5.28E-2 |
| 6 | × | √ | √ | 70.01 | | 79.60 | 71.41 | 3.30E-1 |
| 7 | √ | √ | √ | | 92.02 | | | N/A |
√ variable set used; × otherwise.
Bold numbers indicate the best results.
*P-values were calculated with a paired t-test on the F-measures from 10-fold cross-validation, comparing the best variable set (set 7) with each of the other variable sets.
N/A indicates that the two compared performances are identical, so no P-value is returned.
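The F-measure column is consistent with the balanced F-measure, the harmonic mean of precision and recall: for set 3, 2 × 70.64 × 90.12 / (70.64 + 90.12) ≈ 79.20. A small check, assuming the first two metric columns are precision and recall. Not every row reproduces exactly (the BASELINE row in the classifier comparison gives 61.63 rather than 61.54), so the reported values may have been averaged across folds.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # conventional value when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall (%) of variable set 3 (CTC only) from the table above.
print(round(f_measure(70.64, 90.12), 2))  # prints 79.2
```

The harmonic mean is pulled toward the smaller of the two inputs, which is why sets with high recall but precision near 65% end up with F-measures in the low 70s.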
Variables Output by Logistic Regression That Were Significant at the P ≤ .1 Level (Ordered by Odds Ratio).
| Variable Index | Variable category | Variable description | OR (95% CI) |
|---|---|---|---|
| 1 | CTC | Disease specific trial: yes vs no+ | 5.29 (0.91, 30.89) |
| 2 | EI | Guardian presence: yes vs no* | 2.22 (1.03, 4.78) |
| 3 | EI | Arrival means: other means vs by car* | 1.56 (1.00, 2.44) |
| 4 | EI | Arrival season 1: winter vs summer* | 1.54 (1.14, 2.08) |
| 5 | EI | Arrival season 2: autumn vs summer* | 1.33 (1.00, 1.77) |
| 6 | EI | Race: White vs African American* | 1.31 (1.02, 1.69) |
| 7 | EI | Disposition: discharge vs admission+ | 1.29 (0.96, 1.74) |
| 8 | EI | Length of stay: every 30-min increment* | 1.06 (1.03, 1.09) |
| 9 | EI | Pain score: every 1-point increment* | 0.95 (0.92, 0.98) |
| 10 | SES | Extreme poor: every 3% increment* | 0.94 (0.88, 0.99) |
| 11 | CTC | Randomization: yes vs no+ | 0.43 (0.16, 1.14) |
| 12 | CTC | Multi-center: yes vs no* | 0.37 (0.13, 0.99) |
| 13 | CTC | Complexity 1: complex vs simple* | 0.24 (0.06, 0.96) |
| 14 | CTC | Complexity 2: moderate vs simple* | 0.16 (0.03, 0.80) |
| 15 | CTC | Follow-up visit: yes vs no* | 0.10 (0.02, 0.43) |
| 16 | EI | Chief complaint: swollen lymph nodes+ | 0.08 (0.004, 1.52) |
| 17 | EI | Chief complaint: pain+ | 0.06 (0.002, 1.86) |
*Variable significant at P ≤ .05 level, + variable significant at P ≤ .1 level. OR: odds ratio, CI: confidence interval.
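The odds ratios are exponentiated logistic-regression coefficients, and rows reported per increment (eg, "every 3% increment") correspond to scaling the coefficient by the increment before exponentiating, so OR_k = exp(k·β) = OR_1^k. A sketch assuming the intervals are Wald intervals (β ± 1.96·SE on the log-odds scale); the reported figures are consistent with this, but the table does not state it. Both helpers are hypothetical, and because the published OR and limits are rounded, the recovered SE is only approximate.

```python
import math
from typing import Tuple

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def odds_ratio_ci(beta: float, se: float, increment: float = 1.0) -> Tuple[float, float, float]:
    """OR and Wald 95% CI for an `increment`-unit change in a predictor."""
    b, s = beta * increment, se * abs(increment)  # rescale on the log-odds scale
    return math.exp(b), math.exp(b - Z_95 * s), math.exp(b + Z_95 * s)


def coef_from_reported(or_: float, lo: float, hi: float) -> Tuple[float, float]:
    """Back out (beta, SE) from an OR whose 95% CI is symmetric on the log scale."""
    return math.log(or_), (math.log(hi) - math.log(lo)) / (2 * Z_95)


# Guardian presence: OR 2.22 (1.03, 4.78) in the table above.
beta, se = coef_from_reported(2.22, 1.03, 4.78)
or_, lo, hi = odds_ratio_ci(beta, se)
print(f"OR {or_:.2f} ({lo:.2f}, {hi:.2f})")

# Pain score: OR 0.95 per 1 point, rescaled to a 3-point increment.
beta_p, se_p = coef_from_reported(0.95, 0.92, 0.98)
print(f"OR per 3 points: {odds_ratio_ci(beta_p, se_p, increment=3)[0]:.3f}")
```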
False-Positive Errors Made by the Logistic Regression (LR) Algorithm
| Category (percentage) | Subcategory | Frequency |
|---|---|---|
| Participant Attitude (37.8%) | Generally not interested in research studies | 28 |
| Time Restraints (29.7%) | Enrollment process interrupted by patient treatment | 5 |
| | Parent(s) occupied by other activities (eg, taking care of the patient, working on insurance issues) | 11 |
| | Being discharged, not willing to stay | 6 |
| Study Procedures (17.6%) | Could not complete the enrollment (eg, could not use the computer or understand the protocol) | 2 |
| | Concerns about additional invasive techniques (mainly blood draws) | 6 |
| | Concerns about privacy (access to the patient’s EHR) | 2 |
| | Unspecified concerns | 3 |
| Patient Status (14.9%) | Patient too tired (eg, sleeping or tired due to a long stay in the ED) | 3 |
| | Patient too ill (eg, headache and pain) | 8 |
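The category percentages are each category's share of the 74 false-positive cases tabulated above (28 + 22 + 13 + 11). A short sketch that recomputes them from the subcategory frequencies; the shortened subcategory labels are paraphrases, not the authors' wording:

```python
from typing import Dict

# Frequencies copied from the error table, grouped by category.
errors: Dict[str, Dict[str, int]] = {
    "Participant Attitude": {"Not interested": 28},
    "Time Restraints": {"Interrupted by treatment": 5, "Parent(s) occupied": 11, "Being discharged": 6},
    "Study Procedures": {"Could not complete": 2, "Invasive techniques": 6, "Privacy": 2, "Unspecified": 3},
    "Patient Status": {"Too tired": 3, "Too ill": 8},
}

total = sum(sum(sub.values()) for sub in errors.values())
# Each category's share of all tabulated errors, rounded to one decimal place.
shares = {cat: round(100 * sum(sub.values()) / total, 1) for cat, sub in errors.items()}
print(total, shares)
```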