Literature DB >> 35622812

Oncologist phenotypes and associations with response to a machine learning-based intervention to increase advance care planning: Secondary analysis of a randomized clinical trial.

Eric Li¹, Christopher Manz², Manqing Liu¹, Jinbo Chen¹, Corey Chivers¹, Jennifer Braun¹, Lynn Mara Schuchter¹, Pallavi Kumar¹, Mitesh S Patel¹, Lawrence N Shulman¹, Ravi B Parikh¹.

Abstract

BACKGROUND: While health systems have implemented multifaceted interventions to improve physician and patient communication in serious illnesses such as cancer, clinicians vary in their response to these initiatives. In this secondary analysis of a randomized trial, we identified phenotypes of oncology clinicians based on practice pattern and demographic data, then evaluated associations between such phenotypes and response to a machine learning (ML)-based intervention to prompt earlier advance care planning (ACP) for patients with cancer. METHODS AND
FINDINGS: Between June and November 2019, we conducted a pragmatic randomized controlled trial testing the impact of text message prompts to 78 oncology clinicians at 9 oncology practices to perform ACP conversations among patients with cancer at high risk of 180-day mortality, identified using a ML prognostic algorithm. All practices began in the pre-intervention group, which received weekly emails about ACP performance only; practices were sequentially randomized to receive the intervention at 4-week intervals in a stepped-wedge design. We used latent profile analysis (LPA) to identify oncologist phenotypes based on 11 baseline demographic and practice pattern variables identified using EHR and internal administrative sources. Difference-in-differences analyses assessed associations between oncologist phenotype and the outcome of change in ACP conversation rate, before and during the intervention period. Primary analyses were adjusted for patients' sex, age, race, insurance status, marital status, and Charlson comorbidity index. The sample consisted of 2695 patients with a mean age of 64.9 years, of whom 72% were White, 20% were Black, and 52% were male. 78 oncology clinicians (42 oncologists, 36 advanced practice providers) were included. Three oncologist phenotypes were identified: Class 1 (n = 9) composed primarily of high-volume generalist oncologists, Class 2 (n = 5) comprised primarily of low-volume specialist oncologists; and 3) Class 3 (n = 28), composed primarily of high-volume specialist oncologists. Compared with class 1 and class 3, class 2 had lower mean clinic days per week (1.6 vs 2.5 [class 3] vs 4.4 [class 1]) a higher percentage of new patients per week (35% vs 21% vs 18%), higher baseline ACP rates (3.9% vs 1.6% vs 0.8%), and lower baseline rates of chemotherapy within 14 days of death (1.4% vs 6.5% vs 7.1%). Overall, ACP rates were 3.6% in the pre-intervention wedges and 15.2% in intervention wedges (11.6 percentage-point difference). Compared to class 3, oncologists in class 1 (adjusted percentage-point difference-in-differences 3.6, 95% CI 1.0 to 6.1, p = 0.006) and class 2 (adjusted percentage-point difference-in-differences 12.3, 95% confidence interval [CI] 4.3 to 20.3, p = 0.003) had greater response to the intervention.
CONCLUSIONS: Patient volume and time availability may be associated with oncologists' response to interventions to increase ACP. Future interventions to prompt ACP should prioritize making time available for such conversations between oncologists and their patients.

Entities: Chemical

Mesh：

Year: 2022 PMID： 35622812 PMCID： PMC9140236 DOI： 10.1371/journal.pone.0267012

Source DB: PubMed Journal: PLoS One ISSN： 1932-6203 Impact factor: 3.752

Introduction

End-of-life care is often not concordant with the goals and wishes of patients with cancer [1]. Early advance care planning has been shown to improve goal-concordant care, decrease end-of-life spending, decrease aggressive care in cancer, and improve patient mood [2-4]. Advances in machine learning (ML) may enable better identification of patients at the highest risk for mortality in order to target interventions for earlier advance care planning discussions (ACPs) [5-10]. Several studies have demonstrated promise in increasing guideline-concordant practice through behavioral interventions targeted towards clinicians [11,12], and there has been similar interest in leveraging behavioral principles to increase the frequency of advance ACP between oncologists and patients. Previous work suggests that targeted ML-based interventions to clinicians can dramatically increase ACPs and palliative care utilization among patients with serious illness. One pragmatic randomized control trial found that an ML-based prompt to oncology clinicians increased rates of ACPs from 3% to 15% of all patients at a large academic cancer center [5,6]. Similar ML-based interventions have been shown to increase ACP documentation [13], reduce length of stay, and increase home palliative care referrals [14]. However, clinicians have heterogeneous responses to such strategies [11], and the efficacy of such interventions across oncology clinician subgroups is not well understood. Identifying subgroups of oncology clinicians that may be more inclined to respond to behavioral interventions to improve ACP may increase the overall effectiveness of such interventions. Latent profile analysis (LPA) is a hypothesis-free statistical approach to identification of clusters of clinicians based on input variables, and has been used in prior studies to identify phenotypes of patients based on a variety of input data types including clinical [15,16], behavioral [17-19], and activity data [17,20,21]. LPA based on clinician demographics and practice patterns may help identify groups of clinicians with differing engagement and response to behavioral interventions to improve ACP frequency. In this secondary analysis of a randomized trial, we derived oncologist phenotypes using LPA and compared ACP rates before and after the intervention by phenotype. We hypothesized that distinct clusters of clinicians would be identified by LPA, with variation in response to the ML-based intervention tested in the trial across clusters of clinicians. Our findings provide an empirical approach to phenotype response to ML interventions in healthcare in order to refine such interventions.

Methods

The University of Pennsylvania Institutional Review Board approved the study. A waiver of informed consent was granted because this was an evaluation of a health system initiative that posed minimal risk to clinicians and patients.

Study design

This was a secondary analysis of a stepped-wedge randomized trial conducted between June 17 to November 1, 2019 which showed that ML-based nudges among 42 specialty or general oncologists, many of whom worked with an advanced practice provider (APP) as an oncologist-APP dyad (78 total clinicians) caring for 14,607 patients led to a quadrupling of ACP rates (NCT03984773). Eligible clinicians in this secondary analysis included physicians and APPs (physician assistants and nurse practitioners) at 9 medical oncology practices within a large tertiary academic center that participated in the trial. We chose oncologist-APP dyads as the unit of analysis because oncologists usually work 1:1 with APPs in our practice and because oncologists and APPs share responsibility for ACPs for patients. Patients of participating oncologists were excluded if they had a documented ACP prior to the start of the trial, or if they were enrolled in another ongoing trial of early palliative care. Medical genetics encounters were also excluded.

Outcome

The primary outcome was the change in ACP rate among all encounters with patients with >10% predicted 180-day mortality risk in the intervention period compared to the pre-intervention period. Any note which utilized the ACP template in the electronic medical record was classified as an ACP.

Intervention

The clinical trial used an ML algorithm which generated predictions of 180-day mortality for cancer patients, and a multi-pronged behavioral intervention to increase ACP frequency based on the generated predictions. The ML algorithm incorporated 3 classes of variables 1) demographic variables, 2) Elixhauser comorbidities and 3) laboratory and select electrocardiogram data. The algorithm utilized a gradient boosted algorithm to identify patients at risk of short-term mortality [22]. Clinicians caring for patients at high risk of short-term mortality predicted risk of mortality >10% were prompted to initiate an ACP through a multipronged intervention incorporating principles of behavioral economics including peer comparisons, performance reports, and opt-out default text messages based on the ML algorithm. Because clinicians received the intervention only for patients with >10% predicted risk of mortality, our primary analysis only included patients with >10% predicted risk of mortality in order to restrict our cohort to the target population of the intervention. Further details of the intervention and clinical trial are published elsewhere [5,6].

Data

11 variables were included in this study based on their conceptual relevance to a clinician’s expected response to the ML intervention. The selected variables were grouped into three categories: demographic, practice pattern, and end-of-life outcomes. Demographic variables included the clinician’s gender and years in practice. Practice pattern variables included the clinician’s oncology subspecialty (e.g. general oncology, thoracic, genitourinary, etc.); number of days in clinic per week [1-5]; percentage of patient encounters with new patients (0–100%); average number of patient encounters per week (continuous); average number of encounters per day; number of years in practice; and baseline ACP rates in the month prior to the start of our randomized trial. End-of-life outcomes metrics were measured in the year prior to the start of our trial among patients who died and who were part of an oncology clinician’s panel. These variables included chemotherapy received within 14 days of death, death in the hospital, and hospice enrollment prior to death. Practice pattern and end-of-life outcome data came was obtained from Clarity, an EPIC reporting database that contains structured data elements of individual EHR data for patients treated at the University of Pennsylvania Health System. Demographic data and years in practice were extracted from an internal database of the Abramson Cancer Center at Penn Medicine.

Oncologist phenotyping

We used latent profile analysis (LPA), applied to the aforementioned variables, to identify phenotypes of oncologists based on their demographic information and practice patterns. LPA is a statistical modelling approach for recovering hidden groups in data by modeling the probability that individuals in the dataset belong to different groups [23]. LPA is conceptually similar to Latent Class Analysis, however, LPA enables recovery of hidden groups based on continuous data whereas latent class analysis is only suitable for analysis of categorical data. Since most of the variables chosen in our analysis are continuous, we used LPA instead of latent class analysis. 11 variables described in the data section were included in the LPA. These variables were not standardized in the analysis as it has no impact on the results of the clustering algorithm. To determine the model of best fit, we used the Akaike information criterion (AIC), Bayesian information criterion (BIC), and entropy. AIC and BIC are two estimators of a model’s prediction error which balance the goodness of fit with model simplicity [24]. Entropy is a commonly used statistical measure of the separation between classes in LPA [25]. The Bootstrapped Likelihood Ratio Test (BLRT) was also used to assess whether a given model with k classes is significantly more informative than one with k-1 classes [26]. We required that each class contain a minimum of 10% of oncologists (n = 5 oncologists). LPA was conducted using the tidyLPA package in R version 3.6.0 [27]. We attached descriptive labels to each of the clusters in order to provide interpretability to the clustering results. Means were calculated and examined for each of the 11 variables included in the clustering analysis, and labels were selected to capture clinically relevant themes shared by most of the clinicians in the cluster, and to capture variability between clusters.

Statistical analysis

Difference-in-difference analyses tested the association between the identified oncologist phenotypes and response to the nudge. Changes in the ACP rate (pre-intervention vs. intervention period) were compared for each phenotype identified by LPA. We fit a multivariable logistic regression model using the clinician phenotype as a predictor for whether the patient received an ACP or not at the patient-level. Covariates included in the model were the interaction term between oncologist phenotype and intervention period, patients’ age (continuous), gender, race, insurance type, marital status, and Charlson comorbidity score. Adjusted probabilities of receiving an ACP accounted for these variables and were calculated by converting the log-odds ratio from the model output for each class pre-intervention and in the intervention period into a probability. Difference-in-difference estimates comparing class 1 and class 2 to class 3 were calculated by taking the difference in intervention response as measured by the difference in pre-intervention adjusted probability of ACP and intervention period adjusted probability of ACP for each of the classes. The adjusted probabilities and difference-in-difference in percentage points with 95% confidence intervals were estimated by bootstrapping, where the data was resampled 1000 times. Statistical significance of the difference-in-differences was calculated by the p-value of the interaction between clinician phenotype and intervention. In a secondary analysis, we used logistic regression to measure the impact of various clinician-level variables on the likelihood of a patient receiving an SIC in both the pre-intervention and intervention periods. The logistic regression was conducted at the level of the patient-wedge with the outcome of SIC receipt. Patient covariates included in the model were patient sex, age, race, insurance status, marital status, and Charlson Comorbidity Index. Clinician-level variables included in model were the number of days in clinic per week, percentage of new patients per week, average patients per week, average encounters per day, years in practice, and end-of-life quality metrics (hospice enrollment rate, inpatient death rate, and chemo utilization at the end of life). All analyses were conducted using R version 3.6.0.

Sensitivity analysis

To analyze whether response to the intervention was similar among all patients regardless of predicted risk of mortality, we applied the aforementioned analysis to all patients, including those with predicted risk of mortality of less than 10%. We compared response to the ML-based intervention by clinician phenotype identified by LPA as described above in Statistical Analysis.

Results

The trial sample consisted of 78 clinicians (of whom 42 were oncologists), 14 607 patients, and 26 059 patient encounters (). In this secondary analysis of a pragmatic randomized control trial, we studied a subset of oncologists and their patient encounters that included ACPs.

CONSORT diagram.

SIC indicates serious illness conversation, a type of ACP.

Clinician characteristics

We studied 42 oncologist and oncologist-APP dyads in this analysis. Among oncologists, 26 (61.9%) were male and 16 (38.1%) were female. 6 (14.3%) were general oncologists and 36 (86%) were specialty oncologists. The median number of years in practice was 7.4 (IQR 5.3, 13.0), and oncologists spent a mean of 2.8 SD (1.1) days in the clinic per week and saw an average of 28.7 SD (15.2) patients per week. The median percentage of new patients seen per week was 21% (IQR 15.8%, 24.1%), and median number of encounters per day was 9.3 (IQR 8.0,11.5).

Model selection

Models with two latent classes and three latent classes were generated. The entropy of the 2-class model and 3-class model were comparable. The 3-class model was selected as the model of best fit by the BLRT (p = 0.010) and because 3-class model had a lower AIC (2678.46 vs. 2689.46) (. In addition, this model was reviewed by the first and senior authors for clinical interpretability and chosen because the 3-class model distinguished between high and low volume specialty clinicians. This model was chosen to ensure the model did not collapse potentially meaningfully different classes into a single class given comparable statistical estimates of prediction error between the 2-class and 3-class models. Each of the three latent classes contained greater than 10% of the total clinician population. Based on this model, three oncologist phenotypes were identified (

Class 1

This class was comprised of 9 oncologists, containing 21% of the total clinician population. Of the three classes, these oncologists had the most years in practice (mean [range], 8.42 [3.59, 37.0]), saw the most patients per week (mean standard deviation, SD]: 53.2 [8.9]), had the highest number of clinic days per week (mean [SD]: 4.4 [0.7]), had the lowest percentage of new patients per week (mean [SD]: 17% [5.7%]) and had lowest baseline ACP rates (mean [SD]: 0.8% [0.7%]), highest chemotherapy use rates within 14 days of death (mean [SD] 7.1% [7.5%]) and intermediate inpatient death rates(mean [SD] 9.9% [6.9%]). This class is comprised primarily of generalist oncologists with high-volume practices.

Class 2

This class was comprised of 5 specialty oncologists, containing 12% of the total study population. Of the three classes, this class had the fewest years in practice (mean [range] 5.26 [2.39, 21.0]), saw the fewest patients per week (mean [SD]: 9.2 [5.6]), had the fewest clinic days per week (mean [SD]: 1.6 [0.9]), saw the highest percentage of new patients per week (mean [SD]: 34% [13.1%]), had the highest baseline ACP rates (mean [SD]: 3.9% [5.0%]), lowest chemotherapy use rates within 14 days of death (mean [SD]: 1.4% [2.8%]) and lowest inpatient death rates (mean [SD]: 5.8% [4.3%]). This class is comprised primarily of specialist oncologists with low-volume practices.

Class 3

This class was the largest class, comprised of 28 specialty oncologists containing 67% of the study sample. Of the three classes, this class tended to have an intermediate number of years in practice (mean [range] 7.43 [2.06, 31.5]), saw an intermediate number of patients per week (mean [SD]: 24.3 [5.8]), had an intermediate number of clinic days per week (mean [SD]: 2.5 [0.6]), intermediate percentage of new patients per week (mean [SD]: 21% [6.2%]) and had an intermediate baseline ACP rates (mean [SD]: 1.6% [1.5%]) as well as highest inpatient death rates(mean [SD]: 17.2% [11.1%]), and intermediate rates of chemotherapy use within 14 days of death (mean [SD]: 6.5% [7.0%]). This class is comprised primarily of specialist oncologists with high-volume practices.

Intervention response by clinician phenotype for high-risk patients

The probability of a high-risk patient (predicted 180-day mortality >10%) receiving an ACP increased significantly following the intervention among patients receiving care from class 1 and class 2 oncologists compared to class 3 oncologists. Among patients receiving care from class 3 oncologists, the adjusted probability of a high-risk patient receiving an ACP increased from 2.3% pre-intervention to 7.6% during the intervention period. Among patients receiving care from class 2 oncologists, the adjusted probability of ACP increased from 3.1% pre-intervention to 20.7% in the intervention period (adjusted percentage-point difference-in-differences relative to class 3 oncologists 12.3, 95% CI 4.3 to 20.3, p = 0.003) (. Class 1 oncologists also had a significantly greater response relative to class 3 oncologists (adjusted percentage-point difference-in-differences relative to class 3 oncologists 3.6, 95% CI 1.0, 6.1, p = 0.006), though the magnitude of this change was not as large as that of class 2 oncologists. The adjusted probability of ACP for class 1 oncologists increased from 1.9% pre-intervention to 10.7% in the intervention period (.

Intervention response by oncologist phenotype for patients with high predicted risk of mortality.

The adjusted probability of a high risk patient (predicted 180-day mortality risk >10%) of receiving an SIC during the pre-intervention and intervention periods by oncologist phenotype. Class 2 oncologists (green) had the highest response to the intervention, with the probability of receiving an ACP increasing from 3.1% during the pre-intervention period to 20.7% during the intervention period. The adjusted probability of ACP increased from 1.9% to 10.7% among class 1 oncologists (blue), and from 2.3% to 7.6% for class 3 oncologists (red). Multivariable logistic regression models were run at the patient level for patients with a predicted 180-day mortality risk of greater than 10% using the clinician phenotype as a predictor for whether the patient received an ACP or not. Covariates included in the model included the interaction term between oncologist phenotype and intervention period, patients’ age (continuous), gender, race, Insurance type, marital status, and Charlson comorbidity score. Adjusted probabilities of receiving an ACP accounted for these variables and were calculated by converting the log-odds ratio from the model output for each oncologist class pre-intervention and during the intervention period into a probability. The adjusted probabilities and difference-in-difference in percentage points with 95% confidence intervals were estimated by bootstrapping, where the data was resampled 1000 times.

Sensitivity analyses: Intervention response by clinician phenotype for all patients in the study cohort

As a sensitivity analysis, we compared the probability of ACP before and after the intervention for all patients (not only high risk patients) across clinician phenotypes. Consistent with the main analysis, the probability of ACP for all patients increased significantly more for class 2 oncologists compared to class 3 oncologists (adjusted percentage-point difference-in-difference 2.6, 95% CI 0.9 to 4.3, p = 0.002). ( The change in ACP rate was not statistically significant for class 1 oncologists compared to class 3 oncologists (adjusted percentage-point difference-in-difference 0.2, 95% CI 0 to 0.4, p = 0.109) ( Multivariable logistic regression models were run at the patient level for all patients in the cohort using the clinician phenotype as a predictor for whether the patient received an ACP or not. Covariates included in the model included the interaction term between oncologist phenotype and intervention period, patients’ age (continuous), gender, race, Insurance type, marital status, and Charlson comorbidity score. Adjusted probabilities of receiving an ACP accounted for these variables and were calculated by converting the log-odds ratio from the model output for each oncologist class pre- and post-intervention into a probability. The adjusted probabilities and difference-in-difference in percentage points with 95% confidence intervals were estimated by bootstrapping, where the data was resampled 1000 times.

Logistic regression on oncologist characteristics associated with likelihood of SIC

In our adjusted secondary regression analysis, specialist oncologists, higher number of days per week in clinic, and higher percentage of new patients per week were associated with significantly greater likelihood of SIC receipt (.

Discussion

In this secondary analysis of a randomized trial analyzing oncology clinician response to an ML-based intervention to increase ACP frequency, we identified three phenotypes of oncology clinicians based on demographic, practice pattern, and end-of-life quality data. While our overall trial was associated with an 11.6 percentage-point increase in ACPs, we found that this response varied considerably among each of the 3 identified phenotypes. In particular, the intervention was associated with a 5.6-fold and 6.7-fold increase in response rates among class 1 oncologists, who consisted primarily of general oncologists with higher patient volumes; and class 2 oncologists, who consisted primarily of specialists with lower patient volumes; compared to class 3 oncologists, who consisted primarily of specialists with higher patient volumes. While prior studies have identified groups of clinicians who vary in their surveyed attitudes towards ML-based clinical support tools [28], this is one of the first studies to identify phenotypes of clinician response to an ML-based clinical intervention studied in a randomized controlled trial and demonstrate significant variation in response to the intervention by phenotype. These findings are consistent with prior analyses, which have demonstrated the feasibility of using a variety of data sources including clinical [15,16], behavioral [17-19], and activity data [17,20,21] to identify subgroups of clinicians and patients with different responses to interventions. These findings have several important implications for future design of ML interventions, particularly those to improve care of advanced illness. First, this analysis suggests mechanisms by which ML-based interventions may increase advance care planning in previous trials [6,13,14]. One possible reason for variable response to an ML-based intervention observed in this study is variation in cognitive workload. Prior studies of physician behavior have found that the frequency of desired behaviors requiring active cognitive effort such as influenza vaccination, antibiotic prescribing, and hand hygiene decline over the course of the day as cognitive workload builds [29-31]. Class 2 oncologists may have responded more strongly to this ML-based intervention due to several factors, including having more time to spend with their patients due to the lower practice volume. Such clinicians also had better baseline performance of ACPs, suggested by their higher baseline rates of ACPs and higher concordance with clinical practice guidelines for end-of-life care. While this analysis did not exhaustively examine all provider and practice pattern characteristics of these oncology clinicians, our analysis suggests that bandwidth and patient volume may be drivers of response to interventions intended to improve advance care planning and clinician-patient interaction. Second, this analysis offers insights into targeting ML-based interventions. Our analysis argues to focus ML-based interventions on clinician phenotypes that may be more likely to respond to such interventions. In contrast, clinicians and health systems should pay careful attention to resource constraints before deploying potentially expensive ML interventions to clinicians with higher patient volumes, who may be less likely or able to respond. While ML-based interventions or EHR-based clinical decision support usually pose little risk to patient safety and outcomes, some studies have found evidence of “alert fatigue” [32] among clinicians. As our present study demonstrates, a small cluster of clinicians may respond strongly to a particular intervention while most clinicians exhibit less response, limiting broad application of the intervention to all clinicians in a practice setting. Targeted deployment of ML-based interventions in the future to clinicians most likely or able to respond, while mitigating alert fatigue or workflow interruptions for clinicians less likely to respond, is a viable strategy for future deployment of ML-based clinician decision support tools. Third, while techniques to characterize patient phenotypes have been utilized in population health to identify patients for targeted interventions for behavior change [33,34], the application of similar techniques to identify groups of clinicians with differential response to ML-based interventions is relatively unexplored [11]. Utilizing clinician-level data available in institutional data stores or EHRs may provide additional insights into clinician behavior and enable better understanding of clinician response to future ML-based interventions and health systems initiatives. Using such techniques allows for better description of which clinicians are responding to an intervention and the magnitude of response. Leveraging the availability of EHR and additional sources of clinician-level data, combined with hypothesis-free techniques for identification of hidden clusters within data, may provide a clearer way to interrogate the efficacy and responses to ML-based interventions. This study has several limitations. First, this trial was conducted within a single tertiary cancer center with limited sample size. The results of our analysis may be influenced by features of individual oncologists who practice at our center, and the results of this study may be difficult to generalize to other settings whose characteristics of oncologists differ from our sample. However, each cluster includes at least 10% of the study population which insulates our results against inappropriate influence of any single clinician on cluster characteristics. Furthermore, our findings regarding the potential association of patient volume with intervention effectiveness is likely generalizable given the intuitive reasons that lower-volume clinicians likely have more time and clinical bandwidth to have these conversations. Additionally, the study included clinicians who practiced at either academic and/or community sites and includes diverse patients across demographics, socioeconomic, cancer type, and comorbidity domains. Thus, we believe this is generalizable to a large proportion of oncology practices and practicing oncologists. Second, we were also limited to studying the effect of the intervention on ACP frequency, as we did not have adequate follow-up to determine the effect of the intervention on end-of-life outcomes. However, ACPs are a guideline-based quality metric in cancer and other advanced illnesses and a surrogate for downstream goal-concordant care [35-37]. Future analyses may study the impact of ML interventions on metrics such as inpatient death rates, chemotherapy utilization, and hospice enrollment, and how the impact of ML-based interventions may vary by clinician phenotype.

Conclusion

Among three phenotypes of oncologists identified by LPA at a large academic medical center, an ML-based intervention to increase ACP frequency had greater effect on class 1 oncologists, which were generally comprised of high-volume generalists, and class 2 oncologists, which were generally comprised of low-volume specialists, compared to class 3 oncologists, which were generally comprised of high-volume specialists. Not all oncologists respond similarly to ML-based interventions, and response to ML-based interventions to guide clinician behavior may in part be determined by a clinician’s cognitive workload and patient volume. Future initiatives to prompt ACP conversations between oncology clinicians and patients should prioritize making time available for such conversations, in order to maximize clinician response. (DOC) Click here for additional data file.

Intervention response by oncologist phenotype for all patients in the study cohort.

The adjusted probability of any patient in the cohort receiving an SIC during the pre-intervention and intervention periods by oncologist phenotype. Class 2 oncologists (green) had the highest response to the intervention, with the probability of receiving an SIC increasing from 0.5% during the pre-intervention period to 3.8% during the intervention period. The adjusted probability of ACP increased from 0.2% to 1.0% among class 1 oncologists, and from 0.3% to 0.9% for class 3 oncologists. (DOCX) Click here for additional data file.

Model fit statistics by number of classes included in the model.

(DOCX) Click here for additional data file.

Association between oncologist phenotype and response to nudges (whole cohort).

(DOCX) Click here for additional data file.

Logistic regression at the patient-wedge level identifying clinician characteristics associated with increased likelihood of conducting an SIC.

(DOCX) Click here for additional data file. (CSV) Click here for additional data file. (CSV) Click here for additional data file.

Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present. 26 Oct 2021

PONE-D-21-24615

Oncologist Phenotypes and Associations with Response to a Machine Learning-Based Intervention to Increase Advance Care Planning: Secondary Analysis of a Randomized Clinical Trial

PLOS ONE Dear Dr. Parikh, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. I want to apologize for the length of time it took to review this manuscript. As you can see we obtained reviews from 3 independent reviewers including one biostatistician. The overall tenor of the reviews is positive and I believe the comments/suggestions provided are reasonable. Please submit your revised manuscript by Dec 10 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Randall J. Kimple Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide. 3. PLOS requires an ORCID iD for the corresponding author in Editorial Manager on papers submitted after December 6th, 2016. Please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. Please see the following video for instructions on linking an ORCID iD to your Editorial Manager account: https://www.youtube.com/watch?v=_xcclfuvtxQ 4. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Partly Reviewer #2: Partly Reviewer #3: Yes ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: I Don't Know Reviewer #2: No Reviewer #3: Yes ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: No Reviewer #2: No Reviewer #3: No ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: In this secondary analysis of a randomized clinical trial that tested an intervention to increase ACP at a tertiary cancer hospital, the authors performed an analysis of oncologist phenotype to assess whether that impacted the likelihood of increasing ACP discussions. I have 1 major issue with the article and several minor ones below. Mostly, it is very unclear how the phenotypes of the physicians are actually defined. For example, why were there some specialists listed in the generalist phenotype and it is unclear how low volume specialists versus high volume specialists were exactly defined. This makes it very difficult to truly understand the difference between the groups. Minor issues: 1) Why was the initial analysis only done on patient with >10% risk of mortality, with sensitivity analysis on the entire cohort? Why not just do the initial analysis on the entire cohort. 2) In the abstract (page 2 line 36) please add the word patients' when explaining the multivariable analysis variables. 3) When explaining the statistics behind the oncologist phenotyping please explain why you decided to focus on patient volume rather than other provider characteristics (main center versus satellite, age, number of years in practice, etc.) Seems like the authors did have an underlying hypothesis when choosing patient volume as the phenotype to test. 4) Please clarify if the 78 clinicians studied were only in the intervention group of the randomized trial, and if so, shouldn't there only have been 12,170 patient encounters? If both groups were analyzed shouldn't there have been more physicians? 5) As already stated above in the major issues, why were there 3 specialty oncology physicians in the generalists phenotype (how the phenotypes were defined needs to be made more clear). 6) The wording in the response section when describing the difference in response rates between the specialists needs to be a more little more careful. The authors were not studying the individual doctor response rates. Rather they were testing whether the patients were listed as having an SIC and comparing that between the doctor phenotypes. It may seem irrelevant, but it is important statistically as it implies that the individual doctors were being tested for how well they responded to the intervention which is not the case. The statistics were done on a patient level. 7) Discussion: "our analysis suggests that bandwidth and volume are key drivers of response to interventions intended to improve advance care planning and clinician-patient interaction." Please be careful here with your wording. Your analysis does not suggest that these items are key drivers, rather it suggests that they "may be a driver of". Youre multivariate model only included one phenotypic characteristic of physicians (all of the other variables were patient variables). Therefore, you have no idea whether patient volume or other provider variables may be the actual driver. Maybe patient volume is associated with another physician variable (like training programs, gender, etc.) that are the actual drivers. This analysis did not look at that and therefore cannot claim this one provider variable is the key driver. 8) Discussion: "Targeted deployment of ML-based interventions in the future to clinicians most likely or able to respond, while mitigating alert fatigue or workflow interruptions for clinicians less likely to respond, is a viable strategy for future deployment of ML-based clinician decision support tools." One could also argue that an intervention that cannot be implemented by the physicians that are seeing the majority of the patients is probably not a good intervention. If only the doctors who are seeing the least number of patients can intervene than the majority of patients will not be helped by the intervention. Reviewer #2: Thank you to the authors for their hard work and submission and for the opportunity to review this study. This is an interesting secondary analysis of a recently reported trial investigating the use of machine learning to direct behavioral nudges for advanced care planning discussions. This study explores practitioner characteristics to identify potential groups where the intervention may have had a greater effect on practice. Overall, the study is important, a good idea, and interesting. The investigators should be applauded for their work in this area, though I do have a number of comments for clarification, particularly around study design and the conclusions drawn by the authors. Major comments: 1) Overall comment: the use of LPA is creative to try to define clusters/groups of physicians that respond differently to the intervention. However, this is overall less interpretable and more complex, which is reflected in the discussion. Overall, it seems like the investigators' primary objective is to identify characteristics associated with response. To that end, a logistic regression model across characteristics may be the most helpful tool, and I believe it should be included in the study even if it does not end up as a point of emphasis. Otherwise, conclusions are discussed in the context of clusters whose names are potentially overly simplified (comments regarding this challenge below). Rather than generating logistic regression models summarizing physician features with the LPA, it may be more clear to do so with physician characteristics themselves. It may also reduce some of the challenges with small categories caused by the LPA approach. The overall advantage of using the less-transparent LPA approach feels a bit unclear (and less practically useful). 2) LPA: I have a couple of questions for clarification - continuous data is on multiple scales (in this study for instance, clinic days/week versus % new patients versus patient encounters/week). Were these data standardized? Were baseline ACP rates ascertainable from Clarity? 3) The authors used AIC/BIC/entropy approaches to determine the best fit model. More on this decision making process should be discussed (balancing AIC/BIC, etc). AIC/BIC approaches do also have limitations that have been well-discussed in the statistical literature. The concept of "clinical interpretability" should also be discussed further. It is possible that due to the small sample sizes - particularly in the distribution of some characteristics (which result in imbalanced classes) may not allow the generation of highly distinct classes and that the 1 or 2 class models may not be as overfit as the 3 class model. 4) While the results appear to make sense, I think the authors should discuss the limitations of small sample sizes more. Using only 5 oncologists to define a group limits its external generalizability. Only 6 oncologists in the study were generalists, and only 5 were classified into the "low-volume specialists group." While the overall diversity of the general trial (as the authors have highlighted in the discussion) are an overall credit, this also reduces the sizes of each group and makes it more challenging to characterize the subgroups (increasing the brittleness of each group and potential bias). For instance, conclusions drawn on those 6 generalists is highly dependent on those few oncologists; they would also be expected to cluster together as a small group among specialists. Conclusions drawn here may not reflect differences that would be detected if this study was performed exclusively among generalists for instance. I think this limitation may contribute to comment #2 with regards to the models that had fewer classes. 5) On a related note, the "high-volume generalists" category feels like it may be a misnomer - all 6 of the generalists are in this group (which would be expected as a small minority group), but specialists still make up a fair number in the group. Similarly, the small "low-volume specialists" group also has a particularly high baseline ACP rate and fewer years in practice. Limiting the names to specific dimensions loses the resolution/benefit of including all of the variables in the LPA process. Again, logistic regression would help distill some of these features out (such as volume-based metrics). 6) I think there may be an error in line 197 - I think that 82% may be miscalculated. 7) Data availability - I agree that the patient data may not be available, though with 42 oncologists analyzed, I feel like deidentified individual level data should be potentially available for sharing, and I encourage the authors to consider exploring this given the PLOS Data policy. Minor comments: - I think that the "number of oncology clinicians" in Table 1 is a bit misleading as the analysis is based on dyads rather than individual clinicians. - While the authors indicate primacy in line 327-329 in the discussion, I don't believe this is necessarily true. There have been a number of studies now investigating clinician trust and use of AI, particularly in the radiology space. Some of those findings have similar findings as in this study - for instance more junior/trainee radiologists are more likely to follow clinical decision support tools/computer aided diagnosis systems. This historical work should be included and placed in context. Reviewer #3: A secondary analysis of a clinical trial aimed to evaluate associations between phenotypes and response to machine learning based intervention to prompt earlier advance care planning for cancer patients. High-volume specialists had greater response to intervention when compared to low-volume specialists and high-volume generalists. Minor revisions: 1- Line 197: Provide a measure of dispersion, perhaps interquartile range or range, for the median number of years in practice. 2- Line 198: Provide standard deviations for days oncologist spent in clinic and patients seen per week. 3- Line 199-200: Provide a measure of dispersion for these medians. ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No Reviewer #3: No [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. 13 Dec 2021 Please see attached response letter, which is pasted below. December 11th, 2021 Randall J. Kimple MD, PhD Academic Editor PLOS ONE Dear Dr. Kimple and Reviewers, Thank you for your thoughtful review of our manuscript, “Oncologist Phenotypes and Associations with Response to a Machine Learning-Based Intervention to Increase Advance Care Planning: Secondary Analysis of a Randomized Clinical Trial.” We appreciate your comments and have responded to each of your concerns below. Manuscript revisions are highlighted in bold with page numbers indicating pages in the clean, revised version of the manuscript. REVIEWER 1 SPECIFIC COMMENTS 1. It is very unclear how the phenotypes of the physicians are actually defined. For example, why were there some specialists listed in the generalist phenotype and it is unclear how low volume specialists versus high volume specialists were exactly defined. This makes it very difficult to truly understand the difference between the groups. We thank the reviewer for these thoughtful comments and the opportunity to further clarify the rationale and methodology behind this study. We labelled clusters identified by latent profile analysis by calculating means of each of the 11 covariates used in the clustering analysis. To enforce some level of interpretability to the clusters, we then examined these means and assigned descriptive labels to the clusters based on what we perceived to be clinically relevant themes shared by most of the clinicians in the cluster. We now better clarify this in our Methods section. For example, all clusters varied significantly in the average patients seen per week: The mean number of patients seen per week was 24.9, 10.5, and 50.2 among the three clusters. Thus, we labelled each cluster using patient volume was chosen as a covariate to characterize these clusters. Specifically, the label “high-volume” was then given to the first and third clusters because their mean number of patients seen per week was significantly higher in those clusters than in the “low-volume” second cluster. As a result, in this analysis, certain generalists were grouped together with specialists because of underlying similarity in other sociodemographic or practice pattern characteristics. Labels such as “generalist” or “specialist” were attached to the clusters based on overarching similarities between clinicians within the cluster and were provided to add clinical interpretability to the results of the analysis. While not every clinician within the cluster may fit the exact labels attached to the cluster, we believe that the labels identify general patterns of similarity between clinicians within the cluster that are clinically relevant. In the second section of our analysis, each clinician cluster had meaningful differences in response to the intervention, reinforcing the distinctiveness of the clusters identified in this analysis. We have clarified our process for generating cluster labels in our revised manuscript as below. Methods lines 172-176: We attached descriptive labels to each of the clusters in order to provide interpretability to the clustering results. Means were calculated and examined for each of the 11 variables included in the clustering analysis, and labels were selected to capture clinically relevant themes shared by most of the clinicians in the cluster, and to capture variability between clusters. 2. Why was the initial analysis only done on patient with >10% risk of mortality, with sensitivity analysis on the entire cohort? Why not just do the initial analysis on the entire cohort. The goal of the behavioral intervention was to increase SIC rates in patients with high risk of mortality. Because clinicians received the intervention only for patients with >10% risk of mortality, including patients with <10% risk of mortality may dilute potential differences between groups. Thus, in this analysis, we restricted the cohort to the population receiving the nudge so that we could isolate differences in clinician responses to the nudge. To analyze whether phenotype response to the intervention was similar in the whole cohort (including patients with lower risk of mortality), the sensitivity analysis was performed on the entire cohort and did not show meaningful differences from the primary analysis. In response to the reviewer’s comment, we have now better clarified this as below: Methods lines 131-134: Because clinicians received the intervention only for patients with >10% predicted risk of mortality, our primary analysis only included patients with >10% predicted risk of mortality in order to restrict our cohort to the target population of the intervention. 3. In the abstract (page 2 line 36) please add the word patients’ when explaining the multivariable analysis variables. We have now included the word patients in our explanations of each of the multivariable analysis variables Abstract lines 36: Primary analyses were adjusted for patients’ sex, age, race, insurance status, marital status, and Charlson comorbidity index. 4. When explaining the statistics behind the oncologist phenotyping please explain why you decided to focus on patient volume rather than other provider characteristics (main center versus satellite, age, number of years in practice, etc.) Seems like the authors did have an underlying hypothesis when choosing patient volume as the phenotype to test. In Table 1, a summary of all variables included in the LPA are presented, broken down by clinician cluster. As explained in the response to Reviewer 1, Comment 1, we analyzed differences in all covariates to label clusters. Because patient volume best distinguished all clusters, we labelled clusters using this covariate. The reviewer’s point about including provider characteristics such as the number of years in practice is well taken. We have now amended language in our results section to make note of this variation. Results lines 261-262: This class was comprised of 9 general oncologists, containing 21% of the total clinician population. Of the three classes, these oncologists had the most years in practice (mean [range], 8.42 [3.59, 37.0]) Results lines 270-271: Of the three classes, low-volume specialists had the fewest years in practice (mean [range] 5.26 [2.39, 21.0]), saw the fewest patients per week (mean [SD]: 9.2 [5.6]), had the fewest clinic days per week (mean [SD]: 1.6 [0.9]), saw the highest percentage of new patients per week (mean [SD]: 34% [13.1%]), Results lines 279-281: Of the three classes, high-volume specialists tended to have an intermediate number of years in practice (mean [range] 7.43 [2.06, 31.5]), saw an intermediate number of patients per week (mean [SD]: 24.3 [5.8]), As for the reviewer’s point on focusing on practice pattern characteristics rather than provider characteristics, we examined all provider characteristics including number of years in practice, specialty vs general oncology in addition to practice pattern characteristics. After reviewing all covariates, practice pattern covariates (in particular, patient volume) most clearly varied by cluster. Thus, we labelled clusters based on these practice pattern characteristics over provider characteristics. For clarity, all covariates including provider covariates and practice pattern covariates are presented in Table 1 and the results and discussion sections. 5. Please clarify if the 78 clinicians studied were only in the intervention group of the randomized trial, and if so, shouldn’t there only have been 12,170 patient encounters? If both groups were analyzed shouldn’t there have been more physicians? The original clinical trial for the behavioral intervention utilized a stepped wedge design. In this trial, 78 total clinicians were originally included in control wedges and then were subsequently randomized in 4-week intervals to the intervention at different times over the course of the trial. 12,170 patient encounters took place during control periods. 14,267 patient encounters took place during intervention periods. Notably, while 78 clinicians in total were included in the original trial, the unit of analysis in the original trial and this analysis was the oncologist-APP dyad because oncologists primarily work 1:1 with APPs and because oncologists and APPs share responsibility for ACPs for patients. Hence, we report phenotypes for the 42 medical oncologist-APP dyads who participated in the trial. 6. As already stated above in the major issues, why were there 3 specialty oncology physicians in the generalists phenotype (how the phenotypes were defined needs to be made more clear). As explained in the response to Reviewer 1, Comment 1, phenotypes were labelled by calculating the mean of all covariates used in the analysis within each LPA-identified cluster, and subsequently identifying the covariates that most clearly define a cluster. For additional detail please see our response to Reviewer 2 Comment 1 below. Because most clinicians in the group identified in the LPA were generalists, we described this group as generalists to provide a conceptual basis for the identified clusters. 7. The wording in the response section when describing the difference in response rates between the specialists needs to be a little more careful. The authors were not studying the individual doctor response rates. Rather they were testing whether the patients were listed as having an SIC and comparing that between the doctor phenotypes. It may seem irrelevant, but it is important statistically as it implies that the individual doctors were being tested for how well they responded to the intervention which is not the case. The statistics were done on a patient level. The reviewer is correct here and we thank the reviewer for pointing out this important distinction. The difference-in-difference analysis measured the ACP rate at the patient-level. The outcome we measured was the change in ACP rate between the pre-intervention period and the intervention period. While the difference in ACP rate does not directly measure the response rate at the physician-level, it nevertheless captures the response to the intervention at the level of the patients who are followed by physicians within the identified cluster. We have clarified this as below: Results lines 289-296: The probability of a high-risk patient (predicted 180-day mortality >10%) receiving an ACP increased significantly following the intervention among patients receiving care from low-volume specialists and high-volume generalists compared to patients receiving care from high-volume specialists. Among patients receiving care from high-volume specialists, the adjusted probability of a high-risk patient receiving an ACP increased from 2.3% pre-intervention to 7.6% during the intervention period. Among patients receiving care from low-volume specialists, the adjusted probability of ACP increased from 3.1% pre-intervention to 20.7% in the intervention period (adjusted percentage-point difference-in-differences relative to high-volume specialists 12.3, 95% CI 4.3 to 20.3, p=0.003) (Table 2). 8. Discussion: “our analysis suggests that bandwidth and volume are key drivers of response to interventions intended to improve advance care planning and clinician-patient interaction.” Please be careful here with your wording. Your analysis does not suggest that these items are key drivers, rather it suggests that they “may be a driver of”. Your multivariate model only included one phenotypic characteristic of physicians (all of the other variables were patient variables). Therefore, you have no idea whether patient volume or other provider variables may be the actual driver. Maybe patient volume is associated with another physician variable (like training programs, gender, etc.) that are the actual drivers. This analysis did not look at that and therefore cannot claim this one provider variable is the key driver. The reviewer is correct in that our analysis does not definitively demonstrate that patient volume and bandwidth are key drivers of response to our intervention. As the reviewer points out, our analysis did not include variables such as history of training programs, or practice location. Of the variables we did include, however, we believe that patient volume are two variables that associate clearly with stronger response to the intervention. To our reviewer’s point, we have amended the language in our discussion to better reflect this point of feedback and to also reflect the fact that our analysis was non-exhaustive in terms of provider characteristics. Discussion lines 388-391: While this analysis did not exhaustively examine all provider and practice pattern characteristics of these oncology clinicians, our analysis suggests that bandwidth and patient volume may be drivers of response to interventions intended to improve advance care planning and clinician-patient interaction. 9. Discussion: "Targeted deployment of ML-based interventions in the future to clinicians most likely or able to respond, while mitigating alert fatigue or workflow interruptions for clinicians less likely to respond, is a viable strategy for future deployment of ML-based clinician decision support tools." One could also argue that an intervention that cannot be implemented by the physicians that are seeing the majority of the patients is probably not a good intervention. If only the doctors who are seeing the least number of patients can intervene than the majority of patients will not be helped by the intervention. This feedback exactly highlights the importance of our present study. The original clinical trial investigating this behavioral intervention found a quadrupling in Serious Illness Conversation rates in the intervention periods compared to pre-intervention periods. By studying the underlying heterogeneity in response to this intervention, we have now found that most of the response was driven by a few clinicians. We agree with our reviewer that this intervention may not be effective for generalist oncologists, given the low observed response rate amongst generalists. This may dissuade some general oncology practices from taking up an intervention such as this. However, we would argue that conversely this is an excellent intervention for clinicians in the other clusters that have demonstrated a significant response to the intervention. By identifying clusters of physicians and underlying heterogeneity in response to our intervention, we identify groups of clinicians that respond well to the intervention and groups of clinicians that exhibit poorer response to our intervention. Understanding the heterogeneity of response to any particular intervention will allow for nudges to be targeted to the clinicians who are most likely to respond, while avoiding undesired effects like alert fatigue as a result of deploying the intervention to higher-volume clinicians less likely or unable to respond, exactly as our reviewer suggests. We have included additional discussion on the limitations of this intervention in our revised manuscript as below. Discussion lines 399-405: As our present study demonstrates, a small cluster of clinicians may respond strongly to a particular intervention while most clinicians exhibit less response, limiting broad application of the intervention to all clinicians in a practice setting. Targeted deployment of ML-based interventions in the future to clinicians most likely or able to respond, while mitigating alert fatigue or workflow interruptions for clinicians less likely to respond, is a viable strategy for future deployment of ML-based clinician decision support tools. REVIEWER 2 SPECIFIC COMMENTS 1. The use of LPA is creative to try to define clusters/groups of physicians that respond differently to the intervention. However, this is overall less interpretable and more complex, which is reflected in the discussion. Overall, it seems like the investigators' primary objective is to identify characteristics associated with response. To that end, a logistic regression model across characteristics may be the most helpful tool, and I believe it should be included in the study even if it does not end up as a point of emphasis. Otherwise, conclusions are discussed in the context of clusters whose names are potentially overly simplified (comments regarding this challenge below). Rather than generating logistic regression models summarizing physician features with the LPA, it may be more clear to do so with physician characteristics themselves. It may also reduce some of the challenges with small categories caused by the LPA approach. The overall advantage of using the less-transparent LPA approach feels a bit unclear (and less practically useful). We thank this reviewer for the thoughtful feedback and consideration of our work. Regarding this reviewer’s major comments, our choice of approach was ultimately informed by the question we sought to answer with this analysis. The purpose of this study was to identify heterogeneity in clinician response to the clinician-directed intervention by identifying distinct phenotypes of oncology clinicians and examining their response to the intervention. Our purpose was not to identify specific clinical or sociodemographic characteristics associated with response to a behavioral intervention, as many of these characteristics may overlap with each other. The question we chose to explore specifically interrogates the heterogeneity of response and whether all clinicians who participated in this trial can be grouped into clusters that predict response to the behavioral intervention, as identifying these distinct clusters would allow future developers of interventions to target the interventions towards phenotypes most likely to response while pursuing different strategies for other phenotypes. We sought to use statistical methods to group clinicians into clusters based on underlying similarities across multiple variables or categories rather than identifying single characteristics associated with response. We chose this approach because we did not want to examine the effect of single variables on intervention response rate in isolation, and because an approach utilizing multiple variable regression may identify a set of characteristics associated with significant intervention response, but it would be unclear how best to apply those results if no single physician fulfills each of those criteria. We wished to use an approach that captures interactions between variables and identifies groups of clinicians based on underlying similarity across multiple variables. We also believe that the output of such an analysis would also be more clinically relevant, as it is easier to sort clinicians into pre-defined clusters based on their similarity to physicians in the clusters. While there is more than one way to cluster clinicians, we decided to use LPA because it is a hypothesis-free approach to clustering and hence relatively robust to human sources of bias, and because it handles continuous variables well. Latent profile analysis produces clusters based on underlying patterns of similarity without input from study personnel. 2. I have a couple of questions for clarification - continuous data is on multiple scales (in this study for instance, clinic days/week versus % new patients versus patient encounters/week). Were these data standardized? Were baseline ACP rates ascertainable from Clarity? We chose to not standardize variables when performing LPA as standardizing variables has no impact on the results of the clustering algorithm. Baseline ACP rates were ascertainable from Clarity and were used to establish pre-intervention ACP rates. We have clarified this in our revised manuscript as below. Methods lines 163-164: 11 variables described in the data section were included in the LPA. These variables were not standardized in the analysis as it has no impact on the results of the clustering algorithm. 3. The authors used AIC/BIC/entropy approaches to determine the best fit model. More on this decision making process should be discussed (balancing AIC/BIC, etc). AIC/BIC approaches do also have limitations that have been well-discussed in the statistical literature. The concept of "clinical interpretability" should also be discussed further. It is possible that due to the small sample sizes - particularly in the distribution of some characteristics (which result in imbalanced classes) may not allow the generation of highly distinct classes and that the 1 or 2 class models may not be as overfit as the 3 class model. The AIC, BIC, and entropy of the 2-class model and 3-class model were comparable. We ultimately utilized a 3-class model because the AIC and BLRT favored the 3-class model Additionally, after observing that the 3-class model distinguished between high and low-volume specialty clinicians, we wanted to explore full spectrum of heterogeneity and wanted to make sure that we did not collapse meaningfully different classes into a single class, given that the statistical estimates of prediction error for 2-class and 3-class models are near-equivalent. We have clarified this point as below. Results lines 225-232: Models with two latent classes and three latent classes were generated. The entropy of the 2-class model and 3-class model were comparable. The 3-class model was selected as the model of best fit by the BLRT (p = 0.010) and because 3-class model had a lower AIC (2678.46 vs. 2689.46) (S1 Table). In addition, this model was reviewed by the first and senior authors for clinical interpretability and chosen because the 3-class model distinguished between high and low volume specialty clinicians. This model was chosen to ensure the model did not collapse potentially meaningfully different classes into a single class given comparable statistical estimates of prediction error between the 2-class and 3-class models. 4. While the results appear to make sense, I think the authors should discuss the limitations of small sample sizes more. Using only 5 oncologists to define a group limits its external generalizability. Only 6 oncologists in the study were generalists, and only 5 were classified into the "low-volume specialists group." While the overall diversity of the general trial (as the authors have highlighted in the discussion) are an overall credit, this also reduces the sizes of each group and makes it more challenging to characterize the subgroups (increasing the brittleness of each group and potential bias). For instance, conclusions drawn on those 6 generalists is highly dependent on those few oncologists; they would also be expected to cluster together as a small group among specialists. Conclusions drawn here may not reflect differences that would be detected if this study was performed exclusively among generalists for instance. I think this limitation may contribute to comment #2 with regards to the models that had fewer classes. We thank this reviewer for pointing out this important nuance in the discussion of our study. Because our study focuses on heterogeneity in response to an intervention, any results will always be limited by the characteristics of the underlying study population. Our study is no different in that our study population was comprised of all oncologists at a large academic institution, although our clinician sample included a range of specialist and generalist clinicians. While clusters may have been labelled differently in a different sample, there is likely generalizability of our results to other academic oncology institutions. We believe that our finding of stronger response to behavioral interventions among lower-volume clinicians with higher baseline ACP rates would likely hold true even among community practices given the intuitive reasons of these clinicians having likely having more time and clinical bandwidth to have these conversations, and likely having higher buy-in to ACPs as suggested by higher baseline ACP rates. We also believe that the clusters identified associate well with the sociodemographic and practice pattern characteristics noted in the cluster labels. Consistent with prior literature and clustering best practices, our analysis identified clusters with a minimum cluster size of at least 10% of the entire study cohort which insulate the results of the analysis against inappropriate influence of any single clinician on cluster characteristic. Finally, we believe that the key positive finding of our study is the identification of clusters of clinicians with heterogeneous response to our behavioral intervention. A similar analysis conducted on a different group of clinicians may yield clusters with different features, however we believe other analyses may find those clusters demonstrate meaningful differences in response to the intervention in question. We have clarified this limitation as below: Discussion lines 425--432: This study has several limitations. First, this trial was conducted within a single tertiary cancer center with limited sample size. The results of our analysis may be influenced by features of individual oncologists who practice at our center and, the results of this study may be difficult to generalize to other settings whose characteristics of oncologists differ from our sample. However, each cluster includes at least 10% of the study population which insulates our results against inappropriate influence of any single clinician on cluster characteristics, and our findings regarding the potential association of patient volume with intervention effectiveness is likely generalizable given the intuitive reasons that lower-volume clinicians likely have more time and clinical bandwidth to have these conversations. 5. On a related note, the "high-volume generalists" category feels like it may be a misnomer - all 6 of the generalists are in this group (which would be expected as a small minority group), but specialists still make up a fair number in the group. Similarly, the small "low-volume specialists" group also has a particularly high baseline ACP rate and fewer years in practice. Limiting the names to specific dimensions loses the resolution/benefit of including all of the variables in the LPA process. Again, logistic regression would help distill some of these features out (such as volume-based metrics). The reviewer is correct in pointing out that the labels used to describe each cluster do not always perfectly describe each clinician in the cluster. Rather, these labels were designed to capture what we believed to be the most clinically distinctive characteristics of the clinicians together as a group or cluster. Please see our response to Reviewer 1 Comment 1 for further discussion of our methodology in choosing cluster labels. We believe that our clustering approach with LPA still has several advantages over a logistic regression based approach when attempting to identify oncologist phenotypes. Certain features associated with heterogeneous response to the intervention only emerge when those features are considered together. Please see our response to Reviewer 2 Comment 1 for further discussion of our logic in choosing LPA over logistic regression. 6. I think there may be an error in line 197 - I think that 82% may be miscalculated. We thank the reviewer for pointing out this error. We have corrected line 206 to 86%. Results, line 214-216: We studied 42 oncologist and oncologist-APP dyads in this analysis. Among oncologists, 26 (61.9%) were male and 16 (38.1%) were female. 6 (14.3%) were general oncologists and 36 (86%) were specialty oncologists. 7. Data availability - I agree that the patient data may not be available, though with 42 oncologists analyzed, I feel like deidentified individual level data should be potentially available for sharing, and I encourage the authors to consider exploring this given the PLOS Data policy. We thank the reviewer for the suggestion. We have now stipulated that we will provide deidentified individual-level data for sharing, per the PLOS Data policy. 8. I think that the "number of oncology clinicians" in Table 1 is a bit misleading as the analysis is based on dyads rather than individual clinicians. For clarity, we have removed this row in Table 1 and instead focus on highlighting oncology-APP dyads as the main unit of analysis, and clustering on sociodemographic and practice pattern characteristics of the oncologist of each dyad. 9. While the authors indicate primacy in line 327-329 in the discussion, I don't believe this is necessarily true. There have been a number of studies now investigating clinician trust and use of AI, particularly in the radiology space. Some of those findings have similar findings as in this study - for instance more junior/trainee radiologists are more likely to follow clinical decision support tools/computer aided diagnosis systems. This historical work should be included and placed in context. As the reviewer points out, there are several papers in the existing literature that surveyed physicians about their attitudes towards machine-learning based clinical decision support tools and artificial intelligence. These studies almost exclusively rely on semi-structured interviews with clinicians, and group the participating physicians into clusters based on review on the interview transcripts. We have now contextualized this prior research in our discussion section; however, we believe that our analysis is one of the first to use a hypothesis-free clustering algorithm to identify clusters and demonstrate meaningful variation by cluster in response to a machine-learning based behavioral intervention. Discussion lines 367-371: While prior studies have identified groups of clinicians who vary in their surveyed attitudes towards ML-based clinical support tools (28), this is one of the first studies to identify phenotypes of clinician response to an ML-based clinical intervention studied in a randomized controlled trial and demonstrate significant variation in response to the intervention by phenotype. REVIEWER 3 SPECIFIC COMMENTS: 1. Line 197: Provide a measure of dispersion, perhaps interquartile range or range, for the median number of years in practice. We have now included the interquartile range for the median number of years in practice. Results line 216-217: The median number of years in practice was 7.4 (IQR 5.3,13.0), 2. Line 198: Provide standard deviations for days oncologist spent in clinic and patients seen per week. We have now included the standard deviations for the number of days in the clinic per week and patients seen per week. Results line 217-218: oncologists spent a mean of 2.8 (SD 1.1) days in the clinic per week and saw a mean of 25 (SD 15.2) patients per week. 3. Line 199-200: Provide a measure of dispersion for these medians. We have included the interquartile ranges for the median percentage of new patients seen per week and median number of encounters per day. Results line 221-222: The median percentage of new patients seen per week was 21% (IQR 15.8%, 24.1%), and median number of encounters per day was 9.3 (IQR 8.0,11.5). Thank you for your consideration. Sincerely, Ravi B. Parikh and Eric H. Li Submitted filename: ResponseToReviewers_PLOS.docx Click here for additional data file. 14 Jan 2022

PONE-D-21-24615R1

Oncologist Phenotypes and Associations with Response to a Machine Learning-Based Intervention to Increase Advance Care Planning: Secondary Analysis of a Randomized Clinical Trial

Please pay particular attention to the request to relabel clusters and the request for an additional regression model.

Please submit your revised manuscript by Feb 28 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:

Table 1

Demographic and practice characteristics of oncologist phenotypes.

	Class 1 (N = 9)	Class 2 (N = 5)	Class 3 (N = 28)	Overall (N = 42)
Gender
Female	3 (33.3%)	2 (40.0%)	11 (39.3%)	16 (38.1%)
Male	6 (66.7%)	3 (60.0%)	17 (60.7%)	26 (61.9%)
Practice specialty
General Oncology	6 (66.7%)	0 (0%)	0 (0%)	6 (14.3%)
Specialty Oncology	3 (33%)	5 (100%)	28 (100%)	36 (82%)
Years in practice
Median [Min, Max]	8.42 [3.59, 37.0]	5.26 [2.39, 21.0]	7.43 [2.06, 31.5]	7.43 [2.06, 37.0]
Days in Clinic Per Week
Mean (SD)	4.44 (0.726)	1.60 (0.894)	2.50 (0.577)	2.81 (1.11)
Percentage of New Patients Per Week
Median [Min, Max]	21% [6%, 25%]	32% [24%,52%]	21% [11%, 36%]	21% [6%, 52%]
Missing	0 (0%)	1 (20.0%)	0 (0%)	1 (2.4%)
Average number of patients per week
Median [Min, Max]	50.2 [39.8, 65.7]	10.5 [0.900, 15.5]	24.9 [13.6, 36.4]	25.4 [0.900, 65.7]
Average number of encounters per day
Median [Min, Max]	12.3 [10.3, 14.4]	5.65 [1.83, 9.88]	9.15 [6.33, 15.5]	9.33 [1.83, 15.5]
Baseline ACP rate
Mean (SD)	0.801% (0.666)	3.94% (4.95)	1.59% (1.50)	1.70% (2.18)
Hospice Enrollment rate at baseline
Mean (SD)	69.4% (17.9)	59.2% (10.7)	61.3% (28.4)	62.9% (25.0)
Missing	0 (0%)	1 (20.0%)	1 (3.6%)	2 (4.8%)
Inpatient death rate at baseline
Mean (SD)	9.88% (6.89)	5.81% (4.28)	17.2% (11.1)	14.4% (10.5)
Missing	0 (0%)	1 (20.0%)	1 (3.6%)	2 (4.8%)
Chemotherapy use at end of life at baseline
Mean (SD)	7.06% (7.46)	1.39% (2.78)	6.50% (6.95)	6.11% (6.84)
Missing	0 (0%)	1 (20.0%)	1 (3.6%)	2 (4.8%)

Table 2

Association between oncologist phenotype and response to the intervention.

Phenotype	Oncologists, n (%)	Patients, n (%)	Adjusted probability of ACP, pre-intervention	Adjusted probability of ACP, intervention	Percentage point difference in differences vs Class 3 (95% CI)	p-value
Class 3	28 (67%)	1883 (70%)	2.3%	7.6%	---	---
Class 1	9 (21%)	673 (25%)	1.9%	10.7%	3.6 (1.0,6.1)	0.006
Class 2	5 (12%)	139 (5%)	3.1%	20.7%	12.3 (4.3, 20.3)	0.003

32 in total

1. An Entropy-Based Measure for Assessing Fuzziness in Logistic Regression.

Authors: Brandi A Weiss; William Dardick
Journal: Educ Psychol Meas Date: 2015-12-31 Impact factor: 2.821

2. Precision Medicine in Alcohol Dependence: A Controlled Trial Testing Pharmacotherapy Response Among Reward and Relief Drinking Phenotypes.

Authors: Karl Mann; Corey R Roos; Sabine Hoffmann; Helmut Nakovics; Tagrid Leménager; Andreas Heinz; Katie Witkiewitz
Journal: Neuropsychopharmacology Date: 2017-11-20 Impact factor: 7.853

3. Cardiovascular Function Phenotypes in Response to Cardiotoxic Breast Cancer Therapy.

Authors: Biniyam G Demissei; Brian S Finkelman; Rebecca A Hubbard; Amanda M Smith; Hari K Narayan; Vivek Narayan; Payal Shah; Adam J Waxman; Susan M Domchek; Bonnie Ky
Journal: J Am Coll Cardiol Date: 2019-01-22 Impact factor: 24.094

4. Association between early palliative care referrals, inpatient hospice utilization, and aggressiveness of care at the end of life.

Authors: Koji Amano; Tatsuya Morita; Ryohei Tatara; Hirofumi Katayama; Teruaki Uno; Ibuki Takagi
Journal: J Palliat Med Date: 2014-09-11 Impact factor: 2.947

5. Behavioral Phenotyping in Health Promotion: Embracing or Avoiding Failure.

Authors: Shreya Kangovi; David A Asch
Journal: JAMA Date: 2018-05-22 Impact factor: 56.272

6. Measuring Goal-Concordant Care: Results and Reflections From Secondary Analysis of a Trial to Improve Serious Illness Communication.

Authors: Justin J Sanders; Kate Miller; Meghna Desai; Olaf P Geerse; Joanna Paladino; Jane Kavanagh; Joshua R Lakin; Bridget A Neville; Susan D Block; Erik K Fromme; Rachelle Bernacki
Journal: J Pain Symptom Manage Date: 2020-06-26 Impact factor: 3.612

7. A personalized BEST: characterization of latent clinical classes of nonischemic heart failure that predict outcomes and response to bucindolol.

Authors: David P Kao; Brandie D Wagner; Alastair D Robertson; Michael R Bristow; Brian D Lowes
Journal: PLoS One Date: 2012-11-07 Impact factor: 3.240

8. Can machine-learning improve cardiovascular risk prediction using routine clinical data?

Authors: Stephen F Weng; Jenna Reps; Joe Kai; Jonathan M Garibaldi; Nadeem Qureshi
Journal: PLoS One Date: 2017-04-04 Impact factor: 3.240

9. Latent profile analysis of accelerometer-measured sleep, physical activity, and sedentary time and differences in health characteristics in adult women.

Authors: Kelsie M Full; Kevin Moran; Jordan Carlson; Suneeta Godbole; Loki Natarajan; Aaron Hipp; Karen Glanz; Jonathan Mitchell; Francine Laden; Peter James; Jacqueline Kerr
Journal: PLoS One Date: 2019-06-27 Impact factor: 3.240

10. Phenotyping physician practice patterns and associations with response to a nudge in the electronic health record for influenza vaccination: A quasi-experimental study.

Authors: Sujatha Changolkar; Jeffrey Rewley; Mohan Balachandran; Charles A L Rareshide; Christopher K Snider; Susan C Day; Mitesh S Patel
Journal: PLoS One Date: 2020-05-20 Impact factor: 3.240