Literature DB >> 35885718

Estimation of Surgery Durations Using Machine Learning Methods-A Cross-Country Multi-Site Collaborative Study.

Sean Shao Wei Lam1,2, Hamed Zaribafzadeh3,4, Boon Yew Ang1, Wendy Webster3, Daniel Buckland3,5, Christopher Mantyh3, Hiang Khoon Tan2,6.   

Abstract

The scheduling of operating room (OR) slots requires the accurate prediction of surgery duration. We evaluated the performance of existing Moving Average (MA) based estimates with novel machine learning (ML)-based models of surgery durations across two sites in the US and Singapore. We used the Duke Protected Analytics Computing Environment (PACE) to facilitate data-sharing and big data analytics across the US and Singapore. Data from all colorectal surgery patients between 1 January 2012 and 31 December 2017 in Singapore and, 1 January 2015 to 31 December 2019 in the US were used, and 7585 cases and 3597 single and multiple procedure cases from Singapore and US were included. The ML models were based on categorical gradient boosting (CatBoost) models trained on common data fields shared by both institutions. The procedure codes were based on the Table of Surgical Procedure (TOSP) (Singapore) and the Current Procedural Terminology (CPT) codes (US). The two types of codes were mapped by surgical experts. The CPT codes were then transformed into the relative value unit (RVU). The ML models outperformed the baseline MA models. The MA, scheduled durations and procedure codes were found to have higher loadings as compared to surgeon factors. We further demonstrated the use of the Duke PACE in facilitating data-sharing and big data analytics.

Entities:  

Keywords:  big data analytics; data sharing; machine learning; multi-country; multi-site; surgical durations

Year:  2022        PMID: 35885718      PMCID: PMC9319102          DOI: 10.3390/healthcare10071191

Source DB:  PubMed          Journal:  Healthcare (Basel)        ISSN: 2227-9032


1. Introduction

Operating Rooms (ORs) account for a significant proportion of a hospital’s total revenue and about 40% of the hospital’s total expenses. [1] With a global estimate of 312.9 million surgeries performed each year, the effective planning of OR resources is an essential process that has a large impact on hospital surgical processes worldwide [2]. The scheduling of surgical procedures plays an important role in the OR planning process and has a direct impact on resource utilization, patient outcomes and staff welfare. Optimal scheduling of OR slots for surgeries prevents under-utilization of costly surgical resources as well as delays which cause unfavorable waiting times. Overtime resulting from sub-optimal schedules also leads to staff dissatisfaction as well as burnout from long working hours [3,4]. One key factor required in optimal scheduling of OR slots is the accurate prediction of surgery duration [3,5]. However, the variability in patients’ conditions and the type of surgical procedures and techniques required and the uncertainties around these patient and provider related factors present challenges for the prediction of surgery durations [6,7]. In recent years, many studies around the world have reported the use of various machine-learning (ML) methods to accurately predict surgery duration [8,9,10]. Linear regression techniques have been explored using patient and surgical factors and have reported the importance of such variables in predicting the total surgical procedure time [11]. Distributional modelling methods such as Kernel Density Estimation (KDE) [12] as well as log-normal distributions [13] were also demonstrated to be able to effectively predict surgery duration. More complex methods such as heteroscedastic neural network regression combined with expressive drop-out regularized neural networks have also been shown to have good performance [14]. Ensemble tree-based methods such as random forests were also used to predict surgery duration and cross validation showed that it outperformed other methods, reducing the mean absolute percentage error by 28%, when compared to current hospital estimation approaches [15]. A multi-center study based on two large European teaching hospitals has also demonstrated the use of a parsimonious lognormal modelling approach to improve the estimation of surgery duration and OR efficiency across more than one hospital [13]. The significance of multiple factors affecting surgical duration predictions may vary across hospital systems due to differences in case scheduling practices, surgical techniques and intraoperative processes. A broader understanding of the differences in the characteristics of the scheduling processes as well as scheduling behavior will lead to further insights into improving such estimations. In order to determine hospital level differences for estimating surgery duration, there is a need to go beyond simple distributional approaches to understand the multifactorial effects influencing the length of surgery durations across multiple sites. ML models such as gradient boosted trees, which utilize error residuals to improve the performance of ensemble models, have also been reported for other clinical prediction models, such as anterior chamber depth (ACD) in cataract surgery [16]. Although previous studies have examined the use of various ML prediction modelling methods to provide more accurate estimates, most of these studies offer results that are specific to a single institution with a minority that derived prediction models validated with external institutional data, albeit from the same country [3,6,10,11,13,14,17]. Similar to other use cases where prediction models are developed based on an extensive use of real-world data, a key reason resulting in the difficulty of conducting multi-site across multiple countries is the lack of data sharing and of governance infrastructure to support collaborative work. This impediment can be further magnified when the sharing of data has to occur over multiple jurisdictions. This has resulted in a scarcity of published studies that can cover multi-institutional data across countries or continents, thereby reducing the external validity and generalizability of the prediction models. In this study, we aim to determine the performance of current surgery case duration estimations and the use of machine learning models to predict surgery duration across two large teaching hospitals in the United States and Singapore. To facilitate deep collaboration between both hospitals and the sharing of large-scale datasets required for the development of the ML models, we will introduce the use of Duke Protected Analytics Computing Environment (PACE) [18]. PACE is a collaborative platform for facilitating data-sharing and analysis across both healthcare institutions. This study has demonstrated value in the use of PACE for a cross border and multi-institutional studies in the evaluation of surgical durations across institutions.

2. Materials and Methods

The two study hospitals SH-1 and SH-2 described in this study are from Singapore, a city-state in Southeast Asia, and Durham, North Carolina, a state in the United States, respectively. SH-1 is the Singapore General Hospital (SGH), which is one of the largest comprehensive public hospitals in Singapore under the Singapore Health Services (SingHealth) public healthcare cluster. SGH is a tertiary multidisciplinary academic hospital which comprises more than 30 clinical disciplines and approximately 1700 inpatient beds and provides acute and specialist care to over one million patients per year [19]. The hospital saw more than 25,000 surgeries in 2019. SH-2 is Duke University Hospital, a full-service tertiary and quaternary care hospital that is part of the Duke University Health System in Durham, North Carolina. Duke University Hospital has 957 inpatient beds, 51 operating rooms, an endo-surgery center and an ambulatory surgery center with nine operating rooms. The hospital offers multidisciplinary care and serves as a regional emergency/trauma center where 42,554 patients were admitted in 2020 [20,21]. Ethics approval for the study was exempted by both the SingHealth’s Centralized Institutional Review Board (SingHealth CIRB Reference: 2018-2558) and Duke Institutional Review Board (Duke IRB Reference: Pro00104275) for both study hospitals.

2.1. Cross Country Collaborative Platform

The Duke PACE [22] is a secured virtualized network environment where researchers can collaborate and perform analysis with protected health information. PACE simplifies the process of obtaining and sharing protected data from the electronic medical record (EMR) systems. Datasets from both study hospitals are shared and analyzed jointly by the study team through the PACE system. The use of PACE requires video-based training and a rigorous account request and approval process for Duke University employees and affiliates. Data loaded in PACE has to be HIPAA compliant. Ethics approval or exemption has to be given by the ethics review boards of the respective study hospitals. Data was extracted from the SH-1 EMR system based on the Sunrise Clinical Manager, Allscripts [23], extracted through the enterprise data warehouse, electronic Health Intelligence System-eHIntS [24]. Data from the SH-2 EMR system were extracted from the Duke Health enterprise data warehouse and Duke’s Maestro Care (Epic) EMR system [25]. These data were loaded into PACE and then served through a secured Duo multifactor authentication gateway [26] for access by collaborators across the two countries with approved network IDs. The analysis was performed with Python 3.6, Python Software Foundation [27], with the required packages loaded into the PACE environment. The hardware provisioned in PACE for this study was Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (2 processors) (Intel Corporation) and 32 GB of RAM running on Windows 10 Enterprise operating system (Microsoft Corporation). Access was time-bound based on the approved period of study according to the respective ethics review boards’ decisions.

2.2. Descriptive Analysis

We performed a retrospective analysis of all patients who had undergone colorectal surgery between 1 January 2012 and 31 December 2017 for SH-1 and 1 January 2015 to 31 December 2019 for SH-2. Common data fields were mapped between datasets from both study sites and used in the study. Patient demographics included age, gender, height, weight and body mass Index (BMI). Surgery related factors included surgery procedure codes, number of procedures done in the surgery, first and second surgeons codes, principal anesthetist codes, anesthesia type (local, general or regional anesthesia), patient case type (inpatient or day surgery), OR location, OR code and ASA scores. The Table of Surgical Procedures (TOSP) is a categorical variable used for billing purposes [28]. TOSP codes provide some information on the complexity of the procedure codes used in SH-1, where higher levels represent greater complexity. SH-2 uses the categorical Current Procedural Terminology (CPT) codes that similarly show the complexity of the procedures and services [29]. The TOSP and CPT codes for colorectal surgeries used in this study were mapped by the surgical domain experts and shown in Table A1, Appendix A. The list of mapped fields across the two institutions is shown in Table 1. In SH-2, the categorical CPT codes per case were transformed into the relative value unit (RVU) which is a single continuous variable (see Table 2). The RVU is a consensus driven billing indicator that can serve as a proxy for procedure workload and replace CPT codes as a more informative feature of surgical duration predictions [30,31]. The scheduled/listing duration for the surgical case as well as the moving average (MA) durations were also included.
Table A1

Mapping of CPT to TOSP codes.

Anatomical SystemType of ProcedureCPT CodeTOSP CodesTOSP Table
DigestiveHepatectomy47120SF815L4C
DigestiveHepatectomy47120SF813L5C
DigestiveHepatectomy47122SF809L7C
DigestiveHepatectomy47125SF812L6B
DigestiveHepatectomy47130SF812L6B
DigestiveAppendectomy44950SF849A3B
DigestiveAppendectomy44950SF723A4A
DigestiveAppendectomy44960SF849A3B
DigestiveAppendectomy44960SF723A4A
DigestiveAppendectomy44970SF849A3B
DigestiveAppendectomy44970SF723A4A
DigestiveColorectal44140SF701C6C
DigestiveColorectal44140SF803C5C
DigestiveColorectal44140SF806C5C
DigestiveColorectal44143SF808R5C
DigestiveColorectal44144SF808R5C
DigestiveColorectal44145SF805R6C
DigestiveColorectal44145SF703R6C
DigestiveColorectal44145SF807R6B
DigestiveColorectal44146SF805R6C
DigestiveColorectal44146SF703R6C
DigestiveColorectal44146SF807R6B
DigestiveColorectal44147SF703R6C
DigestiveColorectal44150SF804C6A
DigestiveColorectal44150SF712C6A
DigestiveColorectal44151SF804C6A
DigestiveColorectal44151SF712C6A
DigestiveColorectal44160SF803C5C
DigestiveColorectal44204SF701C6C
DigestiveColorectal44204SF803C5C
DigestiveColorectal44204SF806C5C
DigestiveColorectal44205SF803C5C
DigestiveColorectal44206SF808R5C
DigestiveColorectal44207SF805R6C
DigestiveColorectal44207SF703R6C
DigestiveColorectal44207SF807R6B
DigestiveColorectal44208SF805R6C
DigestiveColorectal44208SF703R6C
DigestiveColorectal44208SF807R6B
DigestiveColorectal44210SF712C6A
DigestiveColorectal44210SF804C6A
DigestiveEsophagectomy43101SF802E5B
DigestiveEsophagectomy43107SF809E7B
DigestiveEsophagectomy43108SM702L7C
DigestiveEsophagectomy43112SF809E7B
DigestiveEsophagectomy43112SM702L7C
DigestiveEsophagectomy43113SF809E7B
DigestiveEsophagectomy43113SM702L7C
DigestiveEsophagectomy43116SF806E7C
DigestiveEsophagectomy43117SF804E6B
DigestiveEsophagectomy43117SF809E7B
DigestiveEsophagectomy43118SF804E6B
DigestiveEsophagectomy43118SF809E7B
DigestiveEsophagectomy43121SF804E6B
DigestiveEsophagectomy43122SF804E6B
DigestiveEsophagectomy43123SF804E6B
DigestiveEsophagectomy43124SF812E3A
DigestiveEsophagectomy43124SF806E7C
DigestivePancreatectomy48120SF705P4C
DigestivePancreatectomy48120SF706P5A
DigestivePancreatectomy48140SF708P5B
DigestivePancreatectomy48145SF809P7C
DigestivePancreatectomy48145SF712P5C
DigestivePancreatectomy48146SF703P7A
DigestivePancreatectomy48146SF704P7A
DigestivePancreatectomy48148SF807B5C
DigestivePancreatectomy48150SF809P7C
DigestivePancreatectomy48152SF809P7C
DigestivePancreatectomy48153SF809P7C
DigestivePancreatectomy48154SF809P7C
DigestivePancreatectomy48155SF809P7C
DigestiveColorectal44155SF712C6A
DigestiveColorectal44155SF804C6A
DigestiveColorectal44155SF805C6B
DigestiveColorectal44156SF805C6B
DigestiveColorectal44157SF805C6B
DigestiveColorectal44157SF713C6C
DigestiveColorectal44158SF713C6C
DigestiveColorectal44211SF713C6C
DigestiveColorectal44212SF712C6A
DigestiveColorectal44212SF804C6A
DigestiveColorectal44212SF805C6B
DigestiveColorectal45110SF845A6B
DigestiveColorectal45110SF805R6C
DigestiveColorectal45111SF805C6B
DigestiveColorectal45111SF701R5C
DigestiveColorectal45112SF807R6B
DigestiveColorectal45113SF807R6B
DigestiveColorectal45114SF701R5C
DigestiveColorectal45116SF701R5C
DigestiveColorectal45119SF807R6B
DigestiveColorectal45120SF803R5C
DigestiveColorectal45120SF700R5C
DigestiveColorectal45121SF803R5C
DigestiveColorectal45126SF703R6C
DigestiveColorectal45126SF808R5C
DigestiveColorectal45126SF805A6B
DigestiveColorectal45130SF700R5C
DigestiveColorectal45135SF700R5C
DigestiveColorectal45160SF701R5C
DigestiveColorectal45395SF805C6B
DigestiveColorectal45395SF805C6B
DigestiveColorectal45397SF713C6C
DigestiveColorectal45402SF701R5C
DigestiveColorectal45550SF701R5C
EndocrineThyroid60200SJ801T3B
EndocrineThyroid60210SJ802T4A
EndocrineThyroid60212SJ802T4A
EndocrineThyroid60220SJ804T6A
EndocrineThyroid60220SJ802T4A
EndocrineThyroid60225SJ804T6A
EndocrineThyroid60225SJ802T4A
EndocrineThyroid60240SJ803T5C
EndocrineThyroid60240SJ703T6C
EndocrineThyroid60252SJ702T6A
EndocrineThyroid60254SJ702T6A
EndocrineThyroid60260SJ702T6A
EndocrineThyroid60270SJ702T6A
EndocrineThyroid60271SJ702T6A
ReproductiveHysterectomy/Myomectomy58140SI816U3B
ReproductiveHysterectomy/Myomectomy58146SI815U5A
ReproductiveHysterectomy/Myomectomy58150SI803U4A
ReproductiveHysterectomy/Myomectomy58150SI804U5C
ReproductiveHysterectomy/Myomectomy58150SI805U5C
ReproductiveHysterectomy/Myomectomy58150SI812U5C
ReproductiveHysterectomy/Myomectomy58152SI702U4C
ReproductiveHysterectomy/Myomectomy58180SI802U4A
ReproductiveHysterectomy/Myomectomy58210SI825U5C
ReproductiveHysterectomy/Myomectomy58210SI827U5A
ReproductiveHysterectomy/Myomectomy58210SI828U4A
ReproductiveHysterectomy/Myomectomy58240SI824U6B
ReproductiveHysterectomy/Myomectomy58260SI837U4A
ReproductiveHysterectomy/Myomectomy58260SI713V4A
ReproductiveHysterectomy/Myomectomy58262SI723U4B
ReproductiveHysterectomy/Myomectomy58263SI721U4B
ReproductiveHysterectomy/Myomectomy58270SI713V4A
ReproductiveHysterectomy/Myomectomy58290SI837U4A
ReproductiveHysterectomy/Myomectomy58290SI713V4A
ReproductiveHysterectomy/Myomectomy58291SI723U4B
ReproductiveHysterectomy/Myomectomy58292SI721U4B
ReproductiveHysterectomy/Myomectomy58294SI713V4A
ReproductiveHysterectomy/Myomectomy58541SI713U4B
ReproductiveHysterectomy/Myomectomy58542SI713U4B
ReproductiveHysterectomy/Myomectomy58543SI712U5A
ReproductiveHysterectomy/Myomectomy58544SI712U5A
ReproductiveHysterectomy/Myomectomy58545SI709U3C
ReproductiveHysterectomy/Myomectomy58546SI700O4B
ReproductiveHysterectomy/Myomectomy58548SI800O 5C
ReproductiveHysterectomy/Myomectomy58548SI804O4A
ReproductiveHysterectomy/Myomectomy58550SI718U4B
ReproductiveHysterectomy/Myomectomy58552SI718U4B
ReproductiveHysterectomy/Myomectomy58553SI718U4B
ReproductiveHysterectomy/Myomectomy58554SI718U4B
ReproductiveHysterectomy/Myomectomy58570SI713U4B
ReproductiveHysterectomy/Myomectomy58572SI712U5A
ReproductiveHysterectomy/Myomectomy58940SI805O3B
ReproductiveHysterectomy/Myomectomy58951SI800O5C
ReproductiveHysterectomy/Myomectomy58951SI711U6A
ReproductiveHysterectomy/Myomectomy58953SI804O4A
ReproductiveHysterectomy/Myomectomy58954SI800O5C
ReproductiveHysterectomy/Myomectomy58954SI804O4A
MusculoskeletalTHA27125SB838H5C
MusculoskeletalTHA27130SB839H6A
MusculoskeletalTHA27130SB723H6B
MusculoskeletalTHA27132SB724H6C
MusculoskeletalTHA27134SB724H6C
MusculoskeletalTHA27137SB724H6C
MusculoskeletalTHA27138SB724H6C
KidneyNephrectomy50220SG816K4B
KidneyNephrectomy50225SG816K4B
KidneyNephrectomy50230SG804K5C
KidneyNephrectomy50234SG800K5C
KidneyNephrectomy50236SG800K5C
KidneyNephrectomy50240SG721K5C
KidneyNephrectomy50543SG720K6A
KidneyNephrectomy50545SG710K6A
KidneyNephrectomy50546SG700K6A
KidneyNephrectomy50546SG722K4C
KidneyNephrectomy50548SG700K6A
Table 1

Field mapping between SH-1 and SH-2.

SNSH-1 Data FieldsSH-2 Data Fields
1OT CodeRoom
2Actual DurationIn-Out Duration
3First Surgeon Department CodeService Type
4Priority of OperationCase Class
5Department CodeDivision
6OT Location CodeLocation
7Procedure CodeCPT List
8Type of AnesthesiaPrimary Anesthesia Type
9ASA StatusASA Rating
10AgePatient Age
11GenderSex
12Visit TypePatient Class
13BMIBMI
14HeightHeight
15WeightWeight
16First Surgeon IDPrimary Physician ID
17Second Surgeon IDSecondary Physician ID
18Principal Anesthetist IDFirst Anesthetist ID
19MA 1 year_3rdMA 1 year_3rd (calculated)
20Number of ProceduresNumber of Procedures
21Number of PanelsNumber of Panels
22Multiple Procedure CodesSorted CPT List
23Listing DurationScheduled Duration

Notations: OT: Operating Theatre; CPT: Current Procedure Terminology (transformed into Relative Value Units); ASA: American Society of Anesthesiology; BMI: Body Mass Index; ID: Unique identifier; MA: Moving Average.

Table 2

List of CatBoost models compared (List of features present in each model are shown in Table A2 in the Appendix A).

NameFeatures Considered
Model 0Baseline Model which considered patient and surgery factors only
Model 1Baseline Model + RVU/Procedure Surgical Table Code
Model 2Baseline Model + Moving Average
Model 3Baseline Model + Scheduled Duration
Model 4Baseline Model + Moving Average + Scheduled Duration
Model 5Baseline Model + Moving Average + Scheduled Duration + RVU/Procedure Surgical Table Code
A total of 7585 cases and 3597 cases from SH-1 and SH-2 were included respectively. The data cleaning process is shown in Figure 1 for both SH-1 and SH-2. The mean duration for SH-1 and SH-2 were 102 min and 128 min, respectively. The mean patient age across both sites were 54.8 and 54.4 years, respectively.
Figure 1

Data-cleaning process (MA: Moving Average).

2.3. Moving Average Estimation

The existing EMR systems for both SH-1 and SH-2 surgical case management both adopt a moving average (MA) prediction of historical surgery durations to provide an estimated surgery duration for each surgery case. SH-1 uses Allscripts [23], whilst SH-2 uses the EPIC system [25]. The MA algorithm calculates a historical moving average of actual surgery duration by grouping surgical procedure codes and surgeon codes over a specified period of time. The OR schedulers, who schedule surgery cases into the respective EMR systems, are able to override the estimates with their own estimated durations. The historical moving average of the actual case length for each procedure and surgeon code combination was used as the prediction for the next surgery of the same procedure code and conducted by the same surgeon. If there are less than five cases for a particular surgeon and procedure code combination, the MA of the surgical duration for that particular procedure (regardless of surgeon) is utilized. If the data is insufficient for this grouping, no MA estimate will be provided and the scheduler will have to provide an estimate instead. If there are sufficient data for the MA estimates, data below the 10th percentile and above the 90th percentile will be excluded from the MA calculation. If there are no manual overrides on the MA prediction, the MA-based duration is then recorded as the scheduled duration and is used to schedule cases in the system. The existing estimates as well as the scheduled durations will be used as the baseline against which new predictions developed by machine learning algorithms in this study will be compared. Cases which were listed as emergency cases as well as those with missing actual surgery duration were excluded for the study. Both single and multiple procedure surgeries were included. The data cleaning process is summarized in Figure 1. Cases with missing values for either scheduled duration or MA duration were excluded. Outcome metrics were compared by available cases by individual duration type, as well as a common set of valid cases. The outcome of interest was the difference between the predicted and the actual surgery durations for each surgical case. Surgery duration was defined as the time taken between the point when the patient is wheeled into the OR and when the patient is wheeled out of the OR. For each comparison, we compared the Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and the percentage of cases within 20% of scheduled duration with the predicted duration. The predictive models were generated with the categorical gradient boosting (CatBoost, version 0.2.1) [32] package in Python 3.6 [27]. The dataset of each study hospital was split into 80% for training and 20% for testing. The models were trained on the common data fields (Patient and Surgery factors) shared by both datasets, with different permutations of additional key variables such as the Moving Average, Scheduled/Listing Duration and TOSP Code/RVU. The CatBoost models with features listed in Table 2 were compared. SH-1’s models were trained and tested on SH-1’s dataset and SH-2’s models were trained and tested on SH-2’s dataset. Hyperparameter optimization was conducted using a grid search on a four-fold cross validation performed to determine the optimal parameters for the models. The following parameters were used for the CatBoost models-Number of Iterations: 200; Maximum Tree Depth: 5, 6, 7, 8, 9, 10, 11, Learning Rate: 0.01, 0.03, 0.05, 0.1, 0.2, 0.3; Loss Function: Root Mean Square Error (RMSE). The parameters which provided the lowest cross validation RMSE score were chosen for the final model. Feature importance for the CatBoost models were evaluated based on the amount that the prediction value changes with respect to a change in the predictor variable [32].

3. Results

Scheduler and System Average Performance

Table 3 compares the performance of the scheduled duration and the MA duration against the actual surgery duration. The MA algorithm provides better performance across all evaluation metrics for both datasets as compared to the scheduled durations. Proportion of cases with the actual duration falling within 80–120% of the listed duration is higher in SH-2. Higher proportion of cases were found to be overestimated in the SH-2 dataset, whereas for SH-1 (>80% of actual duration) higher proportion of cases were found to be underestimated (<80% of actual duration) (see Figure 2).
Table 3

Performance of scheduled vs. MA duration.

SH-1SH-2
ScheduledMAScheduledMA
N (cases) 7685768535973597
RMSE 61.551.557.548.2
MAE (mins) 37.729.234.829.5
MAPE (%) 7.49%2.40%15.91%5.54%
<=80% 41.0%36.8%20.2%24.7%
80–120% 24.8%36.6%41.4%49.8%
>=120% 34.2%26.6%38.4%25.5%

RMSE: Root Mean Squared Error; MAE: Mean Absolute Error; MAPE: Mean Absolute Percentage Error.

Figure 2

Proportion of Cases within 20% of Actual Durations.

Table 4 and Table 5 show the performance of the various SH-1 and SH-2 predictive models based on the test dataset. The results showed that with the ML-based models (Models 0–5), SH-1 could at least predict 40% of cases accurately within +/−20% of the actual duration, while SH-2 could at least predict approximately 50% of its cases within +/−20% of the actual duration. Based on the +/−20% prediction band, the ML-based models in both hospitals showed better prediction accuracy than the existing MA models that each individual hospital uses.
Table 4

SH−1 Accuracy & Error Metrics Comparison of Models.

ModelPercentage within +\−20% RMSEMAEMAPE
Listing 24.68%62.3137.50565.57%
MA 37.66%55.1628.84446.85%
Model 0 40.31%48.1526.32336.74%
Model 1 43.15%47.8825.22134.61%
Model 2 43.28%47.3024.93835.56%
Model 3 41.34%46.2625.42634.97%
Model 4 42.89%45.3024.32534.50%
Model 5 44.06%45.1823.98634.40%

RMSE: Root Mean Squared Error; MAE: Mean Absolute Error; MAPE: Mean Absolute Percentage Error.

Table 5

SH-2 Accuracy & Error Metrics Comparison of Models.

ModelPercentage within +\−20%RMSEMAEMAPE
Listing 43.06%53.5732.16727.63%
MA 48.33%45.3928.1927.30%
Model 0 49.86%50.84530.49227.23%
Model 1 52.78%38.81724.41223.83%
Model 2 55.42%40.925.52924.90%
Model 3 55.28%43.20826.1824.54%
Model 4 55.42%39.36724.51823.79%
Model 5 56.11%38.48223.6123.36%

RMSE: Root Mean Squared Error; MAE: Mean Absolute Error; MAPE: Mean Absolute Percentage Error.

In SH-1, Model 5 showed the best performance as shown in Table 4. Model 4 has slightly higher MAE, MAPE and RMSE, as compared to Model 5, but it shows a better prediction accuracy (within +/−20% deviation from the actual duration). Nonetheless, both Models 4 and 5 have at least a 5% higher prediction accuracy than that of the MA. Among the five models in SH-2, Table 5 shows that Model 5 has the best performance, with 56.11% of its predictions falling within +/−20% of the actual duration. Model 5 prediction accuracy (within +/−20%) is 7.78% higher than that of the MA. Model 5 also has the lowest RMSE, MAE and MAPE at 38.48%, 23.61% and 23.36%, respectively. The feature importance of each predictor variable is averaged across all of the decision trees within the model. The best performing models (Model 5 for both SH-1 and SH-2) were used to plot the feature importance shown in Figure 3 and Figure 4.
Figure 3

Model 0 Feature Importance for (a) SH-1; (b) SH-2. (Note: Refer to Table 1 for feature mapping between SH-1 and SH-2).

Figure 4

Model 5 Feature Importance for (a) SH-1 (Procedure.Surgical.Table.Code is “Procedure Code”; (b) SH-2 (RVU_TOTAL_CaseMax is “CPT List”). (Note: Refer to Table 1 for feature mapping between SH-1 and SH-2).

4. Discussion

The objective of this study was to explore and determine the performance of current surgery case duration estimations and the use of machine learning models to predict surgery duration across two large tertiary healthcare institutions located in the United States and Singapore. The two healthcare institutions have different EMR systems, coding and representation of surgical details such as surgical procedure codes. The Table of Surgical Procedure (TOSP) codes [28] were used in SH-1 whilst the Current Procedural Terminology (CPT) [29] codes were used in SH-2. The two types of codes were mapped by surgical domain experts. The mapping table is given in Appendix A. The validation results showed that, in both study sites, the simple MA-based predictions outperform the scheduled duration provided by the OR schedulers across RMSE, MAE, MAPE and proportion of cases within 80–120% of the scheduled actual duration. Every minute of improved duration estimates would help in improving the efficiency of OR performance [15,33]. MA-based predictions have been frequently reported in the literature. Similar to the existing literature [3,6,34], both hospitals have been using simple MA-based methods, such as Last-5 [6], which uses the average of the most recent five cases in the relevant history for the prediction. Simple MA methods can be accurate in the estimation of surgery durations across multiple sites. The baseline machine-learning (ML) models which considered patient, surgeon and surgery related factors without the MA (Model 0) show improved performance for SH-1 across all metrics against the scheduled durations and MA estimates. Extending from the baseline model, Models 2, 4 and 5 included the MA features to improve the performance and generalizability for the predictions. The improved performance of ML-based models is similar to results that were recently reported [8,10,17,35]. For both SH-1 and SH-2, the majority of the contributions to the model was based on the MA and scheduled duration. For SH-1, the next five variables with the highest contributions are: Procedure Surgical Table Code, OT Code, First Surgeon ID, OT Location Code and BMI. At SH-2, the most significant factors are also MA and scheduled duration. whilst the next five variables with the highest contributions are RVU, Patient Class, Primary Physician ID and OR Location and Number of Procedures. In both SH-1 and SH-2, MA, scheduled duration and, TOSP Code (for SH-1) or RVU (for SH-2), have higher loading in the model as compared to surgeon factors. However, the order of importance of the other variables differs slightly between the two sites. For both sites, the variables describing the complexity of the surgery (TOSP Code in SH-1 and RVU in SH-2) have relatively higher loadings in the prediction models. The presence of the MA, Scheduled/Listing and Procedure Surgical Table Code and RVU only for Model 5 may have resulted in the better performance of this model. All these features have the highest contributions in Model 5 feature importance for SH-1 and SH-2 as shown in Figure 4. As the study sites utilized similar datasets across different study periods, there may be concerns about model bias. However, for both sites during the study horizon, there were no significant shifts in the surgical procedures for colorectal procedures and the design of the EMR systems and the extract-transform-load (ETL) system within the enterprise data warehouse across both sites. Moreover, each hospital has its own trained CatBoost ML model [32] so different periods in one model will not affect the other. The framework using CatBoost ML models has been tuned to provide the best prediction models based on the lowest cross validation RMSE. This result can be further evaluated in future studies in collaboration with more study sites. The collaborative PACE platform [22] has been shown to facilitate such study across two different jurisdictions. Electronic health record (EHR) data are extremely sensitive and valuable and require a protected environment to work in. This can be difficult and time consuming to achieve even in one institute. Duke PACE [22] provides a secured and protected environment to query and store these data and perform advanced analysis. This study demonstrates that PACE can provide the platform for this study to share EHR data between the two institutes of the two countries and facilitates the use of advanced machine-learning tools to predict surgical durations. Similar features were used in the prediction models developed at both sites (see Table 1). This study shows a viable alternative to facilitate future collaboration between institutes around the world. The collaboration through PACE demonstrated the feasibility in data sharing, validating the hypothesis and collaborative development of analytical models in order to support better clinical decision that can improve system, process and patient outcomes.

5. Conclusions

In this study, we compared the performance of existing MA-based estimates with novel ML-based predictive models for surgery durations across two large tertiary healthcare institutions. The ML-based models which considered additional patient, surgeon and surgery related factors show improved performance over both the MA-based method and the scheduled durations across multiple accuracy metrics. The ML-based models can be deployed in place of the existing MA-based estimates. Additional patient-related factors (e.g., comorbidities) could potentially help to further improve the accuracies of the predictions. We further demonstrated the use of the Duke PACE as the collaborative platform for facilitating data-sharing and analysis across both healthcare institutions for cross border and cross-institutional studies. Duke PACE was able to overcome the impediments in data sharing and governance policies to support collaborative work across multiple jurisdictions.
Table A2

List of features present in each CatBoost model.

Model Number
SNSH-1 Data FieldsSH-2 Data Fields0 (Baseline)12345
1OT CodeRoom
2Actual DurationIn-Out Duration
3First Surgeon Department CodeService Type
4Priority of OperationCase Class
5Department CodeDivision
6OT Location CodeLocation
7Procedure CodeCPT List
8Type of AnesthesiaPrimary Anesthesia Type
9ASA StatusASA Rating
10AgePatient Age
11GenderSex
12Visit TypePatient Class
13BMIBMI
14HeightHeight
15WeightWeight
16First Surgeon IDPrimary Physician ID
17Second Surgeon IDSecondary Physician ID
18Principal Anesthetist IDFirst Anesthetist ID
19MA 1year_3rdMA 1year_3rd (calculated)
20Number of ProceduresNumber of Procedures
21Number of PanelsNumber of Panels
22Multiple Procedure CodesSorted CPT List
23Listing DurationScheduled Duration

Notations: OT: Operating Theatre; CPT: Current Procedure Terminology (transformed into Relative Value Units); ASA: American Society of Anesthesiology; BMI: Body Mass Index; ID: Unique identifier; MA: Moving Average.

  17 in total

1.  Relying solely on historical surgical times to estimate accurately future surgical times is unlikely to reduce the average length of time cases finish late.

Authors:  J Zhou; F Dexter; A Macario; D A Lubarsky
Journal:  J Clin Anesth       Date:  1999-11       Impact factor: 9.452

2.  What does one minute of operating room time cost?

Authors:  Alex Macario
Journal:  J Clin Anesth       Date:  2010-06       Impact factor: 9.452

3.  An estimation of the global volume of surgery: a modelling strategy based on available data.

Authors:  Thomas G Weiser; Scott E Regenbogen; Katherine D Thompson; Alex B Haynes; Stuart R Lipsitz; William R Berry; Atul A Gawande
Journal:  Lancet       Date:  2008-06-24       Impact factor: 79.321

4.  Improving prediction of surgery duration using operational and temporal factors.

Authors:  Enis Kayis; Haiyan Wang; Meghna Patel; Tere Gonzalez; Shelen Jain; R J Ramamurthi; Cipriano Santos; Sharad Singhal; Jaap Suermondt; Karl Sylvester
Journal:  AMIA Annu Symp Proc       Date:  2012-11-03

5.  Surgical Duration Estimation via Data Mining and Predictive Modeling: A Case Study.

Authors:  N Hosseini; M Y Sir; C J Jankowski; K S Pasupathy
Journal:  AMIA Annu Symp Proc       Date:  2015-11-05

6.  Artificial Intelligence: A New Tool in Operating Room Management. Role of Machine Learning Models in Operating Room Optimization.

Authors:  Valentina Bellini; Marco Guzzon; Barbara Bigliardi; Monica Mordonini; Serena Filippelli; Elena Bignami
Journal:  J Med Syst       Date:  2019-12-10       Impact factor: 4.460

7.  Improving Operating Room Efficiency: Machine Learning Approach to Predict Case-Time Duration.

Authors:  Matthew A Bartek; Rajeev C Saxena; Stuart Solomon; Christine T Fong; Lakshmana D Behara; Ravitheja Venigandla; Kalyani Velagapudi; John D Lang; Bala G Nair
Journal:  J Am Coll Surg       Date:  2019-07-13       Impact factor: 6.113

8.  CPT to RVU conversion improves model performance in the prediction of surgical case length.

Authors:  Nicholas Garside; Hamed Zaribafzadeh; Ricardo Henao; Royce Chung; Daniel Buckland
Journal:  Sci Rep       Date:  2021-07-08       Impact factor: 4.379

9.  The cost of trauma operating theatre inefficiency.

Authors:  W W Ang; S Sabharwal; H Johannsson; R Bhattacharya; C M Gupte
Journal:  Ann Med Surg (Lond)       Date:  2016-03-05

10.  Improving the Prediction of Total Surgical Procedure Time Using Linear Regression Modeling.

Authors:  Eric R Edelman; Sander M J van Kuijk; Ankie E W Hamaekers; Marcel J M de Korte; Godefridus G van Merode; Wolfgang F F A Buhre
Journal:  Front Med (Lausanne)       Date:  2017-06-19
View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.