
Detection of Sacral Fractures on Radiographs Using Artificial Intelligence.

Naoya Inagaki1, Norio Nakata2, Sina Ichimori1, Jun Udaka1, Ayano Mandai1, Mitsuru Saito1.   

Abstract

Background: Sacral fractures are often difficult to diagnose on radiographs. Computed tomography (CT) and magnetic resonance imaging (MRI) can improve the detection rate but cannot always be performed. The accuracy of artificial intelligence (AI) in detecting orthopaedic fractures is now comparable with that of orthopaedic specialists. However, the ability of AI to detect sacral fractures has not been investigated, to our knowledge. We hypothesized that the detection of sacral fractures on radiographs could be improved by using AI, and we aimed to develop an AI model that detects sacral fractures on radiographs with better accuracy than orthopaedic surgeons.
Methods: Subjects were patients with suspected pelvic fractures for whom radiographs and CT scans had been obtained. The radiographs were labeled according to sacral fracture status based on CT results. The data set was divided into a training set (2,038 images) and a test set (200 images). Eight convolutional neural network (CNN) models were trained using the training set, and the trained models' discrimination ability was then evaluated on the test set. The detection ability of 4 experienced orthopaedic surgeons was also measured using the same test set. The results of fracture assessment by the orthopaedic surgeons were compared with those of the 3 CNNs with the greatest area under the receiver operating characteristic curve.
Results: Among the 8 trained models, the highest areas under the curve were for InceptionV3 (0.989), Xception (0.987), and Inception ResNetV2 (0.984). The detection rate was significantly higher for these 3 CNNs than for the orthopaedic surgeons.
Conclusions: By enhancing the processing of probabilistic tasks and the communication of their results, AI may be better able to detect sacral fractures than orthopaedic surgeons.
Level of Evidence: Diagnostic Level III. See Instructions for Authors for a complete description of levels of evidence.
Copyright © 2022 The Authors. Published by The Journal of Bone and Joint Surgery, Incorporated. All rights reserved.

Year:  2022        PMID: 36128254      PMCID: PMC9478257          DOI: 10.2106/JBJS.OA.22.00030

Source DB:  PubMed          Journal:  JB JS Open Access        ISSN: 2472-7245


Sacral fractures are a heterogeneous group of fractures that occur in young people following high-energy trauma or in elderly individuals with osteoporosis following minor trauma[1]. Sacral fractures have particularly serious consequences because the sacrum is the keystone of the pelvic girdle and provides approximately 60% of pelvic stability, and is thus subjected to high stress. Spinopelvic dissociation due to high-energy trauma has been reported to have a poor clinical outcome in up to 42% of cases at 1 to 10 years of follow-up[2]. Moreover, undiagnosed low-energy fractures have been found to be associated with high mortality, with a 12-month mortality rate of 28% and with loss of pre-injury ability in 34% of cases[3]. Diagnosis of a sacral fracture may be missed or delayed in 25% to 70% of cases[4-6]. Furthermore, Hussin et al. and Denis et al. showed that even when a correct diagnosis is made, patients may develop neurologic deficits, underscoring the importance of this injury[7,8]. Therefore, accurate diagnosis and treatment are essential in managing sacral fractures. Detecting sacral fractures on pelvic radiographs is often complicated by bowel gas and sacral inclination, and occasionally complicated by an overlying anterior portion of the pelvis. It can also be complicated by bone rarefaction leading to decreased contrast, particularly in elderly patients[9-11]. The detection rate is improved by performing computed tomography (CT) and magnetic resonance imaging (MRI)[12]. However, these imaging modalities are not suitable for some suspected pelvic fractures because of their cost and the patient’s clinical status[12,13]. Artificial intelligence (AI) has been used to improve the diagnosis of orthopaedic fractures, and its accuracy is now comparable with that of orthopaedic specialists[14,15]. However, using AI to detect sacral fractures has not been investigated, to our knowledge. 
We hypothesized that AI could be used to improve the ability to detect sacral fractures on radiographs. The methods used in AI include convolutional neural networks (CNNs), linear discriminant analysis, quadratic discriminant analysis, and support vector machines. CNNs are employed in deep learning, a machine learning method with great potential in diagnostic imaging, and they are already being actively used in the analysis of medical images[16]. Thus, we aimed to develop CNN-based AI models that can accurately detect sacral fractures on radiographs, and we further hypothesized that the detection accuracy of the developed AI models would be better than that of orthopaedic surgeons.

Materials and Methods

This study was approved by the ethics committee of Jikei University School of Medicine for Biomedical Research (registration number 33-029[10639]) and conducted in accordance with the ethical standards of the amended Declaration of Helsinki.

Subjects

We retrospectively identified all consecutive patients examined for suspected pelvic fractures at Jikei University School of Medicine or Jikei University Kashiwa Hospital between January 2014 and September 2020. Anteroposterior radiographs were made according to the Advanced Trauma Life Support guidelines. CT scans were obtained at a pitch of 2 mm and evaluated by either of 2 orthopaedic surgeons (N.I., S.I.). After excluding patients with metal implants, 652 patients (231 men, 421 women) were enrolled. Most (75.8%) were ≥65 years of age (mean, 71.8 years; range, 20 to 103 years); 197 had sacral fractures and 455 did not. Maximum displacement of the sacral fracture was measured on CT. Sacral fracture morphology was classified as Denis type 1, 2, and 3 in 132, 43, and 22 cases, respectively. Fracture displacement was ≥2 mm in 43 cases and <2 mm in 154 cases.

Data Preparation and Image Selection

Anteroposterior pelvic radiographs were made with the patient in the supine position. The imaging conditions were 70 kV, 200 mA, 0.4 s, and 100-cm tube-to-film distance, and digital images were obtained. Multiple pelvic radiographs of each patient were made within 1 month after injury. Uncompressed imaging data were stored on a DICOM (digital imaging and communication in medicine) server (Toshiba Medical Systems Corporation). Images extracted from the server were converted into 8-bit JPEG images. A square region showing both sacroiliac joints and the sacrum was cropped from the anteroposterior pelvic radiograph (Fig. 1). The image was resized to 256 × 256 pixels and labeled by 2 orthopaedic surgeons (N.I., S.I.) according to the presence or absence of sacral fractures based on the CT results. CT findings were used because the accuracy of CT for diagnosing sacral fractures is higher than that of radiography[4]. We used 2,238 images (770 with and 1,468 without sacral fractures). The images in the data set were randomly divided into a training set (670 with and 1,368 without sacral fractures) and a test set (100 with and 100 without). In the training set, sacral fracture morphology was classified as Denis type 1, 2, and 3 on 465, 148, and 57 images, respectively. Fracture displacement was ≥2 mm on 121 images and <2 mm on 549 images. In the test set, sacral fractures were classified as Denis type 1, 2, and 3 on 74, 15, and 11 images, respectively. Fracture displacement was ≥2 mm on 13 images and <2 mm on 87 images.
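The cropping and resizing step described above can be sketched in NumPy (an illustrative stand-in, not the authors' pipeline; the function names, the nearest-neighbor resampling, and the example crop coordinates are assumptions for the sketch):

```python
import numpy as np

def crop_square(img: np.ndarray, cx: int, cy: int, size: int) -> np.ndarray:
    """Crop a size x size square centered at (cx, cy) from a 2-D grayscale image."""
    half = size // 2
    return img[cy - half:cy + half, cx - half:cx + half]

def resize_nearest(img: np.ndarray, out: int = 256) -> np.ndarray:
    """Nearest-neighbor resize of a 2-D grayscale image to out x out pixels."""
    h, w = img.shape
    rows = (np.arange(out) * h // out).astype(np.intp)
    cols = (np.arange(out) * w // out).astype(np.intp)
    return img[np.ix_(rows, cols)]

# Example: a synthetic 8-bit "radiograph"; the crop center and size are hypothetical.
radiograph = np.random.default_rng(0).integers(0, 256, (1024, 1024), dtype=np.uint8)
patch = resize_nearest(crop_square(radiograph, cx=512, cy=400, size=600))
label = 1  # 1 = sacral fracture present on CT, 0 = absent (CT-based labeling)
```

In the study, each cropped square covered both sacroiliac joints and the sacrum, and the CT-derived label was attached to the resulting 256 × 256 image.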
Fig. 1

A square region of the sacrum showing both sacroiliac joints was cropped from an anteroposterior radiograph of the pelvis.

Hardware and Software

All computations were performed on desktop workstations optimized for deep learning with an Intel Core i9 processor, 128 GB of RAM (random-access memory), and an NVIDIA Quadro GV100 or NVIDIA RTX A6000 GPU. The operating system was Ubuntu Linux versions 18.04 to 20.04. To train and test multiple CNN models for detection of sacral fractures, we created each program using Python version 3.8; TensorFlow version 2.5.0, an open-source platform developed for machine learning; and Keras version 2.4.0, a machine learning library.

Training

Our 8 CNN models were built by fine-tuning open-source models that had been pre-trained on ImageNet, downloaded from 2 websites (https://keras.io/, https://github.com/qubvel/classification_models), and adapting them to our new task. The first fine-tuning step was to freeze the ImageNet weights and add a new, fully connected layer. The second fine-tuning step was to unfreeze some of the upper layers of the frozen model base and jointly train both the newly added fully connected layer and the final layers of the base model. To compare these models and ensure reproducibility of the training results, the random seed was fixed by setting an arbitrary seed value. The images used to train these models were 256 × 256 or 224 × 224 pixels (Table I). Model fitting was performed in 50 epochs, and a callback (early stopping feature) was introduced to stop training before overtraining occurred. Training was stopped if there was no improvement of val_loss (the value of the cost function on the cross-validation data) within 10 epochs. The ReduceLROnPlateau function in Keras was used to reduce the learning rate by a factor of 0.1 if no improvement was seen for 3 epochs.
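The behavior of the two callbacks (early stopping with a patience of 10 epochs, and a 0.1× learning-rate reduction after 3 epochs without improvement) can be simulated in plain Python. This is a sketch of the callback logic only, not the Keras implementation, and the val_loss sequence below is invented:

```python
def simulate_callbacks(val_losses, lr=1e-3, es_patience=10, lr_patience=3, factor=0.1):
    """Mimic EarlyStopping(patience=10) plus ReduceLROnPlateau(patience=3, factor=0.1)."""
    best = float("inf")
    es_wait = lr_wait = 0
    stopped_epoch = None
    for epoch, loss in enumerate(val_losses):
        if loss < best:                  # val_loss improved: reset both counters
            best = loss
            es_wait = lr_wait = 0
        else:
            es_wait += 1
            lr_wait += 1
            if lr_wait >= lr_patience:   # no improvement for 3 epochs -> reduce LR
                lr *= factor
                lr_wait = 0
            if es_wait >= es_patience:   # no improvement for 10 epochs -> stop
                stopped_epoch = epoch
                break
    return lr, stopped_epoch

# Invented val_loss curve: improves for 3 epochs, then plateaus.
final_lr, stopped = simulate_callbacks([1.0, 0.9, 0.8] + [0.85] * 12)
```

With this curve, the learning rate is reduced three times before the early-stopping patience is exhausted, which is the interplay the two callbacks produce during training.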

Testing and Evaluation

We conducted a test using each CNN model created by the training. Indicators used to evaluate the accuracy of the test results for each model were precision, sensitivity (recall), specificity, accuracy, F1 score, and area under the receiver operating characteristic curve (AUC). The F1 score is the harmonic mean of precision and recall. The AUC measures classification performance at various threshold settings; it represents the degree of separability of the classes (the extent to which a model can distinguish between classes).
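The evaluation metrics listed above can be computed from a confusion matrix and predicted probabilities. A minimal pure-Python sketch (the function name and example data are hypothetical; the AUC uses the standard rank-based Mann-Whitney formulation):

```python
def binary_metrics(y_true, y_score, threshold=0.5):
    """Precision, sensitivity (recall), specificity, accuracy, F1 score, and AUC
    for binary labels y_true and predicted probabilities y_score."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                  # sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(y_true)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    # AUC: probability that a random positive scores higher than a random negative
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return {"precision": precision, "sensitivity": recall, "specificity": specificity,
            "accuracy": accuracy, "f1": f1, "auc": auc}
```

Because the AUC is threshold-free while the other metrics depend on the 0.5 cutoff, a model can rank fractures well (high AUC) yet still need threshold tuning for sensitivity.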

Detection Ability of Orthopaedic Surgeons

Four experienced orthopaedic surgeons (mean experience, 10.8 years [range, 7 to 15 years] since graduation from medical school) reviewed the same 200 test-set images on a color LCD (liquid crystal display) monitor (EV2450; EIZO) with a resolution of 1,920 × 1,080 pixels, a contrast ratio of 1,000:1, and a brightness of 250 cd/m² and classified each sacral image for the presence or absence of fractures. Brightness, contrast, and zoom settings were routinely adjusted when fractures were unclear.

Statistical Analysis

We compared the ability of the AI with the highest AUC in each CNN model and the 4 experienced orthopaedic surgeons to detect sacral fractures on test images. Significance of the results was analyzed using the McNemar test in the Python statistical library.
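The paired comparison can be sketched with the standard McNemar χ² statistic with continuity correction. The paper does not specify which library function was used, so this pure-Python version, built from the counts of discordant pairs b and c, is an illustrative stand-in with invented counts:

```python
import math

def mcnemar(b: int, c: int):
    """McNemar test with continuity correction for paired binary outcomes.
    b = cases only rater 1 classified correctly; c = cases only rater 2 did."""
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical discordant counts: CNN correct on 30 images the surgeon missed,
# surgeon correct on 8 images the CNN missed.
chi2, p = mcnemar(30, 8)
```

The test uses only the discordant pairs, which is why it suits this design: both raters classified the same 200 images, and agreement cases carry no information about which rater is better.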

Heat Map

TensorFlow (version 2; Google) and gradient-weighted class activation mapping (Grad-CAM) were used to create a heat map image of the sacral region for the CNN model showing the best accuracy[17].
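Grad-CAM's core computation (weight each convolutional feature map by its globally pooled gradient, sum, and keep the positive part) can be sketched in NumPy, assuming the feature maps and gradients have already been extracted from the network; this is a sketch of the method, not the authors' code:

```python
import numpy as np

def grad_cam(feature_maps: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """feature_maps, gradients: (H, W, K) arrays from the last conv layer for one image.
    Returns an (H, W) heat map normalized to [0, 1]."""
    weights = gradients.mean(axis=(0, 1))                       # alpha_k: pooled gradients
    cam = np.tensordot(feature_maps, weights, axes=([2], [0]))  # sum_k alpha_k * A_k
    cam = np.maximum(cam, 0.0)                                  # ReLU: positive influence only
    if cam.max() > 0:
        cam /= cam.max()                                        # normalize for overlay
    return cam
```

In practice the resulting map is upsampled to the input resolution and overlaid on the radiograph, which is how the activation areas at the fracture sites were visualized.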

Source of Funding

There was no external funding source for this study.

Results

Among the 8 trained CNN models, InceptionV3 had the highest AUC (0.989; 95% confidence interval [CI], 0.975 to 1.000), followed by Xception (0.987; 95% CI, 0.970 to 1.000) and Inception ResNetV2 (0.984; 95% CI, 0.966 to 1.000) (Fig. 2, Table II). InceptionV3 was the most accurate, followed again by Xception and Inception ResNetV2. Test results for the orthopaedic surgeons are shown in Table III. The McNemar test showed that the results for the 3 CNNs with the highest AUC (InceptionV3, Xception, and Inception ResNetV2) were significantly more accurate than those for the orthopaedic surgeons. When 68 Grad-CAM heat map images predicting the presence of a sacral fracture were analyzed, the activation area was located at the sacral fracture site in 67.8% of cases.
Fig. 2

Receiver operating characteristic curves for the 8 models. Fig. 2-A Xception. Fig. 2-B InceptionV3. Fig. 2-C Inception ResNetV2. Fig. 2-D ResNet50. Fig. 2-E ResNet101. Fig. 2-F SeResNeXt50. Fig. 2-G SeResNeXt101. Fig. 2-H NASNet-Mobile.

Discussion

AI Performance

To our knowledge, this report is the first to describe using an AI method to detect sacral fractures on radiographs. InceptionV3 showed precision of 0.989, sensitivity of 0.880, specificity of 0.990, an F1 score of 0.931, accuracy of 0.935, and an AUC of 0.989, reflecting its high discrimination ability. In contrast, the orthopaedic surgeons showed precision of 0.484 to 0.625, sensitivity of 0.460 to 0.750, specificity of 0.420 to 0.700, accuracy of 0.485 to 0.600, and an F1 score of 0.472 to 0.643. A previous study found that orthopaedic surgeons had a sacral fracture detection accuracy of only 53% on anteroposterior pelvic radiographs[4], which is consistent with our results and highlights the difficulty that orthopaedic surgeons have in diagnosing sacral fractures using radiographs alone. Previous studies of the feasibility of using CNNs to detect fractures on radiographs have had promising results. Olczak et al. trained a Visual Geometry Group 16-layer (VGG_16) network to detect hand, wrist, and ankle fractures and found its diagnostic accuracy to be comparable with that of radiologists (0.83 versus 0.82)[18]. Cheng et al. evaluated the ability of the ResNet model to detect fractures of the humerus on shoulder radiographs and found that accuracy was better for the CNN than for orthopaedists (0.96 versus 0.93)[15]. Urakawa et al. reported that the VGG_16 network could detect intertrochanteric fractures on radiographs with an accuracy of 0.96, compared with an accuracy of 0.92 for orthopaedic surgeons[14]. These studies suggested that the ability of CNNs to detect fractures approaches or exceeds that of specialists[14,15,18]. The ability of our CNNs to detect sacral fractures was greater than that of the orthopaedic surgeons, and there are several reasons for this clear difference in discrimination ability. First, the creation of our training data differed from that in previous studies.
Training data are typically created by clinical specialists using radiographs, and CNNs trained using such data are unlikely to be better at discrimination than the specialists. In our study, however, training data were created based on CT findings; using the results of a higher-level test as the basis for classifying the results of a lower-level test can therefore produce a high-quality CNN for the lower-level test. This novel method could be used in the future to develop a CNN with high accuracy for other fractures that are difficult for specialists to detect. Second, we manually cropped the images to a fixed matrix size before they were input into the CNNs. Because the images were uniform in size, the region of interest (ROI) was recognized more rapidly and accurately by the CNN models, markedly improving efficiency in both training and testing[14,15,18]. We believe that limiting the ROI to the sacral region improved the ability to discriminate sacral fractures, which are difficult to detect. Third, the orthopaedic surgeons also viewed the radiographs that had been cropped to show only the sacrum, rather than the original whole-pelvis radiographs. Orthopaedic surgeons are less able to detect sacral fractures on sacral radiographs alone; they typically base the diagnosis on deformity of the entire pelvis and on anterior fractures[19,20]. The limited ability of our orthopaedic surgeons to detect sacral fractures may thus reflect the fact that these fractures were not visible on the cropped radiographs. Further research is needed to determine the difference in detection rate between cropped and whole-pelvis radiographs.

Visualization by AI

A heat map can show which regions of an image influenced a model's decision, so heat maps may represent an additional factor in favor of sacral fracture detection by AI[17]. In our study, Grad-CAM roughly visualized a high-signal region consistent with the fracture site approximately 70% of the time. A typical example is shown in Figure 3. This result suggests that the AI models could localize fractures in many of the test images, which could be due partially to callus on the radiographs that were made 1 month after injury. However, some images failed to show a causal relationship with the high-signal area or details of the fracture type, reflecting current limitations of Grad-CAM. We anticipate further development of heat map technology.
Fig. 3

Visualization of fractures by the different models. The top row shows a Grad-CAM image of fractures in a 77-year-old woman who fell on her buttocks. The fracture line was not clear on an anteroposterior pelvic radiograph, but a CT scan showed bilateral sacral fractures (Denis type 1). The bottom row shows a Grad-CAM image of a sacral fracture in a 20-year-old woman who was injured in a road traffic accident. The fracture line was not clear on an anteroposterior radiograph. However, a CT scan showed a fracture of the right sacrum (Denis type 2).

Status of Computer-Aided Diagnosis of Sacral Fractures

We believe that our AI method can be used by orthopaedic surgeons as a diagnostic screening test for sacral fractures before proceeding to CT. Berg et al. reported that the accuracy rate for sacral fracture detection was 53% on anteroposterior pelvic radiographs and that addition of an inlet or an outlet radiographic view allowed the identification of anterior pelvic ring injuries with 78% and 74% sensitivity, respectively[4]. These results indicate that additional views are necessary to ensure accurate diagnosis of sacral fractures on radiographs alone. Many clinicians use CT and MRI to improve their rate of diagnosis of sacral fractures[12]. MRI detects occult posterior pelvic fractures that cannot be visualized on CT, but whether all of these fractures require surgical treatment is controversial[21,22]. Although CT has become the gold standard for detecting sacral fractures[12,23], it is not feasible to obtain CT scans in all cases because of clinical resources, cost, and the clinical status of certain patients. Several papers have suggested criteria for appropriate CT use. Scheyerer et al. found that exclusion of posterior lesions by CT was essential because most anterior pelvic fractures involve additional fracture sites[20]. However, this method can be used only for anterior pelvic fractures. McCormick et al. recommend using the results of palpation of the posterior portion of the pelvis in patients with pelvic fractures (sensitivity, 0.98; specificity, 0.94) as a basis for determining whether CT should be used[13]. However, their patients had other types of trauma and were fully conscious. Moreover, palpation of the posterior portion of the pelvis has limited practical value because many patients with sacral fractures have other trauma or impaired consciousness[24]. 
Our AI method may help to solve these problems because it is not affected by the fracture type or the patient's condition, although the CNN can only detect sacral fractures and cannot determine the fracture type, which determines the treatment plan. Thus, our CNN method is valuable because it can be used for screening until CT can be performed. However, it is currently unclear whether using this AI method as an aid improves examiner accuracy. Clinical workflow is also a subject for future research, in which challenges regarding ethical deployment, regulatory approval, and the clinical superiority of AI over traditional statistical methods and decision-making will need to be addressed[25].

Limitations

This study has 4 main limitations. First, we did not use MRI to detect fractures, so it is possible that some fractures were missed. Second, the original sample size in our data set was small, which might have limited the improvement in the performance of our CNNs in the training and test procedures. Third, we created an artificial, nonclinical process by cropping the images and utilizing an image processing system. The sacrum should be cropped from pelvic radiographs made for clinical use before surgeons can use these CNNs. Also, the training images identified by the orthopaedic surgeons will not have been perfectly accurate; training on more accurate data should be performed before the AI model is incorporated into the clinical workflow. Lastly, the diagnostic imaging in this study was performed by orthopaedic surgeons rather than radiologists. However, which experts interpret these radiographs may depend on their country’s health-care system. Furthermore, even if radiologists generally interpret patient radiographs, orthopaedic surgeons make the final decision about performing CT based on the radiologists’ interpretations. Also, Kuo et al. reported that orthopaedic surgeons and radiologists had very similar ability to diagnose fractures on radiographs[26]; thus, the demonstrated accuracy of AI that far surpasses that of orthopaedic surgeons appears valuable.

Conclusions

We have successfully developed AI models to detect sacral fractures on radiographs. Our CNN models were trained using pelvic radiographs, which had been classified according to the presence or absence of fractures based on the results of CT, and had a discrimination ability far surpassing that of orthopaedic surgeons. By enhancing the processing of probabilistic tasks and the communication of their results, AI has the potential to become a useful screening tool for diagnosing sacral fractures before CT can be performed.
TABLE I

Comparison of the Pre-Trained Convolutional Neural Networks

Model | Total Parameters | Trainable Parameters | Total Layers | Input Size
Xception | 22,961,706 | 22,907,178 | 135 | 256 × 256
InceptionV3 | 23,903,010 | 23,903,010 | 314 | 256 × 256
Inception ResNetV2 | 55,912,674 | 55,852,130 | 783 | 256 × 256
ResNet50 | 25,687,938 | 11,035,650 | 178 | 256 × 256
ResNet101 | 44,758,402 | 11,035,650 | 348 | 256 × 256
SeResNeXt50 | 27,679,346 | 4,875,010 | 1,330 | 256 × 256
SeResNeXt101 | 49,144,498 | 4,944,642 | 2,724 | 256 × 256
NASNet-Mobile | 5,354,134 | 1,151,298 | 772 | 224 × 224
TABLE II

Mean Ability of Each Post-Trained Convolutional Neural Network Model to Detect Sacral Fractures in the Test Data

Model | Precision | Sensitivity | Specificity | F1 Score | Accuracy | AUC
Xception | 0.966 | 0.860 | 0.970 | 0.910 | 0.915 | 0.987
InceptionV3 | 0.989 | 0.880 | 0.990 | 0.931 | 0.935 | 0.989
Inception ResNetV2 | 0.976 | 0.820 | 0.980 | 0.891 | 0.900 | 0.984
ResNet50 | 0.893 | 0.250 | 0.970 | 0.391 | 0.610 | 0.850
ResNet101 | 1.000 | 0.150 | 1.000 | 0.261 | 0.575 | 0.821
SeResNeXt50 | 0.892 | 0.740 | 0.910 | 0.809 | 0.825 | 0.935
SeResNeXt101 | 0.963 | 0.770 | 0.970 | 0.856 | 0.870 | 0.965
NASNet-Mobile | 0.783 | 0.650 | 0.820 | 0.710 | 0.735 | 0.837
TABLE III

Ability of Each Experienced Orthopaedic Surgeon to Detect Sacral Fractures*

Orthopaedic Surgeon | Years of Experience | Precision | Sensitivity | Specificity | F1 Score | Accuracy
A | 13 | 0.564 | 0.750 | 0.420 | 0.643 | 0.585
B | 15 | 0.484 | 0.460 | 0.510 | 0.472 | 0.485
C | 7 | 0.625 | 0.500 | 0.700 | 0.556 | 0.600
D | 8 | 0.542 | 0.520 | 0.560 | 0.531 | 0.540

*McNemar tests showed that the ability of the best 3 convolutional neural networks to detect sacral fractures was significantly higher than that of the experienced orthopaedic surgeons.

References (24 in total)

1.  Detecting intertrochanteric hip fractures with orthopedist-level accuracy using a deep convolutional neural network.

Authors:  Takaaki Urakawa; Yuki Tanaka; Shinichi Goto; Hitoshi Matsuzawa; Kei Watanabe; Naoto Endo
Journal:  Skeletal Radiol       Date:  2018-06-28       Impact factor: 2.199

2.  Imaging and treatment of sacral insufficiency fractures. (Review)

Authors:  E M Lyders; C T Whitlow; M D Baker; P P Morris
Journal:  AJNR Am J Neuroradiol       Date:  2009-09-17       Impact factor: 3.825

3.  Complications associated with surgical stabilization of high-grade sacral fracture dislocations with spino-pelvic instability.

Authors:  Carlo Bellabarba; Thomas A Schildhauer; Alexander R Vaccaro; Jens R Chapman
Journal:  Spine (Phila Pa 1976)       Date:  2006-05-15       Impact factor: 3.468

4.  Sacral fractures: an important problem. Retrospective analysis of 236 cases.

Authors:  F Denis; S Davis; T Comfort
Journal:  Clin Orthop Relat Res       Date:  1988-02       Impact factor: 4.176

5.  Detection of posterior pelvic injuries in fractures of the pubic rami.

Authors:  Max J Scheyerer; Georg Osterhoff; Silvio Wehrle; Guido A Wanner; Hans-Peter Simmen; Clement M L Werner
Journal:  Injury       Date:  2012-06-06       Impact factor: 2.586

6.  Pelvic trauma imaging: a blinded comparison of computed tomography and roentgenograms.

Authors:  E E Berg; C Chebuhar; R M Bell
Journal:  J Trauma       Date:  1996-12

7.  Artificial Intelligence in Fracture Detection: A Systematic Review and Meta-Analysis.

Authors:  Rachel Y L Kuo; Conrad Harrison; Terry-Ann Curran; Benjamin Jones; Alexander Freethy; David Cussons; Max Stewart; Gary S Collins; Dominic Furniss
Journal:  Radiology       Date:  2022-03-29       Impact factor: 29.146

8.  Long-term functional outcome after traumatic lumbosacral dissociation. A retrospective case series of 13 patients.

Authors:  Aron Adelved; Anna Tötterman; Thomas Glott; Johan C Hellund; Jan Erik Madsen; Olav Røise
Journal:  Injury       Date:  2016-04-20       Impact factor: 2.586

9.  Artificial intelligence for analyzing orthopedic trauma radiographs.

Authors:  Jakub Olczak; Niklas Fahlberg; Atsuto Maki; Ali Sharif Razavian; Anthony Jilert; André Stark; Olof Sköldenberg; Max Gordon
Journal:  Acta Orthop       Date:  2017-07-06       Impact factor: 3.717

10.  Pelvic radiography in ATLS algorithms: A diminishing role?

Authors:  Matthias P Hilty; Isabelle Behrendt; Luca Martinolli; Christoforos Stoupis; Donald J Buggy; Heinz Zimmermann; Aristomenis K Exadaktylos; Lorin M Benneker
Journal:  World J Emerg Surg       Date:  2008-03-04       Impact factor: 5.469

