Nathan Larson, Chantal Nguyen, Bao Do, Aryan Kaul, Anna Larson, Shannon Wang, Erin Wang, Eric Bultman, Kate Stevens, Jason Pai, Audrey Ha, Robert Boutin, Michael Fredericson, Long Do, Charles Fang.
Abstract
Leg length discrepancies are common orthopedic problems with the potential for poor functional outcomes. These are frequently assessed using bilateral leg length radiographs. The objective was to determine whether an artificial intelligence (AI)-based image analysis system can accurately interpret long leg length radiographic images. We built an end-to-end system that analyzes leg length radiographs and generates reports similar to those of radiologists: it measures lengths (femur, tibia, entire leg) and angles (mechanical axis and pelvic tilt), describes the presence and location of orthopedic hardware, and reports laterality discrepancies. After IRB approval, a dataset of 1,726 extremities (863 images) from consecutive examinations at a tertiary referral center was retrospectively acquired and partitioned into train/validation and test sets. The training set was annotated and used to train a Faster R-CNN ResNet-101 object detection convolutional neural network. A second-stage classifier using an EfficientNet-B0 model was trained to recognize the presence or absence of hardware within extracted joint image patches. The system was deployed in a custom web application that generated a preliminary radiology report. Performance of the system was evaluated using a held-out test set of 220 images, annotated by three musculoskeletal fellowship-trained radiologists. At the object detection level, the system demonstrated a recall of 0.98 and precision of 0.96 in detecting anatomic landmarks. Correlation coefficients between radiologist and AI-generated measurements for femur, tibia, and whole-leg lengths were > 0.99, with mean error of < 1%. Correlation coefficients for mechanical axis angle and pelvic tilt were 0.98 and 0.86, respectively, with mean absolute error of < 1°. AI hardware detection demonstrated an accuracy of 99.8%.
Automatic quantitative and qualitative analysis of leg length radiographs using deep learning is feasible and has the potential to improve radiologist workflow.
Keywords: Artificial Intelligence; Deep Learning; Leg Length Discrepancy; Radiography
Year: 2022 PMID: 35794502 PMCID: PMC9261153 DOI: 10.1007/s10278-022-00671-2
Source DB: PubMed Journal: J Digit Imaging ISSN: 0897-1889 Impact factor: 4.903
Fig. 1 A flowchart summarizing the dataset usage throughout the various components
Demographic data of both training and testing sets
| | Training set | Test set |
|---|---|---|
| N | 643 | 220 |
| Mean age (SD, range) | 61.4 years (15.8 years, 16–96) | 60.7 years (16.2 years, 22–95) |
| Sex | 345 M (54%) / 298 F (46%) | 125 M (57%) / 95 F (43%) |
| Hip arthroplasty | 41 (6.4%) | 17 (7.7%) |
| Other hip hardware | 31 (4.7%) | 10 (4.5%) |
| Knee arthroplasty | 104 (16%) | 67 (30%) |
| Other knee hardware | 91 (14%) | 23 (10%) |
| Ankle hardware | 24 (3.7%) | 11 (5.0%) |
| Amputation/incomplete anatomy | 39 (6.1%) | 12 (5.4%) |
Fig. 2 Defined landmarks and calculation of each metric. a depicts the twelve anatomic landmarks used for the measurements. b depicts the measurement of distances. Black represents femur length, blue represents leg length, and red represents tibia length. c depicts the measurement of the angles. Angle A (blue) represents the pelvic tilt relative to a horizontal line (black). Angles B and C (green) represent the mechanical axis angle for each lower extremity
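The measurements in Fig. 2 reduce to plane geometry on landmark coordinates. The Python sketch below is illustrative only: the landmark names, the sample coordinates, and the choice of which landmarks define each metric are assumptions, because the caption does not list them. The mechanical axis is computed here with one common hip-knee-ankle convention, which may differ from the authors' definition.

```python
import math

def distance(p, q):
    """Euclidean distance between two (x, y) points, in pixels."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def tilt_from_horizontal(left, right):
    """Signed angle in degrees of the line left->right relative to horizontal.

    Image y coordinates usually grow downward, so the sign convention
    depends on how the landmarks are ordered.
    """
    return math.degrees(math.atan2(right[1] - left[1], right[0] - left[0]))

def angle_at_vertex(a, vertex, b):
    """Unsigned angle in degrees at `vertex` between rays vertex->a and vertex->b."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return math.degrees(math.atan2(abs(cross), dot))

# Hypothetical pixel coordinates for one leg; not taken from the paper.
hip, knee, ankle = (100.0, 200.0), (100.0, 1000.0), (100.0, 1700.0)

femur_length = distance(hip, knee)
tibia_length = distance(knee, ankle)
leg_length = distance(hip, ankle)

# Deviation of the hip-knee-ankle angle from a straight line (assumed convention).
mechanical_axis_deviation = 180.0 - angle_at_vertex(hip, knee, ankle)
```

With these perfectly collinear sample points the deviation is zero; a varus or valgus alignment would shift the knee sideways and produce a nonzero value.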
Fig. 3 Data flow of the final AI application
Fig. 4 Example image patches used to train the hardware classifier model. Images were classified as native or hardware and used to train an EfficientNet-B0 model. The top row shows sample native image patches. The bottom row shows sample hardware image patches
Fig. 5 Current version of the clinical application (left). Current version of the webapp (right). Output format is subject to change
Object detection confusion matrix
| | Actual: Landmarks | Actual: Other |
|---|---|---|
| Predicted: Detected | 2446 | 101 |
| Predicted: Missed | 62 | - |
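The precision and recall quoted in the abstract can be reproduced from this confusion matrix. The short Python sketch below assumes that the 101 detections in the "Other" column are false positives and that the 62 missed landmarks are false negatives; the function name is illustrative.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true positive, false positive,
    and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Counts read from the object detection confusion matrix.
precision, recall = precision_recall(tp=2446, fp=101, fn=62)
print(f"precision={precision:.2f}, recall={recall:.2f}")
# prints "precision=0.96, recall=0.98"
```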
Fig. 6 Scatterplots of leg (a), femur (b), and tibia (c) lengths with parallel lines at ±50 pixels (about 1 cm). Scatterplots of mechanical axis angle (d) and pelvic tilt (e) with parallel lines at ±2 degrees
Comparison of A.I. with ground truth (radiologist average)
| | Mean absolute error | Mean absolute error (%) | Standard deviation |
|---|---|---|---|
| Leg length | 6.37 pixels | 0.22% | 8.93 pixels |
| Femur length | 9.97 pixels | 0.64% | 7.89 pixels |
| Tibia length | 5.94 pixels | 0.47% | 8.61 pixels |
| Mech angle | 0.42° | - | 0.54° |
| Pelvic tilt | 0.67° | - | 1.74° |
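As a rough guide to how error figures of this kind are obtained, the Python sketch below computes a mean absolute error and a mean absolute percentage error against a reference, and converts pixels to centimeters using the approximate scale of 50 pixels per cm given in the Fig. 6 caption. The sample measurements are made up for illustration, and the function names are assumptions rather than the authors' code.

```python
from statistics import mean

PIXELS_PER_CM = 50.0  # approximate scale quoted in the Fig. 6 caption

def mean_absolute_error(predicted, reference):
    """Average of |predicted - reference| over paired measurements."""
    return mean(abs(p - r) for p, r in zip(predicted, reference))

def mean_absolute_percent_error(predicted, reference):
    """Average of |predicted - reference| / reference, as a percentage."""
    return 100.0 * mean(abs(p - r) / r for p, r in zip(predicted, reference))

def pixels_to_cm(pixels):
    """Convert a pixel distance to centimeters under the assumed scale."""
    return pixels / PIXELS_PER_CM

# Made-up leg lengths in pixels (AI vs. radiologist average), for illustration.
ai_lengths = [4010.0, 3895.0, 4120.0]
reference_lengths = [4000.0, 3900.0, 4125.0]

mae_pixels = mean_absolute_error(ai_lengths, reference_lengths)
mape = mean_absolute_percent_error(ai_lengths, reference_lengths)

# The reported leg length error of 6.37 pixels is roughly 0.13 cm at this scale.
reported_leg_error_cm = pixels_to_cm(6.37)
```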
Leg length Pearson correlation coefficient (r value)
| | Radiologist 2 | Radiologist 3 | A.I. |
|---|---|---|---|
| Radiologist 1 | 0.999 (p < 0.00001) | 0.999 (p < 0.00001) | 0.998 (p < 0.00001) |
| Radiologist 2 | NA | 0.999 (p < 0.00001) | 0.999 (p < 0.00001) |
| Radiologist 3 | 0.999 (p < 0.00001) | NA | 0.999 (p < 0.00001) |
Mechanical axis angle Pearson correlation coefficient (r value)
| | Radiologist 2 | Radiologist 3 | A.I. |
|---|---|---|---|
| Radiologist 1 | 0.983 (p < 0.00001) | 0.986 (p < 0.00001) | 0.978 (p < 0.00001) |
| Radiologist 2 | NA | 0.988 (p < 0.00001) | 0.989 (p < 0.00001) |
| Radiologist 3 | 0.988 (p < 0.00001) | NA | 0.983 (p < 0.00001) |
Machine vs gold standard Pearson correlation coefficients
| Measurement | R value (A.I. vs. radiologist average) |
|---|---|
| Leg length | 0.999 (p < 0.00001) |
| Femur length | 0.997 (p < 0.00001) |
| Tibia length | 0.995 (p < 0.00001) |
| Mech angle | 0.988 (p < 0.00001) |
| Pelvic tilt | 0.864 (p < 0.00001) |
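For readers who want to see what the r values measure, here is a minimal Python sketch of the Pearson correlation coefficient using only the standard library. The sample data are invented. The second example shows that r stays at 1 even when one rater is offset by a constant, which is why a high r indicates linear association rather than agreement.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Invented measurements, for illustration only.
radiologist = [80.0, 85.0, 90.0, 95.0]
ai_identical = [80.0, 85.0, 90.0, 95.0]
ai_offset = [85.0, 90.0, 95.0, 100.0]  # constant 5-unit bias

r_identical = pearson_r(radiologist, ai_identical)
r_offset = pearson_r(radiologist, ai_offset)  # still 1 despite the bias
```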
Intraclass correlation coefficients (ICC): high ICC among the measurements of leg length, femur length, tibia length, mechanical axis angle, and pelvic tilt indicates a high level of agreement between radiologist measurements and machine predicted measurements. For reference, values less than 0.5, between 0.5 and 0.75, between 0.75 and 0.9, and greater than 0.90 are indicative of poor, moderate, good, and excellent reliability, respectively. [24]
| | Vs. aggregate (CI; p value) | Vs. individual radiologists (CI; p value) |
|---|---|---|
| Leg length | 1.00 (1.00–1.00; 0) | 1.00 (1.00–1.00; 0) |
| Femur length | 0.99 (0.94–1.00; 0) | 0.98 (0.95–0.99; 0) |
| Tibia length | 1.00 (0.99–1.00; 0) | 0.99 (0.98–0.99; 0) |
| Mech axis | 0.98 (0.98–0.99; 0) | 0.99 (0.99–0.99; 0) |
| Pelvic tilt | 0.86 (0.83–0.89; < 0.00001) | 0.8 (0.77–0.83; < 0.00001) |
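The reliability bands quoted in the ICC caption can be written as a small lookup. In this Python sketch the handling of values exactly on a boundary (0.5, 0.75, 0.9) is an assumption, since the caption does not say which band they fall in.

```python
def icc_reliability(icc):
    """Map an ICC value to the reliability label used in the caption.

    Boundary handling is an assumption: 0.5 counts as moderate,
    0.75 as good, and 0.9 as good.
    """
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"

# ICC values versus the radiologist aggregate, read from the table.
icc_vs_aggregate = {
    "Leg length": 1.00,
    "Femur length": 0.99,
    "Tibia length": 1.00,
    "Mech axis": 0.98,
    "Pelvic tilt": 0.86,
}
labels = {name: icc_reliability(value) for name, value in icc_vs_aggregate.items()}
```

Under these bands, pelvic tilt falls in the "good" range and the other four measurements in the "excellent" range.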
Fig. 7 a A case where the object detection model detected two ankle mortises on the same leg, which led to an error in the leg length and tibia length measurements. b A sample properly detected image with points very similar to those of the radiologists. In both panels, points identified by our model are labeled in yellow and the average point identified by radiologists is labeled in green
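A failure like the one in panel a can be caught by checking whether any landmark class appears more than once on the same side. The Python sketch below is a hypothetical sanity check and is not described in the source; the detection record layout (`label`, `side`, `score`) and the sample values are assumptions.

```python
from collections import Counter

def duplicated_landmarks(detections):
    """Return sorted (label, side) pairs that were detected more than once."""
    counts = Counter((d["label"], d["side"]) for d in detections)
    return sorted(key for key, n in counts.items() if n > 1)

# Hypothetical detector output for one image.
detections = [
    {"label": "ankle_mortise", "side": "left", "score": 0.97},
    {"label": "ankle_mortise", "side": "left", "score": 0.61},
    {"label": "knee", "side": "left", "score": 0.99},
]

# Pairs listed here could be routed to a radiologist for review.
flagged = duplicated_landmarks(detections)
```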
Confusion matrix for machine predictions of hardware presence versus native anatomy
| | Actual: Hardware | Actual: Native |
|---|---|---|
| Predicted: Hardware | 69 | 0 |
| Predicted: Native | 3 | 1438 |
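The 99.8% accuracy quoted in the abstract follows from these four counts. The Python sketch below also derives sensitivity and specificity, which the source does not report; treating "hardware" as the positive class is an assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, sensitivity, and specificity for a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of hardware patches found
        "specificity": tn / (tn + fp),  # share of native patches kept native
    }

# Counts read from the hardware confusion matrix (hardware = positive class).
metrics = binary_metrics(tp=69, fp=0, fn=3, tn=1438)
print(f"accuracy={metrics['accuracy']:.1%}")
# prints "accuracy=99.8%"
```

All three errors are hardware patches classified as native, so specificity is 1.0 while sensitivity is 69/72.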
Fig. 8 Misclassified patches from the hardware detection stage of the system. These patches were mistakenly classified as native anatomy by the machine. No patches with native anatomy were misclassified as having hardware