| Literature DB >> 36036935 |
Dongmei Sun, Thanh M Nguyen, Robert J Allaway, Jelai Wang, Verena Chung, Thomas V Yu, Michael Mason, Isaac Dimitrovsky, Lars Ericson, Hongyang Li, Yuanfang Guan, Ariel Israel, Alex Olar, Balint Armin Pataki, Gustavo Stolovitzky, Justin Guinney, Percio S Gulko, Mason B Frazier, Jake Y Chen, James C Costello, S Louis Bridges.
Abstract
Importance: An automated, accurate method is needed for unbiased assessment quantifying the accrual of joint space narrowing and erosions on radiographic images of the hands, wrists, and feet for clinical trials, for monitoring joint damage over time, and for assisting rheumatologists with treatment decisions. Such a method has the potential to be directly integrated into electronic health records.
Year: 2022 PMID: 36036935 PMCID: PMC9425151 DOI: 10.1001/jamanetworkopen.2022.27423
Source DB: PubMed Journal: JAMA Netw Open ISSN: 2574-3805
Figure 1. An Overview of RA2-DREAM Challenge
A. Representative radiographs showing joints without disease and joints with mild and severe damage due to RA (copyright 2022 ACR). Red arrows indicate areas of erosion; yellow arrows, areas with joint space narrowing. B. A total of 367 sets of radiographic images and expert-curated SvH scores were provided to participants to train algorithms. The leaderboard data set (119 sets, images only) and the final evaluation data set (188 sets, images only) were used for performance evaluation in the leaderboard and final evaluation rounds.
Demographic Characteristics of Patients in Training, Leaderboard, and Final Evaluation Data Sets (data are No. (%) unless otherwise noted)
| Characteristic | Training (n = 367) | Leaderboard (n = 119) | Final evaluation (n = 188) |
|---|---|---|---|
| Score, median (IQR) | |||
| Overall total | 13.0 (5.0-38.5) | 5.0 (2.0-15.5) | 10.0 (5.0-20.0) |
| Joint erosion | 5.0 (2.0-17.0) | 2.0 (1.0-5.0) | 5.0 (2.0-9.0) |
| Joint narrowing | 7.0 (2.0-22.5) | 2.0 (0.0-11.0) | 4.0 (0.8-11.3) |
| Sex | |||
| Men | 60 (16.3) | 11 (9.2) | 30 (16.0) |
| Women | 307 (83.7) | 108 (90.8) | 158 (84.0) |
| Race | |||
| Black | 242 (65.9) | 88 (73.9) | 156 (83.0) |
| White | 106 (28.9) | 26 (21.8) | 21 (11.2) |
| Other | 19 (5.2) | 5 (4.2) | 11 (5.9) |
| Age, mean (SD), y | |||
| At radiographic examination | 54.9 (13.2) | 51.4 (13.1) | 53.5 (13.5) |
| At RA diagnosis | 46.3 (13.3) | 45.9 (13.9) | 48.4 (13.1) |
| Disease duration, mean (SD), mo | 103.2 (113.1) | 65.4 (74.3) | 60.5 (93.9) |
| Rheumatoid factor | |||
| Positive | 273 (74.4) | 89 (74.8) | 126 (67.0) |
| Negative | 90 (24.5) | 29 (24.4) | 60 (31.9) |
| NA | 4 (1.1) | 1 (0.8) | 2 (1.1) |
| Anti-CCP antibody | |||
| Positive | 250 (68.1) | 76 (63.9) | 112 (59.6) |
| Negative | 94 (25.6) | 39 (32.8) | 68 (36.2) |
| NA | 23 (6.3) | 4 (3.4) | 8 (4.2) |
Abbreviations: CCP, cyclic citrullinated peptide; NA, not available; RA, rheumatoid arthritis.
"Other" race includes American Indian or Alaska Native, Asian, and Native Hawaiian or other Pacific Islander individuals.
Rheumatoid factor and anti-CCP antibodies are 2 autoantibodies characteristic of RA.
Figure 2. Evaluation and Validation of Challenge Results
All estimations were bootstrapped to generate a distribution of scores used to calculate Bayes factors between the top-performing model and all other models. Any estimation with a Bayes factor of 3 or less was considered “tied” with the top model; no estimations were tied for top place in any of the 3 subchallenges. The baseline model provided by the organizers was used for reference. The models were run on a postchallenge independent validation data set of 50 images and scored against the mean of the 2 expert-curated measurements from the validation data set, using the overall damage (subchallenge 1), joint space narrowing (subchallenge 2), and erosion (subchallenge 3) metrics. Left, Bold lines indicate medians; boxes, IQRs; whiskers, ranges; and dots, individual data points. Right, The dot plots show the weighted RMSE performance in the final evaluation round (x-axis) for each model compared with the mean weighted RMSE (2 readers) from the validation data set (y-axis). Algorithms below the dashed line performed better in the final evaluation round, while those above the dashed line performed better on the validation data set.
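The scoring approach described above can be sketched as follows. The exact weighting scheme of the challenge's weighted RMSE and the precise bootstrap protocol are not specified here, so this is a minimal sketch assuming per-case weights and a win-ratio Bayes factor (the convention used in several DREAM challenges): resample cases with replacement, score both models on each resample, and take the ratio of bootstrap "wins."

```python
import numpy as np

def rmse(pred, truth, weights=None):
    """Root-mean-square error. `weights` (if given) are assumed
    per-case weights; the challenge's actual weighting scheme
    may differ."""
    err = (np.asarray(pred, dtype=float) - np.asarray(truth, dtype=float)) ** 2
    if weights is None:
        return float(np.sqrt(err.mean()))
    w = np.asarray(weights, dtype=float)
    return float(np.sqrt((w * err).sum() / w.sum()))

def bootstrap_bayes_factor(pred_top, pred_other, truth, n_boot=1000, seed=0):
    """Bootstrap Bayes factor of `pred_other` relative to `pred_top`:
    resample cases with replacement, score both models on each
    resample, and return (#resamples where other wins) / (#where top wins).
    A value of 3 or less would be read as a tie with the top model."""
    rng = np.random.default_rng(seed)
    pred_top, pred_other, truth = map(np.asarray, (pred_top, pred_other, truth))
    n = len(truth)
    wins_top = wins_other = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # resample case indices with replacement
        if rmse(pred_top[idx], truth[idx]) < rmse(pred_other[idx], truth[idx]):
            wins_top += 1
        else:
            wins_other += 1
    return wins_other / max(wins_top, 1)
```

The win-ratio form makes the Bayes factor easy to interpret: a competing model that beats the top model in fewer than a quarter of the resamples yields a factor well below 3 and is considered distinguishable from it.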
Figure 3. Ensembled Models Improve Performance (Weighted RMSE)
A series of ensemble models was created by combining the top 2 models, the top 3, and so on until all models were combined (along the x-axis from left to right, models appear in rank order for each subchallenge). For each ensemble model, the means of the estimations were calculated and scored with the overall damage (subchallenge 1), joint space narrowing (subchallenge 2), or erosion (subchallenge 3) metrics. A bootstrap Bayes factor analysis was used to determine differences in performance between the top-performing (individual) model and the ensemble models.
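The cumulative-ensemble construction described above reduces to averaging the per-case estimations of the top-k models for increasing k and scoring each average. A minimal sketch, assuming models are already ordered by final-round rank and that `metric` is any scoring function (e.g., the weighted RMSE of the relevant subchallenge):

```python
import numpy as np

def cumulative_ensembles(predictions, truth, metric):
    """`predictions` is a list of per-model prediction arrays, already
    ordered by final-evaluation rank. Returns the metric score of each
    cumulative ensemble: top-1 alone, mean of top-2, mean of top-3, ..."""
    preds = np.asarray(predictions, dtype=float)
    scores = []
    for k in range(1, len(preds) + 1):
        ensemble = preds[:k].mean(axis=0)  # mean of the top-k estimations
        scores.append(metric(ensemble, truth))
    return scores

# Illustration with plain (unweighted) RMSE as the metric:
rmse = lambda p, t: float(np.sqrt(((np.asarray(p) - np.asarray(t)) ** 2).mean()))
```

For example, two models whose errors lie on opposite sides of the truth cancel when averaged, which is why the ensembles in the figure can outperform the top individual model.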
Summary of Machine Learning Methods Used by Teams That Submitted Models in the Final Evaluation Round
| Team name | Segmentation model | Image augmentation | Algorithm class | Prebuilt models | Ensemble models, No. |
|---|---|---|---|---|---|
| Team Shirin | NA | No | Other | DenseNet | NA |
| HYL-YFG | U-Net | No | ConvNet | NA | 10 |
| Gold Therapy | ResNet | No | ConvNet | NA | NA |
| CSAbaibio | RCNN | No | ConvNet | NA | 12 |
| Nc717 | RetinaNet | No | ConvNet | EfficientNet | 5 |
| Aboensis V | YOLO | No | ConvNet | YOLO | NA |
| NAD | RCNN | No | ConvNet | EfficientNet | 15 |
| kichuDL | NA | No | Autoencoder | NA | 20 |
| Alpine Lads | NA | Yes | ConvNet | NA | 2 |
| Zbigniew Wojna | U-Net | No | ConvNet | NA | 8 |
| RYM | YOLO | Yes | PenReg | NA | NA |
| CU_DSI | RCNN | No | ConvNet | EfficientNet | NA |
| ZS_ai | YOLO | Yes | ConvNet | ResNet | 3 |
Abbreviations: HYL-YFG, Hongyang Li and Yuanfang Guan; NA, not available.
The team did not apply segmentation.
The team did not use an ensemble model.
The team did not apply any prebuilt model.
A response was not received, but information was extracted from the team's write-up.