
Prediction Models for Osteoporotic Fractures Risk: A Systematic Review and Critical Appraisal.

Xuemei Sun1, Yancong Chen1, Yinyan Gao1, Zixuan Zhang1, Lang Qin1, Jinlu Song1, Huan Wang1, Irene Xy Wu1,2.   

Abstract

Osteoporotic fractures (OF) are a global public health problem. Many risk prediction models for OF have been developed, but their performance and methodological quality are unclear. We conducted this systematic review to summarize and critically appraise OF risk prediction models. Three databases were searched until April 2021. Studies developing or validating multivariable models for OF risk prediction were considered eligible. We used the prediction model risk of bias assessment tool to appraise the risk of bias and applicability of the included models. All results were narratively summarized and described. A total of 68 studies describing 70 newly developed prediction models and 138 external validations were included. Most models were explicitly developed (n=31, 44%) and validated (n=76, 55%) only for women. Only 22 developed models (31%) were externally validated. The most frequently validated tool was the Fracture Risk Assessment Tool. Overall, only a few models showed outstanding (n=3, 1%) or excellent (n=32, 15%) discrimination. Calibration was rarely assessed, either for developed models (n=25, 36%) or in external validations (n=33, 24%). No model was rated as low risk of bias, mostly because of an insufficient number of cases and inappropriate assessment of calibration. Numerous OF risk prediction models exist. However, few have been thoroughly validated internally or externally (calibration was not assessed for most models), and all showed methodological shortcomings. Instead of developing completely new models, future research should validate, improve, and analyze the impact of existing models.

Copyright: © 2022 Sun et al.


Keywords:  critical appraisal; osteoporotic fractures; prediction model; systematic review

Year:  2022        PMID: 35855348      PMCID: PMC9286920          DOI: 10.14336/AD.2021.1206

Source DB:  PubMed          Journal:  Aging Dis        ISSN: 2152-5250            Impact factor:   9.968


Osteoporotic fractures (OF) are fractures that occur after minor trauma or during daily activities and are a serious consequence of osteoporosis [1]. The common fracture sites are the vertebrae, hip, distal radius, proximal humerus, and pelvis [2]. Osteoporosis causes more than nine million new fractures worldwide every year; it is estimated that an OF occurs every three seconds [3], and that one-third of women and one-fifth of men will suffer an OF in their lifetime [4]. OF can cause pain, severe disability, and death, and places burdens on families and society. It seriously impairs patients' quality of life [5]. Preventing OF requires early and accurate identification of individuals at risk, followed by timely and effective preventive interventions [6]. The bone mineral density (BMD) test is the gold standard for diagnosing osteoporosis and is often used to identify patients with osteoporosis or low BMD. Nevertheless, studies have shown that the BMD test alone does not reliably predict whether an individual will sustain a fracture [7]. In addition, the high cost, ionizing radiation, and low mobility of the BMD test limit its clinical application [8]. Many clinical guidelines therefore now recommend prediction models that integrate several risk factors to identify individuals at high risk of OF [9].

Numerous prediction tools for OF have been developed, including but not limited to the World Health Organization (WHO) Fracture Risk Assessment Tool (FRAX) algorithm [10], the QFracture algorithm [11], and the Garvan Fracture Risk Calculator (Garvan) [12]. Some of them have been recommended in clinical guidelines for treatment management [13,14] and are increasingly advocated by health policymakers. Although there are some systematic reviews of OF prediction models [15-17], they are outdated: the most recent literature search was performed in 2017 [16]. Further limitations include restriction to a few specific tools [17] or to a certain population such as women [15], and the lack of critical appraisal of the included models with standardized criteria [16,17]. Hence, an updated systematic review of prediction models for OF is needed. We conducted this systematic review and critical appraisal to summarize the characteristics of the development and validation of OF risk prediction models, assess their methodological and reporting quality, and provide up-to-date evidence for clinical implementation and future research.

METHODS

This systematic review is reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [18]. The protocol was registered in PROSPERO (registration number: CRD42020199196).

Search strategy

We systematically searched PubMed, Embase, and PsycINFO from inception to April 3, 2021. In addition, the reference lists of included studies were manually reviewed. The search strategy included the key concepts of i) osteoporotic fractures and osteoporosis and ii) risk prediction and related terms. The detailed search strategies are presented in Supplementary table 1.

Eligibility criteria

Cohort studies that developed or validated risk prediction models for OF in the general population were considered eligible. Studies were excluded if i) the prediction model consisted of only one predictor; ii) they targeted secondary OF or focused on specific patient groups for the treatment of OF or related conditions; iii) the performance of the model was not reported; or iv) they were reviews, conference abstracts, letters, or protocols. In addition, if the development article for a model was not available, the corresponding external validation articles were excluded.

Literature selection

Two reviewers (HW and JS) independently selected the studies, determined eligibility, and resolved discrepancies by consensus. When a disagreement could not be resolved, a third reviewer (LQ) was invited to make a consensus decision.

Data extraction

Two reviewers (XS and YC) independently extracted the data with a pre-developed data extraction form, which was developed by following the guidance of the critical appraisal and data extraction for systematic reviews of prediction modelling studies (CHARMS) checklist [19]. The following information was extracted from each included study: i) characteristics of the study (e.g., study design, data source); ii) data related to participants (e.g., country or region of participants, age, gender, events per variable (EPV)); iii) details about model development and validation (e.g., type of prediction model, predictors included in the model, modelling method) and model performance. When a study included multiple distinct models, for example, separate models for men and women or separate models for different outcomes (e.g., hip fracture, major osteoporotic fractures (MOF)), each model was extracted separately. When multiple versions (e.g., with different risk factors) of a model for the same population and outcome were included in a study, the model with the best performance was selected for data extraction. When an article validated multiple models, separate data extraction was performed for each model.

Model performance was assessed by discrimination and calibration. Discrimination is often quantified by the C index or area under the receiver operating characteristic curve (AUC). A C index or AUC less than 0.5 suggests no discrimination, 0.5 to 0.7 is poor, 0.7 to 0.8 is acceptable, 0.8 to 0.9 is excellent, and higher than 0.9 is outstanding [20]. Calibration can be visualized by a calibration plot and is usually quantified using the calibration intercept and the calibration slope, with a slope close to 1 and an intercept close to 0 indicating good calibration [21]. These indexes were extracted from the publications when available, as were sensitivity and specificity. Additionally, EPV was calculated to gauge the risk of model overfitting: an EPV less than 20 for model development, or less than 100 for model validation, was considered to indicate overfitting [22].
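The discrimination bands and EPV thresholds described above can be written down as a small helper. The Python sketch below is illustrative only. The function names are hypothetical, the treatment of values that fall exactly on a band boundary is an assumption (the text does not say which band a value of exactly 0.7 or 0.8 belongs to), and EPV is assumed to be the number of outcome events divided by the number of predictors, which is the usual definition but is not spelled out in the review.

```python
def classify_discrimination(auc: float) -> str:
    """Map a C index / AUC to the bands quoted from reference [20].

    Assumption: the lower bound of each band is inclusive, and only values
    strictly above 0.9 count as outstanding.
    """
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    if auc < 0.5:
        return "none"
    if auc < 0.7:
        return "poor"
    if auc < 0.8:
        return "acceptable"
    if auc <= 0.9:
        return "excellent"
    return "outstanding"


def events_per_variable(n_events: int, n_predictors: int) -> float:
    """EPV, assumed here to be outcome events divided by predictors."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return n_events / n_predictors


def indicates_overfitting(epv: float, stage: str) -> bool:
    """Apply the thresholds stated in the text: 20 (development), 100 (validation)."""
    thresholds = {"development": 20, "validation": 100}
    return epv < thresholds[stage]


# Hypothetical example: 120 fracture events and 10 predictors in a validation cohort.
epv = events_per_variable(120, 10)
print(classify_discrimination(0.82), epv, indicates_overfitting(epv, "validation"))
```

Keeping the thresholds in one dictionary makes it easy to rerun the classification under a different rule of thumb.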

Risk of bias and applicability assessment

The risk of bias and applicability of each included study were independently assessed by two reviewers (ZZ and XS) using the prediction model risk of bias assessment tool (PROBAST) [23,24]. Discrepancies were resolved by consensus between the two reviewers, and a third author (YG) was invited to adjudicate when needed. The risk of bias assessment covers four domains: participants, predictors, outcome, and analysis. Each domain was judged as low, high, or unclear risk of bias. The overall risk of bias was summarized according to the following rules: when all four domains were judged as “low” risk of bias, the overall risk of bias was “low”; otherwise, “high” or “unclear” risk of bias was graded accordingly [23,24]. The applicability assessment covers three domains: participants, predictors, and outcome. Its assessment rules and procedures are similar to those of the risk of bias assessment.
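The overall-judgement rule can be sketched in a few lines of Python. This is a minimal illustration, not the reviewers' actual procedure. The text only states explicitly that four “low” domains give an overall “low”. The ordering used here for the remaining cases (any “high” domain gives “high”, otherwise any “unclear” domain gives “unclear”) is an assumption based on the PROBAST guidance cited as [23,24]. The function and constant names are hypothetical.

```python
from typing import Mapping, Sequence

# Domain names as listed in the text.
ROB_DOMAINS = ("participants", "predictors", "outcome", "analysis")
APPLICABILITY_DOMAINS = ("participants", "predictors", "outcome")


def overall_judgement(ratings: Mapping[str, str],
                      domains: Sequence[str] = ROB_DOMAINS) -> str:
    """Combine per-domain ratings ("low", "high", "unclear") into one rating."""
    values = []
    for name in domains:
        rating = ratings[name].lower()
        if rating not in {"low", "high", "unclear"}:
            raise ValueError(f"unexpected rating for {name}: {rating}")
        values.append(rating)
    if "high" in values:        # assumed: one high-risk domain is enough
        return "high"
    if "unclear" in values:     # assumed: no high, but at least one unclear
        return "unclear"
    return "low"                # stated in the text: all domains low


# Hypothetical model whose analysis domain was judged high risk.
example = {"participants": "low", "predictors": "low",
           "outcome": "unclear", "analysis": "high"}
print(overall_judgement(example))
```

The same function can be reused for applicability by passing `APPLICABILITY_DOMAINS` as the second argument.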

Statistical Analysis

All results were narratively summarized and described without any quantitative synthesis due to variation in predictors and characteristics of participants among the included prediction models.

RESULTS

Study selection

The literature search identified 2852 records, of which 784 were removed as duplicates and 1882 were excluded based on title and abstract. A total of 186 full texts were assessed, of which 68 articles met the eligibility criteria and were included in this review (Fig. 1). In total, 38 articles described the development of one or more OF risk prediction models, and 44 articles described one or more external validations of OF risk prediction models. Because many articles combined development and external validation, these two numbers do not sum to 68.
Figure 1.

PRISMA flow diagram for literature search and selection.
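The counts reported for study selection can be reconciled with simple arithmetic. The Python sketch below is only a consistency check on the numbers quoted in the text. The function and key names are hypothetical, and the two derived figures (full texts excluded, and articles covering both development and external validation) are arithmetic consequences of the reported counts rather than values stated in the review.

```python
def reconcile_flow(identified: int, duplicates: int, excluded_screening: int,
                   included: int, development: int, validation: int) -> dict:
    """Derive the intermediate PRISMA counts from the reported totals."""
    full_texts = identified - duplicates - excluded_screening
    return {
        "full_texts_assessed": full_texts,
        "excluded_at_full_text": full_texts - included,
        # Articles counted in both groups explain why 38 + 44 exceeds 68.
        "both_development_and_validation": development + validation - included,
    }


flow = reconcile_flow(identified=2852, duplicates=784, excluded_screening=1882,
                      included=68, development=38, validation=44)
print(flow)
# {'full_texts_assessed': 186, 'excluded_at_full_text': 118, 'both_development_and_validation': 14}
```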

Studies focused on the development of OF prediction models

Populations and outcomes

Thirty-eight articles [10-12,25-59] described the development of 70 different models in total. Participants were mostly from the UK (n=17, 24%), China (n=17, 24%), or the US (n=12, 17%); the remainder (n=24, 34%) were from other countries in Oceania, Western Europe, or East Asia. No models were developed using data from Africa, South America, or the Middle East. The average age of participants ranged from 56.7 to 80.5 years. The follow-up duration ranged from 1 to 13 years, with 30 (43%) models having a follow-up of 10 years or more. The outcomes covered MOF (n=35, 50%), hip fracture (n=25, 36%), any fractures (n=6, 9%), and other fractures (n=4, 6%). Fractures were mostly ascertained through medical records (n=39, 56%) or self-report (n=18, 26%); 11 (16%) models relied on radiographic reports, and the remaining two (3%) used self-report confirmed by medical records (Table 1).
Table 1

Basic characteristics of included studies.

| First author, year | Model; No.^a | Study design | Data source; country or region of participants | Age (SD) (years) | Female (%) | Follow-up duration (SD) (years) | Outcome | Measurement of fracture | Incidence of fracture (%) | Sample size |
|---|---|---|---|---|---|---|---|---|---|---|
| Model development^b | | | | | | | | | | |
| Dargent-Molina 2002 [25] | NR; 1 | R | EPIDemiologie de l'OSteoporose study; France | 80.5(3.7) | 100 | 3.7(0.8) | Hip fracture | Self-reported | 4.0 | 6933 |
| Colón-Emeric 2002^l [26] | NR; 4 | R | Established population for epidemiologic studies of the elderly; US | M: 73.4(6.7); F: 74.5(6.6) | 65.0 | 3.0(NR) | Any fractures^c | Self-reported | Hip: 3.8; Any: 11.0 | 7654 |
| McGrother 2002 [27] | NR; 1 | P | A large general practice; UK | 77.9(6.1) | 100 | 3.0(NR) | Hip fracture | Medical records | 2.0 | 1289 |
| Albertsson 2007 [28] | FRAMO; 1 | R | Three rural primary health care; Sweden | 78.8(6.5) | 100 | 2.0(NR) | Hip fracture | Radiographic reports | 1.2 | 1248 |
| Robbins 2007 [29] | WHI; 1 | P | Women's Health Initiative 40 clinical centers; US | NR | 100 | 8.0(1.7) | Hip fracture | Self-reported and confirmed by medical records | 0.1 | 93676 |
| Nguyen 2008 [12] | Garvan; 4 | P | Dubbo osteoporosis epidemiology study; Australia | M: 70.0(6.0); F: 71.0(8.0) | 61.3 | M: 12.0(NR); F: 13.0(NR) | Any fractures | Radiographic reports | M: 17.4; F: 31.4 | M: 858; F: 1358 |
| Kanis 2008^l [10] | FRAX; 8 | P | Nine population-based cohort studies^d; UK | 65.0 | 68.0 | 10.0(NR) | MOF^e | Self-reported or confirmed by medical records | Hip: 1.8; MOF: 7.2 | 273826 |
| Hippisley-Cox 2009 [11] | QFracture; 4 | P | Version 20 of the QResearch; UK | NR | NR | 10.0(NR) | MOF | Medical records | Hip: 0.4(M), 1.2(F); MOF: 1.0(M), 3.1(F) | M: 1807996; F: 1825816 |
| Tanaka 2010^l [30] | FRISC; 2 | R | Three population-based cohort studies; Japan | 63.4(11.1) | 100 | 5.3(NR) | Any fractures^f | Medical records | 21.4 | 2187 |
| Yun 2010^k [31] | NR, FRAX; 4 | R | Medicare current beneficiary survey; UK | NR | NR | 2.0(NR) | MOF | Medical records | NR | 12337 |
| Sambrook 2011 [32] | NR; 1 | R | The global longitudinal Osteoporosis study^g; UK | NR | 100 | 2.0(NR) | Hip fracture | Self-reported | 4.5 | 19586 |
| Bow 2011 [33] | NR; 1 | P | Mr. and Ms. Os study; China | 68.0(10.3) | 0 | 3.5(2.9) | MOF | Self-reported and confirmed by medical records | 2.0 | 1810 |
| Henry 2011^k [34] | FRISK, FRAX, Garvan; 4 | P | Geelong osteoporosis study; Australia | NR | 100 | 9.6(NR) | MOF | Radiographic reports | 20.8 | 600 |
| Tamaki 2011^k [35] | NR, FRAX; 6 | R | Population-based cohort study; Japan | 56.7(9.6) | 100 | 10.0(NR) | MOF | Radiographic reports | MOF: 5.3; Hip: 0.5 | 815 |
| Hippisley-Cox 2012 [36] | Updated QFracture; 4 | P | Version 32 of the QResearch; UK | NR | NR | 10.0(NR) | MOF | Medical records | Hip: 0.3(M), 0.9(F); MOF: 0.9(M), 2.8(F) | 4726046 |
| LaFleur 2012 [37] | NR; 2 | P | Veterans health administration system; US | 66.9(10.3) | 0 | 2.8(NR) | MOF | Medical records | Hip: 0.3; MOF: 1.2 | 84763 |
| Schousboe 2014 [38] | NR; 1 | P | Study of osteoporotic Fractures; US | 75.0 | 100 | NR | Vertebral fractures | Radiographic reports | 20.4 | 5560 |
| Yu 2014^k [39] | FRAX+S, FRAX; 16 | P | Population-based cohort study; China | 72.5(5.2) | 50.0 | 10.2 | MOF | Medical records | Hip: 3.3; MOF: 14.1 | 4000 |
| Iki 2015^k [40] | FRAX+TBS, FRAX; 2 | P | Study of Fujiwara-kyo Osteoporosis Risk in male; Japan | 73.0(5.1) | 0 | 4.5(NR) | MOF | Radiographic reports | 1.2 | 1872 |
| Jang 2016 [41] | NR; 2 | P | Health and genome study; Korea | M: 61.3(7.1); F: 61.1(7.1) | 52.7 | 7.0(NR) | MOF | Self-reported | M: 9.9; F: 12.3 | M: 363; F: 405 |
| Kim 2016 [42] | KFRS; 2 | P | National Health Insurance Service; Korea | M: 59.8(7.9); F: 60.6(8.3) | 48.5 | 7.0(NR) | MOF | Medical records | M: 1.3; F: 4.3 | M: 370225; F: 348253 |
| Francesco 2017^l [43] | FRA-HS; 2 | P | IMS health longitudinal study; Italy | 60.1(12.8) | 55.0 | 10.0(NR) | MOF | Medical records | 5.9 | 490013 |
| Kruse 2017 [44] | NR; 2 | R | Health database; Denmark | NR | 86.1 | 5.0(NR) | Hip, femoral fractures | Medical records | 6.6(M/F) | M: 717; F: 4722 |
| Li 2017 [45] | NR; 1 | P | Global longitudinal study of osteoporosis in female 3-year cohort; Canada | 69.4(8.9) | 100 | 3.0(NR) | MOF | Self-reported | 4.0 | 3985 |
| Su 2017 [46] | NR; 2 | P | Mr. and Ms. Os study; China | M: 72.4(NR); F: 72.6(NR) | 50.3 | M: 9.9(2.8); F: 8.8(1.5) | MOF | Medical records | M: 6.6; F: 11.0 | M: 1923; F: 1950 |
| Weycker 2017 [47] | NR; 2 | R | Study of osteoporotic fractures; US | NR | 100 | 1.0(NR) | Any fractures^h | Self-reported | Hip: 2.2; Non vertebral: 6.6 | 2499 |
| Sundh 2017^k [48] | FRAX+MST, FRAX; 2 | P | Population-based cohort study; Sweden | NR | 100 | 10.0(NR) | MOF | Medical records | 16.3 | 412 |
| Biver 2018^k [50] | NR, FRAX; 2 | P | Geneva retirees cohort study; Switzerland | 65.0(1.4) | 100 | 5.0(1.8) | MOF | Self-reported | 19.1 | 740 |
| Reber 2018 [49] | NR; 1 | R | Social insurance for agriculture, forestry and horticulture; Germany | 75.4(6.3) | 48.8 | 2.0(NR) | MOF | Medical records | 2.6 | 298530 |
| Su 2018^k [52] | FRAX+Fall, FRAX; 4 | P | Mr. and Ms. Os study; China | M: 72.4(NR); F: 72.6(NR) | 50.0 | M: 9.9(2.8); F: 8.8(1.5) | MOF | Medical records | M: 7.0; F: 11.8 | M: 2000; F: 2000 |
| Rubin 2018 [51] | FREM; 2 | P | National registers data; Denmark | NR | 51.9 | 10.0(NR) | MOF | Medical records | M: 0.6; F: 1.4 | M: 12011143; F: 1294206 |
| Su 2019(1)^k [53] | NR, FRAX; 3 | P | Osteoporotic fractures in men; China | 73.6(5.9) | 0 | 8.6(2.5) | Hip fracture | Self-reported or confirmed by radiographic reports | 2.9 | 5977 |
| Engels 2020 [54] | NR; 1 | R | Administrative claims data; Germany | 75.7(6.20) | 48.8 | 4.0(NR) | Hip fracture | Medical records | 0.6 | 78074 |
| Kong 2020 [55] | NR; 1 | P | Health and genome Study; Korea | 61.2(8.7) | 56.4 | 7.5(1.6) | MOF | Self-reported or confirmed by radiographic reports | 25.6 | 2227 |
| Sheer 2020 [56] | NR; 1 | R | Humana research; US | 74.3(NR) | 56.0 | 1.0(NR) | MOF | Medical records or self-reported | 6.6 | 1287354 |
| Wu 2020 [57] | NR; 1 | P | Osteoporotic fractures in men Study; US | NR | 0 | NR | MOF | Radiographic reports | 8.8 | 5130 |
| Lu 2021^m [58] | gSOS, FRAX; 6 | R | Five population-based cohort studies^i; UK, US, Sweden, China | NR | 54.0 | NR | MOF | Medical records or radiographic reports | Hip: 2.5; MOF: 6.0 | 431621 |
| de Vries 2021 [59] | NR; 1 | R | Population-based cohort study; Netherlands | 68.0(NR) | 74.0 | 5.0(NR) | MOF | Medical records | 11.0 | 7578 |
| Model validation^n | | | | | | | | | | |
| Ensrud 2009 [60] | FRAX; 4 | P | Study of osteoporotic fractures; US | 71.3(5.1) | 100 | 9.2(1.8) | MOF | Self-reported and confirmed by radiographic reports | Hip: 6.2; MOF: 16.6 | 6252 |
| Hundrup 2010 [61] | WHI; 1 | P | Danish Nurse Cohort Study; Denmark | 61.0(6.9) | 100 | 5.0(NR) | Hip fracture | Medical records | 0.9 | 13353 |
| Leslie 2010 [62] | FRAX; 4 | R | Manitoba bone density program; Canada | M: 68.2(10.1); F: 65.7(9.8) | 92.7 | 10.0(NR) | MOF | Medical records | Hip: 1.4; MOF: 6.4 | 39603 |
| Sornay-Rendu 2010 [63] | FRAX; 2 | P | Os des femmes de Lyon cohort; France | 58.8(10.3) | 100 | 10.0(NR) | MOF | Self-reported and confirmed by radiographic reports | MOF: 13.4 | 867 |
| Trémollieres 2010 [64] | FRAX; 1 | P | Menopause et Os cohort study; US | 54.0(4.0) | 100 | 13.4(1.4) | MOF | Self-reported and confirmed by radiographic reports | 6.6 | 2196 |
| Bolland 2011 [65] | FRAX, Garvan; 6 | P | Population-based cohort study; New Zealand | 74.2(4.2) | 100 | 8.8(2.4) | Any fractures^j | Self-reported | Hip: 4.0; FRAX: 16.1; Garvan: 19.6 | 1422 |
| Langsetmo 2011 [66] | Garvan; 4 | P | Osteoporosis epidemiology study; Canada | M: 67.6(7.6); F: 67.7(7.6) | 72.1 | M: 8.3(NR); F: 8.6(NR) | MOF | Self-reported | Hip: NR(M/F); MOF: 7.2(M), 14.0(F) | M: 1606; F: 4152 |
| Pressman 2011 [67] | FRAX; 2 | R | Population-based cohort study; US | NR | 100 | 6.6(NR) | Hip fracture | Medical records | 1.7 | 94489 |
| Tanaka 2011 [68] | FRISC; 1 | R | Population-based cohort study; Japan | 63.3(10.8) | 100 | 10.0(NR) | MOF | Radiographic reports | 18.4 | 765 |
| Collins 2011 [69] | QFracture; 4 | P | Health improvement network database; UK | M: 47.0(NR); F: 48.0(NR) | 50.6 | 10.0(NR) | MOF | Medical records | MOF: 0.1(M), 0.3(F); Hip: 0.1(M/F) | M: 1108219; F: 1136417 |
| Fraser 2011 [70] | FRAX; 4 | P | Multi-centre osteoporosis study; Canada | M: 65.3(9.1); F: 65.8(8.8) | 40.2 | 10.0(NR) | MOF | Self-reported and confirmed by a doctor | MOF: 6.4(M), 12.0(F); Hip: 2.4(M), 2.7(F) | 6697 |
| Azagra 2012 [71] | FRAX; 4 | P | Fracture risk factors and bone densitometry type central dual X-ray cohort; Spain | 56.8(8.0) | 100 | 10.0(NR) | MOF | Self-reported and confirmed by medical records | MOF: 8.4; Hip: 2.2 | 770 |
| Cheung 2012 [72] | FRAX; 4 | P | Mr. and Ms. Os study; China | 62.1(8.5) | 100 | 4.5(2.8) | MOF | Self-reported and confirmed by medical records | MOF: 4.7; Hip: 0.9 | 2266 |
| González-Macías 2012 [73] | FRAX; 2 | P | Ecografía Osea en Atención Primaria cohort study; Italy | 72.3(5.3) | 100 | 3.0(NR) | MOF | Radiographic reports | Hip: 1.0; MOF: 3.8 | 5201 |
| Briot 2013 [74] | FRAX; 2 | P | Osteoporosis and ultrasound study; Germany | 74.2(NR) | 100 | 6.0(NR) | MOF | Self-reported and confirmed by radiographic reports | MOF: 4.9 | 1748 |
| Czerwiński 2013 [75] | FRAX; 1 | R | Cracow Medical Centre data; Poland | 63.8(6.7) | 100 | 11.0(NR) | MOF | Self-reported | 22.1 | 5092 |
| Cordomí 2013 [76] | FRAX; 1 | R | Centre for technical studies with radioactive isotopes; Spain | 56.8(7.8) | 100 | 11.0(NR) | MOF | Self-reported | 18.1 | 1231 |
| Ettinger 2013 [77] | FRAX; 4 | R | Osteoporotic fractures in men study; US | 73.5(5.8) | 0 | 8.4(2.3) | MOF | Medical records | Hip: 2.7; MOF: 6.4 | 5891 |
| Rubin 2013 [78] | FRAX; 1 | P | Population-based cohort study; Denmark | 64.0(13.0) | 100 | 3.0(NR) | MOF | Medical records | 4.0 | 3614 |
| Ahmed 2014 [79] | Garvan; 4 | R | Tromsø study; Australia | NR | 54.7 | M: 7.1(NR); F: 6.9(NR) | MOF | Medical records | M: 1.2; F: 3.2 | 2992 |
| Friis-Holmberg 2014 [80] | FRAX; 4 | P | Health examination survey; Denmark | M: 58.3(10.6); F: 56.8(10.2) | 59.2 | 4.3(NR) | MOF | Medical records | Hip: 0.4; MOF: 3.1 | 12758 |
| Van Geel 2014 [81] | FRAX, Garvan; 7 | P | Ten general practice centers cohort study; Netherlands | 67.8(5.8) | 100 | 5.0(NR) | MOF | Self-reported and confirmed by radiographic reports | Hip: 1.2; MOF: 9.5 | 506 |
| Klop 2016 [82] | FRAX; 2 | R | Clinical practice research Datalink cohort study; UK | 62.9(11.4) | 67.8 | 9.0(NR) | MOF | Medical records | Hip: 1.4; MOF: 5.0 | 38755 |
| Orwoll 2017 [83] | FRAX; 6 | R | Osteoporotic fractures in men study; Sweden, US, China | 75.0(3.0)^o; 74.0(6.0)^p; 72.0(5.0)^q | 0 | 10.6(NR)^o; 8.6(NR)^p; 9.8(NR)^q | MOF | Medical records or radiographic reports | Hip: 6.8^o, 3.2^p, 3.1^q; MOF: 16.4^o, 7.2^p, 3.1^q | 2542^o; 1469^p; 1476^q |
| Dagan 2017 [84] | QFracture, FRAX, Garvan; 6 | R | Electronic health record; Israel | NR | 54.6 | 4.7(NR) | MOF | Medical records | MOF: 7.7; Hip: 2.7 | 1054815 |
| Holloway 2018 [85] | FRAX; 2 | P | Geelong osteoporosis study; Australia | 70.0(NR) | 0 | 9.5(NR) | MOF | Radiographic reports | Hip: 2.4; MOF: 8.5 | 591 |
| Crandall 2019 [86] | FRAX, Garvan; 4 | P | Women’s Health Initiative observational study; US | 57.9(4.1) | 100 | 10.0(NR) | MOF | Medical records or self-reported | Hip: 0.7; MOF: 8.4 | Hip: 62723; MOF: 63621 |
| Holloway-Kew 2019 [87] | FRAX, Garvan; 8 | P | Geelong osteoporosis Study; Australia | M: 69.0(NR); F: 71.0(NR) | 49.6 | 10.0(NR) | MOF | Radiographic reports | M: 8.9; F: 14.2 | M: 821; F: 809 |
| Su 2019(2) [88] | FRAX+TBS, FRAX; 4 | P | Mr. and Ms. Os study; China | M: 72.3(4.9); F: 72.5(5.3) | 50.3 | M: 9.9(2.8); F: 8.8(1.5) | MOF | Medical records or self-reported | M: 6.6; F: 11.0 | M: 1923; F: 1950 |
| Tamaki 2019 [89] | FRAX+TBS, FRAX; 4 | P | Population-based cohort study; Japan | 58.1(10.6) | 100 | 10.0(NR) | MOF | Radiographic reports | 4.3 | 1541 |

F: female; FRA-HS: fracture health search; FRAMO: fracture and mortality index; FRAX: fracture risk assessment tool; FREM: fracture risk evaluation model; FRISC: fracture and immobilization score; FRISK: fracture risk; Garvan: Garvan Fracture Risk Calculator; gSOS: genomic speed of sound; KFRS: Korean fracture risk score; M: male; NR: not reported; MOF: major osteoporotic fracture; MST: mandibular sparse trabeculation; P: prospective cohort study; R: retrospective cohort study; S: sarcopenia; TBS: trabecular bone score; WHI: women's health initiative;

^a Naming of models or tools; No. refers to the number of models that were developed or the number of times models were externally validated in the article.

^b Development of new model.

^c Included hip, vertebrae (symptomatic), wrist, metacarpal, humerus, scapula, clavicle, distal femur, proximal tibia, patella, pelvis and sternum.

^d Included the Rotterdam Study, the European Vertebral Osteoporosis Study (later the European Prospective Osteoporosis Study), the Canadian Multicentre Osteoporosis Study (CaMos), Rochester, Sheffield, Dubbo, a cohort from Hiroshima and two cohorts from Gothenburg.

^e Included hip, wrist, vertebral, forearm or humerus fractures.

^f Included hip fracture, surgical neck fracture of the humerus, distal forearm fracture, or clinical vertebral fracture.

^g Included Australia, Belgium, Canada, France, Germany, Italy, the Netherlands, Spain, the United Kingdom, and the United States.

^h Included ankle, clavicle, elbow, face, foot, finger, hand, heel, hip, humerus, knee, lower leg, pelvis, rib, toe, upper leg, or wrist fractures.

^i Included the UK Biobank, the United States-based Osteoporotic Fractures in Men Study, the Sweden-based Osteoporotic Fractures in Men Study, the Study of Osteoporotic Fractures, and the China Kadoorie Biobank.

^j FRAX-defined osteoporotic fractures were fractures of the shoulder, hip, or forearm and clinical vertebral fractures; Garvan-defined osteoporotic fractures were fractures of the hip, vertebrae (symptomatic), forearm, metacarpal, humerus, scapula, clavicle, distal femur, proximal tibia, patella, pelvis, or sternum.

^k The study not only developed new models, but also externally validated existing models.

^l The study developed and externally validated new models.

^m The study not only developed and externally validated new models, but also externally validated existing models.

^n External validation of existing model.

^o Sweden.

^p US.

^q China.


Sample size

The sample size of included models ranged from 405 to 12,011,134, and the incidence of fracture ranged from 0.1% to 31.4%. The EPV ranged from 0.1 to 6,613.3. Of the 70 models, 30 (43%) had an EPV less than 20, indicating a risk of overfitting (Table 1 and Table 2).
Table 2

Information related to predictive model of included studies.

| Author | Type of predictive model | EPV | No. of included predictors | Modeling method | Type of validation | Performance^a: AUC/C index (95% CI, if reported) | Performance^a: Sensitivity | Performance^a: Specificity | Performance^a: Calibration |
|---|---|---|---|---|---|---|---|---|---|
| WHI (women's health initiative) | | | | | | | | | |
| Robbins 2007 [29] | Development and internal validation^b | 27.3 | 10 | Cox’s proportional hazards | Cross validation | 0.80(0.77 to 0.82) | NR | NR | P=0.20^h |
| Hundrup 2010 [61] | External validation^c | 12.2 | 10 | Logistic regression | Geographical validation | 0.82 | 0.69 | 0.80 | 1.08^i |
| FRAMO (fracture and mortality index) | | | | | | | | | |
| Albertsson 2007 [28] | Development only | 1.4 | 4 | Cox’s proportional hazards | NA | 0.72(0.64 to 0.81) | 0.81 | 0.64 | NR |
| Garvan (Garvan Fracture Risk Calculator) | | | | | | | | | |
| Nguyen 2008 [12] | Development and internal validation | M: 11.5; F: 32.8 | 4 | Cox’s proportional hazards | Bootstrapping | Model 1: 0.75(M/F); Model 2: 0.74(M), 0.72(F) | NR | NR | 0.01 to 0.02^j |
| Bolland 2011 [65] | External validation | Hip: 11.4; MOF: 55.8 | 5 | NR | Geographical validation | Hip: 0.67(0.60 to 0.75)^k; MOF: 0.64(0.60 to 0.67)^k | NR | NR | P<0.01^h |
| Henry 2011 [34] | External validation | 25.0 | 5 | NR | Geographical validation | 0.70(0.65 to 0.75) | NR | NR | NR |
| Langsetmo 2011 [66] | External validation | Hip: NR(M/F); MOF: 29.0(M), 145.8(F) | 4 | Cox’s proportional hazards | Geographical validation | Hip: 0.85(M), 0.80(F); MOF: 0.69(M), 0.70(F) | NR | NR | NR |
| Van Geel 2014 [81] | External validation | Hip: 1.5; MOF: 12.0 | 5 | NR | Geographical validation | Model 1: 0.70(hip), 0.70(MOF); Model 2: NR(hip), 0.65(MOF) | NR | NR | NR |
| Ahmed 2014 [79] | External validation | 71.2 | 5 | NR | Geographical validation | Model 1: 0.61(M), 0.62(F); Model 2: 0.57(M), 0.58(F) | NR | NR | NR |
| Dagan 2017 [84] | External validation | Hip: 5618.2; MOF: 16312.8 | 5 | NR | Geographical validation | Hip: 0.78^k; MOF: NR^k | Hip: 0.57; MOF: NR | Hip: 0.81; MOF: NR | 0.68^i |
| Crandall 2019 [86] | External validation | Hip: 87.8; MOF: 1068.8 | 4 | Logistic regression | Geographical validation | Hip: 0.57(0.55 to 0.60); MOF: 0.57(0.57 to 0.58) | Hip: 0.81; MOF: 0.16 | Hip: 0.31; MOF: 0.94 | NR |
| Holloway-Kew 2019 [87] | External validation | M: 3.4; F: 8.4 | 5 | Logistic regression | Geographical validation | Model 1: 0.68(0.63 to 0.73)(M), 0.70(0.65 to 0.74)(F); Model 2: 0.67(0.62 to 0.72)(M), 0.67(0.62 to 0.71)(F) | NR | NR | NR |
| FRAX (fracture risk assessment tool) | | | | | | | | | |
| Kanis 2008 [10] | Development and external validation^d | Hip: 77.3; MOF: 301.6 | 11 | Poisson regression | Geographical validation | Hip: 0.66^n, 0.74^o; MOF: 0.60^n, 0.62^o | NR | NR | NR |
| Ensrud 2009 [60] | External validation | Hip: 35.4; MOF: 94.3 | 11 | Logistic regression | Geographical validation | Hip: 0.71^n, 0.75^o; MOF: 0.61^n, 0.68^o | NR | NR | NR |
| Leslie 2010 [62] | External validation | Hip: 49.9; MOF: 231.2 | 11 | Cox’s proportional hazards | Geographical validation | Hip: 0.79(0.78 to 0.81)^n, 0.83(0.82 to 0.85)^o; MOF: 0.66(0.65 to 0.67)^n, 0.69(0.68 to 0.71)^o | NR | NR | Hip: 0.92(M), 1.03(F)^i; MOF: 1.24(M), 1.13(F)^i |
| Sornay-Rendu 2010 [63] | External validation | MOF: 1.5 | 11 | NR | Geographical validation | 0.75(0.71 to 0.79)^n, 0.78(0.72 to 0.82)^o | NR | NR | NR |
| Trémollieres 2010 [64] | External validation | 13.2 | 11 | Cox’s proportional hazards | Geographical validation | 0.63(0.56 to 0.69)^o | NR | NR | NR |
| Yun 2010 [31] | External validation | Hip: 17.0; MOF: 39.1 | 11 | Logistic regression | Geographical validation | Hip: 0.64(0.60 to 0.68)^o; MOF: 0.55(0.53 to 0.58)^o | NR | NR | NR |
| Bolland 2011 [65] | External validation | Hip: 5.2; MOF: 20.8 | 11 | NR | Geographical validation | Hip: 0.69(0.63 to 0.76)^n, 0.70(0.64 to 0.77)^o; MOF: 0.62(0.58 to 0.66)^n, 0.64(0.60 to 0.68)^o | NR | NR | Hip: P=0.18^h,n, P<0.01^h,o; MOF: P<0.01^h |
| Pressman 2011 [67] | External validation | 143.5 | 11 | Logistic regression | Geographical validation | 0.83(0.82 to 0.84)^n, 0.84(0.83 to 0.85)^o | NR | NR | NR |
| Henry 2011 [34] | External validation | 11.4 | 11 | NR | Geographical validation | 0.66(0.61 to 0.71)^n, 0.68(0.63 to 0.73)^o | NR | NR | NR |
| Tamaki 2011 [35] | External validation | Hip: 3.9; MOF: 0.4 | 11 | Logistic regression | Geographical validation | Hip: 0.86(0.68 to 1.00)^n, 0.88(0.73 to 1.00)^o; MOF: 0.67(0.59 to 0.75)^n, 0.69(0.61 to 0.76)^o | NR | NR | NR |
| Fraser 2011 [70] | External validation | Hip: 15.9; MOF: 63.2 | 11 | Cox’s proportional hazards | Geographical validation | Hip: 0.77(0.73 to 0.80)^n, 0.80(0.77 to 0.83)^o; MOF: 0.66(0.63 to 0.68)^n, 0.69(0.67 to 0.7)^o | NR | NR | Hip: 1.83(M), 0.93(F)^i; MOF: 1.26(M), 1.07(F)^i |
| Azagra 2012 [71] | External validation | Hip: 1.5; MOF: 5.9 | 11 | NR | Geographical validation | Hip: 0.89^n, 0.85^o; MOF: 0.69^n, 0.72^o | NR | NR | P>0.05^h |
| Cheung 2012 [72] | External validation | Hip: 1.9; MOF: 9.6 | 11 | Cox’s proportional hazards | Geographical validation | Hip: 0.90(0.83 to 0.97)^n, 0.88(0.82 to 0.94)^o; MOF: 0.71(0.66 to 0.76)^n, 0.73(0.68 to 0.80)^o | NR | NR | NR |
| González-Macías 2012 [73] | External validation | Hip: 5.0; MOF: 18.3 | 11 | NR | Geographical validation | Hip: 0.64^o; MOF: 0.62^o | NR | NR | NR |
| Briot 2013 [74] | External validation | 7.7 | 11 | Logistic regression | Geographical validation | 0.62(0.56 to 0.68)^n, 0.66(0.60 to 0.73)^o | NR | NR | NR |
| Czerwiński 2013 [75] | External validation | 29.5 | 11 | NR | Geographical validation | 0.59(0.54 to 0.64)^o | NR | NR | NR |
| Cordomí 2013 [76] | External validation | MOF: 20.2 | 11 | NR | Geographical validation | 0.61(0.57 to 0.65)^o | NR | NR | NR |
| Ettinger 2013 [77] | External validation | Hip: 14.6; MOF: 34.0 | 11 | Logistic regression | Geographical validation | Hip: 0.69^n, 0.77^o; MOF: 0.63^n, 0.67^o | NR | NR | NR |
| Rubin 2013 [78] | External validation | 15.6 | 10^o | Cox’s proportional hazards | Geographical validation | 0.72(0.69 to 0.76)^n | NR | NR | NR |
| Friis-Holmberg 2014 [80] | External validation | Hip: 4.9; MOF: 35.9 | 11 | Cox’s proportional hazards | Geographical validation | MOF: 0.67(0.61 to 0.73)^o(M), 0.72(0.69 to 0.75)^o(F); Hip: 0.72(0.60 to 0.84)^o(M), 0.86(0.81 to 0.92)^o(F) | NR | NR | NR |
| Van Geel 2014 [81] | External validation | Hip: 0.5; MOF: 4.4 | 11 | NR | Geographical validation | Hip: 0.70^o; MOF: 0.65^n, 0.69^o | NR | NR | NR |
| Yu 2014 [39] | External validation | Hip: 12.0; MOF: 51.3 | 11 | Cox’s proportional hazards | Geographical validation | Hip: 0.70^n(M), 0.76^o(M), 0.73^n(F), 0.76^o(F); MOF: 0.61^n(M), 0.64^o(M), 0.60^n(F), 0.62^o(F) | NR | NR | NR |
| Iki 2015 [40] | External validation | 2.8 | 11 | Logistic regression | Geographical validation | 0.68(0.59 to 0.78)^o | NR | NR | NR |
| Klop 2016 [82] | External validation | Hip: 48.7; MOF: 175.0 | 10^o | Logistic regression | Geographical validation | Hip: 0.83^n; MOF: 0.71^n | NR | NR | 1.02^i |
| Orwoll 2017 [83] | External validation | NR | 11 | Logistic regression | Geographical validation | Hip: 0.72^p, 0.78^q, 0.74^r; MOF: 0.65^p, 0.65^q, 0.69^r | NR | NR | NR |
| Sundh 2017 [48] | External validation | 7.1 | 10^o | NR | Geographical validation | 0.75(0.70 to 0.81)^n | NR | NR | NR |
| Dagan 2017 [84] | External validation | Hip: 2553.7; MOF: 7414.9 | 11 | NR | Geographical validation | Hip: 0.82^o; MOF: 0.71^o | Hip: 0.66; MOF: 0.47 | Hip: 0.81; MOF: 0.82 | 0.94^i |
| Biver 2018 [50] | External validation | 12.8 | 11 | Cox’s proportional hazards | Geographical validation | 0.71^o | NR | NR | NR |
| Su 2018 [52] | External validation | M: 12.6; F: 21.5 | 11 | Cox’s proportional hazards | Geographical validation | M: 0.69(0.64 to 0.73)^o; F: 0.61(0.58 to 0.65)^o | NR | NR | NR |
| Holloway 2018 [85] | External validation | Hip: 1.3; MOF: 4.5 | 11 | NR | Geographical validation | Hip: 0.74^o; MOF: 0.85^o | Hip: 0.57; MOF: 0.02 | Hip: 0.84; MOF: 0.99 | NR |
| Crandall 2019 [86] | External validation | Hip: 39.9; MOF: 485.8 | 10^o | Logistic regression | Geographical validation | Hip: 0.64(0.61 to 0.66)^n; MOF: 0.58(0.57 to 0.59)^n | Hip: 0.81; MOF: 0.59 | Hip: 0.81; MOF: 0.68 | NR |
| Holloway-Kew 2019 [87] | External validation | M: 7.3; F: 11.5 | 10^o | Logistic regression | Geographical validation | M: 0.70(0.65 to 0.76)^n, 0.72(0.67 to 0.78)^o; F: 0.74(0.69 to 0.78)^n, 0.75(0.71 to 0.80)^o | NR | NR | NR |
| Su 2019(1) [53] | External validation | 17.3 | 10^o | Cox’s proportional hazards | Geographical validation | 0.70(0.67 to 0.74)^n | 0.62 | 0.78 | NR |
| Su 2019(2) [88] | External validation | M: 11.5; F: 19.5 | 11 | Cox’s proportional hazards | Geographical validation | M: 0.68(0.63 to 0.73)^o; F: 0.63(0.59 to 0.67)^o | NR | NR | NR |
| Tamaki 2019 [89] | External validation | 6.1 | 11 | Logistic regression | Geographical validation | 0.67(0.61 to 0.73)^n, 0.68(0.62 to 0.74)^o | NR | NR | NR |
| Lu 2021 [58] | External validation | Hip: 776.0; MOF: 1862.4 | 11 | Cox’s proportional hazards | Geographical validation | MOF: 0.76(0.75 to 0.76)^o; Hip: 0.81(0.80 to 0.81)^o | NR | NR | NR |
| FRAX+S (fracture risk assessment tool and sarcopenia) | | | | | | | | | |
| Yu 2014 [39] | Development only^e | Hip: 11.0; MOF: 47.1 | 12 | Cox’s proportional hazards | NA | Hip: 0.73^n, 0.78^o(M), 0.73^n, 0.75^o(F); MOF: 0.62^n, 0.66^o(M), 0.60^n, 0.62^o(F) | NR | NR | NR |
| FRAX+TBS (fracture risk assessment tool and trabecular bone score) | | | | | | | | | |
| Iki 2015 [40] | Development only | 0.1 | 12 | Logistic regression | NA | 0.68(0.57 to 0.80)^o | NR | NR | NR |
| Su 2019(2) [88] | External validation | M: 10.6; F: 17.8 | 12 | Cox’s proportional hazards | Geographical validation | M: 0.69(0.65 to 0.74)^o; F: 0.63(0.59 to 0.67)^o | NR | NR | NR |
| Tamaki 2019 [89] | External validation | 5.6 | 12 | Logistic regression | Geographical validation | 0.68(0.62 to 0.74)^n, 0.68(0.62 to 0.74)^o | NR | NR | NR |
| FRAX+MST (fracture risk assessment tool and mandibular sparse trabeculation) | | | | | | | | | |
Sundh 2017[48]Development only5.911oNRNA0.75(0.70 to 0.81)nNRNRNR
FRAX+FALL (fracture risk assessment tool and history of falls)
Su 2018[52]Development onlyM: 11.6F: 19.712Cox’s proportional hazardsNAM: 0.69(0.65 to 0.74)oF: 0.61(0.58 to 0.65)oNRNRNR
QFracture
Hippisley- Cox 2009[11]Development and internal validationHip: 161.4(M)489.6(F)MOF: 417.6(M)1281.6(F)M: 12F: 17Cox’s proportional hazardsTraining test splitHip: 0.86(0.85 to 0.86)(M)0.89(0.89 to 0.89)(F)MOF: 0.69(0.68 to 0.69)(M)0.79(0.79 to 0.79)(F)NRNR0.99i
Collins 2011[69]External validationHip: 274.8(M)833.2(F)MOF: 559.4(M)1732.3(F)M: 12F: 17NRGeographical validationHip: 0.86(M), 0.89(F)MOF: 0.74(M), 0.82(F)NRNRHip: 0.01(M) 0.01(F)jMOF: 0.01(M)0.03(F)j
Updated QFracture
Hippisley- Cox 2012[36]Development and internal validationHip: 166.6(M)479.5(F)MOF: 461.2(M)1467.0(F)M: 26F: 25Cox’s proportional hazardsTraining test splitHip: 0.88(0.87 to 0.88 )(M)0.89(0.89 to 0.90) (F)MOF: 0.71(0.70 to 0.72) (M)0.79(0.79 to 0.79) (F)Hip: 0.64(M)0.60(F)MOF: 0.37(M)0.35(F)NR P>0.05h
Dagan 2017[84]External validationHip: 906.2MOF: 2631.131NRGeographical validationHip: 0.88MOF: 0.71Hip: 0.70 MOF: 0.46Hip: 0.81MOF: 0.820.60i
FRISC (fracture and immobilization score)
Tanaka 2010[30]Development and external validation23.95Poisson regressionGeographical validation0.73(0.66 to 0.79)NRNR P=0.17h
Tanaka 2011[68]External validation28.25Cox’s proportional hazardsGeographical validation0.73(0.69 to 0.78)NRNRNR
FRISK (fracture risk)
Henry 2011[34]Development only25.05NRNA0.66(0.60 to 0.71)59.20.65NR
KFRS (Korean fracture risk score)
Kim 2016[42]Development and internal validationM: 543.2F: 1661.29Cox’s proportional hazardsTraining test splitM: 0.68, F: 0.65NRNR1.00i
FRA-HS (fracture health search)
Francesco 2017[43]Development and external validation6613.39Cox’s proportional hazardsGeographical validation0.85NRNR1.00(0.83 to 1.18)i
FREM (fracture risk evaluation model)
Rubin 2018[51]Development and internal validationM: 2.3F: 5.7M: 44F: 39Logistic regressionTraining test splitM: 0.75(0.74 to 0.76)F: 0.75(0.74 to 0.80)NRNR0.01j
GSOS (genomic speed of sound)
Lu 2021[58]Development, internal and external validationfMOF: <0.1Hip: <0.121717Cox’s proportional hazardsTraining test split, geographical validationMOF: 0.73(0.73 to 0.74)Hip: 0.80(0.79 to 0.81)NRNRNR
Models without a specific name
Dargent- Molina 2002[25]Development onlyNR5Cox’s proportional hazardsNANR0.370.85NR
Colón- Emeric 2002[26]Development and external validationHip: 11.7Any: 33.7Hip: 7Any: 6Logistic regressionGeographical validationgHip: 0.75Any: 0.57NRNRNR
McGrother 2002[27]Development and internal validation1.46Cox’s proportional hazardsCross validation0.820.67(0.54 to 0.80)0.68(0.65 to 0.72)NR
Yun 2010[31]Development onlyNRNRLogistic regressionNAHip: 0.74(0.70 to 0.77)MOF: 0.71(0.69 to 0.73)NRNRNR
Sambrook 2011[32]Development onlyNR2Cox’s proportional hazardsNA0.78NRNRNR
Bow 2011[33]Development only1.17Cox’s proportional hazardsNA0.82NRNRNR
Tamaki 2011[35]Development onlyHip: 0.4MOF: 3.93Logistic regressionNAHip: 0.90(0.77 to 1.00)MOF: 0.71(0.63 to 0.79)NRNRNR
LaFleur 2012[37]Development and internal validationNRHip: 10MOF: 12Cox’s proportional hazardsBootstrappingHip: 0.81MOF: 0.740.840.75NR
Schousboe 2014[38]Development and internal validation172.17Logistic regressionBootstrapping0.69NRNR P>0.05h
Jang 2016[41]Development onlyM: 4.0F: 5.6M: 5F: 7Logistic regressionNAM: 0.74, F: 0.73NRNR P>0.05h
Kruse 2017[44]Development and internal validationM: <0.1F: 0.2M: 9F: 11Machine learningBootstrappingM: 0.89(0.82 to 0.95)F: 0.91(0.88 to 0.93)M: 0.69F: 0.88M: 0.69F: 0.81NR
Li 2017[45]Development only11.55Cox’s proportional hazardsNA0.71NRNRNR
Su 2017[46]Development onlyM: 21.0F: 35.82Poisson regressionNAM: 0.67(0.62 to 0.71)F: 0.58(0.55 to 0.62)M: 0.64F: 0.69M: 0.74F: 0.42NR
Weycker 2017[47]Development onlyNRHip: 5Non vertebral: 7Cox’s proportional hazardsNAHip: 0.71(0.67 to 0.76)Non vertebral: 0.62(0.59 to 0.65)NRNR P=0.41h
Biver 2018[50]Development only8.312Cox’s proportional hazardsNA0.76NRNRNR
Reber 2018[49]Development and internal validation436.93Cox’s proportional hazardsTraining test split0.70(0.69 to 0.71)NRNRNR
Su 2019(1)[53]Development and internal validationModel 1: 57.3Model 2: 13.2Model 1: 3Model 2: 13Machine learningCross validationModel 1: 0.71(0.68 to 0.75)Model 2: 0.73(0.69 to 0.76)NRNRNR
Engels 2020[54]Development and internal validation80.623Machine learningTraining test split0.70(0.68 to 0.71)NRNR0.03j
Kong 2020[55]Development and internal validation19.921Machine LearningCross validation0.69NRNRNR
Sheer 2020[56]Development and internal validation1896.56Cox’s proportional hazardsTraining test split0.71NRNRNR
Wu 2020[57]Development and internal validation0.41115Machine learningCross validation0.71NRNRNR
de Vries 2021[59]Development and internal validation18.38Cox’s proportional hazardsCross validation0.70(0.66 to 0.73)NRNRNR

AUC: area under receiver operating characteristic curve; EPV: events per variable; M: male; MOF: major osteoporotic fracture; NA: not applicable; NR: not reported;

Performance is given for the strongest form of validation reported;

Development and internal validation refers to the study developed and internally validated the new model;

External validation refers to the study only externally validated the existing model;

Development and external validation refers to the study developed and externally validated the new model;

Development only refers to the study only developed the new model;

Development, internal and external validation refers to the study developed, internally and externally validated the new model;

External validation in different population only;

P value refers to the result of the Hosmer-Lemeshow test;

Refers to value of calibration slope;

Refers to value of calibration intercept;

The type of model used is not reported;

Without bone mineral density;

With bone mineral density;

Sweden;

US;

China;

Predictors

The number of predictors included in the developed models ranged from 2 to 21,717 (two models did not report this information). Most models contained fewer than 15 predictors (n=55, 79%), while three (4%) included more than 100 predictors (Table 2). Many models (n=31, 44%) shared a core set of predictors, including age, prior fractures, and body mass index (BMI). Other commonly selected predictors were smoking status (n=35, 50%), BMD (n=31, 44%), alcohol use (n=30, 43%), and rheumatoid arthritis (n=28, 40%). Sex was included as a predictor in 25 (36%) models. However, most models were sex-specific, with 23 (33%) developed for males only and 31 (44%) for females only. All three models with more than 100 predictors included single nucleotide polymorphisms (SNPs) as predictors (Supplementary table 2).

Modelling

Most prediction models (n=42, 60%) were developed using Cox proportional hazards regression, followed by logistic regression (n=12, 17%), machine learning (n=7, 10%), and Poisson regression (n=7, 10%); the remaining two (3%) did not report the modelling method.

Performance

Sixty-nine (99%) models reported discrimination, with the AUC or C index ranging from 0.60 to 0.91. Specifically, two (3%) models showed outstanding discrimination, nine (13%) showed excellent discrimination, 39 (57%) showed acceptable discrimination, and 20 (29%) showed poor discrimination. Calibration was reported for 25 (36%) models, all of which were judged to show good fit. Calibration was assessed using the Hosmer-Lemeshow test (n=11, 16%), the calibration slope (n=7, 10%), and the calibration intercept (n=7, 10%). Thirty-three (47%) models were internally validated, using a training-test split (n=17), bootstrapping (n=9), or cross validation (n=7). Notably, only four (6%) models used suitable methods for both internal validation (bootstrapping or cross validation) and calibration assessment (calibration slope or calibration intercept) (Table 2).
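As an illustration of how the discrimination figures above can be read, the following minimal Python sketch computes a C statistic (AUC) by pairwise comparison and maps it to the four labels used in this review. The cut-offs (0.9, 0.8, 0.7) are an assumption based on a commonly cited rule of thumb and are not stated in this section; the function names and the example data are hypothetical.

```python
from typing import Sequence


def auc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """C statistic: share of (event, non-event) pairs ranked correctly.

    Tied pairs count as half a concordant pair.
    """
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def discrimination_label(auc_value: float) -> str:
    """Map an AUC to a label. Cut-offs are assumed, not taken from the review."""
    if auc_value >= 0.9:
        return "outstanding"
    if auc_value >= 0.8:
        return "excellent"
    if auc_value >= 0.7:
        return "acceptable"
    return "poor"


# Hypothetical predicted risks and observed outcomes (1 = fracture).
risks = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60]
fractures = [0, 0, 1, 0, 1, 1]
value = auc(risks, fractures)
print(round(value, 3), discrimination_label(value))
```

The pairwise loop is quadratic in the sample size, which is acceptable for a small illustration; a rank-based calculation would be preferable for cohorts of the size included in this review.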

Model presentation

Only 39 (56%) models were presented in a form that allows practical use (a web calculator, a nomogram, or a risk score for each predictor), while the remaining 31 (44%) models offered no such presentation.

Risk of bias and applicability

All 70 models were judged to be at high overall risk of bias. In the outcome domain, 31 (44%) models had an unclear risk of bias and 10 (14%) had a high risk, mainly because it was unclear whether a prespecified or standard outcome definition had been used, or because subjective outcome measures (e.g., self-reported fractures) had been used. All models (n=70, 100%) were at high risk of bias in the analysis domain, commonly because of the risk of overfitting caused by an insufficient number of cases, or because continuous predictors were categorized. In addition, the calibration of many models was either not assessed or assessed incorrectly (e.g., using the Hosmer-Lemeshow test). In terms of applicability, 44 (63%) models raised low concern while the remaining 26 (37%) raised high concern. The most common applicability concern arose in the outcome domain: models that focus on predicting hip fracture may not accurately predict all osteoporotic fractures. Details of the risk of bias and applicability assessments are presented in Figure 2 and Supplementary table 3.
Figure 2.

Summary results on risk of bias and applicability assessment (using PROBAST) of development of osteoporotic fracture prediction models.
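The risk of bias assessment above treats the calibration slope and calibration intercept as suitable calibration measures. As a minimal sketch of what these quantities are, the Python code below regresses the observed outcome on the logit of the predicted risk with a hand-written Newton-Raphson logistic fit. It also estimates calibration-in-the-large as the intercept when the slope is fixed at 1. The function names and the data are hypothetical, and the included studies may have computed these measures differently.

```python
import math
from typing import Sequence, Tuple


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def calibration_slope(pred: Sequence[float], y: Sequence[int],
                      iters: int = 100) -> Tuple[float, float]:
    """Fit logit(P(y=1)) = a + b * logit(pred); return (a, b)."""
    lp = [logit(p) for p in pred]
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, yi in zip(lp, y):
            mu = expit(a + b * x)
            w = mu * (1.0 - mu)
            g0 += yi - mu            # score for the intercept
            g1 += (yi - mu) * x      # score for the slope
            h00 += w                 # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


def calibration_in_the_large(pred: Sequence[float], y: Sequence[int],
                             iters: int = 100) -> float:
    """Intercept of the same model with the slope fixed at 1 (offset)."""
    lp = [logit(p) for p in pred]
    a = 0.0
    for _ in range(iters):
        mus = [expit(a + x) for x in lp]
        grad = sum(yi - mu for yi, mu in zip(y, mus))
        hess = sum(mu * (1.0 - mu) for mu in mus)
        step = grad / hess
        a += step
        if abs(step) < 1e-12:
            break
    return a


# Hypothetical validation data: three risk groups of ten people each.
# Predicted risks 0.1, 0.5, 0.9; observed fracture rates 0.2, 0.5, 0.8.
pred = [0.1] * 10 + [0.5] * 10 + [0.9] * 10
obs = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2
intercept, slope = calibration_slope(pred, obs)
citl = calibration_in_the_large(pred, obs)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, in-the-large={citl:.3f}")
```

In this constructed example the average predicted risk matches the observed rate, so calibration-in-the-large is close to zero, while the slope is below 1 because the predictions are too extreme, the pattern expected from an overfitted model. A single summary test can miss this kind of miscalibration.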

Studies focusing on external validation of OF prediction models

In 44 articles [10,26,30,31,34,35,39,40,43,48,50,52,53,58,60-89], 138 external validations were performed. However, most (n=48, 69%) of the 70 developed models had never been externally validated. Of the 22 (31%) models that were externally validated, 15 (21%) were validated once, and five (7%) were validated five or more times (range: 5 to 37). The most commonly validated models were FRAX with BMD (for MOF) (n=37, 27%) and FRAX with BMD (for hip fracture) (n=23, 17%) (Table 1 and Table 2).

Study populations and outcomes

All the external validations were conducted in a different geographical area from the development study. Most of the participants were from China (n=29, 21%), the US (n=27, 20%), or the UK (n=16, 12%), with the remainder (n=66, 48%) from countries in Oceania, Western Europe, or East Asia. Notably, no external validation was conducted among participants from Africa, South America, or the Middle East. Most models (n=109, 79%) were sex-specific, with 76 (55%) validated for females and 33 (24%) for males. The average age of participants ranged from 54 to 75 years. The outcomes included MOF (n=84, 61%), hip fracture (n=50, 36%), and any fracture (n=4, 3%). Fractures were most often ascertained through medical records (n=57, 41%), followed by self-report (n=28, 20%), self-report with another confirmation method (n=28, 20%), and radiograph reports (n=25, 18%) (Table 1).

The sample size ranged from 412 to 1,136,417, and the incidence of fracture ranged from 0.1% to 22.1%. The EPV ranged from 0.1 to 16,312.8; in 114 (83%) models it was below 100, indicating a risk of overfitting (Table 1 and Table 2).

The discrimination of 136 (99%) models was reported as an AUC or C index (range: 0.55 to 0.90). Among them, one (1%) showed outstanding discrimination, 23 (15%) showed excellent discrimination, 45 (38%) showed acceptable discrimination, and 67 (38%) showed poor discrimination. Calibration measurements were reported for 33 (24%) models, with 31 (22%) models showing good fit. Calibration was assessed with the calibration slope (n=18, 13%), the Hosmer-Lemeshow test (n=11, 8%), and the calibration intercept (n=4, 3%). Only 22 (16%) models used suitable methods (calibration slope or calibration intercept) for calibration assessment (Table 2).
The discrimination of the four most frequently validated models, FRAX with BMD (for MOF), FRAX with BMD (for hip fracture), FRAX without BMD (for MOF), and FRAX without BMD (for hip fracture), varied among studies, with the AUC/C index ranging from 0.55 to 0.85, 0.64 to 0.88, 0.58 to 0.75, and 0.64 to 0.90, respectively. Other commonly validated models, including Garvan Model 1 and Garvan Model 2 in females, showed AUC/C indexes of 0.57 to 0.80 and 0.58 to 0.78, respectively. Several FRAX extension models combined the FRAX predictors with additional predictors, such as FRAX plus sarcopenia [39], FRAX plus history of falls [52], and FRAX plus trabecular bone score (TBS) [40,88,89]. The performance of the extension models (AUC/C index: 0.60 to 0.78) was slightly better than that of FRAX alone (AUC/C index: 0.60 to 0.74). However, most of them had not yet been externally validated. The AUC/C indexes of the models developed with machine learning methods were between 0.69 and 0.91, indicating relatively good discrimination. Some models included only two or three predictors, such as Sambrook 2011 (age, prior fractures) [32], Su 2017 (TBS, femoral neck BMD) [46], and Tamaki 2011 (age, weight, femoral neck BMD) [35], with AUC/C indexes of 0.78, 0.67, and 0.90, respectively. Wu 2020 [57], gSOS (for MOF) [58], and gSOS (for hip fracture) [58] included SNPs as predictors; each contained more than 1000 predictors, with AUC/C indexes of 0.71, 0.73, and 0.80, respectively.
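The events per variable (EPV) values reported for the validations above relate the number of outcome events to the number of predictors in a model. A minimal Python sketch of that ratio follows. The function names, the example counts, and the default threshold are hypothetical; the threshold is left as a parameter because this section does not state which cut-off the assessment applied.

```python
def events_per_variable(n_events: int, n_predictors: int) -> float:
    """EPV = number of outcome events / number of predictors."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return n_events / n_predictors


def is_small_sample(n_events: int, n_predictors: int,
                    min_epv: float = 10.0) -> bool:
    """Flag a study whose EPV falls below an assumed threshold."""
    return events_per_variable(n_events, n_predictors) < min_epv


# Hypothetical study: 59 fractures, 11 predictors.
epv = events_per_variable(59, 11)
print(round(epv, 1), is_small_sample(59, 11))
```

A low EPV means each coefficient is informed by few events, which is why the risk of bias assessment links an insufficient number of cases to overfitting.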

Risk of bias and applicability

Most models (n=126, 91%) were judged to be at high overall risk of bias, the remaining 12 (9%) were at unclear risk of bias, and no model at low risk of bias was identified. The most common issues arose in the analysis domain, in which 126 (91%) models were rated as high risk of bias, most often because of an insufficient number of cases or incorrect assessment of calibration. Several models had an unclear (n=58, 42%) or high (n=15, 11%) risk of bias in the outcome domain, mainly because it was unclear whether a prespecified or standard outcome definition had been used, or because subjective outcome measures (e.g., self-reported fractures) had been used. In terms of applicability, 88 (64%) models raised low concern, and the remaining 50 (36%) raised high concern because they focused on hip fracture in the outcome domain. Details of the risk of bias and applicability assessments are presented in Figure 3 and Supplementary table 4.
Figure 3.

Summary results on risk of bias and applicability assessment (using PROBAST) of external validation of osteoporotic fracture prediction model.

Model comparison

FRAX, QFracture, and Garvan were the three tools most used in clinical practice. In addition, some tools with potential clinical value had been externally validated with good performance (e.g., FRA-HS, WHI). Details of the externally validated models, together with their advantages and disadvantages, are summarized in Table 3.
Table 3

Predictors, advantages and disadvantages of externally validated models.

Author | Model | Predictors included in the model | Advantages | Disadvantages
Colón-Emeric 2002[26] | Colón-Emeric-Any | Gender, ethnicity, BMI, activity of daily living difficulty, antiepileptic use, Rosow-Breslau impairment^a | • Relatively easy to measure • Contains few predictors | • Performance is poor • Rarely externally verified • Dose-response is not included
Colón-Emeric 2002[26] | Colón-Emeric-Hip | Age, gender, ethnicity, BMI, stroke history, cognitive impairment, Rosow-Breslau impairment^a | • Relatively easy to measure • Contains few predictors • Performance is acceptable | • Rarely externally verified • Dose-response is not included
Robbins 2007[29] | WHI | Age, general health, BMI, prior fractures, ethnicity, physical activity, smoking status, family history of fractures, corticosteroid use, treated diabetes | • Easy to measure • Performance is excellent • Includes dose-response for general health and physical activity | • Rarely externally verified • Not applicable to male
Nguyen 2008[12] | Garvan-Model 1 | Age, femoral neck BMD, prior fractures, history of falls | • Contains few predictors • Includes dose-response for number of prior fractures and falls • Commonly used in clinical practice | • Performances range from poor to acceptable • Need to measure BMD
Nguyen 2008[12] | Garvan-Model 2 | Age, weight, prior fractures, history of falls | • Easy to measure • Contains few predictors • Includes dose-response for number of prior fractures and falls • Commonly used in clinical practice | • Performances range from poor to acceptable
Kanis 2008[10] | FRAX-with BMD | Age, gender, BMI, prior fractures, family history of fractures, glucocorticoid use, smoking status, alcohol use, RA, secondary osteoporosis, femoral neck BMD | • Had been externally verified many times • Widely used in clinical practice | • Performances range from poor to acceptable • Need to measure BMD • Dose-response is not included
Kanis 2008[10] | FRAX-without BMD | Age, gender, BMI, prior fractures, family history of fractures, glucocorticoid use, smoking status, alcohol use, RA, secondary osteoporosis | • Had been externally verified many times • Widely used in clinical practice • Relatively easy to measure | • Performances range from poor to acceptable • Dose-response is not included
Hippisley-Cox 2009[11] | QFracture-M | Age, BMI, smoking status, alcohol use, RA, cardiovascular disease, type 2 diabetes, asthma, tricyclic antidepressants use, corticosteroids use, history of falls, liver disease | • Performances range from acceptable to excellent • Includes dose-response for smoking, alcohol use, type of diabetes • Commonly used in clinical practice • Relatively easy to measure | • Contains many predictors
Hippisley-Cox 2009[11] | QFracture-F | Hormone replacement therapy use, age, BMI, smoking status, alcohol use, parental history of osteoporosis, RA, cardiovascular disease, type 2 diabetes, asthma, tricyclic antidepressants, corticosteroids use, history of falls, menopausal symptoms, chronic liver disease, gastrointestinal malabsorption, other endocrine disorders | • Performance is excellent • Includes dose-response for smoking, alcohol use, type of diabetes • Commonly used in clinical practice • Relatively easy to measure | • Contains many predictors
Tanaka 2010[30] | FRISC | Age, weight, prior fractures, back pain, lumbar BMD | • Contains few predictors • Performance is acceptable | • Need to measure BMD • Not applicable to male • Dose-response is not included
Hippisley-Cox 2012[36] | Updated QFracture-F | Age, BMI, ethnicity, alcohol use, smoking status, chronic obstructive pulmonary disease or asthma, any cancer, cardiovascular disease, dementia, epilepsy, history of falls, chronic liver disease, Parkinson’s disease, RA or systemic lupus erythematosus, chronic renal disease, type 1 diabetes, type 2 diabetes, prior fractures, endocrine disorders, gastrointestinal malabsorption, antidepressants, corticosteroids use, unopposed hormone replacement therapy, parental history of osteoporosis | • Performances range from acceptable to excellent • Includes dose-response for smoking, alcohol use, type of diabetes • Commonly used in clinical practice • Relatively easy to measure | • Contains many predictors
Hippisley-Cox 2012[36] | Updated QFracture-M | Age, BMI, ethnicity, alcohol use, smoking status, chronic obstructive pulmonary disease or asthma, any cancer, cardiovascular disease, dementia, epilepsy, history of falls, chronic liver disease, Parkinson’s disease, RA or systemic lupus erythematosus, chronic renal disease, type 1 diabetes, type 2 diabetes, prior fractures, endocrine disorders, gastrointestinal malabsorption, antidepressants, corticosteroids use, unopposed hormone replacement therapy, parental history of osteoporosis, care home residence | • Performances range from acceptable to excellent • Includes dose-response for smoking, alcohol use, type of diabetes • Commonly used in clinical practice • Relatively easy to measure | • Contains many predictors
Iki 2015[40] | FRAX+TBS | Age, gender, BMI, prior fractures, family history of fractures, glucocorticoid use, smoking status, alcohol use, RA, secondary osteoporosis, femoral neck BMD, trabecular bone score | • It is an extended model of FRAX-with BMD, with its performance better than that of FRAX-with BMD | • Need to measure BMD • Rarely externally verified • Dose-response is not included
Francesco 2017[43] | FRA-HS | Age, gender, prior fractures, secondary osteoporosis, corticosteroids use, RA, BMI, smoking status, alcohol abuse disorder | • Relatively easy to measure • Performance is excellent | • Rarely externally verified • Dose-response is not included
Lu 2021[58] | GSOS | 21,717 SNPs | • Performances range from acceptable to excellent | • Contains many predictors • Predictors are difficult to measure

BMD: bone mineral density; BMI: body mass index; F: female; FRA-HS: Fracture health search; FRAX: fracture risk assessment tool; FRISC: fracture and immobilization score; GSOS: Genomic speed of sound; M: male; RA: rheumatoid arthritis; SNP: Single Nucleotide Polymorphisms; TBS: trabecular bone score; WHI: women's health initiative;

a: Rosow-Breslau impairment is defined as difficulty doing heavy work or walking upstairs, or being unable to walk a mile.


DISCUSSION

This systematic review summarized and critically appraised 68 studies of OF risk prediction models in the general population, covering 70 developed models and 138 external validations. Only a few models showed outstanding (n=3, 1%) or excellent (n=32, 15%) discrimination. Few of the developed models (n=22, 31%) had been externally validated, with a few notable exceptions such as FRAX with BMD (for MOF) and FRAX with BMD (for hip fracture). Calibration was rarely assessed, either for developed models (n=25, 36%) or in external validations (n=33, 24%). Moreover, no model was appraised as having a low risk of bias.

We found considerable variability in the geographical location of both model development and model validation. However, the majority of models were developed and validated in the UK, the US, or China. Although studies have shown that osteoporotic fractures are also prevalent in low- or middle-income countries [90], no model has been developed or validated in populations from Africa, South America, or the Middle East. Tailored models for populations in these regions are important because predictor-outcome associations are known to vary among ethnic groups [91]. More external validation studies in these populations are needed to improve the generalizability of existing models, which is also more cost-effective than investing additional research funding in developing new models [92].

Although postmenopausal females are at high risk of OF, the incidence of OF in males increases significantly with age. Furthermore, mortality and disability after OF are higher in males than in females [93]. Osteoporosis is therefore an underestimated bone condition in the male population [94].
Although research progress has been made on OF in males [37,57], we found that most models were developed (n=31, 44%) and validated (n=76, 55%) specifically for females, with fewer models specifically developed (n=23, 33%) or validated (n=33, 24%) for males. Future studies should pay attention to risk prediction models specific to the male population.

Notably, some models that included only a few predictors (e.g., two or three) [32,35,46], or only easily measured predictors [29], showed promising performance compared with models that used many complex predictors such as SNPs [57]. Moreover, because of the large number of predictors and the resources required to measure them, the practical application of such complex models (those including a large number of SNPs) is limited. On the other hand, as the gold standard for the diagnosis of osteoporosis, BMD has been included in several prediction models [34,35,39,40,46,48]. This review found that, in many studies, Garvan and FRAX with BMD showed higher discrimination than Garvan and FRAX without BMD [39]. However, we also observed similar or even better performance in models without BMD, such as QFracture [84] and WHI [29], indicating that BMD may not be an essential predictor of future fracture. Hence, increasing the number of predictors or including complex predictors may not necessarily improve model performance. In future studies, complex predictors (e.g., BMD, SNPs) could be replaced by more easily measured predictors (e.g., age, prior fractures, history of falls) when they are unavailable, difficult to obtain, or show no evidence of improving model performance.

FRAX, QFracture, and Garvan are the three most commonly used models for OF prediction. FRAX (10 or 11 predictors) is recommended by the WHO for evaluating the risk of OF [10]. It has strong applicability and operability and has been used worldwide [17].
In this systematic review, we found that FRAX with BMD (for MOF) (n=37, 27%) was the most frequently externally validated model, but its performance was not particularly good. Compared with FRAX alone, the performance of its extended models was slightly better, but most of them had not been externally validated. Garvan (4 predictors) contained the fewest predictors, which are also easy to measure [12], and this facilitates its practical use. However, the performance of Garvan was relatively poor [16]. QFracture was developed from electronic medical records and showed the best performance among the three models. Nevertheless, its larger number of predictors (26 for males and 25 for females) limits its practical application to some extent [11]. Moreover, some models with potential clinical value and good performance (e.g., FRA-HS) [43] had neither been externally validated in different populations nor been widely used in clinical practice. As a result, this review does not recommend any single model as suitable for all settings. Model performance, applicability, and characteristics should all be considered when selecting an OF prediction model [16].

Modeling methods include classical regression methods (e.g., Cox proportional hazards regression, logistic regression) and artificial intelligence methods (e.g., machine learning). Classical regression methods generally have the drawback of lower prediction performance [57]. By comparison, artificial intelligence methods offer powerful capabilities for data analysis and exploration, and models developed with them have shown advantages in accuracy, sensitivity, and efficiency [59,95]. In this systematic review, the seven (10%) models developed with machine learning methods showed relatively good discrimination. However, artificial intelligence modeling requires large amounts of high-quality data.
In addition, such models are prone to overfitting [59]. Nonetheless, with the arrival of the big data era, artificial intelligence methods are finding more applications in medicine and could be considered a flexible alternative for risk prediction in large datasets.

This systematic review did not consider model impact studies, which use a comparative design to quantify the benefits, harms, and costs of introducing a new risk prediction model; such studies are the final, crucial step in determining whether a model can be applied in clinical practice [96,97]. A recent related systematic review identified only three model impact studies on OF [98]. That review showed that population screening could effectively reduce OF and hip fractures; however, the information on costs and screening intervals remained unclear [98]. More rigorous impact studies are needed to determine whether OF risk prediction models should be implemented in clinical practice.

Recommendations and implications

Accurate OF risk evaluation can help clinicians and individuals understand the risk of OF and guide decisions to mitigate it [99]. When choosing a model for predicting OF risk, its accuracy, applicability, convenience, data availability, and cost should be considered. When developing models, simple models with fewer or more easily measured predictors should be prioritized to improve clinical feasibility and practicality. Given the large number of existing models, future studies should prioritize recalibrating and extending existing OF prediction models to improve prediction performance, and conducting external validation and model impact analysis, instead of developing new models from scratch [92].
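One simple form of the recalibration recommended above is logistic recalibration, in which the predicted risks of an existing model are adjusted with an intercept and a slope estimated in the new population. The Python sketch below shows only the adjustment step and assumes the two coefficients have already been estimated. The function name and the example values are hypothetical and do not come from any model in this review.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def recalibrate(risk: float, intercept: float, slope: float) -> float:
    """Updated risk = expit(intercept + slope * logit(original risk))."""
    return expit(intercept + slope * logit(risk))


# Hypothetical coefficients from a validation cohort: predictions were too
# extreme (slope below 1) but correct on average (intercept of 0).
slope = math.log(4) / math.log(9)   # about 0.63
for original in (0.1, 0.5, 0.9):
    print(original, round(recalibrate(original, 0.0, slope), 3))
```

With a slope below 1, extreme predictions are pulled towards the middle while a predicted risk of 0.5 is unchanged. The ranking of individuals, and therefore the AUC, is not affected by this adjustment.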

Strengths and limitations

The strengths of this review include a systematic literature search, rigorous study selection, and detailed data extraction on the main characteristics of OF prediction models. Furthermore, we evaluated the risk of bias and applicability of all identified models to indicate where improvements are needed in future OF prediction model studies. However, this review also has some limitations. Firstly, because of heterogeneity across studies, the results were not quantitatively synthesized, which limited the comparability of models. Secondly, although we conducted an exhaustive literature search, some relevant citations may have been missed because we did not search the grey literature. This may have led to an underestimate of the number of model development and validation studies.

Conclusion

In conclusion, our systematic review found that although a number of OF risk prediction models exist, most had not been thoroughly internally or externally validated (with calibration unassessed for most models). Most of the models also showed poor performance. Moreover, all models suffered from methodological shortcomings. Given the availability of large and combined datasets, more rigorous studies are suggested to validate, improve, and analyze the impact of existing OF risk prediction models in the general population rather than to develop completely new models. Rigorous studies of OF prediction models targeting males and populations in low- or middle-income countries are also needed.

Supplementary Materials

The Supplementary data can be found online at: www.aginganddisease.org/EN/10.14336/AD.2021.01206