PURPOSE: Accurate and reliable radiographic classifications are of great importance as a basis for treatment decisions and prognosis in Perthes disease. The Stulberg classification is widely used as a predictor of long-term outcome. The aim of the present study was to determine whether the Stulberg classification is sufficiently reliable for routine clinical use in the assessment of Perthes disease. METHODS: We used this classification to assess the radiographs of 101 hips in two separate sessions (55 and 46 hips, respectively), separated by an educational intervention in which the classification algorithm was discussed and clarified. RESULTS: Agreement between the two experienced examiners was good (weighted kappa 0.65; percentage agreement 71%). Agreement between the least experienced observer and each of the two experienced examiners was moderate (weighted kappa 0.51 and 0.57; percentage agreement 62% and 65%). Combining Stulberg classes I and II, and classes IV and V, into a simpler three-group classification gave better agreement between all observers; agreement between the two experienced observers improved to 81%. CONCLUSIONS: We conclude that the reliability of the Stulberg classification is acceptable when the radiographic assessment is carried out by experienced examiners. A simpler three-group classification based on the shape of the femoral head (spherical, ovoid and flat) gave better agreement and is, therefore, recommended for routine clinical use.
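As an illustration of the agreement measures reported above, the following minimal sketch computes percentage agreement and a weighted kappa for two observers, and collapses the five Stulberg classes into three shape groups. It rests on several assumptions: the abstract does not state which weighting scheme was used, so linear weights are assumed here; the mapping of classes I and II to spherical, III to ovoid, and IV and V to flat is inferred from the wording of the abstract; and the ratings are invented for demonstration and are not study data.

```python
def percent_agreement(a, b):
    """Fraction of cases on which two observers assign the same class."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def weighted_kappa(a, b, categories):
    """Cohen's weighted kappa with linear weights (an assumed scheme)."""
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(a)

    # Observed joint proportions of (observer A class, observer B class).
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[idx[x]][idx[y]] += 1 / n

    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    # Weighted observed and chance-expected disagreement.
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)  # linear disagreement weight
            observed += w * obs[i][j]
            expected += w * row[i] * col[j]

    if expected == 0:  # both observers used a single, identical class
        return 1.0
    return 1 - observed / expected


# Assumed mapping from Stulberg class to the three-group shape classification.
SHAPE = {1: "spherical", 2: "spherical", 3: "ovoid", 4: "flat", 5: "flat"}

# Hypothetical ratings for six hips (not data from the study).
observer_a = [1, 2, 3, 4, 5, 2]
observer_b = [1, 1, 3, 5, 5, 2]

five_class_agreement = percent_agreement(observer_a, observer_b)
five_class_kappa = weighted_kappa(observer_a, observer_b, [1, 2, 3, 4, 5])

shape_a = [SHAPE[r] for r in observer_a]
shape_b = [SHAPE[r] for r in observer_b]
three_group_agreement = percent_agreement(shape_a, shape_b)

print(five_class_agreement, five_class_kappa, three_group_agreement)
```

In this invented example the two disagreements fall between adjacent classes that the three-group scheme merges, so collapsing the classes raises percentage agreement, which is the kind of effect the abstract reports.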