Literature DB >> 28538397

Quality of reporting of multivariable logistic regression models in Chinese clinical medical journals.

Ying-Ying Zhang¹, Xiao-Bin Zhou, Qiu-Zhen Wang, Xiao-Yan Zhu.

Abstract

Multivariable logistic regression (MLR) has been increasingly used in Chinese clinical medical research during the past few years. However, few evaluations of the quality of the reporting strategies in these studies are available.To evaluate the reporting quality and model accuracy of MLR used in published work, and related advice for authors, readers, reviewers, and editors.A total of 316 articles published in 5 leading Chinese clinical medical journals with high impact factor from January 2010 to July 2015 were selected for evaluation. Articles were evaluated according 12 established criteria for proper use and reporting of MLR models.Among the articles, the highest quality score was 9, the lowest 1, and the median 5 (4-5). A total of 85.1% of the articles scored below 6. No significant differences were found among these journals with respect to quality score (χ = 6.706, P = .15). More than 50% of the articles met the following 5 criteria: complete identification of the statistical software application that was used (97.2%), calculation of the odds ratio and its confidence interval (86.4%), description of sufficient events (>10) per variable, selection of variables, and fitting procedure (78.2%, 69.3%, and 58.5%, respectively). Less than 35% of the articles reported the coding of variables (18.7%). The remaining 5 criteria were not satisfied by a sufficient number of articles: goodness-of-fit (10.1%), interactions (3.8%), checking for outliers (3.2%), collinearity (1.9%), and participation of statisticians and epidemiologists (0.3%). The criterion of conformity with linear gradients was applicable to 186 articles; however, only 7 (3.8%) mentioned or tested it.The reporting quality and model accuracy of MLR in selected articles were not satisfactory. In fact, severe deficiencies were noted. Only 1 article scored 9. We recommend authors, readers, reviewers, and editors to consider MLR models more carefully and cooperate more closely with statisticians and epidemiologists. Journals should develop statistical reporting guidelines concerning MLR.

Entities: Chemical Disease Gene Species

Mesh：

Year: 2017 PMID： 28538397 PMCID： PMC5457877 DOI： 10.1097/MD.0000000000006972

Source DB: PubMed Journal: Medicine (Baltimore) ISSN： 0025-7974 Impact factor: 1.889

Introduction

The most widely used approaches to multivariable models in clinical studies are multivariable linear regression, multivariable logistic regression (MLR), and proportional hazards regression[. The independent variables in MLR model can be continuous, categorical, or ordinal. Moreover, their coefficients can be easily converted into odds ratios (ORs) with straightforward explanation. Normal distribution is not required in logistic regression. Due to these advantages, MLR is widely used in medical research mostly for adjustment of confounding factors, screening of relevant variables, predicting, and discrimination.[ However, failure to perform multivariable regression appropriately, which is manifested, for instance, violating or neglecting assumptions and preconditions, or ambiguous coding of variables, can potentially lead to inaccurate, misleading, or even erroneous conclusions; or render the conclusions difficult to interpret.[ Appropriate assumptions, correct interpretation, and complete reporting of MLR have been studied since 1993.[ The study of Kumar and Chhabra[ indicated that understanding of the inherent assumptions and important limitations of the model before application can significantly enhance the quality and reliability of MLR analysis. However, several deficiencies in reporting quality can be found in many fields, such as obstetrics and gynecology,[ pulmonary and critical care,[ and clinical epidemiology.[ As early as 1993, Concato et al[ pointed out the reporting quality problems of multivariate statistical analysis in medical research, found that 6 important assumptions in logistic regression analysis are ignored or unreported, and recommended improving the reporting and application guidelines for multivariate analysis in medical research. In the following 20 years, a large number of such studies emerged.[ However, the reporting quality of MLR in the literature remained uneven. Recently, there has been significant improvement; however, there are insufficiencies in, for instance, the collinearity test, checking for outliers, reporting guidelines or standards, and participation of statisticians.[ Scholars in China have only focused on theoretical research. Sun[ discussed the significant influence of sample size and screened methods on model parameters and paired data variables, respectively; Liu et al[ developed a generalized ad-logistic regression theory; Liu[ found several common problems in logistic regression analysis and proposed methods for performing collinearity tests and identifying outliers in the sample. So far, research on reporting quality evaluation of MLR is rare worldwide, and has not been available in Chinese literature. Thus, we evaluated studies using MLR published in 5 leading Chinese clinical medical journals from 2010 to 2015 according 12 established criteria, hoping to provide reference for reasonable application of MLR and correct reporting of results for authors, reviewers, and journal editors.

Methods

Data collection

The following journals were selected due to their high Chinese Science Citation Database (CSCD) impact factor (IF) and their large circulations: Chinese Journal of Cardiology (IF = 0.5065 of year 2012), Chinese Journal of Oncology (IF = 0.4474), Chinese Journal of Neuromedicine (IF = 0.6729), Chinese Journal of Pediatrics (IF = 0.7669), and Chinese Journal of Digestive Surgery (IF = 0.7318). We performed a manual search of all articles published between 2010 and 2015. The 6-year interval was used to indicate any trends over time. A total of 316 articles that contained the word “logistic” in the title, abstract, or keywords list published in Chinese Journal of Cardiology (119 articles), Chinese Journal of Oncology (57 articles), Chinese Journal of Neuromedicine (77 articles), Chinese Journal of Pediatrics (31 articles), and Chinese Journal of Digestive Surgery (32 articles) were included in the research.

Assessment criteria of reporting quality

In line with related research,[ we applied 12 criteria to evaluate the quality of reporting of MLR, namely: (1) selection of independent variables; (2) fitting procedure; (3) coding of variables; (4) interactions; (5) collinearity; (6) statistical significance (OR, 95% confidence interval [CI], P value); (7) goodness-of-fit; (8) checking for outliers; (9) complete identification of the statistical software application that was used; (10) sufficient events (>10) per variable; (11) participation of statisticians and epidemiologists; and (12) conformity with linear gradient. The articles were evaluated according to the first 11 criteria. Each criterion contributed one point to the score if it was fulfilled. Criterion (12) was not included in the calculation of the total score, since continuous or rank variable were not used in every article. Ethical approval was not involved, as the subject of our study is literature, not human being or animal.

Selection of independent variables

Variable selection can seriously affect the model estimation. Therefore, it should be justified. Usually, variables are selected according to professional knowledge and previous studies, or statistically significant association in a univariate analysis.

Fitting procedure

The variables may be determined by model selection methods, such as automatic procedures (eg, forward inclusion, backward elimination, stepwise selection, or best subset selection), nonautomated backward selection, or a priori specification (either collectively or in “hierarchically” grouped subsets).[ Preferably a description of the model test methods (eg, conditional parameter estimation, maximum partial likelihood estimation, and the Wald chi-square test) should be included. An article was considered to fulfill this criterion if it reported one of the methods above.

Coding of variables

Reporting the coding of variables properly is important, since the interpretation of regression coefficients depends on the coding of variables and the measurement of units. For example, the coefficient for the impact of age on mortality will be very different if age is coded in 1-year increments, in 10-year increments, or dichotomously as <65 versus ≥65 years.[ Hence, reporting the coding of variables plays an important role in correctly understanding the model parameters (OR). We evaluated an article as satisfying this criterion if a detailed classification of the independent variables or the reference group was presented. An explicit list is, of course, preferable.

Interactions

When the effect of an independent variable on the outcome variables can be affected by other variables, we have interactions among independent variables. This can conceal the true correlation between independent and dependent variables.[ Generally, their statistical significance and effect on the model must be tested and reported, according to professional knowledge or previous studies. Articles including explicit tests for interaction, mentioning the concept of interaction anywhere in the text, or justifying the exclusion of interaction from the final model were regarded as fulfilling the criterion.

Collinearity

Collinearity is high correlation between 2 or more covariates. Multicollinearity would occur if some covariates are partially or totally explained by other covariates. Collinearity is necessary to be checked before establishing logistic models, otherwise unreliable estimates of coefficients and wide CIs may appear. Methods for tackle the multicollinearity can be found in the textbook by Allison.[ However, collinearity is often ignored. The criterion was considered to be fulfilled as long as the concept of collinearity was discussed anywhere in article.

Statistical significance

Articles that reported the OR and CI correctly, preferably with P values, met this criterion.[ The use of P values only should be discouraged.

Goodness-of-fit

Goodness-of-fit can provide information about how well the entire model matches the observed data. Despite controversy, several methods are available for goodness-of-fit testing including Pearson test, deviance, likelihood ratio, and Hosmer–Lemeshow statistic (equivalents of R2 in linear regression).[ Moreover, there are strategies for assessing the model's predictive performance based on Pseudo R2, the fraction of correct predictions, or receiver-operating characteristic (ROC) curves. These 2 procedures (methods for goodness of fit and predictive performance) are different concepts, and both should be reported in a model. Nevertheless, article was classified as meeting this criterion if any of the measures above was calculated.

Checking for outliers

Outliers are variables whose residuals (observed-predicted) are significantly greater than those of other variables. If undetected, they may distort the model's robustness due to their effect on the coefficients of the variables. Obviously, outlier-checking procedures are essential to ensure the accuracy of MLR analysis. Useful techniques are available for detecting outliers, such as residual examination (eg, Pearson residuals and deviance residuals), and for measuring the impact of outliers on the regression model, such as the Cook distance and DFBETA (s).[ Specific SAS procedures can be found in the literature. We classified the articles based on whether any method for outliers-checking was mentioned.

Complete identification of the statistical software application that was used

Identifying the software application can help other researchers to reproduce and test the MLR model. It is necessary to report the version of the program due to the continuous revisions of the same application.[

Sufficient events (>10) per variable (ratio of outcome events to independent variables)

Generally, including more relevant variables can result in a model that fits the data better.[ However, large standard error, unreliable coefficients estimates, and wider CIs may appear when an excessively large number of variables are used for small number of outcome events. This is called over-fitting.[ Despite controversy about the number of outcome events required for MLR models, the ratio of at least 10 outcome events per independent variable is widely approved.[ A small ratio implies greater potential of biased estimation and invalidity of MLR.[ We categorized the articles with a ratio greater than 10:1 (or greater than 20 for conditional MLR) as fulfilling the criterion.

Participation of statisticians and epidemiologists

Proper use of statistical methods combined with professional knowledge can avoid bias and reduce defects during statistical analysis and reporting procedures. Similar studies have indicated that inadequate reporting was less frequent if an author was affiliated with a Department of Statistics, Epidemiology, or Public Health.[ Therefore, the participation of statisticians and epidemiologists is important for properly using and appropriately reporting the MLR model.

Conformity with linear gradient for continuous or rank variables

In the case of continuous or rank independent variables, linear relationship of log-odds may be imposed upon the model. Any specific unit change in the respective covariate should have the same impact on the outcome,[ otherwise, there is a risk of misspecification and false inferences,[ especially when the relationship is U-shaped, J-shaped, or parabolic.[ To avoid this problem, the most common strategy is to convert a continuous variable into a categorical variable. This is likely to result in loss of useful information. Therefore, more flexible modeling procedures are required for handling continuous variables, such as spline regression, multivariable fractional polynomials, and generalized additive models. If there was any description of the linear assumption, preferably, some type of justification (eg, professional modeling approaches, exploratory plot of the data, or prior clinical knowledge) should also be provided, the article was considered to meet the criterion.

Statistical analysis

The criteria were evaluated by 2 authors (YZ and XZ) and possible disagreement was discussed. The frequencies of fulfilling the criteria were separately calculated for each journal, and the Pearson χ2 test and Fisher exact test were used for testing the differences among the journals. The quality scores were represented by median (interquartile range), and their differences among the journals were tested by the multiple independent samples Kruska–Walis H test. All calculations were performed in IBM SPSS Statistics 19.0, and P value <.05 was considered significant.

Results

Basic characteristics of the quality of the selected articles

The highest score was 9, the lowest was 1, the median (interquartile range) was 4 (4–5). A total of 85.1% of the articles scored less than 6, and 14.9% scored between 6 and 9 points, as seen in Table 1. The distribution of quality scores among journals is summarized in Table 2. The number of articles using MLR and the quality score had no apparent change over time (Fig. 1). Through the Kruskal–Walis H test, the difference of quality scores among 5 journals was no statistically significant (χ2 = 6.706, P = .15), whereas there is a significant difference over the 6-year period (χ2 = 26.388, P < .05).

Table 1

Distribution of quality scores of the 316 articles using multivariable logistic regression (MLR).

Table 2

Distribution of articles in 5 journals and their descriptive statistics for quality score.

Figure 1

Number of articles using multivariable logistic regression and their quality score. The bar chart represents the number of articles using multivariable logistic regression (MLR). The line chart represents the quality score of articles using MLR. The difference among the 5 journals had no statistical significance (χ2 = 6.706, P = .15). The difference was significant over the 6-year period (χ2 = 26.388, P < .05).

Distribution of quality scores of the 316 articles using multivariable logistic regression (MLR). Distribution of articles in 5 journals and their descriptive statistics for quality score. Number of articles using multivariable logistic regression and their quality score. The bar chart represents the number of articles using multivariable logistic regression (MLR). The line chart represents the quality score of articles using MLR. The difference among the 5 journals had no statistical significance (χ2 = 6.706, P = .15). The difference was significant over the 6-year period (χ2 = 26.388, P < .05).

The quality of reporting of articles using MLR

The results for the fractions of fulfillment of the 11 criteria are summarized in Table 3. The criterion of conformity with linear gradients for continuous or rank variables was applicable to 186 articles; however, only 7 (3.8%) mentioned or tested this conformity.

Table 3

Quality of reporting of multivariable logistic regression criteria (n, %).

Quality of reporting of multivariable logistic regression criteria (n, %). A total of 219 articles (69.3%) described the selection of independent variables. Specifically, 56 were based on previous studies and experience, and 128 were based on statistical significance of the single factor test. Twenty-one articles used both methods, and 14 full-model. In 185 articles (58.5%), the fitting procedure was reported. Only 22 articles reported both the selection and test methods. Classification or coding of independent variables was described completely in 59 articles (18.7%). Twelve articles (3.8%) met the criterion. The product term was included as an independent variable in all models in order to isolate the interaction and examine its impact. The criterion of collinearity among independent variables was met in 6 articles (1.9%). BKW[ was used to assess the degree of collinearity.

Statistical significance (OR, 95% CI, P value)

A total of 273 articles (86.4%) met this criterion. Specifically, 261 reported all indices for each variable's coefficients in the final model, and 12 reported the OR and 95% CI. Among the remaining articles, 32 reported either the OR or the 95% CI, and 11 reported only the P value. Thirty-two articles (10.1%) evaluated the model. Specifically, 5 conducted goodness-of-fit (Hosmer–Lemeshow was used in 3 articles and likelihood ratio in 2); 23 evaluated the model's predictive performance based on the fraction of correct predictions (4 articles), or receiver operating characteristic curves (19 articles); 4 used the C++ programming language to predict model's performance. It seems that 68% of the articles may be over-fitted. Ten articles (3.2%) met the criterion of checking for outliers.

Identification of the statistical software application

The majority, that is, 307 articles (97.2%) specified the application and its version. The most frequently used program is SPSS (88.3%), followed by SAS (7.9%).

Sufficient events (>10) per variable

In 247 articles (78.2%), the ratio of outcome events to independent variables was 10:1 or higher. The entire range was from 1.50 to 2459.75. There were 6 articles using conditional logistic regression with an optimal ratio over 20, namely, between 25.00 and 183.43. Only 1 article (0.3%) reported participation of statisticians and epidemiologists.

Discussion

It was found that 5 criteria were fulfilled in more than 50% of the articles. This result is not particularly encouraging. For example, classification and coding of variables were described in only 18.7% of the articles. By contrast, Mikolajczyk et al[ found that 86 of the 104 articles (83%) that were published in obstetrics and gynecology journals in 2005 and 2006 used coding of potential covariates. The percentage was 41.28% in the study conducted by Kumar et al,[ where 109 articles using MLR were selected from 8 Indian journals from 1994 to 2008. Interactions between independent variables, conformity with linear gradient, and collinearity were reported in 3.8%, 3.8%, and 1.9%, respectively, in this study. Even though these are similar to the corresponding percentages of the study of Kumar (3.67%, 1.83%, and 0, respectively),[ they are far below those of the study conducted by Ottenbacher et al[ (39%, 17%, and 19%) in 2 epidemiology journals from 2000 to 2001, and those in the study of Kalil et al[ (19%, 4.7%, and 25%) from 6 major journals on organ transplant from January, 2005 to January, 2006. The small number of reportings for these assumptions may be due to the lack of statistical knowledge, unskillful software application, or absence of guidelines for appropriate reporting. In this study, the encouraging finding was that the statistical software application and statistical significance, which are the most basic requirements of statistical analysis, were provided in a large number of articles. However, this was not the case with relatively complex tests. We found that 88.3% of the articles used SPSS for statistical analysis. However, there was no output information on multicollinearity in SPSS. It is likely that researchers ignore the multicollinearity problem when SPSS is used to construct MLR models. One solution is to use the same dependent and independent variables to conduct multiple-linear regression and the corresponding collinearity diagnosis. Yang[ and Zhao et al[ have given the implementation process for the diagnosis of multicollinearity of logistic regression by Stata and SAS, respectively. This can partly resolve the problem. However, in this study, 98.1% of the articles did not consider the potential of multicollinearity, which may result unstable regression coefficients or large CIs, and even affect the selection of variables. Currently, the most frequently used methods, such as principal component analysis, partial least squares estimation, and ridge regression, can be employed more efficiently to overcome the multicollinearity issue; however, there may be defects. More adaptable methods are required for analyzing collinearity completely.[ Only 10 articles (3.2%) detected outliers; however, no article provides a detailed analysis. Outliers, that is, observations beyond what is expected, may be identified by statistical variations, face validity, or consensus based on clinical reasons.[ MLR models are very sensitive to outliers, as outliers may cause or cover multicollinearity between independent variables and affect the model's robustness and parametric estimation.[ Therefore, outliers should be treated with caution when they appears: first, remove outliers caused by data collection or recording error. Subsequently, important covariates, interactions, sufficient sample size, and other issues should be considered when outliers are corrected.[ Finally, conduct 2 complete independent studies. In 1 study, retain the outliers. In the other, remove them. The conclusions from both studies are then compared.[ The linear gradient for continuous or rank variables is not generally considered.[ In our study, only 7 articles fulfilled the corresponding criterion. This may be due to nonavailability of an automatic option for this test in current statistical software. Moreover, it may be related to the level of technical expertise. The linear gradient of continuous covariates can be easily resolved through a flexible modeling approach, such as spline regression (or segmented regression), multivariable fractional polynomials,[ and generalized additive models.[ However, these modeling procedures are more complex and cannot be achieved by common statistical software. Moreover, they are difficult to interpret and understand, even for experts. These problems limit the use of these complex statistical models in the medical field.[ The selected journals in this study have high impact factor and large circulation. Since the evaluation criteria are currently acceptable, the same problems are likely to be found in other Chinese clinical medical journals. Undoubtedly, there are limitations in our study. We evaluated only articles from 5 leading journals. This is not likely to reflect conformance with the criteria comprehensively. Moreover, we cannot ensure whether certain data were missing due to actual failure to perform the corresponding test, or space limitations in the article.[ The authors might feel that it was unnecessary to perform these test or had good reasons to make exemptions.[ Finally, a comparison study between Chinese and non-Chinese journals may be necessary to decide whether this is a global issue. In conclusion, despite the fact that reporting quality of MLR used in Chinese clinical medical journals has increased, severe deficiencies were noted. The main reasons behind these deficiencies may be: lack of statistical expertise and software application ability; nonavailability of automatic option for complex analysis in current statistical software; inadequate training in statistical methods among medical researchers or medical professionals; and absence of guidelines for appropriate reporting. Moreover, researchers may occasionally be unwilling to perform complex statistical tests and collaborate with biostatisticians. The reporting quality and reliability of MLR models can be improved by editorial amendments, peer review, and a statistical review system.[ When MLR models are inaccurately constructed and improperly reported, it is difficult for researchers and readers to understand the results, and reproduce the models for future research. Hence, we recommend authors, readers, reviewers, and editors to become more acquainted with the use and reporting of MLR models. Journal editors should be more specific and proactive about the requirements for the publication of MLR and relax the word limit in the statistical analysis section, where interaction, multicollinearity, and linear gradient should be reported. Moreover, journals develop statistical reporting guidelines concerning MLR and encourage researchers to collaborate with statisticians and epidemiologists to improve accuracy and the quality of reporting.

19 in total

Review 1. Logistic regression models in obstetrics and gynecology literature.

Authors: K S Khan; P F Chien; L S Dwarakanath
Journal: Obstet Gynecol Date: 1999-06 Impact factor: 7.661

2. Reporting on statistical methods to adjust for confounding: a cross-sectional survey.

Authors: Marcus Müllner; Hugh Matthews; Douglas G Altman
Journal: Ann Intern Med Date: 2002-01-15 Impact factor: 25.391

3. An appraisal of multivariable logistic models in the pulmonary and critical care literature.

Authors: Marc Moss; D Andrew Wellman; George A Cotsonis
Journal: Chest Date: 2003-03 Impact factor: 9.410

4. [Assessment of multivariate logistic regression analysis in articles published in Turkish cardiology journals].

Authors: Ibrahim Halil Tanboğa; Mustafa Kurt; Turgay Işik; Ahmet Kaya; Mehmet Ekinci; Enbiya Aksakal; Serdar Sevimli; Murat Cayli
Journal: Turk Kardiyol Dern Ars Date: 2012-03

5. The importance of assessing the fit of logistic regression models: a case study.

Authors: D W Hosmer; S Taber; S Lemeshow
Journal: Am J Public Health Date: 1991-12 Impact factor: 9.308

6. Statistical reporting in Indian Pediatrics.

Authors: Jay Karan; J P Goyal; P Bhardwaj; Preeti Yadav
Journal: Indian Pediatr Date: 2009-09 Impact factor: 1.411

7. Confidence intervals rather than P values: estimation rather than hypothesis testing.

Authors: M J Gardner; D G Altman
Journal: Br Med J (Clin Res Ed) Date: 1986-03-15

Review 8. The risk of determining risk with multivariable models.

Authors: J Concato; A R Feinstein; T R Holford
Journal: Ann Intern Med Date: 1993-02-01 Impact factor: 25.391

9. Standardizing criteria for logistic regression models.

Authors: C Campillo
Journal: Ann Intern Med Date: 1993-09-15 Impact factor: 25.391

Review 10. The misuse and abuse of statistics in biomedical research.

Authors: Matthew S Thiese; Zachary C Arnold; Skyler D Walker
Journal: Biochem Med (Zagreb) Date: 2015 Impact factor: 2.313

3 in total

1. Prognostic Utility of Optical Coherence Tomography for Visual Outcome After Extended Endoscopic Endonasal Surgery for Adult Craniopharyngiomas.

Authors: Ning Qiao; Chuzhong Li; Jing Xu; Guofo Ma; Jie Kang; Lu Jin; Lei Cao; Chunhui Liu; Yazhuo Zhang; Songbai Gui
Journal: Front Oncol Date: 2022-01-06 Impact factor: 6.244

Review 2. Prediction equations of forced oscillation technique: the insidious role of collinearity.

Authors: Hassib Narchi; Afaf AlBlooshi
Journal: Respir Res Date: 2018-03-27

3. Vaccine Production Process: How Much Does the General Population Know about This Topic? A Web-Based Survey.

Authors: Angela Bechini; Paolo Bonanni; Beatrice Zanella; Giulia Di Pisa; Andrea Moscadelli; Sonia Paoli; Leonardo Ancillotti; Benedetta Bonito; Sara Boccalini
Journal: Vaccines (Basel) Date: 2021-05-29

3 in total