PredictABEL: an R package for the assessment of risk prediction models.

Suman Kundu, Yurii S Aulchenko, Cornelia M van Duijn, A Cecile J W Janssens.

Abstract

The rapid identification of genetic markers for multifactorial diseases from genome-wide association studies is fuelling interest in investigating the predictive ability and health care utility of genetic risk models. Various measures are available for the assessment of risk prediction models, each addressing a different aspect of performance and utility. We developed PredictABEL, a package in R that covers the descriptive tables, measures and figures used in the analysis of risk prediction studies, such as measures of model fit, predictive ability and clinical utility, as well as risk distributions, calibration plots and receiver operating characteristic plots. Tables and figures are saved as separate files in a user-specified format, including publication-quality EPS and TIFF formats. All figures are available in a ready-made layout, but they can be customized to the preferences of the user. The package has been developed for the analysis of genetic risk prediction studies, but can also be used for studies that only include non-genetic risk factors. PredictABEL is freely available at the websites of GenABEL (http://www.genabel.org) and CRAN (http://cran.r-project.org/).

Year: 2011; PMID: 21431839; PMCID: PMC3088798; DOI: 10.1007/s10654-011-9567-4

Source DB: PubMed; Journal: Eur J Epidemiol; ISSN: 0393-2990; Impact factor: 8.082


Introduction

The rapid identification of genetic markers for multifactorial diseases from genome-wide association studies is fuelling interest in investigating the predictive ability and health care utility of genetic risk models. Genetic risk models are investigated for their potential to target diagnostic, preventive and therapeutic interventions for multifactorial diseases. Implementation of these models in health care requires a series of studies that encompass all phases of translational research [1, 2], starting with a comprehensive evaluation of genetic risk prediction. Various measures are available for the assessment of risk prediction models, each addressing a different aspect of performance and utility [3, 4]. The GRIPS Statement recommends that transparent and complete reporting should provide a description of the risk factors and the risk model by reporting univariate and multivariate odds ratios for the predictors, present risk distributions for individuals with and without the outcome of interest, and report measures of model fit, predictive ability and others, if pertinent [5, 6]. Examples of measures include the Hosmer–Lemeshow statistic [7] and Nagelkerke’s R² [8] for model fit, the area under the receiver operating characteristic (ROC) curve (AUC) [9] and integrated discrimination improvement (IDI) [10] for predictive ability, and percentages of total reclassification [11] and net reclassification improvement (NRI) [10] for clinical utility. Even though the assessment of risk prediction models is relatively standard, there is no single statistical package that would allow for the computation and production of all these measures and plots. Therefore, we developed PredictABEL, a freely available R package, which contains functions to obtain all descriptive tables, measures and plots that are used in genetic risk prediction studies.

Description of PredictABEL

The core part of PredictABEL comprises functions for the assessment of risk prediction models. The measures and plots covered in PredictABEL are listed in Table 1. Most functions can be applied to predicted risks, risk scores or any other continuous predictor variable, but some apply to predicted risks (probabilities) only. Predicted risks and genetic risk scores can be obtained using functions in the package, but they can also be imported from other programs. The functions to obtain predicted risks using logistic regression analysis are written specifically for models that include genetic variables, possibly in addition to non-genetic factors, but they can also be applied to construct models based on non-genetic risk factors only. Genetic risk scores can be computed as unweighted or weighted risk scores, where weights are obtained from the uploaded data or imported from meta-analyses, e.g., as beta coefficients.
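The difference between unweighted and weighted genetic risk scores can be illustrated with a minimal sketch. It is written in plain Python rather than with the package's own R functions, and the variant names, genotypes and beta coefficients are hypothetical values chosen for illustration only.

```python
from typing import Dict, Optional


def genetic_risk_score(
    genotypes: Dict[str, int],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Sum risk-allele counts (0, 1 or 2 per variant), optionally weighted.

    Without weights every risk allele counts equally (unweighted score).
    With weights each allele count is multiplied by a per-variant weight,
    for example a beta coefficient (log odds ratio) from a meta-analysis.
    """
    if weights is None:
        return float(sum(genotypes.values()))
    return sum(weights[snp] * count for snp, count in genotypes.items())


# Hypothetical individual: number of risk alleles carried at three variants.
person = {"snp_a": 2, "snp_b": 0, "snp_c": 1}
# Hypothetical beta coefficients, not taken from any real study.
betas = {"snp_a": 0.50, "snp_b": 0.25, "snp_c": 0.10}

unweighted = genetic_risk_score(person)
weighted = genetic_risk_score(person, betas)
print(unweighted, weighted)
```

In this toy case the individual carries three risk alleles, so the unweighted score is 3, while the weighted score lets the stronger variant dominate.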
Table 1

Measures and plots covered in PredictABEL (version 1.1)

Measures and plots | Description
Description of the data
  Frequencies | Allele and genotype frequencies by disease status
  Univariate odds ratios | Odds ratios per allele and per genotype
Description of the model
  Multivariate odds ratios | Odds ratios adjusted for all predictors in the logistic regression modelᵃ
  Risk distribution | Histogram of predicted risks by disease status
  Predictiveness curve | Cumulative percentage of individuals against predicted risks
Overall model performance
  Nagelkerke’s R² | Percentage of variance in the outcome explained by predictors in the logistic regression modelᵃ
  Brier score | Average squared difference between predicted risks and observed disease status
Calibration
  Hosmer–Lemeshow statistic | Average difference between observed and predicted risks across subgroups
  Calibration plot | Observed and predicted risks across subgroups
Discrimination
  Receiver operating characteristic (ROC) curve | Sensitivity and specificity for all possible cut-off values of predicted risks
  Area under the ROC curve (AUC) | Measure of discriminative accuracy
  Discrimination box plot | Box plot of predicted risks by disease status
  Integrated discrimination improvement (IDI) | Comparison of mean difference in predicted risks of individuals with and without the disease between initial and updated model
Reclassification
  Reclassification table | Number of individuals per risk category of the initial against the updated model by disease status
  Net reclassification improvement (NRI) | Net improvement in risk classification in individuals with and without the disease

ᵃ These functions can only be used when the logistic regression model is constructed using the functions in PredictABEL

The tables and plots generated using PredictABEL are saved as separate files in the working directory. Tables can be saved as Excel or tab-delimited text files and figures can be saved as publication-quality EPS or TIFF files or as JPEG files for insertion in manuscripts. All figures are available in a ready-made layout, but they can be customized to the journal style or preferences of the user.
A hypothetical dataset and examples of use are included in the package to demonstrate all functions.
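Several of the measures listed in Table 1 follow directly from their descriptions. The sketch below computes the Brier score, the AUC and the IDI from predicted risks in plain Python. It is not the package's implementation, and the six outcomes and predicted risks are made-up values for illustration.

```python
from typing import List, Sequence, Tuple


def split_by_outcome(y: Sequence[int], p: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return predicted risks of individuals with (1) and without (0) the disease."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    return cases, controls


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Average squared difference between predicted risk and observed status."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Probability that a random case has a higher predicted risk than a
    random non-case, counting ties as one half (pairwise definition)."""
    cases, controls = split_by_outcome(y, p)
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in controls
    )
    return wins / (len(cases) * len(controls))


def idi(y: Sequence[int], p_initial: Sequence[float], p_updated: Sequence[float]) -> float:
    """Change in the mean risk difference between cases and non-cases
    when moving from the initial to the updated model."""
    def mean_difference(p: Sequence[float]) -> float:
        cases, controls = split_by_outcome(y, p)
        return sum(cases) / len(cases) - sum(controls) / len(controls)

    return mean_difference(p_updated) - mean_difference(p_initial)


# Made-up data: disease status and predicted risks from two models.
status = [1, 1, 1, 0, 0, 0]
risk_initial = [0.6, 0.4, 0.5, 0.5, 0.3, 0.2]
risk_updated = [0.8, 0.6, 0.7, 0.4, 0.3, 0.2]

print("Brier (updated):", round(brier_score(status, risk_updated), 4))
print("AUC initial / updated:", round(auc(status, risk_initial), 4), auc(status, risk_updated))
print("IDI:", round(idi(status, risk_initial, risk_updated), 4))
```

The pairwise AUC loop is quadratic in sample size, which is fine for a sketch but would normally be replaced by a rank-based calculation for large datasets.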

Example

The hypothetical dataset included in the package was reconstructed from an empirical study on age-related macular degeneration (AMD) [12], using a simulation method that has been described in detail elsewhere [13]. Based on published frequencies and odds ratios of the genetic variants and non-genetic risk factors implicated in AMD and on published population disease risks, we created a dataset that contains genotype data and disease status for 10,000 individuals. Predicted risks were obtained using logistic regression analysis, for which the code is provided in the package. Two risk models were constructed: a model based on non-genetic risk factors only and a model based on genetic and non-genetic predictors. Figure 1 presents three examples of plots that are produced by PredictABEL. Figure 1a shows distributions of predicted risks based on genetic and non-genetic factors for individuals with and without AMD. The degree of overlap between the two histograms is indicative of the discriminative accuracy of the risk model. This discriminative accuracy is assessed by the AUC and visualized in a ROC plot. Figure 1b presents the ROC curves for the two risk models. The figure shows that the model with genetic factors had a higher AUC than the model without. The same function quantified the AUC values as 0.80 for the model with genetic factors and 0.74 for the model without. Finally, Fig. 1c presents the calibration plot for the risk model based on the genetic and non-genetic variables as predictors, which shows how well predicted risks match observed risks. The calibration plot suggests that the model was well calibrated, which was supported by the non-significance of the Hosmer–Lemeshow test (P = 0.65).
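The calibration check described above can be sketched as follows: individuals are sorted by predicted risk, split into groups, and observed numbers of events are compared with the numbers expected from the predicted risks. This is a plain-Python illustration with made-up data, not the package's code; it returns the Hosmer–Lemeshow statistic only, and the grouping rule (near-equal group sizes) is an assumption.

```python
from typing import List, Sequence, Tuple

# One row per risk group: (group size, observed events, expected events).
GroupRow = Tuple[int, int, float]


def calibration_groups(y: Sequence[int], p: Sequence[float], groups: int = 10) -> List[GroupRow]:
    """Sort by predicted risk and split into near-equal groups."""
    ranked = sorted(zip(p, y))
    n = len(ranked)
    table: List[GroupRow] = []
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(pi for pi, _ in chunk)
        observed = sum(yi for _, yi in chunk)
        table.append((len(chunk), observed, expected))
    return table


def hosmer_lemeshow(table: Sequence[GroupRow]) -> float:
    """Sum over groups of (observed - expected)^2 / (n * mean risk * (1 - mean risk))."""
    stat = 0.0
    for n_g, observed, expected in table:
        mean_risk = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_risk * (1 - mean_risk))
    return stat


# Made-up predicted risks with two alternative sets of outcomes.
risk = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
well_calibrated = [0, 0, 1, 0, 1, 0, 1, 1]   # observed events match expected per group
poorly_calibrated = [0, 0, 0, 0, 1, 1, 1, 1]  # low group over-predicted, high group under-predicted

hl_good = hosmer_lemeshow(calibration_groups(well_calibrated, risk, groups=2))
hl_poor = hosmer_lemeshow(calibration_groups(poorly_calibrated, risk, groups=2))
print(round(hl_good, 4), round(hl_poor, 4))
```

The observed and expected counts per group are also the points drawn in a calibration plot. In the usual formulation the statistic is compared with a chi-square distribution with the number of groups minus two degrees of freedom; that step is omitted here to keep the sketch free of external libraries.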
Fig. 1

Example graphs produced by PredictABEL. (a) Distributions of predicted risks in individuals with and without age-related macular degeneration (AMD); (b) ROC plot presenting risk models without and with genetic variants; (c) calibration plot comparing predicted risks with observed risks. Panels a and c present the risk model based on genetic and non-genetic risk factors

Finally, Table 2 presents an example of the reclassification table and statistics that are produced by PredictABEL. The reclassification table presents the categorization into risk groups according to the initial and updated risk models. The table provides information about the total number of individuals that change between risk categories and about correct and incorrect reclassification. The percentage of total reclassification and NRI are calculated from the reclassification table. The table indicates that a net 8.8% of the individuals without AMD and 9.6% of those with AMD would be correctly reclassified when the clinical model is updated by the addition of genetic factors.
Table 2

Reclassification table comparing clinical risk models without and with genetic factors

Without genetic predictors | With genetic predictors: <10% | 10–35% | >35% | Reclassified: increased risk | Reclassified: decreased risk | Net correctly reclassified (%)
Individuals without AMD
  <10% | 2,187 | 459 | 0 | | |
  10–35% | 1,225 | 2,913 | 357 | 816 | 1,520 | 8.8
  >35% | 15 | 280 | 577 | | |
Individuals with AMD
  <10% | 53 | 34 | 0 | | |
  10–35% | 93 | 919 | 326 | 360 | 170 | 9.6
  >35% | 1 | 76 | 485 | | |

Net reclassification improvement 18.4% (95% CI 15.8–20.9); P < 0.001

AMD age-related macular degeneration, CI confidence interval. Values are numbers unless otherwise indicated. The cut-off risk thresholds chosen are for illustration purposes only and do not reflect clinically significant categories
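The reclassification statistics can be retraced from the cell counts of Table 2. The sketch below is plain Python, not the package's code. The counts are transcribed from the table, with cell boundaries checked against the reported numbers of individuals moving to higher and lower risk categories.

```python
from typing import List, Tuple

# Rows: risk category without genetic predictors (low, middle, high).
# Columns: risk category with genetic predictors (low, middle, high).
without_amd = [
    [2187, 459, 0],
    [1225, 2913, 357],
    [15, 280, 577],
]
with_amd = [
    [53, 34, 0],
    [93, 919, 326],
    [1, 76, 485],
]


def movements(table: List[List[int]]) -> Tuple[int, int, int]:
    """Return (moved to higher category, moved to lower category, total)."""
    size = len(table)
    up = sum(table[i][j] for i in range(size) for j in range(size) if j > i)
    down = sum(table[i][j] for i in range(size) for j in range(size) if j < i)
    total = sum(sum(row) for row in table)
    return up, down, total


up_ne, down_ne, n_ne = movements(without_amd)  # individuals without AMD
up_e, down_e, n_e = movements(with_amd)        # individuals with AMD

# Moving down is correct for individuals without the disease,
# moving up is correct for individuals with the disease.
net_nonevents = (down_ne - up_ne) / n_ne
net_events = (up_e - down_e) / n_e
nri = net_events + net_nonevents

# Share of all individuals who changed risk category in either direction.
total_reclassified = (up_ne + down_ne + up_e + down_e) / (n_ne + n_e)

print(up_ne, down_ne, up_e, down_e)
print(round(100 * net_nonevents, 1), round(100 * net_events, 1))
print(round(100 * total_reclassified, 1))
```

The two components round to 8.8% and 9.6%, matching the table. Their unrounded sum is about 18.3%, close to the reported NRI of 18.4%, which equals the sum of the rounded components. The confidence interval and P value require a variance estimate that is not reproduced here.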

Conclusions

PredictABEL is a comprehensive software package, designed for the development and assessment of genetic risk prediction models. PredictABEL is part of the GenABEL software suite for statistical genomics [14, 15] and is for that reason written in R, to enable easy transfer of data from gene discovery to genetic prediction studies. A detailed manual is available that demonstrates and explains all the functions in the package. The manual is accessible to researchers who do not regularly use R software. The manual and the package are freely available from the GenABEL project website (http://www.genabel.org) and from CRAN (http://cran.r-project.org/). The current version of PredictABEL (version 1.1) includes all basic descriptive tables, measures and plots that are used in the assessment of risk prediction models. Planned extensions of the package include other strategies to construct risk models, e.g., using Cox proportional hazards analysis for prospective data, and functions to construct simulated data for the evaluation of genetic risk models [13]. Furthermore, we will optimize the interconnectivity between PredictABEL and other packages in the GenABEL suite. Where the GRIPS Statement aims to improve the transparency, quality and completeness of reporting [5, 6], PredictABEL has similar goals for the assessment of genetic risk prediction studies. The collection of all measures and plots in a single software package gives a comprehensive overview of the various measures that are available for the assessment of risk prediction studies. This overview emphasizes that different measures are available to answer different questions in the assessment of risk models and facilitates the selection of the most appropriate measure for the question under study.
References (14 in total)

1.  GenABEL: an R library for genome-wide association analysis.

Authors:  Yurii S Aulchenko; Stephan Ripke; Aaron Isaacs; Cornelia M van Duijn
Journal:  Bioinformatics       Date:  2007-03-23       Impact factor: 6.937

2.  Assessing new biomarkers and predictive models for use in clinical practice: a clinician's guide.

Authors:  Kevin McGeechan; Petra Macaskill; Les Irwig; Gerald Liew; Tien Y Wong
Journal:  Arch Intern Med       Date:  2008-11-24

Review 3.  A comparison of goodness-of-fit tests for the logistic regression model.

Authors:  D W Hosmer; T Hosmer; S Le Cessie; S Lemeshow
Journal:  Stat Med       Date:  1997-05-15       Impact factor: 2.373

4.  The meaning and use of the area under a receiver operating characteristic (ROC) curve.

Authors:  J A Hanley; B J McNeil
Journal:  Radiology       Date:  1982-04       Impact factor: 11.105

5.  Use and misuse of the receiver operating characteristic curve in risk prediction.

Authors:  Nancy R Cook
Journal:  Circulation       Date:  2007-02-20       Impact factor: 29.690

6.  Criteria for evaluation of novel markers of cardiovascular risk: a scientific statement from the American Heart Association.

Authors:  Mark A Hlatky; Philip Greenland; Donna K Arnett; Christie M Ballantyne; Michael H Criqui; Mitchell S V Elkind; Alan S Go; Frank E Harrell; Yuling Hong; Barbara V Howard; Virginia J Howard; Priscilla Y Hsue; Christopher M Kramer; Joseph P McConnell; Sharon-Lise T Normand; Christopher J O'Donnell; Sidney C Smith; Peter W F Wilson
Journal:  Circulation       Date:  2009-04-13       Impact factor: 29.690

7.  Prediction model for prevalence and incidence of advanced age-related macular degeneration based on genetic, demographic, and environmental variables.

Authors:  Johanna M Seddon; Robyn Reynolds; Julian Maller; Jesen A Fagerness; Mark J Daly; Bernard Rosner
Journal:  Invest Ophthalmol Vis Sci       Date:  2008-12-30       Impact factor: 4.799

Review 8.  The continuum of translation research in genomic medicine: how can we accelerate the appropriate integration of human genome discoveries into health care and disease prevention?

Authors:  Muin J Khoury; Marta Gwinn; Paula W Yoon; Nicole Dowling; Cynthia A Moore; Linda Bradley
Journal:  Genet Med       Date:  2007-10       Impact factor: 8.822

9.  Strengthening the reporting of genetic risk prediction studies: the GRIPS statement.

Authors:  A Cecile J W Janssens; John P A Ioannidis; Cornelia M van Duijn; Julian Little; Muin J Khoury
Journal:  Eur J Epidemiol       Date:  2011-04       Impact factor: 8.082

10.  Predictive testing for complex diseases using multiple genes: fact or fiction?

Authors:  A Cecile J W Janssens; Yurii S Aulchenko; Stefano Elefante; Gerard J J M Borsboom; Ewout W Steyerberg; Cornelia M van Duijn
Journal:  Genet Med       Date:  2006-07       Impact factor: 8.822
