Parichoy Pal Choudhury, Paige Maas, Amber Wilcox, William Wheeler, Mark Brook, David Check, Montserrat Garcia-Closas, Nilanjan Chatterjee.
Abstract
This report describes an R package, called the Individualized Coherent Absolute Risk Estimator (iCARE), that allows researchers to build and evaluate models for absolute risk and to apply them to estimate an individual's risk of developing a disease during a specified time interval, based on a set of user-defined input parameters. An attractive feature of the software is that it lets users update models rapidly as new knowledge about risk factors emerges, and tailor models to different populations, by specifying three input arguments: a model for relative risk, an age-specific disease incidence rate, and the distribution of risk factors in the population of interest. The tool handles missing risk-factor information for the individuals whose risks are to be predicted using a coherent approach in which all estimates are derived from a single model after appropriate model averaging. The software allows single nucleotide polymorphisms (SNPs) to be incorporated into the model using published odds ratios and allele frequencies. The validation component of the software implements methods for evaluating model calibration, discrimination and risk stratification in independent validation datasets. We illustrate the utility of iCARE for building, validating and applying absolute risk models using breast cancer as an example.
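The abstract's three inputs (a relative-risk model, an age-specific incidence rate, and a risk-factor distribution) can be combined into an absolute risk roughly as follows. This Python sketch is a minimal illustration under stated assumptions, not iCARE's R code or API: relative risks follow a log-linear model exp(beta · z); the baseline hazard is obtained by dividing the incidence rate by the average relative risk in a reference sample (a simplification that ignores how the risk-factor mix of disease-free survivors shifts with age); and hazards, including the competing mortality hazard, are constant within one-year age bands. All function names and inputs are hypothetical.

```python
import math
from typing import List, Sequence


def relative_risk(beta: Sequence[float], z: Sequence[float]) -> float:
    """Relative risk exp(beta . z) under a log-linear relative-risk model."""
    return math.exp(sum(b * x for b, x in zip(beta, z)))


def calibrate_baseline(
    incidence: Sequence[float],
    beta: Sequence[float],
    reference: Sequence[Sequence[float]],
) -> List[float]:
    """Simplified calibration: baseline(t) = incidence(t) / mean reference relative risk.

    Assumption: the risk-factor distribution of disease-free survivors does not
    change with age, so one reference average is used for every year.
    """
    mean_rr = sum(relative_risk(beta, z) for z in reference) / len(reference)
    return [rate / mean_rr for rate in incidence]


def absolute_risk(baseline: Sequence[float], mortality: Sequence[float], rr: float) -> float:
    """Risk of disease over the interval, accounting for competing mortality.

    Each list entry is one age year; hazards are assumed constant within a year.
    """
    if len(baseline) != len(mortality):
        raise ValueError("baseline and mortality must cover the same age years")
    risk, survival = 0.0, 1.0  # survival = P(alive and disease-free at start of year)
    for h0, mu in zip(baseline, mortality):
        lam = h0 * rr      # individual's disease hazard in this year
        total = lam + mu   # hazard of leaving the risk set for either reason
        if total > 0.0:
            # P(disease occurs first within the year | at risk at its start)
            risk += survival * lam / total * (1.0 - math.exp(-total))
            survival *= math.exp(-total)
    return risk


# Hypothetical inputs: one binary risk factor with relative risk 2,
# ten years of incidence and mortality rates, and a tiny reference sample.
beta = [math.log(2.0)]
reference = [[0.0], [0.0], [1.0], [1.0]]
baseline = calibrate_baseline([0.003] * 10, beta, reference)
print(absolute_risk(baseline, [0.005] * 10, relative_risk(beta, [1.0])))
```

By construction, averaging the calibrated baseline times each reference subject's relative risk reproduces the input incidence rate, which is the sense in which the model is "calibrated to population incidence" in this simplified form.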
Year: 2020 PMID: 32023287 PMCID: PMC7001949 DOI: 10.1371/journal.pone.0228198
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of features of existing R packages for risk prediction.
✓ denotes the presence of the feature and × denotes absence of the feature.
| Package | Model building: calibration to population incidence | Model building: detailed family history | Model building: special option for SNP markers | Model building: imputation of missing risk factors | Model validation: full cohort | Model validation: two-phase study | Model validation: imputation of missing risk factors |
|---|---|---|---|---|---|---|---|
| riskRegression | × | × | × | × | ✓ | × | × |
| predictABEL | × | × | × | × | ✓ | × | × |
| BCRA | ✓ | × | × | × | × | × | × |
| BayesMendel | × | ✓ | × | ✓ | × | × | × |
| rmap | × | × | × | × | ✓ | ✓ | × |
| iCARE | ✓ | × | ✓ | ✓ | ✓ | ✓ | ✓ |
a These packages include some functions for model building (see Section), but those functions lack the key features shown in the table above.
b Capability to use information from multiple data sources; e.g., BCRA and iCARE can use relative risk parameters from cohort or case-control studies together with disease incidence and mortality rates from population registries.
c BCRA estimates the baseline hazard and calibrates the model to the underlying population incidence rates using the distribution of risk factors among cases in a specific study, which may not be representative of the general population. iCARE implements this step using a reference dataset that provides information on the distribution of risk factors in the general population.
d iCARE includes a special option in which independent SNP markers can be included using published estimates of odds ratios and allele frequencies.
e Inverse-probability-weighted estimators of model validation statistics are implemented; they use sampling weights to account for bias due to non-random sampling.
f BayesMendel incorporates imputation methods for certain risk factors (e.g., age) but does not implement any method for validating risk prediction models. iCARE implements a built-in imputation approach for missing risk factors that uses a reference risk-factor dataset representative of the underlying population. The standardized model validation methods in iCARE can use this built-in feature to impute missing risk factors in the validation study.
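Footnote d describes adding independent SNPs from published odds ratios and allele frequencies. One common way to do this, sketched here as an assumption rather than iCARE's documented internals, treats each SNP as having a multiplicative per-allele effect (odds ratio used as relative risk), assumes Hardy-Weinberg equilibrium, and rescales each SNP's relative risk so that its population mean is 1; reference genotypes can then be simulated from the allele frequencies alone. All names are hypothetical.

```python
import random
from typing import List, Sequence


def genotype_frequencies(allele_freq: float) -> List[float]:
    """Hardy-Weinberg frequencies of carrying 0, 1 and 2 copies of the risk allele."""
    p = allele_freq
    return [(1.0 - p) ** 2, 2.0 * p * (1.0 - p), p ** 2]


def snp_relative_risk(genotype: int, odds_ratio: float, allele_freq: float) -> float:
    """Per-allele multiplicative risk, scaled so its population mean is 1."""
    if genotype not in (0, 1, 2):
        raise ValueError("genotype must count 0, 1 or 2 risk alleles")
    # Under Hardy-Weinberg equilibrium, E[OR ** G] = (1 - p + p * OR) ** 2.
    mean_rr = (1.0 - allele_freq + allele_freq * odds_ratio) ** 2
    return odds_ratio ** genotype / mean_rr


def polygenic_relative_risk(
    genotypes: Sequence[int],
    odds_ratios: Sequence[float],
    allele_freqs: Sequence[float],
) -> float:
    """Product over SNPs, assuming they are independent and do not interact."""
    rr = 1.0
    for g, odds_ratio, p in zip(genotypes, odds_ratios, allele_freqs):
        rr *= snp_relative_risk(g, odds_ratio, p)
    return rr


def simulate_genotypes(allele_freqs: Sequence[float], n: int, seed: int = 0) -> List[List[int]]:
    """Draw n reference genotype profiles: two independent alleles per SNP."""
    rng = random.Random(seed)
    return [[sum(rng.random() < p for _ in range(2)) for p in allele_freqs] for _ in range(n)]
```

Centering each SNP at a mean relative risk of 1 keeps the overall model calibrated to the same population incidence when SNPs are added.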
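Footnote e refers to inverse-probability-weighted validation statistics. As a hedged sketch, assuming each validation subject carries a weight equal to the inverse of their probability of being sampled, the weighted area under the ROC curve and the weighted ratio of expected to observed cases can be computed as below. The pairwise O(n^2) AUC is chosen for clarity rather than speed, and the function names are hypothetical, not iCARE's.

```python
from typing import Sequence


def weighted_auc(
    risks: Sequence[float], is_case: Sequence[int], weights: Sequence[float]
) -> float:
    """IPW estimate of P(case risk > control risk); tied pairs count 1/2."""
    cases = [(r, w) for r, c, w in zip(risks, is_case, weights) if c]
    controls = [(r, w) for r, c, w in zip(risks, is_case, weights) if not c]
    num = den = 0.0
    for r_case, w_case in cases:
        for r_ctrl, w_ctrl in controls:
            pair_weight = w_case * w_ctrl  # each pair stands for w_case * w_ctrl population pairs
            den += pair_weight
            if r_case > r_ctrl:
                num += pair_weight
            elif r_case == r_ctrl:
                num += 0.5 * pair_weight
    if den == 0.0:
        raise ValueError("need at least one case and one control")
    return num / den


def weighted_expected_by_observed(
    risks: Sequence[float], is_case: Sequence[int], weights: Sequence[float]
) -> float:
    """Weighted expected cases divided by weighted observed cases (1.0 = well calibrated)."""
    expected = sum(w * r for r, w in zip(risks, weights))
    observed = sum(w for c, w in zip(is_case, weights) if c)
    if observed == 0:
        raise ValueError("no observed cases")
    return expected / observed
```

With all weights equal to 1, both functions reduce to their ordinary unweighted versions.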
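Footnote f and the abstract describe handling missing risk factors with a reference dataset so that every estimate comes from a single model. A minimal sketch of that idea, assuming categorical risk factors and exact matching (both assumptions; this is not iCARE's code): average the model's absolute risk over the reference subjects whose values agree with everything observed for the individual. `toy_risk` is a hypothetical stand-in for a fitted absolute-risk model.

```python
from typing import Callable, Optional, Sequence


def risk_with_missing(
    profile: Sequence[Optional[int]],
    reference: Sequence[Sequence[int]],
    risk_fn: Callable[[Sequence[int]], float],
) -> float:
    """Average absolute risk over reference subjects matching every observed value.

    Missing entries are None; matching reference rows supply plausible values for
    them, so the result averages one model over those imputations.
    """
    matches = [
        row for row in reference
        if all(v is None or v == x for v, x in zip(profile, row))
    ]
    if not matches:
        raise ValueError("no reference subject matches the observed risk factors")
    return sum(risk_fn(row) for row in matches) / len(matches)


def toy_risk(z: Sequence[int]) -> float:
    # Hypothetical absolute-risk function with two binary risk factors.
    return 0.01 * (1 + z[0] + 2 * z[1])


reference = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 1)]
print(risk_with_missing((1, None), reference, toy_risk))
```

Averaging absolute risks, rather than plugging in an averaged risk factor, keeps the estimate coherent: a fully observed profile gets the model's own risk, and an entirely missing profile gets the reference population's average risk.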
Fig 1. Estimated absolute risks of three women overlaid on the population distribution of absolute risk in the age interval 50-80 years.
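Fig 1 places individual risks against a population distribution. One way to express such a position numerically is a percentile rank. The sketch assumes the population distribution is available as a list of absolute risks, for example risks computed for each subject in a reference dataset; the figure does not say how its distribution was obtained, and the function name is hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def risk_percentile(individual_risk: float, population_risks: Sequence[float]) -> float:
    """Percent of the population with lower risk, counting ties as half (mid-rank)."""
    ordered = sorted(population_risks)
    below = bisect_left(ordered, individual_risk)
    ties = bisect_right(ordered, individual_risk) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)
```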
Detailed information about the classical risk factors included in the model.
| Risk factor | Variable type | Categories |
|---|---|---|
| Family history (presence or absence of disease among first-degree relatives) | Binary | Presence; absence |
| Age at menarche (years) | Categorical | ≤ 11; 11-11.5; 11.5-12; 12-13; 13-14; 14-15; > 15 |
| Parity (number of full-term pregnancies) | Categorical | Nulliparous; 1 birth; 2 births; 3 births; ≥ 4 births |
| Age at first birth (years) | Categorical | ≤ 19; 19-22; 22-23; 23-25; 25-27; 27-30; 30-34; 34-38; > 38 |
| Age at menopause (years) | Categorical | ≤ 40; 40-45; 45-47; 47-48; 48-50; 50-51; 51-52; 52-53; 53-55; > 55 |
| Height (meters) | Categorical | ≤ 1.55; 1.55-1.57; 1.57-1.60; 1.60-1.61; 1.61-1.63; 1.63-1.65; 1.65-1.66; 1.66-1.68; 1.68-1.71; > 1.71 |
| Body mass index (BMI, kg/m²) | Categorical | ≤ 21.5; 21.5-23; 23-24.2; 24.2-25.3; 25.3-26.5; 26.5-27.8; 27.8-29.3; 29.3-31.4; 31.4-34.6; > 34.6 |
| Use of hormone replacement therapy (HRT) | Categorical | Premenopausal; postmenopausal and never HRT user; postmenopausal and ever HRT user |
| Use of estrogen + progesterone combined therapy | Binary | Postmenopausal and ever user of combined therapy; otherwise |
| Use of estrogen-only therapy | Binary | Postmenopausal and ever user of estrogen-only therapy; otherwise |
| Current use of HRT | Binary | Postmenopausal and current HRT user; otherwise |
| Alcohol (drinks/week) | Categorical | None; 0-0.4; 0.4-0.8; 0.8-1.5; 1.5-3.2; 3.2-5.7; 5.7-9.8; > 9.8 |
| Smoking status | Binary | Ever; never |
| Interaction of BMI and use of HRT | Categorical | |
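The table treats continuous measurements as categories. A hedged sketch of turning a raw value into one of these categories and then into indicator variables, using the BMI cut points from the table. It assumes each interval includes its upper boundary (so 23.0 falls in 21.5-23) and uses the lowest category as the reference level; the table states neither, and the names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence

# Upper boundaries of the BMI categories listed in the table (kg/m^2).
BMI_CUTS = [21.5, 23.0, 24.2, 25.3, 26.5, 27.8, 29.3, 31.4, 34.6]


def categorize(value: float, cuts: Sequence[float]) -> int:
    """Category index 0..len(cuts); right-closed intervals, so a boundary value goes left."""
    return bisect_left(cuts, value)


def dummy_code(category: int, n_categories: int) -> List[int]:
    """Indicators for categories 1..n-1, with category 0 as the reference level."""
    return [1 if category == k else 0 for k in range(1, n_categories)]
```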
Fig 2. Plots showing model validation results.
The top left panel shows calibration of absolute risk; the top right panel shows calibration of relative risk with respect to the average risk in the validation study; the bottom left panel shows the distribution of the risk score in cases (red) and controls (black); the bottom right panel shows the population incidence rates (black) and the incidence rates estimated in the validation study (red).
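The top panels of Fig 2 compare predicted and observed risk. A minimal sketch of one way to build such a comparison, assuming subjects are split into equal-sized groups by predicted absolute risk and that every subject's outcome over the projection interval is known (no censoring, an assumption): each group's mean predicted risk is paired with its observed proportion of cases, and dividing both by their study-wide averages gives a relative-risk version like the top right panel. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def calibration_by_group(
    risks: Sequence[float], outcomes: Sequence[int], n_groups: int = 10
) -> List[Tuple[float, float]]:
    """(mean predicted risk, observed proportion) within groups of ordered predicted risk."""
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    n = len(order)
    rows = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not idx:
            continue  # fewer subjects than groups
        expected = sum(risks[i] for i in idx) / len(idx)
        observed = sum(outcomes[i] for i in idx) / len(idx)
        rows.append((expected, observed))
    return rows


def relative_calibration(
    rows: Sequence[Tuple[float, float]], risks: Sequence[float], outcomes: Sequence[int]
) -> List[Tuple[float, float]]:
    """Divide each group's predicted and observed values by the study-wide averages."""
    mean_expected = sum(risks) / len(risks)
    mean_observed = sum(outcomes) / len(outcomes)
    if mean_observed == 0:
        raise ValueError("no observed cases")
    return [(e / mean_expected, o / mean_observed) for e, o in rows]
```

Points near the diagonal in either version indicate good calibration; the relative version isolates how well the model ranks and spreads risk, independent of any overall over- or under-prediction.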