
A review of cancer risk prediction models with genetic variants.

Xuexia Wang¹, Michael J Oldani², Xingwang Zhao¹, Xiaohui Huang³, Dajun Qian⁴.

Abstract

Cancer risk prediction models are important in identifying individuals at high risk of developing cancer, which could result in targeted screening and interventions to maximize the treatment benefit and minimize the burden of cancer. The cancer-associated genetic variants identified in genome-wide or candidate gene association studies have been shown to collectively enhance cancer risk prediction, improve our understanding of carcinogenesis, and possibly result in the development of targeted treatments for patients. In this article, we review the cancer risk prediction models that have been developed for popular cancers and assess their applicability, strengths, and weaknesses. We also discuss the factors to be considered for future development and improvement of models for cancer risk prediction.


Keywords: cancer; cancer intervention; cancer risk prediction; genetic variants; risk prediction models

Year: 2014 PMID: 25288876 PMCID: PMC4179686 DOI: 10.4137/CIN.S13788

Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351

Introduction

Cancer is one of the leading causes of death worldwide. A large percentage of patients are diagnosed at an advanced stage, making the removal of tumors in this population problematic. As a result, the overall 5-year survival rate is low for this cohort of patients.1 Early stage detection would therefore help reduce cancer mortality, because treatment might be most effective at the earliest stages of the disease. For this reason, a well-established assessment model would greatly benefit patients, clinicians, and researchers because it would allow individuals at high risk to be identified at the earliest stages. Cancer is a polygenic disease in which many genetic factors appear to play important roles in the development of its different subtypes.2 To date, more than 50 cancer genome-wide association studies (GWAS) covering more than 15 different malignancies have been reported, identifying over 100 genomic cancer susceptibility regions.3 The cancer-associated genetic variants identified in GWAS or candidate gene association studies have been shown to collectively enhance cancer risk prediction, improve our understanding of carcinogenesis, and possibly result in the development of targeted treatments for patients. For example, clinicians already use guidelines of this kind when deciding on assessments to identify carriers of BRCA1 and BRCA2 mutations, which indicate very high risks of breast and ovarian cancer.4 The number of discovered cancer-associated genetic variants continues to rise rapidly, as reflected by the increasing number of published articles looking closely at the performance of genetic variants in widely used cancer risk prediction models. These studies have prompted an updated assessment of the associations between genetic variants and cancer risk. Nevertheless, to date there has been no literature review that provides an initial assessment of these publications.
This paper examines in detail the performance of cancer risk prediction models with genetic variants, drawing on relevant studies identified through PubMed, Medline, and Web of Science. It summarizes what has been learned regarding the contribution of genetic variants, as an alternative or as a supplement to the other components of risk prediction models, for breast cancer, prostate cancer, testicular cancer, lung cancer, and bladder cancer, as well as cancers of the head and neck.

Breast Cancer

Although the incidence rate of breast cancer has been declining since 1998–1999, an estimated 232,670 new female cases and 2,360 new male cases are still expected in the US in 2014 (http://www.cancer.gov/cancertopics/types/breast). Early stage detection of breast cancer is very important because treatment can be more effective at the early stages. For this reason, a well-established assessment model that could identify individuals at high risk would greatly benefit patients, clinicians, and researchers in the prevention and intervention of breast cancer. Risk prediction models have been widely used to identify individuals with high risk of breast cancer. The Gail model,5,6 for example, is used by the FDA to screen women with high risk for chemopreventive use of tamoxifen. Traditional risk factors such as family history, age at menarche, age at first live birth, number of previous breast biopsies, mammographic density, and diagnosis of atypical hyperplasia have been used to predict breast cancer as well. Recently, genetic susceptibility risk prediction has been improved with the discovery of more than 40 risk-associated single-nucleotide polymorphisms (SNPs) from GWAS. Several breast cancer susceptibility genes have now been identified, including BRCA1, BRCA2, TP53, and PTEN/MMAC1. Approximately 60% of women with an inherited mutation in BRCA1 or BRCA2 will develop breast cancer sometime during their lives, compared with about 12% of women in the general population. Women with inherited BRCA1 or BRCA2 gene mutations also have an increased risk of ovarian cancer. Thus, evaluating these genetic susceptibility prediction models has become crucial in clinical decision-making, helping physicians and patients determine whether genetic testing is warranted. Wacholder et al.7 evaluated the Gail model using 5,590 cases and 5,998 control subjects drawn from four US cohort studies and a case–control study in Poland.
The subjects ranged in age from 50 to 79 years. The area under the receiver-operating-characteristic (ROC) curve (AUC) was 58% for four traditional risk factors. After incorporating 10 genetic variants into the prediction using a logistic regression model, the Wacholder study achieved a 61.8% AUC, an increase of 3.8 percentage points over the model without genetic variants. Another notable study in breast cancer risk prediction was done by Machiela et al.8 In this study, a total of 1,145 breast cancer cases and 1,142 controls from the Nurses’ Health Study were used to build and evaluate polygenic risk scores (PRSs) with 10 to 60,000 independent SNPs showing the strongest evidence of association with breast cancer. No significant evidence was found that a PRS using common variants could improve risk prediction for breast cancer over scores built from SNPs that had been robustly replicated across several independent sample sets. Some polymorphisms identified in GWAS were also associated with an increased risk of breast cancer for BRCA1 or BRCA2 mutation carriers. Another study by Antoniou et al.9 reanalyzed the association between breast cancer and six susceptibility polymorphisms at loci including FGFR2, TNRC9/TOX3, MAP3K1, LSP1, and 2q35 using a sample of 12,525 BRCA1 and 7,409 BRCA2 mutation carriers. The six susceptibility polymorphisms were identified in recent large-scale association studies conducted by the Consortium of Investigators of Modifiers of BRCA1/2.10 Three additional SNPs (ie, rs4973768 in SLC4A7/NEK10, rs6504950 in STXBP4/COX11, and rs10941679 at 5p12) were also evaluated in this study. The interactions between SNPs were also investigated. Of the nine polymorphisms investigated, seven SNPs were found to be associated with breast cancer for BRCA2 carriers and two SNPs were associated with breast cancer for BRCA1 carriers. Additionally, interaction existed among all risk-associated polymorphisms for mutation carriers.
Based on the joint genotype distribution of seven risk-associated SNPs in BRCA2 mutation carriers, the 5% of BRCA2 carriers at highest risk were predicted to develop breast cancer by the age of 80 with a probability of 80–96%, whereas the 5% at lowest risk had a risk of only 42–50% of developing breast cancer. Thus, the authors concluded that these risk differences could be used in the day-to-day clinical management of mutation carriers. Compared to high-penetrance mutations, such as those in BRCA1 or BRCA2, all of the genetic susceptibility loci identified in GWAS to date are low-penetrance polymorphisms, with weak associations to breast cancer risk. Although each low-penetrance variant confers only a small increase in the risk of breast cancer, a combination of single variants may act cumulatively to increase the risk. For example, Sueta et al.11 analyzed 23 genetic variants identified in previous GWASs and conducted a case–control study with 697 case subjects and 1,394 controls matched on age and menopausal status in the Japanese population. They fit conditional regression models with genetic variants and conventional risk factors. In addition, they created a polygenic risk score using those variants with a statistically significant association with breast cancer risk, and evaluated the contribution of these genetic predictors using the AUC. Eleven SNPs revealed significant associations with breast cancer risk. In addition, a dose-dependent association was observed between the risk of breast cancer and the genetic risk score (GRS), which was an aggregate measure of alleles in seven selected variants. The AUC for the regression model that included the GRS in addition to the conventional risk factors was 0.6933, but it was only 0.6652 for conventional risk factors alone (P = 1.3 × 10⁻⁴). The population-attributable fraction of the risk score was 33.0%.
Thus, this kind of study indicates that risk models that include a GRS are helpful in distinguishing women at high risk of breast cancer from those at low risk, particularly in the context of targeted prevention.
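
The AUC values quoted for these models can be read as a concordance probability: the chance that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The following Python sketch illustrates that reading under stated assumptions; the function name `concordance_auc` and the risk scores are hypothetical and are not taken from any of the reviewed studies.

```python
from itertools import product


def concordance_auc(case_scores, control_scores):
    """Share of (case, control) pairs in which the case scores higher.

    Ties count as half a win. This pairwise concordance equals the
    area under the ROC curve.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks, for illustration only.
cases = [0.9, 0.7, 0.6, 0.4]
controls = [0.8, 0.5, 0.3, 0.2]
auc = concordance_auc(cases, controls)
print(f"AUC = {auc:.2f}")
```

An AUC of 0.5 corresponds to chance-level ranking, which is why gains of a few percentage points, such as those reported above, are usually described as modest.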

Prostate Cancer

Prostate cancer, behind only lung cancer, is the second leading cause of cancer-related deaths in American men. Recent data indicate that the estimated probability of being diagnosed with prostate cancer is 2.5%, 7%, and 13% for men ages 40–59 years, 60–69 years, and 70 years and older, respectively. In 2014, 233,000 new cases will be diagnosed in the US, and more than 29,480 men will die of the disease (http://www.cancer.org/cancer/prostatecancer/detailedguide/prostate-cancer-key-statistics). Prostate cancer is also a complex and unpredictable disease, with the risk affected by advancing age, ethnic background, and family history.12 Prostate cancer is usually accompanied by a rise in the concentration of serum prostate-specific antigen (PSA). PSA lacks specificity but has nevertheless been used for decades as a sensitive biomarker and has evolved into a controversial predictor of prostate cancer mortality. Prostatic biopsies are often deemed unnecessary, which underscores the need for prediction models with increased specificity to aid clinicians in deciding whether to recommend a biopsy. This is especially relevant for men with mildly elevated PSA values (3–10 ng/mL), where the risk of being diagnosed with prostate cancer is only about 20–25%.13 After diagnosis, some cancers are indolent and cause no clinical problems, whereas others progress and may become fatal. Therefore, it is important to search for biomarkers that signal a need for more aggressive treatments, potentially improving clinical outcomes. Recently, more than 30 SNPs have been found to be associated with prostate cancer.14 These SNPs provide an opportunity to identify strong candidates for a predictive role. SNPs identified in GWAS as associated with prostate cancer are common but confer only small increases in risk. The mechanisms underlying their association with prostate cancer risk remain unknown.
Xu et al.15 used multiple SNPs and family history to estimate the absolute risk for prostate cancer. These investigators examined a Swedish study with 2,893 cases and 1,781 controls and a US study, the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial, with 1,172 cases and 1,157 controls. Individuals with more than 14 risk alleles and positive family history had almost a five-fold increase in risk compared with people who had 11 risk alleles and negative family history. The study also estimated the risk of developing prostate cancer over the next 20 years for a 55-year-old man with positive family history and more than 14 risk alleles as 40%, while for men without family history and such genotypes the absolute risk fell to 13%. In another study, Sun et al.16 assessed predictive performance by employing positive predictive values (PPV) as well as sensitivity, using family history and three sets of SNPs associated with prostate cancer. This was a population-based case–control study (2,899 cases and 1,722 controls) in another Swedish population. SNPs and family history emerged as factors that can differentiate individual risk for prostate cancer and identify men at higher risk. In this particular study, the top 18% of men had a two-fold risk, while the top 8% had a three-fold risk of having prostate cancer in a 20-year period (age range: 55–74 years). In addition, the study showed that including more SNPs in the risk prediction will increase sensitivity with PPV. Other studies have combined genetic variants with PSA to predict the risk of prostate cancer. For example, a study by Johansson et al.17 combined 33 genetic variants with PSA and evaluated the risk. This was a case–control study (520 cases and 988 controls) nested within the Northern Sweden Health and Disease Cohort.
The AUC was used to assess whether the GRS of 33 SNPs in addition to pre-diagnostic PSA improves prostate cancer prediction. Adding the GRS to the model improved the AUC from 86.2% to 87.2%. Thus, it appears that adding GRSs to these models may offer little benefit for the clinical risk assessment of prostate cancer. Others such as Machiela et al.8 created a new model by applying PRSs18 and incorporating common variants to predict the risk of prostate cancer. A total of 1,164 prostate cancer cases and 1,113 controls from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial were employed in this study. PRSs built from 10 to 60,000 independent SNPs were compared with a model including only 30 published risk variants. Applying 10-fold cross-validation to the PRS model, the area under the ROC curve ranged from 0.564 (60,000 SNPs) to 0.569 (10 SNPs), while the AUC using 30 published risk SNPs from the literature was 0.614. Kote-Jarai et al.19 also conducted a study to predict the risk of prostate cancer using a multiplicative risk model, which combines the risk variants. This study used a worldwide consortium composed of 13 groups with 7,370 prostate cancer cases and 5,742 controls. Together, the loci accounted for 16% of the familial risk of the disease, and men in the top 10% of the risk distribution had roughly double the risk of prostate cancer, with an odds ratio of 2.1. The first risk prediction model for familial prostate cancer was developed by Macinnis et al.20 and incorporated 26 prostate cancer-associated SNPs identified in previous GWAS.21 Family phenotypes and histories were explained by a mixed model of inheritance, which can be used to predict the probability of developing prostate cancer for an individual. The combined study population comprised 1,832 prostate cancer patients and their relatives in Australia and 2,558 patients from prostate cancer clinics at the Royal Marsden NHS Foundation Trust (UK).
Using this predictive model, the risk of prostate cancer for a UK male can be predicted. For example, if a man’s genotype is in the top 10th percentile of the joint genotype distribution and his father was diagnosed with prostate cancer at age 70, he would have a cumulative risk of 33% of developing prostate cancer by age 85. For a male with a genotype risk within the bottom 10%, the risk of developing prostate cancer would be 23%. In comparison, without SNP information incorporated into this kind of model, the risk is 22% for a UK man. Finally, Lindström et al.22 combined a series of risk models and estimated their performance in 7,509 prostate cancer cases and 7,652 controls within the National Cancer Institute Breast and Prostate Cancer Cohort Consortium. The investigators also calculated absolute risks based on the Surveillance, Epidemiology, and End Results incidence data. The best risk model included individual genetic markers and family history of prostate cancer. They observed a decreasing trend in discriminative ability with advancing age, with the highest accuracy in men younger than 60 years. The absolute 10-year risk for 50-year-old men with a family history ranged from 1.6% (10th percentile of genetic risk) to 6.7% (90th percentile of genetic risk). For men without a family history, the risk ranged from 0.8% (10th percentile of genetic risk) to 3.4% (90th percentile of genetic risk). These results indicate that incorporating both genetic information and family history into prostate cancer risk models can be particularly useful for identifying younger men who might benefit from PSA screening.

Testicular Cancer

Testicular cancer remains the most common form of cancer in men between the ages of 15 and 35 (http://www.ncbi.nlm.nih.gov/pubmedhealth/PMH0002266/). It is also the most treatable form of cancer, with a survival rate greater than 95% for the least aggressive type. The risk of this type of cancer has been reported to be 8- to 10-fold higher for brothers and two- to four-fold higher for the sons of men who previously had testicular cancer.23–27 Familial studies have estimated that genetic effects account for nearly a quarter of testicular cancer risk, which is one of the largest estimated heritabilities reported for any type of cancer.28 More specifically, GWASs have implicated multiple genomic regions associated with testicular cancer risk, including those containing KITLG, SPRY4, BAK1, ATF7IP, DMRT1, and TERT.29–33 Previously published GWAS and candidate gene data have also been used to build a multiplicative model with risk variants and estimate the AUC as a measure of discrimination between testicular cancer cases and controls. The study by Kratz et al.34 is one such example: it used previously uncovered predisposition alleles in or near KITLG, BAK1, SPRY4, TERT, ATF7IP, and DMRT1 29–33 to predict the risk of testicular cancer by employing ROC curve analysis. The authors reported an AUC of 69.2%, meaning that about 69.2% of the time a randomly selected testicular cancer patient had a higher estimated risk than a randomly selected control subject.
Another study showed how several established testicular germ cell tumor risk factors, such as cryptorchidism (relative risk (RR) = 4.8) and male infertility (standardized incidence ratio (SIR) = 2.8), can be incorporated into the clinical model to predict the risk.35 Under this kind of multiplicative model, the authors estimated that white men in the top 1% of genetic risk as defined by eight risk variants had a relative risk that was 10.5-fold greater than that for a general population of similar male subjects. It should also be noted that white men are more likely to develop testicular cancer than African-American and Asian-American men. Because of this race/ethnicity effect, cancer risk prediction needs to be tailored for specific populations. Additionally, GWAS needs to be extended into different populations. Each population requires models developed with its specific characteristics in mind in order to improve overall risk assessment of testicular cancer.
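
As a rough illustration of how a multiplicative model turns a multi-locus genotype into a relative risk against the general population, the Python sketch below multiplies per-allele odds ratios and divides by the population-average risk at each locus. It is a sketch under explicit assumptions, not the method of the cited studies: loci are treated as independent, genotypes are assumed to follow Hardy–Weinberg proportions, odds ratios are used as approximations of relative risks, and the function name, odds ratios, and allele frequencies are all hypothetical.

```python
def relative_risk_vs_population(genotype, odds_ratios, risk_allele_freqs):
    """Relative risk of one genotype versus the population average.

    genotype          -- risk-allele count (0, 1 or 2) at each locus
    odds_ratios       -- assumed per-allele odds ratio at each locus
    risk_allele_freqs -- assumed risk-allele frequency at each locus
    """
    if not (len(genotype) == len(odds_ratios) == len(risk_allele_freqs)):
        raise ValueError("all inputs must describe the same loci")
    relative_risk = 1.0
    for count, odds_ratio, freq in zip(genotype, odds_ratios, risk_allele_freqs):
        # Risk of this genotype relative to a non-carrier (multiplicative per allele).
        genotype_risk = odds_ratio ** count
        # Mean of odds_ratio**G when G ~ Binomial(2, freq), i.e. Hardy-Weinberg.
        population_mean_risk = ((1.0 - freq) + freq * odds_ratio) ** 2
        relative_risk *= genotype_risk / population_mean_risk
    return relative_risk


# Hypothetical two-locus example, for illustration only.
rr = relative_risk_vs_population(
    genotype=[2, 1],
    odds_ratios=[2.0, 1.5],
    risk_allele_freqs=[0.5, 0.2],
)
print(f"relative risk vs population average = {rr:.2f}")
```

Because the normalization depends on allele frequencies, the same genotype can map to different relative risks in different populations, which is one reason such models need population-specific calibration.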

Lung Cancer

Lung cancer remains the most common form of human cancer with complex risk factors, including genetic and environmental effects. Heredity plays an important role, and in relatives of people with lung cancer, the risk is increased 2.4 times.36,37 This may be due, for example, to risks associated with genetic polymorphisms.38–43 Environmental factors, such as a history of smoking, are central to several proposed lung cancer risk assessment models. These models include the Bach model, Spitz model, and Liverpool Lung Project (LLP) model44–47 as well as the improved models based on LLP.48,49 The LLP risk model,45 developed from the LLP case–control study, provides a single unified model for smokers (current and former) and nonsmokers, whereas the Bach model was developed for predicting risk only in smokers and the Spitz model46 requires three separate models for predicting risk in current smokers, former smokers, and nonsmokers. The LLP model also accounts for important lung cancer risk factors in addition to age, sex, and smoking duration. These include a history of pneumonia, a history of non-lung cancer, prior asbestos exposure, and family history. Overall, this comprehensive model is simpler to incorporate into a clinical setting than Tammemagi and colleagues’ model,47 which includes many smoking-related variables that may be difficult to obtain from patients during clinical exchanges. Other models have been developed in order to predict the 5-year absolute risk of lung cancer. For example, Raji et al.48 worked within the framework of the LLP risk model, which is based on five epidemiologic risk factors, and quantified the improvement in risk prediction from adding the SEZ6L Met430Ile polymorphic variant, which has been linked with an increased risk of lung cancer. In this predictive model, the authors combined the SEZ6L SNP genotypes of 388 LLP subjects with epidemiologic risk factors.
They used multivariable conditional logistic regression, with and without the SEZ6L SNP, to predict the 5-year absolute risk of lung cancer. Pair-wise comparison of the AUC and the net reclassification improvement (NRI) were also used to assess the improvement in the model with and without the SEZ6L SNP. The authors found a modest, statistically significant increase in AUC when SEZ6L was added to the baseline model. The reported NRI was 27% for the model with the SNP and 15% for the model without it. Raji et al. also further evaluated the LLP risk model in terms of discrimination and its ability to demonstrate a predictive benefit for stratifying patients for computed tomography (CT) screening.49 These investigators assessed the 5-year absolute risks for lung cancer predicted by the LLP model in both case–control and prospective cohort studies, using data from three independent studies from Europe and North America: the European Early Lung Cancer (EUELC) study, the Harvard case–control study, and the LLP population-based prospective cohort (LLPC) study. The LLP risk model produced good discrimination in both the Harvard (AUC, 0.76 [95% confidence interval, CI, 0.75–0.78]) and the LLPC (AUC, 0.82 [CI, 0.80–0.85]) studies and modest discrimination in the EUELC (AUC, 0.67 [CI, 0.64–0.69]) study. The decision utility analysis, which incorporates the harm and benefit of using a risk model to make clinical decisions, indicated that the LLP risk model performed better than smoking duration or family history alone in stratifying high-risk patients for lung cancer CT screening. However, this evaluation could not assess whether the incorporation of other risk factors, such as lung function or genetic markers, would improve accuracy. In particular, the lack of information on asbestos exposure in the LLPC limited the ability to validate the complete LLP risk model.
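
For readers unfamiliar with the NRI, the Python sketch below shows the standard categorical form of the measure: the net proportion of cases moved to a higher risk category plus the net proportion of controls moved to a lower one when a new model replaces an old one. It is a minimal illustration, not a reconstruction of the analysis by Raji et al.; the function name `categorical_nri`, the risk categories, and the toy data are hypothetical.

```python
def categorical_nri(old_categories, new_categories, is_case):
    """Categorical net reclassification improvement.

    NRI = [P(up | case) - P(down | case)] + [P(down | control) - P(up | control)]
    where "up" and "down" describe movement between ordered risk categories
    when going from the old model to the new model.
    """
    if not (len(old_categories) == len(new_categories) == len(is_case)):
        raise ValueError("inputs must have the same length")
    up_case = down_case = n_case = 0
    up_control = down_control = n_control = 0
    for old, new, case in zip(old_categories, new_categories, is_case):
        if case:
            n_case += 1
            up_case += new > old
            down_case += new < old
        else:
            n_control += 1
            up_control += new > old
            down_control += new < old
    if n_case == 0 or n_control == 0:
        raise ValueError("need at least one case and one control")
    return (up_case - down_case) / n_case + (down_control - up_control) / n_control


# Hypothetical risk categories (0 = low, 1 = medium, 2 = high), illustration only.
old = [0, 1, 1, 0, 1, 0, 1, 0]
new = [1, 1, 2, 0, 0, 0, 1, 1]
case_flags = [True, True, True, True, False, False, False, False]
nri = categorical_nri(old, new, case_flags)
print(f"NRI = {nri:.2f}")
```

In this toy example two of four cases move up and none move down, while one control moves down and one moves up, so the case component is 0.5 and the control component is 0.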
Individual genetic susceptibility loci confer only a small to moderate disease risk and, taken singly, appear to be of limited utility in risk prediction. Li et al.50 combined multiple disease-related loci with modest effects into a GRS and identified subgroups that were at high risk of lung cancer in a Chinese population. In their case–control study, they evaluated the discriminatory and predictive ability of the cumulative effect of several SNPs associated with lung cancer risk. Five SNPs identified in previous GWAS or large cohort studies were genotyped in 5,068 Chinese case–control subjects. The GRS based on these SNPs was estimated by two approaches: a simple risk allele count (cGRS) and a weighted method (wGRS). The AUC in combination with the bootstrap resampling method was used to assess the predictive performance of the GRS for lung cancer. Four independent SNPs were found to be associated with a risk of lung cancer. The wGRS based on these four SNPs was a better predictor than the cGRS. Using a liability threshold model, they estimated that these four SNPs accounted for only 4.02% of genetic variance in lung cancer. As in other studies, smoking history contributed significantly to lung cancer risk (P < 0.001) (AUC = 0.619 [0.603–0.634]), with the AUC rising to 0.639 (0.621–0.652) after incorporation of the wGRS and adjustment for over-fitting. Ultimately, this model shows some promise for assessing lung cancer risk in a Chinese population. Black/white disparities in lung cancer incidence and mortality mandate an evaluation of underlying biological differences. Etzel et al.51 have previously shown higher risks of lung cancer associated with prior emphysema in African-American populations compared with white patients with lung cancer.
Spitz et al.52 further evaluated a panel of 1,440 inflammatory gene variants in a two-phase analysis (discovery and replication), adding top GWAS lung cancer hits from white populations and 28 SNPs from a published gene panel. The discovery set (477 self-designated African-American cases and 366 controls matched on age, ethnicity, and gender) was from Houston, Texas. The external replication set (330 cases and 342 controls) was from the EXHALE study at Wayne State University. In discovery, 154 inflammation SNPs were significant (P < 0.05) on univariate analysis. One inflammation SNP, rs950286, which is intergenic between the IRF4 and EXOC2 genes, was successfully replicated, with a concordant odds ratio of 1.46 (1.14–1.87) in discovery, 1.37 (1.05–1.77) in replication, and a combined odds ratio of 1.40 (1.17–1.68). These researchers also constructed and validated an epidemiologic risk prediction model and an extended model. In the discovery set, the AUC was 0.77 for the epidemiologic model and 0.80 for the extended model; for the combined datasets, the AUC values were 0.75 and 0.76, respectively.
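
The two GRS constructions mentioned for the study by Li et al., a simple risk allele count and a weighted score, can be sketched in Python as follows. Weighting each allele count by the natural logarithm of its odds ratio is a common convention and is assumed here; the source does not state the exact weights used. The function names, allele counts, and odds ratios are hypothetical.

```python
import math


def count_grs(risk_allele_counts):
    """Count-based GRS: total number of risk alleles carried."""
    return sum(risk_allele_counts)


def weighted_grs(risk_allele_counts, odds_ratios):
    """Weighted GRS: allele counts weighted by log odds ratio (assumed weights)."""
    if len(risk_allele_counts) != len(odds_ratios):
        raise ValueError("one odds ratio is needed per SNP")
    return sum(
        count * math.log(odds_ratio)
        for count, odds_ratio in zip(risk_allele_counts, odds_ratios)
    )


# Hypothetical genotype at four SNPs and hypothetical per-allele odds ratios.
allele_counts = [2, 1, 0, 1]
per_allele_or = [1.5, 1.2, 1.1, 1.3]

cgrs = count_grs(allele_counts)
wgrs = weighted_grs(allele_counts, per_allele_or)
print(f"cGRS = {cgrs}, wGRS = {wgrs:.3f}")
```

The count score treats every risk allele as equally important, whereas the weighted score lets SNPs with larger effects contribute more, which is one plausible reason a weighted score might discriminate better.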

Bladder Cancer

Bladder cancer remains a major health issue worldwide. In the US, bladder cancer is the fourth most common tumor in men, and an estimated 74,690 new diagnoses are expected in 2014 (http://www.cancer.org/cancer/bladdercancer/detailedguide/bladder-cancer-key-statistics). The disease generally presents in older individuals and is more common in men than women, with higher frequency among white patients than those of other ethnicities. Smoking is the most widely recognized cause of bladder cancer and accounts for half of all cases in the US. The first risk prediction model for bladder cancer was developed by Wu et al.53 in 2007. Patient epidemiologic and genetic data from a case–control study were used to build risk prediction models and construct ROC curves. The AUC was used to evaluate the model’s discriminatory ability. The model was built on 678 white patients and 678 controls and included mutagen sensitivity and pack-years as well as six other risk factors; it achieved an AUC of 0.80, demonstrating good discrimination ability. In 2009,54 the same group added three bladder cancer predisposition SNPs to the risk prediction model but found no improvement in discrimination power. However, Chen and colleagues54 also pointed out that, with advances in computing power and statistical tools, other risk factors such as gene–gene and gene–environment interactions may offer greatly improved risk prediction. In a recent paper published in Cancer Research, Garcia-Closas and colleagues55 examined how genetic variants recently identified in GWAS for bladder cancer interact with smoking status to influence bladder cancer risk. The authors identified a new high-risk subgroup of individuals, current smokers carrying the highest genetic risk burden, who could be targeted for behavioral interventions and/or early detection protocols.
This article was the first to evaluate gene–environment interactions on the risk difference scale, which indicates a new direction in bladder cancer prevention. Using data from seven studies, including 3,942 patients and 5,680 controls of European ancestry, the team investigated additive and multiplicative interactions between smoking status and 12 SNPs on the risk of developing bladder cancer. The SNPs selected for inclusion were recently identified bladder cancer susceptibility hits or known smoking metabolizing variants. To determine the combined effect of the SNPs across loci, the researchers created a PRS divided into quartiles running from lowest to highest genetic risk. Smoking was assessed as lifetime history (ever/never) and as smoking status at the time of enrolment into the study (current/former/never). To gauge the public health relevance of their findings, they calculated the absolute risks resulting from the joint effects of smoking and the SNPs, and reported gene–environment interactions on the risk difference rather than the relative risk of bladder cancer. Garcia-Closas and colleagues found that the cumulative 30-year absolute risk for bladder cancer in a 50-year-old US male varied by smoking status: 1.3% for never smokers, 3.0% for former smokers, and 6.2% for current smokers, confirming the importance of smoking as a strong risk factor for bladder cancer. When they factored in the PRS quartiles, the cumulative 30-year absolute risk for bladder cancer in a 50-year-old US male who is a current smoker and who carries the highest genetic risk rose to 9.9%. Furthermore, they reported highly significant additive interactions between risk differences for smoking status across levels of PRS. They found that over four times more bladder cancer cases would be prevented if smoking were eliminated from the highest genetic risk group (n = 8,200 per 100,000 men) compared with the lowest genetic risk group (n = 2,000 per 100,000 men; P < 0.0001).
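
The arithmetic behind these risk-difference comparisons can be checked with a few lines of Python. The sketch below uses only the figures quoted above; treating the preventable cases per 100,000 men as risk differences between smokers and non-smokers within each genetic risk group is our reading of the text, and the function and variable names are hypothetical.

```python
def risk_difference_per_100k(risk_exposed, risk_unexposed):
    """Absolute risk difference expressed as cases per 100,000 people."""
    return round((risk_exposed - risk_unexposed) * 100_000)


# 30-year absolute risks quoted in the text for a 50-year-old US male.
never_smoker_risk = 0.013
current_smoker_risk = 0.062
overall_difference = risk_difference_per_100k(current_smoker_risk, never_smoker_risk)

# Preventable cases per 100,000 men quoted in the text, by genetic risk group.
prevented_highest_prs = 8_200
prevented_lowest_prs = 2_000
ratio = prevented_highest_prs / prevented_lowest_prs
additive_contrast = prevented_highest_prs - prevented_lowest_prs

print(f"current vs never smokers: {overall_difference} cases per 100,000")
print(f"highest vs lowest genetic risk: ratio {ratio:.1f}, "
      f"difference {additive_contrast} cases per 100,000")
```

The ratio of 4.1 matches the statement that over four times more cases would be prevented in the highest genetic risk group, and the contrast of 6,200 cases per 100,000 is the kind of quantity an additive interaction analysis examines.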

Head and Neck Cancer

The incidence of head and neck cancer has increased markedly in the last 20 years. Head and neck cancers account for about 3–5% of all cancers in the US. This year, an estimated 55,070 people (40,220 men and 14,850 women) will develop head and neck cancers and 12,000 deaths (8,600 men and 3,400 women) will occur (http://www.cancer.net/cancer-types/head-and-neck-cancer/statistics). Cigarette smoking is associated with increased head and neck cancer risk, and tobacco-related carcinogens are known to cause bulky DNA adducts. Nucleotide excision repair genes encode enzymes that remove adducts and may be independently associated with head and neck cancer risk, as well as modifying the association between smoking and head and neck cancer risk.56–58 Several studies have reported that SNPs of genes in multiple biological pathways are involved in the development of head and neck cancer.59–63 Recently, Annah et al.61 performed a two-stage GWAS with a total of 8,605 cases and 11,405 controls and reported that five genetic variants had significant associations with risk of upper aerodigestive tract cancers, including head and neck cancer, in Europeans. As more SNPs associated with head and neck cancer are identified, the development of risk prediction models is catching up. A study by Wu et al.64 used a customized chip containing 9,645 chromosomal and mitochondrial SNPs (mtSNPs) to call genotypes for 150 early stage head and neck cancer patients and 300 controls. The goal was to model second primary tumor or head and neck cancer recurrence using both clinical and epidemiological variables. Results showed that when 12 chromosomal SNPs and one mtSNP were incorporated into the model, the AUC increased from 0.64 to 0.84. The 95% CI of the AUC difference was 0.18–0.29, indicating a significant improvement in discrimination power.

Discussion

Cancer is a polygenic disease in which many genetic factors appear to play important roles in the development of its different subtypes.2 During the past several years, more than 100 SNPs have been identified that are associated with cancer.3 How to effectively incorporate these genetic susceptibility variants in risk prediction models has become increasingly important in clinical decision-making because effective models can help physicians and patients determine whether genetic testing is needed. Although enormous progress has been made in genetics and in the prediction of cancer susceptibility, caution is warranted when considering the application of these models in the clinical setting. Cancer remains a fundamentally complex disease with multiple, interacting risk factors. These risk factors include race/ethnicity, environmental carcinogens, family history, genetic variants, and their interactions. The studies reviewed here should be understood as an initial attempt at a more systematic approach to assessing predictive risk models for cancer. To predict the overall risk of cancer in patients more accurately, risk prediction models need to be continuously reexamined, comprehensively assessed, and revised, taking into consideration specific populations and emergent subtypes of cancer. Table 1 indicates that the cancer risk prediction models with genetic variants generally outperform the models without genetic variants in both discrimination and prediction of cancer. However, there are still many practical concerns about implementing genetic testing in the diagnostic process. For example, the substantial cost of genetic screening is one of the main concerns.

Table 1

Performance of cancer risk prediction models with genetic variants.

CANCER TYPE	RISK PREDICTION MODEL	TYPE OF GENETIC FACTOR	MEASURE OF PERFORMANCE	CONTRIBUTION OF VARIANTS TO PREDICTION	REFERENCE
Breast	Logistic regression model	Individual SNP and GRS	AUC	Helpful	7
	Logistic regression model	PRS	AUC	Not helpful	8
	Conditional regression model	PRS	AUC	Helpful	11
Prostate	Logistic regression model	GRS	Relative risk	Helpful	15
	Multiplicative model	Individual SNPs	PPV and sensitivity	Helpful	16
	Logistic regression model	GRS	AUC	Not helpful	17
	Logistic regression and multiplicative model	Individual SNPs and GRS	Overall familial risk	Helpful	19
	Mixed recessive model	PRS	LRT and AIC	Helpful	20
	Logistic regression model	Individual SNPs and GRS	AUC	Helpful	22
Testicular	Multiplicative model	Individual SNPs	AUC	Helpful	34
Lung	Conditional logistic regression	Individual SNPs	AUC and NRI	Helpful	48
	Logistic regression model	GRS	AUC	Helpful	50
	Logistic regression model	Individual SNPs	AUC	Helpful	52
Bladder	Logistic regression model	Individual SNPs	AUC	Helpful	53
Bladder	Logistic regression model	Individual SNPs and PRS	Bootstrap resampling	Helpful	55
HNC	Cox proportional hazard model	Individual SNPs	AUC	Helpful	64

Abbreviations: NRI, net reclassification improvement; PPV, positive predictive value; AUC, area under the receiver operating characteristic (ROC) curve; PRS, polygenic risk score; GRS, genetic risk score; HNC, head and neck cancer; PSA, prostate-specific antigen; AIC, Akaike's information criterion; LRT, likelihood ratio test.

Table 2 summarizes the frequently used risk prediction models with genetic factors and the general modeling procedures. The most commonly used model is the logistic regression model. When dealing with multiple genetic factors and other covariates, logistic regression assumes that the predictors combine linearly on the log-odds scale and uses a logit link to combine them into a one-dimensional fitted value.
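As a minimal sketch of the logistic model just described, the code below fits a single-predictor model, logit P(case) = b0 + b1 × GRS, by Newton-Raphson. Everything in it is an assumption made for illustration: the data are simulated, the GRS is an unweighted count of risk alleles over ten hypothetical SNPs, and `fit_logistic` is a hypothetical helper rather than a method taken from any of the reviewed studies.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, n_iter=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # entries of the 2x2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the score vector.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Simulated cohort: GRS = number of risk alleles carried across 10 SNPs.
rng = random.Random(42)
n_snps, n_subjects = 10, 500
grs = [
    sum((rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(n_snps))
    for _ in range(n_subjects)
]
# Disease status drawn from an assumed true model with log-odds -3.0 + 0.4 * GRS.
status = [1 if rng.random() < sigmoid(-3.0 + 0.4 * g) else 0 for g in grs]

b0, b1 = fit_logistic(grs, status)
print(f"intercept: {b0:.2f}")
print(f"log-odds per risk allele: {b1:.2f} (odds ratio {math.exp(b1):.2f})")
```

The fitted value b0 + b1 × GRS is the one-dimensional linear predictor; passing it through `sigmoid` gives the predicted probability of being a case.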

Table 2

The frequently used risk prediction models and general modeling procedures.

RISK PREDICTION MODELS	TYPE OF GENETIC FACTOR	MODELING PROCEDURES	REFERENCES
Logistic regression model	Individual SNPs	Assumes that multiple predictors combine linearly on the log-odds scale and uses a logit link to combine them into a one-dimensional fitted value. Modeling usually starts with the main effects model. Univariate logistic regression analysis is performed to assess the main effect of each individual risk factor. Stepwise logistic regression analysis is then performed to identify significant predictors in a multivariate model.	7, 22, 52
Logistic regression model	PRS or GRS		8, 15, 17, 22
Multiplicative model	Individual SNPs	A multiplicative model is used to derive genotype relative risks by multiplying the allelic odds ratios (ORs) of the SNPs, each of which is obtained from a marginal test. An individual is classified as affected if his or her genotype relative risk is greater than a threshold.	15, 16
Conditional logistic regression	Individual SNPs	Conditional logistic regression works in nearly the same way as regular logistic regression, except that the matched set or stratum to which each individual belongs must be specified.	48
Cox proportional hazard model	Individual SNPs	For each SNP, the risk of disease occurrence is estimated as a hazard ratio (HR) using multivariable Cox proportional hazard regression models adjusted for age, gender, ethnicity, smoking status, tumor site, stage, and treatment, where appropriate.	64
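The multiplicative model in Table 2 can be sketched in a few lines. The example below is illustrative only: the SNP names, odds ratios, and threshold are invented, and raising each allelic OR to the number of risk alleles carried (0, 1, or 2) is one common convention that is assumed here rather than taken from the cited studies.

```python
from math import prod

# Hypothetical per-allele odds ratios for three SNPs (illustrative values only).
ALLELIC_OR = {"snp_a": 1.20, "snp_b": 1.10, "snp_c": 1.35}


def genotype_relative_risk(risk_allele_counts: dict, allelic_or: dict) -> float:
    """Multiply each SNP's allelic OR raised to the number of risk alleles carried."""
    return prod(allelic_or[snp] ** count for snp, count in risk_allele_counts.items())


def classify(risk_allele_counts: dict, allelic_or: dict, threshold: float) -> bool:
    """Predict 'affected' when the combined genotype relative risk exceeds the threshold."""
    return genotype_relative_risk(risk_allele_counts, allelic_or) > threshold


# One individual: two risk alleles at snp_a, none at snp_b, one at snp_c.
person = {"snp_a": 2, "snp_b": 0, "snp_c": 1}
rr = genotype_relative_risk(person, ALLELIC_OR)
print(f"genotype relative risk: {rr:.3f}")
print("predicted affected:", classify(person, ALLELIC_OR, threshold=1.5))
```

The relative risk here is expressed against a reference individual carrying no risk alleles, for whom the product is 1. Measures such as PPV and sensitivity then depend on where the threshold is set.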

Although more than 50 cancer GWAS covering more than 15 different malignancies have been reported, identifying over 100 genomic cancer susceptibility regions,3 for most malignancies the number of consistently confirmed SNPs is less than a dozen. The limited power of GWAS suggests that many more SNPs with smaller effect sizes may be associated with some malignancies without reaching genome-wide statistical significance. Incorporating these SNPs effectively in a risk prediction model is challenging, since some of them are likely to make a positive contribution to the model while others may just add noise. Evans et al.18 and Purcell et al.65 have proposed methods of aggregating information on a large number of SNP alleles whose association with a trait does not achieve stringent genome-wide statistical significance or even nominal statistical significance of P < 0.05. These models create PRS by summing risk alleles from thousands or tens of thousands of loci spanning the genome to predict an individual’s genetic risk of developing disease. Michielsla et al.8 built a logistic regression model and used PRS to reflect the genetic effect of lists of genetic markers prioritized by their association with breast cancer in a training dataset, and evaluated whether these scores could improve current genetic prediction of these specific cancers in independent test samples. However, the logistic regression model integrating PRS did not outperform the model without PRS. In contrast, the study of Sueta et al.11 demonstrated that the regression model including a PRS of seven published variants outperformed the model without PRS, with an AUC increase of 2.81%. Both Sueta et al.11 and Michielsla et al.8 used logistic regression models including PRS to predict breast cancer risk, but the performance of PRS in the two studies was quite different. Sueta et al. created the PRS based on published SNPs with a statistically marginally significant association with breast cancer risk. Michielsla et al.8 built PRS based on 10–60,000 common SNPs in their GWAS. The comparison of the two studies indicates that the selection of SNPs when creating a PRS is critical and affects the performance of the prediction model.

Compared with traditional risk factors such as family history, smoking, age, and sex, the impact of genetic variants in predicting risk is sometimes small, which may reflect the small effect sizes of the disease-associated SNPs integrated in the risk prediction model. Because the effect sizes of associated SNPs differ, the predictive power of genetic variants also differs across cancers. A recent report used PRS to estimate the relative risks of disease. In these reported estimates, the predictive power is higher for prostate cancer than for breast cancer, which reflects the fact that the known associated SNP effect sizes for prostate cancer are greater and account for a larger percentage of the familial relative risk.66

In this article, we reviewed the cancer risk prediction models by cancer type. However, many cancers share the same major oncogenes or tumor suppressor genes, such as KRAS, P53, SRC, HER2/neu, RAF, and MYC. Most oncogenes display a very broad tumor spectrum. For example, abnormalities of the P53 gene (which codes for the P53 protein) have been found in more than half of human cancers. Acquired mutations of this gene appear in a wide range of cancers, including lung, colorectal, and breast cancer. The predictive power of the same gene might differ between cancer types. If the incidence of the disease is low, as with ovarian cancer, the predictive power might be low as well. This calls for the analysis of substantially larger numbers of cases, especially if there is significant variability across histological subtypes of the disease.
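The effect of SNP selection on a PRS, discussed above, can be sketched as follows. This is an illustration under stated assumptions rather than the method of any cited study: the SNP identifiers, odds ratios, and P values are invented, each SNP is weighted by the log of its training-set odds ratio (an unweighted allele count would correspond to setting every weight to 1), and a SNP missing from a genotype is treated as carrying zero risk alleles.

```python
import math

# Hypothetical association results from a training set: (SNP id, odds ratio, P value).
TRAINING_RESULTS = [
    ("rs_a", 1.25, 1e-9),
    ("rs_b", 1.08, 3e-4),
    ("rs_c", 1.03, 0.04),
    ("rs_d", 0.99, 0.60),
]


def select_snps(results, p_threshold):
    """Keep SNPs whose training P value is below the threshold; weight each by log(OR)."""
    return {snp: math.log(odds_ratio) for snp, odds_ratio, p in results if p < p_threshold}


def polygenic_risk_score(genotype, weights):
    """Weighted sum of risk-allele counts over the selected SNPs."""
    return sum(w * genotype.get(snp, 0) for snp, w in weights.items())


# Risk-allele counts (0, 1, or 2) for one hypothetical individual.
person = {"rs_a": 1, "rs_b": 2, "rs_c": 0, "rs_d": 2}

for threshold in (5e-8, 0.05, 1.0):
    weights = select_snps(TRAINING_RESULTS, threshold)
    score = polygenic_risk_score(person, weights)
    print(f"P < {threshold}: {len(weights)} SNPs, PRS = {score:.4f}")
```

A stringent threshold keeps only well-established SNPs, while a liberal one admits many weakly associated SNPs, some of which contribute signal and some only noise. Which choice predicts better has to be checked in independent test samples, as in the studies compared above.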

66 in total

Review 1. Polymorphisms in DNA repair genes and associations with cancer risk.

Authors: Ellen L Goode; Cornelia M Ulrich; John D Potter
Journal: Cancer Epidemiol Biomarkers Prev Date: 2002-12 Impact factor: 4.254

2. Familial risk of lung carcinoma in the Icelandic population.

Authors: Steinn Jonsson; Unnur Thorsteinsdottir; Daniel F Gudbjartsson; Hjortur H Jonsson; Kristleifur Kristjansson; Sigurdur Arnason; Vilmundur Gudnason; Helgi J Isaksson; Jonas Hallgrimsson; Jeffrey R Gulcher; Laufey T Amundadottir; Augustine Kong; Kari Stefansson
Journal: JAMA Date: 2004-12-22 Impact factor: 56.272

3. A risk model for prediction of lung cancer.

Authors: Margaret R Spitz; Waun Ki Hong; Christopher I Amos; Xifeng Wu; Matthew B Schabath; Qiong Dong; Sanjay Shete; Carol J Etzel
Journal: J Natl Cancer Inst Date: 2007-05-02 Impact factor: 13.506

Review 4. How nucleotide excision repair protects against cancer.

Authors: E C Friedberg
Journal: Nat Rev Cancer Date: 2001-10 Impact factor: 60.716

5. A risk prediction algorithm based on family history and common genetic variants: application to prostate cancer with potential clinical impact.

Authors: Robert J Macinnis; Antonis C Antoniou; Rosalind A Eeles; Gianluca Severi; Ali Amin Al Olama; Lesley McGuffog; Zsofia Kote-Jarai; Michelle Guy; Lynne T O'Brien; Amanda L Hall; Rosemary A Wilkinson; Emma Sawyer; Audrey T Ardern-Jones; David P Dearnaley; Alan Horwich; Vincent S Khoo; Christopher C Parker; Robert A Huddart; Nicholas Van As; Margaret R McCredie; Dallas R English; Graham G Giles; John L Hopper; Douglas F Easton
Journal: Genet Epidemiol Date: 2011-07-18 Impact factor: 2.135

6. Common breast cancer susceptibility alleles and the risk of breast cancer for BRCA1 and BRCA2 mutation carriers: implications for risk prediction.

Authors: Antonis C Antoniou; Jonathan Beesley; Lesley McGuffog; Olga M Sinilnikova; Sue Healey; Susan L Neuhausen; Yuan Chun Ding; Timothy R Rebbeck; Jeffrey N Weitzel; Henry T Lynch; Claudine Isaacs; Patricia A Ganz; Gail Tomlinson; Olufunmilayo I Olopade; Fergus J Couch; Xianshu Wang; Noralane M Lindor; Vernon S Pankratz; Paolo Radice; Siranoush Manoukian; Bernard Peissel; Daniela Zaffaroni; Monica Barile; Alessandra Viel; Anna Allavena; Valentina Dall'Olio; Paolo Peterlongo; Csilla I Szabo; Michal Zikan; Kathleen Claes; Bruce Poppe; Lenka Foretova; Phuong L Mai; Mark H Greene; Gad Rennert; Flavio Lejbkowicz; Gord Glendon; Hilmi Ozcelik; Irene L Andrulis; Mads Thomassen; Anne-Marie Gerdes; Lone Sunde; Dorthe Cruger; Uffe Birk Jensen; Maria Caligo; Eitan Friedman; Bella Kaufman; Yael Laitman; Roni Milgrom; Maya Dubrovsky; Shimrit Cohen; Ake Borg; Helena Jernström; Annika Lindblom; Johanna Rantala; Marie Stenmark-Askmalm; Beatrice Melin; Kate Nathanson; Susan Domchek; Ania Jakubowska; Jan Lubinski; Tomasz Huzarski; Ana Osorio; Adriana Lasa; Mercedes Durán; Maria-Isabel Tejada; Javier Godino; Javier Benitez; Ute Hamann; Mieke Kriege; Nicoline Hoogerbrugge; Rob B van der Luijt; Christi J van Asperen; Peter Devilee; E J Meijers-Heijboer; Marinus J Blok; Cora M Aalfs; Frans Hogervorst; Matti Rookus; Margaret Cook; Clare Oliver; Debra Frost; Don Conroy; D Gareth Evans; Fiona Lalloo; Gabriella Pichert; Rosemarie Davidson; Trevor Cole; Jackie Cook; Joan Paterson; Shirley Hodgson; Patrick J Morrison; Mary E Porteous; Lisa Walker; M John Kennedy; Huw Dorkins; Susan Peock; Andrew K Godwin; Dominique Stoppa-Lyonnet; Antoine de Pauw; Sylvie Mazoyer; Valérie Bonadona; Christine Lasset; Hélène Dreyfus; Dominique Leroux; Agnès Hardouin; Pascaline Berthet; Laurence Faivre; Catherine Loustalot; Tetsuro Noguchi; Hagay Sobol; Etienne Rouleau; Catherine Nogues; Marc Frénay; Laurence Vénat-Bouvet; John L Hopper; Mary B Daly; Mary B Terry; Esther M John; Saundra S Buys; Yosuf Yassin; 
Alexander Miron; David Goldgar; Christian F Singer; Anne Catharina Dressler; Daphne Gschwantler-Kaulich; Georg Pfeiler; Thomas V O Hansen; Lars Jønson; Bjarni A Agnarsson; Tomas Kirchhoff; Kenneth Offit; Vincent Devlin; Ana Dutra-Clarke; Marion Piedmonte; Gustavo C Rodriguez; Katie Wakeley; John F Boggess; Jack Basil; Peter E Schwartz; Stephanie V Blank; Amanda Ewart Toland; Marco Montagna; Cinzia Casella; Evgeny Imyanitov; Laima Tihomirova; Ignacio Blanco; Conxi Lazaro; Susan J Ramus; Lara Sucheston; Beth Y Karlan; Jenny Gross; Rita Schmutzler; Barbara Wappenschmidt; Christoph Engel; Alfons Meindl; Magdalena Lochmann; Norbert Arnold; Simone Heidemann; Raymonda Varon-Mateeva; Dieter Niederacher; Christian Sutter; Helmut Deissler; Dorothea Gadzicki; Sabine Preisler-Adams; Karin Kast; Ines Schönbuchner; Trinidad Caldes; Miguel de la Hoya; Kristiina Aittomäki; Heli Nevanlinna; Jacques Simard; Amanda B Spurdle; Helene Holland; Xiaoqing Chen; Radka Platte; Georgia Chenevix-Trench; Douglas F Easton
Journal: Cancer Res Date: 2010-11-30 Impact factor: 12.701

7. Role of selected genetic variants in lung cancer risk in African Americans.

Authors: Margaret R Spitz; Christopher I Amos; Susan Land; Xifeng Wu; Qiong Dong; Angela S Wenzlaff; Ann G Schwartz
Journal: J Thorac Oncol Date: 2013-04 Impact factor: 15.609

Review 8. The prevalence of familial testicular cancer: an analysis of two patient populations and a review of the literature.

Authors: K P Dieckmann; U Pichlmeier
Journal: Cancer Date: 1997-11-15 Impact factor: 6.860

9. Genetic variations in PI3K-AKT-mTOR pathway and bladder cancer risk.

Authors: Meng Chen; Adrian Cassidy; Jian Gu; George L Delclos; Fan Zhen; Hushan Yang; Michelle A T Hildebrandt; Jie Lin; Yuanqing Ye; Robert M Chamberlain; Colin P Dinney; Xifeng Wu
Journal: Carcinogenesis Date: 2009-12 Impact factor: 4.944

Review 10. Global burden of cancers attributable to infections in 2008: a review and synthetic analysis.

Authors: Catherine de Martel; Jacques Ferlay; Silvia Franceschi; Jérôme Vignat; Freddie Bray; David Forman; Martyn Plummer
Journal: Lancet Oncol Date: 2012-05-09 Impact factor: 41.316
