
Disease Spectrum of Breast Cancer Susceptibility Genes.

Jin Wang^1,2, Preeti Singh^2, Kanhua Yin^2,3, Jingan Zhou^2,4, Yujia Bao^5, Menghua Wu^5, Kush Pathak^6, Sophia K McKinley^7, Danielle Braun^3,8, Kevin S Hughes^2.

Abstract

BACKGROUND: Pathogenic variants in cancer susceptibility genes can increase the risk of a spectrum of diseases, which clinicians must manage for their patients. We evaluated the disease spectrum of breast cancer susceptibility genes (BCSGs) with the aim of developing a comprehensive resource of gene-disease associations for clinicians.
METHODS: Twelve genes (ATM, BARD1, BRCA1, BRCA2, CDH1, CHEK2, NF1, PALB2, PTEN, RECQL, STK11, and TP53), all of which have been conclusively established as BCSGs by the Clinical Genome Resource (ClinGen) and/or the NCCN guidelines, were investigated. The potential gene-disease associations for these 12 genes were verified and evaluated based on six genetic resources (ClinGen, NCCN, OMIM, Genetics Home Reference, GeneCards, and Gene-NCBI) and an additional literature review using a semiautomated natural language processing (NLP) abstract classification procedure.
RESULTS: Forty-two diseases were found to be associated with one or more of the 12 BCSGs for a total of 86 gene-disease associations, of which 90% (78/86) were verified by ClinGen and/or NCCN. Four gene-disease associations could not be verified by either ClinGen or NCCN but were verified by at least three of the other four genetic resources. Four gene-disease associations were verified by the NLP procedure alone.
CONCLUSION: This study is unique in that it systematically investigates the reported disease spectrum of BCSGs by surveying multiple genetic resources and the literature with the aim of developing a single consolidated, comprehensive resource for clinicians. This innovative approach provides a general guide for evaluating gene-disease associations for BCSGs, potentially improving the clinical management of at-risk individuals.


Keywords: breast cancer; cancer genetic; cancer susceptibility genes; disease spectrum; germline mutation

Year: 2021 PMID: 33959510 PMCID: PMC8093501 DOI: 10.3389/fonc.2021.663419

Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 6.244

Introduction

Hereditary predisposition is found in approximately 10% of all breast cancer cases (1). Most are related to germline mutations in high-penetrance genes such as BRCA1 and BRCA2 (2–5). Since the identification of BRCA1 and BRCA2 (6, 7), genetic testing has become a routine part of clinical care for individuals with possible hereditary breast cancer predisposition (1). With the substantial increase in knowledge of cancer genetics (8, 9), more than 30 potential breast cancer susceptibility genes (BCSGs) have been suggested, including genes with high (e.g., BRCA1/2, TP53, CDH1, PTEN, and STK11), moderate (e.g., PALB2, CHEK2, ATM, and RECQL), and low-to-disputed penetrance (e.g., MLH1, MSH2, MSH6, PMS2, MEN1, and PPM1D) (9–12). Among them, 12 genes with high or moderate penetrance for breast cancer have been definitively established by either the Clinical Genome Resource (ClinGen) (11) or the National Comprehensive Cancer Network (NCCN) (12), the top two authoritative resources. Pathogenic variants in a BCSG can also increase the risk of other diseases. For instance, CDH1 is not only associated with increased breast cancer risk, but also a predisposition to gastric cancer (13, 14). Furthermore, several BCSGs are responsible for rare hereditary cancer syndromes, such as TP53, which is responsible for Li-Fraumeni syndrome. Individuals with this syndrome have a very high risk of developing multiple malignancies, including but not limited to, breast cancer, sarcoma, brain cancer, leukemia, lung cancer, and adrenocortical cancer (15–18). As comprehensive panel genetic testing becomes the norm (19), clinicians are increasingly faced with the challenge of advising mutation carriers about genes they may be less familiar with or involving cancer susceptibility in organs outside their specialty. 
A variety of existing resources, in addition to NCCN and ClinGen, describe the diseases associated with each gene (20), including but not limited to Genetics Home Reference (https://ghr.nlm.nih.gov/), Online Mendelian Inheritance in Man (OMIM) (https://www.ncbi.nlm.nih.gov/omim), GeneCards (https://www.genecards.org/), and Gene-NCBI (https://www.ncbi.nlm.nih.gov/gene/). However, gene-disease associations described among these six resources are often ambiguous, incomplete, or confusing. For example, the association of BRCA2 with melanoma is identified in NCCN and Genetics Home Reference but not in other genetic resources such as ClinGen, OMIM, GeneCards, or Gene-NCBI. Furthermore, some gene-disease associations are not found in any genetic resource, such as the association of CHEK2 with gastric cancer, which has been established with high likelihood in the literature (21, 22). This poses a considerable dilemma for clinicians, who are obligated to identify and assess gene-disease associations that require management in clinical practice. In addition, the rapidly growing medical literature makes it impossible for clinicians to extract useful information precisely and quickly. To address this challenge, natural language processing (NLP), a technology that trains a computational algorithm with many annotated examples to allow the computer to “learn” and “predict” the meaning of human language, may present a promising solution. Our previous studies illustrate how to train and evaluate an NLP algorithm and incorporate it into a semi-automated procedure to accurately identify penetrance studies based on abstracts (23–25). Relying on a patchwork of resources is cumbersome and time-consuming, and it can lead to errors of omission. A single comprehensive resource is critically needed to streamline this process.
In light of these issues, we have developed a novel approach to identify, evaluate, and curate the diseases or complex syndromes associated with cancer susceptibility genes based on six genetic resources and the NLP literature review.

Methods

Established Breast Cancer Susceptibility Genes

Germline genetic testing is performed on non-cancer cells and is mostly blood-based or saliva-based, and a germline pathogenic variant in a cancer susceptibility gene indicates the possibility that other family members have a hereditary susceptibility to developing cancer. In contrast, somatic testing is performed on cancer cells (e.g., tumor tissue), and a somatic variant may guide targeted therapy and other treatment decisions. The present study focused on germline BCSGs, and only monoallelic BCSGs were included. The BCSGs were initially identified using ClinGen (11) and NCCN (12). In 2019, Lee and other experts on the ClinGen Hereditary Cancer Clinical Domain Executive Committee published a list of 31 high-priority genes for curation using the ClinGen Gene Curation framework (11). Among these 31 genes, the 11 classified as having a ‘Definitive’ or ‘Moderate’ association with breast cancer were included in our study. The NCCN Guidelines for ‘Genetic/Familial High-Risk Assessment: Breast and Ovarian’ identified 21 genes offered in multi-gene panels where breast cancer risk was classified as ‘Very strong’, ‘Strong’, or ‘Limited’ (12). Of these 21, the 11 genes that were classified as ‘Very strong’ or ‘Strong’ were also included in our study. Accounting for overlap between the two resources, 12 BCSGs were selected for breast cancer, namely, ATM, BARD1, BRCA1, BRCA2, CDH1, CHEK2, NF1, PALB2, PTEN, RECQL, STK11, and TP53 ( ).

Figure 1

Flow chart for identifying and evaluating gene-disease association. The number ‘1’ indicates that the gene was associated with BCSG in the resource. The number ‘0’ indicates that the gene’s association with BCSG was refuted in the resource. The number ‘9’ indicates that the gene’s association with BCSG was unclear in the resource. Uncertain association indicates that the gene’s association with BCSG is unclear, and further studies are required to refute or accept the association. BCSGs, breast cancer susceptibility genes; NLP, natural language processing.

Identification of Gene-Disease Association

Diseases associated with BCSGs were initially identified in the six genetic resources (ClinGen, NCCN, OMIM, Genetics Home Reference, GeneCards, and Gene-NCBI) and by reviewing the literature. For each of these sources, each potential association was coded in our database as ‘1’ if the association was definitive, ‘9’ if the association was possible, and ‘0’ if there was no association, as shown in Figure 1. The date of last access to all resources was November 20, 2020. The following sections describe each of these resources in detail.

ClinGen

ClinGen is a database curated by the Clinical Genome Resource. It uses a standardized clinical validity framework to assess evidence to validate a gene-disease association and to define disease management. We extracted data regarding gene-disease associations directly from the ‘Gene-Disease Validity’ reports in ClinGen (https://search.clinicalgenome.org/kb/gene-validity). The strength of ‘Gene-Disease Validity’ was classified by ClinGen as ‘Definitive’, ‘Strong’, ‘Moderate’, ‘Limited’, ‘Refuted’, ‘Disputed’, or ‘No Reported Evidence’ based on the level of evidence. If an association was classified as ‘Definitive’, ‘Strong’, or ‘Moderate’, it was coded in our database as ‘1’ in the field ClinGen Validity. If an association was classified as ‘Limited’, it was coded in our database as ‘9’. If an association was classified as ‘Refuted’, ‘Disputed’ or ‘No Reported Evidence’, it was coded in our database as ‘0’. We also reviewed the ‘Actionability’ reports in ClinGen, where the gene-disease associations were identified indirectly (https://clinicalgenome.org/working-groups/actionability/). The ‘Actionability’ report in ClinGen summarizes secondary findings in patients and identifies diseases caused by susceptibility genes that can be prevented or palliated. A gene-disease association was coded as ‘1’ in our database in the field ClinGen Actionability, if the disease was a manifestation of the genetic disorder, if management of that disease was recommended by screening or preventive intervention, or if the disease was verified in the ‘Penetrance’ section of the ‘Actionability’ report. The gene-disease association was coded in our database as ‘9’, if the report suggested a possible relationship.
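The coding rules for the ‘Gene-Disease Validity’ classifications described above can be written as a simple lookup. The following is a minimal illustrative sketch and not the authors' code; the names `CLINGEN_VALIDITY_CODE` and `code_clingen_validity` are hypothetical and do not come from the study's database.

```python
# Illustrative sketch only; names are hypothetical, not taken from the study.
# Maps each ClinGen 'Gene-Disease Validity' label to the database code
# described in the text: 1 = associated, 9 = possible, 0 = no association.
CLINGEN_VALIDITY_CODE = {
    "Definitive": 1,
    "Strong": 1,
    "Moderate": 1,
    "Limited": 9,
    "Refuted": 0,
    "Disputed": 0,
    "No Reported Evidence": 0,
}


def code_clingen_validity(classification: str) -> int:
    """Return the database code (1, 9, or 0) for a ClinGen validity label."""
    try:
        return CLINGEN_VALIDITY_CODE[classification]
    except KeyError:
        # Fail loudly on labels the text does not cover rather than guess.
        raise ValueError(f"Unrecognized ClinGen classification: {classification!r}")
```

For example, under this sketch the RECQL-breast cancer association, which ClinGen classifies as ‘Moderate’, would be coded as 1.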

NCCN Guidelines

Data was extracted from the NCCN Guidelines on Genetic/Familial High-Risk Assessment: Breast, Ovarian and Pancreatic (Version 2.2021) (12) and Colorectal (Version 2.2019) (26). A gene-disease association was coded as ‘1’ in our database if a disease or a feature was used to identify patients for genetic testing or if the management of a disease was recommended for mutation carriers. If NCCN identified a possible relationship, the gene-disease association was coded as ‘9’.

Other Genetic Resources

Other reputable databases such as ‘OMIM’, ‘Genetics Home Reference’, ‘GeneCards’, and ‘Gene-NCBI’ (described in detail below) were also used to identify gene-disease associations. If a gene-disease association was present in one of these resources, this association was coded as ‘1’ in our database. ‘OMIM’ is an online compendium of human genes and genetic phenotypes that is written and regularly updated by the McKusick-Nathans Institute of Genetic Medicine. The “Clinical Synopses” table for each gene was used to identify gene-disease associations. ‘Genetics Home Reference’ is a free online resource that was created after the announcement of the human genome map in 2003 and is maintained by the National Library of Medicine. It is designed to make the connection between genetics and disease more transparent for the general public. The “health conditions related to the Genetic Changes” section for each gene was used to identify gene-disease associations. Of note, as of October 1, 2020, Genetics Home Reference was ended as a stand-alone website, and most of its content has been transferred to MedlinePlus Genetics (https://medlineplus.gov/genetics). ‘GeneCards’ is a comprehensive database of human genes. The content of this database is reviewed and updated by the GeneCards Suite Project Team. The “disorders” table for each gene was used to identify gene-disease associations. ‘Gene-NCBI’ is a resource of the National Center for Biotechnology Information (NCBI), which centralizes gene-related information into individual records. Many different types of gene-specific data are connected to the record including gene products and their attributes, expression, interactions, pathways, variation, and phenotypic consequences. The “Phenotypes” section for each gene was used to identify gene-disease associations.

Evaluation of Gene-Disease Association

The process of validating the gene-disease association is outlined in Figure 1. Of the six genetic resources, we considered ClinGen and NCCN the most authoritative and treated these as major resources. As shown in Figure 1, we designated the gene-disease association ‘verified’ if it was coded as ‘1’ in either ClinGen or NCCN. Additionally, if the gene-disease association was coded as ‘1’ in at least three of the other four genetic resources (OMIM, Genetics Home Reference, GeneCards, and Gene-NCBI), it was also designated ‘verified’. On the other hand, we designated the gene-disease association ‘uncertain’ if it was not coded as ‘1’ in either ClinGen or NCCN and was found in fewer than three of the other genetic resources (OMIM, Genetics Home Reference, GeneCards, and Gene-NCBI). We designated the gene-disease association as ‘no association’ directly if it was coded as ‘0’ in ClinGen. All ‘uncertain’ gene-disease associations were further evaluated by literature review using an abstract classifier NLP procedure, which classifies abstracts as being relevant to cancer penetrance or not (23, 24). Our NLP abstract classifier was developed to cull germline penetrance papers from PubMed. In brief, it uses a Support Vector Machine algorithm to classify abstracts as relevant to penetrance, prevalence, both, or neither (24). This NLP abstract classifier has been incorporated into a semiautomated procedure. The sensitivity and specificity of this approach in identifying cancer penetrance studies have been validated (23). In this study, we used standard gene and disease PubMed search terms ( ) to run the procedure. The NLP abstract classifier was applied to identify the abstracts that were classified as relevant to prevalence or penetrance, and the abstracts were subsequently reviewed by two researchers independently.
We then retrieved the full text of these penetrance studies and determined the gene-disease associations based on the quality of the penetrance study (including type of study, sample size, carrier numbers, and ascertainment criteria) as well as the statistical significance of the results. If no relevant penetrance abstract was identified, the association was designated ‘no association’. If relevant penetrance studies were identified, they were presented in a group consensus meeting with our principal investigator (KSH), one surgery resident, and four clinical researchers participating (two attending surgical oncologists and two research fellows in surgical oncology). The attendees selected high-quality penetrance studies based on study design, patient population, number of pathogenic variant carriers, and ascertainment mechanism, and reached a final consensus based on evaluating these high-quality studies. As a rule of thumb, we considered a gene-cancer association to be real if at least one high-quality penetrance study reported at least a two-fold increased risk that was statistically significant. If the attendees could not reach a consensus, the gene-disease association remained ‘uncertain’. Of note, to ensure accuracy, the group meeting not only discussed the potential controversial gene-cancer associations but also examined all the evidence regarding every gene-cancer association reported in the study.
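The designation rules and the two-fold rule of thumb described in this section can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: the function and field names (`designate`, `literature_rule_of_thumb`, `high_quality`, `relative_risk`, `significant`) are hypothetical, and checking a ClinGen code of ‘0’ before any other rule is an assumption about precedence that the text does not spell out. In the study itself, the final call on ‘uncertain’ associations was made by group consensus, which no function can reproduce.

```python
from typing import Dict, Iterable, Mapping, Optional

# The four non-major resources named in the text (GHR = Genetics Home Reference).
OTHER_RESOURCES = ("OMIM", "GHR", "GeneCards", "Gene-NCBI")


def designate(codes: Dict[str, Optional[int]]) -> str:
    """Designate one gene-disease association from per-resource codes.

    `codes` maps a resource name to 1, 9, 0, or None (not listed).
    """
    # Assumption: a ClinGen code of 0 takes precedence over everything else.
    if codes.get("ClinGen") == 0:
        return "no association"
    # Coded 1 in either major resource -> verified.
    if codes.get("ClinGen") == 1 or codes.get("NCCN") == 1:
        return "verified"
    # Otherwise, at least three of the other four resources must list it.
    n_other = sum(1 for name in OTHER_RESOURCES if codes.get(name) == 1)
    if n_other >= 3:
        return "verified"
    # Everything else goes on to the NLP-aided literature review.
    return "uncertain"


def literature_rule_of_thumb(studies: Iterable[Mapping]) -> bool:
    """True if any high-quality study reports a significant >= 2-fold risk."""
    return any(
        s["high_quality"] and s["significant"] and s["relative_risk"] >= 2.0
        for s in studies
    )
```

Under this sketch, an association listed only in OMIM, Genetics Home Reference, and GeneCards would be designated ‘verified’, while one listed in only two of the four would be passed to the literature review as ‘uncertain’.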

Results

Breast Cancer Susceptibility Genes in Six Genetic Resources

As shown in Table 1, among the twelve established BCSGs, the association of breast cancer risk with ATM, BARD1, BRCA1, BRCA2, CDH1, and CHEK2 was identified in all six genetic resources; PALB2, PTEN, STK11, and TP53 were identified in at least two genetic resources. However, the association of breast cancer risk with NF1 was only identified in NCCN, and RECQL was only identified in ClinGen.

Table 1

Associations between the 12 susceptibility genes and breast cancer in six genetic resources.

Gene	Genetic Resources
Gene	No. of resources	ClinGen	NCCN	OMIM	GHR	GeneCards	Gene-NCBI
ATM	6	Definitive	Strong	1	1	1	1
BARD1	6	Definitive	Strong for triple-negative disease	1	1	1	1
BRCA1	6	Definitive	Very strong	1	1	1	1
BRCA2	6	Definitive	Very strong	1	1	1	1
CDH1	6	Definitive	Strong	1	1	1	1
CHEK2	6	Definitive	Strong	1	1	1	1
STK11	4	Definitive	Strong	1	1
PALB2	4	Definitive	Strong	1		1
TP53	4	Definitive	Strong		1		1
PTEN	3	Definitive	Strong		1
NF1	1		Strong
RECQL	1	Moderate

The number ‘1’ indicates that the gene was associated with breast cancer in the resource.

GHR, Genetics Home Reference; NCBI, National Center for Biotechnology Information.


Diseases Associated With BCSGs

There were 66 unique diseases initially identified, of which 42 diseases were determined to be associated with BCSGs by our evaluation ( ). Besides breast cancer, malignant diseases including prostate cancer, pancreatic cancer, colorectal cancer, brain tumor, gastric cancer, ovarian cancer, and sarcoma were associated with at least three BCSGs (range: 3 to 6). However, BARD1 and RECQL were only associated with breast cancer, without increased risk for any other diseases. The disease spectrum of each BCSG is shown in . Furthermore, several BCSGs are associated with specific syndromes, such as NF1 with Neurofibromatosis Type 1, PTEN with Cowden Syndrome, STK11 with Peutz-Jeghers Syndrome, and TP53 with Li-Fraumeni Syndrome. The most common cancers associated with these syndromes were determined to be associated with the corresponding susceptibility genes by our procedure.

Table 2

Diseases associated with the 12 breast cancer susceptibility genes.

BCSGs	Disease Spectrum
BCSGs	Malignant	Benign	Borderline
ATM	Breast Cancer, Colorectal Cancer, Gastric Cancer, Pancreatic Cancer, Prostate Cancer
BARD1	Breast Cancer
BRCA1	Breast Cancer, Ovarian Cancer, Pancreatic Cancer, Prostate Cancer
BRCA2	Breast Cancer, Melanoma, Ovarian Cancer, Pancreatic Cancer, Prostate Cancer
CDH1	Breast Cancer, Gastric Cancer	BCD Syndrome*
CHEK2	Breast Cancer, Colorectal Cancer, Gastric Cancer, Kidney Cancer, Prostate Cancer, Osteosarcoma, Thyroid Cancer
NF1	Brain Tumor, Breast Cancer, Leukemia, Sarcoma	Bone Dysplasia, Cafe-Au-Lait Spots, Intellectual Disability, Iris Hamartoma, Neurofibroma, Pulmonary Stenosis, Skin	GIST, Paraganglioma, Pheochromocytoma
PALB2	Breast Cancer, Ovarian Cancer, Pancreatic Cancer, Prostate Cancer
PTEN	Brain Tumor, Breast Cancer, Colorectal Cancer, Endometrial Cancer, Kidney Cancer, Melanoma, Thyroid Cancer	Acral Keratoses, Autism, Cerebrovascular Malformation, Facial Papules, GI Hamartomatous Polyps, Lipoma, Macrocephaly, Macular Pigmentation, Oral Mucosal Papillomatosis, Palmoplantar Keratoses, Thyroid, Trichilemmoma, Uterine Fibroid
RECQL	Breast Cancer
STK11	Breast Cancer, Cervical Cancer, Colorectal Cancer, Endometrial Cancer, Gastric Cancer, Hepatobiliary Cancer, Lung Cancer, Pancreatic Cancer, Small Intestine Cancer	GI Hamartomatous Polyps, Skin	Non-Epithelial Ovarian Tumor, Ovarian SCST, Testicular SCST
TP53	Adrenocortical Carcinoma, Brain Tumor, Breast Cancer, Colorectal Cancer, Hepatobiliary Cancer, Pancreatic Cancer, Osteosarcoma, Soft Tissue Sarcoma

GI, gastrointestinal; BCD, blepharocheilodontic; SCST, sex cord-stromal tumor; GIST, gastrointestinal stromal tumor.

*BCD syndrome consists of facial dysmorphism, hypertelorism, imperforate anus, distichiasis, clinodactyly, hypoplastic nails, choanal atresia, cleft palate, and benign teeth disorder.


Disease Spectrum of BCSGs and the Corresponding Resources

A total of 160 gene-disease associations were initially identified in the six genetic resources and literature ( ). As shown in , a total of 86 gene-disease associations were identified by our evaluation. Among them, 90% (78/86) of gene-disease associations were verified by ClinGen and/or NCCN. Conversely, four gene-disease associations were absent from both ClinGen and NCCN but verified in three or more of the other four genetic resources. These included CDH1-Blepharocheilodontic (BCD) Syndrome, CHEK2-osteosarcoma, NF1-leukemia, and NF1-pulmonary stenosis. Notably, four gene-disease associations, namely, ATM-gastric cancer, CHEK2-gastric cancer, CHEK2-kidney cancer, and CHEK2-thyroid cancer, were verified by NLP literature review alone.
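The counts reported in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the figures quoted above; the dictionary keys are descriptive labels chosen for illustration.

```python
# Counts of verified gene-disease associations by verification route,
# taken from the paragraph above.
verified_by = {
    "ClinGen and/or NCCN": 78,
    "three or more of the other four resources": 4,
    "NLP literature review alone": 4,
}

# The three routes should account for every verified association.
total = sum(verified_by.values())

# Share verified by the two major resources (reported in the text as 90%).
major_share = verified_by["ClinGen and/or NCCN"] / total

print(f"total verified associations: {total}")
print(f"share verified by ClinGen and/or NCCN: {major_share:.3f}")
```

The three routes sum to the 86 associations reported, and 78/86 is about 0.907, consistent with the stated figure of roughly 90%.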

Figure 2

Disease spectrum of breast cancer susceptibility genes. “†” refers to both female and male breast cancer. The three colors represent malignant disease (black), benign disease (grey), and borderline disease (orange), respectively. NLP, natural language processing; GI, gastrointestinal; BCD, blepharocheilodontic syndrome; SCST, sex cord-stromal tumor; GIST, gastrointestinal stromal tumor; NEOT, non-epithelial ovarian tumor.

Discussion

Although hereditary breast cancer is mainly associated with BRCA1/2 pathogenic variants, it may also be associated with germline mutations in other genes. Thus, multi-gene panels usually include both high- and moderate-penetrance genes associated with breast cancer (8, 27, 28). The twelve BCSGs included in our study are those previously established by ClinGen and/or NCCN. To outline the disease spectrum for the twelve BCSGs, we examined six reliable genetic resources combined with a literature review using NLP. In total, 42 unique diseases were verified as being associated with the twelve BCSGs.
One of the authoritative resources used for this study is the NIH-funded ClinGen. In contrast to the “expert panel” consensus assessments used by NCCN, ClinGen creates a framework that provides evidence for the strength of the association between a gene and a disease risk through semi-quantitative classification (29). The ClinGen classification is based on genetic evidence, including case-level data and case-control data, as well as experimental evidence. The other authoritative resource employed for this study is the NCCN Guidelines, the recognized standard for clinical practice in cancer care, with its frequently updated set of clinical practice guidelines. More than 1,300 physicians and oncology researchers from the NCCN Member Institutions comprise the expert panels. Hence, the gene-disease association was designated ‘verified’ in our study if it was established by either ClinGen or NCCN. Although the standardized literature review method used by ClinGen is outstanding (11), this approach is time-consuming and can delay the incorporation of the most recent findings. In addition, the gene-cancer associations listed in the NCCN guidelines may not be comprehensive. Therefore, it is necessary to include other genetic resources and find associations missed or not yet addressed by ClinGen and/or NCCN.
Four other genetic resources (OMIM, Genetics Home Reference, GeneCards, and Gene-NCBI) are also considered reputable and contain a comprehensive compendium of relationships between phenotypes and genotypes. However, these resources lack the strict curation processes for evaluating strength of evidence utilized by ClinGen or the expert panels employed by NCCN. Therefore, we rated the level of evidence from these four resources lower than ClinGen and NCCN, and the gene-disease association was designated ‘verified’ only if it was established by at least three of these sources when the relationship was not found in ClinGen or NCCN. Meanwhile, we understand that the likely valid gene-disease associations we identified that were not present in ClinGen or NCCN may be explained in part by the observation that the latter entities work in a slow and deliberate manner that might not yet have allowed a full review of all associations.
Forty-two unique diseases were verified as being associated with BCSGs by our procedure. Each BCSG was associated with at least three diseases except BARD1 and RECQL, which were only associated with breast cancer. BARD1 shares strong structural homology with BRCA1 and has been demonstrated to be involved in the cellular DNA repair process (30). The association between breast cancer and mutations in the BARD1 gene was first found in a large case-control study of 65,057 women with breast cancer (8), where the prevalence of BARD1 mutations was 0.18%, significantly greater than in the controls (OR = 2.16, 95% CI: 1.31-3.63, p < 0.05). On the other hand, RECQL was first identified as a novel breast cancer susceptibility gene in 2015 by two independent research groups (31, 32). Bogdanova et al. compared 2596 breast cancer patients and 2132 healthy females from central Europe and indicated that RECQL c.1667_1667+3delAGTA could represent a moderate-risk breast cancer susceptibility allele (33).
A recent study found a moderate risk of breast cancer in African American women with RECQL mutation (34). In addition, RECQL is considered associated with hereditary breast carcinoma in ClinGen (gene-disease validity: moderate) (https://search.clinicalgenome.org/kb/genes/HGNC:9948). However, there is no high-quality penetrance study that showed statistical significance for additional diseases beyond breast cancer.
Generally speaking, the BCSGs are thought to affect female breast cancer risk, but some are also associated with male breast cancer (MBC). Tai et al. evaluated 97 men with breast cancer from 1939 families. The cumulative risk of breast cancer was higher in both BRCA1 and BRCA2 male heterozygotes compared to those without a BRCA1/2 pathogenic variant at all ages. The relative risk of developing breast cancer peaks in the 30s and 40s (35). Another study analyzed 321 families with BRCA2 mutations both retrospectively and prospectively, suggesting a cumulative risk for male breast cancer of 8.9% up to age 80 (36). Based on these data, NCCN guidelines recommend that men with a BRCA1/2 pathogenic variant should receive a clinical breast exam at a young age (12). Notably, we found that CHEK2 and PALB2 were also associated with male breast cancer in GeneCards. We verified these associations by literature review based on the NLP procedure, with the literature showing strong evidence in penetrance studies. The CHEK2/1100delC, a truncating variant, is present in 13.5% of individuals from families with male breast cancer (p = 0.00015) and results in an approximately ten-fold increase of breast cancer risk in men (37). A population-based study found the CHEK2/1100delC was present in 4.2% of unselected male breast cancer cases, more prevalent than the frequency of 1.1% in 1,692 controls (OR = 4.1, 95% CI: 1.2-14.3, p = 0.05) (38). Recently, Yang et al. analyzed data from 524 families with PALB2 pathogenic variants from 21 countries and found an association between PALB2 and risk of male breast cancer (RR = 7.34, 95% CI: 1.28-42.18, p = 0.026) (39). Additionally, Pritzlaff et al. reviewed 715 male breast cancer patients who underwent germline multi-gene panel testing and found that pathogenic variants in CHEK2 (OR = 3.7, p = 6.24 × 10^-24) and PALB2 (OR = 6.6, p = 0.01) were both significantly associated with breast cancer risk in men (40).
In the present study, 90% of gene-disease associations were verified by ClinGen and/or NCCN, underscoring the credibility of these two major resources. Nevertheless, four gene-disease associations were not found in ClinGen or NCCN but were instead identified in at least three of the other four genetic resources. Furthermore, these associations were similarly supported by published studies with strong evidence of the association, underscoring the reliability of our review criteria. Of note, four gene-disease associations, i.e., ATM-gastric cancer, CHEK2-gastric cancer, CHEK2-kidney cancer, and CHEK2-thyroid cancer, were not identified in any of the six resources but were verified by the NLP-aided literature review. In 2015, Helgason et al. reported a GWAS of gastric cancer in a European population, using information on 2,500 population-based gastric cancer cases and 205,652 controls. They found a new gastric cancer association with loss-of-function mutations in ATM (OR = 4.74, p = 8.0 × 10^-12) (41). A recent study reported that ATM carriers were significantly associated with lower protein expression in five cancer types, including gastric cancer (42). A CHEK2 mutation was also identified to predispose to gastric cancer (OR = 1.6, p = 0.004), particularly in young-onset cases (OR = 2.1, p = 0.01) (21). Additionally, Näslund-Koch et al. examined 86,975 individuals from the Copenhagen General Population Study.
The age- and sex-adjusted hazard ratio for CHEK2/1100delC heterozygotes compared with noncarriers was 5.76 (95% CI: 2.12-15.6) for gastric cancer and 3.61 (95% CI: 1.33-9.79) for kidney cancer (22). Furthermore, a case-control study reported a CHEK2 mutation in 15.6% of unselected patients with papillary thyroid cancer, compared to 6.0% in age- and sex-matched controls (OR = 3.3, p < 0.0001) (43). Another CHEK2 variant, the c.470C allele, was shown to increase the risk of papillary thyroid carcinoma in female patients by almost 13-fold (OR = 12.81, p = 0.019) (44).
The NCCN guidelines for considering risk-reducing mastectomy and breast MRI are well established for carriers of high-risk genes (e.g., BRCA1, BRCA2, and PALB2), and guidelines on annual mammogram with consideration of breast MRI are also established for carriers of moderate-risk genes (e.g., ATM and CHEK2) (12). Women with pathogenic variants in genes such as TP53, CDH1, PTEN, STK11, and NF1 may be managed according to established guidelines for the associated cancer predisposition syndrome. For instance, in Li-Fraumeni syndrome, annual whole-body MRI is advised in TP53 pathogenic variant carriers (45, 46). More aggressive interventions may be recommended, such as consideration of prophylactic gastrectomy if a CDH1 mutation is found, even in the absence of gastric cancer in the family (47). This necessitates that clinicians stay current with management guidelines and access reliable information resources to implement these updates effectively for their patients (e.g., resources such as ASK2ME could aid with this). Risks of other cancers for those BCSG carriers appear to be modestly elevated, but whether this should alter screening recommendations is unknown. For example, the risk of leukemia with TP53 is 1.6 times as high as in the general population, but since the general population risk of leukemia is 0.9%, this amounts to an absolute risk of only 1.4% by age 85 (48).
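The absolute-risk figure quoted above for TP53 and leukemia follows from multiplying the relative risk by the general-population risk. The short sketch below reproduces that arithmetic; it is a simplified illustration (the function name `absolute_risk` is hypothetical), and it assumes that a plain product of relative risk and baseline risk is an adequate approximation when the baseline risk is small.

```python
def absolute_risk(relative_risk: float, baseline_risk: float) -> float:
    """Approximate carrier risk as relative risk times baseline risk.

    Simplifying assumption: reasonable only for small baseline risks.
    """
    return relative_risk * baseline_risk


# Values quoted in the text: relative risk of 1.6 and a general-population
# leukemia risk of 0.9%.
carrier_risk = absolute_risk(relative_risk=1.6, baseline_risk=0.009)

# Excess risk attributable to carrier status, over the population baseline.
excess_risk = carrier_risk - 0.009

print(f"carrier absolute risk: {carrier_risk:.2%}")
print(f"excess over baseline:  {excess_risk:.2%}")
```

The product is about 1.4%, so the excess over the population baseline is only about half a percentage point.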
Although a pathogenic mutation in TP53 is statistically associated with leukemia, it would be hard to justify intensive screening or prevention measures based on this information. It is beyond the scope of this paper to identify the penetrance for each gene-disease association, but this will be the target of future work. Our proposed expansion of disease-gene association reporting will require clinicians to counsel patients appropriately about their risk of additional diseases and to refer them to genetic counselors or other specialists (e.g., neurologist, urologist). Evaluation based on six genetic resources could result in omissions of some phenotypes associated with BCSGs. We attempted to lessen this effect by including a literature review as an additional step. Another limitation is that the strict criteria we set for gene-disease associations (e.g., verified by ClinGen/NCCN, or at least three genetic resources) could mean that some diseases are overlooked. By reviewing the literature using NLP, we reevaluated those uncertain gene-disease associations to lessen this effect as much as possible. Although the comprehensiveness of our data seems to be conducive to more individualized care, this raises the problem of absence of management guidelines for patients who carry such variants. Additionally, the clinical utility of identifying potential diseases in BCSG carriers may conflict with current cost-efficacy constraints (i.e., interpreting variants, genetic counseling, overdiagnoses, and resulting anxiety in patients). Of note, we are making assumptions based on the available evidence, and we recognize that authoritative sources, such as ClinGen and NCCN guidelines, are updated periodically. Thus, this study represents a snapshot of current knowledge and understanding, rather than a definitive conclusion. In 2016, we built a clinical decision support tool for cancer susceptibility genes, called Ask2Me.Org (49). 
This tool provides labs, researchers, and clinical experts with the estimated cancer risk of germline pathogenic variants, including the disease spectrum for each susceptibility gene. Ask2Me.Org has been recommended as a resource in recent clinical practice guidelines (50). These disease spectrums we verified in the current study will be soon available in our website Ask2Me.Org, which is constantly updated. Ongoing research based on accurate estimates of cancer risk needs to be conducted in terms of appropriate management strategies.
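
The leukemia example in this discussion converts a relative risk into an absolute risk. A minimal Python sketch of that arithmetic follows, using only the figures quoted in the text (relative risk 1.6, general population risk 0.9%). The function name is illustrative, and simple multiplication is an approximation that ignores competing risks and age structure; it is not a penetrance model.

```python
def absolute_risk(baseline_risk: float, relative_risk: float) -> float:
    """Approximate a carrier's absolute risk as baseline risk x relative risk.

    Assumption: a first-order approximation, capped at 1.0.
    """
    return min(1.0, baseline_risk * relative_risk)


baseline = 0.009   # general population risk of leukemia (0.9%), from the text
rr = 1.6           # relative risk for TP53 carriers, from the text

carrier = absolute_risk(baseline, rr)
excess = carrier - baseline  # additional risk attributable to carrier status

print(f"carrier absolute risk: {carrier:.1%}")  # prints "carrier absolute risk: 1.4%"
print(f"excess over baseline:  {excess:.2%}")   # prints "excess over baseline:  0.54%"
```

The excess of roughly half a percentage point illustrates why a statistically significant association need not justify intensive screening.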

Conclusions

To the best of our knowledge, this is the first study to collate the disease spectrum of BCSGs from multiple sources and make it available in a single resource. Notably, we developed an innovative assessment process based on six genetic resources and a literature review using an NLP procedure. Throughout our evaluation, we have kept in mind that the disease spectrum will need frequent updates as new data appear in these genetic resources. Our study provides a reference point for future studies, shows that BCSG mutation carriers should also be alert to diseases beyond breast cancer, and highlights the need to broaden management criteria and improve outcomes for at-risk individuals.
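
The assessment process summarized here can be read as a tiered decision rule. The sketch below is an illustrative Python rendering, assuming the criteria described in this paper: verification by ClinGen and/or NCCN, otherwise by at least three of the other four genetic resources, otherwise referral to the NLP-assisted literature review. The function name, tier labels, and example inputs are hypothetical and are not the authors' code or data.

```python
# Resource names as listed in the Methods; tier labels are hypothetical.
PRIMARY = {"ClinGen", "NCCN"}
SECONDARY = {"OMIM", "Genetics Home Reference", "GeneCards", "Gene-NCBI"}


def verification_tier(supporting_resources):
    """Classify one gene-disease association by which resources report it."""
    sources = set(supporting_resources)
    if sources & PRIMARY:
        # Reported by ClinGen and/or NCCN
        return "verified_primary"
    if len(sources & SECONDARY) >= 3:
        # Reported by at least three of the other four resources
        return "verified_secondary"
    # Otherwise the association goes to the NLP-assisted literature review
    return "needs_literature_review"


# Hypothetical inputs, for illustration only
print(verification_tier(["NCCN", "OMIM"]))
print(verification_tier(["OMIM", "GeneCards", "Gene-NCBI"]))
print(verification_tier(["GeneCards"]))
```

Keeping the rule explicit in this way would make it straightforward to rerun the classification whenever the underlying resources are updated.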

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics Statement

We used public databases containing no patient data, and individual informed consent was waived.

Author Contributions

JW, KY, DB, and KSH were involved in the conceptualization and design of this study. JW, PS, KY, JZ, KP, and SKM collected the data. YB and MW were responsible for maintaining the natural language processing abstract classifier. JW and PS analyzed the data and interpreted the results. JW, PS, and KY drafted the initial manuscript with critical feedback from DB and KSH. All authors contributed to the article and approved the submitted version.

Conflict of Interest

KH receives honoraria from Hologic (surgical implant for radiation planning with breast conservation and wire-free breast biopsy) and Myriad Genetics and has a financial interest in CRA Health (formerly Hughes RiskApps). CRA Health develops risk assessment models/software with a particular focus on breast cancer and colorectal cancer. KH is a founder of and owns equity in the company. KH is the co-creator of Ask2Me.Org, which is freely available for clinical use and is licensed for commercial use by the Dana-Farber Cancer Institute and the MGH. KH's interests in CRA Health and Ask2Me.Org were reviewed and are managed by Massachusetts General Hospital and Partners HealthCare in accordance with their conflict of interest policies. DB co-leads the BayesMendel laboratory, which licenses software for the computation of risk prediction models. She does not derive any personal income from these licenses. All revenues are assigned to the lab for software maintenance and upgrades. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

49 in total

1. Germline RECQL mutations are associated with breast cancer susceptibility.

Authors: Cezary Cybulski; Jian Carrot-Zhang; Wojciech Kluźniak; Barbara Rivera; Aniruddh Kashyap; Dominika Wokołorczyk; Sylvie Giroux; Javad Nadaf; Nancy Hamel; Shiyu Zhang; Tomasz Huzarski; Jacek Gronwald; Tomasz Byrski; Marek Szwiec; Anna Jakubowska; Helena Rudnicka; Marcin Lener; Bartłomiej Masojć; Patrica N Tonin; Francois Rousseau; Bohdan Górski; Tadeusz Dębniak; Jacek Majewski; Jan Lubiński; William D Foulkes; Steven A Narod; Mohammad R Akbari
Journal: Nat Genet Date: 2015-04-27 Impact factor: 38.330

2. Risks of first and subsequent cancers among TP53 mutation carriers in the National Cancer Institute Li-Fraumeni syndrome cohort.

Authors: Phuong L Mai; Ana F Best; June A Peters; Rosamma M DeCastro; Payal P Khincha; Jennifer T Loud; Renée C Bremer; Philip S Rosenberg; Sharon A Savage
Journal: Cancer Date: 2016-08-06 Impact factor: 6.860

3. Frequency of mutations in individuals with breast cancer referred for BRCA1 and BRCA2 testing using next-generation sequencing with a 25-gene panel.

Authors: Nadine Tung; Chiara Battelli; Brian Allen; Rajesh Kaldate; Satish Bhatnagar; Karla Bowles; Kirsten Timms; Judy E Garber; Christina Herold; Leif Ellisen; Jill Krejdovsky; Kim DeLeonardis; Kristin Sedgwick; Kathleen Soltis; Benjamin Roa; Richard J Wenstrup; Anne-Renee Hartman
Journal: Cancer Date: 2014-09-03 Impact factor: 6.860

4. Breast cancer. The 'other' breast cancer genes.

Authors: Sam Kean
Journal: Science Date: 2014-03-28 Impact factor: 47.728

5. Risk of breast cancer in male BRCA2 carriers.

Authors: D G R Evans; I Susnerwala; J Dawson; E Woodward; E R Maher; F Lalloo
Journal: J Med Genet Date: 2010-06-28 Impact factor: 6.318

6. Li-fraumeni syndrome.

Authors: David Malkin
Journal: Genes Cancer Date: 2011-04

7. Low-penetrance susceptibility to breast cancer due to CHEK2(*)1100delC in noncarriers of BRCA1 or BRCA2 mutations.

Authors: Hanne Meijers-Heijboer; Ans van den Ouweland; Jan Klijn; Marijke Wasielewski; Anja de Snoo; Rogier Oldenburg; Antoinette Hollestelle; Mark Houben; Ellen Crepin; Monique van Veghel-Plandsoen; Fons Elstrodt; Cornelia van Duijn; Carina Bartels; Carel Meijers; Mieke Schutte; Lesley McGuffog; Deborah Thompson; Douglas Easton; Nayanta Sodha; Sheila Seal; Rita Barfoot; Jon Mangion; Jenny Chang-Claude; Diana Eccles; Rosalind Eeles; D Gareth Evans; Richard Houlston; Victoria Murday; Steven Narod; Tamara Peretz; Julian Peto; Catherine Phelan; Hong Xiang Zhang; Csilla Szabo; Peter Devilee; David Goldgar; P Andrew Futreal; Katherine L Nathanson; Barbara Weber; Nazneen Rahman; Michael R Stratton
Journal: Nat Genet Date: 2002-04-22 Impact factor: 38.330

8. Cancer Risks Associated With Germline PALB2 Pathogenic Variants: An International Study of 524 Families.

Authors: Xin Yang; Goska Leslie; Alicja Doroszuk; Sandra Schneider; Jamie Allen; Brennan Decker; Alison M Dunning; James Redman; James Scarth; Inga Plaskocinska; Craig Luccarini; Mitul Shah; Karen Pooley; Leila Dorling; Andrew Lee; Muriel A Adank; Julian Adlard; Kristiina Aittomäki; Irene L Andrulis; Peter Ang; Julian Barwell; Jonine L Bernstein; Kristie Bobolis; Åke Borg; Carl Blomqvist; Kathleen B M Claes; Patrick Concannon; Adeline Cuggia; Julie O Culver; Francesca Damiola; Antoine de Pauw; Orland Diez; Jill S Dolinsky; Susan M Domchek; Christoph Engel; D Gareth Evans; Florentia Fostira; Judy Garber; Lisa Golmard; Ellen L Goode; Stephen B Gruber; Eric Hahnen; Christopher Hake; Tuomas Heikkinen; Judith E Hurley; Ramunas Janavicius; Zdenek Kleibl; Petra Kleiblova; Irene Konstantopoulou; Anders Kvist; Holly Laduca; Ann S G Lee; Fabienne Lesueur; Eamonn R Maher; Arto Mannermaa; Siranoush Manoukian; Rachel McFarland; Wendy McKinnon; Alfons Meindl; Kelly Metcalfe; Nur Aishah Mohd Taib; Jukka Moilanen; Katherine L Nathanson; Susan Neuhausen; Pei Sze Ng; Tu Nguyen-Dumont; Sarah M Nielsen; Florian Obermair; Kenneth Offit; Olufunmilayo I Olopade; Laura Ottini; Judith Penkert; Katri Pylkäs; Paolo Radice; Susan J Ramus; Vilius Rudaitis; Lucy Side; Rachel Silva-Smith; Valentina Silvestri; Anne-Bine Skytte; Thomas Slavin; Jana Soukupova; Carlo Tondini; Alison H Trainer; Gary Unzeitig; Lydia Usha; Thomas van Overeem Hansen; James Whitworth; Marie Wood; Cheng Har Yip; Sook-Yee Yoon; Amal Yussuf; George Zogopoulos; David Goldgar; John L Hopper; Georgia Chenevix-Trench; Paul Pharoah; Sophia H L George; Judith Balmaña; Claude Houdayer; Paul James; Zaki El-Haffaf; Hans Ehrencrona; Marketa Janatova; Paolo Peterlongo; Heli Nevanlinna; Rita Schmutzler; Soo-Hwang Teo; Mark Robson; Tuya Pal; Fergus Couch; Jeffrey N Weitzel; Aaron Elliott; Melissa Southey; Robert Winqvist; Douglas F Easton; William D Foulkes; Antonis C Antoniou; Marc Tischkowitz
Journal: J Clin Oncol Date: 2019-12-16 Impact factor: 44.544

9. Associations Between Cancer Predisposition Testing Panel Genes and Breast Cancer.

Authors: Fergus J Couch; Hermela Shimelis; Chunling Hu; Steven N Hart; Eric C Polley; Jie Na; Emily Hallberg; Raymond Moore; Abigail Thomas; Jenna Lilyquist; Bingjian Feng; Rachel McFarland; Tina Pesaran; Robert Huether; Holly LaDuca; Elizabeth C Chao; David E Goldgar; Jill S Dolinsky
Journal: JAMA Oncol Date: 2017-09-01 Impact factor: 31.777

10. Clinical validity assessment of genes frequently tested on hereditary breast and ovarian cancer susceptibility sequencing panels.

Authors: Kristy Lee; Bryce A Seifert; Hermela Shimelis; Rajarshi Ghosh; Stephanie B Crowley; Natalie J Carter; Kurston Doonanco; A Katherine Foreman; Deborah I Ritter; Sharisse Jimenez; Mackenzie Trapp; Kenneth Offit; Sharon E Plon; Fergus J Couch
Journal: Genet Med Date: 2018-12-03 Impact factor: 8.822
