Giorgio E M Melloni, Luca Mazzarella, Loris Bernard, Margherita Bodini, Anna Russo, Lucilla Luzi, Pier Giuseppe Pelicci, Laura Riva.
Abstract
BACKGROUND: The landscape of cancer-predisposing genes has been extensively investigated over the last 30 years with methodologies ranging from candidate-gene to genome-wide association studies. However, sequencing data are still poorly exploited in cancer predisposition studies because of the lack of statistical power when millions of variants are compared at once.
Keywords: Breast cancer; Germline mutations; Predisposition; Somatic mutations
Year: 2017 PMID: 28569218 PMCID: PMC5452392 DOI: 10.1186/s13058-017-0854-1
Source DB: PubMed Journal: Breast Cancer Res ISSN: 1465-5411 Impact factor: 6.466
Fig. 1 Workflow scheme for the whole analysis. Blue cylinders represent the data (obtained from available databases or processed during the analysis); hexagons are the analyzed datasets of cases and controls; red squares and triangles represent analysis and output. Flag shapes represent post-process annotation and statistical testing; brown trapezoids represent the three main analysis branches presented in this paper. NFE, non-Finnish European.
Breast cancer-predisposing genes and variants found in our case dataset
| Gene | Somatic driver gene | Total number of variants | Number of pathogenic variants | Number of truncating variants | Number of highly damaging mutations |
|---|---|---|---|---|---|
| | X | 21 | 5 | | |
| | X | 18 | 2 | 3 | |
| | X | 21 | 5 | 2 | |
| | | 5 | 1 | | |
| | X | 3 | 1 | | |
| | X | 6 | 2 | 1 | |
| | | 4 | 1 | | |
| | | 5 | 1 | 2 | |
| | | 1 | | | |
| | | | | | |
| | X | | | | |
| | | 5 | | | |
| | | 3 | 1 | | |
| | X | 3 | | | |
| | X | 4 | 1 | | |
The second column reports whether the gene is also considered a somatic driver gene. The next three columns report the total number of non-synonymous variants, the number of variants considered pathogenic, and the number of rare truncating variants (control minor allele frequency below 1%) not already included in the list of pathogenic variants. The last column shows all missense variants that are not considered pathogenic but have a very high deleteriousness score (8 of 9 tools for predicting functional damage report the variant as damaging). The ClinVar and Humsavar databases were used as the reference for pathogenicity.
List of rare cancer-related pathogenic variants [control minor allele frequency (MAF) below 1%]
| Gene – variant | Control MAF | Case MAF | log2 MAF ratio | Summary of ClinVar and Humsavar annotations |
|---|---|---|---|---|
| | 0.002% | 0.07% | 5.35 | Malignant melanoma |
| | Novel | 0.08% | 4.47 | Colon, ovary and breast cancer |
| | 0.006% | 0.07% | 3.76 | Fanconi anemia |
| | 0.213% | 2.61% | 3.62 | Lynch syndrome |
| | 0.072% | 0.23% | 1.66 | Prostate cancer |
| | 0.244% | 0.69% | 1.50 | MEN2A syndrome/thyroid carcinoma |
| | 0.033% | 0.08% | 1.20 | Renal cell carcinoma |
| | 0.075% | 0.15% | 0.98 | Renal cancer |
| | 0.185% | 0.30% | 0.72 | Colorectal cancer |
| | 0.501% | 0.82% | 0.72 | Non-Hodgkin lymphoma |
| | 0.992% | 1.04% | 0.07 | Cowden disease 3 |
This list includes all those genes that are not breast cancer predisposing but are connected to other types of cancer or cancer syndromes.
*translation termination (stop) codon
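The log2 MAF ratio column can be approximated from the two MAF columns. The following is a minimal sketch, assuming the ratio is simply log2(case MAF / control MAF); the function name `log2_maf_ratio` is hypothetical, and how the authors handled "Novel" variants (no control observations) is not stated in this record, so the sketch returns `None` in that case rather than guessing.

```python
import math


def log2_maf_ratio(control_maf_pct, case_maf_pct):
    """Return log2(case MAF / control MAF) for MAFs given in percent.

    Returns None when either frequency is zero (e.g. a "Novel" variant
    with no control observations), because the ratio is undefined there.
    This is an illustrative assumption, not the authors' procedure.
    """
    if control_maf_pct <= 0 or case_maf_pct <= 0:
        return None
    return math.log2(case_maf_pct / control_maf_pct)


# Two rows of the table, using the rounded percentages as displayed.
print(round(log2_maf_ratio(0.213, 2.61), 2))
print(round(log2_maf_ratio(0.244, 0.69), 2))
```

Because the table shows rounded percentages, values recomputed this way can differ slightly from the published column for some rows.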
Fig. 2 Distribution of pathogenic and truncating variants on breast cancer genes in our case dataset of 673 breast cancer patients. Oncoprint plot showing three classes of high confidence breast cancer-predisposing genes (rows); each column represents one of the samples with at least one of these mutations. Variants on known breast cancer-predisposing genes are indicated in blue (complete list in Table 1). A star indicates a variant that is a truncation but is not reported as pathogenic in the ClinVar or Humsavar databases. Pathogenic variants that affect genes related to cancer or cancer syndromes not linked to breast cancer are indicated in black and include genes such as RET (thyroid cancer) or APC (colon cancer).
Fig. 3 Analysis of rare variants. This flowchart represents the step-wise procedure in the central arm of Fig. 1, which filters 73,544 non-synonymous coding variants down to 16,014 rare variants (MAF <1%) with a deleteriousness score over 0.5 and a MAF in the cases higher than in the controls. Rare variants are prioritized into two branches: on the left, variants falling in GWAS breast cancer linkage disequilibrium blocks (LD blocks); on the right, variants overlapping with cancer somatic mutations from COSMIC or cBioPortal (see Additional file 3: Tables S6 and S7). For both datasets, overlaps are shown both at the initial level and after filtering for variants belonging to our list of 758 target genes (known cancer-predisposing genes, known somatic driver genes, and DNA repair genes). Six common variants on our target gene list (i.e., both overlapping with somatic mutations and falling into a GWAS LD block) are reported at the bottom of this figure.
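The filtering and prioritization steps in this caption can be sketched as follows. This is an illustrative sketch only: the `Variant` record, its field names, and the example values are hypothetical, and the sketch assumes the 1% rarity cutoff is applied to the control MAF (as in the table captions), which the figure caption itself does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str               # hypothetical identifier
    control_maf: float      # fraction, e.g. 0.002 means 0.2%
    case_maf: float         # fraction
    ds: float               # deleteriousness score in [0, 1]
    in_gwas_ld_block: bool  # falls in a GWAS breast cancer LD block
    overlaps_somatic: bool  # matches a COSMIC/cBioPortal somatic mutation
    in_target_genes: bool   # gene is on the target gene list


def is_rare_candidate(v, maf_cutoff=0.01, ds_cutoff=0.5):
    """Rare, predicted damaging, and more frequent in cases than controls."""
    return (v.control_maf < maf_cutoff
            and v.ds > ds_cutoff
            and v.case_maf > v.control_maf)


def prioritize(variants):
    """Split candidates into the two branches and their intersection."""
    kept = [v for v in variants if is_rare_candidate(v) and v.in_target_genes]
    gwas = [v for v in kept if v.in_gwas_ld_block]
    somatic = [v for v in kept if v.overlaps_somatic]
    common = [v for v in kept if v.in_gwas_ld_block and v.overlaps_somatic]
    return gwas, somatic, common


# Made-up variants purely to exercise the filter.
demo = [
    Variant("v1", 0.0002, 0.0010, 0.9, True, True, True),
    Variant("v2", 0.0200, 0.0300, 0.9, True, False, True),    # not rare
    Variant("v3", 0.0005, 0.0004, 0.8, False, True, True),    # not enriched
    Variant("v4", 0.0001, 0.0008, 0.7, False, True, True),
]
gwas, somatic, common = prioritize(demo)
print([v.name for v in gwas], [v.name for v in somatic], [v.name for v in common])
```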
Fig. 4 Polygenic age-dependent model breakdown. a Feature ranking of the random forest model according to the mean decrease of the Gini index. At the top, the most important variable is the deleteriousness score. b ROC curve of the random forest model on the training data. An AUC of 0.84 is reached under the supervision of the training dataset formed by reported pathogenic and non-pathogenic variants according to the ClinVar and Humsavar databases. c Distribution of the deleteriousness score among variants. The top predictor in our random forest model is shown without the influence of the other variables. Although it cannot represent the real tree scheme of the model alone, there is a clear positive trend between increased deleteriousness score (DS, X-axis) and the number of trees classifying a variant as pathogenic (RF score, Y-axis).
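As a reminder of what the AUC in panel b measures: it equals the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen non-pathogenic one, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name `auc` and the scores are made up for illustration and are not taken from the paper.

```python
def auc(scores_pathogenic, scores_benign):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (pathogenic, benign) pairs in which the
    pathogenic variant scores higher; ties contribute 0.5.
    """
    wins = 0.0
    for p in scores_pathogenic:
        for b in scores_benign:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(scores_pathogenic) * len(scores_benign))


# Hypothetical RF scores (fraction of trees voting "pathogenic").
pathogenic = [0.9, 0.8, 0.4]
benign = [0.5, 0.3, 0.2]
print(auc(pathogenic, benign))
```

The quadratic loop is fine for small illustrative inputs; a rank-based formulation would be preferable for large variant sets.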
Results from the polygenic age-dependent model
| Variant | Approved name | Control MAF | Case MAF | Protein change | Mean beta elastic net | Negative beta percentage |
|---|---|---|---|---|---|---|
| | Mitochondrial ribosomal protein L24 | Novel | 0.074% | W54* | −2.78 | 1.00 |
| | Cystatin S | 0.0129% | 0.300% | V81fs | −5.09 | 1.00 |
| | Par-6 family cell polarity regulator alpha | 0.0018% | 0.078% | R256* | −1.86 | 1.00 |
| | TRIO and F-actin binding protein | 0.0059% | 0.471% | S1075fs | −3.64 | 1.00 |
| | Zinc finger protein 85 | Novel | 0.085% | R205* | −4.36 | 1.00 |
| | Forkhead box P4 | 0.0018% | 0.091% | K147R | −8.04 | 1.00 |
| | Polycystic kidney and hepatic disease 1 (autosomal recessive) | Novel | 0.075% | M1373R | −5.33 | 1.00 |
| | Surfeit 1 | Novel | 0.081% | L179Q | −6.49 | 1.00 |
| | Histone cluster 2, H2ab | Novel | 0.074% | T121fs | −3.59 | 0.97 |
| | Stromal interaction molecule 2 | Novel | 0.081% | V281I | −1.65 | 0.97 |
| | Carboxypeptidase A3 (mast cell) | Novel | 0.074% | R178* | −5.47 | 0.94 |
| | Transmembrane and coiled-coil domains 3 | 0.0326% | 0.742% | A469fs | −1.93 | 0.93 |
| | Serpin peptidase inhibitor, clade F | Novel | 0.080% | A62fs | −1.74 | 0.84 |
| | Phosphorylase, glycogen, liver | 0.0037% | 0.149% | R276C | −0.08 | 0.71 |
| | Folliculin interacting protein 2 | 0.0016% | 0.101% | S893* | −0.86 | 0.58 |
| | Calcineurin-like phosphoesterase domain containing 1 | Novel | 0.074% | R149* | −0.14 | 0.44 |
| | Olfactory receptor, family 52, subfamily B, member 4 (gene/pseudogene) | 0.0018% | 0.076% | R195* | 4.81 | 0.09 |
| | Sodium channel, voltage gated, type X alpha subunit | 0.0037% | 0.074% | R1155C | 1.62 | 0.08 |
| | Zinc finger protein 683 | Novel | 0.089% | R35* | 1.18 | 0.03 |
A double-step machine learning algorithm selects variants based on a series of pathogenic prototypes and then further selects them using a permutation-based multi-model regression over age at onset. Variants in this set are negatively associated with age and are divided into three layers: at the top, variants negatively associated in at least 80% of the models and with an average beta below −1.5; in the middle, variants retained in at least 40% of the models with a poor average beta; at the bottom, variants found negatively associated in only a few models.
*translation termination (stop) codon
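The three layers described in the table legend can be expressed as a simple rule on the last two columns. This is a sketch under stated assumptions: the function name `layer` is hypothetical, the 40% threshold is applied to the "Negative beta percentage" column, and a variant with at least 80% negative models but an average beta of −1.5 or above is placed in the middle layer, a case the legend does not address.

```python
def layer(mean_beta, negative_fraction):
    """Assign a variant to the top, middle, or bottom layer.

    mean_beta: mean elastic net beta across models.
    negative_fraction: fraction of models with a negative beta (0 to 1).
    Thresholds follow the table legend; edge cases are assumptions.
    """
    if negative_fraction >= 0.80 and mean_beta < -1.5:
        return "top"
    if negative_fraction >= 0.40:
        return "middle"
    return "bottom"


# Rows taken from the table above: (mean beta, negative beta percentage).
print(layer(-1.74, 0.84))
print(layer(-0.08, 0.71))
print(layer(4.81, 0.09))
```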