
Somatic mutations in early onset luminal breast cancer.

Giselly Encinas1, Veronica Y Sabelnykova2, Eduardo Carneiro de Lyra3, Maria Lucia Hirata Katayama1, Simone Maistro1, Pedro Wilson Mompean de Vasconcellos Valle1, Gláucia Fernanda de Lima Pereira1, Lívia Munhoz Rodrigues1, Pedro Adolpho de Menezes Pacheco Serio1, Ana Carolina Ribeiro Chaves de Gouvêa1, Felipe Correa Geyer1, Ricardo Alves Basso3, Fátima Solange Pasini1, Maria Del Pilar Esteves Diz1, Maria Mitzi Brentani1, João Carlos Guedes Sampaio Góes3, Roger Chammas1, Paul C Boutros2,4,5, Maria Aparecida Azevedo Koike Folgueira1.   

Abstract

Breast cancer arising in very young patients may be biologically distinct; however, these tumors have been less well studied. We characterized a group of very young patients (≤ 35 years) for BRCA germline mutation and for somatic mutations in luminal (HER2 negative) breast cancer. Thirteen of 79 unselected very young patients were BRCA1/2 germline mutation carriers. Of the non-BRCA tumors, eight with luminal subtype (HER2 negative) were submitted for whole exome sequencing and integrated with 29 luminal samples from the COSMIC database or previous literature for analysis. We identified C to T single nucleotide variants (SNVs) as the most common base-change. A median of six candidate driver genes was mutated by SNVs in each sample and the most frequently mutated genes were PIK3CA, GATA3, TP53 and MAP2K4. Potential cancer drivers affected in the present non-BRCA tumors include GRHL2, PIK3AP1, CACNA1E, SEMA6D, SMURF2, RSBN1 and MTHFD2. Sixteen out of 37 luminal tumors (43%) harbored SNVs in DNA repair genes, such as ATR, BAP1, ERCC6, FANCD2, FANCL, MLH1, MUTYH, PALB2, POLD1, POLE, RAD9A, RAD51 and TP53, and 54% presented pathogenic mutations (frameshift or nonsense) in at least one gene involved in gene transcription. The differential biology of luminal early-age onset breast cancer needs a deeper genomic investigation.

Keywords:  breast cancer; germline mutation; luminal subtype; somatic mutation; young patients

Year:  2018        PMID: 29854292      PMCID: PMC5976478          DOI: 10.18632/oncotarget.25123

Source DB:  PubMed          Journal:  Oncotarget        ISSN: 1949-2553


INTRODUCTION

Breast cancer mainly affects post-menopausal women; however, it is estimated that 4.8-5.0% of cases occur in young adults under 40 years of age [1]. Even at this early age the disease can be highly fatal. In the USA, where cancer is the second leading cause of death in women younger than 40 years, breast cancer is the leading cause of cancer death in this age group [2]. There is evidence that some cancers in very young adults have a differential biology, and probably etiology/pathogenesis, compared with those in older persons [3]. Surprisingly, only a few studies have explored this question. In breast cancer, germline mutations in the BRCA1 and BRCA2 genes may support the carcinogenic process in around 20% of young patients [4-7], but in only 1-4% of post-menopausal women [8]. Mutations in other cancer predisposing genes, such as TP53, PTEN and CHEK2, may explain an additional 4% of early onset cases [9]. Younger age has been associated with a less favorable prognosis in breast cancer, partly because early onset cases comprise a lower proportion of the relatively good outcome luminal A subtype and a higher proportion of the more aggressive triple negative subtype. Moreover, women diagnosed at an early age may have worse outcomes than those diagnosed at more advanced ages within every breast cancer subtype, i.e., luminal [10-12], triple negative and HER2 [13]. Although young age seems to be a poor prognostic factor, different age cutoffs have been used, varying from 35 [10, 13] to 40 years [11, 12]. Interestingly, women younger than 35 years seem to have similar disease-free survival among themselves, which is worse than that of women aged 35 to 50 years [10, 13]. Accordingly, mRNA abundance analysis revealed a differential transcriptional profile in tumors arising in young women, with enrichment of biological processes related to immature mammary cell populations and growth factor signaling [11].
Molecular signatures of breast cancer subtypes, irrespective of age, have been examined, and great differences have been shown between basal-like and luminal tumors, the former presenting a higher rate of genomic rearrangements than the latter [14]. In addition, numerous subtype-associated and novel gene mutations have been described [15-18]. These studies, however, have not focused on somatic point mutations (single nucleotide variants, SNVs) that may distinguish early onset breast cancer. Hence, our aims were to characterize a group of Brazilian patients with early onset breast cancer for BRCA1 and BRCA2 germline mutations, as well as for somatic SNVs arising in luminal subtype tumors.

RESULTS

Family history suggestive of hereditary breast and ovarian cancer syndrome (HBOCS) and germline BRCA1 and BRCA2 mutations

Our first aim was to evaluate family history of cancer and to detect BRCA1 and BRCA2 mutations in very young Brazilian patients. For this purpose, 79 young women were interviewed, among whom 17 (21.5%) were not able to provide family history for one or both sides of the family. Thirty (48.3%) of the 62 patients with informative family history reported at least one close relative (up to third degree) with breast, ovarian, pancreatic or prostate cancer, among whom 10 (16.2%) reported at least one affected first degree family member (Supplementary Table 1). Thirteen of the 79 patients (16.5%) presented pathogenic mutations in the BRCA1 or BRCA2 genes. These represent 12 distinct mutations: three frameshift and one missense in BRCA1, and four frameshift, three nonsense and one missense in BRCA2. Only one mutation (the BRCA2 frameshift mutation c.2808_2811delACAA, p.Ala938Profs) was detected in two women; one nonsense mutation, c.483T>A (C161X) in exon 6 of BRCA2, was detected for the first time (Supplementary Tables 2 and 3). Twenty-nine variants of uncertain significance (VUS) were also identified, including 13 distinct missense variants, each detected only once: two in BRCA1 and 11 in BRCA2. Two patients presented more than one missense VUS in BRCA2: one, diagnosed with a triple negative tumor, reported a positive family history (c.3349A>G; c.5414A>G; c.8092G>A); the other, diagnosed with a luminal B tumor, had a limited family history (c.2837A>G; c.7418G>A). In addition, one VUS characterized as an in-frame deletion in exon 23 of BRCA1 (c.5425_5430delGTTGTG) (Supplementary Table 4) was observed in a patient with positive family history, diagnosed with triple negative breast cancer. The other VUS were characterized as intronic or synonymous variants. None of these patients presented large deletions/amplifications in BRCA1 and BRCA2, or the CHEK2 mutation c.1100delC.

Somatic SNVs detected by whole exome sequencing

Eight patients who were BRCA1 and BRCA2 wild type and had luminal HER2 negative tumors had their tumor and normal exomes sequenced. These patients mainly reported Brazilian ancestry on both sides of the family, meaning that their parents and grandparents were born in Brazil, but they did not know where their more distant ancestors had come from. One patient reported a maternal grandmother with Amerindian ancestry and a second patient reported paternal grandparents with European ancestry. Whole exome sequencing of these eight tumors and matched blood samples was performed to a mean sequencing depth of 35.8x for tumors and 36.3x for the corresponding blood samples (Supplementary Table 5). The mean total mutation rate across all samples was 1.9/Mbp. The mean non-silent mutation rate was 1.8/Mbp (Supplementary Table 6) and the most frequent events were C to T transitions, mainly seen in trinucleotides ACG>ATG and CCT>CTG (Figure 1).
Figure 1

Trinucleotide mutational profile of current luminal samples

Trinucleotide barplot showing the number of Single Nucleotide Variants (SNVs) in the context of each of the 96 trinucleotide mutation types. The blue covariates at the bottom of the plot represent the 5' and 3' ends. All the 310 SNVs were considered.
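The 96 trinucleotide mutation types in such a profile come from six pyrimidine-centred substitutions, each flanked by one of four 5' bases and one of four 3' bases. The following is a minimal sketch of that bookkeeping, assuming the common convention of reporting every substitution relative to the pyrimidine strand; the function names and the example SNVs are hypothetical and do not reproduce the pipeline used in this study.

```python
from collections import Counter
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# 6 pyrimidine-centred substitutions x 4 bases 5' x 4 bases 3' = 96 types.
ALL_TYPES = [
    f"{five}[{ref}>{alt}]{three}"
    for ref, alts in (("C", "AGT"), ("T", "ACG"))
    for alt in alts
    for five, three in product("ACGT", repeat=2)
]


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def mutation_type(context, alt):
    """Label an SNV such as 'A[C>T]G' from its trinucleotide context.

    `context` is the reference trinucleotide centred on the mutated base.
    Purine-centred changes are mapped onto the opposite strand so that the
    reference base is always C or T.
    """
    if len(context) != 3 or context[1] == alt:
        raise ValueError("need a 3-base context and a true substitution")
    if context[1] in "AG":
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def trinucleotide_profile(snvs):
    """Count SNVs per mutation type; absent types are reported as 0."""
    counts = Counter(mutation_type(context, alt) for context, alt in snvs)
    return {label: counts.get(label, 0) for label in ALL_TYPES}


# Hypothetical input: (reference trinucleotide, alternate base) pairs.
profile = trinucleotide_profile([("ACG", "T"), ("CGT", "A"), ("TTA", "C")])
```

In this sketch a G>A change in the context CGT is counted together with the C>T change in ACG, because both describe the same event seen from opposite strands.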

We identified 310 somatic single nucleotide variants (SNVs), comprising 303 unique variants (five SNVs were detected in two patients each; one SNV was detected in three patients), mainly located in intergenic regions and 3' UTRs or classified as missense, intronic or synonymous variants (Supplementary Table 7). The median mutation load was 37.5, ranging from 19 to 74 SNVs per tumor. SeqSig analysis revealed 55 likely driver non-synonymous mutations in 53 genes (false discovery rate (FDR) < 10%; Figure 2); PIK3CA was the only recurrently mutated gene, detected in three different tumors. Somatic SNVs were then verified by independent capillary sequencing (except for those in GLI3, LONRF3 and EPPK1, which were not tested), and 81% (42/52) were confirmed (Supplementary Tables 8, 8a). Confirmed SNVs included nonsense mutations in four genes (GRHL2, GRIN1, NOL9 and TTC21B), as well as 38 missense mutations in 36 different genes, including known tumor suppressor genes, such as TP53 and POLD1, and protein kinases, such as PRKD1, PRKAR1A and AK8.
Figure 2

Landscape of coding somatic SNVs

Each of the 54 genes in which at least one significant SNV was identified is listed down the left hand side. The genes are listed by their significant SeqSig q-value (FDR adjusted p-value). Type and number of mutations (top panel), significantly mutated genes (middle panel) and percentage of Single Nucleotide Variants (SNVs) (bottom panel) per tumor sample.
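A q-value is a p-value adjusted to control the false discovery rate across many tests. The sketch below uses the Benjamini-Hochberg procedure as one common way to obtain such values; this is an assumption for illustration, since the adjustment applied inside SeqSig is not described here, and the p-values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, ascending p
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


# Hypothetical per-gene p-values, not taken from the study.
example_p = [0.01, 0.04, 0.03, 0.20]
example_q = benjamini_hochberg(example_p)
# Genes kept at an FDR threshold of 10%.
significant = [i for i, q in enumerate(example_q) if q < 0.10]
```

With a 10% threshold, about one in ten of the genes retained would be expected to be a false positive, which is why the candidates were subsequently verified and scored.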

We compared our results with the gene list from the “Cancer Gene Census” database (http://cancer.sanger.ac.uk/census) [19] and detected five genes, PIK3CA, TP53, PRKAR1A, POLD1 and CIITA, which were already causally implicated in cancer. We then examined the list of candidates more closely to identify potential cancer driver genes, using the score system described in methods. This system is mainly based on detection in databases of mouse insertional mutagenesis experiments, mutation function assessment algorithms that evaluate causal relationship, Kaplan-Meier (KM) plotter [20] (to assess the effect of the genes on breast cancer prognosis) (Supplementary Figure 1) and the literature, among others. Excluding the five genes included in the “Cancer Gene Census” database, another 18 genes were already reported as candidate cancer genes through transposon-based forward genetic screens in mice [21] (Table 1). Seven genes were relatively frequently mutated (≥1%) in cancer in general or in breast cancer specifically: CACNA1E, HECW2, STAB1, ZNF462, FLG, TTN and NDST4. Finally, using the above ranking system, 20 genes scoring at least 2 were considered possible cancer drivers (Supplementary Tables 9, 9a), such as PIK3AP1, GRHL2, CACNA1E, SMURF2, SEMA6D, RSBN1 and MTHFD2, among others.
Table 1

Cancer-related analysis of confirmed gene variants detected in breast cancer samples in the current analysis

ID | Gene | Alteration | CGC | CCGD (mice) | Mutation domain | Same variant in BC/Other Cancers | SNVs Frequency in all cancers | SNVs Frequency in BC | SNVs in BC young patients/all ages | FATHMM (score) | PolyPhen | SIFT | GV/GD | CRAVAT - CHASM p-value (missense) | KM - OS | Literature | Score | Total
401 | MTHFD2 | p.P17L c.50C>T | No | Blood - D [1], Colorectal - NR [2] | low_complexity (Source: segmasker) | No/No | 38/37401 (0.10%) | 1/2137 (0.05%) | 0/1 | Pathogenic (0.87) | Benign | Tolerated | 0.00/97.78 (C65) | 0.2851 | p = 0.0027 OE | [35] | 2.5 | pd
401 | SEMA6D | p.E553A c.1658A>C | No | Sarcoma - B [6] Colorectal - B [7] | Plexin Repeat | No/No | 317/37626 (0.84%) | 17/2159 (0.78%) 1 FS | 2/17 | Pathogenic (0.74) | ND | Tolerated | 0.00/106.71 (C65) | 0.3967 | p = 0.0091 n≤200 | [8] | 3 | pd
402 | CACNA1E | p.R590W c.1768C>T | No | No results | Ion transport Domain | No/Yes | 902/37516 (2.40%) | 54/2116 (2.55%) 2 NS | 3/45 | Pathogenic (0.89) | Deleterious | Not Tolerated | 0.00/101.29 (C65) | 0.0523 | p = 0.11 n≤200 | [9, 10] | 3 | pd
402 | CIITA | p.P443T c.1327C>A | Yes | No results | NACHT domain | No/No | 2/37750 (0.005%) | 0 | - | Pathogenic (0.58) | Deleterious | Tolerated | 0.00/37.56 (C35) | 0.3563 | p = 0.085 | [11-13] | 4.5 | CGC
402 | FAM65B/RIPOR2 | p.E718D c.2154G>C | No | Blood - C [14] | No Pfam annotations found | No/No | 130/37401 (0.35%) | 7/2114 (0.33%) 1 FS | 1/7 | Neutral (0.26) | ND | Tolerated | 0.00/44.60 (C35) | 0.8042 | p = 0.14 | NO | 1 | Neutral
402 | HECW2 | p.D265G c.794A>G | No | Liver - D [15] | C2 Domain | No/No | 422/38016 (1.11%) | 20/2312 (0.86%) | 1/20 | Pathogenic (0.98) | Deleterious | Not Tolerated | 0.00/93.77 (C65) | 0.058 | p = 0.42 n≤200 | [16] | 3 | pd
402 | NES | p.E340V c.1019A>T | No | No results | No functional domain | No/No | 346/37419 (0.92%) | 17/2137 (0.79%) 1 NS | 0/17 | Pathogenic (0.59) | Deleterious | Not Tolerated | 0.00/121.33 (C65) | 0.1896 | p = 0.028 UE | [17-21] | 3 | pd
402 | PIK3CA | p.E545K c.1633G>A | Yes | Blood - D [1] Gastric - D [22] Liver - C [23, 24] Nervous system - D [25] Skin - B [26] | PIK domain | Yes/Yes | 10271/107457 (9.56%) | 4098/15384 (26.64%) | - | Pathogenic (0.97) | Deleterious | Tolerated | 0.00/56.87 (C55) | 0.0002 | p = 0.057 | Oncogene | 7.5 | CGC
402 | SMURF2 | p.S193C c.578C>G | No | Blood - A [1] Colorectal - C [2, 27] Gastric - C [22] Liver - A [15, 23] Pancreatic - D [28] | disorder (Source: IUPred) | No/No | 117/38086 (0.30%) | 9/2288 (0.39%) 3 FS | 1/9 | Pathogenic (0.91) | Benign | Not Tolerated | 0.00/111.67 (C65) | 0.4405 | p = 0.074 n≤200 | [29, 30] | 4 | PD
402 | STAB1 | p.G1381R c.4141G>A | No | No results | No functional domain | No/No | 502/37566 (1.34%) | 23/2158 (1.06%) 3 FS/2 NS | 0/22 | Pathogenic (0.72) | Deleterious | Tolerated | 0.00/125.13 (C65) | 0.6908 | p = 0.071 | [31-33] | 2.5 | pd
402 | ZNF462 | p.G2426C c.7276G>T | No | Breast - C [34] | No functional domain | No/No | 555/37476 (1.48%) | 36/2137 (1.68%) 2 NS | 3/36 | Pathogenic (0.89) | Deleterious | Tolerated | 0.00/158.23 (C65) | 0.0004 | p = 0.028 n≤200 | NO | 2.5 | pd
404 | CILP2 | p.R472G c.1414C>G | No | Blood - D [1] | No functional domain | No/No | 309/37401 (0.83%) | 5/2137 (0.23%) 1 NS | 0/5 | Neutral (0.1) | Deleterious | Not Tolerated | 0.00/125.13 (C65) | 0.661 | p = 0.17 n≤200 | NO | 1.5 | Neutral
404 | ELMO3 | p.L251F c.753G>C | No | No results | No Pfam annotations found | No/No | 117/37401 (0.31%) | 2/2137 (0.09%) | 1/2 | Pathogenic (0.66) | Benign | Tolerated | 0.00/21.82 (C15) | 0.0675 | p = 0.000052 OE | [35-37] | 2 | pd
404 | NOL9 | p.S283* c.848C>G | No | No results | low_complexity (Source: segmasker) | No/No | 111/37401 (0.30%) | 5/2137 (0.23%) 1 FS | 1/5 | Neutral (0.13) | ND | ND | - | - | p = 0.15 | NO | 1.5 | Neutral
406 | C2orf57/TEX44 | p.T265M c.794C>T | No | Blood - D [1] | Domain of unknown function | No/Yes | 89/37401 (0.24%) | 1/2137 (0.05%) 1 NS | 0/1 | Neutral (0.00) | Benign | Tolerated | 0.00/81.04 (C65) | 0.5572 | p = 0.14 n≤200 | NO | 0.5 | Neutral
406 | IL22 | p.R73C c.217C>T | No | No results | Interleukin 22 domain | No/No | 60/37402 (0.16%) | 3/2137 (0.14%) | 0/3 | Pathogenic (0.57) | Deleterious | Not Tolerated | 0.00/179.53 (C65) | 0.6159 | p = 0.4 n≤200 | [38-40] | 2.5 | pd
406 | OSR2 | p.G262E c.785G>A | No | No results | Zinc Finger domain | No/No | 98/37312 (0.26%) | 5/2126 (0.24%) 1 FS | 0/5 | Pathogenic (0.94) | Deleterious | Not Tolerated | 0.00/97.85 (C65) | 0.0922 | p = 0.2 | [41] | 2.5 | pd
406 | PIK3CA | p.H1047R c.3140A>G | Yes | Blood - D [1] Gastric - D [22] Liver - C [23, 24] Nervous system - D [25] Skin - B [26] | PI3K/PI4K domain | Yes/Yes | 10271/107457 (9.56%) | 4098/15384 (26.64%) | - | Pathogenic (0.96) | Deleterious | Tolerated | 0.00/28.82 (C25) | 0 | p = 0.057 | Oncogene | 7.5 | CGC
406 | POC5 | p.R541Q c.1622G>A | No | No results | No Pfam annotations found | No/No | 87/37355 (0.23%) | 3/2137 (0.14%) 1 FS | 2/3 | Pathogenic (0.94) | Deleterious | Not Tolerated | 0.00/48.81 (C35) | 0.4111 | p = 0.06 n≤200 | NO | 1 | Neutral
406 | PRKAR1A | p.L20F c.58C>T | Yes | Liver - B [15, 23] Nervous system - D [25] | Dimerization and phosphorylation region | No/No | 112/40450 (0.28%) | 8/2379 (0.34%) 2 NS | 0/8 | Pathogenic (0.94) | Benign | Tolerated | 0.00/21.82 (C15) | 0.3232 | p = 0.14 | [42-44] | 6 | CGC
406 | TP53 | p.T220C c.659A>G | Yes | Colorectal - C [2, 27, 45] Nervous system - NR [46] Skin - A [47] Liver - NR [24] | P53 DNA-binding domain | Yes/Yes | 31140/127779 (24.37%) | 3189/13359 (23.87%) | - | Pathogenic (0.99) | Deleterious | Not Tolerated | - | 0.0012 | p = 0.041 UE | TSG | 9 | CGC
413 | AK8 | p.T101P c.301A>C | No | Blood - B [1] | Adenylate kinase | No/No | 106/37402 (0.28%) | 3/2114 (0.14%) | 1/3 | Neutral (0.02) | Benign | Tolerated | 0.00/37.56 (C35) | 0.7606 | p = 0.12 n≤200 | NO | 2 | pd
413 | PLA2G4D | p.S173G c.517A>G | No | Blood - D [1] | No functional domain | No/No | 197/37446 (0.53%) | 10/2135 (0.47%) 1 FS | 0/10 | Neutral (0.04) | Benign | Tolerated | 0.00/55.27 (C55) | 0.3458 | - | NO | 0.5 | Neutral
413 | POLD1 | p.P146R c.437C>G | Yes | No results | Exonuclease domain | No/No | 263/37786 (0.70%) | 9/2137 (0.42%) 5 FS | 0/9 | Pathogenic (0.95) | Deleterious | Not Tolerated | 0.00/102.71 (C65) | 0.1793 | p = 0.000042 OE | [48-51] | 6.5 | CGC
413 | RSBN1 | p.P148S c.442C>T | No | Blood - B [1] Liver - D [15] Colorectal - C [2] Gastric - C [22] | Pro-Rich domain | No/No | 149/37401 (0.40%) | 12/2137 (0.52%) 1 NS | 1/12 | Neutral (0.27) | Deleterious | Not Tolerated | 0.00/73.35 (C65) | 0.1256 | p = 0.011 UE | NO | 4 | PD
413 | SLC13A1 | p.R277P c.830G>C | No | No results | No Pfam annotations found | No/No | 241/37402 (0.64%) | 15/2137 (0.70%) 1 NS | 0/15 | Neutral (0.02) | Benign | Tolerated | 0.0/102.71 (C65) | 0.6497 | p = 0.064 | NO | 0 | Neutral
413 | ZNF33A | p.G183V c.548G>T | No | No results | No functional domain | No/No | 166/37402 (0.44%) | 4/2137 (0.19%) 1 FS | 0/4 | Neutral (0.12) | ND | ND |  | 0.3098 | p = 0.0081 n≤200 | NO | 0 | Neutral
415 | LRRC66 | p.H434N c.1300C>A | No | No results | disorder (Source: IUPred) | No/No | 276/37403 (0.74%) | 7/2137 (0.33%) | 2/7 | Neutral (0.00) | Benign | Tolerated | 0.00/68.35 (C65) | 0.577 | p = 0.015 n≤200 | NO | 0 | Neutral
415 | MAMLD1 | p.A775V c.2324C>T | No | Mixed - NR [52] Colorectal - C [2] | No Pfam annotations found | No/No | 172/37402 (0.46%) | 17/2137 (0.79%) 2 FS/1 NS | 0/17 | Neutral (0.00) | ND | Tolerated | 0.00/64.43 (C65) | 0.7952 | p = 0.23 | NO | 1 | Neutral
415 | PIK3AP1 | p.Q285K c.853C>A | No | Blood - A [1] Colorectal - NR [2] Liver - C [15, 23] | Dof, BCAP, and BANK (DBB) motif | No/No | 194/37522 (0.51%) | 7/2137 (0.33%) | 0/7 | Pathogenic (0.98) | Deleterious | Not Tolerated | 0.00/53.23 (C45) | 0.1641 | p = 0.17 n≤200 | [53-55] | 4.5 | PD
415 | PIK3CA | p.H1047L c.3140A>T | Yes | Blood - D [1] Gastric - D [22] Liver - C [23, 24] Nervous system - D [25] Skin - B [26] | PI3K/PI4K domain | Yes/Yes | 10271/107457 (9.56%) | 4098/15384 (26.64%) | - | Pathogenic (0.96) | Benign | Tolerated | 0.00/98.69 (C65) | 0 | p = 0.057 | Oncogene | 7.5 | CGC
415 | PRKD1 | p.Y800C c.2399A>G | No | No results | Protein Kinase Domain | No/No | 328/38363 (0.85%) | 11/2364 (0.46%) 1 NS | 0/11 | Pathogenic (0.98) | Deleterious | Not Tolerated | 0.00/193.72 (C65) | 0.0002 | p = 0.075 | [56-60] | 2.5 | pd
415 | TTC21B | p.R898* c.2692C>T | No | No results | Tetratricopeptide repeat | No/No | 230/37405 (0.61%) | 13/2137 (0.61%) 1 FS/1 NS | 0/13 | Pathogenic (0.90) | ND | ND | - |  | p = 0.15 | NO | 1.5 | Neutral
416 | FAM96A | p.E75K c.223G>A | No | Colorectal - NR [2] | Iron-sulfur cluster assembly protein | No/No | 36/37402 (0.10%) | 2/2137 (0.09%) | 0/2 | Pathogenic (0.94) | Benign | Tolerated | 0.00/56.87 (C55) | 0.6908 | p = 0.039 n≤200 | [61, 62] | 1.5 | Neutral
416 | FLG | p.R1166C c.3496C>T | No | No results | disorder (Source: IUPred) | No/Yes | 1467/37980 (3.87%) | 78/2139 (3.64%) 1 NS | - | Neutral (0.01) | ND | Not Tolerated | 0.00/179.53 (C65) | 0.6299 | p = 0.18 | [63] | 1.5 | Neutral
416 | GRIN1 | p.Q910* c.2728C>T | No | Colorectal - NR [2] | disorder (Source: IUPred) | No/No | 162/37493 (0.43%) | 7/2137 (0.33%) | 0/7 | Pathogenic (0.77) | ND | ND | - | - | p = 0.24 | [64] | 2.5 | pd
416 | MYO1H | p.E501G c.1502A>G | No | No results | Myosin motor domain | No/No | 237/37317 (0.63%) | 17/2126 (0.76%) 1 FS/1 NS | 1/17 | Pathogenic (0.98) | ND | Tolerated | 0.00/97.85 (C65) | 0.0825 | p = 0.14 n≤200 | NO | 0.5 | Neutral
416 | TTN | p.L6228S c.18683T>C | No | Colorectal - B [65] | IG-Like 43 domain | No/No | 4470/37491 (11.92%) | 288/2105 (13.68%) | - | Pathogenic (0.81) | Deleterious | ND | - | 0.1523 | p = 0.019 OE | [66, 67] | 4.5 | PD
417 | AZI2 | p.I66V c.196A>G | No | No results | coiled_coil (Source: ncoils) | No/No | 63/37401 (0.17%) | 5/2137 (0.23%) 1 FS | 0/5 | Pathogenic (0.65) | Benign | Tolerated | 0.00/29.61 (C25) | 0.2633 | p = 0.000061 n≤200 | NO | 0 | Neutral
417 | GRHL2 | p.E32* c.94G>T | No | Colorectal - D [2], pancreatic - D [58] | disorder (Source: IUPred) | No/No | 171/37403 (0.46%) | 17/2138 (0.79%) 3 NS | 0/17 | Pathogenic (0.99) | ND | ND | - | - | p = 0.39 | [68, 69] | 3 | pd
417 | NDST4 | p.V313F c.937G>T | No | No results | heparan sulfate-N-deacetylase domain | No/No | 403/37402 (1.08%) | 6/2137 (0.28%) | 0/6 | Pathogenic (0.98) | Deleterious | Not Tolerated | 0.00/49.94 (C45) | 0.0601 | p = 0.1 | [70, 71] | 2.5 | pd
417 | PRICKLE2 | p.P81L c.242C>T | No | Skin - D [26] | PET Domain | No/No | 213/37402 (0.57%) | 8/2138 (0.37%) 2 NS | 0/8 | Pathogenic (0.99) | Deleterious | Not Tolerated | 0.00/97.78 (C65) | 0.0751 | p = 0.17 n≤200 | [72, 73] | 3 | pd

SNVs Frequency in BC: frequency of SNVs (including synonymous) in breast cancer. SNVs in BC young patients/all ages: number of SNVs (excluding synonymous) in breast cancer (BC) patients ≤ 35 years/number of SNVs in BC patients of all ages (excluding patients of unknown age). CGC: genes for which mutations have been causally implicated in cancer and which are catalogued in the “Cancer Gene Census”. CCGD: genes that are potential cancer drivers in genetic screens in mice and are catalogued in the “Candidate Cancer Gene Database”. NS: Nonsense; FS: Frameshift. KM - OS: p-value for overall survival evaluated through gene expression using KM plotter. Abbreviations: OE: Overexpression associated with longer survival; UE: Underexpression associated with longer survival. PD: Probably driver; pd: Possibly driver. The score system is described in Supplementary Table 9. Literature comments are referenced in Supplementary Table 9a.
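The ≥1% frequency criterion can be checked directly from the mutated/total counts in Table 1. The following is a small sketch restricted to a subset of the non-CGC genes in the table; the variable and function names are hypothetical, and the counts are copied from the "SNVs Frequency in all cancers" and "SNVs Frequency in BC" columns.

```python
# gene: ((mutated, total) in all cancers, (mutated, total) in breast cancer)
TABLE1_COUNTS = {
    "MTHFD2": ((38, 37401), (1, 2137)),
    "SEMA6D": ((317, 37626), (17, 2159)),
    "CACNA1E": ((902, 37516), (54, 2116)),
    "HECW2": ((422, 38016), (20, 2312)),
    "NES": ((346, 37419), (17, 2137)),
    "STAB1": ((502, 37566), (23, 2158)),
    "ZNF462": ((555, 37476), (36, 2137)),
    "FLG": ((1467, 37980), (78, 2139)),
    "TTN": ((4470, 37491), (288, 2105)),
    "NDST4": ((403, 37402), (6, 2137)),
}

THRESHOLD = 0.01  # 1%, the cut-off quoted in the text


def frequently_mutated(counts, threshold=THRESHOLD):
    """Genes at or above the threshold in all cancers or in breast cancer."""
    selected = set()
    for gene, ((all_mut, all_n), (bc_mut, bc_n)) in counts.items():
        if all_mut / all_n >= threshold or bc_mut / bc_n >= threshold:
            selected.add(gene)
    return selected


frequent = frequently_mutated(TABLE1_COUNTS)
```

Note that HECW2 and NDST4 pass only through their frequency in all cancers, while their breast cancer frequency stays below 1%.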

Each tumor sample was then individually explored to detect potential drivers. Three tumors presented SNVs in at least three potential cancer driver genes: 402, 406 and 415. In tumor 402, besides PIK3CA and CIITA, other candidate cancer genes harboring somatic SNVs were CACNA1E, NES, STAB1, HECW2, SMURF2 and ZNF462. In tumor 406, SNVs were detected in three known cancer genes reported in the “Cancer Gene Census” database [19]: PIK3CA, TP53 and PRKAR1A. However, the alteration detected in the last of these was considered pathogenic by only one of the five mutation function assessment algorithms (Table 1). In addition, SNVs were observed in two other possible driver genes, IL22 and OSR2. In tumor 415, besides PIK3CA, other potential cancer drivers affected by SNVs were PIK3AP1 and PRKD1. In the other five tumors, SNVs were identified in one to three potential cancer driver genes: in tumor 413, RSBN1; in tumor 416, TTN and GRIN1; in tumor 401, SEMA6D [21, 22] as well as MTHFD2; in tumor 417, GRHL2, PRICKLE2 and NDST4. In tumor 404, a possible cancer driver is ELMO3, for which KM plotter indicated that overexpression is associated with poor overall survival (Supplementary Figure 1).
To further explore somatic mutations in luminal tumors (HER2 negative) from very young patients, we identified another 29 patients aged ≤35 years at diagnosis who had data published in studies of tumor exome or genome sequencing [15-18], most of which were deposited in the COSMIC database [15-17]. In these tumors, the most frequent events were C to T transitions, representing a mean of 39% of the substitutions (Supplementary Figure 2). A total of 1,617 non-synonymous variants were detected across these 29 patients, with a median of 29 variants per patient (minimum: 9; maximum: 546; mean: 56) (Supplementary Tables 10, 10a). Some genes present in our list were also mutated in these luminal tumors, such as PIK3CA, TP53, AK8, CIITA, FLG, POC5, POLD1, SEMA6D, TTN and LRRC66. Functional categories enriched in gene variants according to the DAVID bioinformatics tool [23] included ATP binding, in five tumors, and plasma membrane, in three tumors, among others less frequently represented (Supplementary Table 11). Seven of these 29 tumors presented SNVs in just one cancer driver classified in the “Cancer Gene Census” database: AKT1, MPL (MPL Proto-Oncogene, Thrombopoietin Receptor), TP53, GATA3 (2 samples), BCOR (BCL6 Corepressor) and KMT2C (Lysine Methyltransferase 2C), while another 19 tumors presented SNVs in at least two cancer genes from the “Cancer Gene Census” database (Supplementary Table 12).
Furthermore, three tumors did not show any variants in driver candidates from the “Cancer Gene Census” list, but each one presented SNVs in one or two genes already reported in the “Candidate Cancer Gene Database”, category A: PDS5B (PDS5 cohesin associated factor); LPHN2/ADGRL2 (Adhesion G Protein-Coupled Receptor L2) and ETF1 (Eukaryotic translation termination factor 1); CELF2 (CUGBP Elav-like family member 2) and NAP1L4 (Nucleosome assembly protein 1 like 4). All genes considered as causally implicated in cancer or as potential cancer drivers are shown in Figure 3 and Supplementary Table 12. The score system (described in methods) identified FAT2 (FAT atypical cadherin 2) as a probable driver gene in two samples, because it is ranked B in CCGD, is frequently mutated in cancers, and its variants were considered pathogenic by three of the four prediction models of cancer causality investigated.
Figure 3

Distribution of mutated candidate driver genes among 28 tumor samples retrieved from the literature and COSMIC database

All cancer genes listed at “Cancer Gene Census” (CGC) database (http://cancer.sanger.ac.uk/cosmic/census) and all driver candidates listed in “Candidate Cancer Gene Database” (CCGD), ranked as A (http://ccgd-starrlab.oit.umn.edu/about.php), are shown. Note: Sample TCGA-04 is shown exclusively in Supplementary Table 10 (but not in the figure), due to a large number of somatic mutations (CGC= 30; CCGD rank A= 56). Green: CGC; Red: CCGD, rank A [18]. Causal relationship with cancer was based on a scoring system, described in Materials and Methods. All reported genes affected by SNVs appear in Supplementary Table 10.

Among the 29 tumors, six were obtained from patients whose BRCA1 and BRCA2 status was known: two wild type and four mutation carriers. Somatic SNVs in both tumors from BRCA1 and BRCA2 germline wild type patients involved GATA3; however, none of the affected genes in this pair of tumors coincided with those found in our patients. Finally, we analyzed all 37 tumors together (29 previously reported and 8 currently evaluated). Considering only SNVs detected in genes already included in the “Cancer Gene Census” database or in the “Candidate Cancer Gene Database”, categories A or B, the median numbers (minimum-maximum) of driver candidates per tumor were 2 (0-30), 2 (0-56) and 2 (0-61), respectively, giving a median of 6 potential drivers affected per tumor (0-147) (Supplementary Table 12). The most frequently altered cancer-causing genes according to the “Cancer Gene Census” were PIK3CA (11/37; 29.7%), GATA3 (7/37; 18.9%), TP53 (6/37; 16.2%) and MAP2K4 (3/37; 8.1%). SNVs were also frequently detected in the following genes: TTN (7/37; 18.9%), CAMK1G, LYST, DALRD3 (3/29; 10.3%) and FLG (3/37; 8.1%). Among these genes, it is interesting to point out that pathogenic frameshift mutations in DALRD3 were detected in two (out of three) tumor samples. PIK3CA was concomitantly mutated with TP53 in three tumors and with GATA3 in one tumor (Figure 4; Supplementary Figure 3).
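The per-gene frequencies quoted above follow from simple proportions over the 37 pooled tumors. A minimal sketch of that arithmetic, using the counts stated in the text (the dictionary and function names are hypothetical):

```python
TOTAL_TUMORS = 37  # 29 previously reported + 8 currently evaluated

# Number of tumors with at least one SNV in each gene, as stated in the text.
MUTATED_TUMORS = {
    "PIK3CA": 11,
    "GATA3": 7,
    "TTN": 7,
    "TP53": 6,
    "MAP2K4": 3,
    "FLG": 3,
}


def percent(count, total=TOTAL_TUMORS):
    """Percentage of tumors carrying a mutation, rounded to one decimal."""
    return round(100 * count / total, 1)


frequencies = {gene: percent(n) for gene, n in MUTATED_TUMORS.items()}
# Genes ordered from most to least frequently mutated.
ranking = sorted(frequencies, key=frequencies.get, reverse=True)
```

The 10.3% figure reported for CAMK1G, LYST and DALRD3 uses the 29 literature tumors as denominator, which corresponds to `percent(3, total=29)` in this sketch.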
Figure 4

Most frequently mutated genes in luminal tumors

Samples (n=27) presenting SNVs in at least one of the nine most frequently mutated genes were included (current analysis, n=4; and COSMIC Database, n=23). Type of gene alteration and BRCA1/2 status are shown. Each column represents a single patient. UK: unknown.

SNVs were detected in genes involved in DNA repair mechanisms in 16 of the 37 tumors (43.2%). In 11 samples, only one gene was altered, such as FANCD2, FANCL or BAP1, which are involved in homologous recombination repair (HRR); PARP4 (2 samples), involved in base excision repair (BER); and ATR and TP53 (the latter altered in 3 samples), involved in signaling DNA damage to cell cycle checkpoints. In two tumors, SNVs uniquely affected the polymerases POLD1 or POLE, which are involved in BER, nucleotide excision repair (NER) and mismatch repair (MMR). Three samples presented composite gene disturbances involving TP53 and either POLD1, RAD51 (HRR) or POLQ (involved in translesion synthesis, TLS). The highest number of SNVs was described in two tumors, one presenting mutations in genes involved in BER (MUTYH), NER (ERCC6) and HRR (PALB2), and the other in genes involved in MMR (MLH1) and HRR (RAD9A) [24] (Table 2).
Table 2

Samples with somatic mutations in genes involved in DNA repair mechanisms

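Sample | Gene | Mechanisms of DNA repair (BER, NER, MMR, HRR, NHEJ, DDC, TLS) | N. of variants/sample
406 | TP53 | DDC | 7
413 | POLD1 | BER, NER, MMR | 6
PD-02 | FANCD2 | HRR | 17
PD-04 | POLD1 | BER, NER, MMR | 55
PD-04 | TP53 | DDC |
PD-05 | ATR | DDC; second marked mechanism not recoverable from the source layout | 36
PD-06 | FANCL | HRR | 76
PD-10 | BAP1 | HRR | 17
PD-11 | TP53 | DDC | 16
TCGA-01 | PARP4 | BER | 46
TCGA-04 | MUTYH | BER | 546
TCGA-04 | ERCC6 | NER |
TCGA-04 | PALB2 | HRR |
TCGA-06 | PARP4 | BER | 48
TCGA-07 | POLE | BER, NER, MMR | 21
TCGA-08 | MLH1 | MMR | 229
TCGA-08 | RAD9A | HRR |
TCGA-10 | TP53 | DDC | 9
TCGA-11 | TP53 | DDC | 84
TCGA-11 | RAD51 | HRR |
TCGA-14 | POLQ | TLS | 79
TCGA-14 | TP53 | DDC |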

DNA repair genes altered and respective pathways affected per patient. Base excision repair (BER), nucleotide excision repair (NER), mismatch repair (MMR), homologous recombination repair (HRR), non-homologous end-joining (NHEJ), DNA damage signaling to cell cycle checkpoints (DDC) and translesion synthesis (TLS).
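The sample counts reported for DNA repair genes can be re-derived from the sample-to-gene assignments in Table 2. The sketch below transcribes those assignments into a dictionary (the names are hypothetical) and separates samples with a single altered repair gene from those with several.

```python
TOTAL_TUMORS = 37

# Sample -> DNA repair genes carrying SNVs, transcribed from Table 2.
REPAIR_GENES_BY_SAMPLE = {
    "406": ["TP53"],
    "413": ["POLD1"],
    "PD-02": ["FANCD2"],
    "PD-04": ["POLD1", "TP53"],
    "PD-05": ["ATR"],
    "PD-06": ["FANCL"],
    "PD-10": ["BAP1"],
    "PD-11": ["TP53"],
    "TCGA-01": ["PARP4"],
    "TCGA-04": ["MUTYH", "ERCC6", "PALB2"],
    "TCGA-06": ["PARP4"],
    "TCGA-07": ["POLE"],
    "TCGA-08": ["MLH1", "RAD9A"],
    "TCGA-10": ["TP53"],
    "TCGA-11": ["TP53", "RAD51"],
    "TCGA-14": ["POLQ", "TP53"],
}

# Samples with exactly one altered repair gene versus more than one.
single_gene = [s for s, genes in REPAIR_GENES_BY_SAMPLE.items() if len(genes) == 1]
multi_gene = [s for s, genes in REPAIR_GENES_BY_SAMPLE.items() if len(genes) > 1]

# Share of all luminal tumors with at least one altered repair gene.
affected_percent = round(100 * len(REPAIR_GENES_BY_SAMPLE) / TOTAL_TUMORS, 1)

# Number of samples in which TP53 is the altered gene (alone or combined).
tp53_samples = sum("TP53" in genes for genes in REPAIR_GENES_BY_SAMPLE.values())
```

The five multi-gene samples correspond to the three composite TP53 cases plus the two tumors with the highest numbers of SNVs described in the text.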

In addition, variants involving 213 genes were of nonsense or frameshift types (Supplementary Table 12). One of these genes, RBM16/SCAF8, is a driver candidate, because it is also listed in the “Candidate Cancer Gene Database”, rank A, in at least two solid tumor types. Among these genes, 42 were involved in positive regulation of gene expression [25]: a single such gene was mutated in each of 15 samples, and more than one was mutated in five other samples. Hence, 54% (20/37) of the luminal samples presented at least one mutated gene involved in gene expression regulation (Table 3).
Table 3

Characterization of nonsense and frameshift variants according to CCGD A/B and biological function (Toppgene) per sample

Sample | Gene ID | CCGD A | CCGD B | Positive regulation of gene expression | N. of variants/sample
415 | TTC21B | - | - | - | 6
416 | GRIN1 | - | - | GRIN1 | 5
417 | GRHL2 | - | - | GRHL2 | 4
BRV-01 | CTCFL; FAM118B; MARVELD2; VPS11 | - | - | CTCFL | 28
PD-02 | PTEN | PTEN | - | PTEN | 17
PD-04 | FSCB; IL12RB2; AKAP11 | - | - | - | 55
PD-05 | ATR; VIM; HK1; OR7C1; RBM16/SCAF8 | ATR; RBM16/SCAF8 | - | VIM | 36
PD-06 | CACNA2D3; DNAH17; PEX5L; MAB21L3; SYNE1; ZMYND11 | - | ZMYND11; DNAH17 | - | 76
PD-07 | PTRH1 | - | - | - | 13
PD-08 | KCNJ15; SYTL2; ENSG00000233280 | - | - | - | 41
PD-09 | PLCG2; SHCBP1; GATA3 | PLCG2 | GATA3 | GATA3 | 15
PD-10 | NEMF | NEMF | - |  | 17
PD-11 | KRTAP2-1; SLC2A3; NARG2/ICE2; COL22A1 | - | - | NARG2/ICE2 | 16
PD-12 | GATA3; PDE7A | - | GATA3 | GATA3 | 22
TCGA-01 | ABCA10; NTRK2; MAP3K6_ENST00000374040; CX3CR1; KBTBD4; KMT2A | - | - | KMT2A; NTRK2 | 46
TCGA-02 | GATA3; DALRD3; RASGRP2; SALL3; TNFSF9 | - | GATA3 | GATA3 | 18
TCGA-03 | C1orf187; NR1I3; FAM155A; GNAS; PCDHA2; SSC5D; SEC14L5; WDR81_ENST00000409644 | - | - | NR1I3 | 21
TCGA-04 | 41 | UBR5 | BTBD7; ITGB1; KLHDC2; MTA2; ODF2; PCCA; PPFIA3; RASGRF1 | ITGA8; MTA2; NFKBIA; ATF7IP; SPAG8; TARBP2; TLR3 | 546
TCGA-05 | NMS; FAM111B; DYNC2H1_ENST00000398093 | - | - | - | 31
TCGA-06 | ARID1A; CFTR; SYNM; GATA3; CCDC61; CDK18; IRF7; TCF20; KIAA0430/MARF1; LZTR1; MAP2K4; SH3PXD2A | ARID1A; TCF20; CFTR; MAP2K4; SH3PXD2A | GATA3 | ARID1A; GATA3; IRF7; TCF20 | 48
TCGA-07 | GATA3 |  | GATA3 | GATA3 | 21
TCGA-08 | 73 | ATXN2; DIP2B; KIAA2026_ENST00000399933; PCNX1; PHF2; TNKS2; SLC9A1 | CLMN; KIAA0947_ENST00000296564/ICE1; OSBPL1A; RAB11A; ARHGAP29; SLTM | 17 | 229
TCGA-09 | DYNC1H1; IGSF3; MAST1; AKAP12; ASB10_ENST00000422024; NASP; TENM1/ODZ1; THOC5; ZNF799 | - | THOC5 | THOC5 | 35
TCGA-10 | C9orf66 | - | - | - | 9
TCGA-11 | A2M; CHKB; NBR1; RB1; SYT3; ARR3; KIFC3; PPP1R3C; ZBTB24 | RB1 | NBR1 | RB1 | 84
TCGA-12 | EFEMP1; MAP2K4; C1orf35; KIF26A | MAP2K4 | - | - | 15
TCGA-13 | SNUPN_ENST00000371091; GATA3; GRM6 | - | GATA3 | GATA3 | 13
TCGA-14 | PRDM5; COL14A1; POLA1; SLC22A25 | - | PRDM5 | - | 79
TCGA-15 | IGSF1; NFYB; SCN2A; TRAF5 | - | NFYB | NFYB; TRAF5 | 34
TCGA-16 | JHDM1D/KDM7A | - | - | JHDM1D/KDM7A | 30

Samples with genes affected by nonsense or frameshift variants were searched for candidate cancer genes (CCGD database ranks A or B) and involvement in positive regulation of gene expression (GO: biological process).


DISCUSSION

Our goal was to characterize BRCA1 and BRCA2 germline mutations in a group of very young Brazilian patients and to identify somatic mutations in luminal HER2 negative breast cancer. Our data indicate that in very young Brazilian patients the BRCA1 and BRCA2 mutation frequency is 16%, similar to that already reported in comparable groups of patients from Brazil [7], as well as from other countries [4-8]. However, there is still a lack of information regarding the spectrum of mutations and VUS in the general Brazilian population, which is highly admixed, comprising a mixture of 70% European, 15% African and 15% Amerindian ancestry [26]. In our patients we detected a new mutation in the BRCA2 gene, as well as another 13 variants of unknown significance.

Somatic mutations in the group of eight luminal samples (HER2 negative) from patients with wild-type BRCA1 and BRCA2 were then investigated. The overall mutation rate in these tumor samples was 1.93 per Mbp, as compared with 1.18 per Mbp reported in luminal samples from post-menopausal women [27] and 1.66 per Mbp reported in breast cancer samples in general, irrespective of subtype or age [18]. We also detected a predominance of C>T substitutions, a signature previously associated with advancing age, indicating that these alterations are also the most prevalent in early onset breast cancer [28]. Accordingly, the same signature was also the most frequent among other luminal tumors from very young patients deposited in COSMIC [15-17]. In the present series, somatic SNVs affected, among others, five known cancer-causing genes: PIK3CA, TP53, PRKAR1A, POLD1 and CIITA [19]. PIK3CA was the only recurrently mutated gene, detected in three different tumors. Other cancer-causing candidates were SMURF2, PIK3AP1, RSBN1, TTN and SEMA6D, which were ranked in the top 25% of potential drivers in transposon insertional mutagenesis studies in mice [21, 29]. 
These gene variants were also considered pathogenic/deleterious/not tolerated by at least two out of five mutation function assessment algorithms. In addition, SNVs were detected in genes previously associated with cancer, such as CACNA1E, PRKD1 and NDST4, and were also considered pathogenic/deleterious/not tolerated by at least three mutation function models. Moreover, nonsense mutations were detected in GRHL2, GRIN1, NOL9 and TTC21B; however, only GRHL2 and GRIN1 were previously shown to be involved in cancer.

GRHL2 (grainyhead-like transcription factor 2) is a transcription factor that mainly suppresses the epithelial-mesenchymal transition (EMT) process. It is considered a potential tumor suppressor gene in breast cancer [30]. GRIN1 or NMDAR1 (N-Methyl-D-Aspartate Receptor Subunit NR1) was shown to be expressed in breast cancer specimens, but not in normal breast, and to be involved in tumor growth [31], and is thus a potential oncogene. SMURF2 (SMAD specific E3 ubiquitin protein ligase 2) is a tumor suppressor involved in the maintenance of genomic stability and suppression of breast cancer cell invasiveness [32, 33]. PIK3AP1 (phosphoinositide-3-kinase adaptor protein 1), also known as BCAP, is involved in the phosphatidylinositol 3-kinase (PI3K) pathway, and genome wide association studies suggest that the PIK3AP1 gene region might be involved in breast cancer predisposition [34]. RSBN1 encodes round spermatid basic protein 1, whose function is not well established. In breast cancer cell lines, RSBN1 expression is induced by hypoxia and the gene is a potential HIF target [35]. In addition, in luminal breast cancer, high RSBN1 expression is associated with a better prognosis [20]. CACNA1E (calcium voltage-gated channel subunit alpha-1 E) was shown to be underexpressed in breast cancer compared with normal tissue and was hypothesized to be a tumor suppressor gene in some types of cancer [36]. 
In the current study, the CACNA1E mutation occurred in a hot spot site already reported as altered in at least five different types of cancer. PRKD1 codes for a serine-threonine kinase, and mutations all over the gene have been described in various types of cancer. A recurrent activating mutation in the kinase domain, described in polymorphous low grade adenocarcinoma of salivary glands, was associated with improved metastasis-free survival in a transfection cell model [37]. In breast cancer cells, however, PRKD1 may display a dual function: as an oncogene, stimulating drug resistance in breast cancer stemness [38], or as a tumor suppressor, blocking invasion and metastasis. In our patient, the PRKD1 mutation was located in the distal region of the kinase domain.

POLD1 codes for the catalytic subunit of DNA polymerase delta, which plays a role in DNA replication and DNA repair [39]. Both germline and somatic mutations of the gene may cause an ultra-mutated phenotype, and mutations affecting the exonuclease domain are associated with a high risk of colorectal and endometrial carcinomas [40]. In our patient, the POLD1 amino-acid change occurred in the exonuclease domain. In addition, POLD1 was also mutated in another luminal sample from a very young patient present in the COSMIC database [16]. Although infrequent in breast cancer, five of ten POLD1 somatic mutations reported in the COSMIC database were frameshift mutations and therefore potentially pathogenic (http://cancer.sanger.ac.uk/cosmic; July, 2017).

NDST4 (N-deacetylase/N-sulfotransferase-4) is involved in heparan sulfate (HS) biosynthesis and may be implicated in positive or negative aspects of tumor progression. In colorectal cancer, NDST4 loss of function was implicated in tumor progression and the gene was considered a candidate tumor suppressor [41]. ELMO3 (Engulfment and Motility 3) is involved in the induction of cell proliferation, invasion and metastasis in colorectal cancer cells [42]. 
In addition, ELMO3 positive/higher expression is associated with poor overall survival in non-small cell lung cancer and head and neck cancer, as well as in breast cancer, corroborating its role as an oncogene [43, 44]. MTHFD2 (methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 2) is a source of carbon units for purine synthesis in rapidly growing cancer cells and has been associated with poor prognosis in patients with breast cancer [45, 46]. SEMA6D (Semaphorin 6D) encodes a transmembrane protein, and its overexpression increases proliferation and tumor formation, playing an oncogenic role in osteosarcoma [47]. However, high SEMA6D expression is associated with better patient survival, especially in triple negative breast cancer [48].

The results, considering all 37 tumors (29 previously analyzed and the eight currently analyzed), suggest that the median number of driver candidates per tumor is six; however, this number is quite variable. Moreover, in luminal tumors from very young patients the most frequent cancer drivers are PIK3CA, GATA3 and TP53. Accordingly, in a recent analysis that included some of these very young patients (≤ 35 years) but mainly older patients, with ages up to 45 years, the most prevalent mutated genes were also PIK3CA, TP53, GATA3 and TTN [49]. Other frequently mutated genes were CAMK1G (Calcium/Calmodulin Dependent Protein Kinase IG), DALRD3 (DALR Anticodon Binding Domain Containing 3), LYST (Lysosomal Trafficking Regulator) and MAP2K4 (3/37: 8.1%). DALRD3 contains two microRNA (miRNA) precursors (miR-191 and miR-425) in one of its introns, and the expression of both microRNAs is higher in estrogen receptor alpha (ER) positive cells. However, estrogen regulation of the miR-191/425-DALRD3 transcriptional unit is complex and may be unparalleled. Although the exact function of DALRD3 is not known, in estrogen receptor positive cells miR-191/425 work as oncogenes by inducing proliferation. 
Interestingly, the DALRD3 variants detected in two out of three samples from young patients were of the frameshift kind [50]. LYST gene silencing may inhibit cell proliferation and induce apoptosis in myeloma cells [51].

It is worth mentioning that somatic mutations in genes involved in DNA repair mechanisms were quite common and that any of these pathways might be altered: base excision repair (BER), nucleotide excision repair (NER), mismatch repair (MMR), homologous recombination repair (HRR), as well as signaling of DNA damage to cell cycle checkpoints. The highest number of SNVs was described in two tumors presenting mutations in genes involved in HRR, as well as in other DNA repair mechanisms concomitantly [24]. Accordingly, an association between younger age at diagnosis and risk genotypes for genes involved in DNA repair pathways, such as NER, MMR and NHEJ (non-homologous end-joining), has already been reported [52].

Both the weakness and the strength of our study lie in the number of exomes analyzed, which, though small, adds around 20% of samples to the data available thus far.

In summary, in luminal tumors (HER2 negative) from very young patients, the most frequent events were C to T transitions. SNVs were detected in a median of six potential driver genes per sample; 43% of the tumors presented mutations in DNA repair genes, and 54% presented at least one pathogenic mutation in a gene involved in positive regulation of gene transcription. The most frequent somatic mutations involved cancer driver genes, such as PIK3CA, TP53 and GATA3. Other potential driver candidates currently identified were GRHL2, PIK3AP1, CACNA1E and SEMA6D.

MATERIALS AND METHODS

Patients

This study was approved by the Institutional Ethics Committee of Instituto Brasileiro de Controle do Câncer (IBCC) and Instituto do Câncer do Estado de São Paulo (ICESP)/Faculdade de Medicina da Universidade de São Paulo (FMUSP). All patients were informed about the study and signed an informed consent form. Early onset breast cancer was defined as disease diagnosed in very young women aged ≤35 years. No patient had received medical treatment for breast cancer before tumor collection through biopsy or mastectomy procedures. Patients were interviewed for family history suggestive of Hereditary Breast and Ovarian Cancer Syndrome (HBOCS) in close relatives, such as first, second, and third degree family members. Family history was considered informative if the patient could report on at least two first or second degree female relatives having lived beyond age 45 in both parental lineages; otherwise it was considered unknown or limited (National Comprehensive Cancer Network, NCCN, Genetic/Familial high-risk assessment: breast and ovarian, https://www.nccn.org/professionals/physician_gls/pdf/genetics_screening.pdf, February 2012). Patients were also asked about their ancestry, to obtain information on the country or continent where their parents and grandparents (at least) were born. The median age of the 79 patients at diagnosis was 32 years. Most were diagnosed with invasive ductal carcinoma (91.1%), and tumor characteristics included high histological grade (48%), Ki67 >14% (90.4%), luminal subtype (65.8%; ER and/or PR positive and HER2 negative) and advanced stage disease (clinical stages III/IV; 47.1%) (Supplementary Table 1). HER2 positivity was defined as immunohistochemistry 3+, or 2+ when associated with amplification on fluorescence in situ hybridization (FISH). HER2 immunohistochemistry and FISH were scored according to ASCO/CAP guidelines [53]. All women had a blood sample collected for BRCA1 and BRCA2 whole gene sequencing (see below). 
Among the 79 women, 12 had fresh-frozen tumor samples collected during breast surgery. Among the latter, eight patients, who were BRCA1 and BRCA2 wild type carriers bearing luminal HER2 negative tumors, had their samples subsequently analyzed through whole exome sequencing (see below) (Supplementary Figure 4).
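
The HER2 and luminal definitions given above can be written as a small decision rule. The following is a minimal sketch in Python, not the authors' code: the function names and arguments are hypothetical, and treating a 2+ case without a FISH result as not positive is an assumption made here for illustration.

```python
from typing import Optional


def is_her2_positive(ihc_score: int, fish_amplified: Optional[bool] = None) -> bool:
    """HER2 positive if IHC is 3+, or IHC is 2+ with FISH amplification.

    Assumption: an IHC 2+ case with no FISH result (None) is treated
    as not positive.
    """
    if ihc_score == 3:
        return True
    if ihc_score == 2:
        return bool(fish_amplified)
    return False


def is_luminal_her2_negative(er_positive: bool, pr_positive: bool,
                             ihc_score: int,
                             fish_amplified: Optional[bool] = None) -> bool:
    """Luminal subtype as defined in the text: ER and/or PR positive, HER2 negative."""
    hormone_receptor_positive = er_positive or pr_positive
    return hormone_receptor_positive and not is_her2_positive(ihc_score, fish_amplified)
```

For example, `is_luminal_her2_negative(True, False, 2, fish_amplified=False)` evaluates to `True` under these rules.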

DNA extraction from blood and tumor tissue

DNA was extracted from 8 mL of whole blood using the Illustra Blood GenomicPrep Mini Spin Kit (GE Healthcare Bio-Sciences, Pittsburgh, PA, USA/28-9042-64), and from cancer cell-enriched areas of fresh-frozen or FFPE samples using the QIAamp DNA Mini Kit (Qiagen, Valencia, CA, USA/51304) and the QIAamp® DNA FFPE Tissue Kit (Qiagen/56404), respectively, following the manufacturer's instructions.

Direct sequencing of BRCA1 and BRCA2 genes

Polymerase chain reaction (PCR) amplification and Sanger sequencing

Briefly, the complete coding regions of the BRCA1 (U14680 or NM_007294.2) and BRCA2 (U43746 or NM_000059.1) genes were amplified and sequenced in both forward and reverse directions. Primers and conditions are described in Supplementary Table 13 for BRCA1 [54, 55] and Supplementary Table 14 for BRCA2 [56]. Sequences obtained were visualized with Chromas (v2.33; Technelysium Pty, Ltd Eden Prairie, MN, USA) and with Mutation Surveyor software (v3.20, SoftGenetics LLC, State College, PA, USA). If a pathogenic mutation was identified, a new DNA sample derived from a second venipuncture was resequenced for confirmation. Full details of methods are given in the Supplementary Methods.

Multiplex ligation-dependent probe amplification (MLPA) of BRCA1 and BRCA2 genes

Samples from patients who were negative for BRCA1 and BRCA2 pathogenic mutations were investigated for large deletions and duplications, using the commercial MLPA kits SALSA® MLPA® P002 BRCA1 probemix (P002 - 100R) and SALSA® MLPA® P045 BRCA2/CHEK2 probemix (P045 - 100R) (MRC-Holland, Amsterdam, The Netherlands), as described in Supplementary Methods. Sequencing to detect the CHEK2 hot spot mutation (c.1100delC) was also performed.

BRCA1 and BRCA2 sequencing analysis and reporting criteria

All sequence variants were named according to the nomenclature used by The Human Gene Mutation Database, HGMD (http://www.hgmd.cf.ac.uk/ac/index.php). The variants were searched for their classification in five publicly accessible databases: Breast Cancer Information Core (BIC) [57], Leiden Open Variation Database (LOVD v3.0 build 13) [58], Leiden Open Variation Database - International Agency for Research on Cancer (LOVD-IARC v.2.0 Build 22), Universal Mutation Database (UMD) [59, 60], and ClinVar [61]. This search was performed from April to June 2017. Gene variants were submitted to the following in silico prediction models: Polymorphism Phenotyping (PolyPhen; v2.2.2) [62], Sorting Intolerant From Tolerant (SIFT; v1.0.3) [63] and Align-GVGD [64, 65] for missense variants; Protein Variation Effect Analyzer (Provean; v1.1) [66] for in-frame deletions; and Human Splicing Finder [67] to check for intronic and exonic variants leading to potential splicing defects. Minor allele frequency was checked in the 1000 Genomes Project database [68], the Exome Aggregation Consortium (ExAC) [69, 70], the Global MAF dbSNP [71], and the Exome Variant Server, NHLBI GO Exome Sequencing Project (ESP) [72]. The variants were classified according to the recommendations of the American College of Medical Genetics and Genomics as pathogenic, likely pathogenic, benign, likely benign or variant of uncertain significance (VUS) [73]. Variants in BRCA1 were also checked for co-occurrence with known pathogenic mutations in the same patient. If a VUS was classified in two of the five databases and categorized as benign (BIC and ClinVar), no known pathogenicity (LOVD), 1-not pathogenic (LOVD-IARC), or 1-neutral (UMD), it was reclassified as benign.
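
The VUS reclassification rule at the end of this paragraph can be illustrated with a short Python sketch. It is not the authors' implementation. It reads "two of the five databases" as "at least two", which is an assumption, and the label strings and the function name `reclassify_vus` are illustrative stand-ins for the benign-type categories named in the text.

```python
from typing import Dict

# Benign-type category per database, as named in the text.
# The exact strings (and lower-casing) are assumptions for illustration.
BENIGN_LABELS = {
    "BIC": {"benign"},
    "ClinVar": {"benign"},
    "LOVD": {"no known pathogenicity"},
    "LOVD-IARC": {"1-not pathogenic"},
    "UMD": {"1-neutral"},
}


def reclassify_vus(db_calls: Dict[str, str], min_databases: int = 2) -> str:
    """Return 'benign' if enough databases give a benign-type call, else 'VUS'.

    db_calls maps a database name to the category it reports for the variant.
    Assumption: 'two of the five databases' means at least two.
    """
    benign_hits = sum(
        1
        for database, label in db_calls.items()
        if label.strip().lower() in BENIGN_LABELS.get(database, set())
    )
    return "benign" if benign_hits >= min_databases else "VUS"
```

A variant reported as "Benign" in BIC and "No known pathogenicity" in LOVD would be reclassified as benign, whereas a single benign-type call leaves it as a VUS.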

Exome sequencing

DNA extracted from mononuclear cells and fresh tumor samples (containing at least 70% malignant cells) from eight patients was used to prepare a DNA library with the Illumina Nextera Rapid Capture Expanded kit (Illumina, Inc., San Diego, CA, USA/FC-140-1004), as detailed in Supplementary Methods. Briefly, genomic DNA (gDNA) was enzymatically fragmented while tags were simultaneously added. After purification, a limited-cycle PCR program was performed to ligate adapters and amplify libraries. Once gDNA libraries were prepared, exon-specific capture probes attached to streptavidin beads were used to enrich fragments containing only regions of interest, comprising 201,121 exons and totaling 62 mega base pairs (Mbp) of the genome. Exome libraries were then evaluated on a DNA 1000 Agilent 2100 Bioanalyzer chip (Agilent Technologies, Santa Clara, CA, USA) and quantified using KAPA SYBR FAST qPCR Kits (Kapa Biosystems, Wilmington, MA, USA, part #KK4602) prior to cluster generation. Pooled libraries were loaded on six lanes of one flow cell and sequenced on a HiSeq 1000 platform (Illumina, Inc.) using 2 × 100 bp paired-end reads, with a median of 95.3% of targeted bases covered at least 30-fold across the sample set.

Exome sequencing analysis

BWA (v0.5.7) [74] was used to align the 8 paired tumor/blood exome samples, using hg19 as the reference genome, and Picard (v1.92) was used to mark duplicates. Paired tumor-normal samples were processed together using GATK (v2.4.9) [75] for local realignment and base quality recalibration. SAMtools (v0.1.9) and Picard (v1.107) were then used to process the BAM headers and to index the samples, respectively [76]. To detect somatic single nucleotide variants (SNVs), SomaticSniper (v1.0.2) [77] was used. Default parameters were used to call SNVs, except for the mapping quality threshold, which was set to 1, as recommended by the developer. Standard, LOH, bam-readcount, false positive and, lastly, high confidence filters were applied using SAMtools (v0.1.6) and scripts provided with the SomaticSniper package. The final VCF file, containing high-confidence somatic SNVs, was used in downstream analyses. An in-house Perl- and R-based pipeline was used to identify recurrent mutations. Parameters were set to find genes that were mutated in at least 2 samples. This pipeline uses lists of SNPs compiled from various studies to filter out likely false positive SNPs from the samples, unless they are found in the Catalogue of Somatic Mutations in Cancer (COSMIC v71) database for coding and non-coding mutations [78]. After somatic SNVs were called using SomaticSniper, they were annotated with ANNOVAR (v2014-07-14) [79], using the RefGene database. Nonsynonymous, stop-loss, stop-gain and splice-site SNVs (based on RefGene annotations) were considered functional. SNVs were filtered using tabixpp (3b299cc) [80], removing SNVs found in any of the following databases: Fuentes, 2012 [81]; dbSNP142 [82]; 1000 Genomes Project (v3) [68]; the AccuSNP blacklist (invalidated SNVs from 68 human colorectal cancer exomes (in preparation), generated from GATK (v2.4.9 UG) and AccuSNP platform (Roche NimbleGen) analyses); and ENCODE DAC and Duke [83]. 
SeqSig (v3.6.4) [84] was used to identify likely driver non-synonymous mutations. This test assumes that, for each patient, mutations are independent among nucleotides and homogeneous across all positions in coding regions, and computes the background mutation rate for non-synonymous mutations. It uses the convolution law and may be used in situations where samples are not abundant. Discrepancies between the number of genes found in Supplementary Table 15 and that plotted in Figure 2 are due to the collapsing of variants into genes. SnpEff (v4) [85] was then used to predict amino acid changes. Data visualization used the BPG package (v5.2.1) in R [86].
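
Two of the steps described above, keeping only functional SNV classes and reporting genes mutated in at least 2 samples, can be sketched in a few lines of Python. This is an illustration only and not the in-house Perl/R pipeline: the tuple layout, the function name `recurrent_genes` and the simplified consequence labels are assumptions (they are not ANNOVAR's exact output strings), and the toy records are placeholders rather than study data.

```python
from collections import defaultdict

# Simplified consequence labels standing in for the functional classes named
# in the text (nonsynonymous, stop-loss, stop-gain, splice-site).
FUNCTIONAL_CLASSES = {"nonsynonymous", "stoploss", "stopgain", "splicing"}


def recurrent_genes(snvs, min_samples=2):
    """Return {gene: sorted sample list} for genes with functional SNVs
    in at least `min_samples` distinct samples.

    snvs: iterable of (sample_id, gene, consequence) tuples.
    """
    samples_by_gene = defaultdict(set)
    for sample_id, gene, consequence in snvs:
        if consequence in FUNCTIONAL_CLASSES:
            samples_by_gene[gene].add(sample_id)
    return {
        gene: sorted(samples)
        for gene, samples in samples_by_gene.items()
        if len(samples) >= min_samples
    }


# Toy records (placeholders, not data from the study).
toy_snvs = [
    ("S1", "GENE_A", "nonsynonymous"),
    ("S2", "GENE_A", "stopgain"),
    ("S1", "GENE_B", "synonymous"),      # not functional, ignored
    ("S2", "GENE_B", "nonsynonymous"),   # only one sample, not recurrent
    ("S3", "GENE_C", "splicing"),
]
print(recurrent_genes(toy_snvs))  # {'GENE_A': ['S1', 'S2']}
```

Counting distinct samples rather than variants means that several SNVs in the same gene of one tumor do not make the gene recurrent.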

Analysis of somatic variants for identification of candidate driver genes

Candidate genes were then searched for in the “Cancer Gene Census” (CGC) database (http://cancer.sanger.ac.uk/census/) [19], to identify genes causally implicated in cancer, as well as in the “Candidate Cancer Genes Database” (CCGD) (http://ccgd-starrlab.oit.umn.edu/search.php) [21], to identify potential cancer drivers detected in mouse insertional mutagenesis experiments. In this model, candidate genes were associated with common insertion sites (CIS), which were ranked based either on the number of insertions or on the p-value: A for the top 10%; B for the top 11-25%; C for the top 26-50%; and D for the bottom 50%. CISs identified in screens that did not include insertion numbers or p-values are denoted as Not Ranked [21]. Afterwards, gene mutations were analyzed through mutation function assessment algorithms: PolyPhen, SIFT, Align GV/GD [62-65], Functional analysis through Hidden Markov Models (FATHMM; v2.3) (http://fathmm.biocompute.org.uk/) [87], and Cancer-Related Analysis of Variants Toolkit (CRAVAT) [88]. This search was performed between April and June 2017, and the latter three algorithms were reviewed in December 2017. We then developed a scoring system in order to identify potential cancer drivers. Genes found in CGC were scored 3 points; CCGD was scored according to the highest rank for each sample: “A”: 2 points, “B”: 1.5 points, “C”: 1.0 point, “D”: 0.5 point; “Not Ranked” variants were not scored; mutation domain and frequency of the variant in other cancers and/or in breast cancer (≥1%) were scored 0.5 point each; mutation consequence, when nonsense or frameshift, was scored 1.5 points; and the mutation function assessment algorithms FATHMM, PolyPhen, SIFT, GV/GD and CRAVAT-CHASM (3.0) were scored 1 point if the variant was considered pathogenic by at least 3 of them.
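
The CCGD rank bands quoted above map a percentile position to a letter. The sketch below shows that mapping in Python for illustration only: the ranks are assigned by the CCGD itself, the function name `ccgd_rank` is hypothetical, and treating the band edges as inclusive upper bounds on a continuous percentile is an assumption.

```python
def ccgd_rank(top_percentile: float) -> str:
    """Map a CIS's position (e.g. 8.0 means 'within the top 8%') to a rank letter.

    Bands follow the text: A top 10%, B top 11-25%, C top 26-50%, D bottom 50%.
    Assumption: band edges are inclusive upper bounds.
    """
    if not 0 < top_percentile <= 100:
        raise ValueError("top_percentile must be in the interval (0, 100]")
    if top_percentile <= 10:
        return "A"
    if top_percentile <= 25:
        return "B"
    if top_percentile <= 50:
        return "C"
    return "D"
```

Under this mapping, the "top 25% potential drivers" mentioned in the Discussion correspond to ranks A and B.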

Analysis of somatic variants identified in other published manuscripts and COSMIC database

For this analysis, publicly available data on 29 patients, aged 35 years or younger, were obtained. Most patients (n=28) had data for tumor exome or genome sequencing deposited in the COSMIC database [15-17] (TCGA, 2012, n=16; Nik-Zainal et al., 2016, n=9; Stephens et al., 2012, n=3). Additionally, data for one patient, which were not available in COSMIC, were recovered from a published manuscript [18]. Only HER2 negative tumors were included. One and four of these patients were BRCA1 and BRCA2 mutation carriers, respectively. BRCA mutation status of the remaining patients was unknown [16, 17]. For the present analysis, among the total number of mutations per patient, repeated substitutions detected in the same chromosomal position were considered only once. In addition, only non-synonymous mutations were considered. The list of nonsynonymous variants derived from each tumor was then clustered using the DAVID v6.7 bioinformatics tool (The Database for Annotation, Visualization, and Integrated Discovery) [23], in order to explore its biological meaning. Only one Gene Ontology category (p ≤0.05) or Interpro process (in the absence of a GO category) was selected for each tumor sample. If more than one GO category was enriched, the one containing the largest number of genes was chosen. To identify potential cancer driver genes, a scoring system was developed. 
Genes found in CGC were scored 3.0 points; CCGD was scored according to the highest rank for each sample: “A”: 2 points, “B”: 1.5 points, “C”: 1.0 point, “D”: 0.5 point; “Not Ranked” variants were not scored; mutation domain and frequency of the variant in other cancers and/or in breast cancer (≥1%) were scored 0.5 point each; mutation consequence, when nonsense or frameshift, was scored 1.5 points; and the mutation function assessment algorithms FATHMM, PolyPhen, SIFT and CRAVAT-CHASM (3.0) were scored 0.5 point if the variant was considered pathogenic by 2 of them, or 1 point if it was considered pathogenic by 3 or 4 of them. Gene variants scoring ≥3.5 were considered candidate cancer drivers (Supplementary Table 10, 10a). The search in the referred databases and prediction tools was performed for this analysis until December, 2017. Toppgene (https://toppgene.cchmc.org/enrichment.jsp) was used to identify biological processes enriched in the list of genes affected by pathogenic mutations (nonsense and frameshift). A gene ID followed by an ENST number was searched using the gene ID without the ENST number. Ten functions (biological process) presented more than 10 affected genes and had p value, Bonferroni and FDR values <0.05, including positive regulation of gene expression. Analysis was performed in March 2018.
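
The scoring system described in this section is simple arithmetic and can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: the class and field names are hypothetical, and reading "mutation domain, frequency of the variant in other cancers and/or in breast cancer" as three separate 0.5-point criteria is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

CCGD_POINTS = {"A": 2.0, "B": 1.5, "C": 1.0, "D": 0.5}  # "Not Ranked" scores 0
DRIVER_THRESHOLD = 3.5


@dataclass
class VariantEvidence:
    in_cgc: bool = False                      # gene listed in Cancer Gene Census
    ccgd_rank: Optional[str] = None           # "A".."D", or None if absent/not ranked
    in_mutation_domain: bool = False
    frequent_in_other_cancers: bool = False   # variant frequency >= 1%
    frequent_in_breast_cancer: bool = False   # variant frequency >= 1%
    truncating: bool = False                  # nonsense or frameshift
    n_pathogenic_calls: int = 0               # of FATHMM, PolyPhen, SIFT, CRAVAT-CHASM


def driver_score(ev: VariantEvidence) -> float:
    """Sum the points described in the text for one gene variant."""
    score = 3.0 if ev.in_cgc else 0.0
    score += CCGD_POINTS.get(ev.ccgd_rank, 0.0)
    # Assumption: each of these three criteria is worth 0.5 point on its own.
    score += 0.5 * sum([ev.in_mutation_domain,
                        ev.frequent_in_other_cancers,
                        ev.frequent_in_breast_cancer])
    if ev.truncating:
        score += 1.5
    if ev.n_pathogenic_calls >= 3:
        score += 1.0
    elif ev.n_pathogenic_calls == 2:
        score += 0.5
    return score


def is_driver_candidate(ev: VariantEvidence) -> bool:
    """Variants scoring >= 3.5 are treated as candidate cancer drivers."""
    return driver_score(ev) >= DRIVER_THRESHOLD
```

Under these rules a CGC gene needs only 0.5 additional point to reach the threshold, while a gene absent from CGC must combine several lines of evidence, for example CCGD rank A (2 points) plus a nonsense or frameshift variant (1.5 points).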
  82 in total

1.  BRCA1, BRCA2 and TP53 mutations in very early-onset breast cancer with associated risks to relatives.

Authors:  Fiona Lalloo; Jennifer Varley; Anthony Moran; David Ellis; Lindsay O'dair; Paul Pharoah; Antonis Antoniou; Roger Hartley; Andrew Shenton; Sheila Seal; Barbara Bulman; Anthony Howell; D Gareth R Evans
Journal:  Eur J Cancer       Date:  2006-04-27       Impact factor: 9.162

2.  Comprehensive statistical study of 452 BRCA1 missense substitutions with classification of eight recurrent substitutions as neutral.

Authors:  S V Tavtigian; A M Deffenbaugh; L Yin; T Judkins; T Scholl; P B Samollow; D de Silva; A Zharkikh; A Thomas
Journal:  J Med Genet       Date:  2005-07-13       Impact factor: 6.318

3.  Hotspot activating PRKD1 somatic mutations in polymorphous low-grade adenocarcinomas of the salivary glands.

Authors:  Ilan Weinreb; Salvatore Piscuoglio; Luciano G Martelotto; Daryl Waggott; Charlotte K Y Ng; Bayardo Perez-Ordonez; Nicholas J Harding; Javier Alfaro; Kenneth C Chu; Agnes Viale; Nicola Fusco; Arnaud da Cruz Paula; Caterina Marchio; Rita A Sakr; Raymond Lim; Lester D R Thompson; Simion I Chiosea; Raja R Seethala; Alena Skalova; Edward B Stelow; Isabel Fonseca; Adel Assaad; Christine How; Jianxin Wang; Richard de Borja; Michelle Chan-Seng-Yue; Christopher J Howlett; Anthony C Nichols; Y Hannah Wen; Nora Katabi; Nicholas Buchner; Laura Mullen; Thomas Kislinger; Bradly G Wouters; Fei-Fei Liu; Larry Norton; John D McPherson; Brian P Rubin; Blaise A Clarke; Britta Weigelt; Paul C Boutros; Jorge S Reis-Filho
Journal:  Nat Genet       Date:  2014-09-21       Impact factor: 38.330

4.  Architecture of the human regulatory network derived from ENCODE data.

Authors:  Mark B Gerstein; Anshul Kundaje; Manoj Hariharan; Stephen G Landt; Koon-Kiu Yan; Chao Cheng; Xinmeng Jasmine Mu; Ekta Khurana; Joel Rozowsky; Roger Alexander; Renqiang Min; Pedro Alves; Alexej Abyzov; Nick Addleman; Nitin Bhardwaj; Alan P Boyle; Philip Cayting; Alexandra Charos; David Z Chen; Yong Cheng; Declan Clarke; Catharine Eastman; Ghia Euskirchen; Seth Frietze; Yao Fu; Jason Gertz; Fabian Grubert; Arif Harmanci; Preti Jain; Maya Kasowski; Phil Lacroute; Jing Jane Leng; Jin Lian; Hannah Monahan; Henriette O'Geen; Zhengqing Ouyang; E Christopher Partridge; Dorrelyn Patacsil; Florencia Pauli; Debasish Raha; Lucia Ramirez; Timothy E Reddy; Brian Reed; Minyi Shi; Teri Slifer; Jing Wang; Linfeng Wu; Xinqiong Yang; Kevin Y Yip; Gili Zilberman-Schapira; Serafim Batzoglou; Arend Sidow; Peggy J Farnham; Richard M Myers; Sherman M Weissman; Michael Snyder
Journal:  Nature       Date:  2012-09-06       Impact factor: 49.962

5.  NDST4 is a novel candidate tumor suppressor gene at chromosome 4q26 and its genetic loss predicts adverse prognosis in colorectal cancer.

Authors:  Sheng-Tai Tzeng; Ming-Hong Tsai; Chi-Long Chen; Jing-Xing Lee; Tzu-Ming Jao; Sung-Liang Yu; Sou-Jhy Yen; Ya-Chien Yang
Journal:  PLoS One       Date:  2013-06-25       Impact factor: 3.240

6.  The genomic ancestry of individuals from different geographical regions of Brazil is more uniform than expected.

Authors:  Sérgio D J Pena; Giuliano Di Pietro; Mateus Fuchshuber-Moraes; Julia Pasqualini Genro; Mara H Hutz; Fernanda de Souza Gomes Kehdy; Fabiana Kohlrausch; Luiz Alexandre Viana Magno; Raquel Carvalho Montenegro; Manoel Odorico Moraes; Maria Elisabete Amaral de Moraes; Milene Raiol de Moraes; Elida B Ojopi; Jamila A Perini; Clarice Racciopi; Andrea Kely Campos Ribeiro-Dos-Santos; Fabrício Rios-Santos; Marco A Romano-Silva; Vinicius A Sortica; Guilherme Suarez-Kurtz
Journal:  PLoS One       Date:  2011-02-16       Impact factor: 3.240

7.  Voltage-gated calcium channels: Novel targets for cancer therapy.

Authors:  Nam Nhut Phan; Chih-Yang Wang; Chien-Fu Chen; Zhengda Sun; Ming-Derg Lai; Yen-Chang Lin
Journal:  Oncol Lett       Date:  2017-06-22       Impact factor: 2.967

8.  An integrated map of genetic variation from 1,092 human genomes.

Authors:  Goncalo R Abecasis; Adam Auton; Lisa D Brooks; Mark A DePristo; Richard M Durbin; Robert E Handsaker; Hyun Min Kang; Gabor T Marth; Gil A McVean
Journal:  Nature       Date:  2012-11-01       Impact factor: 49.962

9.  Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models.

Authors:  Hashem A Shihab; Julian Gough; David N Cooper; Peter D Stenson; Gary L A Barker; Keith J Edwards; Ian N M Day; Tom R Gaunt
Journal:  Hum Mutat       Date:  2012-11-02       Impact factor: 4.878

10.  ClinVar: public archive of relationships among sequence variation and human phenotype.

Authors:  Melissa J Landrum; Jennifer M Lee; George R Riley; Wonhee Jang; Wendy S Rubinstein; Deanna M Church; Donna R Maglott
Journal:  Nucleic Acids Res       Date:  2013-11-14       Impact factor: 16.971

Journal:  Cancers (Basel)       Date:  2020-07-28       Impact factor: 6.639

9.  Genetic alterations associated with multiple primary malignancies.

Authors:  Jenny Nyqvist; Anikó Kovács; Zakaria Einbeigi; Per Karlsson; Eva Forssell-Aronsson; Khalil Helou; Toshima Z Parris
Journal:  Cancer Med       Date:  2021-05-31       Impact factor: 4.452

10.  Germline and Somatic mutations in postmenopausal breast cancer patients.

Authors:  Tauana Rodrigues Nagy; Simone Maistro; Giselly Encinas; Maria Lucia Hirata Katayama; Glaucia Fernanda de Lima Pereira; Nelson Gaburo-Júnior; Lucas Augusto Moyses Franco; Ana Carolina Ribeiro Chaves de Gouvêa; Maria Del Pilar Estevez Diz; Luiz Antonio Senna Leite; Maria Aparecida Azevedo Koike Folgueira
Journal:  Clinics (Sao Paulo)       Date:  2021-07-16       Impact factor: 2.365
