Literature DB >> 30545124

Use of Germline Genetic Variability for Prediction of Chemoresistance and Prognosis of Breast Cancer Patients.

Viktor Hlavac1, Maria Kovacova2, Katerina Elsnerova3,4, Veronika Brynychova5, Renata Kozevnikovova6, Karel Raus7, Katerina Kopeckova8, Sona Mestakova9, David Vrana10, Jiri Gatek11, Pavel Ostasov12, Radka Vaclavikova13, Pavel Soucek14.   

Abstract

The aim of our study was to set up a panel for targeted sequencing of chemoresistance genes and the main transcription factors driving their expression and to evaluate their predictive and prognostic value in breast cancer patients. Coding and regulatory regions of 509 genes, selected from PharmGKB and Phenopedia, were sequenced using massive parallel sequencing in blood DNA from 105 breast cancer patients in the testing phase. In total, 18,245 variants were identified of which 2565 were novel variants (without rs number in dbSNP build 150) in the testing phase. Variants with major allele frequency over 0.05 were further prioritized for validation phase based on a newly developed decision tree. Using emerging in silico tools and pharmacogenomic databases for functional predictions and associations with response to cytotoxic therapy or disease-free survival of patients, 55 putative variants were identified and used for validation in 805 patients with clinical follow up using KASPTM technology. In conclusion, associations of rs2227291, rs2293194, and rs4376673 (located in ATP7A, KCNAB1, and DFFB genes, respectively) with response to neoadjuvant cytotoxic therapy and rs1801160 in DPYD with disease-free survival of patients treated with cytotoxic drugs were validated and should be further functionally characterized.

Entities:  

Keywords:  breast cancer; chemoresistance; in silico prediction; next generation sequencing; pharmacogenomics

Year:  2018        PMID: 30545124      PMCID: PMC6316878          DOI: 10.3390/cancers10120511

Source DB:  PubMed          Journal:  Cancers (Basel)        ISSN: 2072-6694            Impact factor:   6.639


1. Introduction

Breast cancer is the most frequent cancer in women worldwide [1]. The efficacy of breast cancer therapy is associated with a number of cellular processes that in some cases lead to tumor resistance. Among other factors, inactivation of anticancer drugs by biotransformation enzymes, decreased uptake and/or increased efflux of drugs, changes in cell-cycle checkpoints, increased DNA repair or reduced cell death, and cellular compartmentalization may contribute to the development of multidrug resistance [2]. The majority of currently used cytotoxic drugs are metabolized by biotransformation enzymes in liver and extrahepatic tissues. Biotransformation often leads to inactivation of drugs which become more polar to allow for body elimination. On the other hand, prodrugs are designed to be activated via biotransformation in the first place and then follow the same principles of metabolism as drugs. Consequently, germline genetic variability in biotransformation enzymes is considered as important factor determining individual patient sensitivity to an administered drug. These enzymes are in general divided into phase I (activation) and phase II (conjugation) enzymes [3]. Cytochromes P450 (CYP) constitute a major group of (in)activation enzymes in phase I whereas phase II enzymes are more heterogeneous. Numerous pharmacogenomic studies in breast cancer patients concentrated mostly on analysis of selected polymorphisms in single or several genes from all biotransformation phases (reviewed in [4]), but a comprehensive germline genetic variability screen of the majority of these enzymes in breast cancer patient cohorts is virtually missing. Since drug efflux is mediated by membrane-bound ATP-binding cassette (ABC) transporters [5] and drug uptake is provided by solute carrier (SLC) transporters [6] it seems obvious that equilibrium of these exporters/importers is important for prediction of cancer drug resistance [7,8]. Indeed, comprehensive transcriptomic profiling studies demonstrated gene expression deregulations of a number of ABCs and SLCs between tumor and paired non-malignant tissues from patients with solid tumors, e.g., colorectal [9], breast [10], pancreatic [11,12], and ovarian [13] suggesting their potential role in cancer progression. Moreover, these studies revealed a number of associations between gene expression levels of transporters, therapy response and survival of the patients with implications for prognosis and individualized therapy. Data from publicly available large-scale sequencing studies have shown that genetic alterations in drug targets, cell death and major cancer driving pathways, e.g., PI3K/AKT/MTOR or RAS/MAPK, and nuclear receptors can be found across all cancer types; however, at highly variable frequencies [14]. Very recently, highly frequent deleterious somatic mutations relevant for clinical management, including PIK3CA, RTK/RAS/MAPK and cell cycle pathway genes, were found in inflammatory breast cancer patients through next generation sequencing analysis [15]. Pharmacogenomics represents an important tool for personalized medicine. Two major types of studies may be found in the published literature dealing with the issue of genetic susceptibility and drug response in oncology. First, studies of germline genetic variation, mainly polymorphisms, in homogeneous groups of patients treated with defined drug regimens [16]. Second, in vitro screening of drug response in human cancer cell lines with well characterized somatic genetic profile also helped to elucidate the genetic background of chemoresistance [17,18]. However, recent data from analysis of sensitivity of 993 cell lines to 265 drugs show that germline genetic variability can be of the same importance as somatic one [19]. Thus, continuing with studies in patients is imperative for further understanding and subsequent translation of these aspects into clinical setting. There are several genome-wide association studies (GWAS) in the literature demonstrating the power of pharmacogenomics in breast cancer [20] and accelerated implementation of the next generation sequencing into clinical studies will undoubtedly bring further progress in this area. Very recently, analysis of available big data demonstrated that priority pharmacogenes for population-adjusted genetic profiling exist with highly variable distribution across populations [21], suggesting that use of sequencing-based approaches may enable “true personalized medicine”. Here, we explored the genetic variability of a panel of 509 genes relevant for pharmacogenomics using targeted sequencing in a testing set of patients treated with neoadjuvant or adjuvant cytotoxic therapy. Genetic variants significantly associating with therapy outcome measured as clinical response in neoadjuvant setting or disease-free survival (DFS) of the patients were evaluated in a larger validation set of patients. To our knowledge, this is the first research study providing genetic data with association to drug chemoresistance evaluated as prognosis and therapy outcome of breast cancer patients with aid of in silico prediction in the Czech population to such an extent. The validated variants may further be used for functional studies and prospective follow up trials evaluating their prognostic and predictive utility in clinical setting.

2. Results

2.1. Testing Phase

The clinical characteristics of the patients in the testing set (n = 105) are shown in Table 1. Patients were treated with neoadjuvant cytotoxic therapy and/or with adjuvant therapy following surgical treatment. Cytotoxic therapy was based on monotherapy or combinations of anthracyclines, cyclophosphamide, 5-fluorouracil and taxanes (Table S1). The mean follows up of the patients was 70 ± 28 months. One patient was lost to follow up.
Table 1

Clinical data of patient in the testing set.

CharacteristicsPatients, n (%) 1
Age at diagnosis, mean ± S.D. 2 (years)51.7 ± 9.4
Menopausal status
Premenopausal46 (46)
Postmenopausal55 (55)
Missing data4
Tumor size (pT)
pTis8 (8)
pT150 (48)
pT240 (39)
pT35 (5)
pTX2
Lymph node metastasis (pN)
Absent (pN0)68 (65)
Present (pN1–3)37 (35)
Pathological stage
SI46 (44)
SII47 (45)
SIII12 (11)
Histological type
Invasive ductal carcinoma88 (84)
Other type17 (16) 4
Pathological grade (G)
G111 (11)
G235 (35)
G354 (54)
GX5
Estrogen receptor status
Positive38 (38)
Negative61 (62)
Missing data6
Progesterone receptor status
Positive39 (39)
Negative60 (61)
Missing data6
Expression of HER2
Positive2 (2)
Negative97 (97)
Missing data6
Expression of Ki-67, mean ± S.D. 2 (%)32.9 ± 20.3
Molecular subtype
Luminal A15 (16)
Luminal B23 (24)
Triple negative58 (60)
Missing data9
Response to neoadjuvant cytotoxic therapy
Complete or partial response47 (69)
Stable disease or progression21 (31)
Not applicable 337

Footnotes: 1 Number of patients with % in parentheses; 2 S.D. = standard deviation; 3 patients treated with adjuvant therapy without neoadjuvant cytotoxic therapy; 4 six lobular, six medullary, two metaplastic, one mucinous, one papillary, and one neuroendocrine invasive carcinomas.

In these patients, a panel of 509 genes (Table S2) representing major drug metabolizing and transporting enzymes, nuclear receptors, cell death, chemotherapy target, and signaling pathway genes (Figure S1), selected using PharmGKB (www.pharmgkb.com) and Phenopedia (https://phgkb.cdc.gov) databases was assessed using targeted sequencing.

2.1.1. Targeted Sequencing, Processing and Quality Control of Raw Data

Processing of raw reads, quality control, filtering and annotation of the variants is depicted in Figure 1.
Figure 1

Pipeline for processing and quality control of raw sequencing data.

Quality control of the reads was performed in FastQC program and coverage was calculated after raw reads preprocessing (trimming and duplicate removal) by GATK 3.7. The average coverage was 76.9 ± 19.3 and 94% of the captured regions were covered at least 10 times. Altogether, we found 18,245 variants in exonic and adjacent intronic sequences. Of the total number 509 genes, 503 genes (99%) contained at least one genetic alteration. No alterations were found in ABCF1, HSPA1A, RXRB, TAP1 (ABCB2), TAP2 (ABCB3) and VDAC1P4 genes. On the other hand, the most polymorphic genes with over one hundred alterations were NCOR2, ABCA13, RPTOR, ABCA4, CIT, BIRC6, ABCC1, ABCA1, RXRA, NCOR1, ABCA7, ABCC4 and ABCB5. Of the total number 18,245 variants, 3256 were in exons, 9458 intronic, and 3872 were in 3′UTR or 5′UTR regions according to NCBI Reference Sequence Database, RefSeq (RefSeq; https://www.ncbi.nlm.nih.gov/refseq/) in Annovar (Table 2).
Table 2

Overview of the observed alterations in breast cancer patients by function according to Annovar.

TypeTotalPercentage
Downstream 13531.9
Exonic (coding)325617.8
Intergenic3722.0
Intronic945851.9
Splicing 2400.2
Upstream 14142.3
UTR3310617.0
UTR57664.2
Other4802.7

Footnotes: 1 Variant is within 1 kb region downstream/upstream of transcription end site; 2 Variant is within 2 bp of a splicing junction.

7539 variants (41%) had minor allele frequency (MAF) > 0.05; the rest, 10,706 variants, had MAF 0.05 or below. On average, each patient showed 3792 ± 240 variants. We found 88 loss of function truncating variants that were either affecting the stop codon (gain or loss) or frameshift insertions or deletions (indels). 1646 of the variants were non-synonymous single nucleotide variants (SNVs) and 1455 were synonymous SNVs (Table 3).
Table 3

Overview of the observed exonic alterations in breast cancer patients by coding consequence.

ClassificationCountPercentage
Frameshift deletion240.7
Frameshift insertion220.7
Non-frameshift deletion170.5
Non-frameshift insertion80.2
Non-synonymous SNV164650.6
Stopgain401.2
Stoploss20.1
Synonymous SNV145544.7
Unknown421.3
Altogether, 2565 (14%) of the variants were novel (i.e., not found in dbSNP Build 150). The distribution of the variants and position in protein according to their functional classes and frequencies of novel variants in the groups of genes are shown in Figure 2.
Figure 2

Distribution of alterations in the studied groups of genes. The picture shows: (a) the frequency of genetic alterations according to their functional classes; (b) The frequency of genetic alterations according to their exonic functional classes analyzed by RefSeq: NCBI Reference Sequence Database (https://www.ncbi.nlm.nih.gov/refseq/) shown according to the groups of studied genes; (c) The number of novel variants according to the groups of genes. The number of the variants normalized to the counts of genes per each group are shown for each plot on the right axis and depicted by the overlaid line. Column plots for all gene groups are shown in Figure S2.

2.1.2. Prioritization of Variants for the Validation Phase

Variants with MAF > 0.05 were considered relevant to achieve adequate statistical power for variant interpretation. Variants that were not in Hardy-Weinberg equilibrium (n = 842) were excluded from analyses. In addition, variants with the missing data in more than 50% patients were excluded (n = 432). Further filtration parameters were applied (see Section 4 Materials and Methods) and these resulted in set of 5875 variants. In these variants, the associations with response of patients to neoadjuvant cytotoxic therapy and survival of patients were assessed in order to reveal genetic alterations with putative functional effect in vivo. We found 327 variants (two novel) associated with the response to neoadjuvant cytotoxic therapy and 418 (three novel) variants associating with DFS (Table S3). Using Kaplan-Meier plots for variants associated with DFS, gene dosage relationship was evaluated. Those variants in which heterozygous genotype had the most pronounced effect (compared to both homozygotes) were excluded and a final set of statistically significant variants with clinical associations was built. The testing set was composed of both neoadjuvantly and adjuvantly treated patients. Therefore, we divided the testing set into two corresponding subsets and computed the DFS separately. The neoadjuvant subset (n = 68) comprised three molecular subtypes (luminal A, luminal B and triple negative; data were missing for nine patients) while the adjuvant subset (n = 37) comprised only patients with triple negative tumors. Due to this unevenness, we have also analyzed the associations of variants with molecular subtypes and displayed these for comparison in Table S3. In order to select the most relevant functional alterations from the statistically significant set of variants we down-sampled the results using information from in silico predictions and according to confirmed pharmacogenomic and clinical evidence. Annotations were conducted by Annovar and Variant Effect Predictor (VEP). Choice of in silico tools was based on the scope of the prediction for given software with the intention to ensure annotation for all types of coding and noncoding alterations (see Figure 3 and Section 4 Materials and Methods). Pharmacogenomic evidence was based mostly on PharmGKB database of published phenotypes (manual data curation is depicted in Figure S3). Variants with records in ClinVar indicating drug or any disease association were considered pathogenic. Additionally, in cases where information regarding drug response was available in ClinVar and/or dbSNP databases, a match with PharmGKB data provided an extent of evidence and level of priority. All variants significantly associated with response to neoadjuvant cytotoxic therapy of DFS were compared with records in these databases and variants were ordered by the level of priority (Table 4).
Figure 3

Variant prioritization scheme. Numbers of unique variants shared with statistically significant results are depicted in brackets. Statistical analysis using germline variants from targeted sequencing and clinical data of patients with Hardy-Weinberg equilibrium (HWE, p > 0.01) and Minor allele frequency (MAF > 0.05) was conducted to search for drug response (A) and (B) disease-free survival (DFS) associations. In silico prediction was applied on synonymous (sSNVs) and non-synonymous (nsSNVs) single nucleotide variants and indels from next generation sequencing (NGS) data in VCF format using several web-based and command-line software tools (see Section 4 Materials and Methods). In pharmacogenomic (PGx) databases with no batch or download option, only statistically significant variants were considered for manual curation (*). GRCh37 = Genome Reference Consortium Human Build 37 (hg19); TFBS = transcription factor binding site; TFBP = transcription factor binding profile.

Table 4

Priorities and data input used for prioritization of variants for the validation phase.

Data InputPriority
Variant functionalityHighestHighMediumLow
Response or disease-free survivalsignificantsignificantsignificantsignificant
PharmGKBassociatedassociatedno datano data
ClinVardrug response or cancer/neoplasmno datadrug response or cancer/neoplasmno data
In silico prediction calldeleteriousdeleterious/neutraldeleteriousneutral
Cancer related functionalitynononoyes
Following these priorities, 58 variants (56 SNVs and two indels; Table 5) were selected for validation in a larger cohort (n = 805) of breast cancer patients. The overall variant prioritization scheme is depicted in Figure 3.
Table 5

Prioritized variants for the validation phase.

GeneHGVS CodingHGVS ProteinClassification in Annovar 1Rs IDClinVarDFS 2Response 2Function 3AF 4ExAC 5NCMG 6
ABCA4 c.5603A>Tp.N1868INSrs1801466likely benign0.03 D; CADD0.1050.0660.065
ABCB1 intronicrs2032583 0.03 PA1661573170.1060.1230.093
ABCB5 intronicrs11983326 0.040.038DFS & response0.2790.2970.246
ABCB8 intronicrs3214587 0.01 miR-670-3p0.115
ABCC1 c.*1512T>C UTR3rs212091 0.05 PA1661549870.180 0.114
ABCC1 c.*543C>T UTR3rs3743527 0.05 PA1661550490.205
ABCC3 intronicrs4148413 0.0021f, PINES0.168
ABCC6 c.2835C>Tp.P945Psynonymousrs2856585pathogenic0.03 ClinVar0.0640.0990.044
AHRR intronicrs2013782 0.02 1f0.5870.6230.597
AKR7A2 c.424G>Ap.A142TNSrs1043657 0.01 PA166161794; CADD0.0950.0930.081
AKT1 intronicrs3803304 0.05 PA166154802; 1f0.292 0.289
ATP7A c.2299G>C p.V767L NS rs2227291 benign 0.003 PA166157866 0.260 0.217 0.225
BAK1 dist = 114 downstreamrs210134 0.0331f, PINES0.750
BIRC7 c.528C>Tp.S176Ssynonymousrs2273487 <0.001 1b0.4860.4670.464
BLK c.-53667C>T UTR5rs922483benign 0.0231f0.229
CDA c.79A>Cp.K27QNSrs2072671 0.01 PA1661536670.3620.3430.296
CES1 c.-75T>G UTR5rs3815583 0.038PA1661550580.202.0.162
CES1 c.224G>Ap.S75NNSrs2307240 0.01 PA1661550390.0670.0540.063
CES1 intronicrs76336259 0.0010.046DFS & response0.063 0.060
CMPK1 c.22G>Cp.G8RNSrs7543016 0.05 PA1661537930.4510.5380.320
CYP2C9 intronicrs1934969 0.03 PA1661539860.613 0.658
CYP2D6 c.100C>Tp.P34SNSrs1065852likely benign 0.021PA166156062; PharmVar0.2140.2490.206
CYP2D6 c.985 + 39G>A ncRNA_intronicrs28371725 0.011PA1661561550.0590.0950.084
CYP2E1 c.1263T>Cp.F421Fsynonymousrs2515641benign0.03 PA1661540170.8560.8870.880
CYP2E1 intronicrs2070677 0.05 1f0.856 0.867
CYP4F12 intronicrs12460651 0.020.029DFS & response0.882
DFFB intronic rs4376673 0.04 0.047 DFS & response; PINES 0.909 0.947 0.901
DPYD c.2194G>A p.V732I NS rs1801160 likely_benign 0.02 PA166153647 0.052 0.047 0.045
DPYD c.1896T>Cp.F632Fsynonymousrs17376848likely_benign0.03 PA166153874; CADD0.0570.0370.047
DPYD c.496A>Gp.M166VNSrs2297595drug_response PA166153696; D0.1480.1030.125
DPYS intronicrs2669429not_provided 0.01PA1661575790.540 0.557
ENOSF1 intronicrs2612083 0.01 1f0.381.0.322
EPHX2 c.662G>Ap.R221QNSrs751141risk_factor0.03 sequence in silico tools set0.1140.0950.115
EPHX2 dist = 55 downstreamrs4149259 0.03 1f0.167
ESR2 c.*39G>A UTR3rs4986938 0.05PA1661548050.3710.3790.347
GSTA1 c.-9630T>C UTR5rs3957357 0.047PA1661570940.591
GSTA2 c.335G>Cp.S112TNSrs2180314 0.017PA1661570200.5920.5900.581
GSTP1 c.313A>Gp.I105VNSrs1695drug_response0.05 PA1661542490.3290.3190.320
GSTP1 intronicrs762803 <0.001 1f; PINES0.3860.4060.323
IRS1 c.*4476A>G UTR3rs2288587 0.05 1f0.057
KCNAB1 intronic rs2293194 0.04 0.013 DFS & response 0.476 0.515
MADD intronicrs10501320 0.05 IW; PINES0.281
NR5A2 dist = 45 upstreamrs2816948 0.050.036DFS & response; PINES0.130
PIK3C2G c.2732C>Tp.P911LNSrs12312266 0.04 sequence in silico tools set0.2050.2980.242
PIP4K2B intronicrs2075061 0.02 1f0.605 0.592
PPARA c.*5977G>A UTR3rs9626814 0.0191d0.101 .
PPARG c.1347C>Tp.H449Hsynonymousrs3856806likely_benign 0.046PA1661563880.1200.1250.133
RALBP1 c.*756G>A UTR3rs3322 0.03 1f0.0950.0920.072
RARB c.*1287T>G UTR3rs1058378 0.013IW, miR-6650.091 0.083
RPTOR c.90T>Cp.F30Fsynonymousrs61750765 0.03 IW0.2380.1260.142
RRAGD c.*1105T>A UTR3rs1555403 0.0191f0.238 .
SLC22A1 c.1222A>Gp.M408VNSrs628031 0.028PA1661569330.6000.5920.605
SLC28A3 c.338A>G p.Y113C NS rs10868138 0.022 PA166157820 0.067 0.085 0.091
SLC2A1 c.*462G>C UTR3rs4658benign0.01 PA1661535440.210 0.178
SLCO1A2 c.-189_-188insA UTR5rs3834939 0.05 PA1661636000.295
SLCO1C1 intronicrs34288910 0.030.028DFS & response0.1440.1280.152
TUBB1 c.*817G>C UTR3rs10485828 0.03 PA1661559650.212
UGT2A1 c.949G>Ap.G317RNSrs4148301 0.033sequence in silico tools set0.1100.0960.104

Validated variants (Section 2.2.2.) in bold. Variants rs3815583, rs1065852, and rs3322 were excluded due to technical failure. Footnotes: 1 NS = non-synonymous; 2 p-value provided for clinical associations; 3 Prediction based on combination of pharmacogenomic databases, e.g., PharmGKB (“PA” designation stands for specific diseases, genes and drugs in the database) and in silico individual tools, e.g., TargetScan (micro RNA target prediction), metaLR and metaSVM (D = deleterious), CADD (cut-off value ≥ 19), Nexus IW score under 0.01 (IW), PINES (p-value ≤ 0.05) or Regulome DB (score provided), and sequence in silico tools set (see Section 4 Materials and Methods and data provided in Table S4); 4 AF = non-reference allelic frequencies in the testing set; 5 Exome Aggregation Consortium (ExAc), allelic frequencies in European non-Finnish population; 6 National Center for Medical Genomics (NCMG), allelic frequencies in general Czech population.

2.2. Validation Phase

The clinical characteristics of the patients in the validation set (n = 805) are shown in Table 6.
Table 6

Clinical data of patients in the validation set.

CharacteristicsPatients, n (%) 1
Age at diagnosis, mean ± S.D. 2 (years)58.9 ± 12.5
Menopausal status
Premenopausal197 (25)
Postmenopausal590 (75)
Missing data18
Tumor size (pT)
pTis65 (8)
pT1489 (62)
pT2208 (27)
pT318 (2)
pT410 (1)
pTX15
Lymph node metastasis (pN)
Absent (pN0)509 (67)
Present (pN1-3)253 (33)
pNX43
Pathological stage
S061 (8)
SI358 (47)
SII282 (37)
SIII67 (9)
Not determined37
Histological type
Invasive ductal carcinoma598 (25)
Other type197 (75)
Missing data10
Pathological grade (G)
G1177 (23)
G2385 (50)
G3209 (27)
GX34
Estrogen receptor status
Positive618 (77)
Negative181 (23)
Missing data6
Progesterone receptor status
Positive579 (73)
Negative220 (27)
Missing data6
Expression of HER2
Positive194 (24)
Negative602 (76)
Missing data9
Expression of Ki-67, mean ± S.D. 2 (%)23.3 ± 22.6
Molecular subtype
Luminal A330 (41)
Luminal B313 (39)
Triple negative93 (12)
HER263 (8)
Missing data6
Response to neoadjuvant cytotoxic therapy
Complete or partial response127 (75)
Stable disease or progression43 (25)
Not applicable 3635

Footnotes. 1 Number of patients with % in parentheses; 2 S.D. = standard deviation; 3 patients treated with adjuvant therapy without neoadjuvant cytotoxic therapy.

Patients were treated with neoadjuvant cytotoxic therapy and/or with adjuvant therapy following surgical treatment. Cytotoxic therapy was based on the same drug combinations as in the testing set (Table S5). A small fraction of patients with localized disease and good prognosis was not treated with cytotoxic or hormonal therapy (n = 83). The mean follows up of the patients was 76 ± 30 months. Sixty patients were lost to follow up and could not be further evaluated in survival analyses. All clinical characteristics were tested as modifiers of survival functions. High tumor grade (p = 0.008), advanced disease stage (p < 0.001), and the lack of expression of estrogen (p = 0.004) and progesterone (p = 0.013) receptors predicted worse DFS. Thus, these clinical factors were subsequently used for adjustment of multivariate survival analyses. Analogously, all clinical characteristics were tested on association with response. Only the advanced disease stage was a modifier of therapy response (p < 0.001) and was used for adjustment of multivariate analysis.

2.2.1. Genotyping

Together, 58 variants were assessed by KASPTM method in 805 DNA samples within the validation phase. Despite several attempts to optimize detection techniques, three variants (rs1065852, rs3815583 and rs3322) failed to perform consistently and could not be further evaluated for clinical associations. No variants significantly departed from Hardy-Weinberg equilibrium (p > 0.01). Out of theoretical 55 × 805 (44,275) data points, 257 (less than 1%) were missing due to uncertainty in genotype calling or absent signal. Highest missing data rate among individual variants was 30 (3.7%) and 800 samples had less than 10% of missing data. For 86 samples from the testing phase results of KASPTM genotyping were available for validation of the sequencing results. Overall non-reference discordance rate across 55 × 86 (4730) data points was 2.3 with all but one mismatches in heterozygotes (n = 45). Also, the MAFs of variants in the validation set did not substantially differ from those observed in the testing set (Table 7).
Table 7

Distribution of genotypes for variants assessed in the validation phase.

GeneSNVGenotypes 1MAF 2
Common HomozygousHeterozygousRare Homoz YgousValidation Set (n = 805)Testing Set (n = 105)
AKR7A2 rs104365766013660.090.10
TUBB1 rs10485828521252270.190.21
MADD rs10501320428318510.260.28
RARB rs105837866613420.090.09
SLC28A3 rs1086813868110880.080.07
ABCB5 rs11983326409332590.280.28
PIK3C2G rs12312266462290440.240.21
CYP4F12 rs1246065168311180.080.12
RRAGD rs1555403432318480.260.24
GSTP1 rs1695366355760.320.33
DPYD rs173768487247620.050.06
DPYD rs18011607297400.050.05
ABCA4 rs180146667811740.080.11
CYP2C9 rs19349692813871350.410.39
AHRR rs2013782312389970.370.41
ABCB1 rs203258363915860.100.11
CYP2E1 rs207067762616490.110.14
CDA rs2072671360366660.310.36
PIP4K2B rs20750613013831190.390.40
BAK1 rs210134398335620.290.25
ABCC1 rs212091598186140.130.18
GSTA2 rs21803142514091410.430.41
ATP7A rs2227291499262370.210.26
BIRC7 rs22734872274051610.460.49
IRS1 rs22885877355930.040.06
KCNAB1 rs22931942293731980.480.48
DPYD rs2297595617169160.130.15
CES1 rs23072406997210.050.07
CYP2E1 rs251564162716690.110.14
ENOSF1 rs26120833323701010.360.38
DPYS rs26694292464211350.430.46
NR5A2 rs2816948619165110.120.13
CYP2D6 rs28371725672112110.080.06
ABCC6 rs28565857198020.050.06
ABCB8 rs321458763615970.110.12
SLCO1C1 rs34288910597186190.140.14
ABCC1 rs3743527476278460.230.21
AKT1 rs3803304407317640.280.29
SLCO1A2 rs3834939366360770.320.30
PPARG rs3856806592181260.150.12
GSTA1 rs39573572604121240.410.41
UGT2A1 rs4148301644147100.100.11
ABCC3 rs4148413499236430.210.17
EPHX2 rs4149259569212220.160.17
DFFB rs437667369410610.070.09
SLC2A1 rs4658506266260.200.21
ESR2 rs49869383293671030.360.37
RPTOR rs61750765575208170.150.24
SLC22A1 rs6280313023721220.390.40
EPHX2 rs75114165213790.100.11
CMPK1 rs75430162404091430.440.45
GSTP1 rs7628032894021080.390.39
CES1 rs763362597148700.050.06
BLK rs922483429304610.270.23
PPARA rs962681462716390.110.10

Footnote: 1 Genotypes do not sum up to 805 due to missing data; 2 MAF = minor allele frequency.

2.2.2. Clinical Associations

In order to validate clinical associations observed in the testing phase, variants were evaluated against response and survival of patients in the validation set. All homozygous genotypes observed in less than five patients were grouped with the corresponding heterozygous genotype for enhancing the statistical power of comparisons. The variants that associated with response in both testing and validation phase are listed in Table 8 and thus these variants are considered validated variants with putative clinical importance.
Table 8

Validated variants significantly associating with the response of patients to neoadjuvant cytotoxic therapy in the validation phase.

GeneSNVGenotypesResponders 1Non-Responders 1p-Value 2p-Value Adj 4
SLC28A3 rs10868138 0.0130.266
Solute Carrier Family 28 (Sodium-Coupled Nucleoside Transporter), Member 3—Nucleoside transporter with broad specificity for pyrimidine and purine nucleosidesCommon homozygous10241
Rare allele 3231
ATP7A rs2227291 <0.0010.004
ATPase Copper Transporting Alpha—Copper transporterCommon homozygous8816
Heterozygous2924
Rare homozygote92
KCNAB1 rs2293194 0.0030.030
Potassium Voltage-Gated Channel, Shaker-Related Subfamily, Beta Member 1—Pottasium channelCommon homozygous429
Heterozygous6617
Rare homozygous1917
DFFB rs4376673 0.0070.017
DNA Fragmentation Factor Subunit Beta—DNA fragmentation factor involved in apoptosisCommon homozygous11532
Rare allele 31211

Footnotes: 1 numbers of responders (complete or partial remission) or non-responders (stable or progressive disease); 2 p-value by the Pearson test; 3 in the absence of rare homozygotes in any of the compared groups, effect of rare allele was evaluated; 4 adjusted p-value by the multivariate logistic regression adjusted to disease stage.

Analogously to the testing phase, associations of variants with DFS of all patients or patients stratified according to therapy were evaluated. Two variants (rs12460651 and rs751141) significantly associated with DFS of all patients (n = 745; without 60 patients lost to follow up) and three other variants (rs17376848, rs1801160, and rs2288587) associated with DFS of patients treated with neoadjuvant and/or adjuvant cytotoxic therapy (n = 371; without six patients lost to follow up). These patients comprised molecular subtypes: luminal A (n = 101), luminal B (n = 166), HER2 (n = 36) and triple negative (n = 66); data were missing for two patients. Variant rs2075061 associated with DFS only in patients treated with hormonal therapy without cytotoxic drugs (n = 312. Luminal A, n = 187; luminal B, n = 118 and seven missing). Of these variants, rs12460651, rs2075061, and rs751141 did not pass the gene dosage condition (Figure S4) and thus these variants could not be further considered validated. Validated variants associating with DFS in the cytotoxic therapy treated patients are depicted in Figure 4. In order to clarify the effect of molecular subtype on prognosis of the patients treated with neoadjuvant and/or adjuvant cytotoxic therapy, these patients were further stratified according to their molecular subtype. Associations with DFS were then calculated separately for each subtype (Table 9 and Figure S5). If single p-value of stratified patients was delivered (pooled log-rank test), variants rs1801160 and rs2288587 remained significantly associated with DFS (p < 0.001 and p = 0.030, respectively), but rs17376848 was non-significant (p = 0.083).
Figure 4

Kaplan-Meier plots with validated associations of variants with DFS of patients treated with cytotoxic therapy. Solid line represents the common homozygous genotype and dashed line the rare allele. Significance was evaluated by the log-rank test, n = number of individuals. In the absence of rare homozygotes in any of the compared groups, effect of rare allele was evaluated. (a): rs1801160; (b): rs17376848; (c): rs2288587.

Table 9

Validated associations of variants associating with DFS of patients treated with cytotoxic therapy according to their molecular subtypes.

GeneSNVGenotypesLuminal A 2Luminal B 2HER2 2TN 2,3
DPYD rs1801160 NS <0.001 NS 0.018
Dihydropyrimidine Dehydrogenase—Pyrimidine catabolic enzymeCommon homozygous90 150 33 63
Rare allele 111 16 3 3
DPYD rs17376848 NSNS 0.012 NS
Dihydropyrimidine Dehydrogenase—Pyrimidine catabolic enzymeCommon homozygous88146 33 59
Rare allele 11320 3 7
IRS1 rs2288587 0.002 NSNSNS
Insulin Receptor Substrate 1—Protein mediating various cellular processes by insulinCommon homozygous 93 1503357
Rare allele 1 7 1438

Footnotes: 1 In the absence of rare homozygotes in any of the compared groups, effect of rare allele was evaluated; 2 p-value by the log-rank test and numbers of patients (significant associations are depicted in bold); 3 Triple negative; NS = Non-significant.

In multivariate analyses, using Cox regression adjusted to tumor grade, disease stage and expression of hormonal receptors, the rare allele in rs1801160 in DPYD was associated with a significant hazard ratio (HR = 2.58, 95% CI = 1.48–4.50, p = 0.001), but the other two variants (rs17376848 also in DPYD and rs2288587 in IRS1) were non-significant (p = 0.071 and p = 0.115, respectively).

3. Discussion

There is no doubt that drug therapy tailored to individual genetic predisposition could bring substantial cost-benefit effects in terms of both enhanced drug efficacy and decreased risk of adverse drug reactions. Pharmacogenomics seem so far instrumentally more accessible than routine pharmacokinetics or pharmacodynamics in clinical day use. However, current approaches, including the state-of-the-art technological platforms such as the next generation sequencing, are still in the early evolutionary phase. Except the considerable decrease of cost per genotype in the last few years, the complexity of data management and the need for robust evaluation of results to make them clinically meaningful still hinder broader usage, especially in the pharmaceutical area. Thus, studies addressing these aspects are urgently needed. The present paper shows that out of quite large number of germline variants (18,245) detected among 509 pharmacogenes and other drug-related genes in breast cancer patients, only a few may be important from the view of individualized therapy after robust validation. Four variants associated with response of patients to neoadjuvant cytotoxic therapy and three, out of which just one was significant in multivariate analyses, were prognostic in terms of DFS after cytotoxic therapy. Responders to the neoadjuvant cytotoxic therapy carried, significantly more frequently than non-responders, the rare allele in rs10868138 of SLC28A3, the common homozygous genotype in rs2227291 (ATP7A), or rs4376673 (DFFB) or the common allele in rs2293194 (KCNAB1). However, the association of rs10868138 with response should be treated with caution while this association was non-significant after adjustment for disease stage in multivariate analysis. Rs2293194 was significantly associated with response only in patients with early stage disease 0 or I (p < 0.001). Patients with the common homozygous genotype in rs1801160 of DPYD survived longer without relapse after cytotoxic therapy than carriers of the rare allele. This effect was particularly pronounced in patients with luminal B and triple negative molecular subtypes. Protein coding gene SLC28A3 (Solute carrier family 28 member 3) is a sodium-dependent nucleoside transporter involved in the homeostasis of endogenous nucleosides and regulating multiple cellular processes, e.g., neurotransmission and metabolism and transport of nucleoside drugs [22] (https://www.genecards.org/cgi-bin/carddisp.pl?gene=SLC28A3&keywords=A-3). Genetic variability in SLC28A3 was previously connected with pharmacokinetics of nucleoside analogs [23] and cardiotoxicity of anthracyclines [24] although a more recent study did not confirm the latter observation [25]. Variation rs7867504 in SLC28A3 was shown to be involved in gemcitabine pharmacobiology and toxicity in metastatic breast cancer patients receiving maintenance therapy [26] or in patients with pancreatic carcinoma [27]. The rs10868138 in SLC28A3 is a less studied variant; however, it was recently connected with higher concentration of azathioprine in erythrocytes of patients with neuromyelitis optica [28], suggesting that it may be functional in vivo. The other relevant gene to pharmacogenomics of nucleoside analogs is dihydropyrimidine dehydrogenase (DPYD). The protein encoded by this gene (DPD) is a pyrimidine catabolic enzyme and the initial and rate-limiting factor in the pathway of uracil and thymidine catabolism (https://www.genecards.org/cgi-bin/carddisp.pl?gene=DPYD). DPD is active in the catabolic pathway of 5-fluorouracil and mutations in its gene result in an increased risk of toxicity in cancer patients receiving 5-fluorouracil chemotherapy [29]. DPYD polymorphism rs1801160, associated with survival of breast cancer patients in our study, is very frequently studied. Carriage of rs1801160 in DPYD associated with grade 3 or 4 5-fluoropyrimidine associated adverse risk effects, e.g., neutropenia, in a recent study of colon cancer patients treated with regimens consisting of 5-fluoro-uracil or capecitabine combined with oxaliplatin [30]. Although DPYD genotype-guided individualized dosing for better safety of fluoropyrimidine treatment was recently suggested as a new standard of care [31], the present study was not designed to address adverse effects and the validated association of DPYD variant with DFS adds a new observation to the knowledge base. Of the other validated variants associated with response to the therapy, the rs2227291 in ATP7A raises particular attention since it is non-synonymous (V767L) and thus probably more directly functional. Notably, rs2227291 is the only association with response in the validation set that passes the false discovery rate (p = 0.011). Copper transporter ATP7A encodes a transmembrane protein that functions in copper transport across membranes and it is frequently studied in connection with sensitivity to platinum drugs, e.g., cisplatin. Very recently, the rs2227291 was associated with cisplatin resistance in patients with epithelial ovarian cancer treated with combination of platinum and taxane [32]. However, though the authors state that carriers of the minor allele are more sensitive to cisplatin, the functional link is missing and must be obtained using further study. GWAS studies show that regulatory non-coding variants may play a role in multiple distinct diseases such as cancer [33] and thus the other two intronic variants associated with response (rs2293194 in KCNAB1 and rs4376673 in DFFB) also represent a potential target for further studies. KCNAB1 (Potassium Voltage-Gated Channel, Shaker-Related Subfamily, Beta Member 1) gene encodes a potassium channel involved in an important dopamine pathway, chemical transmission of signal across synapses and various CYP450 pathways (https://www.genecards.org/cgi-bin/carddisp.pl?gene=KCNAB1). In cancer genetics, KCNAB1 variation may play a role in breast cancer pathogenesis because its overexpression was found in breast tumors in comparison to non-tumor tissues [34]. Finally, DFFB is a subunit Beta and active component of DNA Fragmentation Factor protein (DFF). DFFB has been found to trigger both DNA fragmentation and chromatin condensation during the apoptosis (https://www.genecards.org/cgi-bin/carddisp.pl?gene=DFFB). For example, enhanced expression of DFFB with doxorubicin or in combination with sulfonamides enhanced the killing of T47-D breast cancer tumor cells via apoptosis [35,36]. Thus, variation and deregulation of the DFFB gene in the presence of apoptosis-inducing drugs might have an impact on their efficiency in tumor cells. Of the 88 loss of function variants identified in our study, only seven frameshift variants had MAF above 5% ensuring the necessary statistical power to estimate the associations with DFS or response. None of the associations of frameshift indels with outcome was statistically significant. Of the genes harboring these variants, only RRM2B, was in the first quartile of the most intolerant genes to functional variation, according to LoFtool gene score [37]. The rest of the genes in the first quartile were ABCA5/A6/A7/A10/A13, ABCB4/B10, ABCC2/C3/C5/C10/C11/C12, DHCR7, NR1I3, SLC35C2 and SLCO3A1. Of their corresponding proteins, mainly the multidrug resistance protein (MRP)2, MRP3, MRP5, and MRPs 7–9 coded by membrane transporters ACBC2, ABCC3, ABCC5, ABCC10, ABCC11, ABCC12 and the organic anion transporter polypeptide-related protein (OATP)3A1 coded by SLCO3A1 are of the highest importance because of the relation of MRPs and OATPs in the chemotherapy resistance or sensitivity [5,6]. However, associations with response or DFS in these genes could not be assessed due to the modest size of our cohort and the low frequency of these variants in population. Population context is currently broadly discussed, for example considerable gene-dependent variability between African and European Americans has recently been demonstrated [21]. The present study was performed on homogeneous population of Slavic Europeans. As such, adds unique information to the existing clinically associated datasets. The only publicly available data in the Czech population on the germline whole exome level are in the National Center for Medical Genomics (NCMG) set of healthy Czech population (n = 309 at time of writing). Of the total number of 509 genes, 503 genes (99%) contained at least one genetic alteration. No alterations were found in ABCF1, HSPA1A, RXRB, TAP1 (ABCB2), TAP2 (ABCB3) and VDAC1P4 genes in the present study. However, in comparison to the data in the NCMG set of healthy Czech population, these genes, except VDAC1P4, were polymorphic. In total, 54 variants were found in ABCF1, 13 in HSPA1A, 16 in RXRB, 78 in TAP1, and 88 in TAP2. Whether these differences are due to the different composition of both sets in terms of individual characteristics of participants (the present set contained only females while the NCMG set is composed of both genders) or due to the disease etiology (breast cancer patients versus healthy population) remains to be elucidated. Differences between sequencing platforms, raw data management and annotation cannot be excluded either. On the other hand, the most polymorphic genes with over one hundred alterations were NCOR2, ABCA13, RPTOR, ABCA4, CIT, BIRC6, ABCC1, ABCA1, RXRA, NCOR1, ABCA7, ABCC4 and ABCB5 in the present study and except for RXRA, all these genes showed more than 100 alterations in the NCMG set as well. ABCA13 is overall the 80th most polymorphic gene in NCMG data coming from the whole exome sequencing, while the rest of the top 80 variable genes in NCMG data were not analyzed in this study. Thus, although there are some similarities in these sets, the direct comparison of data from two sets within the same population points to some differences, mainly in the low MAF variants, and thus, multiethnic cohorts must be very carefully evaluated in this regard. The recent study by Kozyra et al. [21] reported ABCA4, ABCA1 and ABCC1 among genes with highest counts of variations suggesting that the most variable genes may be conserved across diverse populations. We aimed not only to contribute to the search for predictive genetic biomarkers for oncology, but also to set up a pipeline for processing of raw data generated by massively parallel gene panel sequencing, including quality controls. Last but not least, complex variant prioritization scheme including both evaluation of variants by associating them with individual patient data relevant to their pharmacological response and further filtration using in silico predictive tools and pharmacological databases is provided. Moreover, robust validation by means of comparison of results obtained by two technological platforms and two stage study evaluation using testing and validation clinical cohorts was accomplished. Public databases such as PharmGKB are wealthy sources of germline variants which evolved from laborious curation of published studies and a strong need for systematic use of perished knowledge in personalized medicine [38,39]. At the time of writing of this article, 21,115 annotations in 647 drugs associated with drug response at pharmacodynamic and/or pharmacokinetic level were in PharmGKB (https://www.pharmgkb.org/, accessed 4 November 2018). Despite the significant number of annotations available, automated prediction for drug response of sequenced variants is not available. Many in silico tools have been developed to aid with the prediction mostly for coding variants. Evolutionary characteristics of variants in pharmacogenes involved in biotransformation and transport of drugs are, however, different. This complicates accurate estimates provided by methods mostly built on Mendelian disease principles [40]. Consequently, genomic evolutionary rate profiling or evolutionary constraint algorithms, as well as tools trained on disease pathogenic/neutral variants were not included in our in silico sequence tools set. Several attempts have been made to generate specialized tools scaled for pharmacogenes or to optimize current models for pharmacogenetic assessments [40,41]. Nonetheless, “gold standard” methods are still lacking in the public domain. Furthermore, even no standard recommendations on the number or types of in silico tools to be considered in analyses which may have significant impact on results are available [42]. While this prevents to a certain extent potentially incorrect use and interpretation in clinical practice, academic research is also hindered. In our research we attempted to combine different approaches to acquire complex information for given variants. Prediction or knowledge acquired for prioritized variants was not further manually curated. The reason was to verify the ability of automated prioritization and to estimate the added value of in silico tools for our further studies. The modest size of the testing set may be seen as a limitation of the study. Due to this fact, the importance of very rare (MAF < 0.001) and rare (<0.01) variants could not be assessed. Thus, we prioritized variants with MAF > 0.05. In the light of the recently acknowledged contribution of rare variants to inter-individual variability in drug response [40] this limitation needs to be considered in future pharmacogenomic studies in oncology. On the other hand, ethnical homogeneity and completeness of clinical follow up is considered beneficiary. Moreover, the study may be extended by addition of more patients or compiling with similarly designed set of patients with whole exome or genome data. Another limitation of this study is the nonhomogeneity of the patient sets. The advanced disease stage and the molecular subtype can be seen as the strongest modifiers of patient prognosis. We have employed the multivariate analyses adjusted to disease stage and we have analyzed the associations of variants with molecular subtype in the testing set to circumvent these issues. We also analyzed associations with survival separately in neoadjuvantly treated patients (with predominant luminal subtype) and adjuvantly treated patients (triple negative tumors only) in the testing set and ran the full prioritization pipeline again. Despite some slight discrepancies which might be caused by chance due to small sizes of compared groups, all the major conclusions of this study remained unchanged. Functional studies of the identified variants and genes will be the next step. Functionality of a variant may be studied using CRISPR-Cas9 gene editing in a suitable tumor cell model in vitro. Subsequently gene function, including response of the model cell line to clinically relevant drugs, e.g., taxanes, may be followed.

4. Materials and Methods

4.1. Patients

The testing study included a total of 105 breast cancer patients of Caucasian origin diagnosed in the Institute for the Care for Mother and Child and Medicon in Prague and Hospital Atlas in Zlin (all in the Czech Republic) during 2006–2013. Patients underwent neoadjuvant cytotoxic therapy with regimens based on 5-fluorouracil/anthracyclines/cyclophosphamide (FAC or FEC) and/or taxanes (n = 68) or postoperative adjuvant therapy using the same cytotoxic drugs (n = 37). The validation set was composed of 805 breast cancer patients recruited in Motol University Hospital, Institute for the Care for Mother and Child, and Medicon in Prague and Hospital Atlas in Zlin during 2001–2013. Collection of blood samples and retrieval of clinical data was performed as described previously [43]. The following data on patients were retrieved from medical records: age at diagnosis, menopausal status, personal medical history, family history (number of relatives affected by breast/ovarian carcinoma or other malignant diseases), stage, tumor size, presence of lymph node metastasis, histological type and grade of the tumor, expression of estrogen, progesterone, and ERBB2 (v-erb-b2 avian erythroblastic leukemia viral oncogene homolog 2, OMIM:164870) receptors, expression of Ki67 (proliferation-related Ki-67 antigen, OMIM:176741), response to the therapy according to RECIST criteria [44], and DFS. DFS was defined as the time elapsed between surgery and disease recurrence. Response to the neoadjuvant therapy was evaluated based on ultrasonography performed before and after the cytotoxic therapy. All patients were informed about the study and those who agreed and signed an informed consent participated in the study. The study was approved by the Ethical Commission of the National Institute of Public Health in Prague (ethic code: Č.15-25618A, 6 August 2014). The methods were carried out in accordance with guidelines approved by the Ethical Commission.

4.2. DNA Extraction

Blood samples were collected during the diagnostic procedures using tubes with K3EDTA anticoagulant. Genomic DNA was isolated from human peripheral blood lymphocytes by the standard phenol/chloroform extraction and ethanol precipitation method [45]. DNA was quantified by Quant-iT PicoGreen DNA Assay Kit (Invitrogen, Carlsbad, CA, USA). DNA samples were stored in aliquots at −20 °C prior to analysis.

4.3. Gene Panel Selection

The genes were selected according to the following criteria: (i)gene in PharmGKB with published association to anthracyclines, doxorubicin, daunorubicin, 5-fluorouracil, cyclophosphamide, paclitaxel or docetaxel; pharmacokinetics pathway of doxorubicin, 5-fluorouracil, cyclophosphamide, docetaxel or paclitaxel; (ii)association with metabolism of xenobiotics, drug metabolism, sulfur metabolism and transport in Phenopedia. The selected genes were divided into groups according to major function, i.e., transport (ABCs, SLCs), metabolism (CYPs, UGTs, etc.), cell death (CASPs, BCLs, etc.), nuclear receptors, targets and signaling genes (Figure S1). All selected genes were validated using NimbleDesign web tool (Nimblegen, Roche, Prague, Czech Republic) (list of genes in (Table S2).

4.4. Targeted Sequencing

Libraries encompassing all exons of selected genes were prepared following the previously published design [46]. Based on the character of probe design, i.e., tiling; the exons were surrounded by approximately 30 bp regions of intronic sequences which were also sequenced in both directions. Target enrichment was performed by the Nimblegen’s SeqCap EZ Choice (Roche, Prague, Czech Republic) using a standard SeqCap protocol [47]. Samples were sequenced on an Illumina MiSeq platform (Illumina Inc., San Diego, CA, USA). Twelve samples were sequenced in each run with the planned minimal coverage 60–100. Data were aligned to the hg19 reference genome with the Burrows-Wheeler Aligner (BWA, Cambridge, UK) 0.7.12 [48], SAM to BAM conversion was done by Samtools 1.4.1 (Wellcome Trust Sanger Institute, Hinxton, UK), duplicate removal by Picard 2.17.10 and base recalibration, local realignment and detection of single nucleotide polymorphisms (SNP) and small indels were done by the Genome Analysis Toolkit (GATK, Broad Institute, Cambridge, UK) 3.7 Haplotype Caller according to GATK Best Practices [49]. The variants were annotated using Annovar (version 2018 Apr 16, Pennsylvania, PA, USA) [50] and VEP 94 (Wellcome Trust Sanger Institute, Hinxton, UK) [51] (Figure 1).

4.5. Genotyping

In the validation phase, 58 genetic variants with clinical associations were analyzed in DNA from 805 breast cancer patients using KASPTM technology (LGC Genomics, Hoddesdon, UK). Quality control was performed by determination of duplicate samples for approximately 10% of the samples in both phases. The genotyping concordance between duplicate samples exceeded 99%.

4.6. Data Analysis

The raw variants from targeted sequencing were recalibrated using GATK 3.7. Hardy-Weinberg test was computed using Bcftools 1.5 (Cambridge, UK). Only variants in Hardy-Weinberg equilibrium (p > 0.01), with MAF > 0.05 and with less than 50% of missing data were considered for statistical and functional evaluations. Comparison of response to the therapy with respect to groups of patients (common homozygous, heterozygous and rare homozygous) was based on the Pearson chi-square test for each variant. Adjusted p-value was calculated for each variant and each of these tests. Computation of adjusted p-value was as follows: (1) p-value based on original data was calculated; (2) 1000 permutations of original data were generated; (3) value of test statistic was calculated for each permutation; (4) proportion of p-values based on permuted data (1000 p-values for each test) which were higher or equal than p-value based on original data was calculated; and (5) adjusted p-value was obtained for given variant. For multivariate analyses, binary logistic regression was used. Comparison of DFS with respect to groups of patients (common homozygous, heterozygous and rare homozygous) was performed by the log-rank test for each variant separately. Kaplan-Meier plot for each variant separately was generated as well for visual comparison. As study follow-up was set to 120 months (10 years) then if the value of DFS for some subject was higher than 120 months, value for this subject was set to 120 and censored. Adjusted p-value for log-rank test was based on 100 permutations of original data. Methodology of computation for adjusted p-value for each variant was performed in a similar way as it was mentioned previously. A p-value of less than 0.05 after adjustment for multiple testing by using 100 permutations for survival and 1000 permutations for response was considered statistically significant. Analyses were conducted using the statistical program SPSS v16.0 (SPSS, Chicago, IL, USA) and using the R script. Cox regression was also performed in analysis of DFS separately for variants which were statistically significant based on adjusted p-values. Response variable was DFS and predictor variable was variant. Based on likelihood ratio test for testing statistical significance of variant, p-value was recorded. After that, adjustment on multiplicity only for these p-values was performed. Considered adjustments were Bonferroni method, Hochberg method, Hommel method, Holm method and false discovery rate method. Analysis for comparison of DFS with respect to groups of patients was also performed for subgroups of adjuvantly treated patients and neoadjuvantly treated patients. Methodology for adjusted p-values derivation, Cox regression analysis and adjustment methods on multiplicity are the same as previously. Permutations of original data (100 permutations) were generated separately for adjuvant patients and for neoadjuvant patients. For functional prediction germline variants were annotated by Annovar and VEP. Choice of in silico tools was based on scope of the prediction for given software with the intention of ensuring prediction for all types of consequences in our set. For non-synonymous/missense and splice site variants, dbNSFP 3.5a provided binary predictions by ensemble scores metaSVM, metaLR and dbscSNV scores, respectively [52]. In addition, a strict consensus in prediction results while excluding all variants for any of the missing prediction was applied to missense variants in “sequence in silico tools set”. The set encompass Mutation Assessor (H/M = functional), SIFT (D: Deleterious; ≤0.05), LRT (D: Deleterious) and Provean (D: Damaging) software tools. This approach is not based on machine learning methods and thus overcomes the concern of type 2 circularity due to insufficient discrimination of deleterious variants from neutral ones within given protein in training dataset [53]. CADD v1.3 (cut-off value ≥ 19, VEP) for all types of variants (i.e., coding, non-coding SNVs and short insertion/deletions) provided supplementary ensemble score. Further, splicing defect prediction was also annotated by VEP, flagging variants in a high information position of a transcription factor binding profile (TFBP). Splice donor/acceptor variants were annotated by MaxEntScan (based on the Maximum Entropy principle and neural networks) with score for reference and alternative variants. The higher score in MaxEntScan implied for a higher probability of the sequence being a true splice site. Known regulatory elements in the intergenic regions (e.g., DNAase hypersensitivity, binding sites of transcription factors, and promoter regions that have been biochemically characterized to regulation transcription) were predicted for deleteriousness by Regulome DB score [54] following classification category 1. These variants were considered likely affecting binding to transcription factors and expression of a gene target. A very recently developed web-based IW-Scoring framework (http://www.snp-nexus.org/IW-Scoring/) and PINES (Phenotype-Informed Noncoding Element Scoring) [55] were additionally used. IW score was provided for known (IW score K11) and novel (IW score N8) non-coding and coding synonymous variants. PINES (p-value ≤ 0.05) provided a ranked list of non-coding variants with functional characterization for liver tissue as a major site of drug biotransformation. Biological targets of miRNAs and conserved sites of given variant (UTR3) were matched by TargetScan (release 7.2, Cambridge, MA, USA). Complementary to in silico predictions, evidence from pharmacogenomic and clinical databases was provided. Queried databases included, e.g., PharmGKB [38], PharmVar (available only for CYP2C9, 2C19, 2D6 genes), ADReCS-Target an Adverse Drug Reaction Classification System-Target Profile (http://bioinf.xmu.edu.cn/ADReCS-Target) [56], which provides comprehensive information about ADRs caused by drug interaction with protein, gene and genetic variation, PheWas Resources (phenome-wide association studies with antineoplastic drugs), Clinvar and dbSNP.

5. Conclusions

Through massive parallel sequencing, germline variability within a panel consisting of genes with relationships to drug metabolism and disposition, cell death and major oncogenic pathways was assessed in Czech breast cancer patients for the first time. Technically and clinically validated associations of rs2227291 in ATP7A, rs2293194 in KCNAB1 (in early stage patients), and rs4376673 in DFFB with response to neoadjuvant cytotoxic therapy provide new putative loci for subsequent functional studies. The frequently studied rs1801160 in DPYD significantly associated with disease-free survival of patients treated with cytotoxic drugs and represents additional provocative target with prognostic potential namely in patients with luminal B or triple negative tumors. Additionally, the present study brings complex insights into the prioritization of variants using individual clinical data, emerging in silico tools and established pharmacogenomic databases.
  6 in total

1.  DNA Copy Number Aberrations and Expression of ABC Transporter Genes in Breast Tumour: Correlation with the Effect of Neoadjuvant Chemotherapy and Prognosis of the Disease.

Authors:  Matvey M Tsyganov; Marina K Ibragimova; Kseniya A Gaptulbarova; Irina A Tsydenova; Daria S Dolgasheva; Evgeniy Y Garbukov; Anastasia A Frolova; Elena M Slonimskaya; Nikolai V Litvyakov
Journal:  Pharmaceutics       Date:  2022-04-27       Impact factor: 6.525

Review 2.  Toxicity and Pharmacogenomic Biomarkers in Breast Cancer Chemotherapy.

Authors:  Zeina N Al-Mahayri; George P Patrinos; Bassam R Ali
Journal:  Front Pharmacol       Date:  2020-04-15       Impact factor: 5.810

3.  Targeted Sequencing of Pancreatic Adenocarcinomas from Patients with Metachronous Pulmonary Metastases.

Authors:  Viktor Hlavac; Beatrice Mohelnikova-Duchonova; Martin Lovecek; Jiri Ehrmann; Veronika Brynychova; Katerina Kolarova; Pavel Soucek
Journal:  Genes (Basel)       Date:  2020-11-24       Impact factor: 4.096

4.  5' Untranslated Region Elements Show High Abundance and Great Variability in Homologous ABCA Subfamily Genes.

Authors:  Pavel Dvorak; Viktor Hlavac; Pavel Soucek
Journal:  Int J Mol Sci       Date:  2020-11-23       Impact factor: 5.923

5.  Consensus of experts from the Spanish Pharmacogenetics and Pharmacogenomics Society and the Spanish Society of Medical Oncology for the genotyping of DPYD in cancer patients who are candidates for treatment with fluoropyrimidines.

Authors:  P García-Alfonso; M Saiz-Rodríguez; R Mondéjar; J Salazar; D Páez; A M Borobia; M J Safont; I García-García; R Colomer; X García-González; M J Herrero; L A López-Fernández; F Abad-Santos
Journal:  Clin Transl Oncol       Date:  2021-11-13       Impact factor: 3.405

6.  Role of Genetic Variation in Cytochromes P450 in Breast Cancer Prognosis and Therapy Response.

Authors:  Viktor Hlaváč; Radka Václavíková; Veronika Brynychová; Pavel Ostašov; Renata Koževnikovová; Katerina Kopečková; David Vrána; Jiří Gatěk; Pavel Souček
Journal:  Int J Mol Sci       Date:  2021-03-10       Impact factor: 5.923

  6 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.