
Copy number variation and cytidine analogue cytotoxicity: a genome-wide association approach.

Krishna R Kalari, Scott J Hebbring, High Seng Chai, Liang Li, Jean-Pierre A Kocher, Liewei Wang, Richard M Weinshilboum.

Abstract

BACKGROUND: The human genome displays extensive copy-number variation (CNV). Recent discoveries have shown that large segments of DNA, ranging in size from hundreds to thousands of nucleotides, are either deleted or duplicated. This CNV may encompass genes, leading to a change in phenotype, including drug response phenotypes. Gemcitabine and 1-beta-D-arabinofuranosylcytosine (AraC) are cytidine analogues used to treat a variety of cancers. Previous studies have shown that genetic variation may influence response to these drugs. In the present study, we set out to test the hypothesis that variation in copy number might contribute to variation in cytidine analogue response phenotypes.
RESULTS: We used a cell-based model system consisting of 197 ethnically-defined lymphoblastoid cell lines, for which genome-wide SNP data were obtained using Illumina 550 and 650 K SNP arrays, to study cytidine analogue cytotoxicity. We identified 775 CNVs with allele frequencies > 1% in 102 regions across the genome. Eighty-seven of these 102 loci overlapped with previously identified regions of CNV. Association of CNVs with gemcitabine and AraC IC50 values identified 11 regions with permutation p-values < 0.05. Multiplex ligation-dependent probe amplification assays were performed to verify the 11 CNV regions that were associated with this phenotype, with false positive and false negative rates for the in-silico findings of 1.3% and 0.04%, respectively. We also had basal mRNA expression array data for these same 197 cell lines, which allowed us to quantify mRNA expression for 41 probesets in or near the CNV regions identified. We found that 7 of those 41 genes were highly expressed in our lymphoblastoid cell lines, and one of the seven genes (SMYD3) that was significant in the CNV association study was selected for further functional experiments. Those studies showed that knockdown of SMYD3 in pancreatic cancer cell lines increased gemcitabine and AraC resistance in cytotoxicity assays, consistent with the results of the association analysis.
CONCLUSIONS: These results suggest that CNVs may play a role in variation in cytidine analogue effect. Therefore, association studies of CNVs with drug response phenotypes in cell-based model systems, when paired with functional characterization, might help to identify CNV that contributes to variation in drug response.

Year:  2010        PMID: 20525348      PMCID: PMC2894803          DOI: 10.1186/1471-2164-11-357

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


Background

It is known that inherited genomic CNV is linked to risk for human disease and response to treatment. It has also been established for decades that genomic variation, including CNV in germline DNA, can help predict variation in efficacy and/or adverse responses to therapeutic drugs [1-5]. For example, individuals with multiple copies of the gene encoding the drug metabolizing enzyme CYP2D6 are "ultrarapid" metabolizers as compared to those with CYP2D6 deletions ("poor" metabolizers), and these genotypes are associated with variation in response to a large number of drugs [4,6]. CNVs within the human genome are not rare events. Redon et al. [7] identified nearly 1,500 CNV regions scattered throughout the genome in 270 HapMap samples. Those regions comprised approximately 10% of the human genome, encompassing coding and non-coding regions, as compared to the < 1% of the genome that is occupied by SNPs [8]. CNVs appear to be present at lower frequencies than SNPs [9], but this may be due in part to the techniques utilized to identify them. Thus, the prevalence and biological significance of CNVs may be underestimated. As of early 2009, nearly 6,225 CNV loci had been cataloged by the Database of Genomic Variants http://projects.tcag.ca/variation/. In addition, nearly 18% of mRNA species that are genetically regulated through cis effects could be explained by CNVs [10]. Together with SNP genotypes, CNV data can be generated with SNP arrays [7,9,11-13]. Although these methodologies have limitations [14], CNVs, depending on their size and location, may be just as important for variation in function as are SNPs.
The cytidine analogues gemcitabine and AraC show significant therapeutic effect in several types of cancer. Gemcitabine is mainly used to treat solid tumors [15,16], while AraC is used to treat acute myelogenous leukemia [17]. Clinical response to these two drugs varies widely, and previous studies showed that inheritance can contribute to the variation in response to these two drugs [18]. In this study, we set out to test the hypothesis that CNV might contribute to variation in gemcitabine and AraC response in 197 EBV-transformed lymphoblastoid cell lines using SNP data obtained with Illumina 550 and 650 K SNP arrays.

Methods

Genotyping and populations

A subset of the "Human Variation Panel" lymphoblastoid cell lines consisting of 60 Caucasian-American (CA), 54 African-American (AA), and 60 Han Chinese-American (HCA) cell lines, as well as 23 CEPH Caucasian HapMap EBV-transformed cell lines, was obtained from the Coriell Cell Repository (Camden, NJ). These cell lines had been obtained from healthy individuals and were anonymized by the National Institute of General Medical Sciences prior to deposit. All of these individuals had provided written consent for the use of their cells, and DNA from those cells, for experimental purposes. We genotyped the AA DNA from these cell lines using the Illumina Human Hap 650 beadchip (Human660W-Quad v1), and Illumina Human Hap 550 beadchips were used to genotype the remainder of the samples. All samples were genotyped in the Mayo Clinic Genotyping Core Facility. All but two samples had a call rate greater than 98%, and those two samples, even after repetition, had call rates between 95 and 98%. We assessed the log R ratio (LRR) standard deviation (SD) for our samples and found that none of the samples had an SD greater than 0.21. Quality control (QC) recommendations for the PennCNV or QuantiSNP algorithms suggest using an SD < 0.3 [19]. Since the LRR standard deviations of our samples did not exceed this QC threshold, we also used those two samples in the analysis. For consistency, we did not include the additional 100 K SNPs genotyped for the AA samples in the CNV analysis.
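The LRR-based QC check described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the sample standard deviation is assumed as the SD estimator, and the toy log R ratio values are invented for demonstration.

```python
from statistics import stdev

# QC cut-off quoted in the text for PennCNV/QuantiSNP (SD < 0.3).
LRR_SD_THRESHOLD = 0.3

def lrr_sd(log_r_ratios):
    """Sample standard deviation of one sample's log R ratio values."""
    return stdev(log_r_ratios)

def passes_lrr_qc(log_r_ratios, threshold=LRR_SD_THRESHOLD):
    """True when the sample's LRR SD is below the QC threshold."""
    return lrr_sd(log_r_ratios) < threshold

# Toy (invented) per-SNP log R ratios for two hypothetical samples.
clean_sample = [0.10, -0.10, 0.05, -0.05]
noisy_sample = [1.0, -1.0, 0.8, -0.8]

for name, values in (("clean", clean_sample), ("noisy", noisy_sample)):
    print(name, round(lrr_sd(values), 3), passes_lrr_qc(values))
```

In practice the SD would be computed over all autosomal SNPs of each array rather than over a handful of values.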

Copy number identification

Bead Studio version 3.1 was used to obtain log R ratios and B allele frequencies for 550,000 SNPs in the 197 samples studied. Log R ratios were generated by comparing our experimental log R values to Bead Studio's built-in multi-ethnic HapMap population. CNV genotyping was performed using an Objective Bayes Hidden-Markov model (QuantiSNP) [20] plug-in within Illumina's Bead Studio interface. QuantiSNP is a statistical algorithm that uses joint information from log R ratios and B allele frequencies for quantitative SNP array data analysis, allowing precise discovery and mapping of copy number changes. We used the QuantiSNP parameters recommended by Illumina: expectation maximization = 10, CNV length = 10,000, maximum copy number returned = 4, no GC content normalization, and score threshold = 50. After applying the QuantiSNP algorithm, we exported the CNV values and confidence values for each SNP from the Bead Studio software. Using our own in-house programs written in R and Perl, we then separated all SNPs associated with CNVs that were observed in two or more samples (frequency > 1%). These thresholds and parameters were set conservatively to accurately identify CNVs under these conditions. This approach should reduce the false positive rate, but at the risk of increasing the false negative rate and missing more common, yet smaller CNVs that are inherently more difficult to detect. After transforming CNV values for each SNP into deletion (CNV value < 2), normal (CNV value = 2) or amplification (CNV value > 2), distinct copy number regions were obtained by merging neighboring SNPs with identical CNVs across samples. A detailed description of the methods used is available in Additional file 1, Methods section.
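The transformation and merging steps above can be illustrated with a short sketch. The in-house R and Perl programs are not reproduced in the text, so this is only one plausible reading: the function names, the data layout (one row of copy-number values per SNP, one column per sample) and the rule that a change in the per-sample state profile starts a new region are all assumptions.

```python
def classify(copy_number):
    """Map a copy-number value to deletion / normal / amplification."""
    if copy_number < 2:
        return "del"
    if copy_number > 2:
        return "amp"
    return "normal"

def merge_regions(positions, cn_matrix, min_samples=2):
    """Merge neighboring SNPs whose state profile across samples is identical.

    positions: SNP coordinates sorted along one chromosome.
    cn_matrix[i][j]: copy-number value of SNP i in sample j.
    SNPs that are non-normal in fewer than `min_samples` samples are dropped
    and break the current region.
    Returns a list of (start, end, n_snps) tuples.
    """
    regions = []
    current = None
    for pos, row in zip(positions, cn_matrix):
        profile = tuple(classify(cn) for cn in row)
        n_variant = sum(state != "normal" for state in profile)
        if n_variant < min_samples:
            current = None
            continue
        if current is not None and current["profile"] == profile:
            # Same profile as the previous retained SNP: extend the region.
            current["end"] = pos
            current["n_snps"] += 1
        else:
            current = {"start": pos, "end": pos, "n_snps": 1, "profile": profile}
            regions.append(current)
    return [(r["start"], r["end"], r["n_snps"]) for r in regions]

# Toy (invented) data: 5 SNPs in 3 samples.
toy_positions = [100, 200, 300, 400, 500]
toy_cn = [
    [1, 1, 2],  # deletion in two samples
    [1, 1, 2],  # same profile, merged with the previous SNP
    [2, 2, 2],  # normal everywhere, breaks the region
    [3, 2, 3],  # amplification in two samples
    [3, 2, 2],  # variant in only one sample, filtered out
]
print(merge_regions(toy_positions, toy_cn))
```

Requiring an identical profile across all samples is a strict criterion; a real implementation might tolerate small differences caused by differing breakpoints.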

Copy number validation

Eleven copy number regions found to be associated with gemcitabine and AraC IC50 values (p < 0.05) were selected for validation using multiplex ligation-dependent probe amplification (MLPA). Oligonucleotides were preferentially designed based on a successful assay, followed by selection for coding sequences and underlying p-values. M13 sequence was attached to each probe together with a complementary FlexMap100 sequence (Luminex, Austin, TX). Specifically, 80 ng of DNA was denatured at 98°C for 5 minutes, followed by 25°C for 1 minute. In an 8 μL reaction, 80 ng of DNA and 0.3 femtomole/μL of each probe were mixed with 1.5 μL of MLPA buffer (MRC-Holland, Amsterdam, Netherlands). Probes were allowed to hybridize at 60°C for 16-24 hours. Probes were ligated in a reaction containing 25 μL H2O, 3 μL Ligase-65 Buffer A and B (MRC-Holland), and 1 μL Ligase-65 (MRC-Holland) at 54°C for 15 minutes, followed by 98°C for 5 minutes. Each 50 μL PCR reaction consisted of 10 μL of ligated product mixed with 27.5 μL H2O, 5 μL 10 × buffer (Invitrogen, Carlsbad, CA), 1.5 μL 50 mM MgCl2 (Invitrogen), 4 μL 10 mM dNTPs (Applied Biosystems, Foster City, CA), 0.5 μL 10 μM M13 primers, and 1 μL Platinum Taq (Invitrogen). Ten μL of PCR product was then added to 40 μL of bead mix containing 2,000 beads for each FlexMap Microsphere (Luminex) suspended in 1 × TMAC, and the mixture was incubated at 96°C for 2 minutes, followed by 37°C for 60 minutes. Following incubation, 0.2 μL of Streptavidin R-Phycoerythrin Conjugate (Invitrogen) plus 25 μL of 1 × TMAC was added and incubated at room temperature for 30 minutes. Samples were assayed on a LiquiChip 100IS System (Qiagen, Valencia, CA) and results were analyzed with GeneMarker 1.6 software. Of the 11 CNVs assayed, one (chr14CNV87:106047919-106066496) did not provide adequate signal intensity for analysis.

MTS assay

AraC was purchased from Sigma-Aldrich (St. Louis, MO) and gemcitabine was provided by Eli Lilly (Indianapolis, IN). Cytotoxicity assays were performed with the CellTiter 96® Aqueous Non-Radioactive Cell Proliferation Assay (Promega Corporation, Madison, WI). The drug concentrations used to perform these experiments were described in detail previously by Li et al., 2008 [18].

Statistical analysis

The cytotoxicity phenotype (IC50) was determined on the basis of the best fitting curve, either 4 parameter logistic, 4 parameter logistic with top = 100%, or 4 parameter logistic with bottom = 0%. The curves were constructed using the dose response curves package in R. The logistic model with the lowest mean square error was used to determine IC50 values for gemcitabine and AraC as described in Li et al. [18]. Drug response phenotypes (IC50 values) for both drugs were adjusted for ethnicity, gender and storage time of the 197 samples using linear regression (natural log transformation applied to IC50 values). In addition, CNV values were adjusted for ethnicity and gender. Linear regression was then used to test the association of adjusted CNV values (residuals from regressing CNV against ethnicity and gender) with adjusted IC50 phenotypes (residuals from regressing log IC50 against ethnicity, gender and storage time). P-values for association were obtained after performing 1000 permutations for both gemcitabine and AraC IC50 values.
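The residual-on-residual association and permutation step described above might look like the following sketch. It is written in plain Python rather than R, and several details are assumptions not stated in the text: the test statistic (absolute regression slope), the two-sided comparison, the (hits + 1)/(n + 1) p-value convention, and all function names and toy values. Ethnicity would enter as dummy-coded covariate columns and is omitted from the toy data.

```python
import math
import random

def ols_residuals(y, covariates):
    """Residuals of y regressed on the covariate columns plus an intercept."""
    rows = [[1.0] + [float(v) for v in r] for r in covariates]
    p = len(rows[0])
    # Augmented normal equations: (X'X | X'y).
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for k in range(p):
            if k != c:
                A[k] = [vk - A[k][c] * vc for vk, vc in zip(A[k], A[c])]
    beta = [A[i][p] for i in range(p)]
    return [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(rows, y)]

def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx

def permutation_p(x, y, n_perm=1000, seed=0):
    """Permutation p-value for the slope of y on x (y labels shuffled)."""
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(slope(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy (invented) data for 8 hypothetical cell lines.
ic50 = [12.0, 8.5, 20.1, 15.3, 9.9, 30.2, 11.1, 18.7]
cnv = [2, 2, 1, 2, 2, 1, 2, 2]
sex = [0, 1, 0, 1, 0, 1, 0, 1]
storage_time = [1.0, 2.0, 3.5, 1.5, 2.5, 4.0, 0.5, 3.0]

log_ic50 = [math.log(v) for v in ic50]
ic50_res = ols_residuals(log_ic50, list(zip(sex, storage_time)))
cnv_res = ols_residuals(cnv, [[s] for s in sex])
p_value = permutation_p(cnv_res, ic50_res, n_perm=1000)
print(f"permutation p-value: {p_value:.3f}")
```

Shuffling the adjusted phenotype keeps the covariate adjustment fixed across permutations, which is the simplest scheme; other permutation strategies for adjusted models exist and may give slightly different p-values.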

Transient transfection and RNA interference

Human MIA PaCa-2 pancreatic cancer cells were transfected with siRNA using Lipofectamine RNAiMAX (Invitrogen). Specifically, cells were seeded into 96-well plates and were mixed with siRNA-complex containing 50 nM specific or negative control siRNA (Qiagen) and transfection reagent (Invitrogen) in Opti-MEM® I Reduced Serum Media (Invitrogen). Forty-eight hours post-transfection, cells were harvested for cell-based assays. SMYD3 siRNA and negative control siRNA were purchased from Qiagen and were used as suggested by the manufacturer.
Sequences for siRNA against SMYD3 were:
Sense strand 1: GGC GAU CAU AAG CAG CAA UdTdT
Sense strand 2: CGA UUA UAA UAA AUU CAA CdTdT
Antisense strand 1: AUU GCU GCU UAU GAU CGC CdTdT
Antisense strand 2: UUU GAA UUU AUU AUA AUC GdTdG
Sequences for negative control siRNA were:
Sense strand: UUC UCC GAA CGU GUC ACG UdTdT
Antisense strand: ACG UGA CAC GUU CGG AGA AdTdT
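As a consistency check on the sequences listed above, each antisense strand can be compared with the reverse complement of its sense strand. This sketch is not part of the original methods; the helper names are hypothetical, and the pairing of the first sense strand with the first antisense strand (and the second with the second) is an assumption based on the order in which they are printed.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def core(strand):
    """Remove spaces and the 3' DNA overhang (e.g. dTdT) from a printed strand."""
    return strand.replace(" ", "").split("d")[0]

def reverse_complement(rna):
    """Reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def mismatches(sense, antisense):
    """Number of positions where the antisense core differs from the
    reverse complement of the sense core."""
    expected = reverse_complement(core(sense))
    return sum(a != b for a, b in zip(expected, core(antisense)))

duplexes = {
    "SMYD3 siRNA 1": ("GGC GAU CAU AAG CAG CAA UdTdT", "AUU GCU GCU UAU GAU CGC CdTdT"),
    "SMYD3 siRNA 2": ("CGA UUA UAA UAA AUU CAA CdTdT", "UUU GAA UUU AUU AUA AUC GdTdG"),
    "negative control": ("UUC UCC GAA CGU GUC ACG UdTdT", "ACG UGA CAC GUU CGG AGA AdTdT"),
}

for name, (sense, antisense) in duplexes.items():
    print(name, "mismatches:", mismatches(sense, antisense))
```

With the sequences as printed, the first SMYD3 duplex and the negative control are perfectly complementary, while the second SMYD3 duplex differs at one position (the 5' base of the antisense strand). Whether that difference is a design feature or a transcription artifact cannot be determined from the text.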

Real-time quantitative reverse transcription-PCR

Total RNA was isolated from cultured cells with the Qiagen RNeasy kit (Qiagen), followed by QRT-PCR performed with the 1-step, Brilliant II SYBR Green QRT-PCR master mix kit (Stratagene, La Jolla, CA). Specifically, primers purchased from Qiagen were used to perform QRT-PCR using the Stratagene Mx3005P™ Real-Time PCR detection system (Stratagene). All experiments were performed in triplicate with β-actin as an internal control. Control reactions lacked RNA template.

Results

CNV identification

We used the QuantiSNP parameters recommended by Illumina for copy number identification. The QuantiSNP plug-in in Bead Studio provided CNV values and confidence values for each SNP and sample. After pre-processing the data, we had 73,738 SNPs with CNV values other than "normal" (CNV value = 2). A total of 1,674 SNPs were retained in the analysis after excluding SNPs that did not display variation in at least two samples (frequency > 1%). We then applied a simple segregation algorithm as described in Additional file 1 Methods and identified 775 CNVs at 102 loci using the 197 DNA samples obtained from 3 ethnic groups. Figure 1 shows the CNV call results using the CNV region display in Bead Studio software for 15 samples selected randomly from among the 197 samples assayed. From the randomly selected data displayed in Figure 1, it is clear that specific CNV regions could be associated with multiple DNA samples. Copy number loci or regions can also have multiple forms of variation, probably as a result of different breakpoints. The mean and median numbers of CNVs per sample were 3.9 and 4.0, respectively, with a maximum value of 10. The 102 CNV loci identified represented 7.8 Mb of sequence, with an average length of 77 kb and a median of 20 kb (range 90 bp to 1.7 Mb) (Table 1). Twenty-five CNVs identified in this study were observed in all three ethnic groups; 45 CNVs were found in 2 ethnicities; and 32 CNVs were observed in only one ethnic group (Table 1). No loci below 1% copy number frequency were reported in this study.
Figure 1

Visualization of copy number regions identified in 15 randomly selected samples using Bead Studio software. Randomly selected individual samples are listed on the X-axis and chromosomes on the Y-axis. Each colored bar represents one CNV call. Colors indicate copy number: dark red indicates copy number 0, dark orange indicates copy number 1, dark blue indicates copy number 3, and blue violet indicates copy number 4+. The thickness of the band indicates the length of the CNV region.

Table 1

CNV regions identified in 197 DNA samples.

CNV_ID  Start  Stop  Size (bp)  AA  CA  HCA  Combined CNV Frequency
chr1CNV1  12789177  12834675  45498  0.036  0.024  0.050  0.035
chr1CNV2  94906770  94925850  19080  0.091  0.000  0.000  0.025
chr1CNV3  105966892  106000090  33198  0.055  0.000  0.017  0.020
chr1CNV4  147306690  147414362  107672  0.036  0.012  0.000  0.015
chr1CNV5  187665261  187809352  144091  0.000  0.000  0.083  0.025
chr1CNV6  195092486  195160949  68463  0.073  0.000  0.083  0.045
chr1CNV7  243707190  243713984  6794  0.000  0.036  0.000  0.015
chr2CNV8  41092376  41099005  6629  0.273  0.024  0.017  0.091
chr2CNV9  57281457  57295357  13900  0.000  0.036  0.000  0.015
chr2CNV10  89397452  89877778  480326  0.036  0.060  0.017  0.040
chr2CNV11  110228954  110315618  86664  0.036  0.012  0.050  0.030
chr2CNV12  184808081  184866619  58538  0.127  0.012  0.000  0.040
chr2CNV13  242566407  242653950  87543  0.000  0.084  0.000  0.035
chr3CNV14  6194326  6211038  16712  0.073  0.000  0.000  0.020
chr3CNV15  53003415  53010084  6669  0.000  0.048  0.017  0.025
chr3CNV16  65168286  65187636  19350  0.018  0.120  0.117  0.091
chr3CNV17  75535790  75610832  75042  0.000  0.036  0.000  0.015
chr3CNV18  101837214  101854561  17347  0.000  0.036  0.000  0.015
chr3CNV19  152997395  153028291  30896  0.000  0.036  0.000  0.015
chr3CNV20  163614102  163617940  3838  0.018  0.060  0.000  0.030
chr3CNV21  163701543  163710564  9021  0.000  0.012  0.283  0.091
chr3CNV22  166750382  166766442  16060  0.036  0.012  0.000  0.015
chr3CNV23  177371924  177397828  25904  0.000  0.072  0.000  0.030
chr3CNV24  189070613  189088009  17396  0.000  0.012  0.033  0.015
chr3CNV25  192548615  192550982  2367  0.000  0.120  0.000  0.051
chr4CNV26  64386888  64391664  4776  0.000  0.024  0.033  0.020
chr4CNV27  64780488  64795145  14657  0.000  0.000  0.050  0.015
chr4CNV28  88405746  88445557  39811  0.055  0.000  0.000  0.015
chr4CNV29  104433739  104454829  21090  0.091  0.000  0.000  0.025
chr4CNV30  161277505  161290832  13327  0.000  0.060  0.000  0.025
chr4CNV31  162083578  162175279  91701  0.000  0.012  0.083  0.030
chr5CNV32  9955403  9976731  21328  0.036  0.036  0.000  0.025
chr5CNV33  97075236  97107276  32040  0.036  0.036  0.000  0.025
chr5CNV34  117418457  117420311  1854  0.109  0.060  0.000  0.056
chr5CNV35  120337992  120440119  102127  0.055  0.000  0.000  0.015
chr5CNV36  160474752  160479606  4854  0.000  0.000  0.050  0.015
chr2CNV37  67076651  67104015  27364  0.018  0.181  0.000  0.081
chr6CNV38  79031111  79086086  54975  0.255  0.410  0.083  0.268
chr6CNV39  93632051  93634213  2162  0.073  0.000  0.017  0.025
chr6CNV40  167619368  167688151  68783  0.055  0.000  0.000  0.015
chr7CNV41  64334979  64553672  218693  0.000  0.024  0.033  0.020
chr7CNV42  76038186  76394983  356797  0.073  0.036  0.033  0.045
chr7CNV43  81761377  81761849  472  0.000  0.000  0.050  0.015
chr7CNV44  89187436  89247424  59988  0.036  0.012  0.000  0.015
chr7CNV45  141420759  141433796  13037  0.036  0.060  0.067  0.056
chr8CNV46  3775146  3776955  1809  0.055  0.036  0.050  0.045
chr8CNV47  3987675  3990899  3224  0.073  0.000  0.000  0.020
chr8CNV48  5583294  5591685  8391  0.018  0.036  0.000  0.020
chr8CNV49  13643904  13652680  8776  0.036  0.012  0.000  0.015
chr8CNV50  13657871  13680110  22239  0.182  0.012  0.000  0.056
chr8CNV51  16307052  16309085  2033  0.018  0.048  0.000  0.025
chr8CNV52  72378670  72378984  314  0.000  0.036  0.000  0.015
chr8CNV53  137757412  137919630  162218  0.018  0.036  0.000  0.020
chr9CNV54  507715  513495  5780  0.055  0.000  0.000  0.015
chr9CNV55  581094  597738  16644  0.073  0.000  0.000  0.020
chr9CNV56  5376384  5382199  5815  0.109  0.000  0.000  0.030
chr9CNV57  9782326  9786116  3790  0.000  0.000  0.050  0.015
chr9CNV58  11941204  12175185  233981  0.018  0.000  0.067  0.025
chr10CNV59  20891019  20894574  3555  0.000  0.036  0.033  0.025
chr10CNV60  45489854  47173619  1683765  0.200  0.157  0.050  0.136
chr10CNV61  58186118  58188682  2564  0.036  0.024  0.000  0.020
chr10CNV62  58583657  58603555  19898  0.000  0.000  0.067  0.020
chr10CNV63  122759910  122774261  14351  0.127  0.000  0.000  0.035
chr10CNV64  135116379  135219995  103616  0.073  0.060  0.017  0.051
chr11CNV65  5858528  5889688  31160  0.000  0.000  0.050  0.015
chr11CNV66  21145662  21170916  25254  0.109  0.012  0.000  0.035
chr11CNV67  25574425  25580720  6295  0.036  0.012  0.000  0.015
chr11CNV68  25662734  25676867  14133  0.127  0.012  0.000  0.040
chr11CNV69  55139733  55201444  61711  0.036  0.120  0.267  0.141
chr11CNV70  55217364  55346872  129508  0.036  0.000  0.000  0.010
chr11CNV71  81182373  81192815  10442  0.018  0.120  0.117  0.091
chr11CNV72  99154024  99156143  2119  0.000  0.012  0.050  0.020
chr11CNV73  132499814  132509446  9632  0.000  0.036  0.000  0.015
chr11CNV74  134154053  134211153  57100  0.018  0.036  0.000  0.020
chr12CNV75  7888157  7982106  93949  0.018  0.000  0.050  0.020
chr12CNV76  19364102  19442103  78001  0.000  0.024  0.017  0.015
chr12CNV77  31180151  31298076  117925  0.036  0.060  0.017  0.040
chr12CNV78  62269256  62415375  146119  0.036  0.012  0.033  0.025
chr12CNV79  69162075  69162165  90  0.018  0.024  0.033  0.025
chr12CNV80  126056007  126056525  518  0.055  0.000  0.017  0.020
chr12CNV81  127794683  127830402  35719  0.000  0.012  0.017  0.010
chr12CNV82  130297674  130314013  16339  0.000  0.024  0.017  0.015
chr12CNV83  130368913  130378451  9538  0.036  0.024  0.033  0.030
chr14CNV84  43576901  43584372  7471  0.000  0.000  0.133  0.040
chr14CNV85  85358666  85376726  18060  0.018  0.024  0.000  0.015
chr14CNV86  85540054  85557089  17035  0.000  0.036  0.000  0.015
chr14CNV87  105997070  106237639  240569  0.073  0.157  0.067  0.106
chr15CNV88  30298847  30301633  2786  0.073  0.012  0.033  0.035
chr15CNV89  32530025  32587887  57862  0.127  0.072  0.067  0.086
chr15CNV90  32724681  32757729  33048  0.000  0.036  0.000  0.015
chr17CNV91  6047837  6061766  13929  0.036  0.012  0.000  0.015
chr17CNV92  14988424  14998870  10446  0.000  0.000  0.117  0.035
chr18CNV93  1915033  1964966  49933  0.018  0.036  0.000  0.020
chr18CNV94  64898548  64905367  6819  0.018  0.036  0.333  0.121
chr18CNV95  65360121  65362926  2805  0.000  0.036  0.017  0.020
chr19CNV96  15641747  15690364  48617  0.091  0.000  0.000  0.025
chr19CNV97  20423788  20473895  50107  0.109  0.084  0.033  0.076
chr19CNV98  48066441  48387680  321239  0.127  0.084  0.133  0.111
chr20CNV99  14729882  14770129  40247  0.036  0.036  0.000  0.025
chr22CNV100  17270615  17376565  105950  0.036  0.000  0.033  0.020
chr22CNV101  20718332  21554058  835726  0.218  0.253  0.167  0.217
chr22CNV102  23994408  24239811  245403  0.055  0.060  0.033  0.051

Size summary (bp):
Total  7805201
Average  76522
Median  19624
Minimum  90

The 102 CNV regions identified using the 197 DNA samples obtained from 3 ethnic groups.

We compared our loci to those in the structural variation table in the University of California Santa Cruz (UCSC) database http://genome.ucsc.edu. Figure 2 shows an example on chromosome 22. Eighty-seven of the 102 loci that we identified overlapped with previously characterized CNVs, and 51/102 had been identified in more than one study. The 15 loci that had not been characterized previously showed no obvious differences in size or prevalence, suggesting that these are likely to be true CNVs and not the result of systematic error. In Figure 2 we have superimposed our CNV values for a portion of chromosome 22 over UCSC database data. Divergence from the baseline indicates regions of CNV, while amplitude represents prevalence. Figure 2 represents the overlap of our data (Illumina 550 + 650 K, Illumina 550 K, Illumina 650 K) at the top with previous reports at the bottom [7,11-13,21-25]. Lack of technical bias between Illumina 550 K and 650 K data is also shown, since only the overlapping SNP set for the two platforms was used (Figure 2).
Figure 2

Comparison of chromosome 22 CNV regions identified using our 197 cell line samples with the results of previous studies in the UCSC genome browser. The Illumina 550 + 650 K (all samples combined), Illumina 550 K (CA, CEPH, HCA populations) and Illumina 650 K (AA samples) results in the diagram are from the present study, where spikes in the data indicate changes in CNV values. The "RefSeq Genes" row shows the locations of known genes in the human genome. In the "Structural Variation" tracks, green color indicates duplications, red indicates deletions, blue indicates both deletion and duplication, black represents an inversion and gray could be a gain or loss. "Conrad Dels" in the diagram are deletions detected by the analysis of SNP genotypes using the HapMap Phase I data, release 16c.1, CEU and YRI samples [11]. "Hinds Dels" are deletions observed during haploid hybridization analysis in 24 unrelated individuals from the Polymorphism Discovery Resource, selected for a SNP LD study [12]. "Iafrate CNVs" are from BAC microarray analysis of a population of 55 individuals [21]. "Locke CNVs" are CNV regions identified using array CGH in 269 HapMap individuals [22]. "McCarroll Dels" are deletions from genotype analysis, performed with HapMap Phase I data, release 16a [13]. "Redon CNVs" are from SNP and BAC microarray analysis of HapMap Phase II data [7]. "Sebat CNVs" represents oligonucleotide microarray analysis performed with a population of 20 normal individuals [23]. "Sharp CNVs" represents putative CNV regions detected by BAC microarray analysis in a population of 47 individuals [24]. The "Tuzun Fosmids" row consists of fosmid mapping sites detected by mapping paired-end sequences from a human fosmid DNA library [25].

We also had Affymetrix U133 Plus 2.0 expression array data for the same 197 lymphoblastoid cell lines in which we assayed CNV [18], which made it possible for us to quantify the expression of genes linked to CNVs. Forty-one expression array probesets mapped close to (within 500 kb) or within the 102 CNV regions that we identified. Of those 41 probesets, only 7 (17%) were expressed in the lymphoblastoid cell lines, with an average expression value above "100" using GCRMA normalization data (Additional file 1 Table S1); by comparison, 28% of the 54,000 probesets across the entire genome were expressed.

Gemcitabine and AraC IC50 value associations with CNVs

To identify gene(s) that might contribute to variation in cytidine analogue-induced cytotoxicity, we next analyzed associations between CNVs and IC50 values for gemcitabine and AraC. We had previously performed gemcitabine and AraC cytotoxicity studies using the same cell lines [18]. IC50 values for both drugs were used as phenotypes for the association studies, and the analysis was adjusted for race and gender. The association studies with gemcitabine and AraC IC50 value phenotypes identified 5 and 6 CNV regions, respectively, that showed associations with p-values < 0.05 after 1000 permutations. Although these two drugs are similar in structure, we did not observe any common CNV regions that were significantly associated with IC50 values for both gemcitabine and AraC. The annotation and association results for gemcitabine and AraC are listed in Tables 2 and 3, respectively.
Table 2

Significant associations between gemcitabine IC50 values and CNV regions.

CNV ID  Permutation P-value  Chromosome: Region  Number of SNPs  Length (bp)  SNP start-SNP end  Nearest Gene(s)
chr9CNV58  0.027  chr9:12005741-12098916  23  93176  rs10809674-rs12351590  TYRP1
chr1CNV5  0.031  chr1:187795066-187809352  4  14287  rs382645-rs269747  FAM5C
chr14CNV87  0.036  chr14:106047919-106066496  2  18578  rs4562969-rs10151262  ADAM6
chr11CNV74  0.042  chr11:134154053-134211153  25  57101  rs1289444-rs2155304  B3GAT1
chr11CNV65  0.043  chr11:5858528-5889688  12  31161  rs1377518-rs1453428  OR52E4

Association of CNV regions with the gemcitabine phenotype.

Table 3

Significant associations between AraC IC50 values and CNV regions.

CNV ID  Permutation P-value  Chromosome: Region  Number of SNPs  Length (bp)  SNP start-SNP end  Nearest Gene(s)
chr22CNV102  0.013  chr22:24092010-24128856  5  36847  rs713878-rs84486  LRP5L
chr22CNV102  0.028  chr22:23999581-24091936  14  92356  rs6004527-rs713847  LRP5L
chr22CNV102  0.028  chr22:24135224-24239811  30  104588  rs13057190-rs2780695  LRP5L
chr2CNV10  0.016  chr2:89714801-89874746  9  159946  rs2847840-rs842164  FLJ40330
chr12CNV76  0.020  chr12:19364102-19442103  16  78002  rs12825616-rs2565666  PLEKHA5
chr1CNV7  0.035  chr1:243707190-243713984  6  6795  rs10737772-rs12121903  SMYD3, KIF26B
chr2CNV11  0.044  chr2:110243431-110315618  11  72188  rs3789735-rs17463266  NPHP1
chr12CNV83  0.047  chr12:130368913-130378451  5  9539  rs12319995-rs4759915  GPR133

Association of CNV regions with the AraC phenotype.


CNV validation using MLPA

To experimentally validate CNVs that were significantly associated with drug cytotoxicity, we tested the 11 CNVs with permutation p-values for association of less than 0.05 using multiplex ligation-dependent probe amplification (MLPA), a high-throughput method designed to quantify genomic content. We were unable to amplify one CNV (chr14CNV87:106047919-106066496). When we compared MLPA to CNV values on a per-sample basis, 173/197 samples matched our original QuantiSNP CNV calls. Our original analysis had a zero false positive rate for all but two regions (1.23%), and the false negative rates ranged from 0% to 65% for the 10 regions that could be amplified (Additional file 1 Table S2). The chr2CNV10 CNV, as shown in Table S2, had an exceptionally high false negative rate, and we cannot rule out the possibility that a SNP beneath the MLPA probe might be responsible.
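A per-region comparison of array-based calls against MLPA can be sketched as below. The text does not state how the false positive and false negative rates were defined, so this assumes MLPA is treated as the reference, with false positive rate = FP / (FP + TN) and false negative rate = FN / (FN + TP). The function name and the toy calls are hypothetical.

```python
def validation_rates(array_calls, mlpa_calls):
    """Compare per-sample CNV calls for one region.

    array_calls, mlpa_calls: lists of booleans (True = CNV called) in the
    same sample order. MLPA is treated as the reference (an assumption).
    """
    tp = fp = tn = fn = 0
    for called, confirmed in zip(array_calls, mlpa_calls):
        if called and confirmed:
            tp += 1
        elif called and not confirmed:
            fp += 1
        elif not called and confirmed:
            fn += 1
        else:
            tn += 1
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        # Rates fall back to 0.0 when the denominator is empty.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }

# Toy (invented) calls for five hypothetical samples in one region.
array_calls = [True, True, False, False, False]
mlpa_calls = [True, False, True, False, False]
print(validation_rates(array_calls, mlpa_calls))
```

Other denominators are possible (for example, the fraction of all calls that are false), which is one reason per-region and overall rates can differ depending on the definition used.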

Functional characterization

To further characterize CNV regions, we identified genes within 500 Kb of the 11 regions that were associated with gemcitabine or AraC IC50 values. This relatively large region was chosen because previous studies have shown that cis-acting regulators can act over megabase distances [26,27]. Two of the 5 regions that were significantly associated with gemcitabine cytotoxicity contained a gene within 500 Kb of the CNV. Both regions were on chromosome 11. One chromosome 11 region, 134154053-134158019, had 25 SNPs associated with the CNV. The nearest gene, B3GAT1, was 372016 bp distant from the CNV (Table 2). The second chromosome 11 region, 5858528-5889688, had 12 SNPs associated with the CNV, and the OR52E4 gene overlapped this region. In the case of AraC, one region (divided into 3 sub-regions, as shown in Table 3) associated with AraC cytotoxicity was located on chromosome 22, with the nearest gene, LRP5L (low density lipoprotein receptor-related protein 5-like), more than 3486 bases distant. Genes associated with the chromosome 2 and 12 regions were NPHP1 [nephronophthisis 1 (juvenile)] on chromosome 2, and PLEKHA5 (pleckstrin homology domain containing, family A member 5) and GPR133 (G protein-coupled receptor 133) on chromosome 12. NPHP1 and PLEKHA5 overlapped the CNV regions, whereas GPR133 was 179 Kb away from the CNV region associated with AraC. The region located on chromosome 1 overlapped the KIF26B gene and was 295 Kb away from a gene encoding a histone methyltransferase, SMYD3.
Since we had Affymetrix U133 Plus 2.0 mRNA expression array data for the same lymphoblastoid cell lines in which we had assayed CNV [18], we determined average expression levels for all genes associated with the CNVs. Although there were no probesets associated with OR52E4, we found that the average mRNA expression levels after GCRMA normalization for probesets linked with B3GAT1, TYRP1, FAM5C, ADAM6, LRP5L, PLEKHA5, KIF26B, NPHP1, FLJ40330, and GPR133 were less than 10, suggesting either low or no expression of these genes in lymphoblastoid cells. Therefore, no further analysis was conducted with these candidates. However, one of the genes associated with AraC IC50 (SMYD3) had an average expression of 247 in our cell lines, and the expression of this gene was also associated with AraC IC50, with a p-value of 0.0027. The chr1CNV7 association did not pass Bonferroni correction. However, the association study was a "discovery" study, to be followed by functional genomics validation. Hence, we selected the SMYD3 candidate gene based on expression and possible biological relevance to cancer.
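The search for genes within 500 Kb of a CNV region reduces to an interval-distance calculation, sketched below. The annotation source used in the study is not described, so the gene names and gene coordinates here are hypothetical placeholders; only the chr11CNV65 coordinates come from Table 1, and the function names are illustrative.

```python
def interval_distance(a_start, a_end, b_start, b_end):
    """Gap in bp between two intervals on one chromosome; 0 if they overlap."""
    if a_end < b_start:
        return b_start - a_end
    if b_end < a_start:
        return a_start - b_end
    return 0

def genes_near(cnv, genes, window=500_000):
    """Return (gene, distance) pairs for genes within `window` bp of the CNV.

    cnv: (chromosome, start, end)
    genes: list of (name, chromosome, start, end)
    """
    chrom, start, end = cnv
    hits = []
    for name, g_chrom, g_start, g_end in genes:
        if g_chrom != chrom:
            continue
        distance = interval_distance(start, end, g_start, g_end)
        if distance <= window:
            hits.append((name, distance))
    return sorted(hits, key=lambda hit: hit[1])

# chr11CNV65 coordinates from Table 1; the genes below are hypothetical.
cnv_region = ("chr11", 5858528, 5889688)
example_genes = [
    ("GENE_A", "chr11", 5860000, 5861000),  # overlaps the CNV
    ("GENE_B", "chr11", 6200000, 6210000),  # within 500 kb
    ("GENE_C", "chr11", 7000000, 7010000),  # too far away
    ("GENE_D", "chr12", 5860000, 5861000),  # different chromosome
]
print(genes_near(cnv_region, example_genes))
```

A distance of 0 marks a gene that overlaps the CNV, which corresponds to the distinction drawn in the text between overlapping genes and genes some distance away.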

SMYD3 functional validation

It is known that knockdown of SMYD3 inhibits cervical carcinoma cell growth and invasion [28] and that mutations in the 5'-flanking region of SMYD3 may represent a risk factor for human cancer [29]. It is also known that SMYD3 plays crucial roles in HeLa cell proliferation and migration/invasion, so it has been suggested that it may be a useful therapeutic target in human cervical carcinomas [30]. As shown in Table 3, a CNV region located on chromosome 1 close to the SMYD3 gene (chr1CNV7) was associated with AraC IC50 values, with a permutation p-value of 0.035. The chr1CNV7 deletion occurred in 3 samples, all from Caucasian subjects, so we also tested the association in this ethnic group alone. Likelihood ratio testing of linear regression of AraC log IC50 values against gender and storage time, with or without the relevant CNV values, gave a p-value of 0.019. In addition, analysis of IC50 values with the chr1CNV7 region showed that deletion of this CNV region was associated with an increase in the IC50 value for AraC (Table 4). To confirm the results obtained from the association study, we also performed specific siRNA knockdown of the SMYD3 gene in human MIApaca-2 pancreatic cancer cells, followed by cytotoxicity studies. Downregulation of SMYD3 mRNA by siRNA desensitized the pancreatic cancer cells to AraC (P-value = 0.0011) when compared with cells transfected with negative control siRNA (Figure 3), a directional change consistent with the results of our CNV association study. Although SMYD3 did not show a significant association with gemcitabine cytotoxicity in the association study, we also performed functional studies with that drug. We found that knockdown of the SMYD3 gene also made MIApaca-2 cells more resistant to gemcitabine (P-value = 0.0002), as shown in Figure 3. The correlation between Chr1CNV7 copy number and gemcitabine IC50 values was weak (r-value = -0.01, p-value = 0.804). 
Although this association was not statistically significant, its direction was consistent with the results of the knockdown studies. In summary, knockdown of SMYD3, followed by cytotoxicity studies with both drugs, produced significant shifts in cytotoxicity, but the shift was smaller for AraC than for gemcitabine (Figure 3).
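The likelihood ratio comparison of regressions with and without the CNV term can be sketched as follows. This is a minimal illustration using only the Python standard library, assuming Gaussian errors so that the statistic is n·ln(RSS0/RSS1) and is referred to a chi-square distribution with one degree of freedom. The function names and all data values are hypothetical and are not the study data.

```python
import math


def ols_rss(X, y):
    """Residual sum of squares of the least-squares fit y ~ X
    (X must already contain an intercept column)."""
    n, p = len(X), len(X[0])
    # Build the augmented normal equations [X'X | X'y].
    M = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         + [sum(X[k][i] * y[k] for k in range(n))] for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for j in range(col, p + 1):
                M[r][j] -= factor * M[col][j]
    # Back substitution for the coefficients.
    beta = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(M[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (M[i][p] - tail) / M[i][i]
    return sum((y[k] - sum(X[k][j] * beta[j] for j in range(p))) ** 2
               for k in range(n))


def lrt_one_df(covariates, extra, y):
    """Likelihood ratio test for adding one predictor to a linear model."""
    n = len(y)
    X0 = [[1.0] + list(row) for row in covariates]   # reduced model
    X1 = [row + [e] for row, e in zip(X0, extra)]    # full model
    stat = max(0.0, n * math.log(ols_rss(X0, y) / ols_rss(X1, y)))
    p_value = math.erfc(math.sqrt(stat / 2.0))       # chi-square survival, 1 df
    return stat, p_value


# Hypothetical toy data: sex (0/1), storage time, copy number and log IC50.
sex = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
storage = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
copy_number = [2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 2, 2]
log_ic50 = [0.1, -0.2, 1.25, 0.15, -0.1, 0.0, 1.15, 0.2, -0.15, 1.3, -0.1, 0.05]

stat, p_value = lrt_one_df(list(zip(sex, storage)), copy_number, log_ic50)
print(f"LRT statistic = {stat:.2f}, p-value = {p_value:.2g}")
```

In practice a statistics package would normally be used for the fit; the hand-written solver is only there to keep the sketch self-contained.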
Table 4

AraC IC50 value mean, median and standard deviations for copy number values = 1 (deletion) and copy number value = 2 (normal).

AraC IC50 values (μmol)
CNV Value    Mean      Median    Std Dev
1            1.177     1.092     0.486
2            -0.017    -0.180    0.998

Summary of IC50 values for copy number = 1 and copy number = 2.

Figure 3

Functional characterization for the SMYD3 candidate gene with specific siRNA knockdown. (A) Knockdown of the SMYD3 gene in human MIApaca-2 pancreatic cancer cells resulted in increased resistance to both AraC and gemcitabine as determined by MTS assay. SEM values for 3 independent experiments were so small that they are contained within the symbols. (B) Quantitative RT-PCR for SMYD3 in MIApaca-2 cells. Error bars represent SEM values for three independent experiments.

Finally, since the CNV close to the SMYD3 gene was significantly associated with IC50, and since the functional validation studies of SMYD3 agreed with the association study results, we expected that variation of mRNA expression for SMYD3 in the cell lines might be significantly correlated with IC50. SMYD3 mRNA expression was significantly associated with IC50 value, with a p-value of 0.028.

Discussion

CNV can occur as a result of genomic rearrangements such as deletion, duplication, inversion, and translocation. Features such as the presence of repetitive elements, the size of the sequences, GC content, and the similarity and distance between sequences play a critical role in determining the susceptibility of regions to these rearrangement events [31]. Many methods have been used successfully to identify CNV regions across the genome. The high density of data from SNP platforms such as those supplied by Illumina or Affymetrix has not only allowed us to perform genome-wide association studies to identify genotypes that are associated with a phenotype, but has also made it possible to quantify SNP allele signals (log R ratios and B allele frequencies) for CNV detection. These log R ratios and B allele frequencies can be used to discover CNV by applying computational algorithms. However, despite advances in computational methods, the identification of intermediate sized CNVs (50 bp to 50 Kb) remains a challenge, since detection of CNVs depends on the density and spacing of probes on the platform. In this study, we used 550,000 SNP markers (Illumina 550 K Bead chip) to discover CNV in 197 human lymphoblastoid cell lines obtained from ethnically diverse populations. The average distance between markers on the Illumina 550 K chip is 5.8 kb, and the average size of CNVs identified in our cell lines was more than 76,000 bp, indicating that smaller CNVs may be underrepresented in our study. The Illumina 550 K SNP chip was designed, in part, to interrogate gene rich regions [32], which is an advantage because it lowers the probability of missing a gene-related CNV. We identified 775 CNVs in 102 regions with minor allele frequencies > 1% (Table 1). Variables such as array type, coverage, intensity and CNV calling algorithm may all lead to different CNV calls. 
Therefore, we compared our CNVs with previous copy number findings represented in the structural variation table of the UCSC database (http://genome.ucsc.edu) and found that the vast majority of variant loci (87 of 102) had been reported in other publications, and that 51/102 were represented in multiple studies. Although many of our CNVs agreed with previously reported variants, other previously reported CNV regions were not identified in our study. This could be due to our stringent criteria for CNVs. It could also be due to the different platforms, methods and study populations used in different studies. In addition, rare events are usually not reported. It is known that variation in response to chemotherapy results from many factors, including gender, race, environmental factors and DNA sequence variation. DNA sequence variation may include both SNPs and CNV. Therefore, the presence of CNV is an important factor that may contribute to variation in response to chemotherapy. Specifically, the existence of CNV within or near a gene might result in differences in mRNA and protein expression. To identify possible pharmacogenomic candidate genes that might be affected by CNV, we tested the association of CNV with a drug response phenotype (IC50) for gemcitabine and AraC using a model system of 197 lymphoblastoid cell lines designed to make it possible to study common human genetic variation. Although the tumor genome is critical for understanding response to therapy and disease pathophysiology, the germline genome is also critical, especially for drug response phenotypes. We recognize that these lymphoblastoid cell lines were EBV transformed from normal individuals, and that they were collected neither from cancer patients nor from tumor tissues. Hence, we might miss some candidate genes that may be specific to cancer. 
However, lymphoblastoid cell lines have been shown by several groups, including ours, to be useful for identifying candidate genes or genetic variation associated with drug-induced cytotoxicity [18,33-37]. Therefore, in this study we also used these lymphoblastoid cell lines to study the possible contribution of CNVs to variation in drug response. To begin the process of understanding how variation in copy number might affect drug response phenotypes for gemcitabine and AraC, we correlated 775 CNVs with IC50 values in 197 lymphoblastoid cell lines. Eleven of the 102 regions were associated with gemcitabine or AraC IC50 values (Tables 2 and 3). We then performed MLPA to validate the in-silico QuantiSNP CNV calls. Since we had Affymetrix U133 Plus 2.0 expression array data for the same lymphoblastoid cell lines [18], we determined expression levels for genes surrounding the 11 CNV regions. The B3GAT1, LRP5L, PLEKHA5, KIF26B, NPHP1, TYRP1, FAM5C, ADAM6, FLJ40330, and GPR133 genes had low expression. Only one CNV on chromosome 1 (chr1CNV7) had a gene (SMYD3) in close proximity that displayed high expression in the lymphoblastoid cell lines. SMYD3, found on the q arm of chromosome 1, encodes alternatively spliced transcripts for proteins of 369 or 428 amino acids. Hamamoto et al. first described SMYD3's histone methyltransferase activity, with specificity for di- and tri-methylation of Lys4 on histone H3. SMYD3's histone methyltransferase activity results in transcription induction for at least 60 targets across the genome [38]. Enhanced expression of SMYD3 has been observed in numerous tumors, including colorectal, hepatocellular [38] and breast cancer [39]. Overexpression of SMYD3 has repeatedly been shown to increase the rate of cell proliferation [38-40], while knockdown experiments result in decreased cell proliferation and cell migration and increased apoptosis [28,41]. 
Our studies indicated an association of chr1CNV7 with AraC cytotoxicity, as well as a correlation between AraC cytotoxicity and SMYD3 expression. Functional validation of our results was performed by knockdown of the SMYD3 gene in pancreatic cancer cell lines. Knockdown made the cells more resistant to AraC, consistent with the association study results, and also made them more resistant to gemcitabine. Our results suggest that joining association studies with functional validation experiments may help to identify biomarkers for disease or response to therapy.
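The permutation p-values reported for the CNV associations can be illustrated with a minimal sketch. It assumes, purely for illustration, that the test statistic is the absolute Pearson correlation between copy number and IC50; the statistic actually used is described in the Methods (Additional file 1) and may differ. The function names and the toy data are hypothetical.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(copy_number, ic50, n_perm=2000, seed=0):
    """Two-sided permutation p-value: the phenotype labels are shuffled and
    the fraction of shuffles at least as extreme as the observed |r| is
    returned (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = abs(pearson_r(copy_number, ic50))
    shuffled = list(ic50)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(copy_number, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: three deletion carriers (copy number 1) with higher IC50.
copy_number = [2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 2, 2]
ic50 = [0.1, -0.2, 1.25, 0.15, -0.1, 0.0, 1.15, 0.2, -0.15, 1.3, -0.1, 0.05]

r_observed = pearson_r(copy_number, ic50)
p_perm = permutation_p_value(copy_number, ic50)
print(f"r = {r_observed:.3f}, permutation p-value = {p_perm:.4f}")
```

Because the null distribution is built from the data themselves, this approach does not rely on normality of the IC50 values, which is useful when a deletion is carried by only a few cell lines.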

Conclusions

We took advantage of genome-wide SNP data obtained with 550 K Illumina Bead Chips to identify CNV regions across the genome in 197 lymphoblastoid cell lines. Association studies with gemcitabine and AraC cytotoxicity phenotypes identified CNV regions that might be associated with cytotoxicity for these two drugs. In this study we investigated the relationship of CNVs, together with expression of neighboring genes (B3GAT1, LRP5L, PLEKHA5, KIF26B, TYRP1, FAM5C, ADAM6, FLJ40330, NPHP1, OR52E4, GPR133 and SMYD3), to drug response phenotypes. Analysis in lymphoblastoid cell lines and functional validation in cancer cell lines suggest a probable role for SMYD3 in AraC and gemcitabine drug response phenotypes. The current study provides additional information with regard to the contribution of CNVs to variation in drug response for two important antineoplastic drugs and indicates that the assay of CNV should be included in pharmacogenomic studies.

Authors' contributions

The conception of the study and interpretation of the analysis were performed jointly by SH, KRK, CH, JPK, LL, LW and RW. Writing of the manuscript was performed by KRK, LW, SH and RW. KRK and CH performed the computational and statistical analysis; SH and LL performed the laboratory-based experiments. All of the authors read, corrected and approved the final manuscript.

Additional file 1

Additional file 1, Methods Section, Table S1, Table S2
  41 in total

Review 1.  Methods and strategies for analyzing copy number variation using DNA microarrays.

Authors:  Nigel P Carter
Journal:  Nat Genet       Date:  2007-07       Impact factor: 38.330

2.  Genotype, haplotype and copy-number variation in worldwide human populations.

Authors:  Mattias Jakobsson; Sonja W Scholz; Paul Scheet; J Raphael Gibbs; Jenna M VanLiere; Hon-Chung Fung; Zachary A Szpiech; James H Degnan; Kai Wang; Rita Guerreiro; Jose M Bras; Jennifer C Schymick; Dena G Hernandez; Bryan J Traynor; Javier Simon-Sanchez; Mar Matarin; Angela Britton; Joyce van de Leemput; Ian Rafferty; Maja Bucan; Howard M Cann; John A Hardy; Noah A Rosenberg; Andrew B Singleton
Journal:  Nature       Date:  2008-02-21       Impact factor: 49.962

3.  Knockdown of SMYD3 by RNA interference inhibits cervical carcinoma cell growth and invasion in vitro.

Authors:  Shu-zhen Wang; Xue-gang Luo; Jing Shen; Jia-ning Zou; Yun-hua Lu; Tao Xi
Journal:  BMB Rep       Date:  2008-04-30       Impact factor: 4.778

4.  PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data.

Authors:  Kai Wang; Mingyao Li; Dexter Hadley; Rui Liu; Joseph Glessner; Struan F A Grant; Hakon Hakonarson; Maja Bucan
Journal:  Genome Res       Date:  2007-10-05       Impact factor: 9.043

5.  Enhanced methyltransferase activity of SMYD3 by the cleavage of its N-terminal region in human cancer cells.

Authors:  F Pittella Silva; R Hamamoto; M Kunizaki; M Tsuge; Y Nakamura; Y Furukawa
Journal:  Oncogene       Date:  2007-11-12       Impact factor: 9.867

6.  Genetic variants contributing to daunorubicin-induced cytotoxicity.

Authors:  R Stephanie Huang; Shiwei Duan; Emily O Kistner; Wasim K Bleibel; Shannon M Delaney; Donna L Fackenthal; Soma Das; M Eileen Dolan
Journal:  Cancer Res       Date:  2008-05-01       Impact factor: 12.701

7.  Gemcitabine and cytosine arabinoside cytotoxicity: association with lymphoblastoid cell expression.

Authors:  Liang Li; Brooke Fridley; Krishna Kalari; Gregory Jenkins; Anthony Batzler; Stephanie Safgren; Michelle Hildebrandt; Matthew Ames; Daniel Schaid; Liewei Wang
Journal:  Cancer Res       Date:  2008-09-01       Impact factor: 12.701

8.  Glutathione S-transferase T1 and M1: gene sequence variation and functional genomics.

Authors:  Ann M Moyer; Oreste E Salavaggione; Scott J Hebbring; Irene Moon; Michelle A T Hildebrandt; Bruce W Eckloff; Daniel J Schaid; Eric D Wieben; Richard M Weinshilboum
Journal:  Clin Cancer Res       Date:  2007-12-01       Impact factor: 12.531

9.  Mapping genes that contribute to daunorubicin-induced cytotoxicity.

Authors:  Shiwei Duan; Wasim K Bleibel; Rong Stephanie Huang; Sunita J Shukla; Xiaolin Wu; Judith A Badner; M Eileen Dolan
Journal:  Cancer Res       Date:  2007-06-01       Impact factor: 12.701

10.  To what extent do scans of non-synonymous SNPs complement denser genome-wide association studies?

Authors:  David M Evans; Jeffrey C Barrett; Lon R Cardon
Journal:  Eur J Hum Genet       Date:  2008-01-16       Impact factor: 4.246
