Sarah Munchel1, Yen Hoang2,3, Yue Zhao1, Joseph Cottrell1, Brandy Klotzle1, Andrew K Godwin4, Devin Koestler5, Peter Beyerlein2, Jian-Bing Fan1, Marina Bibikova1, Jeremy Chien3.
Abstract
Current genomic studies are limited by the poor availability of fresh-frozen tissue samples. Although formalin-fixed diagnostic samples are abundant, they are seldom used in current genomic studies because of concerns about formalin-fixation artifacts. Better characterization of these artifacts will allow the use of archived clinical specimens in translational and clinical research studies. To provide a systematic analysis of formalin-fixation artifacts on Illumina sequencing, we generated 26 DNA sequencing data sets from 13 pairs of matched formalin-fixed paraffin-embedded (FFPE) and fresh-frozen (FF) tissue samples. The results indicate a high rate of concordant calls between matched FF/FFPE pairs at reference and variant positions in three commonly used sequencing approaches (whole genome, whole exome, and targeted exon sequencing). Global mismatch rates and C·G > T·A substitutions were comparable between matched FF/FFPE samples, and discordant rates were low (<0.26%) in all samples. Finally, low-pass whole genome sequencing produces a similar pattern of copy number alterations between FF/FFPE pairs. The results from our studies suggest the potential use of diagnostic FFPE samples for cancer genomic studies to characterize and catalog variations in cancer genomes.
Keywords: FFPE DNA; cancer genomics; copy number alterations; whole exome sequencing; whole genome sequencing
Year: 2015 PMID: 26305677 PMCID: PMC4694877 DOI: 10.18632/oncotarget.4671
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Figure 1. Study design and datasets
To characterize sequencing artifacts in formalin-fixed, archived diagnostic samples, 13 pairs of patient-matched fresh-frozen and formalin-fixed tissue samples were subjected to four popular sequencing approaches: whole exome sequencing (WXS), targeted exon sequencing (TES), whole genome sequencing (WGS), and low-pass whole genome sequencing. In addition, the OmniExpress genotyping array was used as an orthogonal platform to validate genotype calls.
Figure 2. Concordance of base calls between matched FF and FFPE samples
A. Concordance of base calls between FF and FFPE is higher than 98% in all samples in all three data sets (targeted exon sequencing, TES; whole exome sequencing, WXS; and whole genome sequencing, WGS). B-C. Concordance of base calls between Illumina sequencing and OmniExpress array-based genotyping is higher than 96% in all FF samples and higher than 86% in all FFPE samples. In this analysis, both reference and alternate alleles were evaluated.
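As a minimal sketch of how such a concordance figure can be computed, the Python snippet below compares genotype calls at positions called in both samples of a pair. The definition used here (matching calls divided by shared positions), the function name `base_call_concordance`, and the toy genotypes are assumptions for illustration; the legend does not spell out the exact formula used in the study.

```python
from typing import Dict, Tuple

Position = Tuple[str, int]  # (chromosome, 1-based coordinate)


def base_call_concordance(ff_calls: Dict[Position, str],
                          ffpe_calls: Dict[Position, str]) -> float:
    """Fraction of positions called in both samples whose calls agree.

    Reference and alternate calls are treated alike: a position counts as
    concordant whenever the two genotype strings are identical.
    """
    shared = ff_calls.keys() & ffpe_calls.keys()
    if not shared:
        raise ValueError("no positions were called in both samples")
    matches = sum(1 for pos in shared if ff_calls[pos] == ffpe_calls[pos])
    return matches / len(shared)


# Toy genotypes (hypothetical, not taken from the study).
ff = {("chr1", 100): "A/A", ("chr1", 200): "A/G",
      ("chr2", 50): "C/C", ("chr2", 75): "T/T"}
ffpe = {("chr1", 100): "A/A", ("chr1", 200): "A/G",
        ("chr2", 50): "C/T", ("chr3", 10): "G/G"}

concordance = base_call_concordance(ff, ffpe)
print(f"FF/FFPE concordance: {concordance:.1%}")
```

Positions called in only one sample are ignored rather than counted as discordant; counting them would lower the reported rate, so the choice should be stated alongside any result.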
Figure 3. Concordance of single nucleotide variant calls and small insertion and deletion calls between matched FF and FFPE samples
A. Concordance of SNV calls between FF and FFPE is higher than 95% in all three data sets. B. Concordance of INDEL calls between FF and FFPE is higher than 87% in all data sets.
Somatic mutations
| Data set | Sample | Normal | Tumor | Same positions | Concordant | Somatic | Same positions* | Concordant (somatic) | Discordant (somatic) |
|---|---|---|---|---|---|---|---|---|---|
| WXS | 14119_FF | 19750 | 19852 | 19024 | 19013 | 387 | | | |
| WXS | 14119_FP | 19655 | 18807 | 18301 | 18252 | 491 | 90 | 88 | 2 |
| WXS | 22285_FF | 15521 | 11253 | 10670 | 10472 | 461 | | | |
| WXS | 22285_FP | 13920 | 10765 | 6296 | 6058 | 450 | 55 | 54 | 1 |
| TES | 14119_FF | 1274 | 1259 | 1189 | 1188 | 30 | | | |
| TES | 14119_FP | 1266 | 1214 | 1126 | 1124 | 47 | 6 | 5 | 1 |
| TES | 22285_FF | 2157 | 2052 | 1068 | 1063 | 42 | | | |
| TES | 22285_FP | 1219 | 881 | 416 | 408 | 49 | 5 | 5 | 0 |
SNVs in normal and tumor samples were identified in the whole exome and targeted exon sequencing data sets. Positions shared by the normal and tumor samples within FF or within FP are listed under “Same positions”. Positions shared by the normal and tumor samples in both FF and FP (all four samples) are listed under “Same positions*”. The last two columns give the concordant and discordant somatic calls among the positions shared by all four samples.
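The somatic concordance implied by the last three columns can be worked out directly. The sketch below uses the counts from the table; the data-set labels follow the caption, and treating the four rows with the smaller counts as the targeted exon (TES) data set is an assumption. The helper name `concordance_rate` is hypothetical.

```python
# (data set, patient) -> (positions shared by all four samples,
#                         concordant somatic calls, discordant somatic calls)
somatic_overlap = {
    ("WXS", "14119"): (90, 88, 2),
    ("WXS", "22285"): (55, 54, 1),
    ("TES", "14119"): (6, 5, 1),   # TES label is assumed, see lead-in
    ("TES", "22285"): (5, 5, 0),   # TES label is assumed, see lead-in
}


def concordance_rate(shared: int, concordant: int, discordant: int) -> float:
    """Concordant somatic calls as a fraction of the shared positions."""
    if concordant + discordant != shared:
        raise ValueError("concordant + discordant must equal shared positions")
    return concordant / shared


rates = {key: concordance_rate(*counts) for key, counts in somatic_overlap.items()}

for (dataset, patient), rate in rates.items():
    print(f"{dataset} {patient}: {rate:.1%} concordant")
```

With only five or six shared positions in the smaller data set, a single discordant call moves the rate by roughly 17 to 20 percentage points, so those rates are far less stable than the whole exome ones.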
Variants found in at least 2 tumor samples from TES dataset
| Count | Gene | AAChange | CosmicID | 2474 | 2561 | 2640 | 2685 | 2938 | 3050 | 3356 | 4079 | 4191 | 14119 (N) | 14119 (T) | 22285 (N) | 22285 (T) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | NEB | NM_001164507:c.T2885G:p.V962G | COSM304166 | |||||||||||||
| 8 | CNTNAP2 | NM_014141:c.A1765C:p.T589P | COSM1284196 | |||||||||||||
| 8 | MLL3 | NM_170606:c.A946T:p.T316S | ||||||||||||||
| 7 | CACNA2D1 | NM_000722:c.T620G:p.V207G | ||||||||||||||
| 6 | ADAMTS20 | NM_025003:c.C540A:p.N180K | ||||||||||||||
| 6 | GJB2 | NM_004004:c.T500G:p.V167G | ||||||||||||||
| 6 | TSC1 | NM_001162427:c.T2752G:p.L918V | ||||||||||||||
| 6 | UBAP2 | NM_018449:c.A1487C:p.H496P | ||||||||||||||
| 6 | IGFBP7 | NM_001253835:c.C31T:p.L11F | ||||||||||||||
| 5 | KCNJ12, KCNJ18 | NM_001194958:c.G889A:p.V297I | ||||||||||||||
| 5 | KCNJ12, KCNJ18 | NM_001194958:c.G906T:p.M302I | ||||||||||||||
| 5 | TACC3 | NM_006342:c.G427A:p.E143K | ||||||||||||||
| 5 | CNTNAP2 | NM_014141:c.A3352C:p.T1118P | COSM1284198 | |||||||||||||
| 5 | COL1A1 | NM_000088:c.A3772C:p.T1258P | ||||||||||||||
| 5 | DNAH7 | NM_018897:c.A475G:p.K159E | ||||||||||||||
| 5 | CTNNB1 | NM_001098209:c.C1995A:p.D665E | ||||||||||||||
| 5 | PTK2B | NM_173175:c.C2753G:p.A918G | ||||||||||||||
| 4 | ATM | NM_000051:c.G5557A:p.D1853N | COSM41596 | |||||||||||||
| 4 | KCNJ12, KCNJ18 | NM_001194958:c.G865C:p.E289Q | COSM312202 | |||||||||||||
| 4 | KCNJ12, KCNJ18 | NM_001194958:c.C869T:p.T290M | ||||||||||||||
| 4 | PKHD1L1 | NM_177531:c.A490G:p.I164V | ||||||||||||||
| 4 | XIRP2 | NM_001199144:c.A3286C:p.T1096P | ||||||||||||||
| 4 | XIRP2 | NM_001199144:c.A1873C:p.T625P | ||||||||||||||
| 4 | SULF2 | NM_001161841:c.G226A:p.A76T | COSM1412272 | |||||||||||||
| 4 | CHEK2 | NM_007194:c.G736T:p.V246L | COSM304712 | |||||||||||||
| 4 | TNK2 | NM_001010938:c.A1601C:p.D534A | ||||||||||||||
| 4 | NHS | NM_001136024:c.T113C:p.L38P | COSM1317240 | |||||||||||||
| 3 | MGMT | NM_002412:c.A520G:p.I174V | ||||||||||||||
| 3 | ERBB3 | NM_001982:c.A3355T:p.S1119C | ||||||||||||||
| 3 | BARD1 | NM_000465:c.G1670C:p.C557S | ||||||||||||||
| 3 | GABRA6 | NM_000811:c.C1210T:p.P404S | ||||||||||||||
| 3 | PKHD1 | NM_138694:c.C1736T:p.T579M | ||||||||||||||
| 3 | MLL3 | NM_170606:c.G2512A:p.G838S | ||||||||||||||
| 3 | MLL3 | NM_170606:c.A2185G:p.N729D | COSM1635198 | |||||||||||||
| 3 | MLL3 | NM_170606:c.A14062C:p.T4688P | ||||||||||||||
| 3 | ABCB1 | NM_000927:c.A61G:p.N21D | COSM1178512 | |||||||||||||
| 3 | PKHD1L1 | NM_177531:c.C4403T:p.S1468F | COSM304040 | |||||||||||||
| 3 | NOTCH1 | NM_017617:c.A580C:p.T194P | COSM1624741 | |||||||||||||
| 3 | PIK3CD | NM_005026:c.A1127G:p.E376G | ||||||||||||||
| 3 | KIAA1549L | NM_012194:c.A3656C:p.N1219T | ||||||||||||||
| 3 | ADAMTS20 | NM_025003:c.T3137G:p.V1046G | ||||||||||||||
| 3 | NOTCH3 | NM_000435:c.A982C:p.T328P | ||||||||||||||
| 3 | HJURP | NM_018410:c.G1643C:p.S548T | ||||||||||||||
| 3 | ADAMTS2 | NM_014244:c.G722A:p.R241H | ||||||||||||||
| 3 | USP17L7 | NM_001256869:c.G902T:p.R301L | ||||||||||||||
| 2 | MKI67 | NM_001145966:c.G7958T:p.R2653L | ||||||||||||||
| 2 | MKI67 | NM_001145966:c.A6656T:p.D2219V | COSM328282 | |||||||||||||
| 2 | MKI67 | NM_001145966:c.C4550T:p.P1517L | ||||||||||||||
| 2 | MKI67 | NM_001145966:c.G3595A:p.V1199M | COSM146354 | |||||||||||||
| 2 | MKI67 | NM_001145966:c.C2660T:p.T887I | COSM146356 | |||||||||||||
| 2 | MKI67 | NM_001145966:c.A811C:p.I271L | COSM146358 | |||||||||||||
| 2 | ANKRD30A | NM_052997:c.C374T:p.T125M | ||||||||||||||
| 2 | MUC2 | NM_002457:c.C3620T:p.T1207I | COSM1351086 | |||||||||||||
| 2 | PARP4 | NM_006437:c.A3176G:p.Q1059R | ||||||||||||||
| 2 | KCNJ12, KCNJ18 | NM_001194958:c.G782A:p.R261H | COSM312197 | |||||||||||||
| 2 | KCNJ12, KCNJ18 | NM_001194958:c.T785G:p.I262S | COSM312198 | |||||||||||||
| 2 | MUC4 | NM_018406:c.C6671T:p.P2224L | COSM1644167 | |||||||||||||
| 2 | MUC4 | NM_018406:c.C5854T:p.P1952S | COSM1042915 | |||||||||||||
| 2 | MUC4 | NM_018406:c.G5271C:p.Q1757H | COSM149606 | |||||||||||||
| 2 | MUC4 | NM_018406:c.T5971C:p.S1991P | COSM1042911 | |||||||||||||
| 2 | MAP3K1 | NM_005921:c.C2816G:p.S939C | ||||||||||||||
| 2 | PKHD1 | NM_138694:c.T1756G:p.F586V | ||||||||||||||
| 2 | MLL3 | NM_170606:c.C2315T:p.S772L | ||||||||||||||
| 2 | DNAH11 | NM_003777:c.G7573A:p.V2525I | ||||||||||||||
| 2 | DNAH11 | NM_003777:c.C1961G:p.S654C | ||||||||||||||
| 2 | DNAH11 | NM_003777:c.T7777C:p.Y2593H | ||||||||||||||
| 2 | PKHD1L1 | NM_177531:c.T11416G:p.C3806G | COSM304038 | |||||||||||||
| 2 | NOTCH1 | NM_017617:c.C2734T:p.R912W | ||||||||||||||
| 2 | NOTCH1 | NM_017617:c.A931C:p.T311P | ||||||||||||||
| 2 | ADAM12 | NM_003474:c.G212A:p.R71Q | ||||||||||||||
| 2 | FAT3 | NM_001008781:c.C1235T:p.S412F | ||||||||||||||
| 2 | RASAL1 | NM_001193521:c.C173T:p.T58M | ||||||||||||||
| 2 | HERC1 | NM_003922:c.G3415T:p.V1139L | ||||||||||||||
| 2 | HERC1 | NM_003922:c.C9455T:p.S3152F | ||||||||||||||
| 2 | XIRP2 | NM_001199145:c.A1603G:p.R535G | ||||||||||||||
| 2 | ABCC1 | NM_004996:c.G2012T:p.G671V | ||||||||||||||
| 2 | DNMT1 | NM_001130823:c.G206A:p.R69H | ||||||||||||||
| 2 | LRP1B | NM_018557:c.C4174T:p.L1392F | ||||||||||||||
| 2 | LRP1B | NM_018557:c.T8707G:p.C2903G | COSM1631297 | |||||||||||||
| 2 | PIKFYVE | NM_015040:c.A1849G:p.M617V | ||||||||||||||
| 2 | PIKFYVE | NM_015040:c.T3097G:p.S1033A | ||||||||||||||
| 2 | DSP | NM_001008844:c.A913T:p.I305F | COSM1685467 | |||||||||||||
| 2 | FLNC | NM_001127487:c.G4700A:p.R1567Q | ||||||||||||||
| 2 | CDKN2A | NM_000077:c.G442A:p.A148T | ||||||||||||||
| 2 | GPR179 | NM_001004334:c.C2650T:p.R884W | ||||||||||||||
| 2 | ADAMTS2 | NM_014244:c.G2480A:p.R827Q | ||||||||||||||
| 2 | FLNA | NM_001456:c.A5747C:p.Y1916S |
Table legend: Samples carrying the indicated variant are shown in red; gray blocks indicate that the variant is absent from the sample. Only variants found in matching FF/FFPE pairs are shown. Variants were filtered out if they occur in genes known to be highly variable (such as MUC16, MUC4, and HLA genes), are present in at least 10% of the 1000 Genomes Project, or were detected in the normal samples (14119 or 22285). Variants are listed from the highest to the lowest frequency in the dataset. Not all variants listed here are expected to be somatic: because matched normal samples were available only for samples 14119 and 22285, the list may include uncommon germline variants.
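The filtering described in the legend can be sketched as a small pipeline. Everything in the snippet below is illustrative: the `VariantCall` record, the `recurrent_variants` function, the `HLA-` prefix test, and the placeholder gene names are hypothetical, and reading "found in a matching FF/FFPE pair" as "called in both the FF and the FFPE sample" is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

VariantKey = Tuple[str, str]  # (gene, amino-acid change)


@dataclass(frozen=True)
class VariantCall:
    sample: str            # tumor sample ID
    gene: str
    aa_change: str
    in_ff: bool            # called in the fresh-frozen sample
    in_ffpe: bool          # called in the matched FFPE sample
    population_af: float   # population allele frequency, 0.0 to 1.0


HIGHLY_VARIABLE_GENES = {"MUC16", "MUC4"}  # HLA genes are matched by prefix
MAX_POPULATION_AF = 0.10                   # "at least 10%" is filtered out


def recurrent_variants(calls: List[VariantCall],
                       normal_variants: Set[VariantKey],
                       min_samples: int = 2) -> List[Tuple[int, str, str]]:
    """Return (sample count, gene, change) rows, most frequent first."""
    samples_by_variant: Dict[VariantKey, Set[str]] = defaultdict(set)
    for call in calls:
        key = (call.gene, call.aa_change)
        if not (call.in_ff and call.in_ffpe):
            continue  # keep only calls confirmed in both members of the pair
        if call.gene in HIGHLY_VARIABLE_GENES or call.gene.startswith("HLA-"):
            continue
        if call.population_af >= MAX_POPULATION_AF:
            continue
        if key in normal_variants:
            continue  # also seen in a normal sample
        samples_by_variant[key].add(call.sample)
    rows = [(len(samples), gene, change)
            for (gene, change), samples in samples_by_variant.items()
            if len(samples) >= min_samples]
    rows.sort(key=lambda row: (-row[0], row[1], row[2]))
    return rows


# Placeholder calls (hypothetical genes and samples, not study data).
toy_calls = [
    VariantCall("S1", "GENE_A", "p.X1Y", True, True, 0.0),
    VariantCall("S2", "GENE_A", "p.X1Y", True, True, 0.0),
    VariantCall("S3", "GENE_A", "p.X1Y", True, True, 0.0),
    VariantCall("S1", "GENE_B", "p.A2B", True, True, 0.0),
    VariantCall("S2", "GENE_B", "p.A2B", True, False, 0.0),  # FF only
    VariantCall("S1", "MUC16", "p.C3D", True, True, 0.0),
    VariantCall("S2", "MUC16", "p.C3D", True, True, 0.0),
    VariantCall("S1", "GENE_C", "p.E4F", True, True, 0.25),  # common variant
    VariantCall("S2", "GENE_C", "p.E4F", True, True, 0.25),
    VariantCall("S1", "GENE_D", "p.G5H", True, True, 0.0),   # seen in normal
    VariantCall("S2", "GENE_D", "p.G5H", True, True, 0.0),
    VariantCall("S1", "GENE_E", "p.K6L", True, True, 0.01),
    VariantCall("S2", "GENE_E", "p.K6L", True, True, 0.01),
]
table = recurrent_variants(toy_calls, normal_variants={("GENE_D", "p.G5H")})
print(table)
```

Because normal tissue is used only as a blacklist of known variants, this kind of filter cannot distinguish rare germline variants from somatic ones, which is the caveat the legend raises.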
Figure 4. Analysis of Copy Number Variations (CNVs) in FF and FFPE tumor pairs
Copy number variations in tumor samples were determined using QDNAseq and visualized with the Integrative Genomics Viewer. A. Whole genome view with regions of copy number loss (blue) and copy number gain (red) highlighted for all 7 pairs of tumor samples. B. Copy number variations in chromosome 2 are shown for all 7 pairs of FF and FFPE samples. C. Copy number profiles of the FF and FFPE (FP) tumor groups show a similar pattern of gains and losses. The frequency of copy number alterations is plotted on the Y-axis, and chromosome coordinates (chromosomes 1 to 22) are plotted on the X-axis. The plot was generated using the CGHbase R package.
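The frequency plot in panel C summarizes, for each genomic bin, the fraction of samples in a group that carry a gain or a loss. The study produced it with R packages; the Python sketch below only illustrates the calculation. The call encoding (1 for gain, 0 for neutral, -1 for loss), the function name `alteration_frequencies`, and the toy calls are assumptions.

```python
from typing import Dict, List, Tuple


def alteration_frequencies(
        calls_by_sample: Dict[str, List[int]]) -> Tuple[List[float], List[float]]:
    """Per-bin fraction of samples with a gain (>0) and with a loss (<0)."""
    profiles = list(calls_by_sample.values())
    if not profiles:
        raise ValueError("at least one sample is required")
    n_bins = len(profiles[0])
    if any(len(profile) != n_bins for profile in profiles):
        raise ValueError("all samples must cover the same bins")
    n_samples = len(profiles)
    # zip(*profiles) walks the bins, yielding one call per sample for each bin.
    gain = [sum(1 for call in column if call > 0) / n_samples
            for column in zip(*profiles)]
    loss = [sum(1 for call in column if call < 0) / n_samples
            for column in zip(*profiles)]
    return gain, loss


# Toy segment calls for four hypothetical tumors over four bins.
toy_calls = {
    "tumor_1": [1, 0, -1, 0],
    "tumor_2": [1, 0, -1, 1],
    "tumor_3": [0, 0, -1, 0],
    "tumor_4": [1, 0, 0, 0],
}
gain_freq, loss_freq = alteration_frequencies(toy_calls)
print("gain:", gain_freq)
print("loss:", loss_freq)
```

Running the same function separately on the FF group and the FFPE group gives the two curves whose shapes are compared in the panel.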
Figure 5. Hierarchical clustering and correlation analysis of copy number alterations in FF and FFPE samples
A. Unsupervised hierarchical clustering was performed using the aheatmap R package with input data created by the CGHregions R package. The results show clustering of FF/FFPE pairs, indicating similarity between paired samples. B. Pearson correlation was computed across all samples, and the results indicate that paired samples are highly correlated.
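For readers who want to reproduce the idea behind panel B, the following is a minimal Python sketch of a Pearson correlation between two binned copy number profiles. It is not the authors' R workflow; the function name `pearson` and the toy log2-ratio values are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2 bins)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Toy per-bin log2 ratios (hypothetical values, not study data).
ff_a = [0.0, 0.6, 0.6, -0.5, 0.0, 0.1]      # fresh-frozen, patient A
ffpe_a = [0.1, 0.5, 0.7, -0.4, -0.1, 0.0]   # FFPE, same patient
ffpe_b = [0.5, -0.1, 0.0, 0.4, -0.6, 0.0]   # FFPE, unrelated patient

r_pair = pearson(ff_a, ffpe_a)
r_unrelated = pearson(ff_a, ffpe_b)
print(f"matched pair: r = {r_pair:.2f}; unrelated samples: r = {r_unrelated:.2f}")
```

A matched pair should correlate more strongly with each other than with any other sample, which is the same signal the hierarchical clustering in panel A picks up.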
Figure 6. Characterization of FFPE artifacts
A. Combined rates of C > T and G > A transitions in the TES and WXS data sets show no significant difference between FF and FFPE samples when all variant positions are analyzed (P = 0.4872 and P = 0.1845, respectively). B. In contrast, C > T or G > A (on the reverse strand) substitution rates at discordant positions are marginally higher in FFPE samples in the WXS data sets (P = 0.0201) but not in the TES data sets (P = 0.2531). C. Global mismatch rates are slightly higher in FFPE samples than in FF samples in both the TES and WXS data sets; the difference is not significant in the TES data sets (P = 0.0704) but is significant in the WXS data sets (P = 0.0392). D. C > T substitutions are substantially more frequent at CpG sites than at any other CpN site in both FF and FFPE samples (P < 0.0001, one-way ANOVA with Dunnett's multiple comparisons test). No significant difference in C > T transitions at CpG or CpN sites is observed between FF and FFPE samples. E. C > T substitution rates at NpC sites are also comparable between FF and FFPE samples (P > 0.05 in all paired t tests). The C > T substitution rate is plotted on the Y-axis and grouped on the X-axis according to the subsequent (CpN) or antecedent (NpC) base. All statistics were performed using two-tailed, parametric paired t tests unless otherwise noted (GraphPad Prism Ver 6).
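The grouping by sequence context in panel D can be illustrated with a short Python sketch that counts C > T substitutions according to the base that follows the cytosine. This is an assumed, simplified formulation: it considers the forward strand only (G > A on the reverse strand is not handled), defines the rate as C > T events divided by the number of cytosines in that context, and uses a hypothetical function name and toy sequence.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def c_to_t_rate_by_context(reference: str,
                           substitutions: Iterable[Tuple[int, str]]) -> Dict[str, float]:
    """C>T rate per CpN context.

    `substitutions` holds (0-based position, alternate base) pairs.
    The rate for a context is (C>T events there) / (cytosines in that context).
    """
    ref = reference.upper()
    sites: Counter = Counter()
    for i in range(len(ref) - 1):
        if ref[i] == "C" and ref[i + 1] in "ACGT":
            sites["Cp" + ref[i + 1]] += 1
    hits: Counter = Counter()
    for pos, alt in substitutions:
        if not 0 <= pos < len(ref) - 1:
            continue  # no following base, so the context is undefined
        if ref[pos] == "C" and alt.upper() == "T" and ref[pos + 1] in "ACGT":
            hits["Cp" + ref[pos + 1]] += 1
    return {context: hits[context] / count for context, count in sites.items()}


# Toy reference and substitutions (hypothetical, not study data).
reference = "ACGTCGACATCCGT"
observed = [(1, "T"), (4, "T"), (7, "G"), (10, "T")]
rates = c_to_t_rate_by_context(reference, observed)
print(rates)
```

In a real analysis the denominator would come from the covered cytosines in the aligned reads rather than from a single reference string, and the per-sample rates would then feed the paired comparison between FF and FFPE described in the legend.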