Ensel Oh, Yoon-La Choi, Mi Jeong Kwon, Ryong Nam Kim, Yu Jin Kim, Ji-Young Song, Kyung Soo Jung, Young Kee Shin.
Abstract
Formalin fixation with paraffin embedding (FFPE) has been a standard sample preparation method for decades, and archival FFPE samples remain very useful resources. Nonetheless, using FFPE samples for cancer genome analysis by next-generation sequencing, a powerful technique for identifying genomic alterations at the nucleotide level, has been challenging because of poor DNA quality and artificial sequence alterations. In this study, we performed whole-exome sequencing of matched frozen and FFPE tissue samples from 4 cancer patients and compared the next-generation sequencing data obtained from these samples. The major differences between the data from the 2 sample types were the shorter insert size and the artificial base alterations in the FFPE samples. A high proportion of short inserts in the FFPE samples resulted in overlapping paired reads, which could lead to overestimation of certain variants; >20% of the inserts in the FFPE samples were double sequenced. A large number of soft-clipped reads was found in the sequencing data of the FFPE samples, and about 30% of total bases were soft clipped. The artificial base alterations, C>T and G>A, were observed in FFPE samples only, and the alteration rate ranged from 200 to 1,200 per 1 million bases after sequencing errors were removed. Although high-confidence mutation calls in the FFPE samples were comparable to those in the frozen samples, caution should be exercised with respect to the artifacts, especially for low-confidence calls. Despite the clearly observed artifacts, archival FFPE samples can be a good resource for discovery or validation of biomarkers in cancer research based on whole-exome sequencing.
Year: 2015 PMID: 26641479 PMCID: PMC4671711 DOI: 10.1371/journal.pone.0144162
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Whole-exome sequencing data statistics.
FFPE: formalin-fixed paraffin-embedded samples.
| Pair | Sample type | Unique reads | Properly mapped reads | Discordantly mapped reads | Unmapped reads | Covered target region | On-target average depth | 100-bp flanking region depth | % of off-target bases |
|---|---|---|---|---|---|---|---|---|---|
| Pair 1 | FFPE | 106,062,549 (100%) | 99,518,968 (93.8%) | 3,474,934 (3.3%) | 2,792,874 (2.6%) | 96% | 110× | 59× | 32% |
| Pair 1 | Frozen | 157,123,411 (100%) | 154,986,824 (98.6%) | 581,700 (0.4%) | 1,467,405 (0.9%) | 97% | 105× | 60× | 63% |
| Pair 2 | FFPE | 127,734,224 (100%) | 111,576,762 (87.4%) | 7,202,130 (5.6%) | 8,946,783 (7.0%) | 98% | 71× | 22× | 40% |
| Pair 2 | Frozen | 184,279,822 (100%) | 181,875,732 (98.7%) | 1,891,524 (1.0%) | 498,601 (0.3%) | 99% | 152× | 68× | 48% |
| Pair 3 | FFPE | 115,195,059 (100%) | 104,707,362 (90.9%) | 4,864,588 (4.2%) | 5,618,292 (4.9%) | 98% | 62× | 18× | 42% |
| Pair 3 | Frozen | 156,655,931 (100%) | 154,185,112 (98.4%) | 1,940,720 (1.2%) | 518,664 (0.3%) | 99% | 125× | 55× | 49% |
| Pair 4 | FFPE-1 | 98,896,552 (100%) | 87,624,296 (88.6%) | 4,786,078 (4.8%) | 5,785,040 (5.8%) | 97% | 77× | 41× | 41% |
| Pair 4 | FFPE-2 | 33,520,501 (100%) | 12,795,876 (38.2%) | 3,404,488 (10.2%) | 16,622,576 (49.6%) | 91% | 9× | 5× | 41% |
| Pair 4 | Frozen | 89,170,706 (100%) | 88,800,572 (99.6%) | 143,020 (0.2%) | 193,361 (0.2%) | 97% | 97× | 60× | 41% |
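The percentages in parentheses are each read category divided by the unique read count. As a minimal sketch (the function name `mapping_percentages` is an illustrative assumption, not part of the study's pipeline), the Pair 4 FFPE-2 row can be recomputed from its raw counts:

```python
def mapping_percentages(unique, proper, discordant, unmapped):
    """Express each mapping category as a percentage of unique reads (1 decimal)."""
    if unique <= 0:
        raise ValueError("unique read count must be positive")
    return {
        "proper": round(100 * proper / unique, 1),
        "discordant": round(100 * discordant / unique, 1),
        "unmapped": round(100 * unmapped / unique, 1),
    }


# Read counts copied from the Pair 4 FFPE-2 row of the table.
ffpe2 = mapping_percentages(33_520_501, 12_795_876, 3_404_488, 16_622_576)
print(ffpe2)
```

The result matches the 38.2%, 10.2%, and 49.6% reported for that sample, which is the row where the drop in properly mapped reads is most pronounced.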
Fig 1. Distribution of insert sizes and frequencies of double-sequenced regions.
The distribution of insert sizes was calculated from properly mapped paired reads. The distributions of formalin-fixed paraffin-embedded (FFPE) samples were skewed to the left because of a large number of short inserts (A). The short inserts generated abundant overlapping paired-end reads (B) and soft-clipped bases (C) in FFPE samples.
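The link between short inserts and overlapping mates can be illustrated with simple interval arithmetic. The sketch below is not the authors' method: the read length of 100 bp and the insert sizes are hypothetical values chosen for illustration, and the function names are assumptions.

```python
def overlap_bases(insert_size, read_length):
    """Number of insert bases covered by both mates of a read pair."""
    # Forward mate covers [0, forward_end) of the insert.
    forward_end = min(read_length, insert_size)
    # Reverse mate covers [reverse_start, insert_size).
    reverse_start = max(0, insert_size - read_length)
    return max(0, forward_end - reverse_start)


def fraction_overlapping(insert_sizes, read_length):
    """Fraction of inserts in which the two mates overlap by at least 1 base."""
    if not insert_sizes:
        return 0.0
    overlapping = sum(1 for s in insert_sizes if overlap_bases(s, read_length) > 0)
    return overlapping / len(insert_sizes)


READ_LENGTH = 100                      # hypothetical read length, not from the study
example_inserts = [80, 150, 250, 300]  # hypothetical insert sizes in bp

for size in example_inserts:
    print(size, overlap_bases(size, READ_LENGTH))
print(fraction_overlapping(example_inserts, READ_LENGTH))
```

Under these assumptions, any insert shorter than twice the read length is partly sequenced twice. When the insert is shorter than a single read, the read extends past the insert, and those extra bases may end up soft clipped by the aligner.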
Fig 2. Distribution of nucleotide quality according to mapping status.
The nucleotide quality scores were analyzed according to mapping status. In the mapped reads, there was no significant difference between the distributions of the 2 types of sample. In the unmapped reads, the blood and frozen samples showed a higher percentage of low-quality bases (≤ 20 on the Phred scale, black arrow) than the FFPE samples.
Fig 3. Frequency of base transition in formalin-fixed paraffin-embedded (FFPE) samples.
A: A strategy for estimating the rates of sequencing errors/background DNA damage and the overall base alteration rates. Discrepant bases at homozygous sites in control samples are likely to be sequencing errors or background DNA damage. Conversely, discrepant bases at homozygous sites in frozen or FFPE tissue samples could be either sequencing errors/background DNA damage or base alterations caused by the preservation method. B: The rate of base alterations caused by formalin fixation. High frequencies of C>T and G>A alterations were observed in FFPE tissue samples only.
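One way to read the strategy in panel A is as a background subtraction: the discrepancy rate at homozygous sites in the control estimates sequencing errors and background damage, and the excess in the tissue sample is attributed to preservation. The sketch below assumes that simple subtraction, which the caption does not spell out; the function names and all counts are hypothetical.

```python
def rate_per_million(discrepant_bases, total_bases):
    """Discrepant bases per 1 million sequenced bases at homozygous sites."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    return 1_000_000 * discrepant_bases / total_bases


def preservation_alteration_rate(sample_discrepant, sample_bases,
                                 control_discrepant, control_bases):
    """Sample rate minus control (error/background) rate, floored at zero."""
    sample_rate = rate_per_million(sample_discrepant, sample_bases)
    control_rate = rate_per_million(control_discrepant, control_bases)
    return max(0.0, sample_rate - control_rate)


# Hypothetical counts of C>T discrepancies at homozygous C sites.
ffpe_rate = preservation_alteration_rate(
    sample_discrepant=650, sample_bases=2_000_000,    # FFPE tissue (made up)
    control_discrepant=50, control_bases=2_000_000,   # control sample (made up)
)
print(ffpe_rate)
```

Computing the rate separately for each substitution type (C>T, G>A, and the others) would show whether the excess is concentrated in the two alterations highlighted in panel B.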
Fig 4. Comparison of somatic single nucleotide variant (SNV) calls.
Overall concordance of somatic SNV calls between paired FFPE and frozen samples. A: The overlapping fraction of somatic mutation calls. B: The overlapping fraction of somatic mutation calls in FFPE samples when the matched frozen samples have at least one supporting read at the mutation site.
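The two concordance measures can be sketched with set operations on variant keys. This is an illustration under assumptions, not the authors' pipeline: the denominator is taken to be the FFPE call set, panel B is read as counting an FFPE call as concordant when the frozen sample has at least one read supporting the alternate allele, and all variants and read counts are hypothetical.

```python
def overlap_fraction(ffpe_calls, frozen_calls):
    """Fraction of FFPE somatic SNV calls that were also called in the frozen sample."""
    if not ffpe_calls:
        return 0.0
    return len(ffpe_calls & frozen_calls) / len(ffpe_calls)


def supported_fraction(ffpe_calls, frozen_alt_reads, min_reads=1):
    """Fraction of FFPE calls with at least `min_reads` supporting reads in frozen."""
    if not ffpe_calls:
        return 0.0
    supported = sum(1 for v in ffpe_calls if frozen_alt_reads.get(v, 0) >= min_reads)
    return supported / len(ffpe_calls)


# Hypothetical variants keyed as (chromosome, position, ref, alt).
ffpe = {("chr1", 100, "C", "T"), ("chr2", 200, "G", "A"),
        ("chr3", 300, "A", "G"), ("chr4", 400, "C", "T")}
frozen = {("chr1", 100, "C", "T"), ("chr3", 300, "A", "G"),
          ("chr5", 500, "T", "C")}
# Hypothetical alternate-allele read counts in the frozen sample at FFPE call sites.
frozen_alt_reads = {("chr1", 100, "C", "T"): 12, ("chr2", 200, "G", "A"): 1,
                    ("chr3", 300, "A", "G"): 8, ("chr4", 400, "C", "T"): 0}

print(overlap_fraction(ffpe, frozen))
print(supported_fraction(ffpe, frozen_alt_reads))
```

In this toy data the relaxed criterion recovers one additional FFPE call that the frozen sample supports with a single read but that was not called there, while the call with no supporting read remains a candidate artifact.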