Angelo Fortunato1,2,3, Diego Mallo1,2,3, Shawn M Rupp1,2, Lorraine M King4, Timothy Hardman4, Joseph Y Lo5, Allison Hall6, Jeffrey R Marks4, E Shelley Hwang4, Carlo C Maley1,2,3.
Abstract
Most tissue collections of neoplasms are composed of formalin-fixed and paraffin-embedded (FFPE) excised tumor samples used for routine diagnostics. DNA sequencing is becoming increasingly important in cancer research and clinical management; however, it is difficult to accurately sequence DNA from FFPE samples. We developed and validated a new bioinformatic pipeline to use existing variant-calling strategies to robustly identify somatic single nucleotide variants (SNVs) from whole exome sequencing using small amounts of DNA extracted from archival FFPE samples of breast cancers. We optimized this strategy using 28 pairs of technical replicates. After optimization, the mean similarity between replicates increased 5-fold, reaching 88% (range 0-100%), with a mean of 21.4 SNVs (range 1-68) per sample, representing a markedly superior performance to existing tools. We found that the SNV-identification accuracy declined when there was less than 40 ng of DNA available and that insertion-deletion variant calls are less reliable than single base substitutions. As the first application of the new algorithm, we compared samples of ductal carcinoma in situ of the breast to their adjacent invasive ductal carcinoma samples. We observed an increased number of mutations (paired-samples sign test, P < 0.05), and a higher genetic divergence in the invasive samples (paired-samples sign test, P < 0.01). Our method provides a significant improvement in detecting SNVs in FFPE samples over previous approaches.
Keywords: DCIS; NGS; exome; heterogeneity
Year: 2021 PMID: 34117742 PMCID: PMC8574974 DOI: 10.1093/bib/bbab221
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 13.994
Figure 1
Breast cancer anatomy. Schematic representation of mammary gland anatomy and cancer development. The majority of breast tumors develop in the terminal duct lobular unit, 80% starting among ductal cells. Initially, the duct suffers a benign hypertrophic growth of cells that can progress into ductal carcinoma in situ (DCIS). In this phase the neoplasm is confined within the duct’s lumen and it is still clinically benign. Cancer cells can cross the duct wall’s boundaries, invading nearby tissues (IDC) and metastasizing.
Figure 2
Flowchart of the algorithm used to estimate the genetic heterogeneity between two samples and details of its optimization. Inputs: aligned sequences (BAM files) of the two samples (A, in red; and B, in blue) and their healthy tissue control (N, in green), population allele frequency data from the gnomAD database (single nucleotide polymorphisms, SNPs, in purple), and user-specified configuration parameters (gear icon). Outputs: estimate of the genetic heterogeneity between samples A and B and set of variants (level of detail user-specified). All parameters that control this pipeline are detailed in the Parameters box, accompanied by the range of values assayed during optimization between parentheses and the final set of optimized values in bold. The key step of this algorithm is the generation of two sets of private and common variants by comparing the variants in the two samples twice, alternatively filtering one of the sets and using all variants from the other.
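The double comparison described in this caption can be illustrated with plain set operations. The following is only a minimal sketch, not the published pipeline: the function name `private_and_common`, the tuple representation of a variant, and the choice to merge the two comparisons by set union are all assumptions made for illustration.

```python
from typing import Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def private_and_common(
    all_a: Set[Variant],
    filtered_a: Set[Variant],
    all_b: Set[Variant],
    filtered_b: Set[Variant],
) -> Tuple[Set[Variant], Set[Variant]]:
    """Return (common, private) variants from two asymmetric comparisons.

    all_x holds every candidate variant called in sample x;
    filtered_x holds the subset of all_x that passed the filters.
    """
    # Comparison 1: filtered variants of A against all variants of B.
    common_1 = filtered_a & all_b
    private_a = filtered_a - all_b

    # Comparison 2: filtered variants of B against all variants of A.
    common_2 = filtered_b & all_a
    private_b = filtered_b - all_a

    # Assumption: the results of the two comparisons are merged by union.
    return common_1 | common_2, private_a | private_b
```

Under these assumptions, a variant that passes the filters in one sample and is present in the other, even as an unfiltered low-confidence call, is counted as common rather than private.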
Figure 3
Empirical optimization of the variant postprocessing algorithm. Each violin plot summarizes the distribution of optimization scores of 5 308 416 combinations of values of the 13 parameters that control the pipeline for one of the 28 technical replicates (same DNA sample processed twice independently). The optimization score is the two-dimensional Euclidean distance to the theoretical optimum, defined by a similarity between technical replicates of 1 and a proportion of final common variants with a population allele frequency below 0.05 of 1, expressed relative to the maximum possible distance. After parameter optimization the similarity between the technical replicates was on average 88%, range 0–100% (x = score before optimization; —: score after optimization; colors indicate the amount (ng) of DNA used as template).
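As a worked illustration of the score described in this caption, the sketch below computes the distance from a point (similarity, proportion of rare common variants) to the optimum (1, 1) and divides it by the largest possible distance in the unit square, the square root of 2. The function name `optimization_score` and the assumption that both quantities are expressed as fractions between 0 and 1 are illustrative choices, not details taken from the paper.

```python
import math


def optimization_score(similarity: float, rare_fraction: float) -> float:
    """Normalized Euclidean distance to the optimum (1, 1); 0 is best, 1 is worst.

    similarity: similarity between technical replicates, as a fraction in [0, 1].
    rare_fraction: fraction of final common variants whose population allele
        frequency is below 0.05, in [0, 1].
    """
    if not (0.0 <= similarity <= 1.0 and 0.0 <= rare_fraction <= 1.0):
        raise ValueError("both inputs must lie in [0, 1]")
    distance = math.hypot(1.0 - similarity, 1.0 - rare_fraction)
    # The farthest point from (1, 1) in the unit square is (0, 0), at distance sqrt(2).
    return distance / math.sqrt(2.0)
```

With this convention, a parameter combination that yields identical replicates and only rare common variants scores 0.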
Similarity between technical replicates and number of variants. The similarity between technical replicates is on average 88%, range 0–100%. Number of total, common and private SNVs. Common SNVs: SNVs detected in both replicates of the same DNA sample; Private SNVs (column A + B): SNVs detected in only one of the two replicates of the same DNA sample
| Sample | Common | A + B | Total | Similarity (%) |
|---|---|---|---|---|
| DCIS-017 | 0 | 1 | 1 | 0 |
| DCIS-020-B3 | 19 | 8 | 27 | 70.4 |
| DCIS-020-B6 | 57 | 11 | 68 | 83.8 |
| DCIS-028-K12 | 4 | 0 | 4 | 100 |
| DCIS-029-D5 | 20 | 6 | 26 | 76.9 |
| DCIS-029-D8 | 11 | 2 | 13 | 84.6 |
| DCIS-050 | 8 | 1 | 9 | 88.9 |
| DCIS-064 | 28 | 2 | 30 | 93.3 |
| DCIS-080 | 7 | 0 | 7 | 100 |
| DCIS-094-B11 | 45 | 4 | 49 | 91.8 |
| DCIS-094-B7 | 35 | 1 | 36 | 97.2 |
| DCIS-122 | 3 | 0 | 3 | 100 |
| DCIS-135 | 9 | 2 | 11 | 81.8 |
| DCIS-163 | 1 | 0 | 1 | 100 |
| DCIS-164 | 44 | 2 | 46 | 95.7 |
| DCIS-168-C4 | 55 | 2 | 57 | 96.5 |
| DCIS-168-C8 | 41 | 0 | 41 | 100 |
| DCIS-171 | NA | NA | NA | NA |
| DCIS-178 | 8 | 0 | 8 | 100 |
| DCIS-211 | 12 | 0 | 12 | 100 |
| DCIS-213 | NA | NA | NA | NA |
| DCIS-222-B10 | 6 | 0 | 6 | 100 |
| DCIS-222-B6 | 1 | 0 | 1 | 100 |
| DCIS-225-A16 | 9 | 5 | 14 | 64.3 |
| DCIS-225-A6 | NA | NA | NA | NA |
| DCIS-227 | 6 | 0 | 6 | 100 |
| DCIS-250 | NA | NA | NA | NA |
| DCIS-267 | 33 | 5 | 38 | 86.8 |
| Average | 19.3 | 2.2 | 21.4 | 88.0 |
| SD | 18.2 | 2.9 | 19.8 | 21.4 |
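The similarity column in the table above is consistent with the share of common SNVs among all SNVs of a replicate pair. A minimal sketch of that arithmetic follows; the function name `similarity_percent` is hypothetical.

```python
def similarity_percent(common: int, private: int) -> float:
    """Percentage of SNVs shared by both replicates of one DNA sample."""
    total = common + private
    if total == 0:
        raise ValueError("no SNVs called, similarity is undefined")
    return 100.0 * common / total


# Counts taken from the table: (common, private) per sample.
rows = {
    "DCIS-020-B3": (19, 8),
    "DCIS-020-B6": (57, 11),
    "DCIS-225-A16": (9, 5),
}
for sample, (common, private) in rows.items():
    print(sample, round(similarity_percent(common, private), 1))
```

For these three rows the rounded values are 70.4, 83.8 and 64.3, matching the table.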
Validation. Targeted sequencing confirmed 89.6% (optimal filtering pipeline, O) and 86.3% (permissive filtering pipeline, P) of the single nucleotide variants identified using our algorithm. Excluding MUC6 and samples with less than 40 ng of input DNA, we validated 94.7% (O) or 93.2% (P) of variants. We found that 14.2% (O) or 12% (P) of the confirmed variants are also present in the control samples with a frequency >10%. These variants may be SNPs
| Optimal filter (O) | Variants | Common (A and B) | Private (A or B) | Variants in controls (>10%) |
|---|---|---|---|---|
| Total number of SNVs | 154 | 146 (94.8%) | 8 (5.2%) | 33 (21.4%) |
| Validated variants | 138 (89.6%) | 133 (91.1%) | 5 (62.5%) | 32 (97%) |
| Nonvalidated variants | 16 (10.4%) | 13 (8.9%) | 3 (37.5%) | 1 (3%) |
| MUC6-excluded, DNA ≧ 40 ng SNVs | 113 | 110 (97.3%) | 3 (2.7%) | 16 (14.2%) |
| Validated variants | 107 (94.7%) | 105 (95.5%) | 2 (66.7%) | 15 (93.8%) |
| Nonvalidated variants | 6 (5.3%) | 5 (4.5%) | 1 (33.3%) | 1 (6.3%) |
| Permissive filter (P) | Variants | Common (A and B) | Private (A or B) | Variants in controls (>10%) |
| Total number of SNVs | 182 | 170 (93.4%) | 12 (6.6%) | 34 (18.7%) |
| Validated variants | 157 (86.3%) | 152 (89.4%) | 5 (41.7%) | 33 (97.1%) |
| Nonvalidated variants | 25 (13.7%) | 18 (10.6%) | 7 (58.3%) | 1 (2.9%) |
| MUC6-excluded, DNA ≧ 40 ng SNVs | 133 | 130 (97.7%) | 3 (2.3%) | 16 (12%) |
| Validated variants | 124 (93.2%) | 122 (93.8%) | 2 (66.7%) | 15 (93.8%) |
| Nonvalidated variants | 9 (6.8%) | 8 (6.2%) | 1 (33.3%) | 1 (6.3%) |
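The percentages in the validation table can be recomputed from the raw counts, which is a quick way to check that each validated and nonvalidated pair adds up to its column total. The helper below is a small sketch with a hypothetical name, `validation_rates`.

```python
from typing import Tuple


def validation_rates(validated: int, nonvalidated: int) -> Tuple[float, float]:
    """Return (validated %, nonvalidated %) rounded to one decimal place."""
    total = validated + nonvalidated
    if total == 0:
        raise ValueError("no variants to validate")
    return (
        round(100.0 * validated / total, 1),
        round(100.0 * nonvalidated / total, 1),
    )


# Overall counts from the table: optimal filter, then permissive filter.
print(validation_rates(138, 16))
print(validation_rates(157, 25))
```

The two calls give (89.6, 10.4) and (86.3, 13.7), in agreement with the Variants column for the optimal and permissive filters.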
Figure 4
Mutational burden and genetic divergence. The average number of mutations in synchronous DCIS samples (10.40 ± 15.31 SD) is lower than in the IDC samples (18.05 ± 31.48 SD), and the difference between the two groups is statistically significant (paired-samples sign test, P < 0.05). We found a statistically significant difference in genetic divergence between two regions of synchronous DCIS (21.48% ± 17.54 SD) and between synchronous DCIS and IDC samples (44.51% ± 29.04 SD) within the same patient (paired-samples sign test and Mann–Whitney U test, P < 0.01). White circle = median; box limits indicate the 25th and 75th percentiles; whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles; curves represent density and extend to extreme values. Data points are plotted as dots.
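For readers unfamiliar with the paired-samples sign test named in this caption, the sketch below shows the standard two-sided exact version: ties are dropped, and the number of positive differences is compared with a binomial distribution with probability 0.5. The function name `sign_test_p_value` is hypothetical, and the numbers in the usage example are made up for illustration; they are not the study data.

```python
from math import comb
from typing import Sequence


def sign_test_p_value(first: Sequence[float], second: Sequence[float]) -> float:
    """Two-sided exact sign test for paired observations."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    # Keep only the non-tied pairs; the test uses the sign of each difference.
    diffs = [b - a for a, b in zip(first, second) if b != a]
    n = len(diffs)
    if n == 0:
        return 1.0
    positives = sum(1 for d in diffs if d > 0)
    k = min(positives, n - positives)
    # Probability of k or fewer successes under Binomial(n, 0.5), doubled.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Made-up mutation counts for eight hypothetical patients (not study data).
dcis_counts = [1, 2, 3, 4, 5, 6, 7, 8]
idc_counts = [2, 3, 4, 5, 6, 7, 8, 9]
print(sign_test_p_value(dcis_counts, idc_counts))
```

Because the test only uses the direction of each within-patient difference, it makes no assumption about the shape of the distribution of mutation counts, which suits data with standard deviations as large as those reported in the caption.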
Patient clinical data. Clinical data of the 22 patients included in the study. The histopathological analysis showed that 11 patients have DCIS, whereas six have DCIS adjacent to invasive disease (DCIS Adj. to IDC) and five have invasive features (IDC). We selected FFPE samples of different ages (1999–2017). ER: estrogen receptor; PR: progesterone receptor; HER2: human epidermal growth factor receptor 2. Expression is qualitatively estimated (nonpresent (NP), 0–8) using histochemistry stains
| Patient ID | Age | Race | Date | Tumor type | Histopathological classification | ER | PR | HER2 | DCIS Size (mm) | DCIS nuclear grade | Invasive present |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DCIS-017 | 66 | B | 2013 | Pure DCIS | Cribriform, solid | − | + | NA | 21 | 3 | No |
| DCIS-020-B3 | 67 | W | 2014 | Pure DCIS | Cribriform, solid, micropapillary, comedo | + | + | NA | 40 | 2 | No |
| DCIS-020-B6 | 67 | W | 2014 | Pure DCIS | Cribriform, solid, micropapillary, comedo | + | + | NA | 40 | 2 | No |
| DCIS-029-D5 | 34 | W | 2012 | Pure DCIS | Comedo | + | + | NA | 83 | 3 | No |
| DCIS-029-D8 | 34 | W | 2012 | Pure DCIS | Comedo | + | + | NA | 83 | 3 | No |
| DCIS-050 | 52 | W | 2010 | Synchronous DCIS | Cribriform, solid | + | + | − | 10 | 2 | Yes |
| DCIS-064 | 50 | OTHER | 2015 | Synchronous DCIS | Comedo | + | + | + | 75 | 3 | No |
| DCIS-080 | 49 | W | 2013 | Synchronous DCIS | Solid, comedo | + | + | − | 21 | 3 | Yes |
| DCIS-094-B11 | 68 | W | 2013 | IDC | Cribriform, solid, micropapillary | − | − | − | NA | 3 | Yes |
| DCIS-094-B7 | 68 | W | 2013 | IDC | Cribriform | − | − | − | NA | 3 | Yes |
| DCIS-122 | 47 | W | 2002 | Pure DCIS | Cribriform, solid, comedo | NA | NA | NA | 95 | 3 | No |
| DCIS-135 | 48 | B | 2013 | Pure DCIS | Cribriform, solid | + | + | NA | 13 | 2 | No |
| DCIS-163 | 53 | W | 2013 | Synchronous DCIS | Cribriform, solid, comedo | + | + | − | 54 | 3 | Yes |
| DCIS-164 | 65 | B | 2015 | IDC | Micropapillary, comedo | + | + | − | NA | 3 | Yes |
| DCIS-168-C4 | 63 | W | 2016 | IDC | Cribriform, solid | + | + | − | NA | 2 | Yes |
| DCIS-168-C8 | 63 | W | 2016 | IDC | Cribriform, solid | + | + | − | NA | 2 | Yes |
| DCIS-171 | 66 | B | 2000 | Synchronous DCIS | Solid | − | + | − | 15 | 3 | Yes |
| DCIS-178 | 56 | W | 2011 | Synchronous DCIS | Comedo, solid, micropapillary, papillary | − | − | − | NA | 3 | Yes |
| DCIS-211 | 43 | H | 2011 | Pure DCIS | Cribriform, solid, comedo | + | + | NA | 24 | 3 | No |
| DCIS-213 | 68 | W | 2009 | Pure DCIS | Cribriform, micropapillary, comedo | + | + | NA | 16 | 3 | No |
| DCIS-222-B10 | 41 | A | 2013 | Pure DCIS | Cribriform, papillary | + | + | NA | 40 | 2 | No |
| DCIS-222-B6 | 41 | A | 2013 | Pure DCIS | Cribriform, papillary | + | + | NA | 40 | 2 | No |
| DCIS-225-A16 | 62 | B | 2011 | Pure DCIS | Cribriform, solid | + | + | NA | 30 | 2 | No |
| DCIS-225-A6 | 62 | B | 2011 | Pure DCIS | Cribriform, solid | + | + | NA | 30 | 3 | No |
| DCIS-227 | 75 | B | 2012 | Pure DCIS | Cribriform, solid, comedo | + | + | NA | 63 | 3 | No |
| DCIS-250 | 56 | W | 1999 | IDC | Cribriform, comedo | + | − | NA | NA | 3 | Yes |
| DCIS-267 | 66 | W | 2017 | IDC | Solid | + | + | − | 13 | 3 | Yes |
| DCIS-028-K12 | 42 | A | 2014 | Pure DCIS | Comedo, micropapillary | − | − | NA | 124 | 3 | No |