Alexis Christoforides, John D Carpten, Glen J Weiss, Michael J Demeure, Daniel D Von Hoff, David W Craig.
Abstract
BACKGROUND: The field of cancer genomics has rapidly adopted next-generation sequencing (NGS) to study and characterize malignant tumors with unprecedented resolution. In cancer studies in particular, the goal is often to identify somatic mutations, that is, changes specific to a tumor and absent from the individual's germline. However, false positive and false negative detections often result from insufficient variant evidence, contamination of the biopsy by stromal tissue, sequencing errors, and the erroneous classification of germline variation as tumor-specific.
Year: 2013 PMID: 23642077 PMCID: PMC3751438 DOI: 10.1186/1471-2164-14-302
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Biological features currently supported by the Seurat software, and their respective input files
| 1. Normal DNA BAM; 2. Tumor DNA BAM | Normal RNA BAM; Tumor RNA BAM |
| 1. Normal DNA BAM; 2. Tumor DNA BAM | Normal RNA BAM; Tumor RNA BAM |
| 1. Normal DNA BAM; 2. Tumor DNA BAM | Normal RNA BAM; Tumor RNA BAM |
| 1. RNA BAM; 2. DNA BAM | -- |
| 1. Normal DNA BAM; 2. Tumor DNA BAM | -- |
Figure 1. Performance of Seurat’s somatic point mutation detection with varying genomic coverage. Legend: The sensitivity (A) and false discovery rate (B) for Seurat’s somatic point mutation detection method, evaluated on simulated cancer genome data with no simulated normal tissue contamination. Each series represents the coverage used for the ‘normal’ genome data set, and the x-axis represents the ‘tumor’ genome average coverage.
Figure 2. Performance of somatic point mutation detection with varying tumor purity. Legend: The sensitivity (A) and false discovery rate (B) for Seurat, VarScan 2, Strelka and SomaticSniper, given tumor DNA data of varying simulated tumor purity. Seurat reaches 90% sensitivity at ~45% tumor purity in sequence data with an average genomic coverage of 128×.
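The two metrics plotted in Figures 1 and 2 can be computed by comparing a set of called sites with the set of simulated true somatic sites. The following is a minimal sketch of those definitions only; the function name `sensitivity_and_fdr`, the `(chromosome, position)` site representation, and the toy data are assumptions for illustration and are not taken from Seurat or from the paper's evaluation.

```python
def sensitivity_and_fdr(called, truth):
    """Return (sensitivity, false discovery rate) for a set of calls.

    called, truth: iterables of hashable site identifiers, e.g. (chrom, pos).
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # true somatic sites that were called
    fp = len(called - truth)   # calls that are not true somatic sites
    fn = len(truth - called)   # true somatic sites that were missed
    sensitivity = tp / (tp + fn) if truth else float("nan")
    fdr = fp / (tp + fp) if called else float("nan")
    return sensitivity, fdr


# Toy data (hypothetical sites, not from the paper).
truth = {("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr3", 75)}
called = {("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr4", 10)}
sens, fdr = sensitivity_and_fdr(called, truth)
print(f"sensitivity={sens:.2f} FDR={fdr:.2f}")
```

Sensitivity is the fraction of true sites recovered, while the false discovery rate is the fraction of calls that are wrong, so the two can move independently as coverage or purity changes.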
Summary of analysis results from the application of Seurat on an experimentally derived cancer dataset
| Average genomic coverage on normal tissue genome | 55× |
| Average genomic coverage on tumor tissue genome | 40× |
| Somatic base substitutions | 29526 |
| Somatic base substitutions (Quality > 20) | 17044 |
| Transition/Transversion ratio for somatic base substitutions | 1.433 |
| Transition/Transversion ratio for somatic base substitutions (Quality > 20) | 1.922 |
| dbSNP build 135 rate | 0.146 |
| dbSNP build 135 rate (Quality > 20) | 0.088 |
| Somatic insertions | 1430 |
| Somatic deletions | 4067 |
| Somatic structural variance sites | 272 |
| Somatic loss of heterozygosity sites | 1523 |
| Non-synonymous/Synonymous mutation ratio | 0.00435 |
Detailed Legend: Summary of somatic mutation analysis details from the application of Seurat on a normal/tumor genome pair of a patient with a rare lymphoma. The dbSNP rate refers to the proportion of candidate somatic variants that are included in the public genomic variation database dbSNP [20]. This number is an indicator of known germline variants that were falsely identified as tumor-specific. The calculation of the transition/transversion and the non-synonymous/synonymous variant ratios was performed using snpEff [21].
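To make the two quality indicators in this legend concrete, the sketch below computes a transition/transversion ratio and a known-variant rate from a list of candidate substitutions. The paper used snpEff for these ratios; this is not snpEff, only an illustration of the definitions. The names `is_transition` and `summarize`, the tuple layout, and the toy calls are assumptions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= (PURINES | PYRIMIDINES):
        raise ValueError(f"not a base substitution: {ref}>{alt}")
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def summarize(substitutions, known_sites):
    """Return (Ts/Tv ratio, fraction of calls at known variant sites).

    substitutions: list of (chrom, pos, ref, alt) tuples.
    known_sites: set of (chrom, pos) present in a database such as dbSNP.
    """
    ts = sum(1 for _, _, ref, alt in substitutions if is_transition(ref, alt))
    tv = len(substitutions) - ts
    in_db = sum(1 for chrom, pos, _, _ in substitutions if (chrom, pos) in known_sites)
    ts_tv = ts / tv if tv else float("inf")
    return ts_tv, in_db / len(substitutions)


# Toy calls (hypothetical, not from the paper's dataset).
calls = [
    ("chr1", 10, "A", "G"),  # transition
    ("chr1", 20, "C", "T"),  # transition
    ("chr2", 5, "A", "C"),   # transversion
    ("chr2", 9, "G", "T"),   # transversion
]
known = {("chr1", 20)}
ts_tv, db_rate = summarize(calls, known)
print(f"Ts/Tv={ts_tv:.3f} known-variant rate={db_rate:.3f}")
```

In the table above, the known-variant rate drops from 0.146 to 0.088 after the Quality > 20 filter, which is the direction one would expect if the filter removes misclassified germline variants.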
Figure 3. The effect of increased sequencing of the normal genome on Seurat’s somatic mutation detection. Legend: Demonstration of the effect of increased sequencing of the normal genome in a matched normal/tumor analysis using Seurat. We present three common scenarios: A) a locus with a true somatic variant whose variant allele appears at low frequency, because of mapping difficulty, low purity of the tumor biopsy, or because the variant is present only in a minor sub-clonal population. B) a locus with a potential false-positive call, because of erroneously aligned variant evidence. C) a locus with a variant genotype in the normal genome, but with a coincidental lack of evidence causing it to appear as a tumor-only variant. In all three scenarios, the increase in sequencing data available for the normal genome updates the expectation of variant evidence (by altering the shape of the conjugate beta distribution) and consequently improves Seurat’s ability to correctly reject the last two cases and accept the first case.
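The role of the conjugate beta distribution in this legend can be illustrated with the standard beta-binomial update: a Beta(α, β) prior on the variant allele proportion, combined with k variant reads out of n, gives a Beta(α + k, β + n − k) posterior. The sketch below is a generic illustration of that update, not Seurat's code; the function names, the Beta(1, 1) starting prior, and the coverage values are assumptions.

```python
def beta_update(alpha, beta, variant_reads, total_reads):
    """Conjugate update: Beta(alpha, beta) prior + binomial read counts."""
    if not 0 <= variant_reads <= total_reads:
        raise ValueError("variant_reads must lie between 0 and total_reads")
    return alpha + variant_reads, beta + (total_reads - variant_reads)


def beta_mean_sd(alpha, beta):
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1))
    return mean, variance ** 0.5


# Hypothetical normal-genome pileups with no variant reads at the locus.
posterior_means = []
for coverage in (10, 40, 160):
    a, b = beta_update(1, 1, variant_reads=0, total_reads=coverage)
    mean, sd = beta_mean_sd(a, b)
    posterior_means.append(mean)
    print(f"normal coverage {coverage:>3}x: Beta({a}, {b}) mean={mean:.4f} sd={sd:.4f}")
```

As normal coverage grows without any variant reads, the posterior concentrates near zero, so even a low-frequency variant in the tumor (scenario A) stands out more clearly against the normal genome.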
Description of priors used in Seurat
| πι | Genotype prior probabilities | πvar = 0.0005 |
| | | πhet = 0.001 |
| | | πref = 1 - (πhet + πvar) = 0.9985 |
| | | πsomatic = 0.0001 |
| | | πLOH = 0.0001 |
| αι, βι | Alpha and beta hyperparameters for the beta distributions of variant allele proportions | αref = 1, βref = 700 |
| | | αvar = 700, βvar = 1 |
| | | αnonhom = 1, βnonhom = 1 |
| | | αsomatic = 1, βsomatic = 1 |
| | | αAI = 1, βAI = 1 |
Detailed Legend: A list of the priors and hyperparameters used by Seurat, and their assigned values. The priors used for the genotype in the normal genome are the SNP frequencies for human diploid chromosomes, as calculated by Li et al. [24]. πsomatic and πLOH are high-end estimates of the frequency of somatic events, given that the mutation profile of each individual cancer can vary widely even within subtypes. At 0.0001 per site, each prior corresponds to roughly 300,000 expected events across the approximately 3 billion bases of the human genome.
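The sketch below shows one way the listed priors and hyperparameters could be combined: each genotype gets a beta-binomial marginal likelihood for the observed read counts, weighted by its prior probability. This is a simplified illustration under stated assumptions, not Seurat's actual model. In particular, pairing the `nonhom` hyperparameters with the heterozygous genotype, the approximate genome size of 3 billion bases, and all function names are assumptions.

```python
from math import exp, lgamma, log

GENOME_SIZE = 3_000_000_000  # assumed approximate human genome length in bases

# Values from the priors table; the "het" <- "nonhom" pairing is an assumption.
PRIORS = {"ref": 0.9985, "het": 0.001, "var": 0.0005}
HYPER = {"ref": (1, 700), "het": (1, 1), "var": (700, 1)}


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_beta_binomial(k, n, a, b):
    """Log probability of k variant reads out of n under Beta(a, b)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def genotype_posterior(k, n):
    """Posterior over {ref, het, var} given k variant reads out of n."""
    log_joint = {
        g: log(PRIORS[g]) + log_beta_binomial(k, n, *HYPER[g]) for g in PRIORS
    }
    top = max(log_joint.values())  # subtract the max for numerical stability
    weights = {g: exp(v - top) for g, v in log_joint.items()}
    norm = sum(weights.values())
    return {g: w / norm for g, w in weights.items()}


# Expected number of events implied by a per-site prior of 0.0001.
expected_events = round(0.0001 * GENOME_SIZE)
print("expected events at prior 0.0001:", expected_events)

# Hypothetical read counts at 30x coverage.
for k in (0, 15, 30):
    post = genotype_posterior(k, 30)
    best = max(post, key=post.get)
    print(f"{k:>2}/30 variant reads -> most probable genotype: {best}")
```

The strongly skewed Beta(1, 700) and Beta(700, 1) distributions tolerate only a small fraction of discordant reads for homozygous genotypes, while the flat Beta(1, 1) distribution lets intermediate allele proportions be explained by a non-homozygous genotype despite its small prior.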