Rebecca F Halperin1, Winnie S Liang2, Sidharth Kulkarni1, Erica E Tassone2, Jonathan Adkins2, Daniel Enriquez2, Nhan L Tran3, Nicole C Hank4, James Newell5, Chinnappa Kodira6,7, Ronald Korn4,5, Michael E Berens8, Seungchan Kim9, Sara A Byron2.
Abstract
Archival tumor samples represent a rich resource of annotated specimens for translational genomics research. However, standard variant calling approaches require a matched normal sample from the same individual, which is often not available in the retrospective setting, making it difficult to distinguish between true somatic variants and individual-specific germline variants. Archival sections often contain adjacent normal tissue, but this tissue can include infiltrating tumor cells. As existing comparative somatic variant callers are designed to exclude variants present in the normal sample, a novel approach is required to leverage adjacent normal tissue with infiltrating tumor cells for somatic variant calling. Here we present lumosVar 2.0, a software package designed to jointly analyze multiple samples from the same patient, built upon our previous single-sample, tumor-only variant caller lumosVar 1.0. The approach assumes that the allelic fractions of somatic variants and germline variants follow different patterns as tumor content and copy number state change. lumosVar 2.0 estimates allele-specific copy number and tumor sample fractions from the data, and uses a model to determine expected allelic fractions for somatic and germline variants and to classify variants accordingly. To evaluate the utility of lumosVar 2.0 to jointly call somatic variants with tumor and adjacent normal samples, we used a glioblastoma dataset with matched high and low tumor content and germline whole exome sequencing data (for true somatic variants) available for each patient. Both sensitivity and positive predictive value were improved when analyzing the high tumor and low tumor samples jointly compared to analyzing the samples individually or in-silico pooling of the two samples. Finally, we applied this approach to a set of breast and prostate archival tumor samples for which tumor blocks containing adjacent normal tissue were available for sequencing.
Joint analysis using lumosVar 2.0 detected several variants, including known cancer hotspot mutations, that were not detected by standard somatic variant calling tools using the adjacent tissue as presumed normal reference. Together, these results demonstrate the utility of leveraging paired tissue samples to improve somatic variant calling when a constitutional sample is not available.
Keywords: cancer genomics; cancer hotspot mutations; next generation sequencing; somatic variant calling; tumor exome sequencing; tumor-only sequencing
Year: 2019 PMID: 30949446 PMCID: PMC6435595 DOI: 10.3389/fonc.2019.00119
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 6.244
Figure 1. Somatic and germline variant allelic fractions example. (A) Two chromosomes are illustrated for this example. Both chromosomes are present in the diploid state in the normal cell. In the tumor cell, one chromosome is in the diploid state, and the other shows one-copy gain. Blue circles represent somatic variants on the diploid chromosome; green and red circles represent somatic variants on the minor and major alleles of the gained chromosome, respectively. Simulated allelic fractions of germline variants (brown/tan) and somatic variants are plotted for a simulated 20% tumor (D), 50% tumor (E) and 80% tumor (F) by chromosome position. In the 50% tumor example, somatic variants could easily be distinguished from germline on the diploid chromosome, but on the copy number gain chromosome, the allelic fractions of the somatic variants on the major allele overlap with the germline variants. By using both the 20% and 80% tumor samples, the somatic variants can be separated from the germline variants by allelic fraction on both the diploid chromosome (B) and the copy number gain chromosome (C).
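The overlap described in the Figure 1 caption can be reproduced with a small calculation. The sketch below is not the lumosVar 2.0 model itself. It assumes the standard mixture formula in which normal cells are diploid, a germline heterozygous variant sits on one copy in every normal cell, and a somatic variant is absent from normal cells. The function name and arguments are hypothetical.

```python
def expected_allele_fractions(tumor_fraction, total_cn, variant_cn):
    """Return (somatic, germline_het) expected allelic fractions.

    tumor_fraction: fraction of cells in the sample that are tumor (0..1)
    total_cn:       total copy number of the segment in tumor cells
    variant_cn:     number of tumor copies carrying the variant allele

    Assumption (not taken from the source): normal cells are diploid and
    carry a germline heterozygous variant on exactly one copy.
    """
    f = tumor_fraction
    total_alleles = f * total_cn + (1 - f) * 2
    somatic = f * variant_cn / total_alleles
    germline_het = (f * variant_cn + (1 - f) * 1) / total_alleles
    return somatic, germline_het


if __name__ == "__main__":
    states = {
        "diploid": (2, 1),
        "gain, minor allele": (3, 1),
        "gain, major allele": (3, 2),
    }
    for f in (0.2, 0.5, 0.8):
        for name, (n, m) in states.items():
            som, germ = expected_allele_fractions(f, n, m)
            print(f"tumor={f:.0%} {name:<20} somatic={som:.3f} germline={germ:.3f}")
```

Under these assumptions, at 50% tumor a somatic variant on the major allele of the gained chromosome and a germline variant on its minor allele both have an expected fraction of 0.4, while at 20% and 80% tumor the two values differ.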
Patient characteristics.
| GBM-003 | GBM | NEB | 183 | 387 | 177 |
| GBM-005 | GBM | NEB | 482 | 404 | 144 |
| GBM-006 | GBM | NEB | 441 | 248 | 115 |
| GBM-008 | GBM | NEB | 203 | 424 | 186 |
| GBM-009 | GBM | NEB | 199 | 387 | 153 |
| GBM-014 | GBM | NEB | 223 | 366 | 114 |
| GBM-016 | GBM | NEB | 416 | 376 | 261 |
| BHH01 | Breast | ANWS | 255 | 268 | NA |
| BHH02 | Breast | ANMD | 296 | 350 | NA |
| BHH03 | Breast | ANMD | 228 | 302 | NA |
| BHH04 | Breast | ANMD | 269 | 296 | NA |
| BHH06 | Breast | ANMD | 249 | 336 | NA |
| BHH09 | Breast | ANMD | 312 | 290 | NA |
| BHH11 | Breast | ANMD | 249 | 282 | NA |
| BHH15 | Breast | ANWS | 211 | 323 | NA |
| BHH16 | Breast | ANWS | 294 | 289 | NA |
| BHH21 | Breast | ANMD | 286 | 331 | NA |
| BHH22 | Breast | ANWS | 226 | 301 | NA |
| BHH24 | Breast | ANWS | 229 | 297 | NA |
| BHH25 | Breast | ANWS | 278 | 275 | NA |
| BHH26 | Breast | ANMD | 301 | 291 | NA |
| BHH27 | Breast | ANWS | 236 | 328 | NA |
| BHH28 | Breast | ANWS | 275 | 264 | NA |
| BHH08 | Breast | ANWS | 286 | 288 | NA |
| BHH18 | Breast | ANWS | 287 | 261 | NA |
| BHH20 | Breast | ANWS | 185 | 326 | NA |
| BHH23 | Breast | ANWS | 273 | 243 | NA |
| HHP01 | Prostate | ANWS | 202 | 216 | NA |
| HHP02 | Prostate | ANWS | 206 | 221 | NA |
| HHP03 | Prostate | ANWS | 261 | 184 | NA |
| HHP04 | Prostate | ANWS | 312 | 241 | NA |
| HHP05 | Prostate | ANWS | 247 | 232 | NA |
| HHP06 | Prostate | ANWS | 294 | 214 | NA |
| HHP07 | Prostate | ANWS | 265 | 254 | NA |
| HHP08 | Prostate | ANWS | 287 | 247 | NA |
| HHP09 | Prostate | ANWS | 302 | 228 | NA |
| HHP10 | Prostate | ANWS | 328 | 277 | NA |
| HHP11 | Prostate | ANWS | 329 | 274 | NA |
| HHP12 | Prostate | ANWS | 238 | 239 | NA |
| HHP13 | Prostate | ANWS | 299 | 285 | NA |
| HHP14 | Prostate | ANWS | 269 | 269 | NA |
| HHP16 | Prostate | ANWS | 363 | 330 | NA |
| HHP17 | Prostate | ANWS | 255 | 316 | NA |
| HHP18 | Prostate | ANWS | 224 | 258 | NA |
| HHP19 | Prostate | ANWS | 241 | 281 | NA |
| HHP20 | Prostate | ANWS | 208 | 307 | NA |
| HHP21 | Prostate | ANWS | 213 | 263 | NA |
NEB, non-enhancing biopsy; ANWS, adjacent normal whole slide; ANMD, adjacent normal macrodissected.
Figure 2. Overview of lumosVar 2.0 analysis. The flow chart on the left shows the main steps in the analysis. Steps 0.1 and 0.2 are data preparation, and steps 1–7 are performed by lumosVar 2.0. The graph on the right illustrates the main inputs and outputs of each step. The color of the arrows coming from each box indicates the steps where that data is used as input, and the color of each box indicates the step where the data is generated.
Parameters and default values.
| Parameter | Default | Description |
| | 0.1, 0.7 | Vector of length J of initial sample fractions. Default assumes two samples, with low and high tumor content. |
| | 1.5 | Determines shape of prior distribution of Δf |
| π ( | 0.01, 0.25, 0.3, 0.2, 0.15, 0.09 | Copy number priors |
| π ( | 0.25, 0.5, 0.25 | Minor allele copy number priors |
| | 1E-5 | Segmentation significance cutoff |
| ω | COSMIC | Number of cancer variants observed at the position |
| | 1000 Genomes and ExAC | Population allele frequencies |
| ρSNV, | 1E-5, 1E-6 | Constant for calculating prior somatic |
| | 1E-5, 1E-6 | Population allele frequencies assigned to alleles not seen in input population |
| Fmax−somatic | 2E-5 | Maximum population allele frequency to be considered a possible somatic variant |
| | 10 | Minimum mapping quality to count read |
| | 5 | Minimum base quality to count base |
| | 0.8 | Minimum posterior probability of belonging to the PASS group to be called pass |
| | 0.8 | Minimum posterior probability that the variant is somatic for it to be called somatic |
| | 0.8 | Minimum posterior probability that the variant is germline for it to be called germline |
| ξ | 3 | Number of parameter fitting iterations without new global minimum before stopping |
| λ | 5 | Weight of penalty for adding clonal variant group |
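The three 0.8 posterior probability cutoffs in the table can be illustrated with a short sketch. How lumosVar 2.0 actually combines these cutoffs is not stated here, so the ordering of the checks, the function name, and the output labels below are assumptions made for illustration only.

```python
# Default cutoffs taken from the table above.
PASS_MIN = 0.8      # minimum posterior of belonging to the PASS group
SOMATIC_MIN = 0.8   # minimum posterior of being somatic
GERMLINE_MIN = 0.8  # minimum posterior of being germline


def label_variant(p_pass, p_somatic, p_germline):
    """Assign a hypothetical label from three posterior probabilities.

    Assumption: the quality (PASS) cutoff is checked first, then the
    somatic cutoff, then the germline cutoff. The labels are illustrative
    and are not lumosVar 2.0 output values.
    """
    if p_pass < PASS_MIN:
        return "filtered"
    if p_somatic >= SOMATIC_MIN:
        return "somatic"
    if p_germline >= GERMLINE_MIN:
        return "germline"
    return "uncertain"


if __name__ == "__main__":
    print(label_variant(0.95, 0.90, 0.05))
    print(label_variant(0.95, 0.50, 0.45))
    print(label_variant(0.40, 0.99, 0.00))
```

A variant whose somatic and germline posteriors both fall below 0.8 is left unlabeled in this sketch rather than being forced into either class.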
Parameters and notation.
| RT, RB | Total tumor read depth, B allele read depth |
| RC | Mean read depth of unmatched normals |
| πS, πAB, πAA | prior probability of somatic, germline heterozygous, germline homozygous variant |
| | Mean mapping quality of reads supporting the A or B allele |
| | Mean base quality of bases supporting B allele |
| X | Total number of exons, |
| Y | Number of heterozygous germline variants |
| Z | Number of somatic variants |
| G | Number of segments |
| K | Number of clonal variant groups |
| J | Number of samples from the patient |
| ηg | Number of bases within the bed file in segment g |
| ηd | Number of bases within the bed file with min ( |
| fjk | fraction of cells in the sample j with the variants in clone k |
| C | centering parameter |
| W | controls the spread of the allelic fraction distributions |
| N | total copy number |
| M | minor allele copy number |
| φS, φG | expected allele fraction of somatic or germline variant |
| IS, Ij | Index of clonal subset containing somatic variant or copy number variant |
| A | Allele of somatic variant (A = 1 for major allele, A = 2 for minor allele) |
| XCNA | Number of copy number altered exons |
| GAA, GAB | Germline homozygous or heterozygous genotype |
| O | Other genotype besides somatic, germline homozygous AA, or germline heterozygous AB |
| U | Unknown genotype due to poor mapping |
| k | Index of clonal subset {1, 2, …, K} |
| g | Index of segment {1, 2, …, G} |
| z | Index of somatic variant {1, 2, …, Z} |
| y | Index of heterozygous variant {1, 2, …, Y} |
| x | Index of exon {1, 2, …, X} |
Figure 3. Simulation results comparing pooled and joint approaches. The top row of graphs shows the expected allele frequency of somatic (red) and germline variants (black) by tumor content (x-axis) for different copy number states. The middle two rows of graphs are based on simulation results using a mean coverage of 200X per sample (400X pooled). They show the false negative rate (FNR: simulated somatic variants not called somatic) and false positive rate (FPR: simulated germline heterozygous variants falsely called somatic) plotted by mean tumor content for the pooled (black triangles) and joint (colored circles) approaches. For the joint approach, the color of the circles represents the difference in tumor content between the two samples analyzed jointly. The bottom set of graphs shows the coverage required to detect at least 80% of the simulated somatic variants using two samples of different tumor content (shown on the x- and y-axes) using a joint approach (lower triangle of each heatmap) or using a single-sample approach on a merged sample with a tumor content that is the average of the two samples and coverage that is the sum of the two samples (upper triangle of each heatmap). The color indicates the mean target coverage in the pooled approach, or the sum of the mean target coverage in the two-sample joint approach. Black squares indicate that <80% of the somatic variants were detected at the highest coverage simulated (6400X).
Figure 4. Example lumosVar 2.0 output. (A) Log2 fold change of the mean exon read depths compared to the unmatched controls. (B) The estimated integer copy number states are plotted for each genomic segment by chromosome position. (C) The variant allele fractions are plotted by chromosome position. The gray and brown dots represent variants called as germline heterozygous by lumosVar 2.0 and the large colored dots represent variants called somatic by lumosVar 2.0. (D) Summary of the clonal variant group patterns. The thickness of the lines represents the proportion of copy number events assigned to each group and the size of each circle is proportional to the number of mutations assigned to each group. (E) Sample fraction (estimated proportion of cells in the sample containing the variant) distribution of somatic mutations. (F) Number of exons determined to be in each copy number state, excluding diploid. (G) Number of somatic mutations detected in both samples (left bar), enhancing only (middle bar), and non-enhancing only (right bar). On all plots, the colors indicate the clonal variant group.
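Evaluation results.
| Patient | True somatic variants | TPR (Filt) | TPR (Pool) | TPR (Joint) | PPV (Filt) | PPV (Pool) | PPV (Joint) | F1 (Filt) | F1 (Pool) | F1 (Joint) |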
| GBM-003 | 256 | 0.95 | 0.65 | 0.61 | 0.34 | 0.81 | 0.91 | 0.50 | 0.72 | 0.73 |
| GBM-005 | 179 | 1.00 | 0.66 | 0.87 | 0.22 | 0.80 | 0.94 | 0.36 | 0.72 | 0.90 |
| GBM-006 | 150 | 0.85 | 0.45 | 0.61 | 0.16 | 0.70 | 0.83 | 0.27 | 0.54 | 0.70 |
| GBM-008 | 212 | 1.00 | 0.76 | 0.83 | 0.31 | 0.83 | 0.96 | 0.47 | 0.80 | 0.89 |
| GBM-009 | 179 | 0.99 | 0.77 | 0.81 | 0.25 | 0.72 | 0.95 | 0.40 | 0.74 | 0.88 |
| GBM-014 | 285 | 0.90 | 0.52 | 0.44 | 0.36 | 0.77 | 0.73 | 0.52 | 0.62 | 0.55 |
| GBM-016 | 301 | 0.85 | 0.70 | 0.77 | 0.30 | 0.84 | 0.91 | 0.44 | 0.77 | 0.84 |
The number of true somatic variants found in each patient, as well as the sensitivity (TPR), precision (PPV), and F1 score are shown for the filtering (Filt), single sample (Pool), and lumosVar 2.0 (Joint) approaches.
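The F1 score is the harmonic mean of sensitivity (TPR) and precision (PPV), which makes it possible to check the table rows. The sketch below uses the GBM-003 row and assumes the nine metric columns are ordered as TPR, PPV, and F1, each for Filt, Pool, and Joint. That ordering is inferred from the arithmetic rather than stated in the table, and the variable names are hypothetical.

```python
def f1_score(tpr, ppv):
    """Harmonic mean of sensitivity (TPR) and precision (PPV)."""
    if tpr + ppv == 0:
        return 0.0
    return 2 * tpr * ppv / (tpr + ppv)


# (TPR, PPV) pairs read from the GBM-003 row under the assumed column order.
gbm_003 = {
    "Filt": (0.95, 0.34),
    "Pool": (0.65, 0.81),
    "Joint": (0.61, 0.91),
}

if __name__ == "__main__":
    for approach, (tpr, ppv) in gbm_003.items():
        print(f"{approach:<5} F1 = {f1_score(tpr, ppv):.2f}")
```

The recomputed values round to 0.50, 0.72, and 0.73, matching the last three entries of the GBM-003 row.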
Figure 5. Comparison of variants called in pooled vs. joint approach. The first column of graphs shows the estimated sample fractions of true somatic variants that were detected by both the pooled and joint approaches. The variants are colored by clonal variant groups. The other three columns show the sample fractions of variants that were called incorrectly only in the pooled approach (column 2), only in the joint approach (column 3), or incorrectly in both approaches (column 4). False positive variants are shown in magenta and false negatives in cyan.
Figure 6. Clonal patterns and variant counts detected by lumosVar 2.0 in the archival dataset. The top half of each plot shows the summary of the clonal variant group patterns for each patient. Each line represents a clonal variant group; the thickness of the line represents the proportion of copy number events assigned to the group, and the size of each circle is proportional to the number of mutations assigned to the group. The bottom half of each plot shows the number of somatic variants detected in the adjacent normal (AN) and tumor (T) samples, with the colors corresponding to the clonal variant groups. The 8 patients in the top row had the adjacent normal tissue macrodissected from tumor-containing slides, and these patients typically have similar numbers of variants detected in the tumor and adjacent normal.
Figure 7. Comparison of allelic fractions of variants in archival dataset by calling method. For each of the breast and prostate patients, the allele fractions in the tumor sample are plotted for the variants detected in each of the three approaches. The color of each point indicates the allele fraction of the variant in the adjacent normal sample. Most of the variants detected by the adjacent normal as reference approach, but not by lumosVar 2.0 joint analysis (ANR NOT LVJ), have low allele fractions in both the tumor and the adjacent normal. The variants detected by lumosVar 2.0 joint analysis, but not by the adjacent normal as reference approach (LVJ NOT ANR), typically have higher allele fractions in the tumor and lower allele fractions in the adjacent normal, though lumosVar 2.0 joint analysis also detects some variants with lower allele fraction in the tumor and higher allele fraction in the adjacent normal in a few patients such as HHP01. The variants only called in the unmatched filtering (UPF only) approach have similar allele fractions in the tumor and adjacent normal samples. The 8 patients in the top row had the adjacent normal tissue macrodissected from tumor-containing slides, and these patients typically have more variants detected by lumosVar 2.0 joint analysis and not ANR compared to the remaining patients, whose adjacent normal sample was procured from separate slides.
Hotspot mutation detection.
| BHH01 | PIK3CA | H1047R | 473,9 (0.02) | 378,201 (0.35) | Yes | LQC | 3 | 1,806 |
| HHP13 | PIK3CA | E545K | 332,0 (0) | 195,11 (0.05) | No | Yes | 3 | 332 |
| BHH06 | AKT1 | E17K | 121,23 (0.16) | 151,62 (0.29) | Yes | No | 3 | 295 |
| BHH24 | AKT1 | E17K | 123,1 (0.01) | 118,60 (0.34) | Yes | Yes | 3 | 295 |
| BHH28 | PIK3CA | H1047L | 417,0 (0) | 348,81 (0.19) | Yes | Yes | 3 | 262 |
| HHP19 | TP53 | G245S | 186,1 (0.01) | 61,76 (0.55) | Yes | Yes | 3 | 81 |
| BHH18 | PIK3CA | Q546E | 201,16 (0.07) | 164,38 (0.19) | Yes | No | 3 | 3 |
| BHH18 | PIK3CA | G106R | 127,8 (0.06) | 96,19 (0.17) | Yes | No | 3 | 2 |
| BHH25 | PIK3CA | E726K | 269,0 (0) | 268,17 (0.06) | No | Yes | 2 | 31 |
| BHH25 | SF3B1 | K666E | 210,0 (0) | 122,46 (0.27) | Yes | Yes | 2 | 19 |
Allelic depth of reads supporting the reference, alternate alleles in the adjacent normal sample.
Tumor sample.
Cancer hotspots database validation level (Cancer Hotspots).
Called by one of three paired somatic variant callers (strelka).
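The allelic depth cells in the hotspot table follow the pattern `ref,alt (fraction)`, so the reported fraction can be recomputed as alt / (ref + alt). The sketch below parses a cell and recomputes that value. The function names are hypothetical and the parsing pattern is an assumption based only on the cells shown in the table.

```python
import re


def parse_depth_cell(cell):
    """Parse a cell such as '378,201 (0.35)' into (ref, alt, reported)."""
    match = re.fullmatch(r"\s*(\d+),(\d+)\s*\(([\d.]+)\)\s*", cell)
    if match is None:
        raise ValueError(f"unrecognized allelic depth cell: {cell!r}")
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


def allele_fraction(ref_depth, alt_depth):
    """Fraction of reads supporting the alternate allele."""
    total = ref_depth + alt_depth
    return alt_depth / total if total else 0.0


if __name__ == "__main__":
    # BHH01 PIK3CA H1047R: adjacent normal cell, then tumor cell.
    for cell in ("473,9 (0.02)", "378,201 (0.35)"):
        ref, alt, reported = parse_depth_cell(cell)
        print(cell, "->", round(allele_fraction(ref, alt), 2), "reported", reported)
```

For the BHH01 PIK3CA H1047R row, the recomputed fractions round to 0.02 in the adjacent normal and 0.35 in the tumor, in agreement with the values in parentheses.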