Anamaria Crisan, Rodrigo Goya, Gavin Ha, Jiarui Ding, Leah M Prentice, Arusha Oloumi, Janine Senz, Thomas Zeng, Kane Tse, Allen Delaney, Marco A Marra, David G Huntsman, Martin Hirst, Sam Aparicio, Sohrab Shah.
Abstract
Next generation sequencing has now enabled a cost-effective enumeration of the full mutational complement of a tumor genome, in particular single nucleotide variants (SNVs). Most current computational and statistical models for analyzing next generation sequencing data, however, do not account for cancer-specific biological properties, including somatic segmental copy number alterations (CNAs), which require special treatment of the data. Here we present CoNAn-SNV (Copy Number Annotated SNV): a novel algorithm for the inference of SNVs that overlap copy number alterations. The method models the notion that genomic regions of segmental duplication and amplification induce an extended genotype space in which a subset of genotypes exhibit heavily skewed allelic distributions in SNVs (and are therefore undetectable by methods that assume diploidy). We introduce the concept of modelling allelic counts from sequencing data using a panel of Binomial mixture models, where the number of mixture components for a given locus in the genome is informed by a discrete copy number state given as input. We applied CoNAn-SNV to a previously published whole genome shotgun data set obtained from a lobular breast cancer and show that it discovers 21 experimentally revalidated somatic non-synonymous mutations that were not detected by copy number insensitive SNV detection algorithms. Importantly, ROC analysis shows that the increased sensitivity of CoNAn-SNV does not result in a disproportionate loss of specificity. This was also supported by analysis of a recently published lymphoma genome with a relatively quiescent karyotype, where CoNAn-SNV showed similar results to other callers except in regions of copy number gain, where it conferred increased sensitivity.
Our results indicate that in genomically unstable tumors, copy number annotation for SNV detection will be critical to fully characterize the mutational landscape of cancer genomes.
Year: 2012 PMID: 22916110 PMCID: PMC3420914 DOI: 10.1371/journal.pone.0041551
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Novel somatic variants detected in allele-specific amplification on chromosome 19q arm.
A somatic high-level amplification of the 19q arm is confirmed in NGS as well as Affymetrix SNP 6.0 data. Novel somatic variants that were undetectable by the samtools variant caller or SNVMix are highlighted on the karyogram. A) and B) show raw log copy number and B allele intensity, respectively, for normal DNA (from the same patient) on the Affymetrix SNP 6.0 array. Blue indicates a diploid (neutral) copy number state; the brighter the red, the higher the level of amplification. The three distinct bands in (B) correspond to the three diploid genotypes: AA, AB and BB. C) and D) show metastatic tumor copy number and B allele intensity, respectively. The high-level amplification on the 19q arm is accompanied by B allele intensities that show an absence of the AB heterozygous (middle) band that was present in the normal. E) shows allelic counts from next generation sequencing, as a proportion of depth, for the positions represented on the array; the allelic ratio is the total number of reads containing a variant at a position divided by the total depth at that position. F) shows the raw copy number from the NGS data annotated with the amplification information; it indicates the same sites of amplification revealed by the orthogonal array platform.
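The allelic ratio described for panel E can be illustrated with a minimal sketch. The function name `allelic_ratio` and the input format (a dictionary of per-base read counts at one position) are assumptions made for illustration; they are not taken from the paper's pipeline.

```python
def allelic_ratio(base_counts, ref_base):
    """Fraction of reads at one position that carry a non-reference base.

    base_counts: dict mapping base -> read count (hypothetical input format).
    ref_base:    the reference base at this position.
    Returns None when the position has no coverage.
    """
    depth = sum(base_counts.values())
    if depth == 0:
        return None
    # Every read whose base differs from the reference counts as a variant read.
    variant_reads = sum(n for base, n in base_counts.items() if base != ref_base)
    return variant_reads / depth


# Example: 75 reference reads and 10 variant reads at a C>G position.
ratio = allelic_ratio({"C": 75, "G": 10}, "C")
```

In a highly amplified region, a ratio this far below 0.5 may still reflect a real variant present on one of several copies, which is the situation the copy number annotation is meant to handle.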
Figure 2. Overview of CoNAn-SNV model, inputs and outputs.
A) CoNAn-SNV genotype state-space expansion shown schematically. As higher levels of amplification are encountered, a larger genotype state-space is required to accommodate the different events that could arise due to amplifications (examples in Figure S1). B) CoNAn-SNV generative probabilistic graphical model. Circles represent random variables, and rounded squares represent fixed constants. Shaded nodes indicate observed data, such as allelic counts, while white nodes indicate quantities that are inferred during training through expectation maximisation. The nodes represent, in turn: the CNA state Ci of the segment (defined by the HMM described in Shah et al. [6]) that spans position i; the genotype, whose possible values depend on the CNA state; the number of reads and the number of reference reads; the prior over the genotypes, which extends to accommodate the CNA states; and the genotype-specific Binomial parameter for genotype k in CNA state Ci. C) Example of CoNAn-SNV input and output. CoNAn-SNV takes allelic counts as well as CNA segment data as input, while SNVMix requires only allelic counts. The same positions and counts are provided to both algorithms, with different results. In some cases CoNAn-SNV will call a variant with an aaaab or aaab genotype that would otherwise be missed by SNVMix; in others, CoNAn-SNV will genotype a position as abbbb rather than bb (as SNVMix [21] would), which allows for better interpretation of events.
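The effect of the expanded genotype space in panels A and C can be sketched as follows. This is an illustration under stated assumptions, not the published implementation: the copy number is taken as a plain integer, the prior over genotypes is uniform, and the Binomial parameters are fixed from the allele fraction and an assumed 1% error rate, whereas the paper infers the priors and Binomial parameters by expectation maximisation. The function names are hypothetical.

```python
import math


def genotype_labels(copies):
    """Genotypes for a locus with `copies` alleles, e.g. aa, ab, bb for 2 copies."""
    return ["a" * (copies - k) + "b" * k for k in range(copies + 1)]


def genotype_posterior(n_ref, depth, copies, error=0.01):
    """Posterior over genotypes given reference-read count and depth.

    Assumptions (not from the paper): uniform prior, fixed error rate,
    and an expected reference fraction of (copies - k) / copies for a
    genotype with k variant alleles.
    """
    log_post = []
    for k in range(copies + 1):
        # Expected reference-read fraction, shrunk away from 0 and 1 by `error`.
        mu = (copies - k) / copies * (1 - 2 * error) + error
        log_lik = (math.log(math.comb(depth, n_ref))
                   + n_ref * math.log(mu)
                   + (depth - n_ref) * math.log(1 - mu))
        log_post.append(log_lik)  # uniform prior adds only a constant
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    return {g: w / total for g, w in zip(genotype_labels(copies), weights)}


def p_snv(posterior):
    """Probability that the genotype carries at least one variant allele."""
    hom_ref = next(g for g in posterior if "b" not in g)
    return 1.0 - posterior[hom_ref]


# Counts taken from the NCF2 row of the validation table: depth 85, 10 variant reads.
diploid = genotype_posterior(n_ref=75, depth=85, copies=2)
amplified = genotype_posterior(n_ref=75, depth=85, copies=5)
```

Under these assumptions, the diploid model assigns almost all posterior mass to aa because 10 of 85 reads is far from the 50% expected for ab, while the five-copy model has an aaaab state whose expected variant fraction of about 20% fits the same counts much better. Five copies is assumed here only because the genotype reported for that position has five alleles.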
Figure 3. Venn diagram of predictions made by samtools, SNVMix, CoNAn-SNV.
Separating by CNA state shows an enrichment of CoNAn-SNV specific predictions in the GAIN, AMP and HLAMP segments of the genome.
Figure 4. Discovery Flow Diagram.
Novel somatic variants identified by CoNAn-SNV.
| ChromPos | AA Mutation | Gene Name | Impact | Ref Base | Mut Base | WGSS Depth | WGSS Nref Count | WGSS p(snv) | WGSS Genotype | Normal Depth | Normal Freq. Nref | Primary Depth | Primary Freq. Nref | Metastatic Depth | Metastatic Freq. Nref | Transcriptome Ref. | Transcriptome Ref. Count | Transcriptome Nref. | Transcriptome Nref. Count |
| | S177L | PEF1 | 1.11(M) | G | A | 17 | 3 | 0.8459 | aab | 11190 | 0.0441 | 617 | 0.0340 | 27129 | 0.4731 | G | 2 | N | 0 |
| | Q100H | NCF2 | 0.975(L) | C | G | 85 | 10 | 1.0000 | aaaab | 39290 | 0.0039 | 1733 | 0.0069 | 60040 | 0.0977 | C | 25 | N | 0 |
| | R539T | IPO9 | 2.025(H) | G | C | 101 | 8 | 0.8015 | aaaab | 18800 | 0.0026 | 274 | 0.0036 | 12231 | 0.0916 | G | 84 | C | 11 |
| | E525T | NPAS2 | 1.68(M) | A | T | 54 | 7 | 0.8824 | aaab | 131465 | 0.0017 | 15627 | 0.0022 | 187617 | 0.1796 | A | 29 | T | 3 |
| | S | AC133961.3 | No Uni ID. | C | G | 11 | 2 | 0.7930 | aab | 10999 | 0.0025 | 443 | 0.0045 | 15554 | 0.2468 | N | 0 | N | 0 |
| | E68K | ARL10 | 0.55(L) | G | A | 51 | 9 | 0.9988 | aaab | 35722 | 0.0011 | 5911 | 0.0008 | 56243 | 0.1454 | G | 1 | N | 0 |
| | E152 | TMED9 | Truncating | G | T | 55 | 9 | 0.9962 | aaab | 83887 | 0.0110 | 40283 | 0.0109 | 97795 | 0.2028 | G | 111 | T | 9 |
| | E222K | TCTE1 | 0.955(L) | C | T | 28 | 5 | 0.8956 | aaab | 63261 | 0.0054 | 4076 | 0.0064 | 70470 | 0.2327 | N | 0 | N | 0 |
| | N1794K | REV3L | 0.345(L) | G | T | 36 | 6 | 0.9933 | aab | 91581 | 0.0016 | 54683 | 0.0020 | 74407 | 0.2006 | G | 18 | T | 3 |
| | R2115Q | ARID1B | 1.845(M) | G | A | 52 | 9 | 0.9987 | aaab | 304781 | 0.0024 | 118051 | 0.0022 | 449145 | 0.2051 | N | 0 | N | 0 |
| | L561V | JHDM1D | 0.615(L) | G | C | 41 | 6 | 0.9647 | aaab | 305 | 0.0000 | 1 | 0.0000 | 137 | 0.2774 | G | 91 | C | 30 |
| | I109F | TRPM5 | −0.08(N) | T | A | 20 | 5 | 0.9882 | aaab | 100659 | 0.0045 | 33904 | 0.0104 | 182328 | 0.1948 | N | 0 | N | 0 |
| | V359V | SERPINA9 | 0.28(L) | G | A | 28 | 4 | 0.7858 | aaab | 61006 | 0.0219 | 8291 | 0.0226 | 73354 | 0.2324 | N | 0 | N | 0 |
| | V982I | RTL1 | 0.805(L) | C | T | 33 | 7 | 0.9962 | aaab | 107685 | 0.0135 | 6172 | 0.0146 | 102285 | 0.1799 | N | 0 | N | 0 |
| | G313S | SLC25A23 | 1.83(M) | C | T | 15 | 3 | 0.9965 | aab | 46019 | 0.0048 | 6579 | 0.0050 | 43855 | 0.2087 | C | 1 | T | 2 |
| | K187M | ZNF607 | NA | T | A | 77 | 10 | 0.9922 | aaaab | 2722 | 0.0026 | 174 | 0.1667 | 13589 | 0.1525 | T | 15 | A | 1 |
| | E24 | PRR19 | Truncating | C | T | 50 | 7 | 0.9674 | aaaab | 47838 | 0.0018 | 2712 | 0.0026 | 53450 | 0.1260 | C | 5 | N | 0 |
| | Q311 | ALDH16A1 | Truncating | G | T | 52 | 7 | 0.9522 | aaaab | 75066 | 0.0036 | 1935 | 0.0078 | 91868 | 0.1159 | N | 0 | N | 0 |
| | E16Q | ZNF480 | 1.67(M) | G | C | 64 | 11 | 0.9999 | aaaab | 16867 | 0.0033 | 1133 | 0.0071 | 52154 | 0.0862 | G | 12 | C | 1 |
| | V328M | LILRA2 | 1.91(M) | G | A | 53 | 8 | 0.9922 | aaaab | 145106 | 0.0029 | 60245 | 0.0028 | 264119 | 0.1177 | G | 6 | N | 0 |
| | G348E | ZSCAN22 | 2.99(H) | G | A | 71 | 8 | 0.9201 | aaaab | 279784 | 0.0023 | 64866 | 0.0021 | 218744 | 0.0996 | G | 1 | N | 0 |
Somatic variants that were uniquely predicted by CoNAn-SNV and were successfully validated by targeted ultradeep amplicon sequencing. Impact refers to functional impact as determined by MutationAssessor.
Refers to a stop codon.
Effect of copy number amplifications on germline alleles.
| ChromPos | AA Mutation | Gene | Normal Depth | Normal Freq. Nref | Metastatic Depth | Metastatic Freq. Nref | Transcriptome Ref. | Transcriptome Ref. Count | Transcriptome Nref. | Transcriptome Nref. Count | Chi sq. q-value |
| | F218C | AL139152.7 | 17928 | 0.3169 | 18017 | 0.2164 | T | 55 | G | 3 | 1.27E-102 |
| | I213V | MRPL9 | 5387 | 0.2046 | 8770 | 0.0409 | T | 154 | C | 28 | 4.29E-211 |
| | R3530S | FLG | 61790 | 0.6191 | 78410 | 0.3981 | N | 0 | N | 0 | 0 |
| | A76V | ZNF7 | 92012 | 0.4499 | 147007 | 0.2683 | C | 2 | N | 0 | 0 |
| | C | AQP7 | 24722 | 0.2781 | 22104 | 0.1985 | N | 0 | N | 0 | 1.12E-89 |
| | M1259T | SVIL | 128591 | 0.3867 | 110884 | 0.4808 | A | 6 | N | 0 | 0 |
| | N477K | PKP3 | 37172 | 0.4601 | 57560 | 0.2907 | C | 11 | N | 0 | 0 |
| | R357Q | USH1C | 101208 | 0.5595 | 58749 | 0.1548 | N | 0 | N | 0 | 0 |
| | A79T | RIN1 | 75400 | 0.4044 | 97848 | 0.1738 | N | 0 | N | 0 | 0 |
| | R710C | SIDT2 | 260320 | 0.5342 | 237372 | 0.1390 | C | 51 | T | 19 | 0 |
| | E358Q | FEZ1 | 249388 | 0.5259 | 171924 | 0.1372 | C | 0 | G | 2 | 0 |
| | R279P | STED8 | 208542 | 0.3071 | 175257 | 0.4182 | G | 17 | N | 0 | 0 |
| | S | KRTAP4-15 | 1774 | 0.3207 | 4409 | 0.1851 | N | 0 | N | 0 | 1.51E-30 |
| | R | DMKN | 209119 | 0.5478 | 247223 | 0.1696 | C | 5 | T | 2 | 0 |
| | H426R | ZNF829 | 6402 | 0.4531 | 10867 | 0.1214 | T | 1 | C | 1 | 0 |
| | R190Q | GIPR | 70793 | 0.4878 | 90262 | 0.1843 | G | 26 | A | 5 | 0 |
| | R | LILRB3 | 34753 | 0.1592 | 46500 | 0.0642 | N | 0 | N | 0 | 0 |
These variants exhibit an amplification of the reference allele and show allelic skew, which suggests an unbalanced allelic amplification over the course of tumor evolution. Impact refers to functional impact as determined by MutationAssessor.
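The kind of comparison behind the Chi sq. q-value column can be sketched as a test of whether the non-reference fraction differs between normal and metastatic samples. The exact test design is not given in this record, so the following assumes a plain 2x2 Pearson chi-square test (one degree of freedom, no continuity correction) on read counts reconstructed from depth and frequency. It returns a p-value only; the multiple-testing adjustment that would turn p-values into q-values is not shown. Function names are hypothetical.

```python
import math


def counts_from_frequency(depth, freq_nonref):
    """Recover (reference, non-reference) read counts from depth and frequency."""
    alt = round(depth * freq_nonref)
    return depth - alt, alt


def chi_square_2x2(ref_a, alt_a, ref_b, alt_b):
    """Pearson chi-square statistic and p-value for a 2x2 table of read counts.

    Rows are the two samples, columns are reference / non-reference reads.
    """
    n = ref_a + alt_a + ref_b + alt_b
    row_a, row_b = ref_a + alt_a, ref_b + alt_b
    col_ref, col_alt = ref_a + ref_b, alt_a + alt_b
    denom = row_a * row_b * col_ref * col_alt
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence of a difference
    chi2 = n * (ref_a * alt_b - alt_a * ref_b) ** 2 / denom
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# KRTAP4-15 row: normal depth 1774 at 0.3207, metastatic depth 4409 at 0.1851.
normal = counts_from_frequency(1774, 0.3207)
metastatic = counts_from_frequency(4409, 0.1851)
chi2, p_value = chi_square_2x2(*normal, *metastatic)
```

At the amplicon depths in this table, even modest shifts in allele frequency give extremely small p-values, which is consistent with the many entries reported as 0.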
Figure 5. Receiver operator characteristic curve for CoNAn-SNV and SNVMix broken down by amplification status.
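A ROC curve of this kind can be built from per-position scores such as p(snv) and a set of truth labels. The sketch below is generic and uses hypothetical names; how the truth labels were obtained and how positions were grouped by amplification status in the paper are not described in this record, so the example data are invented purely to show the mechanics.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    scores: predicted probabilities, e.g. p(snv).
    labels: 1 for a true variant, 0 otherwise.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all calls tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Invented toy data: four positions with scores and truth labels.
curve = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
area = auc(curve)
```

To break the curve down by amplification status, the same functions could be applied separately to the positions falling in each CNA state.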