David W Craig, Sara Nasser, Richard Corbett, Simon K Chan, Lisa Murray, Christophe Legendre, Waibhav Tembe, Jonathan Adkins, Nancy Kim, Shukmei Wong, Angela Baker, Daniel Enriquez, Stephanie Pond, Erin Pleasance, Andrew J Mungall, Richard A Moore, Timothy McDaniel, Yussanne Ma, Steven J M Jones, Marco A Marra, John D Carpten, Winnie S Liang.
Abstract
Large-scale multiplexed identification of somatic alterations in cancer has become feasible with next generation sequencing (NGS). However, calibration of NGS somatic analysis tools has been hampered by a lack of tumor/normal reference standards. We thus performed paired PCR-free whole genome sequencing of a metastatic melanoma cell line (COLO829) and its matched normal across three lineages and across separate institutions, with independent library preparations, sequencing, and analysis. We generated mean mapped coverages of 99X for COLO829 and 103X for the paired normal across three institutions. Results were combined with previously generated data, allowing for comparison to a fourth lineage on earlier NGS technology. Aggregate variant detection led to the identification of consensus variants, including key events that represent hallmark mutation types, such as amplified BRAF V600E, a CDKN2A small deletion, a 12 kb PTEN deletion, and a dinucleotide TERT promoter substitution. Overall, common events include >35,000 point mutations, 446 small insertions/deletions, and >6,000 genes affected by copy number changes. We present this reference to the community as an initial standard for enabling quantitative evaluation of somatic mutation pipelines across institutions.
Year: 2016 PMID: 27094764 PMCID: PMC4837349 DOI: 10.1038/srep24607
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
WGS metrics.
| Mapped reads | 2,226,992,988 | 2,214,792,580 | 2,626,814,448 | 2,516,031,030 | 2,569,616,093 | 2,890,861,222 | 1,703,131,862 | 2,092,759,741 | |
| Mapped paired Reads | 2,215,137,161 | 2,204,283,295 | 2,600,711,902 | 2,494,669,592 | 2,564,928,975 | 2,888,165,980 | 1,694,413,815 | 2,083,919,389 | |
| Median insert size | 341 | 341 | 400 | 409 | 348 | 336 | 197 | 203 | |
| % Alignment | 99.0 | 99.4 | 99.0 | 99.1 | 99.0 | 99.1 | 99.3 | 99.4 | |
| Average coverage | 79.42 | 79.37 | 101.51 | 97.22 | 129.19 | 120.41 | 24.29 | 37.56 | |
| Properly paired reads | 2,183,906,643 | 2,180,616,978 | 2,518,297,733 | 2,422,792,278 | 2,542,369,674 | 2,864,033,284 | 1,684,269,960 | 2,066,734,682 | |
| Mapped bases | 246,067,396,020 | 247,815,655,476 | 324,367,737,235 | 310,765,322,484 | 258,096,280,820 | 290,447,668,499 | 126,036,007,551 | 154,431,022,209 | |
| TGen Pipe | Ti/Tv Ratio | 4.255 | 3.88 | 3.97 | 4.08 |
| TGen Pipe | dbSNP rate | 0.06 | 0.0435 | 0.0465 | 0.0663 |
| TGen Pipe | #somatic SNVs called | 42,122 | 39,805 | 40,438 | 34,695 |
| TGen Pipe | #somatic indels called | 3,505 | 6,939 | 6,315 | 36,635 |
| GSC Pipe | Ti/Tv Ratio | 2.84 | 3.39 | 3.25 | 3.41 |
| GSC Pipe | dbSNP rate | 0.043 | 0.032 | 0.042 | 0.033 |
| GSC Pipe | #somatic SNVs called | 45,740 | 42,428 | 44,424 | 39,833 |
| GSC Pipe | #somatic indels called | 748 | 693 | 683 | 424 |
| Illumina Pipe | Ti/Tv Ratio | 3.0058 | 3.51 | 3.49 | 3.59 |
| Illumina Pipe | dbSNP rate | 0.046 | 0.039 | 0.042 | 0.037 |
| Illumina Pipe | #somatic SNVs called | 421,901 | 40,683 | 41,287 | 34,050 |
| Illumina Pipe | #somatic indels called | 597 | 547 | 546 | 327 |
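The column headers of the WGS metrics table did not survive in this record, so the mapping of columns to samples is not stated. As a rough consistency check, the sketch below averages the "Average coverage" row over alternating columns and compares the result with the mean coverages quoted in the abstract (99X for COLO829, 103X for the paired normal). The assumption that the first six columns hold three institutional data sets as sample pairs, and that the last two columns hold the earlier-technology data set, is an inference from the numbers, not something the table states.

```python
# "Average coverage" row of the WGS metrics table, in column order.
average_coverage = [79.42, 79.37, 101.51, 97.22, 129.19, 120.41, 24.29, 37.56]

# ASSUMPTION: columns 1-6 are three current-generation data sets, each a
# pair of samples; columns 7-8 are the earlier-technology data set.
current_generation = average_coverage[:6]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


mean_odd_columns = mean(current_generation[0::2])   # columns 1, 3, 5
mean_even_columns = mean(current_generation[1::2])  # columns 2, 4, 6

print(f"columns 1/3/5: {mean_odd_columns:.1f}X")
print(f"columns 2/4/6: {mean_even_columns:.1f}X")
```

Under this assumption the odd-numbered columns average about 103X and the even-numbered columns about 99X, which is consistent with the first column of each pair being the normal and the second being COLO829.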
Figure 1. Overview of data generation and collection.
*Illumina acquired extracted DNA from ATCC for library construction, sequencing, and analysis.
Figure 2. Construction of a somatic truth set for COLO829.
(A) Identification of somatic reference SNVs. The total numbers of coding SNVs present in each truth set are shown. (B) Final somatic reference standard. Selected events are shown. Somatic coding SNVs are shown as black tick marks within the outermost ring. Consensus CNV gains are shown in green and consensus CNV losses are shown in red in the innermost circle.
Somatic alterations in the final reference standard.
| SNVs | 35,543 |
| Stop gained | 13 |
| Splice acceptor | 4 |
| Splice donor | 1 |
| Missense | 151 |
| 5′ UTR | 26 |
| 3′ UTR | 150 |
| Synonymous | 90 |
| TF binding site | 44 |
| Intragenic | 14,867 |
| Intergenic | 18,780 |
| Upstream 5 kb of gene | 770 |
| Downstream 5 kb of gene | 621 |
| Splice region | 26 |
| Small Deletions | 260 |
| Frameshift | 3 |
| 3′ UTR | 3 |
| Inframe deletion | 2 |
| Intragenic | 109 |
| Intergenic | 126 |
| Upstream 5 kb of gene | 12 |
| Downstream 5 kb of gene | 5 |
| Small Insertions | 186 |
| TF binding site | 1 |
| 3′ UTR | 2 |
| Intragenic | 84 |
| Intergenic | 91 |
| Upstream 5 kb of gene | 3 |
| Downstream 5 kb of gene | 5 |
| Total | 35,989 |
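The table lists three category rows (SNVs, Small Deletions, Small Insertions), each followed by its annotation breakdown. A short check, using only the values transcribed from the table, suggests that each breakdown sums to its category total and that the three categories sum to the overall total. The dictionary names are illustrative.

```python
# Annotation breakdowns transcribed from the table above.
snvs = {
    "Stop gained": 13, "Splice acceptor": 4, "Splice donor": 1,
    "Missense": 151, "5' UTR": 26, "3' UTR": 150, "Synonymous": 90,
    "TF binding site": 44, "Intragenic": 14867, "Intergenic": 18780,
    "Upstream 5 kb of gene": 770, "Downstream 5 kb of gene": 621,
    "Splice region": 26,
}
small_deletions = {
    "Frameshift": 3, "3' UTR": 3, "Inframe deletion": 2,
    "Intragenic": 109, "Intergenic": 126,
    "Upstream 5 kb of gene": 12, "Downstream 5 kb of gene": 5,
}
small_insertions = {
    "TF binding site": 1, "3' UTR": 2, "Intragenic": 84,
    "Intergenic": 91, "Upstream 5 kb of gene": 3,
    "Downstream 5 kb of gene": 5,
}

# Category totals as reported in the table.
reported = {"SNVs": 35543, "Small Deletions": 260, "Small Insertions": 186}

computed = {
    "SNVs": sum(snvs.values()),
    "Small Deletions": sum(small_deletions.values()),
    "Small Insertions": sum(small_insertions.values()),
}

for category, total in computed.items():
    status = "matches" if total == reported[category] else "differs from"
    print(f"{category}: computed {total} {status} reported {reported[category]}")

print("Indels:", computed["Small Deletions"] + computed["Small Insertions"])
print("Grand total:", sum(computed.values()))
```

The indel sum (260 + 186 = 446) agrees with the 446 small insertions/deletions quoted in the abstract, and the grand total agrees with the 35,989 reported in the last row.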
Figure 3. Meta-analysis of the COLO829 somatic reference.
(A) Mutational signature of all SNVs. All substitutions are referred to by the pyrimidine context of the mutated base pair. (B) Genomic evolution of COLO829. A schematic of unique somatic alterations across each truth set is shown. Separate lobes reflect the level of divergence demonstrated by each truth set as measured by the number of unique somatic SNVs and indels for that data set. Examples of unique events are shown. The Pleasance sample did not demonstrate unique coding SNVs or indels (chromosomal coordinates of unique intronic events are shown).
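Panel (A) reports substitutions by the pyrimidine of the mutated base pair. A minimal sketch of that convention follows: a substitution whose reference base is a purine (A or G) is rewritten as the equivalent change on the complementary strand, so every SNV falls into one of six classes (C>A, C>G, C>T, T>A, T>C, T>G). The function name and the toy input are hypothetical and are not taken from the authors' pipelines.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PYRIMIDINES = {"C", "T"}


def to_pyrimidine_context(ref, alt):
    """Return the substitution class with a pyrimidine reference base."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in PYRIMIDINES:
        return f"{ref}>{alt}"
    # Purine reference: report the change on the complementary strand.
    return f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"


# Toy input (hypothetical calls, not data from the reference standard).
calls = [("G", "A"), ("C", "T"), ("A", "G"), ("T", "C"), ("G", "T")]
spectrum = Counter(to_pyrimidine_context(ref, alt) for ref, alt in calls)
print(dict(spectrum))
```

With this collapsing, G>A and C>T are counted together as C>T.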