Wendell Jones, Binsheng Gong, Natalia Novoradovskaya, Dan Li, Rebecca Kusko, Todd A Richmond, Donald J Johann, Halil Bisgin, Sayed Mohammad Ebrahim Sahraeian, Pierre R Bushel, Mehdi Pirooznia, Katherine Wilkins, Marco Chierici, Wenjun Bao, Lee Scott Basehore, Anne Bergstrom Lucas, Daniel Burgess, Daniel J Butler, Simon Cawley, Chia-Jung Chang, Guangchun Chen, Tao Chen, Yun-Ching Chen, Daniel J Craig, Angela Del Pozo, Jonathan Foox, Margherita Francescatto, Yutao Fu, Cesare Furlanello, Kristina Giorda, Kira P Grist, Meijian Guan, Yingyi Hao, Scott Happe, Gunjan Hariani, Nathan Haseley, Jeff Jasper, Giuseppe Jurman, David Philip Kreil, Paweł Łabaj, Kevin Lai, Jianying Li, Quan-Zhen Li, Yulong Li, Zhiguang Li, Zhichao Liu, Mario Solís López, Kelci Miclaus, Raymond Miller, Vinay K Mittal, Marghoob Mohiyuddin, Carlos Pabón-Peña, Barbara L Parsons, Fujun Qiu, Andreas Scherer, Tieliu Shi, Suzy Stiegelmeyer, Chen Suo, Nikola Tom, Dong Wang, Zhining Wen, Leihong Wu, Wenzhong Xiao, Chang Xu, Ying Yu, Jiyang Zhang, Yifan Zhang, Zhihong Zhang, Yuanting Zheng, Christopher E Mason, James C Willey, Weida Tong, Leming Shi, Joshua Xu.
Abstract
BACKGROUND: Oncopanel genomic testing, which identifies important somatic variants, is increasingly common in medical practice and especially in clinical trials. Currently, there is a paucity of reliable genomic reference samples with a suitably large number of pre-identified variants for properly assessing the analytical quality and performance of oncopanel assays. The FDA-led Sequencing and Quality Control Phase 2 (SEQC2) consortium analyzed ten diverse cancer cell lines individually and as a pool, termed Sample A, to develop a reference sample with suitably large numbers of coding positions with known (variant) positives and negatives for properly evaluating oncopanel analytical performance.
Year: 2021 PMID: 33863366 PMCID: PMC8051128 DOI: 10.1186/s13059-021-02316-z
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
List and description of the 10 cancer cell lines and a normal reference cell line, with the estimated percentage of copy number alterations (CNA) and an intra-tumor heterogeneity (ITH) value indicating potential polyclonality of each cell line
| Cell line | Name | Description | Comments | CNA % (est.)<sup>a</sup> | ITH (Shannon’s index) |
|---|---|---|---|---|---|
| B | Male reference | Normal | | ~ 0 | 0 |
| BLY | B lymphocyte | Myeloma | Mixed with TLY within study<sup>b</sup> | ~ 25% | 43.2 |
| BRA | Brain | Glioblastoma | Polyclonal | 90% | 21.0 |
| BRE | Breast | Adenocarcinoma | Polyclonal | 60% | 100.0 |
| CRV | Cervix | Adenocarcinoma | Polyclonal | 70% | 10.9 |
| LIP | Soft tissue | Liposarcoma | | 90% | 3.5 |
| LIV | Liver | Hepatoblastoma | | 27% | 2.5 |
| MAC | Macrophage | Lymphoma | Polyclonal | 80% | 11.8 |
| SKN | Skin | Melanoma | | 24% | 0 |
| TES | Testes | Carcinoma | | 72% | 4.8 |
| TLY | T lymphoblast | Leukemia | Inherently tetraploid<sup>c</sup> with variations | 22% | 1.1 |

<sup>a</sup> CNAs were estimated using the Agilent GenetiSure Cancer Research CGH + SNP Microarray (2 × 400K), G5975A, and WES

<sup>b</sup> Information on the mixture of the original BLY with TLY is provided in Supplementary Information

<sup>c</sup> Evidence establishing the tetraploid nature of TLY is provided in Supplementary Information
Fig. 1 Overall flow diagram of the process. Class 1 variants were discovered by consensus analysis of the WES1/2/3/4 runs over the high-confidence regions where the WES kit targets overlap. Additional Class 2 variants were discovered by analyzing WGS1 together with the WES results. Variants were confirmed in two ways: by analyzing in silico Sample A, built by combining the individual BAMs from each cell line replicate, and by analyzing merged-BAM Sample A, built by pooling the BAMs of the individual Sample A replicates. Finally, a subset of these variants was orthogonally validated with ddPCR
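The consensus idea behind Class 1 discovery can be illustrated with a minimal Python sketch. It assumes each WES run yields a set of variant keys (chromosome, position, reference allele, alternate allele) and keeps a variant when at least `min_runs` runs report it. The function name, the threshold, and the toy calls are hypothetical and do not reproduce the consortium's actual criteria.

```python
from collections import Counter

def consensus_variants(runs, min_runs):
    """Return the variants reported by at least `min_runs` of the call sets."""
    # Count each variant once per run, even if a run lists it more than once.
    counts = Counter(variant for calls in runs for variant in set(calls))
    return {variant for variant, n in counts.items() if n >= min_runs}

# Toy call sets for four hypothetical WES runs (illustrative, not real data).
wes1 = {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")}
wes2 = {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")}
wes3 = {("chr1", 100, "A", "T"), ("chr3", 300, "C", "A")}
wes4 = {("chr1", 100, "A", "T")}

# Assumed threshold: a variant must appear in at least 3 of the 4 runs.
consensus = consensus_variants([wes1, wes2, wes3, wes4], min_runs=3)
print(sorted(consensus))
```

In this toy example only the variant seen in all four runs survives; the variant seen in two runs falls below the assumed threshold.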
Fig. 2 Defining the consensus target region (CTR). The regions shown are not to scale. Most of these regions and their sizes are provided in Additional file 1: Table S2. The low complexity regions are excluded from the CTR. Importantly, the size of the CTR is ~ 22.7 Mb for hg19
Variant characteristics of Sample A compared with other reference materials. Sample A results are based on hg19 and combine Class 1 and Class 2 variants
| Reference material | Total variants identified | Total coding variants identified | Coding variants in COSMIC genes | Total coding variants < 20% VAF | Total coding variants < 20% VAF in COSMIC genes | # genes with 1 or more variants | # COSMIC genes with 1 or more variants (Tier1/Tier2) |
|---|---|---|---|---|---|---|---|
| Sample A | 42,570 | 42,570 | 2432 | 28,064 | 1653 | 12,238 | 422/102 |
| SNV | 42,021 | 42,021 | 2398 | 27,683 | 1624 | | |
| Indel | 549 | 549 | 34 | 381 | 29 | | |
| Acrometrix | 555 | 555 | 555 | 341<sup>a</sup> | 341<sup>a</sup> | 53 | 52/1 |
| SNV | 504 | 504 | 504 | 317 | 317 | | |
| MNV/Indel | 2/49 | 2/49 | 2/49 | 0/24 | 0/24 | | |
| Oncospan | 386 | 386 | 319 | 52 | 46 | 152 | 114/2 |
| SNV | 357 | 357 | 297 | 43 | 38 | | |
| Indel | 30 | 30 | 22 | 9 | 8 | | |
| HCC1395 (somatic) | 41,556 | 487 | 193 | 144 | 14 | 466 | 188 |
| SNV | 39,536 | 460 | 186 | 132 | 13 | | |
| Indel | 2020 | 27 | 7 | 12 | 1 | | |
| HCC1395BL (germline) | 3,577,254 | 21,755 | NA | NA | NA | 9566 | NA |
| SNV | 3,225,512 | 21,381 | | | | | |
| Indel | 351,742 | 374 | | | | | |

<sup>a</sup> Most of the Acrometrix variants are synthetic controls. Thus, it is possible to construct a version of the material where 524 of the 555 variants have a VAF < 20%
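To make the low-VAF comparison concrete, the short Python sketch below computes the share of coding variants with VAF < 20% for each somatic reference material, using only the counts listed in the table above. The dictionary layout and the helper name `low_vaf_share` are illustrative choices, not part of the original study.

```python
# Counts copied from the table above:
# (coding variants with VAF < 20%, total coding variants).
coding_counts = {
    "Sample A": (28064, 42570),
    "Acrometrix": (341, 555),
    "Oncospan": (52, 386),
    "HCC1395 (somatic)": (144, 487),
}

def low_vaf_share(low, total):
    """Fraction of coding variants whose VAF is below 20%."""
    return low / total

shares = {name: low_vaf_share(low, total)
          for name, (low, total) in coding_counts.items()}

# Print the materials from the highest to the lowest low-VAF share.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.1%}")
```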
Fig. 3 ddPCR and WES concordance. (a) VAF concordance of the WES consensus results for each individual cell line with the ddPCR assays of that cell line. (b) Concordance (log10 scale) of Sample A VAF between ddPCR and WES, for positives only. (c) Various dilutions (C, D, E) of Sample A into Sample B achieve the expected reduction in VAF, as seen in the ddPCR results. The distribution of Sample B variants also shows the potential noise when measuring ddPCR variants below 0.1% (10⁻³). (d) Concordance of replicate ddPCR assays (log10 scale) is very high (r² = 0.95) in the diluted target Samples D and E. Putative VAF values from Sample B are also shown for comparison
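The expected reduction in VAF under dilution follows from simple mixing arithmetic, sketched below in Python. The sketch assumes that both samples contribute DNA in proportion to their mixing fraction and that the normal sample carries the variant at `vaf_b` (zero by default). The function name and the dilution fractions in the loop are hypothetical; the actual ratios used for Samples C, D, and E are not stated here.

```python
def expected_vaf(vaf_a, fraction_a, vaf_b=0.0):
    """Expected VAF after mixing Sample A (at fraction_a) into Sample B.

    Assumes each sample contributes alleles in proportion to its mixing
    fraction, which ignores ploidy and DNA-yield differences.
    """
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must be between 0 and 1")
    return fraction_a * vaf_a + (1.0 - fraction_a) * vaf_b

# Hypothetical dilution fractions, for illustration only.
for fraction in (1.0, 0.5, 0.2, 0.04):
    print(f"fraction of A = {fraction:.2f}: "
          f"expected VAF = {expected_vaf(0.10, fraction):.4f}")
```

Under these assumptions a variant at 10% VAF in Sample A drops in direct proportion to the fraction of Sample A in the mixture, which is why dilution into the normal sample can push variants toward the ddPCR noise range described in panel c.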
Fig. 4 Illustration of considerations for determining positives and negatives within the reference material. The coverage of each WES kit is shown relative to its intersection with the coding regions (Interval4), the high confidence region, and the low complexity region. Also shown are known positive variant positions in Sample A (mostly Class 1 SNVs), including one Class 2 variant, marked by a violet box, that lies outside the Interval4 and CTR regions. Other positions shown include known negative positions
Fig. 5 (a) VAF histogram of Sample A variants (Class 1 and Class 2), showing the large number of variants in the low VAF range from 0.01 to 0.10. (b) VAF histogram of the normal Sample B, which can be used to dilute variants from Sample A