Maochun Qin, Biao Liu, Jeffrey M Conroy, Carl D Morrison, Qiang Hu, Yubo Cheng, Mitsuko Murakami, Adekunle O Odunsi, Candace S Johnson, Lei Wei, Song Liu, Jianmin Wang.
Abstract
BACKGROUND: Somatically acquired structural variations (SVs) and copy number variations (CNVs) can induce genetic changes that are directly related to tumorigenesis. Somatic SV/CNV detection using next-generation sequencing (NGS) data still faces major challenges introduced by tumor sample characteristics, such as ploidy, heterogeneity, and purity. A simulated cancer genome with known SVs and CNVs can serve as a benchmark for evaluating the performance of existing somatic SV/CNV detection tools and for developing new methods.
Year: 2015 PMID: 25886838 PMCID: PMC4349766 DOI: 10.1186/s12859-015-0502-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The overall workflow of SCNVSim. A) A personal genome with normal diploid status is generated by simulating SNVs and INDELs against the reference genome sequence. The SNV/INDEL ratio, transition/transversion ratio, heterozygous/homozygous ratio, and INDEL size distribution are considered (left). B) For tumor genome simulation, ploidy is first determined, followed by generation of SVs of different types and mechanisms (non-homology-mediated or homology-mediated). Heterogeneity is also implemented (right). The outputs include a simulated personal genome with normal diploid status in FASTA format, germline SNVs and INDELs in variant call format (VCF), and the following for each simulated tumor clone: 1) simulated SVs in terms of events and breakpoints, 2) copy number status for each individual segment, 3) BAF and LOH status of each segment, and 4) the cancer genome with somatic SV/CNV events in FASTA format, as input for NGS read simulation. By mixing the simulated genomes from the normal sample and tumor clones in a ratio specified by the user, a realistic and complex cancer genome data set with varying levels of tumor heterogeneity and purity can be obtained.
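The effect of mixing normal and tumor-clone genomes in a user-specified ratio can be illustrated with a small calculation of the expected copy number and B-allele frequency (BAF) of one segment in the mixture. This is only a sketch under stated assumptions, not SCNVSim's implementation: the `Segment` class, the `mix_segment` function, and the idea of weighting each genome by its cell fraction are hypothetical names and simplifications introduced here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """State of one genomic segment in one genome (hypothetical structure)."""
    total_cn: int  # total copy number of the segment
    b_cn: int      # number of copies carrying the B allele


def mix_segment(segments, fractions):
    """Expected copy number and BAF of one segment in a mixed sample.

    segments  -- one Segment per genome (e.g. normal first, then tumor clones)
    fractions -- cell fraction of each genome; assumed to sum to 1
    """
    if len(segments) != len(fractions):
        raise ValueError("need one fraction per genome")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    # Each genome contributes copies in proportion to its cell fraction.
    total = sum(f * s.total_cn for s, f in zip(segments, fractions))
    b_copies = sum(f * s.b_cn for s, f in zip(segments, fractions))
    baf = b_copies / total if total > 0 else float("nan")
    return total, baf


# Assumed example: 40% normal cells (heterozygous diploid segment) mixed
# with 60% cells from a clone that lost the B-allele copy of the segment.
normal = Segment(total_cn=2, b_cn=1)
clone = Segment(total_cn=1, b_cn=0)
mixed_cn, mixed_baf = mix_segment([normal, clone], [0.4, 0.6])
print(f"expected copy number: {mixed_cn:.2f}, expected BAF: {mixed_baf:.3f}")
```

In this sketch, lower purity pulls both values back toward the diploid, heterozygous state, which is one reason purity and heterogeneity make somatic CNV calls harder to recover.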
Figure 2. Homologous sequence-mediated SV simulation. The types of SV mediated by homologous sequences (e.g., transposable elements, TEs) that SCNVSim simulates are: A) TE-mediated deletion; B) TE-mediated tandem duplication; C) TE-mediated inversion (the two TEs are on opposite strands); D) TE-mediated translocation (balanced or unbalanced); and E) TE-mediated insertion (intra- or inter-chromosome). Breakpoints are randomly picked in homologous sequences shared by compatible repeat elements, which are from the same repeat family and overlap with each other.
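The breakpoint selection described for panel A can be sketched as follows. This is a minimal illustration, not SCNVSim's code: it assumes two same-strand repeat copies of equal length on one chromosome whose positions correspond one-to-one, and the function name `te_mediated_deletion` and its parameters are hypothetical.

```python
import random


def te_mediated_deletion(seq, te1_start, te2_start, te_len, rng=None):
    """Delete the sequence between matching positions of two repeat copies.

    seq       -- chromosome sequence as a string
    te1_start -- 0-based start of the upstream repeat copy
    te2_start -- 0-based start of the downstream repeat copy
    te_len    -- length of the homologous region (assumed equal in both)
    rng       -- optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    if not (0 <= te1_start
            and te1_start + te_len <= te2_start
            and te2_start + te_len <= len(seq)):
        raise ValueError("repeat copies must be ordered and non-overlapping")

    # Pick one offset inside the shared homology; the same offset is used
    # in both copies so the junction falls at corresponding positions.
    offset = rng.randrange(te_len + 1)
    bp1 = te1_start + offset
    bp2 = te2_start + offset

    # Joining the two breakpoints removes seq[bp1:bp2] and leaves a single
    # chimeric repeat copy at the junction.
    return seq[:bp1] + seq[bp2:], (bp1, bp2)


# Assumed toy chromosome: flank + repeat + spacer + repeat + flank.
repeat = "ACGTACGT"
chrom = "AAAA" + repeat + "CCCCCC" + repeat + "TTTT"
deleted, breakpoints = te_mediated_deletion(
    chrom, te1_start=4, te2_start=18, te_len=len(repeat),
    rng=random.Random(0),
)
print(deleted, breakpoints)
```

Because the two copies in this toy example are identical, every choice of offset gives the same product: one repeat copy flanked by the original outer sequences, with the spacer and one repeat-length of sequence removed.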
Figure 3. Circos plots of three simulated tumor clones. A) The ancestor clone with 50 simulated SVs, B) the first descendant clone with 150 simulated SVs, and C) the second descendant clone with 150 simulated SVs. B and C are independently generated from A. In each Circos plot, the outer circle shows CNVs, with gain in red and loss in blue. The middle circle shows LOH status in orange. The inner circle shows SVs using the following color scheme: inversion in red, insertion in blue, intra-chromosomal translocation (ITX) in cyan, balanced inter-chromosomal translocation (CTX) in magenta, and unbalanced CTX in brown.
The CPU and memory usage for SCNVSim simulations with different parameter settings, including the number of SV events, ploidy status, and number of sub-clones, on both human and mouse reference genomes*
| Simulation | Parameter setting | Human: CPU | Human: memory | Mouse: CPU | Mouse: memory |
|---|---|---|---|---|---|
| Germline simulation | Default parameters | 3.9 | 7.9 | 3.3 | 7.3 |
| Tumor simulation | Single clone with 50 SVs | 2.1 | 6.2 | 1.7 | 5.5 |
| | Single clone with 50 SVs, triploid | 2.6 | 6.3 | 2.4 | 5.4 |
| | Single clone with 50 SVs, tetraploid | 3.6 | 6.8 | 2.9 | 5.6 |
| | Single clone with 200 SVs | 2.1 | 6.3 | 1.9 | 5.6 |
| | Single clone with 300 SVs | 2.3 | 6.6 | 1.9 | 5.6 |
| | 2 clones with 50 and 150 SVs | 3.9 | 7.9 | 3.6 | 6.4 |
| | 3 clones with 50, 150, and 150 SVs | 5.9 | 8.0 | 4.7 | 7.1 |
*Analysis was performed on a Linux computer with two Intel® Xeon® E5-2620 v2 CPUs and 32 GB of memory.