Nathan D Dees, Qunyuan Zhang, Cyriac Kandoth, Michael C Wendl, William Schierding, Daniel C Koboldt, Thomas B Mooney, Matthew B Callaway, David Dooling, Elaine R Mardis, Richard K Wilson, Li Ding.
Abstract
Massively parallel sequencing technology and the associated rapidly decreasing sequencing costs have enabled systemic analyses of somatic mutations in large cohorts of cancer cases. Here we introduce a comprehensive mutational analysis pipeline that uses standardized sequence-based inputs along with multiple types of clinical data to establish correlations among mutation sites, affected genes and pathways, and to ultimately separate the commonly abundant passenger mutations from the truly significant events. In other words, we aim to determine the Mutational Significance in Cancer (MuSiC) for these large data sets. The integration of analytical operations in the MuSiC framework is widely applicable to a broad set of tumor types and offers the benefits of automation as well as standardization. Herein, we describe the computational structure and statistical underpinnings of the MuSiC pipeline and demonstrate its performance using 316 ovarian cancer samples from the TCGA ovarian cancer project. MuSiC correctly confirms many expected results, and identifies several potentially novel avenues for discovery.
Year: 2012 PMID: 22759861 PMCID: PMC3409272 DOI: 10.1101/gr.134635.111
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Figure 1. MuSiC flow diagram. MuSiC modules can either be run individually, each with its own required input files, or run in series via one command, in which case four inputs are used to execute the entire package of tools.
Analyses performed and the variants included for each MuSiC module
Figure 2. Mutation rates and SMGs in the OV data set. (A) The cohort-wide background mutation rates for all seven mutational mechanism categories are plotted for the OV data set. The overall BMR is also plotted, combining all types of mutations. (B) −Log10(P) for the top 12 OV SMGs are plotted for all three SMG tests in order of decreasing convolution test P-value.
Figure 3. Mutation relation analysis. A heat map showing mutations in highly mutated genes for all 316 OV samples. Dotted-line boxes highlight concurrent nonsynonymous EMR3 and FAT3 mutations (two concurrent mutations out of five nonsynonymous mutations from EMR3 and 18 from FAT3, P = 0.0333) and mutually exclusive nonsynonymous RB1 and TP53 mutations (297 mutually exclusive mutations out of six nonsynonymous mutations from RB1 and 299 from TP53, P = 0.0141).
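The idea behind a concurrence or mutual-exclusivity test can be illustrated with a simple hypergeometric tail probability. The sketch below is not MuSiC's own mutation relation test, whose procedure is not described in this caption, so it should not be expected to reproduce the P-values quoted above. The function names are hypothetical, and the usage line assumes that each mutation count in the caption corresponds to one mutated sample.

```python
from math import comb


def hypergeom_pmf(k, n_samples, n_gene_a, n_gene_b):
    """P(exactly k samples carry both genes) if mutations are placed independently."""
    return (comb(n_gene_a, k)
            * comb(n_samples - n_gene_a, n_gene_b - k)
            / comb(n_samples, n_gene_b))


def co_occurrence_p(overlap, n_samples, n_gene_a, n_gene_b):
    """Upper tail P(X >= overlap): small values suggest concurrent mutations."""
    upper = min(n_gene_a, n_gene_b)
    return sum(hypergeom_pmf(k, n_samples, n_gene_a, n_gene_b)
               for k in range(overlap, upper + 1))


def exclusivity_p(overlap, n_samples, n_gene_a, n_gene_b):
    """Lower tail P(X <= overlap): small values suggest mutual exclusivity."""
    lower = max(0, n_gene_a + n_gene_b - n_samples)
    return sum(hypergeom_pmf(k, n_samples, n_gene_a, n_gene_b)
               for k in range(lower, overlap + 1))


# Assumed reading of the caption: 316 samples, 18 with FAT3 mutations,
# 5 with EMR3 mutations, 2 samples carrying both.
p_concurrent = co_occurrence_p(2, 316, 18, 5)
print(f"hypergeometric co-occurrence P = {p_concurrent:.4f}")
```

A closed-form tail like this is quick to compute, whereas a permutation-based test can account for per-sample differences in mutation rate; the two approaches will generally give somewhat different P-values.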
Figure 4. BRCA1 variant status versus age at diagnosis for the OV data set. A boxplot of the age at diagnosis of 315 OV patients grouped by their BRCA1 mutation status. Germline BRCA1 variant status is correlated with a lower age at diagnosis via the CCT (P = 2.456 × 10^−5).
Figure 5. Proximity analysis mutation diagrams. These mutation diagrams show recurrent triplet mutations in UBR4 and RB1CC1, both of which have a relationship with the tumor suppressor gene RB1.
Figure 6. Pfam domains affected by OV mutations. (A) A histogram of the most highly mutated domains in the OV data set next to the number of genes affected in each domain. (B) A stacked bar graph in which 100% represents the total number of mutations in a particular Pfam domain. Lighter and darker sections of the bars represent the proportions of the total mutations that are nonsynonymous and synonymous, respectively.
PathScan results for the OV data set