Maude Ardin, Vincent Cahais, Xavier Castells, Liacine Bouaoun, Graham Byrnes, Zdenko Herceg, Jiri Zavadil, Magali Olivier.
Abstract
BACKGROUND: The nature of somatic mutations observed in human tumors, at the level of single genes or genome-wide, can reveal information on past carcinogenic exposures and on the mutational processes contributing to tumor development. While large amounts of sequencing data are being generated, analyzing and interpreting the mutation patterns that may reveal clues about the natural history of cancer are complex and challenging tasks that require advanced bioinformatics skills. To make such analyses accessible to a wider community of researchers with no programming expertise, we have developed MutSpec, a first-of-its-kind package within the user-friendly, web-based Galaxy platform.
Keywords: Galaxy; Mutation signatures; Mutation spectra; Single base substitutions
Year: 2016 PMID: 27091472 PMCID: PMC4835840 DOI: 10.1186/s12859-016-1011-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Overview of MutSpec tools and workflow. Lists of variants identified in a set of cancer samples may be imported either as a single VCF file that contains all samples (each identified by a sample ID) or as multiple VCF files (one per sample). The first tool to run is MutSpec-Annot, which annotates variants with structural and functional information. The MutSpec-Filter tool can then use these annotations to filter out variants that are known polymorphisms or that are located in segmental duplication regions. If a single VCF file containing several samples was uploaded, the MutSpec-Split tool should be used to split the data by sample ID; this tool automatically generates a dataset collection. If multiple VCF files were uploaded, MutSpec-Split should not be run; instead, the annotated VCF files should be grouped into a dataset collection. MutSpec-Stat can then be run on the dataset collection to generate various statistics on variant characteristics. These statistics can be visualized as graphs on HTML pages or downloaded as a single Excel file. The report generated by MutSpec-Stat can then be used as input to MutSpec-NMF to extract the mutation signatures present in the sample set. MutSpec-NMF generates plots showing the identified signatures and the contribution of each signature to the mutation load of each sample. Finally, MutSpec-Compare can be used to calculate cosine similarity values between the obtained signatures and a set of reference signatures (published or user-defined). These results are shown as a heatmap.
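As a rough illustration of the splitting step described in the Fig. 1 caption, the Python sketch below groups variant rows by sample identifier. It is not MutSpec's implementation. The tab-delimited input layout, the `Sample_ID` column name and the function name `split_by_sample` are hypothetical assumptions chosen for illustration.

```python
import csv
import io
from collections import defaultdict


def split_by_sample(tsv_text, sample_column="Sample_ID"):
    """Group rows of a tab-delimited variant table by sample ID.

    Assumptions (hypothetical, not MutSpec's format): one variant per row,
    a header line, and a column named by `sample_column` holding the sample ID.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    per_sample = defaultdict(list)
    for row in reader:
        per_sample[row[sample_column]].append(row)
    # One list of variants per sample, analogous to one dataset per sample
    return dict(per_sample)


example = (
    "Chr\tStart\tRef\tAlt\tSample_ID\n"
    "chr1\t1000\tC\tT\tS1\n"
    "chr2\t2000\tG\tA\tS2\n"
    "chr3\t3000\tC\tA\tS1\n"
)
groups = split_by_sample(example)
print({sample: len(rows) for sample, rows in groups.items()})  # {'S1': 2, 'S2': 1}
```

In Galaxy, each per-sample group would correspond to one element of a dataset collection. This is why per-sample VCF uploads can skip the split and be grouped directly.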
Algorithms and code sources used in MutSpec
| Package | Version | MutSpec-Annot | MutSpec-Stat | MutSpec-NMF | MutSpec-Compare | Source |
|---|---|---|---|---|---|---|
| Annovar^a | June 2015 | X | - | - | - | [ |
| Statistics::R^a | 0.33 | - | X | - | - | |
| Spreadsheet:: | 2.40 | - | X | - | - | |
| ggplot2 | 1.0.1 | - | X | X | X | |
| gplots | 2.17.0 | - | X | - | - | |
| gtable | 0.1.2 | - | X | - | - | |
| reshape | 1.4.1 | - | X | X | X | |
| scales | 0.2.5 | - | X | X | - | |
| gridExtra | 0.9.1 | - | X | X | - | |
| NMF | 0.20.6 | - | X | X | - | [ |
| getopt | 1.20.0 | - | - | X | - | |
| lsa | 0.73.1 | - | - | - | X | |
^a These packages are developed in Perl, while all other packages are developed in R.
List of databases and reference genomes
| Reference genome | Related databases | Used in |
|---|---|---|
| hg19 | refGene | MutSpec-Stat |
| | genomicSuperDups | MutSpec-Filter |
| | snp138 | MutSpec-Filter |
| | snp138NonFlagged | MutSpec-Filter |
| | 1000 Genomes (ALL) | MutSpec-Filter |
| | esp6500siv2 (ALL) | MutSpec-Filter |
| mm9 | refGene | MutSpec-Stat |
| | genomicSuperDups | MutSpec-Filter |
| | snp128 | MutSpec-Filter |
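The filtering databases listed above can be used to remove likely germline polymorphisms and variants in segmental duplications. The Python sketch below shows one way such annotations could drive a filter. The annotation key names, the population-frequency cutoff `max_pop_freq=0.001` and the rule of discarding any variant present in `snp138NonFlagged` are assumptions made for illustration. They are not MutSpec-Filter's documented defaults, and the records are toy data.

```python
def is_retained(variant, max_pop_freq=0.001):
    """Return True if an annotated variant passes a hypothetical filter.

    Assumed annotation keys (illustrative names): 'genomicSuperDups' (non-empty
    if the variant lies in a segmental duplication), 'snp138NonFlagged' (dbSNP ID
    or None), '1000g_ALL' and 'esp6500siv2_ALL' (allele frequencies or None).
    """
    if variant.get("genomicSuperDups"):
        return False  # in a segmental duplication region
    if variant.get("snp138NonFlagged"):
        return False  # known polymorphism in the non-flagged dbSNP 138 subset
    for key in ("1000g_ALL", "esp6500siv2_ALL"):
        freq = variant.get(key)
        if freq is not None and freq > max_pop_freq:
            return False  # too frequent in a population database
    return True


# Toy annotated variants (hypothetical values)
variants = [
    {"id": "v1", "snp138NonFlagged": "rs123"},
    {"id": "v2", "genomicSuperDups": "Score=0.98"},
    {"id": "v3", "1000g_ALL": 0.25},
    {"id": "v4", "esp6500siv2_ALL": 0.0001},
    {"id": "v5"},
]
kept = [v["id"] for v in variants if is_retained(v)]
print(kept)  # ['v4', 'v5']
```

The sketch uses the non-flagged dbSNP subset rather than the full `snp138` track. Flagged entries can include clinically relevant or somatic variants, which a tumor analysis would usually want to keep.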
Analyses performed by the tool MutSpec-Stat
| Analysis | Table | Graph | Statistics |
|---|---|---|---|
| Overall mutation distribution | - | X | - |
| Impact on protein sequence | X | X | - |
| SBS type distribution | X | X | - |
| Stranded analysis of SBS type distribution | X | X | Chi-squared test |
| SBS distribution by functional region | X | - | - |
| Strand bias by functional region | X | - | - |
| SBS distribution per chromosome | X | - | Pearson correlation |
| Trinucleotide sequence context of SBS on the genomic sequence | X | X | - |
| Stranded analysis of trinucleotide sequence context of SBS | X | X | - |
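The trinucleotide analyses in the table above classify each SBS by the mutated base and its 5' and 3' neighbours. This sketch assumes a common convention in which each substitution is reported from the pyrimidine (C or T) of the mutated base pair. Under that convention there are 6 SBS types and 6 × 16 = 96 context categories. The Python sketch below illustrates the convention. The `X[C>T]Y` label format and the function names are hypothetical, and MutSpec-Stat's own output layout may differ.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sbs96_category(ref, alt, context):
    """Label an SBS by type and trinucleotide context, e.g. 'A[C>T]G'.

    `context` is the 3-base plus-strand reference sequence centred on the
    mutated base. Purine (A/G) reference bases are re-expressed on the opposite
    strand so that every label has a C or T as the mutated base.
    """
    ref, alt, context = ref.upper(), alt.upper(), context.upper()
    if len(context) != 3 or context[1] != ref or alt == ref:
        raise ValueError("context must be 3 bases centred on ref, and alt != ref")
    if ref in ("A", "G"):
        ref, alt, context = revcomp(ref), revcomp(alt), revcomp(context)
    return f"{context[0]}[{ref}>{alt}]{context[2]}"


def all_categories():
    """The 96 labels: 6 pyrimidine-centred SBS types x 16 flanking-base pairs."""
    types = [("C", a) for a in "AGT"] + [("T", a) for a in "ACG"]
    return [f"{five}[{r}>{a}]{three}"
            for r, a in types for five in "ACGT" for three in "ACGT"]


print(sbs96_category("G", "A", "TGC"))  # G[C>T]A
print(len(all_categories()))  # 96
```

Counting labels per sample over `all_categories()` yields the 96-category matrix on which signature extraction by NMF typically operates.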
Fig. 2 Mutation spectra in oral squamous cell carcinoma (OSCC) from Indian patients. Results for the pool of 106 samples are shown. a Distribution of variants (N = 13059) according to their functional impact on protein sequences. b Stranded analysis of the six types of SBS, showing counts for SBS with transcript annotations (N = 12789). c Distribution of SBS according to their trinucleotide sequence context (SBS counts are indicated in parentheses). d Plots of the cophenetic correlation coefficient and the residual sum of squares (RSS) for a range of factorisation values (2 to 8). The solid lines represent the results obtained with the original data, while the dotted lines represent the results obtained with randomized data (the original data shuffled).
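The stranded analysis in Fig. 2b, and the chi-squared test listed for it in the table of MutSpec-Stat analyses, can be illustrated with a simple goodness-of-fit test. The test asks whether counts on the transcribed and non-transcribed strands depart from a 1:1 ratio. This Python sketch rests on that assumption and is not necessarily the test design MutSpec-Stat uses. The counts in the usage lines are made up. For one degree of freedom the upper-tail probability has the closed form erfc(sqrt(x/2)), so no statistics library is needed.

```python
import math


def strand_bias_chi2(transcribed, non_transcribed):
    """Chi-squared goodness-of-fit test of a 1:1 strand ratio (1 degree of freedom).

    Returns (statistic, p_value). Assumes that, without strand bias, an SBS is
    equally likely to be counted on either strand.
    """
    total = transcribed + non_transcribed
    if total == 0:
        raise ValueError("no mutations to test")
    expected = total / 2
    stat = ((transcribed - expected) ** 2
            + (non_transcribed - expected) ** 2) / expected
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts for one SBS type (e.g. C>A) on each strand
stat, p = strand_bias_chi2(60, 40)
print(round(stat, 2), round(p, 4))  # 4.0 0.0455
```

With six SBS types tested separately, a multiple-testing correction would normally be applied to the resulting p-values.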
Fig. 3 Mutation signatures in Indian OSCC and their suspected origin. Summary results of the MutSpec-NMF and MutSpec-Compare analyses performed on the 106 OSCC samples. a Mutation signatures obtained by NMF with a factorisation value of four. b Comparison of the four OSCC signatures (vertical axis) with 34 reference signatures (horizontal axis) using cosine similarity. The heatmap is color-coded according to the cosine similarity, which ranges from 0 to 1. Only reference signatures with a significant match (cosine > 0.9) are labelled. c Number of mutations contributing to each of the four identified signatures in each sample analyzed. d Average contributions of the four identified signatures to the mutation load of clustered samples, and number of samples per cluster.
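The comparison in Fig. 3b rests on the cosine similarity between signature vectors. The Python sketch below computes it and lists, for each signature, the reference signatures above the 0.9 cutoff mentioned in the caption. The dictionary inputs, the function names and the toy four-category vectors are assumptions for illustration. Real signatures have 96 trinucleotide categories, and MutSpec-Compare itself relies on R packages such as lsa rather than on this code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("signatures must have the same number of categories")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def significant_matches(signatures, references, threshold=0.9):
    """For each signature, names of reference signatures with cosine > threshold."""
    return {
        name: [ref_name for ref_name, ref in references.items()
               if cosine_similarity(sig, ref) > threshold]
        for name, sig in signatures.items()
    }


# Toy 4-category vectors (hypothetical); real signatures have 96 categories.
signatures = {"A": [0.7, 0.1, 0.1, 0.1], "B": [0.1, 0.1, 0.1, 0.7]}
references = {"R1": [0.6, 0.2, 0.1, 0.1], "R2": [0.25, 0.25, 0.25, 0.25]}
print(significant_matches(signatures, references))  # {'A': ['R1'], 'B': []}
```

Because signature entries are non-negative proportions, the cosine falls between 0 and 1, matching the color scale of the heatmap.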