Giorgio E M Melloni, Stefano de Pretis, Laura Riva, Mattia Pelizzola, Arnaud Céol, Jole Costanza, Heiko Müller, Luca Zammataro.
Abstract
BACKGROUND: The increasing availability of resequencing data has led to a better understanding of the most important genes in cancer development. Nevertheless, the mutational landscape of many tumor types is heterogeneous and encompasses a long tail of potential driver genes that are systematically excluded by currently available methods due to the low frequency of their mutations. We developed LowMACA (Low frequency Mutations Analysis via Consensus Alignment), a method that combines the mutations of various proteins sharing the same functional domains to identify conserved residues that harbor clustered mutations in multiple sequence alignments. LowMACA is designed to visualize and statistically assess potential driver genes through the identification of their mutational hotspots.
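The core idea of pooling mutations across proteins that share a domain can be illustrated with a minimal Python sketch. It is not the LowMACA implementation (LowMACA is distributed as an R package) and it omits the statistical assessment entirely. The protein names, toy sequences, mutation positions, and the helper names `column_map` and `pool_mutations` are all hypothetical. The sketch only shows how residue positions in individual proteins might be translated to columns of a shared alignment so that rare mutations accumulate at the same consensus position.

```python
from collections import Counter


def column_map(aligned_seq, gap="-"):
    """Map 1-based residue positions of a protein to 0-based alignment columns."""
    mapping = {}
    residue = 0
    for col, char in enumerate(aligned_seq):
        if char != gap:
            residue += 1
            mapping[residue] = col
    return mapping


def pool_mutations(alignment, mutations):
    """Sum mutation counts per alignment column across all aligned proteins."""
    counts = Counter()
    for name, aligned_seq in alignment.items():
        mapping = column_map(aligned_seq)
        for pos in mutations.get(name, []):
            if pos in mapping:  # ignore positions outside the aligned region
                counts[mapping[pos]] += 1
    return counts


# Toy alignment of three hypothetical proteins sharing one domain.
alignment = {
    "PROT_A": "MT-EYK",
    "PROT_B": "MTGEY-",
    "PROT_C": "-TGEYK",
}
# Hypothetical mutated residue positions (1-based, one entry per observed mutation).
mutations = {
    "PROT_A": [3, 3, 5],
    "PROT_B": [4],
    "PROT_C": [3, 1],
}

pooled = pool_mutations(alignment, mutations)
for col, n in sorted(pooled.items()):
    print(f"alignment column {col}: {n} mutation(s)")
```

In this toy example no single protein carries more than two mutations at one residue, yet the pooled counts concentrate on one alignment column, which is the kind of signal a consensus-based method could then test for significance.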
Year: 2016 PMID: 26860319 PMCID: PMC4748640 DOI: 10.1186/s12859-016-0935-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 (b) LowMACA results based on the alignment of the Ras superfamily (PF00071). The first barplot reports the most mutated proteins under significant hotspots in their original position. These hotspots are also highlighted in the second barplot with colored symbols. Labels in the second barplot report the position in the consensus, the FDR-corrected p-value, and the trident score of conservation (TS). The TS is reported only for hotspots identified in the alignment of all 133 family members. Both barplots are truncated at non-informative positions. (a) The panel shows a plot representing the mutual exclusivity between mutations that fall in the same position of the global consensus alignment. Significant patterns are highlighted with the color corresponding to the tumor type where the mutual exclusivity was found. Pairs are considered mutually exclusive when their corrected p-value, computed with the R package cooccur, is below 0.05. (c) The dendrogram is built on Hamming distances between all human sequences of the Ras superfamily aligned via Clustal Omega. Genes that belong to the same subfamily, as described in [Hall, 1998], are represented with the same color. Significant hotspots (under gene names) are represented with the symbols used in (b).
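The distance underlying the dendrogram in panel c can be sketched as follows. This is a minimal illustration under stated assumptions: the caption does not say how gaps were treated or whether distances were normalized, so the sketch simply counts mismatching columns (a gap counts as a character). The sequence names, toy sequences, and the helper names `hamming` and `distance_matrix` are hypothetical.

```python
def hamming(a, b):
    """Count alignment columns at which two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(alignment):
    """Return sorted sequence names and the symmetric matrix of pairwise Hamming distances."""
    names = sorted(alignment)
    matrix = [[hamming(alignment[p], alignment[q]) for q in names] for p in names]
    return names, matrix


# Toy aligned sequences (hypothetical; a real input would be a Clustal Omega alignment).
toy_alignment = {
    "SEQ_A": "MTEYKL",
    "SEQ_B": "MTEYKV",
    "SEQ_C": "MSDYRV",
}

names, matrix = distance_matrix(toy_alignment)
for name, row in zip(names, matrix):
    print(name, row)
```

A matrix like this could then be passed to any hierarchical clustering routine to draw a dendrogram; which linkage method the authors used is not stated in the caption.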
Fig. 2 (a) Venn diagram of the Pfam domains represented in the list of 291 high-confidence drivers and 144 candidate drivers. A total of 577 different Pfam domains are covered by these genes, with 86 Pfam domains shared between the two lists. (b) Heatmap representation of significant Pfam domains in the “Kinase” network. Every row represents a patient from one of 17 different tumor types. A strong mutual exclusivity between tyrosine kinases, kinases and the CH domain is shown. (c) PI3K networks in driver genes. Every circle represents a distinct Pfam domain and the size represents the number of genes that contain the specified Pfam domain. Color indicates whether significant hotspots were found in the LowMACA analysis (red is significant, green is not significant). Two domains are connected if they are found together on the same gene/protein. Edge thickness represents the number of genes that harbor both Pfam domains at the vertices (minimum 2). Blue color indicates mutual exclusivity and orange depicts significant co-occurrence.
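The network construction described for panel c (domains as nodes, an edge when two domains occur on the same gene, edge weight equal to the number of such genes, minimum 2) can be sketched in Python. This is an illustrative reading of the caption, not the authors' code. The gene names, domain labels, and the helper name `domain_edges` are placeholders rather than real Pfam annotations.

```python
from collections import Counter
from itertools import combinations


def domain_edges(gene_domains, min_genes=2):
    """Count, for every pair of domains, how many genes carry both; keep pairs with >= min_genes."""
    weights = Counter()
    for domains in gene_domains.values():
        # sorted(set(...)) gives each unordered pair once per gene, in a canonical order
        for pair in combinations(sorted(set(domains)), 2):
            weights[pair] += 1
    return {pair: n for pair, n in weights.items() if n >= min_genes}


# Placeholder annotation: gene -> domains found on that gene (hypothetical labels).
gene_domains = {
    "GENE1": ["DOM_A", "DOM_B", "DOM_C"],
    "GENE2": ["DOM_A", "DOM_B"],
    "GENE3": ["DOM_B", "DOM_C"],
    "GENE4": ["DOM_A", "DOM_D"],
}

edges = domain_edges(gene_domains)
for (d1, d2), n in sorted(edges.items()):
    print(f"{d1} -- {d2}: shared by {n} genes")
```

The `min_genes=2` default mirrors the "minimum 2" threshold stated in the caption; node size and hotspot coloring would need additional per-domain data that this sketch does not model.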