Alfonso Valencia, Manuel Hidalgo.
Abstract
Progress in genomics has raised expectations in many fields, and particularly in personalized cancer research. The new technologies available make it possible to combine information about potential disease markers, altered function and accessible drug targets, which, coupled with pathological and medical information, will help produce more appropriate clinical decisions. The accessibility of such experimental techniques makes it all the more necessary to improve and adapt computational strategies to the new challenges. This review focuses on the critical issues associated with the standard pipeline, which includes: DNA sequencing analysis; analysis of mutations in coding regions; the study of genome rearrangements; extrapolating information on mutations to the functional and signaling level; and predicting the effects of therapies using mouse tumor models. We describe the possibilities, limitations and future challenges of current bioinformatics strategies for each of these issues. Furthermore, we emphasize the need for collaboration among the bioinformaticians who implement the software and use the data resources, the computational biologists who develop the analytical methods, and the clinicians, who are the systems' end users and those ultimately responsible for making medical decisions. Finally, the different steps in cancer genome analysis are illustrated through examples of applications.
Year: 2012 PMID: 22839973 PMCID: PMC3580417 DOI: 10.1186/gm362
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Figure 1. Scheme of a comprehensive bioinformatics pipeline to analyze personalized genomic information. The five steps in the pipeline are shown in the top row, with the main methods that have so far been developed for each step in the middle row and outstanding problems in the bottom row. (1) Revision of genomic information. In this rapidly developing area, methods and software are continuously changing to match the improvements in sequencing technologies. (2) Analysis of the consequences of specific mutations and genomic alterations. The analysis needs range from the prediction of the effects of point mutations in proteins to the much more challenging prediction of the effects of mutations in non-coding regions, including promoter regions and transcription factor (TF) binding sites. Other genetic alterations important in cancer must also be taken into consideration, such as copy number variation, modification of splice sites and altered splicing patterns. (3) Mapping of gene/protein variants at the network level. At this point, the relationships between individual components (genes and proteins) are analyzed in terms of their involvement in gene control networks, protein interaction maps and signaling/metabolic pathways. It is clearly necessary to develop a network analysis infrastructure and analysis methods capable of extracting information from heterogeneous data sources. (4) Translation of the information into potential drugs or treatments. The pharmacogenomic analysis of the information is essential to identify potential drugs or treatments. The analysis at this level integrates genomic information with that obtained from databases linking drugs and potential targets, combining it with data on clinical trials drawn from text or web sources. Toxicogenomics information adds an interesting dimension that enables additional exploration of the data. (5) Finally, it is essential to make the information extracted by the systems accessible, in adequate conditions, to the end users, including geneticists, biomedical scientists and clinicians.
Table 1. Some of the main data repositories of genetic variation associated with human phenotypes and disease
| Name | Description | URL | Reference |
|---|---|---|---|
| dbSNP | General catalog of polymorphisms | | |
| Ensembl | Maps known mutations and SNPs in the human genome from other databases | | |
| OMIM | Online Mendelian Inheritance in Man; a large collection of disease annotations, often for monogenetic diseases | | |
| COSMIC | Catalog of somatic mutations in cancer | | |
| CGC | Cancer Gene Census | | |
Table 2. Methods for predicting the consequences of point mutations
| Name | URL | How it works |
|---|---|---|
| SIFT | | Uses sequence homology scores that are calculated using position-specific scoring matrices with Dirichlet priors |
| PolyPhen-2 | | Uses sequence conservation, structure and Swiss-Prot annotations |
| PMUT | http://mmb2.pcb.ub.es:8080/PMut/ | Formulates predictions with neural networks, using internal databases, secondary structure prediction and sequence conservation |
| SNPs3D | | Based on a support vector machine that uses structural or sequence conservation parameters |
| PantherPSEC19 | | Uses sequence homology scores calculated using PANTHER hidden Markov model families |
| MutationAssessor | | Provides predictions using additional information based on the specific patterns of conservation of protein families |
| VEP (Variant Effect Predictor) | | This system categorizes Ensembl genomic variants in known transcripts by their potential effect |
| KinMut | | Prediction of the consequences of mutations in protein kinases; the system was trained with specific information about the kinase subfamilies, and together with the predictions provides general information about the corresponding proteins, a comparison with other predictors and links to the related literature |
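
Several of the predictors listed above rely on sequence conservation at the mutated position. As a rough illustration of the idea behind a position-specific score with a Dirichlet prior, the following toy sketch smooths the residue counts of a single alignment column with a symmetric prior (a pseudocount) and scores a substitution relative to the most probable residue. This is not the implementation of SIFT or of any other tool in the table; the function names, the uniform prior and the normalization are assumptions made for this example, and the alignment column is invented.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_probabilities(column, pseudocount=1.0):
    """Posterior-mean residue probabilities for one alignment column.

    A symmetric Dirichlet prior is assumed here: the same pseudocount is
    added to the observed count of each of the 20 amino acids.
    """
    counts = Counter(aa for aa in column if aa in AMINO_ACIDS)  # gaps are ignored
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def substitution_score(column, mutant, pseudocount=1.0):
    """Probability of the mutant residue relative to the most probable residue.

    Values near 1 mean the residue is common at this position; values
    near 0 mean it is rarely or never observed there.
    """
    probs = column_probabilities(column, pseudocount)
    return probs[mutant] / max(probs.values())


# Hypothetical column taken from ten aligned homologous sequences.
column = "LLLLLLLLIV"
for residue in ("L", "I", "D"):
    print(residue, round(substitution_score(column, residue), 3))
```

In this toy column, a substitution to aspartate (never observed) scores lower than one to isoleucine (observed once), which is the intuition conservation-based predictors build on. Real tools use weighted sequences, richer priors and calibrated thresholds.
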
Figure 2. Screenshots representing the basic information provided by the wKinMut system for analyzing a set of point mutations in protein kinases [147,148]. The panels present: (a) general information about the protein kinase imported from various databases; (b) information about the possible consequences of the mutations extracted from annotated databases, each linked to the original source; (c) predictions of the consequences of the mutations in terms of the principal features of the corresponding protein kinase, including the results of the kinase-specific system KinMut [110] (Table 2); (d) an alignment of related sequences, including information about conserved and variable positions; (e) the position of the mutations in the corresponding protein structure (when available); (f) sentences related to the specific mutations from [77]; (g) information about the function and interactions of the protein kinase extracted from PubMed with the iHOP system [149,150]. A detailed description of the wKinMut system can be found in [147] and in the documentation of the web site [148].
Figure 3. An interface (CONTEXTS) that we have developed for the analysis of cancer genome studies at the level of biological networks [122,151]. The upper panel shows the menus for selecting specific cancer studies, databases for pathway analysis (or sets of annotations) and the level of confidence required for the relationships. From the user's requests, the system identifies the pathways or functional classes common to the different cancer studies, and the interface allows the corresponding information to be retrieved. The graph represents various cancer studies (those selected in the 'tumor types' panel are represented by red circles) using the pathways extracted from the Reactome database [152] as the background (the reference selected in the 'Annotation databases' panel and represented by small triangles). For the selected lung cancer study, the 'Lung tumor mutated genes' panel provides a link to the related genes, indicating the database (source) from which the information was extracted. The lower panel represents the information on the pathway selected by the user ('innate immunity signaling') as directly provided by the Reactome database.
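
The step of identifying pathways common to different cancer studies can be pictured as a set operation. The sketch below is not the CONTEXTS implementation; it only assumes that each study is reduced to a set of mutated genes and that each pathway is a set of member genes. All gene, pathway and study names are placeholders invented for the example.

```python
def pathways_hit(mutated_genes, pathway_members):
    """Return the names of pathways containing at least one mutated gene."""
    return {
        name
        for name, members in pathway_members.items()
        if members & mutated_genes
    }


def shared_pathways(studies, pathway_members):
    """Return the pathways hit in every study (empty set if no studies)."""
    hits = [pathways_hit(genes, pathway_members) for genes in studies.values()]
    return set.intersection(*hits) if hits else set()


# Placeholder annotation: pathway name -> member genes.
PATHWAYS = {
    "pathway_A": {"G1", "G2"},
    "pathway_B": {"G3"},
    "pathway_C": {"G4", "G5"},
}

# Placeholder studies: study name -> mutated genes.
STUDIES = {
    "study_1": {"G1", "G3"},
    "study_2": {"G2", "G4"},
}

print(shared_pathways(STUDIES, PATHWAYS))
```

Note that the two placeholder studies share no mutated gene, yet both hit `pathway_A`. This is the kind of relationship that only becomes visible at the network level. A real analysis would also weight the relationships by confidence and test whether the overlap is larger than expected by chance.
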
Table 3. Resources with information connecting proteins and drugs
| Name | Details | URL |
|---|---|---|
| ChEBI (Chemical Entities of Biological Interest) | Contains more than half a million chemical compounds classified according to their biological activity | |
| DrugBank | Contains detailed chemical, pharmacological and pharmaceutical data linked with information on the sequence, structure and pathways of potential targets. The database contains information on almost 500 drugs | |
| Resources from Peer Bork's group, including STITCH, SuperDrug, SuperNatural and SuperTarget/Matador | Bork's group has developed a number of systems that help link drugs to their protein and genomic targets, including data on adverse drug effects and symptoms | |
| PharmGKB | Repository linking genomic information on 2,500 genetic variants with clinical data derived from pharmacogenomics studies, and the corresponding diseases and phenotypes | |
| TTD (Therapeutic Target Database) | Contains data for relations between 2,000 targets and more than 15,000 drugs, including information extracted from clinical trials | |
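
Resources of this kind are typically used by joining a patient's list of altered genes against drug-target pairs. The following minimal sketch shows that join under simplifying assumptions: the pairs are a small hand-written list of well-known examples, not an export from any of the databases above, and the function names are hypothetical.

```python
from collections import defaultdict


def index_by_target(drug_target_pairs):
    """Build a lookup from target gene symbol to the set of drugs against it."""
    index = defaultdict(set)
    for drug, target in drug_target_pairs:
        index[target].add(drug)
    return index


def candidate_drugs(mutated_genes, drug_target_pairs):
    """Map each mutated gene with a known drug to its sorted drug list."""
    index = index_by_target(drug_target_pairs)
    return {
        gene: sorted(index[gene])
        for gene in sorted(mutated_genes)
        if gene in index
    }


# Hand-written illustrative pairs (drug, target gene).
DRUG_TARGETS = [
    ("vemurafenib", "BRAF"),
    ("erlotinib", "EGFR"),
    ("gefitinib", "EGFR"),
    ("trastuzumab", "ERBB2"),
]

# Hypothetical list of genes found altered in one tumor sample.
mutated = {"EGFR", "TP53"}
print(candidate_drugs(mutated, DRUG_TARGETS))
```

Genes without an entry in the drug-target list, such as `TP53` in this toy input, simply drop out of the result. In practice the match would also need to consider the specific variant, the tumor type and clinical trial evidence before any drug is proposed.
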