Miguel Vazquez, Victor de la Torre, Alfonso Valencia.
Abstract
Although there is great promise in the benefits to be obtained by analyzing cancer genomes, numerous challenges hinder different stages of the process, from the problem of sample preparation and the validation of the experimental techniques, to the interpretation of the results. This chapter specifically focuses on the technical issues associated with the bioinformatics analysis of cancer genome data. The main issues addressed are the use of database and software resources, the use of analysis workflows and the presentation of clinically relevant action items. We attempt to aid new developers in the field by describing the different stages of analysis and discussing current approaches, as well as by providing practical advice on how to access and use resources, and how to implement recommendations. Real cases from cancer genome projects are used as examples.
Year: 2012 PMID: 23300415 PMCID: PMC3531315 DOI: 10.1371/journal.pcbi.1002824
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Idealized cancer analysis pipeline.
The column on the left shows a list of sequential steps. The columns on the right show the bioinformatics and molecular biology disciplines involved at each step, the types of techniques employed and some of the current challenges faced.
Figure 2. Main tasks in an analysis pipeline.
Starting with the patient information derived from NGS experiments, the variants are mapped between genes and proteins, evaluated for pathogenicity, considered systemically through functional analysis, and the resulting conclusions translated into actionable results.
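The four tasks in Figure 2 can be pictured as a chain of small functions. The following is a minimal sketch, not the authors' implementation: the gene names, the gene-to-protein lookup, the damage scores, the 0.5 threshold, and the pathway and drug tables are all hypothetical placeholders invented for illustration. A real pipeline would draw them from software and databases such as those listed in the tables that follow.

```python
from dataclasses import dataclass

# All data below is made up for illustration; none of it comes from the paper.
GENE_TO_PROTEIN = {"GENE_A": "PROT_A", "GENE_B": "PROT_B"}
DAMAGE_SCORE = {("PROT_A", "R10H"): 0.92, ("PROT_B", "A5V"): 0.10}
PATHWAYS = {"PROT_A": ["pathway_x"], "PROT_B": ["pathway_y"]}
DRUGS = {"pathway_x": ["drug_1"]}


@dataclass
class Variant:
    gene: str
    protein_change: str


def map_to_protein(variant):
    """Task 1: map a gene-level variant onto its protein."""
    return GENE_TO_PROTEIN[variant.gene], variant.protein_change


def is_pathogenic(protein, change, threshold=0.5):
    """Task 2: keep variants whose (hypothetical) damage score passes a cutoff."""
    return DAMAGE_SCORE.get((protein, change), 0.0) >= threshold


def affected_pathways(proteins):
    """Task 3: collect the pathways touched by the damaged proteins."""
    return sorted({pw for prot in proteins for pw in PATHWAYS.get(prot, [])})


def actionable_items(pathways):
    """Task 4: translate affected pathways into candidate drugs for review."""
    return {pw: DRUGS[pw] for pw in pathways if pw in DRUGS}


def run_pipeline(variants):
    mapped = [map_to_protein(v) for v in variants]
    damaged = [prot for prot, change in mapped if is_pathogenic(prot, change)]
    return actionable_items(affected_pathways(damaged))


report = run_pipeline([Variant("GENE_A", "R10H"), Variant("GENE_B", "A5V")])
print(report)
```

Keeping each task behind its own function means a lookup table can later be swapped for a call to an external tool without touching the other steps.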
Selection of the software packages used in cancer genome analysis.
| Software | Functionality | Availability |
| VEP | Mutation mapping | Local installation or web site |
| ANNOVAR | Mutation mapping | Local installation |
| VARIANT | Mutation mapping | Local installation, web site, and web service |
| Mutation Assessor, SIFT | Pathogenicity prediction for protein variants | Web site and web service |
| Condel | Consensus pathogenicity prediction | Web site and web service |
| wKinMut | Kinase-specific pathogenicity prediction | Web site and web service |
| GeneCodis | Annotation enrichment for gene lists | Web site and web service |
| FatiGO, DAVID | Annotation enrichment for gene lists | Web site |
| Cytoscape | Network visualization and analysis | Local installation. Can be embedded in browser applications |
| R | Statistics and plotting | Local installation |
| Taverna | Workflow enactment | Local installation |
| Galaxy | Workflow enactment | Browser application |
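Annotation enrichment, one of the functionalities listed in the table, asks whether an annotation term appears in a gene list more often than chance would predict. One common way to frame this is a one-sided hypergeometric test. The sketch below illustrates that general idea only and does not claim to reproduce the statistics of any tool in the table; the function name and the counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes in a random sample.

    population: number of genes in the background set
    annotated:  background genes that carry the annotation term
    sample:     size of the gene list under study
    hits:       genes in the list that carry the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the hypergeometric probabilities for k = hits .. upper.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 3 annotated, a list of 4 with 2 hits.
# The tail is (3*21 + 1*7) / 210 = 70/210.
p = enrichment_p_value(population=10, annotated=3, sample=4, hits=2)
print(p)
```

In practice many terms are tested at once, so the resulting p-values would also need a multiple-testing correction.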
Selection of databases commonly used in our workflows.
| Database | Entities | Properties |
| Ensembl | Genes, proteins, transcripts, regulatory regions, variants | Genomic positions, relationships between them, identifiers in different formats, GO terms, PFAM domains |
| Entrez | Genes, articles | Articles for genes, abstracts of articles, links to full text |
| UniProt | Proteins | PDBs, known variants |
| KEGG, Reactome, Biocarta, Gene Ontology | Genes | Pathways, processes, function, cell location |
| TFacts | Genes | Transcription regulation |
| Barcode | Genes | Expression by tissue |
| PINA, HPRD, STRING | Proteins | Interactions |
| PharmGKB | Drugs, proteins, variants | Drug targets, pharmacogenetics |
| STITCH, Matador | Drugs, proteins | Drug targets |
| Drug clinical trials | Investigational drugs | Diseases or conditions in which they are being tested |
| GEO, ArrayExpress | Genes (microarray probes) | Expression values |
| ICGC, TCGA | Cancer genomes | Point mutations, methylation, CNV, structural variants |
| dbSNP, 1000 Genomes | Germline variations | Association with diseases or conditions |
| COSMIC | Somatic variations | Association with cancer types |
Types of third party software and their general characteristics.
| Software type | Installation | User friendly | Scriptable | Reusable |
| Browser app. | NO | YES | NO | NO |
| Web server | NO | NO | YES | NO |
| Local app | YES | YES | NO | NO |
| Command line | YES | NO | YES | YES |
| API | YES | NO | YES | YES |
Notes: Reusable means that the code, in whole or in part, can be reused for some other purpose.
A browser app may be scriptable using web scraping.
A local app may support some macro definitions and batch processing.
Code is reusable only if the source code is provided and is easy to pick apart.
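The table can also be read as a lookup when deciding how to integrate a third-party tool into a workflow. The sketch below simply transcribes the table's YES/NO values into a Python structure and filters on them. The names `SOFTWARE_TYPES` and `types_with` are illustrative, and the exceptions described in the notes are not modeled.

```python
# Values transcribed from the table above; the noted exceptions are ignored.
SOFTWARE_TYPES = {
    "Browser app": {"installation": False, "user_friendly": True,
                    "scriptable": False, "reusable": False},
    "Web server": {"installation": False, "user_friendly": False,
                   "scriptable": True, "reusable": False},
    "Local app": {"installation": True, "user_friendly": True,
                  "scriptable": False, "reusable": False},
    "Command line": {"installation": True, "user_friendly": False,
                     "scriptable": True, "reusable": True},
    "API": {"installation": True, "user_friendly": False,
            "scriptable": True, "reusable": True},
}


def types_with(**required):
    """Return the software types whose properties match every keyword given."""
    return [
        name
        for name, props in SOFTWARE_TYPES.items()
        if all(props[key] == value for key, value in required.items())
    ]


# Types that can be scripted without a local installation.
print(types_with(scriptable=True, installation=False))  # ['Web server']
# Types that are both scriptable and reusable.
print(types_with(scriptable=True, reusable=True))  # ['Command line', 'API']
```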