Genomics software: The view from 10,000 feet.

Abstract

The rate of change in genomics, and 'omics generally, shows no signs of slowing down. Related analysis software is struggling to keep pace. This paper provides a brief review of the field.

Year: 2009 PMID: 19951894 PMCID: PMC3500188 DOI: 10.1186/1479-7364-4-1-56

Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639

For the bioinformaticist, and still more so the traditional genetic epidemiologist, the big view on how to tackle genomic data analysis looks daunting. Only a few years ago, the genome-wide association study (GWAS) represented the overshadowing Everest on the landscape, and commentators fretted about the computational feasibility of analysing 500,000 or so single nucleotide polymorphisms (SNPs) against one phenotype variable. Now, the single-phenotype GWAS is a foothill from which to launch attacks on datasets of much greater scale and variety. A new term - systems genetics - has emerged to describe this expanded world view. Analytical tools for dealing with molecular data have always lagged behind the acceleration of high-throughput methods for generating them, and that seems especially true at the present time. Here, I provide an overview of the software currently available, and look ahead to future developments.

First, a look at more familiar territory. The software package PLINK [1] has become the favoured work-horse of GWAS analysis, thanks to the untiring efforts of Shaun Purcell to keep the software well documented, flexible, fast and compact in its use of data structures. Few other packages surpass PLINK as far as basic quality control and first-pass SNP-by-SNP analysis are concerned, and many other, more advanced features are available and are being expanded continuously.

In addition to SNP probes, modern GWAS panels are equipped with additional probe sets for interrogating copy number variation (CNV). PennCNV [2] is a popular software package for calling CNVs. CNV call uncertainty poses downstream problems for association analysis, and software for dealing with this has been reviewed recently in this journal [3].

Another trend is towards imputation of SNPs that are not present in the GWAS panel but can be inferred via linkage disequilibrium (LD), also reviewed here recently [4]. Popular choices are Mach [5], Impute [6] and Beagle [7].
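To make the LD idea concrete: one common way to quantify how well a typed SNP stands in for an untyped one is r², which can be approximated without phasing as the squared Pearson correlation between allele counts. The following is a minimal sketch under that assumption. It is not the method used by Mach, Impute or Beagle, which fit haplotype models against a reference panel. The function name and the toy genotype vectors are hypothetical.

```python
def genotype_r2(snp_a, snp_b):
    """Squared Pearson correlation between two vectors of allele counts (0, 1 or 2).

    This is an unphased approximation to the LD measure r^2; it assumes
    no missing genotypes and the same individuals in the same order.
    """
    if len(snp_a) != len(snp_b) or not snp_a:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    n = len(snp_a)
    mean_a = sum(snp_a) / n
    mean_b = sum(snp_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    var_a = sum((a - mean_a) ** 2 for a in snp_a)
    var_b = sum((b - mean_b) ** 2 for b in snp_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("monomorphic SNP: r^2 is undefined")
    return cov * cov / (var_a * var_b)


# Hypothetical toy data: allele counts for eight individuals.
typed = [0, 1, 2, 1, 0, 2, 1, 0]
proxy = [2 - g for g in typed]    # same SNP counted on the other allele
other = [1, 0, 1, 2, 1, 0, 0, 2]  # an arbitrary second SNP

print("typed vs proxy:", genotype_r2(typed, proxy))
print("typed vs other:", genotype_r2(typed, other))
```

A pair in perfect LD gives an r² of 1 regardless of which allele is counted, which is why r² rather than the signed correlation is the usual yardstick for choosing proxies.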
A more specialist imputation problem, but one of general interest due to the role of the immune response system in many diseases, is to call classical human leukocyte antigen (HLA) genotypes from SNPs typed in the HLA region of chromosome 6. Recently improved software from Gil McVean and colleagues is available for this [8].

SNP annotation tools provide the most straightforward window from GWAS hits, and from sequence data, into the wider 'omic universe. Rachel Karchin has reviewed them recently [9]. The SNP Function Portal [10] provides one of the more comprehensive lists of annotation for each SNP, including those arising via LD proxy or 'tagging'. Other options include FastSNP [11], PupaSuite [12], SNPnexus [13], SNPinfo [14], SNPselector [15], F-SNP [16] and TAMAL [17]. WGAviewer [18] is geared specifically towards the analysis of GWAS results, and has a nice visual interface. All these tools struggle to keep up with the rapidly expanding set of available annotations. For example, several different datasets are now publicly available that combine GWAS SNP data with genome-wide gene expression data (so-called genetical genomics or expression quantitative trait locus [eQTL] data). Currently, however, no one tool integrates the ability to search all these datasets simultaneously. One option for the more proficient investigator is to keep one step ahead by using the Galaxy web tool [19] to design their own application for integrating different annotation tracks with their GWAS hits. SNAP [20] is a useful tool for feeding LD proxy information into such a custom-made Galaxy application.

Beyond SNP annotation, there are more formal attempts at linking genetic data into functional networks. These may be created from internal sources, such as p-values for SNP-SNP interactions, or extrinsic sources, such as protein-protein interactions (reviewed here recently [21]) and gene ontology categories.
A repository of types of network data is available at http://www.pathwaycommons.org. While network visualisation tools were previously the domain of expensive commercial software, Cytoscape [22] has become an excellent freeware alternative. For formal statistical significance of coincident patterns within these networks, there is a rapidly expanding literature and no consensus yet on the best approach to take. Two examples are ALIGATOR [23] and gene-set enrichment analysis (GSEA). The latter has been adapted from gene expression studies and applied to GWAS p-values. Web-based implementations are available at http://bioinfo.vanderbilt.edu/webgestalt and http://www.broadinstitute.org/gsea.

How can one keep up to date in the rapidly changing world of genomic software? Certainly, review sections such as the one here in Human Genomics will help. Nucleic Acids Research publishes a useful annual review of web server applications [24], now also available online at http://bioinformatics.ca/links_directory. The Applications Note section of the journal Bioinformatics provides the best, but by no means only, location for primary literature on new software.

Looking ahead, software for handling high-throughput sequencing is an area where we can expect much development in the coming months. Bioinformatics has a useful online 'virtual issue' on tools for next generation sequencing, which is updated regularly, at http://www.oxfordjournals.org/our_journals/bioinformatics/nextgenerationsequencing.html. One wonders whether 10,000 feet will be high enough for a synoptic view in 12 months' time.
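As an illustration of the gene-set testing idea mentioned above, the simplest form of enrichment test asks whether genes implicated by GWAS hits fall in a given pathway or ontology category more often than chance would predict. The sketch below uses a one-sided hypergeometric (over-representation) test. This is a deliberately simpler stand-in, not the rank-based GSEA procedure nor the ALIGATOR method, and it does not correct for gene length or for LD between SNPs. All function names and gene labels are hypothetical.

```python
from math import comb


def hypergeom_tail(n_genes, n_in_set, n_hits, n_overlap):
    """P(X >= n_overlap) when n_hits genes are drawn without replacement
    from n_genes, of which n_in_set belong to the gene set."""
    upper = min(n_in_set, n_hits)
    total = comb(n_genes, n_hits)
    tail = sum(
        comb(n_in_set, k) * comb(n_genes - n_in_set, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def enrichment_p(hit_genes, gene_set, background):
    """Over-representation p-value for gene_set among hit_genes,
    restricted to the genes present in background."""
    background = set(background)
    hits = set(hit_genes) & background
    members = set(gene_set) & background
    return hypergeom_tail(
        len(background), len(members), len(hits), len(hits & members)
    )


# Hypothetical toy data: 20 background genes, a 4-gene pathway,
# and 4 genes implicated by GWAS hits, 3 of which lie in the pathway.
background = {f"G{i}" for i in range(1, 21)}
pathway = {"G1", "G2", "G3", "G4"}
gwas_genes = {"G1", "G2", "G3", "G10"}

print("enrichment p-value:", enrichment_p(gwas_genes, pathway, background))
```

In practice the choice of background gene list strongly affects the result, so it should reflect the genes that the genotyping panel could actually have implicated.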

23 in total

1. Cytoscape: a software environment for integrated models of biomolecular interaction networks.

Authors: Paul Shannon; Andrew Markiel; Owen Ozier; Nitin S Baliga; Jonathan T Wang; Daniel Ramage; Nada Amin; Benno Schwikowski; Trey Ideker
Journal: Genome Res Date: 2003-11 Impact factor: 9.043

2. SNPselector: a web tool for selecting SNPs for genetic association studies.

Authors: Hong Xu; Simon G Gregory; Elizabeth R Hauser; Judith E Stenger; Margaret A Pericak-Vance; Jeffery M Vance; Stephan Züchner; Michael A Hauser
Journal: Bioinformatics Date: 2005-09-22 Impact factor: 6.937

3. SNP Function Portal: a web database for exploring the function implication of SNP alleles.

Authors: Pinglang Wang; Manhong Dai; Weijian Xuan; Richard C McEachin; Anne U Jackson; Laura J Scott; Brian Athey; Stanley J Watson; Fan Meng
Journal: Bioinformatics Date: 2006-07-15 Impact factor: 6.937

4. A new multipoint method for genome-wide association studies by imputation of genotypes.

Authors: Jonathan Marchini; Bryan Howie; Simon Myers; Gil McVean; Peter Donnelly
Journal: Nat Genet Date: 2007-06-17 Impact factor: 38.330

5. PLINK: a tool set for whole-genome association and population-based linkage analyses.

Authors: Shaun Purcell; Benjamin Neale; Kathe Todd-Brown; Lori Thomas; Manuel A R Ferreira; David Bender; Julian Maller; Pamela Sklar; Paul I W de Bakker; Mark J Daly; Pak C Sham
Journal: Am J Hum Genet Date: 2007-07-25 Impact factor: 11.025

6. TAMAL: an integrated approach to choosing SNPs for genetic studies of human complex traits.

Authors: Bradley M Hemminger; Billy Saelim; Patrick F Sullivan
Journal: Bioinformatics Date: 2006-01-17 Impact factor: 6.937

7. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data.

Authors: Kai Wang; Mingyao Li; Dexter Hadley; Rui Liu; Joseph Glessner; Struan F A Grant; Hakon Hakonarson; Maja Bucan
Journal: Genome Res Date: 2007-10-05 Impact factor: 9.043