
SNPTrack™: an integrated bioinformatics system for genetic association studies.

Joshua Xu, Reagan Kelly, Guangxu Zhou, Steven A Turner, Don Ding, Stephen C Harris, Huixiao Hong, Hong Fang, Weida Tong.

Abstract

A genetic association study is a complicated process that involves collecting phenotypic data, generating genotypic data, analyzing associations between genotypic and phenotypic data, and interpreting the genetic biomarkers identified. SNPTrack is an integrated bioinformatics system developed by the US Food and Drug Administration (FDA) to support the review and analysis of pharmacogenetics data resulting from FDA research or submitted by sponsors. The system integrates data management, analysis, and interpretation in a single platform for genetic association studies. Specifically, it stores genotyping data and single-nucleotide polymorphism (SNP) annotations along with study design data in an Oracle database. It also integrates popular genetic analysis tools, such as PLINK and Haploview. SNPTrack provides genetic analysis capabilities and captures analysis results in its database as SNP lists that can be cross-linked for biological interpretation to gene/protein annotations, Gene Ontology, and pathway analysis data. With SNPTrack, users can carry out the full sequence of bioinformatics tasks for a genetic association study. SNPTrack is freely available to the public at http://www.fda.gov/ScienceResearch/BioinformaticsTools/SNPTrack/default.htm.


Year: 2012 PMID: 23245293 PMCID: PMC3437569 DOI: 10.1186/1479-7364-6-5

Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639

Introduction

Personalized medicine is expected to improve health outcomes and patient satisfaction. However, implementing personalized medicine based on individuals' biological information relies on genetic biomarkers identified through genetic association studies. High-throughput genotyping technologies have advanced to the point that genotypes for millions of single-nucleotide polymorphisms (SNPs) can be determined simultaneously. Concurrently, the International HapMap Project determined genotypes of over 3.1 million common SNPs in human populations [1]. Together, these advances make genetic association studies a feasible and promising research field for personalized medicine. However, the enormous amount of genetic data generated by high-throughput technologies poses a number of bioinformatics challenges. Storing and accessing the data, performing association tests, and interpreting results can no longer be readily done using the ad hoc approaches commonly used for much smaller candidate gene association studies. Furthermore, because contributions of individual polymorphisms to a phenotype are typically quite small, appropriate analysis and interpretation techniques are key. Thus, identifying all associated polymorphisms and placing them in context is a necessary step in understanding their role in defining the phenotype or treatment response. A number of bioinformatics algorithms and tools have been developed for managing and analyzing genetic data as well as for interpreting genetic biomarkers. However, none of them covers all of the bioinformatics tasks needed for a complete genetic association study, so scientists have had to combine several tools. There was therefore high demand for an integrated bioinformatics system.
Early in the Voluntary eXploratory Data Submission program [2], the FDA's National Center for Toxicological Research developed ArrayTrack™ to manage, analyze, and interpret microarray gene expression data [3,4]. ArrayTrack™ has since been used for reviewing and analyzing genomic data at the FDA and for genomic research in the scientific community. Building on the success and experience from ArrayTrack™, SNPTrack was developed as a one-stop-shop bioinformatics solution capable of performing the same function for genetic data that ArrayTrack™ does for gene expression data. SNPTrack offers a full suite of data storage and management, analysis, and interpretation tools for genetic association studies.

Implementation

SNPTrack adopts a client‐server system that integrates data management, analysis, and interpretation into a single system. The Oracle server stores and integrates phenotypic and genotypic data as well as annotations of genetic biomarkers from public resources about SNPs, quantitative trait loci (QTLs), genes, proteins, and pathways. Its user interface, query mechanism, and data visualization features were implemented in Java. As depicted in Figure 1, SNPTrack has three major components: StudyDB, TOOL, and LIB.

Figure 1

SNPTrack's graphical user interface with the connections of its major components: StudyDB, TOOL, and LIB.

StudyDB hosts and manages genotypic and phenotypic data. It supports importing three types of files in tab-delimited text format: annotation files for the genotyped SNPs (compiled for the study or supplied by the chip provider), genotype data files, and phenotype data files (which may include sex, age, race, disease status, and exposure or drug information such as environmental exposure, dose, treatment response, and adverse events). Data are organized and presented in a tree-structured view with three node types: study owner or group (username), study title, and study data.
The TOOL component provides the data analysis features. Data are formatted and exported to the client computer for analysis with PLINK, a command-line program that offers many statistical methods, including case‐control association tests, various regression methods, permutation tests, false discovery rate control, and other algorithms [5]. Analysis commands in PLINK are issued and managed through gPLINK, a Java-based graphical user interface for managing PLINK commands [6]. Analysis results can be visualized in Haploview [7]. Linkage disequilibrium and haplotypes in the region around a SNP of interest can be downloaded from HapMap and viewed in Haploview. These component tools are automatically loaded onto the client computer and updated by SNPTrack. SNPs of interest can also be saved into StudyDB. As needed, other stand-alone analysis tools such as SAS and R/Bioconductor can be integrated into TOOL.
LIB contains a collection of libraries that facilitate the interpretation of results from genetic studies. The libraries partially mirror the contents of dbSNP, GenBank, SWISS-PROT, LocusLink, Kyoto Encyclopedia of Genes and Genomes, Gene Ontology (GO), and others. The annotations from these databases are extracted to construct enriched libraries such as SNPLib, GeneLib, ProteinLib, and PathwayLib. The SNP and QTL libraries are specifically designed for genetic association studies [8]. The libraries are cross-linked and support functions such as list-based queries, which provide a mechanism for data interpretation. The SNP library follows the release cycle of dbSNP and is updated about twice a year.
A typical workflow begins with importing the SNP panel, genotype, and phenotype data files into SNPTrack. Access permission (data security) is controlled by the user. Significantly associated SNPs can then be identified using PLINK. Commonly used operations include filtering SNPs with the Hardy-Weinberg equilibrium test, followed by an allele frequency summary, allelic association tests, genotypic association tests, and/or linear/logistic regression analysis. Significantly associated SNPs found by the analysis tools can be saved as a SNP list in SNPTrack. Users can also import, export, edit, manage, and compare SNP lists. Individual SNPs of interest can be linked directly to a wide selection of external databases (dbSNP Report, Ensembl, HapMap, etc.) for more detailed information. The integrated libraries allow users to find genes and pathways related to SNPs.
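As a rough illustration of the analysis steps in this workflow, the commonly used operations might correspond to PLINK command lines along the following lines. This is only a sketch under stated assumptions: the article does not list the exact commands that SNPTrack or gPLINK issue, the file prefixes `study` and `run1` and the threshold 0.001 are hypothetical, and the function merely assembles argument lists from standard PLINK 1.x options (`--file`, `--hwe`, `--hardy`, `--freq`, `--assoc`, `--model`, `--logistic`, `--out`) without running anything.

```python
import shlex
from typing import Dict, List


def build_plink_commands(fileset: str, out_prefix: str,
                         hwe_p: float = 0.001) -> Dict[str, List[str]]:
    """Return PLINK 1.x argument lists for the workflow steps; nothing is run.

    fileset    -- prefix of the PED/MAP pair read with --file (hypothetical name)
    out_prefix -- prefix for result files written with --out (hypothetical name)
    hwe_p      -- exclude SNPs whose Hardy-Weinberg test p-value is below this
                  (assumed value, not taken from the article)
    """
    base = ["plink", "--file", fileset]
    qc = ["--hwe", str(hwe_p)]  # Hardy-Weinberg filter applied to later steps

    def step(flags: List[str], name: str) -> List[str]:
        # Each step writes its results under its own output prefix.
        return base + flags + ["--out", f"{out_prefix}_{name}"]

    return {
        "hardy": step(["--hardy"], "hardy"),                # per-SNP HWE report
        "freq": step(qc + ["--freq"], "freq"),              # allele frequency summary
        "allelic": step(qc + ["--assoc"], "allelic"),       # allelic association test
        "genotypic": step(qc + ["--model"], "genotypic"),   # genotypic and related model tests
        "logistic": step(qc + ["--logistic"], "logistic"),  # logistic regression (case-control)
    }


def as_shell(cmd: List[str]) -> str:
    """Render one argument list as a copy-pasteable shell line."""
    return shlex.join(cmd)


if __name__ == "__main__":
    for name, cmd in build_plink_commands("study", "run1").items():
        print(f"{name}: {as_shell(cmd)}")
```

For a quantitative phenotype, `--linear` would take the place of `--logistic`. In SNPTrack itself these steps are driven through the gPLINK interface rather than typed by hand, so the sketch is meant only to make the sequence of operations concrete.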

Availability

The SNPTrack client application works on all major operating systems, including Windows, Linux, and Mac. An instance of the SNPTrack server is hosted by the FDA and is freely available at http://www.fda.gov/ScienceResearch/BioinformaticsTools/SNPTrack/default.htm. Users may also request the software for a local installation. Manuals and sample data are available at the above website.

Conclusions

SNPTrack is a one-stop-shop system for managing, analyzing, and interpreting genetic association data. It combines centralized storage with the ability to perform complicated genetic association analyses on a large number of SNPs for identification of genetic biomarkers, and to find related genes, pathways, and GO terms. SNPTrack is not only used by the FDA for review and analysis of genetic data but is also freely available to the public.

Endnote

The views presented in this article do not necessarily reflect those of the Food and Drug Administration.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

JX and GZ developed SNPTrack. WT and HF conceived the original idea and methods; JX and SH guided the development. ST, SH, and JX contributed to the construction of SNPTrack databases. HF, DD, RK, and HH contributed to testing and improving the software. JX, RK, DD, and HF wrote the first draft. HH and WT improved the manuscript. All authors read and approved the final manuscript.

7 in total

1. Development of public toxicogenomics software for microarray data management and analysis.

Authors: Weida Tong; Stephen Harris; Xiaoxi Cao; Hong Fang; Leming Shi; Hongmei Sun; James Fuscoe; Angela Harris; Huixiao Hong; Qian Xie; Roger Perkins; Dan Casciano
Journal: Mutat Res Date: 2004-05-18 Impact factor: 2.433

2. Haploview: analysis and visualization of LD and haplotype maps.

Authors: J C Barrett; B Fry; J Maller; M J Daly
Journal: Bioinformatics Date: 2004-08-05 Impact factor: 6.937

3. Impact of microarray data quality on genomic data submissions to the FDA.

Authors: Felix W Frueh
Journal: Nat Biotechnol Date: 2006-09 Impact factor: 54.908

4. PLINK: a tool set for whole-genome association and population-based linkage analyses.

Authors: Shaun Purcell; Benjamin Neale; Kathe Todd-Brown; Lori Thomas; Manuel A R Ferreira; David Bender; Julian Maller; Pamela Sklar; Paul I W de Bakker; Mark J Daly; Pak C Sham
Journal: Am J Hum Genet Date: 2007-07-25 Impact factor: 11.025

5. A second generation human haplotype map of over 3.1 million SNPs.

Authors: Kelly A Frazer; Dennis G Ballinger; David R Cox; David A Hinds; Laura L Stuve; Richard A Gibbs; John W Belmont; Andrew Boudreau; Paul Hardenbol; Suzanne M Leal; Shiran Pasternak; David A Wheeler; Thomas D Willis; Fuli Yu; Huanming Yang; Changqing Zeng; Yang Gao; Haoran Hu; Weitao Hu; Chaohua Li; Wei Lin; Siqi Liu; Hao Pan; Xiaoli Tang; Jian Wang; Wei Wang; Jun Yu; Bo Zhang; Qingrun Zhang; Hongbin Zhao; Hui Zhao; Jun Zhou; Stacey B Gabriel; Rachel Barry; Brendan Blumenstiel; Amy Camargo; Matthew Defelice; Maura Faggart; Mary Goyette; Supriya Gupta; Jamie Moore; Huy Nguyen; Robert C Onofrio; Melissa Parkin; Jessica Roy; Erich Stahl; Ellen Winchester; Liuda Ziaugra; David Altshuler; Yan Shen; Zhijian Yao; Wei Huang; Xun Chu; Yungang He; Li Jin; Yangfan Liu; Yayun Shen; Weiwei Sun; Haifeng Wang; Yi Wang; Ying Wang; Xiaoyan Xiong; Liang Xu; Mary M Y Waye; Stephen K W Tsui; Hong Xue; J Tze-Fei Wong; Luana M Galver; Jian-Bing Fan; Kevin Gunderson; Sarah S Murray; Arnold R Oliphant; Mark S Chee; Alexandre Montpetit; Fanny Chagnon; Vincent Ferretti; Martin Leboeuf; Jean-François Olivier; Michael S Phillips; Stéphanie Roumy; Clémentine Sallée; Andrei Verner; Thomas J Hudson; Pui-Yan Kwok; Dongmei Cai; Daniel C Koboldt; Raymond D Miller; Ludmila Pawlikowska; Patricia Taillon-Miller; Ming Xiao; Lap-Chee Tsui; William Mak; You Qiang Song; Paul K H Tam; Yusuke Nakamura; Takahisa Kawaguchi; Takuya Kitamoto; Takashi Morizono; Atsushi Nagashima; Yozo Ohnishi; Akihiro Sekine; Toshihiro Tanaka; Tatsuhiko Tsunoda; Panos Deloukas; Christine P Bird; Marcos Delgado; Emmanouil T Dermitzakis; Rhian Gwilliam; Sarah Hunt; Jonathan Morrison; Don Powell; Barbara E Stranger; Pamela Whittaker; David R Bentley; Mark J Daly; Paul I W de Bakker; Jeff Barrett; Yves R Chretien; Julian Maller; Steve McCarroll; Nick Patterson; Itsik Pe'er; Alkes Price; Shaun Purcell; Daniel J Richter; Pardis Sabeti; Richa Saxena; Stephen F Schaffner; Pak C Sham; Patrick Varilly; David Altshuler; Lincoln D Stein; Lalitha Krishnan; Albert Vernon Smith; Marcela K Tello-Ruiz; Gudmundur A Thorisson; Aravinda Chakravarti; Peter E Chen; David J Cutler; Carl S Kashuk; Shin Lin; Gonçalo R Abecasis; Weihua Guan; Yun Li; Heather M Munro; Zhaohui Steve Qin; Daryl J Thomas; Gilean McVean; Adam Auton; Leonardo Bottolo; Niall Cardin; Susana Eyheramendy; Colin Freeman; Jonathan Marchini; Simon Myers; Chris Spencer; Matthew Stephens; Peter Donnelly; Lon R Cardon; Geraldine Clarke; David M Evans; Andrew P Morris; Bruce S Weir; Tatsuhiko Tsunoda; James C Mullikin; Stephen T Sherry; Michael Feolo; Andrew Skol; Houcan Zhang; Changqing Zeng; Hui Zhao; Ichiro Matsuda; Yoshimitsu Fukushima; Darryl R Macer; Eiko Suda; Charles N Rotimi; Clement A Adebamowo; Ike Ajayi; Toyin Aniagwu; Patricia A Marshall; Chibuzor Nkwodimmah; Charmaine D M Royal; Mark F Leppert; Missy Dixon; Andy Peiffer; Renzong Qiu; Alastair Kent; Kazuto Kato; Norio Niikawa; Isaac F Adewole; Bartha M Knoppers; Morris W Foster; Ellen Wright Clayton; Jessica Watkin; Richard A Gibbs; John W Belmont; Donna Muzny; Lynne Nazareth; Erica Sodergren; George M Weinstock; David A Wheeler; Imtaz Yakub; Stacey B Gabriel; Robert C Onofrio; Daniel J Richter; Liuda Ziaugra; Bruce W Birren; Mark J Daly; David Altshuler; Richard K Wilson; Lucinda L Fulton; Jane Rogers; John Burton; Nigel P Carter; Christopher M Clee; Mark Griffiths; Matthew C Jones; Kirsten McLay; Robert W Plumb; Mark T Ross; Sarah K Sims; David L Willey; Zhu Chen; Hua Han; Le Kang; Martin Godbout; John C Wallenburg; Paul L'Archevêque; Guy Bellemare; Koji Saeki; Hongguang Wang; Daochang An; Hongbo Fu; Qing Li; Zhen Wang; Renwu Wang; Arthur L Holden; Lisa D Brooks; Jean E McEwen; Mark S Guyer; Vivian Ota Wang; Jane L Peterson; Michael Shi; Jack Spiegel; Lawrence M Sung; Lynn F Zacharia; Francis S Collins; Karen Kennedy; Ruth Jamieson; John Stewart
Journal: Nature Date: 2007-10-18 Impact factor: 49.962

6. ArrayTrack: a free FDA bioinformatics tool to support emerging biomedical research--an update.

Authors: Joshua Xu; Reagan Kelly; Hong Fang; Weida Tong
Journal: Hum Genomics Date: 2010-08 Impact factor: 4.639

7. Two new ArrayTrack libraries for personalized biomedical research.

Authors: Joshua Xu; Carolyn Wise; Vijayalakshmi Varma; Hong Fang; Baitang Ning; Huixiao Hong; Weida Tong; Jim Kaput
Journal: BMC Bioinformatics Date: 2010-10-07 Impact factor: 3.169
