
QuickSNP: an automated web server for selection of tagSNPs.

Deepak Grover, Alonzo S Woodfield, Ranjana Verma, Peter P Zandi, Douglas F Levinson, James B Potash.

Abstract

Although large-scale genetic association studies involving hundreds to thousands of SNPs have become feasible, the associated cost is substantial. Even with the increased efficiency introduced by the use of tagSNPs, researchers are often seeking ways to maximize resource utilization given a set of SNP-based gene-mapping goals. We have developed a web server named QuickSNP in order to provide cost-effective selection of SNPs, and to fill in some of the gaps in existing SNP selection tools. One useful feature of QuickSNP is the option to select only gene-centric SNPs from a chromosomal region in an automated fashion. Other useful features include automated selection of coding non-synonymous SNPs, SNP filtering based on inter-SNP distances and information regarding the availability of genotyping assays for SNPs and whether they are present on whole genome chips. The program produces user-friendly summary tables and results, and a link to a UCSC Genome Browser track illustrating the position of the selected tagSNPs in relation to genes and other genomic features. We hope the unique combination of features of this server will be useful for researchers aiming to select markers for their genotyping studies. The server is freely available and can be accessed at the URL http://bioinformoodics.jhmi.edu/quickSNP.pl.

Year:  2007        PMID: 17517769      PMCID: PMC1933212          DOI: 10.1093/nar/gkm329

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The biggest challenge in human genetics currently is to identify the genes whose alleles confer susceptibility to disease. It is believed that there will be many loci that increase the risk for each common disease (1). Since each causative gene may make only a very modest contribution to disease risk, identification of particular susceptibility variants becomes quite difficult. While genetic association studies have been used in gene mapping, their efficiency has been limited because they have typically assessed only one or a few genes at a time. The development of new SNP genotyping technologies, which can handle from dozens to hundreds of thousands of SNPs, and large numbers of samples, promises to accelerate gene mapping. Current platforms include the Illumina BeadArray and BeadChip systems (2,3), the Affymetrix GeneChip Mapping Arrays (4) and Applied Biosystems' TaqMan SNP Genotyping Assays (5). The newest technologies, while powerful, bring with them substantial costs, as they can involve as many as hundreds of millions of genotypes. For this reason, researchers have been trying to devise ways to maximize efficiency of resource utilization given a set of SNP-based gene-mapping goals. It has been found that the pattern of linkage disequilibrium (LD) varies across the human genome and that there are discrete regions of high LD in the genome, called haplotype blocks (6). Most variation in populations can be characterized by a small number of common haplotypes. By selecting SNPs that uniquely identify or ‘tag’ these haplotypes, the number of markers and, hence, the cost of genotyping can be significantly reduced. The approach became more powerful with the availability of genetic data from the International HapMap Project (7), which contains genotype data for ∼4 million SNPs from each of four populations: Yoruba from Ibadan, Nigeria (YRI), Japanese from Tokyo (JPT), Chinese from Beijing (CHB) and United States residents with European ancestry (CEU).
Even with the increased efficiency introduced by tagSNPs, investigators are typically in the position of having to make strategic decisions about which set of tagSNPs to study. One strategy is to focus on those within genes, as these have the greatest likelihood of being functionally relevant or being in LD with those that are functional (8). Recently, a similar question was explored using empirical data from the HapMap-ENCODE project; tagSNPs chosen to capture common variation in exonic as well as evolutionarily conserved regions yielded genotype savings compared with a tagging approach that captured all common variation across the region (9). While the extent to which functionally important elements in the genome reside strictly within and near genes is not known, a gene-centric genotyping strategy may be a reasonable approach to searching for disease susceptibility alleles in the setting of limited resources. The choice of SNPs for genetic association testing, thus, is a crucial step that will directly affect both the cost and the outcome of studies. Since the number of SNPs can range into the thousands, manual selection can be extremely time-consuming. There are some useful internet-based tools available for selection and prioritization of SNPs for genotyping. These include SNPper (10) (http://snpper.chip.org/), TAMAL (11) (http://neoref.ils.unc.edu/tamal/index.jsp), SNPSelector (12) (http://snpselector.duhs.duke.edu/hqsnp36.html), SNPHunter (13) (http://www.hsph.harvard.edu/ppg/software.htm), PupasView (14) (http://pupasview.bioinfo.cipf.es/) and tagger (15) (http://www.broad.mit.edu/mpg/tagger/). These programs have a variety of strengths as well as limitations. Among the gaps: most of them do not allow for automated selection of gene-based SNPs in a region, and none examines SNP coverage on genome-wide microarray SNP genotyping platforms. 
We have developed a web server named QuickSNP to provide selection of tagSNPs in a chromosomal region, and to fill in some of the gaps in existing SNP selection tools. One useful feature of QuickSNP is the option to input the coordinates of a chromosomal region and have the program select SNPs, in an automated fashion, only from the genes within that region. Other useful features include automated selection of coding non-synonymous SNPs, SNP filtering based on inter-SNP distances, and reporting of whether SNPs have available assays or are present on whole genome chips. We believe this tool will be particularly useful in two situations: (i) when planning an LD-mapping study of a region de novo, having decided for any of a number of reasons to focus on genes, and (ii) when one plans to obtain, or has obtained, data from a genome-wide association chip and wants to ‘fill in’ a particular region, either because the chip scan produced a positive result or because of other information (e.g. a linkage peak or interest in a particular gene pathway), by finding additional tagSNPs as well as coding non-synonymous SNPs in genes in the region.

MATERIALS AND METHODS

Implementation

QuickSNP uses Apache as its web server, and CGI (Common Gateway Interface) scripts handle dataflow and validation to and from a dynamic HTML interface that uses cascading style sheet objects and integrated JavaScript. The data extraction and manipulation portion of the program is written as Perl modules and embeds two other programs in the main code: Haploview, a freely available Java-based utility, and liftOver, a freely available Linux command-line application. QuickSNP is available at the URL http://bioinformoodics.jhmi.edu/quickSNP.pl. It is located on a cluster of processors running Linux at the Johns Hopkins McKusick-Nathans Institute for Genetic Medicine. All databases are downloaded and stored locally on the Linux cluster. Files uploaded by users are not copied to a file server; instead, data are extracted dynamically from these files using CGI file handles, so information uploaded by users is not retained on the server.

Functionality

The basic function of QuickSNP is to generate a list of tagSNPs in a given chromosomal region, or for the genes in that region, or for any specified list of genes. For genomic position (whole region) searches, genotype data for SNPs lying in the region are extracted from the HapMap database (Figure 1). For genomic position (genes only) and gene name-based queries, gene coordinates are first extracted from the Entrez gene database and then SNP genotype data for those positions are extracted from HapMap. If the genomic position entered is not for NCBI build 35 (May 2004), it is first converted to build 35 coordinates by the liftOver program. The genomic position is also adjusted according to the length of flanking sequence used. The resulting SNP list is passed to the Haploview program, which generates tagSNPs based on the tagger algorithm (15) using the user-specified r², minimum minor allele frequency and include/exclude tag specifications (if any).
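
The tagging step can be illustrated with a simplified greedy pairwise procedure. The sketch below is not the Haploview or tagger implementation; it only shows the general idea of applying a minor allele frequency cutoff and then choosing tags until every remaining SNP has r² at or above the threshold with at least one tag. The function name, the SNP identifiers and all numeric values are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def greedy_tag_snps(
    snps: List[str],
    maf: Dict[str, float],
    r2: Dict[Tuple[str, str], float],
    r2_threshold: float = 0.8,
    min_maf: float = 0.1,
) -> List[str]:
    """Simplified greedy pairwise tagging (illustration only)."""
    # Step 1: drop SNPs whose minor allele frequency is below the cutoff.
    candidates = [s for s in snps if maf[s] >= min_maf]

    # Step 2: record which SNPs each candidate would capture.
    # Every SNP captures itself; pairs at or above the threshold capture each other.
    captures: Dict[str, Set[str]] = {s: {s} for s in candidates}
    for (a, b), value in r2.items():
        if value >= r2_threshold and a in captures and b in captures:
            captures[a].add(b)
            captures[b].add(a)

    # Step 3: repeatedly pick the SNP that captures the most untagged SNPs.
    # max() returns the first maximal element, so ties follow input order.
    untagged = set(candidates)
    tags: List[str] = []
    while untagged:
        best = max(candidates, key=lambda s: len(captures[s] & untagged))
        tags.append(best)
        untagged -= captures[best]
    return tags


# Hypothetical toy data: five SNPs, one of which fails the MAF cutoff.
example_snps = ["snp1", "snp2", "snp3", "snp4", "snp5"]
example_maf = {"snp1": 0.30, "snp2": 0.25, "snp3": 0.40, "snp4": 0.12, "snp5": 0.02}
example_r2 = {
    ("snp1", "snp2"): 0.95,
    ("snp2", "snp3"): 0.85,
    ("snp1", "snp3"): 0.60,
    ("snp3", "snp4"): 0.30,
    ("snp4", "snp5"): 0.90,
}
example_tags = greedy_tag_snps(example_snps, example_maf, example_r2)
print(example_tags)
```

In this toy input, snp2 captures snp1 and snp3, while snp4 has no partner above the threshold once snp5 is removed by the frequency cutoff, so it must tag itself. Forced inclusion or exclusion of particular SNPs is not modelled here.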

Figure 1. Schematic overview of the functioning of the QuickSNP web server.

There are categories of options that the user can select to obtain the best results from QuickSNP (see below). Input options like include/exclude tags and coding non-synonymous SNPs are used before tagSNP selection and they affect the list of SNPs that is used by Haploview for tagSNP selection. After selection of tagSNPs, the results can be further filtered by options such as removing SNPs lying too close (by a user-defined distance criterion). The user sees a results web page with two types of output. One is the core output consisting of a summary statistics table, a file displaying tagSNPs selected, and another file displaying pairwise LD tests as well as LD bins. The other type of output consists of additional information, including tables for genotype and allele frequencies, for the occurrence of SNPs in whole-genome chips and assays, and for the cost of genotyping. There is also a link to a graphical display of tagSNPs in the UCSC Genome Browser.

User interface

We have attempted to create a simplified user interface for QuickSNP so that it can be employed without the need for sophisticated computational skills. The input screen is divided into three main sections: input method, search conditions and additional options. The user may enter either genomic positions or gene names in the search window of the input method section. Multiple gene names can be entered by either typing in the corresponding window or uploading a file with a list of gene names. For the genomic position-based searches, users can further specify whether they wish to consider the whole region or just the genes within the specified region for tagSNP selection. The user is then required, in the search conditions section, to enter the desired r², minor allele frequency and HapMap population. To access the basic functionality of the server, the user need not consider the section containing additional options. However, depending on the study design, these options can enable more judicious and efficient selection of tagSNPs. For example, the user may want to include or exclude certain SNPs (based on availability of PCR primers or on past performance of genotyping assays). The user has the option to include flanking sequence around genes, reject SNPs that are too close to each other (because they are less likely to work with certain genotyping platforms), and force the inclusion of coding non-synonymous SNPs, which can be identified and included automatically by QuickSNP through a search of the whole-genome coding SNP database. There are other result-related options that display various kinds of information for the chosen tagSNPs. These include the cost of genotyping using some popular methods, allele and genotype frequencies of tagSNPs in four HapMap populations, and occurrence of tagSNPs in available whole-genome chips and assays.
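
The option to reject SNPs that lie too close to each other can be pictured as a single pass over SNPs sorted by position. The following is a minimal sketch under stated assumptions, not QuickSNP's actual rule: the text does not say which SNP of a close pair is dropped, so this version keeps the first SNP encountered and discards later ones that fall within the minimum distance of the last kept SNP. All SNPs are assumed to be on one chromosome, and the function name, identifiers and positions are hypothetical.

```python
from typing import List, Tuple


def filter_by_spacing(
    snps: List[Tuple[str, int]], min_distance: int
) -> List[Tuple[str, int]]:
    """Keep SNPs that are at least min_distance bp from the previously kept SNP."""
    kept: List[Tuple[str, int]] = []
    # Walk the SNPs in order of base-pair position.
    for name, position in sorted(snps, key=lambda item: item[1]):
        # Assumption: the earlier SNP wins; a later SNP is rejected when it is
        # closer than min_distance to the last SNP that was kept.
        if not kept or position - kept[-1][1] >= min_distance:
            kept.append((name, position))
    return kept


# Hypothetical positions with a 60 bp minimum spacing.
spaced = filter_by_spacing(
    [("snpA", 1000), ("snpB", 1030), ("snpC", 1060), ("snpD", 1200)],
    min_distance=60,
)
print(spaced)
```

Here snpB is rejected because it lies 30 bp from snpA, while snpC is kept because it is exactly 60 bp from the last kept SNP. A production filter would also need a policy for SNPs that the user has force-included, which this sketch ignores.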
For genomic position-based queries, the user also has the option to graphically visualize the tagSNPs in relation to genes, transcripts, conserved regions and other genomic features in that region using the UCSC Genome Browser (http://genome.ucsc.edu/cgi-bin/hgGateway). The results are generated in the form of a zipped archive containing multiple files (for multiple genes), as well as text files corresponding to an individual gene or region. For each gene/region, one file lists the tagSNPs chosen and another gives details of the LD tests and bins. A summary table is also generated that displays the number of SNPs in the HapMap database for the gene/region queried and the eventual number of tagSNPs selected by QuickSNP for individual genes as well as the whole region. If the include/exclude tag options and/or the coding non-synonymous SNP option are used in a QuickSNP search, an additional result table is generated that lists the included or excluded SNPs, which of them were used in the tagSNP search (only those with genotype data in the HapMap database can be used for tagging), and their type (user-specified include/exclude tags versus coding non-synonymous SNPs). Other results are also generated based on the additional options used for the QuickSNP search (Figure 2).

Figure 2. Snapshot of the results page generated by QuickSNP for a typical search with various components highlighted and explained.

There are three levels of help available to QuickSNP users: (a) QuickHelp, which can be accessed by clicking on the [?] symbol next to each option, and which briefly explains the purpose of that option; (b) frequently asked questions, which provides more detail and (c) direct contact with the authors, available by emailing us at QuickSNP@jhmi.edu.

VALIDATION AND USAGE EXAMPLE

We validated the core functionality of QuickSNP for various genes and genomic regions by comparing results from QuickSNP to those derived from a manual tagSNP selection using HapMap and the tagger algorithm in Haploview. Since many QuickSNP options and features are unique to this tool, they could not be compared to existing automated resources for tagSNP selection. For those cases, we manually performed the analysis steps for some of the options (for example, gene-based searches in a genomic region including coding non-synonymous SNPs), and compared the results with those generated in an automated manner by QuickSNP. The results generated by QuickSNP were always in agreement with those generated by the manual procedures.

We extensively used QuickSNP to select tagSNPs for a 6 Mb region on chromosome 17 that produced evidence for linkage to major depressive disorder in our Genetics of Recurrent Early Onset Depression (GenRED) collaborative project (16). Our aim was to select SNPs for an initial LD mapping association study of this region using the Illumina BeadStation custom genotyping platform. The region contained a total of ∼8000 HapMapII SNPs. Using criteria of r² ≥ 0.8 and MAF ≥ 0.1, there were 1526 tagSNPs selected from across the full region and an additional 438 coding non-synonymous SNPs. Our project budget allowed us to study approximately 800 SNPs from the region in this initial experiment, so that excellent tagSNP coverage could be achieved if we focused on genes and their associated regulatory regions. We searched the region with QuickSNP using the genomic position, genes-only input method, force including coding non-synonymous and some previously genotyped SNPs, and rejecting SNPs that were closer than 60 bp. We performed various searches for different r², MAF values and lengths of flanking region around genes. Table 1 shows the number of tagSNPs selected by QuickSNP using different combinations of parameters. We elected to genotype the 809 SNPs that resulted from tagging with r² = 0.8, MAF ≥ 0.1 and a 5 kb flanking region on either side of each gene.

Table 1. Number of tagSNPs selected for chromosome 17p 6.59–13 Mb region

| Flanking region around genes | r² = 0.8, MAF 0.05 | r² = 0.8, MAF 0.1 | r² = 0.9, MAF 0.05 | r² = 0.9, MAF 0.1 |
|---|---|---|---|---|
| 1 kb | 875 | 739 | 912 | 773 |
| 3 kb | 922 | 777 | 960 | 812 |
| 5 kb | 960 | 809 | 999 | 845 |

Different lengths of flanking regions around genes and values of r² and minor allele frequency cutoff were selected to generate these data.
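
The counts in Table 1 can be compared against a genotyping budget programmatically. The sketch below encodes the table and reports which parameter combination comes closest to a target number of SNPs, and which is the largest that stays within it. The authors do not describe a formal selection rule, so both helper functions are illustrative assumptions rather than the procedure they followed.

```python
from typing import Dict, Tuple

# Key: (r2 threshold, MAF cutoff, flanking region in kb); value: tagSNP count (Table 1).
Params = Tuple[float, float, int]
TABLE_1: Dict[Params, int] = {
    (0.8, 0.05, 1): 875, (0.8, 0.1, 1): 739, (0.9, 0.05, 1): 912, (0.9, 0.1, 1): 773,
    (0.8, 0.05, 3): 922, (0.8, 0.1, 3): 777, (0.9, 0.05, 3): 960, (0.9, 0.1, 3): 812,
    (0.8, 0.05, 5): 960, (0.8, 0.1, 5): 809, (0.9, 0.05, 5): 999, (0.9, 0.1, 5): 845,
}


def closest_to_budget(table: Dict[Params, int], budget: int) -> Tuple[Params, int]:
    """Return the parameter set whose tagSNP count is nearest the budget."""
    return min(table.items(), key=lambda item: abs(item[1] - budget))


def largest_within_budget(table: Dict[Params, int], budget: int) -> Tuple[Params, int]:
    """Return the parameter set with the most tagSNPs not exceeding the budget."""
    affordable = {params: n for params, n in table.items() if n <= budget}
    return max(affordable.items(), key=lambda item: item[1])


print(closest_to_budget(TABLE_1, 800))      # nearest count to roughly 800 SNPs
print(largest_within_budget(TABLE_1, 800))  # strict upper limit of 800 SNPs
```

With a target of 800, the nearest entry is the 809-SNP set (r² = 0.8, MAF 0.1, 5 kb flank), which is the combination reported as genotyped; under a strict ceiling of 800 the largest set would instead be the 777-SNP entry with a 3 kb flank.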

SUMMARY

QuickSNP offers many useful features (see Table 2 for a comparison with other available programs):

Table 2. Comparison of QuickSNP features with those of comparable software programs

| Feature | SNPper | SNPSelector | SNPHunter | TAMAL | PupasView | tagger | QuickSNP |
|---|---|---|---|---|---|---|---|
| INPUT-RELATED FEATURES | | | | | | | |
| Gene name | Yes | Yes | Yes | Yes | Yes | No | Yes |
| Chromosomal position | Yes | Yes | No | No | Yes | Yes | Yes |
| Chromosomal band | Yes | No | No | No | Yes | No | No |
| Batch query for gene names | Yes | Yes | No | Yes | Yes | No | Yes |
| Conversion of coordinates between different genome assemblies | No | No | No | No | No | No | Yes |
| Gene-centric tag selection in a chromosomal region | No | Yes | No | No | No | No | Yes |
| FILTERING OPTIONS | | | | | | | |
| MAF | No | Yes | No | No | No | Yes | Yes |
| r² | No | Yes | No | No | No | Yes | Yes |
| Option to force include/exclude SNPs | No | Yes | No | No | No | Yes | Yes |
| Selection of only relevant include/exclude SNPs for tagging* | No | No | No | No | No | No | Yes |
| Include flanking region around genes | Yes | Yes | Yes | Yes | Yes | No | Yes |
| Automatic inclusion of coding non-synonymous SNPs for tagging | No | No | No | No | No | No | Yes |
| Spacing between SNPs | Yes | Yes | Yes | No | No | Yes | Yes |
| OUTPUT-RELATED FEATURES | | | | | | | |
| Allele and genotype frequencies for selected tagSNPs | No | No | No | No | Yes | No | Yes |
| Financial cost of genotyping | No | No | No | No | No | No | Yes |
| Occurrence of tagSNPs in popular whole genome chips and assays | No | No | No | No | No | No | Yes |
| Representation of tag SNPs in UCSC genome browser | No | Yes | No | Yes | No | No | Yes |

*This option predetermines which of the include/exclude SNPs are present in HapMap database for the given population, and uses only those for tagging. If this criterion is not used, the whole search aborts if any one of the include/exclude tag is absent in the HapMap database.

†Only allele frequencies, but not genotype frequencies.

- Allows for a gene-centric approach to tagSNP selection;
- Accepts multiple gene names as input;
- Allows for automatic conversion of coordinates between different genome assemblies;
- Provides the option to include flanking sequence around genes;
- Provides the option to reject SNPs that are too closely spaced, since they are less likely to work in some genotyping platforms;
- Calculates the cost for the genotyping study;
- For the ‘include tag’ and ‘exclude tag’ options, predetermines which SNPs are present in the HapMap database for the given population, and implements inclusion or exclusion of only those (in other existing tools, the whole search aborts if any include/exclude tag is absent from the HapMap database);
- Automatically includes coding non-synonymous SNPs in the region, if specified by the user;
- Displays selected tagSNPs in the UCSC Genome Browser;
- Reports allele and genotype frequencies for tagSNPs in different populations;
- Reports the number of SNPs that have available assays or are present on whole genome chips provided by commercial genotyping platforms; and
- Provides a user-friendly summary table, and downloadable results files.

In the last few years, millions of new SNPs have been identified, and SNP genotyping technologies have developed rapidly. Investigators need to determine how to select SNPs for study in a chromosomal region in a manner that is efficient while still preserving power. There is a need for new tools, which can perform these functions in an automated manner.
QuickSNP provides all of the basic SNP selection functions present in existing tools, while adding additional features.

FUTURE DIRECTIONS

At present, QuickSNP can handle regions as large as 5 Mb (for a genomic position-based search) or 40 genes (for a gene-name based search). In the future, we will attempt to optimize the algorithm and/or upgrade the hardware in order to increase this search limit. Since QuickSNP uses many public domain datasets, we will download and integrate the new updates as soon as they are released. An additional feature we are developing is the ability to assess the SNP coverage within genome-wide platforms for any given gene or genomic region. We will always welcome suggestions and bug reports by users, and will try to respond to these promptly.
  16 in total

1.  Allelic discrimination using fluorogenic probes and the 5' nuclease assay.

Authors:  K J Livak
Journal:  Genet Anal       Date:  1999-02

2.  SNPper: retrieval and analysis of human SNPs.

Authors:  A Riva; I S Kohane
Journal:  Bioinformatics       Date:  2002-12       Impact factor: 6.937

3.  Genotyping over 100,000 SNPs on a pair of oligonucleotide arrays.

Authors:  Hajime Matsuzaki; Shoulian Dong; Halina Loi; Xiaojun Di; Guoying Liu; Earl Hubbell; Jane Law; Tam Berntsen; Monica Chadha; Henry Hui; Geoffrey Yang; Giulia C Kennedy; Teresa A Webster; Simon Cawley; P Sean Walsh; Keith W Jones; Stephen P A Fodor; Rui Mei
Journal:  Nat Methods       Date:  2004-11       Impact factor: 28.547

Review 4.  Genome-wide association studies: theoretical and practical concerns.

Authors:  William Y S Wang; Bryan J Barratt; David G Clayton; John A Todd
Journal:  Nat Rev Genet       Date:  2005-02       Impact factor: 53.242

5.  A genome-wide scalable SNP genotyping assay using microarray technology.

Authors:  Kevin L Gunderson; Frank J Steemers; Grace Lee; Leo G Mendoza; Mark S Chee
Journal:  Nat Genet       Date:  2005-04-17       Impact factor: 38.330

Review 6.  Searching for genetic determinants in the new millennium.

Authors:  N J Risch
Journal:  Nature       Date:  2000-06-15       Impact factor: 49.962

7.  BeadArray technology: enabling an accurate, cost-effective approach to high-throughput genotyping.

Authors:  Arnold Oliphant; David L Barker; John R Stuelpnagel; Mark S Chee
Journal:  Biotechniques       Date:  2002-06       Impact factor: 1.993

8.  Genetics of recurrent early-onset major depression (GenRED): final genome scan report.

Authors:  Peter Holmans; Myrna M Weissman; George S Zubenko; William A Scheftner; Raymond R Crowe; J Raymond Depaulo; James A Knowles; Wendy N Zubenko; Kathleen Murphy-Eberenz; Diana H Marta; Sandra Boutelle; Melvin G McInnis; Philip Adams; Madeline Gladis; Jo Steele; Erin B Miller; James B Potash; Dean F Mackinnon; Douglas F Levinson
Journal:  Am J Psychiatry       Date:  2007-02       Impact factor: 18.112

9.  SNPHunter: a bioinformatic software for single nucleotide polymorphism data acquisition and management.

Authors:  Lin Wang; Simin Liu; Tianhua Niu; Xin Xu
Journal:  BMC Bioinformatics       Date:  2005-03-18       Impact factor: 3.169

10.  PupasView: a visual tool for selecting suitable SNPs, with putative pathological effect in genes, for genotyping purposes.

Authors:  Lucía Conde; Juan M Vaquerizas; Carles Ferrer-Costa; Xavier de la Cruz; Modesto Orozco; Joaquín Dopazo
Journal:  Nucleic Acids Res       Date:  2005-07-01       Impact factor: 16.971
