
EvoSNP-DB: A database of genetic diversity in East Asian populations.

Young Uk Kim, Young Jin Kim, Jong-Young Lee, Kiejung Park.

Abstract

Genome-wide association studies (GWAS) have become popular as an approach for the identification of large numbers of phenotype-associated variants. However, differences in genetic architecture and environmental factors mean that the effect of variants can vary across populations. Understanding population genetic diversity is valuable for the investigation of possible population specific and independent effects of variants. EvoSNP-DB aims to provide information regarding genetic diversity among East Asian populations, including Chinese, Japanese, and Korean. Non-redundant SNPs (1.6 million) were genotyped in 54 Korean trios (162 samples) and were compared with 4 million SNPs from HapMap phase II populations. EvoSNP-DB provides two user interfaces for data query and visualization, and integrates scores of genetic diversity (Fst and VarLD) at the level of SNPs, genes, and chromosome regions. EvoSNP-DB is a web-based application that allows users to navigate and visualize measurements of population genetic differences in an interactive manner, and is available online at [http://biomi.cdc.go.kr/EvoSNP/].

Year:  2013        PMID: 23977990      PMCID: PMC4133910          DOI: 10.5483/bmbrep.2013.46.8.191

Source DB:  PubMed          Journal:  BMB Rep        ISSN: 1976-6696            Impact factor:   4.778


INTRODUCTION

Recent developments in high-throughput SNP chip technologies have enabled researchers to conduct large-scale genome-wide association studies (GWAS) (1-6). These have revealed an unprecedented number of genetic variants that are associated with complex traits (7). As of August 2012, there were 1,330 publications and 6,848 phenotype-associated SNPs in the NHGRI GWAS catalogue (http://www.genome.gov/gwastudies/). The availability of plentiful phenotype-related genomic information is expected to lead to clinically applicable personal genomics in the near future (8,9); however, a number of issues require attention before this can be widely applied. First, causal variants must be identified and known loci functionally investigated (10,11). GWAS have localized associated signals to specific genomic regions; however, most identified variants are located within intergenic, intronic, and gene desert regions, and are regarded as proxies for causal variants. Further analysis, such as fine mapping and resequencing, is required to unveil causal variants of phenotypes. Only a small number of genes in close proximity to associated variants have been examined to identify possible functional relationships with phenotypes. Second, the majority of GWAS have been conducted on populations with European ancestry. Findings derived from European populations should be validated before they are applied to other ethnicities, such as those of Asian or African ancestry. Although some recent GWAS have been conducted on ethnic groups other than Europeans, sample sizes and numbers of target phenotypes have been relatively small compared with studies of Europeans (2,6,12). It is important to consider population-specific associations for personal genomics applications, as phenotype associations regularly vary across populations (3-5). Population-specific or independent associations of variants can be identified by GWAS in a specific population, or by independent replication studies.
However, these approaches require a large number of samples, compounding the high costs associated with genotyping. As an alternative, the genetic diversity of phenotype-associated regions may be examined. Large differences in genetic architecture among populations are an established cause of discrepancies in associations (3-5). The fixation index (Fst) is one of the most widely used metrics for measuring genetic differentiation between populations (13). Variation in linkage disequilibrium (VarLD) is another approach that measures population differences in LD patterns (14). Various web interfaces have been developed to provide user-friendly graphical user interfaces (GUIs) and browsers to access genetic variation data, including Haplotter, FstSNP-HapMap3, SNP@Evolution, and the Singapore Genome Variation Project (SGVP) (15-18). Three of these use only reference information, such as data from HapMap phase 2 and phase 3 (19,20). SGVP also provides information derived from three Southeast Asian populations, as compared to HapMap populations (18). Genetic diversity data for East Asian populations have not previously been provided via a web service. In particular, the Korean population is one of the most intensely studied in East Asia, but there is no web resource providing genetic diversity data that includes Koreans. The two East Asian HapMap populations, Han Chinese in Beijing (CHB) and Japanese in Tokyo (JPT), should not be regarded as references for Koreans (21). We therefore developed EvoSNP-DB, a web resource for genetic diversity among East Asian populations.

RESULTS AND DISCUSSION

We constructed EvoSNP-DB by integrating GBrowse and genotype data from 108 Koreans (founders) and 210 HapMap phase II release #22 samples. After quality control, 1,147,845 SNPs overlapped across Korean and other HapMap populations. The EvoSNP-DB database and web server are implemented on a 24×2.66 GHz Xeon core server running Red Hat Enterprise Linux (version 5.2), Apache (version 2.0), Tomcat (version 5.5), and MySQL (version 5.5). It is viewable in all major web browsers and operating systems, and is available online at [http://biomi.cdc.go.kr/EvoSNP/].

Database design and organization

Data flow through the application is described in Fig. 1. Briefly, genotype data were analyzed to calculate Fst, VarLD, allele frequency, and Hardy-Weinberg equilibrium (HWE). Processed data are stored in the database with annotation information retrieved from UCSC and OMIM. The database is wrapped by GBrowse and JSP for data query and visualization interfaces. Genotype datasets are derived from the International HapMap Phase II release #22 data repository (11,12), including data from 60 Utah residents with ancestry from Northern and Western Europe (CEU), 45 Han Chinese in Beijing (CHB), 45 Japanese in Tokyo (JPT), and 60 Yoruba in Ibadan, Nigeria (YRI). Considering the relatively small number of CHB and JPT samples, we pooled the data of both as a single geographical group, denoted ASN (Asian, 90 samples). The SNP information from 108 Korean founders of 54 trios was compared with that of the HapMap populations. The database has been integrated with Fst and VarLD metrics to facilitate the graphical representation of the data. Fst measures polymorphism within each population and differentiation among geographical groups (13). To quantify variation in population linkage disequilibrium patterns, we used the varLD program (14). HapMap, UCSC, OMIM, and the NHGRI GWAS catalogue were major sources of annotation information.
Fig. 1.

Flow diagram of EvoSNP-DB construction.

User interface and visualization

Within EvoSNP-DB there are user interfaces for data queries and visualization. Three types of query can be applied: (i) SNP identifier, (ii) mRNA ID or gene symbol, and (iii) specific chromosome region. For example, rs28218 could be used for a SNP based search, NM_002124 or ORF4F16 for a gene search, and chr1:157661000..157806000 to search for this chromosomal region. Regardless of the query type, EvoSNP-DB returns three tables providing Region, Gene, and SNP statistics (Fig. 2). Each table contains summary variation scores. Fig. 2 illustrates the output when rs28218 was used as a search term; scores of the gene TRIO, which contains this SNP, are summarized in the gene statistics table. JSP and GMOD (http://gmod.org) were used to build the table and figure interfaces. Links to public online databases, including Entrez Nucleotide, dbSNP, OMIM (22), and HapMap (20), are provided in EvoSNP-DB, together with the results (Fig. 2). EvoSNP-DB also offers a generic genome browser, which displays overviews of chromosomes, contigs, genes, mRNAs, and SNPs (23). Figs. 3 and 4 demonstrate the output if small or large numbers of SNPs exist in the query region, respectively.
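The three query types described above can be told apart by their format. The following is a minimal sketch, assuming identifier patterns inferred only from the examples in the text (rs28218, NM_002124, chr1:157661000..157806000); the function name and regular expressions are hypothetical and do not represent the actual EvoSNP-DB implementation.

```python
import re

# Hypothetical patterns inferred from the example queries in the text.
SNP_RE = re.compile(r"^rs\d+$")
MRNA_RE = re.compile(r"^NM_\d+$")
REGION_RE = re.compile(r"^(chr(?:\d{1,2}|X|Y)):(\d+)\.\.(\d+)$")


def classify_query(query):
    """Return (query_type, parsed_value) for a search string."""
    q = query.strip()
    if not q:
        raise ValueError("empty query")
    if SNP_RE.match(q):
        return "snp", q
    region = REGION_RE.match(q)
    if region:
        chrom = region.group(1)
        start, end = int(region.group(2)), int(region.group(3))
        if start > end:
            raise ValueError("region start must not exceed region end")
        return "region", (chrom, start, end)
    if MRNA_RE.match(q):
        return "mrna", q
    # Assumption: anything else is treated as a gene symbol.
    return "gene", q
```

For instance, `classify_query("chr1:157661000..157806000")` yields a region tuple that a backend could pass to a range lookup, whereas a SNP or gene query would first be resolved to its coordinates.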
Fig. 2.

A screenshot of the result table from EvoSNP-DB.

Fig. 3.

A detailed screenshot showing EvoSNP-DB search results. Top track: chromosomal overview. SNP locations, diamond shapes. OMIM disease associations, rectangles. Second track: VarLD scores visualized along a 2 Mb chromosomal region. Third track: allele frequencies of SNPs, visualized as a pie chart for the Korean population or as towers for HapMap populations. Bottom track: Genes in the region.

Fig. 4.

A wide screenshot showing the search results with OMIM and GWAS Catalogue. Allele frequency is not displayed, but each SNP is indicated.

EvoSNP-DB provides an open-architecture website using a wiki interface for data access (a wiki is a website that allows its users to add, modify, or delete its content via a web browser), and wiki-based SNP annotation will be available in the near future. This will be particularly useful for constructing accurate and informative annotation for variants identified by the collaborative work of many researchers. MySQL, Python, JSP, and GBrowse were used in database construction, and to enhance interface utility (24).

MATERIALS AND METHODS

Korean genotype data

Previously, we conducted GWAS for two independent Korean population-based cohorts (Ansung and Ansan) as part of the Korean Genome Epidemiology Study (KoGES), and the Korean Association REsource (KARE) project, which was initiated in 2007 (2, 4). In the Ansung area, we recruited additional family members of the original participants to facilitate family-based association studies. Among these, 54 trios (162 samples) were genotyped using an Affymetrix Genome-Wide Human SNP array 6.0 and an Illumina Human Omni1-Quad chip. Genotypes were called with Birdseed and BeadStudio GenCall for Affymetrix and Illumina arrays, respectively (25, 26). Initially, ∼1.9 million SNPs from the two platforms (909,622 for the Affymetrix array and 1,010,624 for the Illumina array) were merged. For quality control, we excluded SNPs that met any of the following criteria: non-autosomal location, Mendelian errors, high missing genotype rate (> 5%), or deviation from HWE (P < 1E-6). Filtered SNPs were compared with HapMap SNPs with respect to allele, strand, and genomic position. After excluding 14 SNPs with annotation errors, 1,147,845 SNPs overlapped with HapMap SNPs (27).
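The exclusion criteria above can be expressed as a simple per-SNP filter. This is a minimal sketch, not the authors' pipeline: the `SnpQc` record and its field names are hypothetical, and because the text gives no threshold for Mendelian errors, the sketch assumes that any such error excludes the SNP.

```python
from dataclasses import dataclass

AUTOSOMES = {str(i) for i in range(1, 23)}
MAX_MISSING_RATE = 0.05  # "> 5%" in the text
HWE_P_THRESHOLD = 1e-6   # "P < 1E-6" in the text


@dataclass
class SnpQc:
    """Hypothetical per-SNP summary used only for this illustration."""
    snp_id: str
    chrom: str
    mendelian_errors: int
    missing_rate: float
    hwe_p: float


def passes_qc(snp):
    """Return True if the SNP survives all four exclusion criteria."""
    if snp.chrom not in AUTOSOMES:
        return False
    # Assumption: a single Mendelian error is enough to exclude the SNP.
    if snp.mendelian_errors > 0:
        return False
    if snp.missing_rate > MAX_MISSING_RATE:
        return False
    if snp.hwe_p < HWE_P_THRESHOLD:
        return False
    return True
```

Applied to a list of records, `[s for s in snps if passes_qc(s)]` would give the filtered set that is then compared with HapMap.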

HapMap genotype data

HapMap phase II release #22 data (210 samples) were downloaded. Genotype data were converted to the PLINK binary genotype format, and genotype frequencies, allele frequencies, and P-values of HWE were calculated using PLINK (28).
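To illustrate what these per-SNP statistics represent, the sketch below derives genotype frequencies, allele frequencies, and an HWE P-value from genotype counts. It uses a one-degree-of-freedom chi-square goodness-of-fit test as a simple stand-in; this is an assumption for illustration and is not a reproduction of PLINK's own HWE test. The function name is hypothetical.

```python
import math


def hwe_summary(n_aa, n_ab, n_bb):
    """Summarize one biallelic SNP from its genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    # Chi-square statistic; classes with zero expectation are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {
        "genotype_freqs": tuple(o / n for o in observed),
        "allele_freqs": (p, q),
        "hwe_p": p_value,
    }
```

With counts exactly at Hardy-Weinberg proportions, such as 25/50/25, the statistic is zero and the P-value is 1; a sample with no heterozygotes at intermediate allele frequency gives a very small P-value.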

Analysis of genetic diversity among populations

Fst and VarLD were used as population genetic diversity metrics (13,14). Fst was calculated for each SNP by pairwise comparison of the four populations. Genome-wide VarLD analysis was performed; VarLD scores were calculated for windows of 50 SNPs, starting from the first SNP of each chromosome and ending with the last. All values from the 22 autosomes were merged and standardized (mean=0, standard deviation=1). VarLD analysis procedures were performed for all pairs of populations. To assess the degree of genetic difference between populations, we calculated the quartiles of the Fst and VarLD score distributions.
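The following sketch illustrates the kind of computation described above. It uses a simple heterozygosity-based Fst, (H_T - H_S) / H_T, for one biallelic SNP and two populations; this estimator is an assumption chosen for brevity and may differ from the one in reference 13. The varLD program itself is not reimplemented; only the standardization and quartile steps are shown. All function names are hypothetical.

```python
import statistics


def pairwise_fst(p1, p2, n1, n2):
    """Heterozygosity-based Fst for one biallelic SNP in two populations.

    p1, p2: frequency of the same allele in each population.
    n1, n2: sample sizes used as weights.
    """
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # monomorphic in both populations
    h_within = (n1 * 2 * p1 * (1 - p1) + n2 * 2 * p2 * (1 - p2)) / (n1 + n2)
    return (h_total - h_within) / h_total


def standardize(scores):
    """Rescale scores to mean 0 and standard deviation 1."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [(s - mean) / sd for s in scores]


def quartiles(values):
    """Return the three quartile cut points of a score distribution."""
    return statistics.quantiles(values, n=4)
```

In this sketch, identical allele frequencies give an Fst of 0 and fixed opposite alleles give 1; for four populations, the function would be called once per population pair for each SNP.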
REFERENCES (10 of 28 shown)

1.  The generic genome browser: a building block for a model organism system database.

Authors:  Lincoln D Stein; Christopher Mungall; ShengQiang Shu; Michael Caudy; Marco Mangone; Allen Day; Elizabeth Nickerson; Jason E Stajich; Todd W Harris; Adrian Arva; Suzanna Lewis
Journal:  Genome Res       Date:  2002-10       Impact factor: 9.043

2.  On beyond GWAS.

Authors: 
Journal:  Nat Genet       Date:  2010-07       Impact factor: 38.330

3.  KAREBrowser: SNP database of Korea Association REsource Project.

Authors:  Chang Bum Hong; Young Jin Kim; Sanghoon Moon; Young-Ah Shin; Yoon Shin Cho; Jong-Young Lee
Journal:  BMB Rep       Date:  2012-01       Impact factor: 4.778

4.  A haplotype map of the human genome.

Authors: 
Journal:  Nature       Date:  2005-10-27       Impact factor: 49.962

5.  Potential etiologic and functional implications of genome-wide association loci for human diseases and traits.

Authors:  Lucia A Hindorff; Praveen Sethupathy; Heather A Junkins; Erin M Ramos; Jayashri P Mehta; Francis S Collins; Teri A Manolio
Journal:  Proc Natl Acad Sci U S A       Date:  2009-05-27       Impact factor: 11.205

6.  Singapore Genome Variation Project: a haplotype map of three Southeast Asian populations.

Authors:  Yik-Ying Teo; Xueling Sim; Rick T H Ong; Adrian K S Tan; Jieming Chen; Erwin Tantoso; Kerrin S Small; Chee-Seng Ku; Edmund J D Lee; Mark Seielstad; Kee-Seng Chia
Journal:  Genome Res       Date:  2009-08-21       Impact factor: 9.043

7.  A second generation human haplotype map of over 3.1 million SNPs.

Authors:  Kelly A Frazer; Dennis G Ballinger; David R Cox; David A Hinds; Laura L Stuve; Richard A Gibbs; John W Belmont; Andrew Boudreau; Paul Hardenbol; Suzanne M Leal; Shiran Pasternak; David A Wheeler; Thomas D Willis; Fuli Yu; Huanming Yang; Changqing Zeng; Yang Gao; Haoran Hu; Weitao Hu; Chaohua Li; Wei Lin; Siqi Liu; Hao Pan; Xiaoli Tang; Jian Wang; Wei Wang; Jun Yu; Bo Zhang; Qingrun Zhang; Hongbin Zhao; Hui Zhao; Jun Zhou; Stacey B Gabriel; Rachel Barry; Brendan Blumenstiel; Amy Camargo; Matthew Defelice; Maura Faggart; Mary Goyette; Supriya Gupta; Jamie Moore; Huy Nguyen; Robert C Onofrio; Melissa Parkin; Jessica Roy; Erich Stahl; Ellen Winchester; Liuda Ziaugra; David Altshuler; Yan Shen; Zhijian Yao; Wei Huang; Xun Chu; Yungang He; Li Jin; Yangfan Liu; Yayun Shen; Weiwei Sun; Haifeng Wang; Yi Wang; Ying Wang; Xiaoyan Xiong; Liang Xu; Mary M Y Waye; Stephen K W Tsui; Hong Xue; J Tze-Fei Wong; Luana M Galver; Jian-Bing Fan; Kevin Gunderson; Sarah S Murray; Arnold R Oliphant; Mark S Chee; Alexandre Montpetit; Fanny Chagnon; Vincent Ferretti; Martin Leboeuf; Jean-François Olivier; Michael S Phillips; Stéphanie Roumy; Clémentine Sallée; Andrei Verner; Thomas J Hudson; Pui-Yan Kwok; Dongmei Cai; Daniel C Koboldt; Raymond D Miller; Ludmila Pawlikowska; Patricia Taillon-Miller; Ming Xiao; Lap-Chee Tsui; William Mak; You Qiang Song; Paul K H Tam; Yusuke Nakamura; Takahisa Kawaguchi; Takuya Kitamoto; Takashi Morizono; Atsushi Nagashima; Yozo Ohnishi; Akihiro Sekine; Toshihiro Tanaka; Tatsuhiko Tsunoda; Panos Deloukas; Christine P Bird; Marcos Delgado; Emmanouil T Dermitzakis; Rhian Gwilliam; Sarah Hunt; Jonathan Morrison; Don Powell; Barbara E Stranger; Pamela Whittaker; David R Bentley; Mark J Daly; Paul I W de Bakker; Jeff Barrett; Yves R Chretien; Julian Maller; Steve McCarroll; Nick Patterson; Itsik Pe'er; Alkes Price; Shaun Purcell; Daniel J Richter; Pardis Sabeti; Richa Saxena; Stephen F Schaffner; Pak C Sham; Patrick Varilly; David Altshuler; Lincoln D 
Stein; Lalitha Krishnan; Albert Vernon Smith; Marcela K Tello-Ruiz; Gudmundur A Thorisson; Aravinda Chakravarti; Peter E Chen; David J Cutler; Carl S Kashuk; Shin Lin; Gonçalo R Abecasis; Weihua Guan; Yun Li; Heather M Munro; Zhaohui Steve Qin; Daryl J Thomas; Gilean McVean; Adam Auton; Leonardo Bottolo; Niall Cardin; Susana Eyheramendy; Colin Freeman; Jonathan Marchini; Simon Myers; Chris Spencer; Matthew Stephens; Peter Donnelly; Lon R Cardon; Geraldine Clarke; David M Evans; Andrew P Morris; Bruce S Weir; Tatsuhiko Tsunoda; James C Mullikin; Stephen T Sherry; Michael Feolo; Andrew Skol; Houcan Zhang; Changqing Zeng; Hui Zhao; Ichiro Matsuda; Yoshimitsu Fukushima; Darryl R Macer; Eiko Suda; Charles N Rotimi; Clement A Adebamowo; Ike Ajayi; Toyin Aniagwu; Patricia A Marshall; Chibuzor Nkwodimmah; Charmaine D M Royal; Mark F Leppert; Missy Dixon; Andy Peiffer; Renzong Qiu; Alastair Kent; Kazuto Kato; Norio Niikawa; Isaac F Adewole; Bartha M Knoppers; Morris W Foster; Ellen Wright Clayton; Jessica Watkin; Richard A Gibbs; John W Belmont; Donna Muzny; Lynne Nazareth; Erica Sodergren; George M Weinstock; David A Wheeler; Imtaz Yakub; Stacey B Gabriel; Robert C Onofrio; Daniel J Richter; Liuda Ziaugra; Bruce W Birren; Mark J Daly; David Altshuler; Richard K Wilson; Lucinda L Fulton; Jane Rogers; John Burton; Nigel P Carter; Christopher M Clee; Mark Griffiths; Matthew C Jones; Kirsten McLay; Robert W Plumb; Mark T Ross; Sarah K Sims; David L Willey; Zhu Chen; Hua Han; Le Kang; Martin Godbout; John C Wallenburg; Paul L'Archevêque; Guy Bellemare; Koji Saeki; Hongguang Wang; Daochang An; Hongbo Fu; Qing Li; Zhen Wang; Renwu Wang; Arthur L Holden; Lisa D Brooks; Jean E McEwen; Mark S Guyer; Vivian Ota Wang; Jane L Peterson; Michael Shi; Jack Spiegel; Lawrence M Sung; Lynn F Zacharia; Francis S Collins; Karen Kennedy; Ruth Jamieson; John Stewart
Journal:  Nature       Date:  2007-10-18       Impact factor: 49.962

8.  FstSNP-HapMap3: a database of SNPs with high population differentiation for HapMap3.

Authors:  Shiwei Duan; Wei Zhang; Nancy Jean Cox; Mary Eileen Dolan
Journal:  Bioinformation       Date:  2008-11-09

9.  Geographical affinities of the HapMap samples.

Authors:  Miao He; Jane Gitschier; Tatiana Zerjal; Peter de Knijff; Chris Tyler-Smith; Yali Xue
Journal:  PLoS One       Date:  2009-03-04       Impact factor: 3.240

10.  SNP@Evolution: a hierarchical database of positive selection on the human genome.

Authors:  Feng Cheng; Wei Chen; Elliott Richards; Libin Deng; Changqing Zeng
Journal:  BMC Evol Biol       Date:  2009-09-05       Impact factor: 3.260
