FstSNP-HapMap3: a database of SNPs with high population differentiation for HapMap3.

Shiwei Duan, Wei Zhang, Nancy Jean Cox, Mary Eileen Dolan.

Abstract

The International HapMap Project has recently made available genotype and frequency data for phase 3 (NCBI build 36, dbSNP b129) of the HapMap, providing an enriched genotype dataset for approximately 1.6 million single nucleotide polymorphisms (SNPs) from 1,115 individuals with ancestry from parts of Africa, Asia, Europe, North America and Mexico. In the present study, we aim to facilitate pharmacogenetics studies by providing a database of SNPs with high population differentiation, identified through a genomewide test of allele frequency variation among the 11 HapMap3 samples. Common SNPs with a minor allele frequency greater than 5% in each of the 11 HapMap3 samples were included in the present analysis. Population differentiation is measured by the fixation index (Fst), and SNPs with Fst values over 0.5 were defined as highly differentiated SNPs. Our tests were carried out between all pairs of the 11 HapMap3 samples or among subgroups with the same continental ancestries. Altogether we carried out 64 genomewide Fst tests and identified 28,215 highly differentiated SNPs for 49 different combinations of HapMap3 samples in the current database. AVAILABILITY: http://FstSNP-hapmap3.googlecode.com/

Keywords:  Fst; HapMap3; SNP; database; human genome; population differentiation

Year:  2008        PMID: 19238253      PMCID: PMC2639690          DOI: 10.6026/97320630003139

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063


Background

With a public dataset of both genotypes and inferred haplotypes for millions of SNPs in ethnically diverse samples, the International HapMap Project [1] has provided a landscape of the human genome that enables genetic scientists to compare their genetic variant results with those of a reference. The vast information from the International HapMap Project has significantly driven the development of more efficient statistical tools for high-throughput analysis of large genetic data sets [2] and enhanced our knowledge of population genetics [3-4]. Because the HapMap samples are Epstein-Barr virus (EBV)-transformed lymphoblastoid cell lines (LCLs) that can be purchased [5], researchers can perform integrated analyses that include genetic variation, gene expression [6-9], gene transcript isoforms [10] and sensitivity to drugs [11-15]. HapMap3 extends the initial HapMap samples to 1,115 individual samples collected from 11 populations around the world, thus providing more diversity. HapMap3 includes samples from individuals of African, Asian, European and Mexican descent residing in various locations. It has therefore become an ideal resource for studying genotype-phenotype relationships with cellular phenotypes. In this analysis, we compare the genotypic frequencies of over 1 million SNPs among the samples. We hypothesize that the list of candidate SNPs, which have significantly different allele frequency distributions among the diverse samples, will facilitate studies attempting to correlate genotype with cellular phenotypes, as in pharmacogenomics, and will alert investigators to SNPs that might contribute disproportionately to substructure within more heterogeneous samples.

Methodology

HapMap3 genotypic data and SNP function classes

The genotypic data of 1,115 individuals from 11 population samples were downloaded from the International HapMap Project website (Phase III, release 1). Two samples (NA18955 and NA18962) in the CHB group actually belong to the JPT sample set and were therefore dropped from the analysis. In the current study, the tested population panel focused only on the 931 unrelated individuals, comprising Gujarati Indians in Houston, Texas (GIH, n = 83); individuals of Mexican ancestry in Los Angeles, California (MEX, n = 47); individuals of African ancestry from the Southwest USA (ASW, n = 47); Luhya in Webuye, Kenya (LWK, n = 83); Maasai in Kinyawa, Kenya (MKK, n = 143); Yoruba in Ibadan, Nigeria (YRI, n = 108); Han Chinese from Beijing, China (CHB, n = 80); Chinese from metropolitan Denver, Colorado (CHD, n = 70); Japanese from Tokyo, Japan (JPT, n = 82); Utah residents with Northern and Western European ancestry from the CEPH collection (CEU, n = 111); and Tuscans from Italy (TSI, n = 77). Five groups of samples (ASW, CEU, MEX, MKK and YRI) are familial samples, and the rest are unrelated samples. A total of 1,614,792 polymorphic SNPs were genotyped in the HapMap3 Project. The number of shared SNPs ranges from 1,047,055 to 1,487,361 for the pairwise combinations of the 11 HapMap3 population samples (Figure 1). Using the same reference allele for each SNP, we calculated the allele frequencies across all the populations.

Figure 1. The diagram for the construction of FstSNP_HapMap3.

Fst calculation

Fst, a metric representation of the effect of population subdivision, was estimated according to Wright's approximate formula, $F_{ST} = (H_T - H_S)/H_T$, where $H_T$ represents the expected heterozygosity per locus of the total population and $H_S$ represents the expected heterozygosity of a subpopulation [16]. An Fst value was calculated for each SNP of interest with allele frequencies estimated from the unrelated individuals in each population. We calculated Fst values only for SNPs with minor allele frequencies greater than 5% in each of the HapMap3 samples.
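The calculation above can be illustrated for a single biallelic SNP. This is a minimal sketch and not the authors' code: it assumes that the expected heterozygosity is 2p(1-p), that HS is the unweighted mean across the compared samples, and that HT is computed from the unweighted mean allele frequency. The source does not state how samples were weighted, and the function names are hypothetical.

```python
from typing import Sequence


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1-p) at a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def passes_maf_filter(freqs: Sequence[float], threshold: float = 0.05) -> bool:
    """True if the minor allele frequency exceeds the threshold in every sample."""
    return all(min(p, 1.0 - p) > threshold for p in freqs)


def wright_fst(freqs: Sequence[float]) -> float:
    """Fst = (HT - HS) / HT from per-sample reference-allele frequencies.

    Assumption: every sample receives equal weight.
    """
    p_bar = sum(freqs) / len(freqs)
    h_t = expected_heterozygosity(p_bar)
    if h_t == 0.0:
        return 0.0  # monomorphic overall; Fst is undefined, 0 is returned here
    h_s = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


# Reference-allele frequencies of one SNP in two samples (made-up values).
freqs = [0.1, 0.9]
if passes_maf_filter(freqs):
    print(round(wright_fst(freqs), 2))  # 0.64, above the 0.5 cutoff
```

With frequencies of 0.1 and 0.9, HT = 0.5 and HS = 0.18, so Fst = 0.64 and the SNP would count as highly differentiated under the 0.5 threshold.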

Dataset

The dataset [17] contains all SNPs with an Fst value over 0.5 from the tests between any two of the 11 HapMap3 samples. There are 28,215 common SNPs with high population differentiation observed for at least 1 of 49 different combinations of HapMap3 population samples in the database.

Development

The genotypes of approximately 1.6 million SNPs were downloaded from the International HapMap Project (HapMap3, release 1). Fst was evaluated for over 1 million SNPs for each of the 55 paired combinations of the 11 HapMap3 samples. The samples were then rearranged into five distinct groups defined by historical geographic ancestry (African, Asian, European, MEX and GIH) (Figure 1), and the Fst test was evaluated again among the combinations of these five geographically defined groups. Altogether there are 64 genomewide Fst tests for the HapMap3 samples. SNPs with high population differentiation (Fst > 0.5) were categorized into two major classes: genic and nongenic SNPs. The genic SNPs were further divided into six function classes (intron, UTR (utr-3 or utr-5), locus region (near-gene-3 or near-gene-5), splice site, coding synonymous and nonsynonymous) based on the annotation in the NCBI dbSNP129 database. The SNP annotation was added as a separate column to facilitate additional applications in the future.
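One way to arrive at the 64 tests is sketched below. The assignment of samples to the five groups is inferred from the sample descriptions, and the removal of the duplicate MEX versus GIH comparison (which appears both as a sample pair and as a group pair) is an assumption rather than something the source states.

```python
from itertools import combinations

# Assumed grouping, inferred from the sample descriptions.
GROUPS = {
    "African": ["ASW", "LWK", "MKK", "YRI"],
    "Asian": ["CHB", "CHD", "JPT"],
    "European": ["CEU", "TSI"],
    "MEX": ["MEX"],
    "GIH": ["GIH"],
}

samples = sorted(s for members in GROUPS.values() for s in members)

# Each test is stored as an unordered pair of sample sets.
sample_tests = {
    frozenset([frozenset([a]), frozenset([b])])
    for a, b in combinations(samples, 2)
}
group_tests = {
    frozenset([frozenset(GROUPS[a]), frozenset(GROUPS[b])])
    for a, b in combinations(GROUPS, 2)
}

# MEX vs GIH occurs in both sets, so the union drops one duplicate.
all_tests = sample_tests | group_tests

print(len(sample_tests))  # 55
print(len(group_tests))   # 10
print(len(all_tests))     # 64
```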

Database content

This database [17] provides 115,212 entries for 28,215 SNPs with high differentiation among 48 different combinations of the HapMap3 samples. The database has six columns: SNP, SNP_CHR (SNP chromosome), SNP_POSI (SNP position in dbSNP129), Fst value, Tested_Populations and DbSNP129_Class. The number of entries in each group ranges from 3 for the GIH_TSI group to 10,416 for the JPT_YRI group.

Database usage

Users can download FstSNP_HapMap3 [17] and then query the database by genomic region, by a list of SNPs or by SNP class.
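The three query types can be sketched with the Python standard library. The file layout is an assumption: the source names the six columns but not the file format, so this example supposes a tab-delimited table with a header row and a column literally named `Fst`. The rows and rs numbers below are invented placeholders, not entries from the database.

```python
import csv
import io

# Hypothetical excerpt; the real download may use another delimiter or header.
SAMPLE = (
    "SNP\tSNP_CHR\tSNP_POSI\tFst\tTested_Populations\tDbSNP129_Class\n"
    "rs0000001\t1\t1000\t0.62\tJPT_YRI\tintron\n"
    "rs0000002\t1\t5000\t0.55\tCEU_YRI\tnongenic\n"
    "rs0000003\t2\t1500\t0.71\tJPT_YRI\tutr-3\n"
    "rs0000004\t2\t9000\t0.58\tCHB_LWK\tintron\n"
)


def load_rows(handle):
    """Read a tab-delimited table into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def query(rows, chrom=None, start=None, end=None, snps=None, snp_class=None):
    """Return rows matching every condition that is not None."""
    hits = []
    for row in rows:
        pos = int(row["SNP_POSI"])
        if chrom is not None and row["SNP_CHR"] != str(chrom):
            continue
        if start is not None and pos < start:
            continue
        if end is not None and pos > end:
            continue
        if snps is not None and row["SNP"] not in snps:
            continue
        if snp_class is not None and row["DbSNP129_Class"] != snp_class:
            continue
        hits.append(row)
    return hits


rows = load_rows(io.StringIO(SAMPLE))

# By genomic region, by SNP list and by SNP class.
by_region = query(rows, chrom=1, start=500, end=2000)
by_list = query(rows, snps={"rs0000003"})
by_class = query(rows, snp_class="intron")
print([r["SNP"] for r in by_region])
```

To run this against the downloaded file, `io.StringIO(SAMPLE)` would be replaced by an opened file handle, after checking the delimiter and header names actually used.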

Caveats

The Fst test was evaluated in the HapMap3 population samples, some of which have relatively small sample size (e.g. 47 individuals in ASW and MEX). Furthermore, we note that SNPs included in HapMap3 are subject to a number of ascertainment biases; thus, these SNPs cannot be considered as representative of Fst values that might be calculated for all common variants.
  14 in total

1.  A new method for detecting human recombination hotspots and its applications to the HapMap ENCODE data.

Authors:  Jun Li; Michael Q Zhang; Xuegong Zhang
Journal:  Am J Hum Genet       Date:  2006-08-30       Impact factor: 11.025

2.  Evaluation of genetic variation contributing to differences in gene expression between populations.

Authors:  Wei Zhang; Shiwei Duan; Emily O Kistner; Wasim K Bleibel; R Stephanie Huang; Tyson A Clark; Tina X Chen; Anthony C Schweitzer; John E Blume; Nancy J Cox; M Eileen Dolan
Journal:  Am J Hum Genet       Date:  2008-02-28       Impact factor: 11.025

3.  Genetical structure of populations.

Authors:  S WRIGHT
Journal:  Nature       Date:  1950-08-12       Impact factor: 49.962

4.  Population genomics of human gene expression.

Authors:  Barbara E Stranger; Alexandra C Nica; Matthew S Forrest; Antigone Dimas; Christine P Bird; Claude Beazley; Catherine E Ingle; Mark Dunning; Paul Flicek; Daphne Koller; Stephen Montgomery; Simon Tavaré; Panos Deloukas; Emmanouil T Dermitzakis
Journal:  Nat Genet       Date:  2007-09-16       Impact factor: 38.330

5.  Genetic variants contributing to daunorubicin-induced cytotoxicity.

Authors:  R Stephanie Huang; Shiwei Duan; Emily O Kistner; Wasim K Bleibel; Shannon M Delaney; Donna L Fackenthal; Soma Das; M Eileen Dolan
Journal:  Cancer Res       Date:  2008-05-01       Impact factor: 12.701

6.  Identification of genetic variants and gene expression relationships associated with pharmacogenes in humans.

Authors:  Rong Stephanie Huang; Shiwei Duan; Emily O Kistner; Wei Zhang; Wasim K Bleibel; Nancy J Cox; M Eileen Dolan
Journal:  Pharmacogenet Genomics       Date:  2008-06       Impact factor: 2.089

7.  Genome-wide detection and characterization of positive selection in human populations.

Authors:  Pardis C Sabeti; Patrick Varilly; Ben Fry; Jason Lohmueller; Elizabeth Hostetter; Chris Cotsapas; Xiaohui Xie; Elizabeth H Byrne; Steven A McCarroll; Rachelle Gaudet; Stephen F Schaffner; Eric S Lander; et al.
Journal:  Nature       Date:  2007-10-18       Impact factor: 49.962

8.  Identification of genetic variants contributing to cisplatin-induced cytotoxicity by use of a genomewide approach.

Authors:  R Stephanie Huang; Shiwei Duan; Sunita J Shukla; Emily O Kistner; Tyson A Clark; Tina X Chen; Anthony C Schweitzer; John E Blume; M Eileen Dolan
Journal:  Am J Hum Genet       Date:  2007-08-01       Impact factor: 11.025

9.  A genome-wide approach to identify genetic variants that contribute to etoposide-induced cytotoxicity.

Authors:  R Stephanie Huang; Shiwei Duan; Wasim K Bleibel; Emily O Kistner; Wei Zhang; Tyson A Clark; Tina X Chen; Anthony C Schweitzer; John E Blume; Nancy J Cox; M Eileen Dolan
Journal:  Proc Natl Acad Sci U S A       Date:  2007-05-30       Impact factor: 11.205

10.  Mapping genes that contribute to daunorubicin-induced cytotoxicity.

Authors:  Shiwei Duan; Wasim K Bleibel; Rong Stephanie Huang; Sunita J Shukla; Xiaolin Wu; Judith A Badner; M Eileen Dolan
Journal:  Cancer Res       Date:  2007-06-01       Impact factor: 12.701
