
Ancestry informative marker set for Han Chinese population.

Hui-Qi Qu, Quan Li, Shuhua Xu, Joseph B McCormick, Susan P Fisher-Hoch, Momiao Xiong, Ji Qian, Li Jin.   

Abstract

The population of Han Chinese is ∼1.226 billion people. Genetic heterogeneity between northern Han Chinese (N-Han) and southern Han Chinese (S-Han) has been demonstrated by recent genome-wide studies. As an initial step toward addressing health disparities and enabling personalized medicine in the Chinese population, this study developed a set of ancestry informative markers (AIMs) for the Han Chinese population.

Keywords:  Han Chinese; ancestry informative marker; genetic association study; population structure

Year:  2012        PMID: 22413087      PMCID: PMC3291503          DOI: 10.1534/g3.112.001941

Source DB:  PubMed          Journal:  G3 (Bethesda)        ISSN: 2160-1836            Impact factor:   3.154


Han Chinese constitute the largest ethnic group in the world, accounting for 91.51% of the Chinese population, or ∼1.226 billion people, according to China’s 2010 census (http://www.chinadaily.com.cn/china/2011-04/28/content_12415449.htm). Chronic diseases, including cancer, vascular disease, and infectious diseases, are the leading causes of death in this population (He ). Genetic association study (GAS), a critical approach to understanding the molecular mechanisms and population-specific genetic risk of these diseases, can lead to the development of effective interventions at an individual or population level. Currently, a major issue in GAS is the confounding effect of population stratification, which is a common source of false-positive or false-negative results in case-control genetic association studies (Ziv and Burchard 2003). Our recent study identified obvious genetic heterogeneity between northern Han Chinese (N-Han) and southern Han Chinese (S-Han), historically divided by a natural barrier, the Yangtze River (Xu ). That study highlighted the importance of correcting for population stratification in GAS of the Han Chinese population. Population stratification arises from the presence of genetic subgroups with different allele frequencies within a population. When the subgroups also differ in disease prevalence, differences in allele frequencies detected between cases and controls might be independent of disease etiology and instead reflect the different prevalence. They could result from sampling bias inherent in the unknown distribution of the genetic subgroups in the overall sample. This is a common reason for erroneous conclusions of disease association (Cardon and Bell 2001). By correcting for population stratification, a GAS can eliminate spurious genetic associations and thus avoid fruitless downstream efforts.
In addition, a GAS may gain statistical power by correcting for population stratification: our previous study showed that estimation of the genetic effect for candidate loci could be biased by population divergence (He ). To correct for population stratification, structured association identifies subpopulations within the larger population and tests genetic associations conditioned on the inferred ancestral information (Pritchard and Donnelly 2001). This structured association approach, represented by the Eigenstrat algorithm (Price ), has been extensively demonstrated to be effective for the correction of population stratification. To infer subpopulations, a number of DNA polymorphism markers with substantially different allele frequencies among the subpopulations are required, i.e., ancestry informative markers (AIMs). The genotypes of a set of AIMs enable the classification of subpopulations. To date, there is no consensus standard defining the number of AIMs needed to correct for population stratification in each specific population. Genotyping cost is a major factor that determines the number of AIMs used in a study (Londin ). We therefore developed a set of AIMs for genetic studies of Han Chinese populations. To minimize the genotyping cost of structured association studies, the classification performance of different numbers of AIMs was assessed.
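The confounding mechanism described above can be illustrated with a small worked example. The sketch below uses two hypothetical subgroups (the sample sizes, carrier frequencies, and prevalences are illustrative assumptions, not data from this study) in which the variant has no effect on disease within either subgroup, yet the pooled case-control comparison shows an apparent association. Carrier status is used in place of allele counts to keep the arithmetic simple.

```python
from fractions import Fraction

def expected_table(n, carrier_freq, prevalence):
    """Expected (case carriers, case non-carriers, control carriers,
    control non-carriers) when carrier status is independent of disease."""
    cases = n * prevalence
    controls = n - cases
    return (cases * carrier_freq, cases * (1 - carrier_freq),
            controls * carrier_freq, controls * (1 - carrier_freq))

def odds_ratio(table):
    """Odds ratio of a 2x2 table given as (a, b, c, d) = ad / bc."""
    a, b, c, d = table
    return (a * d) / (b * c)

# Hypothetical subgroups: A has higher carrier frequency AND higher prevalence.
group_a = expected_table(1000, Fraction(3, 5), Fraction(1, 5))
group_b = expected_table(1000, Fraction(1, 5), Fraction(1, 20))

# Pooling ignores subgroup membership, as an uncorrected study would.
pooled = tuple(x + y for x, y in zip(group_a, group_b))

or_a = odds_ratio(group_a)        # no association within subgroup A
or_b = odds_ratio(group_b)        # no association within subgroup B
or_pooled = odds_ratio(pooled)    # spurious association after pooling

print(float(or_a), float(or_b), float(or_pooled))
```

Within each subgroup the odds ratio is exactly 1, whereas the pooled odds ratio is 117/67 (about 1.75). Conditioning the test on subgroup membership, which is what structured association does with inferred ancestry, removes this artifact.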

MATERIALS AND METHODS

This study analyzed a sample of 308 Han Chinese individuals from different geographic regions in China. In this sample, 150,916 autosomal SNPs were genotyped with call rate >95% (Xu ). Principal component analysis (PCA) implemented in the Eigenstrat software (Patterson ; Price ) was used to identify ethnic outliers and genetically admixed individuals. For the PCA, 18,000 tag SNPs without obvious linkage disequilibrium (LD; r² < 0.2) were selected genome-wide. Two hundred thirty-six individuals were unambiguously distinguishable as N-Han or S-Han and thus were selected for defining the AIMs in Han Chinese. The geographic distribution of these 236 individuals is described in Table 1. In these 236 individuals, the ancestry information content I of each autosomal SNP was calculated using the infocalc program, which is based on information-theoretic principles (Rosenberg ). Across the 22 autosomes, an initial set of AIMs comprising 5000 SNPs was selected by choosing the one SNP marker with the largest I in each 500 kb window. Each selected SNP marker has frequency >0.05 in both N-Han and S-Han, and has low LD (r² < 0.2) with, and a distance of >100 kb from, the preceding AIM.
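As an illustration of the information-theoretic measure, the sketch below computes the informativeness for assignment as we understand it from Rosenberg's formulation, I_n = Σ_j [ −p_j ln p_j + (1/K) Σ_i p_ij ln p_ij ], where p_ij is the frequency of allele j in population i and p_j is its mean over the K populations. It assumes that the I reported here corresponds to this statistic with equally weighted populations; it is not the infocalc program itself, and the function name is hypothetical.

```python
import math

def informativeness(freqs_by_pop):
    """Informativeness for assignment (I_n) at one locus.

    freqs_by_pop: one allele-frequency list per population, e.g.
    [[p, 1 - p], [q, 1 - q]] for a biallelic SNP in N-Han and S-Han.
    Populations are weighted equally (an assumption of this sketch).
    """
    k = len(freqs_by_pop)
    total = 0.0
    for j in range(len(freqs_by_pop[0])):
        p_mean = sum(pop[j] for pop in freqs_by_pop) / k
        if p_mean > 0:                      # convention: 0 * log(0) = 0
            total -= p_mean * math.log(p_mean)
        for pop in freqs_by_pop:
            if pop[j] > 0:
                total += pop[j] * math.log(pop[j]) / k
    return total

# Identical frequencies carry no ancestry information; a fixed difference
# between two populations gives the maximum for K = 2, ln(2).
no_info = informativeness([[0.3, 0.7], [0.3, 0.7]])
max_info = informativeness([[1.0, 0.0], [0.0, 1.0]])
moderate = informativeness([[0.6, 0.4], [0.3, 0.7]])
print(no_info, moderate, max_info)
```

SNPs whose allele frequencies differ more between N-Han and S-Han receive larger values, which is why ranking by this quantity favors markers that separate the two groups.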
Table 1

Geographic distribution of the 236 Han Chinese individuals

Geographic Location           Number of Individuals   Historic Classification
Beijing                       22                      Northern Han
Gansu                         13                      Northern Han
Non-specific northern Han     9                       Northern Han
Hebei                         39                      Northern Han
Heilongjiang                  7                       Northern Han
Henan                         10                      Northern Han
Jilin                         3                       Northern Han
Liaoning                      4                       Northern Han
Neimeng                       7                       Northern Han
Ningxia                       2                       Northern Han
Shandong                      26                      Northern Han
Shannxi                       3                       Northern Han
Shanxi                        10                      Northern Han
Tianjin                       1                       Northern Han
Xinjiang                      6                       Northern Han
Anhui                         1                       Southern Han
Guangdong                     24                      Southern Han
Guangxi                       1                       Southern Han
Hubei                         2                       Southern Han
Hunan                         1                       Southern Han
Jiangsu                       11                      Southern Han
Jiangxi                       2                       Southern Han
Shanghai                      14                      Southern Han
Sichuan                       3                       Southern Han
Yunnan                        2                       Southern Han
Zhejiang                      13                      Southern Han
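The window-based marker selection described in Materials and Methods can be sketched as follows. This is a simplified illustration under stated assumptions: the record layout (`id`, `chrom`, `pos`, `info`, `maf_n`, `maf_s`) and the SNP identifiers are hypothetical, the frequency threshold is interpreted as a minor allele frequency, and the LD (r² < 0.2) and >100 kb spacing filters are omitted.

```python
def select_aims(snps, window_bp=500_000, min_freq=0.05):
    """Keep the SNP with the largest information content in each window.

    snps: iterable of dicts with hypothetical keys
          id, chrom, pos (bp), info (I), maf_n, maf_s.
    Returns the selected SNPs ranked by decreasing information content.
    """
    best = {}
    for snp in snps:
        # Require frequency > min_freq in both N-Han and S-Han.
        if min(snp["maf_n"], snp["maf_s"]) <= min_freq:
            continue
        window = (snp["chrom"], snp["pos"] // window_bp)
        if window not in best or snp["info"] > best[window]["info"]:
            best[window] = snp
    return sorted(best.values(), key=lambda s: s["info"], reverse=True)

# Toy input with made-up values.
toy_snps = [
    {"id": "snpA", "chrom": 1, "pos": 100_000, "info": 0.02, "maf_n": 0.20, "maf_s": 0.30},
    {"id": "snpB", "chrom": 1, "pos": 400_000, "info": 0.05, "maf_n": 0.20, "maf_s": 0.30},
    {"id": "snpC", "chrom": 1, "pos": 600_000, "info": 0.09, "maf_n": 0.01, "maf_s": 0.30},
    {"id": "snpD", "chrom": 1, "pos": 700_000, "info": 0.03, "maf_n": 0.10, "maf_s": 0.10},
    {"id": "snpE", "chrom": 2, "pos": 100_000, "info": 0.07, "maf_n": 0.40, "maf_s": 0.20},
]
selected = select_aims(toy_snps)
print([s["id"] for s in selected])
```

In the toy input, snpB displaces snpA within the same window, and snpC is excluded by the frequency filter despite having the largest information content.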

RESULTS AND DISCUSSION

To enable the application of AIMs in genetic studies of Han Chinese, these 5000 AIMs are listed in supporting information, Table S1, ranked by I. In the PCA using these 5000 AIMs, N-Han and S-Han individuals formed two clearly distinct clusters along the first principal component (PC1). This finding is concordant with a recent genome-wide SNP genotyping study that revealed a one-dimensional “north-south” population structure in the Han Chinese population (Chen ). For this initial set of 5000 AIMs, the I of each SNP is highly correlated with its eigenvector weight on PC1 (r = 0.936; Figure S1). This further supports that the information content I is mainly determined by the one-dimensional “north-south” population structure. By a stepwise procedure, we decreased the number of AIMs and investigated the change in PCA clustering. In each step, we decreased the number of AIMs by removing the AIMs with the smallest I. The classification performance of the PCA was assessed by the maximum Matthews correlation coefficient (MCC) of each set of AIMs (Matthews 1975). We observed that clustering was compromised significantly when fewer than 30 AIMs were used (Figure 1; Figure S2). On the basis of this analysis, we recommend that at least the top 30 SNPs in the AIM list in Table S1 be used in any structured association study of the Han Chinese population. More robust correction for population stratification is expected when the top 140 AIMs in Table S1 are used, which differentiated N-Han and S-Han unambiguously in our study (Table 2). We further validated the performance of the sets of AIMs by k-fold cross-validation. A threefold cross-validation achieved MCCs highly similar to those of the original model.
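To make the PC1-based clustering concrete, the following sketch extracts the first principal component from a small genotype matrix. It is a stand-in for illustration only, not the Eigenstrat implementation: it assumes genotypes coded as 0/1/2 allele counts, normalizes each SNP by its mean and by sqrt(p(1 − p)) in the manner described for eigenanalysis of genotype data, and finds the leading eigenvector by simple power iteration. The function names and toy genotypes are hypothetical.

```python
import math

def normalize_genotypes(genotypes):
    """Center each SNP column and scale by sqrt(p * (1 - p)).

    genotypes: list of samples, each a list of 0/1/2 allele counts.
    Monomorphic SNPs are dropped because they carry no information.
    """
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        mean = sum(row[j] for row in genotypes) / n
        p = mean / 2.0
        if 0.0 < p < 1.0:
            scale = math.sqrt(p * (1.0 - p))
            columns.append([(row[j] - mean) / scale for row in genotypes])
    return [list(sample) for sample in zip(*columns)]

def first_pc(matrix, iterations=200):
    """Leading eigenvector of the sample-by-sample Gram matrix (power iteration)."""
    n = len(matrix)
    gram = [[sum(a * b for a, b in zip(matrix[i], matrix[k])) for k in range(n)]
            for i in range(n)]
    v = [float(i + 1) for i in range(n)]   # deterministic starting vector
    for _ in range(iterations):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy data: the first three samples mostly carry allele count 0,
# the last three mostly carry allele count 2.
toy_genotypes = [
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [2, 2, 1, 2],
    [2, 1, 2, 2],
    [1, 2, 2, 2],
]
pc1 = first_pc(normalize_genotypes(toy_genotypes))
print([round(x, 3) for x in pc1])
```

The sign of a principal component is arbitrary, so only the separation of the two groups along PC1 is meaningful, not which group receives positive values.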
Figure 1 

Maximum Matthews correlation coefficient (MCC) of principal component analysis (PCA) clustering using different numbers of ancestry informative markers (AIMs). The clustering performance is clearly compromised when the number of AIMs decreases to 30. Horizontal axis: the number of SNPs with robust I. Vertical axis: maximum MCC of each set of AIMs.

Table 2

Classification performance of different numbers of AIMs

Number of AIMs   PC1 Cutoff   MCC     Specificity   Sensitivity
15               −0.04        0.810   0.966         0.901
20               −0.03        0.871   0.930         0.952
25               −0.03        0.901   0.971         0.952
30               −0.02        0.902   0.932         0.969
35               −0.03        0.951   0.986         0.976
40               −0.03        0.961   0.986         0.982
45               −0.04        0.951   1.000         0.970
50               −0.03        0.961   1.000         0.976
60               −0.03        0.971   1.000         0.982
70               −0.03        0.990   1.000         0.994
80               −0.03        0.990   1.000         0.994
90               −0.02        0.990   0.987         1.000
100              −0.03        1.000   1.000         1.000
110              −0.04        0.990   1.000         0.994
120              −0.04        0.990   1.000         0.994
130              −0.03        1.000   1.000         1.000
140              −0.04        1.000   1.000         1.000
150              −0.03        1.000   1.000         1.000

AIM, ancestry informative marker; MCC, Matthews correlation coefficient; PC1, first principal component.
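The quantities reported in Table 2 can be related to one another with a short sketch. MCC is computed from the confusion matrix as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and the maximum MCC is found by scanning candidate PC1 cutoffs. Which group is treated as the positive class and which side of the cutoff it falls on are assumptions made here for illustration, as are the toy PC1 scores.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def best_cutoff(pc1_scores, is_north):
    """Return (maximum MCC, cutoff), calling a sample northern if PC1 > cutoff.

    Treating N-Han as the positive class on the high side of PC1 is an
    assumption of this sketch; the sign of PC1 is arbitrary in practice.
    """
    best = (-1.0, None)
    for cutoff in sorted(set(pc1_scores)):
        tp = tn = fp = fn = 0
        for score, north in zip(pc1_scores, is_north):
            called_north = score > cutoff
            if called_north and north:
                tp += 1
            elif called_north:
                fp += 1
            elif north:
                fn += 1
            else:
                tn += 1
        value = mcc(tp, tn, fp, fn)
        if value > best[0]:
            best = (value, cutoff)
    return best

# Toy PC1 scores for three southern and three northern samples.
toy_scores = [-0.08, -0.05, -0.04, 0.01, 0.03, 0.06]
toy_labels = [False, False, False, True, True, True]
max_mcc, cutoff = best_cutoff(toy_scores, toy_labels)
print(max_mcc, cutoff)
```

An MCC of 1 corresponds to perfect separation of the two groups, as reported in Table 2 for the larger AIM sets, while values near 0 indicate classification no better than chance.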

Differences in some common phenotypic traits, e.g., body height, facial features, and daily food composition, are obvious between N-Han and S-Han Chinese. The population structure revealed by genome-wide studies (Xu ; Chen ) highlighted the importance of correcting for population stratification in genetic association studies of Han Chinese. A large number of genetic studies are being performed in the Han Chinese population, the majority being case-control studies. By providing a set of AIMs, our study aims to help address potential population stratification in genetic association studies. However, it is worth emphasizing that population stratification may not always be corrected sufficiently using AIMs (Seldin and Price 2008). Replication of a genetic association in an independent study is always important. Besides correction for population stratification, ancestry information inferred using the AIMs in Han Chinese may be used to assess genetic components underlying common traits, as differences in risk for some diseases have been observed between N-Han and S-Han Chinese (Rao ; Zhao ). Understanding subpopulation-specific risk factors for common diseases using the AIMs can be an initial step toward personalized medicine in the post-human genome project era (Barnes 2010).

REFERENCES

Review 1.  Association study designs for complex diseases.

Authors:  L R Cardon; J I Bell
Journal:  Nat Rev Genet       Date:  2001-02       Impact factor: 53.242

2.  Informativeness of genetic markers for inference of ancestry.

Authors:  Noah A Rosenberg; Lei M Li; Ryk Ward; Jonathan K Pritchard
Journal:  Am J Hum Genet       Date:  2003-11-20       Impact factor: 11.025

Review 3.  Human population structure and genetic association studies.

Authors:  Elad Ziv; Esteban González Burchard
Journal:  Pharmacogenomics       Date:  2003-07       Impact factor: 2.533

4.  Comparison of the predicted and observed secondary structure of T4 phage lysozyme.

Authors:  B W Matthews
Journal:  Biochim Biophys Acta       Date:  1975-10-20

5.  Blood pressure differences between northern and southern Chinese: role of dietary factors: the International Study on Macronutrients and Blood Pressure.

Authors:  Liancheng Zhao; Jeremiah Stamler; Lijing L Yan; Beifan Zhou; Yangfeng Wu; Kiang Liu; Martha L Daviglus; Barbara H Dennis; Paul Elliott; Hirotsugu Ueshima; Jun Yang; Liguang Zhu; Dongshuang Guo
Journal:  Hypertension       Date:  2004-04-26       Impact factor: 10.190

6.  Principal components analysis corrects for stratification in genome-wide association studies.

Authors:  Alkes L Price; Nick J Patterson; Robert M Plenge; Michael E Weinblatt; Nancy A Shadick; David Reich
Journal:  Nat Genet       Date:  2006-07-23       Impact factor: 38.330

7.  Major causes of death among men and women in China.

Authors:  Jiang He; Dongfeng Gu; Xigui Wu; Kristi Reynolds; Xiufang Duan; Chonghua Yao; Jialiang Wang; Chung-Shiuan Chen; Jing Chen; Rachel P Wildman; Michael J Klag; Paul K Whelton
Journal:  N Engl J Med       Date:  2005-09-15       Impact factor: 91.245

8.  Comparison of electrocardiographic findings between Northern and Southern Chinese population samples.

Authors:  X Rao; X Wu; A R Folsom; X Liu; H Zhong; O D Williams; J Stamler
Journal:  Int J Epidemiol       Date:  2000-02       Impact factor: 7.196

9.  Population structure and eigenanalysis.

Authors:  Nick Patterson; Alkes L Price; David Reich
Journal:  PLoS Genet       Date:  2006-12       Impact factor: 5.917

10.  Application of ancestry informative markers to association studies in European Americans.

Authors:  Michael F Seldin; Alkes L Price
Journal:  PLoS Genet       Date:  2008-01       Impact factor: 5.917
