
Investigations on Genetic Architecture of Hairy Loci in Dairy Cattle by Using Single and Whole Genome Regression Approaches.

B Karacaören

Abstract

Development of body hair is an important physiological and cellular process that leads to better adaptation of dairy cattle to tropical environments. Various studies have suggested a major gene and, more recently, associated genes for the hairy locus in dairy cattle. The main aims of this study were to i) employ a variant of the discordant sib pair model, in which half sibs from the same sire are randomly sampled according to their affection status, ii) use various single marker regression approaches, and iii) use whole genome regression approaches to dissect the genetic architecture of the hairy gene in cattle. Both whole genome and single marker regression approaches detected strong genomic signals from chromosome 23. Although there is a major gene effect on the hairy phenotype originating from chromosome 23, the whole genome regression approach also suggested a polygenic component related to other parts of the genome. Such a result could not be obtained by any of the single marker approaches.


Keywords:  Discordant Sib Pair Analyses; Genome Wide Association Analyses; Whole Genome Regression Analyses

Year:  2015        PMID: 26954150      PMCID: PMC4932587          DOI: 10.5713/ajas.15.0640

Source DB:  PubMed          Journal:  Asian-Australas J Anim Sci        ISSN: 1011-2367            Impact factor:   2.509


INTRODUCTION

Genome wide association studies (GWAS) are employed to detect single nucleotide polymorphisms (SNPs) associated with phenotypes in domestic species. Assumptions regarding the underlying genetic architecture are important in association mapping for detecting genetic factors related to the phenotypes of interest (de los Campos et al., 2015). To date, single regression tests based on SNPs have been commonly employed in GWAS. The discordant sib pair (DSP) model is one application of single SNP models in association mapping (Boehnke and Langefeld, 1998). The DSP design uses matched sib pairs (as cases and controls) from the same families to reduce the impact of environmental effects and to increase genetic homogeneity, thereby overcoming population stratification problems in association mapping. Karacaören et al. (2010) and Karacaören (2012) suggested the use of the DSP design in domestic species because, owing to controlled crosses, larger numbers of sib pairs are available in animal genetics than in humans. However, only a small proportion of the genetic variance can be explained by single SNP regression approaches in GWAS. This phenomenon is termed “missing heritability” (Turkheimer, 2011) in genomics. To overcome this problem, Visscher (2008) and Yang et al. (2010) suggested using all SNPs simultaneously, similar to the approach of Meuwissen et al. (2001), who proposed genomic selection models that use all markers simultaneously to predict the total genetic value of animals. Genomic selection (prediction) models have also been suggested for GWAS to detect associated variants (de los Campos et al., 2010; Fernando and Garrick, 2013) instead of single SNP models. Employing all markers together in a GWAS might be beneficial with respect to multiple hypothesis testing and linkage disequilibrium, and for increasing the power of the study (Moser et al., 2015).
Development of body hair is an important physiological and cellular process that leads to better adaptation of dairy cattle to tropical environments (Dikmen et al., 2013). Various studies have suggested a major gene (Olson et al., 2003) and, more recently, associated genes (Dikmen et al., 2013; Littlejohn et al., 2014) for the hairy locus in dairy cattle. The main aims of this study were to i) employ a variant of the DSP model, in which half sibs from the same sire are randomly sampled according to their affection status, ii) use various single SNP regression approaches (Price et al., 2006; Aulchenko et al., 2007), and iii) use whole genome regression approaches (Moser et al., 2015) to dissect the genetic architecture of the hairy gene in cattle.

MATERIAL AND METHODS

Data

The pedigree included 99 Holstein-Friesians formed by 22 nuclear trios and 77 half sib offspring. The half sib offspring were sired by two sires. For the DSP design we used the half sib offspring of the sire “24230079” (n = 50). Hairiness phenotypes were assessed by visual inspection of the cattle and recorded as a binary trait. The genotype data consisted of 712,122 SNPs distributed over 29 chromosomes. More details about the dataset can be found in Littlejohn et al. (2014).

Genome wide association analyses

Linear mixed models can be used to test for genome wide association (Zhou and Stevens, 2012). Because of the half sib family structure, genetic stratification needs to be taken into account. We used genomic pedigree information in a linear mixed model to account for the half sib structure, as implemented in GenABEL (Aulchenko et al., 2007) using the genomewide rapid association with mixed model and regression (GRAMMAR-gamma) approach (Aulchenko et al., 2007; Svishcheva et al., 2012) in R software (R development team, 2013). The linear mixed model used was

$$y = Xb + Za + e \qquad (1)$$

where y contains the observations, b is the sex effects, a is the additive genetic effect, X and Z are incidence matrices, and e is a vector containing residuals. For the random effects, it is assumed that $a \sim N(0, A\sigma_a^2)$ and $e \sim N(0, I\sigma_e^2)$, where A is the matrix of coefficients of coancestry obtained from the genotypes of the animals, I is an identity matrix, $\sigma_a^2$ is the additive genetic variance, and $\sigma_e^2$ is the residual variance. Price et al. (2006) suggested employing genomic principal components for the detection and correction of population stratification in the linear mixed model (1). We used the Price et al. (2006) approach for GWAS, based on genomic principal components, as implemented in GenABEL (Aulchenko et al., 2007).

Discordant half sib pair analyses

The discordant sib pair design defines sib pairs as cases and controls to detect putative associations based on different allele counting schemes. Here we extend this approach to half sib pair offspring. Discordant half sib pair analyses can be used to count marker alleles in cases and controls allocated according to their half sib structure. These counts may use all alleles or only the discordant alleles in half sib progenies (Table 1).
Table 1

Allele-counting schemes for discordant half sib pairs

Case   Half sib genotypes   Alleles counted, Scheme 1   Alleles counted, Scheme 2
1      11    11             1,1    1,1                  -      -
2      11    12             1,1    1,2                  1      2
3      11    22             1,1    2,2                  1,1    2,2

1 and 2 represent distinct alleles at the marker locus.

Adapted from Boehnke and Langefeld (1998).

The Pearson homogeneity statistic can be computed from the 2×m table via the following formula:

$$X^2 = \sum_{i=1}^{2} \sum_{j=1}^{m} \frac{\left(n_{ij} - n_{i\cdot}\, n_{\cdot j}/n_{\cdot\cdot}\right)^2}{n_{i\cdot}\, n_{\cdot j}/n_{\cdot\cdot}}$$

where $n_{ij}$ is the number of counted alleles of type j among cases (i = 1) and controls (i = 2), j = 1…m (m is the number of alleles), and a dot denotes summation over the corresponding index. Because the test statistics are correlated within half sib offspring, permutation tests can be used to assess significance. As defined in Karacaören (2012), we randomly exchanged the case and control status of the half sib offspring with a probability of 1/2 to determine the significance level of the test statistics.
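As a minimal sketch of the computation described above, the following Python code evaluates the standard Pearson homogeneity statistic for a 2×m table of allele counts and runs a permutation test in which the case and control labels of each pair are exchanged with probability 1/2. The function names, the all-alleles counting helper, and the toy pairs are hypothetical illustrations, not the author's implementation. The 2×2 check uses the all-alleles counts reported for rs109386507 in Table 5, read here as 44/6 for affected and 26/24 for unaffected half sibs.

```python
import random


def pearson_homogeneity(table):
    """Pearson homogeneity statistic for a 2 x m table of allele counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # skip alleles that were never observed
                stat += (observed - expected) ** 2 / expected
    return stat


def count_all_alleles(pairs, alleles):
    """All-alleles counting: row 0 = cases, row 1 = controls."""
    table = [[0] * len(alleles) for _ in range(2)]
    for case_gt, control_gt in pairs:
        for a in case_gt:
            table[0][alleles.index(a)] += 1
        for a in control_gt:
            table[1][alleles.index(a)] += 1
    return table


def swap_permutation_pvalue(pairs, alleles, n_perm=1000, seed=1):
    """Exchange case/control status within each pair with probability 1/2."""
    rng = random.Random(seed)
    observed = pearson_homogeneity(count_all_alleles(pairs, alleles))
    hits = 0
    for _ in range(n_perm):
        permuted = [(c, d) if rng.random() < 0.5 else (d, c) for c, d in pairs]
        stat = pearson_homogeneity(count_all_alleles(permuted, alleles))
        if stat >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    return observed, hits / n_perm


# All-alleles counts for rs109386507 as read from Table 5.
table5_scheme1 = [[44, 6], [26, 24]]
print(round(pearson_homogeneity(table5_scheme1), 2))  # 15.43

# Hypothetical toy pairs (case genotype, control genotype), not study data.
toy_pairs = [("AA", "AB"), ("AB", "AA"), ("AA", "AA"), ("AA", "BB"), ("AB", "BB")]
toy_stat, toy_p = swap_permutation_pvalue(toy_pairs, ("A", "B"), n_perm=200, seed=0)
```

The statistic computed from the Table 5 counts agrees with the value of 15.43 reported for Scheme 1. The permutation p-value from this sketch need not match the published values, since the exact permutation procedure of the study is not fully specified here.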

A Bayesian mixture model

We used a hierarchical Bayesian mixture model, BayesR (Moser et al., 2015), to predict SNP effects. BayesR assumes that the SNP effects to be predicted follow a mixture of four normal distributions, where β denotes the SNP effects, p the mixture proportions (assumed to be 0.00001, 0.0001, 0.001, 0.01), $\sigma_g^2$ the genetic variance, and f(x|θ) the normally distributed mixture densities with parameter vector θ and observations x. We ran the Markov chain for 50,000 iterations, discarded the first 20,000 as a burn-in period, and recorded every 10th sample to thin the chain. We used uninformative priors to obtain the desired posteriors.
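The chain settings above imply a fixed number of retained posterior samples. The short Python sketch below works through that arithmetic; the function name and the convention that the first retained draw is the tenth iteration after burn-in are assumptions made for illustration.

```python
def kept_iterations(n_iter, burn_in, thin):
    """1-based indices of the iterations retained after burn-in and thinning."""
    # Assumed convention: keep every `thin`-th iteration once burn-in has ended.
    return list(range(burn_in + thin, n_iter + 1, thin))


kept = kept_iterations(n_iter=50_000, burn_in=20_000, thin=10)
print(len(kept))  # 3000 retained samples
```

Under these settings, posterior summaries such as the mean number of SNPs in the model would be averaged over 3,000 retained samples per chain.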

RESULTS

We excluded 139,283 SNPs with a minor allele frequency of <5%, leaving 572,839 SNPs in the dataset. We excluded 2 individuals because of excessively high identity by state (IBS) (>0.95), leaving 97 individuals in the analyses. Mean IBS was estimated as 0.72 (0.03) and mean autosomal heterozygosity as 0.40 (0.01). Genomic heritability was found to be 0.84. To detect loci associated with the hairy gene, we conducted genome wide association analyses using the genomic pedigree and genomic principal components. The IBS matrix and the genomic principal components used in the linear mixed model should adjust for and remove the genetic stratification due to the half sib family structure. Both GRAMMAR (Figure 1) and the principal component analyses (Figure 3) detected strong genomic signals from chromosome 23 (Tables 2 and 3). After correction for multiple hypothesis testing by 1,000 permutations, 223 SNPs (p<0.05) and 440 SNPs (p<0.05) were declared significant, with inflation factors of 1.19 (0.00007) and 3.31 (0.07), by the GRAMMAR and principal components approaches, respectively.
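The minor allele frequency filter can be sketched as follows. This is an illustrative Python version, assuming genotypes are coded as 0, 1, or 2 copies of one allele; the function names and the toy genotypes are hypothetical, and the study itself performed quality control in GenABEL.

```python
def minor_allele_frequency(genotypes):
    """MAF of one SNP from 0/1/2 allele counts; None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0  # no data: treat as monomorphic so the SNP is dropped
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def filter_by_maf(snps, threshold=0.05):
    """Keep SNPs whose MAF is at least the threshold."""
    return {name: g for name, g in snps.items()
            if minor_allele_frequency(g) >= threshold}


# Hypothetical toy genotypes (not from the study).
toy = {
    "snp_common": [0, 1, 2, 1, 1, 0, 2, 1],  # allele frequency 0.5
    "snp_rare": [0] * 19 + [1],               # allele frequency 0.025
}
kept_snps = filter_by_maf(toy)
print(sorted(kept_snps))  # ['snp_common']

# Bookkeeping check on the counts reported in the text.
print(712_122 - 139_283)  # 572839
```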
Figure 1

Manhattan plot of the genome wide association results using the GRAMMAR approach. The x-axis shows the genomic position; the y-axis shows the log10-transformed p-values.

Figure 3

Manhattan plot of the genome wide association results using the principal components approach. The x-axis shows the genomic position; the y-axis shows the log10-transformed p-values.

Table 2

Summary of genomewide rapid association using mixed model and regression

SNP           Chromosome   Position    p          Pc         % Genetic variance
rs109386507   23           32214909    1.67e-15   0.000999   0.6538
rs110242944   23           32218078    2.49e-14   0.000999   0.7125
rs109368824   23           32221514    2.01e-13   0.000999   0.6623
rs132978438   23           32222989    2.01e-13   0.000999   0.6623
rs136246807   23           32580031    4.16e-12   0.000999   0.5892
rs110231157   23           32762998    4.09e-19   0.000999   0.9789
rs109964597   23           32761840    2.10e-12   0.000999   0.6056
rs110887052   23           32765276    2.65e-12   0.000999   0.6000
rs135120930   23           32765276    2.10e-12   0.000999   0.6056
rs109108829   23           32766200    1.24e-12   0.000999   0.6183

SNP, single nucleotide polymorphisms; p, raw p values; Pc, corrected p values using 1,000 permutations.

Table 3

Summary of genomic principal component regression model

SNP           Chromosome   Position    p          Pc         % Genetic variance
rs109386507   23           32214909    0.000999   0.000999   0.6819
rs110242944   23           32218078    0.000999   0.000999   0.5823
rs109368824   23           32221514    0.000999   0.000999   0.5456
rs132978438   23           32222989    0.000999   0.000999   0.5456
rs136246807   23           32580031    0.000999   0.000999   0.5174
rs109013485   23           32759807    0.000999   0.000999   0.4956
rs136522145   23           32761840    0.000999   0.000999   0.5173
rs110231157   23           32762998    0.000999   0.000999   0.5605
rs110887052   23           32765276    0.000999   0.000999   0.5605
rs109108829   23           32766200    0.000999   0.000999   0.5469

SNP, single nucleotide polymorphisms; p, raw p values; Pc, corrected p values using 1,000 permutations.

Genotype and allele counts for a significant SNP (rs109386507) obtained by the principal component regression approach (Table 3) are presented in Tables 4 and 5. The Pearson homogeneity statistic was 15.43 and 18.00 using allele counting Schemes 1 and 2, respectively (Table 5). Since Scheme 2 counts only discordant alleles, the test statistic, and hence the evidence for association, is expected to be stronger. Because of the dependency between half sib pair offspring, traditional statistical tests cannot be used for hypothesis testing. Instead, we used Monte Carlo simulations based on 100,000 permutations of the hairiness phenotypes to declare significance. We detected 437 and 94 cases within the 100,000 permutations for Schemes 1 and 2, respectively. Hence the significance levels were 0.004 and 0.0009 for Schemes 1 and 2, respectively.
Table 4

Genotype counts for SNP rs109386507

Unaffected-halfsib genotype   Affected-halfsib genotype
                              AA    AB    BB
AA                            1     18    0
AB                            0     6     0
BB                            0     0     0

SNP, single nucleotide polymorphisms.

A and B represent different alleles at the marker locus.

Table 5

Allele counts for SNP rs109386507

Counting schemes                 Alleles
                                 A     B
All alleles (scheme 1)
 Affected sibs                   44    6
 Unaffected sibs                 26    24
Discordant alleles (scheme 2)
 Affected sibs                   18    0
 Unaffected sibs                 18    18

SNP, single nucleotide polymorphisms.

A and B represent different alleles at the marker locus.
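The significance levels reported for the two counting schemes follow from the number of permutations flagged out of 100,000. The Python sketch below reproduces that arithmetic; the function names are hypothetical, and the binomial Monte Carlo standard error is an added illustration of the precision of such an estimate rather than a quantity reported in the study.

```python
import math


def empirical_p(n_extreme, n_perm):
    """Proportion of permutations counted as extreme."""
    return n_extreme / n_perm


def monte_carlo_se(p, n_perm):
    """Binomial standard error of an empirical p-value (illustrative addition)."""
    return math.sqrt(p * (1.0 - p) / n_perm)


n_perm = 100_000
for scheme, n_extreme in (("Scheme 1", 437), ("Scheme 2", 94)):
    p = empirical_p(n_extreme, n_perm)
    print(scheme, p, monte_carlo_se(p, n_perm))
```

The resulting proportions, 0.00437 and 0.00094, correspond to the reported levels of 0.004 and 0.0009.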

We assumed that SNP effects were drawn from normal distributions with different mixture proportions. Such an assumption is plausible since SNPs may explain different proportions of the variance in the phenotype. We ran the Markov chain algorithm 4 times and investigated the trace plots of the number of significant SNPs. Visual inspection of the trace plots showed convergence of the Markov chains (results not shown). The posterior mean number of SNPs was 23. Table 6 shows that most of the SNPs had small effects (<0.001). The largest effects were detected on chromosome 23 (for example, rs109407108 and rs137784196). The additive genetic variation explained by chromosome 23 was 87%. Overall, the 23 SNPs explained 99% of the total genetic variance.
Table 6

Predicted SNP effects by a Bayesian mixture model

SNP           Chromosome   Position    Effect   % Genetic variance
rs109407108   23           35835120    0.3760   0.7433
rs137784196   23           34783637    0.1020   0.0546
rs43610968    9            92909354    0.1000   0.0530
rs132952964   23           33617377    0.0999   0.0525
rs137446885   24           24725330    0.0663   0.0231
rs137544672   23           28200450    0.0551   0.0159
rs109640774   6            1.09E+08    0.0373   0.0073

SNP, single nucleotide polymorphisms.
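The share of genetic variance attributed to chromosome 23 can be checked against the rows of Table 6. The Python sketch below sums the tabulated proportions by chromosome; the variable and function names are hypothetical, and only the seven SNPs listed in the table are included, so the totals cover less than the full set of 23 SNPs.

```python
from collections import defaultdict

# (SNP, chromosome, proportion of genetic variance) as listed in Table 6.
table6 = [
    ("rs109407108", 23, 0.7433),
    ("rs137784196", 23, 0.0546),
    ("rs43610968", 9, 0.0530),
    ("rs132952964", 23, 0.0525),
    ("rs137446885", 24, 0.0231),
    ("rs137544672", 23, 0.0159),
    ("rs109640774", 6, 0.0073),
]


def variance_by_chromosome(rows):
    """Sum the proportion of genetic variance over the SNPs of each chromosome."""
    totals = defaultdict(float)
    for _snp, chrom, share in rows:
        totals[chrom] += share
    return dict(totals)


by_chrom = variance_by_chromosome(table6)
print(round(by_chrom[23], 2))  # 0.87
```

The four chromosome 23 SNPs listed in the table sum to about 0.87, in line with the 87% stated in the text.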

DISCUSSION

The linear mixed models with a genomic relationship matrix and with principal components, the half sib DSP analyses, and the Bayesian mixture model all identified strong genomic signals from chromosome 23 for the hairy gene. Littlejohn et al. (2014) also detected a strong genomic signal using the sib transmission disequilibrium test and suggested prolactin (PRL) as a candidate gene on chromosome 23 for the hairy locus. Both the GRAMMAR (Figure 1) and principal components (Figure 3) approaches employed in this study detected similar genomic regions for the hairy phenotype. However, the principal components approach yielded a higher inflation factor (3.31) (Figure 4) and a higher number of significant SNPs (440) than the GRAMMAR approach (1.19 and 223 SNPs) (Figure 2). Since the GRAMMAR-gamma approach (Svishcheva et al., 2012) explicitly uses a gamma factor to reduce the inflation factor, this result is not surprising. However, larger estimates of the inflation factor do not necessarily reflect population stratification (Yang et al., 2011), especially under polygenic inheritance, as was also pointed out by Lee et al. (2014).
Figure 4

Quantile-quantile plot of the principal components genome wide association results. Inflation factor (lambda) = 3.31 (0.07).

Figure 2

Quantile-quantile plot of the GRAMMAR genome wide association results. Inflation factor (lambda) = 1.19 (0.00007).
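For readers unfamiliar with the inflation factor shown in these plots, the Python sketch below computes a common median-based estimate of lambda from a set of p-values. This is an illustration only: the function name is hypothetical, and the estimator used in the study (which reports a standard error) may differ from the median-based version assumed here.

```python
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def inflation_factor(p_values):
    """Median-based lambda: median observed 1-df chi-square over its null median."""
    nd = NormalDist()
    # A two-sided p-value maps to a 1-df chi-square via the squared normal quantile.
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Hypothetical p-values on a uniform grid, as expected under the null hypothesis.
n = 1001
null_p = [(i - 0.5) / n for i in range(1, n + 1)]
lam = inflation_factor(null_p)
print(round(lam, 3))  # close to 1 when there is no inflation
```

Values well above 1, such as the 3.31 obtained with the principal components approach, indicate test statistics that are larger overall than expected under the null hypothesis.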

We used half sib progenies in the DSP experimental design to confirm the most significant SNPs. Both counting schemes for rs109386507 (Table 4) confirmed the significance of the SNP. Increasing the number of discordant half sib progenies will probably also increase the accuracy. Since the half sib family design is common in animal genetics, such an extension of the DSP design might be useful. Although here we used single locus allele counting schemes for the discordant half sib pair analyses, it is possible to extend the model to the whole genome, as suggested by Boehnke and Langefeld (1998). However, the heavy computational cost of permutations over the genome might be one limitation of the discordant half sib pair model. Both whole genome and single marker regression approaches detected strong genomic signals from chromosome 23. With the Bayesian mixture model, however, we allowed SNPs to explain different proportions of the variance. In that regard, unlike Littlejohn et al. (2014), we also detected genomic signals from, for example, chromosomes 9 (rs43610968) and 24 (rs137446885). Although chromosome 23 is relatively short compared with most of the other chromosomes, it explained 87% of the total genetic variance detected by the Bayesian mixture model. More than 99% of the SNPs had tiny (or zero) effects on the genetic variance. Although there is a major gene effect on the hairy phenotype originating from chromosome 23, the whole genome regression approach also suggested a polygenic component related to other parts of the genome. Such a result could not be obtained by any of the single marker approaches.
  17 in total

1.  Principal components analysis corrects for stratification in genome-wide association studies.

Authors:  Alkes L Price; Nick J Patterson; Robert M Plenge; Michael E Weinblatt; Nancy A Shadick; David Reich
Journal:  Nat Genet       Date:  2006-07-23       Impact factor: 38.330

2.  GenABEL: an R library for genome-wide association analysis.

Authors:  Yurii S Aulchenko; Stephan Ripke; Aaron Isaacs; Cornelia M van Duijn
Journal:  Bioinformatics       Date:  2007-03-23       Impact factor: 6.937

3.  Genomewide rapid association using mixed model and regression: a fast and simple method for genomewide pedigree-based quantitative trait loci association analysis.

Authors:  Yurii S Aulchenko; Dirk-Jan de Koning; Chris Haley
Journal:  Genetics       Date:  2007-07-29       Impact factor: 4.562

4.  Predicting genetic predisposition in humans: the promise of whole-genome markers.

Authors:  Gustavo de los Campos; Daniel Gianola; David B Allison
Journal:  Nat Rev Genet       Date:  2010-11-03       Impact factor: 53.242

5.  Genetic association mapping based on discordant sib pairs: the discordant-alleles test.

Authors:  M Boehnke; C D Langefeld
Journal:  Am J Hum Genet       Date:  1998-04       Impact factor: 11.025

6.  Rapid variance components-based method for whole-genome association analysis.

Authors:  Gulnara R Svishcheva; Tatiana I Axenovich; Nadezhda M Belonogova; Cornelia M van Duijn; Yurii S Aulchenko
Journal:  Nat Genet       Date:  2012-09-16       Impact factor: 38.330

7.  Common SNPs explain a large proportion of the heritability for human height.

Authors:  Jian Yang; Beben Benyamin; Brian P McEvoy; Scott Gordon; Anjali K Henders; Dale R Nyholt; Pamela A Madden; Andrew C Heath; Nicholas G Martin; Grant W Montgomery; Michael E Goddard; Peter M Visscher
Journal:  Nat Genet       Date:  2010-06-20       Impact factor: 38.330

8.  Evidence of a major gene influencing hair length and heat tolerance in Bos taurus cattle.

Authors:  T A Olson; C Lucena; C C Chase; A C Hammond
Journal:  J Anim Sci       Date:  2003-01       Impact factor: 3.159

9.  Genome-wide Association Study of Integrated Meat Quality-related Traits of the Duroc Pig Breed.

Authors:  Taeheon Lee; Dong-Hyun Shin; Seoae Cho; Hyun Sung Kang; Sung Hoon Kim; Hak-Kyo Lee; Heebal Kim; Kang-Seok Seo
Journal:  Asian-Australas J Anim Sci       Date:  2014-03       Impact factor: 2.509

10.  Simultaneous discovery, estimation and prediction analysis of complex traits using a bayesian mixture model.

Authors:  Gerhard Moser; Sang Hong Lee; Ben J Hayes; Michael E Goddard; Naomi R Wray; Peter M Visscher
Journal:  PLoS Genet       Date:  2015-04-07       Impact factor: 5.917
