Literature DB >> 21048968

Genome wide association studies for milk production traits in Chinese Holstein population.

Li Jiang1, Jianfeng Liu, Dongxiao Sun, Peipei Ma, Xiangdong Ding, Ying Yu, Qin Zhang.   

Abstract

Genome-wide association studies (GWAS) based on high throughput SNP genotyping technologies open a broad avenue for exploring genes associated with milk production traits in dairy cattle. Motivated by pinpointing novel quantitative trait nucleotide (QTN) across Bos Taurus genome, the present study is to perform GWAS to identify genes affecting milk production traits using current state-of-the-art SNP genotyping technology, i.e., the Illumina BovineSNP50 BeadChip. In the analyses, the five most commonly evaluated milk production traits are involved, including milk yield (MY), milk fat yield (FY), milk protein yield (PY), milk fat percentage (FP) and milk protein percentage (PP). Estimated breeding values (EBVs) of 2,093 daughters from 14 paternal half-sib families are considered as phenotypes within the framework of a daughter design. Association tests between each trait and the 54K SNPs are achieved via two different analysis approaches, a paternal transmission disequilibrium test (TDT)-based approach (L1-TDT) and a mixed model based regression analysis (MMRA). In total, 105 SNPs were detected to be significantly associated genome-wise with one or multiple milk production traits. Of the 105 SNPs, 38 were commonly detected by both methods, while four and 63 were solely detected by L1-TDT and MMRA, respectively. The majority (86 out of 105) of the significant SNPs is located within the reported QTL regions and some are within or close to the reported candidate genes. In particular, two SNPs, ARS-BFGL-NGS-4939 and BFGL-NGS-118998, are located close to the DGAT1 gene (160bp apart) and within the GHR gene, respectively. Our findings herein not only provide confirmatory evidences for previously findings, but also explore a suite of novel SNPs associated with milk production traits, and thus form a solid basis for eventually unraveling the causal mutations for milk production traits in dairy cattle.

Entities:  

Mesh:

Year:  2010        PMID: 21048968      PMCID: PMC2965099          DOI: 10.1371/journal.pone.0013661

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Over the last decades, advances in DNA-based marker technology make it possible to identify genome regions (namely quantitative trait loci, QTL) underlying complex traits such as milk yield in dairy cattle. Instead of traditional animal breeding programmes solely relying on phenotype and pedigree information, the incorporation of detected QTL into genetic evaluation provides a great potential to enhance selection accuracies, hence expediting the genetic improvement of animal productivity. In dairy cattle, since the seminal work on QTL mapping by Georges el al [1], a large number of articles have been published concerning detection of QTLs for milk production traits. So far a total number of 1,137 QTL for milk production traits have been reported via genome scan based on marker-QTL linkage analyses (http://www.animalgenome.org/QTLdb/cattle.html, May 22, 2010). The limitations of QTL mapping using linkage analysis (LA) and/or linkage disequilibrium (LD) [2] based on panels of low to moderate density markers have been well documented previously [3], [4]. In the past decades merely few strong candidate genes with potential effects on milk production traits, i.e., the DGAT1 gene [5] and the GHR gene [6], have been identified and/or functionally confirmed from those findings derived from QTL linkage analyses and fine mapping studies. With the advent of genome-wide panels of single nucleotide polymorphisms (SNPs), SNPs have been widely used for the detection and localization of QTL for complex traits in many species [7], and have proved powerful and useful in identification of casual mutations associated with economically important traits in livestock [8], [9], [10], [11], [12], [13], [14], [15] as well as human diseases [16], [17], [18], [19]. Most recently, along with maturing of genome sequencing and high throughput SNP genotyping technologies, genome-wide association studies (GWAS) are becoming practical for exploring genes associated with complex traits. Compared with traditional QTL mapping strategy, GWAS brings on major advantages both in power to detect causal variants with modest effects and in defining narrower genomic regions harboring causal variants [20]. GWAS has been widely accepted as a primary approach for gene finding and achieved huge success in identifying genes conferring modest disease risks in human. However, only few GWAS focusing on identifying genes for milk production traits have been performed [21], [22]. Furthermore, the common limitation of these studies is that low-density SNP makers were employed in the analyses, leading to a decrease in power to capturing causal genes. Motivated by searching for novel casual variants for milk production traits beyond previous findings via traditional linkage studies, the present study is to perform GWAS to detect potential casual genetic variants for milk production traits, using the Illumina BovineSNP50 BeadChip. The identified SNP loci may be considered as preliminary foundation for further replication studies and eventually unraveling the causal mutations for milk production traits in dairy cattle.

Materials and Methods

The blood samples were collected along with the regular quarantine inspection of the farms, so no ethical approval was required for this study.

Animal resource

A daughter design was employed in this study. In total 2,093 daughters as well as their 14 corresponding sires were collected to construct the study population. The numbers of daughters of the 14 sires range from 83 to 358 daughters with an average of 150. These daughters were from 15 Holstein cattle farms in Beijing, China, where regular and standard performance testing (dairy herd improvement, DHI) has been conducted since 1999. The official up to date estimated breeding values (EBVs) of five milk production traits, including milk yield (MY), fat yield (FY), protein yield (PY), fat percentage (FP), and protein percentage (PP) were used as phenotypes in this study. These EBVs were obtained based on a multiple trait random regression test-day model [23] using the software RUNGE provided by Canadian Dairy Network (CDN) (http://www.cdn.ca). The descriptive statistics of these EBVs for the five traits as well as the average reliabilities of EBVs of the 2,093 daughters are presented in Table 1. It is notable that the program RUNGE gave two sets of accuracies of EBVs for the five milk production traits. One is for milk yield (MY) and the other for the four milk content traits (FY, PY, FP, and PP). This is because that the amount of information used for calculating EBVs was different for MY and for the 4 milk content traits, while all of the 4 milk content traits provided the same amount of information for calculating EBVs.
Table 1

Descriptive statistics of EBVs and the accuracy of five milk production traits for 2,093 daughters.

TraitsMeanStandard deviationMinimumMaximumMean reliability (range)
Milk yield (MY)379.36608.65−1667.002553.000.63 (0.50–0.71)
Fat yield (FY)7.4924.37−73940.52 (0.41–0.70)
Protein yield (PY)10.7217.05−49740.52 (0.41–0.70)
Fat% (FP)−0.070.91−0.900.270.52 (0.41–0.70)
Protein% (PP)−0.010.32−0.420.100.52 (0.41–0.70)

Genotyping

DNA was extracted from blood sample of the daughters and semen sample of the sires using the routine procedures. DNA was quantified and genotyped using the Illumina BovineSNP50 BeadChip containing 54001 SNPs, which is a multi-sample genotyping panel powered by Illumina's Infinium® II Assay. Features of the Illumina BovineSNP50 BeadChip have been detailed previously [24]. All samples were genotyped using BEADSTUDIO (Illumina) and a custom cluster file developed from the 2180 samples.

Genotype quality control

To assess the technical reliability of the genotyping panel, a randomly selected DNA sample was genotyped twice and over 99% identity of called genotypes (two mismatches) was obtained. This demonstrates the technically robust feature of the 50K SNP BeadChip panel employed herein. The quality control procedure can be largely split into two categories, including individual exclusion and SNP removal, as follows: Firstly, an individual would be excluded from the analyses if it had more than 10% missing genotypes or its SNP genotypes had a Mendelian error rate above 2%. For the second criterion, for each sire-daughter pair, we randomly choose 10,000 genotyped SNP loci for which both the sire and the daughter are homozygotes. A Mendelian error happens herein if the two homozygotes are different in the context that the maternal genotype is unavailable. Accordingly, if more than 200 out of 10,000 SNP have Mendelian errors, the daughter will be removed from the sample. Secondly, a SNP would be removed if (1) its call rate was less than 90%, or (2) its minor allele frequency (MAF) was less than 3%, or (3) it was severely depart from Hardy Weinberg Equilibrium (HWE) with a P value lower than 10−6, or (4) its minor genotype frequency was less than five individuals. After the quality control procedures, 73 daughters with >10% missing genotypes and 205 daughters with Mendelian error rate above 2% were excluded, leading to 1,815 daughters remaining for the association analysis. On the other hand, we removed 1,218 SNPs with <90% genotype call rate, 11,008 SNPs with a MAF <0.03, 482 SNPs with extreme value of HWE statistics (P<10−06), and 1,073 SNPs with minor genotype frequency <5 individuals. Eventually, 40,220 SNPs (74.5%) passed these quality control filters. The distribution of the remaining SNPs after filtering and the average distances between adjacent SNPs on each chromosome are given in Table 2. In addition, for the L1-TDT analyses, we excluded extra 1,057 SNPs for which all paternal genotypes are homozygotes, and 39,163 SNPs were finally utilized.
Table 2

Distributions of SNPs after quality control and the average distances between adjacent SNPs on each chromosome.

BTANo. SNPsAverage distance (kb)a
1248565
2203369
3196665
4186966
5160778
6191364
7171165
8179765
9149972
10158667
11169965
12123369
13131864
14130962
15126567
16118066
17120164
18103964
19106861
20120563
21105366
2293966
2383863
2494669
2576957
2677467
2773067
2871964
2979465
X497179
0b 1178
TOTAL40220

Derived from the most recent bovine genome sequence assembly (Btau_4.0) (http://www.ncbi.nlm.nih.gov/genome/guide/cow/)

These SNPs are not assigned to any chromosomes.

Derived from the most recent bovine genome sequence assembly (Btau_4.0) (http://www.ncbi.nlm.nih.gov/genome/guide/cow/) These SNPs are not assigned to any chromosomes.

Statistical analyses

Two methods are adopted to perform GWAS in our studies as follows:

TDT-based single locus regression analyses (L1-TDT)

L1-TDT is a TDT-based association procedure [25], which is specifically suitable for the situation where only a single parent instead of both parents are genotyped for TDT analyses. As merely the genotypes of bulls and their daughters are available within the framework of a daughter design, we employed it to explore the existence of associations between phenotypes and SNP allele transmissions from bulls to their daughters within sire families. Under such circumstance, a phenotypic observation, i.e., EBV considered herein, can be modeled by a SNP effect within family due to transmission disequilibrium of the SNP alleles as well as the effect of the sire corresponding to each half-sib family. For each milk production trait, the equation of the model is given as follows:where y is the EBV of the j th daughter of sire i, μ is the overall mean, s is the fixed effect of sire i, TDS is an indicator variable with a value −1, 0 or 1 to indicate the transmission of a specific SNP allele from sire i to his j th daughter, which is determined according to [26], β is the regression coefficient (or the substitution effect of the SNP), and e is the residual error. For each SNP, β is estimated via a weighted least squares analysis with the weights equal to 1/REL, where REL is the reliability of the EBV of daughter j in family i. The association between the SNP and the trait is tested via the F-test.

Mixed model based single locus regression analyses (MMRA)

Similar to the studies of [21] and [22], we performed association test for each SNP via regression analysis based on the following linear mixed model:where y is the vector of EBVs of all daughters, b is the regression coefficient of EBV on SNP genotypes, x is the vector of the SNP genotype indicators which takes values 0, 1 or 2 corresponding to the three genotypes 11, 12 and 22 (assuming 2 is the allele with a minor frequency), a is the vector of the residual polygenetic effects with (where A is the additive genetic relationship matrix and is the additive variance, and e is the vector of residual errors with (where W is a diagonal matrix with the diagonal elements equal to 1/REL and is the residual error variance). For each SNP, the estimate of b and the corresponding sampling variance can be obtained via mixed model equations (MME), and a Wald chi-squared statistic with df = 1 is constructed to examine whether the SNP is associated with the trait. We employed Fortran 95 to code the computing programs for L1-TDT and MMRA and they are available upon request.

Statistical Inference

For both analyses, the Bonferroni method was adopted to adjust for multiple testing from the number of SNP loci detected. We declared a significant SNP at the genome-wise significance level if a raw P value <0.05/N, here N is the number of SNP loci tested in analyses.

Population stratification assessment

Confounding due to population stratification has been considered as a major plague to the validity of genetic association studies [27]. To view if the population stratification exists in our experimental population, we examined the distribution of the test statistics obtained from the numerous association tests performed and assessed their deviation from the expected distribution of no SNP being associated with the trait of interest utilizing a quantile-quantile (Q-Q) plot, which is a routine and most frequently used tool for scrutinizing the population stratification in GWAS. Since merely MMRA method is not immune to potential population stratification, “Q-Q” plots for the test statistics of MMRA were conducted for the five traits.

Results

Significant SNPs

The profiles of P values (in terms of −log(p)) of all tested SNPs for the five investigated traits are shown in Fig. 1. The numbers of genome-wise significant SNPs detected by L1-TDT or MMRA for the five traits are presented in Table 3. In total, the numbers of significant SNPs detected by either L1-TDT or MMRA for the five traits are 20, 9, 21, 65 and 28, respectively. Since some of these SNPs are associated with more than one trait, the total number of distinct identified SNPs is 105. Of these 105 SNPs, 38 were commonly detected by both methods, while four and 63 were solely detected by L1-TDT and MMRA, respectively. With exception of only four SNPs, all SNPs detected by L1-TDT were also detected by MMRA. The details of these significant SNPs for the five traits, including their positions in the genome, the nearest known genes and the raw P values, are given in Tables 4 through 8, respectively, and further described as follows.
Figure 1

Genome-wide plots of −log10(p-values) for association of SNP loci with five milk production traits in sequential order.

Chromosomes 1–29 and X are shown separated by color. Fig. 1-a1, 1-a2, 1-a3, 1-a4 and 1-a5 refer to plots generated by L1-TDT for MY, FY, PY, FP and PP, respectively. Fig. 1-b1, 1-b2, 1-b3, 1-b4 and 1-b5 refer to plots generated by MMRA for MY, FY, PY, FP and PP, respectively. The corresponding horizontal lines indicate the genome-wise significance levels (−log10(1.28×10−6) for L1-TDT and −log10(1.24×10−6) for MMRA).

Table 3

Numbers of significant SNPs detected by L1-TDT and MMRA.

TraitL1-TDTMMRAOverlapa Totalb
MY1118920
FY1919
PY521521
FP37613365
PP1027928

Numbers of SNPs commonly detected by both methods.

Numbers of SNPs detected by either L1-TDT or MMRA.

Table 4

Genome-wise significant (p<0.05) SNPs for milk yield (MY).

SNPChr.Position (bp)Nearest genec Raw P valueAdjusted P value
NameDistance (bp)
ARS-BFGL-NGS-91705b (rs43282015)1149650628 LOC614166 54909.21E-073.70E-02
Hapmap38643-BTA-95454b 392862402 LOC534011 Within6.24E-072.51E-02
BFGL-NGS-110018a (rs41647754)564833594 HAL 3571.50E-075.89E-03
ARS-BFGL-NGS-49079b (rs42517915)95763632 LOC788012 567573.53E-071.42E-02
BTB-01921442 b (rs43030751)921062122 LOC100139865 309526.61E-072.66E-02
Hapmap30383-BTC-005848ab 1476703 C14H8orf33 877.58E-163.05E-11
ARS-BFGL-NGS-57820 ab 14236532 FOXH1 33961.39E-205.59E-16
ARS-BFGL-NGS-34135 ab 14260341 CYHR1 Within6.18E-082.49E-03
ARS-BFGL-NGS-94706 b (rs17870736)14281533 VPS28 Within4.08E-091.64E-04
ARS-BFGL-NGS-4939 ab 14443937 DGAT1 1601.16E-254.67E-21
Hapmap52798-ss46526455 a (rs41256919)14565311 MAF1 Within8.22E-083.22E-03
ARS-BFGL-NGS-107379 ab 14679600 LOC786966 4602.33E-209.37E-16
UA-IFASA-6878ab (rs41629750)141044041 GRINA 156628.74E-083.52E-03
Hapmap25486-BTC-072553b 141285037 GML Within3.32E-071.34E-02
Hapmap30646-BTC-002054ab 141461085 GPIHBP1 12951.02E-104.10E-06
Hapmap30086-BTC-002066ab 141490178 ZNF66 15667.74E-103.11E-05
ARS-BFGL-NGS-100480 ab 142607583 NIBP Within2.91E-091.17E-04
UA-IFASA-6329 b (rs41579243)143465237 COL22A1 98641.59E-096.39E-05
BFGL-NGS-110563 b 143799228 COL22A1 845542.25E-079.05E-03
Hapmap50053-BTA-61516 b 2639018261 C26H10orf84 287244.83E-071.94E-02

Note: SNPs with a superscript “a” are detected by L1-TDT only, SNPS with a superscript “b” are detected by MMRA only, SNPs with a superscript “ab” are detected by both L1-TDT and MMRA, and SNPs in italic are located within the QTL regions reported previously. Names in parentheses are standard RRS/RS names in the NCBI database (http://www.ncbi.nlm.nih.gov).

The nearest known gene to the significant SNP.

Table 5

Genome-wise significant (p<0.05) SNPs with fat yield (FY).

SNPChr.Position (bp)Nearest geneRaw P valueAdjusted P value
NameDistance (bp)
Hapmap57440-rs29017368 b (rs29017368)562242550 LOC515967 24192.22E-078.93E-03
Hapmap40191-BTA-73919 b (rs41648982)576882812 LOC511240 698458.10E-073.26E-02
Hapmap30381-BTC-005750 b 1450872 C14H8orf33 237011.14E-064.59E-02
ARS-BFGL-NGS-57820 b 14236532 FOXH1 33961.23E-104.95E-06
ARS-BFGL-NGS-34135 b 14260341 CYHR1 Within1.29E-115.19E-07
ARS-BFGL-NGS-94706 ab (rs17870736)14281533 VPS28 Within8.93E-073.50E-02
ARS-BFGL-NGS-4939 b 14443937 DGAT1 1601.01E-144.06E-10
ARS-BFGL-NGS-107379 b 14679600 LOC786966 4601.47E-095.91E-05
BTA-90435-no-rsb (rs41664719)X70532837 LOC516454 2852893.35E-071.35E-02

Note: See note to Table 4.

Table 6

Genome-wise significant (p<0.05) SNPs with protein yield (PY).

SNPChr.Position (bp)Nearest geneRaw P valueAdjusted P value
NameDistance (bp)
BTA-55340-no-rs b (rs41586699)1145954149 PDE9A Within9.72E-073.91E-02
BFGL-NGS-113002b 1149153073 DIP2A Within4.56E-071.83E-02
ARS-BFGL-NGS-91705b (rs43282015)1149650628 LOC614166 54903.57E-091.44E-04
ARS-BFGL-NGS-98387 b 1154783580 ETS2 1697454.93E-071.98E-02
INRA-701 b (rs41589462)333837442 LOC539739 Within7.75E-073.12E-02
BFGL-NGS-115461 b 345261895 SLC30A7 73068.47E-073.41E-02
Hapmap58769-rs29025951 b (rs29025951)347803786 LOC100138725 2194933.11E-071.25E-02
ARS-BFGL-NGS-4358b 350781421 LOC781902 3857643.04E-071.22E-02
Hapmap38643-BTA-95454b 392862402 LOC534011 Within3.77E-071.52E-02
BFGL-NGS-110018 b (rs41647754)564833594 HAL 3578.53E-073.43E-02
ARS-BFGL-NGS-7249 b 630503076 PDHA2 132658.38E-083.37E-03
BTA-83825-no-rs b (rs41659807)97704446 LOC788115 608549.79E-073.94E-02
Hapmap30383-BTC-005848ab 1476703 C14H8orf33 872.99E-131.20E-08
ARS-BFGL-NGS-57820 ab 14236532 FOXH1 33961.71E-126.88E-08
ARS-BFGL-NGS-4939 ab 14443937 DGAT1 1605.81E-132.34E-08
ARS-BFGL-NGS-107379 ab 14679600 LOC786966 4606.18E-122.49E-07
Hapmap30646-BTC-002054 b 141461085 GPIHBP1 12951.55E-076.23E-03
ARS-BFGL-NGS-100480 ab 142607583 NIBP Within1.16E-064.67E-02
UA-IFASA-6329 b (rs41579243)143465237 COL22A1 98647.59E-073.05E-02
ARS-BFGL-BAC-10793 b 1427452257 NKAIN3 1068857.15E-072.88E-02
Hapmap50053-BTA-61516b 2639018261 C26H10orf84 287241.11E-064.46E-02

Note: See note to Table 4.

Table 7

Genome-wise significant (p<0.05) SNPs with fat percentage (FP).

SNPChr.Position (bp)Nearest geneRaw P valueAdjusted P value
NameDistance (bp)
Hapmap39717-BTA-112973 (rs41617243)227529202 KBTBD10 Within1.80E-077.24E-03
Hapmap51303-BTA-74377b (rs41652649)589694749 ITPR2 Within1.20E-064.83E-02
BTB-00285653b (rs43499009)831663727 NFIB Within2.41E-079.69E-03
BFGL-NGS-119907ab 11106766451 GFI1B 155561.30E-065.23E-02
ARS-BFGL-NGS-26919a 11107216562 LOC526069 Within6.87E-082.69E-03
Hapmap30381-BTC-005750ab 1450872 C14H8orf33 237012.28E-189.17E-14
Hapmap30383-BTC-005848ab 1476703 C14H8orf33 871.05E-284.22E-24
BTA-34956-no-rsab (rs41630614)14101473 LOC785799 34792.37E-189.53E-14
ARS-BFGL-NGS-57820ab 14236532 FOXH1 33961.82E-487.32E-44
ARS-BFGL-NGS-34135ab 14260341 CYHR1 Within1.71E-306.88E-26
ARS-BFGL-NGS-94706ab (rs17870736)14281533 VPS28 Within5.78E-302.32E-25
ARS-BFGL-NGS-4939 ab 14443937 DGAT1 1605.23E-642.10E-59
Hapmap52798-ss46526455ab (rs41256919)14565311 MAF1 Within4.56E-121.83E-07
ARS-BFGL-NGS-71749 ab 14596341 OPLAH 32371.20E-114.83E-07
ARS-BFGL-NGS-107379 ab 14679600 LOC786966 4601.77E-457.12E-41
ARS-BFGL-NGS-18365 ab 14741867 LOC524939 152422.13E-088.57E-04
Hapmap25384-BTC-001997 b 14835054 MAPK15 Within6.22E-082.50E-03
Hapmap24715-BTC-001973 b 14856889 MAPK15 Within2.74E-081.10E-03
BTA-35941-no-rs ab (rs41627764)14894252 ZNF623 87792.72E-151.09E-10
ARS-BFGL-NGS-101653b 14931162 EEF1D Within8.84E-113.56E-06
ARS-BFGL-NGS-26520 ab 14996982 ZC3H3 Within3.94E-141.58E-09
UA-IFASA-6878 ab (rs41629750)141044041 GRINA 156621.69E-136.80E-09
ARS-BFGL-NGS-22866 b 141131952 LYPD2 26533.30E-101.33E-05
Hapmap25486-BTC-072553 ab 141285037 GML Within1.09E-134.38E-09
Hapmap29758-BTC-003619 ab 141339276 CYP11B1 366521.95E-087.84E-04
Hapmap30646-BTC-002054 ab 141461085 GPIHBP1 12956.30E-202.53E-15
Hapmap30086-BTC-002066 ab 141490178 ZNF66 15666.61E-202.66E-15
Hapmap30374-BTC-002159 ab 141546591 RHPN1 Within5.44E-132.19E-08
ARS-BFGL-NGS-74378 b 141889210 GPR20 711.46E-085.87E-04
BFGL-NGS-117542 b 141913108 GPR20 239697.64E-103.07E-05
ARS-BFGL-NGS-33248 ab 142130912 PTK2 Within1.26E-085.07E-04
UA-IFASA-9288 b (rs41624797)142201870 PTK2 Within2.19E-128.81E-08
ARS-BFGL-NGS-22111 ab 12347219 EIF2C2 258066.59E-082.65E-03
UA-IFASA-7269 ab (rs41576704)142370256 EIF2C2 27691.02E-084.10E-04
Hapmap26072-BTC-065132 b 142391826 EIF2C2 Within2.15E-088.65E-04
ARS-BFGL-NGS-56327 b 142580414 NIBP Within1.58E-086.35E-04
ARS-BFGL-NGS-100480 ab 142607583 NIBP Within9.14E-163.68E-11
UA-IFASA-5306 b (rs55617160)142711615 NIBP Within2.12E-138.53E-09
UA-IFASA-5765 a 142763657 NIBP Within5.82E-072.28E-02
ARS-BFGL-BAC-25166 a 142805785 NIBP Within5.11E-082.00E-03
Hapmap27703-BTC-053907 ab 142826073 NIBP Within3.03E-111.22E-06
Hapmap22692-BTC-068210 b 143018726 KCNK9 253122.08E-078.37E-03
Hapmap23302-BTC-052123 b 143099635 KCNK9 1062212.03E-078.16E-03
Hapmap25217-BTC-067767 b 143189312 KCNK9 1958981.04E-074.18E-03
UA-IFASA-6329 ab (rs41579243)143465237 COL22A1 98644.29E-131.73E-08
ARS-BFGL-NGS-3571 ab 143587018 COL22A1 Within5.38E-092.16E-04
BFGL-NGS-118478 a 143660264 COL22A1 Within2.92E-071.14E-02
BFGL-NGS-110563 ab 143799228 COL22A1 845543.31E-161.33E-11
Hapmap32262-BTC-066621 b 143834069 LOC618755 673213.92E-111.58E-06
BFGL-NGS-115947 b 143865962 LOC618755 354283.79E-071.52E-02
Hapmap30091-BTC-005211 b 143940998 LOC618755 Within2.10E-078.45E-03
Hapmap27709-BTC-057052 ab (rs42305942)144276966 LOC100138440 385544.14E-071.67E-02
Hapmap51646-BTA-86764 b (rs41657812)144302229 LOC100138440 132917.91E-083.18E-03
Hapmap26591-BTC-056596 b 144477036 LOC100138440 1577016.62E-072.66E-02
Hapmap23618-BTC-056528 b (rs42310935)144518666 LOC100138440 1993312.06E-088.29E-04
Hapmap30988-BTC-056315 ab 144693901 LOC100138440 3745662.88E-081.16E-03
UA-IFASA-6228 b 145204594 KHDRBS3 5602158.09E-073.25E-02
ARS-BFGL-BAC-20965 b 145225004 KHDRBS3 5398052.77E-071.11E-02
BFGL-NGS-110894 b 145282438 KHDRBS3 4823719.21E-093.70E-04
Hapmap33635-BTC-049051 b 145318261 KHDRBS3 4465483.64E-071.46E-02
Hapmap23851-BTC-048718 b 145387836 KHDRBS3 3769731.26E-065.07E-02
Hapmap32234-BTC-048199 ab 145640338 KHDRBS3 1244713.82E-161.54E-11
UA-IFASA-6647 ab 145808644 KHDRBS3 Within2.29E-149.21E-10
Hapmap32948-BTC-047992 b 145839290 KHDRBS3 Within3.71E-071.49E-02
ARS-BFGL-BAC-8730 ab 146252101 MIRN30D 2214207.03E-132.83E-08

Note: See note to Table 4.

Table 8

Genome-wise significant (p<0.05) SNPs with protein percentage (PP).

SNPChr.Position (bp)Nearest geneRaw P ValueAdjusted P value
NameDistance (bp)
Hapmap48524-BTA-92140b (rs42552739)580965296 NCF4 247493.69E-111.48E-06
ARS-BFGL-NGS-21133b 581003368 CSF2RB 20323.29E-071.32E-02
Hapmap59369-rs29018333 b (rs29018333)633989255 LOC536367 Within1.52E-086.11E-04
Hapmap24324-BTC-062449 b 637024132 HERC3 67382.33E-179.37E-13
BTA-121739-no-rs b (rs41622323)637454409 PKD2 Within7.08E-072.85E-02
BTB-00251047 b (rs43463988)641928694 LOC100140991 4016344.33E-071.74E-02
Hapmap41083-BTA-76098 b (rs41652041)680715299 LOC100140587 43191.17E-064.71E-02
ARS-BFGL-NGS-57820 ab 14236532 FOXH1 33962.82E-081.13E-03
ARS-BFGL-NGS-34135ab 14260341 CYHR1 Within3.88E-091.56E-04
ARS-BFGL-NGS-94706ab (rs17870736)14281533 VPS28 Within2.28E-089.17E-04
ARS-BFGL-NGS-4939 ab 14443937 DGAT1 1603.73E-081.50E-03
ARS-BFGL-NGS-107379 ab 14679600 LOC786966 4607.73E-073.11E-02
UA-IFASA-6878 b (rs41629750)141044041 GRINA 156626.79E-072.73E-02
Hapmap27703-BTC-053907 a 142826073 NIBP Within3.95E-071.55E-02
BFGL-NGS-118998 ab 2034036832 GHR Within7.87E-073.17E-02
ARS-BFGL-BAC-2469 b (rs41937533)2035552477 LOC518808 225141.21E-084.87E-04
BTA-50402-no-rs b (rs41945918)2036668000 LOC782462 195107.57E-073.04E-02
Hapmap57531-rs29013890 b (rs29013890)2036955575 LOC782833 320198.97E-083.61E-03
BTB-00778154 ab (rs41941646)2037399087 C9 102196.21E-072.50E-02
BTB-00778141 ab (rs41941633)2037442583 FYB 353601.91E-117.68E-07
ARS-BFGL-NGS-38482 b 2037708167 RICTOR Within1.99E-088.00E-04
Hapmap39660-BTA-50453 b (rs41581059)2037865657 LOC100138964 Within2.97E-071.19E-02
ARS-BFGL-NGS-22355b 2038899763 GDNF 178315.65E-072.27E-02
BTB-00782435b (rs41942492)2039485917 NIPBL Within1.02E-074.10E-03
BTA-13793-rs29018751b (rs29018751)2039518857 NIPBL Within6.27E-102.52E-05
BTB-01842107b (rs42954630)2039601103 NIPBL Within1.53E-106.15E-06
Hapmap53199-rs29014437a (rs29014437)2039728147 LOC782284 381243.35E-071.35E-02
BTA-102910-no-rsb (rs41574319)2041947097 RAI14 1713702.64E-071.06E-02

Note: See note to Table 4.

Genome-wide plots of −log10(p-values) for association of SNP loci with five milk production traits in sequential order.

Chromosomes 1–29 and X are shown separated by color. Fig. 1-a1, 1-a2, 1-a3, 1-a4 and 1-a5 refer to plots generated by L1-TDT for MY, FY, PY, FP and PP, respectively. Fig. 1-b1, 1-b2, 1-b3, 1-b4 and 1-b5 refer to plots generated by MMRA for MY, FY, PY, FP and PP, respectively. The corresponding horizontal lines indicate the genome-wise significance levels (−log10(1.28×10−6) for L1-TDT and −log10(1.24×10−6) for MMRA). Numbers of SNPs commonly detected by both methods. Numbers of SNPs detected by either L1-TDT or MMRA. Note: SNPs with a superscript “a” are detected by L1-TDT only, SNPS with a superscript “b” are detected by MMRA only, SNPs with a superscript “ab” are detected by both L1-TDT and MMRA, and SNPs in italic are located within the QTL regions reported previously. Names in parentheses are standard RRS/RS names in the NCBI database (http://www.ncbi.nlm.nih.gov). The nearest known gene to the significant SNP. Note: See note to Table 4. Note: See note to Table 4. Note: See note to Table 4. Note: See note to Table 4.

Milk Yield (MY)

As seen from Table 4, 14 out of 20 SNPs are located within a 3.63 Mb segment (between 0.07 and 3.7 Mb) on BTA 14. Ten of them fall into the regions that have been reported to harbor QTL for MY [5], [21], [28], [29], [30], [31], [32]. Furthermore, 6 of these SNPs are harbored within the regions of known genes, and the others are located 87 to 84,554 bp away from the nearest known genes.

Fat Yield (FY)

As presented in Table 5, 6 out of 9 SNPs are clustered within a 0.55 Mb segment (between 0.05 and 0.6 Mb) on BTA 14. Eight out of them fall in the regions which have been reported to harbor QTL for FY previously [5], [28], [31], [33], [34], [35]. Furthermore, two of these SNPs fall within the regions of known genes, and the others are located 160 to 285,289 bp away from the nearest known genes.

Protein Yield (PY)

As shown in Table 6, among these 21 SNPs, 7 out of them are located within a 3.33 Mb segment (between 0.07 to 3.4 Mb) on BTA 14. Further, 14 out of these SNPs are within the QTL regions for PY reported in previous studies [5], [21], [28], [33], [36], [37], [38], [39]; 5 of them are located within the regions of known genes, and the others are located 87 to 385,764 bp away from the nearest known genes.

Fat Percentage (FP)

From Table 7, 60 are located within a 6.2 Mb segment (between 0.05 to 6.25 Mb) on BTA 14. 53 of them are located within the QTL regions for FP reported in previous studies [5], [34], [39], [40], [41], [42], [43], [44]. Further, 27 of the 65 detected SNPs are located within the regions of known genes, and the others are 71 to 560,215 bp away from the nearest known genes.

Protein Percentage (PP)

As given by Table 8, out of 28 identified SNPs, there are 4, 7, and 14 SNPs located within a 8.0 Mb segment (between 33.9 to 41.9 Mb) on BTA6, a 2.59 Mb segment (between 0.23 to 2.82 Mb) on BTA14, and a 7.9 Mb segment (between 34.0 to 41.9 Mb) on BTA20, respectively. Among these 28 SNPs, 17 are located within the QTL regions for PP identified in previous studies [5], [28], [29], [34], [42], [45]. Further, 11 of these 28 SNPs are located within the regions of known genes, and the others are 160 to 401,634 bp away from the nearest known genes. The “Q-Q” plots for the test statistics of MMRA are shown in Fig. 2-1 to 2-5. From these plots, it is apparent that the distributions of the χ statistics generated from the association tests across the SNPs tested show no evidence of overall systematic bias. That is, the observed χ statistics of the significant SNPs are above the expected χ statistics, which are largely at the adjusted genome-wide significance level. The profiles of the Q-Q plots clearly show that the significant SNPs identified by MMRA are unlikely threaten by potential population stratification.
Figure 2

Quantile-quantile (Q-Q) plots of genome-wide association results by MMRA for five milk production traits.

Under the null hypothesis of no association at any SNP locus, the points would be expected to follow the slope lines. Deviations from the slope lines correspond to loci that deviate from the null hypotheses.

Quantile-quantile (Q-Q) plots of genome-wide association results by MMRA for five milk production traits.

Under the null hypothesis of no association at any SNP locus, the points would be expected to follow the slope lines. Deviations from the slope lines correspond to loci that deviate from the null hypotheses.

Discussion

In this study, we performed a GWA study for five milk production traits using a daughter design in Chinese Holstein population. To our knowledge, this is one of the first GWA studies for milk production traits using the Illumina BovineSNP50 BeadChip. Two statistical methods, L1-TDT and MMRA, were implemented to analyze association between SNPs and phenotypes. These two methods belong to two distinct analytical approaches, i.e., family-based (L1-TDT) and population-based (MMRA) approaches, respectively, both of which have been widely employed in GWAS. Comparisons between the two methods have been well conducted by many investigators [27], [46], [47], [48]. Consensus with respect to their performance is twofold. On the one hand, population-based analyses largely outperform family-based analyses in statistical power and efficiency. The power limitation of family-based analyses results from “overmatching” on genotype [49]. Much fewer significant SNPs detected by L1-TDT compared with MMRA in this study present consistent evidence for this aspect in practice. On the other hand, family-based analysis always guards against population admixture/stratification caused by recent migration and/or non-random mating, and do not give spurious significant results, although at the expense of some loss of power [50]. The “Q-Q” plots for the test statistics of MMRA (Fig. 2-1 to 2-5) demonstrate that no population admixture/stratification exists in our population. Therefore, it is safe to declare that the SNPs detected by MMRA as well as L1-TDT have convincing associations with the traits of interest. BTA14 has been received wide attention by many investigators. Apart from a large number of QTL reported on BAT14 [34], [39], [51], [52], the well-known DGAT1 gene[5] located at ∼0.44Mb is generally accepted as a major gene affecting milk production traits. Bennewitz et al. [28] revisited the QTL on BTA14 and concluded that there should exist a further conditional QTL which should be in linkage with the DGAT1 gene, and possible epistatic effects arising from them may be an additional source of genetic variation for milk production traits. Indeed, Kaupe et al. [53] recently reported that the CYP11B1 gene located at ∼1.33Mbp has significant effects on MY, PY, FP and PP, and the allele substitution effects of CYP11B1 and DGAT1 together explained more variation in milk production traits than DGAT1 alone. In our study, an apparent feature of our findings is that a large proportion of the significant SNPs (61 out of 105) are located on BTA14. Of the 61 SNPs, 59 are located within the reported QTL regions. In particular, all segments on BTA14 which harbor multiple SNPs for the five traits also harbor the DGAT1 gene, and the four segments for MY, PY, FP and PP also harbor the CYP11B1 gene. Within these segments, 13 SNPs are located very close (within 1Mb) to the DGAT1 gene with the closest one (ARS-BFGL-NGS-4939) only 160bp away from it and 14 SNPs very close to the CYP11B1 gene with the closest one (Hapmap25486-BTC-072553 ) only 8,693bp away from it. In addition to the SNPs on BTA14, most (27 out of 44) of the significant SNPs on other chromosomes are also located within the reported QTL regions. Further, some SNPs are also within or close (within 1Mb) to the reported candidate genes (for a summary of cattle candidate genes for milk production traits, see [54]). In particular, a SNP (BFGL-NGS-118998) located at 34,036,832 bp on BTA20 was found to fall within the GHR gene, which is also generally accepted as a functional causal gene affecting milk yield and components [5], [6]. The other SNPs include the SNPs BTA-121739-no-rs and Hapmap24324-BTC-062449 on BTA6, which are 20,591bp and 450,868bp away from the ABCG2 gene [55], respectively, and the SNP ARS-BFGL-NGS-26919 on BTA11, which is 41,562bp away from the LGB gene [56]. It is notable that for either L1-TDT or MMRA some detected SNPs are associated with phenotypic variation in multiple production traits, including the SNPs ARS-BFGL-NGS-4939, ARS-BFGL-NGS-57820, and ARS-BFGL-NGS-107379 on BTA14 (for all of the five traits), the SNPs ARS-BFGL-NGS-94706 and ARS-BFGL-NGS-34135 on BTA14 (for MY, FY, FP, and PP), the SNPs Hapmap30383-BTC-005848, Hapmap30646-BTC-002054, ARS-BFGL-NGS-100480, and UA-IFASA-6329 on BTA14 (for MY, PY, and FP), the SNP UA-IFASA-6878 on BTA14 (for MY, FP, and PP), the SNPs Hapmap52798-ss46526455, BFGL-NGS-110563, Hapmap25486-BTC-072553, and Hapmap30086-BTC-002066 on BTA14 (for MY and FP), the SNPs ARS-BFGL-NGS-91705 on BTA1, Hapmap38643-BTA-95454 on BTA3, BFGL-NGS-110018 on BTA5, and Hapmap50053-BTA-61516 on BTA26 (for MY and PY), the SNPs Hapmap30381-BTC-005750 on BTA14 (for FY and FP), and the SNP Hapmap27703-BTC-053907 on BTA14 (for FP and PP). This could be explained by pleiotropic effects of these SNPs on multiple milk production traits, leading to genetic correlations among them and there were similar result in many prior studies [28], [53]. In this study, we performed GWAS in the way of SNP by SNP individually via regressing the observations of a single trait on either the genotypes of a SNP (MMRA) or the allele transmission patterns of a SNP from bulls to corresponding half-sib offspring (L1-TDT). Previous studies have shown that single marker tests provide similar or greater power than haplotype-based approaches [57], [58]. In contrast to haplotype-based methods, the main advantage of the single locus test is that it does not necessitate information of SNP positions and reconstruction haplotypes of multiple SNP loci. Thus, it is the preferable method for large scale genome-wise association analyses, e.g., GWAS. Also, we individually perform GWAS for each of five milk production traits. This is the most conventional strategy for current GWAS. However, the five milk production traits considered here are generally regarded as correlated and thus should share common environmental/genetic factors. A multiple traits instead of single trait analysis may be a promising way to take correlations among these traits into consideration. Multivariate analyses have been widely adopted in linkage studies [59], [60], [61], [62], [63], and it has been generally accepted that multivariate analyses outperform univariate analyses in terms of increasing statistical power and precision of parameter estimation [64], [65]. In the next step, an optimal multiple traits analytical strategy will be pursued to further enhance our GWA studies. In our study, the EBVs of daughters were used as phenotypes for association analysis. Besides EBVs, yield deviation (YD) and de-regressed EBVs of individuals are also commonly used as phenotypic observations in GWAS as well as in LA and LA/LD analyses for milk production traits. Comparison among these three kinds of phenotypes with respect to their influence on QTL mapping [66] and marker assisted selection studies [67] demonstrate that none of them has absolute advantages over the others. We also compared using EBVs and de-regressed EBVs as phenotypes for our GWAS and it turned out that the findings of them are basically overlap (data not shown). Therefore, only the findings from using EBVs are reported herein. In all, the present study revealed 105 genome-wise significant SNPs for milk production traits in Chinese dairy cattle population using two different association analysis approaches (L1-TDT and MMRA). Most of these SNPs (86 out of 105) are located within the previously reported QTL regions, and some within or close to the reported candidate genes. The general consistence of the significant SNPs detected herein with the reported QTL and candidate genes and the agreement of the results of the two analysis approaches present strong support for the outcomes of this study. Our findings herein lay a preliminary foundation for guiding follow-up replication studies, and eventually revealing the causal mutations underlying milk production traits in dairy cattle.
  65 in total

1.  Transmission/disequilibrium tests for quantitative traits.

Authors:  F Z Sun; W D Flanders; Q H Yang; H Y Zhao
Journal:  Ann Hum Genet       Date:  2000-11       Impact factor: 1.670

Review 2.  Experience with a test-day model.

Authors:  L R Schaeffer; J Jamrozik; G J Kistemaker; B J Van Doormaal
Journal:  J Dairy Sci       Date:  2000-05       Impact factor: 4.034

Review 3.  Family-based association studies.

Authors:  W J Gauderman; J S Witte; D C Thomas
Journal:  J Natl Cancer Inst Monogr       Date:  1999

4.  Mapping quantitative trait Loci using generalized estimating equations.

Authors:  C Lange; J C Whittaker
Journal:  Genetics       Date:  2001-11       Impact factor: 4.562

5.  Prediction of identity by descent probabilities from marker-haplotypes.

Authors:  T H Meuwissen; M E Goddard
Journal:  Genet Sel Evol       Date:  2001 Nov-Dec       Impact factor: 4.297

6.  A genome scan to identify quantitative trait loci affecting economically important traits in a US Holstein population.

Authors:  M S Ashwell; C P Van Tassell; T S Sonstegard
Journal:  J Dairy Sci       Date:  2001-11       Impact factor: 4.034

7.  A genome scan for QTL influencing milk production and health traits in dairy cattle.

Authors:  D W Heyen; J I Weller; M Ron; M Band; J E Beever; E Feldmesser; Y Da; G R Wiggans; P M VanRaden; H A Lewin
Journal:  Physiol Genomics       Date:  1999-11-11       Impact factor: 3.107

8.  A mammary gland EST showing linkage disequilibrium to a milk production QTL on bovine Chromosome 14.

Authors:  C Looft; N Reinsch; C Karall-Albrecht; S Paul; M Brink; H Thomsen; G Brockmann; C Kühn; M Schwerin; E Kalm
Journal:  Mamm Genome       Date:  2001-08       Impact factor: 2.957

9.  Testing for the presence of previously identified QTL for milk production traits in new populations.

Authors:  P Wiener; I Maclean; J L Williams; J A Woolliams
Journal:  Anim Genet       Date:  2000-12       Impact factor: 3.169

10.  Single-nucleotide polymorphism versus microsatellite markers in a combined linkage and segregation analysis of a quantitative trait.

Authors:  E Warwick Daw; Simon C Heath; Yue Lu
Journal:  BMC Genet       Date:  2005-12-30       Impact factor: 2.797

View more
  94 in total

1.  Genetic variances of SNP loci for milk yield in dairy cattle.

Authors:  Petr Pešek; Josef Přibyl; Luboš Vostrý
Journal:  J Appl Genet       Date:  2014-11-16       Impact factor: 3.240

2.  Detection of QTL for greasy fleece weight in sheep using a 50 K single nucleotide polymorphism chip.

Authors:  Fatemeh Ebrahimi; Mohsen Gholizadeh; Ghodrat Rahimi-Mianji; Ayoub Farhadi
Journal:  Trop Anim Health Prod       Date:  2017-08-11       Impact factor: 1.559

3.  Genome-wide copy number variation in Hanwoo, Black Angus, and Holstein cattle.

Authors:  Jung-Woo Choi; Kyung-Tai Lee; Xiaoping Liao; Paul Stothard; Hyeon-Seung An; Sungmin Ahn; Seunghwan Lee; Sung-Yeoun Lee; Stephen S Moore; Tae-Hun Kim
Journal:  Mamm Genome       Date:  2013-03-30       Impact factor: 2.957

4.  Polymorphisms of caprine GnRHR gene and their association with litter size in West African Dwarf goats.

Authors:  M N Bemji; A M Isa; E M Ibeagha-Awemu; M Wheto
Journal:  Mol Biol Rep       Date:  2017-12-29       Impact factor: 2.316

5.  A novel selection signature in stearoyl-coenzyme A desaturase (SCD) gene for enhanced milk fat content in Bubalus bubalis.

Authors:  J Maryam; M E Babar; Zhang Bao; A Nadeem
Journal:  Trop Anim Health Prod       Date:  2016-06-16       Impact factor: 1.559

6.  Disentangling prenatal and postnatal maternal genetic effects reveals persistent prenatal effects on offspring growth in mice.

Authors:  Jason B Wolf; Larry J Leamy; Charles C Roseman; James M Cheverud
Journal:  Genetics       Date:  2011-09-02       Impact factor: 4.562

7.  Genome-wide association study of 8 carcass traits in Jinghai Yellow chickens using specific-locus amplified fragment sequencing technology.

Authors:  Wenhao Wang; Tao Zhang; Jinyu Wang; Genxi Zhang; Yongjuan Wang; Yinwen Zhang; Jianhui Zhang; Guohui Li; Qian Xue; Kunpeng Han; Xiuhua Zhao; Hongkun Zheng
Journal:  Poult Sci       Date:  2016-03       Impact factor: 3.352

8.  Targeted imputation of sequence variants and gene expression profiling identifies twelve candidate genes associated with lactation volume, composition and calving interval in dairy cattle.

Authors:  Lesley-Ann Raven; Benjamin G Cocks; Kathryn E Kemper; Amanda J Chamberlain; Christy J Vander Jagt; Michael E Goddard; Ben J Hayes
Journal:  Mamm Genome       Date:  2015-11-27       Impact factor: 2.957

9.  Copy number variations of MICAL-L2 shaping gene expression contribute to different phenotypes of cattle.

Authors:  Yao Xu; Liangzhi Zhang; Tao Shi; Yang Zhou; Hanfang Cai; Xianyong Lan; Chunlei Zhang; Chuzhao Lei; Hong Chen
Journal:  Mamm Genome       Date:  2013-11-07       Impact factor: 2.957

10.  Sequence-based genome-wide association study of individual milk mid-infrared wavenumbers in mixed-breed dairy cattle.

Authors:  Kathryn M Tiplady; Thomas J Lopdell; Edwardo Reynolds; Richard G Sherlock; Michael Keehan; Thomas Jj Johnson; Jennie E Pryce; Stephen R Davis; Richard J Spelman; Bevin L Harris; Dorian J Garrick; Mathew D Littlejohn
Journal:  Genet Sel Evol       Date:  2021-07-20       Impact factor: 4.297

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.