
Genomic divergence of zebu and taurine cattle identified through high-density SNP genotyping.

Laercio R Porto-Neto, Tad S Sonstegard, George E Liu, Derek M Bickhart, Marcos V B Da Silva, Marco A Machado, Yuri T Utsunomiya, Jose F Garcia, Cedric Gondro, Curtis P Van Tassell.

Abstract

BACKGROUND: Natural selection has molded evolution across all taxa. At an arguable date of around 330,000 years ago there were already at least two different types of cattle that became ancestors of nearly all modern cattle, the Bos taurus taurus more adapted to temperate climates and the tropically adapted Bos taurus indicus. After domestication, human selection exponentially intensified these differences. To better understand the genetic differences between these subspecies and detect genomic regions potentially under divergent selection, animals from the International Bovine HapMap Experiment were genotyped for over 770,000 SNP across the genome and compared using smoothed F(ST). The taurine sample was represented by ten breeds and the contrasting zebu cohort by three breeds.
RESULTS: Each cattle group showed similar numbers of polymorphic markers well distributed across the genome. Principal components analyses and unsupervised clustering confirmed the well-characterized main division of domestic cattle. The top 1% smoothed F(ST), potentially associated with positive selection, contained 48 genomic regions across 17 chromosomes. Nearly half of the top F(ST) signals (n = 22) were previously detected using a lower density SNP assay. Amongst the strongest signals were BTA7:~50 Mb and BTA14:~25 Mb, both regions harboring candidate genes and different patterns of linkage disequilibrium that potentially represent intrinsic differences between cattle types. The bottom 1% of the smoothed F(ST) values, potentially associated with balancing selection, included 24 regions across 13 chromosomes. These regions often overlap with copy number variants, including the highly variable region at BTA23:~24 Mb that harbors a large number of MHC genes. Under these regions, 318 unique Ensembl genes are annotated with a significant overrepresentation of immune related pathways.
CONCLUSIONS: Genomic regions that are potentially linked to purifying or balancing selection processes in domestic cattle were identified. These regions are of particular interest for understanding the natural and human selective pressures to which these subspecies were exposed and how the genetic background of these populations evolved in response to environmental challenges and human manipulation.

Year:  2013        PMID: 24330634      PMCID: PMC4046821          DOI: 10.1186/1471-2164-14-876

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


Background

Natural selection has shaped the genome of all living creatures on our planet, including domesticated animals. Nearly all modern cattle can be associated with one of two types or sub-species. The division between Bos taurus taurus (taurine cattle) and Bos taurus indicus (zebu cattle) from a common ancestor is estimated to have occurred between 330,000 [1] and 2 million [2] years ago. Since divergence, the cattle types have accumulated different genetic variations, which have contributed to highly differentiated phenotypes. It is assumed that the divergence between cattle types occurred long before domestication, which is estimated to have taken place between 10,000 and 7,000 BC in two separate locations: the Fertile Crescent (taurine cattle) and the Indus Valley (zebu cattle) [3, 4]. After domestication, human-oriented selection added further complexity to the evolution of cattle. For most of the history of human-cattle coexistence the environment was the main force driving changes in the animals' genome. Shortly after domestication, human breeders preferred traits that enabled easy management; however, breeders also sought production improvement traits [5]. The introduction of the concept of breed in the 19th century led to human-oriented selection imposing strong bottlenecks, which created population demes based on phenotypes. Breed formation was followed by breed expansion via the use of artificial insemination, which reduced genetic variability within breeds, particularly in the sex chromosomes and mitochondrial DNA [6]. This is because only one haplotype is passed on to the following generation and is subjected to stronger selective forces when compared to autosomal chromosomes. Positive and balancing selection are terms used to characterize different effects that selective forces might impose on a population.
Positive selection, also termed directional or purifying selection, refers to the selection process through which a particular phenotype (or genotype) is favored in a given environment, which leads to an increase of its frequency in a population. In contrast, balancing selection refers to the selective process through which multiple alleles are selected, thus preserving the genetic diversity in a population. Balancing selection is often observed when heterozygous animals have a competitive advantage. Alternatively, these may be regions of convergent selection across groups. Importantly, both positive and balancing selection can be tracked using SNP genotypes or sequence data from the cattle genome. SNP genotyping has become widely used in animal genetics and a number of methods have been developed to identify regions under selection. Of these, FST is a widely used statistic to evaluate the diversity of subpopulations of animals or to determine the relative distance between populations. Many variations of the FST concept [7] exist, but all adhere to the core principle of being a metric of allele frequencies and their variance. This metric has also been used to identify loci under selection [8-10]. In this study, we used a pure drift FST model [11], which assumes all animals originated from the same ancestral population. This model was applied to taurine and zebu animals to identify loci under selection. These two groups correspond to the main (and most ancestral) separation of domestic cattle, which in most but not all cases corresponds to animals adapted to temperate and tropical environments, respectively. The identification of such loci can aid in the identification of genes and genomic variants that are related to environmental adaptation and/or selection derived from human agro-pastoral activities.

Methods

Statement on the ethical use of animals

No ethics statement was required for the collection of genetic material. The DNA from animals included in this study was either part of previous analyses that obtained specific permissions [12] or was extracted from semen straws collected in accredited artificial insemination (AI) centers in accordance with the Brazilian legislation on animal welfare.

Cattle samples and SNP genotypes

All individuals were genotyped using the BovineHD BeadChip, which includes ~777 K SNP (Illumina, Inc. San Diego, USA), following standard procedures. The SNP set included in this genotyping platform was carefully selected to reduce the potential for ascertainment bias during SNP discovery. Seven different groupings of breeds were used to assess the minor allele frequency of all available SNP: Holstein, Angus, Nelore, Bos taurus taurus dairy excluding Holstein, Bos taurus taurus beef excluding Angus, Bos taurus indicus excluding Nelore, and adapted Bos taurus taurus (e.g. Senepol). This was complemented with sequence data from 30 breeds that were compiled and weighted to minimize ascertainment bias. More information on the BovineHD can be found on the supplier's website (http://www.illumina.com/documents//products/datasheets/datasheet_bovineHD.pdf). Only animals with call rates ≥ 98% and SNP with more than 95% successful genotypes were kept in the final dataset. Filtering was also based on available pedigree information and the estimated proportion of alleles shared identical-by-descent (PI_HAT > 0.8), calculated with PLINK [13] (http://pngu.mgh.harvard.edu/~purcell/plink/); animals with high relatedness were excluded. A total of 339 Bos taurus taurus or taurine individuals from the Bovine HapMap DNA panel [12] were included in the analyses. Breeds represented in this group were: Angus (n = 44), Brown Swiss (n = 24), Charolais (n = 37), Guernsey (n = 21), Hereford (n = 36), Holstein (n = 63), Jersey (n = 39), Limousin (n = 47), Norwegian Red (n = 17), and Red Angus (n = 11). The Bos taurus indicus or zebu animals (n = 166) were also from the Bovine HapMap experiment, complemented with additional individuals. Breeds represented in this group were: Nelore (n = 91), Gir (n = 50), and Guzera (n = 25).
Even though Brahmans are considered zebu animals, it is known that taurine animals were also used during the breed formation and expansion; therefore they were not included in these analyses.
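
The call-rate filters described above (animals with call rate ≥ 98%, SNP with more than 95% successful genotypes) could be sketched as follows. This is only an illustration, not the pipeline used in the study: the genotype layout (rows as animals, `None` for a failed call), the function names and the order of the two filters are all assumptions.

```python
from typing import List, Optional, Tuple

Genotype = Optional[int]  # 0, 1 or 2 copies of an allele; None = failed call


def call_rate(calls: List[Genotype]) -> float:
    """Fraction of non-missing genotype calls (0.0 for an empty list)."""
    if not calls:
        return 0.0
    return sum(g is not None for g in calls) / len(calls)


def filter_by_call_rate(
    genotypes: List[List[Genotype]],
    min_animal_rate: float = 0.98,
    min_snp_rate: float = 0.95,
) -> Tuple[List[int], List[int]]:
    """Return (kept animal indices, kept SNP indices).

    Rows are animals, columns are SNP. Animals are filtered first
    (call rate >= min_animal_rate); SNP are then filtered among the
    retained animals (call rate > min_snp_rate). Filtering animals
    before SNP is an assumption of this sketch.
    """
    animals = [i for i, row in enumerate(genotypes)
               if call_rate(row) >= min_animal_rate]
    n_snp = len(genotypes[0]) if genotypes else 0
    snps = [j for j in range(n_snp)
            if call_rate([genotypes[i][j] for i in animals]) > min_snp_rate]
    return animals, snps
```

The relatedness filter (PI_HAT > 0.8) is not reproduced here, since it depends on identity-by-descent estimates from PLINK.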

Population and linkage disequilibrium (LD) structure

Pairs of markers in high linkage disequilibrium (LD) provide redundant information and impose higher computational demands on population structure analyses. To remove this redundant information, the dataset was pruned based on LD between markers using the PLINK [13] command --indep-pairwise 50 10 0.1, which calculates LD for each pair of markers in a window of 50 SNP. If a pair of SNP had r² > 0.1, one SNP of the pair was removed; the window was then moved 10 SNP and the process restarted. The pruned dataset included 38,681 SNP, which were used to assess the population structure using three methods: 1) unsupervised clustering of individuals based on maximum likelihood as implemented in the program Admixture Version 1.20 [14] with cluster number (K) equal to 2; 2) principal components analysis as implemented in GCTA [15]; and 3) an estimated genetic relationship matrix [16] visualized as a heat map using R [17]. For plots of LD between markers, r² values were calculated using Haploview [18].
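
The window-based pruning described above can be illustrated with a small greedy sketch. It is not PLINK's implementation: the squared Pearson correlation of genotype dosages is used as r², missing genotypes are not handled, and always dropping the later SNP of a correlated pair is a simplification chosen for this example.

```python
from statistics import mean
from typing import List, Set


def r_squared(x: List[float], y: List[float]) -> float:
    """Squared Pearson correlation between two genotype dosage vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # monomorphic marker: correlation undefined
        return 0.0
    return sxy * sxy / (sxx * syy)


def ld_prune(snps: List[List[float]], window: int = 50, step: int = 10,
             max_r2: float = 0.1) -> List[int]:
    """Greedy window-based pruning; returns indices of retained SNP.

    snps[j] holds the dosages (0/1/2) of SNP j across animals, in map order.
    """
    removed: Set[int] = set()
    start = 0
    while start < len(snps):
        # SNP still present in the current window
        idx = [j for j in range(start, min(start + window, len(snps)))
               if j not in removed]
        for a_pos, a in enumerate(idx):
            if a in removed:
                continue
            for b in idx[a_pos + 1:]:
                if b not in removed and r_squared(snps[a], snps[b]) > max_r2:
                    removed.add(b)  # simplification: drop the later SNP
        start += step  # slide the window forward
    return [j for j in range(len(snps)) if j not in removed]
```

With the defaults, the call mirrors the parameters of the command quoted in the text (window of 50 SNP, step of 10 SNP, r² threshold of 0.1).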

Identification of genomic regions under selection

FST statistics were used to characterize the differentiation between taurine and zebu animals by first identifying SNP potentially under selection. Next, genomic regions with a high proportion of such SNP were identified, and then the genic content of regions with extreme signals for positive and balancing selection was further analyzed. The estimation of SNP FST was based on a pure drift model defined by Nicholson et al. [11], following the simplification proposed by Flori et al. [10]. These analyses were performed using R [17] scripts. The SNP FST values were smoothed across the bovine genome reference assembly UMD 3.1 [19] using a local variable bandwidth kernel estimator [20] (R package lokern), where every fifteen SNP FST values generated one smoothed FST value. This bandwidth was used because it covers a region of ~50 kb, which is the average extent of LD found in these populations. Genomic regions with predominantly high FST values usually resulted in high values of smoothed FST and were potentially associated with positive selection. In contrast, regions with mainly low FST values generated low smoothed FST values and were potentially associated with balancing selection. The top and bottom 1% smoothed FST values were identified and translated into genomic positions (UMD 3.1), and the genic content of each region was tested for gene ontology overrepresentation. The cattle chromosome X (BTAX) is highly differentiated between taurine and zebu animals. Therefore, the identification of the top and bottom 1% values included only the autosomes, and BTAX was analyzed separately, as it contains regions under relatively strong positive selection. Similar analyses were also performed only within-taurine (n = 9 breeds; Red Angus was excluded due to small sample size) and only within-zebu (n = 3 breeds). These analyses were performed to gather hints as to the origin of the differentially selected regions seen between zebu and taurine cattle.
Regions harboring copy number variants (CNV) might also be under selection and contribute to an observed selection signal; therefore, CNV regions that coincide with smoothed FST peaks were further explored. Gene content of cattle CNV regions was assessed using Ensembl (ftp://ftp.ensembl.org/pub/current_fasta/bos_taurus/pep/). It is worthwhile to point out that FST and CNV results did not use the exact same samples. CNV results are based on Bickhart et al. [21], who used a Holstein, a Nelore, a Hereford and 3 Angus samples, which were also included in the FST analyses. Intersections between balancing selection region coordinates and exon positions were compared using MySQL queries. We obtained a catalog of all bovine peptides from Ensembl. This yielded 22,118 peptides, 345 of which overlap with the 24 predicted balancing selection regions and correspond to 318 unique Ensembl genes. Using PANTHER version 7 [22], we tested for overrepresentation of biological process, molecular function and pathway terms under the balancing selection regions. Results were Bonferroni [23] adjusted, and PANTHER terms with fewer than five observations were not further analyzed. Similar analyses were performed on the peptides under the 48 positive selection regions. PANTHER results were similar when all peptides under the 24 balancing selection regions and 48 positive selection regions were combined in a single analysis.
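
The three steps of this paragraph (per-SNP FST, smoothing, and selection of the extreme 1% tails) can be sketched under strong simplifying assumptions. The estimator below is a generic variance-based form, Var(p) / (p̄(1 − p̄)), chosen only because the text describes FST as a metric of allele frequencies and their variance; it is not claimed to be the exact expression of Nicholson et al. [11] or Flori et al. [10]. Likewise, the block mean over fifteen values is a much simpler stand-in for the local variable bandwidth kernel estimator of the lokern package, and all function names are hypothetical.

```python
from statistics import mean, pvariance
from typing import List, Sequence, Tuple


def snp_fst(freqs: Sequence[float]) -> float:
    """Variance-based FST for one SNP from per-population allele frequencies."""
    p_bar = mean(freqs)
    if p_bar in (0.0, 1.0):  # same allele fixed in every population
        return 0.0
    return pvariance(freqs) / (p_bar * (1.0 - p_bar))


def block_means(values: Sequence[float], size: int = 15) -> List[float]:
    """One averaged value per consecutive block of `size` SNP FST values.

    Assumption: a plain block mean replaces the kernel smoother.
    """
    return [mean(values[i:i + size]) for i in range(0, len(values), size)]


def tail_cutoffs(values: Sequence[float],
                 tail: float = 0.01) -> Tuple[float, float]:
    """Return (low, high) cutoffs bounding the bottom and top `tail` fraction."""
    ordered = sorted(values)
    k = max(1, int(len(ordered) * tail))
    return ordered[k - 1], ordered[-k]
```

Under this estimator, two groups fixed for alternative alleles (frequencies 1.0 and 0.0) give a value of 1, and two groups with identical frequencies give 0. Smoothed values at or above the high cutoff would be treated as candidates for positive selection and those at or below the low cutoff as candidates for balancing selection.
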
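
The coordinate intersection and the multiple-testing adjustment described above can be illustrated as follows. The study used MySQL queries and PANTHER; this Python sketch only shows the underlying logic, and the `Interval` type, the function names and any gene coordinates passed to them are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two closed intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def features_in_regions(regions: List[Interval],
                        features: List[Interval]) -> Dict[str, List[str]]:
    """Map each region name to the names of the features overlapping it."""
    return {r.name: [f.name for f in features if overlaps(r, f)]
            for r in regions}


def bonferroni(p_values: List[float]) -> List[float]:
    """Bonferroni-adjusted p-values (raw p times number of tests, capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For example, a region such as B22 (BTA23: 24,242,547 to 31,194,961, Table 2) would be passed as `Interval("B22", "23", 24242547, 31194961)` together with a list of exon or gene intervals taken from an annotation file.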

Results

SNP genotypes

After quality control, a total of 768,506 SNP were considered. In taurine, most of the autosomes had >90% of markers polymorphic, and in zebu slightly fewer markers were polymorphic (between 80-90%). This distribution was similar across all autosomes; however, the taurine group had a reduced proportion of polymorphic markers when compared to the zebu on BTAX (Figure 1A). Most autosomes had >80% of SNP polymorphic in both groups, with ~10% polymorphic only in taurine and only a small number of SNP exclusively polymorphic in zebu. The pattern was different for BTAX, where ~50% of the SNP were polymorphic in both groups and close to 40% were polymorphic only in zebu (Figure 1B). Within cattle types, the average heterozygosity was 0.21 and 0.29 for zebu and taurine, respectively.
Figure 1

Polymorphic status of the BovineHD (Illumina) markers in zebu and taurine cattle. A) Proportion of polymorphic markers, and B) Proportion of markers by polymorphic status across both cattle types.
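
The categories summarized in Figure 1B can be derived from per-group minor allele frequencies. The sketch below is illustrative only: treating a marker as polymorphic whenever its minor allele frequency is above zero is an assumption, as are the function names and category labels.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def polymorphic_status(maf_taurine: float, maf_zebu: float) -> str:
    """Classify one SNP by the group(s) in which it is polymorphic."""
    t, z = maf_taurine > 0, maf_zebu > 0
    if t and z:
        return "both"
    if t:
        return "taurine only"
    if z:
        return "zebu only"
    return "monomorphic within both types"


def status_proportions(mafs: Iterable[Tuple[float, float]]) -> Dict[str, float]:
    """Proportion of SNP in each category, from (taurine, zebu) MAF pairs."""
    counts = Counter(polymorphic_status(t, z) for t, z in mafs)
    total = sum(counts.values())
    return {status: n / total for status, n in counts.items()}
```

Note that a marker fixed for different alleles in the two groups falls in the last category even though it is variable across the combined sample.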


Population substructure

The separation between taurine and zebu is the most substantial type-distinction between domestic cattle. Clustering animals based on the genetic relationship matrix clearly demonstrates this division between cattle populations (Figure 2), which is also seen using unsupervised clustering with the number of clusters set to K = 2 (Additional file 1: Figure S1A). This latter analysis shows that the majority of individuals are purebred within each cattle type, with an estimated proportion of more than 0.9 assigned to either the zebu or the taurine cluster.
Figure 2

Heatmap of relationship between individuals of 10 taurine and 3 zebu cattle breeds (n = 505) based on the genetic relationship matrix calculated using 768,506 SNP genotypes.

The first principal component, which is the axis that explains the most variance, not surprisingly corresponds to the same main division. The second principal component starts to subdivide the taurine animals (Additional file 1: Figure S1B and C). This subdivision of taurine animals was also seen in four independent runs of principal components analyses that used the same number of individuals per breed and different random combinations of taurine breeds in addition to the three zebu breeds (Additional file 2: Figure S2). This agrees with the lower pair-wise FST observed between zebu breeds in comparison to taurine breeds (Additional file 3: Table S1).

Genomic regions under selection

Regions under positive and balancing selection were defined as the regions in the top and bottom 1% of all smoothed FST values, respectively (Figure 3, Tables 1 and 2).
Figure 3

Smoothed FST comparing taurine and zebu animals. Only autosomes are plotted, in alternating shades of gray. The top and bottom 1% values are highlighted in blue and green, corresponding to the regions under positive and balancing selection, respectively.

Table 1

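Positive selection: regions in the top 1% smoothed FST values

Region | BTA | SNP start pos | SNP end pos | Highest sFST | CNV [21] | Within cattle type** | Candidate genes | Cross reference
P1 | 2 | 47,857,335 | 48,065,161 | 0.387 | 1 | | |
P2* | 2 | 71,565,086 | 72,885,823 | 0.498 | 1 | Hol | | [24]
P3 | 3 | 19,689,648 | 20,166,059 | 0.399 | 0 | Hol, Nor, Bro | CDC42SE1 |
P4 | 3 | 94,742,479 | 95,401,345 | 0.396 | 0 | | |
P5 | 4 | 12,106,878 | 12,361,811 | 0.384 | 0 | | |
P6 | 4 | 46,670,940 | 46,814,875 | 0.380 | 0 | | | [25]
P7 | 5 | 48,229,556 | 48,336,996 | 0.381 | 0 | | | [9, 12, 24]
P8 | 5 | 55,881,766 | 56,801,729 | 0.424 | 1 | Hol | STAT6, GLI1 | [24, 25]
P9 | 7 | 21,008,805 | 21,606,667 | 0.391 | 0 | Gue | ITGB1BP3 | [24, 25]
P10 | 7 | 47,299,497 | 47,859,329 | 0.433 | 3 | Lim | SPOCK, PPP2CA | [12]
P11* | 7 | 50,951,861 | 53,757,384 | 0.826 | 5 | Ang, Cha, Gue, Nor, Gir | CD14, CDC23, EGR1, MYOT, TMEM173 | [24]
P12 | 8 | 39,288,115 | 39,800,492 | 0.393 | 0 | | CD274 | [24]
P13 | 8 | 53,490,845 | 54,592,381 | 0.440 | 1 | Nor | | [26]
P14 | 8 | 58,649,674 | 58,727,004 | 0.380 | 0 | | |
P15 | 8 | 61,543,379 | 62,874,750 | 0.464 | 1 | | |
P16 | 8 | 69,691,214 | 70,488,061 | 0.493 | 0 | Ang | POLR3D, PPP3CC |
P17 | 8 | 73,617,355 | 73,704,634 | 0.382 | 0 | | |
P18 | 10 | 36,488,829 | 37,051,537 | 0.416 | 0 | Her | CHP | [25]
P19 | 12 | 27,935,604 | 29,508,940 | 0.439 | 1 | Hol | | [24, 25]
P20 | 13 | 34,119,211 | 35,054,048 | 0.402 | 0 | | ZEB1 |
P21 | 13 | 48,893,096 | 49,816,619 | 0.408 | 0 | Guz | | [9]
P22 | 14 | 24,603,090 | 25,298,972 | 0.395 | 0 | Nor | PLAG1, XKR4, MOS | [10]
P23 | 14 | 36,715,710 | 37,511,658 | 0.444 | 0 | Gue | | [24]
P24 | 14 | 38,919,669 | 39,027,008 | 0.383 | 0 | Gue | |
P25 | 14 | 42,121,450 | 42,376,970 | 0.389 | 0 | | |
P26 | 14 | 45,478,315 | 46,437,276 | 0.430 | 0 | | | [24]
P27 | 16 | 40,318,965 | 40,656,961 | 0.390 | 0 | | |
P28 | 16 | 40,886,797 | 41,149,860 | 0.383 | 0 | | |
P29 | 16 | 41,564,542 | 42,407,997 | 0.440 | 0 | Jer | |
P30 | 16 | 43,250,880 | 43,501,100 | 0.390 | 0 | | | [27]
P31* | 16 | 44,277,286 | 45,534,177 | 0.510 | 2 | | PIK3CD |
P32 | 18 | 11,298,096 | 11,959,392 | 0.409 | 0 | Bro | IRF8 | [24]
P33 | 18 | 14,171,624 | 14,702,657 | 0.423 | 0 | | ACSF3, SPATA2L | [10]
P34 | 20 | 13,714,109 | 15,135,107 | 0.429 | 0 | | | [24]
P35* | 20 | 71,629,018 | 71,967,622 | 0.508 | 0 | | |
P36* | 21 | 83,766 | 2,416,432 | 0.564 | 0 | Gue | |
P37* | 21 | 31,681,776 | 33,273,658 | 0.530 | 1 | Her | |
P38 | 21 | 45,793,883 | 45,979,589 | 0.391 | 0 | | |
P39 | 21 | 68,349,152 | 68,943,249 | 0.448 | 0 | Ang | HSP90AA1, PPP2R5C |
P40 | 24 | 24,114,816 | 24,452,344 | 0.383 | 1 | Nor | |
P41 | 29 | 51,452,986 | 52,452,986 | 0.382 | 1 | | |
P42 | 30 | 33,064,381 | 50,329,406 | 0.810 | 0 | | IRAK1, BCAP31, CETN2, GAB3, IKBKG, KIR3DL2, MTM1, SRPK3 |
P43 | 30 | 53,439,167 | 57,258,157 | 0.711 | 0 | | BTK |
P44 | 30 | 68,834,838 | 71,300,049 | 0.861 | 0 | | CYLC1 | [12]
P45 | 30 | 72,360,206 | 79,580,964 | 0.844 | 0 | | | [12]
P46 | 30 | 84,352,052 | 85,706,219 | 0.764 | 0 | | IL2RG | [12]
P47 | 30 | 96,229,383 | 100,603,158 | 0.699 | 0 | | ALAS2, SMC1A, LOC524601, SPIN2, VSIG4 |
P48 | 30 | 130,500,087 | 132,116,040 | 0.670 | 0 | | RS1 |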

*Regions containing smoothed FST (sFST) in the top 0.1%.

**Regions at sFST top 1% of within taurine and within zebu breeds. Full table of within cattle type results for candidate regions under positive selection is on Additional file 6: Table S3. Ang – Angus; Bro – Brown Swiss, Cha – Charolais, Gue – Guernsey, Jer – Jersey, Her – Hereford, Hol – Holstein, Lim – Limousin, Nor – Norwegian Red, Guz – Guzera, Nel – Nelore.

Table 2

Balancing selection: regions in the bottom 1% smoothed FST values

Region | BTA | SNP start pos | SNP end pos | Lowest sFST | CNV [21] | Within cattle type** | Candidate genes | Cross reference
B1 | 4 | 110,295,764 | 111,378,106 | 0.070 | 2 | | |
B2 | 4 | 111,742,866 | 112,562,902 | 0.076 | 2 | | CNTNAP2 |
B3 | 5 | 19,457,756 | 19,898,237 | 0.075 | 0 | | ATP2B1 |
B4 | 5 | 76,719,327 | 77,207,435 | 0.067 | 0 | | PKP2 |
B5 | 6 | 2,883,313 | 4,231,143 | 0.059 | 6 | | CCNA2, ANXA5 |
B6 | 6 | 12,490,545 | 13,266,473 | 0.062 | 1 | | CAMK2D |
B7 | 6 | 54,759,464 | 55,199,755 | 0.062 | 4 | | |
B8 | 6 | 61,590,746 | 61,892,976 | 0.076 | 0 | | APBB2 |
B9 | 6 | 118,252,961 | 118,649,364 | 0.061 | 0 | Guz | |
B10 | 7 | 65,205,183 | 65,242,121 | 0.079 | 0 | | GLRA1 |
B11 | 7 | 98,598,188 | 99,371,157 | 0.050 | 3 | | ERAP2, LNPEP |
B12 | 11 | 11,993,676 | 13,090,823 | 0.072 | 0 | | DYSF |
B13 | 11 | 16,914,701 | 17,716,204 | 0.074 | 2 | | |
B14 | 12 | 70,094,561 | 76,785,743 | 0.059 | 6 | | ABCC4 |
B15 | 14 | 53,550,213 | 54,231,380 | 0.066 | 2 | Nel, Guz, Jer | |
B16 | 16 | 16,039,261 | 17,069,240 | 0.076 | 1 | | FAM5C |
B17 | 16 | 19,740,336 | 20,450,779 | 0.073 | 0 | | ESRRG |
B18 | 16 | 36,476,830 | 37,151,556 | 0.068 | 1 | | XCL2 |
B19 | 17 | 8,512,165 | 8,575,700 | 0.079 | 0 | | |
B20 | 21 | 69,852,429 | 70,269,531 | 0.054 | 0 | Guz | |
B21 | 22 | 1,504,583 | 1,623,884 | 0.078 | 1 | | SEC61G, NEK10 |
B22* | 23 | 24,242,547 | 31,194,961 | 0.025 | 30 | Ang, Cha, Her, Lim | BOLA (MHC) genes, TNF, AGER, NCR3, C2, CFB, LY6G6F, BTNL2, IL17A, IL17F, CLIC1, CSNK2B, MOG | [12, 24]
B23 | 23 | 32,608,468 | 33,237,258 | 0.069 | 1 | Cha | ALDH5A1, TDP2, GMNN |
B24 | 26 | 46,663,802 | 47,234,109 | 0.055 | 0 | Cha | |

*Regions containing smoothed FST (sFST) in the bottom 0.1%.

**Regions at sFST bottom 1% of within taurine and within zebu breeds. Full table of within cattle type results for candidate regions under balancing selection is on Additional file 7: Table S4. Ang – Angus; Bro – Brown Swiss, Cha – Charolais, Gue – Guernsey, Jer – Jersey, Her – Hereford, Hol – Holstein, Lim – Limousin, Nor – Norwegian Red, Guz – Guzera, Nel – Nelore.


Regions under positive selection

The top 1% smoothed FST values were distributed across 48 regions in 17 chromosomes (Table 1), including BTAX (not shown in Figure 3). Of those, 12 regions were known to harbor copy number variations, and 22 regions had been described as under positive selection in previous studies (Table 1). Twenty of them also overlapped with one or more breed-specific peaks in the within cattle type analyses. Among the previously described peaks, 10 overlapped with taurine breed signals and 1 with a zebu breed peak. The search for overrepresentation of gene ontology terms was not conclusive. Nevertheless, some regions can be highlighted because of their genic content and/or results from previous studies identifying them as being under selection. The BTA7:47.2-53.7 Mb region (Table 1: regions P10 and P11) harbors two closely linked regions that are potentially under selection. These regions contain a number of immune-related and imprinted genes (CD14, HSPA9 and PCDH family) previously identified to be under selection, and a gene associated with cattle fertility (SPOCK). Moreover, a number of CNV are located in the same region, and LD blocks larger than the average genomic LD are present in both taurine and zebu animals, with LD blocks varying in length (Additional file 4: Figure S3A). Another interesting region is the BTA14:24.6-25.2 Mb region (Table 1: region P22), which confirmed previous results [10] and was recently associated with cattle production-related traits. Interestingly, the zebu and taurine LD patterns also vary markedly within this region (Additional file 4: Figure S3B). BTAX is the final region to be highlighted, as almost the entire chromosome was shown to be highly differentiated between taurine and zebu.

Regions under balancing selection

The bottom 1% smoothed FST values consisted of 24 genomic regions across 13 chromosomes (Table 2). Of those, only a region on BTA23 had been previously described as a candidate for balancing selection. This region also overlapped taurine breed signals from the within-taurine analysis. In total, 6 regions overlapped signals from the within cattle type analyses: three overlapped zebu breed peaks and four overlapped taurine breed peaks, with one region (Table 2: B15) overlapping both. Fourteen of the 24 regions have been described as having CNV. These included the large region (Table 2: B22) on BTA23:24.2-31.1 Mb comprising the BOLA gene family (MHC-II molecules), which harbors 30 described CNV. This region has also been previously associated with balancing selection [12, 24] in cattle (Table 2). The 24 balancing selection regions overlap with 345 Ensembl peptides, corresponding to 318 unique Ensembl genes (Table 2). Additionally, ~83% (20/24) of the balancing selection regions completely or partially span cattle Ensembl genes. We assigned PANTHER accessions to a total of 332 overlapping peptides. Statistically significant overrepresentation was observed for multiple categories. Five pathways were found significantly overrepresented (adjusted p-value <0.05): the olfactory transduction, systemic lupus erythematosus, type I diabetes mellitus, antigen processing and presentation, graft-versus-host disease and allograft rejection pathways; all of which could be linked to immune response systems (a biological process also overrepresented). The average FST for each chromosome in each analysis can be found in Additional file 5: Table S2. All top and bottom FST peaks for all analyses are also presented in the supplementary material (Additional file 6: Table S3 and Additional file 7: Table S4).

Discussion

In all, 505 animals derived from 10 taurine and 3 zebu cattle breeds were genotyped across more than 770,000 SNP markers to investigate the genomic changes subsequent to the separation between taurine and zebu cattle, which occurred between 330 thousand and 2 million years ago [1, 2]. Evaluation of the SNP genotyping platform suggested there was minimal bias in characterizing both subspecies, except possibly on the sex chromosomes. As expected, most of the chromosomes had a higher proportion of polymorphic markers in taurine animals, which also resulted in higher heterozygosity when compared to zebu (Figure 1A). This is because most of the SNP described for cattle were identified using the reference sequence of a taurine animal [19, 28], but this should not overly impact population diversity metrics [29]. Nevertheless, all chromosomes except BTA1, 13, X and Y have >80% of SNP polymorphic in both cattle types (Figure 1B), providing a large number of informative markers. Clustering animals based on the genetic relationship matrix, not surprisingly, split the animals into two groups (taurine and zebu), in agreement with the division along the first principal component and the magnitude of pair-wise FST between breeds. The split along the second principal component between taurine breeds suggests that there is more variation within this cattle type than there is within zebu. Since it is known that unbalanced principal components analyses could mislead interpretations of population structures [30], four randomized evenly sampled analyses were run (Additional file 2: Figure S2). These additional analyses supported the previous results. The greater variation within taurine cattle could be partly due to more intensive selection and reproductive isolation in taurine breeds than among zebu cattle.
However, even though the BovineHD BeadChip was developed to minimize potential ascertainment bias, one cannot entirely reject the possibility that the subdivision seen on principal component 2 was due to bias carried over by the genotyping platform. In the near future, when whole genome sequences from a number of breeds and cattle types become available, a definitive conclusion about this aspect can be drawn. BTAX and BTAY carry a great number of SNP with large differences in allelic frequencies between groups. These chromosomes have probably undergone much stronger selection or, more parsimoniously, higher genetic drift, due to their unique inheritance [6] and the history of domestication, selection and breed formation. Furthermore, the intensive use of artificial insemination techniques has likely contributed to the reduction of genetic variability within breeds (or cattle types) in these chromosomes. It is understood that, for SNP that are polymorphic in both cattle types, the alternative allele likely arose within the cattle population before the split between taurine and zebu and remained in both populations at variable frequencies. Alleles that are fixed in one subspecies and variable in the other possibly arose after the split. However, this understanding does not take into account that alleles that were fixed in one population might also have arisen before the split, but were fixed due to different selection processes or as a result of different bottlenecks on the populations. The identification of the ancestral allele of these SNP, ideally using whole genome sequences of other Bovids, would contribute to understanding the evolutionary processes behind these monomorphic sites. The use of metrics based on the variance of allelic frequencies, such as FST, to identify genomic regions that are potentially under selection has already been explored in a number of studies using cattle [10, 28, 31], sheep [9] and dogs [8].
In this study a relatively high density of markers (average gap between markers of 4 kb) was applied to detect genomic differences between zebu and taurine cattle using FST, identifying regions that were potentially associated with different types of selection. Due to their original geographic distribution, taurine cattle are more adapted to temperate climates, while zebu cattle are better adapted than most taurine cattle to tropical environments. Therefore, differences between these two cattle types could be linked to adaptation to the environment; however, it is likely that selection imposed by humans in different geographical locations and with different livestock production goals has also produced regions that were under differing selective pressures. This study, the most comprehensive to date for cattle, identified 48 regions potentially under positive selection and 24 potentially under balancing selection. A number of these positive selection candidates have been identified as under selection in previous studies (22 out of 48, Table 1). These previous studies cannot strictly be considered independent analyses, since a subset of the markers included in the analyses presented here was already used in them. However, in this work a more than 10-fold increase in marker density was used, thus reducing the overlap of SNP across experiments to less than 10%. Further, different cattle samples and populations were used. Therefore, even though not absolutely independent from previous studies, our results lend support to the findings of previous articles and provide new insights on the ancient differentiation between zebu and taurine cattle. These regions may be genomic segments that were under natural selection or drift but might also, for instance, represent zebu fragments that were introgressed into taurine breeds, potentially defining low-level admixed populations [24, 25]. A parallel could also be drawn with described QTL that overlap these highly divergent genomic regions, e.g. on BTA14:~25 Mb, which harbors quantitative loci for stature [32], fertility [33-35] and subcutaneous fat [36].
The different LD structure in these regions supports the concept of introgressed segments as a way of sharing recent polymorphisms between the cattle types [37], and defines quantitative loci and signatures of selection. The highest differentiation peak was found at BTA7:~50 Mb. This region had previously been identified as a site containing a signature of selection [12, 24]. A number of features were also identified in this region, including different LD structure between zebu and taurine cattle, the presence of imprinted genes, and potential association with fertility traits. This region is among the very few positive selection regions that also contain CNV, which may seem antagonistic to purifying selection. It is not clear at this point how CNV are being kept in the population at this site while at the same time there is a differential signal between zebu and taurine cattle. This could be a consequence of these CNV being less likely to be in LD with neighbouring SNP, because similar CNV can occur on different haplotype backgrounds. Another possibility is that duplications can initiate gene conversion events, which can then decrease the LD surrounding such variants. Interestingly, CNV were observed at most candidate sites for balancing selection, where variation is expected. Fourteen out of 24 balancing selection regions overlap identified CNV, including the highly variable region on BTA23:~24 Mb with 30 described CNV (Table 2). This set of balancing selection-derived genes possesses a wide spectrum of molecular functions and provides a rich resource for testing hypotheses on the genetic basis of phenotypic variation within and among breeds.
Consistent with similar analyses in other mammals (human, mouse and dog), several of these genes, which are important in drug detoxification and in defense, innate and adaptive immunity, are also highlighted by these analyses in cattle. These gene families include the bovine MHC (BoLA), ATP-binding cassette (ABC) transporters, Glutathione S-transferases, Complement factors, Interleukin-17A (IL17A), Heat shock 70 kDa protein 1A (HSPA1A), Chloride intracellular channel protein 1 (CLIC1), and Casein kinase II subunit beta (CSNK2B), which support the shared GO terms among mammals. Conservation of these genes across mammals suggests that selective pressure may drive acquisition or retention of species-specific gene functions. On the other hand, lineage-specific selection events have been detected in mammals, especially in mice and rats. In this regard, it is intriguing to note that mammary gland development genes, such as Butyrophilin-like protein 2 (BTNL2) and Myelin-oligodendrocyte glycoprotein (MOG), were enriched in the GO Biological Process category in the PANTHER analyses. We also detected marked variation between individuals and across diverse cattle breeds, which indicates that these selection events may have occurred within the Artiodactyla and/or Bos lineages, contributing to cattle speciation and domestication.

Genome-wide, most CNVs evolved under neutral evolutionary pressures. Their frequency and sequence context were shaped by demographic events, mutation rate and genetic drift. However, most CNVs in potentially functional regions, especially those overlapping genes, are under purifying selection, and there are only a few examples of CNVs at these positive selection sites. Regions that differ in copy number between subspecies can be informative about ancient adaptations that may have led to species-specific phenotypes [38]. Recent copy number changes can be informative about human selection that may have led to genetic and phenotypic differences between breeds.
Besides selection for variability in balancing regions, which results in low FST values, it is worth noting that low values could also reflect purifying selection acting simultaneously and in the same direction in both populations, imposing high similarity between the compared groups and hence low differentiation (low FST). In this case, a potentially deleterious mutation affecting both populations would be selected against in both groups. This can partially explain the high frequency of genes associated with Mendelian diseases within the potential balancing selection regions. To highlight a few examples: Dysferlin (DYSF) is associated with muscular dystrophy [39]; ATPase, Ca(2+)-transporting, plasma membrane 1 (ATP2B1) is a gene for which mouse knockouts have identified variation underlying embryonic lethality and which has a critical role in male fertility [40]; Plakophilin 2 (PKP2) is linked to circulatory system conditions [41, 42]; and Cyclin A (CCNA2) is an essential regulatory molecule for the cell cycle [43]. Further investigation will be required to determine whether the selection signals seen in these regions are due to the presence of those candidate genes.

It is not completely clear at this point how the observed signals of selection originated. The within-taurine and within-zebu FST analyses complement the taurine-zebu contrast, providing hints about the breed driving each signal. Of the autosomal regions previously described as candidates for positive selection, around half overlap with signals from one or more breeds in the within-taurine analysis (10 out of 19). This is consistent with expectation, since the majority of previous work used mostly taurine breeds and, in a few cases, also composite cattle. Only one region, on BTA23, had previously been described as a candidate for balancing selection, and it also overlaps with within-breed-type signals.
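The argument that parallel purifying selection yields low FST can be made concrete with a small calculation. The sketch below uses the textbook heterozygosity-based definition, FST = (HT - HS) / HT, for one biallelic SNP and two equally weighted populations. This is an assumption made for illustration: the estimator and smoothing actually used in the study are not restated in this section, and the allele frequencies are invented.

```python
def heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1-p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_pops(p1: float, p2: float) -> float:
    """FST = (HT - HS) / HT for two equally weighted populations (illustrative definition)."""
    p_bar = (p1 + p2) / 2.0
    h_total = heterozygosity(p_bar)            # HT: heterozygosity of the pooled population
    if h_total == 0.0:
        return 0.0                             # monomorphic in both groups: no differentiation
    h_within = (heterozygosity(p1) + heterozygosity(p2)) / 2.0  # HS: mean within-group value
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies (taurine, zebu), for illustration only.
same_direction = fst_two_pops(0.05, 0.05)   # variant kept rare in both groups
divergent = fst_two_pops(0.95, 0.10)        # variant common in one group, rare in the other

print(f"kept rare in both groups: FST = {same_direction:.3f}")
print(f"divergent frequencies:    FST = {divergent:.3f}")
```

A variant held at the same low frequency in both groups gives an FST of zero under this definition, which is why a low FST value alone cannot separate balancing selection from purifying selection acting in the same direction in both populations.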
A number of peaks coincided with more than one breed-specific peak in the within-breed analyses, supporting a commonality of selective pressure in at least a few regions in some breeds. However, not all signals observed in the taurine-zebu comparison could be attributed to specific breed(s), which suggests that they represent a deeper degree of separation and, possibly, adaptation between cattle types.

In summary, genomic regions linked to positive and balancing selection were detected within taurine and zebu cattle, which represent the major sub-division of domestic cattle. A number of previously described regions under positive selection were confirmed. Novel selection regions were likely discovered because of the higher resolution of informative SNPs available in this study compared with previous analyses. Some of these regions overlap with production QTL and with, e.g., immune-related genes, suggesting that variants favorable to adaptation and production are present in the general cattle population. However, applying these results in breeding programs to accelerate the creation of synthetic breeds with high production value in tropical environments remains elusive until subsequent investigations confirm the effect of the variants underlying the signatures. This information is needed to define breeding systems able to efficiently introgress specific genomic fragments of zebu into taurine cattle and vice versa.
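The candidate regions summarized above were defined from the top 1% and bottom 1% of smoothed FST values (Additional files 6 and 7). The sketch below shows one way such tails could be extracted. The centered moving average is an assumed stand-in for the study's smoothing procedure, which is not described in this section, and the toy FST values are hypothetical.

```python
from typing import List, Tuple


def moving_average(values: List[float], window: int) -> List[float]:
    """Centered moving average; the window is truncated at both ends of the list."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def tail_indices(smoothed: List[float], percent: int = 1) -> Tuple[List[int], List[int]]:
    """Return (bottom, top) marker indices in the lowest and highest `percent` of values."""
    n = len(smoothed)
    k = max(1, n * percent // 100)
    order = sorted(range(n), key=lambda i: smoothed[i])
    return sorted(order[:k]), sorted(order[-k:])


# Toy per-SNP FST values (hypothetical): flat background, one peak, one trough.
raw_fst = [0.2] * 200
raw_fst[50:55] = [0.9] * 5
raw_fst[150:155] = [0.0] * 5

smoothed_fst = moving_average(raw_fst, window=5)
bottom, top = tail_indices(smoothed_fst, percent=1)

print("candidate balancing selection markers:", bottom)
print("candidate positive selection markers:", top)
```

In a real analysis, adjacent markers in each tail would then be merged into regions per chromosome, and the window size would need to be chosen relative to marker spacing and the extent of LD.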

Conclusions

Genomic regions that are potentially linked to purifying or balancing selection processes in domestic cattle were identified genome-wide. The genetic variants under such selective pressure are not known, although candidate genes could be assigned for some regions and could serve as a resource for new hypothesis testing in the future. These regions are of particular interest for understanding the natural and human selective pressures to which these subspecies were exposed, and how the genetic background of these populations evolved in response to environmental challenges and human manipulation.

Availability of supporting data

Supporting information is available in the additional files and further supporting data is available from the authors on request.

Additional file 1: Figure S1: Population substructure, the main division in domestic cattle (based on 505 individuals, 38,681 SNP). A) Unsupervised clustering result (inferred number of clusters K = 2). The two clusters represent the main division in ancestry of domestic cattle, the zebu (red) and taurine (blue). The estimated proportion of each cluster (y) is given for each individual. #1-91 Nelore, #92-141 Gir, #142-166 Guzera, #167-187 Guernsey, #188-226 Jersey, #227-270 Angus, #271-281 Red Angus, #282-317 Hereford, #318-364 Limousin, #365-401 Charolais, #402-425 Brown Swiss, #426-488 Holstein, #489-505 Norwegian Red. B-C) Principal components analysis (PCA1 vs PCA2), taurine and zebu animals are plotted B) by cattle type zebu (blue) and taurine (red), and C) by breed. (TIFF 911 KB)
Additional file 2: Figure S2: “Balanced” principal components analyses (PCA). To investigate whether the distribution of the breeds within the principal components factorial plane was due to the uneven number of individuals in each breed, four independent, evenly balanced PCA were run. (TIFF 354 KB)
Additional file 3: Table S1: Wright’s F-statistics FIS and pair-wise FST between cattle breeds based on 768,506 SNP genotypes. (PDF 72 KB)
Additional file 4: Figure S3: Linkage disequilibrium (r²) of selected regions potentially under positive selection. a) BTA7: 47 – 54 Mb. b) BTA14: 24 – 26 Mb. (TIFF 1 MB)
Additional file 5: Table S2: Average FST per chromosome for each analysis. (PDF 61 KB)
Additional file 6: Table S3: Candidate regions for positive selection: top 1% smoothed FST values for all breeds in all analyses. (PDF 2 MB)
Additional file 7: Table S4: Candidate regions for balancing selection: bottom 1% smoothed FST values for all breeds in all analyses. (PDF 769 KB)
  33 in total

1.  GCTA: a tool for genome-wide complex trait analysis.

Authors:  Jian Yang; S Hong Lee; Michael E Goddard; Peter M Visscher
Journal:  Am J Hum Genet       Date:  2010-12-17       Impact factor: 11.025

2.  Identification of selection signatures in cattle breeds selected for dairy production.

Authors:  Alessandra Stella; Paolo Ajmone-Marsan; Barbara Lazzari; Paul Boettcher
Journal:  Genetics       Date:  2010-05-17       Impact factor: 4.562

3.  A genome-wide association study of meat and carcass traits in Australian cattle.

Authors:  S Bolormaa; L R Porto Neto; Y D Zhang; R J Bunch; B E Harrison; M E Goddard; W Barendse
Journal:  J Anim Sci       Date:  2011-03-18       Impact factor: 3.159

4.  Variants modulating the expression of a chromosome domain encompassing PLAG1 influence bovine stature.

Authors:  Latifa Karim; Haruko Takeda; Li Lin; Tom Druet; Juan A C Arias; Denis Baurain; Nadine Cambisano; Stephen R Davis; Frédéric Farnir; Bernard Grisart; Bevin L Harris; Mike D Keehan; Mathew D Littlejohn; Richard J Spelman; Michel Georges; Wouter Coppieters
Journal:  Nat Genet       Date:  2011-04-24       Impact factor: 38.330

5.  Analysis of copy number variations among diverse cattle breeds.

Authors:  George E Liu; Yali Hou; Bin Zhu; Maria Francesca Cardone; Lu Jiang; Angelo Cellamare; Apratim Mitra; Leeson J Alexander; Luiz L Coutinho; Maria Elena Dell'Aquila; Lou C Gasbarre; Gianni Lacalandra; Robert W Li; Lakshmi K Matukumalli; Dan Nonneman; Luciana C de A Regitano; Tim P L Smith; Jiuzhou Song; Tad S Sonstegard; Curt P Van Tassell; Mario Ventura; Evan E Eichler; Tara G McDaneld; John W Keele
Journal:  Genome Res       Date:  2010-03-08       Impact factor: 9.043

6.  Footprints of selection in the ancestral admixture of a New World Creole cattle breed.

Authors:  Mathieu Gautier; Michel Naves
Journal:  Mol Ecol       Date:  2011-06-20       Impact factor: 6.185

7.  Tracking footprints of artificial selection in the dog genome.

Authors:  Joshua M Akey; Alison L Ruhe; Dayna T Akey; Aaron K Wong; Caitlin F Connelly; Jennifer Madeoy; Thomas J Nicholas; Mark W Neff
Journal:  Proc Natl Acad Sci U S A       Date:  2010-01-11       Impact factor: 11.205

8.  Application of site and haplotype-frequency based approaches for detecting selection signatures in cattle.

Authors:  Saber Qanbari; Daniel Gianola; Ben Hayes; Flavio Schenkel; Steve Miller; Stephen Moore; Georg Thaller; Henner Simianer
Journal:  BMC Genomics       Date:  2011-06-16       Impact factor: 3.969

9.  A whole genome Bayesian scan for adaptive genetic divergence in West African cattle.

Authors:  Mathieu Gautier; Laurence Flori; Andrea Riebler; Florence Jaffrézic; Denis Laloé; Ivo Gut; Katayoun Moazami-Goudarzi; Jean-Louis Foulley
Journal:  BMC Genomics       Date:  2009-11-21       Impact factor: 3.969

10.  A genealogical interpretation of principal components analysis.

Authors:  Gil McVean
Journal:  PLoS Genet       Date:  2009-10-16       Impact factor: 5.917
