Yahui Gao1, Mathieu Gautier2,3, Xiangdong Ding1, Hao Zhang1, Yachun Wang1, Xi Wang4, Md Omar Faruque5, Junya Li6, Shaohui Ye7, Xiao Gou7, Jianlin Han8,9, Johannes A Lenstra10, Yi Zhang11. 1. National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics and Breeding and Reproduction of MOA, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China. 2. INRA, UMR CBGP (INRA-IRD-Cirad-Montpellier SupAgro), Campus international de Baillarguet, Montferrier-sur-Lez, France. 3. Institut de Biologie Computationnelle, 95 rue de la Galera, 34095, Montpellier, France. 4. Institute of Animal Science and Veterinary Medicine, Shanxi Academy of Agricultural Science, Taiyuan, 030032, China. 5. Department of Animal Breeding and Genetics, Bangladesh Agricultural University, Mymensingh, 2202, Bangladesh. 6. Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China. 7. College of Animal Science and Technology, Yunnan Agricultural University, Kunming, 650201, China. 8. CAAS-ILRI Joint Laboratory on Livestock and Forage Genetic Resources, Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100193, China. 9. ILRI International Livestock Research Institute (ILRI). P.O. Box 30709, Nairobi, 00100, Kenya. 10. Faculty of Veterinary Medicine, Utrecht University, Yalelaan 104, 3584 CM, Utrecht, The Netherlands. 11. National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics and Breeding and Reproduction of MOA, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China. yizhang@cau.edu.cn.
Abstract
Indigenous Chinese cattle combine taurine and indicine origins and occupy a broad range of different environments. By 50 K SNP genotyping we found a discontinuous distribution of taurine and indicine cattle ancestries with extremes of less than 10% indicine cattle in the north and more than 90% in the far south and southwest China. Model-based clustering and f4-statistics indicate introgression of both banteng and gayal into southern Chinese cattle while the sporadic yak influence in cattle in or near Tibetan area validate earlier findings of mitochondrial DNA analysis. Geographic patterns of taurine and indicine mitochondrial and Y-chromosomal DNA diversity largely agree with the autosomal cline. The geographic distribution of the genomic admixture of different bovine species is proposed to be the combined effect of prehistoric immigrations, gene flow, major rivers acting as genetic barriers, local breeding objectives and environmental adaptation. Whole-genome scan for genetic differentiation and association analyses with both environmental and morphological covariables are remarkably consistent with previous studies and identify a number of genes implicated in adaptation, which include TNFRSF19, RFX4, SP4 and several coat color genes. We propose indigenous Chinese cattle as a unique and informative resource for gene-level studies of climate adaptation in mammals.
Indigenous Chinese cattle combine taurine and indicine origins and occupy a broad range of different environments. By 50 K SNP genotyping we found a discontinuous distribution of taurine and indicinecattle ancestries with extremes of less than 10% indicinecattle in the north and more than 90% in the far south and southwest China. Model-based clustering and f4-statistics indicate introgression of both banteng and gayal into southern Chinese cattle while the sporadic yak influence in cattle in or near Tibetan area validate earlier findings of mitochondrial DNA analysis. Geographic patterns of taurine and indicine mitochondrial and Y-chromosomal DNA diversity largely agree with the autosomal cline. The geographic distribution of the genomic admixture of different bovine species is proposed to be the combined effect of prehistoric immigrations, gene flow, major rivers acting as genetic barriers, local breeding objectives and environmental adaptation. Whole-genome scan for genetic differentiation and association analyses with both environmental and morphological covariables are remarkably consistent with previous studies and identify a number of genes implicated in adaptation, which include TNFRSF19, RFX4, SP4 and several coat color genes. We propose indigenous Chinese cattle as a unique and informative resource for gene-level studies of climate adaptation in mammals.
China harbors around 10 million of indigenous cattle[1]. It is commonly referred as yellow cattle and divided into 53 indigenous breeds raised in various agro-ecological environments[2,3]. Their diversity and unique species composition emerged from a complex history. Domestic cattle spread to East Asia by at least two routes. Taurinecattle migrated from north Eurasia to northern China and northeast Asia between 5000 and 4000 BP[4]. This is supported by the evidence that ancient cattle from northern China, dated 4500 to 2300 BP, carried only taurine mtDNA haplotypes[5]. A unique mtDNA haplotype, T4, observed in East Asian cattle breeds[6-9] is a subtype of the common haplogroup T3[10], suggesting a founder effect in Chinese taurinecattle[11].Indicinecattle (zebu) migrated eastward from their domestication center in the Indus valley and entered China from the south since 3000 BP[4,12]. Southeast Asian and southern Chinese cattle are morphologically and genetically recognized as zebu[13,14]. Yue et al. provided evidence for an additional southwestern immigration route of zebu from India into northwest China[15].The taurine and indicinecattle migrations resulted in a morphological gradient from humpless taurinecattle in the north to humped indicinecattle in southern and southwestern China[3]. This has been confirmed by genetic studies using mtDNA[7,8,16,17] and Y-linked markers[18]. A genetic diversity study using microsatellite markers clustered Chinese indigenous cattle breeds into one taurine and four indicine groups[19].In addition to taurine and indicinecattle, several other bovine species have been living in southern China and Southeast Asia, including banteng (Bos javanicus), gaur (Bos gaurus) or gayal (Bos frontalis), which may have been the dominant cattle species until 4500 BP[4,12]. Gayal in Yunnan province of China carried indicine or taurine mtDNA but gaur Y chromosome, indicating its hybrid origin[20]. Meanwhile, in Tibetan Autonomous Region (TAR) of China, bidirectional introgression between yak and cattle has also been reported[21-24]. Genetic admixture has been identified between zebu and Bali cattle (domesticbanteng) in Indonesia[22,25]. A previous study on hair color and blood protein polymorphism provided evidence of banteng introgression into Hainan cattle in southeastern China[26], which was confirmed by genomic SNP array data[25].Genomic SNP array has become a powerful tool for population genomics studies in animals. A recent genomic variation study revealed a worldwide pattern of genetic admixture in domestic cattle[25]. Other studies focused on Creole[27], American[28], East African zebu[29] and Korean cattle[30]. These advanced approaches also allow the genomic localization of genes involved in the adaptation to natural or artificial selective constraints[27,31-38]. In the current study, we generated 50 K SNP genotypes to infer the fine-scale characterization of unique species composition of highly diverse Chinese cattle. In addition, we performed a whole genome-scan for adaptive differentiation and association analyses with environmental and morphological population-specific covariables to detect genes that responded to adaptive constraints.
Results
Genomic variation
Observed heterozygosity (Table 1) ranged from 0.145 to 0.327 in Chinese cattle populations (Fig. 1). Mongolian cattle (MG, NM) and Kazakh cattle had the highest values but southern and southwestern Chinese cattle populations were the lowest. This is most likely explained by the ascertainment bias, by which the heterozygosity of indicinecattle is underestimated. Indeed, the observed heterozygosity correlates negatively (r
2 = 0.96) with the zebu ancestry. A similar trend has previously been observed in West-African cattle[39]. Bali cattle (0.026), gayal (0.059) and yak (0.029) also have relatively low levels of heterozygosity as normally observed with SNP panels designed for a different species[40].
Table 1
Sampling information of different cattle populations and their observed heterozygosity (Ho).
Population
Abbr.
Geographical Region
N
Ho1
Source2
Yanbian
YB
Northern China
26
0.291
1
Mongolian_XilinGol
MG
Northern China
31
0.319
1
Mongolian_HulunBuir
NM
Northern China
24
0.317
1
Kazakh
KA
Northwestern China
21
0.327
1
Linzhi
LZ
Tibet Autonomous Region of China (TAR), China
19
0.285
1
Lhasa
LS
Tibet Autonomous Region of China (TAR)
14
0.305
1
Qinchuan
QC
Central China
32
0.297
1,2
Jinnan
JN
Central China
14
0.299
1
Nanyang
NY
Central China
23
0.252
1
Luxi
LX
Central China
16
0.258
1,2
Wannan
WN
Southeastern China
31
0.199
1
Wenling
WL
Southeastern China
31
0.177
1
Hainan
HN
Southeastern China
8
0.145
1,2
Liping
LP
Southwestern China
5
0.179
1
Enshi
ES
Southwestern China
31
0.237
1
Guanling
GL
Southwestern China
4
0.232
1
Honghe
HH
Southwestern China
12
0.248
1
Dengchuan
DC
Southwestern China
31
0.243
1
Banna
BN
Far southwestern China
14
0.186
1
Dehong
DH
Far southwestern China
16
0.161
1
Bangladesh Zebu
BD
Bangladesh
16
0.169
1
Gayal (Bos frontalis)
GAY
Bangladesh
21
n.a.
1
Yak (Bos grunniens)
YAK
Tibet Autonomous Region of China (TAR)
12
n.a.
1
Simmental
SIM
Germany
20
0.297
1
Hanwoo
HAN
Korea
8
0.288
2
Mongolian
MGL
Mongolia
5
0.301
2
Wagyu
WAG
Japan
12
0.248
2
Aceh
ACE
Indonesia
12
0.149
2
Pesisir
PES
Indonesia
6
0.142
2
Brebes
BRE
Indonesia
9
0.158
2
Madura
MAD
Indonesia
7
0.150
2
Sahiwal
SAHW
Pakistan
17
0.159
2
Gir
GIR
India
20
0.157
2
Bali cattle (Bos javanicus)
BAL
Indonesia
20
n.a.
2
Gaur (Bos gaurus)
GAUR
India
10
n.a.
7
Shorthorn
SH
England
20
0.257
2
Holstein
HOL
Netherlands
20
0.314
3
Brown Swiss
BSW
Switzerland
20
0.281
6
Limousin
LMS
France
20
0.309
6
Central Thailand
THC
Thailand
5
0.165
4
Northeast Thailand
THNE
Thailand
8
0.157
4
North Thailand
THR
Thailand
10
0.166
4
South Thailand
THS
Thailand
5
0.142
4
Kalmyk
KAL
Kalmyk, Russian
20
0.319
5
Yakut
YKT
Siberian, Russian
20
0.256
5
Total
756
1Ho was not analyzed for divergent bovine species (gayal, gaur, Bali, and yak) because the estimator is only informative for cattle.
2Source of data. 1) this study; 2) Decker et al.[25]; 3) Gautier et al.[80]; 4) Wangkunmhang et al. (2015); 5) Decker et al.[81]; 6) Matukumalli et al.[67]; 7) Decker et al.[69].
Figure 1
Geographical distribution of cattle populations. Detailed information is showed in Table 1. Map was created using R Project for Statistical Computing v. 3.3.1 (https://www.R-project.org) and packages rworldmap v. 1.3–6 (https://cran.r-project.org/web/packages/rworldmap/index.html), maps v. 3.2.0 (https://cran.r-project.org/web/packages/maps/) and mapproj v. 1.2–5 (https://cran.r-project.org/web/packages/mapproj/index.html). Package rworldmap v. 1.3–6 was used to generate outline and colorful dots while packages maps v. 3.2.0 and mapproj v. 1.2–5 were used to add texts. Different color indicates different geographical regions of population origins.
Sampling information of different cattle populations and their observed heterozygosity (Ho).1Ho was not analyzed for divergent bovine species (gayal, gaur, Bali, and yak) because the estimator is only informative for cattle.2Source of data. 1) this study; 2) Decker et al.[25]; 3) Gautier et al.[80]; 4) Wangkunmhang et al. (2015); 5) Decker et al.[81]; 6) Matukumalli et al.[67]; 7) Decker et al.[69].Geographical distribution of cattle populations. Detailed information is showed in Table 1. Map was created using R Project for Statistical Computing v. 3.3.1 (https://www.R-project.org) and packages rworldmap v. 1.3–6 (https://cran.r-project.org/web/packages/rworldmap/index.html), maps v. 3.2.0 (https://cran.r-project.org/web/packages/maps/) and mapproj v. 1.2–5 (https://cran.r-project.org/web/packages/mapproj/index.html). Package rworldmap v. 1.3–6 was used to generate outline and colorful dots while packages maps v. 3.2.0 and mapproj v. 1.2–5 were used to add texts. Different color indicates different geographical regions of population origins.
Population structure
Five methods were implemented in this part to explore the population structure.
PCA
Figure 2a shows a scatter plot of the first two principal components (PCs), allowing to assess the structuring of genetic diversity across all the 45 sampled populations, including outgroup species (Bali cattle, yak, gaur and gayal). The first PC accounting for 17.1% of total variation separates taurine and indicinecattle as well as other bovine species. The predominant taurine breeds from northern China and Tibet Autonomous Region of China (TAR) cluster with European breeds while populations from southeastern and far-southwestern China are closer to Indian zebu. The second component accounting for 3.1% of all variation displays the contrast of Chinese cattle to other bovine species (Bali, gayal and yak) and also differentiates European cattle breeds. The intermediate positions of LZ, MAD, BRE, Bali and gayal populations indicate gene flow between species.
Figure 2
PCA plots describing the relationships among populations. (a) Whole data set of all the 45 sampled populations. (b) A subset of data including only Asian cattle. Different colors were used to indicate different regions as Fig. 1.
PCA plots describing the relationships among populations. (a) Whole data set of all the 45 sampled populations. (b) A subset of data including only Asian cattle. Different colors were used to indicate different regions as Fig. 1.In an analysis of only Asian cattle (Fig. 2b), the first PC that explains 14.5% of the genetic variation again corresponds to the taurine-indicine separation, but the second PC that explains 1.45% of the genetic variation represents a gradient from India via Southeast Asia to Southeastern China.
Model-based clustering
Figure 3a shows an unsupervised hierarchical clustering performed on the whole data set (i.e., including outgroup species) with different values of K, the number of clusters. K = 2 reproduces the first PCA coordinate with taurine and indicine clusters and a taurine-indicine gradient from north to south. Higher values of K differentiate European from Asian taurine breeds (K = 3), zebu from other bovine species (K = 4), and southern Chinese from Asian indicinecattle (K = 5). Increasing K further generates separate clusters for European cattle breeds whereas Chinese cattle populations tend to show admixed ancestries. The cross-validation error gave the lowest value at K = 17 (Supplementary Fig. S1), which differentiates Bali cattle, gayal, yak, the European taurine breeds and indicinecattle from India/Pakistan, Southeast Asia and southeastern China, respectively.
Figure 3
Ancestries and population structuring of Chinese cattle revealed by (a) unsupervised Admixture analysis (K = 2, 3, 4, 5, 17) and (b) supervised Admixture analysis (K = 5). For supervised admixture analysis, five European cattle breeds (SH, HOL, LMS, SIM) were set to represent taurine ancestry whereas GIR and SAHW represented indicine ancestry. BAL, GAY and YAK were three outgroup bovine species. The putative hybrid animals in Bali and Gayal detected by unsupervised admixture analysis were excluded from the pure ancestry.
Ancestries and population structuring of Chinese cattle revealed by (a) unsupervised Admixture analysis (K = 2, 3, 4, 5, 17) and (b) supervised Admixture analysis (K = 5). For supervised admixture analysis, five European cattle breeds (SH, HOL, LMS, SIM) were set to represent taurine ancestry whereas GIR and SAHW represented indicine ancestry. BAL, GAY and YAK were three outgroup bovine species. The putative hybrid animals in Bali and Gayal detected by unsupervised admixture analysis were excluded from the pure ancestry.Figure 3b shows a supervised clustering with prior population information for taurine breeds (SH, HOL, BSW, SIM), indicine breeds (GIR, SAHW) and other bovine species. In this analysis, Bali cattle represents the banteng and the domesticgayal replaces the wild gaur since the latter was too inbred for this analysis. This analysis reproduces the zebu-banteng hybridization in Indonesia[41] and yak introgression into Tibetan cattle, but also suggests minor banteng and gayal into southeastern China (Supplementary Fig. S2). Supplementary Figure S3 shows the geographic distribution of the inferred species components.
NeighborNet network
In a NeighborNet graph constructed from the matrix of Reynolds’ distances between populations using Splitstree (Fig. 4), European and Indian cattle are at the extreme ends of the network, which is entirely in agreement with both the first PCA coordinate and the K = 2 clustering. Figure 4 also reproduces the intermediate positions of the predominantly taurine-indicine breeds from TAR or central China and also of the predominantly indicine breeds from central or southwestern China. In addition, the network confirms the affinity of Indonesian and southeastern Asian continental breeds with Bali cattle and gayal.
Figure 4
NeighborNet graph of 44 cattle populations. An allele frequency-dependent distance metric (Reynolds) was used to construct the NeighborNet. Different colors were used to indicate different regions as Fig. 1.
NeighborNet graph of 44 cattle populations. An allele frequency-dependent distance metric (Reynolds) was used to construct the NeighborNet. Different colors were used to indicate different regions as Fig. 1.
Population mixture
The taurine-indicine mixed composition of Chinese cattle was confirmed by the sensitive f4 ratio test (Supplementary Table S1). The estimated taurine ancestry ranges from close to 0.0 in southeastern China to close to 1.0 in north, with intermediate values ranging from 0.282 to 0.760, from 0.240 to 0.320 and from 0.656 to 0.770 in central China, southwestern China and TAR, respectively. This pattern was highly correlated with the average memberships estimated by model-based clustering (r = 998, Supplementary Table S1).As shown in Fig. 5a, negative values of the four-population f4-statistics (GIR, X; BAL,YAK) suggests gene flow from Bali to indicine populations in southeastern China and Southeast Asia. This gene flow has clearly been more consequential for the Indonesian cattle breeds, MAD, BRE and PES as well as the southern Chinese breeds LP, HN, WN and WL. For the Indonesian breeds this confirms previous results[25,41]. Although Bali cattle are relatively closely related to gaur, replacing Bali cattle by gaur as source of admixture generates only a moderately negative value for the Indonesian breeds (Fig. 5b). This is even observed for Bali cattle as a test breed and may reflect the inbreeding of the gaur samples. However, the same plot shows relatively low (i.e., negative, indicating gene flow) values for the southern Chinese breeds. In combination with the supervised model-based clustering (Fig. 3b), this may suggest that these breeds have been introgressed by gaur and/or gayal in addition to banteng, the wild ancestor of Bali cattle.
Figure 5
Visualization of f4-statistics. (a) f4-statistics of the form D (GIR, X; BAL, YAK) identified gene flow from Bali to other breeds. (b) f4-statistics of the form D (GIR, X; GAU, YAK) identified gene flow from Gaur to other breeds. The whiskers represent the standard error. Different colors were used to indicate different regions as Fig. 1.
Visualization of f4-statistics. (a) f4-statistics of the form D (GIR, X; BAL, YAK) identified gene flow from Bali to other breeds. (b) f4-statistics of the form D (GIR, X; GAU, YAK) identified gene flow from Gaur to other breeds. The whiskers represent the standard error. Different colors were used to indicate different regions as Fig. 1.A f4(SH,X;GAY,YAK) plot (Supplementary Fig. S4A) generated negative values for the indicine GIR, SAHW and BD as test breeds. Since this is not observed with the wild gaur instead of the domesticgayal (Supplementary Fig. S4B), this indicates indicine introgression into the domesticgayal population. This is confirmed by two other observations:The statistic f4(GAU,GAY;X,YAK) is negative for all test breeds, but clearly more negative for indicine than for taurine breeds (Supplementary Fig. S4C).f4(GIR,BAL;GAY,YAK) is positive but f4(GIR,BAL;GAU,YAK) is not (Supplementary Fig. S4D and Fig. 5B). The same patterns are also observed if GIR is replaced by SH (Supplementary Fig. S4A,B), which is phylogenetically close to the indicine GIR. Apparently the allele sharing of GAY with GIR or SH outweighs the allele sharing expected because of the phylogenetic relationship of gayal and Bali cattle, which most likely is an effect of the ascertainment bias of the SNP panel towards taurine breeds[42].
Uniparental markers
Supplementary Tables S2 and S5 show mtDNA and Y-chromosomal haplotype distributions of Chinese cattle breeds, based on our data supplemented by literature data (Supplementary Table S3). These uniparental markers show a north-to-south taurine-indicine gradient that resembles closely the autosomal cline (Fig. 6). Haplotype diversity (Supplementary Table S2 and Fig. S5C) shows that the diversity of taurine mtDNA hardly decreased from north to south, while indicine mtDNA clearly decreases from southeastern and southwestern China to northern China.
Figure 6
Percentages of indicine mtDNA and Y-chromosomes per breed plotted against the breed average of the autosomal indicine component as derived from the Admixture analysis.
Percentages of indicine mtDNA and Y-chromosomes per breed plotted against the breed average of the autosomal indicine component as derived from the Admixture analysis.
Detection of genomic regions subjected to adaptive constraints
For a genome-wide scan for adaptive differentiation, the XtX differentiation statistics was estimated for each SNP under the so-called core model and visualized in a Manhattan plot (Fig. 7a). Among 84 significant SNPs, SNP Hapmap28985-BTA-73836 on BTA5 was the most significant.
Figure 7
Whole genome scan for adaptive divergence and association analyses in 20 Chinese cattle breeds. (a) Manhattan plot of the XtX statistics. (b) Manhattan plot of the BFmc (association with environmental PC1). (c) Manhattan plot based on the BFmc (association with morphological PC1).
Whole genome scan for adaptive divergence and association analyses in 20 Chinese cattle breeds. (a) Manhattan plot of the XtX statistics. (b) Manhattan plot of the BFmc (association with environmental PC1). (c) Manhattan plot based on the BFmc (association with morphological PC1).The three synthetic environmental covariables are associated with 185 significant SNPs (Supplementary Table S4), from which SNP Hapmap44345-BTA-119580 on BTA11 was the most significant (Fig. 7b). From 30 SNPs associated with morphological covariables (Supplementary Table S4), SNP ARS-BFGL-NGS-67505 on BTA25 was most significant (Fig. 7c).By applying the sliding window approach, 27 significant genomic regions were obtained (in Table 2), of which eight showed significantly differentiated SNPs; 22, 0 and 4 displayed SNPs associated with the first, the second and the third environmental co-variable, respectively; and 4 displayed SNPs associated with the morphological covariable. Note that only 3 regions (out of the 8) containing significantly differentiated SNPs did not contain any SNPs associated with the population-specific covariables studied. Finally, a total of 28 candidate genes were annotated in the significant regions using UCSC (https://genome.ucsc.edu/) (Table 2).
Table 2
Regions consisting of overlapping windows each containing at least one SNP with XtX score >32.2 or at least one Bfmc score >15 identified by BayPass whole genome scan.
Position
XtX
Bfmc
Closest RefSeq gene
env. PC1
env. PC2
env. PC3
morpho. PC1
BTA02: 61.5–62.5
—
61.89 (25.50; 2)
—
—
—
LCT (61.87–61.92)
BTA03: 33.0–34.5
—
33.98 (15.14; 2)
—
—
—
AMPD2 (33.93–3394)
BTA03: 46.5–48.0
—
47.34 (24.04; 2)
—
—
—
PAPSS1 (47.27–47.67)
BTA03: 90.0–91.5
—
90.55 (22.17; 3)
—
—
—
PLPP3 (90.19–90.27)
BTA04: 30.0–31.0
—
30.04 (22.70; 2)
—
—
—
SP4 (30.32–30.41)
BTA05: 17.5–19.0
—
18.02 (35.87; 2)
—
—
—
TMTC3 (17.99–18.05)
BTA05: 69.5–71.0
70.34 (64.00; 2)
70.31 (17.19; 1)
—
70.34 (18.64; 1)
70.34 (26.72; 3)
RFX4 (70.22–70.39)
BTA06: 17.5–19.0
18.29 (44.13; 2)
—
—
—
—
LEF1 (18.33–18.45)
BTA06: 39.5–41.5
39.84 (40.33; 3)
41.25 (17.72; 1)
—
39.84 (25.61; 1)
40.33 (20.22; 1)
SLIT2 (41.24–41.64)
BTA06: 46.0–47.5
—
46.79 (26.40; 2)
—
—
—
SELL1L3 (46.79–46.88)
BTA06: 73.5–75.0
—
74.35 (22.14; 2)
—
—
—
IGFBP7 (74.07–74.15)
BTA06: 105.0–106.5
—
105.63 (16.52; 2)
—
—
—
STK32B (105.52–105.89)
BTA08: 59.5–60.5
—
—
—
—
60.00 (16.98; 2)
UNC13B (59.79–60.01)
BTA08: 81.0–82.0
81.57 (34.93; 1)
81.57 (23.66; 2)
—
—
—
GAS1 (81.50–81.51)
BTA08: 111.5–113.0
—
112.13 (22.78; 2)
—
—
—
PHF19 (112.14–112.16)
BTA10: 31.0–32.5
31.66 (33.71; 2)
—
—
—
—
COPS5 (31.70–31.70)
BTA11: 9.00–10.5
—
9.91 (20.55; 2)
—
—
—
SEMA4F (9.94–9.97)
BTA11: 22.5–24.5
—
23.09 (45.17; 3)
—
23.80 (16.19; 1)
—
NA
BTA12: 34.0–35.5
—
34.68 (20.72; 2)
—
—
—
TNFRSF19 (34.67–34.76)
BTA12: 78.0–79.0
—
78.61 (16.86; 2)
—
—
—
RAP2A (78.51–78.55)
BTA13: 45.5–48.5
47.02 (44.22; 10)
46.08 (28.49; 1)
—
48.09 (45.17; 1)
—
DIP2C (46.89–47.18); PROKR2 (47.91–4792)
BTA13: 50.5–52.0
51.42 (36.09; 2)
—
—
—
50.7 (16.42; 2)
ADRA1D (51.37–51.39); UBE2E3 (50.38–50.65)
BTA13: 53.0–54.5
53.81 (39.73; 2)
—
—
—
—
SIRPB1 (53.90–53.94)
BTA15: 52.0–53.5
—
52.80 (25.87; 2)
—
—
—
CLPB (52.66–52.80)
BTA20: 67.5–69.0
—
68.40 (17.23; 2)
—
—
—
ADAMTS16 (67.94–68.14)
BTA21: 27.5–29.0
—
28.20 (27.59; 2)
—
—
—
KLF13 (28.22–28.22)
BTA24: 39.5–40.5
—
40.49 (20.22; 2)
—
—
—
LAMA1 (40.35–40.51)
For each XtX or Bfmc test, the table gives the peak position in Mb as well as the peak statistics value and the number of SNPs in parentheses with a test value above the corresponding threshold. Dash (—) indicates non-significant results.
Regions consisting of overlapping windows each containing at least one SNP with XtX score >32.2 or at least one Bfmc score >15 identified by BayPass whole genome scan.For each XtX or Bfmc test, the table gives the peak position in Mb as well as the peak statistics value and the number of SNPs in parentheses with a test value above the corresponding threshold. Dash (—) indicates non-significant results.
Discussion
We have investigated the species composition and genomic, mitochondrial as well as Y-chromosomal variation in Chinese cattle populations. A clear north-south gradient of taurine and indicinecattle ancestries combined with banteng, gayal and yak introgressions into southern and southwestern Chinese cattle populations defines the pattern of admixture among Chinese indigenous cattle, which are supposed to underlie local breeding objectives and adaptation to different agro-ecological environments. Genome scan for adaptive differentiation and association with population-specific covariable identify regions and candidate genes relevant for environmental adaption and morphological differentiation in Chinese cattle.Previous genetic diversity studies using mtDNA and Y-linked markers have characterized a north-south gradient of taurine and indicine admixture in Chinese cattle, which is consistent with the transition from humpless to humped morphology[7,8,16-18]. A microsatellite study differentiated five groups of Chinese indigenous cattle breeds[19].Our PCA (Fig. 2) and model-based clustering (Fig. 3, Supplementary Fig. S3) patterns as well as the NeighborNet graph (Fig. 4) reveal a clear transition from taurinecattle in the north to zebu in the south with consistent admixture levels within the breeds and a clear demarcation:Cattle from Manchuria, Inner Mongolia and northwestern China with 4 to 8% indicine admixture. This group corresponds to the taurine type 1[19] (this type also comprises Tibetan cattle) and to Group 9B in the Felius classification[43].Taurindicinecattle in TAR and northern China above the Yellow or Wei River with 28 to 35% indicine admixture, denoted as zebu type 2[19] and belonging to the Huanghuai group of central Chinese cattle (Group 10A)[43].Taurindicinecattle below the Yellow River with 61 to 73% indicine genome, corresponding to indicine types 1 and 4[19]. The northernmost LX, JX and NY breeds belong to the Huanghai group (Group 10A)[40], but the other breeds to the Changzu group (Group 10B)[43].Predominant indicinecattle in southern China with an indicine genomic component of 79 to 87% (zebu type 3[19], Group 10B[43]).Indicinecattle with a >90% indicine genome (also indicine type 3[19], Group 10B[43]).Cattle from Northeast Asia and northern China have typical taurine morphological features. Its genetic distinctiveness and old origin evolved from the unique mtDNA haplogroup T4 found in modern breeds from Japan, Korea, Mongolia[6,44], Siberia (Yakut cattle)[9], northern China[7,8] (Supplementary Fig. S5A) and in ancient cattle in northern China from 2300–4500 BP[5]. In addition, a separate position of Japanese and Korean cattle was found in previous study[25]. In this study, we found that northern Chinese, South Korean and Japanese cattle share genetic ancestry with the Siberian Yakut and the indigenous northeastern China close to Korea. These breeds represent the eastern range of Turano-Mongolian cattle, which have retained their original dark-brown coat color pattern in Mongolian and Korean cattle.The admixture pattern, however, suggests European cattle influence to northern Chinese cattle diversity. Possibly migrations of nomads in the steppes of Central Asia and Mongolia, which in the Middle Ages led to the establishment of the Mongolian empire, facilitated eastward as well as westward gene flow across Eurasia. Additionally, in the past few decades programs have been implemented in China to upgrade productivity by crossing local breeds with European breeds[3,12]. Influence of European cattle was captured by model-based clustering analysis, e.g. Brown Swiss and Simmental in Kazakh, Holstein in LS, and Limousine in LX, JN and WL (Fig. 3).From north to south, levels of taurine autosomal, mitochondrial and Y-chromosomal DNA decrease, but are still appreciable in southwestern China (Supplementary Figs S3 and S5). The occurrence of indicine mtDNA in mixed taurine-indicinecattle is a unique feature of Chinese breeds[11]. The high taurine mtDNA diversity in southern China (Supplementary Fig. S5C) indicates an absence of a major founder effect. This is more compatible with immigration before the arrival of indicinecattle around 3000 BP than with a later introgression into an existing indicine population. The immigration of zebus from the present Myanmar and Indochina to the north with a plausible contribution of an eastward gene flow in western China[15] resulted in significant indicine components in northern Chinese and Mongolian cattle (Supplementary Figs S3 and S5). The pattern of indicine mtDNA diversity (Supplementary Fig. S5C) suggests a population bottleneck when they crossed the Pearl River in southeastern China.Y-chromosomal and autosomal indicine components correlate well (Fig. 6). Exceptions are the Guanlin (GL) and Nanyang (NY) breeds with relatively low indicine Y-chromosomal frequencies. Remarkably, the indicine autosomal component has a discontinuous distribution (Figs 2–4; Supplementary Fig. S3) with the largest gap across the Yellow River separating JN (29% zebu) from ES (61%) and NY (64%). This river might be a physical barrier of gene flow, but this cannot explain the absence in our panel of cattle with an indicine component between 35% and 61%. It might be hypothesized that in cattle with equal taurine and indicine components, outbreeding depression outweighs heterosis, for instance by incompatibility of genes from different origins conferring fitness[45]. It is again remarkable that cattle at both sides of the river are categorized as belonging to the Huanghuai group and resemble western Asian and African cattle because of their similar cervico-thoracic hump[2].The NeighborNet shows that indicinecattle from far southwestern China (DH, BN), which are found in the west of the Mekong River, are more closely related to Thai cattle than to southeastern Chinese cattle. This is also supported by the PCA and admixture patterns (Figs 2 and 3). The separate position of Thai cattle confirms the results of previous study[14]. The Mekong River acted also as a genetic barrier for swamp buffalo[46].Another potential source of diversity in southern Chinese cattle are the introgressions from other bovine species living in China and Southeast Asia, including yak, gayal and its wild ancestor gaur[43], and banteng, represented by its domestic relative Bali cattle. (Fig. 3; Supplementary Fig. S2). There are several examples of hybridization of different bovine species. Yak mtDNA has been detected in indigenous cattle distributed on the Qinghai-Tibetan plateau and in Diqing cattle (DQ) of Yunnan province[21,23]. We did not detect yak mtDNA in our cattle panel, but we detected influence of yak in the Tibetan LZ population based on the model-based clustering analysis.A study of blood protein polymorphism[26] suggested banteng ancestry in Hainan cattle. Using mtDNA and microsatellite genotyping, Mohamad et al.[47] characterized banteng admixture in Indonesian zebu breeds. This was confirmed by 50 K SNP analysis[25], which also detected a low level of banteng introgression in southern Chinese breeds. However, analyzing Chinese cattle together with both banteng (represented by its domestic derivative Bali cattle) and gaur (the wild ancestor of gayal) with model based clustering (Fig. 3) and f4-statistics (Fig. 5) provided consistent evidence of both gayal and banteng introgression into WL, WN and HN breeds from southeastern China and also into Thai zebu at a relatively low proportion.Conversely, similar f4-statistics (Supplementary Fig. S4) suggested introgression of zebu into gayal, which has been confirmed in the gayal population from Yunnan carrying both indicine and taurine mitochondrial genomes[20]. Similarly, cattle introgression has been detected in yak populations[24].The genetic variation described above reflects the combined effect of prehistoric immigrations of taurine and indicinecattle, subsequent gene flow between populations, local selection objectives and environmental adaptation. Indigenous Chinese cattle with indicine-taurine ratios varying between zero and one and subject to a broad range of climates is a valuable resource to identify potential genomic regions and functional genes underlying the environmental adaption. By combining signals of population differentiation (XtX) and association with three synthetic environmental covariables and one synthetic morphological covariable (Supplementary Fig. S6), we identified 27 genomic regions and 28 candidate genes targeted by natural or artificial selection (Table 2).Interestingly, 12 out of the 27 regions overlap with core selective sweep (CSS) regions[48], while 20 and 23 regions overlap with breed-wise and breed group-wise hotspots of selective sweeps, respectively[49] (Supplementary Table S5). However, we report for the first time the region BTA12: 34.0–35.5 Mb, which harbors TNFRSF19. This member of the TNF-receptor superfamily[50] is highly expressed during embryonic development[51].In other studies, strong signals of selection in tropical cattle have been detected on BTA5[32,39,52,53]. Notably, Porto-Neto et al.[32] identified a 20 Mb region on BTA5 with effects on parasite resistance, yearling weight, body condition score, coat color and penile sheath score. We found a significant signature of selection for XtX and the environmental PC1 and PC3 as well as the morphological PC1 on BTA5 region 69.5–71.0 (Table 2), which contains the candidate gene RFX4. This gene is a member of Regulatory Factor X (RFX) family of transcriptional regulators that influence MHC class II expression[54] and play a critical role in brain development[19,55]. It was also found to affect heifer fertility in tropical composition breed Brangus[56].Coat color is an important target of selection in many domestic animals. The common denotation of yellow cattle for the indigenous Chinese cattle refers to its predominant light to dark brown color. In current study, selection signatures were identified near several known color genes, including KITLG (near SNP BTA-74300-no-rs on BTA5)[57,58], and LEF1
[32,59] (here indicated by a peak in the XtX GWAS on BTA6). These genes and another candidate gene MCM6 (near ARS-BFGL-NGS-92772 on BTA2, also identified by Hudson et al.[53]) overlap with pigmentation QTL regions underlying UV-protection[60]. The environmental PC1 signal near IGFBP7 and the combined XtX-morphological signal near ADRA1D (Table 2) are close to the coat color genes KIT
[60,61] and ATRN
[60,62], respectively.We further detected an environmental PC1 association signal near SP4 (Sp4 transcription factor) as novel candidate gene on BTA4. Interestingly, this signal is overlapped with a selection signature region in African cattle[34]. SP4 is a member of the Sp1-family of zinc finger transcription factors and is required for normal murine growth, viability, and male fertility[63]. In cattle, SP4 was suggested to have effect on body size and testicular growth from birth to yearling age[64]. In human, diseases associated with SP4 include bipolar disorder[65] and schizophrenia[66].It is interesting to note that Chinese and African cattle have developed independently a variable taurine-indicine ancestry following a gradient from tropical to temperate climates. An attractive opportunity is a detailed comparison of gene variants involved in climate adaptation by using whole genome sequence data[34]. It may be anticipated that in both regions adaptation to agro-ecological constraints is mediated by recruiting and combining gene variants from taurine and indicine origins with possible original contributions in Chinese indigenous cattle from the indicine mtDNA, and the minor gayal, and banteng and yak genomic ancestry.
Methods
Ethics statement
The protocols for collection of the blood and hair samples of experimental individuals were reviewed and approved by the Institutional Animal Care and Use Committee (IACUC) at China Agricultural University. All experiments were performed in accordance with approved relevant guidelines and regulations.
Samples collection and genotyping
We collected samples of 437 animals from 24 populations (Table 1), twenty of which are indigenous cattle populations from northeastern China, central China, southeastern China, southwestern China, far southwestern China and or TAR (Fig. 1). We also examined Bangladeshi cattle and German Simmental, and two related bovine species, the gayal and yak. Samples were genotyped with Illumina BovineSNP50 BeadChip using standard procedures[67]. Genotypes are accessible via the WIDDE repository (http://widde.toulouse.inra.fr/widde/).We compared these newly generated data with published genotypes of some European and Asian cattle breeds, Bali cattle and gaur (Table 1). The combined data set comprises 37,429 SNPs. Using PLINK[68], we removed SNPs with call rates <90% or with minor allele frequencies <0.001 and discarded individuals with 10% missing genotypes. The resulting data set contained 36,872 SNPs and 736 animals from 44 populations representing taurinecattle, zebu, and three species related to cattle (Bos javanicus - banteng, Bos gruniens - yak and, Bos frontalis - gayal). Gaur (Bos gaurus)[69] was used instead of gayal in f4 analysis (see below) of the species composition of Chinese cattle because of the indicinezebu introgression into gayal.
Population genetic analysis
We used PLINK[68] to calculate the observed homozygosity for each population. Three complementary methods were used to analyze the genetic diversity among populations. First, a Principal Component Analysis (PCA) was carried out to investigate the pattern of genetic differentiation among populations and individuals using the R package SNPRelate[70], which performs eigen-decomposition of the genetic covariance matrix to compute the eigenvalues and eigenvectors. Second, population structure was evaluated by unsupervised and supervised model-based hierarchical clustering implemented in the Admixture software[71]. The results were visualized using the program Distruct[72]. Third, a NeighbourNet network was constructed using Reynold’s distances between populations using Splitstree 4.13[73].To investigate species composition of Chinese cattle, we used the four-population test (f4-statistics) implemented in ADMIXTOOLS[74]. Additionally, taurine and indicine ancestries in Chinese cattle populations were quantified via the f4 ratio estimation in ADMIXTOOLS, which allows inference of the admixture proportions without access to accurate surrogates for the ancestral populations[74]. The proportion of taurine ancestry was then computed asin which O is an outgroup (BAL), B a reference taurinecattle (YKT), C an Indian zebu cattle (GIR), A a population related to B (SH), and X the Chinese target population. Standard errors were computed with the Block Jackknife procedure in ADMIXTOOLS using default options[74].
Mitochondrial DNA and Y-chromosomal markers
A 445-bp mtDNA control region was amplified and sequenced as described previously (GenBank accession codes KY682307-KY682687)[20]. A total of 381 newly generated sequences were analyzed together with published sequences (Supplementary Tables S2 and S3). Haplotype diversity of the segment 16,023–16,262 (numbering according to GenBank accession no. V00654) was computed using the software DnaSP[75]. Y-chromosomal genotyping was carried out for 140 samples (Table S2) with the protocol described by Bonfiglio et al. (2012)[76], which differentiates Y1 (dominant in north Europe) and Y2 (dominant in other taurinecattle) and Y3 (indicinecattle) type Y chromosome[77].
Genome-scan for adaptive differentiation and association with environmental and morphological covariables
Whole genome-scans for adaptive differentiation and association with population-specific co-variables were performed with BayPass 2.1[57]. The underlying models explicitly account for the covariance structure among the population allele frequencies, which make the approach particularly robust to complex demographic histories[57]. Identification of overly differentiated SNPs was based on the XtX statistics[57,78] estimated under the core model of BayPass. To calibrate the XtX’s, a pseudo-observed data set (POD) containing 250,000 SNPs simulated under the inference model with hyperparameters equal to those estimated on the real data set was generated and further analyzed under the same conditions following the procedure described in Gautier[57]. In particular, we ensured that the posterior estimate of the scaled covariance matrix of population allele frequencies (Omega) obtained with the POD was similar to that obtained on the real data since the FMD distance between the two matrices was found equal to 0.28[57]. Similarly, the posterior means of the two hyperparameters a and b for the Beta distribution of across population allele frequencies obtained on the POD (a = b = 1.02) were almost equal to the ones obtained in the original data (a = b = 1.00). Taken together, these sanity checks indicated that the POD faithfully mimics the real data set, allowing us to define a 0.1% significance threshold on the XtX statistics (XtX = 32.3) to identify genomic regions harboring footprints of selection.We collected values for six environmental covariables, i.e. average temperature, average relative humidity, sunshine, average air pressure, wind speed, and precipitation from China Meteorological Administration (http://data.cma.cn/). Values for 10 morphological covariables, i.e. male body weight, female body weight, male height, female height, male body length, female body length, male heart girth, female heart girth, male fore-shank circumference, and female fore–shank circumference were provided for 17 out of the 20 studied indigenous breeds in National Commission of Animal Genetics Resources[3], but these data were not available for the populations LZ, BN and HH. We carried out a PCA on scaled variables for environmental and morphological co-variables separately. The first three environmental PCs and the first morphological PC were retained as uncorrelated co-variables for association studies (Supplementary Tables S6 and S7).Genome-wide analysis of association with population specific co-variables was carried out using the default options of the AUX model parameterized with the scaled covariance matrix (Omega) obtained on the real data set as described above. This model allows to account explicitly for multiple testing issue by integrating over (and estimating) the unknown proportion of SNPs actually associated with a given covariable. The support for association of each SNP with each co-variable was evaluated by computing Bayes Factor (BF) and a BF > 15 is considered as decisive evidence for association[57].As a matter of expedience, we applied a sliding window approach to identify the main genomic regions of interest as described in previous study[57]. Briefly, the UMD3.1 bovine genome assembly[79] on which all the SNPs were mapped was first split into 4718 consecutive 1-Mb windows (with a 500-kb overlap). For each window, we counted the number ns of SNPs that were either significantly differentiated at the 0.1% threshold (i.e., with XtX > 32.3) or associated (BF > 15) with at least one of the four population-specific covariables. Windows with ns > = 2 were deemed significant and overlapping windows were further merged.Supplementary Figures S1-S6 and Tables S1, S2, S5, S6.Supplementary Table S3Supplementary Table S4
Authors: K Mohamad; M Olsson; G Andersson; B Purwantara; H T A van Tol; H Rodriguez-Martinez; B Colenbrander; J A Lenstra Journal: Reprod Domest Anim Date: 2012-01 Impact factor: 2.005
Authors: Nicholas J Hudson; Laercio R Porto-Neto; James Kijas; Sean McWilliam; Ryan J Taft; Antonio Reverter Journal: BMC Bioinformatics Date: 2014-03-07 Impact factor: 3.169
Authors: Holly R Ramey; Jared E Decker; Stephanie D McKay; Megan M Rolf; Robert D Schnabel; Jeremy F Taylor Journal: BMC Genomics Date: 2013-06-07 Impact factor: 3.969
Authors: Kusdiantoro Mohamad; Mia Olsson; Helena T A van Tol; Sofia Mikko; Bart H Vlamings; Göran Andersson; Heriberto Rodríguez-Martínez; Bambang Purwantara; Robert W Paling; Ben Colenbrander; Johannes A Lenstra Journal: PLoS One Date: 2009-05-13 Impact factor: 3.240
Authors: Marco Todesco; Mariana A Pascual; Gregory L Owens; Katherine L Ostevik; Brook T Moyers; Sariel Hübner; Sylvia M Heredia; Min A Hahn; Celine Caseys; Dan G Bock; Loren H Rieseberg Journal: Evol Appl Date: 2016-02-22 Impact factor: 5.183
Authors: Andrey A Yurchenko; Hans D Daetwyler; Nikolay Yudin; Robert D Schnabel; Christy J Vander Jagt; Vladimir Soloshenko; Bulat Lhasaranov; Ruslan Popov; Jeremy F Taylor; Denis M Larkin Journal: Sci Rep Date: 2018-08-28 Impact factor: 4.379
Authors: Mario Barbato; Michael P Reichel; Matilde Passamonti; Wai Yee Low; Licia Colli; Rick Tearle; John L Williams; Paolo Ajmone Marsan Journal: PLoS One Date: 2020-04-09 Impact factor: 3.240