
Yak whole-genome resequencing reveals domestication signatures and prehistoric population expansions.

Qiang Qiu1, Lizhong Wang1, Kun Wang2, Yongzhi Yang1, Tao Ma2, Zefu Wang1, Xiao Zhang1, Zhengqiang Ni1, Fujiang Hou1, Ruijun Long1, Richard Abbott3, Johannes Lenstra4, Jianquan Liu1,2.   

Abstract

Yak domestication represents an important episode in the early human occupation of the high-altitude Qinghai-Tibet Plateau (QTP). The precise timing of domestication is debated and little is known about the underlying genetic changes that occurred during the process. Here we investigate genome variation of wild and domestic yaks. We detect signals of selection in 209 genes of domestic yaks, several of which relate to behaviour and tameness. We date yak domestication to 7,300 years before present (yr BP), most likely by nomadic people, and an estimated sixfold increase in yak population size by 3,600 yr BP. These dates coincide with two early human population expansions on the QTP during the early-Neolithic age and the late-Holocene, respectively. Our findings add to an understanding of yak domestication and its importance in the early human occupation of the QTP.

Year:  2015        PMID: 26691338      PMCID: PMC4703879          DOI: 10.1038/ncomms10283

Source DB:  PubMed          Journal:  Nat Commun        ISSN: 2041-1723            Impact factor:   14.919


Domestication of livestock species was a key factor in triggering the socioeconomic transition in humans from a hunter–gatherer lifestyle to one of nomadic pastoralism or agricultural settlement (refs 1, 2). This process occurred first in the Middle East ∼11,000 years ago and later in other parts of the world. The Qinghai-Tibet Plateau (QTP) is the world's largest and highest plateau with an area of ∼2.5 million square kilometres and an average elevation of 4,200 m. Humans are known to have colonized this vast area of Asia by at least 20,000 years ago and subsequent large-scale human population expansions occurred during the early Neolithic (10,000–7,000 years before present (yr BP)) and late Holocene (4,000–3,000 yr BP) (refs 3, 4, 5). The bovine yak species is endemic to this region. Historical records and archaeological evidence suggest that yak pastoralist societies were established in the QTP by ∼4,500 yr BP (refs 6, 7) and previous analyses of mitochondrial DNA variation indicate that yaks were domesticated during the early Neolithic period, some time between 6,000 and 12,000 yr BP (refs 8, 9). Since then, yak has become the mainstay of Tibetan pastoral society and >14 million domestic yaks are currently kept on the QTP, providing food, shelter, fuel and transport for the indigenous human population (refs 10, 11). The ancestral wild yak population is threatened, but still exists with regular gene flow occurring between wild and domestic populations (Supplementary Note 1). Because yak domestication preceded the development of a human pastoral lifestyle in the QTP, a plausible hypothesis is that yak domestication was closely associated with early human population expansion in the region. To examine the domestication of yak in more detail, we compare the genomes of wild and domestic yaks and investigate genetic changes underlying domestication. 
We use coalescent modelling to date yak domestication and population expansions more precisely than can be inferred from available archaeological and palaeontological evidence, and relate our findings to information on the prehistoric development of human society on the QTP.

Results

Genome resequencing and genetic variation

We analysed genome sequences from 13 wild yaks, representing three highly diverged mitochondrial lineages (refs 8, 9), and 59 domestic yaks from different locations on the QTP (Fig. 1a). The domestic sample comprises 48 animals from unselected landraces (D2 population) and 11 Tianzhu white yaks (D1 population), which have been bred for about 130 years by strict selection for coat colour (refs 11, 12; Fig. 1b). Genome resequencing achieved an average depth of 6.7× and an average genome coverage of 98% (Supplementary Table 1). We detected a total of 14.56 million high-quality single nucleotide polymorphisms (SNPs), most of which (76.4%) are located in intergenic regions (Supplementary Table 2).
Figure 1

Phylogenetic and population genetic analyses of wild and domestic yaks.

(a) The geographic distribution of the sampling locations for wild (dark red) and domestic (dark blue) yaks. The coloured areas indicate geographic distribution of wild yaks (light red), Tianzhu county (green) and the QTP (grey). (b) A neighbour-joining phylogenetic tree constructed using whole-genome SNP data. The scale bar represents level of similarity. W: wild yaks; D1: the Tianzhu white breed; and D2: all of the remaining domestic yaks. (c) Principal component (PC) analysis plots of the first two components. The fraction of the variance explained is 3.24% for PC1 and 1.36% for PC2, with Tracy–Widom P<10⁻⁴⁴ (Supplementary Table 3). (d) Population structure plots with K=2–5. The y axis quantifies the proportion of the individual's genome from inferred ancestral populations, and the x axis shows the different populations. Geographic information is provided in Supplementary Table 1. (e) Decay of linkage disequilibrium of D1, D2 and W populations measured by r².

Genetic changes underlying domestication

To examine the genome-wide relationships and divergence between wild and domestic yak populations, we visualized pairwise genetic distances in a neighbour-joining tree (Fig. 1b, consensus tree based on 1,000 bootstrap replicates shown in Supplementary Fig. 1). This revealed a clear split between wild and domestic yaks despite continuing gene flow between them, and also separation of Tianzhu white yaks within the domestic population (Supplementary Note 1). Principal component analysis as well as model-based clustering yielded similar results (Fig. 1c,d; Supplementary Note 1; and Supplementary Table 3). Domestication often reduces effective population size (Ne) and genetic diversity (refs 13, 14), but we obtained similar sequence diversity (π) values of 0.0013 and 0.0014 for wild and domestic yaks, respectively (Supplementary Fig. 2 and Supplementary Table 4). We further found that the estimated population-differentiation statistic (FST) between wild and domestic yaks is only 0.058 (Supplementary Table 4), which is smaller than between taurine and zebu cattle or between diverged taurine cattle breeds (ref. 15). This low FST is consistent with gene flow occurring between wild and domestic yaks (Supplementary Note 1). We detected genomic regions that have been subject to selection as inferred from high wild/domestic π log-ratios and an extreme divergence of allele frequencies between wild and domestic yaks (refs 16, 17; Fig. 2a and Supplementary Fig. 3). We identified 182 potential selective-sweep regions with an average size of 79.5 kb, together comprising around 14.5 Mb or 0.54% of the assembled genome. The role of these regions is supported by significantly lower values of Tajima's D and higher linkage disequilibrium in domestic populations (P values 2.7 × 10⁻¹² and 1.5 × 10⁻⁴, respectively, Wilcoxon rank-sum test, Supplementary Note 2). These regions harbour 209 annotated protein-coding genes (Supplementary Data 1), which are expected to represent targets of selection. 
Among these, GO group GO:0051969 (regulation of transmission of nerve impulse) was overrepresented (P<0.05, Supplementary Table 5 and Supplementary Data 1) with eight genes affecting synaptic circuitry and neurological processes (Arc, ASPA, ATP2B2, MYO6, NTRK2, Rab40c, SNCA and TG). From these genes and 30 other genes (Supplementary Data 1) involved in brain and neuronal development, 19 are considered to be associated with behaviour. ADCYAP1R1 (Fig. 2b) encodes a pituitary adenylate cyclase-activating polypeptide receptor that in humans is strongly expressed in the amygdala and hippocampus, and is associated with fear response, threat stimuli, post-traumatic stress disorder and other anxiety disorders. Adcyap1r1-deficient mice exhibit strongly reduced anxiety-like behaviour18. SCRIB (Fig. 2c) encodes the scribbled planar cell polarity protein, which is a key regulator of brain development and spine morphology. Scrib1 knockout mice exhibit enhanced learning and memory abilities and impaired social behaviour correlated with altered neuronal morphology19. PLXNB1 encodes a neuronal receptor for semaphorins and has an important role in developing nervous systems and controlling axon guidance20. A recent quantitative trait loci study in rat identified PLXNB1 as a candidate gene contributing to differences in tameness and aggression21, which are expected to be important during the early phase of animal domestication2. The pathways of brain and neuronal development identified here to be under selection during yak domestication are similar to those reported previously for rabbit22 and cat23, suggesting common features of domestication in these unrelated species.
Figure 2

Genomic regions with selection sweep signals in domestic yaks.

(a) Distribution of the ln ratio (θπ,wild/θπ,domestic) and FST for 50 kb windows with 10 kb steps. Red dots represent windows fulfilling the selected regions requirement (corresponding to Z test P<0.005, where FST≥0.17 and ln ratio≥0.65). (b,c) Examples of genes with selection sweep signals in domestic yaks. FST, θπ and Tajima's D values are plotted using a 5-kb sliding window. Wild (green) and domestic (blue) yaks are represented by different colours. Horizontal dashed lines represent the whole-genome means of the corresponding values. Genes are shown at the bottom (black rectangle, coding sequences; red line, introns).

Only a few genes subject to selection were associated with specific physical characteristics or economically significant traits, such as TTLL1 and RHPN1 associated with sperm development and RHOD with early pregnancy. Also, a limited number of sweeps associated with coat colour were detected from an examination of genetic divergence between Tianzhu white yaks and other domestic yaks (Supplementary Note 3). In line with the low level of genetic and morphological differentiation recorded between wild and domestic yaks (Supplementary Table 4 and Supplementary Fig. 4), our analyses confirm that the effects of domestication in yaks are not as marked as for most other domestic species122. This may reflect a trade-off between survival of yaks in a harsh high-altitude environment and performance under pastoral conditions.

Demographic history

We employed the pairwise sequentially Markovian coalescent (PSMC) method (ref. 24) to examine changes in effective population size (Ne) of the ancestral population of both wild and domestic yaks in response to Quaternary climatic change. We applied this method to our deep-coverage (>20×) yak genomes from three wild and four domestic yaks, including the reference genome. Both wild and domestic yaks exhibited similar demographic trajectories until about 20,000 years ago (Fig. 3a and Supplementary Fig. 5). The ancestral Ne of yaks shows a peak at ∼1 Myr ago followed by two distinct declines. The first decline occurred ∼0.9 Myr ago, coinciding with extensive glaciation during the mid-Pleistocene (ref. 25), with three highly divergent mitochondrial lineages known to have survived this decline (ref. 9). Other animal species living in the southern and southeastern QTP, such as giant panda and golden snub-nosed monkeys, also suffered decreases in effective population size during the same period (refs 26, 27). The second decline involved at least a threefold decrease in Ne, and occurred ∼40,000 years ago, coinciding with the last glaciation (ref. 25).
Figure 3

Demographic history of yak.

(a) Demographic history inferred by PSMC. The periods of the Xixiabangma Glaciation (XG, 1,170–800 thousand years ago, kya), Naynayxungla Glaciation (NG, 780–500 kya) and the last glacial maximum (LGM, ∼20 kya) are shaded in grey. (b) Schematic of the demographic scenario modelled in fastsimcoal2. The ancestral population is in grey, wild yak in brown and domestic yak in blue. The width shows the relative effective population size. The figures at the arrows indicate the average number of migrants per generation between wild and domestic yaks. (c,d) The folded genome-wide SFS from 13 wild yaks (c) and 59 domestic yaks (d). Different colours represent data before (blue) and after (orange) impute filtering, which removed sites for which the correlation of observed and imputed data was <0.9.

We used the joint site frequency spectrum (SFS) approach implemented in fastsimcoal2 (ref. 28) to simulate more recent demographic fluctuations. Thirty alternative models of historical divergence were fitted to the allele-frequency spectrum of domestic and wild yak populations, incorporating strict isolation, isolation-with-migration, bottlenecks and/or growth (Supplementary Fig. 6). A demographic model in which domestic and wild yaks diverged through a dynamic process involving population bottlenecks in both wild and domestic yaks and extensive post-domestication gene flow produced a significantly better fit than alternative models (Fig. 3b). The allele-frequency spectrum simulated with the best model was very close to the spectrum generated from real data (Supplementary Fig. 7), indicating that the model fits the data well. Thus, in the best-fitting model, domestication of yaks occurred ∼7,300 yr BP, with a 95% confidence interval of 7,227–7,914 yr BP, slightly later than the domestication of many other livestock species (10,000–8,000 yr BP), but preceding the introduction of taurine cattle to China 5,400–4,700 yr BP (ref. 29). Analyses of mitochondrial, Y-chromosomal and autosomal DNA data suggest that modern humans began colonizing the QTP ∼30,000 yr BP and that their population size expanded rapidly first between 10,000 and 7,000 yr BP and later between 4,000 and 3,000 yr BP (refs 5, 30, 31). However, archaeological and anthropological evidence indicates that the earliest agricultural settlements in the northeastern QTP were established 5,200 yr BP or later (refs 3, 32). During the early-Neolithic age, the climate on the QTP was warmer than today (ref. 25), which may have favoured persistence of a hunter–gatherer population in the region. Our results suggest that the yak was domesticated by 7,300 yr BP and that domestication may have been triggered by, and may in turn have facilitated, the first expansion of human population size on the QTP at this stage. 
Given the absence of agricultural settlements at this time, the first pastoralists were probably nomadic herders. A similar domestication by nomadic people in another extreme environment has been described for reindeer (ref. 33). Later in the Holocene, agriculture was established on the QTP, for example through the introduction of barley cultivation 4,000–3,600 yr BP (ref. 30). This coincided with a second human population expansion (refs 3, 4, 5, 30, 32) despite the colder climate of the late Holocene (ref. 25). Interestingly, our coalescent analyses revealed a sixfold increase in the effective population size of the domestic yak (Ne, from 1,100 to 6,500) during the same period (3,600 yr BP, Fig. 3b). This increase might have resulted from the second human population expansion on the QTP following the introduction of agriculture, or it might have contributed to this second expansion by providing a reliable resource of food, hides and transportation. According to our coalescent analysis, ∼500 years ago the Ne of the wild yak population declined sharply from 21,200 to 1,700, which is consistent with a loss of low-frequency variants (Fig. 3c,d) and a lower genetic diversity in current wild yaks (Supplementary Fig. 2 and Supplementary Table 4). This decline possibly resulted from habitat loss due to increasing human activities.

Discussion

Despite low morphological divergence and continuing gene flow, we detected a clear genetic split between wild and domestic yaks. We found that the genomes of domestic yaks exhibit clear signatures of selection at genes that probably affect animal behaviour and tameness according to previous reports on other animals (refs 22, 23). These findings suggest that parallel processes of evolution have occurred during the domestication of unrelated animals across different localities of the world. Our study further indicates that the yak is likely to have been domesticated before 7,000 years ago and that domestication was closely associated with the expansion of the human population on the QTP during the early Neolithic period (ref. 31). Moreover, following the introduction of agriculture (ref. 30), a further increase in the effective population size of domestic yaks later in the Holocene may have resulted from or contributed to causing a second human population expansion and the subsequent development of human society on the QTP during this period (refs 3, 4, 5, 30, 32).

Methods

Sample collection and sequencing

A total of 84 individuals (15 wild yaks and 69 domestic yaks, Supplementary Table 1) were collected and sequenced, yielding a data set of genomes from 13 wild and 59 domestic yaks without close relatives and with <50% missing data. The wild samples were collected from corpses of wild yaks in the central Kokohili region, which were identified as wild yaks because of their long hair and large skeletons. Domestic yaks were sampled across the species' main geographic distribution. Samples were collected under the supervision of ethical committees and permission was obtained when necessary. For each yak, genomic DNA was extracted from muscle samples using a standard phenol/chloroform extraction (ref. 34). The quality and integrity of the extracted DNA were checked by measuring the A260/A280 ratio and by agarose gel electrophoresis. Paired-end sequencing libraries with an insert size of 500 bp were constructed according to the Illumina manufacturer's instructions for sequencing on the HiSeq 2000 platform. Sequencing and base calling were performed according to the standard Illumina protocols.

Sequence quality checking

Duplicate reads caused by base-calling and adaptor contamination were removed. Reads with (i) ≥10% unidentified nucleotides (N), (ii) with a phred quality ≤7 for >65% of the read length or with (iii) a stretch of >10 bp identical to the adaptor sequence with up to two mismatches were removed or corrected using a k-mer frequency-based methodology35. Reads were also trimmed if they had three consecutive bp with a phred quality of ≤13, and discarded if they were shorter than 45 bp.

Sex-linked scaffolds

We used Blastz (ref. 36) to perform whole-genome alignment of the yak and taurine cattle genomes and to identify yak sex chromosomes (downloaded from National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov, UMD3.1.1, GCA_000003055.4). All hits against the cattle sex chromosome were treated as sex-linked scaffolds. A total of 186 scaffolds with a combined size of ∼134 Mb were aligned to the cattle sex chromosome and omitted from subsequent analyses.

Read mapping

High-quality reads were aligned to the Bos grunniens reference genome37 and mitochondrial reference genome (accession number: JQ692071.1) using BWA-MEM (0.7.10-r789) with default parameters38. Sequence Alignment/Map (SAM) format files were imported to SAMtools (v0.1.19)39 for sorting and merging and Picard (http://broadinstitute.github.io/picard/, version 1.92) to assign read group information containing library, lane and sample identity. The Genome Analysis Toolkit (GATK, version 2.6–4-g3e5ff60)40 was used to perform local realignment of reads to enhance the alignments in the vicinity of indel polymorphisms. Realignment was performed with GATK in two steps. The first step used the RealignerTargetCreator to identify regions where realignment was needed, and the second step used IndelRealigner to realign the regions found in the first step, generating for each individual a realigned Binary sequence Alignment/Map (BAM) file.

Filtering alignments

We removed all alignments that were not of sufficiently high quality for SNP detection and subsequent analyses. Alignments to be removed were identified using the following stepwise protocol: (i) discard reads that do not map uniquely; (ii) only use reads for which a mate can be mapped; (iii) discard ‘bad' reads with flag ≥255; (iv) discard bases with a quality <20; and (v) discard reads with a mapping quality <30. We also adjusted the quality scores around indels using SAMtools and removed the alignments that anchored short scaffolds of <2 kb.

Filtering sites

To minimize the influence of sequencing and mapping bias, the following site types were discarded: (i) sites with unbalanced quality scores as determined using a Wilcoxon rank-sum test with a threshold of P<10⁻⁵; (ii) sites with strand bias (P<10⁻⁵); (iii) sites with extremely low (<2×) or extremely high (>18×) coverage, both thresholds being defined after investigating the coverage distribution empirically; (iv) sites that failed the Hardy–Weinberg equilibrium test (P<10⁻³), using SAMtools and BCFtools (ref. 39); and (v) sites for which the available information derived from <90% of the sampled domestic and/or wild populations. The combined application of these filters left us with a data set comprising ∼2.2 Gb, representing 81.7% of the genome.

SNP and genotype calling

Variant discovery analysis was conducted at the population level for wild and domestic yak samples separately. We used the SAMtools model (ref. 41) implemented in analysis of next generation sequencing data (ANGSD) (ref. 42) to estimate genotype likelihoods and generate Beagle files. A maximum likelihood approach (ref. 43) was then used to infer major/minor states based on the genotype likelihoods. Minor allele frequency polarized by major/minor state was also estimated from the genotype likelihoods based on Kim's method (ref. 44). A likelihood ratio test statistic for the allele frequency, based on a χ² distribution with one degree of freedom and a P-value threshold of 1 × 10⁻⁶, was used as the SNP discovery criterion. SNPs were retained only if they could be genotyped in at least 90% of the sampled individuals from both domestic and wild populations. This yielded a total of 14.6 million SNPs. A two-step procedure implemented in ANGSD was used to estimate the SFS: (i) sample allele-frequency likelihood files (.saf) were generated using the option '-doSaf 1', with the ancestral state being assigned by a cattle genome (ref. 45) (17.4×); (ii) the allele-frequency likelihood files were optimized using the realSFS (ref. 46) programme in order to estimate the SFS. Genotypes were called using the full set of genotype likelihood data. Using the sample allele frequency as a prior for genotype frequencies under the assumption of Hardy–Weinberg equilibrium, we then computed the posterior probabilities of the genotypes at each site for each individual.
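The SNP discovery criterion above can be illustrated with a minimal sketch. It assumes only what the text states: a likelihood ratio statistic compared against a χ² distribution with one degree of freedom at P<1 × 10⁻⁶. The function names are hypothetical and this is not ANGSD's code; for one degree of freedom the upper-tail probability has the closed form erfc(√(x/2)), so no statistics library is needed.

```python
import math


def lrt_p_value(lrt_statistic: float) -> float:
    """Upper-tail P value of a chi-square distribution with 1 degree of freedom.

    For 1 d.f., P(X > x) = erfc(sqrt(x / 2)).
    """
    if lrt_statistic < 0:
        raise ValueError("likelihood ratio statistic must be non-negative")
    return math.erfc(math.sqrt(lrt_statistic / 2.0))


def passes_snp_criterion(lrt_statistic: float, p_threshold: float = 1e-6) -> bool:
    """Hypothetical helper: True if the site would be called polymorphic."""
    return lrt_p_value(lrt_statistic) < p_threshold


if __name__ == "__main__":
    for stat in (3.84, 23.5, 24.0):
        print(stat, lrt_p_value(stat), passes_snp_criterion(stat))
```

Under this criterion a site needs a likelihood ratio statistic of roughly 24 or more to be retained, which is far stricter than the conventional 3.84 cut-off for P<0.05.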

Relationships

To identify closely related individuals, the programme PLINK v1.07 (ref. 47) was used to obtain pairwise estimates of Identity-By-State (IBS) scores between all samples. One wild and seven domestic individuals were excluded due to their high pairwise genetic similarity with another sampled individual (IBS>0.9), leaving only unrelated samples for use in the downstream analyses. We also discarded one wild and three domestic individuals with >50% missing data (Supplementary Table 1).
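As a rough illustration of the relatedness filter, the sketch below computes a pairwise IBS similarity from genotypes coded as minor-allele counts (0, 1 or 2, with None for missing calls) and flags pairs above the 0.9 cut-off. The data layout and function names are assumptions made for this example; PLINK's own implementation and output format differ.

```python
from itertools import combinations


def ibs_similarity(genotypes_a, genotypes_b):
    """Mean proportion of alleles shared identical-by-state.

    Genotypes are minor-allele counts (0, 1, 2); None marks a missing call.
    Sites missing in either individual are skipped.
    """
    shared_alleles = 0
    compared_sites = 0
    for a, b in zip(genotypes_a, genotypes_b):
        if a is None or b is None:
            continue
        shared_alleles += 2 - abs(a - b)  # 2, 1 or 0 alleles shared at this site
        compared_sites += 1
    if compared_sites == 0:
        return float("nan")
    return shared_alleles / (2 * compared_sites)


def related_pairs(samples, threshold=0.9):
    """Return (name_a, name_b, score) for every pair with IBS above threshold."""
    flagged = []
    for (name_a, geno_a), (name_b, geno_b) in combinations(sorted(samples.items()), 2):
        score = ibs_similarity(geno_a, geno_b)
        if score > threshold:
            flagged.append((name_a, name_b, score))
    return flagged


if __name__ == "__main__":
    toy = {
        "yak_a": [0, 1, 2, 0],
        "yak_b": [0, 1, 2, 0],
        "yak_c": [2, 1, 0, 2],
    }
    print(related_pairs(toy))
```

One individual from each flagged pair would then be dropped before downstream analyses; which member to drop is a separate decision that the sketch does not make.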

Genome-wide identity scores

To visualize genetic relatedness between domestic and wild populations, we calculated for individual SNPs identity scores as the sum of the products of the frequencies of both alleles with the frequencies of the same allele in the reference genome. Identity scores for 50 kb windows along the genome were averaged over the SNPs within the window (Supplementary Fig. 4).

Population genetics analysis

After mapping sequencing data against the reference yak mitochondrial genome (accession number: JQ692071.1), only positions covered by a minimum number of three independent unique reads with base qualities of ≥30 were used to call the consensus sequences. Eighty yak mitochondrial sequences were generated and aligned together with 81 sequences from B. grunniens, one from Bos taurus, one from Bos indicus, one from Bison bison and one from Bison bonasus (see the labels of external branches on Supplementary Fig. 8 for accession numbers). We partitioned the alignment into six main regions: the D-loop, ribosomal RNA, tRNA and the first, second and third codon positions for Coding DNA Sequence (CDS). The initiation and termination codons and overlapping regions between CDSs were excluded. We also removed sites with missing genotypes in >10% of the sampled individuals. The best mutational model for each of the partitions was then selected using ModelGenerator v851 (ref. 48) with eight rate categories. The partitions and their corresponding mutational models were used for Bayesian phylogenetic inference with MrBayes v3.22 (ref. 49), running two analyses in parallel, each with four Markov Chain Monte Carlo (MCMC) chains. The final tree topology was recovered after a total of 100,000,000 generations, sampling 1 in every 1,000 generations after discarding the first 25% as burn-in. The s.d. of split frequencies was below 0.01 after 100,000,000 generations, indicating the convergence of the four chains to the stationary distribution. The resulting tree, as drawn with MEGA v5.0 (ref. 50), is shown in Supplementary Fig. 1. This tree topology is consistent with the presence of distinct phylogeographical patterns and multiple divergent lineages in yaks as determined in previous studies89 based on D-loop and mitochondrial genomic sequences from a wide range of samples. 
For autosomal genome data, a neighbour-joining tree was constructed with PHYLIP v3.695 (http://evolution.genetics.washington.edu/phylip.html) using the matrix of pairwise genetic distances. The ancestral states of the SNPs were determined using the close relative of the yak, B. taurus (ref. 45), as the outgroup. A second frequency tree (Supplementary Fig. 1) was generated based on 1,000 bootstrap replicates using the consensus module of PHYLIP. FigTree (http://tree.bio.ed.ac.uk/software/figtree/) and MEGA v5.0 were used to visualize the phylogenetic trees. Principal component analysis of the SNPs was performed using the smartpca programme in EIGENSOFT v5.0 (ref. 51). A Tracy–Widom test was used to determine the significance level of the eigenvectors. Geographic distances in km between individuals were calculated via the formula: Distance = acos(sin(lat1 × π/180) × sin(lat2 × π/180) + cos(lat1 × π/180) × cos(lat2 × π/180) × cos(lon2 × π/180 − lon1 × π/180)) × 6378.135, in which lat1 and lat2 are the latitudes in degrees of the two individuals and lon1 and lon2 their longitudes (Supplementary Fig. 9).
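The geographic distance formula above is the spherical law of cosines with an Earth radius of 6378.135 km. A minimal sketch follows; the function name and the example coordinates are illustrative and are not taken from the study. The cosine is clamped to [−1, 1] because rounding can push it marginally outside that range for identical or antipodal points, which would otherwise make `acos` fail.

```python
import math

EARTH_RADIUS_KM = 6378.135  # radius used in the formula given in the text


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    delta_lon = math.radians(lon2) - math.radians(lon1)
    cos_angle = (
        math.sin(phi1) * math.sin(phi2)
        + math.cos(phi1) * math.cos(phi2) * math.cos(delta_lon)
    )
    # Guard against floating-point values just outside the domain of acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.acos(cos_angle) * EARTH_RADIUS_KM


if __name__ == "__main__":
    # Two hypothetical sampling sites on the plateau.
    print(great_circle_km(35.0, 93.0, 37.2, 102.8))
```

For very short distances the haversine form is numerically more stable than the law of cosines, but at the scale of sampling sites spread across the QTP the difference is negligible.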

Genome-wide patterns of heterozygosity and neutrality tests

The nucleotide diversity (π), population-differentiation statistic (FST), Tajima's D statistic and Watterson estimator (θw) were calculated using a sliding window approach (50 kb windows sliding in 10 kb steps) (refs 52, 53, 54). To compensate for missing data and variations in the depth of coverage across the different genomes (4.5–8.8×, average coverage 6.7×), an empirical Bayesian method was used to calculate the posterior probabilities for the sample frequency spectrum using a maximum likelihood estimate of the SFS as the prior. The method takes genotype uncertainty into account and is based directly on genotype likelihoods rather than called genotypes. To avoid coverage-related bias, only genomic windows in which at least 80% of bases were covered were considered, leaving 207,111 windows with an average of 346 SNPs per window (min 34, max 3,298).
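The windowing scheme, together with the cut-offs reported in Fig. 2a (ln ratio ≥0.65 and FST ≥0.17), can be sketched as follows. This is a simplified illustration under stated assumptions: per-window π and FST values are taken as already computed, partial windows at scaffold ends are dropped, and the dictionary layout and function names are hypothetical. The study estimated these statistics from genotype likelihoods, which the sketch does not attempt.

```python
import math


def sliding_windows(scaffold_length, window=50_000, step=10_000):
    """Yield half-open (start, end) windows; partial end windows are dropped."""
    start = 0
    while start + window <= scaffold_length:
        yield (start, start + window)
        start += step


def candidate_windows(window_stats, min_log_ratio=0.65, min_fst=0.17):
    """Select windows with a high wild/domestic pi log-ratio and high FST.

    Each entry of window_stats is a dict with keys
    'start', 'end', 'pi_wild', 'pi_domestic' and 'fst'.
    """
    hits = []
    for stats in window_stats:
        if stats["pi_wild"] <= 0 or stats["pi_domestic"] <= 0:
            continue  # the log-ratio is undefined without positive diversity
        log_ratio = math.log(stats["pi_wild"]) - math.log(stats["pi_domestic"])
        if log_ratio >= min_log_ratio and stats["fst"] >= min_fst:
            hits.append((stats["start"], stats["end"]))
    return hits


if __name__ == "__main__":
    toy_stats = [
        {"start": 0, "end": 50_000, "pi_wild": 0.0020, "pi_domestic": 0.0010, "fst": 0.20},
        {"start": 10_000, "end": 60_000, "pi_wild": 0.0020, "pi_domestic": 0.0010, "fst": 0.10},
        {"start": 20_000, "end": 70_000, "pi_wild": 0.0015, "pi_domestic": 0.0010, "fst": 0.30},
    ]
    print(list(sliding_windows(70_000)))
    print(candidate_windows(toy_stats))
```

Requiring both conditions at once keeps windows where diversity is reduced specifically in domestic yaks and allele frequencies have diverged, rather than windows that are simply low in diversity everywhere.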

Screening for selective sweeps

To identify genomic regions that may have been subject to selection during domestication, we combined the two domestic yak populations (D1 and D2) as a single domestic gene pool. We scanned the genome for regions with the highest differences in genetic diversity (π log-ratio wild/domestic) and extreme divergence in allele frequency between wild and domestic populations using a genome-wide sliding window strategy. More specifically, we calculated the sequence diversity statistics (π), and the population-differentiation statistic (FST) using a 50 kb window with a 10-kb step. The π log-ratio was calculated as ln(πW)−ln(πD), where πW and πD are the nucleotide diversity values for the wild and domestic yaks, respectively. At a significance level of P<0.005 (Z test, with π log-ratio >0.65 and FST>0.17, Fig. 2a and Supplementary Fig. 3), we identified a total of 182 potential selective-sweep regions (with an average size of 79.5 kb, range from 10 to 450 kb) overlapping with 209 candidate genes, used for subsequent analysis and discussion. To test whether the candidate selective-sweep regions had an excess of singleton polymorphisms, we computed the Tajima's D value for domestic yaks using the same sliding window approach. Regions under selective sweeps had very significantly lower values of Tajima's D (P=2.7 × 10−12, Wilcoxon rank-sum test). In addition, pairwise r2 values showed that the candidate regions exhibited significantly extended linkage disequilibrium (P=1.5 × 10−4, Wilcoxon rank-sum test). These results confirm the occurrence of selective sweeps in the identified regions. The impact of population structure to selection signal was tested by repeating the sweep detection by comparing the two domestic populations D1 and D2 separately to the wild population (W). 
For 196 (93.8%) of 209 genes selection signals were statistically significant in both domestic populations; relatively strong selection signals were evident for the other 13 candidate genes but did not reach the significance threshold (Supplementary Data 1). Functional classification of GO categories was performed using the Blast2GO programme55. Enrichment analysis was performed and the χ2 test was used to calculate the statistical significance of enrichment. The P values were adjusted by FDR and the adjusted P value cut-off was 0.05. We inferred a demographic history for B. grunniens by applying the Pairwise Sequentially Markovian Coalescence model24 to the complete diploid genome sequences, excluding sexual chromosomes/scaffolds. This method reconstructs the history of changes in population size over time using the distribution of the most recent common ancestor (tMRCA) between two alleles in an individual. PSMC has high false-negative rates at low depth, which leads to a systematic underestimation of true event times. To ensure the quality of consensus sequences, we sequenced three wild and three domestic yaks to a high coverage of 20 × . DNA was prepared and libraries were built using the protocols described above. Consensus sequences were obtained using SAMtools and divided into non-overlapping 100 bp bins. Bases of low sequencing depth (less than a third of the average depth) or high depth (twice the average depth) were masked. The analysis was performed using the following parameters: −N25 −t15 −r5 −p ‘4+25 × 2+4+6'. The mutation rate per generation per site was estimated as: μ=D × g/2 T where D is the observed frequency of pairwise differences between two species, T is the estimated divergence time and g is the estimated generation time for the two species. The estimated generation time (g) was set to 3 years and the estimated divergence time was set to 4.7 Myr based on a previous study on cattle and yak56. 
These values yielded an estimated mutation rate of 5.84 × 10−9 mutations per generation per site for the yak. PSMC modelling was done using a bootstrapping approach, with sampling performed 100 times to estimate the variance of the simulated results. As PSMC inference is known to be inaccurate for recent datings, we also inferred the joint demographic histories of the wild and domestic yak using the flexible and robust simulation-based composite-likelihood approach implemented in the fastsimcoal2 programme28, which infers demographic parameters from the SFS. The analysis was performed for 13 wild samples and 59 domestic samples. To improve the genotype accuracy and infer missing genotypes, we used BEAGLE (ref. 57) to infer the haplotypes of wild and domestic individuals from previously estimated genotypes. After investigating the empirical minor allele frequency distributions, we inferred haplotypes for non-coding sites alone with estimated minor allele frequency values of >0.038 for wild yak and >0.008 for domestic yak. Only sites for which the correlation between the observed and imputed data (r2) was >0.9 were retained. To examine potential bias introduced by impute filtering, we compared the SFS before and after filtering. No potential bias was found (Fig. 3c,d). The joint SFS of wild and domestic yaks was used to estimate evolutionary scenario parameters. We used the folded spectrum to minimize potential biases when determining the ancestral allelic states. Alternative models of historical events were fitted to the joint SFS of wild and domestic yak and we allowed only instantaneous population size changes (Supplementary Fig. 6). For each model, we ran the programme 50 times with varying starting points to ensure convergence, and retained the fitting with the highest likelihood. Estimates were obtained from 100,000 simulations per likelihood estimation (-n100,000, -N100,000), 40 Expectation/Conditional Maximization (ECM) cycle (-L40) and 50 runs per data set. 
The best model was identified by the maximum likelihood value and the Akaike information criterion (ref. 28). Parametric bootstrap estimates were obtained by re-estimating the parameters from 100 data sets simulated under the CML estimates of the best model (model15; Supplementary Data 2). The population history and parameters from the best model were used to perform forward simulation and residuals analysis with ∂a∂i (ref. 58) to check the accuracy of the demographic model.
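The model comparison can be sketched as follows, using the standard definition AIC = 2k − 2 ln L, where k is the number of free parameters. This is an illustrative sketch only: the function names, model labels and likelihood values are hypothetical, and the input likelihoods are assumed to be in log10 units, so they are converted to natural logarithms first.

```python
import math


def aic_from_log10_lhood(log10_lhood, n_params):
    """AIC = 2k - 2 ln(L), with the likelihood supplied as log10(L)."""
    ln_lhood = log10_lhood * math.log(10)
    return 2 * n_params - 2 * ln_lhood


def select_model(runs):
    """Pick the model with the lowest AIC.

    runs maps a model name to (number of parameters, list of log10
    likelihoods from independent runs); the best run per model is kept.
    """
    aic = {}
    for name, (n_params, lhoods) in runs.items():
        best_run = max(lhoods)  # highest likelihood across runs
        aic[name] = aic_from_log10_lhood(best_run, n_params)
    best_model = min(aic, key=aic.get)
    return best_model, aic


# Hypothetical values for two candidate models, not taken from the paper.
example_runs = {
    "modelA": (3, [-1000.0, -990.0]),
    "modelB": (5, [-989.5, -995.0]),
}
best, aic_table = select_model(example_runs)
```

In this toy case the small likelihood gain of the richer model does not offset its two extra parameters, so the simpler model is preferred.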

Additional information

Accession codes: The sequencing data for this project have been deposited in the European Nucleotide Archive (EMBL-EBI) under accession code PRJNA285834.

How to cite this article: Qiu, Q. et al. Yak whole-genome resequencing reveals domestication signatures and prehistoric population expansions. Nat. Commun. 6:10283 doi: 10.1038/ncomms10283 (2015).
