
Structural Variants Selected during Yak Domestication Inferred from Long-Read Whole-Genome Sequencing.

Shangzhe Zhang1, Wenyu Liu1, Xinfeng Liu1, Xin Du1, Ke Zhang1, Yang Zhang2, Yongwu Song3, Yunnan Zi4, Qiang Qiu1, Johannes A Lenstra5, Jianquan Liu1.   

Abstract

Structural variants (SVs) represent an important genetic resource for both natural and artificial selection. Here we present a chromosome-scale reference genome for domestic yak (Bos grunniens) that has longer contigs and scaffolds (N50 44.72 and 114.39 Mb, respectively) than reported for any other ruminant genome. We further obtained long-read resequencing data for 6 wild and 23 domestic yaks and constructed a genetic SV map of 372,220 SVs that covers the geographic range of the yaks. The majority of the SVs contain repetitive sequences and several are in or near genes. By comparing SVs in domestic and wild yaks, we identified genes that are predominantly related to the nervous system, behavior, immunity, and reproduction and may have been targeted by artificial selection during yak domestication. These findings provide new insights into the domestication of animals living at high altitude and highlight the importance of SVs in animal domestication.
© The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Keywords:  Bos grunniens; domestication; reference genome; structural variants

Year:  2021        PMID: 33944937      PMCID: PMC8382902          DOI: 10.1093/molbev/msab134

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


The domestication of livestock species is one of the major achievements in the history of human civilization. A series of phenotypic changes in domesticated animals, such as reduced brain size and increased tameness, are considered to constitute the domestication syndrome (Hammer 1984). In several domestic species, the underlying genetic basis has been examined using genetic markers such as single-nucleotide polymorphisms (SNPs), short insertions and deletions, and copy number variations (CNVs) (Chen et al. 2009, 2018; Serres-Armero et al. 2017; Genova et al. 2018), which account for the most widespread modes of genomic variation. However, the role of structural variants (SVs), which comprise insertions, deletions, duplications, inversions, or translocations of 50 bp or longer (Baker 2012), has remained underexplored because of two technological constraints (Huddleston and Eichler 2016). First, detection of SVs requires long sequencing reads that span their full length (Chaisson et al. 2015; Sedlazeck et al. 2018). Second, it requires a continuous reference assembly that covers the repetitive fraction of the genome (Weckselblatt and Rudd 2015; Peona et al. 2021). Long-read sequencing is not well suited to detecting single-nucleotide variants because its single-base accuracy is only 85–95% (Kono and Arakawa 2019), but it is the method of choice for detecting large SVs. Recently, long-read sequencing combined with high-quality reference genomes revealed significant roles of SVs during plant domestication (Fuentes et al. 2019; Zhou et al. 2019). Whole-genome sequencing (WGS) data based on a short-read assembly with high coverage have been published for domestic and wild yaks (Qiu et al. 2012; Liu et al. 2020). In this study, we present a high-quality reference genome for domestic yak (BosGru3.0). By long-read resequencing of 29 wild and domestic yaks, selected to represent the genetic groups identified from 80 previously published (Qiu et al. 2015) and 18 new short-read whole-genome sequences, we obtained a comprehensive and representative SV map for domestic and wild yaks, which allows a tentative identification of SV-related genes involved in the domestication syndrome.
For the chromosome-scale BosGru3.0 reference assembly (supplementary fig. S1, Supplementary Material online), DNA was extracted from the blood of a male domestic yak from Hongyuan County, Sichuan Province. We conducted a de novo assembly of Oxford Nanopore long reads with ~88× coverage. After polishing with Illumina short reads and clustering contigs by interaction strength from Hi-C data (supplementary fig. S2, Supplementary Material online), we obtained a highly continuous reference genome, BosGru3.0, with 116 contigs assembled in 31 chromosomes. The contig and scaffold N50 of BosGru3.0 are 44.72 and 114.39 Mb, respectively, and these values are higher than those obtained for other ruminant reference genomes (table 1 and supplementary table S1, Supplementary Material online). Repetitive elements (supplementary fig. S3 and table S2, Supplementary Material online), protein-coding genes (supplementary table S3, Supplementary Material online), and noncoding elements (supplementary table S4, Supplementary Material online) were predicted for our assembly. A total of 21,232 protein-coding genes were predicted (table 1 and supplementary fig. S1, Supplementary Material online; see Supplementary Material online for more details about BosGru3.0).
Table 1. Assembly statistics comparison between BosGru2.0 and BosGru3.0.

Assembly                   BosGru2.0        BosGru3.0
Total length (bp)          2,645,161,911    2,832,776,395
Number of contigs          41,192           414
Contig N50 (Mb)            1.41             44.72
Scaffold N50 (Mb)          1.41             114.39
Chromosome number          0                31
Unplaced contig number     41,192           383
Number of gaps             192,002          646
GC content (%)             41.7             42.0
Protein-coding genes       20,499           21,232
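The contig and scaffold N50 values reported above follow the usual definition: the length L such that sequences of length L or longer contain at least half of the assembled bases. A minimal Python sketch of that calculation is given here for illustration; the function name `n50` and the toy contig lengths are assumptions made for this example and are not taken from the authors' workflow.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the sequence at which the running total,
    accumulated from longest to shortest, first reaches half of the
    summed length of all sequences.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must contain at least one sequence")


# Hypothetical toy assembly (lengths in Mb), not real yak contigs.
toy_contigs = [50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)
print(f"N50 of toy assembly: {toy_n50} Mb")
```

In this toy case the total is 150 Mb, and the two longest contigs (50 + 40 Mb) are the first to pass the 75 Mb halfway mark, so the N50 is 40 Mb.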
Twenty-three domestic individuals from various locations and six wild yaks (fig. 1) were selected for long-read WGS resequencing after excluding duplicated samples (supplementary fig. S4 and table S6, Supplementary Material online). As shown by model-based clustering (fig. 1 and supplementary fig. S5, Supplementary Material online) and by genetic distances among the short-read whole-genome sequences of 18 yaks combined with previous data (supplementary table S5, Supplementary Material online; Qiu et al. 2015), the 23 domestic yaks represent the genetic diversity within their distribution range. The average read N50 of the long-read WGS data is 22.59 kb (domestic) and 21.99 kb (wild), with an effective depth of 8.4× to 15.6× (domestic) or 11.4× to 21.2× (wild; supplementary table S6, Supplementary Material online). We identified 372,220 SVs, comprising 328,936 deletions, 32,618 insertions, 4,321 duplications, 1,993 inversions, and 4,352 translocations (supplementary figs. S6 and S7 and table S7, Supplementary Material online). We did not find any SV alleles that were exclusive to either wild or domestic yaks. We annotated all SVs by their positions on BosGru3.0 and found 257,155 SVs (69.09%) in intergenic regions, whereas 93,582, 14,964, 1,811, and 3,620 SVs were in intronic, exonic, UTR, and the 150 bp upstream and downstream flanking regions of genes, respectively (supplementary table S8, Supplementary Material online).
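As a quick arithmetic check of the counts in the preceding paragraph, the short Python sketch below sums the five SV types and recomputes the intergenic percentage. All numbers are copied from the text; the variable names are chosen only for this illustration.

```python
# SV counts by type, as reported in the text.
sv_counts = {
    "deletion": 328_936,
    "insertion": 32_618,
    "duplication": 4_321,
    "inversion": 1_993,
    "translocation": 4_352,
}

# The five categories should add up to the reported total of 372,220 SVs.
total_svs = sum(sv_counts.values())

# Share of each SV type, in percent of all SVs.
type_share_pct = {
    sv_type: round(100 * count / total_svs, 2)
    for sv_type, count in sv_counts.items()
}

# Intergenic SVs as a percentage of all SVs (reported as 69.09%).
intergenic_svs = 257_155
intergenic_pct = round(100 * intergenic_svs / total_svs, 2)

print(total_svs, intergenic_pct)
for sv_type, pct in type_share_pct.items():
    print(f"{sv_type}: {pct}%")
```

The sum reproduces the reported total, and the intergenic share rounds to the reported 69.09%.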
Fig. 1

(A) Geographic distribution of all domestic and wild yaks sampled in this research. (B) Genetic groups of 91 domestic and wild yaks in total based on short-read whole-genome sequences with population structure K = 5. Triangles indicate samples selected for long-read whole-genome sequencing. Orange: domestic yak; Blue: wild yak. GS, Gansu; NP, Nepal; PK, Pakistan; QH, Qinghai; SC, Sichuan; XZ, Xizang; YN, Yunnan; WY, Wild yak. (C) Neighbor-joining tree constructed based on SNPs of all long-read samples. (D) Domestication-related SVs in the region of MAGI2.

The majority of the SVs (74.43%) contain repetitive sequences. The overall percentages of the different categories of repetitive elements in SVs do not differ substantially from those for the whole genome (supplementary tables S2 and S9, Supplementary Material online), whereas the length distribution of SVs depends on the underlying molecular event (inversion, duplication, insertion, or deletion; supplementary fig. S7, Supplementary Material online). Comparing the SV sequences, as well as the genomes, of wild and domestic yaks did not reveal large differences in the content of any type of repeat. However, wild yaks have more low-divergence copies of LINE/RTE-BovB than domestic yaks (supplementary fig. S8, Supplementary Material online), which suggests recent activity of RTE-BovB in wild yaks. Interestingly, the length distributions of inversions and duplications show a peak at about 1,000 bp (supplementary fig. S7, Supplementary Material online); the sequences in this peak consist mainly of non-repetitive elements and LINE-1 elements (supplementary table S10, Supplementary Material online). To further identify SVs possibly involved in domestication, we calculated the FST between wild and domestic yaks for all SVs and identified 3,680 outlier SVs (FST > 0.28) that may have been under artificial selection (supplementary table S11, Supplementary Material online).
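To illustrate how a per-SV FST scan of this kind can work, the Python sketch below computes FST for a single biallelic SV from allele counts in two populations and applies the 0.28 cutoff mentioned above. The text here does not state which FST estimator was used, so Hudson's estimator is an assumption made for this example, as are the function names and the allele counts; only the sample sizes (6 wild and 23 domestic diploid yaks, that is 12 and 46 chromosomes) and the threshold come from the text.

```python
FST_THRESHOLD = 0.28  # outlier cutoff reported in the text


def hudson_fst(alt_count_1, n_chrom_1, alt_count_2, n_chrom_2):
    """Hudson's FST estimator for one biallelic variant.

    alt_count_*: number of chromosomes carrying the SV allele.
    n_chrom_*:   number of sampled chromosomes (2 x diploid individuals).
    """
    p1 = alt_count_1 / n_chrom_1
    p2 = alt_count_2 / n_chrom_2
    # Numerator: squared frequency difference, corrected for sampling noise.
    numerator = (
        (p1 - p2) ** 2
        - p1 * (1 - p1) / (n_chrom_1 - 1)
        - p2 * (1 - p2) / (n_chrom_2 - 1)
    )
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    if denominator == 0:
        # Both populations are fixed for the same allele: no differentiation.
        return 0.0
    return numerator / denominator


def high_fst_svs(allele_counts, threshold=FST_THRESHOLD):
    """Return the IDs of SVs whose FST exceeds the threshold."""
    return [
        sv_id
        for sv_id, (alt_wild, n_wild, alt_dom, n_dom) in allele_counts.items()
        if hudson_fst(alt_wild, n_wild, alt_dom, n_dom) > threshold
    ]


# Hypothetical allele counts: (wild alt, wild chromosomes, domestic alt, domestic chromosomes).
example_counts = {
    "sv_differentiated": (10, 12, 5, 46),
    "sv_shared": (6, 12, 23, 46),
}
outliers = high_fst_svs(example_counts)
print(outliers)
```

With these made-up counts, only the SV whose allele frequency differs strongly between the two groups passes the cutoff; an SV at equal frequency in both groups yields a slightly negative estimate because of the sampling correction.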
A tree of the yak genotypes based on these SVs separates domestic and wild yaks more clearly than the tree in figure 1, but still shows variation among the domestic yaks (supplementary fig. S9, Supplementary Material online). Among these high-FST SVs, 2,391 (0.64% of all SVs) are in intergenic regions and 1,288 are in the exons, introns, or flanking regions of 725 genes (supplementary table S11, Supplementary Material online). Of these, 34 have deletions in exonic regions, 24 of which cause a frameshift in the open reading frame (ORF) (nonsense SVs). We then annotated the functions of the 725 genes carrying high-FST SVs and found that the most significantly enriched functions were nervous system development (GO:0007399, 168 genes) and the human disease pathway long-term depression (KEGG accession hsa04730, 9 genes; supplementary fig. S10 and tables S12 and S13, Supplementary Material online). Other enriched GO categories are also related to the nervous system, including neuron differentiation and generation of neurons. For example, the variant with the second-highest FST is located in an intron of the gene encoding the signaling protein MAGI2 (fig. 1). A deletion within the human MAGI2 gene has been associated with epilepsy and schizophrenia (Marshall et al. 2008; Zhang et al. 2020), and several CNVs are located near MAGI2 in an aggressive dog breed (Chen et al. 2009). Similar associations with behavior have been reported for three other genes carrying high-FST SVs. GAD2 has been linked to fear in dogs (Pendleton et al. 2018), and GAD2-knockout mice displayed an increase in spontaneous seizures (Kash et al. 1997). PLCB1 has been associated with schizophrenia (Liu et al. 2005; Lo Vasco et al. 2013) and shows strong selection signals in the domestication of buffalo (Luo et al. 2020) and rabbit (Carneiro et al. 2014).
GRIK2, which is also related to fear, anxiety, and aggression, was involved in a selective sweep in domesticated animals including rabbit, dog, and duck (O’Rourke and Boeckx 2020). Other genes carrying SVs are involved in immunity, anatomical structure morphogenesis, and economic traits (supplementary table S12, Supplementary Material online). For example, NAFT has been shown to regulate the expression of potent immunomodulatory cytokines by targeting the downstream IL-2 growth factor in T cells (Müller and Rao 2010). SMOC2 was reported to be related to brachycephaly in dogs (Marchant et al. 2017) and is highly expressed in the endometrium and other reproductive tissues (Uhlén et al. 2015). GSK3B is an isoform of GSK3A, which was found to be related to fat storage ability in pigs (Fu et al. 2016). Knockout of GSK3A in mice improved glucose tolerance in response to a glucose load and elevated hepatic glycogen storage and insulin sensitivity (MacAulay et al. 2007). Among the genes with nonsense SVs, a few are also involved in mental or brain development, for instance PAX3 (Bang et al. 1997), MAGT1 (Molinari et al. 2008), SHROOM2 (Fairbank et al. 2006), and SSBP3 (Hashimoto et al. 2012; supplementary table S11, Supplementary Material online). Taken together, our results suggest that SVs were under selection during yak domestication and that the preferentially targeted genes are related to the nervous system, behavior, and immunity. These findings provide additional insights into yak domestication (Guo et al. 2006; Wang et al. 2010, 2011; Qiu et al. 2015; Zhang et al. 2016) and the evolution of the Bovini species (Wu et al. 2018; Zhang et al. 2020).

Materials and Methods

A detailed description of methods is provided in Supplementary Material online.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Data Availability

All sequences have been deposited in NCBI BioProject under accession number PRJNA540974. The BosGru3.0 reference sequences have been deposited in NCBI as GCA_005887515.2. Annotation information for BosGru3.0 and detailed SV information are available at Figshare (doi: 10.6084/m9.figshare.11151185). Custom workflow and scripts are available at https://github.com/shangshanzhizhe/YakPopulationSV (last accessed May 12, 2021).