Marina Naval-Sanchez, Quan Nguyen, Sean McWilliam, Laercio R Porto-Neto, Ross Tellam, Tony Vuocolo, Antonio Reverter, Miguel Perez-Enciso, Rudiger Brauning, Shannon Clarke, Alan McCulloch, Wahid Zamani, Saeid Naderi, Hamid Reza Rezaei, Francois Pompanon, Pierre Taberlet, Kim C Worley, Richard A Gibbs, Donna M Muzny, Shalini N Jhangiani, Noelle Cockett, Hans Daetwyler, James Kijas.
Abstract
Domestication fundamentally reshaped animal morphology, physiology and behaviour, offering the opportunity to investigate the molecular processes driving evolutionary change. Here we assess sheep domestication and artificial selection by comparing genome sequence from 43 modern breeds (Ovis aries) and their Asian mouflon ancestor (O. orientalis) to identify selection sweeps. Next, we provide a comparative functional annotation of the sheep genome, validated using experimental ChIP-Seq of sheep tissue. Using these annotations, we evaluate the impact of selection and domestication on regulatory sequences and find that sweeps are significantly enriched for protein coding genes, proximal regulatory elements of genes and genome features associated with active transcription. Finally, we find that individual sites displaying strong allele frequency divergence are enriched for the same regulatory features. Our data demonstrate that remodelling of gene expression is likely to have been one of the evolutionary forces that drove phenotypic diversification of this common livestock species.
Year: 2018 PMID: 29491421 PMCID: PMC5830443 DOI: 10.1038/s41467-017-02809-1
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Genome diversity and relatedness of wild and domestic sheep. a The geographic distribution of the 43 breeds sampled for whole-genome sequencing. The population-based proportion of each allele type is given for each major geographic group of breeds, after calling reference variants present in the Texel-derived genome assembly OARv3.1. b Proportion and number of private and shared SNPs for the collection of wild and domestic sheep genomes. c Reference allele frequency (RAF) correlation between domestic and mouflon populations. RAF domestic and RAF mouflon represent the frequency of the reference allele in the domestic and mouflon populations, respectively (Pearson correlation 76.36%, p-value < 2.2E−16). d Distribution of nucleotide diversity (π) within each species, estimated in 20 kb bins. The boxplot compares the bin mean and variance. The two distributions differ significantly (Wilcoxon rank-sum test, p-value < 2.2E−16). e Principal component (PC) analysis clustering individual domestic sheep, coloured to reflect the geographic origin of breed development.
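Panel d summarises nucleotide diversity (π) per 20 kb bin. As a rough illustration of how such a per-bin value can be computed, the Python sketch below sums the unbiased per-site pairwise diversity 2k(n - k) / (n(n - 1)) over the SNPs in each bin and divides by the bin length. It assumes non-overlapping bins, biallelic SNPs supplied as alternative-allele counts k out of n sampled chromosomes, and every base in a bin being callable; the function names are hypothetical and this is not necessarily how the authors computed π.

```python
from collections import defaultdict

BIN_SIZE = 20_000  # 20 kb bins, matching the caption; non-overlapping here (assumption)


def site_pi(alt_count: int, n_chrom: int) -> float:
    """Unbiased pairwise diversity at one biallelic site: 2k(n - k) / (n(n - 1))."""
    if n_chrom < 2:
        return 0.0
    return 2.0 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def binned_pi(snps, bin_size: int = BIN_SIZE) -> dict:
    """Per-bp nucleotide diversity per bin.

    snps: iterable of (chrom, pos, alt_count, n_chrom) with 1-based positions.
    Returns {(chrom, bin_index): pi}. Assumes every base in a bin is callable;
    bins containing no SNPs are simply absent rather than reported as 0.
    """
    totals = defaultdict(float)
    for chrom, pos, alt_count, n_chrom in snps:
        totals[(chrom, (pos - 1) // bin_size)] += site_pi(alt_count, n_chrom)
    return {key: total / bin_size for key, total in totals.items()}


# Hypothetical toy input: three SNPs, four sampled chromosomes each
print(binned_pi([("1", 5, 1, 4), ("1", 10, 2, 4), ("1", 25_000, 2, 4)]))
```

Real pipelines usually also divide by the number of callable sites rather than the nominal bin length, which this sketch deliberately ignores.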
Fig. 2 Genomic regions putatively under positive selection in sheep. a Population differentiation (FST) and relative nucleotide diversity between wild and domestic sheep (πmouflon/πdomestic) were estimated in 20 kb genomic bins. A total of 1420 outlier bins exhibiting evidence for selection in domestic sheep genomes are indicated in red (corresponding to Z-test P < 0.001, where FST > 0.156 and ln ratio > 0.672). b Genome-wide distribution of relative nucleotide diversity. Positive values identify genomic bins with depressed diversity in domestic sheep compared with mouflon, consistent with positive selection sweeps involved in domestication and selection (Table 1, Supplementary Table 4). c Selective sweeps located on either side of the KITLG coding exons. Integrative Genomics Viewer (IGV) screenshot of chromosome 3:124236035–125400001 illustrating the reduction of SNP variation in 67 domestic genomes compared with 17 mouflon genomes. The two regions identified by our selection metrics are shown inside the dashed red boxes.
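The outlier criterion in panel a combines two statistics per bin. The sketch below is one hedged reading of it: each statistic is standardised across all bins, converted to a one-sided upper-tail normal P-value, and a bin is flagged only when both FST and ln(πmouflon/πdomestic) reach P < 0.001. The one-sided test, the joint requirement and the function names are assumptions. Per-bin FST values are taken as given (windowed Weir and Cockerham FST can be estimated, for example, with vcftools `--weir-fst-pop` and `--fst-window-size`), and bins with zero diversity in either population must be removed first because the log ratio is undefined there.

```python
import math
import statistics


def upper_tail_p(values):
    """One-sided P-value of each value's Z-score under a standard normal."""
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    # P(Z >= z) = 0.5 * erfc(z / sqrt(2))
    return [0.5 * math.erfc((v - mu) / (sd * math.sqrt(2.0))) for v in values]


def flag_sweeps(fst, pi_mouflon, pi_domestic, alpha=1e-3):
    """Indices of bins that are upper-tail outliers for both FST and ln(pi_m / pi_d).

    Assumes all pi values are > 0 (bins with zero diversity filtered beforehand).
    """
    ln_ratio = [math.log(m / d) for m, d in zip(pi_mouflon, pi_domestic)]
    p_fst = upper_tail_p(fst)
    p_ln = upper_tail_p(ln_ratio)
    return [i for i, (a, b) in enumerate(zip(p_fst, p_ln)) if a < alpha and b < alpha]
```

Requiring both statistics to be extreme trades sensitivity for specificity: a bin with high FST but unchanged diversity ratio is not reported.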
Genes associated with selective sweeps in domestic animals reported in other studies
| Chr | Start | End | Avg rank | ln(πmouflon/πdomestic) | FST | Closest genes | Distance to closest gene (bp) | Function |
|---|---|---|---|---|---|---|---|---|
| 13 | 50310001 | 50650000 | 2 | 3.56 | 0.53 |  | 0 | Neurodegeneration |
| 3 | 124790001 | 125080000 | 21.5 | 2.80 | 0.35 |  | 27570 | Coat colour |
| 2 | 184990001 | 185040000 | 42.5 | 1.98 | 0.33 |  | 139099 | Growth |
| 1 | 199640001 | 199790000 | 51 | 1.79 | 0.35 |  | 0 | Adiposity |
| 4 | 78890001 | 79000000 | 51.5 | 2.44 | 0.31 |  | 29289 | Pigmentation |
| 17 | 59450001 | 59520000 | 62.5 | 2.32 | 0.30 |  | 215541 | Pigmentation |
| 9 | 36140001 | 36240000 | 63 | 1.72 | 0.34 |  | 0 | Fertility, stature |
| 15 | 21900001 | 22010000 | 70.5 | 1.60 | 0.39 |  | 0 | Yellow fat |
| 9 | 13570001 | 13610000 | 82.5 | 1.85 | 0.30 |  | 0 | Milk fat |
| 6 | 37420001 | 37510000 | 144 | 1.56 | 0.29 |  | 0 | Weight/height |
| 3 | 124640001 | 124680000 | 184 | 1.69 | 0.26 |  | 0 | Coat colour |
| 8 | 75670001 | 75700000 | 228.5 | 1.97 | 0.24 |  | 0 | Litter size; prolificacy |
| 6 | 38500001 | 38530000 | 245.5 | 1.62 | 0.25 |  | 1047670 | Weight/height |
| 5 | 19460001 | 19480000 | 312.5 | 1.23 | 0.27 |  | 5040 | Immune function |
| 3 | 129730001 | 129750000 | 359 | 0.94 | 0.31 |  | 7494 | Weight and milk production |
| 1 | 100990001 | 101010000 | 383.5 | 1.26 | 0.24 |  | 0 | Hair |
| 4 | 78800001 | 78820000 | 386.5 | 1.15 | 0.26 |  | 209289 | Pigmentation |
| 5 | 85880001 | 85900000 | 405 | 1.11 | 0.26 |  | 173852 | Skeletal muscle development |
| 2 | 105830001 | 105850000 | 417.5 | 1.27 | 0.23 |  | 235173 | Limb development |
| 20 | 17530001 | 17550000 | 434.5 | 1.207 | 0.24 |  | 147890 | Reproduction |
| 13 | 63380001 | 63400000 | 469 | 1.237 | 0.22 |  | 0 | Coat colour |
| 1 | 203460001 | 203480000 | 481 | 0.977 | 0.25 |  | 52282 | Stem cell maintenance |
| 6 | 35290001 | 35310000 | 486.5 | 1.05 | 0.24 |  | 201294 | Brain development |
| 6 | 70290001 | 70310000 | 492.5 | 1.24 | 0.21 |  | 55390 | Coat colour |
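The "Distance to closest gene" column presumably reports 0 for sweeps that overlap a gene. A minimal sketch of that convention, assuming closed intervals and measuring the gap from the end of one interval to the start of the other, is shown below; the gene names and coordinates are hypothetical, and the authors' exact distance definition may differ by a base.

```python
def interval_distance(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Gap in bp between two closed intervals; 0 if they overlap (assumed convention)."""
    if a_end < b_start:
        return b_start - a_end
    if b_end < a_start:
        return a_start - b_end
    return 0


def closest_gene(sweep, genes):
    """Return (distance, gene_name) of the nearest gene on the sweep's chromosome, or None.

    sweep: (chrom, start, end); genes: iterable of (chrom, start, end, name).
    """
    chrom, start, end = sweep
    candidates = [
        (interval_distance(start, end, g_start, g_end), name)
        for g_chrom, g_start, g_end, name in genes
        if g_chrom == chrom
    ]
    return min(candidates) if candidates else None


# Hypothetical annotation: GENE_B overlaps the sweep, so its distance is 0
genes = [("1", 2_500, 3_000, "GENE_A"), ("1", 1_500, 1_800, "GENE_B"), ("2", 0, 5_000, "GENE_C")]
print(closest_gene(("1", 1_000, 2_000), genes))
```

A linear scan is fine for a few dozen sweeps; for genome-wide annotation a sorted index or `bedtools closest` would be the usual choice.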
Gene Ontology biological process: top 15 enrichment results for identified selective sweeps (GREAT v3.0)[42]
| Term name | Binom raw P-value | Binom FDR Q-value | Binom fold enrichment | Binom observed region hits | Binom region set coverage |
|---|---|---|---|---|---|
| Primary alcohol catabolic process | 1.12E−27 | 2.34E−24 | 23.4 | 27 | 0.020 |
| Regionalisation | 3.22E−19 | 9.62E−17 | 2.3 | 135 | 0.103 |
| Regulation of MAPK cascade | 5.84E−19 | 1.60E−16 | 2.1 | 173 | 0.132 |
| Development of primary male sexual characteristics | 6.82E−19 | 1.83E−16 | 3.1 | 83 | 0.063 |
| Gland development | 8.90E−19 | 2.21E−16 | 2.3 | 138 | 0.105 |
| Male gonad development | 2.34E−18 | 5.20E−16 | 3.3 | 73 | 0.055 |
| Male sex differentiation | 4.05E−18 | 7.98E−16 | 3.0 | 83 | 0.063 |
| Dorsal/ventral pattern formation | 9.33E−18 | 1.68E−15 | 3.5 | 67 | 0.051 |
| Regulation of lipid metabolic process | 2.58E−17 | 4.09E−15 | 2.7 | 92 | 0.070 |
| Negative regulation of cell cycle | 6.13E−17 | 9.28E−15 | 2.6 | 97 | 0.074 |
| Embryonic hemopoiesis | 1.30E−16 | 1.80E−14 | 6.0 | 35 | 0.027 |
| Odontogenesis | 8.84E−16 | 1.04E−13 | 3.2 | 66 | 0.050 |
| Branching morphogenesis of an epithelial tube | 1.43E−15 | 1.61E−13 | 2.63 | 91 | 0.069 |
| Negative regulation of cellular protein metabolic process | 2.00E−14 | 1.65E−12 | 2.1 | 127 | 0.097 |
| Development of primary sexual characteristics | 3.83E−14 | 2.96E−12 | 2.2 | 105 | 0.079 |
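GREAT's binomial test, as we understand it, treats each input region as a draw that lands in a GO term's gene regulatory domains with probability p, the fraction of the genome those domains cover. The raw P-value is then the upper tail P(X ≥ k) of Binomial(n, p) for k observed hits among n regions, the fold enrichment is k / (np), and the region set coverage is k / n. The sketch below reproduces only that arithmetic, in log space to avoid overflow; it does not reproduce GREAT's regulatory-domain rules or its FDR correction, and the input values are hypothetical.

```python
import math


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """log P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for 0 <= k <= n, summed with the log-sum-exp trick."""
    logs = [log_binom_pmf(i, n, p) for i in range(k, n + 1)]
    top = max(logs)
    return math.exp(top) * sum(math.exp(x - top) for x in logs)


def great_style_stats(k: int, n: int, p: float) -> dict:
    """Binomial statistics in the style of GREAT's region-based test (sketch only)."""
    return {
        "raw_p": binomial_upper_tail(k, n, p),
        "fold_enrichment": k / (n * p),
        "region_set_coverage": k / n,
    }


# Hypothetical: 27 of 1350 regions hit domains covering 0.1% of the genome
print(great_style_stats(27, 1350, 0.001))
```

Very small tail probabilities below roughly 1e-308 would underflow to 0.0 here; returning the log P-value instead avoids that if needed.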
Fig. 3 Prediction and validation of sheep gene regulatory elements. a Human Epigenome Roadmap chromatin states and ENCODE features used in a reciprocal liftOver protocol to predict regulatory regions in sheep. b Overlap between chromosomal locations identified by experimental sheep tissue ChIP-Seq (H3K4me3 and H3K27ac) and regions predicted by liftOver. H3K4me3 peaks are compared against the predicted Roadmap chromatin state TssA to identify active promoters (left), and unique H3K27ac peaks are compared against the Roadmap Enh state, excluding regions overlapping predicted TssAs, to identify distal enhancers (right). c The proportion of predicted promoters (left) and enhancers (right) recovered by ChIP-Seq using either H3K4me3 peaks (blue bar) or unique H3K27ac peaks (orange bar), demonstrating that the predicted annotations retain a level of specificity between components of the gene regulatory apparatus. d An example annotated region containing LCORL, highlighting the correspondence between ChIP-Seq data (sheep H3K27ac) and liftOver predictions (TssA, TssAFlnk, ENCODE proximal, Enh, ENCODE distal).
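Panel c reports the proportion of predicted regions recovered by ChIP-Seq peaks. Assuming "recovered" means overlapped by at least one peak by one or more bases (the same test `bedtools intersect -u` applies), that proportion can be computed as sketched below. Intervals are treated as BED-style half-open (chrom, start, end) tuples, and peaks are merged per chromosome so that a single binary search decides each overlap. The function names and toy coordinates are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals into sorted, disjoint ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def fraction_recovered(predicted, peaks):
    """Fraction of predicted regions overlapped by >= 1 peak (half-open coordinates)."""
    grouped = defaultdict(list)
    for chrom, start, end in peaks:
        grouped[chrom].append((start, end))
    index = {}
    for chrom, ivs in grouped.items():
        merged = merge_intervals(ivs)
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    if not predicted:
        return 0.0
    hits = 0
    for chrom, start, end in predicted:
        if chrom not in index:
            continue
        starts, ends = index[chrom]
        # Last merged peak starting before `end`; disjointness makes it the only candidate
        i = bisect_left(starts, end) - 1
        if i >= 0 and ends[i] > start:
            hits += 1
    return hits / len(predicted)


# Hypothetical toy peaks and predicted promoters
toy_peaks = [("1", 100, 200), ("1", 150, 300)]
toy_predicted = [("1", 250, 260), ("1", 300, 400)]
print(fraction_recovered(toy_predicted, toy_peaks))
```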
Fig. 4 Genomic feature enrichment in selection sweeps and differentiated sites. a Strength of enrichment for 29 genome features within the 1420 sweep bins, assessed by location overlap[49]. Genome features were derived from four different sources. The horizontal line marks the significance threshold after correction for multiple testing. b Intersection of delta allele frequency (ΔAF) with protein coding gene annotations from the reference OARv3.1. The number of SNPs in each ΔAF bin is given at left; the M-value (at right) was calculated by comparing the frequency of SNPs in each genome feature and ΔAF bin with the corresponding frequency across all bins. c As for b, using chromatin state annotations derived from the Roadmap dataset. Additional M-value results using predicted ENCODE marks are provided in Supplementary Fig. 13.
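The M-value in panels b and c compares how often SNPs in a given ΔAF bin fall in a genome feature with how often SNPs do so across all bins. The sketch below assumes that comparison is a log2 ratio, M = log2(f_bin / f_all), with ΔAF taken as the absolute allele frequency difference between domestic and mouflon populations and split into ten equal-width bins. The bin count, the log base and the function name are assumptions rather than details given in the caption.

```python
import math
from collections import Counter


def m_values(delta_af, in_feature, n_bins: int = 10) -> dict:
    """Log2 enrichment of a genome feature per |dAF| bin (assumed definition).

    delta_af: non-empty per-SNP absolute allele frequency differences in [0, 1].
    in_feature: per-SNP booleans, True if the SNP lies in the feature.
    Bins whose SNPs never hit the feature are skipped (log2(0) is undefined).
    """
    totals, hits = Counter(), Counter()
    for d, inside in zip(delta_af, in_feature):
        b = min(int(d * n_bins), n_bins - 1)  # bin index; dAF = 1.0 goes into the top bin
        totals[b] += 1
        hits[b] += int(inside)
    overall = sum(hits.values()) / sum(totals.values())
    return {
        b: math.log2((hits[b] / totals[b]) / overall)
        for b in sorted(totals)
        if hits[b] > 0
    }
```

Positive M-values mark ΔAF bins where the feature is over-represented relative to the genome-wide SNP background; negative values mark depletion.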
Fig. 5 Candidate causal missense mutation in FBXL3. a FBXL3 gene representation in human hg19 coordinates chr13:77579389-77601337. b Multiple sequence alignment of the SNP region across representative vertebrates. c Allele frequency of the G (reference) or A (alternative) allele in domestic sheep (O. aries, n = 67) and mouflon (O. orientalis, n = 17). d FBXL3 amino acid sequence with the substitution at residue 182 highlighted in red. Residue 358 (green) is associated with the mouse After hours (Afh) phenotype and residue 364 with the Overtime (Ovt) phenotype.
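Panel c summarises per-population allele frequencies. As a small illustration, the frequency of an allele among diploid genotype calls is its count divided by twice the number of called individuals; the sketch below also forms an absolute frequency difference between two populations, one plausible form of the ΔAF used in Fig. 4. The encoding of genotypes as two-letter strings, the handling of missing calls and the toy genotypes are assumptions, not data from the study.

```python
def allele_frequency(genotypes, allele: str) -> float:
    """Frequency of `allele` among diploid calls such as "GG", "GA", "AA".

    Missing calls (None) are skipped; raises ValueError if nothing was called.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    return sum(g.count(allele) for g in called) / (2 * len(called))


def delta_af(pop_a, pop_b, allele: str) -> float:
    """Absolute allele frequency difference between two populations (assumed form of dAF)."""
    return abs(allele_frequency(pop_a, allele) - allele_frequency(pop_b, allele))


# Hypothetical genotypes, not the study's 67 domestic and 17 mouflon animals
domestic = ["AA", "AA", "GA", "AA"]
mouflon = ["GG", "GA", "GG", None]
print(allele_frequency(domestic, "G"), allele_frequency(mouflon, "G"), delta_af(domestic, mouflon, "G"))
```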