Literature DB >> 26865845

Prediction of Genes Related to Positive Selection Using Whole-Genome Resequencing in Three Commercial Pig Breeds.

HyoYoung Kim1, Kelsey Caetano-Anolles2, Minseok Seo3, Young-Jun Kwon3, Seoae Cho4, Kangseok Seo5, Heebal Kim6.   

Abstract

Selective sweep can cause genetic differentiation across populations, which allows for the identification of possible causative regions/genes underlying important traits. The pig has experienced a long history of allele frequency changes through artificial selection in the domestication process. We obtained an average of 329,482,871 sequence reads for 24 pigs from three pig breeds: Yorkshire (n = 5), Landrace (n = 13), and Duroc (n = 6). An average read depth of 11.7 was obtained using whole-genome resequencing on an Illumina HiSeq2000 platform. In this study, cross-population extended haplotype homozygosity and cross-population composite likelihood ratio tests were implemented to detect genes experiencing positive selection for the genome-wide resequencing data generated from three commercial pig breeds. In our results, 26, 7, and 14 genes from Yorkshire, Landrace, and Duroc, respectively were detected by two kinds of statistical tests. Significant evidence for positive selection was identified on genes ST6GALNAC2 and EPHX1 in Yorkshire, PARK2 in Landrace, and BMP6, SLA-DQA1, and PRKG1 in Duroc.These genes are reportedly relevant to lactation, reproduction, meat quality, and growth traits. To understand how these single nucleotide polymorphisms (SNPs) related positive selection affect protein function, we analyzed the effect of non-synonymous SNPs. Three SNPs (rs324509622, rs80931851, and rs80937718) in the SLA-DQA1 gene were significant in the enrichment tests, indicating strong evidence for positive selection in Duroc. Our analyses identified genes under positive selection for lactation, reproduction, and meat-quality and growth traits in Yorkshire, Landrace, and Duroc, respectively.

Entities:  

Keywords:  non-synonymous; positive selection; re-sequencing; single nucleotide polymorphism; swine

Year:  2015        PMID: 26865845      PMCID: PMC4742324          DOI: 10.5808/GI.2015.13.4.137

Source DB:  PubMed          Journal:  Genomics Inform        ISSN: 1598-866X


Introduction

Artificial selection is an aspect of the process of domestication development [1] that is reflected in the evolution of domestic animals. Positive selection increases the fitness of adaptive traits [2]. Previous research has demonstrated that the selection and mating of domestic animals is determined by human-generated pressures, and the genetic differences resulting from artificial selection have led to economic development [3]. Among domestic animals, the pig has experienced a particularly long history of haplotype changes through artificial selection in the process of domestication [24]. For most of their history, studies have found positive selection in Yorkshire, Landrace, and Duroc to be associated with specific genes related to lactation [56], reproduction [7], meat quality [8], and growth [3] trait. Given the importance of these traits to the pig farming industry, it is necessary to detect the signals of positive selection in pigs. Identifying genomic selection using high-density single nucleotide polymorphism (SNP) arrays or sequencing data is useful for the discovery of putative trait-related genes. A variant under selection pressure between and/or within populations will show an increase of the frequency of particular alleles or linkage disequilibrium (LD) of SNPs. Selective sweep can cause genetic differentiation across populations [9]. The cross-population extended haplotype homozygosity (XPEHH) [10] and cross-population composite likelihood ratio (XPCLR) [9] tests are widely used to search for signals of selection between the populations. The XPEHH test detects the occurrence of selection based on measuring LD, while the XPCLR test considers the spatial patterns of allele frequencies of SNPs [9]. The XPCLR test is a likelihood method that uses differentiation of multi-locus allele frequencies between two populations to detect selective sweep. Selection with significant signals can break up the haplotype structures more rapidly than mutation or recombination processes [11]. In this study, the XPEHH and XPCLR between-population methods were implemented to detect positive selection in three pig breeds through resequencing using the Illumina HiSeq2000 platform (Illumina, Inc., San Diego, CA, USA). To further explain the biological implications of positively selected genes, functional enrichment analysis was performed.

Methods

Sampling and whole-genome sequencing

We used genomic DNA samples gathered from 12 males and 12 females of three pig breeds: Yorkshire, 2 males and 3 females; Landrace, 7 males and 6 females; and Duroc, 3 males and 3 females. Blood samples were collected for DNA extraction. Sample collections and DNA quality check procedures were performed according to the manufacturer's instructions. Next, we constructed genomic DNA libraries for each sample using TruSeq DNA Library kits (Illumina). The paired-end library was sequenced on an Illumina HiSeq 2000 sequencing platform. The pair-end sequence reads were aligned to the reference pig genome sequence from University of California, Santa Cruz (UCSC; http://genome.ucsc.edu/; susScr3) using Bowtie2 with the default setting. We used the following open-source software: Bowtie2, Picard tools 1.94 (http://picard.sourceforge.net), Samtools 0.1.19 [12], Genome Analysis Toolkit (GATK) 2.6.4 [13], and VCFtools 4.0 [14] for resequencing data processing and SNP calling. Substitution calling was performed using GATK Unified-Genotyper. We phased the haplotypes for the entire pig populations using BEAGLE [15]. Picard tools was used for duplicate read removal and all mate-pair information confirmation. Samtools was used for indexing the results from bam files and calculating the mapped reads using the flagstat option. GATK was used for realignment and SNP calling from resequencing data, and VCFtools was used when VCF files were handled. After filtering, non-biallelic SNPs were excluded.

Detection of selective sweep

To detect selective sweep, we implemented two between-populations methods (XPEHH and XPCLR) between each pairwise breed contrast. These statistics used the phased data based on the SNP genotypes obtained from the wholegenome resequencing. For each contrast, XPEHH and XPCLR were implemented across all SNPs between the three pig breeds to identify potential positive selection. We used a p-value of <0.01 for XPEHH and the top 1,000 genes in XPCLR as the cutoff criteria. These tests revealed that some SNPs might have been under positive selection in the Yorkshire, Landrace, and Duroc breeds, respectively.

Bioinformatic analysis of genes under positive selection

Based on the findings of positive selection, enrichment analysis was performed to examine the biological functions of genes in detected regions. In this study, regions with SNPs under positive selection were extended approximately 10 kb upstream and downstream [16]. We assembled genes located within the extended region using the RefGene from the UCSC Genome Browser (http://genome.ucsc.edu/; ver. hg19). Gene enrichment analysis was performed GO terms [17], including biological process, molecular function, cellular component, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis [18] using the DAVID (http://david.abcc.ncifcrf.gov/) tool [19].

Analysis of SNP effects

To identify non-synonymous coding SNPs (nsSNPs), we analyzed SNP effects using the SNPeffect database, which detects phenotypic effects of variation (http://snpeffect.vib.be/) [20]. For more exact results, a Fisher's exact test was performed to identify breed-specific amino acids in genes with nsSNPs. The statistical test was used to analyze 2 × 2 contingency tables composed with two factors. Considering our purpose, one factor was breed information, and the other was amino acid information. From the 2 × 2 contingency tables, we expected to observe enrichment of specific amino acids in specific breeds at each position. All data parsing and calculations were performed using Python (ver. 2.5) and R (ver. 3.1.2).

Results

Sequencing and alignments

Genome-wide positive selection in pigs was identified using whole-genome resequencing, in which we aligned whole genome reads from three pig breeds, Yorkshire, Landrace, and Duroc, to the pig reference sequence. These breeds have undergone positive selection with strong signals for lactation, reproduction, meat quality, and growth. On average, ~329,482,871 sequences were mapped to the pig reference genome from the University of California, Santa Cruz (UCSC) Genome Browser (assembly ID: susScr3) using Bowtie2. Table 1 provides a summary of the alignments of 24 pig samples. Over 92.8% of all reads were aligned to the pig reference genome. Pig resequencing data with an average read depth of 11.7× were generated using the Illumina HiSeq2000 platform. A total of 18,440,443 SNPs were identified using the GATK.
Table 1

Summary of alignment rates in 24 pig samples

Samples S_6 and S_16 were excluded after DNA quality checks.

Y, Yorkshire; L, Landrace; D, Duroc.

Selective sweep detected by between-population methods

To detect signals of positive selection in the three pig breeds, two between-population methods (XPEHH and XPCLR) were implemented with phased haplotypes based on pairwise populations of Yorkshire-Landrace (Y-L), Yorkshire-Duroc (Y-D), Landrace-Yorkshire (L-Y), Landrace-Duroc (L-D), Duroc-Yorkshire (D-Y), and Duroc-Landrace (D-L). Here, for Y-L, Yorkshire was regarded as the observed population and Landrace as the reference population. XPEHH and XPCLR scores denote selection that happened in the observed population. As a result, several regions of positive selection were identified at a significance level of p-value < 0.01 [21] in XPEHH, and the top 1,000 genes in XPCLR. Table 2 summarizes the numbers of genes under significant positive selection, which were identified in the six breed pairs (Y-L, Y-D, L-Y, L-D, D-L, and D-Y). For example, for the Y-L breed pair, which detects selection that occurred in the Yorkshire population when compared to the Landrace population, 376 and 1,869 genes were under positive selection based on the XPEHH and XPCLR tests, respectively. For the Y-D breed pair, when using the XPCLR test, 1,498 genes were under positive selection with 3,978 SNPs detected in Yorkshire, with Duroc as the reference population. By the consensus set of genes under positive selection identified with both the XPEHH and XPCLR tests, 26, 7, and 14 genes were identified in Yorkshire, Landrace, and Duroc, respectively (Fig. 1). For example, the 26 Yorkshire genes identified by both the XPCLR and XPEHH tests in both the Y-L and Y-D comparisons, and so effectively detected four times.
Table 2

Summary of genes under significant positive selection

The single nucleotide polymorphisms under positive selection were extended by approximately ±10 kb.

Parentheses indicate the number of single nucleotide polymorphisms at the significance level.

XPEHH, cross-population extended haplotype homozygosity; XPCLR, cross-population composite likelihood ratio.

aBreed specific genes related to lactation, reproduction, meat quality, and growth traits in previous studies.

Fig. 1

Visualization of genome-wide positive selection. Manhattan plots of the degree of selection signals based on pairwise populations of three pig breeds using cross-population extended haplotype homozygosity (A) and cross-population composite likelihood ratio (B). Red lines indicate the significance level and colored dots indicate the genes under significant positive selection. Y-L, Yorkshire-Landrace; Y-D, Yorkshire-Duroc; L-D, Landrace-Duroc; L-Y, Landrace-Yorkshire; D-L, Duroc-Landrace; D-Y, Duroc-Yorkshire.

The biological functions of genes under positive selection

We performed functional enrichment analysis on the following genes under positive selection using DAVID ver. 6.7: ST6GALNAC2 and EPHX1 in Yorkshire; PARK2 in Landrace; and BMP6, SLA-DQA1, and PRKG1 in Duroc. The results revealed enrichment of Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway terms related to metabolism, diseases, genetic or environmental information processing, and organismal systems (Table 3). The gene BMP6 acts as a hedgehog signaling pathway; the hedgehog protein is an important regulator of slow oxidative fiber clustering and is distributed throughout the tissue in pigs. Previous research has reported that BMP6 plays an important role in meat quality [8]. Among the 26 positively selected genes in Yorkshire, the ST6GALNAC2 gene is expressed during lactation, contributing to immune-related functions [5]. Up-regulation of the EPHX1 (epoxide hydrolase 1) gene occurs during mammary gland involution. Nakamura et al. [6] suggested that EPHX1 is involved in the initial stage of involution. The SEMA5A (semaphoring 5A) gene for Yorkshire reported strong candidate gene which affects the milk production traits for economic traits in cattle. The estrogen receptor (ER) gene for Landrace reported that mice lacking ER α show severe reproduction [7]. This implies that the ER gene might be a positively selected lactation trait in pigs. The Gene Ontology (GO) terms included immune system process (SLA-DQA1) and metabolic process (PRKG1) for Duroc (Supplementary Table 2). Williams et al. [3] reported that a low level of chronic immune system resulted in greater growth in pigs. These results suggest that the 26, 7, and 14 population-specific genes with significant signals may result from potential positive selection for lactation in Yorkshire, reproduction in Landrace, and meat-quality and growth traits in Duroc.
Table 3

Enriched KEGG pathways associated with positively selected genes

KEGG, kyoto encyclopedia of genes and genomes.

SNP effects for population-specific genes

To understand how SNPs related positive selection affect protein function, we analyzed SNP effects using the SNPeffect database. We identified two genes, SLA-DQA1 and EPHX1, with nsSNPs showing evidence of positive selection in D-Y, respectively (Fig. 2, Supplementary Table 1). To obtain accurate results, Fisher's exact test was performed on these two genes. The total numbers of statistical tests performed on the non-synonymous genes were 20 × 3 × 13 (number of cases of amino acids × number of breeds × number of non-synonymous amino acid positions) for SLA-DQA1, and 20 × 3 × 4 for EPHX1. From the enrichment tests, we identified three positions (rs324509622, rs80931851, and rs80937718) that were significant under a Bonferroni-adjusted p-value of <0.05 with an alternative hypothesis: the odds ratio is greater than 1 in the SLA-DQA1 gene. The p-values of the three non-synonymous positions were 5.20e-05, 5.20e-05, and 7.43e-06, respectively. As a note, the above three significant enrichment results were observed only in the Duroc breed. The non-synonymous positions showed strong evidence for positive selection between D-Y based on the XPEHH test. As the result, we validated the Duroc-specific enrichment of amino acids affected by non-synonymous SNPs using statistical tests. This result will contribute to improving the meat quality of pigs.
Fig. 2

Visualization of single nucleotide polymorphism (SNP) effects on positively selected genes. (A) Visualization of the sequence logo for the SLA-DQA1 gene with 13 non-synonymous SNPs (nsSNPs) showing evidence of positive selection in Duroc. (B) Visualization of enrichment tests for the 13 nsSNPs. The red box indicates the significant positive selection regions under the Bonferroni adjusted p-value < 0.05.

Discussion

Identifying positive selection assists in the exploration of the genetic mechanisms of phenotypic diversity [2]. Selective sweep causes changes in the allele frequencies of SNPs, and the resulting changes in genetic variation can be useful in detecting selective pressure [9]. Although several studies have reported selective sweeps identified using chips such as the Illumina Porcine 60K SNP BeadChip [22] in pigs, few have identified selective sweeps using whole-genome sequencing. Therefore, detecting positive selection in the pig genome appears worthwhile. Here we implemented two representative between-population methods, XPEHH and XPCLR, to explore positive selection. XPEHH is designed to detect near fixation or fixed selective sweeps by comparing haplotypes between two populations [10]. XPCLR is more robust for SNP discovery bias than allele frequency-based methods [9]. XPEHH tests whether the SNP site is homozygous in one population and polymorphic in another by comparing the extended haplotype homozygosity score of two populations on a core SNP. A negative XPEHH value indicates that selection happened in the reference population [15]. XPCLR was designed as a multi-locus composite likelihood ratio method to avoid the bias influence in SNP discovery. XPEHH and XPCLR find one core SNP and grid window using multi-locus information, and thus these tests identify more significant selection regions than F or Tajima's D [23]. Identifying significant genome-wide selection regions with multiple loci using a single-marker F test is difficult. Therefore, for multiple SNP methods, we carried out XPEHH and XPCLR and excluded F and Tajima's D. XPEHH assumes an approximately normal distribution. Park et al. [16] determined empirical p-values using the top 1% of the significance, and our results show that these significance cutoffs are more reasonable. In addition, positive selection identified by multiple test methods is more convincing. Bioinformatic analysis of genes under positive selection in this study revealed interesting biological information related to lactation, reproduction, meat quality, and growth. We identified several GO terms using the DAVID functional annotation tool. Immune system process (GO:0002376) was found to be significant in Duroc. Williams et al. [3] reported that a low level of chronic immune system resulted in greater growth in pigs. This GO term suggests that genes with some growth traits may have been under selection in the domestication process. Growth traits are referred to as important indicators of the immune system. This suggests that SLA-DQA1 might play a role in the pig immune system. In addition, BMP6 and PRKG1 in Duroc had biological pathways related to meat-quality traits. Fonseca et al. [8] reported that the BMP6 gene plays a role in increasing the quality of meat juiciness and tenderness. Lee et al. [22] identified the significant SNPs located in the BMP6 gene related to meat quality through a genome-wide association study using the Porcine 60K BeadChip. Therefore, the SLA-DQA1 and BMP6 genes might be effects of positive selection for growth and meat quality, respectively, in Duroc. In Yorkshire, three genes under significant positive selection were ST6GALNAC2, EPHX1, and SEMA5A; these genes were related to lactation traits in several previous studies. The ST6GALNAC2 gene is expressed during lactation [5], EPHX1 is expressed in the early involution of mammary glands [6], and SEMA5A is reportedly related to milk production traits in cattle [21]. In Landrace, the ER gene was reported to be related to reproduction traits in previous studies. Generally ER refers to either ER α or ER β and estrogens are involved in differentiation and maintenance of reproductive tissue. Krege et al. [7] reported that mice lacking ER α exhibited severe reproductive problems. This implies that the ER gene could be an effect of positive selection for lactation traits in pigs as well as in mice. Our findings show that several significant SNPs were present in genes found to be related to lactation, reproduction, meat quality, and growth traits in previous studies. These results indirectly support the effects of positive selection. Unfortunately, information on many of the genes with significant signals has not yet been analyzed. Based on SNP analyses for positively selected genes, SLA-DQA1 and EPHX1 genes exhibited evidence of positive selection in D-Y, respectively. However, an enrichment test revealed that the three significant amino-acid positions in SLA-DQA1 and EPHX1 were not significant. Our analyses suggested that several genes–such as EPHX1, PARK2, BMP6, SLA-DQA1, PRKG1, and ST6GALNAC2–might have undergone positive selection. These genes are associated with lactation, reproduction, meat quality, and growth traits. All findings in this study provide useful data for the study of pigs, however our results are preliminary and need to validate these genes in another set of samples.
  21 in total

1.  The KEGG databases at GenomeNet.

Authors:  Minoru Kanehisa; Susumu Goto; Shuichi Kawashima; Akihiro Nakaya
Journal:  Nucleic Acids Res       Date:  2002-01-01       Impact factor: 16.971

2.  The Gene Ontology (GO) database and informatics resource.

Authors:  M A Harris; J Clark; A Ireland; J Lomax; M Ashburner; R Foulger; K Eilbeck; S Lewis; B Marshall; C Mungall; J Richter; G M Rubin; J A Blake; C Bult; M Dolan; H Drabkin; J T Eppig; D P Hill; L Ni; M Ringwald; R Balakrishnan; J M Cherry; K R Christie; M C Costanzo; S S Dwight; S Engel; D G Fisk; J E Hirschman; E L Hong; R S Nash; A Sethuraman; C L Theesfeld; D Botstein; K Dolinski; B Feierbach; T Berardini; S Mundodi; S Y Rhee; R Apweiler; D Barrell; E Camon; E Dimmer; V Lee; R Chisholm; P Gaudet; W Kibbe; R Kishore; E M Schwarz; P Sternberg; M Gwinn; L Hannick; J Wortman; M Berriman; V Wood; N de la Cruz; P Tonellato; P Jaiswal; T Seigfried; R White
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

3.  EXP-PAC: providing comparative analysis and storage of next generation gene expression data.

Authors:  Philip C Church; Andrzej Goscinski; Christophe Lefèvre
Journal:  Genomics       Date:  2012-05-15       Impact factor: 5.736

4.  The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

Authors:  Aaron McKenna; Matthew Hanna; Eric Banks; Andrey Sivachenko; Kristian Cibulskis; Andrew Kernytsky; Kiran Garimella; David Altshuler; Stacey Gabriel; Mark Daly; Mark A DePristo
Journal:  Genome Res       Date:  2010-07-19       Impact factor: 9.043

5.  Population differentiation as a test for selective sweeps.

Authors:  Hua Chen; Nick Patterson; David Reich
Journal:  Genome Res       Date:  2010-01-19       Impact factor: 9.043

6.  The Sequence Alignment/Map format and SAMtools.

Authors:  Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal:  Bioinformatics       Date:  2009-06-08       Impact factor: 6.937

7.  Slow fiber cluster pattern in pig longissimus thoracis muscle: implications for myogenesis.

Authors:  S Fonseca; I J Wilsons; G W Horgan; C A Maltin
Journal:  J Anim Sci       Date:  2003-04       Impact factor: 3.159

8.  Ancient DNA, pig domestication, and the spread of the Neolithic into Europe.

Authors:  Greger Larson; Umberto Albarella; Keith Dobney; Peter Rowley-Conwy; Jörg Schibler; Anne Tresset; Jean-Denis Vigne; Ceiridwen J Edwards; Angela Schlumbaum; Alexandru Dinu; Adrian Balaçsescu; Gaynor Dolman; Antonio Tagliacozzo; Ninna Manaseryan; Preston Miracle; Louise Van Wijngaarden-Bakker; Marco Masseti; Daniel G Bradley; Alan Cooper
Journal:  Proc Natl Acad Sci U S A       Date:  2007-09-13       Impact factor: 11.205

9.  SNPeffect: a database mapping molecular phenotypic effects of human non-synonymous coding SNPs.

Authors:  Joke Reumers; Joost Schymkowitz; Jesper Ferkinghoff-Borg; Francois Stricher; Luis Serrano; Frederic Rousseau
Journal:  Nucleic Acids Res       Date:  2005-01-01       Impact factor: 16.971

10.  Investigation of de novo unique differentially expressed genes related to evolution in exercise response during domestication in Thoroughbred race horses.

Authors:  Woncheoul Park; Jaemin Kim; Hyeon Jeong Kim; JaeYoung Choi; Jeong-Woong Park; Hyun-Woo Cho; Byeong-Woo Kim; Myung Hum Park; Teak-Soon Shin; Seong-Keun Cho; Jun-Kyu Park; Heebal Kim; Jae Yeon Hwang; Chang-Kyu Lee; Hak-Kyo Lee; Seoae Cho; Byung-Wook Cho
Journal:  PLoS One       Date:  2014-03-21       Impact factor: 3.240

View more
  1 in total

1.  Optimal sequencing depth design for whole genome re-sequencing in pigs.

Authors:  Yifan Jiang; Yao Jiang; Sheng Wang; Qin Zhang; Xiangdong Ding
Journal:  BMC Bioinformatics       Date:  2019-11-08       Impact factor: 3.169

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.