
In silico approaches to discover the functional impact of non-synonymous single nucleotide polymorphisms in selective sweep regions of the Landrace genome.

Donghyun Shin1, Kyung-Hye Won1, Ki-Duk Song1,2.   

Abstract

OBJECTIVE: The aim of this study was to discover the functional impact of non-synonymous single nucleotide polymorphisms (nsSNPs) that were found in selective sweep regions of the Landrace genome.
METHODS: Whole-genome re-sequencing data were obtained from 40 pigs, including 14 Landrace, 16 Yorkshire, and 10 wild boars, which were generated with the Illumina HiSeq 2000 platform. The nsSNPs in the selective sweep regions of the Landrace genome were identified, and the impacts of these variations on protein function were predicted to reveal their potential association with traits of the Landrace breed, such as reproductive capacity.
RESULTS: A total of 53,998 nsSNPs were identified in the mapped regions of the pig genomes, and among them, 345 nsSNPs were found in previously reported selective sweep regions of the Landrace genome. The genes featuring these nsSNPs fell into various functional categories, such as reproductive capacity or growth and development during the perinatal period. The impacts of amino acid sequence changes caused by nsSNPs on protein function were predicted using two in silico SNP prediction algorithms, i.e., sorting intolerant from tolerant and polymorphism phenotyping v2, to reveal their potential roles in biological processes that might be associated with the reproductive capacity of the Landrace breed.
CONCLUSION: The findings elucidated the domestication history of the Landrace breed and illustrated how Landrace domestication led to patterns of genetic variation related to superior reproductive capacity. Our novel findings will help understand the process of Landrace domestication at the genome level and provide SNPs that are informative for breeding.

Keywords:  Landrace; Next-generation Sequencing; Non-synonymous Single Nucleotide Polymorphism; Reproductive Capacity; Selective Sweep

Year:  2018        PMID: 29879810      PMCID: PMC6212746          DOI: 10.5713/ajas.18.0122

Source DB:  PubMed          Journal:  Asian-Australas J Anim Sci        ISSN: 1011-2367            Impact factor:   2.509


INTRODUCTION

Recently developed high-throughput and cost-effective genotyping techniques allow the thorough exploration of genetic variation in domestic animals. In particular, whole-genome sequencing is a powerful approach for detecting massive numbers of single nucleotide polymorphisms (SNPs) in genome-wide sequence data. One strategy for studying genetic variation is to detect selective sweep signatures based on patterns of linkage disequilibrium (LD) [1]; this approach was proposed by Smith and Haigh [2], and other researchers have since expanded and applied it [3-6]. Wang et al [7] performed a relative extended haplotype homozygosity (REHH) test to detect selective sweep regions of the Landrace genome using genotyping by genome sequencing. The genetic signature of selection for body size was investigated by estimating the XP-EHH statistic in the Yucatan miniature pig [8]. Whole-genome re-sequencing of Jeju black pigs (JBP) and Korean native pigs (which live on the Korean peninsula) was performed to identify signatures of positive selection in JBP, the true and pure Korean native pigs [9]. Studies of selective sweeps in pigs have revealed strong selection signatures associated with genes underlying economic traits such as body length, disease resistance, pork yield, muscle development, and fertility [10,11]. Diverse types of variants, e.g., copy number variations, insertions/deletions (InDels), and structural variations, have been identified in the selective sweep regions of the Landrace genome [7]. Unlike the many SNPs that are phenotypically neutral, non-synonymous SNPs (nsSNPs), which are located in protein-coding regions and lead to amino acid substitutions in the corresponding protein product, might have functional impacts and play a role in biological processes by altering protein structure, stability, or function; these variations are often strongly associated with phenotypes [12].
In the case of pigs, a previous study reported different polymorphic patterns of nsSNPs in the Toll-like receptor genes between European wild boars and domestic pigs [13]. In this study, we aimed to identify nsSNPs in the selective sweep regions of the Landrace genome that might be related to superior reproductive capacity or growth and development during the perinatal period, as well as gene networks that were enriched in the Landrace genome. Finally, the impact of amino acid changes caused by nsSNPs on protein function was investigated using in silico bioinformatic tools.

MATERIALS AND METHODS

Sample preparation and whole-genome re-sequencing

In this study, a whole-genome sequence data set consisting of 14 Landrace (Danish) pigs, 16 Yorkshire (Large White) pigs, and 10 wild boars was obtained from the NCBI Sequence Read Archive database (SRP047260). FastQC software [14] was used to perform a quality check on the raw sequence data. Using Trimmomatic-0.32 [15], potential adapter sequences were removed before sequence alignment. Paired-end sequence reads were mapped to the pig reference genome (Sscrofa 10.2.75) from the Ensembl database using Bowtie2 [16] with the default settings. For downstream processing and variant calling, the following software packages were used: Picard tools (http://broadinstitute.github.io/picard/), SAMtools [17], and Genome Analysis Toolkit (GATK) [18]. The Picard command-line tools “CreateSequenceDictionary” and “MarkDuplicates” were used to read reference FASTA sequences for writing bam files with only a sequence dictionary and to filter potential polymerase chain reaction duplicates, respectively. Using SAMtools, index files were created for the reference and bam files. Local realignment of sequence reads was performed to correct misalignment due to the presence of small insertions and deletions using the GATK “RealignerTargetCreator” and “IndelRealigner” tools. In addition, base quality score recalibration was performed to obtain accurate quality scores and to correct the variation in quality with machine cycle and sequence context. For calling variants, the GATK “UnifiedGenotyper” and “SelectVariants” tools were used with the following filtering criteria. To reduce false-positive calls, all variants with i) a Phred-scaled quality score of less than 30; ii) a read depth of less than 5; iii) MQ0 (total count across all samples of mapping quality zero reads) >4; or iv) a Phred-scaled p-value from Fisher’s exact test of more than 200 (an indicator of strand bias) were filtered out. The “vcf-merge” tool of VCFtools [19] was used to merge the variant call format (VCF) files of the 40 samples.
Additionally, tri-allelic SNPs were excluded, and all filtered SNPs on autosomes (a total of 26,240,429 SNPs) were annotated using the SNP annotation tool SnpEff version 4.1a and the Ensembl Sus scrofa gene set version 75 (Sscrofa10.2.75). A total of 53,998 nsSNPs (missense variants) were identified on autosomes from the 40 sets of pig whole-genome data (Figure 1). Then, SNPs with poor genotyping quality were removed: 4,174 SNPs were excluded based on Hardy-Weinberg equilibrium testing (p≤0.000001), and a further 19,002 SNPs with a minor allele frequency of <0.05 were excluded. After this genomic data quality control, 30,822 SNPs remained for downstream analysis.
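The four hard-filtering criteria listed above can be sketched as a single predicate. This is only an illustration of the thresholds stated in the text, not the GATK invocation used in the study; the `Variant` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical container for the per-variant annotations used below."""
    qual: float  # Phred-scaled variant quality score
    depth: int   # read depth
    mq0: int     # count of mapping-quality-zero reads across samples
    fs: float    # Phred-scaled p-value of Fisher's exact test (strand bias)


def passes_filters(v: Variant) -> bool:
    """Return True if the variant survives all four criteria from the text."""
    if v.qual < 30:   # i) quality score below 30
        return False
    if v.depth < 5:   # ii) read depth below 5
        return False
    if v.mq0 > 4:     # iii) more than 4 mapping-quality-zero reads
        return False
    if v.fs > 200:    # iv) strand-bias score above 200
        return False
    return True


# Illustrative (made-up) variants.
good = Variant(qual=50.0, depth=12, mq0=0, fs=3.2)
low_quality = Variant(qual=25.0, depth=12, mq0=0, fs=3.2)
```

Writing the thresholds as one function keeps each criterion visible next to its cut-off, which makes it easy to compare against the text.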
Figure 1

Functional classification of total single nucleotide polymorphisms (SNPs) from 40 pig whole-genome sequences (16 Yorkshire, 14 Landrace, and 10 wild boar). After SNP calling, all filtered SNPs (a total of 26,240,429 SNPs) were annotated using an SNP annotation tool, SnpEff version 4.1a (reference), and the Ensembl Sus scrofa gene set version 75 (Sscrofa10.2.75). Through SnpEff, we divided all SNPs into 31 functional classes containing non-synonymous SNPs (missense variants), as shown in this figure. The dotted line box in this figure indicates non-synonymous SNPs.
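The SNP-level quality control described above (Hardy-Weinberg equilibrium and minor allele frequency thresholds) can be sketched as follows. The text does not say which Hardy-Weinberg test was applied, so the one-degree-of-freedom chi-square test used here is an assumption, and the function names and genotype counts are hypothetical.

```python
import math


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1.0 - freq_a)


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square (1 df) p-value for deviation from Hardy-Weinberg equilibrium.

    Assumption: a chi-square test; the study may have used an exact test.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:  # monomorphic site: no deviation can be tested
        return 1.0
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def keep_snp(n_aa: int, n_ab: int, n_bb: int,
             maf_min: float = 0.05, hwe_alpha: float = 1e-6) -> bool:
    """Keep a SNP unless its HWE p-value is <= hwe_alpha or its MAF is < maf_min."""
    return (hwe_chi2_pvalue(n_aa, n_ab, n_bb) > hwe_alpha
            and minor_allele_frequency(n_aa, n_ab, n_bb) >= maf_min)


# Bookkeeping check using the counts reported in the text.
remaining = 53_998 - 4_174 - 19_002
```

The final line simply reproduces the reported arithmetic: removing 4,174 and 19,002 SNPs from 53,998 leaves the 30,822 SNPs used downstream.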

Population structure analysis

Population structure analysis was performed to infer the population structure of the 40 pigs with whole-genome sequence data. The program STRUCTURE (https://web.stanford.edu/group/pritchardlab/structure.html) was used to evaluate the extent of substructure among the 40 individuals belonging to three pig breeds. Bayesian clustering analysis implemented in STRUCTURE (version 2.3.4) was used to estimate the population structure using 30,822 nsSNPs from the whole-genome sequencing data of the 40 pigs [20]. An initial burn-in of 10,000 iterations followed by 10,000 iterations for parameter estimation was sufficient to ensure the convergence of parameter estimates. The dataset was analyzed with the number of populations (the K parameter of STRUCTURE) set to K = 3 (Figure 2).
Figure 2

Population structure analysis using STRUCTURE. Each individual is represented by a vertical bar, and the length of each colored segment in each of the vertical bars represents the proportion contributed by ancestral populations (K = 3).

Identification of nsSNPs in Landrace selective sweep regions

A previous study identified 269 selective sweep regions of the Landrace genome using the REHH test (p-value≤0.01), which detects recent positive selection signatures by evaluating how LD decays across the genome [7]. A total of 261 of the 269 selective sweep regions were on autosomes, and 345 nsSNPs belonging to 55 Landrace selective sweep regions were identified (Figure 3). Overall, the 345 nsSNPs in the 55 selective sweep regions of the Landrace genome belonged to 90 genes, and gene functions were available for 64 of the 90 genes. Gene ontology (GO) network analysis was performed using ClueGO [21] to infer the biological meaning of the genes related to nsSNPs in Landrace selective sweep regions.
Figure 3

Genotypes of 345 non-synonymous single nucleotide polymorphisms (SNPs) in Landrace selective sweep regions. The genotype patterns of 345 non-synonymous SNPs in the selective sweep regions of the Landrace genome are represented by a heat map. The colors of the boxes represent the genotypes of each of the 40 individuals from the whole-genome sequencing data. Dark blue indicates that the genotypes of both the alleles were the same as that of the minor allele. Blue boxes indicate that one of the two alleles was the same as the minor allele and the other was the same as the major allele. Sky blue means that the genotypes of both alleles were the same as that of the major allele. The left side of the figure shows a list of each SNP name, which consists of the chromosome, position, and minor allele type. The gray box at the bottom of the figure indicates the three breeds.
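Assigning nsSNPs to selective sweep regions amounts to an interval lookup per chromosome. The sketch below is one possible way to do this, assuming the regions on a chromosome do not overlap; the function names are hypothetical, and the three regions and two SNP positions are taken from Tables 1 and 4 purely as examples.

```python
from bisect import bisect_right


def build_index(regions):
    """Group (chrom, start, end) regions by chromosome and sort them by start."""
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    return by_chrom


def find_region(index, chrom, pos):
    """Return the (chrom, start, end) region containing pos, or None.

    Assumes regions on the same chromosome do not overlap.
    """
    intervals = index.get(chrom, [])
    # Last interval whose start is <= pos.
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    if i >= 0 and intervals[i][0] <= pos <= intervals[i][1]:
        return (chrom, intervals[i][0], intervals[i][1])
    return None


# Three sweep regions as listed in Table 1.
SWEEPS = build_index([
    ("6", 43_719_388, 43_757_067),
    ("9", 11_120_076, 11_136_889),
    ("9", 11_449_284, 11_760_977),
])

dmbt1_hit = find_region(SWEEPS, "6", 43_729_346)   # SNP position from Table 4
mogat2_hit = find_region(SWEEPS, "9", 11_129_485)  # SNP position from Table 4
miss = find_region(SWEEPS, "9", 11_200_000)        # made-up position between regions
```

Sorting once and using binary search keeps each lookup logarithmic in the number of regions, which matters little for 269 regions but scales to millions of SNP positions.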

Predicting damaging amino acid substitutions of non-synonymous SNPs specific to the Landrace breed

In this study, the functional effects of nsSNPs were predicted using the following in silico algorithms: sorting intolerant from tolerant (SIFT) [22] and polymorphism phenotyping v2 (PolyPhen-2) [23]. A total of 345 nsSNPs in 55 selective sweep regions of the Landrace genome were analyzed using SIFT. If the SIFT score of an SNP was less than 0.05, the SNP was regarded as deleterious, i.e., as potentially having a strong effect on protein function. NsSNPs with a SIFT score of less than 0.05 were used for PolyPhen-2 version 2.2.2 (http://genetics.bwh.harvard.edu/pph2/) analysis, which predicts the influence of an amino acid change on the structure and function of a protein by using specific empirical rules [23]. Based on their PolyPhen-2 scores (ranging from 0 to 1), nsSNPs were classified as “probably damaging” (score of more than 0.95), “possibly damaging” (score between 0.5 and 0.95), or “benign” (score below 0.5). In this study, probably damaging and possibly damaging SNPs were judged to have strong effects on protein function. Amino acid sequences corresponding to the nsSNPs of interest were obtained from the Ensembl database to perform the PolyPhen-2 analysis.
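The score thresholds described above can be written as a small classifier. This is a minimal sketch of the decision rule only; it does not call SIFT or PolyPhen-2, and the function names are hypothetical. A score of exactly 0.5 or 0.95 is treated as “possibly damaging”, which is one reading of “between 0.5 and 0.95”.

```python
def sift_call(score: float) -> str:
    """SIFT scores below 0.05 are regarded as deleterious."""
    return "deleterious" if score < 0.05 else "tolerated"


def polyphen_call(score: float) -> str:
    """Classify a PolyPhen-2 score (0 to 1) with the thresholds from the text."""
    if score > 0.95:
        return "probably damaging"
    if score >= 0.5:
        return "possibly damaging"
    return "benign"


def has_strong_effect(sift_score: float, polyphen_score: float) -> bool:
    """An nsSNP has a strong effect if SIFT calls it deleterious and
    PolyPhen-2 calls it probably or possibly damaging."""
    return (sift_call(sift_score) == "deleterious"
            and polyphen_call(polyphen_score) != "benign")
```

For example, an nsSNP with a SIFT score of 0 and a PolyPhen-2 score of 0.997 is classified as having a strong effect, whereas one with a SIFT score of 0.2 is not, whatever its PolyPhen-2 score.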

RESULTS

DNA sequencing, data preprocessing, and genetic variant calling

A total of 26,240,429 SNPs on autosomes were extracted from the whole-genome sequences of the 40 pigs, including 14 Landrace individuals, and all extracted SNPs were annotated using SnpEff version 4.1a (http://snpeff.sourceforge.net/SnpSift.html) [24]. Through this SNP annotation, all SNPs were divided into 31 functional classes, including nsSNPs (Figure 1). Most of the SNPs were located in intergenic or intronic regions; we identified 53,998 nsSNPs (0.205% of the total SNPs). After quality control of all of the nsSNPs, 30,822 nsSNPs remained. Population structure analysis using the genotypic information on these SNPs provided the genetic relationship among breeds. The results of the population structure analysis clearly distinguished Landrace, Yorkshire, and wild boar (Figure 2).

nsSNPs in Landrace selective sweep regions

A total of 269 selective sweep regions were obtained from a previous study on the Landrace breed [7], and a total of 345 nsSNPs were identified in 55 of these Landrace selective sweep regions (Figure 3) by re-analyzing the resequencing data of Landrace and Yorkshire pigs from that study [7]. Information on the 345 nsSNPs in the selective sweep regions of the Landrace genome, which belonged to 90 genes, is shown in Table 1. The average number of nsSNPs per gene was 3.83, and gene length was not correlated with the number of nsSNPs (Figure 4). The deleted in malignant brain tumors 1 (DMBT1) gene consisted of 18 exons harboring 26 nsSNPs that were evenly distributed; this gene had the highest number of nsSNPs among the 90 genes. Moreover, there were considerable frequency differences between Landrace and the other breeds (Yorkshire and wild boar) in the nsSNPs of the DMBT1 gene (Figure 5). This suggests that DMBT1 was substantially affected by nsSNPs during the establishment of the Landrace breed. Previous studies strongly suggested an important role of DMBT1 in the process of fertilization in pigs; it was shown to be secreted in the oviduct and involved in the mechanism of fertilization in porcine species [25,26]. In particular, Ambruosi et al [25] reported that oviduct fluid containing DMBT1 protein was strongly related to the preparation of gametes for fertilization, fertilization itself, and subsequent embryonic development. Therefore, we assumed that the nsSNPs of DMBT1 in Landrace might correlate with the fertilization capacity that was acquired during artificial selection, making the reproductive capacity of Landrace pigs superior to that of other breeds [27].
Table 1

Gene list containing non-synonymous SNPs in Landrace selective sweep regions

Gene name | CHR | Gene start | Gene end | # ns SNP | Selective sweep region
PLG | 1 | 8,739,981 | 8,787,582 | 8 | 1:8670943–8797806
MELK | 1 | 265,175,024 | 265,288,283 | 1 | 1:265063188–265212930
ZFPL1 | 2 | 6,231,271 | 6,235,566 | 1 | 2:6227731–6239068
ENSSSCG00000021162 | 2 | 15,576,680 | 15,577,609 | 3 | 2:15569156–15593980
FAM180B | 2 | 16,204,579 | 16,206,256 | 3 | 2:16111708–16299440
ENSSSCG00000025219 | 2 | 62,507,452 | 62,508,408 | 1 | 2:62355986–62756249
ENSSSCG00000013821 | 2 | 62,624,616 | 62,625,548 | 14 |
ENSSSCG00000013822 | 2 | 62,644,870 | 62,645,796 | 14 |
ENSSSCG00000013819 | 2 | 62,669,703 | 62,670,662 | 8 |
MCOLN1 | 2 | 72,056,664 | 72,151,713 | 1 | 2:72143419–72172550
ENSSSCG00000014078 | 2 | 85,731,838 | 85,732,242 | 4 | 2:85467258–86506548
ANKRD31 | 2 | 85,774,886 | 85,807,199 | 2 |
ANKDD1B | 2 | 86,257,325 | 86,321,705 | 3 |
SDK1 | 3 | 3,634,288 | 3,824,252 | 3 | 3:3730382–3773007
PLA2G6 | 5 | 6,996,414 | 7,059,756 | 1 | 5:6988526–7058468
BIN2 | 5 | 17,315,117 | 17,339,457 | 1 | 5:17248525–17487183
TAC3 | 5 | 24,048,553 | 24,056,427 | 3 | 5:23288996–24074802
ZBTB39 | 5 | 24,066,660 | 24,068,784 | 4 |
NCAPD2 | 5 | 66,432,584 | 66,443,844 | 1 | 5:66396846–66725591
VAMP1 | 5 | 66,646,135 | 66,647,743 | 1 |
TAPBPL | 5 | 66,647,211 | 66,658,624 | 2 |
DMBT1 | 6 | 43,728,925 | 43,753,137 | 26 | 6:43719388–43757067
ENSSSCG00000027618 | 6 | 119,199,612 | 119,199,920 | 3 | 6:119198939–119344591
MCOLN2 | 6 | 119,212,826 | 119,273,364 | 1 |
PCNX1 | 7 | 100,745,867 | 100,862,081 | 2 | 7:100703442–100775415
PLD4 | 7 | 131,340,863 | 131,347,987 | 3 | 7:131291714–131388688
ENSSSCG00000002551 | 7 | 131,356,311 | 131,359,461 | 5 |
ATP8A1 | 8 | 35,180,992 | 35,309,867 | 2 | 8:34998191–35275833
ENSSSCG00000027999 | 9 | 2,277,256 | 2,278,264 | 7 | 9:2223331–2577505
OVCH2 | 9 | 2,307,953 | 2,321,197 | 10 |
ENSSSCG00000025898 | 9 | 2,361,209 | 2,362,147 | 5 |
ENSSSCG00000023477 | 9 | 2,370,889 | 2,371,830 | 12 |
ENSSSCG00000029634 | 9 | 2,455,370 | 2,528,783 | 1 |
TRIM3 | 9 | 3,923,986 | 3,940,046 | 1 | 9:3927497–3978728
HPX | 9 | 3,946,381 | 3,955,253 | 4 |
SMPD1 | 9 | 3,961,589 | 3,964,504 | 1 |
MOGAT2 | 9 | 11,119,062 | 11,132,962 | 11 | 9:11120076–11136889
THAP12 | 9 | 11,652,415 | 11,669,844 | 2 | 9:11449284–11760977
GAB2 | 9 | 13,936,307 | 14,135,685 | 1 | 9:13934282–14030509
ELMOD1 | 9 | 40,189,956 | 40,282,814 | 1 | 9:40189621–40286365
ATM | 9 | 40,925,895 | 40,945,439 | 3 | 9:40793693–41170478
KDELC2 | 9 | 41,043,564 | 41,065,077 | 7 |
EXPH5 | 9 | 41,073,546 | 41,217,329 | 12 |
ENSSSCG00000023913 | 9 | 41,145,017 | 41,152,176 | 3 |
ARHGAP20 | 9 | 43,174,648 | 43,222,583 | 1 | 9:43134418–43291918
ENSSSCG00000015184 | 9 | 56,925,449 | 56,927,199 | 4 | 9:56869539–57122277
ENSSSCG00000026119 | 9 | 56,962,203 | 56,963,135 | 7 |
ENSSSCG00000015182 | 9 | 56,971,208 | 56,972,140 | 5 |
ENSSSCG00000028463 | 9 | 56,980,334 | 56,981,572 | 5 |
ENSSSCG00000024117 | 9 | 57,283,042 | 57,284,501 | 5 | 9:57230656–57379772
ENSSSCG00000024455 | 9 | 57,293,941 | 57,296,806 | 1 |
DMTF1 | 9 | 102,893,256 | 102,929,921 | 1 | 9:102847568–103896296
DENND1B | 10 | 25,096,498 | 25,193,569 | 1 | 10:25139986–25249094
ENSSSCG00000010907 | 10 | 26,249,079 | 26,284,300 | 17 | 10:26197521–26710943
PTPRC | 10 | 26,308,759 | 26,332,284 | 2 | 10:26197521–26710943
KIAA1462 | 10 | 45,386,450 | 45,428,443 | 4 | 10:45403837–45436342
GJD4 | 10 | 63,677,681 | 63,683,060 | 2 | 10:63669866–63725092
ENSSSCG00000021829 | 11 | 11,141,413 | 11,236,840 | 1 | 11:10400737–11376721
ENSSSCG00000020699 | 11 | 11,355,261 | 11,378,042 | 1 |
CCDC168 | 11 | 78,361,372 | 78,368,847 | 22 | 11:78318648–78678168
DNAI2 | 12 | 6,779,152 | 6,799,278 | 3 | 12:6771152–6805468
MARCH10 | 12 | 15,897,681 | 15,944,341 | 10 | 12:15890650–15938045
MAPT | 12 | 17,123,471 | 17,172,747 | 2 | 12:16937097–17191735
CCL23 | 12 | 41,160,877 | 41,165,234 | 3 | 12:41158920–41165901
CCL1 | 12 | 42,467,618 | 42,471,014 | 3 | 12:42468535–42621081
ENSSSCG00000017834 | 12 | 50,542,085 | 50,552,985 | 1 | 12:50535159–50581774
SHPK | 12 | 51,572,871 | 51,592,551 | 1 | 12:51579885–51586595
SPNS3 | 12 | 52,389,071 | 52,445,090 | 1 | 12:52401285–52444137
CCDC66 | 13 | 42,284,163 | 42,341,496 | 2 | 13:41196871–42465605
NOC4L | 14 | 24,724,492 | 24,730,021 | 2 | 14:24592939–24779049
DDX51 | 14 | 24,730,045 | 24,732,878 | 2 |
EP400 | 14 | 24,748,336 | 24,847,567 | 2 |
ENSSSCG00000010013 | 14 | 50,652,381 | 50,652,947 | 1 | 14:50647172–50719083
OSBP2 | 14 | 50,669,019 | 50,849,290 | 2 |
KIF20B | 14 | 110,499,118 | 110,581,337 | 1 | 14:110280822–110542445
FGFR1IIIC | 15 | 55,215,592 | 55,269,381 | 1 | 15:55142754–55608192
LETM2 | 15 | 55,274,276 | 55,294,333 | 1 |
WHSC1L1 | 15 | 55,338,007 | 55,406,429 | 2 |
DDHD2 | 15 | 55,414,565 | 55,455,195 | 1 |
ASH2L | 15 | 55,512,104 | 55,552,504 | 1 |
ENSSSCG00000029683 | 15 | 128,593,493 | 128,594,377 | 6 | 15:128498493–128627886
CWC27 | 16 | 46,572,512 | 46,875,541 | 2 | 16:46472193–46771773
CD93 | 17 | 34,381,626 | 34,384,902 | 2 | 17:34206246–34400408
GZF1 | 17 | 34,441,517 | 34,447,221 | 3 | 17:34421087–34505222
NAPB | 17 | 34,450,368 | 34,485,152 | 1 |
CSTL1 | 17 | 34,492,910 | 34,496,585 | 2 |
CST7 | 17 | 34,906,655 | 34,915,135 | 1 | 17:34901568–34908632
DEFB119 | 17 | 39,921,302 | 39,931,655 | 2 | 17:39862221–40018288
DEFB116 | 17 | 39,996,662 | 39,999,076 | 1 |
ENSSSCG00000007337 | 17 | 46,357,154 | 46,401,936 | 2 | 17:46275105–46424519

SNPs, single nucleotide polymorphisms; nsSNPs, non-synonymous SNPs.

This table lists the genes containing non-synonymous SNPs. The fifth column gives the number of non-synonymous SNPs in each gene, and the sixth column gives the selective sweep region of the Landrace genome, named by chromosome, start position, and end position.

Figure 4

Correlation between length and number of single nucleotide polymorphisms (SNPs) in genes related to non-synonymous SNPs (nsSNPs) in Landrace selective sweep regions.
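The relationship between gene length and nsSNP count can be checked with a Pearson correlation coefficient. The sketch below uses only five rows taken from Table 1 to show the calculation, so the resulting coefficient is illustrative and is not the value underlying Figure 4; the choice of Pearson correlation is also an assumption, since the text does not name the statistic.

```python
import math

# Subset of Table 1: (gene, gene start, gene end, number of nsSNPs).
GENES = [
    ("DMBT1", 43_728_925, 43_753_137, 26),
    ("CCDC168", 78_361_372, 78_368_847, 22),
    ("MOGAT2", 11_119_062, 11_132_962, 11),
    ("SDK1", 3_634_288, 3_824_252, 3),
    ("CWC27", 46_572_512, 46_875_541, 2),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


lengths = [end - start for _, start, end, _ in GENES]
counts = [n for *_, n in GENES]
r = pearson(lengths, counts)

# Average number of nsSNPs per gene from the totals reported in the text.
mean_per_gene = 345 / 90
```

The last line reproduces the reported average of 3.83 nsSNPs per gene from the totals of 345 nsSNPs and 90 genes.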

Figure 5

Frequency difference of non-synonymous single nucleotide polymorphisms (nsSNPs) in deleted in malignant brain tumors 1 genes between Landrace and other breeds (Yorkshire and wild boar).

Among the 90 genes, the functions of 64 genes were predicted, and we performed GO network analysis of these 64 genes using ClueGO [21] to draw inferences on the biological effects of nsSNPs in Landrace selective sweep regions. The information on these networks is shown in Figure 6 and Table 2. The GO network analysis revealed that 19 of the total of 64 genes were associated with five major GO terms, and these major terms were closely related to the reproductive capacity or growth and development of the Landrace breed during the perinatal period. In the GO network, seven genes (C-C motif chemokine ligand 1 [CCL1], CCL23, hemopexin, mucolipin 1, leucine zipper and EF-hand containing transmembrane protein 2, phospholipase A2 group VI [PLA2G6], and protein tyrosine phosphatase, receptor type, C [PTPRC]) were related to cellular metal ion homeostasis in seven major GO terms, and this cluster was the largest in this network. Moreover, these terms were similar to the GO results of a positively selected region identified in Wang’s study of Landrace selective sweeps [7]. Metal ions are one major group of minerals; since components of follicular fluid such as Ca, Cu, and Fe significantly increase as the follicles increase in size, some minerals appear to play an important role in pig reproduction [28]. Five genes (ATPase phospholipid transporting 8A1 [ATP8A1], CCL1, kinesin family member 20B, plasminogen, and PTPRC) were shown to be involved in the positive regulation of locomotion, and this network consisted of four GO terms (positive regulation of locomotion, positive regulation of cellular component movement, positive regulation of cell motility, and positive regulation of cell migration). Cellular movement is a central process in the development and maintenance of multicellular organisms. In addition, tissue formation during embryonic development requires the orchestrated movement of cells in a particular direction.
It is reasonable to assume that several genes of these four significant GO terms in the selective sweep regions of the Landrace genome might be related to the superior growth and development of Landrace during the perinatal period. Ten genes (ATP8A1, bridging integrator 2, CD93 molecule [CD93], exophilin 5, GRB2 associated binding protein 2, N-ethylmaleimide-sensitive factor attachment protein, beta, PLA2G6, plasminogen, PTPRC, and vesicle associated membrane protein 1 [VAMP1]) were associated with exocytosis, and five genes (ATP8A1, CD93, DMBT1, PTPRC, and VAMP1) were classified under the secretory granule membrane term in the GO network. The acrosome contains a single secretory granule and is located in the head of mammalian sperm; secretion from this granule is an absolute requirement for fertilization [29]. Acrosome exocytosis is a synchronized and tightly regulated all-or-nothing process, which provides a unique model for studying the multiple steps of the membrane fusion cascade [29]. Therefore, we assumed that these genes containing nsSNPs in the selective sweep regions, which are related to exocytosis and the secretory granule membrane, might have been influenced by artificial selection, considering the distinctive reproductive capacity of the Landrace breed [27].
Figure 6

Gene ontology (GO) network analysis of genes related to non-synonymous single nucleotide polymorphisms (SNPs) in Landrace selective sweep regions. Significant results of GO analysis using genes related to non-synonymous SNPs in the selective sweep regions of the Landrace genome with our criteria in ClueGO packages of Cytoscape (number of genes = 4, sharing group percentage = 40.0). These results are largely divided into eight clusters as follows.

Table 2

Information of gene ontology (GO) network analysis of genes related to non-synonymous SNPs in Landrace selective sweep regions

GO ID | GO Term | Term p-value | Group p-value | #Genes | Associated genes found
GO:0002274 | Myeloid leukocyte activation | 0.005 | 0.005 | 7 | ATP8A1, BIN2, CD93, GAB2, MAPT, PTPRC, SHPK
GO:0006887 | Exocytosis | 0.001 | 0.001 | 10 | ATP8A1, BIN2, CD93, EXPH5, GAB2, NAPB, PLA2G6, PLG, PTPRC, VAMP1
GO:0030667 | Secretory granule membrane | 0.003 | 0.003 | 5 | ATP8A1, CD93, DMBT1, PTPRC, VAMP1
GO:0040017 | Positive regulation of locomotion | 0.016 | 0.017 | 5 | ATP8A1, CCL1, KIF20B, PLG, PTPRC
GO:0051272 | Positive regulation of cellular component movement | 0.013 | | 5 | ATP8A1, CCL1, KIF20B, PLG, PTPRC
GO:2000147 | Positive regulation of cell motility | 0.012 | | 5 | ATP8A1, CCL1, KIF20B, PLG, PTPRC
GO:0030335 | Positive regulation of cell migration | 0.010 | | 5 | ATP8A1, CCL1, KIF20B, PLG, PTPRC
GO:0006873 | Cellular ion homeostasis | 0.003 | 0.006 | 7 | CCL1, CCL23, HPX, LETM2, MCOLN1, PLA2G6, PTPRC
GO:0055080 | Cation homeostasis | 0.005 | | 7 | CCL1, CCL23, HPX, LETM2, MCOLN1, PLA2G6, PTPRC
GO:0030003 | Cellular cation homeostasis | 0.003 | | 7 | CCL1, CCL23, HPX, LETM2, MCOLN1, PLA2G6, PTPRC
GO:0055065 | Metal ion homeostasis | 0.003 | | 7 | CCL1, CCL23, HPX, LETM2, MCOLN1, PLA2G6, PTPRC
GO:0072507 | Divalent inorganic cation homeostasis | 0.015 | | 5 | CCL1, CCL23, MCOLN1, PLA2G6, PTPRC
GO:0006875 | Cellular metal ion homeostasis | 0.001 | | 7 | CCL1, CCL23, HPX, LETM2, MCOLN1, PLA2G6, PTPRC
GO:0072503 | Cellular divalent inorganic cation homeostasis | 0.013 | | 5 | CCL1, CCL23, MCOLN1, PLA2G6, PTPRC
GO:0055074 | Calcium ion homeostasis | 0.011 | | 5 | CCL1, CCL23, MCOLN1, PLA2G6, PTPRC
GO:0006874 | Cellular calcium ion homeostasis | 0.010 | | 5 | CCL1, CCL23, MCOLN1, PLA2G6, PTPRC

SNPs, single nucleotide polymorphisms.

Significant results of GO analysis using genes related to non-synonymous SNPs in the selective sweep regions of the Landrace genome with our criteria in ClueGO packages of Cytoscape (number of genes = 4, sharing group percentage = 40.0). These results are largely divided into eight clusters as follows.
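Term p-values in GO enrichment analyses are commonly obtained from a hypergeometric test. The sketch below shows that calculation as an assumption about the general approach; the text does not state which test or multiple-testing correction was configured in ClueGO. The background size and the number of annotated background genes are made-up values, while the 64 analyzed genes and the 10 exocytosis genes come from the text and Table 2.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(n, K)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)


# 10 of the 64 analyzed genes carry the exocytosis term (Table 2).
# The background (20,000 genes, 300 annotated) is purely hypothetical.
p_exocytosis = hypergeom_upper_tail(k=10, n=64, K=300, N=20_000)
```

Using exact integer binomial coefficients avoids the rounding problems that arise when very small tail probabilities are computed with floating-point factorials.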

Predicting strong effects of nsSNPs on amino acid substitutions in Landrace selective sweep region

Two in silico SNP prediction algorithms, SIFT [22] and PolyPhen-2 [23], were applied to estimate the possible effects of the stabilizing residues on protein functions for the 345 nsSNPs in Landrace selective sweep regions. The results of SIFT and PolyPhen-2 for the 345 non-synonymous SNPs are shown in Tables 3 and 4.
Table 3

Summary of non-synonymous single amino acid variation in genes of Landrace selective sweep using SIFT and Polyphen-2

SIFT prediction | Polyphen-2: Benign | Polyphen-2: Possibly damaging | Polyphen-2: Probably damaging | Total
Deleterious | 29 | 19 | 27 | 75
Tolerated | 234 | 21 | 15 | 270
Total | 263 | 40 | 42 | 345

SIFT, sorting intolerant from tolerant; Polyphen-2, polymorphism phenotyping v2.
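The totals in Table 3 can be recomputed from its six inner cells. This short check is a sketch for readers who want to verify the cross-tabulation; the cell counts are those reported in Table 3, and the variable names are hypothetical.

```python
# Inner cells of Table 3: SIFT prediction (rows) by Polyphen-2 prediction (columns).
TABLE3 = {
    "deleterious": {"benign": 29, "possibly damaging": 19, "probably damaging": 27},
    "tolerated": {"benign": 234, "possibly damaging": 21, "probably damaging": 15},
}

# Row totals (SIFT) and column totals (Polyphen-2).
row_totals = {sift: sum(cells.values()) for sift, cells in TABLE3.items()}
col_totals = {
    call: sum(TABLE3[sift][call] for sift in TABLE3)
    for call in TABLE3["deleterious"]
}

# nsSNPs that Polyphen-2 calls possibly or probably damaging.
polyphen_damaging = col_totals["possibly damaging"] + col_totals["probably damaging"]

# nsSNPs called deleterious by SIFT and damaging by Polyphen-2.
both_tools = (TABLE3["deleterious"]["possibly damaging"]
              + TABLE3["deleterious"]["probably damaging"])
```

The two derived values correspond to the 82 variants flagged by Polyphen-2 and the 46 nsSNPs flagged by both tools.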

Table 4

Forty-six non-synonymous SNPs with strong effects on protein functions based on SIFT and Polyphen-2

SNP | CHR | POS | A1 | A2 | SIFT prediction | SIFT score | Polyphen-2 prediction | Polyphen-2 score | Gene | Selective sweep
rs328613228 | 2 | 16,206,079 | T | G | deleterious | 0 | probably damaging | 0.997 | FAM180B | 2:16111708:16299440
2:62624837 | 2 | 62,624,837 | G | A | deleterious | 0.017 | possibly damaging | 0.853 | ENSSSCG00000013821 | 2:62355986:62756249
rs340857214 | 2 | 62,625,107 | G | A | deleterious | 0.021 | possibly damaging | 0.539 | |
2:62625190 | 2 | 62,625,190 | A | T | deleterious | 0.028 | possibly damaging | 0.934 | |
rs335820735 | 2 | 62,644,986 | A | T | deleterious | 0.008 | probably damaging | 0.999 | ENSSSCG00000013822 |
rs343007761 | 2 | 62,645,014 | T | G | deleterious | 0.018 | possibly damaging | 0.506 | |
2:62645060 | 2 | 62,645,060 | A | G | deleterious | 0.012 | possibly damaging | 0.604 | |
rs325197977 | 2 | 62,645,081 | A | G | deleterious | 0 | possibly damaging | 0.934 | |
2:62669920 | 2 | 62,669,920 | G | A | deleterious | 0.008 | possibly damaging | 0.934 | ENSSSCG00000013819 |
2:62669953 | 2 | 62,669,953 | T | G | deleterious | 0.007 | possibly damaging | 0.934 | |
2:62670031 | 2 | 62,670,031 | G | A | deleterious | 0.012 | probably damaging | 0.999 | |
rs342394815 | 2 | 85,732,226 | T | C | deleterious | 0.002 | probably damaging | 0.999 | ENSSSCG00000014078 | 2:85467258:86506548
rs337260402 | 2 | 85,732,237 | T | G | deleterious | 0.003 | probably damaging | 0.97 | |
rs326720643 | 2 | 85,775,718 | A | G | deleterious | 0.007 | probably damaging | 0.984 | ANKRD31 |
rs318473425 | 2 | 86,321,677 | T | A | deleterious | 0.033 | probably damaging | 0.995 | ANKDD1B |
rs329106718 | 5 | 66,654,214 | C | T | deleterious | 0 | probably damaging | 0.993 | TAPBPL | 5:66396846:66725591
rs326638161 | 6 | 43,729,346 | T | C | deleterious | 0.007 | probably damaging | 0.988 | DMBT1 | 6:43719388:43757067
rs322198139 | 6 | 43,750,820 | G | T | deleterious | 0.017 | possibly damaging | 0.915 | |
rs321057648 | 6 | 43,750,963 | A | G | deleterious | 0.009 | possibly damaging | 0.663 | |
6:119199835 | 6 | 119,199,835 | T | A | deleterious | 0.006 | probably damaging | 0.998 | ENSSSCG00000027618 | 6:119198939:119344591
rs327779736 | 8 | 35,181,016 | A | T | deleterious | 0 | possibly damaging | 0.944 | ATP8A1 | 8:34998191:35275833
rs81399633 | 8 | 35,181,037 | A | G | deleterious | 0.023 | possibly damaging | 0.896 | |
rs343636299 | 9 | 2,311,094 | T | C | deleterious | 0.042 | probably damaging | 1 | OVCH2 | 9:2223331:2577505
rs318298009 | 9 | 3,930,944 | T | A | deleterious | 0.006 | probably damaging | 0.996 | TRIM3 | 9:3927497:3978728
9:11129485 | 9 | 11,129,485 | T | G | deleterious | 0.035 | probably damaging | 0.995 | MOGAT2 | 9:11120076:11136889
rs340556206 | 9 | 11,129,936 | T | C | deleterious | 0.013 | probably damaging | 0.999 | |
rs81509118 | 9 | 11,130,742 | A | G | deleterious | 0.036 | probably damaging | 1 | |
rs342457070 | 9 | 11,130,778 | C | A | deleterious | 0.005 | probably damaging | 0.991 | |
rs327337551 | 9 | 11,130,783 | G | C | deleterious | 0.047 | possibly damaging | 0.697 | |
rs338381437 | 9 | 11,666,878 | G | A | deleterious | 0.003 | probably damaging | 0.983 | THAP12 | 9:11449284:11760977
rs81214615 | 9 | 41,047,573 | T | A | deleterious | 0.024 | probably damaging | 0.99 | KDELC2 | 9:40793693:41170478
rs339385194 | 9 | 41,076,701 | G | T | deleterious | 0.04 | probably damaging | 0.999 | EXPH5 |
9:56962342 | 9 | 56,962,342 | A | G | deleterious | 0.028 | possibly damaging | 0.616 | ENSSSCG00000026119 | 9:56869539:57122277
9:56962578 | 9 | 56,962,578 | A | C | deleterious | 0.026 | probably damaging | 0.994 | |
rs328160175 | 9 | 56,971,732 | G | A | deleterious | 0.016 | probably damaging | 0.994 | ENSSSCG00000015182 |
rs335643554 | 9 | 56,980,378 | C | T | deleterious | 0.032 | possibly damaging | 0.539 | ENSSSCG00000028463 |
rs331490061 | 9 | 56,981,034 | A | G | deleterious | 0.004 | possibly damaging | 0.927 | |
rs326014276 | 10 | 63,681,709 | G | C | deleterious | 0.037 | possibly damaging | 0.944 | GJD4 | 10:63669866:63725092
rs339353031 | 11 | 78,365,823 | G | A | deleterious | 0.008 | probably damaging | 0.983 | CCDC168 | 11:78318648:78678168
11:78367889 | 11 | 78,367,889 | G | A | deleterious | 0 | probably damaging | 0.993 | |
rs342686832 | 11 | 78,367,955 | A | G | deleterious | 0.034 | possibly damaging | 0.94 | |
rs325650226 | 12 | 15,917,860 | T | C | deleterious | 0.002 | probably damaging | 0.999 | MARCH10 | 12:15890650:15938045
rs336224471 | 12 | 15,917,910 | A | C | deleterious | 0.03 | possibly damaging | 0.82 | |
15:55400479 | 15 | 55,400,479 | A | G | deleterious | 0.032 | probably damaging | 1 | WHSC1L1 | 15:55142754:55608192
rs339461760 | 16 | 46,612,542 | C | G | deleterious | 0.007 | probably damaging | 0.998 | CWC27 | 16:46472193:46771773
rs324424231 | 17 | 46,357,195 | A | G | deleterious | 0 | probably damaging | 0.998 | ENSSSCG00000007337 | 17:46275105:46424519

SNPs, single nucleotide polymorphisms; SIFT, sorting intolerant from tolerant; Polyphen-2, polymorphism phenotyping v2.

We identified that 46 of 345 non-synonymous SNPs in the selective sweep regions of the Landrace genome had strong effects on protein function as determined with both in silico tools: SIFT and PolyPhen-2.

According to the SIFT analysis, 75 of 345 nsSNPs were classified as being deleterious (for some SNPs, there was low confidence in the findings regarding deleteriousness). PolyPhen-2 calculates the true-positive rate as a fraction of predicted mutations; its results showed that 82 amino acid variants involving nsSNPs in the selective sweep regions of the Landrace genome were likely to exert deleterious functional effects. In addition, 46 of these nsSNPs overlapped with the SIFT results. From the results of the two bioinformatics tools, we reasoned that 46 of the 345 nsSNPs might have strong effects on biological mechanisms during the process of Landrace domestication (Table 4). Forty-six nsSNPs that had strong effects on protein function were distributed among 26 genes and 19 selective sweep regions. In addition, 2:62355986–62756249 among the 55 selective sweep regions containing nsSNPs had the most nsSNPs (37 SNPs), and the results of the two tools for predicting the nsSNP effects showed that 10 of 37 SNPs in 2:62355986–62756249 had strong effects on protein function. This was the largest number of nsSNPs with a strong effect among the total of 55 selective sweep regions of the Landrace genome containing an nsSNP. In addition, three genes belonged to this selective sweep region: ENSSSCG00000013821, ENSSSCG00000013822, and ENSSSCG00000013819. Because the selective region (2: 62355986–62756249) where this gene is located has not been annotated, we estimated the approximate functions of these three genes by analyzing their orthologs. We searched for orthologous genes of these three genes for which the detailed function had been discovered in placental mammals; there were no one-to-one orthologous genes and only many-to-many orthologous genes (Table 5). Because the lists of orthologs of the three genes were the same, we guessed that the functions of the three genes would be very similar. 
Because the orthologs comprised 18 genes from 8 placental mammal species, all of which were olfactory receptor genes, we inferred that ENSSSCG00000013821, ENSSSCG00000013822, and ENSSSCG00000013819 encode olfactory receptors. In a previous study of pig evolution, expansion of the olfactory receptor gene family was one of several significant features of the porcine genome [30]. Groenen et al [26] reported 1,301 porcine olfactory receptor genes and 343 partial olfactory receptor genes. This large number of functional olfactory receptor genes most probably reflects the strong reliance of pigs on their sense of smell while scavenging for food. The greater number of nsSNPs in olfactory receptor-related genes suggests that these genes played important roles during selection.

Additionally, the monoacylglycerol O-acyltransferase 2 (MOGAT2) gene had the greatest number of strong-effect nsSNPs among the 90 genes: 5 of its 11 nsSNPs had strong effects on protein function in this study. Although our GO network analysis did not reveal any particularly important network involving MOGAT2, a transcriptome analysis comparing Landrace with other breeds reported that this gene is important in porcine backfat adipose tissue, where it is related to lipid concentration and lipid synthesis [31]. In addition, 3 of the 26 nsSNPs in the DMBT1 gene were predicted by both SIFT and PolyPhen-2 to have strong effects on protein function.
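
The consensus step described above, keeping only nsSNPs that both tools flag and then tallying them per gene, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout, the SNP and gene names, and the PolyPhen-2 cutoff are assumptions made for the example, and the SIFT cutoff of 0.05 is the conventional default rather than a value stated in this section.

```python
from dataclasses import dataclass

# Assumed cutoffs (not given in this section). SIFT conventionally calls a
# substitution deleterious at a score <= 0.05; the PolyPhen-2 cutoff is illustrative.
SIFT_MAX = 0.05
POLYPHEN_MIN = 0.85


@dataclass(frozen=True)
class Prediction:
    snp_id: str      # hypothetical identifier
    gene: str        # hypothetical gene name
    sift: float      # SIFT score (lower means more likely deleterious)
    polyphen: float  # PolyPhen-2 score (higher means more likely damaging)


def consensus_deleterious(preds):
    """Return the IDs of nsSNPs flagged by both SIFT and PolyPhen-2."""
    sift_hits = {p.snp_id for p in preds if p.sift <= SIFT_MAX}
    polyphen_hits = {p.snp_id for p in preds if p.polyphen >= POLYPHEN_MIN}
    return sift_hits & polyphen_hits


def count_by_gene(preds, hits):
    """Count consensus nsSNPs per gene."""
    counts = {}
    for p in preds:
        if p.snp_id in hits:
            counts[p.gene] = counts.get(p.gene, 0) + 1
    return counts


# Toy records for illustration only; these are not values from the study.
toy = [
    Prediction("snp1", "GENE_A", 0.01, 0.99),
    Prediction("snp2", "GENE_A", 0.03, 0.40),
    Prediction("snp3", "GENE_B", 0.20, 0.95),
    Prediction("snp4", "GENE_B", 0.00, 0.90),
]
hits = consensus_deleterious(toy)
print(sorted(hits))              # ['snp1', 'snp4']
print(count_by_gene(toy, hits))  # {'GENE_A': 1, 'GENE_B': 1}
```

Taking the intersection rather than the union trades sensitivity for specificity, which matches the study's focus on variants supported by both predictors.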

Table 5. Information on the orthologs of three genes (ENSSSCG00000013821, ENSSSCG00000013822, and ENSSSCG00000013819) in selective sweep region 2:62355986–62756249

| Species | Match gene symbol | Match Ensembl gene ID | Compare regions | ENSSSCG00000013821 dN/dS | ENSSSCG00000013821 target %id | ENSSSCG00000013821 query %id | ENSSSCG00000013822 dN/dS | ENSSSCG00000013822 target %id | ENSSSCG00000013822 query %id | ENSSSCG00000013819 dN/dS | ENSSSCG00000013819 target %id | ENSSSCG00000013819 query %id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Chimpanzee (Pan troglodytes) | OR7A5 | ENSPTRG00000010603 | 19:15,130,772–15,137,945 | 0.350 | 69.0 | 70.7 | 0.372 | 69.6 | 71.8 | 0.327 | 71.2 | 70.9 |
| Chimpanzee (Pan troglodytes) | OR7A10 | ENSPTRG00000010604 | 19:15,143,753–15,144,682 | 0.377 | 70.6 | 70.1 | 0.333 | 71.8 | 71.8 | 0.338 | 71.8 | 69.4 |
| Gibbon (Nomascus leucogenys) | OR7A17 | ENSNLEG00000005159 | GL397382.1:231,228–275,098 | 0.383 | 71.0 | 70.7 | 0.359 | 70.3 | 70.6 | 0.290 | 73.2 | 70.9 |
| Gorilla (Gorilla gorilla gorilla) | OR7A10 | ENSGGOG00000015049 | 19:15,120,105–15,121,034 | - | 70.6 | 70.1 | - | 70.9 | 70.9 | - | 72.2 | 69.7 |
| Gorilla (Gorilla gorilla gorilla) | OR7A17 | ENSGGOG00000034834 | 19:15,160,189–15,161,115 | - | 72.5 | 72.0 | - | 72.8 | 72.8 | - | 73.1 | 70.6 |
| Human (Homo sapiens) | OR7A10 | ENSG00000127515 | 19:14,840,948–14,841,877 | 0.418 | 70.2 | 69.8 | 0.377 | 70.6 | 70.6 | 0.361 | 71.8 | 69.4 |
| Human (Homo sapiens) | OR7A17 | ENSG00000185385 | 19:14,880,426–14,881,452 | 0.338 | 72.2 | 71.7 | 0.356 | 72.5 | 72.5 | 0.317 | 72.5 | 70.0 |
| Human (Homo sapiens) | OR7A5 | ENSG00000188269 | 19:14,792,490–14,835,376 | 0.354 | 69.6 | 71.4 | 0.370 | 70.2 | 72.5 | 0.313 | 71.5 | 71.3 |
| Mouse (Mus musculus) | Olfr1353 | ENSMUSG00000042774 | 10:78,963,309–78,971,338 | - | 62.5 | 62.1 | - | 61.2 | 61.2 | 0.243 | 65.1 | 62.8 |
| Mouse (Mus musculus) | Olfr1352 | ENSMUSG00000046493 | 10:78,981,050–78,987,903 | 0.238 | 68.6 | 68.2 | 0.224 | 67.3 | 67.3 | - | 68.6 | 66.3 |
| Mouse (Mus musculus) | Olfr19 | ENSMUSG00000048101 | 16:16,672,228–16,676,405 | 0.245 | 68.3 | 67.9 | 0.267 | 66.3 | 66.3 | 0.253 | 67.6 | 65.3 |
| Mouse (Mus musculus) | Olfr57 | ENSMUSG00000060205 | 10:79,028,741–79,036,274 | 0.308 | 66.5 | 68.2 | 0.289 | 64.3 | 66.3 | 0.349 | 65.2 | 65.0 |
| Mouse (Mus musculus) | Olfr1351 | ENSMUSG00000063216 | 10:79,012,472–79,019,645 | 0.308 | 64.6 | 66.2 | 0.303 | 62.1 | 64.1 | 0.345 | 64.3 | 64.1 |
| Mouse (Mus musculus) | Olfr8 | ENSMUSG00000094080 | 10:78,950,636–78,958,378 | 0.284 | 63.2 | 63.0 | 0.317 | 58.4 | 58.6 | - | 60.7 | 58.8 |
| Mouse (Mus musculus) | Olfr1354 | ENSMUSG00000094673 | 10:78,913,171–78,920,399 | 0.264 | 63.6 | 63.3 | - | 59.0 | 59.2 | - | 62.3 | 60.3 |
| Orangutan (Pongo abelii) | OR7A5 | ENSPPYG00000009655 | 19:15,004,902–15,005,858 | 0.373 | 67.9 | 69.5 | 0.395 | 67.3 | 69.3 | 0.351 | 68.9 | 68.4 |
| Orangutan (Pongo abelii) | OR7A10 | ENSPPYG00000009656 | 19:15,019,264–15,020,193 | 0.453 | 69.6 | 69.1 | 0.402 | 70.9 | 70.9 | 0.350 | 71.2 | 68.8 |
| Orangutan (Pongo abelii) | OR7A17 | ENSPPYG00000009658 | 19:15,062,903–15,091,843 | 0.344 | 70.9 | 70.4 | 0.339 | 71.5 | 71.5 | 0.342 | 70.9 | 68.4 |
| Rat (Rattus norvegicus) | Olr1073 | ENSRNOG00000031688 | 7:13,378,338–13,379,273 | - | 62.1 | 62.1 | - | 61.7 | 62.1 | 0.270 | 65.3 | 63.4 |
| Rat (Rattus norvegicus) | Olr1076 | ENSRNOG00000039448 | 7:13,424,355–13,425,311 | 0.263 | 66.0 | 67.5 | 0.248 | 63.8 | 65.7 | 0.285 | 64.8 | 64.4 |
| Rat (Rattus norvegicus) | Olr1075 | ENSRNOG00000039449 | 7:13,403,899–13,404,858 | 0.290 | 67.1 | 68.8 | 0.272 | 66.1 | 68.3 | 0.291 | 67.4 | 67.2 |
| Rat (Rattus norvegicus) | Olr1085 | ENSRNOG00000047090 | 7:13,673,934–13,674,866 | - | 63.2 | 63.0 | 0.343 | 58.4 | 58.6 | 0.327 | 62.3 | 60.3 |
| Rat (Rattus norvegicus) | Olr1079 | ENSRNOG00000049781 | 7:13,488,205–13,489,137 | 0.276 | 63.6 | 63.3 | 0.395 | 59.4 | 59.6 | 0.336 | 62.6 | 60.6 |
| Rat (Rattus norvegicus) | Olr1077 | ENSRNOG00000054107 | 7:13,460,476–13,461,405 | 0.229 | 69.3 | 68.8 | 0.241 | 67.0 | 67.0 | 0.236 | 68.0 | 65.6 |
| Rat (Rattus norvegicus) | Olr1082 | ENSRNOG00000058943 | 7:13,553,010–13,553,963 | 0.279 | 61.8 | 63.0 | 0.348 | 58.0 | 59.6 | 0.342 | 59.6 | 59.1 |
| Rat (Rattus norvegicus) | Olr1083 | ENSRNOG00000061480 | 7:13,587,479–13,588,411 | 0.290 | 63.2 | 63.0 | 0.352 | 60.3 | 60.5 | 0.332 | 62.6 | 60.6 |
| Vervet-AGM (Chlorocebus sabaeus) | OR7A10 | ENSCSAG00000006193 | 6:13,469,888–13,471,167 | 0.347 | 70.2 | 69.8 | 0.330 | 72.2 | 72.2 | 0.348 | 71.8 | 69.4 |
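
As a quick consistency check on Table 5, the sketch below counts the distinct species and ortholog gene symbols listed in its rows. The (species, symbol) pairs are transcribed from the table; the prefix test for olfactory receptor symbols (`OR`, `Olfr`, `Olr`) is a naming heuristic assumed for this example, not an annotation lookup.

```python
# (species, ortholog gene symbol) pairs transcribed from the rows of Table 5.
TABLE5_ROWS = [
    ("Chimpanzee", "OR7A5"), ("Chimpanzee", "OR7A10"),
    ("Gibbon", "OR7A17"),
    ("Gorilla", "OR7A10"), ("Gorilla", "OR7A17"),
    ("Human", "OR7A10"), ("Human", "OR7A17"), ("Human", "OR7A5"),
    ("Mouse", "Olfr1353"), ("Mouse", "Olfr1352"), ("Mouse", "Olfr19"),
    ("Mouse", "Olfr57"), ("Mouse", "Olfr1351"), ("Mouse", "Olfr8"),
    ("Mouse", "Olfr1354"),
    ("Orangutan", "OR7A5"), ("Orangutan", "OR7A10"), ("Orangutan", "OR7A17"),
    ("Rat", "Olr1073"), ("Rat", "Olr1076"), ("Rat", "Olr1075"),
    ("Rat", "Olr1085"), ("Rat", "Olr1079"), ("Rat", "Olr1077"),
    ("Rat", "Olr1082"), ("Rat", "Olr1083"),
    ("Vervet-AGM", "OR7A10"),
]

# Assumed heuristic: olfactory receptor gene symbols start with one of these prefixes.
OLFACTORY_PREFIXES = ("OR", "Olfr", "Olr")


def summarize(rows):
    """Return (number of rows, distinct species, distinct gene symbols)."""
    species = {s for s, _ in rows}
    symbols = {g for _, g in rows}
    return len(rows), len(species), len(symbols)


def all_olfactory(rows):
    """True if every gene symbol carries an olfactory receptor prefix."""
    return all(g.startswith(OLFACTORY_PREFIXES) for _, g in rows)


print(summarize(TABLE5_ROWS))      # (27, 8, 18)
print(all_olfactory(TABLE5_ROWS))  # True
```

The 27 rows reduce to 18 distinct gene symbols across 8 species, in agreement with the counts given in the text.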

DISCUSSION

Given the meat production industry's interest in improving meat quality and piglet number, a genetic investigation focusing on the selective sweep regions of the Landrace genome was previously performed [7]. That study provided vital information for domestic pig breeding. In most selective sweep studies using whole-genome sequencing data, all SNPs, including nsSNPs, are used to detect selective sweep regions. Because nsSNPs alter the amino acid sequences of encoded proteins, they can result in phenotypic changes in the organism, and such changes are usually subject to natural selection. In the case of Landrace, the domestication process had a shorter generation interval than natural selection. Therefore, we believe that nsSNPs had a diverse evolutionary history during domestication and artificial selection, and that, beyond detecting positive selection from whole-genome sequence data, further study of nsSNPs is required to interpret the Landrace genome accurately. In this study, we performed several analyses of nsSNPs in the Landrace genome to obtain a better understanding of the whole genome. We assumed that these nsSNPs might be associated with novel, important biological mechanisms related to particular traits of the Landrace breed. To characterize the Landrace breed precisely from a genomic perspective, we investigated the biological meaning of nsSNPs in the selective sweep regions of the Landrace genome identified in a previous study [7]. Contrary to our expectations, there was no correlation between the number of nsSNPs and gene length across the 90 genes containing an nsSNP within the selective sweep regions of the Landrace genome (Figure 5).

Considering that 22 of the 90 genes overlapped with multiple selective sweep regions while the others belonged to a single selective sweep region, we assumed that genes containing many nsSNPs in the selective sweep regions of the Landrace genome were more meaningful than we had expected. Subsequently, GO network analysis of the genes containing the 345 nsSNPs showed that a large proportion of the selective sweep regions in which strong amino acid sequence changes had occurred were involved in the superior reproductive capacity or in the growth and development of the Landrace breed during the perinatal period. Some of the GO network results overlapped with the GO analysis of all the selective sweep regions in a previous study [7], while others offered novel interpretations of the Landrace genome.
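
The comparison of nsSNP count against gene length discussed above can be illustrated with a correlation coefficient. The sketch below uses Pearson's r as an assumption, since the text does not say which statistic was applied, and the gene lengths and nsSNP counts are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for five genes; these are not data from the study.
gene_lengths_bp = [1200, 5400, 23000, 8100, 15000]
nssnp_counts = [3, 1, 2, 11, 1]

r = pearson_r(gene_lengths_bp, nssnp_counts)
print(f"Pearson r = {r:.3f}")
```

A value of r near zero would correspond to the absence of a linear relationship between gene length and nsSNP count; a rank-based statistic such as Spearman's would be a reasonable alternative if the counts are strongly skewed.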

CONCLUSION

Our results strongly suggest that Landrace genetic variants that change amino acid sequences are important factors in the superior reproductive capacity of this breed. We analyzed the Landrace genome using nsSNPs in selective sweep regions. Our results showed that most of the genes affected by nsSNPs in these regions may be closely related to the superior reproductive capacity or to the growth and development of the Landrace breed during the perinatal period. Furthermore, there were indications that nsSNPs under selection influenced the establishment of the Landrace breed. This study provides insights into the impact of domestication on the Landrace genome.
