Genomic selection is a useful way to improve economically important traits in
domestic animals. Previous studies showed that large reference populations with
abundant markers increase the prediction accuracy of estimated breeding value
(EBV) [1]. However, for breeds with a small population size, obtaining an
appropriate reference population comprising individuals of the same breed is
difficult, which leads to low prediction accuracy. As an alternative, the use of
an admixed reference population that includes the target population has been
recommended [2,3]. Such admixed populations can serve as a reference when the
breeds are linked through their genotypes. When the reference population
comprises a breed that is distinct from the test population, the two must be
genetically related, rather than related by pedigree. Genetic markers can explain
the relationships among all individuals in a genomic relationship matrix. In
addition, with greater linkage disequilibrium (LD), the prediction accuracy of
EBV should increase [4-6]. From this point of view, this study was performed to
determine the prediction accuracy of EBV using an admixed reference population
that includes crossbred Korean native pig (KNP) × Landrace pigs (CB).
MATERIALS AND METHODS
Genotypes and phenotypes of collected samples
In accordance with the ethical guidelines, blood samples from a total of 1,289
pigs (695 Berkshire [BS] and 594 crossbred [CB]) were collected by veterinarians
and genotyped using a Porcine 60K SNP chip (Illumina, San Diego, CA, USA)
(Table 1). These samples were provided by the National Institute of Animal
Science (Jeonju, Korea); 25 KNP and 20 Landrace purebred samples were also
provided to confirm the genetic relationships. Quality control (QC) was performed
on each population; 41,594 and 39,002 single nucleotide polymorphisms (SNPs)
remained after QC for BS and CB, respectively. The QC removed markers that were
unmapped or located on sex chromosomes (11,166 and 4,214 markers), markers with a
minor allele frequency below 1% (359 and 5,606 markers), and markers with more
than 10% missing genotypes (10,030 and 433 markers) for BS and CB, respectively.
The two data sets were then merged to yield a single admixed population
(Table 1). After merging the common and overlapping markers, 45,875 SNPs
remained. Four phenotypes were measured in the 1,289 animals: backfat thickness
(BF), carcass weight (CWT), muscle pH at 24 hours after slaughter (pH), and shear
force (SF). The sex and slaughter age of all animals were recorded during
phenotype measurement.
Table 1. Genotype information for the studied population

| Population | Number of animals | Original genotype (SNPs) | SNPs removed by QC: not located or located on sex chromosome | SNPs removed by QC: minor allele frequency (< 0.01) 1) | SNPs removed by QC: missing genotype (> 0.1) 2) | Number of SNPs after QC |
|---|---|---|---|---|---|---|
| Berkshire (BS) | 695 | 63,149 | 11,166 | 359 | 10,030 | 41,594 |
| Korean native × Landrace crossbreed (CB) | 594 | 49,255 | 4,214 | 5,606 | 433 | 39,002 |

1) Alleles removed when minor allele frequency < 1%.
2) Alleles removed when genotype is missing from > 10% of the entire population.
SNPs, single nucleotide polymorphisms; QC, quality control.
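The counts in Table 1 can be cross-checked with simple arithmetic. The sketch below assumes that the three QC filters removed non-overlapping sets of markers, so that the removed counts add up; the dictionary keys and the function name are illustrative, not taken from the study.

```python
# SNP counts copied from Table 1; the key names are illustrative only.
TABLE1 = {
    "BS": {"original": 63149, "unmapped_or_sex": 11166,
           "low_maf": 359, "high_missing": 10030, "after_qc": 41594},
    "CB": {"original": 49255, "unmapped_or_sex": 4214,
           "low_maf": 5606, "high_missing": 433, "after_qc": 39002},
}


def remaining_after_qc(row):
    """Subtract the three removal counts from the original SNP count.

    Assumes the filters removed disjoint marker sets.
    """
    removed = row["unmapped_or_sex"] + row["low_maf"] + row["high_missing"]
    return row["original"] - removed


for breed, row in TABLE1.items():
    # Compare the recomputed count with the "after QC" column.
    print(breed, remaining_after_qc(row), remaining_after_qc(row) == row["after_qc"])
```

Under that assumption, the recomputed totals agree with the "Number of SNPs after QC" column for both populations.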
Analyses prior to genomic estimated breeding value (GEBV) prediction:
population structure and genome-wide association study (GWAS)
The population structure was evaluated, and association studies were conducted,
to support the subsequent analyses. Visualization of the population structure is
useful to determine genetic relationships among breeds. Using genotype
information from 20 samples each of the BS, CB, Landrace, and KNP populations,
principal component analysis (PCA) was performed to generate clusters, determine
any shared principal components, and detect any incorrectly classified
individuals. Furthermore, plots of LD by distance, within populations and among
breeds, were generated. A GWAS of the traits of interest was performed for
genetic comparison between the CB and BS, and to determine any significant loci
or LD relationships. The GWAS was performed based on a mixed linear model
generated using GCTA software (ver. 1.25.3 [7]).
A Bayesian mixture model was fitted using the BayesR program (default options:
mixture components with effect sizes of 0, 0.0001, 0.001, and 0.01; an MCMC chain
of 50,000 iterations; a burn-in of 20,000; and a thinning interval of 10). The
proportion of variance explained by a specific SNP was calculated as follows:

Information on the genetic contributions to traits was obtained from a previous
study [8]. The PCA, LD analysis, and data processing were performed using PLINK
1.9 [9] and R software (R Development Core Team, Vienna, Austria) [10]. Data were
visualized in the R environment.
Procedure for predicting breeding value
To compare the prediction accuracy of breeding value between single-breed and
admixed reference populations, both the reference and test animal data sets were
randomly sampled 10 times each. No animal was shared between the test and
reference populations. The GEBV predictions were performed using all test and
reference set combinations, and mean accuracy was assessed according to the size
of the reference population. Prediction accuracy using a single-breed reference
population was determined for each breed (250 test animals each) by reference
population size (100, 200, 300, or 400 animals) (Fig. 1). For the analysis
involving the admixed reference population, the reference population sizes were
the same as in the single-breed scenario. The admixed reference included the two
breeds in equal proportions, and 125 individuals were randomly selected from each
of the two breeds as test animals.
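The admixed sampling scheme described above can be sketched as follows. This is a simplified illustration rather than the study's actual procedure: it draws one reference set and one test set together per replicate, whereas the text describes sampling the two sets 10 times each and using all combinations. The animal identifiers, the seed, and the function names are hypothetical.

```python
import random


def split_reference_test(animal_ids, n_ref, n_test, rng):
    """Draw disjoint reference and test sets from one breed."""
    if n_ref + n_test > len(animal_ids):
        raise ValueError("not enough animals for disjoint sets")
    drawn = rng.sample(animal_ids, n_ref + n_test)
    return drawn[:n_ref], drawn[n_ref:]


def admixed_split(bs_ids, cb_ids, ref_size, rng, test_per_breed=125):
    """Admixed scenario: half of the reference and 125 test animals per breed."""
    half = ref_size // 2
    bs_ref, bs_test = split_reference_test(bs_ids, half, test_per_breed, rng)
    cb_ref, cb_test = split_reference_test(cb_ids, half, test_per_breed, rng)
    return bs_ref + cb_ref, bs_test + cb_test


rng = random.Random(0)  # arbitrary seed, for reproducibility only
bs_ids = [f"BS{i}" for i in range(695)]  # hypothetical animal identifiers
cb_ids = [f"CB{i}" for i in range(594)]

for ref_size in (100, 200, 300, 400):
    for replicate in range(10):
        ref, test = admixed_split(bs_ids, cb_ids, ref_size, rng)
        # No animal may appear in both the reference and the test set.
        assert not set(ref) & set(test)
```

Drawing both sets from a single shuffled sample is one simple way to guarantee that no animal appears in both the reference and the test set.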
Fig. 1.
Schematic of the breeding value predictions with and without use of
the admixed reference population (1 and 2, respectively).
A genetic relationship matrix was built using GCTA (ver. 1.25.3 [7]), and ASReml
4.1 [11] was used for genomic prediction. The model used in this study was as
follows:

$$\mathbf{y} = \mu + \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{u} + \mathbf{e}$$

where y indicates the measured phenotype, μ is the overall mean, X and Z are
design matrices related to fixed effects and genetic effects, respectively, b and
u are vectors of fixed and genetic effects, respectively, and e is the vector of
residual errors. The prediction accuracy was given by the correlation between
GEBV and own phenotype using the following equation [12]:
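Separately from the accuracy equation of [12], the genetic relationship matrix step mentioned above can be illustrated in a few lines. This is a minimal sketch, assuming the commonly used GCTA-style estimator, in which each SNP is centred by twice its allele frequency and scaled by 2p(1 − p); the function name and the toy genotypes are hypothetical, and the study itself used GCTA rather than this code.

```python
def genomic_relationship_matrix(genotypes):
    """Build a genomic relationship matrix from allele counts.

    genotypes: list of animals, each a list of 0/1/2 allele counts per SNP.
    Assumed estimator: mean over SNPs of
    (x_ij - 2p_i) * (x_ik - 2p_i) / (2 * p_i * (1 - p_i)).
    """
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    # Allele frequency of each SNP, estimated from the same animals.
    freqs = [sum(g[i] for g in genotypes) / (2 * n_animals) for i in range(n_snps)]
    # Monomorphic SNPs carry no information and would divide by zero.
    used = [i for i, p in enumerate(freqs) if 0.0 < p < 1.0]
    grm = [[0.0] * n_animals for _ in range(n_animals)]
    for j in range(n_animals):
        for k in range(j, n_animals):
            total = 0.0
            for i in used:
                p = freqs[i]
                total += ((genotypes[j][i] - 2 * p) * (genotypes[k][i] - 2 * p)
                          / (2 * p * (1 - p)))
            grm[j][k] = grm[k][j] = total / len(used)
    return grm


# Toy data: animals 0 and 1 are identical, animal 2 carries opposite genotypes.
toy = [[0, 1, 2, 1], [0, 1, 2, 1], [2, 1, 0, 1], [1, 2, 1, 0]]
G = genomic_relationship_matrix(toy)
```

Because every SNP is centred on its sample mean, each row of the resulting matrix sums to zero, which is a convenient sanity check for an implementation.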
RESULTS
Population structure and genome-wide association study (GWAS)
An overview of the population genetic structure was obtained by the PCA and GWAS
prior to genomic prediction (Figs. 2–6). First, to compare the populations at the
same sample size as the KNP and Landrace purebred populations, SNP genotype
information for 20 samples each was randomly extracted from BS and CB. As shown
in Fig. 2, each population formed a distinct cluster; the first and second
principal components explained 12.89% and 9.38% of the variance in the population
genetic structure, respectively. On the axis of the first component, the Landrace
and BS populations were located close to each other, with the KNP population
being more distant. Only on the axis of the second component was the KNP
population located towards the middle.
Fig. 2.
Principal component analysis of the studied populations.
Twenty samples per population were used to confirm the genetic
relationships.
Fig. 6.
GWAS based on a Bayesian mixture model of all traits in the Berkshire
and crossbreed (CB) populations.
CWT, carcass weight; BF, backfat thickness; SF, shear force; pH, muscle
pH at 24 hours after slaughter; KN, Korean native pig; LR, Landrace.
LD was examined in each population by distance (Figs. 3 and 4). KNP showed a
clearly stronger LD pattern than BS, CB, and Landrace, while BS showed the
weakest correlations, and the differences between CB and Landrace were small. In
terms of the correlations between breed pairs, those of KNP with Landrace and of
KNP with BS were weakest, and that of CB with KNP was strongest, followed by CB
with Landrace (Fig. 4).
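As an illustration of how LD by distance is commonly summarised, the sketch below computes r² as the squared Pearson correlation of allele counts between two SNPs and averages it within distance bins. The study used PLINK 1.9 for its LD analysis; the function names, toy genotypes, and bin width here are hypothetical and only show the idea.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long numeric lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("monomorphic SNP: correlation is undefined")
    return sxy / (sxx * syy) ** 0.5


def ld_r2(snp_a, snp_b):
    """LD between two SNPs as the squared correlation of allele counts."""
    return pearson_r(snp_a, snp_b) ** 2


def mean_r2_by_distance(snps, positions, bin_width):
    """Average r2 over all SNP pairs, grouped by physical distance bin."""
    bins = {}
    for i in range(len(snps)):
        for j in range(i + 1, len(snps)):
            key = abs(positions[j] - positions[i]) // bin_width
            bins.setdefault(key, []).append(ld_r2(snps[i], snps[j]))
    # Report each bin by its lower distance bound.
    return {key * bin_width: mean(vals) for key, vals in sorted(bins.items())}


# Toy data: each inner list holds one SNP's allele counts across six animals.
snps = [[0, 1, 2, 1, 0, 2], [0, 1, 2, 1, 0, 2],
        [2, 1, 0, 1, 2, 0], [0, 2, 1, 0, 1, 2]]
positions = [1000, 5000, 9000, 60000]  # base-pair positions (hypothetical)
decay = mean_r2_by_distance(snps, positions, 10000)
```

Plotting the binned means against distance for each population gives a decay curve of the kind shown in Fig. 3.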
Fig. 3.
Linkage disequilibrium (LD) by genetic distance for the different
breeds.
KN, Korean native pig; LR, Landrace.
Fig. 4.
Linkage disequilibrium (LD) by genetic distance: correlations between
breeds.
KN, Korean native pig; LR, Landrace.
The GWAS, which used a mixed linear model (Fig. 5), showed that CB and BS had no
significant SNPs in common for any trait, based on a Bonferroni-corrected
significance threshold of 1.08 × 10⁻⁶. BS had significant SNPs for all traits,
while CB had significant SNPs only for pH. In the Bayesian mixture model, the
genetic contribution of every marker in CB was ~0%, while markers in BS
contributed > 2.5% to BF and > 1% to pH and SF (Fig. 6).
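The Bonferroni-corrected threshold can be approximated as a significance level divided by the number of tests. The sketch below assumes α = 0.05 and the 45,875 merged SNPs as the number of tests; neither value is stated as the basis of the threshold in the text, and the p-values in the example are invented for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def significant_snps(p_values, threshold):
    """Return the names of SNPs whose p-value falls below the threshold."""
    return [name for name, p in p_values.items() if p < threshold]


# Assumed inputs: alpha = 0.05 and the 45,875 SNPs left after merging.
threshold = bonferroni_threshold(0.05, 45875)

# Invented p-values, for illustration only.
example = {"snp_a": 3.2e-8, "snp_b": 4.0e-5, "snp_c": 9.9e-7}
hits = significant_snps(example, threshold)
```

Under these assumptions the threshold is about 1.09 × 10⁻⁶, close to the reported value of 1.08 × 10⁻⁶.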
Fig. 5.
GWAS based on a mixed linear regression model of all traits in the
Berkshire and crossbreed (CB) populations.
CWT, carcass weight; BF, backfat thickness; SF, shear force; pH, muscle
pH at 24 hours after slaughter; KN, Korean native pig; LR, Landrace.
Comparison of prediction accuracies of genomic estimated breeding value
between admixed and single-breed reference populations
The prediction accuracy was close to zero or negative when CB and BS were used as
the reference and test populations, respectively. Increasing the size of the
reference population did not affect the accuracy of the predictions for any
trait except CWT, for which accuracy increased by 6.26% between reference
population sizes of 100 and 400. Using the admixed population as the reference,
with the same reference sizes, increased the accuracy of the predictions for the
BS population by 0.004, 0.013, 0.024, and 0.035 for CWT, BF, SF, and pH,
respectively (Table 2; Fig. 7).
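The text does not state how the per-trait increases for the BS population were computed. One reading that approximately reproduces them, offered here only as an assumption, is the least-squares slope of mean accuracy on reference size, expressed per 100 additional reference animals. The sketch uses the mean accuracies of BS with the admixed reference from Table 2; the function and variable names are illustrative.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


SIZES = [100, 200, 300, 400]

# Mean accuracy of BS with the admixed reference, copied from Table 2.
BS_ADMIXED_MEAN = {
    "CWT": [0.055, 0.064, 0.072, 0.066],
    "BF": [0.102, 0.127, 0.140, 0.143],
    "SF": [0.258, 0.293, 0.299, 0.339],
    "pH": [0.131, 0.188, 0.160, 0.258],
}

# Slope expressed per 100 additional reference animals (assumed reading).
slope_per_100 = {trait: 100 * ols_slope(SIZES, acc)
                 for trait, acc in BS_ADMIXED_MEAN.items()}

for trait, slope in slope_per_100.items():
    print(trait, round(slope, 4))
```

The slopes obtained this way fall within about 0.001 of the values quoted in the text, which is consistent with, but does not confirm, this interpretation.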
Table 2. Prediction accuracy for each scenario

| Variables | Reference size | CB, admixed reference: Mean | SD | Min | Max | BS, admixed reference: Mean | SD | Min | Max | BS, CB reference: Mean | SD | Min | Max | CB, BS reference: Mean | SD | Min | Max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BF | 100 | −0.022 | 0.135 | −0.203 | 0.241 | 0.102 | 0.101 | −0.104 | 0.240 | 0.010 | 0.051 | −0.092 | 0.087 | −0.042 | 0.057 | −0.119 | 0.024 |
| BF | 200 | 0.036 | 0.091 | −0.046 | 0.229 | 0.127 | 0.101 | −0.032 | 0.238 | 0.016 | 0.062 | −0.077 | 0.125 | −0.033 | 0.060 | −0.124 | 0.044 |
| BF | 300 | 0.027 | 0.056 | −0.034 | 0.129 | 0.140 | 0.092 | −0.003 | 0.259 | 0.003 | 0.067 | −0.085 | 0.101 | −0.041 | 0.047 | −0.113 | 0.040 |
| BF | 400 | 0.057 | 0.079 | −0.030 | 0.203 | 0.143 | 0.084 | 0.004 | 0.304 | 0.004 | 0.054 | −0.087 | 0.079 | −0.025 | 0.035 | −0.074 | 0.038 |
| CWT | 100 | 0.012 | 0.095 | −0.152 | 0.305 | 0.055 | 0.101 | −0.073 | 0.222 | 0.001 | 0.063 | −0.098 | 0.121 | 0.049 | 0.054 | −0.070 | 0.123 |
| CWT | 200 | 0.056 | 0.100 | −0.067 | 0.310 | 0.064 | 0.078 | −0.042 | 0.166 | 0.044 | 0.065 | −0.050 | 0.158 | 0.043 | 0.038 | −0.017 | 0.099 |
| CWT | 300 | 0.035 | 0.096 | −0.072 | 0.316 | 0.072 | 0.099 | −0.073 | 0.253 | 0.056 | 0.076 | −0.097 | 0.155 | 0.032 | 0.056 | −0.031 | 0.153 |
| CWT | 400 | 0.062 | 0.094 | −0.058 | 0.300 | 0.066 | 0.087 | −0.052 | 0.233 | 0.063 | 0.075 | −0.074 | 0.202 | 0.043 | 0.055 | −0.018 | 0.161 |
| pH | 100 | −0.013 | 0.066 | −0.089 | 0.113 | 0.131 | 0.160 | −0.146 | 0.426 | 0.010 | 0.053 | −0.059 | 0.096 | −0.012 | 0.041 | −0.075 | 0.051 |
| pH | 200 | −0.013 | 0.092 | −0.174 | 0.099 | 0.188 | 0.095 | 0.064 | 0.342 | 0.008 | 0.053 | −0.073 | 0.091 | −0.019 | 0.050 | −0.088 | 0.069 |
| pH | 300 | 0.005 | 0.108 | −0.181 | 0.193 | 0.160 | 0.108 | −0.007 | 0.293 | 0.011 | 0.058 | −0.071 | 0.098 | −0.001 | 0.041 | −0.067 | 0.075 |
| pH | 400 | 0.000 | 0.112 | −0.179 | 0.198 | 0.258 | 0.102 | 0.084 | 0.438 | 0.006 | 0.065 | −0.107 | 0.114 | −0.001 | 0.042 | −0.086 | 0.064 |
| SF | 100 | 0.038 | 0.072 | −0.088 | 0.175 | 0.258 | 0.103 | 0.069 | 0.420 | −0.027 | 0.066 | −0.124 | 0.083 | 0.001 | 0.057 | −0.076 | 0.085 |
| SF | 200 | 0.073 | 0.093 | −0.044 | 0.181 | 0.293 | 0.072 | 0.206 | 0.394 | −0.020 | 0.045 | −0.116 | 0.030 | −0.050 | 0.052 | −0.118 | 0.015 |
| SF | 300 | 0.070 | 0.098 | −0.058 | 0.240 | 0.299 | 0.053 | 0.228 | 0.391 | −0.028 | 0.044 | −0.086 | 0.046 | −0.019 | 0.062 | −0.135 | 0.058 |
| SF | 400 | 0.094 | 0.117 | −0.071 | 0.249 | 0.339 | 0.068 | 0.242 | 0.443 | −0.032 | 0.049 | −0.115 | 0.045 | −0.018 | 0.045 | −0.110 | 0.048 |

Column groups give the accuracy of CB when using the admixed reference, of BS
when using the admixed reference, of BS when using the CB reference, and of CB
when using the BS reference.
CB, cross breed of Korean native pig and Landrace; BS, Berkshire; SD,
standard deviation; BF, backfat thickness; CWT, carcass weight; pH,
muscle pH at 24 hours after slaughter; SF, shear force.
Fig. 7.
Genomic estimated breeding value prediction accuracy for the
Berkshire and crossbred population by reference population size and
breed.
CWT, carcass weight; BF, backfat thickness; SF, shear force; pH, muscle
pH at 24 hours after slaughter; KN, Korean native pig; LR, Landrace.
Using CB and BS as the test and reference populations, respectively, the
prediction accuracy was zero or negative for all traits except CWT. The accuracy
of the predictions for the CB population, when using the admixed population as
the reference, increased marginally with increasing size of the reference
population, but was not markedly higher than when BS was the reference
population.
DISCUSSION
Our GWAS results suggested that the prediction accuracy of breeding value varied
according to how strongly a trait was associated with the markers. Prediction
accuracy based on single-breed and admixed reference populations has been shown
to depend on the quantitative trait loci (QTLs) and the relationships among
populations [13]; the current study did not deal with QTLs directly, but
cautiously suggests that GWAS results are also related to the prediction of
breeding value. Prediction accuracy with respect to genomic selection varies with
both the LD between markers and QTLs, and the genomic relationships (obtained by
population structure analysis) [14]. In this study, the prediction accuracy for
highly associated traits was higher when the admixed reference population was
used, for example for BF, SF, and pH (but not CWT) in the BS population. In
contrast, the CB population had no highly associated traits, except pH. For BS,
using both the single-breed and admixed reference populations, prediction
accuracy for CWT was low compared to the other traits. In CB, the accuracy for
CWT and pH did not differ markedly between the single-breed and admixed reference
populations; furthermore, these two traits were less strongly associated in the
mixed linear model for the BS population. For CB, prediction accuracy for BF and
SF was higher with a larger admixed reference population. When the admixed
reference population contains both breeds of the test population, as in this
study, the relationships between reference and test animals are likely to be
close. As mentioned above, following Wientjes et al. [13], accuracy can improve
depending on how closely they are related. Haplotypes associated with a specific
trait in BS may also affect the accuracy for CB when predicting GEBV. Thus, the
use of an admixed reference population in which the traits show marker
associations may have improved the prediction accuracy of breeding value for the
test population.

A Bayesian approach is recommended for genomic predictions involving multi-breed
populations [15]. A study in dairy cattle indicated that LD does not persist
across breeds, except over short genetic distances (< 10 kb) [16]. Some of the
putative markers may be linked with QTLs through LD; however, across breeds or in
multi-breed populations, the low LD relatedness among breeds, already depicted by
the LD correlations, has only a small impact on prediction accuracy. Using the
Bayesian method also allows us to focus on the QTL rather than LD [17]. As shown
in Fig. 6 of this study, the BS population has an advantage with regard to
markers with a high genetic contribution to BF, SF, and pH.

This study aimed to provide data that could facilitate improvement and conservation
of the KNP. Because the KNP population was too small to serve as the reference
population, CB data (which include KNP genotype information) were also used as an
additional reference population. This approach may improve the prediction
accuracy of breeding value and may facilitate phenotype development, for the
following reasons. First, LD phases may be broken down when breeds are crossed,
which could be advantageous in some circumstances, for example by increasing the
chance of uncovering causal variants for the target trait. Second, the crossing
of genetically different populations generates genetic and phenotypic variance,
which can lead to animals with higher performance than those of the previous
generation. Although we could not identify putative markers or clear patterns of
prediction accuracy based on the CB reference, the accuracy obtained for CB with
the admixed reference population can provide valuable information when composing
a reference population. Furthermore, it is presumed that using the admixed
population as a reference contributes to EBV accuracy by sharing the
phenotype-associated Berkshire haplotype information while utilizing the
relatedness of the reference population with the test population.

The current pig improvement system of the Korean pig industry relies on imported
seed stocks, mainly on private farms and pig unions. For this reason, breeding
plans and improvement goals are kept confidential and are not disclosed. To
address these challenges, the National Institute of Animal Science has been
running a Swine Genetic Improvement Network Program since 2008
(https://www.pignet.or.kr). This program aims to select Korean breeding pigs by
establishing a national-level genetic evaluation system through the exchange and
network connection of high-performance pigs within the domestic pig population.
Therefore, to establish a system for selecting and exchanging excellent pigs, it
is necessary to build an efficient reference population for estimating more
accurate EBV, as well as to understand the phenotypes of the pigs on each farm.
EBV estimation using the admixed reference population requires verification with
various populations and additional samples, but the results of this study can
provide useful information for the genetic improvement of KNP along with the
Swine Genetic Improvement Network Program.