Xiao Wang, Guosheng Su, Dan Hao, Mogens Sandø Lund, Haja N Kadarmideen.
Abstract
BACKGROUND: Genotyping by sequencing (GBS) still suffers from missing genotypes. Because the number of missing genotypes is large, especially at low sequencing depths, imputation is important when GBS data are used for genomic prediction. Minor allele frequency (MAF) is widely used as a marker data editing criterion for genomic prediction. In this study, three imputation methods (Beagle, IMPUTE2 and FImpute software) were investigated under four MAF editing criteria with regard to the imputation accuracy of missing genotypes and the accuracy of genomic predictions, based on simulated data of a livestock population.
Keywords: Genomic prediction; Genotyping by sequencing; Imputation; MAF; Simulation
Year: 2020; PMID: 31921417; PMCID: PMC6947967; DOI: 10.1186/s40104-019-0407-9
Source DB: PubMed; Journal: J Anim Sci Biotechnol; ISSN: 1674-9782
Simulation parameters of the population structure
| Steps | Population structure | Values |
|---|---|---|
| | Number of replicates | 10 |
| | Overall heritability | 0.3 |
| | QTL heritability | 0.3 |
| | Phenotypic variance | 1.0 |
| Step 1: Historical generations (HGs) | Foundation population size of HGs | 2000 |
| | Number of generations in phase 1 | 1000 |
| | Population size in phase 1 | 2000 |
| | Number of generations in phase 2 | 200 |
| | Population size at the end of phase 2 | 400 |
| | Number of males in the last HG | 200 |
| | Number of females in the last HG | 200 |
| | Number of males from HG | 40 |
| | Number of females from HG | 200 |
| Step 2: Expanded generations (EGs) | Number of generations | 1 |
| | Litter size | 5 |
| | Proportion of male progeny | 50% |
| | Mating design | Random |
| | Number of males from EG | 100 |
| | Number of females from EG | 500 |
| Step 3: Recent generations | Number of generations | 10 |
| | Litter size | 5 |
| | Proportion of male progeny | 50% |
| | Mating design | Random |
| | Sire replacement | 80% |
| | Dam replacement | 40% |
| | Selection design | EBV |
Simulation parameters of the genome
| Genome | Values |
|---|---|
| Number of chromosomes | 5 |
| Chromosome length | 100 cM |
| Number of marker loci on one chromosome | 1,000,000 |
| Marker positions | Evenly |
| Number of marker alleles in the first HG | 2 |
| Marker allele frequencies in the first HG | Random |
| Number of QTL loci on one chromosome | 100 |
| QTL positions | Random |
| Number of QTL alleles in the first HG | 2 |
| QTL allele frequencies in the first HG | Random |
| QTL allele effect | From a gamma distribution with a shape of 0.4 |
| Marker mutation rate in historical population | 2.5 × 10⁻⁵ |
| QTL mutation rate in historical population | 2.5 × 10⁻⁵ |
Fig. 1 Correlations for the original GBS (GBS), the Beagle imputed genotypes (Be), the IMPUTE2 imputed genotypes (IM), the FImpute imputed genotypes (FI) and the imputed genotypes based on corrected GBS (GcIM). Note: MAF criteria were used to delete markers with low MAF values before imputation
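The MAF editing step mentioned in the note can be illustrated with a minimal sketch. It assumes genotypes are coded as 0/1/2 allele counts in an individuals × markers matrix with `NaN` for missing calls; the function names and the coding are illustrative assumptions, not taken from the study's pipeline. The default threshold of 0.01 matches the MAF ≥ 0.01 criterion quoted in this record.

```python
import numpy as np

def minor_allele_frequency(genotypes):
    """Per-marker MAF from an (n_individuals, n_markers) matrix of 0/1/2
    allele counts. Missing calls are NaN and are ignored (assumed coding)."""
    allele_freq = np.nanmean(genotypes, axis=0) / 2.0
    return np.minimum(allele_freq, 1.0 - allele_freq)

def filter_by_maf(genotypes, threshold=0.01):
    """Keep only markers whose MAF is at least `threshold`.
    Returns the filtered matrix and the boolean keep-mask."""
    maf = minor_allele_frequency(genotypes)
    keep = maf >= threshold
    return genotypes[:, keep], keep
```

Applying the mask before imputation, as the note describes, means the imputation software only sees markers that pass the chosen MAF criterion.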
Fig. 2 Correct genotype rates and correlations for GBS and imputed genotypes based on corrected GBS (GcIM). Note: MAF criteria were used to delete markers with low MAF values before imputation
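The two accuracy measures named in this caption can be sketched as follows. This is a hedged illustration only: it assumes 0/1/2 genotype coding and pools all individuals and markers into one comparison, whereas the record does not say whether the study computed these per individual, per marker, or pooled. The function names are hypothetical.

```python
import numpy as np

def correct_genotype_rate(true_genotypes, imputed_genotypes):
    """Proportion of genotype calls that exactly match the true genotypes."""
    true_g = np.asarray(true_genotypes)
    imputed_g = np.asarray(imputed_genotypes)
    return float(np.mean(true_g == imputed_g))

def genotype_correlation(true_genotypes, imputed_genotypes):
    """Pearson correlation between true and imputed genotypes,
    pooled over all individuals and markers (an assumption)."""
    true_g = np.asarray(true_genotypes, dtype=float).ravel()
    imputed_g = np.asarray(imputed_genotypes, dtype=float).ravel()
    return float(np.corrcoef(true_g, imputed_g)[0, 1])
```

The correct genotype rate treats every mismatch equally, while the correlation penalizes a 0-versus-2 error more than a 0-versus-1 error, which is one reason the two measures can differ.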
Fig. 3 Reliabilities (r²) of genomic predictions using original GBS (GBS), GBS with imputation of missing genotypes (Be, IM, FI), imputed corrected genotype by IMPUTE2 (GcIM) and true genotypes of GBS markers (GBSr), at four depths, averaged over 10 replicates. Bars indicate SE. Note: MAF criteria were used to delete markers with low MAF values before imputation
Fig. 4 Regression of true breeding value (TBV) on genomic estimated breeding values (GEBV). Note: b is the regression coefficient
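The regression coefficient b and the reliability r² reported in the figures can be sketched as below. The slope is the ordinary least-squares slope of TBV on GEBV, b = cov(TBV, GEBV) / var(GEBV). Treating reliability as the squared Pearson correlation between GEBV and TBV is an assumption here, since this record does not spell out the formula; the function names are hypothetical.

```python
import numpy as np

def regression_tbv_on_gebv(tbv, gebv):
    """OLS slope b of TBV regressed on GEBV: cov(TBV, GEBV) / var(GEBV)."""
    tbv = np.asarray(tbv, dtype=float)
    gebv = np.asarray(gebv, dtype=float)
    covariance = np.cov(tbv, gebv, ddof=1)[0, 1]
    return float(covariance / np.var(gebv, ddof=1))

def prediction_reliability(tbv, gebv):
    """Squared Pearson correlation between GEBV and TBV
    (assumed definition of reliability)."""
    r = np.corrcoef(np.asarray(tbv, dtype=float),
                    np.asarray(gebv, dtype=float))[0, 1]
    return float(r ** 2)
```

A slope close to 1 indicates that the GEBV are neither inflated nor deflated relative to the TBV, while r² summarizes how much of the variation in TBV the predictions capture.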
Imputation accuracies compared to true genotypes at the GBS loci (GBSr) and reliabilities of genomic prediction using GBS data imputed by FImpute with or without pedigree information (FIped or FI) at depth = 4 and MAF ≥ 0.01, averaged over 10 replicates. For GcFIped and GcFI, the imputation was performed after genotype correction. Note: Depth = 4 (1) and Depth = 4 (1 & 2) indicate that genotypes with read = 1, or with read = 1 or 2, were set to missing. Standard errors are shown in parentheses
| Measure | Depth = 4: FI | Depth = 4: FIped | Depth = 4: GcFI | Depth = 4: GcFIped | Depth = 4 (1): GcFI | Depth = 4 (1): GcFIped | Depth = 4 (1 & 2): GcFI | Depth = 4 (1 & 2): GcFIped |
|---|---|---|---|---|---|---|---|---|
| Correct genotype rate | 0.905 (< 0.0005) | 0.920 (< 0.0005) | 0.915 (< 0.0005) | 0.919 (< 0.0005) | 0.927 (< 0.0005) | 0.935 (< 0.0005) | 0.942 (< 0.0005) | 0.950 (< 0.0005) |
| Correlation | 0.946 (< 0.0005) | 0.955 (< 0.0005) | 0.951 (< 0.0005) | 0.954 (< 0.0005) | 0.960 (< 0.0005) | 0.965 (< 0.0005) | 0.971 (< 0.0005) | 0.976 (< 0.0005) |
| Prediction reliability | 0.666 (0.0253) | 0.670 (0.0245) | 0.666 (0.0248) | 0.668 (0.0245) | 0.674 (0.0246) | 0.679 (0.0246) | 0.683 (0.0242) | 0.689 (0.0246) |