Shaopan Ye, Xiaolong Yuan, Xiran Lin, Ning Gao, Yuanyu Luo, Zanmou Chen, Jiaqi Li, Xiquan Zhang, Zhe Zhang.
Abstract
BACKGROUND: Genome-wide association studies and genomic predictions are thought to be optimized by using whole-genome sequence (WGS) data. However, sequencing thousands of individuals of interest is expensive. Imputation from SNP panels to WGS data is an attractive and less expensive approach to obtain WGS data. The aims of this study were to investigate the accuracy of imputation and to provide insight into the design and execution of genotype imputation.
Keywords: Chickens; Imputation; Re-sequencing; SNP
Year: 2018 PMID: 29581880 PMCID: PMC5861640 DOI: 10.1186/s40104-018-0241-5
Source DB: PubMed Journal: J Anim Sci Biotechnol ISSN: 1674-9782
Sequencing strategies used for genotype imputation
| Total X | Different sequencing scenarios with fixed cost (X × N) | | | | | | |
|---|---|---|---|---|---|---|---|
| 24 | 1 × 24 | 2 × 12 | 3 × 8 | 4 × 6 | 6 × 4 | 8 × 3 | 12 × 2 |
| 36 | 2 × 18 | 3 × 12 | 4 × 9 | 6 × 6 | 9 × 4 | 12 × 3 | 18 × 2 |
| 72 | 3 × 24 | 4 × 18 | 6 × 12 | 8 × 9 | 9 × 8 | 12 × 6 | 18 × 4 |
| 96 | 4 × 24 | 6 × 16 | 8 × 12 | 12 × 8 | 16 × 6 | | |
| 144 | 6 × 24 | 8 × 18 | 12 × 12 | 16 × 9 | 18 × 8 | | |
Total X: total cost of genotyping; X × N: sequencing depth (X) times the number of sequenced animals (N)
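The fixed-cost scenarios above are factor pairs of the total: every depth X that divides the total gives N = total / X animals. A minimal sketch of that enumeration follows. The function name and the bounds (depth up to 18, between 2 and 24 animals) are illustrative assumptions, not values stated in the source, so the output need not match every row of the table.

```python
def fixed_cost_scenarios(total, max_depth=18, min_animals=2, max_animals=24):
    """Return (depth, animals) pairs whose product equals `total`.

    The default bounds are assumptions chosen for illustration only.
    """
    scenarios = []
    for depth in range(1, max_depth + 1):
        animals, remainder = divmod(total, depth)
        # Keep only exact factorizations that respect the animal-count bounds.
        if remainder == 0 and min_animals <= animals <= max_animals:
            scenarios.append((depth, animals))
    return scenarios


if __name__ == "__main__":
    for total in (24, 36, 72, 96, 144):
        pairs = fixed_cost_scenarios(total)
        print(total, " | ".join(f"{x} × {n}" for x, n in pairs))
```

Tightening or loosening the bounds changes which scenarios are kept, which is one way to explore the trade-off between depth and number of sequenced animals at a given cost.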
Fig. 1 The cumulative genetic diversity of the selected key individuals was estimated by adding animals one by one in the optimized rank order of the 24 key individuals. The cumulative genetic diversity means the proportion of the entire chicken population
Summary of the key individuals for re-sequencing
| Animal ID | Clean Reads | Mapped reads rate | Depth of coverage | Uniquely mapped reads rate | SC | GC | NRS | NRD |
|---|---|---|---|---|---|---|---|---|
| 1 | 90,735,760 | 0.951 | 15.81 | 0.942 | 0.988 | 0.979 | 0.972 | 0.042 |
| 2 | 117,996,876 | 0.964 | 12.87 | 0.959 | 0.998 | 0.928 | 0.947 | 0.128 |
| 3 | 105,660,610 | 0.938 | 16.21 | 0.928 | 0.975 | 0.979 | 0.961 | 0.040 |
| 4 | 125,426,182 | 0.953 | 17.11 | 0.947 | 0.974 | 0.976 | 0.954 | 0.047 |
| 5 | 111,564,528 | 0.956 | 13.60 | 0.951 | 0.997 | 0.985 | 0.989 | 0.029 |
| 6 | 127,019,556 | 0.960 | 14.68 | 0.955 | 0.998 | 0.957 | 0.968 | 0.079 |
| 7 | 99,058,934 | 0.873 | 15.11 | 0.864 | 0.990 | 0.976 | 0.974 | 0.046 |
| 8 | 130,604,192 | 0.961 | 15.19 | 0.956 | 0.953 | 0.991 | 0.951 | 0.018 |
| 9 | 141,700,352 | 0.961 | 16.25 | 0.956 | 0.999 | 0.991 | 0.995 | 0.017 |
| 10 | 141,847,268 | 0.965 | 14.39 | 0.960 | 0.999 | 0.991 | 0.995 | 0.018 |
| 11 | 115,053,394 | 0.958 | 13.92 | 0.953 | 0.997 | 0.986 | 0.989 | 0.026 |
| 12 | 141,220,480 | 0.965 | 14.41 | 0.961 | 0.999 | 0.992 | 0.995 | 0.016 |
| 13 | 126,408,732 | 0.959 | 14.44 | 0.951 | 0.997 | 0.953 | 0.964 | 0.087 |
| 14 | 137,853,286 | 0.963 | 14.79 | 0.958 | 0.998 | 0.989 | 0.993 | 0.022 |
| 15 | 124,123,884 | 0.961 | 13.92 | 0.955 | 0.998 | 0.987 | 0.991 | 0.026 |
| 16 | 134,906,464 | 0.970 | 13.88 | 0.965 | 0.999 | 0.990 | 0.995 | 0.020 |
| 17 | 137,609,612 | 0.957 | 13.77 | 0.950 | 0.998 | 0.988 | 0.992 | 0.024 |
| 18 | 140,592,166 | 0.954 | 14.53 | 0.946 | 0.998 | 0.990 | 0.993 | 0.020 |
| 19 | 120,575,426 | 0.955 | 14.53 | 0.949 | 0.996 | 0.986 | 0.988 | 0.028 |
| 20 | 131,305,824 | 0.962 | 14.29 | 0.957 | 0.998 | 0.980 | 0.988 | 0.038 |
| 21 | 140,828,990 | 0.964 | 15.43 | 0.959 | 0.997 | 0.989 | 0.992 | 0.020 |
| 22 | 134,838,280 | 0.965 | 14.33 | 0.960 | 0.998 | 0.966 | 0.977 | 0.064 |
| 23 | 115,132,512 | 0.951 | 13.51 | 0.944 | 0.997 | 0.988 | 0.989 | 0.024 |
| 24 | 112,104,986 | 0.953 | 13.84 | 0.947 | 0.998 | 0.989 | 0.990 | 0.022 |
SC: SNP concordance; GC: genotype concordance; NRS: non-reference sensitivity; NRD: non-reference discrepancy
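The footnote names the concordance metrics but does not define them. The sketch below shows how GC, NRS and NRD could be computed from paired genotype calls, assuming genotypes are coded as alternate-allele counts (0, 1, 2) and assuming the commonly used definitions: NRS is the share of truly non-reference genotypes that are also called non-reference, and NRD is the share of discordant genotypes after excluding sites where both calls are homozygous reference. The function name and the coding are assumptions, and the authors' exact formulas may differ. SC is left out because its definition cannot be inferred from the table.

```python
def concordance_metrics(truth, called):
    """Return (GC, NRS, NRD) for two genotype vectors coded 0/1/2.

    Formulas follow a common convention and are assumptions here,
    not definitions taken from the source.
    """
    if len(truth) != len(called) or not truth:
        raise ValueError("genotype vectors must be non-empty and equal in length")

    pairs = list(zip(truth, called))
    total = len(pairs)
    concordant = sum(t == c for t, c in pairs)
    concordant_ref = sum(t == 0 and c == 0 for t, c in pairs)
    truth_nonref = sum(t > 0 for t, _ in pairs)
    both_nonref = sum(t > 0 and c > 0 for t, c in pairs)

    gc = concordant / total
    nrs = both_nonref / truth_nonref if truth_nonref else float("nan")
    # NRD ignores sites where both calls are homozygous reference.
    informative = total - concordant_ref
    nrd = (total - concordant) / informative if informative else float("nan")
    return gc, nrs, nrd


if __name__ == "__main__":
    chip_calls = [0, 0, 1, 1, 2, 2, 0, 1]      # toy "truth" genotypes
    sequence_calls = [0, 0, 1, 2, 2, 2, 1, 0]  # toy genotypes under evaluation
    print(concordance_metrics(chip_calls, sequence_calls))
```

Because NRD drops the concordant homozygous-reference sites, it is a stricter measure than 1 − GC when most genotypes are homozygous reference.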
Fig. 2 Average imputation accuracy of direct and two-step imputation obtained with FImpute and Beagle on four chromosomes (chr1, chr3, chr6 and chr28) across 5 replications. 60K_WGS was the direct imputation from 60 K to WGS data. 60K_600K_WGS was the two-step imputation from 60 K to 600 K data and then to WGS data. 600K_WGS was the direct imputation from 600 K to WGS data. Imputation accuracy was measured as the genotype concordance between the true and imputed genotypes
Fig. 3 Average imputation accuracies for different X with fixed N (N = 24) obtained with FImpute and Beagle on four chromosomes (chr1, chr3, chr6 and chr28). Imputation accuracy was measured as the genotype concordance between the true and imputed genotypes
Fig. 4 Average imputation accuracies at 12 X with different N (1 to 24) obtained with FImpute and Beagle from 5 replications on chromosome 6. Imputation accuracy was measured as the genotype concordance between the true and imputed genotypes
Fig. 5 Average imputation accuracy for different total costs obtained with FImpute and Beagle on four chromosomes (chr1, chr3, chr6 and chr28) across 5 replications. A given total cost was defined as the number of sequenced individuals multiplied by the sequence read depth of each individual. Imputation accuracy was measured as the genotype concordance between the true and imputed genotypes
Fig. 6 Average imputation accuracies of the different software against minor allele frequency (MAF) across 5 replications. SNPs were classified by their array-derived MAF
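The captions above define imputation accuracy as genotype concordance between true and imputed genotypes, with SNPs grouped by array-derived MAF. A minimal sketch of that grouping follows, assuming 0/1/2 genotype coding. The bin edges, function names and toy data are illustrative assumptions and are not taken from the source.

```python
from bisect import bisect_right


def maf(genotypes):
    """Minor allele frequency from alternate-allele counts coded 0/1/2."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1 - alt_freq)


def accuracy_by_maf_bin(snps, edges=(0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5)):
    """Genotype concordance per MAF bin.

    `snps` is an iterable of (array_genotypes, true_genotypes,
    imputed_genotypes) per SNP. The bin edges are assumed values.
    """
    n_bins = len(edges) - 1
    matches = [0] * n_bins
    counts = [0] * n_bins
    for array_gt, true_gt, imputed_gt in snps:
        # A MAF equal to an edge goes to the bin starting at that edge;
        # min() keeps MAF == 0.5 inside the last bin.
        idx = min(bisect_right(edges, maf(array_gt)) - 1, n_bins - 1)
        matches[idx] += sum(t == i for t, i in zip(true_gt, imputed_gt))
        counts[idx] += len(true_gt)
    return {
        (edges[k], edges[k + 1]): matches[k] / counts[k]
        for k in range(n_bins)
        if counts[k]
    }


if __name__ == "__main__":
    toy_snps = [
        # MAF 0.15 on the array, all three validation genotypes imputed correctly
        ([0, 1, 0, 1, 0, 0, 1, 0, 0, 0], [0, 1, 0], [0, 1, 0]),
        # MAF 0.45 on the array, two of three imputed correctly
        ([1, 1, 2, 0, 1, 0, 2, 1, 0, 1], [1, 2, 0], [1, 1, 0]),
    ]
    for bin_range, accuracy in accuracy_by_maf_bin(toy_snps).items():
        print(bin_range, round(accuracy, 3))
```

Pooling matches and counts within a bin weights each genotype equally; averaging per-SNP concordances instead would weight each SNP equally, which is a design choice the captions do not specify.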
Summary of imputation from 600 K to WGS data
| Chr. | SNP # in sequence | SNP # in chip | SNP # for validation | Total time, s (Beagle) | Total time, s (FImpute) |
|---|---|---|---|---|---|
| 1 | 3,177,578 | 81,074 | 1,621 | 403,680 | 3,066 |
| 3 | 1,694,589 | 45,917 | 918 | 232,456 | 1,773 |
| 6 | 622,557 | 17,762 | 355 | 90,281 | 727 |
| 28 | 74,114 | 3,866 | 77 | 1,788 | 111 |
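Two quantities can be derived directly from this table: the ratio of Beagle to FImpute running time, and the share of chip SNPs held out for validation. The short sketch below does that arithmetic using only the values in the table. The variable and function names are illustrative.

```python
# (chromosome, SNPs in chip, SNPs for validation, Beagle seconds, FImpute seconds)
ROWS = [
    ("1", 81_074, 1_621, 403_680, 3_066),
    ("3", 45_917, 918, 232_456, 1_773),
    ("6", 17_762, 355, 90_281, 727),
    ("28", 3_866, 77, 1_788, 111),
]


def summarize(rows):
    """Map chromosome -> (Beagle/FImpute time ratio, validation share of chip SNPs)."""
    return {
        chrom: (beagle_s / fimpute_s, n_valid / n_chip)
        for chrom, n_chip, n_valid, beagle_s, fimpute_s in rows
    }


if __name__ == "__main__":
    for chrom, (time_ratio, valid_share) in summarize(ROWS).items():
        print(f"chr{chrom}: time ratio {time_ratio:.1f}, validation share {valid_share:.2%}")
```

On these numbers the validation share is close to 2% of chip SNPs on every chromosome, and the time ratio is much smaller on chr28 than on the three larger chromosomes.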