Qifan Zeng, Baojun Zhao, Hao Wang, Mengqiu Wang, Mingxuan Teng, Jingjie Hu, Zhenmin Bao, Yangfan Wang.
Abstract
Understanding population structure, dissecting the genetic bases of performance traits, and devising sound selection strategies are vital to breeding programs. However, no single web server covers the specific needs of aquaculture. We present the Aquaculture Molecular Breeding Platform (AMBP), the first web server for genetic data analysis in aquatic species of farming interest. AMBP integrates haplotype reference panels for 18 aquaculture species, which greatly improves the accuracy of genotype imputation. It also provides multiple tools to infer genetic structure, dissect the genetic architecture of performance traits, estimate breeding values, and predict optimum contributions. All tools are coherently linked in a web interface so that users can generate interpretable results and evaluate statistical appropriateness. The web server accepts standard VCF and PLINK (PED, MAP) files and implements automated pipelines for format conversion and visualization to simplify analysis. As a demonstration, we applied the web server to Pacific white shrimp and Atlantic salmon datasets. In summary, AMBP provides comprehensive resources and analytical tools for exploring genetic data and guiding practical breeding programs. AMBP is available at http://mgb.qnlm.ac.
Year: 2022 PMID: 35639514 PMCID: PMC9252723 DOI: 10.1093/nar/gkac424
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 19.160
Figure 1. Functions of AMBP fall into three major categories. (A) High-density haplotype reference panels of 18 aquaculture species are implemented in the pipeline so that users can impute genotype data from low-density markers. The imputation quality and LD patterns can be visualized online for better interpretation. (B) Users can characterize population structure and pairwise kinship from genotype data. Genetic structure inferred from ancestral components and PCA can be used to validate sample information. (C) Users can dissect the genetic landscape of performance traits from genotype and phenotype data by genome-wide association (GWA) analysis. Genomic estimated breeding values (GEBVs) and mating allocations can be predicted by genomic selection (GS) and genomic mating (GM). The effects of GS and GM can be further compared by simulating multiple generations.
Data summary of the haplotype reference panels in AMBP
| Species | Genomic Assembly | No. of Samples | No. of SNPs |
|---|---|---|---|
| | ASM378908v1 | 180 | 3,926,527 |
| | ASM1920278v1 | 43 | 11,595,609 |
| | QAU_Airr_1.1 | 40 | 11,325,844 |
| | cgigas_uk_roslin_v1 | 220 | 3,873,608 |
| | Cse_v1.0 | 53 | 1,207,365 |
| | Flounder_ref_guided_V1.0 | 120 | 6,367,725 |
| | L_crocea_2.0 | 253 | 9,998,395 |
| | ASM402354v1 | 99 | 7,036,107 |
| | dlabrax2021 | 76 | 6,531,380 |
| | O_niloticus_UMD_NMBU | 166 | 5,164,107 |
| | gadMor3.0 | 220 | 5,936,877 |
| | ICSASG_v2 | 281 | 5,873,967 |
| | Okis_V2 | 60 | 7,332,974 |
| | USDA_OmykA_1.1 | 179 | 17,012,766 |
| | Ogor_1.0 | 62 | 6,544,976 |
| | TakRub1.2 | 61 | 1,888,317 |
| | Ajp_01 | 84 | 20,550,847 |
| | fAngAng1 | 97 | 11,934,548 |
The imputation accuracy using reference panels in AMBP
| Species | Glimpse CR (mean ± SD) | Glimpse R2 (mean ± SD) | Beagle CR (mean ± SD) | Beagle R2 (mean ± SD) |
|---|---|---|---|---|
| | 0.984 ± 0.008 | 0.938 ± 0.033 | 0.927 ± 0.014 | 0.937 ± 0.033 |
| | 0.925 ± 0.016 | 0.789 ± 0.045 | 0.881 ± 0.005 | 0.732 ± 0.014 |
| | 0.958 ± 0.028 | 0.905 ± 0.063 | 0.856 ± 0.029 | 0.741 ± 0.067 |
| | 0.956 ± 0.030 | 0.854 ± 0.103 | 0.894 ± 0.013 | 0.714 ± 0.038 |
| | 0.924 ± 0.050 | 0.712 ± 0.124 | 0.875 ± 0.058 | 0.607 ± 0.099 |
| | 0.959 ± 0.018 | 0.855 ± 0.064 | 0.907 ± 0.006 | 0.738 ± 0.022 |
| | 0.949 ± 0.022 | 0.882 ± 0.068 | 0.856 ± 0.015 | 0.738 ± 0.092 |
| | 0.966 ± 0.019 | 0.919 ± 0.053 | 0.889 ± 0.044 | 0.795 ± 0.084 |
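The two accuracy measures in this table can be illustrated with a minimal sketch. It assumes that CR is the concordance rate (the fraction of imputed genotypes that match the true genotypes) and that R2 is the squared Pearson correlation between true and imputed genotypes; the source does not define either measure, so both readings are assumptions. The function names and the toy genotypes are hypothetical and are not taken from AMBP.

```python
def concordance_rate(true_gt, imputed_gt):
    """Fraction of genotypes (coded 0/1/2) that were imputed exactly right."""
    if len(true_gt) != len(imputed_gt) or not true_gt:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    matches = sum(t == i for t, i in zip(true_gt, imputed_gt))
    return matches / len(true_gt)


def squared_correlation(true_gt, imputed_gt):
    """Squared Pearson correlation between true and imputed genotypes."""
    n = len(true_gt)
    mean_t = sum(true_gt) / n
    mean_i = sum(imputed_gt) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true_gt, imputed_gt))
    var_t = sum((t - mean_t) ** 2 for t in true_gt)
    var_i = sum((i - mean_i) ** 2 for i in imputed_gt)
    if var_t == 0 or var_i == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov * cov / (var_t * var_i)


# Toy data (hypothetical): six masked SNPs of one sample, alternate-allele counts.
truth = [0, 1, 2, 1, 0, 2]
imputed = [0, 1, 2, 2, 0, 2]

cr = concordance_rate(truth, imputed)
r2 = squared_correlation(truth, imputed)
print(f"CR = {cr:.3f}, R2 = {r2:.3f}")
```

In a real evaluation the two functions would be applied per sample or per SNP over masked genotypes, and the means and standard deviations of the results would then be reported as in the table.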
Figure 2. AMBP deciphered the genetic structure of shrimp populations. (A) Sample clustering with known sampling sources. AP: ancestral population. (B) Sample clustering based on genomic ancestral compositions in the absence of sampling information. PCA with (C) and without (D) sampling information. (E) Ancestral components for each cluster. Ancestry_1: the largest ancestral population in the cluster; Avg. Q of Ancestry_1: the average fraction of Ancestry_1 in the cluster; Ancestry_2: the second-largest ancestral population in the cluster; Avg. Q of Ancestry_2: the average fraction of Ancestry_2 in the cluster. (F) Distribution of the pairwise kinship coefficient and IBS0. IBS0: proportion of genotypes with zero identity-by-state (IBS). (G) Close relatives in the population. Avg. HetHet: the average proportion of SNPs with double heterozygotes; Avg. IBS0: the average proportion of genotypes with zero IBS; Avg. PropIBD: the average proportion of the genome shared identical-by-descent; Avg. Kinship: the average kinship coefficient.
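The IBS0 and HetHet statistics named in the caption can be sketched for a single pair of individuals as follows. This is a minimal illustration and not AMBP's implementation: it assumes genotypes are coded as alternate-allele counts (0, 1, 2), that missing calls are stored as `None`, and that SNPs missing in either individual are excluded from the denominator. The function names and the toy genotypes are hypothetical.

```python
def _shared_calls(gt_a, gt_b):
    """Keep only SNPs genotyped in both individuals."""
    pairs = [(a, b) for a, b in zip(gt_a, gt_b) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no SNPs genotyped in both individuals")
    return pairs


def ibs0(gt_a, gt_b):
    """Proportion of SNPs where the pair shares no allele (opposite homozygotes)."""
    pairs = _shared_calls(gt_a, gt_b)
    return sum(abs(a - b) == 2 for a, b in pairs) / len(pairs)


def het_het(gt_a, gt_b):
    """Proportion of SNPs where both individuals are heterozygous."""
    pairs = _shared_calls(gt_a, gt_b)
    return sum(a == 1 and b == 1 for a, b in pairs) / len(pairs)


# Toy genotypes (hypothetical) for two individuals at eight SNPs.
ind_a = [0, 1, 2, 2, 1, None, 0, 1]
ind_b = [2, 1, 2, 0, 1, 1, 0, 0]

pair_ibs0 = ibs0(ind_a, ind_b)
pair_hethet = het_het(ind_a, ind_b)
print(f"IBS0 = {pair_ibs0:.3f}, HetHet = {pair_hethet:.3f}")
```

A parent and its offspring share at least one allele at every SNP, so their IBS0 is expected to be close to zero apart from genotyping errors. This is why IBS0 is useful alongside the kinship coefficient when classifying close relatives.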
Figure 3. AMBP dissected the genetic mechanism of a performance trait and guided salmon breeding. (A) Global LD pattern generated by ‘Online imputation’; numbers indicate the SNP coordinates on the chromosomes. (B) Manhattan plot of marker statistical significance across the whole genome. (C) Top significant SNPs revealed by GWA analysis. (D) Mating allocation and optimum numbers of offspring for each pair. (E) Simulated effects of GS and GM over 30 consecutive generations.
Prediction accuracy for dataset prior to and after imputation
| Fold | Original genotypes: RR-GBLUP (%) | Original genotypes: Bayes Lasso (%) | Original genotypes: SNN (%) | After imputation: RR-GBLUP (%) | After imputation: Bayes Lasso (%) | After imputation: SNN (%) | Improvement: RR-GBLUP (%) | Improvement: Bayes Lasso (%) | Improvement: SNN (%) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 56.80 | 48.18 | 28.69 | 54.81 | 52.03 | 57.17 | -1.98 | 3.85 | 28.48 |
| 2 | 17.19 | 23.28 | 10.21 | 15.47 | 21.78 | 15.81 | -1.72 | -1.49 | 5.59 |
| 3 | 61.21 | 34.44 | 34.39 | 68.23 | 30.72 | 70.11 | 7.02 | -3.72 | 35.72 |
| 4 | 54.13 | 44.77 | 61.30 | 62.67 | 37.32 | 61.78 | 8.53 | -7.45 | 0.47 |
| 5 | 22.77 | 9.97 | 30.38 | 39.52 | 19.36 | 43.05 | 16.74 | 9.38 | 12.67 |
| 6 | 56.01 | 52.07 | 52.35 | 68.68 | 58.43 | 68.48 | 12.67 | 6.36 | 16.12 |
| 7 | 40.19 | 30.19 | 45.59 | 48.83 | 42.16 | 48.26 | 8.64 | 11.97 | 2.66 |
| 8 | 63.76 | 53.62 | 51.03 | 65.05 | 63.61 | 65.08 | 1.28 | 9.97 | 14.05 |
| 9 | 52.01 | 46.92 | 43.66 | 58.95 | 31.95 | 54.05 | 6.95 | -14.96 | 10.38 |
| 10 | 22.70 | 11.18 | 26.98 | 26.09 | 21.10 | 25.86 | 3.39 | 9.92 | -1.11 |
| Avg ACC | 44.68 | 35.46 | 38.46 | 50.83 | 37.85 | 50.97 | 6.15 | 2.38 | 12.51 |
| Sd ACC | 17.62 | 16.33 | 15.12 | 18.37 | 15.89 | 18.17 | 6.03 | 8.96 | 11.96 |
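The summary rows of this table can be re-derived from the per-fold values. The sketch below does this for the RR-GBLUP columns only, assuming that the improvement is the accuracy after imputation minus the accuracy of the original genotypes and that "Sd ACC" is the sample standard deviation (n − 1 denominator); both assumptions match the reported figures to two decimals. Individual per-fold differences can differ from the table in the last digit, presumably because the table was computed from unrounded accuracies.

```python
from statistics import mean, stdev

# RR-GBLUP prediction accuracy (%) per fold, copied from the table above.
original = [56.80, 17.19, 61.21, 54.13, 22.77, 56.01, 40.19, 63.76, 52.01, 22.70]
imputed = [54.81, 15.47, 68.23, 62.67, 39.52, 68.68, 48.83, 65.05, 58.95, 26.09]

# Improvement per fold: accuracy after imputation minus accuracy before.
improvement = [after - before for before, after in zip(original, imputed)]

avg_original = mean(original)
avg_imputed = mean(imputed)
avg_improvement = mean(improvement)
sd_improvement = stdev(improvement)  # sample SD, n - 1 in the denominator

print(f"Avg ACC original:  {avg_original:.2f}")
print(f"Avg ACC imputed:   {avg_imputed:.2f}")
print(f"Avg improvement:   {avg_improvement:.2f}")
print(f"Sd of improvement: {sd_improvement:.2f}")
```

The same calculation can be repeated for the Bayes Lasso and SNN columns by swapping in their per-fold values.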
Functional innovations of AMBP compared with other webservers
| Functions | AMBP | Michigan Imputation Server | Animal-ImputeDB | Plant-ImputeDB | StructuRly | SNiPlay | CASSAVABASE | easyGWAS |
|---|---|---|---|---|---|---|---|---|
| Reference panels for aquaculture species | | | | | | | | |
| Imputation | | | | | | | | |
| Illustration of LD pattern | | | | | | | | |
| Quality control | | | | | | | | |
| Ancestry estimation | | | | | | | | |
| PCA analysis | | | | | | | | |
| Kinship inference | | | | | | | | |
| GWAS analysis | | | | | | | | |
| Genomic prediction | | | | | | | | |
| Neural network model | | | | | | | | |
| Cross validation | | | | | | | | |
| Genomic mating | | | | | | | | |
| Simulation analysis | | | | | | | | |
| Total numbers | | | | | | | | |