Maria Bernard1,2, Audrey Dehaullon1, Guangtu Gao3, Katy Paul1, Henri Lagarde1, Mathieu Charles1,2, Martin Prchal4, Jeanne Danon5, Lydia Jaffrelo5, Charles Poncet5, Pierre Patrice6, Pierrick Haffray6, Edwige Quillet1, Mathilde Dupont-Nivet1, Yniv Palti3, Delphine Lallias1, Florence Phocas1.
Abstract
Single nucleotide polymorphism (SNP) arrays, also called "SNP chips", enable very large numbers of individuals to be genotyped at a targeted set of thousands of markers identified genome-wide. We used preexisting variant datasets from USDA, a French commercial line and 30X-coverage whole-genome sequencing of INRAE isogenic lines to develop an Affymetrix 665 K SNP array (HD chip) for rainbow trout. In total, we identified 32,372,492 SNPs that were polymorphic in the USDA or INRAE databases. A subset of the identified SNPs was selected for inclusion on the chip, prioritizing SNPs whose flanking sequence aligned uniquely to the Swanson reference genome, that were evenly distributed over the genome, and that had the highest minor allele frequency (MAF) in both the USDA and French databases. Of the 664,531 SNPs that passed the Affymetrix quality filters and were manufactured on the HD chip, 65.3% and 60.9% passed filtering metrics and were polymorphic in two other distinct French commercial populations, in which 288 and 175 sampled fish, respectively, were genotyped. Only 576,118 SNPs mapped uniquely on both the Swanson and Arlee reference genomes, and 12,071 SNPs did not map at all on the Arlee reference genome. Among those 576,118 SNPs, 38,948 were retained from the commercially available medium-density 57 K SNP chip. We demonstrate the utility of the HD chip by describing the high levels of linkage disequilibrium at 2-10 kb in the rainbow trout genome in comparison to the linkage disequilibrium observed at 50-100 kb, which are the usual distances between markers on the medium-density chip.
Keywords: SNP; doubled haploid lines; high-density chip; isogenic lines; linkage disequilibrium; rainbow trout; sequence; single nucleotide polymorphism
Year: 2022 PMID: 35923696 PMCID: PMC9340366 DOI: 10.3389/fgene.2022.941340
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
FIGURE 1. Process for submitted SNPs for inclusion on a high-density genotyping array.
FIGURE 2. Marker density per Mb for the HD Trout Affymetrix array with 664,503 SNPs positioned on the 29 chromosomes of the Swanson genome reference.
FIGURE 3. SNP density per Mb for the INRAE_USDA full variant dataset (32.4M SNPs) located on the 29 chromosomes of the Swanson genome reference.
FIGURE 4. MAF distribution of USDA or INRAE SNP datasets. These datasets have been filtered to keep bi-allelic SNPs with a MAF > 1% in their respective populations.
FIGURE 5. Marker density per Mb for the HD Trout Affymetrix array with 576,118 SNPs positioned on the 32 chromosomes of the Arlee reference genome.
FIGURE 6. Distribution of SNPs according to their MAF class in the LB and LC French commercial lines.
FIGURE 7. LD decay from 2 to 100 kb intermarker distances (average over the 32 chromosomes) for the LB and LC French commercial lines and the HA American population.
FIGURE 8. Average linkage disequilibrium (r² values) from 2 to 1,000 kb derived for all chromosomes and only for Omy5 or Omy13 in populations LB, LC, and HA, respectively.
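The MAF filter and the LD-by-distance summaries mentioned in the abstract and captions can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes unphased genotypes coded as 0/1/2 copies of one allele and uses the squared Pearson correlation between allele counts as a common approximation of r². The function names, the distance bins, and the toy data in the usage note are hypothetical.

```python
import math
from statistics import mean


def minor_allele_frequency(genotypes):
    """MAF from diploid genotypes coded as 0/1/2 copies of one allele."""
    p = sum(genotypes) / (2 * len(genotypes))
    return min(p, 1 - p)


def genotype_r2(g1, g2):
    """Squared Pearson correlation of allele counts at two SNPs.

    Assumption: used here as a stand-in for haplotype-based r2,
    since the genotypes are unphased. Returns NaN if a SNP is monomorphic.
    """
    m1, m2 = mean(g1), mean(g2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return float("nan")
    return cov * cov / (v1 * v2)


def mean_r2_by_distance(positions_bp, genotype_matrix, bins_kb):
    """Average r2 over SNP pairs whose distance falls in each (low, high] kb bin.

    positions_bp: one position per SNP on a single chromosome.
    genotype_matrix: one list of 0/1/2 genotypes per SNP (same fish order).
    """
    totals = {b: [0.0, 0] for b in bins_kb}
    n = len(positions_bp)
    for i in range(n):
        for j in range(i + 1, n):
            d_kb = abs(positions_bp[j] - positions_bp[i]) / 1000
            for low, high in bins_kb:
                if low < d_kb <= high:
                    r2 = genotype_r2(genotype_matrix[i], genotype_matrix[j])
                    if not math.isnan(r2):
                        totals[(low, high)][0] += r2
                        totals[(low, high)][1] += 1
    # Bins with no informative pair are omitted from the result.
    return {b: s / c for b, (s, c) in totals.items() if c}
```

For example, `mean_r2_by_distance([1000, 4000, 60000], genotypes, [(2, 10), (50, 100)])` returns one average per bin, mirroring the 2-10 kb versus 50-100 kb comparison in the abstract. The all-pairs loop is quadratic in the number of SNPs, so for a chip of this size one would restrict pairs to a sliding window or use a dedicated tool.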