Alexander F Gileta, Jianjun Gao, Apurva S Chitre, Hannah V Bimschleger, Celine L St Pierre, Shyam Gopalakrishnan, Abraham A Palmer.
Abstract
The heterogeneous stock (HS) is an outbred rat population derived from eight inbred rat strains. HS rats are ideally suited for genome-wide association studies; however, only a few genotyping microarrays have ever been designed for rats, and none of them are currently in production. To address the need for an efficient and cost-effective method of genotyping HS rats, we have adapted genotyping-by-sequencing (GBS) to obtain genotype information at large numbers of single nucleotide polymorphisms (SNPs). In this paper, we outline the laboratory and computational steps we took to optimize double-digest genotyping-by-sequencing (ddGBS) for use in rats. We evaluated multiple existing computational tools and describe the workflow we used to call and impute over 3.7 million SNPs. We also compared various rat genetic maps, which are necessary for imputation, including a recently developed map specific to the HS. Using our approach, we obtained concordance rates of 99% with data from a genotyping array. The principles and computational pipeline that we describe could easily be adapted for use in other species for which reliable reference genome sets are available.
Keywords: genotyping-by-sequencing; heterogeneous stock; imputation; rat
Year: 2020 PMID: 32398234 PMCID: PMC7341140 DOI: 10.1534/g3.120.401325
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 2. In silico digest fragment distributions for PstI and potential secondary restriction enzymes. Each panel represents an independent digest of rn6 with the listed enzyme(s). Regions highlighted in blue are fragments that would be selected by the Pippin Prep (125-275 bp) after annealing adapters and primers (+125 bp). These regions are quantified in Table 1 by multiplying the length of the fragments by the number of fragments to estimate the portion of the genome captured.
Table 1. Restriction enzyme options for double digest. The "% Genome in Region" columns give the percentage of the genome that falls within the stated fragment size ranges and can therefore be captured by GBS.
| Restriction Enzyme(s) | Recognition sequence | Length of Overhang (bp) | % Genome in 250-400bp Region | % Genome in 300-450bp Region |
|---|---|---|---|---|
| PstI | CTGCA^G | 4 | 0.48% | 0.56% |
| PstI + AluI | AG^CT | 0 | 3.06% | 2.88% |
| PstI + BfaI | C^TAG | 2 | 3.10% | 3.25% |
| PstI + DpnI | GA^TC | 0 | 2.69% | 3.00% |
| PstI + HaeIII | GG^CC | 0 | 2.71% | 2.79% |
| PstI + MluCI | ^AATT | 4 | 3.32% | 3.21% |
| PstI + MspI | C^CGG | 2 | 1.16% | 1.24% |
| PstI + NlaIII | CATG^ | 4 | 3.45% | 3.31% |
Calculated using an rn6 genome length of 2,870,182,909 bp.
Restriction enzyme is methylation sensitive.
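The "% Genome in Region" calculation described for Table 1 can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names (`cut_positions`, `digest`, `percent_in_window`) and the toy sequence are hypothetical, and the sketch assumes that every fragment between adjacent cut sites counts, regardless of which enzyme produced each end. Recognition sequences and cut offsets are taken from the table; because all listed sites are palindromic, scanning one strand is sufficient.

```python
import re

# Recognition site and cut offset (position of "^" within the site), from Table 1.
ENZYMES = {
    "PstI": ("CTGCAG", 5),    # CTGCA^G
    "NlaIII": ("CATG", 4),    # CATG^
    "MspI": ("CCGG", 1),      # C^CGG
    "MluCI": ("AATT", 0),     # ^AATT
}

def cut_positions(seq, site, offset):
    """Cut coordinates for every (possibly overlapping) occurrence of site."""
    pattern = f"(?={re.escape(site)})"  # lookahead finds overlapping matches
    return [m.start() + offset for m in re.finditer(pattern, seq)]

def digest(seq, enzyme_names):
    """Return fragment lengths after cutting seq with all named enzymes."""
    cuts = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        cuts.update(cut_positions(seq, site, offset))
    # Drop cuts at the sequence ends so no zero-length fragments appear.
    bounds = [0] + sorted(c for c in cuts if 0 < c < len(seq)) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

def percent_in_window(fragment_lengths, genome_length, lo, hi):
    """Percent of the genome contained in fragments with lo <= length <= hi."""
    captured = sum(n for n in fragment_lengths if lo <= n <= hi)
    return 100.0 * captured / genome_length

# Toy sequence (hypothetical): one PstI site and one NlaIII site.
toy = "A" * 10 + "CTGCAG" + "T" * 20 + "CATG" + "A" * 5
fragments = digest(toy, ["PstI", "NlaIII"])
print(fragments)                                               # [15, 25, 5]
print(round(percent_in_window(fragments, len(toy), 20, 30), 2))  # 55.56
```

For a real genome, the window passed to `percent_in_window` would be the insert size range; under one reading of the Figure 2 caption, the 250-400 bp column corresponds to genomic fragments of 125-275 bp once the 125 bp of adapters and primers is subtracted.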
Figure 1. ddGBS sequencing data analysis workflow. Each step of the workflow is described in the text.
Figure 3. Genotype discordance rates between array data and variants called by GATK/Beagle or ANGSD-SAMtools/Beagle. The figure compares the number of variants called by the combination of ANGSD-SAMtools and Beagle or GATK HaplotypeCaller and Beagle at various thresholds of genotype discordance with array data. Calls were made using the 96 HS rats with array data. The x-axis represents the genotype discordance rate thresholds and the y-axis is the number of variants that surpass that threshold for each genotype calling method.
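The per-variant comparison summarized in Figure 3 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: genotypes are assumed to be encoded as alternate-allele counts (0, 1, 2) with `None` for missing calls, the function names are hypothetical, and "surpass" is read literally as a discordance rate strictly greater than the threshold. Reversing the comparison gives the complementary count.

```python
def discordance_rate(array_calls, seq_calls):
    """Fraction of mismatching genotypes among samples called by both methods."""
    pairs = [(a, s) for a, s in zip(array_calls, seq_calls)
             if a is not None and s is not None]
    if not pairs:
        return None  # no sample was called by both methods
    return sum(a != s for a, s in pairs) / len(pairs)

def variants_surpassing(array_gt, seq_gt, thresholds):
    """For each threshold, count shared variants whose discordance exceeds it."""
    rates = []
    for variant_id in array_gt.keys() & seq_gt.keys():
        rate = discordance_rate(array_gt[variant_id], seq_gt[variant_id])
        if rate is not None:
            rates.append(rate)
    return {t: sum(r > t for r in rates) for t in thresholds}

# Hypothetical toy data: three variants genotyped in four samples.
array_gt = {"snp1": [0, 1, 2, None], "snp2": [0, 0, 1, 1], "snp3": [2, 2, 2, 2]}
seq_gt = {"snp1": [0, 1, 1, 2], "snp2": [0, 0, 1, 1], "snp3": [0, 2, None, 2]}

print(variants_surpassing(array_gt, seq_gt, [0.0, 0.1, 0.5]))
# {0.0: 2, 0.1: 2, 0.5: 0}
```

Samples missing from either call set are excluded from the denominator here, so the discordance rate is kept separate from the genotyping rate.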
Imputation accuracy based on different variant reference panels for IMPUTE2. The table includes five possible reference panels for imputation. The 42 inbred strains, the 34 non-founder inbred strains, and the 8 HS founders drawn from the 42 inbred strains were all derived from Hermsen (Hermsen ). The UMich 8 HS founders were obtained from Ramdas (Ramdas ). The final set of 8 HS founders was taken from Baud (Rat Genome Sequencing and Mapping Consortium ).
| Reference panel | Metric | Chr1 | Chr2 |
|---|---|---|---|
| 42 inbred strains | Discordance rate | 0.011 | 0.010 |
| | # Variants | 790,659 | 882,993 |
| | Genotyping rate | 0.85 | 0.81 |
| All 34 non-founder inbred strains | Discordance rate | 0.035 | 0.030 |
| | # Variants | 812,550 | 912,749 |
| | Genotyping rate | 0.84 | 0.80 |
| 8 HS founders only from the 42 inbred strains | Discordance rate | 0.012 | 0.011 |
| | # Variants | 805,424 | 902,061 |
| | Genotyping rate | 0.57 | 0.53 |
| UMich 8 HS founders only | Discordance rate | 0.0059 | 0.008 |
| | # Variants | 865,514 | 898,621 |
| | Genotyping rate | 0.42 | 0.41 |
| Baud | Discordance rate | 0.0095 | 0.0096 |
| | # Variants | 507,909 | 540,844 |
| | Genotyping rate | 0.43 | 0.40 |