Geoffrey E Pollott, Richard J Piercy, Claire Massey, Mazdak Salavati, Zhangrui Cheng, D Claire Wathes.
Abstract
New Mendelian genetic conditions that adversely affect livestock arise all the time. To manage them effectively, quick and accurate methods are needed. Until recently, finding the causal genomic site of a new autosomal recessive genetic disease has required a two-stage approach: single-nucleotide polymorphism (SNP) chip genotyping to locate the region containing the new variant, followed by fine-mapping of that region to locate the actual site of the variant. This study explores bioinformatic methods that can identify the causative variants of fully penetrant recessive genetic disorders from just nine whole genome-sequenced animals, simplifying and expediting the process to a one-step procedure. Using whole genome sequencing of only three cases and six carriers, the site of a novel variant causing perinatal mortality in Irish Moiled calves was located. Four methods were used to interrogate the variant call format (VCF) data file of these nine animals: genotype criteria (GCR), autozygosity-by-difference (ABD), variant prediction scoring, and registered SNP information. From more than nine million variants in the VCF file, only one site was identified by all four methods (Chr4: g.77173487A>T (ARS-UCD1.2 (GCF_002263795.1))). This site was a splice acceptor variant located in the glucokinase gene (GCK). It was verified on an independent sample of animals from the breed using genotyping by polymerase chain reaction at the candidate site and autozygosity-by-difference using SNP chips. Both methods confirmed the candidate site. Investigation of the GCR method found that sites meeting the GCR were not evenly spread across the genome but were concentrated in regions of long runs of homozygosity. Locating GCR sites was best performed using two carriers to every case, with carriers that were distantly related to the cases within the breed concerned.
Fewer than 20 animals need to be sequenced when using the GCR and ABD methods together. The genomic site of novel autosomal recessive Mendelian genetic diseases can be located using fewer than 20 animals combined with two bioinformatic methods, autozygosity-by-difference and genotype criteria. In many instances, it may also be confirmed with variant prediction scoring. This should speed up and simplify the management of new genetic diseases to a single-step process.
Keywords: Irish Moiled; WGS; cattle; glucokinase gene; perinatal mortality; recessive genetics; runs of homozygosity
Year: 2022 PMID: 36105082 PMCID: PMC9465091 DOI: 10.3389/fgene.2022.755693
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
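The genotype criteria (GCR) step named in the abstract can be illustrated with a small filter over VCF records. This is a minimal sketch, not the authors' pipeline. It assumes the criteria for a fully penetrant recessive variant are that every case is homozygous for the alternate allele and every carrier is heterozygous, which the abstract does not spell out. The function names, sample names, and toy positions are hypothetical.

```python
def parse_gt(sample_field):
    """Return the genotype as a sorted tuple of allele strings, e.g. ('0', '1')."""
    gt = sample_field.split(":")[0].replace("|", "/")  # treat phased as unphased
    return tuple(sorted(gt.split("/")))


def meets_gcr(genotypes, cases, carriers):
    """ASSUMED criteria: all cases hom-alt (1/1), all carriers het (0/1)."""
    return (all(genotypes[s] == ("1", "1") for s in cases)
            and all(genotypes[s] == ("0", "1") for s in carriers))


def gcr_sites(vcf_lines, cases, carriers):
    """Return (chrom, pos) for every record that meets the assumed criteria."""
    samples, hits = [], []
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow FORMAT
            continue
        genotypes = {s: parse_gt(f) for s, f in zip(samples, fields[9:])}
        if meets_gcr(genotypes, cases, carriers):
            hits.append((fields[0], int(fields[1])))
    return hits


# Toy VCF: positions and sample names are invented for illustration only.
TOY_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tcase1\tcarrier1\tcarrier2",
    "4\t100\t.\tA\tT\t50\tPASS\t.\tGT\t1/1\t0/1\t1|0",
    "4\t200\t.\tG\tC\t50\tPASS\t.\tGT\t1/1\t1/1\t0/1",
    "4\t300\t.\tC\tG\t50\tPASS\t.\tGT\t0/1\t0/1\t0/1",
]

hits = gcr_sites(TOY_VCF, cases=["case1"], carriers=["carrier1", "carrier2"])
print(hits)
```

Only the first toy record passes: the second has a homozygous carrier and the third a heterozygous case. Missing genotypes (`./.`) never match either pattern, so such sites are excluded rather than guessed.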
Summary of results by chromosome.
| BTA | Length (bp) | Number of SNV in VCF file | Number of indels in VCF file | Estimated number of GCR | Actual number of GCR | Number of SIFT sites | Number of NoRS9 sites |
|---|---|---|---|---|---|---|---|
| 1 | 158,534,110 | 535,135 | 101,947 | 32 | 582 | 251 | 694 |
| 2 | 136,231,102 | 452,534 | 82,255 | 27 | 4 | 279 | 165 |
| 3 | 121,005,158 | 394,128 | 70,162 | 24 | 0 | 383 | 271 |
| 4 | 120,000,601 | 373,759 | 72,506 | 23 | 22 | 485 | 347 |
| 5 | 120,089,316 | 401,056 | 71,836 | 24 | 2 | 458 | 173 |
| 6 | 117,806,340 | 362,930 | 71,311 | 22 | 4 | 174 | 221 |
| 7 | 110,682,743 | 358,947 | 67,655 | 22 | 2 | 491 | 250 |
| 8 | 113,319,770 | 319,261 | 62,634 | 19 | 3 | 204 | 486 |
| 9 | 105,454,467 | 328,559 | 62,502 | 20 | 2 | 179 | 185 |
| 10 | 103,308,737 | 353,791 | 63,775 | 21 | 1 | 293 | 162 |
| 11 | 106,982,474 | 324,386 | 58,178 | 19 | 9 | 337 | 134 |
| 12 | 87,216,183 | 335,302 | 63,750 | 20 | 21 | 139 | 148 |
| 13 | 83,472,345 | 232,616 | 43,728 | 14 | 2 | 249 | 188 |
| 14 | 82,403,003 | 262,990 | 49,112 | 16 | 1 | 108 | 135 |
| 15 | 85,007,780 | 285,748 | 54,791 | 17 | 2 | 374 | 160 |
| 16 | 81,013,979 | 268,622 | 49,356 | 16 | 0 | 225 | 388 |
| 17 | 73,167,244 | 278,406 | 50,191 | 17 | 5 | 198 | 108 |
| 18 | 65,820,629 | 194,623 | 37,810 | 12 | 0 | 522 | 384 |
| 19 | 63,449,741 | 227,124 | 39,664 | 14 | 0 | 449 | 110 |
| 20 | 71,974,595 | 230,092 | 43,456 | 14 | 1 | 110 | 84 |
| 21 | 69,862,954 | 202,249 | 38,102 | 12 | 2 | 205 | 261 |
| 22 | 60,773,035 | 180,922 | 33,961 | 11 | 1 | 147 | 164 |
| 23 | 52,498,615 | 255,467 | 42,220 | 15 | 58 | 566 | 337 |
| 24 | 62,317,253 | 228,628 | 39,037 | 14 | 3 | 83 | 109 |
| 25 | 42,350,435 | 142,293 | 25,212 | 9 | 0 | 225 | 43 |
| 26 | 51,992,305 | 177,147 | 33,190 | 11 | 1 | 133 | 120 |
| 27 | 45,612,108 | 175,471 | 32,421 | 11 | 1 | 99 | 88 |
| 28 | 45,940,150 | 179,521 | 31,214 | 11 | 1 | 92 | 99 |
| 29 | 51,098,607 | 172,660 | 31,952 | 10 | 0 | 306 | 358 |
| Total | 2,489,385,779 | 8,234,367 | 1,523,928 | 496 | 730 | 7,764 | 6,372 |
BTA, chromosome number; GCR, genotype criteria sites; NoRS9, number of sites with no RS number and with at least one alternate allele in all nine genotypes. “Estimated number of GCR sites” assumes an even spread across the genome. “Number of SIFT sites” was the number of sites with a “HIGH” SIFT score.
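The gap between the estimated and actual GCR columns can be made explicit with a simple ratio. The sketch below uses values copied from the table for five chromosomes plus the column total. The `enrichment` and `share_of_total` helpers are hypothetical names introduced here for illustration, not measures used in the study.

```python
# (estimated, actual) GCR site counts copied from the table above
GCR_COUNTS = {1: (32, 582), 3: (24, 0), 4: (23, 22), 12: (20, 21), 23: (15, 58)}
TOTAL_ACTUAL = 730  # "Total" row, actual number of GCR sites


def enrichment(estimated, actual):
    """Ratio of observed to expected GCR sites under an even spread."""
    return actual / estimated


def share_of_total(actual, total=TOTAL_ACTUAL):
    """Fraction of all GCR sites that fall on one chromosome."""
    return actual / total


for bta, (est, act) in GCR_COUNTS.items():
    print(f"BTA{bta}: enrichment {enrichment(est, act):.2f}, "
          f"share {share_of_total(act):.1%}")
```

A ratio near 1 means the count roughly matches an even spread, while a ratio far above 1 marks a chromosome where GCR sites cluster.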
FIGURE 1. Manhattan plots of the ABD analysis of nine WGS animals (Kb). p < 0.01 at ABD score = 9,034 Kb, where the ABD score on the y axis is the mean length in cases minus that in controls at each site.
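The ABD score in the caption is a difference of two means, which can be sketched in a few lines. The sketch assumes that "length" means the length of the run of homozygosity covering the site in each animal, counted as 0 when no run covers it. The caption does not state this, and the run coordinates below are invented.

```python
from statistics import mean


def roh_length_at(site_kb, runs):
    """Length (Kb) of the run covering the site, or 0 if none does.

    `runs` is a list of (start_kb, end_kb) tuples for one animal.
    """
    for start, end in runs:
        if start <= site_kb <= end:
            return end - start
    return 0


def abd_score(site_kb, case_runs, control_runs):
    """Mean covering-run length in cases minus that in controls (Kb)."""
    case_mean = mean(roh_length_at(site_kb, r) for r in case_runs)
    control_mean = mean(roh_length_at(site_kb, r) for r in control_runs)
    return case_mean - control_mean


# Toy data: one list of runs per animal, coordinates in Kb (hypothetical).
cases = [[(70000, 82000)], [(72000, 80000)]]
controls = [[(76000, 77500)], []]

score = abd_score(77000, cases, controls)
print(score)
```

Sites where cases share long homozygous runs that controls lack receive large positive scores, which is what stands out as a peak in a Manhattan plot of this statistic.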
Overlap between the four methods for locating a likely novel variant site.
| Method | Genotype criteria (GCR) | Autozygosity-by-difference (ABD) | High-impact SIFT score (SIFT) | No registered SNP number (NoRS9) |
|---|---|---|---|---|
| GCR | | | | |
| ABD | 22 (10) | | | |
| SIFT | 1 (1) | 12 (12) | | |
| NoRS9 | 78 (8) | 41 (40) | 8 (1) | |
| GCR+ABD | 1 (1) | 8 (8) | | |
| GCR+SIFT | 1 (1) | | | |
| ABD+SIFT | 1 (1) | | | |
| GCR+ABD+SIFT | 1 (1) | | | |
The table shows the number of sites in the final VCF file identified by each method (numbers in the BTA4 high-ABD region shown in parentheses).
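The overlaps counted in this table amount to set intersections over the sites flagged by each method. A minimal sketch follows. Only the candidate site comes from the source; every other coordinate is invented, and `shared_sites` is a hypothetical helper name.

```python
CANDIDATE = ("4", 77173487)  # Chr4: g.77173487A>T, as reported in the abstract

# Toy hit lists per method; all non-candidate coordinates are made up.
method_hits = {
    "GCR": {CANDIDATE, ("1", 1000), ("23", 2000)},
    "ABD": {CANDIDATE, ("4", 3000)},
    "SIFT": {CANDIDATE, ("4", 3000), ("7", 4000)},
    "NoRS9": {CANDIDATE, ("1", 1000), ("4", 3000)},
}


def shared_sites(hits_by_method, methods=None):
    """Sites flagged by every named method (all methods if none are named)."""
    names = list(methods) if methods is not None else list(hits_by_method)
    return set.intersection(*(hits_by_method[m] for m in names))


print(shared_sites(method_hits))                   # all four methods
print(shared_sites(method_hits, ["ABD", "SIFT"]))  # one pairwise overlap
```

Keying each site by `(chromosome, position)` keeps the comparison independent of how each method formats its own output.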
Animal status by genotype for the 41 Sanger-sequenced animals at Chr4: g.77173487A>T (ARS-UCD1.2 (GCF_002263795.1)).
| Animal status | AA | AT | TT | Total |
|---|---|---|---|---|
| Calves | 4 | 2 | 7 | 13 |
| Known adult carriers (live) | 0 | 6 | 0 | 6 |
| Status unknown adults (live) | 13 | 9 | 0 | 22 |
| Total | 17 | 17 | 7 | 41 |
FIGURE 2. Manhattan plot of the ABD analysis of the SNP-chip data, based on the genotypes found in the PCR analysis (Kb). These results were based on animals with phenotyping informed by the PCR results (p < 0.001 at ABD score = 7,023 Kb, where the ABD score on the y axis is the mean length in cases minus that in controls at each site).