Tongqiu Jia, Brenton Munson, Hana Lango Allen, Trey Ideker, Amit R Majithia
Abstract
The UK Biobank is an unprecedented resource for human disease research. In March 2019, 49,997 exomes were made publicly available to investigators. Here we report that thousands of variant calls are unexpectedly absent from this dataset, with 641 genes showing zero variation. We show that this was caused by erroneous read alignment to the GRCh38 reference. The missing variants can be recovered by modifying read-alignment parameters so that the expanded set of contigs in the human genome reference is handled correctly. Given the size and complexity of such population-scale datasets, we propose a simple heuristic that can uncover systematic errors using summary data accessible to most investigators.
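The abstract does not spell out the heuristic. The sketch below shows one minimal way to apply the underlying idea to summary data. It assumes you have per-gene variant counts for the cohort and for an independent reference panel, such as an aggregate resource like gnomAD. It then flags genes that are polymorphic in the reference but show zero variants in the cohort. The function name, the `min_reference_variants` threshold, and the toy counts are hypothetical illustrations, not the authors' exact method or data.

```python
from typing import Mapping


def flag_zero_variation_genes(
    cohort_counts: Mapping[str, int],
    reference_counts: Mapping[str, int],
    min_reference_variants: int = 10,  # hypothetical threshold, tune per dataset
) -> list[str]:
    """Return genes with no variant calls in the cohort despite variation in a reference panel.

    A gene that is clearly polymorphic in a large reference panel but shows
    zero variants across tens of thousands of exomes is more likely a
    processing artifact (e.g. reads lost during alignment) than true biology.
    """
    flagged = []
    for gene, ref_n in reference_counts.items():
        # Genes missing from the cohort summary are treated as having zero calls.
        cohort_n = cohort_counts.get(gene, 0)
        if cohort_n == 0 and ref_n >= min_reference_variants:
            flagged.append(gene)
    return sorted(flagged)


# Toy, made-up per-gene variant counts (illustrative only, not UK Biobank data).
cohort = {"GENE_A": 152, "GENE_B": 0, "GENE_C": 37}
reference = {"GENE_A": 210, "GENE_B": 95, "GENE_C": 40, "GENE_D": 3, "GENE_E": 60}

print(flag_zero_variation_genes(cohort, reference))  # ['GENE_B', 'GENE_E']
```

Genes with little variation in the reference (GENE_D here) fall below the threshold and are not flagged, since they may be genuinely constrained or short. The check is deliberately crude: it needs only counts, not reads. It is best used to prioritize genes for closer inspection rather than as a definitive error call.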
Keywords: DNA; exome; genetics; sequence alignment; sequence analysis
Year: 2020 PMID: 32232836 PMCID: PMC7402360 DOI: 10.1111/ahg.12383
Source DB: PubMed Journal: Ann Hum Genet ISSN: 0003-4800 Impact factor: 1.670