Emmanuel Sapin, Matthew C Keller.
Abstract
MOTIVATION: Pairwise comparison problems arise in many areas of science. In genomics, datasets are already large and getting larger, so operations that require pairwise comparisons, either on pairs of SNPs or on pairs of individuals, are extremely computationally challenging. We propose a generic algorithm for addressing pairwise comparison problems that breaks a large problem (of order n² comparisons) into multiple smaller ones (each of order n comparisons), allowing for massive parallelization.
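The abstract does not say how the n² comparisons are split. Fig. 2 uses n = p² = 25, which hints at arranging entities on a p × p grid. As an illustrative assumption (not necessarily the authors' construction), the classic finite affine plane of prime order p does this kind of split: the n = p² entities are grouped into p(p+1) blocks of p entities each, and every pair of entities falls in exactly one block. Each block needs only p(p-1)/2 < n comparisons, and the blocks together cover p(p+1) · p(p-1)/2 = n(n-1)/2 pairs with no repeats. The function names below (`affine_plane_blocks`, `pairwise_by_blocks`) and the `compare` callback are hypothetical.

```python
from collections import Counter
from itertools import combinations


def affine_plane_blocks(p):
    """Group entities 0..p*p-1 into p*(p+1) blocks of size p.

    Entity k is identified with the grid point (x, y) = divmod(k, p).
    Blocks are the lines of the affine plane over the integers mod p,
    so p must be prime for every pair to land in exactly one block.
    """
    blocks = []
    # p vertical lines: x = c
    for c in range(p):
        blocks.append([c * p + y for y in range(p)])
    # p*p sloped lines: y = (m*x + b) mod p, one point per x value
    for m in range(p):
        for b in range(p):
            blocks.append([x * p + (m * x + b) % p for x in range(p)])
    return blocks


def pairwise_by_blocks(p, compare):
    """Run compare(i, j) on every unordered pair of the p*p entities exactly once.

    Each block is an independent job of p*(p-1)/2 comparisons (order n),
    so the outer loop is where work could be farmed out to separate nodes.
    """
    results = {}
    for block in affine_plane_blocks(p):  # hypothetical parallel job boundary
        for i, j in combinations(sorted(block), 2):
            results[(i, j)] = compare(i, j)
    return results


if __name__ == "__main__":
    p = 5  # n = p**2 = 25, the size shown in Fig. 2
    blocks = affine_plane_blocks(p)
    # Count how many blocks contain each unordered pair.
    coverage = Counter(
        pair for block in blocks for pair in combinations(sorted(block), 2)
    )
    print(len(blocks), len(coverage), set(coverage.values()))
```

For p = 5 this prints `30 300 {1}`: 30 blocks, all 300 pairs of the 25 entities covered, each exactly once. The prime-order requirement comes from needing a unique slope between any two grid points with different x coordinates. Real sample sizes rarely equal p² for a prime p, so a practical version would need padding or a different design.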
Year: 2021 PMID: 33705528 PMCID: PMC8352502 DOI: 10.1093/bioinformatics/btab084
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.931
Fig. 1. Matrix P, P1 and
Fig. 2. Matrices P, P1, P2, P3 and P4 for n = p² = 25
Fig. 3. Comparison of time (A) and RAM (B) performance of the standard versus novel approach for calling IBD segments using GERMLINE (with parameters -min_m 3.5 -bits 75 -err-het 1 -err-hom 1 -w_extend) as a function of different sized subsamples of the UK Biobank. We assume all 22 chromosomes could be run in parallel using the standard approach, and so RAM and time results for this approach are for the longest (2nd) chromosome. The red points are linear extrapolations of the standard approach results on the log-log scale for n = 435,187.
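The caption says the red points come from linear extrapolation on the log-log scale but does not state the fitting procedure. A minimal sketch, assuming an ordinary least-squares line fitted to log(y) against log(n) (where y is time or RAM), is shown below. A straight line on the log-log scale is a power law y = e^a · n^b, so the slope b is the empirical scaling exponent. The function name `loglog_extrapolate` is hypothetical, and the demo values are synthetic, not measurements from Fig. 3.

```python
import math


def loglog_extrapolate(ns, ys, n_target):
    """Fit log(y) = a + b*log(n) by ordinary least squares and predict y at n_target.

    Assumes all ns and ys are positive and at least two ns are distinct.
    """
    if len(ns) != len(ys) or len(ns) < 2:
        raise ValueError("need at least two (n, y) points of equal length")
    xs = [math.log(n) for n in ns]
    ls = [math.log(y) for y in ys]
    x_mean = sum(xs) / len(xs)
    l_mean = sum(ls) / len(ls)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sample sizes must not all be equal")
    sxl = sum((x - x_mean) * (l - l_mean) for x, l in zip(xs, ls))
    b = sxl / sxx  # slope on the log-log scale = scaling exponent
    a = l_mean - b * x_mean  # intercept on the log-log scale
    return math.exp(a + b * math.log(n_target))


if __name__ == "__main__":
    # Synthetic timings following y = 0.001 * n**2 exactly; NOT data from Fig. 3.
    ns = [1000, 5000, 20000, 50000]
    ys = [0.001 * n ** 2 for n in ns]
    print(loglog_extrapolate(ns, ys, 435187))
```

Fitting in log space weights relative rather than absolute errors. This is usually what one wants when subsample sizes span orders of magnitude. If the authors instead drew a line through only the largest measured points, the extrapolated values could differ.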