Aaron Gu, Hyun Jae Cho, Nathan C Sheffield.
Abstract
Functional genomics experiments, like ChIP-Seq or ATAC-Seq, produce results that are summarized as a region set. There is no way to objectively evaluate the effectiveness of region set similarity metrics. We present Bedshift, a tool for perturbing BED files by randomly shifting, adding, and dropping regions from a reference file. The perturbed files can be used to benchmark similarity metrics, as well as for other applications. We highlight differences in behavior between metrics: for example, the Jaccard score is most sensitive to added or dropped regions, while the coverage score is most sensitive to shifted regions.
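The perturb-then-compare idea in the abstract can be illustrated with a minimal Python sketch. This is not Bedshift's implementation or interface: the function names (`perturb`, `jaccard`, `merge`, `covered_bp`), the parameters (`shift_sd`, `add_width`, `chrom_sizes`), the order in which perturbations are applied, and the base-pair definition of the Jaccard index are all assumptions made for illustration, and the paper's metrics may be defined differently.

```python
import random


def merge(regions):
    """Sort (chrom, start, end) regions and merge any that overlap or touch."""
    merged = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def covered_bp(regions):
    """Number of distinct base pairs covered by a region set."""
    return sum(end - start for _, start, end in merge(regions))


def jaccard(a, b):
    """Base-pair Jaccard index (assumed definition): |A and B| / |A or B|."""
    union = covered_bp(list(a) + list(b))
    if union == 0:
        return 1.0  # two empty region sets are treated as identical
    # Inclusion-exclusion gives the size of the intersection.
    intersection = covered_bp(a) + covered_bp(b) - union
    return intersection / union


def perturb(regions, chrom_sizes, shift_rate=0.0, drop_rate=0.0, add_rate=0.0,
            shift_sd=100, add_width=200, seed=None):
    """Hypothetical perturbation: randomly drop, shift, and add regions."""
    rng = random.Random(seed)
    out = []
    for chrom, start, end in regions:
        if rng.random() < drop_rate:
            continue  # drop this region
        if rng.random() < shift_rate:
            width = end - start
            offset = round(rng.gauss(0, shift_sd))
            # Keep the shifted region inside the chromosome, preserving width.
            start = min(max(0, start + offset), chrom_sizes[chrom] - width)
            end = start + width
        out.append((chrom, start, end))
    # Add new fixed-width regions at uniformly random positions.
    chroms = sorted(chrom_sizes)
    for _ in range(round(add_rate * len(regions))):
        chrom = rng.choice(chroms)
        start = rng.randrange(chrom_sizes[chrom] - add_width + 1)
        out.append((chrom, start, start + add_width))
    return sorted(out)


# Toy reference set and chromosome sizes (made-up values for the demo).
reference = [("chr1", 100, 300), ("chr1", 1000, 1400),
             ("chr2", 500, 900), ("chr2", 5000, 5200)]
chrom_sizes = {"chr1": 10_000, "chr2": 10_000}

perturbed = perturb(reference, chrom_sizes,
                    shift_rate=0.5, drop_rate=0.25, add_rate=0.25, seed=0)
print("Jaccard(reference, perturbed) =", jaccard(reference, perturbed))
```

Comparing the score of the unperturbed file (1.0 under this definition) with the score after each kind of perturbation gives a rough sense of how sensitive a metric is to shifting, dropping, or adding regions.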
Year: 2021 PMID: 34416909 PMCID: PMC8379854 DOI: 10.1186/s13059-021-02440-w
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Overview of simulation study and comparison of four similarity metrics. (A) One BED file was used to create 360 perturbed files, 10 repetitions for each of 36 different combinations of add, drop, and shift perturbations. (B) Examples of a demo region set that has been shifted to different degrees. (C) We calculated four similarity metrics between the original file and the perturbed files. The greater the similarity score decrease, the more sensitive the metric is. (D) Within each parameter group, as the perturbation increases, the similarity score decreases. (E) Results for shift, drop, and add-only perturbations. L, low; M, moderate; H, high perturbation
Fig. 2 Bedshift and similarity score results. (A) Similarity scores for shift, drop, and add perturbations, along with their pairwise combinations. (B) Summary of similarity score change for different perturbation levels measured. For a given scenario, each perturbation is classified as either “increasing,” “not used,” or “held constant.” To condense the information from panel A, the amount of decrease shown for each scenario is the average across all three levels in the corresponding scenario in panel A. (C) Summary of the sensitivity of each metric to the three different perturbations