Shahab Sarmashghi, Vineet Bafna.
Abstract
Genome annotation remains a fundamental effort in modern biology. With falling costs and new forms of sequencing technologies, annotations specific to tissue type and experimental conditions are continually being generated (e.g., histone methylation marks). Computing the statistical significance of overlap between two different annotations is key to many biological findings but has not been systematically addressed previously. We formalize the problem as follows: let I and I_f describe collections of n and m intervals, respectively, of a genome with a particular annotation. Under the null hypothesis that genomic intervals in I are randomly arranged with respect to I_f, what is the significance of k of the m intervals of I_f intersecting with intervals in I? We describe a tool, iSTAT, that implements a combinatorial algorithm to accurately compute p values. We applied iSTAT to simulated and real datasets to obtain precise estimates and contrasted them against previous results using permutation or parametric tests.
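To make the problem statement concrete, the following is a minimal Python sketch of the permutation-style baseline that the abstract contrasts iSTAT against. It is not iSTAT's combinatorial algorithm, which the abstract does not detail. The function names (`count_overlaps`, `permutation_p_value`), the half-open integer intervals, the single linear genome of length `genome_length`, and the choice to let shuffled intervals of I overlap one another are all assumptions made for illustration.

```python
import random
from bisect import bisect_left


def count_overlaps(ref, query):
    """Return k: how many query intervals intersect at least one ref interval.

    Intervals are half-open (start, end) pairs on one linear genome (assumption).
    """
    # Merge ref intervals so that starts and ends are both increasing.
    merged = []
    for s, e in sorted(ref):
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    starts = [s for s, _ in merged]

    k = 0
    for qs, qe in query:
        # Index of the last merged interval that starts before the query ends.
        i = bisect_left(starts, qe) - 1
        if i >= 0 and merged[i][1] > qs:
            k += 1
    return k


def permutation_p_value(ref, query, genome_length, n_perm=1000, seed=0):
    """Empirical p value for observing at least k overlaps by chance.

    Null model (a simplification): each ref interval keeps its length and is
    placed uniformly at random; shuffled ref intervals may overlap each other.
    """
    rng = random.Random(seed)
    k_obs = count_overlaps(ref, query)
    lengths = [e - s for s, e in ref]

    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for length in lengths:
            s = rng.randint(0, genome_length - length)  # inclusive bounds
            shuffled.append((s, s + length))
        if count_overlaps(shuffled, query) >= k_obs:
            hits += 1

    # Add-one correction keeps the estimate strictly positive.
    return k_obs, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    ref = [(0, 10), (20, 30)]                         # plays the role of I
    query = [(5, 7), (10, 20), (25, 40), (50, 60)]    # plays the role of the second collection
    k, p = permutation_p_value(ref, query, genome_length=1000)
    print(k, p)
```

The smallest p value such a sketch can report is 1 / (n_perm + 1), so resolving very small p values requires a correspondingly large number of permutations; this resolution limit is a general property of permutation tests.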
Keywords: genome annotations; interval overlap; p value; statistical significance
Year: 2019 PMID: 31202632 PMCID: PMC7200088 DOI: 10.1016/j.cels.2019.05.006
Source DB: PubMed Journal: Cell Syst ISSN: 2405-4712 Impact factor: 10.304