Beware the Jaccard: the choice of similarity measure is important and non-trivial in genomic colocalisation analysis.

Stefania Salvatore¹, Knut Dagestad Rand², Ivar Grytten¹, Egil Ferkingstad³, Diana Domanska¹, Lars Holden⁴, Marius Gheorghe⁵, Anthony Mathelier⁵,⁶, Ingrid Glad², Geir Kjetil Sandve¹.

Abstract

The generation and systematic collection of genome-wide data is ever-increasing. This vast amount of data has enabled researchers to study relations between a variety of genomic and epigenomic features, including genetic variation, gene regulation and phenotypic traits. Such relations are typically investigated by comparatively assessing genomic co-occurrence. Technically, this corresponds to assessing the similarity of pairs of genome-wide binary vectors. A variety of similarity measures have been proposed for this problem in other fields like ecology. However, while several of these measures have been employed for assessing genomic co-occurrence, their appropriateness for the genomic setting has never been investigated. We show that the choice of similarity measure may strongly influence results and propose two alternative modelling assumptions that can be used to guide this choice. On both simulated and real genomic data, the Jaccard index is strongly altered by dataset size and should be used with caution. The Forbes coefficient (fold change) and tetrachoric correlation are less influenced by dataset size, but one should be aware of increased variance for small datasets. All results on simulated and real data can be inspected and reproduced at https://hyperbrowser.uio.no/sim-measure.
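As a rough illustration of what "similarity of pairs of genome-wide binary vectors" means, the sketch below computes the Jaccard index and the Forbes coefficient for two toy 0/1 tracks. The formulas are the common textbook definitions (intersection over union, and observed overlap divided by the overlap expected under independence); they are assumed here rather than taken from the paper, which may define the measures at a different resolution. The function names and the toy tracks are hypothetical, and tetrachoric correlation is left out because it needs a numerical fit.

```python
import math


def overlap_counts(x, y):
    """Return (n, |x|, |y|, |x AND y|) for two equal-length 0/1 sequences."""
    if len(x) != len(y):
        raise ValueError("tracks must have the same length")
    n = len(x)
    nx = sum(1 for a in x if a)
    ny = sum(1 for b in y if b)
    both = sum(1 for a, b in zip(x, y) if a and b)
    return n, nx, ny, both


def jaccard(x, y):
    """Jaccard index: covered-by-both divided by covered-by-either."""
    _, nx, ny, both = overlap_counts(x, y)
    union = nx + ny - both
    return both / union if union else math.nan


def forbes(x, y):
    """Forbes coefficient: observed overlap over overlap expected by chance."""
    n, nx, ny, both = overlap_counts(x, y)
    return n * both / (nx * ny) if nx and ny else math.nan


# Hypothetical toy tracks: 1 means the position is covered by the feature.
track_a = [1, 1, 0, 0, 1, 0, 0, 0]
track_b = [1, 0, 0, 0, 1, 1, 0, 0]

print(jaccard(track_a, track_b))  # 2 / 4 = 0.5
print(forbes(track_a, track_b))   # 8 * 2 / (3 * 3), about 1.78
```

A Forbes value of 1 corresponds to the overlap expected if the two tracks were independent, which is why it is read as a fold change. The Jaccard index has no such reference point, since its value also depends on how many positions each track covers.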

Keywords: fold enrichment; genomic track similarity; similarity indices; similarity measures; statistical genomics

Year: 2019 PMID: 31624847 DOI: 10.1093/bib/bbz083

Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622

Cited

4 in total

1. To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics.

Authors: R A Leo Elworth; Qi Wang; Pavan K Kota; C J Barberan; Benjamin Coleman; Advait Balaji; Gaurav Gupta; Richard G Baraniuk; Anshumali Shrivastava; Todd J Treangen
Journal: Nucleic Acids Res Date: 2020-06-04 Impact factor: 16.971

2. Comparison of the copy-neutral loss of heterozygosity identified from whole-exome sequencing data using three different tools.

Authors: Gang-Taik Lee; Yeun-Jun Chung
Journal: Genomics Inform Date: 2022-03-31

3. LXRα Regulates ChREBPα Transactivity in a Target Gene-Specific Manner through an Agonist-Modulated LBD-LID Interaction.

Authors: Qiong Fan; Rikke Christine Nørgaard; Ivar Grytten; Cecilie Maria Ness; Christin Lucas; Kristin Vekterud; Helen Soedling; Jason Matthews; Roza Berhanu Lemma; Odd Stokke Gabrielsen; Christian Bindesbøll; Stine Marie Ulven; Hilde Irene Nebb; Line Mariann Grønning-Wang; Thomas Sæther
Journal: Cells Date: 2020-05-13 Impact factor: 6.600

4. Integrating Peak Colocalization and Motif Enrichment Analysis for the Discovery of Genome-Wide Regulatory Modules and Transcription Factor Recruitment Rules.

Authors: Mirko Ronzio; Federico Zambelli; Diletta Dolfini; Roberto Mantovani; Giulio Pavesi
Journal: Front Genet Date: 2020-02-21 Impact factor: 4.599