
Machine learning algorithms for simultaneous supervised detection of peaks in multiple samples and cell types.

Toby Dylan Hocking, Guillaume Bourque.

Abstract

Joint peak detection is a central problem when comparing samples in epigenomic data analysis, but current algorithms for this task are unsupervised and limited to at most 2 sample types. We propose PeakSegPipeline, a new genome-wide multi-sample peak calling pipeline for epigenomic data sets. It performs peak detection using a constrained maximum likelihood segmentation model with essentially only one free parameter that needs to be tuned: the number of peaks. To select the number of peaks, we propose to learn a penalty function based on user-provided labels that indicate genomic regions with or without peaks in specific samples. In comparisons with state-of-the-art peak detection algorithms, PeakSegPipeline achieves similar or better accuracy, and a more interpretable model with overlapping peaks that occur in exactly the same positions across all samples. Our novel approach is able to learn that predicted peak sizes vary by experiment type.
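The label-based selection of the number of peaks described in the abstract can be illustrated with a small sketch. This is not the PeakSegPipeline implementation: it assumes the candidate segmentations (one per peak count, each with a loss value) are already computed, it replaces the learned penalty function with a single constant penalty picked by grid search, and it uses only two simplified label kinds. All names (`select_model`, `label_errors`, `best_penalty`) and the toy numbers are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Peak = Tuple[int, int]        # (start, end) of a predicted peak
Label = Tuple[int, int, str]  # (start, end, "peaks" or "noPeaks")


def select_model(losses: Dict[int, float], penalty: float) -> int:
    """Return the peak count that minimizes loss + penalty * peak count."""
    # Ties are broken in favor of the smaller peak count.
    return min(losses, key=lambda k: (losses[k] + penalty * k, k))


def label_errors(peaks: List[Peak], labels: List[Label]) -> int:
    """Count labels that disagree with the predicted peaks."""
    errors = 0
    for start, end, kind in labels:
        overlaps = any(p_start < end and start < p_end
                       for p_start, p_end in peaks)
        # A "peaks" label expects an overlapping peak, "noPeaks" expects none.
        if (kind == "peaks") != overlaps:
            errors += 1
    return errors


def best_penalty(losses: Dict[int, float],
                 peaks_by_count: Dict[int, List[Peak]],
                 labels: List[Label],
                 penalties: Sequence[float]) -> float:
    """Grid search: return the first penalty with the fewest label errors."""
    def errors_for(penalty: float) -> int:
        chosen = select_model(losses, penalty)
        return label_errors(peaks_by_count[chosen], labels)
    return min(penalties, key=errors_for)


# Toy data (made up for illustration): loss decreases as peaks are added.
losses = {0: 100.0, 1: 60.0, 2: 55.0, 3: 54.0}
peaks_by_count = {
    0: [],
    1: [(100, 200)],
    2: [(100, 200), (400, 450)],
    3: [(100, 200), (400, 450), (700, 720)],
}
labels = [(90, 210, "peaks"), (380, 470, "noPeaks")]
penalties = [0.1, 3.0, 20.0, 50.0]

chosen_penalty = best_penalty(losses, peaks_by_count, labels, penalties)
chosen_peaks = select_model(losses, chosen_penalty)
```

In this toy setting, small penalties keep a peak inside the "noPeaks" region and a very large penalty removes the peak required by the "peaks" label, so only an intermediate penalty agrees with both labels.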


Year: 2020    PMID: 31797611

Source DB: PubMed    Journal: Pac Symp Biocomput    ISSN: 2335-6928


