Oleg Shpynov1,2, Aleksei Dievskii1, Roman Chernyatchik1,2, Petr Tsurinov1,2, Maxim N Artyomov2.
Abstract
The widespread application of ChIP-seq led to a growing need for consistent analysis of multiple epigenetic profiles, for instance, in human studies where multiple replicates are a common element of design. Such multisample experimental designs introduced analytical and computational challenges. For example, when peak calling is done independently for each sample, small differences in signal strength/quality lead to very different numbers of peaks for individual samples, making group-level analysis difficult. On the other hand, when samples are pooled together for joint analysis, individual-level statistical differences are averaged out. Recently we have demonstrated that a semi-supervised peak calling approach (SPAN) allows for robust analysis of multiple epigenetic profiles while preserving individual sample statistics. Here, we present this approach's implementation, centered around the JBR genome browser, a stand-alone tool that allows for accessible and streamlined annotation, analysis, and visualization. Specifically, JBR supports graphical interactive manual region selection and annotation, thereby addressing supervised learning's key procedural challenge. Furthermore, JBR includes the capability for peak optimization, i.e., calibration of sample-specific peak calling parameters by leveraging manual annotation. This procedure can be applied to a broad range of ChIP-seq datasets of varying quality and to chromatin accessibility (ATAC-seq) data, including single-cell experiments. JBR was designed for efficient data processing, resulting in fast viewing and analysis of multiple replicates, up to thousands of tracks. Accelerated execution and integrated semi-supervised peak calling make JBR and SPAN next-generation visualization and analysis tools for multisample epigenetic data.
AVAILABILITY: SPAN and JBR run on Linux, Mac OS, and Windows, and are freely available at https://research.jetbrains.org/groups/biolabs/tools/span-peak-analyzer and https://research.jetbrains.org/groups/biolabs/tools/jbr-genome-browser. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Year: 2021 PMID: 34019098 PMCID: PMC9502234 DOI: 10.1093/bioinformatics/btab376
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.931
Fig. 1. General scheme of the semi-supervised peak calling pipeline with SPAN and the JBR Genome Browser. 1) Left: unsupervised SPAN model training for each individual sample. Coverage is computed for the genome split into consecutive non-overlapping windows. Optional control track coverage is scaled down proportionally to the treatment coverage and is subtracted from the treatment. The SPAN three-state HMM is trained with the Baum-Welch (EM) algorithm. 2) Right: supervised annotation markup creation in JBR. The user uploads bigWig tracks for visualization and creates a handful of annotations: peaks, no peaks, peak start, peak end. 3) The SPAN model and the annotation markup are used in the semi-supervised hyperparameter tuning procedure. The FDR and GAP parameters are used to detect enriched windows from the HMM (the red dashed line marks a false discovery rate of 0.05) and to merge close windows into peaks. The FDR and GAP combination is optimized to minimize the total number of unsatisfied markup annotations and to produce the final peaks track.
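Step 3 of the scheme can be illustrated with a minimal Python sketch. This is not SPAN's actual code: the function names (`call_peaks`, `count_errors`, `tune`) are hypothetical, per-window q-values are assumed to have been derived from the trained HMM already, coordinates are window indices with half-open intervals, and the annotation semantics ("peaks" requires at least one overlapping peak, "noPeaks" none, "peakStart" and "peakEnd" exactly one peak boundary inside the region) are an assumption about how a label counts as satisfied.

```python
from itertools import product


def call_peaks(qvalues, fdr, gap):
    """Mark windows with q-value <= fdr as enriched and merge enriched windows
    separated by at most `gap` non-enriched windows into half-open peaks."""
    peaks = []
    for i, q in enumerate(qvalues):
        if q > fdr:
            continue
        if peaks and i - peaks[-1][1] <= gap:
            peaks[-1][1] = i + 1          # extend the previous peak
        else:
            peaks.append([i, i + 1])      # open a new peak
    return [tuple(p) for p in peaks]


def count_errors(peaks, annotations):
    """Count annotations (kind, start, end) not satisfied by the peaks.
    The satisfaction rules below are an assumed reading of the four labels."""
    errors = 0
    for kind, start, end in annotations:
        overlaps = sum(1 for s, e in peaks if s < end and e > start)
        starts = sum(1 for s, _ in peaks if start <= s < end)
        ends = sum(1 for _, e in peaks if start < e <= end)
        satisfied = {
            "peaks": overlaps >= 1,
            "noPeaks": overlaps == 0,
            "peakStart": starts == 1,
            "peakEnd": ends == 1,
        }[kind]
        errors += 0 if satisfied else 1
    return errors


def tune(qvalues, annotations, fdrs, gaps):
    """Grid search: return the (fdr, gap) pair with the fewest unsatisfied
    annotations; ties go to the first pair in grid order."""
    return min(
        product(fdrs, gaps),
        key=lambda p: count_errors(call_peaks(qvalues, *p), annotations),
    )


# Toy example: ten windows and three manual annotations.
qvalues = [0.9, 0.01, 0.02, 0.5, 0.03, 0.9, 0.9, 0.9, 0.2, 0.9]
annotations = [("peaks", 1, 5), ("noPeaks", 6, 10), ("peakStart", 0, 5)]
best_fdr, best_gap = tune(qvalues, annotations, fdrs=[0.05, 0.3], gaps=[0, 1])
```

In this toy example the stricter FDR keeps the weakly enriched window 8 out of the "noPeaks" region, and a GAP of one window merges windows 1 to 4 into a single peak, so only one peak start falls inside the "peakStart" region.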