Yixing Han1, Ximiao He2.
Abstract
Epigenetics is one of the most rapidly expanding fields in biomedical research, and the popularity of high-throughput next-generation sequencing (NGS) has accelerated epigenomics discovery over the past decade. Epigenetics studies heritable phenotypes that result from chromatin changes without alteration of the DNA sequence. Epigenetic factors and their interaction networks regulate almost all fundamental biological processes, and incorrect epigenetic information may lead to complex diseases. A comprehensive, genome-wide understanding of epigenetic mechanisms, their interactions, and their alterations in health and disease has become a priority in biological research. Bioinformatics is expected to make a remarkable contribution to this goal, especially in processing and interpreting large-scale NGS datasets. In this review, we introduce pioneering achievements of epigenetics in health and complex diseases; next, we give a systematic review of epigenomics data generation, summarize public resources and integrative analysis approaches, and finally outline the challenges and future directions in computational epigenomics.
Keywords: DNA methylation; NGS; chromatin; computational epigenomics; epigenetics; histone modification; integrative analysis; ncRNAs
Year: 2016 PMID: 27980397 PMCID: PMC5138066 DOI: 10.4137/BBI.S38427
Source DB: PubMed Journal: Bioinform Biol Insights ISSN: 1177-9322
Main epigenomics data generation methods.
| APPLICATION | METHODS | PRINCIPLE | REFS |
|---|---|---|---|
| DNA methylation pattern detection | Methylated DNA immunoprecipitation (MeDIP) | Purified DNA is immunoprecipitated with an antibody against methylated cytosines, giving rise to genomic maps of DNA methylation | |
| | Bisulfite sequencing | Bisulfite treatment converts unmethylated cytosines to uracils | |
| | Reduced representation bisulfite sequencing (RRBS) | Combines restriction enzymes and bisulfite sequencing to enrich for areas of the genome that have a high CpG content | |
| Histone modification pattern detection, chromatin-binding protein pattern detection | ChIP-chip | Specific antibodies are used to enrich DNA fragments at modification sites, followed by array hybridization | |
| | ChIP-seq | Specific antibodies are used to enrich DNA fragments at modification sites, followed by high-throughput sequencing | |
| 3D structure of chromatin | DNase-seq | At DNase I hypersensitive sites (DHSs), chromatin is sensitive to cleavage by the DNase I enzyme. These accessible chromatin zones are functionally related to transcriptional activity | |
| | Hi-C chromosome conformation capture technique | Chromosome contacts are captured by formaldehyde cross-linking | |
| RNA-protein and RNA-DNA interaction | RIP-chip | Specific antibodies are used to immunoprecipitate RNA fragments at RNA-binding sites, followed by reverse transcription and microarray analysis | |
| | RIP-seq | Specific antibodies are used to immunoprecipitate RNA fragments at RNA-binding sites, followed by reverse transcription and high-throughput sequencing | |
| | CLIP-seq | UV cross-linking with immunoprecipitation is used to analyze protein interactions with RNA and to precisely locate RNA-protein binding sites and RNA modifications. Among the modified versions, PAR-CLIP (photoactivatable-ribonucleoside-enhanced CLIP) can improve the signal-to-noise ratio, and iCLIP (individual-nucleotide resolution CLIP) can achieve higher efficiency in reverse transcription | |
| | ChIRP-seq | Biotin-labeled oligos complementary to the RNA of interest are hybridized to cross-linked chromatin fragments to capture biotin-oligo-RNA-DNA-protein complexes; DNA is then isolated from the complexes for high-throughput sequencing to reveal RNA-DNA interactions | |
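
The bisulfite sequencing principle in the table above (unmethylated cytosines are converted, methylated cytosines are protected) can be illustrated with a small sketch. This is a toy model under stated assumptions, not the procedure of any specific aligner or caller: it considers only the forward strand, assumes the read is already aligned to the reference without gaps, and treats converted cytosines as `T` (uracil is read as thymine after amplification). The function names `bisulfite_convert` and `call_methylation` are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment on the forward strand.

    Unmethylated C is read as T after conversion and amplification;
    methylated C is protected and stays C.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, read):
    """Infer methylation at each reference C from an ungapped aligned read.

    Returns {position: True/False}; positions where the read shows neither
    C nor T (for example, a sequencing error) are left uncalled.
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = True    # protected, so methylated
        elif read_base == "T":
            calls[i] = False   # converted, so unmethylated
    return calls


reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions=[1])
calls = call_methylation(reference, read)
print(read)   # ACGTTTGA
print(calls)  # {1: True, 4: False, 5: False}
```

Real pipelines must also handle the reverse strand (where conversion appears as G-to-A), incomplete conversion, and sequencing errors, which this sketch ignores.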
Software and tools for epigenomic data analysis.
| SOFTWARE/TOOL | DESCRIPTION | URL | REFS |
|---|---|---|---|
| GSNAP | A wild-card bisulfite aligner included in a general-purpose alignment tool (Genomic Short-read Nucleotide Alignment Program) | ||
| LAST | A wild-card bisulfite aligner included in a general-purpose alignment tool | ||
| RMAP | A wild-card bisulfite aligner included in a general-purpose alignment tool | ||
| segemehl | A wild-card bisulfite aligner included in a general-purpose alignment tool | ||
| Bismark | A widely used three-letter bisulfite aligner based on Bowtie/Bowtie2 | ||
| BRAT | A bisulfite-treated reads tool using the three-letter alignment | ||
| BS-Seeker | A three-letter bisulfite aligner based on Bowtie | ||
| MethylCoder | A three-letter bisulfite aligner based on Bowtie/GSNAP | ||
| BSMAP | A widely used wild-card aligner for bisulfite sequencing reads | ||
| Pash | A wild-card bisulfite aligner using gapped k-mer and multi-positional hash table | ||
| BISMA | Mapping and clustering of bisulfite sequencing data for individual clones from unique and repetitive sequences | ||
| BRAT-BW | A fast, accurate and memory-efficient BS aligner using the FM-index (Burrows-Wheeler transform) | ||
| B-SOLANA | An aligner for bisulfite-sequencing data from ABI SOLiD sequencers | ||
| RRBSMAP | A wild-card aligner for RRBS reads | ||
| BiSeq | An R package for detecting differentially methylated regions (DMRs) in BS data | ||
| bumphunter | Bump hunting to identify differentially methylated regions | ||
| DMRcate | An R package for detecting differentially methylated regions (DMRs) based on tunable kernel smoothing | ||
| IMA | An R package for high-throughput analysis of Illumina’s 450K Infinium methylation data | ||
| M3D | An R package for detecting differentially methylated regions (DMRs) using a non-parametric, kernel-based method | ||
| methylSig | An R package for detecting differentially methylated sites (DMCs) or regions (DMRs) using a beta-binomial model | ||
| metilene | A fast and sensitive tool for detecting DMR by a binary segmentation algorithm combined with a two-dimensional statistical test | ||
| MOABS | A tool for detecting differentially methylated sites (DMCs) or regions (DMRs) based on a beta-binomial hierarchical model, for data with relatively low CpG coverage (~10X) | ||
| NHMMfdr | An R package for detecting differential DNA methylation based on a non-homogeneous hidden Markov model (NHMM) by estimating false discovery rates (FDRs) | ||
| QDMR | A tool for detecting DMR based on Shannon entropy | ||
| BSmooth | BSmooth is a pipeline for analyzing whole-genome bisulfite sequencing (WGBS) data. It includes tools for aligning the data, quality control, and identifying differentially methylated regions (DMRs). | ||
| MethPipe | A computational pipeline for analyzing bisulfite sequencing data (WGBS and RRBS), including BS mapping (Wild-Card aligner) and DMR calling | ||
| RefFreeDMA | Mapping for RRBS reads and DMR calling without a reference genome | ||
| BWA | A fast and efficient lightweight tool that aligns short sequences to a sequence database; based on the Burrows-Wheeler transform | ||
| Bowtie | Ultrafast, memory-efficient short read aligner. Uses a Burrows-Wheeler-Transformed (BWT) index | ||
| ELAND | Efficient Large-Scale Alignment of Nucleotide Databases. Whole genome alignments to a reference genome | Illumina | |
| GenomeMapper | GenomeMapper is a short read mapping tool designed for accurate read alignments. It quickly aligns millions of reads either with ungapped or gapped alignments | ||
| GNUMAP | Genomic Next-generation Universal MAPper is a program designed to accurately map sequence data obtained from next-generation sequencing machines back to a genome of any size. It seeks to align reads from nonunique repeats using statistics | ||
| HiCUP | A tool for mapping and performing quality control on Hi-C data | ||
| GSNAP | Considers a set of variant allele inputs to better align to heterozygous sites | ||
| MAQ | Mapping and Assembly with Qualities (renamed from MAPASS2). Particularly designed for Illumina with preliminary functions to handle ABI SOLiD data | ||
| SOAP | SOAP (Short Oligonucleotide Alignment Program). A program for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences | ||
| SOAP2 | SOAP2 uses a Burrows-Wheeler transform (BWT) compression index in place of the seed strategy for indexing the reference sequence in main memory | ||
| ZOOM | ZOOM (Zillions Of Oligos Mapped) is designed to map millions of short reads generated by next-generation sequencing technology back to reference genomes and to carry out post-analysis | ||
| BroadPeak | A novel algorithm for identifying broad peaks in diffuse ChIP-seq datasets | ||
| MACS | MACS fits data to a dynamic Poisson distribution; works with and without control data | ||
| PeakSeq | PeakSeq takes into account differences in mappability of genomic regions; enrichment based on FDR calculation | ||
| SICER | A clustering approach for identification of enriched domains from histone modification ChIP-Seq data | ||
| SISSRS | A novel algorithm for precise identification of binding sites from short reads generated from ChIP-Seq experiments | ||
| ZINBA | ZINBA can incorporate multiple genomic factors, such as mappability and GC content; can work with point-source and broad-source peak data | ||
| baySeq | An R package that uses empirical Bayes approach to identify significant differences; assumes negative binomial distribution of data | ||
| ChIPDiff | A toolkit for genome-wide comparison of histone modification sites identified by ChIP-seq and for identification of differential histone modification sites (DHMSs); uses a binomial distribution together with the Baum-Welch expectation-maximization (EM) and forward-backward algorithms | ||
| edgeR | An R package that uses negative binomial distribution to model differences in tag counts; uses replicates to better estimate significant differences | ||
| DESeq | DESeq uses negative binomial distribution, but differs in the calculation of the mean and variance of the distribution | ||
| SAMSeq | SAMSeq is based on the popular SAM software; a non-parametric method that uses resampling to normalize for differences in sequencing depth | ||
| miRDeep | miRDeep was developed to discover active known or novel miRNAs from deep sequencing data; after adapter removal, a number of scripts preprocess and score the mapped data | ||
| miRDeep2 | miRDeep2 identifies known and novel miRNAs with greater sensitivity and robustness by evaluating the structure and signature of each precursor, quantifying known miRNAs based on miRBase annotation, and predicting secondary structure with the RNAfold tool | ||
| miRDeep | miRDeep | ||
| DARIO | DARIO is a web service for studying short read data from small RNA-seq experiments. It provides a wide range of analysis features, including quality control, read normalization, ncRNA quantification and prediction of putative ncRNA candidates | ||
| ncPRO-seq | ncPRO-seq is a tool for annotation and profiling of ncRNAs from small-RNA sequencing data. It aims to interrogate and perform detailed analysis on small RNAs derived from annotated non-coding regions in miRBase, piRBase, Rfam and repeatMasker, and regions defined by users. The ncPRO pipeline also has a module to identify regions significantly enriched with short reads that cannot be classified as known ncRNA families | ||
| CoRAL | CoRAL is a machine-learning package that can predict the precursor class of small RNAs present in a high-throughput RNA-sequencing dataset and produces information about the features that are most important for discriminating different populations of small non-coding RNAs | ||
| RNA-CODE | RNA-CODE is designed for ncRNA identification in NGS data that lack quality reference genomes. Given a set of short reads, it classifies the reads into different types of ncRNA families. The classification results can be used to quantify the expression levels of different types of ncRNAs in RNA-seq data and ncRNA composition profiles in metagenomic data, respectively | ||
| CAP-miRSeq | A comprehensive analysis pipeline for deep microRNA sequencing that integrates read preprocessing, alignment, mature/precursor/novel miRNA quantification, variant detection in miRNA coding regions, and flexible differential expression analysis between experimental conditions | ||
| iMir | A modular pipeline for comprehensive analysis of smallRNA-Seq data, comprising specific tools for adapter trimming, quality filtering, differential expression analysis, biological target prediction and other useful options by integrating multiple open source modules and resources in an automated workflow | ||
| UEA sRNA workbench | UEA sRNA workbench performs complete analysis of single or multiple-sample small RNA datasets to identify novel microRNA sequences and profile small RNA expression patterns in genetic data | ||
| omiRas | omiRas is a web server for annotation, comparison and visualization of interaction networks of non-coding RNAs derived from small RNA-Sequencing | ||
| sRNAtoolbox | sRNAtoolbox provides several tools, including sRNAbench for sRNA expression profiling and prediction of novel microRNAs, sRNAde for differential expression analysis, miRNA-consTarget for consensus prediction of miRNA targets, sRNAjBrowserDE for visualization of differential expression as a function of read length, and sRNAfuncTerms for determination of over-represented functional annotations in a target gene set | ||
| iSeeRNA | iSeeRNA is a support vector machine (SVM)-based classifier for the identification of lincRNAs | ||
| Sebnif | Sebnif is an integrated bioinformatics pipeline for the identification of novel large intergenic noncoding RNAs (lincRNAs) based on iSeeRNA | ||
| LncRNA2Function | LncRNA2Function – a comprehensive resource for functional investigation of human lncRNAs based on RNA-seq data | ||
| Novoalign | An accurate NGS short reads aligner for aligning to reference genome | ||
| PIPE-CLIP | A Galaxy framework-based comprehensive online pipeline for reliable analysis of data generated by three types of CLIP-seq protocol | ||
| PARalyzer | Uses the nucleotide substitutions (T-to-C conversions) induced by PAR-CLIP in a kernel density estimate classifier to generate a high-resolution set of protein-RNA interaction sites | ||
| Piranha | Piranha is a peak finding and differential binding detection algorithm | ||
| wavClusteR | An integrated pipeline for the analysis of PAR-CLIP data | ||
| dCLIP | dCLIP is designed for quantitative comparative analysis of CLIP-seq data and was able to effectively identify differential binding regions of RBPs in four CLIP-seq datasets | ||
| GraphProt | GraphProt is a machine-learning framework for learning the sequence- and structure-binding preferences of RNA-binding proteins (RBPs) from high-throughput experimental data | ||
| MEME | Perform motif discovery on DNA, RNA or protein datasets | ||
| cERMIT | cERMIT is a computationally efficient motif discovery tool based on analyzing genome-wide quantitative regulatory evidence | ||
| GLAM2 (Gapped Local Alignment of Motifs) | GLAM2 is a motif detection tool for discovering motifs allowing indels in a fully general manner from DNA, RNA and protein datasets | ||
| MatrixREDUCE | A motif discovery tool for genome-wide ChIP-seq and CLIP-seq data analysis | ||
| RNA Bind-n-Seq | A quantitative assessment of the sequence and structural binding specificity | ||
| CapR | An efficient algorithm that calculates the probability that each RNA base position is located within each secondary structural context | ||
| RNAcontext | An efficient motif finding method ideally suited for using large-scale RNA-binding affinity datasets to determine the relative binding preferences of RBPs for a wide range of RNA sequences and structures | ||
| ViennaRNA Package 2.0 | A widely used compilation of programs for RNA secondary structure prediction and analysis | ||
| Ensembl | A widely used Web-based genome browser with various epigenome data sets | ||
| IGV | A widely used graphical genome browser that is run locally on the user’s computer | ||
| UCSC Genome Browser | Widely used Web-based genome browser hosting all ENCODE data | ||
| BDPC | Web-based tool for bisulfite sequencing data presentation and compilation | ||
| DaVIE | A database with an intuitive user interface for visual comparisons across large DNA methylation data sets | ||
| EpiExplorer | A web server that provides an interactive gateway for exploring large-scale epigenetic datasets of the human and mouse genomes | ||
| EpiGRAPH | User-friendly software for advanced (epi-)genome analysis and prediction with powerful machine-learning algorithms | ||
| WashU Epigenome Browser | Web-based genome browser focusing on the human epigenome | ||
| MethBase | A central reference methylome database created from public BS-seq datasets | ||
| MethDB | A database for DNA methylation and environmental epigenetic effects | ||
| MethyCancer | Database of cancer DNA methylation data | ||
| PubMeth | Database of DNA methylation literature | ||
| ChromatinDB | A database of genome-wide histone modification patterns for | ||
| CR Cistrome | A ChIP-Seq database for chromatin regulators and histone modification linkages in human and mouse | ||
| Histome | A relational knowledgebase of human histone proteins and histone modifying enzymes | ||
| HHMD | The human histone modification database | ||
| starBase V2.0 | starBase is designed for decoding ncRNA and RNA-protein interaction networks and for predicting functions, especially in cancer samples | ||
| CLIPZ | CLIPZ supports the automatic functional annotation and visualization of CLIP-seq identified binding sites | ||
| doRiNA | A database of RNA interactions in post-transcriptional regulation | ||
| CLIPdb | An integrated resource for characterizing the regulatory networks between RBPs and various RNA transcript classes | ||
Note:
The descriptions are adapted from the software/tools website descriptions.
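
Several of the tools listed above score differential methylation with a statistical summary of methylation levels across samples; QDMR, for example, is described as being based on Shannon entropy. The sketch below shows only the plain entropy idea and is not QDMR's actual algorithm (its weighting and thresholds are not given here). The function name `methylation_entropy` and the input layout (one methylation level per sample for a single region) are assumptions made for illustration.

```python
import math


def methylation_entropy(levels):
    """Shannon entropy (in bits) of one region's methylation across samples.

    levels: non-negative methylation levels, one per sample.
    High entropy means similar methylation in all samples; low entropy means
    methylation is concentrated in a few samples (a candidate DMR).
    """
    if not levels:
        raise ValueError("need at least one sample")
    total = sum(levels)
    if total == 0:
        # Every sample is unmethylated: treat the region as perfectly uniform.
        return math.log2(len(levels))
    probs = [x / total for x in levels]
    return sum(-p * math.log2(p) for p in probs if p > 0)


uniform = methylation_entropy([0.5, 0.5, 0.5, 0.5])   # same level everywhere
specific = methylation_entropy([0.9, 0.0, 0.0, 0.0])  # one sample only
print(uniform, specific)
```

With four samples the entropy ranges from 0 bits (methylation present in a single sample) to 2 bits (identical levels in all samples), so lower values flag regions whose methylation is sample-specific.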
Large-scale epigenome projects.
| PROJECTS AND WEBSITES | SUMMARY |
|---|---|
| IHEC (International Human Epigenome Consortium) | IHEC was launched to understand to what extent the epigenome has shaped the human population genetically and in response to the environment, by coordinating the production of reference maps of human epigenomes for key cellular states in health and disease. Its work is distributed across multiple contributing projects, including the NIH Roadmap, ENCODE, and BLUEPRINT projects. |
| NIH Roadmap Epigenomics Mapping Consortium | The NIH Roadmap Epigenomics Mapping Consortium was launched with the goal of producing a public resource of human epigenomic data to catalyze basic biology and disease-oriented research. The Consortium expects to deliver a collection of normal epigenomes that will provide a framework or reference for comparison and integration within a broad array of future studies. |
| ENCODE (Encyclopedia of DNA Elements) | The ENCODE Consortium is an international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control the cells and circumstances in which a gene is active. Although epigenome mapping is not its main goal, the project includes large-scale mapping of DNA methylation, histone modifications, and other epigenetic information. |
| BLUEPRINT | BLUEPRINT is a large-scale research project receiving close to 30 million euros in funding from the EU. Thirty-nine leading European universities, research institutes, and industry entrepreneurs participate in what is one of the first two so-called high-impact research initiatives to receive funding from the EU. |
| HEP (Human Epigenome Project) | The partially EU-funded HEP analyzed DNA methylation in 43 unrelated individuals at single-base-pair resolution. Although the analysis was confined to selected regions on three chromosomes, it is the largest high-resolution, multi-individual epigenome dataset published to date. |
| German DEEP project | DEEP focuses on the analysis of cells connected to complex diseases with high socio-economic impact: metabolic diseases such as steatosis and adipositas, as well as inflammatory diseases of the joints and the intestine. DEEP's goal is to generate high-end data for comprehensive biomedical interpretation of healthy and diseased cells. In this way, DEEP will contribute to discovering new functional epigenetic links useful for clinical diagnosis, therapy, and health risk prevention. All data generated will be made publicly available and will be integrated into a sustainable worldwide data structure organized by the IHEC initiative. |
| HEROIC (High-throughput Epigenetic Regulatory Organisation In Chromatin) (EU) | The HEROIC project is a multi-center EU project that applies ChIP-on-chip, chromosome interaction analysis, and whole-genome nuclear localization assays to understanding human genome regulation. |
| AHEAD (Alliance for Human Epigenomics and Disease) Task Force (international) | The goal of AHEAD is to initiate and coordinate a comprehensive human epigenome-mapping project. The initial focus is on developing a suitable bioinformatic infrastructure and on performing epigenome mapping in a selection of normal tissues, which may provide the reference for subsequent mapping in abnormal cells. |
| ICGC (International Cancer Genome Consortium) | The goal of the ICGC is to obtain a comprehensive description of genomic, transcriptomic, and epigenomic changes in 50 different tumor types and/or subtypes that are of clinical and societal importance across the globe. |
| TCGA (The Cancer Genome Atlas) | The Cancer Genome Atlas (TCGA), a collaboration between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), aims to generate comprehensive, multi-dimensional maps of the key genomic changes in major types and subtypes of cancer. |
| FANTOM project | FANTOM is an international research consortium established to assign functional annotations to the full-length cDNAs that were collected during the Mouse Encyclopedia Project at RIKEN. FANTOM has developed and expanded over time to encompass the field of transcriptome analysis. The FANTOM database and the FANTOM full-length cDNA clone bank are resources available worldwide that have already fueled iPS development. |
| GENCODE project | GENCODE, a sub-project of the ENCODE scale-up project, aims to provide an integrated annotation of gene features. The current phase continues to improve the coverage and accuracy of the human and mouse gene sets by enhancing and extending the annotation of all evidence-based gene features at high accuracy, including protein-coding loci with alternatively spliced variants, non-coding loci, and pseudogenes. |
Note: *The descriptions are adapted from indicated website sources.