Abstract
High-throughput sequencing assays have become an increasingly common part of biological research across multiple fields. Even as the resulting sequences pile up in public databases, it is not always obvious how to make use of these data sets. Functional genomics offers approaches to integrate these "big" data into our understanding of rheumatic diseases. This review aims to provide a primer on thinking about big data from functional genomics in the context of rheumatology, using examples from the field's literature as well as the author's own work to illustrate the execution of functional genomics research. Study design is crucial to ensure the right samples are used to address the question of interest. In addition, sequencing assays produce a variety of data types, from gene expression to 3D chromatin structure and single-cell technologies, that can be integrated into a model of the underlying gene regulatory networks. The best approach for this analysis uses the scientific process: bioinformatic methods should be used in an iterative, hypothesis-driven manner to uncover the disease mechanism. Finally, the future of functional genomics will see big data fully integrated into rheumatology, leading to computationally trained researchers and interactive databases. The goal of this review is not to provide a manual, but to enhance the familiarity of readers with functional genomic approaches and provide a better sense of the challenges and possibilities.
Keywords: Big data; Chromatin; Epigenomics; Functional genomics; Gene expression; High-throughput sequencing assays
Year: 2018 PMID: 29433549 PMCID: PMC5810031 DOI: 10.1186/s13075-017-1504-9
Source DB: PubMed Journal: Arthritis Res Ther ISSN: 1478-6354 Impact factor: 5.156
Fig. 1 Common functional elements. Inner box: definitions of various functional elements (left to right): repressed element with methylated DNA (DNAme) and repressive modification of the histone tail of a nearby nucleosome (H3K27me3); active enhancer in an open chromatin region bound by two transcription factors (TF1, TF2) and marked by H3K4me1 or H3K4me2 (enhancer) and H3K27ac (activity); promoter bound by RNA Polymerase II (PolII) in an open chromatin region around the transcription start site (TSS, black arrow) of a gene body and marked by H3K4me2 or H3K4me3; and mRNA molecules transcribed from the gene with 5′ caps and 3′ poly(A) tails. Middle box: list of high-throughput sequencing assays used to annotate these functional elements, including bisulfite sequencing (BS-seq) for DNA methylation, Assay for Transposase Accessible Chromatin (ATAC-seq) for open chromatin, high-throughput chromosome conformation capture (Hi-C) or Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET) for interactions between regions, chromatin immunoprecipitation (ChIP-seq) for histone modifications, RNA-sequencing (RNA-seq) for mature mRNA (either 5′ biased, full length, or 3′ biased), and Global Run-On (GRO-seq) for nascent mRNA. Outer box: interactions between functional elements
Different assays for functional genomics, common protocols, format of the data after processing, and application to the GRN
| Assay | Protocol | Processed data format | Direct application to GRN |
|---|---|---|---|
| Gene expression | RNA-seq, microarray | Table of samples (columns) by genes (rows) | Output |
| Chromatin state | ChIP-seq, ATAC-seq | List of peaks | Nodes |
| Chromatin interactions | Hi-C, ChIA-PET, GAM | Matrix of interactions | Edges |
| DNA methylation | BS-seq | List of methylated sites | Nodes |
| Single-cell technology | scRNA-seq | Table of cells (columns) by genes (rows) | Discriminate different GRNs |
GRN gene regulatory network, RNA-seq RNA-sequencing, ChIP-seq chromatin immunoprecipitation followed by high-throughput sequencing, ATAC-seq Assay for Transposase Accessible Chromatin followed by high-throughput sequencing, Hi-C high-throughput chromosome conformation capture, ChIA-PET Chromatin Interaction Analysis by Paired-End Tag Sequencing, GAM genome architecture mapping, BS-seq bisulfite sequencing, scRNA-seq single-cell RNA-sequencing
Fig. 2 Sample output from functional genomic assays. Data generated from lung alveolar macrophages (blue) and bone marrow monocytes (grey) isolated from mice [13]. a Genome browser view of raw data from ChIP-seq (H3K4me2, H3K4me1, H3K27ac), ATAC-seq, and 3′-biased RNA-seq in 50 kb locus around RAMP1 and CCR2/CCR5. Highlighted regions from left to right represent: promoter, active intragenic enhancer, and 3′ end of RAMP1 (blue); poised intergenic enhancer and promoter/3′ end of CCR2 (gray); and promoter/3′ end of CCR2 (yellow). Genomic coordinates given above and scale of each track indicated on the left. Genes represented by blue lines below: thin lines for introns, medium lines for untranslated regions (UTRs), thick lines for exons; arrows on the gene body specify gene direction. b Quantitative measures of functional elements in a. Promoter usage is given by H3K4me2, enhancer usage by H3K4me1, enhancer activity by H3K27ac, chromatin accessibility by ATAC-seq, and gene expression by RNA-seq. Values represent normalized density (read count per kb region length per million reads) for ATAC-seq and ChIP-seq, and normalized CPM (counts per million reads) for RNA-seq (note varying scale). RAMP1, example of lung-specific gene with constitutive promoter. CCR2, example of highly monocyte-specific gene with monocyte-specific promoter and enhancer. CCR5, example of nonexpressed gene with low promoter activity. c Heatmaps clustered into lung-specific and monocyte-specific functional elements indicating how data from individual genes are integrated into global analyses. Differential enhancer usage measured by absolute value of H3K4me1 in 6575 regions and differential gene expression measured by relative value of RNA-seq in 3348 genes. ATAC-seq Assay for Transposase Accessible Chromatin followed by high-throughput sequencing, RNA-seq RNA-sequencing
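
The two normalizations named in panel b can be illustrated with a minimal Python sketch. It follows only the definitions given in the caption (read count per kb of region length per million reads, and counts per million reads); the function names, the example read counts, and the library sizes are hypothetical and are not taken from the study, whose actual processing pipeline may differ.

```python
def cpm(read_count, library_size):
    """Counts per million reads: scale a raw count by sequencing depth.

    library_size is the total number of reads in the sample
    (a hypothetical input here, not a value from the figure).
    """
    return read_count * 1e6 / library_size


def normalized_density(read_count, region_length_bp, library_size):
    """Read count per kb of region length per million reads.

    Dividing by region length makes peaks of different widths comparable;
    dividing by library size makes samples of different depth comparable.
    """
    region_kb = region_length_bp / 1e3
    millions_of_reads = library_size / 1e6
    return read_count / region_kb / millions_of_reads


# Illustrative numbers only: 500 reads in a 2 kb region, 20 million reads total.
peak_density = normalized_density(500, 2_000, 20_000_000)
# Illustrative numbers only: 100 reads assigned to a gene, same library size.
gene_cpm = cpm(100, 20_000_000)
```

Because the density divides by region length while CPM does not, the two measures sit on different scales, which is consistent with the caption's note about varying scale across tracks.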