Jennifer Jou, Idan Gabdank, Yunhai Luo, Khine Lin, Paul Sud, Zachary Myers, Jason A Hilton, Meenakshi S Kagda, Bonita Lam, Emma O'Neill, Philip Adenekan, Keenan Graham, Ulugbek K Baymuradov, Stuart R Miyasato, J Seth Strattan, Otto Jolanki, Jin-Wook Lee, Casey Litton, Forrest Y Tanaka, Benjamin C Hitz, J Michael Cherry.
Abstract
The Encyclopedia of DNA Elements (ENCODE) web portal hosts genomic data generated by the ENCODE Consortium, Genomics of Gene Regulation, the NIH Roadmap Epigenomics Consortium, and the modENCODE and modERN projects. The goal of the ENCODE project is to build a comprehensive map of the functional elements of the human and mouse genomes. Currently, the portal database stores over 500 TB of raw and processed data from over 15,000 experiments spanning assays that measure gene expression, DNA accessibility, DNA and RNA binding, DNA methylation, and 3D chromatin structure across numerous cell lines, tissue types, and differentiation states with selected genetic and molecular perturbations. The ENCODE portal provides unrestricted access to these data and the relevant metadata as a service to the scientific community. The metadata model captures the details of the experiments, raw and processed data files, and processing pipelines in human- and machine-readable form and enables the user to search for specific data either using a web browser or programmatically via the REST API. Furthermore, ENCODE data can be freely visualized or downloaded for additional analyses.
Keywords: ENCODE; database; epigenetics; human genome; regulatory elements
Year: 2019 PMID: 31751002 PMCID: PMC7307447 DOI: 10.1002/cpbi.89
Source DB: PubMed Journal: Curr Protoc Bioinformatics ISSN: 1934-3396
Figure 1. The ENCODE home page. This image shows the Data drop‐down menu in the toolbar opened. The first item in the menu is a link to the Experiment Matrix page.
Figure 2. The Experiment Matrix page displays available ENCODE data in a matrix with biosample types and assays as the axes. Each cell of the matrix is clickable and leads to a list of experiments matching the given combination of biosample type and assay.
Figure 3. The Experiment search page displays ENCODE data as a list of search results. Each experiment is shown with a brief summary of the biological material and assay name, and a link to its individual experiment summary page with more metadata details (see Fig. 5). On the left is the facet sidebar, which can be used to modify and refine the search results. The “Add all items to cart” button is a Cart function, explained further in Support Protocol 3.
Dataset Statuses
| Status | Meaning |
|---|---|
| Released | Publicly available datasets are marked with the “released” status. Datasets become publicly available after automatic and manual review confirms that they meet the standards and have no data or metadata issues or inconsistencies. This status is selected by default when visiting all search views, including the Matrix page (refer to |
| Revoked | An error was found with the experiment after it became publicly available, so the status was changed to “revoked” to indicate that caution should be exercised before using the data. Examples of such errors: the data were not compliant with the ENCODE quality requirements; an issue was discovered with experimental elements (antibody, biosample, etc.). |
| Archived | The dataset was superseded by another dataset that is of higher quality, was collected and/or processed with newer technology, etc. The ENCODE DCC encourages use of the superseding experiment instead of the archived one. |
Additional information is available at https://www.encodeproject.org/help/getting-started/status-terms/.
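The three statuses above can be turned into a simple triage rule when processing dataset metadata in a script. The following is a minimal sketch, assuming the status arrives as a plain string such as "released"; the `DatasetStatus` enum, the `GUIDANCE` mapping, and the `triage` function are hypothetical names introduced here for illustration, and the hint strings paraphrase the table rather than quote the portal.

```python
from enum import Enum


class DatasetStatus(Enum):
    """Statuses described in the Dataset Statuses table (hypothetical enum)."""
    RELEASED = "released"
    REVOKED = "revoked"
    ARCHIVED = "archived"


# Usage hints paraphrased from the table; the wording is an assumption.
GUIDANCE = {
    DatasetStatus.RELEASED: "use",
    DatasetStatus.REVOKED: "use with caution",
    DatasetStatus.ARCHIVED: "prefer the superseding dataset",
}


def triage(status: str) -> str:
    """Return a short usage hint for a dataset status string."""
    try:
        return GUIDANCE[DatasetStatus(status.strip().lower())]
    except ValueError:
        # Any status not covered by the table falls through here.
        return "unknown status"


for s in ("released", "revoked", "archived"):
    print(s, "->", triage(s))
```

Normalizing the input with `strip().lower()` keeps the lookup tolerant of capitalization differences between the table ("Released") and the quoted status values ("released").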
Figure 5. An experiment summary page. Below the page title and audits, the page is organized into distinct sections containing the following information: (A) Summary section: key info including but not limited to the assay performed, biosample used, assay target if applicable, platform, and controls. (B) Attribution section: information about the lab that performed the experiment and when the experiment was released. (C) Replicates section: table of experimental replicates with links to biosamples, antibodies, libraries, and genetic modifications when applicable. (D) Files section: information about the raw and processed data files generated from this experiment and subsequent analysis, provenance of data files as reflected in file association graph, and visualization of experiment‐specific genome tracks when applicable. (E) Documents section: links to additional protocol documents describing the experimental methods.
Figure 4. A truncated view of the facets with the items that should be selected after step 13 of the Basic Protocol.
Audit Flag Categories
| Category | Description |
|---|---|
| Read coverage | Read depth or coverage issues for libraries. These standards were agreed upon by ENCODE production labs and are outlined in full on the data standards pages ( |
| Replication | Issues with replicate concordance or other replicate inconsistencies |
| Library complexity | Bottlenecking or library complexity issues, as outlined in the ENCODE Histone ChIP‐seq ( |
| Enrichment | Low SPOT scores for DNase‐seq experiments as outlined in the ENCODE DNase‐seq standards ( |
| Uniform pipeline requirements | Various pipeline issues, such as unexpected inconsistencies in read length, insufficient read length, and unknown platforms or other missing information |
| Antibody | Mismatches between antibody and target metadata or missing characterizations for antibodies |
| Metadata | Missing required metadata |
| Dataset consistency | Inconsistencies between different experiments grouped together in a series |
Additional information is available at https://www.encodeproject.org/data-standards/audits/.
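When many experiments are reviewed at once, it can help to tally audit flags by category. The sketch below assumes, and this is an assumption rather than something stated above, that the audits for one experiment are available as a mapping from a severity level to a list of flag records, each carrying a "category" field. The `tally_audits` function and the sample data are hypothetical; the category strings simply reuse names from the table.

```python
from collections import Counter


def tally_audits(audit: dict) -> Counter:
    """Count audit flags per (level, category) pair.

    `audit` is assumed to map a severity level to a list of flag dicts,
    each with a "category" key (an assumed layout, not confirmed here).
    """
    counts = Counter()
    for level, flags in audit.items():
        for flag in flags:
            counts[(level, flag.get("category", "uncategorized"))] += 1
    return counts


# Hypothetical sample data for illustration only.
sample_audit = {
    "WARNING": [
        {"category": "Read coverage"},
        {"category": "Read coverage"},
    ],
    "NOT_COMPLIANT": [
        {"category": "Replication"},
    ],
}

for (level, category), n in sorted(tally_audits(sample_audit).items()):
    print(f"{level}\t{category}\t{n}")
```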
Figure 6. The Cart page. On the left are the file selectors. Although visually similar to the facet sidebar explored in the Basic Protocol, these file selectors only affect which files will be included in files.txt, introduced in Support Protocol 1. As selections are made, the number above the selectors, which reads “137 files selected” in this figure, will update dynamically. On the right is the list of experiments saved in the cart.
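Before starting a large download, it can be useful to check that files.txt contains the mix of files implied by the file selectors. The following sketch assumes that files.txt is a plain-text list with one download URL per line, which is an assumption about the format rather than something specified above. The `summarize_file_list` function and the example.org URLs are hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath
from urllib.parse import urlparse


def summarize_file_list(text: str) -> Counter:
    """Count URLs in a files.txt-style listing by file extension."""
    counts = Counter()
    for line in text.splitlines():
        url = line.strip()
        if not url:
            continue  # skip blank lines
        name = PurePosixPath(urlparse(url).path).name
        # Join compound suffixes such as ".bed.gz" into one key.
        suffix = "".join(PurePosixPath(name).suffixes) or "(none)"
        counts[suffix] += 1
    return counts


# Hypothetical listing; real URLs would come from the downloaded files.txt.
sample = """\
https://example.org/files/AAA/AAA.bigWig
https://example.org/files/BBB/BBB.bed.gz
https://example.org/files/CCC/CCC.bigWig
"""

for suffix, n in summarize_file_list(sample).most_common():
    print(suffix, n)
```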
Figure 7. Genome browser tab in the Files section of an Experiment summary page. This tab contains the embedded Valis genome browser, which can be used to visualize signal and peaks tracks directly on the ENCODE portal. Filters to the left of the browser are used to select which tracks to visualize. Here, only the “signal p‐value” tracks are selected for visualization. The small arrow above the words “Choose an assembly” can be used to collapse and expand the filter sidebar.
Syntax for Query Building
| Syntax | Parameter example | Description |
|---|---|---|
| | | The equal symbol ( |
| | | The ampersand ( |
| | | The wildcard ( |
| | | The data model of the ENCODE portal allows for certain objects to be embedded in others. Objects are able to access the properties of the objects embedded in them. The period joins properties and sub‐properties to form the “path” to an embedded property, akin to how the forward slash ( |
| | | Object: no objects are embedded. Embedded: all objects are embedded. Raw: object links are in UUID (Universally Unique Identifier) format, rather than the default @id format. |
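The query syntax described in this table (equal sign, ampersand, wildcard, and period-separated paths) can be assembled programmatically. The following is a minimal sketch using only the Python standard library. The `/search/` path, the parameter names (`type`, `assay_title`, `biosample_ontology.term_name`, `target.label`), and the `frame` parameter name are assumptions made for illustration and should be checked against the portal's own documentation. `build_query` is a hypothetical helper, and no request is sent.

```python
from urllib.parse import urlencode

# Assumed search endpoint on the portal domain cited in this document.
BASE_URL = "https://www.encodeproject.org/search/"

# The three options from the last table row, lowercased (assumed values).
FRAMES = {"object", "embedded", "raw"}


def build_query(params, frame=None):
    """Join (property, value) pairs with '=' and '&' into a search URL.

    Property names may use periods to reach embedded properties, and a
    value of '*' acts as the wildcard described in the table.
    """
    pairs = list(params)
    if frame is not None:
        if frame not in FRAMES:
            raise ValueError(f"unknown frame: {frame!r}")
        pairs.append(("frame", frame))
    # safe="*" keeps the wildcard readable instead of encoding it as %2A.
    return BASE_URL + "?" + urlencode(pairs, safe="*")


url = build_query(
    [
        ("type", "Experiment"),                      # assumed parameter
        ("assay_title", "ChIP-seq"),                 # assumed parameter
        ("biosample_ontology.term_name", "K562"),    # period-joined path
        ("target.label", "*"),                       # wildcard: any value
    ],
    frame="object",
)
print(url)
```

Passing the parameters as a list of pairs rather than a dictionary preserves their order and allows the same property to appear more than once.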