Kate R Rosenbloom, Timothy R Dreszer, Jeffrey C Long, Venkat S Malladi, Cricket A Sloan, Brian J Raney, Melissa S Cline, Donna Karolchik, Galt P Barber, Hiram Clawson, Mark Diekhans, Pauline A Fujita, Mary Goldman, Robert C Gravell, Rachel A Harte, Angie S Hinrichs, Vanessa M Kirkup, Robert M Kuhn, Katrina Learned, Morgan Maddren, Laurence R Meyer, Andy Pohl, Brooke Rhead, Matthew C Wong, Ann S Zweig, David Haussler, W James Kent.
Abstract
The Encyclopedia of DNA Elements (ENCODE) Consortium is entering its 5th year of production-level effort generating high-quality whole-genome functional annotations of the human genome. The past year has brought the ENCODE compendium of functional elements to critical mass, with a diverse set of 27 biochemical assays now covering 200 distinct human cell types. Within the mouse genome, which has been under study by ENCODE groups for the past 2 years, 37 cell types have been assayed. Over 2000 individual experiments have been completed and submitted to the Data Coordination Center for public use. UCSC makes this data available on the quality-reviewed public Genome Browser (http://genome.ucsc.edu) and on an early-access Preview Browser (http://genome-preview.ucsc.edu). Visual browsing, data mining and download of raw and processed data files are all supported. An ENCODE portal (http://encodeproject.org) provides specialized tools and information about the ENCODE data sets.
Year: 2011 PMID: 22075998 PMCID: PMC3245183 DOI: 10.1093/nar/gkr1012
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
ENCODE experiments in the human genome are focused on a set of cell lines selected by the Consortium for intensive study
| Cell lines | Karyotype | Tissue | Description | Datasets |
|---|---|---|---|---|
| Tier 1 | ||||
| GM12878 | Normal | Blood | Lymphoblastoid | 166 |
| H1-hESC | Normal | Embryonic stem | Embryonic stem | 89 |
| K562 | Cancer | Blood | Leukemia | 253 |
| Tier 2 existing | ||||
| HeLa-S3 | Cancer | Uterine cervix | Cervical carcinoma | 118 |
| HepG2 | Cancer | Liver | Liver carcinoma | 135 |
| HUVEC | Normal | Umbilical endothelium | Umbilical vein endothelial | 54 |
| Tier 2 added in 2011 | ||||
| A549 | Cancer | Lung | Lung carcinoma | 35 |
| CD14+ | Normal | Blood | Monocyte | 2 |
| IMR90 | Normal | Lung | Lung fibroblast | 3 |
| MCF-7 | Cancer | Breast | Breast carcinoma | 33 |
| SK-N-SH | Cancer | Brain | Neuroblastoma | 25 |
| Tier 3 | ||||
| 219 additional | | | | 928 total |
All assays are performed in Tier 1; Tier 2 cell types are designated as the next level of priority.
ENCODE encompasses a diverse set of assays
| Data type | No. of experiments |
|---|---|
| Chromatin Interactions | |
| 5C | 4 |
| ChIA-PET | 6 |
| DNA methylation | |
| Methyl array | 63 |
| Methyl RRBS | 93 |
| Methyl-seq | 20 |
| Histone modifications | |
| ChIP-seq | 221 |
| ChIP-seq (MOUSE) | 28 |
| Open chromatin | |
| DNase-DGF | 19 |
| DNase-seq | 135 |
| DNase-seq (MOUSE) | 27 |
| FAIRE-seq | 27 |
| RNA profiling | |
| CAGE | 45 |
| Exon array | 120 |
| RNA-chip | 26 |
| RNA-PET | 22 |
| RNA-seq | 151 |
| RNA-seq (MOUSE) | 27 |
| Transcription factor binding sites | |
| Epitope-tag ChIP-seq | 12 |
| ChIP-seq | 745 |
| ChIP-seq (MOUSE) | 92 |
| Other | |
| Bi-directional promoters | 1 |
| DNA cleavage | 1 |
| DNA-PET | 6 |
| GENCODE genes | 5 |
| Genotype | 64 |
| Negative regulatory elements | 2 |
| Nucleosome positioning | 2 |
| Proteogenomics | 5 |
| RNA binding proteins | 49 |
| Short read mappability | 13 |
Descriptive overviews, methods and references are included on the description page that accompanies each dataset.
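As a quick arithmetic cross-check of the abstract's statement that over 2000 individual experiments have been completed, the counts in the assay table above can be summed. The sketch below simply transcribes that table into a Python dictionary; the `ASSAYS` name and the `tally` helper are illustrative only and are not part of any ENCODE or UCSC tool. It assumes that rows labeled "(MOUSE)" are mouse experiments and that all other rows are human.

```python
# Experiment counts transcribed from the assay table, grouped by data type.
# Assumption: rows whose label ends in "(MOUSE)" are mouse; all others are human.
ASSAYS = {
    "Chromatin interactions": {"5C": 4, "ChIA-PET": 6},
    "DNA methylation": {"Methyl array": 63, "Methyl RRBS": 93, "Methyl-seq": 20},
    "Histone modifications": {"ChIP-seq": 221, "ChIP-seq (MOUSE)": 28},
    "Open chromatin": {
        "DNase-DGF": 19, "DNase-seq": 135, "DNase-seq (MOUSE)": 27, "FAIRE-seq": 27,
    },
    "RNA profiling": {
        "CAGE": 45, "Exon array": 120, "RNA-chip": 26,
        "RNA-PET": 22, "RNA-seq": 151, "RNA-seq (MOUSE)": 27,
    },
    "Transcription factor binding sites": {
        "Epitope-tag ChIP-seq": 12, "ChIP-seq": 745, "ChIP-seq (MOUSE)": 92,
    },
    "Other": {
        "Bi-directional promoters": 1, "DNA cleavage": 1, "DNA-PET": 6,
        "GENCODE genes": 5, "Genotype": 64, "Negative regulatory elements": 2,
        "Nucleosome positioning": 2, "Proteogenomics": 5,
        "RNA binding proteins": 49, "Short read mappability": 13,
    },
}


def tally(assays):
    """Return (total, mouse) experiment counts summed over every data type."""
    total = mouse = 0
    for rows in assays.values():
        for label, count in rows.items():
            total += count
            if label.endswith("(MOUSE)"):
                mouse += count
    return total, mouse


total, mouse = tally(ASSAYS)
print(f"total={total} mouse={mouse} human={total - mouse}")
```

Run as written, the script prints `total=2031 mouse=174 human=1857`, consistent with the "over 2000" figure in the abstract. The mouse subtotal matches the 174 mouse experiments in the vital statistics table, while the human subtotal differs slightly from the 1861 listed there.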
Figure 1. ENCODE data displayed in the UCSC Genome Browser together with two annotations from the Roadmap Epigenomics Release III data hub. The genomic region contains two protein-coding genes, plasma membrane calcium ATPase 4a (ATP2B4) and lymphocyte transmembrane adaptor 1 isoform a (LAX1). The GENCODE Genes track shows multiple transcript variants for both genes as well as a snoRNA in the region. The Roadmap Epigenomics tracks just below the GENCODE track show H3K4me3, a histone mark associated with promoters, in two cell types not assayed by the ENCODE project. Based on peaks at likely promoter regions, these tracks support the short, non-coding form of LAX1 in mesenchymal stem cells and the longer isoform in CD34 cells. The next three tracks are transparent overlays from seven cell lines assayed by the ENCODE project, showing the H3K4me3 mark again, the H3K27ac mark associated with active regulatory regions, and a log plot of transcription levels in the same cell lines. The histone marks and transcription pattern show coordinated, cell-type-specific activity: the ATP2B4 gene is most active in NHEK (purple) and K562 (blue) cells, while LAX1 is most active in GM12878 (orange) cells. The DNase and Transcription Factor ChIP-seq cluster tracks shown last summarize data from a much wider range of cell lines and indicate a large number of regulatory regions. Additional details for these annotations are available by clicking through on the tracks.
Figure 2. Data matrix display and selection of files for download. This feature will be linked to the ENCODE portal and will navigate to the Advanced Search features of File and Track Search.
ENCODE vital statistics, as of September 2011
| Category | Human | Mouse |
|---|---|---|
| Experiments | 1861 | 174 |
| Assay types | 29 | 3 |
| Cell and tissue types | 235 | 34 |
| ChIP antibodies | 179 | 30 |