Manuel Sánchez-Castillo, David Ruau, Adam C Wilkinson, Felicia S L Ng, Rebecca Hannah, Evangelia Diamanti, Patrick Lombard, Nicola K Wilson, Berthold Gottgens.
Abstract
CODEX (http://codex.stemcells.cam.ac.uk/) is a user-friendly database for the direct access and interrogation of publicly available next-generation sequencing (NGS) data, specifically aimed at experimental biologists. In an era of multi-centre genomic dataset generation, CODEX provides a single database where these samples are collected, uniformly processed and vetted. The main aim of CODEX is to provide the wider scientific community with instant access to high-quality NGS data that are directly comparable, irrespective of the publishing laboratory. CODEX allows users to immediately visualize or download processed datasets, or to compare user-generated data against the database's cumulative knowledge base. CODEX contains four types of NGS experiments: transcription factor (TF) chromatin immunoprecipitation coupled to high-throughput sequencing (ChIP-Seq), histone modification ChIP-Seq, DNase-Seq and RNA-Seq. These are largely encompassed within two specialized repositories, HAEMCODE and ESCODE, which focus on haematopoiesis and embryonic stem cell samples, respectively. To date, CODEX contains over 1000 samples, including 221 unique TFs and 93 unique cell types. CODEX therefore provides one of the most complete resources of publicly available NGS data for the direct interrogation of the transcriptional programmes that regulate cellular identity and fate in the context of mammalian development, homeostasis and disease.
Year: 2014 PMID: 25270877 PMCID: PMC4384009 DOI: 10.1093/nar/gku895
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 19.160
Figure 1. Flow diagram of the CODEX processing pipeline for ChIP-Seq, RNA-Seq and DNase-Seq. Data is downloaded from GEO and converted to FASTQ (in-house experiments are provided directly in this format). A quality test is performed, and adapters and overrepresented sequences are removed from the raw reads. Trimmed sequences are then aligned, and the resulting SAM file is converted to a BED format file from which a density profile is computed.
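The caption names the pipeline stages but not the tools or parameters CODEX actually uses. Purely as an illustration, the sketch below reimplements three of those stages in plain Python: 3' adapter trimming, conversion of SAM alignments to BED intervals, and a binned read-density profile. The quality test and the alignment step itself are not shown. The adapter sequence (the common Illumina adapter prefix `AGATCGGAAGAGC`), the 5-base minimum overlap, the 50 bp bin size and all function names are assumptions made for this example; a real pipeline would rely on dedicated trimming, alignment and coverage tools.

```python
import re
from collections import Counter

# Assumed adapter: the common Illumina adapter prefix. The caption does not
# say which adapters or overrepresented sequences CODEX removes.
ADAPTER = "AGATCGGAAGAGC"

# CIGAR operations as defined in the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def trim_adapter(seq, adapter=ADAPTER, min_overlap=5):
    """Hypothetical helper: cut a read at the first full adapter match, or
    remove a partial adapter prefix (>= min_overlap bases) at its 3' end."""
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx]
    # Try the longest partial adapter prefix first.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq


def reference_length(cigar):
    """Number of reference bases spanned by an alignment (ops M, D, N, =, X)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")


def sam_to_bed(sam_lines):
    """Hypothetical helper: yield BED-style (chrom, start, end, name, strand)
    tuples for mapped reads. SAM POS is 1-based; BED start is 0-based."""
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 4 or cigar == "*":  # FLAG bit 0x4: read is unmapped
            continue
        start = pos - 1
        end = start + reference_length(cigar)
        strand = "-" if flag & 16 else "+"  # FLAG bit 0x10: reverse strand
        yield (rname, start, end, qname, strand)


def density_profile(bed_records, bin_size=50):
    """Count how many intervals overlap each fixed-size genomic bin
    (a simple stand-in for the density profile in the caption)."""
    counts = Counter()
    for chrom, start, end, _name, _strand in bed_records:
        for b in range(start // bin_size, (end - 1) // bin_size + 1):
            counts[(chrom, b)] += 1
    return counts


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.6",
        "r1\t0\tchr1\t101\t60\t36M\t*\t0\t0\t" + "A" * 36 + "\t*",
        "r2\t16\tchr1\t121\t60\t20M5D16M\t*\t0\t0\t" + "A" * 36 + "\t*",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\t" + "A" * 36 + "\t*",
    ]
    print(trim_adapter("ACGTACGTAGATCGGAAGAGCTTTT"))  # ACGTACGT
    bed = list(sam_to_bed(sam))
    print(bed)  # [('chr1', 100, 136, 'r1', '+'), ('chr1', 120, 161, 'r2', '-')]
    print(sorted(density_profile(bed).items()))  # [(('chr1', 2), 2), (('chr1', 3), 1)]
```

Counting every overlapped bin keeps the example short; ChIP-Seq pipelines often extend reads to the estimated fragment length or normalise by library size before building a profile, and the caption does not say which of these CODEX applies.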
Figure 2. User-friendly, comparative and informative analysis of NGS data. CODEX provides users with immediate access to uniformly analysed, publicly available TF ChIP-Seq, histone ChIP-Seq, DNase-Seq and RNA-Seq experiments. NGS experiments can be viewed as sessions in the UCSC Genome Browser, downloaded for further analysis, or integrated using built-in web tools including Comparison Between Organisms, GSCA, Correlation Analysis and Motif Analysis.