Akira Hasegawa, Carsten Daub, Piero Carninci, Yoshihide Hayashizaki, Timo Lassmann.
Abstract
BACKGROUND: Cap analysis of gene expression (CAGE) is a sequencing-based technology to capture the 5' ends of RNAs in a biological sample. After mapping, a CAGE peak on the genome indicates the position of an active transcriptional start site (TSS), and the number of reads corresponds to its expression level. CAGE is prominently used in both the FANTOM and ENCODE projects, but presently there is no software package to perform the essential data processing steps.
Year: 2014 PMID: 24884663 PMCID: PMC4033680 DOI: 10.1186/1471-2105-15-144
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Importing SAM to BAM functionality
| Input | input=SAM |
| Output | output=BAM |
| Command line | samtools view -bSo [output] [input] |
| Parameters | NA |
Importing BWA align functionality
| Input | input=FASTQ reference=FASTA |
| Output | output=SAI |
| Command line | bwa aln -n [error-rate] [reference] [input] > [output] |
| Command parameter | error-rate=NUMBER |
| Default value | error-rate=0.04 |
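The two import tables above describe each tool as a command-line template with bracketed placeholders, plus an optional default parameter value. As a minimal sketch of how such a template could be expanded into a concrete command (this is not MOIRAI's actual implementation; the `TOOLS` dictionary, the `render_command` function, and the example file names are all hypothetical):

```python
import re
import shlex

# Hypothetical in-memory form of the two tool definitions tabulated above.
# The command templates and the default error rate are copied from the tables;
# the dictionary layout itself is an assumption made for illustration.
TOOLS = {
    "sam_to_bam": {
        "command": "samtools view -bSo [output] [input]",
        "defaults": {},
    },
    "bwa_align": {
        "command": "bwa aln -n [error-rate] [reference] [input] > [output]",
        "defaults": {"error-rate": "0.04"},
    },
}

# Matches placeholders such as [input] or [error-rate].
PLACEHOLDER = re.compile(r"\[([A-Za-z0-9_-]+)\]")


def render_command(tool, values):
    """Fill a tool's command template, falling back to its default values."""
    spec = TOOLS[tool]
    merged = {**spec["defaults"], **{k: str(v) for k, v in values.items()}}

    def substitute(match):
        name = match.group(1)
        if name not in merged:
            raise KeyError(f"missing value for placeholder [{name}]")
        # Quote each value so that unusual file names stay a single shell word.
        return shlex.quote(merged[name])

    return PLACEHOLDER.sub(substitute, spec["command"])


if __name__ == "__main__":
    print(render_command("sam_to_bam", {"input": "reads.sam", "output": "reads.bam"}))
    print(render_command("bwa_align", {
        "input": "reads.fastq",
        "reference": "genome.fa",
        "output": "reads.sai",
    }))
```

Keeping the default value separate from the template means a caller only supplies `error-rate` when overriding 0.04, which mirrors the split between the "Command parameter" and "Default value" rows.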
Tools available in MOIRAI
| SplitByBarcode | Demultiplexing for CAGE |
| TagDust | Remove artificial sequences |
| rRNAdust | Remove rRNA sequence |
| SAMstat | Statistics of reads |
| Tome | Expression database |
| Graph | Draw PNG graphs from text table |
ENCODE CAGE K562 Libraries
| Polysome | longNonPolyA | nanoCAGE | |
| Chromatin | TotalRNA | nanoCAGE | |
| Nucleoplasm | TotalRNA | nanoCAGE | |
| Nucleolus | TotalRNA | nanoCAGE | |
| Nucleus | longPolyA | CAGE | biological |
| Nucleus | longPolyA | CAGE | biological |
| Cytosol | longPolyA | CAGE | biological |
| Cytosol | longPolyA | CAGE | biological |
| Cytosol | longPolyA | CAGE | |
| Whole cell | longPolyA | CAGE | biological |
| Whole cell | longPolyA | CAGE | biological |
Figure 1. A screenshot of the MOIRAI workflow for aligning CAGE sequences to a reference genome. Each box represents one process, and the direction of each arrow shows the flow of data. Computation starts from input units, represented by green boxes. Gray boxes represent computational units whose temporary files are deleted after the workflow completes. Results are kept by redirecting them to file/directory units, represented by blue boxes. The content of a text or image file can be embedded and shown within a workflow to display final products or to check the quality of data production.
Figure 2. CAGE annotation based on RefSeq.
Figure 3. Hierarchical clustering of samples based on the expression of CAGE peaks.
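The workflow model described in the Figure 1 caption (processes as boxes, arrows as data flow, temporary results discarded for gray units and kept for blue ones) can be illustrated with a small dependency graph. The sketch below is only an illustration under stated assumptions: the unit names, the `UNITS` and `DEPENDS_ON` structures, and both helper functions are hypothetical and do not come from MOIRAI itself.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical units; the "kind" values follow the colour coding in the caption.
UNITS = {
    "reads.fastq": "input",    # green box: where computation starts
    "align": "compute",        # gray box: temporary files deleted afterwards
    "sam_to_bam": "compute",   # gray box
    "results/": "output",      # blue box: file/directory unit, results kept
}

# Each key lists the units it receives data from (an arrow runs from value to key).
DEPENDS_ON = {
    "align": {"reads.fastq"},
    "sam_to_bam": {"align"},
    "results/": {"sam_to_bam"},
}


def execution_order(depends_on):
    """Return an order in which every unit runs after the units feeding it."""
    return list(TopologicalSorter(depends_on).static_order())


def kept_after_completion(units):
    """Names of units whose files remain once the workflow has finished."""
    return [name for name, kind in units.items() if kind != "compute"]


if __name__ == "__main__":
    print(execution_order(DEPENDS_ON))
    print(kept_after_completion(UNITS))
```

A topological sort is one natural way to turn the arrows into a run order; whether MOIRAI schedules its units this way is not stated in the source.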