Robert Petryszak, Tony Burdett, Benedetto Fiorelli, Nuno A Fonseca, Mar Gonzalez-Porta, Emma Hastings, Wolfgang Huber, Simon Jupp, Maria Keays, Nataliya Kryvych, Julie McMurry, John C Marioni, James Malone, Karine Megy, Gabriella Rustici, Amy Y Tang, Jan Taubert, Eleanor Williams, Oliver Mannion, Helen E Parkinson, Alvis Brazma.
Abstract
Expression Atlas (http://www.ebi.ac.uk/gxa) is a value-added database providing information about gene, protein and splice variant expression in different cell types, organism parts, developmental stages, diseases and other biological and experimental conditions. The database consists of selected high-quality microarray and RNA-sequencing experiments from ArrayExpress that have been manually curated, annotated with Experimental Factor Ontology terms and processed using standardized microarray and RNA-sequencing analysis methods. The new version of Expression Atlas introduces the concept of 'baseline' expression, i.e. gene and splice variant abundance levels in healthy or untreated conditions, such as tissues or cell types. Differential gene expression data benefit from an in-depth curation of experimental intent, resulting in biologically meaningful 'contrasts', i.e. instances of differential pairwise comparisons between two sets of biological replicates. Other novel aspects of Expression Atlas are its strict quality control of raw experimental data, up-to-date RNA-sequencing analysis methods, expression data at the level of gene sets as well as genes, and a more powerful search interface designed to maximize the biological value provided to the user.
Year: 2013 PMID: 24304889 PMCID: PMC3964963 DOI: 10.1093/nar/gkt1270
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The RNA-seq processing pipeline used to generate data for Expression Atlas. The experimental metadata is retrieved from ArrayExpress. The raw FASTQ files, retrieved from the European Nucleotide Archive, undergo quality control with the FastQC package to remove low-quality reads and uncalled bases. Subsequently, contaminated reads (e.g. bacterial reads in vertebrate samples) are removed. TopHat 1 maps the reads to the reference genome, Cufflinks 1 quantifies baseline expression for genes and transcripts, and HTSeq quantifies the expression used for subsequent differential expression analysis with DESeq. The final (summarized) baseline expression count for a gene in a condition is the median taken first across technical replicates and then across the biological replicates corresponding to that condition.
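The two-stage median summarization described in the Figure 1 caption can be illustrated with a minimal Python sketch. This is not the Expression Atlas implementation: the function name `summarize_baseline`, the tuple layout of the input, and the sample values are all hypothetical and chosen only to show the order of the two median steps for a single gene.

```python
from collections import defaultdict
from statistics import median


def summarize_baseline(values):
    """Summarize one gene's expression per condition (hypothetical helper).

    `values` is an iterable of (condition, biological_replicate, expression)
    tuples, one tuple per technical replicate.
    """
    # Group technical replicates by (condition, biological replicate).
    technical = defaultdict(list)
    for condition, bio_rep, expression in values:
        technical[(condition, bio_rep)].append(expression)

    # Step 1: median across technical replicates of each biological replicate.
    biological = defaultdict(list)
    for (condition, _bio_rep), expressions in technical.items():
        biological[condition].append(median(expressions))

    # Step 2: median across the biological replicates of each condition.
    return {cond: median(meds) for cond, meds in biological.items()}


# Made-up expression values for one gene, for illustration only.
runs = [
    ("liver", "donor1", 10.0),
    ("liver", "donor1", 14.0),
    ("liver", "donor2", 20.0),
    ("liver", "donor3", 30.0),
    ("liver", "donor3", 34.0),
    ("heart", "donor1", 2.0),
    ("heart", "donor2", 4.0),
]
summary = summarize_baseline(runs)
print(summary)
```

Taking the median within each biological replicate first keeps a sample with many technical replicates from outweighing the other samples in the final per-condition value.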
Figure 2. Example baseline expression experiment page, with help annotations: Illumina Body Map. (For further information see: http://www.ebi.ac.uk/gxa/help/baseline-atlas.html).
Figure 4. Example of baseline expression on the summary page for the human BRCA1 gene: http://www.ebi.ac.uk/gxa/genes/ENSG00000012048.
Figure 5. Differential expression on the summary page for the human BRCA1 gene: http://www.ebi.ac.uk/gxa/genes/ENSG00000012048.