René Dréos, Giovanna Ambrosini, Romain Groux, Rouayda Cavin Périer, Philipp Bucher.
Abstract
The Mass Genome Annotation (MGA) repository is a resource designed to store published next-generation sequencing data and other genome annotation data (such as gene start sites, SNPs, etc.) in a completely standardised format. Each sample has undergone local processing in order to meet the strict MGA format requirements. The original data source, the reformatting procedure and the biological characteristics of the samples are described in an accompanying documentation file manually edited by data curators. Ten model organisms are currently represented: Homo sapiens, Mus musculus, Danio rerio, Drosophila melanogaster, Apis mellifera, Caenorhabditis elegans, Arabidopsis thaliana, Zea mays, Saccharomyces cerevisiae and Schizosaccharomyces pombe. As of today, the resource contains over 24 000 samples. In conjunction with other tools developed by our group (the ChIP-Seq and SSA servers), it allows users to carry out a great variety of analysis tasks with MGA samples, such as making aggregation plots and heat maps for selected genomic regions, finding peak regions, generating custom tracks for visualizing genomic features in a UCSC genome browser window, or downloading chromatin data in a table format suitable for local processing with more advanced statistical analysis software such as R. Home page: http://ccg.vital-it.ch/mga/.
Year: 2018 PMID: 29069466 PMCID: PMC5753388 DOI: 10.1093/nar/gkx995
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Content of the MGA repository. (A) Proportion of samples in the database grouped by type. (B) Proportion of samples grouped by organism. Assemblies belonging to the same organism are merged together. (C) Sample numbers stratified by type and organism. Dot areas are proportional to the total number of samples in that category. The corresponding numbers can be found in a weekly updated table posted on the MGA home page at http://ccg.vital-it.ch/mga.
Figure 2. Examples of MGA data analysis. (A) Nucleosome organization for the lymphoblastoid cell line GM12878 around CTCF sites from the ENCODE ‘Uniform TFBS’ series. This plot was generated with ChIP-Extract, sorting the CTCF sites according to their similarity with the overall nucleosome pattern. (B) Distribution of conservation scores from PhyloP around TSSs from the EPDnew database. D. melanogaster shows a distinctive distribution with sharp peaks corresponding to the TSS, the TATA-box and the DPE. This plot was generated with ChIP-Cor. (C) Example of reproducing a published figure (Figure 5A from (18)) showing the nucleosome organization around promoter-associated YY1 peaks, stratified by YY1 binding strength evaluated as the number of YY1 sequencing reads around YY1 peaks. This plot was generated with ChIP-Cor and ChIP-Extract. A detailed description of how to reproduce this figure can be found in Supplementary File 1.
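The aggregation plots mentioned in the abstract and in Figure 2 can be illustrated with a minimal sketch: positions of a target feature (for example read or nucleosome positions) are counted in fixed-width bins relative to a set of oriented reference sites (for example TSSs), and the counts are averaged over the sites. This is not the ChIP-Cor implementation; the function name `aggregation_profile`, its parameters and the single-chromosome input are assumptions made for illustration only.

```python
from bisect import bisect_left, bisect_right


def aggregation_profile(reference_sites, target_positions, window=1000, bin_size=100):
    """Mean number of target positions per bin, relative to reference sites.

    reference_sites  -- iterable of (position, strand) pairs, strand '+' or '-'
    target_positions -- iterable of integer positions on the same chromosome
    window           -- offsets in [-window, window) are counted
    bin_size         -- bin width; assumed to divide `window` evenly

    Returns a dict mapping each bin start offset to the mean count per site.
    (Hypothetical helper, not part of any MGA or ChIP-Seq server tool.)
    """
    sites = list(reference_sites)
    targets = sorted(target_positions)
    counts = {start: 0 for start in range(-window, window, bin_size)}

    for pos, strand in sites:
        # Restrict to targets near the site using binary search.
        lo = bisect_left(targets, pos - window)
        hi = bisect_right(targets, pos + window)
        for t in targets[lo:hi]:
            offset = t - pos
            if strand == "-":
                # Flip so that negative offsets are always upstream.
                offset = -offset
            if -window <= offset < window:
                # Floor division also gives the correct bin for negative offsets.
                counts[(offset // bin_size) * bin_size] += 1

    n_sites = len(sites)
    if n_sites == 0:
        return {start: 0.0 for start in counts}
    return {start: c / n_sites for start, c in counts.items()}


# Toy example: two reference sites on opposite strands.
profile = aggregation_profile(
    [(1000, "+"), (5000, "-")],
    [950, 1000, 1050, 4950, 5050, 9000],
    window=100,
    bin_size=50,
)
```

Plotting the returned values against the bin offsets gives an aggregation plot; keeping the per-site counts as rows of a matrix instead of averaging them would give the input for a heat map.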
Comparison with other resources
| | Cistrome | ChIP-Atlas | GeneProf | MGA |
|---|---|---|---|---|
| Organisms | Hs, Mm [a] | Hs, Mm, Dm, Ce, Sc [a] | Hs, Mm, Dr, Ce, Dm, Sc, At, Gg, Ss, Os [a] | Hs, Mm, Dr, Dm, Ce, Am, At, Zm, Sc, Sp [a] |
| Experimental assays, genomic features | ChIP, DNase, ATAC | ChIP, DNase, MNase | ChIP, RNA, MNase | ChIP, DNase, ATAC, TSS, DNA-met, annotation, conservation, variation |
| Number of samples | 23 319 | 53 867 | 13 423 | 24 344 |
| Downloads, export (format) | Signal (bigWig); Peaks (bed) | Signal (bigWig); Peaks (bed) | Signal (wig); Peaks (bed); Report (pdf) | Read alignments (sga, bed); Peaks (sga, bed); Signal (bigWig); DNA sequence (fasta); Heat maps (text table) |
| Query interface | Menu-driven, free text-based | By sample ID (SRX) | Menu-driven, free text-based | Menu-driven, free text-based |
| Lift-over | No | No | No | Yes |
| QC report | Yes | No | Yes | No |
| Metadata annotation | Yes | Yes | Yes | Yes |
| Visualization | WashU browser, UCSC browser | IGV | Internal | UCSC browser |
| Integrated analysis tools | Limited, only for peak files | Target genes, colocalisation, in silico ChIP | Examine experiment analysis history | Aggregation plots (APs), heat maps, peak finder, DNA motif analysis |
[a] Hs: H. sapiens; Mm: M. musculus; Dm: D. melanogaster; Dr: D. rerio; Ce: C. elegans; Am: A. mellifera; At: A. thaliana; Zm: Z. mays; Os: O. sativa; Sc: S. cerevisiae; Sp: S. pombe; Gg: G. gallus; Ss: S. scrofa.