Elodie Portales-Casamar, David Arenillas, Jonathan Lim, Magdalena I Swanson, Steven Jiang, Anthony McCallum, Stefan Kirov, Wyeth W Wasserman.
Abstract
The PAZAR database unites independently created and maintained data collections of transcription factor and regulatory sequence annotation. The flexible PAZAR schema permits the representation of diverse information derived from experiments ranging from biochemical protein-DNA binding to cellular reporter gene assays. Data collections can be made available to the public, or restricted to specific system users. The data 'boutiques' within the shopping-mall-inspired system facilitate the analysis of genomics data and the creation of predictive models of gene regulation. Since its initial release, PAZAR has grown in terms of data, features and through the addition of an associated package of software tools called the ORCA toolkit (ORCAtk). ORCAtk allows users to rapidly develop analyses based on the information stored in the PAZAR system. PAZAR is available at http://www.pazar.info. ORCAtk can be accessed through convenient buttons located in the PAZAR pages or via our website at http://www.cisreg.ca/ORCAtk.
Year: 2008 PMID: 18971253 PMCID: PMC2686574 DOI: 10.1093/nar/gkn783
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
PAZAR database content on August 29, 2008*
| Project | Regulated genes | Regulatory sequence (genomic) | Regulatory sequence (artificial) | Transcription factors | Transcription factor profiles | Annotated publications |
|---|---|---|---|---|---|---|
| ABS | 205 | 611 | – | 152 | – | 110 |
| AREs | 20 | 20 | – | 1 | – | 20 |
| HNF4 | 79 | 107 | – | 1 | – | 70 |
| JASPAR core | – | – | 3229 | 84 | 138 | 106 |
| Liver set | 14 | 62 | – | – | – | – |
| MUC5AC | 2 | 23 | – | 13 | – | 9 |
| Muscle set | 15 | 49 | – | – | – | – |
| Olf_Ebf_TFBS | 19 | 25 | – | 1 | – | 12 |
| ORegAnno | 256 | 690 | – | 114 | – | 305 |
| ORegAnno ENCODEprom | 302 | 457 | – | – | – | 1 |
| ORegAnno Erythroid | 8 | 33 | – | 1 | – | 1 |
| ORegAnno STAT1 lit | 28 | 37 | – | 1 | – | 29 |
| Pleiades genes | 285 | 1302 | 135 | 313 | – | 714 |
| TFe | 47 | 68 | 6 | 25 | – | 49 |
| TOTAL | 1284 | 3499 | 3370 | 708 | 138 | 1433 |
*This table includes only the experimentally validated annotations available in PAZAR and therefore excludes the Kellis predictions.
Figure 1. ORCAtk analysis pipeline. ORCAtk can be launched either directly or through PAZAR. To initiate analysis, ORCAtk first retrieves the user-specified sequence from the Ensembl database (‘e!’). Second, based on user selection, ORCAtk either performs a pairwise alignment or retrieves a multi-species alignment-based phastCons score profile from the UCSC database. Third, conserved regions are identified. If the user has chosen TFBS analysis, ORCAtk then performs a search for TFBSs in the conserved regions. The TF binding profiles for this optional analysis step are provided by the user, either by selection from the JASPAR database or by upload from a file previously downloaded from PAZAR or other sources. As output, ORCAtk provides both a graphical display and text files of the results. In addition, ORCAtk results can be viewed as tracks in the UCSC genome browser.
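The third and fourth pipeline steps (identifying conserved regions, then searching them for TFBSs) can be illustrated with a minimal Python sketch. This is not ORCAtk's code or API: the function names `conserved_regions` and `scan_motif`, the score threshold, the minimum region length and the toy scoring matrix are all assumptions chosen for illustration, and the conservation scores are supplied as a plain list rather than fetched from UCSC.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int]            # half-open interval [start, end)
Hit = Tuple[int, str, float]        # (position, matched window, score)


def conserved_regions(scores: List[float],
                      threshold: float = 0.7,
                      min_length: int = 5) -> List[Region]:
    """Return runs of consecutive positions whose score is >= threshold.

    `threshold` and `min_length` are illustrative defaults, not ORCAtk's.
    """
    regions: List[Region] = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i           # a new conserved run begins here
        else:
            if start is not None and i - start >= min_length:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the profile.
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores)))
    return regions


def scan_motif(sequence: str,
               matrix: List[Dict[str, float]],
               regions: List[Region],
               min_score: float) -> List[Hit]:
    """Score every motif-width window lying fully inside a conserved region.

    `matrix` holds one dict of per-base scores per motif column
    (a hypothetical stand-in for a TF binding profile).
    """
    width = len(matrix)
    hits: List[Hit] = []
    for start, end in regions:
        for pos in range(start, end - width + 1):
            window = sequence[pos:pos + width]
            score = sum(col.get(base, 0.0) for col, base in zip(matrix, window))
            if score >= min_score:
                hits.append((pos, window, score))
    return hits


if __name__ == "__main__":
    # Toy inputs: a 12-bp sequence, a per-base conservation profile,
    # and a 4-column matrix that favours the word TGAC.
    seq = "AAATGACGTCAA"
    profile = [0.1, 0.2, 0.9, 0.9, 0.8, 0.95, 0.9, 0.85, 0.9, 0.8, 0.3, 0.1]
    toy_matrix = [
        {b: (1.0 if b == want else -1.0) for b in "ACGT"} for want in "TGAC"
    ]
    regions = conserved_regions(profile)
    print(regions)
    print(scan_motif(seq, toy_matrix, regions, min_score=3.0))
```

Restricting the scan to conserved intervals is the point of the design: windows outside those intervals are never scored, which reduces the number of candidate sites reported.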
Figure 2. The PAZAR-ORCAtk link provides a portal to gene regulation studies. On the PAZAR Gene View page, a button is available for the user to launch an analysis using ORCAtk. Clicking the button opens a new window displaying the ORCAtk interface with the appropriate gene already selected. The user can then proceed through the multiple steps of the analysis, defining custom parameters or accepting the defaults at each step. The results of the analysis, as well as the annotations from PAZAR, can then be displayed in the UCSC genome browser, which provides a user-friendly platform for comparing experimentally verified and predicted binding sites.