Abstract
Comparative analysis of epigenomes offers new opportunities to understand cellular differentiation, mutation effects and disease processes. But the scale and heterogeneity of epigenetic data present numerous computational challenges.
Year: 2010 PMID: 20944597 PMCID: PMC4144442 DOI: 10.1038/nbt1010-1053
Source DB: PubMed Journal: Nat Biotechnol ISSN: 1087-0156 Impact factor: 54.908
Figure 1. The Scope of Epigenomic Variation
The spectrum of epigenomic variation is wide, spanning biological processes on all time scales, from rapid physiological homeostatic processes that may occur on the scale of minutes to the diversity across species separated by tens of millions of years of evolution. Waddington's epigenetic landscape (ref. 27) and the corresponding bifurcating tree of cellular differentiation are shown on the right, highlighted by the light blue background.
Figure 2. A Cyberinfrastructure for Epigenome Analysis and Comparison
The cyberinfrastructure seamlessly connects users and resources that are geographically distributed over the network. A clinical researcher conducting a study of disease-related epigenomic perturbations may rely almost completely on remote resources distributed over the web for primary processing of the data (Data Levels 0-3) and comparative analysis using the Human Epigenome Atlas. Upon publication of results, individual projects contribute data to the Human Epigenome Atlas, thus enhancing the utility of this shared resource for future users.
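The Data Levels referenced in this caption are defined in the key-concepts table. As a minimal sketch, assuming the levels form a simple linear order (an interpretation, not something the article states), primary processing from Level 0 up to Level 3 can be modeled as a chain of one-level promotions. The enum member names and the function `primary_processing_plan` are hypothetical.

```python
from enum import IntEnum


class DataLevel(IntEnum):
    """Data Levels from the key-concepts table; member names are hypothetical."""
    SEQUENCE = 0   # DNA sequence (reads)
    MAPPED = 1     # reads mapped to a reference
    SIGNAL = 2     # "raw epigenomic signal"
    DISCRETE = 3   # typically discrete data (e.g., called regions; example assumed)
    RESULTS = 4    # results of epigenome analysis (table text truncated)


def primary_processing_plan(
    start: DataLevel, target: DataLevel = DataLevel.DISCRETE
) -> list[tuple[DataLevel, DataLevel]]:
    """Return the ordered one-level promotions needed to go from start to target.

    Assumes (hypothetically) that data only ever moves up one level at a time.
    """
    if target < start:
        raise ValueError("cannot move data to a lower Data Level")
    return [(DataLevel(lvl), DataLevel(lvl + 1)) for lvl in range(start, target)]


if __name__ == "__main__":
    # Primary processing as in the caption: Data Levels 0-3.
    for src, dst in primary_processing_plan(DataLevel.SEQUENCE):
        print(f"Level {int(src)} -> Level {int(dst)}: {src.name} -> {dst.name}")
```

Using `IntEnum` keeps the levels comparable as integers, so ordering checks and `range` work directly on them while the names stay readable.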
Cyberinfrastructure for epigenome research: key concepts
The components of the emerging cyberinfrastructure are organized around six general requirements listed in the first column.
| Requirement | Concept | Description and examples of relevance for |
|---|---|---|
| 1. Data reuse | Data Level | This abstraction captures commonalities and |
| | Data Level 0 | Refers to the DNA sequence |
| | Data Level 1 | Refers to reads mapped to a |
| | Data Level 2 | “Raw epigenomic signal” such as |
| | Data Level 3 | Typically discrete data such as |
| | Data Level 4 | Results of epigenome |
| | Syntax | Data formats to meet the often conflicting |
| | Semantics | Theory of meaning. This term is commonly used in |
| | Semantic | Set of technologies including RDF for knowledge |
| | Metadata | Data about data. Key requirement for data reuse. |
| 2. Tool | Pipeline | Set of analysis tools that are invoked sequentially |
| | Workflow | Formal, portable, programmatically executable |
| | Workbench | An environment for integration of data analysis |
| 3. Web services | URI and URL | Address system of the Web. Used to uniquely |
| | REST API | Representational State Transfer Application |
| 4. Access to | Cloud | Access to “elastic”, on-demand computing and |
| | Software-as- | Access to software applications over the web such |
| 5. Collaboration | Authentication | Protocol (e.g., OpenID) allowing users or |
| | Web 2.0 | Web hosting of collaborative processes such as |
| 6. Databases knowledge-bases | | Examples include NCBI GEO and SRA archives, |
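The table defines a pipeline as a set of analysis tools invoked sequentially and names metadata as a key requirement for data reuse. The sketch below combines the two under stated assumptions: each tool is a plain Python function, and provenance is recorded as a list of tool names in a metadata dictionary. The tools `bin_positions` and `call_enriched`, the 100-bp bin size, the count threshold, and the `assay` metadata key are all hypothetical, loosely echoing a move from mapped reads toward a binned signal and then discrete regions; they are not methods from the article.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# A hypothetical "tool": any function from one payload to the next, with a name.
Step = tuple[str, Callable[[Any], Any]]


@dataclass
class Dataset:
    payload: Any
    metadata: dict = field(default_factory=dict)  # "data about data"


def run_pipeline(data: Dataset, steps: list[Step]) -> Dataset:
    """Invoke each tool in order, recording its name as provenance metadata."""
    provenance = list(data.metadata.get("provenance", []))
    payload = data.payload
    for name, tool in steps:
        payload = tool(payload)  # output of one tool feeds the next
        provenance.append(name)
    # Return a new Dataset; the input's metadata dict is left unmodified.
    return Dataset(payload, {**data.metadata, "provenance": provenance})


def bin_positions(positions: list[int], bin_size: int = 100) -> dict[int, int]:
    """Hypothetical stand-in for signal generation: count read positions per bin."""
    counts: dict[int, int] = {}
    for pos in positions:
        counts[pos // bin_size] = counts.get(pos // bin_size, 0) + 1
    return counts


def call_enriched(counts: dict[int, int], min_count: int = 2) -> list[int]:
    """Hypothetical threshold caller: bins holding at least min_count reads."""
    return sorted(b for b, c in counts.items() if c >= min_count)


if __name__ == "__main__":
    mapped = Dataset([5, 40, 150, 160, 170, 420], {"assay": "ChIP-seq"})
    result = run_pipeline(mapped, [("bin_positions", bin_positions),
                                   ("call_enriched", call_enriched)])
    print(result.payload, result.metadata)
```

A workflow, in the table's sense of a formal, portable, programmatically executable description, would go further than this in-memory list of functions by declaring its steps and parameters in a machine-readable form that other systems can run.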