Jemila S Hamid, Pingzhao Hu, Nicole M Roslin, Vicki Ling, Celia M T Greenwood, Joseph Beyene.
Abstract
Due to rapid technological advances, various types of genomic and proteomic data with different sizes, formats, and structures have become available. Among them are gene expression, single nucleotide polymorphism, copy number variation, and protein-protein/gene-gene interactions. Each of these distinct data types provides a different, partly independent and complementary, view of the whole genome. However, understanding the functions of genes, proteins, and other aspects of the genome requires more information than any single dataset provides. Integrating data from different sources is, therefore, an important part of current research in genomics and proteomics. Data integration also plays an important role in combining clinical, environmental, and demographic data with high-throughput genomic data. Nevertheless, the concept of data integration is not well defined in the literature, and it may mean different things to different researchers. In this paper, we first propose a conceptual framework for integrating genetic, genomic, and proteomic data. The framework captures fundamental aspects of data integration and is built around the key steps in genetic, genomic, and proteomic data fusion. Second, we provide a review of some of the most commonly used current methods and approaches for combining genomic data, with a focus on the statistical aspects.
Year: 2009 PMID: 20948564 PMCID: PMC2950414 DOI: 10.4061/2009/869093
Source DB: PubMed Journal: Hum Genomics Proteomics ISSN: 1757-4242
Figure 1. Conceptual framework for data integration in genetics and genomics.
Some illustrative examples for integrating similar and heterogeneous genomic, genetic, and proteomic data.
| Data types | Biological/statistical question | Stages of integration | Example/comments |
|---|---|---|---|
| | Sample classification | Early | Jiang et al. |
| Similar data types | Differential gene analysis | Late | Rhodes et al. (2002) |
| | Gene mapping | Late | Ioannidis et al. |
| | Candidate gene discovery | Intermediate | Adler et al. |
| Heterogeneous data types | Protein classification | Intermediate | Lanckriet et al. |
| | Gene mapping | Intermediate | McCaroll and Altshuler (2007) |
| | Gene set (function) differential analysis | Intermediate | Al-Shahrour et al. |
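
As a rough illustration of the "Early" and "Late" stages listed in the table, the sketch below contrasts merging raw data before analysis with combining per-study results after analysis. It is not taken from the paper or from any of the cited studies. The function names `early_integration` and `fisher_combined_p` are hypothetical, and Fisher's method is used here only as one familiar way to combine independent p-values, under the assumption that each study has already produced a valid p-value for the same gene.

```python
import math


def early_integration(dataset_a, dataset_b):
    """Early stage (illustrative): merge per-sample feature rows before analysis.

    Assumes both datasets list the same samples in the same order.
    """
    if len(dataset_a) != len(dataset_b):
        raise ValueError("datasets must describe the same samples")
    return [row_a + row_b for row_a, row_b in zip(dataset_a, dataset_b)]


def fisher_combined_p(p_values):
    """Late stage (illustrative): combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null, where k is the number of studies.
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    # Closed-form chi-square survival function for even degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical toy inputs: two samples measured on two platforms,
# and p-values for one gene from three independent studies.
merged = early_integration([[1.2, 0.7], [0.9, 1.1]], [[3.0], [2.5]])
combined = fisher_combined_p([0.04, 0.10, 0.20])
```

The early-stage function returns one wider data matrix that a single downstream analysis would consume, whereas the late-stage function leaves each study's analysis untouched and merges only the final evidence. Intermediate integration, which transforms each dataset into a common representation before combining, is not shown.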
Figure 2. An illustrative flowchart for finding disease-causing genes by integrating heterogeneous data.