Abstract
Improving technologies for interrogating global molecular alterations in human diseases, combined with increases in computational capacity, have enabled unprecedented insight into disease etiology and pathogenesis and have opened new possibilities for biomarker development. A large body of data has accumulated over recent years, with the most prominent increase in information originating from genomic, transcriptomic and proteomic profiling. However, the complexity of these data has made it difficult to discover high-order disease mechanisms involving several biological layers, and new approaches are therefore required to integrate such data into a complete representation of molecular events occurring at the cellular level. For this reason, we developed a new mode of integrating results from heterogeneous origins, using rank statistics of the results from each profiling level. With the increasing use of next-generation sequencing technology, experimental information is becoming increasingly associated with sequence information; we have therefore decided to synthesize the heterogeneous results using their genomic position. We thus propose a novel positional integratomic approach toward studying 'omic' information in human disease.
Keywords: Data integration; Genomics; High-throughput technologies; Transcriptomics
Year: 2012 PMID: 24052743 PMCID: PMC3776674 DOI: 10.2478/v10034-012-0018-7
Source DB: PubMed Journal: Balkan J Med Genet ISSN: 1311-0160 Impact factor: 0.519
Figure 1. Process of integration of numerous heterogeneous data sources. First, data on significant alterations in a given biological layer are obtained from selected studies (data from the various layers are coded by letters a–n and differing colors). These alterations, or signals, are then positioned into genomic bins of fixed size, and a bin score is estimated for each bin. For each of the layers a–n, bins are then prioritized on the basis of this score and the rank of each bin is recorded. The final integration step is then performed by calculating rank products for each of the genomic bins, based on their ranks in each of the data sources.
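The steps in Figure 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 1 Mb bin width, the use of summed signal weights as the bin score, the deterministic tie-breaking, and all function and variable names are assumptions made for illustration only.

```python
from collections import defaultdict
from math import prod

BIN_SIZE = 1_000_000  # hypothetical fixed bin width in base pairs


def bin_scores(signals, bin_size=BIN_SIZE):
    """Sum signal weights per (chromosome, bin index).

    `signals` is an iterable of (chromosome, position, weight) tuples.
    Summing weights is an assumed scoring rule, not taken from the paper.
    """
    scores = defaultdict(float)
    for chrom, pos, weight in signals:
        scores[(chrom, pos // bin_size)] += weight
    return dict(scores)


def rank_bins(scores, all_bins):
    """Rank bins by descending score (rank 1 = highest score).

    Bins without any signal in this layer get score 0. Ties are broken
    by bin key, which is a simplification.
    """
    ordered = sorted(all_bins, key=lambda b: (-scores.get(b, 0.0), b))
    return {b: i + 1 for i, b in enumerate(ordered)}


def rank_products(layers, bin_size=BIN_SIZE):
    """Multiply each bin's ranks across all layers; lower means stronger."""
    per_layer = [bin_scores(s, bin_size) for s in layers.values()]
    all_bins = sorted(set().union(*per_layer))
    ranks = [rank_bins(s, all_bins) for s in per_layer]
    return {b: prod(r[b] for r in ranks) for b in all_bins}


# Toy input: two layers, each a list of (chromosome, position, weight).
layers = {
    "genomic": [("chr6", 31_500_000, 3.0), ("chr1", 1_200_000, 1.0)],
    "transcriptomic": [("chr6", 31_900_000, 2.0), ("chr2", 500_000, 1.0)],
}
rp = rank_products(layers)
best_bin = min(rp, key=rp.get)  # bin supported by both layers
```

In this toy input, the only bin that carries signals from both layers receives the smallest rank product, which is the behaviour the rank-based integration is meant to capture.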
Figure 2. The genome-wide distribution of significance values, based on the permutation test of integration scores. Each region, or genomic bin, is represented by a dot whose height represents its significance in −log₁₀(P) form, with regions characterized by a high accumulation of heterogeneous data attaining higher −log₁₀(P) values. The HLA region on chromosome 6 attained the highest score in these analyses, with p values below 1 × 10⁻⁹. Notably, non-HLA regions score high as well, offering a landscape of new genomic regions for further downstream investigations.
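One way a permutation test on rank products could yield the −log₁₀(P) values plotted in Figure 2 is sketched below. The record does not describe the permutation scheme, so everything here is an assumption: ranks are shuffled independently within each layer, the null distribution is pooled over all bins, and the function names are hypothetical.

```python
import bisect
import math
import random


def permutation_pvalues(rank_lists, n_perm=1000, seed=0):
    """Empirical p-values for per-bin rank products.

    `rank_lists` holds one list of ranks per layer, aligned by bin index.
    Assumed null model: ranks are shuffled independently within each layer.
    """
    rng = random.Random(seed)
    n_bins = len(rank_lists[0])
    observed = [math.prod(layer[i] for layer in rank_lists)
                for i in range(n_bins)]

    null = []
    for _ in range(n_perm):
        # rng.sample with k == len(layer) returns a shuffled copy.
        shuffled = [rng.sample(layer, n_bins) for layer in rank_lists]
        null.extend(math.prod(layer[i] for layer in shuffled)
                    for i in range(n_bins))
    null.sort()

    total = len(null)
    pvals = []
    for obs in observed:
        # Count null rank products at least as small as the observed one.
        hits = bisect.bisect_right(null, obs)
        pvals.append((hits + 1) / (total + 1))  # add-one avoids p = 0
    return pvals


def neg_log10(p):
    """Convert a p-value to the -log10 scale used on the plot's y-axis."""
    return -math.log10(p)


# Toy input: two layers, five bins; bin 0 ranks first in both layers.
rank_lists = [[1, 2, 3, 4, 5], [1, 3, 2, 5, 4]]
pvals = permutation_pvalues(rank_lists, n_perm=200)
heights = [neg_log10(p) for p in pvals]
```

With this add-one estimator the smallest attainable p-value is bounded by the number of permuted values, so p values as small as those reported in the caption would need a very large number of permutations or a different estimation approach.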