Shicai Wang, Ioannis Pandis, Chao Wu, Sijin He, David Johnson, Ibrahim Emam, Florian Guitton, Yike Guo.
Abstract
BACKGROUND: High-throughput transcriptomic data generated by microarray experiments is the most abundant and frequently stored kind of data currently used in translational medicine studies. Although microarray data is supported in data warehouses such as tranSMART, queries against relational databases that retrieve hundreds of different patient gene expression records are slow. Non-relational data models, such as the key-value model implemented in NoSQL databases, hold promise as more performant solutions. Our motivation is to improve the performance of the tranSMART data warehouse with a view to supporting Next Generation Sequencing data.
Year: 2014 PMID: 25435347 PMCID: PMC4248814 DOI: 10.1186/1471-2164-15-S8-S3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Relational microarray data schema.
| GENE_SYMBOL | PROBESET_ID | PATIENT_ID | TRIAL_NAME | RAW | LOG | ZSCORE |
|---|---|---|---|---|---|---|
| VARCHAR2(100) |
A classic microarray table structure, the DEAPP schema in tranSMART.
Example of a relational model representation of a patient record.
| GENE_SYMBOL | PROBESET_ID | PATIENT_ID | TRIAL_NAME | RAW | LOG | ZSCORE |
|---|---|---|---|---|---|---|
| LDOC1 |
Figure 1. JSON example. Example of a JSON object that maps to the patient record illustrated in Table 2.
Example in DEAPP
| GENE_SYMBOL | PROBESET_ID | PATIENT_ID | TRIAL_NAME | RAW | LOG | ZSCORE |
|---|---|---|---|---|---|---|
Example data model in BigTable, transformed from Table 3.
| Key | Value | ||
|---|---|---|---|
| MULTMYEL + 79622 | |||
| MULTMYEL + 79622 | |||
| MULTMYEL + 79737 | |||
| 4a. LOG Family StoreFile | |||
| MULTMYEL + 79622 | |||
| MULTMYEL + 79622 | |||
| MULTMYEL + 79737 | |||
| 4b. RAW Family StoreFile | |||
| MULTMYEL + 79622 | |||
| MULTMYEL + 79622 | |||
| MULTMYEL + 79737 | |||
| 4c. ZSCORE Family StoreFile | |||
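The transformation from the relational DEAPP rows to the key-value layout above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: it assumes the row key is the trial name joined to the patient ID (as the `MULTMYEL + 79622` keys suggest), that each of `RAW`, `LOG` and `ZSCORE` becomes its own family, and that the column qualifier combines gene symbol and probeset ID. The function names, the `GENE_B` symbol, the probe identifiers and all numeric values are hypothetical placeholders.

```python
from collections import defaultdict

# One family per measurement column, mirroring the LOG / RAW / ZSCORE StoreFiles.
FAMILIES = ("RAW", "LOG", "ZSCORE")


def row_key(trial_name, patient_id):
    """Composite row key: trial name joined to patient ID (assumed layout)."""
    return f"{trial_name}+{patient_id}"


def to_key_value(rows):
    """Regroup relational rows into {family: {row_key: {qualifier: value}}}."""
    store = {family: defaultdict(dict) for family in FAMILIES}
    for r in rows:
        key = row_key(r["TRIAL_NAME"], r["PATIENT_ID"])
        # Assumed qualifier format: gene symbol and probeset ID together.
        qualifier = f'{r["GENE_SYMBOL"]}:{r["PROBESET_ID"]}'
        for family in FAMILIES:
            store[family][key][qualifier] = r[family]
    return store


# Hypothetical rows: probe IDs and numeric values are made up for illustration.
rows = [
    {"GENE_SYMBOL": "LDOC1", "PROBESET_ID": "probe_A", "PATIENT_ID": 79622,
     "TRIAL_NAME": "MULTMYEL", "RAW": 100.0, "LOG": 2.0, "ZSCORE": 0.5},
    {"GENE_SYMBOL": "GENE_B", "PROBESET_ID": "probe_B", "PATIENT_ID": 79622,
     "TRIAL_NAME": "MULTMYEL", "RAW": 1000.0, "LOG": 3.0, "ZSCORE": -0.5},
    {"GENE_SYMBOL": "LDOC1", "PROBESET_ID": "probe_A", "PATIENT_ID": 79737,
     "TRIAL_NAME": "MULTMYEL", "RAW": 10.0, "LOG": 1.0, "ZSCORE": 1.5},
]

store = to_key_value(rows)
```

With this layout, all expression values for one patient in one trial sit under a single row key, so a patient-level query touches one key per family rather than one relational row per probe.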
Figure 2. Performance of key-value vs. relational data model. The bar chart shows the query retrieval times for each of the test cases over varying numbers of patient record queries. The NoSQL model implemented on HBase performs best, with query performance approximately 3.06- to 7.42-fold higher than the relational model on MySQL Cluster and 2.68- to 10.50-fold higher than the relational model on MongoDB.
Figure 3. Performance of Random Read vs. Scan in key-value data model. The bar chart shows the query retrieval times for each of the test cases over varying numbers of patient record queries using both Random Read and Scan methods. The number above each Scan bar shows how many patients the scan read in that case. The error bars show the deviation of each test.
Figure 4. Performance of Random Read in different Families. The bar chart shows the query retrieval times for each of the test cases over varying numbers of patient record queries of three Families. The error bars show the deviation of each test.
Figure 5. Performance of Scan in different Families. The bar chart shows the query retrieval times for each of the test cases over varying numbers of patient record queries of three Families. The error bars show the deviation of each test.
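The difference between the Random Read and Scan access patterns compared in Figures 3 to 5 can be illustrated with a small in-memory stand-in for a sorted key-value table. This is a sketch under stated assumptions, not HBase client code: the `SortedKV` class, its `get` and `scan` methods, and the patient ID 79650 are hypothetical, and the stored values are placeholders.

```python
from bisect import bisect_left, bisect_right


class SortedKV:
    """Toy key-value table whose row keys are kept in sorted order."""

    def __init__(self, items):
        self._data = dict(items)
        self._keys = sorted(self._data)

    def get(self, key):
        """Random read: fetch exactly one row by its key."""
        return self._data.get(key)

    def scan(self, start, stop):
        """Scan: read every row whose key lies in [start, stop], in key order."""
        lo = bisect_left(self._keys, start)
        hi = bisect_right(self._keys, stop)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]


# Keys compare as strings; the patient IDs here all have the same width,
# so string order matches numeric order.
table = SortedKV({
    "MULTMYEL+79622": "expression values for patient 79622",
    "MULTMYEL+79650": "expression values for patient 79650",
    "MULTMYEL+79737": "expression values for patient 79737",
})

wanted = ["MULTMYEL+79622", "MULTMYEL+79737"]

# Random read: one lookup per wanted patient, nothing else is touched.
random_reads = [table.get(key) for key in wanted]

# Scan: one range read from the first to the last wanted key, which also
# returns any patient whose key falls in between.
scanned = table.scan(wanted[0], wanted[-1])
```

In this toy case the scan reads three patients to serve a query for two, which is the kind of overhead reflected by the patient counts shown above the Scan bars in Figure 3.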