| Literature DB >> 24621103 |
Zhuohui Gan, Jianwu Wang, Nathan Salomonis, Jennifer C Stowe, Gabriel G Haddad, Andrew D McCulloch, Ilkay Altintas, Alexander C Zambon.
Abstract
BACKGROUND: Mandatory deposit of raw microarray data files for public access, prior to study publication, provides significant opportunities to conduct new bioinformatics analyses within and across multiple datasets. Analysis of raw microarray data files (e.g. Affymetrix CEL files) can be time-consuming and complex, and requires fundamental computational and bioinformatics skills. Developing analytical workflows to automate these tasks simplifies processing, improves efficiency, and helps standardize multiple and sequential analyses. Once installed, workflows streamline the tedious steps required to run rapid intra- and inter-dataset comparisons.
Year: 2014 PMID: 24621103 PMCID: PMC3975178 DOI: 10.1186/1471-2105-15-69
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The conceptual view of MAAMD. Briefly, each of the n targeted datasets is analyzed in turn within a loop. Input files are prepared in advance, and MAAMD extracts the required information from them. The targeted dataset is then downloaded and reorganized. A quality-control step is run to help users evaluate sample quality. MAAMD then prepares for the meta-analyses by asking users to select sample groups and intra-set comparisons. The meta-analyses are then executed, and the results are stored at the assigned location. When all targeted datasets have been analyzed, an inter-set comparison can be executed to identify the genes commonly regulated across datasets.
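The control flow described for Figure 1 (a per-dataset loop followed by a single inter-set comparison) can be sketched as below. This is only an illustration of the ordering, not MAAMD's Kepler implementation; the step names in `PER_DATASET_STEPS` and the function `plan_workflow` are hypothetical labels paraphrasing the caption.

```python
from typing import List, Tuple

# Hypothetical step names paraphrasing the Figure 1 caption (not MAAMD identifiers).
PER_DATASET_STEPS = (
    "extract_input_info",             # read the prepared input files
    "download_and_reorganize",        # fetch and reorganize the dataset
    "quality_control",                # produce a report for sample evaluation
    "select_groups_and_comparisons",  # user picks groups / intra-set comparisons
    "run_meta_analysis",              # execute the analyses
    "store_results",                  # write results to the assigned location
)


def plan_workflow(dataset_ids: List[str]) -> List[Tuple[str, str]]:
    """Return the ordered (dataset, step) pairs the loop would execute."""
    plan: List[Tuple[str, str]] = []
    for gse in dataset_ids:  # analyze the n targeted datasets one by one
        for step in PER_DATASET_STEPS:
            plan.append((gse, step))
    # Only after every dataset has been processed: compare across datasets.
    plan.append(("ALL", "inter_set_comparison"))
    return plan


plan = plan_workflow(["gse15879", "gse9400"])
```

The key design point the sketch captures is that the inter-set comparison depends on the results of every per-dataset pass, so it sits outside the loop.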
Figure 2. The implementation of MAAMD in Kepler. (A) Design of the entire workflow. (B) Design of module e in Figure 2A. (C) Design of module d in Figure 2A. (D) Design of module c in Figure 2C. (E) Design of module d in Figure 2C. (F) Design of module e in Figure 2C.
Figure 3. A screenshot of the summary section of the quality control report for dataset GSE9400. The report includes five sections: “Between array comparison”, “Array intensity distribution”, “Variance mean dependence”, “Affymetrix specific plots”, and “Individual array quality”.
Table 1. An example of the CSV file describing the targeted datasets

| Dataset | Species | Array type | Sample information file |
|---|---|---|---|
| gse15879 | Dm | Drosophila_2 | C:/MAAMD/datainfo-gse15879.csv |
| gse14981 | Dm | Drosophila_2 | C:/MAAMD/datainfo-gse14981.csv |
| gse12160 | Dm | Drosophila_2 | C:/MAAMD/datainfo-gse12160.csv |
| gse9400 | Mm | Mouse430_2 | C:/MAAMD/datainfo-gse9400.csv |
Table 1 lists the contents of “datasets.csv”, which describes the targeted datasets for the case study.
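Reading a file like “datasets.csv” is straightforward with Python's standard `csv` module. The sketch below assumes the file is comma-separated with no header row and exactly four fields per row in the order shown in Table 1; the class `TargetDataset` and the function `read_datasets_csv` are hypothetical names, not part of MAAMD.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class TargetDataset:
    gse_id: str      # GEO series accession, e.g. "gse9400"
    species: str     # species code, e.g. "Dm" or "Mm"
    array_type: str  # Affymetrix array name, e.g. "Mouse430_2"
    info_file: str   # path to the per-dataset sample description CSV


def read_datasets_csv(text: str) -> List[TargetDataset]:
    """Parse header-less rows of: accession, species, array type, info-file path."""
    rows = csv.reader(io.StringIO(text))
    # Skip blank rows; strip stray whitespace around each field.
    return [TargetDataset(*[field.strip() for field in row]) for row in rows if row]


example = """gse15879,Dm,Drosophila_2,C:/MAAMD/datainfo-gse15879.csv
gse9400,Mm,Mouse430_2,C:/MAAMD/datainfo-gse9400.csv
"""
datasets = read_datasets_csv(example)
```

In practice the text would come from `open("datasets.csv", newline="")` rather than an inline string; each `info_file` then points to the sample description for that dataset.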
Table 2. An example of the CSV file describing the samples in one dataset

| GEO CEL file | Original CEL file | Group |
|---|---|---|
| GSM239142.CEL | mmc57bl_muscle_norm_s1.CEL | con |
| GSM239143.CEL | mmc57bl_muscle_norm_s2.CEL | con |
| GSM239144.CEL | mmc57bl_muscle_norm_s3.CEL | con |
| GSM239145.CEL | mmc57bl_muscle_norm_s4.CEL | con |
| GSM239146.CEL | mmc57bl_muscle_802wks_s1.CEL | hyp |
| GSM239147.CEL | mmc57bl_muscle_802wks_s2.CEL | hyp |
| GSM239148.CEL | mmc57bl_muscle_802wks_s3.CEL | hyp |
| GSM239149.CEL | mmc57bl_muscle_802wks_s4.CEL | hyp |
Table 2 lists the contents of “datainfo-gse9400.csv”, which supplies sample information for the GSE9400 dataset.
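The group label in the third column is what allows samples to be pooled for an intra-set comparison. A minimal sketch, assuming a header-less, comma-separated file with the three columns of Table 2; `group_samples` is a hypothetical helper, not a MAAMD function.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def group_samples(text: str) -> Dict[str, List[str]]:
    """Map each group label (third column) to its GEO CEL file names (first column)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # ignore blank lines
        geo_cel, _original_cel, label = (field.strip() for field in row)
        groups[label].append(geo_cel)
    return dict(groups)


example = """\
GSM239142.CEL,mmc57bl_muscle_norm_s1.CEL,con
GSM239143.CEL,mmc57bl_muscle_norm_s2.CEL,con
GSM239144.CEL,mmc57bl_muscle_norm_s3.CEL,con
GSM239145.CEL,mmc57bl_muscle_norm_s4.CEL,con
GSM239146.CEL,mmc57bl_muscle_802wks_s1.CEL,hyp
GSM239147.CEL,mmc57bl_muscle_802wks_s2.CEL,hyp
GSM239148.CEL,mmc57bl_muscle_802wks_s3.CEL,hyp
GSM239149.CEL,mmc57bl_muscle_802wks_s4.CEL,hyp
"""
groups = group_samples(example)
```

The resulting dictionary gives the two sample sets (four “con” and four “hyp” arrays) that an intra-set comparison such as hyp vs. con would contrast.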
Figure 4. The structure of the work and output folders of MAAMD. (A) Contents of the work folder. (B) Contents of the “workflow” folder. (C) Contents of the output folder for an individual dataset, “GSE9400”. (D) Contents of the “result” folder for GSE9400. (E) Contents of the “ExpressionOutput” folder in the “result” folder of GSE9400, where “DATASET-GSE9400.txt” is located.
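Given the nesting in Figure 4 (C–E), the location of a dataset's expression table can be derived from the output root and the accession. This sketch assumes the per-dataset folder is named with the upper-case accession directly under the output root; the root `C:/MAAMD/output` and the function `expression_output_path` are hypothetical.

```python
from pathlib import PurePosixPath


def expression_output_path(output_root: str, gse_id: str) -> PurePosixPath:
    """Locate DATASET-<GSE>.txt following the folder nesting shown in Figure 4."""
    gse = gse_id.upper()  # assumed: folders use upper-case accessions, e.g. "GSE9400"
    return (PurePosixPath(output_root) / gse / "result"
            / "ExpressionOutput" / f"DATASET-{gse}.txt")


p = expression_output_path("C:/MAAMD/output", "gse9400")
```

`PurePosixPath` is used only to keep the forward-slash style of the paths in Table 1; it never touches the file system.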
The list of commonly regulated genes between datasets

| Compared Datasets | Significance | Common Regulated Genes (total) | Conserved genes | Differential genes |
|---|---|---|---|---|
| GSE15879 vs. GSE14981 | A comparison between chronic and acute hypoxia in flies | 7 | Hsp26, HSPA1A, GstD1, CG14120 | CG3734, CG13607, Cyp4d2 |
| GSE15879 vs. GSE12160 | A comparison between chronic hypoxia and chronic hyperoxia in flies | 12 | Hsp26, CG14120, RGN, CG31300, LOC423786 | Lsp1beta, CG15766, Cyp4d2, CG5897, HPGD, Lsd-1, si:dkey-7814.10 |
| GSE15879 vs. GSE9400 | A comparison of chronic hypoxia in mice and flies | 4 | RRM2, AGPAT3 | RRM1, FBP2 |
| GSE15879 vs. GSE14981 vs. GSE12160 | A comparison of chronic hypoxia, chronic hyperoxia and acute hypoxia in flies | 3 | Hsp26, CG14120 | Cyp4d2 |
The “Compared Datasets” column lists each comparison. The “Significance” column states the biological rationale for making that comparison. The “Common Regulated Genes” column gives the total number of commonly regulated genes, followed by the conserved genes and the differential genes.
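An inter-set comparison of this kind can be sketched as a set intersection followed by a split on regulation direction. The table does not define “conserved” and “differential”; this sketch assumes that conserved genes change in the same direction in every compared dataset and differential genes change in different directions. The function `common_regulated_genes` and the gene names in the example (`geneA`, `geneB`, ...) are hypothetical toy inputs, not results from the study.

```python
from typing import Dict, List, Tuple


def common_regulated_genes(
    per_dataset: List[Dict[str, int]],
) -> Tuple[List[str], List[str]]:
    """Split genes regulated in every dataset into conserved and differential.

    Each dict maps gene symbol -> direction (+1 up, -1 down).
    Assumption: "conserved" = same direction in all datasets,
    "differential" = direction differs in at least one dataset.
    """
    shared = set(per_dataset[0])
    for genes in per_dataset[1:]:
        shared &= set(genes)  # keep genes regulated in every dataset
    conserved: List[str] = []
    differential: List[str] = []
    for gene in sorted(shared):
        directions = {genes[gene] for genes in per_dataset}
        if len(directions) == 1:
            conserved.append(gene)
        else:
            differential.append(gene)
    return conserved, differential


# Hypothetical toy inputs for a three-way comparison.
a = {"geneA": 1, "geneB": -1, "geneC": 1, "geneD": 1}
b = {"geneA": 1, "geneB": 1, "geneC": 1}
c = {"geneA": 1, "geneB": -1, "geneC": 1, "geneE": -1}
conserved, differential = common_regulated_genes([a, b, c])
```

Because the intersection is taken across all inputs, the same function covers both the pairwise rows and the three-way row of the table; the total is simply `len(conserved) + len(differential)`.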