J Rafael Montenegro-Burke1, Aries E Aisporna1, H Paul Benton1, Duane Rinehart1, Mingliang Fang1, Tao Huan1, Benedikt Warth1, Erica Forsberg1, Brian T Abe2, Julijana Ivanisevic3, Dennis W Wolan4, Luc Teyton2, Luke Lairson5, Gary Siuzdak1,6.
Abstract
The speed and throughput of analytical platforms have been a driving force in the "omics" technologies in recent years, and while great strides have been made in both chromatography and mass spectrometry, data analysis times have not improved at the same pace. Even though personal computers have become more powerful, data transfer times still represent a bottleneck in data processing because of increasingly complex data files and studies with greater numbers of samples. To meet the demand of analyzing hundreds to thousands of samples within a given experiment, we have developed a data streaming platform, XCMS Stream, which capitalizes on the acquisition time to compress and stream recently acquired data files to data processing servers, mimicking just-in-time production strategies from the manufacturing industry. The utility of this XCMS Online-based technology is demonstrated here in the analysis of T cell metabolism and other large-scale metabolomic studies. A large-scale example on a 1000-sample data set demonstrated a 10 000-fold time savings, reducing data analysis time from days to minutes. Further, XCMS Stream can increase the efficiency of downstream biochemical dependent data acquisition (BDDA) analysis by initiating data conversion and data processing on subsets of the acquired data, expanding its application beyond data transfer to smart preliminary data decision-making prior to full acquisition.
Year: 2017 PMID: 27983788 PMCID: PMC5244434 DOI: 10.1021/acs.analchem.6b03890
Source DB: PubMed Journal: Anal Chem ISSN: 0003-2700 Impact factor: 6.986
Figure 1. Theoretical time comparison of the analytical process with data streaming capabilities. (A) Traditional process of data acquisition followed by data conversion and uploading to a server for data processing and analysis before obtaining results. (B) Alternative process utilizing real-time data streaming. Files are compressed and streamed to the server after acquisition while other data files are being acquired, reducing the time needed to obtain results. (C) Comparison of direct data upload time with and without streaming capabilities for different numbers of samples (assuming a 1 min upload time for each data file in both scenarios).
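The comparison in panel (C) can be illustrated with a small timing model. This is a minimal sketch, not the authors' calculation: the function name and the 20 min acquisition time per sample are assumptions made for illustration; only the 1 min upload time per file comes from the caption.

```python
def post_acquisition_wait(n_samples: int, upload_min: float,
                          acquisition_min: float, streaming: bool) -> float:
    """Minutes between the end of the last acquisition and the end of the last upload.

    Hypothetical model: samples are acquired back to back, and uploads run
    one at a time.
    """
    if not streaming:
        # All uploads start only after the whole batch has been acquired.
        return n_samples * upload_min

    # With streaming, file i can start uploading once it has been acquired
    # and the previous upload has finished.
    upload_finished_at = 0.0
    for i in range(1, n_samples + 1):
        acquired_at = i * acquisition_min
        upload_finished_at = max(upload_finished_at, acquired_at) + upload_min
    return upload_finished_at - n_samples * acquisition_min


# 1 min upload per file (from the caption); 20 min per acquisition is assumed.
manual = post_acquisition_wait(1000, 1.0, 20.0, streaming=False)
streamed = post_acquisition_wait(1000, 1.0, 20.0, streaming=True)
print(manual, streamed)
```

Under these assumptions, the wait without streaming grows linearly with the number of samples, while with streaming only the upload of the final file remains once acquisition ends, provided each upload is shorter than one acquisition.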
Figure 2. XCMS Stream flowchart showing the general strategy for data streaming. After the acquisition of each individual LC-MS run, the data file is compressed to reduce its size before being streamed. XO Cloud serves as a “Data buffer” between streaming and “Job” submission to XCMS Online. This is necessary because data processing and data analysis in XCMS Online cannot start until all files needed for the requested “Job” are available. Once data upload to XCMS Online is complete, data processing and analysis can take place.
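The compress-then-buffer strategy in this flowchart can be sketched as follows. This is an illustrative sketch only: it uses gzip and a local directory as a stand-in for the buffer, and the function names are hypothetical. The source does not state which compression format or transfer mechanism XCMS Stream uses.

```python
import gzip
import shutil
from pathlib import Path


def compress_to_buffer(data_file: Path, buffer_dir: Path) -> Path:
    """Compress one finished data file into the buffer directory (gzip is assumed)."""
    buffer_dir.mkdir(parents=True, exist_ok=True)
    target = buffer_dir / (data_file.name + ".gz")
    with open(data_file, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


def job_ready(buffer_dir: Path, expected_names: set) -> bool:
    """Return True only when every file expected for the job is in the buffer."""
    buffered = {p.name[: -len(".gz")] for p in buffer_dir.glob("*.gz")}
    return expected_names <= buffered
```

In this sketch, `compress_to_buffer` would be called after each run finishes, and the job would be submitted only once `job_ready` returns True, mirroring the caption's point that processing cannot start until all files for the job are present.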
Figure 3. XCMS Stream screenshot of the user-friendly interface. The “Directory” section indicates where the files are stored. The optional “Column Details” section allows entry of stationary phase information specific to the analysis. In “Job Information”, data can be streamed either online or offline, where online refers to data streaming while other samples are being acquired and offline refers to data streaming after all samples have been acquired. Furthermore, single, pairwise, and multigroup jobs can be selected, as well as a “Bio Source” option (H. sapiens is the default). “Data Information” is used to determine the number of samples and when each data file is complete, so that the streaming process can start. Moreover, the file mask allows each sample to be correctly assigned to a particular data set for the “Job” in XCMS Online data processing and analysis.
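The role of the file mask can be illustrated with a short sketch. The mask syntax used by XCMS Stream is not given here, so shell-style wildcards are assumed, and the function name, data set names, and file names below are hypothetical.

```python
from fnmatch import fnmatchcase


def assign_to_data_sets(file_names, masks):
    """Assign each file to the first data set whose mask matches its name.

    masks maps a data set name to a shell-style pattern (an assumed syntax).
    Returns the assignment and a list of files that matched no mask.
    """
    assignment = {data_set: [] for data_set in masks}
    unassigned = []
    for file_name in file_names:
        for data_set, mask in masks.items():
            if fnmatchcase(file_name, mask):
                assignment[data_set].append(file_name)
                break
        else:
            # No mask matched this file.
            unassigned.append(file_name)
    return assignment, unassigned


files = ["CD4_01.d", "CD8_01.d", "CD4_02.d", "blank_01.d"]
masks = {"CD4": "CD4_*", "CD8": "CD8_*"}
assignment, unassigned = assign_to_data_sets(files, masks)
print(assignment, unassigned)
```

For a pairwise job such as the CD4 versus CD8 comparison, two masks would be enough to route each incoming file to its group, and files that match neither mask are reported separately instead of being dropped silently.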
Figure 4. Time comparison between XCMS Stream and manual data uploading for the pairwise analysis of CD4 and CD8 human T cells and for 1000 urine samples from ref 11. (A) Large time savings are gained with “Online Streaming”, with results generated only 4 h after data acquisition compared to 18 h for “Manual Uploading”. Data dead time is the time between the completion of data acquisition and the completion of data uploading for processing. (B) In “Batch Streaming”, the data files are automatically uploaded to user-specific data sets and an XCMS Online job is generated. This is performed after data acquisition is completed. The time savings of “Batch Streaming” compared to “Manual Uploading” for 1000 urine samples is 5.7 days.
Figure 5. Extrapolation of the time comparison between XCMS Stream and manual data uploading for large data sets. Data transfer time in days (logarithmic scale) for different numbers of samples using online streaming, batch streaming, and manual uploading.