Christopher Wilks, Melissa S Cline, Erich Weiler, Mark Diehkans, Brian Craft, Christy Martin, Daniel Murphy, Howdy Pierce, John Black, Donavan Nelson, Brian Litzinger, Thomas Hatton, Lori Maltbie, Michael Ainsworth, Patrick Allen, Linda Rosewood, Elizabeth Mitchell, Bradley Smith, Jim Warner, John Groboske, Haifang Telc, Daniel Wilson, Brian Sanford, Hannes Schmidt, David Haussler, Daniel Maltbie.
Abstract
The Cancer Genomics Hub (CGHub) is the online repository of the sequencing programs of the National Cancer Institute (NCI), including The Cancer Genome Atlas (TCGA), the Cancer Cell Line Encyclopedia (CCLE) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) projects, with data from 25 different types of cancer. CGHub currently contains >1.4 PB of data, has grown at an average rate of 50 TB a month and serves >100 TB per week. The architecture of CGHub is designed to support bulk searching and downloading through a Web-accessible application programming interface, to enforce patient genome confidentiality in data storage and transmission and to optimize for efficiency in access and transfer. In this article, we describe the design of these three components, present performance results for our transfer protocol, GeneTorrent, and finally report on the growth of the system in terms of data stored and transferred, including estimated limits on the current architecture. Our experience-based estimates suggest that centralizing storage and computational resources is more efficient than wide distribution across many satellite labs. Database URL: https://cghub.ucsc.edu. Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.
Year: 2014 PMID: 25267794 PMCID: PMC4178372 DOI: 10.1093/database/bau093
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451