Sergei Yakneen, Sebastian M Waszak, Michael Gertz, Jan O Korbel.
Abstract
We present Butler, a computational tool that facilitates large-scale genomic analyses on public and academic clouds. Butler includes innovative anomaly detection and self-healing functions that improve the efficiency of data processing and analysis by 43% compared with current approaches. Butler enabled processing of a 725-terabyte cancer genome dataset from the Pan-Cancer Analysis of Whole Genomes (PCAWG) project in a time-efficient and uniform manner.
Year: 2020 PMID: 32024987 PMCID: PMC7062635 DOI: 10.1038/s41587-019-0360-3
Source DB: PubMed Journal: Nat Biotechnol ISSN: 1087-0156 Impact factor: 54.908
Fig. 1 Butler framework architecture.
a, The framework consists of several interconnected components, each running on a separate virtual machine (VM). See Methods and Supplementary Note 1 for details. b, Metrics flow from all VMs into a time series database. The self-healing agent detects anomalies and takes appropriate action. See Supplementary Note 1 for details. Solid arrows indicate information flow; dashed arrows indicate metrics flow; dashed-and-dotted arrows indicate configuration instructions.
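The metrics-to-action flow in panel b can be illustrated with a minimal sketch. The caption does not say how Butler detects anomalies or which actions it takes, so everything here is an assumption made for illustration: the z-score rule, the threshold, the metric names, and the action names are all hypothetical and are not Butler's actual implementation.

```python
from statistics import mean, stdev


def find_anomalies(samples, threshold=3.0):
    """Return indices of samples whose z-score exceeds `threshold`.

    Assumption: a plain z-score rule stands in for whatever detection
    logic the real self-healing agent applies to its time series.
    """
    if len(samples) < 2:
        return []
    mu = mean(samples)
    sigma = stdev(samples)
    if sigma == 0:
        return []  # constant series: nothing stands out
    return [i for i, x in enumerate(samples) if abs(x - mu) / sigma > threshold]


# Hypothetical mapping from a metric name to a remedial action.
ACTIONS = {
    "disk_used_pct": "clean_scratch_space",
    "worker_heartbeat_age_s": "restart_worker",
}


def choose_action(metric_name):
    """Pick an action for an anomalous metric, defaulting to a human alert."""
    return ACTIONS.get(metric_name, "notify_operator")


if __name__ == "__main__":
    # Twenty normal readings followed by one spike.
    heartbeat_age = [10.0] * 20 + [100.0]
    for index in find_anomalies(heartbeat_age):
        print(index, choose_action("worker_heartbeat_age_s"))
```

Falling back to an operator notification for unknown metrics keeps the sketch conservative: automated action is taken only where a remedy has been explicitly configured.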
Fig. 2 Butler performance comparison.
a,b, Ratio of actual to target progress rates for core PCAWG pipelines (a) vs. Butler pipelines (b). See Methods for details. c, Mean actual/target progress rate ratio across pipelines for core PCAWG (mean 0.49) vs. Butler (mean 0.7) pipelines, each of which was run once over all PCAWG samples available to us. d,e, Progress rate uniformity of core PCAWG pipelines (d) vs. Butler (e). See Methods for details. In all panels the samples are arranged by their completion date. Runtime includes time spent on failed attempts. The comparison between Butler and the core pipelines was made possible by the PCAWG context. A similar comparison between Butler and other frameworks is presently impractical at this scale owing to the high costs and complexity involved.
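The two means in panel c appear to be consistent with the 43% efficiency improvement quoted in the abstract, if that figure is read as the relative increase of the Butler mean over the core PCAWG mean. The short sketch below checks that arithmetic. The function name and this reading of the 43% figure are assumptions, not something the caption states.

```python
def relative_improvement(baseline, improved):
    """Relative increase of `improved` over `baseline`, as a fraction."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


if __name__ == "__main__":
    core_pcawg_mean = 0.49  # mean actual/target ratio, core PCAWG pipelines
    butler_mean = 0.70      # mean actual/target ratio, Butler pipelines
    gain = relative_improvement(core_pcawg_mean, butler_mean)
    print(f"{gain:.1%}")
```

Under this reading, (0.70 - 0.49) / 0.49 is about 0.43, matching the abstract once rounded to the nearest percent.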