Samuel Lampa, Martin Dahlö, Pall I Olason, Jonas Hagberg, Ola Spjuth.
Abstract
Analyzing and storing data and results from next-generation sequencing (NGS) experiments is a challenging task, hampered by ever-increasing data volumes and frequent updates of analysis methods and tools. Storage and computation have grown beyond the capacity of personal computers, and there is a need for suitable e-infrastructures for processing. Here we describe UPPNEX, an implementation of such an infrastructure, tailored to the needs of data storage and analysis of NGS data in Sweden and serving various labs and multiple instruments from the major sequencing technology platforms. UPPNEX comprises resources for high-performance computing, large-scale and high-availability storage, an extensive bioinformatics software suite, up-to-date reference genomes and annotations, a support function with system and application experts, as well as a web portal and support ticket system. UPPNEX applications are numerous and diverse, and include whole genome-, de novo- and exome sequencing, targeted resequencing, SNP discovery, RNASeq, and methylation analysis. There are over 300 projects that utilize UPPNEX, including large undertakings such as the sequencing of the flycatcher and Norwegian spruce. We describe the strategic decisions made when investing in hardware, setting up maintenance and support, and allocating resources, and illustrate major challenges such as managing data growth. We conclude by summarizing our experiences and observations with UPPNEX to date, providing insights into the successful and less successful decisions made.
Year: 2013 PMID: 23800020 PMCID: PMC3704847 DOI: 10.1186/2047-217X-2-9
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1. Overview of the architecture of the UPPNEX infrastructure. Data is produced by sequencing platforms or individual research groups and transferred to UPPNEX. Research groups then log in to the UPPNEX system and analyze their data using the installed software. For long-term storage, data can be moved from UPPNEX to the Swedish national storage initiative SweStore.
Figure 2. Overview of the resource utilization at UPPNEX. (a) Number of active projects. The number of active projects has steadily increased, although a plateau seems to have been reached. (b) Total amount of NGS storage, consisting of fast parallel storage, global scratch folders, and NGS storage on SweStore. Storage has generally increased, with some small dips associated with clean-up campaigns. The larger deviation in 2012 is believed to be due to data duplication during migration to a new storage system. (c) CPU usage trend. Usage has increased but with significant variations. The dip in late 2011 was due to an extended downtime at UPPMAX during the move to a new computer room. The cause of the long drop early in 2012 is unknown.