Tiziana Castrignanò, Silvia Gioiosa, Tiziano Flati, Mirko Cestari, Ernesto Picardi, Matteo Chiara, Maddalena Fratelli, Stefano Amente, Marco Cirilli, Marco Antonio Tangaro, Giovanni Chillemi, Graziano Pesole, Federico Zambelli.
Abstract
BACKGROUND: The advent of Next Generation Sequencing (NGS) technologies and the concomitant reduction in sequencing costs allow unprecedented high-throughput profiling of biological systems in a cost-efficient manner. Modern biological experiments are becoming increasingly data- and compute-intensive, and the wealth of publicly available biological data is bringing bioinformatics into the "Big Data" era. For these reasons, bioinformaticians increasingly recognize the value of effectively applying High Performance Computing (HPC) architectures. Here we describe pilot programs for provisioning HPC resources to bioinformaticians, run by the Italian Node of ELIXIR (ELIXIR-IT) in collaboration with CINECA, the main Italian supercomputing center.
Keywords: Bioinformatics; Compute service; HPC; NGS data analysis; Software environment
Year: 2020 PMID: 32838759 PMCID: PMC7446135 DOI: 10.1186/s12859-020-03565-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1. The typical flow for novel NGS data transferred to CINECA and analysed by ELIXIR-IT HPC@CINECA users. The CINECA storage facility provides iRODS [9] technology to archive data and to facilitate moving it between different supercomputers if needed.
CINECA high-performance computing clusters available for bioinformatics projects during the ELIXIR-IT HPC@CINECA call period; some provide higher-memory nodes. Projects are assigned to one cluster or another after technical evaluation by CINECA staff, depending on the nature and needs of each project. Detailed instructions on how to access and use the clusters are provided to the PIs after project approval and are also available on the CINECA website. Marconi A2 is going to be replaced by the Marconi 100 cluster in a few months.
| HPC cluster | Nodes | Total cores | RAM/node (GB) | Architecture |
|---|---|---|---|---|
| Pico (2015–2017) | 70 | 1,400 | 128 | |
| Galileo | 516 | 16,512 | 128 | |
| Marconi A2 | 3,176 | 215,968 | 96 (cache mode) | |
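As a quick consistency check on the table, the per-node core count can be derived by dividing total cores by the number of nodes. The minimal Python sketch below assumes that every node within a cluster has the same number of cores, which the table itself does not state; the dictionary name `CLUSTERS` and the helper `cores_per_node` are illustrative only.

```python
# Node and total-core counts transcribed from the cluster table above.
CLUSTERS = {
    "Pico (2015–2017)": (70, 1400),
    "Galileo": (516, 16512),
    "Marconi A2": (3176, 215968),
}


def cores_per_node(nodes: int, total_cores: int) -> float:
    """Average cores per node; equals the real per-node count only if all nodes are identical."""
    return total_cores / nodes


for name, (nodes, cores) in CLUSTERS.items():
    print(f"{name}: {cores_per_node(nodes, cores):g} cores per node")
```

With the table's figures this gives 20, 32 and 68 cores per node for Pico, Galileo and Marconi A2 respectively.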
Fig. 2. Schematic drawing of the CINECA system infrastructure. (i) HOME area: intended for source code, executables and small data files; (ii) SCRATCH area: intended for the output of batch jobs; (iii) WORK area: for the output of batch jobs as well as for secure sharing within the project team; (iv) DRES: intended as a medium/long-term repository and as a shared area within the project team and across HPC platforms; (v) tape area: a personal long-term archive area, accessed via the Linear Tape File System (LTFS).
Fig. 3. Panel A) shows the distribution of research projects across broadly defined research areas.
Panel B) shows the growth rate of submitted projects normalised per month; only 4 months were considered for 2016 (starting year) and 2019 (current year).
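Normalising per month makes the partial years (2016 and 2019) comparable with full years. The hypothetical Python sketch below divides each year's submission count by the number of months actually counted; it assumes that every year other than 2016 and 2019 contributes 12 months, and the project counts used in the example are invented placeholders, not the values shown in the figure.

```python
# Months counted per partial year, as stated in the Fig. 3B caption.
# Any year not listed here is ASSUMED to contribute a full 12 months.
PARTIAL_YEARS = {2016: 4, 2019: 4}


def months_considered(year: int) -> int:
    """Number of months of submissions counted for the given year."""
    return PARTIAL_YEARS.get(year, 12)


def monthly_rate(projects_submitted: int, year: int) -> float:
    """Submitted projects per month, so partial and full years can be compared."""
    return projects_submitted / months_considered(year)


# Hypothetical counts for illustration only, NOT the data reported in the paper.
example_counts = {2016: 8, 2017: 30}
for year, count in example_counts.items():
    print(year, monthly_rate(count, year))
```

Without this normalisation, the 4-month years would appear to have far fewer submissions simply because they cover a shorter period.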
Fig. 4. Evaluation of the scalability of the optimized REDItools version on the Marconi A2 infrastructure using 540, 1080, 2160 and 4320 cores. The plot shows the elapsed time (in seconds) needed to analyze a single sample with an increasing number of cores.
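A plot of elapsed time against core count is commonly summarised with the standard strong-scaling metrics: speedup S(p) = T(540) / T(p) and parallel efficiency E(p) = S(p) · 540 / p, taking the smallest run (540 cores) as the baseline. The Python sketch below computes both; the choice of these metrics is an assumption for illustration and not necessarily how the authors evaluated Fig. 4, and the timings in the example are placeholders because the measured values appear only in the plot.

```python
from typing import Dict, Tuple


def scaling_metrics(times_s: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map each core count to (speedup, efficiency) relative to the smallest core count."""
    base_cores = min(times_s)
    base_time = times_s[base_cores]
    metrics: Dict[int, Tuple[float, float]] = {}
    for cores, elapsed in sorted(times_s.items()):
        speedup = base_time / elapsed
        # Ideal scaling gives efficiency 1.0; lower values indicate overhead.
        efficiency = speedup * base_cores / cores
        metrics[cores] = (speedup, efficiency)
    return metrics


# Placeholder elapsed times in seconds (hypothetical, NOT the paper's measurements).
example_times = {540: 800.0, 1080: 400.0, 2160: 250.0, 4320: 200.0}
for cores, (s, e) in scaling_metrics(example_times).items():
    print(f"{cores} cores: speedup {s:.2f}, efficiency {e:.2f}")
```

Using the smallest run as the baseline avoids needing a single-core timing, which is usually impractical for workloads of this size.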
Fig. 5. CINECA roadmap towards exaflop-capable systems.