| Literature DB >> 34405237 |
Haris Zafeiropoulos, Anastasia Gioti, Stelios Ninidakis, Antonis Potirakis, Savvas Paragkamian, Nelina Angelova, Aglaia Antoniou, Theodoros Danis, Eliza Kaitetzidou, Panagiotis Kasapidis, Jon Bent Kristoffersen, Vasileios Papadogiannis, Christina Pavloudi, Quoc Viet Ha, Jacques Lagnel, Nikos Pattakos, Giorgos Perantinos, Dimitris Sidirokastritis, Panagiotis Vavilis, Georgios Kotoulas, Tereza Manousaki, Elena Sarropoulou, Costas S Tsigenopoulos, Christos Arvanitidis, Antonios Magoulas, Evangelos Pafilis.
Abstract
High-performance computing (HPC) systems have become indispensable for modern marine research, providing support to an increasing number and diversity of users. Paired with the impetus offered by high-throughput methods to key areas such as non-model organism studies, their operation continuously evolves to meet the corresponding computational challenges. Here, we present a Tier 2 (regional) HPC facility, operating for over a decade at the Institute of Marine Biology, Biotechnology, and Aquaculture of the Hellenic Centre for Marine Research in Greece. Strategic choices made in design and upgrades aimed to strike a balance between depth (the need for a few high-memory nodes) and breadth (a number of slimmer nodes), as dictated by the idiosyncrasy of the supported research. Qualitative analysis of the latter's computational requirements revealed the diversity of marine fields, methods, and approaches adopted to translate data into knowledge. In addition, hardware and software architectures, usage statistics, policy, and user management aspects of the facility are presented. Drawing upon the last decade's experience at the different levels of operation of the Institute of Marine Biology, Biotechnology, and Aquaculture HPC facility, a number of lessons are presented; these have contributed to the facility's future directions in light of emerging distribution technologies (e.g., containers) and Research Infrastructure evolution. In combination with detailed knowledge of the facility usage and its upcoming upgrade, future collaborations in marine research and beyond are envisioned.
Keywords: aquaculture; biodiversity; biotechnology; computational requirements; containerization; high performance computing; high-throughput sequencing; marine research; research infrastructures
Year: 2021 PMID: 34405237 PMCID: PMC8371273 DOI: 10.1093/gigascience/giab053
Source DB: PubMed Journal: GigaScience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Evolution of the IMBBC HPC facility during the past 12 years, with hardware upgrades (blue boxes) and funding milestones (logos of RIs) highlighted. A single server that launched the bioinformatics era in 2009 evolved into the current Tier 2 system Zorba (Box 4), which allows processing of a wide variety of information, from DNA sequences to biodiversity data. Different names of the facility denote distinct system architectures.
Figure 2: Block diagram of the Zorba architecture. This is the IMBBC HPC facility architecture in its current setup, after 12 years of development. There are 2 login nodes and 1 intermediate node where users may develop their analyses. Computational nodes are split into 4 partitions with different specs and policy terms: bigmem, supporting processes requiring up to 640 GB RAM; batch, handling mostly (but not exclusively) parallel jobs (either in a single node or across several nodes); minibatch, serving parallel jobs with reduced resource requirements; and fast, for non-intensive jobs. All servers, except file systems, run Debian 9 (kernel 4.9.0-8-amd64). CC BY icons from the Noun Project: "nfs file document icon" by IYIKON, PK; "Earth" by mungang kim, KR; "database" by Vectorstall, PK; "switch" by Bonegolem, IT.
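As a concrete illustration of the partition scheme, the minimal sketch below submits a high-memory job under a SLURM-style scheduler. Only the partition names (bigmem, batch, minibatch, fast) come from the caption above; the scheduler choice, the memory/time/CPU values, the tool name (my_assembler), and the file names are illustrative assumptions, not documented Zorba settings.

```python
import subprocess
import tempfile

# Hypothetical SLURM batch script. Partition names follow Figure 2;
# all resource values and the assembler command are assumptions.
JOB_SCRIPT = """#!/bin/bash
#SBATCH --job-name=genome-assembly   # illustrative job name
#SBATCH --partition=bigmem           # high-memory partition (up to 640 GB RAM)
#SBATCH --mem=500G                   # assumed memory request
#SBATCH --time=72:00:00              # assumed wall-time limit
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32           # assumed core count

my_assembler --threads "$SLURM_CPUS_PER_TASK" reads.fastq.gz
"""

def submit(script_text: str) -> str:
    """Write a job script to a temporary file and hand it to sbatch."""
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as fh:
        fh.write(script_text)
        path = fh.name
    # On success, sbatch prints e.g. "Submitted batch job 12345".
    result = subprocess.run(
        ["sbatch", path], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print(submit(JOB_SCRIPT))
```

Swapping in --partition=fast with a much smaller --mem request would route a non-intensive job to the lighter partition described in the caption.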
Figure 3: Bar chart with the number of publications that have used IMBBC HPC facility resources, grouped by scientific field. The different methods for data acquisition are also presented. WGS, whole-genome sequencing; WTS, whole-transcriptome sequencing.
Figure 4: Red bars denote published research whose computational methods placed high resource demands on the IMBBC HPC facility, owing to (a) long computational times (>48 h), (b) high memory requirements (>128 GB), or (c) high storage requirements (>200 GB). For instance, no eDNA-based community analyses performed on Zorba thus far have required large amounts of memory.