Aleksandr Agafonov, Kimmo Mattila, Cuong Duong Tuan, Lars Tiede, Inge Alexander Raknes, Lars Ailo Bongo.
Abstract
META-pipe is a complete service for the analysis of marine metagenomic data. It provides assembly of high-throughput sequence data, functional annotation of predicted genes, and taxonomic profiling. The functional annotation is computationally demanding and is therefore currently run on a high-performance computing cluster in Norway. However, additional compute resources are necessary to open the service to all ELIXIR users. We describe our approach for setting up and executing the functional analysis of META-pipe on additional academic and commercial clouds. Our goal is to provide a powerful analysis service that is easy to use and to maintain. Our design therefore uses a distributed architecture in which central servers are combined with multiple distributed backends that execute the computationally intensive jobs. We believe our experiences developing and operating META-pipe provide a useful model for others who plan to provide a portal-based data analysis service in ELIXIR and other organizations with geographically distributed compute and storage resources.
Keywords: AAI federation; Amazon Web Services; Apache Spark; EGI Federated Cloud; ELIXIR; META-pipe; OpenStack; Portability
Year: 2017 PMID: 31069047 PMCID: PMC6480938 DOI: 10.12688/f1000research.13204.1
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Figure 1. The META-pipe backend architecture has three servers located at the University of Tromsø.
The authorization server, which is integrated with the ELIXIR AAI, enables login for ELIXIR users. The storage server stores all META-pipe input, output, and provenance data. The job server schedules and tracks submitted analysis jobs. The jobs are implemented as Spark programs, each executed by an execution manager running in an execution environment. There can be multiple execution managers distributed over many HPC clusters and clouds.
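The division of labour between the central job server and the distributed execution managers can be illustrated with a minimal in-memory sketch. It is not the META-pipe implementation: the class names (`JobServer`, `ExecutionManager`), the job states, the pull-based polling, and the storage references are all assumptions made for illustration, and the Spark program is replaced by a placeholder.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class Job:
    job_id: str
    input_ref: str                      # hypothetical key of the input on the storage server
    state: str = "QUEUED"               # assumed states: QUEUED -> RUNNING -> FINISHED
    assigned_to: Optional[str] = None   # execution manager that picked up the job
    output_ref: Optional[str] = None    # hypothetical key of the output on the storage server


class JobServer:
    """Central server: keeps the queue and the state of every submitted job."""

    def __init__(self) -> None:
        self._queue: Deque[str] = deque()
        self.jobs: Dict[str, Job] = {}

    def submit(self, job_id: str, input_ref: str) -> Job:
        job = Job(job_id, input_ref)
        self.jobs[job_id] = job
        self._queue.append(job_id)
        return job

    def next_job(self, manager_id: str) -> Optional[Job]:
        """Hand the oldest queued job to the execution manager that asks for work."""
        if not self._queue:
            return None
        job = self.jobs[self._queue.popleft()]
        job.state = "RUNNING"
        job.assigned_to = manager_id
        return job

    def complete(self, job_id: str, output_ref: str) -> None:
        job = self.jobs[job_id]
        job.state = "FINISHED"
        job.output_ref = output_ref


class ExecutionManager:
    """Runs in one execution environment (an HPC cluster or a cloud)."""

    def __init__(self, manager_id: str, server: JobServer) -> None:
        self.manager_id = manager_id
        self.server = server

    def poll_once(self) -> bool:
        """Fetch and run one job; return False when the queue is empty."""
        job = self.server.next_job(self.manager_id)
        if job is None:
            return False
        # Placeholder for launching the Spark program on the local resources.
        output_ref = f"{job.input_ref}.annotated"
        self.server.complete(job.job_id, output_ref)
        return True
```

In this sketch, adding compute capacity means starting another `ExecutionManager` that polls the same `JobServer`; the central servers do not need to know where each manager runs.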
Figure 2. META-pipe deployment.
End-users run analyses using the META-pipe web app. The web app is integrated with ELIXIR AAI, so users can authenticate using their home institution username and password. Resource providers use the cluster setup tool to set up an execution manager that executes analysis jobs on, for example, the cPouta OpenStack cloud. The execution manager, pipeline, and dependencies are all read from our artifacts server. META-pipe developers use git to maintain the code. Our GitLab is integrated with Jenkins, which compiles the code, runs integration tests, and pushes new META-pipe versions to the artifacts server. META-pipe administrators administer all jobs using the META-pipe Job manager interface.