Azza E Ahmed, Phelelani T Mpangase, Sumir Panji, Shakuntala Baichoo, Yassine Souilmi, Faisal M Fadlelmola, Mustafa Alghali, Shaun Aron, Hocine Bendou, Eugene De Beste, Mamana Mbiyavanga, Oussema Souiai, Long Yi, Jennie Zermeno, Don Armstrong, Brian D O'Connor, Liudmila Sergeevna Mainzer, Michael R Crusoe, Ayton Meintjes, Peter Van Heusden, Gerrit Botha, Fourie Joubert, C Victor Jongeneel, Scott Hazelhurst, Nicola Mulder.
Abstract
The need for portable and reproducible genomics analysis pipelines is growing globally and in Africa, especially with the growth of collaborative projects such as the Human Health and Heredity in Africa Consortium (H3Africa). The Pan-African H3Africa Bioinformatics Network (H3ABioNet) recognized the need for portable, reproducible pipelines adapted to heterogeneous computing environments, and for nurturing technical expertise in workflow languages and containerization technologies. Building on the network's Standard Operating Procedures (SOPs) for common genomic analyses, H3ABioNet organized its first Cloud Computing and Reproducible Workflows Hackathon in 2016 to translate those SOPs into analysis pipelines that can run on heterogeneous computing environments and meet the needs of H3Africa research projects. This paper describes the preparations for the hackathon and reflects on the lessons learned about its impact on building the technical and scientific expertise of African researchers. The workflows developed were made publicly available in GitHub repositories and deposited as container images on Quay.io.
Keywords: Bioinformatics; capacity building; hackathon; pipeline; reproducible; workflow
Year: 2019 PMID: 32382696 PMCID: PMC7194140 DOI: 10.12688/aasopenres.12847.2
Source DB: PubMed Journal: AAS Open Res ISSN: 2515-9321
Figure 1. Planning and execution of the H3ABioNet Cloud computing hackathon.
H3ABioNet developed SOPs for five analysis niches needed within H3Africa projects. Four of these were implemented as portable workflows as a result of the H3ABioNet 2016 Cloud Computing hackathon, which brought together early-career scientists, expert mentors and collaborators and relied on several planning and communication platforms. (SOPs: Standard Operating Procedures).
Significance and impact of the developed pipelines as part of the H3ABioNet 2016 Cloud Computing hackathon, along with implementation notes.
| Analysis | Implementation | Significance & Impact | Testing environment | GitHub link |
|---|---|---|---|---|
| Whole Genome/ | CWL | Such data is extensively generated within H3Africa projects | EGI FedCloud resource (+) | |
| 16S rDNA | CWL and | For performing 16S rDNA diversity analysis of microbial species | AWS EC2 & Azure VMs (+/-) | |
| Genome-wide | Nextflow | The H3Africa Consortium will genotype over 30,000 individuals | PBS cluster | |
| SNP imputation | Nextflow | Of value in population structure and admixture studies | SGE cluster (-) | |
* + and - indicate testing with and without Docker, respectively, in the given environment.
** Corresponding Docker container images are available at: https://quay.io/organization/h3abionet_org and https://dockstore.org/workflows/h3abionet/h3agatk
Communication channels used for the hackathon.
| Channel | Link | Purpose |
|---|---|---|
| Mailing list | - | Group-wide announcements and |
| Mconf | | Online meetings |
| Slack | | Inner group discussions and chat |
| Trello | | Plan goals and activities, and track |
| GitHub | | Code repository and version control |
| Google Drive | | Document sharing |