Shishir Reddy, Ling-Hong Hung, Olga Sala-Torra, Jerald P Radich, Cecilia Cs Yeung, Ka Yee Yeung.
Abstract
BACKGROUND: Long-read sequencing has great promise in enabling portable, rapid molecular-assisted cancer diagnoses. A key challenge in democratizing long-read sequencing technology in the biomedical and clinical community is the lack of graphical bioinformatics software tools that can efficiently process raw nanopore reads and support graphical output and interactive visualizations for interpreting results. Another obstacle is that high-performance software tools for long-read sequencing data analysis often rely on graphics processing units (GPUs), which are challenging and time-consuming to configure, especially on the cloud.
Keywords: Cancer diagnostics; Cloud computing; FAIR; GPU; Leukemia; Long-read sequencing; Nanopore; Workflows
Year: 2021 PMID: 34425749 PMCID: PMC8381503 DOI: 10.1186/s12864-021-07927-1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 4.547
Fig. 1. Screenshots of our interactive GPU workflow, which uses the Biodepot-workflow-builder (Bwb) platform. Panel A is a screenshot of the workflow using the open-source Bonito basecaller. Panel B is a screenshot of the workflow using the proprietary Guppy basecaller. Both basecallers use GPUs. For the Guppy workflow, the user enters the URL for the Oxford Nanopore Technologies Guppy installation package, which is then used to create a container to execute Guppy. The other steps in the two workflows are identical, consisting of data download, alignment and visualization. Each of these steps is performed by a software module encapsulated in a Docker container and represented by a graphical widget. Lines connecting the widgets indicate the flow of data between the modules. To run the workflow, the user double-clicks on the Start widget, enters the necessary parameters into the forms and presses a graphical start button. Double-clicking on any widget brings up a point-and-click interface for entering parameters, monitoring results and controlling execution of the associated workflow module. Unlike other workflow execution platforms, the Biodepot-workflow-builder supports modules with interactive graphics. This workflow uses that capability to automatically open the final BAM files in the Integrative Genomics Viewer (IGV), which we use to check for diagnostic translocation breakpoints in our cell-line data. The execution times of the GPU basecallers Guppy and Bonito on the NB4 cell line averaged 88.9 s (standard error 1.2) and 948.2 s (standard error 1.7), respectively, on an AWS g4dn.4xlarge GPU instance. For comparison, the CPU version of Guppy averaged 2551.8 s (standard error 22.4) on an AWS virtual machine instance (c5d.18xlarge) using 72 vCPUs.
Fig. 2. A. IGV alignment on the PML and RARA genes of Flongle nanopore-generated sequences for the NB4 cell line. The library was generated from DNA with a PCR-free enrichment protocol using CRISPR guides targeting the PML and RARA genes. The top panel shows the alignment of reads processed with the Bonito basecaller and minimap2 aligner in the Bwb workflow. The middle panel shows the alignment of reads processed with the Guppy basecaller and minimap2 aligner in the Bwb workflow. The bottom panel shows reads processed in a manual step-by-step workflow using the Guppy flipflop basecaller and minimap2 aligner. Reads with the PML-RARA breakpoint are colored to highlight the fragments aligned to PML and RARA.
B. Genomic BCR-ABL1 breakpoint identified in the K562 cell line by long-read sequencing. The schematic representation (generated with http://wormweb.org/exonintron) shows the breakpoint captured with our amplification-free enrichment protocol and long-read sequencing. In the upper graphic, the breakpoint is represented by the red vertical line, and the locations of the sequence-specific guides are marked by colored arrows. ABL1 intron 1 spans 140 kb. In the lower panel, nanopore sequence alignments in IGV show sequences partially aligned to BCR and ABL1. Reads with the BCR-ABL1 breakpoint are colored to highlight that the same read is partially aligned to both BCR and ABL1.
Comparison of runtimes of the two basecallers (Guppy and Bonito) on the NB4 cell line. The AWS results were averaged over 4 runs. The local host results were averaged over 5 runs.
| Basecaller | cloud/local | average runtime (seconds) | standard error (seconds) |
|---|---|---|---|
| Guppy CPU | AWS c5d.18xlarge | 2551.8 | 22.4 |
| Guppy GPU | AWS g4dn.4xlarge | 88.9 | 1.2 |
| Guppy GPU | Laptop | 135.3 | 0.6 |
| Bonito GPU | AWS g4dn.4xlarge | 948.2 | 1.7 |
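
The relative speedups implied by the table can be worked out with a few lines of Python. This is only an illustrative sketch: the mean runtimes are copied from the table, while the `runtimes` dictionary and the `speedup` helper are hypothetical names introduced here and are not part of the published workflow.

```python
# Mean runtimes in seconds, copied from the runtime table.
# The dictionary layout and the helper below are illustrative assumptions.
runtimes = {
    ("Guppy CPU", "AWS c5d.18xlarge"): 2551.8,
    ("Guppy GPU", "AWS g4dn.4xlarge"): 88.9,
    ("Guppy GPU", "Laptop"): 135.3,
    ("Bonito GPU", "AWS g4dn.4xlarge"): 948.2,
}


def speedup(baseline, candidate):
    """Return how many times faster `candidate` ran than `baseline`."""
    return runtimes[baseline] / runtimes[candidate]


cpu = ("Guppy CPU", "AWS c5d.18xlarge")
for config in runtimes:
    if config != cpu:
        name, host = config
        print(f"{name} on {host}: {speedup(cpu, config):.1f}x faster than Guppy CPU")
```

Under these numbers, Guppy on the AWS GPU instance is roughly 29 times faster than the 72-vCPU CPU run, and roughly 11 times faster than Bonito on the same GPU instance. The ratios compare mean runtimes only and ignore the reported standard errors.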