Phillip A Richmond¹,², Wyeth W Wasserman¹,².
Abstract
Researchers in the life sciences are increasingly faced with the task of obtaining compute resources and training to analyze large, high-throughput technology generated datasets. As demand for compute resources has grown, high performance computing (HPC) systems have been implemented by research organizations and international consortiums to support academic researchers. However, life science researchers lack effective time-of-need training resources for utilization of these systems. Current training options have drawbacks that inhibit the effective training of researchers without experience in computational analysis. We identified the need for flexible, centrally-organized, easily accessible, interactive, and compute resource specific training for academic HPC use. In our delivery of a modular workshop series, we provided foundational training to a group of researchers in a coordinated manner, allowing them to further pursue additional training and analysis on compute resources available to them. Efficacy measures indicate that the material was effectively delivered to a broad audience in a short time period, including both virtual and on-site students. The practical approach to catalyze academic HPC use is amenable to diverse systems worldwide.
Keywords: education; genome analysis; genomics; high throughput computing; hpc; life sciences
Year: 2019 PMID: 31602299 PMCID: PMC6774052 DOI: 10.12688/f1000research.19320.1
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Workshop materials.
Breakdown of the workshop materials, including the learning objectives, commands learned, and prerequisites for each session.
| Session # | Title | Learning Objectives | Commands Learned | Prerequisites |
|---|---|---|---|---|
| 1 | Introduction To Linux I: Basics of computing with high … | Filesystem hierarchy; Basic command line operations; File handling & permissions; Standard out; File Formats: GTF, Clinvar Variant … | cp, ls, mv, cut, clear, mkdir, rm, … | Log-in to the … |
| 2 | Introduction to Linux II: Basics of computing with high … | Editing files with linux file-editors; Shell Scripts; Interacting with the queue; File transfer | emacs, nano, qstat, qsub, showq, … | Session 1 Problem Set |
| 3 | Short Read Mapping and … | Next Generation Sequencing Primer; Map short read DNA sequences to …; Convert file formats using Samtools; Utilize scheduler for pipeline …; Visualize short read data in IGV; File Formats: SAM, BAM, indexed BAM | BWA mem, samtools sort, … | Session 2 Problem Set |
| 4 | Variant Calling (Small Variants) | Exome sequencing; Mapped read post-processing; Variant calling for small variants; Variant compression and indexing; Visualization of variants and short-read …; File Formats: VCF, compressed VCF, … | freebayes, bgzip, tabix | Session 3 Problem Set |
| 5 | Variant Interpretation with … | Brief introduction to MySQL queries; Variant annotation with VEP; File Formats: PED file, gemini.db | gemini load, gemini query, gemini … | Course Exam |
| 6 | RNA seq I: Analysis with … | RNAseq overview; RNAseq read mapping and …; Transcript assembly; Transcript quantification | HISAT2, Stringtie | Course Exam |
| 7 | RNA seq II: Differential Expression | R data loading and visualization; Differential expression using DESeq2 | R, DESeq2 | Course Exam |
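The Session 1 objectives (file handling, `cut`, standard out) can be sketched in a few lines of shell. The directory and file names below are illustrative, not taken from the workshop materials.

```shell
# Make a working directory and a small tab-delimited sample file
# (names are illustrative, not from the workshop).
mkdir -p workshop_demo
printf 'chr1\t12345\tA\tG\n' > workshop_demo/variants.txt

# Basic file handling: copy, list, rename (cp, ls, mv).
cp workshop_demo/variants.txt workshop_demo/variants_copy.txt
ls workshop_demo
mv workshop_demo/variants_copy.txt workshop_demo/variants_old.txt

# Redirect standard out: keep only the first (chromosome) column with cut.
cut -f1 workshop_demo/variants.txt > workshop_demo/chroms.txt
cat workshop_demo/chroms.txt   # prints: chr1

# Remove a file (rm).
touch workshop_demo/scratch.txt
rm workshop_demo/scratch.txt
```

Each command here maps directly onto the "Commands Learned" column for Session 1.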
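The Session 2 and 3 material (shell scripts, queue interaction, and short-read mapping with BWA and samtools) would combine into a batch script along the following lines. The scheduler directives, resource requests, reference, and FASTQ names are assumptions for illustration; the script is only written to disk here, not submitted.

```shell
# Write a hypothetical PBS/Torque job script tying Session 2 (shell
# scripts, queue interaction) to Session 3 (BWA-MEM mapping, samtools).
# Directives, resource values, and input file names are assumptions.
cat > map_reads.pbs <<'EOF'
#!/bin/bash
#PBS -N map_reads
#PBS -l nodes=1:ppn=4,mem=8gb,walltime=02:00:00

cd "$PBS_O_WORKDIR"

# Map paired-end reads, then convert SAM to a sorted, indexed BAM
# (the indexed BAM is what IGV loads for visualization).
bwa mem -t 4 ref.fa sample_R1.fastq.gz sample_R2.fastq.gz > sample.sam
samtools sort -@ 4 -o sample.sorted.bam sample.sam
samtools index sample.sorted.bam
EOF

# On the cluster, the Session 2 queue commands would then be:
#   qsub map_reads.pbs    # submit the job
#   qstat                 # check its status in the queue
echo "wrote map_reads.pbs"
```

Separating the script from its submission mirrors the workshop's split between writing shell scripts (Session 2) and running the mapping pipeline through the scheduler (Session 3).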
Figure 1. Participant background.
Description of workshop attendees including A) distribution of sexes; B) educational level; and C) the university from which they participated.
Figure 2. Workshop results.
A breakdown of the workshop results including A) distribution of course completion rates annotated by mode of attendance and prior experience; B) efficacy of each module based on survey responses; C) per-session attendance and problem set completion; and D) course efficacy breakdown including problem set and examination utility. Values were tallied based on attendance sign-in sheets, user-submitted assignments, server workshop directories, and survey responses (Underlying data).