Bastian Schiffthaler, Myrto Kostadima, Nicolas Delhomme, Gabriella Rustici.
Abstract
The advancement of high-throughput sequencing (HTS) technologies and the rapid development of numerous analysis algorithms and pipelines in this field have resulted in an unprecedentedly high demand for training scientists in HTS data analysis. Embarking on developing new training materials is challenging for many reasons. Trainers often do not have prior experience in preparing or delivering such materials and struggle to keep them up to date. A repository of curated HTS training materials would support trainers in materials preparation, reduce the duplication of effort by increasing the usage of existing materials, and allow for the sharing of teaching experience among the HTS trainers' community. To achieve this, we have developed a strategy for the curation and dissemination of materials. Standards for describing training materials have been proposed and applied to the curation of existing materials. A Git repository has been set up for sharing annotated materials that can now be reused, modified, or incorporated into new courses. Because the repository uses Git, it is decentralized, self-managed by the community, and can be forked and built upon by all users. The repository is accessible at http://bioinformatics.upsc.se/htmr.
Year: 2016 PMID: 27309738 PMCID: PMC4910983 DOI: 10.1371/journal.pcbi.1004937
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Training providers within the trainers consortium that offer hands-on training courses on the analysis of HTS data.
| Institution | Target audience | URL |
|---|---|---|
| | Undergraduate, PhD students, Postgraduate, Professional development | |
| | PhD students, Postgraduate, Professional development | |
| | PhD students, Postgraduate, Professional development | |
| | PhD students, Postgraduate, Professional development | |
| | PhD students, Postgraduate, Professional development | |
| | Undergraduate, PhD students, Postgraduate, Professional development | |
| | Postgraduate, Professional development | |
| | PhD students, Postgraduate, Professional development | |
| | PhD students, Postgraduate, Professional development | |
| | PhD students, Postgraduate, Professional development | |
| | Undergraduate, PhD students, Postgraduate, Professional development | |
The devised minimal set of descriptors for training materials.
| Descriptor | Content | Example |
|---|---|---|
| Title | | ChIP-Seq tutorial |
| Contact details of the author | Name and email address | |
| Content description/aims | A brief description of the content covered in the training material and overall aims | In this course we provide a basic introduction to conducting ChIP-Seq data analysis using the Galaxy framework. We will be retracing most of the steps required to get from an Illumina FASTQ sequence file all the way to performing peak calling and identifying over-represented sequence motifs and functional annotation. The aim is to give biologists the tools to independently run a basic analysis of ChIP-Seq data. |
| Target audience | | Wet lab biologists with little or no programming experience |
| Learning objectives | Provide the trainees with an indication of what they should know or be able to do upon completion of the selected training module | Develop an appropriate experimental design. Describe and perform the steps of a basic ChIP-Seq analysis workflow. Visualize data and results at various stages of the analysis. Annotate results. Critically assess data and results. |
| Prerequisites | Any prior knowledge that might help the participants to achieve the LOs | Working knowledge of Galaxy |
| Content | List of all files associated with the training material. For each file, the approximate length of time required to deliver the training and the IT/software requirements for running the tutorials should be indicated | Presentations, tutorials |
| Datasets | Should include: a description of the dataset, its provenance (including links to where the dataset can be found), how it was modified from its original form (if applicable, also include the code used to modify the dataset), and what can be demonstrated by using this particular dataset | In this practical we aim to identify potential transcription factor binding sites of Oct4 in mouse embryonic stem cells [ |
| Stability | Stability of the module’s content, e.g., how many times this material has been used in the classroom and when it was last updated | |
| Literature references | | |
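
The descriptor set above lends itself to an automated completeness check before a curator reviews a submission. The following is a minimal sketch in Python, assuming each training material is described by a plain dictionary keyed by descriptor name. The descriptor names come from the table; the dictionary layout, the function name `missing_descriptors`, and the example record are hypothetical and are not part of the repository described here.

```python
# Descriptor names are taken from the table above. The record layout,
# the function name, and the example record are hypothetical.
REQUIRED_DESCRIPTORS = (
    "Title",
    "Contact details of the author",
    "Content description/aims",
    "Target audience",
    "Learning objectives",
    "Prerequisites",
    "Content",
    "Datasets",
    "Stability",
    "Literature references",
)


def missing_descriptors(record):
    """Return the descriptors that are absent or empty in `record`."""
    missing = []
    for name in REQUIRED_DESCRIPTORS:
        value = record.get(name)
        # Treat None, empty strings, and whitespace-only strings as missing.
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# A partially described material, using example values from the table.
example = {
    "Title": "ChIP-Seq tutorial",
    "Target audience": "Wet lab biologists with little or no programming experience",
    "Prerequisites": "Working knowledge of Galaxy",
    "Content": "Presentations, tutorials",
}

for name in missing_descriptors(example):
    print("missing descriptor:", name)
```

A check of this kind only confirms that every descriptor is filled in; whether the text is adequate and uses the controlled vocabularies correctly would still need a human curator.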
Teaching topic core modules, as established by the NGS Trainers Consortium during the “Best Practice in HTS data analysis” workshop.
| Topic | Level | Module name | Module objectives | Essential/Optional |
|---|---|---|---|---|
| Prerequisite | Basic | Linux | Develop familiarity with the Linux command-line environment: navigate folders, learn what commands and parameters are and how to use them, and open files in a terminal. | Essential |
| | | R-programming | Describe the R programming environment. Perform basic programming in R | Essential |
| | | Statistics | Review the basic statistical concepts | Essential |
| HTS Introduction | Basic | Technologies | Define the principles of high-throughput sequencing technologies | Essential |
| | | Data Formats | Describe the different data formats commonly used for HTS data | Essential |
| ChIP-Seq | Advanced | Preprocessing | Describe necessary preprocessing steps, Perform a quality assessment and interpret the results | Essential |
| | | Alignment | Perform alignment, Discuss alignment considerations | Essential |
| | | ChIP-Seq-QC | Explain ChIP-Seq specific QC steps, Perform ChIP-Seq QC on a data set | Essential |
| | | Peak-calling | List appropriate peak-calling software, Describe the theoretical basis of peak-calling, Apply different peak callers | Essential |
| | | Visualization | Visualize raw and processed data, Assess data quality | Essential |
| | | Annotation | Interpret results in the genomic context | Optional |
| | | Differential binding (DB) | Perform DB analysis, Interpret the output, Recognize the need for normalization | Optional |
| | | Working with biological replicates | Compare and combine different biological replicates using IDR analysis | Optional |
| | | Non-peak based analysis | Inspect signal around regions of interest, Generate carpet plots | Optional |
| RNA-Seq | Advanced | Preprocessing | Apply QC software and interpret the output, Decide/perform necessary preprocessing steps | Essential |
| | | Alignment | Distinguish between genome and transcriptome alignment, Select the appropriate tool, Recognize the challenges and pitfalls, Produce alignment, Interpret the alignment file | Essential |
| | | Alignment QC | Apply QC software, Interpret the output, Decide/perform necessary filtering steps | Essential |
| | | Feature summarization | Produce a table of counts, Identify a proper strategy for the biological question, Interpret output | Essential |
| | | Exploratory analysis | Visualize alignments, Apply QC software (e.g., clustering, principal component analysis (PCA)), Identify confounding effects and take necessary action | Essential |
| | | De-novo transcriptome assembly | Perform the analysis, Recognize the challenges and pitfalls, Interpret the assembler's output | Optional |
| | | Differential expression (DE) | Perform DE analysis, Interpret the output, Recognize the need for normalization and dispersion estimation | Optional |
| Variant analysis | Advanced | Preprocessing | Apply QC software and interpret the output, Decide/perform necessary preprocessing steps | Essential |
| | | Alignment | Differentiate genome and transcriptome alignment, Select the appropriate tool, Recognize the challenges and pitfalls, Produce alignment, Interpret the aligner's output | Essential |
| | | Alignment QC | Apply QC software and interpret the output, Decide/perform necessary filtering steps | Essential |
| | | Variant Calling | Apply variant calling software, Understand the format and the different types of variants | Essential |
| | | Variant Analysis | Visualize variant calls, Interpret the output, Select/filter variants | Essential |
| | | Annotating Variants | Annotate variants, Look up potential effects on coding regions, Evaluate putative clinical relevance | Optional |
The Level column indicates the intended target audience for each topic. Optional modules can vary with the course program and duration. Essential modules are considered mandatory for any course addressing the corresponding topic; i.e., they are the core set of modules, common to all downstream analyses, that a trainee must learn.
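
The essential/optional distinction can be used to check a draft course program against the core modules of its topic. The sketch below is one possible way to do this in Python. The essential module names are copied from the table above, while the dictionary layout, the function name `uncovered_essentials`, and the example course plan are hypothetical illustrations, not something the consortium provides.

```python
# Essential module names are copied from the table above. The dictionary
# layout, the function name, and the example course are hypothetical.
ESSENTIAL_MODULES = {
    "ChIP-Seq": [
        "Preprocessing",
        "Alignment",
        "ChIP-Seq-QC",
        "Peak-calling",
        "Visualization",
    ],
    "RNA-Seq": [
        "Preprocessing",
        "Alignment",
        "Alignment QC",
        "Feature summarization",
        "Exploratory analysis",
    ],
    "Variant analysis": [
        "Preprocessing",
        "Alignment",
        "Alignment QC",
        "Variant Calling",
        "Variant Analysis",
    ],
}


def uncovered_essentials(topic, planned_modules):
    """Return the essential modules of `topic` that the plan does not include."""
    planned = set(planned_modules)
    return [name for name in ESSENTIAL_MODULES[topic] if name not in planned]


# A draft ChIP-Seq course that includes one optional module (Annotation)
# but omits two essential ones.
course = ["Preprocessing", "Alignment", "Peak-calling", "Annotation"]
print(uncovered_essentials("ChIP-Seq", course))
```

Optional modules such as Annotation are ignored by the check, which matches their role as course-specific additions.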
Fig 1. Overview of the repository’s implementation, content, search interface, and flowchart of roles and actions.
Authors add their material to the main repository, where it can be searched and retrieved by trainers. Any trainer can then build on existing materials for their own lectures and practicals and request a merge of the updated materials into the main repository, which, upon validation, results in the older version of the material being archived. Two of the consortium members act as curators for each topic, ensuring the completeness of the descriptors and the adequate use of the controlled vocabularies. The newly added materials are then indexed and made discoverable through the search interface. Both the search interface and the repository rely on a GitLab instance hosted by the consortium. The Git logo is licensed under CC BY-SA 4.0 by GitLab (https://about.gitlab.com/press/).