Jing-Woei Li, Dan Bolser, Magnus Manske, Federico Manuel Giorgi, Nikolay Vyahhi, Björn Usadel, Bernardo J Clavijo, Ting-Fung Chan, Nathalie Wong, Daniel Zerbino, Maria Victoria Schneider.
Abstract
Next-generation sequencing (NGS) is increasingly being adopted as the backbone of biomedical research. With the commercialization of various affordable desktop sequencers, NGS will reach increasing numbers of cellular and molecular biologists, necessitating community consensus on bioinformatics protocols to tackle the exponential increase in the quantity of sequence data. The current resources for NGS informatics are extremely fragmented, and a centralized synthesis is difficult to find. A multitude of tools exist for NGS data analysis; however, none of them satisfies all possible uses and needs. This gap in functionality could be filled by integrating different methods in customized pipelines, an approach helped by the open-source nature of many NGS programmes. Drawing on community spirit and using the Wikipedia framework, we have initiated a collaborative NGS resource: the NGS WikiBook. We have collected a sufficient amount of text to incentivize a broader community to contribute to it. Users can search, browse, edit and create new content, which facilitates self-learning and feedback to the community. The overall structure and style of this dynamic material are designed for bench biologists and non-bioinformaticians. The flexibility of online material allows readers to skip details on a first read, yet have immediate access to the information they need. Each chapter comes with practical exercises so that readers can familiarize themselves with each step. The NGS WikiBook aims to create a collective laboratory book and protocol that explains the key concepts and describes best practices in this fast-evolving field.
Keywords: best practice; bioinformatics; collaborative learning; next-generation sequencing; training
Year: 2013 PMID: 23793381 PMCID: PMC3771235 DOI: 10.1093/bib/bbt045
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Examples of disparate expectations of NGS bioinformatics by life scientists and bioinformaticians
| Misconception about bioinformatics | The reality |
|---|---|
| It is a rapid and easy publishing field. | Publishing in bioinformatics can be relatively rapid because the field has accelerated observation and enables analysis at unprecedented speed compared with traditional experimental approaches. |
| A final result is generated automatically by pressing a button. | There is no magic programme that does everything. Setting up a computational analysis is expensive and has to be done with great attention to detail and an understanding of the procedures. Moreover, every result should be replicable when the analysis is repeated with slightly changed parameters or with a fundamentally different approach. |
| NGS analysis is all about alignment/read mapping that anyone can master within one day. | Choosing the optimal approach depends on the biological question being asked and the NGS technology used. |
Summary of content in the NGS WikiBook
| Chapter | Theme | What is it about? |
|---|---|---|
| 1 | Introduction | Overview of the field, starting with sequencing technologies and their properties, strengths and weaknesses, covering the various biological phenomena they assay and finishing with a section on common sequencing terminology. An overview of a typical sequencing workflow is also presented. |
| 2 | Big data | Some of the (perhaps unexpected) difficulties that arise when dealing with typical volumes of NGS data, from shipping hard drives around the world to the amount of computer memory needed to assemble the data once they arrive. File formats, archives and algorithms that have been developed to deal with these problems are discussed. |
| 3 | Bioinformatics from the outside | The interfaces used by bioinformaticians are reviewed: the command line, with its text interface and blinking cursor, and the more user-friendly graphical user interfaces (GUIs) developed especially for bioinformatics pipelines. |
| 4 | Preprocessing | Best practices for controlling the quality of an NGS data set and cleaning low-quality data. |
| 5 | Alignment | How to map a set of reads to a reference sequence. |
| 6 | DNA variants | How to call variants (single nucleotide variants, copy number variants or structural variants) using mapped reads. |
| 7 | RNA | How to determine exons, isoforms and gene expression levels from mapped RNA-seq reads. |
| 8 | Epigenetics | Pull-down assays, which are used to determine epigenetic traits such as histone or CpG methylation. |
| 9 | Chromatin structure | Technologies used to determine the structure of the chromatin, e.g. the placement of the histones or the physical proximity of different chromosomal regions within the nucleus. |
| 10 | Genome assembly | Ways to assemble a genome from NGS reads. |
| 11 | Transcriptome assembly | Ways to assemble a transcriptome from NGS reads. |
| 12 | Authors | Contributors of a substantial amount of work to this WikiBook should add themselves to this chapter. |
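To illustrate what "cleaning low-quality data" in the Preprocessing chapter can mean in practice, the following is a minimal Python sketch. It is not taken from the WikiBook. It assumes four-line FASTQ records with Phred+33 (Sanger / Illumina 1.8+) quality encoding, and the helper names (`phred_scores`, `parse_fastq`, `trim_3prime`, `clean_reads`), the quality threshold of 20 and the minimum read length of 5 are hypothetical choices for illustration. Dedicated preprocessing tools use more robust trimming heuristics than this simple trailing-base cut.

```python
from typing import Iterator, List, Tuple

# Assumption: qualities use the Phred+33 (Sanger / Illumina 1.8+) encoding.
PHRED_OFFSET = 33


def phred_scores(quality: str) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality]


def parse_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        yield header, seq, qual


def trim_3prime(seq: str, qual: str, min_q: int = 20) -> Tuple[str, str]:
    """Hypothetical helper: drop trailing bases whose Phred score is below min_q."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def clean_reads(lines: List[str], min_q: int = 20,
                min_len: int = 5) -> List[Tuple[str, str, str]]:
    """Trim each read's low-quality 3' tail, then discard reads that became too short."""
    kept = []
    for header, seq, qual in parse_fastq(lines):
        seq, qual = trim_3prime(seq, qual, min_q)
        if len(seq) >= min_len:
            kept.append((header, seq, qual))
    return kept
```

For example, with the quality character `I` (Phred 40) and `#` (Phred 2), a read `ACGTACGTAC` with qualities `IIIIIIII##` is trimmed to `ACGTACGT`, while a read `ACGTA` with qualities `II###` shrinks to two bases and is discarded. Separating trimming from length filtering mirrors the idea of composing small, replaceable steps into a customized pipeline.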