
PM4NGS, a project management framework for next-generation sequencing data analysis.

Roberto Vera Alvarez, Lorinc Pongor, Leonardo Mariño-Ramírez, David Landsman.

Abstract

BACKGROUND: FAIR (Findability, Accessibility, Interoperability, and Reusability) next-generation sequencing (NGS) data analysis relies on complex computational biology workflows and pipelines to guarantee reproducibility, portability, and scalability. Moreover, workflow languages, managers, and container technologies have helped address the problem of data analysis pipeline execution across multiple platforms in scalable ways.
FINDINGS: Here, we present a project management framework for NGS data analysis called PM4NGS. The framework combines automatic creation of a standard organizational structure of directories and files, bioinformatics tool management using Docker or Bioconda, and data analysis pipelines in Common Workflow Language (CWL) format. Pre-configured Jupyter notebooks with minimal Python code are included in PM4NGS to produce a project report and publication-ready figures. For demonstration purposes, we present 3 pipelines for the analysis of RNA-Seq, ChIP-Seq, and ChIP-exo datasets.
CONCLUSIONS: PM4NGS is an open source framework that creates a standard organizational structure for NGS data analysis projects. PM4NGS is easy for non-bioinformaticians to install, configure, and use on personal computers and laptops. NGS data analysis can also be run on Windows 10 with the Windows Subsystem for Linux feature activated. The framework aims to reduce the gap between researchers in experimental laboratories producing NGS data and workflows for data analysis. PM4NGS documentation can be accessed at https://pm4ngs.readthedocs.io/. Published by Oxford University Press on behalf of GigaScience 2021.
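
As an illustration of what automatic creation of a standard project structure can look like, the following is a minimal Python sketch. It is not the PM4NGS implementation: the function name `create_project` and the directory names in `PROJECT_DIRS` are hypothetical and chosen only for this example; the layout that PM4NGS actually generates is described in its documentation.

```python
from pathlib import Path
import tempfile

# Hypothetical layout: these directory names are illustrative assumptions,
# not the structure that PM4NGS actually generates.
PROJECT_DIRS = ("config", "data", "notebooks", "results", "src")


def create_project(root, name, dirs=PROJECT_DIRS):
    """Create a project skeleton under root/name and return its path."""
    project = Path(root) / name
    for sub in dirs:
        # exist_ok makes the call safe to repeat on an existing project
        (project / sub).mkdir(parents=True, exist_ok=True)
    readme = project / "README.md"
    if not readme.exists():
        # Only write the README once, so user edits are never overwritten
        readme.write_text(f"# {name}\n")
    return project


if __name__ == "__main__":
    # Build the skeleton in a throwaway directory and list what was created
    with tempfile.TemporaryDirectory() as tmp:
        project = create_project(tmp, "rnaseq_demo")
        for path in sorted(project.iterdir()):
            print(path.name)
```

Generating the same layout for every project means that notebooks and pipelines can locate inputs and outputs by convention rather than by per-project configuration.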

Keywords:  ChIP-Seq; ChIP-exo; FAIR; RNA-Seq; NGS pipelines; NGS sequence analysis; open source frameworks

Year:  2021        PMID: 33410471      PMCID: PMC7788391          DOI: 10.1093/gigascience/giaa141

Source DB:  PubMed          Journal:  Gigascience        ISSN: 2047-217X            Impact factor:   6.524


