
Streamlining data-intensive biology with workflow systems.

Taylor Reiter, Phillip T Brooks, Luiz Irber, Shannon E K Joslin, Charles M Reid, Camille Scott, C Titus Brown, N Tessa Pierce-Ward.

Abstract

As the scale of biological data generation has increased, the bottleneck of research has shifted from data generation to analysis. Researchers commonly need to build computational workflows that include multiple analytic tools and require incremental development as experimental insights demand tool and parameter modifications. These workflows can produce hundreds to thousands of intermediate files and results that must be integrated for biological insight. Data-centric workflow systems that internally manage computational resources, software, and conditional execution of analysis steps are reshaping the landscape of biological data analysis and empowering researchers to conduct reproducible analyses at scale. Adoption of these tools can facilitate and expedite robust data analysis, but knowledge of these techniques is still lacking. Here, we provide a series of strategies for leveraging workflow systems with structured project, data, and resource management to streamline large-scale biological analysis. We present these practices in the context of high-throughput sequencing data analysis, but the principles are broadly applicable to biologists working beyond this field.
© The Author(s) 2021. Published by Oxford University Press GigaScience.

Keywords: automation; data-intensive biology; repeatability; workflows

Year: 2021    PMID: 33438730    DOI: 10.1093/gigascience/giaa140

Source DB: PubMed    Journal: Gigascience    ISSN: 2047-217X    Impact factor: 6.524


