Evaluating the quality of genomic "data products" such as genome assemblies or gene sets is critical for recognizing possible issues and correcting them while new data are being generated. It is equally essential for guiding subsequent or comparative analyses of existing data, because results can only be interpreted correctly when the quality and reliability of the inputs are known. Using datasets of near-universal single-copy orthologs derived from OrthoDB, BUSCO estimates the completeness and redundancy of genomic data by providing biologically meaningful metrics based on expected gene content. These metrics complement technical ones such as contiguity measures (e.g., number of contigs/scaffolds and N50 values). Here, we describe the use of the BUSCO tool suite to assess data types ranging from genome assemblies of single isolates, assembled transcriptomes, and annotated gene sets to metagenome-assembled genomes whose taxonomic origin is unknown. BUSCO is the only tool capable of assessing all these types of sequences from both eukaryotic and prokaryotic species. The protocols detail the various BUSCO running modes and the novel workflows introduced in versions 4 and 5, including batch analysis of multiple inputs, the auto-lineage workflow for running assessments without specifying a dataset, and a workflow for evaluating (large) eukaryotic genomes. The protocols further cover the BUSCO setup, guidelines for interpreting the results, and BUSCO "plugin" workflows for performing common genomics operations with BUSCO results, such as building phylogenomic trees and visualizing syntenies.
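The abstract names N50 as one of the technical contiguity metrics that BUSCO scores complement. As a minimal sketch of how that metric is usually defined (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length), the Python below computes it from a list of lengths. The function name `n50` and the example lengths are illustrative assumptions and are not part of BUSCO or the protocols.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are sorted from longest to shortest and summed until the
    running total reaches at least half of the total length; the length
    of the sequence that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in base pairs, made up for illustration.
example_lengths = [100, 200, 300, 400, 500]
example_n50 = n50(example_lengths)
```

A high N50 only indicates that the assembly is contiguous; it says nothing about whether the expected genes are present, which is the gap the gene-content metrics described above are meant to fill.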