Measuring gene expression on a genome-wide scale has become an essential part of many biofilm studies. Historically this was done using microarrays (e.g. Refs. [1,2]), but ‘next-generation RNA sequencing’ (RNA-seq), mostly using Illumina sequencing technology, has now become the method of choice for transcriptome studies (e.g. Refs. [3–5]). Besides these techniques, which provide genome-scale information by quantifying the expression level of all genes, other approaches can be used to measure the expression of a smaller subset of genes. These include quantitative real-time PCR (qPCR) (e.g. Refs. [6,7]) and translational fusion reporters, in which the gene of interest is coupled to a reporter gene such as eGFP (e.g. Ref. [8]) or lacZ (e.g. Ref. [9]). Historically, the latter approaches (most often qPCR) have been used to confirm data obtained in large-scale transcriptomics studies, but whether this is necessary and/or adds value is not always clear. Authors, reviewers and editors often struggle with this question, and the aim of this editorial is to provide a balanced overview of the issue and some guidance.

The main question in this debate is: how reliable is RNA-seq for identifying differentially expressed genes and for estimating how much their expression differs between conditions? And is qPCR needed to validate such expression differences? The focus on validating genome-scale expression studies likely stems from earlier work with microarrays. While microarrays made it possible to carry out gene expression studies on an unprecedented scale, and despite their overall high level of performance, some concerns were raised about reproducibility and bias (e.g. Refs. [10,11]). Because of this, many researchers felt the need to validate microarray results with qPCR.
However, RNA-seq does not suffer from the same issues as (some) microarrays did, and a number of studies have specifically addressed the correlation between results obtained with RNA-seq and qPCR. A comprehensive analysis was published by Everaert et al. [12], in which five RNA-seq analysis pipelines were compared with wet-lab qPCR results for more than 18,000 protein-coding genes. While this study is based on RNA samples of human origin, there is no indication that its outcome would be different for studies with microorganisms. One of the main conclusions of this study is that, depending on the analysis workflow, 15–20% of genes are ‘non-concordant’ when RNA-seq results are compared with qPCR results (with ‘non-concordant’ defined as both approaches yielding differential expression in opposing directions, or one method showing differential expression while the other does not). However, of the non-concordant genes, 93% show a fold change lower than 2 and approx. 80% show a fold change lower than 1.5. In addition, the vast majority of the non-concordant genes with a fold change > 2 are expressed at very low levels. Overall, only a very small fraction of genes (approx. 1.8%) appeared to be severely non-concordant, and these genes were typically shorter and expressed at lower levels. Other studies showing good correlation between qPCR and RNA-seq results include Refs. [13–16]. A more general reflection on the value of validation in genome-scale studies can be found in Ref. [17].

A second important aspect in this discussion is feasibility.
It is not known a priori for which genes RNA-seq may yield non-concordant results in a particular study set-up. One could therefore suggest determining the expression levels of all genes with qPCR or, alternatively, randomly selecting some genes for follow-up with qPCR. The former option is obviously not realistic in terms of cost and workload (and defeats the purpose of doing RNA-seq in the first place). The latter could be an alternative, but how many genes would need to be confirmed with another approach? Because some genes are concordant and others are not, obtaining concordant results for a random selection of genes does not guarantee that the other genes have been correctly identified as differentially expressed by RNA-seq, and this approach seems unlikely to provide much added value in most cases.

If all experimental steps and data analyses are carried out according to the state of the art, RNA-seq results are expected to be reliable, and if they are based on a sufficient number of biological replicates, the added value of validating them with qPCR (or any other approach) is likely to be low. However, the situation is different when an entire story is based on differential expression of only a few genes, especially if the expression levels of these genes are low and/or the differences in expression are small. In such a case, orthogonal validation (e.g. by qPCR or reporter fusions) seems appropriate, as one wants to make sure that the observed expression differences for the genes on which the story is based are real and can be independently verified. qPCR can also be valuable for measuring the expression of selected genes in additional samples. For example, when RNA-seq identifies differential expression of gene X in a particular strain and/or condition, qPCR could be used to confirm this differential expression in additional strains and/or conditions.

While it is not the main topic of this editorial, I would like to point out that it is important to follow the minimum information guidelines that have been developed for different techniques and biological experiments; an overview of these can be found at https://fairsharing.org/collection/MIBBI. Of particular relevance in this context are the MIQE guidelines for qPCR (https://fairsharing.org/FAIRsharing.mxz4jy) [18] and the MINSEQE guidelines for high-throughput sequencing (https://fairsharing.org/FAIRsharing.a55z32). It is also worth emphasizing that such minimum information guidelines are available for biofilm experiments as well (MIABiE, https://fairsharing.org/FAIRsharing.6mk8xz) [19], and that there is a specific minimum information guideline for biofilm experiments in microtiter plates [20].

In conclusion, the available data suggest that RNA-seq methods and data analysis approaches are robust enough that validation by qPCR and/or other approaches is not always required, although there are situations in which it may add value. While this editorial by no means presents a comprehensive overview of this topic, the hope is that it will provide some guidance to scientists struggling with the question of whether RNA-seq data obtained in biofilm studies need independent verification.
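Two quantitative points in this editorial, the concordance criterion used by Everaert et al. [12] and the argument that a concordant random panel of genes says little about the remaining genes, can be made concrete with a short calculation. The Python sketch below is illustrative only: the `GeneCall` record, the function names and the example values are hypothetical, each method's own significance call is taken as given, and the panel calculation assumes that genes are drawn independently and that the approx. 1.8% fraction of severely non-concordant genes reported for human samples applies to the study at hand.

```python
from dataclasses import dataclass


@dataclass
class GeneCall:
    """Hypothetical per-gene result from one method (RNA-seq or qPCR)."""
    log2_fc: float      # log2 fold change between the two conditions
    significant: bool   # True if the method calls the gene differentially expressed


def direction(call: GeneCall) -> int:
    """Return +1 (up), -1 (down) or 0 (no differential expression called)."""
    if not call.significant or call.log2_fc == 0:
        return 0
    return 1 if call.log2_fc > 0 else -1


def is_concordant(rnaseq: GeneCall, qpcr: GeneCall) -> bool:
    """Concordant when both methods report the same direction, or both report no change.

    This makes the two non-concordant cases from the text explicit:
    differential expression in opposing directions, or only one method calling it.
    """
    return direction(rnaseq) == direction(qpcr)


def prob_panel_has_no_discordant(fraction_discordant: float, panel_size: int) -> float:
    """Chance that a random panel of genes contains no non-concordant gene.

    Assumes genes are drawn independently; drawing a few genes without
    replacement from thousands is very close to this.
    """
    return (1.0 - fraction_discordant) ** panel_size


if __name__ == "__main__":
    # Hypothetical example values, not data from Ref. [12]
    print(is_concordant(GeneCall(1.3, True), GeneCall(1.1, True)))    # True: same direction
    print(is_concordant(GeneCall(0.4, True), GeneCall(0.2, False)))   # False: only RNA-seq calls DE
    print(is_concordant(GeneCall(1.0, True), GeneCall(-1.0, True)))   # False: opposing directions
    # Using the approx. 1.8% severely non-concordant fraction as an assumed rate
    print(round(prob_panel_has_no_discordant(0.018, 10), 3))          # 0.834
```

Under these assumptions, a random panel of 10 genes would contain no severely non-concordant gene about 83% of the time, so a fully concordant panel would be the expected outcome even though such genes exist elsewhere in the data set.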