Adrià Antich1, Creu Palacin2, Owen S Wangensteen3, Xavier Turon4. 1. Department of Marine Ecology, Centre for Advanced Studies of Blanes (CEAB-CSIC), Blanes (Girona), Catalonia, Spain. 2. Department of Evolutionary Biology, Ecology and Environmental Sciences, University of Barcelona and Research Institute of Biodiversity (IRBIO), Barcelona, Catalonia, Spain. 3. Norwegian College of Fishery Science, UiT The Arctic University of Norway, Tromsø, Norway. owen.wangensteen@uit.no. 4. Department of Marine Ecology, Centre for Advanced Studies of Blanes (CEAB-CSIC), Blanes (Girona), Catalonia, Spain. xturon@ceab.csic.es.
Abstract
BACKGROUND: The recent boom in metabarcoding applications to biodiversity studies has been accompanied by some relevant methodological debates. One such issue concerns the treatment of reads by denoising or by clustering methods, which have been wrongly presented as alternatives. It has also been suggested that denoised sequence variants should replace clusters as the basic unit of metabarcoding analyses, overlooking the fact that sequence clusters are a proxy for species-level entities, the basic unit in biodiversity studies. We argue here that methods developed and tested for ribosomal markers have been uncritically applied to highly variable markers such as cytochrome oxidase I (COI) without conceptual or operational (e.g., parameter setting) adjustment. COI has naturally high intraspecific variability, which should be assessed and reported because it is a source of highly valuable information. We contend that denoising and clustering are not alternatives. Rather, they are complementary, and both should be used together in COI metabarcoding pipelines.

RESULTS: Using a COI dataset from benthic marine communities, we compared two denoising procedures (based on the UNOISE3 and DADA2 algorithms), set suitable parameters for denoising and clustering, and applied these steps in different orders. Our results indicated that the UNOISE3 algorithm preserved higher intra-cluster variability. We introduce the program DnoisE, which implements the UNOISE3 algorithm while taking into account the natural variability (measured as entropy) of each codon position in protein-coding genes. This correction increased the number of sequences retained by 88%. The order of the steps (denoising and clustering) had little influence on the final outcome.

CONCLUSIONS: We highlight the need for combining denoising and clustering, with an adequate choice of stringency parameters, in COI metabarcoding. We present a program that uses the coding properties of this marker to improve the denoising step.
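The idea of an entropy-aware UNOISE3 step can be illustrated with a small Python sketch. It is not DnoisE's code. It assumes reads that are aligned, of equal length and in frame (first base at codon position 1), and applies the UNOISE3 abundance-skew rule, under which a less abundant sequence is merged into a more abundant centroid when their abundance ratio is at most β(d) = 1/2^(αd+1). As an assumption about the form of the correction, each mismatch is weighted by the entropy of its codon position divided by the mean entropy of the three positions. The helper names, the default alpha of 2.0 and the choice of the closest eligible centroid are illustrative choices; abundance filtering and chimera removal are omitted.

```python
import math
from collections import Counter


def codon_position_entropy(seqs):
    """Mean Shannon entropy (bits) of the columns at codon positions 1, 2 and 3.

    Assumes aligned sequences of equal length whose first base is codon position 1.
    """
    sums, counts = [0.0, 0.0, 0.0], [0, 0, 0]
    for i in range(len(seqs[0])):
        column = Counter(s[i] for s in seqs)
        total = sum(column.values())
        h = -sum((n / total) * math.log2(n / total) for n in column.values())
        sums[i % 3] += h
        counts[i % 3] += 1
    return [sums[k] / counts[k] if counts[k] else 0.0 for k in range(3)]


def weighted_distance(a, b, entropy):
    """Mismatch count with each mismatch weighted by entropy[pos] / mean entropy.

    ASSUMED form of the correction: mismatches at highly variable (e.g. third)
    codon positions count more, so natural variants are less likely to be merged.
    """
    mean_h = sum(entropy) / 3
    d = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            d += entropy[i % 3] / mean_h if mean_h > 0 else 1.0
    return d


def beta(d, alpha):
    """UNOISE3 maximum abundance skew allowed at distance d."""
    return 1.0 / 2 ** (alpha * d + 1)


def denoise(abundances, alpha=2.0, entropy=(1.0, 1.0, 1.0)):
    """Greedy UNOISE3-style denoising of a {sequence: reads} dict.

    Returns {centroid: total reads}. Uniform entropy weights give plain mismatch counts.
    """
    centroids = {}
    for q in sorted(abundances, key=lambda s: -abundances[s]):
        best = None
        for c in centroids:
            d = weighted_distance(q, c, entropy)
            # The skew uses the centroid's own read count, not its merged total.
            if abundances[q] / abundances[c] <= beta(d, alpha):
                if best is None or d < best[0]:
                    best = (d, c)
        if best is None:
            centroids[q] = abundances[q]  # q becomes a new centroid
        else:
            centroids[best[1]] += abundances[q]  # q is treated as an error of c
    return centroids


reads = {"AAAAAA": 1000, "AAAAAG": 200, "AAAAAC": 20}
ent = codon_position_entropy(["AAAAAA", "AAAAAC", "AAAAAG", "AAAAAT"])
print(ent)                          # [0.0, 0.0, 1.0]
print(denoise(reads))               # {'AAAAAA': 1020, 'AAAAAG': 200}
print(denoise(reads, entropy=ent))  # {'AAAAAA': 1000, 'AAAAAG': 200, 'AAAAAC': 20}
```

With uniform weights the 20-read third-position variant is absorbed as an error; once third-position mismatches are weighted up, all three sequences survive. This mirrors the direction of the effect described above (more sequences retained), although the toy numbers say nothing about its size.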
We recommend that researchers report their results in terms of both denoised sequences (a proxy for haplotypes) and the clusters they form (a proxy for species), and that they avoid collapsing the sequences within each cluster into a single representative. This will allow analyses at both the cluster level (ideally equating to species-level diversity) and the intra-cluster level, and will facilitate additivity and comparability between studies.
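Reporting at both levels can be as simple as keeping one row per denoised sequence tagged with its cluster, rather than one representative per cluster. The Python sketch below assumes a hypothetical input mapping cluster IDs to {sequence: reads}; the column names and haplotype labels are illustrative, not a format prescribed by the authors.

```python
import csv
import io


def report_rows(clusters):
    """Flatten {cluster_id: {sequence: reads}} into one row per denoised sequence."""
    rows = []
    for cid in sorted(clusters):
        # Rank haplotypes within each cluster by decreasing read count.
        members = sorted(clusters[cid].items(), key=lambda kv: -kv[1])
        for rank, (seq, reads) in enumerate(members, start=1):
            rows.append({"cluster": cid, "haplotype": f"{cid}_h{rank}",
                         "sequence": seq, "reads": reads})
    return rows


def cluster_summary(rows):
    """Cluster-level view derived from the rows: (number of haplotypes, total reads)."""
    summary = {}
    for r in rows:
        n, total = summary.get(r["cluster"], (0, 0))
        summary[r["cluster"]] = (n + 1, total + r["reads"])
    return summary


def to_csv(rows):
    """Serialize the haplotype-level table as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["cluster", "haplotype", "sequence", "reads"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


clusters = {"OTU1": {"AAAAAA": 1000, "AAAAAG": 200}, "OTU2": {"CCCCCC": 50}}
rows = report_rows(clusters)
print(cluster_summary(rows))  # {'OTU1': (2, 1200), 'OTU2': (1, 50)}
print(to_csv(rows))
```

A cluster-level table can always be derived from this long format by summing reads per cluster, whereas the intra-cluster information cannot be recovered once each cluster has been collapsed to a single sequence.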
Keywords:
COI; Clustering; Denoising; Metabarcoding; Metaphylogeography; Operational taxonomic units