BACKGROUND: High-throughput automated sequencing has enabled exponential growth of sequencing data. This growth demands higher sequence quality and reliability in order to avoid contaminating databases with artefactual sequences. The arrival of pyrosequencing exacerbates this problem and necessitates customisable pre-processing algorithms.

RESULTS: SeqTrim has been implemented both as a Web application and as a standalone command-line application. Already-published and newly designed algorithms have been included to identify sequence inserts; to remove low-quality, vector, adaptor, low-complexity and contaminant sequences; and to detect chimeric reads. The availability of several input and output formats allows its inclusion in sequence-processing workflows. Due to its specific algorithms, SeqTrim outperforms other pre-processors implemented as Web services or standalone applications. It performs equally well with sequences from EST libraries, SSH libraries, genomic DNA libraries and pyrosequencing reads, and does not lead to over-trimming.

CONCLUSIONS: SeqTrim is an efficient pipeline designed for pre-processing any type of sequence read, including next-generation sequencing reads. It is easily configurable and provides a friendly interface that lets users see what happened to the sequences at every pre-processing stage and, if desired, verify the pre-processing of an individual sequence. The recommended pipeline reveals more information about each sequence than previously described pre-processors and can discard more sequencing or experimental artefacts.
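To make two of the pre-processing stages named above more concrete, the following is a minimal Python sketch of end trimming by quality and of a low-complexity filter. It is not SeqTrim's implementation: the function names (`quality_window`, `is_low_complexity`, `preprocess`), the sliding-window rule, and the thresholds (`min_q`, `window`, `max_fraction`) are all assumptions chosen for illustration only.

```python
from collections import Counter
from typing import List, Optional, Tuple


def quality_window(quals: List[int], min_q: int = 20, window: int = 5) -> Tuple[int, int]:
    """Return a half-open (start, end) range after trimming low-quality ends.

    Hypothetical rule: slide a window inwards from each end and stop at the
    first window whose mean quality reaches min_q. Returns (0, 0) if no
    window qualifies.
    """
    n = len(quals)
    start = 0
    while start + window <= n and sum(quals[start:start + window]) / window < min_q:
        start += 1
    end = n
    while end - window >= start and sum(quals[end - window:end]) / window < min_q:
        end -= 1
    if start + window > n or end - window < start:
        return (0, 0)
    return (start, end)


def is_low_complexity(seq: str, max_fraction: float = 0.8) -> bool:
    """Flag a sequence dominated by a single base (illustrative criterion)."""
    if not seq:
        return True
    top_count = Counter(seq.upper()).most_common(1)[0][1]
    return top_count / len(seq) > max_fraction


def preprocess(seq: str, quals: List[int], min_q: int = 20, window: int = 5) -> Optional[str]:
    """Trim low-quality ends, then reject empty or low-complexity inserts."""
    start, end = quality_window(quals, min_q, window)
    insert = seq[start:end]
    if not insert or is_low_complexity(insert):
        return None
    return insert


if __name__ == "__main__":
    read = "ACGTACGTAC"
    phred = [5, 5, 30, 30, 30, 30, 30, 30, 5, 5]
    print(preprocess(read, phred, window=3))
```

A real pre-processor would add further stages (vector, adaptor and contaminant removal, chimera detection) and would report which stage modified or rejected each read; the sketch only shows how two such stages can be chained so that each works on the output of the previous one.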