Adeleh Saffar, Maryam M Matin.
Abstract
Contamination in sequencing data, especially in reference genomes, inevitably leads to errors in downstream analyses. Similarly, the presence of contaminants in transcriptomes misrepresents the molecular basis of various interactions. In this study, we report a large number of plant transcriptomes contaminated with RNAs encoding POU domain proteins, a family of proteins that has not been reported in plants or fungi. In addition, we found four POU domain protein-coding sequences in the reference genome of Rhodamnia argentea. These foreign fragments are related to arthropods that are considered plant pests. We also identified two contaminated draft genomes, those of Humulus lupulus and Cannabis sativa, which contain complete rDNA sequences originating from Tetranychus species. We therefore recommend carefully screening sequencing data before releasing them in public databases and checking existing genomes for possible contamination.
Keywords: Contamination; Insect; POU domain protein; Plant transcriptomes
Year: 2021
PMID: 33738520
DOI: 10.1007/s00438-021-01768-z
Source DB: PubMed
Journal: Mol Genet Genomics
ISSN: 1617-4623
Impact factor: 3.291