
The impact of contaminants on the accuracy of genome skimming and the effectiveness of exclusion read filters.

Eleonora Rachtman¹, Metin Balaban¹, Vineet Bafna², Siavash Mirarab³.

Abstract

The ability to identify a sample obtained from its environment is a cornerstone of molecular ecological research. Thanks to the falling price of shotgun sequencing, genome skimming (the acquisition of short reads spread across the genome at low coverage) is emerging as an alternative to traditional barcoding. By obtaining far more data across the whole genome, skimming promises to increase the precision of sample identification beyond that of traditional barcoding while keeping costs manageable. While methods for assembly-free sample identification based on genome skims are now available, little is known about how these methods respond to the presence of DNA from organisms other than the target species. In this paper, we show that the accuracy of distances computed between a pair of genome skims based on k-mer similarity can degrade dramatically if the skims include contaminant reads, i.e., any reads originating from other organisms. We establish a theoretical model of the impact of contamination. We then suggest and evaluate a solution to the contamination problem: query the reads in a genome skim against an extensive database of possible contaminants (e.g., all microbial organisms) and filter out any read that matches. In detailed analyses, we evaluate the effectiveness of this strategy when implemented using Kraken-II. Our results show substantial improvements in accuracy as a result of filtering but also point to limitations, including a need for relatively close matches in the contaminant database.
© 2020 John Wiley & Sons Ltd.
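The abstract's central argument (contaminant reads dilute k-mer similarity between two skims, and an exclusion filter recovers the distance only when the database holds a close relative of the contaminant) can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' method or model. It converts a Jaccard index to a distance with the Mash-style formula D = -(1/k) ln(2J/(1+J)) rather than an estimator corrected for low coverage and sequencing error, and it stands in for Kraken-II with a hypothetical `exclusion_filter` that drops a read when at least half of its k-mers (the hypothetical parameter `min_hit_fraction`) occur in the contaminant database. Genome sizes, k = 21, the 1% and 10% divergence levels, and 50% contamination are illustrative choices.

```python
import math
import random

K = 21  # k-mer length (illustrative choice)

def kmers(seq: str, k: int = K) -> set[str]:
    """Set of all k-mers in a sequence (forward strand only, for simplicity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def skim_kmers(reads: list[str], k: int = K) -> set[str]:
    """Union of the k-mers of every read in a genome skim."""
    out: set[str] = set()
    for read in reads:
        out |= kmers(read, k)
    return out

def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def mash_distance(j: float, k: int = K) -> float:
    """Mash-style Jaccard-to-distance conversion; no coverage/error correction."""
    if j == 0:
        return 1.0
    return -math.log(2 * j / (1 + j)) / k

def exclusion_filter(reads, contaminant_kmers, k=K, min_hit_fraction=0.5):
    """Hypothetical stand-in for a read classifier: drop a read if at least
    `min_hit_fraction` of its k-mers occur in the contaminant database."""
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        hits = len(read_kmers & contaminant_kmers)
        if hits / len(read_kmers) < min_hit_fraction:
            kept.append(read)
    return kept

def random_genome(rng, n):
    return "".join(rng.choice("ACGT") for _ in range(n))

def mutate(rng, genome, rate):
    """Replace each base with a different base with probability `rate`."""
    return "".join(
        rng.choice([b for b in "ACGT" if b != base]) if rng.random() < rate else base
        for base in genome
    )

def sample_reads(rng, genome, n_reads, read_len=100):
    """Error-free reads drawn uniformly from the genome (a toy 'skim')."""
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]

rng = random.Random(1)
genome_a = random_genome(rng, 5000)
genome_b = mutate(rng, genome_a, 0.01)   # related sample, ~1% divergence
genome_c = random_genome(rng, 5000)      # unrelated contaminant
reads_a = sample_reads(rng, genome_a, 250)  # ~5x coverage
reads_b = sample_reads(rng, genome_b, 250)
contaminated_b = reads_b + sample_reads(rng, genome_c, 250)  # 50% contaminant reads

ka = skim_kmers(reads_a)
d_clean = mash_distance(jaccard(ka, skim_kmers(reads_b)))
d_contam = mash_distance(jaccard(ka, skim_kmers(contaminated_b)))

exact_db = kmers(genome_c)                       # DB contains the contaminant itself
distant_db = kmers(mutate(rng, genome_c, 0.10))  # DB has only a ~10%-diverged relative
filtered_exact = exclusion_filter(contaminated_b, exact_db)
filtered_distant = exclusion_filter(contaminated_b, distant_db)
d_filtered = mash_distance(jaccard(ka, skim_kmers(filtered_exact)))
d_filtered_distant = mash_distance(jaccard(ka, skim_kmers(filtered_distant)))

print(f"clean {d_clean:.4f} | contaminated {d_contam:.4f} | "
      f"filtered, exact DB {d_filtered:.4f} | filtered, distant DB {d_filtered_distant:.4f}")
```

Contaminant k-mers enter the union of the two k-mer sets but not their intersection, so the Jaccard index falls and the estimated distance rises. Filtering against the exact contaminant restores the clean distance in this toy setting, whereas filtering against a 10%-diverged relative leaves most contaminant reads in place, since only about 0.9^21 ≈ 11% of that relative's 21-mers survive intact. Note that even the clean distance here is inflated by incomplete coverage, which the uncorrected Mash-style formula does not account for.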

Keywords:  Kraken; contamination; filtering; genome skimming; shotgun sequencing

Year:  2020        PMID: 31943790     DOI: 10.1111/1755-0998.13135

Source DB:  PubMed          Journal:  Mol Ecol Resour        ISSN: 1755-098X            Impact factor:   7.090


  6 in total

1.  Estimating repeat spectra and genome length from low-coverage genome skims with RESPECT.

Authors:  Shahab Sarmashghi; Metin Balaban; Eleonora Rachtman; Behrouz Touri; Siavash Mirarab; Vineet Bafna
Journal:  PLoS Comput Biol       Date:  2021-11-15       Impact factor: 4.475

2.  Contamination detection in genomic data: more is not enough. [Review]

Authors:  Luc Cornet; Denis Baurain
Journal:  Genome Biol       Date:  2022-02-21       Impact factor: 13.583

3.  Testing Efficacy of Assembly-Free and Alignment-Free Methods for Species Identification Using Genome Skims, with Patellogastropoda as a Test Case.

Authors:  Tao Xu; Lingfeng Kong; Qi Li
Journal:  Genes (Basel)       Date:  2022-07-02       Impact factor: 4.141

4.  Genome skimming approach reveals the gene arrangements in the chloroplast genomes of the highly endangered Crocus L. species: Crocus istanbulensis (B.Mathew) Rukšāns.

Authors:  Selahattin Baris Cay; Yusuf Ulas Cinar; Selim Can Kuralay; Behcet Inal; Gokmen Zararsiz; Almila Ciftci; Rachel Mollman; Onur Obut; Vahap Eldem; Yakup Bakir; Osman Erol
Journal:  PLoS One       Date:  2022-06-15       Impact factor: 3.752

5.  Phylogenetic double placement of mixed samples.

Authors:  Metin Balaban; Siavash Mirarab
Journal:  Bioinformatics       Date:  2020-07-01       Impact factor: 6.937

6.  Beyond DNA barcoding: The unrealized potential of genome skim data in sample identification.

Authors:  Kristine Bohmann; Siavash Mirarab; Vineet Bafna; M Thomas P Gilbert
Journal:  Mol Ecol       Date:  2020-06-29       Impact factor: 6.185
