Sequencing facility and DNA source associated patterns of virus-mappable reads in whole-genome sequencing data.

Xun Chen, Dawei Li.

Abstract

Numerous viral sequences have been reported in the whole-genome sequencing (WGS) data of human blood. However, it is not clear to what degree the virus-mappable reads represent true viral sequences rather than random-mapping or noise originating from sample preparation, sequencing processes, or other sources. Identification of patterns of virus-mappable reads may generate novel indicators for evaluating the origins of these viral sequences. We characterized paired-end unmapped reads and reads aligned to viral references in human WGS datasets, then compared patterns of the virus-mappable reads among DNA sources and sequencing facilities which produced these datasets. We then examined potential origins of the source- and facility-associated viral reads. The proportions of clean unmapped reads among the seven sequencing facilities were significantly different (P < 2 × 10⁻¹⁶). We identified 260,339 reads that were mappable to a total of 99 viral references in 2535 samples. The majority (86.7%) of these virus-mappable reads (corresponding to 47 viral references), which can be classified into four groups based on their distinct patterns, were strongly associated with sequencing facility or DNA source (adjusted P value <0.01). Possible origins of these reads include artificial sequences in library preparation, recombinant vectors in cell culture, and phages co-contaminated with their host bacteria. The sequencing facility-associated virus-mappable reads and patterns were repeatedly observed in other datasets produced in the same facilities. We have constructed an analytic framework and profiled the unmapped reads mappable to viral references. The results provide a new understanding of sequencing facility- and DNA source-associated batch effects in deep sequencing data and may facilitate improved bioinformatics filtering of reads.
Copyright © 2020 Elsevier Inc. All rights reserved.
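
The abstract reports that virus-mappable reads were associated with sequencing facility or DNA source at an adjusted P value below 0.01, but it does not name the statistical test or the correction method. The following is a minimal sketch of how such a screen might look, assuming a permutation test on a Pearson chi-square statistic and Benjamini-Hochberg adjustment. Both choices, all function names, and the toy facility labels are illustrative assumptions, not the authors' framework.

```python
import random
from collections import Counter


def chi_square_stat(groups, present):
    """Pearson chi-square statistic for a groups x {present, absent} table."""
    n = len(groups)
    present_rate = sum(present) / n
    group_sizes = Counter(groups)
    group_present = Counter(g for g, p in zip(groups, present) if p)
    stat = 0.0
    for g, size in group_sizes.items():
        cells = (
            (group_present[g], size * present_rate),
            (size - group_present[g], size * (1 - present_rate)),
        )
        for observed, expected in cells:
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def permutation_p(groups, present, n_perm=2000, seed=0):
    """P value from shuffling presence labels across samples (assumed test)."""
    rng = random.Random(seed)
    observed = chi_square_stat(groups, present)
    shuffled = list(present)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi_square_stat(groups, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data: three hypothetical facilities with 40 samples each.
facility = ["A"] * 40 + ["B"] * 40 + ["C"] * 40
presence = {
    # Reads for this hypothetical reference appear only in facility A.
    "virus_x": [True] * 30 + [False] * 90,
    # Reads for this hypothetical reference are spread evenly.
    "virus_y": ([True] * 5 + [False] * 35) * 3,
}
raw = [permutation_p(facility, presence[v]) for v in presence]
for name, adj in zip(presence, benjamini_hochberg(raw)):
    label = "facility-associated" if adj < 0.01 else "not associated"
    print(f"{name}: adjusted P = {adj:.4g} ({label})")
```

A permutation test is used here only because it needs nothing beyond the standard library. With real data, an exact or asymptotic test from a statistics package would likely be the more practical choice, and the same adjustment step would apply to one P value per viral reference.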

Keywords:  Batch effect; High-throughput sequencing; Human virome; Unmapped reads

Year:  2020        PMID: 33301893      PMCID: PMC7856238          DOI: 10.1016/j.ygeno.2020.12.004

Source DB:  PubMed          Journal:  Genomics        ISSN: 0888-7543            Impact factor:   5.736

