Zseq: An Approach for Preprocessing Next-Generation Sequencing Data.

Abedalrhman Alkhateeb, Luis Rueda.

Abstract

Next-generation sequencing technology generates a huge number of reads (short sequences), which contain a vast amount of genomic data. The sequencing process, however, introduces artifacts, so preprocessing of the sequences is mandatory before downstream analysis. We present Zseq, a linear method that identifies the most informative genomic sequences and reduces the number of biased sequences, sequence duplications, and ambiguous nucleotides. Zseq measures the complexity of each sequence by counting its unique k-mers and using that count as the sequence's score; it also takes into account other factors, such as ambiguous nucleotides or a high GC-content percentage in the k-mers. Zseq then sweeps through the sequences a second time and filters out those whose z-score falls below a user-defined threshold. The Zseq algorithm provides a better mapping rate and significantly reduces the number of ambiguous bases in comparison with other methods. The filtered reads were evaluated by aligning them and assembling the transcripts using the reference genome, as well as by de novo assembly. The assembled transcripts show a better ability to discriminate between cancer and normal samples than those produced with another state-of-the-art method. Moreover, transcripts assembled de novo from reads filtered by Zseq are longer than those obtained with the other tested methods. A way to estimate the cutoff threshold using labeling rules is also introduced, with promising results.
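
The two-pass idea described in the abstract (score each read by its unique k-mer count, then filter on the z-score of that count) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `kmer_score` and `zseq_filter`, the default `k=4`, the rule of skipping k-mers that contain `N`, and the `max_gc` cutoff for GC-heavy k-mers are all assumptions made for illustration, since the abstract does not specify how ambiguous bases and GC content enter the score.

```python
from statistics import mean, pstdev


def kmer_score(seq, k=4, max_gc=0.8):
    """Count the distinct k-mers in a read.

    Assumption (not from the paper): k-mers containing an ambiguous
    base 'N' or with a GC fraction above max_gc are not counted.
    """
    seq = seq.upper()
    seen = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" in kmer:
            continue  # skip k-mers with ambiguous nucleotides
        gc_fraction = (kmer.count("G") + kmer.count("C")) / k
        if gc_fraction > max_gc:
            continue  # skip GC-heavy k-mers
        seen.add(kmer)
    return len(seen)


def zseq_filter(reads, k=4, threshold=0.0, max_gc=0.8):
    """Keep reads whose score z-score is at least `threshold`."""
    if not reads:
        return []
    # First pass: score every read.
    scores = [kmer_score(r, k, max_gc) for r in reads]
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return list(reads)  # all scores equal: nothing to separate
    # Second pass: drop reads whose z-score is below the threshold.
    return [r for r, s in zip(reads, scores) if (s - mu) / sigma >= threshold]


if __name__ == "__main__":
    reads = [
        "ACGTTGCAAGTCCATG",  # diverse read
        "AAAAAAAAAAAAAAAA",  # low-complexity read
        "ACGTNNNNNNNNACGT",  # mostly ambiguous bases
        "GATTACAGATTACAGG",  # partly repetitive read
    ]
    for read in zseq_filter(reads, threshold=0.0):
        print(read)
```

Both passes visit each read once, which is in keeping with the abstract's description of Zseq as a linear method. In this sketch the low-complexity and N-rich reads receive the lowest scores and are the first to be removed as the threshold rises.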

Keywords: RNA-Seq analysis; machine learning; next-generation sequencing; preprocessing

Year: 2017; PMID: 28414515; PMCID: PMC5563921; DOI: 10.1089/cmb.2017.0021

Source DB: PubMed; Journal: J Comput Biol; ISSN: 1066-5277; Impact factor: 1.479


