seqQscorer: automated quality control of next-generation sequencing data using machine learning.

Steffen Albrecht, Maximilian Sprang, Miguel A Andrade-Navarro, Jean-Fred Fontaine.

Abstract

Controlling quality of next-generation sequencing (NGS) data files is a necessary but complex task. To address this problem, we statistically characterize common NGS quality features and develop a novel quality control procedure involving tree-based and deep learning classification algorithms. Predictive models, validated on internal and external functional genomics datasets, are to some extent generalizable to data from unseen species. The derived statistical guidelines and predictive models represent a valuable resource for users of NGS data to better understand quality issues and perform automatic quality control. Our guidelines and software are available at https://github.com/salbrec/seqQscorer .

Keywords:  Bioinformatics; Classification; Machine learning; Next-generation sequencing data; Quality control

Year:  2021        PMID: 33673854      PMCID: PMC7934511          DOI: 10.1186/s13059-021-02294-2

Source DB:  PubMed          Journal:  Genome Biol        ISSN: 1474-7596            Impact factor:   13.583
