Statistical Binning for Barcoded Reads Improves Downstream Analyses.

Ariya Shajii, Ibrahim Numanagić, Christopher Whelan, Bonnie Berger.

Abstract

Sequencing technologies are capturing longer-range genomic information at lower error rates, enabling alignment to genomic regions that are inaccessible with short reads. However, many methods are unable to align reads to much of the genome, recognized as important in disease, and thus report erroneous results in downstream analyses. We introduce EMA, a novel two-tiered statistical binning model for barcoded read alignment, that first probabilistically maps reads to potentially multiple "read clouds" and then within clouds by newly exploiting the non-uniform read densities characteristic of barcoded read sequencing. EMA substantially improves downstream accuracy over existing methods, including phasing and genotyping on 10x data, with fewer false variant calls in nearly half the time. EMA effectively resolves particularly challenging alignments in genomic regions that contain nearby homologous elements, uncovering variants in the pharmacogenomically important CYP2D region, and clinically important genes C4 (schizophrenia) and AMY1A (obesity), which go undetected by existing methods. Our work provides a framework for future generation sequencing.
Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.
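
The first tier described in the abstract, probabilistically mapping a read to potentially multiple read clouds, can be illustrated with a toy calculation. The sketch below is not EMA's actual model: the function name `assign_probabilities`, the per-cloud log-likelihoods, and the cloud weights are all hypothetical, chosen only to show how competing cloud assignments could be normalized into probabilities.

```python
import math


def assign_probabilities(log_likelihoods, cloud_weights):
    """Toy illustration (assumed, not EMA's published model).

    log_likelihoods: hypothetical log-likelihood of the read's alignment
                     within each candidate cloud.
    cloud_weights:   hypothetical positive prior weight for each cloud.
    Returns the normalized probability that the read belongs to each cloud.
    """
    if len(log_likelihoods) != len(cloud_weights):
        raise ValueError("one weight is required per candidate cloud")
    if not log_likelihoods:
        raise ValueError("at least one candidate cloud is required")
    if any(w <= 0 for w in cloud_weights):
        raise ValueError("cloud weights must be positive")

    # Combine likelihood and prior weight in log space.
    log_scores = [ll + math.log(w)
                  for ll, w in zip(log_likelihoods, cloud_weights)]

    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores)
    unnormalized = [math.exp(s - top) for s in log_scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Two candidate clouds with equal alignment likelihood but unequal weights.
probs = assign_probabilities([-1.0, -1.0], [1.0, 3.0])
```

With equal likelihoods, the read's probability mass splits in proportion to the cloud weights, so a read is not forced onto a single location when several clouds are plausible.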

Keywords:  barcoded short-reads; linked-reads; read mapping; third-generation sequencing

Year:  2018        PMID: 30138581      PMCID: PMC6214366          DOI: 10.1016/j.cels.2018.07.005

Source DB:  PubMed          Journal:  Cell Syst        ISSN: 2405-4712            Impact factor:   10.304


