A gradient-boosting approach for filtering de novo mutations in parent-offspring trios.

Yongzhuang Liu, Bingshan Li, Renjie Tan, Xiaolin Zhu, Yadong Wang.

Abstract

MOTIVATION: Whole-genome and -exome sequencing on parent-offspring trios is a powerful approach to identifying disease-associated genes by detecting de novo mutations in patients. Accurate detection of de novo mutations from sequencing data is a critical step in trio-based genetic studies. Existing bioinformatic approaches usually yield high error rates due to sequencing artifacts and alignment issues, which may either miss true de novo mutations or call too many false ones, making downstream validation and analysis difficult. In particular, current approaches have much worse specificity than sensitivity, and developing effective filters to discriminate genuine from spurious de novo mutations remains an unsolved challenge.
RESULTS: In this article, we curated 59 sequence features in whole genome and exome alignment context which are considered to be relevant to discriminating true de novo mutations from artifacts, and then employed a machine-learning approach to classify candidates as true or false de novo mutations. Specifically, we built a classifier, named De Novo Mutation Filter (DNMFilter), using gradient boosting as the classification algorithm. We built the training set using experimentally validated true and false de novo mutations as well as collected false de novo mutations from an in-house large-scale exome-sequencing project. We evaluated DNMFilter's theoretical performance and investigated relative importance of different sequence features on the classification accuracy. Finally, we applied DNMFilter on our in-house whole exome trios and one CEU trio from the 1000 Genomes Project and found that DNMFilter could be coupled with commonly used de novo mutation detection approaches as an effective filtering approach to significantly reduce false discovery rate without sacrificing sensitivity.
AVAILABILITY: The software DNMFilter implemented using a combination of Java and R is freely available from the website at http://humangenome.duke.edu/software.
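
The classification idea described under RESULTS can be illustrated with a minimal sketch of gradient boosting for a binary true/false label. This is not the DNMFilter implementation (which the abstract says is written in Java and R) and does not use its 59 features. The three features below (child alternate-allele fraction, parental alternate read count, mean mapping quality), the toy values, and all function names are assumptions chosen only for illustration. The sketch boosts depth-1 trees (stumps) on the gradient of the logistic loss.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) for the
    single split that minimizes squared error on the residuals,
    or None if no feature has two distinct values."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return None if best is None else best[1:]

def train(X, y, n_rounds=50, learning_rate=0.3):
    """Fit an additive model of stumps to the logistic-loss gradient."""
    pos = sum(y) / len(y)
    f0 = math.log(pos / (1.0 - pos))  # initial log-odds
    scores = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log loss: label minus predicted probability.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        j, thr, lv, rv = stump
        stumps.append(stump)
        scores = [s + learning_rate * (lv if row[j] <= thr else rv)
                  for s, row in zip(scores, X)]
    return f0, stumps, learning_rate

def predict_proba(model, row):
    """Probability that a candidate is a true de novo mutation."""
    f0, stumps, lr = model
    score = f0 + lr * sum(lv if row[j] <= thr else rv
                          for j, thr, lv, rv in stumps)
    return sigmoid(score)

# Hypothetical toy training set: 1 = validated true, 0 = validated false.
# Columns: child alt-allele fraction, parental alt reads, mean mapping quality.
X = [[0.48, 0, 60], [0.52, 0, 58], [0.45, 0, 59], [0.50, 0, 60],
     [0.12, 2, 35], [0.20, 3, 40], [0.15, 1, 30], [0.25, 4, 42]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

model = train(X, y)
likely_true = predict_proba(model, [0.47, 0, 60])
likely_false = predict_proba(model, [0.18, 3, 38])
print(round(likely_true, 3), round(likely_false, 3))
```

In a filtering setting such as the one the abstract describes, candidates from an upstream caller would be scored this way and kept only above a chosen probability cutoff. The cutoff, like everything else in this sketch, would need to be tuned on validated data.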
© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

Year:  2014        PMID: 24618463      PMCID: PMC4071207          DOI: 10.1093/bioinformatics/btu141

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


