EnsembleCNV: an ensemble machine learning algorithm to identify and genotype copy number variation using SNP array data.

Zhongyang Zhang, Haoxiang Cheng, Xiumei Hong, Antonio F Di Narzo, Oscar Franzen, Shouneng Peng, Arno Ruusalepp, Jason C Kovacic, Johan L M Bjorkegren, Xiaobin Wang, Ke Hao.

Abstract

The associations between diseases/traits and copy number variants (CNVs) have not been systematically investigated in genome-wide association studies (GWASs), primarily due to a lack of robust and accurate tools for CNV genotyping. Herein, we propose a novel ensemble learning framework, ensembleCNV, to detect and genotype CNVs using single nucleotide polymorphism (SNP) array data. EnsembleCNV (a) identifies and eliminates batch effects at the raw data level; (b) assembles individual CNV calls from multiple existing callers with complementary strengths into CNV regions (CNVRs) using a heuristic algorithm; (c) re-genotypes each CNVR with a local likelihood model adjusted by global information across multiple CNVRs; (d) refines CNVR boundaries using the local correlation structure in copy number intensities; and (e) provides direct CNV genotyping accompanied by a confidence score, directly accessible for downstream quality control and association analysis. Benchmarked on two large datasets, ensembleCNV outperformed competing methods and achieved a high call rate (93.3%) and reproducibility (98.6%), while concurrently achieving high sensitivity by capturing 85% of common CNVs documented in the 1000 Genomes Project. Given this CNV call rate and accuracy, which are comparable to those of SNP genotyping, we suggest ensembleCNV holds significant promise for performing genome-wide CNV association studies and investigating how CNVs predispose to human diseases.
© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.
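
Step (b) of the abstract, assembling individual CNV calls from several callers into CNVRs, can be illustrated with a small sketch. The abstract does not describe the heuristic itself, so the code below swaps in a plain overlap-merge (any calls on the same chromosome that overlap are chained into one region) and should not be read as ensembleCNV's actual algorithm. The names `CNVCall`, `CNVR`, `assemble_cnvrs`, and the caller labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class CNVCall(NamedTuple):
    """One CNV call made by one caller in one sample (hypothetical record)."""
    caller: str
    sample: str
    chrom: str
    start: int  # inclusive coordinate
    end: int    # inclusive coordinate


class CNVR(NamedTuple):
    """A CNV region built from one or more overlapping calls."""
    chrom: str
    start: int
    end: int
    callers: frozenset
    samples: frozenset


def _make_region(chrom, group):
    # The region spans the union of all calls chained into the group.
    return CNVR(
        chrom=chrom,
        start=min(c.start for c in group),
        end=max(c.end for c in group),
        callers=frozenset(c.caller for c in group),
        samples=frozenset(c.sample for c in group),
    )


def assemble_cnvrs(calls):
    """Merge overlapping calls per chromosome into regions.

    Assumption: simple interval chaining, not the paper's heuristic.
    """
    by_chrom = defaultdict(list)
    for call in calls:
        by_chrom[call.chrom].append(call)

    regions = []
    for chrom in sorted(by_chrom):
        group, group_end = [], None
        for call in sorted(by_chrom[chrom], key=lambda c: (c.start, c.end)):
            # A call starting past the current group's end opens a new region.
            if group and call.start > group_end:
                regions.append(_make_region(chrom, group))
                group, group_end = [], None
            group.append(call)
            group_end = call.end if group_end is None else max(group_end, call.end)
        if group:
            regions.append(_make_region(chrom, group))
    return regions


if __name__ == "__main__":
    example_calls = [
        CNVCall("callerA", "s1", "chr1", 100, 200),
        CNVCall("callerB", "s1", "chr1", 150, 260),
        CNVCall("callerA", "s2", "chr1", 500, 600),
        CNVCall("callerB", "s2", "chr2", 100, 200),
    ]
    for region in assemble_cnvrs(example_calls):
        print(region.chrom, region.start, region.end, sorted(region.callers))
```

Recording which callers and samples support each region keeps the information a later re-genotyping or quality-control step would need, such as how many callers agree on a region. A production heuristic would likely also require a minimum reciprocal overlap so that long calls do not chain unrelated regions together.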

Year:  2019        PMID: 30722045      PMCID: PMC6468244          DOI: 10.1093/nar/gkz068

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


  5 in total

1.  Multiclass Cancer Prediction Based on Copy Number Variation Using Deep Learning.

Authors:  Haleema Attique; Sajid Shah; Saima Jabeen; Fiaz Gul Khan; Ahmad Khan; Mohammed ELAffendi
Journal:  Comput Intell Neurosci       Date:  2022-06-09

Review 2.  Implications of germline copy-number variations in psychiatric disorders: review of large-scale genetic studies.

Authors:  Masahiro Nakatochi; Itaru Kushima; Norio Ozaki
Journal:  J Hum Genet       Date:  2020-09-21       Impact factor: 3.172

Review 3.  Progress in Methods for Copy Number Variation Profiling.

Authors:  Veronika Gordeeva; Elena Sharova; Georgij Arapidi
Journal:  Int J Mol Sci       Date:  2022-02-15       Impact factor: 5.923

4.  A Novel Computational Framework to Predict Disease-Related Copy Number Variations by Integrating Multiple Data Sources.

Authors:  Lin Yuan; Tao Sun; Jing Zhao; Zhen Shen
Journal:  Front Genet       Date:  2021-06-29       Impact factor: 4.599

5.  A genome-wide analysis of copy number variation in Murciano-Granadina goats.

Authors:  Dailu Guan; Amparo Martínez; Anna Castelló; Vincenzo Landi; María Gracia Luigi-Sierra; Javier Fernández-Álvarez; Betlem Cabrera; Juan Vicente Delgado; Xavier Such; Jordi Jordana; Marcel Amills
Journal:  Genet Sel Evol       Date:  2020-08-08       Impact factor: 4.297
