Population clustering based on copy number variations detected from next generation sequencing data.

Junbo Duan, Ji-Gang Zhang, Mingxi Wan, Hong-Wen Deng, Yu-Ping Wang.

Abstract

Copy number variations (CNVs) can be used as significant biomarkers, and next generation sequencing (NGS) provides high-resolution detection of these CNVs. However, how to extract features from CNVs and apply them to genomic studies such as population clustering has become a major challenge. In this paper, we propose a novel method for population clustering based on CNVs from NGS. First, CNVs are extracted from each sample to form a feature matrix. Then, this feature matrix is decomposed into a source matrix and a weight matrix with non-negative matrix factorization (NMF). The source matrix consists of common CNVs that are shared by all the samples from the same group, and the weight matrix indicates the corresponding level of CNVs from each sample. Therefore, using NMF of CNVs, one can differentiate samples from different ethnic groups, i.e., perform population clustering. To validate the approach, we applied it to the analysis of both simulation data and two real data sets from the 1000 Genomes Project. The results on simulation data demonstrate that the proposed method can recover the true common CNVs with high quality. The results of the first real data analysis show that the proposed method can cluster two family trios with different ancestries into two ethnic groups, and the results of the second real data analysis show that the proposed method can be applied to the whole genome with a large sample size consisting of multiple groups. Both results demonstrate the potential of the proposed method for population clustering.
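
The decomposition step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the feature matrix is arranged as genomic bins by samples, uses the standard multiplicative updates of Lee and Seung for the Frobenius objective, and assigns each sample to the group with the largest weight. The function name `nmf`, the toy matrix `V`, and the argmax assignment rule are all illustrative assumptions.

```python
import numpy as np


def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (bins x samples) as S @ W.

    S (bins x rank) plays the role of the source matrix and
    W (rank x samples) the role of the weight matrix.
    Uses Lee-Seung multiplicative updates (Frobenius objective).
    """
    rng = np.random.default_rng(seed)
    n_bins, n_samples = V.shape
    S = rng.random((n_bins, rank)) + eps
    W = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative by construction.
        W *= (S.T @ V) / (S.T @ S @ W + eps)
        S *= (V @ W.T) / (S @ W @ W.T + eps)
    return S, W


# Hypothetical toy data: 6 genomic bins, 6 samples from two groups.
# Group A shares a CNV in bins 0-1, group B shares one in bins 3-4.
pattern_a = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])
pattern_b = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 0.0])
V = np.column_stack([
    1.0 * pattern_a, 0.8 * pattern_a, 1.2 * pattern_a,
    1.0 * pattern_b, 0.9 * pattern_b, 1.1 * pattern_b,
])

S, W = nmf(V, rank=2)

# Assumed clustering rule: each sample joins the group whose
# weight is largest in its column of W.
labels = W.argmax(axis=0)
print(labels)
```

In this sketch the columns of `S` correspond to the shared CNV patterns and the columns of `W` give each sample's level of those patterns. The label indices themselves are arbitrary, since NMF recovers the groups only up to permutation and scaling.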

Keywords:  1000 Genomes Project; Next generation sequencing; copy number variations; non-negative matrix factorization

Year: 2014    PMID: 25152046    PMCID: PMC4504183    DOI: 10.1142/S0219720014500218

Source DB: PubMed    Journal: J Bioinform Comput Biol    ISSN: 0219-7200    Impact factor: 1.122


