Anchored pseudo-de novo assembly of human genomes identifies extensive sequence variation from unmapped sequence reads.

Joshua J Faber-Hammond, Kim H Brown.

Abstract

The completion of the human genome reference (HGR) marked the beginning of the genomics era, yet despite its utility, its universal application is limited by the small number of individuals used in its development. This limitation is highlighted by high-quality sequence reads that fail to map to the HGR. Such reads generally represent 2-5% of total reads and may harbor regions that would enhance our understanding of population variation, evolution, and disease. Alternatively, complete de novo assemblies can be created, but these effectively ignore the groundwork of the HGR. To find a middle ground, we developed a bioinformatic pipeline that maps paired-end reads to the HGR as separate single reads, exports the unmappable reads, assembles these reads de novo for each individual, and then combines the per-individual assemblies into a secondary reference assembly used for comparative analysis. Using 45 diverse individuals from the 1000 Genomes Project, we identified 351,361 contigs covering 195.5 Mb of sequence not incorporated in GRCh38. Of these, 30,879 contigs are represented in multiple individuals, with ~40% showing high sequence complexity. Genomic coordinates were generated for 99.9%, with 52.5% exhibiting high-quality mapping scores. Comparative genomic analyses with archaic humans and primates revealed significant sequence alignments, and comparisons with model organism RefSeq gene datasets identified novel human genes. If incorporated, these sequences will expand the HGR. More importantly, our data show that with this method, low-coverage (~10-20×) next-generation sequencing can still be used to identify novel unmapped sequences and to explore biological functions contributing to human phenotypic variation, disease, and functionality for personal genomic medicine.
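
The "export unmappable reads" step described above can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes only that the mapper writes standard SAM records, in which bit 0x4 of the FLAG field marks an unmapped read. The function names (unmapped_reads, to_fastq, unmapped_fraction) and the toy records are hypothetical, chosen for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"

Record = Tuple[str, str, str]  # (read name, sequence, base qualities)


def _records(sam_lines: Iterable[str]) -> Iterator[Tuple[int, Record]]:
    """Yield (flag, (name, seq, qual)) for each alignment line, skipping headers."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or SAM header line
        fields = line.rstrip("\n").split("\t")
        # Mandatory SAM columns: QNAME=0, FLAG=1, SEQ=9, QUAL=10
        yield int(fields[1]), (fields[0], fields[9], fields[10])


def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Record]:
    """Yield only the reads whose FLAG has the unmapped bit set."""
    for flag, record in _records(sam_lines):
        if flag & UNMAPPED_FLAG:
            yield record


def to_fastq(records: Iterable[Record]) -> str:
    """Serialize reads as FASTQ text, ready to hand to a de novo assembler."""
    return "".join(f"@{name}\n{seq}\n+\n{qual}\n" for name, seq, qual in records)


def unmapped_fraction(sam_lines: Iterable[str]) -> float:
    """Fraction of reads that failed to map (the abstract cites roughly 2-5%)."""
    flags: List[int] = [flag for flag, _ in _records(sam_lines)]
    if not flags:
        return 0.0
    return sum(1 for f in flags if f & UNMAPPED_FLAG) / len(flags)


# Toy input (hypothetical): one header line, one mapped read, one unmapped read.
toy_sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t0\tchr1\t100\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGACCAA\tIIIIIIII",
]

if __name__ == "__main__":
    print(to_fastq(unmapped_reads(toy_sam)), end="")
    print(unmapped_fraction(toy_sam))
```

Because the abstract describes mapping each mate of a pair as a separate single read, each read's mapped or unmapped status is judged independently here; pair-level flags such as "mate unmapped" are deliberately not consulted.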

Year:  2016        PMID: 27061184      PMCID: PMC4899208          DOI: 10.1007/s00439-016-1667-5

Source DB:  PubMed          Journal:  Hum Genet        ISSN: 0340-6717            Impact factor:   4.132


  5 in total

Review 1.  Pan-genomics in the human genome era.

Authors:  Rachel M Sherman; Steven L Salzberg
Journal:  Nat Rev Genet       Date:  2020-02-07       Impact factor: 53.242

2.  Assembly and Analysis of Unmapped Genome Sequence Reads Reveal Novel Sequence and Variation in Dogs.

Authors:  Lindsay A Holden; Meharji Arumilli; Marjo K Hytönen; Sruthi Hundi; Jarkko Salojärvi; Kim H Brown; Hannes Lohi
Journal:  Sci Rep       Date:  2018-07-18       Impact factor: 4.379

3.  HUPAN: a pan-genome analysis pipeline for human genomes.

Authors:  Zhongqu Duan; Yuyang Qiao; Jinyuan Lu; Huimin Lu; Wenmin Zhang; Fazhe Yan; Chen Sun; Zhiqiang Hu; Zhen Zhang; Guichao Li; Hongzhuan Chen; Zhen Xiang; Zhenggang Zhu; Hongyu Zhao; Yingyan Yu; Chaochun Wei
Journal:  Genome Biol       Date:  2019-07-31       Impact factor: 13.583

4.  The Genomic Scrapheap Challenge; Extracting Relevant Data from Unmapped Whole Genome Sequencing Reads, Including Strain Specific Genomic Segments, in Rats.

Authors:  Robin H van der Weide; Marieke Simonis; Roel Hermsen; Pim Toonen; Edwin Cuppen; Joep de Ligt
Journal:  PLoS One       Date:  2016-08-08       Impact factor: 3.240

5.  Population-scale detection of non-reference sequence variants using colored de Bruijn Graphs.

Authors:  Thomas Krannich; W Timothy J White; Sebastian Niehus; Guillaume Holley; Bjarni V Halldórsson; Birte Kehr
Journal:  Bioinformatics       Date:  2021-11-02       Impact factor: 6.937
