Goran Rakocevic, Vladimir Semenyuk, Wan-Ping Lee, James Spencer, John Browning, Ivan J Johnson, Vladan Arsenijevic, Jelena Nadj, Kaushik Ghose, Maria C Suciu, Sun-Gou Ji, Gülfem Demir, Lizao Li, Berke Ç Toptaş, Alexey Dolgoborodov, Björn Pollex, Iosif Spulber, Irina Glotova, Péter Kómár, Andrew L Stachyra, Yilong Li, Milos Popovic, Morten Källberg, Amit Jain, Deniz Kural.
Abstract
The human reference genome serves as the foundation for genomics by providing a scaffold for alignment of sequencing reads, but currently only reflects a single consensus haplotype, thus impairing analysis accuracy. Here we present a graph reference genome implementation that enables read alignment across 2,800 diploid genomes encompassing 12.6 million SNPs and 4.0 million insertions and deletions (indels). The pipeline processes one whole-genome sequencing sample in 6.5 h using a system with 36 CPU cores. We show that using a graph genome reference improves read mapping sensitivity and produces a 0.5% increase in variant calling recall, with unaffected specificity. Structural variations incorporated into a graph genome can be genotyped accurately under a unified framework. Finally, we show that iterative augmentation of graph genomes yields incremental gains in variant calling accuracy. Our implementation is an important advance toward fulfilling the promise of graph genomes to radically enhance the scalability and accuracy of genomic analyses.
Year: 2019 PMID: 30643257 DOI: 10.1038/s41588-018-0316-4
Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330