
Simulation of African and non-African low and high coverage whole genome sequence data to assess variant calling approaches.

Shatha Alosaimi, Noëlle van Biljon, Denis Awany, Prisca K Thami, Joel Defo, Jacquiline W Mugo, Christian D Bope, Gaston K Mazandu, Nicola J Mulder, Emile R Chimusa.

Abstract

Current variant calling (VC) approaches have been designed to leverage populations with long-range haplotypes and were benchmarked using populations of European descent, whereas most genetic diversity is found in non-European populations, such as those of Africa. When applied to these genetically diverse populations, VC tools may produce false positive and false negative results, which can lead to misleading conclusions in the prioritization of mutations and in assessing the clinical relevance and actionability of genes. The key question is which tool or pipeline achieves high sensitivity and precision when analysing African data with either low or high sequence coverage, given the high genetic diversity and heterogeneity of these data. Here, a total of 100 synthetic whole genome sequencing (WGS) samples, mimicking the genetic profiles of African and European subjects at different coverage levels (high/low), were generated to assess the performance of nine different VC tools on these contrasting datasets. The performance of these tools was assessed in terms of false positive and false negative call rates by comparing the simulated golden variants with the variants identified by each VC tool. Combining our results on sensitivity and positive predictive value (PPV), VarDict [PPV = 0.999 and Matthews correlation coefficient (MCC) = 0.832] and BCFtools (PPV = 0.999 and MCC = 0.813) perform best on African population data at both high and low coverage. Overall, current VC tools produce higher false positive and false negative rates when analysing African compared with European data. This highlights the need to develop VC approaches with high sensitivity and precision tailored for populations characterized by high genetic variation and low linkage disequilibrium.
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
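
The abstract describes scoring each VC tool by comparing its calls against the simulated golden variants and reporting sensitivity, PPV and MCC. The following is a minimal Python sketch of that kind of comparison, not the authors' pipeline. It assumes variants are represented as (chromosome, position, ref, alt) tuples, and it assumes a hypothetical `n_sites` parameter (the total number of evaluated sites) so that true negatives, which MCC requires, can be counted. The function names and the toy data are illustrative only.

```python
from math import sqrt


def confusion_counts(truth, called, n_sites):
    """Count TP, FP, FN, TN for one caller against the golden variant set.

    truth, called: sets of (chrom, pos, ref, alt) tuples.
    n_sites: assumed total number of evaluated sites (hypothetical parameter),
             needed to derive true negatives.
    """
    tp = len(truth & called)   # golden variants that were called
    fp = len(called - truth)   # calls absent from the golden set
    fn = len(truth - called)   # golden variants that were missed
    tn = n_sites - tp - fp - fn
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Return sensitivity, PPV and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "ppv": ppv, "mcc": mcc}


# Toy data (made up for illustration, not taken from the study).
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "G", "A"),
    ("chr2", 90, "T", "C"),
}
called = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "G", "A"),
    ("chr3", 7, "A", "C"),
}

counts = confusion_counts(truth, called, n_sites=1000)
result = metrics(*counts)
print(counts, result)
```

Because true negatives vastly outnumber variants in whole genome data, the MCC value in such a sketch depends on how `n_sites` is defined, so the choice of denominator should be stated alongside any reported score.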

Keywords:  DNA sequence; genomics; next-generation sequence; simulation; variant calling

Year: 2021    PMID: 33341897    PMCID: PMC8294538    DOI: 10.1093/bib/bbaa366

Source DB: PubMed    Journal: Brief Bioinform    ISSN: 1467-5463    Impact factor: 11.622


  3 in total

1.  The evaluation of Bcftools mpileup and GATK HaplotypeCaller for variant calling in non-human species.

Authors:  Messaoud Lefouili; Kiwoong Nam
Journal:  Sci Rep       Date:  2022-07-05       Impact factor: 4.996

2.  High-throughput estimation of allele frequencies using combined pooled-population sequencing and haplotype-based data processing.

Authors:  Michael Schneider; Asis Shrestha; Agim Ballvora; Jens Léon
Journal:  Plant Methods       Date:  2022-03-21       Impact factor: 4.993

3.  Human OMICs and Computational Biology Research in Africa: Current Challenges and Prospects.

Authors:  Yosr Hamdi; Lyndon Zass; Houcemeddine Othman; Fouzia Radouani; Imane Allali; Mariem Hanachi; Chiamaka Jessica Okeke; Melek Chaouch; Maureen Bilinga Tendwa; Chaimae Samtal; Reem Mohamed Sallam; Nihad Alsayed; Michael Turkson; Samah Ahmed; Alia Benkahla; Lilia Romdhane; Oussema Souiai; Özlem Tastan Bishop; Kais Ghedira; Faisal Mohamed Fadlelmola; Nicola Mulder; Samar Kamal Kassim
Journal:  OMICS       Date:  2021-04-01