A cautionary note on the use of unsupervised machine learning algorithms to characterise malaria parasite population structure from genetic distance matrices.

James A Watson, Aimee R Taylor, Elizabeth A Ashley, Arjen Dondorp, Caroline O Buckee, Nicholas J White, Chris C Holmes.

Abstract

Genetic surveillance of malaria parasites supports malaria control programmes, treatment guidelines and elimination strategies. Surveillance studies often pose questions about malaria parasite ancestry (e.g. how antimalarial resistance has spread) and employ statistical methods that characterise parasite population structure. Many of the methods used to characterise structure are unsupervised machine learning algorithms that depend on a genetic distance matrix, notably principal coordinates analysis (PCoA) and hierarchical agglomerative clustering (HAC). PCoA and HAC are sensitive to both the definition of genetic distance and algorithmic specification. Importantly, neither algorithm infers malaria parasite ancestry. As such, PCoA and HAC can inform (e.g. via exploratory data visualisation and hypothesis generation), but not answer comprehensively, key questions about malaria parasite ancestry. We illustrate the sensitivity of PCoA and HAC using 393 Plasmodium falciparum whole genome sequences collected from Cambodia and neighbouring regions (where antimalarial resistance has emerged and spread recently), and we provide tentative guidance for the use and interpretation of PCoA and HAC in malaria parasite genetic epidemiology. This guidance includes a call for fully transparent and reproducible analysis pipelines that feature (i) a clearly outlined scientific question; (ii) a clear justification of the analytical methods used to answer the scientific question, along with discussion of any inferential limitations; (iii) publicly available genetic distance matrices when downstream analyses depend on them; and (iv) sensitivity analyses. To bridge the inferential disconnect between the output of non-inferential unsupervised learning algorithms and the scientific questions of interest, tailor-made statistical models are needed to infer malaria parasite ancestry. In the absence of such models, speculative reasoning should feature only as discussion, not as results.
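The abstract's point that HAC output depends on algorithmic specification can be illustrated with a minimal sketch. It assumes a handful of hypothetical toy haplotypes (not data from the study) and a simple proportion-of-differing-sites distance. The helper names `hamming_proportion`, `distance_matrix` and `hac` are illustrative only. The naive implementation stands in for library routines such as SciPy's hierarchical clustering and is not the authors' pipeline.

```python
from itertools import combinations


def hamming_proportion(a: str, b: str) -> float:
    """Proportion of sites at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(haplotypes: list[str]) -> list[list[float]]:
    """Symmetric matrix of pairwise proportion-of-differences distances."""
    return [[hamming_proportion(a, b) for b in haplotypes] for a in haplotypes]


def hac(dist, k, linkage):
    """Naive agglomerative clustering on a precomputed distance matrix.

    Starts with one cluster per sample and repeatedly merges the pair of clusters
    with the smallest linkage distance until k clusters remain. `linkage` is
    applied to all member-to-member distances between two clusters: pass `min`
    for single linkage or `max` for complete linkage.
    """
    clusters = [frozenset([i]) for i in range(len(dist))]
    while len(clusters) > k:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: linkage(dist[i][j] for i in pair[0] for j in pair[1]),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy haplotypes (not data from the study): sample i carries the
# allele "1" at the first counts[i] of 60 sites, so distances are |k_i - k_j| / 60.
counts = [0, 10, 21, 33, 46, 60]
haplotypes = ["1" * k + "0" * (60 - k) for k in counts]
D = distance_matrix(haplotypes)

print(hac(D, 2, min))  # single linkage:   [[0, 1, 2, 3, 4], [5]]
print(hac(D, 2, max))  # complete linkage: [[0, 1, 2, 3], [4, 5]]
```

Both runs use the same distance matrix and ask for the same number of clusters. Only the linkage rule changes, yet the two partitions disagree. Neither partition, on its own, is an inference about ancestry.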

Year: 2020 | PMID: 33035220 | PMCID: PMC7577480 | DOI: 10.1371/journal.pgen.1009037

Source DB: PubMed | Journal: PLoS Genet | ISSN: 1553-7390 | Impact factor: 5.917

