Rapid diagnosis of hereditary haemolytic anaemias using automated rheoscopy and supervised machine learning.

Pedro L Moura1,2, Johannes G G Dobbe3, Geert J Streekstra3, Minke A E Rab4,5, Martijn Veldthuis6, Elisa Fermo7, Richard van Wijk4, Rob van Zwieten6,8, Paola Bianchi7, Ashley M Toye1,2,9, Timothy J Satchwell1,2,9.   

Year:  2020        PMID: 32627174      PMCID: PMC8221027          DOI: 10.1111/bjh.16868

Source DB:  PubMed          Journal:  Br J Haematol        ISSN: 0007-1048            Impact factor:   6.998


Haemolytic anaemias arise when red blood cell (RBC) integrity is compromised, resulting in premature clearance or lysis; anaemia follows when the bone marrow cannot produce new cells fast enough to compensate. Hereditary anaemia is caused by genetic mutation (e.g. affecting membrane complex or cytoskeletal proteins, haemoglobin or metabolic enzymes). Diagnosing affected patients is complex: given the wide variety of possible genetic causes, multiple examinations must be performed and an unambiguous result is usually reached only after DNA sequencing. Furthermore, phenotypic severity can vary widely not only among individuals with different mutations but also among individuals carrying the same mutation, further complicating diagnosis. Although molecular diagnoses have become easier, cheaper and faster to perform in recent years, constraints on their use remain, and phenotype‐based diagnostic methods still have an important role. Ektacytometry is a standard diagnostic platform for RBC disorders, but it provides only cell population‐based data and requires a trained expert for data interpretation. Single‐cell rheoscopy can provide additional information, at the cost of higher complexity; however, analysis of such data could potentially be facilitated by machine learning (ML; automated, algorithm‐based systems that generate data‐driven predictions). We present here a preliminary framework for automated rheoscopy‐based diagnosis of several types of hereditary haemolytic anaemia (Fig 1) that requires low sample volumes and is efficient, rapid and expandable.
Fig 1

Different hereditary rare anaemias display distinct area and deformability profiles. (A) Design of the method for automatic sample classification. Whole blood is collected by the clinician, and a sample is obtained and processed using an Automated Rheoscope and Cell Analyzer (ARCA). Images acquired are subjected to computational analysis to determine cross‐sectional area and deformability of at least 1000 individual cells, and the resulting datasets are then classified through trained computational models, achieving a diagnosis in less than 30 min. (B) Contour plots of cross‐sectional area plotted against the deformability index (as measured by dividing cell length by cell width), visualizing the probability distribution of erythrocytes (RBCs), cultured reticulocytes (reticulocytes) and erythrocytes treated with an anti‐Glycophorin A antibody (BRIC256, International Blood Group Reference Laboratory) before analysis to induce membrane stiffening (BRIC256 RBCs). The control erythrocyte and cultured reticulocyte data shown in this panel were previously reported in Moura et al.⁹ A minimum of 1000 cells were analysed per sample. All samples were analysed using the ARCA. (C) Contour plots of cross‐sectional area plotted against the deformability index (as measured by dividing cell length by cell width), visualizing the probability distribution of patient samples overlaid to allow for comparison with healthy controls. A minimum of 1000 cells were analysed per blood sample. All samples were analysed using the ARCA. The samples are listed from left to right: Top row: healthy controls (n = 6), hereditary spherocytosis patients (n = 13), congenital dyserythropoietic anaemia II patients (n = 9). Bottom row: pyruvate kinase deficiency patients (n = 6), dehydrated stomatocytosis type 1 or hereditary xerocytosis patients (n = 10), dehydrated stomatocytosis type 2 or Gardos xerocytosis patients (n = 3).


Materials and methods

Peripheral blood donor and patient samples

Healthy control donor and diagnosed patient samples were collected according to procedures approved by the research ethics committee and in accordance with the Declaration of Helsinki. In all, 47 blood samples were analysed at the University of Bristol (United Kingdom) following shipment from clinics in Milan (Italy) or Utrecht (the Netherlands) [6 controls, 13 hereditary spherocytosis (HS) patients, 9 congenital dyserythropoietic anaemia type II (CDAII) patients, 6 pyruvate kinase deficiency (PKD) patients, 10 hereditary xerocytosis/dehydrated hereditary stomatocytosis (DHS) 1 (HX) patients and 3 Gardos xerocytosis/DHS2 (GX) patients]. A further 26 samples [11 controls, 7 HS patients and 8 hereditary elliptocytosis (HE) patients] were analysed at Sanquin (Amsterdam, the Netherlands).

Automated Rheoscope and Cell Analyser

For each sample, 1 µl of whole blood was diluted in 200 µl of a polyvinylpyrrolidone solution (viscosity 28.1 mPa·s). Samples were assessed in an Automated Rheoscope and Cell Analyser (ARCA) according to published protocols. At least 1000 cells per sample were analysed, yielding the deformability index (DI) and cross‐sectional area (area) of each cell.

Computational analysis

A Python script was developed for statistical analysis, data visualisation and automatic dataset classification (see Data availability). The cells in each full dataset used for training were randomised and split into a testing dataset (500 cells) and a training dataset (the remainder). Deformability index (DI) and area were normalised by their maximum measurable values (a DI of 3.3 for the Bristol data and 5.0 for the Sanquin data, and an area of 140 μm²). The training datasets were then repeatedly subjected to random sampling to generate 10,000 subsets of 500 cells each, and the average and standard deviation of the DI and area were calculated for each subset. Each sample category was assigned a unique identifier. Classifiers were generated with the scikit‐learn package, trained with the generated subsets and tested with the initial testing subsets. Unseen datasets were classified by taking the mode of the machine‐selected identifiers after 10,000 classifications.
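
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' script: the data are synthetic, the helper names (`normalise`, `split_cells`, `summarise`, `resample_features`, `fake_sample`, `classify`) are hypothetical, a random forest stands in for whichever scikit‐learn classifier is chosen, sampling is assumed to be with replacement, the number of subsets is reduced from 10,000 to 200 to keep the run short, and the vote fraction is only one plausible way to express certainty.

```python
from collections import Counter

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

MAX_DI = 3.3       # maximum measurable DI quoted for the Bristol data (5.0 for Sanquin)
MAX_AREA = 140.0   # maximum measurable area in square micrometres
SUBSET_SIZE = 500  # cells per resampled subset
N_SUBSETS = 200    # the text uses 10,000; reduced here so the sketch runs quickly


def normalise(cells):
    """Scale the columns (DI, area) by their maximum measurable values."""
    return cells / np.array([MAX_DI, MAX_AREA])


def split_cells(cells, n_test=SUBSET_SIZE):
    """Shuffle the cells, hold out n_test for testing, keep the rest for training."""
    shuffled = rng.permutation(cells)
    return shuffled[:n_test], shuffled[n_test:]


def summarise(cells):
    """Feature vector: mean DI, mean area, SD of DI, SD of area."""
    return np.concatenate([cells.mean(axis=0), cells.std(axis=0)])


def resample_features(cells, n_subsets=N_SUBSETS):
    """Draw random subsets (assumed: with replacement) and summarise each one."""
    idx = rng.integers(0, len(cells), size=(n_subsets, SUBSET_SIZE))
    return np.array([summarise(cells[i]) for i in idx])


def fake_sample(mean_di, mean_area, n_cells=1500):
    """Synthetic stand-in for one ARCA measurement (NOT patient data)."""
    di = rng.normal(mean_di, 0.25, n_cells).clip(1.0, MAX_DI)
    area = rng.normal(mean_area, 8.0, n_cells).clip(1.0, MAX_AREA)
    return normalise(np.column_stack([di, area]))


# Hypothetical category profiles: identifier -> (mean DI, mean area).
profiles = {0: (2.2, 60.0), 1: (1.6, 50.0)}

X_train, y_train, held_out = [], [], {}
for label, (di, area) in profiles.items():
    test_cells, train_cells = split_cells(fake_sample(di, area))
    held_out[label] = test_cells
    feats = resample_features(train_cells)
    X_train.append(feats)
    y_train.append(np.full(len(feats), label))

# Arbitrary classifier choice for illustration only.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(np.vstack(X_train), np.concatenate(y_train))


def classify(cells, n_votes=N_SUBSETS):
    """Classify resampled subsets of one dataset; return the modal identifier
    and the fraction of votes it received."""
    votes = clf.predict(resample_features(cells, n_votes))
    label, count = Counter(votes.tolist()).most_common(1)[0]
    return label, count / n_votes


predictions = {label: classify(cells) for label, cells in held_out.items()}
print(predictions)
```

Summarising each subset by the mean and standard deviation of DI and area turns a variable-length list of cells into a fixed four-value feature vector, which is what allows standard scikit‐learn classifiers to be applied to whole samples.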

Results and discussion

We have demonstrated in previous work that automated rheoscopy‐based analyses can elucidate differences arising from reticulocyte maturation as well as loss of cellular stability. A particularly interesting observation from that work was that combining the single‐cell deformability index (DI) with cross‐sectional area measurements provides a novel metric (Fig 1B), which to date has not been examined in the context of disease diagnosis. We therefore evaluated whole blood samples from diagnosed anaemic patients of varied aetiologies (HS, CDAII, PKD, HX and GX) against healthy donors using the proposed methodology (Fig 1C). Crucially, although these diseases are frequently misdiagnosed because of overlapping clinical or morphological phenotypes, they displayed unique rheoscopy “fingerprints” upon visualisation. Machine‐learning algorithms were next explored to automate the classification of ARCA data and thus facilitate the processing of larger numbers of samples. A flow chart of the procedure used is displayed in Fig 2A.
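
The visualisation behind such fingerprints (equalising cell numbers across samples of one category, pooling them and estimating a two-dimensional probability density of DI against area) might be sketched as below. This is an illustration only: the data are synthetic, the function names `equalise_and_pool` and `density_grid` are hypothetical, the lower DI bound of 1.0 is an assumption, and SciPy's `gaussian_kde` is used as one possible kernel density estimator rather than the one the authors necessarily used.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)


def equalise_and_pool(samples):
    """Randomly downsample every sample to the smallest cell count, then stack."""
    n_min = min(len(s) for s in samples)
    picked = [s[rng.choice(len(s), size=n_min, replace=False)] for s in samples]
    return np.vstack(picked)


def density_grid(cells, di_range=(1.0, 3.3), area_range=(0.0, 140.0), n=50):
    """Evaluate a 2D Gaussian KDE of (DI, area) on a regular n-by-n grid."""
    kde = gaussian_kde(cells.T)  # gaussian_kde expects shape (n_dims, n_points)
    di = np.linspace(*di_range, n)
    area = np.linspace(*area_range, n)
    di_grid, area_grid = np.meshgrid(di, area)
    density = kde(np.vstack([di_grid.ravel(), area_grid.ravel()]))
    return di_grid, area_grid, density.reshape(di_grid.shape)


# Synthetic stand-ins for three samples of one category (NOT patient data).
samples = [
    np.column_stack([rng.normal(2.1, 0.3, n_cells), rng.normal(60.0, 8.0, n_cells)])
    for n_cells in (1000, 1200, 1500)
]
pooled = equalise_and_pool(samples)
di_grid, area_grid, density = density_grid(pooled)
```

The three returned arrays can be passed directly to a contour-plotting routine such as matplotlib's `contour(di_grid, area_grid, density)`. Equalising cell numbers before pooling keeps a sample with many cells from dominating the estimated density.
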
Fig 2

Machine‐learning‐based classification of automated rheoscopy datasets provides accurate diagnoses for unseen samples. (A) Flow diagram outlining the procedure for ARCA‐based data visualisation and automated sample classification. The sample is first analysed to produce a raw data table. These data are then reorganised into a Python pandas (“panel data”) data frame for ease of processing. If visualisation is required, samples from a given sample type are stochastically equalised in cell number, joined and subjected to kernel density estimation to estimate the probability density functions of analysed features (e.g. cross‐sectional area, deformability index, cell angle) and then visualized through contour plots or scatter plots. Data to be used for machine learning undergo feature extraction (removal of all non‐essential information) and a subsection is sampled randomly (without replacement) for creation of a testing set. The remaining data then undergo augmentation by generation of a series of randomly sampled datasets (with replacement, 10,000×) which will be used for training a supervised machine‐learning algorithm. After training, a predictive model (i.e. classifier) is generated which first is tested with the previously generated testing set. Upon satisfactory results with the testing set, the classifier can then generate predictions for new unseen data. The final results consist of a sample label (or classification) and the certainty of that classification. (B) Comparison of the overall prediction accuracy of multiple supervised machine‐learning algorithms in ARCA‐based automated sample diagnosis as a function of the number of datasets per condition used for classifier training (from no datasets used, which should result in a random diagnosis, to a maximum of six datasets), comparing the samples analysed at the University of Bristol (except Gardos xerocytosis samples, which were too few to analyse).
Prediction accuracy is coloured on a percentage scale from red (0%) to blue (100%). The best‐performing algorithm per no. of datasets is bolded in the accuracy matrix. The graph displays the average prediction accuracy of all algorithms (blue). Error bars = ± standard deviation (SD). The prediction accuracies of the best‐performing algorithms are plotted in green, while the prediction accuracies of the worst‐performing algorithms are plotted in red. (C) Prediction accuracy of the best performing algorithm in (B). The samples used consist of healthy controls, congenital dyserythropoietic anaemia II patients (CDAII), hereditary spherocytosis patients (HS), hereditary xerocytosis patients (HX) and pyruvate kinase deficiency patients (PKD). Rows identify real samples provided, whilst columns identify the algorithm's prediction of the provided samples' identity. The blue diagonal indicates samples that were correctly diagnosed (true positives). Red cells in the surrounding matrix indicate incorrect diagnoses (i.e. two HS samples were misdiagnosed as CDAII and one HX sample was misdiagnosed as HS). Accuracy is provided as a percentage of the true positives within the total number of samples and is coloured on a percentage scale from red (0%) to blue (100%). Average accuracy is provided as an average of the accuracies for all sample types. Data for all other algorithms and sample numbers tested are provided in Figs S1–S7. (D) Comparison of the overall prediction accuracy of multiple supervised machine‐learning algorithms in ARCA‐based automated sample diagnosis as a function of the number of datasets used for training, comparing samples from healthy controls, hereditary spherocytosis patients and hereditary elliptocytosis patients analysed at Sanquin. The graph displays the average prediction accuracy of all algorithms (blue). Error bars = ±SD. 
The prediction accuracies of the best‐performing algorithms are plotted in green, while the prediction accuracies of the worst‐performing algorithms are plotted in red.
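
The two accuracy figures described for panel (C), per-type accuracy along the diagonal and the average across sample types, can be computed from a confusion matrix as in the sketch below. The counts are hypothetical and do not reproduce Fig 2C; only the pattern of errors (two HS samples predicted as CDAII, one HX sample predicted as HS) follows the caption. The overall accuracy is added for comparison and is not named in the caption.

```python
import numpy as np

labels = ["Control", "CDAII", "HS", "HX", "PKD"]

# Rows: true diagnosis; columns: predicted diagnosis.
# Counts are illustrative placeholders, NOT the values shown in Fig 2C.
confusion = np.array([
    [4, 0, 0, 0, 0],  # Control
    [0, 6, 0, 0, 0],  # CDAII
    [0, 2, 8, 0, 0],  # HS: two samples predicted as CDAII
    [0, 0, 1, 6, 0],  # HX: one sample predicted as HS
    [0, 0, 0, 0, 4],  # PKD
])

# Per-type accuracy: true positives divided by the number of samples of that type.
per_class = 100 * np.diag(confusion) / confusion.sum(axis=1)

# Average accuracy: unweighted mean of the per-type accuracies.
average_accuracy = per_class.mean()

# Overall accuracy: all correct predictions divided by all samples.
overall_accuracy = 100 * np.trace(confusion) / confusion.sum()

for name, acc in zip(labels, per_class):
    print(f"{name}: {acc:.1f}%")
print(f"average: {average_accuracy:.1f}%, overall: {overall_accuracy:.1f}%")
```

The average and overall values differ whenever sample types have unequal numbers of samples, so it matters which one a reported percentage refers to.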

To provide sufficient information for training an ML classifier, the data were augmented through random sampling, vastly extending the number of datasets with similar characteristics. We then tested the trained classifiers on a combination of fully unseen data and the testing sets generated before augmentation. A full summary of the prediction accuracies achieved (listing the best performing classifiers) is provided in Fig 2B; the best performing algorithm correctly identified sample datasets with 92% accuracy (Fig 2C). The GX samples were excluded because there were too few for classifier training. For further verification, the classifiers were retrained on additional samples (11 controls, 7 HS patients and 8 HE patients) obtained on a second ARCA device in an independent laboratory using different acquisition settings. Again, classification accuracy increased up to the use of six training datasets (at which point the classifier likely overfits these data; Fig 2D), and the final prediction accuracy for multiclass classification was comparable to that offered by osmotic gradient ektacytometry when classifying HS samples alone. Importantly, the best performing algorithms used here completely differentiate controls from diseased patients and accurately identify a variety of disorders, potentially allowing for the rapid preliminary identification or discrimination of more elusive diseases (such as CDAII and PKD) without time‐consuming laboratory assays or molecular testing methods.
Furthermore, the possibility of continuously incorporating data from new samples, or of expanding to haematological conditions beyond those characterised in this study, may ultimately allow a large number of samples to be diagnosed in a relatively short period using minimal sample volumes. In conclusion, the method described in this work represents an exciting step towards improved diagnosis of haemolytic anaemias.

Funding information

PLM was funded by the European Union (H2020‐MSCAITN‐2015, “RELEVANCE”, Grant agreement number 675117). MAER is supported by Eurostars grant estar18105 and by an unrestricted grant provided by RR Mechatronics. PB was funded by the Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico, Grant no. 2019 175/02, 2019. AMT and TJS were funded by an NHS Blood and Transplant (NHSBT) R&D grant (WP15‐05) and the National Institute for Health Research Blood and Transplant Research Unit (NIHR BTRU) in Red Cell Products (IS‐BTU‐1214‐10032). The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care.

Author contributions

PLM acquired data, prepared figures and developed the Python code for dataset analysis and classification. JGGD and GJS provided essential ARCA equipment and image analysis software. MAER and RvW diagnosed HX patients and provided blood samples. MV and RvZ diagnosed HS and HE patients and performed initial ARCA analysis. EF and PB diagnosed HS, CDAII, HX and PKD patients and provided blood samples. PLM, AMT and TJS conceived and designed experiments and wrote the manuscript. TJS and AMT contributed equally to conception and supervision of the work. All authors read and edited the manuscript.

Conflicts of interest

The authors declare no competing financial interests.

References

1.  Novel mechanisms of PIEZO1 dysfunction in hereditary xerocytosis.

Authors:  Edyta Glogowska; Eve R Schneider; Yelena Maksimova; Vincent P Schulz; Kimberly Lezon-Geyda; John Wu; Kottayam Radhakrishnan; Siobán B Keel; Donald Mahoney; Alison M Freidmann; Rachel A Altura; Elena O Gracheva; Sviatoslav N Bagriantsev; Theodosia A Kalfa; Patrick G Gallagher
Journal:  Blood       Date:  2017-07-17       Impact factor: 22.113

Review 2.  Machine learning: applications of artificial intelligence to imaging and diagnosis.

Authors:  James A Nichols; Hsien W Herbert Chan; Matthew A B Baker
Journal:  Biophys Rev       Date:  2018-09-04

3.  Osmotic gradient ektacytometry: A valuable screening test for hereditary spherocytosis and other red blood cell membrane disorders.

Authors:  E Llaudet-Planas; J L Vives-Corrons; V Rizzuto; P Gómez-Ramírez; J Sevilla Navarro; M T Coll Sibina; M García-Bernal; A Ruiz Llobet; I Badell; P Velasco-Puyó; J L Dapena; M M Mañú-Pereira
Journal:  Int J Lab Hematol       Date:  2017-10-10       Impact factor: 2.877

Review 4.  Clinical presentation and management of hemolytic anemias.

Authors:  Kalust Ucar
Journal:  Oncology (Williston Park)       Date:  2002-09       Impact factor: 2.990

5.  Diagnostic tool for red blood cell membrane disorders: Assessment of a new generation ektacytometer.

Authors:  Lydie Da Costa; Ludovic Suner; Julie Galimand; Amandine Bonnel; Tiffany Pascreau; Nathalie Couque; Odile Fenneteau; Narla Mohandas
Journal:  Blood Cells Mol Dis       Date:  2015-09-16       Impact factor: 3.039

Review 6.  Rare Hereditary Hemolytic Anemias: Diagnostic Approach and Considerations in Management.

Authors:  Mary Risinger; Myesa Emberesh; Theodosia A Kalfa
Journal:  Hematol Oncol Clin North Am       Date:  2019-03-29       Impact factor: 3.722

7.  PIEZO1 gain-of-function mutations delay reticulocyte maturation in hereditary xerocytosis.

Authors:  Pedro L Moura; Bethan R Hawley; Johannes G G Dobbe; Geert J Streekstra; Minke A E Rab; Paola Bianchi; Richard van Wijk; Ashley M Toye; Timothy J Satchwell
Journal:  Haematologica       Date:  2019-10-17       Impact factor: 9.941

8.  Flow-cytometric analysis of erythrocytes and reticulocytes in congenital dyserythropoietic anaemia type II (CDA II): value in differential diagnosis with hereditary spherocytosis.

Authors:  P Danise; G Amendola; B Nobili; S Perrotta; E Miraglia Del Giudice; S M Matarese; A Iolascon; C Brugnara
Journal:  Clin Lab Haematol       Date:  2001-02

Review 9.  Diagnostic approaches for inherited hemolytic anemia in the genetic era.

Authors:  Yonggoo Kim; Joonhong Park; Myungshin Kim
Journal:  Blood Res       Date:  2017-06-22

10.  Next-generation sequencing approach for the diagnosis of human diseases: open challenges and new opportunities.

Authors:  Chiara Di Resta; Silvia Galbiati; Paola Carrera; Maurizio Ferrari
Journal:  EJIFCC       Date:  2018-04-30