
A mobile-optimized artificial intelligence system for gestational age and fetal malpresentation assessment.

Ryan G Gomes¹, Bellington Vwalika²,³, Chace Lee¹, Angelica Willis¹, Marcin Sieniek¹, Joan T Price³,⁴, Christina Chen¹, Margaret P Kasaro²,⁴, James A Taylor¹, Elizabeth M Stringer³, Scott Mayer McKinney¹, Ntazana Sindano⁴, George E Dahl⁵, William Goodnight², Justin Gilmer⁵, Benjamin H Chi³,⁴, Charles Lau¹, Terry Spitz¹, T Saensuksopa¹, Kris Liu¹, Tiya Tiyasirichokchai¹, Jonny Wong¹, Rory Pilgrim¹, Akib Uddin¹, Greg Corrado¹, Lily Peng¹, Katherine Chou¹, Daniel Tse¹, Jeffrey S A Stringer³,⁴, Shravya Shetty¹.

Abstract

Background: Fetal ultrasound is an important component of antenatal care, but shortage of adequately trained healthcare workers has limited its adoption in low-to-middle-income countries. This study investigated the use of artificial intelligence for fetal ultrasound in under-resourced settings.
Methods: Blind sweep ultrasounds, consisting of six freehand ultrasound sweeps, were collected by sonographers in the USA and Zambia, and novice operators in Zambia. We developed artificial intelligence (AI) models that used blind sweeps to predict gestational age (GA) and fetal malpresentation. AI GA estimates and standard fetal biometry estimates were compared to a previously established ground truth, and evaluated for difference in absolute error. Fetal malpresentation (non-cephalic vs cephalic) was compared to sonographer assessment. On-device AI model run-times were benchmarked on Android mobile phones.
Results: Here we show that GA estimation accuracy of the AI model is non-inferior to standard fetal biometry estimates (error difference -1.4 ± 4.5 days, 95% CI -1.8, -0.9, n = 406). Non-inferiority is maintained when blind sweeps are acquired by novice operators performing only two of six sweep motion types. Fetal malpresentation AUC-ROC is 0.977 (95% CI, 0.949, 1.00, n = 613), and sonographers and novices have similar AUC-ROC. Software run-times on mobile phones for both diagnostic models are less than 3 s after completion of a sweep.
Conclusions: The gestational age model is non-inferior to the clinical standard and the fetal malpresentation model has high AUC-ROCs across operators and devices. Our AI models are able to run on-device, without internet connectivity, and provide feedback scores to assist in improving the capabilities of lightly trained ultrasound operators in low-resource settings.


Keywords: Health care; Medical research

Year: 2022 PMID: 36249461 PMCID: PMC9553916 DOI: 10.1038/s43856-022-00194-5

Source DB: PubMed Journal: Commun Med (Lond) ISSN: 2730-664X

Introduction

Despite considerable progress in maternal healthcare in recent decades, maternal and perinatal deaths remain high with 295,000 maternal deaths during and following pregnancy and 2.4 million neonatal deaths each year. The majority of these deaths occur in low-to-middle-income countries (LMICs)[1-3]. The lack of antenatal care and limited access to facilities that can provide lifesaving treatment for the mother, fetus and newborn contribute to inequities in quality of care and outcomes in these regions[4,5]. Obstetric ultrasound is an important component of quality antenatal care. The WHO recommends one routine early ultrasound scan for all pregnant women, but up to 50% of women in developing countries receive no ultrasound screening during pregnancy[6]. Fetal ultrasounds can be used to estimate gestational age (GA), which is critical in scheduling and planning for screening tests throughout pregnancy and interventions for pregnancy complications such as preeclampsia and preterm labor. Fetal ultrasounds later in pregnancy can also be used to diagnose fetal malpresentation, which affects up to 3–4% of pregnancies at term and is associated with trauma-related injury during birth, perinatal mortality, and maternal morbidity[7-11]. Though ultrasound devices have traditionally been costly, the recent commercial availability of low-cost, battery-powered handheld devices could greatly expand access[12-14]. However, current ultrasound training programs require months of supervised evaluation as well as indefinite continuing education visits for quality assurance[13-19]. GA estimation and diagnosis of fetal malpresentation require expert interpretation of anatomical imagery during the ultrasound acquisition process. GA estimation via clinical standard biometry[20] requires expertly locating fetal anatomical structures and manually measuring their physical sizes in precisely collected images (head circumference, abdominal circumference, femur length, among others). 
To address these barriers, prior studies have introduced a protocol where fetal ultrasounds can be acquired by minimally trained operators via a “blind sweep” protocol, consisting of six predefined freehand sweeps over the abdomen[21-27]. While blind-sweep protocols simplify the ultrasound acquisition process, new methods are required for interpreting the resulting imagery. AI-based interpretation may provide a promising direction for generating automated clinical estimates from blind-sweep video sequences. In this study, we used two prospectively collected fetal ultrasound datasets to estimate gestational age and fetal malpresentation while demonstrating key considerations for use by novice users in LMICs: (a) validating that it is possible to build blind-sweep GA and fetal malpresentation models that run in real-time on mobile devices; (b) evaluating generalization of these models to minimally trained ultrasound operators and low-cost ultrasound devices; (c) describing a modified 2-sweep blind-sweep protocol to simplify novice acquisition; (d) adding feedback scores to provide real-time information on sweep quality.

Methods

Blind-sweep procedure

Blind-sweep ultrasounds consisted of a fixed number of predefined freehand ultrasound sweeps over the gravid abdomen. Certified sonographers completed up to 15 sweeps. Novice operators (“novices”), with 8 h of blind-sweep ultrasound acquisition training, completed six sweeps. Evaluation of both sonographers and novices was limited to a set of six sweeps—three vertical and three horizontal sweeps (Fig. 1b).

Fig. 1

Development of an artificial intelligence system to acquire and interpret blind-sweep ultrasound for antenatal diagnostics.

a Datasets were curated from sites in Zambia and the USA and include ultrasound acquired by sonographers and midwives. Ground truth for gestational age was derived from the initial exam as part of clinical practice. An artificial intelligence (AI) system was trained to identify gestational age and fetal malpresentation and was evaluated by comparing the accuracy of AI predictions with the accuracy of clinical standard procedures. The AI system was developed using only sonographer blind-sweep data, and its generalization to novice users was tested on midwife data. Design of the AI system considered suitability for deployment in low-to-middle-income countries in three ways: first, the system interpreted ultrasound from low-cost portable ultrasound devices; second, near real-time interpretation is available offline on mobile phone devices; and finally, the AI system produces feedback scores that can be used to provide feedback to users. b Blind-sweep ultrasound acquisition procedure. The procedure can be performed by novices with a few hours of ultrasound training. While the complete protocol involves six sweeps, a set of two sweeps (M and R) was found to be sufficient for maintaining the accuracy of gestational age estimation.


Fetal age machine learning initiative (FAMLI) and novice user study datasets

Data were analyzed from the Fetal Age Machine Learning Initiative cohort, which collected ultrasound data from study sites at Chapel Hill, NC (USA), and the Novice User Study collected from Lusaka, Zambia (Fig. 1a)[27]. The goal of this prospectively collected dataset was to enable the development of technology to estimate gestational age[28]. Data collection occurred between September 2018 and June 2021. All study participants provided written informed consent, and the research was approved by the UNC institutional review board (IRB #18-1848) and the biomedical research ethics committee at the University of Zambia. Blind-sweep data were collected with standard ultrasound devices (SonoSite M-Turbo or GE Voluson) as well as a low-cost portable ultrasound device (ButterflyIQ). Studies included standard clinical assessments of GA[20] and fetal malpresentation performed by a trained sonographer using a standard ultrasound device.

Algorithm development

We developed two deep learning neural network models to predict GA and fetal malpresentation. Our models generated diagnostic predictions directly from ultrasound video: sequences of image pixel values were the input and an estimate of the clinical quantity of interest was the output. The GA model produced an estimate of age, measured in days, for each blind-sweep video sequence. The GA model additionally provided an estimate of its confidence in the estimate for a given video sequence. No intermediate fetal biometric measurements were required during training or generated during inference. The fetal malpresentation model predicted a probability score between 0.0 and 1.0 for whether the fetus is in noncephalic presentation. See Supplementary Materials for a technical discussion and details regarding model development. In the USA, the ground truth GA was determined for each participant based on the “best obstetric estimate,” as part of routine clinical care, using procedures recommended by the American College of Obstetricians and Gynecologists (ACOG)[29]. The best obstetric estimate combines information from the last menstrual period (LMP), GA derived from assisted reproductive technology (if applicable), and fetal ultrasound anatomic measurements. In Zambia, only the first fetal ultrasound was used to determine the ground truth GA as the LMP in this setting was considered less reliable as patients often presented for care later in pregnancy. The GA model was trained on sonographer-acquired blind sweeps (up to 15 sweeps per patient) as well as sonographer-acquired “fly-to” videos that capture five to ten seconds before the sonographer has acquired standard fetal biometry images. The fetal malpresentation model was only trained on blind sweeps. For each training set case, fetal malpresentation was specified as one of four possible values by a sonographer (cephalic, breech, transverse, oblique), and dichotomized to “cephalic” vs “noncephalic”. 
This dichotomization is clinically justified since cephalic cases are considered normal while all noncephalic cases require further medical attention. Our analysis cohort included all pregnant women in the FAMLI and Novice User Study datasets who had the necessary ground truth information for gestational age and fetal presentation from September 2018 to January 2021. Study participants were assigned at random to one of three dataset splits: train, tune, or test. We used the following proportions: 60% train/20% tune/20% test for study participants who did not receive novice sweeps, and 10% tune/90% test for participants who received novice sweeps. The tuning set was used for optimizing machine learning training hyperparameters and selecting a classification threshold probability for the fetal malpresentation model. This threshold was chosen to yield equal noncephalic specificity and sensitivity on the tuning set, blinded to the test sets. None of the blind-sweep data collected by the novices were used for training. Cases consisted of multiple blind-sweep videos, and our models generated predictions independently for each video sequence within the case. For the GA model, each blind sweep was divided into multiple video sequences. For the fetal malpresentation model, video sequences corresponded to a single complete blind sweep. We then aggregated the predictions to generate a single case-level estimate for either GA or fetal malpresentation (described further in the Mobile Device Inference section in supplementary materials).
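The threshold selection rule described above (equal noncephalic sensitivity and specificity on the tuning set) can be illustrated with a minimal sketch. This is not the authors' code: the function names `sensitivity_specificity` and `balanced_threshold` are hypothetical, the convention that a score at or above the threshold counts as noncephalic is an assumption, and with a finite tuning set the two rates can only be made as close as possible rather than exactly equal.

```python
from typing import List, Tuple


def sensitivity_specificity(
    scores: List[float], labels: List[int], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity for the noncephalic class (label 1).

    Assumption: a score >= threshold is treated as a noncephalic prediction.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_threshold(scores: List[float], labels: List[int]) -> float:
    """Pick the candidate threshold that minimizes |sensitivity - specificity|.

    Candidates are the distinct tuning-set scores; ties keep the lowest one.
    """
    best_t, best_gap = None, float("inf")
    for t in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, t)
        gap = abs(sens - spec)
        if gap < best_gap:
            best_t, best_gap = t, gap
    return best_t


# Toy tuning set (invented for illustration, not study data).
tune_scores = [0.05, 0.10, 0.20, 0.35, 0.60, 0.40, 0.70, 0.90]
tune_labels = [0, 0, 0, 0, 0, 1, 1, 1]
threshold = balanced_threshold(tune_scores, tune_labels)
```

Under this scheme the threshold is fixed once on the tuning set and then applied unchanged to the test sets, which is what keeps the selection blinded to test data.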

Evaluation

The evaluation was performed on the FAMLI (sonographer-acquired) and Novice User Study (novice-acquired) datasets. Test sets consisted of patients independent of those used for AI development (Fig. 1a). For our GA model evaluation, the primary FAMLI test set comprised 407 women in 657 study visits in the USA. A second test set, the “Novice User Study,” included 114 participants in 140 study visits in Zambia. Novice blind-sweep studies were exclusively performed at Zambian sites. Sweeps collected with standard ultrasound devices were available for 406 of 407 participants in the sonographer-acquired test set, and 112 of 114 participants in the novice-acquired test set. Sweeps collected with the low-cost device were available for 104 of 407 participants in the sonographer-acquired test set, and 56 of 114 participants in the novice-acquired test set. Analyzable data from the low-cost device became available later during the study, and this group of patients is representative of the full patient set. We randomly selected one study visit per patient for each analysis group to avoid combining correlated measurements from the same patient. For our fetal malpresentation model, the test set included 613 patients from the sonographer-acquired and novice-acquired datasets, resulting in 65 instances of noncephalic presentation (10.6%). For each patient, the last study visit of the third trimester was included. Of note, there are more patients in the malpresentation model test set since the ground truth is not dependent on a prior visit. The disposition of study participants is summarized in STARD diagrams (Supplementary Fig. 1) and Supplementary Table 1.

Table 1

Gestational age estimation.

	Sweeps collected by sonographers		Sweeps collected by novices
	Standard ultrasound device	Low-cost handheld device	Standard ultrasound device	Low-cost handheld device
Number	406	104	112	56
Blind-sweep MAE ± sd (days)	3.8 ± 3.6	3.3 ± 2.8	4.4 ± 3.5	5.0 ± 4.0
Standard fetal biometry estimates MAE ± sd (days)	5.2 ± 4.6	3.8 ± 3.6	4.8 ± 3.7	4.7 ± 4.0
Blind sweep—standard fetal biometry mean difference ± sd (days)	−1.4 ± 4.5	−0.6 ± 3.8	−0.4 ± 4.8	0.4 ± 5.1
MAE difference 95% CI (days)	−1.8, −0.9	−1.3, 0.1	−1.3, 0.5	−1.0, 1.7
Blind sweep ME ± sd (days)	−0.9 ± 5.3	0.4 ± 4.4	−1.5 ± 5.5	−3.8 ± 5.4
Standard fetal biometry estimates ME ± sd (days)	−1.4 ± 7.0	−0.25 ± 5.4	−2.6 ± 5.3	−3.4 ± 5.2
Reduced blind-sweep protocol MAE ± sd (days)	4.0 ± 3.7	3.5 ± 3.0	4.5 ± 3.5	5.1 ± 4.2

Mean absolute error (MAE) and mean error (ME) between gestational age (GA) estimated using the blind-sweep procedure and ground truth, and the MAE and ME between the GA estimated using the standard fetal biometry ultrasound procedure and ground truth. One visit by each participant eligible for each subgroup was selected at random. The reduced blind-sweep protocol (last row) included only two blind sweeps. All other blind-sweep results used a set of six blind sweeps per patient visit. All fetal biometry GA estimates were collected by expert sonographers using standard ultrasound devices.
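The quantities in Table 1 can be sketched in a few lines. The following is an illustrative sketch, not the study's analysis code: the function names are hypothetical, the sign convention for ME (estimate minus ground truth) is an assumption, and the normal-approximation confidence interval is also an assumption, since the text here does not state which interval method was used.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple


def mae(estimates: List[float], truth: List[float]) -> float:
    """Mean absolute error in days."""
    return mean(abs(e - t) for e, t in zip(estimates, truth))


def me(estimates: List[float], truth: List[float]) -> float:
    """Mean (signed) error in days; assumed convention: estimate - truth."""
    return mean(e - t for e, t in zip(estimates, truth))


def paired_abs_error_difference(
    ai: List[float], biometry: List[float], truth: List[float]
) -> Tuple[float, float, int]:
    """Per-participant |AI error| - |biometry error|: mean, sample sd, n.

    Negative values favor the blind-sweep AI estimate.
    """
    diffs = [abs(a - t) - abs(b - t) for a, b, t in zip(ai, biometry, truth)]
    return mean(diffs), stdev(diffs), len(diffs)


def normal_ci(mean_diff: float, sd: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation 95% CI for a mean (assumed method)."""
    half_width = z * sd / math.sqrt(n)
    return mean_diff - half_width, mean_diff + half_width


# Plugging in the rounded summary statistics from the first column of Table 1
# (mean difference -1.4 days, sd 4.5 days, n = 406).
lower, upper = normal_ci(-1.4, 4.5, 406)
```

With these rounded inputs the interval comes out at roughly −1.84 to −0.96 days, close to the reported −1.8, −0.9; the small gap is what one would expect from rounding of the inputs or from a different interval method.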

Table 2

Fetal malpresentation estimation.

Subset	Number of participants	Number of malpresentations	AUC-ROC (95% CI)	Sensitivity (95% CI)	Specificity (95% CI)
All	613	65	0.977 (0.949, 1.0)	0.938 (0.848, 0.983)	0.973 (0.955, 0.985)
Low-cost device only	213	29	0.970 (0.944, 0.997)	0.931 (0.772, 0.992)	0.940 (0.896, 0.970)
Standard device only	598	65	0.980 (0.953, 1.000)	0.954 (0.871, 0.990)	0.977 (0.961, 0.988)
Novice only	189	21	0.992 (0.983, 1.000)	1.000 (0.839, 1.000)	0.952 (0.908, 0.979)
Sonographer only	424	43	0.972 (0.933, 989)	0.907 (0.779, 0.974)	0.987 (0.970, 0.996)

The fetal malpresentation model was assessed by comparing predictions to the determination of a sonographer. In each subset of the data, we selected only the latest eligible visit from each patient. For sensitivity and specificity computations, model predictions were binarized according to a predefined threshold. Confidence intervals on the area under the receiver operating characteristic (AUC-ROC) were computed using the DeLong method. Confidence intervals on sensitivity and specificity were computed with the Clopper–Pearson method.
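For readers unfamiliar with the Clopper–Pearson interval named above, a minimal standard-library sketch follows. It is an illustration under stated assumptions rather than the authors' implementation: the helper names are hypothetical, and the exact interval is found by bisection on the binomial tail probabilities instead of through a statistics package.

```python
import math
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f: Callable[[float], float], lo: float, hi: float, iters: int = 100) -> float:
    """Root of a function that increases from negative to positive on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(successes: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for a binomial proportion."""
    # Lower bound: the p at which P(X >= successes) equals alpha / 2.
    if successes == 0:
        lower = 0.0
    else:
        lower = _bisect(
            lambda p: (1 - binom_cdf(successes - 1, n, p)) - alpha / 2, 0.0, 1.0
        )
    # Upper bound: the p at which P(X <= successes) equals alpha / 2.
    # The CDF decreases in p, so it is negated to give an increasing function.
    if successes == n:
        upper = 1.0
    else:
        upper = _bisect(lambda p: alpha / 2 - binom_cdf(successes, n, p), 0.0, 1.0)
    return lower, upper


# "Novice only" row of Table 2: sensitivity 1.000 with 21 malpresentations,
# i.e. 21 of 21 noncephalic cases detected.
novice_lower, novice_upper = clopper_pearson(21, 21)
```

For 21 of 21 the lower bound has the closed form 0.025^(1/21) ≈ 0.839, which agrees with the interval reported for the novice subset.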

Table 3

Mobile-device model run-time benchmarks.

	Processor type
Mobile phone	GPU mean ± standard deviation	CPU w/ XNNPACK library (4 threads)	CPU (4 threads)
Pixel 3	0.9 ± 0.1 s	2.1 ± 1.0 s	13.2 ± 2.9 s
Pixel 4	0.2 ± 0.1 s	1.5 ± 0.8 s	9.8 ± 2.5 s
Samsung Galaxy S10	0.5 ± 0.1 s	1.7 ± 1.1 s	10.3 ± 2.3 s
Xiaomi Mi 9	1.0 ± 0.2 s	1.8 ± 1.3 s	13.7 ± 3.4 s

Time to model inference results (mean and standard deviation in seconds) measured from the end of a 10-s-long blind-sweep video. Both gestational age and fetal malpresentation models run simultaneously on the same video sequence and image preprocessing operations are included. Near real-time inference is achievable on smartphones with graphics processing units or compute libraries optimized for neural network operations. This enables a simple and fast examination procedure in clinical environments.


1. The frequency of breech presentation by gestational age at birth: a large population-based study.

Authors: D E Hickok; D C Gordon; J A Milberg; M A Williams; J R Daling
Journal: Am J Obstet Gynecol Date: 1992-03 Impact factor: 8.661

2. Comparison Study of Low-Cost Ultrasound Devices for Estimation of Gestational Age in Resource-Limited Countries.

Authors: Thomas L A van den Heuvel; Dagmar de Bruijn; Desirée Moens-van de Moesdijk; Anette Beverdam; Bram van Ginneken; Chris L de Korte
Journal: Ultrasound Med Biol Date: 2018-08-06 Impact factor: 2.998

3. Malpresentation in low- and middle-income countries: Associations with perinatal and maternal outcomes in the Global Network.

Authors: Cassandra R Duffy; Janet L Moore; Sarah Saleem; Antoinette Tshefu; Carl L Bose; Elwyn Chomba; Waldemar A Carlo; Ana L Garces; Nancy F Krebs; K Michael Hambidge; Shivaprasad S Goudar; Richard J Derman; Archana Patel; Patricia L Hibberd; Fabian Esamai; Edward A Liechty; Dennis D Wallace; Elizabeth M McClure; Robert L Goldenberg
Journal: Acta Obstet Gynecol Scand Date: 2018-12-20 Impact factor: 3.636

4. Estimating fetal age: computer-assisted analysis of multiple fetal growth parameters.

Authors: F P Hadlock; R L Deter; R B Harrist; S K Park
Journal: Radiology Date: 1984-08 Impact factor: 11.105

5. The diagnostic impact of limited, screening obstetric ultrasound when performed by midwives in rural Uganda.

Authors: J O Swanson; M G Kawooya; D L Swanson; D S Hippe; P Dungu-Matovu; R Nathan
Journal: J Perinatol Date: 2014-04-03 Impact factor: 2.521

6. Cost estimation alongside a multi-regional, multi-country randomized trial of antenatal ultrasound in five low-and-middle-income countries.

Authors: B W Bresnahan; E Vodicka; J B Babigumira; A M Malik; F Yego; A Lokangaka; B M Chitah; Z Bauer; H Chavez; J L Moore; L P Garrison; J O Swanson; D Swanson; E M McClure; R L Goldenberg; F Esamai; A L Garces; E Chomba; S Saleem; A Tshefu; C L Bose; M Bauserman; W Carlo; S Bucher; E A Liechty; R O Nathan
Journal: BMC Public Health Date: 2021-05-20 Impact factor: 3.295

7. Evaluation of Focused Obstetric Ultrasound Examinations by Health Care Personnel in the Democratic Republic of Congo, Guatemala, Kenya, Pakistan, and Zambia.

Authors: Robert O Nathan; Jonathan O Swanson; David L Swanson; Elizabeth M McClure; Victor Lokomba Bolamba; Adrien Lokangaka; Irma Sayury Pineda; Lester Figueroa; Walter López-Gomez; Ana Garces; David Muyodi; Fabian Esamai; Nancy Kanaiza; Waseem Mirza; Farnaz Naqvi; Sarah Saleem; Musaku Mwenechanya; Melody Chiwila; Dorothy Hamsumonde; Dennis D Wallace; Holly Franklin; Robert L Goldenberg
Journal: Curr Probl Diagn Radiol Date: 2016-11-10

8. Obstetric ultrasound use in low and middle income countries: a narrative review.

Authors: Eunsoo Timothy Kim; Kavita Singh; Allisyn Moran; Deborah Armbruster; Naoko Kozuki
Journal: Reprod Health Date: 2018-07-20 Impact factor: 3.223

9. Ultrasound-based gestational-age estimation in late pregnancy.

Authors: A T Papageorghiou; B Kemp; W Stones; E O Ohuma; S H Kennedy; M Purwar; L J Salomon; D G Altman; J A Noble; E Bertino; M G Gravett; R Pang; L Cheikh Ismail; F C Barros; A Lambert; Y A Jaffer; C G Victora; Z A Bhutta; J Villar
Journal: Ultrasound Obstet Gynecol Date: 2016-12 Impact factor: 7.299

10. WHO recommendations on antenatal care for a positive pregnancy experience-going beyond survival.

Authors: Ӧ Tunçalp; J P Pena-Rosas; T Lawrie; M Bucagu; O T Oladapo; A Portela; A Metin Gülmezoglu
Journal: BJOG Date: 2017-03-09 Impact factor: 6.531