Literature DB >> 36048518

Developing Clinical Artificial Intelligence for Obstetric Ultrasound to Improve Access in Underserved Regions: Protocol for a Computer-Assisted Low-Cost Point-of-Care UltraSound (CALOPUS) Study.

Shinjini Bhatnagar¹, Aris T Papageorghiou^2,3, J Alison Noble⁴, Alice Self², Qingchao Chen⁴, Bapu Koundinya Desiraju¹, Sumeet Dhariwal⁴, Alexander D Gleed⁴, Divyanshu Mishra¹, Ramachandran Thiruvengadam¹, Varun Chandramohan¹, Rachel Craik², Elizabeth Wilden⁴, Ashok Khurana⁵.

Abstract

BACKGROUND: The World Health Organization recommends a package of pregnancy care that includes obstetric ultrasound scans. There are significant barriers to universal access to antenatal ultrasound, particularly because of the cost and need for maintenance of ultrasound equipment and a lack of trained personnel. As low-cost, handheld ultrasound devices have become widely available, the current roadblock is the global shortage of health care providers trained in obstetric scanning.
OBJECTIVE: The aim of this study is to improve pregnancy and risk assessment for women in underserved regions. Therefore, we are undertaking the Computer-Assisted Low-Cost Point-of-Care UltraSound (CALOPUS) project, bringing together experts in machine learning and clinical obstetric ultrasound.
METHODS: In this prospective study conducted in two clinical centers (United Kingdom and India), participating pregnant women were scanned and full-length ultrasounds were performed. Each woman underwent 2 consecutive ultrasound scans. The first was a series of simple, standardized ultrasound sweeps (the CALOPUS protocol), immediately followed by a routine, full clinical ultrasound examination that served as the comparator. We describe the development of a simple-to-use clinical protocol designed for nonexpert users to assess fetal viability, detect the presence of multiple pregnancies, evaluate placental location, assess amniotic fluid volume, determine fetal presentation, and perform basic fetal biometry. The CALOPUS protocol was designed using the smallest number of steps to minimize redundant information, while maximizing diagnostic information. Here, we describe how ultrasound videos and annotations are captured for machine learning.
RESULTS: Over 5571 scans have been acquired, from which 1,541,751 label annotations have been performed. An adapted protocol, including a low pelvic brim sweep and a well-filled maternal bladder, improved visualization of the cervix from 28% to 91% and classification of placental location from 82% to 94%. Excellent levels of intra- and interannotator agreement are achievable following training and standardization.
CONCLUSIONS: The CALOPUS study is a unique study that uses obstetric ultrasound videos and annotations from pregnancies dated from 11 weeks and followed up until birth using novel ultrasound and annotation protocols. The data from this study are being used to develop and test several different machine learning algorithms to address key clinical diagnostic questions pertaining to obstetric risk management. We also highlight some of the challenges and potential solutions to interdisciplinary multinational imaging collaboration. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR1-10.2196/37374. ©Alice Self, Qingchao Chen, Bapu Koundinya Desiraju, Sumeet Dhariwal, Alexander D Gleed, Divyanshu Mishra, Ramachandran Thiruvengadam, Varun Chandramohan, Rachel Craik, Elizabeth Wilden, Ashok Khurana, The CALOPUS Study Group, Shinjini Bhatnagar, Aris T Papageorghiou, J Alison Noble. Originally published in JMIR Research Protocols (https://www.researchprotocols.org), 01.09.2022.

Entities: Chemical

Keywords: artificial intelligence; data annotation; machine learning; obstetrics; ultrasound

Year: 2022 PMID： 36048518 PMCID： PMC9478819 DOI： 10.2196/37374

Source DB: PubMed Journal: JMIR Res Protoc ISSN： 1929-0748

Introduction

Background

Every year between 250,000 and 300,000 women die during pregnancy or following childbirth, and approximately 2.5 million neonates die within the first 28 days of life [1]. Most of these deaths occur in low-resource settings and can be prevented by timely access to evidence-based interventions. To help mitigate these deaths and the associated large burden of morbidity, the World Health Organization recommends a package of antenatal care [2] that includes an ultrasound before 24 weeks of gestation. The recommendations for the antenatal ultrasound are to assess fetal cardiac activity, fetal number and chorionicity, gestational age and fetal size, placental appearance and location, and a basic anomaly screen [2]. In addition to early screening, routine assessment of fetal malpresentation near term is also effective in reducing morbidity and mortality [3]. Significant barriers remain to the universal access to antenatal ultrasound. Primarily, these are related to procuring and maintaining ultrasound equipment and a lack of trained personnel [4]. The first of these barriers is being addressed through technological advances in low-cost, handheld ultrasound devices, making point-of-care ultrasound more accessible. However, implementation remains limited due to insufficient numbers of trained health care providers. In addition, training in obstetric ultrasound is lengthy, costly, and difficult to scale up. Several research efforts have described how teleradiology or automated solutions might overcome these obstacles through the use of simple obstetric ultrasound protocols using portable devices [5-10].

Objectives

To contribute to the improvements in antenatal care and pregnancy risk assessment for women in low-resource settings, we are undertaking the Computer-Assisted Low-Cost Point-of-Care UltraSound (CALOPUS) project. This brings together experts in machine learning and clinical obstetric ultrasound in an international collaboration (Multimedia Appendix 1). The aim of this study is to develop and evaluate simple-to-use clinical protocols and machine learning–based decision-making tools for nonexpert users, which are designed to be functionally suitable for implementation at scale in underserved regions. In this paper, we describe our approach to working toward automating the requirements of a basic ultrasound examination [2,11]. Well-designed acquisition and curation of data and their careful annotation are crucial in such studies. The aims of this paper are to report the development of optimal clinical acquisition and image annotation protocols suitable for automated analysis and to describe our experience and share learning regarding ultrasound video acquisition and annotation in a multisite setting.

Methods

Study Overview

The ultimate objective of the CALOPUS study is to develop machine learning models based on simplified obstetric ultrasound to predict pregnancy risk factors, such as the detection of noncephalic presentation or a low-lying placenta. The primary objectives of this phase of CALOPUS are as follows: To develop an optimal clinical acquisition protocol suitable for automated analysis that can be obtained by a minimally trained health care provider To capture ultrasound videos and perform video annotation to develop a data set suitable for machine learning To advance capabilities in real-time ultrasound video partitioning to prevent fetal sex determination, which is an important prerequisite for the global dissemination of ultrasound This prospective study is an interdisciplinary international collaboration among the Translational Health Science and Technology Institute, Delhi, India; the Civil Hospital, Gurugram, Haryana, India; the Institute of Biomedical Engineering and the Nuffield Department of Women’s and Reproductive Health, University of Oxford, United Kingdom; and the Women’s Centre, John Radcliffe Hospital, Oxford, United Kingdom.

Setting, Study Design, and Participants

In the 2 hospitals, the participating pregnant women were scanned in a room set up with an ultrasound machine configured to record full-length scans by screen capture. Each woman underwent 2 consecutive ultrasound scans. The first was our CALOPUS protocol, which consists of a series of simple, standardized ultrasound sweeps (Figure 1), immediately followed by a routine, full clinical ultrasound examination that served as the comparator. All scans were performed by trained sonologists (both sonographers and medical doctors trained in obstetric ultrasound) on identical GE Voluson E8 (General Electric Healthcare) ultrasound machines using C2-9 or C1-5 curvilinear probes. All data were anonymized at the point of collection using specifically designed video capture software that blanks out the patient identifiable information on the screen recording.

Figure 1

The Computer-Assisted Low-Cost Point-of-Care UltraSound protocol, developed on the basis of the studies described (see text).

The Computer-Assisted Low-Cost Point-of-Care UltraSound protocol, developed on the basis of the studies described (see text). The enrollment of participants, clinical data collection, and the ultrasound scans (CALOPUS and routine clinical scans) were performed by a dedicated clinical research team of doctors, nurses, and sonologists. All women were eligible if they were able to provide informed consent, were aged ≥18 years, and had (or were attending for) a scan between 11 and 14 weeks of pregnancy for gestational age assessment based on fetal crown-rump length (CRL). In India, participants were recruited from the ongoing interdisciplinary Group for Advanced Research on BirtH outcomes–Department of Biotechnology India Initiative (GARBH-Ini) cohort [12]. This is a prospective observational cohort of pregnant women enrolled before 20 weeks of gestation. All pregnant women were followed up during pregnancy with ultrasound scans at 18 to 20, 30 to 32, and 35 to 37 weeks of gestation, and birth outcomes were determined. At Oxford, women attending the antenatal ultrasound department of the John Radcliffe Hospital were invited to enroll throughout pregnancy from 11+0 weeks of gestation. Consecutive recruitment was up to a weekly quota of 20 scans per week and could be targeted to better focused recruitment of women in specific gestational age windows. As the incidence of breech presentation reduces to approximately 3% to 4% at term [13], the anticipated number of scans recorded of noncephalic presenting fetuses would inevitably be small, even in a large data set. As machine learning models require a degree of balance in the training data set, to ensure a higher proportion of ultrasound scans in women with noncephalic fetal presentation, women in the Indian cohort with a noncephalic fetus at the 30 to 32 or 35 to 37 weeks scan received additional scans every 15 days until the presentation became cephalic or until delivery.

Ethics Approval

In India, ethics approval was obtained from the institutional ethics committees of the Translational Health Science and Technology Institute, Faridabad, India, THS/1.8.1/(71), dated August 26, 2019, and Gurugram Civil Hospital, Haryana, India, GHG/IEC letter, dated September 3, 2019. Written informed consent was obtained from all study participants by the study nurses under the supervision of research officers. For illiterate women, the details of the study were explained in the presence of a literate impartial witness. Verbal consent and thumb impressions were taken from the participants along with the signature of the witnesses. In the United Kingdom, approval from the West of Scotland Research Ethics Committee 5 (reference 18/WS/0051) was obtained, along with approvals from the Health Research Authority and Oxford University Hospitals. Trained research midwives or the sonologists themselves completed the consent process with participants, and written informed consent was obtained.

Defining the CALOPUS Scan Acquisition Protocol

Due to the exploratory nature of this collaboration, the protocol for scan acquisition developed over time. In 2016, Abuhamad et al [11] published a clinical study validating the feasibility and accuracy of a 6-step approach to performing a focused basic obstetric ultrasound. Each step corresponds to an integral part of prenatal care and helps identify women at high risk of obstetric complications. For example, in a setting with poor health care and transport infrastructure, it would be ill-advised for a woman to attempt a vaginal breech birth or delivery of twins without access to comprehensive emergency obstetric care facilities. Similarly, a woman with known major placenta previa should be delivered by cesarean section and not attempt vaginal birth. The purpose of this 6-step approach was to provide a simple framework that could be readily taught to health care providers with limited scanning experience. A series of 5 simple steps allows the evaluation of fetal presentation, presence of cardiac activity (fetal viability), presence of multiple pregnancies, determination of placental location, and amniotic fluid measurement. The sixth step is a standard clinical freehand scan for fetal biometry. This 6-step approach was adopted as the initial scanning protocol for the CALOPUS. However, although a protocol for human point-of-care ultrasound benefits from sequential decision-making, it contains overlap and redundant information, that is, each step contains information relevant to more than one of the six aims. A hypothesis of this study is that automation offers the opportunity to optimize the scanning protocol to reduce redundancy in overlapping scan sweeps. To test this hypothesis, we conducted an initial analysis of 470 videos from the first 80 participants. All women had viable second or third trimester pregnancies (gestational age range from 19+3 to 40+0 weeks).

Clinical Protocol Refinement to One Optimized for Machine Learning

On the basis of this, the original 6-step approach was modified into the CALOPUS ultrasound protocol (Figure 1 and Multimedia Appendix 2). Steps 1 and 5 of the original approach were removed to streamline the acquisition process. Hence, the original step 2 became the new step 1 of the CALOPUS ultrasound protocol, as it consistently confirmed both fetal presentation and viability. Steps 2 and 3 of the CALOPUS ultrasound protocol were from the original 6-step approach and aim to identify multiple pregnancies and quantify amniotic fluid. We used maximum vertical pool (MVP) depth to assess amniotic fluid volume (rather than the amniotic fluid index due to its simplicity), as neither have been demonstrated to be superior in predicting adverse outcome. Another reason to use MVP depth was that amniotic fluid index results in a higher number of diagnoses of oligohydramnios than MVP depth, which in turn may result in an increased number of inductions of labor and cesarean sections but without benefits to perinatal outcomes [14,15]. Even in combination with other steps, step 4 of the original 6-step approach was found to be inadequate for determining the relationship between the placenta and the cervix, because the cervix is not always directly behind the pubic symphysis. This is important, as clinically, a low-lying placenta is ruled out by measuring the distance from the placental margin to the internal cervical os. It is widely accepted that an anterior placenta will not impede vaginal delivery if it is more than 10 mm from the internal os during the second trimester. It has been recommended that the posterior placenta should be >15.5 mm from the internal os in the second trimester [16]. Hence, a new step (step 5 of the CALOPUS ultrasound protocol) was introduced to improve visualization of the uterine lower segment and cervix and to reduce the number of false positive cases (where the placenta is not low-lying but is suspected to be so). This was a U-shaped sweep from the right to the left iliac fossa, along the maternal pelvic brim. The final step of the CALOPUS ultrasound protocol (step 6) is a validation step to measure the CRL if <14 weeks gestation or the head circumference, transcerebellar diameter, biparietal diameter, abdominal circumference, femur length, and the deepest pool of amniotic fluid if ≥14 weeks (CRL>84 mm [17]) according to previously described ultrasound methodology [18-20]. This was done so that any measurable biometry planes were identified from steps 1 to 5 of the CALOPUS ultrasound protocol, and biometric measurement values were compared with the paired standard clinical ultrasound. The methodology for measuring fetal biometry and amniotic fluid remained constant across all acquisition protocol iterations, with the use of a standardized protocol from international standards [20,21] to ensure the quality control of the images obtained.

Statistical Justification

A pilot study was conducted to calculate the sample size required to answer the following questions: Does the addition of step 5 of the CALOPUS ultrasound protocol help to visualize the cervix more than step 4 of the CALOPUS ultrasound protocol alone? Does the fullness of the bladder have more of an impact on visualization of the cervix than the addition of step 5 of the CALOPUS ultrasound protocol? Is the addition of step 5 of the CALOPUS ultrasound protocol clinically significant? The pilot study focused on women between 18 and 22 weeks of gestation, because at this gestation, it should be possible to confirm placental location with transabdominal scanning alone. At more advanced gestations, a transvaginal scan is often necessary to check the distance between the placental margin and cervix, particularly when the placenta is posterior and within the lower segment of the uterus. The pilot study was a paired cohort study in which data were analyzed using McNemar test (a 2-sample paired-proportions test) performed in STATA (version 16.1). A total of 69 women underwent both steps 4 and 5 of the CALOPUS ultrasound protocol. The cervix was defined as having been seen if part of the endocervical canal was visible, and the bladder was well filled if it contained >200 mL of urine as defined by a cuboid volumetric calculation. This recommendation for bladder fullness is based on guidance for placental imaging, because a full bladder stretches out the lower segment to provide a more realistic idea of the relationship between the placental edge and the cervix or any previous cesarean scar [22]. The sample sizes were calculated using type 1 error rate of 0.05 and 90% power. A sample size of 212 scans was needed to detect the differences observed in the pilot study between the addition of steps 5 and 4 of the CALOPUS ultrasound protocol alone or the contribution of a full bladder in correctly diagnosing the placental location, compared with a freehand scan gold standard.

Data Processing

The ultrasound machine video output port was split at both sites. One port was connected to an image-acquisition hardware installed on a desktop computer. Data collection software was implemented to control the image-acquisition hardware and to collect the ultrasound videos in real time, without affecting the scanning procedure. The recordings were saved locally on a desktop computer.

Data Quality Checks and Storage

Scans underwent manual quality checking for the following reasons: To assess whether all the scans have synced properly from the clinical site To assess if the ultrasound signal had been successfully captured in the video To assess if any steps or sweeps were missing from the scan directory To record the duration of the video files To undertake a basic visual check of video quality (eg, image corruption or distortion). The scans were then transferred to the server for annotation and subsequent analysis at each site.

Annotation Protocol

The annotation of key anatomical structures in the CALOPUS ultrasound protocol video sweeps are used alongside the video as the training input to the machine learning algorithms. Initial manual annotations were performed by a single annotator who placed bounding boxes around the structures of interest using a VATIC backend [23] and a self-designed XML administrator web page on the data server desktop. The annotation tool was subsequently changed to CVAT [24] to increase functionality by including segmentation and point annotation with more attributes. Irrespective of the tool used, frames were not annotated where there was significant motion artifact.

Quality Assurance of Annotations

Acquiring the large number of annotations required for machine learning models can be prohibitively slow using a single annotator. In addition, a single annotator may introduce annotation bias. We used a team of 5 annotators in the United Kingdom and India to increase the rate of annotation and to reduce bias, but this posed a new challenge: How can we quantify the acceptable variation between annotators to ensure an achievable standard for the annotation team? The annotators were sonologists with experience in performing antenatal ultrasound scans. A standard operating procedure was developed (Multimedia Appendix 3), and several metrics were selected to enable comparison between annotators. A standardization exercise was undertaken by the first 2 members of the annotation team, using annotations of 9 recordings of step 1 (6712 frames). These were annotated on a frame-by-frame basis, placing bounding boxes around 11 anatomical features of interest: the fetal head; cerebellum; heart; spine; abdomen; pelvis; stomach and femur; and the maternal bladder, amniotic fluid, and placenta. Following the standardization of the 9 videos, annotators were then assessed on further 20 videos to ensure that their annotations were in line with the standard operating procedure of the annotation. For ongoing quality assurance, every tenth annotation was repeated by a second annotator to ensure consistency was maintained over time. To guide the expected levels of agreement required for new annotators to achieve and provide an expected standard for ongoing annotation quality assurance, we conducted a baseline intra- and interannotator agreement study. Each of the 2 annotators annotated all the 18,717 frames from 20 videos for the 11 anatomical features described earlier. They repeated all annotations 2 weeks later to allow calculation of both intra- and interannotator agreement. We developed a code to assess the agreement between sets of annotations available on GitHub [25]. The metrics are calculated as follows: Partial match: a frame was denoted as a partial match if >50% of the labels were the same between 2 different annotations. Exact match: a frame was denoted as an exact match if the bounding boxes drawn in each frame were the same between 2 different annotations divided by the total number of frames. Match per label: closely related to the exact match for an individual anatomical feature, with the denominator as the total number of frames with a given anatomy label by either of the annotators. Bounding box intersection over union: calculated for 2 bounding box annotations of the same feature on the same frame. The total area of the bounding box overlap (the intersection) was divided by the area of union between the 2 annotations (total area of both annotators’ bounding boxes for any feature minus the area of intersection).

Results

Participants

As of June 2022, a total of 5661 participants have been recruited into the CALOPUS study. After excluding 90 scans following manual quality checking of the scans, 5571 (98.41%) participants remained. Table 1 shows the number of scans acquired at different gestational ages through the iterations of the CALOPUS protocol.

Table 1

Ultrasound data collected as part of the Computer-Assisted Low-Cost Point-of-Care UltraSound (CALOPUS) study.

Gestational age (weeks)	Six-step approach (n=431), n (%)	CALOPUS ultrasound protocol (n=5140), n (%)
<14	4 (0.9)	909 (17.7)
14+0 to 17+6	1 (0.2)	435 (8.5)
18+0 to 24+6	265 (61.5)	1485 (28.9)
25+0 to 29+6	7 (1.6)	354 (6.9)
30+0 to 34+6	69 (16)	1022 (19.9)
≥35+0	85 (19.7)	935 (18.2)

Ultrasound data collected as part of the Computer-Assisted Low-Cost Point-of-Care UltraSound (CALOPUS) study.

Defining the CALOPUS Protocol

Step 1 of the 6-step approach confirmed presentation in only 60% (46/77) of the cases, whereas step 2 (designed to detect the fetal heart) was able to do so in 99% of the cases (79/80; the single failure was for a twin pregnancy). Conversely, step 2 was able to confirm viability in 81% (62/77) of the videos, and analysis of one or more additional steps was needed to determine fetal heart activity, which was feasible in 99% (79/80) of the participants. Furthermore, steps 3.2 and 5 of the 6-step approach sweep across the same parts of the maternal abdomen, so that a computational algorithm could measure pools of amniotic fluid in step 3.2 rather than requiring a repeat (step 5). Placental location was the most difficult component of the scan to assess. Steps 3 and 4 combined located the placenta in 86% (69/80) of the cases. Ruling out a low-lying placenta was particularly difficult for posterior placentas or those within the lower segment where the cervix could not be visualized. This initial analysis concluded that a revised protocol for automated decision-making algorithms should be designed to reduce the number of steps owing to redundancy in information obtained. However, more than one step may be needed to achieve each objective, in particular for the determination of placenta location. We subsequently analyzed 212 scans to assess the impact of step 5 of the CALOPUS ultrasound protocol and bladder filling (Table 2). Our results showed that the cervix was visualized 4-fold more frequently when step 5 of the CALOPUS ultrasound protocol was included as well as step 4 (n=120), compared to 29 times when using step 4 of the CALOPUS ultrasound protocol alone. The bladder was well filled (>200 mL) in 30.7% (65/212) of the women. The cervix was visualized in 91% (59/65) of the cases where the bladder was well filled and both steps 4 and 5 of the CALOPUS ultrasound protocol were assessed, compared with the cervix seen in 28% (18/65) of the scans where only step 4 of the CALOPUS ultrasound protocol was considered. This demonstrates that both bladder fullness and the addition of step 5 of the CALOPUS ultrasound protocol have a considerable impact on the frequency at which the cervix is visualized. The addition of step 5 of the CALOPUS ultrasound protocol is more significant than bladder filling alone.

Table 2

The impact of bladder filling and different steps on visualization of the cervix and classification of placental location.

				Bladder well filled (n=65), n (%)		Bladder poorly filled (n=147), n (%)
Visualization of the cervix
	Step 4 alone
		Seen (n=29, 13.7%)	18 (28)		11 (7)
		Unseen (n=183, 86.3%)	47 (72)		136 (93)
	Steps 4 and 5
		Seen (n=120, 56.6%)	59 (91)		61 (41)
		Unseen (n=92, 43.3%)	6 (9)		86 (59)
Classification of placental location
	Step 4 alone
		Correct (n=165, 77.8%)	53 (82)		112 (76)
		Incorrect (n=47, 22.2%)	12 (18)		35 (24)
	Steps 4 and 5
		Correct (n=189, 89.2%)	61 (94)		128 (87)
		Incorrect (n=23, 10.8%)	4 (6)		19 (13)

The impact of bladder filling and different steps on visualization of the cervix and classification of placental location. It should be noted that it is not always necessary to visualize the cervix to correctly identify whether a placenta is low-lying (eg, when the placenta is at the uterine fundus). Therefore, we examined whether the addition of step 5 of the CALOPUS ultrasound protocol was clinically beneficial by increasing the correct classification of low-lying placental position compared with a standard ultrasound examination. A protocol including both steps 4 and 5 of the CALOPUS ultrasound protocol correctly classified placentas as low-lying or not in 89% (189/212) of the instances compared to 78% (65/212) with step 4 of the CALOPUS ultrasound protocol alone. These data indicate that the CALOPUS ultrasound protocol should include a low pelvic brim sweep (step 5). Women should be scanned with a full bladder to optimize visualization of the lower segment and assist in the classification of placental location.

Annotations

Manual annotation is very time-intensive: a video clip lasting 20 to 30 seconds typically takes approximately 40 minutes to fully annotate with bounding boxes. This is because of the time taken to change the size and position of the bounding boxes in response to the probe and fetal movements (despite using an interpolative setting from one frame to the next). The total number of frames recorded for each participant is approximately 4500 (30 frames per second), and so routinely, only 1 or 2 video clips per participant were fully annotated with bounding boxes around the structures of interest, with other videos having frame-level anatomy labels. At the time of publication, 1,541,751 label annotations were made. Overall, 1,026,744 (66.6%) annotations were bounding box annotations around the features of interest, 7788 (0.5%) were segmentations, and 507,219 (32.9%) were labels assigned to frames. As opposed to bounding box annotation, frame labeling significantly reduces the annotation time but comes at the expense of having no information about the location of a structure within a frame. Table 3 lists the number of videos and frames labeled.

Table 3

Number of video annotations in the Computer-Assisted Low-Cost Point-of-Care UltraSound study.

	Videos (n=1057), n (%)	Frames (n=586,253), n (%)
Step 1	446 (42.2)	254,095 (43.3)
Step 2	161 (15.2)	114,958 (19.6)
Step 3	163 (15.4)	67,633 (11.5)
Step 4	164 (15.5)	103,269 (17.6)
Step 5	123 (11.6)	46,298 (7.9)

Number of video annotations in the Computer-Assisted Low-Cost Point-of-Care UltraSound study. The levels of agreement between annotators and their repeated annotations over 18,717 frames are listed in Table 4. Annotator 2 demonstrated greater reproducibility in annotation than annotator 1, but the same patterns were observed throughout: partial match percentages were higher than for exact label match, and the reproducibility of an annotator’s own annotations was greater than the agreement between 2 different annotators. The pooled κ values for match per label demonstrated an excellent level of agreement both between annotators and for repeat annotations, and the bounding box intersection over union >50% was high.

Table 4

Baseline intra- and interannotator agreement in the Computer-Assisted Low-Cost Point-of-Care UltraSound study.

	Intra-annotator agreement			Observed interannotator agreement
	Annotator 1	Annotator 2
Partial match (%)	98.1	98.3	94.5
Exact match (%)	84	91.1	71.4
Match per label (%)	91.8	95.7	83.1
Pooled κ	0.93	0.96	0.85
Bounding box overlap >50% (%)	94.8	97.1	93.4

Baseline intra- and interannotator agreement in the Computer-Assisted Low-Cost Point-of-Care UltraSound study.

Discussion

Principal Findings

The CALOPUS study aims to develop and evaluate simple-to-use clinical protocols and machine learning–based decision-making tools for nonexpert users that are suitable for use in underserved regions. In this paper, we present the methods used to collect clinically relevant, detailed, repeatable, and quality-assessed data for the development of machine learning algorithms for obstetric ultrasound.

Strengths and Limitations

This study has several strengths and uses a multidisciplinary clinical engineering team approach. The prospective nature of data collection allows meticulous quality assurance, which is not possible with retrospective collection of existing clinical data. As a prospective study, it also allows us to collect targeted data for the purpose of machine learning on ultrasound videos, which is not routinely collected in most settings. Preliminary investigation of a machine learning model for automatic detection of breech presentation, built and validated on these data has been described [26], and current work is ongoing to improve this; we are also developing machine learning model for automatic assessment of low-lying placentas [27] and another for preventing fetal sex determination (mandated in some settings), achieved by automated recognition and obscuring of the fetal pelvic region. We also used a subset of the CALOPUS data and annotations from the Automatic Amniotic Fluid Measurement and Analysis From Ultrasound Video Ultrasound Challenge: Automatic amniotic fluid measurement and analysis from ultrasound videos [28] We believe that the dual site and international nature of this collaboration increases external validity and enrollment. Nevertheless, this is at the expense of adding complexity to the setup of the study sites and standardization processes. Challenges included training of staff in the use of the ultrasound protocol and annotation of data, which had to be performed remotely. This increased the risk of undesirable changes. Fortnightly meetings between the data acquisition and annotation teams were undertaken to ensure that logistical and technical issues could be identified and corrected promptly and that feedback regarding scanning and annotation variations could be discussed. Despite this, when recruitment speed exceeds annotation speed, problems with data collection may not become evident for some time, as was the case on one occasion. We have now introduced methods to automate some of the quality checks, such as sweep duration to give advice about scanning speed and to identify frames “jumping,” which may only be noticed when watching videos at much slower playback speeds. An important limitation of our study was the use of high-cost ultrasound equipment by expert sonologists. Although our ultimate aim is to develop low-cost ultrasound solutions, there were several reasons why we opted to use a GE Voluson E8 machine during this phase of the study. First, the scan images produced are of high quality. We believe this is a reasonable first step when building machine learning algorithms by training models with high-quality images initially and then introducing lower-quality images later on. Domain adaptation techniques are available for this purpose, which allow parameter refinement from a model built with large data in one domain (in our case, a high-cost ultrasound machine) to a second domain with a small amount of data (in our case, a low-cost ultrasound machine). Second, both clinical sites had GE Voluson E8 machines, which simplified the initiation of the study along with routine clinical scanning. Having demonstrated the feasibility of higher cost systems, we will then translate the models we develop into clinically useful tools to be used by minimally trained personnel with low-cost, handheld ultrasound technology. Despite being a collaborative project, there have been limitations from the respective ethics committees regarding what data can be shared between sites. Thus, although 1 in 10 scans can be shared for quality assurance purposes, the study design of the machine learning model development and testing was affected. One solution is to build single-site models that use a second site for testing [27]. Another is the use of transfer learning, whereby a model is built at one site and refined using data at the second site. Federated learning is also being considered, where a model is trained on data available at each site separately, and once site-specific models are built, model weights are combined to give a final model [29]. This emerging approach is beneficial as the transfer of original data is not required; rather, encoded weights are shared. We believe this is likely to benefit wider use for multisite machine learning–based imaging research in the future.

Challenges and Further Scope

We observed a lack of universal tools for annotating ultrasound videos. It would be advantageous for there to be open source software that truly caters to the annotation of medical images and videos by accepting multiple file types, enabling measurements to be taken, and allowing frame labeling, bounding box annotation, and segmentation. Currently, several different tools need to be used to perform these tasks. For most human-executed tasks, reproducibility between ultrasound practitioners is lower than that within the same practitioner [30]. Widely used guidelines exist for the standardized acquisition of ultrasound images in clinical practice [31-33]. We are not aware of any standards for manual annotation and were compelled to develop definitions for labeling different anatomies owing to the complexity of fetal position and movements. The agreement between sonographers assessing a single frame is not 100%, so even a lower agreement is expected when annotating a video consisting of hundreds of frames. The assessment of interannotator agreement is also important in identifying the best level of agreement that could be expected between a human observer and a machine learning model. As manual annotations of anatomical planes provide the only way for the clinical ground truth of anatomical standards to be demarcated, it would be unreasonable to aim for 100% agreement when experienced annotators cannot consistently reach this target of agreement. Finally, it is worth reflecting on the challenges of interdisciplinary research in this emerging research area. Clinical assessment of CALOPUS videos is novel; therefore, explicitly translating this into a machine learning algorithm input is exploratory in nature. Interdisciplinary research explores the rules needed to assess diagnostic information from both the human and artificial intelligence perspectives. Engineers and clinicians need to work together to understand both perspectives and benefit from regular and clear communication to discuss study design, meet each other’s data requirements, and understand differing perspectives about important research questions. Our clinical team has found this an exciting area to work in as it allows a better understanding of clinical decision-making processes when these are explicitly verbalized, whereas the engineering team has found working on real-world clinical problems highly rewarding, as these show interesting, unique challenges not commonly encountered elsewhere.

Conclusions

The CALOPUS study is a unique study that uses obstetric ultrasound videos and annotations from pregnancies dated from 11 weeks and followed up until birth using novel ultrasound and annotation protocols. The data from this study are being used to develop and test several different machine learning algorithms to address key clinical diagnostic questions for obstetric risk management. We also highlight some of the challenges and potential solutions to interdisciplinary international collaboration.

23 in total

1. Intra- and interobserver variability in fetal ultrasound measurements.

Authors: I Sarris; C Ioannou; P Chamberlain; E Ohuma; F Roseman; L Hoch; D G Altman; A T Papageorghiou
Journal: Ultrasound Obstet Gynecol Date: 2012-03 Impact factor: 7.299

2. Practice guidelines for performance of the routine mid-trimester fetal ultrasound scan.

Authors: L J Salomon; Z Alfirevic; V Berghella; C Bilardo; E Hernandez-Andrade; S L Johnsen; K Kalache; K-Y Leung; G Malinger; H Munoz; F Prefumo; A Toi; W Lee
Journal: Ultrasound Obstet Gynecol Date: 2011-01 Impact factor: 7.299

3. The frequency of breech presentation by gestational age at birth: a large population-based study.

Authors: D E Hickok; D C Gordon; J A Milberg; M A Williams; J R Daling
Journal: Am J Obstet Gynecol Date: 1992-03 Impact factor: 8.661

4. Standardisation and quality control of ultrasound measurements taken in the INTERGROWTH-21st Project.

Authors: I Sarris; C Ioannou; E O Ohuma; D G Altman; L Hoch; C Cosgrove; S Fathima; L J Salomon; A T Papageorghiou
Journal: BJOG Date: 2013-07-11 Impact factor: 6.531

5. Automated Fetal Head Detection and Circumference Estimation from Free-Hand Ultrasound Sweeps Using Deep Learning in Resource-Limited Countries.

Authors: Thomas L A van den Heuvel; Hezkiel Petros; Stefano Santini; Chris L de Korte; Bram van Ginneken
Journal: Ultrasound Med Biol Date: 2018-12-17 Impact factor: 2.998

6. Standardisation of crown-rump length measurement.

Authors: C Ioannou; I Sarris; L Hoch; L J Salomon; A T Papageorghiou
Journal: BJOG Date: 2013-05-17 Impact factor: 6.531

7. Validation of a Telemedicine Quality Assurance Method for Point-of-Care Obstetric Ultrasound Used in Low-Resource Settings.

Authors: Anne Dougherty; Miriam Kasten; Michael DeSarno; Gary Badger; Mary Streeter; David C Jones; Betsy Sussman; Kristen DeStigter
Journal: J Ultrasound Med Date: 2020-08-08 Impact factor: 2.153

8. A Pregnancy Cohort to Study Multidimensional Correlates of Preterm Birth in India: Study Design, Implementation, and Baseline Characteristics of the Participants.

Authors: Shinjini Bhatnagar; Partha P Majumder; Dinakar M Salunke
Journal: Am J Epidemiol Date: 2019-04-01 Impact factor: 4.897

9. Testing telediagnostic obstetric ultrasound in Peru: a new horizon in expanding access to prenatal ultrasound.

Authors: Marika Toscano; Thomas J Marini; Kathryn Drennan; Timothy M Baran; Jonah Kan; Brian Garra; Ann M Dozier; Rafael L Ortega; Rosemary A Quinn; Yu T Zhao; Miguel S Egoavil; Lorena Tamayo; Claudia Carlotto; Benjamin Castaneda
Journal: BMC Pregnancy Childbirth Date: 2021-04-26 Impact factor: 3.007

10. Follow-up ultrasound in second-trimester low-positioned anterior and posterior placentae: prospective cohort study.

Authors: C H J R Jansen; C E Kleinrouweler; A W Kastelein; L Ruiter; E van Leeuwen; B W Mol; E Pajkrt
Journal: Ultrasound Obstet Gynecol Date: 2020-10-13 Impact factor: 7.299