A systematic review and meta-analysis of the accuracy of weight estimation systems used in paediatric emergency care in developing countries.

Mike Wells¹, Lara Nicole Goldstein¹, Alison Bentley¹.

Abstract

INTRODUCTION: When weight cannot be measured during the management of medical emergencies in children, a convenient, quick and accurate method of weight estimation is required, as many drug doses and other interventions are based on body weight. Many weight estimation methodologies in current use have been shown to be inaccurate, especially in low- and middle-income countries with a high prevalence of underweight children. This meta-analysis evaluated the accuracy of weight estimation systems in children from studies from low- and middle-income countries.
METHODS: Articles from low- and middle-income countries were screened for inclusion to evaluate and compare the accuracy of existing systems and the newer dual length- and habitus-based methods, using standard meta-analysis techniques.
RESULTS: The 2D systems and parental estimates performed best overall. The PAWPER tape, parental estimates, the Wozniak method and the Mercy method were the most accurate systems with percentage of weight estimates within 10% of actual weight (PW10) accuracies of 86.9%, 80.4%, 72.1% and 71.4% respectively. The Broselow tape (PW10 47.1%) achieved a moderate accuracy and age-based estimates a very low accuracy (PW10 11.8-47.5%).
CONCLUSIONS: The PAWPER tape, the Wozniak method and the Mercy method achieved an acceptable level of accuracy in studies from low- and middle-income countries and should preferentially be used and further advanced for clinical emergency medicine practice. Parental estimates may be considered if the regular caregiver of the child is present and a recent measured weight is known. The Broselow tape and age-based formulas should be abandoned in low- and middle-income country populations as they are potentially dangerously inaccurate.

Keywords: Broselow tape; Low- and middle-income countries; Mercy method; PAWPER tape; Weight estimation

Year: 2017 PMID: 30505673 PMCID: PMC6246873 DOI: 10.1016/j.afjem.2017.06.001

Source DB: PubMed Journal: Afr J Emerg Med ISSN: 2211-419X

African relevance

This is the first meta-analysis of weight estimation systems in low- and middle-income countries. The Broselow tape and age formulas are potentially harmful in low- and middle-income countries. The dual burden of over- and underweight children requires advanced weight estimation. The PAWPER tape, the Wozniak method or the Mercy method should be used in Africa.

Introduction

Throughout the world, the prevalence of obesity in children has increased to the point where “fat is the new normal” [1], [2]. Low and middle income countries have not escaped the epidemic of obesity but also suffer from a high prevalence of underweight children: a dual burden of extremes of habitus [3]. These factors have a major impact on the accuracy and safety of paediatric weight-estimation systems. Drug doses in children are commonly based on their total body weight, but children can seldom actually be weighed during the management of medical emergencies [4]. An accurate estimation of weight is required to facilitate accurate drug dose calculations under these circumstances [5], [6], [7]. The accuracy of the weight estimation is important to ensure that a sufficient dose is administered to ensure the efficacy of the treatment and, on the other hand, to minimise the likelihood of overdosing with the consequent potential negative effects [8], [9], [10], [11]. Some of the older methods that are still commonly used to estimate weight include age-based formulas, length-based formulas, the Broselow tape, guesses by healthcare providers and estimates by parents – these can be classified as one-dimensional systems as only one parameter is used to generate a weight estimation. Many of these methods were derived from populations of well- or over-nourished children and have been shown to lack accuracy and consistency of performance, especially between different populations [12], [13], [14]. To limit the degree of underestimation of weight in high-income country populations, newer age-based formulas have been developed over the last decade to accommodate for the increasing prevalence of obesity in children [15], [16]. The Broselow tape has also been updated and modified (to the current version: 2011 edition A) to reduce the risk of underestimation of weight [17]. 
Few of the older one-dimensional weight estimation systems have performed well in populations in low- and middle-income countries, where there is a higher prevalence of malnourished children [18], [19], [20]. The potential exists for significant overestimation of weight by methodologies that are derived from populations with a high prevalence of obesity, especially those developed more recently [20]. The newest generation of weight estimation systems, however, have been designed to be equally accurate in normal, under- and overweight children [21], [22]. They are two-dimensional techniques that make use of dual length- and habitus-based parameters for weight estimation. These are the Mercy method, the PAWPER tape and the Wozniak method. Preliminary evidence in both high-income countries and low- and middle-income countries has shown that they are far superior in accuracy to traditional methods [5], [6], [12], [18], [19], [23], [24], [25], [26], [27], [28], [29], [30]. Given the differences between populations in high-income and middle or low-income countries, there is a need to establish which methods predict weight most accurately in children in under-resourced environments, to minimise potential medication errors and resultant patient harm. The objective of this study was therefore to determine which paediatric weight estimation systems predicted total body weight most accurately in children from developing countries (low- and middle-income countries). No systematic reviews or meta-analyses have addressed this topic before.

Methods

Search strategy

Online databases (MEDLINE, SCOPUS, Science Direct and Google) were searched for eligible studies, published between January 1986 and January 2017, using the following search terms: “paediatric weight estimation”, “weight estimation children” and “Broselow tape”. Articles in any language were included if English translations were obtainable. Potential studies for inclusion were also identified from the references sections and citations of reviewed articles. To minimise the possibility of publication bias, all studies with adequate reporting were included, whether full-text articles, research reports, abstracts, conference presentations or other unpublished data.

Study selection and eligibility criteria

All studies that evaluated or compared any of the weight-estimation methodologies described in Table 2 were assessed for inclusion into the study by two separate researchers (MW and LG). Only studies from low- and middle-income countries (as defined in the United Nations World Economic Situation and Prospects report) were included for further analysis [31]. Studies that did not include original data were excluded. Studies that did not include usable statistics–data describing bias and precision (mean percentage error with standard deviations or limits of agreement) or data describing overall accuracy (percentages of estimations within 10% or 20% of actual weight (PW10 and PW20))–were also excluded from the meta-analysis (see Fig. 1).
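The accuracy statistics named above can be made concrete with a short sketch. The Python below is illustrative only: the function names, the inclusive ±10% and ±20% thresholds and any sample weights passed in are assumptions, and the limits of agreement are taken as the mean percentage error ± 1.96 standard deviations (the conventional Bland-Altman form), which the included studies may not all have used.

```python
from statistics import mean, stdev


def percentage_errors(actual_kg, estimated_kg):
    """Signed percentage error of each estimate relative to the measured weight."""
    return [100.0 * (est - act) / act for act, est in zip(actual_kg, estimated_kg)]


def accuracy_summary(actual_kg, estimated_kg):
    """Bias (MPE), precision (limits of agreement) and accuracy (PW10, PW20)."""
    errors = percentage_errors(actual_kg, estimated_kg)
    n = len(errors)
    mpe = mean(errors)   # bias: mean percentage error
    sd = stdev(errors)   # sample standard deviation of the percentage errors
    return {
        "mpe": mpe,
        # Assumed convention: 95% limits of agreement = MPE +/- 1.96 SD
        "loa": (mpe - 1.96 * sd, mpe + 1.96 * sd),
        # Percentage of estimates within 10% / 20% of actual weight (inclusive, assumed)
        "pw10": 100.0 * sum(abs(e) <= 10 for e in errors) / n,
        "pw20": 100.0 * sum(abs(e) <= 20 for e in errors) / n,
    }


def meets_benchmark(pw10, pw20):
    """Benchmark stated later in the Methods: PW10 >70% and PW20 >95%."""
    return pw10 > 70 and pw20 > 95
```

Calling `accuracy_summary(actual, estimated)` with two equal-length lists of weights in kg returns all four statistics at once, which mirrors the reporting this review required of a study before it could be pooled.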

Table 2

Summary and description of weight estimation methodologies described in the literature. Omitted methods included the Carroll method (insufficient data), neonatal applications (out of scope) and various tape systems with no identified primary or validation studies. Systems with only one study evaluating their accuracy were not included in the comparisons with other studies: the Ali formula, Park formula, hanging-leg weight method and the finger-counting method.

	Name	Formula	Restrictions/Limitations/Acceptable accuracy/benefits
Age-based & length-based formulas	Ali formula	Wt=(2.5×Z)+8	Derived in a Trinidadian population of children ≤5 yrs of age in 2012. No validation studies to date. Age restriction 1–5 years of age
	Argall formula	Wt=(3×Z)+6 or [Wt=3×(Z+2)]	Developed from a small UK study in 2003 (300 children). Generally found to underestimate weight, more so in older and heavier children. Age restriction 1–10 years of age
	Advanced Paediatric Life Support formula (new)	Wt=(z/2)+4	For infants ≤12 months of age	Derived in a UK population and adopted in 2011 by the Advanced Life Support Group from a combination of the original APLS and the Luscombe formulas. It was untested and unvalidated at the time of adoption. Generally overestimates weight. Age restriction birth to 12 years of age
		Wt=(2×Z)+8 or [Wt=2×(Z+4)]	For children aged 1–5 years
		Wt=(3×Z)+7	For children aged 6–12 years
	Australian Resuscitation Council formula	Wt=3.5	At birth.	Adopted by the ARC in Australia in 1996. Same as New Zealand Resuscitation Council formula. Generally underestimates weight, more so in older and heavier children. Differing accuracy in different ethnic, socio-economic and international populations. No specific age restriction noted
		Wt=(2×Z)+8	For children aged 1–9 years
		Wt=3.3×Z	For children 10 years and over
	Best Guess formulas	Wt=(z+9)/2	For infants ≤12 months of age	Also known as the Tinning formulas. Derived in an Australian population in 2007 from a retrospective database study of more than 70000 children. Generally overestimates weight, especially in poorer populations. Has been evaluated in several validation studies with mixed results
		Wt=(2×Z)+10 or [Wt=2×(Z+5)]	For children aged 1–5 years
		Wt=4×Z	For children aged 6–14 years
	European Paediatric Life Support formula	Wt=2×(Z+4) or [Wt=(2×Z)+8]	Original population and date of derivation unclear. Generally underestimates weight, more so in older and heavier children. Differing accuracy in different ethnic, socio-economic and international populations. Age restriction 1–10 years of age
	Garwood formula	Wt=(z/4)+6	Developed in a UK population from a sample of 1252 children in 2012. The initial validation study was flawed, but this formula has been subjected to a validation study subsequently (showing poor performance). For children aged 1–16 yrs
	Leffler formulas	Wt=(z+8)/2	For children <1 year of age	Also known as the Tintinalli formula. The origin is unclear, but it became popular after the Leffler study in 1997. Overestimates weight in younger children (≤6 yrs) and underestimates weight in older children (>6 yrs)
	Leffler formulas	Wt=(2×Z)+10	For children aged 1–10 years
	Luscombe formula	Wt=(3×Z)+7	Developed in the UK in 2007 from a large database of nearly 14000 children. Underestimates weight in most populations studied, but significantly overestimates weight in populations from developing countries. Age restriction 1–10 years
	Nelson formulas (originally Weech’s formulas)	Wt=(z+9)/2	For infants 3–12 months	As described in Nelson’s Textbook of Paediatrics. The origin is probably from Weech’s formulas, first reported in 1954 in the USA. The Weech formula is still in use today as one of the standard measurement denominators for determining underweight status. Weight most often overestimated in infants and older children (>6 yrs) and underestimated in younger children (≤6 years)
		Wt=2×(Z+4)	For children aged 1–6 years
		Wt=((Z×7)-5)/2	For children aged 7–12 years
	Shann formulas	Wt=(2×Z)+9	For children aged 1–9 years	Used in Australasia primarily. Origin is unclear. Underestimates weight increasingly with increasing age
	Shann formulas	Wt=(3×Z)	For children aged >9 years
	Theron formula	Wt=e^((0.175571×Z)+2.197099)	Derived in 2005 in New Zealand from a small study of 900 children that included a large number of Pacific Island children (high weight-for-age). The developers intended it for use in children high in the weight-for-age centiles. Age restriction 1–10 years. Overestimates weight in most populations
	Traub-Johnson formula	Wt=2.05×e^(0.02×X)	Derived in 1980 from USA national growth data from 1959. This formula was used to estimate ideal body weight and adjusted body weight, which were used interchangeably. The formula was intended to estimate the 50th centile of weight-for-height. Underestimates total body weight. For children aged 1–18 years
	Traub-Kichen formula	Wt=2.396×1.0188^X	Derived in 1983 in the USA from data from more than 20000 children in the National Centre for Health Statistics database. The formula was intended to estimate the 50th centile of weight-for-height which the developers regarded as an approximation of ideal body weight. Underestimates total body weight. For children over 74 cm and aged 1–17 years
OTHER LENGTH-BASED SYSTEMS	Broselow tape	Weight estimated directly by placing tape next to child and measuring from head to heel. The estimated weight and colour zone is read off the tape	Developed in 1985 in the USA from US growth data and first validated in a sample of just over 900 children in 1988. Several changes have been made over the years: the latest version is the 2011A edition. Underestimates weight except in populations with a high prevalence of poor nutrition. Inaccuracy increases with increasing length / weight. Increased underestimation of weight in obese and overweight individuals. Substantial number of children “too tall for the tape” but who are not at adult weight. Length restriction 46–143 cm. Maximum weight estimation 36 kg
	Blantyre tape	Weight estimated directly by placing tape next to child and measuring from head to heel. The estimated weight is read off the tape	Developed in Malawi using values 85% of the 50th centile of the American National Centre for Health Statistics weight-for-length growth charts. Validated on a sample of 729 children. The developers reported a reasonable accuracy between 4 and 16 kg but the reporting of data was flawed and is unverifiable. Length restriction of 45–130 cm
	Wozniak formulas	Wt=(1.443×U)+(1.596×M)-32.963	Developed in Botswana in 2012 from a sample of 777 children with a high prevalence of HIV infection and growth retardation. Measurements of mid-arm circumference and ulna length or tibia length are used to estimate weight using the formula. The accuracy of the method decreases in children <10 kg and children >40 kg
	Wozniak formulas	Wt=(0.86×T)+(1.715×M)-30.426
	PAWPER tape	Weight estimated directly by placing tape next to child and measuring from head to heel. A habitus score (1–5) is assigned to the child based on body habitus (1=very thin, 3=average, 5=very fat). The estimated weight for that length and habitus score is read off the tape	Developed in 2004 in South Africa based on WHO weight-for-length growth charts and validated on a sample of 453 children in 2013. Estimates weight uniformly across length range of tape. Performs well in children who are under- or overweight. Length restriction 43–153 cm. Maximum weight estimation 47 kg. The extended PAWPER tape accommodates children up to 180 cm in length, a maximum weight estimation of 116 kg and with a 7-point habitus score assessment (habitus scores 6 and 7 were added to accommodate children above the 95% centile of weight-for-length i.e. for obese and severely obese children)
	Mercy method	Humerus length and mid-arm circumference are measured and then used to determine “segmental weights” from a table. Specifically designed “2D” and “3D” tapes may be used, which eliminates the need for a data table	Developed in the USA from a database of 19625 children and validated across several centres in 2012, 2013 and 2014, including in developing countries. Consistently good weight estimation across age and habitus ranges. Decreased accuracy in younger children (<2 years)

Abbreviations: Z = age in years (to the nearest half year; some texts have this value as the age at the last birthday or completed years of age); z = age in months; X = height or length in cm; M = mid-arm circumference in cm; LW = hanging leg-weight in kg; FL = foot length in cm; U = ulna length in cm; T = tibial length in cm.
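As an illustration of how the formulas in Table 2 are applied, the sketch below codes a handful of them in Python using the abbreviations above. It is a hedged example, not a clinical tool: the function names and the range checks are assumptions, the age limits are taken from the restrictions listed in the table, and the Theron formula is read as the exponential of the whole linear term.

```python
import math


def _check_age(age_years, low, high):
    """Reject ages outside the restriction listed for a formula in Table 2."""
    if not low <= age_years <= high:
        raise ValueError(f"age {age_years} yrs outside {low}-{high} yrs")


def luscombe(age_years):
    """Luscombe: Wt = (3 x Z) + 7, age restriction 1-10 years."""
    _check_age(age_years, 1, 10)
    return 3 * age_years + 7


def epls(age_years):
    """European Paediatric Life Support: Wt = 2 x (Z + 4), age 1-10 years."""
    _check_age(age_years, 1, 10)
    return 2 * (age_years + 4)


def theron(age_years):
    """Theron: Wt = e^((0.175571 x Z) + 2.197099), age 1-10 years."""
    _check_age(age_years, 1, 10)
    return math.exp(0.175571 * age_years + 2.197099)


def wozniak_ulna(ulna_cm, mac_cm):
    """Wozniak (ulna): Wt = (1.443 x U) + (1.596 x M) - 32.963."""
    return 1.443 * ulna_cm + 1.596 * mac_cm - 32.963


def wozniak_tibia(tibia_cm, mac_cm):
    """Wozniak (tibia): Wt = (0.86 x T) + (1.715 x M) - 30.426."""
    return 0.86 * tibia_cm + 1.715 * mac_cm - 30.426
```

For the same 4-year-old, `luscombe(4)` and `epls(4)` differ by 3 kg, which shows how much the choice of one-dimensional formula alone can shift an estimate before habitus is considered.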

Fig. 1

PRISMA flow-chart of the meta-analysis design and study selection.


Data abstraction and analysis

Standard statistics for meta-analysis of method-comparison studies were used, with an emphasis on evaluating accuracy (pooled categorical data – PW10), bias (pooled mean differences – mean percentage error) and precision (pooled variance – limits of agreement) [32], [33]. The included studies showed a large amount of within-study variance as well as between-study variance that needed to be considered. Two methods of representing the pooled parametric and non-parametric data were therefore employed: a fixed effects model weighted by inverse variance (method 1) and a random effects model (method 2). The statistical analysis and data reporting of many of the evaluated studies were incomplete or unusable (e.g. mean percentage error and/or measures of variance not reported, or PW10 not reported). Important descriptive statistics frequently needed to be imputed to enable inclusion into the meta-analysis pool. Missing data were imputed using standard methodologies when possible [34]. The use of absolute differences (in kg) between actual weight and estimated weight was considered statistically unsound because this could not account for the large differences in the variances between infants or younger children and older children [23], [35]. Studies that presented only these data were excluded. Adequate performance of a weight estimation system was defined by a benchmark accuracy indicator of a PW10 >70% and a PW20 >95% [35].
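The two pooling models can be sketched as follows. This is a minimal Python illustration, not the authors' Stata or RevMan procedure: the text does not name the random effects estimator, so the DerSimonian-Laird method is assumed here, and the function names and per-study inputs (each study's mean percentage error and the variance of that mean, i.e. its squared standard error) are hypothetical.

```python
import math


def fixed_effect(effects, variances):
    """Inverse-variance weighted pooled estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, se


def random_effects_dl(effects, variances):
    """Random effects pooling with the DerSimonian-Laird tau^2 (assumed estimator)."""
    weights = [1.0 / v for v in variances]
    pooled_fe, _ = fixed_effect(effects, variances)
    # Cochran's Q: weighted squared deviations from the fixed effect estimate
    q = sum(w * (y - pooled_fe) ** 2 for w, y in zip(weights, effects))
    k = len(effects)
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Each study's weight now includes the between-study variance
    weights_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(weights_re, effects)) / sum(weights_re)
    se = math.sqrt(1.0 / sum(weights_re))
    return pooled, se, tau2
```

When the studies disagree by more than their within-study variances explain, `tau2` becomes positive, the weights even out across studies and the pooled standard error widens, which is why a random effects model suits the large between-study variance described above.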

Subgroup analysis

There was considerable heterogeneity in the use and composition of subgroups within the included studies. There was insufficient data to allow for statistical consolidation and analysis of subgroups.

Software

Statistical analysis was performed using Stata (StataCorp. 2015. Stata Statistical Software: Release 14. College Station, TX: StataCorp LP), GraphPad Prism (GraphPad Prism version 8.00 for Mac, GraphPad Software, La Jolla California USA, www.graphpad.com) and Review Manager (Review Manager (RevMan) [Computer program]. Version 5.3. Copenhagen: The Nordic Cochrane Centre, The Cochrane Collaboration, 2014).

Results

Study selection

From the 171 studies identified and screened in the literature search, 147 were examined in detail (see Fig. 1). A total of 25 articles were included in the quantitative meta-analysis, with 31,392 patients (28,644 in 20 prospective studies and 457,859 in 3 retrospective studies). Of these, 11 articles contained data suitable for direct pair-wise comparisons of two or more methods of weight estimation.

Excluded studies

Limited and incomplete data presentation and statistical analysis were the most common reasons for excluding potentially relevant studies. Many studies contained incorrectly used or interpreted statistical analysis, which made their findings and interpretation unreliable. A study was included in the meta-analysis if its data presentation was adequate to allow for data imputation.

Origin of included studies

Articles included for the meta-analysis originated widely: Africa (52%), Central Asia (30%), South-eastern Asia (11%) and Central America and the Caribbean (7%). The countries represented by studies in this review are listed in Table 1, together with data on the prevalence of overweight, obesity and underweight within those countries and regions.

Table 1

Prevalence of underweight, overweight and obese children in the countries and regions represented in this study. Data from three developed countries is shown for comparison.

Country	Prevalence of overweight and obesity (age 2–19) 2013¹ %	Prevalence of obesity (age 2–19) 2013¹ %	Prevalence of underweight by region 2015²,³ %	Prevalence of underweight (age <5) 2000–2014²,³ %
Botswana	14.4	4.5	13.3	11.2
Egypt	35.4	13.6	4.2	6.8
India	5.2	2.4	28.7	43.5
Iran	23.9	6.5	9.2	4.6
Kenya	11.3	2.8	23.6	16.4
Malawi	18.4	6.2	13.3	16.7
Mali	11.6	3.8	26.2	27.9
Mexico	28.8	10.1	3.4	2.8
Philippines	5.5	2.4	17.9	20.2
South Africa	22.5	8.3	13.3	8.7
Sudan	12.8	5.7	33.3	27.6
Thailand	14.4	5.2	17.9	9.2
Trinidad	20.2	7.5	2.8	4.4

Australia	23.7	7.1	0.9	0.2
USA	29.2	12.9	0.9	0.5
UK	27.6	7.7	0.9	0.9

¹ Global Burden of Disease Study 2013. Global Burden of Disease Study 2013 (GBD 2013) Obesity Prevalence 1990–2013. Seattle, United States: Institute for Health Metrics and Evaluation (IHME); 2014.

² de Onis M, Blossner M, Borghi E, Frongillo EA, Morris R. Estimates of global prevalence of childhood underweight in 1990 and 2015. JAMA. 2004;291(21):2600–2606.

³ Prevalence of underweight, weight for age (% of children under 5): The World Bank; 2016 [cited 2017]. Available from: http://data.worldbank.org/indicator/SH.STA.MALN.ZS.

Risk of bias within and across studies

There is no recognised systematic method of evaluating the risk of bias in the observational studies reviewed in this meta-analysis. The risk of performance bias was common because it is difficult to blind measuring personnel when using devices such as the Broselow tape to estimate weight (although blinding can be achieved through masking the Broselow tape when assessing accuracy of colour zone measurement). There was also risk of detection bias, with unblinded outcome assessors, but most of the data was in objective, numerical form, and this lack of blinding was considered unlikely to cause significant bias. Reporting bias was countered by including all identified, methodologically-sound studies (published or not). Methodological sources of potential bias were common (e.g. if the Broselow tape was not actually used to estimate weight), but these studies were independently assessed and rated according to the individual risk of bias. Repeat evaluation of pooled data with and without studies at risk of this bias returned similar outcomes. Studies at high risk of bias were excluded from the meta-analysis. Table 3 contains a summary of the articles included in this review, including a brief narrative of the major findings and limitations of each study, the risk of bias assessment and level of evidence for each study.
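The repeat evaluation with and without studies at risk of bias amounts to a simple sensitivity analysis, sketched below in Python. The pooling used here is a plain sample-size-weighted PW10, which is an assumption and simpler than the models used in this review, and the dictionary keys (`n`, `pw10`, `at_risk`) are hypothetical names chosen for the example.

```python
def pooled_pw10(studies):
    """Sample-size-weighted pooled PW10 (%) across studies."""
    total_n = sum(s["n"] for s in studies)
    # Reconstruct the number of estimates within 10% from each study's PW10
    within_10 = sum(s["n"] * s["pw10"] / 100.0 for s in studies)
    return 100.0 * within_10 / total_n


def sensitivity_analysis(studies):
    """Pooled PW10 for all studies and for the studies not flagged as at risk."""
    not_at_risk = [s for s in studies if not s["at_risk"]]
    return pooled_pw10(studies), pooled_pw10(not_at_risk)
```

If the two returned values are close, excluding the flagged studies does not change the conclusion, which is the pattern reported above.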

Table 3

Author and date	Study size (N)	Country	Design	Patient ages	Estimation techniques evaluated	Statistics	Target	Data	LOE	Risk of Bias	Major findings; comments; major limitations
Molyneux 1999 [36]	142	Malawi	P	8 mo to 5 yrs	Blantyre tape, HCP guesses	PW20	<20%^†	Yes	5	M: U; O: L	Findings: HCP guesses were very inaccurate; Blantyre tape better than guesses. A 20% error considered an acceptable target. Comments: Very young study population, mostly under 5 years. Limitations: Limited statistical analysis, inadequate detail of reporting

Bavdekar 2006 [37]	500	India	P	0–2 yrs	Novel formula based on foot length	C, other	None	No	5	S: H; M: H; O: H	Findings: Foot length can be used for emergency drug calculation. Comments: Only 10% of population >1 yr old. Drug doses used as endpoint. Limitations: Very poor statistics that do not support the main finding. Method of data analysis made findings unreliable and uninterpretable – bias confused with accuracy.

Varghese 2006 [38]	500	India	P	1–12 yrs	Argall, EPLS, Nelson, BT	C, MD	None	No	5	S: H; M: U; O: H	Findings: Formulas overestimated weight in this developing-world study. The BT was the most accurate. Comments: More than half the study population was under 6 months of age and only 8% >5 yrs. Limitations: Limited statistical and data analysis made findings unreliable

Pollock 2007 [39]	100	Malawi	P	1–7 yrs	EPLS, Luscombe	MPE	None	Yes	4	S: U; M: U; O: L	Findings: The Luscombe formula was less accurate than EPLS with greater overestimation of weight. The authors suggested length-based systems should rather be used. Comments: Scientific letter. Both formulas performed very poorly with significant overestimation of weight. Limitations: Incomplete statistical reporting and analysis

Ramarajan 2008 [40]	548	India	P	0–12 yrs	BT	MD, LOA, MPE, PW10	<10%^†	Yes	3	O: L	Findings: BT overestimated weight by >10% in Indian children over 10 kg. A correction factor was developed, but not validated. Comments: One of the few studies to show overestimation of weight by BT. Limitations: BT version NR. Incomplete statistical analysis and data presentation

Cattamanchi 2009 [41]	15,000	India	P	2 mo to 12 yrs	BT	C, MD, LOA, MPE, PW10	<10%^*	Yes	3	O: L	Findings: The BT performed well, especially in children <10 kg but underestimated all others, especially in children >18 kg. Comments: Abstract. Very large prospective study. The authors recommended a new version of BT for Indian children because of underestimation of weight. Limitations: BT version NR. Incomplete statistical analysis and some misinterpretation of data

Geduld 2011 [42]	2832	South Africa	P	0–10 yrs	EPLS, Luscombe, BG, BT	MD, LOA, MPE, PW10	<10%^*	Yes	4	O: L	Findings: BT and EPLS formula most accurate in this population. Careful titration of drugs and use of clinical judgment most important in using medications safely. Comments: Data from a poor community in South Africa. The accuracy of EPLS formula was the best ever reported while the accuracy of BT was on par with other reports. Only 4% of children excluded as too tall for the tape. The authors question whether the differences in accuracy of any weight estimation system are likely to affect outcomes. Limitations: BT not actually used and BT version NR

Ali 2012 [43]	1723	Trinidad	R	1–5 yrs	EPLS, Luscombe, new formula	R, MD, LOA, MPE, PW10	<10%^†	Yes	4	S: U; M: U; O: L	Findings: All formulas performed similarly and all poorly, even the new formula derived from the study population. Comments: Only children aged 1–5 included in study. Limitations: No validation sample for derived formula. Incomplete statistics

Trakulsrichai 2012 [44]	595	Thailand	P	0–12 yrs	BT, parental estimates, growth charts	MD, PW10	<10%^†	Yes	4	S: U; M: U; O: L	Findings: Family member estimation was most accurate and the BT the most accurate of other weight estimation methods. Comments: Equal underestimation and overestimation by BT while family estimates tended to overestimate. Limitations: Incomplete statistics. BT version NR.

Wozniak 2012 [25]	777	Botswana	P	18 mo to 12 yrs	EPLS, Luscombe, Theron, Cattermole, BT	PW10, PW15, % correct zone	<10%^†	Yes	4	M: U; O: L	Findings: Prediction models incorporating MAC and either tibia or ulna length performed extremely well. Age-based formulas were very inaccurate. Comments: Masters dissertation. Weight was markedly overestimated by formulas in this population with a high prevalence of HIV. Limitations: Incomplete statistics

Akabarian 2013 [45]	403	Iran	P	0–14 yrs	BT, parental estimates	PW10, PW15	<10%^*	Yes	5	S: U; M: U; O: L	Findings: The BT was less accurate than parental estimates but recommended nonetheless. Comments: Article in Arabic. Very good performance of BT compared to previous studies. Limitations: Exclusion criterion of weight >35 kg limited the assessment of BT accuracy. Limited statistical analysis and data presentation. BT version NR

Clark 2013 [20], [46]	583	Sudan	R	6 mo to 5 yrs	BT	MPE, PW10, colour zones	<10%^**	Yes	4	M: H; O: H	Findings: The BT performed very poorly. Comments: Abstract. Study in South Sudan, the “hungriest place on earth” where 61% of study population was malnourished. There was up to a two colour-zone overestimation in severely malnourished children with only 26% agreement in normally nourished children. Very poor performance of the BT. Dangerous overestimation of weight in undernourished children. Limitations: Very limited statistical analysis and data presentation. BT not actually used and BT version NR

Hegazy 2013 [47]	508	Egypt	P	1–16 yrs	EPLS, Shann, Garwood formula	MD, MPE, PW10, PW20	<10%^†	Yes	5	S: U; M: U; O: L	Findings: Garwood formula performed best, especially in older children. Comments: Sample population of cancer patients. Very poor performance of all formulas tested – none were close to acceptable accuracy. Limitations: Formulas used for children older than intended. Poor interpretation of findings

House 2013 [48]	967	Kenya	P	0–14 yrs	BT, EPLS, Nelson	C, R, MD, LOA, MPE	MPE <10%	Yes	5	M: U; O: L	Findings: BT performed better than formulas and a measure of habitus assessment (e.g. MAC) was suggested. BT should be used rather than formulas. Comments: BT2007B. Underestimation of weight predominated. Limitations: Incomplete statistical analysis and data presentation. Poor interpretation of results from previous studies; flawed outcome measures used (indicators of bias only)

Wells 2013 [5]	453	South Africa	P	0–12 yrs	BT, PAWPER tape	MD, LOA, RMSE, MPE, PELOA, RMSPE, PW10, PW20	<10%^†	Yes	3	S: U; O: L	Findings: PT performed better than BT in every category analysed and better than any previously published system. Comments: BT2007B. Population with mixed under- and overweight. Multi-centre study of habitus-modified length-based weight estimation. Limitations: Assessment of body habitus a “guess”.

Omisanjo 2014 [49]	2754	Nigeria	P	1 mo to 11 yrs	Best Guess, Nelson	MPE, PELOA	MPE < ±5%	Yes	4	S: L; M: U; O: L	Findings: Neither formula was accurate in Nigerian children – substantial overestimation of weight. Comments: This was one of the largest prospective studies of age-based formulas in the developing world. Limitations: Some limitations from incomplete statistics

Batmanabane 2014 [18]	374	India	P	0–16 yrs	EPLS, ARC, Argall, BG, BT, Cattermole, Leffler, Luscombe, Nelson, Shann, TJ, TK, Mercy method	MD, RMSE, MPE, PW10, PW20	None	Yes	3	M: U; O: L	Findings: Mercy method performed as well in Indian children as previously shown in Western populations. Comments: Good statistics. Overestimation of weight by all methods except Mercy. Limitations: BT not actually used and BT version NR.

Chiengkriwate 2014 [50]	3869	Thailand	R	0–15 yrs	BT	C, R, MD, MPE, PELOA, PW10	<10%^*	Yes	4	S: L; M: U; O: L	Findings: BT underestimated weight in Thai children, more so in older children, as in Western populations. Comments: BT 2007 edition A. BT performance consistently at a PW10 of just below 60%. Limitations: BT not actually used

Dicko 2014 [19]	473	Mali	P	0–16 yrs	Mercy, EPLS, ARC, BT, Nelson	MD, LOA, RMSE, MPE, PELOA, PW10, PW20	None	Yes	3	M: U; O: L	Findings: Mercy method performed extremely well in this population in Mali, like its performance elsewhere in the world. Other methods overestimated weight. Comments: BMI of study population 15.6, with 22% underweight and 1.7% overweight or obese. Good inter-rater reliability. Limitations: Some limitations in statistics. BT not actually used and version NR

Eke 2014 [3]	370	Nigeria	P	1–12 yrs	APLS	C	None	No	5	S: LM: HO: H	Findings: The APLS formula underestimated weight in these Nigerian children. Comments: The findings are unreliable and the conclusions consequently unsupportable in view of the fatal methodological flaws. Limitations:Statistical analysis fatally flawed

Asskaryar 2015 [51]	1185	India	P	1 mo to 12 yrs	BT	MPE, PW10	<10%^†	Yes	5	S: UM: UO: L	Findings: BT significantly overestimated weight in Indian children. An 8% modification of the tape improved its accuracy. Comments: BT 2007 edition B. The improved performance was not substantially better than the original and was still below acceptable performance. Recalibrating bias alone is not enough when precision is low. Limitations:Limited data presentation and statistical analysis

Badeli 2015 [52]	216	Iran	P	1–10 yrs	DWEM, Oakley, TJ, TK, MAC, Theron, Leffler, EPLS, HCP guess, parental estimate	MPE, ICC	None	No	7	S: UM: HO: H	Findings: The authors reported that HCP guesses and EPLS formula were more accurate than other methods. Comments: The findings are completely unreliable and the conclusions consequently unreasonable. Limitations:Statistical analysis fatally flawed, so much as to provide findings opposite to likely true results. Analysis of bias confused with accuracy

Khouli 2015 [53]	815	Mexico	P	0–12 yrs	BT	C, MD, LOA, MPE, PELOA, PW10	None	Yes	4	S: LM: UO: L	Findings: BT not accurate in Mexican population. Comments: Reasonable statistics. Both under- and overestimation found to be a problem. Nearly half of children had at least one colour zone error. Limitations:BT not actually used and BT version NR. Poor interpretation of statistics.

Young 2015 [54]	207	Philippines	P	1–9 yrs	EPLS, APLS, Luscombe, BG, finger counting, BT	MD, LOA	None	No	5	S: UM: UO: U	Findings: BT performed best in this population, updated APLS formula performed worst. Comments: BT 2011 edition A. Limitations:BT not actually used. Very limited and incomplete statistical analysis. Not useful for comparison to other studies or for inclusion into meta-analysis

AlHarbi 2016 [55]	3537	Saudi Arabia	P	1 mo to 12 yrs	BT 2007B and BT 2011A	C, ICC, MD, LOA	None	No	5	S: UM: HO: U	Findings: BT2011A performed better than BT2007B in this population. The authors suggested that the tapes were accurate. Comments: The method of statistical analysis does not support the conclusions drawn in this paper. Limitations:Unclear if the BT was used or if derived from length measurements. Limited, incomplete and inappropriate statistical analysis. Not useful for comparison to other studies or for inclusion into meta-analysis

Aliyu 2016 [56]	300	Nigeria	P	0–5 yrs	BT, APLS	MD, LOA, MPE, PW10	None	Yes	4	S: UM: UO: L	Findings: BT and APLS formulas performed well in this population. Comments: Contrary to the authors' conclusions, although the BT and APLS formula performed similarly, they were both inaccurate. Limitations: Not clear if BT used, or if derived from length measurements. BT edition NR. Incomplete statistical analysis

Bowen 2016 [57]	1381	Zambia	P	0–14 yrs	BT, APLS, EPLS, ARC, Argall, BG, CAWR, Garwood, Luscombe, Michigan, Nelson, Park, Shann, Theron, Tintinalli	MD, LOA, MPE, PW10, PW20	<10%^*	Yes	3	S: UM: UO: L	Findings: BT performed better than every formula in this population, BG and Michigan formulas performed worst. Comments: None of the methods were accurate and all methods overestimated weight. Limitations:BT not actually used. BT version NR

Georgoulas 2016 [12]	300	South Africa	P	1 mo to 12 yrs	BT, PAWPER, Wozniak, Mercy	MD, LOA, MPE, PELOA, PW10, PW20	<10%^*	Yes	3	S: LM: UO: L	Findings: PAWPER tape performed best, but good performances from Wozniak and Mercy methods. BT was worst performer. Comments: Unpublished data. BT 2011 edition A. Poor population with high proportion of underweight children. First comparative study of these methodologies. Relatively weaker performance by all methods in infants, but Wozniak especially was very weak. Limitations:Assessment of body habitus done by single researcher

Mishra 2016 [58]	603	India	P	0–10 yrs	BT	C, colour zone	None	No	7	S: LM: UO: U	Findings: BT performed best in smallest children. Comments: BT 2007 edition B. Limitations: Very limited and incomplete statistical analysis. Not useful for comparison to other studies or for inclusion into meta-analysis

Ralston 2016 [59]	453990	Multicentre International	R	6 mo to 5 yrs	BT, MAC, height + MAC model	MD, LOA, MPE, PW10, PW25	None	Yes	4	S: LM: UO: L	Findings: A model incorporating height and MAC was the most accurate. BT 2011 edition A less accurate than 2007 edition B in this population. Comments: BT 2007 edition B and 2011 edition A. A very large multinational database study. Limitations: BT not actually used

Sahar 2016 [60]	1163	Malaysia	P	0–12 yrs	BT	MPE, PELOA	None	Yes	4	S: UM: UO: L	Findings: BT underestimated weight in small children and overestimated weight in older children. It was not accurate. Comments: As with studies elsewhere there was a large variation in accuracy. Limitations:BT version NR. Limited and incomplete statistical analysis

Wells 2017 [27]	328	South Africa	P	0–16 yrs	BT, PAWPER, Wozniak, Mercy	MD, LOA, MPE, PELOA, PW10, PW20	<10%^†	Yes	3	S: UM: LO: L	Findings: The BT performed poorly in this study, the Wozniak method and the Mercy method showed intermediate accuracy and the PAWPER was most accurate overall. Comments: Unpublished data. This was a population selected for older children and for children with deviations from “average” weight-for-length. The extended PAWPER tape worked well in this population. Limitations:The Mercy method was used in a simulated resuscitation setting (supine children) which may have affected its accuracy

Whitfield 2017 [24]											As for Wozniak 2012

Level of Evidence (LOE)

Level 1 – Randomized clinical trials or meta-analyses of multiple clinical trials with substantial treatment effects

Level 2 – Randomized clinical trials with smaller or less significant treatment effects

Level 3 – Prospective, controlled, non-randomized cohort studies

Level 4 – Historic, non-randomized cohort or case-control studies (retrospective from chart)

Level 5 – Case series; patients compiled in serial fashion, control group lacking (inappropriate exclusion of cases)

Level 6 – Animal studies or mechanical model studies or adult studies applied to children

Level 7 – Extrapolations from existing data collected for other purposes, theoretical analyses

Level 8 – Rational conjecture (common sense); common practices accepted before evidence-based guidelines

If data analysis was not appropriate for method-comparison studies – no assessment of bias, precision (with confidence intervals or measure of variance) and accuracy – then the LOE was downgraded one level. If a study did not group ages and/or weight categories appropriately, or alternatively use percentage error analysis or logarithmic transformation, then the LOE was downgraded one level. Studies that were downgraded on this basis are identified by a double-dagger superscript‡ in the LOE column.

The risk of bias assessment was made using standard principles as follows:

Risk of Bias

Only bias that was not considered to be “low” is indicated in the table

S – selection bias; no randomisation in any study that was screened (appropriately), so any form of systematic or preferential selection was flagged

M – methodological bias; methodological flaws which might have affected the results and impact on the meta-analysis were flagged

O – overall bias; the impact of potential bias on overall findings and impact on meta-analysis was assessed and indicated

L – low

U – unknown

H – high

Abbreviations: Pro (prospective study), Retro (retrospective or virtual study), EPLS (European paediatric life support formula), BG (best guess formula), ARC (Australian resuscitation council formula), APLS (new advanced paediatric life support formula), CAWR (Chinese age-weight rule), MAC (mid-arm circumference).

Studies included in the qualitative review and quantitative meta-analysis, in chronological order. A summary of findings as well as a short commentary on significant aspects is included. In the description of target accuracy, some studies used an implied weight-estimation target (indicated by an asterisk*) and some expressed a clear, strong opinion (indicated by a dagger†). The level of evidence was assessed using a system modified from that used by the European Resuscitation Council. The LOE provided an overall index of the reliability of an individual study.

Meta-analysis data on bias (trueness), precision and accuracy of paediatric weight estimation systems

Table 3 contains a description of each of the weight estimation systems evaluated, as well as any restrictions on their applicability. The raw and imputed data for each of the weight-estimation methodologies included in the meta-analysis are shown in the supplementary Table. Of the 17 studies that indicated a target for weight estimation accuracy, 16 favoured an error <10% as desirable, while one study supported a <20% error.

Supplementary Fig. A shows the forest plots of the pooled data for each of the weight estimation systems. The following main findings can be summarised from these figures:

Bias was highly variable between populations for age-based formulas, but with an overall pattern of overestimation of weight. There was substantial overestimation of weight from the new Advanced Paediatric Life Support (APLS), Australian Resuscitation Council (ARC), Best Guess (BG), Luscombe and Nelson formulas. The Broselow tape also showed large variations in bias between studies while bias was very close to zero for the two-dimensional systems.

There was very poor precision (wide limits of agreement) within as well as wide variability between studies for all one-dimensional weight estimation systems, especially the age-based formulas. Differences between populations were primarily disparities in bias, but with consistently poor precision. The two-dimensional systems, except for the Wozniak method, showed narrow limits of agreement, with a PW10 >70% and a PW20 >95%.

Overall, no age-based formula performed well. The Broselow tape was noticeably better than the formulas, although still with limits of agreement wider than desirable. Of the habitus-based systems, the PAWPER tape and the Mercy method performed best and the Wozniak method less well with more outlier estimations.

There was no significant trend detected for any method in terms of change in performance over time.

The studies from the poorest countries (such as Sudan and Botswana), with very underweight populations, showed the greatest overestimation of weight by the one-dimensional systems.

Supplementary Fig. B shows the PW10 data (accuracy), with the following notable findings:

Only the PAWPER tape, parental estimates, the Wozniak method and the Mercy method exhibited reasonable accuracy (PW10 >70%).

The results were similar to those described for the parametric data: wide inter-study variation, poor performance of age-based formulas, intermediate performance of the Broselow tape and good performance of the two-dimensional systems and parental estimates.

Fig. 2 shows the comparison between the bias, precision and accuracy for the pooled data for each of the major weight-estimation systems. The important findings were:

Fig. 2

Bar chart and forest plot showing the pooled, random-effects data of all weight estimation systems evaluated. The studies are ordered from bottom to top according to decreasing variance (MPE data) and increasing accuracy (PW10 data) respectively. The number of studies included for each system is indicated. The vertical red line on the bar chart indicates the threshold for acceptable accuracy of a weight estimation system. The shaded green area and dashed lines on the forest plot indicate the maximum acceptable MPE and PELOA benchmark, respectively.

Most age-based formulas had a large to very large bias towards overestimation of weight, with the European Paediatric Life Support (EPLS) formula the least biased. There was a notably lesser bias for the Broselow tape and the two-dimensional methods.

There were wide limits of agreement for all methods other than the PAWPER tape and the Mercy method. The poorer precision of the Wozniak method (compared to the other two-dimensional methods), with a high PW10, indicated many outlier estimations, probably because of its poor performance in infants.

Age-based formula systems occupied the 12 worst-performing positions in the parametric analysis and 13 of the 15 worst-performing positions in the non-parametric analysis. No formula had a PW10 accuracy of better than 48%. The new APLS formula was much less accurate than the EPLS formula, which it superseded when the EPLS formula was considered to have become too inaccurate.

Two-dimensional habitus-modified systems occupied three of the top four PW10 positions and the top two mean percentage error/limits of agreement positions. Parental estimates and the Broselow tape were the best performing, non-habitus modified systems.

If a PW10 >70% and PW20 >95% were used as a benchmark of acceptable accuracy, only parental estimates, the PAWPER tape, the Mercy method and the Wozniak method would have achieved this standard.

Fig. 3 shows the results of the direct statistical comparisons between weight estimation systems (from studies where pair-wise data could be pooled) using non-parametric measures of accuracy (PW10). There was no significant difference between most of the age formulas, with one exception, as shown in the figure, with a small effect size (odds ratio of 2.1). The Broselow tape was significantly more accurate than every age-based formula with which it was compared, with a medium effect size (odds ratio around 4). The Wozniak method, the Mercy method and the PAWPER tape significantly outperformed the Broselow tape with a small to medium effect size (odds ratio 2.0, 3.4 and 5.3 respectively) and the PAWPER tape was significantly better than the Mercy method with a small effect size (odds ratio 2.3).
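The effect sizes quoted above are odds ratios on the PW10 outcome. As a rough illustration only, the sketch below computes an odds ratio and an approximate 95% confidence interval from two sets of counts. The counts and the function name `pw10_odds_ratio` are hypothetical, and the calculation treats the two groups as independent samples, which is a simplification of the pooled, paired analysis used in the meta-analysis.

```python
import math

def pw10_odds_ratio(hits_a, n_a, hits_b, n_b, z=1.96):
    """Odds ratio (system A vs system B) for an estimate within 10% of actual weight.

    hits_* = number of estimates within 10% of measured weight, n_* = children assessed.
    Uses the standard error of the log odds ratio (Woolf method) for the interval.
    """
    miss_a = n_a - hits_a
    miss_b = n_b - hits_b
    if min(hits_a, miss_a, hits_b, miss_b) <= 0:
        raise ValueError("all four cells must be positive for this simple formula")
    odds_ratio = (hits_a / miss_a) / (hits_b / miss_b)
    se_log_or = math.sqrt(1 / hits_a + 1 / miss_a + 1 / hits_b + 1 / miss_b)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts: system A within 10% in 80 of 100 children, system B in 50 of 100.
or_value, ci_low, ci_high = pw10_odds_ratio(80, 100, 50, 100)
print(f"OR = {or_value:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

An interval that excludes 1 corresponds to the significance criterion used in the direct comparisons.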

Fig. 3

Direct comparisons between weight-estimation systems using pooled, paired data. The PW10 statistic was used with an inverse variance meta-analysis, employing a random-effects model. This model was selected because of non-uniform samples with high inter- and intra-sample variability. Outcomes where the total or pooled result does not cross 1 were considered statistically significant.

The results of the pooled data analyses are shown in Table 4. There were very few substantial differences between the fixed effects (FE) and random effects (RE) analyses. These differences were not large enough to raise uncertainties about the overall outcomes.
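To make the difference between the FE and RE analyses concrete, the following sketch pools study-level effects by inverse variance weighting. It assumes the DerSimonian-Laird estimator for the between-study variance; the text does not state which estimator was used, so this choice, the function name `pool_inverse_variance` and the example numbers are all illustrative assumptions.

```python
import math

def pool_inverse_variance(effects, variances):
    """Pool study effects with fixed-effect and random-effects weights.

    effects: per-study estimates (for example log odds ratios or MPE values)
    variances: the corresponding within-study variances
    """
    if len(effects) != len(variances) or not effects:
        raise ValueError("effects and variances must be non-empty and equal in length")
    k = len(effects)
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    # DerSimonian-Laird between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    random_effect = sum(w * y for w, y in zip(re_weights, effects)) / sum_re
    return {
        "fixed": fixed,
        "fixed_se": math.sqrt(1.0 / sum_w),
        "random": random_effect,
        "random_se": math.sqrt(1.0 / sum_re),
        "tau2": tau2,
    }

# Hypothetical effects from two studies with equal within-study variance.
pooled = pool_inverse_variance([0.0, 2.0], [1.0, 1.0])
print(pooled)
```

When the studies disagree, the random-effects standard error is larger than the fixed-effect one, which is why heterogeneous samples favour the random-effects model.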

Table 4

Weight estimation meta-analysis summary data. This table contains both fixed effects (FE) and random effects (RE) data. There was no subgroup data available.

	System	MPE mean (95% CI)	LLOA (95% CI)	ULOA (95% CI)	Number of children (number of studies)	PW10 (95% CI)	Number of children (number of studies)
Age-based weight estimation formulas	Ali formula	−3.1 (−3.9, −2.3)	−36.2 (−37.7, −34.7)	30.0 (28.5, 31.5)	1723 (1)	47.5 (45.1, 49.9)	1723 (1)
	APLS formula (new)	FE 13.9 (12.4, 15.5); RE 13.5 (12.0, 14.9)	FE −34.7 (−37.6, −31.7); RE −31.5 (−34.2, −28.7)	FE 62.6 (59.6, 65.5); RE 58.4 (55.7, 61.2)	945 (3)	FE 27.6 (25.9, 29.2); RE 29.0 (27.3, 30.6)	2820 (5)
	ARC formula	FE 11.9 (10.5, 13.3); RE 14.0 (12.4, 15.5)	FE −28.9 (−31.6, −26.2); RE −30.4 (−33.4, −27.5)	FE 52.7 (50.0, 55.4); RE 58.3 (55.4, 61.3)	796 (2)	FE 35.2 (33.0, 37.3); RE 33.5 (31.4, 35.7)	1867 (3)
	Argall formula	31.5 (27.6, 35.4)	−29.3 (−36.5, −22.0)	92.3 (85.0, 99.5)	249 (1)	FE 22.4 (20.1, 24.8); RE 17.9 (15.7, 20.0)	1201 (2)
	Best Guess formula	FE 20.1 (19.5, 20.6); RE 25.8 (25.2, 26.5)	FE −24.9 (−25.9, −23.8); RE −25.3 (−26.7, −24.1)	FE 65.0 (64.0, 66.1); RE 77.0 (75.6, 78.2)	6233 (4)	FE 22.7 (21.8, 23.6); RE 17.8 (16.9, 18.6)	8197 (6)
	EPLS formula	FE −2.1 (−2.5, −1.7); RE 3.4 (2.9, 3.8)	FE −37.2 (−38.0, −36.4); RE −36.4 (−37.3, −35.5)	FE 33.0 (32.2, 33.8); RE 43.1 (42.2, 44.0)	6565 (7)	FE 46.2 (45.1, 47.3); RE 39.5 (38.4, 40.5)	8167 (9)
	Garwood formula	14.4 (11.6, 17.2)	−41.3 (−46.5, −36.0)	70.1 (64.8, 75.3)	394 (1)	FE 27.5 (25.2, 29.8); RE 30.1 (27.8, 32.4)	1465 (2)
	Leffler formula	27.8 (24.2, 31.4)	−28.1 (−34.7, −21.4)	83.7 (77.0, 90.3)	247 (1)	FE 20.3 (18.1, 22.5); RE 16.8 (14.7, 18.8)	1271 (2)
	Luscombe formula	FE 9.9 (9.3, 10.5); RE 18.1 (17.5, 18.7)	FE −31.7 (−32.8, −30.5); RE −26.0 (−27.2, −24.9)	FE 51.5 (50.3, 52.6); RE 62.2 (61.1, 63.4)	4909 (4)	FE 30.8 (29.7, 32.0); RE 23.0 (22.0, 24.0)	6387 (6)
	Nelson formula	FE 10.0 (9.4, 10.6); RE 16.2 (15.5, 16.8)	FE −28.0 (−29.0, −26.9); RE −26.6 (−28.0, −25.4)	FE 47.9 (46.9, 49.0); RE 59.0 (57.6, 60.2)	4419 (4)	FE 36.5 (35.3, 37.7); RE 28.8 (27.7, 30.0)	6026 (6)
	Shann formula	FE 5.4 (3.5, 7.3); RE 6.6 (4.6, 8.5)	FE −46.7 (−50.3, −43.2); RE −45.8 (−55.3, −62.5)	FE 57.5 (54.0, 61.1); RE 58.9 (55.3, 62.5)	744 (2)	FE 31.0 (28.9, 33.1); RE 30.8 (28.7, 33.0)	1815 (3)
	Theron formula	51.4 (46.1, 56.7)	−32.1 (−42.0, −22.2)	135 (125, 145)	249 (1)	FE 14.4 (12.8, 16.1); RE 11.8 (10.3, 13.3)	1700 (3)

1D length-based systems	Broselow tape	FE 4.5 (4.5, 4.5); RE 2.9 (2.8, 2.9)	FE −16.3 (−16.3, −16.2); RE −25.4 (−25.5, −25.3)	FE 25.2 (25.2, 25.3); RE 31.1 (30.9, 31.2)	467020 (14)	FE 62.6 (62.4, 62.7); RE 47.1 (47.0, 47.2)	484883 (19)
1D length-based systems	Growth-charts					51.4 (47.4, 55.4)	595 (1)

2D systems	PAWPER tape	FE 0.7 (0.3, 1.1); RE 0.9 (0.5, 1.3)	FE −12.4 (−13.2, −11.7); RE −12.6 (−13.3, −11.8)	FE 13.8 (13.1, 14.6); RE 14.4 (13.6, 15.1)	1081 (3)	FE 87.1 (85.1, 89.1); RE 86.9 (84.9, 88.9)	1081 (3)
	Mercy Method	FE −1.2 (−1.7, −0.7); RE −1.1 (−1.6, −0.5)	FE −18.8 (−19.7, −18.0); RE −18.7 (−19.5, −17.8)	FE 16.4 (15.6, 17.3); RE 16.4 (15.6, 17.3)	1475 (4)	FE 71.2 (68.9, 73.5); RE 71.4 (69.1, 73.7)	1475 (4)
	Wozniak method	FE −3.8 (−5.1, −2.5); RE −3.8 (−5.1, −2.5)	FE −36.1 (−38.6, −33.7); RE −36.1 (−38.6, −33.7)	FE 28.5 (26.1, 31.0); RE 28.5 (26.1, 31.0)	628 (2)	FE 74.9 (72.6, 77.2); RE 72.1 (69.8, 74.4)	1405 (3)

Other systems	MAC formula					FE 27.9 (27.8, 28.0); RE 30.1 (30.0, 30.2)	454767 (2)
Other systems	Estimate by parent					FE 81.3 (78.9, 83.7); RE 80.4 (77.9, 82.9)	998 (2)

When examining the PW20 data, only the PAWPER tape (98.7%) and the Mercy method (96.3%) met the benchmark acceptability criteria. The Broselow tape and pooled age-based formulas achieved PW20 values of 76.2% and 59.9% respectively.
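The measures of bias (MPE), precision (limits of agreement) and accuracy (PW10, PW20) reported in these results can be sketched for a single set of paired measurements as follows. This is a minimal illustration, assuming that the limits of agreement are taken as the mean percentage error ± 1.96 standard deviations and that "within 10%" includes the boundary; the function names and the sample weights are hypothetical.

```python
import statistics

def accuracy_summary(estimated_kg, actual_kg):
    """Summarise bias, precision and accuracy of paired weight estimates."""
    if len(estimated_kg) != len(actual_kg) or len(actual_kg) < 2:
        raise ValueError("need at least two paired measurements")
    # Percentage error of each estimate relative to the measured weight.
    pct_errors = [100.0 * (est - act) / act
                  for est, act in zip(estimated_kg, actual_kg)]
    mpe = statistics.mean(pct_errors)   # bias
    sd = statistics.stdev(pct_errors)   # spread of the errors (precision)
    n = len(pct_errors)
    return {
        "MPE": mpe,
        "LLOA": mpe - 1.96 * sd,
        "ULOA": mpe + 1.96 * sd,
        "PW10": 100.0 * sum(abs(p) <= 10.0 for p in pct_errors) / n,
        "PW20": 100.0 * sum(abs(p) <= 20.0 for p in pct_errors) / n,
    }

def meets_benchmark(summary, pw10_min=70.0, pw20_min=95.0):
    """Apply the PW10 >70% and PW20 >95% benchmark of acceptable accuracy."""
    return summary["PW10"] > pw10_min and summary["PW20"] > pw20_min

# Hypothetical paired data (kg), for illustration only.
actual = [10.0, 20.0, 30.0, 40.0]
estimated = [10.5, 19.0, 37.5, 40.0]
summary = accuracy_summary(estimated, actual)
print(summary)
print("meets benchmark:", meets_benchmark(summary))
```

Applied to the pooled random-effects values reported above, the PAWPER tape (PW10 86.9%, PW20 98.7%) passes this check and the Broselow tape (PW10 47.1%, PW20 76.2%) does not.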

Discussion

In this quantitative meta-analysis, we demonstrated a significantly superior performance of the two-dimensional weight estimation systems over all other methodologies, except for parental estimates. The Broselow tape showed an intermediate performance. Age-based formulas showed an unvaryingly dismal precision and accuracy with a pronounced bias to overestimation of weight which could create a high potential for patient harm.

Guesses by healthcare providers and estimates by parents

There was only one low- and middle-income country study, from Malawi, that evaluated healthcare provider guesses of weight, which showed a very poor accuracy, with a PW20 of only 54% [36]. This is in keeping with studies from a high-income country, which showed similarly poor accuracy [36], [61]. Healthcare providers cannot, and should not, use guesses to obtain an estimate of weight for drug dose calculations [62].

Family estimates of weight are only useful if the family member, usually a parent, is present with the child at the time that they require emergency care, if they are willing to offer an estimate and are in an appropriate emotional state to do so [63]. There have been only two studies in low- and middle-income countries that evaluated the accuracy of family estimates of their children’s weight, from large central hospitals in Thailand and Iran, which both showed excellent results [44], [45]. Interestingly, despite these estimates proving more accurate than the Broselow tape in one study, the authors still recommended the use of the tape over the family estimates [45]. If the conditions that have previously been shown to be associated with an accurate weight estimation by family are met (the accompanying family member must be the regular caregiver of the child and the child must have had a recent measurement of weight by, or in the presence of, that family member), these estimations probably will have sufficient accuracy for acceptable resuscitation drug dosage calculation [44], [64]. It remains a clinical judgement call as to whether to rely on a parental estimate or not, but some authors argue that clinicians can assume proper therapeutic doses even if parents are only semi-accurate in estimating a child’s weight [65]. Parental estimates have the advantage of providing an equal accuracy across a wide age and weight range, unlike many other systems that have decreasing accuracy with increasing age and weight [63].
Parental estimates have been shown to be poor in obese children, however [66]. In this meta-analysis, parental estimates were better than all systems other than the two-dimensional systems. These studies were from relatively “privileged”, urban areas of low- and middle-income countries, however, and whether these findings would be translatable to other settings is not clear. The prerequisites for appropriate use of parental estimates mean that it cannot be guaranteed to be able to be used in all children. It is a useful backup method of weight estimation, but should probably not be used or relied on as a primary weight estimation method.

Age-based formulas

In this meta-analysis, every age-based formula that was tested showed exceedingly poor accuracy, with a PW10 >50% for only one formula in only one study [42]. There were numerous studies identified that evaluated age-based formulas, none of which showed acceptable accuracy or precision [57]. This is similar to what has been found in high-income countries [67]. Although there are differences in bias between studies from high-income countries and low- and middle-income countries (under- vs overestimation), poor precision and accuracy are a consistent finding with age-based formulas in all populations. The fundamental causes of the inaccuracy of these systems are the inherent variability of weight-for-age and a non-linear relationship between weight and age. In this study, even the most homogeneous populations showed an intra-study variance that was much the same as that between studies. A desktop growth-chart exercise can demonstrate that children on the 3rd or 97th centiles of weight-for-age may receive weight estimations that are 80% above or 40% below their actual weight when applying age-based formulas, even in the age group in which formulas are most accurate (<5 yrs) [11], [68]. This becomes especially important in low- and middle-income countries where both extremes of weight-for-age may be prevalent since obesity is as much of a problem in these societies as in high income countries [68]. Some authors and courses still support the use of age-based formulas because of their apparent simplicity and because they require no equipment [69]. Their use does require that the child’s correct age be known (guessing age has been shown to lack accuracy [70]) and that the formula is remembered and calculated accurately: both memory and simple arithmetic are unreliable in emergencies, as stress precipitates errors [71].
In this meta-analysis, no age-based formula performed well and there was evidence of significant and sizeable overestimation of weight by these systems [18], [19], [39]. No formula was clearly superior to the others: although there was one significant difference (EPLS vs. Luscombe), the effect size was small. The APLS formula merits particular comment: it was introduced in 2011 because of demonstrated inaccuracies of the EPLS (original APLS) formula [72]. The meta-analysis data showed that the new three-part APLS formula was even less accurate than its predecessor with a greater overestimation of weight in low- and middle-income countries. An inescapable conclusion is that age-based formulas should no longer be used and clinicians that manage children should ensure that a better weight-estimation system is available [12], [73], [74]. Although there is an increasing body of opinion that age-based formulas are statistically incapable of estimating weight accurately, they continue to be used, taught and researched [75]. The futility of age-based weight estimation was summed up perfectly by the author of a very large study on age-based formulas: “Accurate paediatric weight estimation by age: mission impossible” [67].
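The desktop exercise described in this section can be sketched in a few lines. The example below assumes the commonly quoted form of the original APLS (EPLS) formula, weight in kg = 2 × (age in years + 4), and applies it to three hypothetical 4-year-old children. The actual weights are illustrative values chosen to show the effect of weight-for-age variability and are not taken from a growth chart or from the included studies.

```python
def original_apls_estimate(age_years):
    """Commonly quoted original APLS (EPLS) formula: 2 x (age + 4), in kg."""
    return 2.0 * (age_years + 4.0)

def percentage_error(estimated_kg, actual_kg):
    """Signed percentage error; positive values are overestimates."""
    return 100.0 * (estimated_kg - actual_kg) / actual_kg

# Hypothetical 4-year-olds who are light, mid-range and heavy for age.
age = 4.0
estimate = original_apls_estimate(age)
for actual in (12.0, 16.0, 21.0):
    error = percentage_error(estimate, actual)
    print(f"actual {actual:.1f} kg, estimate {estimate:.1f} kg, error {error:+.1f}%")
```

Because the formula returns a single weight for every child of a given age, the size of the error is driven entirely by how far the child lies from that value, which no recalculation of the formula's coefficients can correct for an individual child.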

One-dimensional length-based methods

Length-based formulas

The Traub-Johnson and Traub-Kichen formulas were designed to predict a child’s weight at the 50th weight-for-length centile by using body length. There was only one study that evaluated these formulas, an Indian study with a very high prevalence of low weight-for-length, which showed poor accuracy and a bias to overestimate weight [18]. These equations are complex and require the use of a scientific calculator, which makes them vulnerable to errors, which might not be instantly detectable.

Growth chart methods

Only one study evaluated this methodology in a Thai population, with a poor outcome (PW10 51.4%) [44]. The inaccuracy and time-consuming nature of this technique does not make it suitable for use in emergencies. The accuracy is also dependent on contemporary population-specific growth charts. Some experts have suggested that the weight estimate can be modified according to body habitus by shifting to an appropriate centile [76]. This has never been evaluated, however.

One-dimensional tape-based methods

Although there are at least nine length-only weight-estimation tapes, only the Broselow tape has been extensively studied – it is by far the most studied weight-estimation tool in the literature. Of the other tapes, only the Blantyre tape has been formally evaluated: a single, small study of very young children, which reported a PW20 of 78.8% [36]. The Broselow tape is, in effect, a full-scale depiction of the 50th centile of the National Centre for Health Statistics (NCHS) weight-for-length growth chart. Like other one-dimensional length-based systems, it is vulnerable to error based on variations from the median of weight-for-length (differences in body habitus) [20], [40], [51], [77]. Two studies from India have attempted to apply a correction factor to limit the overestimation of weight by the Broselow tape [40], [51]. Although this succeeded in partially correcting the bias, the precision and overall accuracy were still poor. In this meta-analysis, the Broselow tape showed only a moderate degree of accuracy in most studies, with a tendency to overestimate weight. Although studies in Western populations have demonstrated an overall underestimation of weight, the results of this meta-analysis, as well as the individual studies in low- and middle-income country populations, have mostly shown an overestimation of weight, potentially to a dangerous degree (if drug doses were to be computed from those weights) [18], [19], [20], [40], [51]. Proponents of the Broselow tape have defended its underestimation of weight in overweight populations by suggesting that the tape estimates ideal body weight (IBW) rather than total body weight (TBW), which might be advantageous in obese children from a drug dosing perspective [4], [7]. This is not true. IBW is higher than TBW in underweight children and inadvertent overdosing may easily occur. IBW should only be used in obese children and only for dose calculations for hydrophilic drugs [78]. 
The Broselow tape significantly outperformed the age-based formulas against which it could be directly compared, with a medium to large effect size. The assessment of the performance of the Broselow tape was somewhat confounded by the fact that the tape itself has evolved over successive editions from 1998 to 2011, both in content as well as the position of the weight- and colour-divisions. The newest version of the Broselow tape (2011 edition A) was substantially modified from the previous version (2007 edition B) to decrease the underestimation of weight reported from studies in high income countries. This may make it overestimate weight to an even greater degree in populations with a high prevalence of underweight children [20]. Since the Broselow tape has proven inaccurate across the world, both in populations with underweight as well as populations with obese children, its use should be reconsidered [56], [79], [80], [81].
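The limited benefit of applying a single correction factor to a length-based estimate can be illustrated numerically. In the sketch below, every estimate is scaled by one constant chosen to bring the mean percentage error to zero. The data, function names and the choice of correction are hypothetical and are not the method used in the Indian studies; the limits of agreement are assumed to be the mean ± 1.96 standard deviations.

```python
import statistics

def percentage_errors(estimated_kg, actual_kg):
    return [100.0 * (est - act) / act for est, act in zip(estimated_kg, actual_kg)]

def summarise(pct_errors):
    """Return (MPE, width of the 95% limits of agreement, PW10)."""
    mpe = statistics.mean(pct_errors)
    loa_width = 2 * 1.96 * statistics.stdev(pct_errors)
    pw10 = 100.0 * sum(abs(p) <= 10.0 for p in pct_errors) / len(pct_errors)
    return mpe, loa_width, pw10

def recalibrate(estimated_kg, actual_kg):
    """Scale every estimate by one constant chosen to make the MPE zero."""
    mean_pe = statistics.mean(percentage_errors(estimated_kg, actual_kg))
    factor = 1.0 / (1.0 + mean_pe / 100.0)
    return [est * factor for est in estimated_kg], factor

# Hypothetical children who all weigh 20 kg, with scattered overestimates.
actual = [20.0] * 6
estimated = [17.0, 19.0, 21.8, 25.0, 27.0, 28.0]

before = summarise(percentage_errors(estimated, actual))
adjusted, factor = recalibrate(estimated, actual)
after = summarise(percentage_errors(adjusted, actual))
print("before (MPE, LOA width, PW10):", before)
print("after  (MPE, LOA width, PW10):", after)
```

In this illustration the correction removes the bias, but the width of the limits of agreement shrinks only in proportion to the correction factor, so the proportion of estimates within 10% of actual weight remains low.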

One-dimensional habitus-based systems

Although two studies in high income countries have shown that a formula based on mid-arm circumference (MAC) was more accurate than age-based formulas in older children, the only study to date in low- and middle-income countries showed a very poor result [24], [25], [82], [83]. The age-range restriction of this method is also prohibitive and, like other one-dimensional systems, it is unlikely to ever achieve acceptable accuracy and precision [6].

Two-dimensional length-based, habitus-based, systems

The developers of the originator two-dimensional system were the first to recognise that “Body habitus + height = accurate weight estimate”, which has been the basis of the most accurate systems available today [21]. The Wozniak system, derived from a study in Botswana in 2012, validated the use of segmental lengths (ulna and tibia lengths in this case) as surrogates of total body length, when combined with MAC as a measure of body habitus [24], [25]. This system was very accurate in an undernourished, high HIV-prevalence population and established a new level in terms of what accuracy a weight estimation system could achieve. It was promising in two additional validation studies in South Africa, but very poor performance in infants was worrying – further studies are required to establish whether this system warrants further development [12], [27], [84]. The Mercy method, developed in the USA, similarly makes use of humerus length as a surrogate for length, together with MAC as surrogate for habitus, to generate an estimation of weight [23]. The pooled accuracy of this method in the meta-analysis was excellent with an accuracy of PW10 of around 70% in both over- and undernourished populations and in children aged from 1 to 16 years. The authors suggested that this system might even be accurate enough to use instead of poorly calibrated scales in resource-limited communities, although this claim still needs to be tested [11]. The practicality and efficacy of the Mercy system in emergency care still needs to be evaluated: the poorest performance of the Mercy method was in one study which measured children in the supine position, as it would be used in an emergency [27]. In addition, the arithmetic requirements, simple when encountered with no stress, might prove difficult and error-ridden during a resuscitation scenario. The PAWPER tape, developed in South Africa, includes a measure of total body length and of body habitus in the methodology. 
Once a length-based weight has been estimated in a child, this weight can be modified up or down according to a visual assessment of the child’s body habitus [5]. Although length-based measurements are objective and simple to perform, assessment of body habitus is more subjective and dependent on training and experience, but generally reliable [74]. Differences in “average” habitus between different populations and at different ages may contribute to making accurate habitus scoring more difficult [1], [2]. The PAWPER tape system showed very good results in this meta-analysis with a PW10 of 86.9%, excellent precision and virtually zero bias. Direct paired comparisons on pooled data showed the Mercy, Wozniak and PAWPER systems to be significantly better than the Broselow tape with medium to large effect sizes, which indicated that these systems were far superior in accuracy. Although there were no studies that provided data to allow for direct comparisons between age-based formulas and the two-dimensional systems, the superiority of the two-dimensional methods can be inferred from other findings. Direct comparisons between the Mercy method, the Wozniak method and the PAWPER tape showed the PAWPER tape to be significantly more accurate than both the other methods in low- and middle-income countries, with a small to medium effect size.
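The three performance measures referred to above (bias, precision and PW10) can be illustrated with a minimal sketch. It assumes that error is expressed as a percentage of actual weight and that precision is summarised by 95% limits of agreement (bias ± 1.96 standard deviations); the function names and the five weight pairs are invented purely for illustration and are not data from any included study.

```python
from statistics import mean, stdev


def percentage_errors(estimated, actual):
    """Signed error of each estimate as a percentage of the actual weight."""
    return [100.0 * (e - a) / a for e, a in zip(estimated, actual)]


def accuracy_summary(estimated, actual):
    """Return bias, 95% limits of agreement and PW10 for paired weights."""
    pe = percentage_errors(estimated, actual)
    bias = mean(pe)                      # mean percentage error
    sd = stdev(pe)                       # sample standard deviation
    within_10 = sum(abs(p) <= 10.0 for p in pe)
    return {
        "bias": bias,
        "limits_of_agreement": (bias - 1.96 * sd, bias + 1.96 * sd),
        "pw10": 100.0 * within_10 / len(pe),
    }


# Hypothetical paired weights in kg (illustrative only).
actual = [10.0, 12.0, 15.0, 20.0, 25.0]
estimated = [10.5, 11.0, 15.0, 23.0, 24.0]

summary = accuracy_summary(estimated, actual)
print(summary)
```

In this toy sample four of the five estimates fall within 10% of actual weight, so the PW10 is 80%, even though the mean percentage error is small. This is why bias alone is an insufficient description of a system's performance.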

Differences between and within different populations

Although there were differences in bias between different studies, the underlying lack of precision (wide limits of agreement) within each population was similar. In other words, the variability between populations was congruent with the intra-population variability shown in even the most homogeneous populations. The significance of this is that, although non-habitus based recalibration of a system for a specific population might reduce the bias, the underlying variability and imprecision would not allow an acceptable degree of overall accuracy. This was well shown in the study by Asskaryar 2015, which failed to successfully recalibrate the Broselow tape in an Indian population by manipulating only the observed bias [51]. This further suggests that the one-dimensional systems do not have the statistical capacity to accommodate either intra- or inter-population variability. No studies looked at differences in accuracy between the sexes, but where differences have been shown in other populations they have been of very small effect size, equivalent to within-population variance. The two-dimensional systems, especially the Mercy method and the PAWPER tape, have proven to be the closest to a universally applicable system: uniformly accurate at most ages, both within and between various populations and in both the high-income and low- and middle-income countries. The heterogeneity between different study populations would diminish or undermine the meta-analysis findings but for two factors. The first consideration is that virtually all studies on the one-dimensional systems showed substantial variance of weight estimation performance within each study population as well as between populations. The meta-analysis findings are therefore likely to be valid given this similarity.
The second consideration is that the two-dimensional systems showed far less variance both within and between study populations, which suggested that much of the variance was attributable to the weight estimation methodology rather than the population characteristics or differences. When one-dimensional and two-dimensional systems were evaluated in the same sample, the inevitably higher variance found in one-dimensional systems reinforced this conclusion.
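The argument that removing bias cannot rescue an imprecise system can be made concrete with a simple worked calculation. The sketch below assumes, purely for illustration, that percentage errors are normally distributed with a given mean (bias) and standard deviation; this distributional assumption, the function names and the numerical values are not taken from the included studies.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def expected_pw10(bias, sd, tolerance=10.0):
    """Expected percentage of estimates within +/- tolerance % of actual
    weight, assuming percentage errors follow Normal(bias, sd)."""
    upper = normal_cdf((tolerance - bias) / sd)
    lower = normal_cdf((-tolerance - bias) / sd)
    return 100.0 * (upper - lower)


# Hypothetical imprecise system: 10% overestimation, SD of 20%.
before = expected_pw10(bias=10.0, sd=20.0)        # about 34%

# Same system after a perfect bias-only recalibration.
recalibrated = expected_pw10(bias=0.0, sd=20.0)   # about 38%

# Hypothetical precise system with no bias and an SD of 8%.
precise = expected_pw10(bias=0.0, sd=8.0)         # about 79%

print(round(before, 1), round(recalibrated, 1), round(precise, 1))
```

Under these assumptions, eliminating a 10% bias raises the expected PW10 by only a few percentage points, whereas reducing the spread of errors more than doubles it. A bias-only recalibration therefore leaves accuracy bounded by the underlying variance.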

Important relevant and related issues not covered in this review

There were issues that were not addressed in this review, which have an impact on the use of weight-estimation systems in emergencies, as well as the interpretation of the findings of this study. Some weight estimation systems were not included because of an absence of sufficient evaluable data (see Table 2). There are strong, valid arguments for the use of alternative measures of weight for (some) drug dose calculations in obese children [4], [8]. Weight estimation systems should ideally be able to estimate both TBW and IBW to allow for accurate drug dose calculations for underweight, “normal weight” and obese children [78], [85], [86]. How weight-estimation systems and resuscitation aids integrate may prove to be as important as the accuracy of weight estimation. Further research is required to establish the optimum balance between accuracy and cognitive demands during emergency care. The two-dimensional systems should be developed with this integration in mind. What degree of under- or overestimation of weight is dangerous to a child when calculating drug doses is not known [70]. Small differences in weight estimation accuracy are unlikely to reduce medication errors, but large improvements in accuracy, such as demonstrated between the one-dimensional and two-dimensional systems in this study, make fewer medication errors much more likely [42], [71], [87]. Best practice dictates that it is essential to eliminate as many sources of errors as possible [77], [88]. Luscombe, one of the early researchers in age-based weight estimation, phrased it perfectly: “To continue to use a demonstrably poor weight estimate cannot be considered good medical practice” [16].

Limitations

The limitations of this study are analogous to what is expected from any meta-analysis of this nature [32]. The poor quality of statistical reporting and analysis in many articles and the resultant need for imputation of data was the major limitation in this study. The heterogeneity between the populations of the included studies could affect the outcomes of the meta-analytical statistical techniques. However, since the same weight estimation systems are being used in vastly different populations it was considered appropriate to evaluate the performance of these systems in individual as well as pooled samples to establish the validity of the current practice. One of the key objectives of this study was to determine whether individual weight estimation systems could transcend both within- and between-population heterogeneity. The relative paucity of studies containing direct, paired comparisons between systems further limited the quantitative aspect of the meta-analysis. Nonetheless, the magnitude of the differences between the groups of weight-estimation systems was clearly apparent and is most likely a true finding despite these limitations.

Conclusions

The only weight-estimation systems that were found to be of acceptable accuracy were the length-based, habitus-modified systems and parental estimations of weight. The PAWPER tape and the Mercy method achieved an accuracy and precision that surpassed all other methods in low- and middle-income country populations. These systems should be used and developed for clinical use and integration with resuscitation aids. Wide discrepancies in the accuracy of the Broselow tape should provoke tremendous caution in its use (without a formally developed and validated habitus-modifying component). In its latest iteration, the 2011 edition A, it may substantially overestimate weight in children from low- and middle-income countries or poor communities, resulting in inadvertent medication overdosing. Without exception, all the age- and length-based formulas evaluated proved to be unacceptably inaccurate, with a consequent high possibility of patient harm. There is sufficient evidence to conclude that age-based formulas should no longer be used. The benefit of not requiring equipment for their use is heavily outweighed by their inaccuracy, the added cognitive load and the vulnerability to error during resuscitation scenarios. The hierarchy of choice of weight estimation methods is thus: dual length- and habitus-based systems should be used for weight estimation in children from low- and middle-income countries because of superior accuracy to other systems (high quality evidence); the Broselow tape or parental estimates of weight should be used for weight estimation in preference to age-based formulas and healthcare provider guesses (high quality evidence); age-based formulas and healthcare provider guesses should not be used for weight estimation in children because of potential patient harm (high quality evidence).

Sources of support

No financial support was given for this study; the authors covered any costs incurred.

Guide to tables and figures
Table 1 shows the prevalence of underweight and obesity in LMIC and HIC which gives a perspective on the scope of the problem of weight estimation in children
Table 2 provides a description and some details of some of the most commonly-used weight estimation systems in children
Fig. 1 provides a summary of the PRISMA methodology and the number of studies reviewed and included in the meta-analysis
Table 3 contains a summary on each of the studies included in the review and meta-analysis
Supplementary Fig. B illustrates the bias and precision and accuracy of each study for each method. Weight estimation methods that were accurate in one study were generally accurate in all, and methods that showed a high variance and poor accuracy were consistently poor performers
Fig. 2 shows the results of the pooled data for bias, precision and accuracy. This figure provides the overall best indication of the performance of each methodology
Table 4 provides the actual numbers and finer details of the results displayed in Figs. 2 and 3
Fig. 3 shows the results of direct comparisons between various weight-estimation systems using data from multiple studies

65 in total

1. Validation of weight estimation by age and length based methods in the Western Cape, South Africa population.

Authors: Heike Geduld; Peter W Hodkinson; Lee A Wallis
Journal: Emerg Med J Date: 2010-10-13 Impact factor: 2.740

2. Weight estimation in paediatric resuscitation: A hefty issue in New Zealand.

Authors: Sally Britnell; Jane Koziol-McLain
Journal: Emerg Med Australas Date: 2015-04-06 Impact factor: 2.151

3. Are APLS formulae for estimating weight appropriate for use in children admitted to PICU?

Authors: Christopher Flannigan; Thomas W Bourke; Ashley Sproule; Mike Stevenson; Mark Terris
Journal: Resuscitation Date: 2014-04-12 Impact factor: 5.262

4. Weight estimation in resuscitation: is the current formula still valid?

Authors: Mark Luscombe; Ben Owens
Journal: Arch Dis Child Date: 2007-01-09 Impact factor: 3.791

5. Anthropometric measures are simple and accurate paediatric weight-prediction proxies in resource-poor settings with a high HIV prevalence.

Authors: Kyly C Whitfield; Roberta Wozniak; Mia Pradinuk; Crystal D Karakochuk; Gabriel Anabwani; Zachary Daly; Stuart M MacLeod; Charles P Larson; Timothy J Green
Journal: Arch Dis Child Date: 2016-04-12 Impact factor: 3.791

6. Visual biases in judging body weight.

Authors: Katri K Cornelissen; Lucinda J Gledhill; Piers L Cornelissen; Martin J Tovée
Journal: Br J Health Psychol Date: 2016-02-09

7. Using age on clothes size label to estimate weight in emergency paediatric patients.

Authors: Laura D Elgie; Andrew R Williams
Journal: Eur J Emerg Med Date: 2012-10 Impact factor: 2.799

8. Accuracy of the Broselow Tape in South Sudan, "The Hungriest Place on Earth".

Authors: Melissa C Clark; Roger J Lewis; Ross J Fleischman; Adedamola A Ogunniyi; Dipesh S Patel; Ross I Donaldson
Journal: Acad Emerg Med Date: 2015-12-15 Impact factor: 3.451

9. Meta-analysis: neither quick nor easy.

Authors: Nancy G Berman; Robert A Parker
Journal: BMC Med Res Methodol Date: 2002-08-09 Impact factor: 4.615

10. Using the Mercy Method for Weight Estimation in Indian Children.

Authors: Gitanjali Batmanabane; Pradeep Kumar Jena; Roshan Dikshit; Susan Abdel-Rahman
Journal: Glob Pediatr Health Date: 2015-01-09
