João F D Rodrigues1, Daniel Moran2, Richard Wood2, Paul Behrens1,3. 1. Institute of Environmental Sciences CML , Leiden University , Einsteinweg 2 , 2333 CC Leiden , Netherlands. 2. Programme for Industrial Ecology, Energy and Process Technology Department , NTNU , NO-7491 Trondheim , Norway. 3. Leiden University College the Hague , Leiden University , 2595 DG The Hague , Netherlands.
Abstract
Consumption-based carbon accounts (CBCAs) track how final demand in a region causes carbon emissions elsewhere due to supply chains in the global economic network, taking into account international trade. Despite the importance of CBCAs as an approach for understanding and quantifying responsibilities in climate mitigation efforts, very little is known of their uncertainties. Here we use five global multiregional input-output (MRIO) databases to empirically calibrate a stochastic multivariate model of the global economy and its GHG emissions in order to identify the main drivers of uncertainty in global CBCAs. We find that the uncertainty of country CBCAs varies between 2 and 16% and that the uncertainty of emissions does not decrease significantly with their size. We find that the bias of ignoring correlations in the data (that is, independent sampling) is significant, with uncertainties being systematically underestimated. We find that both CBCAs and source MRIO tables exhibit strong correlations between the sector-level data of different countries. Finally, we find that the largest contributors to global CBCA uncertainty are the electricity sector data globally and Chinese national data in particular. We anticipate that this work will provide practitioners an approach to understand CBCA uncertainties and researchers compiling MRIOs a guide to prioritize uncertainty reduction efforts.
International approaches
to climate change have relied upon assigning
responsibility for emissions worldwide such that nations face a common
but differentiated responsibility in climate change mitigation. This
is commonly referred to as carbon accounting and can vary nation by
nation depending on the methodologies used.[1] Recent attention has been placed on computing the amount of greenhouse
gases emitted along international supply chains to satisfy the demand
for goods and services,[2−7] having implications for where climate mitigation burdens should
lie. At a macro level, such consumption-based carbon accounts (CBCA)[8] are calculated using global multiregional input-output
(MRIO) databases,[9] whose compilation involves
the collection and processing of large amounts of source data.[10−14] Such databases typically report only a point estimate for each source
datum (for example, emissions from coal-generated electricity in China),
with only one existing database currently reporting uncertainty estimates.[11] We consider it important to better understand the uncertainty of CBCAs in order to ensure their robustness in policy applications.[15] Note that since more data transformations are involved in their calculation, CBCAs are expected to exhibit higher uncertainty than the corresponding production-based metrics.[16]
At an
empirical level Sato[15] and Owen[17] provide an overview of environmental and economic
data sources of CBCA, each subject to uncertainty. Environmental data
includes the estimation of GHG emissions, their allocation to economic
activities, and, for the purpose of calculating climate change contributions,
weighting based on global warming potential. The uncertainty of GHG
emission data itself can be quite high. Liu et al.[18] found that estimates of total CO2 emissions
for China in 2008 vary by 15%, in large part due to problems involving
measuring coal consumption.[19] Yoshida et
al.[20] found that the coefficient of variation
(or CV, the standard deviation divided by the mean) of the emissions
of carbon per unit of output of different firms within a manufacturing
sector in Japan in 1990 was larger than one for 17 out of 23 sectors.
More generally, and still looking only at GHG emission data,
Ballantyne et al.[21] provide a review of
different sources of error in the global carbon budget and find, among
other things, that in 2010 30% of the uncertainty in global carbon
accumulation is due to errors in data on fossil fuel use. Within IO
tables themselves, economic data includes national accounts on final
and intermediate demand of various goods and services and international
trade, and these can also hold uncertainties. Manski[22] provides an overview of the sources of uncertainty in official
national statistics, and Helbling and Terrones[23] find that the discrepancy between global imports and exports
is now on the order of 1%. International trade data is not usually
available with the same level of detail as domestic data, and therefore
additional estimation procedures are often necessary,[24] with different MRIO databases resorting to different methodological
options (see Chapter 2 of Owen[17] for a
review of how MRIO databases differ in source data and construction).
In short, this means that MRIO tables are subject to more uncertainty than single-region IO tables.
The current consensus is that the major
source of uncertainty affecting
CBCA results is the GHG emission data and not the economic data. In
support of this hypothesis Owen et al.[16] found that national CBCAs have uncertainties in the same range as
production-based ones. Karstensen et al.[25] furthermore point to the relatively large influence of emission
coverage and choice of global warming or temperature response metric
according to different time horizons. Finally, Moran and Wood[26] found that there is a general agreement in the
CBCA results from different global MRIOs, again with the largest discrepancies
being found in the GHG emission extensions.
However, due to
limitations of data availability and computational
power, the reporting of uncertainty in CBCA is still rare. Further,
when uncertainty is estimated, a key simplification is performed whereby
the underlying calculations assume the data to be independent, that
is, correlations between different elements in the data set are not
considered. Correlation is a metric of association between different
data elements, which measures the strength of the linear relationship
between the data. Examples are the relationship between energy recorded in the intermediate transaction matrix and fuel combustion emissions in the GHG account, or between the level of demand for a product and the output of that product. While there is a large tradition in the study of the uncertainty of IO models,[27−35] recently reviewed in Temurshoev,[36] this literature does not usually take into account correlations and thus how uncertainty varies jointly between data elements. A few exceptions[37−39] were reviewed by Ten Raa and Steel[40] and, more recently, studied by Rodrigues.[41]
In investigating CBCA uncertainty, a few studies
explicitly mention
the role of correlations or data dependence. Hertwich and Peters[42] divide the source data by blocks, present estimates
of uncertainty for each, and argue that the relative uncertainty of
country CBCAs is lower than that of product-level CBCA because errors
cancel out, thus assuming independence. Lenzen et al.,[43] the only dedicated study of uncertainty in MRIO
analysis that we are aware of, argue that due to the lack of available
data, source data should be assumed to be uncorrelated. Furthermore, in their study the relative errors of the data decrease with the magnitude of the values. More recently, arguing in favor of assuming zero correlations between MRIO coefficients, Karstensen et al.[25] note that, besides the absence of underlying information, filling in the correlation matrix of an MRIO model would be computationally prohibitive.
In this study we investigate
the uncertainty of CBCAs, focusing
on what both users and producers of CBCA can learn when using dependent
sampling (i.e., taking correlations into account). No existing database
reports metrics of association or dependence between data, so we use
the set of the five most common global MRIO tables as a sample. After
characterizing the uncertainty and correlation structure of global
CBCAs we examine how approximations with independent sampling perform.
We then search for patterns in the source MRIO data and examine the
impact of reducing the uncertainty of particular source elements on
the global CBCA uncertainty. We conclude the paper by comparing our
results with the literature, suggesting directions of future research
and summarizing the implications of our findings.
Data and Methods
Data Sources and Basic Concepts
An environmentally
extended global MRIO model is a description of the world economy,
linking consumption in a given region through international supply
chains to environmental pressures anywhere in the world. Different
approaches have been used earlier to inform the error distribution
of the source data: for example, Lenzen et al.[43] use a range of auxiliary statistics on relative standard
errors of similar data, Lenzen et al.[11] use the degree of adjustment during the balancing procedure, and
Yamakawa and Peters[34] use time-series variation.
However, to the best of our knowledge these approaches do not allow
quantifying the dependencies between different components of the data
set, which is a crucial aspect of the research we wish to undertake.
We therefore use a harmonized set of global MRIOs to calculate the uncertainty of CBCAs using dependent sampling.
The main data source of this study is a set of five global product-by-product[44] MRIO databases that were harmonized and tailored
to calculate country/region-level global carbon footprints in the
year 2007. These five MRIOs are constructed from the EXIOBASE[45] (http://exiobase.eu), WIOD[12] (http://www.wiod.org/), EORA[11] (http://worldmrio.com/), OECD[14] (http://www.oecd.org/sti/ind/input-outputtablesedition2015accesstodata.htm), and GTAP[10] (https://www.gtap.agecon.purdue.edu/) databases. Each MRIO database was harmonized to m = 22 regions, n = 17 sectors per region, and a single category
of final demand per country, primary inputs, and carbon emission types.
The regional classification consists of 20 countries covering 78%
of global CO2 emissions and two aggregate regions. The
list of regions and sectors can be found as Supporting Information (SI) S1, and the concordance with the original
MRIOs can be found in Owen et al.[16] and
Steen-Olsen et al.[46]
The CBCA of region k is obtained as
$$c_k = \mathbf{b}'(\mathbf{I} - \mathbf{A})^{-1}\mathbf{y}_k + h_k \qquad (1)$$
where $\mathbf{y}_k$ is the k-th column of Y, the matrix of final demand; I is the identity matrix; A is the matrix of technical coefficients, expressing how much a unit of demand for a product leads to increased output in a sector; b is the vector of environmental stressors, or direct carbon emission coefficients, indicating how much is emitted per unit of production of each sector; and $h_k$ is the k-th element of h, the vector of household carbon emissions. We assume that vectors are in column format by default, and a prime, ′, denotes transpose. The number of rows in Y and the number of both rows and columns in A is mn.
Coefficients A and b are calibrated on the basis of flows as $\mathbf{A} = \mathbf{Z}\hat{\mathbf{x}}^{-1}$ and $\mathbf{b} = \hat{\mathbf{x}}^{-1}\mathbf{r}$, where Z, x, and r are, respectively, the matrix of interindustry transactions, the vector of total sales, and the vector of direct carbon emissions of industries, and a hat (^) denotes a diagonal matrix.
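To make the CBCA formula and the calibration of A and b from flows concrete, the following is a minimal Python/NumPy sketch. The function name cbca and the two-region, one-sector toy table are hypothetical and invented for this example only. The sketch assumes a balanced table, so total sales can be computed as row sums of Z and Y; under that assumption the CBCAs of all regions add up to total industry plus household emissions, which makes a convenient sanity check.

```python
import numpy as np


def cbca(Z, Y, r, h):
    """Consumption-based carbon account of every region:
    c_k = b'(I - A)^{-1} y_k + h_k.

    Z: (N, N) interindustry transactions, Y: (N, m) final demand,
    r: (N,) direct industry emissions, h: (m,) household emissions,
    with N = m * n rows (regions times sectors).
    """
    x = Z.sum(axis=1) + Y.sum(axis=1)       # total sales (balanced table assumed)
    A = Z / x                               # A = Z x̂^{-1}: column j divided by x_j
    b = r / x                               # b = x̂^{-1} r: emissions per unit of output
    L = np.linalg.inv(np.eye(len(x)) - A)   # Leontief inverse (I - A)^{-1}
    return b @ L @ Y + h                    # one CBCA per column of Y


# Hypothetical toy data: 2 regions with 1 sector each (so N = m = 2).
Z = np.array([[10.0, 5.0],
              [4.0, 8.0]])
Y = np.array([[20.0, 5.0],
              [3.0, 15.0]])
r = np.array([8.0, 3.0])
h = np.array([2.0, 1.0])

c = cbca(Z, Y, r, h)
print(c, c.sum())  # region CBCAs; their sum equals r.sum() + h.sum()
```

The conservation check holds because, in a balanced table, $(\mathbf{I} - \mathbf{A})^{-1}\mathbf{Y}\mathbf{1} = \mathbf{x}$, so summing the CBCAs over regions gives $\mathbf{b}'\mathbf{x} + \sum_k h_k$, that is, total industry plus household emissions.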
Data Compression
The number of data points
in an MRIO model
can be very large (for reference, EXIOBASE consists of over 96 million
data points). This raises computational problems when analyzing covariances
(explained in the following subsection) and makes the interpretation
of the results more difficult. Hence, in this subsection we explain
how it is possible to reduce the data volume while minimizing the
loss of relevant information.
According to Rodrigues et al.,[47] the elements of A and Y are not obtained directly from source data, because the use of imported products
by industries is not generally known. Instead, algorithms are applied
to import use tables that specify use of products by industries (but
not source country) and international trade data that specifies bilateral
trade in products (but not destination industry). The values reported
in the MRIOs may differ from the pure application of the algorithms,
due to the use of balancing[48] and other
processing procedures[47] along the MRIO
construction pipeline. We make this assumption explicit by replacing the original A and Y by alternatives which explicitly separate international trade from the use of products, whose elements are defined as
$$A^{ab}_{ij} = R^{ab}_{ij}\,T^{b}_{ij} \qquad (2)$$
$$Y^{ab}_{i} = F^{ab}_{i}\,C^{b}_{i}\,s^{b} \qquad (3)$$
where subscripts and superscripts explicitly indicate the sectors, i and j, and the regions, a and b. The sum of international trade coefficients, R (for intermediate use) and F (for final use), over the supplying region a equals one. The sum of intermediate use coefficients, T, over product i is less than one (to account for primary inputs). The sum of demand composition coefficients, C, over product i is one. This formulation of eq 3 splits absolute total country final demand, the scale factor s, from the relative composition and trade coefficients. The new quantities defined in eqs 2 and 3 are obtained by aggregating the elements of A and Y over the appropriate dimensions. This formulation is very similar to the one illustrated in Tables 1–3 of Rodrigues et al.,[47] with the difference that they consider separate domestic and import use matrices. Eq 3 is also related to the way Owen et al.[16] split final demand into volume, region share, and product share.
When undertaking this
reformulation of the MRIO data, the average and maximum discrepancies in country-level footprints between the original and transformed MRIOs (for the trade/use split described above) are less than 0.6% and 2.5%, respectively, for every region and MRIO. Additional considerations about the data compression and resulting discrepancies can be found in SI S2. Even in MRIOs in which the transformation defined by eqs 2 and 3 is not an approximation but is exact (i.e., they were constructed using the trade-share assumption, such as EXIOBASE), there is still a discrepancy because those databases were constructed with more regional and sectoral detail. Since we performed aggregation to harmonize the different MRIOs, such discrepancies are expected.
Under the above transformation there are three
equivalent formats to represent an MRIO, all of which yield exactly the same CBCA results. These are the flow format: Z, Y, r, and h; the coefficient format: A, W, b, and d, where $\mathbf{W} = \mathbf{Y}\,\mathrm{diag}(\mathbf{s})^{-1}$ and $\mathbf{d} = \mathrm{diag}(\mathbf{s})^{-1}\mathbf{h}$; and the modular format: T (technology), R (trade in intermediate goods), C (demand composition), F (trade in consumer goods), s (scale), b (industry emissions), and d (household emissions). Scale factors have monetary units, both emission coefficient blocks have units of GHG emissions per monetary unit, and the other blocks are dimensionless.
We use all three formats to show the bias that arises when uncertainty is assessed under independent sampling, while the modular format is used in all other analyses reported in the Results, where correlations are taken into account.
Finally, SI S2 shows that the modular
format provides significant advantages in terms of data storage. The
modular format also makes the interpretation of results clearer by
reducing the number of elements in a data block and thus facilitating
the identification of patterns.
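The trade/use split of A and Y into the modular blocks described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: it assumes region-major ordering of rows and columns (index = region × n + sector), uses hypothetical random coefficients, and does not guard against zero denominators, which real MRIO data would require. The helper names to_modular and from_modular are hypothetical.

```python
import numpy as np


def to_modular(A, Y, m, n):
    """Split A (mn x mn) and Y (mn x m) into the blocks T, R, C, F, s."""
    A4 = A.reshape(m, n, m, n)      # A4[a, i, b, j]: product i from region a to sector j in b
    Y3 = Y.reshape(m, n, m)         # Y3[a, i, b]: product i from region a to final demand in b
    T = A4.sum(axis=0)              # T[i, b, j]: use of product i (any origin) by sector j in b
    R = A4 / T                      # R[a, i, b, j]: intermediate trade shares, sum to 1 over a
    s = Y3.sum(axis=(0, 1))         # s[b]: total final demand of region b (scale)
    Ybar = Y3.sum(axis=0)           # Ybar[i, b]: final demand for product i, any origin
    C = Ybar / s                    # C[i, b]: demand composition, sums to 1 over i
    F = Y3 / Ybar                   # F[a, i, b]: final trade shares, sum to 1 over a
    return T, R, C, F, s


def from_modular(T, R, C, F, s, m, n):
    """Recompose A and Y from the modular blocks."""
    A = (R * T).reshape(m * n, m * n)       # A^{ab}_{ij} = R^{ab}_{ij} T^{b}_{ij}
    Y = (F * (C * s)).reshape(m * n, m)     # Y^{ab}_{i} = F^{ab}_{i} C^{b}_{i} s^{b}
    return A, Y


# Hypothetical toy MRIO: m = 3 regions, n = 2 sectors, strictly positive entries.
m, n = 3, 2
rng = np.random.default_rng(0)
A = rng.uniform(0.01, 0.15, size=(m * n, m * n))   # column sums stay below one
Y = rng.uniform(1.0, 10.0, size=(m * n, m))

T, R, C, F, s = to_modular(A, Y, m, n)
A_back, Y_back = from_modular(T, R, C, F, s, m, n)
print(np.allclose(A_back, A), np.allclose(Y_back, Y))
```

Using a different number of regions and sectors in the toy example (m ≠ n) is deliberate: NumPy broadcasting aligns trailing axes, so a square toy case could silently hide an axis mix-up.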
Modeling Uncertainty
The goal of this paper is to study
the uncertainty of CBCAs and, in particular, to capture the effect
of dependencies among MRIO data elements. In this subsection we describe
how we do it: we first discuss and formalize the concept of uncertainty
and then clarify the distinction between dependent and independent
sampling.
There are many potential sources of uncertainty,[49] which, according to the Bayesian paradigm, can
be formalized through a probability distribution, in which the probability
of an outcome expresses the degree of belief that an observer has
in that particular outcome.[50,51] In practice it is often
difficult to assign a specific value to each and every possible outcome,
so we resort to aggregate metrics. In this paper we follow the Bayesian
approach of Weise and Woger[52] and interpret
the mean of a probability distribution as the best
guess and the standard deviation (SD) as the uncertainty.
Mathematically, if p(t) is the probability density function and t is a non-negative real number, then the mean, μ, and standard deviation, σ, are
$$\mu = \int_0^{\infty} t\,p(t)\,\mathrm{d}t \qquad (4)$$
$$\sigma = \sqrt{\int_0^{\infty} (t-\mu)^2\,p(t)\,\mathrm{d}t} \qquad (5)$$
If more than one variable is being considered, then a further metric for characterizing the dependency between them is necessary. The most common metric, which we use in this study, is the Pearson correlation coefficient, or simply correlation, ρ. If the variables are t1 and t2, then the correlation is
$$\rho = \frac{1}{\sigma_1\sigma_2}\int_0^{\infty}\!\!\int_0^{\infty} (t_1-\mu_1)(t_2-\mu_2)\,p(t_1,t_2)\,\mathrm{d}t_1\,\mathrm{d}t_2 \qquad (6)$$
where p(t1,t2) is the joint probability density function, and μ1, μ2, σ1, and σ2 are the means and standard deviations of variables t1 and t2. Correlations measure the strength of association between two variables, expressing how close they are to a linear relation: the correlation takes the value 1 if the relation is strictly linear with positive slope, −1 if it is strictly linear with negative slope, and 0 if there is no linear association.
If a multivariate probability distribution function
characterizing
the MRIO is known, it is in principle possible to use eq 1 to derive analytical expressions
of the stochastic properties (mean, standard deviation, and correlations) of the CBCA. However, as far as we are aware, such a formula is not available, so the typical way to calculate CBCA uncertainty is through Monte Carlo sampling.
In Monte Carlo sampling, a large sample of n realizations of the source data is generated, and for each realization the desired CBCA is calculated. These n CBCA realizations are then used to obtain aggregate metrics of the underlying probability distribution as
$$m = \frac{1}{n}\sum_{k=1}^{n} t^{(k)} \qquad (7)$$
$$s = \sqrt{\frac{1}{n-1}\sum_{k=1}^{n}\left(t^{(k)} - m\right)^2} \qquad (8)$$
$$r = \frac{1}{(n-1)\,s_1 s_2}\sum_{k=1}^{n}\left(t_1^{(k)} - m_1\right)\left(t_2^{(k)} - m_2\right) \qquad (9)$$
where we used t, t1, and t2 as arbitrary variables, and k is the iterator in the sample set. Variables m, s, and r are the empirically calculated mean, standard deviation, and correlation, and m1, m2, s1, and s2 are the means and standard deviations of variables t1 and t2.
A distinction
should be made regarding the way the sample of MRIOs is drawn from the probability distribution. With dependent sampling, all dimensions that compose the MRIO are obtained at the same time. Thus, if eq 9 is applied to the MRIO sample, the same values are obtained (for a large enough sample) as those from eq 6. If independent sampling is performed, then each dimension of the MRIO is sampled in isolation, and application of eq 9 to the MRIO sample will consistently yield a zero correlation coefficient (given a large enough sample).
Note that a linear combination of MRIO systems that are balanced
(row and column sums match) is itself balanced. Hence, every realization
obtained through dependent sampling is balanced since it will be a
weighted sum of the original MRIOs. A realization obtained from independent
sampling, however, is not necessarily balanced, since the dependencies
among row and column values of the original tables are not captured.
Dependent sampling respects the dependencies in the source data,
while independent sampling does not. Then why do all past studies
of CBCA uncertainty use independent sampling, as reviewed in the Introduction? Because collecting data to calibrate
the uncertainty of individual dimensions of an MRIO is already difficult
enough, and no one has so far been able to calibrate correlations
in the source data.
In order to make this problem tractable,
in the present study we
interpret the population of five available MRIOs as a (very small)
sample from the probability distribution of a true but unknown ‘meta’-MRIO.
We can then characterize both the meta-MRIO and the meta-CBCA distributions.
Note that it is conventional to start with a known theoretical
distribution (e.g., normal or log-normal) from which a (large) sample
is extracted. We did attempt to characterize a conventional multivariate
distribution based on the five-MRIO sample but were unsuccessful. In
the Discussion we present more details on
the probability distributions we explored.
To perform dependent
sampling, each individual MRIO is used to calculate a separate CBCA estimate, and eqs 7–9 are applied to the n = 5 resulting CBCAs to characterize the stochastic properties (mean, standard deviation, and correlations) of a meta-CBCA. Unless stated otherwise, all of the material reported in the Results section is calculated this way. The properties of the source data, which are examined in later subsections of the Results, are characterized by applying the same set of equations to the n = 5 population of MRIOs (interpreted as a sample of the unknown true meta-MRIO).
It is important
to contrast the results using dependent and independent sampling, to assess how much they differ. Hence, in the subsection Bias of Independent Sampling of the Results section, we make calculations with independent sampling. We obtain these by resampling the five-MRIO set a large number of times, such that each time a particular element of the MRIO is drawn with equal probability from any of the original five MRIOs. Mathematically, this procedure can be represented as follows. Let $\mathbf{t}^{(k)}$ be the vector representation of the k-th MRIO obtained from independent sampling, with k = 1, ..., n and n large, so that $t_i^{(k)}$ is a particular MRIO entry (e.g., some technical coefficient). Under independent sampling, every $t_i^{(k)}$ is set equal to the corresponding entry i of one of the original five MRIOs, each chosen with the same probability, independently of the other entries.
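The contrast between dependent and independent sampling, together with the empirical estimators of mean, standard deviation, and correlation, can be sketched as follows. The five two-entry "MRIOs" are made-up numbers, the helper names empirical_stats and independent_resample are hypothetical, and the sample standard deviation is assumed to use the 1/(n − 1) convention. With dependent sampling the five tables are used as they are; with independent sampling each entry of each draw is copied from a randomly chosen table.

```python
import numpy as np


def empirical_stats(samples):
    """Empirical mean, SD, and correlations of samples (n_draws, n_entries)."""
    m = samples.mean(axis=0)                    # empirical mean per entry
    s = samples.std(axis=0, ddof=1)             # SD with the 1/(n - 1) convention
    r = np.corrcoef(samples, rowvar=False)      # pairwise Pearson correlations
    return m, s, r


def independent_resample(mrios, n_draws, rng):
    """Independent sampling: entry i of draw k is copied from one of the
    source MRIOs, chosen uniformly at random and independently per entry."""
    n_src, n_entries = mrios.shape
    pick = rng.integers(0, n_src, size=(n_draws, n_entries))
    return mrios[pick, np.arange(n_entries)]


# Hypothetical data: five "MRIOs", each reduced to two positively related entries.
mrios = np.array([[1.0, 2.1],
                  [1.2, 2.4],
                  [0.9, 1.8],
                  [1.1, 2.2],
                  [1.3, 2.7]])

# Dependent sampling: the five tables are used as they are.
m_dep, s_dep, r_dep = empirical_stats(mrios)

# Independent sampling: a large resample that mixes entries across tables.
rng = np.random.default_rng(42)
draws = independent_resample(mrios, 20_000, rng)
m_ind, s_ind, r_ind = empirical_stats(draws)

print(r_dep[0, 1], r_ind[0, 1])   # strong correlation vs approximately zero

# The spread of a sum (e.g. a total footprint) shrinks when correlations are lost.
sd_sum_dep = mrios.sum(axis=1).std(ddof=1)
sd_sum_ind = draws.sum(axis=1).std(ddof=1)
print(sd_sum_dep, sd_sum_ind)
```

In this toy case independent resampling keeps the per-entry means essentially unchanged but destroys the correlation between the two entries, and with it part of the spread of their sum.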
Results
Uncertainties of Country CBCAs
The distribution of
country CBCAs is heterogeneous and dominated by a small set of data
elements (a small number of regions, sectors, extensions, inputs,
etc.). The USA and China each represent more than 16% of global emissions (as does the composite region Rest of the World), while the next largest emitting country (Japan) represents less than 5.1% (see Table 1). There
is no clear trend in the relation between the coefficient of variation
of regions and their respective size. In SI S3 we show the CV of all country CBCAs, where we see the lowest value
is for Indonesia, at 2.2%, and the highest value is for The Netherlands
at 16.0%, with a median of 7.5%.
Table 1
Expected Value and Coefficient of Variation of the CBCA of Selected Countries and the World
country | mean (% of world) | mean (GtCO2) | CV (%)
USA | 23.07 | 6.50 | 5.53
RoWorld | 16.72 | 4.71 | 10.72
China | 16.28 | 4.59 | 9.07
Japan | 5.08 | 1.43 | 8.51
India | 4.61 | 1.30 | 3.77
World | 100.00 | 28.18 | 5.74
Table 1 also shows that the CV of the world (calculated with dependent sampling) is 5.74%. Had this value been obtained by summing over country CBCAs while assuming them to be independent, that is, if the standard deviation of the world were obtained as $\sigma_{\text{world}} = \sqrt{\sum_k \sigma_k^2}$, where $\sigma_k$ is the standard deviation of country k, it would be 2.8%. Hence, assuming independence among country CBCAs underestimates the uncertainty of the world CBCA by half. This is because the distribution of correlations among country CBCAs strongly deviates from zero, as shown in Figure 1. Country CBCA correlations have a mean ± standard
deviation (SD) of 0.63 ± 0.36, with a median of 0.76. This means
that a practitioner, when faced with two countries and no additional
information, should assume their CBCA to exhibit a strong positive
correlation. Only Taiwan and the Rest of the World exhibit strong
negative correlations with some other regions. It would be interesting
to explore which characteristics the Taiwanese data shares with the
Rest-of-the-World data, but this question falls outside the scope
of the present study as it requires an in-depth study of those two
particular regions.
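The arithmetic behind the comparison between 5.74% and 2.8% can be illustrated as follows. This minimal sketch is not a reproduction of the calculation above: it uses the means and CVs of only the four individual countries listed in Table 1, together with a hypothetical correlation matrix in which every pair of countries has a correlation of 0.63 (the mean correlation reported above). The function name sd_of_total is also hypothetical.

```python
import numpy as np


def sd_of_total(sd, corr=None):
    """Standard deviation of a sum of country CBCAs.

    corr=None assumes independence: sqrt(sum_k sd_k^2).
    Otherwise the full covariance is used: sqrt(sd' R sd).
    """
    sd = np.asarray(sd, dtype=float)
    if corr is None:
        return float(np.sqrt(np.sum(sd ** 2)))
    return float(np.sqrt(sd @ np.asarray(corr, dtype=float) @ sd))


# Means (GtCO2) and CVs of four countries from Table 1.
mean = np.array([6.50, 4.59, 1.43, 1.30])   # USA, China, Japan, India
cv = np.array([0.0553, 0.0907, 0.0851, 0.0377])
sd = mean * cv

# Hypothetical correlation matrix: every pair set to 0.63 (illustration only).
rho = 0.63
corr = np.full((4, 4), rho)
np.fill_diagonal(corr, 1.0)

total = mean.sum()
print(sd_of_total(sd) / total, sd_of_total(sd, corr) / total)  # CV of the total
```

With positive correlations the covariance terms add to the variance of the total, so the independence formula necessarily gives the smaller of the two CVs; with all correlations equal to one, the SD of the total reduces to the plain sum of the country SDs.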
Figure 1
Correlation between country CBCAs. See SI Table S2 for the meaning of country codes.
Uncertainties of Product CBCAs
If we now look into
the contribution of different product categories to a country CBCA
we find that, in total, over 80% of emissions come from just five
product categories: electricity, transport, household direct emissions,
fuel and trade (a composite product category), and oil. Figure 2 shows the relation between the CV and expected value of product-level CBCAs, with power-law regression lines for those top-5 product categories. The power-law regression lines take the form $y = a x^{b}$, where y is the CV of the product CBCA, x is the mean of the product CBCA, and a and b are fitted parameters. Figure 2 uses a log–log scale because the data are widely scattered, covering several orders of magnitude on both axes. The points underlying each regression line are the 22 regions.
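A power-law regression of this kind is commonly fitted as a straight line in log–log space, since log y = log a + b log x. The sketch below assumes ordinary least squares on log-transformed values (the text does not state which fitting procedure was used), and the data are synthetic, generated from a known power law so that the fit can be checked; the function name fit_power_law is hypothetical. A fitted exponent b close to zero corresponds to a nearly horizontal line, that is, a CV that barely changes with the size of emissions.

```python
import numpy as np


def fit_power_law(x, y):
    """Fit y = a * x**b by ordinary least squares on log-transformed data.

    Returns (a, b). Assumes strictly positive x and y.
    """
    b, log_a = np.polyfit(np.log(x), np.log(y), deg=1)  # slope, intercept
    return float(np.exp(log_a)), float(b)


# Hypothetical product CBCAs for 22 regions: means spanning several orders of
# magnitude and CVs generated from a known power law (a = 0.08, b = -0.05).
mean = np.logspace(-3, 0, 22)
cv = 0.08 * mean ** -0.05

a, b = fit_power_law(mean, cv)
print(a, b)
```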
Figure 2
Coefficient of variation (CV) vs mean of CBCAs of product categories.
On the x axis, products with greater emission contributions
appear to the right of the plot. On the y axis, products
with larger variations appear at the top of the plot. A perfectly
horizontal line would imply that uncertainty does not vary at all
with the size of emissions from that product (as is approximately the
case for Electricity and Transport). Only regressions for the top 5
contributing product categories are shown.
SI Section S3 reports the numerical
values underlying Figure 2. The slopes of these curves are shallow, indicating that relative uncertainty does not decrease significantly with size (with the exception of the oil product category). Note also that, across consumption categories, some of those top-5 product categories have the highest CVs. Thus, our results do not support the observation that errors generally decrease with size.
We have also examined the correlation between the uncertainty of
aggregate consumption categories, as shown in Figures 3 and 4. That is, we have plotted the two-way correlation coefficients between emissions for the same product in different regions (Figure 3) and between product pairs within regions (Figure 4). We find that correlations are strongly positive for the same product category across regions. For ease of visualization, when comparing correlations of product-category pairs within a region we show only the top-5 categories. Here we find that the variation is low, while the medians span the whole range of possible correlation values. The correlation of product-category
pairs across regions, shown in SI Section S4, is somewhere in between, exhibiting larger dispersion and generally
more positive values.
Figure 3
Correlation of same product across regions. Red horizontal line is the median; the box spans the interquartile range (the central 50% of values); the maximum length of the whiskers is 1.5 times the interquartile range, and red circles are outliers.
Figure 4
Correlations between products within a region. Elec = Electricity;
FT = Fuel/trade; Trans = Transport; HH = households.
Bias of Independent Sampling
Analyses
of CBCA uncertainty
are usually performed using independent sampling or, equivalently,
by explicitly assuming source data to be uncorrelated. That is, it
is assumed that data from different sources for different nations
or product categories can vary independently, e.g., emissions from
electricity in China going up while emissions from electricity in
the USA go down. We now compare the performance of such approximations
with the actual results from dependent sampling.
Note that even
if one decides to ignore correlations in CBCA calculations, source
data in MRIO databases is reported in absolute terms, while the use
of MRIO data in CBCA calculations requires the conversion of intermediate
inputs to coefficients. Dietzenbacher,[53] building upon the work of Roland-Holst,[54] finds that applying uncertainty estimates to the data before or after the calculation of coefficients yields similar results, as the bias (i.e., over- or underestimation of true results) is small.
We consider three
approximations, corresponding to the assumption
of zero correlations in the flow, coefficient, and modular versions of the MRIO (see Methods for details). We performed Monte Carlo sampling with 10 000 simulations, for which we report the bias (i.e., the difference between the approximation and the true value) of the mean and CV of CBCAs at different levels of aggregation.
We report values for the relative bias of the mean, defined as (y − x)/x, where x is the actual value and y is the approximation, and for the absolute bias of the CV, y − x, where x and y hold the same meaning. For the world as a whole, the bias
in the mean value is +5% for both the modular and coefficient formats
and +0.06% for the flow format, while corresponding percentages for
CV are in the range of 4 to 6%, as shown in Table 2. That is, in all cases, by assuming independent sampling as in conventional approaches, total emissions are overestimated when compared to dependent sampling, and only the bias of the mean from the flow format is negligible. Average country-level biases of the mean are much lower than for the global aggregate but vary widely across countries, with the flow format again exhibiting a noticeably smaller bias, and a smaller standard deviation, than the other formats. Product-level biases in CBCAs for the flow format are clustered
around zero, with a small but positive average. The product-level
biases of the other two formats have a large variation, with a standard
deviation about five times larger than that of the flow format. The majority
of product CBCA CV biases are in the negative range, and the flow
format values are especially so, with a smaller SD than the other
two formats.
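The two bias metrics defined above can be written as a small function. In this hedged sketch, x is taken from dependent sampling and y from independent sampling, CVs use the 1/(n − 1) standard deviation, and the realizations are made-up numbers chosen only to exercise the formulas; the function name sampling_bias is hypothetical.

```python
import numpy as np


def sampling_bias(dep, ind):
    """Bias of independent vs dependent sampling for one CBCA.

    dep, ind: 1-D arrays of CBCA realizations from dependent and
    independent sampling. Returns (relative bias of the mean,
    absolute bias of the CV), i.e. (y - x)/x and y - x, with x from
    dependent and y from independent sampling.
    """
    x_mean, y_mean = dep.mean(), ind.mean()
    x_cv = dep.std(ddof=1) / x_mean
    y_cv = ind.std(ddof=1) / y_mean
    return (y_mean - x_mean) / x_mean, y_cv - x_cv


# Hypothetical realizations of a single CBCA (arbitrary units).
dep = np.array([27.0, 28.5, 29.0, 27.5, 28.9])
ind = np.array([29.4, 29.6, 29.5, 29.7, 29.3, 29.5])
rel_mean_bias, cv_bias = sampling_bias(dep, ind)
print(f"mean bias: {rel_mean_bias:+.2%}, CV bias: {cv_bias:+.2%}")
```

Reporting the mean bias in relative terms but the CV bias in absolute terms keeps both in percent, since a CV is itself already a relative quantity.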
Table 2
Bias of the Mean and CV of Aggregate CBCAs(a)
level | metric | modular | coefficient | flow
world | mean | 4.96 | 5.06 | 0.06
world | CV | 5.91 | 5.72 | 4.35
countries | mean | 0.88 ± 4.93 | 0.44 ± 4.84 | 0.17 ± 2.25
countries | CV | 17.27 ± 6.00 | 15.57 ± 6.19 | 11.44 ± 6.40
products | mean | 4.00 ± 24.02 | 3.50 ± 25.69 | 0.59 ± 5.03
products | CV | −8.82 ± 23.95 | −11.6 ± 24.09 | −17.97 ± 16.32
(a) For countries and products, values reported are expected value ± standard deviation. All figures are in percent.
For countries and products values
reported are expected value ± standard deviation. All figures
are in percentage.Thus,
our analysis does not support the claim found in the literature,
obtained with independent sampling calculations, that biases are positive
but negligible. We find them to be substantial and, in the case of
uncertainty, systematically negative, meaning that uncertainty is
underestimated by independent sampling. In SI Section S5 we report the values of biases for countries and
products for the different formats.
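To illustrate the two bias measures defined above, and why ignoring correlations shrinks the CV of an aggregate, here is a minimal Python sketch on hypothetical data. It assumes two positively correlated sector-level emission values drawn from a normal distribution (chosen only for simplicity); the function name `bias_metrics`, the means, the 20% CVs, and the correlation of 0.8 are illustrative assumptions, not values from the MRIO data. Because this toy aggregate is a plain sum, independent sampling leaves its mean essentially unbiased and only understates its CV.

```python
import numpy as np


def bias_metrics(true_samples, approx_samples):
    """Relative bias of the mean, (y - x)/x, and absolute bias of the CV, y - x.

    x refers to the dependent ("true") sample and y to the approximation.
    Both results are returned in percent.
    """
    x_mean, y_mean = true_samples.mean(), approx_samples.mean()
    x_cv = true_samples.std() / x_mean
    y_cv = approx_samples.std() / y_mean
    mean_bias = 100.0 * (y_mean - x_mean) / x_mean
    cv_bias = 100.0 * (y_cv - x_cv)
    return mean_bias, cv_bias


rng = np.random.default_rng(42)
n_sim = 10_000  # number of Monte Carlo simulations, as in the text

# Hypothetical inputs: two emission values with 20% CV and correlation 0.8.
mean = np.array([100.0, 50.0])
sd = 0.2 * mean
rho = 0.8
cov = np.array([[sd[0] ** 2, rho * sd[0] * sd[1]],
                [rho * sd[0] * sd[1], sd[1] ** 2]])

# Dependent sampling: both values drawn jointly, respecting the covariance.
dependent = rng.multivariate_normal(mean, cov, size=n_sim).sum(axis=1)
# Independent sampling: same marginals, correlation ignored.
independent = rng.multivariate_normal(mean, np.diag(sd ** 2), size=n_sim).sum(axis=1)

mean_bias, cv_bias = bias_metrics(dependent, independent)
print(f"relative bias of mean: {mean_bias:+.2f}%")
print(f"absolute bias of CV:   {cv_bias:+.2f} percentage points")
```

With these settings the CV of the sum is about 19% under dependent sampling but about 15% under independent sampling, so the CV bias is clearly negative.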
Patterns in Source Data Uncertainty
It is conventional in IO studies to assume a downward-sloping relation between uncertainty (CV) and size (expected value). For example, if a country has a larger electricity sector, then the relative uncertainty of its emissions should be lower. This rests either on statistical evidence or on the assumption that each value in the IO data set is actually the result of summing over smaller values, whose errors cancel out. In our data set, however, the relation between CV and mean is mostly flat, with a large variance, for most data blocks, as exemplified in Figure 5 for industry emission coefficients and in SI Section S6 for other data blocks. Figure 5 uses a log-log scale because the data are widely scattered, covering several orders of magnitude on both axes.
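The error-cancellation argument, and why correlations undermine it, can be stated with a standard variance identity (a textbook result, given here for illustration rather than taken from the paper). Assume an aggregate $S = \sum_{i=1}^{n} x_i$ whose $n$ parts share the same mean $\mu$, the same coefficient of variation $c$, and a common pairwise error correlation $\rho$:

```latex
\begin{aligned}
\operatorname{Var}(S) &= \sum_{i=1}^{n} \operatorname{Var}(x_i) + 2\sum_{i<j} \operatorname{Cov}(x_i, x_j)
  = n c^{2}\mu^{2} + n(n-1)\,\rho\, c^{2}\mu^{2}, \\
\mathrm{CV}(S) &= \frac{\sqrt{\operatorname{Var}(S)}}{n\mu}
  = c\,\sqrt{\frac{1 + (n-1)\rho}{n}}.
\end{aligned}
```

Only when the parts are independent ($\rho = 0$) does the relative uncertainty fall as $c/\sqrt{n}$ with the size of the aggregate; for $\rho > 0$ it levels off at $c\sqrt{\rho}$ as $n$ grows, and for $\rho = 1$ it does not fall at all.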
Figure 5
Uncertainty of production emission coefficients. The x axis shows the mean emission coefficients (carbon intensities), with higher carbon intensities toward the right-hand side. The y axis shows the coefficients of variation of those emission coefficients, with higher variation toward the top of the plot. Estimates for each sector are fitted with a regression line and show that there is no systematic reduction in uncertainty (CV) as the sector emission coefficient increases.

To better understand the relation between the expected value (magnitude) of a data point and its uncertainty, we calculated power-law regressions of the form $y = a x^{b}$, where $y$ is the CV, $x$ is the expected value, and $a$ and $b$ are fitted parameters. We performed this for all blocks in the MRIO data set. When a single regression is performed per data block, the coefficient of determination R² is low, as seen in column 'single' of Table 3 and the red line in Figure 5. That is, across the sectors and regions of a given data block, the CV of a coefficient is poorly explained by its magnitude. We additionally performed similar regressions for every sector-region combination (depending on the data block) that would still leave 22 regions as data points for each regression. The black lines of Figure 5 illustrate the result of this exercise for industry emission coefficients, and column 'sector' of Table 3 shows the R² values. The R² values in column 'sector' are obtained by squaring the correlation between the set of true values and the set of values obtained from the separate n sector-level regressions in each data block, where n is indicated in the last column of Table 3.
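The two R² measures in Table 3 can be sketched in Python on synthetic data. This is an illustration under stated assumptions, not the authors' code: the power law $y = a x^{b}$ is fitted by ordinary least squares on log-transformed values, R² is computed on the log scale (the text does not specify the scale), and the data block (three hypothetical sectors observed in 22 regions each) is constructed so that CV rises with magnitude within each sector while the pooled trend is roughly flat, the pattern described for Figure 5. The helper names `fit_power_law` and `r_squared` are illustrative.

```python
import numpy as np


def fit_power_law(x, y):
    """Fit y = a * x**b by least squares on log-transformed data; return (a, b)."""
    b, log_a = np.polyfit(np.log(x), np.log(y), deg=1)
    return np.exp(log_a), b


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and fitted values (log scale)."""
    r = np.corrcoef(np.log(observed), np.log(predicted))[0, 1]
    return r ** 2


rng = np.random.default_rng(0)
n_regions = 22

# Hypothetical block: within each sector CV grows with magnitude (b = 0.3),
# but larger sectors sit at lower CV levels, so the pooled trend is flat.
means, cvs, sector_ids = [], [], []
for s, (lo, hi, log_a) in enumerate([(0, 2, 0.0), (2, 4, -0.6), (4, 6, -1.2)]):
    log_x = np.linspace(lo, hi, n_regions)
    log_cv = log_a + 0.3 * log_x + rng.normal(0.0, 0.1, n_regions)
    means.append(np.exp(log_x))
    cvs.append(np.exp(log_cv))
    sector_ids.append(np.full(n_regions, s))
x = np.concatenate(means)
cv = np.concatenate(cvs)
sector = np.concatenate(sector_ids)

# 'single': one regression for the whole data block.
a, b = fit_power_law(x, cv)
single_r2 = r_squared(cv, a * x ** b)

# 'sector': one regression per sector; pooled fitted values vs observed values.
fitted = np.empty_like(cv)
for s in np.unique(sector):
    mask = sector == s
    a_s, b_s = fit_power_law(x[mask], cv[mask])
    fitted[mask] = a_s * x[mask] ** b_s
sector_r2 = r_squared(cv, fitted)

print(f"single-block slope b = {b:.3f}, R2 = {single_r2:.2f}")
print(f"per-sector R2 = {sector_r2:.2f}")
```

In this construction the single-block slope is close to zero and its R² is low, while the per-sector R² is much higher, mirroring the contrast between the 'single' and 'sector' columns.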
Table 3
R² Coefficient of Power-Law Regressions between Mean and CV of Different Source Data Blocks (a)

| block | single (%) | sector (%) | n |
|---|---|---|---|
| scale | 3.98 | 3.98 | 1 |
| household emissions | 8.55 | 8.55 | 1 |
| composition | 14.26 | 70.93 | 17 |
| industry emissions | 0.07 | 70.16 | 17 |
| technology | 15.5 | 55.82 | 289 |
| trade in final products (self) | 52.28 | 70.90 | 17 |
| trade in intermediate products (self) | 41.08 | 62.43 | 17 |
| trade in final products (other) | 9.88 | 66.71 | 357 |
| trade in intermediate products (other) | 9.49 | 62.77 | 357 |

(a) Single = one regression per block; sector = one regression per sector; n = number of regressions per block.

What we can learn from Figure 5 is that, at the level of the data block as a whole, there is a flat trend connecting uncertainty and magnitude (albeit with a large variation). That is, at the level of the data block, uncertainties do not decrease with size but remain fairly stable. However, when examined at the level of specific sectors (the black lines, mostly with a positive gradient), uncertainty actually increases with the size of the sector. The overall flat pattern (the almost horizontal red line) emerges only when different sectors are bundled together. Similar plots to Figure 5 for every data block
are reported in SI Section S6; the same reasoning applies, although the results are less extreme: the relation between uncertainty and size at the block level is downward-trending, but at the sector level this pattern is sometimes reversed, with the block-level pattern emerging from the juxtaposition of sector data.

Separate plots in SI Section S6 and separate summary statistics in Table 3 are reported for international trade coefficients with the same country (self) and with other countries (other), because the former coefficients are generally high and the latter generally low, and we found the patterns were clearer when examined separately.

We conclude that practitioners should not expect, a priori, the relation between CV and mean to be downward sloping at the level of sectoral data, even if it is so for a data block as a whole. In fact, in many cases, uncertainty may actually increase with size.
Uncertainty Reduction
Next, we determine which factors dominate the uncertainty of the global CBCA. We do this by exploring the effect of reducing the uncertainty of specific source data elements or blocks. This allows us to prioritize efforts to improve data collection for IO tables and to better understand the interaction between uncertainty and the different components of IO tables.

We explore this by an iterative, stepwise reduction of uncertainty in the source data, comparing the resulting CBCA uncertainty after each step. At each step, we find the data point in the IO tables for which setting that data point to the average of the separate MRIO values would give the greatest reduction in the overall uncertainty of the world CBCA. We find that 20 elements account for 99.9% of the uncertainty of global CBCA; they are listed in Table 4 (the key of data block acronyms is reported in Table 5). Since this process is path dependent, to assess the robustness of the result we repeated the calculations with alternative settings: setting the value not to the average but either to the minimum or to the maximum of the five MRIOs, and not setting the value of all MRIOs exactly to the average but moving them in that direction by 10% and 50%. In all cases convergence is fast and most of the same data elements reappear, although the exact data elements in each top-20 set are not always the same, as shown in SI Section S7. In future studies, this path dependence might be avoided if a linear approximation is performed.[55−57]
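The stepwise procedure can be sketched in Python as a greedy search. This is a minimal illustration under assumptions, not the authors' implementation: the function name `greedy_uncertainty_reduction` and the toy data are hypothetical, each data element is fixed once, and uncertainty is measured simply as the standard deviation of the result across five alternative versions of the data, whereas the study works with a stochastic model calibrated on the five MRIOs. The `target` and `fraction` parameters mimic the robustness variants described above (minimum or maximum instead of the mean, and partial moves).

```python
import numpy as np


def greedy_uncertainty_reduction(versions, model, target=np.mean, fraction=1.0):
    """Rank data elements by how much fixing them reduces result uncertainty.

    versions : (K, m) array, K alternative estimates (e.g. one per MRIO) of m elements
    model    : maps a length-m data vector to a scalar result (e.g. world CBCA)
    target   : value all versions are moved towards (np.mean, np.min or np.max)
    fraction : share of the distance to the target covered (1.0 = all the way)
    Returns (element, individual %, remaining %) tuples, relative to the
    initial uncertainty.
    """
    data = np.asarray(versions, dtype=float).copy()

    def spread(d):
        # Uncertainty proxy (assumption): std of the result across versions.
        return np.std([model(row) for row in d])

    initial = current = spread(data)
    remaining = list(range(data.shape[1]))
    ranking = []
    while remaining:
        best_j, best_s, best_data = None, np.inf, None
        for j in remaining:
            trial = data.copy()
            # Move every version of element j towards the common target value.
            trial[:, j] += fraction * (target(data[:, j]) - data[:, j])
            s = spread(trial)
            if s < best_s:
                best_j, best_s, best_data = j, s, trial
        ranking.append((best_j, 100 * (current - best_s) / initial, 100 * best_s / initial))
        data, current = best_data, best_s
        remaining.remove(best_j)
    return ranking


# Hypothetical toy data: 5 versions (rows) of 3 emission coefficients
# (columns 0-2) and 3 sector outputs (columns 3-5).
versions = np.array([
    [0.9, 0.30, 0.10, 100.0, 200.0, 300.0],
    [1.1, 0.25, 0.12, 105.0, 190.0, 310.0],
    [0.8, 0.35, 0.09, 95.0, 210.0, 290.0],
    [1.0, 0.28, 0.11, 110.0, 205.0, 305.0],
    [1.2, 0.32, 0.10, 98.0, 195.0, 295.0],
])


def world_cbca(v):
    # Stand-in for the world CBCA: total emissions = coefficients . outputs.
    return v[:3] @ v[3:]


ranking = greedy_uncertainty_reduction(versions, world_cbca)
for element, ind, cum in ranking:
    print(f"element {element}: individual {ind:6.2f}%, remaining {cum:6.2f}%")
```

The two returned percentages play the roles of the "Ind." and "Cum." columns of Table 4 in this toy setting; the individual reductions sum to 100% once every element has been fixed to the mean. Because each step depends on the elements already fixed, the ranking is path dependent, which is why the alternative targets are worth rerunning.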
Table 4
Reduction in Global CBCA Uncertainty When the Uncertainty of the Top 20 Data Elements Is Eliminated, Assuming the Mean Is the True Value (a)

| rank | block | region (source → destination) | sector (source → destination) | Ind. (%) | Cum. (%) |
|---|---|---|---|---|---|
| 1 | IndEm | RoWorld | Electricity | 19.95 | 80.05 |
| 2 | IndEm | Russia | Electricity | 9.04 | 71.01 |
| 3 | IndEm | China | Electric eq. | 9.13 | 61.87 |
| 4 | Scale | USA | | 6.83 | 55.05 |
| 5 | Tech | China | Electricity → Electricity | 8.25 | 46.80 |
| 6 | Comp | China | Electricity | 8.03 | 38.76 |
| 7 | IntTr | RoWorld → RoWorld | Transport | 5.34 | 33.42 |
| 8 | IndEm | Russia | Oil | 5.18 | 28.24 |
| 9 | Tech | India | Electricity → Transport | 4.71 | 23.53 |
| 10 | IntTr | China → China | Electric eq. | 4.23 | 19.30 |
| 11 | Tech | RoWorld | Transport → Transport | 3.22 | 16.08 |
| 12 | IndEm | China | Construction | 3.41 | 12.67 |
| 13 | Comp | RoWorld | Other serv. | 2.90 | 9.77 |
| 14 | IntTr | China → China | Mining | 2.65 | 7.12 |
| 15 | Tech | USA | Electricity → Fuel/trade | 2.40 | 4.72 |
| 16 | IndEm | RoWorld | Construction | 2.81 | 1.91 |
| 17 | IndEm | Canada | Oil | 1.31 | 0.61 |
| 18 | Tech | Russia | Transport → Oil | 0.42 | 0.19 |
| 19 | Tech | USA | Oil → Communicat. | 0.14 | 0.05 |
| 20 | IntTr | France → RoWorld | Transport eq. | 0.04 | 0.01 |

(a) The key of data block acronyms is reported in Table 5. Ind = individual; Cum = cumulative.
Table 5
List of Acronyms of Data Blocks

| | short | long |
|---|---|---|
| 1 | Scale | scale |
| 2 | Comp | composition |
| 3 | Tech | technology |
| 5 | FinTr | trade in final goods |
| 6 | IntTr | trade in intermediate goods |
| 7 | HHEm | household emissions |
| 8 | IndEm | industry emissions |

Table 4 supports
the idea that, to understand uncertainty, domestic data are more important than international trade data. China is by a large margin the most frequent country, followed by the rest of the world and other large economies. Among data blocks, industry emissions stand out. Finally, electricity is the most frequent sector in this list, followed by transport, with other sectors appearing much less frequently (oil, transport equipment, metals, construction and mining, among others). These findings reflect the underlying choices made in MRIO data construction, whether in the choice of data (geographical specificity, timeliness, reliance on physical or monetary relationships) or in the conceptual approach to construction (prioritization of different data, level of balancing allowed, etc.). Other studies have found similar relationships and offer some explanation: Wieland et al.[58] also found that Chinese domestic data are an issue; Tukker et al.[59] found that the highest uncertainty in footprint analyses is caused by the environmental data; and Owen et al.[60] found that structural paths involving the electricity sector contribute significantly to model differences because of the way electricity is treated in different databases.

In summary, the examination of embodied emissions and the elimination of uncertainty in inputs yield a consistent perspective, with a small number of blocks, sectors, and regions accounting for the bulk of uncertainty.
Discussion
Empirical Estimates
Hertwich and Peters[42] reported uncertainty estimates for several data blocks, which can be compared with our own results. They report a CV of 50–200% for product CBCAs and of 5–15% for country CBCAs. These numbers agree well with our results, in which the CV of product CBCAs ranged from 10 to 200% and that of country CBCAs from 2 to 16%.

Concerning source data, the same reference[42] reports that the CV of emission coefficients is 5–10% for OECD and 10–20% for non-OECD countries; that of consumption coefficients is 10%; that of technical coefficients is 1–50%; and trade coefficients have 20% uncertainty concerning country of origin and 10% concerning trade volumes. By contrast, in our study the CVs of industry emission coefficients ranged from 10 to 200%, household emissions from 10 to 50%, consumption from 2 to 200%, technical coefficients from 1 to 200%, trade coefficients with other countries from 5 to 200%, self-trade coefficients from 0.2 to 50% (ignoring outliers), and scale coefficients from 1 to 15%.

Although not all values agree closely, our results concerning CBCAs are in line with the literature, whereas they differ strongly concerning the source data, with our results suggesting a wider uncertainty range in all data blocks.
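CVs of this size have direct consequences for the choice of sampling distribution discussed in the next subsection. The Python sketch below uses only standard properties of the normal and log-normal distributions (no MRIO data): the share of negative draws from a normal variable with a given CV, and the most negative correlation that two log-normal variables with equal CV can reach, which is $-1/(1+\mathrm{CV}^2)$. The function names are illustrative.

```python
import math


def share_negative_normal(cv):
    """P(X < 0) for a normal variable with positive mean and coefficient of variation cv."""
    # X = mu * (1 + cv * Z), so X < 0 iff Z < -1/cv; Phi(z) = 0.5 * (1 + erf(z / sqrt(2))).
    return 0.5 * (1.0 + math.erf(-1.0 / (cv * math.sqrt(2.0))))


def min_lognormal_correlation(cv):
    """Most negative correlation attainable by two log-normal variables with equal CV.

    For Y = exp(X), with X bivariate normal (log-scale sd sigma, correlation rho),
    corr(Y1, Y2) = (exp(rho * sigma**2) - 1) / (exp(sigma**2) - 1). Its minimum,
    at rho = -1, is -exp(-sigma**2) = -1 / (1 + cv**2), since cv**2 = exp(sigma**2) - 1.
    """
    return -1.0 / (1.0 + cv ** 2)


for cv in (0.1, 0.5, 1.0, 2.0):
    print(f"CV = {cv:4.0%}: normal negatives {share_negative_normal(cv):6.2%}, "
          f"log-normal min. correlation {min_lognormal_correlation(cv):+.3f}")
```

At a CV of 200%, about 31% of normal draws are negative, and two log-normal variables cannot be more negatively correlated than −0.2.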
Data and Method Considerations
The conventional procedure in IO independent Monte Carlo integration is to assume a normal[26,35] or log-normal[25,43] distribution, although others have also been used, in particular the beta distribution[40] (for a review see Kop Jansen[61] and Temurshoev[36]). For several technical reasons, we could not use multivariate versions of these distributions. The normal distribution was ruled out because uncertainties are too high, leading to an unacceptably large proportion of negative values. The log-normal distribution cannot handle the wide range of strong negative covariances observed in the data set. We could not find in the literature a natural multivariate version of another distribution that would accommodate arbitrary covariances, as required to calibrate our model, although we explored variants of the beta,[62] gamma,[63] and folded normal[64] distributions.

We also tried to use a formal sensitivity method to quantify how much a particular source data point contributes to the resulting uncertainty. As recently reviewed by Borgonovo and Plischke,[65] there are two main types of sensitivity analysis: local and global. Local sensitivity analysis examines the effect on model output of a change in a single parameter at a time and is employed in a deterministic framework, e.g., to identify the parameters that most strongly affect key sectors.[66] Global sensitivity analysis (GSA) apportions the variation in model response among its multiple inputs simultaneously. The most popular GSA method is the variance decomposition of Sobol,[67] in which the variance of the output is split into additive terms that reflect the contribution of input variance, although distribution-based GSA methods[68] are gaining popularity.

In the end, given the small sample size of the source data (five MRIO databases), we chose not to use these methods and instead performed the heuristic analysis reported in this paper. As more MRIOs become available, or as future revisions of existing MRIOs converge, it may become possible to use these more sophisticated techniques. Our present results are an important step forward and a benchmark against which to compare those future studies.

Besides the analysis of correlations performed here, it might also be interesting in the future to explore the role of partial correlations in MRIO and CBCA uncertainty.
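For readers unfamiliar with the variance decomposition of Sobol mentioned above, here is a minimal Python sketch of first-order indices, $S_i = \operatorname{Var}(E[Y \mid X_i]) / \operatorname{Var}(Y)$, estimated with a standard pick-freeze Monte Carlo estimator of the kind popularized by Saltelli and co-authors. It is a swapped-in illustration, not a method used in this study: it assumes independent standard normal inputs (which the CBCA source data are not) and a hypothetical linear toy model $Y = X_1 + 2X_2$, whose exact indices are 0.2 and 0.8.

```python
import numpy as np


def first_order_sobol(model, n_inputs, n_samples, rng):
    """Estimate first-order Sobol indices with a pick-freeze estimator.

    Assumes independent standard normal inputs.
    """
    A = rng.standard_normal((n_samples, n_inputs))
    B = rng.standard_normal((n_samples, n_inputs))
    fA, fB = model(A), model(B)
    var_y = np.var(np.concatenate([fA, fB]))
    indices = []
    for i in range(n_inputs):
        ABi = A.copy()
        ABi[:, i] = B[:, i]  # A with column i taken from B
        indices.append(np.mean(fB * (model(ABi) - fA)) / var_y)
    return np.array(indices)


def toy_model(X):
    # Hypothetical linear model Y = X1 + 2 X2; exact indices are (0.2, 0.8).
    return X[:, 0] + 2.0 * X[:, 1]


S = first_order_sobol(toy_model, n_inputs=2, n_samples=100_000,
                      rng=np.random.default_rng(1))
print(S)
```

The estimator needs many model evaluations per input, which illustrates why such methods call for richer input distributions than five MRIO versions can provide.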
Final Remarks
For CBCA practitioners, our results suggest caution about extrapolating uncertainty when aggregating results over both spatial and sectoral scales. Contrary to the established literature, we do not recommend assuming that errors cancel out and that independence can be safely assumed. We found that the CBCAs of whole countries are strongly correlated and that, in general, the uncertainty of product CBCAs is not reduced as the size of that product CBCA increases. We also found that independent sampling (i.e., ignoring correlations in the source data) leads to the underestimation of uncertainty.

For MRIO developers, our results point out the elements of the data landscape in which refinement efforts should be prioritized if the goal is to reduce the uncertainty of CBCAs. These are primarily the environmental extensions, among data blocks; data related to the electricity supply chain, from a sectoral point of view; and China, from a geographic point of view. More generally, we provide a methodology for prioritizing data refinements even if criteria other than global CBCA are considered. Although the analysis reported here focused on CBCAs, a similar study could be performed for other environmental or economic extensions, provided they are reported by all MRIOs present in the sample.

Finally, since part of the data used here is the same data used to calculate production-based and income-based carbon accounts,[1] all the results presented here are also transferable, within the scope of the relevant data block. More specifically, income-based carbon accounts[69,70] require all of the data used here, with value added coefficients replacing the role of consumption coefficients. In the case of production-based accounts, it is direct emissions (from industry and households) and total (intermediate and final) consumption that are relevant.

CBCA is seen as a prominent alternative to traditional, production-based approaches, and it could open up new opportunities for climate policy innovation.[8] However, the limited understanding of uncertainties in the data used for CBCA has been a key limiting factor.[71] The work outlined here represents a step toward understanding these uncertainties and provides a basis for developing a standardized procedure for CBCA uncertainty estimation.
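The relation among production-based, consumption-based, and income-based accounts invoked in the transferability argument can be illustrated with a minimal Python sketch. It uses the standard textbook Leontief (demand-driven) and Ghosh (supply-driven) formulations for a single hypothetical region with three sectors; all numbers are invented, household emissions are omitted, and an MRIO would stack regions into the same matrices. Consumption-based accounts combine emission coefficients with final demand through the Leontief inverse, whereas income-based accounts use value added through the Ghosh inverse, and all three allocate the same total.

```python
import numpy as np

# Hypothetical single-region, 3-sector economy (monetary units).
Z = np.array([[10.0, 20.0, 5.0],   # intermediate flows, rows = selling sector
              [15.0, 5.0, 10.0],
              [5.0, 10.0, 20.0]])
y = np.array([50.0, 40.0, 60.0])         # final demand
emissions = np.array([40.0, 10.0, 5.0])  # direct industry emissions

x = Z.sum(axis=1) + y   # total output
v = x - Z.sum(axis=0)   # value added (primary inputs)
f = emissions / x       # emission coefficients per unit output

A = Z / x               # technical coefficients, A[i, j] = Z[i, j] / x[j]
B = Z / x[:, None]      # allocation coefficients, B[i, j] = Z[i, j] / x[i]
L = np.linalg.inv(np.eye(3) - A)  # Leontief inverse
G = np.linalg.inv(np.eye(3) - B)  # Ghosh inverse

production_based = f * x            # emissions where they occur
consumption_based = (f @ L) * y     # emissions attributed to final demand
income_based = v * (G @ f)          # emissions attributed to primary inputs

print(production_based.sum(), consumption_based.sum(), income_based.sum())
```

The totals coincide because $x = Ly$ and $x^{\top} = v^{\top} G$, so uncertainty in the shared blocks (emission coefficients and interindustry structure) propagates into all three accounts.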