
Uncertainty of Consumption-Based Carbon Accounts.

João F D Rodrigues¹, Daniel Moran², Richard Wood², Paul Behrens¹,³.

Abstract

Consumption-based carbon accounts (CBCAs) track how final demand in a region causes carbon emissions elsewhere due to supply chains in the global economic network, taking into account international trade. Despite the importance of CBCAs as an approach for understanding and quantifying responsibilities in climate mitigation efforts, very little is known of their uncertainties. Here we use five global multiregional input-output (MRIO) databases to empirically calibrate a stochastic multivariate model of the global economy and its GHG emissions in order to identify the main drivers of uncertainty in global CBCAs. We find that the uncertainty of country CBCAs varies between 2 and 16% and that the uncertainty of emissions does not decrease significantly with their size. We find that the bias of ignoring correlations in the data (that is, independent sampling) is significant, with uncertainties being systematically underestimated. We find that both CBCAs and source MRIO tables exhibit strong correlations between the sector-level data of different countries. Finally, we find that the largest contributors to global CBCA uncertainty are the electricity sector data globally and Chinese national data in particular. We anticipate that this work will provide practitioners an approach to understand CBCA uncertainties and researchers compiling MRIOs a guide to prioritize uncertainty reduction efforts.


Year: 2018 PMID： 29897752 PMCID： PMC6150677 DOI： 10.1021/acs.est.8b00632

Source DB: PubMed Journal: Environ Sci Technol ISSN： 0013-936X Impact factor: 9.028

Introduction

International approaches to climate change have relied upon assigning responsibility for emissions worldwide such that nations face a common but differentiated responsibility in climate change mitigation. This is commonly referred to as carbon accounting and can vary nation by nation depending on the methodologies used.[1] Recent attention has been placed on computing the amount of greenhouse gases emitted along international supply chains to satisfy the demand for goods and services,[2−7] having implications for where climate mitigation burdens should lie. At a macro level, such consumption-based carbon accounts (CBCA)[8] are calculated using global multiregional input-output (MRIO) databases,[9] whose compilation involves the collection and processing of large amounts of source data.[10−14] Such databases typically report only a point estimate for each source datum (for example, emissions from coal-generated electricity in China), with only one existing database currently reporting uncertainty estimates.[11] We consider it is important to better understand the uncertainty of CBCAs, in order to ensure the robustness of CBCA in policy applications.[15] Note that since more data transformations are involved in their calculation, CBCAs are expected to exhibit higher uncertainty than the corresponding production-based metrics.[16] At an empirical level Sato[15] and Owen[17] provide an overview of environmental and economic data sources of CBCA, each subject to uncertainty. Environmental data includes the estimation of GHG emissions, their allocation to economic activities, and, for the purpose of calculating climate change contributions, weighting based on global warming potential. The uncertainty of GHG emission data itself can be quite high. 
Liu et al.[18] found that estimates of total CO2 emissions for China in 2008 vary by 15%, in large part due to problems in measuring coal consumption.[19] Yoshida et al.[20] found that the coefficient of variation (or CV, the standard deviation divided by the mean) of the emissions of carbon per unit of output of different firms within a manufacturing sector in Japan in 1990 was larger than one for 17 out of 23 sectors. More generally, and still looking only at data on GHG emissions, Ballantyne et al.[21] provide a review of different sources of error in the global carbon budget and find, among other things, that in 2010, 30% of the uncertainty in global carbon accumulation was due to errors in data on fossil fuel use. Within IO tables themselves, economic data includes national accounts on final and intermediate demand of various goods and services and international trade, and these can also hold uncertainties. Manski[22] provides an overview of the sources of uncertainty in official national statistics, and Helbling and Terrones[23] find that the discrepancy between global imports and exports is now on the order of 1%. International trade data is not usually available with the same level of detail as domestic data, and therefore additional estimation procedures are often necessary,[24] with different MRIO databases resorting to different methodological options (see Chapter 2 of Owen[17] for a review of how MRIO databases differ in source data and construction). In short, this means that MRIO tables suffer more uncertainty than single-region IO tables. The current consensus is that the major source of uncertainty affecting CBCA results is the GHG emission data and not the economic data. In support of this hypothesis Owen et al.[16] found that national CBCAs have uncertainties in the same range as production-based ones.
Karstensen et al.[25] furthermore point to the relatively large influence of emission coverage and choice of global warming or temperature response metric according to different time horizons. Finally, Moran and Wood[26] found that there is a general agreement in the CBCA results from different global MRIOs, again with the largest discrepancies being found in the GHG emission extensions. However, due to limitations of data availability and computational power, the reporting of uncertainty in CBCA is still rare. Further, when uncertainty is estimated, a key simplification is performed whereby the underlying calculations assume the data to be independent, that is, correlations between different elements in the data set are not considered. Correlation is a metric of association between different data elements, which measures the strength of the linear relationship between the data. Examples are the relationship between energy recorded in the intermediate transaction matrix and fuel combustion emissions in the GHG account or the level of demand for a product to the output of the product. While there is a large tradition in the study of the uncertainty of IO models,[27−35] recently reviewed in Temurshoev,[36] this literature does not usually take into account correlations and thus how uncertainty varies jointly between data elements. A few exceptions include[37−39] those reviewed by Ten Raa and Steel[40] and, more recently, studied by Rodrigues.[41] In investigating CBCA uncertainty, a few studies explicitly mention the role of correlations or data dependence. Hertwich and Peters[42] divide the source data by blocks, present estimates of uncertainty for each, and argue that the relative uncertainty of country CBCAs is lower than that of product-level CBCA because errors cancel out, thus assuming independence. 
Lenzen et al.,[43] the only dedicated study of uncertainty in MRIO analysis that we are aware of, argue that due to the lack of available data, source data should be assumed to be uncorrelated. Furthermore, relative errors of the data in their study decrease with the magnitude of a value. More recently, as arguments in favor of assuming zero correlations between MRIO coefficients, Karstensen et al.[25] argue that besides the absence of underlying information, filling in the correlation matrix of an MRIO model would be computationally prohibitive. In this study we investigate the uncertainty of CBCAs, focusing on what both users and producers of CBCA can learn when using dependent sampling (i.e., taking correlations into account). No existing database reports metrics of association or dependence between data, so we use the set of the five most common global MRIO tables as a sample. After characterizing the uncertainty and correlation structure of global CBCAs we examine how approximations with independent sampling perform. We then search for patterns in the source MRIO data and examine the impact of reducing the uncertainty of particular source elements on the global CBCA uncertainty. We conclude the paper by comparing our results with the literature, suggesting directions of future research and summarizing the implications of our findings.

Data and Methods

Data Sources and Basic Concepts

An environmentally extended global MRIO model is a description of the world economy, linking consumption in a given region through international supply chains to environmental pressures anywhere in the world. Different approaches have been used earlier to inform the error distribution of the source data: for example, Lenzen et al.[43] use a range of auxiliary statistics on relative standard errors of similar data, Lenzen et al.[11] use the degree of adjustment during the balancing procedure, and Yamakawa and Peters[34] use time-series variation. However, to the best of our knowledge these approaches do not allow quantifying the dependencies between different components of the data set, which is a crucial aspect of the research we wish to undertake. We therefore use a harmonized set of global MRIOs to calculate the uncertainty of CBCA using dependent sampling. The main data source of this study is a set of five global product-by-product[44] MRIO databases that were harmonized and tailored to calculate country/region-level global carbon footprints in the year 2007. These five MRIOs are constructed from the Exiobase[45] (http://exiobase.eu), WIOD[12] (http://www.wiod.org/), EORA[11] (http://worldmrio.com/), OECD[14] (http://www.oecd.org/sti/ind/input-outputtablesedition2015accesstodata.htm), and GTAP[10] (https://www.gtap.agecon.purdue.edu/) databases. Each MRIO database was harmonized to $n_R = 22$ regions, $n_S = 17$ sectors per region and a single category of final demand per country, primary inputs, and carbon emission types. The regional classification consists of 20 countries covering 78% of global CO2 emissions and two aggregate regions.
The list of regions and sectors can be found as Supporting Information (SI) S1, and the concordance with the original MRIOs can be found in Owen et al.[16] and Steen-Olsen et al.[46] The CBCA of region k is obtained as

$$e_k = b'(I - A)^{-1} y_k + h_k \qquad (1)$$

where $y_k$ is the k-th column of Y, the matrix of final demand, I is the identity matrix, A is the matrix of technical coefficients, expressing how much a unit of demand for a product leads to increased output in a sector, b is the vector of environmental stressors, or carbon direct emissions, indicating how much emissions result from a unit of production of every sector, and $h_k$ is the k-th element of h, the vector of household carbon emissions. We assume that vectors are in column format by default and prime, ′, denotes transpose. The number of rows in Y and the number of both rows and columns in A is $n_R \times n_S$ (the number of regions times the number of sectors per region). Coefficients A and b are calibrated on the basis of flows as $A = Z\hat{x}^{-1}$ and $b' = r'\hat{x}^{-1}$, where Z, x, and r are, respectively, the matrix of interindustry transactions, the vector of total sales, and the vector of direct carbon emissions of industries, and ^ represents a diagonal matrix.
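As a minimal sketch of the calculation just described, the following computes CBCAs for a hypothetical two-region, one-sector economy. All numbers are invented for illustration and the variable names simply mirror the symbols in the text; none of this is taken from the harmonized MRIOs.

```python
import numpy as np

# Hypothetical toy economy: 2 regions x 1 sector each (all numbers invented).
Z = np.array([[10.0, 5.0],
              [4.0, 20.0]])   # interindustry transactions
Y = np.array([[30.0, 5.0],
              [6.0, 40.0]])   # final demand; column k is the demand of region k
r = np.array([8.0, 21.0])     # direct emissions of industries
h = np.array([2.0, 3.0])      # direct emissions of households

x = Z.sum(axis=1) + Y.sum(axis=1)   # total sales from the row balance
A = Z / x                           # A = Z x_hat^{-1}: column j divided by x_j
b = r / x                           # emissions per unit of output

leontief = np.linalg.inv(np.eye(len(x)) - A)   # (I - A)^{-1}
e = b @ leontief @ Y + h                       # e[k] = b'(I - A)^{-1} y_k + h_k

# Consumption-based and production-based totals coincide at the global level.
assert np.isclose(e.sum(), r.sum() + h.sum())
```

The final assertion is a useful sanity check on any implementation: reallocating emissions to consumers changes the regional split but not the world total.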

Data Compression

The number of points in an MRIO model can be very large (for reference EXIOBASE consists of over 96 million data points). This raises computational problems when analyzing covariances (explained in the following subsection) and makes the interpretation of the results more difficult. Hence, in this subsection we explain how it is possible to reduce the data volume while minimizing the loss of relevant information. According to Rodrigues et al.[47] the elements of A and Y are not obtained from source data, because use of imported products by industries is not generally known. Instead, algorithms are applied to import use tables that specify use of products by industries (but not source country) and international trade data that specifies bilateral trade in products (but not destination industry). The values reported in the MRIOs may differ from the pure application of the algorithms, due to the use of balancing[48] and other processing procedures[47] along the MRIO construction pipeline. We make this assumption explicit by replacing the original A and Y by alternatives which explicitly separate international trade from use of products, whose elements are defined as

$$A_{ij}^{ab} = R_{i}^{ab}\, T_{ij}^{b} \qquad (2)$$

$$Y_{i}^{ab} = F_{i}^{ab}\, C_{i}^{b}\, s^{b} \qquad (3)$$

where subscript(s) and superscript(s) explicitly indicate the sector(s), i and j, and region(s), a and b. The sum of international trade coefficients, R (for intermediate use) and F (for final use), over import region a equals one. The sum of intermediate use coefficients, T, over product i is less than one (to account for primary inputs). The sum of demand composition coefficients, C, over product i is one. The formulation of eq 3 splits absolute total country final demand, the scale factor s, from the relative composition and trade coefficients. The new quantities defined in eqs 2 and 3 are obtained by aggregating the elements of A and Y over the appropriate dimensions.
This formulation is very similar to the one illustrated in Tables 1–3 of Rodrigues et al.,[47] with the difference that they consider separate domestic and import use matrices. Eq 3 is also related to the way Owen et al.[16] split final demand into volume, region share, and product share. When undertaking this reformulation of the MRIO data, for country-level footprints, the average and maximum discrepancy between the original and transformed MRIOs (for the trade/use split described above) is less than 0.6 and 2.5%, respectively, for every region and MRIO. Additional considerations about the data compression and resulting discrepancies can be found in SI S2. Even in MRIOs in which the transformation defined by eqs 2 and 3 is not an approximation but is exact (i.e., they were constructed using the trade-share assumption, such as EXIOBASE), there is still a discrepancy because those databases were constructed with more regional and sectoral detail. Since we performed aggregation to harmonize the different MRIOs, such discrepancies are expected. Under the above transformation there are three equivalent formats to represent an MRIO, all of which yield exactly the same CBCA results. These are the flow format: Z, Y, r, and h; the coefficient format: A, W, b, and d, where $W = Y\,\mathrm{diag}(s)^{-1}$ and $d' = h'\,\mathrm{diag}(s)^{-1}$; and the modular format: T (technology); R (trade in intermediate goods); C (demand composition); F (trade in consumer goods); s (scale); b (industry emissions), and d (household emissions). Scale factors have monetary units, both emission coefficient blocks have units of GHG emissions per monetary unit, and the other blocks are dimensionless. We use all three formats to show the bias that arises when uncertainty is assessed under independent sampling, while the modular format is used in all other analyses reported in the Results, where correlations are taken into account.
Finally, SI S2 shows that the modular format provides significant advantages in terms of data storage. The modular format also makes the interpretation of results clearer by reducing the number of elements in a data block and thus facilitating the identification of patterns.
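The modular format can be illustrated with a short sketch that assembles A and Y from trade shares, technology, demand composition, and scale. The array layout (origin region a, destination region b, products i and j), the random placeholder values, and the 0.6 column sum for technology are assumptions made here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_reg, n_sec = 3, 2   # hypothetical sizes; the harmonized MRIOs use 22 and 17

# Trade shares R[a, b, i] and F[a, b, i]: sums over origin region a equal one.
R = rng.random((n_reg, n_reg, n_sec))
R /= R.sum(axis=0)
F = rng.random((n_reg, n_reg, n_sec))
F /= F.sum(axis=0)

# Technology T[b, i, j]: sums over product i stay below one (here 0.6),
# leaving room for primary inputs.
T = rng.random((n_reg, n_sec, n_sec))
T *= 0.6 / T.sum(axis=1, keepdims=True)

# Demand composition C[b, i]: sums over product i equal one.
C = rng.random((n_reg, n_sec))
C /= C.sum(axis=1, keepdims=True)

s = 100.0 * rng.random(n_reg)   # scale: total final demand of each region

# A[a, i, b, j] = R[a, b, i] * T[b, i, j];  Y[a, i, b] = F[a, b, i] * C[b, i] * s[b]
A = np.einsum("abi,bij->aibj", R, T)
Y = np.einsum("abi,bi,b->aib", F, C, s)

# Flatten (region, sector) pairs into the usual matrix layout.
A2 = A.reshape(n_reg * n_sec, n_reg * n_sec)
Y2 = Y.reshape(n_reg * n_sec, n_reg)
```

Because the trade shares sum to one over origin regions, the column sums of the assembled A2 equal the column sums of T, and the column sums of Y2 recover the scale factors s.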

Modeling Uncertainty

The goal of this paper is to study the uncertainty of CBCAs and, in particular, to capture the effect of dependencies among MRIO data elements. In this subsection we describe how we do it: we first discuss and formalize the concept of uncertainty and then clarify the distinction between dependent and independent sampling. There are many potential sources of uncertainty[49] which, according to the Bayesian paradigm, can be formalized through a probability distribution, in which the probability of an outcome expresses the degree of belief that an observer has in that particular outcome.[50,51] In practice it is often difficult to assign a specific value to each and every possible outcome, so we resort to aggregate metrics. In this paper we follow the Bayesian approach of Weise and Woger[52] and interpret the mean of a probability distribution as the best guess and the standard deviation or SD as the uncertainty. Mathematically, if p(t) is the probability density function, and t is a non-negative real number, then the mean, μ, and standard deviation, σ, are

$$\mu = \int_0^\infty t\, p(t)\, dt \qquad (4)$$

$$\sigma = \sqrt{\int_0^\infty (t - \mu)^2\, p(t)\, dt} \qquad (5)$$

If more than one variable is being considered, then a further metric for characterizing the dependency between them is necessary. The most common metric, which we use in this study, is the Pearson correlation coefficient or simply correlation, ρ. If the variables are t1 and t2, then the correlation is

$$\rho = \frac{1}{\sigma_1 \sigma_2} \int_0^\infty \!\! \int_0^\infty (t_1 - \mu_1)(t_2 - \mu_2)\, p(t_1, t_2)\, dt_1\, dt_2 \qquad (6)$$

where p(t1,t2) is the joint probability density function, and μ1, μ2, σ1, and σ2 are the means and standard deviations of variables t1 and t2. Correlations measure the strength of association between two variables, expressing how close they are to exhibiting a linear relation, taking value 1 if it is strictly linear with positive slope, −1 if strictly linear with negative slope, and 0 if no linearity is apparent.
If a multivariate probability distribution function characterizing the MRIO is known, it is in principle possible to use eq 1 to derive analytical expressions of the stochastic properties (mean, standard deviation, and correlations) of the CBCA. However, as far as we are aware such a formula is not available, so the typical way to calculate CBCA uncertainty is through Monte Carlo sampling. In Monte Carlo sampling a large sample of n realizations of the source data are generated, and for each realization the desired CBCA is calculated. These n CBCA realizations are then used to obtain aggregate metrics of the underlying probability distribution as

$$m = \frac{1}{n} \sum_{k=1}^{n} t(k) \qquad (7)$$

$$s = \sqrt{\frac{1}{n} \sum_{k=1}^{n} \left(t(k) - m\right)^2} \qquad (8)$$

$$r = \frac{1}{n\, s_1 s_2} \sum_{k=1}^{n} \left(t_1(k) - m_1\right)\left(t_2(k) - m_2\right) \qquad (9)$$

where we used t, t1, and t2 as arbitrary variables, and k is the iterator in the sample set. Variables m, s, and r are the empirically calculated mean, standard deviation, and correlation; and m1, m2, s1, and s2 are the means and standard deviations of variables t1 and t2. A distinction should be made about the way the sample of MRIOs is extracted from the probability distribution. With dependent sampling all dimensions that compose the MRIO are obtained at the same time. Thus, if eq 9 is applied to the MRIO sample, the same values are obtained as those from the definition of the correlation (eq 6). If independent sampling is performed, then each dimension of the MRIO is sampled in isolation, and application of eq 9 to the MRIO sample will consistently yield a zero correlation coefficient (given a sample size large enough). Note that a linear combination of MRIO systems that are balanced (row and column sums match) is itself balanced. Hence, every realization obtained through dependent sampling is balanced since it will be a weighted sum of the original MRIOs. A realization obtained from independent sampling, however, is not necessarily balanced, since the dependencies among row and column values of the original tables are not captured. Dependent sampling respects the dependencies in the source data, while independent sampling does not.
Then why do all past studies of CBCA uncertainty use independent sampling, as reviewed in the Introduction? Because collecting data to calibrate the uncertainty of individual dimensions of an MRIO is already difficult enough, and no one has so far been able to calibrate correlations in the source data. In order to make this problem tractable, in the present study we interpret the population of five available MRIOs as a (very small) sample from the probability distribution of a true but unknown 'meta'-MRIO. We can then characterize both the meta-MRIO and the meta-CBCA distribution. Note that it is conventional to start with a known theoretical distribution (e.g., normal or log-normal) from which a (large) sample is extracted. We did attempt to characterize a conventional multivariate distribution based on the five-MRIO sample but were unsuccessful. In the Discussion we present more details on the probability distributions we explored. To perform dependent sampling each individual MRIO is used to calculate a separate CBCA estimate, and eqs 7–9 are applied to the n = 5 resulting CBCAs to characterize the stochastic properties (mean, standard deviation, and correlations) of a meta-CBCA. Unless stated otherwise all of the material reported in the Results section is calculated this way. The properties of the source data, which are examined in later subsections of the Results, are characterized by applying the same set of equations to the n = 5 population of MRIOs (interpreted as a sample of the unknown true meta-MRIO). It is important to contrast the results using dependent and independent sampling, to assess how much they differ. Hence, in subsection Bias of Independent Sampling in the Results section we make calculations with independent sampling. We obtain these by resampling the five-MRIO set a large number of times, such that each time a particular element of the MRIO is extracted with equal probability from each of the original five MRIOs.
Mathematically, this procedure can be represented as follows. Let t(k) be the vector representation of the k-th MRIO obtained from independent sampling, with k = 1,...,n with n large. This means that its i-th element, $t_i(k)$, is a particular MRIO entry (e.g., some technical coefficient). Under independent sampling every $t_i(k)$ is set equal to the corresponding entry i of one of the original five MRIOs, each chosen with the same probability and separately for every entry i.
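The contrast between the two sampling schemes can be sketched on a toy data set. Here five hypothetical "databases" each hold two entries that are perfectly correlated across databases; dependent sampling is simplified to drawing whole databases (the study itself applies the sample statistics directly to the five MRIOs), while independent sampling draws the source database separately for every entry.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the five MRIOs: 5 databases x 2 entries,
# built so that the two entries move together across databases.
base = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
mrios = np.column_stack([base, 2.0 * base + 1.0])   # shape (5, 2)

def corr(sample):
    """Pearson correlation between the two columns of a sample."""
    return np.corrcoef(sample[:, 0], sample[:, 1])[0, 1]

n = 20_000

# Dependent sampling: draw whole databases, so the entries stay paired.
rows = rng.integers(0, 5, size=n)
dependent = mrios[rows]

# Independent sampling: every entry picks its source database separately.
rows_per_entry = rng.integers(0, 5, size=(n, 2))
independent = mrios[rows_per_entry, np.arange(2)]

print(corr(dependent), corr(independent))
```

In this construction the dependent sample preserves the correlation of the source data, whereas the independent sample yields a correlation close to zero, which is the effect described in the text.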

Results

Uncertainties of Country CBCAs

The distribution of country CBCAs is heterogeneous and dominated by a small set of data elements (a small number of regions, sectors, extensions, inputs, etc.). The USA and China each represent more than 16% of global emissions (as does the composite region Rest of the World), with the next largest emitting country (Japan) representing less than 5.1% (see Table 1). There is no clear trend in the relation between the coefficient of variation of regions and their respective size. In SI S3 we show the CV of all country CBCAs, where we see the lowest value is for Indonesia, at 2.2%, and the highest value is for The Netherlands at 16.0%, with a median of 7.5%.

Table 1

Expected Value and Coefficient of Variation of the CBCA of Selected Countries and the World

country	mean (%)	mean (GtCO₂)	CV (%)
USA	23.07	6.50	5.53
RoWorld	16.72	4.71	10.72
China	16.28	4.59	9.07
Japan	5.08	1.43	8.51
India	4.61	1.30	3.77
World	100.00	28.18	5.74

Table 1 also shows that the CV of the world (calculated with dependent sampling) is 5.74%. Had this value been obtained by summing over country CBCAs while assuming them to be independent, it would be 2.8%; that is, if the standard deviation of the world were obtained as $\sigma_{\mathrm{world}} = \sqrt{\sum_k \sigma_k^2}$, where $\sigma_k$ is the standard deviation of country k. Hence, assuming independence among country CBCAs underestimates the uncertainty of the world CBCA by about half. This is because the distribution of correlations among country CBCAs strongly deviates from zero, as shown in Figure 1. Country CBCA correlations have a mean ± standard deviation (SD) of 0.63 ± 0.36, with a median of 0.76. This means that a practitioner, when faced with two countries and no additional information, should assume their CBCAs to exhibit a strong positive correlation. Only Taiwan and the Rest of the World exhibit strong negative correlations with some other regions. It would be interesting to explore which characteristics the Taiwanese data shares with the Rest-of-the-World data, but this question falls outside the scope of the present study as it requires an in-depth study of those two particular regions.
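The effect of correlations on the uncertainty of a sum follows from the general identity that the variance of a sum equals the sum over all pairs of ρ_kl σ_k σ_l. The sketch below applies it to three hypothetical country standard deviations (invented values, not those of Table 1), using a uniform correlation of 0.63 purely as an illustrative stand-in for the mean correlation reported above.

```python
import numpy as np

# Hypothetical standard deviations of three country CBCAs (GtCO2), invented values.
sd = np.array([0.36, 0.42, 0.12])

def sd_of_sum(sd, rho):
    """SD of a sum of variables with standard deviations sd and correlation matrix rho."""
    cov = rho * np.outer(sd, sd)   # covariance matrix
    return np.sqrt(cov.sum())      # variance of the sum is the sum of all covariances

# Independence: the correlation matrix is the identity.
independent = sd_of_sum(sd, np.eye(3))

# Uniform positive correlation (illustrative assumption).
rho = np.full((3, 3), 0.63)
np.fill_diagonal(rho, 1.0)
correlated = sd_of_sum(sd, rho)

assert np.isclose(independent, np.sqrt((sd ** 2).sum()))
assert correlated > independent
```

With positive correlations the off-diagonal covariance terms add to the variance of the total, so the independence assumption can only understate it.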

Figure 1

Correlation between country CBCAs. See SI Table S2 for the meaning of country codes.

Uncertainties of Product CBCAs

If we now look into the contribution of different product categories to a country CBCA we find that, in total, over 80% of emissions come from just five product categories: electricity, transport, household direct emissions, fuel and trade (a composite product category), and oil. Figure 2 shows the relation between the CV and expected value of product-level CBCAs, with power-law regression lines for those top-5 product categories. The power-law regression lines take the form $y = a x^b$, where y is the CV of product CBCA, x is the mean of product CBCA, and a and b are fitted parameters. Figure 2 is drawn on a log-log scale due to the large scatter in the data, which covers several orders of magnitude on both axes. The points underlying each regression line are the 22 regions.
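A power law y = a x^b becomes a straight line with slope b on a log-log scale, so it can be fitted by ordinary least squares on the logarithms. The sketch below shows one way to do this; the function name and the synthetic data are hypothetical, and the fitting procedure actually used for the figure may differ.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y = a * x**b by least squares on log-transformed data."""
    b, log_a = np.polyfit(np.log(x), np.log(y), 1)   # slope, intercept
    return np.exp(log_a), b

# Synthetic check with invented values: an exact power law with a = 0.1, b = -0.2.
x = np.array([0.01, 0.1, 1.0, 10.0])
y = 0.1 * x ** -0.2
a, b = fit_power_law(x, y)
```

A fitted b close to zero corresponds to a nearly horizontal line, meaning that relative uncertainty does not change with size.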

Figure 2

Coefficient of variation (CV) vs mean of CBCAs of product categories. On the x axis, products with greater emission contributions appear to the right of the plot. On the y axis, products with larger variations appear at the top of the plot. A perfectly horizontal line would imply that uncertainty does not vary at all with the size of emissions from that product (as is approximately the case for Electricity and Transport). Only regressions for the top 5 contributing product categories are shown. SI Section S3 reports the numerical values underlying the figure.

We can see that the slope of those curves is not steep, indicating that relative uncertainty does not decrease with size (with the exception of the oil product category). Note also that, when comparing across consumption categories, some of those top-5 product categories have the highest CV. Thus, our results do not support the observation that, in general, errors decrease with size. We have also examined the correlation between the uncertainty of aggregate consumption categories, as shown in Figures 3 and 4. That is, we have plotted the two-way correlation coefficients between emissions for the same products in different regions and the two-way correlation coefficients between product pairs within regions. We find that correlations are strongly positive for the same product category across regions. For ease of visualization, when comparing correlations of product category pairs within a region we have shown only the top-5. Here we find that the variation is low, and the medians span the whole range of possible uncertainties. The correlation of product-category pairs across regions, shown in SI Section S4, is somewhere in between, exhibiting larger dispersion and generally more positive values.

Figure 3

Correlation of same product across regions. Red horizontal line is median; box is 50%-confidence interval (interquartile range); the maximum length of whiskers is 1.5 times the interquartile range, and red circles are outliers.

Figure 4

Correlations between products within a region. Elec = Electricity; FT = Fuel/trade; Trans = Transport; HH = households.

Bias of Independent Sampling

Analyses of CBCA uncertainty are usually performed using independent sampling or, equivalently, by explicitly assuming source data to be uncorrelated. That is, it is assumed that data from different sources for different nations or product categories can vary independently, e.g., emissions from electricity in China going up while emissions from electricity in the USA go down. We now compare the performance of such approximations with the actual results from dependent sampling. Note that even if one decides to ignore correlations in CBCA calculations, source data in MRIO databases is reported in absolute terms, while the use of MRIO data in CBCA calculations requires the conversion of intermediate inputs to coefficients. Dietzenbacher,[53] building upon the work of Roland-Holst,[54] finds that applying uncertainty estimates to the data pre- or postcalculation of coefficients yields similar results, as the bias (i.e., over- or underestimation of true results) is small. We consider three approximations, corresponding to the assumption of zero correlations on the flow, coefficient, and modular versions of the MRIO (see Methods for details). We performed Monte Carlo sampling with 10 000 simulations, for which we report the bias (i.e., the distance between the approximation and the true value) for the mean and CV of CBCAs at different levels of aggregation. We report values for the relative bias of the mean, defined as (y − x)/x, where x is the actual value and y is the approximation; and for the absolute bias of the CV, y − x, where x and y hold the same meaning. For the world as a whole, the bias in the mean value is +5% for both the modular and coefficient formats and +0.06% for the flow format, while corresponding percentages for CV are in the range of 4 to 6%, as shown in Table 2.
That is, in all cases, by assuming independent sampling as in conventional approaches, total emissions are overestimated when compared to dependent sampling, and only the bias of the mean from the flow format is negligible. Average country-level biases are much lower than on the global aggregate but with a large variation across countries, with the flow format still exhibiting a noticeably smaller bias, and with a smaller standard deviation than the other formats. Product-level biases in CBCAs for the flow format are clustered around zero, with a small but positive average. The product-level biases of the other two formats have a large variation, with a standard deviation five times larger than that of the flow format. The majority of product CBCA CV biases are in the negative range, and the flow format values are especially so, with a smaller SD than the other two formats.
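The two bias measures defined above can be written down directly. The sketch below is only an illustration: the function names are hypothetical, the sample values are invented, and the SD is computed with NumPy's default (division by n), which is an assumption about the normalization.

```python
import numpy as np

def mean_and_cv(sample):
    """Mean and coefficient of variation (SD / mean) of a 1-D sample."""
    sample = np.asarray(sample, dtype=float)
    m = sample.mean()
    return m, sample.std() / m   # np.std divides by n by default

def sampling_bias(actual, approx):
    """Relative bias of the mean, (y - x) / x, and absolute bias of the CV, y - x."""
    m_x, cv_x = mean_and_cv(actual)
    m_y, cv_y = mean_and_cv(approx)
    return (m_y - m_x) / m_x, cv_y - cv_x

# Invented samples: 'actual' has mean 10 and CV 0.1; 'approx' has mean 11 and CV 0.2.
rel_bias_mean, abs_bias_cv = sampling_bias([9.0, 11.0], [8.8, 13.2])
```

A positive first value means the approximation overestimates the mean; a negative second value means it underestimates the relative uncertainty.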

Table 2

Bias of the Mean and CV of Aggregate CBCAsᵃ

level	statistic	modular	coefficient	flow
world	mean	4.96	5.06	0.06
	CV	5.91	5.72	4.35
countries	mean	0.88 ± 4.93	0.44 ± 4.84	0.17 ± 2.25
	CV	17.27 ± 6.00	15.57 ± 6.19	11.44 ± 6.40
products	mean	4.00 ± 24.02	3.50 ± 25.69	0.59 ± 5.03
	CV	–8.82 ± 23.95	–11.6 ± 24.09	–17.97 ± 16.32

ᵃ For countries and products, values reported are expected value ± standard deviation. All figures are in percent.

Thus, our analysis does not support the claim found in the literature, obtained with independent sampling calculations, that biases are positive but negligible. We find them to be substantial and, in the case of uncertainty, systematically negative, meaning that uncertainty is underestimated by independent sampling. In SI Section S5 we report the values of biases for countries and products for the different formats.

Patterns in Source Data Uncertainty

It is conventional in IO studies to use a downward-sloping relation between uncertainty (CV) and size (expectation). For example, if a country has a larger electricity sector, then the relative uncertainty of its emissions should be lower. This is on the basis of either statistics or the assumption that each value in the IO data set is actually the result of summing over smaller values, and errors therein cancel out. In our data set, however, the relation between CV and mean is mostly flat and with a large variance for most data blocks, as exemplified in Figure 5 for industry emission coefficients and in SI Section S6 for other data blocks. Figure 5 is drawn on a log-log scale due to the large scatter in the data, which covers several orders of magnitude on both axes.

Figure 5

Uncertainty of production emission coefficients. On the x axis, the mean emission coefficients, carbon intensities, are plotted, with higher carbon intensities toward the right-hand side. The y axis shows the coefficients of variation in those product emission coefficients, with higher variations at the top of the plot. Estimations for each sector are fitted with a regression line and show that there is no systematic reduction in uncertainty (CV) as the sector emission coefficient increases.

Following this, to better understand the relation between the expected value (magnitude) of a data point and the uncertainty of that data point, we calculated power-law regressions of the form $y = a x^b$, where y is the CV, x is the expected value, and a and b are fitted parameters. We performed this for all blocks in the MRIO data set. When a single regression is performed per data block, the coefficient of determination R² is weak, as seen in column 'single' of Table 3 and the red line in Figure 5. That is, the coefficients for each sector and region in a given data block are not well correlated with one another. We additionally performed similar regressions for every sector-region combination (depending on the data block) that would still leave n = 22 regions for the regression. The black lines of Figure 5 illustrate the result of this exercise for industry emission coefficients, and column 'sector' of Table 3 shows the R² values. The R² values of column 'sector' are obtained by squaring the correlation between the set of true values and the set of values obtained from the separate n sector-level regressions in each data block, where the number n is indicated in the last column of Table 3.

Table 3

R² Coefficient of Power-Law Regressions between Mean and CV of Different Source Data Blocks^a

block	single (%)	sector (%)	n (−)
scale	3.98	3.98	1
household emissions	8.55	8.55	1
composition	14.26	70.93	17
industry emissions	0.07	70.16	17
technology	15.5	55.82	289
trade in final products (self)	52.28	70.90	17
trade in intermediate products (self)	41.08	62.43	17
trade in final products (other)	9.88	66.71	357
trade in intermediate products (other)	9.49	62.77	357

^a Single = one regression per block; sector = one regression per sector; n = number of regressions per block.

What we can learn from Figure 5 is that at the level of the data block as a whole there is a flat trend connecting uncertainty and magnitude (albeit with a large variation). That is, at the level of the data block, uncertainties do not decrease with size but remain fairly stable. However, when examined at the level of specific sectors (the black lines, mostly with a positive gradient), there is actually an increase of uncertainty with the size of that sector. The overall flat pattern (the almost horizontal red line) emerges only when different sectors are bundled together. Similar plots to Figure 5 for every data block are reported in SI Section S6, for which the same reasoning applies, although the results are not so extreme: the relation between uncertainty and size at the block level is downward-trending, but at the sector level this pattern is sometimes reversed, with the block-level pattern emerging from the juxtaposition of sector data. Separate plots in SI Section S6 and separate summary statistics in Table 3 are reported for international trade coefficients with the same country (self) and with other countries (other): the former coefficients are generally high and the latter generally low, so we found patterns were clearer when they were examined separately. We conclude that practitioners should not expect, a priori, that the relation between CV and mean be downward sloping at the level of sectoral data, even if it is so for a data block as a whole. In fact, in many cases, the uncertainty may actually increase with increasing size.

Uncertainty Reduction

Next, we determine which factors dominate the uncertainty of the global CBCA. We do this by exploring the effect of reducing the uncertainty of specific source data elements or blocks. By doing this we can prioritize efforts in improving data collection for IO tables. We can also develop a further understanding of the interaction between uncertainty and different components in IO tables. We explore this uncertainty through an iterative, stepwise reduction of uncertainty in the source data, comparing the resulting CBCA uncertainty after each step. At each step, we find the data point in the IO tables for which setting that data point to the average of the separate MRIO values would give the greatest reduction in the overall uncertainty of the world CBCA. We find that 20 elements account for 99.9% of the uncertainty of global CBCA, listed in Table 4 (the key of data block acronyms is reported in Table 5). Since this process is path dependent, to assess the robustness of the result we repeated the calculations with alternative settings: setting the value not to the average but to either the minimum or the maximum of the five MRIOs, and moving the values of all MRIOs only part of the way toward the average (by 10% or 50%) instead of setting them exactly to the average. In all cases convergence is fast, and most of the same data elements reappear, although the exact data elements in each top-20 set are not always the same, as shown in SI S7. In future studies this path dependence might be avoidable if a linear approximation is performed.[55−57]
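The stepwise procedure described above can be sketched as a greedy loop. This is a toy illustration, not the paper's calibrated stochastic model: it assumes uncertainty is measured as the sample CV of a scalar account across the database rows, the account function `toy_account` and the input data are made up, and the names `cv` and `stepwise_reduction` are hypothetical.

```python
import numpy as np

def cv(values):
    """Coefficient of variation of a set of estimates."""
    return np.std(values, ddof=1) / abs(np.mean(values))

def stepwise_reduction(data, model, n_steps):
    """Greedy stepwise uncertainty reduction.

    data:  array (n_databases, n_elements) of source data, one row per database.
    model: function mapping one row of source data to a scalar account.
    Returns a list of (element index, individual reduction %, remaining %),
    both expressed relative to the baseline CV.
    """
    data = data.copy()
    base = cv(np.array([model(row) for row in data]))
    remaining, fixed, history = base, set(), []
    for _ in range(n_steps):
        best = None
        for j in range(data.shape[1]):
            if j in fixed:
                continue
            trial = data.copy()
            trial[:, j] = data[:, j].mean()  # remove disagreement on element j
            c = cv(np.array([model(row) for row in trial]))
            if best is None or c < best[1]:
                best = (j, c)
        j, c = best
        data[:, j] = data[:, j].mean()       # keep the best element fixed
        fixed.add(j)
        history.append((j, 100 * (remaining - c) / base, 100 * c / base))
        remaining = c
    return history

def toy_account(row):
    """Made-up account: two 'intensity x volume' products added together."""
    return row[0] * row[1] + row[2] * row[3]

# Five synthetic "databases" disagreeing on four source data elements (assumption).
rng = np.random.default_rng(1)
true_values = np.array([2.0, 0.5, 1.0, 0.1])
data = true_values * rng.lognormal(0.0, [0.3, 0.05, 0.1, 0.2], size=(5, 4))

history = stepwise_reduction(data, toy_account, n_steps=4)
for element, individual, left in history:
    print(f"element {element}: individual {individual:6.2f}%, remaining {left:6.2f}%")
```

Because each step conditions on the elements already fixed, the ranking depends on the path taken, which is why the robustness checks with alternative settings matter.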

Table 4

Reduction on Global CBCA When the Uncertainty of Top 20 Data Elements Is Eliminated, Assuming the Mean Is the True Value^a

rank	block	source region	source sector	destination region	destination sector	Ind. (%)	Cum. (%)
1	IndEm	RoWorld	Electricity			19.95	80.05
2	IndEm	Russia	Electricity			9.04	71.01
3	IndEm	China	Electric eq.			9.13	61.87
4	Scale			USA		6.83	55.05
5	Techn.	China	Electricity		Electricity	8.25	46.80
6	Comp.			China	Electricity	8.03	38.76
7	IntTrade	RoWorld		RoWorld	Transport	5.34	33.42
8	IndEm	Russia	Oil			5.18	28.24
9	Techn.	India	Electricity		Transport	4.71	23.53
10	IntTrade	China		China	Electric eq.	4.23	19.30
11	Techn.	RoWorld	Transport		Transport	3.22	16.08
12	IndEm	China	Construction			3.41	12.67
13	Comp.			RoWorld	Other serv.	2.90	9.77
14	IntTrade	China		China	Mining	2.65	7.12
15	Techn.	USA	Electricity		Fuel/trade	2.40	4.72
16	IndEm	RoWorld	Construction			2.81	1.91
17	IndEm	Canada	Oil			1.31	0.61
18	Techn.	Russia	Transport		Oil	0.42	0.19
19	Techn.	USA	Oil		Communicat.	0.14	0.05
20	IntTrade	France		RoWorld	Transport eq.	0.04	0.01

^a The key of data block acronyms is reported in Table 5. Ind = individual; Cum = cumulative.
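One way to read the last two columns of Table 4, inferred from the numbers and not stated explicitly in the source, is that Cum is the share of baseline uncertainty still remaining after each step, that is, 100 minus the running sum of Ind. The short check below, using the values transcribed from the table, tests that reading to within rounding.

```python
# Values transcribed from the Ind. (%) and Cum. (%) columns of Table 4.
ind = [19.95, 9.04, 9.13, 6.83, 8.25, 8.03, 5.34, 5.18, 4.71, 4.23,
       3.22, 3.41, 2.90, 2.65, 2.40, 2.81, 1.31, 0.42, 0.14, 0.04]
cum = [80.05, 71.01, 61.87, 55.05, 46.80, 38.76, 33.42, 28.24, 23.53, 19.30,
       16.08, 12.67, 9.77, 7.12, 4.72, 1.91, 0.61, 0.19, 0.05, 0.01]

remaining = 100.0
max_gap = 0.0
for individual, reported in zip(ind, cum):
    remaining -= individual  # uncertainty left after this step
    max_gap = max(max_gap, abs(remaining - reported))

total_removed = sum(ind)
print(f"total removed by 20 elements: {total_removed:.2f}%")
print(f"largest gap between 100 - running sum and Cum: {max_gap:.3f}")
```

The gaps stay at the level of two-decimal rounding, and the total removed is consistent with the statement that 20 elements account for 99.9% of the uncertainty.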

Table 5

List of Acronyms of Data Blocks

	short	long
1	Scale	scale
2	Comp	composition
3	Tech	technology
5	FinTr	trade in final goods
6	IntTr	trade in intermediate goods
7	HHEm	household emissions
8	IndEm	industry emissions

Table 4 supports the idea that, to understand uncertainty, domestic data is more important than international trade data. China is by a large margin the most frequent country, followed by the rest of the world and other large economies. Among data blocks, industry emissions stand out. Finally, electricity is the most frequent sector in this list, followed by transport, with other sectors appearing with much less frequency (oil, transport equipment, metals, construction and mining, among others). These findings reflect the underlying choices made in MRIO data construction, whether in the choice of data (geographical specificity, timeliness, based on physical or monetary relationships) or in conceptual approaches to construction (prioritization of different data, level of balancing allowed, etc.). Other studies have found similar relationships and provide some reflection: Wieland et al.[58] also discovered that Chinese domestic data is an issue; Tukker et al.[59] found that the highest uncertainty in footprint analyses is caused by the environmental data; and Owen et al.[60] found that structural paths involving the electricity sector contribute significantly to model differences, because of the way electricity is treated in different databases. In summary, the examination of embodied emissions and the elimination of uncertainty in inputs yield a consistent perspective, with a small number of blocks, sectors, and regions accounting for the bulk of uncertainty.

Discussion

Empirical Estimates

Hertwich and Peters[42] reported uncertainty estimates for several data blocks, which can be compared with our own results. The coefficient of variation (CV) they report is in the range 50–200% for product CBCAs and 5–15% for country CBCAs. These numbers agree well with our results, in which the CV of the product CBCAs ranged from 10 to 200%, and that of country CBCAs ranged from 2 to 16%. Concerning source data, the same reference[42] reports that the CV of emission coefficients is 5–10% for OECD and 10–20% for non-OECD countries; consumption coefficients are 10%; technical coefficients are 1–50%; and trade coefficients have 20% uncertainty concerning country of origin and 10% concerning trade volumes. By contrast, in our study industry emission coefficient CVs ranged from 10 to 200% and household emissions ranged from 10 to 50%; consumption ranged from 2 to 200%; technical coefficients ranged from 1 to 200%; trade coefficients with other countries ranged from 5 to 200% and self-trade coefficients ranged from 0.2 to 50% (ignoring outliers); and scale coefficients ranged from 1 to 15%. Although not all values show strong agreement, we can say that our results concerning CBCAs are in line with the literature, but they do differ strongly concerning the source data, with our results suggesting a wider uncertainty range in all data blocks.

Data and Method Considerations

The conventional procedure in independent Monte Carlo integration in IO is to assume a normal[26,35] or log-normal[25,43] distribution, although others have also been used, in particular the beta distribution[40] (for a review see Kop Jansen[61] and Temurshoev[36]). For several technical reasons, we could not use multivariate versions of these distributions. The normal distribution was ruled out because uncertainties are too high, leading to an unacceptably large proportion of negative values. The log-normal distribution cannot handle the large span of strong negative covariances observed in the data set. We could not find in the literature a natural multivariate version of another distribution that would accommodate arbitrary covariances as required to calibrate our model, although we explored variants of the beta,[62] gamma,[63] and folded normal[64] distributions.

We also tried to use a formal sensitivity method to quantify how much a particular source data point contributes to the resulting uncertainty. As recently reviewed by Borgonovo and Plischke,[65] there are two main types of sensitivity analysis: local and global. Local sensitivity analysis examines the effect on model output of a change in a single parameter at a time and is employed in a deterministic framework, e.g., to identify the parameters that most strongly affect key sectors.[66] Global sensitivity analysis (GSA) breaks down the variation in model response among its multiple inputs at the same time. The most popular GSA method is the variance decomposition of Sobol,[67] in which the variance of output is split among additive terms that reflect the contribution of input variance, although distribution-based GSA methods[68] are gaining popularity. In the end, given the small sample size of the source data, we chose not to use these methods and instead performed the heuristic analysis reported in this paper.

As more MRIOs become available, or future revisions of existing MRIOs converge, it may become possible to use these more sophisticated techniques. Our present results are an important step forward and a benchmark against which to compare those future studies. Besides the analysis focusing on correlations performed here, in the future it might also be interesting to explore the role of partial correlations in MRIO and CBCA uncertainty.
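To make the remark about the normal distribution concrete: for a normal variable with positive mean μ and standard deviation CV·μ, the probability of drawing a negative value is Φ(−1/CV), which depends only on the CV. The sketch below evaluates it for a few CVs within the ranges reported above; the function name `negative_fraction` is hypothetical, and this is an illustration of the general property, not the authors' calculation.

```python
import math

def negative_fraction(cv):
    """P(X < 0) for X ~ Normal(mu, (cv * mu)**2) with mu > 0, i.e. Phi(-1 / cv)."""
    return 0.5 * math.erfc(1.0 / (cv * math.sqrt(2.0)))

for cv in (0.1, 0.5, 1.0, 2.0):
    print(f"CV = {cv:4.0%}: {negative_fraction(cv):.2%} of draws would be negative")
```

At a CV of 100% roughly 16% of draws are negative, and at 200% roughly 31%, which is incompatible with data that must be non-negative.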

Final Remarks

For CBCA practitioners, our results suggest caution about the extrapolation of uncertainty when aggregating results over both spatial and sectoral scales. Contrary to the established literature, we do not recommend assuming that errors cancel out or that independence can be safely assumed. We found that the CBCAs of whole countries are strongly correlated and that, in general, the uncertainty of product CBCAs is not reduced as the size of that product CBCA increases. We also found that independent sampling (i.e., ignoring correlations in the source data) leads to the underestimation of uncertainty. For MRIO developers, our results point out the elements of the data landscape in which refinement efforts should be prioritized, if the goal is to reduce the uncertainty of CBCAs. These are primarily the following: the environmental extensions, among data blocks; data related to the electricity supply chain, from a sectoral point of view; and data related to China, from a geographic point of view. More generally, we provide a methodology for the prioritization of data refinements even if other criteria besides global CBCA are considered. Although the analysis reported here focused on CBCAs, a similar study could be performed for other environmental or economic extensions, provided they are reported by all MRIOs present in the sample. Finally, since part of the data used here is the same data used to calculate production-based and income-based carbon accounts,[1] all the results presented here are also transferable, within the scope of the relevant data block. More specifically, income-based carbon accounts[69,70] require all of the data used here, with value added coefficients replacing the role of consumption coefficients. In the case of production-based accounts it is direct emissions (from industry and households) and total (intermediate and final) consumption that are relevant.
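The point about independent sampling can be illustrated with the variance of a sum. If an aggregate is the sum of components with standard deviations σ_i and correlations ρ_ij, its variance is Σ_i Σ_j ρ_ij σ_i σ_j, and dropping the off-diagonal terms lowers it whenever the correlations are predominantly positive. The numbers below are made up, the uniform correlation of 0.6 is an assumption chosen for illustration, and `cv_of_sum` is a hypothetical helper.

```python
import numpy as np

def cv_of_sum(mean, cv, corr):
    """CV of the sum of components with given means, CVs and correlation matrix."""
    sd = mean * cv
    cov = corr * np.outer(sd, sd)  # covariance matrix from correlations and SDs
    return np.sqrt(cov.sum()) / mean.sum()

# Made-up components of an aggregate account (assumption).
mean = np.array([10.0, 5.0, 2.0, 1.0])
cv = np.array([0.1, 0.2, 0.5, 1.0])

corr_full = np.full((4, 4), 0.6)   # assumed positive correlation between components
np.fill_diagonal(corr_full, 1.0)
corr_indep = np.eye(4)             # independent sampling ignores off-diagonal terms

cv_correlated = cv_of_sum(mean, cv, corr_full)
cv_independent = cv_of_sum(mean, cv, corr_indep)
print(f"CV with correlations:   {cv_correlated:.3f}")
print(f"CV assuming independence: {cv_independent:.3f}")
```

With negative correlations the effect would go the other way, so the direction of the bias depends on the correlation structure actually present in the data.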
CBCA is seen as a prominent alternative to traditional, production-based approaches, and it could open up new opportunities for climate policy innovation.[8] However, the understanding of uncertainties in the data used for CBCA has been a key limiting factor.[71] The work outlined here represents a step toward understanding uncertainties and provides a basis for developing a standardized procedure for CBCA uncertainty estimation.

9 in total

1. Carbon footprint of nations: a global, trade-linked analysis.

Authors: Edgar G Hertwich; Glen P Peters
Journal: Environ Sci Technol Date: 2009-08-15 Impact factor: 9.028

2. CO2 embodied in international trade with implications for global climate policy.

Authors: Glen P Peters; Edgar G Hertwich
Journal: Environ Sci Technol Date: 2008-03-01 Impact factor: 9.028

Review 3. Sustainability. Systems integration for global sustainability.

Authors: Jianguo Liu; Harold Mooney; Vanessa Hull; Steven J Davis; Joanne Gaskell; Thomas Hertel; Jane Lubchenco; Karen C Seto; Peter Gleick; Claire Kremen; Shuxin Li
Journal: Science Date: 2015-02-27 Impact factor: 47.728

4. Income-Based Greenhouse Gas Emissions of Nations.

Authors: Sai Liang; Shen Qu; Zeqi Zhu; Dabo Guan; Ming Xu
Journal: Environ Sci Technol Date: 2016-12-22 Impact factor: 9.028

5. Mapping the structure of the world economy.

Authors: Manfred Lenzen; Keiichiro Kanemoto; Daniel Moran; Arne Geschke
Journal: Environ Sci Technol Date: 2012-07-13 Impact factor: 9.028

6. Drivers of the growth in global greenhouse gas emissions.

Authors: Iñaki Arto; Erik Dietzenbacher
Journal: Environ Sci Technol Date: 2014-05-05 Impact factor: 9.028

7. Mapping the Carbon Footprint of Nations.

Authors: Keiichiro Kanemoto; Daniel Moran; Edgar G Hertwich
Journal: Environ Sci Technol Date: 2016-09-15 Impact factor: 9.028

8. Consumption-based accounting of CO2 emissions.

Authors: Steven J Davis; Ken Caldeira
Journal: Proc Natl Acad Sci U S A Date: 2010-03-08 Impact factor: 11.205

9. Reduced carbon emission estimates from fossil fuel combustion and cement production in China.

Authors: Zhu Liu; Dabo Guan; Wei Wei; Steven J Davis; Philippe Ciais; Jin Bai; Shushi Peng; Qiang Zhang; Klaus Hubacek; Gregg Marland; Robert J Andres; Douglas Crawford-Brown; Jintai Lin; Hongyan Zhao; Chaopeng Hong; Thomas A Boden; Kuishuang Feng; Glen P Peters; Fengming Xi; Junguo Liu; Yuan Li; Yu Zhao; Ning Zeng; Kebin He
Journal: Nature Date: 2015-08-20 Impact factor: 49.962