Literature DB >> 34194954

Green Algorithms: Quantifying the Carbon Footprint of Computation.

Loïc Lannelongue1,2,3, Jason Grealey4,5, Michael Inouye1,2,3,4,6,7,8.   

Abstract

Climate change is profoundly affecting nearly all aspects of life on earth, including human societies, economies, and health. Various human activities are responsible for significant greenhouse gas (GHG) emissions, including data centers and other sources of large-scale computation. Although many important scientific milestones have been achieved thanks to the development of high-performance computing, the resultant environmental impact is underappreciated. In this work, a methodological framework to estimate the carbon footprint of any computational task in a standardized and reliable way is presented, and metrics to contextualize GHG emissions are defined. A freely available online tool, Green Algorithms (www.green-algorithms.org), is developed, which enables a user to estimate and report the carbon footprint of their computation. The tool easily integrates with computational processes as it requires minimal information and does not interfere with existing code, while also accounting for a broad range of hardware configurations. Finally, the GHG emissions of algorithms used for particle physics simulations, weather forecasts, and natural language processing are quantified. Taken together, this study develops a simple generalizable framework and freely available tool to quantify the carbon footprint of nearly any computation. Combined with recommendations to minimize unnecessary CO2 emissions, the authors hope to raise awareness and facilitate greener computation.
© 2021 The Authors. Advanced Science published by Wiley‐VCH GmbH.


Keywords:  climate change; computational research; green computing

Year:  2021        PMID: 34194954      PMCID: PMC8224424          DOI: 10.1002/advs.202100707

Source DB:  PubMed          Journal:  Adv Sci (Weinh)        ISSN: 2198-3844            Impact factor:   16.806


Introduction

The concentration of greenhouse gases (GHGs) in the atmosphere has a dramatic influence on climate change, with both global and locally focused consequences, such as rising sea levels, devastating wildfires in Australia, extreme typhoons in the Pacific, and severe droughts across Africa, as well as repercussions for human health. With 100 megatonnes of CO2 emissions per year (Note S1, Supporting information), similar to American commercial aviation, the contribution of data centers and high‐performance computing facilities to climate change is substantial. So far, rapidly increasing demand has been paralleled by increasingly energy‐efficient facilities, keeping the overall electricity consumption of data centers somewhat stable. However, this stability is likely to end in the coming years, with a best‐case scenario forecasting a threefold increase in the energy needs of the sector.[ , ] Advances in computation, including those in hardware, software, and algorithms, have enabled scientific research to progress at unprecedented rates. Weather forecasts have increased in accuracy to the point where 5‐day forecasts are approximately as accurate as 1‐day forecasts 40 years ago,[ ] physics algorithms have produced the first direct image of a black hole 55 million light‐years away,[ , , ] the human genome has been mined to uncover thousands of genetic variants for disease,[ ] and machine learning (ML) permeates many aspects of society, including economic and social interactions.[ , , , ] An example of the scale of computational research in science is the Extreme Science and Engineering Discovery Environment (XSEDE) in the USA. In 2020 alone, almost 9 billion compute hours were used for scientific computing,[ ] a pace of 24 million hours per day. Yet, the costs associated with large‐scale computation are not being fully captured.
Power consumption results in GHG emissions, and the environmental costs of performing computations using data centers, personal computers, and the immense diversity of architectures are unclear. While programmes in green computing (the study of environmentally responsible information and communications technologies) have been developed over the past decade, these mainly focus on energy‐efficient hardware and cloud‐related technologies.[ , , ] With widely recognized power‐hungry and expensive training algorithms, deep learning has begun to address its carbon footprint. ML models have grown exponentially in size over the past few years,[ ] with some algorithms training for thousands of core‐hours, and the associated energy consumption and cost have become a growing concern.[ ] In natural language processing (NLP), Strubell et al.[ ] found that designing and training translation engines can emit between 0.6 and 280 tonnes of CO2. While not all NLP algorithms require frequent retraining, algorithms in other fields are run daily or weekly, multiplying their energy consumption. Astronomy also relies largely on supercomputers to analyze data, which has motivated some investigations into the carbon footprint of the field.[ , ] For example, it has been estimated that the usage of supercomputers by Australian astronomers emits 15 kilotonnes of CO2 per year, equivalent to 22 tonnes per researcher.[ ] Cryptocurrencies, and their so‐called "mining farms," have also seen their environmental impact increase exponentially in recent years, and several reports have cast doubt on their sustainability.
A 2018 study estimated the yearly energy usage of Bitcoin to be 46 TWh, resulting in 22 Mt of CO2 released into the atmosphere.[ ] As of March 2021, Bitcoin's energy usage was estimated at 130 TWh which, if Bitcoin were a country, would rank it 28th in the world, ahead of Argentina and Ukraine.[ ] Crypto‐mining relies on dedicated hardware (application‐specific integrated circuits) rather than general‐purpose processors and therefore does not compete directly with scientific computing; regardless, the magnitude of its carbon footprint needs to be addressed urgently. Previous studies have made advances in estimating the GHG emissions of computation but have limitations which preclude broad applicability. These limitations include the requirement that users self‐monitor their power consumption[ ] and restrictions with respect to hardware (e.g., GPUs and/or cloud systems[ , ]), software (e.g., Python package integration[ ]), or applications (e.g., ML).[ , , ] To facilitate green computing and widespread user uptake, there is a clear, and arguably urgent, need for a general and easy‐to‐use methodology for estimating carbon emissions that can be applied to any computational task. In this study, we present a simple and widely applicable method and tool for estimating the carbon footprint of computation. The method considers the different sources of energy usage, such as processors and memory, the overhead of computing facilities, and geographic location, while balancing accuracy and practicality. The online calculator (www.green‐algorithms.org) implements this methodology and provides further context by interpreting carbon amounts using travel distances and carbon sequestration. We demonstrate the applicability of the Green Algorithms method by estimating the carbon footprint of particle physics simulations, weather forecast models, and NLP algorithms, as well as the carbon effects of distributed computation using multiple CPUs.
Finally, we make recommendations on ways for scientists to reduce their GHG emissions as well as discuss the limitations of our approach.

Results

We developed a simple method which estimates the carbon footprint of an algorithm based on a number of factors, including the hardware requirements of the tool, the runtime, and the location of the data center (Experimental Section). Using a pragmatic scaling factor (PSF), we further augment our model by allowing for empirical estimates of repeated computations for a particular task, for example, parameter tuning and trial and error. The resultant gCO2e is compared to the amount of carbon sequestered by trees and the emissions of common activities such as driving a car and air travel. We designed a freely available online tool, Green Algorithms (www.green‐algorithms.org; Figure 1), which implements our approach and allows users to evaluate their computations or estimate the carbon savings or costs of redeploying them on other architectures.
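The pipeline described above (energy drawn by cores and memory, scaled by the data center overhead, converted via carbon intensity, then multiplied by the PSF) can be sketched in a few lines. This is our own illustrative sketch, not the calculator's source code; the function names and example job parameters are assumptions, and the defaults are the worldwide averages used later in the text.

```python
# Illustrative sketch of a Green Algorithms-style estimate (our own code,
# not the calculator's source; names and defaults are assumptions).

def energy_kwh(runtime_h, n_cores, power_per_core_w, core_usage,
               memory_gb, power_per_gb_w=0.3725, pue=1.67):
    """Energy (kWh) from cores and memory, scaled by data-center overhead (PUE)."""
    draw_w = n_cores * power_per_core_w * core_usage + memory_gb * power_per_gb_w
    return runtime_h * draw_w * pue / 1000.0

def carbon_gco2e(energy, carbon_intensity=475.0, psf=1):
    """gCO2e = energy (kWh) x carbon intensity (gCO2e/kWh) x pragmatic scaling factor."""
    return energy * carbon_intensity * psf

# Hypothetical job: 12 h on 8 cores (10.8 W each, fully used) with 32 GB of
# memory, repeated 5 times during development (PSF = 5).
e = energy_kwh(runtime_h=12, n_cores=8, power_per_core_w=10.8,
               core_usage=1.0, memory_gb=32)
print(round(carbon_gco2e(e, psf=5), 1))  # total gCO2e over the 5 runs
```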
Figure 1

The Green Algorithms calculator (www.green‐algorithms.org).

We apply this tool to a range of algorithms selected from a variety of scientific fields: physics (particle simulations and DNA irradiation), atmospheric sciences (weather forecasting), and ML (NLP) (Figure 2). For each task, we curate published benchmarks and use www.green‐algorithms.org to estimate the GHG emissions (Experimental Section). For parameters independent of the algorithm itself, we use average worldwide values, such as the worldwide average power usage effectiveness (PUE) of 1.67[ ] and carbon intensity (CI) of 475 gCO2e kWh−1.[ ]
Figure 2

Carbon footprint (gCO2e) for a selection of algorithms, with and without their pragmatic scaling factor.


Particle Physics Simulations

In particle physics, complex simulations are used to model the passage of particles through matter. Geant4[ ] is a popular toolkit based on Monte‐Carlo methods with wide‐ranging applications, such as the simulation of detectors in the Large Hadron Collider and analysis of radiation burden on patients in clinical practice or external beam therapy.[ , , ] Meylan et al.[ ] investigated the biological effects of ionizing radiation on DNA across an entire human genome (6.4 × 10^9 nucleotide pairs) using GEANT4‐DNA, an extension of GEANT4. To quantify the DNA damage of radiation, they ran experiments with photons of different energies, from 0.5 to 20 MeV. Each experiment ran for three weeks to simulate 5000 particles (protons) using 24 processing threads and up to 10 GB of memory. Using the Green Algorithms tool, and assuming an average CPU power draw (such as the Xeon E5‐2680, capable of running 24 threads on 12 cores) and worldwide average values for PUE and CI, we estimated that a single experiment emits 49 465 gCO2e. When taking into account a PSF of 11, corresponding to the 11 different energy levels tested, the carbon footprint of such a study is 544 115 gCO2e. Using estimates of car and air travel (Experimental Section), 544 115 gCO2e is approximately equivalent to driving 3109 km (in a European car) or flying economy from New York to San Francisco. In terms of carbon sequestration (Experimental Section), it would take a mature tree 49 years to remove from the atmosphere the CO2 equivalent to the GHG emissions of this study (593 tree‐months). A common way to reduce the running time of algorithms is to distribute the computations over multiple processing cores. While the time savings are well documented for each task, as in[ ], the environmental impact is usually not taken into account.
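The contextual equivalences used above can be reproduced with simple conversion factors. The values below are approximations we back-derived to be consistent with the article's reported figures (roughly 175 gCO2e per km for an average European car, roughly 917 gCO2e removed per mature tree per month); they are illustrative, not official emission factors.

```python
# Conversion factors back-derived from the article's reported equivalences
# (approximate, illustrative values; not official emission factors).
DRIVING_EU_CAR_G_PER_KM = 175.0         # gCO2e per km, average European car
TREE_SEQUESTRATION_G_PER_MONTH = 917.0  # gCO2e removed per mature tree per month

def context(gco2e):
    """Express a footprint as an equivalent driving distance and tree-months."""
    return {
        "driving_km": gco2e / DRIVING_EU_CAR_G_PER_KM,
        "tree_months": gco2e / TREE_SEQUESTRATION_G_PER_MONTH,
    }

print(context(544_115))  # the GEANT4-DNA study's total footprint
```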
GEANT4 is a versatile toolbox; it contains an electromagnetic package simulating particle transport in matter and high‐energy physics detector response.[ ] Schweitzer et al.[ ] use a standardized example, TestEm12,[ ] to compare the performance of different hardware configurations, from 1 to 60 cores (i.e., a full Xeon Phi CPU). With the Green Algorithms tool, we estimated the carbon footprint of each configuration (Figure 3), which shows that increasing the number of cores up to 15 improves both running time and GHG emissions. However, when multiplying the number of cores further by 4 (from 15 to 60), the running time is only halved, resulting in a twofold increase in emissions, from 238 to 481 gCO2e. Generally, if the relative reduction in running time is smaller than the relative increase in the number of cores, distributing the computations will worsen the carbon footprint. In particular, scientists should be mindful of marginal improvements in running time which have disproportionately large effects on GHG emissions, as demonstrated by the gap between 30 and 60 cores in Figure 3. For any parallelized computation, there is likely to be a specific optimal number of cores for minimal GHG emissions.
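This rule of thumb can be made explicit: under a core-dominated approximation (memory power draw ignored), emissions scale with cores × run time, so distributing pays off only while the speedup outpaces the increase in core count. A hypothetical sketch in the spirit of the TestEm12 numbers:

```python
# Core-dominated approximation: emissions ~ (number of cores) x (run time).
# Adding cores helps only while the speedup exceeds the increase in core count.

def emissions_ratio(n1, t1, n2, t2):
    """Ratio of emissions for (n2 cores, t2 hours) vs (n1 cores, t1 hours)."""
    return (n2 * t2) / (n1 * t1)

# Hypothetical numbers in the spirit of the TestEm12 benchmark: quadrupling
# cores while only halving run time doubles emissions.
print(emissions_ratio(15, 4.0, 60, 2.0))  # -> 2.0
```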
Figure 3

Effect of parallelization using multiple cores on run time and carbon footprint using TestEm12 GEANT4 simulation.


Weather Forecasting

Weather forecasts are based on sophisticated models simulating the dynamics between different components of the earth (such as the atmosphere and oceans). Operational models face stringent time requirements to provide live predictions to the public, with a goal of running about 200–300 forecast days (FDs) in one (wall‐clock) day.[ ] Neumann et al.[ ] present the performance of two models in use for current weather forecasts: i) the Integrated Forecast System (IFS)[ ] used by the European Centre for Medium‐Range Weather Forecasts (ECMWF) for 10‐day forecasts, and ii) the ICOsahedral Non‐hydrostatic (ICON)[ ] model designed by the German Weather Service (Deutscher Wetterdienst, DWD) and whose predictions are used by more than 30 national weather services.[ ] The configurations in daily use by the ECMWF include a supercomputer based in Reading, UK, which has a PUE of 1.45,[ ] while ICON is run at the German Meteorological Computation Centre (DMRZ)[ ] based in Germany (PUE unknown). Neumann et al.[ ] ran their experiments on hardware similar to that which equips both facilities: "Broadwell" CPU nodes (Intel E5‐2695v4, 36 cores) with a minimum of 64 GB of memory per node. We utilize these parameters for our CO2e emission estimates. It is important to note that ICON and IFS solve slightly different problems and therefore are not directly comparable. The DWD uses ICON with a horizontal resolution of 13 km[ ] and generates a FD in 8 min. Based on the experiments run by Neumann et al.,[ ] this requires 575 Broadwell nodes (20 700 CPU cores). We estimate that generating one FD emits 12 848 gCO2e (14 tree‐months). With a running time of 8 min per FD, ICON can generate 180 FDs in 24 h. When taking into account this PSF of 180, we estimated that each day, the ICON weather forecasting algorithm releases ≈2 312 653 gCO2e, equivalent to driving 13 215 km or flying from New York to San Francisco four times.
In terms of carbon sequestration, the emissions of each day of ICON weather forecast are equivalent to 2523 tree‐months. At ECMWF, IFS makes 10‐day operational weather forecasts with a resolution of 9 km. To achieve a similar threshold of 180 FDs per day, 128 Broadwell nodes are necessary (4608 cores).[ , ] Using the PUE of the UK ECMWF facility (1.45), we estimate the impact of producing one FD with IFS to be 1660 gCO2e. Using a PSF of 180 for one day's forecasts, we estimated emissions of 298 915 gCO2e, equivalent to driving 1708 km or three return flights between Paris and London. These emissions are equivalent to 326 tree‐months. Furthermore, we modeled the planned scenario of the ECMWF transferring its supercomputing to Bologna, Italy, in 2021.[ ] Compared to the data center in Reading, the new data center in Bologna is estimated to have a more efficient PUE of 1.27.[ ] Prima facie this move appears to save substantial GHG emissions; however, it is notable that the CI of Italy is 33% higher than the UK.[ ] Unless the sources of electricity for the data center in Bologna are different from the rest of Italy and in the absence of further optimizations, we estimated that the move would result in an 18% increase in GHG emissions from the ECMWF (from 298 915 to 350 063 gCO2e).
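For a fixed workload, emissions scale with the product PUE × CI, so the effect of a relocation reduces to the ratio of two such products. The sketch below uses an illustrative UK CI of 250 gCO2e kWh−1 (a placeholder, not the exact national figure used by the tool) together with the PUEs stated above and the 33% CI difference; with these rounded inputs the result lands near the article's reported increase.

```python
# For a fixed workload, emissions scale with PUE x CI; a relocation changes
# the footprint by the ratio of the two products. The CI values here are
# illustrative placeholders, not the exact national figures.

def relocation_change(pue_old, ci_old, pue_new, ci_new):
    """Relative change in emissions after moving a fixed workload."""
    return (pue_new * ci_new) / (pue_old * ci_old) - 1.0

ci_uk = 250.0            # gCO2e/kWh (illustrative placeholder)
ci_italy = ci_uk * 1.33  # article: Italy's CI is ~33% higher than the UK's
change = relocation_change(1.45, ci_uk, 1.27, ci_italy)
print(f"{change:+.0%}")  # a better PUE can still be outweighed by a worse CI
```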

Natural Language Processing

In NLP, the complexity and financial costs of model training are major issues.[ ] This has motivated the development of language representations that can be trained once to model the complexity of natural language, and which can be used as input for more specialized algorithms. The BERT (Bidirectional Encoder Representations from Transformers)[ ] algorithm is a field leader which yields both high performance and flexibility: state‐of‐the‐art algorithms for more specific tasks are obtained by fine‐tuning a pre‐trained BERT model, for example in scientific text analysis[ ] or biomedical text mining.[ ] Yet, while the BERT model is intended to avoid retraining, many data scientists, perhaps understandably, continue to recreate or attempt to improve upon BERT, leading to redundant and ultimately inefficient computation as well as excess CO2e emissions. Even with optimized hardware (such as NVIDIA Volta GPUs), a BERT training run may take three days or more.[ ] Using these optimized parameters, Strubell et al.[ ] showed that a run time of 79 h on 64 Tesla V100 GPUs was necessary to train BERT, with a GPU usage factor of 62.7%. With the Green Algorithms calculator, we estimated that a BERT training run would emit 754 407 gCO2e (driving 4311 km in a European car; 1.3 flights from New York to San Francisco; and 823 tree‐months). When considering a conservative PSF of 100 for hyperparameter search, we obtain a carbon footprint of 75 440 740 gCO2e.
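The BERT figure can be reproduced from the published benchmark parameters. The sketch below assumes a 300 W TDP per GPU (the SXM2 variant of the V100; an assumption on our part) and ignores memory power draw, with worldwide-average PUE and CI.

```python
# Reproducing the BERT training estimate from the published benchmark:
# 79 h on 64 Tesla V100 GPUs at 62.7% usage, worldwide-average PUE (1.67)
# and CI (475 gCO2e/kWh). The 300 W TDP per GPU (SXM2 V100) is our
# assumption; memory power draw is ignored.

def gpu_carbon_gco2e(hours, n_gpus, tdp_w, usage, pue=1.67, ci=475.0):
    energy_kwh = hours * n_gpus * tdp_w * usage * pue / 1000.0
    return energy_kwh * ci

print(round(gpu_carbon_gco2e(79, 64, 300.0, 0.627)))  # ~754 407 gCO2e
```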
While BERT is a particularly widely utilized NLP tool, Google has also developed a chatbot algorithm, Meena, which was trained for 30 days on a TPU‐v3 Pod containing 2048 Tensor Processing Unit (TPU) cores.[ ] There is limited information on the power draw of TPU cores and memory; however, the power supply of this pod has been estimated to be 288 kW.[ ] Using a run time of 30 days, assuming full usage of the TPUs and ignoring memory power draw, the Green Algorithms calculator estimated that Meena training emitted 164 488 320 gCO2e, which corresponds to 179 442 tree‐months or 71 flights between New York and Melbourne.

Discussion

The method and Green Algorithms tool presented here provide users with a practical way to estimate the carbon footprint of their computations. The method focuses on producing sensible estimates with small overheads for scientists wishing to measure the footprint of their work. Consequently, the online calculator is simple to use and generalizable to nearly any computational task. We applied the Green Algorithms calculator to a variety of tasks, including particle physics simulations, weather forecasting, and NLP, to estimate their relative and ongoing carbon emissions. Real‐world changes to computational infrastructures, such as moving data centers, were also quantifiable in terms of carbon footprint and shown to be of substantive importance; for example, moving data centers may attain a more efficient PUE, but a difference in CI may negate any efficiency gains, potentially making such a move detrimental to the environment. Our work substantially enhances and extends prior frameworks for estimating the carbon footprint of computation. In particular, we have integrated and formalized previously unclear factors such as core usage and unitary power draw (per‐core or per‐GB of memory). As a result, and as presented in the Experimental Section, the carbon footprint of an algorithm can be broken down into a small number of key, easily quantifiable elements, such as number of cores, memory size, and usage factor. This reduces the burden on the user, who is not required to either measure the power draw of hardware manually or use a limited range of cloud providers for their computations. This makes the method highly flexible in comparison to previous work. Besides drawing attention to the growing issue of GHG emissions of data centers, one of the benefits of presenting a detailed open methodology and tool is to provide users with the information they need to reduce their carbon footprint.
Perhaps the most important challenge in green computing is to make the estimation and reporting of GHG emissions a standard practice. This requires transparent and easy‐to‐use methodology, such as the Green Algorithms calculator (www.green‐algorithms.org) and the open‐source code and data presented here (see section Code availability). Our approach has a number of limitations. First, the carbon footprint estimated is restricted to the GHGs emitted to power computers during a particular task. We do not perform a life cycle assessment and therefore do not consider the full environmental and social impact of manufacturing, maintaining, and disposing of the hardware used, or the maintenance of the power plants. Including these is impractical at scale and would greatly restrict who could use the method. Besides, the conversion of the impact of various GHGs into CO2e is commonly based on a 100‐year timescale; however, this is now debated, as it can misrepresent the impact of short‐lived climate pollutants like methane,[ ] and new standards may be needed in the future. Second, the TDP may substantially underestimate power draw in some situations. For example, when hyperthreading is used, the real power consumption can be double the indicated TDP.[ ] The TDP value remains a sensible estimate of the base consumption of the processor in most situations, but users of hyperthreading should be aware of the impact on power consumption.
Third, while the power consumption from storage is usually minimal at the scale of one computation, if central storage is constantly queried by the algorithm (for example, to avoid overloading memory), this can be an important factor in power draw; however, there are resources which can be utilized if the algorithm is designed to be heavily storage reliant.[ ] Moreover, at the scale of the data center, storage represents a significant part of electricity usage,[ ] and research projects relying on large databases should separately acknowledge the long‐term carbon footprint of storage. Fourth, while some averaging is necessary, the energy mix of a country varies by the hour. For example, the CI of South Australia, which relies on wind and gas to produce electricity,[ ] can vary between 112 and 592 gCO2e kWh−1 within one day, depending on the quantity of coal‐produced electricity imported from the neighboring state of Victoria.[ ] Although most regions are relatively stable, these outliers may require finer estimation. Our online calculator uses averaged values sourced from government reports.[ ] Fifth, the PUE has some limitations as a measure of data center energy usage,[ , ] due to inconsistencies in the ways it is calculated. For example, reporting of PUE is highly variable, from yearly averages to best‐case scenarios such as winter, when minimal cooling is required (as demonstrated by Google's quarterly results[ ]). Whether to include infrastructure components such as security or on‐site power generation is also a source of discrepancies between data centers.[ ] Although some companies present well‐justified results, many PUEs have no or insufficient justification. Furthermore, PUE is not defined when computations are run on a laptop or desktop computer: as the device is used for multiple tasks simultaneously, it is impossible to estimate the power overhead due to the algorithm.
In the calculator, we use a PUE of 1 in this case because of the lack of information, but we caution that this should not be interpreted as a sign of efficiency. Even though discrepancies will remain, the widespread adoption of an accurate, transparent, and certified estimation of PUE, such as the ISO/IEC standard,[ ] would be a substantial step for the computing community. Sixth, the carbon emissions in the section Results are based on manual curation of the literature. When parameters such as usage factor or PUE were not specified, we made assumptions (100% core usage, or average PUE) that may explain differences between our estimates and the real emissions. For best results, authors should estimate and publish their emissions. There are various, realistic actions one can take to reduce the carbon footprint of their computation. Acting on the various parameters in Green Algorithms (see Experimental Section) is a clear and easy approach. Below, we describe a selection of practical changes one can make:

Algorithm Optimization

Increasing the efficiency of an algorithm can have myriad benefits, even apart from reducing its carbon footprint. Therefore, we highly recommend this and foresee algorithm optimization as one of the most productive, easily recognizable core activities of green computing. While speed is an obvious efficiency gain, part of algorithm optimization also includes memory minimization. The power draw from memory mainly depends on the memory available, not the actual memory used,[ ] and the memory available is often the peak memory needed for one step of the algorithm (typically a merge or aggregation). By optimizing these steps, one can easily reduce energy consumption.

Reduce the Pragmatic Scaling Factor

Limiting the number of times an algorithm runs, especially those that are power hungry, is perhaps the easiest way to reduce carbon footprint. Relatedly, best practices to limit PSF (as well as financial cost) include limiting parameter fine‐tuning to the minimum necessary and building a small‐scale example for debugging.

Choice of Data Center

Carbon footprint is directly proportional to data center efficiency and the CI of the location. The latter is perhaps the parameter which most affects total carbon footprint because of inter‐country variation, from under 20 gCO2e kWh−1 in Norway and Switzerland to over 800 gCO2e kWh−1 in Australia, South Africa, and some states in the USA. To rigorously assess the impact of relocating computations for a one‐off task, the marginal CI, rather than the average one, should be used.[ ] The marginal value depends on which power plant would be solicited to meet the unexpected increase in demand. Although it would ideally be used, it varies by the hour and is often not practical to estimate accurately at scale. When the marginal CI is unknown, the average one (presented in Experimental Section and Figure S2, Supporting information) can be used by scientists as a practical lower‐bound estimate to assess the benefit of moving computations. Indeed, due to the low operating cost of renewable technologies, the marginal power plants (the last ones called upon) are generally high‐carbon technologies such as fuel or gas,[ ] which leads the marginal CI to be higher than the average CI. However, if the move is permanent, by relocating an HPC facility or using cloud computing for example, then the energy needs are incorporated into utility planning and the average CI is the appropriate metric to use. Data center efficiency (PUE) varies widely between facilities but, in general, large data centers optimize cooling and power supply, reducing the energy overhead and making them more efficient than personal servers. Notably, a 2016 report estimated that if 80% of small data centers in the USA were aggregated into hyperscale facilities, energy usage would reduce by 25%.[ ] For users to make informed choices, data centers should report their PUE and other energy metrics.
While large providers like Google or Microsoft widely advertise their servers’ efficiency,[ , ] smaller structures often do not. As highlighted here, cloud providers offer the opportunity to use efficient data centers in low‐carbon countries, and they can be greener alternatives to local data centers.

Offsetting GHG Emissions

Carbon offsetting is a flexible way to compensate for carbon footprint. An institution or a user themselves can directly support reductions in CO2 or other GHGs, for example by sponsoring fuel‐efficient stoves in developing countries, reductions in deforestation, or hydroelectric and wind‐based power plants.[ , ] The pros and cons of carbon offsetting are still debated due to the variety of mechanisms, intricate international legislation, and competing standards. Therefore, we only present an overview here and point interested scientists to some resources. Multiple international standards regulate the purchase of carbon credits and ensure the efficiency of the projects supported.[ ] Most of the well‐established standards are managed by non‐profits and abide by the mechanisms set in place by the Kyoto protocol (in particular Certified Emission Reduction)[ ] and the PAS 2060 Carbon Neutrality standard from the British Standards Institution.[ ] Although the primary aim is carbon offsetting, projects are often also selected in line with the United Nations' 2030 Agenda for Sustainable Development,[ ] a broader action plan addressing inequalities, food security, and peace. Amongst the most popular standards are the Gold Standard (founded by WWF and other NGOs),[ ] Verra (formerly Verified Carbon Standard),[ ] and the American Carbon Registry (a private voluntary GHG registry).[ ] In addition to direct engagement with these standards, platforms like Carbon Footprint[ ] select certified projects and facilitate the purchase of credits.

Conclusions

The framework presented here is generalizable to nearly any computation and may be used as a foundation for other aspects of green computing. The carbon footprint of computation is substantial and may be affecting the climate. We therefore hope that this new tool and metrics raise awareness of these issues as well as facilitate pragmatic solutions which may help to mitigate the environmental consequences of modern computation. Overall, with the right tools and practices, we believe HPC and cloud computing can be immensely positive forces for both improving the human condition and saving the environment.

Experimental Section

The carbon footprint of an algorithm depends on two factors: the energy needed to run it and the pollutants emitted when producing such energy. The former depends on the computing resources used (e.g., number of cores, running time, and data center efficiency), while the latter, called carbon intensity, depends on the location and production methods used (e.g., nuclear, gas, or coal). There are several competing definitions of "carbon footprint," and in this project, the extended definition from Wright et al.[ ] was used. The climate impact of an event is presented in terms of carbon dioxide equivalent (CO2e), which summarizes the global warming effect of the GHGs emitted in the determined timeframe, here the running of a set of computations. The GHGs considered were carbon dioxide (CO2), methane (CH4), and nitrous oxide (N2O);[ ] these are the three most common GHGs of the "Kyoto basket" defined in the Kyoto Protocol[ ] and represent 97.9% of global GHG emissions.[ ] The conversion into CO2e was done using Global Warming Potential (GWP) factors from the Intergovernmental Panel on Climate Change (IPCC)[ , ] based on a 100‐year horizon (GWP100). When estimating these parameters, accuracy and feasibility must be balanced. This study focused on a methodology that could be easily and broadly adopted by the community and therefore restricted the scope of the environmental impact considered to the GHGs emitted to power computing facilities for a specific task. Moreover, the framework presented requires no extra computation, nor does it involve invasive monitoring tools.
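The CO2e conversion amounts to a weighted sum of gas masses by their GWP100 factors. The factors below are the IPCC AR5 GWP100 values, quoted here for illustration; the exact factors used by the tool come from the cited IPCC reports.

```python
# CO2e as a GWP100-weighted sum of gas masses. Factors are IPCC AR5 GWP100
# values, quoted for illustration; the tool uses factors from the cited
# IPCC reports.
GWP100 = {"CO2": 1.0, "CH4": 28.0, "N2O": 265.0}

def co2e_grams(emissions_g):
    """emissions_g maps gas name -> grams emitted; returns grams of CO2e."""
    return sum(mass * GWP100[gas] for gas, mass in emissions_g.items())

print(co2e_grams({"CO2": 1000.0, "CH4": 10.0, "N2O": 1.0}))  # -> 1545.0
```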

Energy Consumption

An algorithm's energy[ ] needs were modeled as a function of the running time, the number, type, and process time of computing cores (CPU or GPU), the amount of memory mobilized, and the power draw of these resources. The model further included the efficiency of the data center,[ ] which represents how much extra power is necessary to run the facility (e.g., cooling and lighting). Similar to previous works,[ , ] this estimate was based on the power draw from processors and memory, as well as the efficiency of the data center. However, the formula was refined and flexibility was added by including a unitary power draw (per core and per GB of memory) and the processor's usage factor. The energy consumption E (in kWh) was expressed as:

E = t × (n_c × u_c × P_c + n_m × P_m) × PUE × 0.001    (1)

where t is the running time (hours), n_c the number of cores, and n_m the size of memory available (gigabytes); u_c is the core usage factor (between 0 and 1); P_c is the power draw of a computing core and P_m the power draw of memory (Watt); PUE is the efficiency coefficient of the data center; and the factor 0.001 converts Watt‐hours to kWh. The assumptions made regarding the different components are discussed below. It has previously been shown that the power draw of a server motherboard is negligible,[ ] and in a desktop computer the motherboard handles a multitude of tasks, which makes it impractical to assess the fraction of power usage attributable to a specific algorithm. For these reasons, it was decided not to include the motherboard's power draw in this model.

Power Draw of the Computing Core

The metric commonly used to report the power draw of a processor, either CPU or GPU, is its thermal design power (TDP, in Watt), provided by the manufacturer. TDP values frequently correspond to full CPU specifications covering multiple cores, so here TDP values were normalized to a per-core figure. While TDP is not a direct measure of power consumption (it quantifies the heat a cooling system must dissipate during regular use), it is commonly considered a reasonable approximation. The energy used by the processor was the power draw multiplied by processing time, scaled by the usage factor. However, processing time cannot be known a priori and, on some platforms, tracking it can be impractical at scale. Modeling the exact processing time of past projects may also necessitate re-running jobs, which would generate unnecessary emissions. Therefore, when processing time is unknown, the simplifying assumption was made that core usage is 100% of the running time (u_c = 1 in Equation (1)).

Power Draw from Memory

Memory power draw is mainly due to background consumption with a negligible contribution from the workload and database size.[ ] Moreover, the power draw is mainly affected by the total memory allocated, not by the actual size of the database used, because the load is shared between all memory slots which keeps every slot in a power‐hungry active state. Therefore, the primary factor influencing power draw from memory is the quantity of memory mobilized, which simply requires an estimation of the power draw per gigabyte. Measured experimentally, this was estimated to be 0.3725 W per GB.[ , ] For example, requesting 29 GB of memory draws 10.8 W, which is the same as one core of a popular Core‐i5 CPU. Figure S1, Supporting information further compares the power draw of memory to a range of popular CPUs.

Power Draw from Storage

The power draw of storage equipment (HDD or SSD) varies significantly with workload.[ ] However, in regular use, storage is typically solicited far less than memory and mainly serves as a more permanent record of the data, independently of the task at hand. The power draw of storage was estimated to be 0.001 W per GB (Note S2, Supporting Information). By comparison, the power draw of memory (0.3725 W per GB) and of a Core‐i5 CPU (10.8 W per core) are more than two orders of magnitude greater. While estimating storage usage would add little overhead for the researcher, storage is unlikely to make a significant difference to the overall estimate of power usage (and GHG emissions). Therefore, the power consumption of storage was not considered in this work.

Energy Efficiency

Data center energy consumption includes additional factors, such as server cooling systems, power delivery components, and lighting. The efficiency of a given data center can be measured by the Power Usage Effectiveness (PUE),[ , ] defined as the ratio between the total power drawn by the facility and the power used by computing equipment:

PUE = (total facility power draw) / (power used by computing equipment)   (2)

A data center PUE of 1.0 represents the ideal situation where all power supplied to the building is used by computing equipment. The global average PUE of data centers was estimated at 1.67 in 2019.[ ] While data centers with relatively inefficient PUEs may not report them, some data centers and companies have invested significant resources to bring their PUEs as close to 1.0 as possible; for example, Google uses machine learning to reduce its global yearly average PUE to 1.10.[ , ]
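The PUE ratio is straightforward to compute; as an illustration (the function name and the 500/300 kW facility figures are hypothetical, chosen so the result matches the 2019 global average cited above):

```python
def pue(total_facility_kw, it_equipment_kw):
    """PUE = total power drawn by the facility / power used by computing equipment."""
    return total_facility_kw / it_equipment_kw

# A facility drawing 500 kW overall to power 300 kW of IT equipment
# has a PUE of ~1.67, the 2019 global average for data centers.
example_pue = pue(500, 300)
```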

Carbon Intensity of Energy Production

For a given country and energy mix, the carbon footprint in CO2e represents the amount of CO2 with the same global warming impact as the GHGs emitted, which simplifies the comparison between different electricity production methods. The carbon intensity (CI), the carbon footprint of producing 1 kWh of energy, varies significantly between locations due to the broad range of production methods (Figure S2, Supporting Information), from 12 gCO2e kWh−1 in Switzerland (mainly powered by hydroelectricity) to 880 gCO2e kWh−1 in Australia (mainly powered by coal and gas).[ , ] The 2020 CI values aggregated by Carbon Footprint[ ] were used; these production factors take into account the GHG emissions at the power plants (power generation) as well as, when available, the footprint of distributing energy to the data center.

Estimation of Carbon Footprint

The carbon footprint C (in gCO2e) of producing a quantity of energy E (in kWh) from sources with a carbon intensity CI (in gCO2e kWh−1) is then:

C = E × CI   (3)

By combining Equations (1) and (3), the long‐form equation of the carbon footprint C is obtained:

C = t × (n_c × P_c × u_c + n_m × P_m) × PUE × 0.001 × CI   (4)
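Combining the energy and carbon intensity terms gives the full footprint estimate. The sketch below (our own names and example job, not the Green Algorithms tool itself) reuses the hardware figures cited earlier and the Swiss (12 gCO2e kWh−1) and Australian (880 gCO2e kWh−1) carbon intensities from the previous paragraph to show how strongly location matters:

```python
def carbon_footprint_gco2e(t_hours, n_cores, p_core_w, usage,
                           n_mem_gb, p_mem_w_per_gb, pue, ci_g_per_kwh):
    """C = E * CI, with E in kWh (Equation 1) and CI in gCO2e per kWh."""
    energy_kwh = t_hours * (n_cores * p_core_w * usage
                            + n_mem_gb * p_mem_w_per_gb) * pue * 0.001
    return energy_kwh * ci_g_per_kwh

# Same hypothetical job (12 h, 8 cores at 10.8 W, 32 GB, PUE 1.67)
# run in Switzerland vs Australia: the footprint differs ~73-fold.
low = carbon_footprint_gco2e(12, 8, 10.8, 1.0, 32, 0.3725, 1.67, 12)
high = carbon_footprint_gco2e(12, 8, 10.8, 1.0, 32, 0.3725, 1.67, 880)
```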

CO2e of Driving and Air Travel

To contextualize gCO2e values, equivalences in terms of distance travelled by car or by passenger aircraft were estimated. Previous studies estimated the emissions of the average passenger car in Europe at 175 gCO2e km−1[ , ] (251 gCO2e km−1 in the United States[ ]). The emissions of flying on a jet aircraft in economy class were estimated at between 139 and 244 gCO2e km−1 per person, depending on the length of the flight.[ ] Three reference flights were used: Paris to London (50 000 gCO2e), New York to San Francisco (570 000 gCO2e), and New York to Melbourne (2 310 000 gCO2e).[ ]
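The travel equivalences are simple divisions by the per-kilometre factors above; a minimal sketch (function name ours, factors from the text):

```python
def km_by_car_eu(gco2e, g_per_km=175.0):
    """Distance (km) an average European passenger car (175 gCO2e/km)
    would travel to emit the given quantity of gCO2e."""
    return gco2e / g_per_km

# The Paris-London reference flight (50 000 gCO2e) corresponds to
# roughly 286 km driven in an average European car.
km = km_by_car_eu(50_000)
```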

CO2 Sequestration by Trees

Trees play a major role in carbon sequestration and, although not all GHGs emitted can be sequestered, CO2 represents 74.4% of these emissions.[ ] To provide a metric of reversion for CO2e, the number of trees needed to sequester the equivalent emissions of a given computation was computed. A metric, tree‐months, was defined as the number of months a mature tree needs to absorb a given quantity of CO2. While the amount of CO2 sequestered by a tree per unit of time depends on a number of factors, such as its species, size, or environment, it was estimated that a mature tree sequesters on average ≈11 kg of CO2 per year,[ ] giving the multiplier in tree‐months a value close to 1 kg of CO2 per month (0.92 kg).
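The tree-months conversion can be sketched as follows (our own function name; the ≈11 kg per year, i.e. ≈917 g per month, sequestration rate is the figure cited above):

```python
def tree_months(gco2e, sequestration_g_per_month=917.0):
    """Number of months one mature tree needs to absorb the given
    quantity of CO2 (in grams), at ~11 kg sequestered per year."""
    return gco2e / sequestration_g_per_month

# One year of a tree's sequestration (~11 kg) is ~12 tree-months.
months = tree_months(11_000)
```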

Pragmatic Scaling Factor

Many analyses are presented as a single run of a particular algorithm or software tool; however, computations are rarely performed only once. Algorithms are run multiple times, sometimes hundreds, systematically or manually, with different parameterizations. Statistical models may include any number of combinations of covariates, fitting procedures, etc. It is important to include these repeats in the carbon footprint. To take into account the number of times a computation is performed in practice, the pragmatic scaling factor (PSF) was defined: a scaling factor by which the estimated GHG emissions are multiplied. The value and causes of the PSF vary greatly between tasks. In machine learning, tuning the hyper‐parameters of a model can require hundreds, if not thousands,[ ] of runs, while other tools require less tuning and can sometimes be run far fewer times. The PSF should be estimated for each specific task, based on published work or the user's own experience; accordingly, Green Algorithms recommends that each user estimate their own PSF.
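Applying the PSF is a single multiplication of the single-run estimate; a minimal sketch (function name and the 50 gCO2e / 1000-run hyper-parameter search are hypothetical example values):

```python
def footprint_with_psf(single_run_gco2e, psf):
    """Scale a single-run footprint estimate by the pragmatic
    scaling factor (PSF), i.e. the number of times the job is run."""
    return single_run_gco2e * psf

# A hyper-parameter search re-running a 50 gCO2e training job
# 1000 times: 50 * 1000 = 50 000 gCO2e in total.
total = footprint_with_psf(50, 1000)
```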

Conflict of Interest

The authors declare no conflict of interest.
