Shijie Yu, Hailong Yang, Rui Wang, Zhongzhi Luan, Depei Qian.
Abstract
As energy consumption surges at an unsustainable rate, it is important to understand the impact of existing architecture designs from an energy efficiency perspective, which is especially valuable for High Performance Computing (HPC) and datacenter environments hosting tens of thousands of servers. One obstacle hindering comprehensive evaluation of energy efficiency is the deficiency of existing power measurement approaches. Most energy studies rely on either external power meters or power models, both of which have intrinsic drawbacks in practical adoption and measurement accuracy. Fortunately, the advent of the Intel Running Average Power Limit (RAPL) interfaces has taken power measurement to the next level, with higher accuracy and finer time resolution. We therefore argue that now is the right time to conduct an in-depth evaluation of existing architecture designs to understand their impact on system energy efficiency. In this paper, we use representative benchmark suites, including serial and parallel workloads from diverse domains, to evaluate architecture features such as Non-Uniform Memory Access (NUMA), Simultaneous Multithreading (SMT) and Turbo Boost. Energy is tracked at the subcomponent level, covering Central Processing Unit (CPU) cores, uncore components and Dynamic Random-Access Memory (DRAM), by exploiting the power measurement ability exposed by RAPL.
The experiments reveal non-intuitive results: 1) the mismatch between local compute and remote memory node caused by the NUMA effect not only generates a dramatic surge in power and energy but also significantly deteriorates energy efficiency; 2) for multithreaded applications such as the Princeton Application Repository for Shared-Memory Computers (PARSEC), most workloads gain a notable increase in energy efficiency using SMT, with a decline of more than 40% in average power consumption; 3) Turbo Boost is effective at accelerating workload execution and further saving energy, but it may not be applicable to systems with a tight power budget.
Year: 2017 PMID: 29161317 PMCID: PMC5697812 DOI: 10.1371/journal.pone.0188428
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. The energy proportionality gap for each system component with the ep workload running at different input scales.
Problem size at each input scale for the ep workload.
| Workload | Scale S | Scale W | Scale A | Scale B | Scale C | Scale D |
|---|---|---|---|---|---|---|
| EP | 2^24 | 2^25 | 2^28 | 2^30 | 2^32 | 2^36 |
Benchmark suites of representative workloads.
| Benchmark Suite | Parallelization | Workloads |
|---|---|---|
| NPB-MPI | MPI | MultiGrid ( |
| SPECCPU | - | |
| PARSEC | Pthread | |
Fig 2. The cumulative distribution of the time interval between RAPL updates of the energy registers.
98% of the energy update intervals are shorter than 1.15 ms.
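The statistic in this caption can be reproduced from a list of update timestamps. The following is a minimal sketch, assuming the timestamps at which the energy register value changed have already been recorded in milliseconds. The function names and the sample timestamps are hypothetical and are not taken from the paper.

```python
from bisect import bisect_left


def intervals_from_timestamps(timestamps_ms):
    """Differences between consecutive register-update timestamps (ms)."""
    return [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]


def empirical_cdf(intervals_ms):
    """Return (sorted intervals, cumulative fraction at each interval)."""
    xs = sorted(intervals_ms)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


def fraction_below(intervals_ms, threshold_ms):
    """Fraction of intervals strictly shorter than threshold_ms."""
    xs = sorted(intervals_ms)
    return bisect_left(xs, threshold_ms) / len(xs)


# Hypothetical timestamps (ms), not measurements from the paper.
timestamps = [0.0, 1.0, 2.0, 3.0, 4.0, 5.5]
intervals = intervals_from_timestamps(timestamps)
print(fraction_below(intervals, 1.15))
```

Plotting the two lists returned by `empirical_cdf` against each other gives a curve of the kind the figure describes.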
System configurations of two Sandy Bridge servers.
| Vendor/Model | Intel Sandy Bridge EP |
|---|---|
| CPU Sockets | 2x Intel Xeon E5-2620 |
| Cores per Socket | 6 |
| SMT | 12 logical threads when enabled |
| Turbo Boost | 2.0 GHz (2.5 GHz) |
| Memory | 4x 4GB Samsung DDR3-1333 |
| Motherboard | Lenovo RD630 |
| Disk | 3x 300GB SATA Seagate ST9300605SS |
| OS | CentOS 6.2 |
| Linux Kernel | 2.6.32-220.el6.x86_64 |

| Vendor/Model | Intel Sandy Bridge EP |
|---|---|
| CPU Sockets | 2x Intel Xeon E5-2609 |
| Cores per Socket | 4 |
| SMT | not supported |
| Turbo Boost | not supported |
| Memory | 4x 8GB Kingston DDR3-1066 |
| Motherboard | Supermicro X9DRG-QF |
| Disk | 2x 128GB SATA Western Digital WD1003FBYX-01Y7B1 |
| OS | CentOS 6.3 |
| Linux Kernel | 2.6.32-279.el6.x86_64 |
Parameters of RAPL measurement range including Maximum Time Window (MTW), Maximum Power (MaxP) and Minimum Power (MinP).
| Domain/Range | MTW | MaxP | MinP |
|---|---|---|---|
| PKG | 46 ms | 150 W | 63 W |
| DRAM | 39 ms | 75 W | 15 W |
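
RAPL exposes cumulative energy counters, so average power has to be derived from two counter samples. The following is a minimal sketch of that conversion, assuming a 32-bit counter that wraps around and an energy unit of 2^-16 J per count. Both values are assumptions and must be read from the actual platform. The helper names are hypothetical, and the range check uses only the MinP and MaxP values from the table above. A reading outside that range is not necessarily an error, only something to inspect.

```python
COUNTER_BITS = 32                # assumption: counter width in bits
ENERGY_UNIT_J = 1.0 / (1 << 16)  # assumption: joules per counter increment

# (MinP, MaxP) in watts, copied from the table above.
RAPL_RANGE_W = {"PKG": (63.0, 150.0), "DRAM": (15.0, 75.0)}


def energy_delta_joules(prev_raw, curr_raw,
                        unit_j=ENERGY_UNIT_J, bits=COUNTER_BITS):
    """Energy consumed between two raw samples, tolerating one wraparound."""
    delta = (curr_raw - prev_raw) % (1 << bits)
    return delta * unit_j


def average_power_watts(prev_raw, curr_raw, elapsed_s,
                        unit_j=ENERGY_UNIT_J, bits=COUNTER_BITS):
    """Average power over the sampling interval."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return energy_delta_joules(prev_raw, curr_raw, unit_j, bits) / elapsed_s


def within_reported_range(domain, power_w):
    """True if power_w lies inside the table's [MinP, MaxP] for the domain."""
    low, high = RAPL_RANGE_W[domain]
    return low <= power_w <= high


# Hypothetical raw samples taken one second apart.
power = average_power_watts(0, 100 * 65536, 1.0)
print(power, within_reported_range("PKG", power))
```

The modulo in `energy_delta_joules` only corrects a single wraparound, so the sampling period has to be short enough that the counter cannot wrap twice between reads.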
Fig 3. Energy characterization of NUMA with NPB workloads: (a) performance per watt, (b) energy ratio of RAPL domains, (c) average power consumption and (d) total energy consumption.
The results are normalized to the NUMA-disabled configuration.
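The normalization used in these characterization figures can be sketched as follows. This is an illustrative sketch only: it assumes performance is taken as the inverse of runtime, which the record above does not state, and the `Run` class, the `normalize` function and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    runtime_s: float  # wall-clock time of the workload
    energy_j: float   # total energy over the run

    @property
    def avg_power_w(self):
        return self.energy_j / self.runtime_s

    @property
    def perf_per_watt(self):
        # Assumption: performance = 1 / runtime.
        return (1.0 / self.runtime_s) / self.avg_power_w


def normalize(run, baseline):
    """Ratios of each metric to the baseline (feature-disabled) run."""
    return {
        "perf_per_watt": run.perf_per_watt / baseline.perf_per_watt,
        "avg_power": run.avg_power_w / baseline.avg_power_w,
        "total_energy": run.energy_j / baseline.energy_j,
    }


# Hypothetical numbers, not results from the paper.
baseline = Run(runtime_s=10.0, energy_j=1000.0)
enabled = Run(runtime_s=8.0, energy_j=1000.0)
print(normalize(enabled, baseline))
```

Under the inverse-runtime assumption, performance per watt reduces to the inverse of total energy, so a run that is faster but draws proportionally more power shows no change in that metric.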
Fig 4. Energy characterization of SMT with PARSEC workloads: (a) performance per watt, (b) energy ratio of RAPL domains, (c) average power consumption and (d) total energy consumption.
The results are normalized to the SMT-disabled configuration.
Fig 5. Energy characterization of Turbo Boost with SPECCPU workloads: (a) performance per watt, (b) energy ratio of RAPL domains, (c) average power consumption and (d) total energy consumption.
The results are normalized to the Turbo Boost-disabled configuration.
Fig 6. Energy efficiency of Turbo Boost with SPECCPU workloads using IPS as performance metric.
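When instructions per second (IPS) is the performance metric, energy efficiency becomes IPS per watt. The sketch below shows the arithmetic, assuming the retired instruction count, runtime and total energy of a run are available. The function names and the sample values are hypothetical and are not taken from the paper.

```python
def ips(instructions, runtime_s):
    """Instructions retired per second."""
    return instructions / runtime_s


def ips_per_watt(instructions, runtime_s, energy_j):
    """IPS divided by average power over the run."""
    avg_power_w = energy_j / runtime_s
    return ips(instructions, runtime_s) / avg_power_w


# Hypothetical run: 2e12 instructions in 100 s using 5000 J.
efficiency = ips_per_watt(2e12, 100.0, 5000.0)
print(efficiency)
```

Because runtime cancels out, IPS per watt is algebraically the same as instructions per joule.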