Vlad Diaconita, Ana-Ramona Bologa, Razvan Bologa.
Abstract
A smart city implies a consistent use of technology for the benefit of the community. As the city develops over time, components and subsystems such as smart grids, smart water management, smart traffic and transportation systems, smart waste management systems, smart security systems, or e-governance are added. These components ingest and generate a multitude of structured, semi-structured or unstructured data that may be processed using a variety of algorithms in batches, micro-batches or in real time. The ICT architecture must be able to handle the increased storage and processing needs. When vertical scaling is no longer a viable solution, Hadoop can offer efficient linear horizontal scaling, solving storage, processing, and data analysis problems in many ways. This enables architects and developers to choose a stack according to their needs and skill levels. In this paper, we propose a Hadoop-based architectural stack that can provide the ICT backbone for efficiently managing a smart city. On the one hand, Hadoop, together with Spark and the plethora of NoSQL databases and accompanying Apache projects, is a mature ecosystem. This is one of the reasons why it is an attractive option for a smart city architecture. On the other hand, it is also very dynamic; things can change very quickly, and many new frameworks, products and options continue to emerge as others decline. To construct an optimized, modern architecture, we discuss and compare various products and engines based on a process that takes into consideration how the products perform and scale, as well as the reusability of the code, innovations, features, and support and interest in online communities.
Keywords: Elasticsearch; Hadoop; IoT; Spark; cloud computing; sensors; smart cities
Year: 2018 PMID: 29649172 PMCID: PMC5948833 DOI: 10.3390/s18041181
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Hadoop architecture for smart cities.
Figure 2. The structure of the first data set.
Scores for the analyzed products (part 1). The details regarding the tests, including the queries, are in the Excel file included in the Complementary Materials of the article. In the tab dedicated to the criteria, the cell comments give additional explanations of how each score was calculated.
| Criteria | Hive with MR | Hive with Tez | Hive with Spark | Oraloader | Sqoop |
|---|---|---|---|---|---|
| | 60.08 | 74.16 | 67.70 | 0.00 | 0.00 |
| | 34.80 | 100.00 | 62.20 | 100.00 | 93.23 |
| | 50.00 | 50.00 | 50.00 | 10.00 | 50.00 |
| | 75.00 | 100.00 | 75.00 | 0.00 | 15.00 |
| | 100.00 | 100.00 | 100.00 | 75.00 | 75.00 |
| | 100.00 | 100.00 | 100.00 | 50.00 | 100.00 |
Scores for the analyzed products (part 2). The details regarding the tests, including the queries, are in the Excel file included in the Complementary Materials of the article. In the tab dedicated to the criteria, the cell comments give additional explanations of how each score was calculated.
| Criteria | Cassandra + Presto | HBase + Phoenix | Spark Streaming | Storm | Ambari | Hue |
|---|---|---|---|---|---|---|
| | 51.30 | 75.85 | 100.00 | 100.00 | 0.00 | 0.00 |
| | 100.00 | 34.43 | 99.00 | 100.00 | 0.00 | 0.00 |
| | 75.00 | 75.00 | 75.00 | 50.00 | 0.00 | 0.00 |
| | 0.00 | 0.00 | 100.00 | 0.00 | 15.00 | 0.00 |
| | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| | 100.00 | 75.00 | 100.00 | 71.50 | 100.00 | 75.00 |
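The per-criterion scores can be condensed into a single number per product. The sketch below is not the article's method: it assumes an unweighted mean over the six score rows, and it assumes that each row is one criterion whose values run left to right in the order of the product columns. The names `SCORES`, `mean_score` and `best_of` are illustrative. A mean is only meaningful among alternatives that fill the same role, so `best_of` takes an explicit list of candidates.

```python
from statistics import mean

# Scores copied from the two score tables, one list per product.
# Assumption: list position i corresponds to the i-th score row (criterion).
SCORES = {
    "Hive with MR":       [60.08, 34.80, 50.00, 75.00, 100.00, 100.00],
    "Hive with Tez":      [74.16, 100.00, 50.00, 100.00, 100.00, 100.00],
    "Hive with Spark":    [67.70, 62.20, 50.00, 75.00, 100.00, 100.00],
    "Oraloader":          [0.00, 100.00, 10.00, 0.00, 75.00, 50.00],
    "Sqoop":              [0.00, 93.23, 50.00, 15.00, 75.00, 100.00],
    "Cassandra + Presto": [51.30, 100.00, 75.00, 0.00, 100.00, 100.00],
    "HBase + Phoenix":    [75.85, 34.43, 75.00, 0.00, 100.00, 75.00],
    "Spark Streaming":    [100.00, 99.00, 75.00, 100.00, 100.00, 100.00],
    "Storm":              [100.00, 100.00, 50.00, 0.00, 100.00, 71.50],
    "Ambari":             [0.00, 0.00, 0.00, 15.00, 100.00, 100.00],
    "Hue":                [0.00, 0.00, 0.00, 0.00, 100.00, 75.00],
}


def mean_score(product):
    """Unweighted mean of a product's six criterion scores (an assumption)."""
    return mean(SCORES[product])


def best_of(candidates):
    """Return the candidate with the highest unweighted mean score."""
    return max(candidates, key=mean_score)


if __name__ == "__main__":
    engines = ["Hive with MR", "Hive with Tez", "Hive with Spark"]
    for name in engines:
        print(f"{name}: {mean_score(name):.2f}")
    print("Highest mean among Hive engines:", best_of(engines))
```

A weighted mean could be substituted if some criteria matter more for a given deployment; the weights would then be a further assumption.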
Test results for execution time (HiveQL with MapReduce as the execution engine; the queries can be found in the Complementary Materials).
| Execution Time (s) | Query 1 | Query 2 | Query 3 | Query 4 | Query 5 |
|---|---|---|---|---|---|
| 1 node | 139.9 | 102.467 | 7150 | 2440 | 48.15 |
| 3 node cluster | 80.9 | 100.69 | 6107 | 1401 | 40.7 |
| 5 node cluster | 57.16 | 85.04 | 5803 | 1190 | 32.5 |
Test results for execution time (HiveQL with Tez as the execution engine; the queries can be found in the Complementary Materials).
| Execution Time (s) | Query 1 | Query 2 | Query 3 | Query 4 | Query 5 |
|---|---|---|---|---|---|
| 1 node | 219 | 118.53 | 1487 | 1250.93 | 27.07 |
| 3 node cluster | 51.67 | 79.48 | 780.53 | 708 | 20.1 |
| 5 node cluster | 43.7 | 40.25 | 530.38 | 600 | 12.5 |
Test results for execution time (HiveQL with Spark as the execution engine; the queries can be found in the Complementary Materials).
| Execution Time (s) | Query 1 | Query 2 | Query 3 | Query 4 | Query 5 |
|---|---|---|---|---|---|
| 1 node | 263 | 123.98 | 2967.8 | 2001 | 31.644 |
| 3 node cluster | 80 | 101.21 | 1735.6 | 1380 | 24.3 |
| 5 node cluster | 51 | 83.7 | 1250.3 | 1081 | 18.2 |
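One way to read the three HiveQL tables is in terms of speedup (single-node time divided by cluster time) and parallel efficiency (speedup divided by node count). The sketch below is not taken from the article; the names `TIMES`, `speedup` and `efficiency` are illustrative, and the timings are copied from the tables for the MapReduce, Tez and Spark execution engines.

```python
# Execution times in seconds, keyed by engine and then by node count.
# Each list holds Query 1 through Query 5, copied from the HiveQL tables.
TIMES = {
    "MapReduce": {
        1: [139.9, 102.467, 7150, 2440, 48.15],
        3: [80.9, 100.69, 6107, 1401, 40.7],
        5: [57.16, 85.04, 5803, 1190, 32.5],
    },
    "Tez": {
        1: [219, 118.53, 1487, 1250.93, 27.07],
        3: [51.67, 79.48, 780.53, 708, 20.1],
        5: [43.7, 40.25, 530.38, 600, 12.5],
    },
    "Spark": {
        1: [263, 123.98, 2967.8, 2001, 31.644],
        3: [80, 101.21, 1735.6, 1380, 24.3],
        5: [51, 83.7, 1250.3, 1081, 18.2],
    },
}


def speedup(engine, nodes, query):
    """Single-node time divided by the time on `nodes` nodes (query is 1-based)."""
    single = TIMES[engine][1][query - 1]
    return single / TIMES[engine][nodes][query - 1]


def efficiency(engine, nodes, query):
    """Speedup per node; 1.0 would correspond to ideal linear scaling."""
    return speedup(engine, nodes, query) / nodes


if __name__ == "__main__":
    for engine in TIMES:
        row = ", ".join(f"Q{q}: {speedup(engine, 5, q):.2f}x" for q in range(1, 6))
        print(f"{engine} (5 nodes vs 1 node): {row}")
```

For Query 1, for example, this gives a five-node speedup of about 5.0 with Tez against about 2.4 with MapReduce.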
Test results for execution time (Spark 2.1: spark-submit, Spark SQL over an HDFS-stored file and over a Hive-stored table; the queries can be found in the Complementary Materials).
| Execution Time (s) | HDFS stored file: Query 1 | HDFS stored file: Query 2 | HDFS stored file: Queries 3, 4, 5 | Hive stored table: Query 1 | Hive stored table: Query 2 | Hive stored table: Queries 3, 4, 5 |
|---|---|---|---|---|---|---|
| 1 node | 13 | 25 | n/a | 12 | 23 | n/a |
| 3 node cluster | 48.98 | 80 | n/a | 40.2 | 61 | n/a |
| 5 node cluster | 38.3 | 57.9 | n/a | 31.7 | 43.5 | n/a |
Test results for execution time (the queries can be found in the Complementary Materials).
| | HBase and Phoenix: Query 1 | HBase and Phoenix: Query 2 | HBase and Phoenix: Query 3 | HBase and Phoenix: Query 4 | Cassandra and Presto: Query 1 | Cassandra and Presto: Query 2 | Cassandra and Presto: Query 3 | Cassandra and Presto: Query 4 |
|---|---|---|---|---|---|---|---|---|
| | 48.5 | 2.32 | 80.2 | 78.5 | 40.1 | 2 | 25 | 13 |
| | 27.9 | 1.79 | 58 | 50.5 | 34.3 | 1.85 | 20 | 10 |