Performance Optimization System for Hadoop and Spark Frameworks

Optimizing the processing of large-scale data sets depends on the technologies and methods used. The MapReduce model, implemented on Apache Hadoop or Spark, allows large data sets to be split into blocks distributed across several machines. Data compression reduces data size and the transfer time between disk and memory, but it requires additional processing. Finding an optimal tradeoff is therefore a challenge: a high compression factor may relieve the Input/Output subsystem but overload the processor. This paper presents a system that selects compression tools and tunes the compression factor to reach the best performance in Apache Hadoop and Spark infrastructures, based on simulation analyses.
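The CPU/IO tradeoff described above can be illustrated with a minimal sketch. It is not the system presented in the paper: it uses Python's standard `zlib` module as a stand-in for the codecs available in Hadoop and Spark, and the function name `measure_tradeoff` and the synthetic sample data are assumptions made for illustration only. Higher compression levels usually shrink the data further at the cost of more processor time, but the actual numbers depend on the data and the machine.

```python
import time
import zlib


def measure_tradeoff(data: bytes, levels=(1, 6, 9)):
    """Return a list of (level, compression_ratio, seconds) for each zlib level."""
    results = []
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Ratio > 1 means the compressed block is smaller than the input.
        ratio = len(data) / len(compressed)
        results.append((level, ratio, elapsed))
    return results


if __name__ == "__main__":
    # Synthetic, repetitive sample standing in for one data block (an assumption).
    sample = b"timestamp=2020-12-31,host=node01,status=OK\n" * 50_000
    for level, ratio, seconds in measure_tradeoff(sample):
        print(f"level={level} ratio={ratio:.1f}x time={seconds * 1000:.2f} ms")
```

Running the script prints one line per level, so the size reduction can be compared against the processing time spent to obtain it.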

Language: English

Publication frequency: 4 times per year
Journal subjects: Computer Science, Information Technology

Hrachya Astsatryan

Aram Kocharyan

Daniel Hagimont

Arthur Lalayan

Published: 31 Dec 2020

Pages: 5-17

Received: 06 Jul 2020

Accepted: 25 Sep 2020

DOI: https://doi.org/10.2478/cait-2020-0056

Keywords: Hadoop, Spark, data compression, CPU/IO tradeoff, performance optimization

© 2020 Hrachya Astsatryan et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
