Open access

Performance Optimization System for Hadoop and Spark Frameworks

Cybernetics and Information Technologies
Special Issue on New Developments in Scalable Computing
About this article

Optimizing the processing of large-scale data sets depends on the technologies and methods used. The MapReduce model, implemented in Apache Hadoop or Spark, splits large data sets into blocks distributed across several machines. Data compression reduces data size and the transfer time between disks and memory, but it requires additional processing. Finding the optimal trade-off is therefore challenging: a high compression factor may lighten the Input/Output load but overload the processor. This paper presents a system, based on simulation analyses, for selecting compression tools and tuning the compression factor to achieve the best performance in Apache Hadoop and Spark infrastructures.
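The abstract does not describe the system's internals, so the following is only a minimal sketch of the trade-off it targets, not the authors' method. It uses Python's standard `zlib` module as a stand-in for Hadoop and Spark codecs, measures compressed size and CPU time at each compression level, and then picks the level with the lowest estimated cost. The cost model (CPU time plus compressed size divided by an assumed bandwidth), the helper names `measure_levels` and `pick_level`, and the bandwidth figures are hypothetical assumptions.

```python
import time
import zlib


def measure_levels(data: bytes, levels=range(1, 10)):
    """Compress `data` at each zlib level and record (level, size, CPU seconds)."""
    results = []
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Sanity check: compression must be lossless.
        assert zlib.decompress(compressed) == data
        results.append((level, len(compressed), elapsed))
    return results


def pick_level(results, bandwidth_bytes_per_s):
    """Return the level that minimizes estimated cost = CPU time + I/O time.

    Hypothetical cost model: I/O time is the compressed size divided by an
    assumed disk/network bandwidth; decompression time is ignored.
    """
    def cost(entry):
        _level, size, cpu_seconds = entry
        return cpu_seconds + size / bandwidth_bytes_per_s

    return min(results, key=cost)[0]


if __name__ == "__main__":
    # Hypothetical CSV-like sample data, highly compressible.
    sample = b"timestamp,sensor,value\n" + b"".join(
        f"{i},s{i % 7},{i * 0.5}\n".encode() for i in range(50_000)
    )
    results = measure_levels(sample)
    for level, size, seconds in results:
        print(f"level={level} size={size} bytes time={seconds * 1000:.2f} ms")
    # Assumed bandwidths: a slow link versus a fast local disk.
    print("best level at 10 MB/s:", pick_level(results, 10e6))
    print("best level at 1 GB/s:", pick_level(results, 1e9))
```

With a slow link, the extra CPU time of a higher level tends to pay for itself in reduced I/O; with fast storage, a cheaper level usually wins, which is the balance the abstract describes. In real deployments the codec is chosen through configuration, for example `spark.io.compression.codec` in Spark, or `mapreduce.map.output.compress` and `mapreduce.map.output.compress.codec` for intermediate map output in Hadoop MapReduce.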

eISSN: 1314-4081
Language: English
Publication frequency: 4 times a year
Journal subjects: Computer Sciences, Information Technology