Performance Optimization System for Hadoop and Spark Frameworks

Language: English

Frequency: 4 times per year
Journal subjects: Computer Science

Hrachya Astsatryan

Aram Kocharyan

Daniel Hagimont

Arthur Lalayan

Published online: 31 Dec 2020

Pages: 5 - 17

Received: 06 Jul 2020

Accepted: 25 Sep 2020

DOI: https://doi.org/10.2478/cait-2020-0056

Keywords: Hadoop, Spark, data compression, CPU/IO tradeoff, performance optimization

© 2020 Hrachya Astsatryan et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
