Spark-Based Digital Factory Design

Big data processing often relies on parallelism, with computation running directly on top of distributed data storage. Existing big data workflows unify data processing practices and use the cloud's native computational capacity to offer advanced machine learning and business intelligence (BI) capabilities. Spark, an open-source, massively parallel, in-memory data processing framework, represents the current state of the art. Its primary approach is to break a job down into fine-grained tasks that can be executed in parallel. In the case study discussed here, IoT-cloud solutions convert plant data into an analyzable form so that downstream machine learning modules can produce added value. Cloud-based components are introduced to maximize the efficiency of data processing and accumulation. Based on the resulting data insights, appropriate operational actions can be taken. Cost and performance optimization methods are also discussed. Achieving a higher degree of digitalization increased control over production.

eISSN: 1338-3957
Language: English

Publication frequency: 4 times per year
Journal subjects: Computer Sciences, Information Technology, Databases and Data Mining, Engineering, Electrical Engineering


Published online: 12 Aug 2022

Pages: 19-26

Received: 04 Apr 2022

Accepted: 20 Jun 2022

DOI: https://doi.org/10.2478/aei-2022-0008

Keywords: Spark, big data, pipeline, cloud, ETL

© 2022 István Pölöskei, published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
