Volume 29 (2019): Issue 1 (March 2019)
Exploring Complex and Big Data (special section, pp. 7-91), Johann Gamper, Robert Wrembel (Eds.)
Journal Details
Format: Journal
eISSN: 2083-8492
First Published: 05 Apr 2007
Publication timeframe: 4 times per year
Languages: English
Open Access

Parallelizing user-defined functions in the ETL workflow using orchestration style sheets

Published Online: 29 Mar 2019
Volume & Issue: Volume 29 (2019) - Issue 1 (March 2019) - Exploring Complex and Big Data (special section, pp. 7-91), Johann Gamper, Robert Wrembel (Eds.)
Page range: 69 - 79
Received: 04 Mar 2018
Accepted: 10 Dec 2018

