Open access

Repairing ETL Processes using Extended Relational Algebra

June 10, 2025


Language: English
Frequency: 4 times per year
Journal subjects: Computer Science, Artificial Intelligence, Software Development