Principles of Synthesizing Medical Datasets

Data in many application domains are a valuable source for analysis and data-driven decision support. At the same time, legislation restricts their use, especially for personal data and patient data in the medical domain. To maximize the use of data for decision-making while complying with legislation, sensitive data need to be properly anonymized or synthesized. This article contributes to the area of medical record synthesis. We first introduce the topic and place it in a broader context, covering the methods used and the metrics for their evaluation. Based on an analysis of related work, we selected the CTGAN neural network model for data synthesis and experimentally validated it on three different medical datasets. The results were evaluated both quantitatively, using selected metrics, and qualitatively, using suitable visualization techniques. The results showed that in most cases the synthesized dataset is a very good approximation of the original one, with similar prediction performance.

eISSN: 1338-3957
Language: English

Publication frequency: 4 times per year
Journal subjects: Computer Sciences, Information Technology, Databases and Data Mining, Engineering, Electrical Engineering

Published: 24 Jan 2023

Page range: 25 - 29

Received: 03 Aug 2022

Accepted: 21 Oct 2022

DOI: https://doi.org/10.2478/aei-2022-0019

Keywords: data synthetization, GAN, CTGAN

© 2022 Michal Kolárik et al., published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
