Open Access

Which Data Format To Store Scientific Data Should I Use? A Performance Analysis



A lot of scientific work is dedicated to the analysis of data. Most of the analyzed data, such as data from space missions, are structured. The choice of data format can affect several performance characteristics: the read/write speed of standard files, the read/write speed of small files, and the read/write speed of compressed data formats. In this paper, we analyze binary data formats, propose types of tests and testing methods, and compare the performance of binary formats with that of a human-readable text format. We also discuss the compressed and uncompressed modes available for data formats such as HDF5 and netCDF. When disregarding precision, the best data format from the size perspective is lossy HDF5 without compression. Lossless HDF5 without compression shows the best speed performance. Lossy HDF5 without compression is the best balance between size reduction and speed. However, for specific criteria and types of files, there might be better candidates, as detailed in the conclusion.

eISSN: 1338-3957
Language: English