Automatic Generation of Regular Expressions for Extracting Attribute Values of Medical Products


Companies operating in the medical services market hold data from a very large number of transactional documents, which allows them to collect and process, among other things, information about medical products. Such information is more valuable when it is organized. This paper considers how to support the process of organizing it by extracting the values of medical product attributes from the brief descriptions found in transactional documents. The extracted values help to build a structured product specification and make it possible to compare products. For this purpose, an approach based on regular expressions generated with a genetic algorithm is proposed. The results presented in the paper indicate that the method has considerable potential.
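To make the idea concrete, the following minimal Python sketch shows how a candidate regular expression could be scored against labeled product descriptions, which is the kind of fitness evaluation a genetic algorithm would rely on. It is an illustration only, not the implementation used in the paper: the example descriptions, the attribute (volume in ml), the candidate patterns, and the names `extract` and `fitness` are all assumptions, and the evolutionary steps (selection, crossover, mutation) are left out.

```python
import re

# Hypothetical labeled examples: (brief description, expected attribute value).
EXAMPLES = [
    ("Syringe 10 ml sterile", "10"),
    ("Syringe luer 2.5ml 100 pcs", "2.5"),
    ("Infusion bag 500 ML", "500"),
]


def extract(pattern, description):
    """Return the first capture group of the first match, or None."""
    try:
        match = re.search(pattern, description, flags=re.IGNORECASE)
    except re.error:
        # A generated candidate may not be a valid regular expression.
        return None
    if match is None or match.re.groups < 1:
        return None
    return match.group(1)


def fitness(pattern, examples=EXAMPLES):
    """Fraction of examples for which the pattern extracts the expected value."""
    hits = sum(extract(pattern, text) == expected for text, expected in examples)
    return hits / len(examples)


# Hypothetical candidates, standing in for one generation of a population.
CANDIDATES = [r"(\d+)", r"(\d+)\s*ml", r"(\d+(?:\.\d+)?)\s*ml"]
best = max(CANDIDATES, key=fitness)
```

In this sketch a genetic algorithm would repeatedly modify and recombine the candidates and keep those with the highest `fitness`. Invalid patterns simply extract nothing, so they are scored on the same scale as valid ones.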

