An improved similarity matching model for the content-based image retrieval model
Article category: Research Article
Published online: 19 Jul 2025
Received: 30 Aug 2024
DOI: https://doi.org/10.2478/ijssis-2025-0031
© 2025 Manimegalai Asokaraj et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Recent technological advancements have led to a rise in the use of digital cameras, smartphones, and the Internet. The amount of shared and stored multimedia data is increasing, making it a challenging research problem to search for and retrieve pertinent images from an archive. Any image retrieval model's primary requirement is to find and sort images that have a visual and semantic relevance to the user's query. Most Internet search engines rely on text-based methods to obtain images, which necessitate the inclusion of captions. Digital images are used in various industries, including health care, design, education, and spatial imaging. Images can be stored and retrieved using a wide range of strategies and approaches; still, most search engines rely on metadata (keywords, descriptions, and tags) [1]. To process digital image data, the visual characteristics of the image must be extracted and represented in an organized manner. Content-based retrieval (CBR) is one of the widely used models for retrieving information from digital images. CBR expands on conventional information retrieval methods for digital image search by including information that can be found only by directly examining the image's pixel values [2,3].
The CBR method extracts image features automatically by deriving attributes including color, texture, and shape. Extracting features from an image is a tedious, multi-step process that includes phases such as feature extraction, pattern matching, query extraction, and query matching. The general CBR process is visualized in Figure 1. The CBR process can be thought of as a collection of modules that work together to obtain images and perform comparisons against the databases. Multi-dimensional feature vectors are used in standard content-based image retrieval systems to extract and describe the visual content of the images stored in the database [4]. A feature database is created from the feature vectors of the stored images. The number of accumulated images is growing rapidly due to the advent of technology, web users, and the availability of recent mobile technologies such as digital cameras and image scanners [5]. Users in a variety of fields, such as publishing, architecture, crime prevention, fashion, remote sensing, and health, need reliable software or applications to handle digital images generated from the Internet and smartphones. Numerous general-purpose image retrieval systems have been designed with this goal in mind. Some CBR systems have achieved considerable results in retrieving images from the database, and the learning model uses a similarity matching model to match features from query images [6]. Various models have contributed effectively to achieving reliable results in retrieving image features. This research study examines various facets of the images, such as shape, color, texture, and content, to comprehend the statistical reasons [7].
However, considerable research gaps remain: bridging the gap between low-level and high-level characteristics, performing pertinent pattern analysis, handling data modification, and retrieving images in the least computation time.
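As a concrete illustration of the feature extraction stage described above, the sketch below builds a simple color-histogram feature vector. The 8-bins-per-channel design and the pixel format (a list of (r, g, b) tuples with values 0-255) are illustrative assumptions, not the paper's actual feature scheme.

```python
# Hedged sketch: a minimal color-histogram feature extractor of the kind a
# CBR pipeline might use before storing vectors in the feature database.

def color_histogram(pixels, bins=8):
    """Return a normalized 3*bins-dimensional feature vector for an image."""
    hist = [0] * (3 * bins)
    width = 256 // bins  # intensity range covered by each bin
    for r, g, b in pixels:
        hist[r // width] += 1              # red channel bins
        hist[bins + g // width] += 1       # green channel bins
        hist[2 * bins + b // width] += 1   # blue channel bins
    n = len(pixels)
    return [h / n for h in hist]           # normalize by pixel count

# A tiny 2 x 2 "image": two reddish pixels, one green, one blue.
image = [(255, 0, 0), (250, 5, 3), (0, 255, 0), (0, 0, 255)]
vec = color_histogram(image)
print(len(vec))   # 24-dimensional feature vector
```

Each channel's histogram sums to 1 after normalization, so vectors from images of different sizes remain directly comparable.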

General workflow of CBIR.
Many CBR models are developed based on a similarity-matching approach to retrieve images from the database. Contemporary research incorporates various content-based image retrieval algorithms to build such systems. To improve the accuracy of the image retrieval process, a novel multiple-feature extraction from the query image was proposed [5]. A content-based image retrieval system was constructed using the color auto-correlogram feature, Gabor wavelet feature, and wavelet transform feature together. The improved precision is evident from the results, which show that multiple-feature extraction creates a path to retrieve optimal images from the database effectively. Due to computational overhead, systems utilizing various feature extraction techniques may see a drop in retrieval speed as accuracy increases. Enhancing the system's performance is crucial because it boosts the system's accuracy. CBR problems fall into two categories: semantic gap and computation time. The semantic gap refers to the difference between low-level image pixels and the high-level semantics understood by humans [8,9,10]. The other problem is computation time, the amount of time needed to analyze, index, and search through images [11]. In one approach, the color feature is extracted based on color string coding; one string is then compared with another, and their matching weights are returned. Because the process is automated using RPA technology, precision improves, which significantly increases accuracy when retrieving images appropriate to the given input image [29]. A recent survey reviews advancements in deep learning-based image retrieval, starting with an introduction to the CBIR problem and key datasets, and covering content-based deep image retrieval methods with a focus on network models, deep feature extraction, and retrieval types [39].
Finding appropriate methods for image categorization, prediction, and retrieval systems is a challenging task for researchers. This research aims to develop a dynamic CBR model for retrieving optimal images from databases based on query image features with low computation time. The objectives of the research study are as follows:
To investigate appropriate methodologies for query image processing and identify those with minimal computation time; to examine pertinent predictive and retrieval methodologies and recognize those with superior performance metrics; and to combine the identified classification, predictive, and retrieval methodologies into a highly efficient content-based retrieval (CBR) system.
In a vast database, it may be difficult to locate a specific image object due to the wide variety of image formats available on the Internet. Retrieving comparable images based on differences in the content of query images is a technique used in many domains, including digital libraries, crime prevention, fingerprint identification, biodiversity information systems, health care, and historical site research.
CBR is a unique method of image retrieval that deviates from keyword-based techniques by emphasizing the examination of visual characteristics present in images rather than depending only on predefined keywords. A method called CBR makes use of visual cues, including color, form, and texture, to help with the difficult task of identifying visual objects [12].
CBR is one of the many applications that fall under the broad category of computer vision. The process of retrieving images from a database containing many images is known as CBR [17]. One study investigates the real-world implementation of a two-phase method for content-based image retrieval, in which convolutional neural networks (CNNs) are used for image detection in the first stage [14,30]. A cloud classification system distinguishes three levels of clouds: high, intermediate, and low. K-means clustering methods and CBR are used in the cloud categorization process, and the kind of cloud has a significant impact on how much precipitation falls. There is a lack of established research, and some inconsistency, regarding the impact of high resolutions on search precision and result arrangement. Another study's [15] main goal is to investigate how image resolution affects search accuracy and result sorting. Resizing images before uploading them to the image database is highly advised, provided the images are not adversely affected by resizing.
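The k-means grouping into three cloud levels mentioned above can be sketched as follows. This is a bare-bones 1-D k-means on an assumed per-image brightness feature with fixed initial centroids; real systems cluster multi-dimensional CBR feature vectors.

```python
# Hedged sketch: grouping images into three "cloud levels" (low, medium,
# high) with a minimal k-means. The scalar brightness feature and the
# initial centroids are simplifying assumptions for illustration.

def kmeans_1d(values, centroids, iters=10):
    for _ in range(iters):
        # assignment step: attach each value to its nearest centroid
        clusters = [[] for _ in centroids]
        for v in values:
            idx = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[idx].append(v)
        # update step: move each centroid to its cluster mean
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

brightness = [0.1, 0.15, 0.5, 0.55, 0.9, 0.95]   # per-image mean brightness
cents, groups = kmeans_1d(brightness, centroids=[0.0, 0.5, 1.0])
print(groups)  # three clusters: low, medium, and high cloud levels
```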
To improve image retrieval and identification through content-based image recognition, Das et al. presented a technique for feature extraction through image binarization. The authors tested their technique on a total of 3,688 images from two public datasets. The procedure reduced the feature size to 12, irrespective of the image dimensions. For evaluation, statistical measures based on recall and precision were used. Misclassification of query images is one drawback of this methodology, which could impact retrieval performance compared with other available techniques [16].
Jianbo Ouyang et al. [31] proposed a re-ranking method that refines top-K retrieval results by encoding each image into an affinity feature vector based on comparisons with anchor images. These features are refined using a transformer encoder to incorporate contextual information and then used to recalculate similarity scores for re-ranking. Their approach, enhanced with a novel data augmentation scheme, is robust and compatible with results from various retrieval algorithms. Ashraf et al. [13] proposed a unique feature extraction model for CBR, which reliably retrieves object information contained in an image using the bandelet transform technique for image representation and feature extraction. Three public datasets (Coil, Corel, and Caltech 101) were utilized to evaluate the system's performance in image retrieval using artificial neural networks. Retrieval efficiency was evaluated using precision and recall values.
The contrastive similarity matching (CSM) algorithm applies this concept through a contrastive function created by gently nudging the output neurons of the multilayer similarity-matching objective function. Consequently, the hidden layers learn intermediate representations between their earlier and later levels.
This model was inspired by deep learning techniques such as single feedforward networks and networks with feedback connections. The objective function in CSM is governed by a hyperparameter balancing anti-Hebbian and Hebbian plasticity, which manages neuron weights for retrieving features from the query images [18].
In equilibrium propagation (EP), weight updates for feature selection are defined by a gradient proportional to the error signal, which aids in comparing the parameter tuning of a local approximation phase. The approximation phase is primarily characterized by the nudged equilibrium phase (NP) and the clamped phase. The learning process can be conceptualized as minimizing a contrastive objective function in the approximation phase. EP's other phase is the reconfiguring phase, which updates the landscape matrix score to an energy score, helping remove spurious rigid features and increasing the stability of the fixed points of the weights. However, although the model performed well in selecting features, it failed in matching image content in the presence of large local features [19].
DCM offers a straightforward method of obtaining a representation suitable for geometric verification from CNN activations for re-ranking. Ideally, it would estimate the geometric transformation needed to align and compare the activation tensors; however, as mentioned earlier, this is not feasible. DCM instead uses two characteristics of the activations (that high values are more significant and that the activations are sparse) to simulate this process. Therefore, a limited number of extremal regions can accurately mimic each channel [20,37].
Girgis et al. [21] suggested a GA-based IR technique that modifies a query's keyword weights to produce an optimal or nearly optimal query vector. Every query in this method is represented by a chromosome. The method generates a new population by performing the genetic operations of selection, crossover, and mutation on the present population. Each member of the new population is then subjected to a local search to obtain optimal solutions. This process is repeated until an optimal query chromosome for document retrieval is achieved. Fitness functions evaluate the quality of a candidate solution and direct the evolution of the population.
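The selection/crossover/mutation loop just described can be sketched as below. The dot-product-style fitness and the fixed "ideal" weight vector are invented stand-ins for illustration; Girgis et al. derive fitness from retrieval quality, not from a known target.

```python
# Hedged sketch of a GA over query keyword weights: each chromosome is a
# weight vector, evolved toward higher fitness. IDEAL is an assumed target.
import random

random.seed(0)
IDEAL = [0.9, 0.1, 0.5, 0.7]   # stand-in for the unknown optimal query weights

def fitness(chrom):
    # higher (closer to 0) when the chromosome is nearer the ideal vector
    return -sum((c - t) ** 2 for c, t in zip(chrom, IDEAL))

def evolve(pop, generations=200, mut_rate=0.2):
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: len(pop) // 2]          # selection: keep fittest half
        children = []
        while len(children) < len(pop) - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(a))   # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < mut_rate:      # mutation: jitter one weight
                i = random.randrange(len(child))
                child[i] = min(1.0, max(0.0, child[i] + random.uniform(-0.1, 0.1)))
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

population = [[random.random() for _ in range(4)] for _ in range(20)]
best = evolve(population)
print(fitness(best))   # near 0: the evolved query approaches the ideal
```

Because the fittest half always survives, the best fitness never decreases across generations (simple elitism).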
A technique for image retrieval utilizing statistical tests, such as Welch's t-test, was also proposed (IRT) [22].
Similarity between two or more images is defined using similarity metrics. The degree to which the content of the retrieved images resembles the user query determines their ranking. Raj and Mohanasundaram [25] converted image features into vectors using the vector space model, with the weights of the terms in the vector calculated using the TF-IDF measure. To determine how similar the image features are, the authors used cosine and Jaccard similarity measures. Sutojo et al. [26] tested various similarity measures for text categorization and grouping, and for image classification they tested three similarity metrics, including cosine and Euclidean distance. They found the performance of their similarity measure comparable to that of the conventional model. Similarity matching in CBR refers to the process of finding images in a database that are similar to a query image based on their content features. CBR systems analyze the visual content of images to perform retrieval tasks rather than relying on metadata or annotations. Similarity matching in CBR typically involves calculating the similarity between feature vectors representing images, as expressed in Eq. (1).
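As one plausible concrete form of the vector similarity described above, the sketch below computes cosine similarity between two feature vectors. The example vectors are illustrative assumptions, not values from the paper.

```python
# Hedged sketch: cosine similarity between a query feature vector q and a
# database feature vector d, one common choice of similarity metric in CBR.
import math

def cosine_similarity(q, d):
    """Cosine of the angle between vectors q and d (1.0 = identical direction)."""
    dot = sum(a * b for a, b in zip(q, d))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d))
    return dot / norm if norm else 0.0

query = [0.2, 0.5, 0.1]
print(cosine_similarity(query, [0.2, 0.5, 0.1]))  # identical vectors: ~1.0
print(cosine_similarity(query, [0.5, 0.1, 0.9]))  # dissimilar: well below 1
```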
Similarity matching is an important step in query processing because the feature generator vector consumes much time in processing feature comparisons, which degrades the computation time of query processing. Once the features are extracted, the query image is compared against the database's collection of images. In this study, the Euclidean distance is used to determine how similar two images are to one another; Eq. (2) defines the Euclidean distance formula. This research study develops the optimized hybrid ensemble model (OHEM), described in Algorithm 1, by considering the limitations of standard deep learning's heavy architecture. It combines heterogeneous retrieval techniques with an ensemble architecture of several deep learning models, resulting in increased efficiency. OHEM reduces the computation time of image retrieval through its objective function.
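The Euclidean-distance ranking of Eq. (2) can be sketched as follows: the database image whose feature vector lies closest to the query's is ranked first. The toy feature vectors and image names are assumptions for illustration.

```python
# Hedged sketch of Eq. (2): rank database images by Euclidean distance
# between their feature vectors and the query's feature vector.
import math

def euclidean(q, d):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, d)))

database = {
    "img_a": [0.9, 0.1, 0.4],
    "img_b": [0.2, 0.5, 0.1],
    "img_c": [0.3, 0.4, 0.3],
}
query = [0.25, 0.45, 0.15]
ranked = sorted(database, key=lambda name: euclidean(query, database[name]))
print(ranked)  # most similar image first
```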
The objective function evolves from Eqs (4) and (5), which operate on pixel values. [Algorithm 1 (OHEM similarity matching) is garbled in this copy; the recoverable steps are: draw a 4 × 4 matrix of the first feature vectors, compute the pairwise similarity between the query image and the database images, increment a match count, and return the result.]
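A hedged reconstruction of the gist of the Algorithm 1 steps (4 × 4 feature matrix, pairwise similarity, match count) is sketched below. The tolerance threshold and the count-based scoring are assumptions; the published listing does not survive in this copy.

```python
# Hedged sketch: count pairwise feature-vector matches between a query's
# 4 x 4 feature matrix and a database image's 4 x 4 feature matrix.

def match_score(query_feats, db_feats, tol=0.1):
    """Count rows whose components all agree within tolerance tol."""
    count = 0
    for q_vec, d_vec in zip(query_feats, db_feats):
        # a row "matches" if every component differs by at most tol
        if all(abs(a - b) <= tol for a, b in zip(q_vec, d_vec)):
            count = count + 1
    return count

query = [[0.1, 0.2, 0.3, 0.4]] * 4       # 4 x 4 feature matrix
close = [[0.12, 0.21, 0.33, 0.38]] * 4   # features near the query's
far   = [[0.9, 0.8, 0.7, 0.6]] * 4       # features far from the query's
print(match_score(query, close))   # all 4 rows match
print(match_score(query, far))     # no rows match
```

Database images can then be ranked by descending match score, with the highest-scoring image returned first.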
The studies were conducted using the widely recognized ROxford and RParis datasets in the field of image retrieval; the data are freely accessible.

Sample query image.
The precision, recall, and F-measure metrics were used to evaluate retrieval performance.
Recall serves as a measurement of the system's robustness in retrieval. Recall is the proportion of relevant images successfully retrieved out of all the relevant images stored in the database:

Recall = (relevant images retrieved) / (total relevant images in the database)
The calculation of the F-measure provides a unified performance measure: it is the harmonic mean of precision and recall:

F-measure = 2 × (Precision × Recall) / (Precision + Recall)
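These three metrics can be computed as below; the retrieved and relevant sets are invented example data, while the formulas are the standard definitions.

```python
# Hedged sketch: precision, recall, and F-measure over sets of image IDs.

def precision_recall_f(retrieved, relevant):
    tp = len(retrieved & relevant)                    # relevant images retrieved
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)              # harmonic mean
    return precision, recall, f

retrieved = {"img1", "img2", "img3", "img4"}   # what the system returned
relevant = {"img2", "img3", "img5"}            # ground-truth relevant images
p, r, f = precision_recall_f(retrieved, relevant)
print(p, r, f)   # 0.5 precision, ~0.667 recall, ~0.571 F-measure
```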
Comparison of precision, recall, and F-measure (%) on the ROxford and RParis datasets

Model | ROxford Precision | ROxford Recall | ROxford F-measure | RParis Precision | RParis Recall | RParis F-measure
CSM [18] | 83.29 | 71 | 73.58 | 84.28 | 71.93 | 78.83
EP [19] | 83.12 | 70.1 | 71.45 | 84.56 | 72.09 | 81.23
DCM [20] | 79.12 | 65.1 | 61.45 | 80.09 | 69.01 | 75.41
GA-based IR [21] | 84.16 | 66.6 | 78.56 | 85.56 | 72 | 74.51
IRT [22] | 80.14 | 69.5 | 71.26 | 78.41 | 68.47 | 73.25
OHEM | 85.56 | 73.3 | 81.45 | 86.25 | 75.59 | 87.89
EP, equilibrium propagation; OHEM, optimized hybrid ensemble model.

Result comparisons with different algorithms on the ROxford dataset. EP, equilibrium propagation; OHEM, optimized hybrid ensemble model.

Result comparisons with different algorithms on the RParis dataset. EP, equilibrium propagation; OHEM, optimized hybrid ensemble model.
The computation time to retrieve an image from the database is measured in seconds. Each dataset contains a large number of images with unique features, and OHEM performed well in retrieving optimal images from the database. Computational complexity was measured on our local system, configured with a 2.5 GHz processor, 8 GB of RAM, and a 64-bit Windows OS. To streamline the process, we divided the images into five categories, each containing an equal number of images. Each group of images is retrieved, the time taken is measured, the average time is computed, and the results are visualized. The computational analysis reveals that OHEM takes less time to retrieve images from the database, as plotted in Figure 5.

Comparison of computation time of different models on ROxford and RParis datasets. EP, equilibrium propagation; OHEM, optimized hybrid ensemble model.
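The per-category timing procedure described above can be sketched as follows. The dummy retrieve() function and category names stand in for the paper's actual retrieval pipeline and image categories; time.perf_counter provides a monotonic wall-clock timer in seconds.

```python
# Hedged sketch: time the retrieval step for each of five image categories
# and report the average retrieval time in seconds.
import time

def retrieve(category):
    # placeholder for the real similarity-matching retrieval step
    return sorted(range(1000))

categories = ["cat1", "cat2", "cat3", "cat4", "cat5"]
avg_times = {}
for cat in categories:
    start = time.perf_counter()
    retrieve(cat)
    avg_times[cat] = time.perf_counter() - start   # seconds for this category

overall_avg = sum(avg_times.values()) / len(avg_times)
print(f"average retrieval time: {overall_avg:.6f} s")
```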
This research used the optimized hybrid ensemble model (OHEM) to demonstrate considerable advancements in content-based image retrieval from databases, predicated on similarity matching principles. OHEM addresses the essential challenges associated with managing large-scale images within the database, as well as the computation time required to retrieve analogous images. A considerable component of OHEM is the extraction of features through a hybrid ensemble model for coherent image retrieval. Using two distinct databases (ROxford and RParis) encompassing a variety of image types, the proposed research evaluated five algorithms (CSM, EP, IRT, GA-based IR, and DCM). Similar images can be accurately and quickly retrieved by entering a query image. Additionally, the study analyzes the precision and recall of each algorithm on both databases and concludes that the OHEM algorithm has an accuracy of 85%, which is 3% greater than that of CSM, EP, IRT, GA-based IR, and DCM. Future work will incorporate more low-level image features, such as spatial position and shape, to strengthen the system. Semantically based image retrieval and image feature matching are the other two essential parts of the OHEM system.