Comparison of Computer Vision and Convolutional Neural Networks for Vehicle Parking Control
Published online: 26 June 2025
Pages: 26–33
Received: 14 Nov 2024
Accepted: 4 Apr 2025
DOI: https://doi.org/10.14313/jamris-2025-011
© 2025 Jonathan Aguilar Alvarado et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
In Ecuador, major cities face severe problems due to increased vehicle traffic. The number of registered vehicles increased by 13.61% from 2013 to 2022 [1]. On average, a person may spend approximately 7.8 minutes searching for an available parking space [2]. The growing vehicle population leads to higher fuel consumption, the production of pollutants and greenhouse gases, and increased traffic congestion. In Chicago, it is estimated that parking-related congestion generates an additional 129,000 tons of CO2 per year. Furthermore, a comparative analysis of 16 studies across 11 cities concludes that finding a parking space takes an average of 8.1 minutes and can account for up to 30% of traffic congestion [3].
If this situation continues, it will affect the quality of life of the university community, leading to dissatisfaction when moving between university campuses, wasting considerable amounts of time, and contributing to environmental pollution [4]. This is where smart parking solutions come into play, aiming to optimize the efficient use of parking spaces through monitoring and diagnosing availability, demand, and usage patterns.
Smart solutions integrate the Internet of Things (IoT) [5], Big Data analysis and artificial intelligence (AI) to recommend parking spaces in real-time based on demand [6], the location of available parking spaces [7], the identification of customers who stay too long [8], the control of vehicles in private parking areas, the enabling of remote payments, and the detection of unauthorized entries [9].
This work aims to analyze the use of two artificial intelligence techniques (computer vision and convolutional neural networks) to classify parking spaces as free or occupied, and to evaluate these techniques based on their results using indicators such as precision and sensitivity.
Some solutions for parking control and monitoring integrate various advanced technologies. Computer vision, for instance, allows for the analysis of images and videos to detect free and occupied spaces in real time [10]. On the other hand, the Internet of Things (IoT), along with integrated sensors, facilitates constant monitoring of vehicle flow and space occupancy, providing accurate data for efficient management [11]. Additionally, the use of artificial neural networks (ANN) enables complex predictions and classifications, enhancing accuracy in detection and resource optimization [12]. Among relevant previous research, notable studies have explored the potential of these technologies to transform parking management:
A smart parking system was developed at the Universidad Politécnica Salesiana that employs Arduino Yun, Temboo, and ultrasonic sensors to provide real-time information on the availability of 12 spaces, accessible via Twitter and a web page. According to surveys, 50% of users prefer Twitter and the other 50% prefer the web; the system improves efficiency by reducing search time and congestion [13].
A prototype for parking space control was developed using LM393 proximity sensors, green and red LEDs, and small-scale vehicles, all connected to an Arduino and a WiFi module. After 100 tests, the system operated with a success rate of 92% [14].
Another study addresses the need for accurate detection of indoor parking spaces. The methodology employs wide-angle cameras and image processing (a modified Hough transform) on a dataset of 5,000 images captured in various scenarios. Using computer vision and line-detection algorithms, the system achieved 96% accuracy in detecting available spaces [15].
In [16], the authors proposed a distributed system of wireless cameras that, using Raspberry Pi modules, HOG filters, and SVM classifiers, evaluated 10 spaces per second with 90% accuracy.
A study [17] analyzes the challenges of parking in high-traffic areas. It employs magnetic sensors, ultrasound, and computer vision to detect space occupancy. The results suggest that combining convolutional neural networks and multi-agent systems effectively improves efficiency in open parking areas, achieving up to 96% accuracy in some cases of occupancy detection.
A study [18] addresses the problem of finding parking in congested urban areas. The methodology employs a multi-agent architecture and computer vision to identify real-time free spaces using surveillance cameras. With a dataset from multiple urban cameras, 95% accuracy was achieved in detecting vacant spaces, thus improving the driving experience and optimizing parking at the urban level.
Another work developed an automatic parking space detection system based on computer vision. The model monitors 14 parking spaces and achieves 99.5% accuracy in good visibility conditions, with a slight decrease in low-visibility or occlusion scenarios. This system, implemented in MATLAB, accurately identifies free and occupied spaces, notifying drivers in real time [19].
In [20], a parking space detection system was developed using the Parking Lot dataset (PLds), which includes images captured at Pittsburgh International Airport at a resolution of 1280x960 pixels. The applied methodology uses computer vision and multi-camera techniques to identify available spaces. The results show an average accuracy of 95% in vacancy detection, evaluated under varying lighting and weather conditions.
The PKLot dataset includes 695,899 images captured in two parking lots at the Federal University of Paraná and the Pontifical Catholic University of Paraná, in Brazil. The implemented system employs textures based on Local Binary Patterns (LBP) and Local Phase Quantization (LPQ), achieving a correct classification rate of up to 99.64% under controlled lighting conditions and 89% in more challenging scenarios [21].
One study [22] uses a dataset of 8,600 surround-view images captured in indoor and outdoor parking lots, labeled to identify marking points in parking spaces. The methodology employs a learning-based approach with the AdaBoost algorithm to detect marking points. The results show an accuracy of 98.87% and a recall rate of 92.38%, processing between 20 and 25 images per second.
A further study established real-time detection of occupied parking spaces using smart camera networks and convolutional neural networks (CNN). The research utilizes the PKLot dataset (700,000 images) and the CNRPark dataset (12,584 images) under varying lighting conditions and occlusions. The implemented technology achieved an accuracy of up to 99.6% on the PKLot dataset and 90.7% in multi-camera scenarios with the mAlexNet model [23].
This document is organized as follows: Section 2 presents the two selected techniques and the experiments conducted to determine the most suitable one. In Section 3, the results are presented and analyzed. Finally, Section 4 presents the conclusions of the experiment.
This research used computer vision techniques and convolutional neural networks for parking occupancy detection, using tools such as OpenCV in Python and the YOLO V5 model. The experimental phases shown in Figure 1 were carried out to conduct the research. The following sections detail the materials and methodology used for the analysis.

Research stages
In the computer vision analysis, OpenCV in Python was used. Gaussian blur filters and grayscale conversion were applied to reduce noise and improve contrast, allowing clear segmentation of the areas of interest (parking spaces). The evaluation was conducted by comparing the original regions of interest (ROI) with the processed image, calculating deviations and averages to determine whether a space was occupied or free.
For the neural-network-based model YOLO V5, training was conducted on Google Colab with GPU support. The parameters were set to a learning rate of 0.01, a batch size of 32, and 500 epochs. YAML files were used to configure the classes and training parameters. The LabelImg tool enabled manual image labeling, and Roboflow facilitated dataset preparation through image transformations and data splitting (80% for training and 20% for validation).
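The authors' configuration files are not published, but a YOLOv5 dataset YAML of the kind described would look roughly like this; the file name and paths are illustrative assumptions:

```yaml
# dataset.yaml -- illustrative sketch, not the authors' actual file
train: ../parking/images/train   # 80% split produced with Roboflow
val: ../parking/images/val       # 20% split
nc: 2                            # number of classes
names: ["free", "occupied"]      # class 0 = free, class 1 = occupied
```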
Materials used in research
Tool | Description |
---|---|
DAHUA IPC-HFW1430DT-STW | 4 MP, 2.8 mm fixed lens, 1/3” progressive CMOS sensor, H.265+ compression, 30M IR LED, DWDR, Day/Night mode (ICR), 3DNR, AWB, AGC, BLC, Mirror, IP67 outdoor protection, WiFi, MicroSD slot (256 GB) |
Google Colab | Cloud-based execution and training environment with GPU support. |
LabelImg | Open-source tool for manual image labeling |
Roboflow | Software for organizing, labeling, and transforming images. |
YAML | Text file format for model parameter configuration |
A total of 1,000 images taken in the parking spaces at the Universidad Técnica de Machala were used. The images were manually labeled using the LabelImg 1.8.0 tool. The labels were based on two classes: one for free parking spaces and the other for occupied spaces, as shown in Figure 2.

UTMACH parking
Once the images are labeled, a .txt file is generated for each one. Each file lists, per bounding box, the class number (0 for free, 1 for occupied), followed by the normalized center coordinates (x, y) and the width and height of the box containing the object, as shown in Figure 3.

Classified images
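For illustration, a label file in the standard YOLO format used by LabelImg contains one line per bounding box, giving the class number followed by the normalized center coordinates and box size; the numeric values below are invented, not taken from the actual dataset:

```
0 0.413 0.527 0.088 0.176
1 0.630 0.522 0.090 0.180
```

Here the first line marks a free space (class 0) and the second an occupied space (class 1).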
The Roboflow web tool was used to simplify the dataset preparation process. Roboflow randomly split the images into 80% for training and 20% for validation. Figure 4 shows the creation of the dataset.

Dataset creation
A YOLO (You Only Look Once) object detection architecture based on a convolutional neural network is used. YOLO is a single-stage CNN detector that predicts bounding boxes and class probabilities in one forward pass over the image. The network can achieve an execution speed of 45 frames per second (fps) on general-purpose computers [24].
Six activities are carried out to determine parking space status: 1. Selection of the regions of interest (ROI) from the parking image; 2. Gaussian blur; 3. Conversion of the RGB image to grayscale; 4. Evaluation of the original ROI against the converted image's ROI; 5. Calculation of the standard deviation and the average; 6. Comparison against a threshold to determine the occupied or free status.
Parking images are captured, and the regions of interest are selected and stored in a YAML file, as shown in Figure 5. The coordinates define rectangular zones. This activity is performed manually for each parking area where space recognition is needed.

ROI selection
The Gaussian blur stage is a low-pass filtering technique in which each pixel of the output image is a weighted sum of the corresponding pixel in the original image and its surrounding pixels.
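The filter weights are given by the standard two-dimensional Gaussian kernel, where the parameter $\sigma$ controls the amount of smoothing:

$$G(x, y) = \frac{1}{2\pi\sigma^{2}}\, e^{-\frac{x^{2} + y^{2}}{2\sigma^{2}}}$$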
Subsequently, the three-channel red, green, and blue (RGB) image is converted to grayscale (GRAY), reducing the amount of image information to process. Finally, the values of the original ROIs are evaluated against those of the output image, using a copy of the image for comparison. These steps allow the standard deviation and the average to be calculated: the system determines whether a parking space is occupied or vacant depending on whether these values are above or below a threshold.
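The paper does not publish its implementation; the following is a minimal OpenCV sketch of these steps, in which the ROI file name and format, the test image, and the threshold value are all illustrative assumptions:

```python
import cv2
import numpy as np
import yaml

# Load manually selected rectangular ROIs. The file name and the
# (x, y, w, h) format are assumptions; the authors' files are not published.
with open("rois.yaml") as f:
    rois = yaml.safe_load(f)["spaces"]

def space_is_occupied(frame, roi, threshold=22.0):
    """Blur and grayscale the ROI, then threshold its pixel statistics."""
    x, y, w, h = roi
    patch = frame[y:y + h, x:x + w]                   # 1. crop the ROI
    blurred = cv2.GaussianBlur(patch, (5, 5), 0)      # 2. low-pass filter
    gray = cv2.cvtColor(blurred, cv2.COLOR_BGR2GRAY)  # 3. color -> grayscale
    # 4-6. Empty asphalt is nearly uniform, so a low standard deviation
    # suggests a free space, while a parked vehicle adds texture. The paper
    # also evaluates the ROI average; the exact decision rule and the
    # threshold value used here are illustrative.
    return float(np.std(gray)) > threshold

frame = cv2.imread("parking_lot.jpg")                 # hypothetical test frame
if frame is None:
    raise SystemExit("test image not found")
for i, roi in enumerate(rois):
    state = "occupied" if space_is_occupied(frame, roi) else "free"
    print(f"space {i}: {state}")
```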
Table 2 shows the parameters used for training in the Google Colab environment. The pretrained YOLOv5 weights (yolov5x.pt) were used with a batch size of 8 and 500 epochs.
Training parameters
Parameter | Value |
---|---|
Dataset | Own |
Training images | 800 |
Validation images | 200 |
Learning rate | 0.001 |
Pre-trained weights | yolov5x.pt |
Number of epochs | 500 |
Batch size | 8 |
Image dimensions (height × width) | 640 × 640 |
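With such a dataset file in place, the parameters in Table 2 map onto the documented command-line entry point of the ultralytics/yolov5 repository; the dataset file name is assumed, and in YOLOv5 the learning rate is set through a hyperparameter YAML (e.g., lr0) rather than a CLI flag:

```
python train.py --img 640 --batch 8 --epochs 500 --data dataset.yaml --weights yolov5x.pt
```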
For the configuration of training parameters, we followed criteria established in previous research evaluating the performance of YOLO models: a learning rate of 0.01 is used, adjusted through a cosine annealing strategy, with a batch size of 32 and 500 epochs [25]. In [26], a learning rate of 0.01 is used, with 350 epochs, allocating 90% of the dataset for training and 10% for validation. In [27], 100 epochs are used, with a batch size of 16 and a learning rate of 0.01. In [28], 300 epochs are used with an autobatch function to determine the optimal batch size based on resource availability, allocating 85% for training, 10% for validation, and 5% for testing.
In the training process conducted in this research, the results shown in Table 3 were obtained; Figure 6 illustrates the detection of available and occupied spaces, with free spaces shown in green and occupied spaces in red. These values indicate good model performance in classifying space occupancy, comparable to previous studies. However, training performance could be further improved by adjusting parameters such as the number of epochs and the learning-rate strategy to the specific resources and requirements.

Real-time space control with the convolutional neural network
CNN metrics results
Metric | Value |
---|---|
Precision | 0.8755 |
Sensitivity | 0.8158 |
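Here precision and sensitivity (recall) follow their usual definitions in terms of true positives (TP), false positives (FP), and false negatives (FN):

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Sensitivity} = \frac{TP}{TP + FN}$$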
To contextualize the results, the sensitivity and precision metrics obtained in this research were compared with similar studies using computer vision to detect parking occupancy.
In [29], the PakSta model, a vision approach based on the Deformable DETR architecture, achieved 93.6% accuracy in identifying occupied spaces under controlled conditions. In [30], a computer vision system was developed to detect available parking spaces using an IP camera and processing in Python with OpenCV, achieving 96% accuracy and transmitting the data in real time to a web page. In [31], HD cameras and OpenCV in Python were used; the system detects free spaces with 94% accuracy and integrates a Telegram chatbot to notify users in real time. Another system uses a fisheye-lens camera and an embedded AI processor to classify spaces as occupied or free in real time, achieving a recognition rate of 94.48% in simulations and 80.36% accuracy in real tests.
The research used both pre-recorded videos and real-time captures with an HD camera. The results, detailed in Table 4, show a sensitivity of 79% and a precision of 80%, which is satisfactory for practical applications and demonstrates efficient detection of the occupancy status of parking spaces, as shown in Figure 7. Although these values are slightly lower than in some of the studies mentioned, our approach is adaptable to various capture conditions, making it more flexible and applicable to real-world scenarios than systems that require more controlled conditions.

Real-time space control with computer vision
Computer vision metrics results
Test | Precision | Sensitivity |
---|---|---|
1 | 1 | 1 |
2 | 0 | 0 |
3 | 1 | 1 |
4 | 1 | 0.94 |
5 | 1 | 1 |
Average | 0.80 | 0.79 |
The results obtained in this research are comparable to those of previous studies. The study in [29] achieved 93.6% accuracy in occupancy detection under controlled conditions. Another study [30] obtained 96% accuracy using OpenCV with a computer vision approach in outdoor environments. A comparison of precision and sensitivity across different studies is presented below:
Table 5 compares the parking space detection methods used in this study with others available in the literature. Convolutional neural network (CNN)-based models achieved the highest levels of precision and sensitivity: up to 99.68% balanced accuracy for YOLO with pixel-wise ROI [33], and 99.40% precision with 92.94% sensitivity for U-Net [38]. These results suggest that semantic segmentation models are particularly effective at correctly detecting parking spaces and reducing false positives. Traditional computer vision (CV) techniques, such as optical flow and the block matching algorithm, show more varied performance, with high sensitivity in some cases but less consistent precision.
Comparison of accuracy and sensitivity of different parking space detection methods
Research Study | Technique | Precision | Sensitivity |
---|---|---|---|
This Study | YOLO V5 (CNN) | 88.00% | 82.00% |
[32] | DeepLabV3+ | 77.26% | 79.55% (Dice) |
[33] | YOLO (CNN, pixel-wise ROI) | 99.68% (balanced accuracy) | 99.68% (balanced accuracy) |
[34] | ResNet50 + SVM VGG16 | 98.90% 93.40% | Not specified |
[35] | Semantic Segmentation (CNN) | 96.81% | 97.80% |
[36] | mAlexNet (CNN) | 90.34% | 98.98% |
[37] | YOLO V4 (CNN) | 93.00% | 98.00% |
[38] | U-Net (CNN) | 99.40% | 92.94% |
[39] | YOLOv7 + IoU (CNN) | 90.04% | 82.17% |
This Study | Image Segmentation (CV) | 80.00% | 79.00% |
[40] | Optical Flow (CV) | 98.80% | 94.40% |
[41] | HOG, LBP, SVM and Naive Bayes (CV) | 97.00% | 97.00% |
[42] | Binary Morphology and Logic (CV) | 76.75% | 99.00% |
[43] | Optical Flow (CV) | 97.90% | 62.40% |
[44] | Block Matching Algorithm (CV) | 93.00% | 46.00% |
[45] | Multi-clue recovery model (CV) | 93.21% | 96.84% |
In this study, YOLO V5 and image segmentation techniques were used, achieving accuracies of 88% and 80% and sensitivities of 82% and 79%, respectively. These results indicate that our CNN and CV-based techniques are competitive but still need improvement to reach the levels of accuracy and sensitivity observed in more advanced studies.
CNN-based approaches, such as YOLO V4 and U-Net, offer a more comprehensive solution for applications requiring high detection accuracy, especially in scenarios where false positives must be minimized. However, computer vision techniques remain useful in contexts with limited computational resources. In the future, it would be beneficial to explore combining CNN and CV techniques to leverage the strengths of both approaches and improve overall system performance.
In this study, two main approaches for parking space detection are compared: convolutional neural networks (CNN) and traditional computer vision (CV) techniques. The results indicate that CNN-based models, such as YOLO V5, offer higher accuracy and sensitivity than computer vision techniques, especially excelling in applications requiring high object detection accuracy while minimizing false positives.
Neural networks are effective in contexts where detection quality is a priority, while computer vision techniques show advantages in scenarios with limited computational resources. This suggests that, although neural networks are superior in performance, computer vision remains a viable alternative for environments where simplicity and low cost are determining factors.
The YOLO V5 model demonstrates high precision and sensitivity; however, it demands significant computational resources, achieving detection rates of 40–45 FPS with GPU acceleration. In contrast, traditional computer vision techniques exhibit lower performance but require fewer resources, typically operating at 20–30 FPS in CPU-only environments. Therefore, future studies should evaluate computational efficiency, specifically energy consumption and memory usage, to enable adaptations suited to scenarios with varying technical capacities and economic constraints.
Based on the results, it is recommended that more advanced CNN versions be explored to increase accuracy and reduce false positives. Additionally, integrating hybrid techniques that combine neural networks with computer vision could provide a more balanced solution, leveraging the strengths of each approach to improve system robustness and efficiency.
Future research should focus on expanding the dataset to include different environmental conditions, such as lighting variations and occlusions, to enhance the generalization capability of the proposed models. Additionally, it is suggested that evaluations be implemented in large-scale real-world environments, such as public or commercial parking lots, to validate the system’s performance under practical conditions and demonstrate its scalability and applicability in real-world scenarios.