Research and Implementation of Forest Fire Detection Algorithm Improvement

Introduction

Forests are among the most important ecosystems on Earth. They sustain species diversity and support the survival and reproduction of a wide variety of plants, animals, and insects, and they play an irreplaceable role in the balance of the Earth's ecosystem, climate regulation, resource protection, and economic development. If humans cherish and protect forests and use forest resources sustainably, forests will repay humanity with long-term ecological and economic value. Fire in the forest, whether of human or natural origin, can spread uncontrollably and gradually develop into a disaster. It not only causes immeasurable, permanent damage to the resources and property within the forest, but also poses a huge threat to the lives of people and other living beings in the surrounding area.

The type of fire determines the firefighting strategy. Urban building fires focus on controlling the blaze and rescuing trapped people. Because forest fires spread quickly, cover wide areas, and create a huge gap between firefighting resources and firefighting demand, firefighting in forest scenes focuses on early detection and prevention. Rapid detection of flame signs is therefore an important measure both for preventing forest fires and for responding to fires that have already broken out. However, fire scenes are varied and their internal conditions are complex, so relying solely on rescue personnel to screen fire scenes under harsh conditions involves many uncertain and limiting factors. Camera equipment carried by individual firefighters, or fire-scene information obtained through other channels, can instead be transmitted back to the fire command headquarters for processing, allowing a more comprehensive and in-depth understanding of the fire situation. Fire-scene information perception and interaction are the foundation and premise of firefighting and directly determine the depth and breadth of digital applications at the fire scene. Without knowledge of fire-scene information, the fire command department loses the ability to coordinate and plan firefighting operations from a high level when a large fire occurs. This paper therefore applies digitization to forest fire scenes, using computers to analyze and process forest fire images and thereby reduce the manual analysis workload in firefighting. This approach enables timely and efficient detection of fires in their early stages for prompt alarm and response, satisfies the need to quickly locate fires and launch targeted firefighting actions as a forest fire develops, and enhances the external perception of conditions inside the fire scene. Based on the situation at the fire scene, it also facilitates task-driven scheduling assistance for the cooperation of individual firefighters, achieving organized, efficient, and rational firefighting and disaster relief operations.

Related work

Fire, whether started and spread by humans or by nature, can become a disaster in forests [1]. It not only causes incalculable permanent damage to the resources and property within the forest, but also poses a huge threat to the safety of humans and other living beings in the surrounding areas. This article applies digitalization to forest fire scenes, using computers to analyze and process forest fire images, reducing the workload of manual analysis in firefighting, enabling timely and efficient detection in the early stages of a fire [2], providing timely alarms and responses, strengthening the external perception of conditions inside the fire scene, and meeting the requirements for orderly, efficient, and reasonable command of firefighting and disaster relief work.

Traditional forest fire detection relies on the image processing techniques of classical computer vision to analyze the features of flame targets in images, including edge feature extraction [3], texture instability and similarity analysis [4], foreground features [5], background modeling [6], and flame color analysis [7]. These classic algorithms detect well in their intended settings, but they suffer from poor generalization ability [8] and slow detection speed [9]. Based on the characteristics of different fire scenarios, researchers have chosen different classic network models and developed various improvements for them. Reference [10] improves the GMM algorithm by fusing texture and similarity feature information of different colors in the image, but its ability to learn nonlinear changing features is limited. Reference [11] combines depthwise separable convolution with CBAM to form a depthwise separable attention module and builds a new semantic segmentation network, while the multi-scale fusion improvement of FRCNN [12] slows down model processing. Building deep neural networks to learn data representations and extract features has gradually become the dominant trend in fire detection research, and YOLO has become one of the best-performing object detection algorithms available. Improvement options include YOLOv3 combined with CAM [13], optimizing the prior boxes in the detection output module with an improved K-means algorithm [14], or adding a deformable convolution module [15]; however, the stability of model accuracy is strongly affected by the environment. On the basis of the YOLOv4 network, methods such as color enhancement [16] and the introduction of an attention mechanism and residual structure [17] have been proposed, but they cannot reduce false positives in certain situations. In YOLOv5, a weighted bidirectional feature pyramid network [18] was introduced in the Neck module to replace the original path aggregation network, transfer learning [19] was adopted to train the model, the Focal Loss function [20] was introduced, and the SPP structure was replaced by the better-performing SPPF structure [21]; nevertheless, the processing speed of the model still cannot meet the timeliness requirements of forest fire detection.

In response to the above issues, this article proposes an improved fire detection model based on YOLOv5. The model introduces dsCBAM, an attention module in which ordinary convolutions are replaced with depthwise separable convolutions, into the backbone network responsible for feature extraction in the YOLOv5 algorithm. This improves the inference speed of the model and significantly accelerates its convergence, while also giving the model advantages in both the representation of regions of interest and detection robustness in diverse environments.

Design of fire detection model
Datasets

Since forest fire images cannot easily be collected or reproduced through experiments, the scene image data consist of public forest fire images crawled from the Internet, supplemented by publicly available datasets collected and simulated by enterprises and related research institutes. To reflect the diversity of real fires, the images cover different scenes, weather conditions, and fire intensities. Because the images come from different acquisition channels and shooting conditions, they present several recognition difficulties, as shown in Figure 1, such as poor resolution, small fire regions, flame occlusion, and smoke interference. The dataset used in this article was screened manually to filter out images from extreme cases that are unsuitable for training, and the flame areas in the remaining images were then labeled with the labelimg software in the VOC dataset format.

Figure 1.

Part of the image data used for training: (a) Fire with poor resolution. (b) Fire in a small area. (c) Fire with flame obstruction. (d) Fire disturbed by smoke.

YOLOv5s

YOLO is a classic target detection algorithm known for efficient real-time detection. In fire detection tasks with high timeliness requirements, the YOLO algorithm can meet the need for rapid identification. The most mature version at present is YOLOv5, which is divided into the s, m, l, and x variants according to model complexity. This article conducts its research on the lightest variant, YOLOv5s, version 6.0.

In the data preprocessing stage, YOLOv5s uses the Mosaic method. As shown in Figure 2, it applies random scaling, random cropping, and random arrangement to splice four input images together, which enhances the detection accuracy and discrimination ability learned from the processed data. Unlike version 4, which used a single CSP structure in the backbone, version 5 applies two slightly different CSP structures: the backbone adopts CSP1_X and the neck adopts CSP2_X. The backbone of YOLOv5s is built by stacking CBL, C3, and SPPF modules, whose outputs feed the Neck part; this helps YOLOv5 handle targets of different sizes, improves feature fusion, and improves detection performance. Finally, the head network of YOLOv5 computes the bounding box and class probability of the target. It consists of multiple convolutional layers and uses threshold filtering and NMS to obtain the final detection result and output the target prediction. Compared with earlier YOLO versions, YOLOv5 adopts a more lightweight convolution structure, which reduces the amount of computation while maintaining good accuracy.

Figure 2.

YOLOv5 input data enhancement method
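As a concrete illustration of the Mosaic splicing described above, the following is a minimal, simplified sketch and not YOLOv5's actual implementation: it places four images around a random centre on a single canvas, while the real method also remaps bounding-box labels and applies random scaling and cropping per image. The function name, the grey padding value, and the cv2 dependency are assumptions made for this example.

import random
import numpy as np
import cv2  # used here only for resizing

def simple_mosaic(images, out_size=640):
    """Splice four images into one mosaic canvas (simplified sketch)."""
    assert len(images) == 4
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)  # grey padding
    # random mosaic centre, kept away from the borders
    cx = random.randint(out_size // 4, 3 * out_size // 4)
    cy = random.randint(out_size // 4, 3 * out_size // 4)
    regions = [  # (x0, y0, x1, y1) for the four quadrants
        (0, 0, cx, cy), (cx, 0, out_size, cy),
        (0, cy, cx, out_size), (cx, cy, out_size, out_size),
    ]
    for img, (x0, y0, x1, y1) in zip(images, regions):
        w, h = x1 - x0, y1 - y0
        canvas[y0:y1, x0:x1] = cv2.resize(img, (w, h))  # random crop/scale reduced to a resize
    return canvas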

VariFocal Loss

A loss function measures, to a certain extent, the disparity between the label predicted by the neural network and the expected true label, and a good loss function has a positive impact on the training process and final results of the network. YOLOv5 uses the CIoU (Complete Intersection over Union) loss to optimize the target detection task; CIoU takes into account the degree of overlap between the predicted and ground-truth bounding boxes and optimizes target localization more accurately. In practice, the targets to be detected in the training images, that is, the positive samples, usually occupy only a small part of each image. This is especially true for forest fire images containing many small flame targets, where most of the area is background and constitutes the negative samples. As a result, negative samples greatly outnumber positive samples in the training data, which degrades the training of the model. Background negative samples are generally easy to separate, while target positive samples are hard to separate. As shown in (1), to address this imbalance Focal Loss adds a weight factor α to hard and easy samples, increasing the weight of hard samples and reducing the weight of easy samples, thereby controlling the excessive gap between positive and negative samples. In (1), α is the balancing weight, (1 − p)^γ is a modulating factor, and γ is an adjustable focusing parameter. Focal Loss is therefore well suited to image data with dense targets and performs well on datasets characterized by small, crowded, and occluded objects.

$$FL(p,y)=\begin{cases}-\alpha(1-p)^{\gamma}\log(p), & y=1\\[2pt] -(1-p)^{\gamma}\log(1-p), & \text{otherwise}\end{cases}\tag{1}$$
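The following is a minimal PyTorch sketch of the binary loss in (1), written for illustration only; the tensor names and the default values α = 0.25 and γ = 2 are assumptions, not values taken from the paper. Note that the original Focal Loss formulation modulates the negative branch with p^γ rather than (1 − p)^γ; the sketch follows (1) as printed.

import torch

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary Focal Loss as written in Eq. (1).

    p : predicted probability of the positive class, shape (N,)
    y : ground-truth labels in {0, 1}, shape (N,)
    """
    p = p.clamp(eps, 1.0 - eps)                            # numerical stability
    pos = -alpha * (1 - p) ** gamma * torch.log(p)         # y = 1 branch
    neg = -(1 - p) ** gamma * torch.log(1 - p)             # "otherwise" branch of Eq. (1)
    return torch.where(y == 1, pos, neg).mean()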

VariFocal Loss is proposed on the basis of Focal Loss. Whereas Focal Loss treats positive and negative samples symmetrically, VariFocal Loss only reduces the loss contribution of negative samples and does not down-weight positive samples in the same way. As shown in (2), the main improvement of VariFocal Loss is the introduction of parameter-controlled weights for the target classification loss, where p is the predicted IoU-aware classification score (IACS), α and γ are adjustable scaling factors, and q is the target for each sample. For a negative background sample, q = 0; for a positive target sample, q equals the IoU between the generated bounding box and the annotation box. In the traditional CIoU-based loss, the weights of the classification and localization terms are fixed; VariFocal Loss uses α and γ to adapt the loss weights to the difficulty of different samples [22]. When a sample is harder, larger values of α and γ increase the weight of the classification loss and emphasize classification accuracy; when a sample is easier, smaller values decrease the classification weight and prioritize localization accuracy, so the weight of the classification loss is adjusted dynamically. By introducing VariFocal Loss, a detection model can better balance the trade-off between target classification and target localization, thereby improving detection performance; the loss has already been applied in several target detection algorithms.

$$VFL(p,q)=\begin{cases}-q\left(q\log(p)+(1-q)\log(1-p)\right), & q>0\\[2pt] -\alpha p^{\gamma}\log(1-p), & q=0\end{cases}\tag{2}$$
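A matching PyTorch sketch of (2) is given below, again for illustration only; the default values α = 0.75 and γ = 2 and the variable names are assumptions rather than settings reported in the paper.

import torch

def varifocal_loss(p, q, alpha=0.75, gamma=2.0, eps=1e-7):
    """VariFocal Loss as in Eq. (2).

    p : predicted IoU-aware classification score (IACS), shape (N,)
    q : target; IoU with the ground-truth box for positives, 0 for negatives
    """
    p = p.clamp(eps, 1.0 - eps)
    pos = -q * (q * torch.log(p) + (1 - q) * torch.log(1 - p))  # q > 0 branch
    neg = -alpha * p ** gamma * torch.log(1 - p)                # q = 0 branch
    return torch.where(q > 0, pos, neg).mean()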

CBAM

Attention mechanisms have developed into many different types, which can be classified as multi-scale attention, contextual attention, parallel-branch attention, channel attention, spatial attention, and so on. Depending on the scale at which attention is applied, several models stand out, such as the Transformer, SE, and CBAM [23].

CBAM (Convolutional Block Attention Module) is a lightweight attention module used to enhance convolutional neural networks. Instead of computing a single attention map directly, it separates attention into channel and spatial components, adaptively learning the channel correlations and spatial importance of the input feature map and exploiting both to improve the accuracy of feature extraction. Considering the difficulty of transmitting data from a fire scene and the portability requirements of individual firefighters, fire scene images generally have very low resolution; when it is difficult for conventional algorithms to identify and detect targets in such images, false alarms and missed detections occur. As shown in Figure 3, the CBAM module takes a feature map F ∈ R^{C×H×W} as input, processes it with the channel attention module into a one-dimensional channel attention map F′ ∈ R^{C×1×1}, and then uses the spatial attention module to generate a two-dimensional spatial attention map F″ ∈ R^{1×H×W}.

Figure 3.

CBAM overall structure

The CAM part, shown in Figure 4, performs global average pooling and global max pooling on the input feature layer and passes the two pooling results through a shared multi-layer perceptron. The resulting channel weights are applied to the original input feature layer by channel-wise multiplication, so that feature information at different levels of the preceding output feature map can be extracted. By combining the results of the two pooling branches, the CAM compresses the spatial dimensions of the input feature map while retaining stronger representational power.

Figure 4.

CAM structure

The SAM structure, shown in Figure 5, focuses on which parts of the input image information are more significant and complements the CAM described above. To compute spatial attention, average pooling and max pooling are first applied along the channel direction at each spatial position, and the results are stacked to form two two-dimensional feature maps that aggregate the channel information. A convolution layer then learns a weight for each spatial position from this stacked map. In this way, the model automatically learns the importance of each spatial location, that is, which positions matter most for target localization.

Figure 5.

SAM structure

In the forest fire detection task, different fire types and backgrounds may emphasize different key channels. Channel attention helps the model adaptively select the channel information suitable for the current task, reducing the interference of irrelevant information and improving the effectiveness of the feature representation. Fires usually appear at specific locations in an image, and spatial attention helps the model focus on these important locations and improves localization accuracy. By combining channel attention and spatial attention, the CBAM module makes the model pay more attention to the important channel and spatial information during feature extraction, enhancing its perceptual ability. For forest fire detection, CBAM helps the model better understand the correlations and importance within the input feature map, improves the detection and localization of fire targets, and thereby improves the performance and robustness of the detection system.
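The following is a minimal PyTorch sketch of a CBAM-style module assembled from the CAM and SAM described above. It follows the usual CBAM design choices (a shared MLP with reduction ratio 16 and a 7 × 7 spatial convolution); these defaults and the class names are assumptions, not the exact code used in this paper.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # shared MLP applied to both pooled channel descriptors
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))   # global average pooling
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))    # global max pooling
        return torch.sigmoid(avg + mx)                            # C x 1 x 1 channel weights

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)                  # pool along the channel axis
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # 1 x H x W weights

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        x = x * self.ca(x)     # channel attention first
        return x * self.sa(x)  # then spatial attention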

Lightweight Convolution

Ordinary convolution is shown in Figure 6. Assume that the number of input channels is M, the input size is D_F × D_F, the number of output channels is N, the convolution kernel size is D_K × D_K, and the bias term b is ignored. The amount of computation required for this convolution operation is then given by (3):

$$Q_c=D_K\times D_K\times M\times N\times D_F\times D_F\tag{3}$$

Figure 6.

Ordinary convolution

The required number of parameters is given by (4):

$$P_c=D_K\times D_K\times M\times N\tag{4}$$

In grouped convolution, the input feature map is divided into g groups, each convolution kernel is likewise divided into g groups, and the convolution is performed within the corresponding group. Each group of convolutions generates its own feature maps, so g groups of feature maps are produced in total. The number of groups g acts like a control knob: its minimum value is 1, and with g = 1 the operation is ordinary convolution; its maximum value is the number of input channels C, and with g = C the operation is depthwise convolution, also called channel-by-channel convolution.

In other words, depthwise convolution is a special form of grouped convolution in which the number of groups equals the number of channels of the feature map: each channel forms its own group, convolution is performed within the group, and each kernel in a group generates one feature map. This is the most economical form of convolution; compared with ordinary convolution, it produces multiple feature maps with the same number of parameters and computations, while ordinary convolution would produce only one. Pointwise convolution is simply an ordinary 1 × 1 convolution. Because depthwise convolution does not mix information across channels, it needs to be used together with pointwise convolution. The pointwise operation is very similar to a conventional convolution: its kernel size is 1 × 1 × M, where M is the number of channels of the previous layer, so it combines the feature maps of the previous step with learned weights in the depth direction to generate new feature maps.
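To make the role of the group number g concrete, the following hedged PyTorch snippet builds the same 3 × 3 convolution with g = 1 (ordinary), an intermediate g (grouped), and g equal to the channel count (depthwise), plus the 1 × 1 pointwise convolution that restores channel mixing; the channel counts are arbitrary example values, not values from the paper.

import torch.nn as nn

C_in, C_out, K = 32, 64, 3                                       # example channels and kernel size

ordinary  = nn.Conv2d(C_in, C_out, K, padding=1, groups=1)       # g = 1: standard convolution
grouped   = nn.Conv2d(C_in, C_out, K, padding=1, groups=4)       # 1 < g < C_in: grouped convolution
depthwise = nn.Conv2d(C_in, C_in,  K, padding=1, groups=C_in)    # g = C_in: one kernel per channel
pointwise = nn.Conv2d(C_in, C_out, 1)                            # 1x1 convolution mixes channels

# depthwise separable convolution = depthwise convolution followed by pointwise convolution
separable = nn.Sequential(depthwise, pointwise)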

Depthwise convolution corresponds to Figure 7(a). Assuming that it faces the same feature-weighting task as the ordinary convolution above, the computation required by the depthwise convolution is given by (5).

Figure 7.

Depthwise separable convolution: (a) Depthwise convolution. (b) Pointwise convolution.

$$Q_{dw}=D_K\times D_K\times M\times D_F\times D_F\tag{5}$$

The number of parameters is given by (6):

$$P_{dw}=M\times D_K\times D_K\tag{6}$$

The computation required by the pointwise convolution, shown in Figure 7(b), is given by (7):

$$Q_{pw}=1\times 1\times M\times N\times D_F\times D_F\tag{7}$$

and its number of parameters by (8):

$$P_{pw}=1\times 1\times M\times N\tag{8}$$

The total computation and parameter counts of the combined depthwise separable convolution are the sums of the two terms. The computation is given by (9):

$$Q_{ds}=Q_{dw}+Q_{pw}=D_K\times D_K\times M\times D_F\times D_F+M\times N\times D_F\times D_F\tag{9}$$

and the number of parameters by (10):

$$P_{ds}=P_{dw}+P_{pw}=D_K\times D_K\times M+M\times N\tag{10}$$

Dividing the computation and parameter counts of depthwise separable convolution by those of ordinary convolution gives (11) and (12):

$$Q_{ds}/Q_c=1/N+1/D_K^2\tag{11}$$
$$P_{ds}/P_c=1/N+1/D_K^2\tag{12}$$

From (11) and (12), the computation and parameter counts of depthwise separable convolution are 1/N + 1/D_K² times those of ordinary convolution. This shows that the modified convolution reduces the required parameters and is therefore valuable for lightweighting advanced models. To further lighten the model, the convolution operations in the CBAM module are replaced with depthwise separable convolutions; the resulting module is referred to as dsCBAM in this article.
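A quick sanity check of the ratios in (11) and (12) can be made by counting parameters in PyTorch; the sketch below uses illustrative values M = 32, N = 64, D_K = 3 and is not taken from the paper.

import torch.nn as nn

M, N, K = 32, 64, 3

ordinary = nn.Conv2d(M, N, K, padding=1, bias=False)
separable = nn.Sequential(
    nn.Conv2d(M, M, K, padding=1, groups=M, bias=False),   # depthwise
    nn.Conv2d(M, N, 1, bias=False),                         # pointwise
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(ordinary))                       # D_K*D_K*M*N = 3*3*32*64 = 18432
print(count(separable))                      # D_K*D_K*M + M*N = 288 + 2048 = 2336
print(count(separable) / count(ordinary))    # ~0.1267 = 1/N + 1/D_K^2 = 1/64 + 1/9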

Improved Forest Fire Detection Algorithm

The forest fire detection framework of this article is shown in Figure 8; it is an improvement of the YOLOv5s model. To meet the real-time requirements of forest fire detection, the first step is to lighten the attention module by replacing the convolutions in the CBAM module with depthwise separable convolutions. Because fire targets usually occupy a small proportion of the frame, the samples may be imbalanced, so the second step replaces the original loss function with the VariFocal Loss described earlier. The third step adds the lightweight dsCBAM module to YOLOv5s to enhance the robustness of the forest fire detection system.

Figure 8.

Improved model framework

The dsCBAM module can adaptively adjust channel and spatial information through the CAM and SAM, which helps the YOLO network better understand the target structure and contextual relationships in the image and enhances its perception of the fire field. As shown in Figure 9, this article adds the module to YOLO in place of the last CSP1_1 module of the original model. Replacing the CSP1_1 module with the attention module enhances the model's ability to perceive targets at different scales, orientations, and angles, thereby improving detection accuracy. The introduced module may also help reduce noise or redundant information inside the prediction box, make target boundaries clearer, and strengthen feature extraction and representation, which in turn helps improve detection quality. It also makes the backbone structure richer and more diverse, so the network becomes more robust to changes and disturbances in the input image and the generalization performance of the model increases.

Figure 9.

The position of CBAM in YOLOv5s 6.0 version
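As an illustration of the dsCBAM idea, the sketch below replaces the convolution in the spatial attention branch with a depthwise separable pair (the 1 × 1 convolutions of the channel attention MLP are already pointwise). This is a hedged interpretation of the paper's description; the kernel size and class name are assumptions.

import torch
import torch.nn as nn

class DSSpatialAttention(nn.Module):
    """Spatial attention whose convolution is depthwise separable (dsCBAM-style variant)."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.depthwise = nn.Conv2d(2, 2, kernel_size, padding=kernel_size // 2,
                                   groups=2, bias=False)    # per-channel spatial filtering
        self.pointwise = nn.Conv2d(2, 1, 1, bias=False)      # mix the two pooled channels

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        attn = self.pointwise(self.depthwise(torch.cat([avg, mx], dim=1)))
        return x * torch.sigmoid(attn)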

Results and discussion
Training

The dataset had to be screened manually image by image and the flame targets labeled. As shown in Table 1, the collected forest fire image data were divided into a training set, a test set, and a validation set in a ratio of 6:2:2 for the experiments.

Table 1. Dataset settings

Dataset Training Test Validation Total
Homemade forest fire data set 1442 617 617 2676
Other institutes data set 600 200 200 1000

Table 2. Training settings parameters

Training parameters Detail
Epochs 100
Batch-size 16
Image-size 640 × 640
Initial learning rate 0.01
Optimization algorithm SGD

The training settings are shown in Table 2, which lists the parameter settings used during training. In this experiment, the default stochastic gradient descent (SGD) optimizer was used; the adaptive moment estimation (Adam) optimizer can be chosen instead depending on the actual situation. The experimental environment is listed in Table 3.

Table 3. Experimental settings

Lab Environment Detail
programming language Python3.8.5
operating system Windows 10
deep learning framework Pytorch 1.8.0
GPU 4x NVIDIA TITAN V

Model Evaluation

The evaluation indices of the public Microsoft COCO dataset are recognized as effective and state-of-the-art in the field of object detection, and they are used in this article to evaluate the performance of the proposed improved forest fire detection algorithm. The five indicators P, R, AP, mAP, and FPS are explained below.

$$P=TP/(TP+FP)\tag{13}$$

P (Precision) is the ratio of correctly detected targets to all detection results, as in (13), where TP (true positive) denotes a correctly predicted box; the boxes predicted by the model are compared one by one with the labeled boxes of the image. R (Recall) is the proportion of truly detected targets among all real targets, as in (14).

$$R=TP/(TP+FN)\tag{14}$$
$$AP=\int_0^1 P(r)\,dr\tag{15}$$

AP (Average Precision), defined in (15), essentially describes the performance of the model on a single category; in a multi-category detection task each category has its own AP value, and mAP is the mean of the AP values over all categories. These metrics provide concrete numerical measures of the algorithm's prediction accuracy and target detection capability.

$$FPS=1000/(preprocess+inf+NMS)\tag{16}$$

Here preprocess is the preprocessing time needed to convert the input image into the format required by the algorithm, including aspect-ratio scaling, padding, normalization, and similar operations; inf is the inference time, that is, the forward-pass time from feeding the preprocessed image into the model to obtaining its output; and NMS is the post-processing time, mainly the time spent converting the model output and related operations. The sum of the three, measured in milliseconds, is the total time to process one image, and (16) converts it to FPS (frames per second). When tested and compared in the same hardware environment, FPS expresses the lightweight effect of the algorithm to a certain extent.
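A hedged sketch of how P, R, and FPS in (13), (14), and (16) might be computed from counted detections and per-stage timings follows; the function names and example numbers are illustrative only and not results from the paper.

def precision(tp, fp):
    return tp / (tp + fp)            # Eq. (13)

def recall(tp, fn):
    return tp / (tp + fn)            # Eq. (14)

def fps(preprocess_ms, inference_ms, nms_ms):
    return 1000.0 / (preprocess_ms + inference_ms + nms_ms)   # Eq. (16)

# example: 100 true positives, 15 false positives, 20 missed targets,
# and per-image timings of 1.2 ms + 13.5 ms + 1.0 ms
print(precision(100, 15))            # 0.869...
print(recall(100, 20))               # 0.833...
print(fps(1.2, 13.5, 1.0))           # ~63.7 frames per second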

Ablation Experiment

Table 4 presents the results of this experiment. The experiments were evaluated on the same dataset, and eight configurations were compared: (1) the original YOLOv5s model; (2) YOLOv5s combined with CBAM; (3) YOLOv5s combined with SE; (4) YOLOv5s combined with ECA; (5) YOLOv5s combined with the improved CBAM (dsCBAM); (6) YOLOv5s with dsCBAM and Alpha-IoU; (7) YOLOv5s with dsCBAM and SIoU; (8) YOLOv5s with dsCBAM and VariFocal Loss. The results show that the improvements are effective on the three indicators of precision P, recall R, and frames per second (FPS). After adding the CBAM module, the recognition accuracy of the original model for flame targets in forest fires is slightly improved and its ability to perceive flame targets is further enhanced. Introducing depthwise separable convolution speeds up data processing and makes the model lighter and more portable. Finally, ablation experiments comparing the Alpha-IoU, SIoU, and VariFocal loss functions show that the loss function proposed in this article performs best, which verifies the importance of suppressing negative samples for improving target recognition performance. Overall, compared with the traditional YOLO algorithm, the improved model achieves significant gains both in the difficult task of small target detection and in detection speed.

Table 4. Comparative test results of the model

Model P R FPS
YOLOv5s 0.811 0.786 59
YOLOv5s + CBAM 0.814 0.790 60
YOLOv5s + SE 0.810 0.787 58
YOLOv5s + ECA 0.812 0.791 59
YOLOv5s + dsCBAM 0.812 0.787 62
YOLOv5s + dsCBAM + Alpha-IoU 0.821 0.813 61
YOLOv5s + dsCBAM + SIoU 0.860 0.834 60
YOLOv5s + dsCBAM + VariFocal (Ours) 0.871 0.816 64
Comparison

Figure 10 shows comparative results of the original and improved models. Comparing the two networks on four groups of images shows intuitively and objectively that the improved model performs better. In the first group, both models accurately detect the fire targets, but the confidence of the improved model is significantly higher. The second group contains three flame targets; the original model detects only the two larger ones, while the improved model detects all of them. In the third group, the flames are partly occluded by foreground objects and cannot be identified by the original model, whereas the improved model accurately locates them. In the last group, the flames occupy a relatively small portion of the image, and the improved model detects what the original model misses. The improved model thus does not miss small fire targets and detects fires more accurately even when image quality and size are limited. Figure 11 presents the robustness experiment: the original network misclassified forest night lights as flames, while the attention mechanism of the improved model enhanced feature learning of the detection targets and reduced such false detections.

Figure 10.

Experimental identification results: (a) Improved model. (b) Original model.

Figure 11.

Experimental results of misdetection of forest street lights at night: (a) Original model. (b) Improved model.

Analyzing and comparing the performance data in Table 4 shows the robustness of the model from several angles. Comparing Experiments 1 to 4, adding the three different attention mechanisms CBAM, SE, and ECA to the original YOLOv5s model affects the results to different degrees. With SE, the precision and FPS of the model decline while the recall improves slightly. With the ECA attention mechanism, P and R improve slightly and the processing speed is almost unchanged. With CBAM, P and R improve more noticeably than with ECA, and the gain in R is larger, indicating that it detects more targets and reduces missed detections; the processing speed also rises, though not dramatically. From this comparison, CBAM offers the best improvement in focusing on small targets and in detection speed and best matches the requirements of diverse detection environments. The comparison between Experiment 2 and Experiment 5 directly shows the effect of the modified convolution: the precision and recall results indicate that the replacement shortens the processing time of each image while preserving the accuracy of the model, making its lightweight character more prominent. Experiments 5 to 8 trained, validated, and tested on the forest fire images using the improved CBAM together with four loss options: the original loss function, Alpha-IoU, SIoU, and VariFocal. Comparing the results of these loss functions, the first replacement (Alpha-IoU) brings a small improvement in the training indicators but also increases the processing time of the model. The SIoU loss in Experiment 7, compared with Experiments 5 and 6, significantly improves training accuracy on the dataset but at the cost of processing speed. Experiment 8 corresponds to the improvements proposed in this article; it adapts well to changes in input images, lighting conditions, and occlusions while also taking the processing speed of the model into account, so that performance and efficiency are balanced to a certain extent.

Discussion

In this study, we apply CBAM, VariFocal Loss, depthwise separable convolution, and YOLOv5s to the forest fire detection task. The comparative experiments show that the model achieves significant performance improvements on multiple metrics. First, the CBAM module improves the model's attention to key areas, allowing better detection of fire features. Second, the VariFocal loss function introduces dynamic weight allocation. Furthermore, depthwise separable convolution reduces the computational effort while maintaining model performance. The experiments show that the improved combination of CBAM, VariFocal Loss, depthwise separable convolution, and YOLOv5 is highly robust when processing forest fire images in different scenarios: the model accurately detects fire targets of various scales, poses, and densities and adapts reasonably well to different lighting conditions. Compared with other deep learning-based methods, our method achieves faster inference while maintaining high accuracy. Nevertheless, some limitations remain; for example, the model may perform poorly on low-resolution or blurry images, and its robustness in complex scenes still needs further improvement. Future work may consider combining multi-modal data and introducing a target tracking module.

Conclusions

This study investigated the forest fire detection task in depth by applying improved methods based on CBAM, VariFocal Loss, depthwise separable convolution, and YOLOv5s, introduced the working principles of the original model, and analyzed the principles and feasibility of the proposed improvements. For forest fire detection, the CBAM structure helps the detection model focus on key areas and improves the detection accuracy of fire targets; its combination of channel attention and spatial attention effectively extracts fire features such as flames and smoke in images. The improved loss function overcomes the imbalance between sample categories and delivers better results; its dynamic weight allocation makes the model pay more attention to minority-class samples, thereby improving detection accuracy. The application of depthwise separable convolution reduces the computational load of the model while maintaining performance; this lightweight convolution improves the running efficiency of the model and makes it more suitable for practical fire detection applications. The improved YOLOv5 shows good performance in forest fire detection; its fast and accurate target detection enables real-time monitoring of forest areas and prompt detection of fires, providing the opportunity for rapid response. Experiments were conducted to demonstrate the actual effect of the various improvements. The experiments on forest fire detection prove that the improvements in this article effectively enhance the perception ability of the original YOLO model and achieve good results: the precision is improved by 0.06 over the original model and the number of frames processed per second increases by 3, which substantially improves the accuracy and efficiency of forest fire detection. The results of this study provide an important reference and foundation for further research and development in forest fire detection. By improving existing models and technologies, we can strengthen monitoring and early warning of forest fires and thereby reduce the harm that fires cause to the environment and to humans. Future work can explore further deep learning methods, integrate multi-source data, and enhance the robustness and real-time performance of the algorithm to advance forest fire detection technology. In summary, this study provides a useful exploration for research and practical applications in forest fire detection and demonstrates the potential of the CBAM, VariFocal Loss, depthwise separable convolution, and YOLOv5 improvements in fire detection. It is hoped that this article can provide guidance for the prevention and control of forest fires and reduce the harm of fires to the natural environment and human society.
