Exploration of Vehicle Target Detection and Classification Method Based on Sea Lion Optimization with Deep Convolutional Neural Network

Abstract: Urban environments across the globe increasingly rely on technological solutions to enhance quality of life and to make better use of city infrastructure and resources at minimal operational cost. Urban remote sensing plays a significant part in mapping, monitoring, and controlling infrastructure. High-resolution remote sensing data delivers imagery of large areas faster than traditional data collection strategies, and small objects such as cars can be detected in it. Vehicle recognition on aerial remote sensing images (RSIs) against the complicated background of urban zones has long attracted interest in the remote sensing field. Automatic vehicle counting contributes to several applications, including traffic management and monitoring. Target detection is a crucial part of computer vision (CV) and has been applied in several domains. Therefore, this study develops a new Vehicle Recognition and Classification using Sea Lion Optimization with Deep Learning (VRC-SLODL) model on RSI. The presented VRC-SLODL technique aims to recognise and classify the vehicles present in the images. The bilateral filtering (BF) technique first improves the RSI quality. The VRC-SLODL technique then employs a modified residual network (ResNet) model to produce a collection of feature vectors. Finally, the SLO approach is combined with a long short-term memory (LSTM) model for vehicle classification, where the SLO algorithm acts as a hyperparameter optimizer. Experiments were performed on a benchmark dataset to examine the performance of the VRC-SLODL method. The obtained values show improved classification performance of the VRC-SLODL technique over other models.


Introduction
With the progression of remote sensing technologies, the volume of remote sensing images (RSI) has grown exponentially [1]. Compared with aerospace RSI, aerial RSI has the merits of accurate geometric correction, large imaging scale, and high resolution. Thus, aerial remote sensing is considered an important mode of remote sensing [2,3]; it generally uses balloons or aeroplanes as working platforms, and the flying altitude ranges from meters to kilometres. In aerial RSI, vehicle recognition is a crucial technology in military and civil surveillance [4], for example in urban planning and traffic management. Still, manual analysis for vehicle detection makes poor use of the data and delivers results late, and it is easily affected by the analyst's subjective judgement, physical condition, and state of mind [5,6]. Thus, automated vehicle recognition on RSI that is both accurate and efficient is much needed.
Remote sensing target recognition aims to spot the objects of interest in RSI and to predict the location and type of each target [7]. In conventional recognition datasets, the target is in focus. Aerial data is not, and objects in aerial images generally appear in arbitrary orientations that depend on the viewpoint of the Earth observation platform [8,9]. Object recognition refers to identifying instances of semantic objects of certain classes (like birds, humans, or aeroplanes) in video and digital images. Small-target recognition remains a difficult problem within target recognition [10]. RSI analysis has vital applications in environmental management, the military, transport planning, and disaster management. In addition, vehicles in RSI, whether civilian or military, form a special category that is both highly significant and very difficult to recognize [11].
Though the availability of high-resolution satellite images speeds up object recognition and helps automate such applications, vehicle identification from satellite images remains difficult [12]. The reason is that even in images of high spatial resolution, vehicles appear as minute spots that are hard to distinguish from their surroundings [13,14]. Categorizing the identified vehicles is also challenging, since small and large vehicles sometimes cannot be told apart by eye even in high-resolution images. Over the years, DL approaches have been broadly employed in several research domains, and the continuing advancement of convolutional neural networks (CNN) has brought clear improvements [15]. Kim et al. [16] examined adding more prediction layers to the standard YOLOv3, together with spatial pyramid pooling, to improve vehicle recognition accuracy under large scale changes or occlusion by other objects. In [17], a new ground vehicle recognition method for aerial infrared images based on a CNN was presented. The UAV and infrared sensor used in this application were first set up. Afterwards, a new aerial moving platform was built, and an aerial infrared vehicle database was created for the first time. The authors released this database (NPU_CS_UAV_IR_DATA) for use in subsequent studies in this domain. Afterwards, an end-to-end CNN was constructed. A real-time ground vehicle recognition method was obtained, with a large number of detection features being learned iteratively.
In [18], a new end-to-end DL network, orientation-aware feature fusion single-stage detection (OAFF-SSD), was designed for precise recognition of multiple densely located vehicles in UAV images. The proposed OAFF-SSD contains three important elements: multi-level feature extraction, a new feature fusion scheme, and a novel orientation-aware bounding box (OABB) proposal and regression. Meanwhile, specific training strategies were selected for fast convergence of the training loss. In [19], an efficient and effective moving vehicle recognition system for aerial infrared image sequences was presented using fast image registration and the YOLOv3 network. Because infrared vehicle samples are scarce, transfer learning (TL) was used to train the enhanced YOLOv3 network. Koga et al. [20] examined an unsupervised domain adaptation (DA) approach, which does not need labelled training data and can therefore preserve recognition performance in the target domain at minimal cost. The authors applied correlation alignment (CORAL) DA and adversarial DA to a region-based vehicle detector and improved the recognition accuracy by over 10% in the target domain.
This study develops a new Vehicle Recognition and Classification using Sea Lion Optimization with Deep Learning (VRC-SLODL) model on RSI. In the presented VRC-SLODL technique, the RSI quality is first improved by the bilateral filtering (BF) technique. The VRC-SLODL technique then employs a modified residual network (ResNet) model to generate a set of feature vectors. Finally, the SLO method is combined with a long short-term memory (LSTM) model for vehicle classification, where the SLO algorithm acts as a hyperparameter optimizer. Experiments were performed on a benchmark dataset to examine the performance of the VRC-SLODL method.

The Proposed VRC-SLODL Model
In this study, we have introduced a new VRC-SLODL algorithm for vehicle classification on RSI. The presented VRC-SLODL technique aims to recognise and classify the vehicles present in the images. It encompasses four major processes: BF-based preprocessing, modified ResNet feature extraction, LSTM classification, and SLO-based hyperparameter tuning. Fig. 1 illustrates the overall procedure of the VRC-SLODL system.

BF-based Preprocessing
At the primary level, the quality of the RSI is enhanced by the BF technique, which is used to de-noise the input images. The BF performs spatial weighted averaging without smoothing edges [21]. This is accomplished by combining two Gaussian filters, one operating in the spatial domain and the other in the intensity domain. Both the spatial distance and the intensity difference are used for the weight, and the filter is expressed below:

$$\hat{I}(x) = \frac{1}{C} \sum_{y \in N(x)} \exp\left(-\frac{\lVert y - x \rVert^{2}}{2\sigma_d^{2}}\right) \exp\left(-\frac{\lvert I(y) - I(x) \rvert^{2}}{2\sigma_r^{2}}\right) I(y)$$

From the expression, $N(x)$ represents the spatial neighbourhood of the pixel $I(x)$, $\sigma_d$ and $\sigma_r$ are the parameters that control how quickly the weights fall off in the spatial and intensity domains, and $C$ specifies the normalization constant.
BF has been used in tone mapping, volumetric de-noising, and image de-noising. It can be accelerated by expressing it in an augmented space as a linear convolution followed by two simple non-linearities, which creates the conditions for down-sampling the key operations.
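To make the weighting concrete, the following is a minimal brute-force sketch of a bilateral filter in plain Python. It is an illustration under stated assumptions, not the implementation used in this study: the function name `bilateral_filter`, the parameters `radius`, `sigma_d`, and `sigma_r`, and the tiny test image are all hypothetical.

```python
import math


def bilateral_filter(img, radius=1, sigma_d=1.0, sigma_r=25.0):
    """Brute-force bilateral filter for a 2-D grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            centre = img[r][c]
            total, norm = 0.0, 0.0
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if not (0 <= rr < h and 0 <= cc < w):
                        continue  # neighbours outside the image are skipped
                    # Spatial Gaussian: depends on the pixel distance only.
                    spatial = math.exp(-(dr * dr + dc * dc) / (2 * sigma_d ** 2))
                    # Intensity Gaussian: depends on the grey-level difference.
                    diff = img[rr][cc] - centre
                    intensity = math.exp(-(diff * diff) / (2 * sigma_r ** 2))
                    weight = spatial * intensity
                    total += weight * img[rr][cc]
                    norm += weight  # accumulates the normalization constant
            out[r][c] = total / norm
    return out


# Hypothetical example: a noisy vertical step edge (dark left, bright right).
noisy = [
    [10, 12, 10, 200, 198, 200],
    [11, 10, 13, 199, 201, 200],
    [10, 11, 10, 200, 200, 197],
]
smoothed = bilateral_filter(noisy, radius=1, sigma_d=1.0, sigma_r=10.0)
```

Because the intensity weight is close to zero across the step, pixels on each side are averaged only with neighbours of similar brightness, so the noise is reduced while the edge stays sharp.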

Feature Extraction using modified ResNet Model
The VRC-SLODL technique employs the modified ResNet model to produce a set of feature vectors. The representation capability of a model grows as the network is deepened [22]; in other words, the model extracts additional features from the image. However, a deeper network is harder to train. The first problem encountered during training is exploding or vanishing gradients: as the number of layers grows, the amount of calculation increases rapidly and the gradient becomes unstable during backpropagation. For the vanishing-gradient problem, researchers have developed various solutions, such as batch normalization (BN) and MSRA initialization. A further challenge is network degradation: as the network depth rises, the model performance becomes poorer, and in particular the training accuracy decreases. If overfitting were the cause, the training accuracy would instead be higher. ResNet resolves these obstacles using residual learning. ResNet offers two paths, a residual mapping and an identity mapping. Assume that the network's input $x$ has already reached the optimum level, that is, $x$ is optimal. As the network continues to deepen, the residual mapping will be driven to 0, leaving only the identity mapping. The crucial point is that deepening the network will not reduce the model performance.
Formally, $H(x)$ denotes the original mapping, and the stacked non-linear layers fit another mapping, $F(x) = H(x) - x$, where $F(x)$ represents the residual mapping. The original mapping is thus transformed into $F(x) + x$. This study uses Modified ResNet-152, a variation of ResNet-152, as a recognizer to categorize facial expressions. In the Modified ResNet-152 framework, the initial layer is a 7×7 convolution. Next come four major components, of which the first and the last have 9 layers each. Lastly, global average pooling, a 7-way FC layer, and softmax are applied.
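The identity argument can be checked numerically with a small sketch. The code below is a toy residual block on plain Python lists, assuming a two-layer fully connected residual branch; the helper names `relu`, `linear`, and `residual_block` are hypothetical, and the activation that ResNet usually applies after the addition is left out so that the identity case is easy to see.

```python
def relu(v):
    # Element-wise rectified linear unit.
    return [max(0.0, x) for x in v]


def linear(weights, bias, v):
    # Dense layer: one output per weight row.
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def residual_block(x, w1, b1, w2, b2):
    """Return F(x) + x, where F is linear -> ReLU -> linear."""
    fx = linear(w2, b2, relu(linear(w1, b1, x)))
    return [f + xi for f, xi in zip(fx, x)]  # skip connection adds the input


x = [1.5, -2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
no_bias = [0.0, 0.0]

# With the residual branch at zero, the block reduces to the identity mapping.
y_identity = residual_block(x, zeros, no_bias, zeros, no_bias)

# With a non-zero branch, the output is the input plus a learned correction.
y_general = residual_block([1.0, -2.0], eye, no_bias, eye, no_bias)
```

When every weight in the residual branch is zero, the block returns its input unchanged, which is why stacking further blocks need not degrade what earlier layers already learned.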

LSTM-based Classification
Here, the LSTM model is used for vehicle classification. Once the features are extracted, they are fed into the LSTM classifier, which assigns the appropriate class. LSTM is a special RNN model that retains the properties of RNNs while using a sequence of memory cells to improve the learning of time series and the handling of arbitrary input data [23]. Furthermore, it captures long-term dependence in the input data and prevents the gradient from vanishing as information is passed along, a significant improvement for capturing dynamic changes in a time sequence. It is among the most effective RNN structures and is widely applied. The unit comprises the input gate, forget gate, output gate, input unit, memory cell, and output unit.

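Input gate:
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
Forget gate:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
Output gate:
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
Input unit:
$$\tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$$
Memory cell:
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
Output unit:
$$h_t = o_t \odot \tanh(c_t)$$
From the expressions, $\sigma$ and $\tanh$ characterize the nonlinear activation functions; $x_t$ is the input at time step $t$ and $h_{t-1}$ the previous output; $h_t$ signifies the output of the unit; and $W$ and $b$ represent the weight and bias coefficients that establish the connections amongst the units.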
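As an illustration of how the gates interact, here is a minimal sketch of a single LSTM time step in plain Python. It follows the common textbook formulation of the LSTM cell, which is assumed here rather than taken from the authors' code; the names `lstm_step`, `affine`, and the `params` dictionary layout are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    # Dense transform W v + b on plain lists.
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; params maps 'i', 'f', 'o', 'c' to (W, b) pairs."""
    z = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    i_t = [sigmoid(a) for a in affine(*params["i"], z)]      # input gate
    f_t = [sigmoid(a) for a in affine(*params["f"], z)]      # forget gate
    o_t = [sigmoid(a) for a in affine(*params["o"], z)]      # output gate
    c_hat = [math.tanh(a) for a in affine(*params["c"], z)]  # input unit
    # Memory cell: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    # Output unit: gated, squashed view of the memory cell.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Hypothetical example: hidden size 1, input size 1, all weights zero,
# so every gate evaluates to 0.5 and the candidate to 0.
zero = ([[0.0, 0.0]], [0.0])
params = {"i": zero, "f": zero, "o": zero, "c": zero}
h_t, c_t = lstm_step([1.0], [0.0], [2.0], params)
# The forget gate halves the previous memory, so c_t is [1.0].
```

In the full classifier, this step would be applied along the feature sequence, and the quantities that the SLO algorithm tunes would be hyperparameters of the network rather than the weights shown here.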

Hyperparameter Tuning using SLO Algorithm
In this work, the SLO algorithm acts as a hyperparameter optimizer. SLO was proposed to solve global optimization problems [24]. It simulates the hunting behaviour of sea lions, including how they use their tails and whiskers and how they encircle and capture prey. SLO provides more competitive outcomes than other techniques such as PSO. The algorithm proceeds as follows. Initially, SLO creates a population of $N$ solutions ($N$ being the population size), each of dimension $D$, by random sampling within the search space:

$$x_{i,j} = lb_{j} + rand \cdot (ub_{j} - lb_{j})$$

where $x_{i,j}$ indicates the $j$-th parameter of the initial location vector of the $i$-th solution; $lb_j$ and $ub_j$ represent the minimal and maximal values for the $j$-th parameter; and $rand$ indicates a random number within [0, 1].
Each solution is then evaluated for fitness with the objective function. Next, the sea lions locate the prey and call others to join subgroups and form a net that encircles it. The prey is taken to be the current optimum solution, or the solution nearest to the optimum. This behaviour is given as follows:

$$Dist = \lvert 2B \cdot P(t) - SL(t) \rvert$$
$$SL(t+1) = P(t) - Dist \cdot C \tag{11}$$
$$C = 2 - \frac{2t}{T} \tag{12}$$

where $Dist$ is the distance between the sea lion and the prey; $P(t)$ indicates the location vector of the optimum solution; $SL(t)$ characterizes the sea lion in iteration $t$; $t$ signifies the current generation; $T$ is the maximal number of generations; $B$ symbolizes a random number within 0 and 1 that is multiplied by 2 to widen the search range; $SL(t+1)$ signifies the new location of the search agent after updating; and $C$ is a parameter whose value declines linearly from two to zero, which models the sea lions closing in on the prey and encircling it.
Once a sea lion identifies prey, it calls other members to gather and form a net for capturing it. This sea lion is regarded as the leader: it leads the group towards the prey and decides the group's behaviour. The mathematical expression of this behaviour is given in the following:

$$SP_{leader} = \left\lvert \frac{\sin\theta \, (1 + \sin\phi)}{\sin\phi} \right\rvert$$

where $SP_{leader}$ is the value that expresses how the leader's decision is followed by the others; $\theta$ is the angle of voice reflection in the water; and $\phi$ is the angle of voice refraction in the water. In this study, $\theta = 2\pi m$ and $\phi = 2\pi(1 - m)$, where $m$ denotes a random value within [0, 1].
The hunting activity of the sea lions led by the leader is defined by two behaviours.
Dwindling encircling method: this behaviour depends on the value of $C$, which declines linearly from 2 to 0. The shrinking value forces the other search agents to contract the search space around the current optimum location. Consequently, the updated location of a sea lion can lie anywhere in the search space between its existing location and the position of the optimum agent.
Circling updating location: sea lions chase a bait ball of fish and hunt it starting from the edge:

$$SL(t+1) = \lvert P(t) - SL(t) \rvert \cdot \cos(2\pi m) + P(t) \tag{16}$$

where $P(t)$ is the location of the optimum agent, $SL(t)$ is the location of the sea lion at iteration $t$, and $m$ is a random number within [−1, 1].

Searching for prey (Exploration stage)
In this stage, a search agent updates its location according to a randomly chosen sea lion. The exploration stage is performed once the value of $C$ is higher than 1, and the procedure for identifying the new agent position can be formulated as follows:

$$Dist = \lvert 2B \cdot SL_{rnd}(t) - SL(t) \rvert$$
$$SL(t+1) = SL_{rnd}(t) - Dist \cdot C \tag{17}$$

In Eq. (17), $SL_{rnd}(t)$ represents the sea lion randomly chosen from the existing population, and $B$ represents a random number in the range from zero to one.
Prior results show that SLO suffers from issues common to nature-inspired techniques, including slow convergence and becoming trapped in local optima. In the presented method, the exploration and exploitation stages of ISLO are enhanced relative to the original SLO algorithm.
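The update rules described in this section can be put together in a short, self-contained sketch. This is a simplified reading of the SLO loop under stated assumptions, not the authors' ISLO implementation: the leader signal uses an assumed form with θ = 2πm and φ = 2π(1 − m), the 0.25 threshold is taken from Algorithm 1, and the function `slo_minimize` and the toy `sphere` objective (standing in for a real validation-error measure of the LSTM) are hypothetical.

```python
import math
import random


def slo_minimize(fitness, lower, upper, pop_size=20, max_gen=100, seed=0):
    """Minimise `fitness` over the box [lower, upper] with an SLO-style loop."""
    rng = random.Random(seed)

    def clip(pos):
        # Keep every coordinate inside its bounds.
        return [min(max(p, lo), hi) for p, lo, hi in zip(pos, lower, upper)]

    # Random initialisation: lb + rand * (ub - lb) for each parameter.
    pop = [[lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
           for _ in range(pop_size)]
    fit = [fitness(p) for p in pop]
    best_i = min(range(pop_size), key=fit.__getitem__)
    best, best_fit = pop[best_i][:], fit[best_i]

    for t in range(max_gen):
        c = 2.0 - 2.0 * t / max_gen  # declines linearly from 2 towards 0
        # Leader signal (assumed form): theta = 2*pi*m, phi = 2*pi*(1 - m).
        m = rng.random()
        sin_theta = math.sin(2 * math.pi * m)
        sin_phi = math.sin(2 * math.pi * (1 - m))
        if sin_phi == 0.0:
            sp_leader = float("inf")  # degenerate draw: fall back to circling
        else:
            sp_leader = abs(sin_theta * (1 + sin_phi) / sin_phi)

        for i in range(pop_size):
            cur = pop[i]
            if sp_leader < 0.25:
                # |C| < 1: encircle the best agent; otherwise follow a random one.
                target = best if abs(c) < 1 else pop[rng.randrange(pop_size)]
                b = rng.random()
                new = [tg - abs(2 * b * tg - x) * c for tg, x in zip(target, cur)]
            else:
                # Circling update around the best agent, with k in [-1, 1].
                k = rng.uniform(-1, 1)
                new = [abs(p - x) * math.cos(2 * math.pi * k) + p
                       for p, x in zip(best, cur)]
            new = clip(new)
            new_fit = fitness(new)
            if new_fit < fit[i]:  # keep the new position only if it is better
                pop[i], fit[i] = new, new_fit

        # Update the global best once per generation.
        best_i = min(range(pop_size), key=fit.__getitem__)
        if fit[best_i] < best_fit:
            best, best_fit = pop[best_i][:], fit[best_i]
    return best, best_fit


def sphere(v):
    # Toy objective; a real run would score a trained LSTM configuration.
    return sum(x * x for x in v)


best, best_fit = slo_minimize(sphere, [-5.0, -5.0], [5.0, 5.0])
print(best, best_fit)
```

For hyperparameter tuning, each position vector would encode candidate LSTM settings and the fitness function would return the classification error obtained with them; the greedy replacement step guarantees that the best fitness never gets worse from one generation to the next.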

Conclusion
In this study, we have introduced a new VRC-SLODL method for vehicle classification on RSI. The presented VRC-SLODL technique aims to recognise and classify the vehicles present in the images. Firstly, the quality of the RSI is enhanced by the BF technique. The VRC-SLODL technique then employs the modified ResNet model to produce a set of feature vectors. Finally, the SLO algorithm is combined with the LSTM approach for vehicle classification, where the SLO algorithm acts as a hyperparameter optimizer. Experiments were conducted on a benchmark dataset to examine the performance of the VRC-SLODL method. The obtained values show improved classification performance of the VRC-SLODL technique over other models. In the future, the performance of the VRC-SLODL method can be improved by fusion-based DL models.

Algorithm 1: Pseudocode of SLO Algorithm
Input: Population size N, the maximal number of generations T
Output: The best solution
Randomly initialize the sea lion population SL_i (i = 1, 2, …, N)
Arrange the population by the fitness values and find the best global solution
t = 0
while t < T do
    Compute the value of C using Eq. (12)
    Compute SP_leader by Eq. (11)
    for i < N do
        if SP_leader < 0.25 then
            if |C| < 1 then
                Update the position of the current search agent by Eq. (11)
            else
                Choose a random agent SL_rnd from the existing population
                Update the position of the current individual using Eq. (17)
            end if
        else
            Update the position of the current search agent by Eq. (16)
        end if
        Check the bounds and compute the fitness of the new solution
        Replace the older solution with the newer one if the fitness value is better
    end for
    Sort the population and update the global optimal solution
    t = t + 1
end while
Return: the best solution

Figure 3: Confusion matrices of VRC-SLODL system (a-b) TR and TS databases of 90:10 and (c-d) TR and TS databases of 80:20

Figure 4: Result analysis of VRC-SLODL approach on 90% of TR database

Figure 5: Result analysis of VRC-SLODL approach on 10% of TS database

Table 3 shows a brief vehicle classification outcome of the VRC-SLODL technique under 80% of TR and 20% of TS databases.

Figure 6: Result analysis of VRC-SLODL method on 80% of TR dataset

Figure 7: Result analysis of VRC-SLODL method on 20% of TS dataset

Table 1: Details of VEDAI Dataset

Table 2: Vehicle classification outcome of VRC-SLODL method under 90:10 of TR/TS databases

Table 3: Vehicle classification outcome of VRC-SLODL method under 80:20 of TR/TS databases

Table 4: Accuracy analysis of VRC-SLODL technique with recent systems