
Dense Scene Flow Estimation with the GVF Snake Model for Obstacle Detection Using a 3D Sensor in the Path-Planning Module



Introduction

Vehicle intelligence is currently being improved for a variety of applications, with a particular focus on path planning for mobile robots, one of the main areas of study in the field of autonomous control environments [17]. Robots operating on land, in the air, or underwater face a variety of complex and challenging path-planning problems. A mobile robot can avoid obstacles by routinely observing the kinematic behavior of objects in its path. In addition, when the position of the obstacles changes, the autonomous vehicle (AV) must recalculate its path to avoid collisions [12]. Motion estimation has been thoroughly researched over the past 20 years in the fields of computer vision and robotics, including the identification of human behavior, object tracking [29], and object segmentation. A variety of vision sensors have been employed for this estimation in the past, including infrared sensors [11], sonar [24], ultrasonic sensors, Laser Detection and Ranging (LADAR) [7], laser rangefinders, camera data fused with radar [3], and stereo cameras with a projector [2]. These sensor systems, however, are unable to independently provide the required information about the surroundings.

Scene flow refers to the estimation of the 3D motion of objects in a scene over time. It is critical in several computer vision applications such as autonomous driving, 3D reconstruction, object tracking, and motion detection, and its estimation requires analyzing consecutive frames of an image sequence and extracting motion cues from image features [23]. Scene flow is particularly important in autonomous driving, where it is used to estimate the motion of surrounding vehicles, pedestrians, and other objects in real time. In the development of AVs, it allows vehicles to detect and track objects in real time, providing the information needed to make informed decisions and avoid collisions. Scene flow [27] is essentially the 3D motion field of points in the world, whereas optical flow is the 2D motion field of points in an image. Differential optical flow techniques can be categorized as local and global in the computer vision community [8]. In the former, a local energy is optimized, as in the Lucas–Kanade method [21], and in the latter, a global energy is minimized to determine the flow vector, as in the Horn–Schunck approach [18]. Global techniques produce 100% dense flow fields but are much more sensitive to noise, whereas local methods are robust to noise but cannot produce dense optical flow fields [8]. The local Lucas–Kanade and global Horn–Schunck methods [5] are combined [13] to create a method that produces dense optical flow under noisy image conditions. These data can be used to compute object trajectories and times to collision and to enhance the quality of 3D object segmentation [25]. Based on differential analysis of 4D light fields, a previous study [22] presents a method for recovering 3D dense scene flow.

The GVF snake model [19] is a state-of-the-art object tracking algorithm that uses a geometric active contour (AC) framework to track objects in real-time video frames. It is a type of AC model that uses gradient vector flow (GVF) to define the forces that control the contour's movement. The algorithm is based on the concept of a snake that moves to minimize its energy while conforming to the shape of the object being tracked. The GVF snake model is particularly useful for tracking objects with complex shapes and irregular boundaries; it is also robust to noise and image artifacts, can adapt to changes in the object's shape and position, and supports real-time tracking. These advantages make it well suited for applications such as robotics, video surveillance, and medical imaging [9]. Moreover, common evaluation metrics such as endpoint error (EPE), average angular error (AAE), and standard deviation of angular error (STDAE) are used to evaluate the performance of different algorithms on the benchmark Middlebury datasets [1].

In this article, we propose a dense scene flow estimation method and conduct real-time experiments using the Pioneer 3DX (P3DX) mobile robot. Without relying on the texture or features of the object, our method estimates the three-dimensional motion of every point in the scene. This study aims to use depth data to identify moving obstacles. The combined local and global differential flow method [6] is used to extract the flow vectors in 3D world coordinates, along with the GVF method, which segments the desired region and detects the boundaries of obstacles in the 3D image frame. The GVF and scene flow techniques work together to evaluate the 3D location of the obstacles through their centers.

The major contributions of the study are as follows:

Dense scene flow vectors are estimated by combining differential flow techniques with the gradient vector flow snake model.

To verify the accuracy and effectiveness of our approach, evaluation metrics including EPE, AAE, and STDAE are computed.

The efficiency of the approach is determined using two different 3D sensors, namely, PMD and Kinect sensors.

The proposed approach is tested for obstacle avoidance in the path planning module.

This article is organized into five sections. Section II describes the 3D vision sensors and vehicles, as well as the imaging model of the depth camera. Section III explains different scene flow techniques, along with the GVF snake model approach that enhances obstacle detection and yields dense flow vectors in 3D coordinates. Section IV describes experiments carried out on real data to evaluate the accuracy of dense flow estimation with a PMD camera; the acquired data are presented and discussed, and real-time experiments are then carried out for the path planning task. Section V summarizes the whole work.

System Overview

Our work uses the P3DX mobile robot as a reference platform because it offers the same workspace used by other researchers and because of its adaptability, dependability, and durability. It is a differential-drive vehicle with two motorized wheels of 16.5 cm diameter and one castor wheel. It has two encoders that monitor the distance traveled by each wheel and a swing radius of 26 cm. The vehicle is equipped with an embedded computer with a 1.8 GHz Intel Pentium M processor running Debian 6.0, as well as high-performance microcontrollers and embedded robot control software. The AscTec Pelican quadrotor is equipped with an AscTec Mastermind and an inertial measurement unit (IMU); the Mastermind has an Intel Core i7-3612QE processor and 4 GB DDR3 RAM and can perform calculations for optical flow, visual odometry, and other tasks. There are two 60 MHz ARM7 microcontrollers: a low-level processor (LLP) for attitude control and IMU data processing, and a high-level processor (HLP) for sensor fusion, speed, and position control.

To obtain depth/range information, a time-of-flight (TOF) PMD camera [20] is used. In a TOF camera (Figure 1a), the illumination source transmits a modulated light pulse, and the target distance is calculated from the time taken by the pulse to bounce off the target and return to the receiver. The range information is computed from the phase shift between the outgoing and incoming signals.

Figure 1:

System Overview. (a) PMD camera mounted on the P3DX. (b) Microsoft Kinect mounted on the AscTec Pelican quadrotor. P3DX, Pioneer 3DX.

The distance d to the target is expressed as follows:

$$d = \frac{c\,\phi}{4\pi\, f_{mod}} \quad (1)$$

where c is the speed of light, $f_{mod}$ is the modulation frequency, and ϕ refers to the signal phase shift.
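As a quick numerical illustration of Eq. (1), the following sketch converts a measured phase shift into a range estimate; the 20 MHz modulation frequency used in the example is an assumed value for illustration, not a calibration figure for the camera used here.

import math

C = 299_792_458.0  # speed of light in m/s

def tof_distance(phase_shift_rad: float, f_mod_hz: float = 20e6) -> float:
    """Range from the phase shift of the modulated signal, Eq. (1): d = c*phi / (4*pi*f_mod)."""
    return C * phase_shift_rad / (4.0 * math.pi * f_mod_hz)

# Example: a phase shift of pi/2 rad at 20 MHz gives roughly 1.87 m;
# the unambiguous range at this frequency is c / (2 * f_mod) = 7.5 m.
print(tof_distance(math.pi / 2))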

The complete image of the scene is formed on the camera's receiver optics, which have 200 × 200 pixels. Range data are obtained from each camera pixel and stored in a 2D array. The PMD camera is mounted on the top front of the P3DX (AGV) at a static tilt angle Ψ of the head to view the surroundings. The camera axis is nearly parallel to the ground.

At frame rates of up to 30 Hz, the Kinect sensor depicted in Figure 1b simultaneously captures color and depth images. The 640 × 480, 11-bit-per-pixel infrared (IR) camera offers 2,048 levels of sensitivity, while the RGB color camera delivers 640 × 480 pixel, 24-bit images at the highest frame rate [10]. The field of view (FOV) of the Kinect sensor is 45° vertically and 50° horizontally, with an approximate operational range of 0.5 m to 4 m. The IR emitter and IR camera use a structured light technique to create a seemingly random dot pattern. The depth image and RGB image can be accessed using the OpenNI package, because the Robot Operating System (ROS) is installed on the Ubuntu operating system to interact with the various software components. Launch files for using OpenNI-compliant devices such as the Microsoft Kinect are included in this package.

Imaging Model of Depth Camera

Figure 2 depicts a PMD CamCube 3.0 imaging model with the sources separated from the sensor. Ideally, the PMD camera has two light sources mounted symmetrically on either side of the smart pixel array to illuminate the obstruction and determine its depth. With a 40° [H] × 40° [V] FOV, the infrared light sources Ls1 and Ls2 are arranged spatially symmetrically about the camera, illuminating the obstruction with incident light rays. The light beams from the sources are incident on a point P(x, y, z) on the obstruction in three-dimensional space. As shown in Figure 2, the angles 1 and 2 are formed between the surface normal of the x–y plane and the rays traveling from Ls1 and Ls2 to P(x, y, z). With respect to Ls1 and Ls2, Dopt1 and Dopt2 are the distances that the light must travel to strike point P(x, y, z) on the obstruction, such that Ls1P = Dopt1 and Ls2P = Dopt2. The light reflected from point P forms an angle ζ with respect to a surface parallel to the lens center (Lc). The light ray is then refracted at Lc and travels LR = Dsen to reach the receiver pixel (po) on the smart pixel array. At point Pc on the sensor array, this refracted ray forms an angle ζ with the surface. The focal length f of the camera determines the separation between the imaging lens and the sensor array. Depending on its location in 3D space, each point has a unique travel time.

Figure 2:

Depth Camera Model.

The depth information $D_{model}(x,y,z)$ can be modeled as follows:

$$D_{model}(x,y,z) = \left(\frac{D_{opt1} + D_{opt2}}{2} + D_{ref} + D_{sen}\right)\Big/ 2 \quad (2)$$
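A minimal sketch of Eq. (2) is given below; the source, lens, and pixel positions are purely illustrative, and $D_{ref}$ is taken here as the distance from P back to the lens center Lc, which is an assumption about the model's geometry rather than calibration data for the CamCube 3.0.

import numpy as np

def depth_model(P, Ls1, Ls2, Lc, pixel):
    """D_model(x,y,z) = ((D_opt1 + D_opt2)/2 + D_ref + D_sen) / 2, Eq. (2)."""
    P, Ls1, Ls2, Lc, pixel = map(np.asarray, (P, Ls1, Ls2, Lc, pixel))
    d_opt1 = np.linalg.norm(P - Ls1)    # source Ls1 to point P
    d_opt2 = np.linalg.norm(P - Ls2)    # source Ls2 to point P
    d_ref = np.linalg.norm(Lc - P)      # reflected ray from P to the lens center (assumed)
    d_sen = np.linalg.norm(pixel - Lc)  # refracted ray from the lens to receiver pixel p_o
    return ((d_opt1 + d_opt2) / 2.0 + d_ref + d_sen) / 2.0

# Illustrative geometry in meters: sources 10 cm on either side of the lens,
# point P about 1.5 m in front of the camera, pixel just behind the lens.
print(depth_model(P=[0.2, 0.0, 1.5], Ls1=[-0.1, 0.0, 0.0], Ls2=[0.1, 0.0, 0.0],
                  Lc=[0.0, 0.0, 0.0], pixel=[0.001, 0.0, -0.0128]))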

The TOF camera allows additional depth information and depth data acquisition. In the experimental setup, a PMD CamCube 3.0 camera is mounted on a horizontal rail on a table, allowing perpendicular motion toward and away from a fixed board. Depth and amplitude images are measured at different distances between the plain perpendicular board and the camera, namely 500 mm, 750 mm, 1000 mm, 1250 mm, 1500 mm, 1750 mm, 2000 mm, 2250 mm, and 2500 mm. These measured depth images are plotted in Figure 3.

Figure 3:

PMD Depth Image Analysis with Barrel Distortions. The distance between the PMD camera and the fixed board is: (a) 500 mm (b) 750 mm (c) 1000 mm (d) 1250 mm (e) 1500 mm (f) 1750 mm (g) 2000 mm (h) 2250 mm (i) 2500 mm.

Scene Flow Estimation

The local image motion of an image sequence is approximated by optical flow [4], which is based on local derivatives. That is, in 2D, it specifies how much each image pixel moves between adjacent images, whereas in 3D, it specifies how much each volume voxel moves between adjacent volumes. Temporal variations in image brightness are caused by moving patterns. Typically, a brightness constancy constraint equation (BCCE) is used to calculate optical flow. To determine the optical flow, differential techniques use the spatiotemporal derivatives of image intensity.

Just as optical flow is the 2D motion field on an image, scene flow [27] is the three-dimensional motion field of points in the world. Scene flow is critical for estimating the motion of surrounding objects in real time, enabling AVs to navigate safely and efficiently. Any optical flow is merely the projection of the scene flow onto a camera's image plane. If the world is wholly non-rigid, the motions of the objects in the scene might all be independent of one another. Consequently, the scene motion is a dense three-dimensional vector field defined for each point on each scene surface. Differential optical flow techniques can typically be divided into local and global categories. The former involves local optimization, as in the Lucas–Kanade method; the latter determines the flow vector by minimizing a global energy, as in the Horn–Schunck approach. Global techniques produce 100% dense optical flow fields but are much more sensitive to noise, whereas local methods are robust to noise but cannot produce dense optical flow fields. The suggested approach combines the local and global 3D scene flow methods with the gradient vector flow snake model to estimate dense flow information and to lessen the over-smoothing caused by the regularization term.

Combined local and global approach

The Lucas–Kanade algorithm [21] is an intensity-based differential technique that operates under the assumption that the flow vector is constant within a pixel neighborhood. The flow vectors are determined by fitting a constant model for $\vec{V}$ in each spatial neighborhood to a set of weighted least-squares local first-order constraints [6]. Eq. (3) is minimized to obtain the velocity estimate:

$$\sum_{x,y,z \in \Omega} W^2(x,y,z)\left[\nabla R(x,y,z,t)\cdot\vec{V} + R_t(x,y,z,t)\right]^2 \quad (3)$$

where W(x, y, z) denotes a Gaussian windowing function. The velocity estimate is given by Eq. (4):

$$\vec{V} = \left[A^T W^2 A\right]^{-1} A^T W^2 B \quad (4)$$

where

$$A = \left[\nabla R(x_1,y_1,z_1), \ldots, \nabla R(x_n,y_n,z_n)\right] \quad (5)$$

$$W = \mathrm{diag}\left[W(x_1,y_1,z_1), \ldots, W(x_n,y_n,z_n)\right] \quad (6)$$

$$B = -\left(R_t(x_1,y_1,z_1), \ldots, R_t(x_n,y_n,z_n)\right) \quad (7)$$
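A minimal numerical sketch of the weighted least-squares solve in Eqs. (3)–(7) follows; it assumes the spatiotemporal derivatives R_x, R_y, R_z, R_t and the Gaussian weights have already been computed for the n voxels of a neighborhood Ω, and the random inputs are placeholders rather than camera data.

import numpy as np

def local_flow(Rx, Ry, Rz, Rt, W):
    """Weighted least-squares flow for one neighborhood Omega.
    Rx, Ry, Rz, Rt, W: 1D arrays of length n (derivatives and Gaussian weights).
    Returns V = [Vx, Vy, Vz] from Eq. (4): V = (A^T W^2 A)^{-1} A^T W^2 B."""
    A = np.stack([Rx, Ry, Rz], axis=1)  # n x 3 gradient matrix, Eq. (5)
    W2 = np.diag(W ** 2)                # squared Gaussian window, Eq. (6)
    B = -Rt                             # negated temporal derivatives, Eq. (7)
    return np.linalg.solve(A.T @ W2 @ A, A.T @ W2 @ B)

# Toy neighborhood of n = 27 voxels with random derivatives and uniform weights
rng = np.random.default_rng(0)
n = 27
print(local_flow(rng.standard_normal(n), rng.standard_normal(n),
                 rng.standard_normal(n), rng.standard_normal(n), np.ones(n)))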

The gradient constraints are combined with a global smoothness term in the Horn–Schunck algorithm [18]. The flow velocity can be calculated by minimizing the squared error of the constraint equation together with a smoothness constraint. The global smoothness condition is given as follows:

$$\left\|\nabla V_x\right\|^2 + \left\|\nabla V_y\right\|^2 + \left\|\nabla V_z\right\|^2 = \sum_{i=x,y,z}\left(\frac{\partial V_i}{\partial x}\right)^2 + \left(\frac{\partial V_i}{\partial y}\right)^2 + \left(\frac{\partial V_i}{\partial z}\right)^2 \quad (8)$$

Eq. (9) specifies the error that must be minimized:

$$E^2 = \int_D \left(\nabla R\cdot\vec{V} + R_t\right)^2 + \alpha^2\left(\left\|\nabla V_x\right\|^2 + \left\|\nabla V_y\right\|^2 + \left\|\nabla V_z\right\|^2\right)\,dx\,dy\,dz \quad (9)$$

where α is a weighting term that determines the influence of the smoothness constraint.

The combined differential approach applies the weighted least-squares fit of local constraints from the local method [21] to the global smoothness constraint and the constant model for flow velocity [18], so that the error to be minimized becomes:

$$E^2 = \int_D W_N^2\left(\nabla R\cdot\vec{V} + R_t\right)^2 + \alpha^2\left(\left\|\nabla V_x\right\|^2 + \left\|\nabla V_y\right\|^2 + \left\|\nabla V_z\right\|^2\right)\,dx\,dy\,dz \quad (10)$$

where $W_N(x, y, z)$ denotes a Gaussian windowing function. Here, $A^T W^2 A$ is calculated as follows:

$$A^T W^2 A = \begin{pmatrix} \Sigma W^2 R_x^2 & \Sigma W^2 R_x R_y & \Sigma W^2 R_x R_z \\ \Sigma W^2 R_x R_y & \Sigma W^2 R_y^2 & \Sigma W^2 R_y R_z \\ \Sigma W^2 R_x R_z & \Sigma W^2 R_y R_z & \Sigma W^2 R_z^2 \end{pmatrix} \quad (11)$$

The velocity estimates can be solved through an iterative process [6]:

$$V_x^{n+1} = \bar{V}_x^{n} - \frac{W_N^2 R_x\left(R_x \bar{V}_x^{n} + R_y \bar{V}_y^{n} + R_z \bar{V}_z^{n} + R_t\right)}{\alpha^2 + W_N^2\left(R_x^2 + R_y^2 + R_z^2\right)} \quad (12)$$

$$V_y^{n+1} = \bar{V}_y^{n} - \frac{W_N^2 R_y\left(R_x \bar{V}_x^{n} + R_y \bar{V}_y^{n} + R_z \bar{V}_z^{n} + R_t\right)}{\alpha^2 + W_N^2\left(R_x^2 + R_y^2 + R_z^2\right)} \quad (13)$$

$$V_z^{n+1} = \bar{V}_z^{n} - \frac{W_N^2 R_z\left(R_x \bar{V}_x^{n} + R_y \bar{V}_y^{n} + R_z \bar{V}_z^{n} + R_t\right)}{\alpha^2 + W_N^2\left(R_x^2 + R_y^2 + R_z^2\right)} \quad (14)$$

The velocities $(V_x^{n+1}, V_y^{n+1}, V_z^{n+1})$ are calculated from the derivative estimates and the averages $(\bar{V}_x^{n}, \bar{V}_y^{n}, \bar{V}_z^{n})$ of the previous velocity estimates.
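The iteration can be sketched as follows; the local averaging with a 3 × 3 × 3 mean filter, the step count, and the α value are illustrative choices, and the derivative volumes are assumed to be precomputed.

import numpy as np
from scipy.ndimage import uniform_filter

def clg_flow(R_derivs, WN2, alpha=0.01, n_iters=100):
    """Iterative combined local-global update of Eqs. (12)-(14).
    R_derivs: tuple of volumes (Rx, Ry, Rz, Rt); WN2: squared Gaussian window volume."""
    Rx, Ry, Rz, Rt = R_derivs
    denom = alpha ** 2 + WN2 * (Rx ** 2 + Ry ** 2 + Rz ** 2)
    Vx, Vy, Vz = np.zeros_like(Rx), np.zeros_like(Rx), np.zeros_like(Rx)
    for _ in range(n_iters):
        # local averages of the previous velocity estimate
        Vx_bar = uniform_filter(Vx, size=3)
        Vy_bar = uniform_filter(Vy, size=3)
        Vz_bar = uniform_filter(Vz, size=3)
        residual = Rx * Vx_bar + Ry * Vy_bar + Rz * Vz_bar + Rt
        Vx = Vx_bar - WN2 * Rx * residual / denom
        Vy = Vy_bar - WN2 * Ry * residual / denom
        Vz = Vz_bar - WN2 * Rz * residual / denom
    return Vx, Vy, Vz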

The spatial and temporal derivatives (Rx, Ry, Rz, Rt) are computed using the 3D GVF snake model, which is described in more detail in the following subsection, to reduce artifacts brought on by the speckle noise throughout an image frame.

3D GVF snake model

The GVF [30] is an external force field for ACs, which are curves that move within an image; an AC is also referred to as a snake or deformable model. It is widely used for boundary detection and segmentation and evolves by minimizing its energy. The contour is deformed iteratively to fit the object's boundary while minimizing an energy function. The energy function consists of two terms: an internal energy term that penalizes the contour's bending and stretching, and an external energy term that attracts the contour to the object's edges. The GVF snake model uses gradient vector flow to compute the external energy term. The gradient vector flow is a vector field that points in the direction of the maximum gradient of the image. By using the gradient vector flow, the GVF snake model can track objects with complex shapes and irregular boundaries. The GVF is employed to compute the shifts in pixel values along the x, y, and z axes. The GVF is the vector field $\vec{G}(x,y,z) = [u(x,y,z), v(x,y,z), w(x,y,z)]$ that minimizes the energy function [26]:

$$\varepsilon_f = \iiint \rho\left|\nabla\vec{G}\right|^2 + \left|\nabla f\right|^2\left|\vec{G} - \nabla f\right|^2\,dx\,dy\,dz$$

The trade-off between the first and second terms in the integrand is controlled by the parameter ρ, where f represents the 3D volume, ∇ stands for the gradient operator, and ρ is the regularization parameter. The following Euler equations must be solved to determine this vector field:

$$\rho\nabla^2 u = \left(u - f_x\right)\left|\nabla f\right|^2$$
$$\rho\nabla^2 v = \left(v - f_y\right)\left|\nabla f\right|^2$$
$$\rho\nabla^2 w = \left(w - f_z\right)\left|\nabla f\right|^2$$

where ∇² stands for the Laplacian operator, $\left|\nabla f\right|^2 = f_x^2 + f_y^2 + f_z^2$, and $(f_x, f_y, f_z)$ are the edge maps of the image in the x, y, and z directions, respectively. Treating u, v, and w as functions of time, the Euler equations are extended as [28]:

$$u(t+\delta t) = u(t) + \rho\nabla^2 u(t) - \left(u(t) - f_x\right)\left|\nabla f\right|^2$$
$$v(t+\delta t) = v(t) + \rho\nabla^2 v(t) - \left(v(t) - f_y\right)\left|\nabla f\right|^2$$
$$w(t+\delta t) = w(t) + \rho\nabla^2 w(t) - \left(w(t) - f_z\right)\left|\nabla f\right|^2$$

If the gradient of the edge map, ∇f, is large, the second term dominates εf; where ∇f is small, the first term dominates.
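The GVF field computation can be sketched with a few lines of array code; the step size δt, the iteration count, and the use of np.gradient and scipy's Laplacian are illustrative choices standing in for whatever discretization the original implementation uses.

import numpy as np
from scipy.ndimage import laplace

def gvf3d(f, rho=0.1, n_iters=80, dt=0.2):
    """Gradient vector flow G = (u, v, w) of a 3D edge map f, iterating the
    time-discretized Euler equations above."""
    fx, fy, fz = np.gradient(f.astype(float))
    mag2 = fx ** 2 + fy ** 2 + fz ** 2         # |grad f|^2
    u, v, w = fx.copy(), fy.copy(), fz.copy()  # initialize with the edge-map gradient
    for _ in range(n_iters):
        # smooth where |grad f| is small, stay close to grad f where it is large
        u += dt * (rho * laplace(u) - (u - fx) * mag2)
        v += dt * (rho * laplace(v) - (v - fy) * mag2)
        w += dt * (rho * laplace(w) - (w - fz) * mag2)
    return u, v, w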

To deliver a smooth vector field [16], the energy function must be minimized. The vector fields give information about changes in the neighboring field. The Photonic Mixer Device (PMD) camera presents a simple and compact setup in addition to providing adequate information about the obstacles. To determine the moving obstacle's speed, a new algorithm is used: as soon as the moving obstacle enters the camera's FOV, the path planning algorithm begins to calculate its motion from the vertices (x, y) and saves them in a temporary list; finally, an array containing the calculated vertices of the moving obstacle is created. The range information is used to calculate the moving obstacle's speed and direction, as sketched below.
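A minimal sketch of this speed and direction computation is given below, assuming the snake's contour vertices per frame and a 10 fps frame interval as inputs; the actual implementation runs in C++ against the PMD data stream.

import numpy as np

def obstacle_motion(vertices_prev, vertices_curr, frame_dt=0.1):
    """Estimate the speed and direction of a moving obstacle from the contour
    vertices extracted in two consecutive frames (e.g., by the GVF snake).
    vertices_*: (N, 3) arrays of (x, y, z) boundary points in meters;
    frame_dt: time between frames in seconds (10 fps -> 0.1 s)."""
    c_prev = np.mean(np.asarray(vertices_prev, dtype=float), axis=0)  # obstacle center
    c_curr = np.mean(np.asarray(vertices_curr, dtype=float), axis=0)
    displacement = c_curr - c_prev
    speed = np.linalg.norm(displacement) / frame_dt                    # m/s
    direction = displacement / (np.linalg.norm(displacement) + 1e-12)  # unit vector
    return speed, direction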

Results
Quantitative results

To evaluate the performance of the method, we conducted a set of experiments with the Middlebury datasets (RubberWhale, Dimetrodon, and Urban2), as shown in Figure 4.

Figure 4:

Middlebury datasets: (a) Rubber Whale (b) Dimetrodon (c) Urban2.

To compare the flow vectors computed by the various methods against the ground truth of the Middlebury datasets [4], the AAE, STDAE, and EPE are measured:

$$AAE\left(W_e, W_{gt}\right) = \arccos\left(\frac{W_e^T W_{gt}}{\left\|W_e\right\|\cdot\left\|W_{gt}\right\|}\right)$$

$$STDAE = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(AE_i - \overline{AE}\right)^2}$$

where AE stands for angular error and n stands for the number of samples, with

$$\overline{AE} = \frac{1}{n}\sum_{i=1}^{n} AE_i, \qquad EPE = \sqrt{\left(u_e - u_{gt}\right)^2 + \left(v_e - v_{gt}\right)^2}$$

where $W_e = (u_e, v_e, 1)$ denotes the estimated flow vector and $W_{gt} = (u_{gt}, v_{gt}, 1)$ is the ground truth flow vector. The errors are listed in Table 1 and plotted in Figure 5.
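For example, the three metrics can be computed from estimated and ground-truth (u, v) fields as in the sketch below; loading the Middlebury ground truth and masking invalid pixels are assumed to have been done beforehand.

import numpy as np

def flow_errors(u_e, v_e, u_gt, v_gt):
    """AAE, STDAE and mean EPE between estimated and ground-truth flow fields,
    using the augmented vectors W = (u, v, 1) as in the equations above."""
    dot = u_e * u_gt + v_e * v_gt + 1.0
    norm_e = np.sqrt(u_e ** 2 + v_e ** 2 + 1.0)
    norm_gt = np.sqrt(u_gt ** 2 + v_gt ** 2 + 1.0)
    ae = np.arccos(np.clip(dot / (norm_e * norm_gt), -1.0, 1.0))  # per-pixel angular error
    aae = ae.mean()
    stdae = ae.std(ddof=1)  # 1/(n-1) normalization, as in the STDAE definition
    epe = np.sqrt((u_e - u_gt) ** 2 + (v_e - v_gt) ** 2).mean()
    return aae, stdae, epe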

Table 1: Errors for Different Methods Compared with Middlebury Data Using MATLAB (regularization parameter in GVF μ = 0.01, weighting term α = 0.01).

Method        | RubberWhale            | Dimetrodon             | Urban2
              | AAE    STDAE    EPE    | AAE    STDAE   EPE     | AAE    STDAE   EPE
Lucas–Kanade  | 0.3128 0.25895  0.596  | 0.6691 0.3653  1.535   | 0.8884 0.5021  8.000
Horn–Schunck  | 0.3151 0.24985  0.603  | 0.7141 0.3602  1.625   | 0.9601 0.6128  8.038
LK–HS         | 0.3059 0.2625   0.586  | 0.6632 0.5202  1.627   | 0.8873 0.7030  8.064
LK–HS–GVF     | 0.3006 0.2585   0.576  | 0.6291 0.3086  1.526   | 0.8765 0.5381  7.870

AAE, average angular error; EPE, endpoint error; STDAE, standard deviation angular error.

Figure 5:

Error Plots for Middlebury Datasets (a) Rubberwhale Dataset (b) Dimetrodon Dataset (c) Urban2 Dataset.

Scene flow from RGB-depth images (using the Kinect sensor)

Numerous tests using real Kinect depth data are conducted with various image sizes and lighting conditions to assess the effectiveness of our dense scene flow estimation method. The OpenNI library is used to access the depth information, and the OpenCV library is used to process the image sequences.

The proposed dense scene flow method is evaluated in real-world scenarios by moving a paperboard backward and forward in front of the sensor while varying the lighting (light fully on and light partially on). Screenshots of the dense scene flow in the X–Y, X–Z, and Y–Z directions under different lighting conditions are shown in Figures 6 and 7. The findings demonstrate that the method can function effectively with low-texture objects and in any lighting situation.

Figure 6:

Dense Scene flow in Various Directions (Bright Light); the Coordinate on the Bottom Left is the Camera Coordinate. (a) The x-flow and y-flow components are horizontal and vertical, respectively. To conduct this experiment, a 45º translation of the paperboard's x and y coordinates was made in the direction of the camera plane. (b) The z-flow component is horizontal, and the x-flow component is vertical. For this experiment, a combined motion in y and z coordinates was used to translate the paperboard in the direction of the camera plane. (c) The y-flow component is horizontal, and the z-flow component is vertical. To conduct this experiment, a combined motion in x and z coordinates was applied to the paperboard to translate it toward the camera plane.

Figure 7:

Dense Scene Flow in Various Directions (Partial Light); the Coordinate on the Bottom Left Is the Camera Coordinate. (a) The x-flow and y-flow components are horizontal and vertical, respectively. The paperboard was moved toward the camera plane with a 45º motion in the x and y coordinates. (b) The horizontal component is z-flow and the vertical component is x-flow. In this experiment, the paperboard was translated toward the camera plane with a combined motion in y and z coordinates. (c) Z-flow is the horizontal component, and x-flow is the vertical component. For this experiment, a combined motion in y and z coordinates was used to translate the paperboard in the direction of the camera plane.

Dense Flow Estimation (using PMD Camera)

A person enters the scene from the opposite end and walks toward the Autonomous Ground Vehicle (AGV). The objective is to identify and gauge the person's movement; consequently, the camera is used to calculate the relative distance between the vehicle and the moving subject. The Z axis of the camera frame is defined along the optical axis, the X and Y axes along the image axes, and positive Z points in the viewing direction. The camera frame's origin is at the optical center. In the camera reference frame, a surface point's coordinates at time (t − 1) are (X′, Y′, Z′), and at time (t) they are (X, Y, Z). Using the scene flow technique, the 3D information is used to estimate the resultant velocity as the camera detects the person's coordinates at different frame intervals. The density of the flow vectors between two frame sequences computed using the various techniques is shown in Figure 8.

Figure 8:

Flow Vectors Shown Using Various Methods, with Close Objects in Red and Far Objects in Blue. (a) Depth frame 1 with color coding. (b) Depth frame 2 with color coding. (c) Moving obstacle segmentation. (d) LK. (e) HS. (f) Combined LK–HS. (g) Combined LK–HS with GVF. GVF, gradient vector flow; LK, Lucas–Kanade; HS, Horn–Schunck.

Experimental Results

We experimented with different scenarios to gauge the performance of the suggested algorithm. The three-dimensional coordinates (X, Y, Z) of the PMD camera are used to reconstruct the three-dimensional scene flow. The dynamic obstacle is allowed to move at different known speeds within the FOV of the camera. The PMD camera is used as an external sensor, with a scene flow frame rate of 10 frames per second (fps).

In Experiment I, we implement the scene flow technique using the ARIA library, a C++-based Pioneer mobile robot platform, together with GLFW (an OpenGL windowing library) to view 3D renderings of the PMD range data. The P3DX robot moves diagonally from left to right with respect to the 3D PMD camera coordinates, which are depicted as pale green lines in Figures 9 and 10. These motion field estimation results are shown for consecutive frames of moving obstacles. In these sequences, the obstacle's estimated position and orientation are represented by red lines; the obstacle's estimated speed is 200 mm/s with a 10% offset.

Figure 9:

Experiment I: (a–c) Range Images with 200 × 200 Pixels; (d–f) Grayscale Images.

Figure 10:

Experiment I: 3D Renderings of PMD Camera on GLFW Window. (a) GLFW Frame 1.(b) GLFW Frame 2. (c) GLFW Frame 3.

Then, a moving obstacle is allowed to move within the FOV of the camera, and different frames are captured, as shown in Figure 11. The moving obstacle's orientation can also be identified, and Figure 12 illustrates its location vertices in three dimensions. The results reported in [15] demonstrate that the suggested method can detect moving obstacles in real-world scenarios and evaluate their kinematic behavior. Figure 13 delineates the dense flow vectors extracted by combining scene flow and the GVF snake model on subsequent frames to segment a moving obstacle. The gradient vector flow of the 3D image is calculated with a regularization parameter ρ of 0.1 and α of 0.001. Thus, the proposed technique recognizes the orientation of moving obstacles and provides information about the structure of the environment; it is therefore used for dynamic path planning in the following experiments.

Figure 11:

Different Frames of a Moving Obstacle. (A–J) Frame 1 to Frame 10.

Figure 12:

Obstacle Position in 3D Plot.

Figure 13:

Obstacle Segmentation: Dense Flow Vectors in 3D Plot.

The P3DX controller's parameters are left unchanged during the experiment. By comparing features in successive images, the ego-motion of the PMD camera mounted on the P3DX can be calculated. By comparing two consecutive frames, the "Good Features to Track" feature detection algorithm stabilizes the moving P3DX. The goal of the mission is to use the graph search (Efficient D* lite [14]) algorithm on the P3DX's onboard computer to plan a collision-free path for the P3DX to its desired position. To visualize 3D renderings of the PMD range data, the ARIA library, a C++-based Pioneer mobile robot platform, and GLFW (an OpenGL windowing library) are used.
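The replanning behavior can be illustrated with the simplified grid-based sketch below; it substitutes plain A* with full replanning for the Efficient D* lite of [14], and the grid size, start, goal, and simulated obstacle are illustrative values only.

import heapq

def astar(grid, start, goal):
    """Plain A* on a 2D occupancy grid (0 = free, 1 = occupied), 4-connected."""
    rows, cols = len(grid), len(grid[0])
    heuristic = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(heuristic(start), 0, start, None)]
    came_from, g_best = {}, {start: 0}
    while open_set:
        _, g, node, parent = heapq.heappop(open_set)
        if node in came_from:
            continue                      # already expanded
        came_from[node] = parent
        if node == goal:                  # reconstruct the path back to the start
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dx, node[1] + dy)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                ng = g + 1
                if ng < g_best.get(nxt, float("inf")):
                    g_best[nxt] = ng
                    heapq.heappush(open_set, (ng + heuristic(nxt), ng, nxt, node))
    return None

# Replanning loop: follow the current path and re-run the planner whenever the
# sensor reports a newly occupied cell (here an obstacle "appears" at (2, 5)).
grid = [[0] * 10 for _ in range(5)]
pos, goal = (2, 0), (2, 9)
while pos != goal:
    path = astar(grid, pos, goal)
    for step, cell in enumerate(path[1:], start=1):
        if step == 3 and grid[2][5] == 0:  # simulated obstacle detection ahead
            grid[2][5] = 1                 # mark the cell occupied and replan
            break
        pos = cell
print("reached", pos)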

As shown in Figure 14, the experiment involves performing an autonomous path planning task while exposing the PMD camera to dim night-time lighting conditions. The Efficient D* lite algorithm, implemented on the P3DX's onboard computer, is tasked with creating a collision-free path that brings the P3DX to its desired position. For this experiment, the AGV's maximum linear speed is 400 mm/s. The P3DX must move from position (0, 0) to position (9 m, 0), dodging two static obstacles (a tripod at (2.5, 0) and an obstacle at (5.8, 0), with (x, y) given in meters). When it discovers the obstacles, the P3DX adjusts its route. Real-time range data from the sensor are displayed as range intensity frames in Figures 15a–d and as 3D point clouds in Figures 15e,f. The tripod and the second obstacle are detected while navigating to the destination, and the P3DX completes the task (Figure 15g) without running into any obstacles.

Figure 14:

Experiment II: Path Planning with Two Irregular Shaped Obstacles. (A–E) Frame 1 to Frame 5.

Figure 15:

Experiment II: Real-Time Rendering of PMD Range Data Using GLFW. (a) Range intensity: Tripod (far). (b) Range intensity: Tripod (near). (c) Range intensity: P3DX(2) (far). (d) Range intensity: P3DX(2) (near). (e) 3D point cloud of obstacle: tripod. (f) 3D point cloud of obstacle: P3DX(2). (g) Experiment II: plot of AGV's X-coordinates vs Y-coordinates. P3DX, Pioneer 3DX.

Finally, Experiment III is carried out in a situation with a moving obstacle. With its X and Y coordinates in meters, the P3DX is programmed to move from an initial position of (0, 0) to a goal position of (9.0 m, 0). The scene flow technique estimates that the dynamic obstacle moves at a speed of 694 mm/s and is oriented slightly to the right of the P3DX. Anticipating a possible collision at the coordinates (4.5 m, 0), the path planning algorithm alters the P3DX's path to the left to avoid the moving obstacle. Figure 16 shows renderings of the processed sensor data.

Figure 16:

Experiment III on Moving Obstacle: 3D Renderings of PMD Camera.

Conclusion

Generally, scene flow is a method for examining how objects move within a scene: it entails calculating the depth and 3D motion of objects in a succession of images or video frames. The GVF snake model, in turn, is a method for object tracking and contour detection. It is based on the theory of ACs, also known as snakes, which are curves that evolve over time to fit the boundaries of objects in an image. The primary benefit of the GVF snake model is that it can reliably track objects with intricate shapes and textures, even in the presence of noise or partial occlusion. Our approach combines these techniques to create dense flow vectors from the depth data of a 3D camera, and the present experiments show a novel framework for dense scene flow estimation and obstacle detection. To identify and segment the boundary of the obstacles, the GVF snake model is used to enhance obstacle detection and to obtain dense flow vectors in 3D coordinates. To raise the autonomy of the AV's intelligence, these data can be widely applied to the path planning module. The proposed method demonstrates efficient and accurate detection and estimation of the position and direction of obstacles. As a result, the proposed dense scene flow algorithm can be used for AVs that navigate their surroundings and interact with objects in a more human-like way, allowing them to understand the environment, detect and track obstacles, and plan their path accordingly.
