
Introduction

3D reconstruction is an important task in 3D computer vision that aims to recover and rebuild 3D objects or 3D scenes so that the resulting models can be processed by computers and expressed digitally. It is a key technology for modeling and representing the objective world with a computer, covering tasks such as object reconstruction in 3D scenes, dynamic reconstruction of the human body, and reconstruction of large buildings. A typical 3D reconstruction pipeline includes depth data acquisition, preprocessing, feature extraction and matching, and point cloud fusion, turning real scenes into mathematical models that a computer can process logically. Its applications cover cultural relic protection, game development, architectural design, aerospace, shipbuilding, archaeology, robotics, virtual reality (VR), augmented reality (AR), reverse engineering, computer animation, surveying and mapping, engineering measurement, clinical medicine and other fields, where it plays an auxiliary role. A 3D reconstruction algorithm based on SLAM uses the camera as the environment-sensing device, turning SLAM into visual SLAM. In this way it not only reconstructs the scene but also recovers the motion trajectory of the camera while the image data are acquired. It is therefore a 3D reconstruction approach that combines localization and mapping, and it has good application prospects.

Overview of ORB-SLAM

ORB-SLAM2 (Oriented FAST and Rotated BRIEF SLAM2) is a visual SLAM system based on image features and nonlinear optimization. It includes ORB feature extraction, an ORB bag-of-words model for place recognition and loop detection, a covisibility graph for data association, and the general graph optimization framework g2o. Scale-aware loop detection is applied to large-scale map construction. The algorithm uses only ORB features for detection and description throughout the whole pipeline, which improves place recognition and loop detection. This paper summarizes the main threads of the ORB-SLAM algorithm: the tracking thread, the local mapping thread, the loop closing thread and the dense mapping thread, as shown in Figure 1.

Figure 1.

Key threads for ORB-SLAM

Tracking thread

The tracking thread extracts ORB features from the RGB images of the depth camera and matches them between frames, and triangulation is used to associate the features with the depth and 3D coordinates of map points. Subsequent images track the camera pose through the motion model, the reference keyframe or relocalization, and the current pose is optimized by minimizing the reprojection error. Finally, the thread decides whether to create a new keyframe according to preset conditions.
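As a brief illustration of this front end (this is not the ORB-SLAM2 source, just a minimal OpenCV sketch with placeholder image files), ORB features can be extracted from two frames and matched by Hamming distance as follows:

// Illustrative OpenCV sketch (not the ORB-SLAM2 source); image files are placeholders.
#include <opencv2/opencv.hpp>
#include <vector>

int main() {
    cv::Mat img1 = cv::imread("frame1.png", cv::IMREAD_GRAYSCALE);
    cv::Mat img2 = cv::imread("frame2.png", cv::IMREAD_GRAYSCALE);
    if (img1.empty() || img2.empty()) return -1;

    // Oriented FAST keypoints and rotated BRIEF descriptors.
    cv::Ptr<cv::ORB> orb = cv::ORB::create(1000);
    std::vector<cv::KeyPoint> kp1, kp2;
    cv::Mat des1, des2;
    orb->detectAndCompute(img1, cv::noArray(), kp1, des1);
    orb->detectAndCompute(img2, cv::noArray(), kp2, des2);

    // Brute-force matching of binary descriptors; cross-checking rejects many outliers.
    cv::BFMatcher matcher(cv::NORM_HAMMING, true);
    std::vector<cv::DMatch> matches;
    matcher.match(des1, des2, matches);

    // Together with the depth channel, the matched pairs give the 2D-3D
    // correspondences whose reprojection error is minimized to refine the pose.
    return 0;
}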

Local mapping thread

The local mapping thread processes the keyframes created by the tracking thread: it updates the correspondence between map points and keyframes, deletes newly added map points with too few observations, recovers new map points by triangulating keyframes with a high degree of covisibility, and checks for duplicate map points between a keyframe and its neighbors. When all keyframes in the queue have been processed, local bundle adjustment is performed on the current keyframe, its adjacent keyframes and the observed map points, and the keyframe poses and map point positions are refined by minimizing the reprojection error. In terms of system structure, the whole mapping process is divided into three parts: the first part is the driver, responsible for the sensor (replaced here by the dataset); the second part is pose estimation, in which ORB-SLAM2 takes the camera data and outputs the camera pose; the last part is the drawing module, which recovers the 3D scene and provides an environment map for robot navigation in complex scenes.

Loop closing thread

After the local mapping thread inserts a keyframe, loop closing consists of three steps: loop detection, computation of the similarity transformation, and loop correction. Loop detection selects candidate keyframes by their bag-of-words similarity scores; a similarity transformation matrix is then computed for each candidate keyframe, and the most consistent keyframe is selected by random sampling. Finally, the keyframe poses are optimized over the Essential Graph and a global bundle adjustment is performed to obtain a globally consistent environment map and camera trajectory.

Dense mapping thread

The dense mapping thread first searches the depth range of the map points for each keyframe and performs cost matching within that depth range to obtain an initial depth for the keyframe. Based on the assumption that adjacent pixels have similar depths, the initial depth map is obtained by fusing the inverse depths of adjacent pixels and filling vacant pixels; the depth information of adjacent keyframes is then fused for further optimization, and the final depth map is obtained by intra-frame filling and outlier elimination. Finally, the dense 3D reconstruction of the indoor environment is obtained by point cloud splicing. The specific steps are as follows:

Step 1: Estimate the scene depth range. The image obtained by the RGB camera of the depth sensor is taken as the keyframe, and every map point observed by the keyframe is projected into the keyframe image. The depth of each map point in the keyframe coordinate system is computed, and the maximum and minimum depths are used to set the inverse depth search range of the scene (ρmin, ρmax); a sketch of this step is given after the list of steps.

Step 2: Feature point matching: extract feature points during the camera motion and match them between frames.

Step 3: Fill and refine the depth map by smoothing within the keyframe and eliminating outliers.

Step 4: Optimize the depth information of the keyframe based on the keyframe pose computed by the tracking thread.

Step 5: Smooth the keyframe, remove outliers, and fill and refine the depth map obtained from the depth data.
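As referenced in Step 1, the following sketch (with assumed variable names, not code taken from ORB-SLAM2) projects the map points observed by a keyframe into its camera frame and derives the inverse depth search range (ρmin, ρmax):

// Illustrative sketch of Step 1 (variable names are assumptions): project the map
// points observed by a keyframe into its camera frame and derive the inverse
// depth search range (rho_min, rho_max).
#include <Eigen/Dense>
#include <algorithm>
#include <limits>
#include <utility>
#include <vector>

std::pair<double, double> InverseDepthRange(
        const std::vector<Eigen::Vector3d>& map_points_w,  // map points in the world frame
        const Eigen::Matrix3d& Rcw,                        // world-to-camera rotation
        const Eigen::Vector3d& tcw) {                      // world-to-camera translation
    double dmin = std::numeric_limits<double>::max();
    double dmax = 0.0;
    for (const Eigen::Vector3d& pw : map_points_w) {
        double z = (Rcw * pw + tcw).z();   // depth of the map point in this keyframe
        if (z <= 0.0) continue;            // point behind the camera: skip
        dmin = std::min(dmin, z);
        dmax = std::max(dmax, z);
    }
    if (dmax <= 0.0) return {0.0, 0.0};    // no valid observation
    // rho_min comes from the farthest point, rho_max from the nearest one.
    return {1.0 / dmax, 1.0 / dmin};
}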

RGBD+IMU sensor information fusion

The dense 3D point cloud reconstruction based on fusing the depth sensor and inertial data in the ORB-SLAM2 algorithm adds the angular velocity and acceleration measurements of the IMU to the RGB and depth inputs, and the RGB data and IMU data are tightly coupled. An optimization-based approach is adopted for the three parallel threads of ORB-SLAM2 (tracking, mapping and loop detection). Adding the angular velocity and acceleration of the IMU improves the accuracy and real-time performance of the pose estimation of the mobile device, while the depth data are used to build the dense point cloud; under the combined action of both, real-time 3D reconstruction of the indoor scene is achieved. Figure 2 shows the structure of 3D reconstruction by fusion of depth data and IMU data:

Figure 2.

Structure diagram of 3D reconstruction

IMU pre-integration

The IMU module contains a 3-axis gyroscope, a 3-axis accelerometer and a magnetometer. The inertial sensors are independent of each other and do not easily interfere with one another. Integrating their outputs to obtain displacement and rotation accumulates an increasingly large error over time, so the data are reliable only over short periods; the magnetometer is easily affected by the local environment and needs to be corrected in advance. Therefore poses are generally not computed by direct integration; instead, multi-sensor fusion yields good pose estimation results. A common approach is to fuse the inertial sensor with a visual sensor: the visual sensor works well when feature points are clear and the texture is rich, but it degrades or fails on glass, white walls and other regions with few features, whereas the IMU accumulates large errors over long periods but provides highly accurate short-term displacement data. Fusing visual and inertial data therefore lets the two compensate for each other and yields good pose estimation results.

The accelerometer and gyroscope of the IMU provide measurements expressed in the IMU (body) coordinate system, as shown in the formulas below. The left-hand sides are the direct outputs of the IMU; R_WB(t) is the rotation matrix from the IMU coordinate system to the world coordinate system, so R_WB^T(t) is the rotation matrix from the world coordinate system to the IMU coordinate system. _W a(t) is the acceleration in the world coordinate system, _W g is the gravity vector in the world coordinate system, b^a(t) and b^g(t) are the biases of the accelerometer and gyroscope, and η^a(t) and η^g(t) are the noise of the accelerometer and gyroscope respectively. The effect of the earth's rotation is ignored, so the world coordinate system is treated as an inertial frame. The formulas are:

{}_{B}\tilde{a}(t) = R_{WB}^{T}(t)\left({}_{W}a(t) - {}_{W}g\right) + b^{a}(t) + \eta^{a}(t)

{}_{B}\tilde{\omega}_{WB}(t) = {}_{B}\omega_{WB}(t) + b^{g}(t) + \eta^{g}(t)
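For illustration, the following minimal sketch (variable names and the gravity value are assumptions, and the noise terms are dropped as in the estimator) propagates an IMU state with the measurement model above using simple Euler integration:

// Minimal Euler-integration sketch of the measurement model above (gravity value
// and variable names are assumptions; noise terms are dropped).
#include <Eigen/Dense>
#include <Eigen/Geometry>

struct ImuState {
    Eigen::Matrix3d Rwb = Eigen::Matrix3d::Identity();  // body-to-world rotation
    Eigen::Vector3d p = Eigen::Vector3d::Zero();        // position in the world frame
    Eigen::Vector3d v = Eigen::Vector3d::Zero();        // velocity in the world frame
};

void Propagate(ImuState& s,
               const Eigen::Vector3d& acc_meas,   // accelerometer output (body frame)
               const Eigen::Vector3d& gyro_meas,  // gyroscope output (body frame)
               const Eigen::Vector3d& ba,         // accelerometer bias b_a
               const Eigen::Vector3d& bg,         // gyroscope bias b_g
               double dt) {
    const Eigen::Vector3d g_w(0.0, 0.0, -9.81);   // gravity in the world frame

    // Invert the measurement model: a_W = R_WB (a_meas - b_a) + g_W.
    Eigen::Vector3d w = gyro_meas - bg;
    Eigen::Vector3d a_world = s.Rwb * (acc_meas - ba) + g_w;

    // Rotation update through the exponential map of w * dt.
    double angle = (w * dt).norm();
    if (angle > 1e-12)
        s.Rwb = s.Rwb * Eigen::AngleAxisd(angle, w.normalized()).toRotationMatrix();

    // Position and velocity updates.
    s.p += s.v * dt + 0.5 * a_world * dt * dt;
    s.v += a_world * dt;
}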

Visual-inertial tight coupling

Tight coupling of the visual and inertial sensors uses the RGB images for feature extraction and matching and combines the IMU information in a joint optimization. The tightly coupled optimization in the tracking thread covers two cases: with a map update and without a map update.

Map updates

The tracking thread distinguishes two situations: with a map update and without a map update. At the beginning there is no map update; map updates are completed in the local mapping and loop closing threads. When the map has been updated, the overall state vector to be optimized contains the rotation, translation, velocity, accelerometer bias and gyroscope bias at the current moment j, and the overall error consists of the visual reprojection error and the IMU measurement error with respect to the previous moment i:

\theta = \{R_{WB}^{j},\ p_{W}^{j},\ v_{W}^{j},\ b_{g}^{j},\ b_{a}^{j}\}

\theta^{*} = \arg\min_{\theta}\left(\sum_{k} E_{proj}(k, j) + E_{IMU}(i, j)\right)
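Conceptually, the optimized state and the visual term of this cost can be sketched as follows (an illustrative fragment, not the ORB-SLAM2 implementation; the IMU term and the solver, e.g. g2o, are omitted):

// Conceptual fragment (not ORB-SLAM2 code): the optimized state theta and the
// pinhole reprojection residual that enters Sum_k E_proj(k, j).
#include <Eigen/Dense>

struct NavState {                      // theta = {R_WB, p, v, b_g, b_a}
    Eigen::Matrix3d Rwb;               // rotation, body to world
    Eigen::Vector3d p, v;              // position and velocity in the world frame
    Eigen::Vector3d bg, ba;            // gyroscope and accelerometer biases
};

// Reprojection residual of one map point pw observed at pixel obs = (u, v).
Eigen::Vector2d ReprojError(const NavState& s, const Eigen::Vector3d& pw,
                            const Eigen::Vector2d& obs,
                            double fx, double fy, double cx, double cy) {
    Eigen::Vector3d pc = s.Rwb.transpose() * (pw - s.p);   // world point into the body/camera frame
    Eigen::Vector2d proj(fx * pc.x() / pc.z() + cx,
                         fy * pc.y() / pc.z() + cy);
    return obs - proj;
}
// The total cost stacks these residuals with the IMU term E_IMU(i, j) and is
// minimized over the NavState.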

When no map is updated

When there is no map update, the overall state vector to be optimized contains the rotation, translation, velocity, accelerometer bias and gyroscope bias at both the current moment j+1 and the previous moment j. The overall error again consists of the visual reprojection error and the IMU measurement error, as shown in the formula:

\theta = \{R_{WB}^{j},\ p_{W}^{j},\ v_{W}^{j},\ b_{g}^{j},\ b_{a}^{j},\ R_{WB}^{j+1},\ p_{W}^{j+1},\ v_{W}^{j+1},\ b_{g}^{j+1},\ b_{a}^{j+1}\}

Improved ORB-SLAM for building dense 3D models

To achieve high-precision online localization and reconstruction for a mobile robot in both large-scale and small-scale unknown environments, the map must be densified: a sparse map based on feature points cannot be used directly for practical tasks such as robot navigation and path planning. Visual SLAM performs simultaneous localization and mapping, so what exactly is being localized? If the camera is mounted on a robot, the robot is localized; if the camera is hand-held or fixed, the camera itself is. Likewise, mapping means building a map of the environment through the camera or robot, and combining the two determines the robot's position on the map as well as its continuous motion trajectory in the scene. Although the map built by SLAM clearly shows the trajectory of the camera or robot, the keyframes and the feature points, it provides little visual information and the resulting map is neither clear nor intuitive. It is therefore necessary to convert the sparse map into a dense three-dimensional point cloud map by splicing the point cloud information of the keyframes; in the resulting map the surrounding visual features can be seen clearly, giving a much better visual experience. From the camera intrinsics, the correspondence between a three-dimensional point cloud and two-dimensional pixel coordinates can be computed, as shown in the formula below.

The camera poses produced while the system runs are used to generate a point cloud for each keyframe. Points whose depth value is too large or invalid are first removed, and a statistical filter then removes points that lie far away from the "crowd", i.e. isolated points: for each point the distribution of distances to its nearest neighbours is computed, points within the expected distance are kept, and points that are comparatively far away are discarded. Because keyframes overlap, many points lie at nearly the same position and would occupy a large amount of memory, so the cloud is downsampled with a voxel filter; its advantage is that each voxel keeps only a single point, so downsampling reduces the memory consumption. The camera pose is computed in the visual odometry and back-end optimization parts, and the global point cloud is obtained by splicing the point cloud data. The formula for computing a three-dimensional point from the two-dimensional color image and depth image is shown below (6):

\left\{\begin{array}{l} z = d / s \\ x = (u - c_{x}) \cdot z / f_{x} \\ y = (v - c_{y}) \cdot z / f_{y} \end{array}\right.

Here f_x, f_y, c_x, c_y are the camera intrinsics, (u, v) are the pixel coordinates in the image, (x, y, z) are the coordinates of the point in the camera coordinate system, d is the pixel depth measured by the depth camera, and s is the scale factor between the actual distance and the measured depth value d. The transformation of a point from the camera coordinate system to the global coordinate system is given by (7):

x_{w,j} = T_{w,ci}\, x_{ci,j}

Here T_{w,ci} is the pose of the i-th keyframe, x_{ci,j} is a point of the cloud expressed in the coordinate system of the i-th keyframe, and x_{w,j} is the corresponding point in the global coordinate system after the transformation.
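The following listing is a minimal sketch of this pipeline under stated assumptions (a 16-bit depth image in millimetres, the intrinsics from Table 3, placeholder file names and keyframe pose): it back-projects an RGB-D frame with formula (6), filters and downsamples the cloud as described above, and transforms it into the world frame with formula (7).

// Sketch of the dense-mapping step; depth scale, file names and pose are placeholders.
#include <opencv2/opencv.hpp>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/filters/statistical_outlier_removal.h>
#include <pcl/filters/voxel_grid.h>
#include <pcl/common/transforms.h>
#include <Eigen/Dense>

int main() {
    cv::Mat color = cv::imread("rgb.png");
    cv::Mat depth = cv::imread("depth.png", cv::IMREAD_UNCHANGED);  // 16-bit depth image
    if (color.empty() || depth.empty()) return -1;

    const double fx = 605.894, fy = 606.856, cx = 317.887, cy = 256.416;  // Table 3
    const double s = 1000.0;   // depth scale: raw value to metres (assumed)

    // Formula (6): back-project every valid pixel into the camera frame.
    pcl::PointCloud<pcl::PointXYZRGB>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZRGB>);
    for (int v = 0; v < depth.rows; ++v)
        for (int u = 0; u < depth.cols; ++u) {
            unsigned short d = depth.ptr<unsigned short>(v)[u];
            if (d == 0) continue;                 // invalid or missing depth
            pcl::PointXYZRGB p;
            p.z = d / s;                          // z = d / s
            p.x = (u - cx) * p.z / fx;            // x = (u - c_x) * z / f_x
            p.y = (v - cy) * p.z / fy;            // y = (v - c_y) * z / f_y
            cv::Vec3b c = color.at<cv::Vec3b>(v, u);
            p.b = c[0]; p.g = c[1]; p.r = c[2];
            cloud->push_back(p);
        }

    // Statistical filter: drop isolated points far from their neighbours.
    pcl::StatisticalOutlierRemoval<pcl::PointXYZRGB> sor;
    sor.setInputCloud(cloud);
    sor.setMeanK(50);
    sor.setStddevMulThresh(1.0);
    pcl::PointCloud<pcl::PointXYZRGB>::Ptr inliers(new pcl::PointCloud<pcl::PointXYZRGB>);
    sor.filter(*inliers);

    // Voxel-grid downsampling: one point per 1 cm cell to limit memory use.
    pcl::VoxelGrid<pcl::PointXYZRGB> voxel;
    voxel.setInputCloud(inliers);
    voxel.setLeafSize(0.01f, 0.01f, 0.01f);
    pcl::PointCloud<pcl::PointXYZRGB>::Ptr filtered(new pcl::PointCloud<pcl::PointXYZRGB>);
    voxel.filter(*filtered);

    // Formula (7): x_w = T_{w,ci} x_{ci}, with T_{w,ci} the keyframe pose.
    Eigen::Matrix4f Twc = Eigen::Matrix4f::Identity();   // placeholder pose
    pcl::PointCloud<pcl::PointXYZRGB>::Ptr world(new pcl::PointCloud<pcl::PointXYZRGB>);
    pcl::transformPointCloud(*filtered, *world, Twc);
    return 0;
}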

To address the shortcomings of the ORB-SLAM algorithm, a method for rapidly constructing a dense 3D map of an unknown environment is added on top of it. The so-called dense map is built by constructing a point cloud for each keyframe and then assembling all point clouds, using the keyframe pose information obtained from ORB-SLAM2, into a global point cloud map. The overall algorithm is shown in Figure 3.

Figure 3.

ORB-SLAM algorithm flow

A dense point cloud map is built by adding a PointCloudMapping.cc class to the src folder of the ORB-SLAM2 code, which contains the necessary member variables and member functions. The dense real-time 3D reconstruction module built in this paper is fed the keyframes (obtained by the moving camera or robot) together with the corresponding color and depth information; after tracking, BA optimization and loop detection, the dense 3D reconstruction module is updated in real time and the dense model is generated by splicing the 3D point clouds.
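A minimal sketch of what such a PointCloudMapping class can look like is given below; the member names are illustrative and not necessarily those used in the actual implementation:

// Illustrative skeleton only; member names are assumptions.
#include <Eigen/Core>
#include <opencv2/core/core.hpp>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class PointCloudMapping {
public:
    using PointT = pcl::PointXYZRGB;
    using PointCloud = pcl::PointCloud<PointT>;

    // Called by the tracking thread whenever a new keyframe is created: store its
    // color image, depth image and pose, then wake the dense-mapping worker.
    void insertKeyFrame(const cv::Mat& color, const cv::Mat& depth,
                        const Eigen::Matrix4f& Twc) {
        std::unique_lock<std::mutex> lock(mutex_);
        colors_.push_back(color.clone());
        depths_.push_back(depth.clone());
        poses_.push_back(Twc);
        keyframe_updated_.notify_one();
    }

private:
    // Worker loop: back-project each new keyframe, transform it with its pose and
    // splice it into the global dense cloud (see the listing in the previous section).
    void viewer() { /* omitted in this sketch */ }

    PointCloud::Ptr global_map_{new PointCloud};
    std::vector<cv::Mat> colors_, depths_;
    std::vector<Eigen::Matrix4f, Eigen::aligned_allocator<Eigen::Matrix4f>> poses_;
    std::mutex mutex_;
    std::condition_variable keyframe_updated_;
    std::thread viewer_thread_;
};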

In this paper, a Realsense D435 is used as the sensor device. First, the ORB-SLAM algorithm extracts keyframes and obtains the robot pose; then it is judged whether the motion of the robot exceeds a certain threshold. The large-scale map is decomposed into small maps of a certain size, which reduces the time needed to save and build the map and thus improves the overall performance of the algorithm.
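The motion-threshold test mentioned above can be sketched as follows (the threshold values are assumptions for illustration, not the ones used in the experiments):

// Hypothetical helper: compare the current pose with the last accepted one and
// report whether translation or rotation exceeds a limit.
#include <Eigen/Geometry>
#include <cmath>

bool MotionExceedsThreshold(const Eigen::Isometry3d& last, const Eigen::Isometry3d& cur,
                            double trans_thresh = 0.30,    // metres (assumed)
                            double rot_thresh   = 0.26) {  // radians, about 15 degrees (assumed)
    Eigen::Isometry3d delta = last.inverse() * cur;        // relative motion
    double trans = delta.translation().norm();
    double rot = Eigen::AngleAxisd(delta.rotation()).angle();
    return trans > trans_thresh || std::fabs(rot) > rot_thresh;
}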

Based on the above theory, this chapter uses the open TUM dataset of the Technical University of Munich for dense reconstruction tests, selecting the sequences fr1_room, fr1_floor and rgbd_dataset_freiburg3_teddy, and compares the results of sparse point cloud reconstruction and dense reconstruction.

Figure 4.

Dense reconstruction results of dataset fr1_room

Figure 5.

Dense reconstruction results of dataset fr3_office

Figure 6.

Reconstruction results of dataset rgbd_dataset_freiburg3_teddy

The establishment of a dense point cloud map plays an important role in the robot's perception of and localization within the surrounding environment. Compared with a sparse point cloud map, it gives people a more intuitive understanding of the environment. A three-dimensional model lets people observe the real world from multiple angles and provides an immersive experience, which is of great significance for exploring unknown environments with robots.

Experimental process and steps

Every project needs experiments to verify the overall performance of the system. This chapter describes the experiments: it first introduces the experimental environment and the hardware and software used in the laboratory, then tests on open-source datasets, and finally evaluates real indoor scenes with the mobile robot, comparing and analyzing the results to verify the reliability of the system.

Hardware experimental environment

The experimental equipment used in this paper is an indoor mobile robot developed in-house, and its original parts were modified to build the experimental platform for this work. Because the integration between the hardware sensors and the software had to remain practical, the robot's original lidar sensor was removed. The control system uses a Lenovo Xiaoxin Air15 workstation as the PC platform to control the robot while it scans the surrounding environment; the PC has an Intel Core i5 processor, 16 GB of RAM and 4 GB of video memory. The hardware platform of the robot consists of three parts: the robot body, the robot control and the robot perception. An Intel RealSense Depth Camera D435 is used as the main sensor, and an IMU sensor of model MPU9250 is installed. The bottom of the robot consists of a chassis and four wheels, allowing 360° free movement of the robot.

Hardware configuration parameters of the robot

Hardware	Model
Upper-computer workstation	Lenovo Air15
IMU sensor	MPU9250
Depth sensor	Realsense D435

The Intel RealSense Depth Camera D435 is a preferred solution for applications such as robot navigation and object recognition. The RealSense D435 offers global shutter sensors and a larger lens for better low-light performance than the cheaper D415, and it features the more powerful RealSense D430 depth module. The front of the D435 has four circular openings, from left to right: the first and third are the IR stereo cameras, the second is an IR laser projector, and the fourth is the color camera (color sensor), as shown below.

Figure 7.

Depth camera schematic diagram
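For reference, a short librealsense2 sketch that grabs color frames and depth frames aligned to the color camera from the D435 is shown below; the stream resolution and frame rate are assumptions, not necessarily the configuration used in the experiments.

// librealsense2 capture sketch; stream settings are assumptions.
#include <librealsense2/rs.hpp>
#include <opencv2/opencv.hpp>

int main() {
    rs2::pipeline pipe;
    rs2::config cfg;
    cfg.enable_stream(RS2_STREAM_COLOR, 640, 480, RS2_FORMAT_BGR8, 30);
    cfg.enable_stream(RS2_STREAM_DEPTH, 640, 480, RS2_FORMAT_Z16, 30);
    pipe.start(cfg);

    rs2::align align_to_color(RS2_STREAM_COLOR);   // align depth to the color sensor
    while (true) {
        rs2::frameset frames = align_to_color.process(pipe.wait_for_frames());
        rs2::video_frame color = frames.get_color_frame();
        rs2::depth_frame depth = frames.get_depth_frame();

        // Wrap the raw buffers as cv::Mat for the SLAM front end.
        cv::Mat rgb(cv::Size(640, 480), CV_8UC3, (void*)color.get_data(), cv::Mat::AUTO_STEP);
        cv::Mat d(cv::Size(640, 480), CV_16UC1, (void*)depth.get_data(), cv::Mat::AUTO_STEP);
        // ... feed rgb and d into the tracking thread ...
    }
    return 0;
}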

Software experiment environment

This paper uses a Lenovo Xiaoxin Air15 computer with a dual-boot setup of Windows 10 Home Edition and Ubuntu 18.04 as the software experimental platform. Compared with a virtual machine (VMware Workstation Pro), the native dual-boot system loads faster and is convenient, safe and free, with an easily configured environment. Ubuntu 18.04 offers a simple and clean interface, a convenient configuration environment, and a large number of third-party libraries and dependencies, including Pangolin 0.5, Eigen 3.4.5 and OpenCV. Eigen provides fast linear algebra operations on matrices as well as equation-solving routines, and many higher-level software libraries also use Eigen for matrix operations. In this paper OpenCV is mainly used for processing the Realsense D435 depth images and for optical flow tracking.

Software system configuration

Software	Model and version
Operating system	Ubuntu 18.04
Robot operating system	ROS Melodic
Development language	C/C++
Vision library	OpenCV
Point cloud library	PCL

Sensor calibration experiment

In this experiment, sensor calibration is a step that cannot be ignored. To calibrate the camera, a checkerboard calibration board is prepared in advance, as shown in Figure 8.

Figure 8.

Schematic diagram of checkerboard calibration board

In this project the calibration board is fixed, and images of the board at different positions and angles are collected by moving the camera. Fifteen pictures from different angles are selected, as shown in Figure 9.

Figure 9.

Schematic diagram of multi-angle calibration plate

The calibration algorithm then extracts the corner information of each checkerboard image, computes the homography matrix of each image, and calibrates the intrinsic and distortion parameters of the camera according to the size of the checkerboard squares. As shown in Figure 10, the corner information of the calibration board is extracted accurately.

Figure 10.

Corner extraction of the calibration board
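The calibration step described above can be sketched with OpenCV as follows (the board dimensions, square size and file names are placeholders, not the exact values used in this experiment):

// OpenCV chessboard calibration sketch; board size, square size and file names are assumptions.
#include <opencv2/opencv.hpp>
#include <iostream>
#include <string>
#include <vector>

int main() {
    const cv::Size board(9, 6);      // inner corners per row and column (assumed)
    const float square = 0.025f;     // edge length of one square in metres (assumed)

    std::vector<std::vector<cv::Point3f>> object_points;
    std::vector<std::vector<cv::Point2f>> image_points;
    cv::Size image_size;

    for (int i = 0; i < 15; ++i) {   // the 15 views collected in Figure 9
        cv::Mat img = cv::imread("calib_" + std::to_string(i) + ".png", cv::IMREAD_GRAYSCALE);
        if (img.empty()) continue;
        std::vector<cv::Point2f> corners;
        if (!cv::findChessboardCorners(img, board, corners)) continue;
        cv::cornerSubPix(img, corners, cv::Size(11, 11), cv::Size(-1, -1),
                         cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 30, 0.01));
        image_points.push_back(corners);
        image_size = img.size();

        std::vector<cv::Point3f> obj;             // planar board coordinates, z = 0
        for (int r = 0; r < board.height; ++r)
            for (int c = 0; c < board.width; ++c)
                obj.emplace_back(c * square, r * square, 0.0f);
        object_points.push_back(obj);
    }
    if (image_points.empty()) return -1;

    cv::Mat K, dist;                  // intrinsic matrix and distortion coefficients
    std::vector<cv::Mat> rvecs, tvecs;
    double rms = cv::calibrateCamera(object_points, image_points, image_size,
                                     K, dist, rvecs, tvecs);
    std::cout << "Mean reprojection error (px): " << rms << std::endl;  // 0.28213 px in Table 3
    return 0;
}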

According to the reprojection error test, the average error is 0.28213 pixels, which meets the requirements of this experiment. The experimental calibration results are shown in Table 3.

Calibration results of the Realsense D435 camera

Camera parameter	Calibration result
Intrinsic matrix	605.894	0	317.887
	0	606.856	256.416
	0	0	1
Distortion parameter 1	0.15689
Distortion parameter 2	−0.3425
Mean reprojection error (pixels)	0.28213

Multisensor calibration

Multi-sensor calibration here mainly means the joint calibration of the camera and the IMU. After obtaining the camera intrinsics, the camera and the IMU are jointly calibrated to obtain the transformation matrix between them; this transformation provides a good initial value for the 3D reconstruction system and improves its accuracy. In this paper the Kalibr toolbox is used for the joint calibration. With the camera intrinsics already obtained above, the image topic and the IMU data are recorded through ROS communication, and sufficiently rotating and translating the camera collects enough data from different angles to make the calibration between the sensors more accurate. The calibration results are shown in Table 4.

Calibration results of multiple sensors

Parameter	Calibration result
Rotation matrix of camera and IMU	0.9999524	0.0042236	0.0012432
	−0.0054268	0.9995236	0.00158425
	−0.001426238	0.0017052	0.9999842
Translation vector of camera and IMU	0.01238847	0.00842635	−0.02145263
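For illustration, the calibrated extrinsics in Table 4 can be applied as follows, assuming the rotation maps camera-frame coordinates into the IMU (body) frame (this direction is an assumption):

// Applying the Table 4 extrinsics; the direction of the transform is assumed.
#include <Eigen/Dense>
#include <iostream>

int main() {
    Eigen::Matrix3d R_bc;                        // rotation, camera to IMU (assumed direction)
    R_bc <<  0.9999524,   0.0042236,  0.0012432,
            -0.0054268,   0.9995236,  0.00158425,
            -0.001426238, 0.0017052,  0.9999842;
    Eigen::Vector3d t_bc(0.01238847, 0.00842635, -0.02145263);   // translation in metres

    Eigen::Vector3d p_camera(0.1, 0.2, 1.5);     // an example point in the camera frame
    Eigen::Vector3d p_imu = R_bc * p_camera + t_bc;
    std::cout << "Point in the IMU frame: " << p_imu.transpose() << std::endl;
    return 0;
}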

Figure 11 shows the reprojection error diagram:

Figure 11.

Reprojection error diagram

Real data experimental results

In this paper, dense three-dimensional reconstruction is first carried out with the RGB-D sensor mounted on the robot alone, without the IMU sensor module. The displacement curves of the robot in the X and Y directions and the trajectory of the robot are shown below. After several experiments it was found that dense reconstruction with a single sensor produces the effect shown in Figure 12.

Figure 12.

Displacement curve in XY direction

Figure 13, in contrast, shows the result produced by the multi-sensor fusion mode:

Figure 13.

Trajectory diagram of the robot

As shown in Figure 13, the total length of the robot's trajectory is 5 m, and the end-point positioning error is 0.12 m in the X direction and 0.16 m in the Y direction, indicating that the algorithm in this paper works stably and effectively while the robot is moving.

Figures 12, 13 and 14 and Table 5 show that when only the RGB-D vision camera is relied on, with the color image and depth map as the inputs for 3D reconstruction, the camera easily loses tracking in harsh conditions and complex scenes and cannot relocalize within the scanned area, so the reconstruction breaks and the resulting map cannot be used. When the RGB-D camera is combined with the IMU sensor for 3D reconstruction, the problems of map fracture and tracking loss caused by the single sensor are alleviated in most cases.

Experimental results of 3D reconstruction for different sensors

Sensor	Percentage of partial breaks caused by tracking loss
Without IMU	35%
With IMU	10%

Figure 14.

Top view with an IMU

Figure 15.

Top view without an IMU

Summary

This paper mainly analyzes a SLAM scheme based on multiple visual sensors. Because a purely visual scheme has inherent defects and is difficult to keep stable in varied environments and practical scenes, multi-sensor fusion is adopted to construct the 3D reconstruction, and experiments are used to verify it. With pure vision, partial fractures of the reconstruction caused by tracking loss reach 35%, while with multi-sensor fusion they drop to 10%. Compared with the single sensor, the proportion of broken reconstructions is therefore reduced by 25 percentage points, which improves the accuracy and completeness of the indoor reconstruction.
