From the first generation of programmable teach-and-repeat robots, through offline-programmed robots with specific sensing functions and adaptive capabilities, to the intelligent robots that emerged after the mid-1980s, robots have gone through approximately 60 years of evolution (Garcia-Haro et al., 2020; Rivai et al., 2020). Driven by the rapid growth of computing, information technology, artificial intelligence, and control theory, the technologies used in intelligent robots are constantly developing, including multi-sensor information fusion, path planning, robot vision, and intelligent human–machine interfaces. Intelligent robots equipped with a range of sensors can respond efficiently to environmental changes through information fusion and have strong self-adaptive, learning, and autonomous capabilities (Nguyen et al., 2018; Nguyen et al., 2020). Recently, robots have enabled many applications in both civil and military fields. In particular, service robots are crucial in hazardous environments or isolated areas that humans may not be able to reach (Ahmad et al., 2018; Nguyen et al., 2021).
For both practitioners and scholars working on aviation applications, state-of-the-art solutions for transportation planning that combine baggage services, routing, security, and safety are an expanding subject (Nguyen and Teague, 2015; Sendari et al., 2019). The goal is to improve service quality and reduce the workload of airport staff as passenger numbers grow, especially by providing flight-related information, along with other information passengers need, conveniently and quickly (Joosse et al., 2015; Triebel et al., 2016; Ivanov et al., 2017). In other words, the use of service robots in airports, tourism, and hospitals brings many benefits. During the COVID-19 pandemic, such robots also reduced contact between staff and passengers and eased passengers' fear and anxiety about infection (Meidute-Kavaliauskiene et al., 2021). Examples include public guidance robots that accept a voice command or an on-screen map selection to travel to a predetermined location in the airport (Muñoz Peña and Bacca Cortés, 2020), and the Korean robot Troika (South China Morning Post, 2017), which lets passengers look up flight information by scanning their flight tickets and then leads them to the boarding gate if required. However, the interaction between these robots and passengers is still limited: they lack facial expressions, positioning and moving to the desired destination require a fairly large displacement space and suffer from relatively large errors, and they rely on many complex sensor devices.
On several important issues, researchers have tried to develop techniques that connect computer-vision capabilities to applications and products for tracking moving objects. Visual servoing (visual assist) is a technique for controlling the robot's movements using feedback from a vision sensor. In visual control, it is difficult to track a moving subject if the distance to the object cannot be estimated (Joosse et al., 2015); in another study, the tracked object was the face of a person standing still or moving back and forth in front of the camera lens. To estimate the distance to an object, the object must first be tracked. Model-based tracking methods require prior knowledge of object shapes for the matching process and for finding objects in the field of view. These studies also compute the focal length of the camera from an initial face image taken at a given distance (Li et al., 2015). However, accuracy is reduced if the subject is replaced by a different person with a different face. Starting from the points above, this paper focuses on building an intelligent service robot system (named iRobt), shown in Figure 1, that can move autonomously and, in addition, interact with passengers and provide them with information. The robot offers the following functions: (1) displaying the flight schedule for the day; (2) showing the terminal map; (3) guiding passengers to eateries, cafes, and shopping stores; (4) giving notice of items prohibited on the plane; (5) taking photographs of passengers and emailing them; and (6) displaying directions, locations, and information about terminal regulations and flight rules in response to passenger inquiries. All of these directions, location instructions, and regulation details are based on passengers' frequently asked questions.
Figure 1
(a) 3D simulation image of iRobt. (b) Crafted iRobt images. (c) Location of sensors. (d) Posture and parameters of the robot in two coordinate systems.

To interact with passengers, we add speech recognition via the Google Cloud Speech-to-Text API for language processing. The task of guiding passengers to an area requires a cascaded convolutional neural network to recognize human faces and then, based on the similarity ratio of two triangles, to determine and track the distance from the robot to the passenger. The precision trajectory-tracking motion control follows methods previously published by the research team (Thanh et al., 2021); in addition, the avoidance of unexpected obstacles is also presented in this paper.
The rest of this article is organized as follows. The section “The system descriptions” presents the electronic hardware structure of the multi-sensor and communication systems in the robot. The section “Proposed method” summarizes the interaction between passenger and robot, face detection based on a cascaded convolutional neural network, and the distance prediction technique, then presents motion tracking and obstacle avoidance control. The section “Experimental results” provides and discusses the experimental results obtained with the robot. Finally, conclusions and future work are addressed in the section “Conclusions and future work” (Figure 1).
To meet the application requirements for airport robots as mentioned in the section “Introduction”, we propose a sensor system and actuators that can meet those requirements. Figure 2 provides an overview block diagram of the robot system. In this section, we focus on the sensing system and the PID controller, as shown in Figure 2.
Figure 2
Block diagram of the robot system.

The optical encoder has an LED light source, a light detector, a “code” disc/wheel mounted on the shaft, and an output signal processor, as shown in Figure 3. The disc has alternating opaque and transparent segments and is placed between the LED and the photodetector. As the encoder shaft rotates, the light beam from the LED is interrupted by the opaque lines on the “code” disc before being picked up by the photodetector. This produces a pulse signal: light = on; no light = off. The signal is sent to a counter or controller, which then outputs the signal required to produce the desired function.
Figure 3
Structure of the rotary encoder.

In mobile robotics, the encoder is used to measure the movement (direction and speed) of each of the robot's wheels. Determining the position of the robot from these encoder readings is a widely used technique known as odometry (Qingqing et al., 2019; Thanh et al., 2021; Tran et al., 2022).
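As an illustration of how odometry can be computed for a differential-drive base like the one used here, the sketch below integrates the robot pose from encoder tick increments. The wheel radius, wheel base, and ticks-per-revolution values are placeholder assumptions, not the robot's actual parameters.

```python
import math

# Assumed robot parameters (illustrative values, not taken from the paper):
TICKS_PER_REV = 600          # quadrature encoder pulses per wheel revolution
WHEEL_RADIUS = 0.075         # wheel radius in metres
WHEEL_BASE = 0.40            # distance between the two drive wheels in metres

def update_pose(x, y, theta, d_ticks_left, d_ticks_right):
    """Update the robot pose (x, y, theta) from encoder tick increments."""
    # Convert tick increments to distance travelled by each wheel
    dist_per_tick = 2.0 * math.pi * WHEEL_RADIUS / TICKS_PER_REV
    d_left = d_ticks_left * dist_per_tick
    d_right = d_ticks_right * dist_per_tick

    # Arc travelled by the robot centre and change of heading
    d_center = (d_left + d_right) / 2.0
    d_theta = (d_right - d_left) / WHEEL_BASE

    # Integrate the pose (midpoint approximation), keep theta in [-pi, pi]
    x += d_center * math.cos(theta + d_theta / 2.0)
    y += d_center * math.sin(theta + d_theta / 2.0)
    theta = (theta + d_theta + math.pi) % (2.0 * math.pi) - math.pi
    return x, y, theta

# Example: both wheels advanced 300 ticks -> the robot drove straight ahead
print(update_pose(0.0, 0.0, 0.0, 300, 300))
```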
An ultrasonic sensor is an electronic device that measures the distance to a target object by emitting ultrasonic sound waves and converting the reflected sound into an electrical signal. Ultrasonic waves have frequencies above the range of audible sound (the sound that humans can hear). Ultrasonic sensors have two main components: the transmitter (which emits the sound using piezoelectric crystals) and the receiver (which picks up the sound after it has travelled to and from the target).
Ultrasonic sensors are used primarily as proximity sensors. They can be found in automobile self-parking technology and anti-collision safety systems. In this work, we use the SRF05 ultrasonic sensor so that the robot can avoid obstacles, as illustrated in Figure 4.
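The distance reported by such a sensor follows directly from the echo round-trip time. The sketch below shows this conversion; the assumed speed of sound and the example echo time are illustrative only.

```python
# Minimal sketch of converting an ultrasonic echo time to a distance.
# The SRF05 is triggered with a short pulse and returns an echo pulse whose
# width is proportional to the round-trip time of the sound wave.
SPEED_OF_SOUND = 343.0  # m/s in air at ~20 degrees C (assumed ambient conditions)

def echo_time_to_distance(echo_time_s: float) -> float:
    """Distance to the obstacle in metres from the measured echo time."""
    # The wave travels to the obstacle and back, hence the division by 2.
    return SPEED_OF_SOUND * echo_time_s / 2.0

# Example: an echo pulse of 5.8 ms corresponds to roughly 1 m
print(round(echo_time_to_distance(0.0058), 2))
```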
Figure 4
Ultrasonic sensor.

In addition to the functions mentioned above, the robot is designed to guide passengers through the airport. Accordingly, the robot moves while tracking the passenger's face to maintain a constant distance from the passenger. If the passenger stops or slows down, the robot stops and waits; this is made possible by a method that predicts the distance from the image sensor to the person's face using a monocular camera (Pathi et al., 2019). We use a Logitech BRIO 4K camera, shown in Figure 5, installed on the robot to capture the passenger's face image.
Figure 5
Logitech BRIO 4K.

As the robot travels through these reference points, a magnetic sensor is added to detect waypoints with known coordinates and readjust the route. The magnetic sensor used is a bar of 12 Hall sensors placed in a line with a fixed spacing between adjacent sensors.
The Hall sensors above the reference point are triggered when the robot passes over the magnetic reference points on the floor. The deviation between the robot's position and the reference point, together with the robot's orientation, is then computed and used to correct the estimated pose, as shown in Figure 6(b).
Figure 6
(a) Magnetic sensor model and magnet reference point. (b) Calculation of position and orientation of Robot at the reference point.

High-speed, high-torque, reversible DC motors are used in the driving system. For accurate position and speed measurement, a quadrature optical shaft encoder with 600 pulses per revolution is mounted on each motor. A microprocessor-based electronic circuit with integrated firmware accomplishes the motor control, allowing each motor to be driven by a PID algorithm, as shown in Figure 7.
Figure 7
Block diagram of a PID controller in a feedback loop.

The PID controller is distinguished by a control-loop feedback mechanism that calculates the difference between the desired setpoint and the actual output of a process and uses the result to correct the operation. PID is an abbreviation for Proportional, Integral, and Derivative. The controller's job is to keep the process at a setpoint value. For example, a DC motor may be required to maintain a setpoint speed: the controller continuously computes the error e(t) between the setpoint and the measured speed and applies a correction u(t) = Kp·e(t) + Ki·∫e(t)dt + Kd·de(t)/dt.
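A minimal discrete-time PID loop in the spirit of Figure 7 might look like the sketch below; the gains, sample time, and output limits are illustrative assumptions rather than the values tuned in the robot's firmware.

```python
# Minimal discrete PID sketch for a motor speed loop; the gains and sample
# time below are illustrative, not the values used on the robot.
class PID:
    def __init__(self, kp, ki, kd, dt, out_min=-255, out_max=255):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        # Error between desired and measured value
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Clamp to the actuator range (e.g. a PWM duty value)
        return max(self.out_min, min(self.out_max, u))

# Example: drive the measured speed towards a 100 rpm setpoint
pid = PID(kp=2.0, ki=0.5, kd=0.05, dt=0.01)
pwm = pid.step(setpoint=100.0, measurement=87.0)
print(round(pwm, 1))
```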
The voice interaction function consists of three phases. In the first phase, we build a text classification model with machine learning using the fastText library (Joulin et al., 2016), an open-source library that allows users to learn text representations and train text classifiers for strings of text. Accordingly, we build keyword data and label the keyword combinations as shown in Table 1 (a training/prediction sketch is given after the table):
According to Table 1, there are more than 100 labels, each of which represents a type of information provided to the user depending on the user's question.
Table 1. Keyword and label data sheet.
No. | Keywords | Label |
1 | Flight information, flight schedule, etc. | __label__#fly_schedule# |
2 | Station map, etc. | __label__#station_map# |
3 | Restaurant, cafeteria, food, etc. | __label__#restaurant# |
4 | Things not to bring on board, dangerous items, etc. | __label__#dangerous_object# |
5 | Smile, take photo, etc. | __label__#capture_photo# |
6 | Hello, hello robot, etc. | __label__#greeting# |
7 | Bye bye, goodbye, thank you, etc. | __label__#goodbye# |
8 | Give me information about the weather, etc. | __label__#weather# |
9 | … | … |
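A minimal sketch of how such a classifier could be trained and queried with the fastText library is shown below; the training file name and hyperparameters are assumptions for illustration, not the authors' actual configuration.

```python
# Hypothetical sketch of training and querying the keyword classifier with
# the fastText library; the file name and hyperparameters are assumptions.
import fasttext

# keywords.train contains one labelled example per line, e.g.:
# __label__#fly_schedule# flight information flight schedule
model = fasttext.train_supervised(
    input="keywords.train",   # assumed training file
    epoch=25, lr=1.0, wordNgrams=2)

model.save_model("intent_classifier.bin")

# Classify a text string returned by the speech-to-text stage
labels, probs = model.predict("show me the station map")
print(labels[0], probs[0])    # e.g. ('__label__#station_map#', 0.94)
```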
In the second phase, the user's voice is recorded in real time from the microphone and then transmitted through API calls to Google's Speech-to-Text service. The recognized text string is returned immediately and passed to the text classifier, as shown in Figure 8(a).
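A minimal sketch of this second phase, assuming a recorded WAV file, a 16 kHz sample rate, and the google-cloud-speech Python client, is given below; these details are illustrative and may differ from the deployed system.

```python
# Minimal sketch of sending recorded audio to the Google Cloud
# Speech-to-Text API; the file name, language code, and sample rate are
# assumptions for illustration.
from google.cloud import speech

client = speech.SpeechClient()

with open("passenger_question.wav", "rb") as f:
    audio = speech.RecognitionAudio(content=f.read())

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",  # or the passengers' language, e.g. "vi-VN"
)

response = client.recognize(config=config, audio=audio)
for result in response.results:
    text = result.alternatives[0].transcript
    print(text)  # this string is then passed to the text classifier
```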
Figure 8
(a) Voice recognition model. (b) Flowchart of the passenger answering program algorithm.

In the final phase, the text converted from the passenger's voice is classified by the model built in phase 1 to determine its label, from which the computer program on the robot responds with the information the passenger needs, as shown in the flowchart of Figure 8(b). As shown in Figure 8, if the program cannot determine which label the text belongs to, the text is stored in a log file for us to review and add to the machine-learning data.
In practice, many people often appear in the lens, which leads to multiple faces of different sizes appearing in the frame, as illustrated in Figure 9. Therefore, the MTCNN method resizes the image to create a series of copies of the original at multiple scales, called an image pyramid (Hossain and Mukit, 2015).
Figure 9
The image pyramid object.

Each copy is scanned for faces with a 12×12-pixel kernel and a stride of 2. Thanks to the different scales, MTCNN can recognize faces of various sizes. Next, each candidate window is passed through the P-net to find the coordinates of the four corners of its bounding box.
To remove redundant kernels and cells, we use two methods: Non-Maximum Suppression (NMS), which deletes cells with a high overlap percentage, and a confidence threshold, which deletes cells with a low confidence level, as shown in Figure 10.
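As an illustration of these two filtering steps, the sketch below implements a greedy NMS combined with a confidence threshold; the threshold values are assumptions, not those used in the paper.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5, score_thresh=0.6):
    """Greedy Non-Maximum Suppression over boxes given as (x1, y1, x2, y2).

    Boxes with confidence below score_thresh are discarded first, then boxes
    overlapping a higher-scoring box by more than iou_thresh are removed.
    The thresholds here are illustrative values.
    """
    boxes, scores = np.asarray(boxes, float), np.asarray(scores, float)
    keep_mask = scores >= score_thresh
    boxes, scores = boxes[keep_mask], scores[keep_mask]

    order = scores.argsort()[::-1]          # highest confidence first
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    kept = []
    while order.size > 0:
        i = order[0]
        kept.append(i)
        # Intersection of box i with the remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]  # drop heavily overlapping boxes
    return boxes[kept], scores[kept]

# Example: the second box heavily overlaps the first and is suppressed,
# the third box is dropped by the confidence threshold.
boxes = [[10, 10, 110, 110], [12, 12, 108, 108], [200, 50, 260, 120]]
scores = [0.98, 0.90, 0.40]
print(nms(boxes, scores))
```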
Figure 10
(a) P-net, R-net and image processing results of NMS. (b) O-net and face detection result.

Once the unsuitable boxes have been found and deleted, we convert the coordinates back to the original image: the width and height of each kernel are computed with respect to the original image, the coordinates normalized to the range (0, 1) are scaled by these dimensions, and the offset of the kernel is added. The output is the set of corresponding box coordinates in the original image, which serve as the new input for the R-net.
R-net performs the same operations as P-net, but uses padding to insert empty pixels into bounding boxes that extend beyond the image. All boxes are then resized and their results delivered to the O-net.
The O-net resizes the boxes to 48×48 pixels. It returns three outputs: the 4 coordinates of the bounding box, the coordinates of 5 facial landmarks, and the confidence score of each box.
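A minimal sketch of running the whole P-net/R-net/O-net cascade with the open-source mtcnn Python package is shown below; the package, the input file name, and the drawing step are assumptions and may differ from the authors' implementation.

```python
# Sketch of running the full P-net/R-net/O-net cascade with the open-source
# `mtcnn` package (an assumption; the authors' own implementation may differ).
import cv2
from mtcnn import MTCNN

detector = MTCNN()
frame = cv2.imread("camera_frame.jpg")            # assumed image file
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)      # MTCNN expects RGB input

for face in detector.detect_faces(rgb):
    x, y, w, h = face["box"]        # bounding-box corner and size (cf. Figure 11)
    confidence = face["confidence"] # O-net face score
    landmarks = face["keypoints"]   # eyes, nose, mouth corners
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("detected_faces.jpg", frame)
```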
According to Figure 11, we have a box covering the face whose top-left corner has coordinates (x, y) and whose dimensions are w×h in pixels; the height of this bounding box is used as the face height in the image for the distance estimation below.
Figure 11
Coordinates and dimensions of the bounding box in the image.

This paper develops a method for predicting the distance from the camera to the person in front of it based on the similarity of two triangles. In this setup, the similarity follows from the geometry of the camera's image sensor: the object and its projection on the sensor form two opposing triangles that share the optical center, giving a pair of similar triangles that are symmetrical about the optical axis, as illustrated in Figure 12. Because the angles at the optical center are equal, the corresponding sides of the two triangles are proportional.
Figure 12
Two similar triangles opposite each other.

Applying the above property, we predict the distance from the object to the camera. Assume a model whose face has a known height H stands at a known distance d from the camera; from the height of the face in the captured image, the focal length f of the camera can be determined, as shown in Figure 13(a).
Figure 13
(a) Method of determining the focal length of the camera. (b) Determine the distance using the triangle similarity method.

Here a is the height of the human face in the image (in pixels), H is the actual height of the face, and d is the known calibration distance, so the focal length is estimated as f = (a × d) / H.
Next, in Figure 13(b), the model moves closer to or farther from the camera by some distance; applying the similarity of triangles with the new face height a′ in the image,
the distance is then predicted as d′ = (H × f) / a′,
where a′ is the face height in the new image and d′ is the predicted distance from the face to the camera.
With the above distance prediction method, there is still a factor that causes errors in the results: each person has a different face, so its size in the photo also differs, and the measurements will therefore be erroneous. To limit this error, after every 10 measurements we ask the user to approach the camera to a distance of 1 m to recalibrate the focal length f for that user.
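A minimal sketch of this calibration-plus-estimation procedure is given below; the assumed face height and the example pixel measurements are illustrative values only.

```python
# Minimal sketch of the triangle-similarity distance estimate; FACE_HEIGHT_M
# and the calibration values are illustrative assumptions.
FACE_HEIGHT_M = 0.24          # assumed real height of the tracked face (m)

def calibrate_focal_length(face_px_at_known_dist, known_dist_m):
    """f = a * d / H, from one image taken at a known distance (Figure 13a)."""
    return face_px_at_known_dist * known_dist_m / FACE_HEIGHT_M

def estimate_distance(face_px, focal_length_px):
    """d' = H * f / a', from the current face height in pixels (Figure 13b)."""
    return FACE_HEIGHT_M * focal_length_px / face_px

# Calibration image: the face is 160 px tall when the user stands 1.0 m away
f = calibrate_focal_length(face_px_at_known_dist=160, known_dist_m=1.0)
# Later frame: the face now appears 89 px tall -> roughly 1.8 m away
print(round(estimate_distance(89, f), 2))
```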
To accurately locate the robot's position in the operating environment, we apply the method reported in Thanh et al. (2021). The main goal is to control the mobile robot to follow a given trajectory. A trajectory differs from a path in that time constraints are added to it, so the control target is not only to minimize the distance between the robot and the path but also to respect the travel time.
We define the actual robot state as q = (x, y, θ)ᵀ and the reference state on the trajectory as q_r = (x_r, y_r, θ_r)ᵀ.
When the robot moves, a tracking error appears; expressed in the robot frame it is
e_1 = cos θ (x_r − x) + sin θ (y_r − y), e_2 = −sin θ (x_r − x) + cos θ (y_r − y), e_3 = θ_r − θ.
From the kinematic model and the derivative of this error, we obtain the error model
ė_1 = ω e_2 − v + v_r cos e_3, ė_2 = −ω e_1 + v_r sin e_3, ė_3 = ω_r − ω.
The controller for the robot is built as
v = v_r cos e_3 + k_1 e_1, ω = ω_r + k_2 v_r e_2 + k_3 sin e_3.
According to Klancar et al. (2005), the gains can be chosen as k_1 = k_3 = 2ζ√(ω_r² + g v_r²) and k_2 = g, where ζ ∈ (0, 1) is a damping coefficient and g > 0 is a tuning gain.
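The sketch below implements a tracking law of this form; the damping coefficient, gain, and example poses are illustrative assumptions, and the exact formulation on the robot may differ.

```python
import math

def tracking_control(pose, ref_pose, v_r, w_r, zeta=0.7, g=40.0):
    """Kinematic trajectory-tracking law in the style of Klancar et al. (2005).

    pose, ref_pose: (x, y, theta) of the robot and of the reference point.
    v_r, w_r: reference linear and angular velocities along the trajectory.
    zeta, g: damping coefficient and gain (illustrative tuning values).
    Returns the commanded (v, w).
    """
    x, y, th = pose
    xr, yr, thr = ref_pose

    # Tracking error expressed in the robot frame
    e1 = math.cos(th) * (xr - x) + math.sin(th) * (yr - y)
    e2 = -math.sin(th) * (xr - x) + math.cos(th) * (yr - y)
    e3 = math.atan2(math.sin(thr - th), math.cos(thr - th))  # wrap to [-pi, pi]

    # Gains scheduled on the reference velocities
    k1 = k3 = 2.0 * zeta * math.sqrt(w_r ** 2 + g * v_r ** 2)
    k2 = g

    v = v_r * math.cos(e3) + k1 * e1
    w = w_r + k2 * v_r * e2 + k3 * math.sin(e3)
    return v, w

# Example: robot slightly behind and to the right of the reference point
print(tracking_control((0.0, -0.05, 0.0), (0.10, 0.0, 0.05), v_r=0.4, w_r=0.0))
```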
Figure 14
The positional error between the robot's actual coordinates and the reference coordinates in the trajectory.

Along the way, the robot must be able to detect and avoid unexpected obstacles. For these cases, a proximity sensing system is used: 12 ultrasonic distance sensors installed on the robot, as shown in Figure 16(a), allow the robot to detect obstacles in front of and to either side of it.
The VFH+ method uses a histogram grid to map the environment around the robot. This map is continuously updated with distance-to-obstacle data obtained from the ultrasonic sensors mounted on the robot, as shown in Figure 15. The method finds the optimal direction of motion when obstacles are encountered and the appropriate velocity commands (linear and angular velocity) for the robot.
Consecutive free sectors in the masked polar histogram are grouped into openings, which are classified as wide or narrow depending on the number of free sectors they contain.
Figure 15
(a) Histogram grid. (b) enlargement angle from robot to obstacle. (c) Example of blocked directions.

The VFH+ approach first determines a collection of candidate directions by finding all openings in the masked polar histogram. These candidate directions are then evaluated with a cost function that considers more than simply the difference between the candidate and goal directions. To select the most suitable candidate, the right and left borders k_r and k_l of each opening are first determined, as shown in Figure 16(b).
Figure 16
(a) Layout diagram of ultrasonic sensors. (b) Selecting the most optimal direction among candidate direction.

For wide openings, three candidate directions are selected: c_r = k_r + s_max/2 near the right border, c_l = k_l − s_max/2 near the left border, and c_t = k_t (the target direction) if k_t lies between c_r and c_l; for narrow openings, the single candidate is the centre of the opening, c = (k_r + k_l)/2.
Here s_max is the maximum number of sectors of a narrow opening and k_t is the sector containing the goal direction. Each candidate c is then assigned a cost g(c) = μ_1·Δ(c, k_t) + μ_2·Δ(c, θ/α) + μ_3·Δ(c, k_prev), where Δ(·,·) is the angular difference between two sectors, θ/α is the sector of the current robot heading, and k_prev is the previously selected direction.
In order to prioritize moving towards the goal, we choose μ_1 > μ_2 + μ_3; the candidate direction with the lowest cost is selected as the new steering direction.
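The sketch below illustrates this candidate selection with a simple sector-based cost; the number of sectors and the weights μ_1, μ_2, μ_3 are illustrative assumptions.

```python
def sector_diff(c1, c2, n_sectors):
    """Angular distance between two sector indices on a circular histogram."""
    d = abs(c1 - c2) % n_sectors
    return min(d, n_sectors - d)

def select_direction(candidates, k_target, k_heading, k_prev,
                     n_sectors=72, mu1=5.0, mu2=2.0, mu3=2.0):
    """Pick the candidate sector with the lowest VFH+-style cost.

    Choosing mu1 > mu2 + mu3 makes the robot prefer candidates close to the
    target direction; the weights here are illustrative assumptions.
    """
    def cost(c):
        return (mu1 * sector_diff(c, k_target, n_sectors)
                + mu2 * sector_diff(c, k_heading, n_sectors)
                + mu3 * sector_diff(c, k_prev, n_sectors))
    return min(candidates, key=cost)

# Example: two openings produced candidates 10, 14 and 40; the target lies
# in sector 12 and the robot currently heads towards sector 20.
print(select_direction([10, 14, 40], k_target=12, k_heading=20, k_prev=14))
```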
We test the model in three ways, each slightly different. In the first test, we chose an observation belonging to the category label “__label__#station_map#” and tested the model against it. As shown in Figure 17, the model correctly predicted the category “__label__#station_map#”, with a probability of 94%.
Figure 17
Testing the model on a single sentence.

In the next test, we evaluate the keyword classifier on the entire dataset (1241 samples), which yields a precision at one (P@1) of 0.85 and a recall at one (R@1) of 0.85, as shown in Figure 18.
Figure 18
Model's performance on the test set.

In both tests, the precision is the proportion of labels predicted by the classifier that are correct, and the recall is the proportion of the true labels that are successfully predicted.
The third experiment was conducted in practice by connecting a microphone to a computer to capture the voices of 5 researchers. Each person repeated each group of keywords up to two times and read 30 groups of keywords in succession; after each reading, they waited for the computer to respond before reading the next keyword group. This test is intended to evaluate the audio acquisition, speech-to-text conversion, and classification capabilities of the program. The results are shown in Table 2.
Table 2. Test result data sheet.
Researcher | Correctly classified | Misclassified | Not classified |
1 | 26 | 2 | 2 |
2 | 28 | 1 | 1 |
3 | 27 | 2 | 1 |
4 | 23 | 4 | 3 |
5 | 29 | 0 | 1 |
According to the table above, the ability to receive and classify text converted from the user's voice is quite good. The cases of misclassification, or of failing to assign a label, occur because of duplicated keywords when the user reads a keyword group containing only a few words, that is, a statement too short to contain the distinguishing keywords. This can be addressed by displaying prompts on the computer screen that suggest questions to the passenger.
In the first test, the program was evaluated at three distances: 1.2 m, 1.8 m, and 2.4 m. For each distance, 10 measurements were made, and the averaged data are given in Table 3.
Table 3. Results of the experimental measurements of the non-calibrated program.
Measurements | Actual distance (m) | Measured distance (m) | Absolute error (m) | Relative error |
1–10 | 1.2 | 1.19 | 0.01 | 0.83% |
11–20 | 1.8 | 1.73 | 0.07 | 3.89% |
21–30 | 2.4 | 2.01 | 0.39 | 16.25% |
The measurement results in Table 3 show that the shorter the distance, the lower the percentage error and the higher the accuracy of the measurement. The mean error over the 30 measurements is 6.99%. In the second test, we evaluate the program at the same three distances while applying the additional calibration method. The results are shown in Table 4.
Table 4. Results of experimental measurements of the calibrated program.
Measurements | Actual distance (m) | Measured distance (m) | Absolute error (m) | Relative error |
1–10 | 1.2 | 1.18 | 0.02 | 1.67% |
11–20 | 1.8 | 1.76 | 0.04 | 2.22% |
21–30 | 2.4 | 2.14 | 0.26 | 2.67% |
According to Table 4, the average error over the 30 measurements is 3.52%. Applying the correction, that is, recalculating the focal length f after every 10 measurements, reduces the error significantly compared with the non-calibrated program.
Table 5. Comparison of distance measurement results with images taken in the laboratory environment.
In the practice room | 50% | 75% |
Table 6 compares the results obtained from images captured in outdoor environments.
Table 6 shows that average lighting conditions give the best results. Testing in the corridor also depends strongly on the light intensity at the time, for example under cloudy weather conditions. Image acquisition for distance measurement is therefore greatly affected by ambient light conditions.
Table 6. Comparison of distance measurement results with outdoor images.
Distance measurement ratio | 52% | 57% | 90% |
The robot was tested in a simulation of a miniature airport terminal of about 20 m × 20 m. The test environment includes locations such as entrance doors, departure gates, check-in counters, and coffee shops. The robot starts at the entrance door. When someone communicates with the robot and asks it to lead the way to a desired location on the map, the robot plans the most suitable trajectory and guides the passenger, as shown in Figure 19. The screen on the back of the robot shows the passenger the way to go. The rear camera is also used to check whether the passenger is still following the robot; if not, the robot returns to its starting position.
Figure 19
(a) Passengers interact with the robot. (b) The robot leads the passenger to the check-in counter.

On the floor of the airport terminal, magnets are placed at fixed positions so that the robot can localize itself. When guiding, the robot finds the optimal path that passes through these reference magnet points and is closest to its current position. The positions of the magnets can be seen in Figures 20 and 21. The spacing between the magnets is chosen so that the robot does not deviate from its trajectory during movement; here the magnets are about 3 m apart.
Figure 20
(a) Airport terminal map and robot path. (b) the actual path of the robot during passenger guidance.

Figure 21
Position of the reference magnets on the floor.

While moving, if the robot encounters an obstacle or a person crosses its trajectory, it uses the VFH+ method with the distance data from the ultrasonic sensors.
The obstacle avoidance results are shown in Figure 22, which presents the VFH+ polar histograms and the selected direction at successive positions of the robot during obstacle avoidance.
Figure 22
Human avoidance robot process and VFH + polar histograms.

This paper provides a framework for building an intelligent robot system to assist passengers in smart airports that can be extended to other areas, especially under COVID-19 conditions. The system is built on an autonomous intelligent mobile robot and is evaluated in a simulated departure terminal. To ensure accurate movement along the desired trajectory when there is no (or only a weak) GPS signal, new positioning algorithms using Ultra-Wideband technology, fused with data from lidar and encoder sensors, are proposed. A program that applies segmentation and feature-point extraction algorithms to meet the requirements of local environment mapping and obstacle avoidance is also proposed. The robot interacts with passengers through voice communication combined with machine-learning techniques to analyze and understand their requests. In addition, a face detection technique based on a cascaded convolutional neural network is used to predict the distance between robot and passenger so that the robot can guide passengers to areas in the airport. The tasks for assisting airline passengers have been surveyed, simulated, and evaluated experimentally, and the results demonstrate the effectiveness of the proposed methods.
Future quantitative studies will be applied to the installed robot system, which promises further useful results for the field of mobile robotics and a potential product for the market, able to work in locations where information is essential but may be scarce or ever-changing. Such robots are especially useful in crowded, busy places such as airports and other transport hubs, information desks, shopping malls, and medical services.