Research on pilot cognitive load can be traced back to the early 20th century [1-3]. Cognitive load refers to the psychological resources an individual uses to solve problems or complete tasks within a given period of time [4,5]. When new information, received directly or indirectly, overloads a person's working memory capacity, the burden on the cognitive system increases and a cognitive load forms [6]. Artificial intelligence technology is now widely applied across many fields, and its future development is difficult to predict; these developments will profoundly transform the fields involved. AI-driven machines such as automated cars already combine automation capabilities (the mechanical and control systems that let the vehicle follow instructions) with autonomous capabilities (AI-driven environmental perception and path planning), and can complete more and more tasks without human participation. However, as artificial intelligence penetrates more fields, especially automation control, human participation becomes increasingly indispensable: AI-driven machine intelligence opens enormous space for automation applications, but it also requires accounting for human factors. Pilot information processing proceeds in three stages: first, the pilot acquires sensory and tactile information; second, the pilot makes decisions and plans based on past experience; finally, the pilot carries out operational behavior [7]. The cognitive resources consumed by the pilot during this information processing constitute the cognitive load.
The pilot's autonomous willingness to react and operate runs through the entire human-machine-environment interaction, yet the pilot's subjective driving intention is rarely considered in current research. How different cognitive loads affect the actual effectiveness of the flight control process therefore needs to be emphasized.
Owing to limitations in experimental equipment and environments, early measurement methods often relied on action-response detection, pen-and-paper forms, form tests, and intelligence tests. Many researchers have validated and revised the various subjective scales in order to obtain more accurate evaluation results. Because subjective methods such as paper-and-pen questionnaires require participants to respond according to their own perception, the results are often subjective.
Because cognitive load cannot be directly observed or measured, objective measurement methods have been applied in most human-computer interaction studies, including eye tracking [8,9] and the Index of Cognitive Activity (ICA) [10]. In addition to objective methods, some scholars in multimedia learning research use subjective instruments such as the Paas mental effort scale [11] and the NASA-TLX (National Aeronautics and Space Administration Task Load Index) scale [12] to evaluate cognitive load. The main methods currently used to measure cognitive load in human-computer interaction research are physiological measurement [13], task-performance-based measurement [14], and subjective self-assessment [15].
Cognitive load measurement methods

Type | Indirect mode | Specific method |
---|---|---|
Subjective measurement | Subjective assessment scale | NASA-TLX, WP scales, etc. |
Task performance | Dual-task measurement | |
Objective measurement | Physiological measurement data | Electrocardiogram, eye movement, electroencephalogram, etc. |
Literature analysis shows that human-computer interaction research tends to use objective methods to obtain relatively reliable and valid data, and few studies (such as Clarke, Schuetzler, and Windle et al.) measure cognitive load with a single subjective method, in order to avoid participants' personal characteristics influencing the experimental results [16]. Stimuli experienced by an individual indirectly alter physiological data and thus represent the level of psychological processing; the hypothesis that physiological changes reflect, to some extent, an individual's psychological state underpins the physiological approach to measuring cognitive load [17]. Indirect objective methods such as eye tracking [18], functional near-infrared spectroscopy (fNIRS) [19], galvanic skin response, and electroencephalography (EEG) [20] have been used to measure cognitive load in human-computer interaction research.
Heart rate variability (HRV) is an indicator derived from the electrocardiogram (ECG) signal, referring to the beat-to-beat variation between consecutive cardiac cycles. Physiological functions not under voluntary control, including heartbeat, respiration, blood pressure fluctuation, and digestion, are regulated by the autonomic nervous system. Because HRV is influenced by many factors such as hormones, sleep deprivation, and diet, there is no single optimal reference interval. However, the time-domain and frequency-domain indicators of HRV permit a non-invasive, quantitative evaluation of the autonomic nervous system, so the ECG signal is selected here as the human-factors measurement data.
Name | Unit | Description | Formula |
---|---|---|---|
MEAN | ms | Mean RR interval | |
SDNN | ms | Standard deviation of normal RR intervals | |
RMSSD | ms | Root mean square of successive RR-interval differences | |
pNN50 | % | Proportion of successive RR-interval differences greater than 50 ms |
Time-domain analysis is the simplest and most intuitive way to study HRV; its principle is the quantitative examination of statistical indicators such as MEAN and SDNN computed over the RR-interval sequence. The HRV time-domain indicators commonly used in analysis are listed in the table above.
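The indicators in the table can be computed directly from an RR-interval sequence. The following is a minimal NumPy sketch (the RR series shown is synthetic, not experimental data):

```python
import numpy as np

def hrv_time_domain(rr_ms):
    """Compute common HRV time-domain indicators from RR intervals (ms)."""
    rr = np.asarray(rr_ms, dtype=float)
    diff = np.diff(rr)                                   # successive RR differences
    return {
        "MEAN":  rr.mean(),                              # mean RR interval
        "SDNN":  rr.std(ddof=1),                         # std of RR intervals
        "RMSSD": np.sqrt(np.mean(diff ** 2)),            # RMS of successive differences
        "pNN50": 100.0 * np.mean(np.abs(diff) > 50.0),   # % of differences > 50 ms
    }

# Example: a short synthetic RR series around 800 ms
indicators = hrv_time_domain([800, 810, 790, 845, 795, 805])
```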
The area enclosed by the power-spectrum curve over each frequency band equals, numerically, the signal power in that band. The energy of each band is therefore extracted from the power spectrum to quantitatively analyze the frequency-domain characteristics of HRV, as shown in the figure.
Name | Abbreviation | Meaning | Frequency range |
---|---|---|---|
Very low frequency | VLF | - | < 0.04 Hz |
Low frequency | LF | Reflects sympathetic nervous activity | 0.04 ~ 0.15 Hz |
High frequency | HF | Reflects parasympathetic nervous activity | 0.15 ~ 0.4 Hz |
The HRV frequency-domain indicators on each leg of the five-sided flight pattern are obtained through frequency-domain analysis, including the normalized low-frequency power.
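Under the band definitions in the table above, the frequency-domain indicators can be sketched as follows (a sketch, assuming SciPy is available; the 4 Hz resampling rate and Welch's method are common choices, not stated in the text):

```python
import numpy as np
from scipy.signal import welch

BANDS = {"VLF": (0.003, 0.04), "LF": (0.04, 0.15), "HF": (0.15, 0.40)}

def hrv_freq_domain(rr_ms, fs=4.0):
    """Band powers of an RR-interval series via resampling and Welch's PSD."""
    rr = np.asarray(rr_ms, dtype=float)
    t = np.cumsum(rr) / 1000.0                  # beat times in seconds
    grid = np.arange(t[0], t[-1], 1.0 / fs)     # uniform time grid
    rr_even = np.interp(grid, t, rr)            # evenly resampled RR series
    f, pxx = welch(rr_even - rr_even.mean(), fs=fs,
                   nperseg=min(256, len(rr_even)))
    df = f[1] - f[0]
    power = {name: pxx[(f >= lo) & (f < hi)].sum() * df
             for name, (lo, hi) in BANDS.items()}
    power["LFnorm"] = 100.0 * power["LF"] / (power["LF"] + power["HF"])
    power["LF/HF"] = power["LF"] / power["HF"]
    return power
```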
Reinforcement learning consists of three parts: the agent, the reward function, and the environment. As shown in the figure, the initial state of the environment is fed to the agent; the agent selects an appropriate action based on the state and applies it to the environment; the environment returns the reward generated by the action together with the new state; the agent then corrects its policy according to the reward and outputs a new action for the new state, and the cycle repeats. The goal of reinforcement learning is to learn a policy function that maximizes the expected cumulative reward.
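The agent-environment cycle described above can be sketched with a toy stand-in environment (the `ToyEnv` dynamics and the random placeholder policy are illustrative, not the aircraft model in this paper):

```python
import random

random.seed(0)  # reproducible run

class ToyEnv:
    """Toy environment: the reward is larger the closer the state is to zero."""
    def reset(self):
        self.state = 1.0
        return self.state

    def step(self, action):
        self.state += action                # the action changes the state
        reward = -abs(self.state)           # closeness to zero is rewarded
        return self.state, reward

def policy(state):
    """Placeholder actor: random exploratory actions."""
    return random.uniform(-0.1, 0.1)

env = ToyEnv()
state = env.reset()
for _ in range(100):                        # the interaction cycle
    action = policy(state)                  # agent selects an action from the state
    state, reward = env.step(action)        # environment returns new state + reward
```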
The actor and critic represent the policy function and the action-value function, respectively.
The core of DDPG is to split both the actor and the critic into two networks each: a current network and a target network. After the actor applies an action to the environment, the transition samples (state, action, reward, next state) are stored in an experience replay buffer and drawn for training.
The function of the current critic network is to update its parameters by minimizing the error between its Q-value estimate and the learning target.
The updated current networks periodically copy their weights to the corresponding target networks.
The target actor network selects the action for the next state, and the target critic network evaluates its Q value to form the learning target, which stabilizes training.
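The two mechanics above — the critic's learning target and the periodic soft copy of weights — can be sketched with scalar stand-ins for the network outputs (GAMMA and TAU are typical DDPG values, not necessarily the paper's):

```python
import numpy as np

GAMMA, TAU = 0.99, 0.005   # discount factor and soft-update rate (typical values)

def td_target(reward, q_next, done, gamma=GAMMA):
    """Critic's learning target: r + gamma * Q'(s', mu'(s')) for non-terminal steps."""
    return reward + gamma * q_next * (1.0 - done)

def soft_update(target_w, current_w, tau=TAU):
    """Slowly track the current network: w' <- tau * w + (1 - tau) * w'."""
    return tau * current_w + (1.0 - tau) * target_w

# Example with scalar stand-ins for network outputs
y = td_target(reward=1.0, q_next=10.0, done=0.0)              # 1 + 0.99 * 10 = 10.9
w = soft_update(target_w=np.array([0.0]), current_w=np.array([1.0]))
```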
Because short-term cognitive load must be characterized, the ECG time-domain and frequency-domain indicators are fitted linearly to produce a continuous short-term cognitive load estimate.
Compared with computing an average cognitive load from time-domain and frequency-domain indicators over a whole time period, fitting short-term cognitive load with the above formula is more accurate and closer to real time. The result can then serve as one of the input data streams for the reinforcement learning model.
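A linear fit of this kind can be sketched with ordinary least squares (the indicator columns, reference scores, and window layout below are illustrative stand-ins, not the paper's data or exact formula):

```python
import numpy as np

# X: rows of HRV indicators per time window (e.g. [SDNN, RMSSD, LFnorm]);
# y: reference cognitive-load scores for those windows.
# Both are illustrative stand-ins.
X = np.array([[45.0, 30.0, 55.0],
              [40.0, 26.0, 60.0],
              [35.0, 22.0, 66.0],
              [30.0, 18.0, 72.0]])
y = np.array([0.3, 0.45, 0.6, 0.75])

A = np.hstack([X, np.ones((len(X), 1))])        # add an intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)    # least-squares fit

def cognitive_load(indicators):
    """Short-term cognitive load estimate from one window of HRV indicators."""
    return float(np.append(indicators, 1.0) @ coef)
```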
The neural network structures of the critic and the actor are shown in the figure. The hidden layer size is 100. The input layer of the critic receives the aircraft state and the cognitive load together with the actor's action, and outputs the corresponding Q value.
Assuming that the observation input of the reinforcement learning agent is four-dimensional, the state input of the aircraft consists of four flight-state variables, including z and γ.
Training is terminated when the state exceeds the set range. If the range is too small, the exploration space of actions during training is limited and the reward function takes a long time to converge; if the range is too large, it no longer matches actual operation, and once the aircraft pitch angle reaches about 80° the situation becomes hard to control. When the pitch angle exceeds a certain range, the aircraft cannot be adjusted back to a stable state in real operation, so setting a training range screens out useless training samples. The reward function guides the aircraft toward the expected operating state during training, and its design directly affects the control accuracy and robustness of the final controller. Based on experience and experimentation, the coefficients of the relatively direct state variables z and γ are set to 0.3 and 0.5, the coefficients of the remaining two state variables to 0.04 and 0.03, and the previous control action is weighted with a coefficient of 0.005. Overall, the final reward function is:
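The quoted coefficients suggest a penalty of roughly the following shape (a sketch only: the quadratic form, the names of the two unnamed state variables, and the sign convention are our assumptions, not the paper's exact equation):

```python
def reward(z, gamma_angle, v1, v2, prev_action):
    """Illustrative reward using the coefficients quoted in the text.
    z, gamma_angle: the two direct state variables; v1, v2: the two
    remaining state variables (not named in the text); prev_action:
    the previous control action. Quadratic penalty form is assumed."""
    return -(0.3 * z**2 + 0.5 * gamma_angle**2
             + 0.04 * v1**2 + 0.03 * v2**2
             + 0.005 * prev_action**2)
```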
The human-factors wireless physiological acquisition platform includes the ErgoLAB v1.0 signal acquisition device, a laptop, a high-definition camera, and other experimental equipment. The equipment is located in a space with dim artificial lighting, kept at a comfortable temperature. The ECG collection device is a wireless optical capacitive pulse sensor with a sampling frequency of 512 Hz; its specific technical parameters are given in Table IV. The ErgoLAB v1.0 wireless receiver is connected to a laptop to transmit the subject's ECG signal in real time over a local area network, with a transmission frequency of 2.4 GHz.
Name | Value range |
---|---|
ECG resolution | ≥ 16 bit |
Measurement range | -1500 |
Adjustable magnification | 1, 2, 3, 4, 5, 6, 7 |
Accuracy | 0.183 |
Number of wireless sensor channels | ≥ 1 |
Wireless transmission frequency | 2.4 GHz |
Transmission distance | 10 m ~ 100 m |
Battery operating time | ≥ 4 h |
The flight simulation uses DCS World (Steam Edition). Experiments were conducted with eight skilled flight trainees and apprentices. Debugging of the experimental equipment mainly covers the model, airport, weather, date, and aircraft location, as shown in the figure. After entering the experimental course software, open the instructor console, select the "five-sided flight" experiment under "Create Task", and click the start button. Then enter the scene settings interface, change the aircraft model to the Su-25T, set the takeoff runway to Senaki, select the current experimental date and time, and ensure suitable meteorological conditions.
The specific experimental operation process is as follows:
1. Set the flight simulator weather to clear, with a temperature of 20 °C and a cloud base of 2500 m. Set the initial position of the aircraft at the runway threshold, aligned with the runway centerline; the subject begins the pre-takeoff checks.
2. After completing the pre-takeoff checklist, the subject pushes the throttle fully and maintains steady acceleration until the airspeed indicator shows more than 55 knots, then pulls back on the stick to lift the wheels and takes off at a climb rate of 500 feet per minute.
3. Turn right 90° at the turning landmark, with a maximum bank angle of 20°, heading from 50° to 150°, maintaining a climb speed of 70 knots.
4. Turn onto the third leg at the second turning point. The aircraft has reached the pattern altitude with a stable airspeed of around 80 knots; maintain altitude, heading, and airspeed.
5. On the short third leg, reduce speed in advance and perform the pre-landing checklist: confirm the throttle valve is open, check the engine parameter table and engine temperature, check the remaining fuel on the fuel gauge, confirm the mixture lever is in the rich position, and check the effectiveness of the braking device. Gently retard the throttle, begin the descent, and maintain a descent rate of 500 feet per minute at an airspeed of around 70 knots.
6. Turn onto the fifth leg at the fourth turning point, with a maximum bank angle of 30°. Confirm there are no obstacles on the runway, control the throttle as needed, close the throttle before touchdown, gently flare the aircraft, and wait for touchdown. After touchdown, brake gently to a stop.
Baseline drift and other noise mixed into the ECG signal are removed with a low-pass filter; a notch filter then eliminates power-line interference. A threshold method is applied to extract the R-wave features from the ECG waveform, after which the HRV time-domain indicators are analyzed quantitatively. The main idea of the threshold method is to exploit the fact that the QRS complex is the most oscillatory band within the ECG waveform: by setting different threshold ranges, the start of the QRS main wave is located, and the position of the R-wave vertex is then determined using window and amplitude thresholds.
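A minimal version of such a threshold detector might look like the following (a sketch with illustrative parameters, not the paper's implementation; real ECG would need the filtering described above first):

```python
import numpy as np

def detect_r_peaks(ecg, fs, threshold_ratio=0.6, refractory_s=0.3):
    """Simple threshold R-peak detector: mark samples above an amplitude
    threshold, then keep the local maximum within a refractory window
    after each crossing (the R-wave vertex)."""
    ecg = np.asarray(ecg, dtype=float)
    thr = threshold_ratio * ecg.max()           # amplitude threshold
    refractory = int(refractory_s * fs)         # minimum R-R spacing in samples
    peaks, i = [], 0
    while i < len(ecg):
        if ecg[i] > thr:
            window = ecg[i:i + refractory]
            peaks.append(i + int(np.argmax(window)))   # local maximum = R vertex
            i += refractory
        else:
            i += 1
    return np.array(peaks)
```

The RR intervals for the HRV analysis then follow directly as `np.diff(peaks) / fs * 1000.0` in milliseconds.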
The original physiological signal obtained is shown in the figure. The horizontal axis represents the sample index (×10⁴) and the vertical axis represents the amplitude.
The image after denoising is shown in the figure:
The threshold method is used to extract R-waves from the denoised ECG data; the extraction results are shown in the figure:
Cognitive load
As shown in the figure, during the five-sided flight the quantified cognitive load first rises and then falls, because cognitive load is influenced more strongly by psychological factors during the takeoff and landing stages.
The server used in the experiment has a 256-core AMD EPYC 9654 CPU at 2.4 GHz, 128 GB of memory, and two A800 graphics cards with 80 GB of memory each; the operating system is Ubuntu. The framework is TensorFlow on Python 3.6. The batch size is 512, the number of iterations 1000, the learning rate 0.01, the delay steps 2, the experience pool size 1000, the actor network learning rate 0.0001, the critic network learning rate 0.0002, and the exploration rate 0.9. The closer the aircraft is to the expected state, the larger the reward; the training objective is an average reward greater than 200 over five consecutive episodes.
In subsequent testing it was found that, because of the training-range setting, the aircraft could exceed the range within one second and terminate the episode, yet the episode's cumulative reward still exceeded 200 because of the small number of samples; the controller obtained after five such consecutive episodes could not complete the aircraft stabilization task. The completion condition was therefore changed: each episode must reach 400 samples, and the mean reward over five consecutive episodes must exceed 200. Training in the simulation environment is terminated when the reward function meets these requirements. Obtaining a controller through reinforcement learning is a process of continuous adjustment and improvement with no single optimal result; based on the simulation results of each training run, the reward function and training requirements can be further adjusted so that the aircraft gradually reaches the expected operating state.
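The revised stopping rule can be expressed as a small check (a sketch; function and variable names are ours, not the paper's):

```python
def training_complete(episode_rewards, episode_lengths,
                      min_steps=400, window=5, target=200.0):
    """Stop condition described above: each of the last `window` episodes
    must run at least `min_steps` samples AND their mean reward must
    exceed `target`. Short early-terminated episodes no longer count."""
    if len(episode_rewards) < window:
        return False
    recent_r = episode_rewards[-window:]
    recent_n = episode_lengths[-window:]
    return (min(recent_n) >= min_steps
            and sum(recent_r) / window > target)
```

Requiring the minimum episode length filters out exactly the failure mode observed above, where an episode terminated within one second but still accumulated a reward over 200.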
In Figures 11 to 14, during the initial training stage the aircraft is still exploring and accumulating experience, so the learning effect is not yet ideal. As the number of training episodes grows, the accumulated experience becomes richer; after the initial trial-and-error learning, the cumulative return of the algorithm rises rapidly and the reward stabilizes over the time steps, quickly reaching convergence. In addition, the return value increases gradually, the exploration variable rises to 0.8, and the model entropy falls below -2, indicating a good training effect.
In Figures 15 and 16, the fluctuations of the velocities and angular velocities in all directions decrease, the acceleration in the z direction tends to 0, and the angular velocities in all directions also tend to 0, indicating that the aircraft has landed and entered the ground roll phase.
Figures 17 and 18 show that the mean square error and standard deviation of the aircraft's final position and pitch angle gradually decrease with the number of iterations and tend to 0, indicating that the landing becomes progressively more stable.
Based on the time-domain and frequency-domain analysis of the heart rate variability data, this article uses a linear fitting method to obtain a cognitive load curve whose sampling rate matches that of the flight data. Guided by cognitive load theory, the DDPG algorithm is improved by incorporating human factors into the reinforcement learning closed loop. Training with the improved DDPG algorithm effectively reduces the number of ineffective explorations in the early stage, accounts for physiological changes in the human-computer interaction setting, and achieves good control performance.