After World War One, efforts to predict success in pilot training were under way in both Europe and the United States. In Europe, the French and British approaches tended to focus on the physiological challenges of aviation. In the United States, the approach tended to focus more on the psychological difficulties. This dichotomy continued well into the Second World War. It was not until the landmark Pensacola 1000 study in the United States in 1945 that the superiority of the psychological approach was demonstrated. (1)
In the Pensacola 1000 study, 900 US Navy flight training cadets were subjected to 60 different psychological, psychomotor and physiological tests. The study determined that the physiological tests predicted success no better than chance. It concluded that psychometric and psychomotor tests were predictive of success on flight training.
The Pensacola 1000 study became the model for pilot psychometric testing from 1945 until the present day. The research led to the creation of the Naval Aviator Test Battery. The Naval Aviator Test Battery included the Wonderlic Personnel Test (a test of general ability or intelligence), the Bennett Mechanical Comprehension Test (a test of mechanical interest and skills), and the Purdue Biographical Inventory (a measure of morale, interest, and attitudes).
The Pensacola 1000 study demonstrated that psychomotor tests were predictive of success in pilot training. Despite this result, psychomotor tests had problems with ease of use, reliability, and standardisation. As a result, they were omitted from the Naval Aviator Test Battery, and psychomotor tests fell out of use in the United States in the decade after the Second World War. (1)
Outside of the United States, the experience was different. The Royal Air Force (RAF) and the Royal Australian Air Force (RAAF) persisted with electromechanical psychomotor tests from the 1940s until well into the 1990s.
Pilot aptitude testing has also included various combinations of other measures. These have included previous flight experience, previous service experience, interview results, performance on work sampling, and results at flight screening. This has been an attempt to boost the relatively modest predictive power of psychometric and psychomotor testing. (2, 3)
In the 1970s and 1980s, Scandinavian countries developed the Defence Mechanism Test (DMT), which was a significant departure from the approach that had developed following the Pensacola 1000 study. The DMT is a projective test based on images that are assumed to be anxiety-provoking. The images are presented through a tachistoscope, which gradually increases the exposure time from 5 to 2000 milliseconds. The rationale for the test as a selection instrument for stressful occupations is that psychological defences bind psychic energy necessary for coping with stressful situations. Consequently, subjects with maladaptive strategies for dealing with stress are expected to perform worse on the test. The Scandinavians reported significant predictive ability for the DMT. The approach was trialled by air forces outside Scandinavia, including the RAAF, but in those trials the DMT failed to reproduce the Scandinavian results. A study by Ekehammar et al. in 2005 aimed to understand why this was so. They concluded that the DMT does not measure what it purports to measure. They found that a more plausible explanation was that DMT performance reflects information processing difficulty due to anticipatory or test anxiety. (4)
Since the 1980s, computer technology has been introduced into pilot aptitude testing. Various systems have been deployed which combine psychometric and psychomotor tests in a single device. (1) It is important to realise that these machines are based on electronic versions of pre-existing psychometric and psychomotor tests. Because of this, Bartram et al. (1995) concluded that this technology is not expected to significantly enhance the prediction of success or failure on pilot training. (5)
A strength of computer-based devices is that the psychomotor element doesn’t suffer from the issues that plagued the electromechanical devices of the past. This led to the United States military reintroducing psychomotor testing into their test batteries in the 1980s. Computer-based systems have thus enabled the reintroduction of psychomotor testing and its combination with psychometric testing in a single device. The widespread uptake of these devices by military and non-military users around the globe has helped to standardise the assessment process.
Although there has been widespread and continuous use of psychometric testing over a very long period, the predictive abilities of these tests have always been modest.
A landmark meta-analysis study by Hunter and Burke in 1992 reported validity coefficients as a function of predictor type (table 1). (6)
| Predictor Measure | Mean r |
|---|---|
| | 0.1924 |
| Personality | 0.1168 |
| | 0.2256 |
| | 0.3272 |
| | 0.2646 |
| | 0.3035 |
| | 0.1934 |
| | 0.0889 |
| | 0.1973 |
The researchers reported that in general, job sample measures were the best predictors of performance, followed by psychomotor coordination and biographical inventories.
Somewhat depressingly, Hunter and Burke reported that the analysis showed a decline in the mean validity correlations over the previous 50 years.
Another disappointing finding was that for the personality measures (mean correlation of 0.1168) the 95% confidence interval was +/- 0.2644. This interval includes zero, so the data cannot rule out that this measure is no more predictive than a coin toss. (6)
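The arithmetic behind that observation can be checked in a few lines. The sketch below is only an illustration: it takes the mean correlation and the half-width quoted above and assumes the interval is symmetric about the mean. The function names are hypothetical and are not drawn from Hunter and Burke.

```python
def interval_from_half_width(mean_r, half_width):
    """Return (lower, upper) bounds, assuming a symmetric interval."""
    return (mean_r - half_width, mean_r + half_width)


def includes_zero(interval):
    """True when zero lies inside the interval (bounds inclusive)."""
    lower, upper = interval
    return lower <= 0.0 <= upper


# Figures quoted in the text for the personality measures.
personality_ci = interval_from_half_width(0.1168, 0.2644)

print("lower bound:", round(personality_ci[0], 4))
print("upper bound:", round(personality_ci[1], 4))
print("includes zero:", includes_zero(personality_ci))
```

The lower bound is negative, so a true validity of zero (or even a slightly negative one) is consistent with the reported figures.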
In a 1996 paper, Damos et al. offered the following list of potential explanations for why these tests aren’t more predictive: (7)
Although psychometric tests are unable to provide significant prediction in isolation, when they are combined into selection batteries, they provide increments in prediction that continue to be attractive to organisations that are responsible for candidate selection. (2, 3) A common and interesting observation is that correlations with success on pilot training are not reflected in success on operational training. (7)
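To see how a battery can add an increment over its best single test, consider the textbook formula for the multiple correlation of two predictors with a criterion. The sketch below is an illustration only: the validities (0.30 and 0.25) and the correlation between the two tests (0.30) are hypothetical values chosen for the example, not figures from the cited studies, and `multiple_r` is a hypothetical helper name.

```python
import math


def multiple_r(r1, r2, r12):
    """Multiple correlation R of two predictors with a criterion.

    r1, r2 : validity of each predictor on its own
    r12    : correlation between the two predictors
    """
    if not -1.0 < r12 < 1.0:
        raise ValueError("r12 must lie strictly between -1 and 1")
    r_squared = (r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2)
    return math.sqrt(r_squared)


# Hypothetical values, for illustration only.
best_single = 0.30
combined = multiple_r(0.30, 0.25, 0.30)

print("best single test:", best_single)
print("two-test battery:", round(combined, 3))
print("increment:", round(combined - best_single, 3))
```

Under these assumptions the battery does better than either test alone, but only modestly, and the gain shrinks as the two tests overlap more with each other.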
As long as pilot training continues to be expensive, and while there is a large number of applicants for a small number of training places, it is likely that this approach will continue, despite its limitations. On the other hand, if a severe shortage of pilots develops (as predicted by ICAO), the limitations of this approach may become more apparent.
Many airlines around the world have incorporated psychometric testing into their selection processes. Somewhat paradoxically, United States airlines have not employed these tests as much as many overseas airlines, due in part to the particular regulatory framework in which they operate. (7)
The International Air Transport Association (IATA) have published
These guidelines make the following claims for pilot aptitude testing:
“
The guidelines are based on a large survey of the practices of member airlines. Although the guidelines make significant claims for the benefit of pilot aptitude testing, they do not address some of the limitations identified in the research literature discussed earlier in this paper. The guidelines do note the following:
“
As previously noted, research has indicated that correlations with success on pilot training are not reflected in success on operational training. So it’s worthwhile to consider to whom the airlines are applying these IATA guidelines. If they are applied to airline cadets then, based on the military experience, they might predict success in training.
If the test batteries are being applied to pilots being recruited from the military, or other airlines, then this expectation is probably not realistic. Alternatively, airlines may expect that (for qualified pilots) the batteries may select candidates who are ‘safer’ or more compatible with the ‘culture’ of the company. The evidence base for this expectation is not currently well described in the scientific literature.
Since the 1980s, there has been a convergence of psychometric and psychomotor testing in terms of their incorporation into computer-based devices. At the same time, computer technology has been incorporated into the cockpit, with resulting automation of critical roles. As these processes continue, the screening device, the simulator and the aircraft may converge to the point that they are largely identical from the viewpoint of the “pilot”. At that stage, very little if any selection or training will take place in real aircraft.
A survey by financial services firm UBS estimated that moving from 2 pilots to 1 pilot for airline operations would yield a potential profit of $15 billion. The study also noted that 70-80% of accidents are the result of human error and that 15-20% of those are due to crew fatigue. (10) It is likely that these drivers will result in greater automation of the cockpit, to the point where a pilot’s role may be unrecognisable from what it is today. The level of automation may result in pilots’ tasks becoming more similar to those of a Main Control Room (MCR) operator in a nuclear power plant.
A study by Zhang et al. examined the correlation between a psychometric measure known as general mental ability (GMA) and the performance and safety compliance of MCR operators in nuclear power plants. The study noted that GMA is the best single predictor of work performance, with criterion-related validity as high as .51. (9)
In this context, it’s interesting to consider that a change in the “task” might be the missing ingredient which finally delivers on the promise of psychometric testing.