Comparative Analysis of Deep Learning and Decision Tree Approaches for Predicting Aircraft Engine Remaining Useful Life
Article category: Research Article
Published online: 19 Nov 2024
Pages: 183 - 200
DOI: https://doi.org/10.2478/fas-2023-0012
© 2023 Hassina Madjour et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
Prognostics and Health Management (PHM) methods are essential in ensuring the reliability and longevity of industrial systems. PHM encompasses a suite of technologies and processes designed to monitor the health of machines and equipment, predict potential failures, and provide actionable insights for maintenance decision-making. By integrating advanced diagnostics, prognostics, and health management strategies, PHM enables the early detection of faults, the reliable estimation of Remaining Useful Life (RUL), and the planning of maintenance activities to prevent unexpected breakdowns. This proactive approach not only enhances operational efficiency but also reduces maintenance costs and downtime. PHM methods are increasingly being adopted across various industries, including aerospace, automotive, and manufacturing, where they play a critical role in sustaining continuous operation and optimizing asset performance. Techniques that depend on Condition-Based Maintenance (CBM) and the use of intelligent PHM methods have proven their high efficiency in achieving this (Zhao et al., 2017). With the advent of data-driven and machine-learning approaches, PHM has evolved to incorporate more sophisticated algorithms, such as deep learning models, that offer higher accuracy in failure prediction and health assessment, further solidifying its importance in modern industrial practices.
Among the goals of PHM is to reduce maintenance costs and increase system reliability by estimating RUL based on historical data (Heimes, 2008). Recently, RUL prediction has garnered significant attention from both researchers and operators, driven by the increasing industrial demand for efficiency and reliability (Zhao et al., 2017). Compared with traditional strategies, such as corrective maintenance and scheduled preventive maintenance (Azadeh et al., 2015), RUL prediction offers a more flexible approach. RUL is generally defined as “the length from the current time to the end of its useful life” (Si et al., 2011), and it is widely employed in decision-making to improve health maintenance and management policies (Rezaeian Jouybari & Shang, 2020; Lei et al., 2018). The primary algorithms for predicting RUL can be categorized into model-based methods (Cubillo et al., 2016; Pecht & Gu, 2009), data-driven methods (Li et al., 2019; Heng et al., 2009), and hybrid methods (Liao & Köttig, 2014; Krizhevsky et al., 2012).
The first type of method focuses on describing the stages of system degradation by building mathematical models based on failure mechanisms or the root cause of damage (Xiao et al., 2022). An example is the Paris-Erdogan (PE) model, used to describe crack propagation (Qian et al., 2017). However, one obstacle to this approach is the difficulty of generalizing a model built for one system to other systems. This limitation has led to the rise of data-driven methods as an alternative, which rely on historical data specific to each system. With the growing availability of massive datasets, deep learning has proven effective in many fields (Zhang et al., 2021). Hybrid methods, which combine model-based and data-driven approaches, have also shown promising results in prediction (Tian et al., 2010), as they leverage both data and structural information. In aircraft engine health monitoring, commonly used hybrid methods include the Weibull distribution (Ben Ali et al., 2015), particle filter (Ben Ali et al., 2015), and Eyring model (Jouin et al., 2016).
As noted, techniques based on historical data model deterioration characteristics well, provided there is sufficient data available for training. Because these models do not rely primarily on prior expert knowledge, numerous approaches have been proposed in recent years that have achieved good prediction results. Among them are Support Vector Machines (SVM) (Benkedjouh et al., 2013) and Artificial Neural Networks (ANN) (Gebraeel et al., 2004). Machine learning techniques, especially neural network-based approaches, have seen widespread adoption in health management and failure prediction, due to their capacity to model nonlinear systems without requiring a detailed understanding of the system’s physical structure. Instead, they use the information derived from sensors as input (Sikorska et al., 2011).
Tian (2012) used Artificial Neural Networks (ANN) to predict the lifetime of condition-monitored equipment, specifically pump bearings, taking the measurement values at various inspection points as inputs and the percentage of equipment life as output. A neural network based on Long Short-Term Memory (LSTM) has also been used to estimate the RUL of pneumatic motors in cases of failure and strong noise (Elsheikh et al., 2019). LSTM is a recurrent neural network model that has been proposed to solve the RUL estimation problem by exploiting data collected by sensors that register various indicators, such as vibration intensity or exerted pressure (Kali & Linn, 2010).
Deep learning has shown effective results in pattern recognition and great capabilities in the field of intelligent predictions, thanks to a multi-layered structure that captures detailed information from input data (Gonzalez, 2007). With complex deep structures, it is possible to design highly abstract models, resulting in more efficient feature extraction than with shallow networks. Data obtained from image processing (Liao et al., 2016) and from machine monitoring share a two-dimensional structure. Deep learning networks therefore have considerable potential for PHM and RUL estimation.
The restricted Boltzmann machine (RBM) has been proposed to predict RUL in machinery (Liao et al., 2016). To address the problem of the deterioration of NASA’s turbofan engine, a multipurpose Deep Belief Networks (DBN) suite was proposed, integrating an evolutionary algorithm with the traditional DBN training technique to develop multiple DBNs (Navathe et al., 2016). Convolutional Neural Networks (CNNs), originally proposed by LeCun et al. (Lee et al., 2014) for image processing, have since achieved considerable success in various applications.
In this study, we apply a Convolutional Neural Network (CNN) architecture, specifically a one-dimensional (1D) deep neural network, to predict the Remaining Useful Life (RUL) of aircraft engines and improve prediction accuracy. The proposed approach is evaluated using the NASA C-MAPSS dataset, a widely recognized benchmark for engine health monitoring research (Saxena et al., 2008). This dataset provides realistic simulation data for gas turbine engines, allowing for an in-depth examination of the deterioration characteristics of engine components.
The study begins with a review of RUL prediction literature, including an overview of previous work utilizing the C-MAPSS dataset. Following this, the methods for modeling degradation characteristics are detailed, with a particular focus on deep learning techniques. Given the increasing complexity of modern aircraft systems, which demand high reliability and safety standards, accurately predicting RUL is critical for effective maintenance planning. This study addresses these challenges by leveraging historical sensor data to model engine performance degradation. The comparative analysis involves applying the CNN-1D model alongside traditional machine learning algorithms, specifically a Decision Tree (DT) model, to assess predictive performance on the same dataset. This comparison aims to validate the proposed method’s effectiveness and highlight the advantages of deep learning approaches in RUL estimation. The results underscore the capability of the CNN-1D model to provide accurate RUL predictions, paving the way for broader adoption of deep learning techniques in predictive maintenance for aviation and other industries.
Convolutional Neural Networks (CNNs) are a class of deep learning models known for their ability to achieve high accuracy in various tasks. While two-dimensional CNNs (CNN-2D) have been applied to image processing and object recognition (Suah, 2017; Gehring et al., 2017), one-dimensional CNNs (CNN-1D) have achieved remarkable success in other domains, such as structured language data (Johnson & Zhang, 2017) and document classification (Shenfield & Howarth, 2020).
CNNs are feed-forward neural networks that consist of multiple stages designed to extract features from input data. Fig. 1 illustrates a simple example of a one-dimensional CNN architecture: in each convolutional stage, convolutional filters are applied, followed by an aggregation (pooling) step. The filters detect high-level features, and their outputs are passed to the aggregation step, which reduces the spatial size of the resulting feature maps.

Simple CNN-1D architecture with two convolutional layers (Frederick et al., 2007).
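The convolution and aggregation steps described above can be illustrated with a minimal sketch in plain Python. The sensor window, the hand-picked first-difference filter, and the function names are hypothetical and chosen only for illustration; in a trained CNN the filter weights are learned rather than fixed.

```python
def conv1d(signal, kernel, bias=0.0):
    """Valid 1D convolution (cross-correlation, as used in deep learning libraries)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2):
    """Non-overlapping max pooling; a trailing incomplete window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Hypothetical window of readings from a single sensor.
readings = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]
# Hand-picked filter of size 2 that responds to the local rate of change.
diff_filter = [-1.0, 1.0]

features = relu(conv1d(readings, diff_filter))  # one value per filter position
pooled = max_pool1d(features, size=2)           # shorter, downsampled feature map
print(features, pooled)
```

A filter of size 2 applied to 6 readings yields 5 feature values, and pooling with a window of 2 keeps the largest value in each pair, which is how the aggregation step shrinks the feature map.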

Example of a decision tree.
Decision Trees are one of the most widely used methods in forecasting and decision-making across many fields, including pattern recognition and machine learning (Stein et al., 2005; Sishi & Telukdarie, 2021). They are effective for solving both regression and classification problems (Charbuty & Abdulazeez, 2021; Yan et al., 2016), as they can predict future outcomes based on past and current data (De Oña et al., 2014). A DT algorithm partitions the dataset into multiple branches, creating a tree-like model that facilitates efficient decision-making and accurate predictions. The design of the tree is optimized based on the characteristics of the data and the requirements of the prediction model (DeCastro et al., 2008; Frederick et al., 2007).
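As a rough illustration of how a regression tree partitions data, the sketch below searches for the single threshold on one feature that minimizes the summed squared error of the two resulting branches. This is one common splitting criterion, assumed here for illustration; the function names and the toy sensor and RUL values are hypothetical and are not taken from the study.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) for the best binary split on one feature.

    Returns (None, sse(y)) when no split improves on the unsplit node.
    """
    best = (None, sse(y))
    candidates = sorted(set(x))
    # Try the midpoint between each pair of consecutive distinct feature values.
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Toy data: low sensor values go with long remaining life, high values with short.
sensor = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
rul = [100.0, 98.0, 102.0, 20.0, 22.0, 18.0]

threshold, error = best_split(sensor, rul)
print(threshold, error)
```

A full tree repeats this search recursively on each branch and over all features, and each leaf predicts the mean target value of the samples it contains.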
This study focuses on predicting the deterioration of a monitored turbine engine based on simulation data obtained from various sensors. The turbofan engine simulation model was built with C-MAPSS, a simulation tool developed at NASA. C-MAPSS is widely used in engine health monitoring research, due to its ability to simulate realistic conditions for large commercial turbine engines (Navathe et al., 2016). Fig. 3 presents a schematic diagram of a commercial aircraft gas turbine engine simulated using C-MAPSS.

Simplified engine diagram simulated in C-MAPSS (Heimes, 2008).
The Commercial Modular Aero-Propulsion System Simulation (C-MAPSS), developed by NASA, provides a transient simulation of a large commercial turbofan engine (up to 90,000 lbs of thrust) with a realistic engine control system. The software gives easy access to health, control, and engine parameters through a Graphical User Interface (GUI), a graphical simulation environment for the engine, enabling users to implement and test advanced algorithms.
C-MAPSS runs user-defined transient simulations and includes an atmospheric model that can simulate engine operation at altitudes from sea level to 40,000 feet, Mach numbers from 0 to 0.90, and ambient temperatures from -60°F to 103°F. The package also includes a power management system that allows the engine to be operated over a wide range of thrust levels across the full range of flight conditions (Chai & Draxler, 2014).
C-MAPSS takes 14 input parameters and can produce several output metrics. Table 1 lists the outputs that were used in this study. The inputs include fuel flow and a set of 13 health parameters that allow the user to simulate the effects of faults and deterioration in any of the five engine components: Fan, Low-Pressure Compressor (LPC), High-Pressure Compressor (HPC), High-Pressure Turbine (HPT), and Low-Pressure Turbine (LPT).
Description | Symbol | Units |
---|---|---|
Total temperature at the fan inlet | T2 | °R |
Total temperature at the LPC outlet | T24 | °R |
Total temperature at the HPC outlet | T30 | °R |
Total temperature at the LPT outlet | T50 | °R |
Pressure at the fan inlet | P2 | psia |
Total pressure in bypass-duct | P15 | psia |
Total pressure at the HPC outlet | P30 | psia |
Physical fan speed | Nf | rpm |
Physical core speed | Nc | rpm |
Engine pressure ratio (P50/P2) | Epr | |
Static pressure at the HPC outlet | Ps30 | psia |
Ratio of fuel flow to Ps30 | Phi | pps/psi |
Corrected fan speed | NRf | rpm |
Corrected core speed | NRc | rpm |
Bypass ratio | BPR | |
Burner fuel-air ratio | farB | |
Bleed enthalpy | bleed | |
Demanded fan speed | Nf_dmd | rpm |
Demanded corrected fan speed | PCNR_dmd | rpm |
Coolant bleed (HPT) | W31HPT | lbm/s |
Coolant bleed (LPT) | W32LPT | lbm/s |
Parameters for calculating Health Index | | |
Total temperature at the HPT outlet | T48 | °R |
Fan stall margin | SmFan | |
LPC stall margin | SmLPC | |
HPC stall margin | SmHPC | |
The dataset used in this study was provided by NASA and was generated with an aero-engine simulation program. It is divided into four subsets, FD001, FD002, FD003, and FD004, each containing measurements obtained from 21 sensors, in addition to some other settings. For this research, the FD001 subset was selected as a case study. Table 2 summarizes the FD001 data.
Dataset | C-MAPSS (FD001) |
---|---|
Training engines | 100 |
Testing engines | 100 |
Working conditions | 1 |
Fault modes | 1 |
This study utilized a dataset with approximately 24.72 billion samples and 26 input variables that influence the decay of engine FD001. The dataset’s features serve as input elements for the CNN model. To facilitate feature selection, a bar chart of the dataset’s features highlights the most influential variables affecting the model’s predictive performance (see Fig. 4).

The distribution of the dataset’s features.
Fig. 5 shows the distributions of 16 of the 26 data features. Many of these features, such as columns sr7, sr8, sr9, sr20, and sr26, carry significant information and form the basic building blocks of the model.

Bar chart of influential features.
After data simplification, a heatmap was produced to further examine the relationships and interdependencies among variables, as shown in Fig. 6. The heatmap visualizes the correlation matrix of the features, where each cell represents the correlation coefficient between two variables, with colors ranging from deep blue (indicating a strong positive correlation) to deep red (indicating a strong negative correlation). This visualization helps identify closely related variables, which can inform feature selection and engineering decisions by highlighting which variables might be redundant or highly informative for predictive modeling. Understanding these relationships makes it easier to interpret the data and to refine models for better prediction accuracy and efficiency.

Correlation heatmap for selected dataset features.
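The correlation matrix behind such a heatmap can be computed as sketched below, assuming the Pearson coefficient is the one displayed (the text does not name it). The column names and values are hypothetical placeholders, not data from FD001.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # a constant column has no defined correlation
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(columns):
    """Return a nested dict: matrix[row_name][col_name] -> correlation."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical columns: one rises with the cycle count, one falls.
data = {
    "cycle": [1, 2, 3, 4, 5],
    "sr_a": [2.0, 4.1, 5.9, 8.2, 10.0],
    "sr_b": [9.0, 7.0, 5.0, 3.0, 1.0],
}
corr = correlation_matrix(data)
for name, row in corr.items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```

Pairs of features with coefficients close to +1 or -1 are candidates for removal as redundant, while a constant column produces an undefined coefficient and is usually dropped before plotting.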
Engine cycles play a crucial role in assessing the Remaining Useful Life (RUL) of engines, as they significantly influence engine degradation and efficiency over time. Each engine cycle refers to a complete operational period of the engine, including both active operation and downtime. With each additional cycle, various factors contribute to gradual wear and tear, degrading engine performance and increasing the likelihood of failure.
The relationship between engine cycles and RUL is fundamental for predictive maintenance. Over time, the repeated stress and strain experienced during each cycle contribute to the degradation of engine components, such as turbine blades, bearings, and seals. This cumulative effect leads to a decline in engine efficiency and a higher probability of malfunction. Fig. 7 illustrates this relationship by displaying engine degradation as a function of the number of cycles. The graph demonstrates that, while the RUL of different engines may vary due to individual operational histories and maintenance practices, a common trend is observed: as the number of engine cycles increases, the remaining useful life of the engines generally decreases. This inverse relationship indicates that engines with higher cycles are more prone to experience reduced performance and are at a greater risk of failure.

Diagram of model degradation of all engines.
The diagram of engine degradation underscores the importance of monitoring engine cycles for predicting RUL. By analyzing the number of cycles alongside other operational data, predictive maintenance models can more accurately forecast when an engine might require maintenance or replacement, thereby improving overall reliability and reducing unexpected downtime.
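The inverse relationship between cycles and RUL can be made concrete with a small labelling sketch. It assumes that each training engine is recorded until failure, so that the RUL at a given cycle is the engine's last recorded cycle minus the current cycle. This is a common convention for run-to-failure data and is an assumption here, not a confirmed description of the labelling used in the study; the function name and sample rows are hypothetical.

```python
from collections import defaultdict


def add_rul(rows):
    """Attach an RUL label to each (engine_id, cycle) row.

    Assumes every engine's last recorded cycle is its failure point.
    Returns a list of (engine_id, cycle, rul) tuples in the input order.
    """
    last_cycle = defaultdict(int)
    for engine_id, cycle in rows:
        last_cycle[engine_id] = max(last_cycle[engine_id], cycle)
    return [(e, c, last_cycle[e] - c) for e, c in rows]


# Hypothetical records: engine 1 runs for 3 cycles, engine 2 for 2 cycles.
rows = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
labelled = add_rul(rows)
for record in labelled:
    print(record)
```

Under this convention the label decreases by one with each cycle and reaches zero at the final recorded cycle, which matches the downward trend described for Fig. 7.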
The architecture of the one-dimensional Convolutional Neural Network (CNN-1D) begins with a convolutional phase that uses a set of 32 learnable convolutional filters of size 2, which is consistent with the 96 parameters listed for this layer in Table 3. These filters extract local features from the input data, such as abrupt changes and trends in the sensor values. During this phase, the filters slide across the input data, performing convolution operations to identify and capture these key features.
Following the convolutional phase, the output is processed through an aggregation phase. This phase reduces the spatial dimensions of the feature maps generated by the filters, effectively downsampling the data while retaining the most significant features. This process is crucial for focusing on the critical features learned by each filter and for reducing the computational load in subsequent layers.
The network then continues to refine its feature extraction by applying an additional set of 64 filters. As the data progresses through these convolutional phases from left to right, the network increasingly learns more specialized and abstract features. Each successive layer of filters builds upon the previous layers, capturing higher-level patterns and details within the data. This iterative scanning and feature extraction process allows the CNN-1D to develop a comprehensive understanding of the input data, enhancing its ability to make accurate predictions. Table 3 illustrates the progression, showing how the network’s ability to recognize and learn features evolves through each convolutional phase, contributing to the overall performance of the model.
Layer (type) | Output Shape | Param # |
---|---|---|
conv1d (Conv1D) | (None, 23, 32) | 96 |
flatten (Flatten) | (None, 736) | 0 |
dense (Dense) | (None, 64) | 47168 |
dense_1 (Dense) | (None, 1) | 65 |
Total Params: 47,329
Trainable Params: 47,329
Non-trainable Params: 0
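The parameter counts in Table 3 can be checked with a short calculation. The sketch below assumes an input of length 24 with a single channel, a kernel size of 2, a stride of 1, and no padding. These values are not stated in the text; they are inferred from the reported output shape (None, 23, 32) and the 96 parameters of the first layer. The helper names are hypothetical.

```python
def conv1d_layer(length, channels, filters, kernel_size):
    """Output shape and parameter count of a Conv1D layer (stride 1, no padding)."""
    out_length = length - kernel_size + 1
    params = filters * (kernel_size * channels + 1)  # weights plus one bias per filter
    return (out_length, filters), params


def dense_layer(inputs, units):
    """Output size and parameter count of a fully connected layer."""
    return units, units * (inputs + 1)  # weights plus one bias per unit


# Assumed input: 24 steps, 1 channel (inferred from Table 3, not stated in the text).
conv_shape, conv_params = conv1d_layer(length=24, channels=1, filters=32, kernel_size=2)
flat_size = conv_shape[0] * conv_shape[1]          # Flatten adds no parameters
hidden_units, hidden_params = dense_layer(flat_size, 64)
output_units, output_params = dense_layer(hidden_units, 1)

total_params = conv_params + hidden_params + output_params
print(conv_shape, conv_params)     # first row of Table 3
print(flat_size, hidden_params)    # flatten size and first dense layer
print(output_params, total_params)
```

Under these assumptions the calculation reproduces every row of Table 3, including the total of 47,329 trainable parameters, almost all of which sit in the first dense layer.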
The CNN algorithm was chosen for building this model due to its powerful learning capabilities. Analysis of the predicted RUL for the FD001 engine revealed a close match between the predicted RUL and the actual RUL, confirming the efficiency of the algorithm in this application, as well as the more general overall effectiveness of deep learning.
The decision tree prediction model presented in Fig. 8 starts from the input dataset and adopts three splits. Implemented in Python, the model was trained on the regression problem using the following parameters:

Decision Tree structure.
The results indicate an improvement in prediction accuracy as the number of splits increases. However, some features, such as ‘sr7’ and ‘sr12,’ are more sensitive to new splits, leading to terminal leaf nodes at their branches. In contrast, the results for ‘sr11,’ which stem from node ‘sr7,’ were suboptimal, with a squared error at the final leaf exceeding that of the parent node.
Comparison of predicted and actual RUL graphs reveals that the predicted RUL often diverges from the actual RUL. When the predicted RUL graph is below the actual RUL graph, it indicates that the prediction was made too early, before the actual failure occurred. Conversely, when the predicted RUL graph is above the actual RUL graph, the prediction was late, occurring after the failure. Between these two situations, early prediction is preferable as it helps to avoid potential risks associated with late predictions. When the two graphs align closely, it indicates that the prediction accurately coincided with the failure occurrence (see Fig. 9).

Diagram of true and predicted RUL using CNN-1D.
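The distinction between early and late predictions can be expressed as a sign check on the prediction error. The sketch below is illustrative only: the function name, the optional tolerance band, and the sample values are hypothetical and are not part of the study.

```python
def prediction_timing(true_rul, predicted_rul, tolerance=0.0):
    """Label each prediction as 'early', 'late', or 'on time'.

    A predicted RUL below the actual RUL means failure is anticipated too soon
    (early); a predicted RUL above the actual RUL means it is anticipated too
    late. Errors within +/- tolerance count as 'on time'.
    """
    labels = []
    for actual, predicted in zip(true_rul, predicted_rul):
        error = predicted - actual
        if error < -tolerance:
            labels.append("early")
        elif error > tolerance:
            labels.append("late")
        else:
            labels.append("on time")
    return labels


# Hypothetical values in cycles.
actual = [50, 40, 30]
predicted = [45, 40, 38]
timing = prediction_timing(actual, predicted)
print(timing)
```

Counting the share of late predictions separately from the overall error is one simple way to reflect the preference for early predictions stated above.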
The CNN-1D algorithm demonstrated exceptional learning capabilities, as evidenced by the high convergence between the predicted and actual RUL values for the FD001 engine (Fig. 10). The figure shows that the CNN-1D model’s predictions closely match the actual RUL values, indicating strong model performance. This contrasts with the discrepancies observed between the expected and real RUL values when using a predictive decision support model based on the Decision Tree (DT) algorithm.

Diagram of true and predicted RUL using DT.
To assess the models’ performance, we used common statistical measures: Root Mean Square Error (RMSE) and Mean Square Error (MSE). These metrics quantify the prediction errors and provide insight into each model’s accuracy. RMSE, in particular, measures the standard deviation of the residuals, or prediction errors. Table 4 summarizes the evaluation metrics for both the CNN-1D and DT models. For the CNN-1D model, the training set showed an MSE of 459.51 and an RMSE of 21.44, while the test set resulted in an MSE of 735.56 and an RMSE of 27.12. The DT model had higher errors, with an MSE of 567.90 and an RMSE of 23.83 for the training set, and an MSE of 837.13 and an RMSE of 28.93 for the test set. Additionally, the R-squared (R2) values, which indicate the proportion of variance explained by the model, were higher for the CNN-1D model (0.75 for the training set and 0.57 for the test set) than for the DT model (0.68 for the training set and 0.56 for the test set). These results underscore the superior accuracy and reliability of the CNN-1D model over the DT algorithm in predicting RUL, validating the effectiveness of deep learning techniques in this context.
Model | Set | MSE | RMSE | R2 |
---|---|---|---|---|
CNN-1D | Train set | 459.5114 | 21.4362 | 0.7461 |
CNN-1D | Test set | 735.5647 | 27.1213 | 0.5707 |
DT | Train set | 567.8965 | 23.8305 | 0.6761 |
DT | Test set | 837.1339 | 28.9332 | 0.5588 |
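The three metrics reported in Table 4 follow standard definitions, sketched below in plain Python. The function name and the sample values are hypothetical; the study's own implementation is not shown in the text.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MSE, RMSE, R2) for paired true and predicted values.

    R2 is undefined when all true values are identical; that case is not
    handled here.
    """
    n = len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    mse = ss_res / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mse, rmse, r2


# Hypothetical RUL values in cycles.
y_true = [100.0, 80.0, 60.0, 40.0]
y_pred = [90.0, 85.0, 50.0, 45.0]
mse, rmse, r2 = regression_metrics(y_true, y_pred)
print(mse, round(rmse, 4), r2)
```

Because RMSE is the square root of MSE, it is expressed in the same unit as the RUL (cycles), which makes it the easier of the two to interpret; the RMSE values in Table 4 are the square roots of the corresponding MSE values.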
Prognostics and Health Management (PHM) methods are an advanced approach to maintaining the reliability and efficiency of industrial systems. PHM integrates various technologies and processes to monitor, diagnose, and predict the health and performance of machines and systems in real-time. By leveraging data collected from sensors and other monitoring tools, PHM methods can detect potential failures before they occur, allowing for proactive maintenance and minimizing downtime. This early detection capability is crucial for preventing costly unplanned breakdowns and extending the life of critical assets. PHM methods often combine data-driven models, physics-based models, and hybrid approaches to assess the Remaining Useful Life (RUL) of components, guiding maintenance decisions and ensuring that systems continue to operate efficiently and safely. As part of the broader field of predictive maintenance, PHM is vital in industries where system reliability is paramount, such as aerospace, automotive, and manufacturing.
In this study, a detailed comparative analysis was conducted to evaluate the effectiveness of the CNN-1D deep learning algorithm against the Decision Tree (DT) algorithm in predicting the Remaining Useful Life (RUL) of gas turbine engines using the C-MAPSS dataset. The choice to rely on the CNN-1D algorithm was driven by its superior learning capabilities, which were evidenced by the high degree of convergence observed between the predicted RUL and the actual RUL, particularly for the FD001 engine dataset. The evaluation of model performance was carried out using key metrics such as Mean Square Error (MSE), Root Mean Square Error (RMSE), and the R-squared (R2) coefficient. The results demonstrated that the CNN-1D algorithm outperformed the Decision Tree algorithm, achieving lower MSE and RMSE values on both the training and test datasets and higher R2 values, indicating better predictive accuracy and reliability.
Specifically, the CNN-1D model yielded an RMSE of 21.44 on the training set and 27.12 on the test set, compared to the DT model’s RMSE of 23.83 on the training set and 28.93 on the test set. These results underscore the robustness of the CNN-1D algorithm in handling time series data, reinforcing its efficacy over traditional machine learning approaches like Decision Trees. This study highlights the potential of deep learning techniques, particularly CNN-1D, in the predictive maintenance domain for gas turbine engines. By applying a deep learning approach to time series data, this research affirms the effectiveness of CNN-1D in delivering more accurate and reliable predictions, paving the way for its broader application in similar predictive analytics tasks.