BIG DATA TO IMPROVE AEROSPACE OPERATIONS
In this article, we describe the development of a predictive maintenance methodology based on deep learning and big data. The term "big data" is widespread these days in technological industries such as aerospace. Collecting huge amounts of data and extracting conclusions from them "automatically" is an attractive topic that has been welcomed in many industrial sectors, as it has the potential to support and improve costly processes such as maintenance operations. The main benefits of collecting and processing big data in the aerospace industry can be divided into three categories:
- Future predictions: with historical data, it is possible to predict the future behavior of a technological system.
- Product improvement: it is possible to gather insights from automatically-collected data which can iteratively improve the product in terms of design, quality, and availability.
- Cost reduction: by combining high-quality products with well-informed decisions, big data analysis provides economic benefits in the medium and long term.
However, working with big data is not a straightforward process and poses several challenges. The most common are:
- A huge amount of data does not guarantee successful results. On the contrary, it can introduce noise that interferes with the predictions, so the input data must be chosen carefully.
- The quality of the dataset is crucial: the input data must be complete, and in most cases preprocessing is a fundamental step towards making successful use of big data.
- The tools used for this purpose are not trivial to operate and require a degree of specialization.
- It can be difficult to understand how the inputs relate to the outputs, since the process is largely "invisible": the user cannot see the logic followed by the algorithm.
In the aviation sector, multiple studies have been carried out, mainly in the largest companies, where access to structured datasets is readily available. The goal of these implementations is to obtain the benefits listed above, but they require launching long R&D projects that consume time and resources.
PREDICTIVE MAINTENANCE
DMD Solutions put the big data topic in the spotlight and set out to identify how aviation companies, mainly small and medium-sized ones, could easily benefit from this process. From our experience, we know that maintenance is one of the most critical activities in the life cycle of an aerospace product: it can account for up to 60% of the total operational cost. Big data has enabled a new type of condition-based maintenance to emerge: predictive maintenance (PdM).

Figure 1 – Maintenance methods, from reactive to predictive
PdM improves on its predecessors, reactive and preventive maintenance, as well as on overall product safety, by anticipating potential failures that could have critical consequences for the aircraft. A well-established PdM system is also capable of reducing cost compared to preventive maintenance by minimizing unnecessary maintenance tasks.
A big data algorithm can detect and store several criteria (critical conditions that could lead to a failure) after analyzing historical information on defects, environmental conditions, sensor readings, etc., and then perform the failure prediction. Two approaches can be considered:
- Real-time detection: in this case, the pilot would be alerted early if a failure is about to occur.
- Future predictions: the time to the next failure can be estimated based on historical information.

Figure 2 – Failure timeline and failure data collection
PdM is attracting the attention of large companies in the aeronautical sector, which see it as a great opportunity to improve their products and services. For small and medium-sized companies, implementing this process is not as easy, as it requires time and qualified personnel to carry it out. DMD Solutions wanted to develop a tailored algorithm for this niche market, so we got down to work.
DEEP LEARNING PROJECT TO PREDICT FAILURES IN AVIATION
Data sources from public databases
As explained before, data is the critical point of this process: it must be complete and relevant, and a minimum number of examples is required to properly train a predictive model.
In the study carried out by DMD Solutions, the data was extracted from publicly available sources. Three sources were used:
- Flight data: after some research, a reliable provider of flight information was found. This service provided latitude, longitude, altitude, speed, and other relevant data for flight operations.
- Meteo data: MeteoStat was the chosen provider. Through its API, it was possible to obtain temperature, precipitation, and other climatic conditions from different stations around the world.
- Defect data: this data was extracted from DMD Solutions' internal historical database, where information on the defects reported for a fleet is stored. The defects considered are those that can be related to external environmental conditions (corrosion, fatigue, etc.).
From these three sources, an input database of 923,783 rows and 22 columns was built. Another important variable to define is the output: the one that the model will predict. Two approaches were considered: the first tries to predict whether the aircraft had a failure on the flight date (binomial classification), and the second tries to estimate the remaining time until the next failure (multi-class classification). In this article, only the first is explained.
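The sketch below illustrates how such an input table could be assembled with pandas. The file names, column names, and join keys are assumptions made for illustration; they are not the exact ones used in the study.

```python
# Sketch of assembling the input table from the three public sources.
# File names, column names, and join keys are illustrative assumptions.
import pandas as pd

# Flight data: one row per flight (position, altitude, speed, ...)
flights = pd.read_csv("flight_data.csv", parse_dates=["date"])

# Weather data: one row per station and day (temperature, precipitation, ...)
weather = pd.read_csv("meteostat_data.csv", parse_dates=["date"])

# Defect data: defects reported per aircraft and day
defects = pd.read_csv("defect_reports.csv", parse_dates=["date"])

# Attach the weather observed at the relevant station on the flight date
dataset = flights.merge(weather, on=["station_id", "date"], how="left")

# Binary target for the first approach: did the aircraft report a failure
# on the flight date? (1 = failure reported, 0 = no failure)
failure_days = defects[["aircraft_id", "date"]].drop_duplicates()
failure_days["failure"] = 1
dataset = dataset.merge(failure_days, on=["aircraft_id", "date"], how="left")
dataset["failure"] = dataset["failure"].fillna(0).astype(int)
```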
Tools & methodology
To carry out this study, two open-source tools were used: TensorFlow and Keras. TensorFlow is a free and open-source machine learning library, developed by Google, focused on the training and inference of artificial neural networks. Keras is a deep learning API written in Python that runs on top of TensorFlow; developed by François Chollet, a Google engineer, it focuses on enabling fast experimentation and on being user-friendly, modular, and extensible, providing implementations of functions commonly used in machine learning (especially those for building neural networks). In addition, the scikit-learn library was used to apply several ready-made classifiers.
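As an illustration, a minimal Keras model for the first approach (binomial classification of "failure on the flight date") could look like the sketch below. The layer sizes and training settings are assumptions for illustration, not the configuration actually used in the study; the dataset variable follows the previous sketch.

```python
# Minimal sketch of a binary "failure on the flight date" classifier with
# Keras on TensorFlow. Layer sizes and training settings are illustrative
# assumptions, not the exact configuration used in the study.
from tensorflow import keras
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Numeric features and binary target from the assembled dataset
X = dataset.drop(columns=["failure"]).select_dtypes("number").values
y = dataset["failure"].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Normalization: preprocessing is a key step for this kind of data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = keras.Sequential([
    keras.Input(shape=(X_train.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # probability of a failure
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=20, batch_size=256, validation_split=0.1)
```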
Results
The first results obtained seemed too good, in fact, too good to be true, so it was necessary to review whether they were reliable. With the help of the confusion matrix, it was seen that the algorithm was not predicting the positive values (days on which a failure existed).
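With scikit-learn, such a check takes only a few lines; the sketch below follows the variable names of the previous sketches.

```python
# Sketch of the check that exposed the problem, using scikit-learn.
# Variable names follow the previous sketches.
from sklearn.metrics import confusion_matrix, classification_report

y_pred = (model.predict(X_test) > 0.5).astype(int).ravel()

print(confusion_matrix(y_test, y_pred))      # rows: true class, columns: predicted class
print(classification_report(y_test, y_pred))

# A model that ignores the minority class can still show a high overall
# accuracy, because "no failure" days dominate the dataset.
```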

Figure 3 – Expected and resulting Confusion Matrix
It was discovered that this happens when the dataset is unbalanced: in this case, there were too few positives (moments when there is a defect) compared with the negatives (moments when there is no defect). This imbalance led the algorithm to poor results.

Figure 4 – Dataset balance is necessary for DL satisfactory results
To deal with this problem, several oversampling algorithms, such as Borderline-SMOTE and ADASYN, were employed. The results improved considerably, but they were still not fully satisfactory: the accuracy obtained was around 40%.
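These oversampling techniques are available, for instance, in the imbalanced-learn library; a minimal sketch of how the training split could be rebalanced before retraining is shown below. Only the training data is oversampled, never the test data.

```python
# Sketch: rebalance only the training split with imbalanced-learn,
# then retrain the model defined earlier. The test split is left untouched.
from imblearn.over_sampling import BorderlineSMOTE, ADASYN

oversampler = BorderlineSMOTE(random_state=42)  # or ADASYN(random_state=42)
X_train_bal, y_train_bal = oversampler.fit_resample(X_train, y_train)

model.fit(X_train_bal, y_train_bal, epochs=20, batch_size=256, validation_split=0.1)
```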
Drawing on engineering knowledge, two new variables were created in the input data to help the algorithm find more reliable correlations (a sketch of how they can be derived follows the list):
- Accumulated snow the aircraft has been exposed to
- Accumulated precipitation the aircraft has been exposed to
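A minimal sketch of how these accumulation variables could be derived with pandas is shown below; the column names (aircraft_id, date, snow, precipitation) are assumptions for illustration, not the exact ones used in the study.

```python
# Sketch: per-aircraft running totals of snow and precipitation exposure.
# Column names are illustrative assumptions.
dataset = dataset.sort_values(["aircraft_id", "date"])

dataset["snow_accumulated"] = (
    dataset.groupby("aircraft_id")["snow"].cumsum()
)
dataset["precipitation_accumulated"] = (
    dataset.groupby("aircraft_id")["precipitation"].cumsum()
)
```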
These two variables estimate the historical environmental conditions to which the aircraft has been exposed, and their inclusion helped the algorithm improve its predictions. The results with the new model improved substantially and were finally considered satisfactory: the final accuracy of the prediction was 68%, with 86% of the positive values predicted correctly.

Figure 5 – Confusion Matrix after Dataset normalization & processing
Conclusions
As seen in the results, an evident correlation was found between the environmental conditions and the failures reported on an aircraft. The results obtained are satisfactory considering the data used. After applying several preprocessing methods (normalization, data balancing, etc.), the dataset evolved considerably from its raw version and the results improved significantly. One of the main lessons learned is that the quality of the dataset is the most important concern: the algorithm needs to be fed with relevant information. For example, the new variables created based on our engineering knowledge noticeably improved the predictions.
The performance of the classifiers with this data is promising, especially considering that the data was incomplete and noisy. With better data, such as data coming from the Health and Usage Monitoring System (HUMS), it should be possible to improve this performance and even to predict internal failures of the aircraft (e.g. a short circuit) thanks to the data from the different sensors an aircraft is equipped with. In fact, another aim of this study was to show whether a percentage of failures could be predicted even with this "incomplete" dataset, so that DMD Solutions can approach aviation companies to test the algorithm with more complete data, including HUMS operational information.
To sum up, the potential of PdM is great and, as has been seen in this study, it can be exploited with the appropriate data using practically any of the classifiers considered. Taking into account the importance of safety and the high costs associated with maintenance in the aviation industry, the effort may well be worth it.
PATHS FOR INDUSTRY IMPLEMENTATION OF PdM
With the promising results obtained in this study, a collaboration with an airline or aircraft integrator could round off the work. Aircraft are equipped with a HUMS, a system that stores the data captured by the different sensors installed on board. By including HUMS data in the study, the algorithm could also detect internal defects that originate from anomalies in the internal systems of the aircraft (e.g. a short circuit).
In addition, to offer this algorithm to different parties, an API is being developed. This API would run the algorithm on the clients' data and return predictions.
These requests will be made from the Robin RAMS software. As Robin already includes a FRACAS module, the integration paths needed to perform failure prediction are readily in place. From the Robin RAMS application, the client can send FRACAS and HUMS data to the API, and in return the API provides a predictive model that can be applied to obtain future predictions.
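As an illustration only, a client-side call to such an API could look like the sketch below; the endpoint URL, payload fields, and response format are purely hypothetical, since the API is still under development.

```python
# Purely hypothetical sketch of a client-side call to the prediction API.
# Endpoint URL, payload fields, and response format are assumptions for
# illustration; the real API is still under development.
import requests

payload = {
    "fracas_records": [],  # defect history exported from the FRACAS module
    "hums_records": [],    # sensor data exported from HUMS, if available
}
response = requests.post("https://api.example.com/pdm/predict", json=payload)
response.raise_for_status()
print(response.json())     # e.g. predicted failure risk per aircraft
```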
