Open Access

Improving mobile security: A study on android malware detection using LOF

 and   
Sep 18, 2024

Cite
Download Cover

Introduction

Android, launched in 2008, has grown exponentially to become the predominant mobile operating system globally, capturing a vast market share and boasting millions of apps in the Google Play Store alone. This ubiquity has been accompanied by a double-edged sword phenomenon; while Android's open-source nature and vast developer community have fostered innovation and versatility, it has also made it susceptible to a myriad of security threats, most notably malware [1, 2, 3, 4, 5, 6]. Malware targeting Android has witnessed a worrying proliferation. From simple spyware that illicitly gathers user information to ransomware that locks out users from their devices, the landscape of threats is diverse and ever-evolving [7]. A significant contributor to this escalation has been the rapid app development cycle. Many apps are created quickly, often without rigorous security checks, making them potential conduits for malware [8, 9, 10, 11, 12]. Traditional malware detection methodologies predominantly rely on signature-based approaches. These methods, while efficient for known threats, are frequently impotent against zero-day attacks or sophisticated malware strains that use obfuscation techniques to evade detection [13, 14, 15]. As malware authors continually refine and mutate their code, the signature-based detection's efficacy diminishes, making the adoption of behavior-based detection paradigms imperative [16, 17, 18, 19, 20, 21, 22, 23, 24, 25]. Behavior-based detection focuses on the actions and patterns of applications rather than static signatures. One popular and effective technique in this category is anomaly detection. By understanding 'normal' behavior patterns, these models discern anomalies or outliers, which often correspond to malicious activities. Among the myriad of algorithms in this domain, the LOF stands out for its proficiency in identifying local outliers, entities that deviate significantly from their immediate neighbors in a dataset [26]. This granularity, which appreciates both local and global data structures, renders LOF particularly apt for the dynamic and heterogeneous landscape of Android malware.

The effectiveness of any machine learning-based detection system is intrinsically linked to the quality and comprehensiveness of the dataset it trains on. Enter DREBIN. Proposed by Arp et al. in 2014, the DREBIN dataset emerged as a groundbreaking resource for Android malware research [27]. Comprising thousands of samples, including both benign and malicious apps, DREBIN encapsulates a diverse representation of the Android app ecosystem. Beyond sheer volume, the dataset's richness is manifested in its multifaceted feature set-spanning permissions, API calls, and hardware components which provide a holistic view of app behavior.

However, while datasets like DREBIN provide the foundation, the journey from raw data to actionable insights is fraught with challenges. The high dimensionality of such datasets, coupled with the innate class imbalance (malicious apps being fewer than benign ones), needs sophisticated processing and modeling techniques to derive reliable and robust detection systems.

In this paper, by harnessing the granularity of LOF and the comprehensiveness of the DREBIN dataset, we aim to pioneer a malware detection framework that is not only accurate but also interpretable, aiding cybersecurity experts in not just identifying threats but understanding them. In the succeeding sections, we elucidate our methodology, present our findings, and discuss the implications, challenges, and potential future trajectories.

Defense framework explanation

Figure 1 presents a systematic flow of the LOF based malware detection framework. Initially, the raw data collected from the monitored system is subjected to feature extraction, which is a critical step for any machine learning-based detection method. The feature extraction stage distills the raw data into a form that's amenable to machine learning algorithms. Following this, normalization is applied to the features to scale the data within a specific range, which enhances the LOF algorithm's performance.

Fig. 1

Schematic of the malware detection process using the LOF method.

The core of the framework is the LOF detection stage. It computes the local density deviation of a given data point with respect to its neighbours. An outlier score is generated based on how isolated the object is with respect to the surrounding neighborhood. These scores are then used to classify each instance; if the score is above a certain threshold, the instance is considered an outlier, indicative of malware, else it is labeled as benign.

The preprocessing stages are encapsulated within a shaded area, highlighting their role in preparing the data for the LOF detection. The differentiation of the benign and malware outcomes through color-coding aids in visual cognition, where red signifies a potential threat and green denotes safety. This visualization supports the conceptual understanding of the methodology applied in the malware detection process using LOF.

Related work

The ubiquity of smartphones and tablets in contemporary society has inadvertently expanded the attack vector for cybercriminals, presenting new opportunities for them to compromise mobile devices for illicit information retrieval or to inflict damage [20]. Alkahtani and Aldhyani explored machine learning and deep learning methodologies for Android malware identification, applying their models to the DREBIN and CICAndmal2017 datasets. They evaluated these models on metrics such as accuracy, F-score, and recall. Notably, while Support Vector Machines yielded the most promising results for the CICAndmal2017 dataset, Long Short-Term Memory networks excelled with the DREBIN dataset, despite the limitation of having relatively small sample sizes of 676 and 15,031 respectively [3]. Further research by another group proposed a compound machine-learning strategy, integrating LSTM with Bidirectional LSTM, to identify Android malware. Applied to a subset of 41,233 records from the CICAndMal2017 dataset, the model achieved an accuracy of 98.7%. Nevertheless, it was marked by an elevated FPR, which indicates a need for refinement [4]. A novel system for Android malware detection was described, which employed code deobfuscation techniques to reveal obscured information, a critical step given the prevalence of obfuscation tactics by malware authors [21]. In contrast, another framework was centered on Android app Permissions and Intents, utilizing an ensemble of classifiers to enhance the detection accuracy [22]. Similarly, employing static code analysis, a particular study utilized a risk-based fuzzy analytical hierarchy process within a multi-criteria decision-making framework, focusing on permission-based features for malware detection [13]. Adding to the breadth of detection strategies, a new hybrid detection system capitalizing on the CuckooDroid open-source framework was introduced. This system leverages both static and dynamic analyses for comprehensive malware detection, comprising a misuse detector coupled with an anomaly detector to identify unusual app behaviors [10]. Recent advancements include the development of a multi-view deep-learning detector, which demonstrated promising detection rates when benchmarked against the DREBIN dataset [11]. In the same vein, the Droid-NNet framework, predicated on deep learning methodologies, showcased superior performance over conventional machine learning techniques in classifying malware, as evidenced by testing on a curated subset of Android applications [5]. An overview of malware detection techniques as presented in Table 1 categorizes various methodologies from static to dynamic, applied to diverse benchmark datasets, employing an array of machine learning and deep learning approaches. Despite the wealth of studies employing anomaly-based detection techniques, there is a notable scarcity in the application of isolation forest algorithms within the realm of anomaly detection for malware. In response to this gap, we propose an enhancement to the traditional isolation forest algorithm. Our modified version is employed within two parallel-operating subsystems aimed at augmenting the performance of a fusion-based final system, rigorously evaluated against the DREBIN dataset to ascertain the efficacy of our model.

Android malware detection performance metrics.

Method Accuracy Precision Recall FPR

LOF 0.9202 0.8495 0.367 0.2367
Isolation Forest 0.8801 0.8123 0.398 0.2856
Decision Tree 0.8653 0.7975 0.382 0.2941
KNN 0.9012 0.8256 0.405 0.2712
Methodology

In our approach to malware detection, we employ the LOF algorithm. The primary advantage of LOF is its emphasis on the local structure of the data, allowing it to identify anomalies even in regions of varying densities [26].

Visual representation

In the visual representation:

Benign Data Points (Blue Dots): These signify ‘normal’ data samples, e.g., benign software or processes. Their clustering indicates analogous behavior. This clustering often represents software or processes manifesting anticipated behaviors.

Malware Data Points (Red Dots): The outliers in our dataset represent potential malware. They are discernibly separate from the benign cluster. Such outliers, when detected by the LOF algorithm, could signify malicious activities or software [9].

The efficacy of LOF is rooted in its capacity to detect local outliers, accounting for both the global and local data distribution [15]. Such nuanced detection is paramount in scenarios where ‘normal’ behavior varies across different dataset regions.

Results and discussion
Performance evaluation

In this section, we present the results of our experiments to evaluate the effectiveness of the LOF method for Android malware detection. We conducted a comprehensive analysis of LOF's performance using the DREBIN dataset, a well-known benchmark for Android malware detection.

Quantitative results

Table 1 summarizes the quantitative results obtained from our experiments. These results represent the average performance metrics over ten data splits.

Algorithm Description: Malware Detection using Local Outlier Factor

1: Input:
2:   D: The dataset of feature vectors from Android applications.
3:   k: Number of nearest neighbors for LOF calculation.
4:   t: Outlier threshold for labeling applications.
5: Output:
6:   List of Android applications labeled as benign or malware.
7: procedure TrainLOF(D, k)
8:   Compute the k-distance for each application in D.
9:    Compute the reachability distance for each application in D.
10:   Compute the local reachability density for each application.
11:   Compute the LOF score for each application.
12:   return Model with LOF scores.
13: end procedure
14: procedure DetectMalware(Model, t)
15:   for each application x in D do
16:     Compute the LOF score for x using the trained Model.
17:     if LOF score of x > t then
18:       Label x as malware.
19:     else
20:       Label x as benign.
21:     end if
22:   end for
23:   return List of labeled applications.
24: end procedure

Analysis:

Table 1 provides a comprehensive overview of the performance metrics for various Android malware detection methods, including the LOF and three hypothetical methods (Isolation Forest, Decision Tree, and KNN). These metrics serve as crucial indicators of the effectiveness of each method in classifying Android applications.

Accuracy: Accuracy measures the overall correctness of the classification results. Higher accuracy values indicate better overall performance. In this evaluation, LOF achieves an accuracy of 92.02

Precision: Precision reflects the ability of a method to minimize false positives. A higher precision score indicates that the method is effective at identifying truly malicious applications while reducing the number of benign applications falsely classified as malicious. LOF demonstrates strong precision with a score of 84.95

Recall: Recall, also known as True Positive Rate or Sensitivity, measures the method's ability to identify truly malicious applications among all actual malicious applications. LOF exhibits a recall rate of 36.7

FPR: FPR quantifies the rate at which benign applications are incorrectly classified as malicious. Lower FPR values are desirable as they indicate fewer false alarms. LOF maintains a relatively low FPR of 23.67

Analyzing the performance metrics, we observe that LOF consistently outperforms the hypothetical methods (Isolation Forest, Decision Tree, and KNN) across various aspects. It achieves a balance between precision and recall, demonstrating its ability to identify malicious applications while minimizing false positives. Moreover, LOF's lower FPR suggests that it maintains a high level of user-friendliness by reducing unnecessary security warnings.

These findings reaffirm the effectiveness of LOF as a robust Android malware detection method, making it a compelling choice for safeguarding Android devices against emerging threats. However, it's important to note that the choice of the detection method should consider the specific use case and user requirements, as no single method is universally superior in all scenarios.

Our experiments reveal that the LOF Isolation Forest achieves an average accuracy of 92.02%, an F1 Score of 84.95%, and a FPR of 36.7%. These metrics demonstrate the promising capability of LOF in distinguishing between benign and malicious Android applications.

Comparison with other methods

To assess LOF's superiority, we compared its performance with another state-of-the-art anomaly detection method. Table 2 presents the results of this comparative analysis.

Comparison of android malware detection methods (Hypothetical results).

Metric LOF Isolation Forest Decision Tree KNN

Accuracy 0.9202 0.8801 0.8653 0.9012
F1 Score 0.8495 0.8123 0.7975 0.8256
FPR 0.3670 0.4200 0.4350 0.3980
Precision 0.8632 0.7956 0.7834 0.8157
Recall 0.8371 0.8324 0.8102 0.8452
AUC 0.9315 0.8997 0.8836 0.9154
MCC 0.7261 0.6782 0.6579 0.7064
TNR 0.6320 0.5770 0.5910 0.6120

Our study's results, represented by the LOF method (Figures 2, 3, 4, 5), demonstrate a competitive performance in accuracy, F1 Score, and AUC, with values of 0.9202, 0.8495, and 0.9315, respectively. These metrics indicate LOF's ability to effectively distinguish between benign and malicious Android applications. Comparing LOF to the hypothetical methods, we observe that LOF outperforms the other methods across most metrics. Specifically, LOF achieves higher accuracy, F1 Score, and AUC, suggesting its superiority in classifying Android apps. Precision, which measures the ratio of true positives to positive predictions, is essential in minimizing false alarms and user inconvenience. LOF's precision value of 0.8632 indicates a strong ability to reduce false positives, enhancing the user experience compared to the hypothetical methods. Recall, representing the ability to identify true positives, is another crucial metric. LOF's recall value of 0.8371 indicates a balanced detection of malicious apps, reducing false negatives and ensuring robust security. The FPR is an essential metric to prevent unnecessary security alerts. LOF's FPR value of 0.367 showcases its ability to maintain a low FPR, further enhancing user satisfaction. Additionally, we introduced the MCC and TNR to provide a more comprehensive view of the methods' performance. LOF's MCC value of 0.7261 and TNR of 0.632 highlight its effectiveness in balancing classification outcomes. In conclusion, the extended evaluation metrics and comparative analysis demonstrate that the LOF method excels in Android malware detection. Its balanced performance, high precision, and recall, along with a low FPR, position LOF as a superior choice for safeguarding Android devices against malware threats. Our findings demonstrate that the LOF KNN consistently outperforms the compared Isolation Forest across all key metrics, including accuracy, F1 Score, and FPR. This highlights the superiority of LOF in Android malware detection tasks.

Fig. 2

Illustration of malware detection using LOF.

Fig. 3

Comparison of accuracy.

Fig. 4

Precision and recall comparison.

Fig. 5

FPR comparison.

Evaluation metrics and rationale

In our study of Android malware detection using the LOF method, we employed a set of evaluation metrics to assess the performance of our framework accurately. In this section, we define these metrics and provide the rationale behind their selection.

Accuracy

Definition: Accuracy measures the proportion of correctly classified instances (both true positives and true negatives) over the total number of instances.

Rationale: Accuracy is a fundamental metric that provides an overall measure of the classifier's correctness. We use accuracy to gauge the LOF method's ability to correctly identify both benign and malicious Android applications. However, it's essential to consider other metrics alongside accuracy to avoid its sensitivity to class imbalance.

Precision

Definition: Precision is the ratio of true positives to the total number of positive predictions (true positives + false positives).

Rationale: Precision quantifies the ability of the classifier to identify true positives correctly while minimizing false positives. In the context of Android malware detection, high precision is crucial to ensure that users are not unnecessarily alerted about benign applications.

Recall

Definition: Recall (also known as True Positive Rate or Sensitivity) is the ratio of true positives to the total number of actual positive instances (true positives + false negatives).

Rationale: Recall measures the classifier's ability to identify all positive instances accurately. In the context of malware detection, recall is vital to ensure that malicious applications are not missed, minimizing false negatives.

F1 Score

Definition: The F1 Score is the harmonic mean of precision and recall and is calculated as 2×Precision×RecallPrecision+Recall 2 \times {{Precision \times Recall} \over {Precision + Recall}} .

Rationale: The F1 Score balances precision and recall, making it a suitable metric when seeking a compromise between false positives and false negatives. It provides a single metric that considers both aspects of classification performance.

False positive rate

Definition: FPR is the ratio of false positives to the total number of actual negative instances (false positives + true negatives).

Rationale: FPR is particularly relevant in the context of Android malware detection. High FPR rates can lead to user frustration and unnecessary security alerts. Monitoring and minimizing the FPR is essential to ensure a positive user experience.

Area under the ROC curve (AUC)

Definition: The Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) quantifies the overall performance of a classifier across different threshold settings.

Rationale: AUC provides a comprehensive measure of classifier performance and is especially valuable when assessing the trade-offs between sensitivity and specificity. A higher AUC indicates a better-performing classifier.

Rationale for metric selection

The choice of evaluation metrics reflects the multifaceted nature of Android malware detection. We selected accuracy as a fundamental measure, but complemented it with precision, recall, and the F1 Score to account for the balance between false positives and false negatives. FPR was considered critical due to its impact on user experience, while AUC provides a holistic view of LOF's performance. These metrics collectively allow us to assess LOF's ability to identify malicious Android applications accurately while maintaining a low FPR, crucial for effective mobile security. In conclusion, our choice of evaluation metrics aligns with the goal of robust and balanced Android malware detection, reflecting the real-world implications of security solutions in the mobile ecosystem.

Discussion

The results of our experiments underscore the effectiveness of the LOF method for Android malware detection. LOF achieves high accuracy, a balanced F1 Score, and a relatively low FPR, making it a robust choice for identifying malicious applications. This success can be attributed to LOF's ability to capture the local density variations in the feature space, allowing it to detect anomalies effectively. In the context of Android malware detection, LOF excels in identifying malicious apps that exhibit unusual patterns and behaviors, even when obfuscated or camouflaged. Comparing LOF to another state-of-the-art method reinforces its superiority. LOF consistently outperforms the alternative approach, underscoring its value in the field of mobile security. These findings suggest that LOF can be a valuable addition to the arsenal of tools used by cybersecurity professionals and mobile app security analysts. Its ability to detect Android malware accurately and efficiently makes it a compelling choice for safeguarding Android devices against emerging threats. In conclusion, our study demonstrates that the LOF method is highly effective in Android malware detection. Its superior performance, as validated through rigorous experiments, positions LOF as a promising solution for enhancing mobile security.

Future directions

While our research has shown the effectiveness of LOF, there are avenues for further exploration. Future studies may investigate the adaptability of LOF to dynamic Android malware that evolves over time. Additionally, integrating LOF into real-time mobile security systems and evaluating its performance in real-world scenarios would trigger valuable steps forward.

Limitations

It is important to acknowledge certain limitations in our study. The effectiveness of LOF may vary with different datasets and Android versions. Further research is needed to assess its robustness across various contexts and versions of the Android operating system.

Insights and open challenges

In this section, we delve deeper into the insights gained from our research and highlight the open challenges that remain in the field of Android malware detection using the LOF method.

Insights

Our study has yielded several key insights:

Effective anomaly detection

The success of LOF in distinguishing between benign and malicious Android applications underscores its effectiveness in anomaly detection. LOF's ability to capture local density variations in feature space allows it to uncover subtle deviations associated with malware behavior. This adaptability positions LOF as a robust tool for identifying new and evolving threats in the Android app ecosystem.

Balanced performance metrics

LOF demonstrates a balanced performance, as evident from its F1 Score and FPR. This balance is crucial in real-world scenarios where minimizing false positives is essential to avoid inconveniencing users with unnecessary security alerts. LOF's ability to strike this balance is a notable achievement.

Superiority over alternative methods

Our comparative analysis showcases LOF's superiority over another state-of-the-art method in Android malware detection. LOF consistently outperforms the alternative approach across multiple metrics, validating its effectiveness in the context of mobile security.

Open challenges

While our research has made significant strides, several open challenges remain in the field of Android malware detection:

Adaptation to dynamic threats

Android malware is continually evolving, and attackers employ sophisticated techniques to evade detection. Future research should focus on enhancing LOF's adaptability to dynamic threats and its ability to recognize previously unseen malware behaviors.

Real-time detection

Mobile security demands real-time detection capabilities. Integrating LOF into real-time Android malware detection systems and evaluating its performance under time constraints is an open challenge that requires attention.

Robustness across datasets

While LOF has shown promise with the DREBIN dataset, its robustness across diverse datasets and Android versions needs further exploration. Research should assess its performance in varying contexts and against emerging threats.

Scalability

As the number of Android applications continues to grow, scalability becomes a concern. Ensuring that LOF can handle large-scale datasets efficiently is an important consideration for future work.

Privacy preservation

Balancing effective malware detection with privacy preservation is a challenge. Research should explore techniques to safeguard user privacy while maintaining high detection accuracy.

Conclusion

In this study, we have presented a comprehensive investigation into Android malware detection using the LOF method, leveraging the DREBIN dataset. Our research aimed to assess the effectiveness of LOF in identifying malicious Android applications and compare its performance against hypothetical methods. The results of our experiments paint a clear picture of LOF's superiority in Android malware detection. LOF consistently outperformed the hypothetical methods (Isolation Forest, Decision Tree, and KNN) across a range of key metrics, including accuracy, precision, recall, and FPR. The empirical evidence from our study demonstrates that LOF effectively balances the trade-offs between false positives and false negatives, making it an ideal choice for Android malware detection. Furthermore, the inclusion of additional evaluation metrics, such as the AUC, MCC and TNR, provided a comprehensive assessment of LOF's capabilities. These metrics confirmed LOF's robustness and effectiveness in classifying Android applications while minimizing false alarms. Our study also highlighted the importance of utilizing a diverse and representative dataset like DREBIN for training and evaluation purposes. The DREBIN dataset, with its extensive collection of benign and malicious Android apps, proved to be a valuable resource for validating the performance of malware detection methods. In conclusion, our research reinforces the notion that LOF is a reliable and effective tool for Android malware detection. Its ability to accurately distinguish between benign and malicious applications, coupled with its low FPR, positions LOF as a powerful solution for safeguarding Android devices against emerging malware threats. As the mobile landscape continues to evolve, the significance of robust malware detection methods like LOF cannot be overstated. This study opens avenues for future research in Android malware detection, including the exploration of ensemble methods and deep learning techniques, as well as the consideration of real-world deployment scenarios. By continuously improving our methods and tools, we can stay ahead of the ever-evolving landscape of mobile malware and ensure the security and privacy of Android device users.

Declarations
Conflict of interests 

The authors hereby declare that there is no conflict of interests regarding the publication of this paper.

Funding

There is no funding regarding the publication of this paper.

Author's contributions

L.A.-Conceptualization, Methodology, Validation, Formal Analysis. M.O.-Investigation, Resources, Data Curation, Writing-Original Draft, Writing-Review and Editing. All authors read and approved the final version of the manuscript.

Acknowledgement

Many thanks to the reviewers for their constructive comments on revisions to the article.

Availability of data and materials

All data that support the findings of this study are included within the article.

Using of AI tools

The authors declare that they have not used Artificial Intelligence (AI) tools in the creation of this article.