Open Access

Stacking Large Language Models is All You Need: A Case Study on Phishing URL Detection

11 Jul 2025
ABOUT THIS ARTICLE


Prompt-engineered Large Language Models (LLMs) have gained widespread adoption across various applications due to their ability to perform complex tasks without requiring additional training. Despite their impressive performance, there is considerable scope for improvement, particularly in addressing the limitations of individual models. One promising avenue is the use of ensemble learning strategies, which combine the strengths of multiple models to enhance overall performance. In this study, we investigate the effectiveness of stacking ensemble techniques for chat-based LLMs in text classification tasks, with a focus on phishing URL detection. Notably, we introduce and evaluate three stacking methods: (1) prompt-based stacking, which uses multiple prompts to generate diverse responses from a single LLM; (2) model-based stacking, which combines responses from multiple LLMs using a unified prompt; and (3) hybrid stacking, which integrates the first two approaches by employing multiple prompts across different LLMs to generate responses. For each of these methods, we explore meta-learners of varying complexities, ranging from Logistic Regression to BERT. Additionally, we investigate the impact of including the input text as a feature for the meta-learner. Our results demonstrate that stacking ensembles consistently outperform individual models, achieving superior performance with minimal training and computational overhead. These findings highlight the potential of stacking ensembles in mitigating the limitations of existing methods and significantly enhancing the efficiency and accuracy of chat-based LLMs for text classification tasks.
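To illustrate the general idea behind model-based stacking as described in the abstract, here is a minimal sketch: the verdicts of several base classifiers form the feature vector for a Logistic Regression meta-learner. The `llm_a`/`llm_b` functions, the example URLs, and the labels are hypothetical stand-ins invented for this sketch; in the paper the level-0 predictions would come from actual chat-based LLM responses, not these heuristics.

```python
# Sketch of model-based stacking for phishing-URL detection.
# Hypothetical stand-ins: llm_a/llm_b simulate binary verdicts
# (1 = phishing, 0 = benign) that real chat LLMs would return.
import numpy as np
from sklearn.linear_model import LogisticRegression

def llm_a(url):
    # Stand-in for one LLM's verdict (toy heuristic, not the paper's method)
    return 1 if "login" in url or "-" in url else 0

def llm_b(url):
    # Stand-in for a second LLM's verdict
    return 1 if url.count(".") > 2 else 0

# Toy example URLs with invented ground-truth labels (1 = phishing)
urls = ["http://secure-login.example.com", "https://example.com",
        "http://a.b.c.d.example.net/login", "https://docs.python.org"]
labels = [1, 0, 1, 0]

# Level-0: each base model's response becomes one feature column
X = np.array([[llm_a(u), llm_b(u)] for u in urls])

# Level-1: the meta-learner combines the base verdicts
meta = LogisticRegression().fit(X, labels)
print(meta.predict(X))
```

In prompt-based stacking the columns of `X` would instead come from one LLM queried with several different prompts, and in hybrid stacking from every (prompt, model) pair; the meta-learner stage is unchanged, and the abstract notes that the raw input text can optionally be appended as an additional feature.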

Language:
English
Publication frequency:
4 times per year
Journal subjects:
Computer Science, Databases and Data Mining, Artificial Intelligence