Open access

Stacking Large Language Models is All You Need: A Case Study on Phishing URL Detection

11 Jul 2025

Language:
English
Publication frequency:
4 times per year
Journal subjects:
Computer Science, Databases and Data Mining, Artificial Intelligence