A LARGE-SCALE PRETRAINED DEEP MODEL FOR PHISHING URL DETECTION
Yanbin Wang (Zhejiang University); Wei Fan Zhu (Zhejiang University); Haitao Xu (Zhejiang University); Zhan Qin (Zhejiang University); Kui Ren (Zhejiang University); Wenrui Ma (Zhejiang Gongshang University)
SPS
Phishing attacks have long been a security issue of great concern to the cybersecurity community. Recently, well-known pre-trained models have been applied as anti-phishing solutions. However, existing studies either simply transfer models pre-trained on natural-language text to the phishing detection task, or pre-train models on only a very small number of phishing samples. In this paper, we propose PhishBERT, a deep transformer network model pre-trained specifically for phishing URL detection. Using a tailored pre-training objective, PhishBERT acquires a general understanding of diverse URLs by being pre-trained on a corpus of more than 3 billion unlabeled URLs. It is then transferred to the task of distinguishing benign from malicious URLs through supervised fine-tuning with adversarial methods. Extensive and rigorous benchmark studies show that PhishBERT significantly outperforms current state-of-the-art methods in efficiency, robustness, and accuracy on the task of phishing website detection.