Machine learning based phishing detection from URLs

DEMİR, ÖNDER

doi:10.1016/j.eswa.2018.09.029

dc.contributor.author	DEMİR, ÖNDER
dc.contributor.authors	Sahingoz, Ozgur Koray; Buber, Ebubekir; Demir, Onder; Diri, Banu
dc.date.accessioned	2022-03-12T22:38:26Z
dc.date.accessioned	2026-01-11T13:22:05Z
dc.date.available	2022-03-12T22:38:26Z
dc.date.issued	2019
dc.description.abstract	Due to the rapid growth of the Internet, users have shifted from traditional shopping to electronic commerce. Instead of robbing banks or shops, criminals nowadays seek their victims in cyberspace using specific tricks. Exploiting the anonymity of the Internet, attackers devise new techniques, such as phishing, that deceive victims through fake websites in order to collect sensitive information such as account IDs, usernames and passwords. Determining whether a web page is legitimate or phishing is very challenging because of the semantics-based structure of the attack, which mainly exploits the vulnerabilities of computer users. Although software companies release new anti-phishing products based on blacklists, heuristics, and visual and machine learning-based approaches, these products cannot prevent all phishing attacks. In this paper, a real-time anti-phishing system that uses seven different classification algorithms and natural language processing (NLP) based features is proposed. The system differs from other studies in the literature in the following properties: language independence, use of a large volume of phishing and legitimate data, real-time execution, detection of new websites, independence from third-party services and use of feature-rich classifiers. To measure the performance of the system, a new dataset is constructed and the system is evaluated on it. According to the experimental and comparative results of the implemented classification algorithms, the Random Forest algorithm with only NLP-based features gives the best performance, with a 97.98% accuracy rate for the detection of phishing URLs. (C) 2018 Elsevier Ltd. All rights reserved.
dc.identifier.doi	10.1016/j.eswa.2018.09.029
dc.identifier.eissn	1873-6793
dc.identifier.issn	0957-4174
dc.identifier.uri	https://hdl.handle.net/11424/235632
dc.identifier.wos	WOS:000449892000024
dc.language.iso	eng
dc.publisher	PERGAMON-ELSEVIER SCIENCE LTD
dc.relation.ispartof	EXPERT SYSTEMS WITH APPLICATIONS
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	Cyber security
dc.subject	Phishing attack
dc.subject	Machine learning
dc.subject	Classification algorithms
dc.subject	Cyber attack detection
dc.subject	WEBSITES
dc.subject	ATTACKS
dc.title	Machine learning based phishing detection from URLs
dc.type	article
dspace.entity.type	Publication
oaire.citation.endPage	357
oaire.citation.startPage	345
oaire.citation.title	EXPERT SYSTEMS WITH APPLICATIONS
oaire.citation.volume	117

Collections

Research Outputs (Araştırma Çıktıları)
