
Proposed Discriminative Lexical Features for Real-time Detection of Malware Uniform Resource Locator


  • Cyber Security Science Department Federal University of Technology Minna, 920102, Niger, Nigeria
  • Information Security Research Group, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, 43400, Selangor, Malaysia


Objective: To identify discriminative lexical features of malware URLs through manual examination, and to study the prevalence of these features, leading to the proposal of discriminative lexical features for real-time detection of malware URLs.
Methods/Statistical Analysis: Manual examination of an existing blacklist of malware URLs allowed the authors to identify discriminative lexical features, and empirical analysis allowed them to determine whether attackers craft malware URLs in a consistent way. The empirical analysis was carried out on both the existing blacklisted malware URLs and newly collected malware URLs, and it revealed that attackers do craft malware URLs consistently. To evaluate the performance of the proposed lexical features, two machine learning models used in previous studies were applied to our dataset of malware and benign URLs. Using these models enabled us to compare the performance of our proposed lexical features with the feature groups proposed in previous studies. The comparison shows that our proposed lexical features outperform those feature groups.
Findings: Our first step was to manually examine blacklisted malware URLs. This step led to the identification of 12 discriminative lexical features, which were later reduced to 11. The second step was an empirical analysis of the identified features on existing blacklisted malware URLs and on newly collected malware URLs, carried out to determine whether attackers craft malware URLs consistently. The results revealed that they indeed do, which implies that our carefully identified lexical features are common features of malware URLs. After experimentation, the evaluation results show that our proposed lexical features outperform previously proposed feature groups.
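The consistency check described above (comparing how often each lexical feature occurs in blacklisted versus newly collected malware URLs) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the abstract does not list the 11 proposed features, so the three indicators below (`host_is_ip`, `has_executable_ext`, `many_subdomains`), the 0.1 tolerance, and the example URLs are hypothetical placeholders, not the authors' features, threshold, or data.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical binary lexical indicators for illustration only;
# the abstract does not enumerate the paper's 11 proposed features.
def host_is_ip(url: str) -> bool:
    """True if the URL's host is a literal IPv4/IPv6 address."""
    host = urlsplit(url).hostname or ""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def has_executable_ext(url: str) -> bool:
    """True if the path ends in one of a (hypothetical) set of executable extensions."""
    path = urlsplit(url).path.lower()
    return path.endswith((".exe", ".scr", ".bin", ".apk"))


def many_subdomains(url: str) -> bool:
    """True if a non-IP host contains three or more dots (hypothetical cut-off)."""
    host = urlsplit(url).hostname or ""
    return not host_is_ip(url) and host.count(".") >= 3


FEATURES = {
    "host_is_ip": host_is_ip,
    "executable_ext": has_executable_ext,
    "many_subdomains": many_subdomains,
}


def prevalence(urls):
    """Fraction of URLs in `urls` that exhibit each feature."""
    if not urls:
        raise ValueError("need at least one URL")
    return {name: sum(f(u) for u in urls) / len(urls) for name, f in FEATURES.items()}


def consistent(old_urls, new_urls, tolerance=0.1):
    """Per feature: does prevalence differ by at most `tolerance` between the two sets?"""
    p_old, p_new = prevalence(old_urls), prevalence(new_urls)
    return {name: abs(p_old[name] - p_new[name]) <= tolerance for name in FEATURES}


# Placeholder URLs (documentation IP ranges and example domains), not real data.
old = ["http://192.0.2.1/update.exe", "http://a.b.c.example.com/x", "http://example.org/"]
new = ["http://198.51.100.7/setup.exe", "http://login.secure.acct.example.net/",
       "http://example.com/index.html"]
print(consistent(old, new))
```

With these placeholder URLs each feature occurs in one of three URLs in both sets, so every feature is reported as consistent. A real study would substitute the actual feature definitions and could replace the fixed tolerance with a statistical test.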
Applications/Improvements: Discriminative features are required to build real-time malware URL detection systems with machine learning algorithms. The proposed lexical features are a set of discriminative features that rely on the textual properties of malware URLs.
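Because lexical features are computed from the URL string alone, they can be extracted without DNS lookups, WHOIS queries, or page downloads, which is what makes them attractive for real-time detection. The sketch below uses Python's standard `urllib.parse.urlsplit`; the feature names are generic lexical properties chosen for illustration and are hypothetical, not the 11 features proposed in the paper.

```python
from urllib.parse import urlsplit


def lexical_features(url: str) -> dict:
    """Extract hypothetical lexical features from the URL text alone.

    Only string operations on the URL are used: no DNS resolution,
    WHOIS query, or page download is performed.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        # Count non-empty path segments, e.g. "/a/b/file.exe" -> 3.
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
        "has_at_symbol": int("@" in url),
        "has_query": int(bool(parts.query)),
    }


def to_vector(features: dict) -> list:
    """Fixed-order numeric vector (sorted by feature name) for classifier input."""
    return [features[k] for k in sorted(features)]
```

For `http://login-update.example.com/a/b/file.exe?id=42`, this sketch yields a URL length of 50, a host length of 24, two dots in the host, and a path depth of 3. The vector from `to_vector` could then be passed to whichever machine learning classifier is being evaluated.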


Keywords: Attackers, Blacklist, Lexical Features, Malware URL, Real-time Malware URL Detection.






This work is licensed under a Creative Commons Attribution 3.0 License.