Comparison between web Robot Request Detection Techniques on Web Server Log in Data Mining
Authors: Nitika Kadam
Abstract
Web robots are software programs that automatically traverse the hyperlink structure of the Web to retrieve Web resources. Robots are used for a variety of tasks, such as crawling and indexing information for search engines, offline browsing, shopping comparison, and email address collection. They can also be used for malicious purposes, such as sending spam mail and stealing business intelligence. Detecting robots is necessary because of privacy, security, and server performance concerns. Several well-known detection techniques exist: the robots.txt check, matching against known robot IP addresses, User agent mapping, keyword matching in the User agent field, browsing speed, and unassigned referrer. In this paper we discuss and implement various robot identification techniques on real server log data and compare their performance on a given dataset.
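As an illustration of how three of these checks might be applied to a single log entry, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes the server writes the Apache Combined Log Format; the names `LOG_PATTERN`, `ROBOT_KEYWORDS`, `parse_line`, and `robot_signals` are hypothetical, and the keyword list is a placeholder rather than a list taken from the paper.

```python
import re

# Assumed layout (Combined Log Format):
# host ident user [time] "request" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Placeholder keyword list (an assumption, not taken from the paper).
ROBOT_KEYWORDS = ("bot", "crawl", "spider", "slurp")


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def robot_signals(entry):
    """Return the set of heuristic names that flag this request."""
    signals = set()
    # robots.txt check: well-behaved robots request this file.
    if entry["path"].split("?")[0] == "/robots.txt":
        signals.add("robots_txt")
    # Keyword matching in the User agent field.
    agent = entry["agent"].lower()
    if any(keyword in agent for keyword in ROBOT_KEYWORDS):
        signals.add("agent_keyword")
    # Unassigned referrer: the field is empty or logged as "-".
    if entry["referrer"] in ("", "-"):
        signals.add("no_referrer")
    return signals
```

A single signal is weak evidence on its own. Human visitors who type a URL directly also produce an empty referrer, for example, so in practice the signals would likely be combined or weighted.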
Introduction
Data mining is the computational process of discovering patterns in large data sets, involving methods at the intersection of artificial intelligence, machine learning, and database systems. The World Wide Web is now a huge database, and with this growth comes a need to analyze its data. The process of discovering and analyzing Web data is called Web mining: the application of data mining techniques to discover patterns from the Web.
Conclusion
A Web server log is a rich source of information that is used to predict users' navigation behavior. Due to the exponential growth of information on the Web, a large part of this log consists of robot requests. Business organizations, Web usage analysts, and Web site administrators sometimes need to detect robot requests in order to protect their privacy, to distinguish robots from human users, and to improve server performance, respectively. Several techniques identify robots in a server log, including the robots.txt check, IP address matching, User agent mapping, and keyword matching in the User agent field.
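To make the comparison of techniques concrete, the sketch below groups already-parsed requests by client and records which clients each technique flags. It is an assumption-laden illustration, not the procedure used in the paper: `KNOWN_ROBOT_IPS`, `ROBOT_KEYWORDS`, and `flag_clients` are hypothetical names, and the IP address and keywords are placeholders.

```python
from collections import defaultdict

# Placeholder inputs (assumptions): a real deployment would load a
# maintained list of robot IP addresses and User agent keywords.
KNOWN_ROBOT_IPS = {"203.0.113.5"}
ROBOT_KEYWORDS = ("bot", "crawl", "spider")


def flag_clients(requests):
    """Map each technique name to the set of client IPs it flags.

    `requests` is an iterable of (ip, path, user_agent) tuples.
    """
    flagged = defaultdict(set)
    for ip, path, user_agent in requests:
        if path == "/robots.txt":
            flagged["robots_txt"].add(ip)
        if ip in KNOWN_ROBOT_IPS:
            flagged["known_ip"].add(ip)
        if any(keyword in user_agent.lower() for keyword in ROBOT_KEYWORDS):
            flagged["agent_keyword"].add(ip)
    return dict(flagged)
```

Comparing the resulting sets shows where the techniques agree and which clients only one of them catches. Measuring accuracy would additionally require labeled data, which this sketch does not assume.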
Copyright
Copyright © 2025 Nitika Kadam. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.