Implementation to Find Navigational pattern of Log Files Using Hadoop Technology

Authors: Khushbu Ankil, Prof. Mohit Jain

Abstract

A web log contains a lot of information, so it is preprocessed before modeling. The web log file is preprocessed and converted into a sequence of user web navigation sessions, where a web navigation session is the sequence of web pages navigated by a user during a time window. Finally, the user navigation sessions are represented by a model, and once the user navigation model is ready, mining can be performed to find interesting patterns. Modeling the web log is an essential task in web usage mining, and prediction accuracy depends on modeling the web log with an accurate model. To improve server performance, caching is used: frequently accessed pages are stored in proxy server caches. Prefetching of web pages is a newer research area that, when combined with caching, greatly increases performance. In this paper, an improved algorithm for predicting web pages is proposed. Web users are first clustered according to their location, and each cluster is then mined using the FP-Growth algorithm to find association rules and predict the pages to be prefetched and stored in the cache.
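The location-clustered FP-Growth step summarized above can be sketched as follows. This is a minimal single-machine sketch under stated assumptions, not the authors' Hadoop implementation: each session is assumed to arrive already tagged with a location cluster label (the clustering step itself is replaced by that given label), each session is treated as one transaction, and only pairwise rules {page} → {q} are used to choose prefetch candidates. The function names mine_by_cluster and prefetch_candidates and the thresholds min_support and min_confidence are hypothetical.

```python
from collections import defaultdict


class _Node:
    """One FP-tree node: an item, its count, and links to parent/children."""
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def _build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns (header, frequent): header maps each frequent item to the tree
    nodes that hold it; frequent maps each frequent item to its support.
    """
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):  # count each page once per session
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}
    root = _Node(None, None)
    header = defaultdict(list)
    for items, count in transactions:
        # Keep only frequent items, ordered by descending support (ties by name).
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = _Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(transactions, min_support, suffix=()):
    """Yield (frozenset(itemset), support) for every frequent itemset."""
    header, frequent = _build_tree(transactions, min_support)
    for item, item_support in frequent.items():
        itemset = suffix + (item,)
        yield frozenset(itemset), item_support
        # Conditional pattern base: the prefix path above each node of `item`.
        conditional = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        yield from fp_growth(conditional, min_support, itemset)


def mine_by_cluster(sessions, min_support):
    """sessions: iterable of (cluster_label, pages); the label is ASSUMED given."""
    clusters = defaultdict(list)
    for label, pages in sessions:
        clusters[label].append((pages, 1))
    return {label: dict(fp_growth(txns, min_support))
            for label, txns in clusters.items()}


def prefetch_candidates(patterns, page, min_confidence):
    """Pages q whose pairwise rule {page} -> {q} meets min_confidence."""
    page_support = patterns.get(frozenset([page]))
    if not page_support:
        return []
    candidates = []
    for itemset, support in patterns.items():
        if len(itemset) == 2 and page in itemset:
            confidence = support / page_support
            if confidence >= min_confidence:
                (other,) = itemset - {page}
                candidates.append((other, confidence))
    return sorted(candidates, key=lambda c: (-c[1], c[0]))
```

Mining each cluster separately means a rule only has to be frequent within one location group to drive prefetching, so pages that are popular in one region are not drowned out by traffic from other regions.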

Introduction

In recent times, Web Usage Mining has emerged as a popular approach for providing Web personalization. Web usage mining is concerned with finding user navigational patterns on the World Wide Web by extracting knowledge from web usage logs (we will refer to them as web logs). The assumption is that a web user can physically access only one web page at any given point in time, and that page represents one item. The process of Web Usage Mining goes through the following three phases:

• Preprocessing phase: The main task here is to clean up the web log by removing noisy and irrelevant data. In this phase, users are also identified, and the web pages they accessed are organized sequentially into sessions according to access time and stored in a sequence database.
• Pattern Discovery phase: The core of the mining process is in this phase. Usually, Sequential Pattern Mining (SPM) is applied to the cleaned web log to mine all frequent sequential patterns.
• Recommendation/Prediction phase: Mined patterns are used to recommend or predict the pages a user is likely to request next.

Web Usage Mining is the field of web mining that deals with finding interesting usage patterns in logging information. The logging information is stored in a file known as the web log file. A web log file contains a lot of information, such as the IP address, date, time, and web page requested.
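The preprocessing phase described above (cleaning the log, identifying users, and grouping their requests into time-ordered sessions) can be sketched as follows. This is a minimal single-machine sketch, not the authors' Hadoop implementation, and it rests on several assumptions: the log is in the Common Log Format, the client IP address identifies the user, requests for images, stylesheets, and scripts or with a non-200 status count as noise, and a 30-minute inactivity gap closes a session. The names parse_line, sessionize, NOISE_SUFFIXES, and SESSION_TIMEOUT are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)
# Assumption: embedded resources are noise, not page views.
NOISE_SUFFIXES = ('.gif', '.jpg', '.jpeg', '.png', '.css', '.js', '.ico')
# Assumption: 30 minutes of inactivity ends a session.
SESSION_TIMEOUT = timedelta(minutes=30)


def parse_line(line):
    """Return (ip, timestamp, url) for a clean page request, or None."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None  # malformed line
    url = m.group('url')
    if m.group('method') != 'GET' or m.group('status') != '200':
        return None  # failed or non-GET request
    if url.lower().endswith(NOISE_SUFFIXES):
        return None  # image, stylesheet, script, icon
    ts = datetime.strptime(m.group('time'), '%d/%b/%Y:%H:%M:%S %z')
    return m.group('ip'), ts, url


def sessionize(lines):
    """Group clean requests into per-IP sessions split by inactivity gaps."""
    requests = [r for r in map(parse_line, lines) if r is not None]
    requests.sort(key=lambda r: (r[0], r[1]))  # order by user, then by time
    sessions = []
    last_ip, last_ts = None, None
    for ip, ts, url in requests:
        if ip != last_ip or ts - last_ts > SESSION_TIMEOUT:
            sessions.append([])  # new user or gap too long: start a new session
        sessions[-1].append(url)
        last_ip, last_ts = ip, ts
    return sessions
```

An IP address is a coarse user identifier, since proxies and NAT can merge many users behind one address, so real pipelines often add the user-agent string or a cookie to the key.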

Conclusion

Web usage mining is a kind of mining applied to server logs. It is used to improve system performance, strengthen customer relationships, and enhance the usability of website design. In this paper we suggest an offline recommender system that uses a Markov model for next-page prediction. We propose a new framework that integrates semantic information into all phases of web usage mining. In the second phase, pattern discovery, a semantic distance matrix is calculated and used by the pattern mining algorithm for pruning and support counting. Semantic annotation makes information extraction on the web better and more efficient. We build a first-order Markov model during the mining process and enrich it with semantic information, to be used for subsequent page-request prediction. This addresses the problem of ambiguous predictions and provides an informed lower-order Markov model without the need for complex hybrid-order Markov models.

Future work can proceed in two directions:
• Extending the system to live log analysis, since the current analysis is performed offline.
• Improving performance further by using parallel tasking or multithreading in the implementation.
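The first-order Markov model for next-page prediction mentioned in the conclusion can be sketched as follows. This is a minimal sketch, assuming sessions are already lists of page URLs (for example, the output of a sessionization step): it estimates P(next | current) from transition counts and predicts the most frequent successor of the current page. It deliberately omits the semantic enrichment the authors describe, and the class name FirstOrderMarkov is hypothetical.

```python
from collections import Counter, defaultdict


class FirstOrderMarkov:
    """Next-page predictor: P(next | current) estimated from transition counts."""

    def __init__(self):
        # transitions[a][b] = number of times page b directly followed page a
        self.transitions = defaultdict(Counter)

    def fit(self, sessions):
        for session in sessions:
            # Each consecutive pair (a, b) in a session is one observed transition.
            for current, nxt in zip(session, session[1:]):
                self.transitions[current][nxt] += 1
        return self

    def probability(self, current, nxt):
        """Maximum-likelihood estimate of P(nxt | current); 0.0 if unseen."""
        counts = self.transitions.get(current)
        if not counts:
            return 0.0
        return counts[nxt] / sum(counts.values())

    def predict(self, current):
        """Return the most frequent successor of `current`, or None if unseen."""
        counts = self.transitions.get(current)
        if not counts:
            return None
        return counts.most_common(1)[0][0]
```

Because a first-order model conditions only on the current page, it needs far fewer parameters than higher-order models. When two successors have similar counts, however, its prediction is ambiguous, which is the problem the authors address by adding semantic information.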

Copyright

Copyright © 2025 Khushbu Ankil, Prof. Mohit Jain. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id: IJRRETAS177

Publish Date: 2022-06-01

ISSN: 2455-4723

Publisher Name: ijrretas
