Survey on Generic Framework That Integrates Semantic Information
Authors: Upasana Choudhary, Maya Yadav
Abstract
Identifying a user’s current interests from short-term navigational patterns, rather than from explicit user information, has proved to be a promising basis for predicting pages that may interest the user. Such predictions also support organizational analyses such as web site improvement. Various techniques are used to achieve personalized recommendation. This research employs web usage mining techniques to determine the interests of “similar” users, together with a technique for classifying and matching an online user based on browsing interests. A novel approach to predicting unvisited pages is employed. The complete next-page prediction process, as represented in the architecture, broadly consists of two components: an offline component and an online component. The offline component involves data preprocessing, pattern discovery, and pattern analysis; its outcome is a set of aggregate usage profiles derived using web usage mining techniques. The online component is responsible for matching the current user’s profile to the aggregate usage profiles. The scope of this work is to match an online user’s navigational activity against the aggregate usage profiles obtained through the mining tasks and to provide suitable next-page predictions that may be of interest to the user. The recommendation process is an online phase and consists of two sub-phases: profile matching and recommendation.
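The two online sub-phases can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: aggregate usage profiles are taken to be page-weight vectors, matching is done with cosine similarity, and the function names and sample profiles are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse page-weight vectors (dicts)."""
    dot = sum(w * b.get(page, 0.0) for page, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(session, profiles, top_n=2):
    """Match the active session to a profile, then suggest unvisited pages."""
    # Sub-phase 1 (profile matching): pick the most similar aggregate profile.
    active = {page: 1.0 for page in session}
    best = max(profiles, key=lambda name: cosine(active, profiles[name]))
    # Sub-phase 2 (recommendation): rank that profile's pages not yet visited.
    candidates = {p: w for p, w in profiles[best].items() if p not in active}
    ranked = sorted(candidates, key=lambda p: (-candidates[p], p))
    return best, ranked[:top_n]


# Hypothetical aggregate usage profiles, as the offline component might emit.
profiles = {
    "shoppers": {"/home": 0.9, "/catalog": 0.8, "/cart": 0.7, "/checkout": 0.5},
    "readers": {"/home": 0.6, "/blog": 0.9, "/about": 0.4},
}

print(recommend(["/home", "/catalog"], profiles))
# ('shoppers', ['/cart', '/checkout'])
```

Restricting candidates to pages outside the active session reflects the stated goal of predicting unvisited pages; other similarity measures could be substituted without changing the overall structure.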
Introduction
The Semantic Web aims to address current web problems by structuring web content, adding semantics, and extracting the maximum benefit from the processing power of machines and the web. As defined by Sir Tim Berners-Lee, “The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation” [1]. It is a vision: the idea of having data on the Web defined and linked in such a way that machines can use it not just for display purposes, but for automation, integration, and reuse of data across various applications [2]. Web mining plays a pivotal role in achieving this, as it enables us to find the information we need quickly and easily. It is mainly used to obtain useful information and knowledge from the large number of pages on web sites, and it can be regarded as data mining applied to the web, which can automatically extract, standardize, analyze, and explain the data [3]. Data preprocessing of log files rests on three main concepts: filtering, normalization, and correlation. Filtering is the act of taking in raw log data and determining whether to keep it. The retained data is normalized, and the normalized log data is the input to correlation. Correlation is the act of matching a single normalized piece of data, or a series of such pieces, for the purpose of taking an action.
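The three preprocessing concepts can be sketched in Python as follows. This is only an illustration under stated assumptions: the log lines are assumed to follow the Common Log Format, filtering is assumed to keep successful GET requests for non-static resources, and correlation is shown as grouping requests into per-visitor sessions with a 30-minute timeout, which is one common choice and not necessarily the one used in this work. All function names and sample lines are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed input format: Common Log Format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


def keep(line):
    """Filtering: keep only successful GET requests for non-static pages."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return False
    return (m["method"] == "GET" and m["status"] == "200"
            and not m["path"].lower().endswith(STATIC_SUFFIXES))


def normalize(line):
    """Normalization: turn a kept raw line into a uniform record."""
    m = LOG_PATTERN.match(line)
    return {
        "ip": m["ip"],
        "time": datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z"),
        "path": m["path"].split("?")[0],  # drop the query string
    }


def correlate(records, timeout=timedelta(minutes=30)):
    """Correlation: group each visitor's records into sessions."""
    sessions = []
    latest = {}  # ip -> (time of last request, that visitor's open session)
    for rec in sorted(records, key=lambda r: r["time"]):
        previous = latest.get(rec["ip"])
        if previous is None or rec["time"] - previous[0] > timeout:
            pages = []  # start a new session for this visitor
            sessions.append((rec["ip"], pages))
        else:
            pages = previous[1]
        pages.append(rec["path"])
        latest[rec["ip"]] = (rec["time"], pages)
    return sessions


raw = [
    '10.0.0.1 - - [10/Oct/2024:13:55:36 +0000] "GET /home HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2024:13:55:37 +0000] "GET /style.css HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Oct/2024:13:56:10 +0000] "GET /catalog?page=2 HTTP/1.1" 200 4100',
    '10.0.0.2 - - [10/Oct/2024:13:57:00 +0000] "GET /missing HTTP/1.1" 404 128',
    '10.0.0.1 - - [10/Oct/2024:15:00:00 +0000] "GET /home HTTP/1.1" 200 2326',
]
sessions = correlate(normalize(line) for line in raw if keep(line))
print(sessions)
```

With the sample lines, the stylesheet request and the 404 are filtered out, and the remaining requests from 10.0.0.1 fall into two sessions because the last one arrives more than 30 minutes after the previous request.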
Conclusion
Semantic information can be integrated into the pattern discovery phase: a semantic distance matrix is used in the adopted sequential pattern mining algorithm to prune the search space and partially relieve the algorithm of support counting. We build a first-order Markov model during the mining process and enrich it with semantic information for use in subsequent page request prediction. This addresses the problem of ambiguous predictions and provides an informed lower-order Markov model without the need for complex hybrid-order Markov models.
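One way a first-order Markov model might use semantic information to resolve an ambiguous prediction is sketched below in Python. It is a simplified illustration, not the algorithm of this work: transitions are counted from sessions, and when several next pages are equally frequent, the one semantically closest to the current page is chosen. The function names, sample sessions, and distance values are hypothetical, and the use of the distance matrix for pruning during sequential pattern mining is not covered.

```python
from collections import defaultdict


def build_markov(sessions):
    """Count first-order transitions (current page -> next page)."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    return counts


def predict_next(counts, distance, current):
    """Predict the next page, breaking frequency ties by semantic distance."""
    followers = counts.get(current)
    if not followers:
        return None  # page never seen as the start of a transition
    best = max(followers.values())
    tied = [page for page, c in followers.items() if c == best]
    # Ambiguous case: several pages share the top count, so prefer the one
    # with the smallest semantic distance to the current page.
    return min(tied, key=lambda p: (distance.get((current, p), float("inf")), p))


# Hypothetical sessions and semantic distances (smaller means more related).
sessions = [
    ["/laptops", "/laptop-bags"],
    ["/laptops", "/garden-tools"],
    ["/home", "/laptops"],
]
distance = {
    ("/laptops", "/laptop-bags"): 0.2,
    ("/laptops", "/garden-tools"): 0.9,
}

counts = build_markov(sessions)
print(predict_next(counts, distance, "/laptops"))
# /laptop-bags
```

Usage counts alone cannot separate the two followers of `/laptops` here, since each was observed once; the semantic distance supplies the extra evidence while the model stays first-order.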
Copyright
Copyright © 2025 Upasana Choudhary, Maya Yadav. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.