SURVEY ON ENHANCING INFORMATION RETRIEVAL IN INVISIBLE DEEP WEB
Authors: Teena Nagar, Prof. Mohit Jain
Abstract
Most web structures are huge and intricate, and users often lose track of the purpose of their inquiry or get uncertain results when they try to navigate through them. The Internet is an enormous compilation of multivariate data. Several problems prevent effective and efficient knowledge discovery, and better knowledge management techniques require that accurate and complete data be retrieved. The hidden web, also known as the invisible web or deep web, has given rise to a new area of web mining research. A huge number of documents in the hidden web, including pages hidden behind search forms, specialized databases, and dynamically generated web pages, are not accessible to general-purpose web mining applications. In this research we propose an approach with a robust ability to access these hidden web resources, aimed at a better invisible web resource selection and integration system. We use the SC technique for invisible web resource selection and integration, constructed for real-world domains from the clustering of database schemas and web search interfaces, to improve on traditional methods of information retrieval. Applications of the proposed system include invisible web query interface mapping and intelligent recognition of user query intention based on our domain knowledge base.
Introduction
The Internet now hosts a growing number of online databases, called web databases. According to statistics, the number of web databases exceeds 500 million, and together they constitute the deep web. In the invisible web, the majority of the information is stored in searchable databases, and the largest share of it is structured data stored in backend databases such as MySQL, DB2, Access, Oracle, SQL Server and so on. Traditional search engines create their index by spidering or crawling surface web pages. To be discovered, a page must be static and linked to other pages. Traditional search engines cannot retrieve content in the invisible web, because those pages do not exist until they are created dynamically as the result of a specific search. Because traditional search engine crawlers cannot probe beneath the surface, a number of invisible web sites have done a lot of auxiliary work to make it easier for users to query invisible web databases, such as classifying the databases manually by constructing a summary database and providing users with a unified query interface. The user can then compare the query results from sources on similar topics to decide which one answers their needs better. However, manual classification is extremely inefficient and cannot meet users' information needs. In this paper, an automatic classification approach for invisible web sources is presented, based on schema matching and data analysis techniques applied to query interface characteristics.
Domain-specific search sources focus on documents in confined domains, such as documents concerning an organization or a specific subject area. Most domain-specific search sources are provided by organizations, libraries, businesses, universities and government agencies.
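Classifying a source by its query interface characteristics can be illustrated with a minimal sketch. It assumes that the visible field labels of a search form have already been extracted, and it scores them against small hand-written domain vocabularies using Jaccard overlap. The vocabularies, the function names and the 0.2 threshold are hypothetical choices made for illustration, not the method of the surveyed systems.

```python
import re

# Hypothetical domain vocabularies; a real system would derive these
# from labelled query interfaces rather than hard-code them.
DOMAIN_TERMS = {
    "books": {"title", "author", "isbn", "publisher"},
    "flights": {"from", "to", "departure", "return", "passengers"},
}


def tokenize(labels):
    """Lower-case the form labels and split them into a set of word tokens."""
    tokens = set()
    for label in labels:
        tokens.update(re.findall(r"[a-z0-9]+", label.lower()))
    return tokens


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify_interface(labels, threshold=0.2):
    """Return (best_domain_or_None, per-domain scores) for one query interface."""
    tokens = tokenize(labels)
    scores = {d: jaccard(tokens, terms) for d, terms in DOMAIN_TERMS.items()}
    best = max(scores, key=scores.get)
    # Below the (assumed) threshold the interface is left unclassified.
    return (best if scores[best] >= threshold else None), scores


domain, scores = classify_interface(["Book Title", "Author", "ISBN"])
print(domain, scores)
```

An interface whose labels share no terms with any vocabulary is returned as unclassified rather than forced into a domain, which keeps poorly matching sources out of the summary database.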
In our daily life we are provided with several kinds of database directories to store critical records. Likewise, to locate a particular site in the ocean of the Internet, there have been efforts to organize static web content in the form of web directories, e.g. Bing. The procedure adopted is both manual and automatic. Similarly, to organize the myriad invisible web databases, we need an extensive database to store information about all the online invisible web databases. A few aspects make the automatic organization of invisible web sources indispensable: the understanding of the semantic web can be made possible
Conclusion
In order to create knowledge for making accurate and appropriate decisions, we need to integrate data from these heterogeneous deep web sources. In this research a detailed survey of automatic deep web integration techniques is presented, which is key to the realization of data integration from heterogeneous data sources. At web scale, it is infeasible to cluster data sources into domains manually. We deal with this problem and propose a schema clustering approach that leverages techniques from document clustering. We use a selection and integration approach to handle the uncertainty in assigning schemas to domains, which fits with previous work on data integration with uncertainty.
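The idea of clustering schemas with document clustering techniques, while keeping the assignment of a schema to a domain uncertain, can be sketched as follows. The sketch treats each schema as a bag of attribute names, groups schemas with a greedy single-pass clustering over cosine similarity, and turns the similarities of a new schema to each cluster into normalized scores. The single-pass algorithm, the 0.3 threshold, the normalization step and all function names are assumptions chosen for brevity, not the specific technique proposed here.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_schemas(schemas, threshold=0.3):
    """Greedy single-pass clustering; returns one centroid Counter per cluster."""
    centroids = []
    for attrs in schemas:
        vec = Counter(attrs)
        sims = [cosine(vec, c) for c in centroids]
        if sims and max(sims) >= threshold:
            # Merge the schema into its most similar existing cluster.
            centroids[sims.index(max(sims))].update(vec)
        else:
            # No cluster is similar enough: start a new one.
            centroids.append(Counter(vec))
    return centroids


def domain_probabilities(attrs, centroids):
    """Normalize similarities to each cluster into scores that sum to 1."""
    sims = [cosine(Counter(attrs), c) for c in centroids]
    total = sum(sims)
    return [s / total for s in sims] if total else [0.0] * len(sims)


schemas = [
    ["title", "author", "isbn"],
    ["title", "author", "publisher"],
    ["from", "to", "departure"],
    ["from", "to", "return"],
]
centroids = cluster_schemas(schemas)
print(len(centroids), domain_probabilities(["title", "from"], centroids))
```

A schema that mixes attributes from several domains receives a split score instead of a hard label, so a downstream integration step can carry that uncertainty forward.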
Copyright
Copyright © 2025 Teena Nagar, Prof. Mohit Jain. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.