The invisible web (also called the deep web) is the part of the Web that cannot be crawled or indexed by traditional search engines. Its pages often link to “searchable databases” on particular web sites, and their content is “dynamically generated” only when the database is queried from within the site itself. Traditional search engines cannot search deeply in these databases. However, you can search them manually if you know where to look.
Some examples of Invisible Web pages are:
Have you ever seen this message when refreshing a web page?
This usually happens when the page you are trying to refresh is a dynamically generated page. These types of pages contain data or information we require.
This invisible web content is estimated to be about 500 times more than what today's search engines can see. Hidden within the pages of these databases lie terabytes of information that are crucial to your day-to-day decision-making. To get at this information, or even to determine if it is pertinent to your subject, you might spend endless hours in manual searching. This is both time-consuming and prone to human error.
What is contained in Invisible Web Sources?
Most of the invisible or deep web is made up of the contents from thousands of specialized searchable databases made available via the web. Some invisible Web sources include:
- The US Patent and Trademark Office, which publishes a database on its website of all patents and trademarks registered in the country
- The secretary of state of each state in the US, which publishes an online database of all businesses registered in that state
- Public court case information found on courthouse sites
- Tax assessor information found on county auditor websites
Why do search engine crawlers not index the deep web?
Search engines need static and stable links to pages in order to index them. In order to search invisible web sources, the search engine must construct search queries. Traditional search engines cannot do this. Each dynamic page from the invisible web has information that is specific to a particular query and is often a single use page. Search engines do not want to clutter their databases with these pages because of the sheer volume of data.
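To illustrate the point, here is a minimal sketch of how a simple link-following crawler sees a page. It is not how any particular search engine is implemented; the page content and the `LinkCollector` class are made up for this example. The crawler can collect the static links, but the search form is only an entry point that it has no query to submit to.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical crawler step: record static links and note any forms."""

    def __init__(self):
        super().__init__()
        self.links = []  # static links a crawler can follow
        self.forms = []  # form targets that need a query to produce a page

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "form":
            self.forms.append(attrs.get("action", ""))


# Made-up page: two static links and one database search form.
PAGE = """
<html><body>
  <a href="/about.html">About</a>
  <a href="/help.html">Help</a>
  <form action="/search" method="get">
    <input type="text" name="company">
    <input type="submit" value="Search">
  </form>
</body></html>
"""

collector = LinkCollector()
collector.feed(PAGE)
print("Links the crawler can follow:", collector.links)
print("Search forms it cannot fill in:", collector.forms)
```

The records behind `/search` never appear in `collector.links`, so a crawler that only follows links never reaches them.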
How do Invisible Web search engines work?
Where invisible web sources are concerned, it is not practical to create the static links to pages that traditional search engines need. It is easier to dynamically construct and query the search page for each query term than to generate and store all the pages containing all the permutations of queries and responses that could be made to even a very small database. Even if stable links were created to each page, the pages would have to be constantly refreshed so that they stay current as the data changes. This would be a maintenance nightmare for webmasters publishing these pages.
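A rough back-of-the-envelope calculation shows how quickly the permutations grow. The field names and value counts below are assumptions chosen only for illustration, not figures from any real database.

```python
from math import prod

# Hypothetical search form: each field and the number of distinct values it accepts.
field_values = {"origin": 50, "destination": 50, "date": 365}


def count_result_pages(fields):
    """Number of distinct result pages if every query combination were stored."""
    return prod(fields.values())


total = count_result_pages(field_values)
print(total)  # 50 * 50 * 365 = 912500
```

Even this three-field form would need close to a million pre-generated pages, each of which would have to be regenerated whenever the underlying data changed.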
Invisible Web search engines are built to construct queries, which connect with dynamic content in real-time in order to obtain current information. Since the types of queries vary widely depending on the type of database being queried, invisible web search applications are focused on searching pre-selected data sources where the search intent and underlying content are known. This has led to the use of invisible web search technology in building “Vertical Searches” or “specialized searches” that focus on specific businesses.
For example, searching for flights across all airlines would require pre-selecting destinations, flight times and dates in order to pull out relevant and real-time information. Here we select sources based on the specific tasks the end-users in this business are trying to accomplish and build a search by selecting invisible web sources that satisfy this intent. Yahoo! Farechase is a well-known example of an invisible web vertical search. Vertical search engine developers offer custom (topic-based) searches that satisfy the particular search intent of end-users of a portal or for intranet users. These vertical searches provide results that are pertinent to the end-user's query for that particular topic.
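The query-construction step of a vertical search can be sketched as follows. This is only an illustration under stated assumptions: the two sources, their URLs and their parameter names are hypothetical, and the sketch builds the query URLs without sending any requests.

```python
from urllib.parse import urlencode

# Hypothetical pre-selected sources. Each one maps the shared search intent
# (origin, destination, date) onto its own made-up parameter names.
SOURCES = {
    "airline-a": {
        "url": "https://airline-a.example.com/search",
        "params": {"origin": "from", "destination": "to", "date": "depart"},
    },
    "airline-b": {
        "url": "https://airline-b.example.com/flights",
        "params": {"origin": "src", "destination": "dst", "date": "day"},
    },
}


def build_queries(intent):
    """Translate one search intent into a query URL for every source."""
    queries = {}
    for name, source in SOURCES.items():
        params = {source["params"][field]: value for field, value in intent.items()}
        queries[name] = source["url"] + "?" + urlencode(params)
    return queries


queries = build_queries({"origin": "BOS", "destination": "SFO", "date": "2024-06-01"})
for name, url in queries.items():
    print(name, url)
```

Because the sources are chosen in advance, the mapping from the user's intent to each site's query format can be written once per source, which is what makes real-time querying of dynamic content workable for a single topic.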
Here are some additional resources about the Invisible Web: