
Deep Web

Page history last edited by WoW!ter 10 years, 9 months ago


 

What is the deep, invisible or hidden web?

Not all documentary information on the Web can be retrieved directly through search engines. In 2001 two publications described the limitations of search engines: Bergman (2001), who coined the term 'deep web', and Sherman & Price (2001), who used the term 'invisible web'. Bergman estimated that the deep web was about 500 times as large as the 'surface web'. However, some of the assumptions behind these early estimates were probably flawed. Whatever its size, the deep web still exists and contains a lot of high-quality content.

 

The main causes for the existence of the deep web

  • The information is contained in databases
  • Search engine limitations
  • Website limitations
  • Low ranking results
  • Cognitive factors
  • Web 2.0

 

Information is contained in databases

Search engine spiders (crawlers) cannot deal with database search forms. A spider cannot fill in a form and press the search button to reach the information held in a database. It can index the search form itself, but not the wealth of content in the database behind it. Web pages generated from a database in response to a query are called dynamic pages. Dynamic pages can often be recognized by the structure of their URL: it contains a '?' or clues such as cgi, cfm or php. The following URL is an example of a dynamic page that search engines can, in some instances, index: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=9742976
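The URL clues described above can be turned into a rough check. This is a minimal sketch in Python, not a reliable classifier: the function name `looks_dynamic` and the list of script extensions are assumptions for illustration, and many sites generate pages from databases behind URLs that show none of these clues.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed list of script extensions that hint at a database-generated page.
DYNAMIC_EXTENSIONS = {"cgi", "fcgi", "cfm", "php", "asp", "aspx", "jsp"}


def looks_dynamic(url: str) -> bool:
    """Guess (rule of thumb only) whether a URL points to a dynamic page."""
    parts = urlsplit(url)
    if parts.query:                  # non-empty parameters after a '?'
        return True
    path = parts.path.lower()
    if "/cgi-bin/" in path:          # classic location for CGI scripts
        return True
    last_segment = path.rsplit("/", 1)[-1]
    ext = last_segment.rsplit(".", 1)[-1] if "." in last_segment else ""
    return ext in DYNAMIC_EXTENSIONS


url = ("http://www.ncbi.nlm.nih.gov/entrez/query.fcgi"
       "?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=9742976")
print(looks_dynamic(url))            # True
# The query string holds the "form fields" a crawler would have to fill in itself:
print(parse_qs(urlsplit(url).query))
# {'cmd': ['Retrieve'], 'db': ['pubmed'], 'dopt': ['Abstract'], 'list_uids': ['9742976']}
```

The decoded parameters show why a spider struggles: each different combination of field values produces a different page, and the crawler has no way of knowing which values are valid.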

 

Search engine / Website limitations

  • Sites are too large to be indexed completely
  • Files are too large (the limits on how much of a file is indexed change over time, but they still exist)
  • Information is contained in non-indexable file types (ZIP, TAR, etc.)
  • Information is contained in graphics, multimedia files or Flash
  • The site owner's robots.txt file does not allow indexing
  • Information changes rapidly (tables of contents, news or blogs)
  • Information is on intranets or requires a password
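Whether a site's robots.txt keeps crawlers away from a page can be checked with Python's standard `urllib.robotparser` module. A minimal sketch, assuming a made-up robots.txt for a hypothetical site `example.com`; for a real site you would load its file with `set_url()` and `read()` instead of `parse()`.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt: every crawler is kept out of the /database/ section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /database/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for page in ("http://example.com/index.html",
             "http://example.com/database/records.html"):
    allowed = parser.can_fetch("*", page)
    print(page, "-> may be indexed" if allowed else "-> excluded by robots.txt")
```

Pages under `/database/` stay out of well-behaved search engines even though anyone can open them in a browser, which is one way content ends up in the deep web.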

 

Cognitive factors

(Re)searchers are human after all. They rarely look beyond the first 10 or perhaps 20 results, and they will change their search query rather than page through the results. It therefore pays to alter the preferences of your favourite search engine (for example, to show more results per page). Another option is to use another search engine: there is probably no single search engine in the world that is the best for all your queries.

 

Web 2.0

It is perhaps not widely realized, but the current evolution of the Web, the so-called social Web or Web 2.0, has in many cases produced large silos of information that are available only to users of that system. Much information from Facebook users, for example, can be found by general search engines, but not all information within Facebook's web space is retrievable in this way. Web 2.0 services make the Web opaque.

 

Solutions

To find information in the deep web, it is most important to find the databases that hold the information, rather than to search for the information directly. There are four ways to locate these databases:

  • Use standard search engines to locate databases that may contain the required information.
  • Look for databases where you can expect to find them.
  • Special directories.
  • Special search engines.

 

At the same time, general search engines such as Google are working to improve their retrieval from deep web sources. Google has improved its indexing of dynamic pages, has started indexing Flash pages, is working on text recognition in graphical material and has made a start with retrieving information from behind search forms.

 

Searching databases with standard search engines

Search for your research topic combined with additional terms that point to databases, such as database, data, dataset, archive, bibliography, index, directory, register or statistics. For example: ["plane crash" | "aviation accidents" database].
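To try each database-pointing term in turn, the queries can be generated rather than typed. A small sketch with a hypothetical helper `build_queries`; it reuses the '|' (OR) notation of the example above, so check the OR syntax of the search engine you actually use.

```python
def build_queries(topic_phrases, database_terms):
    """Combine alternative topic phrases (OR-ed with '|') with one database term each."""
    topic = " | ".join(f'"{phrase}"' for phrase in topic_phrases)
    return [f"{topic} {term}" for term in database_terms]


# The terms listed in the text that point to databases.
DATABASE_TERMS = ["database", "data", "dataset", "archive", "bibliography",
                  "index", "directory", "register", "statistics"]

for query in build_queries(["plane crash", "aviation accidents"], DATABASE_TERMS):
    print(query)
# first line: "plane crash" | "aviation accidents" database
```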

 

Once you have found suitable databases, it is important to understand how to query each of them to retrieve your information.
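Many database search forms send their fields as a query string in the URL (an HTTP GET request), which is what produces the dynamic pages described earlier. Once you know a form's field names, for instance from the URL of a results page, you can build queries directly. A sketch, assuming a hypothetical endpoint `https://example.org/search` with hypothetical fields `q`, `field` and `year`; real databases use their own field names, and forms that submit via POST cannot be reproduced this way.

```python
from urllib.parse import urlencode


def build_query_url(endpoint, **fields):
    """Encode search-form fields as a GET query string (spaces become '+')."""
    return endpoint + "?" + urlencode(fields)


# Hypothetical endpoint and field names, for illustration only.
url = build_query_url("https://example.org/search",
                      q="aviation accidents", field="title", year="2001")
print(url)
# https://example.org/search?q=aviation+accidents&field=title&year=2001
```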

 

Look for databases in locations where you can expect them

  • Statistics for the Netherlands are collected by CBS; on its homepage you will find the StatLine database with all the important statistics of the Netherlands. For Europe, look at Eurostat.
  • Telephone numbers are collected in telephone directories
  • Weather data for the Netherlands are collected by the KNMI
  • Doctors and GPs in the Netherlands are all registered

 

 

Special directories

 

Direct Search http://www.freepint.com/gary/direct.htm

Although Direct Search is no longer updated, it is still a valuable resource for finding important databases. The site was started by Gary Price, who still reports recent developments in Web search and Web resources on the ResourceShelf and DocuTicker blogs.

 

Yahoo! Web directories http://dir.yahoo.com/

Most subject categories include a dedicated set of web directories and, in some cases, also databases or bibliographies.

 

Complete Planet http://www.completeplanet.com

Covers some 70,000 databases and Web directories (it has not been updated for a while).

 

 

Specialized search engines

 

IncyWincy http://www.incywincy.com/default

 

Gosh me http://www.goshme.com/ (still in beta, perhaps defunct?)

A promising new search engine.

 

ScienceResearch http://www.scienceresearch.com/search/

This portal gives access to numerous scientific journals and public science databases. Depending on the source, full-text documents may be available; where full text is not available, the results show an abstract of the article and a link to the source.

 

Additional Information

Anon. (2004). Invisible Web: What it is, why it exists, how to find it, and its inherent ambiguity. Retrieved 2005-05-23, from http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html

 

Bergman, M. K. (2001). The deep web: Surfacing hidden value. The Journal of Electronic Publishing, 7(1). http://dx.doi.org/10.3998/3336451.0007.104

 

Devine, J. and F. Egger-Sider (2005). Beyond Google: The invisible Web. Retrieved 2005-05-23, from http://library.laguardia.edu/invisibleweb

 

Sherman, C. and G. Price (2001). The invisible web: Discovering information sources search engines can't see. Medford, NJ: Information Today.

 

 

