The Invisible Web

The Invisible Web is a term I like very much because it describes precisely the situation where I know some piece of information is on the Web but I cannot see it. It is that vast part of the Net that search engines do not reach (for various reasons, which I explain below) but that can still be accessed in other ways.

It may help to explain that not every page that search engines cannot retrieve belongs to the Invisible Web. The Opaque Web and the Dark Web are two other hidden areas. The Opaque Web consists of files that are not linked from other resources, so they cannot be reached by following links. The Dark Web is invisible deliberately: corporate networks, members-only sites and similar places that do not welcome strangers. To get to Opaque and Dark Web sites, you need to know their URL in advance (for instance, from a friend) and, if necessary, to have a user name and a password.

The search ideas I am going to give you in the next sections apply to the Invisible Web only and are unlikely to produce results for the intentionally hidden parts of the Web. But even the Invisible Web alone is a pretty vast space. It is estimated to be up to 500 (yes, five hundred) times the size of the Surface Web (the part that is indexed by search engines), and the trend is for the Invisible Web to grow both as a percentage and in absolute terms. What is more, really valuable stuff is hidden in its debris.

What Is Hidden in the Debris of the Invisible Web?

The short answer is: many essential items. It is true that the information there might not interest everybody, but if you are looking for a very specific piece of information, no matter the topic or area, it is quite probable that it is buried there, together with many other things of interest to you. Most often, the stuff that cannot be found via the general search engines (but is accessible by other search means) falls into the following categories:

  • Dynamic, database-driven sites that are publicly accessible but whose content search engines often skip, for technical reasons, when indexing the Web.
  • Archives of articles in online journals and magazines.
  • Specialized databases that are not of interest to the general public – medical, scientific, legal, etc.
  • Various catalogs: of products, of libraries, etc.
  • News and newsgroup postings. When I search the Net I often run into newsgroup postings from five or more years ago, but when I look for recent ones, what the search engines deliver is far from satisfactory.
  • Legal and administrative information (court records, patent and trademark information) that is available on request.
  • Classifieds and advertisements, Yellow and White Pages.
  • Stuff that search engines exclude deliberately: for instance, files with particular extensions, data that is regarded as private, or content that the owners of the site have explicitly asked to be removed from the search engine's index.
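
To make the last item in the list a bit more concrete: one common way for site owners to ask crawlers to stay away from certain content is a robots.txt file. The text above does not say which mechanism is meant, so treat this as an illustration of one possibility rather than the whole story. The sketch below uses Python's standard urllib.robotparser module; the robots.txt content, the example.com URLs and the helper name is_crawlable are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /search
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt rules let `agent` fetch `url`.

    is_crawlable is a hypothetical helper name, not part of any library.
    """
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/articles/1",
        "https://example.com/private/report.html",
    ):
        print(url, is_crawlable(ROBOTS_TXT, url))
```

A well-behaved crawler that honors these rules would skip everything under /private/, so those pages would stay out of the index even though anyone who knows the address can open them in a browser.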
