Vast areas of the web are not accessible to most browsers. Estimates of the scale of the invisible web range as high as 100 to 1: on that reckoning, for every page that you can see, there are 100 you can't find. In 2007, a study by Google put the figure at 25 million invisible web sites with rich data. So what are these hidden resources, and where are they? Their invisibility is not due to a conspiracy of silence or a plot to keep you in the dark.
Many sites are run by businesses, universities and government departments and are designed for internal use. Their owners go to great lengths to exclude those who are not insiders, while hackers take great pride in penetrating the electronic defences. A decade ago, hacking into these computers was the stuff of Hollywood films. The films were simplistic, but they did highlight a serious issue. Important sites have since got much better at protecting themselves: now we have firewalls and secure servers. It was so much easier and, I have to confess, quite a lot of fun in the early days. But I digress.
Much of the information on these sites is very boring. You will not get to read the page containing Buckingham Palace's laundry account unless you are given access or you get an inside job with false references. You are not going to gain access to closed systems without the proper authority. Don't even try: under the US Patriot Act, this might be classified as terrorism.
Organisations
Next, there are many companies which make their living from selling information. Bloomberg sells financial data, as does Reuters, which also runs financial and news services. To obtain access to one of their terminals you need a load of money and the necessary wiring.
Subscriptions
There are many businesses that make their living by providing health, lifestyle, legal and practical advice, often as a perk for firms' employees. The clients pay substantial subscriptions for the privilege of accessing this quality information, so you won't come across it as you search the web.
So, as an ordinary web user, you are excluded from confidential internal material as well as from some quality material. But there is also a great deal of material in the public domain which you are unlikely to find.
Impenetrable formats
The web is based on HyperText Markup Language (HTML), but many articles are published as PDFs (Portable Document Format). The software that builds the search engines' indexes does not always penetrate these PDF documents, and the same applies to many other file formats. Academic articles are usually supplied as PDFs, and readers find them by using keyword searches or catalogues. A new search tool designed to get round this issue came into use during 2006, although there is much more to come. Google's 'advanced search' option lets you locate pages in a specified file format, and it can look inside many popular formats, including PowerPoint® and Word®. Microsoft and Amazon are both joining the rush to make book content available.
Impenetrable servers
Search engines can only check visible pages, yet an increasing number of web pages are generated from data held in a database. If the web address ends in ASP, PHP or CGI (instead of HTM), you can be fairly sure the page has been composed from a database in response to your enquiry. That is why you often get an error if you try to store it as a 'favourite'. The data itself can only be accessed by search engines if the structure of the server and its software allow it. Happily, most Content Management Systems (CMS) ensure that search-engine spiders can index the content; indeed, they make a virtue of delivering up the information within the database. Most blog and forum sites (which include most of the social networking sites) are searchable.
Language
The English language dominates the web, but this is changing. At the moment, much of the material written in the other great languages is likely to be invisible to searches in English. This problem of access is being addressed by search engines that translate the content.
Those who produce quality sites in other languages will need to embed some English words in their 'meta-tags' to attract the search engines.
So, just as the volume of private correspondence dwarfs the word-count of all the newspapers in the world, there is an invisible web far larger than the one we see. It will concern you when the material you want is in an inaccessible or hard-to-find form. However, it is possible to track down even these articles and data if you use the appropriate search strategy.
Invisible is not the same as Dark
And finally, there is the Dark Web, which is designed so that it cannot be seen: it runs on a parallel web of servers provided by people who want to be part of it. It sounds a bit sinister, but there are some good reasons for it.
© Charles Jones 2003-10
©WritersServices.com 2000-2010