| Indices
An index is a search tool that uses automated programs to follow the hyperlinks
found on a web page to other web pages, and in that way move through a large
number of web sites with no human intervention. While doing their wandering,
these "web crawlers" also create a database of keywords from each web page
they encounter. A search engine can then be designed to take advantage of the
specific ways that the database is indexed, making a search of the index for all
occurrences of your keywords extremely fast. Because crawlers can process
enormous numbers of web pages each day, older links are constantly being dropped
or updated as new URLs are added, so the maintenance of the database is
automated as well.
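To make the idea concrete, here is a minimal sketch of a crawler that builds a
keyword index. It is an illustration only, not how any particular search engine
works: the three-page "web" in TOY_WEB, the page names, and the function
crawl_and_index are all made up for this example, and a real crawler would fetch
pages over the network rather than read them from a dictionary.

```python
import re
from collections import defaultdict, deque

# Hypothetical miniature "web": URL -> (page text, outgoing links).
# A real crawler would download each page instead of looking it up here.
TOY_WEB = {
    "a.html": ("Robert Redford films", ["b.html", "c.html"]),
    "b.html": ("Robert Jones and Lynn Redford", ["a.html"]),
    "c.html": ("Mayan civilization history", []),
}

def crawl_and_index(start_url, web):
    """Follow links breadth-first and build a keyword -> set-of-URLs index."""
    index = defaultdict(set)
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        text, links = web[url]
        # Record every lowercase word on the page against this URL.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
        # Queue each linked page that exists and has not been visited yet.
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return index

index = crawl_and_index("a.html", TOY_WEB)
print(sorted(index["redford"]))  # pages on which "redford" appears
```

Because the index maps each keyword straight to the pages that contain it, a
search for a word is a single lookup rather than a scan of every page.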
One of the most advanced index search engines is Alta Vista, created and run by
Digital Equipment Corporation. Within two months of its public debut on December
15th, 1995, Alta Vista had grown to include 21 million web pages and 10 billion words,
and was handling over 4 million requests per day. Since that time its automated crawler
has been examining approximately 2.5 million web pages daily.
Alta Vista has a single blank entry field where you can type your keywords. It will
generally behave as if the OR Boolean operator is in effect, giving you the number of
web pages found containing each word, then the number of pages containing all of the
words, and a listing of the 10 best matches. Thus it does both an AND and an OR search
automatically. Putting a "+" sign in front of a word causes it to be required, somewhat
analogous to using the AND operator. Likewise, a "-" sign can be put in front of a
word to indicate that it should NOT be included. Finally, phrases can be placed in
quotes so that the words are not searched individually. For instance, if you searched
for Robert Redford, without quotes, you would get entries for all pages that include
Robert and Redford anywhere in the page such as ones containing the names Robert
Jones and Lynn Redford. If, however, you use "Robert Redford", in quotes, then you
would only get pages which have the two names side by side as in the famous actor's
name. It should be pointed out, though, that when calculating a page's "score" to come
up with the 10 best matches, Alta Vista does consider proximity, so sites with the
actor's name in them would score higher than the example site with two separate
names and therefore be more likely to appear at the top of your list. Although it
initially lists only the top 10 pages, a link at the bottom of the results page lets
you see the rest of the results your search turned up.
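The query rules described above can be sketched in a few lines of code. This is
only a rough model under stated assumptions: the function names (parse_query,
matches, score) are invented for this example, and the scoring rule is a simple
stand-in, since Alta Vista's actual ranking formula is not given here. The sketch
treats "+" as required, "-" as excluded, quoted text as a phrase, and gives a
bonus when query words sit side by side on a page.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def parse_query(query):
    """Split a query into required, excluded, and optional terms.
    Each term is a list of words; a quoted phrase yields several words."""
    required, excluded, optional = [], [], []
    for sign, phrase, word in re.findall(r'([+-]?)(?:"([^"]*)"|(\S+))', query):
        words = tokenize(phrase or word)
        if not words:
            continue
        target = required if sign == "+" else excluded if sign == "-" else optional
        target.append(words)
    return required, excluded, optional

def contains(page_words, term):
    """True if the words of term appear consecutively in page_words."""
    n = len(term)
    return any(page_words[i:i + n] == term for i in range(len(page_words) - n + 1))

def matches(page_text, query):
    """Apply the +, -, and phrase rules to one page."""
    required, excluded, optional = parse_query(query)
    words = tokenize(page_text)
    if any(contains(words, t) for t in excluded):
        return False
    if not all(contains(words, t) for t in required):
        return False
    # With no required terms, behave like OR: any optional term may match.
    return bool(required) or any(contains(words, t) for t in optional)

def score(page_text, query):
    """Toy ranking (an assumption, not Alta Vista's formula): one point per
    matched term, plus one point for each pair of neighboring query words
    that are also neighbors on the page."""
    required, excluded, optional = parse_query(query)
    words = tokenize(page_text)
    terms = required + optional
    points = sum(contains(words, t) for t in terms)
    flat = [w for t in terms for w in t]
    points += sum(contains(words, [a, b]) for a, b in zip(flat, flat[1:]))
    return points

actor = "Robert Redford is a famous actor"
other = "Robert Jones and Lynn Redford"
print(matches(other, "Robert Redford"))    # unquoted: either word may match
print(matches(other, '"Robert Redford"'))  # quoted: words must be side by side
print(score(actor, "Robert Redford"), score(other, "Robert Redford"))
```

In this sketch both pages match the unquoted search, only the actor's page
matches the quoted phrase, and the actor's page earns the higher score for the
unquoted search because the two names are adjacent.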
Activity:
Go to the Alta Vista home page and try a search. If you need more information on
using Alta Vista, click on the Help graphic at the top of the screen.
Try using both lowercase and uppercase letters in your search. For example, Mayan
Civilization might produce different results from mayan civilization.
|