How Do Web Search Engines Work?
Search engines are the key to finding specific information
on the vast expanse of the World Wide Web. Without sophisticated search engines, it would
be virtually impossible to locate anything on the Web without knowing a
specific URL. But do you know how search engines work? And do
you know what makes some search engines more effective than others?
When people use the term search engine in relation to the Web, they are usually referring to the actual search forms that search through databases of HTML documents, initially gathered by a robot.
There are basically three types of search engines: those that are powered by robots (called crawlers, ants, or spiders); those that are powered by human submissions; and those that are a hybrid of the two.
Crawler-based search engines use automated software agents (called crawlers) that visit a Web site, read the information on the actual site, read the site's meta tags, and also follow the links that the site connects to, indexing all linked Web sites as well. The crawler returns all that information to a central repository, where the data is indexed. The crawler will periodically return to the sites to check for any information that has changed; the frequency with which this happens is determined by the administrators of the search engine.
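To make the crawl-and-index loop concrete, here is a minimal Python sketch of a crawler. It is an illustrative simplification, not any engine's actual implementation: the seed URL, the page limit, and the use of the requests and BeautifulSoup libraries are assumptions chosen for brevity.

from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup

def crawl(seed_url, max_pages=10):
    """Visit pages breadth-first, read content and meta tags, follow links."""
    to_visit = [seed_url]
    seen = set()
    repository = {}  # URL -> {"text": ..., "meta": ...}; the "central repository"

    while to_visit and len(repository) < max_pages:
        url = to_visit.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            page = requests.get(url, timeout=5)
        except requests.RequestException:
            continue  # skip unreachable pages
        soup = BeautifulSoup(page.text, "html.parser")

        # Read the page text and its meta tags, as described above.
        meta = {m.get("name"): m.get("content")
                for m in soup.find_all("meta") if m.get("name")}
        repository[url] = {"text": soup.get_text(" ", strip=True), "meta": meta}

        # Follow the links the site connects to, queueing them for indexing too.
        for link in soup.find_all("a", href=True):
            to_visit.append(urljoin(url, link["href"]))

    return repository  # handed off to the indexer

pages = crawl("http://example.com")  # hypothetical seed URL

In practice a crawler would also respect robots.txt, throttle its requests, and schedule the periodic re-visits mentioned above, all of which this sketch omits.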
Human-powered
search engines rely on humans to submit information that is subsequently
indexed and catalogued. Only information that is submitted is put into the
index.
In both cases, when you query a search engine to locate information, you are actually searching through the index that the search engine has created; you are not searching the Web itself. These indices are giant databases of information that is collected, stored, and subsequently searched. This explains why a search on a commercial search engine, such as Yahoo! or Google, will sometimes return results that are, in fact, dead links. Since the search results are based on the index, if the index has not been updated since a Web page became invalid, the search engine treats the page as an active link even though it no longer is. It will remain that way until the index is updated.
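The point that a query runs against the stored index rather than the live Web can be shown with a small sketch. The inverted index below is a toy example; the documents and query are made up, and real indices store far more (term positions, dates, metadata).

from collections import defaultdict

def build_index(crawled_pages):
    """Map each word to the set of URLs whose stored copy contains it."""
    index = defaultdict(set)
    for url, text in crawled_pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

# A snapshot taken at crawl time; the live pages may have changed since.
crawled_pages = {
    "http://example.com/a": "web search engines index pages",
    "http://example.com/b": "this page was removed after the crawl",
}
index = build_index(crawled_pages)

# The query runs against the stored index, not the live Web, so
# example.com/b is still returned even if it is now a dead link.
print(index["removed"])  # {'http://example.com/b'}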
So why will the same search on different search engines produce different results? Part of the answer is that not all indices are exactly the same; it depends on what the spiders find or what the humans submit. But more importantly, not every search engine uses the same algorithm to search through the indices. The algorithm is what a search engine uses to determine the relevance of the information in the index to what the user is searching for.
One of the elements that a search engine algorithm scans for is the frequency and location of keywords on a Web page. Pages in which the keywords appear more often, or in more prominent places, are typically considered more relevant. But search engine technology is becoming more sophisticated in its attempt to discourage what is known as keyword stuffing, or spamdexing.
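As a rough illustration of frequency-and-location scoring, the sketch below counts how often query terms appear in a page and gives matches in the title extra weight. The weighting and the sample documents are invented for illustration; no real engine's formula is this simple.

def score(query, title, body, title_weight=3):
    """Count query-term occurrences; matches in the title count extra."""
    total = 0
    for term in query.lower().split():
        total += title_weight * title.lower().split().count(term)
        total += body.lower().split().count(term)
    return total

# Made-up documents: (title, body text)
docs = {
    "page1": ("Guide to Web Search Engines",
              "search engines crawl and index pages on the web"),
    "page2": ("Cooking at Home",
              "search for recipes and cooking tips"),
}

query = "search engines"
ranked = sorted(docs, key=lambda d: score(query, *docs[d]), reverse=True)
print(ranked)  # ['page1', 'page2'] -- more frequent, better-placed keywords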
Another common element that algorithms analyze is the way that pages link to other pages on the Web. By analyzing how pages link to each other, an engine can determine both what a page is about (if the keywords of the linked pages are similar to the keywords on the original page) and whether that page is considered "important" and deserving of a boost in ranking. Just as the technology is becoming increasingly sophisticated at ignoring keyword stuffing, it is also becoming more savvy to webmasters who build artificial links into their sites in order to inflate their ranking.
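A simplified link-analysis score in the spirit of PageRank illustrates how incoming links can confer "importance". The tiny link graph, damping factor, and iteration count below are illustrative assumptions, not any engine's production settings.

def link_rank(links, damping=0.85, iterations=20):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, outgoing in links.items():
            if not outgoing:
                continue
            share = damping * rank[page] / len(outgoing)
            for target in outgoing:
                if target in new_rank:
                    new_rank[target] += share
        rank = new_rank
    return rank

# A made-up three-page site: both "about" and "news" link to "home",
# so "home" accumulates the most link-based importance.
graph = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["home"],
}
print(link_rank(graph))  # "home" ends up with the highest score

Because the score is driven by who links to a page, a site that only links to itself earns no outside "votes", which is why artificial link-building is targeted by the countermeasures mentioned above.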
Did You Know...
The first tool for searching the Internet, created in 1990, was called "Archie". It downloaded directory listings of all files located on public anonymous FTP servers, creating a searchable database of filenames. A year later "Gopher" was created, which indexed plain text documents. "Veronica" and "Jughead" came along to search Gopher's index systems. The first actual Web search engine, developed by Matthew Gray in 1993, was called "Wandex".
Key Terms To Understanding Web Search Engines
spider trap
A condition of dynamic Web sites in which a search engine’s spider becomes trapped in an endless loop of code.
search engine
A program that searches documents for specified keywords and returns a list of the documents where the keywords were found.
meta tag
A special HTML tag that provides information about a Web page.
deep link
A hyperlink either on a Web page or in the results of a search engine query to a page on a Web site other than the site’s home page.