Search Engines
Deb Kumar Bandyopadhyay
Objective:
To study the use of search engines in a web-based environment.
Structure:
1. Search Engine
1.1. Types of Search Engines
1.2. Parts of Search Engines
1.3. Features of Search Engines
1.4. Advanced search engines and applications
1. Search Engine:
Search engines are computer programs that search for particular
keywords entered by users and return a list of documents in which
those keywords were found. The term is used especially for services
that search content on the web.
1.1. Types of Search Engines:
Search engines can be broadly categorised into four types:
Crawler-based search engines are useful when we have a specific
search keyword in mind. If the search topic is a general one, this
type of search engine may return many irrelevant documents in
response to a search request, e.g. AltaVista.
Human-powered directories are good when the search topic is a general
one. The human-crafted directories guide us, help to narrow the
search, and return refined results, e.g. DMOZ.
Hybrid search engines use a combination of crawler-based results and
directory results, e.g. Google.
Meta-search engines save time by gathering results from different
search engines in a single interface. They are excellent when we wish
to know whether anything about a particular topic is available on the
web, e.g. Dogpile.
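The way a meta-search engine gathers results from several engines can be sketched in a few lines of Python. This is only an illustration: the two engine functions and the URLs they return are hypothetical stand-ins, and the round-robin merge is one simple assumed strategy, not the method used by Dogpile or any other real service.

```python
from itertools import zip_longest

# Hypothetical engines: each returns a ranked list of URLs for a query.
def engine_a(query):
    return ["u1.example", "u2.example", "u3.example"]

def engine_b(query):
    return ["u2.example", "u4.example"]

def meta_search(query, engines):
    """Interleave the ranked lists of all engines, dropping duplicates."""
    merged, seen = [], set()
    result_lists = [engine(query) for engine in engines]
    # zip_longest walks rank 1 of every engine, then rank 2, and so on.
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

print(meta_search("search engines", [engine_a, engine_b]))
```

A real meta-search engine would also have to reconcile the different ranking schemes of its sources; the sketch ignores that.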
1.2. The Three Parts of a Search Engine:
• Spider, crawler or robot
• Index, catalog or database
• Search engine software
The first part of a search engine is called the spider. The spider
(sometimes called a crawler or robot) is a program that moves around
the World Wide Web visiting websites.
• It reads the web pages.
• The spider returns from time to time and checks for changes. The
pages that it finds are placed into the catalog.
The second part of a search engine is called the index, catalog, or
database.
• This index contains a copy of each page that was collected by the
spider.
• A spidered page must be indexed to become a search result.
The third part of a search engine is the search engine software. When
a user submits keywords to a search engine, the search engine
software sifts through all the indexed pages to find matching
keywords, then returns the results (hits) to the user.
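The three parts described above can be sketched as a toy program. Everything here is an assumption made for illustration: the "web" is a small in-memory dictionary with made-up addresses, and the index is simplified to a word-to-pages map rather than a full copy of each page.

```python
import re
from collections import defaultdict

# Hypothetical in-memory "web": address -> (page text, outgoing links).
WEB = {
    "a.example": ("search engines index the web", ["b.example"]),
    "b.example": ("a spider visits web pages", ["a.example", "c.example"]),
    "c.example": ("the index stores every page", []),
}

def crawl(start):
    """Spider: follow links from `start` and return the pages it reads."""
    seen, queue, pages = set(), [start], {}
    while queue:
        url = queue.pop(0)
        if url in seen or url not in WEB:
            continue
        seen.add(url)
        text, links = WEB[url]
        pages[url] = text
        queue.extend(links)
    return pages

def build_index(pages):
    """Index: map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Search software: return the pages containing every query keyword."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

index = build_index(crawl("a.example"))
print(search(index, "web"))
```

Note that a page the spider never reaches never enters the index, so it can never appear as a hit, which mirrors the point that a spidered page must be indexed to become a search result.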
1.3. Features of Search Engines:
Basic text search facilities such as Boolean search, proximity
search, phrase search, truncation, field-specific search and limiting
search are provided by almost all search engines.
Boolean Search:
George Boole devised a system of symbolic logic in which he used
three operators, viz. +, * and -, to combine statements in symbolic
form. The three operators of Boolean logic are the logical sum (+),
logical product (*), and logical difference (-).
Logical product or AND logic allows the searcher to specify the
coincidence of two or more concepts. For example, in order to ask for
information on “computers and information retrieval” the user may
formulate the search statement as
(COMPUTERS) AND (INFORMATION RETRIEVAL)
Logical sum or OR logic allows the searcher to specify alternatives
among search terms (or concepts). For example, with the query
statement
(COMPUTERS) OR (INFORMATION RETRIEVAL)
the searcher indicates that items on either of these two topics, or
both, will serve the purpose.
Logical difference or NOT logic provides facilities to exclude items
from a set. For example, with the search statement
(INFORMATION RETRIEVAL) AND NOT (DBMS)
the user narrows the subject, in this case specifying that he or she
does not require information on DBMS.
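The three Boolean operators correspond directly to set operations on the documents that match each term. The following minimal sketch uses a made-up collection of four documents; the document numbers and their index terms are assumptions chosen only to make the three example statements above return different results.

```python
# Hypothetical collection: document id -> set of index terms.
DOCS = {
    1: {"computers", "information retrieval"},
    2: {"computers", "dbms"},
    3: {"information retrieval", "dbms"},
    4: {"information retrieval"},
}

def having(term):
    """Return the ids of the documents indexed under `term`."""
    return {doc_id for doc_id, terms in DOCS.items() if term in terms}

computers = having("computers")
ir = having("information retrieval")
dbms = having("dbms")

and_hits = computers & ir   # (COMPUTERS) AND (INFORMATION RETRIEVAL)
or_hits = computers | ir    # (COMPUTERS) OR (INFORMATION RETRIEVAL)
not_hits = ir - dbms        # (INFORMATION RETRIEVAL) AND NOT (DBMS)

print(sorted(and_hits), sorted(or_hits), sorted(not_hits))
```

AND is set intersection, OR is set union, and AND NOT is set difference, which is why AND and NOT narrow a result set while OR widens it.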
Truncation and String search:
Truncation supports searching on word stems by placing the truncation
character at either end of a word. For example, if the user asks for
a search on Countr* (right truncation), this would retrieve records
including words such as Country, Countries, Countrywide. Similarly,
*chloride (left truncation) might retrieve records of “chloride” with
various prefixes. Truncation, or masking as it is called in this
context, is sometimes also available in the middle of words. For
example, Na*ional will search for records with National and Nacional.
String searching appears to be similar to truncation search. For
example, Employ??? might select terms with a maximum of three
additional characters.
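Truncation and string search patterns can be sketched by translating them into regular expressions. The conventions below are assumptions for illustration, since they vary between search systems: `*` stands for any run of letters (possibly none) and each `?` stands for at most one additional letter. The helper names and the word list are hypothetical.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a truncation pattern into a compiled regular expression.

    Assumed conventions: '*' matches any run of letters (possibly empty),
    '?' matches at most one letter.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("[a-z]*")
        elif ch == "?":
            parts.append("[a-z]?")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def matches(pattern, words):
    """Return the words that match the whole pattern, in input order."""
    regex = wildcard_to_regex(pattern)
    return [w for w in words if regex.fullmatch(w)]

WORDS = ["Country", "Countries", "Countrywide", "County", "National",
         "Nacional", "Employ", "Employee", "Employer", "Employment"]

print(matches("Countr*", WORDS))    # right truncation
print(matches("Na*ional", WORDS))   # masking in the middle of a word
print(matches("Employ???", WORDS))  # at most three additional characters
```

Under these assumptions, Employment is not selected by Employ??? because it has four characters after the stem.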
Proximity search:
The purpose of proximity searching is to refine a search statement by
permitting the searcher to specify the context in which a term must
occur. There are various kinds of proximity operators. These can
require that:
• two words appear next to each other; “Information Retrieval”,
Information and retrieval, Information (N) Retrieval, depending on
the search system.
• two words appear within the same field, sentence or paragraph; for
example,
Browser SAME Microsoft* [same paragraph]
Browser WITH Microsoft* [same sentence]
• two words be within a specified distance of one another; for
example,
Information (W.3) Retrieval
• two words be within a specified distance of one another, with the
maximum number of words to come between the two words set by the
system; for example,
Stage NEXT Lighting
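A distance-based proximity operator can be sketched by comparing word positions. This is a minimal illustration under stated assumptions: the function names are hypothetical, `max_between` is taken to mean the maximum number of words allowed between the two terms, and real systems define their operators (and whether word order matters) differently.

```python
import re

def positions(text, word):
    """Return the positions at which `word` occurs in `text`."""
    tokens = re.findall(r"\w+", text.lower())
    return [i for i, token in enumerate(tokens) if token == word.lower()]

def within(text, first, second, max_between=0, ordered=True):
    """True if at most `max_between` words separate `first` and `second`.

    With ordered=True, `second` must come after `first`.
    """
    for i in positions(text, first):
        for j in positions(text, second):
            gap = j - i if ordered else abs(j - i)
            if 1 <= gap <= max_between + 1:
                return True
    return False

text = "Information storage and retrieval systems"
print(within(text, "information", "retrieval"))                 # adjacent only
print(within(text, "information", "retrieval", max_between=2))  # up to 2 words between
```

With `max_between=0` the check reduces to the "next to each other" case; the same-sentence and same-paragraph operators would additionally need the text split into those units first.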
Range Searching and Limiting:
This type of searching is particularly useful when selecting records
on the basis of numeric or date fields. Common range operators are:
EQ equal to
NE not equal to
GT greater than
NG not greater than
LT less than
NL not less than
W within the limits
OL outside the limits
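The operator codes listed above can be sketched as comparisons applied to a numeric or date field. The records, field name and function names below are hypothetical, and treating the limits of W and OL as inclusive bounds is an assumption, since the source does not say.

```python
import operator

# Single-value operator codes mapped to Python comparisons.
RANGE_OPS = {
    "EQ": operator.eq,
    "NE": operator.ne,
    "GT": operator.gt,
    "NG": operator.le,  # not greater than
    "LT": operator.lt,
    "NL": operator.ge,  # not less than
}

def in_range(value, op, low, high=None):
    """Apply one range operator; W and OL need both limits."""
    if op == "W":    # within the limits (assumed inclusive)
        return low <= value <= high
    if op == "OL":   # outside the limits
        return value < low or value > high
    return RANGE_OPS[op](value, low)

# Hypothetical records with a numeric "year" field.
RECORDS = [
    {"title": "A", "year": 1995},
    {"title": "B", "year": 2003},
    {"title": "C", "year": 2010},
]

def select(records, field, op, low, high=None):
    """Return the titles of the records whose field satisfies the operator."""
    return [r["title"] for r in records if in_range(r[field], op, low, high)]

print(select(RECORDS, "year", "GT", 2000))
print(select(RECORDS, "year", "W", 1995, 2003))
```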
The search interfaces of modern search engines enable users to use
the above features without much effort. Many advanced search
interfaces also provide enough help information on the interface
itself for users to perform the search.
1.4. Advanced search engines and applications:
Present-day search engines are like encyclopaedias operating on the
internet, allowing users to search for and retrieve relevant digital
content. From the user's perspective, the only requirement is to
search for the desired content using an appropriate search engine,
because different search engines are meant for different purposes and
require different skill sets. Advanced search engines satisfy most
user queries by providing advanced search options, thus answering
those queries efficiently.
Some advanced search engines are:
• General Search: If the user's requirement is written information, a
general search engine like Google is an efficient choice. Google's
advanced search options enable users to perform more specific search
queries.
• Reverse Image Search: If the user's requirement is to search for
images, an advanced search engine like TinEye is an efficient choice,
as it can read the image content and thus make it searchable, whereas
a general search engine can look only at file names or user-defined
tags.
• Similar Image Search: Advanced search engines like GazoPa can look
for similar features in an image, such as texture, colour or
structure, but cannot recognize exact copies of a given content.
• Invisible Search: The CompletePlanet advanced search engine searches
for the desired content in data stored in databases that are almost
invisible to general search engines, because general search engines
mainly index resources from websites by following hyperlinks one
after another. This type of hidden web is known as the Deep Web.
• Semantic Search: Semantic search is meant for searching terms in a
meaningful manner, i.e. taking into account their exact meaning,
context and definition. Search engines like Yummly, which are based
on semantic search algorithms of this kind, are efficient in
obtaining relevant results.
Exercise:
Define search engines. Are all search engines alike?
Which search engine can be considered the first search engine for the World Wide Web?
Name the various components of a search engine.
What are spiders? Do all spiders function in the same way?
What are subject gateways? How are they different from meta-search engines?
What is the difference between Boolean search and proximity search?