Saturday 18 February 2017

SEARCH ENGINES



Deb Kumar Bandyopadhyay

Objective: To study the use of Search Engines in web based environment.

Structure:
1. Search Engine.
    1. Types of Search Engines.
    2. Parts of Search Engines.
    3. Features of Search Engines.
    4. Advanced search engines and applications.

1. A search engine is a computer program that searches for the keywords a user enters and returns a list of the documents in which they were found. The term is used especially for services that search content on the web.

1.1.Types of Search Engines:

Search engines can be mainly categorised into four types:

Crawler-based search engines are useful when we have a specific search keyword in mind. If the search topic is a general one, however, this type of search engine may return many irrelevant documents, e.g. AltaVista.

Human-powered directories are good when our search is on a general topic: the human-crafted directories guide us, help to narrow the search and fetch refined responses, e.g. DMOZ.

Hybrid search engines use a combination of both crawler-based results and directory results, e.g. Google.

Meta-search engines save time by gathering results from different search engines at a single interface. They are excellent if we wish to know whether anything at all is available on the web about a particular topic, e.g. Dogpile.
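As a rough illustration of how a meta-search engine might combine results, the sketch below interleaves ranked result lists from two engines and drops duplicate URLs. The function name merge_results, the interleaving strategy and the example URLs are assumptions made for illustration; real meta-search engines such as Dogpile may merge results quite differently.

```python
from itertools import zip_longest

def merge_results(*result_lists):
    """Interleave ranked result lists, keeping the first copy of each URL."""
    merged, seen = [], set()
    # Take the rank-1 hits from every engine, then the rank-2 hits, and so on.
    for rank_group in zip_longest(*result_lists):
        for url in rank_group:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Hypothetical ranked results returned by two different engines.
engine_a = ["x.example", "y.example", "z.example"]
engine_b = ["y.example", "w.example"]

print(merge_results(engine_a, engine_b))
```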

1.2.The Three Parts of a Search Engine:

•Spider, crawler or robot
•Index, catalog or database
•Search engine software

The first part of a search engine is called the spider. The spider (sometimes called a crawler or robot) is a program that moves around the World Wide Web visiting websites.
•It reads the webpages.
•The spider returns from time to time and checks for changes. The pages that it finds are placed into the catalog.

The second part of a search engine is called the index, catalog, or database.
•This index contains a copy of each page that was collected by the spider.
•A spidered page must be indexed to become a search result.

The third part of a search engine is the search engine software. When a user submits keywords to a search engine, the search engine software sifts through all the indexed pages to find matching keywords, then returns the results (hits) to the user.
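The three parts described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the "web" is a hypothetical in-memory dictionary (WEB), and the names crawl, build_index and search are invented for illustration rather than taken from any real search engine.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "a.example": ("search engines index the web", ["b.example", "c.example"]),
    "b.example": ("a spider visits web pages", ["a.example"]),
    "c.example": ("the index stores page copies", []),
}

def crawl(start):
    """Spider: follow links breadth-first and return the pages visited."""
    seen, queue, pages = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        pages[url] = text  # keep a copy of the page for the index
        for link in links:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Index: map each word to the set of URLs whose page contains it."""
    index = {}
    for url, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index

def search(index, keyword):
    """Search software: return the sorted URLs that contain the keyword."""
    return sorted(index.get(keyword.lower(), set()))

pages = crawl("a.example")
index = build_index(pages)
print(search(index, "web"))
```

A page that the spider never reaches is never placed in the index, so it can never appear as a search result.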

1.3.Features of Search Engines:

Basic text search facilities such as Boolean search, proximity search, phrase search, truncation, field-specific search and limiting search are provided by almost all search engines.

Boolean Search:

George Boole devised a system of symbolic logic in which he used three operators, viz. +, * and -, to combine statements in symbolic form. The three operators of Boolean logic are the logical sum (+), logical product (*), and logical difference (-).

Logical product or AND logic allows the searcher to specify the coincidence of two or more concepts. For example, in order to ask for information on “computers and information retrieval” the user may formulate the search statement as

(COMPUTERS) AND (INFORMATION RETRIEVAL)

Logical sum or OR logic allows the searcher to specify alternatives among search terms (or concepts). For example, with the query statement

(COMPUTERS) OR (INFORMATION RETRIEVAL)
the searcher indicates that items on either of these two topics, or both, will serve the purpose.

Logical difference or NOT logic provides facilities to exclude items from a set. For example, with the search statement
(INFORMATION RETRIEVAL) AND NOT (DBMS)
the user narrows the subject, in this case specifying that he or she does not require information on DBMS.
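The three Boolean operators correspond directly to set operations on the documents that contain each term. The sketch below assumes a hypothetical toy index (INDEX) with made-up document IDs; it is an illustration, not how any particular search engine stores its data.

```python
# Hypothetical toy index: term -> set of IDs of documents containing it.
INDEX = {
    "computers": {1, 2, 3},
    "information retrieval": {2, 3, 4},
    "dbms": {3, 5},
}

def docs(term):
    """Return the set of document IDs indexed under a term."""
    return INDEX.get(term.lower(), set())

# (COMPUTERS) AND (INFORMATION RETRIEVAL): logical product, set intersection
both = docs("COMPUTERS") & docs("INFORMATION RETRIEVAL")

# (COMPUTERS) OR (INFORMATION RETRIEVAL): logical sum, set union
either = docs("COMPUTERS") | docs("INFORMATION RETRIEVAL")

# (INFORMATION RETRIEVAL) AND NOT (DBMS): logical difference, set difference
without = docs("INFORMATION RETRIEVAL") - docs("DBMS")

print(sorted(both), sorted(either), sorted(without))
```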

Truncation and String search:

Truncation supports searching on word stems by placing the truncation character at either end of a word. For example, if the user asks for a search on Countr* (right truncation), this would retrieve records including words such as Country, Countries and Countrywide. Similarly, *Chloride (left truncation) might retrieve records of “chloride” with various prefixes. Truncation, or masking as it is called in this context, is sometimes also available in the middle of words. For example, Na*ional will search for records with National and Nacional.

String searching appears to be similar to truncation search. For example, Employ??? might select terms with a maximum of three additional characters.
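One way to experiment with these patterns is to translate them into regular expressions. The sketch below assumes that * stands for any run of characters and that each ? stands for at most one character, which matches the examples in the text; actual search systems define their truncation symbols differently. The helper names to_regex and matches and the word list are made up for illustration.

```python
import re

def to_regex(pattern):
    """Translate a truncation pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any number of characters
        elif ch == "?":
            parts.append(".?")   # zero or one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def matches(pattern, words):
    """Return the words that match the whole pattern."""
    rx = to_regex(pattern)
    return [w for w in words if rx.fullmatch(w)]

# Hypothetical vocabulary taken from an index.
WORDS = ["Country", "Countries", "Countrywide", "County",
         "National", "Nacional", "Chloride", "Hydrochloride",
         "Employ", "Employee", "Employer", "Employment"]

print(matches("Countr*", WORDS))    # right truncation
print(matches("*Chloride", WORDS))  # left truncation
print(matches("Na*ional", WORDS))   # masking in the middle of a word
print(matches("Employ???", WORDS))  # at most three extra characters
```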

Proximity search:

The purpose of proximity searching is to refine the search statement by permitting the searcher to specify the context in which a term must occur. There are various kinds of proximity operators. These can require:

Two words appear next to each other, written for example as “Information Retrieval”, Information and retrieval, or Information (N) Retrieval, depending on the search system.

Two words appear within the same field, sentence or paragraph; for example,
Browser SAME Microsoft* [Same paragraph]
Browser WITH Microsoft* [Same sentence]

Two words be within a specified distance of one another, for example,
Information (W.3) Retrieval

Two words be within a specified distance of one another, with the maximum number of words to come between the two words set by the system. For example,
Stage NEXT Lighting
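A distance-based proximity operator can be sketched by comparing word positions. The function near below is a hypothetical helper: it assumes the text is split on whitespace, ignores word order, and treats max_between as the largest number of words allowed between the two terms (0 means the words are adjacent). Real systems differ on all of these points.

```python
def positions(tokens, word):
    """Return every position at which a word occurs in the token list."""
    return [i for i, t in enumerate(tokens) if t == word]

def near(text, first, second, max_between=0):
    """True if the two words occur with at most max_between words between them."""
    tokens = text.lower().split()
    return any(
        0 < abs(i - j) <= max_between + 1
        for i in positions(tokens, first.lower())
        for j in positions(tokens, second.lower())
    )

adjacent = near("information retrieval systems", "Information", "Retrieval")
too_far = near("information storage and retrieval", "Information", "Retrieval")
within_3 = near("information storage and retrieval", "Information", "Retrieval",
                max_between=3)
print(adjacent, too_far, within_3)
```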

Range Searching and Limiting:

This type of searching is particularly useful when selecting records on the basis of numeric or date fields. Common range operators are:
EQ equal to
NE not equal to
GT greater than
NG not greater than
LT less than
NL not less than
W within the limits
OL outside the limits
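The range operators listed above map naturally onto ordinary comparisons. The sketch below assumes that W and OL take a lower and an upper limit and that the limits themselves count as "within"; the record list and the helper names in_range and select are hypothetical.

```python
import operator

# Single-value range operators and the comparison each one stands for.
COMPARE = {
    "EQ": operator.eq,
    "NE": operator.ne,
    "GT": operator.gt,
    "NG": operator.le,  # not greater than
    "LT": operator.lt,
    "NL": operator.ge,  # not less than
}

def in_range(value, op, low, high=None):
    """Apply a range operator; W and OL need both low and high limits."""
    if op == "W":
        return low <= value <= high
    if op == "OL":
        return value < low or value > high
    return COMPARE[op](value, low)

# Hypothetical records with a numeric (year) field.
RECORDS = [
    {"title": "A", "year": 1995},
    {"title": "B", "year": 2005},
    {"title": "C", "year": 2015},
]

def select(op, low, high=None):
    """Return the titles of records whose year satisfies the operator."""
    return [r["title"] for r in RECORDS if in_range(r["year"], op, low, high)]

print(select("GT", 2000))
print(select("W", 2000, 2010))
```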

The search interfaces of modern search engines enable users to use the above features without much effort. Many advanced search interfaces also provide enough help information on the interface itself for users to perform the search.

1.4.Advanced search engines and applications:

Present-day search engines are like encyclopaedias operating on the internet, allowing users to search and retrieve relevant digital content. From the user's perspective, the only requirement is to search for the desired content using an appropriate search engine, because different search engines are meant for different purposes and require different skill sets to use. Advanced search engines satisfy most user queries by providing advanced search options, thus answering those queries efficiently.

Some advanced search engines are:

For General Search: If the user's requirement is written information, a general search engine like Google is an efficient one. Google, with its advanced search options, enables users to perform more specific search queries.

Reverse Image Search: If the user's requirement is to search for images, then an advanced search engine like TinEye is an efficient one, as it can read the content of an image and thus make it searchable, while a general search engine can look only at file names or user-defined tags.

Similar Image Search: Advanced search engines like GazoPa can look for similar features in an image, such as texture, colour or structure, but cannot recognize exact copies of a given content.

Invisible Search: The CompletePlanet advanced search engine searches for the desired content in data stored in databases that are almost invisible to general search engines, because general search engines mainly index resources from websites by following hyperlinks one after another. This type of hidden web is known as the Deep Web.

Semantic Search: Semantic search is meant for searching terms in a meaningful manner, i.e. terms with exact meaning, context and definition. Search engines like Yummly, based on such semantic search algorithms, are efficient in obtaining relevant results.




Exercise:

  1. Define Search Engines. Are all Search Engines alike?
  2. Which Search Engine can be considered as the first Search Engine for the World Wide Web?
  3. Name the various components of a Search Engine.
  4. What are Spiders? Do all the spiders function in the same way?
  5. What are Subject gateways? How are they different from Meta search Engines?
  6. What is the difference between Boolean Search and Proximity Search?



