Search Engine Terms Glossary
Adjacency
A property of the relationship between words in a search engine
(or directory) query. Search engines often allow users to specify
that words should be next to one another or somewhere near one
another in the web pages searched.
Agent Name Delivery
The process of sending search engine spiders to a tailored page, yet
directing your visitors to what you want them to see. This is
done using server side includes (or other dynamic content techniques).
SSI, for example, can be used to deliver different content to
the client depending on the value of HTTP_USER_AGENT. Most normal
browser software packages have a user agent string which starts
with "Mozilla" (coined from Mosaic and Godzilla). Most
search engine spiders have specific agent names, such as "Gulliver",
"Infoseek sidewinder", "Lycos spider" and
"Scooter".
By switching on the value of HTTP_USER_AGENT (a process known as
agent detection), different pages can be presented at the same
URL, so that normal visitors will never see the page submitted
to search engines (and vice versa).
In practice this is somewhat simplistic. Some search engines pretend
to be "plain mozilla" browsers to prevent use of agent
name delivery. Effective use of agent name delivery can be very
difficult, and may not even work.
How do you spot agent name delivery at work? This is quite difficult,
as the owners of web pages using agent name delivery can control
what you see! You may be able to guess that a page is using this
technique if it appears to be indexed incorrectly or the title
or description don't match the page you see, but this could also
have been achieved by switching pages after the relevant search
engine has indexed it. If you really want to see the search engines'
tailored version of a page, write a program (e.g. a Perl script)
to retrieve the URL with HTTP_USER_AGENT set to each of the strings
used by the search engine spiders. If agent name delivery is in
use, one or more of the retrieved pages will be different to the
others!
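The check described above can be sketched in Python rather than Perl. This is a minimal sketch, not a definitive tool: the agent names are the short names quoted in this entry (real spiders may send longer strings), and BROWSER_AGENT, fetch and find_tailored_agents are hypothetical names chosen for illustration.

```python
import hashlib
import urllib.request

# Agent names taken from the glossary entry; real spiders may send longer strings.
SPIDER_AGENTS = ["Gulliver", "Infoseek sidewinder", "Lycos spider", "Scooter"]
BROWSER_AGENT = "Mozilla/4.0"  # assumed stand-in for a normal browser


def fetch(url, agent):
    """Retrieve url while sending the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def find_tailored_agents(url, fetcher=fetch):
    """Return the spider agent names whose page differs from the browser's page."""
    baseline = hashlib.sha256(fetcher(url, BROWSER_AGENT)).hexdigest()
    return [
        agent
        for agent in SPIDER_AGENTS
        if hashlib.sha256(fetcher(url, agent)).hexdigest() != baseline
    ]
```

A non-empty result only suggests agent name delivery. Pages with dynamic content may differ between any two requests, so a difference should be confirmed by inspecting the retrieved pages.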
See also hidden text and IP delivery.
Altavista
A popular search engine with the largest database on the web, indexing
more than 140 million pages. Its main URL is http://www.altavista.com.
Until 1998, this search engine provided the search facility for
Yahoo. Altavista indexes all the words in a web page, and new
pages are normally added to the database fairly quickly, within
a couple of working days. You are asked to submit just the main
page of your site. The Altavista spider will then explore your
site and index a representative sample of the pages. Some problems
with spamming have been noticed. The use of keyword meta tags
is penalised. Altavista places various alternative options before
its search results, including suggested questions (using the Ask
Jeeves service) and RealNames. Paid entries are beginning to appear
at the start of the search results.
AOL Netfind
The default search engine for users of the AOL internet service provider,
and hence a busy site. Its URL is http://www.netfind.com.
It is essentially the same engine as Excite.
Applet
A small program, often written in Java, which usually runs in a
web browser, as part of a web page. It is possible that the use
of such a program may cause spiders and robots to stop indexing
a page.
ArchitextSpider
The name of the Excite search engine's spider.
Ask Jeeves
A meta search engine which can be asked questions in English. This
service is also in use at Altavista. http://www.askjeeves.com.
Bait-and-Switch
The provision of one page for a search engine or directory and a different
page for other user agents at the same URL. Various methods can
be used, e.g. Agent Name Delivery or IP Delivery.
Bridge Page
See Gateway Page.
CGI
Common Gateway Interface - a standard interface between web server software
and other programs running on the same machine.
CGI Program
Strictly, any program which handles its input and output data according
to the CGI standard. In practice, CGI programs are used to handle
forms and database queries on web pages, and to produce non-static
web page content.
Channels, Channel listings
Lists of links to selected (and usually popular) web sites. The links
are maintained by search engines and directories and are sorted
into categories or channels. Sites are picked by a channel editor,
often because of a site's already high ranking with the search
engines. Some search engines and directories allow visitors to
nominate sites for inclusion in their channels.
Client
A computer, program or process which makes requests for information
from another computer, program or process. Web browsers are client
programs. Search engine spiders are (or can be said to behave
as) clients.
Click through
The process of clicking on a link in a search engine output page to
visit an indexed site.
This is an important link in the process of receiving visitors to a
site via search engines. Good ranking may be useless if visitors
do not click on the link which leads to the indexed site. The
secret here is to provide a good descriptive title and an accurate
and interesting description.
Cloaking
The hiding of page content. Normally carried out to stop page thieves
stealing optimized pages.
Clustering
The listing of only one page from each web site in a search engine
or directory's list of search results. This avoids occupation
of all the top results by a small number of web sites and makes
the list of results clearer and more useful to the user.
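As a rough illustration of clustering, the sketch below keeps only the highest ranked URL from each site in an already ranked list. It assumes that "web site" means the host name in the URL, which is a simplification; cluster_results is a hypothetical name, not any search engine's actual routine.

```python
from urllib.parse import urlsplit


def cluster_results(ranked_urls):
    """Keep only the first (highest ranked) URL from each host."""
    seen_hosts = set()
    clustered = []
    for url in ranked_urls:
        # Assumption: one host name corresponds to one web site.
        host = (urlsplit(url).hostname or "").lower()
        if host not in seen_hosts:
            seen_hosts.add(host)
            clustered.append(url)
    return clustered
```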
Comment
The HTML <!-- and --> tags are used to hide text from browsers.
Some search engines ignore text between these symbols but others
index such text as if the comment tags were not there. Comments
are often used to hide javascript code from non-compliant browsers,
and sometimes (notably on Excite) to provide invisible keywords
to some search engines.
Crawler
See Spider.
Dead Link
An internet link which doesn't lead to a page or site, probably because
the server is down or the page has moved or no longer exists.
Most search engines have techniques for removing such pages from
their listings automatically, but as the internet continues to
increase in size, it becomes more and more difficult for a search
engine to check all the pages in the index regularly. Reporting
of dead links helps to keep the indexes clean and accurate, and
this can usually be done by submitting the dead link to the search
engine.
De-listing
The removal of pages from a search engine's index.
Removal can occur for various reasons, including unreliability
of the machine that hosts a site or because of perceived attempts
at spamdexing.
Description
Descriptive text associated with a web page and displayed, usually with the
page title and URL, when the page appears in a list of pages generated
by a search engine or directory as a result of a query. Some search
engines take this description from the DESCRIPTION Meta tag -
others generate their own from the text in the page. Directories
often use text provided at registration.
Direct Hit
A system which monitors the search engine users' selections from
search engine results, counting which results are clicked on most,
and how long visitors spend at that site, so as to improve relevancy.
Used by HotBot and as a plug-in to Apple's new innovative Sherlock
search system. See www.directhit.com.
Directory
A server or a collection of servers dedicated to indexing internet
web pages and returning lists of pages which match particular
queries. Directories (also known as Indexes) are normally compiled
manually, by user submission (such as at whatsnew.com), and often
involve an editorial selection and/or categorization process (such
as at LookSmart and Yahoo).
Dogpile
A meta search engine. Found at http://www.dogpile.com.
Domain
A sub-set of internet addresses. Domains are hierarchical, and lower-level
domains often refer to particular web sites within a top-level
domain. The most significant part of the address comes at the
end - typical top-level domains are .com, .edu, .gov, .org (which
sub-divide addresses into areas of use). There are also various
geographic top-level domains (e.g. .ar, .ca, .fr, .ro etc.) referring
to particular countries.
The relevance to search engine terminology is that web sites which
have their own domain name (e.g. http://www.nativetongues.com)
will often achieve better positioning than web sites which exist
as a sub-directory of another organisation's domain (e.g. http://ourworld.compuserve.com/homepages/tijana/).
Doorway Page
See Gateway Page.
Dynamic content
Information on web pages which changes or is changed automatically, e.g. based
on database content or user information. Sometimes it's possible
to spot that this technique is being used, e.g. if the URL ends
with .asp, .cfm, .cgi or .shtml. It is possible to serve dynamic
content using standard (normally static) .htm or .html type pages,
though. Search engines will currently index dynamic content in
a similar fashion to static content, although they will not usually
index URLs which contain the ? character.
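The two URL hints mentioned in this entry can be expressed as simple checks. This is only a heuristic sketch: the extension list is the one given above, looks_dynamic and has_query_string are hypothetical names, and a page served as .htm or .html may still be dynamic.

```python
from urllib.parse import urlsplit

# File extensions named in the entry as hints of dynamic content.
DYNAMIC_EXTENSIONS = (".asp", ".cfm", ".cgi", ".shtml")


def looks_dynamic(url):
    """True if the path of the URL ends with one of the listed extensions."""
    return urlsplit(url).path.lower().endswith(DYNAMIC_EXTENSIONS)


def has_query_string(url):
    """True if the URL contains the ? character, which spiders often skip."""
    return "?" in url
```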
Entry Page
See Gateway Page.
Euroseek
A search engine which concentrates on information relating to
Europe. The URL is http://www.euroseek.com.
Excite
Regarded as one of the best search engines, with an index of 55
million pages. It can be slow to index new sites. The URL is http://www.excite.com.
Sites using frames must have a NOFRAMES section in order to be
listed. Some spamming has been noticed. Excite previously ignored
the DESCRIPTION meta tag, but is now using this in its listings
(although the contents do not affect relevancy, which is based
mainly on the title and body text). The use of gateway pages and
hidden text is allowed. Excite has an audio/video search facility
which is a branded component of RealNetworks' RealPlayer G2.
Fake Copy Listings
Sometimes a malicious company will steal a web page or
the entire contents of a web site, re-publish at a different URL
and register with one or more search engines. This can cause a
loss of traffic from the original site if the search engines position
the copy higher in the listings. If you find that someone has
stolen your site in this way, write to the company concerned and
ask them to remove the stolen content. Also contact the hosting
service used by the company, any company that benefits from the
theft and any search engine(s) concerned. If the thieves refuse
to remove the material or ignore you, obtain legal advice. It
is also well worth having printed evidence to support your claim
that your copy of the material was there first, and that you have
the copyright! See also Mirror Sites.
False Drop
A web page retrieved from a search engine or directory which is
not relevant to the query used. This could be for one of the following
reasons:
The web page contained the keywords entered, but used in the wrong
context, with a different meaning or with a different inter-relationship
to that expected.
The web page is an attempt at spamdexing.
The search engine has a fault in its database or a bug in its query
program.
Flash Page
See Splash Page.
Font and Background Spoofs
Various techniques used to place invisible
text in a web page, to improve positioning without affecting the
appearance of the page. These are mostly based on setting the
font and background colours to the same value (e.g. white). Most
search engines now detect these tricks.
Frames
An HTML technique for combining two or more separate HTML documents
within a single web browser screen. Compound interacting documents
can be created to make a more effective web page presented in
multiple windows or sub-windows. A framed web site often causes
great problems for search engines, and may not be indexed correctly.
Search engines will often index only the part of a framed site
within the <NOFRAMES> section, so make sure that the <NOFRAMES>
section includes relevant text which can be indexed by the spiders.
If your site uses frames, consider providing a gateway page or
adding navigational links within the framed pages. Submit the
main page (the one containing the <FRAMESET> tag) to the
search engines. If you use a gateway page, submit this separately.
Gateway Page
A web page submitted to a search engine (spyder) to give the relevance-algorithm
of that particular spyder the data it needs, in the format that
it needs it, in order to place a site at the proper level of relevance
for the topic(s) in question. (This determination of topical relevance
is called "placement".) A gateway page may present information
to the spyder, but obscure it from a casual human viewer. The
gateway page exists so as to allow a web-site to present one face
to the spyder, and another to human viewers. There are several
reasons why one might want to do this. One is that the author
may not want to publicly disclose placement tactics. Another is
that the format that is easiest for a given spyder to understand
may not be the format that the author wishes to present to
viewers for aesthetic reasons. Still another may be that the format that
is best for one spyder may differ from that which is best for
another. By using gateway pages, you can present your site to
each spyder in the way which is known or thought to be best for
that particular spyder.
Also known as bridge pages, doorway pages, entry pages, portals or portal
pages. An example gateway page:
http://www.isquare.com/gateway.htm
Go.com
A portal partnership between Infoseek and Disney, with search capabilities
based on the Infoseek index, at http://go.com/.
GoTo
A search engine, powered by Inktomi, which only returns one URL
per domain in its search results. Operates a "pay per click"
scheme where websites can pay to increase their relevancy. The
URL is http://www.goto.com.
Gulliver
The name of the Northern Light Search Engine's spider.
Heading
Many search engines give extra weight and importance to the text found
inside HTML heading sections. It is generally considered good
advice to use headings when designing web pages and to place keywords
inside headings.
Hidden Text
Text on a web page which is visible to search engine spiders but not
visible to human visitors. This is sometimes because the text
has been set the same colour as the background, because multiple
TITLE tags have been used or because the text is an HTML comment.
Hidden text is often used for spamdexing. Many search engines
can now detect the use of hidden text, and often remove offending
pages from their database or lower such pages' positioning. Text
can also be hidden using agent name delivery or IP delivery either
to present different text to different search engine spiders or
to hide the real HTML source from competitors. The Stealth META
Tag CGI Script probably uses this technique and is available at
http://www.OutRank.com/stealth.shtml. Another software product
which hides HTML source is called Psyral Phobia and is available
at http://www.merlesworld.com/software.htm.
Hit
In the context of visitors to web pages, a hit (or site hit) is a
single access request made to the server for either a text file
or a graphic. If, for example, a web page contains ten buttons
constructed from separate images, a single visit from someone
using a web browser with graphics switched on (a "page view")
will involve eleven hits on the server. (Often the accesses will
not get as far as your server because the page will have been
cached by a local internet service provider.) In the context of
a search engine query, a hit is a measure of the number of web
pages matching a query returned by a search engine or directory.
Hotbot
One of the largest search engines, indexing 110 million pages. It is powered
by Inktomi. New submissions appear to be taking two weeks or longer
to appear. The URL is http://www.hotbot.com.
HTML
HyperText Markup Language - the (main) language used to write web pages.
HTTP
HyperText Transfer Protocol - the (main) protocol used to communicate between
web servers and web browsers (clients).
Image Map
A set of hyperlinks attached to areas of an image. This may be defined
within a web page, or as an external file. If the image map is
defined as an external file, search engines may have problems
indexing your other pages, unless you duplicate the links as conventional
text hyperlinks. If the image map is included within the web page,
the search engines should have no problem following the links,
although it's good practice to provide text links too, to aid
the visually impaired and those accessing the web with graphics
switched off or using text only browsers.
Inbound Link
A hypertext link to a particular page from elsewhere, bringing
traffic to that page. Inbound links are counted to produce a measure
of the page popularity. Searches for the inbound links to a page
can be made on Altavista, Infoseek and Hotbot.
Index
See Directory. Also refers to the database of web pages maintained
by a search engine or directory.
Infind
A meta search engine. Found at
http://www.infind.com.
Infoseek
One of the largest search engines. New sites are normally added
very quickly, within one or two business days. The URL is
http://www.infoseek.com. Infoseek is one of the few search
engines to treat singular and plural forms as the same word. Very
sensitive to page popularity in its positioning algorithm.
Inktomi
The database used by some of the largest search engines, including
Hotbot. Inktomi is also used by Yahoo when no matches are found
in Yahoo's own database.
IP Delivery
Similar to agent name delivery, this technique presents different content
depending on the IP address of the client. It is very difficult
to view pages hidden using this technique, because the real page
is only visible if your IP address is the same as (for example)
a search engine's spider.
Java
A computer programming language whose programs can run on a number
of different types of computer and/or operating system. Used extensively
to produce applets for web pages.
Javascript
A simple interpreted computer language used for small programming
tasks within HTML web pages. The scripts are normally interpreted
(or run) on the client computer by the web browser. Some search
engines have been known to index these scripts, presumably erroneously.
Keyword
A word which forms (part of) a search engine query.
Keyword Density
A property of the text in a web page which indicates how close together
the keywords appear. Some search engines use this property for
Positioning. Analysers are available which allow comparisons between
pages. Pages can then be produced with similar keyword densities
to those found in high ranking pages.
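This entry does not give a formula for keyword density. One common simplification, assumed here and not taken from any particular analyser, is the share of words on the page that match the keyword. The function name keyword_density and the word-splitting rule are illustrative choices.

```python
import re


def keyword_density(text, keyword):
    """Fraction of the words in text that equal keyword (case-insensitive)."""
    # Assumption: a word is a run of letters, digits or apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)
```

Comparing this figure between your own page and a high ranking page gives a crude version of the comparison the analysers perform.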
Keyword Domain Name
The use of keywords as part of the URL to a website. Positioning
is improved on some search engines when keywords are reinforced
in the URL.
Keyword Phrase
A phrase which forms (part of) a search engine query.
Keyword Purchasing
The buying of search keywords from search engines, usually to
control banner ad placement. All the major search engines (except
EuroSeek and GoTo) insist that keyword purchasing is only used
for banner ad placement, and doesn't influence search results.
The display of banner ads for bought keywords can be studied
using a service called Bannerstake from Thomson and Thomson at
http://www.namestake.com, which returns the banner ads displayed
when particular queries are used.
Keyword Stuffing
The repeating of keywords and keyword phrases in META tags or elsewhere.
Link Popularity
See page popularity.
Log File
A file maintained on a server in which details of all file
accesses are stored. Analysing log files can be a powerful way
to find out about a web site's visitors, where they come from
and which queries are used to access a site. Various software packages
are available to analyse log files, and some are listed below.
Sane Solutions provide NetTracker, which is good at analysing
queries from log files. A free program called WebLog is available
at http://www.awsd.com. See also the reviews at http://www.bellacoola.com/html/sample_reports.htm.
LookSmart
A medium-sized directory. The URL is http://www.looksmart.com.
Lycos
One of the largest search engines, Lycos appears to be moving towards
becoming a directory and is using the Open Directory for some
search results. It can be slow to index new sites. The Lycos spider
ignores meta tags in pages. Lycos can be found at http://www.lycos.com.
Metacrawler
A meta search engine found at http://www.metacrawler.com. Results from various search engines
are summarised in an easy to read form.
Metafind
A meta search engine found at http://www.metafind.com.
Meta Search
A search of searches. A query is submitted to more than one search
engine or directory, and results are reported from all the engines,
possibly after removal of duplicates and sorting. Also the meta
search engine of the same name, found at http://www.metasearch.com.
Meta Search Engine
A server which passes queries on to many search engines and/or directories
and then summarises all the results. Ask Jeeves, Dogpile, Infind,
Metacrawler, Metafind and Metasearch are examples of meta search
engines.
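The summarising step of a meta search can be sketched as a merge of several ranked result lists with duplicates removed. The interleaving order used here is an assumption made for illustration; this glossary does not say how any particular meta search engine sorts its results, and merge_results is a hypothetical name.

```python
def merge_results(result_lists):
    """Interleave ranked URL lists from several engines, dropping duplicates."""
    merged = []
    seen = set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        # Take the rank-th result from each engine in turn.
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged
```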
Meta tag
A construct placed in the HTML header of a web page, providing information
which is not visible to browsers. The most common meta tags (and
those most relevant to search engines) are KEYWORDS and DESCRIPTION.
The KEYWORDS tag allows the author to emphasise the importance of
certain words and phrases used within the page. Some search engines
will respond to this information - others will ignore it. Don't
use quotes around the keywords or keyphrases.
The DESCRIPTION tag allows the author to control the text of the summary
displayed when the page appears in the results of a search. Again,
some search engines will ignore this information.
The HTTP-EQUIV meta tag is used to issue HTTP commands, and is frequently
used with the REFRESH tag to refresh page content after a given
number of seconds. Gateway pages sometimes use this technique
to force browsers to a different page or site. Most search engines
are wise to this, and will index the final page and/or reduce
the ranking. Infoseek has a strong policy against this technique,
and they might penalize your site, or even ban it.
Other common meta tags are GENERATOR (usually advertising the
software used to generate the page) and AUTHOR (used to credit
the author of the page, and often containing e-mail address, homepage
URL and other information).
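To see which meta tags a page carries, the header can be read with Python's standard html.parser module. This is a minimal sketch: MetaTagParser and read_meta_tags are hypothetical names, and it shows only what a page declares, not how any search engine weighs it.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect the content of <meta name=...> and <meta http-equiv=...> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("name") or attributes.get("http-equiv")
        if key:
            self.meta[key.lower()] = attributes.get("content") or ""


def read_meta_tags(html):
    """Return a dict mapping lower-cased meta tag names to their content."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta
```

If a page repeats a tag (see Multiple Keyword Tags), this sketch keeps only the last occurrence.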
Mining Company
A large directory spread over many different URLs. The main URL is
http://www.miningco.com.
Mirror Sites
Multiple copies of web sites or web pages, often on different servers.
The process of registering these multiple copies with search engines
is often treated as spamdexing, because it artificially increases
the relevancy of the pages. Filters such as the Infoseek Sniffer
now remove multiple mirrors from the indexes.
Misspellings
People quite often spell words incorrectly when using search engines.
Pages which use common misspellings will quite often receive extra
hits, so it is a useful technique to include common misspellings
of words in alt tags, keywords, page names and titles. A similar
effect occurs when spaces are missed out and words are accidentally
joined together.
MultiCrawl
A parallel search engine which offers users their own branded versions.
http://www.multicrawl.com.
Multiple Domain Names
The use of several extra domains to provide gateway pages or gateway
sites to the main site.
Multiple Keyword Tags
The use of more than one Keywords META tag in order to try to increase
the relevancy of the best keywords on a page. This is not recommended.
It may be detected as a spamming technique, or all but one of
the tags may simply be ignored.
Multiple Titles
It used to be possible to repeat the HTML title tag in the header
section of a page several times to improve search engine positioning.
Most search engines now detect this trick.
Netfind
See AOL Netfind.
NewHoo
See the Open Directory Project.
Northern Light
A search engine with an additional "pay to access" special
collection of business, health and consumer publication articles.
The first search engine to ban meta search engines from its database.
The URL is http://www.northernlight.com.
Open Directory Project
A directory project run by thousands of volunteer editors. In principle,
this is a very exciting and powerful way to organise the web.
In practice, there have been some problems with the behaviour
of some of the editors, which has caused some initial difficulty
for the organisers. Initially known as NewHoo, the project is
now part of Netscape (and therefore of AOL). See http://directory.mozilla.org.
Open Text
A large business-only directory. The URL is http://www.opentext.com.
Optimization
Changes made to a web page to improve the positioning of that page with
one or more search engines. A means of helping potential customers
or visitors to find a web site. Optimization may involve design/layout
changes, new text for the title tags, meta tags, alt attributes,
headings, and changes to the first 200-250 words of the main text.
A large image map at the top of a page should be moved further
down the page. Frames should be avoided (unless navigational links
are also provided within the frames).
Page Popularity
A measure of the number and quality of links to a particular page
(inbound links). Many search engines (and most noticeably Infoseek)
are increasingly using this number as part of the positioning
process. The number and quality of inbound links is becoming as
important as the optimisation of page content. A free service
to measure page popularity can be found at http://www.linkpopularity.com.
Page View
Used in site statistics as a measure of pages viewed rather than server
hits. Many server hits may be made to access a single page, causing
many separate log file entries. Analysis software can determine
that these server hits were generated when a visitor viewed a
single page, and group them together to provide this more useful
method of counting visitors. See also Hit and Unique Visitor.
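The difference between hits and page views can be shown with a very rough count over the requested paths from a log file. This sketch assumes that a request counts as a page view when its path ends in one of a few page-like suffixes; real analysis software uses more careful rules, and PAGE_SUFFIXES and count_hits_and_page_views are hypothetical names.

```python
# Assumed suffixes that mark a request as a page rather than an embedded file.
PAGE_SUFFIXES = (".htm", ".html", ".shtml", "/")


def count_hits_and_page_views(requested_paths):
    """Return (hits, page_views) for a list of requested paths."""
    hits = len(requested_paths)
    page_views = sum(
        1 for path in requested_paths if path.lower().endswith(PAGE_SUFFIXES)
    )
    return hits, page_views
```

For the example given under Hit, a page with ten button images, one visit yields eleven hits but a single page view.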
Placement
See Positioning.
Politeness Window
In order not to overburden any particular server, most search engine
spiders limit their access to each server. If your page is hosted
on the same server as thousands of other pages, the spider may
never get the time to reach (and index) your page. This can be
a powerful argument for having your own server.
Portal
See Gateway page. Can also mean Portal Site.
Portal Page
See Gateway page.
Portal Site
A generic term for any site which provides an entry point to the
internet for a significant number of users.
Examples are search engines, directories, built-in default browser or service
provider homepages, sites hardwired to browser buttons, sites
offering free homepages, e-mail or personalised news and any popular
(or heavily advertised) sites that significant numbers of people
may bookmark or set as default pages.
Positioning
The process of ordering web sites or web pages by a search engine
or a directory so that the most relevant sites appear first in
the search results for a particular query. Software such as PositionAgent,
Rank This and Webposition can be used to determine how a URL is
positioned for a particular search engine when using a particular
search phrase. The GoHip Search site allows you to see positioning
information from many of the big search engines, displayed all
on one page.
Positioning Technique
A method of modifying a web page so that search engines (or a particular
search engine) treat the page as more relevant to a particular
query (or a set of queries).
Query
A word, a phrase or a group of words, possibly combined with other
syntax used to pass instructions to a search engine or a directory
in order to locate web pages. For details of which queries are
being used, visit the GoTo.com Search Inventory page. To "spy"
on queries as they're entered, look at the Metaspy page. A summary
of what people actually search for can be found at http://www.synergy-marketing.com/search.html.
A free program called Word Market will collect search terms from
the search engines, and is available at http://www.softwaresolutions.net/free.htm.
The Canadian Email Business Network provides a Meta Tags/Keywords
Search Engine at http://www.cebn.com/metatags.htm which allows
searches through thousands of recent search engine queries.
Ranking
See Positioning.
RealNames
An alternate website address system in operation at Altavista. Brand
names used in searches are mapped directly to the appropriate
website, usually because the company owning the brand-name has
paid a fee to RealNames. http://www.realnames.com
Referrer
The URL of the web page from which a visitor came. The server's referrer
log file will indicate this. If a visitor came directly from a
search engine listing, the query used to find the page will usually
be encoded in the referrer URL, making it easy to see which keywords
are bringing visitors. The referrer information can also be accessed
as document.referrer within JavaScript or via the HTTP_REFERER
environment variable (accessible from scripting languages).
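Extracting the search phrase from a referrer URL can be sketched with the standard urllib.parse module. The parameter names in QUERY_PARAMETERS are assumptions for illustration only, since each search engine chooses its own, and search_terms_from_referrer is a hypothetical name.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed query parameter names; each search engine chooses its own.
QUERY_PARAMETERS = ("q", "query", "p", "qt", "search")


def search_terms_from_referrer(referrer_url):
    """Return the search phrase encoded in a referrer URL, or None."""
    parameters = parse_qs(urlsplit(referrer_url).query)
    for name in QUERY_PARAMETERS:
        if name in parameters:
            return parameters[name][0]
    return None
```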
Refresh Tag
See the paragraph about HTTP-EQUIV under Meta Tag.
Registration
The process of informing a search engine or directory that a new web
page or web site should be indexed.
Relevancy Algorithm
The method a search engine or directory uses to match the keywords
in a query with the content of each web page, so that the web
pages found can be ordered suitably in the query results. Each
search engine or directory is likely to use a different algorithm,
and to change or improve its algorithm from time to time.
Re-submission
Repeating the search engine registration process one or more times for the
same page or site. Under certain circumstances, this is regarded
with suspicion by the search engines, as it could indicate that
someone is experimenting with spamming techniques.
The Infoseek and Altavista search engines are particularly vulnerable
to spamming because they list sites very quickly, and are thus
easy to experiment with. Both engines de-list sites for repeated
re-submission and Infoseek, for example, does not allow more than
one submission of the same page in a 24 hour period. Occasional
re-submission of changed pages is not normally a problem.
Robot
Any browser program which follows hypertext links and accesses web
pages but is not directly under human control. Examples are the
search engine spiders, the "harvesting" programs which
extract e-mail addresses and other data from web pages and various
intelligent web searching programs. A database of web robots is
maintained by Webcrawler.
robots.txt
A text file stored in the top level directory of a web site to deny
access by robots to certain pages or sub-directories of the site.
Only robots which comply with the Robots Exclusion Standard will
read and obey the commands in this file. Robots will read this
file on each visit, so that pages or areas of sites can be made
public or private at any time by changing the content of robots.txt
before re-submitting to the search engines. The simple example
below attempts to prevent all robots from visiting the /secret
directory:
User-agent: *
Disallow: /secret
For more information, please refer to the Altavista robots.txt
page.
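The effect of the two-line example can be checked with Python's standard urllib.robotparser module, which behaves like a compliant robot. This is a small sketch: the paths tested and the helper name allowed are illustrative, and "Scooter" is used only because it is one of the spider names listed in this glossary.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from this entry.
ROBOTS_TXT = """\
User-agent: *
Disallow: /secret
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="Scooter"):
    """True if a compliant robot with this agent name may fetch the path."""
    return parser.can_fetch(agent, path)
```

Here allowed("/secret/plans.html") is False and allowed("/index.html") is True. Disallow rules match by prefix, so a path such as /secretary.html would be blocked as well.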
Scooter
The name of the Altavista search engine's spider.
Search Engine
A server or a collection of servers dedicated to indexing internet
web pages, storing the results and returning lists of pages which
match particular queries. The indexes are normally generated using
spiders. Some of the major search engines are Altavista, Excite,
Hotbot, Infoseek, Lycos, Northern Light and Webcrawler. Note that
Yahoo is a directory, not a search engine. The term Search Engine
is also often used to describe both directories and search engines.
Searchking
A smaller search engine which allows visitors to vote on the relevance
of the pages returned by their queries, thus ranking sites based
on the opinions of searchers. Unlike some of the major search
engines, there is good customer support. http://www.searchking.com.
Search
Term
See Query.
Server
A
computer, program or process which responds to requests for information
from a client. On the internet, all web pages are held on servers.
This includes those parts of the search engines and directories
which are accessible from the internet.
Sidewinder
The
name of the Infoseek search engine's spider.
Siphoning
The
use of various means to steal another site's traffic. Techniques
used include the wholesale copying of web pages (with the copied
page altered slightly to direct visitors to a different site,
and then registered with the search engines) and the use of keywords
or keyword phrases "belonging" to other organisations,
companies or web sites.
Site
Hit
See hit.
Skewing
Artificially
changing search engine results so that, for example, popular queries
will return artificially created listings. Infoseek is currently
experimenting with this technique, using a small group of reviewers
to artificially force higher relevance for certain sites.
Slurp
The
name of the spider used by Inktomi.
Snap!
A
large directory. The URL is http://www.snap.com.
Sniffer
The
name of the filter program used by the Infoseek search engine
to prevent spamdexing. It detects multiple mirror pages, font
and background spoofs, multiple title tags, keyword stuffing and
possibly other types of spamdexing.
Spamdexing
The
alteration or creation of a document with intent to deceive an
electronic catalog or filing system. Any technique that increases
the potential position of a site at the expense of the quality
of the search engine's database can also be regarded as spamdexing
- also known as spamming or spoofing.
Spamming
See
spamdexing. Spamming is also used more generally to refer to the
sending of unsolicited bulk electronic mail, and the search engine
use is derived from this term.
Spider,
Spyder
That
part of a search engine which surfs the web, storing the URLs
and indexing the keywords and text of each page it finds. Please
refer to the Search Engine Watch SpiderSpotting Chart for details
of individual spiders. See also Robot.
Spidering
The
process of surfing the web, storing URLs and indexing keywords,
links and text.
Typically,
even the largest search engines cannot spider all of the pages
on the net. This is due to the huge amount of data available,
the speed at which the new data appears, the use of politeness
windows and practical limits on the number of pages that can be
visited in a given time. The search engines have to make compromises
in order to visit as many sites as possible, and they do this
in different ways. For example, some only index the home pages
of each site, some only visit sites they're explicitly told about,
and some make judgements about the importance of sites (from number
and quality of inbound links) before "digging deeper"
into the subpages of a site.
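The compromise between coverage and a fixed page budget can be sketched with a breadth-first walk over a small link graph. Everything here is an assumption made for illustration: the LINKS table stands in for the web, the example hosts are invented, and no real spider is claimed to work this way.

```python
from collections import deque

# Hypothetical link graph standing in for the web: URL -> URLs it links to.
LINKS = {
    "http://a.example/": ["http://a.example/one", "http://b.example/"],
    "http://a.example/one": ["http://a.example/two"],
    "http://a.example/two": [],
    "http://b.example/": ["http://a.example/"],
}

def spider(start: str, links: dict[str, list[str]], max_pages: int) -> list[str]:
    """Visit pages breadth-first from `start`, stopping after `max_pages`."""
    visited: list[str] = []
    seen = {start}
    queue = deque([start])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)  # a real spider would fetch and index the page here
        for target in links.get(url, []):
            if target not in seen:  # never queue the same URL twice
                seen.add(target)
                queue.append(target)
    return visited

print(spider("http://a.example/", LINKS, max_pages=3))
```

With a budget of three pages the deepest page, http://a.example/two, is never reached, which mirrors the way a limited spider leaves subpages unindexed.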
Splash
page
Similar
to a gateway page but provides an initial display which must be
viewed before a visitor reaches the main page. This usually acts
as a kind of "opening title" sequence, and can be extremely
annoying.
Spoofing
See spamdexing.
SSI
Server
Side Includes. Used (for example) to add dynamically generated
content to a web page.
Stealth
Script
A
CGI script which switches page content depending on who or what
is accessing the page. See agent name delivery.
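The switching logic of such a script can be sketched in a few lines. The spider names are the ones listed in this glossary; the page file names, the choose_page function and the case-insensitive substring match are assumptions made for illustration only.

```python
# Spider names taken from this glossary. Real spiders may send longer
# strings, so a case-insensitive substring match is assumed here.
SPIDER_NAMES = ("scooter", "sidewinder", "slurp", "gulliver")

def choose_page(user_agent: str) -> str:
    """Pick which (hypothetical) page file to serve for a user agent string."""
    agent = user_agent.lower()
    if any(name in agent for name in SPIDER_NAMES):
        return "spider.html"   # the page tailored for search engine spiders
    return "visitor.html"      # the page ordinary visitors see

print(choose_page("Infoseek sidewinder"))    # spider.html
print(choose_page("Mozilla/4.0 (example)"))  # visitor.html
```

A spider that identifies itself as a plain Mozilla browser falls through to the visitor page, which is one reason the technique is unreliable in practice.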
Stemming
A
function of some search engines and directories which allows results
to be returned from some or all keywords based on the same stem
as the keyword entered as a search term. For example, when stemming
is switched on, a search for the word dance will return matches
for any word whose stem is danc-, matching the keywords dance,
dancer and dancing.
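A naive suffix-stripping stemmer is enough to reproduce the dance example. This is only a sketch: the suffix list and the three-letter minimum are assumptions chosen to fit this one example, and real search engines use far more careful rules.

```python
# Suffixes to strip, longest first. Chosen only to fit the example above.
SUFFIXES = ("ing", "er", "e")

def stem(word: str) -> str:
    """Strip one known suffix, keeping at least three letters of the word."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def matches(query: str, keywords: list[str]) -> list[str]:
    """Return the keywords that share a stem with the query word."""
    target = stem(query)
    return [k for k in keywords if stem(k) == target]

print(stem("dancing"))  # danc
print(matches("dance", ["dance", "dancer", "dancing", "dentist"]))
```

All three dance words reduce to danc and so match each other, while dentist is left alone.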
Stop
Word
A
word which is ignored in a query because the word is so commonly
used that it makes no contribution to relevancy. Examples are
common net words such as computer and web, and general words like
get, I, me, the and you.
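Stop word removal amounts to filtering the query against a fixed list. The sketch below uses only the example words from this entry; each search engine keeps its own list, and the strip_stop_words function is a name invented for illustration.

```python
# Stop words taken from the examples in this entry (lower-cased).
STOP_WORDS = {"computer", "web", "get", "i", "me", "the", "you"}

def strip_stop_words(query: str) -> list[str]:
    """Return the query words that remain after stop words are ignored."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]

print(strip_stop_words("get me the cheapest web hosting"))
# ['cheapest', 'hosting']
```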
Submission
Service
Any
agent which submits your site to many search engines and directories.
Useful to get listed with many of the minor search engines, but
don't rely on such services to get listed with the major search
engines. Many of these services are automatic and run from web
sites. Others run offline. Some are free. Beware of supplying
your email address to the so-called FFA (free for all) services
- you may receive lots of spam.
Title
The
text contained between the start and end HTML tags of the same
name. This text is associated with (but not displayed in) the
web page containing these tags, and is displayed in a special
position (usually at the top of the window) by the web browser. Title
text is important because it normally forms the link to the page
from the search engine listings, and because the search engines
pay special attention to the title text when indexing the page.
Don't confuse this text with heading text within the web page,
which often looks like the title. Heading text is usually rendered
either using the HTML heading tags or simply in a large
font size.
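The difference between title text and heading text can be seen by pulling both out of a page. This sketch uses the HTMLParser class from Python's standard library; the sample page and the TitleAndHeadings class are invented for illustration, and only the h1 to h3 heading tags are examined.

```python
from html.parser import HTMLParser

HEADING_TAGS = ("h1", "h2", "h3")

class TitleAndHeadings(HTMLParser):
    """Collect the title text and the heading text of a page separately."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.headings: list[str] = []
        self._current = None  # tag whose text is currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "title" or tag in HEADING_TAGS:
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current is not None:
            self.headings.append(data.strip())

def extract(html: str) -> tuple[str, list[str]]:
    parser = TitleAndHeadings()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.headings

# A made-up page whose heading resembles, but is not, its title.
PAGE = ("<html><head><title>Acme Widgets</title></head>"
        "<body><h1>Welcome to Acme</h1><p>Hello.</p></body></html>")
print(extract(PAGE))
```

Here the title is "Acme Widgets", while "Welcome to Acme" is only a heading displayed inside the page.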
Traffic
The
visitors to a web page or web site. Also refers to the number
of visitors, hits, accesses etc. over a given period.
Unique
Visitor
A
real visitor to a web site. Web servers record the IP address
of each visitor, and this is used to determine the number of real
people who have visited a web site. If, for example, someone visits
twenty pages within a web site, the server will count only one
unique visitor (because the page accesses are all associated with
the same IP address) but twenty page accesses. See also hit and
page view.
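The twenty-page example can be worked through in code. The log format below, a list of (IP address, page) pairs, is an assumption made for illustration; real server logs carry more fields, and the addresses shown are reserved documentation addresses.

```python
def count_visits(log: list[tuple[str, str]]) -> tuple[int, int]:
    """Return (unique visitors, page accesses) for a list of (ip, page) pairs."""
    page_accesses = len(log)
    unique_visitors = len({ip for ip, _page in log})  # one per distinct IP
    return unique_visitors, page_accesses

# One visitor views twenty pages; a second visitor views one page.
log = [("192.0.2.1", f"/page{n}.html") for n in range(20)]
log.append(("192.0.2.2", "/index.html"))

print(count_visits(log))  # (2, 21)
```

Counting by IP address is only an approximation: several people behind one shared address count as a single visitor, and one person whose address changes counts more than once.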
URL
Uniform
Resource Locator. An address which can specify any internet resource
uniquely. The beginning of the address indicates the type of resource
- e.g. http: for web pages, ftp: for file transfers, telnet: for
computer login sessions or mailto: for e-mail addresses.
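Reading the resource type from the beginning of an address can be sketched with urlsplit from Python's standard library. The RESOURCE_TYPES table simply restates the examples in this entry, and the sample addresses are placeholders.

```python
from urllib.parse import urlsplit

# Resource types as described in this entry, keyed by URL scheme.
RESOURCE_TYPES = {
    "http": "web page",
    "ftp": "file transfer",
    "telnet": "computer login session",
    "mailto": "e-mail address",
}

def describe(url: str) -> str:
    """Name the type of resource a URL refers to, judged by its scheme."""
    scheme = urlsplit(url).scheme
    return RESOURCE_TYPES.get(scheme, "unknown")

print(describe("http://www.yahoo.com"))        # web page
print(describe("mailto:someone@example.com"))  # e-mail address
```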
URL
Submission
See Registration.
Virtual
Domain
A
domain hosted by a virtual server account.
Virtual
Server
An
account on a hosting company server, usually linked to its own
domain. This provides an inexpensive way to run a web site with
its own top level domain, and is usually indistinguishable from
having a separate physical server, except that the virtual server
may share an IP address with other virtual servers on the same
machine. A virtual server account is fine for most uses, but will
often be slower to respond than a physically separate server,
and physical access to the machine will seldom be allowed. The
cost of a virtual server account is a small fraction of that needed
to run a real server, mainly because of the expense of the dedicated
line needed to connect the server continuously to the rest of
the net.
Voila
A
search engine from France Telecom with interfaces in several
different languages and a mission to become one of the leading
international engines. Their (international) English interface
at http://www.voila.com/ is produced in collaboration with Reuters,
Infospace and Looksmart. Their original French language interface
is at http://www.voila.fr/
Web
Copywriting
The
writing of text especially for a web page. Although similar to the
writing of copy for any other type of publication, good web copywriting
can also have a great effect on search engine positioning, so it forms
a major part of optimization.
Webcrawler
One
of the largest search engines. The URL is http://www.webcrawler.com.
XML
Extensible
Markup Language. A new language which promises more efficient
data delivery over the web. XML does nothing itself - it must
be implemented using 'parser' software or XSL.
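To show what parser software does with XML, the sketch below reads a small document with the ElementTree parser from Python's standard library. The glossary and term element names are invented for this illustration; XML itself does not define them.

```python
import xml.etree.ElementTree as ET

# A made-up XML document. The tag and attribute names are our own choice.
DOCUMENT = """\
<glossary>
  <term name="Slurp">The name of the spider used by Inktomi.</term>
  <term name="Scooter">The name of the Altavista search engine's spider.</term>
</glossary>
"""

def term_names(xml_text: str) -> list[str]:
    """Parse the document and return the name of every term element."""
    root = ET.fromstring(xml_text)
    return [term.get("name") for term in root.findall("term")]

print(term_names(DOCUMENT))  # ['Slurp', 'Scooter']
```

The document only describes data. Deciding what a term element means, or how to display it, is left entirely to the program that reads it.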
XSL
Extensible
Stylesheet Language - an XML style sheet language supported by
the newer web browsers Internet Explorer 5 and Netscape 5.
Yahoo
Similar
to a search engine, but with a database generated by hand, this
is the world's most used directory of web sites. The main URL
is http://www.yahoo.com. It is notoriously difficult to get listed
in Yahoo and, once listed, even more difficult to get your listing
changed or to get out! To increase the odds of getting listed,
try the following:
Select
the three categories you want to be listed in very carefully.
Consider the regional categories. Ensure that the categories match
the content of your site.
Apply
to one of their local subsidiaries for your own country or city.
Make
sure that your site is well-designed and easy to navigate.
Ensure
your site has no dead links.
Ensure
that your pages download quickly.
Provide
good contact information on your site.
If
you manage to get listed, keep the e-mail they send you. You can
e-mail the same person subsequently to get your listing changed.