|
Search
IBIMA Publishing library
Open
Accesss to full-text
Articles
|
powered by

|
|
Communications of the IBIMA
Volume 2010
(2010),
Article ID 126850,
Communications of the IBIMA, 7 pages DOI: 10.5171/2010.126850
A Reflection of Search Engine Strategies
Anushia Inthiran1, Saadat M. Alhashmi1, and Pervaiz K. Ahmed2
1School of Information Technology, Monash University Sunway Campus, Malaysia 2School of Business, Monash University Sunway Campus, Malaysia
Copyright
© 2010 Anushia Inthiran, Saadat M. Alhashmi, and Pervaiz K.
Ahmed. This is an open access article distributed under the
Creative Commons Attribution License unported 3.0, which permits
unrestricted use, distribution, and reproduction in any medium,
provided that original work is properly cited.
Abstract
Cost
efficiency has been a dominant perspective in the traditional IT
literature. However, in complex technology and business environment,
the widely recognized cost efficient assumption of information
technology has been increasingly challenged. Drawing from a case study
of wireless network implementation situated in a politically sensitive
workplace, this paper provided practice insights for IT managers in
today’s networked economy. More specifically, stories experienced in
the case study illustrated that despite well-calculated cost efficiency
of wireless network infrastructure, the radical implementation process
in the case organization encountered enormous challenges and opposition
due to the fact that administrators failed to consider various
stakeholders’ positions and interests. Eventually, the implementation
objectives and outcome were considerably undermined. Implications from
this empirical case research reemphasized the significance of
understanding political forces situated in any business environment
where different stakeholders hold conflicting interests. Lessons
learned from the case story further encouraged IT managers and policy
makers to better strategize emerging information technology in general
and wireless networks in particular as the whole global society and
business environment are increasingly facing an emerging wireless
world.
Keywords: re-ranking, search history, query expansion, personalization
Introduction Search
engines provide an interface for users to enter the query. Most search
engines like Google, Yahoo and AltaVista provide a free form text box
that allows for the user to key in the query. Google, now provide users
with options, for example, the user can now request for the search to
be done on local pages or for pages from the World Wide Web. In
addition, search is limited to certain parameters, for example search
for video or images. Search engines recognise that users prefer
to have options while searching. A single interface is not longer
suitable to fit the multitude of users on the Internet. Fu (2007)
states the search engine is becoming increasing important as a tool to
acquire information and enable self directed learning. In this aspect,
Teevan et al.,(2005) states the search engine is now not just seen as a
tool to acquire information from the World Wide Web but also to extend
the process of searching for information into a lifelong learning
process. In this day and age, the search engine plays an important role
in supporting this process. This is further accentuated by the fact
that Xu (2007) states computers become consumer products; and the
Internet a mass medium, searching the web had become a daily activity
for everyone from children to scientist. The search engine now has to
cater to various people from different background and age group.
The
search engines task is to present results to the user based on the
query provided. There are generally three methods of how the search
engine function, crawler based, directory or human powered and hybrid.
In crawler based search engines, listings are created automatically. In
human powered directory, a short description about the web site is
submitted to the directory, in the hybrid technique, both methods are
merged. These are the techniques used that will allow for the search
engine to present results to the user. There are two categories of
users involved when using a search engine; one is the webmaster and the
other the searcher. For the webmaster, listing in major search engines
is an important place to be, because they can potentially generate more
traffic. Sullivan (2007) state for the searcher, using well known and
dependable search engines generally mean more dependable results. In
this paper the focus in on the aspect of the user.
Micarelli
et al., (2007) states the exponential growth of the Internet, has made
the task of the search engine more difficult, users surfing the web in
search of resources to satisfy their information need have less and
less time and patience to formulate queries, wait for the results and
sift through them. The search engine now has to deal a variety of
information needs, user behaviour and eccentricities. Besides
just looking at time as a factor, other issues like not being able to
discern and individual’s goal add to the frustration of the searcher.
Lv et al., (2006) states some searchers also indicate that results are
being ordered by popularity rather than relevance to the user’s
individual need. If search results deviate from the user’s information
goal then the searcher will definitely not be satisfied with the search
results and have to spend more time ‘searching further’ in the results
page. Additionally, Beaza-Yates et al., (2004) state users also
complain that, although the user interface of search engines is user
friendly, it is not always easy for searchers to express or formulate
their queries. Since the query is the only input that the search
engine is provided with in order to present relevant results, the
search engine has to now maximise the usage of the query for it to be
beneficial to the user.
Three
popular techniques used by search engines to provide relevant results
to users are query expansion, search history and re-ranking. Generally
these techniques used past and present activities/user actions or logs
to predict information seeking goals. Once the information seeking goal
is obtained, search results are tailored to fit the goal. In the
following sections, we discuss how search engines utilise these
techniques to provide users with relevant results. Query expansion,
search history and re-ranking will be examined in Section 2.0 3.0 and
4.0 respectively. In Section 5.0 an analysis and conclusion will be
provided.
Query Expansion Query
expansion is a technique that uses the original query provided by the
user. Expansion is performed by the search engine by viewing the user’s
history log or by using algorithms. The rational for expanding the
query is related to being able to provide users with more results. From
the search engine perspective, Google reports that the average Google
query is 3 in the year 2007. With just 3 words, it is not possible for
the search engine to discern the searchers information need. Query
expansion is used to provide more words. The assumption made is that
more words results in more information for the search engine to use to
provide searchers with relevant results.
Intelligent
Client Side Web Search Agent (UCAIR) developed by Shen et al., (2005)
is a client side browser plug-in that acts as a proxy for web search
engines. One of the methods used in UCAIR to provide searchers with
relevant results is query expansion. In order to perform query
expansion, the users’ actions such as viewing a document and clicking
on the ‘Back’ or ‘Next’ button is used as indicators. With these
responses, the search engine will decide if previous queries are
related to the current query and if so, expand the current query with
useful terms from the previous query.
This
technique requires clear demarcation of session boundary. In the event
the adjacent queries are not related to each other, the search engine
will need to be able to differentiate the two. If this is not done then
the query expansion will be filled with unrelated terms that will cause
the search results to deviate from the users information need.
Currently in UCAIR, textual similarity is used to perform boundary
detection. Textual similarity is performed by looking at similarity in
terms of concepts.
Teevan
et al., (2005) states that web log analysis is also used to perform
query expansion. If query expansion is limited to statistical
information rather than textual similarity, this limits the expansion
technique. This is because related queries do not necessarily share the
same words, for example, when a user keys in the query, ‘java island’
and if the adjacent query was ‘travel Indonesia’, and if similar words
is used, then although the two queries are related, the query expansion
mechanism will not be able to relate the two queries. This will lead to
poor results list. However if textual similarity is used then, the
search engine will be able to determine that both the queries are
related.
In
general, when query expansion is performed, the search engine will need
to determine the number of words to select for expansion. Selecting too
few words or too many words will affect the expansion process. The
focus on the number of words required for query expansion is an area
which has not been dealt with much in literature. Popular research area
for query expansion is related to where to obtain information to
perform the expansion.
Xu
et al., (2007) states query expansion is also explored using an
algorithm used to perform correlation of terms. An overlay algorithm is
used to perform expansion on individual terms. To date, this is the
first research technique that performs expansion on individual words of
the query and not the entire query string. However, with more query
terms natural language issues will complicate the expansion process.
Some form of distinction must be made between the selection of query
terms to expand, If two unrelated query terms is expanded this
will lead to relevant results being missed out.
Current
query expansion techniques look at textual similarity, number of
keywords and whether the entire query is expanded or individual terms
expanded in totality. There is also the need to consider if expansion
is performed semantically. Currently, conceptual and contextual
similarities is used, however since semantic expansion is rarely
researched, we are unable to conclude as to which of the three
techniques will perform better. In the next section search history will be examined. Search History Usage
of search history requires a tool to collect information from the user,
in work by Speretta and Gauch (2005), Google Wrapper is used.
User browsing history is seen as a source of information that can be
used to narrow down the users’ interest and search goal.
However,
merely collecting the search history without organisation and
categorization is not beneficial to the overall results presentation. A
method is needed to decide what needs to be collected and where to
place this collection. Speretta and Gauch (2005) state, semantics and
ontology can be used to organise the search history automatically. In
work done by Kim and Chan (2003) browsing history is organised into
clusters to create user interest hierarch. Web pages collected is
organised into clusters, when a user visits a page it is taken as an
indication of interest. This technique of measuring interest is
error nous. Visiting a page is also an indication of disinterest.
Verification method is necessary to quantify if a page is of interest
to the user. An extension to this idea is implemented by Chan (1999)
where a metric is developed to determine user level interest, for
example, the percentage of links visited on a page or URL’s present in
bookmarks.
The
implementation of a monitoring tool on the user’s machine to perform
logging is a cause for concern. Concerns over privacy protection is
growing in parallel with the demand for services. Gauch et al., (2007)
state that these two trends, privacy and obtaining better search
results seem to be in direct opposition to each other, so privacy
protection must be a crucial component of every personalized system.
Similarly,
if a monitoring tool is implemented on the users’ desktop, space
necessary to store logs is another cause for concern. Speretta and
Gauch (2005) state maintenance is required on the logs to determine
which ones is relevant to the users’ query. Differentiation must be
made between short term and long term search context. Speretta and
Gauch (2005) state implicit feedback information collected over a long
period of time is unlikely to be very useful, but the immediate search
context and feedback information is expected to be much more useful.
Unfortunately, current search history technique is not able to
differentiate between short term and long term interest.
In
a home personal computer, there is a possibility that there can be more
than one user to the computer. Speretta and Gauch (2005) states
that each user will need to register their email address to obtain a
cookie to store and upload their user id on their local machine. This
solves the problem of multiple users on a machine, however, this task
is painful for the user to perform, especially if the cookie is lost.
Users do not want to manually perform these responsibilities in order
to obtain relevant results. They require these activities to be
performed automatically. Organisation of web logs collected requires
further research to determine an effective method of usage. The technique of re-ranking is examined in the next section.
Re-ranking
Re-ranking
is a technique where the original order of corpus of information is
changed to suite an individual user. The change in rank is necessary as
generic rank provides generic search results. This eliminates the user
from browsing through irrelevant search results. Irrelevant
information browsing only adds to the cognitive burden and increases
the amount of time taken to find information.
In
work by Shen et al., (2005), re-ranking is performed based on two cases
when the user clicks on the ‘Back’ button and ‘Next’ on the browser.
Any seen and unseen results is re-ranked so that the user will see
improved search results immediately. However, this model does not take
into account exceptions when the user interaction is a result of an
error. Time spent on a link, before clicking on the ‘Back’ or ‘Next’
button is a good indication of interest to use before re-ranking is
automatically performed. However Shen et al., (2005) states monitoring
time, is an overhead. Sometimes this technique is not completely
reliable. It only brings up a small number of highest re-ranked results
to be followed by any originally high ranked results.
Re-ranking
based on historical clicks is discussed in Dou et al., (2007) , however
this technique fails for a completely new query since there is no data
to manipulate for re-ranking to take place. The issue of ‘cold start’
was first coined in Zigoris and Zhang (2006). Any new user to the
system must endure poor initial performance until sufficient feedback
from that user is provided.
In
re-ranking, the issue of using validity metrics is important to
determine the user’s interest when clicking on the ‘Back’ or ‘Next’
button. Bearing in mind that it is not possible to re-rank all results
the objective of re-ranking is that users find relevant pages to their
information need at the top of the results list. Users are biased to
results that are listed at the top of the results list. A rational
searcher might be expected to asses each of these page summaries
against their information need and click on the one that appears as the
most relevant. Keane (2008) state people may not search in such a way.
The issue with cold start when using historical clicks to perform
re-ranking persists whenever a new query is input. However, this is
only an intermittent issue. In
the next section, we will look at the overall analysis of these
techniques and a conclusion for future direction in the area of
personalized search results.
Analysis and Conclusion
In
query expansion, the issue to expand individual terms or to
collectively expand terms remains an unsolved issue. Although more
query words result with more information, the search engine needs to
avoid issues with the natural language. Demarcation of session boundary
is necessary for query expansion. If two adjacent queries is not
related and the search engine expands the current query with terms of
the previous query this results with users having to browse irrelevant
pages before actually finding related information to the users’
information seeking goal. Using textual similarity contextually
and conceptually have been researched. However, keyword matching makes
natural language issues more apparent when compared to conceptual and
contextual similarity.
In
user search history, the main issue is the location of implementation.
Privacy issues, organisation and storing of the logs require the need
to organise and effectively use the logs. Logs provide abundance of
information but at the same time, an effective technique is needed to
reap the full benefit of these logs to provide users with what they
want.
Re-ranking
bring about issues with cold start and validity of measurements used.
The objective of re-ranking is to place relevant results at the top of
the search, hence provide users with their information seeking goal
quicker. However, it is impossible to perform re-ranking for the entire
result page.
Gulli
and Signorini (2005) states to a certain extent the review of
techniques used provides some solution for users but there is still
room for improvement. In an attempt to tailor accurate and appropriate
results to users in the result page, a search engine is required to
crawl over a massive and ever-growing collection of online content.
Searching remains a popular activity on the Internet, in a survey
conducted by Pew Internet (2008) , 89% of Americans used a search
engine to find information on the Internet. However, according to
Marchionini (2006), due to the sheer volume, the techniques mentioned
above, search engines only continue to perform ‘lookup search”, users
still need to browse from page to page to locate the information that
they need].
The
probable solution to this problem that is contributed by volume,
natural language issues, limited query words, difficulty in expressing
information need in the query and obtaining the users information goals
is personalization. This is a method according to Xu et al., (2007)
tries to adapt to individual needs and to move away from the one size
fits all.
Speretta
and Gauch (2005) state personalization is the process of presenting the
right information to the right user at the right moment. Let us take an
example of why it is important to provide users with the best possible
results. In a survey done by Search Engine Watch (2008), ‘redbox’
appeared in the movies category as a top ten search term in December
2008. However in Singapore, ‘redbox’ is an online florist, in Malaysia
a karaoke outlet. In these scenarios, personalization based on the
‘right user’ and ‘right moment’ is not easy to achieve. Moreover, the
‘right moment’ and ‘right user’ can change from time to time.
When
personalization techniques is employed instead of only focusing on
technical details of query expansion, user logs and re-ranking the
bigger picture of the user’s information need and what the user needs
‘right now’ is achieved. Personalization is also the solution to ‘do
what I mean and not what I say’. References
Baeza-Yates,
R. Hurtado, C. & Mendoza, M. (2004). Query Recommendation Using
Query logs in Search Engines, Current Trends in Database Technology,
EDBT 2004 Workshop, 3268, pp. 588-596.
Publisher - Google Scholar
Chan.
P.K.,( 1999) A non-invasive learning approach to building web user
profiles, KDD-99, Workshop on web usage analysis and user profiling,
pp. 7-12.
Publisher - Google Scholar Dou,
Z., Song, R., & Wen, J.R. (2007), A large scale evaluation and
analysis of personalized search strategies, Proceedings of the World
Wide Web Conference, pp.581-590.
Publisher - Google Scholar
Fu,
X. (2007). Evaluating sources of implicit feedback in web searches, ACM
Conference On Recommender Systems Proceedings of the 2007 ACM
conference on Recommender Systems, pp. 191-194.
Publisher - Google Scholar
Gauch,
S., Speretta, M., Chandramouli,A. & Micarelli,A. (2007). User
Profiles For Personalized Information Access, Lecture Notes in Computer
Science, The Adaptive Web, Computer Science, pp 54-89.
Publisher - Google Scholar
Gulli,
A. and Signorini, A. (2005), The indexable Web is more than 11.5
billion pages. In proceedings of the 14th International Conference in
the World Wide Web, 2005, USA, pp.902 – 903.
Publisher - Google Scholar
Huang
X., Peng F., An, A., & Schuurmans, D. (2004). Dynamic web log
session identification with statistical language models. Journal of the
American Society for Information Science and Technology,
55(4):1290-1303.
Publisher - Google Scholar - British
Library Direct
Keane,M.T.,
O’Brien,M. & Smyth, B. (2008). Are People Biased in Their Use of
Search Engines?, Communications of the ACM, 51(2), pp. 49 -52.
Publisher - Google Scholar - British
Library Direct
Kim
H.R. & Chan P.K. (2008). Learning implicit user interest hierarchy
for context in personalization, Applied Intelligence,Volume 28, Number 2, .
Publisher - Google Scholar - British
Library Direct
Lv,
Y., Sun,L., Zhang, J., Nie, J.Y., Chen, W. & Zhang, W. (2006). An
Iterative Implicit Feedback Approach to Personalised Search, 21st
International Conference on Computational Linguistics, pp.585-593.
Publisher - Google Scholar - British
Library Direct
Marchionini G. (2006). Exploratory Search: From Finding to understanding, Communications of the ACM, Vol. 49 No. 4.
Publisher - Google Scholar - British
Library Direct
Micarelli,A.,
Gauch, S., Speretta, M., & Chandramouli, A. (2007). User Profiles
For Personalized Information Access, The Adaptive Web, Lecture Notes in
Computer Science 4321, pp 54-89.
Publisher - Google Scholar - British
Library Direct
Shen,
X., Tan, B. & Zhai,C. (2005). Implicit User Modeling for
Personalized Search, Proceedings of the 14th International
Conference on Information and Knowledge Management, pp. 824-831.
Publisher - Google Scholar
Speretta,
M. & Gauch,S. (2005). Personalizing Search Based on User Search
Histories, IEEE/WIC/ACM International Conference on Web Intelligence,
pp.622-628
Publisher - Google Scholar
Sullivan
D. (2007). (online) Search Engine Watch, Major Search Engines and
Directories from Sullivan, D. (2007), from
http://searchenginewatch.com/showPage.html?page = 2156221 [Accessed
30th June 2008].Teevan,J.,
Dumais,S.T. & Horvitz, E. (2005). Personalizing Search via
Automated Analysis of Interests and Activities, Proceedings of the 29th
Annual International ACM Special Interest Group on Information
Retrieval, pp. 449-456.
Publisher - Google Scholar
Top
10 Search Terms in 10 Categories, December 2008, Search Engine Watch,
from http://searchenginewatch.com/3632401.print (13th February 2009) .Xu,J.,
Zhu,Z., Ren,X., Tian, Y. & Luo,Y. (2007). Personalized Web Search
Using User Profile, Proceedings of the 2007 International Conference on
Computational Intelligence and Security, pp. 222-226.
Publisher - Google Scholar
Zigoris,
P. & Zhang,Y.(2006). Bayesian Adaptive User Profiling with Explicit
and Implicit Feedback, Proceedings of the 2006 ACM CIKM International
Conference on Information and Knowledge Management, pp. 397-404.
Publisher - Google Scholar
|

ISSN:1943-7765
Article Access
|
|