Sunday 10 April: Half-day Tutorial (David Lewis)
Sunday 10 April: Half-day Tutorial (Stephen E. Arnold)
Conference hotel booking reference: GRINF1
This page last changed 29 March 2005
Boston, Massachusetts, April 11-12, 2005
Program
This annual meeting provides a forum and
point-of-reference for all those interested in the intricacies of
Search and Retrieval. The meeting draws those with a professional
interest in search engines -- such as search engine designers and
developers -- and those interested in applying search engines in their
own professional environments. Search is at the heart of information
retrieval, and the Search Engine Meeting provides an annual point of
reference for what is happening in this fast-moving and exciting
field.
Presentations
are given sequentially; there are no parallel sessions or parallel
presentations at this meeting.
Presentations below are not necessarily in final speaking order.
Monday April 11
09.00 - CONFERENCE OPENING
Jan Pedersen
Yahoo!, California, USA
Internet Search Engines: Past and Future
The 2005 Meeting begins
with a short history of Internet Search Engines, from early first-generation
systems to the current crop of stock market darlings. Many
of the underlying technology problems remain the same, but the business
has become significantly more sophisticated and high-powered. One of
the pioneers of the industry looks at some of the economics driving the
remarkable success of these services and makes predictions about future
trends.
SESSION ONE: USERS & USABILITY
Carol Tenopir
University of Tennessee, School of Information Sciences, USA
What We Know About User Behavior (And What We Are Not So Sure Of)
From hundreds of recent
research studies conducted over the last ten years, we now know quite a
bit about how people interact with online information systems. We know,
for example, that information seeking and use by subject experts varies
by subject discipline and work place, that most users prefer
convenience and speed, and that users’ mental models of
information systems are being shaped by major internet search engines.
Some unanswered questions remain, however, and much of system design is
still based on majority behaviors rather than on the purpose and value of
use for individuals.
Jochen Leidner
Linguit, Bad Bergzabern, Federal Republic of Germany
The Deployment of a Mobile Question Answering System
Two approaches exist to
search automatically for information on the web from the desktop:
keyword-based search engines, and language-aware question answering
systems. In a mobile environment, specific challenges must be faced,
including, but not limited to, space constraints regarding input/output
devices. It is argued that an approach combining the two search
paradigms can meet these challenges. Using an SMS (Short Message
Service) based case study, we describe how these factors are addressed
in the Nuggets answer retrieval system for text-enabled mobile phones,
contrast the types of questions people ask with those asked in a desktop
search scenario, and conclude by pointing out evaluation challenges.
Johannes C. Scholtes
ZyLAB North America, Virginia, USA
Usability versus Precision & Recall
While the Google search
engine is popular for providing high precision, users in the law
enforcement, intelligence and legal domains need 100% recall: every
possibly relevant document, electronic file or email needs to be
examined and checked for relevance to a case. In theory, modern search
engine technology should allow users to overcome the lack of precision
and continue to focus on recall. However, because of the variety of
languages used, varying document sizes, structures, types and domains,
and the fact that most users do not always know exactly what they are
looking for, these advanced search algorithms may fail. Often they are
too domain- or language-dependent, become too slow or require too much
manual training and fine-tuning. Users do not know what is going on
under the hood, and ultimately get frustrated and may no longer use the
technology.
After working for more than 20 years with the FBI, the UN
War Crimes Tribunals and several other law enforcement, intelligence
and legal organizations worldwide, ZyLAB has some unique insights into
how certain usability tools can help users find what they are looking
for. It appears that there are many cases where users prefer simple
interfaces and easy-to-use programs, which allow them to interact with
the computer and navigate through the data in a highly interactive
manner, over advanced linguistic and statistical retrieval technology.
In this presentation, examples are given of simple,
straightforward tools such as fuzzy search, hit highlighting, hit
navigation, relevance feedback, categorization, classification tools
and data visualization, which allow users to reach 100% recall and
overcome the lack of precision by interactively navigating through the
data collections without losing control of what they are doing, even
for relatively non-skilled users in the law enforcement and legal
domains.
SESSION TWO: PORTALS & CONTENT MANAGEMENT
Tom Wilde
FindWhat.com, New York, USA
Content Management on the Web: the Next Killer App for Search
Historically, Search has
played the role of a discovery and navigation device for the
internet. However, the growth in internet content has pushed the
traditional search results page beyond what users can realistically process. Thousands of
results, mostly uncategorized and poorly abstracted, leave users with
little ability to distill the valuable information buried within the
stack of results. Search's next killer app will leverage the
traditional strengths of information discovery and concept matching to
create on-the-fly content management with rich distillation and
organization of documents for instant information digests across the
internet.
Claude Vogel
Convera, Virginia, USA
The Web You Trust: Building An Authoritative Portal
How can you turn existing
assets (your organization’s content and a
website/portal) into a revenue-generating operation and
position your website as the most "trusted" source of information
relevant to your customer base? New search technology enables companies
to combine authoritative web results with their own content and other
high-quality proprietary content to deliver only the information most
relevant to a particular field or profession. This allows content
owners to control their own branded high-value content, while
complementing and augmenting it with the best available data from the
web. This presentation describes the methodology and search technology
necessary to make this possible, as well as new revenue sources from
re-using branded content and increasing website visitor loyalty.
Search Engine Meeting 2005 Panel
The Better Mousetrap Hour
In this session, guided and animated by Stephen Arnold
and questioned by the audience, young companies in the search arena
tell us how and why their new products are going to have a significant
impact on the world of search. The companies include:
Mondosoft (Anders Hyldahl, Denmark), Speed of Mind (Denmark),
FAST (John Lervik, Norway) and Blossom Software (Alan Feuer,
Massachusetts).
Among the topics addressed will be:
- Increasing performance for enterprise search indexing
- Behavior tracking and click analysis: responding to actions
- Search and taxonomies
- Clustering: benefit or necessary evil?
Each company's representative will make a brief
presentation; after each presentation, an open discussion of these
and other issues will involve the panelists and the audience.
SESSION THREE: PERSONAL INFORMATION & INTRANETS
Susan Dumais
Microsoft Research, Washington, USA
Personal Information Retrieval: Helping Finders become Keepers
Most information retrieval
technologies are designed to facilitate information discovery. However,
much knowledge work involves finding and re-using previously seen
information in the context of ongoing work activities. This
presentation surveys progress and challenges in supporting information
re-use. A brief overview of different techniques that people currently
use to "keep found things found" will be presented. A prototype system,
"Stuff I've Seen", which provides unified access to information that a
person has seen, regardless of whether the information was encountered
as an email, web page, document, hand-written note, etc., will be
demonstrated, and usage experiences with the system will be reported.
Finally, the presentation will illustrate how the rich contextual cues
provided by previously seen information can be used to provide
alternative query generation and results presentation techniques.
Igor Perisic
Entopia, California, USA
State-of-the-Art in Search Engine Personalization: an Enterprise Perspective
Everyone inevitably
needs at least some personalization within search results,
because there is really no absolute relevancy with respect to matching
a query to a set of documents, and we all have varying levels of
experience with the topics we are querying. Disambiguating a query has
many steps, starting from the semantics of the query itself. Spending
effort at this level is certainly worthwhile and necessary, but it can
only go part of the way to the delivery of efficient results. It
provides a better understanding of the content of the query, but none
about its context. What we propose is to elaborate its context through
the use of the entire set of available content within the enterprise
instead of just the searcher’s recent search history.
Viewing content not only through the set of
documents but through the set of actions users performed on them
provides an essential dual viewpoint. Capitalizing on it allows us to
personalize results on a potentially wide set of variables, for
example searchers' scaled expertise in the topic queried, their
business roles and what co-workers did with potentially relevant
documents.
Justin Gilbreath and Jakob Riegger
global-linxs, Munich, Germany
Search on Demand
Divergence of technology
and solutions and the desire for standards and consolidation are
constantly at war. This war has also invaded the arena of search.
With companies implementing a different search engine for every new
retrieval service and reinventing the wheel for each new target group,
an answer is long overdue. “Search on Demand”
provides that solution. It has a solid foundation in business logic
with economies of scale at its heart and an attractive delivery
methodology for IT service providers. The technological angle also
provides striking benefits in re-usability of hardware, coding and
experience consolidation. Tackle the serious issues of security,
multiple languages, scaling and redundancy once and still provide
users with a customized environment that is easy to use.
“Search on Demand”, the answer for IT and financial
decision makers seeking a truly enterprise-wide search platform, has
already been deployed in various large corporations. The presentation
includes a real-world case study.
Avi Rappoport
Search Tools Consulting, California, USA
Data Discovery on the Intranet
This presentation
highlights the process of finding information on large uncontrolled
intranets, and what to do with it once you have it. It covers issues of
access to databases, legacy systems and other black holes, duplicate
detection, web server problems, security and tracking. The information
gathered can provide insight into factors such as index corpus quality,
link analysis including authoritative sites and hubs, file name
extensions, languages used, content change rates and protocols. Because
it exposes all these files, data discovery is also an excellent
occasion to use tools for classification and entity extraction.
Day One Ends approximately 5.30 pm
Tuesday April 12
08.30 - SESSION FOUR: THE COMMERCIAL SEARCH ARENA
Stephen E Arnold
Arnold IT, Kentucky, USA
The Google Legacy
Starting as a highly
successful web search engine, Google has upgraded its enterprise search
appliance and captured a number of key accounts in the commercial and
US government sectors. Having expanded into email and online news
analysis and launched into the desktop search environment, Google is
using its intelligent hardware and software architectures to expand in
a number of interesting directions, enabling the company to envisage a
far more powerful role than that of being just a search engine.
In the last year, Google has moved beyond search.
The company has created a Google File System and a distributed
environment. The result is that Google can deploy applications which
solve what its founders describe as "interesting problems". This
presentation analyzes Google's contribution to the current search scene
and underlines some of the challenges Google may increasingly pose to
others in the search and retrieval area, as well as in the broad area
of data and content management. Will Google overshadow Microsoft in the
post-2005 period just as Microsoft challenged IBM in the 1980s?
John Lervik
FAST, Norway
Next Generation Search Technology Advancements
Search challenges that were
previously met and addressed in various niches and verticals are
rapidly becoming mainstream, making search an integral part of
‘the corporate operating system’ and the next
‘must-have’ for enterprises. Without strong search
capabilities, a portal in today’s competitive landscape is
bound to be left in cyberspace dust. The search technology employed
will need to be versatile enough to offer broader solutions to a
variety of needs while keeping up with growing data volumes, increasing
content complexity, diversified user populations and enterprise
security. Open interfaces and methodologies for integrating with
existing technology assets are also critical; without them, this tool
would be of little or no help at all.
Contextual navigation is among the important advancements and
innovations in the search space. It will provide a new means for
discovering high value information by automatic discovery of relations,
apparent or hidden in the content itself. Contextual navigation will
assist and expand existing models for advanced relevancy, allowing
results to be ordered according to the user’s agenda and sense of
importance.
David A. Hull
Clairvoyance, Pennsylvania, USA
Commercializing Information Extraction: Lessons from WhizBang Labs
Information extraction (IE)
technology has matured to the point where commercial applications have
real potential to succeed in the marketplace. However, there are many
hidden challenges to overcome, including managing customer expectations
and controlling customization costs. WhizBang Labs, an IE start-up active
from 1998 to 2002, was a strong early competitor in this area, but it never
managed to find a stable market for its technology. The speaker, a
former scientist at the company, will present a few short case studies,
sharing some of his practical experience with success and failure in
developing IE technology for commercial clients.
Oshoma Momoh
Microsoft, Washington, USA
Search Immersion
"Web Search" has been
around since the mid-1990s, so why is it becoming so important ten
years later? MSN Search has just completed a two-year process of
building its own algorithmic search engine for the Web. What was
learned in this undertaking, and what is ahead for the Search industry
as a whole? If this is just the first era of search, what is to come?
The task of searching will become an intuitive part of everyday life,
allowing you to search for anything, anywhere, any time, from any
device. How close are we to realizing this vision and what will it take
to get there? What is MSN Search going to focus on in the years ahead?
SESSION FIVE: SEARCH CHALLENGES
Susan Feldman
IDC, Connecticut, USA
Trends in Information Retrieval: Search Moves to a Central Role
Finding and managing
information has become such a necessity for today's organization that
the search function is starting to move into the basic software stack.
What are the implications of this for organizations and for search
software vendors? What other trends are emerging in the search and
retrieval market, and how should organizations prepare for or take
advantage of them? Join Sue Feldman for an overview of recent
developments and trends.
David A Evans
Clairvoyance, Pennsylvania, USA
New Solutions for Old (Search) Problems
After more than a decade in
the public spotlight and after becoming a common activity, search seems
familiar and comfortable. Yet, for all the obvious advantages of
current varieties of search over large-scale textual data, there are
persistent and sometimes surprising weaknesses. This presentation will
briefly review several categories of search problems, including the
challenge of exploiting implicit structure and cross-document relations
in collections of free text, and will present some of the (newer)
techniques that have been developed as solutions.
Tuoc Luong
Ask Jeeves Inc, California, USA
Combining Desktop & Web: Building a Seamless Search Solution across Multiple Platforms
As
2004 drew to a close, most of the major search companies announced
intentions to provide desktop search, giving consumers the ability to
search their hard drives as easily as they currently search the web.
But integrating a search solution across these distinct platforms --
web and desktop -- poses its own challenges. This presentation
addresses efforts to leverage the best of desktop and web search into
an intuitive, unified interface. And, as search continues to move
"beyond the browser", this presentation also highlights the ongoing
challenge of information retrieval across multiple platforms.
Raul Valdes-Perez
Vivisimo, Pennsylvania, USA
Arguments for Clustering and Meta-Search as a Universal Norm for Information Retrieval
Clustering both competes
and cooperates with taxonomies for the next advance in information
access; its rivals include personalization, query refinement, entity
extraction and others. Similarly, meta-search both competes with and
leverages the we-index-everything approach. This
presentation offers pragmatic and conceptual arguments in favor of
clustering and meta-search as a universal basis for information access
and reports on the deployment rate of these emerging technologies.
Robert Carlson
IBM, Almaden Research Center, California, USA
The Approaching Era of Business Value on the Web
The presentation discusses
a number of recent indicators that suggest the era of business value on
the web is at hand. From consumer research and purchasing behavior to
enterprise brand tracking, intelligence gathering, and advertising, the
web is suddenly on everybody's mind -- not as an exciting future
possibility, but as an exploitable resource today. WebFountain is an
IBM Research project developing technology to support this future. The
presentation discusses the technical approach, talks about what is
technically possible today, describes a few of the ongoing Research
projects and describes key technical challenges for the future that, if
solved, could unlock the insight contained in internet content.
Elizabeth Liddy
Syracuse University, New York, USA
QA Authoring Tools
Because the domains and topics
of interest of corporate clients are so varied, a question-answering
system must be adaptable if it is to perform well and let each client's
customers ask natural-language questions against the client's collection of
documents (i.e., product descriptions, product use instructions, FAQs or
other corporate information), providing good service while
reducing the burden on the client's customer service department. The system
needs to be configurable by the corporate client, in regard to the
number and type of answers to return, the confidence threshold that
determines display of any answer, and the relative weighting of
features considered in the question-answering process. In response to
this need, we have developed authoring tools that allow clients to
customize the system to assure high quality question-answering for
their own particular domain. To assist in tailoring the generic QA
system to a corporate client's document collection, tool sets are
provided to adjust the document processing and question handling
functions: 1) extraction and analysis tools to provide statistics about
the terminology of the user's collection and the categories in which
those terms fall; 2) knowledge-base editing tools to refine and modify
the categorization scheme; and 3) question-refinement tools to create
domain-specific expansion sets and to bolster, tailor and even create
the client's own FAQ repository.
David A Ferrucci
IBM, New York, USA
UIMA: A component architecture for integrating structured and unstructured information in search systems
An Unstructured Information
Management (UIM) application may be generally characterized as a
software system that analyzes large volumes of unstructured information
(text, audio, video, images, etc.) to discover, organize and deliver
relevant knowledge to the client application in the form of a search
engine index, database or knowledge base. An example is a system that
processes millions of medical abstracts to discover critical drug
interactions and stores them in an optimized schema for efficient
retrieval. Another example is an application that processes tens of
millions of documents to discover key evidence indicating probable
competitive threats. In analyzing unstructured content, UIM
applications make use of a variety of analysis and search technologies,
including statistical and rule-based Natural Language Processing
(NLP), Information Retrieval (IR), machine learning, ontologies and
automated reasoning. These technologies tend to be built independently
by highly specialized scientists and engineers. Integrating them to
build effective UIM applications is often a costly challenge.
Ammy Vogtlander
Scirus/Elsevier, Amsterdam, The Netherlands
Structuring the Unstructured Web for Specialized Searching
The
number of online information sources has exploded and it is becoming
increasingly difficult for users to find relevant results using
general-purpose search engines. Specialized search engines optimize the
retrieval, classification, indexing and ranking processes for the
specific content sources they cover. This presentation discusses the
current search engine landscape and illustrates how a specialized
search engine can enhance the user's search experience. It discusses,
using practical examples, how pattern recognition techniques and
linguistic analysis can assist in automatically identifying the subject
area (e.g., medicine or physics) and information types (e.g., journal
article, author homepage) of web pages, thus allowing users to further
refine their search to fit their specific needs. Categorization of the
web into information types assists in the information extraction
process needed to capture fields relevant for ranking and display (e.g.,
document title, author and publication date).
Conference Ends approximately 4.30 pm