Sunday 10 April: Half-day Tutorial (David Lewis)

Sunday 10 April: Half-day Tutorial (Stephen E. Arnold)

Go to conference hotel site (booking reference GRINF1)

This page last changed 29 March 2005

Boston, Massachusetts, April 11-12, 2005

Program

This annual meeting provides a forum and point-of-reference for all those interested in the intricacies of Search and Retrieval. The meeting draws those with a professional interest in search engines -- such as search engine designers and developers -- and those interested in applying search engines in their own professional environments. Search is at the heart of information retrieval; and the Search Engine Meeting provides an annual point of reference as to what is happening in this fast-moving and exciting field.

Presentations are given sequentially; there are no parallel sessions or parallel presentations at this meeting.
Presentations below are not necessarily in final speaking order.


  Monday April 11  

09.00. CONFERENCE OPENING

Jan Pedersen
Yahoo!, California, USA
Internet Search Engines: Past and Future

The 2005 Meeting begins with a short history of Internet Search Engines from early first generation systems to the current crop of stock market darlings. Many of the underlying technology problems remain the same, but the business has become significantly more sophisticated and high-powered. One of the pioneers of the industry looks at some of the economics driving the remarkable success of these services and makes predictions about future trends.

SESSION ONE: USERS & USABILITY

Carol Tenopir
University of Tennessee, School of Information Sciences, USA
What We Know About User Behavior (And What We Are Not So Sure of)

From hundreds of recent research studies conducted over the last ten years, we now know quite a bit about how people interact with online information systems. We know, for example, that information seeking and use by subject experts varies by subject discipline and work place, that most users prefer convenience and speed, and that users’ mental models of information systems are being shaped by major internet search engines. Some unanswered questions remain, however, and much of system design is still based on majority behaviors rather than purpose and value of use for individuals.

Jochen Leidner
Linguit, Bad Bergzabern, Federal Republic of Germany
The Deployment of a Mobile Question Answering System

Two approaches exist to search automatically for information on the web from the desktop: keyword-based search engines, and language-aware question answering systems. In a mobile environment, specific challenges must be faced, including, but not limited to, space constraints regarding input/output devices. It is argued that an approach combining the two search paradigms can meet these challenges. Using an SMS (Short Message Service) based case study, we describe how these factors are addressed in the Nuggets answer retrieval system for text-enabled mobile phones, contrast the types of questions people ask with those in a desktop search scenario, and conclude by pointing out evaluation challenges.

Johannes C. Scholtes
ZyLAB North America, Virginia, USA
Usability versus Precision & Recall

While the Google search engine is popular for providing high precision, users in law enforcement, intelligence and legal domains need 100% recall: every possibly relevant document, electronic file or email needs to be examined and checked for relevance to a case. In theory, modern search engine technology should allow users to overcome the lack of precision and continue to focus on recall. However, due to the variety of languages used, changing document sizes, different document structures, types and domains, and the fact that most users do not always know exactly what they are looking for, these advanced search algorithms may fail. Often, they are too domain or language dependent, become too slow or require too much manual training and fine-tuning. Users do not know what is going on under the hood, and ultimately get frustrated and may no longer use the technology.
  After working for more than 20 years with the FBI, the UN War Crimes Tribunals and several other law enforcement, intelligence and legal organizations worldwide, ZyLAB has some unique insights into how certain usability tools can help users find what they are looking for. In many cases, users prefer simple interfaces and easy-to-use programs, which allow them to interact with the computer and navigate through the data in a highly interactive manner, to advanced linguistic and statistical retrieval technology.
  In this presentation, examples are given of simple, straightforward tools such as fuzzy search, hit highlighting, hit navigation, relevance feedback, categorization, classification tools and data visualization which allow users to reach 100% recall and overcome the lack of precision by interactively navigating through the data collections without losing control of what they are doing; even for relatively non-skilled users in the law enforcement and legal domain.


SESSION TWO: PORTALS & CONTENT MANAGEMENT

Tom Wilde
FindWhat.com, New York, USA
Content Management on the Web: the Next Killer App for Search

Historically, Search has played the role of a discovery and navigational device for the internet. However, the growth in internet content has pushed the traditional search results page beyond human factors. Thousands of results, mostly uncategorized and poorly abstracted, leave users with little ability to distill the valuable information buried within the stack of results. Search's next killer app will leverage the traditional strengths of information discovery and concept matching to create on-the-fly content management with rich distillation and organization of documents for instant information digests across the internet.

Claude Vogel
Convera, Virginia, USA
The Web You Trust: Building An Authoritative Portal

How can you turn existing assets (your organization’s content and a website/portal) into a revenue generating operation and position your website as the most "trusted" source of information relevant to your customer base? New search technology enables companies to combine authoritative web results with their own content and other high quality proprietary content to deliver only the information most relevant to a particular field or profession. This allows content owners to control their own branded high value content, while complementing and augmenting it with the best available data from the web. This presentation describes the methodology and search technology necessary to make this possible, as well as new revenue sources from re-using branded content and increasing website visitor loyalty.


Search Engine Meeting 2005 Panel

The Better Mousetrap Hour

In this session, guided and animated by Stephen Arnold and questioned by the audience, young companies in the search arena tell us how and why their new products are going to have a significant impact on the world of search. The companies include:
Mondosoft (Anders Hyldahl, Denmark), Speed of Mind (Denmark), FAST (John Lervik, Norway) and Blossom Software (Alan Feuer, Massachusetts).

Among the topics addressed will be:

  • Increasing performance for enterprise search indexing
  • Behavior tracking and click analysis: responding to actions
  • Search and taxonomies
  • Clustering: benefit or necessary evil.

Each company's representative will make a brief presentation and after each presentation, an open discussion of these and other issues will involve the panelists and the audience.


SESSION THREE: PERSONAL INFORMATION & INTRANETS

Susan Dumais
Microsoft Research, Washington, USA
Personal Information Retrieval: Helping Finders become Keepers

Most information retrieval technologies are designed to facilitate information discovery. However, much knowledge work involves finding and re-using previously seen information in the context of ongoing work activities. This presentation surveys progress and challenges in supporting information re-use. A brief overview of different techniques that people currently use to "keep found things found" will be presented. A prototype system, "Stuff I've Seen", which provides unified access to information that a person has seen, regardless of whether the information was encountered as an email, web page, document, hand-written note, etc, will be demonstrated and usage experiences with the system reported. Finally, how the rich contextual cues provided by previously seen information can be used to provide alternative query generation and results presentation techniques will be illustrated.

Igor Perisic
Entopia, California, USA
State-of-the-Art in Search Engine Personalization; an Enterprise Perspective

It is inevitable that everyone needs at least some personalization within search results, because there is really no absolute relevancy with respect to matching a query to a set of documents, and we all have various levels of experience with the topics we are querying. Disambiguating a query has many steps starting from the semantics of the query itself. Spending effort at this level is certainly worthwhile and necessary, but it can only go part of the way to the delivery of efficient results. It provides a better understanding of the content of the query, but none of its context. What we propose is to elaborate its context through the use of the entire set of available content within the enterprise instead of just the searcher’s recent search history.
   Viewing content not only through the set of documents but through the set of actions users performed on them provides an essential dual viewpoint. Capitalizing on it allows us to personalize results on a potentially wide set of variables - for example, searchers' scaled expertise in the topic queried, their business roles and what co-workers did with potentially relevant documents.

Justin Gilbreath and Jakob Riegger
global-linxs, Munich, Germany
Search on Demand

Divergence of technology and solutions and the desire for standards and consolidation are constantly at battle. This war has also invaded the arena of search. With companies implementing a different search engine for every new retrieval service and reinventing the wheel for each new target group, an answer is long overdue. “Search on Demand” provides that solution. It has a solid foundation in business logic with economies of scale at its heart and an attractive delivery methodology for IT service providers. The technological angle also provides striking benefits in re-usability of hardware, coding and experience consolidation. Tackle the serious issues of security, multiple languages, scaling and redundancy one time and still provide users with a customized environment that is easy to use. “Search on Demand”, the answer for IT and financial decision makers for a truly enterprise-wide search platform, has already been deployed in various large corporations. The presentation includes a real-world case study.

Avi Rappoport
Search Tools Consulting, California, USA
Data Discovery on the Intranet

This presentation highlights the process of finding information on large uncontrolled intranets, and what to do with it once you have it. It covers issues of access to databases, legacy systems and other black holes, duplicate detection, web server problems, security and tracking. The information gathered can provide insight into factors such as index corpus quality, link analysis including authoritative sites and hubs, file name extensions, languages used, content change rates and protocols. Given the opportunity to see all these files, data discovery is an excellent occasion to use tools for classification and entity extraction as well.


Day One Ends approximately 5.30 pm


  Tuesday April 12  

08.30 - SESSION FOUR: THE COMMERCIAL SEARCH ARENA

Stephen E Arnold
Arnold IT, Kentucky, USA
The Google Legacy

Starting as a highly successful web search engine, Google has upgraded its enterprise search appliance and captured a number of key accounts in the commercial and US government sector. Having expanded into email and online news analysis and launched into the desktop search environment, Google is using its intelligent hardware and software architectures to expand in a number of interesting directions and to envisage a far more powerful role than that of being just a search engine.
   In the last year, Google has moved beyond search. The company has created a Google File System and a distributed environment. The result is that Google can deploy applications which solve what its founders describe as "interesting problems". This presentation analyzes Google's contribution to the current search scene and underlines some of the challenges Google may increasingly pose to others in the search and retrieval area, as well as in the broad area of data and content management. Will Google overshadow Microsoft in the post-2005 period just as Microsoft challenged IBM in the 1980s?

John Lervik
FAST, Norway
Next Generation Search Technology Advancements

Search challenges that were previously met and addressed in various niches and verticals are rapidly becoming mainstream, making search an integral part of ‘the corporate operating system’ and the next ‘must-have’ for enterprises. Without strong search capabilities, a portal in today’s competitive landscape is bound to be left in cyberspace dust. The search technology employed will need to be versatile enough to offer broader solutions to a variety of needs while keeping up with growing data volumes, increasing content complexity, diversified user populations and enterprise security. Also critical are open interfaces and methodologies for integrating with existing technology assets, requirements without which this tool would be of little or no help at all.
Contextual navigation is among the important advancements and innovations in the search space. It will provide a new means for discovering high value information by automatic discovery of relations, apparent or hidden in the content itself. Contextual navigation will assist and expand existing models for advanced relevancy, allowing results ordering according to the user’s agenda and sense of importance.

David A. Hull
Clairvoyance, Pennsylvania, USA
Commercializing Information Extraction: Lessons from WhizBang Labs

Information extraction (IE) technology has matured to the point where commercial applications have real potential to succeed in the marketplace. However, there are many hidden challenges to overcome, including managing customer expectations and controlling customization costs. WhizBang Labs, an IE start-up from 1998-2002, was a strong early competitor in this area, but it never managed to find a stable market for its technology. The speaker, a former scientist at the company, will present a few short case studies, sharing some of his practical experience with success and failure in developing IE technology for commercial clients.

Oshoma Momoh
Microsoft, Washington, USA
Search Immersion

"Web Search" has been around since the mid-90s, so why is it becoming so important ten years later? MSN Search has just completed a two-year process of building its own algorithmic search engine for the Web. What was learned in this undertaking, and what is ahead for the Search industry as a whole? If this is just the first era of search, what is to come? The task of searching will become an intuitive part of everyday life, allowing you to search for anything, anywhere, any time, from any device. How close are we to realizing this vision and what will it take to get there? What is MSN Search going to focus on in the years ahead?


SESSION FIVE: SEARCH CHALLENGES

Susan Feldman
IDC, Connecticut, USA
Trends in Information Retrieval: Search Moves to a Central Role

Finding and managing information has become such a necessity for today's organization that the search function is starting to move into the basic software stack. What are the implications of this for organizations and for search software vendors? What other trends are emerging in the search and retrieval market, and how should organizations prepare for or take advantage of them? Join Sue Feldman for an overview of recent developments and trends.

David A Evans
Clairvoyance, Pennsylvania, USA
New Solutions for Old (Search) Problems

After more than a decade in the public spotlight and after becoming a common activity, search seems familiar and comfortable. Yet, for all the obvious advantages of current varieties of search over large-scale textual data, there are persistent and sometimes surprising weaknesses. This presentation will briefly review several categories of search problems, including the challenge of exploiting implicit structure and cross-document relations in collections of free text, and will present some of the (newer) techniques that have been developed as solutions.

Tuoc Luong
Ask Jeeves Inc, California, USA
Combining Desktop & Web: Building a Seamless Search Solution across Multiple Platforms

As 2004 drew to a close, most of the major search companies announced intentions to provide desktop search, giving consumers the ability to search their hard-drives as easily as they currently search the web. But integrating a search solution across these distinct platforms -- web and desktop -- poses its own challenges. This presentation addresses efforts to leverage the best of desktop and web search into an intuitive, unified interface. And, as search continues to move "beyond the browser", this presentation also highlights the ongoing challenge of information retrieval across multiple platforms.

Raul Valdes-Perez
Vivisimo, Pennsylvania, USA
Arguments for Clustering and Meta-Search as a Universal Norm for Information Retrieval

Clustering both competes and cooperates with taxonomies for the next advance in information access; other rivals include personalization, query refinement and entity extraction. Similarly, meta-search both competes with and leverages the we-index-everything approach. This presentation offers pragmatic and conceptual arguments in favor of clustering and meta-search as a universal basis for information access and reports on the deployment rate of these emerging technologies.

Robert Carlson
IBM, Almaden Research Center, California, USA
The Approaching Era of Business Value on the Web

The presentation discusses a number of recent indicators that suggest the era of business value on the web is at hand. From consumer research and purchasing behavior to enterprise brand tracking, intelligence gathering, and advertising, the web is suddenly on everybody's mind -- not as an exciting future possibility, but as an exploitable resource today. WebFountain is an IBM Research project developing technology to support this future. The presentation discusses the technical approach, talks about what is technically possible today, describes a few of the ongoing Research projects and describes key technical challenges for the future that, if solved, could unlock the insight contained in internet content.

Elizabeth Liddy
Syracuse University, New York, USA
QA Authoring Tools

Since the domains or topics of interest for each corporate client are myriad, a Question-Answering system must be adaptable to perform well and permit each client's customers to ask natural language questions against its collection of documents, ie, product descriptions, product use instructions, FAQs or other corporate information in order to provide good service while reducing the burden on their customer service department. The system needs to be configurable by the corporate client, in regard to the number and type of answers to return, the confidence threshold that determines display of any answer, and the relative weighting of features considered in the question-answering process. In response to this need, we have developed authoring tools that allow clients to customize the system to ensure high quality question-answering for their own particular domain. To assist in tailoring the generic QA system to a corporate client's document collection, tool sets are provided to adjust the document processing and question handling functions: 1) extraction and analysis tools to provide statistics about the terminology of the user's collection, and the categories in which those terms fall; 2) knowledge-base editing tools to refine and modify the categorization scheme; and 3) question-refinement tools to create domain-specific expansion sets as well as the ability to bolster, tailor, and even create their own FAQ repository.

David A Ferrucci
IBM, New York, USA
UIMA: A component architecture for integrating structured and unstructured information in search systems

An Unstructured Information Management (UIM) application may be generally characterized as a software system that analyzes large volumes of unstructured information (text, audio, video, images, etc) to discover, organize and deliver relevant knowledge to the client application in the form of a search engine index, database or knowledge base. An example is a system that processes millions of medical abstracts to discover critical drug interactions and stores them in an optimized schema for efficient retrieval. Another example is an application that processes tens of millions of documents to discover key evidence indicating probable competitive threats. In analyzing unstructured content, UIM applications make use of a variety of analysis and search technologies including: Statistical and rule-based Natural Language Processing (NLP), Information Retrieval (IR), machine learning, ontologies and automated reasoning. These technologies tend to be built independently by highly specialized scientists and engineers. Integrating them to build effective UIM applications is often a costly challenge.

Ammy Vogtlander
Scirus/Elsevier, Amsterdam, The Netherlands
Structuring the Unstructured Web for Specialized Searching

The number of online information sources has exploded and it is becoming increasingly difficult for users to find relevant results using general-purpose search engines. Specialized search engines optimize the retrieval, classification, indexing and ranking processes for the specific content sources they cover. This presentation discusses the current search engine landscape and illustrates how a specialized search engine can enhance the user's search experience. It discusses, using practical examples, how pattern recognition techniques and linguistic analysis can assist in automatically identifying the subject area (eg, medicine or physics) and information types (eg, journal article, author homepage) of web pages, thus allowing users to further refine their search to fit their specific needs. Categorization of the web into information types assists in the information extraction process needed to capture fields relevant for ranking and display (eg, document title, author and published date).


Conference Ends approximately 4.30 pm