
This page last changed 24 April 2008
Boston, Massachusetts, April 28-29, 2008
Program
This annual meeting provides a forum and point-of-reference
for all those interested in the intricacies of Search and Retrieval.
The meeting draws those with a professional interest in search
engines -- such as search engine designers and developers --
and those interested in applying search engines in their own
professional environments. Search is at the heart of information
retrieval; and the Search Engine Meeting provides an annual point
of reference as to what is happening in this fast-moving and
exciting field.
All presentations are given sequentially;
there are no parallel sessions or parallel presentations at this
meeting.

Monday April 28
09.00. CONFERENCE OPENING
Charles Clarke Day One Opening Talk
University of Waterloo, Canada
XML Retrieval: Problems and Potential
While XML is not an ideal vehicle for
capturing and exploiting document structure in search, it does
provide a common ground for addressing a number of related retrieval
problems across unrelated document types and collections. Using
examples of retrieval over collections of books and journals,
this talk outlines methods for focused retrieval: returning the
right parts of documents, not just the right documents. In the
case of books and journals, these parts may range from paragraphs
and pages to entire volumes. The talk also discusses the evaluation
of these focused retrieval methods. In particular, the talk describes
INEX (the INitiative for the Evaluation of XML Retrieval), an
ongoing forum for evaluating focused retrieval. Now in its sixth
year, INEX annually brings together an international group of
researchers to compare methods using common test collections.
Stephen E. Arnold
AIT, Kentucky
Beyond Search: Big Money and User Dissatisfaction as Catalysts
for Next-Generation Search Solutions
There are more than 200 companies offering
"search" solutions. Some of these are newcomers unknown
in the US. Paris-based PolySpot and Budapest-based Tesuji are
just two. The $1.2 billion buyout of Fast Search & Transfer,
more than Google's advertising billions, has ignited consolidation
in search. It is not just money. New research funded by the U.S.
government reveals that three out of five enterprise search system
users are dissatisfied or very dissatisfied with their present
search system. A new study for the Gilbane Group identifies facets
of the search business that receive scant attention:
- Universities are funding search ventures.
The goal is not technology transfer. The objective is to create
value and, hence, revenue beyond the traditional research grant
and licensing models.
- Newcomers are making inroads against
far larger vendors. Companies such as Coveo, Exalead, ISYS and
Siderean Software are tallying double digit growth by finding
new markets and capturing business from far-larger, higher-profile
vendors. Companies like Bitext in Madrid generate revenue by
providing established vendors with a way to add natural language
processing to their ageing systems, as dtSearch has done with
its Bitext relationship.
- The shift to rich text processing via
semantic and statistical techniques is now taking place. Although
slow to take off, the "assisted navigation" interface
pioneered by Endeca is now giving way to an information dashboard.
Endeca's lock on the point-and-click interface and "suggestions"
is being challenged by dozens of companies.
- Search is no longer an option. It is
an expected component of other enterprise applications. As a
result, search-and-retrieval, findability solutions, and social
search are part of the standard enterprise software vendor's
product functionality. IBM, Microsoft, Oracle and SAP are pushing
downmarket.
The cumulative effect of these trends
has significant implications for vendors, procurement teams and
users. The two principal changes catalyzed by these trends are
that users want increasingly intelligent systems, thus triggering
significant opportunities for vendors and an increasing flow
of funds into new technologies that will ensure a fast-changing,
unstable market for the foreseeable future.
§ Exhibitor Product Highlights: Attivio /
Groxis
Steven Forth and Amelia Newbury
Monitor Group, Massachusetts
Search as a Mode of Learning: Requirements for Next Generation
Search Systems
In addition to hearing, watching, experiencing and other popular
modes of learning, search is an important and fundamental mode
for understanding complex subject areas. Learning in complex
fields can be understood as the building of concept maps through
the exploration of a knowledge space. As learning is often a
social act, and learners need to be able to communicate and share
their concept maps, there are important social and communication
issues at stake here as well. Search provides a compelling way
to explore multi-dimensional spaces. This is both a common and
growing use of search systems. But the use of search for learning
and social learning imposes new requirements on search systems.
These needs go beyond the simple optimization of 'findability'
of a known piece of information or even a sampling of possibly
relevant results. When social and communication aspects are factored
in, the current generation of search systems does not provide adequate
support for learners. Among other things, search systems need
to factor in the "white spaces" in the concept maps
and influence the presentation of results to fill these spaces.
This presentation looks at a number of common patterns in search
to see how these support learning, and then develops a set of
requirements for a search system that provides better support
for social learning in complex fields. eMonitor's experiments
in the use of semantic constructs to organize learning and performance
content to improve searchability will also be discussed.
10:45 Conference Break
Peter Jackson
Thomson Corporation, Minnesota
Blending Retrieval and Categorization Technologies in a Document
Recommender System
The task of recommending documents to knowledge workers differs
from the task of recommending products to consumers. Variations
in search context can undermine the effectiveness of collaborative
approaches, while many professionals function in an environment
where the open sharing of information may be impossible or undesirable.
There is also the 'cold start' problem of how to bootstrap a
recommendation capability in the absence of current usage statistics.
We describe a fully fielded system called ResultsPlus, which
uses a blend of information retrieval and machine learning technologies
to recommend secondary materials to attorneys engaged in primary
law research based on document metadata. Rankings of recommended
material are subsequently enhanced by incorporating both historical
user behavior and document usage data.
Terry Clift
ISYS Search Software, Colorado / Australia
Forget One Size Fits All, Search is an Iterative
Process
Taking a high-level, strategic approach to enterprise search
might make sense in specific cases, but it should not be done
to the exclusion of tactical deployments that pay immediate dividends
while the big rollout is still in the configuration
stage. Search is an iterative process that, when done correctly,
lives and breathes and conforms to the requirements of various
user communities over time. Understanding these users, their
needs and environments, and the intended goals for search, lays
the crucial foundation for any successful implementation. Vendors
and customers only set each other up for failure when they assume
environments are rigid and that search must have all of
the answers from the beginning.
This presentation discusses the iterative approach to search
and illustrates the best short-term and long-term strategies
for bringing enterprise search into an organization. The process
begins with how to identify where search can benefit immediately.
The presentation then outlines steps to rolling out search implementations
for broader requirements and how to generate lasting, long-term
gain without sacrificing the short term.
§ Exhibitor Product Highlights: ISYS
Roger Bradford
Agilex Technologies, Virginia
Semantic Retrieval: Making the Computer do the Heavy Lifting
This presentation covers the range of modern applications
of semantic processing to information retrieval. The emphasis
is on techniques that reduce the cognitive load on the user.
Techniques covered include conceptual retrieval, clustering and
categorization, on-the-fly taxonomy generation, and text mining.
Examples are taken from applications in industry and government.
These applications include patent analysis, legal data discovery,
and counter-terrorism analysis. The discussion includes multi-lingual
and cross-lingual applications. Although the emphasis is on text,
extensions to audio and video data are included. New results
are presented that demonstrate the applicability of these techniques
to very large document collections.
12:45 Lunch Break
Nigel Hamilton
Trexy, UK
Search Trails - Back to the Future
Each day millions search for the same things and often find
themselves repeating their own searches. Would it not be good
if we could harness this collective effort and remember the searches
and the web pages visited to find information?
This presentation explores how new social search tools impact
and assist the online searching community. Trexy.com remembers
search trails and shares them anonymously with other searchers.
Search trails are the pathways users make when searching on engines
such as Google, Yahoo and MSN. But what is the optimal trail
for a given search? How can we pass useful trails on to one another?
Can search trails help users to pinpoint information? The presentation
looks at the technical developments that
have led to how we currently view, retrieve, and remember information
online.
George Chitouras
Business Objects, California
Using Information Retrieval and NLP techniques to drive Business
Intelligence
While traditional Business Intelligence has transformed business
using structured information from operational applications and
transaction systems, there is a huge source of information that
has by and large been ignored: people's thoughts and opinions,
found in communications such as emails, web pages, reports, surveys,
customer relationship management note fields, contracts, blogs,
and wikis. Whether it is customer complaints, employee
feedback, analyst opinions, or competitors' intentions, this
potentially valuable information lies hidden in unstructured
text sources.
This presentation proposes that the artifacts of text analytics,
when used in the aggregate, can drive business intelligence dashboards
and measure sentiment as it relates to products,
companies or marketing initiatives.
§ Exhibitor Product Highlights:
Northern Light / Trexy
Sam Chapman
University of Sheffield, UK
Combining Semantics and Keyword Approaches to Enable Flexible
Enterprise Search
Keyword search has issues: its returns
are not suitable for many business uses, and reliable quantitative
returns are impossible to obtain because of the uncertain relevance
of any query return. More of an issue is that textual information
in specialised domains is often repetitive, and the context of
information is paramount to its meaning. In such circumstances
standard keyword approaches are not the best way to make use of relevant
information.
Semantic approaches offer a method to alleviate this issue by
capturing "knowledge" according to a pre-assigned structure
(ontology classes and relations). Although these techniques are
proven to be helpful in answering precise queries, the complexity
of how knowledge is searched and its rigid organisation can sometimes
constrain a user, especially considering that not all possible
"knowledge" is encoded into a re-usable structured
form.
This presentation outlines a flexible approach combining both
Keyword and Semantic approaches for specialised domains where
the user can easily switch between, or use, both approaches together
within a degree of variably structured and unstructured query
to locate the information needed for quantitative analysis. The
presentation focuses on a number of specific examples where this
simple patented approach is used in large scale enterprises.
3.45 Conference Break
Jeff Fried
FAST Search & Transfer, Massachusetts
The Next Step in the Confluence of Search and Business Intelligence
Enterprise search (with a heritage in serving ad hoc queries
on unstructured data) and BI (traditionally focused on structured
inquiry into structured data) have been coming together. A range
of capabilities combining search and BI are available and in
use. These include text mining, search-based everyday analytics,
and search integrated in BI. New technology that merges traditional
database and traditional search cores is coming out of the lab,
providing a next step in the search/BI space. This presentation
outlines the internal architecture and data management approach
of this next-generation core search technology.
Pascal Coupet
TEMIS, Philadelphia
Better Annotations for Text Mining: Using a Knowledge Server
Simple entity recognition is becoming more and more popular
to improve user search experiences. We are now used to seeing
personal names, places and others automatically recognized in
texts. These new dimensions can be used for facet navigation,
hyperlinking and several types of statistical analysis.
However, quality quickly becomes an issue in production systems
because of ambiguities and naming variation. Normalization and
disambiguation are a necessity for high quality systems.
The presentation discusses the next generation of entity recognition
system which addresses these issues in a customizable way from
one customer to another, based on a dedicated knowledge server
which stores known entities for a specific project, associated
with disambiguation methods and normalization methods. Its contents
evolve according to historical annotations and allow customers
to correct mistakes that will not be made again by the system.
This is a key element in providing strong customization capability
for each customer without modification to core annotator products.
5:15. Conference Mixer Cocktail
Tuesday April 29
09.00. CONFERENCE OPENING
Jason R. Baron Day Two Opening Talk
Director of Litigation, National
Archives and Records Administration, D.C.
Searching for the Good Lawyer: Emerging Best Practices In
The Use of Search and Information Retrieval Methods in E-Discovery
The cost of litigation in the US involving
electronically stored information (so-called E-discovery)
is burgeoning: according to one Forrester study issued in 2006,
corporate America is expected to increase its spending on E-discovery
from $1.4 billion in 2006 to $4.8 billion in
2011. Under the new US Federal Rules of Civil Procedure effective
December 1, 2006, both private and public sector litigants increasingly
confront a world of preservation orders, injunctions, subpoenas,
and other demands for access to exponentially increasing volumes
of records and information stored electronically. One set of
emerging best practices highlighted by a recent best practices
commentary published by The Sedona Conference® involves more
serious attention being paid by lawyers and judges to information
retrieval issues, and the deficiencies in the way in which documents
and ESI routinely are searched for, including by way of keywords.
This presentation provides a strategic approach to search problems
lawyers and their clients face in e-discovery, as well as practical
pointers drawn from real cases; it also includes the latest findings
from the TREC Legal Track, an international text retrieval project
run out of the US National Institute of Standards and Technology.
Marcelline Saunders
Groxis, California
Powering Search Results with Visualization
This presentation discusses trends and
benefits of the convergence of search and visualization tools,
including brief historical information to provide context. It
looks at the latest tools as well as reviews the early adoption
and specific use cases in a number of verticals. Finally it discusses
how powering search results with visualization will impact the
market moving forward, including the impact on today's most popular
business models.
Richard Brath
Oculus, Canada
Search, Sense-Making and Visual User Interfaces
Following from research, observations, and interviews we determined
that search and sense-making involved many different component
tasks and many different workflows through these tasks. Advanced
technologies, such as entity extraction, classification, etc.,
address part of the problem, but significant end-user performance
improvement in search and sense-making tasks can also be achieved
through innovating the end-user interface to enable better workflows
across these different technologies and tasks with a single unified
interface. We have created a new interface using visualization
techniques called nSpace to implement these ideas including a
component called TRIST for interaction with large amounts of
results data to help users find the relevant, novel and unexpected;
and a component called Sandbox for collecting, organizing and
reasoning with pieces of information for sense-making. nSpace
uses novel techniques such as linked dimensions for characterization
and use of gestures for fluid workflows. The presentation discusses
some of the research findings, shows some examples of the nSpace
interface, and discusses some of the results and feedback.
10:30 Conference Break
Abe Lederman
Deep Web Technologies, New Mexico
Federated Search: True Enterprise Search
Enterprise Search Software as it is known today, whether from
Autonomy, Endeca, FAST or others, cannot provide access to all
the information of value at any reasonably sized organization
with a single search. Organizational information-content exists
in numerous silos accessible through a myriad of individual,
incompatible indices-engines. Technical, cost and bureaucratic
reasons prevent unifying all these various enterprise silos under
one index.
This presentation discusses how state-of-the-art Federated Search
software provides actual enterprise (-wide) single point of search-access
to most, if not all, of the information repositories of value
to an enterprise, including those beyond the firewall.
Spencer Shearer
Exalead, Massachusetts
The Next Big Thing in Search: Hybrid/Vertical Search
Recent trends in search have centered largely on specialized
search functions, such as image and video search, signaling the
importance of the ability to index specific types of information
and different types of content. However, search's potential
does not end there. It will continue to grow, providing users
with the ability to combine different forms of content in more
effective ways. By connecting sources that were until now considered
separate, hybrid/vertical search eclipses traditional text-oriented
and directory-based search methods, and is poised to become the
search industry's next big trend.
In this presentation, Bourdoncle offers a technical perspective
on the latest technologies to facilitate hybrid/vertical search,
which ultimately fosters a simpler, more natural search experience
for the user. Bourdoncle also provides insight into his first-hand
experience in helping to design Exalead's hybrid/vertical
search solution and will discuss the various opportunities for
employing hybrid/vertical search. In addition, the presentation
addresses the benefits and challenges of the technologies employed,
including entity extraction, real-time indexing, taxonomies and
navigation.
Edwin Cooper
InQuira, California
Two Roads Diverged in a Google World
Is Enterprise Search the dull cousin
of web-wide search? Is it destined to play follow-the-leader
in the innovation game, gradually adapting technologies to the
enterprise that were originally created for full web searches
by Google, Yahoo, and MSN?
This presentation suggests that Enterprise Search is not just
one application of full web search technology, but rather a fundamentally
different problem. As such, the technologies of Enterprise Search
and full web search are on unavoidably divergent paths; differences
in content production, quality, and knowledge of domain will
inevitably lead to fundamentally different solutions for the
two problems.
12:20 Conference Lunch
The Emergence of Next Generation
Systems
Brad Allen
Siderean Software, California
Relational Navigation Brings Social Computing and Semantic
Technology to the Enterprise
Enterprise IT managers are increasingly looking for the quickest
and most effective ways to bring the best of Web 2.0 and social
networking into their organizations. This is a difficult task
and many IT managers do not know where to begin and
which technologies offer the fastest return on investment in
this brave new world. One technology that is showing great promise
in terms of bringing together the best of Web 2.0 for the enterprise
is relational navigation, because it can bring together
the best of the Web, such as tagging, sharing and annotating
results to extend the huge investments that have already been
made in enterprise content management (ECM), document and database
management, and enterprise search.
Fundamentally, relational navigation is about providing a more
effective way to find and manage the myriad of content enterprises
import, store and export. It improves ECM by leveraging semantic
technology and the principles of the social Web to aggregate,
organize, manage and navigate an information centric architecture
in ways that were never possible before. By focusing on the relationships
between people, places and things, relational navigation maintains
context and allows participation in the discovery process. In
this presentation attendees will hear:
Chris Cleveland
Dieselpoint, Illinois
Open Pipeline: An Open Architecture for Document Processing
Open Search, a new XML standard, has simplified the process
of standardizing search results from multiple sources. Open Pipeline
is a new standard for the indexing side of the equation. It is
a simple, common architecture for connectors to data sources,
file filters, text analyzers, and modules to distribute documents
across a network. Partly an API and partly a feed protocol based
on Atom/RSS, it provides an open, non-proprietary way to fetch,
parse, analyze, and route documents.
2:45 Break and Final Exhibition
Kelly Stirman
Mark Logic Corporation, California
Classification of XML: Leveraging Semantics and Syntax
In today's fast-moving marketplace, content applications
need to quickly adapt to new content sources and new market demands.
MarkLogic Server's XML classifier is a new tool to let content
owners:
- Classify XML at any level of granularity: assign class membership
for the whole document, an individual element, or anywhere in
between
- Classify synthetic documents: use the title, abstract, and
first paragraph of each section, ignoring the footnotes
- Classify based on semantics or syntax: leverage indexes for
text, structure, or a mixture of both
- Incorporate classification into the rendering logic: allow
classification output to dynamically affect the rendering of
content
This presentation describes how MarkLogic exposes its XML
classifier through an XQuery interface, and shows a demonstration
of the XML classifier at work within a content application built
using MarkLogic.
Presentation of the Everett Brenner Award for the Best Paper
at the 2008 Search Engine Meeting
Meeting Wrap-up Panel: What we Liked. What we Learned
Two expert industry commentators reflect on what was said
during the two days of the 2008 Search Engine Meeting and, with
the help of the audience, draw some lessons and conclusions.
Conference Ends at approximately 4.00 pm