ICIC
The International Conference for Science & Business Information
Nice, 19-22 October 2008
This page last updated 10 July 2008. Programme contents as of this date.

The ICIC meeting covers trends in the field of scientific and professional information. The 2008 meeting programme has an impressive line-up of significant figures in the information world. Among the topics covered and discussed by the 20+ speakers are:
The programme below is not yet in final speaking order
Day One

Opening Keynote
Carl Horton

Erik Nemeth
Both generalised web search engines and discipline-specific bibliographic databases will need to evolve to remain competitive (comprehensive and authoritative) in discovery of scholarly literature. Initiatives such as Google Scholar and Microsoft Academic-live Search acknowledge the importance of specialisation in searching for scholarly literature, and rising expectations of comprehensive access require that discipline-specific databases increase coverage. In parallel, cross-disciplinary pursuits such as neuroaesthetics (neuroscience and art history) increase the need for an integrated search of specialised databases. By following models of open collaboration in Web 2.0 and applying thesauri in the ontology of the Semantic Web, producers of discipline-specific databases can apply existing knowledge bases not only to expand coverage and maximise discovery of scholarly literature but also to foster interdisciplinarity. A strategy for leveraging primary assets of a specialised database (discipline-specific partnerships, expert abstracts and indexing, and discipline-specific thesauri) serves as a case study. The strategy illuminates the potential for integrating a discipline-specific database in the humanities with datasets from the sciences through the evolving infrastructure of the Web.

Stephen Leicht
Digitised
content, ubiquitous data access and information overload are hallmarks of
the Web 1.0 revolution of the 1990s. As the information on a
company, an industry and the competitive landscape became digitised, new
media of relevant content also emerged – company websites, FAQs,
blogs, online journals, social networks and other new media. The
combination of overwhelming digitised information with new content has
offered possibilities and challenges for competitive intelligence. While Wall Street has historically been guided by the numbers, more defining
information in brand messaging, consumer feedback, online reviews, company
reports, patent applications and other less structured content can alter
future performance. At the same time, innovations in text analytics
have allowed for robust data mining from unstructured heterogeneous sources.

Jürgen Müller

Francisco Webber
Most of the current IR (Information Retrieval) research efforts point
towards its application in the consumer domain, where the requirements tend
to focus on broadness rather than depth. By contrast, professional IR needs
maximum precision, recall AND efficiency. The experimental investigation and the practical evaluation of existing
methodologies have shown that there is little probability of finding a single
algorithm that will satisfy all the needs of professional patent searchers.
Hence, there is a need for a variety of different Natural Language
Processing (NLP) techniques to be applied to the global patent corpus in order to
significantly improve patent retrieval.

Peter Vanderheyden
Latent Semantic Analysis (LSA) is a powerful information retrieval tool that provides searchers with an effective way to locate and semantically rank related documents while overcoming the search problems associated with synonymy and polysemy. While advanced searchers still rely heavily on Boolean searching because of its high precision, the quality of Boolean searching is dependent on the searcher’s experience level, knowledge of the content set and search engine, and ability to enter all relevant keywords as part of their search. Since various unknown keywords can be used to describe a concept, Boolean searching may result in reduced recall. Although LSA is limited in its ability to improve precision, it can dramatically improve recall, finding documents that Boolean searches may miss by analyzing document sets and terms to reveal concepts, especially when document sets span varied or noisy texts or contain multiple languages. This presentation will outline the pros, cons and synergy between Boolean and LSA and discuss the value of LSA for the information professional.

Maik Annies
Searching non-patent literature prior art is crucial for checking
patentability of new inventions and validity of granted patents, since by
patent law information contained in non-patent literature is as important as
any patent document. Relevant subject matter is not always the focus of a
publication, but often hidden in the text, and therefore not always indexed
in bibliographic databases of classical online hosts. Thus, comprehensive
information retrieval requires searching the full-text of journals and the
internet. In this context retrieval of chemical structures from these
sources is a major challenge.

John Bambridge
The EPO manages one of the world's most comprehensive collections of
technical documentation, accessed daily by thousands of internal and external
users through electronic tools developed to support the patent granting process.
Guiding the further evolution of these services to a full electronic end-to-end
granting process for the benefit of all users is a major challenge in the
strategic objectives of the EPO. At the same time, the growth in international
data exchange and handling of the increasing amount of patent-related
documentation, particularly from Asia, needs to be addressed so as to ensure
that user expectations are met with efficient tools.

US Patent & Trademark Office, USA
The Changing World of Search and Information Access at the USPTO
Effective access to Intellectual Property (IP) information is a key component of the USPTO's mission. By disseminating this information through its public search systems and data products, the USPTO provides the public the means to foster the competent preparation of patent and trademark applications, avoid infringement of patents and trademarks, and understand the current state of the art as a basis for new ideas. This interactive presentation will focus on the new and innovative approaches being explored by the USPTO to more effectively provide access to the USPTO's extensive body of scientific knowledge. Current projects that will support the modernisation of internal USPTO automation systems to enhance text and search-related capabilities will be discussed. The interactive portion of this presentation will focus on a topic of key interest to the USPTO: improving automated access to the USPTO's systems, so that patent information can be delivered to all users, including 'automated' / data mining users, in an efficient manner. The presentation will engage the audience in a discussion on key data dissemination issues and ideas on approaches to improve the electronic access and delivery of information to the business and research communities.

Sophia Ananiadou

Ann Perry
In recent years Open Innovation has become increasingly important as a
way to approach business, and R&D in particular. Organisations big and
small no longer assume that the answers to R&D challenges can be
generated internally. It is therefore increasingly important to find the
right partner to collaborate with - not just any partner - which often means
a combination of technical capability and commercial viability.

Conference Mixer Cocktail sponsored by Questel and Matrixware

Tuesday 21 October 2008

Bruno van Pottelsberghe
The patent is a policy tool aimed at stimulating innovation. This presentation first explains the economic role of patent systems and their importance in innovation systems. In this respect the design of patent systems is a key issue. A particular focus will be put on the quality and cost factors through historic and recent cases as well as with simulations. The presentation is inspired by recent research and the book authored by Guellec and van Pottelsberghe (2007), “The economics of the European patent system”, Oxford University Press, which calls for a more ‘economic’ approach in the design of patent systems.

Martin Griffies
Ariadne Genomics, USA
Using Automated Corpus Visualisation and Summarisation to Improve Literature and Reference Comprehension
A significant proportion of scientists’ and information specialists’
time is spent reading, annotating and analysing information sources,
ranging from news feeds to patents to full-text scientific literature. Understanding all the implications of a corpus is a challenge which can be
tackled by using tools which combine text processing, entity extraction,
automatic recognition of correlations between those entities, and graphics.
Visualisations of text corpora using techniques such as entity and
relationship (fact) frequency tables, Venn diagrams, heat maps and
computer-generated networks, pathways or spidergrams make comprehension easier
and will save time. Graphic analysis of large text corpora is perhaps the only
way to perform this task effectively, ascribing authority and reliability to
automatically extracted relationships from multiple sources.

Luca Toldo and Caroline Kant-Mareda*
The ever increasing amount of available scientific literature sparks new approaches for knowledge extraction. At Merck Serono, we are using state-of-the-art text mining technology to discover "hidden" / new links between biomedically relevant entities. In this way we try to validate new scientific hypotheses to add value to our molecules in the pipeline. In this presentation we show our latest experiences, exemplified by a well validated case study.

Anton Heijs
Text mining is now being used more in patent and non-patent literature search, especially to analyse large complex data sets rapidly. The supervised approach (classification) and the unsupervised approach (clustering and projection techniques) are both popular in text mining and together provide strong instruments for various tasks. Text mining and advanced visualisation are two important techniques in patent analytics. This presentation presents the work of Treparel undertaken together with Philips on the combined usage of classification and clustering and different advanced visualisation techniques. The technical principles and the business case of some applications of text mining and visualisation will be presented and discussed.

Christopher Southan
The last few years have seen a revolution in open cheminformatics as exemplified by the growth of PubChem, DrugBank and other databases. Consequently, medicinal chemists and biologists now have access to high utility public sources of bioactive compounds that they can not only download and/or query directly over the Web but that also link to structured bioinformatic data. This work (PubMed ID 17897036) reviews compound content comparisons between selected public and commercial databases, particularly those that specify relationships between compounds and their activity against primary protein targets, thereby linking chemistry to biology.
After collecting 19 different commercial and public data sources, including selected bioactive sub-sets, stringent filtering for unique content was applied to facilitate standardised comparison of content. The resultant 19x19 matrix shows the pair-wise comparison of each set of compounds. Detailed results will be presented but overall they emphasise the complementarity of combining sources. This conclusion is supported by a Venn-type analysis of GVKBIO, WOMBAT (both commercial) and PubChem (public). These compound databases show not only overlap but also unique content and types of molecular target bioinformatic connectivity in each case because of their different strategies for source selection and expert curation.

The Information Community Panel: The Next Five Years
An interactive panel, moderated by Randall Marcinko, that brings expert panelists together with the audience for comments and analysis concerning challenges for information users and producers over the next five years. Expert panelists include Fabienne Berthet (IPSEN), Vin Caraher (Thomson Scientific), Peter Kallas (BASF) and Stephen Leicht
(Collexis).

Steven Hajkowski
Patent data can be searched either from a collection of first-level original patent datasets from the issuing authorities, or from single sources of value-add data from the commercial information providers. In terms of the results obtained, each has its own advantages, for example the first-level data can provide the most comprehensive text-based searching, whereas the value-add databases offer abstracts in English for many more countries, plus advanced indexing to aid retrieval. In addition, combining and de-duplicating results from the various sources can be difficult, and the differing methods of calculating patent family relationships can bring further complications. This presentation examines a case study demonstrating these issues, comparing a search from first-level and value-add patent sources. Options for combining and de-duplicating the results are then discussed, as are the possibilities for creating answer sets compiled according to INPADOC and invention-based patent families.

Irene Schellner
Currently, more than half of all new patent applications published in the
world are written in Japanese, Chinese or Korean. Japan, China and Korea are
all among the top five biggest patenting nations in the world. Every year, the
Japanese Patent Office receives some 400,000 patent applications, the majority
of which are filed by domestic applicants. In the last ten years, applications
from domestic applicants doubled in Korea, and increased more than eight-fold
in China. A considerable part of the prior art thus generated in East Asia
will stay at a national level and not be published elsewhere in the world in a
western language.

Jane List
Using citations can be a convenient method for expanding searches in patent and scientific literature and this technique is well known. Cited references and citing references can add hits backwards in time, forwards in time and also laterally. In a patent search citations can provide new approaches for the search, new search terms and new potential applications. This presentation looks at the different types of patent citations, what they mean and how they can add insight for searching. Examples of patent searches such as competitor patent monitoring, prior art and invalidity searching may be used to explore the strengths and weaknesses of using A, X, Y and examiner citations. Patent citations from patent offices around the world are indexed by several vendors and also by patent office and other independent search engines for the benefit of patent information searchers. The approaches to citations taken on databases such as Patbase, DPCI and esp@cenet are reviewed for different searches. To finish, we take a look at graphic visualisations of patent citations. How useful are patent citation trees in gaining deeper understanding of the patent landscape?

Conference Dinner at Marineland
Buses Sponsored by BizInt Solutions
Conference Dinner Welcoming Cocktail Sponsored by Chemical Abstracts Service
Conference Dinner Sponsored by Thomson Reuters and Prous Science
Conference Dinner Flowers and Décor Sponsored by Minesoft

Wednesday 22 October 2008

William Town

To circumvent this problem we have developed a text mining service model called XTractor. XTractor is highly accurate and more efficient than many NLP engines, since we use a hybrid technology of semi-automated data mining, which means the process involves NLP mining followed by a layer of manual validation. So we end up getting the most accurate hits for genes, diseases, drugs and many more entities.
Since the annotation is accurate, we would also be able to perform complex queries and retrieve the most complex relations in PubMed, which is currently not possible with conventional NLP systems. We have been able to achieve up to 99% accuracy in term pickups and relationship extraction with the XTractor system. A few advantages of the XTractor system are as follows:
With XTractor, the entities and terms in the sentences are manually categorised according to public biological ontologies, and users can create their own databases of sentences and relations for their sets of keywords. XTractor also lets users change their keyword preferences from time to time.

Christiane Wolff

Simon Gittins
According to leading technology analyst firm Forrester Research, the top
concern of market and competitive intelligence professionals is "a
wellspring of competitive and market insight that goes untapped."
Researchers, scientists and marketers are all struggling with the best way to
understand their own products' competitive weaknesses and strengths, to know
when competitors will announce their next product or upgrade, and to learn whether their
product line will soon be imitated by a lower cost offering. Imagine being able
to search instantly across multiple data repositories to learn more about your
market in order to ward off outside threats and competition. The ability to find
and digest information easily such as what patents your competitors are applying
for or what new compounds might be on the horizon could be invaluable to an
organization. The reality today is that organizations are doing this by
leveraging the power of enterprise search.

Richard Kidd
The RSC's Project Prospect, which was the first application of semantic web technologies to primary research publishing, won the 2007 ALPSP/Charlesworth Award for Publishing Innovation. The application of open and standard identifiers for both compounds and subject matter has opened new possibilities for linking between related publications and data, which promise to transform the way published chemistry is handled in the next few years. The role of a publisher, between author and reader, offers particular advantages and challenges - to preserve more of the original lab science throughout the publication process while delivering the science in ways that aid discovery and re-use. This presentation discusses the problems with the conventional publication process which we tried to address, the development process, and successes and failures in applying new standards. We look at the InChI and identifying chemical entities, using existing ontologies and building new ones, and their real-life application. While new developments applied to RSC's book and journal portfolio will be highlighted, the application of the underlying technologies can be seen to offer real benefits for both standalone and web-wide chemical information applications.

René Deplanque
In recent years, major publishing houses have started to publish
electronic books which they are offering within their search systems.
Normally, using the search engine of a publishing house, the user can search
the content of that publishing house, either as full text or within defined metadata, and
then download hits as a PDF file, in conformity with licence agreements.
Unfortunately, this is a time-consuming and tedious procedure, because
searching the total content of a publishing house will retrieve all possible
hits, whether in eBooks or journals, and independent of the actual licence
agreement. It is therefore difficult to pinpoint exactly the desired answer in a
licensed eBook. Another problem users face is that libraries have
licences with many publishing houses. In addition, for obvious reasons,
publishing houses do not allow searching across publishing houses. This greatly
hinders the use of eBooks and therefore the development of this new and
important market.

Sasha Gurke
Six COUNTER reports are analyzed for e-books in general and technical reference works in particular to ascertain the value derived from each report by subscribers of online services. Some of the reports provide skewed statistics and are not very useful for aggregated STM e-references; however, COUNTER compliance is frequently a requirement and certainly desired by the subscribers.