Generalized Analysis of Logs for Automatic Translation and Episodic Analysis of Searches

GALATEAS DIT-PRJ-11-009

Homepage http://www.galateas.eu/
Status active project
DISI role Partner
Project type Research Project
Dimension International
Acquisition date 2010-04-01
Start date 2010-04-01
End date 2013-03-30

Project details

Project abstract The GALATEAS project offers digital content providers an innovative approach to understanding users' behaviour by analysing language-based information from transaction logs, and it facilitates the development of improved navigation and search technologies for multilingual content access.

The objectives of GALATEAS are the following.

 * Query log analysis. Analyse transaction logs containing the queries submitted to a given content provider's search engine, and produce customized reports on the information needs of the users accessing that particular aggregation. The analysis is based on linguistic and statistical features.
 * Query translation. Translate queries coming from an external search engine into several target languages. The external search engine will use these translations to return results in languages other than the one in which the query was formulated. The languages addressed in the context of GALATEAS are Italian, French, English, German, Dutch, Modern Arabic and Polish.

The sound integration of GALATEAS's infrastructure with current digital content systems will be achieved by coupling statistical and natural language processing techniques with existing information retrieval systems using a web-services-based framework.
Keywords Digital Libraries, Query Log Analysis, Multilinguality
Funding 1850000 €
Partners
  • DISI - UniTN
  • XRCE - Xerox Research Center Europe
  • CELI
  • Object Direct
  • BRIDGEMAN ART LIBRARY
  • GONETWORK
  • ISLA-University of Amsterdam
  • IBI-Universität zu Berlin
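
The query log analysis objective in the project abstract can be illustrated with a minimal sketch. It assumes the log has already been reduced to a list of raw query strings; the function name `summarize_queries` and the sample data are hypothetical, and simple frequency counting stands in for the linguistic and statistical features the project actually uses.

```python
from collections import Counter


def summarize_queries(log_lines):
    """Count normalized queries and individual terms in raw query strings."""
    query_counts = Counter()
    term_counts = Counter()
    for raw in log_lines:
        # Normalize: lowercase and collapse runs of whitespace.
        query = " ".join(raw.lower().split())
        if not query:
            continue  # skip empty log entries
        query_counts[query] += 1
        term_counts.update(query.split())
    return query_counts, term_counts


# Hypothetical sample log, for illustration only.
sample_log = ["Monet water lilies", "monet  Water Lilies", "van gogh", "Monet"]
queries, terms = summarize_queries(sample_log)
print(queries.most_common(3))
print(terms.most_common(3))
```

A report for a content provider could be built from such counts, for example by listing the most frequent queries and terms per time period.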

DISI Sub-project details

Project abstract UniTN leads the work package (WP) on Algorithm Tuning.

This WP will produce most of the algorithms for query analysis, to be integrated either in other technological WPs or in business activities. The main algorithms to be extended or developed are the following:

 * Extended TLike: for identifying queries that are likely translation equivalents of each other.
 * Clustering: for grouping together all queries that represent the same information need.
 * Topic Computation: for identifying queries that match the category tree adopted by the content provider (e.g. the specific subject headings system or classification system of a certain library).

UniTN is also involved in the evaluation of these algorithms (WP7) and in the development of a Named Entity Recognizer from query logs (part of WP2).
Keywords Topic Models, Clustering, Classification, Named Entity Recognizer
Funding 20901 €
Manager Massimo Poesio
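
As a rough illustration of the Clustering task named in the sub-project abstract (grouping queries that express the same information need), the sketch below uses greedy single-pass clustering on Jaccard token overlap. This is a stand-in technique chosen for brevity, not the project's algorithm; the function names, the 0.5 threshold, and the example queries are all assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of tokens."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_queries(queries, threshold=0.5):
    """Greedy single-pass clustering.

    Each query joins the first cluster whose seed (its first member) is
    similar enough; otherwise it starts a new cluster.
    """
    clusters = []  # list of (seed_tokens, member_queries)
    for q in queries:
        tokens = set(q.lower().split())
        for seed_tokens, members in clusters:
            if jaccard(tokens, seed_tokens) >= threshold:
                members.append(q)
                break
        else:
            # No existing cluster was close enough: start a new one.
            clusters.append((tokens, [q]))
    return [members for _, members in clusters]


# Hypothetical queries, for illustration only.
example = [
    "monet water lilies",
    "water lilies monet painting",
    "van gogh sunflowers",
    "sunflowers van gogh",
]
groups = cluster_queries(example)
for group in groups:
    print(group)
```

Token overlap ignores word order, so reordered queries fall into the same group, but it cannot relate queries that share no surface words (synonyms, translations); that gap is what the linguistic features mentioned in the project abstract would need to cover.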