Quality-based Indexing of Web Information

QuIEW DIT-PRJ-04-049

Status NOT active project
DISI role Partner
Project type Research Project
Dimension Trentino
Acquisition date 2004-10-11
Start date 2004-10-11
End date 2007-12-31
SAP code 40100826

Project details

Project abstract In principle, web content is easily available. In practice, the main way to access web information is through search engines, and the view they offer is limited. A query selects hundreds or thousands of documents, which the search engine ranks using ranking heuristics. Users typically look only at the first few result pages, so what they see is, in effect, the search engine's view of the web. The ranking schemes are partial, and even if their principles are public, the details of the preferences are well-guarded commercial secrets. As a consequence, web access mediated by search engines is a fundamental service on the one hand; on the other hand, high-quality information retrieval requires an approach different from a single centralized service, which implies a loss of visibility.
In this project we propose methods for classifying and organizing web content. The methods are based on the concept of views of the web, namely views that are related to a particular topic and take into account the quality of content.
Documents related to a given topic are retrieved by means of focused crawling. Indexing and retrieval goals are met by means of content-based and collaborative filtering techniques. This topic-based search naturally leads to a distributed organization of the search process, based on a peer-to-peer architecture.
This general structure is applied to automatically extracting from the web the information needed by digital libraries, a context where quality plays a major role. The methodologies we will develop will support the human activity of extracting topic-based information, selecting it according to quality criteria, and organizing it. In particular, the University of Trento will develop techniques for sharing the knowledge of the users involved in the process.
Keywords web technologies, machine learning, distributed indexing, document classification
Funding 1,227,400 €
Partners
  • ITC-Irst
  • DIT - University of Trento
  • University of Siena
  • D-Think
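
The abstract names focused crawling but does not describe the crawler used in the project. As a rough illustration only, here is a minimal best-first focused-crawling sketch. It assumes a crude keyword-overlap relevance score as a stand-in for the content-based classifiers the project mentions. The `fetch` callback, the toy `WEB` graph, and the `threshold` and `max_pages` parameters are hypothetical.

```python
import heapq
import re


def tokens(text):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z]+", text.lower()))


def relevance(text, topic_terms):
    """Fraction of topic terms that appear in the page text (0.0 to 1.0).

    Hypothetical stand-in for a real content-based topic classifier.
    """
    if not topic_terms:
        return 0.0
    return len(tokens(text) & topic_terms) / len(topic_terms)


def focused_crawl(seeds, fetch, topic, threshold=0.5, max_pages=100):
    """Best-first crawl: always expand the most promising frontier URL.

    fetch(url) -> (text, outlinks) is a hypothetical page-fetch callback.
    max_pages caps the number of fetches, including pruned pages.
    Returns the (url, score) pairs judged on-topic, in visit order.
    """
    topic_terms = tokens(topic)
    frontier = [(-1.0, url) for url in seeds]  # min-heap on negated priority
    heapq.heapify(frontier)
    seen = set(seeds)
    on_topic = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        text, outlinks = fetch(url)
        fetched += 1
        score = relevance(text, topic_terms)
        if score < threshold:
            continue  # prune: do not follow links out of off-topic pages
        on_topic.append((url, score))
        for link in outlinks:
            if link not in seen:
                seen.add(link)
                # unvisited children inherit the parent's score as priority
                heapq.heappush(frontier, (-score, link))
    return on_topic


# Hypothetical toy web: url -> (page text, outgoing links)
WEB = {
    "a": ("digital library quality indexing", ["b", "c"]),
    "b": ("football scores and news", ["d"]),
    "c": ("quality of digital library metadata", ["d"]),
    "d": ("library indexing quality", []),
}
pages = focused_crawl(["a"], lambda u: WEB[u], "digital library quality")
print([url for url, _ in pages])  # ['a', 'c', 'd']
```

The design choice that makes the crawl "focused" is the pruning step. Links found on an off-topic page (here `b`) are never queued, so the crawler spends its fetch budget near relevant content instead of exploring the whole graph breadth-first.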

DISI Sub-project details

Project abstract The goals described in the project abstract above.
Keywords web technologies, machine learning
Funding 49,500 €
Manager Enrico Blanzieri