Advanced Business Intelligence Techniques

Laurea Magistrale in Computer Science
Academic year 2015-2016, first semester

News

  • [2016-07-12] The text of the July test has been published below.

Lecturer


Schedule

The 6-credit course is composed of lectures (24h) and laboratory sessions (24h).
Laboratory hours are an integral part of the course and introduce several main course topics.

Prerequisites

Students should have covered the following topics in bachelor courses:

  • Probability and Statistics
  • Linear algebra (vector spaces, linear operators and matrices, eigenvalues and eigenvectors)
  • Databases (namely the relational model and SQL)
  • Programming of simple algorithms (the main examples will be provided in Python)

Program

Selected topics (still subject to small changes):
  • Data mining by crawling the web, document representation and indexing;
  • Data clustering;
  • Feature extraction and dimensionality reduction;
  • Recommender systems;
  • The MapReduce framework for distributed processing.

See also the day-by-day syllabus below.

Exams

The exam consists of a written (pen-and-paper) test with theoretical questions and exercises. Students pass with a minimum of 18/30.
An optional oral exam is available for small adjustments to the final mark (±4 points).
Summer 2016 session (dates are not yet confirmed and may change):

            Written exam                  Oral exam
1st call    Thursday, June 9, 9am         TBD
2nd call    Monday, July 11, 9am          TBD
3rd call    Thursday, September 8, 9am    TBD

Bibliography and Course Material

See the syllabus below for details on covered parts.
Some Wikipedia articles are also suggested.


Detailed syllabus

Annotations:
theory: the topic has been discussed in theoretical terms; theoretical questions about it can be expected in the written exam.
exercises: practical exercises on the topic can be expected in the written exam.
lab: the topic has been covered in lab sessions; scripts can be found in the lab code.

Document crawling, parsing, indexing, searching

Sources: Chapter 25 of The LION Way, sections 25.1, 25.2.
Wikipedia article on the confusion matrix.

Complexity reduction and de-noising of data

Sources: MinHash Wikipedia article,
Wikipedia article on PCA,
Wikipedia article on SVD,
Wikipedia article on LSA,
Advances in Collaborative Filtering (up to 3.1 - SVD),
Wikipedia article on the Gradient Descent method,
Wikipedia article on hierarchical clustering.

Distributed computing

Source: the original paper on the MapReduce model.


Page maintained by Mauro Brunato