Computational Linguistics 20-21

The Computational Linguistics course is taught by Raffaella Bernardi (UniTN).

We will have a mixture of blended classes (some students in presence, others online), online-only classes (all students attend synchronous live video lectures online) and asynchronous video lessons (short pre-recorded lectures), whose content will be discussed during synchronous meetings. All synchronous classes (both blended and online-only) will be recorded and posted in Moodle right after the lecture takes place. Detailed schedule and information. Only students enrolled in the LMI track can attend classes in presence; all other students must attend online. Please read the UniTN COVID-19 rules.

Location for in-presence attendance: room 21, Palazzo Istruzione, Rovereto. The Zoom link is posted in Moodle; please register for it and contact the lecturer if you do not have access to it. Video lessons are available in Moodle.

Please NOTE: the online calendar DOES NOT take into account the coordination of classes between this course and the course on Computational Skills for Text Analysis by Ducceschi. The calendar below contains the correct information.

Platform

Information about the final exam

For LMI students, the exam will consist of three parts, weighted as follows:

  1. 25%: Assignments (reading groups on Baroni et al. 2014 and Linzen 2020, and the CL lab of 19 November)
  2. 25%: written exercises on Syntax, Semantics and their interface
  3. 50%: written report on a topic selected among those presented in class. The report can be either a project proposal based on a literature review, or a report on a project that is itself based on a literature review. The report has to be written in LaTeX.

Data Science students will only do the project (75%) and the assignments (25%). The project workload has to be agreed upon with the lecturer.

Students who do not attend classes (non-frequentanti) must contact the lecturer two months before the exam.

Students have to agree on the topic of the report with the lecturer at least one month before the exam.

We will rely on the programming skills taught by Luca Ducceschi in the course Computational Skills for Text Analysis (first semester). Students are strongly recommended to attend it, in particular if they lack a computational background. My course is complementary to Carlo Strapparava's course on Human Language Technologies (second semester). The Formal Semantics part will be presented in depth in Roberto Zamparelli's course on Logical Structures of Natural Language (second semester).

My slides will be posted below after each class.

Topics with a rough schedule

Schedule

1.) 21.09.2020 (10:15-11:45, blended) Intro to the course. NB. We are on Zoom too!
Introduction to CL (frontal class)
Assignment for class 2.)

Luca Ducceschi: intro to Python programming, intro to NLP, tokenization and regular expressions (intro + 9 classes; the last one on 2 October).

2.) 01.10.2020 (18:00-19:30, online) Reading Group:
Tenney et al. ACL 2019 NLP pipeline and Neural Networks
Jawahar et al. ACL 2019 Syntax-Semantics and Neural Networks

In-class online quiz on the new technical vocabulary used so far


Syntax. Extra material: NLTK Ch. 8 and Syntactic Tree Structures. Demos: DG
3.) 05.10.2020 (10:15-11:45, blended) Intro to CFG (Quiz and discussion on the 40 min video on CFG)
4.) 05.10.2020 (17:00-18:00, online) Exercises on CFG
5.) 07.10.2020 (10:30-12:00, blended) Intro to TAG, DG and CG and the Chomsky Hierarchy (20 min video on TAG, DG and CG, and 15 min video on the Chomsky Hierarchy + meeting in class)
6.) 07.10.2020 (13:30-15:00, blended) Exercises on TAG and CG
7.) 08.10.2020 (18:00-19:00, online) Exercises on Formal Grammars.
8.) 09.10.2020 (11:00-12:00, blended) Intro to Parsing (two videos, 28 min and 22 min, + meeting in class)

Off-class quiz on the concepts introduced in this part (self-assessment)


Luca Ducceschi: PoS Tagging and Lemmatization (6 classes)

Formal Semantics. Extra material: Semantic Parser by McCartney
9.) 19.10.2020 (10:15-11:45, blended) Recap of Syntax and Reading Group:
Wilcox et al. 2018 Filler-Gap and LSTM Blackbox and
Kulmizev et al ACL 2020 Do Neural Language Models Show Preferences for Syntactic Formalisms?
10.) 19.10.2020 (17:00-19:00, online) Introduction to logic (watch the 17 min video on intro to semantics)
11.) 21.10.2020 (10:30-12:30, blended) Introduction to Formal Semantics (no video)
12.) 22.10.2020 (18:00-19:30, online) Compositionality: Lambda calculus (function application) (24 min video and exercises)
13.) 23.10.2020 (11:00-12:30, online) Compositionality: Lambda calculus (abstraction) (no video)
14.) 26.10.2020 (10:15-11:45, blended) Syntax-Semantics (24 min video)

Off-class quiz on the concepts introduced in this part (self-assessment)

Luca Ducceschi: bigrams

Distributional Semantics
Extra Material: Linear Algebra (video by G. Strang)
Extra Material: Baroni 2013
15.) 28.10.2020 (10:30-12:30, blended) Introduction to DSM (videos of 17 and 27 mins)
16.) 29.10.2020 (18:00-19:30, online) Exercises on syntax-semantics

17.) 02.11.2020 (10:15-11:45, blended) Compositionality in DSM
18.) 04.11.2020 (13:30-15:00, online) Pen-paper exercises on vectors
19.) 05.11.2020 (18:00-19:30, online) Reading Group
Baroni et al. 2014 Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors

Luca Ducceschi: W2V training


Students should start thinking about the project they want to bring to the exam.


Evaluation methods, metrics, the issue of bias; and Language and Vision
Extra material: Dror et al. 2018, Chris Potts and Geiger et al. 2020; Baroni 2016 and Kafle, Shrestha and Kanan 2019
20.) 09.11.2020 (10:15-11:45, blended) Datasets, evaluation metrics
21.) 09.11.2020 (17:00-18:30, online) Lab 1 on Baroni et al 2014 (Cosine similarity and Accuracy)
22.) 11.11.2020 (13:30-15:00, online) Intro to Language and Vision
23.) 12.11.2020 (18:00-19:30, online) Lab 2 on Baroni et al 2014 (Correlation and Purity)

Off-class quiz on the distributional semantics concepts introduced in this part (self-assessment)

24.) 16.11.2020 (10:15-11:45, blended) LaVi at CIMeC
25.) 18.11.2020 (13:30-15:00, online) MSc Thesis Presentation by Alberto Testoni
26.) 19.11.2020 (18:00-19:30, online) Follow-up Lab on Baroni et al 2014


27.) 23.11.2020 (13:00-14:30, online) Reading Group
Tal Linzen ACL 2020 How Can We Accelerate Progress Towards Human-like Linguistic Generalization?
28.) 25.11.2020 (13:30-15:00, online) Supervision of working groups on project design
29.) 26.11.2020 (18:00-19:30, online) Sample exam (CFG, CG, lambda calculus, vectors)

30.) 30.11.2020 (10:15-11:45 or in the afternoon, to be confirmed; online) Discussion of the sample exam

Student Project Presentations
Extra material: McCartney
31.) 02.12.2020 (13:30-15:00, online) Project Proposal Presentations
32.) 03.12.2020 (18:00-19:30, online) Project Proposal Presentations

Further Materials

If you are interested in textbooks on formal semantics (FS):

For further information, see the surveys and tutorials below:

Tools and further links


Last modified: Tue Nov 24 09:08:57 CET 2020