Computational Linguistics 20-21
The Computational Linguistics course is taught by Raffaella Bernardi (UniTN)
We will have a mixture of blended classes (some students in presence and others online), online-only classes (all students attend synchronous live video lectures online), and asynchronous video lessons (pre-recorded short lectures). The content of the video lessons will be discussed during synchronous meetings. All synchronous classes (both blended and online-only) will be recorded and posted in Moodle right after the lecture takes place. Detailed schedule and information. Only students enrolled in the LMI track can attend classes in presence; all other students must attend online. Please read the UniTN COVID-19 rules.
Location: room 21, Palazzo Istruzione, Rovereto (for in-presence attendance). The Zoom link is posted in Moodle; please register for it and contact the lecturer if you do not have access to it. Video lessons are available in Moodle.
Please NOTE: the online calendar DOES NOT take into account the coordination of classes between this course and the course on Computational Skills for Text Analysis by Ducceschi. The calendar below contains the correct information.
Platform
- For the Reading Groups, we will use Perusall: students have to enroll in the course on the platform. You can find the code in Moodle.
- For the CL Labs, we will use CoLab
Information about the final exam
For LMI students, the exam will consist of three parts, which contribute to the total mark as follows:
- 25%: Assignments (RG Baroni et al 2014, Linzen 2020 and CL lab 19th of Nov.)
- 25%: written exercises on Syntax, Semantics and their interface
- 50%: written report on a topic selected among those presented in class. The report can be either a project proposal based on a literature review or the report on a project based on a literature review. The report has to be written in LaTeX.
Students from Data Science will do only the project (75%) and the Assignments (25%). The project workload has to be discussed with the lecturer.
Non-attending students (non-frequentanti) need to contact the lecturer two months before the exam.
Students have to agree on the topic of the report with the lecturer at least one month before the exam.
We will rely on programming skills taught by Luca Ducceschi in the course Computational Skills for Text Analysis (first semester). Students are highly recommended to attend it, in particular if they lack a computational background. My course is complementary to Carlo Strapparava's course on Human Language Technologies (second semester). The Formal Semantics part will be presented in depth in Roberto Zamparelli's course on Logical Structures of Natural Language (second semester).
My slides will be posted below after each class.
Topics with a rough schedule
- 6 classes on Syntax (Sep-Oct): Formal Grammars of English, Syntactic Parsing, Statistical Parsing, Dependency Parsing.
- classes on Semantics (Oct): Formal Semantics (4), Distributional Semantics Models (6) -- The Representation of Sentence Meaning, Computational Semantics, Neural Models of Sentence Meaning
- 1 class on evaluation methods and metrics (Nov)
- 2 classes on Multimodal Models (Nov): Language and Vision
- 4 Reading Groups (Sep-Nov) on cutting edge topics related to those discussed in class.
Schedule
- 1.) 21.09.2020 (10:15-11:45, blended) Intro to the course. NB. We are on Zoom too!
- Introduction to CL frontal class
- Assignment for class 2.)
- Luca Ducceschi: Intro to Python programming, intro to NLP, tokenization and regular expressions. (intro + 9 classes, the last one on the 2nd of Oct.)
- 2.) 01.10.2020 (18:00-19:30, online) Reading Group:
- Tenney et al. ACL 2019 NLP pipeline and Neural Networks
- Jawahar et al. ACL 2019 Syntax-Semantics and Neural Networks
- In class online quiz on new technical vocabulary used so far
- Syntax: Extra material: NLTK Ch. 8 and Syntactic Tree Structures, Demos: DG
- 3.) 05.10.2020 (10:15-11:45, blended) Intro to CFG (Quiz and discussion on the 40 min video on CFG)
- 4.) 05.10.2020 (17:00-18:00, online) Exercises on CFG
- 5.) 07.10.2020 (10:30-12:00, blended) Intro to TAG, DG and CG and Chomsky Hierarchy (20 min video on TAG, DG and CG and 15 min video on Chomsky Hierarchy + meeting in class)
- 6.) 07.10.2020 (13:30-15:00, blended) Exercises on TAG and CG
- 7.) 08.10.2020 (18:00-19:00, online) Exercises on Formal Grammars.
- 8.) 09.10.2020 (11:00-12:00, blended) Intro to Parsing (two videos (28 min and 22 min)+meeting in class)
- Off-class quiz on the concepts introduced in this part (self-assessment)
- Luca Ducceschi: PoS Tagging and Lemmatization (6 classes)
- Formal Semantics: Extra material: Semantic Parser by MacCartney
- 9.) 19.10.2020 (10:15-11:45, blended) Recap about Syntax and Reading Group
- Wilcox et al. 2018 Filler-Gap and LSTM Blackbox
- Kulmizev et al. ACL 2020 Do Neural Language Models Show Preferences for Syntactic Formalisms?
- 10.) 19.10.2020 (17:00-19:00, online) Introduction to logic (watch the 17 min video on intro to semantics)
- 11.) 21.10.2020 (10:30-12:30, blended) Introduction to Formal Semantics (no video)
- 12.) 22.10.2020 (18:00-19:30, online) Compositionality: Lambda calculus (function application) (24 min video and exercises)
- 13.) 23.10.2020 (11:00-12:30, online) Compositionality: Lambda calculus (abstraction) (no video)
- 14.) 26.10.2020 (10:15-11:45, blended) Syntax-Semantics (24 min video)
- Off-class quiz on the concepts introduced in this part (self-assessment)
- Luca Ducceschi: bigrams
- Distributional Semantics
- Extra Material: Linear Algebra (video by G. Strang)
- Extra Material: Baroni 2013
- 15.) 28.10.2020 (10:30-12:30, blended) Introduction to DSM (videos of 17 and 27 mins)
- 16.) 29.10.2020 (18:00-19:30, online) Exercises on syntax-semantics
- 17.) 02.11.2020 (10:15-11:45, blended) Compositionality in DSM
- 18.) 04.11.2020 (13:30-15:00, online) Pen-paper exercises on vectors
- 19.) 05.11.2020 (18:00-19:30, online) Reading Group
- Baroni et al. 2014 Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors
- Extra Material: Baroni 2013
- Luca Ducceschi: W2V training
- Evaluation methods, metrics, the issue of bias; and Language and Vision
- Extra material: Dror et al 2018, Chris Potts and Geiger et al 2020; Baroni 2016 and Kafle, Shrestha and Kanan 2019
- 20.) 09.11.2020 (10:15-11:45, blended) Datasets, evaluation metrics
- 21.) 09.11.2020 (17:00-18:30, online) Lab 1 on Baroni et al 2014 (Cosine similarity and Accuracy)
- 22.) 11.11.2020 (13:30-15:00, online) Intro to Language and Vision
- 23.) 12.11.2020 (18:00-19:30, online) Lab 2 on Baroni et al 2014 (Correlation and Purity)
- Off-class quiz on the distributional semantics concepts introduced in this part (self-assessment)
- 24.) 16.11.2020 (10:15-11:45, blended) LaVi at CIMeC
- 25.) 18.11.2020 (13:30-15:00, online) MSc Thesis Presentation by Alberto Testoni
- 26.) 19.11.2020 (18:00-19:30, online) Follow-up Lab on Baroni et al 2014
- 27.) 23.11.2020 (13:00-14:30, online) Reading Group
- Tal Linzen How Can We Accelerate Progress Towards Human-like Linguistic Generalization?
- 28.) 25.11.2020 (13:30-15:00, online) Supervision working groups on project design
- 29.) 26.11.2020 (18:00-19:30, online) Sample exam (CFG,CG,lambda, vectors)
- 30.) 30.11.2020 (10:15-11:45 or in the pm??, online) Discussion Sample exam
- Students Project Presentations
- Extra material: MacCartney
- 31.) 02.12.2020 (13:30-15:00, online) Project Proposal Presentations
- 32.) 03.12.2020 (18:00-19:30, online) Project Proposal Presentations
Students should start thinking about the project they want to bring to the exam.
Further Materials
- Online courses at MIT
- Online course at Stanford on Natural Language Understanding by Christopher Potts and Bill MacCartney
- Introduction to Semantics and Pragmatics by Christopher Potts
- A case for deep learning in semantics 2018 by Christopher Potts.
- ACL 2020 TUTORIAL "Reviewing Natural Language Processing Research"
- Speech and Language Processing (SLP)
- Steven Bird, Ewan Klein, and Edward Loper Natural Language Processing with Python: Ch 8 (CFG and DG), Ch. 9 (Feature Structures) and Ch. 10 (meaning)
If you are interested in textbooks about FS:
- For formal semantics and in particular lambda-calculus: Mathematical Methods in Linguistics by Barbara Partee, Alice ter Meulen, Robert Wall
- For a general intro to FS: Introduction to Natural Language Semantics by Henriette de Swart
For further information, see the surveys and tutorials below:
- Barbara Partee (2018) Formal Semantics. In the handbook of Formal Semantics ed. Maria Aloni and Paul Dekker. (in dropbox)
- Alessandro Lenci (2008) Distributional semantics in linguistic and cognitive research
- Turney and Pantel (2010) From Frequency to Meaning: Vector Space Models of Semantics
- Katrin Erk. (2012) Vector space models of word meaning and phrase meaning: a survey. Language and Linguistics Compass 6(10), 635-653, October 2012. (in dropbox)
- Marco Baroni (2013) Composition in Distributional Semantics. Language and Linguistics Compass 7(10), 511-522, October 2013. (in dropbox)
- Gemma Boleda and Aurelie Herbelot (2016) Formal Distributional Semantics: Introduction to the Special issue. Computational Linguistics 42:4
- Raffaella Bernardi, Ruket Cakici, Desmond Elliott, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis, Frank Keller, Adrian Muscat and Barbara Plank Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures. JAIR (Journal of Artificial Intelligence Research). Vol. 55, 2016. DOI:10.1613/jair.4900
- Emily Bender Semantics and Pragmatics ACL 2018
- Mrinmaya Sachan, Minjoon Seo, Hannaneh Hajishirzi, and Eric Xing Standardized Tests as benchmarks for Artificial Intelligence
- Percy Liang and Christopher Potts Bringing machine learning and compositional semantics together
Tools and further links
- SippyCup semantic parser
- lambda viewer and lambda interpreter
- Tree viewer
- Online interface to query English and Italian semantic models
- DISCO, another online interface (multiple languages)
- word2vec, the tool and pre-compiled semantic vectors
- Semantic vectors, pre-compiled using word2vec with optimal parameters
- Gensim, Python Framework for Vector Space Modeling
- spaCy
- See Awesome Community-Curated NLP List
- Mailing lists: Corpora
- Top conferences are run by ACL
Last modified: Tue Nov 24 09:08:57 CET 2020