Grounded Language Processing 21-22
The Grounded Language Processing course is taught by Raffaella Bernardi (UniTN); the TA is Alberto Testoni. Classes are on Tuesdays (15:00-17:00), Wednesdays (13:00-15:00) and Thursdays (13:00-15:00) in Rovereto, Palazzo Fedrigotti, Corso Bettini 31, 3rd floor (seminar room).
The course is part of the degree in Artificial Intelligence Systems, but any UniTN student interested in the topic can attend it as a Free Choice Course -- following the rules of the program in which they are enrolled.
UniTN students who are interested in attending the course but cannot attend it in person are welcome to email me -- we plan to teach the course using a digital board to facilitate virtual participation.
If you are planning to attend the course, please add some information about yourself in this form; it will help us plan the course better.
What this course is about
This course focuses on the emerging field of Grounded Language Processing (GLP), a subarea of AI that studies the connection between natural language, perception and action in the world. It gives students an overview of recent advances while also revisiting the long-standing challenges set by the AI community at its start. It draws connections between natural language processing (NLP), computer vision and robotics. It covers both grounded natural language understanding and grounded natural language generation, as well as unified architectures for these two crucial components of AI agents. If time allows, the course ends by hinting at the connection between GLP and robotics and by comparing the neural representations and attention mechanisms behind grounded natural language in humans with those of state-of-the-art multimodal models.
Each main section consists of both frontal lectures and hands-on sessions.
Prerequisites: The course presupposes knowledge of Machine Learning and Natural Language Processing, and ideally of Computer Vision.
Grading criteria: paper review 15%, presentation of a research question and its SOTA 35%, project 50%. Details can be found here.
- WEEK 1: The Grounding Problem. Readings: Harnad (1990), Pulvermüller (2005), Kafle et al. (2019)
- Tuesday 14.09.21 Intro to the course. Why grounding?
- Wednesday 15.09.21 From the distant to the recent computational past
- Thursday 16.09.21 Reading Group: Mayo (2003)
- WEEK 2: Computational Models for Multimodal Concept Representations. Readings: Baroni (2016), Beinborn et al. (2018)
- Tuesday 21.09.21 Word representation
- Wednesday 22.09.21 (13:15-14:45) Practical Lab (multimodal concept representations)
- Thursday 23.09.21 (13:15-14:45) Reading Group: Lazaridou, Bruni and Baroni (2014) -- see the cross-modal mapping sketch below
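For the lab and the Lazaridou, Bruni and Baroni (2014) reading, a minimal sketch of cross-modal mapping, assuming toy random vectors in place of real word embeddings and CNN image features: learn a ridge regression from text space to visual space on seen concepts, then project an unseen word (the paper's zero-shot "wampimuk" setting) into visual space and retrieve its nearest visual neighbour.

```python
# Cross-modal mapping sketch in the spirit of Lazaridou, Bruni & Baroni (2014).
# All vectors are random placeholders; a real lab would use e.g. word2vec
# embeddings on the text side and CNN features on the visual side.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
n_train, d_text, d_vis = 100, 50, 20

X_text = rng.normal(size=(n_train, d_text))                        # word vectors
W_true = rng.normal(size=(d_text, d_vis))
Y_vis = X_text @ W_true + 0.1 * rng.normal(size=(n_train, d_vis))  # image vectors

# Fit the text -> vision mapping on seen concepts
mapping = Ridge(alpha=1.0).fit(X_text, Y_vis)

# Zero-shot: project an unseen word into visual space ...
unseen_word = rng.normal(size=(1, d_text))
predicted_vis = mapping.predict(unseen_word)

# ... and retrieve its nearest neighbour among candidate images
candidates = rng.normal(size=(10, d_vis))
sims = cosine_similarity(predicted_vis, candidates)[0]
print("closest image index:", int(sims.argmax()))
```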
- WEEK 3: Grounded NL Understanding
- Tuesday 28.09.21 (15:15-16:45) Sentence representation
- Wednesday 29.09.21 Practical Lab
- Thursday 30.09.21 Reading Group: Kamath et al. (2021)
- WEEK 4: Visual Question Answering. Readings: Kafle and Kanan (2017), Bernardi and Pezzelle (2021), Srivastava et al. (2021)
- Tuesday 05.10.21 Task, datasets and models -- a minimal baseline sketch follows this week's block
- Wednesday 06.10.21 Practical Lab
- Thursday 07.10.21 Reading Group: Bugliarello et al. (TACL 2021)
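To make Tuesday's task and models concrete, a minimal sketch of the classic VQA-as-classification baseline: encode the question, fuse it with precomputed image features by concatenation, and classify over a fixed answer vocabulary. All names, layer sizes and the bag-of-words question encoder (standing in for an LSTM) are illustrative assumptions, not any specific published model.

```python
# Minimal VQA-as-classification baseline sketch: fuse a question encoding with
# image features and predict one of a fixed set of answers. Dimensions are
# simplifying assumptions chosen for illustration only.
import torch
import torch.nn as nn

class TinyVQA(nn.Module):
    def __init__(self, vocab_size=1000, d_word=64, d_img=512, n_answers=100):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, d_word)   # defaults to mean pooling
        self.classifier = nn.Sequential(
            nn.Linear(d_word + d_img, 256),
            nn.ReLU(),
            nn.Linear(256, n_answers),
        )

    def forward(self, question_ids, image_feats):
        q = self.embed(question_ids)                 # (batch, d_word)
        fused = torch.cat([q, image_feats], dim=1)   # late fusion by concatenation
        return self.classifier(fused)                # logits over answer vocabulary

# Toy forward pass: a batch of 2 questions (token ids) and 2 image vectors
model = TinyVQA()
questions = torch.randint(0, 1000, (2, 8))           # 8 token ids per question
images = torch.randn(2, 512)                         # stand-in for pooled CNN features
print(model(questions, images).shape)                # torch.Size([2, 100])
```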
- WEEK 5: Grounded NL Generation. Readings: Hossain et al. (2019)
- Tuesday 12.10.21 Datasets and models
- Wednesday 13.10.21 Practical Lab
- Thursday 14.10.21 Reading Group: Vinyals et al. (2015) -- see the decoding sketch below
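Ahead of the Vinyals et al. (2015) reading group, a minimal sketch of "Show and Tell"-style caption generation: an image feature conditions an LSTM decoder, which is then decoded greedily until an end token. The weights are untrained and all sizes, the random "image" and the special token ids are placeholder assumptions.

```python
# Greedy caption decoding sketch in the spirit of Show and Tell (Vinyals et al.
# 2015): an image feature initializes an LSTM, which emits one word at a time.
import torch
import torch.nn as nn

vocab_size, d_model, BOS, EOS = 50, 32, 0, 1
embed = nn.Embedding(vocab_size, d_model)
lstm = nn.LSTMCell(d_model, d_model)
to_vocab = nn.Linear(d_model, vocab_size)
img_proj = nn.Linear(512, d_model)      # map the CNN feature to decoder space

image_feat = torch.randn(1, 512)        # stand-in for a CNN image encoding
h = img_proj(image_feat)                # the image initializes the hidden state
c = torch.zeros_like(h)

token, caption = torch.tensor([BOS]), []
for _ in range(20):                     # hard cap on caption length
    h, c = lstm(embed(token), (h, c))
    token = to_vocab(h).argmax(dim=1)   # greedy: most probable next word
    if token.item() == EOS:
        break
    caption.append(token.item())
print(caption)                          # a (meaningless) untrained caption
```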
- WEEK 6: Visual Dialogues. Readings: Chen et al. (2020), TBC
- Tuesday 19.10.21 Datasets and models
- Wednesday 20.10.21 Practical Lab
- Thursday 21.10.21 Reading Group: Holtzman et al. (2020) -- see the top-p sampling sketch below
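The Holtzman et al. (2020) reading concerns decoding; a minimal sketch of its core proposal, nucleus (top-p) sampling, on a toy logits vector (the vocabulary and values are invented placeholders): keep the smallest set of tokens whose cumulative probability reaches p, renormalize, and sample from that set.

```python
# Nucleus (top-p) sampling sketch (Holtzman et al. 2020): sample only from the
# smallest set of tokens whose cumulative probability mass reaches p.
import numpy as np

def nucleus_sample(logits, p=0.9, seed=0):
    rng = np.random.default_rng(seed)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                       # softmax over the vocabulary
    order = np.argsort(probs)[::-1]            # tokens by descending probability
    cum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cum, p)) + 1  # smallest prefix with mass >= p
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()  # renormalize
    return int(rng.choice(nucleus, p=nucleus_probs))

toy_logits = np.array([2.0, 1.0, 0.5, -1.0, -3.0])  # placeholder 5-word vocabulary
print(nucleus_sample(toy_logits))
```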
- WEEK 7: Neural Representations (Stefania Bracci)
- Tuesday 26.10.21 Vision in the Brain
- Wednesday 27.10.21 Practical Lab (with Alberto and Raffaella)
- Thursday 28.10.21 Reading Group (Stefania's paper)
- WEEK 8: Work on Language and Vision at CIMeC (?)
- Tuesday 02.11.21 Lab on annotation, inter-annotator agreement and correlation -- see the sketch after this week's block
- Wednesday 03.11.21 GLP at LaVi's -- Alberto's current project. Define the research questions of each group.
- Thursday 04.11.21 Reading Group: Suglia et al. (2020)
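For Tuesday's lab, a minimal sketch of the two statistics named there, computed on invented placeholder data: Cohen's kappa for inter-annotator agreement on categorical labels, and Spearman correlation for rank agreement between two sets of ratings.

```python
# Toy computation of the lab's two statistics: Cohen's kappa for
# inter-annotator agreement on categorical labels, and Spearman correlation
# for rank agreement between two raters' scores. All data are placeholders.
from sklearn.metrics import cohen_kappa_score
from scipy.stats import spearmanr

annotator_a = ["yes", "no", "yes", "yes", "no", "yes"]
annotator_b = ["yes", "no", "no",  "yes", "no", "yes"]
print("Cohen's kappa:", cohen_kappa_score(annotator_a, annotator_b))

ratings_a = [4, 2, 5, 3, 1]
ratings_b = [5, 1, 4, 3, 2]
rho, pval = spearmanr(ratings_a, ratings_b)
print("Spearman rho:", rho, "p-value:", pval)
```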
- WEEK 9: Project Design
- Tuesday 09.11.21 NO CLASS
- Wednesday 10.11.21 Converge towards the main idea
- Thursday 11.11.21 GLP at LaVi's running projects (Claudio's PhD overview, Emma's and David's MSc plans)
- WEEK 10: Project Proposal Discussion
- Tuesday 16.11.21 Search for existing code and related work
- Wednesday 17.11.21 Get your hands dirty with the code
- Thursday 18.11.21 Set the specific research questions: update the other groups
- WEEK 11:
- Tuesday 23.11.21 Design the experiments and evaluation method
- Wednesday 24.11.21 Peer-to-peer supervision
- Thursday 25.11.21 Reading Group: Hawkins et al. (TBC)
- WEEK 12:
- Tuesday 30.11.21 Schema of relevant literature
- Wednesday 01.12.21 Frontal class: Embodied Agents
- Thursday 02.12.21 (14:00-16:00) Groups 1, 2 and 3 (project proposal literature-overview presentations)
- WEEK 13: Project Proposal Presentations
- Thursday 09.12.21 (10:00-12:00) by the three groups
Main Surveys
- Harnad, S. (1990). The Symbol Grounding Problem. Physica D, 42, 335-346.
- Pulvermüller, F. (2005). Brain mechanisms linking language and action.
- Kafle, K., Shrestha, R., & Kanan, C. (2019). Challenges and prospects in vision and language research. Frontiers in Artificial Intelligence, 2, 28.
- Baroni, M. (2016). Grounding distributional semantics in the visual world.
- Beinborn, L., Botschen, T., & Gurevych, I. (2018). Multimodal grounding for language processing.
- Hossain, M. Z., Sohel, F., Shiratuddin, M. F., & Laga, H. (2019). A comprehensive survey of deep learning for image captioning.
- Bernardi, R., Cakici, R., Elliott, D., Erdem, A., Erdem, E., Ikizler-Cinbis, N., Keller, F., Muscat, A., & Plank, B. (2016). Automatic description generation from images: A survey of models, datasets, and evaluation measures. Journal of Artificial Intelligence Research (JAIR), 55, 409-442.
- Kafle, K., & Kanan, C. (2017). Visual question answering: Datasets, algorithms, and future challenges.
- Bernardi, R., & Pezzelle, S. (2021). Linguistic issues behind visual question answering.
- Srivastava, Y., Murali, V., Dubey, S. R., & Mukherjee, S. (2021). Visual question answering using deep learning: A survey and performance analysis. In S. Singh, P. Roy, B. Raman, & P. Nagabhushan (Eds.), Computer Vision and Image Processing (CVIP 2020), Communications in Computer and Information Science, vol. 1377. Springer. (Pre-print)
- Chen, Lao, & Duan (2020). Multimodal fusion of visual dialog: A survey.
Papers for Reading Groups
- Mayo, M. J. (2003). Symbol grounding and its implications for artificial intelligence.
- Lazaridou, A., Bruni, E., & Baroni, M. (2014). Is this a wampimuk? Cross-modal mapping between distributional semantics and the visual world. In Proceedings of ACL 2014.
- Kiela, D., Conneau, A., Jabri, A., & Nickel, M. (2018). Learning visually grounded sentence representations.
- Cao, J., Gan, Z., Cheng, Y., Yu, L., Chen, Y.-C., & Liu, J. (2020). Behind the scene: Revealing the secrets of pre-trained vision-and-language models.
- Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2015). Show and tell: A neural image caption generator.
- Monroe, W., Hawkins, R. X. D., Goodman, N. D., & Potts, C. (2017). Colors in context: A pragmatic neural model for grounded language understanding.
- Anderson, P., He, X., Buehler, C., Teney, D., Johnson, M., Gould, S., & Zhang, L. (2018). Bottom-up and top-down attention for image captioning and visual question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 6077-6086).
- Thomason, J., Padmakumar, A., Sinapov, J., Walker, N., Jiang, Y., Yedidsion, H., Hart, J., Stone, P., & Mooney, R. J. (2019). Improving grounded natural language understanding through human-robot dialog. [pre-print]