Grounded Language Processing 22-23
The Grounded Language Processing course is taught by Raffaella Bernardi (UniTN); the TA is Alberto Testoni. Classes are on Tuesdays (13:00-15:00), Wednesdays (13:00-15:00) and Thursdays (15:00-17:00) in Rovereto, Palazzo Fedrigotti, Corso Bettini 31, 3rd floor (seminar room, "aula seminari").
The course is part of the degree in Artificial Intelligence Systems, but any UniTN student interested in the topic can attend it as a Free Choice Course, following the rules of the Program in which they are enrolled.
UniTN students who are interested in the course but cannot attend in person are welcome to email me -- we plan to teach the course using a digital board so as to facilitate virtual participation.
If you are planning to attend the course, please add some information about yourself in this form; it will help us plan the course better.
What this course is about
This course focuses on the emerging field of Grounded Language Processing (GLP), a subarea of AI that studies the connection between natural language, perception and action in the world. It gives students an overview of recent advances while also revisiting the long-standing challenges set by the AI community at its inception. It makes connections between natural language processing (NLP), computer vision and robotics. It covers both grounded natural language understanding and grounded natural language generation, as well as unified architectures for these two crucial components of AI agents. If time allows, the course ends with hints about the connection between GLP and robotics, and by comparing the neural representations and attention mechanisms behind grounded natural language in humans with those of state-of-the-art multimodal models.
Each main section consists of both frontal lectures and hands-on experience.
Prerequisites: The course presupposes knowledge of Machine Learning, Natural Language Processing and possibly Computer Vision.
Grading Criteria: paper review 15%, presentation of a research question and its SOTA 35%, project 50%. Details can be found here
- WEEK 1: The Grounding Problem Harnad (1990), Pulvermüller (2005), Kafle et al (2019)
- Tuesday 20.09.22 Intro to the course. Why grounding?
- Wednesday 21.09.22 From the distant to the recent computational past
- Thursday 22.09.22 Reading Group: Mayo (2003)
- WEEK 2: Computational models for Multimodal Concept Representations Baroni (2016), Beinborn et al (2018)
- Tuesday 04.10.22 Word representation
- Wednesday 05.10.22 Practical Lab (with Raffa -- MM conceptual representation; see the sketch after this week's entries)
- Thursday 06.10.22 Reading Group: Lazaridou, Bruni and Baroni (2014)
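As a preview of this week's lab, here is a minimal sketch of the cross-modal mapping idea behind the Lazaridou, Bruni and Baroni (2014) reading: learn a linear map from distributional word vectors to visual feature vectors, then use it to retrieve the right visual vector for held-out words. All vectors below are random toy data standing in for the real embeddings used in the lab; names like `X_text` and `Y_vis` are purely illustrative.

```python
# Cross-modal mapping in the spirit of Lazaridou, Bruni and Baroni (2014):
# learn a linear map from text-based word vectors to visual feature vectors,
# then retrieve visual vectors for held-out ("unseen") words. All data here
# is random toy data standing in for real embeddings.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
n_words, d_text, d_vis = 100, 300, 512
X_text = rng.normal(size=(n_words, d_text))  # distributional word vectors
Y_vis = rng.normal(size=(n_words, d_vis))    # visual vectors (e.g. CNN features)

# Fit a regularised linear mapping text -> vision on 80 "seen" words.
mapping = Ridge(alpha=1.0).fit(X_text[:80], Y_vis[:80])

# Project the 20 held-out words into visual space and rank candidates by cosine.
Y_pred = mapping.predict(X_text[80:])
sims = cosine_similarity(Y_pred, Y_vis[80:])
top1 = (sims.argmax(axis=1) == np.arange(20)).mean()
print(f"Top-1 retrieval accuracy on toy data: {top1:.2f}")
```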
- WEEK 3: Grounded NL understanding
- Tuesday 11.10.22 Sentence representation
- Wednesday 12.10.22 Practical Lab (ALBERTO -- MDETR code)
- Thursday 13.10.22 Reading Group: Aishwarya Kamath et al (2021)
- WEEK 4: Visual Question Answering Kafle and Kanan 2017, Bernardi and Pezzelle 2021, Srivastava et al 2021
- Tuesday 18.10.22 Task, datasets and models
- Wednesday 19.10.22 Practical Lab (ALBERTO -- VQA with MDETR; see the sketch after this week's entries)
- Thursday 20.10.22 Reading Group: Parcalabescu et al (2022)
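If you want to try VQA before the lab (which uses MDETR), a lighter off-the-shelf stand-in is a pretrained ViLT model. A minimal sketch, assuming the `dandelin/vilt-b32-finetuned-vqa` checkpoint and the sample COCO image from the HuggingFace transformers documentation; it requires `pip install torch transformers pillow requests`.

```python
# Zero-shot VQA with a pretrained ViLT model -- a lightweight stand-in for
# the MDETR-based lab. Checkpoint and example image follow the HuggingFace
# transformers documentation; treat both as assumptions, not course material.
import requests
from PIL import Image
from transformers import ViltProcessor, ViltForQuestionAnswering

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # two cats on a couch
image = Image.open(requests.get(url, stream=True).raw)
question = "How many cats are there?"

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")

# Encode the image-question pair and pick the highest-scoring answer class.
inputs = processor(image, question, return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```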
- WEEK 5: Grounded NL generation Hossain et al (2019)
- Tuesday 25.10.22 Datasets and models
- Wednesday 26.10.22 Practical Lab (ALBERTO -- IC with MDETR; see the sketch after this week's entries)
- Thursday 27.10.22 Reading Group: Vinyals et al (2015)
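As with the VQA sketch above, image captioning (IC) can be tried with an off-the-shelf model before the MDETR-based lab. This is not the lab's code, just a minimal sketch assuming the `Salesforce/blip-image-captioning-base` checkpoint from HuggingFace transformers.

```python
# Image captioning with a pretrained BLIP model -- a lightweight stand-in for
# the MDETR-based IC lab. Checkpoint name follows the HuggingFace transformers
# documentation; treat it as an assumption.
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Encode the image and generate a short caption.
inputs = processor(image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(out[0], skip_special_tokens=True))
```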
- WEEK 6: Visual Dialogue Chen et al (2020)
- Wednesday 02.11.22 Datasets and models (CANCELLED)
- Thursday 03.11.22 Practical Lab (ALBERTO -- decoding strategies; see the sketch after this week's entries)
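A minimal sketch of two decoding strategies in the spirit of this lab and of next week's Holtzman et al (2020) reading: greedy decoding versus nucleus (top-p) sampling. The "model" is just a fixed toy next-token distribution; in the lab you would use the per-step probabilities of a real captioning model.

```python
# Two decoding strategies over a toy next-token distribution: greedy decoding
# vs. nucleus (top-p) sampling (Holtzman et al., 2020). The distribution is
# made up for illustration; a real model would supply per-step probabilities.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["a", "cat", "dog", "sits", "runs"]
probs = np.array([0.40, 0.25, 0.20, 0.10, 0.05])  # toy next-token probabilities

def greedy(probs):
    """Always pick the single most likely token."""
    return int(np.argmax(probs))

def nucleus(probs, p=0.9):
    """Sample from the smallest set of top tokens whose cumulative mass >= p."""
    order = np.argsort(probs)[::-1]                   # tokens sorted by probability
    cum = np.cumsum(probs[order])
    keep = order[: int(np.searchsorted(cum, p)) + 1]  # the "nucleus"
    renorm = probs[keep] / probs[keep].sum()          # renormalise inside the nucleus
    return int(rng.choice(keep, p=renorm))

print("greedy :", vocab[greedy(probs)])
print("nucleus:", [vocab[nucleus(probs)] for _ in range(5)])
```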
- WEEK 7 and 8: Work on Language and Vision at CIMeC
- Monday 07.11.22 (11:30-13:00, online via Zoom) Datasets and models for Visual Dialogue
- Tuesday 08.11.22 Alex and Federico
- Wednesday 09.11.22 Lab on annotation, inter-annotator agreement and correlation (RAFFA; see the sketch at the end of this block)
- Thursday 10.11.22 Reading Group: Holtzman et al (2020)
- Tuesday 15.11.22 GLP at LaVi: past and current projects
- Wednesday 16.11.22 Continuation of the previous Lab (ALBERTO)
- Thursday 17.11.22 Reading Group: Mazuecos, Benotti et al (EMNLP 2021)
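A minimal sketch of the measures behind Wednesday's annotation lab: Cohen's kappa for inter-annotator agreement on categorical labels, and Spearman correlation between an automatic metric and human ratings. All labels and scores below are invented for illustration.

```python
# Toy versions of the lab's measures: Cohen's kappa for inter-annotator
# agreement and Spearman correlation between an automatic metric and human
# ratings. All labels and scores below are made-up toy data.
from scipy.stats import spearmanr
from sklearn.metrics import cohen_kappa_score

annotator_a = ["yes", "no", "yes", "yes", "no", "yes"]
annotator_b = ["yes", "no", "no", "yes", "no", "yes"]
print("Cohen's kappa:", cohen_kappa_score(annotator_a, annotator_b))

human_ratings = [4.5, 3.0, 2.0, 5.0, 1.5]       # e.g. caption quality judgements
metric_scores = [0.80, 0.55, 0.40, 0.90, 0.20]  # e.g. an automatic metric
rho, pval = spearmanr(human_ratings, metric_scores)
print(f"Spearman rho = {rho:.2f} (p = {pval:.3f})")
```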
- WEEK 9: Project Design
- Tuesday 22.11.22 (YOUR PROJECT) Converge towards the main idea of your projects
- Wednesday 23.11.22 (YOUR PROJECT) Search for existing code and related work
- Thursday 24.11.22 Embodied AI
- WEEK 10: Project Proposal Discussion
- Monday 28.11.22 (15:00-17:00) (YOUR PROJECT) Get your hands on the code (ALBERTO)
- Tuesday 29.11.22 (YOUR PROJECT) Group 1 and Group 2: present relevant literature
- Wednesday 30.11.22 (13:00-14:30) (YOUR PROJECT) Group 3 and Group 4: present relevant literature
- Thursday 01.12.22 Evaluation methods in NLP (TBC)
- WEEK 11:
- Tuesday 06.12.22 (YOUR PROJECT) Design the experiments and evaluation method (Raffa and Alberto)
- Wednesday 07.12.22 (YOUR PROJECT) Peer-to-peer supervision (exchange between groups -- only Raffa)
- WEEK 12:
- Tuesday 13.12.22 (4 hrs, YOUR PROJECT) Groups 1, 2, 3 and 4: project proposals
Main Surveys
- Harnad, S. (1990) The Symbol Grounding Problem. Physica D 42: 335-346.
- F. Pulvermüller (2005) Brain mechanisms linking language and action
- Kafle, K., Shrestha, R., & Kanan, C. (2019). Challenges and prospects in vision and language research. Frontiers in Artificial Intelligence, 2, 28.
- Marco Baroni (2016) Grounding Distributional Semantics in the Visual World
- Lisa Beinborn, Teresa Botschen and Iryna Gurevych (2018) Multimodal Grounding for Language Processing
- Hossain, Sohel, Shiratuddin and Laga (2019) A Comprehensive Survey of Deep Learning for Image Captioning
- Raffaella Bernardi, Ruket Cakici, Desmond Elliott, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis, Frank Keller, Adrian Muscat, and Barbara Plank. 2016. Automatic description generation from images: A survey of models, datasets, and evaluation measures. Journal of Artificial Intelligence Research (JAIR) 55 (2016), 409–442.
- Kushal Kafle and Christopher Kanan (2017) Visual Question Answering: Datasets, Algorithms, and Future Challenges
- Raffaella Bernardi and Sandro Pezzelle (2021) Linguistic issues behind visual question answering
- Srivastava, Y., Murali, V., Dubey, S. R., & Mukherjee, S. (2021). Visual question answering using deep learning: A survey and performance analysis. In S. Singh, P. Roy, B. Raman, & P. Nagabhushan (Eds.), Computer vision and image processing. CVIP 2020, volume 1377 of communications in computer and information science. Springer. (Pre-print)
- Chen, Lao and Duan (2020) Multimodal Fusion of Visual Dialog: A Survey
Papers for Reading Groups
- Michael J. Mayo (2003) Symbol Grounding and its Implications for Artificial Intelligence
- Lazaridou, Bruni and Baroni (2014) Is this a wampimuk? Cross-modal mapping between distributional semantics and the visual world. ACL 2014
- Douwe Kiela, Alexis Conneau, Allan Jabri, Maximilian Nickel (2018) Learning Visually Grounded Sentence Representations
- Jize Cao, Zhe Gan, Yu Cheng, Licheng Yu, Yen-Chun Chen, Jingjing Liu Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
- Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan Show and Tell: A Neural Image Caption Generator
- Will Monroe, Robert X.D. Hawkins, Noah D. Goodman and Christopher Potts Colors in Context: A Pragmatic Neural Model for Grounded Language Understanding
- Anderson, P., He, X., Buehler, C., Teney, D., Johnson, M., Gould, S., & Zhang, L. (2018). Bottom-up and top-down attention for image captioning and visual question answering. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 6077–6086).
- Jesse Thomason, Aishwarya Padmakumar, Jivko Sinapov, Nick Walker, Yuqian Jiang, Harel Yedidsion, Justin Hart, Peter Stone, Raymond J. Mooney (2019) Improving Grounded Natural Language Understanding through Human-Robot Dialog [pre-print]
Open Access Codes
Other interesting papers
- Goel, Ashok K. 2021. “Looking back, looking ahead: Symbolic versus connectionist AI.” AI Magazine 42: 83–85. https://doi.org/10.1609/aaai.12026
- Tadas Baltrusaitis, Chaitanya Ahuja, Louis-Philippe Morency Multimodal Machine Learning: A Survey and Taxonomy IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 41, Issue 2. 2019. Video by Louis-Philippe Morency
- Laura Ruis, Jacob Andreas, Marco Baroni, Diane Bouchacourt, Brenden M. Lake A Benchmark for Systematic Generalization in Grounded Language Understanding
- Kexin Yi, Jiajun Wu, Chuang Gan, Antonio Torralba, Pushmeet Kohli, and Josh Tenenbaum. (2018) Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding , NeurIPS 2018
- Yonatan Bisk, Ari Holtzman, Jesse Thomason, Jacob Andreas, Yoshua Bengio, Joyce Chai, Mirella Lapata, Angeliki Lazaridou, Jonathan May, Aleksandr Nisnevich, Nicolas Pinto, Joseph Turian (2020) Experience Grounds Language
- Felix Hill, Stephen Clark, Karl Moritz Hermann, Phil Blunsom Understanding Early Word Learning in Situated Artificial Agents
- Leonardo Fernandino, Jeffrey R. Binder, Rutvik H. Desai, Suzanne L. Pendl, Colin J. Humphries, William L. Gross, Lisa L. Conant, Mark S. Seidenberg Concept Representation Reflects Multimodal Abstraction: A Framework for Embodied Semantics
- Emmanuel Dupoux Cognitive Science in the era of Artificial Intelligence: A roadmap for reverse-engineering the infant language-learner
- Armand S. Rotaru, Gabriella Vigliocco (2020) Constructing Semantic Models From Words, Images, and Emojis
- Gabriella Vigliocco, Lotte Meteyard, Mark Andrews, Stavroula Kousta (2009) Toward a theory of semantic representation
- L. Smith and M. Gasser. The development of embodied cognition: Six lessons from babies. Artificial life, 11(1-2):13–29, 2005.
- Harnad, S. (1994) Computation Is Just Interpretable Symbol Manipulation: Cognition Isn't.
- Gabriella Vigliocco, Pamela Perniss, David Vinson (2014) Language as a multimodal phenomenon: implications for language learning, processing and evolution
- Vogt, Paul. "Language evolution and robotics: issues on symbol grounding and language acquisition." Artificial cognition systems. IGI Global, 2007. 176–209.
- Michael F. Bonner and Russell A. Epstein Object representations in the human brain reflect the co-occurrence statistics of vision and language. Nature Communications.
Last modified: Tue Dec 6 14:21:05 CET 2022