Computational Linguistics Course A.A. '04-05 at FUB
Course Description
Syllabus
Why is language/speech difficult and interesting?; Ambiguity, communication, inference ...; Phonetics, Morphology, Syntax; Semantics; Pragmatics; Formal Grammars, Parsing; Logic and NLP; Corpora, Ontologies, Wordnet. History of the field.
Objectives
This course presents a graduate-level introduction to computational linguistics, the primary concern of which is the study of human language use from a computational perspective. The principal objectives of the course are to provide students with a broad overview of the field and to prepare them for further study in computational linguistics and language technologies. No previous knowledge of linguistic theory or linguistic applications is assumed. Some background in First Order Logic is preferred.
Grading
- 10%: participation in the LCT Colloquia (optional).
- 10%: critiques of selected readings and research papers or of an LCT Seminar. Critiques are due by the 20th of February.
- 40%: You are to complete an independent project on some topic in computational linguistics, which must include a careful write-up and an oral presentation. Projects proposed during the course will be listed here, but you are most welcome to come up with your own idea! You can choose to present your project (a) to me only or (b) to the other students too.
- (a) In the first case, you have to write a paper describing your work. The written presentation is due by the 23rd, and we could meet to discuss it.
- (b) In the second case, you have to give a 30-minute talk on either the 21st or the 24th of February. Talks will run from 09:30 to 12:30 (the room will be posted on this site soon). Please send me an e-mail with your preferred day, so that the talks can be distributed evenly, and state whether the other day is also possible for you.
- 40%: final exam. Winter session: 14th of February 2005 (on CFG, CG and lambda calculus).
Practical Info
- Students: Compulsory course for (first year) students enrolled in the European Master in LCT. Optional course for 2nd and 3rd year bachelor students
- Pre-requisites: None (some background in Logic is preferred.)
- Lecturer: Dr. Raffaella Bernardi
- Credits: 4 credits for master students and 6 credits for bachelor students (24 hours of lectures and 12 hours of labs in both cases)
- Schedule: 1st semester 2004-2005, 2nd part (November-January). Lectures: Thursdays 08:30-10:30 and Fridays 08:30-10:30. Labs: Fridays 10:30-12:30
- Place: Room E4.12 (lectures), Computer Room E5.31 (labs)
- Office hours: Thursdays 10:30-11:30 or by prior arrangement via e-mail
News
- 13-02-2005
- The information about "Grading" has been updated.
- 15-01-2005
- Change of Schedule: there will be a lesson on Monday 24th at 16:00-18:00 in Room 4.11 and on Tuesday 25th at 14:00-16:00 in Room 4.12. (Hence, there won't be lessons on Thursday 27th and Friday 28th, nor the Lab on Friday 28th.)
- 15-01-2005
- Change of Schedule: there will be a lesson on Monday 17th at 16:00-18:00 Room A518. Monday 24th at 16:00-18:00 Room 4.11, Tuesday 25th at 14:00-16:00 Room 4.12.
- 17-11-2004
- We have started an Erasmus Exchange Programme with the University of Saarland (the leading European center for LCT). For more information contact Enrico Franconi.
- 09-11-2004
- LCT Colloquia
- 08-11-2004
- Crash Course on Prolog
- 04-10-2004
- BIT Seminar: Talk by prof. Hans Uszkoreit on State of the Art in Language and Communication Technologies, 18-10-2004, 15:00-16:00, CS Faculty Seminar Room.
- 04-10-2004
- BIT Seminar: Talk by prof. Oliviero Stock on Natural language processing and intelligent information presentation, 18-10-2004, 16:30-17:30, CS Faculty Seminar Room.
- 30-09-2004
- Students interested in the Master Programs in CS at FUB are invited to attend the information day that will be held on 04-10-2004.
Participants
For organizational reasons, please register for the course by sending an e-mail to the lecturer expressing your intent to attend. Please specify whether you are a Bachelor or a Master student and, in the latter case, whether you will be following the European Master in LCT. If you have not done so yet, please fill in this form and return it to the lecturer.
Material
Textbooks
The recommended textbooks for the course are:
- Daniel Jurafsky and James H. Martin (SLP), Speech and Language Processing, Prentice-Hall, 2000.
- Patrick Blackburn and Kristina Striegnitz (BS), Natural Language Processing Techniques in Prolog.
- Patrick Blackburn and Johan Bos (BB1), Representation and Inference for Natural Language: A First Course in Computational Semantics.
- Patrick Blackburn and Johan Bos (BB2), Working with Discourse Representation Theory.
Lecture Notes
During the frontal lessons I will use slides that will be made available after the lesson from this link.
Labs
Labs aim to give you hands-on experience with the topics presented during the frontal lessons. We will use Prolog for most of the exercises.
A good crash course on Prolog is Learn Prolog Now!. Prolog is available for free: get your own Prolog.
All exercises and material used at the labs will be available after the lesson from this link.
Critiques
During the lectures, I will propose some readings which will be listed here, but your own selections are most welcome.
If you want to write your critique in LaTeX, you will find this site interesting.
Guidelines for writing critiques
An example of a critique of A Prototype Reading Coach that Listens. Mostow et al. AAAI 94.
- Banko & Brill. Scaling to Very Very Large Corpora for Natural Language Disambiguation. ACL 2001.
- Lauri Karttunen. Applications of Finite-State Transducers in NLP.
- Lillian Lee. "I'm sorry Dave, I'm afraid I can't do that": Linguistics, Statistics and NLP circa 2001*
- Blackburn and Bos. Computational Semantics.
- Aravind Joshi. "Tree-Adjoining Grammars". In The Oxford Handbook of Computational Linguistics (in the library).
- Sanda M. Harabagiu. Deriving Metonymic Coercions from WordNet.
- Beatrice Santorini. Part-of-Speech Tagging Guidelines for the Penn Treebank Project.
- Mitchell Marcus, Beatrice Santorini, Mary Ann Marcinkiewicz. Building a large annotated corpus of English: the Penn Treebank.
- Eric Brill. Transformation-Based Error-Driven Learning and NLP: A case study in POS Tagging. 1995.
- Michael Moortgat. Categorial grammar and formal semantics. In L. Nadel (ed.) Encyclopedia of Cognitive Science, Vol. 1, pp. 435-447. London, Nature Publishing Group, 2002.
Projects
During the lectures, I will propose several projects, which will be listed here, but you are most welcome to come up with your own idea.
- Morphological Analyser: Study the chapter on Morphological Parsing in BS. Pick a phenomenon of inflectional or derivational morphology in some language you know (look for the rules in a grammar!). Try to write down a finite state transducer as a graph. Then put it in Prolog and test it with the programs provided in that chapter.
- Formal Grammar: (a) Build a CFG for a fragment of your favorite language (other than English). Try to include the phenomena introduced during the lectures. (b) Implement the fragment in Prolog.
- Formal Grammar: Choose a fragment of your favorite language (other than English) and formalize it in a CFG and in TAG. Compare the results.
- Formal Grammar: Choose a fragment of your favorite language (other than English) and formalize it in a CFG and in CTL. Compare the results.
- Formal Grammar: Choose two languages and build for each a fragment covering the same structures. Use a formal grammar to analyse the differences between the two languages.
- Discourse: Look for cases of coreference in any of the Corpora below. Make a comparison between languages.
- Cognitive Science: TBA
- Linguistic Phenomena: Use any of the Corpora below to investigate the phenomena of polarity items (PI) in your favorite language. (PI to be introduced after Christmas.)
- Linguistic Phenomena: Use any of the Corpora below to investigate long-distance phenomena in Italian (or German). You can also do a comparative study of the two languages.
- Implementation: (a) Implement Brill's algorithm to extract Non-Lexicalized Transformation Rules (see Dongilli's seminar and the Brill '95 paper). (b) Test the rules you have found. I will provide the Corpora and the extracted lexicon.
- Implementation: Build a parser for Feature Structure Grammar. Use the algorithms given in SLP, Chapter 11. (To be discussed!)
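For the Morphological Analyser project, the step of writing a finite state transducer as a graph could be prototyped roughly as follows before moving to Prolog. This is only a minimal sketch in Python, not the programs provided in BS: the arc table, the `+N`, `+SG` and `+PL` tags, the `?` wildcard, and the function names `transduce` and `generate` are all assumptions made for illustration, and the toy covers nothing beyond regular English plurals in `-s`.

```python
from typing import Dict, List, Tuple

# An arc is (input symbol, output string, target state).
# "?" is a hypothetical wildcard: it matches any single letter and copies it.
Arc = Tuple[str, str, int]

# The transducer as a graph: state -> outgoing arcs (illustrative only).
ARCS: Dict[int, List[Arc]] = {
    0: [("?", "?", 0), ("+N", "", 1)],      # copy the stem, then read the noun tag
    1: [("+SG", "", 2), ("+PL", "s", 2)],   # singular adds nothing, plural adds "s"
}
FINAL_STATES = {2}


def transduce(symbols: List[str], state: int = 0) -> List[str]:
    """Return every output string the transducer produces for `symbols`."""
    if not symbols:
        # Accept only if all input is consumed in a final state.
        return [""] if state in FINAL_STATES else []
    head, rest = symbols[0], symbols[1:]
    results: List[str] = []
    for inp, out, target in ARCS.get(state, []):
        if inp == head:
            emitted = out
        elif inp == "?" and len(head) == 1 and head.isalpha():
            emitted = head
        else:
            continue
        results.extend(emitted + tail for tail in transduce(rest, target))
    return results


def generate(stem: str, *tags: str) -> List[str]:
    """Map a lexical form (stem letters followed by tags) to surface forms."""
    return transduce(list(stem) + list(tags))


print(generate("cat", "+N", "+PL"))
print(generate("cat", "+N", "+SG"))
```

Keeping the arcs in a plain table mirrors the graph drawing, so each arc should translate fairly directly into one Prolog fact when you port it.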
Final Exams
After the exam, the exam paper, the solutions and the results will be posted here.
Weekly Programme
The topics within square brackets have not been discussed in class.
Week | Date | Slides | Reading (SLP unless noted) | Lab | Deepen in/Related to |
---|---|---|---|---|---|
1 | 02/12/04 | Introduction to LCT and CL | Chapters 1, 8, 9: Course Info; Goals of CL; Challenges: Ambiguities at all levels; Morphology; Finite State Automata; Part-of-Speech; Word Class; Constituency. | Crash Course on Prolog | FSA: Theory of Computing, Formal Languages; Prolog: Programming Languages |
1 | 03/12/04 | Syntax I | Chapter 9: Coordination; Formal Grammars; Context-Free Rules and Trees; Sentence-Level Constructions. | Lab 1 (Syntax) | Formal Grammars: Compilers |
2 | 09/12/04 | Syntax II | Chapters 9, 11: Agreement; The VP and Subcategorization; Feature Structures; Unification of Feature Structures; Feature Structures in the Grammar. See also BS | | |
2 | 10/12/04 | Parsing | Chapters 10, 11: Bottom up Parsing; Top down Parsing; Depth First Search; Breadth First Search; [Feature Unification]. See also BS | Lab 2 (Parsing) | Text Processing, 3rd sem. BZ, Compilers |
3 | 16/12/04 | Semantics, Lambda extra | Chapter 15: Syntax-Driven Semantics; Lambda-Calculus. [Inference]. See also BB1 | | Semantic Theory, 2nd sem. Saarbruecken, Reasoning methods: Computational Logic, Knowledge Representation |
3 | 17/12/04 | Discourse | Chapter 18: Reference Resolution; Text Coherence; Discourse Structure; [DRT]. See BB2 ch. 1 | Lab 3 (Semantics) | |
Winter break | | | | | |
4 | 13/01/05 | Summary of the first part (1 hr). Sample Exam with solutions (1 hr) | None: Key points of the 1st part of the course. | | |
4 | 14/01/05 | Formal Grammars I | Gamut (ch. 4): Syntax-Semantic Interface, CG. | Lab 4 (CG) | |
5 | 17/01/05 | Categorial Type Logic [Room A518] | my thesis (ch. 1): Syntax-Semantics Interface in CTL. | | |
5 | 20/01/05 | | | Lab 5 (CG: solution lab 4, solution sample exam) [lecture room] | |
5 | 21/01/05 | CFG | BS: Long Distance Dependency in CFG | Lab 6 (CTL + CFG) [Comp. Room] | |
6 | 24/01/05 (tbc) | Formal Grammars II. Files on TAG (./Slides_04_05/tag.tar) | Slides only: History of Formal Grammars; Comparison of Formal Grammars | | |
6 | 25/01/05 (tbc) | Formal Grammars III [Room E 412 (14:00-16:00)] | Chapter 13: Generative Power; The Chomsky Hierarchy; Complexity, History and Current Directions of CL | | Theory of Computing |