Computational Linguistics Course A.A. '10-'11 at FUB
Course Description
Syllabus
Why is language/speech difficult and interesting?; Ambiguity; History of the field; Morphology; Syntax; Semantics; Pragmatics; Formal Grammars; Parsing; Logic and NLP.
Objectives
This course presents a graduate-level introduction to computational linguistics, the study of human language use from a computational perspective. Its principal objectives are to give students a broad overview of the field and to prepare them for further study in computational linguistics and language technologies. No previous knowledge of linguistic theory or linguistic applications is assumed. Some background in First Order Logic is preferred.
Grading
- 50%: You are to complete an independent project on a topic
in computational linguistics. Projects will be presented either
(a) to the lecturer only (in this case, you will have to send a
written report), or (b) to the other students as well, during the
lab session (in this case, you will have to prepare
slides). The presentation must include a brief overview of
the literature, a critique of a selected paper and a
description of your own idea/implementation.
Project topics must be agreed with the
lecturer.
You can find tips on how to write a paper and on how to give a
talk here.
- Project presentation (part of the project grade). Winter session: (TBA)
- 50%: final exam. Winter session: (TBA)
Practical Info
- Students: Compulsory course for first-year students enrolled in the European Masters Program in LCT. Optional course for 2nd and 3rd year bachelor students and for students of other MSc programmes offered at FUB, Faculty of CS.
- Pre-requisites: None (some background in Logic is preferred).
- Lecturer: Raffaella Bernardi
- Teaching Assistant: Elena Cabrio
- Credits: 4 credits (24 hrs of lectures, 12 hrs of labs)
- Schedule: 1st semester 2010-2011. Lectures: Thursdays 10:30-12:30. Labs: Wednesdays 18:00-19:00
- Place: See the updated information in the RIS
- Office hours: by prior arrangement via e-mail during the whole academic year
News
- 14-10-2010
- Change of lab time: from next week onwards, labs will run from 17:00 to 18:00.
- 12-09-2010
- Page online
Participants
For organizational reasons, please register for the course by sending an e-mail to the lecturer stating your intention to attend. Please specify whether you are a Bachelor or a Master student and, in the latter case, whether you are following the European Masters Program in LCT. If you have not done so yet, please fill in this form and return it to the lecturer by e-mail.
Material
Textbooks
The recommended textbooks for the course are:
- Daniel Jurafsky and James H. Martin, Speech and Language Processing, Prentice-Hall, 2000.
- Patrick Blackburn and Kristina Striegnitz (BS), Natural Language Processing Techniques in Prolog.
- Patrick Blackburn and Johan Bos (BB1), Representation and Inference for Natural Language: A First Course in Computational Semantics.
- Patrick Blackburn and Johan Bos (BB2), Working with Discourse Representation Theory.
Lecture Notes
During the lectures the lecturer will use slides, which will be made available after each lesson at this link.
Labs
Labs will be divided into two parts. During the first part, led by Elena Cabrio, labs will be organized as reading groups in which articles on a given topic are discussed together; these meetings will prepare students to carry out the critique and the project. Each student will be asked to report on a specific topic (i.e. give a critique) and to carry out a small project on his/her own under the supervision of the teaching assistant. The topic of the project will be decided together on the basis of the student's interests. The critique and the project should be on the same topic, and both will be part of the grading. During the second part, led by Raffaella Bernardi, we will do pencil-and-paper exercises on the lambda calculus and on the interface between syntax and semantics. This part will be the topic of the written exam. If time allows, we will use Prolog too.
First part of Labs: Critiques and Projects
Students will have to choose a topic on which to write a critique of a paper and carry out a project. The purpose of the critique is to learn how to read a scientific paper in CL and to propose a project on the basis of the reviewed paper. The project will give the student the chance to gain hands-on experience and test his/her own ideas.
Critiques
In the second lab (07/10) we will discuss how to write a paper critique. As an example, we will review together the paper Aurélien Max, Guillaume Wisniewski, "Mining Naturally-occurring Corrections and Paraphrases from Wikipedia's Revision History".
Some interesting papers to help you prepare for your critique are:
- Dale J. Benos, Kevin L. Kirk and John E. Hall, "How to review a paper"
- Polak JF. "The role of the manuscript reviewer in the peer review process".
- Baxt WG, Waeckerle JF, Berlin JA, and Callaham ML. "Who reviews the reviewers? Feasibility of using a fictitious manuscript to evaluate peer reviewer performance". Ann Emerg Med 32:310-317, 1998.
Guidelines for preparing the slides and writing critiques
If you want to write your critique in LaTeX, you will find this site interesting.
Below is a first proposal for the reading material for the critiques, with a description of the corresponding projects (new papers and projects will be added soon):
- COLLOCATION EXTRACTION
- Paper
- Dekang Lin, "Extracting Collocations from Text Corpora".
- Project
- Investigate various methods of extracting collocations from a corpus and finding relations between them
- AUTOMATIC DISCOVERY OF WN RELATIONS
- Paper
- Marti A. Hearst, "Automatic Acquisition of hyponyms from large text corpora".
- Project
- Automatic discovery of WN relations (decide on a lexical relation, e.g. meronymy; pick a list of word pairs in WN for which the relation holds; extract sentences from large corpora in which these terms occur; find the commonalities in their lexical and syntactic patterns and hypothesize patterns)
- CORPUS-BASED WORD SENSE IDENTIFICATION
- Paper
- Claudia Leacock, Martin Chodorow: "Combining Local context and WordNet similarity for Word Sense Identification".
- Project
- Exploiting local context for word sense identification (reproduce experiment 1 presented in the paper)
- PRECISION-ORIENTED TE MODULES
- Paper
- Bill MacCartney, Trond Grenager, Marie-Catherine de Marneffe, Daniel Cer, Christopher Manning, "Learning to recognize features of valid textual entailments"
- Project
- Implement TE modules (within the EDITS architecture) to handle specific linguistic phenomena relevant to inference
- TEXT STATISTICS
- Paper
- William H. Fletcher, "Making the Web More Useful as a Source for Linguistic Corpora"
- Project
- Given a sample of a book, calculate the number of types, the number of tokens, the type/token ratio and possible collocations, then analyse the data
- AUTOMATIC KEYPHRASE EXTRACTION FROM SCIENTIFIC ARTICLES
- Paper
- Ken Barker and Nadia Cornacchia, "Using noun phrase heads to extract document keyphrases", or choose a paper from the SemEval 2010 task proceedings
- Project
- Implement a simple system to automatically extract keyphrases from scientific articles (a simplified version of the SemEval 2010 task)
- EXTENDING MULTIWORDNET TO A NEW LANGUAGE
- Paper
- Bernardo Magnini, Carlo Strapparava, Giovanni Pezzulo, Alfio Gliozzo, "Comparing Ontology-Based and Corpus-Based Domain Annotation in WordNet"
- Project
- Investigate techniques to extend WN to a new language
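As a starting point for the TEXT STATISTICS project listed above, the counts it asks for can be sketched in a few lines of Python. This is only a minimal illustration, not a prescribed solution: the function names, the regular-expression tokenizer and the `min_count` threshold are assumptions made for the example, and repeated adjacent word pairs are used as a very crude stand-in for collocation candidates.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters/apostrophes as tokens.

    Assumption: this naive tokenizer is enough for a first experiment.
    """
    return re.findall(r"[a-z']+", text.lower())


def text_statistics(text):
    """Return the number of tokens, the number of types and their ratio."""
    tokens = tokenize(text)
    types = set(tokens)
    ratio = len(types) / len(tokens) if tokens else 0.0
    return {
        "tokens": len(tokens),
        "types": len(types),
        "type_token_ratio": ratio,
    }


def bigram_candidates(text, min_count=2):
    """Return adjacent word pairs seen at least min_count times.

    Assumption: raw frequency is a rough first filter only; it is not
    a proper association measure.
    """
    tokens = tokenize(text)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return [(pair, n) for pair, n in bigrams.most_common() if n >= min_count]


# Hypothetical sample text, used only to exercise the functions.
sample = "the cat sat on the mat and the cat slept on the mat"
stats = text_statistics(sample)
candidates = bigram_candidates(sample)
print(stats)
print(candidates)
```

On a real book sample, the frequency filter would normally be replaced by an association measure and by filtering out function words; the sketch only shows where such a step would plug in.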
Weekly Programme
The programme below is provisional, since it will be adapted to the students' background. Slides will be updated throughout the course after each lesson.
Each entry gives the lecture or lab number, the date, the slides topic, the corresponding SLP chapters and, for lab sessions, the lab topic.
- 1. 30/09/10. Slides: Introduction to LCT and CL. SLP: Chapters 1-3, 8.1, 8.2: Course Info; Goals of CL; Challenges: Ambiguities at all levels; Morphology; Finite State Automata; Part-of-Speech; Word Class; Constituency.
- 1. 06/10/10. LAB: intro.
- 2. 07/10/10. Slides: Syntax I. SLP: Chapter 9: Coordination; Formal Grammars; Context-Free Rules and Trees; Sentence-Level Constructions; Chomsky Hierarchy.
- 2. 13/10/10. LAB: How to review a paper.
- 3. 14/10/10. Slides: Syntax-Semantics I; exercises on Lambda. SLP: Chapter 15.1, 15.2: Syntax-Driven Semantics; Lambda-Calculus. [Inference]. See also BB1.
- 3. 20/10/10 (17:00-18:00). LAB: Corpus Linguistics.
- 4. 21/10/10. Slides: Syntax-Semantics II.
- 5. 28/10/10. Slides: Lexical Semantics.
- 4. 03/11/10. LAB: WordNet.
- 6. 04/11/10. Slides: Textual Entailment.
- 5. 10/11/10. LAB: Reading group on TE.
- 7. 11/11/10 (3 hrs: 10:30-13:30). Critiques by students.
- 8. 17/11/10. LAB: lambda.
- 7. 18/11/10. Slides: CFG and lambda calculus; intro to CG.
- 9. 24/11/10. LAB: CFG and lambda calculus.
- 8. 25/11/10. Slides: CG and lambda calculus; Lambek Calculus.
- 10. 01/12/10. LAB: CG, Lambek Calculus and lambda calculus.
- 9. 02/12/10. Slides: Comparison of Formal Grammars.
- 10. 09/12/10. Slides: Parsing.
- 11. 15/12/10. LAB: CFG in Prolog.
- 11. 16/12/10. Sample Written Exam.
- 12. 13/01/11. Discussion of sample exam.
- 12. 20/01/11. Projects Presentation.