
Computational Linguistics Course, Academic Year 2010-11, at FUB

Course Description

Syllabus

Why is language/speech difficult and interesting?; Ambiguity; History of the field; Morphology; Syntax; Semantics; Pragmatics; Formal Grammars; Parsing; Logic and NLP.

Objectives

This course is a graduate-level introduction to computational linguistics, the field concerned with the study of human language use from a computational perspective. Its principal objectives are to give students a broad overview of the field and to prepare them for further study in computational linguistics and language technologies. No previous knowledge of linguistic theory or linguistic applications is assumed. Some background in First-Order Logic is desirable.

Grading

  • 50%: project presentation (winter session, date TBA). You are to complete an independent project on a topic in computational linguistics. Projects will be presented either (a) to the lecturer only, in which case you will have to submit a written report, or (b) to the other students as well during the lab session, in which case you will have to prepare slides. The presentation must include a brief overview of the literature, a critique of a selected paper and a description of your own idea/implementation. Project topics must be agreed upon with the lecturer. You can find tips on how to write a paper and on how to give a talk here.
  • 50%: final exam (winter session, date TBA).

    Practical Info

    • Students: Compulsory for first-year students enrolled in the European Masters Program in Language and Communication Technologies (LCT). Optional for 2nd- and 3rd-year Bachelor students and for students of the other MSc programmes offered by the Faculty of Computer Science at FUB.
    • Pre-requisites: None (some background in logic is desirable).
    • Lecturer: Raffaella Bernardi
    • Teaching Assistant: Elena Cabrio
    • Credits: 4 credits (24 hours of lectures, 12 hours of labs)
    • Schedule: 1st semester 2010-2011. Lectures: Thursdays 10:30-12:30. Labs: Wednesdays 18:00-19:00 (17:00-18:00 from 20/10/2010 onwards, see News)
    • Place: see the up-to-date information in RIS
    • Office hours: by appointment via e-mail throughout the academic year

    News

    14-10-2010
    Lab time change: from next week onwards, labs will run from 17:00 to 18:00!
    12-09-2010
    Page online

    Participants

    For organizational reasons, please register for the course by sending an e-mail to the lecturer expressing your intention to attend. Please specify whether you are a Bachelor or a Master student and, if you are a Master student, whether you are following the European Masters Program in LCT. If you have not done so yet, please fill in this form and return it to the lecturer by e-mail.

    Material

    Textbooks

    The recommended textbooks for the course are:

    Lecture Notes

    During the lectures, the lecturer will use slides, which will be made available after each lesson from this link.

    Labs

    Labs will be divided into two parts. The first part, led by Elena Cabrio, will be organized as a reading group in which articles on a given topic are discussed together; these meetings will prepare students to write their critique and carry out their project. Each student will be asked to report on a specific topic (i.e. give a critique) and to carry out a small project of his/her own under the supervision of the teaching assistant. The project topic will be agreed upon based on the student's interests. The critique and the project must be on the same topic, and both count towards the grade. The second part, led by Raffaella Bernardi, consists of pencil-and-paper exercises on the lambda calculus and on the interface between syntax and semantics. This part will be covered by the written exam. If time allows, we will also use Prolog.
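    As an illustration of the kind of pencil-and-paper exercise on the syntax-semantics interface mentioned above, here is a standard textbook-style derivation (an assumed example, not taken from the course material): the transitive verb "loves" is assigned a lambda term, which is combined first with its object and then with its subject, each step being a beta-reduction.

```latex
\begin{aligned}
[\![\text{loves}]\!] &= \lambda y.\lambda x.\,\mathit{love}(x,y)\\
[\![\text{loves Mary}]\!] &= (\lambda y.\lambda x.\,\mathit{love}(x,y))(\mathit{mary})
  \;\to_\beta\; \lambda x.\,\mathit{love}(x,\mathit{mary})\\
[\![\text{John loves Mary}]\!] &= (\lambda x.\,\mathit{love}(x,\mathit{mary}))(\mathit{john})
  \;\to_\beta\; \mathit{love}(\mathit{john},\mathit{mary})
\end{aligned}
```

    The order of the lambda abstractions ($\lambda y$ before $\lambda x$) is what lets the verb consume its object first, mirroring the verb phrase constituent in the syntax tree.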

    First part of Labs: Critiques and Projects

    Students will have to choose a topic on which to write a critique of a paper and carry out a project. The purpose of the critique is to learn how to read a scientific paper in CL and to propose a project based on the reviewed paper. The project will give students the chance to gain hands-on experience and test their own ideas.

    Critiques

    In the second lab (07/10) we will discuss how to write a paper critique. As an example, we will review together the paper by Aurélien Max and Guillaume Wisniewski, "Mining Naturally-occurring Corrections and Paraphrases from Wikipedia's Revision History".

    Some interesting papers to help you prepare your critique are:

    Guidelines for preparing the slides and writing critiques


    If you want to write your critique in LaTeX, you will find this site useful.

    Below is a first proposal of reading material for the critiques, together with descriptions of the corresponding projects (new papers and projects will be added soon):

    • COLLOCATION EXTRACTION
      Paper
      Dekang Lin, "Extracting Collocations from Text Corpora".
      Project
      Investigate various methods of extracting collocations from a corpus and finding relations between them
    • AUTOMATIC DISCOVERY OF WN RELATIONS
      Paper
      Marti A. Hearst, "Automatic Acquisition of Hyponyms from Large Text Corpora".
      Project
      Automatic discovery of WN relations: decide on a lexical relation (e.g. meronymy); pick a list of word pairs in WN for which the relation holds; extract sentences from large corpora in which these terms occur; find the lexical and syntactic patterns these sentences have in common and hypothesize extraction patterns.
    • CORPUS-BASED WORD SENSE IDENTIFICATION
      Paper
      Claudia Leacock, Martin Chodorow, "Combining Local Context and WordNet Similarity for Word Sense Identification".
      Project
      Exploiting local context for word sense identification (reproduce experiment 1 presented in the paper)
    • PRECISION-ORIENTED TE MODULES
      Paper
      Bill MacCartney, Trond Grenager, Marie-Catherine de Marneffe, Daniel Cer, Christopher Manning, "Learning to Recognize Features of Valid Textual Entailments".
      Project
      Implement TE modules (within the EDITS architecture) to handle specific linguistic phenomena relevant to inference
    • TEXT STATISTICS
      Paper
      William H. Fletcher, "Making the Web More Useful as a Source for Linguistic Corpora".
      Project
      Given a sample of a book, calculate the number of types, the number of tokens, the type/token ratio and possible collocations, and analyse the resulting data
    • AUTOMATIC KEYPHRASE EXTRACTION FROM SCIENTIFIC ARTICLES
      Paper
      Ken Barker and Nadia Cornacchia, "Using Noun Phrase Heads to Extract Document Keyphrases", or a paper of your choice from the SemEval 2010 task proceedings
      Project
      Implement a simple system to automatically extract keyphrases from scientific articles (a simplified version of the SemEval 2010 task)
    • EXTENDING MULTIWORDNET TO A NEW LANGUAGE
      Paper
      Bernardo Magnini, Carlo Strapparava, Giovanni Pezzulo, Alfio Gliozzo, "Comparing Ontology-Based and Corpus-Based Domain Annotation in WordNet".
      Project
      Investigate techniques for extending WN to a new language
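    For the TEXT STATISTICS project, the core counts can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a reference solution: the regex tokenizer and the function names `tokenize`, `text_statistics` and `pmi_collocations` are hypothetical, and ranking candidate collocations by pointwise mutual information (PMI) is one common association measure chosen here for illustration, not necessarily the method of any paper listed above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens.

    Assumption: a token is a maximal run of lowercase letters, digits or
    apostrophes; punctuation is discarded, so bigrams may span sentence
    boundaries. A real project would use a proper tokenizer.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def text_statistics(text):
    """Return the number of tokens, the number of types and the type/token ratio."""
    tokens = tokenize(text)
    types = set(tokens)
    ttr = len(types) / len(tokens) if tokens else 0.0
    return {"tokens": len(tokens), "types": len(types), "ttr": ttr}


def pmi_collocations(text, min_count=2):
    """Rank adjacent word pairs by PMI = log2(P(x, y) / (P(x) * P(y))).

    Only bigrams occurring at least min_count times are scored, because
    PMI tends to overrate very rare pairs.
    """
    tokens = tokenize(text)
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_bigrams = max(n - 1, 1)
    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bigrams
        p_x = unigrams[w1] / n
        p_y = unigrams[w2] / n
        scored.append(((w1, w2), math.log2(p_xy / (p_x * p_y))))
    # Highest association first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


if __name__ == "__main__":
    sample = "the cat sat on the mat. the cat ate."
    print(text_statistics(sample))
    print(pmi_collocations(sample))
```

    On a short sample like the one above, the type/token ratio is high simply because few words repeat; TTR falls as the text grows, so samples of different lengths should be compared with care.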


    Weekly Programme

    The programme below is provisional, since it will be adapted to the students' background. Slides will be updated after each lesson throughout the course.
    • Lecture 1 (30/09/10): Introduction to LCT and CL. SLP Chapters 1-3, 8.1, 8.2: Course Info; Goals of CL; Challenges: Ambiguities at all levels; Morphology; Finite State Automata; Part-of-Speech; Word Class; Constituency.
    • Lab 1 (06/10/10): Intro.
    • Lecture 2 (07/10/10): Syntax I. SLP Chapter 9: Coordination; Formal Grammars; Context-Free Rules and Trees; Sentence-Level Constructions; Chomsky Hierarchy.
    • Lab 2 (13/10/10): How to review a paper.
    • Lecture 3 (14/10/10): Syntax-Semantics I; exercises on lambda. SLP Chapters 15.1, 15.2: Syntax-Driven Semantics; Lambda Calculus. [Inference]. See also BB1.
    • Lab 3 (20/10/10, 17:00-18:00!): Corpus Linguistics.
    • Lecture 4 (21/10/10): Syntax-Semantics II.
    • Lecture 5 (28/10/10): Lexical Semantics.
    • Lab 4 (03/11/10): WordNet.
    • Lecture 6 (04/11/10): Textual Entailment.
    • Lab 5 (10/11/10): Reading group on TE.
    • Lab 7 (11/11/10, 3 hrs: 10:30-13:30): Critiques by students.
    • Lab 8 (17/11/10): Lambda.
    • Lecture 7 (18/11/10): CFG and lambda calculus; intro to CG.
    • Lab 9 (24/11/10): CFG and lambda calculus.
    • Lecture 8 (25/11/10): CG and lambda calculus; Lambek Calculus.
    • Lab 10 (01/12/10): CG, Lambek Calculus and lambda calculus.
    • Lecture 9 (02/12/10): Comparison of Formal Grammars.
    • Lecture 10 (09/12/10): Parsing.
    • Lab 11 (15/12/10): CFG in Prolog.
    • Lecture 11 (16/12/10): Sample Written Exam.
    • Lecture 12 (13/01/11): Discussion of sample exam.
    • Lab 12 (20/01/11): Projects Presentation.