DISI Seminar
Fri Mar 25, 2011 at 12:00
Aula Seminari Matematica, Via Sommarive, 14
The semantic interpretation of an utterance can be split into a two-level process: a translation process that projects lexical items into basic conceptual constituents, and a composition process that takes these basic constituents as input and combines them into a possibly complex semantic interpretation of the utterance, represented, for example, as a set of semantic Frames. Various methods have been proposed for both levels of this process, from statistical tagging approaches to parsing methods. Syntactic information is useful for such an understanding process: at the concept level, syntax can help reduce ambiguity by computing the syntactic functions of the concept supports in an utterance; at the semantic Frame level, syntactic dependencies can be projected into semantic dependencies to obtain structured semantic objects. Despite its usefulness, syntactic parsing is not always considered when building a Spoken Language Understanding (SLU) system dedicated to processing spontaneous speech, because of two main issues: firstly, transcriptions obtained through an Automatic Speech Recognition (ASR) process contain errors, and the number of errors increases with the level of spontaneity in the speech; secondly, spontaneous speech transcriptions are often difficult to parse using a grammar developed for written text, due to the specificities of spontaneous speech syntax (ungrammaticality, disfluencies such as repairs, false starts or repetitions). Concerning the first issue, one way of dealing with ASR errors is to take into account not only the best word string produced by the ASR process but also multiple hypotheses encoded as a word lattice or a confusion network. The consequence of the second issue is that the traditional view of parsing based on context-free grammars is not suitable for processing speech: because of the ungrammatical structures in spontaneous speech, writing a generative grammar and annotating transcripts with that grammar remains difficult.
New approaches to parsing based on dependency structures and discriminative machine learning techniques are much easier to adapt to speech for two main reasons: (a) they need less training data and (b) annotating speech transcripts with syntactic dependencies is simpler than annotating them with syntactic constituents. Another advantage is that partial annotation can be performed when the speech is ungrammatical or the ASR transcripts are erroneous. The dependency parsing framework also generates parses that are much closer to meaning, which eases semantic interpretation. Developing discriminative machine learning techniques for spoken dialogue systems raises the problem of the availability of annotated training data. Such data are usually obtained by collecting and annotating spoken utterances characteristic of the targeted application. This task is labour intensive, needs a high level of expertise to be done properly and is one of the major bottlenecks in the deployment of spoken dialogue applications. An alternative solution is to distinguish, among all the resources needed to perform syntactic analysis, those that can be derived from generic resources already available from those that are tied to the application domain. This talk will present some studies done in this framework by the NLP research group of the computer science lab (LIF) of Aix Marseille Université. More precisely, I will present a study on the use of dependency parsing in word lattices for Spoken Language Understanding and a study on question detection and characterization for speaker role labelling in broadcast conversations.
I will also give a demo of open-source tools developed in Marseille for parsing word lattices (http://macaon.lif.univ-mrs.fr/).
About the speaker
Frédéric Béchet is a researcher in the field of Speech and Natural Language Processing at the Laboratoire d'Informatique Fondamentale (LIF) in Marseille. His research activities are mainly focused on Spoken Language Understanding for both Spoken Dialogue Systems and Speech Mining applications. After studying Computer Science at the Aix-Marseille II University (Luminy), he obtained his PhD in Computer Science in 1994 from the University of Avignon, France. Since then he has worked at the Ludwig Maximilian University in Munich, Germany, as an Assistant Professor at the University of Avignon, France, and as an invited professor at AT&T Research Shannon Lab in Florham Park, New Jersey, USA; he is currently a full Professor of Computer Science at Aix Marseille Université in France. Frédéric Béchet is the author or co-author of over 60 refereed papers in journals and international conferences. He has served on the reviewing committees of several international conferences (ICASSP, Interspeech, ASRU, HLT, EMNLP) and has been an invited reviewer for several journals, including Speech Communication, IEEE Signal Processing Letters, IEEE Transactions on Speech and Audio Processing, and Traitement Automatique des Langues. Frédéric Béchet was an elected member of the IEEE Speech and Language Processing Technical Committee (2008-2010) and is a member of the board of the French Natural Language Processing association ATALA.
Contact: Giuseppe Riccardi