Kernel Philosophy in Natural Language Processing (NLP) 

Whatever linguistic theory we consider, natural language processing cannot be accomplished without a representation of structured data. As soon as our natural language model becomes richer than a simple bag of words, the data representation is no longer flat, e.g. POS tag sequences vs. syntactic parse trees.

Classical machine learning approaches attempt to represent structural syntactic/semantic objects using a flat feature representation, i.e. attribute-value vectors. However, this raises two problems:

1.  There is no well-defined theoretical motivation for the feature model: structural properties may not fit any flat feature representation.

2.  When the linguistic phenomenon is complex, we may not be able to find any suitable flat representation at all.

Kernel methods for NLP aim to solve both of the above problems. First, the kernel function allows us to express the similarity between two objects without explicitly defining their feature space. As a result, we avoid the major feature representation problems.
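To make the "similarity without an explicit feature space" idea concrete, here is a toy illustration (not from the original text): the degree-2 polynomial kernel computes, in linear time, the same scalar product that an explicit quadratic feature map would need a quadratic number of features to represent. The function names (`poly_kernel`, `phi`) are illustrative choices.

```python
import math

def poly_kernel(x, y):
    """Implicit computation: O(n) work, no feature space built."""
    return sum(xi * yi for xi, yi in zip(x, y)) ** 2

def phi(x):
    """Explicit degree-2 feature map: O(n^2) features."""
    return [xi * xj for xi in x for xj in x]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# The kernel value equals the dot product in the (never constructed)
# quadratic feature space.
x, y = [1.0, 2.0, 3.0], [0.5, -1.0, 2.0]
assert math.isclose(poly_kernel(x, y), dot(phi(x), phi(y)))
```

The same trick scales to much richer spaces: tree and sequence kernels implicitly work in spaces whose explicit dimensionality would be astronomical.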

Second, a linguistic phenomenon can be modeled at a more abstract level, where modeling is easier. For example, which features would you use to learn the difference between a correct and an incorrect syntactic parse tree? By using the parse tree itself rather than any of its feature representations, we let the learner focus only on the properties useful for the decision. The tree kernel proposed in (Collins and Duffy 2002) measures the similarity between trees in terms of all their common sub-structures.
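The Collins and Duffy kernel admits an efficient recursive computation over node pairs, even though the underlying feature space (one dimension per tree fragment) is exponentially large. The sketch below is a minimal Python rendering of that recursion; the tuple-based tree encoding and the decay parameter name `lam` are illustrative choices, not from the paper's code.

```python
# Trees are tuples: (label, child, child, ...); word leaves are strings,
# e.g. ("NP", ("D", "a"), ("N", "dog")).

def production(node):
    """The CFG production at a node: label plus the sequence of child labels."""
    return (node[0], tuple(c if isinstance(c, str) else c[0] for c in node[1:]))

def nodes(tree):
    """Yield every non-leaf node of a tree."""
    yield tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from nodes(child)

def common_subtrees(n1, n2, lam=1.0):
    """C(n1, n2): weighted count of common fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, str):      # word leaf under a pre-terminal
            continue
        score *= 1.0 + common_subtrees(c1, c2, lam)
    return score

def tree_kernel(t1, t2, lam=1.0):
    """K(T1, T2): sum of C(n1, n2) over all node pairs."""
    return sum(common_subtrees(n1, n2, lam)
               for n1 in nodes(t1) for n2 in nodes(t2))
```

For example, `tree_kernel(t, t)` with `t = ("NP", ("D", "a"), ("N", "dog"))` counts the six fragments shared by the tree with itself; `lam < 1` downweights larger fragments, as in the original paper.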

Third, since kernel functions can be seen as scalar products in feature spaces, we still preserve the advantage of working with a large (possibly infinite) number of features. Moreover, kernel methods can be used with Support Vector Machines, which are among the most accurate classification approaches.

Finally, the mathematical formalism behind kernel methods allows us to clearly separate the learning algorithm from the features and representation spaces. This makes it easier to compare performance across spaces (e.g. a baseline space vs. more complex spaces).
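This separation can be seen in any kernelized learner: the algorithm touches the data only through kernel calls, so changing the representation space means swapping one function. As a hedged sketch (not from the original text), here is a minimal kernel perceptron in its dual form; the names and the toy data are illustrative.

```python
def kernel_perceptron(examples, kernel, epochs=10):
    """Dual perceptron: examples is a list of (x, y) with y in {-1, +1}.
    Returns the mistake counts (dual weights) alpha."""
    alpha = [0.0] * len(examples)
    for _ in range(epochs):
        for i, (xi, yi) in enumerate(examples):
            score = sum(a * yj * kernel(xj, xi)
                        for a, (xj, yj) in zip(alpha, examples))
            if yi * score <= 0:      # mistake: remember this example
                alpha[i] += 1.0
    return alpha

def predict(x, examples, alpha, kernel):
    score = sum(a * yj * kernel(xj, x)
                for a, (xj, yj) in zip(alpha, examples))
    return 1 if score > 0 else -1

# The learner never inspects x directly, so the same code runs unchanged
# on vectors with a linear kernel, or on parse trees with a tree kernel.
linear = lambda u, v: sum(a * b for a, b in zip(u, v))
data = [([1.0, 1.0], 1), ([2.0, 1.5], 1),
        ([-1.0, -1.0], -1), ([-2.0, -0.5], -1)]
alpha = kernel_perceptron(data, linear)
```

The same separation underlies SVMs used with a precomputed Gram matrix: the representation question reduces entirely to the choice of kernel.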

Given the above properties, we believe that kernel methods are a useful mathematical tool for studying the reciprocal impact of natural language structures, e.g. of syntactic structures over semantic frames and vice versa.




Work in Progress


- New Kernels for predicate argument extraction (done!!! – see publications)

- Kernel methods for semantic Text Categorization (done!!! – see publications)

- Kernels for parse re-ranking

- Kernels for relation extraction

- A complete (and downloadable) tree-kernel based system for predicate argument extraction (coming soon!!!)