Syntactic
parsers are among of the most useful tools for Natural Language Processing
(NLP) applications. However, how to exploit
the syntactic parse tree information in NLP tasks is considered an open
problem. For example, the learning
models for automatic Word Sense Disambiguation or Coreference Resolution would
benefit from syntactic tree features but their design and selection is not an
easy task.
Convolution
kernels (see Kernel philosophy in NLP) are an
alternative to the explicit feature design. They measure similarity between two
syntactic trees in terms of their substructures (e.g. [Collins and Duffy,
2002]). These approaches have given optimal results [Moschitti, 2004] when
introducing syntactic information in the task of Predicate Argument
Classification.
Suppose we
want to measure the similarity between the parse trees of two noun phrases: "a
dog" and "a cat". The idea of behind syntactic tree kernels is graphically
described by the following figure:
As 3 structures
(out of 5) are completely identical the similarity is equal to 3.
This kind
of similarity has been shown useful to improve the ranking of the m best syntactic parse trees [Collins
and Duffy, 2002]. Other interesting applications concern the classification of
predicate argument structures annotated in PropBank and Question Classification
[Zhang and Lee, 2003; Moschitti, ECML
2006]. To describe the linguistic phenomenon of the former task, i.e. the syntactic/semantic
relation between a predicate and the semantic roles of its arguments, we need
to extract features from some subparts of a syntactic tree. For example given
the following sentence:
Paul gives a talk in
Rome
The
Predicate argument annotation in PropBank is:
[ Arg0 Paul] [ Predicate gives] [ Arg1 a talk] [ ArgM in
The syntactic tree may be:
To
select syntactic information related to a particular argument type we could use
the corresponding subtree, i.e.:
TreeKernels measure the similarity between arguments whereas the kernelbased algorithms (e.g. SVMs) automatically select the substructures that better describe the argument type.
The tree kernel has been encoded inside the well known SVMlight software written by Thorsten Joachims (www.joachims.org). The input format and the new options are compatible with those of the original SVMlight 5.01. Moreover, combination models between tree kernels and feature vectors are available.

Fast Kernel Computation [Moschitti, EACL 2006] (already
available since the previous version).

Vector sets, multiple feature vectors over multiple feature spaces can be specified
in the input. This allows us to use different kernels with different feature
subsets. Given two objects, O_{1} and O_{2},
described by two sets of feature vectors, and , several kernels can be defined:

Tree forests, a set of trees over multiple feature spaces can be specified in the
input. This allows us to use a set of different structured features, e.g. it is
possible to extract different portions from a parse tree and combine the
different contributions. This limits the sparseness of the kernels applied to
the whole tree. Given two objects, O_{1} and O_{2},
described by two sets of trees, and , several kernels can
be defined on:

Two types of tree kernels:
1.
SubSet
Tree kernel (SST) [Collins and Duffy, 2002; Moschitti, EACL
2006]
2.
Subtree
kernel (ST) [Vishwanathan and Smola, 2001; Moschitti,
EACL 2006]

Embedded combinations of Trees and vectors:
1.
sequential summation, the kernels between corresponding
pairs of trees and/or vectors in the input sequence are summed together. The t parameter rules the contributions of tree
kernels K_{t} with respect to
the feature vector kernel k_{b}.
Each of the two types of kernels can or cannot be normalized according to a
command line parameter. More formally:
,
K_{t }can be either the SST or the ST kernel whereas k_{b} can be one of the traditional kernels on feature vectors, e.g. gaussian or polynomial kernel.
2. all vs all summation, each tree and vector of the first object are evaluated against each
tree and vector of the second object:
The input format has changed since previous version as sets of objects have to be specified:
<line> ::= <target><blank><setofvectors>
 <target><blank><setoftrees> 
<target><blank><treesandvectors>
<setofvectors> ::= <vector>
<vector><blank><beginvector><blank><vector><blank>..<endvector>
<setoftrees> ::= <begintree><blank><tree><blank>..<begintree><blank><tree><blank><endtree>
<treesandvectors>::= <setoftrees><blank><setofvectors>
<vector> ::=
<feature>:<value><blank><feature>:<value><blank>...<blank><feature>:<value>
 <blank>
<target> ::=
+1  1  0  <float>
<feature> ::=
<integer>  "qid"
<value> ::=
<float>
<begintree> ::="BT"
<endtree> ::="ET"
<beginvector>::="BV"
<endvector> ::="EV"
<tree> ::= <fulltree>  <blank>
<fulltree> ::=
(<root><blank><fulltree>..<fulltree>) 
(<root><blank><leaf>)
<leaf> ::= <string>
<root> ::= <string>
<blank> ::= " " (i.e. one space)
where <string> does not contain space, left and right parentheses, i.e. "(" and ")", whereas <tree> defines the usual Penn Treebank format. Note that (a) two begin trees, i.e. ".. BT BT .." encode the empty tree (useful as placeholder), (b) two begin vectors, i.e. ".. BV BV .." encode the empty vector and (c) the sequence ".. ET BV .." is used to specify that the first vector (after trees) is empty.
For example, if we liked to experiment with different trees for question classification, given the question "What does S.O.S stand for?", we may use the following forest:
1 BT (SBARQ (WHNP (WP What))(SQ (AUX does)(NP (NNP S.O.S.))(VP (VB
stand)(PP (IN for))))(. ?)) BT (BOW (What *)(does *)(S.O.S.
*)(stand *)(for *)(? *)) BT (BOP (WP *)(AUX *)(NNP *)(VB *)(IN
*)(. *)) BT (PAS (ARG0 (RA1 (What *)))(ARG1 (A1 (S.O.S. NNP)))(ARG2 (rel
stand))) ET
where the four different trees are: the question parse tree, the BOW tree (used to simulate the bagofwords), the BOP tree (used to simulate the bagofPOStags) and the predicate argument tree (i.e. the PAS defined in [Moschitti et al., ECMLMLG 2006])
We can add
trees with different feature vectors. For example suppose we want to implement
a reranker based on tree kernel and flat features. We need to compare pairs of
instances. The following line contains a pair of PASs and a pair of feature
vectors that correspond to two target predicate argument structures. These are
needed for learning of a reranker for Semantic Role Labeling systems
[Moschitti et al., CoNLL 2006].
1 BT (TREE (ARG0 (A1
NP))(ARG1 (AMNEG RB))(ARG2 (rel fall))(ARG3 (AMTMP NNP))(ARG4 (AMTMP
SBAR))(ARG5 null)(ARG6 null)) BT
(TREE (ARG0 (A1 NP))(ARG1 (AMNEG RB))(ARG2 (rel fall))(ARG3 (A4 RP))(ARG4
(AMTMP NNP))(ARG5 (AMTMP SBAR))(ARG6 null)) ET 1:1 21:2.742439465642236E4 23:1 30:1 36:1 39:1 41:1 46:1 49:1
66:1 152:1 274:1 333:1 BV 2:1
21:1.4421347148614654E4 23:1 31:1 36:1 39:1 41:1 46:1 49:1 52:1 66:1 152:1
246:1 333:1 392:1 EV
In case we
liked to use only feature vectors we could write them as follows:
1 1:1 21:2.742439465642236E4 23:1 30:1 36:1 39:1 41:1 46:1 49:1 66:1
152:1 274:1 333:1 BV 2:1
21:1.4421347148614654E4 23:1 31:1 36:1 39:1 41:1 46:1 49:1 52:1 66:1 152:1
246:1 333:1 392:1 EV
However, the
original SVMlight input format can be used without any changes as the next two
lines associated with two instances illustrate:
1 1:1 21:2.742439465642236E4 23:1 30:1 36:1 39:1 41:1 46:1 49:1 66:1
152:1 274:1 333:1
+1 2:1 21:1.4421347148614654E4 23:1 31:1 36:1 39:1 41:1 46:1 49:1 52:1
66:1 152:1 246:1 333:1 392:1
Important: follow the syntax of trees defined
by <tree> (e.g.
there is no space between two parenthesis). Moreover, the expected input is a parsetree this means
that a preterminal (the lowest nonterminal in the tree) is always followed by
only one leaf.
The "svm_classify" and the "svm_learn" commands maintain the original svmlight format:
usage:
svm_learn [options] example_file model_file
Arguments:
example_file> file with training
data
model_file > file to store the learned decision
rules in
usage:
svm_classify [options] example_file model_file
Arguments:
example_file> file with testing
data
model_file > file to retrieve the learned decision
rules
The new options
are shown below (in blue colour):
Kernel
options:
t int > type of kernel function:
0: linear (default)
1: polynomial (s a*b+c)^d
2: radial basis function exp(gamma
ab^2)
3: sigmoid tanh(s a*b + c)
4: user defined kernel
from kernel.h
5: combination of
forest and vector sets according to W, V, S, C options
11: reranking based
on trees (each instance must have two trees)
12: reranking based on vectors
(each instance must have two vectors)
13: reranking based on both
tree and vectors (each instance must have two trees and two vectors)
W [S,A] > a tree kernel is applied to the
sequence of trees of two input forests and the results are summed;
> with an "A", a tree kernel
is applied to all tree pairs from the two forests (default "S")
V
[S,A] > same as before but
sequences of vectors are used (default "S" and the type of vectorbased kernel
is specified by the option S)
S
[0,4] > kernel to be used with
vectors (default polynomial of degree 3, i.e. S = 1 and d = 3)
C
[*,+,T,V] > combination operator
between forests and vectors (default 'T')
> "T" only the contribution
from trees is used
> "V" only the contribution
from feature vectors is used
> "+" or "*" sum or
multiplication of the contributions from feature vectors and trees (default
'T')
T
float > multiplicative constant
for the contribution of tree kernels when C = "+", i.e. K =
treeforestkernel*r + vectorkernel (default 1)
D
[0,1] > 0, SubTree kernel or 1,
SubSet Tree kernels (default 1)
L
float > decay factor in tree
kernels (default 0.4)
N
[0,3] > 0 = no normalization, 1 =
tree normalization, 2 = vector normalization and, 3 = normalization of both trees
and vectors. The normalization is applied to each individual tree or vector
(default 3).
u string > parameter of user defined kernel
d int > parameter d in
polynomial kernel
g float > parameter gamma in rbf kernel
s float > parameter s in sigmoid/poly kernel
r float > parameter c in sigmoid/poly kernel
To obtain the sequential summation K_{s} (of tree and vector kernels) previously defined, we can set the option "t 5 T 1 W S V S C +". Considering the default values, this is equivalent to use "t 5 C +".
./svm_learn
t 5 example_file model_file /* the subsettree kernel alone is used, if the
forest contains only a tree, the classic tree kernel is computed */
./svm_learn
t 5 C V example_file model_file /* the default polynomial kernel is used on
the pairs from vector sequences */
./svm_learn
t 5 C V V A example_file model_file /* the default polynomial kernel is used
on the pairs from
vector
sequences. The pairs are built by combining each element of the first sequence
with each element of the second sequence */
./svm_learn t 5 C + S 1 d 5 example_file model_file /* the sequential summation of trees, using SST kernel, is summed to the sequential summation of vectors, using a polynomial kernel with degree = 5. The contribution of tree kernels is multiplied by t (i.e. default 1) */
./svm_learn t 5 C + D 0 S 1 example_file model_file /* the sequential summation of trees, using the ST kernel (D 0), is summed to the sequential summation of vectors, using polynomial kernel with degree = 5 */
./svm_learn t 12 example_file model_file /* a reranker over a pair of trees and a pair of vectors is applied*/
./svm_learn
example_file model_file /* original SVMlight linear kernel "t 0". The input can
be provided in the new style or in the old SVMlight format*/
 Source code (It is available with the make files for Windows DevC++ and Linux gcc)
 Example data (it contains the PropBank Argument 0 as positive class and Argument 1 as negative class)
It is possible to design our own kernel by using weights for both trees and vectors and combining them in very different way as the following example illustrates:
 kernel.h
All the options of previous SVMLIGHTTK version have been integrated in SVMLIGHTTK 1.2. However, the use of forest and vector set make slightly slower the new version so if you just need a fast computation of one tree kernel please use SVMLIGHTTK1.0
[Moschitti, 2004], Alessandro Moschitti. A study on
Convolution Kernels for Shallow Semantic Parsing. In proceedings of the
42th Conference on Association for Computational Linguistic (ACL2004),
or
[Moschitti, EACL 2006], Alessandro Moschitti, Making
tree kernels practical for natural language learning. In Proceedings of
the Eleventh International Conference on European Association for Computational
Linguistics,
[Joachims,
1999], Thorsten Joachims. Making largescale SVM learning
practical. In B. Scholkopf, C. Burges, and A. Smola, editors, Advances in
Kernel Methods  Support Vector Learning, 1999.
[Collins and Duffy, 2002], Michael Collins and Nigel Duffy. New ranking algorithms for parsing and tagging: Kernels over discrete structures, and the voted perceptron. In ACL02, 2002.
[Moschitti, ECML 2006], Alessandro Moschitti, Efficient Convolution
Kernels for Dependency and Constituent Syntactic Trees. In Proceedings
of the 17th European Conference on Machine
[Moschitti et al., ECMLMLG 2006], Alessandro Moschitti, Daniele Pighin, and Roberto Basili, Tree Kernel Engineering for Proposition Reranking, In Proceedings of Mining and Learning with Graphs (MLG 2006), Workshop held with ECML/PKDD 2006, Berlin, Germany, 2006.
[Moschitti et al., CoNLL
2006], Alessandro Moschitti, Daniele Pighin and Roberto Basili. Semantic Role Labeling via Tree Kernel
joint inference. In Proceedings of the 10th
Conference on Computational Natural Language
[Vishwanathan and Smola, 2002], S.V.N. Vishwanathan and A.J. Smola. Fast kernels on strings and trees. In Proceedings of Neural Information Processing Systems, 2002.
[Zhang and Lee, 2003], Zhang, D., Lee, W.S.: Question classification using support vector machines. In: Proceedings of SIGIR'03, 2003.
Maintained by Alessandro Moschitti
moschitti[at]dit.unitn.it 