[my name] CV

Department of Information Engineering
and Computer Science,
University of Trento, Italy

Via Sommarive 5
38123 Trento (Italy)
Phone: +39 0461 285250
Email: aseverynSpamfreegmail

Currently I'm working on the application of Deep Learning techniques to Sentiment Analysis and Question Answering. I'm also actively working on automatic feature engineering with structural kernels, where we develop novel methods for modeling input texts with syntactic structures and efficient algorithms for training supervised machine learning models on such input. My advisor is Alessandro Moschitti. I am supported by a Google Europe Doctoral Fellowship in Machine Learning.

News and Activities

  • [Dec 2015] Our paper "Deep Neural Networks for Named Entity Recognition in Italian" received the Best Paper Award at CLiC-it 2015
  • [June 2015] Our short paper "Distributional Neural Networks for Automatic Resolution of Crossword Puzzles" accepted at ACL 2015
  • [May 2015] Our short paper "Twitter Sentiment Analysis with Deep Convolutional Neural Networks" accepted at SIGIR 2015
  • [April 2015] Our long paper "Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks" accepted at SIGIR 2015
  • [April 2015] Our paper "On the Automatic Learning of Sentiment Lexicons" accepted at NAACL 2015
  • [March 2015] Our journal paper on Multi-lingual Opinion Mining on YouTube accepted for publication in a special issue of the IPM journal
  • [March 2015] Our system description paper for Twitter Sentiment Analysis accepted at Semeval-2015
  • [Jan 2015] Our deep learning system for Twitter Sentiment Analysis (team unitn) ranked 1st in the term-level subtask and 2nd in the message-level subtask at SemEval-2015 Task 10
  • [May 2014] This summer I'm interning at Google London, helping the Ads team improve their ML models
  • [April 2014] Our short paper on Reranking Tweets accepted at SIGIR 2014
  • [April 2014] Our long paper on Opinion Mining for YouTube comments accepted at ACL 2014
  • [Jan 2014] Our paper describing the creation of the SenTube corpus for Sentiment Analysis on YouTube Social Media accepted at LREC 2014
  • [Jan 2014] Our long paper "Encoding Semantic Resources in Syntactic Structures for Passage Reranking" accepted at EACL 2014
  • [Dec 2013] I recently joined Kaggle to participate in the Twitter challenge to predict the sentiment and weather expressed by tweets. My model won 1st place (out of 250 competing teams).
  • [Aug 2013] Long paper accepted at EMNLP 2013
  • [July 2013] Long paper accepted at CIKM 2013
  • [July 2013] I'm visiting Yahoo! Research lab in Barcelona for 2 weeks!
  • [June 2013] I was awarded a Google European PhD Fellowship in Machine Learning! Hurray!!!
  • [May 2013] Long paper accepted at CoNLL 2013
  • [April 2013] Short paper accepted at ACL 2013
  • [April 2013] STS paper accepted for an oral presentation at *SEM 2013 shared task
  • [April 2013] Our system iKernels, developed for Semantic Textual Similarity, ranked 21st out of 90 systems submitted to the *SEM 2013 shared task
  • [April 2013] Long paper accepted at IJCAI 2013
  • [Feb 2013] The code repository for the CS Master's course on NLP & IR is now available. It contains example code, slides, project proposals, etc.
  • [July 2012] Journal paper accepted at Data Mining and Knowledge Discovery (DMKD)
  • [May 2012] I'll be doing a Research internship at Google, Zurich this summer
  • [April 2012] Long paper accepted at SIGIR 2012
  • [Aug 2011] Our long paper won the Best Student Paper Award in Machine Learning at ECML/PKDD 2011
  • [June 2011] Long paper accepted at ECML/PKDD 2011
  • [June 2010] Long paper accepted at ECML/PKDD 2010

Publications

  • Stanislau Semeniuta, Aliaksei Severyn, and Erhardt Barth
    Recurrent Dropout without Memory Loss
    arXiv, 2016 [PDF]

  • Daniele Bonadiman, Aliaksei Severyn and Alessandro Moschitti
    Deep Neural Networks for Named Entity Recognition in Italian [Best Paper Award]
    CLiC-it, 2015 [PDF]

  • Aliaksei Severyn
    Modelling input texts: from Tree Kernels to Deep Learning
    PhD thesis, April 2015 [PDF]

  • Aliaksei Severyn, Massimo Nicosia, Gianni Barlacchi and Alessandro Moschitti
    Distributional Neural Networks for Automatic Resolution of Crossword Puzzles
    In ACL 2015 [PDF] Automatic resolution of Crossword Puzzles (CPs) heavily depends on the quality of the answer candidate lists produced by a retrieval system for each clue of the puzzle grid. Previous work has shown that such lists can be generated using Information Retrieval (IR) search algorithms applied to databases containing previously solved CPs and reranked with tree kernels (TKs) applied to a syntactic tree representation of the clues. In this paper, we create a labelled dataset of 2 million clues on which we apply an innovative Distributional Neural Network (DNN) for reranking clue pairs. Our DNN is computationally efficient and can thus take advantage of such large datasets, showing a large improvement over the TK approach, which is limited to smaller training data. In contrast, when data is scarce, TKs outperform DNNs.

  • Aliaksei Severyn and Alessandro Moschitti
    Twitter Sentiment Analysis with Deep Convolutional Neural Networks
    In SIGIR 2015 [PDF] This paper describes our deep learning system for sentiment analysis of tweets. The main contribution of this work is a new model for initializing the parameter weights of the convolutional neural network, which is crucial to train an accurate model while avoiding the need to inject any additional features. Briefly, we use an unsupervised neural language model to train initial word embeddings that are further tuned by our deep learning model on a distant supervised corpus. At a final stage, the pre-trained parameters of the network are used to initialize the model. We train the latter on the supervised training data recently made available by the official system evaluation campaign on Twitter Sentiment Analysis organized by Semeval-2015. A comparison between the results of our approach and the systems participating in the challenge on the official test sets suggests that our model could be ranked in the first two positions in both the phrase-level subtask A (among 11 teams) and the message-level subtask B (among 40 teams). This is important evidence of the practical value of our solution. [poster]

  • Aliaksei Severyn and Alessandro Moschitti
    Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks
    In SIGIR 2015 [PDF] Learning a similarity function between pairs of objects is at the core of learning to rank approaches. In information retrieval tasks we typically deal with query-document pairs; in question answering, with question-answer pairs. However, before learning can take place, such pairs need to be mapped from the original space of symbolic words into some feature space encoding various aspects of their relatedness, e.g. lexical, syntactic and semantic. Feature engineering is often a laborious task and may require external knowledge sources that are not always available or are difficult to obtain. Recently, deep learning approaches have gained a lot of attention from the research community and industry for their ability to automatically learn optimal feature representations for a given task, while claiming state-of-the-art performance in many tasks in computer vision, speech recognition and natural language processing. In this paper, we present a convolutional neural network architecture for reranking pairs of short texts, where we learn the optimal representation of text pairs and a similarity function to relate them in a supervised way from the available training data. Our network takes only words as input, thus requiring minimal preprocessing. In particular, we consider the task of reranking short text pairs where the elements of the pair are sentences. We test our deep learning system on two popular retrieval tasks from TREC: Question Answering and Microblog Retrieval. Our model demonstrates strong performance on the first task, beating previous state-of-the-art systems by about 3% absolute points in both MAP and MRR, and shows comparable results on tweet reranking, while enjoying the benefits of no manual feature engineering and no additional syntactic parsers. [code] [slides]

  • Aliaksei Severyn and Alessandro Moschitti
    UNITN: Training Deep Convolutional Neural Network for Twitter Sentiment Classification
    Semeval, 2015 [PDF] This paper describes our deep learning system for sentiment analysis of tweets. The main contribution of this work is a process to initialize the parameter weights of the convolutional neural network, which is crucial to train an accurate model while avoiding the need to inject any additional features. Briefly, we use an unsupervised neural language model to initialize word embeddings that are further tuned by our deep learning model on a distant supervised corpus. At a final stage, the pre-trained parameters of the network are used to initialize the model, which is then trained on the supervised training data from Semeval-2015. According to results on the official test sets, our model ranks 1st in the phrase-level subtask A (among 11 teams) and 2nd in the message-level subtask B (among 40 teams). Interestingly, computing an average rank over all six test sets (official and five progress test sets) puts our system 1st in both subtasks A and B. [slides] [poster]

  • Aliaksei Severyn and Alessandro Moschitti
    On the Automatic Learning of Sentiment Lexicons
    NAACL, 2015 [PDF] This paper describes a simple and principled approach to automatically construct sentiment lexicons using distant supervision. We induce the sentiment association scores for the lexicon items from a model trained on a weakly supervised corpus. Our empirical findings show that features extracted from such a machine-learned lexicon outperform models using manual or other automatically constructed sentiment lexicons. Finally, our system achieves state-of-the-art results in the Twitter Sentiment Analysis tasks from Semeval-2013 and ranks 2nd best in Semeval-2014 according to the average rank. [poster]

  • Aliaksei Severyn, Alessandro Moschitti, Olga Uryupina, Barbara Plank, and Katja Filippova
    Multi-lingual Opinion Mining on YouTube
    In Information Processing Management (IPM) journal, 2015 [PDF]

  • Aliaksei Severyn, Alessandro Moschitti, Manos Tsagkias, Richard Berendsen and Maarten de Rijke
    A Syntax-Aware Re-ranker for Microblog Retrieval
    In SIGIR 2014 [PDF] We tackle the problem of improving microblog retrieval algorithms by proposing a robust structural representation of (query, tweet) pairs. We employ these structures in a principled kernel learning framework that automatically extracts and learns highly discriminative features. We test the generalization power of our approach on the TREC Microblog 2011 and 2012 tasks. We find that relational syntactic features generated by structural kernels are effective for learning to rank (L2R) and can easily be combined with those of other existing systems to boost their accuracy. In particular, the results show that our L2R approach improves on almost all the participating systems at TREC, only using their raw scores as a single feature. Our method yields an average increase of 5% in retrieval effectiveness and 7 positions in system ranks. [poster]

  • Aliaksei Severyn, Alessandro Moschitti, Olga Uryupina, Barbara Plank, and Katja Filippova
    Opinion Mining on YouTube
    In ACL 2014 [PDF] This paper defines a systematic approach to Opinion Mining (OM) on YouTube comments by (i) modeling classifiers for predicting the opinion polarity and the type of comment and (ii) proposing robust shallow syntactic structures for improving model adaptability. We rely on the tree kernel technology to automatically extract and learn features with better generalization power than bag-of-words. An extensive empirical evaluation on our manually annotated YouTube comments corpus shows a high classification accuracy and highlights the benefits of structural models in a cross-domain setting. [poster]

  • Olga Uryupina, Barbara Plank, Aliaksei Severyn, Agata Rotondi, and Alessandro Moschitti
    SenTube: A Corpus for Sentiment Analysis on YouTube Social Media
    In LREC 2014

  • Kateryna Tymoshenko, Alessandro Moschitti and Aliaksei Severyn
    Encoding Semantic Resources in Syntactic Structures for Passage Reranking
    In EACL 2014

  • Aliaksei Severyn and Alessandro Moschitti
    Automatic Feature Engineering for Answer Selection and Extraction
    In EMNLP 2013 [PDF] This paper proposes a framework for automatically engineering features for two important tasks of question answering: answer sentence selection and answer extraction. We represent question and answer sentence pairs with linguistic structures enriched by semantic information, where the latter is produced by automatic classifiers, e.g., question classifier and Named Entity Recognizer. Tree kernels applied to such structures enable a simple way to generate highly discriminative structural features that combine syntactic and semantic information encoded in the input trees. We conduct experiments on a public benchmark from TREC to compare with previous systems for answer sentence selection and answer extraction. The results show that our models greatly improve on the state of the art, e.g., up to 22% on F1 (relative improvement) for answer extraction, while using no additional resources and no manual feature engineering. [poster]

  • Aliaksei Severyn and Massimo Nicosia and Alessandro Moschitti
    Building Structures from Classifiers for Passage Reranking
    In CIKM 2013 [PDF] This paper shows that learning to rank models can be applied to automatically learn complex patterns, such as relational semantic structures occurring in questions and their answer passages. This is achieved by providing the learning algorithm with a tree representation derived from the syntactic trees of questions and passages connected by relational tags, where the latter are again provided by means of automatic classifiers, i.e., question and focus classifiers and Named Entity Recognizers. This way, effective structural relational patterns are implicitly encoded in the representation and can be automatically utilized by powerful machine learning models such as kernel methods. We conduct an extensive experimental evaluation of our models on well-known benchmarks from the question answering (QA) track of the TREC challenges. The comparison with state-of-the-art systems and BM25 shows a relative improvement in MAP of more than 14% and 45%, respectively. A further comparison on the task restricted to answer sentence reranking shows an improvement in MAP of more than 8% over the state of the art. [slides]

  • Aliaksei Severyn and Massimo Nicosia and Alessandro Moschitti
    Learning Adaptable Patterns for Passage Reranking
    In CoNLL 2013 [PDF] This paper proposes passage reranking models that (i) do not require manual feature engineering and (ii) greatly preserve accuracy, when changing application domain. Their main characteristic is the use of relational semantic structures representing questions and their answer passages. The relations are established using information from automatic classifiers, i.e., question category (QC) and focus classifiers (FC) and Named Entity Recognizers (NER). This way (i) effective structural relational patterns can be automatically learned with kernel machines; and (ii) structures are more invariant w.r.t. different domains, thus fostering adaptability. [slides] [poster]

  • Aliaksei Severyn and Massimo Nicosia and Alessandro Moschitti
    Learning Semantic Textual Similarity with Structural Representations
    In ACL 2013 [PDF] Measuring semantic textual similarity (STS) is a cornerstone of many NLP applications. Unlike the majority of approaches, where a large number of pairwise similarity features are used to represent a text pair, our model features the following: (i) it directly encodes input texts into relational syntactic structures; (ii) it relies on tree kernels to handle feature engineering automatically; (iii) it combines both structural and feature vector representations in a single scoring model, i.e., in Support Vector Regression (SVR); and (iv) it delivers significant improvement over the best STS systems. [slides]

  • Aliaksei Severyn and Massimo Nicosia and Alessandro Moschitti
    iKernels-Core: Tree Kernel Learning for Textual Similarity
    In *SEM 2013 [PDF] This paper describes the participation of the iKernels system in the Semantic Textual Similarity (STS) shared task at *SEM 2013. Unlike the majority of approaches, where a large number of pairwise similarity features are used to learn a regression model, our model directly encodes the input texts into syntactic/semantic structures. Our systems rely on tree kernels to automatically extract a rich set of syntactic patterns to learn a similarity score correlated with human judgements. We experiment with different structural representations derived from constituency and dependency trees. While showing large improvements over the top results from the previous year's task (STS-2012), our best system ranks 21st out of a total of 88 systems that participated in the STS-2013 task. Nevertheless, a slight refinement to our model makes it rank 4th. [BibTex]@InProceedings{severyn:*SEM:2013,
    author = {Severyn, Aliaksei and Nicosia, Massimo and Moschitti, Alessandro},
    title = {iKernels-Core: Tree Kernel Learning for Textual Similarity},
    booktitle = {Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity},
    month = {June},
    year = {2013},
    address = {Atlanta, Georgia, USA},
    publisher = {Association for Computational Linguistics},
    pages = {53--58},
    }
    [slides] [poster]

  • Aliaksei Severyn and Alessandro Moschitti
    Fast Linearization of Tree Kernels over Large-Scale Data.
    In IJCAI 2013 [PDF] Convolution tree kernels have been successfully applied to many language processing tasks, achieving state-of-the-art accuracy. Unfortunately, the higher computational complexity of learning with kernels, compared to using explicit feature vectors, makes them less attractive for large-scale data. In this paper, we study the latest approaches to solve such problems, ranging from feature hashing to reverse kernel engineering and approximate cutting plane training with model compression. We derive a novel method that relies on reverse-kernel engineering together with an efficient kernel learning method. The approach gives the advantage of using tree kernels to automatically generate rich structured feature spaces while working in the linear space, where learning and testing are fast. We experimented with training sets of up to 4 million examples from Semantic Role Labeling. The results show that (i) the choice of correct structural features is essential and (ii) we can speed up training from weeks to less than 20 minutes. [slides] [poster]

  • Aliaksei Severyn and Alessandro Moschitti
    Structural Relationships for Large-Scale Learning of Answer Re-ranking
    In SIGIR 2012 [PDF] Supervised learning applied to answer re-ranking can greatly improve the overall accuracy of question answering (QA) systems. The key aspect is that the relationships and properties of the question/answer pair, composed of a question and the supporting passage of an answer candidate, can be efficiently compared with those captured by the learnt model. In this paper, we define novel supervised approaches that exploit structural relationships between a question and its candidate answer passages to learn a re-ranking model. We model structural representations of both questions and answers and their mutual relationships by just using an off-the-shelf shallow syntactic parser. We encode structures in Support Vector Machines (SVMs) by means of sequence and tree kernels, which can implicitly represent question and answer pairs in huge feature spaces. Such models, together with the latest approach to fast kernel-based learning, enabled the training of our rerankers on hundreds of thousands of instances, which was previously intractable for kernelized SVMs. The results on two different QA datasets, e.g., Answerbag and Jeopardy! data, show that our models deliver large improvements on passage re-ranking tasks, reducing the error in Recall of the BM25 baseline by about 18%. One of the key findings of this work is that, despite their simplicity, shallow syntactic trees allow for learning complex relational structures, which exhibit a steep learning curve as the training size increases. [BibTex]@inproceedings{sigir:Severyn:2012,
    author = {Aliaksei Severyn and Alessandro Moschitti},
    title = {Structural relationships for large-scale learning of answer re-ranking},
    booktitle = {SIGIR},
    year = {2012},
    pages = {741-750},
    }

  • Aliaksei Severyn and Alessandro Moschitti
    Fast Support Vector Machines for Convolution Tree Kernels
    In Data Mining and Knowledge Discovery (DMKD), Volume 25 (special issue), 2012.
    [PDF] Feature engineering is one of the most complex aspects of system design in machine learning. Fortunately, kernel methods provide the designer with formidable tools to tackle such complexity. Among others, tree kernels (TKs) have been successfully applied for representing structured data in diverse domains, ranging from bioinformatics and data mining to natural language processing. One drawback of such methods is that learning with them typically requires a number of kernel computations between training examples that grows quadratically with the training set size. However, in practice substructures often repeat in the data, which makes it possible to avoid a large number of redundant kernel evaluations. In this paper, we propose the use of Directed Acyclic Graphs (DAGs) to compactly represent trees in the training algorithm of Support Vector Machines. In particular, we use DAGs for each iteration of the cutting plane algorithm (CPA) to encode the model composed of a set of trees. This enables DAG kernels to efficiently evaluate TKs between the current model and a given training tree. Consequently, the total amount of computation is reduced by avoiding redundant evaluations over shared substructures. We provide theory and algorithms to formally characterize the above idea, which we tested on several datasets. The empirical results confirm the benefits of the approach in terms of significant speedups over previous state-of-the-art methods. In addition, we propose an alternative sampling strategy within the CPA to address the class-imbalance problem, which, coupled with fast learning methods, provides a viable TK learning framework for a large class of real-world applications. [BibTex]@article {DMKD:Severyn:2012,
    author = {Aliaksei Severyn and Alessandro Moschitti},
    title = {Fast support vector machines for convolution tree kernels},
    journal = {Data Min. Knowl. Discov.},
    volume = {25},
    number = {2},
    year = {2012},
    pages = {325-357},
    }

  • Aliaksei Severyn and Alessandro Moschitti
    Fast Support Vector Machines for Structural Kernels [Best Student Paper Award]
    In ECML/PKDD 2011 [PDF] In this paper, we propose three important enhancements of the approximate cutting plane algorithm (CPA) to train Support Vector Machines with structural kernels: (i) we exploit a compact yet exact representation of cutting plane models using directed acyclic graphs to speed up both training and classification, (ii) we provide a parallel implementation, which makes the training scale almost linearly with the number of CPUs, and (iii) we propose an alternative sampling strategy to handle the class-imbalance problem and show that theoretical convergence bounds are preserved. The experimental evaluations on three diverse datasets demonstrate the soundness of our approach and the possibility to carry out fast learning and classification with structural kernels. [BibTex]@inproceedings{Severyn:2011:ECML,
    author = {Severyn, Aliaksei and Moschitti, Alessandro},
    title = {Fast Support Vector Machines for Structural Kernels},
    booktitle = {ECML/PKDD (3)},
    year = {2011},
    isbn = {978-3-642-23807-9},
    pages = {175-190},
    keywords = {natural language processing, structural kernels, support vector machines},
    }
    [Slides] [Video]

  • Aliaksei Severyn and Alessandro Moschitti
    Large-Scale Learning with Structural Kernels for Class-Imbalanced Datasets
    In Communications in Computer and Information Science (CCIS), Springer 2011.

  • Aliaksei Severyn and Alessandro Moschitti
    Large-Scale Support Vector Learning with Structural Kernels
    In ECML/PKDD 2010 [PDF] In this paper, we present an extensive study of the cutting-plane algorithm (CPA) applied to structural kernels for advanced text classification on large datasets. In particular, we carry out comprehensive experimentation on two interesting natural language tasks, i.e. predicate argument extraction and question answering. Our results show that (i) CPA applied to train a non-linear model with different tree kernels fully matches the accuracy of the conventional SVM algorithm while being ten times faster; (ii) by using smaller sampling sizes to approximate subgradients in CPA we can trade off accuracy for speed, yet the optimal parameters and kernels found remain optimal for the exact SVM. These results open numerous research perspectives, e.g. in natural language processing, as they show that complex structural kernels can be efficiently used in real-world applications. For example, for the first time, we could carry out extensive tests of several tree kernels on millions of training instances. As a direct benefit, we could experiment with a variant of the partial tree kernel, which we also propose in this paper. [BibTex]@inproceedings{Severyn:2010:ECML,
    author = {Severyn, Aliaksei and Moschitti, Alessandro},
    title = {Large-scale support vector learning with structural kernels},
    booktitle = {ECML/PKDD (3)},
    year = {2010},
    isbn = {978-3-642-15938-1},
    pages = {229-244},
    keywords = {natural language processing, structural kernels, support vector machines},
    }
    [Slides]

* a keyword cloud generated from abstracts of my papers