The MediaMill TRECVID 2009 Semantic Video Search Engine
by C.G.M. Snoek, K.E.A. van de Sande, O. de Rooij, B. Huurnink, J.R.R. Uijlings, M. van Liempt, M. Bugalho, I. Trancoso, F. Yan, M.A. Tahir, K. Mikolajczyk, J. Kittler, M. de Rijke, J-M Geusebroek, T. Gevers, M. Worring, D.C. Koelma, A.W.M. Smeulders
Abstract:
In this paper we describe our TRECVID 2009 video retrieval experiments. The MediaMill team participated in three tasks: concept detection, automatic search, and interactive search. The starting point for the MediaMill concept detection approach is our top-performing bag-of-words system of last year, which uses multiple color descriptors, codebooks with soft-assignment, and kernel-based supervised learning. We improve upon this baseline system by exploring two novel research directions. Firstly, we study a multi-modal extension by including 20 audio concepts and fusion using two novel multi-kernel supervised learning methods. Secondly, with the help of recently proposed algorithmic refinements of bag-of-words representations, a GPU implementation, and compute clusters, we scale up the amount of visual information analyzed by an order of magnitude, to a total of 1,000,000 i-frames. Our experiments evaluate the merit of these new components, ultimately leading to 64 robust concept detectors for video retrieval. For retrieval, a robust but limited set of concept detectors justifies the need to rely on as many auxiliary information channels as possible. For automatic search we therefore explore how we can learn to rank various information channels simultaneously to maximize video search results for a given topic. To further improve the video retrieval results, our interactive search experiments investigate the role of visualizing preview results along a given browse dimension, and of relevance feedback mechanisms that learn to solve complex search topics by analyzing user browsing behavior. The 2009 edition of the TRECVID benchmark has again been fruitful for the MediaMill team, resulting in the top ranking for both concept detection and interactive search. Much has been learned during this year's TRECVID campaign; we highlight the most important lessons at the end of this paper.
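The codebooks with soft-assignment mentioned above can be sketched as follows. This is an illustrative reconstruction of the general soft-assignment idea (each local descriptor spreads weight over all codewords via a Gaussian kernel, rather than voting for its single nearest codeword), not the authors' actual implementation; the function name, kernel choice, and parameters are assumptions.

```python
import numpy as np

def soft_assign_bow(descriptors, codebook, sigma=1.0):
    """Build a bag-of-words histogram with soft codeword assignment.

    descriptors : (n_desc, dim) local features from one frame
    codebook    : (n_words, dim) visual codewords
    sigma       : Gaussian kernel bandwidth (hypothetical default)
    """
    # Pairwise squared Euclidean distances, shape (n_desc, n_words)
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Gaussian kernel weights; each descriptor's weights sum to 1
    weights = np.exp(-d2 / (2.0 * sigma ** 2))
    weights /= weights.sum(axis=1, keepdims=True)
    # Accumulate per-codeword weight and L1-normalize the histogram
    hist = weights.sum(axis=0)
    return hist / hist.sum()

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 16))      # 8 codewords, 16-D descriptors
descriptors = rng.normal(size=(50, 16))  # 50 local descriptors from one frame
h = soft_assign_bow(descriptors, codebook)
```

The resulting L1-normalized histogram `h` could then be fed to a kernel-based classifier, which is the role the bag-of-words representation plays in the pipeline described here.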
Reference:
C.G.M. Snoek, K.E.A. van de Sande, O. de Rooij, B. Huurnink, J.R.R. Uijlings, M. van Liempt, M. Bugalho, I. Trancoso, F. Yan, M.A. Tahir, K. Mikolajczyk, J. Kittler, M. de Rijke, J-M Geusebroek, T. Gevers, M. Worring, D.C. Koelma, A.W.M. Smeulders, "The MediaMill TRECVID 2009 Semantic Video Search Engine", In Proceedings of the 7th TRECVID Workshop, Gaithersburg, USA, 2009.
Bibtex Entry:
@INPROCEEDINGS{SnoekTRECVID09,
  author = {C.G.M. Snoek and K.E.A. van de Sande and O. de Rooij and B. Huurnink
	and J.R.R. Uijlings and M. van Liempt and M. Bugalho and I. Trancoso
	and F. Yan and M.A. Tahir and K. Mikolajczyk and J. Kittler and M.
	de Rijke and J-M Geusebroek and T. Gevers and M. Worring and D.C.
	Koelma and A.W.M. Smeulders},
  title = {The {MediaMill} {TRECVID} 2009 Semantic Video Search Engine},
  booktitle = {Proceedings of the 7th TRECVID Workshop},
  year = {2009},
  address = {Gaithersburg, USA},
  month = {November},
  abstract = {In this paper we describe our TRECVID 2009 video retrieval experiments.
	The MediaMill team participated in three tasks: concept detection,
	automatic search, and interactive search. The starting point for
	the MediaMill concept detection approach is our top-performing bag-of-words
	system of last year, which uses multiple color descriptors, codebooks
	with soft-assignment, and kernel-based supervised learning. We improve
	upon this baseline system by exploring two novel research directions.
	Firstly, we study a multi-modal extension by including 20 audio concepts
	and fusion using two novel multi-kernel supervised learning methods.
	Secondly, with the help of recently proposed algorithmic refinements
	of bag-of-words representations, a GPU implementation, and compute
	clusters, we scale up the amount of visual information analyzed by
	an order of magnitude, to a total of 1,000,000 i-frames. Our experiments
	evaluate the merit of these new components, ultimately leading to
	64 robust concept detectors for video retrieval. For retrieval, a
	robust but limited set of concept detectors justifies the need to
	rely on as many auxiliary information channels as possible. For automatic
	search we therefore explore how we can learn to rank various information
	channels simultaneously to maximize video search results for a given
	topic. To further improve the video retrieval results, our interactive
	search experiments investigate the role of visualizing preview results
	along a given browse dimension, and of relevance feedback mechanisms
	that learn to solve complex search topics by analyzing user browsing
	behavior. The 2009 edition of the TRECVID benchmark has again been
	fruitful for the MediaMill team, resulting in the top ranking for
	both concept detection and interactive search. Much has been learned
	during this year's TRECVID campaign; we highlight the most important
	lessons at the end of this paper.},
  file = {mediamill-TRECVID2009-final.pdf:http\://staff.science.uva.nl/~cgmsnoek/pub/mediamill-TRECVID2009-final.pdf:PDF},
  url = {http://www.science.uva.nl/research/publications/2009/SnoekPTRECVID2009/mediamill_TRECVID2009_final.pdf}
}