Future Directions for Indexing and Retrieving Visual Media

On Friday, November 26th, before the PhD defense of Bouke Huurnink, a related mini-symposium will take place, entitled "Future Directions for Indexing and Retrieving Visual Media". The mini-symposium is organized by the ISLA group at the Informatics Institute of the University of Amsterdam. You are cordially invited to attend both the symposium and the defense. There is no admission fee for the symposium; however, space is limited. Please register by sending an e-mail to Saskia van Loo at S.M.vanLoo@uva.nl. See below for the program, locations, and abstracts of the talks.

Program

10:30-11:15  Alan Smeaton - Indexing Visual Media and the Box
11:15-11:30  Coffee break
11:30-12:15  Cees Snoek - Visual-Concept Search Solved?
12:15-12:30  Discussion
12:30-14:00  Break (lunch not included)
14:00-15:00  PhD defense Bouke Huurnink - Search in Audiovisual Broadcast Archives
15:00-16:00  Reception

Locations

Symposium:
VOC Zaal, Oost-Indisch Huis
Kloveniersburgwal 48
Amsterdam
On map: http://bit.ly/bPtm4Q

PhD defense:
Agnietenkapel
Oudezijds Voorburgwal 231
Amsterdam
On map: http://bit.ly/93CVJ

Symposium Abstracts

Alan Smeaton - Indexing Visual Media and the Box
CLARITY: Centre for Sensor Web Technologies, Dublin City University

Content-based information access to visual media is usually derived from the content itself - the pixels within the box or the frame - or, more recently, from manual descriptions of what the box looks like. In the first instance we have extracted low-level features like colours and texture, and more recently semantic features like indoor/outdoor, faces, trees, roads, etc. For describing what the box or frame looks like, we have used metadata: automatic tags like time and location, or user-assigned tags describing the content. All of these techniques describe visual content, but in a cold and clinical way. They may capture some elements of context, such as location or activities, but they do not describe the impact of visual media: the way it might change or affect us when we see it, the way it might change our emotional state. Ask anybody to describe a movie, for example, and they may use terms like exciting, uplifting, depressing, emotionally draining, funny, or light-hearted. What are the pixel combinations we can use to determine these? In this presentation I will outline some of the ways in which these less clinical aspects of visual media can be described by capturing human reactions. A brief overview of the kinds of sensors available for capturing indicators of human physiology is followed by an outline of some of the work we are undertaking in CLARITY towards this goal. While still quite preliminary, the techniques show promise as a way to index visual media more comprehensively.

Cees Snoek - Visual-Concept Search Solved?
University of Amsterdam / UC Berkeley

Interpreting the visual signal that enters the brain is an amazingly complex task, deeply rooted in life experience. Approximately half the brain is engaged in assigning a meaning to the incoming image, starting with the categorization of all visual concepts in the scene. Nevertheless, during the past five years the field of computer vision has made considerable progress. It has done so not on the basis of precise modeling of all encountered objects and scenes -- a task too complex and exhaustive to execute -- but by combining rich, sensory-invariant descriptions of all patches in the scene into semantic classes learned from a limited number of examples. Research has reached the point where one part of the community suggests visual search is practically solved and that progress has only been incremental, while another part argues that current solutions are weak and generalize poorly. We have conducted an experiment to shed light on the issue. Contrary to the widespread belief that progress in visual search is incremental and that detectors generalize poorly, in this presentation we will show that progress has doubled on both counts in just three years. Striving towards machine understanding of images, we highlight current challenges, solutions, and dead ends.
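
As a footnote to the abstract above, the paradigm it describes (aggregating invariant descriptions of local patches into a semantic class learned from few examples) can be made concrete in a few lines of code. The sketch below is a minimal, hypothetical illustration, not the speaker's actual system; every function name, parameter, and the toy data are assumptions made for this example. It uses a gradient-orientation histogram as a simple stand-in for real invariant patch descriptors, and a support vector machine as the concept learner.

import numpy as np
from sklearn.svm import SVC

def patch_descriptor(patch):
    # Describe a patch in a brightness-invariant way: a magnitude-weighted
    # histogram of gradient orientations (a toy stand-in for SIFT-like features).
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx)
    hist, _ = np.histogram(orientation, bins=8, range=(-np.pi, np.pi),
                           weights=magnitude)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def image_descriptor(image, patch_size=16):
    # Combine the descriptions of all patches in the scene into one
    # fixed-length vector for the whole image.
    h, w = image.shape
    patches = [patch_descriptor(image[y:y + patch_size, x:x + patch_size])
               for y in range(0, h - patch_size + 1, patch_size)
               for x in range(0, w - patch_size + 1, patch_size)]
    return np.mean(patches, axis=0)

# Learn a semantic class from a limited number of labelled examples.
# Random arrays stand in for real images here.
rng = np.random.default_rng(0)
images = rng.random((20, 64, 64))           # 20 toy grayscale "images"
labels = np.array([0] * 10 + [1] * 10)      # 1 = concept present, 0 = absent
features = np.stack([image_descriptor(im) for im in images])
detector = SVC(probability=True).fit(features, labels)
print(detector.predict_proba(features[:3])) # concept scores for 3 images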