HERMES, the all-seeing eye

12/2009

HERMES (Human Expressive Graphic Representation of Motion and their Evaluation in Sequences) analyses human behaviour in video sequences captured at three different focus levels: the individual as a relatively distant object; the individual's body at medium range, so that body postures can be analysed; and the individual's face, which allows a detailed study of facial expressions. The information obtained is processed by computer vision and artificial intelligence algorithms, which allow the system to learn and recognise movement patterns.

HERMES offers two important innovations in the field of computer vision. The first is the description in natural language of the movement captured by the cameras, through simple and precise phrases which appear on the computer screen in real time, together with the number of the frame in which the action takes place. The system uses an avatar to speak this information aloud in different languages. The second innovation is the ability to analyse and detect potentially unusual behaviour - based on the movements it recognises - and to issue warning signals. For example, HERMES sends a signal to the control centre of an underground station after capturing the image of someone trying to cross the tracks, or alerts a medical centre if an elderly person living alone falls.
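The two ideas above, frame-stamped sentences and warnings for unusual behaviour, can be illustrated with a minimal sketch. It is not the HERMES implementation: the `Event` record, the `UNUSUAL_ACTIONS` set and the sentence template are all assumptions made up for this example, and the real system derives its descriptions from a conceptual representation of the scene rather than from a fixed template.

```python
from dataclasses import dataclass

# Hypothetical list of behaviours treated as unusual (not taken from HERMES).
UNUSUAL_ACTIONS = {"crossing the tracks", "falling"}


@dataclass
class Event:
    frame: int   # frame number in which the action was observed
    agent: str   # e.g. "A person"
    action: str  # e.g. "walking"
    place: str   # e.g. "along the platform"


def describe(event):
    """Build a short sentence prefixed with the frame number."""
    return f"[frame {event.frame}] {event.agent} is {event.action} {event.place}."


def is_unusual(event):
    """Flag the event if its action is in the assumed list of unusual actions."""
    return event.action in UNUSUAL_ACTIONS


def process(events):
    """Return (all descriptions, descriptions that should raise a warning)."""
    descriptions = [describe(e) for e in events]
    alerts = [describe(e) for e in events if is_unusual(e)]
    return descriptions, alerts
```

In this sketch, an event such as `Event(185, "A person", "crossing the tracks", "at the station")` would appear both in the on-screen descriptions and in the list of alerts to be forwarded to a control centre.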

Seven different sub-projects have been developed by researchers working on the HERMES project:

1.- Camera system: static cameras were used to capture the full scene, and high-resolution active cameras - pan-tilt-zoom sensors (horizontal and vertical rotation plus zoom) - were used for the automatic tracking and close-ups of individuals. To do this, optimisation techniques were applied to the information contained in the images.
2.- Movement analysis of objects and individuals in the images. The information obtained is used to guide the active cameras towards where the action is taking place. These tasks were carried out using different tracking techniques.
3.- Movement analysis of a person's body in order to extract information from different parts of the body, analyse these actions and describe or predict behaviours. In this case, techniques based on pattern and silhouette recognition were used.
4.- Analysis of facial movements to understand an individual's emotional states, attitudes and possible reactions. In this sub-project new techniques were created and used for the tracking and alignment of 2D and 3D faces.
5.- Integration of software and natural language with the aim of describing what is happening in the scenes recorded using a conceptual representation scheme.
6.- Full integration of system, software and hardware to work in real environments and in real time. The system was designed and put to use in real life situations to test its functioning.
7.- The generation of virtual sequences based on the description of behaviours in natural language and the interaction of real and virtual worlds in the same sequence, using augmented reality techniques.
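As an illustration of sub-projects 1 and 2, the sketch below shows one simple way a tracked position in a static camera image could be turned into pan and tilt angles for an active camera. It is only a sketch under strong assumptions that the article does not state: a pinhole camera model, an active camera mounted at the same position and with the same optical axis as the static one, and a focal length known in pixels. The function name and parameters are hypothetical.

```python
import math


def pan_tilt_to_target(x, y, width, height, focal_px):
    """Return (pan, tilt) in degrees that would centre pixel (x, y).

    Assumes a pinhole model and an active camera co-located with the
    static camera; focal_px is the focal length expressed in pixels.
    """
    dx = x - width / 2.0   # horizontal offset from the image centre
    dy = y - height / 2.0  # vertical offset (image y grows downwards)
    pan = math.degrees(math.atan2(dx, focal_px))
    tilt = -math.degrees(math.atan2(dy, focal_px))  # positive tilt = look up
    return pan, tilt
```

A target at the image centre yields zero pan and tilt, while a target to the right of and above the centre yields positive values for both. A real installation would also need calibration between the cameras, which this sketch leaves out.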

The most obvious applications of HERMES are in intelligent surveillance and the prevention of accidents or crimes. However, researchers consider that there is also much to be gained from using this tool in sectors such as marketing or psychology.

The HERMES project was coordinated by Juan José Villanueva, emeritus professor of the Department of Computer Science at UAB and former director of the CVC, a position he held for 14 years.

HERMES, which has received several scientific awards at many of the top specialised conferences, was carried out as part of the 6th European Research Framework Programme. The project, lasting three and a half years and with 2,100,000 euros in funding, included the participation of five of the field's most prominent research groups in Europe and of a firm specialising in information and communication technologies: the Computer Vision Centre at UAB, Spain, which acted as project coordinator; the Institut für Algorithmen und Kognitive Systeme (IAKS) at the University of Karlsruhe, Germany; the Computer Vision and Media Technology Laboratory (CVMT) at the University of Aalborg, Denmark; the Computer Vision Laboratory (BIWI) at the ETH (Eidgenössische Technische Hochschule) Zürich, Switzerland; the Active Vision Laboratory (AVL) at Oxford University, United Kingdom; and Answare Technologies, Spain.

Juan José Villanueva

juanjo@cvc.uab.es