
Computer Vision & Pattern Recognition

Department or institution involved



Information Extraction from Historical Document Images

Topic description
Digital humanities is an emerging field. Automatically extracting information from scanned documents of any modality (printed or handwritten text, graphics) stored in archives and libraries remains a challenge.
Context is very important for improving recognition, especially of handwritten material. One kind is syntactic context: the structure of objects and the language models for both textual and graphical content. The other is semantic context, which is provided by users and encompasses new modalities of user interaction (sketching interfaces, collaborative annotation platforms, etc.).
To tackle these challenges, the current research objectives of the hosting group are:
1. To develop models for indexing and cross-linking visual terms in large-scale document collections, building common representation spaces for different data modalities.
2. To include syntactic and semantic context in recognition and retrieval architectures (context-aware recognition).
3. To construct an advanced collaborative and tangible interface architecture and validate it with end users.
The Fellow is expected to contribute to one or more of these objectives. Research will be carried out in collaboration with archives, with the aim of interpreting historical manuscripts such as demographic documents, handwritten musical scores and maps, among others.

The Computer Vision Centre
The selected candidate will work in the Computer Vision Centre, Barcelona, a research institute comprising more than 100 researchers and support staff, dedicated to computer vision research and knowledge transfer. With a strong international projection and links to industry, the Computer Vision Centre offers an exciting environment for scientific career development.
The Computer Vision Centre has a plan for expansion of its permanent research staff base, and has received the "HR Excellence in Research" award as a provider and supporter of a stimulating and favourable working environment.

Project supervisor & hosting group
Prof. Josep Lladós will supervise the Fellow (with the support of Dr. Alicia Fornés).
The hosting group (“DAG: Document Analysis Group”) is one of the largest, most active and most internationally recognized teams in document analysis and recognition. The main outcomes in the last five years are: 12 PhD theses, 50 JCR papers, 1,200 k€ in competitive projects and 800 k€ in technology transfer projects.
The team has many ongoing national and international projects, as well as several international collaborations, in which the Fellow will be involved.

Planned Secondments
The hosting group has several international collaborations with mutual exchanges of PhD students and post-docs. Examples of such close collaborations are: Université de La Rochelle (France), Université de Lyon (France), DFKI (Germany), University of Dortmund (Germany), University of Florence (Italy), Osaka Prefecture University (Japan), École Polytechnique de Montréal (Canada). The Fellow will undertake stays of about 2-8 weeks to learn new methods needed for the project.

Candidate’s profile  
A PhD in Computer Science (Computer Vision and Pattern Recognition) or a related field is required. We especially encourage candidates with experience in Document Analysis and Recognition, but candidates with other backgrounds relevant to this position will also be considered.
Applicants are expected to be fluent in both spoken and written English. They should work well in a team while demonstrating initiative and independence, and be willing to supervise PhD students.



If you are an eligible candidate interested in applying, please contact pr.sphere@uab.cat to be put in touch with the hosting group.

Department or institution involved


Predicting intrinsic properties from images using deep networks

Topic description
Natural images contain many effects caused by the interaction between light and the objects in the scene. These effects include, amongst others, shading, shadows, and specularities. In general, computer vision algorithms are hindered by such effects.

Identifying and isolating these effects, together with the physical elements of the scene that generate them, such as 3D shape, depth and the color of the light source, can be very useful for improving the performance of many algorithms. However, estimating these intrinsic properties from a single image is a challenging problem that has received much attention in recent years.

This project aims to define deep network architectures that decompose images into some of their photometric properties, such as reflectance, lighting, shading (including self-shadows and cast shadows) or specularities, jointly with the corresponding 3D shape properties. To achieve this goal, we propose to work on a large image dataset that we are building. This dataset is acquired automatically by creating a wide range of variations in the photometric conditions of the scenes. Scenes are composed of a set of objects whose 3D properties are known.

The Fellow is expected to contribute to the hosting group's current lines of research and to carry out research tasks such as:

•    Reviewing previous work on the topics of the project (i.e. intrinsic image estimation and deep learning).
•    Designing Convolutional Neural Networks to learn the best features based on the large dataset of lighting effects that the hosting group is building.
•    Evaluating the defined models by performing experiments on standard datasets and/or specific datasets of intrinsic properties.


The Computer Vision Centre
The selected candidate will work in the Computer Vision Centre, Barcelona, a research institute comprising more than 100 researchers and support staff, dedicated to computer vision research and knowledge transfer. With a strong international projection and links to industry, the Computer Vision Centre offers an exciting environment for scientific career development.

The Computer Vision Centre has a plan for expansion of its permanent research staff base, and has received the "HR Excellence in Research" award as a provider and supporter of a stimulating and favourable working environment.


Project supervisor & hosting group
Prof. Maria Vanrell will supervise the Fellow with the support of the hosting team (Color in Context: www.cic.uab.cat), which is widely recognized for its contributions to color representation in computer vision.

The hosting group collaborates with a number of institutions worldwide. The research fellow will have the opportunity to participate actively in these collaborations, including through research stays. The activities of the team are supported by multiple research and technology transfer projects.

Planned Secondments
The hosting group has several international collaborations with mutual exchanges of PhD students and post-docs. The collaborations most related to this project include the following research partners: Dimitris Samaras (Stony Brook University), Theo Gevers (University of Amsterdam) and Graham Finlayson (University of East Anglia).

Candidate’s profile  
A PhD in computer science or a related field is required.
The applicants must have experience in computer vision, pattern recognition and machine learning techniques, and be able to demonstrate strong analytical and programming skills.

Applicants are expected to be fluent in both spoken and written English. They should work well in a team while demonstrating initiative and independence, and be willing to supervise PhD students.

If you are an eligible candidate interested in applying, please contact pr.sphere@uab.cat to be put in touch with the hosting group.

Department or institution involved


Perception-based self-driving system for urban scenarios

Topic description
A self-driving car is a vehicle capable of sensing its environment and navigating autonomously without human intervention. Autonomous cars can interpret their environment using techniques such as radar, lidar, GPS, odometry and computer vision. Of these, computer vision is one of the most challenging and promising, thanks to deep learning.

Computer vision makes it possible to detect and track obstacles (e.g., pedestrians, vehicles, cyclists, traffic signs) and to perform scene understanding, 3D reconstruction and free space computation, among other tasks. Advanced simulation is crucial to train and test the system in corner cases where it is difficult to acquire enough annotated data.

The goal of the project is to design and develop a perception-based self-driving system for urban scenarios, and to test it in real-life conditions on the UAB campus.

The Fellow will have the opportunity to work with and lead PhD and Master's students on various deep learning topics applied to autonomous driving, such as:
•    Object detection
•    End-to-End driving
•    Visual localization and odometry
•    3D reconstruction
•    Scene understanding
•    Free space computation
•    Lane detection

The Computer Vision Centre
The selected candidate will work in the Computer Vision Centre (CVC), Barcelona, a research institute comprising more than 100 researchers and support staff, dedicated to computer vision research and knowledge transfer. With a strong international projection and links to industry, the Computer Vision Centre offers an exciting environment for scientific career development.

The Computer Vision Centre has a plan for expansion of its permanent research staff base, and has received the "HR Excellence in Research" award as a provider and supporter of a stimulating and favourable working environment.


Project supervisor & hosting group
Prof. Antonio M. López will supervise the Fellow. Prof. López leads the Advanced Driver Assistance Systems team at the CVC (ADAS, http://adas.cvc.uab.es). The team has 10 years of experience in ADAS and autonomous vehicles, developing such systems for companies like Volkswagen, SEAT, IDIADA, or Samsung.

Currently, ADAS is developing an autonomous car (http://adas.cvc.uab.es/site/elektra). It is a fully automated electric vehicle equipped with cameras, GPS, IMU and computing power. The vehicle already has control and planning algorithms, high-definition maps for precise localization, environment perception (e.g., obstacle detection and tracking, scene understanding, 3D reconstruction, free space computation) and V2X communications.

NVIDIA is one of the key partners of the project, providing us with hardware and prototypes such as DRIVE PX. With DRIVE PX we are able to advance the development of our project faster than other groups.

Our research focuses on camera-based vehicle perception and understanding, for which deep learning will be key. The remaining aspects of the autonomous vehicle are being developed by our partner research groups, such as control and planning (ACS-UPC), real-time optimization (CAOS-UAB), communications (DEIC-UAB), vehicle automation (ICS-CSIC), etc.

This is a very ambitious project and one of the strategic lines of the CVC. It is fully aligned with the H2020, RIS3Cat and Spanish challenges, as well as with industry interests. In this context, ADAS is currently developing two main projects: ACDC (Automated and Connected Driving in the City) and SYNTHIA (www.synthia-dataset.net).

Planned Secondments
The hosting group has several international collaborations with mutual exchanges of PhD students and post-docs. Examples of such close collaborations are: MILA - Université de Montréal (Montreal, Canada), Daimler AG (Stuttgart, Germany), Toshiba Research Europe (Cambridge, UK), Toshiba Research and Development Center (Kawasaki, Japan), NICTA (Canberra, Australia).

Candidate’s profile  
The candidate should possess a PhD in computer vision or machine learning, and have a strong publication record. We especially encourage candidates with experience in Deep Learning to apply, but other backgrounds in computer vision and machine learning will also be considered.
Applicants are expected to be fluent in both spoken and written English. They should work well in a team while demonstrating initiative and independence, and be willing to supervise PhD students.


If you are an eligible candidate interested in applying, please contact pr.sphere@uab.cat to be put in touch with the hosting group.

Department or institution involved


Scene Text Understanding

Topic description
The successful candidate will work on unconstrained reading systems, capable of detecting and recognising textual information in challenging conditions such as urban scenery, born-digital images, videos and images recorded with wearable devices, etc.
The selected candidate is expected to align their research with the research priorities of the host group. Of particular interest is the interplay between visual and textual content in an urban scene image, and the extent to which these modalities can work synergistically towards interpreting urban scene imagery. Interaction with textual content and camera-based capture and analysis of document images are also of interest.
The candidate is expected to contribute to the current lines of research of the hosting group, which include (but are not restricted to): scene text localisation, object proposals for text, script identification, word spotting, language models, deep network architectures, camera-based document image analysis, and human-document interaction.

The Computer Vision Centre
The selected candidate will work in the Computer Vision Centre, Barcelona, a research institute comprising more than 100 researchers and support staff, dedicated to computer vision research and knowledge transfer. With a strong international projection and links to industry, the Computer Vision Centre offers an exciting environment for scientific career development.
The Computer Vision Centre has a plan for expansion of its permanent research staff base, and has received the "HR Excellence in Research" award as a provider and supporter of a stimulating and favourable working environment.

Project supervisor & hosting group
The fellow will report to Prof. Dimosthenis Karatzas (robust reading and scene-text understanding), Prof. Ernest Valveny (word spotting) and Prof. Andrew Bagdanov (machine learning methods).
The hosting group collaborates with a number of institutions worldwide. The research fellow will have the opportunity to participate actively in these collaborations, including through research stays. The activities of the team are supported by multiple research and technology transfer projects.

Candidate’s profile  
A PhD in computer science or a related field is required.
The applicants must have experience in computer vision, pattern recognition and machine learning techniques, and be able to demonstrate strong analytical and programming skills.
Applicants are expected to be fluent in both spoken and written English. They should work well in a team while demonstrating initiative and independence, and be willing to supervise PhD students.

If you are an eligible candidate interested in applying, please contact pr.sphere@uab.cat to be put in touch with the hosting group.

Department or institution involved



Human Pose Recovery and Behavior Analysis

Topic description
Human Action/Gesture recognition is a challenging area of research that deals with the problem of recognizing people in images, detecting and describing body parts, inferring their spatial configuration, and performing action/gesture recognition from still images or image sequences, also including multi-modal data.

The Fellow is expected to contribute to the current line of research of the hosting group: human pose recovery and behavior analysis from multi-modal input data. In particular, we want the Fellow to make advances in the following topics:
•    Human-based and deep learning-based feature extraction for human analysis
•    Compositional models analysis for human pose recovery and scene understanding
•    Temporal series and behavior analysis


The Computer Vision Centre
The selected candidate will work in the Computer Vision Centre, Barcelona, a research institute comprising more than 100 researchers and support staff, dedicated to computer vision research and knowledge transfer. With a strong international projection and links to industry, the Computer Vision Centre offers an exciting environment for scientific career development.

The Computer Vision Centre has a plan for expansion of its permanent research staff base, and has received the "HR Excellence in Research" award as a provider and supporter of a stimulating and favourable working environment.


Project supervisor & hosting group
Prof. Sergio Escalera will supervise the Fellow in collaboration with the Human Pose Recovery and Behavior Analysis team (HUPBA) at the CVC.

HUPBA is one of the most active and internationally recognized teams in image analysis for human behavior recognition. It has been awarded several international grants and projects, has participated in European projects and has organized international challenges.


Candidate’s profile  
A PhD in computer science or a related field is required.

The applicants must have experience in computer vision, pattern recognition and machine learning techniques, and be able to demonstrate strong analytical and programming skills.
Applicants are expected to be fluent in both spoken and written English. They should work well in a team while demonstrating initiative and independence, and be willing to supervise PhD students.


If you are an eligible candidate interested in applying, please contact pr.sphere@uab.cat to be put in touch with the hosting group.

Department or institution involved



Deep Learning for Multi-Modal Data Representations

Topic description
We are seeking a postdoc to join our research line on deep learning for multi-modal data understanding. There is evidence that humans learn more effectively when exposed to multiple modalities.

Deep learning is currently the leading machine learning technique for speech, image and sound understanding. However, these modalities are often considered independently. In this project we aim to jointly learn representations for the various modalities, including images, audio and text.

The Computer Vision Centre
The selected candidate will work in the Computer Vision Centre (CVC), Barcelona, a research institute comprising more than 100 researchers and support staff, dedicated to computer vision research and knowledge transfer. With a strong international projection and links to industry, the Computer Vision Centre offers an exciting environment for scientific career development.

The Computer Vision Centre has a plan for expansion of its permanent research staff base, and has received the "HR Excellence in Research" award as a provider and supporter of a stimulating and favourable working environment.


Project supervisor & hosting group
The Fellow will report to Joost Van de Weijer, leader of the Learning and Machine Perception (LAMP) team at the CVC. The LAMP team is already working on this research line within the context of an international project with partners in France and Canada.


Candidate’s profile  
The candidate should possess a PhD in computer vision or machine learning, and have a strong publication record. We especially encourage candidates with experience in Deep Learning to apply, but other backgrounds in machine learning will also be considered.
Applicants are expected to be fluent in both spoken and written English. They should work well in a team while demonstrating initiative and independence, and be willing to supervise PhD students.


If you are an eligible candidate interested in applying, please contact pr.sphere@uab.cat to be put in touch with the hosting group.