LABIC - Bioinformatics and Computational Intelligence Laboratory
Local Repository of Research Datasets
OSVidCAP: a Framework for the Simultaneous Recognition and Description of Concurrent Actions in Videos in an Open-Set Scenario
Introduction
Automatically understanding and describing the visual content of videos in natural language is a challenging task.
Most current approaches are designed to describe single events in a closed-set setting.
However, in real-world scenarios, concurrent activities and previously unseen actions may appear in a video.
OSVidCap is a novel open-set video captioning framework that recognizes and describes concurrent known actions in natural language while detecting unknown ones.
It builds on the encoder-decoder framework and uses an object detection-and-tracking mechanism, followed by a background blurring method, to focus on specific targets in a video. Additionally, the TI3D Network combined with the Extreme Value Machine (EVM) is used to learn representations and recognize unknown actions.
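The open-set decision step can be illustrated with a small numerical sketch. The snippet below is not the OSVidCap implementation: it assumes fixed per-class action embeddings (standing in for TI3D features) and applies a simplified EVM-style rule, fitting a Weibull model to the tail of margin distances for each known class and rejecting queries whose probability of inclusion falls below a threshold. All names (fit_evm, psi, TAU) and the pooled-tail simplification are illustrative assumptions.

import numpy as np
from scipy.stats import weibull_min

TAU = 0.5  # inclusion-probability threshold below which a sample is "unknown"

def fit_evm(class_embs, other_embs, tail=20):
    """Fit a Weibull model to the smallest margin (half) distances between
    one class's embeddings and all other-class embeddings (the EVM tail)."""
    dists = np.linalg.norm(
        class_embs[:, None, :] - other_embs[None, :, :], axis=-1).ravel()
    tail_dists = np.sort(dists)[:tail] / 2.0  # half-distance margins
    shape, _, scale = weibull_min.fit(tail_dists, floc=0)
    return shape, scale

def psi(x, class_embs, model):
    """Probability of sample inclusion: decays with distance to the class."""
    shape, scale = model
    d = np.linalg.norm(class_embs - x, axis=-1).min()
    return float(np.exp(-(d / scale) ** shape))

# Toy usage: two known action classes plus one far-away (unknown) query.
rng = np.random.default_rng(0)
known = {c: rng.normal(loc=c * 5.0, size=(30, 64)) for c in (0, 1)}
models = {c: fit_evm(e, np.vstack([v for k, v in known.items() if k != c]))
          for c, e in known.items()}

query = rng.normal(loc=20.0, size=64)          # unlike either known class
scores = {c: psi(query, known[c], models[c]) for c in known}
best = max(scores, key=scores.get)
label = best if scores[best] >= TAU else "unknown"
print(scores, "->", label)

Because the query lies far from every known class, its inclusion probability stays below TAU for all of them and it is labeled "unknown"; this thresholded rejection is what distinguishes the open-set setting from a closed-set softmax classifier.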
Dataset Description
In our experiments, we use the LIRIS human activities dataset. It was designed for recognizing complex and realistic actions in videos and was made available for the ICPR-HARL’2012 competition.
The full dataset contains 828 actions (including discussing, making telephone calls, giving an item, etc.) performed by 21 different people across 10 different classes.
It is organized into two independent subsets: the D1 subset, with depth and grayscale images, and the D2 subset, with color images. The dataset also contains unannotated actions, such as walking, running, whiteboard writing, book leafing, etc.
In this work, we used the D2 subset, which contains 367 annotated actions from 167 videos. Each action consists of one or more people performing one or more different activities. In addition, 116 video segments covering 15 different unannotated actions were extracted from the original videos and treated as unknown classes. Each new video segment was also annotated with spatial, temporal, and description information.
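For concreteness, a minimal loader sketch for such a known/unknown split follows. The directory layout, annotation filename, and JSON fields (video, label) are assumptions made for illustration only; they do not reflect the official LIRIS distribution format.

from pathlib import Path
import json

DATA_ROOT = Path("liris_d2")  # assumed local layout, one folder per segment

def load_split(root: Path):
    """Return (known, unknown) lists of (video_path, annotation) pairs."""
    known, unknown = [], []
    for ann_file in sorted(root.glob("*/annotation.json")):
        ann = json.loads(ann_file.read_text())
        pair = (ann_file.parent / ann["video"], ann)
        # In this sketch, segments extracted from unannotated actions carry
        # no class label; they form the open-set (unknown) partition.
        (unknown if ann.get("label") is None else known).append(pair)
    return known, unknown

if __name__ == "__main__":
    known, unknown = load_split(DATA_ROOT)
    print(f"{len(known)} known actions, {len(unknown)} unknown segments")

With the D2 subset organized this way, the loader would report the 367 annotated actions as known and the 116 extracted segments as unknown.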