Aravinda S. Rao
The University of Melbourne
Publication year: 2015

Abstract

Crowd analysis is a critical problem in understanding crowd behavior for surveillance applications. The current practice is to manually scan video feeds from several sources. Video analytics allows the automatic detection of events of interest, but it faces many challenges because of non-rigid crowd motion and occlusions. Algorithms developed for rigid objects are ineffectual in managing crowds. This thesis describes optical flow-based video analytics for crowd analysis, with applications in people counting, density estimation, event detection, and abnormal event detection.

There are two main approaches to detecting objects in a video. The first, background modeling, models the scene background: the modeled pixel values represent the scene, and each pixel is classified as background or foreground against this model. The second estimates object motion and thereby provides motion information. Articulated actions and sudden movements of people limit background modeling. Therefore, this thesis uses motion estimation to detect objects.
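As a rough illustration of the two approaches (not the implementation developed in this thesis), the sketch below contrasts OpenCV's MOG2 background subtractor with Farneback dense optical flow on the same video; the video path and the flow-magnitude threshold are illustrative assumptions.

```python
import cv2
import numpy as np

# Illustrative input; replace with an actual surveillance video path.
cap = cv2.VideoCapture("crowd.avi")

# Approach 1: background modeling (MOG2 maintains a per-pixel background model).
bg_subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

ret, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Background modeling: each pixel is labeled foreground (255) or background (0).
    fg_mask = bg_subtractor.apply(frame)

    # Approach 2: motion estimation via dense optical flow between consecutive frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    magnitude, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    motion_mask = (magnitude > 1.0).astype(np.uint8) * 255  # threshold is an assumption

    prev_gray = gray

cap.release()
```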

Crowd density estimation is important for understanding crowd behavior. Optical flow features provide motion information about objects, and refining these features with spatial filters produces motion cues that signal the presence of people. Clustering these motion cues hierarchically, using single-linkage clustering, yields the crowd density estimate. The approach presented in this thesis processes frames block by block and produces excellent results on a frame-by-frame basis, which is new compared with existing approaches.
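A minimal sketch of such a pipeline is given below, assuming Farneback optical flow, a median spatial filter, 16×16 blocks, and a fixed single-linkage cut-off; these choices are illustrative assumptions, not the parameters used in the thesis.

```python
import cv2
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def estimate_density(prev_gray, gray, block=16, mag_thresh=1.0, cut=30.0):
    """Rough per-frame crowd density estimate from optical-flow motion cues."""
    # Dense optical flow between consecutive frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])

    # Spatial filtering to suppress noisy flow responses (median filter is an assumption).
    mag = cv2.medianBlur(mag.astype(np.float32), 5)

    # Block-by-block processing: keep centers of blocks whose mean flow exceeds a threshold.
    h, w = mag.shape
    cues = []
    for y in range(0, h - block, block):
        for x in range(0, w - block, block):
            if mag[y:y + block, x:x + block].mean() > mag_thresh:
                cues.append([x + block / 2, y + block / 2])
    if len(cues) < 2:
        return len(cues)

    # Single-linkage hierarchical clustering of motion cues.
    Z = linkage(np.array(cues), method="single")
    labels = fcluster(Z, t=cut, criterion="distance")
    return labels.max()  # number of clusters as a coarse density estimate
```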

Crowd events such as walking, running, merging, separating into groups (“splitting”), dispersing, and evacuating are critical to understanding crowd behavior. However, video data lie in a high-dimensional space, whereas the events themselves lie in a low-dimensional space. This thesis introduces a novel Optical Flow Manifolds (OFM) scheme to detect crowd events. Experimental results suggest that the proposed semi-supervised approach performs best in detecting merging, splitting, and dispersion events compared with existing methods. Its advantages are that it requires only a single parameter to detect crowd events and provides results on a frame-by-frame basis.
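The OFM construction itself is not reproduced here; purely to illustrate the idea of mapping high-dimensional flow observations onto a low-dimensional event space, the sketch below embeds per-frame optical-flow orientation histograms with scikit-learn's SpectralEmbedding and assigns events in a semi-supervised fashion via a single distance threshold. The descriptor, the embedding method, and the threshold tau are all assumptions.

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding

def flow_histogram(flow, bins=16):
    """Per-frame descriptor: histogram of flow orientations weighted by magnitude."""
    mag = np.hypot(flow[..., 0], flow[..., 1])
    ang = np.arctan2(flow[..., 1], flow[..., 0])
    hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-9)

def embed_frames(flow_fields, n_components=3):
    """Map high-dimensional per-frame descriptors onto a low-dimensional manifold."""
    X = np.stack([flow_histogram(f) for f in flow_fields])
    return SpectralEmbedding(n_components=n_components).fit_transform(X)

def label_events(embedding, labeled_idx, labeled_events, tau=0.5):
    """Semi-supervised labeling: each frame takes the event of its nearest labeled
    frame if the distance in the embedded space is below the single threshold tau."""
    labels = []
    for point in embedding:
        d = np.linalg.norm(embedding[labeled_idx] - point, axis=1)
        labels.append(labeled_events[int(d.argmin())] if d.min() < tau else "unknown")
    return labels
```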

Crowd event detection requires information about the number of neighboring and incoming frames, which is difficult to estimate in advance. Therefore, crowd event detection needs adaptive schemes that detect events automatically. This study presents a new adaptive crowd event detection approach built on the OFM framework. To the best of our knowledge, this is the first study to report adaptive crowd event detection. Experimental results suggest that the proposed approach detects crowd events accurately and, given the computational time it needs to detect events, is suitable for near real-time video surveillance systems.
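As an illustration of what “adaptive” can mean here (not the scheme proposed in the thesis), the sketch below flags an event boundary whenever the frame-to-frame displacement in the embedded space exceeds a running mean plus k standard deviations, so no fixed window of neighboring frames needs to be chosen in advance; the statistic and the factor k are assumptions.

```python
import numpy as np

def detect_event_changes(embedding, k=3.0, warmup=10):
    """Flag candidate event boundaries where the frame-to-frame displacement in the
    embedded space deviates strongly from its own running statistics."""
    deltas = np.linalg.norm(np.diff(embedding, axis=0), axis=1)
    changes = []
    for i, d in enumerate(deltas, start=1):
        history = deltas[:i - 1]                     # displacements seen so far
        if i > warmup and d > history.mean() + k * history.std():
            changes.append(i)                        # candidate event change at frame i
    return changes
```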

Detecting anomalous events in crowded videos requires spatio-temporal localization of crowd events. Appropriate features, suitably coded, enable accurate event localization. This study proposes spatially and spatio-temporally coded features to detect anomalous events. To the best of our knowledge, this is the first study to report the detection of loitering people in video. The approach helps manage crowds, for example, at stadiums, public transport hubs, pedestrian crossings, and other public places.
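The spatially and spatio-temporally coded features are not reproduced here; as a toy illustration of spatially localized loitering detection only, the sketch below accumulates a per-block dwell time for blocks that remain occupied yet show little net optical-flow motion, and raises an alarm once the dwell time exceeds a threshold. The block size, magnitude bounds, and dwell threshold are assumptions.

```python
import numpy as np

def update_loitering(dwell, flow, block=16, mag_low=0.1, mag_high=1.0, dwell_thresh=150):
    """Accumulate dwell time per block where there is presence but little net motion.
    `dwell` is a dict kept by the caller across frames; returns blocks flagged as loitering."""
    mag = np.hypot(flow[..., 0], flow[..., 1])
    h, w = mag.shape
    alarms = []
    for by in range(0, h - block, block):
        for bx in range(0, w - block, block):
            m = mag[by:by + block, bx:bx + block].mean()
            key = (by // block, bx // block)
            if mag_low < m < mag_high:            # occupied but nearly stationary
                dwell[key] = dwell.get(key, 0) + 1
            else:
                dwell[key] = 0
            if dwell.get(key, 0) > dwell_thresh:  # e.g. ~6 s at 25 fps (assumed)
                alarms.append(key)                # spatial localization of loitering
    return alarms
```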