Automated camera ranking and selection using video content and scene context
Abstract
When observing a scene with multiple cameras, an important problem to solve is to automatically
identify “what camera feed should be shown and when?” The answer to this question is of interest
for a number of applications and scenarios ranging from sports to surveillance. In this thesis we
present a framework for the ranking of each video frame and camera across time and the camera
network, respectively. This ranking is then used for automated video production. In the first stage,
information from each camera view and from the objects in it is extracted and represented in a way
that allows for object- and frame-ranking. First, objects are detected and ranked within and across
camera views. This ranking takes into account both visible and contextual information related to
the object. Then, content ranking is performed based on the objects in the view and camera-network
level information. We propose two novel techniques for content ranking, namely Routing Based
Ranking (RBR) and Multivariate Gaussian based Ranking (MVG). In RBR we use a rule-based
framework in which weighted fusion of object- and frame-level information takes place, while in MVG
the rank is estimated as a multivariate Gaussian distribution. Through experimental and subjective
validation we demonstrate that the proposed content ranking strategies allow the identification of
the best camera at each time.
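
To make the idea of weighted fusion concrete, the following is a minimal sketch of how object-level and frame-level scores might be combined into a per-camera rank and used to pick the best camera for one time instant. It is not the RBR rule set from the thesis: the function names, the averaging of object scores, and the weights 0.7 and 0.3 are assumptions chosen only for illustration.

```python
from typing import Dict, List


def fuse_score(object_scores: List[float], frame_score: float,
               w_obj: float = 0.7, w_frame: float = 0.3) -> float:
    """Weighted fusion of object-level and frame-level scores.

    The weights and the use of a mean over objects are illustrative
    assumptions, not values or rules taken from the thesis.
    """
    # A view with no detected objects contributes no object-level evidence.
    obj_term = sum(object_scores) / len(object_scores) if object_scores else 0.0
    return w_obj * obj_term + w_frame * frame_score


def best_camera(views: Dict[str, dict]) -> str:
    """Return the camera whose fused score is highest at this time instant."""
    scores = {cam: fuse_score(v["objects"], v["frame"]) for cam, v in views.items()}
    return max(scores, key=scores.get)


# Hypothetical scores for two cameras observing the same instant.
views_t = {
    "cam1": {"objects": [0.9, 0.5], "frame": 0.4},
    "cam2": {"objects": [0.3], "frame": 0.8},
}
selected = best_camera(views_t)
```

Selecting the top-ranked camera independently at every time instant, as done here, is exactly the behaviour that leads to frequent switching, which the second part of the thesis addresses.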
The second part of the thesis focuses on the automatic generation of N-to-1 videos based on the
ranked content. We demonstrate that in such production settings it is undesirable to have frequent
inter-camera switching. This motivates the need for a compromise between selecting the best
camera most of the time and minimising inter-camera switching. We demonstrate that
state-of-the-art techniques for this task are inadequate and fail in dynamic scenes. We propose three
novel methods for automated camera selection. The first method (gof) performs a joint optimization
of a cost function that depends on both the view quality and inter-camera switching so that a
pleasing best-view video sequence can be composed. The other two methods (dbn and util) incorporate
the selection decision into the ranking strategy. In dbn we model the best-camera selection
as a state sequence via Directed Acyclic Graphs (DAG) designed as a Dynamic Bayesian Network
(DBN), which encodes the contextual knowledge about the camera network and employs past
information to minimize inter-camera switches. In comparison, util utilizes past as well
as future information in a Partially Observable Markov Decision Process (POMDP), where the
camera selection at a certain time is influenced by past information and its repercussions in
the future. The performance of the proposed approaches is demonstrated on multiple real and synthetic
multi-camera setups. We compare the proposed architectures with various baseline methods,
with encouraging results. The performance of the proposed approaches is also validated through
extensive subjective testing.
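
The trade-off between view quality and switching frequency can be illustrated with a small dynamic-programming sketch. This is not the cost function or optimizer used in the thesis: it assumes a simple objective, the sum of per-frame rank scores minus a fixed penalty for every camera change, and the names `select_cameras`, `quality` and `switch_cost` are hypothetical.

```python
from typing import List


def select_cameras(quality: List[List[float]], switch_cost: float) -> List[int]:
    """Choose one camera index per time step.

    quality[t][c] is the rank score of camera c at time t. The returned
    sequence maximizes total quality minus switch_cost per camera change
    (an assumed objective, used here only for illustration).
    """
    n_cams = len(quality[0])
    best = list(quality[0])          # best[c]: best total score of a path ending in c
    back: List[List[int]] = []       # back-pointers, one list per step t >= 1

    for t in range(1, len(quality)):
        new_best, ptr = [], []
        for c in range(n_cams):
            # Score of arriving at camera c from each possible predecessor p.
            cands = [best[p] - (switch_cost if p != c else 0.0) for p in range(n_cams)]
            p_star = max(range(n_cams), key=lambda p: cands[p])
            new_best.append(cands[p_star] + quality[t][c])
            ptr.append(p_star)
        best = new_best
        back.append(ptr)

    # Trace the best path backwards from the best final camera.
    c = max(range(n_cams), key=lambda k: best[k])
    path = [c]
    for ptr in reversed(back):
        c = ptr[c]
        path.append(c)
    path.reverse()
    return path


# Hypothetical scores: camera 1 is marginally better only in the middle frame.
scores = [[0.9, 0.1], [0.4, 0.5], [0.9, 0.1]]
greedy_like = select_cameras(scores, switch_cost=0.0)
smoothed = select_cameras(scores, switch_cost=0.5)
```

With no penalty the selection follows the per-frame best camera and switches twice; with a penalty of 0.5 the brief, marginal advantage of the second camera no longer justifies a switch and the selection stays on the first camera.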
Authors
Daniyal, Fahad M.