dc.contributor.author | Tadesse, Girmaw Abebe | |
dc.date.accessioned | 2018-06-19T15:18:13Z | |
dc.date.available | 2018-06-19T15:18:13Z | |
dc.date.issued | 2018-05-14 | |
dc.date.submitted | 2018-06-19T15:19:23.893Z | |
dc.identifier.citation | Tadesse, G.A. 2018. Human activity recognition using a wearable camera. Queen Mary University of London | en_US |
dc.identifier.uri | http://qmro.qmul.ac.uk/xmlui/handle/123456789/39765 | |
dc.description | PhD | en_US |
dc.description.abstract | Advances in wearable technologies are facilitating the understanding of human activities using
first-person vision (FPV) for a wide range of assistive applications. In this thesis, we propose
multiple robust motion features for human activity recognition from first-person videos. The
proposed features encode discriminant characteristics of the magnitude, direction and dynamics
of motion estimated using optical flow. Moreover, we design novel virtual-inertial features from
video, without using an actual inertial sensor, by tracking the movement of the intensity centroid
across frames. Results on multiple datasets demonstrate that centroid-based inertial features
improve the recognition performance of grid-based features.
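A minimal sketch, not taken from the thesis, of how centroid-based virtual-inertial signals could be derived from video; it assumes grayscale frames as NumPy arrays, and the function names and parameters are illustrative only.

import numpy as np

def intensity_centroid(frame):
    # Intensity-weighted (x, y) centroid of a grayscale frame.
    h, w = frame.shape
    total = frame.sum()
    if total == 0:
        return np.array([w / 2.0, h / 2.0])  # fall back to the image centre
    ys, xs = np.mgrid[0:h, 0:w]
    return np.array([(frame * xs).sum() / total, (frame * ys).sum() / total])

def virtual_inertial_signals(frames, fps=30.0):
    # Differentiate the centroid trajectory to obtain velocity- and
    # acceleration-like signals, analogous to readings from an inertial sensor.
    centroids = np.stack([intensity_centroid(f.astype(np.float64)) for f in frames])
    dt = 1.0 / fps
    velocity = np.gradient(centroids, dt, axis=0)
    acceleration = np.gradient(velocity, dt, axis=0)
    return centroids, velocity, acceleration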
Moreover, we propose a multi-layer modelling framework that encodes hierarchical and temporal
relationships among activities. The first layer, organised with a hierarchical topology, operates
on groups of features that effectively encode motion dynamics and temporal variations of
intra-frame appearance descriptors of activities. The second layer exploits the temporal context
by weighting the outputs of the hierarchy during modelling. In addition, a post-decoding
smoothing technique utilises decisions on past samples based on the confidence of the current
sample. We validate the proposed framework with several classifiers, and the temporal modelling
is shown to improve recognition performance.
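A minimal sketch of confidence-based post-decoding smoothing in the spirit described above; it is hypothetical, not the thesis implementation, and assumes per-sample class-probability vectors.

import numpy as np

def smooth_decisions(prob_sequence, confidence_threshold=0.6):
    # prob_sequence: (T, C) array of per-sample class probabilities.
    # Returns T class indices; low-confidence samples keep the previous decision.
    decisions = []
    for probs in prob_sequence:
        label = int(np.argmax(probs))
        if decisions and float(probs[label]) < confidence_threshold:
            label = decisions[-1]  # reuse the past decision
        decisions.append(label)
    return decisions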
We also investigate the use of deep networks to simplify feature engineering from first-person
videos. We propose a stacking of spectrograms to represent short-term global motion, which
provides a frequency-time representation of multiple motion components. This enables us
to apply 2D convolutions to extract and learn motion features. We employ a long short-term
memory recurrent network to encode long-term temporal dependencies among activities.
Furthermore, we apply cross-domain knowledge transfer between inertial-based and vision-based
approaches for egocentric activity recognition. We propose a sparsity-weighted combination of
information from different motion modalities and/or streams. Results show that the proposed
approach performs competitively with existing deep frameworks while having reduced complexity. | en_US |
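A minimal sketch of the spectrogram-stacking idea mentioned in the abstract above; it assumes per-frame global motion signals (e.g. mean horizontal/vertical optical flow) and is illustrative rather than the thesis code.

import numpy as np
from scipy.signal import spectrogram

def stacked_motion_spectrograms(motion_components, fps=30.0, nperseg=32):
    # motion_components: (K, T) array of K per-frame global motion signals.
    # Returns a (K, F, N) stack of frequency-time spectrograms, one channel per
    # component, suitable as multi-channel input to a 2D convolutional network.
    channels = []
    for signal in motion_components:
        _, _, sxx = spectrogram(signal, fs=fps, nperseg=nperseg)
        channels.append(np.log1p(sxx))  # log compression of power
    return np.stack(channels)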
dc.language.iso | en | en_US |
dc.publisher | Queen Mary University of London | en_US |
dc.rights | The copyright of this thesis rests with the author and no quotation from it or information derived from it may be published without the prior written consent of the author | |
dc.subject | Electronic Engineering and Computer Science | en_US |
dc.subject | Interactive and Cognitive Environments | en_US |
dc.subject | wearable technologies | en_US |
dc.subject | first-person vision | en_US |
dc.title | Human activity recognition using a wearable camera | en_US |
dc.type | Thesis | en_US |