Hierarchical modeling for first-person vision activity recognition
Pages 362–377
We propose a multi-layer framework to recognize ego-centric activities from a wearable camera. We model the activities of interest as a hierarchy built on low-level feature groups. These feature groups encode motion magnitude, motion direction, and the variation of intra-frame appearance descriptors. We then exploit the temporal relationships among activities to extract a high-level feature that accumulates and weights past information. Finally, we define a confidence score to temporally smooth the classification decision. Results across multiple public datasets show that the proposed framework outperforms state-of-the-art approaches, e.g., with at least 11% improvement in precision and recall on a 15-hour public dataset with six ego-centric activities.
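The abstract describes accumulating and weighting past information, then temporally smoothing the decision with a confidence score. A minimal sketch of this idea follows; the exponential-decay weighting, the threshold-based smoothing rule, and both function names are illustrative assumptions, since the abstract does not specify the actual scheme.

```python
def accumulate_history(frame_scores, decay=0.8):
    # Exponentially weight past per-frame class scores (hypothetical
    # weighting; the paper's actual accumulation is not given here).
    acc = [0.0] * len(frame_scores[0])
    history = []
    for scores in frame_scores:
        acc = [decay * a + (1.0 - decay) * s for a, s in zip(acc, scores)]
        history.append(list(acc))
    return history

def smooth_decision(history, threshold=0.5):
    # Keep the previous label when the top accumulated score falls
    # below a confidence threshold (illustrative temporal smoothing).
    labels, prev = [], None
    for acc in history:
        top = max(range(len(acc)), key=acc.__getitem__)
        if prev is None or acc[top] >= threshold:
            prev = top
        labels.append(prev)
    return labels
```

For example, with three frames scored over two activity classes, a single noisy frame favoring class 1 does not flip the smoothed label, because its accumulated score stays below the confidence threshold.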