Context-based Semi-supervised Joint People Recognition in Consumer Photo Collections using Markov Networks
Faces, along with the personal identities behind them, are effective elements for organizing a collection of consumer photos, as they represent who was involved. However, accurately discriminating and subsequently recognizing face appearances is still very challenging. This is largely because faces in consumer photos are rarely well lit or cleanly captured, since these photos are taken in uncontrolled environments. Unlike passport photos, which show faces stripped of their surroundings, consumer photo collections contain a wealth of meaningful context. For example, consecutively shot photos often correlate in time, location, or scene. The people appearing in photos provide further information, such as their demographics (age and gender are often easier to infer than identity), their clothing, or the social relationships among co-occurring people. Motivated by this ubiquitous context, we propose and study people recognition approaches that consider contextual information within photos as well as across entire photo collections. By leveraging contextual information in addition to faces, we aim to improve recognition performance. Rather than requiring users to explicitly label specific pieces of contextual information, we aim to learn implicitly from the seemingly coherent content that exists inherently across an entire photo collection. Moreover, unlike conventional approaches, which usually predict the identity of only one person’s appearance at a time, we present a semi-supervised approach that jointly recognizes the appearances of multiple people across an entire photo collection. Our aim is thus to find the best overall recognition solution. To make context-based joint recognition of people feasible, we develop a sparse and efficient graph-based approach that builds on Markov Networks and uses distance-based face description methods.
We show how to exploit the following contextual cues: time, social semantics, body appearance (clothing), gender, scene, and ambiguous captions. We also show how to leverage crowd-sourced, gamified feedback to iteratively improve recognition performance. Experiments on several datasets demonstrate the effectiveness of our semi-supervised graph-based recognition approach compared with conventional approaches.
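To make the idea of joint recognition on a Markov network concrete, here is a minimal toy sketch in Python; it is not the method from this thesis. Everything in it is a hypothetical assumption for illustration: 2-D face descriptors, one labeled exemplar per identity as the semi-supervised seed, unary costs equal to the Euclidean distance between a face descriptor and an exemplar, a hard uniqueness constraint that two faces in the same photo cannot share an identity, and a Potts penalty that encourages faces with nearby descriptors in different photos to take the same label. Similarity edges are added only below a distance threshold, which keeps the graph sparse. MAP inference here is exact exhaustive enumeration, which only works for a handful of faces; a realistic collection would need approximate inference such as loopy belief propagation.

```python
import itertools
import math

# Hypothetical toy data: face id -> (photo id, 2-D face descriptor).
FACES = {
    "f0": ("p1", (1.0, 0.0)),
    "f1": ("p1", (4.0, 0.0)),
    "f2": ("p2", (4.5, 0.0)),
}
# Hypothetical labeled exemplars (the semi-supervised seeds): identity -> descriptor.
EXEMPLARS = {"alice": (0.0, 0.0), "bob": (10.0, 0.0)}

SIM_THRESHOLD = 1.0  # assumed: add a similarity edge only for close descriptors (sparse graph)
POTTS_PENALTY = 2.0  # assumed: cost when two similar faces receive different identities


def unary(face_id, label):
    """Data cost: distance from the face descriptor to the identity's exemplar."""
    return math.dist(FACES[face_id][1], EXEMPLARS[label])


def build_edges():
    """Return (i, j, kind) edges: 'same_photo' uniqueness or 'similar' smoothing."""
    edges = []
    for i, j in itertools.combinations(sorted(FACES), 2):
        (photo_i, desc_i), (photo_j, desc_j) = FACES[i], FACES[j]
        if photo_i == photo_j:
            edges.append((i, j, "same_photo"))
        elif math.dist(desc_i, desc_j) < SIM_THRESHOLD:
            edges.append((i, j, "similar"))
    return edges


def pairwise(kind, label_i, label_j):
    """Pairwise cost for one edge given the labels of its two faces."""
    if kind == "same_photo":
        # Assumed constraint: one person cannot appear twice in the same photo.
        return math.inf if label_i == label_j else 0.0
    # Potts smoothing: similar-looking faces prefer the same identity.
    return POTTS_PENALTY if label_i != label_j else 0.0


def energy(assignment, edges):
    """Total energy of a full labeling: sum of unary plus pairwise costs."""
    total = sum(unary(f, lab) for f, lab in assignment.items())
    total += sum(pairwise(k, assignment[i], assignment[j]) for i, j, k in edges)
    return total


def joint_map(edges):
    """Exact MAP labeling by exhaustive enumeration (tiny toy graphs only)."""
    face_ids = sorted(FACES)
    candidates = (
        dict(zip(face_ids, labels))
        for labels in itertools.product(EXEMPLARS, repeat=len(face_ids))
    )
    return min(candidates, key=lambda a: energy(a, edges))


def independent():
    """Baseline: label each face on its own by its nearest exemplar."""
    return {f: min(EXEMPLARS, key=lambda lab: unary(f, lab)) for f in FACES}


if __name__ == "__main__":
    edges = build_edges()
    print("independent:", independent())
    print("joint MAP:  ", joint_map(edges))
```

On this toy data, independent nearest-exemplar labeling assigns all three faces to alice, whereas the joint MAP labels f0 as alice and both f1 and f2 as bob: f1 cannot share alice's identity with f0 in the same photo, and the similarity edge then pulls f2 toward f1's label. In a design like this, contextual cues such as capture time could be folded in as additional or reweighted pairwise terms.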