|dc.description.abstract||Faces, along with the personal identities behind them, are effective elements in organizing a
collection of consumer photos, as they represent who was involved. However, the accurate discrimination
and subsequent recognition of face appearances is still very challenging. This can be
attributed to the fact that faces are usually neither perfectly lit nor captured, particularly in the
uncontrolled environments of consumer photos.
Unlike, for instance, passport photos that only show faces stripped of their surroundings,
consumer photo collections contain a vast amount of meaningful context. For example, consecutively
shot photos often correlate in time, location or scene. Further information can also
be provided by the people appearing in photos, such as their demographics (ages and gender are
often easier to surmise than identities), clothing, or the social relationships among co-occurring people.
Motivated by this ubiquitous context, we propose and research people recognition approaches
that consider contextual information within photos, as well as across entire photo collections. Our
aim of leveraging additional contextual information (as opposed to only considering faces) is to
improve recognition performance. However, instead of requiring users to explicitly label specific
pieces of contextual information, we wish to implicitly learn and draw from the seemingly
coherent content that exists inherently across an entire photo collection.
Moreover, unlike conventional approaches that usually predict the identity of only one person’s
appearance at a time, we lay out a semi-supervised approach to jointly recognize multiple
people’s appearances across an entire photo collection simultaneously. As such, our aim is to
find the overall best recognition solution.
To make context-based joint recognition of people feasible, we research a sparse but efficient
graph-based approach that builds on Markov Networks and utilizes distance-based face description
methods. We show how to exploit the following specific contextual cues: time, social semantics,
body appearances (clothing), gender, scene and ambiguous captions. We also show how
to leverage crowd-sourced gamified feedback to iteratively improve recognition performance.
Experiments on several datasets demonstrate and validate the effectiveness of our semi-supervised
graph-based recognition approach compared to conventional approaches.||en_US