Quantifying and transferring contextual information in object detection.
762 - 777
IEEE Trans Pattern Anal Mach Intell
MetadataShow full item record
Context is critical for reducing the uncertainty in object detection. However, context modeling is challenging because there are often many different types of contextual information coexisting with different degrees of relevance to the detection of target object(s) in different images. It is therefore crucial to devise a context model to automatically quantify and select the most effective contextual information for assisting in detecting the target object. Nevertheless, the diversity of contextual information means that learning a robust context model requires a larger training set than learning the target object appearance model, which may not be available in practice. In this work, a novel context modeling framework is proposed without the need for any prior scene segmentation or context annotation. We formulate a polar geometric context descriptor for representing multiple types of contextual information. In order to quantify context, we propose a new maximum margin context (MMC) model to evaluate and measure the usefulness of contextual information directly and explicitly through a discriminant context inference method. Furthermore, to address the problem of context learning with limited data, we exploit the idea of transfer learning based on the observation that although two categories of objects can have very different visual appearance, there can be similarity in their context and/or the way contextual information helps to distinguish target objects from nontarget objects. To that end, two novel context transfer learning models are proposed which utilize training samples from source object classes to improve the learning of the context model for a target object class based on a joint maximum margin learning framework. Experiments are carried out on PASCAL VOC2005 and VOC2007 data sets, a luggage detection data set extracted from the i-LIDS data set, and a vehicle detection data set extracted from outdoor surveillance footage. Our results validate the effectiveness of the proposed models for quantifying and transferring contextual information, and demonstrate that they outperform related alternative context models.