Latent Dependency Mining for Solving Regression Problems in Computer Vision
Abstract
Regression-based frameworks, which learn a direct mapping from low-level image features
to continuous vector- or scalar-valued labels, have been widely used in computer vision, e.g.
in crowd counting, age estimation and human pose estimation. Over the last decade, computer vision
researchers have devoted considerable effort to improving regression fitting. Nevertheless,
solving these computer vision problems with regression frameworks remains a formidable
challenge due to 1) feature variation and 2) imbalanced and sparse data. On one hand, large feature
variation can be caused by changes in extrinsic conditions (e.g. images taken under
different lighting conditions and viewing angles) and in intrinsic conditions (e.g. the different aging
processes of different people in age estimation and inter-object occlusion in crowd density
estimation). On the other hand, imbalanced and sparse data distributions can also substantially
affect regression performance. These two challenges in regression
learning are related: the feature inconsistency problem is compounded by sparse
and imbalanced training data and vice versa, so they need to be tackled jointly in modelling and
explicitly in representation. This thesis first mines an intermediary feature representation formed
by concatenating spatially localised features, so that information is shared among neighbouring
localised cells in the frames. Second, it introduces the cumulative attribute concept,
which exploits the latent cumulative dependency of the label space to learn a regression model,
with applications to facial age and crowd density estimation.
Third, it demonstrates the effectiveness of a discriminative structured-output regression
framework that learns the inherent latent correlation between the elements of the output variables,
applied to 2D human upper body pose estimation. The proposed regression
frameworks for crowd counting, age estimation, and human pose estimation are validated
on public benchmarks.
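
The abstract does not define the cumulative attribute encoding, so the following is only an illustrative sketch under an assumed definition: a scalar label y in [0, max_label] is mapped to a binary vector whose j-th entry is 1 when j <= y and 0 otherwise. The function names `cumulative_attributes` and `decode` are hypothetical and are not taken from the thesis.

```python
def cumulative_attributes(label, max_label):
    """Encode an integer label as a binary cumulative attribute vector.

    Assumed definition (not stated in the abstract): entry j, counted
    from 1, is 1 when j <= label and 0 otherwise.
    """
    if not 0 <= label <= max_label:
        raise ValueError("label must lie in [0, max_label]")
    return [1 if j <= label else 0 for j in range(1, max_label + 1)]


def decode(attributes):
    """Recover the scalar label by counting the active attributes."""
    return sum(attributes)


# Hypothetical example: two ages on a toy scale of 0 to 6.
ages = [2, 5]
encoded = [cumulative_attributes(a, max_label=6) for a in ages]
```

Under this assumed encoding, neighbouring labels differ in exactly one attribute, so every attribute is active for all labels at or above its threshold. That is one way a sparsely sampled label can share training evidence with nearby labels.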
Authors
Chen, Ke