Free-hand Sketch Understanding and Analysis
Abstract
With the proliferation of touch screens, sketch input has become popular in many software
products. This has stimulated a renewed surge of free-hand sketch research,
covering topics such as sketch recognition, sketch-based image retrieval, sketch synthesis
and sketch segmentation. Compared with earlier sketch research, recent work
generally uses more complex sketches in much larger quantities, thanks
to advances in hardware. This thesis presents new work on free-hand
sketches, offering novel approaches to the aforementioned topics.
On sketch recognition, Eitz et al. [32] pioneered the field by introducing the large-scale
TU-Berlin sketch dataset [32], which made sketch recognition possible. Following their work, we
analyze the dataset further and find that visual cue sparsity and internal structural complexity
are the two biggest challenges for sketch recognition. Accordingly, we propose using multiple
kernel learning [45] to fuse multiple visual cues and a star graph representation [12] to encode
sketch structure. With these schemes, we achieve a significant improvement
in recognition accuracy (from 56% to 65.81%). An experimental study of sketch attributes is also
performed to further boost sketch recognition performance and to enable novel retrieval-by-attribute
applications.
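As a rough illustration of what fusing visual cues through kernels can look like, the sketch below combines per-cue Gram matrices with a weighted sum. It is not the thesis's method: multiple kernel learning learns the weights from training data, whereas here the weights are fixed by hand, and the cue names, toy feature values, `gamma` settings and the helpers `rbf_kernel` and `fuse_kernels` are all hypothetical.

```python
import math


def rbf_kernel(features, gamma):
    """Gram matrix of a Gaussian (RBF) kernel over a list of feature vectors."""
    n = len(features)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            gram[i][j] = math.exp(-gamma * sq_dist)
    return gram


def fuse_kernels(grams, weights):
    """Weighted sum of per-cue Gram matrices; weights are normalised to sum to 1."""
    if not grams or len(grams) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    total = sum(weights)
    n = len(grams[0])
    return [
        [sum(w / total * g[i][j] for w, g in zip(weights, grams)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: three sketches described by two different cues.
shape_cue = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0]]
texture_cue = [[0.5], [0.4], [0.9]]

# Assumption: hand-picked weights stand in for the weights MKL would learn.
fused = fuse_kernels(
    [rbf_kernel(shape_cue, gamma=1.0), rbf_kernel(texture_cue, gamma=2.0)],
    weights=[0.7, 0.3],
)
```

The fused matrix could then be handed to any kernel classifier that accepts a precomputed Gram matrix. A non-negative weighted sum of valid kernels is itself a valid kernel.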
For sketch-based image retrieval, we start by carefully examining existing work. Surveying
the field as a whole, we argue that studying a sketch’s
ability to distinguish intra-category object variations is the most promising direction,
and we define this as the fine-grained sketch-based image retrieval problem. A deformable
part-based model, which captures object part details and object deformations, is proposed
to tackle this new problem, and graph matching is employed to compute the similarity between
deformable part-based models by matching their parts. To evaluate this new
problem, we combine the TU-Berlin sketch dataset and the PASCAL VOC photo dataset [36] to
form a new, challenging cross-domain dataset with pairwise sketch-photo similarity ratings, and
our proposed method has shown promising results on this new dataset.
Regarding sketch synthesis, we focus on generating sketches in a realistic free-hand style for
general categories, as the closest previous work [8] demonstrated efficacy on only a single
category: human faces. The difficulties that prevent sketch synthesis from extending to other categories
include cluttered edges and diverse object variations due to deformation. To address these
difficulties, we propose a deformable stroke model that casts sketch synthesis as a detection
process, directly targeting the cluttered background and the object variations. To ease
the training of such a model, we further propose a perceptual grouping algorithm that
exploits the relationship between stroke length and stroke semantics, stroke temporal order and Gestalt principles
[58] to perform part-level sketch segmentation. The perceptual grouping automatically provides semantic
part-level supervision for training the deformable stroke model, and an iterative
learning scheme is introduced to gradually refine both the supervision and the model. With
the learned deformable stroke models, sketches with a distinct free-hand style can be generated for
many categories.
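To make the idea of grouping strokes into parts more concrete, the following is a minimal sketch that uses only two of the cues mentioned above: temporal order and spatial proximity (one Gestalt principle). It is not the algorithm proposed in the thesis, which also draws on stroke length and further Gestalt principles. The stroke format (lists of `(x, y)` points), the `max_gap` threshold and the helpers `min_gap` and `group_strokes` are assumptions made for illustration.

```python
import math


def min_gap(stroke_a, stroke_b):
    """Smallest Euclidean distance between any point of one stroke and any point of the other."""
    return min(math.dist(p, q) for p in stroke_a for q in stroke_b)


def group_strokes(strokes, max_gap):
    """Greedy grouping in drawing order.

    A stroke joins the most recent group if it lies within max_gap of any
    stroke already in that group; otherwise it starts a new group.
    Returns a list of groups, each a list of stroke indices.
    """
    groups = []
    for idx, stroke in enumerate(strokes):
        if groups and any(min_gap(stroke, strokes[k]) <= max_gap for k in groups[-1]):
            groups[-1].append(idx)
        else:
            groups.append([idx])
    return groups


# Hypothetical toy sketch: two corner shapes drawn one after the other.
strokes = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(1.0, 0.1), (1.0, 1.0)],
    [(5.0, 5.0), (6.0, 5.0)],
    [(6.0, 5.2), (6.0, 6.0)],
]
parts = group_strokes(strokes, max_gap=0.5)
```

Because the walk only ever compares a stroke with the latest group, a part that the artist returns to later would be split into two groups; a fuller treatment would need to revisit earlier groups.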
Authors
Li, Yi