Recognizing Sketch Recognition: November 2015

Saturday, November 28, 2015

Reading 33 : Using shape context for sketch recognition

Citation:
Belongie, Serge, Jitendra Malik, and Jan Puzicha. "Shape matching and object recognition using shape contexts." Pattern Analysis and Machine Intelligence, IEEE Transactions on 24.4 (2002): 509-522.
Publication Link

Summary:
This paper is very similar to the $p algorithm, in the fact that it tries to match every point in the candidate to some point in the template. We take the set of vectors originating from a point to all other sample points on a shape. These vectors express the configuration of the entire shape relative to the reference point.

Discussion:
The following library has Javascript implementations of Haudroff Distance and Shape Context. This was the motivation behind looking unto Shape Context as an alternative means of template matching.

https://github.com/kjkjava/Sketchy.js/tree/master

The shape descriptor of a point is represented as a relative distribution over positions, in histogram bins of log-polar space. The cost of this matching for a point is defined using the chi-square statistic:

We choose the matching that minimizes the total cost of this matching.

Reading 32 : $P Recognizer

Citation:

Vatavu, Radu-Daniel, Lisa Anthony, and Jacob O. Wobbrock. "Gestures as point clouds: a $ P recognizer for user interface prototypes." Proceedings of the 14th ACM international conference on Multimodal interaction. ACM, 2012.

Publication Link

Summary:

The $P algorithm is a gesture matching algorithm that follows in the family of $1 and $N algorithms. It is able to compare a candidate gesture with a predefined template, by treating the gestures as point clouds, while removing the timing information. This ensures that the algorithm is invariant to multi-stroke data.

Discussion:

The key approach in the algorithm is to match each point in the candidate to some point in the template. This matching is done by using a matching function. The goodness of the matching is defined using the follow equation:

The Hungarian algorithm is used to find the best matching, by treating this candidate-template point matching as an assignment problem.

The authors feel that this algorithm cannot be classified a $ family algorithm as it is fairly complex and requires us to solve an optimization problem. Some of the heuristics adopted by the authors are:

(1) Find all the Euclidean distances between points in C and T, and then sort thme, and choose the first n points that can yield a valid match.

(2) Use Greedy heuristics

Reading 31: R Trees for Sketch Representations

Citation:

Guttman, Antonin. R-trees: a dynamic index structure for spatial searching. Vol. 14. No. 2. ACM, 1984.

Publication Link

Summary:

The primary motivation behind this paper in the context of this project is to implement relative spatial indexing for sketches and shapes which have a well defined bounding box.

R-Trees are used to spatially index objects that can be represented in an n-dimensional space using intervals in each dimension. In the simplest form, each object is represented by a rectangle, i.e bounding box.

Discussion

For most of our implementations, our sketches are well defined within a bounding box and can hence be spatially indexed to build a tree that gives us the relative position of a feature with respect to all the other feature objects that are present.

We first construct an RTree using a standard dataset of geographical features. The parameters to the RTree restrict the maximum number of nodes and hence the split size and depth of the tree.

To query the RTree, we query the RTree to find out which leaf node the given bounding box is in. If the candidate and template bounding box occur the same node of the RTree, then we say that their relative position is the same.

Saturday, November 21, 2015

Reading 30 : Course of Action maps

Citation:

Hammond, Tracy Anne, et al. "A Sketch Recognition System for Recognizing Free-Hand Course of Action Diagrams." IAAI. 2010.

Publication Link

Summary:

Course of Action diagrams are used by military commanders to plan field operations, with various symbols used to represent troops, supplies, obstacles and movement patterns. An important aspect is the use of domain specific information to tune the recognition algorithm.

Discussion:

Initially, the grouping algorithm detemines which recognizer to send the strokes, PaleoSketch for geometric primitives, and a HWR for text and decision graphics. The segmentation is done by breaking every stroke into a series of lines, and then using Paleosketch.Primitive recognition is done using an extended version of PaleoSketch, which can be combined with LADDER to form more complex shapes. The system also recognizes dashed primitives, decision points (10-line stars), and handwriting. It also uses an improved version of LADDER, CALVIN which uses heuristics to account for confidence values of shape decisions. Some of the other recognised shapes are phase/boundary line, arrows , obstacles and decision graphics.

Reading 29: Hand-Drawn Concept Maps

Citation:

Jiang, Yingying, et al. "Structuring and manipulating hand-drawn concept maps." Proceedings of the 14th international conference on Intelligent user interfaces. ACM, 2009.

Publication Link

Summary:

Concept maps are extensively used in visualization. Also, anything drawn by a person on a map is a mental map of some sort. This paper presents a way of recognizing these concept maps on a hand drawn sketch. We believe some of these recognition concepts can also be used in drawings on maps.

Discussion:

The stages in this approach are preprocessing, partitioning the graph into subgraphs, extracting the block of each subgraph using dynamic programming, and finally generating the concept map's structure.The authors also created a set of intelligent gestures that could be used to directly modify the extracted figures. The paper looks similar to the multi-stroke paper in its graph-based approach of merging strokes.

Reading 28 : Sketch-Based Spatiotemporal Visualization

Citation:

Godwin, Alex, and John Stasko. "Drawing Data on Maps: Sketch-Based Spatiotemporal Visualization."

Publication Link

Summary:

This paper is a joint effort by GeorgiaTech and the Atlanta Police Department, to help create visualization tools for understanding crime patterns, based on the user interaction with a map surface. The system allows sketching of 3 types of paths - shortest path, user defined path and radial path. The main purpose of this paper is to present a technique of visualization aid using map sketching. There is little or no recognition component used here.

Discussion:
The map is constructed using two underlying datasets - spatial events and traversable roads. The paths can be queried either using shortest path, a user-defined path, or a radial path where a user is shown all the possible traversable locations within a particular radius. The rendering of paths is dependent on the spatial characteristics of the path segments, i.e in this case, spatial regions are characterized by the number of crimes recorded. The user also has the power to create visualization summaries by sketching out regions on the map

Saturday, November 14, 2015

Reading 27 : Visual Symbolic Recognizer

Citation:
Tom Y. Ouyang and Randall Davis. 2009. A visual approach to sketched symbol recognition. In Proceedings of the 21st international jont conference on Artifical intelligence (IJCAI'09), Hiroaki Kitano (Ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1463-1468.
Publication Link

Summary:
The main approach followed in this paper is to represent symbols as feature images rather than temporally ordered points or geometric primitives. There are four feature images concerned with orientation, and 1 concerned with endpoint. The feature images are first extracted from the input and then there are stages of smoothing, down-sampling and classification.

Discussion:

The online stroke sequences are converted into low level feature images.

In the smoothing and downsampling phases, we try to increase the tolerance to local shifts and distortions by applying a Gaussian smoothing function. We then downsample the images by a factor of 2 using a MAX filter.

The symbol recognition is done using a deformable template matching algorithm. The IDM allows every point in the input image to shift within a 3x3 local window to find the perfect match. To compute the IDM distance, the following equation is used in the paper.

The system also incorporates two optimization schemes in the form of coarse candidate pruning (using the distance between these reduced feature sets), and hierarchical clustering (which uses a branch and bound technique, where clusters are first formed using agglomerative clustering).

Reading 26 : Shape Context

Citation:
Oltmans, Michael. Envisioning sketch recognition: a local feature based approach to recognizing informal sketches. Diss. Massachusetts Institute of Technology, 2007.
Publication Link

Summary:
This paper focuses on recognizing sketches based on their visual appearance rather than stroke information. However, comparing pixels based on perceptual similarity will lead to inaccurate results. Hence, the sketches are represented as parts, and these parts are compared. The main challenges in this include over-tracing, variations in signals, as well as segmentation issues.

Discussion:

The main takeaway from this paper is the bullseye representation used for images. In this representation shapes are described as collections of visual parts and their allowable conceptual variations. A visual part represents the appearance of a region of the shape. In contrast to a conventional tile based matching system, a bullseye representation is selected. Each of the subdivisions of the circular region is a histogram bin that measures how many pixels of ink are contained in that region. A part is represented as a vector formed from the counts in each bin. To form the representation, we compute first calculate the visual parts at a sampling of locations that cover the sketched symbol. Then a standard codebook of parts is used to identify each individual part. This codebook is preprocessed for common parts in the domain(eg. resistors). Finally, the representation of the sketched symbol is calculated as a vector of distances that indicates the degree to which each of the codebook parts appears in the sketched symbol. The match vector representation is used to train a classifer to learn the differences between shape classes and to learn the allowable variations within a shape class.

The second task is to localize of shapes in complete sketches, which is done by identifying candidate locations and training the classifier on them.

Saturday, November 7, 2015

Reading 25: User identification using pressure and tilt data

Citation:

Eoff, Brian David, and Tracy Hammond. "Who dotted that'i'?: context free user differentiation through pressure and tilt pen data." Proceedings of Graphics Interface 2009. Canadian Information Processing Society, 2009.

Summary;

Along with the usual parameters used in understanding a sketch, this paper reviews other information that can be used to unique identify a sketcher. This paper investigates how the physical mechanics of a pen alone including pressure, tilt and speed can be used to unique identify a user. Some of the major use cases for such a system include forensics, identify personal traits to build trends for education, and also for collaboration on online whiteboards.

Discussion:

The data is collected in Cintiq, which provides values for tilt and pressure. This application records the following information about the users strokes, the coordinates, at what time the pen was in contact with the surface, at what angle and using how much pressure.

2 experiments were performed:

Exp1:To determine how consistent a user is in the drawing.

Exp2: To determine if we can build a classifier to find the creator of a sketch

Reading 24 : SketchRead

Citation:

Alvarado, Christine, and Randall Davis. "SketchREAD: a multi-domain sketch recognition engine." Proceedings of the 17th annual ACM symposium on User interface software and technology. ACM, 2004.

Publication Link

Summary:

The aim of this system is to create a universal system that can be used to recognize sketches across multiple domains. A major constraint is such systems is the trade off between accurate recognition and freedom of drawing style.

Discussion:
Most previous systems make use of assumptions or limitations on the drawing style to improve recognition which in turn interferes with the user's liberty of design. Some of the challenges the author discusses are incremental nature of sketching ,messiness and segmentation of strokes into shapes.

Reading 23 : Tutorial on HMMs

Citation:

Rabiner, Lawrence R. "A tutorial on hidden Markov models and selected applications in speech recognition." Proceedings of the IEEE 77.2 (1989): 257-286.

Publication Link

Summary:

This paper is a landmark article on Hidden Markov Models, and is cited by most scientists who use HMMs, and it presents a comprehensive review on HMMs an their applications. The core aspects include the construction of a basic Marov model, additional parameters required to construct a Hidden Markov Model, along with forward and backward chaining algorithms. The paper also deals with the three main problems that are solved using HMMs.

Discussion:
The paper begin with explaining probability primitives and basic Markov models. It then discusses the characterization of the basic parameters required to build a HMM, i..e No. of states, No. of observation, apriori probability, transitional probability and emission probability.

The 3 main uses of the HMM are :

1. Given an observation sequence and a model, what is the probability of that observation sequence? (likelihood of those events occurring)

2. Given a set of observations and a model, what is the most likely sequence of states under which the observations hold?

3. How can the model parameters be adjusted to increase likelihood of a desired observation.

The paper the delves into the application of the Viterbi algorithm for forward and backward chaining. The math details are avoided in this post.

Reading 22 : HMM-based efficient sketch recognition

Citation:

Sezgin, Tevfik Metin, and Randall Davis. "HMM-based efficient sketch recognition." Proceedings of the 10th international conference on Intelligent user interfaces. ACM, 2005.

Publication Link

Summary:

This paper deals with the application of HMMs to sketch recognition, that is treating sketches a series of incremental states built upon every stage of interaction. The goal is to capture variations in drawing style for various figures, and break these down into a sequence of observations that can be used to build a probabilistic model.

Discussion:

The core details of HMMs are discussed in the next paper.

This paper primarily deals with the two phases of recognition using a HMM:

(1) Encoding: In this stage, strokes are converted to geometric primitives.

(2)Segmentation and Recogntion: We need to have identified object classes apriori, and constructed HMMs for them. Individual object recognition is done using forward chaining. To recognize in an overall scene,we end up with an optimization of problem of assigning models to subsequences such that there is maximum likelihood of some scene.