Saturday, December 12, 2015

Reading 35: Enhancing KNN Search in Spatial Indexes

Citation:
Cheung, King Lum, and Ada Wai-Chee Fu. "Enhanced nearest neighbour search on the R-tree." ACM SIGMOD Record 27.3 (1998): 16-21.
Summary:
This paper presents an improved set of heuristics for the R*-tree that makes indexing and retrieving the neighbours of a spatial object much more efficient.

Discussion:
We perform relative indexing primarily by extracting the nearest neighbours of a drawn feature and checking how closely the relative positions of the two features match. However, this approach fails drastically when we have insufficient features indexed. In such cases, we have to fall back on conventional algorithms that are translation independent.

Reading 34 : 2D Curve Similarity Algorithms

Citation:
Deriche, Rachid, and Olivier Faugeras. "2-D curve matching using high curvature points: application to stereo vision." Pattern Recognition, 1990. Proceedings., 10th International Conference on. Vol. 1. IEEE, 1990.

Summary:
This paper deals with the high-curvature points of curves, using derivatives to locate curvature maxima.
Discussion:
Our application requires us to gauge the similarity between two freely drawn curves; in the case of rivers, we expect only an optimistic approximation of correctness between a template and a candidate curve.

Although the curve detection methods presented here are for a computer vision domain, they can be easily translated to the sketch domain.

Saturday, November 28, 2015

Reading 33 : Using shape context for sketch recognition

Citation:
Belongie, Serge, Jitendra Malik, and Jan Puzicha. "Shape matching and object recognition using shape contexts." Pattern Analysis and Machine Intelligence, IEEE Transactions on 24.4 (2002): 509-522.
Publication Link

Summary:
This paper is very similar to the $P algorithm in that it tries to match every point in the candidate to some point in the template. We take the set of vectors originating from a point to all other sample points on a shape; these vectors express the configuration of the entire shape relative to the reference point.

Discussion:
The following library has JavaScript implementations of Hausdorff distance and Shape Context. This was the motivation behind looking into Shape Context as an alternative means of template matching.

https://github.com/kjkjava/Sketchy.js/tree/master

The shape descriptor of a point is represented as a relative distribution over positions, in histogram bins of log-polar space. The cost of matching point p_i on one shape to point q_j on the other is defined using the chi-square statistic:

C(p_i, q_j) = (1/2) * Σ_k [h_i(k) − h_j(k)]² / [h_i(k) + h_j(k)]

where h_i and h_j are the normalized histograms at the two points. We choose the correspondence that minimizes the total matching cost.
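As a concrete illustration, here is a small Python sketch of a shape-context descriptor and the chi-square matching cost. The bin layout (5 radial x 12 angular bins, normalized by the mean distance from the reference point) is a simplified stand-in for the paper's exact log-polar scheme.

```python
import math

def shape_context(points, i, r_bins=5, theta_bins=12):
    """Log-polar histogram describing all other points relative to points[i]."""
    px, py = points[i]
    others = [p for j, p in enumerate(points) if j != i]
    dists = [math.hypot(x - px, y - py) for x, y in others]
    mean_d = sum(dists) / len(dists)  # crude scale normalization
    hist = [0] * (r_bins * theta_bins)
    for (x, y), d in zip(others, dists):
        # logarithmic radial bins centred on the mean distance
        r = min(r_bins - 1, max(0, int(math.log2(d / mean_d) + r_bins / 2)))
        # uniform angular bins over [0, 2*pi)
        t = int(((math.atan2(y - py, x - px) + math.pi) / (2 * math.pi)) * theta_bins) % theta_bins
        hist[r * theta_bins + t] += 1
    return hist

def chi2_cost(h1, h2):
    """Chi-square matching cost between two histograms (normalized internally)."""
    n1, n2 = sum(h1), sum(h2)
    cost = 0.0
    for a, b in zip(h1, h2):
        a, b = a / n1, b / n2
        if a + b > 0:
            cost += (a - b) ** 2 / (a + b)
    return 0.5 * cost
```

Identical descriptors have cost 0, so matching a shape against itself is free, as expected.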



Reading 32 : $P Recognizer

Citation:
Vatavu, Radu-Daniel, Lisa Anthony, and Jacob O. Wobbrock. "Gestures as point clouds: a $ P recognizer for user interface prototypes." Proceedings of the 14th ACM international conference on Multimodal interaction. ACM, 2012.
 
Summary:
The $P algorithm is a gesture-matching algorithm in the family of the $1 and $N recognizers. It compares a candidate gesture with a predefined template by treating both gestures as point clouds, discarding the timing information. This makes the algorithm invariant to the number and ordering of strokes.

Discussion:
The key approach in the algorithm is to match each point in the candidate to some point in the template, using a matching function. The goodness of a matching M is defined as the weighted sum of Euclidean distances between matched points:

distance(C, T) = Σ_i w_i * ||C_i − T_M(i)||

where the confidence weights w_i give the earliest matches the greatest influence.
The Hungarian algorithm is used to find the best matching, by treating this candidate-template point matching as an assignment problem. 

The authors note that, with the Hungarian algorithm, the method could hardly be called a $-family algorithm, as it is fairly complex and requires solving an optimization problem. Some of the heuristics adopted instead are:

(1) Compute all the Euclidean distances between points in C and T, sort them, and choose the first n pairs that yield a valid matching.

(2) Use Greedy heuristics
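The greedy heuristic can be sketched in a few lines of Python, assuming both clouds have been resampled to the same number of points. The decaying weight mirrors the idea that early, confident matches should count more; the exact weighting here is illustrative rather than the paper's formula.

```python
import math

def greedy_cloud_match(candidate, template):
    """Greedy point-cloud matching in the spirit of $P: pair each candidate
    point with the closest still-unmatched template point."""
    n = len(candidate)
    unmatched = set(range(len(template)))
    total = 0.0
    for i, (cx, cy) in enumerate(candidate):
        j = min(unmatched,
                key=lambda k: math.hypot(cx - template[k][0], cy - template[k][1]))
        unmatched.discard(j)
        weight = 1 - i / n  # earlier matches get more influence
        total += weight * math.hypot(cx - template[j][0], cy - template[j][1])
    return total
```

A candidate identical to its template scores 0; larger scores mean a worse match.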

Reading 31: R Trees for Sketch Representations

Citation:
Guttman, Antonin. R-trees: a dynamic index structure for spatial searching. Vol. 14. No. 2. ACM, 1984.
Summary:
The primary motivation behind this paper, in the context of our project, is to implement relative spatial indexing for sketches and shapes that have a well-defined bounding box.
R-trees spatially index objects that can be represented in an n-dimensional space using an interval in each dimension. In the simplest form, each object is represented by a rectangle, i.e. its bounding box.

Discussion:
For most of our implementations, our sketches are well defined within a bounding box and can hence be spatially indexed to build a tree that gives us the relative position of a feature with respect to all the other feature objects that are present.
We first construct an R-tree using a standard dataset of geographical features. The parameters of the R-tree restrict the maximum number of entries per node, and hence the split size and depth of the tree.
To query the tree, we find the leaf node whose region contains a given bounding box. If the candidate and template bounding boxes fall in the same leaf of the R-tree, we say that their relative position is the same.
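Our same-leaf test can be sketched as follows, assuming the leaf-level minimum bounding rectangles have already been extracted from the constructed R-tree (building the tree itself is omitted here):

```python
def contains(mbr, box):
    """True if bounding box `box` lies inside MBR `mbr`.
    Boxes are (xmin, ymin, xmax, ymax)."""
    return (mbr[0] <= box[0] and mbr[1] <= box[1]
            and box[2] <= mbr[2] and box[3] <= mbr[3])

def leaf_of(leaves, box):
    """Index of the first leaf MBR that contains `box`, else None."""
    for i, mbr in enumerate(leaves):
        if contains(mbr, box):
            return i
    return None

def same_relative_position(leaves, candidate, template):
    """Our test: two features 'match' spatially if they fall in the same leaf."""
    a, b = leaf_of(leaves, candidate), leaf_of(leaves, template)
    return a is not None and a == b
```

A real R-tree would descend from the root instead of scanning the leaf list, but the decision at the end is the same.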


Saturday, November 21, 2015

Reading 30 : Course of Action maps

Citation:
Hammond, Tracy Anne, et al. "A Sketch Recognition System for Recognizing Free-Hand Course of Action Diagrams." IAAI. 2010.
Summary: 
Course of Action diagrams are used by military commanders to plan field operations, with various symbols used to represent troops, supplies, obstacles and movement patterns. An important aspect is the use of domain specific information to tune the recognition algorithm.

Discussion:
Initially, the grouping algorithm determines which recognizer to send the strokes to: PaleoSketch for geometric primitives, and a handwriting recognizer for text and decision graphics. Segmentation is done by breaking every stroke into a series of lines and then applying PaleoSketch. Primitive recognition uses an extended version of PaleoSketch, whose output can be combined with LADDER to form more complex shapes. The system also recognizes dashed primitives, decision points (10-line stars), and handwriting. It uses an improved version of LADDER, CALVIN, which applies heuristics to account for the confidence values of shape decisions. Other recognised shapes include phase/boundary lines, arrows, obstacles, and decision graphics.

Reading 29: Hand-Drawn Concept Maps

Citation:
Jiang, Yingying, et al. "Structuring and manipulating hand-drawn concept maps." Proceedings of the 14th international conference on Intelligent user interfaces. ACM, 2009.
 
Summary:
Concept maps are extensively used in visualization, and anything a person draws on a map is a mental map of some sort. This paper presents a way of recognizing hand-drawn concept maps. We believe some of these recognition concepts can also be applied to drawings on maps.

Discussion:
The stages in this approach are preprocessing, partitioning the graph into subgraphs, extracting the block of each subgraph using dynamic programming, and finally generating the concept map's structure. The authors also created a set of intelligent gestures that can be used to directly modify the extracted figures. The paper resembles the multi-stroke paper in its graph-based approach to merging strokes.

Reading 28 : Sketch-Based Spatiotemporal Visualization

Citation:
Godwin, Alex, and John Stasko. "Drawing Data on Maps: Sketch-Based Spatiotemporal Visualization."
 
 
Summary:
This paper is a joint effort by Georgia Tech and the Atlanta Police Department to create visualization tools for understanding crime patterns, based on user interaction with a map surface. The system allows sketching of three types of paths: shortest path, user-defined path, and radial path. The main purpose of the paper is to present a visualization aid using map sketching; there is little or no recognition component used here.


Discussion:
The map is constructed using two underlying datasets - spatial events and traversable roads. The paths can be queried as a shortest path, a user-defined path, or a radial path, where the user is shown all the traversable locations within a particular radius. The rendering of paths depends on the spatial characteristics of the path segments; in this case, spatial regions are characterized by the number of crimes recorded. The user can also create visualization summaries by sketching out regions on the map.
 

Saturday, November 14, 2015

Reading 27 : Visual Symbolic Recognizer

Citation:
Tom Y. Ouyang and Randall Davis. 2009. A visual approach to sketched symbol recognition. In Proceedings of the 21st international jont conference on Artifical intelligence (IJCAI'09), Hiroaki Kitano (Ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1463-1468.
Publication Link 


Summary:
The main approach followed in this paper is to represent symbols as feature images rather than as temporally ordered points or geometric primitives. There are four feature images concerned with stroke orientation and one concerned with stroke endpoints. The feature images are first extracted from the input, followed by stages of smoothing, down-sampling, and classification.


Discussion:

The online stroke sequences are converted into low level feature images.
 
In the smoothing and downsampling phases, we try to increase the tolerance to local shifts and distortions by applying a Gaussian smoothing function. We then downsample the images by a factor of 2 using a MAX filter.

The symbol recognition is done using a deformable template-matching algorithm. The image deformation model (IDM) allows every point in the input image to shift within a 3x3 local window to find the best match. The IDM distance between an input image I1 and a template I2 is:

D(I1, I2) = Σ_(x,y) min_(|dx|≤1, |dy|≤1) ||I1(x+dx, y+dy) − I2(x, y)||²
The system also incorporates two optimization schemes: coarse candidate pruning (using distances between reduced feature sets) and hierarchical clustering (a branch-and-bound technique, where the clusters are first formed using agglomerative clustering).
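A direct Python sketch of the IDM distance on small grayscale grids, letting each input pixel shift within a 3x3 window; the paper's actual feature images carry several feature channels per pixel, which this simplification collapses to one value.

```python
def idm_distance(img1, img2):
    """IDM distance: each pixel of img1 may shift within a 3x3 window to
    find its best match against img2 (squared difference).
    Images are equal-size 2D lists of floats."""
    h, w = len(img1), len(img1[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            # best squared difference over all in-bounds shifts of img1
            best = min(
                (img1[y + dy][x + dx] - img2[y][x]) ** 2
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                if 0 <= y + dy < h and 0 <= x + dx < w
            )
            total += best
    return total
```

Note the tolerance this buys: an ink spike shifted by one pixel still scores a distance of 0, while rigid pixel-wise comparison would penalize it twice.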

Reading 26 : Shape Context

Citation:
Oltmans, Michael. Envisioning sketch recognition: a local feature based approach to recognizing informal sketches. Diss. Massachusetts Institute of Technology, 2007.
Publication Link 

Summary:
This paper focuses on recognizing sketches based on their visual appearance rather than stroke information. However, comparing pixels by perceptual similarity alone leads to inaccurate results, so the sketches are represented as collections of parts, and these parts are compared. The main challenges include over-tracing, variation in the signal, and segmentation issues.

Discussion:
The main takeaway from this paper is the bullseye representation used for images. In this representation, shapes are described as collections of visual parts and their allowable conceptual variations; a visual part represents the appearance of a region of the shape. In contrast to a conventional tile-based matching system, a bullseye representation is selected. Each subdivision of the circular region is a histogram bin that counts how many pixels of ink are contained in that region, and a part is represented as a vector of the counts in each bin. To form the representation, we first compute the visual parts at a sampling of locations covering the sketched symbol. A standard codebook of parts is then used to identify each individual part; this codebook is precomputed from common parts in the domain (e.g. resistors). Finally, the sketched symbol is represented as a vector of distances indicating the degree to which each codebook part appears in it. This match-vector representation is used to train a classifier to learn the differences between shape classes and the allowable variations within a shape class.

The second task is to localize shapes in complete sketches, which is done by identifying candidate locations and running the classifier on them.
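The bullseye part descriptor can be sketched as follows; the bin counts (4 rings x 8 wedges) and the uniform radial spacing are illustrative choices, not the thesis's exact layout.

```python
import math

def bullseye_part(ink, cx, cy, r_bins=4, theta_bins=8, radius=1.0):
    """Bullseye descriptor for the visual part centred at (cx, cy):
    count ink pixels in each wedge of a circular grid of the given radius.
    `ink` is a list of (x, y) ink-pixel coordinates."""
    hist = [0] * (r_bins * theta_bins)
    for x, y in ink:
        d = math.hypot(x - cx, y - cy)
        if d == 0 or d > radius:
            continue  # outside the bullseye (or exactly at its centre)
        r = min(r_bins - 1, int(d / radius * r_bins))
        t = int(((math.atan2(y - cy, x - cx) + math.pi) / (2 * math.pi)) * theta_bins) % theta_bins
        hist[r * theta_bins + t] += 1
    return hist
```

Comparing two such vectors (e.g. with a chi-square or Euclidean distance) gives the part-to-codebook distances that feed the match vector.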

Saturday, November 7, 2015

Reading 25: User identification using pressure and tilt data

Citation:
Eoff, Brian David, and Tracy Hammond. "Who dotted that'i'?: context free user differentiation through pressure and tilt pen data." Proceedings of Graphics Interface 2009. Canadian Information Processing Society, 2009.
 
Summary:
Along with the usual parameters used in understanding a sketch, this paper reviews other information that can be used to uniquely identify a sketcher. It investigates how the physical mechanics of the pen alone, including pressure, tilt, and speed, can identify a user. Major use cases for such a system include forensics, identifying personal traits to build trends for education, and collaboration on online whiteboards.
 
Discussion:
The data is collected using a Cintiq, which provides values for tilt and pressure. The application records the following information about the user's strokes: the coordinates, the time the pen was in contact with the surface, the tilt angle, and the pressure applied.
 
Two experiments were performed:
Exp. 1: To determine how consistent a user is in their drawing.
Exp. 2: To determine whether a classifier can be built to identify the creator of a sketch.

Reading 24 : SketchRead

Citation:
Alvarado, Christine, and Randall Davis. "SketchREAD: a multi-domain sketch recognition engine." Proceedings of the 17th annual ACM symposium on User interface software and technology. ACM, 2004.
Summary:
The aim of this system is to provide a universal engine that can recognize sketches across multiple domains. A major constraint in such systems is the trade-off between accurate recognition and freedom of drawing style.

Discussion:
Most previous systems impose assumptions or limitations on drawing style to improve recognition, which interferes with the user's liberty of design. Some of the challenges the authors discuss are the incremental nature of sketching, messiness, and segmentation of strokes into shapes.

Reading 23 : Tutorial on HMMs

Citation:
Rabiner, Lawrence R. "A tutorial on hidden Markov models and selected applications in speech recognition." Proceedings of the IEEE 77.2 (1989): 257-286.
 
Summary:
This is a landmark article on Hidden Markov Models, cited by most researchers who use HMMs; it presents a comprehensive review of HMMs and their applications. The core aspects include the construction of a basic Markov model, the additional parameters required to construct a hidden Markov model, and the forward and backward algorithms. The paper also covers the three main problems that are solved using HMMs.
 
Discussion:
The paper begins by explaining probability primitives and basic Markov models. It then characterizes the basic parameters required to build an HMM, i.e. the number of states, the number of observation symbols, the prior probabilities, the transition probabilities, and the emission probabilities.
 
The three main problems solved with HMMs are:
1. Given an observation sequence and a model, what is the probability of that observation sequence? (the likelihood of those events occurring)
2. Given a set of observations and a model, what is the most likely sequence of states under which the observations hold?
3. How can the model parameters be adjusted to increase the likelihood of a desired observation sequence?
 
The paper then delves into the Viterbi algorithm and the forward and backward procedures. The mathematical details are omitted in this post.
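Problem 1 above (scoring an observation sequence against a model) is solved by the forward algorithm; a minimal Python version, without the scaling tricks the tutorial discusses for long sequences:

```python
def forward(obs, pi, A, B):
    """Forward algorithm: P(observation sequence | model).
    pi[i]: start probability of state i, A[i][j]: transition probability
    i -> j, B[i][o]: probability of emitting symbol o from state i."""
    n = len(pi)
    # initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # induction: alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # termination: sum over final states
    return sum(alpha)
```

With a deterministic toy model (two states that alternate and emit their own symbol), the only possible sequence gets probability 1 and impossible ones get 0.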

Reading 22 : HMM-based efficient sketch recognition

Citation:
Sezgin, Tevfik Metin, and Randall Davis. "HMM-based efficient sketch recognition." Proceedings of the 10th international conference on Intelligent user interfaces. ACM, 2005.
Summary:
This paper applies HMMs to sketch recognition, treating a sketch as a series of incremental states built up at every stage of interaction. The goal is to capture variations in drawing style for various figures and break them down into a sequence of observations from which a probabilistic model can be built.
Discussion:
The core details of HMMs are discussed in the next paper.
This paper primarily deals with the two phases of recognition using a HMM:
 (1) Encoding: In this stage, strokes are converted to geometric primitives.
 (2) Segmentation and Recognition: We need to have identified the object classes a priori and constructed HMMs for them. Individual objects are recognized using the forward algorithm. To recognize an overall scene, we end up with an optimization problem of assigning models to subsequences so as to maximize the likelihood of the scene.

Saturday, October 24, 2015

Reading 21 : Ladder

Citation:
Hammond, Tracy, and Randall Davis. "LADDER, a sketching language for user interface developers." Computers & Graphics 29.4 (2005): 518-532.
 Publication Link

Summary:
LADDER is a high-level language that allows developers to write sketch definitions for complex shapes composed of primitives. The language can model the components of a shape and enforce constraints on how these components are positioned with respect to each other. Other important features are editing and display commands, which determine how a shape should react when edited and how it should be displayed upon recognition.

Discussion:
A shape definition comprises components, constraints, aliases, and editing and display behaviours. Shapes can be defined using shape groups, hierarchical shapes, and abstract shapes. An additional constraint allows recognition to be orientation-specific.

Two important mechanisms used are:
1. Jess rules - all possible combinations of shapes are checked
2. Constraint satisfaction - the system tries to satisfy all the given constraints while moving the points as little as possible

 

Reading 20: Recognizing multi-stroke primitives

Citation:
Hammond, Tracy, and Brandon Paulson. "Recognizing sketched multistroke primitives." ACM Transactions on Interactive Intelligent Systems (TiiS) 1.1 (2011): 4
Summary:
Most research in primitive recognition assumes that basic shapes such as lines and polygons are drawn with a single stroke; where multiple strokes are allowed, the user is forced to draw under a certain set of limitations. This paper seeks to eliminate these restrictions by giving the user full freedom to complete one shape in multiple strokes, or multiple shapes in a single stroke. The main stages of this multi-stroke recognition system are graph building, identification of strongly connected components, merging of components, and removal of false positives.

Discussion:

By analyzing the sketches of a set of users over a range of shapes, it was found that drawing styles differed considerably, especially in the number of strokes per shape. Many people make use of touch-up strokes and continuity strokes, even while drawing a simple figure.

1. Use the stroke information to form a graph. The endpoints of strokes form the nodes, and an edge between two nodes exists if any of the following conditions is satisfied:
- There is a stroke between the two points
- The gap between the two points is within a threshold of the average length of the strokes passing through the endpoints
- The gap is smaller than the average width of the bounding boxes of the two strokes

2. Tarjan's algorithm is used to extract the strongly connected components in this graph.

3. Nodes identified as being in the same component are merged by removing peaks at the endpoints, finding the points in the two strokes closest to each other, and creating a touch-up stroke between them.

4. The final step is to remove false positives.
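The graph-building and grouping steps can be sketched in Python. The paper builds a directed endpoint graph and extracts strongly connected components with Tarjan's algorithm; this simplified sketch uses plain connected components (union-find) and a single gap heuristic in place of the three edge conditions.

```python
import math

def group_strokes(strokes, gap_factor=0.15):
    """Group strokes whose endpoints lie close enough to belong to one shape.
    `strokes` is a list of point lists; returns a list of index groups.
    gap_factor is an illustrative threshold, not the paper's value."""
    def length(s):
        return sum(math.hypot(x2 - x1, y2 - y1)
                   for (x1, y1), (x2, y2) in zip(s, s[1:]))

    parent = list(range(len(strokes)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(strokes)):
        for j in range(i + 1, len(strokes)):
            # connect if any endpoint pair is within a fraction of avg length
            threshold = gap_factor * (length(strokes[i]) + length(strokes[j])) / 2
            gap = min(math.hypot(ax - bx, ay - by)
                      for ax, ay in (strokes[i][0], strokes[i][-1])
                      for bx, by in (strokes[j][0], strokes[j][-1]))
            if gap <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(strokes)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

Two strokes forming an L get grouped into one shape, while a distant stroke stays in its own group.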


Reading 19 : Sketch Recognition using Sounds

Citation:
Li, Wenzhe, and Tracy Anne Hammond. "Recognizing text through sound alone." Twenty-Fifth AAAI Conference on Artificial Intelligence. 2011.
 
 
 
Summary:
This paper makes use of the classical idea of capturing the sound produced while making strokes in order to recognize the stroke. The sound analysis is performed using multiple features extracted from the input signal. The stages in this recognition system are noise removal, feature extraction, and template matching via dynamic time warping.
 
Discussion:
 
1. End Point Noise Removal:
Two techniques are used to remove endpoint noise. In the first, the environmental noise is estimated as the signal energy over the first 150 ms, after which the amplitude is sampled every 45 ms and checked against a threshold (a multiple of the environmental noise) to determine the start and end points. The second method takes the first 20 ms as environmental noise and uses it to fit a Gaussian probability model; every 10 ms segment is checked and marked as silent or valid, and finally all the valid segments are merged. The mean amplitude value is used to normalize the signal.


2. Feature extraction: two main features are extracted, the mean amplitude for each time frame and the mel-frequency cepstral coefficients (MFCCs).

3. Template matching: this is done using dynamic time warping. The DTW constraints defined are endpoint constraints, local continuity constraints, a global path constraint, and axis orientation.
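The core DTW recurrence (without the paper's global path constraint, which would additionally restrict how far |i − j| may drift) looks like this in Python:

```python
def dtw(a, b):
    """Dynamic time warping distance between two 1-D feature sequences,
    with the usual endpoint and local-continuity constraints."""
    inf = float("inf")
    n, m = len(a), len(b)
    # D[i][j] = cost of best warping path aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # stretch a
                                 D[i][j - 1],      # stretch b
                                 D[i - 1][j - 1])  # advance both
    return D[n][m]
```

DTW absorbs local tempo variation: a sequence matched against a version of itself with a repeated sample still scores 0, which is exactly why it suits stroke sounds of varying speed.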
 




 
 

Sunday, October 18, 2015

Reading 18: Geometric Basics

Citation:
Dr. Tracy Hammond

Summary:
This handout primarily covers some of the geometric basics required to have a good understanding of sketch recognition systems.

This includes fundamental trigonometric rules, linear algebra, and vector algebra.

Reading 17 : PaleoSketch

Citation:
Paulson, Brandon, and Tracy Hammond. "PaleoSketch: accurate primitive sketch recognition and beautification." Proceedings of the 13th international conference on Intelligent user interfaces. ACM, 2008.
 
Summary:
PaleoSketch is one of the landmark techniques for primitive shape recognition, applied after early processing (corner finding) is complete. Many current applications use PaleoSketch to obtain individual shapes before passing them to higher-level recognizers for further grouping.
 
Discussion:
 
The pre-processing involves a few stages: computing corners (using a three-step process of selecting candidate corners, merging similar corners in a neighborhood, and finally selecting the highest-curvature point in a neighborhood as the corner); plotting direction, curvature, and speed graphs; and using them to compute NDDE and DCR. Closed shapes, overtracing, and tails are also detected at this stage.
 
Next, the strokes are passed to a series of shape detectors. Each detector subjects the stroke to a set of geometric constraints and returns the shape if all the constraints pass. If the stroke passes no test, it is recursively broken down at its point of highest curvature. The detector later also tries to recombine primitive shapes via a secondary function (since dividing at the point of highest curvature does not necessarily yield individual shapes every time).

The geometric tests are designed for the following shapes:
1. Line
2. Polyline
3. Circle
4. Ellipse
5. Curve
6. Spiral
7. Helix

(Refer to the paper for the exact details of these geometric constraints.)


Finally, to select the final set of shapes that defines a stroke, a hierarchical ordering is used that tries to minimize the number of lines returned (i.e. it favors arcs over lines, and complex shapes over heavily segmented polylines).
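NDDE and DCR, mentioned above, can be sketched as follows. This is my reading of the two metrics, with the resampling and exact index bookkeeping simplified; the paper's precise definitions may differ in detail.

```python
import math

def directions(points):
    """Direction (angle) of each consecutive segment of the stroke."""
    return [math.atan2(y2 - y1, x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]

def arc_length(points, i, j):
    """Path length along the stroke from point i to point j."""
    return sum(math.hypot(points[k + 1][0] - points[k][0],
                          points[k + 1][1] - points[k][1])
               for k in range(i, j))

def ndde(points):
    """Normalized distance between direction extremes: stroke length between
    the highest and lowest direction values over total length.
    Near 1 for arcs and circles, lower for polylines."""
    d = directions(points)
    i, j = d.index(max(d)), d.index(min(d))
    if i > j:
        i, j = j, i
    return arc_length(points, i, j) / arc_length(points, 0, len(points) - 1)

def dcr(points):
    """Direction change ratio: max direction change over mean direction
    change. High for polylines (sharp corners), low for smooth curves."""
    d = directions(points)
    changes = [abs(b - a) for a, b in zip(d, d[1:])]
    return max(changes) / (sum(changes) / len(changes))
```

On an L-shaped polyline both metrics flag "corner-like" (low NDDE, high DCR), while a smooth arc scores the opposite way.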

Reading 16 : Combining Corner Detectors

Citation:
Wolin, Aaron, Martin Field, and Tracy Hammond. "Combining corners from multiple segmenters." Proceedings of the Eighth Eurographics Symposium on Sketch-Based Interfaces and Modeling. ACM, 2011.
Summary:
This paper aims to consolidate all the previous attempts in the domain of corner-detection, and integrate all these algorithms to build a powerful corner detector. The accuracy of this detector was proven to surpass that of any individual detector. However, running this corner finding algorithm on the entire set of points proved to fail drastically, as there are sharp spikes in the error metric which prevent effective thresholding.

Discussion:
The first step is to run each of the five corner detectors individually (Sezgin, Yu and Cai, ShortStraw, Douglas-Peucker, and Kim and Kim), and take the union of all the obtained corners, with duplicates removed.
Next, a Sequential Backward Floating Selection (SBFS) technique is used to remove corners one by one from this combined set. At every iteration, a removed corner can be added back if it is found that the error decreases; the corner to remove is the one whose removal causes the least error.
The error metric used is the mean squared error (MSE): the mean distance between every point and the vertically closest point on the polyline formed by the remaining corners.

The corner removal stops when there is a sharp spike in the MSE (an elbow in the curve) whose slope exceeds a threshold. The remaining set of corners is returned as the final output.

To determine the threshold, we use a training set where the corners are already labeled. We take the medians of the slopes before and after the elbow for these shapes and fit a Gaussian distribution to each of the two sets. The point where the probabilities of the two models are equal is taken as the confusion point, i.e. the threshold.
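The error metric driving the corner removal can be sketched like this; for simplicity it uses the shortest distance to the polyline rather than the paper's vertical distance, which only changes the numbers, not the elbow behaviour.

```python
import math

def seg_dist(p, a, b):
    """Shortest distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # clamp the projection parameter to stay on the segment
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def fit_mse(points, corner_idx):
    """Mean squared distance from each stroke point to the polyline
    through the surviving corners (given as indices into `points`)."""
    corners = [points[i] for i in corner_idx]
    err = 0.0
    for p in points:
        d = min(seg_dist(p, a, b) for a, b in zip(corners, corners[1:]))
        err += d * d
    return err / len(points)
```

Removing a true corner makes the error jump sharply (the elbow), which is exactly the signal the slope threshold is looking for.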






Sunday, October 11, 2015

Reading 15 : IStraw

Citation:
Xiong, Yiyan, and Joseph J. LaViola Jr. "Revisiting ShortStraw: improving corner finding in sketch-based interfaces." Proceedings of the 6th Eurographics Symposium on Sketch-Based Interfaces and Modeling. ACM, 2009.
 
Summary:
This paper first reviews the ShortStraw algorithm, then identifies its limitations and suggests improvements. It also introduces curve detection and provides experimental evidence of the substantial improvement obtained.
 
Discussion:
We mainly need to discuss the six limitations of ShortStraw and how IStraw overcomes them.
 
1. Initial Straws
Problem: ShortStraw fails to compute the straw for the first three and last three points, which can miss corners in some cases.
Solution: IStraw introduces formulae to compute the straws for these points too.

2. Timing information
Problem: ShortStraw makes no use of stroke speed, which can actually be helpful in finding corners.
Solution: Use speed minima to identify corners (the pen naturally slows down at a corner).

3. Consecutive False Corners Avoidance
Problem: ShortStraw tends to delete some true corners, after which the collinearity test fails drastically.
Solution: Use a two-pass collinearity test, with the second pass having more relaxed thresholds.

4. Dynamic Threshold
Problem: ShortStraw does not take the length of line segments into account when determining corners, an important distinguishing factor in some cases.
Solution: In the second pass of the collinearity test, a dynamic threshold based on the length of the segment is introduced.

5. Impact of Sharp noise

6. IStraw introduces curve detection by treating every prospective corner as a point where two rays converge, both originating a fixed shift distance from the point in question. For a line, the angle formed at this point remains fairly constant as the shift distance varies, while it changes drastically for a curve.
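The curve test in item 6 can be sketched as follows; the shift distances and the 10-degree tolerance are hypothetical values for illustration, not the thresholds from the paper.

```python
import math

def angle_at(points, i, shift):
    """Angle at points[i] between the rays toward points[i - shift] and
    points[i + shift], in degrees."""
    px, py = points[i]
    v1 = (points[i - shift][0] - px, points[i - shift][1] - py)
    v2 = (points[i + shift][0] - px, points[i + shift][1] - py)
    cos_a = ((v1[0] * v2[0] + v1[1] * v2[1])
             / (math.hypot(*v1) * math.hypot(*v2)))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))

def is_curve(points, i, small=3, large=6, tol=10.0):
    """A corner between straight segments keeps a stable ray angle as the
    shift distance grows; on a curve the angle changes noticeably."""
    return abs(angle_at(points, i, small) - angle_at(points, i, large)) > tol
```

At the apex of an L the angle stays at 90 degrees for both shifts, whereas on a circular arc it opens up as the rays reach farther along the curve.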

Reading 14 : ShortStraw for Corner Detection

Citation:
Wolin, Aaron, Brian Eoff, and Tracy Hammond. "ShortStraw: A Simple and Effective Corner Finder for Polylines." SBM. 2008.

Publication Link: http://www.researchgate.net/profile/Tracy_Hammond/publication/220772398_ShortStraw_A_Simple_and_Effective_Corner_Finder_for_Polylines/links/0deec529f76e58d523000000.pdf


Summary:
This paper presents an alternative algorithm for corner detection, following earlier work by Sezgin and by Yu and Cai. The main phases of the algorithm are resampling, a bottom-up phase, and a top-down phase. The overall result was a massive improvement over the previous two techniques in both corner accuracy and all-or-nothing accuracy. The biggest drawback of the algorithm, however, is its failure to detect obtuse corners.

Discussion:
The main takeaways here are the three phases of the ShortStraw algorithm:

1. Resampling
The first step is to obtain a resampled representation of the stroke using a fixed interspacing distance.

2. Bottom-up

A straw for a point is computed as the distance between the endpoints of a window containing the point (the window generally extends three points on either side). Points with short straw lengths are marked as corners, using a threshold of (median straw length x 0.95).

3. Top-Down
(i) First, check whether every segment between two consecutive corners passes a line test (chord distance / path distance).

(ii) Removing false negatives: if any two consecutive corners fail the line test, it is assumed that there must be a missing corner approximately between them, and the threshold is relaxed to find it.

(iii) Removing false positives: check for collinearity among triplets of consecutive corners.
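The bottom-up phase translates almost directly into code; this sketch assumes the stroke has already been resampled at a uniform interspacing distance.

```python
import math

def straws(points, w=3):
    """Straw for point i: distance between points[i - w] and points[i + w]
    on the resampled stroke; short straws signal corners."""
    return [math.hypot(points[i + w][0] - points[i - w][0],
                       points[i + w][1] - points[i - w][1])
            for i in range(w, len(points) - w)]

def corner_candidates(points, w=3, factor=0.95):
    """Bottom-up pass: points whose straw falls below factor * median are
    candidates; each below-threshold run contributes its local minimum."""
    s = straws(points, w)
    threshold = factor * sorted(s)[len(s) // 2]
    corners, i = [], 0
    while i < len(s):
        if s[i] < threshold:
            j = i
            while i < len(s) and s[i] < threshold:
                if s[i] < s[j]:
                    j = i
                i += 1
            corners.append(j + w)  # convert straw index back to point index
        else:
            i += 1
    return corners
```

On a right-angled L resampled at unit spacing, the straw dips exactly at the apex and the bottom-up pass recovers that single corner.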

Reading 13 : Domain Independent Sketch Recognition

Citation:
Yu, Bo, and Shijie Cai. "A domain-independent system for sketch recognition." Proceedings of the 1st international conference on Computer graphics and interactive techniques in Australasia and South East Asia. ACM, 2003.
 
Publication Link : http://dl.acm.org/citation.cfm?id=604499
 
Summary:
The main approach used in this paper is to combine primitive shape recognition with early processing (as opposed to having a separate pre-processing stage, as in Sezgin's work); the vertex detection phase is an implicit part of stroke detection. The paper then explains its line-segment approximation and curve approximation algorithms, along with some post-processing features. The concept used for determining the fit of strokes against primitives is called 'feature area verification'.
 
Discussion:
The main takeaways from this paper are the approaches to detect lines, circles, arcs and curves.

Line Segment Approximation:
Try to fit a best-fit line and compute the feature area relative to this candidate line. The stroke area is taken to be the length of the stroke times its thickness. If the ratio of the feature area to the stroke area is less than 1, the stroke is recorded as a line.

Circle Approximation:
Check whether the direction graph of the stroke can be approximated by a line sweeping through 2π (the signature of a circle). If so, take a candidate circle centred at the centre of the stroke's bounding box, with radius equal to the mean distance between the centre and the stroke points. The feature-area test is then applied to fit the circle. If the direction graph extends beyond 2π, the stroke is checked for over-tracing and helical strokes.

Arc Approximation:
1. Check whether the direction graph can be approximated by a line.
2. Find the perpendicular bisector of the arc's chord, and find the circle on which the two endpoints and the bisector's intersection with the stroke lie.
3. Follow the same procedure as for circles.


Handling Self-Intersection Strokes:
Uses two heuristics and compares the results:
(i) Simpler is better
(ii) More specific is better

Finally, the paper also covers some of the post processing routines, including:
-Elimination of false and redundant elements introduced by noise
-Adjust layout of elements
-Basic object recognition

Saturday, September 26, 2015

Reading 12 : Early Processing for Sketch Understanding

Citation:

Sezgin, Tevfik Metin, and Randall Davis. "Early processing in sketch understanding." Unpublished Master’s Thesis, Massachusetts Institute of Technology (2001).

Publication Link


Summary:

The main aim of this paper is to handle the low-level stages of understanding a freehand sketch without restricting the user. It mainly deals with the use of curvature in the neighbourhood of a segment, as well as pen speed, to identify corners. It also describes the use of Bezier curves to model the curves detected in a sketch.

Discussion:

The three stages in the pre-processing of a stroke are :

1. Stroke Approximation

(i) Vertex Detection

When the stroke's direction is plotted against arc length, it is observed that maximum curvature and minimum pen speed occur at the vertices. Thresholds determined by average-based filtering are used to prevent the occurrence of too many false positives. A combination of speed and curvature data helps eliminate the shortcomings of either.
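A toy version of average-based filtering, assuming timestamped sample points; the rule of intersecting above-average-curvature points with below-average-speed points is illustrative rather than Sezgin's exact combination scheme:

```python
import numpy as np

def vertex_candidates(points, times):
    """Keep points whose curvature is above the stroke's mean curvature
    AND whose local speed is below the mean speed (average-based filtering).
    The intersection rule here is an illustrative simplification."""
    pts = np.asarray(points, dtype=float)
    t = np.asarray(times, dtype=float)
    seg = np.diff(pts, axis=0)
    direction = np.arctan2(seg[:, 1], seg[:, 0])       # direction along the stroke
    curvature = np.abs(np.diff(direction))             # change in direction
    # (angle wrap-around ignored for brevity)
    speed = np.linalg.norm(seg, axis=1) / np.diff(t)   # pen speed per segment
    curv_idx = {i + 1 for i in range(len(curvature)) if curvature[i] > curvature.mean()}
    speed_idx = {i for i in range(len(speed)) if speed[i] < speed.mean()}
    return sorted(curv_idx & speed_idx)
```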

Generating Hybrid Fits:
The initial hybrid fit is the intersection of Fd and Fs. The candidate list Fd is computed based on the scaled magnitude of curvature in a local neighborhood around each point. Successive hybrid fits are generated by performing a kind of merge sort on the remaining members of Fd and Fs, using the squared orthogonal distance. The final set of vertices is then selected by choosing the fewest vertices whose error is below a threshold.

(ii) Handling Curves
The ratio between arc length and Euclidean distance is used to detect curved regions. Curves are modeled using Bezier curves. But since fitting a Bezier curve exactly would require solving a 6th-degree polynomial equation, we resort to piecewise linear approximation with error metrics. If a curve still cannot be fit with minimal error, it is subdivided into smaller curves.


2. Beautification
Mainly consists of smoothing slopes using a clustering approach.

3. Object recognition is done using hand tailored templates that examine various simple properties.

The system was evaluated by users drawn from diverse backgrounds, who found the system impressive and highly accurate in terms of beautifying and understanding their diagrams.


Reading 11 : Combining geometric and gesture-based features

Citation:
Paulson, Brandon, et al. "What!?! no Rubine features?: using geometric-based features to produce normalized confidence values for sketch recognition." HCC Workshop: Sketch Tools for Diagramming. 2008.

Publication Link


Summary:
The main aim of this paper is to investigate a combination of geometric features (how the sketch looks) and gesture-based techniques to build a highly accurate classifier, and to produce normalized confidence values. The authors then proceed to establish how gesture-based features turn out to be insignificant when an optimal set of features is computed using a greedy algorithm to maximize accuracy.


Discussion:
Geometric-based recognizers focus on determining the error between a sketched shape and its ideal version using a series of geometric tests and formulas. A total of 44 features were selected for evaluation (31 geometric, 13 Rubine), and a dataset consisting of 90 samples each from 20 users was aggregated.

The relevant features are determined using a greedy, sequential forward selection (SFS) algorithm. Optimal features are the ones that yield the highest accuracy during the SFS process. Using a 50-50 split (same as PaleoSketch), the quadratic classifier was found to perform poorly when the entire set of features was used. When an optimal set of features (obtained through multi-fold validation) was used, the quadratic classifier was able to match PaleoSketch.
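The greedy SFS loop can be sketched generically; `accuracy_fn` is a hypothetical callback that trains and evaluates a classifier on a given subset of feature indices:

```python
def sequential_forward_selection(n_features, accuracy_fn, max_features=None):
    """Greedy SFS: start from the empty set, repeatedly add the single
    feature that most improves accuracy, stop when no feature helps.
    `accuracy_fn(subset)` is an assumed train-and-evaluate callback."""
    selected, best_acc = [], 0.0
    while max_features is None or len(selected) < max_features:
        best_feat, best_gain = None, 0.0
        for f in range(n_features):
            if f in selected:
                continue
            acc = accuracy_fn(selected + [f])
            if acc - best_acc > best_gain:
                best_feat, best_gain = f, acc - best_acc
        if best_feat is None:      # no remaining feature improves accuracy
            break
        selected.append(best_feat)
        best_acc += best_gain
    return selected, best_acc
```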

The advantages of this system over PaleoSketch are that it is easier to code and computationally faster, since it makes use of an optimal subset of features. Since the data is partitioned user-wise, the subset selection algorithm tends to bias towards user independence, and most gesture-based features were found to be insignificant.

Reading 10 : Visual Similarity of Pen Gestures

Citation:
Long Jr, A. Chris, et al. "Visual similarity of pen gestures." Proceedings of the SIGCHI conference on Human Factors in Computing Systems. ACM, 2000.



Summary:
This paper deals with an investigation into gesture similarity, and explains the methods used to come up with a set of the 22 best features to uniquely identify a gesture in feature space, making use of the notion of perceptual similarity.





Discussion:
Based on experiments, it was found that the logarithm of the quantitative feature-space distance correlates with perceived similarity. The study makes use of MDS to translate the perceived distances between gestures onto a multi-dimensional space.


What exactly is MDS?
MDS pictures the structure of a set of objects from data that approximate the distances between pairs of those objects. The data points are arranged in this space so that the distances between pairs of points have the strongest possible relation to the similarities among the pairs of objects. That is, two similar objects are represented by two points that are close together, and two dissimilar objects are represented by two points that are far apart. From a slightly more technical point of view, what MDS does is find a set of vectors in p-dimensional space such that the matrix of Euclidean distances among them corresponds as closely as possible to some function of the input matrix, according to a criterion function called stress.

Normally, MDS is used to provide a visual representation of a complex set of relationships that can be scanned at a glance. Since maps on paper are two-dimensional objects, this translates technically to finding an optimal configuration of points in 2-dimensional space. However, the best possible configuration in two dimensions may be a very poor, highly distorted, representation of your data. If so, this will be reflected in a high stress value. When this happens, you have two choices: you can either abandon MDS as a method of representing your data, or you can increase the number of dimensions.

Experiment 1 : 

Participants were presented with all possible triads of 14 gestures, and asked to identify the most different one in each triad. These responses were recorded and used to compute a distance matrix between gestures. The goals of this experiment were to identify the optimal number of features needed to represent a gesture, and to produce a model of gesture similarity.

Experiment 2:
Three more sets of 9 gestures each were added, and used to evaluate the relationship between independent features.

Bottom Line : This is a very poorly written paper, and I really need to read this again to actually understand it.


Reading 9 : Introduction to Gesture Recognition

Citation:
Chapter 2 : Gesture Recognition by Features - Tracy Hammond

Summary:
This chapter goes into the specifics behind each of Rubine's features, presents examples of gestures and their corresponding feature sets, and also gives a brief overview of Long's features. An important aspect of gesture recognition is that the path of a gesture must be the same for recognition to work.

Discussion:
In this paper, a stroke is defined as a set of <x,y,t> points.

Features 1 and 2: The cosine and sine of the starting angle (taken between the first and third points to avoid duplicate points due to the high sampling rate). We require both the sine and cosine because only together can they give an exact idea of the direction, since the angle alone is not a continuous feature at axis boundaries.

Features 3 and 4: Length and angle of the bounding box diagonal

Feature 5 : Start and Endpoint distance

Feature 6 and 7: Cosine and sine of angle between start point and end point

Feature 8 : Total Stroke Length

Feature 9: Total Rotation

Feature 10: Total Absolute Rotation

Feature 11 : Sharpness

Feature 12 : Maximum speed

Feature 13: Time taken

Long's features are discussed in the next paper.


Reading 8 : Rubine's Features

Citation:
Rubine, Dean. Specifying gestures by example. Vol. 25. No. 4. ACM, 1991.
Publication Link

Summary:
Rubine's features are often considered the pioneering work in gesture recognition. This paper explains the intuition behind these features, and the implementation of a toolkit called GRANDMA. The paper details the MVC architecture used in the toolkit, the statistical features of a gesture, how a classifier is trained, and also eager and multi-finger gestures.

Discussion:
The main takeaways from this paper are the thirteen features (which will be discussed in the next reading), and the approach to train a linear classifier using these features.

The main aim is to find the class c that maximizes the value v_c according to the relation v_c = w_c0 + Σᵢ w_ci fᵢ, where f_1 … f_F are the gesture's computed feature values.
The weights for each feature of each class are computed from the training examples: with f̄_ci the mean of feature i over class c's examples and Σ the covariance matrix averaged across classes, the weights are w_cj = Σᵢ (Σ⁻¹)_ij f̄_ci, and the constant term is w_c0 = -½ Σᵢ w_ci f̄_ci.
The paper also proceeds to define a probabilistic heuristic for rejection, and describes the Mahalanobis distance, which measures how many standard deviations away a gesture is from a given class.
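Evaluating the trained classifier is just a per-class dot product plus a constant; a minimal sketch, assuming the weights and per-class constants have already been trained:

```python
import numpy as np

def classify(feature_vec, weights, biases):
    """Pick the class c maximizing v_c = w_c0 + sum_i w_ci * f_i.
    `weights` is (classes, features), `biases` is (classes,).
    Returns (winning class index, all v_c values)."""
    f = np.asarray(feature_vec, dtype=float)
    v = np.asarray(biases, dtype=float) + np.asarray(weights, dtype=float) @ f
    return int(np.argmax(v)), v
```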





Reading 7 : $1 Recognizer

Citation:
Wobbrock, Jacob O., Andrew D. Wilson, and Yang Li. "Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes."Proceedings of the 20th annual ACM symposium on User interface software and technology. ACM, 2007.
Publication Link

Summary:
This paper presents an entirely new view on gesture recognition, and openly questions all the universally accepted methods up till now, pointing out that they are slow, complicated, inaccessible, and domain-restricted. A new technique is presented, and is evaluated against its counterparts to prove that it does equally well if not better. The paper also claims that the entire algorithm can be implemented in 100 lines of code, making it very easy for beginners to pick up and use in their work.

Discussion:
The aim of this paper is to present a simple yet effective gesture recognition algorithm that does not delve into pattern recognition machinery.

The 4-step algorithm used is:
1. Resample the point path (generally to 64 equidistant points)
2. Rotate once based on the indicative angle (the angle from the centroid to the first point)
3. Scale (non-uniformly, onto a square) and translate (to the origin)
4. Search for the optimal rotation angle giving the best score (the average distance between corresponding points in the candidate and the template being checked against)


To maintain rotation invariance, it was found that for similar gestures, the indicative angle was nearly optimal, and a simple hill-climbing algorithm from there would take us to the global minimum (the local-minima effect is low). However, this resulted in too many rotation steps in the case of dissimilar gestures. GSS (Golden Section Search), using the golden ratio, is used to amend this at the cost of some performance for similar gestures.
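Golden section search itself is generic; a sketch, assuming the scoring function is roughly unimodal over the angle bracket:

```python
import math

def golden_section_search(f, lo, hi, tol=1e-4):
    """Minimize a (assumed unimodal) function f over [lo, hi] by
    shrinking the bracket by the golden ratio each step."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0      # ~0.618
    x1 = hi - phi * (hi - lo)
    x2 = lo + phi * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:                          # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - phi * (hi - lo)
            f1 = f(x1)
        else:                                # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + phi * (hi - lo)
            f2 = f(x2)
    return (lo + hi) / 2.0
```

In the recognizer, `f` would be the path distance of the candidate against a template as a function of rotation angle, bracketed around the indicative-angle alignment.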


Limitations of $1 Recognizer:
- 1D figures
- Overdependent on non-uniform scaling
- Assumes rotation and translation invariance, and hence cannot capture specifics in these aspects
- Does not take time into account

The $1 recognizer was compared against DTW, which was inefficient and scaled poorly, as well as Rubine's method, which proved to require too many examples to reach acceptable accuracy. The $1 recognizer, on the other hand, was much faster and achieved high accuracy with minimal training.

The factors considered for experimentation are:
1. No. of templates for training
2. Gesture speed
3. Separation boundaries between competing templates


Saturday, September 19, 2015

Reading 6 : Using Entropy to distinguish between text and shapes

Citation:
Bhat, Akshay, and Tracy Hammond. "Using Entropy to Distinguish Shape Versus Text in Hand-Drawn Diagrams." IJCAI. Vol. 9. 2009.

Publication Link


Summary:
While a lot of systems exist that can recognize shapes and text effectively, very few can distinguish between the two in diagrams containing both, such as UML charts. The idea behind this paper is something that comes very intuitively when we look at letters of the alphabet: their strokes and points are much more random than in most shapes, which have some sort of order or sequence. This randomness is characterized mathematically using entropy, and is used to train classifiers that help determine thresholds for shapes and text. One important feature is the use of confidence values, which assert how confidently the classifier makes its decision.

Discussion:
Entropy is the expected value of information in a message. Ironically, the more random it is, the more information it is said to have.

The steps in this implementation are :


  1. Form stroke groups using spatial and temporal proximity
  2. Resample strokes to smooth angles and represent the diagram using equidistant points
  3. Take each window of three successive equidistant points, and map the angle between them to a corresponding entropy-alphabet symbol (basically, if the angle changes quickly, the assigned symbol keeps changing too)
  4. Calculate entropy using Shannon's formula
  5. Obtain thresholds for the decision
  6. The classifier can decide whether the diagram is text, shape, or unclassified using a threshold obtained from training sets
  7. When thresholding isn't enough and a decision has to be made, it can be made by specifying a confidence value
  8. The thresholds are obtained using a 10-fold validation process
  9. Notable results were the decrease in accuracy when the classifier was forced to classify a diagram into either category, and the good performance in new domains when using thresholds from other classifier methods
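Steps 3-4 above can be sketched as follows; the bin edges standing in for the entropy alphabet are illustrative assumptions, not the paper's values:

```python
import math
from collections import Counter

def angle_alphabet_entropy(angles, bins=(0.0, 0.5, 1.0, 2.0)):
    """Map each turn angle (radians) to an alphabet symbol by thresholding
    its magnitude, then compute the Shannon entropy of the symbol string."""
    symbols = []
    for a in angles:
        a = abs(a)
        symbols.append(sum(a >= edge for edge in bins))  # bin index = symbol
    counts = Counter(symbols)
    total = len(symbols)
    # Shannon's formula: H = -sum p * log2(p).
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A stroke with uniform turning (a smooth shape) collapses to one symbol and zero entropy; rapidly varying angles (text) spread across symbols and raise the entropy.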

Limitations:
Dashed Lines (because they are grouped as single stroke)
Dots filled by concentric shading (repeated circular strokes)
Resistor shapes (high degree of randomness)

Most importantly, how does this system deal with letters such as O, V, L, and T, which will inherently have a consistent sequence of entropy-alphabet symbols?


Reading 5 : Symbol Recognizer

Citation:
Kara, Levent Burak, and Thomas F. Stahovich. "An image-based, trainable symbol recognizer for hand-drawn sketches." Computers & Graphics 29.4 (2005): 501-517.
Publication Link

Summary:
This paper discusses a technique developed to match sketches to a standard set of defined templates that are learnt from single prototype examples. Such a recognition tool has a vast set of applications. The main contributions of this paper are the use of polar transforms for preprocessing and filtering, as well as the use of multiple classifiers to achieve high recognition accuracy. The results were evaluated through a series of tests, where it was found that high accuracy was achieved even with very few training examples.

Discussion:
The main takeaway from this paper are the stages involved in attaining a match for a sketch.

The stages are:

  • Preprocessing: Done by producing a rasterized image on a 48 x 48 grid to reduce the amount of data while retaining the characteristics of the image
  • Template matching: A featureless approach (using purely the geometry of the figure) to characterize similarity. It is done by computing the Hausdorff distance, the Tanimoto coefficient, and Yule's coefficient.
  • The Hausdorff distance is susceptible to outliers, hence we often choose the MHD (modified Hausdorff distance) or the partial (kth) Hausdorff distance.
  • There are Tanimoto coefficients defined to measure the coincidence of white pixels, as well as that of black pixels. They are combined using a constant which determines whether the figure is biased towards black or white pixels.
  • As figures always suffer from misalignments, a thresholded matching criterion is used that relaxes the condition for two pixels to be considered as overlapping.
  • In a distance transform, each pixel encodes its distance from the nearest black pixel (a kind of nearest-neighbour function). It allows offline computation during pre-processing, so that recognition-time processing is simply a lookup operation.
  • Classifiers are combined by eliminating the range difference using normalization and by standardizing the output.
  • Rotations are handled by conversion to polar coordinates. The weighted centroid is often chosen as the origin. A weighting function is used to damp the influence of data near the centroid, as it can be too sensitive to changes in the centroid. The MHD is then used to find the most similar alignments of the figure.
  • The polar transform phase also functions as a filtering phase prior to recognition, as it is immune to false negatives, and manages to remove 90% of possible definitions before sending the rest forth to the recognition stage.
  • The tests covered multiple definitions of symbols as well as user dependencies to unravel all possible performance constraints.
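The combined Tanimoto measure described above can be sketched as follows; the blending constant of 0.5 is an assumed neutral value, not one taken from the paper:

```python
import numpy as np

def tanimoto_similarity(img_a, img_b, alpha=0.5):
    """Blend of two Tanimoto coefficients over binary images: Tb measures
    coincidence of black (True) pixels, Tw of white pixels."""
    a = np.asarray(img_a, dtype=bool)
    b = np.asarray(img_b, dtype=bool)
    n = a.size
    nab = np.sum(a & b)                      # black in both images
    na, nb = np.sum(a), np.sum(b)
    denom_b = na + nb - nab                  # black in either image
    tb = nab / denom_b if denom_b else 1.0
    n00 = np.sum(~a & ~b)                    # white in both images
    denom_w = 2 * n - na - nb - n00          # white in either image
    tw = n00 / denom_w if denom_w else 1.0
    return alpha * tb + (1.0 - alpha) * tw   # bias constant alpha
```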
Limitations
  • While this system is rotation, translation and scale invariant, it degrades when non-uniform scaling is introduced, and is unable to tell apart similar figures that have slightly varying dimensions
  • The method of sampling tends to be lossy in cases where the figure has a lot of detail