Saturday, September 26, 2015

Reading 12 : Early Processing for Sketch Understanding

Citation:

Sezgin, Tevfik Metin, and Randall Davis. "Early processing in sketch understanding." Unpublished Master’s Thesis, Massachusetts Institute of Technology (2001).

Publication Link


Summary:

The main aim of this paper is to handle the low-level stages of understanding a freehand sketch without restricting how the user draws. It focuses on using curvature in the neighbourhood of a point, along with pen speed, to identify corners. It also describes the use of Bezier curves to model the curved regions detected in a sketch.

Discussion:

The three stages in the pre-processing of a stroke are:

1. Stroke Approximation

(i) Vertex Detection

When the direction of the stroke is plotted against arc length, maximum curvature and minimum pen speed are observed at the vertices. Thresholds determined by average-based filtering are used to prevent too many false positives. Combining the speed and curvature data helps eliminate the shortcomings of either one alone; a minimal sketch of this candidate detection is given below.
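To make this concrete, here is a minimal Python sketch of what average-based filtering for corner candidates might look like; the threshold factors (1.0x mean curvature, 0.9x mean speed) and the index bookkeeping are my own illustrative assumptions, not values from the paper.

```python
import numpy as np

def vertex_candidates(x, y, t):
    """Hedged sketch of average-based filtering for corner candidates.

    Returns Fd (indices of curvature maxima above a mean-based threshold)
    and Fs (indices of speed minima below a mean-based threshold). The
    threshold scale factors are illustrative assumptions, not the paper's."""
    x, y, t = map(np.asarray, (x, y, t))
    dx, dy, dt = np.diff(x), np.diff(y), np.diff(t)
    ds = np.hypot(dx, dy)                          # arc-length increments
    speed = ds / np.maximum(dt, 1e-9)              # pen speed per segment
    direction = np.unwrap(np.arctan2(dy, dx))      # stroke direction
    # curvature ~ change in direction per unit arc length (interior points)
    curvature = np.abs(np.diff(direction)) / np.maximum((ds[:-1] + ds[1:]) / 2, 1e-9)

    curv_thresh = 1.0 * curvature.mean()           # average-based thresholds
    speed_thresh = 0.9 * speed.mean()

    Fd = [i + 1 for i in range(1, len(curvature) - 1)
          if curvature[i] > curv_thresh
          and curvature[i] >= curvature[i - 1] and curvature[i] >= curvature[i + 1]]
    Fs = [i for i in range(1, len(speed) - 1)      # point indices approximate
          if speed[i] < speed_thresh
          and speed[i] <= speed[i - 1] and speed[i] <= speed[i + 1]]
    return Fd, Fs
```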

Generating Hybrid Fits:
The initial hybrid fit is the intersection of Fd and Fs. The candidate list Fd is computed from the scaled magnitude of curvature in a local neighbourhood around each point. Successive hybrid fits are generated by a merge-sort-like pass over the remaining members of Fd and Fs, ranked by the squared orthogonal distance error. The final set of vertices is then selected by choosing the fit with the fewest vertices whose error is still below a threshold. A rough sketch of this selection loop follows.
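Below is a rough Python sketch of how I understand the selection loop; the greedy "add the candidate that most reduces the error" strategy and the error function are my simplification of the paper's merge-style procedure.

```python
import numpy as np

def hybrid_fits(points, Fd, Fs, error_threshold):
    """Hedged sketch of hybrid fit generation (my reading of the paper,
    not a faithful reimplementation). Start from the intersection of the
    curvature (Fd) and speed (Fs) candidates, then greedily add the
    remaining candidate that most reduces the squared orthogonal error of
    the polyline fit, until the error drops below the threshold."""
    pts = np.asarray(points, dtype=float)

    def fit_error(vertex_idx):
        # squared orthogonal distance of every point to its polyline segment
        idx = sorted(set([0, len(pts) - 1]) | set(vertex_idx))
        err = 0.0
        for a, b in zip(idx[:-1], idx[1:]):
            p0, p1 = pts[a], pts[b]
            seg = p1 - p0
            seg_len2 = max(seg @ seg, 1e-12)
            for p in pts[a:b + 1]:
                s = np.clip((p - p0) @ seg / seg_len2, 0.0, 1.0)
                d = p - (p0 + s * seg)
                err += d @ d
        return err

    current = set(Fd) & set(Fs)
    remaining = (set(Fd) | set(Fs)) - current
    while fit_error(current) > error_threshold and remaining:
        best = min(remaining, key=lambda v: fit_error(current | {v}))
        current.add(best)
        remaining.remove(best)
    return sorted(current)
```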

(ii) Handling Curves
The ratio of arc length to the Euclidean distance between endpoints is used to decide whether a region is curved. Curved regions are represented with Bezier curves; but since fitting a Bezier curve exactly would require solving a sixth-degree polynomial equation, the fit is approximated using piecewise linear segments with error metrics. If a curve still cannot be fit with acceptable error, it is subdivided into smaller curves. A small sketch of the curvedness test is given below.
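A minimal illustration of the arc-length-to-chord test (the threshold value is a placeholder of mine, not from the paper):

```python
import numpy as np

def is_curved_region(points, ratio_threshold=1.2):
    """Hedged sketch: flag a region as curved when its arc length is
    noticeably longer than the straight-line distance between its
    endpoints. The 1.2 threshold is an illustrative assumption."""
    pts = np.asarray(points, dtype=float)
    arc_length = np.sum(np.hypot(*np.diff(pts, axis=0).T))
    chord = np.hypot(*(pts[-1] - pts[0]))
    return arc_length / max(chord, 1e-9) > ratio_threshold
```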


2. Beautification
Mainly consists of smoothing the slopes of line segments using a clustering approach.

3. Object recognition is done using hand-tailored templates that examine various simple properties.

The system was evaluated by users from diverse backgrounds, who found it impressive and highly accurate at beautifying and understanding their diagrams.


Reading 11 : Combining geometric and gesture-based features

Citation:
Paulson, Brandon, et al. "What!?! no Rubine features?: using geometric-based features to produce normalized confidence values for sketch recognition." HCC Workshop: Sketch Tools for Diagramming. 2008.

Publication Link


Summary:
The main aim of this paper is to investigate a combination of geometric features (how the shape looks) and gesture-based techniques to build a highly accurate classifier that produces normalized confidence values. The authors then establish that gesture-based features turn out to be largely insignificant once an optimal feature set is computed with a greedy algorithm that maximizes accuracy.


Discussion:
Geometry-based recognisers focus on determining the error between a sketched shape and its ideal version using a series of geometric tests and formulas. A total of 44 features were selected for evaluation (31 geometric, 13 Rubine's), and a dataset consisting of 90 samples each from 20 users was aggregated.

The relevant features are determined using a greedy sequential forward selection (SFS) algorithm: the optimal features are the ones that yield the highest accuracy during the SFS process (a minimal sketch appears below). Using a 50-50 split (the same as PaleoSketch), the quadratic classifier was found to perform poorly when the entire set of features was used. When an optimal set of features (obtained through multi-fold validation) was used, the quadratic classifier was able to match PaleoSketch.
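A minimal sketch of the greedy selection loop; the evaluate_accuracy callback is a hypothetical stand-in for training and scoring the classifier on a feature subset (e.g. via cross-validation).

```python
def sequential_forward_selection(features, evaluate_accuracy):
    """Hedged sketch of greedy sequential forward selection.

    `features` is a list of feature names; `evaluate_accuracy(subset)` is
    assumed to train and score a classifier on that subset. Both are
    placeholders, not the authors' code."""
    selected, best_acc = [], 0.0
    remaining = list(features)
    while remaining:
        # try adding each remaining feature and keep the best one
        candidate = max(remaining,
                        key=lambda f: evaluate_accuracy(selected + [f]))
        acc = evaluate_accuracy(selected + [candidate])
        if acc <= best_acc:          # stop when accuracy no longer improves
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_acc = acc
    return selected, best_acc
```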

The advantages of this system over PaleoSketch are that it is easier to code and computationally faster, since it makes use of an optimal subset of features. Because the data is partitioned user-wise, the subset selection algorithm tends to bias towards user independence, and most gesture-based features were found to be insignificant.

Reading 10 : Visual Similarity of Pen Gestures

Citation:
Long Jr, A. Chris, et al. "Visual similarity of pen gestures." Proceedings of the SIGCHI conference on Human Factors in Computing Systems. ACM, 2000.



Summary:
This paper investigates gesture similarity and explains the methods used to arrive at a set of the 22 best features for placing a gesture in feature space, making use of the notion of perceptual similarity.





Discussion:
Based on experiments, it was found that the logarithm of the quantitative feature differences correlates with perceived similarity. The paper makes use of MDS (multidimensional scaling) to translate the perceived distances between gestures into a multi-dimensional space.


What exactly is MDS?
MDS pictures the structure of a set of objects from data that approximate the distances between pairs of the objects. The data points are arranged in this space so that the distances between pairs of points have the strongest possible relation to the similarities among the pairs of objects. That is, two similar objects are represented by two points that are close together, and two dissimilar objects are represented by two points that are far apart. From a slightly more technical point of view, what MDS does is find a set of vectors in p-dimensional space such that the matrix of Euclidean distances among them corresponds as closely as possible to some function of the input matrix according to a criterion function called stress.

Normally, MDS is used to provide a visual representation of a complex set of relationships that can be scanned at a glance. Since maps on paper are two-dimensional objects, this translates technically to finding an optimal configuration of points in 2-dimensional space. However, the best possible configuration in two dimensions may be a very poor, highly distorted, representation of your data. If so, this will be reflected in a high stress value. When this happens, you have two choices: you can either abandon MDS as a method of representing your data, or you can increase the number of dimensions.
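A tiny illustration of MDS on a precomputed dissimilarity matrix, using scikit-learn (my choice of library for the example, not necessarily what the authors used); the matrix would come from the triad judgements described in Experiment 1 below.

```python
import numpy as np
from sklearn.manifold import MDS

# Toy gesture-to-gesture dissimilarities (placeholder values)
dissim = np.array([[0.0, 2.0, 5.0],
                   [2.0, 0.0, 4.0],
                   [5.0, 4.0, 0.0]])

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)     # 2-D layout of the three gestures
print(coords)
print("stress:", mds.stress_)          # high stress => need more dimensions
```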

Experiment 1 : 

Participants were presented with all possible triads of 14 gestures and asked to identify the most different one in each triad. These responses were recorded and used to compute a distance matrix between gestures. The goals of this experiment were to identify the optimal number of features needed to represent the gestures, and to produce a model of gesture similarity.

Experiment 2:
Three more sets of 9 gestures each were added and used to evaluate the relationship between independent features.

Bottom Line : This is a very poorly written paper, and I really need to read this again to actually understand it.


Reading 9 : Introduction to Gesture Recognition

Citation:
Chapter 2 : Gesture Recognition by Features - Tracy Hammond

Summary:
This chapter goes into the specifics behind each of Rubine's features, presents examples of gestures and their corresponding feature sets, and gives a brief overview of Long's features. An important aspect of gesture recognition is that the path of a gesture must be drawn the same way for recognition to work.

Discussion:
In this chapter, a stroke is defined as a sequence of <x, y, t> points; a minimal sketch computing several of the features below appears after the list.

Features 1 and 2: The cosine and sine of the initial angle, taken between the first and third points (rather than the first two, to avoid nearly coincident points caused by the high sampling rate). We need both the cosine and the sine because only together do they determine the direction unambiguously; the angle itself is not a continuous feature across axis boundaries.

Features 3 and 4: Length and angle of the bounding box diagonal

Feature 5 : Start and Endpoint distance

Feature 6 and 7: Cosine and sine of angle between start point and end point

Feature 8 : Total Stroke Length

Feature 9: Total Rotation

Feature 10: Total Absolute Rotation

Feature 11 : Sharpness

Feature 12 : Maximum speed

Feature 13: Time taken
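To ground these, here is a minimal Python sketch computing a subset of the features above from an <x, y, t> stroke; the formulas follow my reading of Rubine's definitions, and the small guards against division by zero are my own.

```python
import math

def rubine_features(stroke):
    """stroke: list of (x, y, t) tuples. Returns a dict with a subset of
    Rubine's features (numbering as in the chapter)."""
    xs = [p[0] for p in stroke]
    ys = [p[1] for p in stroke]
    ts = [p[2] for p in stroke]
    f = {}

    # F1, F2: cosine and sine of the initial angle (first to third point)
    dx0, dy0 = xs[2] - xs[0], ys[2] - ys[0]
    d0 = math.hypot(dx0, dy0) or 1e-9
    f[1], f[2] = dx0 / d0, dy0 / d0

    # F3, F4: length and angle of the bounding-box diagonal
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    f[3] = math.hypot(w, h)
    f[4] = math.atan2(h, w or 1e-9)

    # F5: distance between first and last points
    f[5] = math.hypot(xs[-1] - xs[0], ys[-1] - ys[0])

    # F6, F7: cosine and sine of the angle between first and last points
    d5 = f[5] or 1e-9
    f[6] = (xs[-1] - xs[0]) / d5
    f[7] = (ys[-1] - ys[0]) / d5

    # F8: total stroke length; F9-F11: total, absolute and squared rotation
    f[8] = f[9] = f[10] = f[11] = 0.0
    prev_angle = None
    for i in range(1, len(stroke)):
        dx, dy = xs[i] - xs[i - 1], ys[i] - ys[i - 1]
        f[8] += math.hypot(dx, dy)
        angle = math.atan2(dy, dx)
        if prev_angle is not None:
            dtheta = math.atan2(math.sin(angle - prev_angle),
                                math.cos(angle - prev_angle))
            f[9] += dtheta          # total rotation
            f[10] += abs(dtheta)    # total absolute rotation
            f[11] += dtheta ** 2    # sharpness
        prev_angle = angle

    # F12: maximum (squared) speed; F13: total duration
    f[12] = max(((xs[i] - xs[i - 1]) ** 2 + (ys[i] - ys[i - 1]) ** 2)
                / max(ts[i] - ts[i - 1], 1e-9) ** 2
                for i in range(1, len(stroke)))
    f[13] = ts[-1] - ts[0]
    return f
```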

Long's features are discussed in the next paper.


Reading 8 : Rubine's Features

Citation:
Rubine, Dean. "Specifying gestures by example." ACM SIGGRAPH Computer Graphics 25.4 (1991): 329-337.
Publication Link

Summary:
Rubine's features are often considered the pioneering work in gesture recognition. This paper explains the intuition behind these features and the implementation of a toolkit called GRANDMA. The paper details the MVC architecture used in the toolkit, the statistical features of a gesture, how a classifier is trained, and also eager recognition and multi-finger gestures.

Discussion:
The main takeaways from this paper are the thirteen features (which are discussed in the next reading) and the approach of training a linear classifier over these features.

The main aim is to find the class that maximizes the value of v according to the following relation:
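(The equation was an image in the original post and did not survive; reconstructing it from Rubine's paper, the evaluation of a gesture with features f_1 ... f_F against class c is:)

```latex
v_{\hat{c}} \;=\; w_{\hat{c}\,0} \;+\; \sum_{i=1}^{F} w_{\hat{c}\,i}\, f_i,
\qquad \text{classify as } \arg\max_{\hat{c}} v_{\hat{c}}
```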


The computation of the weights for each feature of each class is done with the following relations:
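(Again reconstructing the missing image from Rubine's paper: with \bar{f}_{\hat{c}i} the mean value of feature i over the examples of class \hat{c}, and \Sigma the averaged covariance matrix, the weights are:)

```latex
w_{\hat{c}\,j} = \sum_{i=1}^{F} \left(\Sigma^{-1}\right)_{ij}\, \bar{f}_{\hat{c}\,i},
\qquad
w_{\hat{c}\,0} = -\tfrac{1}{2} \sum_{i=1}^{F} w_{\hat{c}\,i}\, \bar{f}_{\hat{c}\,i}
```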


The paper also defines a probabilistic heuristic for rejection and describes the Mahalanobis distance, which measures how many standard deviations a gesture is away from the mean of a given class.





Reading 7 : $1 Recognizer

Citation:
Wobbrock, Jacob O., Andrew D. Wilson, and Yang Li. "Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes."Proceedings of the 20th annual ACM symposium on User interface software and technology. ACM, 2007.
Publication Link

Summary:
This paper presents an entirely new view on gesture recognition and openly questions the universally accepted methods used up to that point, pointing out that they are slow, complicated, inaccessible and domain-restricted. A new technique is presented and evaluated against its counterparts to show that it does equally well, if not better. The paper also claims that the entire algorithm can be implemented in about 100 lines of code, making it very easy for beginners to pick up and use in their work.

Discussion:
The aim of this paper is to present a simple yet effective gesture recognition algorithm that does not require heavy pattern recognition machinery.

The 4-step algorithm used is:
1. Resample the point path (typically to 64 equidistant points)
2. Rotate once based on the indicative angle (the angle between the centroid and the first point)
3. Scale (non-uniformly, onto a square) and translate (to the origin)
4. Find the optimal angle for the best score (the average distance between corresponding points in the candidate and the template being checked against)


To maintain rotation invariance, it was found that for similar gestures the indicative angle is already close to optimal, and a simple hill-climbing search from there reaches the global minimum (the local-minima effect is small). However, this resulted in too many rotation steps for dissimilar gestures. GSS (Golden Section Search), which uses the golden ratio, is used to bound the number of steps, at the cost of some performance for similar gestures. A condensed sketch of the whole pipeline follows.
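Here is a condensed Python sketch of the four steps plus the golden-section angle search; it follows the published pseudocode only loosely, and the constants (64 points, a 250-unit square, a +/-45 degree search window, 2 degree tolerance) are commonly quoted defaults rather than values I verified line by line.

```python
import math

N = 64                                  # resample count
PHI = 0.5 * (-1 + math.sqrt(5))         # golden ratio constant used by GSS

def resample(points, n=N):
    """Resample a stroke into n equidistantly spaced points."""
    pts = list(points)
    interval = (sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))
                / (n - 1)) or 1e-9
    d, new_pts, i = 0.0, [pts[0]], 1
    while i < len(pts):
        seg = math.dist(pts[i - 1], pts[i])
        if seg > 0 and d + seg >= interval:
            t = (interval - d) / seg
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            new_pts.append(q)
            pts.insert(i, q)            # q becomes the next segment's start
            d = 0.0
        else:
            d += seg
        i += 1
    while len(new_pts) < n:             # guard against rounding shortfall
        new_pts.append(pts[-1])
    return new_pts

def centroid(pts):
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

def rotate_by(pts, angle):
    cx, cy = centroid(pts)
    c, s = math.cos(angle), math.sin(angle)
    return [((x - cx) * c - (y - cy) * s + cx,
             (x - cx) * s + (y - cy) * c + cy) for x, y in pts]

def normalize(points, size=250.0):
    """Resample, rotate to the indicative angle, scale to a square, translate."""
    pts = resample(points)
    cx, cy = centroid(pts)
    pts = rotate_by(pts, -math.atan2(pts[0][1] - cy, pts[0][0] - cx))
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    w, h = (max(xs) - min(xs)) or 1e-9, (max(ys) - min(ys)) or 1e-9
    pts = [((x - min(xs)) * size / w, (y - min(ys)) * size / h) for x, y in pts]
    cx, cy = centroid(pts)
    return [(x - cx, y - cy) for x, y in pts]

def path_distance(a, b):
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def distance_at_best_angle(cand, templ, a=-math.radians(45),
                           b=math.radians(45), tol=math.radians(2)):
    """Golden Section Search over the rotation angle."""
    x1, x2 = PHI * a + (1 - PHI) * b, (1 - PHI) * a + PHI * b
    f1 = path_distance(rotate_by(cand, x1), templ)
    f2 = path_distance(rotate_by(cand, x2), templ)
    while abs(b - a) > tol:
        if f1 < f2:
            b, x2, f2 = x2, x1, f1
            x1 = PHI * a + (1 - PHI) * b
            f1 = path_distance(rotate_by(cand, x1), templ)
        else:
            a, x1, f1 = x1, x2, f2
            x2 = (1 - PHI) * a + PHI * b
            f2 = path_distance(rotate_by(cand, x2), templ)
    return min(f1, f2)

def recognize(candidate, templates):
    """templates: dict name -> already-normalized point list."""
    cand = normalize(candidate)
    return min(templates, key=lambda n: distance_at_best_angle(cand, templates[n]))
```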


Limitations of $1 Recognizer:
- 1D figures
- Overdependent on non-uniform scaling
- Assumes rotation and translation invariance, and hence cannot capture specifics in these aspects
- Does not take time into account

The $1 recognizer was compared against DTW (dynamic time warping), which was inefficient and scaled poorly, as well as Rubine's method, which required too many examples to reach acceptable accuracy. The $1 recognizer, on the other hand, was much faster and achieved high accuracy with minimal training.

The factors considered for experimentation are:
1. No. of templates for training
2. Gesture speed
3. Separation boundaries between competing templates


Saturday, September 19, 2015

Reading 6 : Using Entropy to distinguish between text and shapes

Citation:
Bhat, Akshay, and Tracy Hammond. "Using Entropy to Distinguish Shape Versus Text in Hand-Drawn Diagrams." IJCAI. Vol. 9. 2009.

Publication Link


Summary:
While many systems exist that can recognize shapes and text effectively, very few can distinguish between the two when a diagram contains both, such as a UML chart. The idea behind this paper is something that comes very intuitively when we look at handwritten letters: their strokes and points are much more random than most shapes, which have some sort of order or sequence. This randomness is characterized mathematically using entropy, and is used to train classifiers that determine thresholds for shapes and text. One important feature is the use of confidence values, which express how confident the classifier is in its decision.

Discussion:
Entropy is the expected value of information in a message. Ironically, the more random it is, the more information it is said to have.

The steps in this implementation are :


  1. Form stroke groups using spatial and temporal proximity
  2. Resample strokes to smooth the angles and represent the diagram using equidistant points
  3. For every three successive equidistant points, map the angle formed at the middle point to a symbol of the entropy alphabet. (Basically, if the angle changes quickly, the assigned symbol keeps changing too.)
  4. Calculate the entropy using Shannon's formula (see the sketch after this list)
  5. Obtain thresholds for the decision
  6. The classifier decides whether the diagram is Text, Shape or Unclassified using a threshold obtained from training sets.
  7. When thresholding isn't enough and a decision has to be made, it can be forced by specifying a confidence value.
  8. The thresholds are obtained using a 10-fold validation process.
  9. Some of the notable results were the decrease in accuracy when the classifier was forced to classify a diagram into either category, and the good performance in new domains when using thresholds from other classifier methods.
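A hedged sketch of steps 3-4; the way angles are binned into symbols and the size of the alphabet are illustrative choices of mine, not the exact ones from the paper.

```python
import math
from collections import Counter

def angle_symbols(points, n_bins=7):
    """Map the turning angle at each interior point to one of n_bins symbols."""
    symbols = []
    for i in range(1, len(points) - 1):
        (x0, y0), (x1, y1), (x2, y2) = points[i - 1], points[i], points[i + 1]
        a1 = math.atan2(y1 - y0, x1 - x0)
        a2 = math.atan2(y2 - y1, x2 - x1)
        turn = abs(math.atan2(math.sin(a2 - a1), math.cos(a2 - a1)))  # 0..pi
        symbols.append(int(turn / math.pi * (n_bins - 1e-9)))
    return symbols

def shannon_entropy(symbols):
    """Shannon entropy H = -sum(p * log2 p) of the symbol distribution."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Text-like stroke groups tend to yield higher entropy than neat shapes:
# is_text = shannon_entropy(angle_symbols(resampled_points)) > threshold
```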

Limitations:
Dashed Lines (because they are grouped as single stroke)
Dots filled by concentric shading (repeated circular strokes)
Resistor shapes (high degree of randomness)

Most importantly, how does this system deal with letters such as O, V, L and T, which will inherently produce a consistent, low-entropy sequence of symbols?


Reading 5 : Symbol Recognizer

Citation:
Kara, Levent Burak, and Thomas F. Stahovich. "An image-based, trainable symbol recognizer for hand-drawn sketches." Computers & Graphics 29.4 (2005): 501-517.
Publication Link

Summary:
This paper discusses a technique developed to match sketches against a standard set of defined templates that are learnt from single prototype examples. Such a recognition tool has a vast set of applications. The main contributions of this paper are the use of polar transforms for preprocessing and filtering, as well as the use of multiple classifiers to achieve high recognition accuracy. The results were evaluated with a series of tests, where it was found that high accuracy was achieved even with very few training examples.

Discussion:
The main takeaway from this paper are the stages involved in attaining a match for a sketch.

The stages are:

  • Preprocessing: Done by producing a rasterized image on a 48 x 48 grid, to reduce the amount of data while retaining the characteristics of the image
  • Template-Matching: A featureless approach (using purely the geometry of the figure) to characterize similarity. It is done by computing the Hausdorff distance, the Tanimoto coefficient and Yule's coefficient (a sketch of two of these follows the list).
  • The Hausdorff distance is susceptible to outliers, hence the modified (MHD) or partial (kth) Hausdorff distance is often chosen.
  • There are Tanimoto coefficients defined to measure the coincidence of white pixels, as well as that of black pixels. They are combined using a constant which determines whether the figure is biased towards black or white pixels.
  • As figures always suffer from misalignments, a thresholded matching criterion is used that relaxes the condition for two pixels to be considered overlapping.
  • In a distance transform, each pixel encodes its distance from the nearest black pixel (a kind of nearest-neighbour function). This allows the expensive computation to happen offline during pre-processing, so that matching becomes a simple lookup operation.
  • Classifiers are combined by eliminating the range difference using normalization and by standardizing the output.
  • Rotations are handled by conversion to polar coordinates. The weighted centroid is often chosen as the origin. A weighting function is used to influence data near the centroid as it can be too sensitive to changes in the centroid. The MHD is then used to find the most similar alignments of the figure.
  • The polar transform phase also functions as a filtering phase prior to recognition, as it is immune to false negatives, and manages to remove 90% of possible definitions before sending it forth to the processing stage.
  • The tests covered multiple definitions of symbols as well as user dependencies, to unravel all possible performance constraints.
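A hedged sketch of two of the similarity measures over binary images represented as sets of black-pixel coordinates; the combination constants and the thresholded-overlap refinement mentioned above are omitted.

```python
import math

def modified_hausdorff(A, B):
    """Modified Hausdorff distance between two sets of (x, y) points:
    the larger of the two mean nearest-neighbour distances."""
    def mean_nn(P, Q):
        return sum(min(math.dist(p, q) for q in Q) for p in P) / len(P)
    return max(mean_nn(A, B), mean_nn(B, A))

def tanimoto_black(A, B):
    """Tanimoto coefficient over black pixels: shared pixels divided by
    the total number of distinct pixels in either image."""
    A, B = set(A), set(B)
    return len(A & B) / max(len(A | B), 1)

# A lower MHD and a higher Tanimoto coefficient both indicate a better match.
```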
Limitations
  • While this system is rotation, translation and scale invariant, it degrades when non-uniform scaling is introduced, and it is unable to tell apart similar figures that differ only slightly in their dimensions
  • The method of sampling tends to be lossy in some cases where the figure has a lot of detail



Sunday, September 13, 2015

Reading 4 : K-Sketch

Citation:
K-sketch: A kinetic sketch pad for novice animators
RC Davis, B Colwell, JA Landay
Proceedings of the SIGCHI Conference on Human Factors in Computing System
Publication Link

Summary:
This paper discusses a scratchpad, K-Sketch that has been developed keeping in mind the primary needs of animators, while maintaining a simplistic interface.  The usage scenarios for this application were compiled after a series of interviews with both professional animators and amateurs.  The paper then discusses how the final set of possible animations was arrived at by modelling the supported animations against the number of steps and percentage of scenarios that were covered in that many steps, as an optimization problem. It then details the functionalities available in K-Sketch, and its trial run with users coming from a variety of backgrounds. Finally, the implications, results and comparisons of K-Sketch with conventional animation tools are discussed.

Discussion:
The main takeaways from this paper are:

  • Surveying the needs of target users is always the first step of building any user interface.
  • There is a trade-off between allowing a large variety of functions and minimizing the number of steps.
  • A smaller learning curve is often preferred over perfection when learning something new or building informal mock-ups.
  • Some of the core functionalities include coordinated motions, overwriting and adding motion, ghosts, copy pasting and recording. 



Sunday, September 6, 2015

Reading 3 : iCanDraw?

Citation:
iCanDraw: using sketch recognition and corrective feedback to assist a user in drawing human faces
D Dixon, M Prasad, T Hammond
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Publication Link

Summary:
This paper describes the second version of iCanDraw, which is essentially software to help users learn to draw faces with step-wise guidance and live feedback. It discusses the amendments made to the previous version, which had failed to yield a good experience or proper learning. At a top level, the main aim of this research is to help "left-brained" people (poor artists who draw from a symbolic representation) make a gradual transition to the R-mode (i.e., the "right-brained" mode, in which one perceives facial features and represents the drawing more realistically). The paper first discusses the user interface and the options available to the user, then the corrective feedback mechanisms, the evaluation metrics, and the techniques that actually compare a drawing to the reference image by extracting a face template from both.


Discussion:
The first iteration of iCanDraw highlighted conceptual mistakes in the program design, such as over-reliance on 'prep-sketching' and 'overtracing' to detect the user's intentions, which did not work well for beginners. The visual feedback was also not intuitive enough.

The following are some of the aspects of the user interface :

  • Drawing area
  • Drawing instructions
  • Reference Image - It is manipulated at each step to assist in the R-Mode shift
  • Corrective Feedback

Next, we discuss the implementation details outlined in the paper.
  • Pre-processing reference imagery: A face recognition library is used to extract 40 points, and a few more are manually added to obtain 53 points.
  • Setting the example template: The example template is centered and rescaled until the head is drawn.
  • Processing of a stroke: Done using PaleoSketch.
  • New strokes are analyzed and classified according to their position relative to the facial features expected in that step, using a k-nearest-neighbours algorithm (k = 3); a small sketch of this assignment follows the list. All strokes drawn outside the example template are treated as ignore spaces.


  • Determining correctness of the drawing: This is done using a sliding window that compares dissimilarity with the template. The average of all these window distances is used to isolate windows whose deviation is larger than an acceptable number of standard deviations.
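For illustration, a hedged sketch of the k-NN assignment idea from the stroke-processing step above; the feature labels and the point-wise voting are my own guesses at the details, not taken from the paper.

```python
import math
from collections import Counter

def classify_stroke(stroke_points, template_points, k=3):
    """Assign a drawn stroke to a facial feature by voting among the
    k nearest labelled template points for each stroke point.

    template_points: list of ((x, y), label) pairs, e.g. label in
    {"left_eye", "nose", "mouth", ...} -- labels are illustrative."""
    votes = Counter()
    for p in stroke_points:
        nearest = sorted(template_points, key=lambda tp: math.dist(p, tp[0]))[:k]
        votes.update(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```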

The paper also discusses a few principles that drawing through sketch recognition needs to follow:
  1. Master template must be accurate
  2. Application should complement the R-mode shift
  3. Feedback should not be intrusive towards the creative process
  4. Feedback should direct user towards final outcome
  5. Erased strokes should temporarily remain visible
  6. The user should be able to override the application's suggestions
  7. The sketch recognition algorithm should be adaptive
  8. Support within the drawing area
  9. Support artistic techniques such as shading








Friday, September 4, 2015

Reading 2 : Sketch Recognition Algorithms for Comparing Complex and Unpredictable Shapes

Citation:
Sketch recognition algorithms for comparing complex and unpredictable shapes
M Field, S Valentine, J Linsey, T Hammond
IJCAI, 2436-2


Summary:
This paper describes the methods by which unusually shaped bodies are identified and compared. The identification is essentially done by constantly trying to combine strokes to form larger shapes. The rest of the paper briefly covers the truss recognition and comparison techniques that were discussed in Reading 1. This support for unspecified diagram types makes the software very flexible, able to adapt to any new type of diagram an instructor may want to assign questions on.


Discussion:
One of the main ideas used in this paper is that the computation is done online. This basically means that every time a new stroke is made, only combinations including the new stroke need to be analyzed, as combinations excluding it are assumed to have already been evaluated.

The main concepts discussed here are on identifying and comparing bodies of complex/undefined shapes:

1. Body Identification
This is done by first checking for a closed shape: an arbitrary endpoint is selected, and we traverse from each stroke endpoint to the closest endpoint of another stroke until we reach the second endpoint of the last segment. This point should be close to the point first selected; "close enough" is defined as within 9% of the total path length of the strokes. A small sketch of this chaining test is given below.
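A rough Python sketch of the endpoint-chaining idea; representing strokes as point lists and the greedy nearest-endpoint choice are my simplifications.

```python
import math

def forms_closed_body(strokes, tolerance_fraction=0.09):
    """strokes: list of point lists. Chain strokes by always jumping to the
    nearest unused endpoint; the body is closed if the chain returns close
    to the starting point (within 9% of the total path length)."""
    total_length = sum(math.dist(s[i - 1], s[i])
                       for s in strokes for i in range(1, len(s)))
    start = strokes[0][0]
    current = strokes[0][-1]               # walk along the first stroke
    unused = list(range(1, len(strokes)))
    while unused:
        # pick the unused stroke whose nearer endpoint is closest to `current`
        def gap(idx):
            s = strokes[idx]
            return min(math.dist(current, s[0]), math.dist(current, s[-1]))
        nxt = min(unused, key=gap)
        s = strokes[nxt]
        # exit the chosen stroke from its far end
        current = s[-1] if math.dist(current, s[0]) <= math.dist(current, s[-1]) else s[0]
        unused.remove(nxt)
    return math.dist(current, start) <= tolerance_fraction * total_length
```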

2. Body Comparison
This step uses a modified version of the template comparison algorithms of Kara and Stahovich (2005), but uses stroke points instead of rasterizing an image. The steps followed here are:
(i) Resample the shape to 64 evenly spaced points (to remove the effect of stroke speed)
(ii) Scale the shapes uniformly to bring them into a common bounding box for comparison
(iii) Calculate three measures: the Hausdorff distance, the modified Hausdorff distance, and the Tanimoto coefficient (note that n_ab is different from n_ba)
(iv) Use a combination of the three metrics for comparison, with 0.65 as the acceptance threshold for deciding whether the student's sketch matches the instructor's






Reading 1 : Mechanix and Free-Body Diagrams

Citation:
S Valentine, F Vides, G Lucchese, D Turner, H Kim… - AI Magazine, 2012

Summary:
Scalar mechanics, beam stress analysis, etc. are some of the important techniques that every civil and mechanical engineer needs to be well versed in. This paper discusses Mechanix, a software tool that tries to teach these skills to students by allowing them to hand-draw figures, analyzing the strokes, and providing intuitive annotations and feedback to help students draw a diagram correctly and understand the underlying concepts. The paper first discusses past work in this area, all of which provided only partial solutions or did not give students the liberty to hand-draw diagrams. Mechanix blends the traditional and authentic practice of hand-drawing diagrams with a touch of artificial intelligence to create a rich and visually enriching learning experience for students. The authors then document a myriad of interaction methods and geometric recognition techniques that have been used in Mechanix for features such as truss recognition, answer correction, scribbling, etc. They also discuss the overall distributed architecture of the system, and how students' solutions are evaluated securely at a server against the predefined sketches and equations designed by the instructor.


Discussion:
As I have used Mechanix before and am familiar with the user experience, the main takeaways from this paper are two things: how Mechanix is better than its predecessors, and what algorithms Mechanix uses.

Prior to Mechanix:
1. Sketch-Worksheets: Generates raw facts about sketches, doesn't actually understand diagrams. Heavily dependent on the instructor to filter these facts.
2. Wintruss/Bridge Architect: No hand-drawn input, and offer only partially completed solutions
3. Andes Physics/Free Body Diagram Assistant: Pick and use shapes, no actual drawing, which is bad in the long run.
4. Newton's Pen: Forcing drawing in a very specific order.


Algorithms that power the various features of Mechanix:

1. Scribbling
Scribbles are understood as combinations of strokes made within a small time interval. The algorithm also uses the region covered by the scribble to decide whether to erase a single stroke, a partial shape, or a full shape.

2. Steps in Geometric Recognition: This phase uses a kind of agglomerative clustering algorithm, whereby clusters of shapes are formed based on some pre-understood domain knowledge on the various possible complex shapes, and this goes on until no more clusters/groups of shapes can be merged.

(i) Stroke Segmentation: Cusp Detector (Wolin 2010)
(ii) Primitive Shape : PaleoSketch (Paulson and Hammond 2008)
(iii) Adding primitive shapes to collection of shapes.
(iv) Analyze various shape grouping using high level recognizers
(v) Replace low level shapes with high level shapes, and return to step (iii)


3. Truss Recognition
Trusses are recognized primarily by detecting shared edges. To find out if two polygons share a particular edge AB, we remove that edge from the graph formed by the polygons' vertices and edges, and use BFS to check whether there is still a way to reach B from A. A small sketch is given below.
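A small sketch of the reachability test (the graph representation is my own choice):

```python
from collections import deque

def still_connected_without_edge(edges, a, b):
    """edges: iterable of (u, v) vertex pairs. Remove edge (a, b) and run a
    BFS from a to see whether b is still reachable."""
    adj = {}
    for u, v in edges:
        if {u, v} == {a, b}:
            continue                      # drop the edge being tested
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return True
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return False
```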

4. Answer Checking
(i) Checks the truss configuration by representing the completed sketch as a graph and comparing it with the instructor's solution using graph isomorphism (a tiny illustration follows this list)
(ii) Checks the axes and the presence of forces and their directions
(iii) Checks the force values
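For illustration only, a tiny isomorphism check using networkx (my choice of library, not necessarily what Mechanix uses):

```python
import networkx as nx

# Toy example: a student truss and an instructor truss with relabelled nodes
student = nx.Graph([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "B")])
instructor = nx.Graph([(1, 2), (2, 3), (3, 1), (3, 4), (4, 2)])

print(nx.is_isomorphic(student, instructor))   # True: same truss topology
```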


5. Generic Closed-Shape Comparison Technique
Makes use of three similarity measures: the Hausdorff distance, the modified Hausdorff distance, and the Tanimoto coefficient. These gauge geometric similarity as well as the ratio of overlapping points between the student's solution and the instructor's solution.

6. Creative Response
Uses the constraints provided by instructor to frame its own set of solutions against which the student's work is compared.

7. Distributed Architecture
The student's workspace is transferred in the form of an XML document to a session at a server running Mechanix. Load balancing is done using HAProxy.




Mechanix: A Run Through

Our first assignment in Sketch Recognition was to go through a software tool called Mechanix, which has been designed for high school and engineering students to work on solid mechanics problems effectively. It has a lucid user interface that provides a rich visual experience, and is powered by an AI core that is able to understand sketches and shapes drawn by the student, and also compare them with instructor-provided templates.

Although the software on a whole was really impressive, for someone like me with butterfingers, it was fairly difficult to get through all the problems, as I found it hard to even make lines intersect correctly! When I told Dr. Tracy Hammond about this, she suggested that I can possibly work on algorithms that makes Mechanix more friendly towards poor artists! That is probably an excellent direction in the future, to help artistically deprived people like me!

Some thoughts on Mechanix:

1. Professors can view the students' submissions. What if a student has been trying to submit an imperfect diagram for a long time, which Mechanix didn't accept, but which is acceptable to a human professor? Can the professor then tell Mechanix that the solution is indeed right (i.e., a correct solution detected as wrong, a false negative)? Will Mechanix be able to perform fresh analysis and learn to widen its range of accepted solutions?

I will be adding more thoughts here on Mechanix as and when I figure out new things!