Citation:
2. Feature Extraction. Two main features are extracted, mean amplitude for each time frame, and mel-frequency cepstral coefficients.
Li, Wenzhe, and Tracy Anne Hammond. "Recognizing text through sound alone." Twenty-Fifth AAAI Conference on Artificial Intelligence. 2011.
Summary:
This paper makes use of the classical idea of capturing the sound information while making strokes, to recognize the stroke. The sound analysis is performed using multiple features extracted from the input signal of the sound. The stages in this recognition system are noise removal, feature extraction, template matching and dynamic time warping.
Discussion:
1. End Point Noise Removal:
Two techniques are used to remove end point noise. In the first technique, first the environmental noise is obtained as the signal energy for noise amplitudes over the first 150ms, after which the amplitudes are sample at every 45 ms, and checked against a threshold multiplied by the environmental noise, to determine the start and end points. The second method assumes the first 20ms as environmental noise and uses it to calculate a Gaussian probability. Every 10ms segments is checked, and marked as silent or valid. Finally, all the valid segments are merged. The mean amplitud value is used to normalize the signal.
2. Feature Extraction. Two main features are extracted, mean amplitude for each time frame, and mel-frequency cepstral coefficients.
3. Template matching: This is done using dynamic time warping. The DTW constraints defined are end point constraints, local continuity constraints, global path constraint and axis orientation.
No comments:
Post a Comment