Discovering Local Spatial Relationships for Image Recognition

Intellectual Systems and Technologies

The feature detection approach widely used in many computer vision applications has recently been improved by unsupervised feature learning techniques such as Restricted Boltzmann Machines and Sparse Autoencoders, which learn features from large amounts of generally unrelated (or domain-related) data. Unsupervised feature learning has proved especially useful in combination with deep learning models such as convolutional neural networks and Deep Belief Networks. However, when dealing with complex, highly structured data, as well as with data subject to many invariant transformations (which is highly relevant to computer vision in 3D and motion), it becomes difficult to construct a bag-of-words-like feature dictionary that contains all the possible appearances an object can take. Instead, this study offers a different approach: discovering relevant spatial relationships that recur across the dataset. The idea itself is not new; a classical example of using spatial structure for recognition is the distinctive pattern of a human face, with two eyes and a mouth. However, most existing solutions are strictly limited to a particular domain. The study proposes an algorithm inspired by properties of the primary visual cortex (V1). It mimics the functionality of orientation cells, discovers general properties of natural images, and aggregates them to extract statistics that are then used in a classification algorithm.
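The pipeline sketched in the abstract, oriented V1-like filters whose responses are aggregated into summary statistics for a classifier, could be prototyped roughly as follows. This is a minimal illustrative sketch, not the paper's actual algorithm: it uses Gabor filters (a standard computational model of V1 orientation cells), and all function names and parameters here are assumptions introduced for the example.

```python
import numpy as np

def gabor_kernel(theta, size=9, sigma=2.0, lam=4.0):
    # Oriented Gabor filter: a common model of V1 simple (orientation) cells.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)
    return g - g.mean()  # zero-mean, so flat image regions give no response

def filter_responses(image, kernel):
    # Valid cross-correlation implemented with plain NumPy loops.
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def orientation_statistics(image, n_orientations=4):
    # Aggregate the response energy of each orientation channel into a
    # single descriptor vector, normalised to sum to 1. A classifier
    # would consume such descriptors as input features.
    thetas = [k * np.pi / n_orientations for k in range(n_orientations)]
    energies = np.array(
        [np.sum(filter_responses(image, gabor_kernel(t)) ** 2) for t in thetas]
    )
    return energies / energies.sum()

# Toy image: a vertical step edge, which should excite the
# vertically-oriented (theta = 0) channel most strongly.
img = np.zeros((32, 32))
img[:, 16:] = 1.0
stats = orientation_statistics(img)
```

In this sketch the per-orientation energies play the role of the "useful statistics" the abstract mentions; the paper's actual aggregation over discovered spatial relationships is more elaborate than this single global descriptor.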