Unsupervised Learning of Hierarchical 2D Features for Image Classification

Intellectual Systems and Technologies

One of the key problems of image classification and pattern recognition domains is that of feature detection. The desired features are expected to be robust and invariant to a number of spatial transformations, compact enough to evade the «curse of dimensionality» which is a frequent obstacle when dealing with large natural images, and provide a characteristic relation to a classification category with high probability. There exists a number of approaches developed to reach the stated goals, including a variety of deep learning models, such as Restricted Boltzman Machines, convolutional networks, autoencoders, PCA, Deep Belief Networks, etc. However, most applications of the above-mentioned algorithms are often concentrated on obtaining the most accurate features for a chosen dataset rather than trying to extract the inner structure of the data. This paper suggest a slightly different approach, namely a method for building a hierarchy of meaningful features with each level composed of the features from a previous layer. Such model has multiple applications — it can serve as a composite feature detector in an unsupervised pre-training step of learning, or be itself a metric that answers the question of whether the same spatial structure is present across the dataset. The proposed approach exploits the idea of local connectivity supposing that multiple adjacent image parts which contain some meaningful features might present another, more high-level feature when composed together. We also discuss the advantages of a hierarchical feature model, such as the ability to guess a high-level feature presence by discovering a collection of low-level features concentrated in the same area, or its stability against noise and distortion which happens due to the fact that each feature level accepts a certain degree of deviation accumulating those to the top of the hierarchy. The resulting model operates on 2D images, but can be easily extended in order to extract 3D features from a continuous data input, such as a movie, which promises to be a good way to deal with 3D transformations, which can drastically change the appearance of an object while preserving its identity.