Constructing 3D Feature Maps from Video Sequences by Optic Flow Estimation
The study addresses a general case of the structure-from-motion problem in which the input data consist of several video sequences filmed in the same scene. Unlike popular photogrammetry and bundle-adjustment methods, the proposed solution does not require specific knowledge of intrinsic camera parameters, can be applied to any consistent motion pictures, and can handle large amounts of noise. During reconstruction an object is represented as a 3D map of robust sparse features, which are first detected in certain key frames (using established computer vision techniques such as the Shi-Tomasi corner detector) and then tracked across the following frames with a sparse optic flow method. When camera motion (egomotion) data is available, each feature's depth can be estimated from simple geometric properties of two-image disparity, and estimating each feature over multiple video frames makes it possible to filter out the noise effectively. Apart from sparse Lucas-Kanade optic flow, the study also exploits properties of dense optic flow (Gunnar Farneback's algorithm), which is used for scene segmentation during camera motion. The resulting 3D feature maps are designed to serve as a macro object detector applicable to previously unseen single digital images; they represent structures that are believed to store 3D visual memory of an object, and can therefore detect objects despite general invariant scene transformations.
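The two steps singled out above, triangulating a feature's depth from two-image disparity and suppressing noise by combining estimates from many frames, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the pixel focal length, and the baseline values are assumptions, and a median is used here as one simple robust fusion choice.

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Classic two-view triangulation: Z = f * B / d.

    disparity_px : feature displacement between the two images, in pixels
    focal_px     : focal length expressed in pixels (assumed known here,
                   e.g. recovered from egomotion data rather than calibration)
    baseline_m   : camera translation between the two frames, in metres
    """
    d = np.asarray(disparity_px, dtype=float)
    return focal_px * baseline_m / d

def fuse_depth_estimates(per_frame_depths):
    """Fuse noisy per-frame depth estimates of a single tracked feature.

    A median discards gross outliers caused by tracking failures; the
    paper's actual filtering scheme may differ.
    """
    return float(np.median(np.asarray(per_frame_depths, dtype=float)))

# Example with made-up numbers: a feature with 8 px disparity, seen by a
# camera with f = 800 px that translated 0.1 m, lies 800 * 0.1 / 8 = 10 m away.
z = depth_from_disparity(8.0, 800.0, 0.1)
fused = fuse_depth_estimates([10.0, 10.2, 9.9, 42.0, 10.1])  # 42.0 is rejected
```

The key property exploited is that depth error grows as disparity shrinks, so distant features yield noisy single-pair estimates; fusing estimates across many frame pairs, as the abstract describes, compensates for this.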