Image Forensics | 3D Reconstruction | Aids for Visually Impaired People
Stereo visual odometry with accurate frame selection
SSLAM (Selective SLAM) is a novel stereo visual odometry (VO) framework based on SfM, in which robust keypoint tracking and matching are combined with an effective keyframe selection strategy. The main aspect characterizing SSLAM is the selection of the keyframes used as base references for computing the camera trajectory. Keyframes are selected only if a strong temporal feature disparity is detected. This idea arises from the observation that errors may propagate from the uncertainty of the 3D points, which is higher for distant points corresponding to low temporal disparity matches in the images. The proposed strategy can be more stable and effective than using a threshold on the average temporal disparity or a constant keyframe interleaving. Additionally, a robust loop chain matching scheme is adopted, improving upon VISO2-S by using a more robust detector-descriptor pair, so as to find correspondences even in images with high spatial and/or temporal disparity, such as the requested keyframes. The proposed solution is effective and robust even for very long paths, and has been used to support AUV navigation in real, complex underwater environments.
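The difference between a "strong disparity" criterion and a threshold on the average disparity can be illustrated with a minimal sketch. The exact rule used by SSLAM is not given in this description, so the code assumes a hypothetical criterion: a frame becomes a keyframe when a sufficient fraction of its matches moved by at least a given number of pixels. The names `min_disp`, `min_ratio` and `threshold` and their default values are illustrative only.

```python
import math

def temporal_disparities(matches):
    """Pixel displacement of each match ((x0, y0), (x1, y1)) between two frames."""
    return [math.hypot(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in matches]

def is_keyframe(matches, min_disp=20.0, min_ratio=0.5):
    """Hypothetical criterion (an assumption, not SSLAM's published rule):
    accept the frame when at least `min_ratio` of the matches moved by
    `min_disp` pixels or more."""
    disp = temporal_disparities(matches)
    if not disp:
        return False
    strong = sum(1 for d in disp if d >= min_disp)
    return strong / len(disp) >= min_ratio

def mean_disparity_exceeds(matches, threshold=20.0):
    """Baseline for comparison: a threshold on the average temporal disparity."""
    disp = temporal_disparities(matches)
    return bool(disp) and sum(disp) / len(disp) >= threshold

# Nine nearly static matches and one very large displacement: the average
# passes the threshold, but only 10% of the matches are actually strong.
skewed = [((0.0, 0.0), (2.0, 0.0))] * 9 + [((0.0, 0.0), (200.0, 0.0))]
print(mean_disparity_exceeds(skewed), is_keyframe(skewed))
```

In this toy case a single large displacement is enough to satisfy the average-based test, while the ratio-based test still rejects the frame, which is the kind of instability the selective strategy is meant to avoid.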
SAMSLAM: Simulated Annealing Monocular SLAM
SAMSLAM (Simulated Annealing Monocular SLAM) replaces the classic global SfM optimization approach - for obtaining both the 3D map and the camera pose - with a robust simulated annealing scheme. It works locally on triplets of successive overlapping keyframes, thus guaranteeing scale and 3D structure consistency. Each update step uses RANSAC and alternates between the registration of the three 3D maps associated with each image pair in the triplet and the refinement of the corresponding poses, progressively limiting the allowable reprojection error. SAMSLAM requires neither global optimization nor loop closure. Moreover, it does not perform any back-correction of the poses and does not suffer from 3D map growth.
Best reference homography estimation for planar mosaicing
A mosaicing pipeline is developed to globally reduce the distortion induced by a poor viewpoint selection, which results from a bad choice of the mosaic reference homography. In particular, the input sequence is split into almost planar sub-mosaics, which are merged hierarchically by a bottom-up approach according to their overlap error when reprojected through the "average homography". Given two sub-mosaics, the average homography is defined as the homography that minimizes the average point shift from the original coordinates, both when points are mapped using the first sub-mosaic as reference and when the second sub-mosaic is used as reference.
Spline-based color blending for image mosaicing
A novel color correction and blending scheme for image stitching is developed, where the color map is modelled by a monotone Hermite cubic spline and smoothly propagated into the target image. The employed three-segment monotone cubic spline minimizes the differences in color distribution statistics and gradients with respect to both the source and target images. While the spline model can handle non-linear color maps, the minimization over the gradient differences limits strong alterations of the image structure. Adaptive heuristics are introduced to reduce the minimization search space and thus the computational time, obtaining results better than or comparable with the state of the art.
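As an illustration of the color map model only, the following minimal sketch evaluates a three-segment monotone cubic Hermite spline through four knots. The knot values are made up for the example, and the tangents are set to the harmonic mean of neighbouring secant slopes, a common monotonicity-preserving choice that is assumed here; the optimization that fits the knots to the source and target images is not reproduced.

```python
def monotone_tangents(xs, ys):
    """Knot tangents that keep a cubic Hermite interpolant monotone.
    Interior tangents are the harmonic mean of the adjacent secant slopes
    (zero where the slopes change sign); end tangents are the end secants."""
    n = len(xs)
    secants = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(n - 1)]
    tangents = [secants[0]] + [0.0] * (n - 2) + [secants[-1]]
    for i in range(1, n - 1):
        a, b = secants[i - 1], secants[i]
        if a * b > 0:
            tangents[i] = 2.0 * a * b / (a + b)
    return tangents

def evaluate(xs, ys, tangents, x):
    """Evaluate the piecewise cubic Hermite spline at x (clamped to the knot range)."""
    x = min(max(x, xs[0]), xs[-1])
    i = 0
    while i < len(xs) - 2 and x > xs[i + 1]:
        i += 1
    h = xs[i + 1] - xs[i]
    t = (x - xs[i]) / h
    h00 = 2 * t**3 - 3 * t**2 + 1   # Hermite basis functions
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return (h00 * ys[i] + h10 * h * tangents[i]
            + h01 * ys[i + 1] + h11 * h * tangents[i + 1])

# Illustrative knots: four knots give three segments over the 8-bit range.
knots_in = [0.0, 64.0, 192.0, 255.0]
knots_out = [0.0, 40.0, 220.0, 255.0]
tangents = monotone_tangents(knots_in, knots_out)
color_map = [evaluate(knots_in, knots_out, tangents, float(v)) for v in range(256)]
```

Because the map is monotone, the intensity ordering of the pixels is preserved, so the correction cannot invert local contrast.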
Image mosaicing on high parallax scenes
An alternative approach based on fundamental matrices is employed here to obtain accurate image mosaics from scenes with high parallax (and thus not suitable for the classical homography-based mosaicing techniques). In particular, visual information is transferred from one image to any other thanks to the epipolar propagation on the connected graph of fundamental matrices, while SIFT dense stereo matching is used to obtain the output mosaic. Additionally, epipolar relations are employed to correctly handle occlusions induced by the parallax.
Multi-image super-resolution of corneal endothelium
In collaboration with VISIA Imaging s.p.a., we developed a practical and effective method to compute a high-resolution image of the corneal endothelium starting from a low-resolution video sequence obtained with a general-purpose slit lamp biomicroscope. This is achieved thanks to an SVM-based learning approach that identifies the most suitable endothelium video frames, followed by a robust graph-based mosaicing registration. An image quality typical of dedicated and more expensive confocal microscopes is obtained using only low-cost equipment, which makes the method valid and affordable as a diagnostic tool for medical practice in developing countries.
Robust keypoint matching with the sGLOH-based descriptors
The sGLOH descriptor is able to handle discrete rotations of the keypoint patch by a simple permutation of its vector components. sGLOH can be used in combination with a global or a priori orientation estimation to filter keypoint correspondences, thus improving the matches. sGLOH2 extends the descriptor by concatenating two sGLOH descriptors for the same patch with a relative rotation offset, improving the original robustness and discriminability when in-between rotations occur. Furthermore, an adaptive, general, fast matching scheme can be used to significantly reduce both computation time and memory usage, while binarization based on comparisons inside each descriptor histogram yields BisGLOH2, a more compact and faster, yet still robust, alternative. sGLOH-based descriptors come with an exhaustive comparative experimental evaluation on both image matching and object recognition. According to this evaluation, the proposed descriptors achieve state-of-the-art results.
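The rotation-by-permutation idea can be sketched as follows. The layout used here is a simplifying assumption and not the exact sGLOH layout: the descriptor is a list of rings, each ring has m angular sectors, and each sector stores an m-bin orientation histogram. Under this assumption, rotating the patch by k discrete steps amounts to a cyclic shift of both the sectors and the histogram bins, and two descriptors can be compared by taking the minimum distance over all m shifts. The function names are illustrative.

```python
def rotate_descriptor(desc, k):
    """Rotate by k discrete steps: sector s moves to (s + k) mod m and each
    orientation histogram is cyclically shifted by k bins."""
    rotated = []
    for ring in desc:
        m = len(ring)
        new_ring = []
        for j in range(m):
            hist = ring[(j - k) % m]
            new_ring.append([hist[(b - k) % m] for b in range(m)])
        rotated.append(new_ring)
    return rotated

def l1_distance(d1, d2):
    """Sum of absolute differences over all descriptor components."""
    return sum(abs(a - b)
               for r1, r2 in zip(d1, d2)
               for h1, h2 in zip(r1, r2)
               for a, b in zip(h1, h2))

def best_rotation_match(d1, d2):
    """Return (distance, k) for the discrete rotation of d1 closest to d2."""
    m = len(d1[0])
    return min((l1_distance(rotate_descriptor(d1, k), d2), k) for k in range(m))

# One ring, four sectors, four orientation bins per sector.
desc = [[[10 * s + b for b in range(4)] for s in range(4)]]
print(best_rotation_match(desc, rotate_descriptor(desc, 3)))
```

When a global or a priori orientation estimate is available, the search over k can be restricted to the shifts compatible with it, which is how orientation information can filter correspondences.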
Principal point reliability for image tampering detection
Principal Point (PP) estimation can be used in image forensic analysis to detect image manipulations such as asymmetric cropping or image splicing. The Minimum Vanishing Angle (MVA) is proposed as a reliability score for the estimated PP, following an extensive evaluation under different experimental conditions. MVA thus provides a robust indicator of the accuracy of the estimated PP. Moreover, MVA is also an effective and practical criterion for choosing the best lines to serve as input, since PP reliability does not depend on the number of lines used, but on the amplitude of the obtained vanishing angles, as encoded by MVA.
Statistically accurate measurements from a single image
Geometric methods of computer vision have been applied to extract accurate measurements from video frames in a sports justice case concerning an international bridge tournament. In particular, calibration parameters were extracted from the card table, and sub-pixel edge detection of the cards was employed to obtain very accurate measurements.
LaserGun: Hybrid 3D reconstruction combining visual odometry and laser scanning
LaserGun combines visual odometry techniques with active laser scanning triangulation. After the initial calibration of the relevant components of the system (i.e., the camera intrinsic calibration parameters and the laser plane), laser profiles are employed to extract the 3D structure of the object, while visual odometry information is used to track and merge the data from the different frames. According to the experimental results, the system achieves greater accuracy when planar homography decomposition is used to track the camera instead of a monocular SLAM approach.
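The laser triangulation step can be sketched as a ray-plane intersection. This is a minimal illustration under stated assumptions, not the LaserGun implementation: an ideal pinhole camera without lens distortion, with intrinsics `fx`, `fy`, `cx`, `cy`, and a calibrated laser plane expressed in the camera frame as n · X = d. All names and numeric values are illustrative.

```python
def triangulate_laser_pixel(u, v, fx, fy, cx, cy, plane_normal, plane_offset):
    """Back-project pixel (u, v) to a viewing ray and intersect it with the
    laser plane n . X = d (camera coordinates). Returns the 3D point, or
    None when the ray is parallel to the plane."""
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)
    denom = sum(n * r for n, r in zip(plane_normal, ray))
    if abs(denom) < 1e-12:
        return None
    t = plane_offset / denom
    return tuple(t * r for r in ray)

# Illustrative values: a laser plane at z = 2 in front of the camera.
point = triangulate_laser_pixel(570.0, 240.0, 500.0, 500.0, 320.0, 240.0,
                                (0.0, 0.0, 1.0), 2.0)
print(point)
```

Applying this to every pixel of a detected laser profile gives one 3D profile per frame; the camera pose estimated for each frame would then be used to bring the profiles into a common reference frame.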
Fast keyframe selection for Visual SfM using DWAFS
DWAFS (Double Window Adaptive Frame Selection) is a new fast online preprocessing strategy to detect and discard bad frames (too blurry or without relevant content changes) in video sequences as they arrive. Unlike keyframe selectors and deblurring methods, the proposed approach does not require complex, time-consuming processing, such as the computation of image feature keypoints, previous poses and 3D structure. The presented method can be used to directly filter an SfM video input, improving the final 3D reconstruction by discarding noisy and non-relevant frames while also decreasing the total computational cost. DWAFS is based on the gradient percentile statistics of the input frames, where an adaptive decision strategy - based on a dangling sampling window according to the ongoing values and the last best ones - is used.
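A gradient percentile statistic of the kind mentioned above can be sketched in a few lines. The code below is an illustration only: the score is a percentile of forward-difference gradient magnitudes, and the decision rule is a hypothetical single-window simplification that only addresses blur. It is not the double-window adaptive strategy of DWAFS, whose details are not given here, and the parameters `q`, `window` and `ratio` are assumptions.

```python
import math

def gradient_percentile(frame, q=0.95):
    """q-th percentile of the forward-difference gradient magnitudes of a
    grayscale frame given as a list of rows (at least 2x2 pixels)."""
    mags = []
    for y in range(len(frame) - 1):
        for x in range(len(frame[0]) - 1):
            gx = frame[y][x + 1] - frame[y][x]
            gy = frame[y + 1][x] - frame[y][x]
            mags.append(math.hypot(gx, gy))
    mags.sort()
    return mags[min(len(mags) - 1, int(q * len(mags)))]

def filter_frames(frames, window=5, ratio=0.6):
    """Hypothetical single-window rule (not the DWAFS decision strategy):
    keep a frame when its score reaches `ratio` times the best score among
    the last `window` kept frames. Returns the indices of the kept frames."""
    kept, recent = [], []
    for idx, frame in enumerate(frames):
        score = gradient_percentile(frame)
        if not recent or score >= ratio * max(recent):
            kept.append(idx)
            recent = (recent + [score])[-window:]
    return kept

sharp = [[0, 0, 255, 255] for _ in range(4)]     # step edge
blurred = [[0, 85, 170, 255] for _ in range(4)]  # same edge, smeared out
print(filter_frames([sharp, blurred, sharp]))
```

The score needs only pixel differences and a sort, which is consistent with the goal of avoiding keypoint extraction or pose estimation during preprocessing.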
MagicBox: Photometric stereo for accurate leather fabric reproduction
MagicBox is a hardware/software tool designed for the accurate acquisition of 3D surfaces using photometric stereo, employed to obtain high-quality digitized reproductions of leather fabric samples. MagicBox combines a hardware module, which controls the acquisition environment needed to illuminate the input object with different lights, with a software module that assembles the final virtual fabric result.
2D to 3D semi-automatic image conversion for stereoscopic displays
This project describes the development of a fast and effective 2D to 3D conversion scheme to render 2D images on stereoscopic displays. The stereo disparities of all scene elements (including background and foreground) are computed after statistical segmentation and geometric localization of the ground plane. An original algorithm is devised for recovering 3D visual parameters from planar homologies. The theatrical model employed for the scene provides an effective 3D impression of the displayed scene, and it is fast enough to process video sequences.
Obstacle detection on mobile phones
An effective obstacle detection application running on mobile phones was developed to help visually impaired people. The system uses an SfM approach, modified to obtain more reliable position information by exploiting the phone gyroscope data. A robust RANSAC-based approach is applied to the estimated 3D structure to detect the principal plane and localize out-of-plane objects, which are marked as obstacles.
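The plane detection step can be illustrated with a generic RANSAC plane fit on a 3D point set. This is a minimal sketch under stated assumptions, not the application's code: the point cloud is synthetic, the gyroscope-aided SfM stage is not covered, and the `threshold`, `iterations` and `seed` values are illustrative.

```python
import math
import random

def plane_from_points(p, q, r):
    """Plane (unit normal n, offset d with n . X = d) through three points,
    or None when the points are (nearly) collinear."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-12:
        return None
    n = [c / norm for c in n]
    return n, sum(n[i] * p[i] for i in range(3))

def ransac_plane(points, threshold=0.05, iterations=200, seed=0):
    """Return (plane, inlier_indices) for the plane supported by most points."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        inliers = [i for i, p in enumerate(points)
                   if abs(sum(n[k] * p[k] for k in range(3)) - d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers

def obstacle_indices(points, inliers):
    """Points that do not lie on the principal plane are obstacle candidates."""
    on_plane = set(inliers)
    return [i for i in range(len(points)) if i not in on_plane]

# Synthetic scene: a flat 6x6 ground patch plus four points above it.
ground = [(float(x), float(y), 0.0) for x in range(6) for y in range(6)]
above = [(1.5, 1.5, 1.0), (2.5, 1.5, 1.0), (1.5, 2.5, 1.2), (2.5, 2.5, 0.8)]
cloud = ground + above
plane, inliers = ransac_plane(cloud)
print(obstacle_indices(cloud, inliers))
```

In a real sequence the inlier threshold would have to be chosen relative to the scale and noise of the reconstructed structure.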
BusAlarm: bus line number detection
BusAlarm is a smartphone application that automatically reads the bus line number, assisting visually impaired people in taking public transport and improving their autonomy in daily activities. BusAlarm combines machine learning with geometric and template matching approaches and OCR techniques to correctly detect the incoming bus, find the line number location and output the final answer to the user.