Visual Odometry

Stereo visual odometry with accurate frame selection 

SSLAM (Selective SLAM) is a novel stereo visual odometry (VO) framework based on SfM, in which robust keypoint tracking and matching are combined with an effective keyframe selection strategy. The main feature of SSLAM is the selection of the keyframes used as base references for computing the camera trajectory: keyframes are selected only if a strong temporal feature disparity is detected. This idea arises from the observation that errors may propagate from the uncertainty of the 3D points, which is higher for distant points corresponding to low temporal disparity matches in the images. The proposed strategy can be more stable and effective than using a threshold on the average temporal disparity or a constant keyframe interleaving. Additionally, a robust loop chain matching scheme is adopted which, improving upon VISO2-S through a more robust detector-descriptor pair, finds correspondences even in images with high spatial and/or temporal disparity, such as the required keyframes. The proposed solution is effective and robust even for very long paths, and has been used to support AUV navigation in real, complex underwater environments.

  • M. Fanfani, F. Bellavia, and C. Colombo, "Accurate keyframe selection and keypoint tracking for robust visual odometry". MVA, 2016 | PDF
  • F. Bellavia, M. Fanfani, and C. Colombo, "Selective visual odometry for accurate AUV localization". Aut. Rob., 2015 | PDF
  • F. Bellavia, M. Fanfani, F. Pazzaglia and C. Colombo, "Robust Selective Stereo SLAM without Loop Closure and Bundle Adjustment", ICIAP 2013 | PDF | Poster
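The selection criterion can be illustrated with a minimal Python sketch. It assumes that each keypoint tracked since the last keyframe is given as a pair of pixel positions, and it uses a hypothetical rule (a frame becomes a keyframe only when at least `min_fraction` of the matches move more than `min_disp` pixels); the function names and threshold values are illustrative, not SSLAM's actual parameters. The toy data show why such a rule can be more stable than a threshold on the mean: two large displacements are enough to push the average over the threshold.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def temporal_disparities(prev_pts: Sequence[Point], curr_pts: Sequence[Point]) -> List[float]:
    """Pixel displacement of each tracked keypoint between the last keyframe and the current frame."""
    return [math.hypot(cx - px, cy - py) for (px, py), (cx, cy) in zip(prev_pts, curr_pts)]


def average_rule(prev_pts, curr_pts, min_disp=20.0) -> bool:
    """Baseline criterion: threshold on the mean temporal disparity."""
    d = temporal_disparities(prev_pts, curr_pts)
    return bool(d) and sum(d) / len(d) > min_disp


def is_keyframe(prev_pts, curr_pts, min_disp=20.0, min_fraction=0.5) -> bool:
    """Hypothetical selective rule (an assumption, not SSLAM's exact test):
    most matches, not just their mean, must show a strong temporal disparity."""
    d = temporal_disparities(prev_pts, curr_pts)
    if not d:
        return False
    strong = sum(1 for v in d if v > min_disp)
    return strong / len(d) >= min_fraction


# Eight distant points barely move; two nearby points move a lot.
prev = [(0.0, float(i)) for i in range(10)]
curr = [(2.0, float(i)) for i in range(8)] + [(100.0, 8.0), (100.0, 9.0)]
print(average_rule(prev, curr))  # True: the mean (21.6 px) is inflated by two points
print(is_keyframe(prev, curr))   # False: only 20% of the matches exceed 20 px
```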

SAMSLAM: Simulated Annealing Monocular SLAM

SAMSLAM (Simulated Annealing Monocular SLAM) replaces the classic globally optimized SfM approach for obtaining both the 3D map and the camera poses with a robust simulated annealing scheme. It works locally on triplets of successive overlapping keyframes, thus guaranteeing scale and 3D structure consistency. Each update step uses RANSAC and alternates between the registration of the three 3D maps associated with the image pairs in the triplet and the refinement of the corresponding poses, while progressively limiting the allowable reprojection error. SAMSLAM requires neither global optimization nor loop closure. Moreover, it does not perform any back-correction of the poses and does not suffer from 3D map growth.

  • M. Fanfani, F. Bellavia, F. Pazzaglia and C. Colombo, "SAMSLAM: Simulated Annealing Monocular SLAM", CAIP 2013 | PDF | Poster
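A minimal Python sketch of the annealing idea, under simplifying assumptions: only two 3D maps are registered (not a keyframe triplet), the registration model is a similarity transform estimated with Umeyama's least-squares method (our choice, since a monocular map is only known up to scale), residuals are 3D distances rather than reprojection errors, and the threshold schedule is hypothetical. Pose refinement is not shown.

```python
import numpy as np


def similarity_umeyama(src, dst):
    """Least-squares similarity (s, R, t) with dst ~ s * R @ src + t (Umeyama's method)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0  # avoid returning a reflection
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / ((xs ** 2).sum() / len(src))
    t = mu_d - s * R @ mu_s
    return s, R, t


def annealed_registration(src, dst, thresholds=(1.0, 0.5, 0.1, 0.05), iters=200, seed=0):
    """RANSAC registration of two 3D maps with a progressively tighter inlier threshold.

    Assumption: at each stage, minimal samples are drawn only from the inliers of the
    previous stage, and the model is refitted on all surviving inliers.
    """
    rng = np.random.default_rng(seed)
    inliers = np.ones(len(src), dtype=bool)
    model = None
    for thr in thresholds:
        candidates = np.flatnonzero(inliers)
        if len(candidates) < 3:
            break
        best_mask, best_count = None, -1
        for _ in range(iters):
            idx = rng.choice(candidates, 3, replace=False)
            s, R, t = similarity_umeyama(src[idx], dst[idx])
            residuals = np.linalg.norm(dst - (s * src @ R.T + t), axis=1)
            mask = residuals < thr
            if mask.sum() > best_count:
                best_mask, best_count = mask, int(mask.sum())
        if best_count < 3:
            break
        inliers = best_mask
        model = similarity_umeyama(src[inliers], dst[inliers])
    return model, inliers
```

Drawing samples only from the previous stage's inliers is one simple way to make each stage build on the last; a true simulated annealing scheme would also accept occasional worse states, which this sketch does not do.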

Image Stitching and Super-resolution

Best reference homography estimation for planar mosaicing

A mosaicing pipeline is developed to globally reduce the distortion induced by a wrong viewpoint, that is, by a bad choice of the mosaic reference homography. In particular, the input sequence is split into almost planar sub-mosaics, which are merged hierarchically by a bottom-up approach according to their overlap error when reprojected through the "average homography". Given two sub-mosaics, the average homography is defined as the homography that minimizes the average point shift from the original coordinates, both when points are mapped using the first sub-mosaic as reference and when the second sub-mosaic is used as reference.

  • F. Bellavia and C. Colombo, "Estimating the best reference homography for planar mosaics from videos", VISAPP, 2015 | PDF | Slide
  • F. Bellavia, M. Fanfani, F. Pazzaglia, C. Colombo, et al., "Piecewise planar underwater mosaicing", Oceans, 2015 | PDF
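The definition of the average homography can be made concrete with a small Python sketch. It assumes point coordinates expressed with sub-mosaic A as reference, a known homography `H_ab` mapping them into the frame of sub-mosaic B, and a squared-distance version of the average shift. Since ||q - x||² + ||q - y||² = 2||q - m||² + ||x - y||²/2, with m the midpoint of x and y, minimizing the shift amounts to mapping each point as close as possible to its midpoint. The sketch approximates this with a normalized DLT fit to the midpoints (an algebraic, not geometric, minimization), which is exact only when the midpoint map is itself a homography.

```python
import numpy as np


def apply_h(H, pts):
    """Map (N, 2) points through a 3x3 homography."""
    ph = np.c_[pts, np.ones(len(pts))] @ H.T
    return ph[:, :2] / ph[:, 2:3]


def _normalizer(pts):
    """Hartley normalization: zero mean, average distance sqrt(2) from the origin."""
    c = pts.mean(axis=0)
    s = np.sqrt(2.0) / np.linalg.norm(pts - c, axis=1).mean()
    return np.array([[s, 0.0, -s * c[0]], [0.0, s, -s * c[1]], [0.0, 0.0, 1.0]])


def dlt_homography(src, dst):
    """Normalized DLT estimate of H with dst ~ H src (at least 4 correspondences)."""
    Ts, Td = _normalizer(src), _normalizer(dst)
    a, b = apply_h(Ts, src), apply_h(Td, dst)
    rows = []
    for (x, y), (u, v) in zip(a, b):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    H = np.linalg.inv(Td) @ Vt[-1].reshape(3, 3) @ Ts
    return H / H[2, 2]


def shift_cost(G, pts, H_ab):
    """Mean squared shift w.r.t. both references (A: identity, B: H_ab). Assumed cost."""
    q = apply_h(G, pts)
    shift_a = np.sum((q - pts) ** 2, axis=1)
    shift_b = np.sum((q - apply_h(H_ab, pts)) ** 2, axis=1)
    return float(np.mean(shift_a + shift_b))


def average_homography(pts, H_ab):
    """Approximate minimizer of shift_cost: fit a homography to the per-point midpoints."""
    midpoints = 0.5 * (pts + apply_h(H_ab, pts))
    return dlt_homography(pts, midpoints)
```

For a pure translation between the two references, the sketch returns the half-way translation, which halves the shift seen from either side.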

Spline-based color blending for image mosaicing

A novel color correction and blending scheme for image stitching is developed, in which the color map is modelled by a monotone cubic Hermite spline and smoothly propagated into the target image. The three-segment monotone cubic spline is fitted by minimizing the differences in color distribution statistics and in gradients with respect to both the source and target images. While the spline model can handle non-linear color maps, the gradient term limits strong alterations of the image structure. Adaptive heuristics are introduced to reduce the search space of the minimization, and thus the computational time, obtaining results better than or comparable to the state of the art.

  • F. Bellavia and C. Colombo, "Color correction for image stitching by monotone cubic spline interpolation", IbPRIA, 2015 | PDF | Slide
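The monotone spline itself can be sketched in Python with the standard Fritsch-Carlson construction, which adjusts the Hermite tangents so that the interpolant never decreases between non-decreasing knots. The knot values used below are hypothetical: in the described method they would result from the minimization of color-statistics and gradient differences, which is not shown. A three-segment spline corresponds to four knots on the 0-255 intensity range.

```python
from bisect import bisect_right


def monotone_tangents(xs, ys):
    """Fritsch-Carlson tangents that keep the cubic Hermite interpolant monotone."""
    n = len(xs)
    deltas = [(ys[k + 1] - ys[k]) / (xs[k + 1] - xs[k]) for k in range(n - 1)]
    m = [deltas[0]] + [
        0.0 if deltas[k - 1] * deltas[k] <= 0 else 0.5 * (deltas[k - 1] + deltas[k])
        for k in range(1, n - 1)
    ] + [deltas[-1]]
    for k, dk in enumerate(deltas):
        if dk == 0.0:  # flat segment: both end tangents must vanish
            m[k] = m[k + 1] = 0.0
            continue
        a, b = m[k] / dk, m[k + 1] / dk
        if a * a + b * b > 9.0:  # shrink tangents back into the monotonicity region
            tau = 3.0 / (a * a + b * b) ** 0.5
            m[k], m[k + 1] = tau * a * dk, tau * b * dk
    return m


def color_map(xs, ys):
    """Return f(v): a monotone piecewise cubic Hermite map through the knots (xs, ys)."""
    m = monotone_tangents(xs, ys)

    def f(v):
        k = min(max(bisect_right(xs, v) - 1, 0), len(xs) - 2)  # segment containing v
        h = xs[k + 1] - xs[k]
        t = (v - xs[k]) / h
        h00, h10 = 2 * t**3 - 3 * t**2 + 1, t**3 - 2 * t**2 + t
        h01, h11 = -2 * t**3 + 3 * t**2, t**3 - t**2
        return h00 * ys[k] + h10 * h * m[k] + h01 * ys[k + 1] + h11 * h * m[k + 1]

    return f


# Hypothetical knots for one color channel (three segments, four knots).
f = color_map([0.0, 85.0, 170.0, 255.0], [0.0, 120.0, 200.0, 255.0])
lut = [f(v) for v in range(256)]  # lookup table applied to the source intensities
```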

Image mosaicing on high parallax scenes

An alternative approach based on fundamental matrices is employed here to obtain accurate image mosaics from scenes with high parallax, which are therefore not suitable for classical homography-based mosaicing techniques. In particular, visual information is transferred from one image to any other thanks to epipolar propagation on the connected graph of fundamental matrices, while SIFT dense stereo matching is used to obtain the output mosaic. Additionally, epipolar relations are employed to correctly handle the occlusions induced by parallax.

  • A. Nardi, D. Comanducci, and C. Colombo, "Augmented vision: Seeing beyond field of view and occlusions via uncalibrated visual transfer from multiple viewpoints", IMVIP, 2011 (Best paper award) | PDF
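The basic operation behind epipolar propagation, transferring a point into a third view, can be sketched in Python: the point in view 3 is the intersection of the epipolar lines induced by its positions in views 1 and 2. The fundamental matrices are built here from synthetic projection matrices only to make the example checkable; in practice they would be estimated from matches, and chaining transfers along the graph is not shown.

```python
import numpy as np


def skew(v):
    """Cross-product matrix [v]_x."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])


def fundamental_from_cameras(P_from, P_to):
    """F with x_to^T F x_from = 0, built from two 3x4 projection matrices."""
    C = np.linalg.svd(P_from)[2][-1]  # camera centre: null vector of P_from
    epipole = P_to @ C
    return skew(epipole) @ P_to @ np.linalg.pinv(P_from)


def epipolar_transfer(F31, F32, x1, x2):
    """Transfer a point seen at x1 (view 1) and x2 (view 2) into view 3.

    The result is the intersection of its two epipolar lines in view 3. The transfer
    degenerates when the 3D point lies on the plane through the three camera centres.
    """
    x3 = np.cross(F31 @ x1, F32 @ x2)
    return x3 / x3[2]
```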

Multi-image super-resolution of corneal endothelium

In collaboration with VISIA Imaging s.p.a., we developed a practical and effective method to compute a high-resolution image of the corneal endothelium from a low-resolution video sequence acquired with a general-purpose slit lamp biomicroscope. This is achieved by an SVM-based learning approach that identifies the most suitable endothelium video frames, followed by a robust graph-based mosaicing registration. An image quality typical of dedicated, more expensive confocal microscopes is obtained using only low-cost equipment, which makes the method a viable and affordable diagnostic tool for medical practice in developing countries.

  • D. Comanducci and C. Colombo, "Vision-Based Magnification of Corneal Endothelium Frames", ICVS, 2013 | PDF | Poster

Image Keypoint Descriptors

Robust keypoint matching with the sGLOH-based descriptors

The sGLOH descriptor can handle discrete rotations of the keypoint patch by a simple permutation of its vector components. sGLOH can be used in combination with a global or a priori orientation estimate to filter keypoint correspondences, thus improving the matches. sGLOH2 extends the descriptor by concatenating two sGLOH descriptors of the same patch with a relative rotation offset, improving the original robustness and discriminability when intermediate rotations (between the discrete steps) occur. Moreover, an adaptive, general, and fast matching scheme significantly reduces both computation time and memory usage, while binarization based on comparisons inside each descriptor histogram yields BisGLOH2, a more compact and faster, yet still robust, alternative. The sGLOH-based descriptors come with an exhaustive comparative experimental evaluation on both image matching and object recognition, according to which they achieve state-of-the-art results.

  • F. Bellavia and C. Colombo, "Extending the sGLOH descriptor", ICIAP, 2015 | PDF | Poster
  • F. Bellavia and C. Colombo, "Rethinking the sGLOH descriptor", TPAMI, to appear | PDF | Additional material
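The rotation-by-permutation property can be illustrated with a simplified Python sketch. It assumes a descriptor laid out as an array of shape (rings, m, m), that is, m angular sectors per ring, each holding an m-bin gradient-orientation histogram, and compares descriptors with an L1 distance; both the layout and the distance are illustrative assumptions rather than the exact sGLOH definition. Restricting `allowed` to a subset of rotations mimics the use of a global or a priori orientation estimate.

```python
import numpy as np


def rotate_descriptor(desc, k):
    """Descriptor of the patch rotated by k * 2*pi/m, obtained by permutation only.

    Assumed layout: desc has shape (rings, m, m), i.e. m angular sectors per ring, each
    with an m-bin orientation histogram. Rotating by one sector moves the content to the
    next sector and shifts every orientation histogram by one bin.
    """
    return np.roll(np.roll(desc, k, axis=1), k, axis=2)


def sgloh_distance(d1, d2, allowed=None):
    """Minimum L1 distance over the allowed discrete rotations; returns (distance, k)."""
    m = d1.shape[1]
    ks = range(m) if allowed is None else allowed
    return min((float(np.abs(d1 - rotate_descriptor(d2, k)).sum()), k) for k in ks)
```

No descriptor is recomputed when testing a rotation: each candidate is a reindexing of the stored vector, which is what keeps rotation-invariant matching cheap.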

Image Forensics

Principal point reliability for image tampering detection

Principal Point (PP) estimation can be used in image forensic analysis to detect image manipulations such as asymmetric cropping or image splicing. Following an extensive evaluation under different experimental conditions, the Minimum Vanishing Angle (MVA) is proposed as a reliability score for the estimated PP. MVA thus provides a robust indicator of the accuracy of the estimated PP. Moreover, MVA is also an effective and practical criterion for choosing the best input lines, since PP reliability does not depend on the number of lines used, but on the amplitude of the resulting vanishing angles, as encoded by MVA.

  • M. Iuliani, M. Fanfani, C. Colombo and A. Piva, "Reliability Assessment of Principal Point Estimates for Forensic Applications", JVCI, 2016 | PDF
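As background for the PP estimate being assessed, the following Python sketch shows a standard construction: assuming zero skew and square pixels, the principal point is the orthocenter of the triangle formed by the vanishing points of three mutually orthogonal directions. The synthetic camera exists only to check the result, and the MVA score itself is not computed, since its exact definition is not given here.

```python
import numpy as np


def orthocenter(v1, v2, v3):
    """Orthocenter of the triangle (v1, v2, v3): intersection of two of its altitudes."""
    A = np.array([v2 - v3, v1 - v3])
    b = np.array([v1 @ (v2 - v3), v2 @ (v1 - v3)])
    return np.linalg.solve(A, b)


def vanishing_point(K, R, axis):
    """Image of the point at infinity along the camera-frame direction R[:, axis]."""
    v = K @ R[:, axis]
    return v[:2] / v[2]
```

The construction becomes ill-conditioned when a vanishing point moves far away (nearly parallel lines in the image), which is the kind of situation a reliability score has to flag.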

Statistically accurate measurements from a single image

Geometric computer vision methods have been applied to extract accurate measurements from video frames in a sports justice case concerning an international bridge tournament. In particular, calibration parameters were extracted from the card table, and sub-pixel edge detection of the cards was employed to obtain very accurate measurements.
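A plane of known size is enough to turn image positions into metric ones. The Python sketch below assumes, hypothetically, four table corners with known metric coordinates (a 1 m square in the example, not the real table dimensions), estimates the image-to-table homography with the DLT, and measures distances between points lying on the table plane. Sub-pixel edge detection and the statistical error analysis are not shown.

```python
import numpy as np


def homography_dlt(src, dst):
    """Direct linear transform: H with dst ~ H src, from at least 4 point pairs.

    With noisy data, Hartley normalization of the coordinates is advisable.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    H = np.linalg.svd(np.asarray(rows, dtype=float))[2][-1].reshape(3, 3)
    return H / H[2, 2]


def apply_homography(H, pts):
    """Map (N, 2) points through a 3x3 homography."""
    ph = np.c_[pts, np.ones(len(pts))] @ H.T
    return ph[:, :2] / ph[:, 2:3]


def plane_distance(H_img_to_plane, p, q):
    """Metric distance between two image points assumed to lie on the calibrated plane."""
    a, b = apply_homography(H_img_to_plane, np.array([p, q], dtype=float))
    return float(np.linalg.norm(a - b))
```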


3D Reconstruction

LaserGun: Hybrid 3D reconstruction combining visual odometry and laser scanning

LaserGun combines visual odometry techniques with active laser-scanning triangulation. After the initial calibration of the system components (i.e., the camera intrinsic parameters and the laser plane), laser profiles are used to extract the 3D structure of the object, while visual odometry information is used to track and merge the data from different frames. According to the experimental results, the system achieves greater accuracy when planar homography decomposition is used to track the camera instead of a monocular SLAM approach.

  • M. Fanfani and C. Colombo, "LaserGun: a Tool for Hybrid 3D Reconstruction", ICVS, 2013 | PDF | Slide
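The triangulation step can be sketched in Python as a ray-plane intersection: each laser pixel is back-projected through the intrinsic matrix `K`, intersected with the calibrated laser plane n · X = d, and moved to a common world frame with the camera pose of that frame. The function name, the plane parameterization and the pose convention (camera-to-world `R`, `t`) are assumptions for illustration.

```python
import numpy as np


def triangulate_laser(pixels, K, n, d, R=None, t=None):
    """Intersect the viewing rays of laser pixels with the laser plane n . X = d.

    Assumptions: n and d are expressed in the camera frame (laser-camera calibration);
    (R, t) is the camera-to-world pose of the frame, e.g. from visual odometry.
    """
    R = np.eye(3) if R is None else R
    t = np.zeros(3) if t is None else t
    uv1 = np.c_[pixels, np.ones(len(pixels))]
    rays = uv1 @ np.linalg.inv(K).T            # ray direction through each pixel
    denom = rays @ n
    valid = np.abs(denom) > 1e-9               # skip rays (almost) parallel to the plane
    X_cam = rays[valid] * (d / denom[valid])[:, None]
    return X_cam @ R.T + t                     # camera frame -> world frame
```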

Fast keyframe selection for Visual SfM using DWAFS

DWAFS (Double Window Adaptive Frame Selection) is a new fast online preprocessing strategy to detect and discard bad frames (too blurry or without relevant content changes) from video sequences as they arrive. Unlike keyframe selectors and deblurring methods, the proposed approach does not require complex, time-consuming processing such as the computation of image keypoints, previous poses, or 3D structure. The method can be used to directly filter the video input of SfM, improving the final 3D reconstruction by discarding noisy and non-relevant frames while also decreasing the total computational cost. DWAFS is based on gradient percentile statistics of the input frames, combined with an adaptive decision strategy that uses a dangling sampling window driven by the incoming values and the last best ones.

  • F. Bellavia, M. Fanfani and C. Colombo, "Fast adaptive frame preprocessing for 3D reconstruction", VISAPP, 2015 | PDF | Slide
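The gradient-percentile statistic can be sketched in Python as follows. The score (a high percentile of the gradient magnitude, which drops on blurry frames) follows the description above, but the decision rule is a hypothetical single sliding window that keeps a frame when its score is within `ratio` of the best recent score; it is not the double-window strategy of DWAFS, and it does not test for content changes.

```python
from collections import deque

import numpy as np


def gradient_percentile(frame, q=90.0):
    """Sharpness score: q-th percentile of the gradient magnitude (blur lowers it)."""
    gy, gx = np.gradient(frame.astype(float))
    return float(np.percentile(np.hypot(gx, gy), q))


def select_frames(frames, window=5, ratio=0.8, q=90.0):
    """Keep a frame only if its score is close to the best score in a sliding window.

    Hypothetical single-window rule (an assumption), not the DWAFS decision strategy.
    """
    recent = deque(maxlen=window)
    kept = []
    for i, frame in enumerate(frames):
        score = gradient_percentile(frame, q)
        recent.append(score)
        if score >= ratio * max(recent):
            kept.append(i)
    return kept
```

Only a gradient image and a percentile are computed per frame, so the cost stays far below that of keypoint extraction.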

MagicBox: Photometric stereo for accurate leather fabric reproduction

MagicBox is a hardware/software tool designed for the accurate acquisition of 3D surfaces using photometric stereo, and has been employed to obtain high-quality digitized reproductions of leather fabric samples. MagicBox combines a hardware module, which controls the acquisition environment and illuminates the input object with different lights, with a software module that assembles the final virtual fabric.
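The core of classic Lambertian photometric stereo can be sketched in Python: with k ≥ 3 known light directions, each pixel intensity is I = ρ n · l, so the albedo-scaled normal ρn is obtained per pixel by least squares. This textbook formulation is assumed for illustration; it ignores shadows, specularities and light calibration, and it is not MagicBox's actual processing.

```python
import numpy as np


def photometric_stereo(images, lights):
    """Lambertian photometric stereo by per-pixel least squares.

    images: k grayscale images (h, w) taken under k distant lights.
    lights: (k, 3) unit light directions; needs k >= 3 non-coplanar lights.
    Returns unit normals (h, w, 3) and albedo (h, w).
    """
    h, w = images[0].shape
    I = np.stack([im.reshape(-1) for im in images])   # (k, h*w) intensities
    G, *_ = np.linalg.lstsq(lights, I, rcond=None)     # (3, h*w) albedo-scaled normals
    albedo = np.linalg.norm(G, axis=0)
    normals = G / np.maximum(albedo, 1e-12)
    return normals.T.reshape(h, w, 3), albedo.reshape(h, w)
```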

2D to 3D semi-automatic image conversion for stereoscopic displays

This project describes the development of a fast and effective 2D-to-3D conversion scheme for rendering 2D images on stereoscopic displays. The stereo disparities of all scene elements (including background and foreground) are computed after statistical segmentation and geometric localization of the ground plane. An original algorithm is devised for recovering 3D visual parameters from planar homologies. The theatrical model employed for the scene provides an effective 3D impression of the displayed scene, and is fast enough to process video sequences.

  • D. Comanducci, A. Maki, C. Colombo, and R. Cipolla, "2D-to-3D photo rendering for 3D displays", 3DPVT, 2010 | PDF | Poster
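One way a ground-plane-based theatrical model can assign disparities is sketched below in Python, under assumptions that are ours: a camera with horizontal optical axis at height `cam_height`, a known horizon row `horizon_row`, and a virtual stereo baseline `baseline`. A ground point imaged at row y lies at depth Z = f·cam_height/(y - y_h), so its disparity f·baseline/Z = baseline·(y - y_h)/cam_height is linear in the row and independent of the focal length. Foreground objects are modeled as vertical flats taking the disparity of their base, and everything above the horizon is treated as background at infinity. The recovery of these parameters from planar homologies is not shown.

```python
import numpy as np


def ground_disparity(row, horizon_row, baseline, cam_height):
    """Disparity (pixels) of a ground-plane point imaged at `row`; 0 at or above the horizon."""
    return max(0.0, baseline * (row - horizon_row) / cam_height)


def theatrical_disparity(shape, horizon_row, baseline, cam_height, objects=()):
    """Disparity map under the assumed model: ground plane below the horizon,
    background at infinity above it, and each foreground box (top, bottom, left, right)
    as a vertical flat at the depth of its base."""
    height, width = shape
    D = np.zeros((height, width))
    for row in range(height):
        D[row, :] = ground_disparity(row, horizon_row, baseline, cam_height)
    for top, bottom, left, right in objects:
        D[top:bottom + 1, left:right + 1] = ground_disparity(bottom, horizon_row, baseline, cam_height)
    return D
```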

Aids for Visually Impaired People

Obstacle detection on mobile phones

An effective obstacle detection application running on mobile phones was developed to help visually impaired people. The system uses an SfM approach, modified to exploit the phone gyroscope data to obtain more reliable camera pose information. A robust RANSAC-based approach is then applied to the estimated 3D structure to detect the principal plane and to localize out-of-plane objects, which are marked as obstacles.

  • A. Caldini, M. Fanfani and C. Colombo, "Smartphone-based obstacle detection for the visually impaired", ICIAP, 2015 | PDF | Poster
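The plane-and-obstacle step can be sketched in Python with a plain RANSAC plane fit on the reconstructed 3D points: points farther than `thr` from the dominant plane are flagged as obstacles. The threshold and iteration count are hypothetical, and the gyroscope-aided SfM that produces the points is not shown.

```python
import numpy as np


def plane_from_points(p0, p1, p2):
    """Unit normal n and offset d of the plane n . X + d = 0 through three points (None if collinear)."""
    n = np.cross(p1 - p0, p2 - p0)
    norm = np.linalg.norm(n)
    if norm < 1e-12:
        return None
    n = n / norm
    return n, -float(n @ p0)


def detect_obstacles(points, thr=0.05, iters=200, seed=0):
    """RANSAC the dominant plane of an (N, 3) point cloud; return a boolean obstacle mask.

    thr and iters are illustrative values (assumptions), in the units of the point cloud.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(iters):
        plane = plane_from_points(*points[rng.choice(len(points), 3, replace=False)])
        if plane is None:
            continue
        n, d = plane
        inliers = np.abs(points @ n + d) < thr
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return ~best_inliers  # everything off the principal plane is flagged
```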

BusAlarm: bus line number detection

BusAlarm is a smartphone application that automatically reads the bus line number, assisting visually impaired people in taking public transport and improving their autonomy in daily activities. BusAlarm combines machine learning with geometric and template matching approaches and OCR techniques to correctly detect the incoming bus, find the line number location and output the final answer to the user.

  • C. Guida, D. Comanducci, and C. Colombo, "Automatic bus line number localization and recognition on mobile phones - a computer vision aid for the visually impaired", ICIAP, 2011 | PDF | Poster
  • Presentation at the workshop promoted by the Andrea Bocelli Foundation, 2012 | Slide | Video
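Of the techniques BusAlarm combines, template matching is the easiest to illustrate. The Python sketch below performs exhaustive zero-mean normalized cross-correlation; using it to locate, for instance, the line-number display of a bus is a hypothetical application, and the learning-based detection and OCR stages are not shown.

```python
import numpy as np


def ncc_match(image, template):
    """Exhaustive zero-mean normalized cross-correlation; returns (best_score, (row, col))."""
    H, W = image.shape
    h, w = template.shape
    t = template - template.mean()
    t_norm = np.linalg.norm(t)
    best = (-np.inf, (0, 0))
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            p = image[r:r + h, c:c + w]
            p = p - p.mean()
            denom = np.linalg.norm(p) * t_norm
            if denom == 0.0:
                continue  # flat patch or flat template: NCC undefined
            score = float((p * t).sum() / denom)
            if score > best[0]:
                best = (score, (r, c))
    return best
```

Subtracting the mean and normalizing makes the score insensitive to uniform brightness and contrast changes, which matters for displays seen under varying outdoor lighting.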