Research

Visual Odometry and SLAM

Stereo visual odometry with accurate frame selection 

SSLAM (Selective SLAM) is a novel stereo visual odometry (VO) framework based on Structure from Motion, where robust keypoint tracking and matching are combined with an effective keyframe selection strategy. The main aspect characterizing SSLAM is the selection of the keyframes used as base references for computing the camera trajectory. Keyframes are selected only if a strong temporal feature disparity is detected. This idea arises from the observation that the localization uncertainty of 3D points is higher for distant points, which correspond to low temporal disparity matches in the images. The proposed strategy is more stable and effective than thresholding the average temporal disparity or using a constant keyframe interleaving. Additionally, a robust loop chain matching scheme is adopted, improving upon VISO2-S by using a more robust detector-descriptor pair to find correspondences even in images with high spatial and/or temporal disparity, as required between the selected keyframes. The proposed solution is effective and robust even for very long paths, and has been used to support AUV navigation in real, complex underwater environments.

  • M. Fanfani, F. Bellavia, and C. Colombo, "Accurate keyframe selection and keypoint tracking for robust visual odometry". MVA, 2016 | PDF
  • F. Bellavia, M. Fanfani, and C. Colombo, "Selective visual odometry for accurate AUV localization". Aut. Rob., 2017 | PDF
  • F. Bellavia, M. Fanfani, F. Pazzaglia and C. Colombo, "Robust Selective Stereo SLAM without Loop Closure and Bundle Adjustment", ICIAP, 2013 | PDF | Poster
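
As an illustration of the keyframe acceptance idea described above, the following minimal Python sketch accepts a frame as a new keyframe only when a large enough fraction of tracked keypoints has moved sufficiently far since the last keyframe; the function name and the thresholds (min_disp_px, min_ratio) are illustrative assumptions, not the actual SSLAM criterion detailed in the papers.

    import numpy as np

    def strong_temporal_disparity(kpts_ref, kpts_cur, min_disp_px=20.0, min_ratio=0.75):
        """Decide whether the current frame shows enough temporal disparity
        w.r.t. the last keyframe to be promoted to a new keyframe.
        kpts_ref, kpts_cur: (N, 2) arrays of matched keypoint positions in
        the last keyframe and in the current frame (thresholds illustrative)."""
        disp = np.linalg.norm(kpts_cur - kpts_ref, axis=1)   # per-match flow length
        ratio = np.mean(disp > min_disp_px)                  # fraction of "strong" matches
        return ratio >= min_ratio

    # toy usage with synthetic tracks
    rng = np.random.default_rng(0)
    ref = rng.random((100, 2)) * 640
    cur = ref + rng.normal(scale=30.0, size=(100, 2))        # simulated camera motion
    print("accept as keyframe:", strong_temporal_disparity(ref, cur))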

SAMSLAM: Simulated Annealing Monocular SLAM

SAMSLAM (Simulated Annealing Monocular SLAM) replaces the classic global Structure from Motion optimization approach - used to obtain both the 3D map and the camera pose - with a robust simulated annealing scheme. It works locally on triplets of successive overlapping keyframes, thus guaranteeing scale and 3D structure consistency. Each update step uses RANSAC and alternates between the registration of the three 3D maps associated with each image pair in the triplet and the refinement of the corresponding poses, by progressively limiting the allowable reprojection error. SAMSLAM requires neither global optimization nor loop closure. Moreover, it does not perform any back-correction of the poses and does not suffer from 3D map growth.

  • M. Fanfani, F. Bellavia, F. Pazzaglia and C. Colombo, "SAMSLAM: Simulated Annealing Monocular SLAM", CAIP, 2013 | PDF | Poster
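
A minimal sketch of the annealing idea, assuming OpenCV is available: the pose of a keyframe is repeatedly re-estimated by RANSAC PnP while the allowed reprojection error is progressively tightened. The schedule and thresholds are assumptions, and the alternation with the registration of the three pairwise 3D maps of the triplet is omitted.

    import numpy as np
    import cv2

    def anneal_pose(pts3d, pts2d, K, n_steps=5, start_px=8.0, end_px=1.0):
        """Refine a camera pose by repeated RANSAC PnP with a progressively
        tighter reprojection-error bound (annealing-like schedule).
        pts3d: (N, 3) float32 map points, pts2d: (N, 2) float32 image points,
        K: 3x3 intrinsics. Schedule and thresholds are illustrative only."""
        rvec, tvec = None, None
        for thr in np.linspace(start_px, end_px, n_steps):
            if rvec is None:
                # first step: no initial guess available
                ok, rvec, tvec, inl = cv2.solvePnPRansac(
                    pts3d, pts2d, K, None,
                    iterationsCount=200, reprojectionError=float(thr))
            else:
                # later steps: start from the previous estimate
                ok, rvec, tvec, inl = cv2.solvePnPRansac(
                    pts3d, pts2d, K, None, rvec, tvec, True,
                    iterationsCount=200, reprojectionError=float(thr))
            if not ok:
                break
        return rvec, tvec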

Fast keyframe selection for Visual SfM using DWAFS

DWAFS (Double Window Adaptive Frame Selection) is a new fast online preprocessing strategy to detect and discard ongoing bad frames (too blurry or without relevant content changes) in video sequences. Unlike keyframe selectors and deblurring methods, the proposed approach does not require complex, time-consuming image processing steps such as the computation of image feature keypoints, previous poses, or 3D structure. The method can directly filter the video input of a Structure from Motion pipeline, improving the final 3D reconstruction by discarding noisy and irrelevant frames while also decreasing the total computation cost. DWAFS is based on the gradient percentile statistics of the input frames, combined with an adaptive decision strategy based on a dangling sampling window that adapts to the ongoing values and the last best ones.

  • F. Bellavia, M. Fanfani and C. Colombo, "Fast adaptive frame preprocessing for 3D reconstruction", VISAPP, 2015 | PDF | Slides
 
Back to top

3D Reconstruction

3D Map Computation from Historical Stereo Photographs of Florence

This work deals with the analysis of historical photos, with the final objective of reconstructing the old structures in 3D so as to ease their comparison with the present-day scene. In particular, we use the stereograms of Anton Hautmann - one of the most active photographers working in Florence in the middle of the 19th century. This work has been carried out within the project TRAVIS (Tecniche di Realtà Aumentata per la Visualizzazione di Immagini Storiche), funded by Ente Cassa di Risparmio di Firenze (bando Giovani Ricercatori Protagonisti, 2015).

  • M. Fanfani, F. Bellavia, G. Bassetti, F. Argenti, and C. Colombo, "3D Map Computation from Historical Stereo Photographs of Florence", Heri-Tech, IOP, 2018 | PDF | Slides

2D to 3D semi-automatic image conversion for stereoscopic displays

This project describes the development of a fast and effective 2D-to-3D conversion scheme to render 2D images on stereoscopic displays. The stereo disparities of all scene elements (including background and foreground) are computed after statistical segmentation and geometric localization of the ground plane. An original algorithm is devised for recovering 3D visual parameters from planar homologies. The theatrical model employed for the scene provides an effective 3D impression of the displayed scene, and the method is fast enough to process video sequences.

  • D. Comanducci, A. Maki, C. Colombo, and R. Cipolla, "2D-to-3D photo rendering for 3D displays", 3DPVT, 2010 | PDF | Poster

3D change detection

Tracking the structural evolution of a site has important applications, ranging from documenting the excavation progress during an archaeological campaign to hydro-geological monitoring. We developed a simple yet effective method that exploits vision-based reconstructed 3D models of a time-changing environment to automatically detect any geometric changes in it. Changes are localized by direct comparison of time-separated 3D point clouds according to a majority voting scheme based on three criteria that compare the density, shape and distribution of 3D points.

  • M. Fanfani and C. Colombo, "Structural change detection by direct 3D model comparison", VISAPP, 2019 | PDF | Slides

LaserGun: Hybrid 3D reconstruction combining visual odometry and laser scanning

LaserGun combines visual odometry techniques with active laser scanning triangulation. After the initial calibration of the system components (i.e., the camera intrinsic parameters and the laser plane), laser profiles are employed to extract the 3D structure of the object, while visual odometry information is used to track and merge the data from the different frames. According to the experimental results, the system achieves greater accuracy when planar homography decomposition is used to track the camera instead of a monocular SLAM approach.

  • M. Fanfani and C. Colombo, "LaserGun: a Tool for Hybrid 3D Reconstruction", ICVS, 2013 | PDF | Slides

MagicBox: Photometric stereo for accurate leather fabric reproduction

MagicBox is a hardware/software tool designed for the accurate acquisition of 3D surfaces using photometric stereo, employed to obtain high-quality digitized reproductions of leather fabric samples. MagicBox combines a hardware module that controls the acquisition environment, illuminating the input object under different lights, with a software module that assembles the final virtual fabric result.
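
In the Lambertian case, photometric stereo reduces to a per-pixel least-squares problem; the sketch below is the textbook formulation under known distant light directions, not the MagicBox software itself.

    import numpy as np

    def photometric_stereo(images, light_dirs):
        """Recover per-pixel surface normals and albedo from grayscale float
        images taken under known, distant light directions (Lambertian model).
        `images`: list of (H, W) arrays, `light_dirs`: (M, 3) unit vectors."""
        I = np.stack([im.reshape(-1) for im in images])           # (M, P) intensities
        G, *_ = np.linalg.lstsq(light_dirs, I, rcond=None)        # (3, P): albedo * normal
        albedo = np.linalg.norm(G, axis=0)
        normals = G / np.maximum(albedo, 1e-8)
        h, w = images[0].shape
        return normals.T.reshape(h, w, 3), albedo.reshape(h, w)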

 
Back to top

Local Image Descriptors

Robust keypoint matching with sGLOH-based descriptors

The sGLOH descriptor is able to handle discrete rotations of the keypoint patch by a simple permutation of its vector components. sGLOH can be used in combination with a global or a priori orientation estimation to filter keypoint correspondences, thus improving the matches. sGLOH2 extends the descriptor by concatenating two sGLOH descriptors of the same patch with a relative rotation offset, improving the original robustness and discriminability when intermediate rotations occur. In addition, an adaptive, general and fast matching scheme can be used to significantly reduce both computation time and memory usage, while a binarization based on comparisons inside each descriptor histogram yields the more compact and faster, yet still robust, BisGLOH2 alternative. sGLOH-based descriptors come with an exhaustive comparative experimental evaluation on both image matching and object recognition, according to which the proposed descriptors achieve state-of-the-art results.
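
The rotation handling by permutation can be sketched as follows: if the descriptor is stored as m angular blocks, a patch rotation by a multiple of 2*pi/m is a cyclic shift of the blocks, so matching takes the minimum distance over all shifts. The (m, k) layout below is a simplified illustration, not the exact sGLOH bin arrangement.

    import numpy as np

    def sgloh_distance(d1, d2):
        """Rotation-aware distance between two sGLOH-like descriptors stored as
        (m, k) arrays of m angular blocks: rotating the patch by 2*pi/m shifts
        the blocks cyclically, so take the minimum distance over all m shifts."""
        dists = [np.linalg.norm(d1 - np.roll(d2, r, axis=0)) for r in range(d2.shape[0])]
        return min(dists), int(np.argmin(dists))   # best distance and rotation index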

Evaluation of recent local image descriptors for image matching

A comparison of the best and most recent local image descriptors on planar and non-planar scenes under viewpoint changes is presented. This evaluation, aimed at assessing descriptor suitability for real-world applications, leverages the Approximated Overlap error, which naturally extends to non-planar scenes the standard overlap metric used for planar scenes. According to the evaluation results, most descriptors exhibit a gradual performance degradation in the transition from planar to non-planar scenes. The best descriptors are those capable of capturing well not only the local image context, but also the global scene structure. Deep learned descriptors are shown to have reached the matching robustness and accuracy of the best handcrafted descriptors.

  • F. Bellavia and C. Colombo, "An evaluation of recent local image descriptors for real-world applications of image matching", MVA, 2019 (to appear) | PDF
 
Back to top

Image Stitching and Super-resolution

Best reference homography estimation for planar mosaicing

A mosaicing pipeline is developed to globally reduce the distortion induced by a wrong viewpoint selection, i.e., a bad choice of the mosaic reference homography. In particular, the input sequence is split into almost planar sub-mosaics that are then merged hierarchically by a bottom-up approach according to their overlap error when reprojected through the "average homography". Given two sub-mosaics, the average homography is defined as the homography that minimizes the average point shift from the original coordinates when points are mapped using, as reference, either the first or the second sub-mosaic.

  • F. Bellavia and C. Colombo, "Estimating the best reference homography for planar mosaics from videos", VISAPP, 2015 | PDF | Slides
  • F. Bellavia, M. Fanfani, F. Pazzaglia, C. Colombo, et al., "Piecewise planar underwater mosaicing", Oceans, 2015 | PDF
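
A rough sketch of the underlying criterion: for a candidate reference homography, measure the average shift that reprojection induces on a set of points, and prefer the candidate with the smallest shift. The helper names and the way candidates are generated are assumptions, and the sketch does not reproduce the paper's exact definition of the average homography over two sub-mosaics.

    import numpy as np
    import cv2

    def mean_point_shift(H, pts):
        """Average displacement of `pts` (N, 2) when reprojected through the
        candidate reference homography H (a rough proxy for mosaic distortion)."""
        proj = cv2.perspectiveTransform(pts.reshape(-1, 1, 2).astype(np.float32), H)
        return float(np.mean(np.linalg.norm(proj.reshape(-1, 2) - pts, axis=1)))

    def pick_reference(candidate_homographies, pts):
        """Choose the candidate reference homography minimizing the average point shift."""
        shifts = [mean_point_shift(H, pts) for H in candidate_homographies]
        return int(np.argmin(shifts))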

Compositional framework to analyse and design novel color correction methods for image stitching

Starting from the observation that any color correction method can be decomposed into two main computational units, we defined a new compositional framework for classifying color correction methods, which allowed us both to investigate existing methods from a new perspective and to develop new, more effective solutions to the problem. The framework was used to dissect 15 of the best color correction algorithms. The computational units so derived, with the addition of 4 new units, were then reassembled in a combinatorial way to originate about 100 distinct color correction methods, most of which had never been considered before. The above color correction methods were tested on three existing datasets, including both real and artificial color transformations, plus a novel dataset of real image pairs categorized according to the kind of color alterations induced by specific acquisition setups. Comparative results show that combinations of the computational units newly designed for this work are the most effective for real stitching scenarios, regardless of the specific source of color alteration. This is achieved by employing monotone cubic splines to locally model the correction function, which also take into account the gradient of both the source and target images so as to preserve the image structure.
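
A minimal sketch of the spline-based modeling unit, assuming SciPy: a monotone cubic (PCHIP) mapping is fitted between corresponding quantiles of the source and target channels on the overlap region. The quantile-matching strategy and the number of knots are assumptions, and the gradient-preserving term of the actual method is omitted.

    import numpy as np
    from scipy.interpolate import PchipInterpolator

    def channel_correction(src_overlap, tgt_overlap, n_knots=16):
        """Fit a monotone cubic spline mapping source-channel values to target
        values on the overlap region by matching corresponding quantiles."""
        q = np.linspace(0, 100, n_knots)
        x = np.percentile(src_overlap, q)
        y = np.percentile(tgt_overlap, q)
        x, idx = np.unique(x, return_index=True)     # PCHIP needs strictly increasing knots
        return PchipInterpolator(x, y[idx], extrapolate=True)

    # usage: corrected = channel_correction(src_vals, tgt_vals)(source_channel)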

Image mosaicing with high parallax scenes

An original image-based rendering approach based on fundamental matrices is employed here to obtain accurate image mosaics from scenes with high parallax (and thus not suitable for classical homography-based mosaicing techniques). In particular, visual information is transferred from one image to any other thanks to epipolar propagation on the connected graph of fundamental matrices, while SIFT dense stereo matching is used to obtain the output mosaic. Additionally, epipolar relations are employed to correctly handle occlusions induced by the parallax.

  • A. Nardi, D. Comanducci, and C. Colombo, "Augmented vision: Seeing beyond field of view and occlusions via uncalibrated visual transfer from multiple viewpoints", IMVIP, 2011 (Best paper award) | PDF
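
The transfer step can be sketched directly from the epipolar geometry: the position of a correspondence in a third view is the intersection of the two epipolar lines induced by the first two views. Degenerate configurations (e.g., points on the trifocal plane) are ignored in this sketch.

    import numpy as np

    def transfer_point(x1, x2, F13, F23):
        """Epipolar transfer: given a correspondence x1 <-> x2 in views 1 and 2,
        its position in view 3 is the intersection of the epipolar lines
        F13 @ x1 and F23 @ x2 (convention: x3^T F13 x1 = 0). x1, x2 are 2D."""
        l1 = F13 @ np.append(x1, 1.0)
        l2 = F23 @ np.append(x2, 1.0)
        x3 = np.cross(l1, l2)                 # homogeneous intersection of the two lines
        return x3[:2] / x3[2]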

Multi-image super-resolution of corneal endothelium

In collaboration with VISIA Imaging s.p.a., we developed a practical and effective method to compute a high-resolution image of the corneal endothelium starting from a low-resolution video sequence obtained with a general purpose slit lamp biomicroscope. This is achieved through an SVM-based learning approach for identifying the most suitable endothelium video frames, followed by a robust graph-based mosaicing registration. An image quality typical of dedicated and more expensive confocal microscopes is obtained using only low-cost equipment, which makes the method a valid and affordable diagnostic tool for medical practice in developing countries.

  • D. Comanducci and C. Colombo, "Vision-Based Magnification of Corneal Endothelium Frames", ICVS, 2013 | PDF | Poster
  • D. Comanducci, F. Bellavia and C. Colombo, "Super-resolution based magnification of endothelium cells from biomicroscope videos of the cornea", JEI, 2018 | PDF
 
Back to top

Image Forensics

Computer vision based image cropping detection

Principal point (PP) estimation can be used in image forensic analysis to detect image manipulations such as asymmetric cropping or image splicing. ACID (Automatic Cropped Image Detector) is a fully automated detector for exposing evidence of asymmetric image cropping. The proposed solution estimates and exploits the camera principal point, i.e., a physical feature extracted directly from the image content that is quite insensitive to image processing operations, such as compression and resizing, typical of social media platforms. Robust computer vision techniques are employed throughout, so as to cope with large sources of noise in the data and improve detection performance. The method leverages a novel metric based on robust statistics, and is also capable of deciding autonomously whether the image at hand is tractable or not.

  • M. Iuliani, M. Fanfani, C. Colombo and A. Piva, "Reliability Assessment of Principal Point Estimates for Forensic Applications", JVCI, 2016 | PDF
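
A toy check in the spirit of the method: compare the estimated principal point with the image center and flag a suspicious offset. The relative threshold is an arbitrary placeholder and does not correspond to the statistical reliability assessment developed in the paper.

    import numpy as np

    def cropping_evidence(principal_point, image_size, rel_threshold=0.05):
        """Flag possible asymmetric cropping when the estimated principal point
        lies far from the image center. `image_size` is (width, height);
        the relative threshold is illustrative only."""
        center = np.array(image_size, dtype=float) / 2.0
        offset = np.linalg.norm(np.asarray(principal_point, dtype=float) - center)
        return offset / np.hypot(*image_size) > rel_threshold, offset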

PRNU pattern alignment for images and videos based on scene content

A novel approach for registering the PRNU pattern between different acquisition modes is presented. This method relies on the imaged scene content: image registration is achieved by establishing correspondences between local descriptors; the result can optionally be refined by maximizing the PRNU correlation. The proposed scene-based approach for PRNU pattern alignment is suitable for video source identification in multimedia forensics applications.

  • F. Bellavia, M. Iuliani, M. Fanfani, C. Colombo and A. Piva, "PRNU pattern alignment for images and videos based on scene content", ICIP, 2019 | PDF | Source code | Dataset
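
A minimal sketch of the scene-based registration step, assuming OpenCV: a homography is estimated from ORB correspondences between frames of the two acquisition modes and then applied to the PRNU estimate of the first mode. The optional refinement by maximizing PRNU correlation is not included, and the function name and parameters are illustrative.

    import numpy as np
    import cv2

    def scene_based_alignment(img_a, img_b, prnu_a):
        """Register acquisition mode A onto mode B using the imaged scene content
        (ORB keypoints + RANSAC homography), then warp A's PRNU estimate with
        the same transform. img_a, img_b: 8-bit grayscale frames."""
        orb = cv2.ORB_create(4000)
        ka, da = orb.detectAndCompute(img_a, None)
        kb, db = orb.detectAndCompute(img_b, None)
        matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(da, db)
        src = np.float32([ka[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kb[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
        h, w = img_b.shape[:2]
        return cv2.warpPerspective(prnu_a, H, (w, h)), H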

Statistically accurate measurements from a single image

Geometric methods of computer vision have been applied to extract accurate measurements from video frames in a sports justice case concerning an international bridge tournament. In particular, calibration parameters were extracted from the card table, and sub-pixel edge detection of the cards was employed to obtain very accurate measurements.

 
Back to top

Aids for Visually Impaired People

Obstacle detection on smartphones

An effective obstacle detection application running on smartphones was developed to help visually impaired people. The system uses an SfM approach, modified to exploit the phone gyroscope data for more reliable position information. A robust RANSAC-based approach is used on the estimated 3D structure to detect the principal plane and localize out-of-plane objects, which are marked as obstacles.

  • A. Caldini, M. Fanfani and C. Colombo, "Smartphone-based obstacle detection for the visually impaired", ICIAP, 2015 | PDF | Poster
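
The plane/obstacle separation can be sketched with a standard RANSAC plane fit on the reconstructed 3D points; points far from the dominant plane are the obstacle candidates. Thresholds and iteration counts are illustrative, not the application's settings.

    import numpy as np

    def ransac_plane(points, n_iter=500, inlier_thr=0.05, seed=0):
        """Fit the dominant (ground) plane to an (N, 3) point cloud with RANSAC
        and return the plane plus the mask of out-of-plane points (obstacle
        candidates). Thresholds are illustrative."""
        rng = np.random.default_rng(seed)
        best_inliers = np.zeros(len(points), dtype=bool)
        best_plane = None
        for _ in range(n_iter):
            p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
            n = np.cross(p1 - p0, p2 - p0)
            if np.linalg.norm(n) < 1e-9:
                continue                              # degenerate sample, skip
            n = n / np.linalg.norm(n)
            dist = np.abs((points - p0) @ n)
            inliers = dist < inlier_thr
            if inliers.sum() > best_inliers.sum():
                best_inliers, best_plane = inliers, (n, p0)
        return best_plane, ~best_inliers              # obstacles = off-plane points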

BusAlarm: bus line number detection

BusAlarm is a smartphone application that automatically reads the bus line number, assisting visually impaired people in taking public transport and improving their autonomy in daily activities. BusAlarm combines machine learning with geometric and template matching approaches and OCR techniques to correctly detect the incoming bus, find the line number location and output the final answer to the user.

  • C. Guida, D. Comanducci, and C. Colombo, "Automatic bus line number localization and recognition on mobile phones - a computer vision aid for the visually impaired", ICIAP, 2011 | PDF | Poster
  • Presentation at the workshop organized by the Andrea Bocelli Foundation, 2012 | Slides | Video
 
Back to top