OpenCV Stereo – Depth image generation and filtering with Python 3+, ximgproc and OpenCV 3+


Dear readers, today we are going to look at how to generate a depth image from two stereo images. I know that a tutorial for this exists in the OpenCV docs. Although the OpenCV functions are implemented quite well in Python, there seem to be some misunderstandings about how exactly to port the code. I will present the code to you step by step. At the bottom of the post you can find the complete code to copy/paste. If you don't know how to get opencv-contrib working, take a look at this >>post.

Please note that in order to run this example you will need a pair of stereo images:

>>imgL and >>imgR

from the OpenCV docs tutorial found here.

First, let us import the NumPy and OpenCV packages. (This snippet often circulates with a normalize import from scikit-learn; it is never actually used, since OpenCV's own cv2.normalize handles the normalization at the end, so we leave it out.)

import numpy as np
import cv2

Let's now load the images, which sit in the same folder: the image for the left eye and the image for the right eye:

print('loading images...')
imgL = cv2.imread('imgL.jpg')  # downscale images for faster processing if you like
imgR = cv2.imread('imgR.jpg')
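
If a path is wrong, imread() silently returns None, so a quick sanity check saves debugging time. The commented-out lines below are a sketch of the downscaling mentioned above; the 0.5 factor is just an example value:

if imgL is None or imgR is None:
    raise IOError('could not load imgL.jpg / imgR.jpg - check the paths')
# optional downscale for faster matching (0.5 is an arbitrary example factor):
# imgL = cv2.resize(imgL, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)
# imgR = cv2.resize(imgR, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)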

Now comes the interesting part: we define the parameters for the SGBM matcher. These parameters can be changed, although I recommend the values below for standard web image sizes:

# SGBM Parameters -----------------
window_size = 3                     # default 3; use 5 or 7 for reduced-size images, 15 for full-size images (1300px wide and above); 5 works nicely

left_matcher = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=160,             # max disparity; must be divisible by 16, e.g. 192 or 256 for MODE_HH
    blockSize=5,
    P1=8 * 3 * window_size ** 2,    # smoothness penalty for small disparity changes; 8 * channels * window_size**2 as recommended by the docs
    P2=32 * 3 * window_size ** 2,   # smoothness penalty for large disparity jumps; must be greater than P1
    disp12MaxDiff=1,
    uniquenessRatio=15,
    speckleWindowSize=0,
    speckleRange=2,
    preFilterCap=63,
    mode=cv2.STEREO_SGBM_MODE_SGBM_3WAY
)

Next we define the right_matcher so we can use it for our filtering later. This is a simple one-liner:

right_matcher = cv2.ximgproc.createRightMatcher(left_matcher)

To obtain hole-free depth images we can use the WLS filter from ximgproc. This filter also requires some parameters, which are shown below:

# FILTER Parameters
lmbda = 80000   # amount of regularization; larger values make the filtered edges adhere more to the source image edges
sigma = 1.2     # sensitivity to source image edges; typical values range from 0.8 to 2.0

wls_filter = cv2.ximgproc.createDisparityWLSFilter(matcher_left=left_matcher)
wls_filter.setLambda(lmbda)
wls_filter.setSigmaColor(sigma)

Now we can compute the two disparity maps and convert the results to the int16 format (CV_16S, as OpenCV calls it) that the filter expects:

print('computing disparity...')
displ = left_matcher.compute(imgL, imgR)  # .astype(np.float32)/16
dispr = right_matcher.compute(imgR, imgL)  # .astype(np.float32)/16
displ = np.int16(displ)
dispr = np.int16(dispr)
filteredImg = wls_filter.filter(displ, imgL, None, dispr)  # important to put "imgL" here: the left view is the guide image!
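
A quick note on the raw values: StereoSGBM returns disparity as 16-bit fixed point with four fractional bits, which is why the commented-out .astype(np.float32)/16 appears above. If you need real-valued disparities (for depth computation, say), divide by 16. The WLS filter also computes a confidence map you can inspect; a small optional sketch:

# real-valued disparities: SGBM stores disparity multiplied by 16
disparity_left = displ.astype(np.float32) / 16.0

# confidence map (values 0..255) from the last filter() call,
# useful for masking unreliable pixels (optional)
conf_map = wls_filter.getConfidenceMap()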

Finally, if you show this image with imshow() you may not see anything. This is because the values are not normalized to an 8-bit range. So let's fix this by normalizing our depth map:

filteredImg = cv2.normalize(src=filteredImg, dst=filteredImg, alpha=0, beta=255, norm_type=cv2.NORM_MINMAX)
filteredImg = np.uint8(filteredImg)
cv2.imshow('Disparity Map', filteredImg)
cv2.waitKey()
cv2.destroyAllWindows()

That's it! You have done it. Please feel free to post code suggestions; no one is perfect. And remember: if your image is not showing well, most of the time you have not rectified the pictures correctly before stereo matching. This is a necessary step!
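
For completeness, here is a minimal rectification sketch. It assumes you already have calibration results; K1, D1, K2, D2 (intrinsics and distortion coefficients per camera) and R, T (rotation and translation between the cameras) are placeholder names for values you would obtain from cv2.stereoCalibrate:

# K1, D1, K2, D2, R, T: placeholder calibration results (see cv2.stereoCalibrate)
h, w = imgL.shape[:2]
RL, RR, PL, PR, Q, roiL, roiR = cv2.stereoRectify(K1, D1, K2, D2, (w, h), R, T)
mapLx, mapLy = cv2.initUndistortRectifyMap(K1, D1, RL, PL, (w, h), cv2.CV_32FC1)
mapRx, mapRy = cv2.initUndistortRectifyMap(K2, D2, RR, PR, (w, h), cv2.CV_32FC1)
imgL = cv2.remap(imgL, mapLx, mapLy, cv2.INTER_LINEAR)  # rectified left view
imgR = cv2.remap(imgR, mapRx, mapRy, cv2.INTER_LINEAR)  # rectified right view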

Stereo SGBM OpenCV result

CODE:
______

import numpy as np
import cv2

print('loading images...')
imgL = cv2.imread('imgL.jpg')  # downscale images for faster processing
imgR = cv2.imread('imgR.jpg')

# SGBM Parameters -----------------
window_size = 3                     # default 3; use 5 or 7 for reduced-size images, 15 for full-size images (1300px wide and above); 5 works nicely

left_matcher = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=160,             # max disparity; must be divisible by 16, e.g. 192 or 256 for MODE_HH
    blockSize=5,
    P1=8 * 3 * window_size ** 2,    # smoothness penalty for small disparity changes; 8 * channels * window_size**2 as recommended by the docs
    P2=32 * 3 * window_size ** 2,   # smoothness penalty for large disparity jumps; must be greater than P1
    disp12MaxDiff=1,
    uniquenessRatio=15,
    speckleWindowSize=0,
    speckleRange=2,
    preFilterCap=63,
    mode=cv2.STEREO_SGBM_MODE_SGBM_3WAY
)

right_matcher = cv2.ximgproc.createRightMatcher(left_matcher)

# FILTER Parameters
lmbda = 80000   # amount of regularization; larger values make the filtered edges adhere more to the source image edges
sigma = 1.2     # sensitivity to source image edges; typical values range from 0.8 to 2.0

wls_filter = cv2.ximgproc.createDisparityWLSFilter(matcher_left=left_matcher)
wls_filter.setLambda(lmbda)
wls_filter.setSigmaColor(sigma)

print('computing disparity...')
displ = left_matcher.compute(imgL, imgR)  # .astype(np.float32)/16
dispr = right_matcher.compute(imgR, imgL)  # .astype(np.float32)/16
displ = np.int16(displ)
dispr = np.int16(dispr)
filteredImg = wls_filter.filter(displ, imgL, None, dispr)  # important to put "imgL" here: the left view is the guide image!

filteredImg = cv2.normalize(src=filteredImg, dst=filteredImg, alpha=0, beta=255, norm_type=cv2.NORM_MINMAX)
filteredImg = np.uint8(filteredImg)
cv2.imshow('Disparity Map', filteredImg)
cv2.waitKey()
cv2.destroyAllWindows()


14 thoughts on “OpenCV Stereo – Depth image generation and filtering with Python 3+, ximgproc and OpenCV 3+”

  • Arindam

    Thanks for the script. But when I try to use this code for a video stream, the final depth image flickers a lot.
    Have you tried this on a live video stream? How did you fix the flickers?

  • SKR

    Hey T.S., nice and easy-to-follow article. I have a few questions and would appreciate some pointers, resources or insights please.
    1) You said that if the image is not showing well, it is perhaps due to rectification. How do you perform this step in OpenCV? Have you done that, or do you use already rectified images?
    2) You don't use any camera calibration parameters? Any reason for that?
    3) The image your code generates is a disparity map, while the title of the article says depth map generation. I guess disparity and depth are inversely proportional, so how does one compute depth from disparity?
    4) I am working towards computing depth from two videos, one taken from the front and another from the left. I have extracted time-aligned frames from both videos and am trying to compute (X,Y,Z) coordinates that signify the ROI from two frames at a time. I am more interested in spatial indexing than in the pixels, so that I can maintain the coordinates with respect to spatial location. Any ideas or suggestions would be greatly appreciated. Big THX

    • timsamart Post author

      Hey SKR!

      Ok so I will try to answer your questions the best I can but feel free to find additional info:

      to 1) and 2): So yes, in a real scenario you will need to rectify and lens-correct your images, and for that you will need the camera calibration parameters. This was not part of this tutorial, but maybe I will do a tutorial on the calibration too, since most errors come from a wrong or missing calibration of the cameras; most of the time this is the biggest problem. Also don't forget that you have to sync the cameras if you have moving objects!
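
      To give you a rough idea, here is a sketch (with hypothetical names, not tested code from this tutorial): you collect chessboard corners from many image pairs with cv2.findChessboardCorners, get initial intrinsics per camera with cv2.calibrateCamera, and then feed everything to cv2.stereoCalibrate:

      # objpoints: 3D chessboard points; imgpointsL/imgpointsR: matching 2D corners
      # K1, D1, K2, D2: per-camera intrinsics from cv2.calibrateCamera (placeholders)
      ret, K1, D1, K2, D2, R, T, E, F = cv2.stereoCalibrate(
          objpoints, imgpointsL, imgpointsR,
          K1, D1, K2, D2, (w, h),
          flags=cv2.CALIB_FIX_INTRINSIC)   # keep the per-camera intrinsics fixed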

      to 3) Yes, you are right! The disparity map is not necessarily a depth map. However, you can derive the depth of the pixels if you have a calibrated and rectified system: depth is inversely proportional to disparity, Z = f * B / d, where f is the focal length in pixels and B is the baseline between the cameras. I myself have never performed that exact task, but each grey level of your disparity map then maps to a specific depth via that (inverse, not linear) relationship.
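
      As a sketch (f and B below are placeholder values you would measure for your own rig):

      f = 700.0   # focal length in pixels (placeholder)
      B = 0.06    # baseline between the cameras in metres (placeholder)
      disp = displ.astype(np.float32) / 16.0   # SGBM stores disparity multiplied by 16
      valid = disp > 0                         # mask out invalid/occluded pixels
      depth = np.zeros_like(disp)
      depth[valid] = f * B / disp[valid]       # Z = f * B / d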

      to 4) So the most important thing in stereo-video depth reconstruction is synchronization of the camera shutters! I would say this is probably the hardest part to get right if you don't have an integrated system like this: stereocam (I am not in any way related to this company; this is just an example!). However, there have been some attempts to do this with synchronized cameras; I would search for "synchronized shutter diy" or similar.

      I hope this helps! 🙂


      • SKR

        Hey TS, many thanks for your insights. I was able to decode frames from a depth video and to generate (x,y,z) from an RGB and a depth video frame. The problem is that when I created a point cloud to do 3D reconstruction from the (x,y,z) world coordinates, the final cloud looks more like a 2D frame than a 3D image because the depth is not filled up, perhaps because of a lack of volume blending. Can you provide some pointers, or maybe just explain in your reply what volume blending is and how one can achieve it while doing 3D reconstruction? Thanks once again.

      • SKR

        Hey TS, just a quick question: do you have any idea what volume blending means and how one can achieve it, so that 3D reconstructed images look like a filled-up object instead of a plain image? I was unable to find good resources to read and understand it. Thanks in anticipation.