Dear readers, today we are going to look at how to generate a depth image using two stereo images. I know that there is already a tutorial in the OpenCV docs. Although the OpenCV functions are exposed quite well in Python, there seem to be some misunderstandings about how exactly to port the code. I will present the code to you step by step. At the bottom of the post you can find the complete code to copy/paste. If you don’t know how to get opencv-contrib working, take a look at this >>post.
Please note: in order to run this example you will need a set of stereo images, for example the pair from the OpenCV docs tutorial found here.
First, let us import the NumPy and OpenCV packages. For normalization's sake we can also import normalize from scikit-learn (note that the example below ends up using cv2.normalize instead, so this import is optional).
import numpy as np
from sklearn.preprocessing import normalize
import cv2

Let's now load the images, the image for the left eye and the image for the right eye, from the same folder:
print('loading images...')
imgL = cv2.imread('imgL.jpg')  # downscale images for faster processing if you like
imgR = cv2.imread('imgR.jpg')

Now comes the interesting part: we define the parameters for SGBM. These parameters can be changed, although I recommend using the values below for standard web image sizes:
# SGBM Parameters -----------------
window_size = 3  # wsize default 3; 5; 7 for SGBM reduced size image; 15 for SGBM full size image (1300px and above); 5 works nicely
left_matcher = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=160,  # max_disp has to be divisible by 16, e.g. 160, 192, 256
    blockSize=5,
    P1=8 * 3 * window_size ** 2,
    P2=32 * 3 * window_size ** 2,
    disp12MaxDiff=1,
    uniquenessRatio=15,
    speckleWindowSize=0,
    speckleRange=2,
    preFilterCap=63,
    mode=cv2.STEREO_SGBM_MODE_SGBM_3WAY
)

This leads us to define the right_matcher so we can use it for our filtering later. This is a simple one-liner:

right_matcher = cv2.ximgproc.createRightMatcher(left_matcher)

To obtain hole-free depth images we can use the WLS filter. This filter also requires some parameters, which are shown below:
# FILTER Parameters
lmbda = 80000
sigma = 1.2
visual_multiplier = 1.0

wls_filter = cv2.ximgproc.createDisparityWLSFilter(matcher_left=left_matcher)
wls_filter.setLambda(lmbda)
wls_filter.setSigmaColor(sigma)

Now we can compute the disparities and convert the resulting images to the desired int16 format, or as OpenCV names it, CV_16S, for our filter:

print('computing disparity...')
displ = left_matcher.compute(imgL, imgR)  # .astype(np.float32)/16
dispr = right_matcher.compute(imgR, imgL)  # .astype(np.float32)/16
displ = np.int16(displ)
dispr = np.int16(dispr)
filteredImg = wls_filter.filter(displ, imgL, None, dispr)  # important to put "imgL" here!!!

Finally, if you show this image with imshow() you may not see anything. This is because the values are not normalized to an 8-bit format. So let's fix this by normalizing our depth map:

filteredImg = cv2.normalize(src=filteredImg, dst=filteredImg, beta=0, alpha=255, norm_type=cv2.NORM_MINMAX)
filteredImg = np.uint8(filteredImg)
cv2.imshow('Disparity Map', filteredImg)
cv2.waitKey()
cv2.destroyAllWindows()

That's it! You have done it. Please feel free to post code suggestions, no one is perfect. And remember: if your image is not showing well, most of the time you have not rectified the pictures correctly before using stereo matching! This is a necessary step! (A minimal rectification sketch follows after the result image below.)
Stereo SGBM opencv result
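Rectification itself is not covered by the code above, but here is a minimal sketch of how it could be done with OpenCV, assuming you already have calibration results. The names K1, D1, K2, D2 (camera matrices and distortion coefficients) and R, T (rotation and translation between the cameras) are placeholders from your own stereo calibration and are not part of this tutorial.

# Minimal rectification sketch (not part of the tutorial code).
# K1, D1, K2, D2, R, T are assumed to come from your own stereo calibration.
import cv2

h, w = imgL.shape[:2]
R1, R2, P1_rect, P2_rect, Q, roi1, roi2 = cv2.stereoRectify(K1, D1, K2, D2, (w, h), R, T)
mapLx, mapLy = cv2.initUndistortRectifyMap(K1, D1, R1, P1_rect, (w, h), cv2.CV_32FC1)
mapRx, mapRy = cv2.initUndistortRectifyMap(K2, D2, R2, P2_rect, (w, h), cv2.CV_32FC1)
imgL_rect = cv2.remap(imgL, mapLx, mapLy, cv2.INTER_LINEAR)
imgR_rect = cv2.remap(imgR, mapRx, mapRy, cv2.INTER_LINEAR)
# feed imgL_rect / imgR_rect into left_matcher.compute() instead of the raw images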
CODE:
import numpy as np
from sklearn.preprocessing import normalize
import cv2

print('loading images...')
imgL = cv2.imread('imgL.jpg')  # downscale images for faster processing
imgR = cv2.imread('imgR.jpg')

# SGBM Parameters -----------------
window_size = 3  # wsize default 3; 5; 7 for SGBM reduced size image; 15 for SGBM full size image (1300px and above); 5 works nicely
left_matcher = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=160,  # max_disp has to be divisible by 16, e.g. 160, 192, 256
    blockSize=5,
    P1=8 * 3 * window_size ** 2,
    P2=32 * 3 * window_size ** 2,
    disp12MaxDiff=1,
    uniquenessRatio=15,
    speckleWindowSize=0,
    speckleRange=2,
    preFilterCap=63,
    mode=cv2.STEREO_SGBM_MODE_SGBM_3WAY
)
right_matcher = cv2.ximgproc.createRightMatcher(left_matcher)

# FILTER Parameters
lmbda = 80000
sigma = 1.2
visual_multiplier = 1.0

wls_filter = cv2.ximgproc.createDisparityWLSFilter(matcher_left=left_matcher)
wls_filter.setLambda(lmbda)
wls_filter.setSigmaColor(sigma)

print('computing disparity...')
displ = left_matcher.compute(imgL, imgR)  # .astype(np.float32)/16
dispr = right_matcher.compute(imgR, imgL)  # .astype(np.float32)/16
displ = np.int16(displ)
dispr = np.int16(dispr)
filteredImg = wls_filter.filter(displ, imgL, None, dispr)  # important to put "imgL" here!!!

filteredImg = cv2.normalize(src=filteredImg, dst=filteredImg, beta=0, alpha=255, norm_type=cv2.NORM_MINMAX)
filteredImg = np.uint8(filteredImg)
cv2.imshow('Disparity Map', filteredImg)
cv2.waitKey()
cv2.destroyAllWindows()
For some reason I get that ximgproc is not an attribute of cv2, any ideas?
Hey Gabriel!
You need cv2 plus the contrib modules to run this example. I would suggest installing one of the opencv wheels with "+contrib" in the name:
https://www.lfd.uci.edu/~gohlke/pythonlibs/#opencv
just be sure to choose the correct wheel for your python installation. For a quick tutorial I would suggest taking a look at my post on my python setup:
http://timosam.com/opencv-3-contributions-python-3-numpy-intel-mkl-support-many
If none of the versions suits you, you will need to compile OpenCV 3 with the contrib modules for Python yourself, which can be a long process if you haven't done it before.
Happy to help,
Cheers
Did you try doing this in a live video stream? My depth image is flickering when I try to do it in a live video stream. Any ideas?
Hi Arindam,
this does not work out of the box because each frame is computed without knowledge of the previous frame. If you want to use it on a video stream without flickering you have to take time-related transformations into account. I would suggest you take a look at this video: https://www.youtube.com/watch?v=ZUinHSjUZNM. There should also be some code available here: https://github.com/rachillesf/stereoMagic. I hope this helps 🙂
Cheers
Hello, sorry for coming back to this but this comment was interesting for me. What do you mean by “take into account time related transformations”?
Hello, what do you mean by "take into account time related transformations"?
hi, Arindam. Have you solved this problem? I could really use your help!
You have to understand that the computation is done on every single frame independently. The slightest variation or noise between frames can result in a different depth computation, which shows up as flickering.
That is also what I meant by time-related transformations: transformations between two consecutive frames. A minimal smoothing sketch follows below.
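This is not from the original discussion, but a minimal sketch of one simple way to reduce the flicker, assuming you compute filteredImg for every frame as in the post: blend the current disparity map with the previous one using an exponential moving average. The names smooth_disparity, prev_disp and alpha_t are hypothetical and only for illustration.

# Hypothetical temporal smoothing of the disparity map between frames.
# prev_disp holds the smoothed result of the previous frame; alpha_t controls
# how strongly the current frame is weighted (the value is just an example).
import cv2
import numpy as np

alpha_t = 0.3
prev_disp = None

def smooth_disparity(current_disp):
    global prev_disp
    current = current_disp.astype(np.float32)
    if prev_disp is None:
        prev_disp = current
    else:
        # exponential moving average: new = alpha * current + (1 - alpha) * previous
        prev_disp = cv2.addWeighted(current, alpha_t, prev_disp, 1.0 - alpha_t, 0)
    return prev_disp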
Thanks for the script. But when I try to use this code for a video stream, the final depth image flickers a lot.
Have you tried this on a live video stream? How did you fix the flickers?
Hey T.S., nice and easy-to-follow article. I have a few questions and would appreciate some pointers, resources, or insights.
1) You said that if the image is not showing well then perhaps it is due to rectification. How do you perform this step in OpenCV? Have you done that, or do you use already rectified images?
2) You don’t use any camera calibration parameters? Any reason for that?
3) The image your code generates is a disparity map, while the title of the article is depth map generation. I guess disparity and depth are inversely proportional, so how do you compute depth from disparity?
4) I am working towards computation of depth from two videos, one taken from front and another from left. I have extracted time-aligned frames from both videos and am trying to compute (X,Y,Z) coordinates that signify the ROI from two frames at a time. I am more interested in spatial indexing rather than the pixels so that I can maintain the coordinates with respect to spatial location. Any ideas or suggestions would be greatly appreciated. Big THX
Hey SKR!
OK, I will try to answer your questions as best I can, but feel free to look for additional info:
To 1) and 2): Yes, in a real scenario you will need to rectify and lens-correct your images, and for that you will need the camera calibration parameters. This was not part of this tutorial; however, maybe I will do a tutorial on calibration too, since most errors come from a wrong or missing calibration of the cameras. Most of the time this is the biggest problem. Also don't forget that you have to sync the cameras if you have moving objects!
To 3) Yes, you are right! A disparity map is of course not necessarily a depth map. However, you can derive the depth of the pixels if you have a calibrated system and measure the distance of some computed pixels in a real-world scenario. I myself have never performed that exact task, but since depth is inversely proportional to disparity (Z = f * B / d for a calibrated, rectified setup), the mapping from grey level to depth is not linear; see the small sketch after this reply.
To 4) The most important thing in stereo-video depth reconstruction is synchronization of the camera shutters! I would say this is probably the most difficult part to achieve if you don't have an integrated system like this: stereocam (I am not in any way related to this company, this is just an example!). However, there have been some attempts to do this with synchronized cameras; I would search for "synchronized shutter diy" or similar.
I hope this helps! 🙂
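Not part of the original reply, but a minimal sketch of the disparity-to-depth relation mentioned above, for a calibrated, rectified setup: Z = f * B / d. The values focal_px and baseline_m are placeholder calibration values (assumptions, not from the tutorial), and displ is the raw SGBM output from the post.

# Hypothetical disparity-to-depth conversion: Z = f * B / d.
# focal_px and baseline_m are assumed calibration values, not part of the tutorial.
import numpy as np

focal_px = 700.0    # focal length in pixels (example value)
baseline_m = 0.12   # distance between the two cameras in meters (example value)

disparity = displ.astype(np.float32) / 16.0  # SGBM stores disparity in fixed point, scaled by 16
valid = disparity > 0                        # avoid division by zero for invalid pixels
depth_m = np.zeros_like(disparity)
depth_m[valid] = focal_px * baseline_m / disparity[valid]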
Thanks a lot TS. Your insights are really helpful. I will keep an eye for your future articles on calibration.
Happy to help! 🙂
Hey TS, many thanks for your insights. I was able to decode frames from a depth video and to generate (x,y,z) from an RGB and a depth video frame. The problem is that when I create a point cloud for 3D reconstruction from the (x,y,z) world coordinates, the final cloud looks more like a 2D frame than a 3D image because the depth is not filled in, perhaps because of a lack of volume blending. Can you provide some pointers, or maybe just let me know in your reply what volume blending is and how one can achieve it while doing 3D reconstruction? Thanks once again.
Could you tell me how to proceed in getting the pose of an object from a stereo vision camera?
Hey TS, just a quick question: do you have any idea what volume blending means and how one can achieve it, so that 3D reconstructed images look like a filled-up object instead of a flat image? I was unable to find good resources to read and understand it. Thanks in anticipation.
The image that is calculated is basically a point cloud you can use to reconstruct a 3D surface (a minimal OpenCV sketch for getting such a point cloud follows below this reply). The problem is that if the reconstruction is not possible due to errors, or if it does not have enough points, you will not obtain a surface you can work with.
For volume blending, in order to achieve a 3D reconstruction of a complete model, I recommend the software from https://www.agisoft.com
A good video of an approach used by the Kinect is this: https://www.youtube.com/watch?v=XD_UnuWSaoU
Good Luck!
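Not part of the original reply, but a minimal sketch of how the disparity map from the post could be turned into a colored point cloud with OpenCV, assuming you have the Q matrix from cv2.stereoRectify (i.e. from your own calibration; it is not produced by the tutorial code).

# Hypothetical point-cloud generation from the tutorial's disparity map.
# Q is the 4x4 disparity-to-depth mapping matrix returned by cv2.stereoRectify
# (assumed to come from your own calibration).
import cv2
import numpy as np

disparity = displ.astype(np.float32) / 16.0       # undo SGBM's fixed-point scaling
points_3d = cv2.reprojectImageTo3D(disparity, Q)  # (H, W, 3) array of X, Y, Z
colors = cv2.cvtColor(imgL, cv2.COLOR_BGR2RGB)
mask = disparity > disparity.min()                # drop invalid/background pixels
out_points = points_3d[mask]
out_colors = colors[mask]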
Pingback: Reconstructing 3D Models from The Last Jedi – Terence Eden's Blog
Hello, I'm from Indonesia. I got stuck here, please help me 🙂
Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 14:57:15) [MSC v.1915 64 bit (AMD64)] on win32
Type “help”, “copyright”, “credits” or “license()” for more information.
>>>
======================== RESTART: C:\Python37\111.py ========================
loading images…
Traceback (most recent call last):
File “C:\Python37\111.py”, line 23, in
right_matcher = cv2.ximgproc.createRightMatcher(left_matcher)
AttributeError: module 'cv2.cv2' has no attribute 'ximgproc'
please install the additional dependencies for opencv as seen in my other post: http://timosam.com/opencv-3-contributions-python-3-numpy-intel-mkl-support-many
Hey T.S.
This code works great for acquiring the disparity map. But I want to ask how to retrieve the real disparity values from filteredImg. Since OpenCV uses int16 to represent the original disparity, I think it is still necessary to divide all values in filteredImg by 16. However, when I try to divide the values by 16 in order to get the real disparity values, the results seem strange and I think they are wrong. Do you have any ideas about this issue?
You have to calculate the correct distance yourself, using a reference object at a known distance, triangulation, or a similar approach.
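Not part of the original reply, but a small sketch of where the factor 16 comes in, assuming you work on the raw WLS output: the SGBM matchers (and the WLS filter) should keep the disparity in fixed point with four fractional bits, so dividing the unnormalized int16 result by 16 gives the disparity in pixels. Once the map has been passed through cv2.normalize to 0-255 as in the post, that scale is lost, which may explain the strange values.

# Retrieve real disparity values (in pixels) from the raw filter output,
# i.e. BEFORE the cv2.normalize / np.uint8 step of the tutorial.
import numpy as np

raw_filtered = wls_filter.filter(displ, imgL, None, dispr)  # int16, fixed point (x16)
disparity_px = raw_filtered.astype(np.float32) / 16.0       # disparity in pixels
valid = disparity_px > 0                                    # non-positive values are invalid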
Hey,
Thanks for the blog, super helpful. When I run the program it gives the following error: “Failed to allocate 4294456596 bytes in function”. I tracked down the memory error to left_matcher.compute. Any idea how to fix this error?
Thanks!
Is it possible that your image is too big?
Can you get the real distance (depth) from the disparity image?
You have to calculate the correct distance yourself, using a reference object at a known distance, triangulation, or a similar approach.
Do the pixels of this disparity image represent the real distance from the camera? E.g., if the pixel (30, 40) has the value 37, what does that 37 represent? Does it represent 37 mm, or what?
You have to calculate the correct distance yourself, using a reference object at a known distance, triangulation, or a similar approach.
When I run this code I get the following error:
wls_filter = cv2.ximgproc.createDisparityWLSFilter(matcher_left=left_matcher)
AttributeError: module 'cv2.cv2' has no attribute 'ximgproc'
please install the additional dependencies for opencv as seen in my other post: http://timosam.com/opencv-3-contributions-python-3-numpy-intel-mkl-support-many
Hi!
When I process a 384 x 288 PNG image, only half of the image is shown at the end of the process.
How can I solve this problem?
Check the parameters used and check where the images are coming from: are the cameras synchronized, are they moving, etc.?
After I have computed the disparity, how do I calculate the distance between objects and the camera using the disparity values?
You have to calculate the correct distance yourself, using a reference object at a known distance, triangulation, or a similar approach.
Hi,
My problem is to find point-to-point correspondences between my left and right images. At first, I thought I would generate a disparity map and reproject points from the disparity map onto my original left and right images. So far, I've created the disparity map and a 3D point cloud from it and discovered that the disparity map's coordinate axes align with those of the left image (which means each pixel in the disparity map can be found at exactly the same position in the left image), but I'm still having trouble finding corresponding locations in my right image. Is this doable using the disparity map at all?
I again want to point out that as for now, my concern is to find point-to-point correspondence between my left and right images and not the generation of the disparity/depth map.
Thanks.
You can generate the disparity map for both images. Does that help you further?
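Not part of the original reply, but a minimal sketch of how the left disparity map already encodes the point-to-point correspondence for rectified images: a pixel (x, y) in the left image maps to (x - d, y) in the right image, where d is the disparity at (x, y). The helper right_correspondence is hypothetical and only for illustration; displ is the raw SGBM output from the post.

# Hypothetical lookup of the corresponding right-image pixel for a left-image pixel,
# assuming rectified images and the left disparity map from the tutorial (displ).
import numpy as np

disparity = displ.astype(np.float32) / 16.0   # real disparity in pixels

def right_correspondence(x_left, y):
    d = disparity[y, x_left]
    if d <= 0:
        return None                           # no valid match for this pixel
    return (int(round(x_left - d)), y)        # same row, shifted left by the disparity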
Hi, thanks for sharing. I want to get a depth map with a VIS-NIR binocular camera, and I used the code you shared, but the result seems bad. Do you have any suggestions?
Hey parker,
There are many different things that could go wrong. E.g. what camera do you use exactly? Are the images synchronized? Is the resolution high enough? Is there enough picture information in a greyscale image?
Maybe specify your problem further.
Good luck
Cheers
Hey Gabriel,
Awesome tutorial. I removed the black border on the left side by padding the image and then cropping it. I was wondering if you were able to replicate the KITTI-2015 results using the SGBM Python algorithm. I currently get inconsistent results, with some images giving a very high percentage of error.
Hey AK,
Unfortunately not, but I am sure that newer methods should give better results right now. It can depend on many things, from your code/parameters to compressed images or similar errors.
Good Luck!
Hello Tim, first of all I'd like to thank you for sharing this great code with us.
I have a question. I'm trying to use your code to generate an actual depth map that holds actual distances, but the values are not very accurate and fluctuate a lot.
Could you give me your opinion?
Hey Ahmed,
the depth mapping achieved with this method relies a lot on the equipment used. It is of course hard to give you an opinion without knowing your data, but I would look into: are the images synchronized? Are the images compressed? Is the stereo setup configured correctly?
I would suggest looking also to other possible algorithms because in recent years there was a lot of good work done in the field. One point of entry would be this: http://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=stereo
I strongly suggest reading this paper https://arxiv.org/pdf/2003.10432v1.pdf and following the references in Chapter 2, Related Work!
Cheers and all the best in those times!