Computer Vision Challenge 1: Augmented Reality
This is a challenge we’re working through in the Silicon Valley Computer Vision Meetup. Feel free to follow along on your own.
The source and supporting files are available in a GitHub repo.
The first challenge is to implement augmented reality (AR). This means tracking the world as seen through a camera and superimposing computer-generated imagery on top of it. You can see an example in the following photo, where a 3D graphic of a globe is superimposed over a real-world image. In this case, the position of the globe is based on the position of the black square printed on the piece of paper. (The white square is for orientation, and will only be needed later. To begin with we will use a target with only a solid black square.) The black square is an example of a fiducial marker. Marker-based AR uses markers like these (or more sophisticated ones that encode data) to work out where the camera is with respect to the world.
Like most computer vision projects, this will require a pipeline. The basic pipeline will do the following:
Run video from a camera through OpenCV and out to the screen.
Separate the black square from the rest of the image.
Find the contours (2d coordinates for the outline of the square).
Get the four vertices of the square from the contours.
Map the 2d coordinates of the corners to their coordinates in 3d.
Draw a 3d object relative to the square’s position in 3d.
Use the second smaller square to orient your 3d drawing.
Each of these steps will teach you something about computer vision and OpenCV. If you don’t complete the whole thing, don’t worry. However far you get, you’ll learn something.
Step 0: Install OpenCV
You can’t use OpenCV until you install it. Unfortunately, OpenCV has a fairly complicated install process because of all the dependencies it pulls in. Here are some suggestions to make things easier:
Mac: Use your favorite package manager to install OpenCV. For example, to install with MacPorts:
Install MacPorts from the MacPorts website.
Update MacPorts by issuing the command “sudo port selfupdate” from the command line.
Install OpenCV by entering the command “sudo port install opencv”. This will take several hours.
Linux: Use your system package manager to install OpenCV.
Windows: No idea. Try using the Windows installer on the OpenCV home page maybe?
Step 1: Video from Camera to Screen
This step will introduce you to OpenCV’s HighGUI module. The HighGUI module provides a simple cross-platform way of drawing windows and reading from the camera, among other things.
Rather than walk you through this, I’ve written the code for you. Take a look at step1.cpp on GitHub. The line “cap >> image;” reads the next frame from the image capture device into an OpenCV Mat object. The line “cv::imshow("image", image);” puts up a window titled “image” and shows the frame in it.
Virtually all of the code you write for this challenge will be between those two lines of code. You capture the image with “cap >> image”, process it somehow, and show the result with “cv::imshow”.
Important! In addition to calling “cap >> image” and “cv::imshow”, you also need to call “cv::waitKey” to give the system time to process events. If you don’t call cv::waitKey, the image window may mysteriously stop updating under certain circumstances.
Before you go any further, build and run the step1 program. This will show that you’ve got everything hooked up properly.
Step 2: Separate the black square from the rest of the image.
Note: steps 2 through 6 use the solid square marker. Click on the link and print it out. You may want to scale your print so there’s plenty of white border around the black square.
In step 3, we’re going to use the findContours call to find the outline of the black square. But findContours wants a single-channel binary image (anything that isn’t 0 is treated as a 1). So we’ll have to modify the image before findContours does its magic.
First, we’ll need to convert the image from color to grayscale. This is done with the cvtColor call. Then we’ll need to threshold the image using either threshold or adaptiveThreshold.
Step 3: Find the contours.
Now that we have a properly prepared image, we can find the contours. This involves a single call to findContours.
Step 4: Find the four corners of the square.
findContours returns a vector of contours. From that, we’ll need to extract the polygon for the square. One approach is to use approxPolyDP to find polygon approximations of the contours, then search for polygons with exactly four vertices. See the squares example in your OpenCV samples directory for details.
Step 5: Map the 2d coordinates to 3d.
For this step, we’ll use the call solvePnP to map the 2d points onto the 3d points in our model of the world.
Note that two of the parameters solvePnP takes are a camera matrix and a vector of distortion coefficients. To compute these, you’ll need to calibrate your camera, then pass the computed values to solvePnP.
Step 6: Draw something.
Once you have computed the camera and object position with solvePnP, you can use the results to project 3d points onto the image using projectPoints. Then you can use the OpenCV drawing functions to draw a line, for example, that moves with the marker in the image.
Step 7: Orient your drawing with the small white square.
Up until now, the order in which we use the square’s vertices has been arbitrary. In real-world situations, you probably want to keep track of the marker’s orientation, so you can orient a complex drawing consistently as the marker rotates from the point of view of the camera.
To do this, we’ll switch from the solid square marker to the marker with the small white square. Print out this marker like you did the solid one.
We start the same way we did before: find the large square. However, once we’ve done that, we then find the small square. We’ll designate the vertex of the large square closest to the small square as “first”.
That’s all there is to it!
References
This exercise was inspired in part by chapter 2 of Mastering OpenCV with Practical Computer Vision Projects from Packt Publishing. That chapter walks through building a more advanced AR pipeline. The chapter is also available as an article on the Packt website. If you’re not developing for iPhone, you can skip down to the section titled “Marker Detection”, after which point all the code is platform-independent C++.
There is also a full marker-based AR library built on top of OpenCV called ArUco that does almost all of the work for you. I used ArUco to build my iPhone AR demo.