Introduction
In this article, we will learn to write Python scripts that pre-process images. Because we will use OpenCV to do so, I will also discuss OpenCV and the OpenCV functions that we need.
OpenCV
OpenCV is a library aimed primarily at real-time computer vision. Originally developed by Intel, it was later supported by Willow Garage and then Itseez. The library is cross-platform and open source: it was distributed under the BSD license, and releases from 4.5.0 onward use the Apache 2 license. It was first released in June 2000.
OpenCV Python API Commands
Some OpenCV commands that we will be using from this article onwards:
1. To Read the Video/Image
- cv2.VideoCapture()
To initialize the VideoCapture object
- cv2.VideoCapture(filename)
To load the filename into the VideoCapture object
- cv2.VideoCapture(device)
To load the device id into the VideoCapture object
- cv2.VideoCapture.open(filename)
To load and open the filename into the VideoCapture object
- cv2.VideoCapture.open(device)
To load and open the device into the VideoCapture object
- cv2.VideoCapture.release()
To close the video file or capturing device
- cv2.VideoCapture.read([image])
To grab, decode and return the next video frame as a (retval, image) pair; retval is False when no frame could be read
where
- filename
Name of the opened video file (ex. video.avi) or an image sequence (ex. img_%02d.jpg, which will read samples img_00.jpg, img_01.jpg, and so on)
- device
Id of the opened video capturing device (i.e., a camera index). If there is a single camera connected, just pass 0.
2. To Write the Video/Image
- cv2.VideoWriter()
To create default VideoWriter Object
- cv2.VideoWriter([filename, fourcc, fps, framesize [,isColor]])
To create parameterized VideoWriter Object
- cv2.VideoWriter.isOpened()
To check whether the VideoWriter object was initialized successfully
- cv2.VideoWriter.open([filename, fourcc, fps, framesize [,isColor]])
To open the VideoWriter object to start writing
- cv2.VideoWriter.write(image)
To write the next frame to the video
where
- filename
Name of the output video file
- framesize
Size of the video frame as (width, height)
- fps
Frame rate of the created/loaded video
- fourcc
4 character code of codec used to compress the frames
- isColor
If not zero, the encoder will expect and encode color frames; otherwise, it will work with grayscale frames
3. To Change the Size
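dst = cv2.resize(src, dsize[, dst[, fx[, fy[, interpolation]]]])
where
- src
Input image (a NumPy array, not a file name or path)
- dst
Output image (a NumPy array); it has the size dsize, or the size computed from fx and fy
- dsize
Size of the output image as (width, height). If it is None, it is computed as
dsize = (round(fx*src.cols), round(fy*src.rows))
- fx
Scale factor along the horizontal axis. If it is 0, it is computed as
fx = (double) dsize.width/src.cols
- fy
Scale factor along the vertical axis. If it is 0, it is computed as
fy = (double) dsize.height/src.rows
- interpolation
Interpolation method, for example cv2.INTER_LINEAR (the default), cv2.INTER_NEAREST, cv2.INTER_AREA or cv2.INTER_CUBIC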
4. To Change the Color of Image/Frame of Video
dst = cv2.cvtColor(src, code[, dst[, dstCn]])
where
- src
Input image (a NumPy array)
- dst
Output image (a NumPy array)
- dstCn
Number of channels in the output image; if it is 0, it is derived from src and code
- code
Color space conversion code
- cv2.COLOR_BGR2GRAY
To convert BGR to Gray
- cv2.COLOR_RGB2GRAY
To convert RGB to Gray
- cv2.COLOR_GRAY2BGR
To convert Gray to BGR
- cv2.COLOR_GRAY2RGB
To convert Gray to RGB
5. To Construct Rectangle
- cv2.rectangle(img, pt1, pt2, color [, thickness [, lineType [, shift ]]] )
- cv2.rectangle(img, rec, color [, thickness [, lineType [, shift ]]] )
where
- img
Image to draw on (a NumPy array); it is modified in place
- pt1
Vertex of rectangle
- pt2
Vertex of rectangle opposite to pt1
- color
Rectangle color or brightness
- thickness
Thickness of lines
- lineType
Type of line
- shift
Number of fractional bits in the point coordinates
6. To Read an Image
cv2.imread(filename [, flags ])
where
- flags
- cv2.IMREAD_COLOR
- cv2.IMREAD_ANYDEPTH
- cv2.IMREAD_GRAYSCALE
- >0, return a 3-channel color image
- =0, return a grayscale image
- <0, return the loaded image as it is (with the alpha channel, if present)
7. To Write an Image
cv2.imwrite(filename, img[, params])
where
- params
- cv2.IMWRITE_JPEG_QUALITY, value from 0 to 100 (higher means better quality)
- cv2.IMWRITE_PNG_COMPRESSION, value from 0 to 9 (higher means a smaller file and a longer compression time)
- cv2.IMWRITE_PXM_BINARY, value 0 or 1
8. For Canny Edge Detection
- edges= cv2.Canny(image, threshold1, threshold2 [, edges [, apertureSize [, L2gradient ]]])
- edges = cv2.Canny(dx, dy, threshold1, threshold2[, edges[, L2gradient]])
where
- image
Single-channel 8-bit input image
- edges
Output edge map, it has the same size and type as image
- threshold1
1st threshold for the hysteresis procedure
- threshold2
2nd threshold for the hysteresis procedure
- apertureSize
Aperture size for the Sobel() operator
- L2gradient
A flag indicating whether the more accurate L2 norm should be used to calculate the image gradient magnitude (L2gradient=True), or whether the default L1 norm is enough (L2gradient=False).
Image Pre-Processing
Many may think that we can start processing an image directly, but that is not the case. To get the best out of an image or video frame, we first have to pre-process it. Pre-processing means augmenting or normalizing an image or video frame so that every input reaches the next stage in the same form. Let me give you an example. Suppose you have four images of 1024x1024, 380x280, 100x100 and 400x50 pixels, and you have to find which one shows a monkey. To process these images as they are, we would have to handle a different size each time, because the dimensions change from image to image. To solve this, we increase or decrease the number of pixels in each image until all of them share one size (the dsize value passed to cv2.resize). This process of increasing or decreasing the number of pixels is part of what we call pre-processing.
Image Processing Application
Let us start. In this article, we will not interact with the pre-trained models that we downloaded in the previous article, because that requires the OpenVINO Python API, which we will discuss in the coming article. This article only shows how we process, or rather pre-process, an image or video frame.
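import cv2
import numpy as np

def preprocessing(input_image, height, width):
    # Resize to the model's input size; cv2.resize expects (width, height)
    image = cv2.resize(input_image, (width, height))
    # Reorder from (height, width, channel) to (channel, height, width)
    image = image.transpose((2, 0, 1))
    # Add the batch dimension: (1, 3, height, width)
    image = image.reshape(1, 3, height, width)
    return image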
In the above code, we resize the incoming frame to the desired height and width. After that, we perform the transpose operation, which converts the image from (height, width, channel) format to (channel, height, width) format. Finally, we reshape the image to a [batch_size, number_of_channels, height, width] array, which is the input format required by the pre-trained models.
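To see what the transpose and reshape steps do, here is a sketch with NumPy only. The tiny 2x3 array is an assumption standing in for a resized frame, so that the shapes are easy to follow.

```python
import numpy as np

height, width = 2, 3
# Stand-in for a resized BGR frame in (height, width, channel) layout
hwc = np.arange(height * width * 3).reshape(height, width, 3)

chw = hwc.transpose((2, 0, 1))             # (channel, height, width)
nchw = chw.reshape(1, 3, height, width)    # (batch, channel, height, width)

print(hwc.shape, chw.shape, nchw.shape)    # (2, 3, 3) (3, 2, 3) (1, 3, 2, 3)

# Channel 0 of the result holds exactly the first channel of every pixel
print(np.array_equal(nchw[0, 0], hwc[:, :, 0]))  # True
```

The transpose moves whole channels to the front without mixing pixel values, and the reshape only adds a leading batch dimension of size 1.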
def pose_estimation(input_image):
    preprocessed_image = np.copy(input_image)
    preprocessed_image = preprocessing(preprocessed_image, 256, 456)
    return preprocessed_image

def text_detection(input_image):
    preprocessed_image = np.copy(input_image)
    preprocessed_image = preprocessing(preprocessed_image, 768, 1280)
    return preprocessed_image

def car_meta(input_image):
    preprocessed_image = np.copy(input_image)
    preprocessed_image = preprocessing(preprocessed_image, 72, 72)
    return preprocessed_image
In the above code, we pre-process each of the given images to the dimensions required by the corresponding model.
Note:
You can refer to the official documentation of the corresponding model to find the dimensions.
Now that the preprocessing code is written, we will write the control logic. Our aim here is to check that the images load and that we are able to preprocess them.
POSE_IMAGE = cv2.imread("sitting-on-car.jpg")
TEXT_IMAGE = cv2.imread("sign.jpg")
CAR_IMAGE = cv2.imread("blue-car.jpg")

test_names = ["Pose Estimation", "Text Detection", "Car Meta"]
In the above code, we load the images and assign the names of the tests that we will perform.
def set_solution_functions():
    global solution_funcs
    solution_funcs = {
        test_names[0]: pose_solution,
        test_names[1]: text_solution,
        test_names[2]: car_solution,
    }

def pose_solution(input_image):
    return preprocessing(input_image, 256, 456)

def text_solution(input_image):
    return preprocessing(input_image, 768, 1280)

def car_solution(input_image):
    return preprocessing(input_image, 72, 72)
The above code sets up the reference solutions. The set_solution_functions() function fills the global solution_funcs dictionary, which maps each test name to a function that returns the expected preprocessed image.
def test_pose():
    comparison = test(pose_estimation, test_names[0], POSE_IMAGE)
    return comparison


def test_text():
    comparison = test(text_detection, test_names[1], TEXT_IMAGE)
    return comparison


def test_car():
    comparison = test(car_meta, test_names[2], CAR_IMAGE)
    return comparison


def test(test_func, test_name, test_image):
    # Run the function under test; report any error instead of crashing
    try:
        s_processed = test_func(test_image)
    except Exception:
        print_exception(test_name)
        # Count a crashed test as failed so that results can still be summed
        return False

    # Compare against the reference solution for the same test
    solution = solution_funcs[test_name](test_image)
    comparison = np.array_equal(s_processed, solution)
    print_test_result(test_name, comparison)

    return comparison
In the above code, we declare all the test functions. In each of the test-specific functions, we pass the function on which the test has to be performed, the name of the test, and the image.
In the "test" function, we first check whether we are able to get any output; if not, we print an exception message for the corresponding test. The criterion to pass the test is that the preprocessed image returned by the tested function should be equal to the one returned by the corresponding solution function.
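The comparison relies on np.array_equal, which is True only when both arrays have the same shape and the same values. A quick sketch with made-up arrays:

```python
import numpy as np

expected = np.zeros((1, 3, 4, 4), dtype=np.uint8)

same = np.copy(expected)
wrong_shape = np.zeros((1, 3, 4, 5), dtype=np.uint8)
wrong_value = np.copy(expected)
wrong_value[0, 0, 0, 0] = 1

print(np.array_equal(same, expected))         # True
print(np.array_equal(wrong_shape, expected))  # False
print(np.array_equal(wrong_value, expected))  # False
```

So a test fails if the output has the wrong dimensions, and also if even a single pixel value differs.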
def print_exception(test_name):
    print("Failed to run test on {}.".format(test_name))
    print("The code should be valid Python and return the preprocessed image.")

def print_test_result(test_name, result):
    if result:
        print("Passed {} test.".format(test_name))
    else:
        print("Failed {} test, did not obtain expected preprocessed image.".format(test_name))
The above are the helper functions needed by the "test" function.
def feedback(tests_passed):
    print("You passed {} of 3 tests.".format(int(tests_passed)))
    if tests_passed == 3:
        print("Congratulations!")
    else:
        print("See above for additional feedback.")
The above code prints the final output: if you pass all 3 tests, it prints "Congratulations!"; otherwise, it asks you to look at the messages above to see which tests did not pass.
Output
Passed Pose Estimation test.
Passed Text Detection test.
Passed Car Meta test.
You passed 3 of 3 tests.
Congratulations!
I have attached all 3 images that I used and the Python scripts with proper formatting and comments.
Conclusion
In this article, I discussed how we pre-process an image so that we can get the maximum benefit from it. In the coming articles, I will tell you how we can combine the pre-trained models and the preprocessing to make an application that can annotate an image with its characteristics.