Introduction
In this tutorial, we will create an air canvas application that lets users draw in the air using hand gestures. We will use OpenCV for image processing and drawing, and MediaPipe's machine-learning-based hand-tracking model to interpret the hand movements. The goal is an interactive drawing experience that requires no contact with any physical device.
Prerequisites
Before we start, ensure you have the following libraries installed: OpenCV (opencv-python), NumPy, and MediaPipe.
You can install these libraries using pip:
pip install opencv-python numpy mediapipe
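If you want to verify the installation, a quick sanity check is to import each package and print its version; any reasonably recent versions should work with the mp.solutions API used in this tutorial:
import cv2
import numpy as np
import mediapipe as mp

# Confirm the packages import correctly and print their versions
print("OpenCV:", cv2.__version__)
print("NumPy:", np.__version__)
print("MediaPipe:", mp.__version__)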
Step-by-Step Guide
Step 1. Importing Libraries
We begin by importing the necessary libraries for our project:
import cv2
import numpy as np
import mediapipe as mp
from collections import deque
Step 2. Setting Up Color Points and Indices
We set up a list of deques per color to store the drawn strokes, an index per color that points at the current stroke, and a default selected color:
bpoints = [deque(maxlen=1024)]
gpoints = [deque(maxlen=1024)]
rpoints = [deque(maxlen=1024)]
ypoints = [deque(maxlen=1024)]
blue_index = 0
green_index = 0
red_index = 0
yellow_index = 0
colorIndex = 0  # currently selected color (0 = blue, 1 = green, 2 = red, 3 = yellow)
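To see why we use a list of deques rather than a single list: each deque holds one continuous stroke, and appending a fresh deque "lifts the pen" so the next point starts a new line. A toy illustration (the strokes name here is hypothetical, used only to demonstrate the idea):
from collections import deque

strokes = [deque(maxlen=1024)]      # one deque = one continuous stroke
strokes[0].appendleft((100, 100))   # first point of the first stroke
strokes[0].appendleft((110, 105))   # connected to the previous point
strokes.append(deque(maxlen=1024))  # pen lifted: start a second stroke
strokes[1].appendleft((200, 200))   # will not connect to the first stroke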
Step 3. Canvas Setup
Next, we create the canvas window where the drawing will appear and set up the buttons for color selection and clearing the canvas:
paintWindow = np.zeros((471, 636, 3)) + 255
paintWindow = cv2.rectangle(paintWindow, (40, 1), (140, 65), (0, 0, 0), 2)
paintWindow = cv2.rectangle(paintWindow, (160, 1), (255, 65), (255, 0, 0), 2)
paintWindow = cv2.rectangle(paintWindow, (275, 1), (370, 65), (0, 255, 0), 2)
paintWindow = cv2.rectangle(paintWindow, (390, 1), (485, 65), (0, 0, 255), 2)
paintWindow = cv2.rectangle(paintWindow, (505, 1), (600, 65), (0, 255, 255), 2)
cv2.putText(paintWindow, "CLEAR", (49, 33), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)
cv2.putText(paintWindow, "BLUE", (185, 33), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)
cv2.putText(paintWindow, "GREEN", (298, 33), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)
cv2.putText(paintWindow, "RED", (420, 33), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)
cv2.putText(paintWindow, "YELLOW", (520, 33), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)
cv2.namedWindow('Paint', cv2.WINDOW_AUTOSIZE)
Step 4. Initializing MediaPipe Hand Detection
We initialize MediaPipe Hands to detect at most one hand, with a minimum detection confidence of 0.7:
mpHands = mp.solutions.hands
hands = mpHands.Hands(max_num_hands=1, min_detection_confidence=0.7)
mpDraw = mp.solutions.drawing_utils
Step 5. Capturing Video from Webcam
We set up the webcam to capture video frames continuously:
cap = cv2.VideoCapture(0)
ret = True
while ret:
    ret, frame = cap.read()
    ...
Step 6. Processing Each Frame
Each frame is flipped horizontally to create a mirror effect and converted from BGR to RGB for MediaPipe processing:
frame = cv2.flip(frame, 1)
framergb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
Step 7. Drawing UI Elements on Frame
We draw the rectangles and text labels for color selection and clearing the canvas on each frame:
frame = cv2.rectangle(frame, (40, 1), (140, 65), (0, 0, 0), 2)
frame = cv2.rectangle(frame, (160, 1), (255, 65), (255, 0, 0), 2)
frame = cv2.rectangle(frame, (275, 1), (370, 65), (0, 255, 0), 2)
frame = cv2.rectangle(frame, (390, 1), (485, 65), (0, 0, 255), 2)
frame = cv2.rectangle(frame, (505, 1), (600, 65), (0, 255, 255), 2)
cv2.putText(frame, "CLEAR", (49, 33), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)
cv2.putText(frame, "BLUE", (185, 33), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)
cv2.putText(frame, "GREEN", (298, 33), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)
cv2.putText(frame, "RED", (420, 33), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)
cv2.putText(frame, "YELLOW", (520, 33), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)
Step 8. Hand Landmark Detection
We process the frame to detect hand landmarks using MediaPipe:
result = hands.process(framergb)
Step 9. Handling Detected Landmarks
If hand landmarks are detected, we scale each landmark from normalized coordinates to pixel coordinates and draw the landmarks on the frame. The forefinger tip (landmark 8) acts as the pointer: raising the thumb tip (landmark 4) close to it breaks the current stroke, pointing at the top toolbar selects a color or clears the canvas, and any other position appends the fingertip to the current stroke. When no hand is detected at all, we also break the stroke so old and new points are not joined:
if result.multi_hand_landmarks:
    landmarks = []
    for handslms in result.multi_hand_landmarks:
        for lm in handslms.landmark:
            # Landmark coordinates are normalized to [0, 1];
            # scale them to the frame size (assumed to be 640x480)
            lmx = int(lm.x * 640)
            lmy = int(lm.y * 480)
            landmarks.append([lmx, lmy])
        # Draw the detected hand skeleton on the frame
        mpDraw.draw_landmarks(frame, handslms, mpHands.HAND_CONNECTIONS)
    fore_finger = (landmarks[8][0], landmarks[8][1])  # forefinger tip
    center = fore_finger
    thumb = (landmarks[4][0], landmarks[4][1])  # thumb tip
    cv2.circle(frame, center, 3, (0, 255, 0), -1)
    if (thumb[1] - center[1] < 30):
        # Thumb raised close to the forefinger tip: treat it as a pen-up
        # and start a new stroke for every color
        bpoints.append(deque(maxlen=512))
        blue_index += 1
        gpoints.append(deque(maxlen=512))
        green_index += 1
        rpoints.append(deque(maxlen=512))
        red_index += 1
        ypoints.append(deque(maxlen=512))
        yellow_index += 1
    elif center[1] <= 65:
        # Fingertip is inside the toolbar at the top of the frame
        if 40 <= center[0] <= 140:  # Clear button
            bpoints = [deque(maxlen=512)]
            gpoints = [deque(maxlen=512)]
            rpoints = [deque(maxlen=512)]
            ypoints = [deque(maxlen=512)]
            blue_index = 0
            green_index = 0
            red_index = 0
            yellow_index = 0
            paintWindow[67:, :, :] = 255
        elif 160 <= center[0] <= 255:
            colorIndex = 0  # Blue
        elif 275 <= center[0] <= 370:
            colorIndex = 1  # Green
        elif 390 <= center[0] <= 485:
            colorIndex = 2  # Red
        elif 505 <= center[0] <= 600:
            colorIndex = 3  # Yellow
    else:
        # Fingertip is on the canvas: append it to the current stroke
        # of the selected color
        if colorIndex == 0:
            bpoints[blue_index].appendleft(center)
        elif colorIndex == 1:
            gpoints[green_index].appendleft(center)
        elif colorIndex == 2:
            rpoints[red_index].appendleft(center)
        elif colorIndex == 3:
            ypoints[yellow_index].appendleft(center)
else:
    # No hand detected: break the current stroke so the next point
    # does not connect to the last one
    bpoints.append(deque(maxlen=512))
    blue_index += 1
    gpoints.append(deque(maxlen=512))
    green_index += 1
    rpoints.append(deque(maxlen=512))
    red_index += 1
    ypoints.append(deque(maxlen=512))
    yellow_index += 1
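Step 10. Drawing the Strokes and Displaying the Output
The steps above only record fingertip positions; nothing is rendered yet. At the end of each loop iteration we would typically draw every stored stroke onto both the live frame and the paint window, show both windows, and release the camera once the loop exits. The sketch below is a minimal version of that logic; the colors list is an assumption here, with BGR values chosen to match the toolbar buttons:
# Assumed BGR colors matching the toolbar: blue, green, red, yellow
colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (0, 255, 255)]
points = [bpoints, gpoints, rpoints, ypoints]
for i in range(len(points)):          # each color
    for j in range(len(points[i])):   # each stroke of that color
        for k in range(1, len(points[i][j])):
            # Connect consecutive points of the same stroke with a line
            cv2.line(frame, points[i][j][k - 1], points[i][j][k], colors[i], 2)
            cv2.line(paintWindow, points[i][j][k - 1], points[i][j][k], colors[i], 2)

cv2.imshow("Output", frame)
cv2.imshow("Paint", paintWindow)
if cv2.waitKey(1) == ord('q'):  # press 'q' to quit
    break

# After the loop exits, release the webcam and close the windows:
cap.release()
cv2.destroyAllWindows()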
Conclusion
We have built an air canvas application using OpenCV and MediaPipe's machine-learning-based hand tracking. This project demonstrates the power of computer vision and hand tracking in creating interactive and immersive experiences. By leveraging these tools, users can draw in the air with simple hand gestures, opening up possibilities for innovative applications in art, education, and beyond.