Introduction
In this tutorial, we will create an air canvas application that lets users draw in the air using hand gestures. We will use OpenCV for image processing and drawing, and MediaPipe's machine-learning-based hand-tracking model to interpret the hand movements. The goal is an interactive drawing experience that requires no contact with any physical device.
Prerequisites
Before we start, ensure you have the following libraries installed: OpenCV (opencv-python), NumPy, and MediaPipe.
You can install these libraries using pip:
pip install opencv-python numpy mediapipe
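If you want to verify the installation, a quick sanity check is to import each package and print its version; any reasonably recent versions should work with the mp.solutions API used in this tutorial:
import cv2
import numpy as np
import mediapipe as mp

# Confirm the packages import correctly and print their versions
print("OpenCV:", cv2.__version__)
print("NumPy:", np.__version__)
print("MediaPipe:", mp.__version__)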
Step-by-Step Guide
Step 1. Importing Libraries
We begin by importing the necessary libraries for our project:
import cv2
import numpy as np
import mediapipe as mp
from collections import deque
Step 2. Setting Up Color Points and Indices
We set up a list of deques per color to store the drawn strokes, an index per color that points at the current stroke, and a default selected color:
bpoints = [deque(maxlen=1024)]
gpoints = [deque(maxlen=1024)]
rpoints = [deque(maxlen=1024)]
ypoints = [deque(maxlen=1024)]
blue_index = 0
green_index = 0
red_index = 0
yellow_index = 0
colorIndex = 0  # currently selected color (0 = blue, 1 = green, 2 = red, 3 = yellow)
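To see why we use a list of deques rather than a single list: each deque holds one continuous stroke, and appending a fresh deque "lifts the pen" so the next point starts a new line. A toy illustration (the strokes name here is hypothetical, used only to demonstrate the idea):
from collections import deque

strokes = [deque(maxlen=1024)]      # one deque = one continuous stroke
strokes[0].appendleft((100, 100))   # first point of the first stroke
strokes[0].appendleft((110, 105))   # connected to the previous point
strokes.append(deque(maxlen=1024))  # pen lifted: start a second stroke
strokes[1].appendleft((200, 200))   # will not connect to the first stroke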
Step 3. Canvas Setup
Next, we create the canvas window where the drawing will appear and set up the buttons for color selection and clearing the canvas:
paintWindow = np.zeros((471, 636, 3)) + 255
paintWindow = cv2.rectangle(paintWindow, (40, 1), (140, 65), (0, 0, 0), 2)
paintWindow = cv2.rectangle(paintWindow, (160, 1), (255, 65), (255, 0, 0), 2)
paintWindow = cv2.rectangle(paintWindow, (275, 1), (370, 65), (0, 255, 0), 2)
paintWindow = cv2.rectangle(paintWindow, (390, 1), (485, 65), (0, 0, 255), 2)
paintWindow = cv2.rectangle(paintWindow, (505, 1), (600, 65), (0, 255, 255), 2)
cv2.putText(paintWindow, "CLEAR", (49, 33), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)
cv2.putText(paintWindow, "BLUE", (185, 33), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)
cv2.putText(paintWindow, "GREEN", (298, 33), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)
cv2.putText(paintWindow, "RED", (420, 33), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)
cv2.putText(paintWindow, "YELLOW", (520, 33), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)
cv2.namedWindow('Paint', cv2.WINDOW_AUTOSIZE)
Step 4. Initializing MediaPipe Hand Detection
We initialize MediaPipe Hands to detect at most one hand, with a minimum detection confidence of 0.7:
mpHands = mp.solutions.hands
hands = mpHands.Hands(max_num_hands=1, min_detection_confidence=0.7)
mpDraw = mp.solutions.drawing_utils
Step 5. Capturing Video from Webcam
We set up the webcam to capture video frames continuously:
cap = cv2.VideoCapture(0)
ret = True
while ret:
    ret, frame = cap.read()
    ...
Step 6. Processing Each Frame
Each frame is flipped horizontally to create a mirror effect and converted from BGR to RGB for MediaPipe processing:
frame = cv2.flip(frame, 1)
framergb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
Step 7. Drawing UI Elements on Frame
We draw the rectangles and text labels for color selection and clearing the canvas on each frame:
frame = cv2.rectangle(frame, (40, 1), (140, 65), (0, 0, 0), 2)
frame = cv2.rectangle(frame, (160, 1), (255, 65), (255, 0, 0), 2)
frame = cv2.rectangle(frame, (275, 1), (370, 65), (0, 255, 0), 2)
frame = cv2.rectangle(frame, (390, 1), (485, 65), (0, 0, 255), 2)
frame = cv2.rectangle(frame, (505, 1), (600, 65), (0, 255, 255), 2)
cv2.putText(frame, "CLEAR", (49, 33), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)
cv2.putText(frame, "BLUE", (185, 33), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)
cv2.putText(frame, "GREEN", (298, 33), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)
cv2.putText(frame, "RED", (420, 33), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)
cv2.putText(frame, "YELLOW", (520, 33), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)
Step 8. Hand Landmark Detection
We process the frame to detect hand landmarks using MediaPipe:
result = hands.process(framergb)
Step 9. Handling Detected Landmarks
If hand landmarks are detected, we scale each landmark from normalized coordinates to pixel coordinates and draw the landmarks on the frame. The forefinger tip (landmark 8) acts as the pointer: raising the thumb tip (landmark 4) close to it breaks the current stroke, pointing at the top toolbar selects a color or clears the canvas, and any other position appends the fingertip to the current stroke. When no hand is detected at all, we also break the stroke so old and new points are not joined:
if result.multi_hand_landmarks:
    landmarks = []
    for handslms in result.multi_hand_landmarks:
        for lm in handslms.landmark:
            # Landmark coordinates are normalized to [0, 1];
            # scale them to the frame size (assumed to be 640x480)
            lmx = int(lm.x * 640)
            lmy = int(lm.y * 480)
            landmarks.append([lmx, lmy])
        # Draw the detected hand skeleton on the frame
        mpDraw.draw_landmarks(frame, handslms, mpHands.HAND_CONNECTIONS)
    fore_finger = (landmarks[8][0], landmarks[8][1])  # forefinger tip
    center = fore_finger
    thumb = (landmarks[4][0], landmarks[4][1])  # thumb tip
    cv2.circle(frame, center, 3, (0, 255, 0), -1)
    if (thumb[1] - center[1] < 30):
        # Thumb raised close to the forefinger tip: treat it as a pen-up
        # and start a new stroke for every color
        bpoints.append(deque(maxlen=512))
        blue_index += 1
        gpoints.append(deque(maxlen=512))
        green_index += 1
        rpoints.append(deque(maxlen=512))
        red_index += 1
        ypoints.append(deque(maxlen=512))
        yellow_index += 1
    elif center[1] <= 65:
        # Fingertip is inside the toolbar at the top of the frame
        if 40 <= center[0] <= 140:  # Clear button
            bpoints = [deque(maxlen=512)]
            gpoints = [deque(maxlen=512)]
            rpoints = [deque(maxlen=512)]
            ypoints = [deque(maxlen=512)]
            blue_index = 0
            green_index = 0
            red_index = 0
            yellow_index = 0
            paintWindow[67:, :, :] = 255
        elif 160 <= center[0] <= 255:
            colorIndex = 0  # Blue
        elif 275 <= center[0] <= 370:
            colorIndex = 1  # Green
        elif 390 <= center[0] <= 485:
            colorIndex = 2  # Red
        elif 505 <= center[0] <= 600:
            colorIndex = 3  # Yellow
    else:
        # Fingertip is on the canvas: append it to the current stroke
        # of the selected color
        if colorIndex == 0:
            bpoints[blue_index].appendleft(center)
        elif colorIndex == 1:
            gpoints[green_index].appendleft(center)
        elif colorIndex == 2:
            rpoints[red_index].appendleft(center)
        elif colorIndex == 3:
            ypoints[yellow_index].appendleft(center)
else:
    # No hand detected: break the current stroke so the next point
    # does not connect to the last one
    bpoints.append(deque(maxlen=512))
    blue_index += 1
    gpoints.append(deque(maxlen=512))
    green_index += 1
    rpoints.append(deque(maxlen=512))
    red_index += 1
    ypoints.append(deque(maxlen=512))
    yellow_index += 1
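Step 10. Drawing the Strokes and Displaying the Output
The steps above only record fingertip positions; nothing is rendered yet. At the end of each loop iteration we would typically draw every stored stroke onto both the live frame and the paint window, show both windows, and release the camera once the loop exits. The sketch below is a minimal version of that logic; the colors list is an assumption here, with BGR values chosen to match the toolbar buttons:
# Assumed BGR colors matching the toolbar: blue, green, red, yellow
colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (0, 255, 255)]
points = [bpoints, gpoints, rpoints, ypoints]
for i in range(len(points)):          # each color
    for j in range(len(points[i])):   # each stroke of that color
        for k in range(1, len(points[i][j])):
            # Connect consecutive points of the same stroke with a line
            cv2.line(frame, points[i][j][k - 1], points[i][j][k], colors[i], 2)
            cv2.line(paintWindow, points[i][j][k - 1], points[i][j][k], colors[i], 2)

cv2.imshow("Output", frame)
cv2.imshow("Paint", paintWindow)
if cv2.waitKey(1) == ord('q'):  # press 'q' to quit
    break

# After the loop exits, release the webcam and close the windows:
cap.release()
cv2.destroyAllWindows()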
Conclusion
We have built an air canvas application using OpenCV and MediaPipe's machine-learning-based hand tracking. This project demonstrates the power of computer vision and hand tracking in creating interactive and immersive experiences. By leveraging these tools, users can draw in the air with simple hand gestures, opening up possibilities for innovative applications in art, education, and beyond.