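Beginner Python Project: Build an Augmented Reality Drawing App with OpenCV and MediaPipe
In this Python project, we'll create a simple AR drawing app. Using your webcam and hand gestures, you can draw virtually on screen, customize the brush, and even save your work!
Setup
First, create a new folder and initialize a new virtual environment with the following commands: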
python -m venv venv
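./venv/Scripts/activate
The activation path above is for Windows; on macOS or Linux, run `source venv/bin/activate` instead.
Next, install the required libraries using pip or the installer of your choice: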
pip install mediapipe
pip install opencv-python
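Note
mediapipe may fail to install on the newest Python releases. At the time of writing, I was using Python 3.11.2. Make sure you use a compatible Python version.
Step 1: Capture the Webcam Feed
The first step is to set up your webcam and display the video feed. We'll use OpenCV's `VideoCapture` to access the camera and display frames continuously.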
import cv2

# The argument '0' specifies the default camera (usually the built-in webcam).
cap = cv2.VideoCapture(0)

# Start an infinite loop to continuously capture video frames from the webcam
while True:
    # Read a single frame from the webcam
    # `ret` is a boolean indicating success; `frame` is the captured frame.
    ret, frame = cap.read()

    # Check if the frame was successfully captured
    # If not, break the loop and stop the video capture process.
    if not ret:
        break

    # Flip the frame horizontally (like a mirror image)
    frame = cv2.flip(frame, 1)

    # Display the current frame in a window named 'Webcam Feed'
    cv2.imshow('Webcam Feed', frame)

    # Wait for a key press for 1 millisecond
    # If the 'q' key is pressed, break the loop to stop the video feed.
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the webcam resource to make it available for other programs
cap.release()

# Close all OpenCV-created windows
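cv2.destroyAllWindows()
**Did you know?**
When you use `cv2.waitKey()` in OpenCV, the returned key code may include extra bits depending on the platform. To detect key presses reliably, mask the result with `0xFF` to isolate the low 8 bits (the actual ASCII value). Without the mask, your key comparison may fail on some systems, so always use `& 0xFF` for consistent behavior!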
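As a minimal sketch of the masking described above, the snippet below uses hand-picked integers in place of real `cv2.waitKey()` return values, so it runs without a camera or a window. The helper name `is_key` and the sample value `0x100071` are illustrative assumptions, not codes guaranteed to appear on any particular platform.

```python
def is_key(raw_code: int, char: str) -> bool:
    """Return True if the low 8 bits of raw_code match the given character."""
    return (raw_code & 0xFF) == ord(char)


# Hypothetical raw code: ord('q') == 0x71 with an extra high bit set.
raw_with_extra_bits = 0x100071

# Comparing the raw value directly fails because of the extra bit.
plain_match = raw_with_extra_bits == ord('q')

# Masking keeps only 0x71, so the comparison succeeds.
masked_match = is_key(raw_with_extra_bits, 'q')

# cv2.waitKey() returns -1 when no key is pressed; masked, that becomes 255,
# which does not collide with any ASCII letter.
no_key_masked = -1 & 0xFF
```

Wrapping the comparison in a small helper like this also sidesteps having to remember the mask at every call site.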
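Step 2: Integrate Hand Detection
Using MediaPipe's Hands solution, we'll detect the hand and extract the positions of key landmarks such as the tips of the index and middle fingers.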
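MediaPipe reports each landmark's `x` and `y` as values normalized by the image width and height (roughly 0 to 1), which is why the loop multiplies them by `w` and `h`. The sketch below isolates that conversion so it can be tried without a camera. The helper `to_pixel` and its clamping are additions for illustration (the tutorial code does not clamp), and the sample numbers are made up.

```python
def to_pixel(x_norm: float, y_norm: float, width: int, height: int) -> tuple[int, int]:
    """Scale normalized landmark coordinates to integer pixel coordinates."""
    px = int(x_norm * width)
    py = int(y_norm * height)
    # Clamp so a landmark estimated slightly outside the frame stays drawable.
    px = min(max(px, 0), width - 1)
    py = min(max(py, 0), height - 1)
    return px, py


# Made-up landmark at the centre of a 640x480 frame.
center = to_pixel(0.5, 0.5, 640, 480)

# Made-up landmark estimated slightly past the right edge of the frame.
clamped = to_pixel(1.02, 0.25, 640, 480)
```

Clamping is a design choice rather than a requirement: OpenCV drawing calls generally tolerate off-frame coordinates, but clamped values are safer if the points are later used as array indices.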
import cv2
import mediapipe as mp

# Initialize the MediaPipe Hands module
mp_hands = mp.solutions.hands  # Load the hand-tracking solution from MediaPipe
hands = mp_hands.Hands(
    min_detection_confidence=0.9,
    min_tracking_confidence=0.9
)

cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Flip the frame horizontally to create a mirror effect
    frame = cv2.flip(frame, 1)

    # Convert the frame from BGR (OpenCV default) to RGB (MediaPipe requirement)
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Process the RGB frame to detect and track hands
    result = hands.process(frame_rgb)

    # If hands are detected in the frame
    if result.multi_hand_landmarks:
        # Iterate through all detected hands
        for hand_landmarks in result.multi_hand_landmarks:
            # Get the frame dimensions (height and width)
            h, w, _ = frame.shape

            # Calculate the pixel coordinates of the tip of the index finger
            cx, cy = int(hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].x * w), \
                     int(hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].y * h)

            # Calculate the pixel coordinates of the tip of the middle finger
            mx, my = int(hand_landmarks.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_TIP].x * w), \
                     int(hand_landmarks.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_TIP].y * h)

            # Draw a circle at the index finger tip on the original frame
            cv2.circle(frame, (cx, cy), 10, (0, 255, 0), -1)  # Green circle with radius 10

    # Display the processed frame in a window named 'Webcam Feed'
    cv2.imshow('Webcam Feed', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break  # Exit the loop if 'q' is pressed

# Release the webcam resources for other programs
cap.release()
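cv2.destroyAllWindows()
Step 3: Track the Finger Position and Draw
We'll track the index finger and allow drawing only while the tips of the index and middle fingers are farther apart than a threshold distance.
We'll keep a list of index fingertip coordinates to draw onto the original frame. Whenever the middle finger comes close enough to the index finger, we set a flag so that `None` is appended to the list before the next point, marking a break in the line.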
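The pen-up and pen-down logic can be tried without a camera. The following is a sketch under assumptions: `update_points` and `segments` are hypothetical helper names, fingertip positions are passed in as plain pixel tuples instead of MediaPipe landmarks, and the frame data is simulated. The default threshold of 40 pixels mirrors the value used in this step's code.

```python
import math


def update_points(points, pen_lifted, index_tip, middle_tip, threshold=40):
    """Record index_tip if the fingers are apart; return the new pen_lifted flag."""
    distance = math.hypot(middle_tip[0] - index_tip[0], middle_tip[1] - index_tip[1])
    if distance > threshold:
        if pen_lifted:
            points.append(None)  # Marks a break so separate strokes are not joined
        points.append(index_tip)
        return False
    return True  # Fingers together: stop drawing until they separate again


def segments(points):
    """Return (start, end) pairs for consecutive points not separated by None."""
    return [
        (a, b)
        for a, b in zip(points, points[1:])
        if a is not None and b is not None
    ]


points, lifted = [], False

# Simulated frames: (index fingertip, middle fingertip) in pixels.
frames = [
    ((100, 100), (100, 200)),  # apart: draw
    ((110, 100), (110, 200)),  # apart: draw
    ((120, 100), (125, 110)),  # close together: pen lifted
    ((200, 150), (200, 250)),  # apart again: a new stroke starts
    ((210, 150), (210, 250)),  # apart: draw
]
for index_tip, middle_tip in frames:
    lifted = update_points(points, lifted, index_tip, middle_tip)

lines = segments(points)
```

Each pair in `lines` corresponds to one `cv2.line(frame, start, end, brush_color, brush_size)` call in the full loop; the `None` entry is what keeps the two strokes from being connected.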
import cv2
import mediapipe as mp
import math

# Initialize the MediaPipe Hands module
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(
    min_detection_confidence=0.9,
    min_tracking_confidence=0.9
)

# Variables to store drawing points and reset state
draw_points = []  # A list to store points where lines should be drawn
reset_drawing = False  # Flag to indicate when the drawing should reset

# Brush settings
brush_color = (0, 0, 255)
brush_size = 5

cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    frame = cv2.flip(frame, 1)
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    result = hands.process(frame_rgb)

    # If hands are detected
    if result.multi_hand_landmarks:
        for hand_landmarks in result.multi_hand_landmarks:
            h, w, _ = frame.shape  # Get the frame dimensions (height and width)

            # Get the coordinates of the index finger tip
            cx, cy = int(hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].x * w), \
                     int(hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].y * h)

            # Get the coordinates of the middle finger tip
            mx, my = int(hand_landmarks.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_TIP].x * w), \
                     int(hand_landmarks.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_TIP].y * h)

            # Calculate the distance between the index and middle finger tips
            distance = math.sqrt((mx - cx) ** 2 + (my - cy) ** 2)

            # Threshold distance to determine if the fingers are close (used to reset drawing)
            threshold = 40

            # If the fingers are far apart
            if distance > threshold:
                if reset_drawing:  # Check if the drawing was previously reset
                    draw_points.append(None)  # None means no line
                    reset_drawing = False
                draw_points.append((cx, cy))  # Add the current point to the list for drawing
            else:  # If the fingers are close together, set the flag to reset drawing
                reset_drawing = True

    # Draw the lines between points in the `draw_points` list
    for i in range(1, len(draw_points)):
        if draw_points[i - 1] and draw_points[i]:  # Only draw if both points are valid
            cv2.line(frame, draw_points[i - 1], draw_points[i], brush_color, brush_size)

    cv2.imshow('Webcam Feed', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the webcam and close all OpenCV windows
cap.release()
cv2.destroyAllWindows()