How to make a Hand tracking module using Python

Jiten Patel · Published in Level Up Coding · Nov 24, 2021 · 4 min read


How great would it be to make different things happen just by waving your hand in the air? It would feel like having your own network of devices, just like Iron Man commanding his network of AI-enabled systems. Wouldn’t it be great to build one for yourself?

Well, tracking your hand is the first step toward building one. In this article, we will create our own real-time hand tracking module that detects hands and estimates their landmarks.

Let’s get started

Believe it or not, if you work in the field of computer vision you have definitely used the OpenCV library, and if you’re a beginner you’ve at least come across it. Guess what? We are going to use this library in our project: OpenCV will help us access the webcam. Don’t worry if you don’t know how to work with OpenCV; I will explain every line of code. For hand detection and landmark estimation we will use Mediapipe.

What is Mediapipe? 🤔

Mediapipe is an open-source, lightweight machine learning framework developed by Google. It offers many ready-made ML solution APIs such as face detection, face mesh, iris, hands, pose, and more. Check out more about Mediapipe here.

In this project, we are going to use Mediapipe’s Hands Landmark Model.

Figure 1: Mediapipe hands landmark model

Let’s start with some basic code

1. import cv2
2. import mediapipe as mp
3. cap = cv2.VideoCapture(0)
4. mp_hands = mp.solutions.hands
5. hands = mp_hands.Hands()
6. mp_draw = mp.solutions.drawing_utils

If you’re new to OpenCV, let me help you understand the above code. We imported the two libraries our project depends on. On line no. 3 we access the webcam of our system. Mediapipe provides the hands module API, which we access on line no. 4. We create an object of the Hands class on line no. 5. The Hands() constructor has some optional parameters, such as static_image_mode, max_num_hands, min_detection_confidence, and min_tracking_confidence. For the sake of this project we aren’t going to change any of them, but you can configure them as per your project’s needs. As for mp_draw, you will get to know it in a bit.
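If you do want to configure those parameters, a call like the following works, reusing mp_hands from the listing above. The values shown here are only illustrative, not something this project requires:

hands = mp_hands.Hands(
    static_image_mode=False,       # video stream, so keep tracking between frames
    max_num_hands=2,               # track at most two hands
    min_detection_confidence=0.5,  # threshold for detecting a hand
    min_tracking_confidence=0.5    # threshold for keeping track of it
)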

Detecting hands

We read the frames from our webcam on line no. 12 with the cap.read() method provided by the VideoCapture class. We then convert the image (frame) captured by our webcam to RGB. Wait, Jiten, why do we convert our image to an RGB image? I have watched plenty of tutorials and read plenty of articles, and no one really explains why. But don’t worry, I will explain it to you. When OpenCV was first being developed many years ago, the standard for reading an image was BGR order. Over the years the standard has become RGB, but OpenCV still maintains this “legacy” BGR order to ensure no existing code breaks, while Mediapipe expects its input in RGB, so we convert the frame before handing it over.
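In code, that conversion is a single call. Here, frame is just a placeholder name for whatever cap.read() returned:

rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)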

Next, we detect hands in a frame with the help of the hands.process() method (line no. 14). Once hands are detected, we move on to locating the key points (Figure 1) and highlighting them. We check whether the user is showing a hand to the webcam (line no. 17). If so, we capture the landmarks of the user’s hand, loop over them (lines 19–22), and draw a circle at each detected key point.
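Here is a small helper, my own sketch rather than the article’s exact code, that shows how the normalized landmark coordinates returned by Mediapipe can be turned into pixel positions and highlighted with cv2.circle (the circle size and color are arbitrary choices):

def draw_keypoints(frame, hand_landmarks):
    # Landmark coordinates are normalized to [0, 1], so scale by the frame size.
    h, w, _ = frame.shape
    for lm in hand_landmarks.landmark:
        cx, cy = int(lm.x * w), int(lm.y * h)
        cv2.circle(frame, (cx, cy), 5, (255, 0, 255), cv2.FILLED)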

Image by author
GIF by author

Remember how we accessed Mediapipe’s drawing utils API and I said I’d explain it later? Mediapipe’s drawing_utils provides a method called draw_landmarks() that connects the dots (key points) we detected. Last but not least, we have to show the final image to the user, and that’s where the cv2.imshow() method comes in handy. We wrap the whole hand-tracking code in a while loop, an infinite loop, because this is a continuous process: every time the webcam captures a frame, we have to process the frame (image), detect the hands in it, detect the key points on each hand, and connect the key points. So how do we break out of this infinite loop? On line no. 26 we check whether the user pressed the “q” key; if so, we break out of the loop.
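The line numbers in the paragraphs above refer to the full script; putting the steps together, a minimal sketch of that loop could look like this. It reuses cap, hands, mp_hands, and mp_draw from the first listing and the draw_keypoints helper above; the window name and the cleanup calls at the end are my additions:

while True:
    success, frame = cap.read()                    # grab a frame from the webcam
    if not success:
        break

    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # OpenCV gives BGR, Mediapipe wants RGB
    results = hands.process(rgb)                   # detect hands and estimate landmarks

    if results.multi_hand_landmarks:               # at least one hand is visible
        for hand_landmarks in results.multi_hand_landmarks:
            draw_keypoints(frame, hand_landmarks)  # highlight each key point
            mp_draw.draw_landmarks(frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)

    cv2.imshow("Hand Tracking", frame)             # show the processed frame
    if cv2.waitKey(1) & 0xFF == ord("q"):          # press "q" to stop
        break

cap.release()
cv2.destroyAllWindows()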

Final output

Image by author
GIF by author

Thanks for making it this far! I appreciate you. The code for this project can be found on my GitHub. If you liked it, don’t forget to give it a star ⭐️ on GitHub, and if you liked this article, don’t forget to give an appreciative gesture (clap) 👏.

Get connected with me on LinkedIn, Twitter, & Instagram

Happy Coding 😄
