Creating a Hand Gesture Volume Controller using Python and PyCharm

January 28, 2022

This tutorial shows how to use Python to track hand gestures and how to build a hand gesture volume controller.

By the end of this tutorial, you will be able to adjust your computer’s volume by moving your thumb and index finger.

To follow along with this tutorial, you need:

  • To be familiar with the Python programming language.
  • To have PyCharm installed on your computer. You can download it from the JetBrains website.

Reference hand image

Image Source: MediaPipe

The image above shows the numbers that MediaPipe assigns to the different points of the hand. This tutorial uses point 4 and point 8, which are the tip of the thumb and the tip of the index finger respectively.
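
To keep the point numbers readable in code, the two landmarks can be given names. The sketch below is only an illustration: the HandPoint enum is a hypothetical helper, not part of this tutorial's code, and it copies just the two numbers that the diagram assigns to the thumb tip and the index fingertip.

```python
from enum import IntEnum


class HandPoint(IntEnum):
    """Hypothetical names for the two landmark numbers this tutorial uses."""
    THUMB_TIP = 4          # point 4 in the MediaPipe hand diagram
    INDEX_FINGER_TIP = 8   # point 8 in the MediaPipe hand diagram


# An IntEnum member behaves like its integer value, so it can index a list.
points = [f"point {i}" for i in range(21)]  # the diagram numbers 21 points, 0 to 20
thumb_label = points[HandPoint.THUMB_TIP]
index_label = points[HandPoint.INDEX_FINGER_TIP]
print(thumb_label, index_label)
```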

Creating a hand gesture volume controller

Setting up

First, we will prepare our workspace. Launch PyCharm and click the option to create a new project.


Click the Create button in the window that appears next.

We need to install the libraries we will use in our project.

numpy will help us work with arrays. To install it, open the terminal and run the following command:

pip install numpy

Repeat the same process for the other libraries.

pip install opencv-python

We will import this library as cv2. We will use it to capture an image using the webcam and convert it to RGB.

pip install mediapipe

MediaPipe is an open-source library developed by Google. It can be used for both face and gesture recognition. For this tutorial, we’ll use it for hand gesture recognition.

pip install pycaw

We’ll need this library to access the device’s speaker and its master volume. pycaw wraps the Windows Core Audio API, so this part of the tutorial requires Windows.

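The hypot function comes from Python’s math module, which is part of the standard library, so there is nothing to install for it. We’ll use it to find the distance between point number 4 (the thumb) and point number 8 (the index finger) as the hypotenuse of a right triangle.

pip install comtypes

pycaw relies on ctypes and comtypes. ctypes is part of the Python standard library and provides C-compatible data types. comtypes is built on the ctypes FFI (Foreign Function Interface) library, and it is the only one of the two that needs to be installed.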
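
Before writing any code, it can help to confirm that the packages are importable from the interpreter that PyCharm uses. The following is a minimal sketch that relies only on the standard library. missing_modules is a hypothetical helper name, and the list holds the top-level import names used in this tutorial.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


# Top-level import names used in this tutorial (not the pip package names).
required = ["numpy", "cv2", "mediapipe", "pycaw", "comtypes"]
missing = missing_modules(required)
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("All required modules are available.")
```

Note that the import name can differ from the pip name: opencv-python is imported as cv2.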

Now, let’s start coding. In the file that PyCharm automatically creates for you, type in the following code:

Step 1: Importing the libraries we will need

import cv2
import mediapipe as mp
from math import hypot
from ctypes import cast, POINTER
from comtypes import CLSCTX_ALL
from pycaw.pycaw import AudioUtilities, IAudioEndpointVolume
import numpy as np

In the code segment above, we import the libraries our project needs.

cap = cv2.VideoCapture(0)

We then get the video input from our computer’s primary camera. If you are using another camera, replace 0 with the index of that camera.

Step 2: Detecting, initializing, and configuring the hands

mpHands = mp.solutions.hands
hands = mpHands.Hands()
mpDraw = mp.solutions.drawing_utils

In the code above, mp.solutions.hands gives us the MediaPipe hands module, which detects hands in the video input from our primary camera. mpHands.Hands() then creates and configures the hand detector with its default settings. Finally, mp.solutions.drawing_utils provides the helper we will use to draw the connections and landmarks on the detected hand.

Step 3: Accessing the speaker using pycaw

devices = AudioUtilities.GetSpeakers()
interface = devices.Activate(IAudioEndpointVolume._iid_, CLSCTX_ALL, None)
volume = cast(interface, POINTER(IAudioEndpointVolume))

These are the initializations pycaw needs. They come from the usage example that the library’s developer provides, so we use them unchanged. You can find them in the pycaw documentation.

Step 4: Finding the volume range between the minimum and maximum volume

volMin, volMax = volume.GetVolumeRange()[:2]

GetVolumeRange() returns the minimum volume, the maximum volume, and the step size, all in decibels. The slice [:2] keeps the first two values. We place this line outside the while loop because we only need to find the volume range once.
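
As a rough illustration of the [:2] slice, the sketch below unpacks a made-up tuple with the same three-part layout that GetVolumeRange() returns (minimum, maximum, and step size in decibels). The numbers are assumptions chosen for the example; the real values depend on the audio device.

```python
# Illustrative values only; a real device reports its own range in decibels.
sample_range = (-65.25, 0.0, 0.03125)  # (minimum dB, maximum dB, step dB)

# The slice keeps the first two entries and drops the step size.
volMin, volMax = sample_range[:2]
print(volMin, volMax)
```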

Step 5: Capturing an image from our camera and converting it to an RGB image

while True:
    success, img = cap.read()
    imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    results = hands.process(imgRGB)

cap.read() returns a flag that tells us whether a frame was captured, together with the frame itself. OpenCV delivers frames in BGR order, so we convert the image to RGB, which is what MediaPipe expects, and then pass it to hands.process() to run the hand detection.
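
The success flag is worth checking before the frame is used, because img is not a valid image when the read fails. Below is a minimal sketch of such a guard. read_frame is a hypothetical helper, and FakeCapture is a stand-in written for this example so that the snippet runs without a webcam; with OpenCV you would pass the cap object instead.

```python
def read_frame(capture):
    """Return the next frame, or raise if the capture device gave none."""
    success, frame = capture.read()
    if not success:
        raise RuntimeError("Could not read a frame from the camera")
    return frame


class FakeCapture:
    """Stand-in for cv2.VideoCapture that replays a fixed list of frames."""

    def __init__(self, frames):
        self._frames = list(frames)

    def read(self):
        # Mimic the (success, frame) pair that VideoCapture.read() returns.
        if self._frames:
            return True, self._frames.pop(0)
        return False, None


fake = FakeCapture(["frame-1"])
first = read_frame(fake)      # a frame is available
try:
    read_frame(fake)          # the fake device is now exhausted
    failed = False
except RuntimeError:
    failed = True
print(first, failed)
```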

We now need to check whether any hands were detected in the image we captured.

Step 6: Checking whether any hands were detected in our input

lmList = []
if results.multi_hand_landmarks: 

This code creates an empty list that will store the landmarks of the hands detected by the MediaPipe hands module, i.e., the points on the hand. It also checks whether any hands were detected: results.multi_hand_landmarks is None when no hand is in view.

We will now create a for loop to manipulate each hand present in the input.

Step 7: Creating a for loop to manipulate each hand

for handlandmark in results.multi_hand_landmarks:
    for id, lm in enumerate(handlandmark.landmark):
        h, w, c = img.shape
        cx, cy = int(lm.x * w), int(lm.y * h)
        lmList.append([id, cx, cy]) 
    mpDraw.draw_landmarks(img, handlandmark, mpHands.HAND_CONNECTIONS)

  • In the code above, the first for loop goes through each hand in the results. The second for loop gets the id (id number) and lm (landmark information) of each landmark on that hand. The landmark information gives us the x and y coordinates, and the id number is the number assigned to that hand point.

  • h, w, c = img.shape: this line reads the height, width, and number of channels of our image.

  • cx, cy = int(lm.x * w), int(lm.y * h): MediaPipe reports lm.x and lm.y as fractions of the image width and height, so this line converts them to pixel coordinates. We multiply lm.x by the width and assign the result to cx, then multiply lm.y by the height and assign the result to cy. lm stands for landmark.

  • lmList.append([id, cx, cy]): this line adds the values of id, cx, and cy to lmList.

  • Finally, the last line calls mpDraw.draw_landmarks to draw all the landmarks of the hand and the connections between them.
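
The conversion from landmark coordinates to pixel positions can be tried in isolation. This is a small sketch under the assumption that x and y are fractions between 0 and 1, as MediaPipe reports them; to_pixels is a hypothetical helper name.

```python
def to_pixels(norm_x, norm_y, width, height):
    """Scale normalized coordinates to integer pixel positions."""
    return int(norm_x * width), int(norm_y * height)


# A landmark halfway across and a quarter of the way down a 640x480 frame.
cx, cy = to_pixels(0.5, 0.25, 640, 480)
print(cx, cy)
```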

Step 8: Specifying the points of the thumb and index finger we will use

if lmList != []:
    x1, y1 = lmList[4][1], lmList[4][2]
    x2, y2 = lmList[8][1], lmList[8][2]

In the code above, we first check that lmList is not empty, which means a hand was detected. We then assign variables x1 and y1 the x and y coordinates of point 4, the tip of the thumb. The last line does the same for point 8, the tip of the index finger.

Refer to the hand image diagram we discussed to identify the points.
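
Indexing lmList[8] assumes that the list holds at least nine entries. A slightly more defensive version of this step could look like the sketch below. fingertips is a hypothetical helper, and the sample list is made-up data in the same [id, cx, cy] layout.

```python
THUMB_TIP, INDEX_TIP = 4, 8  # landmark numbers from the hand diagram


def fingertips(lm_list):
    """Return ((x1, y1), (x2, y2)) for the thumb and index tips, or None."""
    if len(lm_list) <= INDEX_TIP:
        return None  # no hand, or an incomplete landmark list
    x1, y1 = lm_list[THUMB_TIP][1], lm_list[THUMB_TIP][2]
    x2, y2 = lm_list[INDEX_TIP][1], lm_list[INDEX_TIP][2]
    return (x1, y1), (x2, y2)


# Made-up landmark list: 21 entries of [id, cx, cy].
sample = [[i, 10 * i, 5 * i] for i in range(21)]
tips = fingertips(sample)
print(tips)
print(fingertips([]))
```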

Step 9: Drawing a circle at the tip of the thumb and the tip of the index finger

cv2.circle(img, (x1, y1), 15, (255, 0, 0), cv2.FILLED)
cv2.circle(img, (x2, y2), 15, (255, 0, 0), cv2.FILLED)

The code above draws a circle at the tip of the thumb and that of the index finger.

  • (x1, y1) specifies that we will draw the circle at the tip of the thumb. 15 is the radius of the circle in pixels. (255, 0, 0) is the color of the circle; OpenCV uses BGR order, so this is blue. cv2.FILLED is a thickness value of -1, which fills the circle with the color we specify.

  • We repeat the same for the index finger at (x2, y2).

Step 10: Drawing a line between points 4 and 8

cv2.line(img, (x1, y1), (x2, y2), (255, 0, 0), 3)

In the code above, we use the cv2.line function to draw a line between point 4 and point 8 of the hand. The line connects point 4 (x1, y1), the tip of the thumb, and point 8 (x2, y2), the tip of the index finger. (255, 0, 0) is the line color and 3 is its thickness.

Step 11: Finding the distance between points 4 and 8

length = hypot(x2 - x1, y2 - y1)

In the code above, we find the distance between the tip of the thumb and the tip of the index finger as the hypotenuse of a right triangle. We achieve this by calling the math hypot function and passing it the difference between x2 and x1 and the difference between y2 and y1.
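
hypot(dx, dy) computes the square root of dx² + dy², which is the straight-line distance between the two points. A quick check with made-up coordinates:

```python
from math import hypot

# Made-up fingertip positions in pixels.
x1, y1 = 100, 200
x2, y2 = 103, 204

# The differences are 3 and 4, the sides of a 3-4-5 right triangle.
length = hypot(x2 - x1, y2 - y1)
print(length)  # 5.0
```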

Step 12: Converting the hand range to the volume range

vol = np.interp(length, [15, 220], [volMin, volMax])
print(vol, length)

We call the NumPy function np.interp to convert the hand range to the volume range. The arguments used are:

  • length: This is the value we want to convert.
  • [15, 220]: This is the hand range, i.e., the smallest and largest distance in pixels between the two fingertips.
  • [volMin, volMax]: This is the range to which we want to convert.
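
For a two-point range like this one, np.interp performs a linear mapping and clamps inputs that fall outside the hand range to the nearest end. The sketch below reproduces that behavior in plain Python to make the arithmetic visible. interp_clamped is a hypothetical helper, and the volume limits are illustrative values, not ones read from a real device.

```python
def interp_clamped(value, in_range, out_range):
    """Linearly map value from in_range to out_range, clamping at the ends."""
    in_lo, in_hi = in_range
    out_lo, out_hi = out_range
    value = max(in_lo, min(in_hi, value))         # clamp to the input range
    fraction = (value - in_lo) / (in_hi - in_lo)  # 0.0 at in_lo, 1.0 at in_hi
    return out_lo + fraction * (out_hi - out_lo)


vol_min, vol_max = -65.25, 0.0  # illustrative decibel limits

for length in (5, 15, 117.5, 220, 300):
    print(length, interp_clamped(length, (15, 220), (vol_min, vol_max)))
```

Fingers closer than 15 pixels map to the minimum volume, fingers farther apart than 220 pixels map to the maximum, and the midpoint of the hand range lands halfway between the two limits.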

Step 13: Setting the master volume

volume.SetMasterVolumeLevel(vol, None)  

We set the master volume level according to the hand range. We achieve this by passing vol, which is the finger distance converted to the volume range. The second argument is an optional event context, which we leave as None.

Step 14: Displaying the video output used to interact with the user

cv2.imshow('Image', img) 

The code above shows the real-time video of the user interacting with the program, i.e., the user moves the thumb and the index finger to control the volume.

Step 15: Terminating the program

if cv2.waitKey(1) & 0xff == ord('q'):
    break

The code above will terminate the program when the user presses the q key.
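
The expression works because & binds more tightly than == in Python, so the key code is masked to its lowest byte before it is compared with the code for q. The sketch below isolates that test. is_quit_key is a hypothetical helper, and the key codes are sample values rather than real keyboard input.

```python
def is_quit_key(key_code, quit_char="q"):
    """Return True when the low byte of key_code matches quit_char."""
    return (key_code & 0xFF) == ord(quit_char)


pressed_q = is_quit_key(ord("q"))             # plain key code for q
no_key = is_quit_key(-1)                      # cv2.waitKey returns -1 when no key is pressed
high_bits = is_quit_key(0x100000 | ord("q"))  # extra high bits are masked away
print(pressed_q, no_key, high_bits)
```

Once the loop ends, it is common to call cap.release() and cv2.destroyAllWindows() to free the camera and close the window.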


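When the code runs without errors, a window shows the live webcam feed with the hand landmarks, the two fingertip circles, and the line between them drawn on it. Moving the thumb and index finger apart raises the volume, and bringing them together lowers it.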



You now have all the skills required to create a hand gesture volume controller. If you are working while listening to your favorite music, you can now control its volume with just a gesture of your hand.

Happy coding!

Peer Review Contributions by: Willies Ogola