Perceptual image hashing is a way of creating a fingerprint of an image based on its visual appearance. This fingerprint makes comparing similar images much easier. Algorithms like the one below are used for many purposes, most notably Google’s Image Match where you can provide an image and it will return a listing of images that are visually similar.
The average hash algorithm is a very simple version of a perceptual hash. It’s a good algorithm to use if you want to find images that are very similar and haven’t had localized changes made to them (like an added border or watermark). The main advantages of this algorithm are its speed and simplicity. This algorithm has three steps:
- Simplify the image by reducing its size and colors.
- Get the average pixel value (the percent gray).
- Generate the hash by comparing each pixel’s value to the average.
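First we simplify the image by shrinking it to 8x8 pixels and converting it to grayscale:
from PIL import Image
# `image` is assumed to be an already opened PIL image.
image = image.resize((8, 8), Image.LANCZOS)  # Reduce its size (Image.ANTIALIAS was removed in Pillow 10).
image = image.convert("L")  # Convert it to grayscale.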
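To make the grayscale step less opaque, here is a minimal pure-Python sketch of the weighting that Pillow documents for convert("L") (the ITU-R 601-2 luma transform). The function name `to_gray` is hypothetical, and rounding to the nearest integer is an assumption of this sketch, so a value may differ by one from what Pillow produces.

```python
def to_gray(rgb):
    """Approximate 8-bit luma for an (R, G, B) tuple.

    Uses the ITU-R 601-2 weights: L = R*299/1000 + G*587/1000 + B*114/1000.
    Rounding to nearest is an assumption of this sketch.
    """
    r, g, b = rgb
    return (299 * r + 587 * g + 114 * b + 500) // 1000


print(to_gray((255, 255, 255)))  # white stays at the top of the range
print(to_gray((255, 0, 0)))      # pure red maps to a fairly dark gray
```

Green carries the largest weight, so two colors that look equally "bright" numerically can end up as quite different grays.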
Next we find the average pixel value of the image:
pixels = list(image.getdata())
avg = sum(pixels) / len(pixels)
pixels will simply be a list of the pixel values, ranging from 0 (black) to 255 (white), and we get the average just by adding them up and dividing by how many there are. For this image the average pixel value is 61 (about 25% gray).
Now we can compute the hash. This is done by comparing each pixel in the image to the average pixel value. If the pixel’s value is less than the average it gets a 1, otherwise it gets a 0. Then we treat the result as a string of 64 bits and convert it to hexadecimal.
bits = "".join("1" if pixel < avg else "0" for pixel in pixels)  # '00010100...'
hexadecimal = format(int(bits, 2), "016X")  # 16 uppercase hex digits, zero-padded
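The two snippets above can be folded into one function. This is only a sketch: `average_hash` is a hypothetical name, it takes a flat list of 64 grayscale values (such as `list(image.getdata())` after the 8x8 resize) rather than an image, and it keeps the comparison used in the code above, where a pixel darker than the average contributes a 1.

```python
def average_hash(pixels):
    """Return a 16-character hex hash for 64 grayscale values (0-255)."""
    if len(pixels) != 64:
        raise ValueError("expected 64 pixel values from an 8x8 image")
    avg = sum(pixels) / len(pixels)
    # Same comparison as the snippet above: below average -> '1', otherwise '0'.
    bits = "".join("1" if pixel < avg else "0" for pixel in pixels)
    return format(int(bits, 2), "016X")


# Hypothetical input: first half black, second half white.
demo = [0] * 32 + [255] * 32
print(average_hash(demo))
```

Working on a plain list keeps the thresholding logic separate from image loading, which makes it easy to check by hand.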
This gives a hash value of 00010E3CE08FFFFE, which can be used to compare how this image “looks” to any other image that has been hashed in the same way, using a simple Hamming distance. The closer the distance is to 0 the more similar the images are, with 0 being essentially identical.
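For the comparison step, a minimal sketch of the Hamming distance between two hex hashes follows. The name `hamming_distance` is hypothetical; it XORs the two 64-bit values and counts the bits that differ.

```python
def hamming_distance(hash_a, hash_b):
    """Count the differing bits between two hexadecimal hash strings."""
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


original = "00010E3CE08FFFFE"      # hash from the text above
one_bit_off = "00010E3CE08FFFFF"   # hypothetical hash, last bit flipped
print(hamming_distance(original, original))
print(hamming_distance(original, one_bit_off))
```

How small a distance counts as "similar" depends on the image set, so any cutoff has to be chosen by experiment.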
For more information about perceptual hashing: