Review of Image Processing Concepts#

This notebook introduces some of the image processing algorithms/functions that are used throughout these interactive sessions.

Goals#

  • Introduce image processing using OpenCV

References#

OpenCV

Introduction #

What is a pixel?#

A pixel is a picture element. Merriam Webster defines pixel as:

any of the small discrete elements that together constitute an image.

Pixels are usually built of channels; in the case of color images there are three channels: Red, Green and Blue (RGB). Each pixel is a discrete representation of the light intensity for a particular channel. The light intensity is encoded in 8 bits, hence only 256 discrete values are possible per channel.

Sometimes you may find images with an alpha channel, which handles transparency. We will omit this channel for the sake of simplicity.
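To make this concrete, here is a minimal pure-Python sketch of a pixel as a tuple of 8-bit channel values (the variable names are just for illustration):

```python
# A color pixel is three 8-bit channel values (R, G, B).
# Pure blue: no red, no green, maximum blue intensity.
blue_pixel = (0, 0, 255)

# Each channel is limited to the 8-bit range [0, 255].
assert all(0 <= c <= 255 for c in blue_pixel)

# An RGBA pixel adds a fourth (alpha) channel for transparency;
# alpha = 255 means fully opaque.
opaque_blue = (0, 0, 255, 255)
print(len(blue_pixel), len(opaque_blue))  # 3 channels vs. 4
```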

Run the next cell and then click on the blue box. Interact with the color picker and observe the light intensity for each channel.

import ipywidgets as widgets

widgets.ColorPicker(
    concise=False,
    description='Pick a color',
    value='blue',
    disabled=False
)

How many colors can a pixel represent?#

A color pixel is built of three channels, and each pixel channel can represent 256 values. So, a color pixel can represent \(256 \times 256 \times 256 = 16,777,216\) different colors.
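The count above can be verified directly:

```python
# Each of the three channels takes one of 256 discrete values,
# so the number of representable colors is 256 ** 3.
values_per_channel = 2 ** 8          # 8-bit encoding -> 256 values
num_colors = values_per_channel ** 3
print(num_colors)  # 16777216
```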

What is an image?#

An image is a collection of pixels, where each pixel stores a value proportional to the light intensity at that particular location. The size of an image is its dimension (or resolution), specified as width and height, for instance \(1920 \times 1080\).
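A quick NumPy sketch (using a blank image purely for illustration) shows how OpenCV-style arrays store resolution, with the shape ordered as height × width × channels:

```python
import numpy as np

# A Full HD color image: 1080 rows (height) x 1920 columns (width),
# 3 channels per pixel, 8 bits per channel.
img = np.zeros((1080, 1920, 3), dtype=np.uint8)

height, width, channels = img.shape
print(width, height, channels)   # 1920 1080 3
print(img.size)                  # 1080 * 1920 * 3 = 6220800 values
```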

What does the term frames per second refer to?#

In a video, the term frames per second (FPS), or images per second, is the number of images that are played every second. The higher the image resolution and/or FPS, the more capable the hardware required to record, process and play back video. Graphics Processing Units (GPUs) are dedicated hardware for rendering high-frame-rate applications (for instance, games).
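As a back-of-the-envelope worked example (assuming uncompressed 8-bit color frames), the raw data rate of a Full HD stream at 30 FPS is:

```python
# Uncompressed data rate of a Full HD video stream at 30 FPS:
# one frame = width * height * 3 bytes (one byte per channel).
width, height, channels = 1920, 1080, 3
fps = 30

bytes_per_frame = width * height * channels   # 6,220,800 bytes per frame
bytes_per_second = bytes_per_frame * fps      # 186,624,000 bytes per second
print(f"{bytes_per_second / 1e6:.1f} MB/s uncompressed")  # 186.6 MB/s
```

This illustrates why higher resolutions and frame rates demand more capable hardware, and why video is almost always compressed.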

OpenCV #

OpenCV is an open-source, cross-platform computer vision library. In the remainder of this notebook, you will use OpenCV to get familiar with some common computer vision functions.

Import NumPy, OpenCV and Matplotlib to visualize images. We will use a mosaic image to demonstrate the effect of several vision operations.

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img = cv.imread('./images/jpg/ryzenai_future_starts_now.jpg')
plt.figure(figsize=(10, 10))
plt.axis("off")
plt.imshow(img[...,::-1]);

Color Conversion #

A color image is represented in the RGB color space; however, there are many other color spaces, each with a particular purpose. We will explore some of them.

Grayscale#

A grayscale image has only one channel, which represents the amount of light that each pixel contains. One of the main uses of grayscale images in vision applications is to detect edges in images.

gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)  # cv.imread loads images in BGR order
plt.figure(figsize=(10, 10)), plt.axis("off"), plt.imshow(gray, cmap='gray');
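Under the hood, OpenCV's BGR-to-gray conversion applies the standard luma weights \(Y = 0.299R + 0.587G + 0.114B\). A minimal NumPy sketch of the same weighting (a simplified illustration, not OpenCV's actual implementation):

```python
import numpy as np

def to_gray(bgr):
    """Convert an HxWx3 BGR uint8 image to a single-channel gray image
    using the standard luma weights Y = 0.299 R + 0.587 G + 0.114 B."""
    b = bgr[..., 0].astype(np.float32)
    g = bgr[..., 1].astype(np.float32)
    r = bgr[..., 2].astype(np.float32)
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return np.clip(np.round(y), 0, 255).astype(np.uint8)

# A 1x2 test image: one pure white pixel, one pure blue pixel (BGR order).
sample = np.array([[[255, 255, 255], [255, 0, 0]]], dtype=np.uint8)
print(to_gray(sample))  # [[255  29]]  -- blue contributes only 11.4%
```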

HSV Color space#

The HSV color space represents an image using the channels hue, saturation and value. This color space aligns a bit better with the way humans perceive color, and in vision applications it is useful for detecting colors more accurately.

hsv = cv.cvtColor(img, cv.COLOR_BGR2HSV)
plt.figure(figsize=(10, 10)), plt.axis("off"), plt.imshow(hsv);
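A small NumPy sketch of why HSV helps color detection: a single hue channel captures "blueness" regardless of brightness, so one range check on hue can replace thresholds on all three RGB channels (this is the idea behind OpenCV's cv.inRange; the pixel values below are illustrative):

```python
import numpy as np

# Two "blue" pixels at different brightness, plus one red pixel,
# in OpenCV-style HSV (H in [0, 179], S and V in [0, 255]).
hsv = np.array([[[120, 255, 255],    # bright blue
                 [120, 255,  60],    # dark blue
                 [  0, 255, 255]]],  # red
               dtype=np.uint8)

# Select "blue" by hue alone -- brightness (V) no longer matters.
h = hsv[..., 0]
blue_mask = (h >= 100) & (h <= 140)
print(blue_mask)  # [[ True  True False]]
```

In practice you would pass lower and upper HSV bounds to cv.inRange to obtain the same kind of mask.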

Color Thresholding #

Thresholding produces an output image that lights up the pixels whose light intensity falls within a given range.

# Get random threshold values
thr_b = np.random.randint(75,188)
thr_g = np.random.randint(60,230)
thr_r = np.random.randint(50,123)
# Perform per-channel thresholding using the random thresholds
(b, g, r) = cv.split(img)
_, thr_img_b = cv.threshold(b, thr_b, 255, cv.THRESH_BINARY)
_, thr_img_g = cv.threshold(g, thr_g, 255, cv.THRESH_BINARY)
_, thr_img_r = cv.threshold(r, thr_r, 255, cv.THRESH_BINARY)
dst = cv.merge((thr_img_b, thr_img_g, thr_img_r))

plt.figure(figsize=(15, 15))
plt.subplot(121),plt.imshow(img[...,::-1]),plt.title('Original')
plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(dst[...,::-1]),plt.title('Thresholded')
plt.xticks([]), plt.yticks([])
plt.show()
print(f'{thr_b=}\t{thr_g=}\t{thr_r=}')
thr_b=161	thr_g=214	thr_r=108

2D Convolution #

In OpenCV, this operation is also known as filter2D.

The 2D convolution is a mathematical operation that uses a kernel (a matrix of dimension \(n \times n\)). The kernel slides over the input image; at each position, the overlapping pixels are multiplied element-wise by the kernel and summed, producing one pixel of the output image.

In the next few cells we will consider a few \(3 \times 3\) kernels. You can explore more kernels live here
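The sliding-window operation can be sketched in plain NumPy. This simplified version handles a single channel and only the "valid" region (no border padding), whereas OpenCV's filter2D also pads the borders:

```python
import numpy as np

def filter2d(img, kernel):
    """Slide a k x k kernel over a single-channel image ('valid' region,
    no padding) and sum the element-wise products at each position."""
    k = kernel.shape[0]
    h, w = img.shape
    out = np.zeros((h - k + 1, w - k + 1), dtype=np.float32)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + k, j:j + k] * kernel)
    return out

img = np.arange(16, dtype=np.float32).reshape(4, 4)
identity = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], np.float32)
# The identity kernel picks out the center of each 3x3 window,
# reproducing the interior of the input: 5, 6, 9, 10.
print(filter2d(img, identity))
```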

Identity kernel#

The output of a 2D convolution using an identity kernel is identical to the input image.

\[\begin{split} Identity = \begin{bmatrix} 0 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 0 \end{bmatrix} \end{split}\]
identity = np.array([[0,0,0],[0,1,0],[0,0,0]],np.float32)
identity
array([[0., 0., 0.],
       [0., 1., 0.],
       [0., 0., 0.]], dtype=float32)

Apply the identity kernel to the image

dst = cv.filter2D(img,-1,identity)
plt.figure(figsize=(15, 15))
plt.subplot(121),plt.imshow(img[...,::-1]),plt.title('Original')
plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(dst[...,::-1]),plt.title('Identity')
plt.xticks([]), plt.yticks([])
plt.show()
print("Are the images the same? {}".format(np.array_equal(img,dst)))
Are the images the same? True

Emboss kernel#

A 2D convolution with an emboss kernel produces an image that emphasizes the difference between neighboring pixels in a given direction, giving an illusion of depth.

\[\begin{split} Emboss = \begin{bmatrix} -2 & -1 & 0\\ -1 & 1 & 1\\ 0 & 1 & 2 \end{bmatrix} \end{split}\]
emboss = np.array([[-2,-1,0],[-1,1,1],[0,1,2]],np.float32)
emboss
array([[-2., -1.,  0.],
       [-1.,  1.,  1.],
       [ 0.,  1.,  2.]], dtype=float32)
dst = cv.filter2D(img,-1,emboss)
plt.figure(figsize=(15, 15))
plt.subplot(121),plt.imshow(img[...,::-1]),plt.title('Original')
plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(dst[...,::-1]),plt.title('Emboss')
plt.xticks([]), plt.yticks([])
plt.show()

Morphological Transformations #

Morphological transformations are operations based on the image shape. The provided kernel (structuring element) determines the nature of the operation.

Dilate#

From Wikipedia:

The dilation operation usually uses a structuring element for probing and expanding the shapes contained in the input image.

kernel = np.ones((3,3),np.uint8)
dilate = cv.dilate(img,kernel,iterations = 1)
plt.figure(figsize=(10, 10)), plt.axis("off"), plt.imshow(dilate[...,::-1]);
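The expanding effect can be sketched in plain NumPy: with a 3×3 all-ones structuring element, grayscale dilation replaces each pixel with the maximum of its 3×3 neighborhood, so bright regions grow (a simplified illustration of what cv.dilate computes):

```python
import numpy as np

def dilate3x3(img):
    """Grayscale dilation with a 3x3 all-ones structuring element:
    each output pixel is the maximum over its (edge-padded) 3x3
    neighborhood."""
    padded = np.pad(img, 1, mode='edge')
    h, w = img.shape
    out = np.empty_like(img)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + 3, j:j + 3].max()
    return out

# A single bright pixel grows into a 3x3 bright square.
img = np.zeros((5, 5), dtype=np.uint8)
img[2, 2] = 255
print(dilate3x3(img))
```

Erosion is the dual operation: take the neighborhood minimum instead of the maximum, which shrinks bright regions.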

Erode#

From Wikipedia:

The erosion operation usually uses a structuring element for probing and reducing the shapes contained in the input image.

kernel = np.ones((3,3),np.uint8)
erode = cv.erode(img,kernel,iterations = 1)
plt.figure(figsize=(10, 10)), plt.axis("off"), plt.imshow(erode[...,::-1]);

Duplicate Operation #

This operation simply produces two copies of the same image.

img2 = img.copy()
img3 = img.copy()

plt.figure(figsize=(18, 18))
plt.subplot(131),plt.imshow(img[...,::-1]),plt.title('Original')
plt.xticks([]), plt.yticks([])
plt.subplot(132),plt.imshow(img2[...,::-1]),plt.title('Copy 1')
plt.xticks([]), plt.yticks([])
plt.subplot(133),plt.imshow(img3[...,::-1]),plt.title('Copy 2')
plt.xticks([]), plt.yticks([])
plt.show()

Join Operations #

These vision operations take two input images and produce a single output image.

We will generate the input images by applying the erode and dilate operations to the mosaic image.

kernel = np.ones((3,3),dtype=np.uint8)
dilate = cv.dilate(img,kernel,iterations = 1)
erode = cv.erode(img,kernel,iterations = 1)

Subtract#

This function performs a pixel-wise subtraction, therefore the order of the images matters. Note that subtracting uint8 NumPy arrays with the - operator wraps around on underflow (negative results wrap into large positive values), which produces the colorful artifacts in the output.

ed = erode - dilate
de = dilate - erode

plt.figure(figsize=(15, 15))
plt.subplot(121),plt.imshow(ed[...,::-1]),plt.title('erode - dilate')
plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(de[...,::-1]),plt.title('dilate - erode')
plt.xticks([]), plt.yticks([])
plt.show()
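A tiny NumPy example (with made-up pixel values) makes both points concrete: the two subtraction orders give different results, and underflow wraps modulo 256:

```python
import numpy as np

a = np.array([100, 200], dtype=np.uint8)
b = np.array([150, 120], dtype=np.uint8)

# Order matters: a - b and b - a are generally different images.
# NumPy uint8 subtraction wraps modulo 256: 100 - 150 -> 206.
print(a - b)  # [206  80]
print(b - a)  # [ 50 176]
```

OpenCV's cv.subtract instead saturates at 0, so 100 - 150 would yield 0 rather than 206.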

Absdiff#

This function computes the pixel-wise absolute difference of two images. Unlike plain subtraction, the order of the operands does not matter.

ed = cv.absdiff(erode, dilate)
de = cv.absdiff(dilate, erode)
plt.figure(figsize=(15, 15))
plt.subplot(121),plt.imshow(ed[...,::-1]),plt.title('absdiff(erode, dilate)')
plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(de[...,::-1]),plt.title('absdiff(dilate, erode)')
plt.xticks([]), plt.yticks([])
plt.show()
print("Are the images the same? {}".format(np.array_equal(ed,de)))
Are the images the same? True

Add#

The addition of two images can overflow (produce results larger than 255). Depending on the implementation, the result will either:

  1. Saturate: you will notice large white areas in the image

  2. Wrap around: you will notice artifacts in the result
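The two behaviors can be contrasted with a tiny NumPy example (made-up pixel values; the clip mimics what cv.add does):

```python
import numpy as np

a = np.array([100, 250], dtype=np.uint8)
b = np.array([100,  50], dtype=np.uint8)

# Wrap around (NumPy uint8 '+'): 250 + 50 = 300 -> 300 % 256 = 44.
print(a + b)  # [200  44]

# Saturate (what cv.add does): values above 255 clip to 255.
sat = np.clip(a.astype(np.int16) + b.astype(np.int16), 0, 255).astype(np.uint8)
print(sat)    # [200 255]
```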

addsat = cv.add(erode, dilate)
addwrap = erode + dilate
plt.figure(figsize=(15, 15))
plt.subplot(121),plt.imshow(addsat[...,::-1]),plt.title('Add Saturate')
plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(addwrap[...,::-1]),plt.title('Add wrap around')
plt.xticks([]), plt.yticks([])
plt.show()

Review #

After having read and used this notebook you should be able to answer the following questions:

  • What is a pixel?

  • What is an image?

  • What is image resolution?

  • What is frames per second?

Conclusion #

This notebook provides a brief introduction to core concepts of computer vision. The OpenCV library is introduced and used to visualize some common computer vision functionality.


Copyright © 2023 AMD, Inc

SPDX-License-Identifier: MIT