The objective of this exercise is to build and iteratively improve a machine learning model for image classification, arriving at a high-accuracy final model.
Download the data
!wget -qq https://sid.erda.dk/public/archives/ff17dc924eba88d5d01a807357d6614c/FullIJCNN2013.zip
!unzip -qq FullIJCNN2013.zip
Import required packages
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPClassifier
from skimage.io import imread, imshow
from sklearn import preprocessing
from sklearn.preprocessing import normalize
from sklearn import metrics
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
import seaborn as sns
import os, glob
from PIL import Image
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.callbacks import EarlyStopping
# Keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization, Activation
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
Get the features and labels of the data
We'll begin by converting the RGB images into grayscale images. Next, we'll flatten each image into a 1-D array so that it can be fed into a machine learning model. Later, we'll see the disadvantages of this approach and how CNNs solve those problems.
- Extract the features of the images within the image sections (the class folders) only
- Extract the labels of the images
- Resize the images to (30, 30) and convert them to 1-D NumPy arrays
def rgb2gray(rgb):
    # Convert an RGB image to grayscale using the standard luma weights
    return np.dot(rgb[..., :3], [0.2989, 0.5870, 0.1140])

def read_images(path_folder):
    """Read all the class folders (images),
    preprocess each image - resize to (30, 30) -
    and return the images and their classes.
    """
    img_data = []
    X = []
    Y = []
    for name in os.listdir(path_folder):
        # Classes are represented by folders (named 00 to 42)
        if os.path.isdir(os.path.join(path_folder, name)):
            idclass = int(name)
            path_class = os.path.join(path_folder, name)
            # Read the files (.ppm) in every class folder
            for file in os.listdir(path_class):
                if os.path.isfile(os.path.join(path_class, file)):
                    img = Image.open(os.path.join(path_class, file))
                    img = img.resize((30, 30))
                    img_array = np.array(img)
                    img_data.append(img_array)
                    gray = rgb2gray(img_array)
                    gray = gray.flatten()
                    X.append(gray)
                    Y.append(idclass)
    return np.array(img_data), np.array(X), np.array(Y)
path = "/content/FullIJCNN2013/"
img_data, X, y = read_images(path)
Let's check the shape of the data.
img_data.shape, X.shape, y.shape
((1213, 30, 30, 3), (1213, 900), (1213,))
We have 1213 RGB images, each of size 30 x 30. We have transformed each image into a grayscale image and vectorized it into a 1-D array of 900 (30 x 30) values. The labels associated with the images are recorded in the vector y.
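As a quick optional sanity check, one of the 900-element vectors can be reshaped back to 30 x 30 and displayed:
# Reshape a flattened grayscale vector back to 30x30 for a visual check
plt.imshow(X[0].reshape(30, 30), cmap='gray')
plt.title(f"Class {y[0]} (grayscale)")
plt.axis('off')
plt.show()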
Data Exploration and Preprocessing
Plot a sample image of each class
num_classes = 43  # classes are labeled 0 through 42
num_rows = 7
num_cols = 7
fig, axes = plt.subplots(num_rows, num_cols, figsize=(12, 12))
for i in range(num_classes):
    sample_image = img_data[np.where(y == i)[0][0]]
    row = i // num_cols
    col = i % num_cols
    axes[row, col].imshow(sample_image)
    axes[row, col].set_title(f"Class {i}")
    axes[row, col].axis('off')
# Hide the unused subplots in the grid
for i in range(num_classes, num_rows * num_cols):
    axes[i // num_cols, i % num_cols].axis('off')
plt.tight_layout()
plt.show()
![](https://dw1.s81c.com//IMWUC/MessageImages/5767d403cdda4cc895f52c2d8825e249.png)
There are 43 classes of images, labeled 0 through 42, each representing a traffic sign.
Plot the distribution of classes
unique_classes, class_counts = np.unique(y, return_counts=True)
plt.figure(figsize=(10, 8))
plt.barh(unique_classes, class_counts, color='skyblue')
plt.xlabel('Frequency')
plt.ylabel('Class')
plt.title('Class Distribution')
plt.yticks(unique_classes)
plt.grid(axis='x', linestyle='--', alpha=0.7)
plt.show()
![](https://dw1.s81c.com//IMWUC/MessageImages/68a3841d02924467a6bc67f651e5bf04.png)
There is a class imbalance: some classes have many examples, while others have very few. Ideally, we should balance them to build a better model. We'll leave this exercise for the next blog.
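As a small preview of that exercise, one lightweight option is to weight classes inversely to their frequency. A minimal sketch using scikit-learn (the weight_map name is ours, and it is not applied in the models below):
from sklearn.utils.class_weight import compute_class_weight

# Weight each class inversely proportional to its frequency in y
class_weights = compute_class_weight(class_weight='balanced',
                                     classes=unique_classes, y=y)
weight_map = dict(zip(unique_classes, class_weights))
A dict like this can later be passed to estimators that accept class weights, for example Keras model.fit via its class_weight argument.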
Normalize the features
For most image data, the pixel values are integers with values between 0 and 255.
Neural networks process inputs using small weight values, and inputs with large integer values can disrupt or slow down the learning process. As such, it is good practice to normalize the pixel values.
X_normalized = normalize(X, axis=0)
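Note that scikit-learn's normalize with axis=0 rescales each feature column to unit L2 norm, which differs from the more common practice of scaling pixels to [0, 1]. A quick optional check, plus the simpler alternative (not used in what follows):
# Each feature column of X_normalized now has unit L2 norm
print(np.linalg.norm(X_normalized, axis=0)[:5])  # approximately [1. 1. 1. 1. 1.]

# A common, simpler alternative (not used here): scale raw pixels to [0, 1]
X_scaled = X / 255.0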
Train the MLP classifier on features
- Split the data into train and test sets
- Train the MLP classifier with different parameters
- Get the accuracy score and performance metrics
We do an 80:20 split - 80% of the data will be used for training and the remaining 20% for testing.
test_size = 0.2
X_train, X_test, y_train, y_test = train_test_split(
    X_normalized, y, test_size=test_size
)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
((970, 900), (243, 900), (970,), (243,))
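Note that this split is random and unstratified. Given the class imbalance seen earlier, a stratified variant (a sketch only, not used in the results that follow) would keep class proportions similar in both splits:
# Stratified variant (not used here): preserves per-class proportions.
# Requires at least 2 samples per class.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_normalized, y, test_size=test_size, stratify=y, random_state=42
)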
Next, we build a simple MLP classifier and train the model on the training set.
model = MLPClassifier()
model.fit(X_train, y_train)
Let's test this simple MLP classifier on the test data.
accuracy = model.score(X_test, y_test)
print("Test accuracy:", accuracy)
Test accuracy: 0.6172839506172839
Not bad, we've been able to achieve 61% accuracy on an untuned model.
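Before tuning, one optional diagnostic is the training loss curve that scikit-learn's MLPClassifier records during fit:
# Inspect convergence of the fitted MLP (attributes populated by fit)
print("Iterations run:", model.n_iter_)
plt.plot(model.loss_curve_)
plt.xlabel('Iteration')
plt.ylabel('Training loss')
plt.title('MLP training loss curve')
plt.show()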
Tune the hyper-parameters
- Use GridSearchCV or RandomizedSearchCV to select the best parameters, or
- Manually change parameters to find the best combination

All tunable parameters are described in the scikit-learn MLPClassifier documentation.
param_grid = {
    'hidden_layer_sizes': [(2048,), (2048, 1024), (2048, 1024, 512)],
    'activation': ['relu', 'tanh', 'logistic'],
    'solver': ['sgd', 'adam'],
    'alpha': [0.0001, 0.001]
}
model = MLPClassifier(max_iter=100)
grid_search = GridSearchCV(model, param_grid, cv=3, scoring='accuracy', verbose=1)
grid_search.fit(X_train, y_train)
print("Best parameters:", grid_search.best_params_)
accuracy = grid_search.best_estimator_.score(X_test, y_test)
print("Test accuracy:", accuracy)
Fitting 3 folds for each of 36 candidates, totalling 108 fits
Best parameters: {'activation': 'tanh', 'alpha': 0.0001, 'hidden_layer_sizes': (2048, 1024), 'solver': 'adam'}
Test accuracy: 0.8436213991769548
We've been able to improve the accuracy of the model to 84% by tuning the parameters with grid search.
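For reference, RandomizedSearchCV (the alternative mentioned above) samples a fixed number of combinations instead of exhaustively trying all 36. A minimal sketch reusing the same grid (not run for the results in this post):
from sklearn.model_selection import RandomizedSearchCV

# Sample 10 of the 36 combinations instead of trying all of them
random_search = RandomizedSearchCV(
    MLPClassifier(max_iter=100), param_distributions=param_grid,
    n_iter=10, cv=3, scoring='accuracy', random_state=42, verbose=1
)
random_search.fit(X_train, y_train)
print("Best parameters:", random_search.best_params_)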
Implement a simple neural network using Keras
- Define the Keras model and initialize the layers
  - Ensure the input layer has the right number of input features. This can be specified when creating the first layer with the input_shape argument.
- Compile the model
  - Specify the loss function (used to evaluate a set of weights), the optimizer (used to search through different weights for the network), and any optional metrics to collect and report during training.
- Fit and evaluate the model
  - Fit the data by specifying the number of epochs, then evaluate the model.
print(tf.__version__)
2.15.0
# Define the model architecture
def create_model(input_shape, num_classes):
    model = Sequential([
        Dense(1024, input_shape=input_shape, activation='relu'),
        Dropout(0.2),
        Dense(512, activation='relu'),
        Dropout(0.2),
        Dense(num_classes, activation='softmax')
    ])
    return model

# Define model parameters
input_shape = X_train.shape[1:]
num_classes = len(np.unique(y))  # all 43 classes, in case a rare class is missing from the train split

# Create the model
model = create_model(input_shape, num_classes)

# Compile the model
model.compile(optimizer=Adam(),
              loss=SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

# Stop training once validation loss stops improving for 5 epochs
es = EarlyStopping(
    monitor="val_loss", mode="min", verbose=0, patience=5, restore_best_weights=True
)

# Train the model (pass the early-stopping callback so it takes effect)
history = model.fit(X_train, y_train, epochs=200, batch_size=32,
                    validation_data=(X_test, y_test), callbacks=[es])

# Evaluate the model
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_accuracy}")
Test Accuracy: 0.8230452537536621
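Since fit returns a History object, plotting training vs. validation accuracy is a quick way to see how the regularization behaves:
# Plot training vs validation accuracy recorded during training
plt.plot(history.history['accuracy'], label='train')
plt.plot(history.history['val_accuracy'], label='validation')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()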
We created a neural network with an input layer, two hidden layers, and an output layer. We regularize the model, so that it does not overfit, using two methods: dropout after each hidden layer and early stopping on the validation loss.
Without much fine-tuning, we've been able to achieve accuracy similar to the tuned MLP classifier. We can fine-tune further using BatchNormalization, and by tuning other hyper-parameters of the model, such as the number of hidden layers and the number of neurons in each layer; a sketch follows.
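As one example of that further fine-tuning, here is a sketch of a variant that inserts BatchNormalization (imported earlier but unused so far) between each Dense layer and its activation; the layer sizes are illustrative, not tuned:
# Sketch: hidden blocks with batch normalization before the activation
def create_model_bn(input_shape, num_classes):
    model = Sequential([
        Dense(1024, input_shape=input_shape),
        BatchNormalization(),
        Activation('relu'),
        Dropout(0.2),
        Dense(512),
        BatchNormalization(),
        Activation('relu'),
        Dropout(0.2),
        Dense(num_classes, activation='softmax')
    ])
    return model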
To conclude this exercise, let's now get the classification report.
# Predict probabilities for each class for test set
y_pred_prob = model.predict(X_test)
# Convert probabilities to class labels
y_pred = np.argmax(y_pred_prob, axis=1)
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()
# Compute classification report
class_report = classification_report(y_test, y_pred)
# Print classification report
print("\nClassification Report:")
print(class_report)
![](https://dw1.s81c.com//IMWUC/MessageImages/38320a2567ad41a7be1de6cd4b06a65d.png)
Many images are wrongly classified as class 12. We should analyze this further and take appropriate steps, such as collecting more samples for the classes that are misclassified and reviewing the class definitions.
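To quantify which classes are poorly classified before taking those steps, per-class recall can be pulled out programmatically; a minimal sketch (the 0.5 threshold is arbitrary):
from sklearn.metrics import recall_score

# Per-class recall on the test set; low values flag the problem classes
labels = np.unique(np.concatenate([y_test, y_pred]))
recalls = recall_score(y_test, y_pred, average=None, labels=labels, zero_division=0)
for cls, rec in zip(labels, recalls):
    if rec < 0.5:
        print(f"Class {cls}: recall = {rec:.2f}")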
Conclusion
We embarked on this exercise to learn image classification. Images need specific transformations before they can be used to build machine learning models. We started with a simple MLP classifier and fine-tuned it. We then built a simple neural network with regularization for classification. Finally, we created and interpreted the classification report.
Reference: J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel. The German Traffic Sign Recognition Benchmark: A multi-class classification competition. In Proceedings of the IEEE International Joint Conference on Neural Networks, pages 1453–1460. 2011.