The objective of this exercise is to build and iteratively improve a machine learning model for image classification, arriving at a high-accuracy final model.
Download the data
!wget -qq https://sid.erda.dk/public/archives/ff17dc924eba88d5d01a807357d6614c/FullIJCNN2013.zip
!unzip -qq FullIJCNN2013.zip
Import required packages
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPClassifier
from skimage.io import imread, imshow
from sklearn import preprocessing
from sklearn.preprocessing import normalize
from sklearn import metrics
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
import seaborn as sns
import os, glob
from PIL import Image
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.callbacks import EarlyStopping
# Keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization, Activation
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
Get the features and labels of the data
We'll begin by converting the RGB images into grayscale images. Next, we'll flatten each image into a 1-D array so that it can be fed into a machine learning model. Later, we'll see the disadvantages of this approach and how CNNs solve those problems.
- Extract the features of the images within the image sections (the class folders) only
- Extract the labels of the images
- Resize the images to (30, 30) and convert them to 1-D NumPy arrays
def rgb2gray(rgb):
    # Convert an RGB image to grayscale using the standard luma weights
    return np.dot(rgb[..., :3], [0.2989, 0.5870, 0.1140])

def read_images(path_folder):
    """Read all the class folders (images),
    preprocess each image - resize to (30, 30) -
    and return the images and their classes.
    """
    img_data = []
    X = []
    Y = []
    for name in os.listdir(path_folder):
        # Classes are represented by folders (named 00 to 42)
        if os.path.isdir(os.path.join(path_folder, name)):
            idclass = int(name)
            path_class = os.path.join(path_folder, name)
            # Read the files (.ppm) in every class folder
            for file in os.listdir(path_class):
                if os.path.isfile(os.path.join(path_class, file)):
                    img = Image.open(os.path.join(path_class, file))
                    img = img.resize((30, 30))
                    img_array = np.array(img)
                    img_data.append(img_array)
                    gray = rgb2gray(img_array)
                    gray = gray.flatten()
                    X.append(gray)
                    Y.append(idclass)
    return np.array(img_data), np.array(X), np.array(Y)
path = "/content/FullIJCNN2013/"
img_data, X, y = read_images(path)
Let's check the shape of the data.
img_data.shape, X.shape, y.shape
((1213, 30, 30, 3), (1213, 900), (1213,))
We have 1213 RGB images, each of size 30 x 30. We have transformed each image into a grayscale image and vectorized it into a 1-D array of 900 (30 x 30) values. The labels associated with the images are recorded in the vector y.
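As a quick optional sanity check, one of the 900-element vectors can be reshaped back to 30 x 30 and displayed:
# Reshape a flattened grayscale vector back to 30x30 for a visual check
plt.imshow(X[0].reshape(30, 30), cmap='gray')
plt.title(f"Class {y[0]} (grayscale)")
plt.axis('off')
plt.show()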
Data Exploration and Preprocessing
Plot a sample image of each class
num_classes = 43  # classes are labeled 0 through 42
num_rows = 7
num_cols = 7
fig, axes = plt.subplots(num_rows, num_cols, figsize=(12, 12))
for i in range(num_classes):
    sample_image = img_data[np.where(y == i)[0][0]]
    row = i // num_cols
    col = i % num_cols
    axes[row, col].imshow(sample_image)
    axes[row, col].set_title(f"Class {i}")
    axes[row, col].axis('off')
# Hide the unused subplots in the grid
for i in range(num_classes, num_rows * num_cols):
    axes[i // num_cols, i % num_cols].axis('off')
plt.tight_layout()
plt.show()
![](https://dw1.s81c.com//IMWUC/MessageImages/5767d403cdda4cc895f52c2d8825e249.png)
There are 43 classes of images, labeled 0 through 42, each representing a traffic sign.
Plot the distribution of classes
unique_classes, class_counts = np.unique(y, return_counts=True)
plt.figure(figsize=(10, 8))
plt.barh(unique_classes, class_counts, color='skyblue')
plt.xlabel('Frequency')
plt.ylabel('Class')
plt.title('Class Distribution')
plt.yticks(unique_classes)
plt.grid(axis='x', linestyle='--', alpha=0.7)
plt.show()
![](https://dw1.s81c.com//IMWUC/MessageImages/68a3841d02924467a6bc67f651e5bf04.png)
There is a class imbalance: some classes have many examples, while others have very few. Ideally, we should balance them to build a better model. We'll leave this exercise for the next blog.
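As a small preview of that exercise, one lightweight option is to weight classes inversely to their frequency. A minimal sketch using scikit-learn (the weight_map name is ours, and it is not applied in the models below):
from sklearn.utils.class_weight import compute_class_weight

# Weight each class inversely proportional to its frequency in y
class_weights = compute_class_weight(class_weight='balanced',
                                     classes=unique_classes, y=y)
weight_map = dict(zip(unique_classes, class_weights))
A dict like this can later be passed to estimators that accept class weights, for example Keras model.fit via its class_weight argument.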
Normalize the features
For most image data, the pixel values are integers with values between 0 and 255.
Neural networks process inputs using small weight values, and inputs with large integer values can disrupt or slow down the learning process. As such, it is good practice to normalize the pixel values.
X_normalized = normalize(X, axis=0)
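Note that scikit-learn's normalize with axis=0 rescales each feature column to unit L2 norm, which differs from the more common practice of scaling pixels to [0, 1]. A quick optional check, plus the simpler alternative (not used in what follows):
# Each feature column of X_normalized now has unit L2 norm
print(np.linalg.norm(X_normalized, axis=0)[:5])  # approximately [1. 1. 1. 1. 1.]

# A common, simpler alternative (not used here): scale raw pixels to [0, 1]
X_scaled = X / 255.0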
Train the MLP classifier on features
- Split the data into train and test sets
- Train the MLP classifier with different parameters
- Get the accuracy score and performance metrics
We do an 80:20 split - 80% of the data will be used for training and the remaining 20% for testing.
test_size = 0.2
X_train, X_test, y_train, y_test = train_test_split(
    X_normalized, y, test_size=test_size
)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
((970, 900), (243, 900), (970,), (243,))
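Note that this split is random and unstratified. Given the class imbalance seen earlier, a stratified variant (a sketch only, not used in the results that follow) would keep class proportions similar in both splits:
# Stratified variant (not used here): preserves per-class proportions.
# Requires at least 2 samples per class.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_normalized, y, test_size=test_size, stratify=y, random_state=42
)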
Next, we build a simple MLP classifier and train the model on the training set.
model = MLPClassifier()
model.fit(X_train, y_train)
Let's test this simple MLP classifier on the test data.
accuracy = model.score(X_test, y_test)
print("Test accuracy:", accuracy)
Test accuracy: 0.6172839506172839
Not bad, we've been able to achieve 61% accuracy on an untuned model.
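Before tuning, one optional diagnostic is the training loss curve that scikit-learn's MLPClassifier records during fit:
# Inspect convergence of the fitted MLP (attributes populated by fit)
print("Iterations run:", model.n_iter_)
plt.plot(model.loss_curve_)
plt.xlabel('Iteration')
plt.ylabel('Training loss')
plt.title('MLP training loss curve')
plt.show()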
Tune the hyper-parameters
- Use GridSearchCV or RandomizedSearchCV to select the best parameters, or
- Manually change parameters to find the best combination

All tunable parameters are described in the scikit-learn MLPClassifier documentation.
param_grid = {
    'hidden_layer_sizes': [(2048,), (2048, 1024), (2048, 1024, 512)],
    'activation': ['relu', 'tanh', 'logistic'],
    'solver': ['sgd', 'adam'],
    'alpha': [0.0001, 0.001]
}
model = MLPClassifier(max_iter=100)
grid_search = GridSearchCV(model, param_grid, cv=3, scoring='accuracy', verbose=1)
grid_search.fit(X_train, y_train)
print("Best parameters:", grid_search.best_params_)
accuracy = grid_search.best_estimator_.score(X_test, y_test)
print("Test accuracy:", accuracy)
Fitting 3 folds for each of 36 candidates, totalling 108 fits
Best parameters: {'activation': 'tanh', 'alpha': 0.0001, 'hidden_layer_sizes': (2048, 1024), 'solver': 'adam'}
Test accuracy: 0.8436213991769548
We've been able to improve the accuracy of the model to 84% by tuning the parameters with grid search.
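For reference, RandomizedSearchCV (the alternative mentioned above) samples a fixed number of combinations instead of exhaustively trying all 36. A minimal sketch reusing the same grid (not run for the results in this post):
from sklearn.model_selection import RandomizedSearchCV

# Sample 10 of the 36 combinations instead of trying all of them
random_search = RandomizedSearchCV(
    MLPClassifier(max_iter=100), param_distributions=param_grid,
    n_iter=10, cv=3, scoring='accuracy', random_state=42, verbose=1
)
random_search.fit(X_train, y_train)
print("Best parameters:", random_search.best_params_)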
Implement a simple neural network using Keras
- Define the Keras model and initialize the layers
  - Ensure the input layer has the right number of input features. This can be specified when creating the first layer with the input_shape argument.
- Compile the model
  - Specify the loss function (used to evaluate a set of weights), the optimizer (used to search through different weights for the network), and any optional metrics to collect and report during training.
- Fit and evaluate the model
  - Fit the data by specifying the number of epochs, then evaluate the model.
print(tf.__version__)
2.15.0
# Define the model architecture
def create_model(input_shape, num_classes):
    model = Sequential([
        Dense(1024, input_shape=input_shape, activation='relu'),
        Dropout(0.2),
        Dense(512, activation='relu'),
        Dropout(0.2),
        Dense(num_classes, activation='softmax')
    ])
    return model

# Define model parameters
input_shape = X_train.shape[1:]
num_classes = len(np.unique(y))  # all 43 classes, in case a rare class is missing from the train split

# Create the model
model = create_model(input_shape, num_classes)

# Compile the model
model.compile(optimizer=Adam(),
              loss=SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

# Stop training once validation loss stops improving for 5 epochs
es = EarlyStopping(
    monitor="val_loss", mode="min", verbose=0, patience=5, restore_best_weights=True
)

# Train the model (pass the early-stopping callback so it takes effect)
history = model.fit(X_train, y_train, epochs=200, batch_size=32,
                    validation_data=(X_test, y_test), callbacks=[es])

# Evaluate the model
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_accuracy}")
Test Accuracy: 0.8230452537536621
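Since fit returns a History object, plotting training vs. validation accuracy is a quick way to see how the regularization behaves:
# Plot training vs validation accuracy recorded during training
plt.plot(history.history['accuracy'], label='train')
plt.plot(history.history['val_accuracy'], label='validation')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()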
We created a neural network with an input layer, two hidden layers, and an output layer. We regularize the model, so that it does not overfit, using two methods: dropout after each hidden layer and early stopping on the validation loss.
Without much fine-tuning, we've been able to achieve accuracy similar to the tuned MLP classifier. We can fine-tune further using BatchNormalization, and by tuning other hyper-parameters of the model, such as the number of hidden layers and the number of neurons in each layer; a sketch follows.
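As one example of that further fine-tuning, here is a sketch of a variant that inserts BatchNormalization (imported earlier but unused so far) between each Dense layer and its activation; the layer sizes are illustrative, not tuned:
# Sketch: hidden blocks with batch normalization before the activation
def create_model_bn(input_shape, num_classes):
    model = Sequential([
        Dense(1024, input_shape=input_shape),
        BatchNormalization(),
        Activation('relu'),
        Dropout(0.2),
        Dense(512),
        BatchNormalization(),
        Activation('relu'),
        Dropout(0.2),
        Dense(num_classes, activation='softmax')
    ])
    return model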
To conclude this exercise, let's now get the classification report.
# Predict probabilities for each class for test set
y_pred_prob = model.predict(X_test)
# Convert probabilities to class labels
y_pred = np.argmax(y_pred_prob, axis=1)
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()
# Compute classification report
class_report = classification_report(y_test, y_pred)
# Print classification report
print("\nClassification Report:")
print(class_report)
![](https://dw1.s81c.com//IMWUC/MessageImages/38320a2567ad41a7be1de6cd4b06a65d.png)
Many images are wrongly classified as class 12. We should analyze this further and take appropriate steps, such as collecting more samples for the classes that are misclassified and reviewing the class definitions.
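To quantify which classes are poorly classified before taking those steps, per-class recall can be pulled out programmatically; a minimal sketch (the 0.5 threshold is arbitrary):
from sklearn.metrics import recall_score

# Per-class recall on the test set; low values flag the problem classes
labels = np.unique(np.concatenate([y_test, y_pred]))
recalls = recall_score(y_test, y_pred, average=None, labels=labels, zero_division=0)
for cls, rec in zip(labels, recalls):
    if rec < 0.5:
        print(f"Class {cls}: recall = {rec:.2f}")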
Conclusion
We embarked on this exercise to learn image classification. Images need specific transformations before they can be used to build machine learning models. We started with a simple MLP classifier and fine-tuned it. We then built a simple neural network with regularization for classification. Finally, we created and interpreted the classification report.
Reference: J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel. The German Traffic Sign Recognition Benchmark: A multi-class classification competition. In Proceedings of the IEEE International Joint Conference on Neural Networks, pages 1453–1460. 2011.