Image Classification Using Convolutional Neural Networks

The aim of the project is to design and implement a custom convolutional neural network (CNN) architecture to address the task of classifying images from the CIFAR-10 dataset. The CIFAR-10 dataset consists of 60,000 colour images in 10 classes, with 50,000 images for training and 10,000 images for testing. The goal is to build a robust model that generalises well to unseen data while keeping the architecture both flexible and fairly computationally efficient.

  • Developed a novel CNN architecture featuring seven intermediate blocks followed by an output block.

  • Each intermediate block uses seven parallel convolutional layers whose outputs are combined dynamically: a small fully connected network maps the input's channel averages to softmax-normalised branch weights.

  • Maintains a constant channel width of 64 across all blocks to simplify the model and control parameter growth, reducing overfitting.

  • Trained with the Adam optimiser and cross-entropy loss, using a learning rate scheduler (halving every 100 epochs) and early stopping based on test loss.

  • Achieved a maximum test accuracy of 81.71% on the CIFAR-10 dataset.
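The dynamic combination described above can be sketched in PyTorch roughly as follows. This is a minimal sketch based on the description, not the project's actual code; the class name, kernel size, and the exact form of the weighting network are assumptions:

```python
import torch
import torch.nn as nn

class IntermediateBlock(nn.Module):
    """Sketch of one intermediate block: parallel conv branches whose
    outputs are combined by softmax weights predicted by a small fully
    connected layer from the input's per-channel averages.
    (Names and layer sizes are assumptions, not the original code.)"""
    def __init__(self, in_channels, out_channels=64, num_branches=7):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(),
            )
            for _ in range(num_branches)
        ])
        # Small fully connected network: channel averages -> branch weights
        self.weight_fc = nn.Linear(in_channels, num_branches)

    def forward(self, x):                                  # x: [B, C, H, W]
        avg = x.mean(dim=(2, 3))                           # [B, C] channel averages
        w = torch.softmax(self.weight_fc(avg), dim=1)      # [B, K] branch weights
        outs = torch.stack([b(x) for b in self.branches], dim=1)  # [B, K, 64, H, W]
        return (w[:, :, None, None, None] * outs).sum(dim=1)      # weighted sum
```

Because the weights depend on the input, each image can emphasise different branches, which is what makes the combination "dynamic" rather than a fixed average.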

An example of images and classes used in the CIFAR-10 dataset

Network Architecture

Prior to arriving at the architecture that provided the best test accuracy, a few alternative designs and hyperparameter configurations were explored. Initial experiments included varying the number of intermediate blocks and the number of convolutional layers per block, as well as comparing architectures with progressively increasing channel counts against those maintaining a constant channel width. These early tests revealed that while adding more layers and blocks could capture richer and more intricate features, doing so also increased model complexity and the risk of overfitting.

My most successful model architecture, presented on the left, adheres to the following:

  • The model consists of 7 intermediate blocks followed by an output block.

  • Each intermediate block takes input of shape [B, c, H, W] (with c starting at 3 and set to 64 after the first block).

  • Each block contains 7 parallel convolutional pathways; each pathway applies Conv2d, BatchNorm2d, and a ReLU activation.

  • Instead of averaging or concatenating the pathway outputs, a dynamic weighted sum is computed.

  • This allows the model to dynamically control how much each branch contributes to inference.

  • The output block takes the final feature map and converts it into logits for the 10 CIFAR-10 classes.

  • The model selects the class with the highest score using torch.argmax().

  • Despite some overfitting, the model does generalize fairly well, achieving over 80% test accuracy, which is a strong result for CIFAR-10 with a custom architecture.

  • Data augmentation, weight decay, and early stopping clearly helped mitigate overfitting to some extent.

  • Further improvements could come from stronger regularisation, more aggressive data augmentation, or experimenting with reducing model complexity.
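The output block described in the bullets above might look something like the following sketch. The use of average pooling before the linear layer is an assumption (the report only states that the final feature map is converted to logits), as are the class and parameter names:

```python
import torch
import torch.nn as nn

class OutputBlock(nn.Module):
    """Sketch of the output block: reduce the final [B, 64, H, W]
    feature map and map it to 10 class logits.
    (Pooling choice and names are assumptions, not the original code.)"""
    def __init__(self, in_channels=64, num_classes=10):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)        # [B, 64, 1, 1]
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, x):
        return self.fc(self.pool(x).flatten(1))    # [B, 10] logits

# Prediction is then the class with the highest logit:
# preds = torch.argmax(logits, dim=1)
```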

Dataset, Training & Validation

The CIFAR-10 Dataset:

  • 60,000 color images (32×32 pixels).

  • 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.

  • 50,000 training images and 10,000 test images.

To train the final model for CIFAR-10 classification, I adopted the Adam optimiser with an initial learning rate of 0.0007 and a weight decay of 2×10^−4. A StepLR learning rate scheduler reduced the learning rate by a factor of 0.5 every 100 epochs, which helped with gradual convergence. Training was set to run for up to 200 epochs, but to prevent overfitting I incorporated an early stopping mechanism that monitored the average test loss: if no improvement was observed for 10 consecutive epochs, training was halted, ensuring the model wasn't simply fitting noise.
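The training setup above translates to a few lines of PyTorch. This is a sketch of the stated hyperparameters only; the function name is mine, and the early-stopping loop is shown schematically in comments since the surrounding training code isn't given:

```python
import torch

def make_training_setup(model):
    """Optimiser, loss, and scheduler with the hyperparameters stated
    in the report (lr=0.0007, weight decay=2e-4, halve lr every 100 epochs).
    The function itself is a hypothetical helper, not the original code."""
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=7e-4, weight_decay=2e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.5)
    return criterion, optimizer, scheduler

# Early stopping on average test loss (patience of 10 epochs), schematically:
# best_loss, patience, wait = float("inf"), 10, 0
# for epoch in range(200):
#     ...train one epoch, compute average test_loss...
#     scheduler.step()
#     if test_loss < best_loss:
#         best_loss, wait = test_loss, 0
#     else:
#         wait += 1
#         if wait >= patience:
#             break
```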

During each epoch, the network's performance was evaluated on both the training and test datasets. Training loss was computed as the mean of batch losses, while training accuracy was calculated by comparing the model's predicted labels to the true labels over the entire training set. Similarly, test loss was averaged over all test batches, and test accuracy was measured as the percentage of correctly classified examples. These metrics were recorded at every epoch; they served both to monitor progress and to guide hyperparameter tuning.
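The per-epoch evaluation described above can be sketched as a small helper. This is a generic version of the procedure (mean of batch losses, percentage of correct predictions), not the project's actual evaluation code:

```python
import torch

def evaluate(model, loader, criterion, device="cpu"):
    """Average loss over batches and accuracy (%) over all examples,
    as described in the report. (Hypothetical helper, not original code.)"""
    model.eval()
    total_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            logits = model(x)
            total_loss += criterion(logits, y).item()
            correct += (logits.argmax(dim=1) == y).sum().item()
            total += y.size(0)
    return total_loss / len(loader), 100.0 * correct / total
```

Running this on both the training and test loaders each epoch yields the four curves (train/test loss and accuracy) used for monitoring and tuning.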