Image Classification on the CIFAR-10 Dataset

In this project, I tackle image classification using the CIFAR-10 dataset [1], a widely used benchmark dataset consisting of 60,000 small (32×32) color images across 10 classes, including airplanes, cars, birds, and more. While deep learning models, particularly convolutional neural networks (CNNs), have achieved remarkable performance on CIFAR-10, regularization techniques play a crucial role in improving generalization and robustness to overfitting.

My approach is inspired by the paper "Improved Regularization of Convolutional Neural Networks with Cutout"[2] by DeVries and Taylor. Cutout is a simple yet effective data augmentation technique that enhances model robustness by randomly masking out square regions of input images during training. By forcing the model to rely on surrounding visual information, Cutout encourages feature learning that is less dependent on specific pixel locations, thereby improving generalization.
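To make the idea concrete, here is a minimal NumPy sketch of a Cutout-style transform (not the authors' reference implementation); the function name cutout and the 8-pixel patch size are illustrative choices, and DeVries and Taylor tune the patch size per dataset.

```python
import numpy as np

def cutout(image, patch_size=8, rng=np.random):
    """Zero out a randomly placed square patch of an HxWxC image.

    The patch centre can fall anywhere in the image, so the masked
    region is clipped at the borders, as in the original paper.
    """
    h, w = image.shape[:2]
    cy, cx = rng.randint(h), rng.randint(w)           # random patch centre
    y1, y2 = max(0, cy - patch_size // 2), min(h, cy + patch_size // 2)
    x1, x2 = max(0, cx - patch_size // 2), min(w, cx + patch_size // 2)
    out = image.copy()
    out[y1:y2, x1:x2, ...] = 0                        # mask the region
    return out
```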

Through this study, I aim to demonstrate the practical benefits of Cutout in real-world deep learning applications and provide insights into how simple augmentation techniques can significantly boost performance without increasing computational complexity.

CNN Model Architecture: Layer-wise Breakdown

We explain each layer of the network below (a code sketch assembling the full model follows this breakdown):

1. Convolutional Layer (Conv2D, 32 filters, 3×3, ReLU, BatchNorm)

The first convolutional layer extracts low-level features such as edges and textures. We use 32 filters with a small 3×3 kernel to capture fine details while preserving spatial structure. Batch normalization is applied to stabilize learning and speed up convergence.

2. Convolutional Layer (Conv2D, 32 filters, 3×3, ReLU, Padding='same', BatchNorm)

This layer continues feature extraction while maintaining the spatial dimensions through 'same' padding. Batch normalization keeps gradient flow smooth and reduces internal covariate shift.

3. MaxPooling Layer (2×2)

Reduces the spatial dimensions by half, retaining the most important features while discarding less useful information. This helps reduce computational complexity and the risk of overfitting.

4. Convolutional Layer (Conv2D, 64 filters, 3×3, ReLU, Padding='same', BatchNorm)

We increase the number of filters to 64 to capture more complex patterns such as shapes and textures. The same 3×3 kernel is used to maintain spatial consistency.

5. Convolutional Layer (Conv2D, 64 filters, 3×3, ReLU, Padding='same', BatchNorm)

By stacking another convolutional layer, the network learns richer hierarchical features before downsampling.

6. MaxPooling Layer (2×2)

Again, we reduce spatial dimensions to enhance computational efficiency and make the network more robust to small translations.

7. Convolutional Layer (Conv2D, 128 filters, 3×3, ReLU, Padding='same', BatchNorm)

Now, we increase the filter count to 128, allowing the model to capture more abstract features like object parts and shapes.

8. Convolutional Layer (Conv2D, 128 filters, 3×3, ReLU, Padding='same', BatchNorm)

Another convolutional layer at this depth further refines feature extraction by reinforcing hierarchical representations.

9. MaxPooling Layer (2×2)

Final downsampling step before flattening, reducing spatial size while maintaining essential information.

10. Flatten Layer

Converts the 2D feature maps into a 1D vector in preparation for the fully connected layers.

11. Fully Connected Layer (Dense, 1024 neurons, ReLU, Dropout 0.2)

A fully connected layer with 1024 neurons combines the extracted features for high-level decision making. Dropout (0.2) is applied to reduce overfitting.

12. Output Layer (Dense, 10 neurons, Softmax)

The final layer consists of 10 neurons, one for each CIFAR-10 class. Softmax activation is used to output class probabilities.
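Putting the twelve layers above together, the architecture can be expressed as the following Keras-style sketch; it mirrors the breakdown, but details such as weight initialization are assumptions rather than a copy of the original training script.

```python
from tensorflow.keras import layers, models

def build_model(input_shape=(32, 32, 3), num_classes=10):
    """CNN matching the layer-wise breakdown above (filters 32-32-64-64-128-128)."""
    model = models.Sequential()
    # Block 1: two 3x3 conv layers with 32 filters, then 2x2 max pooling.
    model.add(layers.Conv2D(32, (3, 3), activation="relu", input_shape=input_shape))
    model.add(layers.BatchNormalization())
    model.add(layers.Conv2D(32, (3, 3), activation="relu", padding="same"))
    model.add(layers.BatchNormalization())
    model.add(layers.MaxPooling2D((2, 2)))
    # Block 2: two conv layers with 64 filters.
    model.add(layers.Conv2D(64, (3, 3), activation="relu", padding="same"))
    model.add(layers.BatchNormalization())
    model.add(layers.Conv2D(64, (3, 3), activation="relu", padding="same"))
    model.add(layers.BatchNormalization())
    model.add(layers.MaxPooling2D((2, 2)))
    # Block 3: two conv layers with 128 filters.
    model.add(layers.Conv2D(128, (3, 3), activation="relu", padding="same"))
    model.add(layers.BatchNormalization())
    model.add(layers.Conv2D(128, (3, 3), activation="relu", padding="same"))
    model.add(layers.BatchNormalization())
    model.add(layers.MaxPooling2D((2, 2)))
    # Classifier head.
    model.add(layers.Flatten())
    model.add(layers.Dense(1024, activation="relu"))
    model.add(layers.Dropout(0.2))
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model
```

Calling build_model() on 32×32×3 inputs produces a network ending in a 10-way softmax, matching steps 1–12 above.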

Results and Conclusion

After training the model for 200 epochs, we achieved a test-set accuracy of 89%. Applying Cutout regularization resulted in improved performance. Below, we visualize the model's performance on several image samples.

Our model performs well, and we believe it has the potential for further refinement to achieve higher accuracy.
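For readers who want to reproduce the experiment, here is a hedged sketch of how the cutout and build_model helpers above might be wired together for training; the Adam optimizer, the batch size of 128, and applying Cutout with fresh masks each epoch are assumptions, not the exact original configuration.

```python
import numpy as np
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

# Load and normalize CIFAR-10.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train, x_test = x_train.astype("float32") / 255.0, x_test.astype("float32") / 255.0
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

model = build_model()
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# One pass per epoch with freshly sampled Cutout masks (assumed setup).
for epoch in range(200):
    x_aug = np.stack([cutout(img) for img in x_train])
    model.fit(x_aug, y_train, batch_size=128, epochs=1, verbose=0)

test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.3f}")
```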

Thanks for reading! 🧑‍💻💕