
deep-learning-applications

This repository contains the code developed to solve the exercises of the Deep Learning Applications course (2024/25).

Laboratory 1

Exercises 1.1 and 1.2

The goal of these exercises is to demonstrate the importance of residual connections. We do this by evaluating simple MLPs and showing that deeper networks with residual connections are easier to train than equally deep networks without them. We compare MLPs with a width of 16 and varying depths on the MNIST dataset.

Note: for this first experiment, each residual block contains one layer.

| Depth | Accuracy (MLP) | Accuracy (ResidualMLP) |
|-------|----------------|------------------------|
| 2     | 0.93           | 0.94                   |
| 8     | 0.88           | 0.95                   |
| 32    | 0.11           | 0.95                   |
| 64    | 0.11           | 0.35                   |

Notice the poor performance of the last Residual MLP (depth 64). This was improved by increasing the number of layers per residual block from 1 to 2:

| Depth | Accuracy (MLP) | Accuracy (ResidualMLP, 2 layers/block) |
|-------|----------------|----------------------------------------|
| 2     | 0.93           | 0.93                                   |
| 8     | 0.88           | 0.95                                   |
| 32    | 0.11           | 0.96                                   |
| 64    | 0.11           | 0.96                                   |
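The residual variant simply groups the hidden layers into blocks and adds an identity skip connection around each block. A minimal PyTorch sketch of the idea (module and argument names are mine and may differ from the repository's code):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """`layers_per_block` Linear+ReLU layers wrapped by an identity skip connection."""
    def __init__(self, width: int, layers_per_block: int = 2):
        super().__init__()
        self.body = nn.Sequential(
            *[nn.Sequential(nn.Linear(width, width), nn.ReLU())
              for _ in range(layers_per_block)]
        )

    def forward(self, x):
        return x + self.body(x)  # output = input + F(input)

class ResidualMLP(nn.Module):
    def __init__(self, depth: int = 32, width: int = 16,
                 layers_per_block: int = 2, n_classes: int = 10):
        super().__init__()
        self.input_proj = nn.Linear(28 * 28, width)  # flattened MNIST images
        self.blocks = nn.Sequential(
            *[ResidualBlock(width, layers_per_block)
              for _ in range(depth // layers_per_block)]
        )
        self.head = nn.Linear(width, n_classes)

    def forward(self, x):
        return self.head(self.blocks(self.input_proj(x.flatten(1))))
```

The plain MLP used for comparison is the same stack without the `x +` term in `forward`.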

The figure below shows the magnitude of gradients as they propagate through the network, comparing a standard MLP and an MLP with residual blocks. The zig-zag pattern in the graph occurs because gradients for biases and weights were not separated in this visualization. Despite this, it is clear that residual connections help prevent vanishing gradients.

gradients
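The visualization can be reproduced by logging, after a backward pass, the mean absolute gradient of every parameter tensor, ordered from input to output (a sketch; the repository's exact logging code may differ). Plotting weight and bias gradients in a single sequence is what produces the zig-zag pattern mentioned above.

```python
import torch

def gradient_magnitudes(model: torch.nn.Module, loss: torch.Tensor) -> dict[str, float]:
    """Mean absolute gradient of each parameter tensor after backpropagating `loss`."""
    model.zero_grad()
    loss.backward()
    return {name: param.grad.abs().mean().item()
            for name, param in model.named_parameters()
            if param.grad is not None}
```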

Exercise 1.3

The goal of this exercise is to repeat the analysis from Exercise 1.2, this time using Convolutional Neural Networks trained on CIFAR-10. As expected, we observe improvements when residual connections are used.

cnn

Accuracy curves during training. The numbers next to each model name indicate the number of residual blocks (with or without skip connections enabled). Models with residual connections achieve consistently higher accuracy than the plain versions.
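The CNN follows the same recipe: identical stacks of convolutional blocks in which the skip connection can be switched on or off. A hedged sketch of such a block (channel sizes and normalization choices are illustrative, not necessarily the repository's exact architecture):

```python
import torch.nn as nn

class ConvResidualBlock(nn.Module):
    """Two 3x3 convolutions with an optional identity skip connection."""
    def __init__(self, channels: int, use_skip: bool = True):
        super().__init__()
        self.use_skip = use_skip
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.body(x)
        if self.use_skip:
            out = out + x  # disabling this gives the "plain" variant in the plot
        return self.relu(out)
```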

Exercise 2.3

The goal of this exercise is to explain the predictions of a CNN by visualizing Class Activation Maps (CAMs). We use the CNN trained in Exercise 1.3 and extend it with CAM to highlight which regions of the input image contribute most to the model’s classification decision.

cifar10_cam_6images

CAMs showing which image regions the trained CNN focuses on (CIFAR-10).
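As a reminder of how a CAM is obtained: the feature maps of the last convolutional layer are weighted by the classifier weights of the chosen class and summed. A minimal sketch, assuming a CNN that ends in global average pooling followed by a single linear head (the attribute names `features` and `classifier` are assumptions):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def compute_cam(model, image, target_class=None):
    """Class Activation Map for a CNN ending in global average pooling + linear classifier."""
    model.eval()
    feats = model.features(image.unsqueeze(0))          # (1, C, H, W) last conv feature maps
    logits = model.classifier(feats.mean(dim=(2, 3)))   # global average pooling + linear head
    if target_class is None:
        target_class = logits.argmax(dim=1).item()
    weights = model.classifier.weight[target_class]     # (C,) weights of the chosen class
    cam = torch.einsum("c,chw->hw", weights, feats[0])  # weighted sum of feature maps
    cam = F.relu(cam)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8), target_class
```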

We also apply CAM to a pre-trained ResNet-18 on images from the Imagenette dataset:

imagenette_cam_6images

Laboratory 3

Exercise 1

The focus of this exercise was to build a stable baseline for the next exercises. The task is sentiment analysis on the Rotten Tomatoes dataset.

  • A pretrained DistilBERT model was used as a feature extractor.
  • An SVM classifier was trained on top of the extracted features (see the sketch below).

Accuracy: 0.82 (on validation data)
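A minimal sketch of this baseline, assuming the [CLS] token embedding of distilbert-base-uncased is used as the sentence feature and a linear SVM on top (the pooling strategy and SVM settings in the repository may differ):

```python
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.svm import LinearSVC

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encoder = AutoModel.from_pretrained("distilbert-base-uncased").eval()

@torch.no_grad()
def extract_features(texts, batch_size=32):
    """Encode texts with DistilBERT and return the [CLS] hidden state as a feature vector."""
    feats = []
    for i in range(0, len(texts), batch_size):
        batch = tokenizer(texts[i:i + batch_size], padding=True,
                          truncation=True, return_tensors="pt")
        hidden = encoder(**batch).last_hidden_state  # (B, T, 768)
        feats.append(hidden[:, 0, :])                # [CLS] token embedding
    return torch.cat(feats).numpy()

# train_texts / train_labels come from the Rotten Tomatoes dataset
clf = LinearSVC()
clf.fit(extract_features(train_texts), train_labels)
print("val accuracy:", clf.score(extract_features(val_texts), val_labels))
```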

Exercise 2

In this exercise the pretrained DistilBERT model was fine-tuned in order to achieve higher accuracy than the baseline introduced above. With some pre-processing and the HuggingFace Trainer, the model achieved:

Accuracy: 0.84 (+2%)

(Best model across 30 fine-tuning epochs)
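A hedged sketch of the fine-tuning setup with the Trainer API (hyperparameters are illustrative; only the 30-epoch budget and the best-model selection follow the description above):

```python
import numpy as np
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("rotten_tomatoes")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

tokenized = dataset.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": (np.argmax(logits, axis=-1) == labels).mean()}

args = TrainingArguments(
    output_dir="distilbert-rotten-tomatoes",
    num_train_epochs=30,
    per_device_train_batch_size=32,
    eval_strategy="epoch",          # "evaluation_strategy" in older transformers versions
    save_strategy="epoch",
    load_best_model_at_end=True,    # keep the best checkpoint across the 30 epochs
    metric_for_best_model="accuracy",
)

trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"],
                  eval_dataset=tokenized["validation"],
                  compute_metrics=compute_metrics)
trainer.train()
```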

Exercise 3.2

In this exercise, we first use a small CLIP model, openai/clip-vit-base-patch16, to evaluate its zero-shot performance on the tiny-imagenet dataset:

Zero-shot accuracy: 0.63

Interestingly, just by prepending "A photo of a {label}" to the text prompts, we achieve:

Zero-shot accuracy: 0.70 (+7%)
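A minimal sketch of the zero-shot evaluation with the prompt template, using the Hugging Face CLIP classes (batching and the exact formatting of the class names are assumptions):

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")

@torch.no_grad()
def zero_shot_predict(images, class_names):
    """Predict a class index for each PIL image by matching it against templated prompts."""
    prompts = [f"A photo of a {name}" for name in class_names]
    inputs = processor(text=prompts, images=images, return_tensors="pt", padding=True)
    outputs = model(**inputs)
    # logits_per_image: (n_images, n_classes) image-text similarity scores
    return outputs.logits_per_image.argmax(dim=-1)
```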

Using Low-Rank Adaptation (LoRA) to fine-tune the attention layers of both the text and image encoders, we achieve:

Accuracy: 0.76

(Best model across 5 fine-tuning epochs)
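A hedged sketch of attaching LoRA adapters to the attention projections of both CLIP encoders with the peft library (the target module names follow the Hugging Face CLIP implementation; the rank and other hyperparameters are illustrative):

```python
from peft import LoraConfig, get_peft_model
from transformers import CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16")

lora_config = LoraConfig(
    r=8,                    # low-rank dimension (illustrative)
    lora_alpha=16,
    lora_dropout=0.1,
    # q/k/v/out projections exist in both the vision and text transformer layers,
    # so this targets the attention of both encoders
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```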

Fine-tuning only one of the encoders leads to:

| Accuracy (fine-tune image encoder only) | Accuracy (fine-tune text encoder only) |
|-----------------------------------------|----------------------------------------|
| 0.72                                    | 0.73                                   |

I also experimented with a similar methodology on an art dataset (Art Style Classification).

The goal is to classify paintings into one of the following five categories:

  • Portrait
  • Landscape
  • Abstract
  • Religious Painting
  • Cityscape

The zero-shot performance of CLIP is:

Zero-shot accuracy: 0.87
output

(Example of misclassification)


The performance after fine-tuning with LoRA increases up to:

Accuracy: 0.92 (+5%)

(Fine-tuning performed on both encoders)

Laboratory 4

Exercise 1

In this exercise, we build a simple Out-of-Distribution (OOD) detection pipeline. The dataset used for in-distribution (ID) examples is CIFAR-10, while the OOD datasets are a subset of CIFAR-100 (with classes not present in CIFAR-10) and randomly generated FakeData. For brevity, only the results on CIFAR-100 are discussed.

The maximum softmax probability (MSP) is used as the score of how OOD a test sample is. This score is produced by a custom small CNN and by a pretrained ResNet-20, which I compare in the following table:

[Figures: MSP histogram, ROC curve (AUC), and PR curve (AP), side by side for the custom CNN and the ResNet]

As shown in the plots, the ResNet performs better. This is expected since it also achieves higher classification accuracy on CIFAR-10 (81%) compared to the smaller custom CNN (64%).
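The MSP score and the ROC/PR evaluation can be summarized in a few lines (a sketch; `model` and the data loaders are assumed to be defined elsewhere):

```python
import torch
import torch.nn.functional as F
from sklearn.metrics import average_precision_score, roc_auc_score

@torch.no_grad()
def msp_scores(model, loader):
    """Maximum softmax probability per sample; higher means 'more in-distribution'."""
    model.eval()
    scores = []
    for x, _ in loader:
        probs = F.softmax(model(x), dim=1)
        scores.append(probs.max(dim=1).values)
    return torch.cat(scores)

id_scores = msp_scores(model, cifar10_test_loader)      # ID: CIFAR-10
ood_scores = msp_scores(model, cifar100_subset_loader)  # OOD: CIFAR-100 subset

labels = torch.cat([torch.ones_like(id_scores), torch.zeros_like(ood_scores)])
scores = torch.cat([id_scores, ood_scores])
print("AUROC:", roc_auc_score(labels, scores))
print("AP:   ", average_precision_score(labels, scores))
```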

Exercise 2.1

In this exercise the FGSM method is used to generate adversarial examples. The model under attack is the custom CNN introduced in the previous exercise. Some adversarial attacks generated with epsilon = 1/255 are shown below:

output1
output2
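A minimal sketch of the FGSM perturbation (single step; the iterative attacks evaluated below simply repeat this update, up to max_n_iterations times, until the prediction changes):

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=1 / 255):
    """Fast Gradient Sign Method: perturb x in the direction that increases the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()  # one signed-gradient step
    return x_adv.clamp(0, 1).detach()    # keep pixels in the valid range
```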

I used 3 metrics to evaluate how the generated adversarial images depend on epsilon:

  • Attack success rate
  • Average iterations to success
  • Average confidence drop

(All the attacks have a fixed max_n_iterations = 10)

quantitative_eval

As expected, bigger epsilons produce more powerful (but also more noticeable) attacks.

Exercise 2.2

In this exercise, FGSM adversarial samples are used to augment the training dataset of the OOD detector model. I implemented this augmented training with a weighted loss function in the training loop: for each batch, I compute the loss on both the original (clean) inputs and the adversarially perturbed inputs, then combine the two into a single loss. The weights of the two components are hyperparameters. A sketch of one training step is shown below.
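The sketch below reuses the `fgsm_attack` helper from Exercise 2.1; the weight values, `optimizer`, and `train_loader` are illustrative assumptions:

```python
import torch.nn.functional as F

clean_weight, adv_weight = 0.5, 0.5  # hyperparameters of the combined loss

for x, y in train_loader:
    x_adv = fgsm_attack(model, x, y, epsilon=1 / 255)  # perturb the current batch

    optimizer.zero_grad()
    clean_loss = F.cross_entropy(model(x), y)
    adv_loss = F.cross_entropy(model(x_adv), y)
    loss = clean_weight * clean_loss + adv_weight * adv_loss
    loss.backward()
    optimizer.step()
```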

For an equally weighted loss, loss = 0.5 * clean_loss + 0.5 * adv_loss, there is a slight improvement of about 2% in both the ROC and PR curves.

augmented_1 augmented_2

65% and 87%, vs. 63% and 85% (non-augmented training) from Exercise 1

For an unbalanced loss, loss = 0.2 * clean_loss + 0.8 * adv_loss, the performance slightly degrades. This suggests that the weights of the loss components might be tricky to tune.

output_4 output_5

Exercise 3.3

The goal of this exercise was to generate targeted attacks by creating adversarial samples that imitate samples from a specific class. Here is a qualitative evaluation of two of them, where the target class was "dog":

output_1
output_2
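Conceptually, the targeted variant differs from the untargeted one only in the label and the sign of the gradient step: instead of increasing the loss of the true class, we decrease the loss of the target class. A hedged sketch of one iteration:

```python
import torch
import torch.nn.functional as F

def targeted_fgsm_step(model, x, target_class, epsilon=1 / 255):
    """One targeted FGSM step: move x toward the decision region of target_class."""
    x = x.clone().detach().requires_grad_(True)
    target = torch.full((x.shape[0],), target_class, dtype=torch.long)
    loss = F.cross_entropy(model(x), target)
    loss.backward()
    x_adv = x - epsilon * x.grad.sign()  # minus sign: descend the target-class loss
    return x_adv.clamp(0, 1).detach()
```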

For a quantitative evaluation, I compared the targeted and untargeted attacks using the 3 metrics introduced in Exercise 2.1. The model used to generate the images is the same (the custom CNN trained in Exercise 2.2). For both types of attack, epsilon = 1/255 and max_n_iterations = 10.

comparison

As expected, the average confidence drop is larger and the average number of iterations to success is lower for untargeted attacks compared to targeted ones. This behavior arises because untargeted attacks only need to push the sample outside the decision region of the true class, which is a simpler optimization problem. What is surprising is that the success rate is slightly higher for targeted attacks. This might be the effect of the augmented training done during exercise 2.2, where the augmentation was based on untargeted attacks.


Note: Some parts of the code in this project were generated with the assistance of generative AI tools.
For example, almost all of the plotting code was AI-generated.
All AI-generated code was carefully reviewed and checked to make sure it executed as intended.
I also often used AI for debugging, especially issues related to tensor shapes or out-of-range indexing.
