The instructions under "Baseline: Without Self-Supervised Learning" clearly say that "all weights in the model are trained", but I found something confusing in the corresponding code, as follows:
model = Classifier(num_class=len(train_data.classes)).to(device)
for param in model.f.parameters():
    param.requires_grad = False
and
optimizer = optim.Adam(model.fc.parameters(), lr=1e-3, weight_decay=1e-6)
Doesn't this mean that only the weights of the final fc layer are actually trained?
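For comparison, here is a minimal sketch of what I would expect "all weights in the model are trained" to look like, assuming the same Classifier class with a feature extractor f and a final fc layer (this is only my guess, not the official code): drop the freezing loop and pass every parameter to the optimizer.

import torch.optim as optim

model = Classifier(num_class=len(train_data.classes)).to(device)
# no requires_grad = False loop here, so the feature extractor f stays trainable
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-6)

If full fine-tuning is really the intention, the freezing loop in the baseline code seems inconsistent with the written instructions.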