There is a known training bug in this project: I only set requires_grad_(False) on the teacher model but still passed its parameters to the optimizer, so the teacher model can still be updated during training even though no gradients are computed for it.
I don't plan to fix it, because the training results show it still works well.
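For reference, here is a minimal sketch of the pattern described above, assuming a standard PyTorch student/teacher setup (the module names, sizes, and learning rate are illustrative, not taken from this repo):

```python
import torch
import torch.nn as nn

# Hypothetical student/teacher models; names and shapes are only for illustration.
student = nn.Linear(128, 10)
teacher = nn.Linear(128, 10)

# Freeze the teacher: no gradients are computed for its parameters.
teacher.requires_grad_(False)

# The bug described above: the frozen teacher's parameters are still
# handed to the optimizer together with the student's parameters.
optimizer = torch.optim.Adam(
    list(student.parameters()) + list(teacher.parameters()), lr=1e-4
)

# A fixed version would register only the trainable parameters, e.g.:
# optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
```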