Bug
Thanks for this wonderful work. However, I noticed that during the second-stage fine-tuning of the decoder, if `torch.no_grad()` is not used to block gradient flow, the parameters of the encoder and entropy model also appear to be updated. Could you confirm whether this is the intended behavior?
Looking forward to your response!
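For reference, here is a minimal sketch of what I mean (using hypothetical stand-in modules, not the repo's actual classes): when the decoder loss backpropagates through the encoder's output without any blocking, the encoder accumulates gradients, so an optimizer constructed over all parameters would update it too. Detaching the latent (or wrapping the encoder forward in `torch.no_grad()`) keeps the encoder gradients empty.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the encoder and decoder modules.
encoder = nn.Linear(4, 4)
decoder = nn.Linear(4, 4)
x = torch.randn(2, 4)

# Case 1: no blocking -- the decoder loss populates encoder gradients.
loss = decoder(encoder(x)).sum()
loss.backward()
assert encoder.weight.grad is not None  # encoder would be updated

encoder.zero_grad()

# Case 2: detach the latent (equivalent to running the encoder
# under torch.no_grad()) -- encoder gradients stay empty.
latent = encoder(x).detach()
loss = decoder(latent).sum()
loss.backward()
assert encoder.weight.grad is None or bool(torch.all(encoder.weight.grad == 0))
```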