Backpropagation on chunks?

Hi,

When the document chunks are fed to the data-parallel model, how is the loss backpropagated? Is it backpropagated separately for every chunk?

Also, do you unfreeze and fine-tune for the classification task?

Thank you!