Hi,
When the document chunks are fed to the data parallel model, how is the loss backpropagated? Is it for every chunk?
Also, do you unfreeze and fine-tune for the classification task?
Thank you!