Could you elaborate a little more and maybe propose a solution to the problem you raised?
(2020/02/10)
I was able to finish this implementation by completing the stop-token prediction and removing the concatenation of the inputs and outputs of the multi-head attention.
However, the alignments of this implementation are less diagonal, so it cannot generate proper alignments for FastSpeech.
As a result, I failed to train FastSpeech with this implementation :(
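For what it's worth, one way to quantify "less diagonal" is to measure how much attention mass falls near the ideal diagonal of the decoder-to-encoder attention matrix. The sketch below is a hypothetical metric for illustration only, not something from this repo; the window width of ±1 is an arbitrary choice.

```python
import numpy as np

def diagonality_score(attn):
    """Rough measure of how diagonal an attention matrix is.

    attn: (T_dec, T_enc) attention weights, each row summing to 1.
    Returns the mean attention mass within +/-1 encoder step of the
    ideal diagonal (hypothetical metric, not from the original repo).
    """
    t_dec, t_enc = attn.shape
    score = 0.0
    for i in range(t_dec):
        # Encoder position the ideal diagonal passes through at decoder step i.
        j = int(round(i * (t_enc - 1) / max(t_dec - 1, 1)))
        lo, hi = max(j - 1, 0), min(j + 2, t_enc)
        score += attn[i, lo:hi].sum()
    return score / t_dec
```

A perfectly diagonal alignment (identity matrix) scores 1.0, while a uniform attention matrix scores close to 3/T_enc, so comparing this score across checkpoints could show whether the alignments are sharp enough to extract durations for FastSpeech.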