Hi Haarnoja,

Thanks a lot for maintaining this amazing repo!

I am a little confused about the implementation of SVGD in soft Q-learning. At softlearning/softlearning/algorithms/sql.py, Line 281 in 05daa55:

log_probs = svgd_target_values + squash_correction

the log probs are computed as log_probs = svgd_target_values + squash_correction, which gives log probs in the $u$ (raw action) space, where $a$ = tanh($u$).
However, the subsequent SVGD step uses these $u$-space log probs to compute update directions for $a$, which seems misaligned.
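For context, here is a minimal sketch of the change of variables that produces the squash correction. This is not the repo's code; `log_prob_squashed` is a hypothetical helper, and a standard Gaussian base distribution is assumed purely for illustration:

```python
import numpy as np

def log_prob_squashed(u, log_prob_u, eps=1e-6):
    """Log-density of a = tanh(u), given the log-density of the raw action u.

    Change of variables: log p_a(a) = log p_u(u) - sum_i log(1 - tanh(u_i)^2).
    The second term is the "squash correction"; eps guards against log(0)
    when tanh saturates.
    """
    squash_correction = -np.sum(np.log(1.0 - np.tanh(u) ** 2 + eps), axis=-1)
    return log_prob_u + squash_correction

# The gradient of this quantity with respect to u gives update directions
# in u-space, which is why the SVGD particles it is paired with should be
# the raw (pre-tanh) actions rather than the squashed ones.
```

In other words, whichever space the log probs are differentiated in is the space the SVGD particles must live in; mixing $u$-space gradients with $a$-space particles is the mismatch described above.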
I think the line at softlearning/softlearning/algorithms/sql.py, Line 235 in 05daa55:

actions = self._policy.actions(expanded_observations)

should instead be actions = self._policy.raw_actions(expanded_observations). (The policy class could add this property.)
Best,
Yuxuan