Hi Haarnoja,

Thanks a lot for maintaining this amazing repo!

I am a little confused about the implementation of SVGD in soft Q-learning. At softlearning/softlearning/algorithms/sql.py, Line 281 in 05daa55:

log_probs = svgd_target_values + squash_correction

the log probs are computed as log_probs = svgd_target_values + squash_correction, which gives log probs in the $u$ (raw action) space, where $a$ = tanh($u$).
However, the subsequent SVGD step uses these $u$-space log probs to compute update directions for $a$, which seems misaligned.
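For context, here is a minimal sketch of the change of variables that produces the squash correction. This is not the repo's code; `log_prob_squashed` is a hypothetical helper, and a standard Gaussian base distribution is assumed purely for illustration:

```python
import numpy as np

def log_prob_squashed(u, log_prob_u, eps=1e-6):
    """Log-density of a = tanh(u), given the log-density of the raw action u.

    Change of variables: log p_a(a) = log p_u(u) - sum_i log(1 - tanh(u_i)^2).
    The second term is the "squash correction"; eps guards against log(0)
    when tanh saturates.
    """
    squash_correction = -np.sum(np.log(1.0 - np.tanh(u) ** 2 + eps), axis=-1)
    return log_prob_u + squash_correction

# The gradient of this quantity with respect to u gives update directions
# in u-space, which is why the SVGD particles it is paired with should be
# the raw (pre-tanh) actions rather than the squashed ones.
```

In other words, whichever space the log probs are differentiated in is the space the SVGD particles must live in; mixing $u$-space gradients with $a$-space particles is the mismatch described above.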
I think the line at softlearning/softlearning/algorithms/sql.py, Line 235 in 05daa55:

actions = self._policy.actions(expanded_observations)

should instead be actions = self._policy.raw_actions(expanded_observations). (The policy class could add this property.)
Best,
Yuxuan