A MADDPG implementation that solves OpenAI's 'simple_tag' environment.
Three predators (the default) chase a prey for a reward of 10 on capture, plus a shaped reward based on the distance between the predators and the prey.
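As a rough illustration of how such a predator reward could be composed, here is a minimal sketch. The function name `predator_reward`, the shaping coefficient `shape_coef`, and the `catch_radius` threshold are assumptions made for this example and are not taken from this repository; only the capture reward of 10 comes from the description above.

```python
import numpy as np

def predator_reward(predator_positions, prey_position,
                    catch_radius=0.1, catch_bonus=10.0, shape_coef=0.1):
    """Hypothetical reward: capture bonus plus a distance-based shaping term."""
    predators = np.asarray(predator_positions, dtype=float)
    prey = np.asarray(prey_position, dtype=float)

    # Euclidean distance from every predator to the prey.
    distances = np.linalg.norm(predators - prey, axis=1)

    # Shaping term: the farther the predators are, the lower the reward.
    reward = -shape_coef * distances.sum()

    # Sparse term: each predator within catch_radius counts as a capture.
    reward += catch_bonus * np.count_nonzero(distances < catch_radius)
    return float(reward)
```

The shaping term gives the predators a learning signal before any capture happens, which is the usual motivation for enabling it.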
The three predators choose their actions with the MADDPG algorithm, while the prey acts randomly, sampling each action component uniformly from -1 to 1.
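The random prey policy can be sketched in a few lines. The helper name `random_prey_action` and the action dimension of 2 used below are assumptions for illustration; the actual dimension depends on how the environment's action space is configured.

```python
import numpy as np

def random_prey_action(action_dim, rng=None):
    """Hypothetical prey policy: sample each action component from U(-1, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.uniform(-1.0, 1.0, size=action_dim)

# Example: a 2-dimensional continuous action (assumed dimension).
action = random_prey_action(2)
```

Passing a seeded generator, for example `np.random.default_rng(0)`, makes the prey's behavior reproducible across runs.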
- pytorch==1.0.1
- tensorboardX
- Use my environment in envs, or...
- Install OpenAI's environment and edit the code as follows:
# environment.py L29
# self.discrete_action_space = True
self.discrete_action_space = False

# simple_tag.py L92
def agent_reward(self, agent, world):
    # Agents are negatively rewarded if caught by adversaries
    rew = 0
    # shape = False
    shape = True

# simple_tag.py L118
def adversary_reward(self, agent, world):
    # Adversaries are rewarded for collisions with agents
    rew = 0
    # shape = False
    shape = True

python train.py --tensorboard
