reinforcement-learning
Here are 6,609 public repositories matching this topic...
Bidirectional RNN
Is there a way to train a bidirectional RNN (such as an LSTM or GRU) in trax nowadays?
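I can't confirm whether current trax ships a ready-made bidirectional wrapper, but the underlying idea is framework-independent: run one recurrent pass over the sequence, a second pass over the time-reversed sequence, flip the second pass's outputs back, and concatenate. A minimal NumPy sketch of that scheme (a toy tanh RNN standing in for an LSTM/GRU cell; all names here are illustrative, not trax API):

```python
import numpy as np

def rnn_pass(xs, W, U, b, h0):
    """Run a simple tanh RNN over time axis 0; return all hidden states."""
    h, out = h0, []
    for x in xs:
        h = np.tanh(x @ W + h @ U + b)
        out.append(h)
    return np.stack(out)

def bidirectional(xs, params_fwd, params_bwd):
    """Forward pass plus a pass over the time-reversed input, concatenated."""
    fwd = rnn_pass(xs, *params_fwd)
    bwd = rnn_pass(xs[::-1], *params_bwd)[::-1]  # flip outputs back into order
    return np.concatenate([fwd, bwd], axis=-1)

rng = np.random.default_rng(0)
T, d, h = 5, 3, 4
xs = rng.normal(size=(T, d))
mk = lambda: (rng.normal(size=(d, h)), rng.normal(size=(h, h)),
              np.zeros(h), np.zeros(h))
out = bidirectional(xs, mk(), mk())
print(out.shape)  # (5, 8): hidden size doubles from the concatenation
```

In trax terms you would presumably express the reverse pass with the library's combinators (a branch that flips the time axis before and after the RNN layer), but check the current layer catalog before relying on that.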
The following applies to DDPG and TD3, and possibly to other models. These libraries were installed in a virtual environment:
numpy==1.16.4
stable-baselines==2.10.0
gym==0.14.0
tensorflow==1.14.0
Episode rewards do not seem to be updated in model.learn() before callback.on_step() is invoked. Depending on which callback.locals variable is used, this means that:
- episode rewards may n
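The ordering issue described above can be reproduced without stable-baselines at all. The toy loop below (names are illustrative, not the actual stable-baselines internals) fires its callback before folding the latest reward into the episode total, so the value the callback reads always lags by one step:

```python
seen = []

def on_step(local_vars):
    # Stand-in for a callback reading callback.locals during training.
    seen.append(local_vars["episode_reward"])

episode_reward = 0.0
for step_reward in [1.0, 2.0, 3.0]:
    on_step({"episode_reward": episode_reward})  # callback fires first...
    episode_reward += step_reward                # ...reward accrues after

print(seen)            # [0.0, 1.0, 3.0] -- always one step behind
print(episode_reward)  # 6.0
```

If model.learn() updates its reward bookkeeping after calling on_step(), any locals-based logging or early-stopping condition will see stale totals in exactly this way.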
How to use Watcher / WatcherClient over a TCP/IP network?
Watcher seems to be a ZMQ server and WatcherClient a ZMQ client, but there is no API/interface to configure the server IP address.
Do I need to implement a class that inherits from WatcherClient?
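I can't speak to WatcherClient's constructor specifically, but in plain pyzmq the client side chooses the server address at connect() time, so reaching a remote host is just a matter of passing "tcp://<server-ip>:<port>" instead of a localhost address. A minimal round trip (both ends in one process for illustration):

```python
import zmq

ctx = zmq.Context.instance()

server = ctx.socket(zmq.REP)
port = server.bind_to_random_port("tcp://127.0.0.1")  # server binds a port

client = ctx.socket(zmq.REQ)
client.connect(f"tcp://127.0.0.1:{port}")  # swap in a remote IP here

client.send(b"ping")
assert server.recv() == b"ping"
server.send(b"pong")
reply = client.recv()
print(reply)  # b'pong'

client.close(); server.close(); ctx.term()
```

If WatcherClient hard-codes its endpoint, subclassing (or checking whether its constructor accepts an address/port argument) would be the place to look; I can't confirm which applies without the tensorwatch source.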
Hi all!
I am trying a self-play based scheme in the waterworld environment: two agents share a policy that is being trained ("shared_policy_1"), while the other 3 agents sample a policy from a menagerie (set) of previous policies of the first two agents ("shared_policy_2").
My problem is that the weights in the menagerie are overwritten in every iteration by the cur
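One common cause of every menagerie entry ending up identical is appending the live weights object itself rather than a snapshot: every entry then aliases the same mutable structure, and each training update rewrites them all. A sketch of the failure and the fix (the `weights` dict is a stand-in for whatever your policy's get_weights() returns):

```python
import copy

weights = {"layer": [1.0, 2.0]}  # stand-in for live policy weights

menagerie_refs, menagerie_copies = [], []
for step in range(3):
    menagerie_refs.append(weights)                   # shared reference
    menagerie_copies.append(copy.deepcopy(weights))  # independent snapshot
    weights["layer"][0] += 1.0                       # simulated training update

print([w["layer"][0] for w in menagerie_refs])    # [4.0, 4.0, 4.0] -- overwritten
print([w["layer"][0] for w in menagerie_copies])  # [1.0, 2.0, 3.0] -- preserved
```

If your menagerie stores whatever the trainer hands back each iteration, deep-copying (or serializing) the weights at snapshot time should keep the historical policies intact.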