reinforcement-learning
Here are 6,609 public repositories matching this topic...
Bidirectional RNN
Is there a way to train a bidirectional RNN (such as an LSTM or GRU) in trax nowadays?
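I can't confirm whether current trax ships a ready-made bidirectional wrapper, but the underlying idea is framework-independent: run one recurrent pass over the sequence, a second pass over the time-reversed sequence, flip the second pass's outputs back, and concatenate. A minimal NumPy sketch of that scheme (a toy tanh RNN standing in for an LSTM/GRU cell; all names here are illustrative, not trax API):

```python
import numpy as np

def rnn_pass(xs, W, U, b, h0):
    """Run a simple tanh RNN over time axis 0; return all hidden states."""
    h, out = h0, []
    for x in xs:
        h = np.tanh(x @ W + h @ U + b)
        out.append(h)
    return np.stack(out)

def bidirectional(xs, params_fwd, params_bwd):
    """Forward pass plus a pass over the time-reversed input, concatenated."""
    fwd = rnn_pass(xs, *params_fwd)
    bwd = rnn_pass(xs[::-1], *params_bwd)[::-1]  # flip outputs back into order
    return np.concatenate([fwd, bwd], axis=-1)

rng = np.random.default_rng(0)
T, d, h = 5, 3, 4
xs = rng.normal(size=(T, d))
mk = lambda: (rng.normal(size=(d, h)), rng.normal(size=(h, h)),
              np.zeros(h), np.zeros(h))
out = bidirectional(xs, mk(), mk())
print(out.shape)  # (5, 8): hidden size doubles from the concatenation
```

In trax terms you would presumably express the reverse pass with the library's combinators (a branch that flips the time axis before and after the RNN layer), but check the current layer catalog before relying on that.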
The following applies to DDPG and TD3, and possibly to other models. These libraries were installed in a virtual environment:
numpy==1.16.4
stable-baselines==2.10.0
gym==0.14.0
tensorflow==1.14.0
Episode rewards do not seem to be updated in model.learn() before callback.on_step() is invoked. Depending on which callback.locals variable is used, this means that:
- episode rewards may n
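The ordering issue described above can be reproduced without stable-baselines at all. The toy loop below (names are illustrative, not the actual stable-baselines internals) fires its callback before folding the latest reward into the episode total, so the value the callback reads always lags by one step:

```python
seen = []

def on_step(local_vars):
    # Stand-in for a callback reading callback.locals during training.
    seen.append(local_vars["episode_reward"])

episode_reward = 0.0
for step_reward in [1.0, 2.0, 3.0]:
    on_step({"episode_reward": episode_reward})  # callback fires first...
    episode_reward += step_reward                # ...reward accrues after

print(seen)            # [0.0, 1.0, 3.0] -- always one step behind
print(episode_reward)  # 6.0
```

If model.learn() updates its reward bookkeeping after calling on_step(), any locals-based logging or early-stopping condition will see stale totals in exactly this way.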
How to use Watcher / WatcherClient over a TCP/IP network?
Watcher seems to be a ZMQ server and WatcherClient a ZMQ client, but there is no API/interface to configure the server IP address.
Do I need to implement a class that inherits from WatcherClient?
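I can't speak to WatcherClient's constructor specifically, but in plain pyzmq the client side chooses the server address at connect() time, so reaching a remote host is just a matter of passing "tcp://<server-ip>:<port>" instead of a localhost address. A minimal round trip (both ends in one process for illustration):

```python
import zmq

ctx = zmq.Context.instance()

server = ctx.socket(zmq.REP)
port = server.bind_to_random_port("tcp://127.0.0.1")  # server binds a port

client = ctx.socket(zmq.REQ)
client.connect(f"tcp://127.0.0.1:{port}")  # swap in a remote IP here

client.send(b"ping")
assert server.recv() == b"ping"
server.send(b"pong")
reply = client.recv()
print(reply)  # b'pong'

client.close(); server.close(); ctx.term()
```

If WatcherClient hard-codes its endpoint, subclassing (or checking whether its constructor accepts an address/port argument) would be the place to look; I can't confirm which applies without the tensorwatch source.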
Hi all!
I am trying a self-play based scheme in the waterworld environment: two agents share a policy that is being trained ("shared_policy_1"), while the other 3 agents sample a policy from a menagerie (set) of previous policies of the first two agents ("shared_policy_2").
My problem is that the weights in the menagerie are overwritten in every iteration by the cur
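One common cause of every menagerie entry ending up identical is appending the live weights object itself rather than a snapshot: every entry then aliases the same mutable structure, and each training update rewrites them all. A sketch of the failure and the fix (the `weights` dict is a stand-in for whatever your policy's get_weights() returns):

```python
import copy

weights = {"layer": [1.0, 2.0]}  # stand-in for live policy weights

menagerie_refs, menagerie_copies = [], []
for step in range(3):
    menagerie_refs.append(weights)                   # shared reference
    menagerie_copies.append(copy.deepcopy(weights))  # independent snapshot
    weights["layer"][0] += 1.0                       # simulated training update

print([w["layer"][0] for w in menagerie_refs])    # [4.0, 4.0, 4.0] -- overwritten
print([w["layer"][0] for w in menagerie_copies])  # [1.0, 2.0, 3.0] -- preserved
```

If your menagerie stores whatever the trainer hands back each iteration, deep-copying (or serializing) the weights at snapshot time should keep the historical policies intact.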