policy-gradient
Here are 231 public repositories matching this topic...
- I have marked all applicable categories:
  - exception-raising bug
  - RL algorithm bug
  - documentation request (i.e. "X is missing from the documentation.")
  - new feature request
- I have visited the [source website], and in particular read the [known issues]
- I have searched through the [issue tracker] for duplicates
- I have mentioned version numbers, operating system and environment, where applicable
BTgym has two main parts: the Gym framework and the RL algorithm framework. The RL part is tailored to BTgym's particular gym requirements, but as new research in the field emerges there is a benefit in exploring algorithms that this project does not implement. The following tutorial is my own attempt at testing the integration between the Gym part of BTgym and an external RL algorithm.
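As a rough sketch of that integration, the loop below drives a BTgym environment with an external agent through the plain Gym interface. The constructor argument (a filename pointing at 1-minute CSV price data) follows the project's basic examples but should be checked against your installed BTgym version, and RandomAgent is just a hypothetical stand-in for whatever external algorithm you plug in.

from btgym import BTgymEnv

class RandomAgent:
    # Hypothetical placeholder for an external algorithm: anything with an act(observation) method fits.
    def __init__(self, action_space):
        self.action_space = action_space

    def act(self, observation):
        return self.action_space.sample()

# Path is a placeholder; point it at whatever CSV price data you use with BTgym.
env = BTgymEnv(filename='./data/your_1min_bars.csv')
agent = RandomAgent(env.action_space)

observation = env.reset()
done = False
episode_reward = 0.0
while not done:
    action = agent.act(observation)
    observation, reward, done, info = env.step(action)
    episode_reward += reward
env.close()
print('episode reward:', episode_reward)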
In 1-grid-world ---> 1-policy-iteration, the width/height order is inconsistent throughout the code.
Because the code uses width=5 and height=5, it happens to work, but with width=5 and height=6 it does not.
For example,
self.value_table = [[0.0] * env.width for _ in range(env.height)]  # height x width
--->
self.value_table = [[0.0] * env.height for _ in range(env.width)]  # width x height
The whole code probably needs some reworking, and the graphics as well.
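Whichever orientation is picked, every access has to follow it. Below is a minimal sketch of one consistent convention, assuming the environment exposes env.width and env.height and states are addressed as (x, y); the names are illustrative, not the repository's actual code.

class ValueTable:
    def __init__(self, env):
        # height rows, each with width columns: always access as table[y][x]
        self.table = [[0.0] * env.width for _ in range(env.height)]

    def get(self, x, y):
        return self.table[y][x]

    def set(self, x, y, value):
        self.table[y][x] = value

# With width=5, height=6 this gives 6 rows of 5 entries, and get/set never mix the axes.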
I was surprised to see this loss function, because it is generally used when the target is a distribution (i.e. sums to 1), which is not the case for the advantage estimate. However, I worked out the math and it does appear to be doing the right thing, which is neat!
I think this trick should be mentioned in the code.
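For what it's worth, the math presumably works out like this: plugging the advantage-scaled one-hot action in as the cross-entropy "target" gives a loss of -advantage * log pi(a|s), whose gradient is exactly the usual policy-gradient term, even though that target does not sum to 1. A minimal NumPy sketch of the equivalence (illustrative values only, not the repository's code):

import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

logits = np.array([0.2, -1.0, 0.5])   # policy logits for 3 actions
action = 2                            # the action actually taken
advantage = 1.7                       # advantage estimate; not a distribution

probs = softmax(logits)

# Cross-entropy against an advantage-scaled one-hot "target" ...
target = np.zeros_like(probs)
target[action] = advantage
ce_loss = -(target * np.log(probs)).sum()

# ... equals the standard policy-gradient loss, so the gradients agree too.
pg_loss = -advantage * np.log(probs[action])

print(ce_loss, pg_loss)   # both print the same number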