# policy-gradient

Here are 231 public repositories matching this topic...

fredcallaway commented Jun 29, 2017

I was surprised to see this loss function because it is generally used when the target is a distribution (i.e. it sums to 1). This is not the case for the advantage estimate. However, I worked out the math, and it does appear to be doing the right thing, which is neat!

I think this trick should be mentioned in the code.
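
For readers hitting the same surprise, here is a minimal NumPy sketch of the trick as I understand it from this comment: a cross-entropy loss whose "target" is the one-hot action scaled by the advantage. Because cross-entropy is linear in the target, the target does not need to sum to 1, and the loss reduces to the usual -advantage * log pi(a|s). The function and variable names below are illustrative, not taken from the repository.

```python
import numpy as np

def crossentropy_pg_loss(probs, actions, advantages, eps=1e-8):
    """Cross-entropy with an advantage-scaled one-hot 'target'.

    The target rows do not sum to 1, but cross-entropy is linear in the
    target, so this equals the standard policy-gradient loss
    -mean(advantage * log pi(a|s)).
    """
    n = probs.shape[0]
    one_hot = np.zeros_like(probs)
    one_hot[np.arange(n), actions] = 1.0
    target = advantages[:, None] * one_hot          # not a distribution
    return -np.sum(target * np.log(probs + eps)) / n

# Sanity check against the explicit -A * log pi(a|s) form.
probs = np.array([[0.2, 0.8], [0.6, 0.4]])
actions = np.array([1, 0])
advantages = np.array([2.0, -1.0])
explicit = -np.mean(advantages * np.log(probs[np.arange(2), actions] + 1e-8))
assert np.isclose(crossentropy_pg_loss(probs, actions, advantages), explicit)
```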

JacobHanouna commented Mar 7, 2020

BTgym has two main sections, the Gym framework and the RL algorithm framework.
The RL part is tailored to the unique Gym requirements of BTgym, but as new research in the field emerges, there will be a benefit in exploring new algorithms that aren't implemented by this project.

The following tutorial is my own attempt at testing the integration of the Gym part of BTgym with an externa
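
As a rough illustration of what such an integration test might look like (this is not BTgym's actual API; the environment id and agent class below are placeholders, and the classic `gym` step signature `(obs, reward, done, info)` is assumed):

```python
import gym

class ExternalAgent:
    """Stand-in for an external RL algorithm; here it only samples random actions."""
    def __init__(self, action_space):
        self.action_space = action_space

    def act(self, observation):
        return self.action_space.sample()

# 'CartPole-v1' is only a placeholder for a BTgym environment instance;
# any Gym-compatible environment could be plugged in the same way.
env = gym.make('CartPole-v1')
agent = ExternalAgent(env.action_space)

obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = agent.act(obs)
    obs, reward, done, info = env.step(action)
    total_reward += reward
env.close()
print('episode return:', total_reward)
```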

Reinforcement Learning Tutorial with Demo: DP (Policy and Value Iteration), Monte Carlo, TD Learning (SARSA, Q-Learning), Function Approximation, Policy Gradient, DQN, Imitation, Meta Learning, Papers, Courses, etc.

  • Updated Jan 22, 2019
  • Jupyter Notebook
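
Since this topic page is about policy gradients, here is a minimal NumPy sketch of the vanilla policy-gradient (REINFORCE) update for a linear-softmax policy. It illustrates the general idea only; it is not code from the tutorial above, and the helper names and toy data are assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_update(theta, episode, lr=0.01, gamma=0.99):
    """One REINFORCE update for a linear-softmax policy
    pi(a|s) = softmax(theta.T @ s), theta of shape (state_dim, n_actions).

    episode: list of (state, action, reward) tuples from one rollout.
    """
    # Monte Carlo returns G_t = r_t + gamma * r_{t+1} + ...
    G, returns = 0.0, []
    for _, _, r in reversed(episode):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()

    for (s, a, _), G in zip(episode, returns):
        probs = softmax(theta.T @ s)
        # grad of log pi(a|s) w.r.t. theta: outer(s, one_hot(a) - probs)
        grad_log_pi = np.outer(s, -probs)
        grad_log_pi[:, a] += s
        theta += lr * G * grad_log_pi   # gamma^t factor dropped, as is common
    return theta

# Toy usage with random data, just to show the shapes involved.
rng = np.random.default_rng(0)
theta = np.zeros((4, 2))
episode = [(rng.normal(size=4), int(rng.integers(2)), 1.0) for _ in range(5)]
theta = reinforce_update(theta, episode)
```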
hccho2 commented Dec 15, 2019

In 1-grid-word ---> 1-policy-iteration:

Throughout the code, the width/height ordering is inconsistent.
Because the code uses width=5, height=5, it happens to work, but
with width=5, height=6 it does not.

For example,

self.value_table = [[0.0] * env.width for _ in range(env.height)]  # height x width

--->

self.value_table = [[0.0] * env.height for _ in range(env.width)]  # width x height

The code as a whole probably needs some cleanup.

In terms of the graphics
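
For what it's worth, here is a minimal sketch of one consistent convention (row = y over height, column = x over width) that keeps non-square grids such as width=5, height=6 working. The class and method names are illustrative rather than the repository's actual code, and states are assumed to be (x, y) pairs.

```python
class ValueTable:
    """Keeps a single indexing convention everywhere:
    value_table[y][x], with y in [0, height) and x in [0, width)."""

    def __init__(self, env):
        # height rows, each with width columns
        self.value_table = [[0.0] * env.width for _ in range(env.height)]

    def get_value(self, state):
        x, y = state                      # state assumed to be (x, y)
        return self.value_table[y][x]

    def set_value(self, state, value):
        x, y = state
        self.value_table[y][x] = value
```

Whether rows are indexed by height (as above) or by width (as the suggested change does) matters less than applying the same convention in every lookup.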
