Q-learning (1992) | none | ★★☆ | ★★★★ | DP, MDP, Bellman Equation |
Policy Gradient Methods for Reinforcement Learning with Function Approximation (1999) | none | ★★★ | ★★★☆ | Basic RL |
Playing Atari with Deep Reinforcement Learning (2013) | none | ★☆ | ★★☆ | TD, NN |
Actor-Critic Algorithms (1999) | none | ★★ | ★★★ | PG, TD |
Deep Reinforcement Learning with Double Q-learning (2015) | implements | ★ | ★★ | DQN, Double Q-learning |
Deterministic Policy Gradient Algorithms (2014) | none | ★★★ | ★★☆ | PG, DQN |
Continuous control with deep reinforcement learning (2016) | none | ★★ | ★☆ | DPG, DQN |
Addressing Function Approximation Error in Actor-Critic Methods (2018) | official | ★★★ | ★★☆ | DDQN |
Trust Region Policy Optimization (2015) | implements | ★★★★ | ★★★ | KL divergence, Kakade-Langford inequality |
Proximal Policy Optimization Algorithms (2017) | implements | ★☆ | ★★ | TRPO |
Asynchronous Methods for Deep Reinforcement Learning (2016) | implements implements | ★ | ★★ | Actor-Critic |
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures (2018) | implements | ★☆ | ★☆ | A3C |