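This article lists major reinforcement learning libraries and the reinforcement learning algorithms each one supports.
Note: the list also includes a few items that are not reinforcement learning algorithms themselves, such as visualization tools and building-block techniques.
Libraries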
tensorforce
github: https://github.com/tensorforce/tensorforce
documentation: http://tensorforce.readthedocs.io/
paper: http://arxiv.org/abs/1808.07903
keras-rl
github: https://github.com/keras-rl/keras-rl
documentation: https://keras-rl.readthedocs.io/en/latest/
No paper appears to have been published.
PFRL
github: https://github.com/pfnet/pfrl
documentation: https://pfrl.readthedocs.io/en/latest/
paper: ChainerRL: A Deep Reinforcement Learning Library
Stable Baselines3
github: https://github.com/DLR-RM/stable-baselines3
documentation: https://stable-baselines3.readthedocs.io/en/master/
paper: Stable-Baselines3: Reliable Reinforcement Learning Implementations
The Stable Baselines3 team also publishes experimental implementations of recent algorithms in a separate repository, Stable Baselines3 Contrib. This article counts those implementations as part of Stable Baselines3.
github: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
documentation: https://sb3-contrib.readthedocs.io/en/master/
TF-Agents
github: https://github.com/tensorflow/agents
documentation: https://www.tensorflow.org/agents
No paper appears to have been published.
Coach
github: https://github.com/IntelLabs/coach
documentation: https://intellabs.github.io/coach/
paper: Reinforcement Learning Coach
RLlib
github: https://github.com/ray-project/ray/tree/master/rllib
documentation: https://docs.ray.io/en/latest/rllib/index.html
paper: https://arxiv.org/abs/1712.09381
CleanRL
github: https://github.com/vwxyzjn/cleanrl
documentation: https://docs.cleanrl.dev/
paper: CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms
mbrl-lib
github: https://github.com/facebookresearch/mbrl-lib
documentation: https://facebookresearch.github.io/mbrl-lib/
paper: MBRL-Lib: A Modular Library for Model-based Reinforcement Learning
Supported algorithms
The abbreviations used in the Remarks column are as follows:
- mb: model-based reinforcement learning
- Outside the Remarks column: in the Stable Baselines3 column, a ✔ followed by "sbc" marks an algorithm that is provided by Stable Baselines3 Contrib.
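The notation above can be read mechanically. The following is a minimal sketch, assuming each table cell is available as a plain string; the helper names `decode_sb3_cell` and `is_model_based` are hypothetical and exist only for illustration. Only the markers "✔", "sbc", and "mb" come from the legend. The module names `stable_baselines3` and `sb3_contrib` are the Python import names of the two packages.

```python
from dataclasses import dataclass
from typing import Optional

CHECK = "✔"                          # marker used in the table for "supported"
SB3_MODULE = "stable_baselines3"     # import name of the core library
SB3_CONTRIB_MODULE = "sb3_contrib"   # import name of Stable Baselines3 Contrib


@dataclass(frozen=True)
class Cell:
    """Decoded content of one cell in the Stable Baselines3 column."""
    supported: bool
    module: Optional[str]  # module expected to provide the algorithm, if supported


def decode_sb3_cell(cell: str) -> Cell:
    """Decode a Stable Baselines3 cell such as "✔", "✔sbc", or ""."""
    text = cell.strip()
    if not text.startswith(CHECK):
        # No check mark: the table does not list the algorithm as supported.
        return Cell(supported=False, module=None)
    suffix = text[len(CHECK):].strip()
    if suffix == "sbc":
        # "sbc" means the implementation lives in Stable Baselines3 Contrib.
        return Cell(supported=True, module=SB3_CONTRIB_MODULE)
    return Cell(supported=True, module=SB3_MODULE)


def is_model_based(remarks: str) -> bool:
    """Return True when the Remarks cell carries the "mb" abbreviation."""
    return remarks.strip() == "mb"
```

For example, `decode_sb3_cell("✔sbc")` reports `sb3_contrib` as the module to import from, while a plain "✔" maps to the core `stable_baselines3` module.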
Library | tensorforce | keras-rl | PFRL | Stable Baselines3 | TF-Agents | Coach | RLlib | CleanRL | mbrl-lib | Remarks |
License | Apache-2.0 | MIT | MIT | MIT | Apache-2.0 | Apache-2.0 | Apache-2.0 | MIT | MIT | |
TensorBoard | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | visualization tool | |
Deep SARSA | ✔ | |||||||||
DQN | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||
Bootstrap DQN | ✔ | |||||||||
APEX-DQN | ✔ | |||||||||
QR-DQN | ✔sbc | ✔ | ||||||||
N-Step Q Learning | ✔ | |||||||||
Double DQN | ✔ | ✔ | ✔ | ✔ | ✔ | |||||
Prioritized Experience Replay (PER) | ✔ | ✔ | ||||||||
Hindsight Experience Replay (HER) | ✔ | ✔ | ||||||||
Dueling Network (Dueling DQN) | ✔ | ✔ | ✔ | ✔ | ||||||
NAF (Continuous DQN) | ✔ | ✔ | ✔ | |||||||
Categorical DQN | ✔ | ✔ | ✔ | |||||||
Noisy Network | ✔ | ✔ | ||||||||
IQN | ✔ | |||||||||
PAL | ✔ | ✔ | ||||||||
NEC | ✔ | |||||||||
Gorila | ||||||||||
Rainbow | ✔ | ✔ | ✔ | |||||||
Ape-X | ||||||||||
R2D2 | ✔ | |||||||||
Pseudo Count Based | ||||||||||
ICM | ||||||||||
RND | ||||||||||
NGU | ||||||||||
Agent57 | ||||||||||
Policy Gradient | ✔ | ✔ | ||||||||
ACKTR | ✔ | |||||||||
Actor-Critic | ✔ | |||||||||
HAC | ✔ | |||||||||
A3C | ✔ | ✔ | ✔ | |||||||
A2C | ✔ | ✔ | ✔ | ✔ | ||||||
ACER | ✔ | ✔ | ✔ | |||||||
UNREAL | ||||||||||
World Models | ||||||||||
SimPLe | ||||||||||
REINFORCE | ✔ | |||||||||
VPG | ✔ | |||||||||
DPG | ✔ | |||||||||
DDPG | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | |||
MADDPG | ✔ | |||||||||
APEX-DDPG | ✔ | |||||||||
TRPO | ✔ | ✔ | ✔sbc | |||||||
CPPO | ✔ | |||||||||
APPO | ✔ | |||||||||
DD-PPO | ✔ | |||||||||
MaskablePPO | ✔sbc | |||||||||
RecurrentPPO | ✔sbc | |||||||||
PPO | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | |||
TD3 | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||||
SAC | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||||
TQC | ✔sbc | |||||||||
GAE | ✔ | |||||||||
Dreamer | ✔ | mb | ||||||||
IMPALA | ✔ | |||||||||
MC | ||||||||||
MMC | ✔ | |||||||||
TD Gammon | ||||||||||
AlphaGo | ||||||||||
AlphaGo Zero | ||||||||||
AlphaZero | ✔ | |||||||||
AlphaStar | ||||||||||
OpenAI Five | ||||||||||
MuZero | ||||||||||
CEM | ✔ | |||||||||
BC | ✔ | ✔ | ||||||||
Conditional Imitation Learning | ✔ | |||||||||
GAIL | ✔ | |||||||||
LinUCB | ✔ | |||||||||
LinTS | ✔ | |||||||||
ARS | ✔sbc | ✔ | ||||||||
CQL | ✔ | |||||||||
SlateQ | ✔ | |||||||||
QMIX | ✔ | |||||||||
ES | ✔ | |||||||||
MAML | ✔ | |||||||||
MARWIL | ✔ | |||||||||
MBMPO | ✔ | mb | ||||||||
MBPO | ✔ | mb | ||||||||
PETS | ✔ | mb | ||||||||
PlaNet | ✔ | mb | ||||||||
DDQ | mb | |||||||||
DFP | ✔ | |||||||||
CRR | ✔ | |||||||||
PPG | ✔ | |||||||||