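OpenAI's Introductory Deep Reinforcement Learning Project: Spinning Up Notes (Part 2)
OpenAI's deep RL tutorial, Spinning Up in Deep RL, surveys the fundamentals and the classic algorithms, provides implementations for some of them, and includes a few exercises. It is fairly beginner-friendly, but still hard going for someone with no background at all, so I have translated each part and added my own understanding. I hope you find it useful.
Previous post: OpenAI's Introductory Deep Reinforcement Learning Project: Spinning Up Notes (Part 1)
Link to the original text:
Below is my translation plus supplementary notes (this is more than just a translation!):
II. Links for the algorithms in each category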
[2]A2C / A3C (Asynchronous Advantage Actor-Critic): Mnih et al, 2016
[3]PPO (Proximal Policy Optimization): Schulman et al, 2017
[4]TRPO (Trust Region Policy Optimization): Schulman et al, 2015
[5]DDPG (Deep Deterministic Policy Gradient): Lillicrap et al, 2015
[6]TD3 (Twin Delayed DDPG): Fujimoto et al, 2018
[7]SAC (Soft Actor-Critic): Haarnoja et al, 2018
[8]DQN (Deep Q-Networks): Mnih et al, 2013
[9]C51 (Categorical 51-Atom DQN): Bellemare et al, 2017
[10]QR-DQN (Quantile Regression DQN): Dabney et al, 2017
[11]HER (Hindsight Experience Replay): Andrychowicz et al, 2017
[12]World Models: Ha and Schmidhuber, 2018
[13]I2A (Imagination-Augmented Agents): Weber et al, 2017
[14]MBMF (Model-Based RL with Model-Free Fine-Tuning): Nagabandi et al, 2017
[15]MBVE (Model-Based Value Expansion): Feinberg et al, 2018
[16]AlphaZero: Silver et al, 2017
If you find any errors or problems, please leave a comment or contact me on QQ: 1642127033