# GPU_A3C

**Repository Path**: AngryPanda_XYZ/gpu_a3c

## Basic Information

- **Project Name**: GPU_A3C
- **Description**: NVIDIA's GPU-based implementation of the A3C algorithm, from the paper "Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU". The algorithm uses a single GPU for both inference and training (actor and critic), while multiple processes sample data online from the environments and jointly query the actor on the GPU. Official repository: https://github.com/NVlabs/GA3C
- **Primary Language**: Python
- **License**: BSD-3-Clause
- **Default Branch**: master
- **Homepage**: https://openreview.net/forum?id=r1VGvBcxl
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-09-06
- **Last Updated**: 2024-04-26

## Categories & Tags

**Categories**: Uncategorized

**Tags**: no longer maintained

## README

# GA3C: Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU

A hybrid CPU/GPU version of the Asynchronous Advantage Actor-Critic (A3C) algorithm, currently the state-of-the-art method in reinforcement learning for various gaming tasks. This CPU/GPU implementation, based on TensorFlow, achieves a significant speed-up compared to a similar CPU implementation.

## How do I get set up?

* Install [Python > 3.0](https://www.python.org/)
* Install [TensorFlow 1.0](https://www.tensorflow.org/install/install_linux)
* Install [OpenAI Gym](https://github.com/openai/gym)
* Clone the repo.
* That's it folks!

## How to train a model from scratch?

Run `sh _clean.sh` first, and then `sh _train.sh`.

The script `_clean.sh` cleans the checkpoints folder, which contains the network models saved during the training process, and removes `results.txt`, which is a log of the scores achieved during training.

> Remember to save your trained models and scores in a different folder, if needed, before cleaning.

`_train.sh` launches the training procedure, following the parameters in `Config.py`. You can modify the training parameters directly in `Config.py`, or pass them as arguments to `_train.sh`. E.g., launching `sh _train.sh LEARNING_RATE_START=0.001` overwrites the starting value of the learning rate in `Config.py` with the one passed as an argument (a sketch of this override mechanism is given after the play instructions below). You may want to modify `_train.sh` for your particular needs.

The output should look like this:

```
...
[Time: 33] [Episode: 26 Score: -19.0000] [RScore: -20.5000 RPPS: 822] [PPS: 823 TPS: 183] [NT: 2 NP: 2 NA: 32]
[Time: 33] [Episode: 27 Score: -20.0000] [RScore: -20.4815 RPPS: 855] [PPS: 856 TPS: 183] [NT: 2 NP: 2 NA: 32]
[Time: 35] [Episode: 28 Score: -20.0000] [RScore: -20.4643 RPPS: 854] [PPS: 855 TPS: 185] [NT: 2 NP: 2 NA: 32]
[Time: 35] [Episode: 29 Score: -19.0000] [RScore: -20.4138 RPPS: 877] [PPS: 878 TPS: 185] [NT: 2 NP: 2 NA: 32]
[Time: 36] [Episode: 30 Score: -20.0000] [RScore: -20.4000 RPPS: 899] [PPS: 900 TPS: 186] [NT: 2 NP: 2 NA: 32]
...
```

**PPS** (predictions per second) shows the frame-processing speed, while **Score** is the score achieved in the episode. **RPPS** and **RScore** are rolling averages of these values.

To stop the training procedure, adjust `EPISODES` in `Config.py` appropriately, or simply use Ctrl+C.

## How to continue training a model?

If you want to continue training a model, set `LOAD_CHECKPOINTS=True` in `Config.py`, and set `LOAD_EPISODE` to the episode number you want to load. Be sure that the corresponding model has been saved in the checkpoints folder (the model filename includes the episode number).

> Be sure not to use `_clean.sh` if you want to stop and then continue training!

## How to play a game with a trained agent?

Run `sh _play.sh`.

You may want to modify this script for your particular needs.
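Both the learning-rate example above and the `TRAINERS` example in the next section rely on `_train.sh` forwarding `KEY=VALUE` arguments that overwrite the matching settings in `Config.py` before the run starts. The following is a minimal, hypothetical sketch of how such overrides can be applied, assuming `Config.py` exposes its settings as class attributes; the default values shown here are placeholders, and the parsing in the official GA3C code may differ in detail.

```python
# Hypothetical sketch of the KEY=VALUE override mechanism used by _train.sh.
# Config is assumed to hold settings as class attributes; the defaults below
# are placeholders, not the values shipped with GA3C.
import sys

class Config:
    LEARNING_RATE_START = 0.0003
    EPISODES = 400000
    TRAINERS = 2

def apply_overrides(argv):
    for arg in argv:
        key, _, value = arg.partition('=')
        if not hasattr(Config, key):
            raise ValueError('Unknown config key: %s' % key)
        default = getattr(Config, key)
        if isinstance(default, bool):
            # bool('False') is truthy, so booleans need explicit parsing
            setattr(Config, key, value.lower() in ('true', '1', 'yes'))
        else:
            # Cast the string to the type of the existing default (int, float, str)
            setattr(Config, key, type(default)(value))

if __name__ == '__main__':
    apply_overrides(sys.argv[1:])          # e.g. LEARNING_RATE_START=0.001
    print(Config.LEARNING_RATE_START)
```

Casting each value through the type of its existing default keeps `Config.py` the single source of truth for parameter names and types.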
## How to change the game, configurations, etc.?

All the configurations are in `Config.py`.

As mentioned before, one useful way of modifying a config value is to pass it as an argument to `_train.sh`. For example, to run with four trainer threads, just run: `sh _train.sh TRAINERS=4`.

## Sample learning curves

Typical learning curves for Pong and Boxing are shown here. These can easily be obtained from the `results.txt` file (a plotting sketch is given at the end of this README).

![Convergence Curves](http://mb2.web.engr.illinois.edu/images/pong_boxing.png)

## References

If you use this code, please refer to our [ICLR 2017 paper](https://openreview.net/forum?id=r1VGvBcxl):

```
@conference{babaeizadeh2017ga3c,
  title={Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU},
  author={Babaeizadeh, Mohammad and Frosio, Iuri and Tyree, Stephen and Clemons, Jason and Kautz, Jan},
  booktitle={ICLR},
  url={https://openreview.net/forum?id=r1VGvBcxl},
  year={2017}
}
```

This work was first presented in an oral talk at [The 1st International Workshop on Efficient Methods for Deep Neural Networks](http://allenai.org/plato/emdnn/papers.html), NIPS Workshop, Barcelona (Spain), Dec. 9, 2016:

```
@article{babaeizadeh2016ga3c,
  title={{GA3C:} {GPU}-based {A3C} for Deep Reinforcement Learning},
  author={Babaeizadeh, Mohammad and Frosio, Iuri and Tyree, Stephen and Clemons, Jason and Kautz, Jan},
  journal={NIPS Workshop},
  note={arXiv preprint arXiv:1611.06256},
  year={2016}
}
```
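For completeness, here is how a learning curve like the ones above can be plotted from `results.txt`. The sketch is hypothetical: it assumes each line of the file follows the same bracketed format as the console output shown earlier; if the actual file uses a different layout (e.g., comma-separated values), adjust the parsing accordingly.

```python
# Hypothetical sketch: plot a learning curve from results.txt, assuming each
# line uses the bracketed console format shown in this README, e.g.
# [Time: 33] [Episode: 26 Score: -19.0000] [RScore: -20.5000 RPPS: 822] ...
import re
import matplotlib.pyplot as plt

pattern = re.compile(r'\[Episode:\s*(\d+)\s+Score:\s*(-?\d+(?:\.\d+)?)\]')

episodes, scores = [], []
with open('results.txt') as f:
    for line in f:
        match = pattern.search(line)
        if match:
            episodes.append(int(match.group(1)))
            scores.append(float(match.group(2)))

plt.plot(episodes, scores, alpha=0.3, label='Score')

# A simple moving average mirrors the RScore column in the training log
window = 50
if len(scores) >= window:
    smooth = [sum(scores[i - window:i]) / window
              for i in range(window, len(scores) + 1)]
    plt.plot(episodes[window - 1:], smooth,
             label='Rolling mean over %d episodes' % window)

plt.xlabel('Episode')
plt.ylabel('Score')
plt.legend()
plt.savefig('learning_curve.png')
```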