# sqlearn

**Repository Path**: wisestruct/sqlearn

## Basic Information

- **Project Name**: sqlearn
- **Description**: Source: https://github.com/haarnoja/softqlearning
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-02-05
- **Last Updated**: 2024-08-09

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

**This repository is a clone of https://github.com/haarnoja/softqlearning.**

# Soft Q-Learning

Soft Q-learning (SQL) is a deep reinforcement learning framework for training maximum entropy policies in continuous domains. The algorithm is based on the paper [Reinforcement Learning with Deep Energy-Based Policies](https://arxiv.org/abs/1702.08165), presented at the International Conference on Machine Learning (ICML), 2017.

## Prerequisites

The environment currently used for reproduction is Ubuntu 23.04 with an NVIDIA GTX 1080 Ti.

## Local Installation

Clone `sqlearn`, then create and activate the conda environment:

```
git clone https://gitee.com/iseekseek/sqlearn
cd sqlearn
cp -rf mujoco /root/.mujoco
conda env create -f environment.yml
source activate sqlenv
export PYTHONPATH=$(pwd)/rllab:${PYTHONPATH}
```

The environment should now be ready to run. See the Examples section below for how to train and simulate the agents.

## Examples

### Training and simulating an agent

1. To train the agent:

```
python ./examples/mujoco_all_sql.py --env=swimmer --log_dir="/root/sql/data/swimmer-experiment"
```

2. To simulate the agent (*NOTE*: this step currently fails with the Docker installation due to a missing display):

```
python ./scripts/sim_policy.py /root/sql/data/swimmer-experiment/itr_<iteration>.pkl
```

`mujoco_all_sql.py` contains several different environments, and there are more example scripts available in the `/examples` folder. For more information about the agents and configurations, run the scripts with the `--help` flag.
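As background on what "maximum entropy" means here (this sketch is not part of the original README): in the paper, the soft value function replaces the hard max over actions with a log-sum-exp, and the induced policy is an energy-based (Boltzmann) distribution over Q-values. A minimal NumPy illustration for a discrete action set follows; the repository itself handles continuous actions, so treat this purely as a conceptual toy:

```python
import numpy as np

def soft_value(q_values, alpha=1.0):
    """Soft value V = alpha * log sum_a exp(Q(a)/alpha).

    As alpha -> 0 this approaches max_a Q(a); larger alpha rewards entropy.
    """
    q = np.asarray(q_values, dtype=float)
    m = q.max()  # subtract the max before exponentiating for numerical stability
    return m + alpha * np.log(np.sum(np.exp((q - m) / alpha)))

def soft_policy(q_values, alpha=1.0):
    """Energy-based policy pi(a) proportional to exp((Q(a) - V) / alpha)."""
    q = np.asarray(q_values, dtype=float)
    p = np.exp((q - soft_value(q, alpha)) / alpha)
    return p / p.sum()
```

For example, with a near-zero temperature `soft_value([1, 2, 3], alpha=0.01)` is essentially `3.0` (the hard max), while at `alpha=1.0` equal Q-values yield a uniform policy.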
For example:

```
python ./examples/mujoco_all_sql.py --help
usage: mujoco_all_sql.py [-h]
                         [--env {ant,walker,swimmer,half-cheetah,humanoid,hopper}]
                         [--exp_name EXP_NAME] [--mode MODE] [--log_dir LOG_DIR]
```

### Training and combining policies

It is also possible to merge two existing maximum entropy policies to form a new composed skill that approximately optimizes both constituent tasks simultaneously, as discussed in [Composable Deep Reinforcement Learning for Robotic Manipulation](https://arxiv.org/abs/1803.06773). To run the pusher experiment described in the paper, first train two policies for the constituent tasks ("push the object to the given x-coordinate" and "push the object to the given y-coordinate") by running

```
python ./examples/pusher_pretrain.py --log_dir=/root/sql/data/pusher
```

You can then combine the two policies to form a composed skill ("push the object to the given x and y coordinates"), without collecting more experience from the environment, with

```
python ./examples/pusher_combine.py --log_dir=/root/sql/data/pusher/combined \
    --snapshot1=/root/sql/data/pusher/00/params.pkl \
    --snapshot2=/root/sql/data/pusher/01/params.pkl
```

# Credits

The soft Q-learning algorithm was developed by [Haoran Tang](https://math.berkeley.edu/~hrtang/) and [Tuomas Haarnoja](https://people.eecs.berkeley.edu/~haarnoja/) under the supervision of Prof. [Sergey Levine](https://people.eecs.berkeley.edu/~svlevine/) and Prof. [Pieter Abbeel](https://people.eecs.berkeley.edu/~pabbeel/) at UC Berkeley. Special thanks to [Vitchyr Pong](https://github.com/vitchyr), who wrote some parts of the code, and [Kristian Hartikainen](https://github.com/hartikainen), who helped with testing, documenting, and polishing the code and streamlining the installation process. The work was supported by [Berkeley Deep Drive](https://deepdrive.berkeley.edu/).
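The policy-combination step above rests on a result from the composable RL paper: for maximum entropy policies, the Q-function of the combined task is well approximated by the average of the constituent Q-functions, and the composed policy is the Boltzmann distribution over that average. The following is a hedged, discrete-action NumPy sketch of that idea only; it is not the repository's actual implementation, which operates on neural-network Q-functions:

```python
import numpy as np

def compose_q(q1, q2):
    """Approximate the combined task's Q-function by averaging the
    constituent Q-functions (the approximation used in Haarnoja et al., 2018)."""
    return 0.5 * (np.asarray(q1, dtype=float) + np.asarray(q2, dtype=float))

def boltzmann(q, alpha=1.0):
    """Energy-based policy induced by a Q-function: pi(a) proportional to exp(Q(a)/alpha)."""
    q = np.asarray(q, dtype=float)
    z = np.exp((q - q.max()) / alpha)  # shift by the max for numerical stability
    return z / z.sum()
```

Intuitively, an action that is good for both constituent tasks keeps a high averaged Q-value, so the composed policy concentrates on actions that satisfy both objectives at once; e.g. `compose_q([0, 2], [2, 0])` gives `[1, 1]`, and its Boltzmann policy is uniform because neither action dominates on both tasks.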
# References

```
@inproceedings{haarnoja2017reinforcement,
  title={Reinforcement Learning with Deep Energy-Based Policies},
  author={Haarnoja, Tuomas and Tang, Haoran and Abbeel, Pieter and Levine, Sergey},
  booktitle={International Conference on Machine Learning},
  year={2017}
}

@inproceedings{haarnoja2018composable,
  title={Composable Deep Reinforcement Learning for Robotic Manipulation},
  author={Haarnoja, Tuomas and Pong, Vitchyr and Zhou, Aurick and Dalal, Murtaza and Abbeel, Pieter and Levine, Sergey},
  booktitle={International Conference on Robotics and Automation},
  year={2018}
}
```