# DecisionMamba **Repository Path**: rationalspark/DecisionMamba ## Basic Information - **Project Name**: DecisionMamba - **Description**: No description available - **Primary Language**: Python - **License**: MIT - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2025-04-18 - **Last Updated**: 2025-04-18 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README

Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL

[**Qi Lv**](https://aopolin-lv.github.io/)1,2 · [**Xiang Deng**](https://xiang-deng-dl.github.io/)1† · [**Gongwei Chen**](https://scholar.google.com/citations?user=Mpg0w3cAAAAJ)1 · [**Michael Yu Wang**](https://scholar.google.com/citations?user=Oo7c22wAAAAJ&hl=zh-CN)2 · [**Liqiang Nie**](https://liqiangnie.github.io/)1† 1Harbin Institute of Technology (Shenzhen)    2Great Bay University     corresponding author **NeurIPS 2024** Paper PDF
This work presents Decision Mamba (DM), a novel multi-grained state space model (SSM) with a self-evolving policy learning strategy. DM explicitly models the historical hidden state to extract the temporal information by using the mamba architecture. To capture the relationship among state-action-RTG triplets, a fine-grained SSM module is designed and integrated into the original coarse-grained SSM in mamba, resulting in a novel mamba architecture tailored for offline RL. Finally, to mitigate the overfitting issue on noisy trajectories, a self-evolving policy is proposed by using progressive regularization. The policy evolves by using its own past knowledge to refine the suboptimal actions, thus enhancing its robustness on noisy demonstrations. **Framework** **The action distribution of polices trained on the different noisy data** ## Performance Here we present the performance comparsion between DM and baseline models. ## Usage ### Installation #### 1. Install MuJoCo First, you need to download the file from this [link](https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz) and `tar -xvf the_file_name` in the `~/.mujoco` folder. Then, run the following commands. ```bash cd experiment-d4rl conda env create -f env.yml ``` If you encounter errors brought by `mamba_ssm` and `causal_conv1d`, we suggest you downloading the source code and install manually. The versions are as follows: ``` mamba-ssm==1.2.0.post1 causal-conv1d==1.2.0.post2 ``` After that, add the following lines to your `~/.bashrc` file: ```bash export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/YOUR_PATH_TO_THIS/.mujoco/mujoco210/bin export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia ``` Remember to `source ~/.bashrc` to make the changes take effect. #### 2. Install D4RL Install D4RL by following the guidance in [D4RL](https://github.com/Farama-Foundation/D4RL). #### 3. Install transformers Install `transformers` package with: ```bash cd transformers pip install -e . ``` We also provide the `requirements.txt` and `environment.yaml` for installing or checking the packages. #### 3. Dataset To download original D4RL data, ```bash cd data python download_d4rl_datasets.py ``` ### Run ```bash sh run.sh hopper medium 0 0.85 ``` ## Acknowledgement **DM** is based on many open-source projects, including [Decision Transformer](https://github.com/kzl/decision-transformer), [Can Wikipedia Help Offline Reinforcement Learning](https://github.com/machelreid/can-wikipedia-help-offline-rl), [LaMo](https://github.com/srzer/LaMo-2023), [LoRA](https://github.com/microsoft/LoRA), [DeFog](https://github.com/hukz18/DeFog). We thank all these authors for their nicely open sourced code and their great contributions to the community. ## License DM is licensed under the MIT license. See the [LICENSE](LICENSE) file for details. ## Citation If you find this project useful, please consider citing: ```bibtex @inproceedings{NEURIPS2024_288b63aa, author = {Lv, Qi and Deng, Xiang and Chen, Gongwei and Wang, Michael Yu and Nie, Liqiang}, booktitle = {Advances in Neural Information Processing Systems}, editor = {A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang}, pages = {22827--22849}, publisher = {Curran Associates, Inc.}, title = {Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL}, url = {https://proceedings.neurips.cc/paper_files/paper/2024/file/288b63aa98084366c4536ba0574a0f22-Paper-Conference.pdf}, volume = {37}, year = {2024} } ```