# LLaMA-Rider
**Repository Path**: sunfangyi/LLaMA-Rider
## Basic Information
- **Project Name**: LLaMA-Rider
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-11-11
- **Last Updated**: 2023-11-11
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# LLaMA-Rider: Spurring Large Language Models to Explore the Open World
[[arXiv Paper]](https://arxiv.org/abs/2310.08922)
---
**LLaMA-Rider** is a two-stage framework:
* Exploration stage: the LLM explores the open world with the help of environmental feedback; a feedback-revision mechanism lets the LLM revise its previous decisions so that they align with the environment (a minimal sketch follows this list)
* Learning stage: the experiences collected during the exploration stage are processed into a supervised dataset and used for supervised fine-tuning (SFT) of the LLM
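
The following is a minimal, self-contained sketch of the feedback-revision loop. The `ToyEnv` and `ToyLLM` classes are illustrative stand-ins, not the repo's actual API; the real logic lives in `collect_feedback.py`:

```python
from dataclasses import dataclass

@dataclass
class Feedback:
    ok: bool
    message: str

class ToyEnv:
    """Toy environment stand-in; the real one is the Minecraft environment."""
    def reset(self, task):
        return "a tree is nearby"

    def step(self, action):
        # Reject actions the environment cannot execute.
        if "craft" in action:
            return "a tree is nearby", Feedback(False, "missing materials"), False
        return "collected a log", Feedback(True, "success"), True

class ToyLLM:
    """Toy LLM stand-in with hypothetical propose/revise helpers."""
    def propose(self, task, obs):
        return "craft a plank"   # initial (infeasible) decision

    def revise(self, task, obs, action, feedback):
        return "chop the tree"   # decision revised to fit the environment

def explore(task, env, llm, max_steps=20):
    experiences = []
    obs = env.reset(task)
    for _ in range(max_steps):
        action = llm.propose(task, obs)
        obs, feedback, done = env.step(action)
        if not feedback.ok:
            # Feedback-revision: the LLM revises its previous decision
            # to align with the environment, then tries again.
            action = llm.revise(task, obs, action, feedback)
            obs, feedback, done = env.step(action)
        experiences.append((task, obs, action, feedback.ok))
        if done:
            break
    return experiences

print(explore("harvest a log", ToyEnv(), ToyLLM()))
```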
## Exploration stage
In the exploration stage, for tasks based on logs/stones/mobs, run
```shell
python collect_feedback.py
```
For tasks based on iron ore, run
```shell
python collect_feedback_iron.py
```
Available tasks are listed in `envs/hard_task_conf.yaml`; modify this file to change task settings.
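
For orientation, a snippet like the following lists the available task entries, assuming the repository root as the working directory and that the top level of the file is a mapping of task names (the per-task schema is defined by the repo itself):

```python
import yaml  # pip install pyyaml

# Print the top-level task entries in the config; the fields inside
# each entry (targets, items, etc.) are defined by the repository.
with open("envs/hard_task_conf.yaml") as f:
    conf = yaml.safe_load(f)

for task_name in conf:
    print(task_name)
```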
## Learning stage
One can process the explored experiences into a supervised dataset by running:
```shell
python process_data.py
```
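
Conceptually, this step turns exploration trajectories into prompt/response pairs for SFT. The sketch below is illustrative only: the field names, prompt format, and success filter are assumptions, and the actual schema is produced by `process_data.py`:

```python
import json

def to_sft_examples(trajectory):
    """Turn one exploration trajectory into SFT examples (hypothetical schema)."""
    examples = []
    for step in trajectory["steps"]:
        if step.get("success"):  # keep only environment-aligned decisions
            examples.append({
                "prompt": (
                    f"Task: {trajectory['task']}\n"
                    f"Observation: {step['observation']}\n"
                    "Next action:"
                ),
                "response": step["action"],
            })
    return examples

trajectory = {
    "task": "harvest a log",
    "steps": [
        {"observation": "a tree is nearby", "action": "chop the tree", "success": True},
    ],
}

with open("sft_data.jsonl", "w") as f:
    for ex in to_sft_examples(trajectory):
        f.write(json.dumps(ex) + "\n")
```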
For the learning stage, we use [QLoRA](https://github.com/artidoro/qlora) to train the LLM. Run
```shell
sh train/scripts/sft_70B.sh
```
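
For reference, a generic QLoRA setup with `transformers`, `bitsandbytes`, and `peft` looks roughly like the following. The hyperparameters shown are common QLoRA defaults, not necessarily those used in `train/scripts/sft_70B.sh`:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization of the frozen base model, as in the QLoRA paper.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-chat-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Trainable LoRA adapters on the attention projections.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```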
## Evaluation
To evaluate the LLM after SFT, run
```shell
python collect_feedback.py --adapter /path/to/adapter
```
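
If you want to load the trained adapter outside the provided script, a typical pattern with `peft` is the following (the base-model name and adapter path are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the frozen base model and attach the trained QLoRA adapter.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-chat-hf", device_map="auto"
)
model = PeftModel.from_pretrained(base, "/path/to/adapter")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-chat-hf")
```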
## Main results
Built on LLaMA-2-70B-chat, **LLaMA-Rider** outperforms the ChatGPT planner on average across 30 tasks in Minecraft.
Moreover, after the learning stage **LLaMA-Rider** accomplishes 56.25% more tasks using only 1.3k supervised samples, demonstrating the efficiency and effectiveness of the framework.

We also find that after exploring and learning on the 30 log/stone/mob-based tasks, **LLaMA-Rider** achieves better performance on the unseen and more difficult iron-based tasks, showing that the learned decision-making capabilities generalize.

## Citation
If you use our method or code in your research, please consider citing the paper as follows:
```latex
@article{feng2023llama,
  title={LLaMA Rider: Spurring Large Language Models to Explore the Open World},
  author={Yicheng Feng and Yuxuan Wang and Jiazheng Liu and Sipeng Zheng and Zongqing Lu},
  journal={arXiv preprint arXiv:2310.08922},
  year={2023}
}
```