# polaris_rag

**Repository Path**: shangyejun/polaris_rag

## Basic Information

- **Project Name**: polaris_rag
- **Description**: 很轻量的一个rag服务
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 2
- **Forks**: 0
- **Created**: 2024-08-19
- **Last Updated**: 2024-09-10

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# 北极星RAG

## 简介
这是一个最简实现的RAG，目前支持Openai库、Qwen2本地模型提供API服务、Zhipu的API服务，聊天的内容是从外挂知识库中去检索的，代码有些简陋，欢迎大佬进行指点，支持shell命令行对话和web网页对话，效果还可以，后续还有一些功能进行扩展，比如支持上下文，聊天记录本地存储，agent tools加载。文本向量化支持m3e和智谱embedding模型，后续将加入bge，正在开发中。现在的向量化的文本存储为json文件，参考了`TinyRAG`的实现，在开发中的版本将支持chroma向量数据库，个人觉得，以json文件存储文本向量的方式有助于我们去理解如何实现rag。

现在已经支持`llama-cpp-python`运行gguf格式的qwen2-0.5b的模型，这将使对rag感兴趣的人们更加容易上手，但模型参数的大小对增强检索生成的性能是有关系的，个人的经验是模型参数越大则效果越好，但是在资源有限的情况下，自然是希望可以用参数越小的模型实现更佳的问答效果。

### 开发环境

* python3.10
* vscode
* langchain
* openai
* zhipuai
* modelscope

### 环境搭建
#### 使用conda
```angular2html
conda create -n polaris_rag python=3.10
conda activate polaris_rag
pip install -r requirement.txt
```
#### 使用virtualenv 
```
pip install virtualenv

virtualenv polaris_rag -p python

# win
.\polaris_rag\Scripts\activate
# linux
source polaris_rag/bin/activate

pip install -r requirement.txt
```


### 最简使用

使用qwen2-05b的gguf模型搭建api服务进行推理

#### 下载qwen2-05b模型

```
model_dir = model_file_download(model_id='qwen/Qwen2-0.5B-Instruct-GGUF',
                                 file_path='qwen2-0_5b-instruct-q5_k_m.gguf',
                                 revision='master',
                                 cache_dir='./model')
```

#### llama-cpp-python

##### 下载

```
pip install llama-cpp-python
# 以下可参考llama-cpp-python使用 /document/llama-cpp-pytho使用.md
```

##### 启动

```
python -m llama_cpp.server --model ./model/qwen2-0_5b-instruct-q5_k_m.gguf --n_gpu_layers 32 --m_thread 8
```

#### 配置

修改.env文件

```
QWEN2_API_KEY="sk-no-key-required"
QWEN2_BASE_URL="http://localhost:8000/v1"
```

查看robot.py文件

```
# 初始化模型
llm = Qwen2LLM()
```

### 启动

#### shell交互
```angular2html
python robot.py --chat_mode shell
# 输入q结束对话
```

#### web交互
```angular2html
python robot.py --chat_mode web
```

### 使用
#### 导入大模型api key等
```angular2html
# 导入openai api key
import os
from dotenv import load_dotenv, find_dotenv
# .env 存储api_key
load_dotenv()
```

#### 构建向量数据库
##### 第一种方法
```angular2html
# 导入数据
python robot.py --db load

# 清除数据
python robot.py --db clean
```
##### 第二种方法
```
from component.embedding import ZhipuEmbedding
from component.data_utils import ReadFileFolder
from component.vector_databases import VectorDB

# 建立数据库
filter = ReadFileFolder('./dataset')
docs = filter.get_all_chunk_content(200,150)
embedding_model = ZhipuEmbedding()
database = VectorDB(docs)
vectors = database.get_vector(embedding_model)
# 保存数据
database.export_data()
```


#### 加载向量数据库
```angular2html
from component.embedding import ZhipuEmbedding
# 将向量和文档内容保存到db目录下，下次再用就可以直接加载本地的数据库
#加载向量数据库
text="项目结构"
embedding_model=ZhipuEmbedding()
db=VectorDB()
db.load_vector('./db')
result=db.query(text,embedding_model,10)
print(result)
```


### 参考资料
* https://github.com/abetlen/llama-cpp-python
* https://github.com/chroma-core/chroma
* https://datawhaler.feishu.cn/wiki/NNKJwO0q7iC3uakHjkhcwLhbn9d
* https://github.com/KMnO4-zx/TinyRAG
* https://github.com/langchain-ai/langchain
* https://llama-cpp-python.readthedocs.io