# Agent-E

**Repository Path**: felixchina2024/Agent-E

## Basic Information

- **Project Name**: Agent-E
- **Description**: Agent-E is a system built on the AutoGen agent framework that aims to automate actions on the user's computer, in particular automation within the browser. Using natural language processing, it lets users control a web browser with everyday language instructions to perform operations such as filling out forms, filtering products on e-commerce platforms, retrieving information from websites, controlling web media playback, running web searches, managing tasks in project management software, and providing personalized shopping assistance.
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2024-08-13
- **Last Updated**: 2024-09-27

## Categories & Tags

**Categories**: Uncategorized

**Tags**: AI agent

## README

# Agent-E

📚 [Cite paper](https://arxiv.org/abs/2407.13032)

Agent-E is an agent-based system that aims to automate actions on the user's computer. At the moment it focuses on automation within the browser. The system is based on the [AutoGen agent framework](https://github.com/microsoft/autogen).

It provides a natural-language way to interact with a web browser:

- Fill out forms (web forms, not PDF yet) using information about you or from another site.
- Search and sort products on e-commerce sites like Amazon based on various criteria, such as bestsellers or price.
- Locate specific content and details on websites, from sports scores on ESPN to contact information on university pages.
- Navigate to and interact with web-based media, including playing YouTube videos and managing playback settings like full-screen and mute.
- Perform comprehensive web searches to gather information on a wide array of topics, from historical sites to top local restaurants.
- Manage and automate tasks on project management platforms (like JIRA) by filtering issues, easing the workflow for users.
- Provide personal shopping assistance, suggesting products based on the user's needs, such as storage options for game cards.

While Agent-E is growing, it is already equipped to handle a versatile range of tasks, but the best task is the one that you come up with.
So, take it for a spin and tell us what you were able to do with it.

For more information see our [blog article](https://www.emergence.ai/blog/distilling-the-web-for-multi-agent-automation).

## Quick Start

### Setup

- Install `uv` (https://github.com/astral-sh/uv):
  - macOS/Linux: `curl -LsSf https://astral.sh/uv/install.sh | sh`
  - Windows: `powershell -c "irm https://astral.sh/uv/install.ps1 | iex"`
  - Alternatively, you can use pip: `pip install uv`
- Create a virtual environment: `uv venv --python 3.11` (`3.10+` should work)
- Activate the virtual environment: `source .venv/bin/activate` (Windows: `.venv\Scripts\activate`)
- Generate `requirements.txt` from the toml file: `uv pip compile pyproject.toml -o requirements.txt`
- Install the generated requirements file: `uv pip install -r requirements.txt`
- To install extras/dev dependencies: `uv pip install -r pyproject.toml --extra dev`
- If you do not have Google Chrome locally (and don't want to install it), install the Playwright drivers: `playwright install`
- A `.env` file is needed in the project root with the following settings (a sample `.env-example` is included for convenience):
  - Follow the directions in the sample file.
  - You will need to set `AUTOGEN_MODEL_NAME` (we recommend `gpt-4-turbo` for optimal performance) and `AUTOGEN_MODEL_API_KEY`.
  - If you are using a model provider other than OpenAI, you need to set `AUTOGEN_MODEL_BASE_URL`, for example `https://api.groq.com/openai/v1` or `https://.openai.azure.com` on [Azure](https://azure.microsoft.com/).
  - For [Azure](https://azure.microsoft.com/), you'll also need to configure the `AUTOGEN_MODEL_API_TYPE=azure` and `AUTOGEN_MODEL_API_VERSION` (for example `2023-03-15-preview`) variables.
  - To use your local Chrome browser instead of the Playwright browser, go to `chrome://version/` in Chrome, find the path to your profile, and set `BROWSER_STORAGE_DIR` to that path.

### pip issues

If you run into an issue where pip is not installed in the virtual env, you can take the following steps:

1. Activate the venv.
2. Run `python -m ensurepip --upgrade`. This will install pip.
3. Deactivate the venv: `deactivate`
4. Activate the venv again.
5. If you look in the `.venv/bin` dir you will now see `pip3`. At this point, you do not have `pip`, but you have `pip3`.

### Blocking IO issues

If you are on a Mac and you are getting _BlockingIOError: [Errno 35] write could not complete without blocking_ when autogen tries to print a large amount of text:

- Run Python with the `-u` flag: `python -u -m ae.main`. This makes output unbuffered and the issue will go away. However, there is a chance that not all of the output will appear in the terminal.

### User preferences

To personalize this agent, there is a need for Long Term Memory (LTM) that tracks user preferences over time. For the time being we provide a free-form user preferences text file that acts as a static LTM. You can see a sample [here](ae/user_preferences/user_preferences.txt). Feel free to customize this file as you wish to make it more personal to you. This file might move to `.gitignore` in future changes.

### Run the code

`python -m ae.main` (if you are on a Mac, use `python -u -m ae.main`; see Blocking IO issues above)

Once the program is running, you should see an icon in the browser. The icon expands to a chat-like interface where you can enter natural language requests. For example, `open youtube`, `search youtube for funny cat videos`, `find Nothing Phone 2 on Amazon and sort the results by best seller`, etc.

### Launch via web endpoint

There is a FastAPI wrapper for Agent-E. It allows the user to send commands via HTTP and receive streaming results.
- Run `uvicorn ae.server.api_routes:app --reload --loop asyncio`
- Send POST requests to: `http://127.0.0.1:8000/execute_task`
- Sample cURL:

  ```
  curl --location 'http://127.0.0.1:8000/execute_task' \
  --header 'Content-Type: application/json' \
  --data '{
      "command": "go to espn, look for soccer news, report the names of the most recent soccer champs"
  }'
  ```

### Additional environment variables

Agent-E has a few more env variables that can be added to `.env` or whichever environment you are using.

`SAVE_CHAT_LOGS_TO_FILE`: true | false (Default: `true`)
Indicates whether to save chat logs, for planner and nested chat, into files or log them to stdout.

`LOG_MESSAGES_FORMAT`: json | text (Default: `text`)
Whether to use structured logging or text logging. If `text` is used, JSON objects will not be output. This is mainly used for chat logs, so if `SAVE_CHAT_LOGS_TO_FILE` is set to `true`, then setting this to `text` will be fine.

## Demos

| Video | Command | Description |
|-----------|-------------|-------------|
| [![Oppenheimer Video](docs/images/play-video-on-youtube-thumbnail.png)](https://www.youtube.com/embed/v4BgYiDHNZs) | There is an Oppenheimer video on youtube by Veritasium, can you find it and play it? | |
| [![Example 2: Use information to fill forms](docs/images/form-filling-thumbnail.png)](https://www.youtube.com/embed/uyE7tfKkB0E) | Can you do this task? Wait for me to review before submitting. | Takes the highlighted text from the email as part of the instruction. |
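
### Calling the web endpoint from Python

The `/execute_task` endpoint described under "Launch via web endpoint" can also be called from a script. The following is a minimal sketch using only the Python standard library; it mirrors the sample cURL request. The helper names (`build_request`, `iter_chunks`, `execute_task`) are hypothetical and not part of Agent-E, and the sketch assumes the streamed response arrives as newline-delimited UTF-8 text, which this README does not specify.

```python
import json
import urllib.request
from typing import Iterable, Iterator

# URL taken from the README; adjust host/port if uvicorn is started differently.
EXECUTE_TASK_URL = "http://127.0.0.1:8000/execute_task"


def build_request(command: str, url: str = EXECUTE_TASK_URL) -> urllib.request.Request:
    """Build the same POST request as the sample cURL call (JSON body with a "command" key)."""
    body = json.dumps({"command": command}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def iter_chunks(response: Iterable[bytes]) -> Iterator[str]:
    """Yield non-empty lines as they arrive.

    ASSUMPTION: the server streams newline-delimited UTF-8 text.
    """
    for raw_line in response:
        line = raw_line.decode("utf-8", errors="replace").strip()
        if line:
            yield line


def execute_task(command: str) -> None:
    """Send a command to a locally running Agent-E server and print the streamed output."""
    with urllib.request.urlopen(build_request(command)) as response:
        for line in iter_chunks(response):
            print(line)


if __name__ == "__main__":
    # Requires the uvicorn server from "Launch via web endpoint" to be running.
    execute_task(
        "go to espn, look for soccer news, "
        "report the names of the most recent soccer champs"
    )
```

Building the request separately from sending it keeps the payload easy to inspect before a task is dispatched. If the server's stream turns out to use a different framing, only `iter_chunks` needs to change.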