# PixelCraft

**Repository Path**: mirrors_microsoft/PixelCraft

## Basic Information

- **Project Name**: PixelCraft
- **Description**: High-Fidelity Visual Reasoning on Structured Images
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-10-01
- **Last Updated**: 2025-10-04

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# PixelCraft: A Multi-Agent System for High-Fidelity Visual Reasoning on Structured Images

[arXiv](https://arxiv.org/abs/2509.25185) | Code: *Coming Soon*

---

## Overview

**PixelCraft** is a novel multi-agent system designed to enable **high-fidelity visual reasoning** on **structured images** such as charts, scientific plots, and geometric diagrams. Existing multimodal large language models (MLLMs) often suffer from perceptual errors and rigid, linear reasoning. PixelCraft instead introduces a **dynamic, collaborative agent framework** that integrates precise image processing with flexible, non-linear reasoning strategies. By combining pixel-level grounding with classical computer vision tools and multi-agent collaboration, PixelCraft achieves notable gains on structured image understanding tasks.

---

## Key Features

* **High-Fidelity Image Processing:** A fine-tuned MLLM with pixel-level grounding precisely localizes visual elements, enabling accurate data extraction and visual manipulation.
* **Multi-Agent Collaboration:** A planner, a reasoner, critics, and visual tool agents work together in a three-stage workflow: **query-aware tool selection**, **role-driven reasoning**, and **iterative self-correction**.
* **Image Memory & Non-Linear Reasoning:** A novel image memory mechanism allows agents to **revisit intermediate visual states**, **branch reasoning paths**, and **refine conclusions**, overcoming the limitations of linear chain-of-thought approaches.
* **Specialized Visual Tools:** PixelCraft introduces a rich suite of visual tool agents for tasks such as subfigure cropping, region magnification, legend masking, auxiliary line drawing, and geometric construction.

---

## Performance Highlights

PixelCraft achieves substantially higher performance in structured visual reasoning across multiple challenging datasets. The table lists PixelCraft's scores with each backbone MLLM:

| Benchmark | PixelCraft w/ GPT-4o | PixelCraft w/ GPT-4.1-mini | PixelCraft w/ Claude 3.7 Sonnet |
| -------------- | ------ | ------------ | ----------------- |
| **CharXiv** | 55.2 | 68.1 | 73.9 |
| **ChartQAPro** | 58.83 | 65.56 | 69.82 |
| **EvoChart** | 70.24 | 79.44 | 80.48 |

---

## System Workflow

The PixelCraft pipeline follows a **three-stage collaborative workflow** that enables precise tool use and flexible reasoning.
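The three-stage workflow and the image memory described in this README can be illustrated with a minimal Python sketch. PixelCraft's code is not yet released, so everything here is hypothetical: `ImageMemory`, `select_tools`, `run_pipeline`, the tool names, and the keyword table are illustrative assumptions. Images are stand-in strings, and the planner, reasoner, and critic agents are replaced by simple stub functions. The sketch only shows the control flow. Tools are chosen from the query, every intermediate image is stored with a link to its parent, and when the critic rejects an answer the search branches from a stored state instead of continuing along one linear chain.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tools: each maps an image to a new image. Images are plain strings
# here, so each "tool" only records the operation it would perform.
Tool = Callable[[str], str]
TOOLS: Dict[str, Tool] = {
    "crop_subfigure": lambda img: f"crop({img})",
    "magnify_region": lambda img: f"zoom({img})",
    "mask_legend": lambda img: f"mask({img})",
    "draw_aux_line": lambda img: f"line({img})",
}

# Hypothetical trigger words, used in place of the planner's query-aware tool selection.
KEYWORDS: Dict[str, List[str]] = {
    "crop_subfigure": ["subplot", "panel"],
    "magnify_region": ["small", "exact", "value"],
    "mask_legend": ["legend", "color"],
    "draw_aux_line": ["trend", "intersect", "compare"],
}


@dataclass
class ImageState:
    state_id: int
    image: str
    parent: Optional[int]  # None for the original input image
    tool: Optional[str]    # tool that produced this state from its parent


@dataclass
class ImageMemory:
    """Keeps every intermediate image so reasoning can return to and branch from any of them."""
    states: Dict[int, ImageState] = field(default_factory=dict)

    def add(self, image: str, parent: Optional[int] = None, tool: Optional[str] = None) -> int:
        state_id = len(self.states)
        self.states[state_id] = ImageState(state_id, image, parent, tool)
        return state_id

    def lineage(self, state_id: int) -> List[str]:
        """Tools applied from the original image to this state, oldest first."""
        tools: List[str] = []
        state = self.states[state_id]
        while state.parent is not None:
            tools.append(state.tool or "")
            state = self.states[state.parent]
        return tools[::-1]


def select_tools(query: str) -> List[str]:
    """Stage 1 (stand-in): pick the tools whose trigger words appear in the query."""
    q = query.lower()
    return [name for name, words in KEYWORDS.items() if any(w in q for w in words)]


def run_pipeline(
    query: str,
    image: str,
    reason: Callable[[str, str], str],
    critique: Callable[[str, str], bool],
    max_steps: int = 10,
) -> Tuple[Optional[str], List[str]]:
    memory = ImageMemory()
    tools = select_tools(query)              # Stage 1: query-aware tool selection
    stack = [memory.add(image)]              # stored states still to examine (depth-first)
    for _ in range(max_steps):
        if not stack:
            break
        state_id = stack.pop()
        state = memory.states[state_id]
        answer = reason(query, state.image)  # Stage 2: reasoning on this visual state
        if critique(query, answer):          # Stage 3: critic accepts or rejects
            return answer, memory.lineage(state_id)
        # Rejected: branch from this stored state with each tool not yet on its path.
        # Pushing in reverse makes the first selected tool the next state examined.
        used = set(memory.lineage(state_id))
        for name in reversed([t for t in tools if t not in used]):
            stack.append(memory.add(TOOLS[name](state.image), parent=state_id, tool=name))
    return None, []


# Stub agents for the demo: the "reasoner" only answers once the image has been both
# magnified and legend-masked; the "critic" rejects the placeholder answer "unsure".
def demo_reason(query: str, image: str) -> str:
    return "42" if "zoom(" in image and "mask(" in image else "unsure"


def demo_critique(query: str, answer: str) -> bool:
    return answer != "unsure"


if __name__ == "__main__":
    q = "What is the exact value of the series named in the legend?"
    print(run_pipeline(q, "chart.png", demo_reason, demo_critique))
    # prints ('42', ['magnify_region', 'mask_legend'])
```

Here a depth-first search over stored image states stands in for PixelCraft's non-linear reasoning. The real system uses MLLM agents and actual image operations, and its search and self-correction strategy may differ from this sketch.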