# word_compare

**Repository Path**: bi_cel/word_compare

## Basic Information

- **Project Name**: word_compare
- **Description**: A web-based visual comparison tool for Office Word documents
- **Primary Language**: Python
- **License**: GPL-3.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2025-10-29
- **Last Updated**: 2025-10-29

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Word Document Comparison Tool (Word文档比较工具)

This application compares two Word documents and highlights the differences between them.

## Version

**v1.0** - First stable release with similarity-based paragraph matching and original formatting preservation

## Author

**hk0369**

## Features

- Upload two Word documents (docx format)
- View a side-by-side comparison of the differences between the documents
- View the original content of both documents
- Download highlighted documents showing the differences
- User-friendly web interface with tabbed navigation
- Similarity-based paragraph matching (not just line-by-line comparison)
- Preserves original document formatting in the highlighted output
- Docker support for easy deployment on any platform (x86 and ARM)

## Technical Details

- Backend: Python (Flask)
- Document processing: python-docx
- Diff generation: difflib
- Highlighting: Word document formatting
- Paragraph matching: Similarity metrics and the Hungarian algorithm
- Format preservation: Maintains original document styling
- Containerization: Docker with multi-architecture support

## Installation

### Standard Installation

1. Clone this repository
2. Install required packages:
   ```
   pip install -r requirements.txt
   ```
3. Create the necessary directories if they don't exist:
   ```
   mkdir -p uploads results logs
   ```

### Docker Installation (Recommended)

The application can be deployed using Docker:

#### Using the Startup Scripts

**Windows:**
```
start.bat
```

**Linux/macOS:**
```
chmod +x start.sh
./start.sh
```

The scripts will:
- Create required directories
- Generate a default `.env` file if one doesn't exist
- Build and start the Docker container

#### Manual Docker Setup

1. Make sure Docker and Docker Compose are installed on your system
2. Configure environment variables (optional):
   - Create a `.env` file in the project root
   - Set variables like `APP_PORT`, `APP_DEBUG`, etc.
3. Build and run the Docker container:
   ```
   docker compose up -d
   ```

## Usage

### Standard Usage

1. Start the application:
   ```
   python main.py
   ```
2. Open a web browser and go to `http://127.0.0.1:5000/`

### Docker Usage

When using Docker, the application will be available at:
```
http://localhost:5000
```

### Document Comparison

1. Upload two Word documents using the web interface
2. View the differences and original document content using the tabs
3. Download highlighted documents if needed

## Project Structure

```
word-compare/
├── main.py              # Flask application
├── word_compare.py      # Core comparison functionality
├── templates/           # HTML templates
│   ├── index.html       # Upload page
│   └── result.html      # Results page with tabbed interface
├── uploads/             # Temporary storage for uploaded files
├── results/             # Storage for highlighted documents
├── logs/                # Application logs
├── requirements.txt     # Python dependencies
├── Dockerfile           # Docker container configuration
├── docker-compose.yml   # Docker Compose configuration
├── entrypoint.sh        # Docker entrypoint script
├── start.bat            # Windows startup script
└── start.sh             # Linux/macOS startup script
```

## Implementation Details

The application performs the following steps:

1. Reads Word documents using python-docx
2. Converts documents to a comparable format
3. Calculates similarity between paragraphs using difflib
4. Uses the Hungarian algorithm to match the most similar paragraphs
5. Identifies differences between matched paragraphs
6. Generates HTML for side-by-side comparison
7. Creates highlighted Word documents while preserving original formatting
8. Provides downloads of the highlighted documents

## Docker Configuration

The application uses a multi-architecture Docker setup that automatically selects the appropriate base image:

- For x86 systems: `python:3.12-slim`
- For ARM systems: `python:3.12-slim-linuxarm64`

Environment variables can be configured in the `.env` file:

- `APP_PORT`: Port the application listens on (default: 5000)
- `APP_HOST`: Host address to bind to (default: 0.0.0.0)
- `APP_DEBUG`: Enable debug mode (default: false)
- `TZ`: Timezone (default: Asia/Shanghai)

## Changelog

### v1.1

- Added Docker support with multi-architecture images
- Added startup scripts for Windows and Linux/macOS
- Added Docker environment configuration via a `.env` file
- Fixed line ending issues in scripts
- Improved error handling

### v1.0

- Initial stable release
- Implemented similarity-based paragraph matching
- Added format preservation in highlighted documents
- Fixed display and styling issues
- Improved error handling and reporting
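
## Appendix: Paragraph Matching Sketch

The similarity and matching steps listed under Implementation Details can be illustrated with a minimal sketch. This is not the project's code: the function names `similarity` and `match_paragraphs` are hypothetical, and the optimal assignment is found by exhaustive search over permutations rather than the Hungarian algorithm. Exhaustive search reaches the same optimum but is only practical for a handful of paragraphs; for real documents a library routine such as SciPy's `linear_sum_assignment` would be the usual choice. Only the standard library is used here.

```python
from difflib import SequenceMatcher
from itertools import permutations


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two paragraph strings."""
    return SequenceMatcher(None, a, b).ratio()


def match_paragraphs(old, new):
    """Pair paragraphs so that the total similarity is maximal.

    Each paragraph of the shorter list is paired with a distinct paragraph
    of the longer list. Returns (old_index, new_index, score) tuples sorted
    by old_index. Assumption: brute force stands in for the Hungarian
    algorithm, so keep the inputs small.
    """
    swapped = len(old) > len(new)
    short, long_ = (new, old) if swapped else (old, new)

    # Similarity matrix: rows are the shorter list, columns the longer one.
    scores = [[similarity(s, l) for l in long_] for s in short]

    best_total, best_perm = -1.0, ()
    for perm in permutations(range(len(long_)), len(short)):
        total = sum(scores[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm

    pairs = []
    for i, j in enumerate(best_perm):
        old_idx, new_idx = (j, i) if swapped else (i, j)
        pairs.append((old_idx, new_idx, scores[i][j]))
    return sorted(pairs)


# Illustrative data: the same three paragraphs, reordered and lightly edited.
old_doc = [
    "The quick brown fox.",
    "Terms and conditions apply.",
    "Contact us by email.",
]
new_doc = [
    "Contact us by e-mail.",
    "The quick brown fox jumps.",
    "Terms and conditions may apply.",
]

for old_idx, new_idx, score in match_paragraphs(old_doc, new_doc):
    print(f"old[{old_idx}] <-> new[{new_idx}]  similarity={score:.2f}")
```

Matching on similarity rather than position is what lets a reordered paragraph be compared against its edited counterpart instead of against whatever happens to sit at the same index.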