# qmtStudy

**Repository Path**: ooooinfo/qmt-study

## Basic Information

- **Project Name**: qmtStudy
- **Description**: QMT study materials
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-12-25
- **Last Updated**: 2025-12-25

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# QMT Documentation Scraper

A Python web scraper that extracts content from the QMT (QuantMiniTrader) documentation website and saves it locally, in a structured format, inside a `qmthelp` directory.

## Project Structure

```
qmt-doc-scraper/
├── qmt_doc_scraper/          # Main package directory
│   ├── __init__.py           # Package initialization
│   ├── web_client.py         # HTTP requests and rate limiting
│   ├── html_parser.py        # HTML parsing and content extraction
│   ├── content_processor.py  # Content processing and formatting
│   ├── file_manager.py       # File system operations
│   └── config_manager.py     # Configuration management
├── main.py                   # Main entry point script
├── config.json               # Default configuration file
├── requirements.txt          # Python dependencies
└── README.md                 # This file
```

## Installation

1. Install dependencies:

```bash
pip install -r requirements.txt
```

## Usage

### Basic Usage

Run the scraper with default settings:

```bash
python main.py
```

Run with custom configuration:

```bash
python main.py --config custom_config.json --output-dir my_docs
```

### Command-Line Interface

The scraper's command-line interface accepts the following options:

#### Configuration Options

- `--config, -c FILE`: Path to configuration file (default: config.json)
- `--output-dir, -o DIR`: Output directory for scraped content (overrides config)
- `--url, -u URL`: Add URL to scrape (can be used multiple times)

#### Scraping Behavior Options

- `--delay SECONDS`: Delay between requests in seconds (overrides config)
- `--max-retries N`: Maximum number of retries for failed requests (overrides config)
- `--timeout SECONDS`: Request timeout in seconds (overrides config)
- `--no-assets`: Skip downloading assets (images, CSS, etc.)
- `--no-index`: Skip generating navigation index

#### Output and Logging Options

- `--verbose, -v`: Increase verbosity (use -v, -vv, or -vvv)
- `--quiet, -q`: Suppress all output except errors
- `--log-file FILE`: Write logs to file (overrides config)
- `--no-progress`: Disable progress reporting

#### Utility Options

- `--dry-run`: Show what would be scraped without actually scraping
- `--list-config`: Display current configuration and exit
- `--validate-config`: Validate configuration file and exit
- `--version`: Show version information
- `--help`: Show help message

### Examples

#### Basic scraping with default configuration:

```bash
python main.py
```

#### Scrape specific URLs with custom output directory:

```bash
python main.py --url https://dict.thinktrader.net/innerApi/start_now.html \
               --url https://dict.thinktrader.net/innerApi/another_page.html \
               --output-dir ./qmt_docs
```

#### Scrape with custom settings and verbose output:

```bash
python main.py --delay 2.0 --max-retries 5 --verbose --log-file scraper.log
```

#### Preview what would be scraped (dry run):

```bash
python main.py --dry-run
```

#### Validate configuration file:

```bash
python main.py --validate-config --config my_config.json
```

#### Scrape without downloading assets, with minimal output:

```bash
python main.py --no-assets --quiet --no-progress
```

#### Display current configuration:

```bash
python main.py --list-config
```

### Exit Codes

The scraper returns the following exit codes:

- `0`: Success
- `1`: General error (scraping failed, unexpected error)
- `2`: Invalid arguments or configuration
- `130`: Interrupted by user (Ctrl+C)

### Progress Reporting

By default, the scraper displays progress information during execution:

```
QMT Documentation Scraper v1.0.0
Configuration: config.json
Output directory: qmthelp
URLs to scrape: 1
--------------------------------------------------
Progress: 1/1 (100.0%) - Success: 1, Failed: 0, Assets: 5

Scraping Results:
==================================================
Success: Yes
Pages scraped: 1
Assets downloaded: 5
Duration: 12.34 seconds
Output directory: qmthelp

Documentation successfully saved to: qmthelp
Open qmthelp/index.html in your browser to browse the documentation.
```

Use `--quiet` to suppress all output except errors, or `--no-progress` to disable progress updates while keeping other output.

## Configuration

The scraper uses a JSON configuration file to specify:

- Target URLs to scrape
- Output directory settings
- Request parameters (delays, retries, timeouts)
- Content processing options
- Logging configuration

See `config.json` for the default configuration structure.

## Development Status

This project is currently under development. Core functionality will be implemented in subsequent development phases.
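
Since the core functionality is still to be implemented, the following is a minimal sketch of how the "overrides config" behavior described under Command-Line Interface could work. The flag names come from this README; the configuration keys (`urls`, `output_dir`, `delay`, `max_retries`, `timeout`) and the helper names `build_parser` and `apply_overrides` are assumptions made for illustration, not the project's actual schema or API.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a subset of the flags documented in this README."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--config", "-c", default="config.json", metavar="FILE")
    parser.add_argument("--output-dir", "-o", metavar="DIR")
    # action="append" lets --url be given multiple times.
    parser.add_argument("--url", "-u", action="append", metavar="URL")
    parser.add_argument("--delay", type=float, metavar="SECONDS")
    parser.add_argument("--max-retries", type=int, metavar="N")
    parser.add_argument("--timeout", type=float, metavar="SECONDS")
    return parser


def apply_overrides(config: dict, args: argparse.Namespace) -> dict:
    """Return a copy of config with command-line values taking precedence.

    The key names used here are hypothetical; check config.json for the
    real structure.
    """
    merged = dict(config)
    for key in ("output_dir", "delay", "max_retries", "timeout"):
        value = getattr(args, key)
        if value is not None:  # only flags the user actually passed
            merged[key] = value
    # --url adds to the configured URLs rather than replacing them.
    merged["urls"] = list(config.get("urls", [])) + list(args.url or [])
    return merged
```

Leaving the override flags without defaults (so they parse to `None`) is what lets the sketch tell "not passed" apart from "passed with a value", so file settings survive unless explicitly overridden.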
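
The `--delay` and `--max-retries` options could be combined along the lines of the sketch below. It assumes that `--max-retries N` means up to N additional attempts after the first one, and that failures surface as `OSError`; both are assumptions, as is the helper name `fetch_with_retries`. The `fetch` callable stands in for whatever `web_client.py` ends up using.

```python
import time
from typing import Callable


def fetch_with_retries(
    fetch: Callable[[str], str],
    url: str,
    max_retries: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(url), retrying on OSError with a fixed delay in between.

    Hypothetical helper: makes at most max_retries + 1 attempts in total.
    """
    last_error = None
    for attempt in range(max_retries + 1):
        if attempt > 0:
            sleep(delay)  # wait before every retry, not before the first try
        try:
            return fetch(url)
        except OSError as exc:
            last_error = exc
    raise RuntimeError(
        f"giving up on {url} after {max_retries + 1} attempts"
    ) from last_error
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a production version might also apply the delay between successful requests to rate-limit the crawl.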
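
The exit codes listed under Exit Codes could be produced by a thin wrapper around the scraping job, as sketched here. The numeric values come from this README; `ConfigError` and `run_with_exit_code` are hypothetical names, not part of the package.

```python
import sys
from typing import Callable

EXIT_SUCCESS = 0
EXIT_ERROR = 1
EXIT_INVALID = 2
EXIT_INTERRUPTED = 130


class ConfigError(Exception):
    """Hypothetical exception for invalid arguments or configuration."""


def run_with_exit_code(job: Callable[[], None]) -> int:
    """Run job and translate its outcome into one of the documented codes."""
    try:
        job()
    except ConfigError:
        return EXIT_INVALID
    except KeyboardInterrupt:  # Ctrl+C
        return EXIT_INTERRUPTED
    except Exception:
        return EXIT_ERROR
    return EXIT_SUCCESS


if __name__ == "__main__":
    # Placeholder job; a real entry point would start the scraper here.
    sys.exit(run_with_exit_code(lambda: None))
```

`ConfigError` is caught before the generic `Exception` handler so that configuration problems map to `2` rather than `1`.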