# next-evals-oss

**Repository Path**: mirrors_sourcegraph/next-evals-oss

## Basic Information

- **Project Name**: next-evals-oss
- **Description**: Evals for Next.js up to 15.5.6 to test AI model competency at Next.js
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-10-28
- **Last Updated**: 2026-01-25

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Next.js Evals

Evaluates the quality and correctness of Next.js code generated by popular AI models.

## Quick Start

### Prerequisites

- [Bun](https://bun.sh) - JavaScript runtime & package manager
- [pnpm](https://pnpm.io) - Package manager (for shared dependency management)

### Local Setup

```bash
# Clone the repository
git clone <repository-url>
cd next-evals

# Install dependencies
pnpm install

# Show help
bun cli.ts --help
```

### Environment Variables

Set up your API keys:

```bash
# For LLM-based evals
export BRAINTRUST_API_KEY="your-braintrust-key"
export AI_GATEWAY_API_KEY="your-ai-gateway-key"

# For Claude Code evals
export ANTHROPIC_API_KEY="your-anthropic-key"

# For Amp evals
export AMP_API_KEY="your-amp-key"
```

**Note:** The `--dry` flag is recommended for testing, as it runs evaluations locally without uploading results to Braintrust.

## Usage

### CLI Commands

#### LLM-based Evals

Run evals using various LLM models (configured in `lib/models.ts`):

```bash
# Show help and all available options
bun cli.ts --help

# Run a specific eval (uploads results to Braintrust)
bun cli.ts --eval 001-server-component

# Run eval locally without Braintrust upload (recommended for testing)
bun cli.ts --dry --eval 001-server-component

# Run all evals in parallel
bun cli.ts --all --dry

# Run with multiple worker threads for better performance
# (useful for large eval sets, automatically manages concurrency)
bun cli.ts --all --dry --threads 4

# Run with all models (default: only first model)
bun cli.ts --dry --eval 001-server-component --all-models

# Debug mode - keep output folders for inspection
bun cli.ts --dry --debug --eval 001-server-component

# Verbose output - see detailed logs during execution
bun cli.ts --dry --verbose --eval 001-server-component

# Create a new eval from template
bun cli.ts --create --name "my-new-eval" --prompt "Create something cool"
```

#### Claude Code Evals

Run evals using Claude Code (AI coding agent):

```bash
# Run a specific eval with Claude Code
bun claude-code-cli.ts --eval 001-server-component

# Or use the main CLI with --claude-code flag
bun cli.ts --eval 001-server-component --claude-code

# Run all evals with Claude Code
bun claude-code-cli.ts --all

# With custom timeout (default: 600000ms = 10 minutes)
bun claude-code-cli.ts --eval 001-server-component --timeout 900000

# With custom API key (or use ANTHROPIC_API_KEY env var)
bun claude-code-cli.ts --eval 001-server-component --api-key sk-ant-...

# Verbose output
bun claude-code-cli.ts --eval 001-server-component --verbose

# Debug mode - keep output folders
bun claude-code-cli.ts --eval 001-server-component --debug
```

#### Amp Evals

Run evals using Amp (AI coding agent):

```bash
# Run a specific eval with Amp
bun amp-cli.ts --eval 001-server-component

# Or use the main CLI with --amp flag
bun cli.ts --eval 001-server-component --amp

# Run all evals with Amp
bun amp-cli.ts --all

# With custom timeout (default: 600000ms = 10 minutes)
bun amp-cli.ts --eval 001-server-component --timeout 900000

# With custom API key (or use AMP_API_KEY env var)
bun amp-cli.ts --eval 001-server-component --api-key your-amp-key

# Verbose output
bun amp-cli.ts --eval 001-server-component --verbose

# Debug mode - keep output folders
bun amp-cli.ts --eval 001-server-component --debug
```

#### Claude Code with Dev Server and Hooks

Run Claude Code with a Next.js dev server and lifecycle hooks (e.g., for MCP server setup):

```bash
# Run with dev server and hook scripts
bun cli.ts --eval 001-server-component --claude-code \
  --with-dev-server \
  --pre-eval ./scripts/eval-hooks/nextjs-mcp-pre.sh \
  --post-eval ./scripts/eval-hooks/nextjs-mcp-post.sh

# Customize dev server command and port
bun cli.ts --eval 001-server-component --claude-code \
  --with-dev-server \
  --dev-server-cmd "pnpm dev" \
  --dev-server-port 3001 \
  --pre-eval ./scripts/eval-hooks/nextjs-mcp-pre.sh \
  --post-eval ./scripts/eval-hooks/nextjs-mcp-post.sh
```

**Dev Server & Hook Options:**

- `--with-dev-server` - Start Next.js dev server before eval
- `--dev-server-cmd <cmd>` - Command to start server (default: "npm run dev")
- `--dev-server-port <port>` - Port for dev server (default: 3000)
- `--pre-eval