# Forge
Real-time quality gates for AI-generated code. Catches LLM degradation before it ships, across Claude Code, Cursor, Codex, and OpenCode.
## Architecture
Single self-contained binary that integrates with Claude Code, Cursor, Codex, and OpenCode via standard hooks. Validates AI-generated code against the local codebase as it lands.
## The problem
LLMs degrade as context fills up. Research consistently documents five failure modes that get worse as sessions grow longer:
- Truncated functions — the model emits a signature and docstring, but the body is `pass` or `return None`. Performance drops 15–30 points beyond 50% context depth.
- Hallucinated imports — packages that don't exist in your project. Hallucination rates reach 5.2% for Python imports and 21.7% for JavaScript packages.
- Phantom references — calls to functions, classes, or variables that were deleted three edits ago. The model's memory of the codebase is stale.
- Repetitive output — copy-paste blocks where the model loops rather than generating new logic. Token efficiency collapses.
- Placeholder stubs — `# TODO: implement` bodies that look like real code but do nothing. The longer the session, the more of these appear.
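To make the first and last failure modes concrete, a truncated-function check can be approximated in a few lines with Python's `ast` module. This is a simplified sketch of the idea, not Forge's compiled rule implementation:

```python
import ast

def is_stub(func: ast.FunctionDef) -> bool:
    """Heuristic: a body that is only a docstring, `pass`, `...`,
    or a bare `return None` is likely a truncated or placeholder stub."""
    body = list(func.body)
    # Skip a leading docstring, if present.
    if (body and isinstance(body[0], ast.Expr)
            and isinstance(body[0].value, ast.Constant)
            and isinstance(body[0].value.value, str)):
        body = body[1:]
    if not body:
        return True  # docstring-only function
    if len(body) > 1:
        return False
    stmt = body[0]
    if isinstance(stmt, ast.Pass):
        return True
    if (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant)
            and stmt.value.value is Ellipsis):
        return True
    if isinstance(stmt, ast.Return) and (
            stmt.value is None
            or (isinstance(stmt.value, ast.Constant) and stmt.value.value is None)):
        return True
    return False

def find_stubs(source: str) -> list[str]:
    """Return the names of all stub-like functions in a source string."""
    tree = ast.parse(source)
    return [n.name for n in ast.walk(tree)
            if isinstance(n, ast.FunctionDef) and is_stub(n)]
```

A real rule would also need to tolerate intentional stubs (abstract methods, protocol definitions), which is where a rule engine's configurability earns its keep.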
Every team using AI coding assistants hits these. Most don't notice until the tests fail — or worse, until production fails.
## What Forge does
Forge sits between your IDE and your codebase as a quality layer. Every file the LLM writes gets checked in real time — not after the commit, not in CI, right as it lands.
```mermaid
graph LR
    LLM[AI Assistant] -->|writes file| Hook[Forge Hook]
    Hook -->|validate| Engine[Quality Engine]
    Engine -->|check against| Graph[Code Graph]
    Engine -->|resolve unknowns| Search[Symbol Search]
    Engine -->|pass/fail| Hook
    Hook -->|feedback| LLM
    Graph -.->|built from| Repo[Your Codebase]
```

The quality engine runs 30 compiled rules across 6 categories. Each rule targets one of the five failure modes. Rules execute in parallel — validation completes in milliseconds, not seconds.
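The parallel fan-out can be sketched with a thread pool. The two toy rules below are illustrative stand-ins, not Forge's actual compiled rules:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rule shape: each rule takes source text and returns
# a list of violation messages.
def no_todo(source: str) -> list[str]:
    return ["placeholder TODO found"] if "# TODO" in source else []

def no_merge_markers(source: str) -> list[str]:
    return ["merge conflict marker"] if "<<<<<<<" in source else []

RULES = [no_todo, no_merge_markers]

def validate(source: str) -> list[str]:
    """Run every rule concurrently and flatten the findings."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda rule: rule(source), RULES)
    return [finding for findings in results for finding in findings]
```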
When the engine finds a hallucinated import, it doesn't just flag it — it searches your actual codebase for what the model probably meant and suggests the fix. When it finds a truncated function, it catches it before the model moves on and forgets it was supposed to finish.
## How it integrates
Forge ships as a single self-contained binary. No Python env, no Docker, no cloud dependency. Install it, point it at your project, and it hooks into your IDE:
- Claude Code — PostToolUse hook (fires after every file write)
- Cursor — `.cursor/rules` integration
- Codex — `AGENTS.md` guidance layer
- OpenCode — `.opencode/` configuration
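These integrations share one contract: the editor invokes a gate after each write and treats a failing exit as feedback for the model. A minimal sketch of that contract — the JSON payload keys `file_path` and `content` are assumptions for illustration, not Forge's actual wire format:

```python
import json
import sys

def handle_write(payload: dict) -> tuple[int, str]:
    """Validate a just-written file. A nonzero exit code signals the
    editor to block or flag the edit; the message is fed back."""
    content = payload.get("content", "")
    if "# TODO: implement" in content:
        return 2, f"placeholder stub in {payload.get('file_path', '?')}"
    return 0, ""

# Wiring in an actual hook script (event JSON arrives on stdin):
#   code, msg = handle_write(json.load(sys.stdin))
#   if msg:
#       print(msg, file=sys.stderr)
#   sys.exit(code)
```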
The same binary also exposes 12 MCP tools so your AI assistant can query the code graph, look up symbols, check rule violations, and self-correct — all without leaving the conversation.
## Context injection
The other half of the problem: LLMs lose track of architectural decisions and conventions as sessions get long. Forge maintains a compact summary of your codebase (~3,000 characters vs the 50,000+ an uncompressed dump would need) and injects it into every LLM call. The model stays grounded in reality across sessions — it remembers your ADRs, your naming conventions, your dependency graph.
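A character-budgeted summary of this kind can be sketched as a greedy packer over one-line codebase facts. This is illustrative only; Forge's actual summarizer is not shown here:

```python
def compact_summary(facts: list[str], budget: int = 3000) -> str:
    """Greedily pack one-line codebase facts (conventions, ADRs,
    key modules) into a fixed character budget for prompt injection."""
    out: list[str] = []
    used = 0
    for fact in facts:
        line = f"- {fact}"
        if used + len(line) + 1 > budget:  # +1 for the joining newline
            break
        out.append(line)
        used += len(line) + 1
    return "\n".join(out)
```

The interesting design decision is upstream of this: which facts to keep, and how to rank them so the ~3,000 characters carry the weight of a 50,000+ character dump.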
## By the numbers
| Metric | Value |
|---|---|
| Languages supported | 6+ (Python, TypeScript, JavaScript, Go, Rust, Java) |
| Compiled validation rules | 30 |
| Built-in skills | 19 |
| MCP tools | 12 |
| Tests passing | 638 |
| Binary size | 38 MB (self-contained, zero dependencies) |
| Version | v1.6.0 |