# Forge
Real-time quality gates for AI-generated code. Catches LLM degradation before it ships, across Claude Code, Cursor, Codex, and OpenCode.
## Architecture
Single self-contained binary that integrates with Claude Code, Cursor, Codex, and OpenCode via standard hooks. Validates AI-generated code against the local codebase as it lands.
## The problem
LLMs degrade as context fills up. Research consistently documents five failure modes that get worse as sessions grow longer:
- Truncated functions — the model emits a signature and docstring, but the body is `pass` or `return None`. Performance drops 15–30 points beyond 50% context depth.
- Hallucinated imports — packages that don't exist in your project. Hallucination rates reach 5.2% for Python imports and 21.7% for JavaScript packages.
- Phantom references — calls to functions, classes, or variables that were deleted three edits ago. The model's memory of the codebase is stale.
- Repetitive output — copy-paste blocks where the model loops rather than generating new logic. Token efficiency collapses.
- Placeholder stubs — `# TODO: implement` bodies that look like real code but do nothing. The longer the session, the more of these appear.
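To make the first and last failure modes concrete, a truncated-function check can be approximated in a few lines with Python's `ast` module. This is a simplified sketch of the idea, not Forge's compiled rule implementation:

```python
import ast

def is_stub(func: ast.FunctionDef) -> bool:
    """Heuristic: a body that is only a docstring, `pass`, `...`,
    or a bare `return None` is likely a truncated or placeholder stub."""
    body = list(func.body)
    # Skip a leading docstring, if present.
    if (body and isinstance(body[0], ast.Expr)
            and isinstance(body[0].value, ast.Constant)
            and isinstance(body[0].value.value, str)):
        body = body[1:]
    if not body:
        return True  # docstring-only function
    if len(body) > 1:
        return False
    stmt = body[0]
    if isinstance(stmt, ast.Pass):
        return True
    if (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant)
            and stmt.value.value is Ellipsis):
        return True
    if isinstance(stmt, ast.Return) and (
            stmt.value is None
            or (isinstance(stmt.value, ast.Constant) and stmt.value.value is None)):
        return True
    return False

def find_stubs(source: str) -> list[str]:
    """Return the names of all stub-like functions in a source string."""
    tree = ast.parse(source)
    return [n.name for n in ast.walk(tree)
            if isinstance(n, ast.FunctionDef) and is_stub(n)]
```

A real rule would also need to tolerate intentional stubs (abstract methods, protocol definitions), which is where a rule engine's configurability earns its keep.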
Every team using AI coding assistants hits these. Most don't notice until the tests fail — or worse, until production fails.
## What Forge does
Forge sits between your IDE and your codebase as a quality layer. Every file the LLM writes gets checked in real time — not after the commit, not in CI, right as it lands.
```mermaid
graph LR
    LLM[AI Assistant] -->|writes file| Hook[Forge Hook]
    Hook -->|validate| Engine[Quality Engine]
    Engine -->|check against| Graph[Code Graph]
    Engine -->|resolve unknowns| Search[Symbol Search]
    Engine -->|pass/fail| Hook
    Hook -->|feedback| LLM
    Graph -.->|built from| Repo[Your Codebase]
```

The quality engine runs 30 compiled rules across 6 categories. Each rule targets one of the five failure modes. Rules execute in parallel — validation completes in milliseconds, not seconds.
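The parallel fan-out can be sketched with a thread pool. The two toy rules below are illustrative stand-ins, not Forge's actual compiled rules:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rule shape: each rule takes source text and returns
# a list of violation messages.
def no_todo(source: str) -> list[str]:
    return ["placeholder TODO found"] if "# TODO" in source else []

def no_merge_markers(source: str) -> list[str]:
    return ["merge conflict marker"] if "<<<<<<<" in source else []

RULES = [no_todo, no_merge_markers]

def validate(source: str) -> list[str]:
    """Run every rule concurrently and flatten the findings."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda rule: rule(source), RULES)
    return [finding for findings in results for finding in findings]
```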
When the engine finds a hallucinated import, it doesn't just flag it — it searches your actual codebase for what the model probably meant and suggests the fix. When it finds a truncated function, it catches it before the model moves on and forgets it was supposed to finish.
## How it integrates
Forge ships as a single self-contained binary. No Python env, no Docker, no cloud dependency. Install it, point it at your project, and it hooks into your IDE:
- Claude Code — PostToolUse hook (fires after every file write)
- Cursor — `.cursor/rules` integration
- Codex — `AGENTS.md` guidance layer
- OpenCode — `.opencode/` configuration
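These integrations share one contract: the editor invokes a gate after each write and treats a failing exit as feedback for the model. A minimal sketch of that contract — the JSON payload keys `file_path` and `content` are assumptions for illustration, not Forge's actual wire format:

```python
import json
import sys

def handle_write(payload: dict) -> tuple[int, str]:
    """Validate a just-written file. A nonzero exit code signals the
    editor to block or flag the edit; the message is fed back."""
    content = payload.get("content", "")
    if "# TODO: implement" in content:
        return 2, f"placeholder stub in {payload.get('file_path', '?')}"
    return 0, ""

# Wiring in an actual hook script (event JSON arrives on stdin):
#   code, msg = handle_write(json.load(sys.stdin))
#   if msg:
#       print(msg, file=sys.stderr)
#   sys.exit(code)
```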
The same binary also exposes 12 MCP tools so your AI assistant can query the code graph, look up symbols, check rule violations, and self-correct — all without leaving the conversation.
## Context injection
The other half of the problem: LLMs lose track of architectural decisions and conventions as sessions get long. Forge maintains a compact summary of your codebase (~3,000 characters vs the 50,000+ an uncompressed dump would need) and injects it into every LLM call. The model stays grounded in reality across sessions — it remembers your ADRs, your naming conventions, your dependency graph.
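A character-budgeted summary of this kind can be sketched as a greedy packer over one-line codebase facts. This is illustrative only; Forge's actual summarizer is not shown here:

```python
def compact_summary(facts: list[str], budget: int = 3000) -> str:
    """Greedily pack one-line codebase facts (conventions, ADRs,
    key modules) into a fixed character budget for prompt injection."""
    out: list[str] = []
    used = 0
    for fact in facts:
        line = f"- {fact}"
        if used + len(line) + 1 > budget:  # +1 for the joining newline
            break
        out.append(line)
        used += len(line) + 1
    return "\n".join(out)
```

The interesting design decision is upstream of this: which facts to keep, and how to rank them so the ~3,000 characters carry the weight of a 50,000+ character dump.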
## By the numbers
| Metric | Value |
|---|---|
| Languages supported | 6+ (Python, TypeScript, JavaScript, Go, Rust, Java) |
| Compiled validation rules | 30 |
| Built-in skills | 19 |
| MCP tools | 12 |
| Tests passing | 638 |
| Binary size | 38 MB (self-contained, zero dependencies) |
| Version | v1.6.0 |