AI Product Rescue

I make agentic AI shippable.

Macleod Labs rescues AI products stuck between demo and market.

Three decades architecting trading, banking and cloud systems at Morgan Stanley, HSBC, Bank of America and AWS — most recently securing generative AI at several cybersecurity orgs.

I find the real reasons your agent isn't production-ready — the security holes, the brittle plumbing, the architecture that won't scale — fix the critical path, and get it trusted enough to ship to customers and investors.

For teams whose AI works in the demo but isn't yet reliable, secure, or trusted enough to ship.

Book a Product Rescue Call

Free 30-minute call: we'll find your biggest launch blocker.

See examples of pain points I've fixed

Featured field note

The Manager as Compiler

WorkSpec: the missing control layer for AI work

AI does not just change how work is produced. It changes how work must be specified, reviewed, constrained, and accepted. The problem is no longer output. The problem is control.

Read the article

Recent clients

CTO and Chief Architect roles

Ways to work together

Every engagement starts with a free call. Pick the entry point that fits where your product is — each one routes to the same conversation.

Start here

Product Rescue Diagnostic

$2,000

fixed · ~4-day turnaround

Architecture + security review
Ranked launch-blockers
Prioritized fix roadmap
90-minute readout

Fee credited toward a Sprint if you proceed.

Book a call

Fix & ship

Rescue Sprint

from $12,000

fixed scope · 2–3 weeks

Hands-on implementation of the critical-path fixes
Security hardening + evidence pack
Test-to-production readiness

Book a call

Fractional AI CTO

Ongoing technical leadership, architecture ownership, hands-on delivery.

Advisor · up to 2 days/mo · $2,200/mo

Operator (recommended) · up to 5 days/mo · $5,500/mo

Embedded · up to 8 days/mo · $8,800/mo

Custom / enterprise scope on request.

Book a call

[Steve was] asked to rectify the bugs in the code, move it from a test environment to production, document the code and accounts used, and recommend the development tools that would suit the AI development that was taking place. Whilst the first two weeks was impressive, the achievement of these stretch goals was even more impressive, as we were left with a fully documented and running environment, which made it to market in tight deadlines.
— Chief Technology Officer, AI-security company

Named client references available on request.

Examples of pain points I've fixed

These are not abstract demos. They are recurring failure patterns I have fixed inside real AI products: codebases moving faster than control, agentic systems with invisible risk, voice AI that breaks outside the demo, and RAG/search systems users cannot trust. They are examples, not the full list — the same approach applies wherever an AI product has to get reliable, secure, and trusted enough to ship.

AI codebases moving faster than control

AI coding tools and agents create speed, but they also create fragile code, weak tests, phantom references, unclear ownership, and build instability.

I audit the codebase, identify the launch blockers, repair the critical path, and put quality gates around the parts that matter.

Code harnesses & agentic coding

The agent loop, mediated by hooks & deterministic gates

source & prompt · context

# compact context pack, not a raw dump
ctx = pack(repo)        # ~3k chars, not 50k
ctx += adrs + conventions + dep_graph
ctx += symbol_index(treesitter_parse(repo))
prompt = user_intent + ctx
agent.run(prompt, budget=Budget(steps, tokens))

agent · plan → act loop · bounded

while not done and budget.ok():
    plan = llm(ctx)              # reason over state
    call = select_tool(plan)     # next action
    call = pre_tool(call)        # mediated ↓
    obs  = run(call)             # sandboxed
    ctx += obs + post_tool(obs)  # gate report
    done = plan.is_complete or budget.spent()

pre_tool(call) · guardrail

# intercept BEFORE execution: deny or rewrite
def pre_tool(call):
    # explicit raises, because asserts are stripped under `python -O`
    if call.path not in scope:     raise Deny("path out of scope")   # path/scope guard
    if call.tool not in allowlist: raise Deny("tool not allowed")    # policy allowlist
    if secrets(call.args):         raise Deny("credentials in args") # no creds leak
    call = policy.rewrite(call)      # Cursor .cursor/rules
    return call

tool action · sandboxed

# edit / exec inside the isolated worktree
before = read(call.path)
apply(call, cwd=wt)              # write file / run cmd
after  = read(call.path)
diff   = unified(before, after) # captured change
return Observation(diff, stdout, exit_code)

post_tool · quality gate · deterministic

# Claude Code PostToolUse hook → 30 compiled rules
def post_tool(diff):
    checks = [lint, types, tests, ast_diff,
              secret_scan, no_stub, no_phantom_ref,
              import_exists]          # 6 categories
    failures = [rule for rule in RULES if not rule(diff)]  # 30, parallel
    gate = not failures               # green only when nothing failed
    if not gate: rollback(diff)       # revert in ms
    return Report(gate, failures)     # Codex AGENTS.md

sandbox · validate vs. live · isolated

wt    = worktree(repo)           # isolated checkout
apply(diff, wt)
graph = code_graph(treesitter, wt)
ok    = graph.resolve_refs(diff) # no phantom symbols
ok   &= symbol_search(diff.new_imports).exist()
build(wt)                        # self-contained binary
return ok                        # OpenCode .opencode/

pass / fail feedback · bounded retry

if report.passed:
    ctx += "✓ gate green"; continue   # keep going
else:
    ctx += report.failures            # actionable
    retries += 1
    if retries > MAX: escalate(human) # bounded
    # agent re-plans against the failures →

Forge (forgeOS) v1.6.0 — Rust binary · Tree-sitter code graph · 30 deterministic rules · 12 MCP tools · 6+ languages · 638 tests. Hooks into Claude Code, Cursor, Codex & OpenCode.
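
To make the idea of a deterministic gate concrete, here is a minimal Python sketch of two of the checks named in the post_tool panel, import_exists and no_stub. It is not Forge's implementation (Forge is a Rust binary built on Tree-sitter); the names `GateReport`, `check_imports_exist`, `check_no_stubs`, and `run_gate` are hypothetical, and using Python's `ast` module in place of a Tree-sitter code graph is an assumption made to keep the example self-contained.

```python
import ast
import importlib.util
from dataclasses import dataclass, field


@dataclass
class GateReport:
    """Hypothetical report type: a pass flag plus readable failures."""
    passed: bool
    failures: list[str] = field(default_factory=list)


def check_imports_exist(tree: ast.AST) -> list[str]:
    """Flag absolute imports whose top-level package cannot be located."""
    failures = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            try:
                found = importlib.util.find_spec(top_level) is not None
            except (ImportError, ValueError):
                found = False
            if not found:
                failures.append(f"line {node.lineno}: unresolved import '{name}'")
    return failures


def _is_stub_statement(stmt: ast.stmt) -> bool:
    """True for `pass`, a bare `...`, or a lone string (docstring)."""
    if isinstance(stmt, ast.Pass):
        return True
    return (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Constant)
        and (stmt.value.value is Ellipsis or isinstance(stmt.value.value, str))
    )


def check_no_stubs(tree: ast.AST) -> list[str]:
    """Flag functions whose body contains nothing but stub statements."""
    failures = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if all(_is_stub_statement(stmt) for stmt in node.body):
                failures.append(f"line {node.lineno}: stub function '{node.name}'")
    return failures


def run_gate(source: str) -> GateReport:
    """Run both checks on one file's source; same input, same verdict."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return GateReport(False, [f"syntax error: {exc.msg}"])
    failures = check_imports_exist(tree) + check_no_stubs(tree)
    return GateReport(not failures, failures)
```

The point of the design is that the verdict depends only on the code, never on the model's opinion of its own output, so the failures list can be fed straight back to the agent as actionable context.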

An edit-compile-run-debug loop traces across a live code graph: a node changes, the build fans out, tests fire, a failure lights up, and the agent patches and re-runs — the inner loop of agentic coding, made legible.
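
The fan-out step in that loop can be sketched as a walk over reversed dependency edges: when one node changes, everything that depends on it, directly or transitively, needs rebuilding and retesting. This is a minimal illustration only; the `affected_nodes` function, the dict-of-sets graph shape, and the module names are hypothetical, not how Forge stores its code graph.

```python
from collections import deque


def affected_nodes(deps: dict[str, set[str]], changed: str) -> list[str]:
    """Return the changed node plus everything that transitively depends on it.

    `deps` maps each node to the set of nodes it depends on.
    The result is in breadth-first order, starting at the changed node.
    """
    # Invert the edges so we can walk from a node to its dependents.
    dependents: dict[str, set[str]] = {}
    for node, requires in deps.items():
        for required in requires:
            dependents.setdefault(required, set()).add(node)

    seen = {changed}
    order = [changed]
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        # Sort for a stable, reproducible rebuild order.
        for dependent in sorted(dependents.get(current, ())):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


# Hypothetical module graph: each key depends on the modules in its set.
example_deps = {
    "parser": set(),
    "indexer": {"parser"},
    "cli": {"indexer"},
    "test_parser": {"parser"},
    "docs": set(),
}
print(affected_nodes(example_deps, "parser"))
# → ['parser', 'indexer', 'test_parser', 'cli']
```

Unrelated nodes such as `docs` never enter the rebuild set, which is what keeps the inner loop fast enough to run after every edit.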

Developer Tools

MCP Stack

Code harnesses, agentic coding workflows, AI-generated code control, codebase rescue, and release-readiness checks.

See how the control loop works

Agentic systems with invisible risk

Agentic AI risk hides across tools, memory, retrievers, APIs, files, permissions, prompts, and human approval boundaries.

I map what the system can read, write, call, trigger, or leak — then fix the control gaps before launch.

A code property graph assembles node by node — functions, variables, and calls — then a taint path lights up and threads from an untrusted source through the graph to a sensitive sink, the way a CPG exposes an attack path.
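
A taint query of that kind reduces to a path search over data-flow edges. The sketch below is a toy version under stated assumptions: a real code property graph also carries syntax and control-flow edges, and the `find_taint_path` function, the adjacency-dict format, and the node names are hypothetical rather than taken from CPG Scanner.

```python
from collections import deque


def find_taint_path(edges, source, sink, sanitizers=frozenset()):
    """Return the shortest data-flow path from source to sink, or None.

    `edges` maps a node to the nodes its data flows into.
    Paths that pass through a sanitizer node are not followed.
    """
    if source in sanitizers:
        return None
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            # Walk the parent links back to the source to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for successor in edges.get(node, ()):
            if successor not in parents and successor not in sanitizers:
                parents[successor] = node
                queue.append(successor)
    return None


# Hypothetical flow: request data reaches a SQL sink by two routes,
# only one of which goes through the escaping function.
flows = {
    "http_request.body": ["parse_args"],
    "parse_args": ["build_query", "escape_sql"],
    "escape_sql": ["build_query_safe"],
    "build_query": ["db.execute"],
    "build_query_safe": ["db.execute"],
}
print(find_taint_path(flows, "http_request.body", "db.execute", {"escape_sql"}))
# → ['http_request.body', 'parse_args', 'build_query', 'db.execute']
```

The returned path is the evidence: it names every hop between the untrusted input and the sensitive call, which is what makes the finding reviewable and fixable.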

Code Intelligence / AI Security

CPG Scanner

Security & Compliance

Vulnflow

Generative AI Agents

GCoder

Code property graphs, trust-boundary mapping, control-surface analysis, agent permissions, and AI security review.

See how the risk map works

Voice AI that works in demo but not production

Voice AI breaks on latency, silence detection, streaming, transcription quality, turn-taking, inference cost, telephony, deployment, and real users.

I turn fragile voice prototypes into production-ready pipelines.

A live sales call, annotated in real time: the pipeline transcribes the client–salesperson conversation and categorizes each exchange against four different sales strategies as the call unfolds. The annotator also lets the user mark key parts of the call by hand — those annotations feed back in to train the AI.

AI & Machine Learning

Real-Time Voice AI

Flagship case study: engineering sub-second voice agents over telephony — the latency budget, VAD-gated streaming Whisper, semantic endpointing, barge-in, and scaling to thousands of concurrent calls.
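
The gating idea behind VAD-gated streaming can be sketched in a few lines: forward audio to the transcriber only while someone is speaking, and signal end of turn after a run of silence. This is a deliberately simplified stand-in, not the pipeline from the case study. It swaps in a plain energy threshold where a production system would typically use a trained VAD model plus semantic endpointing, and the `vad_gate` function, the threshold, and the hangover length are all hypothetical values chosen for illustration.

```python
import math
from collections.abc import Iterable, Iterator


def rms(frame: list[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(sample * sample for sample in frame) / len(frame))


def vad_gate(
    frames: Iterable[list[float]],
    threshold: float = 0.02,      # assumed energy threshold
    hangover_frames: int = 3,     # assumed silence run that ends a turn
) -> Iterator[tuple[str, list[float]]]:
    """Yield ("speech", frame) for audio worth transcribing and
    ("end_of_turn", []) once silence has lasted `hangover_frames` frames."""
    in_speech = False
    silent_run = 0
    for frame in frames:
        if rms(frame) >= threshold:
            in_speech = True
            silent_run = 0
            yield ("speech", frame)
        elif in_speech:
            silent_run += 1
            if silent_run >= hangover_frames:
                in_speech = False
                silent_run = 0
                yield ("end_of_turn", [])
            else:
                # Keep short pauses inside the utterance.
                yield ("speech", frame)
        # Silence outside an utterance is dropped and never reaches the model.
```

Dropping leading and trailing silence is where much of the inference saving comes from, and the hangover length is the trade-off between cutting a speaker off mid-pause and responding late.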

Real-time speech, VAD, streaming, inference acceleration, call handling, deployment, and production-readiness.

See how voice becomes production-ready

RAG and search systems users cannot trust

Naive RAG gives plausible answers. Shippable RAG gives grounded, complete, cited, testable answers — and knows when to abstain.

I rescue document-intelligence systems that miss evidence, hallucinate specifics, fail exhaustive queries, or cannot prove where answers came from.

Standard · single model

Advanced · open-weights SOTA frontier

One query forks two ways: a vector path drifts through an embedding cloud to nearest neighbors while a structured-filter path snaps a metadata grid down to qualifying rows; the two streams converge and rerank into a single ordered result.
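
The fork-and-converge flow described above can be sketched as two independent retrieval paths whose results are merged by rank. The sketch below uses reciprocal rank fusion for the merge, which is an assumption on my part (the page does not name the reranker), and the document records, two-dimensional vectors, and function names are hypothetical toy data.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_path(query_vec, docs, k):
    """Nearest neighbours by embedding similarity."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["id"] for d in ranked[:k]]


def filter_path(docs, predicate):
    """Every row whose metadata satisfies the structured filter."""
    return [d["id"] for d in docs if predicate(d["meta"])]


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score = sum of 1 / (k + rank) over the lists."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by id so the order is reproducible.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


docs = [
    {"id": "nda-2021", "vec": [0.9, 0.1], "meta": {"type": "nda", "year": 2021}},
    {"id": "lease-2023", "vec": [0.8, 0.2], "meta": {"type": "lease", "year": 2023}},
    {"id": "nda-2023", "vec": [0.1, 0.9], "meta": {"type": "nda", "year": 2023}},
]

semantic = vector_path([1.0, 0.0], docs, k=2)
structured = filter_path(docs, lambda meta: meta["type"] == "nda")
print(rrf_fuse([semantic, structured]))
# → ['nda-2021', 'lease-2023', 'nda-2023']
```

Note that `nda-2023` is far from the query in embedding space and would be missed by the vector path alone. The structured path is what recovers it, which is why exhaustive queries ("every NDA in the deal room") need the filter path rather than nearest-neighbour search.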

RAG / Document Intelligence

Deal-Room Document Intelligence

Hybrid retrieval, structured filtering, citation grounding, exhaustive search, evaluation, and trustworthy document intelligence.

See how trustworthy retrieval works

Product Rescue · AI Security · RAG and Document Intelligence · Voice AI · Cloud and Platform Architecture · Hands-on CTO Execution
