Product Rescue Diagnostic
- Architecture + security review
- Ranked launch-blockers
- Prioritized fix roadmap
- 90-minute readout
Fee credited toward a Sprint if you proceed.
Book a call
Macleod Labs rescues AI products stuck between demo and market.
Three decades architecting trading, banking and cloud systems at Morgan Stanley, HSBC, Bank of America and AWS — most recently securing generative AI at several cybersecurity orgs.
I find the real reasons your agent isn't production-ready — the security holes, the brittle plumbing, the architecture that won't scale — fix the critical path, and get it trusted enough to ship to customers and investors.
For teams whose AI works in the demo but isn't yet reliable, secure, or trusted enough to ship.
WorkSpec: the missing control layer for AI work
AI does not just change how work is produced. It changes how work must be specified, reviewed, constrained, and accepted. The problem is no longer output. The problem is control.
Read the article
Every engagement starts with a free call. Pick the entry point that fits where your product is — each one routes to the same conversation.
Book a call
Ongoing technical leadership, architecture ownership, hands-on delivery.
Custom / enterprise scope on request.
Book a call
[Steve was] asked to rectify the bugs in the code, move it from a test environment to production, document the code and accounts used, and recommend the development tools that would suit the AI development that was taking place. Whilst the first two weeks was impressive, the achievement of these stretch goals was even more impressive, as we were left with a fully documented and running environment, which made it to market in tight deadlines.
— Chief Technology Officer, AI-security company
Named client references available on request.
These are not abstract demos. They are recurring failure patterns I have fixed inside real AI products: codebases moving faster than control, agentic systems with invisible risk, voice AI that breaks outside the demo, and RAG/search systems users cannot trust. They are examples, not the full list — the same approach applies wherever an AI product has to get reliable, secure, and trusted enough to ship.
AI coding tools and agents create speed, but they also create fragile code, weak tests, phantom references, unclear ownership, and build instability.
I audit the codebase, identify the launch blockers, repair the critical path, and put quality gates around the parts that matter.
# compact context pack, not a raw dump
ctx = pack(repo) # ~3k chars, not 50k
ctx += adrs + conventions + dep_graph
ctx += symbol_index(treesitter_parse(repo))
prompt = user_intent + ctx
agent.run(prompt, budget=Budget(steps, tokens))
while not done and budget.ok():
    plan = llm(ctx)               # reason over state
    call = select_tool(plan)      # next action
    call = pre_tool(call)         # mediated ↓
    obs = run(call)               # sandboxed
    ctx += obs + post_tool(obs)   # gate report
    done = plan.is_complete or budget.spent()
# intercept BEFORE execution: deny or rewrite
def pre_tool(call):
    # raise instead of assert, since asserts are skipped under python -O
    if call.path not in scope: raise Deny       # path/scope guard
    if call.tool not in allowlist: raise Deny   # policy allowlist
    if secrets(call.args): raise Deny           # no creds leak
    call = policy.rewrite(call)                 # Cursor .cursor/rules
    return call
# edit / exec inside the isolated worktree
def run(call):
    before = read(call.path)
    stdout, exit_code = apply(call, cwd=wt)   # write file / run cmd
    after = read(call.path)
    diff = unified(before, after)             # captured change
    return Observation(diff, stdout, exit_code)
# Claude Code PostToolUse hook → 30 compiled rules
def post_tool(diff):
    checks = [lint, types, tests, ast_diff,
              secret_scan, no_stub, no_phantom_ref,
              import_exists]                               # 6 categories
    failures = [rule for rule in RULES if not rule(diff)]  # 30, parallel
    gate = not failures
    if not gate: rollback(diff)                            # revert in ms
    return Report(gate, failures)                          # Codex AGENTS.md
def verify(diff):                                   # hypothetical name for this stage
    wt = worktree(repo)                             # isolated checkout
    apply(diff, wt)
    graph = code_graph(treesitter, wt)
    ok = graph.resolve_refs(diff)                   # no phantom symbols
    ok &= symbol_search(diff.new_imports).exist()
    build(wt)                                       # self-contained binary
    return ok                                       # OpenCode .opencode/
# inside the agent loop, after post_tool returns its Report
if report.gate:
    ctx += "✓ gate green"
    continue                            # keep going
else:
    ctx += report.failures              # actionable
    retries += 1
    if retries > MAX: escalate(human)   # bounded
# agent re-plans against the failures →
Forge (forgeOS) v1.6.0 — Rust binary · Tree-sitter code graph · 30 deterministic rules · 12 MCP tools · 6+ languages · 638 tests. Hooks into Claude Code, Cursor, Codex & OpenCode.
Code harnesses, agentic coding workflows, AI-generated code control, codebase rescue, and release-readiness checks.
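To make one of the gate checks above concrete, here is a minimal Python sketch of an import-existence rule, the kind of deterministic check that catches a phantom import in generated code. It is an illustration only and not Forge's implementation: Forge is described above as a Rust binary built on a Tree-sitter code graph, whereas this sketch uses Python's standard `ast` module and `importlib.util.find_spec`. The function names `missing_imports` and `gate` are hypothetical.

```python
import ast
import importlib.util


def missing_imports(source: str) -> list[str]:
    """Return top-level module names imported by `source` that cannot be resolved."""
    tree = ast.parse(source)
    missing = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Relative imports (level > 0) are skipped: they need repo context.
            names = [node.module]
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            # find_spec returns None when no installed module has this name.
            if importlib.util.find_spec(top) is None and top not in missing:
                missing.append(top)
    return missing


def gate(source: str) -> tuple[bool, list[str]]:
    """Hypothetical gate: green only when every absolute import resolves."""
    failures = [f"unresolved import: {m}" for m in missing_imports(source)]
    return (not failures, failures)
```

Calling `gate` on a generated file returns a pass flag plus a list of actionable failure messages, which is the shape the agent needs in order to re-plan rather than simply retry.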
See how the control loop works
Agentic AI risk hides across tools, memory, retrievers, APIs, files, permissions, prompts, and human approval boundaries.
I map what the system can read, write, call, trigger, or leak — then fix the control gaps before launch.
Code property graphs, trust-boundary mapping, control-surface analysis, agent permissions, and AI security review.
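As a rough illustration of what mapping "read, write, call, trigger, or leak" can look like, the Python sketch below lists every route by which a sensitive resource could reach an external sink. It rests on a deliberately conservative assumption: anything one tool reads can end up in the agent's context and be passed to any other tool. The `Tool` type, the `leak_paths` function, and the resource names are all hypothetical, and a real review would work from the system's actual permissions and code graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    reads: frozenset = frozenset()   # resources the tool can read
    writes: frozenset = frozenset()  # resources the tool can write or send to


def leak_paths(tools, sensitive, external):
    """List (reader, writer, resource, sink) routes from sensitive data to an external sink.

    Assumes data read by any tool can flow through the agent's context
    into any other tool, so every reader/writer pair is considered.
    """
    paths = []
    for reader in tools:
        for resource in sorted(reader.reads & sensitive):
            for writer in tools:
                for sink in sorted(writer.writes & external):
                    paths.append((reader.name, writer.name, resource, sink))
    return paths
```

For example, an agent holding a `crm_lookup` tool that reads `customer_db` and a `send_email` tool that writes to `smtp` yields one route, `("crm_lookup", "send_email", "customer_db", "smtp")`, which is a candidate for a human approval boundary.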
See how the risk map works
Voice AI breaks on latency, silence detection, streaming, transcription quality, turn-taking, inference cost, telephony, deployment, and real users.
I turn fragile voice prototypes into production-ready pipelines.
Real-time speech, VAD, streaming, inference acceleration, call handling, deployment, and production-readiness.
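Silence detection and turn-taking are easier to reason about with a small example. The Python sketch below is a toy energy-based voice activity detector with a hangover window: the turn is declared over only after speech has been heard and a run of consecutive quiet frames follows. The threshold, the hangover length, and the function names are assumptions chosen for illustration. Production pipelines typically use a trained VAD model and tune these values against real call audio.

```python
import math


def rms(frame):
    """Root-mean-square energy of one non-empty frame of float samples in [-1, 1]."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def end_of_turn_index(frames, threshold=0.02, hangover=3):
    """Index of the frame at which the turn is declared over, or None.

    The turn ends once speech has been heard and `hangover` consecutive
    frames fall below `threshold`. Leading silence never ends a turn.
    """
    heard_speech = False
    quiet = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            heard_speech = True
            quiet = 0          # any speech resets the quiet run
        elif heard_speech:
            quiet += 1
            if quiet >= hangover:
                return i
    return None
```

The hangover is the trade-off to watch: too short and the system interrupts a speaker who pauses mid-sentence, too long and every reply feels slow.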
See how voice becomes production-ready
Naive RAG gives plausible answers. Shippable RAG gives grounded, complete, cited, testable answers, and knows when to abstain.
I rescue document-intelligence systems that miss evidence, hallucinate specifics, fail exhaustive queries, or cannot prove where answers came from.
Hybrid retrieval, structured filtering, citation grounding, exhaustive search, evaluation, and trustworthy document intelligence.
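The difference between a plausible answer and a grounded one can be sketched in a few lines of Python. In the example below, an answer is returned only with the passages that support it, and the system abstains when no passage clears a coverage threshold. Plain term overlap stands in for real hybrid retrieval here, purely to keep the sketch self-contained. The function names, the 0.5 threshold, and the sample passages are hypothetical.

```python
import re


def tokens(text):
    """Lowercased alphanumeric terms in `text`, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, passages, min_overlap=0.5):
    """Return (doc_id, score) for passages covering at least `min_overlap` of the query terms."""
    q = tokens(query)
    hits = []
    for doc_id, text in passages.items():
        score = len(q & tokens(text)) / len(q) if q else 0.0
        if score >= min_overlap:
            hits.append((doc_id, score))
    # Best coverage first; ties broken by doc_id for deterministic output.
    return sorted(hits, key=lambda h: (-h[1], h[0]))


def answer(query, passages, min_overlap=0.5):
    """Cite supporting passages, or abstain when nothing clears the threshold."""
    hits = retrieve(query, passages, min_overlap)
    if not hits:
        return {"status": "abstain", "citations": []}
    return {"status": "grounded", "citations": [doc_id for doc_id, _ in hits]}
```

Because the output is a status plus a citation list, it can be asserted against in an evaluation suite, which is what makes the behaviour testable rather than merely plausible.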
See how trustworthy retrieval works