CPG Scanner
Static analysis that maps the attack surface of any LangChain or LangGraph app — MITRE ATLAS taint chains, architecture, and control gaps, from source.
Key metrics
Architecture
A fetcher pulls source from a GitHub URL, git repo, or local path and converts notebooks to Python. A Python AST pass builds a Code Property Graph, then six YAML-driven analysis skills run in strict order over it — filtering to LLM-relevant nodes, classifying trust planes, extracting architecture, mapping MITRE ATLAS taint chains, detecting control gaps, and resolving model governance. The merged JSON report drives an interactive React SVG diagram. Nothing in the target code is ever executed.
Case study
The problem: an LLM app's attack surface is invisible before it ships
A LangChain or LangGraph application is a graph of untrusted inputs, model calls, retrievers, tools, and data stores, but that structure is not visible in any single place in the code. It is scattered across decorators, pipe chains, and conditional edges. By the time a security engineer can reason about where a prompt injection lands or where a poisoned document reaches the model, the app is already running. Pen-testing a live deployment is slow, partial, and late.
CPG Scanner answers the question before anything runs: given any LangChain / LangGraph Python codebase — a GitHub URL, a git repo, or a local path — what is its attack surface, and where are the controls missing?
[[toc]]
Building a Code Property Graph from the AST
The scanner never executes the target code. A fetcher pulls the source (cloning
repos and converting .ipynb notebooks to .py as needed), then a Python AST
pass walks every file and emits a Code Property Graph: nodes for CALL,
IMPORT, ASSIGN, FUNC_DEF, and LITERAL, joined by DATA_FLOW and CALL
edges. The default LangGraph adaptive-RAG examples produce 629 nodes and 103
edges from four files in about 1.2 seconds.
The CPG is the substrate. Everything downstream is derived from it, so every finding traces back to a specific file and line in the source.
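As a rough illustration of the AST pass, the sketch below parses a source string with Python's standard `ast` module and emits the five node kinds and two edge kinds named above. It is not the scanner's `ast_runner.py`: the `CPG` class, `build_cpg`, the node dictionary layout, and the flow-insensitive DATA_FLOW rule (an assigned name passed as a call argument) are all assumptions made for the example.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class CPG:
    """Hypothetical container: nodes are dicts, edges are (src, dst, kind) tuples."""
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def add_node(self, kind, name, file, line):
        node_id = len(self.nodes)
        self.nodes.append(
            {"id": node_id, "kind": kind, "name": name, "file": file, "line": line}
        )
        return node_id


def _callee_name(func):
    # Render `foo` or `pkg.foo` as a dotted string; anything else is "<expr>".
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return f"{_callee_name(func.value)}.{func.attr}"
    return "<expr>"


def build_cpg(source, filename="<memory>"):
    tree = ast.parse(source, filename=filename)  # parsed only, never executed
    cpg = CPG()
    assigns = {}    # variable name -> id of an ASSIGN node binding it (flow-insensitive)
    func_defs = {}  # function name -> id of its FUNC_DEF node
    calls = []      # (node id, ast.Call) pairs, wired up in the second pass

    # Pass 1: emit nodes, each carrying its file and line as evidence.
    for node in ast.walk(tree):
        line = getattr(node, "lineno", 0)
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                cpg.add_node("IMPORT", alias.name, filename, line)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            func_defs[node.name] = cpg.add_node("FUNC_DEF", node.name, filename, line)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    assigns[target.id] = cpg.add_node("ASSIGN", target.id, filename, line)
        elif isinstance(node, ast.Call):
            call_id = cpg.add_node("CALL", _callee_name(node.func), filename, line)
            calls.append((call_id, node))
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            cpg.add_node("LITERAL", node.value, filename, line)

    # Pass 2: emit CALL edges (call -> local definition) and DATA_FLOW edges
    # (assignment -> call that receives the assigned name as an argument).
    for call_id, call in calls:
        name = _callee_name(call.func)
        if name in func_defs:
            cpg.edges.append((call_id, func_defs[name], "CALL"))
        for arg in list(call.args) + [kw.value for kw in call.keywords]:
            if isinstance(arg, ast.Name) and arg.id in assigns:
                cpg.edges.append((assigns[arg.id], call_id, "DATA_FLOW"))
    return cpg


SAMPLE = '''
from langchain_openai import ChatOpenAI

def ask(question):
    return question

prompt = "hello"
answer = ask(prompt)
'''

cpg = build_cpg(SAMPLE, "sample.py")
```

Because the source is only parsed, the sample's import of `langchain_openai` never has to resolve. A real pass would also need scoping and inter-file resolution, which this sketch leaves out.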
Six YAML-driven analysis skills
Six skills run in strict order over the graph. Each is configured entirely in YAML — new frameworks, attack chains, and control patterns are added without touching Python.
flowchart TD
    SRC["GitHub URL / git repo / local path"] --> FETCH["fetcher.py"]
    FETCH --> AST["ast_runner.py — Python AST to CPG (629 nodes, 103 edges)"]
    AST --> S1["llm_filter — prune to 7 LLM-relevant categories"]
    S1 --> S2["boundary_classifier — assign 8 trust planes"]
    S2 --> S3["arch_extractor — 14 architecture node types"]
    S3 --> S4["threat_mapper — MITRE ATLAS taint chains"]
    S4 --> S5["control_detector — 10 control types + gaps"]
    S5 --> S6["eci_resolver — model governance card"]
    S6 --> ASM["assembler.py — merge to scan_result.json"]
    ASM --> UI["React SVG attack surface map"]
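The strict ordering in the flowchart can be pictured as a fixed list of skills that share one graph and one report, each reading its own configuration section. The sketch below is an assumption about the shape, not the scanner's code: `CONFIG` is a plain dict standing in for parsed YAML, only two of the six skills are stubbed, and every key, prefix, and plane mapping is hypothetical.

```python
# Stand-in for a parsed YAML file; keys and values are hypothetical.
CONFIG = {
    "llm_filter": {"keep_prefixes": ["langchain", "langgraph"]},
    "boundary_classifier": {
        "planes": {"ChatOpenAI": "model", "HumanMessage": "ingress"}
    },
}


def llm_filter(graph, report, cfg):
    # Mutate the shared node list in place so later skills see the filtered view.
    prefixes = tuple(cfg["keep_prefixes"])
    graph["nodes"][:] = [n for n in graph["nodes"] if n["module"].startswith(prefixes)]
    report["llm_nodes"] = len(graph["nodes"])


def boundary_classifier(graph, report, cfg):
    # Tag each surviving node with a trust plane looked up from configuration.
    for node in graph["nodes"]:
        node["plane"] = cfg["planes"].get(node["name"], "unknown")
    report["planes"] = sorted({n["plane"] for n in graph["nodes"]})


# Order matters: each skill depends on what the previous one left in the graph.
PIPELINE = [
    ("llm_filter", llm_filter),
    ("boundary_classifier", boundary_classifier),
]


def run_pipeline(graph, config):
    report = {}
    for name, skill in PIPELINE:
        skill(graph, report, config.get(name, {}))
    return report


graph = {
    "nodes": [
        {"name": "ChatOpenAI", "module": "langchain_openai"},
        {"name": "HumanMessage", "module": "langchain_core.messages"},
        {"name": "sleep", "module": "time"},
    ]
}
report = run_pipeline(graph, CONFIG)
```

Keeping the skill bodies generic and the patterns in configuration is what would let a new framework be added by editing data rather than code.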
- llm_filter prunes the raw CPG down to LLM-relevant nodes across seven categories, mutating the graph in place so every later skill sees the same filtered view.
- boundary_classifier assigns eight trust planes from import and call patterns — the ingress, model, and privileged-action zones you see drawn as dashed boundaries in the UI.
- arch_extractor maps the CPG onto 14 architecture node types, including LCEL pipe-operator chains (prompt | llm | parser) that other tools miss.
- threat_mapper does the taint-chain analysis described below.
- control_detector detects 10 guardrail types and emits a gap warning, with the specific library to add, whenever an expected control is absent.
- eci_resolver matches model strings to a governance registry and produces a per-model card with an Epoch Capabilities Index score and blast radius.
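To make the LCEL case from the arch_extractor item concrete: Python parses `prompt | llm | parser` as nested `ast.BinOp` nodes with a `BitOr` operator, so a pipe chain can be recovered by flattening that left-leaning tree. The sketch below assumes this is roughly how such chains could be found. The function names are hypothetical, and it does not check that the operands are actually LangChain runnables, so a bitwise OR on integers or a set union would also match.

```python
import ast


def _label(node):
    # Short printable label for one stage of the chain.
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return f"{_label(node.value)}.{node.attr}"
    if isinstance(node, ast.Call):
        return f"{_label(node.func)}()"
    return "<expr>"


def flatten_pipe(node):
    # `a | b | c` parses as BinOp(BinOp(a, |, b), |, c), so recurse on both
    # sides and concatenate to get the stages in left-to-right order.
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.BitOr):
        return flatten_pipe(node.left) + flatten_pipe(node.right)
    return [_label(node)]


def find_pipe_chains(source):
    chains = []
    for node in ast.walk(ast.parse(source)):
        # Only look at assignment right-hand sides so that inner BinOp nodes
        # of the same chain are not reported a second time.
        if (
            isinstance(node, ast.Assign)
            and isinstance(node.value, ast.BinOp)
            and isinstance(node.value.op, ast.BitOr)
        ):
            chains.append(flatten_pipe(node.value))
    return chains


chains = find_pipe_chains("chain = prompt | llm | StrOutputParser()\nx = 1\n")
```

Here `chains` holds one entry with the three stages in order, which is the sequence an architecture extractor would need in order to draw prompt, model, and parser as connected nodes.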
MITRE ATLAS taint chains, with evidence
The threat mapper traces data-flow paths through the CPG and matches them
against six MITRE ATLAS techniques. Each finding is a real taint chain, not a
heuristic guess. Direct prompt injection (AML.T0051.000), for example, fires
when a HumanMessage flows through add_messages into a ChatOpenAI call —
and the report carries the source, line, and code snippet for every hop:
| Attack | MITRE ID | Severity | Trigger |
|---|---|---|---|
| Prompt Smuggling / Direct Injection | AML.T0051.000 | Critical | HumanMessage → add_messages → model |
| Indirect RAG Poisoning | AML.T0054.003 | High | loader → vectorstore → retriever |
| Prompt Exfiltration via Tool | AML.T0057 | High | ToolNode with web/shell after the model |
| Privilege Escalation via Agent | AML.T0043 | High | multi-agent supervisor, no HITL gate |
| Model Inversion via Embeddings | AML.T0040 | Medium | embeddings + store, no access control |
| Adversarial Example Injection | AML.T0020 | Medium | unvalidated docs into the retrieval pipeline |
Findings also line up with the OWASP LLM Top 10 (2025): prompt injection (LLM01), sensitive information disclosure (LLM02), and supply chain (LLM03).
The threat mapper builds each taint chain by walking the graph from a single tainted source, hop by hop, until it reaches a dangerous sink.
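A minimal sketch of that walk, assuming the DATA_FLOW edges are available as `(source_id, target_id)` pairs, is a breadth-first search that remembers each node's predecessor so the full chain can be rebuilt with evidence for every hop. The function name, the node layout, and the file and line values in the example are hypothetical; the scanner's own matching rules are not shown in this write-up.

```python
from collections import deque


def find_taint_chain(nodes, edges, is_source, is_sink):
    """Return the first source-to-sink path as a list of node dicts, or None."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)

    for start in (nid for nid, node in nodes.items() if is_source(node)):
        parent = {start: None}  # doubles as the visited set
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if is_sink(nodes[current]):
                # Walk the parent links back to the source, then reverse.
                chain = []
                while current is not None:
                    chain.append(nodes[current])
                    current = parent[current]
                return list(reversed(chain))
            for nxt in successors.get(current, []):
                if nxt not in parent:
                    parent[nxt] = current
                    queue.append(nxt)
    return None


# Hypothetical evidence: file names and line numbers are made up for the example.
nodes = {
    1: {"name": "HumanMessage", "file": "graph.py", "line": 12},
    2: {"name": "add_messages", "file": "graph.py", "line": 20},
    3: {"name": "ChatOpenAI", "file": "graph.py", "line": 31},
    4: {"name": "StrOutputParser", "file": "graph.py", "line": 40},
}
edges = [(1, 2), (2, 3), (3, 4)]

chain = find_taint_chain(
    nodes,
    edges,
    is_source=lambda n: n["name"] == "HumanMessage",
    is_sink=lambda n: n["name"] == "ChatOpenAI",
)
hops = [(n["name"], n["file"], n["line"]) for n in chain]
```

Each entry in `hops` pairs a step of the chain with its location, which mirrors the per-hop source and line evidence the report is described as carrying.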
The interactive attack surface map
The merged JSON report drives a React SVG diagram. Components are laid out
across three trust planes — public ingress, the privileged action plane, and the
data/tool plane — with each node carrying its CPG evidence, code snippets, and
governance card. Nodes the scanner detected directly are badged CPG; the rest
are inferred synthetic nodes that complete the picture.
@image[hero.png]
Clicking through surfaces the detail: attack paths with MITRE technique and mitigations, end-to-end data-flow paths with OWASP risk indicators, and the governance card per model. The diagram below is the scan of the bundled LangGraph adaptive-RAG example, with code snippets pulled straight from source.
@image[01-attack-surface-map.png]
Built to run anywhere
The tool ships as a Typer CLI (scan, serve, list-skills, reindex) and a
FastAPI server, packaged for Docker and deployable via Terraform. Adding new LLM
frameworks, attack taint chains, security-control patterns, or model registry
entries is a YAML edit, not a code change — the analysis engine stays fixed
while the rule set grows. The suite runs 237 tests with zero failures.
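One way to picture "a YAML edit, not a code change" is a rule list that the engine treats purely as data. The sketch below applies that idea to control-gap detection, under stated assumptions: `CONTROL_RULES` is a Python list standing in for parsed YAML, and the rule fields, call names used as triggers, and suggested library names are all hypothetical placeholders rather than the scanner's real rule set.

```python
# Stand-in for rules loaded from YAML. Every field and value is hypothetical.
CONTROL_RULES = [
    {
        "control": "input_validation",
        "required_when": "HumanMessage",      # component that calls for the control
        "satisfied_by": ["validate_input"],   # calls that count as the control
        "suggest": "example-input-guard",     # placeholder library name
    },
    {
        "control": "hitl_gate",
        "required_when": "ToolNode",
        "satisfied_by": ["interrupt"],
        "suggest": "example-approval-gate",
    },
]


def detect_gaps(call_names, rules):
    """Report each rule whose trigger is present but whose control is absent."""
    seen = set(call_names)
    gaps = []
    for rule in rules:
        triggered = rule["required_when"] in seen
        satisfied = bool(seen.intersection(rule["satisfied_by"]))
        if triggered and not satisfied:
            gaps.append({"control": rule["control"], "suggest": rule["suggest"]})
    return gaps


# Calls found in a scanned app: the tool node is gated, the user input is not.
gaps = detect_gaps(
    ["HumanMessage", "ChatOpenAI", "ToolNode", "interrupt"], CONTROL_RULES
)
```

Supporting a new control here means appending one more dictionary to the rule list; `detect_gaps` itself does not change.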
Tech stack
Gallery