Watchdog
Watchdog is Cupcake's LLM-as-a-judge capability. It evaluates AI agent tool calls using another LLM before they execute, providing semantic security analysis that complements deterministic policy rules.
What is LLM-as-a-Judge?
LLM-as-a-judge is a pattern where one AI model evaluates the outputs or actions of another. Instead of relying solely on pattern matching or static rules, you use an LLM's reasoning capabilities to assess whether an action is appropriate, safe, or aligned with intent.
For AI coding agents, this means:
- Semantic understanding: Catching threats that don't match simple patterns
- Context awareness: Evaluating actions against the broader conversation
- Dynamic reasoning: Adapting to novel situations without new rules
Why Cupcake is Well-Positioned for This
Cupcake already sits at the chokepoint between AI agents and their tools. Every file edit, shell command, and API call flows through Cupcake's policy engine. This makes it the natural place to add LLM-based evaluation:
- Already intercepting events: No additional integration work for users
- Structured input: Events are already parsed and normalized
- Policy composition: Watchdog results flow into the same policy system as deterministic rules
- Fail-safe by default: If the LLM is unavailable, Cupcake's deterministic policies still protect you
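The fail-safe behavior in the last point can be illustrated with a minimal Python sketch. Every name here (`deterministic_deny`, `gather_watchdog_signal`, `decide`, the blocked substrings) is hypothetical and is not Cupcake's actual API. The sketch also assumes that an unreachable LLM simply yields no signal, so the deterministic rules alone determine the outcome.

```python
from typing import Callable, Optional

# Hypothetical stand-ins for deterministic rules; not Cupcake's real rule set.
BLOCKED_SUBSTRINGS = ("rm -rf /", "mkfs")


def deterministic_deny(command: str) -> bool:
    """Deny commands that contain a known-bad pattern."""
    return any(pattern in command for pattern in BLOCKED_SUBSTRINGS)


def gather_watchdog_signal(
    command: str, judge: Callable[[str], dict]
) -> Optional[dict]:
    """Ask the LLM judge for a verdict; return None if the judge is unavailable."""
    try:
        return judge(command)
    except Exception:
        # Assumption: any failure (timeout, network error) means "no signal".
        return None


def decide(command: str, judge: Callable[[str], dict]) -> str:
    """Combine deterministic rules with the optional Watchdog signal."""
    # Deterministic rules always run, whether or not the LLM is reachable.
    if deterministic_deny(command):
        return "deny"
    signal = gather_watchdog_signal(command, judge)
    if signal is not None and not signal["allow"]:
        return "deny"
    return "allow"
```

In this sketch a failing judge never weakens the deterministic checks; it only removes the extra semantic layer.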
How It Works
When Watchdog is enabled:
- An AI agent attempts a tool call (e.g., run a shell command)
- Cupcake intercepts the event as usual
- Watchdog sends the event to an LLM for evaluation
- The LLM returns a structured judgment: allow/deny, confidence, reasoning
- This judgment is available to your policies as input.signals.watchdog
- Your policies decide the final outcome
Agent Action → Cupcake → Watchdog (LLM) → Policy Evaluation → Decision
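As a rough illustration of this flow, the Python sketch below parses a structured judgment, attaches it under `signals.watchdog`, and lets a policy function make the final call. The field names (`allow`, `confidence`, `reasoning`) mirror the judgment described above, but the exact schema, the `Judgment` class, the helper functions, and the `0.8` confidence threshold are all assumptions made for the example. Real Cupcake policies are written in Rego, not Python.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Judgment:
    """Hypothetical shape of a Watchdog verdict."""
    allow: bool
    confidence: float
    reasoning: str


def parse_judgment(raw: str) -> Judgment:
    """Parse the LLM's JSON reply into a typed judgment."""
    data = json.loads(raw)
    return Judgment(
        allow=bool(data["allow"]),
        confidence=float(data["confidence"]),
        reasoning=str(data["reasoning"]),
    )


def build_policy_input(event: dict, judgment: Judgment) -> dict:
    """Attach the judgment where policies would read it: signals.watchdog."""
    return {"event": event, "signals": {"watchdog": asdict(judgment)}}


def policy(policy_input: dict, min_confidence: float = 0.8) -> str:
    """The policy, not the LLM, makes the final decision."""
    verdict = policy_input["signals"]["watchdog"]
    # Assumed rule: only act on a deny verdict when the judge is confident.
    if not verdict["allow"] and verdict["confidence"] >= min_confidence:
        return "deny"
    return "allow"
```

Keeping the verdict as one signal among others means a policy can ignore low-confidence judgments or combine them with deterministic checks.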
Use Cases
Security
- Detecting data exfiltration attempts that don't match known patterns
- Identifying commands that seem misaligned with the user's stated intent
- Flagging suspicious sequences of actions
Developer Experience
- Suggesting better approaches before executing suboptimal commands
- Providing context-aware warnings
- Guiding agents toward project-specific best practices
Non-Deterministic Answer to Non-Determinism
AI agents are inherently non-deterministic. They can be misled, confused, or manipulated in ways that deterministic rules can't anticipate. Watchdog addresses this by fighting fire with fire: it uses AI to evaluate AI.
This doesn't replace deterministic policies. It complements them. Use Rego rules for known patterns and hard requirements. Use Watchdog for semantic analysis and catching the unexpected.
