AI System Guardian

Self-healing monitoring daemon for multi-agent AI architectures. 21-pattern deep code scanner + 13 runtime health checks + auto-restart + pipeline stall recovery. Born from fixing 70+ production bugs across an 11-round debugging marathon.

Deep Code Scan Patterns

eval, exec, pickle, shell injection, secret access, network allowlist

Runtime Health Checks

PM2, HTTP probes, pipeline stall, disk, budget, task quality

10s

Patrol Interval

Adaptive: 10s base, 5-min cooldown per process, daily digest

How It Works

1.Patrol Loop — Every 10 seconds, the sentinel runs all 13 health checks in sequence. Each check returns (ok, details).

2.Auto-Heal — If a check fails (e.g., crashed PM2 process), the sentinel attempts automatic remediation: restart the process, roll back a stalled task, or rotate bloated logs.

3.Cooldown — After healing a process, a 5-minute cooldown prevents restart storms. Cooldown resets on successful recovery.

4.Escalation — If auto-heal fails or the same process keeps crashing, the sentinel writes an escalation task and sends a Discord/Telegram alert to a human operator.

5.Deep Code Scan — Every 24 hours, the sentinel runs the 21-pattern security scanner across all auto-generated scripts before they execute, blocking anything with eval(), pickle, or unauthorized network calls.

6.Health Summary — Every 60 seconds, the sentinel writes a JSON health summary consumed by the dashboard. Daily digest sent to Discord at midnight.

Architecture Diagram

┌─────────────────────────────────────────────────┐
│              Shadow Sentinel v2.0               │
│          (Main Patrol Loop — 10s cycle)         │
├────────────┬────────────┬───────────────────────┤
│ 13 Health  │  Auto-Heal │  21-Pattern Security  │
│  Checks    │   Engine   │     Deep Scan (24h)   │
├────────────┼────────────┼───────────────────────┤
│ PM2 Status │ pm2 restart│ eval/exec detection   │
│ HTTP Probe │ Task rollbk│ pickle/ctypes block   │
│ Pipeline   │ Log rotate │ Network allowlist     │
│ Disk/Budget│ Reassign   │ Secret file access    │
└─────┬──────┴─────┬──────┴──────────┬────────────┘
      │            │                 │
      ▼            ▼                 ▼
┌──────────┐ ┌──────────┐   ┌───────────────┐
│ Dashboard│ │ Discord/ │   │ Escalation    │
│  JSON    │ │ Telegram │   │ Task Queue    │
└──────────┘ └──────────┘   └───────────────┘

Use Cases

Multi-Agent AI Systems

When you run 3+ AI agents (orchestrator, workers, dispatcher) via PM2, crashes and infinite loops are inevitable. The guardian catches them in seconds, not hours.

AI-Generated Code Pipelines

If your agents generate and execute Python scripts, the 21-pattern scanner blocks dangerous code (eval, pickle, shell injection) before it runs.

Production Task Queues

Pipeline stall detection finds tasks stuck for 30+ minutes and auto-rolls them back, preventing the entire queue from blocking.

Cost-Sensitive API Operations

Budget burn rate monitoring catches runaway API calls (e.g., infinite retry loops calling GPT-4) before they drain your credits.

Frequently Asked Questions

What is AI System Guardian?▼

AI System Guardian is a self-healing monitoring daemon for multi-agent AI architectures. It runs a 10-second patrol loop that executes 13 runtime health checks (PM2 process status, HTTP probes, pipeline stall detection, disk space, budget burn rate) and a 24-hour deep code security scan with 21 patterns (eval, exec, pickle, shell injection, unauthorized network calls). When it detects a failure, it automatically restarts processes, rolls back stuck tasks, or escalates to a human operator.

How does the self-healing work?▼

When a health check fails — say a PM2 process crashes — the sentinel issues a `pm2 restart` command automatically. A 5-minute cooldown prevents restart storms. If the process crashes again within cooldown, the sentinel writes an escalation task and sends a Discord/Telegram alert. For pipeline stalls, it rolls back the stuck task and reassigns it to a different worker. For bloated logs, it rotates them automatically.

What are the 21 deep code scan patterns?▼

The scanner checks AI-generated scripts for: eval()/exec() (arbitrary code execution), __import__/importlib (dynamic imports), os.system/popen/exec*/spawn* (shell injection), subprocess.Popen, ctypes (native code loading), pickle/shelve (insecure deserialization), shutil.rmtree (recursive deletion), input() (blocking I/O), direct .key/.env/token file access, and outbound HTTP calls to non-allowlisted domains.

Do I need PM2 to run this?▼

PM2 is recommended because the sentinel monitors PM2-managed processes and uses pm2 restart for auto-healing. However, the health check patterns (HTTP probes, disk monitoring, log rotation, code scanning) work independently. You could adapt it to systemd, Docker health checks, or Kubernetes liveness probes.

Can I add custom health checks?▼

Yes. Each health check is a standalone function that returns (ok, message). To add a custom check, write a function following the same pattern and add it to the shadow_sentinel_scan() main loop. The sentinel will include it in the dashboard JSON and daily digest automatically.

How does budget burn rate monitoring work?▼

The sentinel tracks API costs logged by the task pipeline. It calculates hourly and daily burn rates and projects monthly spend. If the projected spend exceeds your configured threshold, it sends an alert. This catches runaway retry loops or infinite task generation that can drain API credits in hours.

Share this tool: X / Twitter LinkedIn

More Free Tools

JSON Formatter JSON Diff Viewer CSS Grid Generator Regex Tester SQL Formatter API Response Formatter Color Palette Generator Cron Expression Generator TypeScript Playground Git Command Builder HTTP Status Codes Base64 Encode / Decode Password Generator Markdown Preview API Request Builder Regex Playground Docker Compose Generator UUID Generator Hash Generator

Mar 24, 2026 · 22 min read

How to Build a Self-Healing AI Agent System: Lessons from 70+ Production Bugs

A practical guide to building a self-healing monitoring system for multi-agent AI architectures. Covers 21 code scan patterns, 13 runtime health checks, auto-restart with cooldown, pipeline stall recovery, and budget burn rate monitoring — all born from an 11-round debugging marathon.

Mar 24, 2026 · 8 min read

Cron Expression Generator: Build and Understand Cron Schedules

Learn cron expression syntax, build schedules visually, and avoid common mistakes. Includes a cron expression generator tool and reference for every cron format.

Mar 24, 2026 · 12 min read

GitHub Actions Complete Guide: Build Your First CI/CD Pipeline in 2026

Learn GitHub Actions from scratch. Set up automated testing, linting, Docker builds, and deployments with real workflow examples. No prior CI/CD experience needed.

Get weekly developer tips

Tool guides, productivity playbooks & AI tricks. Free. No spam.

DevPlaybook Pro 7 days free

Unlock AI-Powered Dev Tools

⚡ AI Code Review, Doc Generator & SQL Builder
⚡ All premium templates & early access
⚡ Member discounts on Gumroad products

Start Free Trial →

$9/mo after trial · Cancel anytime

Want the full toolkit?

Get DevPlaybook Pro

Every template, guide, boilerplate, and automation script in one bundle. 13 premium products — grab them all at once and save big.

$79

$241 separately Save $162 (67% off)

Get DevPlaybook Pro — $79 →

MIT licensed · Instant download · No subscription

See what's included → Browse all deals →