06 · Risks

What can break

Eight identified risks, assessed by blast radius. Severity is the user-visible impact if the risk fires; likelihood reflects current exposure, not the post-mitigation state.

ID | Risk | Severity | Likelihood | Mitigation phase
R-01 | VPS 187.77.29.73 single point of failure | Critical | Medium | F3
R-02 | 99 duplicate rows in task_queue_sales (no idempotency) | High | High | F1
R-03 | Status SSOT desync · field editable by anyone | Medium | Medium | F3
R-04 | Secret sprawl · same password in 16 n8n URLs | High | Medium | F2
R-05 | Guardrail gaps · 48 rules but no test coverage | Medium | Medium | F2
R-06 | HTTP plaintext between n8n and VPS | Medium | Always | F3
R-07 | No observability · SSH + tail -f is the dashboard | Medium | Always | F2
R-08 | 2 COLD playbooks have empty Trigger + Approval fields | Medium | Confirmed | F1
Top three · detail

Where to look first

R-01 · VPS SPOF

What breaks: if ClawBot dies, all 16 playbooks stop, no drafts are written, and the review queue runs dry within hours.
Why it’s live: the worker runs on a single Hetzner-class VPS (187.77.29.73:8788) with no failover, no autoscaling, and no health probing beyond ad-hoc curl checks.
Mitigation: F3 moves compute to ECS Fargate behind an ALB. Until then, the only fallback is a manual restart by Farid.
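Until F3 lands, a small external watchdog would at least turn a silent outage into an alert. A minimal sketch in Python, assuming the worker exposes some HTTP endpoint that returns 2xx when healthy; the /health path, the three-failure threshold, the 30-second interval and the print-based alert are all hypothetical placeholders. It would need to run somewhere other than the VPS itself, or it dies with it.

```python
import http.client
import time
import urllib.request

# Hypothetical endpoint: the source gives only host and port; the /health path is assumed.
HEALTH_URL = "http://187.77.29.73:8788/health"


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True only if the endpoint answers with HTTP 2xx within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (non-2xx), timeouts and refused connections all land here.
        return False


class FailureTracker:
    """Counts consecutive failed probes and signals once when the threshold is reached."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.consecutive = 0

    def record(self, healthy: bool) -> bool:
        self.consecutive = 0 if healthy else self.consecutive + 1
        # Fire exactly once per outage, not on every failed probe past the threshold.
        return self.consecutive == self.threshold


def watch(url: str = HEALTH_URL, interval: float = 30.0, threshold: int = 3) -> None:
    """Probe forever; the print call is a placeholder for a real paging channel."""
    tracker = FailureTracker(threshold)
    while True:
        if tracker.record(probe(url)):
            print(f"ALERT: {url} failed {threshold} consecutive probes; restart needed")
        time.sleep(interval)
```

Requiring several consecutive failures before alerting trades a little detection latency for far fewer false pages from a single slow response.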

R-02 · 99 duplicates in task_queue_sales

What breaks: reviewers see the same lead two or three times, and a blind approval risks a real duplicate send. Audit reports double-count.
Why it’s live: no idempotency key on insert. A retried cron, a manual re-run, or two playbooks targeting the same contact each produce a fresh row.
Mitigation: F1 hotfix: a composite key on (recipient, playbook_id, send_date), enforced on the Airtable side or pre-checked by the worker.
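The worker-side pre-check could look roughly like this. A sketch in Python, assuming rows are dicts with recipient, playbook_id and send_date fields; the id and idempotency_key field names, the insert callable (standing in for the Airtable write) and the email-style normalization of recipients are all assumptions. The same key function also identifies which of the existing rows are redundant.

```python
import hashlib
from datetime import date
from typing import Callable, Dict, Iterable, List, Set


def idempotency_key(recipient: str, playbook_id: str, send_date: date) -> str:
    """Stable key for the (recipient, playbook_id, send_date) composite.

    Assumption: recipients are email addresses, so whitespace and case are normalized
    to stop "Lead@x.com" and "lead@x.com " producing two rows.
    """
    raw = "|".join((recipient.strip().lower(), playbook_id, send_date.isoformat()))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def insert_if_new(row: Dict, known_keys: Set[str], insert: Callable[[Dict], None]) -> bool:
    """Write the row only if its key has not been seen yet.

    `insert` is a hypothetical stand-in for the Airtable write; `known_keys` would be
    loaded from the idempotency_key values already stored in task_queue_sales.
    """
    key = idempotency_key(row["recipient"], row["playbook_id"], row["send_date"])
    if key in known_keys:
        return False
    insert({**row, "idempotency_key": key})
    known_keys.add(key)
    return True


def duplicate_ids(rows: Iterable[Dict]) -> List[str]:
    """For the existing backlog: ids of every row after the first in each key group."""
    first_seen: Dict[str, str] = {}
    dupes: List[str] = []
    for row in rows:
        key = idempotency_key(row["recipient"], row["playbook_id"], row["send_date"])
        if key in first_seen:
            dupes.append(row["id"])
        else:
            first_seen[key] = row["id"]
    return dupes
```

Two caveats. A pre-check in the worker is not race-safe if two runs insert concurrently, which is why enforcement on the storage side is the stronger option. And because playbook_id is part of the key, two different playbooks targeting the same contact on the same day still yield two rows; catching that case needs an additional (recipient, send_date) check.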

R-04 · Secret sprawl

What breaks: rotating the credential means editing 16 n8n workflow JSONs, and a leaked token can’t be revoked without taking every workflow down.
Why it’s live: the URL-embedded HTTP Basic pattern (http://clawbot:TOKEN@host:8788) bypasses the n8n credential store and predates Secrets Manager adoption.
Mitigation: F2 routes auth through Secrets Manager with IAM-bound retrieval. n8n workflows reference an abstracted credential instead of an inline URL.
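Two small pieces could help stage F2: an inventory of which workflow fields still embed credentials, and a helper that builds the Basic header from a secret fetched at call time. A sketch in Python, assuming workflows are exported as JSON; all function names are hypothetical, and load_secret reads an environment variable purely as a stand-in for Secrets Manager (with boto3, the equivalent is the secretsmanager client's get_secret_value(SecretId=...), which returns the value under SecretString).

```python
import base64
import os
import re
from typing import Any, Iterator, Tuple
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s\"']+")


def find_inline_credentials(node: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Walk a parsed n8n workflow and yield (json_path, redacted_url) for every URL
    that carries a user:password pair in its authority section."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_inline_credentials(value, f"{path}.{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from find_inline_credentials(value, f"{path}[{i}]")
    elif isinstance(node, str):
        for match in URL_RE.finditer(node):
            parts = urlsplit(match.group(0))
            if parts.password:
                host = parts.netloc.rsplit("@", 1)[1]  # keep host:port, drop the secret
                yield path, f"{parts.scheme}://{parts.username}:***@{host}{parts.path}"


def load_secret(name: str) -> str:
    """Placeholder for Secrets Manager retrieval: reads an environment variable instead."""
    value = os.environ.get(name)
    if not value:
        raise KeyError(f"secret {name!r} is not set")
    return value


def basic_auth_header(username: str, secret: str) -> dict:
    """Build the HTTP Basic header from a secret fetched at runtime, not baked into a URL."""
    token = base64.b64encode(f"{username}:{secret}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Running the scanner over all 16 exported workflows gives the rotation checklist, and the redacted output can be shared without re-leaking the token. Note that Basic auth over the plaintext HTTP link (R-06) still exposes the secret in transit; moving it into a header removes the sprawl, not that exposure.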