Next step · priority 1

The worker at 187.77.29.73:8788

Single point of entry. This box holds the sales queue, runs every ClawBot playbook, and is the only thing standing between n8n crons and Airtable writes. We don’t fully own it yet — this page is the plan to change that.

What it is

FastAPI worker

Python FastAPI service on a VPS. Receives 16 cron payloads from n8n, drafts via LLM, writes back to Airtable + HubSpot. HTTP Basic auth, URL-embedded credentials.

Why it matters

Choke point

Every sales draft, every cold touch, every rebooking nudge crosses this server. If it’s down, the queue stalls. If it’s compromised, the entire pipeline is.

What we know

Almost nothing

Only the IP, the port, and the fact that ClawBot runs there. No deploy pipeline, no documented OS access, no log access, no backup story. Tribal knowledge lives with Farid.

Phase 1 · discover

Map it before touching it

Four read-only probes against 187.77.29.73:8788. Zero state changes. Goal: establish a baseline so we can detect drift later.

probe 1 Health

GET /health

Confirm the service is reachable and responds. Capture the response shape + headers. Expected: 200 with a JSON status payload. Anything else is a red flag.
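
A minimal sketch of probe 1 using only the Python standard library. The `http://` scheme is inferred from the "Plain HTTP on a public IP" note elsewhere on this page. The summary fields (`json_keys`, `red_flag`) are illustrative names, not something the worker defines. Nothing runs until `probe_health()` is called.

```python
import json
from urllib import error, request

# Address from this page; the http:// scheme is an assumption.
BASE_URL = "http://187.77.29.73:8788"


def summarize(status: int, headers: dict, body: bytes) -> dict:
    """Reduce a response to its shape: status, header names, top-level JSON keys."""
    try:
        parsed = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        parsed = None
    is_object = isinstance(parsed, dict)
    return {
        "status": status,
        "header_names": sorted(k.lower() for k in headers),
        "json_keys": sorted(parsed) if is_object else [],
        # Expected: 200 with a JSON object. Anything else is flagged.
        "red_flag": status != 200 or not is_object,
    }


def probe_health(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """Send one GET /health. Read-only: no request body, no credentials."""
    req = request.Request(base_url + "/health", method="GET")
    try:
        with request.urlopen(req, timeout=timeout) as resp:
            return summarize(resp.status, dict(resp.headers), resp.read())
    except error.HTTPError as exc:
        # Non-2xx responses still carry a status, headers, and a body.
        return summarize(exc.code, dict(exc.headers), exc.read())
```

Connection failures (`urllib.error.URLError`, timeouts) are deliberately left uncaught so that an unreachable box fails loudly instead of producing a summary. Saving the returned dict alongside a timestamp gives the baseline to compare against later.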

probe 2 Version

GET /version or /

Identify FastAPI version, Python runtime, build hash if exposed. Feeds the patch-cadence question (last deploy date, dependency drift).
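
What the version probe returns is unknown until it runs, so the sketch below only shows one way to pull version hints out of whatever comes back. The key names in `CANDIDATE_KEYS` are guesses, not fields the worker is known to expose; the `Server` header is a standard HTTP header that a server may or may not send.

```python
import json

# Hypothetical key names. Adjust once the real payload has been seen.
CANDIDATE_KEYS = ("version", "fastapi", "python", "commit", "build")


def version_hints(headers: dict, body: bytes) -> dict:
    """Collect anything in the response that looks like version information."""
    hints = {}
    lowered = {k.lower(): v for k, v in headers.items()}
    if "server" in lowered:
        hints["server"] = lowered["server"]
    try:
        payload = json.loads(body)
    except ValueError:
        return hints  # non-JSON body: header hints only
    if isinstance(payload, dict):
        for key in CANDIDATE_KEYS:
            if key in payload:
                hints[key] = payload[key]
    return hints
```

An empty result is itself a finding: it means version and build information is not exposed over HTTP and has to come from the owner interview instead.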

probe 3 Auth surface

Unauthenticated challenge

Hit a known protected route without credentials. Verify it returns 401 (not 200, not 500). Confirms Basic auth is actually enforced server-side, not just in n8n.
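
The pass/fail rule for this probe can be written down as a small classifier. This is a sketch: the label strings are made up for illustration, and the check on `WWW-Authenticate` assumes the worker sends the standard Basic challenge header with its 401, which is what the HTTP authentication spec requires but has not been verified on this box.

```python
def classify_auth_challenge(status: int, headers: dict) -> str:
    """Interpret the response to an unauthenticated request on a protected route."""
    lowered = {k.lower(): v for k, v in headers.items()}
    challenge = lowered.get("www-authenticate", "")
    if status == 401:
        if challenge.lower().startswith("basic"):
            return "enforced"
        # 401 without a Basic challenge: auth exists but is not what we expected.
        return "enforced, unexpected scheme"
    if 200 <= status < 300:
        return "NOT enforced"  # protected data served without credentials
    if status >= 500:
        return "server error"  # auth path crashes instead of rejecting
    return "inconclusive"  # e.g. 403 or 404: check the route and retry


# Example: the outcome this probe hopes to see.
result = classify_auth_challenge(401, {"WWW-Authenticate": 'Basic realm="worker"'})
```

Only "enforced" counts as a pass. A 200 is the serious finding, because it would mean the credentials in n8n are decorative.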

probe 4 Latency baseline

10× /health · timed

p50 / p95 over 10 sequential requests from a local machine. Baseline for an SLO. Later deviations from this baseline are an early sign of queue backlog, before users notice.
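
A sketch of the arithmetic behind the baseline, assuming the nearest-rank percentile definition (other definitions interpolate and give slightly different numbers). `call` stands for whatever function issues one GET /health; it is a placeholder here.

```python
import math
import time
from typing import Callable, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def time_calls(call: Callable[[], object], n: int = 10) -> List[float]:
    """Run `call` n times sequentially and return each duration in milliseconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        durations.append((time.perf_counter() - start) * 1000)
    return durations


# Example with made-up timings in milliseconds (not measurements from the worker).
sample_ms = [10, 12, 11, 13, 15, 14, 90, 12, 11, 13]
p50 = percentile(sample_ms, 50)  # 12
p95 = percentile(sample_ms, 95)  # 90
```

With only 10 samples, nearest-rank p95 is simply the slowest request, so a single network hiccup sets the number. That is acceptable for a first baseline, but an SLO should be based on a larger sample.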

Constraint: read-only. No POST, no PUT, no draft-trigger. We probe to learn the shape, not to exercise the queue. Anything beyond these four needs Farid in the loop.

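
One way to make the constraint hard to break by accident is a guard that every probe script calls before sending anything. This is a sketch: `assert_read_only` and `ALLOWED_PATHS` are hypothetical names, and the protected route used by probe 3 is not listed because this page does not name it.

```python
# Paths named on this page. The protected route for probe 3 must be added
# explicitly once it has been chosen.
ALLOWED_PATHS = {"/health", "/version", "/"}


def assert_read_only(method: str, path: str, allowed_paths=ALLOWED_PATHS) -> None:
    """Raise before any request that is not a GET to an agreed path."""
    if method.upper() != "GET":
        raise PermissionError(f"{method} is not allowed: probes are read-only")
    if path not in allowed_paths:
        raise PermissionError(f"{path} is outside the agreed probes")


assert_read_only("GET", "/health")  # passes silently
```

Failing closed on unknown paths means a new probe needs an explicit edit to the allowlist, which is a natural point to bring Farid in.
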
Phase 2 · document

Turn tribal knowledge into a runbook

Owner interview

Farid — 30 min

Capture: SSH access path, how deploys happen today, where logs live, what restarts the process, last incident + resolution. Recorded.

Runbook

docs/worker-runbook.md

New file. Sections: connect · tail logs · restart · rollback · rotate credentials · verify healthy. Tested top-to-bottom by someone who isn’t Farid.

Inventory

What runs on the box

OS, Python version, systemd units, cron jobs, exposed ports, firewall rules, disk layout. One .md page. Snapshot, not narrative.

Backup story

Recoverable in < 1h

Today: unknown. Target: code in git, env in secrets manager, infra reproducible from a script. Defines what “rebuild from scratch” looks like.

Phase 3 · harden

Improve, in order

| # | Improvement | Why | Risk if skipped |
|---|---|---|---|
| 1 | Move HTTP Basic out of n8n URLs | URL-embedded credentials leak through every log line + execution history. | Credential exposure on n8n logs · rotation forces 16 workflow edits. |
| 2 | Structured logging to a central sink | Today logs live on the box. No box access = no debugging. | Outages diagnosed by guesswork. |
| 3 | External uptime check & alert | Health Sentinel is internal cron. If the box dies, the cron dies. | Silent outages discovered by reviewers when queue stops moving. |
| 4 | HTTPS termination | Plain HTTP on a public IP. Basic auth over HTTP = credentials in clear. | Credential interception · MITM on every payload. |
| 5 | Reproducible deploy | Docker + a script, or Container Apps. Anything beats “ssh + edit + restart.” | Box is unrebuildable. One disk failure = pipeline gone. |
| 6 | Move off the IP | Bare IP + port is fragile. Domain + reverse proxy lets us swap infra without touching n8n. | Any infra change = 16 workflow edits + downtime. |

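
For improvement 1, the sketch below shows the difference between a URL with embedded credentials and the same request carrying them in an `Authorization` header. The function names are illustrative, the `user` / `pass` values are placeholders, and how n8n would store the credential is not covered here. It uses only the Python standard library.

```python
import base64
from urllib.parse import urlsplit, urlunsplit


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def strip_userinfo(url: str) -> str:
    """Return the URL without the user:password@ part, safe to log."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    netloc = f"{host}:{parts.port}" if parts.port else host
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Placeholder credentials, for illustration only.
leaky = "http://user:pass@187.77.29.73:8788/health"
clean_url = strip_userinfo(leaky)
headers = {"Authorization": basic_auth_header("user", "pass")}
```

The clean URL can appear in logs and execution history without exposing anything, and rotating the password becomes a change to one stored credential instead of 16 URLs. Base64 is an encoding, not encryption, so the header is still readable on the wire until improvement 4 (HTTPS) lands.
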
Out of scope · for now

What we’re not doing yet

Migration

No lift-and-shift

Moving the worker to ECS/Container Apps is the eventual end-state, not phase 1. Discover first, harden second, migrate third.

Refactor

No code changes

Worker source remains untouched in phases 1–2. We map and document the running system, not rewrite it.

Playbook edits

No prompt changes

Sales_Assistant base prompt + the 16 playbooks stay frozen during discovery. One variable at a time.

Approval gate

Sign-off before phase 1

Blocking: probes are read-only but still hit production. Need explicit go-ahead from Farid (owner) and Rafael (architect) before the first packet. Jess in CC for visibility on sales-queue impact (none expected).