Two workflows were pulled live from the n8n Cloud API using the production tokens in .env.deploy: BO-26 Cascade Detector (the brain that decides what to send) and Send Approved Tasks (the hand that delivers via Gmail). Together they generate 100% of outbound revenue. This page walks through them in plain language, names the pain, and proposes the AWS target: one bounded context, one optimized database, no shared state.
Workflows fetched live via GET /api/v1/workflows/{id} using the API key in .env.deploy. JSON dumps saved to n8n-backup/workflows/.
BO-26 creates 100% of the task_queue_sales rows. Send Approved Tasks is the only path to a customer inbox. Every other workflow is a feeder or reporter on top of these.
For each workflow: the plain-language flow, what hurts in production today, and the AWS service for each concern, with one DB per bounded context (DDD).
Every hour BO-26 scans HubSpot deals, classifies them across a 7-stage cascade (Cold → Warm → Hot → Negotiation → Won → Stale → Lost), compares the result against the previous snapshot in Airtable, and writes only the diffs as new task_queue_sales rows ready for human approval.
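The "write only the diffs" step can be sketched as a pure function. This is a minimal illustration, not the code in the n8n node: the `Classified` and `TaskDraft` shapes and the name `diffAgainstSnapshot` are hypothetical, and treating a deal with no previous snapshot as a diff is an assumption.

```typescript
// Stage names come from the cascade described above.
type Stage = "Cold" | "Warm" | "Hot" | "Negotiation" | "Won" | "Stale" | "Lost";

// Hypothetical shape of one classified deal in the current run.
interface Classified {
  dealId: string;
  stage: Stage;
}

// Hypothetical shape of a row destined for task_queue_sales.
interface TaskDraft {
  dealId: string;
  fromStage: Stage | null; // null when the deal has no previous snapshot
  toStage: Stage;
}

// Emit a draft only for deals whose stage differs from the previous snapshot.
function diffAgainstSnapshot(
  previous: Map<string, Stage>,
  current: Classified[],
): TaskDraft[] {
  const drafts: TaskDraft[] = [];
  for (const deal of current) {
    const before = previous.get(deal.dealId) ?? null;
    if (before !== deal.stage) {
      drafts.push({ dealId: deal.dealId, fromStage: before, toStage: deal.stage });
    }
  }
  return drafts;
}

const previous = new Map<string, Stage>([
  ["d1", "Cold"],
  ["d2", "Warm"],
]);
const current: Classified[] = [
  { dealId: "d1", stage: "Cold" }, // unchanged, no task
  { dealId: "d2", stage: "Hot" }, // moved Warm -> Hot
  { dealId: "d3", stage: "Cold" }, // first time seen
];
const drafts = diffAgainstSnapshot(previous, current);
console.log(drafts); // two drafts: d2 (Warm -> Hot) and d3 (null -> Cold)
```

Keeping the diff free of I/O is what makes it possible to replay a single deal locally.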
Snapshot table has ~40k rows and grows. Every hour the workflow does N reads + N writes against Airtable's 5 req/s limit. Splits, waits, and 429-retry loops eat 4-7 minutes per run. No index, no transaction, no rollback if the run crashes halfway.
The 7-stage classifier is ~180 lines of JS embedded in one node. No tests, no version control beyond the workflow JSON, no way to replay a single deal locally. A cascade rule change = edit-in-browser, hope, ship.
Every 5 minutes Send Approved Tasks pulls approved rows from task_queue_sales, claims each one through an external lock service, routes by sales rep to one of 5 Gmail Service Account credentials, sends, logs the engagement back to HubSpot, and updates the cadence.
Token ySkkUS1f1ZcW… sits in 3 HTTP nodes of this workflow and in 16 sibling workflows. No vault, no rotation. The proxy autotask.1mr.llc is a single Node process on a single VPS — if it dies, every outbound send halts and no one is paged.
A Switch node routes to Jessica / Ivan / Mario / Milos / Daniel Gmail SAs. Daily quota is read from Airtable on each iteration with no atomic decrement — two concurrent claims for the same rep can both pass the 50/day check and overshoot. Discovered after Jessica hit 58 sends one Tuesday.
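The overshoot is a check-then-act race. The sketch below reproduces it in memory with an explicit interleaving and contrasts it with a counter whose check and increment happen in one step. The class names are hypothetical, and in production the atomicity has to come from the datastore, not from application code.

```typescript
const DAILY_LIMIT = 50; // the 50/day check mentioned above

// Check-then-act: the counter is read, compared, and written in separate steps.
class ReadThenWriteQuota {
  private used = 49; // one send left for today
  read(): number {
    return this.used;
  }
  write(value: number): void {
    this.used = value;
  }
}

// Atomic: the comparison and the increment are a single indivisible step.
class AtomicQuota {
  private used = 49;
  tryConsume(): boolean {
    if (this.used >= DAILY_LIMIT) return false;
    this.used += 1;
    return true;
  }
}

// Interleaving: worker A reads, worker B reads, then both act on stale values.
const naive = new ReadThenWriteQuota();
const seenByA = naive.read();
const seenByB = naive.read();
let naiveSends = 0;
if (seenByA < DAILY_LIMIT) {
  naive.write(seenByA + 1);
  naiveSends += 1;
}
if (seenByB < DAILY_LIMIT) {
  naive.write(seenByB + 1);
  naiveSends += 1;
}

const atomic = new AtomicQuota();
const atomicSends = [atomic.tryConsume(), atomic.tryConsume()].filter(Boolean).length;

// The naive path sends twice while its counter still reads 50; the atomic path sends once.
console.log({ naiveSends, naiveCounter: naive.read(), atomicSends });
```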
The two workflows become two services with disjoint data ownership. No shared schemas, no chatty calls between them: they communicate only via an EventBridge bus, and only when a domain event is meaningful to the other service (e.g. TaskApproved, EmailSent).
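A possible shape for those domain events is sketched below. Only the event names TaskApproved and EmailSent come from this page; the payload fields, the bus name `outbound-bus`, and the helper `toEntry` are assumptions. The entry fields (`EventBusName`, `Source`, `DetailType`, `Detail`) follow the EventBridge PutEvents request format.

```typescript
// Hypothetical payloads; only the event names appear in the design above.
interface TaskApproved {
  type: "TaskApproved";
  taskId: string;
  repId: string;
  approvedAt: string; // ISO 8601
}
interface EmailSent {
  type: "EmailSent";
  taskId: string;
  repId: string;
  sentAt: string; // ISO 8601
}
type OutboundEvent = TaskApproved | EmailSent;

// Mirrors the fields of one PutEvents request entry.
interface PutEventsEntry {
  EventBusName: string;
  Source: string;
  DetailType: string;
  Detail: string; // JSON-encoded payload
}

const BUS_NAME = "outbound-bus"; // assumed bus name

// Build the bus entry; the publishing service passes its own name as Source.
function toEntry(evt: OutboundEvent, source: string): PutEventsEntry {
  return {
    EventBusName: BUS_NAME,
    Source: source,
    DetailType: evt.type,
    Detail: JSON.stringify(evt),
  };
}

const entry = toEntry(
  { type: "EmailSent", taskId: "t-1", repId: "jessica", sentAt: "2024-01-01T00:00:00Z" },
  "outbound-sender",
);
console.log(entry.DetailType, entry.Detail);
```

Typing the events as a discriminated union keeps the contract between the two services explicit without sharing a database schema.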
Trigger: EventBridge Scheduler · rate(1 hour)
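Compute: Step Functions · Map state over deal pages · Lambda per page. Express workflows stop at 5 minutes per execution, so a scan that can run longer, or an ECS task for any page that exceeds Lambda's 15-minute limit, needs a Standard workflow.
Snapshots: DynamoDB deal_snapshots · PK deal_id · SK snapshot_ts · TTL 90d · single-digit-ms diff lookups, no 5 req/s ceiling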
Tasks emitted: RDS Aurora Postgres Serverless v2 task_queue_sales · relational, audit-friendly, the human approval UI talks SQL
Observability: every classification → Kinesis Firehose → S3 → Athena. Replay any deal by re-running one Lambda against the snapshot.
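The snapshot key design above can be sketched as two small helpers: one that builds an item with a 90-day expiry, and one that builds the query for a deal's most recent snapshot. The attribute names `stage` and `expires_at` and both function names are hypothetical. The query object follows the DynamoDB Query parameters in document-client form (plain JavaScript values). DynamoDB TTL expects the expiry attribute as a number in epoch seconds.

```typescript
const TTL_DAYS = 90;

interface SnapshotItem {
  deal_id: string; // partition key
  snapshot_ts: string; // sort key, ISO 8601 so it sorts chronologically
  stage: string;
  expires_at: number; // epoch seconds, read by DynamoDB TTL
}

function toSnapshotItem(dealId: string, stage: string, takenAt: Date): SnapshotItem {
  const takenAtSeconds = Math.floor(takenAt.getTime() / 1000);
  return {
    deal_id: dealId,
    snapshot_ts: takenAt.toISOString(),
    stage,
    expires_at: takenAtSeconds + TTL_DAYS * 24 * 60 * 60,
  };
}

// Newest snapshot first, one item: the "previous state" for the diff.
function latestSnapshotQuery(dealId: string) {
  return {
    TableName: "deal_snapshots",
    KeyConditionExpression: "deal_id = :d",
    ExpressionAttributeValues: { ":d": dealId },
    ScanIndexForward: false, // descending by sort key
    Limit: 1,
  };
}

const snapshotItem = toSnapshotItem("d2", "Hot", new Date("2024-01-01T00:00:00Z"));
console.log(snapshotItem, latestSnapshotQuery("d2"));
```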
Trigger: EventBridge Scheduler · rate(5 minutes) → enqueue approved tasks into SQS FIFO with MessageGroupId = from_inbox (per-rep ordering; duplicates dropped within SQS's 5-minute deduplication window)
Workers: ECS Fargate · NestJS · auto-scale on queue depth
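Lock: DynamoDB task_claims · conditional write on task_id with a 5-minute expiry checked in the condition itself (DynamoDB TTL deletes expired items in the background, not at the exact expiry time, so it serves only as cleanup) → kills the external claim proxy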
Quota: DynamoDB counter per (rep_id, date) with atomic ADD + condition < daily_limit → fixes the race
Cadences + suppressions + send_log: RDS Postgres (same instance as service 1, logically separated by its own schema and owner)
Credentials: Secrets Manager · 5 rep SA JSONs + HubSpot token, rotated by a Lambda on schedule
Send: Gmail API now (per-rep SA) → SES later if we move off Workspace. HubSpot engagement logged via EventBridge fan-out.
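The lock and quota lines above both rest on DynamoDB condition expressions. The sketch below builds the two request parameter objects in document-client form; it does not call AWS. The table name `send_quota`, the attribute names `worker_id`, `expires_at`, and `sent_count`, and both function names are assumptions. The lock condition compares the stored expiry against the current time because TTL deletion is not immediate.

```typescript
const LOCK_TTL_SECONDS = 5 * 60;

// PutItem parameters: succeeds only if nobody holds the claim or the old claim has expired.
function claimPutInput(taskId: string, workerId: string, nowSeconds: number) {
  return {
    TableName: "task_claims",
    Item: {
      task_id: taskId,
      worker_id: workerId,
      expires_at: nowSeconds + LOCK_TTL_SECONDS,
    },
    ConditionExpression: "attribute_not_exists(task_id) OR expires_at < :now",
    ExpressionAttributeValues: { ":now": nowSeconds },
  };
}

// UpdateItem parameters: increments the per-rep daily counter only while it is below the limit.
// ADD treats a missing attribute as 0, so the first send of the day also passes.
function quotaUpdateInput(repId: string, isoDate: string, dailyLimit: number) {
  return {
    TableName: "send_quota", // assumed table name
    Key: { rep_id: repId, date: isoDate },
    UpdateExpression: "ADD #sent :one",
    ConditionExpression: "attribute_not_exists(#sent) OR #sent < :limit",
    ExpressionAttributeNames: { "#sent": "sent_count" },
    ExpressionAttributeValues: { ":one": 1, ":limit": dailyLimit },
  };
}

const claimInput = claimPutInput("t-1", "worker-a", 1_700_000_000);
const quotaInput = quotaUpdateInput("jessica", "2024-01-02", 50);
console.log(claimInput, quotaInput);
```

When either condition fails, DynamoDB rejects the write with a ConditionalCheckFailedException, which the worker can treat as "already claimed" or "quota exhausted" and skip the send.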
Detector and sender today share task_queue_sales as a god-table and break each other constantly: a schema change for cascade reasons forces a redeploy of the sender. Splitting ownership — detector writes, sender reads only via an event or a narrow read API — means the two services can iterate independently and each picks the database its workload actually wants (KV for snapshots/locks/counters, SQL for relational tasks and cadences).
Stand up outbound-sender alongside n8n. Read the same task_queue_sales, and claim 10% of approved rows through the new Dynamo lock. Compare send rate, bounce rate, and log fidelity against n8n for a week.
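One way to pick the 10% is deterministic bucketing on the task id, so a given row always lands on the same side across polling cycles. This is a sketch under that assumption; the page does not say how the split is made. The hash is 32-bit FNV-1a applied to UTF-16 code units, which matches the byte-wise algorithm for ASCII ids. The function names and sample ids are hypothetical.

```typescript
// 32-bit FNV-1a over the string's code units.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force unsigned
}

// True when the task falls into the canary share (percent is 0..100).
function inCanary(taskId: string, percent: number): boolean {
  return fnv1a(taskId) % 100 < percent;
}

const sampleIds = ["task-1001", "task-1002", "task-1003"]; // made-up ids
const routed = sampleIds.map((id) => ({
  id,
  target: inCanary(id, 10) ? "outbound-sender" : "n8n",
}));
console.log(routed);
```

Raising the share later only means changing the percent: every task already in the canary stays in it, because its bucket number does not change.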
Route 100% through outbound-sender. Migrate cadences + suppressions from Airtable into RDS. Delete autotask.1mr.llc/claim and remove the bearer from all 16 n8n workflows.
Port the 7-stage classifier into a TS package with unit tests. Run Step Functions in parallel with n8n BO-26 and diff the emitted tasks for 5 runs. Cut over once diff = 0.
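The "diff = 0" cutover check can be sketched as a set comparison between the tasks each system emitted for the same run. The `EmittedTask` shape, the choice of deal id plus target stage as the comparison key, and the function names are assumptions. Because it compares sets, this sketch would not notice one side emitting the same task twice.

```typescript
// Hypothetical minimal view of an emitted task.
interface EmittedTask {
  dealId: string;
  toStage: string;
}

const keyOf = (task: EmittedTask): string => `${task.dealId}|${task.toStage}`;

// Compare what n8n emitted with what the new pipeline emitted for one run.
function parityDiff(legacy: EmittedTask[], candidate: EmittedTask[]) {
  const legacyKeys = new Set(legacy.map(keyOf));
  const candidateKeys = new Set(candidate.map(keyOf));
  const onlyLegacy = Array.from(legacyKeys).filter((k) => !candidateKeys.has(k)).sort();
  const onlyCandidate = Array.from(candidateKeys).filter((k) => !legacyKeys.has(k)).sort();
  return {
    onlyLegacy,
    onlyCandidate,
    clean: onlyLegacy.length === 0 && onlyCandidate.length === 0,
  };
}

const n8nRun: EmittedTask[] = [
  { dealId: "d2", toStage: "Hot" },
  { dealId: "d3", toStage: "Cold" },
];
const stepFunctionsRun: EmittedTask[] = [
  { dealId: "d2", toStage: "Hot" },
  { dealId: "d3", toStage: "Warm" }, // disagreement on d3
];
const parity = parityDiff(n8nRun, stepFunctionsRun);
console.log(parity); // clean is false: each side has one task the other lacks
```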
Move the approval UI off Airtable Interfaces into Hub v2 (NestJS + Next.js already deployed to ECS hv-hub-v2-development). Airtable becomes read-only reporting only.
| n8n concern today | AWS primitive | Why |
|---|---|---|
| scheduleTrigger | EventBridge Scheduler | Cron with at-least-once, retries, DLQ. Managed. |
| Code node orchestration | Step Functions Express | Visual workflow, parallel Map; cheap for runs within the 5-minute Express limit. |
| httpRequest + parse | Lambda (or ECS for >15 min) | Native HTTP, packaged TS, version-controlled. |
| Airtable snapshots | DynamoDB | KV with TTL, no 5 req/s cap, single-digit-ms reads. |
| task_queue_sales | RDS Aurora Postgres | Relational, transactions, joins for the UI. |
| autotask claim proxy | DynamoDB conditional write + TTL | Distributed lock without a server. |
| Daily quota counter | DynamoDB atomic ADD | Race-free increment with condition expression. |
| 5-credential Gmail switch | Secrets Manager + per-rep secret | Rotation, audit, IAM-scoped. |
| splitInBatches + wait | SQS FIFO (MessageGroupId) | Per-rep ordering, dedupe, backpressure. |
| HubSpot engagement log | EventBridge bus | Fan-out, decouple sender from logger. |
| n8n execution history | CloudWatch + X-Ray | Structured logs, trace per task_id end-to-end. |
| Replay / audit | Kinesis Firehose → S3 → Athena | SQL over every classification decision. |
Not covered here: the ~8 webhook-driven Operational Tasks routers and the misc integration glue (HubSpot↔Airtable sync, WhatsApp). Those stay in n8n until the two revenue-critical workflows are off it. See Roadmap phases F5+.