Before we ship more bots, more playbooks or more surfaces, we agree on the foundation. One AWS account, one application core (Hub v2), one source of truth (Airtable) and one worker (ClawBot). Every future feature plugs into these four — nothing else.
If a new initiative doesn’t map cleanly onto one of these four, it belongs in a discovery doc, not in production. This is the contract Jessica and Farid asked us to draw up before we go deeper.
AWS. One account (891377221201) in us-east-2. Every workload runs on managed primitives: ECS Fargate, RDS, Secrets Manager, CloudWatch, S3, ALB, IAM, VPC. No more SSH-into-a-VPS.
Hub v2. NestJS API + Next.js web on ECS Fargate. Absorbs the 16 n8n cron callers (as @Cron decorators) and the ~8 webhook routers (as @Controller('/ingest/*') routes). One deploy pipeline, one auth model.
Airtable. Source of truth for bots, playbooks, guardrails, queue state and human approval. 5 bases, 23 SSOT tables. The migration touches infra, not data: Airtable stays the surface humans see.
ClawBot. FastAPI engine that drafts every outbound message. Today on the VPS, tomorrow on Fargate behind ALB + HTTPS. Same logic, same prompts, same playbooks; only the host changes.
If a request can’t be answered with “it’s an AWS service, a Hub v2 endpoint, an Airtable table or a ClawBot playbook” — we’re inventing a fifth pillar. That’s the moment to stop and talk.
The AWS services that make up the foundation, grouped by what they do for us. No experimental services, no exotic regions — only the boring, well-understood primitives.
flowchart TB
classDef ingress fill:#16201F,stroke:#47BFDD,stroke-width:1.5px,color:#E8FFFA;
classDef identity fill:#16201F,stroke:#AD7FE1,stroke-width:1.5px,color:#E8FFFA;
classDef compute fill:#0F1F1C,stroke:#32BDAE,stroke-width:2px,color:#A2FFE1;
classDef ai fill:#1A1620,stroke:#FAC236,stroke-width:1.5px,color:#FFE69A;
classDef data fill:#16201F,stroke:#32BDAE,stroke-width:1.5px,color:#E8FFFA;
classDef platform fill:#101817,stroke:#4E6E69,stroke-width:1px,color:#7FA8A2;
subgraph INGRESS["🌐 INGRESS"]
R53["Route 53 + ACM<br/>DNS · TLS"]
ALB["ALB<br/>HTTPS · health checks"]
end
subgraph IDENTITY["🔐 IDENTITY"]
COG["Cognito User Pool<br/>JWT · M2M client creds"]
GOO["Google Workspace IdP<br/>credentials · MFA"]
end
subgraph COMPUTE["⚙️ COMPUTE · ECS FARGATE"]
API["hv-hub-v2-api<br/>NestJS"]
WEB["hv-hub-v2-web<br/>Next.js"]
CLAW["clawbot<br/>FastAPI · F3"]
BR["Bedrock<br/>Claude Sonnet 4"]:::ai
end
subgraph DATA["💾 DATA · three engines"]
PG["RDS Postgres<br/>audit · SQL"]
DDB["DynamoDB<br/>rate · dedup"]
RED["Redis<br/>cache · sessions"]
end
subgraph PLATFORM["🛠 PLATFORM"]
SEC["Secrets Manager"]
CW["CloudWatch + OTel"]
ECR["ECR · S3"]
CP["CodePipeline · IAM · VPC"]
end
R53 --> ALB
ALB --> WEB
ALB --> API
ALB --> CLAW
GOO -.-> COG
COG -.-> API
COG -.-> WEB
API -- "SSE" --> BR
API --> PG
API --> DDB
API --> RED
CLAW --> DDB
CLAW --> RED
WEB --> RED
API -.-> SEC
CLAW -.-> SEC
API -.-> CW
CLAW -.-> CW
WEB -.-> CW
CP -.-> COMPUTE
class R53,ALB ingress;
class COG,GOO identity;
class API,WEB,CLAW compute;
class PG,DDB,RED data;
class SEC,CW,ECR,CP platform;
| Service | Why we use it | Owner pillar | Status |
|---|---|---|---|
| ECS Fargate | Runs every long-lived process (Hub v2 API, Web, ClawBot worker) without us managing servers | AWS | live |
| ECR | Stores the container images we ship · one repo per service | AWS | live |
| ALB | Public HTTPS entry point · routes to ECS services by host/path | AWS | live |
| RDS Postgres | Durable relational state · audit log, sessions, idempotency keys, DLQ tracking · complex joins, ACID transactions, SQL reporting | AWS | live |
| DynamoDB | Hot path at scale · key-value for rate-limit counters, distributed locks, ClawBot dedup · single-digit ms, serverless, on-demand capacity with no instances to size (account quotas still apply) | AWS | spec |
| ElastiCache (Redis) | In-memory cache · Hub v2 session cache, Airtable read cache, internal pub/sub, lightweight queues (BullMQ) · sub-ms latency | AWS | spec |
| Secrets Manager | Every API key, every Basic auth password, every webhook secret · no more URL-embedded creds | AWS | live |
| CloudWatch + OTel | One pane of glass for logs, metrics, traces · alarms feed on-call | AWS | live |
| S3 | Static docs site (this page), exported reports, image assets | AWS | live |
| EventBridge + SQS | Fallback scheduler if Hub v2 @Cron ever needs to live outside the API · DLQ for failed jobs | AWS | on shelf |
| Route 53 + ACM | DNS for happen.ventures and subdomains · TLS certs auto-renewed | AWS | live |
| IAM | Per-service roles · no shared keys · ECS task roles instead of long-lived credentials | AWS | tightening |
| Cognito | User pool + Google SSO federation · single sign-on for Hub v2 web/api · token issuance for ECS workloads | AWS | in progress |
| CodePipeline | One deploy pipeline per service · push to main → ECR → ECS update | AWS | in progress |
| VPC | Private subnets for compute · public only for ALB · no direct internet on workers | AWS | live |
| Bedrock | Managed LLMs · Hub v2 API uses Claude Sonnet 4 (us.anthropic.claude-sonnet-4-6) via InvokeModelWithResponseStream for Wiki Chat (RAG over Airtable docs, prompt caching) and Robin Chat (ops assistant). IAM task role scoped to bedrock:InvokeModel* on Anthropic models | AWS | live |
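The Bedrock row says the task role is scoped to `bedrock:InvokeModel*` on Anthropic models. A minimal sketch of what such a policy document could look like, built as a Python dict so it can be printed or diffed. The account ID and region come from this page; the `Sid`, the function name and the exact resource patterns are assumptions for illustration, not the policy actually deployed. The second resource entry assumes that a `us.`-prefixed model ID goes through a cross-region inference profile, which would need its own permission.

```python
import json

# From this page: the single AWS account and its region.
ACCOUNT_ID = "891377221201"
REGION = "us-east-2"


def bedrock_invoke_policy(account_id: str, region: str) -> dict:
    """Build a hypothetical IAM policy limited to invoking Anthropic models."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeAnthropicOnly",  # hypothetical statement ID
                "Effect": "Allow",
                # The wildcard covers InvokeModel and InvokeModelWithResponseStream.
                "Action": "bedrock:InvokeModel*",
                "Resource": [
                    # Foundation-model ARNs carry no account ID. The region is
                    # left open because an inference profile may route elsewhere.
                    "arn:aws:bedrock:*::foundation-model/anthropic.*",
                    # Assumption: the "us." inference profile needs its own entry.
                    f"arn:aws:bedrock:{region}:{account_id}:inference-profile/us.anthropic.*",
                ],
            }
        ],
    }


policy = bedrock_invoke_policy(ACCOUNT_ID, REGION)
print(json.dumps(policy, indent=2))
```

Anything outside the two Anthropic resource patterns stays denied by default, which is the point of the "tightening" status on the IAM row.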
We deliberately do not use Lambda, Step Functions, App Runner, EKS, Aurora or OpenSearch at the foundation level. They’re fine services — they’re just not in the contract. Adding one means a discussion, not a Jira ticket.
Each engine has a clear job. They don’t compete, they complement.
| Engine | Model | When to use it | Concrete HV example |
|---|---|---|---|
| Postgres (RDS) | Relational · ACID · SQL | Data needing joins, multi-table transactions, referential integrity, reporting | Audit log of every Hub v2 action · signed sessions · idempotency keys with expiry · ClawBot pipeline state |
| DynamoDB | Key-value / document · eventually consistent reads by default, strongly consistent on request | Key-based access, near-unlimited horizontal scale, predictable latency, no joins | Rate-limit counters per recipient · ClawBot dedup (hash → timestamp) · hot-read feature flags |
| Redis (ElastiCache) | In-memory · structures (strings, hashes, lists, sets, streams) | Cache, ephemeral sessions, pub/sub, distributed locks, lightweight queues | Airtable read cache (TTL 60s) · SSR session store · pub/sub to invalidate caches across ECS instances · worker retry queue |
Need SQL, joins or transactions? → Postgres. Need massive scale and key-based access? → DynamoDB. Need it blazing fast and OK to lose it? → Redis.
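To make the key-based pattern concrete, here is a minimal sketch of the "ClawBot dedup (hash → timestamp)" example from the table. An in-memory dict stands in for the key-value table, and the class name, the fingerprint recipe and the TTL are all assumptions for illustration; ClawBot's real dedup key is not described on this page.

```python
import hashlib
import time
from typing import Optional


class DedupStore:
    """In-memory stand-in for a table mapping message hash -> first-seen time."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._seen: dict[str, float] = {}

    @staticmethod
    def fingerprint(recipient: str, body: str) -> str:
        # Hypothetical recipe: normalise, then hash recipient + body.
        normalized = f"{recipient.strip().lower()}\n{body.strip()}"
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def first_time(self, recipient: str, body: str, now: Optional[float] = None) -> bool:
        """Return True and record the message unless it was seen within the TTL."""
        now = time.time() if now is None else now
        key = self.fingerprint(recipient, body)
        seen_at = self._seen.get(key)
        if seen_at is not None and now - seen_at < self.ttl_seconds:
            return False  # duplicate inside the window: do not send again
        self._seen[key] = now
        return True


store = DedupStore(ttl_seconds=3600)
print(store.first_time("a@example.com", "Hi", now=0.0))
print(store.first_time(" A@example.com ", "Hi ", now=10.0))
```

On DynamoDB the check-and-record step would need to be one atomic call, for instance a conditional write on `attribute_not_exists` with a TTL attribute for expiry, so that two workers cannot both see "first time".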
No more per-app passwords. Cognito is the identity layer; Google is the IdP humans actually use. Hub v2 web, Hub v2 API and any future internal surface trust the same JWT.
Next.js app redirects unauthenticated requests to the Cognito Hosted UI. No custom login form to maintain.
Cognito user pool has Google configured as an external IdP. The user signs in with their @happen.ventures account · MFA stays Google’s problem.
Cognito issues an ID + access token. Hub v2 API validates it with the Cognito JWKS · no shared secret, no session table to babysit.
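The three steps above end with the API validating the token. A hedged sketch of the claim checks that follow signature verification, written in Python for brevity (the Hub v2 guard itself would live in NestJS). It assumes a JOSE library has already verified the signature against the pool's JWKS and handed over the decoded claims. The pool ID, the app client ID and the function names are hypothetical.

```python
import time
from typing import Any, Mapping, Optional


class TokenRejected(Exception):
    """Raised when a decoded token fails a claim check."""


def cognito_issuer(region: str, user_pool_id: str) -> str:
    # Issuer format used by Cognito user pools.
    return f"https://cognito-idp.{region}.amazonaws.com/{user_pool_id}"


def jwks_url(region: str, user_pool_id: str) -> str:
    # Public signing keys for the pool; no shared secret is involved.
    return cognito_issuer(region, user_pool_id) + "/.well-known/jwks.json"


def check_access_token_claims(
    claims: Mapping[str, Any],
    *,
    region: str,
    user_pool_id: str,
    app_client_id: str,
    now: Optional[float] = None,
) -> None:
    """Validate claims of an already signature-verified Cognito access token."""
    now = time.time() if now is None else now
    if claims.get("iss") != cognito_issuer(region, user_pool_id):
        raise TokenRejected("unexpected issuer")
    if claims.get("token_use") != "access":
        raise TokenRejected("not an access token")
    # Access tokens carry client_id; ID tokens carry aud instead.
    if claims.get("client_id") != app_client_id:
        raise TokenRejected("issued to a different app client")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        raise TokenRejected("token expired")


print(jwks_url("us-east-2", "us-east-2_EXAMPLE"))
```

Because every check reads only the token and public configuration, any ECS task can validate a request without a database round trip.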
| Component | Responsibility | Owned by |
|---|---|---|
| Cognito user pool | Identity registry · token issuance · group/role claims | AWS |
| Google Workspace IdP | Credential check · MFA · account lifecycle (offboarding propagates) | Google |
| Hub v2 web (Next.js) | Hosted UI redirect · token storage in httpOnly cookies | Hub v2 |
| Hub v2 API (NestJS) | JWT validation guard · role-based access on every route | Hub v2 |
| ClawBot worker | Service-to-service: M2M token from Cognito client credentials grant | ClawBot |
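For the ClawBot row, a minimal sketch of the request a worker would send to obtain an M2M token through the OAuth 2.0 client credentials grant. The function only builds the request and sends nothing. The domain, client ID, secret and scope shown are placeholders; in this architecture the secret would be read from Secrets Manager rather than hard-coded.

```python
import base64
from urllib.parse import urlencode


def build_client_credentials_request(
    domain: str, client_id: str, client_secret: str, scope: str
) -> dict:
    """Describe the POST to the Cognito /oauth2/token endpoint (not sent here)."""
    basic = base64.b64encode(
        f"{client_id}:{client_secret}".encode("utf-8")
    ).decode("ascii")
    return {
        "method": "POST",
        "url": f"https://{domain}/oauth2/token",
        "headers": {
            # Client authenticates with HTTP Basic: base64(client_id:client_secret).
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        "body": urlencode({"grant_type": "client_credentials", "scope": scope}),
    }


# Placeholder values, for illustration only.
request = build_client_credentials_request(
    "auth.example.com", "clawbot-client", "not-a-real-secret", "hub/ingest"
)
print(request["url"])
print(request["body"])
```

The response carries an access token that the Hub v2 API can validate with the same JWKS check it applies to human users, so there is no second auth path to maintain.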
Cognito is already in the AWS bill, sits in the same account and IAM posture as everything else, and federates to Google in one config. We avoid one more SaaS contract and one more secret to rotate. If we ever outgrow it, the JWT contract is portable.
Every new bot, playbook or surface follows the same path: ECS deploy · secret in Secrets Manager · entry in Airtable. No bespoke infra per feature.
The same diagram explains every workload. New hires onboard in days. Reviews focus on business logic, not infra plumbing.
Multi-AZ Fargate · managed RDS backups · CloudWatch alarms · rotation policies. The on-call playbook becomes “check the dashboard”, not “SSH and tail”.
Everything tagged by service. Cost attribution per pillar. No more “what is this $40/mo on the VPS host?”.
SOC 2 / GDPR conversations get a real answer: AWS-hosted, IAM-scoped, encrypted at rest and in transit, audit log in RDS Postgres, logs and traces in CloudWatch.
Airtable interfaces don’t change. The human review experience stays identical — only the engine behind it gets stronger.
Secrets and observability come first because they make every later step measurable. Compute before scheduler because the scheduler needs a stable target. Hardening last because you can’t harden what isn’t running yet.
Three calls to lock before F3 starts. None of them are technical — they’re scope and ownership.
No more side-deploys to Render, Railway, Heroku or a friend’s VPS. If it needs to run, it runs on ECS Fargate in us-east-2.
Cron logic lives in NestJS @Cron, not in n8n. EventBridge is the fallback we reach for only when the scheduler must outlive the API.
RDS handles operational state (sessions, idempotency, audit). It does not replace Airtable. Business state — playbooks, bots, approvals — stays where Robin can see it.