Foundation · executive view · 2026-05-25

The backbone everything else sits on.

Before we ship more bots, more playbooks or more surfaces, we agree on the foundation. One AWS account, one application core (Hub v2), one source of truth (Airtable) and one worker (ClawBot). Every future feature plugs into these four — nothing else.

1 AWS account
1 App core · Hub v2
1 SSOT · Airtable
1 Worker · ClawBot
0 SPOF VPS
The pillars

Four pieces. No exceptions.

If a new initiative doesn’t map cleanly onto one of these four, it belongs in a discovery doc, not in production. This is the contract Jessica and Farid asked us to draw up before we go deeper.

Pillar 01

AWS · the platform

One account (891377221201) in us-east-2. Every workload runs on managed primitives: ECS Fargate, RDS, Secrets Manager, CloudWatch, S3, ALB, IAM, VPC. No more SSH-into-a-VPS.

live · Hub v2 cluster running

Pillar 02

Hub v2 · the app core

NestJS API + Next.js web on ECS Fargate. Absorbs the 16 n8n cron callers (@Cron decorators) and the ~8 webhook routers (@Controller('/ingest/*')). One deploy pipeline, one auth model.

live · hv-hub-v2-{api,web}

Pillar 03

Airtable · the SSOT

Source of truth for bots, playbooks, guardrails, queue state and human approval. 5 bases, 23 SSOT tables. The migration touches infra, not data — Airtable stays the surface humans see.

unchanged · appUDQ65M1lSnSM5p +4

Pillar 04

ClawBot · the worker

FastAPI engine that drafts every outbound message. Today on the VPS, tomorrow on Fargate behind ALB + HTTPS. Same logic, same prompts, same playbooks — different host.

migrating · VPS → ECS Fargate · F3

Rule of thumb

If a request can’t be answered with “it’s an AWS service, a Hub v2 endpoint, an Airtable table or a ClawBot playbook” — we’re inventing a fifth pillar. That’s the moment to stop and talk.

AWS stack

What we’re standing on

The AWS services that make up the foundation, grouped by what they do for us. No experimental services, no exotic regions — only the boring, well-understood primitives.

flowchart TB
  classDef ingress fill:#16201F,stroke:#47BFDD,stroke-width:1.5px,color:#E8FFFA;
  classDef identity fill:#16201F,stroke:#AD7FE1,stroke-width:1.5px,color:#E8FFFA;
  classDef compute fill:#0F1F1C,stroke:#32BDAE,stroke-width:2px,color:#A2FFE1;
  classDef ai fill:#1A1620,stroke:#FAC236,stroke-width:1.5px,color:#FFE69A;
  classDef data fill:#16201F,stroke:#32BDAE,stroke-width:1.5px,color:#E8FFFA;
  classDef platform fill:#101817,stroke:#4E6E69,stroke-width:1px,color:#7FA8A2;

  subgraph INGRESS["🌐 INGRESS"]
    R53["Route 53 + ACM<br/>DNS · TLS"]
    ALB["ALB<br/>HTTPS · health checks"]
  end

  subgraph IDENTITY["🔐 IDENTITY"]
    COG["Cognito User Pool<br/>JWT · M2M client creds"]
    GOO["Google Workspace IdP<br/>credentials · MFA"]
  end

  subgraph COMPUTE["⚙️ COMPUTE · ECS FARGATE"]
    API["hv-hub-v2-api<br/>NestJS"]
    WEB["hv-hub-v2-web<br/>Next.js"]
    CLAW["clawbot<br/>FastAPI · F3"]
    BR["Bedrock<br/>Claude Sonnet 4"]:::ai
  end

  subgraph DATA["💾 DATA · three engines"]
    PG["RDS Postgres<br/>audit · SQL"]
    DDB["DynamoDB<br/>rate · dedup"]
    RED["Redis<br/>cache · sessions"]
  end

  subgraph PLATFORM["🛠 PLATFORM"]
    SEC["Secrets Manager"]
    CW["CloudWatch + OTel"]
    ECR["ECR · S3"]
    CP["CodePipeline · IAM · VPC"]
  end

  R53 --> ALB
  ALB --> WEB
  ALB --> API
  ALB --> CLAW
  GOO -.-> COG
  COG -.-> API
  COG -.-> WEB
  API -- "SSE" --> BR
  API --> PG
  API --> DDB
  API --> RED
  CLAW --> DDB
  CLAW --> RED
  WEB --> RED
  API -.-> SEC
  CLAW -.-> SEC
  API -.-> CW
  CLAW -.-> CW
  WEB -.-> CW
  CP -.-> COMPUTE

  class R53,ALB ingress;
  class COG,GOO identity;
  class API,WEB,CLAW compute;
  class PG,DDB,RED data;
  class SEC,CW,ECR,CP platform;

AWS services · the short list

Each service, in one line

| Service | Why we use it | Owner pillar | Status |
| --- | --- | --- | --- |
| ECS Fargate | Runs every long-lived process (Hub v2 API, Web, ClawBot worker) without us managing servers | AWS | live |
| ECR | Stores the container images we ship · one repo per service | AWS | live |
| ALB | Public HTTPS entry point · routes to ECS services by host/path | AWS | live |
| RDS Postgres | Durable relational state · audit log, sessions, idempotency keys, DLQ tracking · complex joins, ACID transactions, SQL reporting | AWS | live |
| DynamoDB | Hot path at scale · key-value for rate-limit counters, distributed locks, ClawBot dedup · single-digit ms, serverless, no throughput cap | AWS | spec |
| ElastiCache (Redis) | In-memory cache · Hub v2 session cache, Airtable read cache, internal pub/sub, lightweight queues (BullMQ) · sub-ms latency | AWS | spec |
| Secrets Manager | Every API key, every Basic auth password, every webhook secret · no more URL-embedded creds | AWS | live |
| CloudWatch + OTel | One pane of glass for logs, metrics, traces · alarms feed on-call | AWS | live |
| S3 | Static docs site (this page), exported reports, image assets | AWS | live |
| EventBridge + SQS | Fallback scheduler if Hub v2 @Cron ever needs to live outside the API · DLQ for failed jobs | AWS | on shelf |
| Route 53 + ACM | DNS for happen.ventures and subdomains · TLS certs auto-renewed | AWS | live |
| IAM | Per-service roles · no shared keys · ECS task roles instead of long-lived credentials | AWS | tightening |
| Cognito | User pool + Google SSO federation · single sign-on for Hub v2 web/api · token issuance for ECS workloads | AWS | in progress |
| CodePipeline | One deploy pipeline per service · push to main → ECR → ECS update | AWS | in progress |
| VPC | Private subnets for compute · public only for ALB · no direct internet on workers | AWS | live |
| Bedrock | Managed LLMs · Hub v2 API uses Claude Sonnet 4 (us.anthropic.claude-sonnet-4-6) via InvokeModelWithResponseStream for Wiki Chat (RAG over Airtable docs, prompt caching) and Robin Chat (ops assistant). IAM task role scoped to bedrock:InvokeModel* on Anthropic models | AWS | live |

Why this list and not more

We deliberately do not use Lambda, Step Functions, App Runner, EKS, Aurora or OpenSearch at the foundation level. They’re fine services — they’re just not in the contract. Adding one means a discussion, not a Jira ticket.
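The "discussion, not a Jira ticket" rule could be backed by a simple allowlist check, for example in a review script. The sketch below is an assumption about how that might look, not an existing tool: `FOUNDATION_SERVICES` is copied from the short list above, and `outside_contract` and the sample proposal are hypothetical.

```python
# Services named in the foundation contract (copied from the short list above).
FOUNDATION_SERVICES = frozenset({
    "ECS Fargate", "ECR", "ALB", "RDS Postgres", "DynamoDB",
    "ElastiCache (Redis)", "Secrets Manager", "CloudWatch", "S3",
    "EventBridge", "SQS", "Route 53", "ACM", "IAM", "Cognito",
    "CodePipeline", "VPC", "Bedrock",
})


def outside_contract(proposed_services):
    """Return the proposed services that are not in the contract, sorted."""
    return sorted(set(proposed_services) - FOUNDATION_SERVICES)


# Hypothetical proposal: two services are in the contract, two are not.
needs_discussion = outside_contract(
    ["ECS Fargate", "Lambda", "S3", "Step Functions"]
)
print(needs_discussion)  # ['Lambda', 'Step Functions']
```

An empty result means the proposal stays inside the contract; anything else is the cue for a conversation before work starts.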

Data · three engines, three jobs

Postgres, DynamoDB, Redis — no overlap

Each engine has a clear job. They don’t compete, they complement.

| Engine | Model | When to use it | Concrete HV example |
| --- | --- | --- | --- |
| Postgres (RDS) | Relational · ACID · SQL | Data needing joins, multi-table transactions, referential integrity, reporting | Audit log of every Hub v2 action · signed sessions · idempotency keys with expiry · ClawBot pipeline state |
| DynamoDB | Key-value / document · optional eventual consistency | Key-based access, infinite horizontal scale, predictable latency, no joins | Rate-limit counters per recipient · ClawBot dedup (hash → timestamp) · hot-read feature flags |
| Redis (ElastiCache) | In-memory · structures (strings, hashes, lists, sets, streams) | Cache, ephemeral sessions, pub/sub, distributed locks, lightweight queues | Airtable read cache (TTL 60s) · SSR session store · pub/sub to invalidate caches across ECS instances · worker retry queue |
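To make the "ClawBot dedup (hash → timestamp)" example concrete, here is a minimal sketch of the idea. It uses an in-memory dict as a stand-in for the DynamoDB table, so it shows only the logic, not the storage calls. The class name, the choice of SHA-256 over recipient plus body, and the one-hour window are all assumptions for illustration; the source does not specify them.

```python
import hashlib
import time


class DedupStore:
    """In-memory stand-in for a key-value table keyed by message hash."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._seen = {}  # message hash -> timestamp of the recorded sighting

    def is_duplicate(self, recipient, body, now=None):
        """Return True if the same message was recorded inside the window."""
        now = time.time() if now is None else now
        # Assumed key: SHA-256 of recipient and body joined by a newline.
        key = hashlib.sha256(f"{recipient}\n{body}".encode("utf-8")).hexdigest()
        first_seen = self._seen.get(key)
        if first_seen is not None and now - first_seen < self.window_seconds:
            return True
        # First sighting, or the previous one has aged out: record it.
        self._seen[key] = now
        return False


store = DedupStore(window_seconds=3600)  # hypothetical one-hour window
print(store.is_duplicate("a@example.com", "Hello", now=0))     # False
print(store.is_duplicate("a@example.com", "Hello", now=10))    # True
print(store.is_duplicate("a@example.com", "Hello", now=4000))  # False
```

With several workers running in parallel, the read-then-write above would need to become a single atomic operation in the real store; the sketch ignores concurrency on purpose.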

Selection rule

Need SQL, joins or transactions? → Postgres. Need massive scale and key-based access? → DynamoDB. Need it blazing fast and OK to lose it? → Redis.
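The selection rule reads naturally as a small decision function. The sketch below is one possible encoding: the function name, the parameter names and the order in which the questions are asked are assumptions, as is the "discuss" fallback for a workload that matches none of the three questions.

```python
def choose_engine(needs_sql=False, key_access_at_scale=False, ok_to_lose=False):
    """Map the three selection questions to an engine name."""
    if needs_sql:  # SQL, joins or transactions
        return "Postgres"
    if key_access_at_scale:  # massive scale and key-based access
        return "DynamoDB"
    if ok_to_lose:  # fast and acceptable to lose
        return "Redis"
    return "discuss"  # none of the three jobs fits: talk before building


print(choose_engine(needs_sql=True))            # Postgres
print(choose_engine(key_access_at_scale=True))  # DynamoDB
print(choose_engine(ok_to_lose=True))           # Redis
```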

Auth · single sign-on

Cognito + Google · one login for the whole stack

No more per-app passwords. Cognito is the identity layer; Google is the IdP humans actually use. Hub v2 web, Hub v2 API and any future internal surface trust the same JWT.

Step 01

User hits Hub v2 web

Next.js app redirects unauthenticated requests to the Cognito Hosted UI. No custom login form to maintain.

Step 02

Cognito hands off to Google

Cognito user pool has Google configured as an external IdP. The user signs in with their @happen.ventures account · MFA stays Google’s problem.

Step 03

JWT returns to Hub v2

Cognito issues an ID + access token. Hub v2 API validates it with the Cognito JWKS · no shared secret, no session table to babysit.

| Component | Responsibility | Owns |
| --- | --- | --- |
| Cognito user pool | Identity registry · token issuance · group/role claims | AWS |
| Google Workspace IdP | Credential check · MFA · account lifecycle (offboarding propagates) | Google |
| Hub v2 web (Next.js) | Hosted UI redirect · token storage in httpOnly cookies | Hub v2 |
| Hub v2 API (NestJS) | JWT validation guard · role-based access on every route | Hub v2 |
| ClawBot worker | Service-to-service: M2M token from Cognito client credentials grant | ClawBot |
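Step 03 has two halves: verifying the token signature against the Cognito JWKS, and checking the claims. The sketch below covers only the second half and assumes the signature has already been verified by a JWT library. It is written in Python to show the checks a service such as ClawBot might run; it is not Hub v2's actual guard. The pool ID `us-east-2_EXAMPLE` and the client ID are placeholders, and the function names are hypothetical.

```python
import time


def cognito_issuer(region, user_pool_id):
    """Issuer URL that Cognito places in the `iss` claim."""
    return f"https://cognito-idp.{region}.amazonaws.com/{user_pool_id}"


def jwks_url(region, user_pool_id):
    """Location of the pool's public signing keys."""
    return f"{cognito_issuer(region, user_pool_id)}/.well-known/jwks.json"


def check_access_claims(claims, region, user_pool_id, client_id, now=None):
    """Return a list of problems; an empty list means the claims pass.

    Assumes `claims` came from a token whose signature was already verified.
    """
    now = time.time() if now is None else now
    problems = []
    if claims.get("iss") != cognito_issuer(region, user_pool_id):
        problems.append("unexpected issuer")
    if claims.get("token_use") != "access":
        problems.append("not an access token")
    if claims.get("client_id") != client_id:
        problems.append("unexpected client_id")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        problems.append("expired or missing exp")
    return problems


# Placeholder values for illustration only.
claims = {
    "iss": "https://cognito-idp.us-east-2.amazonaws.com/us-east-2_EXAMPLE",
    "token_use": "access",
    "client_id": "example-client-id",
    "exp": 2_000,
}
print(check_access_claims(
    claims, "us-east-2", "us-east-2_EXAMPLE", "example-client-id", now=1_000
))  # []
```

Returning a list of problems rather than a boolean keeps the rejection reason available for logs.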

Why Cognito and not Auth0 / Clerk

Cognito is already in the AWS bill, lives in the same VPC posture, and federates to Google in one config. We avoid one more SaaS contract and one more secret to rotate. If we ever outgrow it, the JWT contract is portable.

What the foundation unlocks

Once we agree, this is what we get

For Jessica

Predictable shipping

Every new bot, playbook or surface follows the same path: ECS deploy · secret in Secrets Manager · entry in Airtable. No bespoke infra per feature.

For Farid

One architecture story

The same diagram explains every workload. New hires onboard in days. Reviews focus on business logic, not infra plumbing.

For the team

No more 3 AM VPS

Multi-AZ Fargate · managed RDS backups · CloudWatch alarms · rotation policies. The on-call playbook becomes “check the dashboard”, not “SSH and tail”.

For finance

One AWS bill

Everything tagged by service. Cost attribution per pillar. No more “what is this $40/mo on the VPS host?”.

For sales

Compliance answers

SOC 2 / GDPR conversations get a real answer: AWS-hosted, IAM-scoped, encrypted at rest and in transit, audit log in CloudWatch.

For Robin

Same approval surface

Airtable interfaces don’t change. The human review experience stays identical — only the engine behind it gets stronger.

Migration in four phases

How we get from today to the foundation

01
F2 · Secrets + Observability
Move every credential into Secrets Manager. Wire CloudWatch + OTel before touching compute. Measurable cutover from day one.
touches: pillar 01 (AWS)
02
F3 · Compute + Transport
Containerize ClawBot. Run it on ECS Fargate behind ALB + HTTPS. Dual-run with the VPS for one week, then cut over.
touches: pillar 01 + pillar 04
03
F4 · Scheduler absorbed into Hub v2
The 16 n8n cron callers become NestJS @Cron methods in Hub v2. EventBridge stays on the shelf as a fallback. n8n keeps only the webhook routers until F5.
touches: pillar 02
04
F5 · Hardening + decommission
Multi-AZ verified, autoscaling tuned, runbooks complete. Webhook routers absorbed into @Controller('/ingest/*'). VPS turned off. n8n turned off for ClawBot traffic.
exit: SPOF removed · single pane of glass
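For F2, one practical way to find credentials that still need to move into Secrets Manager is to scan configured URLs for embedded user or password parts. The sketch below is an assumption about how such an audit could start, using only the standard library; the function name and the sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def has_embedded_credentials(url):
    """True if the URL carries a user or password part (user:pass@host)."""
    parts = urlsplit(url)
    return parts.username is not None or parts.password is not None


# Hypothetical webhook URLs for illustration.
urls = [
    "https://bot:s3cret@example.com/webhook",
    "https://example.com/webhook",
]
flagged = [u for u in urls if has_embedded_credentials(u)]
print(len(flagged))  # 1
```

This only catches the `user:pass@host` form. Tokens passed in query strings or headers would need a separate check.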

Order is non-negotiable

Secrets and observability come first because they make every later step measurable. Compute before scheduler because the scheduler needs a stable target. Hardening last because you can’t harden what isn’t running yet.
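The ordering argument amounts to a dependency chain, which can be written down and checked mechanically. The sketch below encodes the chain as described above, with each phase depending on the one before it, and derives the order with the standard library. The dictionary and helper names are hypothetical.

```python
from graphlib import TopologicalSorter

# phase -> phases that must be finished first (as argued above)
PHASE_DEPENDENCIES = {
    "F2": set(),   # secrets + observability
    "F3": {"F2"},  # compute + transport
    "F4": {"F3"},  # scheduler absorbed into Hub v2
    "F5": {"F4"},  # hardening + decommission
}


def phase_order():
    """Return the phases in an order that respects every dependency."""
    return list(TopologicalSorter(PHASE_DEPENDENCIES).static_order())


def can_start(phase, completed):
    """True if every prerequisite of `phase` is in `completed`."""
    return PHASE_DEPENDENCIES[phase] <= set(completed)


print(phase_order())            # ['F2', 'F3', 'F4', 'F5']
print(can_start("F4", {"F2"}))  # False: F3 is not done yet
```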

Decision asks

What we need agreement on

Three calls to lock before F3 starts. None of them are technical — they’re scope and ownership.

Ask 01

AWS is the only host

No more side-deploys to Render, Railway, Heroku or a friend’s VPS. If it needs to run, it runs on ECS Fargate in us-east-2.

Ask 02

Hub v2 owns scheduling

Cron logic lives in NestJS @Cron, not in n8n. EventBridge is the fallback we reach for only when the scheduler must outlive the API.

Ask 03

Airtable stays the SSOT

RDS handles operational state (sessions, idempotency, audit). It does not replace Airtable. Business state — playbooks, bots, approvals — stays where Robin can see it.