
Inbound Sentinel.

A 24/7 inbound lead-qualification agent for a B2B SaaS team. It reads the form, asks the right follow-ups, books the meeting, and tells sales what to know before they walk in.

Client
B2B SaaS · 18-person team
Role
Solo AI engineer · design, build, ship
Timeline
6 weeks · Mar–Apr 2026
Status
Live in production
01 · Context

The team was drowning in good inbound and losing it anyway.

A growing B2B SaaS company had a real problem most teams would happily trade for: too many inbound leads. Their site was working. The marketing was working. What wasn’t working was the next 24 hours after a form submission.

Two account executives (AEs) were splitting all inbound. By the time they had triaged a lead, qualified it, and tried to book a call, a typical first response took 19 hours. Leads went cold. The best-fit prospects, the ones already evaluating competitors, were the most likely to drop off.

They didn’t need another tool that pinged Slack. They needed something to do the work.

02 · Problem

Speed-to-qualify, not speed-to-respond.

“Reply faster” is a metric that gets gamed by autoresponders that say nothing. The real ask was harder: read each lead carefully, decide whether they’re worth time, get the missing context, and book the meeting — at three in the morning, on a Sunday, on Christmas Eve.

The constraints I set with the team:

  • The agent never lies, never invents, and never pretends to be human.
  • It must hand off cleanly. Sales should walk into the call already knowing the answers.
  • It must be observable. Every action is logged, replayable, and overrideable.
  • It must fail gracefully. When unsure, escalate to a person within five minutes.
03 · What I built

A small, opinionated agent with a small, opinionated toolbox.

One LLM agent loop, four tools, one supervisor that can interrupt it. The whole system fits on one page if you print it. That’s the goal: the smallest thing that solves the problem and stays solved.

Architecture

                 ┌────────────────────────┐
form submit ───▶ │  ingest + enrich       │
                 │ (Clearbit, ZeroBounce) │
                 └──────────┬─────────────┘
                            ▼
                 ┌────────────────────────┐
                 │  agent loop (GPT-4o)   │
                 │  tools:                │
                 │    - crm.search        │
                 │    - email.send        │
                 │    - calendar.book     │
                 │    - slack.notify      │
                 └──────────┬─────────────┘
                            ▼
                 ┌────────────────────────┐
                 │  supervisor (rules)    │
                 │  trip wires + escal.   │
                 └──────────┬─────────────┘
                            ▼
                 booked meeting + brief

Stack

Python 3.12 · FastAPI · OpenAI gpt-4o · LangGraph · Postgres + pgvector · Modal (workers) · Resend · Cal.com API · HubSpot API · Linear (escalation) · Logfire

Key decisions

  • One agent, not five. The first instinct in 2026 is to wire up a multi-agent system. I built a single agent with a clear toolbox because I could observe it, debug it, and explain it to the team. We can split later if scaling demands it. It hasn’t.
  • Tools, not freeform email. The agent doesn’t draft email bodies from scratch — it picks from a small set of templated reply strategies and fills slots. This collapses an entire class of brand-voice failures and makes outputs reviewable.
  • The supervisor is rules, not another model. A small set of hard checks (don’t book outside business hours, don’t reply to free-email-domain ICP mismatches without flagging, etc.) sits between the agent’s decisions and any external action. Deterministic guardrails for the things you don’t want to think about twice.
  • RAG for context, not for answers. A pgvector store of the team’s past sales conversations gives the agent recall (“we’ve seen this objection before, here’s how Sarah handled it”) — surfaced in the brief to the AE, not pushed into outbound copy.
  • Full replay. Every run produces a trace. The team can scrub through any conversation, see the model’s reasoning, the tool calls, and the supervisor’s interventions. This was the single biggest factor in earning the team’s trust.
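The rules-based supervisor described above can be sketched in a few dozen lines. This is a minimal illustration, not the production code: the `Action` and `Verdict` types, the 09:00 to 17:00 weekday window, and the short free-email domain list are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Callable, List, Optional

# Assumed values for illustration; the real hours and domain list are not in the write-up.
FREE_EMAIL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com"}
BUSINESS_START, BUSINESS_END = time(9, 0), time(17, 0)


@dataclass
class Action:
    """An external action the agent proposes (hypothetical shape)."""
    tool: str                        # e.g. "calendar.book" or "email.send"
    lead_email: str
    icp_match: bool
    slot: Optional[datetime] = None  # proposed meeting time, if any


@dataclass
class Verdict:
    allowed: bool
    reason: str


def check_business_hours(action: Action) -> Optional[Verdict]:
    """Object to bookings on weekends or outside the assumed daily window."""
    if action.tool != "calendar.book" or action.slot is None:
        return None
    in_hours = BUSINESS_START <= action.slot.time() < BUSINESS_END
    if action.slot.weekday() >= 5 or not in_hours:
        return Verdict(False, "booking outside business hours")
    return None


def check_free_email_mismatch(action: Action) -> Optional[Verdict]:
    """Object to replies sent to free-email leads that do not match the ICP."""
    domain = action.lead_email.rsplit("@", 1)[-1].lower()
    if action.tool == "email.send" and domain in FREE_EMAIL_DOMAINS and not action.icp_match:
        return Verdict(False, "free-email ICP mismatch: flag for a human")
    return None


RULES: List[Callable[[Action], Optional[Verdict]]] = [
    check_business_hours,
    check_free_email_mismatch,
]


def supervise(action: Action) -> Verdict:
    """Run every rule in order; the first objection blocks the action."""
    for rule in RULES:
        verdict = rule(action)
        if verdict is not None:
            return verdict
    return Verdict(True, "ok")
```

Calling `supervise(...)` before every tool call keeps the model's decision and the external side effect separate: a blocked verdict carries a plain-text reason that can go straight into the trace and the escalation message.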

What a single run looks like

[14:02:11] ingest     → form lead: M.Lin · acme.co · 850-employee SaaS
[14:02:13] enrich     → role:Head of RevOps · matched ICP segment B
[14:02:14] crm.search → 0 existing records · 2 historical website touches
[14:02:18] agent      → strategy:qualify-with-context · template:T7
[14:02:21] email.send → "Hi Marcus, before I grab time with Priya…" (3 Qs)
[14:11:46] (reply)    → answers received · seat count = 1,200
[14:11:50] agent      → strategy:book · proposed 3 slots
[14:12:05] supervisor → ok (within business hours, ICP confirmed)
[14:18:09] (reply)    → "Thursday 2pm works"
[14:18:11] calendar   → booked w/ Priya · invite sent
[14:18:12] slack      → #inbound: brief posted (147 words)
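A run like this is easy to replay when each step is stored as a structured event and not as a formatted string. The sketch below assumes a JSON-lines store with one event per line; the `TraceEvent` class and its field names are hypothetical and are not the production schema.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TraceEvent:
    """One step of a run: who acted, when, and what happened (assumed fields)."""
    at: datetime
    actor: str    # e.g. "ingest", "agent", "supervisor"
    detail: str

    def to_json(self) -> str:
        """Serialize to a single JSON line for an append-only log."""
        return json.dumps(
            {"at": self.at.isoformat(), "actor": self.actor, "detail": self.detail}
        )

    @classmethod
    def from_json(cls, line: str) -> "TraceEvent":
        """Rebuild an event from one stored JSON line."""
        raw = json.loads(line)
        return cls(datetime.fromisoformat(raw["at"]), raw["actor"], raw["detail"])

    def render(self) -> str:
        """Format the event in the same style as the run log shown above."""
        return f"[{self.at:%H:%M:%S}] {self.actor} → {self.detail}"


def replay(jsonl: str) -> list:
    """Turn a stored JSON-lines trace back into readable log lines."""
    return [TraceEvent.from_json(line).render() for line in jsonl.splitlines() if line]
```

Keeping the stored form structured means the replay view, filters by actor, and any later metrics can all read from the same log.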
04 · Outcome

It quietly did the job, which was the point.

Numbers below are 90-day post-launch, measured against the matched 90-day pre-launch baseline. Source: HubSpot + the agent’s own trace logs.

19h → 7m
Median time to first response
2.8×
Booked meetings per 100 leads
312/wk
Conversations handled autonomously

The qualitative wins mattered as much. AEs stopped doing intake and started doing sales. The team’s Monday meeting got shorter. One AE said the briefs were better than what she used to write for herself.

Escalation rate stabilized at 6%, which is the number we want — high enough to mean the agent isn’t pretending to handle things it can’t, low enough to mean it’s actually working.

05 · Lessons

What I’d do the same. What I’d do differently.

Keep: the supervisor pattern. Letting deterministic code interrupt model decisions is the cleanest way I’ve found to ship an agent without losing sleep. It also makes “why did it do that” an answerable question.

Keep: traces as a first-class product. The replay UI was a few hundred lines of code and earned the most trust per line of anything in the system.

Change: I built the prompt as a single long file and kept tuning it. I should’ve refactored it halfway through into a small registry of named strategies, each with its own evals. I only had that registry by week five; next time I’d start with it.
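A strategy registry of this kind could look roughly like the sketch below. It is an assumption-heavy illustration: the `Strategy` and `EvalCase` types, the template wording, and the slot names are invented for the example, and only the strategy name `qualify-with-context` comes from the run trace.

```python
from dataclasses import dataclass, field
from string import Template
from typing import Dict, List


@dataclass
class EvalCase:
    """One check for a strategy: fill these slots, expect these substrings."""
    slots: Dict[str, str]
    must_contain: List[str]


@dataclass
class Strategy:
    name: str
    template: Template
    required_slots: frozenset
    evals: List[EvalCase] = field(default_factory=list)

    def render(self, **slots: str) -> str:
        """Fill the template; refuse to send anything with a missing slot."""
        missing = self.required_slots - set(slots)
        if missing:
            raise ValueError(f"{self.name}: missing slots {sorted(missing)}")
        return self.template.substitute(**slots)


REGISTRY: Dict[str, Strategy] = {}


def register(strategy: Strategy) -> Strategy:
    """Add a strategy under its name; duplicate names are rejected."""
    if strategy.name in REGISTRY:
        raise ValueError(f"duplicate strategy: {strategy.name}")
    REGISTRY[strategy.name] = strategy
    return strategy


def run_evals(strategy: Strategy) -> List[str]:
    """Return a list of failure messages; an empty list means all cases pass."""
    failures = []
    for case in strategy.evals:
        text = strategy.render(**case.slots)
        for needle in case.must_contain:
            if needle not in text:
                failures.append(f"{strategy.name}: expected {needle!r} in output")
    return failures


# Hypothetical template text, loosely modelled on the reply in the run trace.
register(Strategy(
    name="qualify-with-context",
    template=Template("Hi $first_name, before I grab time with $ae_name: $questions"),
    required_slots=frozenset({"first_name", "ae_name", "questions"}),
    evals=[EvalCase(
        slots={"first_name": "Marcus", "ae_name": "Priya", "questions": "How many seats?"},
        must_contain=["Marcus", "Priya"],
    )],
))
```

Because each strategy carries its own eval cases, tuning one template cannot silently break another, which is the failure mode a single long prompt file invites.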

Change: I underspecified the “agent gives up” criteria for the first two weeks. The agent kept trying. Adding explicit budget caps (max tool calls per lead, max latency per turn) fixed it in an afternoon and would’ve saved a week of fiddling.
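The budget caps can be as small as a counter and a clock check. The sketch below is one possible shape under stated assumptions: the `LeadBudget` class, the default limits of 8 tool calls and 30 seconds, and the "escalate" return value are illustrative choices, not the numbers used in production.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


class BudgetExceeded(Exception):
    """Raised when a lead has used up its tool-call or latency budget."""


@dataclass
class LeadBudget:
    max_tool_calls: int = 8         # assumed cap per lead
    max_turn_seconds: float = 30.0  # assumed cap per turn
    tool_calls: int = 0

    def charge_tool_call(self) -> None:
        """Count one tool call and stop the agent once the cap is passed."""
        self.tool_calls += 1
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded("tool-call budget exhausted")

    def check_latency(self, started_at: float, now: float) -> None:
        """Stop the agent if the current turn has run too long."""
        if now - started_at > self.max_turn_seconds:
            raise BudgetExceeded("turn latency budget exhausted")


def run_turn(
    budget: LeadBudget,
    tool_calls: Iterable[Callable[[], None]],
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Run one agent turn; return "done", or "escalate" when a cap is hit."""
    started = clock()
    try:
        for call in tool_calls:
            budget.charge_tool_call()  # charge before acting, so the cap is hard
            call()
            budget.check_latency(started, clock())
    except BudgetExceeded:
        return "escalate"
    return "done"
```

Passing the clock in as a parameter keeps the latency cap testable without sleeping, and returning "escalate" gives the supervisor a single, explicit signal that the agent has given up.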