AI calling agent — 4-week MVP for Tier-1 customer support



Constraint
The box they were trapped in
A SaaS support team was burning out on Tier-1 calls — password resets, ticket status checks, basic "how do I do X." The work was repetitive, the volume was steady, and human agents were the only thing answering the phone. The client wanted to validate whether an AI voice agent could handle the floor of that queue well enough to ship — and find out inside a four-week window, not after a six-month platform build.
Approach
How we attacked it
Twilio Voice for the call leg, Whisper for speech-to-text, ElevenLabs for text-to-speech, GPT-4 driving the dialogue with conversation memory through LangChain, and a small set of CRM tool calls so the agent can update tickets, check statuses, and book appointments live during the call. FastAPI behind it, Dockerized, deployed on AWS ECS. The whole MVP fits in a single service that any of the team's existing engineers can read and extend.
Decisions
What we picked, and what we rejected
Off-the-shelf voice stack, custom orchestration
Twilio + Whisper + ElevenLabs + GPT-4 are battle-tested parts with known latency profiles. Inside a four-week MVP we couldn't afford to also be debugging the speech pipeline. The novel work goes into the dialogue orchestration and the tool-call layer; the voice path is the part you don't want to invent.
Tier-1 scope only, with a clean handoff to humans
An agent that's good at one slice of the queue is shippable; an agent that's mediocre at the whole queue is a worse experience than what the team has now. We scoped to the Tier-1 patterns the client could enumerate — password reset, status check, basic Q&A, appointment booking — and built a hard handoff for everything else.
CRM tool calls execute live during the call
If the agent ends with "someone will follow up on your reset shortly," the call was theatre. The reset, the status check, the booking, the ticket creation all run as tool calls inside the conversation, with the agent narrating the result. That's the difference between a voice menu and an agent.
Dockerized FastAPI on ECS — same shape as production AI work
We deploy AI services this way regularly. Building the MVP in the same shape meant going from "works on a demo call" to "running 24/7 against the CRM" was a config change, not an architecture migration. The four-week timeline only worked because we didn't reinvent infrastructure on the way.
Trade-off
What we didn't build
We deliberately scoped the agent to Tier-1 only — password reset, status check, basic Q&A, appointment booking. Tier-2 and Tier-3 calls (debugging, account changes, billing disputes) stay with humans, and the agent hands off cleanly when it should. Building a full multi-tenant conversational platform was a separate, larger project that later became AgentVoice. This MVP's job was to prove the loop — voice in, LLM in the middle, real action out — works on a real customer queue inside a four-week window. We did not try to make it everything at once.
Outcome
What changed after we shipped
MVP shipped in four weeks. The agent runs 24/7 against the client's CRM, handles inbound and outbound Tier-1 calls, and resolves the queries it owns in under 90 seconds on average. The architecture and the learnings — what works in tool calling under voice latency, where the dialogue breaks, when to hand off — fed directly into the next, larger build that became AgentVoice.
Talk to us
Have a similar project in mind?
Tell us what you're working on. We'll let you know whether it's a fit.