AI for Business · AI-Powered Business Automation

AI calling agent — 4-week MVP for Tier-1 customer support

Client: Internal R&D / SaaS partner
Duration: 4 weeks (MVP)
Type: Tier-1 voice support agent (MVP)
Stack: Twilio Voice (call leg), Whisper (STT), ElevenLabs (TTS), GPT-4 + LangChain (dialogue + memory), CRM tool calls (status, booking, ticketing), FastAPI, Docker, AWS ECS

Constraint

The box they were trapped in

A SaaS support team was burning out on Tier-1 calls: password resets, ticket status checks, basic "how do I do X" questions. The work was repetitive, the volume was steady, and human agents were the only thing answering the phone. The client wanted to validate whether an AI voice agent could handle the bottom tier of that queue well enough to ship, and to find out inside a four-week window, not after a six-month platform build.

Approach

How we attacked it

The stack: Twilio Voice for the call leg, Whisper for speech-to-text, ElevenLabs for text-to-speech, and GPT-4 driving the dialogue, with conversation memory handled through LangChain. A small set of CRM tool calls lets the agent update tickets, check statuses, and book appointments live during the call. The service is built on FastAPI, Dockerized, and deployed on AWS ECS. The whole MVP fits in a single service that any of the team's existing engineers can read and extend.
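To make the shape of one conversational turn concrete, here is a minimal sketch of the loop: audio in, transcript, reply, audio out. It is an illustration under stated assumptions, not the production code. The names `CallSession`, `handle_turn`, and the `fake_*` functions are hypothetical, and the three stages are plain callables standing in for Whisper, GPT-4 via LangChain, and ElevenLabs; none of those services' real APIs are shown.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical stage signatures. In the MVP these roles are filled by
# Whisper (STT), GPT-4 + LangChain (dialogue) and ElevenLabs (TTS).
Transcribe = Callable[[bytes], str]
Respond = Callable[[List[Tuple[str, str]], str], str]
Synthesize = Callable[[str], bytes]


@dataclass
class CallSession:
    """Per-call conversation memory, stored as (speaker, text) pairs."""
    call_id: str
    history: List[Tuple[str, str]] = field(default_factory=list)


def handle_turn(
    session: CallSession,
    audio_in: bytes,
    stt: Transcribe,
    llm: Respond,
    tts: Synthesize,
) -> bytes:
    """Run one turn: speech-to-text, dialogue, then text-to-speech."""
    caller_text = stt(audio_in)
    # The model sees the history so far plus the new utterance.
    reply_text = llm(session.history, caller_text)
    session.history.append(("caller", caller_text))
    session.history.append(("agent", reply_text))
    return tts(reply_text)


# Toy stand-ins so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(history: List[Tuple[str, str]], text: str) -> str:
    turn_number = len(history) // 2 + 1
    return f"Turn {turn_number}: you said '{text}'."


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")
```

Keeping each stage behind a plain callable is one way to swap a vendor or stub it in tests without touching the turn logic.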

Decisions

What we picked, and what we rejected

01

Off-the-shelf voice stack, custom orchestration

Twilio + Whisper + ElevenLabs + GPT-4 are battle-tested parts with known latency profiles. Inside a four-week MVP we couldn't afford to also be debugging the speech pipeline. The novel work goes into the dialogue orchestration and the tool-call layer; the voice path is the part you don't want to invent.

02

Tier-1 scope only, with a clean handoff to humans

An agent that's good at one slice of the queue is shippable; an agent that's mediocre at the whole queue is a worse experience than what the team has now. We scoped to the Tier-1 patterns the client could enumerate — password reset, status check, basic Q&A, appointment booking — and built a hard handoff for everything else.
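A scope gate like this can be sketched in a few lines. The four Tier-1 intents come from the list above; everything else here is an assumption made for illustration, including the `Intent` and `Route` names, the idea of a classifier confidence score, and the 0.7 threshold.

```python
from dataclasses import dataclass
from enum import Enum


class Intent(Enum):
    PASSWORD_RESET = "password_reset"
    STATUS_CHECK = "status_check"
    BASIC_QA = "basic_qa"
    APPOINTMENT_BOOKING = "appointment_booking"
    OTHER = "other"  # anything the agent was not scoped for


# The Tier-1 patterns the agent owns.
TIER1 = frozenset({
    Intent.PASSWORD_RESET,
    Intent.STATUS_CHECK,
    Intent.BASIC_QA,
    Intent.APPOINTMENT_BOOKING,
})


@dataclass(frozen=True)
class Route:
    handler: str  # "agent" or "human"
    reason: str


def route_call(intent: Intent, confidence: float, threshold: float = 0.7) -> Route:
    """Send the call to the agent only when it is clearly in scope.

    The confidence score and threshold are hypothetical; the point is that
    out-of-scope and uncertain calls both fall through to a human.
    """
    if intent not in TIER1:
        return Route("human", "out_of_scope")
    if confidence < threshold:
        return Route("human", "low_confidence")
    return Route("agent", "tier1")
```

In this sketch the handoff is the default path and the agent is the exception, so a new or misclassified call type lands with a human until someone adds it to the scope.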

03

CRM tool calls execute live during the call

If the agent ends with "someone will follow up on your reset shortly," the call was theatre. The reset, the status check, the booking, the ticket creation all run as tool calls inside the conversation, with the agent narrating the result. That's the difference between a voice menu and an agent.
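As a rough illustration of "run the action, then narrate the result", the sketch below dispatches a model-requested tool call against an in-memory stand-in for the CRM. All names (`TICKETS`, `check_status`, `create_ticket`, `run_tool_call`) are hypothetical, and the real CRM integration is not shown.

```python
from typing import Any, Callable, Dict

# In-memory stand-in for the CRM (hypothetical data).
TICKETS: Dict[str, str] = {"T-100": "open"}


def check_status(ticket_id: str) -> str:
    status = TICKETS.get(ticket_id)
    if status is None:
        return f"I could not find ticket {ticket_id}."
    return f"Ticket {ticket_id} is currently {status}."


def create_ticket(summary: str) -> str:
    ticket_id = f"T-{100 + len(TICKETS)}"
    TICKETS[ticket_id] = "open"
    return f"I have opened ticket {ticket_id} for: {summary}."


# Registry of the tools the dialogue model is allowed to request.
TOOLS: Dict[str, Callable[..., str]] = {
    "check_status": check_status,
    "create_ticket": create_ticket,
}


def run_tool_call(name: str, arguments: Dict[str, Any]) -> str:
    """Execute a requested tool now and return a sentence to speak."""
    tool = TOOLS.get(name)
    if tool is None:
        # Unknown action: hand off instead of promising a follow-up.
        return "I can't do that on this call, so I'm transferring you to a colleague."
    try:
        return tool(**arguments)
    except TypeError:
        # Missing or unexpected arguments from the model; ask again.
        return "I did not get the details I need for that. Could you repeat them?"
```

Each tool returns a speakable sentence, so the caller hears the outcome of the action during the call instead of a promise that it will happen later.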

04

Dockerized FastAPI on ECS — same shape as production AI work

We deploy AI services this way regularly. Building the MVP in the same shape meant going from "works on a demo call" to "running 24/7 against the CRM" was a config change, not an architecture migration. The four-week timeline only worked because we didn't reinvent infrastructure on the way.
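One common way to make that switch a config change is to read everything environment-specific from environment variables, so the same container image runs in a demo and against the live CRM. The sketch below assumes that pattern; the variable names (`CRM_BASE_URL`, `AGENT_DRY_RUN`, `MAX_CALL_SECONDS`) and defaults are hypothetical and not taken from the project.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    crm_base_url: str       # where CRM tool calls are sent
    dry_run: bool           # True: log CRM writes instead of performing them
    max_call_seconds: int   # safety cap on call length


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables (all names hypothetical).

    Defaults are chosen to be safe for a local demo: a localhost CRM stub
    and dry-run mode on.
    """
    return Settings(
        crm_base_url=env.get("CRM_BASE_URL", "http://localhost:8001"),
        dry_run=env.get("AGENT_DRY_RUN", "true").lower() == "true",
        max_call_seconds=int(env.get("MAX_CALL_SECONDS", "300")),
    )
```

With a loader like this, a demo run and a production task would differ only in the environment passed to the container, for example the variables set in an ECS task definition.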

Trade-off

What we didn't build

We deliberately scoped the agent to Tier-1 only — password reset, status check, basic Q&A, appointment booking. Tier-2 and Tier-3 calls (debugging, account changes, billing disputes) stay with humans, and the agent hands off cleanly when it should. Building a full multi-tenant conversational platform was a separate, larger project that later became AgentVoice. This MVP's job was to prove the loop — voice in, LLM in the middle, real action out — works on a real customer queue inside a four-week window. We did not try to make it everything at once.

Outcome

What changed after we shipped

MVP shipped in four weeks. The agent runs 24/7 against the client's CRM, handles inbound and outbound Tier-1 calls, and resolves the queries it owns in under 90 seconds on average. The architecture and the learnings — what works in tool calling under voice latency, where the dialogue breaks, when to hand off — fed directly into the next, larger build that became AgentVoice.
