AgentVoice — voice-first AI agent platform for inbound and outbound calls
Constraint
The box they were trapped in
The phone is still where most sales, support, and ops actually happen, and it doesn't scale the way a web form does. Human agents are expensive, hard to staff for peaks, and tied up on repetitive calls — appointment booking, lead qualification, follow-ups, CRM updates. Old IVR menus annoy callers; text chatbots can't do a call at all. We needed to build a platform where an AI agent could hold a real phone conversation, take actions during the call, and run thousands of those calls in parallel without falling over.
Approach
How we attacked it
A real-time streaming pipeline: speech-to-text → LLM-driven dialogue → neural text-to-speech, with a custom orchestration layer between them that manages turn-taking, interruptions, and context across the conversation rather than treating each utterance as an isolated query. A workflow engine sits on top so each agent has branching logic, intent routing, and multi-step actions wired into the call — not bolted on afterward. Tool integrations (CRM, calendars, SMS, ticketing, customer APIs) execute live during the call, so a booking, qualification, or follow-up actually finishes before the caller hangs up. SIP and WebRTC telephony underneath, so the platform plugs into the customer's existing phone numbers and PBX without owning their carrier relationship.
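The shape of that pipeline can be sketched as three streaming stages chained together, each one consuming its upstream as it arrives instead of waiting for a complete result. This is a minimal illustration with hypothetical stub stages (`stt`, `llm`, `tts` here are placeholders, not the production API):

```python
import asyncio
from typing import AsyncIterator

async def stt(audio_frames: AsyncIterator[bytes]) -> AsyncIterator[str]:
    # Stub: emit a partial transcript per audio frame as it arrives.
    async for frame in audio_frames:
        yield frame.decode()

async def llm(transcript: AsyncIterator[str]) -> AsyncIterator[str]:
    # Stub: stream response tokens without waiting for the full utterance.
    async for partial in transcript:
        yield f"ack:{partial}"

async def tts(tokens: AsyncIterator[str]) -> AsyncIterator[bytes]:
    # Stub: synthesize audio chunk by chunk as tokens stream in.
    async for token in tokens:
        yield token.encode()

async def call_pipeline(audio_frames: AsyncIterator[bytes]) -> list[bytes]:
    # Chain the stages; each starts producing before upstream finishes,
    # which is what keeps round-trip latency sub-second.
    return [chunk async for chunk in tts(llm(stt(audio_frames)))]

async def mic() -> AsyncIterator[bytes]:
    # Simulated caller audio, two frames.
    for frame in (b"book me", b"for friday"):
        yield frame

out = asyncio.run(call_pipeline(mic()))
```

The orchestration layer described above lives between these stages in the real system, managing turn-taking and conversation context rather than treating each frame as an isolated query.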
Decisions
What we picked, and what we rejected
Streaming STT → LLM → TTS, not batch request/response
The line between a phone call and a chatbot is roughly 700 ms of round-trip latency. Streaming transcription, streaming model output, and streaming voice synthesis run in parallel so the agent can start responding before the caller finishes their sentence — and stop talking the moment they get interrupted. A batch pipeline would have lost the caller long before it lost a benchmark.
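The "stop talking when interrupted" part — barge-in — amounts to racing audio playback against a voice-activity signal and cancelling playback mid-utterance if the caller speaks first. A minimal sketch, assuming a hypothetical `barge_in` event fired by voice-activity detection:

```python
import asyncio

async def speak(chunks, play):
    # Play synthesized audio chunk by chunk; cancellable mid-utterance.
    for chunk in chunks:
        play(chunk)
        await asyncio.sleep(0.1)  # stand-in for real playback time

async def agent_turn(chunks, barge_in, play):
    # Race playback against the caller interrupting (VAD setting barge_in).
    playback = asyncio.create_task(speak(chunks, play))
    interrupt = asyncio.create_task(barge_in.wait())
    done, pending = await asyncio.wait(
        {playback, interrupt}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # stop talking the instant the caller speaks
    return playback in pending  # True means the agent was interrupted

played = []

async def demo():
    barge_in = asyncio.Event()
    # Simulate the caller interrupting partway through a four-chunk reply.
    asyncio.get_running_loop().call_later(0.15, barge_in.set)
    return await agent_turn(["one", "two", "three", "four"], barge_in, played.append)

interrupted = asyncio.run(demo())
```

In this run the agent abandons the utterance partway through instead of finishing its turn, which is the behaviour that separates a conversation from an IVR announcement.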
Frontier LLMs underneath, custom orchestration on top
Training a custom voice LLM was on the table and we rejected it. At this latency the bottleneck isn't model intelligence — it's how the system handles interruptions, turn-taking, and faithful tool calls. We invested the engineering in that orchestration layer instead and let the model providers handle model improvements.
Tool/API calls execute live during the call
If the agent collects information and then "someone will book your slot shortly," the caller's already gone. The booking, the CRM update, the calendar invite, the SMS confirmation — all of those run as tool calls inside the conversation, with the agent narrating the result back to the caller. That's the entire reason a voice agent beats a form.
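The pattern is that the tool result feeds straight back into the agent's next utterance. A minimal sketch with a hypothetical booking tool (`book_slot`, the in-memory `calendar`, and the reply templates are illustrative):

```python
from dataclasses import dataclass

@dataclass
class ToolResult:
    ok: bool
    detail: str

def book_slot(calendar: dict, name: str, slot: str) -> ToolResult:
    # Hypothetical booking tool: writes to the calendar during the call.
    if slot in calendar:
        return ToolResult(False, f"{slot} is taken")
    calendar[slot] = name
    return ToolResult(True, f"{slot} booked for {name}")

def agent_reply(calendar: dict, name: str, slot: str) -> str:
    # The tool runs inside the conversational turn, and the agent
    # narrates the outcome back to the caller before the call ends.
    result = book_slot(calendar, name, slot)
    if result.ok:
        return f"Done, {result.detail}. You'll get an SMS confirmation."
    return f"Sorry, {result.detail}. Can I offer another time?"

calendar = {}
reply = agent_reply(calendar, "Dana", "Friday 10:00")
```

By the time the reply is spoken, the side effect has already happened — there is no "someone will book your slot shortly" gap for the caller to fall into.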
SIP + WebRTC telephony over a single carrier
Customers already have phone numbers, PBXs, and SIP trunks. Locking the platform to one carrier would have made adoption a procurement project. Speaking SIP and WebRTC means we plug into what the customer already runs — and we're not the single point of carrier failure.
Trade-off
What we didn't build
We did not train a custom voice LLM. At this latency budget, the difference between an agent that sounds human and one that sounds like a chatbot is mostly the orchestration layer — sub-second turn-taking, interruption handling, faithful tool use — not raw model capability. So we invested hard in that orchestration layer and ran frontier LLMs underneath. We also kept the platform multi-tenant and brand-customizable rather than going vertical on one industry: at the orchestration layer, a tier-1 support agent and an outbound qualification agent are the same machine wearing different scripts.
Outcome
What changed after we shipped
Live platform at https://www.agentvoice.com. Businesses deploy voice agents that take inbound calls and place outbound ones — appointment scheduling, lead qualification, customer onboarding, follow-ups — with CRM updates, SMS triggers, and tool calls happening mid-conversation. Concurrency scales horizontally for peak-load campaigns without spinning up a call centre.
Talk to us
Have a similar project in mind?
Tell us what you're working on. We'll let you know whether it's a fit.