ChatQRE AI agent — natural-language access to internal business data



Constraint
The box they were trapped in
Internal business teams needed answers from data spread across a SQL warehouse and a handful of ops tools. Writing the SQL was an analyst's job, and analysts were the bottleneck — every "how did region X perform last quarter" took a ticket and a day. Off-the-shelf chatbots could talk, but they couldn't run a SQL query against the client's warehouse, apply filters, or carry context across follow-ups. The team needed a real agent over their data, not a chat layer next to it.
Approach
How we attacked it
An AI agent that sits over the client's internal data with three jobs: parse the natural-language question, generate and run the right SQL against the warehouse, then narrate the result. FastAPI runs the request lifecycle; LangChain manages the tool-using agent loop; GPT-4 generates the SQL and the response. A vector index handles the unstructured side (internal docs, definitions, schema notes), so the agent grounds its queries in the same vocabulary the team uses. Per-user auth scopes what the agent can see. Multi-turn context means the user can ask, then filter, then drill down without re-stating the dataset.
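To make the multi-turn behaviour concrete, here is a minimal sketch of how a session could carry the dataset and filters across follow-ups. It is an illustration under assumptions, not the shipped implementation: `ConversationState`, `apply_turn`, and `to_sql` are hypothetical names, and in the real system the agent, not hand-written code, decides what each turn adds.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ConversationState:
    """Hypothetical per-session state: the dataset in play plus accumulated filters."""

    table: Optional[str] = None
    filters: Dict[str, str] = field(default_factory=dict)

    def apply_turn(self, table: Optional[str] = None, **new_filters: str) -> None:
        # A follow-up that names no dataset keeps the one already in context.
        if table is not None and table != self.table:
            self.table = table
            self.filters = {}  # switching datasets drops the old filters
        self.filters.update(new_filters)

    def to_sql(self) -> Tuple[str, List[str]]:
        if self.table is None:
            raise ValueError("no dataset in context yet")
        # Assumption: table and column names were already validated against
        # the schema; only the filter values travel as bound parameters.
        sql = f"SELECT * FROM {self.table}"
        if self.filters:
            clauses = " AND ".join(f"{col} = ?" for col in self.filters)
            sql += f" WHERE {clauses}"
        return sql, list(self.filters.values())
```

A first turn might call `apply_turn(table="sales", quarter="2024-Q1")` and a follow-up only `apply_turn(region="EMEA")`; `to_sql()` then yields one query with both filters, which is the "ask, then filter, then drill down" pattern described above.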
Decisions
What we picked, and what we rejected
Agent loop, not single-shot LLM-to-SQL
A real question, such as "how did region X perform last quarter compared to the previous one", almost never resolves to one clean SQL statement. The agent inspects the schema, writes a query, looks at the result, and refines. Single-shot prompts answer the easy half of the questions and miss the half that matters.
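The inspect, write, run, refine cycle can be sketched in a few lines. This is a simplified illustration, not the production loop: it uses SQLite in place of the warehouse, `propose_sql` is a hypothetical callable standing in for the GPT-4 call, and it only feeds execution errors back to the model, whereas a fuller loop would also examine the returned rows.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple

# Hypothetical signature for the model call: (question, schema, feedback) -> SQL text.
ProposeSql = Callable[[str, str, Optional[str]], str]


def inspect_schema(conn: sqlite3.Connection) -> str:
    """Return the CREATE statements so the model sees real table and column names."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n".join(r[0] for r in rows)


def run_agent_loop(
    conn: sqlite3.Connection,
    question: str,
    propose_sql: ProposeSql,
    max_steps: int = 3,
) -> Tuple[str, List[tuple]]:
    schema = inspect_schema(conn)
    feedback: Optional[str] = None
    for _ in range(max_steps):
        sql = propose_sql(question, schema, feedback)
        try:
            # Success: return the query alongside its rows so the answer stays traceable.
            return sql, conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # Failure: hand the error back so the next attempt can correct itself.
            feedback = f"{sql!r} failed: {exc}"
    raise RuntimeError(f"no working query after {max_steps} attempts; last: {feedback}")
```

The step cap matters as much as the retry: a single-shot prompt is this loop with `max_steps=1`, and an uncapped loop can spin on a question the schema cannot answer.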
Generated SQL for structured data, RAG for unstructured docs
Vector search over the warehouse to count rows would be wrong on its face — that's what SQL is for. RAG earns its place on the document side: schema notes, definitions, internal ops docs the team uses to talk about the data. The agent picks the right grounding for the data shape, instead of forcing one technique to do both jobs.
Per-user auth scoping at the agent layer
Row-level security in the warehouse alone isn't enough — the agent decides which queries to run, and a leak is a leak whether the SQL was hand-written or generated. Auth flows from the user through the agent into the SQL it generates, so an answer can never include rows the requesting user shouldn't see.
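One way to carry the user's scope into generated SQL is to let the model query only a filtered name and to apply the filter outside the model's control. The sketch below assumes a hypothetical physical table `sales_raw` with a `region` column and uses SQLite for illustration; the string checks are deliberately simple and are not a complete defence (a production version would parse the SQL and combine this with warehouse-level permissions).

```python
import re
import sqlite3
from typing import List, Sequence

RAW_TABLE = "sales_raw"  # hypothetical physical table, never exposed to the model
SCOPED_NAME = "sales"    # the only name generated SQL is allowed to reference


def run_scoped(
    conn: sqlite3.Connection,
    generated_sql: str,
    allowed_regions: Sequence[str],
) -> List[tuple]:
    if not allowed_regions:
        return []  # no entitlement means no rows, without touching the database
    if re.search(rf"\b{RAW_TABLE}\b", generated_sql, re.IGNORECASE):
        raise PermissionError("generated SQL may not reference the raw table")
    if not generated_sql.lstrip().lower().startswith("select"):
        raise PermissionError("only SELECT statements are allowed")
    # The user's scope is bound as parameters in a CTE that shadows the table
    # name the model was told about, so the filter cannot be omitted by the model.
    placeholders = ", ".join("?" for _ in allowed_regions)
    scoped = (
        f"WITH {SCOPED_NAME} AS ("
        f"SELECT * FROM {RAW_TABLE} WHERE region IN ({placeholders})) "
        f"{generated_sql}"
    )
    return conn.execute(scoped, list(allowed_regions)).fetchall()
```

The design choice is that scoping does not depend on the model writing the right `WHERE` clause: the same generated query returns different rows for different users because the wrapper, not the prompt, decides what `sales` contains.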
Docker + AWS ECS from prototype to scaling phase
The prototype was a FastAPI service in a container. The scaling phase was the same container with horizontal replicas behind ECS. No rewrite, no infrastructure surprise — the path from "working at one user" to "working at the team" was a config change, not an architecture migration.
Trade-off
What we didn't build
We rejected a thin LLM-on-SQL wrapper. Real business questions don't translate to a single SQL statement — the agent has to inspect the schema, write a query, see what comes back, and refine. We also rejected pure retrieval (RAG only): for structured numbers in a warehouse, generated SQL beats vector search every time. The mix — agent loop with SQL for structured data and RAG for the document layer — is the architecture, and it cost us complexity we wouldn't have if we'd picked one or the other.
Outcome
What changed after we shipped
ChatQRE shipped as a working agent: the prototype was live in three weeks and scaled to the wider team in two more. Business teams ask questions of their internal data in plain English, get a SQL-grounded answer with the source rows visible, then filter and follow up without re-stating context. The analyst ticket queue is no longer the only path to a number.