PowerDrill AI — conversational analytics over CSVs, databases, and PDFs
Constraint
The box they were trapped in
Most of the data a company actually runs on lives somewhere awkward — a SQL warehouse only analysts can query, an Excel file emailed around the team, a 60-page PDF nobody opens, a NoSQL collection behind an API. The people who need answers (finance, sales, ops, research) have to file requests with someone technical and wait. Traditional BI tools demand SQL, schema knowledge, and dashboard discipline most decision-makers don't have. The result is a data lake nobody can talk to.
Approach
How we attacked it
A conversational layer that picks its backend per question. An LLM parses the user's plain-English query, decides whether the answer lives in a connected SQL/NoSQL database, an uploaded CSV/Excel file in memory, or text/tables/OCR'd content from a PDF, and routes accordingly. Pandas and NumPy do the actual tabular work; a RAG pipeline with vector embeddings handles document grounding; database connectors run live queries against PostgreSQL, MySQL, SQL Server, and supported NoSQL stores. Charts are picked by the shape of the result — bar, line, pie, scatter, heatmap — and rendered in-browser. Multi-turn context lets the user drill down without re-stating the dataset.
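In sketch form, the routing step looks roughly like this. Everything here is illustrative, not PowerDrill's actual interface: the backend labels, the prompt, and the `llm` callable are assumptions.

```python
from enum import Enum

class Backend(Enum):
    SQL = "sql"            # live database behind a connector
    TABULAR = "tabular"    # uploaded CSV/Excel held in memory as a DataFrame
    DOCUMENT = "document"  # PDF chunks behind a RAG index

# Hypothetical router prompt: the LLM classifies the question
# against the sources actually connected to this conversation.
ROUTER_PROMPT = """Given the user's question and the connected sources,
answer with exactly one of: sql, tabular, document.

Connected sources:
{sources}

Question: {question}"""

def route(question: str, sources: dict[str, str], llm) -> Backend:
    """Pick the execution path for one question. `llm` is any
    text-completion callable; its real interface is an assumption."""
    label = llm(ROUTER_PROMPT.format(
        sources="\n".join(f"- {name}: {kind}" for name, kind in sources.items()),
        question=question,
    )).strip().lower()
    return Backend(label)
```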
Decisions
What we picked, and what we rejected
Every answer is computed from the data, not generated from the model
An LLM answering an analytics question without computation is a confident guess. The architecture forces every response through SQL on a database, Pandas on a tabular file, or RAG retrieval against document chunks before the model narrates the result — and exposes the source rows alongside the answer so the user can verify it. Slower than a guess, and honest in a way a guess can't be.
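A minimal sketch of that compute-first contract on an uploaded table. The column names (`region`, `q3_revenue`) and the example question are made up for illustration; the point is the ordering: computation runs, source rows are captured, and only then does the model narrate.

```python
import pandas as pd

def grounded_answer(df: pd.DataFrame) -> dict:
    """Example question: 'Which region had the highest Q3 revenue?'
    Sketch only; column names are assumptions."""
    # 1. The computation runs against the actual rows.
    by_region = df.groupby("region")["q3_revenue"].sum()
    top = by_region.idxmax()
    # 2. The source rows behind the number travel with the answer.
    source_rows = df[df["region"] == top]
    # 3. Only now does the model get a computed result to narrate.
    return {
        "answer": f"{top} had the highest Q3 revenue: {by_region[top]:,.0f}",
        "source_rows": source_rows.to_dict(orient="records"),
        "computation": "groupby('region')['q3_revenue'].sum().idxmax()",
    }
```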
One conversation, many backends
A user shouldn't have to know whether their answer lives in Postgres, an uploaded Excel sheet, or a PDF chapter. The LLM picks the backend based on the question and the connected sources — SQL connectors for live databases, the analytics engine for in-memory tabular files, RAG with OCR for documents. The user sees one chat; underneath, three execution paths.
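One way to picture the three execution paths is a shared result shape, so the chat layer never cares which path ran. A sketch, assuming a DB-API-style connection, a Pandas DataFrame, and a vector index with a `search` method; all three interfaces are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Result:
    rows: list[dict]  # source rows shown to the user
    logic: str        # the SQL / Pandas / retrieval step that produced them

def run_sql(conn, sql: str) -> Result:
    """Live database path. `conn` is any DB-API connection."""
    cur = conn.execute(sql)
    cols = [c[0] for c in cur.description]
    return Result(rows=[dict(zip(cols, r)) for r in cur.fetchall()], logic=sql)

def run_tabular(df, expr: str) -> Result:
    """In-memory CSV/Excel path, e.g. expr = 'amount > 1000'."""
    out = df.query(expr)
    return Result(rows=out.to_dict(orient="records"), logic=f"df.query({expr!r})")

def run_document(index, question: str, k: int = 5) -> Result:
    """RAG path. `index.search` is an assumed vector-search interface."""
    chunks = index.search(question, top_k=k)
    return Result(rows=[{"chunk": c.text, "page": c.page} for c in chunks],
                  logic=f"top-{k} vector retrieval")
```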
Auto-visualization driven by result shape, not user choice
If the user has to choose a chart type, it's a dashboard tool, not a conversation. The visualization layer reads the shape of the query result — categorical vs continuous, time vs not, two columns vs many — and picks the right chart. Bar, line, pie, scatter, heatmap are all on the table. The user can override; most of the time they don't need to.
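A rough sketch of shape-driven selection using Pandas dtype checks. The thresholds and rules below are illustrative, not the production logic; they just cover the chart types named above.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype, is_datetime64_any_dtype

def pick_chart(result: pd.DataFrame) -> str:
    """Choose a chart from the shape of a query result. Sketch only."""
    if result.shape[1] == 2:
        x, y = result.columns
        if is_datetime64_any_dtype(result[x]) and is_numeric_dtype(result[y]):
            return "line"      # time vs measure
        if is_numeric_dtype(result[x]) and is_numeric_dtype(result[y]):
            return "scatter"   # two continuous columns
        if result[x].nunique() <= 6:
            return "pie"       # few categories, one measure
        return "bar"           # many categories, one measure
    numeric = [c for c in result.columns if is_numeric_dtype(result[c])]
    if len(numeric) >= 3:
        return "heatmap"       # many measures, matrix view
    return "bar"
```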
Reasoning chains exposed to the user
Verifiability is the trust contract for a data product. Every answer ships with the source rows it came from and the computation logic that produced it — visible, not buried in a debug pane. If the answer is wrong, the user can see why, and tell us. If it's right, they can defend it to whoever asks.
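The answer payload might look like this. Field names are hypothetical, but the shape (narration, source rows, computation chain) is the contract described above.

```python
from dataclasses import dataclass, field

@dataclass
class Answer:
    """Everything the user sees with one response. Sketch; names assumed."""
    text: str               # the narrated answer
    chart: str | None       # auto-picked chart type, if any
    source_rows: list[dict] # the exact rows the answer came from
    steps: list[str] = field(default_factory=list)  # computation chain, in order

# Illustrative instance, with made-up numbers:
ans = Answer(
    text="West led Q3 revenue at 1.2M.",
    chart="bar",
    source_rows=[{"region": "West", "q3_revenue": 1_200_000}],
    steps=["route -> tabular", "groupby('region').sum()", "idxmax()"],
)
```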
Trade-off
What we didn't build
We did not build a thin LLM-on-top-of-data wrapper. The whole point of conversational analytics is that the answer is correct, and the only way to guarantee that is to force every response through retrieval or computation against the actual data before the model gets to narrate it. That costs latency on simple questions and a lot of orchestration code we wouldn't need if we'd let the LLM guess. We also did not build a dashboard tool; auto-visualizations come from the query result, not from a canvas the user has to design.
Outcome
What changed after we shipped
Live at https://chat.powerdrill.ai. Finance, sales, ops, healthcare, and research teams ask plain-English questions of their CSVs, Excel files, PDFs, or SQL/NoSQL databases and get answers, charts, and summaries — with the source rows and computation logic shown alongside each answer so the team can audit it.
Talk to us
Have a similar project in mind?
Tell us what you're working on. We'll let you know whether it's a fit.