Building Budget Analysis Tools for Local Government

As an Ulster County Legislator, I needed a better way to analyze the $475 million annual budget. The existing workflow looked like most government budget review processes: a stack of Excel exports, pivot tables built by a single staff person who knew where everything lived, and legislators asking questions that took days to answer.

The problem wasn't a lack of data. The county had four years of detailed budget data at the line-item level — over 14,000 rows per year, organized by fund, department, division, category, and account. The problem was that the data was trapped in a format that required expertise to use.

The question that mattered

Before writing a line of code, we spent time understanding the actual decisions being made. Budget season involves a handful of recurring questions:

Where is spending growing fastest, and why?
Which departments are generating revenue vs. drawing on the general fund?
What changed between last year's adopted budget and this year's proposal?
How does this department's actual spending compare to what was budgeted?

These questions had answers in the data. They just required too much effort to surface.

Why a Sankey diagram

The core visualization is a Sankey flow diagram — a chart where the width of each band represents a dollar amount flowing from a revenue source to an expenditure type. At a glance, you can see that 27% of the county's spending goes to Contractual Expenses, that Federal Aid funds a disproportionate share of Social Services, or that Debt Service is almost entirely funded by county tax revenue.

The diagram is interactive: clicking any node drills into the next level of detail. Click "Contractual Expenses" and you see how it breaks down by department group. Click a department group and you see the individual departments within it, with their revenue allocations.

This kind of visualization doesn't replace analysis — it enables it. A legislator can explore the budget in minutes instead of waiting for a staff report.

DuckDB as the query engine

The data layer is built on DuckDB, an in-process analytical database that reads Parquet files directly. The budget data is processed once from Excel into a 0.16 MB Parquet file, and all queries run in-process with zero round-trips to an external database server.

This approach has a few advantages for this use case:

Speed. Complex aggregations across 14,000 rows complete in milliseconds. There's no query planner warming up, no network latency, no connection pool to manage.

Simplicity. There's no database server to deploy, monitor, or maintain. The application ships with its data.

SQL. Everything is standard SQL. When a new analysis requirement comes up, we write a query — not a new API endpoint and a new data model.

The one constraint: DuckDB's native module can't be used in Next.js API routes when using Turbopack, because the callback-based interface conflicts with the bundler's module system. All queries run in React Server Components, which execute on the server during the request lifecycle.

The variance explorer

One of the most useful features came from a direct request: "I want to see which budget lines changed the most from last year to this year."

The Variance Explorer lets a user select any two adjacent years, choose a filter (lines that increased by X%, decreased by X%, new lines, or zeroed-out lines), and see the results sorted by absolute dollar impact — largest changes first.

This kind of tool is straightforward to build but hard to replicate in a spreadsheet, because the underlying join (matching this year's line items to last year's, accounting for lines that didn't exist in one year) requires exactly the kind of multi-table logic that SQL handles naturally.

What made this project work

A few things made this engagement productive beyond the technical choices:

The client had real data and real questions. We weren't building something speculative — there were actual legislators with actual decisions to make.

We got access to the domain expert early. The budget director spent an afternoon explaining how the data was structured, where the edge cases were, and what the annual budget cycle looked like. That conversation saved weeks of reverse-engineering.

We started with the hardest visualization first. The Sankey diagram was the most complex component and the one most likely to reveal unknown requirements. Building it first meant the rest of the project de-risked progressively rather than saving hard problems for the end.

The result is a tool that county staff now use throughout the budget season. Whether that's worth more than another round of pivot tables is for them to judge — but so far, the answer seems to be yes.

Interested in a similar tool for your organization? Get in touch.