emanuelpetre.dev

Shipping a Rules Engine in 2 Weeks

By Emanuel on Sep 1, 2025
Abstract system diagram

A team I worked with had a problem that is more common than people admit: the existing rules engine was a tangle of conditions that had grown organically over time, and every new business requirement made it worse. Adding a new rule meant either hacking it into the existing structure or carving out yet another exception. Everyone agreed it needed to be replaced. The disagreement was about how.

The plan on the table when I arrived was a full rewrite: design the new system from scratch, build it completely, and ship it as a single release. Months of work, then one deployment. The “Big Bang.”

Why the Big Bang Is a Trap

The appeal of a Big Bang rewrite is obvious. You get to start clean, without the weight of the old structure pulling at every decision. You can do it right this time. The problem is that “right” is only visible in retrospect, and a Big Bang forces you to commit to a design before you have feedback from real users in production.

The other problem is risk accumulation. Every week of development before the first release is another week of untested assumptions. By the time you ship, the gap between what the team built and what users actually need has been widening for months. And there’s no incremental path back. It’s all or nothing.

The team was a few weeks into this approach when it became clear the timeline was slipping. The new rules engine was more complex than anticipated, and the old one was still running in production with new requirements still being added to it.

A Different Framing

The question I asked was: what is the smallest version of the new system that could run alongside the old one and handle real cases?

Not a full replacement, but a coexistence. The old engine keeps running for everything it already handles. The new engine takes on new cases incrementally, starting with internal users and then expanding. At some point, the new engine handles enough that the old one can be retired. The “official” release isn’t a launch. It’s a cleanup.

This changes the risk profile entirely. Instead of one high-stakes deployment, you have a series of small, observable steps. Each one either works or it doesn’t, and you learn either way.

The DSL Approach

The technical mechanism that made this possible was a simple domain-specific language (DSL): a configuration format that described rules in terms the business understood, without coupling to the internals of either the old or the new system.

The key constraint I set: the DSL had to map directly onto existing database structures. No new schema migrations, no parallel data model. The rules describe what to evaluate and how to combine the results, but the underlying data they operate on is the same data the old system was already using.

This meant the two systems could run simultaneously without fighting over the database. A request could be evaluated by both engines, their outputs compared, and discrepancies logged. When the new engine’s output matched the old one consistently, confidence was high enough to route real traffic through it.
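The dual-evaluation step is essentially a shadow-mode wrapper. The sketch below assumes both engines can be called as plain functions on a request; the names (`decide`, `route_to_new`) and the logging details are mine, not the original system's.

```python
import logging

logger = logging.getLogger("rules.shadow")

def decide(request, old_engine, new_engine, route_to_new=False):
    """Evaluate a request with both engines, log any discrepancy,
    and keep serving the old engine's answer until confidence is earned."""
    old_result = old_engine(request)
    try:
        new_result = new_engine(request)
    except Exception:
        # The shadow must never take down production.
        logger.exception("new engine failed for %r", request)
        return old_result
    if new_result != old_result:
        logger.warning("mismatch for %r: old=%r new=%r",
                       request, old_result, new_result)
    return new_result if route_to_new else old_result
```

Flipping `route_to_new` is the cutover: by the time you flip it, the mismatch log has been quiet long enough that the change should be invisible.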

Iterating in Production

The first users of the new engine were internal: the team, then other employees, then a small beta group. Not because the code wasn’t ready for real users, but because the feedback loop from internal users is faster. They can describe what broke and why. They file tickets rather than churning.

By the time the new engine was handling external traffic, most of the edge cases had already surfaced and been resolved. The experience was boring in the best possible way.

The total time from “here is the DSL design” to “new engine handling production traffic” was about two weeks. The “official” final release, retiring the last pieces of the old engine, was a non-event. Nobody sent an announcement. There was nothing to announce.

What This Approach Costs

It’s worth being honest about the tradeoffs. Running two systems simultaneously is more complex than running one. The DSL adds an abstraction layer that needs to be understood and maintained. The comparison logging creates noise. And the incremental approach requires discipline. It’s tempting to keep adding things to the new engine before cutting over the old one, and that way lies scope creep.

The other cost is that this approach requires organizational patience. A Big Bang has a clear end date, even if it is unreliable. An incremental rollout has a fuzzier timeline, even if it’s safer. Some teams find that ambiguity uncomfortable.

But I’ve never seen a Big Bang rewrite go as planned. I’ve seen the side-by-side approach succeed more times than I can count. The tradeoffs are real, but they’re manageable. The alternative’s tradeoffs are real too, and they’re not manageable.
