Building a multi-exchange trading system

A multi-exchange trading system looks like a trading problem. Underneath, it's a distributed systems problem wearing a trading costume. Order routing — the part everyone starts with — is the easy bit. What actually breaks is holding one consistent view of the world across venues that don't agree with each other.

The pitch is simple enough: trade the same instrument across five venues, take the best price, never miss a fill. The architecture that actually delivers it is not. The moment you connect a second exchange, you've stopped building a trading system. By the time you've connected all five, you're building something that has to reconcile five independent sources of truth, and each one has its own latency, its own outages, and its own opinion about what your position is. They rarely agree.

The five components

Every serious multi-exchange system I've worked on ends up with the same five parts. It's not fashion. Each one exists to absorb a specific failure mode the others can't.

  • Exchange connectors translate between your internal order model and whatever each venue's API demands — authentication, rate limits, message formats, and the reconnection logic you'll spend more time on than you'd guess.
  • A smart order router decides where each order goes based on price, liquidity, fees, and latency, and splits the order when splitting comes out cheaper.
  • A unified state store folds fills from every venue into one global view of positions and balances.
  • A risk layer enforces limits and invariants across the whole system, not venue by venue.
  • A reconciliation service keeps checking internal state against what each exchange is actually reporting.

Drop any one of these and the system still demos beautifully. Then it fails in production — quietly, at the exact seam where the missing piece should have been.

The router is not the hard part

Smart order routing gets all the attention because it's legible. Best price, lowest fee, deepest book: a clean optimization with a satisfying answer. You can build a router in an afternoon that picks the right venue, and honestly it'll feel like you've cracked the whole thing.
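
To show how small that afternoon's work really is, here is a minimal sketch of a buy-side router. It assumes each venue exposes only a top-of-book ask, a size, and a taker fee in basis points; the Quote type, the venue names, and the fee values are all hypothetical, and a real router would also weigh latency and deeper book levels.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Quote:
    venue: str
    ask: Decimal            # best ask price for the instrument
    ask_size: Decimal       # quantity available at that price
    taker_fee_bps: Decimal  # taker fee in basis points (illustrative values)


def effective_buy_price(q: Quote) -> Decimal:
    """Ask price plus the taker fee: what one unit actually costs."""
    return q.ask * (Decimal(1) + q.taker_fee_bps / Decimal(10_000))


def route_buy(quotes: list[Quote], qty: Decimal) -> list[tuple[str, Decimal]]:
    """Fill qty greedily from the cheapest venue upward, splitting if needed."""
    plan: list[tuple[str, Decimal]] = []
    remaining = qty
    for q in sorted(quotes, key=effective_buy_price):
        if remaining <= 0:
            break
        take = min(remaining, q.ask_size)
        if take > 0:
            plan.append((q.venue, take))
            remaining -= take
    return plan


quotes = [
    Quote("A", Decimal("100.00"), Decimal("2"), Decimal("10")),
    Quote("B", Decimal("100.05"), Decimal("3"), Decimal("0")),
    Quote("C", Decimal("99.98"), Decimal("1"), Decimal("20")),
]
plan = route_buy(quotes, Decimal("4"))
```

Venue C shows the lowest raw ask, but after fees B is cheapest, so the plan takes 3 units from B and 1 from A. Note what the sketch leaves out: it says nothing about what happens once those two child orders leave the building.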

Then the order leaves, and that's where it gets interesting. The venue acks it but the fill is delayed. The connection drops mid-order and you've no idea whether it landed. You retry, the retry succeeds — and so did the original, it turns out. Now you're long twice what you meant to be, on a venue your state store hasn't heard from in 800 milliseconds. The router did its job perfectly. You still lost money.
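
The double-fill scenario above can be sketched in a few lines. This assumes the venue deduplicates on a client-supplied order ID, which many venues offer but which has to be verified per venue; FakeVenue and submit_with_retry are hypothetical names used only for illustration.

```python
import uuid


class FakeVenue:
    """Stand-in for an exchange that dedupes on client order ID (an assumption)."""

    def __init__(self, drop_first_ack: bool = False):
        self.orders: dict[str, float] = {}
        self._drop_next_ack = drop_first_ack

    def submit(self, client_order_id: str, qty: float) -> str:
        # Record the order only if this ID has not been seen before.
        self.orders.setdefault(client_order_id, qty)
        if self._drop_next_ack:
            # The order landed, but the caller never hears about it.
            self._drop_next_ack = False
            raise TimeoutError("connection dropped before ack")
        return client_order_id


def submit_with_retry(venue: FakeVenue, qty: float, max_attempts: int = 3) -> str:
    # The ID is generated once, outside the retry loop, so every attempt
    # refers to the same logical order.
    client_order_id = uuid.uuid4().hex
    for _ in range(max_attempts):
        try:
            return venue.submit(client_order_id, qty)
        except TimeoutError:
            continue
    raise RuntimeError("order state unknown after retries; reconcile before resubmitting")


venue = FakeVenue(drop_first_ack=True)
order_id = submit_with_retry(venue, 10.0)
```

If the ID were generated inside the loop instead, the retry would register as a second order and the venue would hold twice the intended quantity.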

Routing decides intent. Everything that happens after intent is where the real engineering lives.

One position, five opinions

The core problem with a multi-exchange system is that there's no single position to point at. There are five venue-level positions and one number you wish they added up to. By default, they don't. They only add up if you build them to.

Each venue reports fills on its own schedule. Messages show up out of order. A reconnection replays events you already processed. Without a deliberate consistency model — unified order IDs, event sequencing, fills you can apply twice without harm — the global position walks away from reality one message at a time. The nasty part is how quiet it is. Nothing alarms. The number just stops being true, and you find out later than you'd like.
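
One way to get fills you can apply twice without harm is to key every fill on the venue's own execution ID and ignore repeats. The sketch below assumes each venue assigns an ID that is unique within that venue; Fill and PositionStore are hypothetical names, and a production store would persist the seen-set rather than hold it in memory.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Fill:
    venue: str
    fill_id: str   # venue-assigned execution ID, assumed unique per venue
    symbol: str
    qty: Decimal   # signed: positive = bought, negative = sold


class PositionStore:
    def __init__(self) -> None:
        self._seen: set[tuple[str, str]] = set()
        self._by_venue: dict[tuple[str, str], Decimal] = {}

    def apply(self, fill: Fill) -> bool:
        """Apply a fill once; return False if it was a replayed duplicate."""
        key = (fill.venue, fill.fill_id)
        if key in self._seen:
            return False
        self._seen.add(key)
        pos_key = (fill.venue, fill.symbol)
        self._by_venue[pos_key] = self._by_venue.get(pos_key, Decimal(0)) + fill.qty
        return True

    def venue_position(self, venue: str, symbol: str) -> Decimal:
        return self._by_venue.get((venue, symbol), Decimal(0))

    def global_position(self, symbol: str) -> Decimal:
        return sum(
            (qty for (_, sym), qty in self._by_venue.items() if sym == symbol),
            Decimal(0),
        )


store = PositionStore()
f1 = Fill("A", "a-1", "BTC-USD", Decimal("1.5"))
f2 = Fill("A", "a-2", "BTC-USD", Decimal("-0.5"))
f3 = Fill("B", "b-1", "BTC-USD", Decimal("2"))
# Fills arrive out of order, and a reconnection replays f1 and f3.
results = [store.apply(f) for f in (f3, f2, f1, f1, f3)]
```

Because the position is a sum of signed quantities, arrival order does not change the result, and the seen-set makes replay harmless. State that does depend on order (order status, average price under some conventions) would additionally need the venue's sequence numbers.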

More monitoring won't save you here. What saves you is deciding, out loud and on purpose, that the execution path is strongly consistent and the reporting path is allowed to be eventual — and then never, ever letting the risk layer read from the eventual side.

Reconciliation is a service, not a script

Most teams treat reconciliation as an end-of-day chore — a script that compares the books and emails a diff. In a system that trades continuously across venues, end-of-day is roughly twelve hours too late. By the time that script runs, you've already traded thousands of times on a position you believed and the venue never confirmed.

Reconciliation has to run continuously, and it has to know which side wins. The venue is the source of truth for what actually filled, full stop. When internal state and venue state disagree, the system adopts the venue's figure automatically, right away, and loudly enough that a human knows it happened instead of finding out next week.
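
A single reconciliation pass for one venue might look like the sketch below. It assumes positions are available as symbol-to-quantity mappings on both sides; the reconcile function and the logger name are hypothetical, and a real service would also have to account for orders in flight at the moment the venue snapshot was taken.

```python
import logging
from decimal import Decimal

log = logging.getLogger("reconciler")


def reconcile(
    internal: dict[str, Decimal],
    venue_reported: dict[str, Decimal],
    venue: str,
) -> dict[str, Decimal]:
    """Make internal match the venue; return the correction applied per symbol."""
    corrections: dict[str, Decimal] = {}
    # The union is a new set, so mutating internal inside the loop is safe.
    for symbol in internal.keys() | venue_reported.keys():
        ours = internal.get(symbol, Decimal(0))
        theirs = venue_reported.get(symbol, Decimal(0))
        if ours != theirs:
            corrections[symbol] = theirs - ours
            internal[symbol] = theirs  # the venue wins
            log.warning(
                "position drift on %s %s: internal=%s venue=%s, adopted venue value",
                venue, symbol, ours, theirs,
            )
    return corrections


internal = {"BTC-USD": Decimal("2"), "ETH-USD": Decimal("5")}
reported = {"BTC-USD": Decimal("1")}
corrections = reconcile(internal, reported, venue="A")
```

Run on a short interval per venue, the returned corrections double as a drift metric: a venue whose corrections are never empty is telling you something about its connector.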

Design for the venue that disappears

A single-exchange system can pretend the exchange is always there. A multi-exchange system doesn't get to. Run five venues and at any given moment one of them is degraded, rate-limiting you, or flat-out down. That's not an incident you write up. That's just the normal weather of running this thing.

The systems that survive treat a venue going dark as an input, not an exception:

  • Every order carries a client-side ID that survives reconnection and retry, so you can actually spot a duplicate when it happens.
  • Every venue has a defined degraded mode — route around it, not through it — and the router does that on its own, no human in the loop.
  • Every fill gets reconciled against the venue's own record before the risk layer is allowed to trust it.
  • You can flatten the position on one venue without touching the other four.
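
As one illustration of a defined degraded mode, the sketch below tracks heartbeats and rate-limit windows per venue and hands the router only the venues it may use. VenueHealth, routable_venues, and the two-second staleness threshold are hypothetical choices, not values from any particular venue.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VenueHealth:
    stale_after_s: float = 2.0  # illustrative threshold
    last_heartbeat: dict[str, float] = field(default_factory=dict)
    rate_limited_until: dict[str, float] = field(default_factory=dict)

    def heartbeat(self, venue: str, now: Optional[float] = None) -> None:
        self.last_heartbeat[venue] = time.monotonic() if now is None else now

    def rate_limit(self, venue: str, until: float) -> None:
        self.rate_limited_until[venue] = until

    def is_routable(self, venue: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        seen = self.last_heartbeat.get(venue)
        # Never heard from, or silent for too long: treat as dark.
        if seen is None or now - seen > self.stale_after_s:
            return False
        return now >= self.rate_limited_until.get(venue, float("-inf"))


def routable_venues(
    venues: list[str], health: VenueHealth, now: Optional[float] = None
) -> list[str]:
    """Filter the venue list before the router ever sees it."""
    return [v for v in venues if health.is_routable(v, now)]


health = VenueHealth(stale_after_s=2.0)
health.heartbeat("A", now=100.0)
health.heartbeat("B", now=97.0)    # stale by the time we route
health.heartbeat("C", now=100.0)
health.rate_limit("C", until=105.0)
usable = routable_venues(["A", "B", "C", "D"], health, now=100.5)
```

Here only venue A is usable: B has gone quiet, C is rate-limited, and D has never reported in. The filter runs ahead of routing, so routing around a dark venue needs no human decision.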

What you are actually buying

When a firm asks for a multi-exchange trading system, they think they're buying access to more liquidity. What they're really buying is the ability to hold one coherent position across venues that will never, ever coordinate on their behalf. The liquidity is the feature people ask for. The coherence is the system you actually have to build.

Routing is an optimization. Consistency is an architecture.

Anyone can connect to five exchanges. The line between a prototype and a real system is whether it still tells you the truth about your position when one of those five goes quiet. You can't bolt that property on later. It's the thing you design around first, or it never shows up at all.

Ignacio Montoya is a systems architect specializing in algorithmic trading infrastructure, financial systems, and digital asset platforms. He designs and operates systems where capital, risk, or execution are on the line — including multi-venue execution layers with unified state and continuous reconciliation.
