Insights · April 2026 · 6 min read

Why most trading systems fail

Most trading systems don't fail because someone got the signal wrong. They fail because of how they were put together. Everyone stares at the strategy. The damage is usually somewhere else — down in the wiring, where nobody's looking.

When a system starts losing money, the first place people look is the strategy: the model, the features, the backtest that looked so clean. I've sat through those post-mortems. The signal is almost never the culprit. The money leaks at the seams — the handoffs between components, the moment two parts of the system stop agreeing about what's true, the failure nobody simulated because it never showed up in the historical data.

The seams are the system

A trading system isn't a model with some plumbing bolted on. It's a chain of independent parts that have to agree with each other in real time. Market data comes in. A signal gets generated. An order goes out. A fill comes back. The position updates. Risk gets evaluated. That loop runs through at least five separate components, and every one of them keeps its own state and breaks in its own way.

Every join between those parts hides an assumption. The signal engine assumes the position it's reading is current. The execution layer assumes the order ID it just minted is unique. The risk layer assumes a fill arrives as one clean event. On a calm day all of that holds, and you'd never know the assumptions were there. Then a venue times out, or a retry lands a second copy of an order, or a socket reconnects and dumps a thousand buffered messages onto the bus at once — and the assumptions give way without making a sound.

Passing your tests doesn't prove much here. It mostly means you haven't hit the thing that breaks it yet.

State consistency is not optional

The failure I run into most is position drift. Your internal state says you're +10. The exchange thinks you're +12. The system keeps making decisions on +10. A couple of minutes later the exchange says +15, the gap has grown from 2 to 5, and you're carrying half again as much as you think you are. The risk layer, reading that same wrong internal number, has no idea anything's off.

Calling that a bug undersells it — it's a hole in the design. Position state gets touched by at least three things: the execution layer that emits fills, the store that adds them up, and the risk layer that reads the total. If you never decided how those three stay in agreement, they won't. And the gap doesn't stay small. It compounds, quietly, until someone notices the number is wrong.
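As a rough illustration of deciding how those parts stay in agreement, here is a minimal sketch of a position store that compares itself against the venue's figure and halts on a mismatch instead of trading on. The names PositionStore, reconcile, and PositionDriftError are hypothetical, the store is in-memory only, and a zero tolerance is an assumption rather than a recommendation.

```python
from dataclasses import dataclass


class PositionDriftError(Exception):
    """Raised when the internal and venue positions disagree beyond tolerance."""


@dataclass
class PositionStore:
    # Hypothetical in-memory store; a real one would be durable and single-writer.
    position: int = 0
    halted: bool = False

    def apply_fill(self, signed_qty: int) -> None:
        # Refuse to keep accumulating on top of a number known to be wrong.
        if self.halted:
            raise PositionDriftError("store is halted pending reconciliation")
        self.position += signed_qty

    def reconcile(self, venue_position: int, tolerance: int = 0) -> int:
        """Compare against the venue's record; halt rather than keep trading."""
        drift = venue_position - self.position
        if abs(drift) > tolerance:
            self.halted = True
            raise PositionDriftError(
                f"internal={self.position} venue={venue_position} drift={drift}"
            )
        return drift


store = PositionStore()
store.apply_fill(10)  # internal state: +10
try:
    store.reconcile(venue_position=12)  # the venue says +12
except PositionDriftError as exc:
    drift_message = str(exc)
```

The point of the sketch is the halt: once the two numbers disagree, the store stops accepting fills until something resolves the difference, so the gap cannot compound unnoticed.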

In the execution path, strong consistency isn't a preference. It's the whole job. You can afford to be eventually consistent in the places where stale data only costs you a slightly out-of-date dashboard — monitoring, reporting, the analytics nobody trades off. Blur that line, let the staleness drift into a path that moves capital, and it stops being a visibility problem and starts being a money problem.
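One crude way to keep that line from blurring is to give the two kinds of reader different functions. The sketch below is a staleness guard, not strong consistency itself: the dashboard reader takes whatever it is given, while the execution reader refuses a snapshot older than a bound. PositionSnapshot, read_for_execution, and the 50 ms bound are all assumptions made for illustration, and the timestamps are assumed to come from one shared clock.

```python
from dataclasses import dataclass


class StaleStateError(Exception):
    """Raised when the execution path is handed state that is too old to act on."""


@dataclass(frozen=True)
class PositionSnapshot:
    quantity: int
    as_of: float  # seconds on a clock shared by writer and reader (assumed)


def read_for_dashboard(snapshot: PositionSnapshot) -> int:
    # Monitoring tolerates staleness: the cost is an out-of-date chart.
    return snapshot.quantity


def read_for_execution(
    snapshot: PositionSnapshot, now: float, max_age_s: float = 0.05
) -> int:
    # The path that moves capital refuses to act on an old number.
    age = now - snapshot.as_of
    if age > max_age_s:
        raise StaleStateError(f"snapshot is {age:.3f}s old, limit {max_age_s:.3f}s")
    return snapshot.quantity


snapshot = PositionSnapshot(quantity=10, as_of=100.0)
dashboard_value = read_for_dashboard(snapshot)
fresh_value = read_for_execution(snapshot, now=100.01)
try:
    read_for_execution(snapshot, now=100.2)
    stale_rejected = False
except StaleStateError:
    stale_rejected = True
```

Passing the current time in explicitly keeps the guard testable; in a live system it would come from the same monotonic source that stamps the snapshot.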

Risk as a check is a failure mode

A pre-trade risk check fires once, per order. It looks at that order on its own — size inside the limit, notional under the cap, instrument allowed — waves it through, and considers its work done.

The trouble is that the next order shows up 30 milliseconds later, and another behind it. Each one passes its own check in isolation, and together they sail straight past the limit those checks were meant to defend — because no single check knows about the orders firing alongside it, or the position being updated underneath it, or the exposure stacking up across venues while it runs.

That's the gap between a check and a layer. A check catches the obvious mistake right in front of it. A layer holds a rule across the whole system, over time, with memory. You find out which one you actually built on the day you need to pull the system out of the market — when the check happily lets ten orders through because each one looked fine on its own, and a real risk layer would have slammed the door after the second.
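The difference can be sketched in a few lines. Below, stateless_check looks at one order at a time, while a hypothetical RiskLayer remembers the position and the orders still in flight and approves against the worst case. The class, its limits, and the quantities are illustrative assumptions, not a production design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: str
    signed_qty: int  # positive = buy, negative = sell


def stateless_check(order: Order, max_order_qty: int) -> bool:
    """A check: judges one order in isolation, with no memory."""
    return abs(order.signed_qty) <= max_order_qty


class RiskLayer:
    """A layer: holds the limit across orders by tracking in-flight exposure."""

    def __init__(self, max_order_qty: int, max_position: int) -> None:
        self.max_order_qty = max_order_qty
        self.max_position = max_position
        self.position = 0
        self.in_flight: dict[str, int] = {}

    def approve(self, order: Order) -> bool:
        if not stateless_check(order, self.max_order_qty):
            return False
        pending_buys = sum(q for q in self.in_flight.values() if q > 0)
        pending_sells = sum(q for q in self.in_flight.values() if q < 0)
        # Worst case on each side: every pending order on that side fills.
        worst_long = self.position + pending_buys + max(order.signed_qty, 0)
        worst_short = self.position + pending_sells + min(order.signed_qty, 0)
        if worst_long > self.max_position or worst_short < -self.max_position:
            return False
        self.in_flight[order.order_id] = order.signed_qty
        return True

    def settle(self, order_id: str, filled_qty: int) -> None:
        # Called when an order is done: move what filled into the position.
        self.in_flight.pop(order_id, None)
        self.position += filled_qty


orders = [Order(f"o-{i}", 5) for i in range(10)]
checks_passed = sum(stateless_check(o, max_order_qty=10) for o in orders)

layer = RiskLayer(max_order_qty=10, max_position=10)
approved = [o.order_id for o in orders if layer.approve(o)]
```

With a position limit of 10 and ten orders of 5, the stateless check passes all ten, while the layer approves the first two and rejects the rest.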

Failures are not edge cases

Venues disconnect. Orders get acknowledged and then the fill never comes. A retry quietly produces a duplicate. None of this is exotic. It's the normal weather of running infrastructure across systems you don't control. Design only for the happy path and you haven't built a trading system; you've built a demo that hasn't been caught out yet.

The systems that hold up treat failure as the baseline:

Every order carries a client-side ID that survives a retry.
Every fill is matched to an order intent before it ever touches the position.
Every position update is reconciled against the venue's own record, continuously.
Every component knows what to do when its dependency dies — and the answer is never "keep going and hope."
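The first two rules above can be sketched together. In this minimal example the client-side ID is derived from the order intent, so a retry produces the same ID, and a fill only reaches the position after it has been matched to a known intent and checked for duplicate delivery. client_order_id, OrderIntent, FillBook, and the ID scheme are hypothetical; real venues impose their own ID formats and fill semantics.

```python
import hashlib
from dataclasses import dataclass, field


def client_order_id(strategy: str, intent_seq: int) -> str:
    """Deterministic ID: the same intent yields the same ID on every retry."""
    digest = hashlib.sha256(f"{strategy}:{intent_seq}".encode()).hexdigest()
    return digest[:16]


@dataclass
class OrderIntent:
    client_id: str
    signed_qty: int
    filled: int = 0


@dataclass
class FillBook:
    intents: dict = field(default_factory=dict)
    seen_fill_ids: set = field(default_factory=set)
    position: int = 0

    def register(self, intent: OrderIntent) -> None:
        # A retried submit with the same ID is a no-op, not a second intent.
        self.intents.setdefault(intent.client_id, intent)

    def on_fill(self, fill_id: str, client_id: str, signed_qty: int) -> bool:
        if fill_id in self.seen_fill_ids:
            return False  # duplicate delivery: ignore it
        intent = self.intents.get(client_id)
        if intent is None:
            raise LookupError(f"fill {fill_id} matches no known intent {client_id}")
        if abs(intent.filled + signed_qty) > abs(intent.signed_qty):
            raise ValueError(f"fill {fill_id} would overfill {client_id}")
        # Only a matched, first-seen fill is allowed to touch the position.
        self.seen_fill_ids.add(fill_id)
        intent.filled += signed_qty
        self.position += signed_qty
        return True


book = FillBook()
cid = client_order_id("meanrev", 1)
retry_cid = client_order_id("meanrev", 1)  # same intent, retried
book.register(OrderIntent(cid, 10))
book.register(OrderIntent(retry_cid, 10))

first = book.on_fill("f-1", cid, 4)
replay = book.on_fill("f-1", cid, 4)  # the same fill delivered twice
```

An unmatched fill raises instead of being applied, which is the "never keep going and hope" rule in miniature: the component stops and surfaces the disagreement.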

The fix is architectural, not tactical

When something breaks, the reflex is to reach for the nearest patch: handle the symptom, bolt on a check, restart the service, move on. It buys you a week. Then it breaks somewhere else, you patch that too, and a year later the patches have piled up high enough that nobody can hold the whole thing in their head anymore.

The fix that actually lasts is structural, and it's less exciting than it sounds. Draw the boundaries on purpose. Make the seams things you can see, instead of things you discover at 3am. Decide, out loud, where state has to be strong and where eventual is fine. Put the risk layer somewhere it can stop the system, not just narrate it. Build around the failures you already know are coming — and the day they show up stops being an incident and becomes a Tuesday.

Signals are rented. Architecture is owned.

Strategies come and go. So do venues, and the rules you trade under. The system still standing after all that churn is the one built around the things that don't change — the invariants — rather than whatever the setup happened to be the month it was written. That system costs more up front. It's also the only kind I'd want to be running when something finally goes wrong.

Ignacio Montoya is a systems architect specializing in algorithmic trading infrastructure, financial systems, and digital asset platforms. He designs and operates systems where capital, risk, or execution are on the line.