Why most trading systems fail
Most trading systems do not fail because the signal was wrong. They fail because the architecture was wrong. The signal is where the attention goes — the architecture is where the losses hide.
When a trading system loses capital, the first instinct is to look at the strategy. The model. The features. The backtest. But in practice, the places where trading systems bleed are almost never the signal. They are the seams — the transitions between components, the moments where state becomes inconsistent, the failures that were never modeled because they did not show up in the historical data.
The seams are the system
A trading system is not a model. It is a coordinated architecture. Market data arrives. Signals are generated. Orders are sent. Fills come back. Positions update. Risk is checked. The full loop touches at least five independent components, each with its own state and its own failure modes.
Every seam between these components is a place where an assumption lives. The signal engine assumes position state is up to date. The execution layer assumes the order ID is unique. The risk layer assumes fills are atomic. When the system runs smoothly, these assumptions hold. When the system is stressed (a venue times out, a retry goes through after the original order already did, a reconnection floods the event bus), the assumptions break silently.
A system that works in testing is not a system that works. It is a system that has not yet met the conditions that break it.
State consistency is not optional
The most common failure I see is position drift. Internal state says +10. The exchange says +12. The system keeps trading on +10. Two minutes later, the exchange says +15. The firm is now long more than it believes, and the risk layer — which reads from internal state — cannot catch it.
This is not a bug. It is a design omission. Position state is shared between at least three components: the execution layer (which produces fill events), the state store (which integrates them), and the risk layer (which reads them). Without an explicit consistency model, each component is free to drift, and the drift compounds.
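A minimal sketch of the reconciliation step that catches this drift, in Python. It assumes the venue can report a per-instrument position snapshot; the instrument names and the halt-on-any-mismatch policy are illustrative assumptions, not a prescription. What it shows is the comparison itself: internal state is checked against the venue's record, and a mismatch stops trading on that instrument instead of letting it continue on +10.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drift:
    instrument: str
    internal: int
    venue: int


def reconcile(internal: dict[str, int], venue: dict[str, int]) -> list[Drift]:
    """Compare internal positions with the venue's own record.

    Every instrument that appears on either side is checked; a missing
    entry is treated as flat (zero).
    """
    drifts = []
    for instrument in sorted(internal.keys() | venue.keys()):
        ours = internal.get(instrument, 0)
        theirs = venue.get(instrument, 0)
        if ours != theirs:
            drifts.append(Drift(instrument, ours, theirs))
    return drifts


def instruments_to_halt(drifts: list[Drift]) -> set[str]:
    # Illustrative policy (an assumption): any mismatch blocks new orders
    # on that instrument until the discrepancy is resolved.
    return {d.instrument for d in drifts}


# Hypothetical snapshot: internal state says +10, the venue says +12.
internal_positions = {"BTC-PERP": 10, "ETH-PERP": -3}
venue_positions = {"BTC-PERP": 12, "ETH-PERP": -3}
drifts = reconcile(internal_positions, venue_positions)
print(instruments_to_halt(drifts))  # {'BTC-PERP'}
```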
Strong consistency in the execution path is non-negotiable. Eventual consistency can be used for monitoring, reporting, analytics — places where stale data costs visibility but not capital. Mixing them up costs money.
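One way to make the strong/eventual split concrete, sketched under the assumption of a single Python process: the execution and risk paths read positions under the same lock that fills are written under, while monitoring reads a snapshot that is refreshed only when `publish()` is called. `PositionStore` and its methods are hypothetical names; across processes, the same split would need a single writer or a transactional store rather than an in-memory lock.

```python
import threading
from types import MappingProxyType


class PositionStore:
    """Hypothetical store with two read paths.

    - position(): strongly consistent; for execution and risk.
    - published(): eventually consistent copy for dashboards and
      reporting, refreshed only when publish() is called.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._positions: dict[str, int] = {}
        self._published: MappingProxyType = MappingProxyType({})

    def apply_fill(self, instrument: str, signed_qty: int) -> int:
        # Writes and execution-path reads share one lock, so a reader
        # never sees a half-applied update.
        with self._lock:
            new = self._positions.get(instrument, 0) + signed_qty
            self._positions[instrument] = new
            return new

    def position(self, instrument: str) -> int:
        with self._lock:
            return self._positions.get(instrument, 0)

    def publish(self) -> None:
        # Read-only snapshot for monitoring; it lags live state between calls.
        with self._lock:
            self._published = MappingProxyType(dict(self._positions))

    def published(self) -> MappingProxyType:
        return self._published


store = PositionStore()
store.apply_fill("BTC-PERP", 10)
store.publish()
store.apply_fill("BTC-PERP", 2)
# Risk must see 12; a dashboard showing 10 for a moment costs only visibility.
print(store.position("BTC-PERP"), store.published()["BTC-PERP"])  # 12 10
```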
Risk as a check is a failure mode
A pre-trade risk check runs once. It validates the order in isolation: size within limits, notional under cap, instrument allowed. It passes. The order executes. The check's job is done.
But the next order arrives 30 ms later. And the order after that. Each passes its own check. Cumulatively, they breach the limit the checks were supposed to enforce, because the check does not know about the orders that just happened, the positions currently being updated, or the exposure building up in parallel across venues.
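The cumulative breach is easy to reproduce. In this sketch, the limits and the `Order` type are hypothetical: a stateless per-order check accepts ten orders of 4 units each, every one within the per-order limit of 5, while the 10-unit position limit the check was meant to protect is exceeded fourfold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    instrument: str
    qty: int  # signed: positive buys, negative sells


MAX_ORDER_QTY = 5  # hypothetical per-order limit
MAX_POSITION = 10  # hypothetical position limit the check is meant to protect


def pre_trade_check(order: Order) -> bool:
    # Validates the order in isolation: it never sees the current position
    # or the orders that are still in flight.
    return abs(order.qty) <= MAX_ORDER_QTY


orders = [Order("BTC-PERP", 4) for _ in range(10)]
accepted = [o for o in orders if pre_trade_check(o)]
exposure = sum(o.qty for o in accepted)
print(len(accepted), exposure)  # 10 40: every order passed, the limit of 10 did not hold
```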
A check catches obvious errors. A layer enforces invariants. The difference shows up on the day the system needs to be stopped — when the check lets ten orders through in a row because each one passed, and the layer would have stopped after the second.
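By contrast, here is a minimal sketch of a risk layer that holds state, assuming a single process and hypothetical names (`RiskLayer`, `try_reserve`, `on_fill`, `on_cancel`). Each accepted order reserves worst-case exposure until it fills or is cancelled, so later orders are checked against the current position plus everything still in flight. With the same ten orders of 4 units against a limit of 10, it stops after the second.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    instrument: str
    qty: int  # signed: positive buys, negative sells


class RiskLayer:
    """Hypothetical stateful limit on worst-case position per instrument.

    Accepted orders reserve exposure until they fill or are cancelled,
    so an order arriving 30 ms after another one sees it.
    """

    def __init__(self, max_position: int) -> None:
        self._max = max_position
        self._lock = threading.Lock()
        self._position: dict[str, int] = {}
        # instrument -> (open buy qty, open sell qty), both non-negative
        self._open: dict[str, tuple[int, int]] = {}

    def try_reserve(self, order: Order) -> bool:
        with self._lock:
            pos = self._position.get(order.instrument, 0)
            buys, sells = self._open.get(order.instrument, (0, 0))
            if order.qty > 0:
                buys += order.qty
            else:
                sells -= order.qty
            # Worst case: every open buy fills, or every open sell fills.
            if pos + buys > self._max or pos - sells < -self._max:
                return False  # reject without touching state
            self._open[order.instrument] = (buys, sells)
            return True

    def _release(self, instrument: str, signed_qty: int) -> None:
        buys, sells = self._open.get(instrument, (0, 0))
        if signed_qty > 0:
            buys -= signed_qty
        else:
            sells += signed_qty
        self._open[instrument] = (buys, sells)

    def on_fill(self, instrument: str, signed_qty: int) -> None:
        # Filled quantity moves from the open reservation into position.
        with self._lock:
            self._release(instrument, signed_qty)
            self._position[instrument] = self._position.get(instrument, 0) + signed_qty

    def on_cancel(self, instrument: str, signed_unfilled_qty: int) -> None:
        with self._lock:
            self._release(instrument, signed_unfilled_qty)


layer = RiskLayer(max_position=10)
results = [layer.try_reserve(Order("BTC-PERP", 4)) for _ in range(10)]
print(results.count(True))  # 2: a third order would put worst-case exposure at 12
```

Tracking open buys and sells separately is a deliberate choice in this sketch: netting them would let a pending sell mask a pending buy, even though either could fill on its own.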
Failures are not edge cases
Exchanges disconnect. Orders get acknowledged but fills never arrive. Retries produce duplicates. These are not edge cases. They are the default condition of distributed trading infrastructure. A system designed around the happy path is not a system; it is a prototype.
The systems that survive assume failure as the baseline:
- Every order has a client-side ID that survives retry.
- Every fill is matched against an order intent before being applied to state.
- Every position update is reconciled against the venue's own record, continuously.
- Every component has a defined behavior when its dependency fails — and that behavior is never "keep going and hope."
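The first two rules can be sketched together. This assumes the venue ignores or rejects a repeated client order ID; FIX calls this field `ClOrdID`, but whether and how a given venue deduplicates on it has to be checked against that venue's documentation. `FakeVenue`, `OrderIntent`, and `FillMatcher` are hypothetical stand-ins. The ID is generated once per intent, so a retry after a timeout carries the same ID. A fill is applied to positions only if it matches a known intent on the same instrument and side without overfilling it; anything else is set aside for reconciliation.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class OrderIntent:
    instrument: str
    qty: int  # signed: positive buys, negative sells
    # Generated once when the intent is created; every retry reuses it.
    client_order_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    filled: int = 0


@dataclass(frozen=True)
class Fill:
    client_order_id: str
    instrument: str
    qty: int  # signed


class FakeVenue:
    """Hypothetical venue that ignores a repeated client order ID."""

    def __init__(self) -> None:
        self.accepted: dict[str, OrderIntent] = {}

    def submit(self, intent: OrderIntent) -> str:
        if intent.client_order_id in self.accepted:
            return "duplicate-ignored"
        self.accepted[intent.client_order_id] = intent
        return "accepted"


class FillMatcher:
    def __init__(self) -> None:
        self._intents: dict[str, OrderIntent] = {}
        self.unmatched: list[Fill] = []  # set aside for reconciliation, never applied

    def register(self, intent: OrderIntent) -> None:
        self._intents[intent.client_order_id] = intent

    def apply(self, fill: Fill, positions: dict[str, int]) -> bool:
        intent = self._intents.get(fill.client_order_id)
        matches = (
            intent is not None
            and intent.instrument == fill.instrument
            and (fill.qty > 0) == (intent.qty > 0)  # same side
            and abs(intent.filled + fill.qty) <= abs(intent.qty)  # no overfill
        )
        if not matches:
            self.unmatched.append(fill)
            return False
        intent.filled += fill.qty
        positions[fill.instrument] = positions.get(fill.instrument, 0) + fill.qty
        return True


venue = FakeVenue()
matcher = FillMatcher()
intent = OrderIntent("BTC-PERP", 2)
matcher.register(intent)
print(venue.submit(intent))  # accepted
print(venue.submit(intent))  # duplicate-ignored: the retry carries the same ID

positions: dict[str, int] = {}
matcher.apply(Fill(intent.client_order_id, "BTC-PERP", 2), positions)  # applied
matcher.apply(Fill(intent.client_order_id, "BTC-PERP", 2), positions)  # overfill: held back
print(positions)  # {'BTC-PERP': 2}
```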
The fix is architectural, not tactical
When a trading system fails, the instinct is tactical: patch the symptom, add a check, restart the service. This works for a while. Then the system fails in a different place, and the patches accumulate until the architecture can no longer be reasoned about.
The durable fix is architectural. Draw the boundaries. Make the seams explicit. Decide where state is strong and where it is eventual. Place the risk layer where it can enforce, not observe. Design the system around the failure modes it will actually encounter — and when those failures arrive, the system handles them without losing capital.
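Making the seams explicit can be as literal as a table. The sketch below, with hypothetical component names and reactions, maps each (component, dependency) pair to a declared behavior and treats a missing entry as a design error rather than a reason to keep going. `missing_policies` could run at startup to list the seams nobody has decided on yet.

```python
from enum import Enum


class OnFailure(Enum):
    HALT_NEW_ORDERS = "halt_new_orders"
    CANCEL_ALL_AND_HALT = "cancel_all_and_halt"
    RECONCILE_BEFORE_RESUME = "reconcile_before_resume"
    MARK_STALE = "mark_stale"  # only where staleness costs visibility, not capital


# Hypothetical seam map: (component, dependency) -> declared reaction.
FAILURE_POLICY: dict[tuple[str, str], OnFailure] = {
    ("signal_engine", "market_data"): OnFailure.HALT_NEW_ORDERS,
    ("execution", "venue_gateway"): OnFailure.RECONCILE_BEFORE_RESUME,
    ("execution", "risk_layer"): OnFailure.CANCEL_ALL_AND_HALT,
    ("monitoring", "position_snapshot"): OnFailure.MARK_STALE,
}


class UndefinedSeam(Exception):
    """Raised when a dependency failure has no declared behavior."""


def reaction(component: str, dependency: str) -> OnFailure:
    try:
        return FAILURE_POLICY[(component, dependency)]
    except KeyError:
        # An unmodeled seam is a design error, not a reason to keep going.
        raise UndefinedSeam(f"{component}: no policy for {dependency} failure") from None


def missing_policies(dependencies: dict[str, list[str]]) -> list[tuple[str, str]]:
    """List (component, dependency) pairs that have no declared reaction."""
    return [
        (component, dep)
        for component, deps in dependencies.items()
        for dep in deps
        if (component, dep) not in FAILURE_POLICY
    ]


print(reaction("execution", "venue_gateway"))  # OnFailure.RECONCILE_BEFORE_RESUME
print(missing_policies({"execution": ["venue_gateway", "clock"]}))  # [('execution', 'clock')]
```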
Signals are rented. Architecture is owned.
Strategies change. Venues change. Regulations change. The system that survives these changes is the one that was built around invariants rather than around the current setup. That system is not cheaper to build. But it is the only kind that lasts.
Ignacio Montoya is a systems architect specializing in algorithmic trading infrastructure, financial systems, and digital asset platforms. He designs and operates systems where capital, risk, or execution are on the line.
If this describes a system you are building or operating, the conversation starts here.