Market data ingestion: the foundation nobody respects

Market data ingestion is the part of a trading system that receives, normalizes, and distributes price and order-book data from venues, and turns a stream of exchange messages into a single consistent view of the market. It is the foundation everything stands on — and it is treated as plumbing right up until the moment it is silently wrong and the whole system is trading on a market that is not there.

Nobody puts their ingestion layer in the demo. The attention goes to the signal, the strategy, the clever execution. But all of those read from the same place: the internal picture of the market that ingestion builds for them. If that picture is stale, gapped, or just quietly inconsistent, the smartest signal in the world is reasoning from a false premise. "Garbage in, garbage out" sounds like a cliché until you watch it happen. Here it's just the failure mode, plainly stated.

The job is harder than "receive the feed"

On paper it reads like a solved problem: connect to the websocket, parse the messages, done. The hard part is everything the happy path quietly skips.

  • Normalization — every venue speaks its own dialect: different message formats, different symbol naming, different update semantics. Downstream, everyone wants one model that hides all of that.
  • Sequencing — updates show up out of order, or with holes in them, and you still have to rebuild the book correctly.
  • Gap detection — miss one message and your book is wrong. The system has to notice and resync rather than plow ahead pretending nothing happened.
  • Backpressure — when data comes in faster than you can chew through it, you want the system to degrade on purpose, not silently fall behind.
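To make the normalization item concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the two venues ("venue_a", "venue_b"), their message shapes, the symbol map, and the BookUpdate field names are all hypothetical and do not describe any real exchange's API. The point is only that each venue gets its own small adapter and everything downstream sees one type.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class BookUpdate:
    """Venue-neutral book update (hypothetical field names)."""
    venue: str
    symbol: str     # internal symbol, e.g. "BTC-USD"
    side: str       # "bid" or "ask"
    price: Decimal
    size: Decimal   # 0 means "remove this price level"
    seq: int


# Hypothetical symbol map: each venue's naming -> one internal name.
SYMBOLS = {
    ("venue_a", "BTCUSD"): "BTC-USD",
    ("venue_b", "btc_usd"): "BTC-USD",
}


def normalize_venue_a(msg: dict) -> BookUpdate:
    # Invented format: {"s": "BTCUSD", "b": True, "p": "100.5", "q": "2", "u": 17}
    return BookUpdate(
        venue="venue_a",
        symbol=SYMBOLS[("venue_a", msg["s"])],
        side="bid" if msg["b"] else "ask",
        price=Decimal(msg["p"]),
        size=Decimal(msg["q"]),
        seq=int(msg["u"]),
    )


def normalize_venue_b(msg: dict) -> BookUpdate:
    # Invented format:
    # {"pair": "btc_usd", "side": "SELL", "price": 100.5, "amount": 0, "seq": 9}
    return BookUpdate(
        venue="venue_b",
        symbol=SYMBOLS[("venue_b", msg["pair"])],
        side="bid" if msg["side"] == "BUY" else "ask",
        # Go through str() so a float price does not drag binary noise into Decimal.
        price=Decimal(str(msg["price"])),
        size=Decimal(str(msg["amount"])),
        seq=int(msg["seq"]),
    )
```

An unknown symbol raises KeyError here on purpose: in this sketch a message that cannot be mapped is treated as an error to surface, not something to pass through half-normalized.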

Every one of those is a spot where your internal view can drift away from the real market. And that's the nasty part: drift in market data doesn't announce itself. The numbers still look like numbers.
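Backpressure is the item on that list where "degrade on purpose" is easiest to show in code. One common approach, offered here as a sketch and not as the only option, is conflation: when the consumer falls behind, keep only the newest quote per symbol and count what was overwritten. The class name and its methods are hypothetical, and the sketch assumes each quote fully replaces the previous one for that symbol. It is not safe for incremental book deltas, where every message matters.

```python
from collections import OrderedDict


class ConflatingBuffer:
    """Holds at most one pending quote per symbol.

    A slow consumer sees fewer, fresher quotes instead of an
    ever-growing backlog of stale ones.
    """

    def __init__(self) -> None:
        self._latest = OrderedDict()  # symbol -> newest unread quote
        self.conflated = 0            # quotes overwritten before being read

    def put(self, symbol: str, quote: dict) -> None:
        if symbol in self._latest:
            # The older quote is discarded unread; count it so the loss is visible.
            self.conflated += 1
        # Assigning to an existing key keeps the symbol's place in line.
        self._latest[symbol] = quote

    def get(self):
        """Return the oldest pending (symbol, quote) pair, or None if empty."""
        if not self._latest:
            return None
        return self._latest.popitem(last=False)
```

The conflated counter is the part that matters most: the loss is deliberate and measurable, so monitoring can alert on it instead of discovering the lag after the fact.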

The order book is a reconstruction, not a fact

Most venues don't hand you a continuously maintained order book. They hand you a snapshot and then a stream of changes to it, and you keep the book alive locally by applying each update in order. So your book is only ever as correct as your sequencing. Apply an update out of order, drop one, or apply a delta that predates the snapshot you started from, and your copy starts drifting from the venue's. That's when you see the tells: a bid sitting above the ask, liquidity that isn't there, prices that were never real.
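A minimal sketch of that reconstruction follows. It assumes a hypothetical feed in which each delta carries an absolute size for one price level (size 0 removes the level) and a single increasing sequence number; real venues differ in the details. LocalBook and its method names are invented for illustration. Gap handling is deliberately left out so the apply logic stays visible.

```python
from decimal import Decimal


class LocalBook:
    """Local copy of one symbol's book, rebuilt from a snapshot plus deltas."""

    def __init__(self) -> None:
        self.bids = {}        # price -> size
        self.asks = {}        # price -> size
        self.last_seq = None  # None until a snapshot has been loaded

    def load_snapshot(self, bids: dict, asks: dict, seq: int) -> None:
        self.bids = dict(bids)
        self.asks = dict(asks)
        self.last_seq = seq

    def apply_delta(self, side: str, price: Decimal, size: Decimal, seq: int) -> bool:
        """Apply one level update; return False if it was ignored."""
        if self.last_seq is None or seq <= self.last_seq:
            # No snapshot yet, or the delta predates what we already hold.
            return False
        levels = self.bids if side == "bid" else self.asks
        if size == 0:
            levels.pop(price, None)  # level removed
        else:
            levels[price] = size     # level added or resized
        self.last_seq = seq
        return True

    def is_crossed(self) -> bool:
        """True when the best bid sits above the best ask."""
        if not self.bids or not self.asks:
            return False
        return max(self.bids) > min(self.asks)
```

A crossed book is a symptom, not a diagnosis. Checking is_crossed() after each update is a cheap sanity test, but a book can be wrong without ever crossing.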

That is why gap detection isn't optional. The moment a sequence number skips, the local book is suspect and can no longer be trusted. A decent ingestion layer catches the jump, stops treating the book as valid, pulls a fresh snapshot, and resyncs. The alternative is feeding a corrupted picture to a signal engine that has no way of knowing it's corrupted.
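The detect-and-resync decision can be sketched as a small state machine. This assumes a hypothetical feed whose sequence numbers are consecutive integers on a single stream; some venues use update-id ranges or per-channel counters instead, and the check would need adapting. SequenceTracker and Action are illustrative names, not any library's API.

```python
from enum import Enum


class Action(Enum):
    APPLY = "apply"    # the next expected message: safe to apply
    DROP = "drop"      # duplicate, or older than what we already hold
    RESYNC = "resync"  # gap or no snapshot: fetch a fresh snapshot


class SequenceTracker:
    def __init__(self) -> None:
        self.last_seq = None
        self.trusted = False  # True only between a snapshot and the next gap

    def on_snapshot(self, seq: int) -> None:
        self.last_seq = seq
        self.trusted = True

    def on_update(self, seq: int) -> Action:
        if not self.trusted:
            return Action.RESYNC  # nothing to trust until a snapshot arrives
        if seq <= self.last_seq:
            return Action.DROP
        if seq != self.last_seq + 1:
            # A message is missing: stop trusting the book until resynced.
            self.trusted = False
            return Action.RESYNC
        self.last_seq = seq
        return Action.APPLY
```

Note that after a gap the tracker keeps answering RESYNC for every later update until on_snapshot is called. It never quietly resumes on its own, which is the behavior the paragraph above asks for.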

Stale data is worse than no data

A system that knows it has no data can stop and wait. A system that thinks it has fresh data, when in fact the feed froze two seconds ago, will keep trading confidently into a market that already moved on without it. An outage isn't really the scary part — you notice an outage. The scary part is the silent staleness that looks exactly like a healthy day at the office.

So ingestion has to carry a sense of liveness right next to the data: how fresh is this, when did it last tick, is this venue even still talking to me. And anything reading from a feed should be able to ask "is this current?" and get an honest answer back. Acting on stale data isn't a gentler version of acting on no data — it's worse, because it shows up wearing the costume of confidence.
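One way to carry that sense of liveness, sketched under assumptions: record when each venue last sent anything and let readers ask whether that is recent enough. FeedLiveness, its methods, and the max_age_s threshold are hypothetical names chosen for illustration. The clock is injectable so the behavior can be tested without sleeping, and it defaults to time.monotonic so wall-clock adjustments cannot make a frozen feed look fresh.

```python
import time


class FeedLiveness:
    """Tracks when each venue last sent anything."""

    def __init__(self, max_age_s: float, clock=time.monotonic) -> None:
        self.max_age_s = max_age_s
        self._clock = clock
        self._last_seen = {}  # venue -> clock reading at last message

    def touch(self, venue: str) -> None:
        """Call on every message (data or heartbeat) received from the venue."""
        self._last_seen[venue] = self._clock()

    def age(self, venue: str):
        """Seconds since the venue last spoke, or None if it never has."""
        seen = self._last_seen.get(venue)
        return None if seen is None else self._clock() - seen

    def is_current(self, venue: str) -> bool:
        """The honest answer to 'is this current?' for one venue."""
        age = self.age(venue)
        return age is not None and age <= self.max_age_s
```

A venue that has never been heard from counts as not current, so the default answer is "no" and not "probably". In a quiet market this only works if the venue sends heartbeats; without them, silence and a frozen feed look the same, and the threshold has to be chosen with that in mind.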

Why it is the foundation

Ingestion sits underneath everything else. The signal engine reads from it, the execution layer prices against it, risk evaluates exposure off it, monitoring compares back to it. Every one of those inherits the quality of the market view ingestion hands them, and none of them can ever be more correct than the data they're fed. That ceiling is real, and it's set down here at the bottom.

You cannot build a correct system on top of an incorrect view of the world.

That's why the boring layer deserves the same discipline as the glamorous ones. Your signal can be brilliant and your execution can be lightning fast, but if the market data underneath is wrong, all you've built is a very precise opinion about a fiction. Treat the foundation with respect and everything above it at least gets a shot at being right. Skimp on it and nothing above it is worth trusting.

Ignacio Montoya is a systems architect specializing in algorithmic trading infrastructure, financial systems, and digital asset platforms. He builds market data ingestion that keeps the internal view of the market consistent with reality — the foundation every other component depends on.

If your signals are good but the system still misbehaves, the data layer is the place to look — and the conversation starts here.
