
Monday, September 17, 2007

To be or not to be (deterministic) ...

Posted by Richard Bentley

A fundamental requirement for an effective CEP engine is performance. Performance can be measured in many ways - the most common of which is throughput, or events per second (eps). Marketing literature from CEP vendors abounds with increasingly impressive eps numbers, but the lack of accepted industry benchmarks for CEP makes it difficult to understand how these numbers translate to real-world use cases - in particular, whether you can sustain those figures while doing non-trivial work with the events once you've got them.

The twin of throughput is latency: how quickly can you process a single event? In Algorithmic Trading, for example, latency might be measured as the elapsed time from a market data event arriving at the CEP engine to an order being sent out, triggered by that event. With the increasing deployment of Algo Trading engines in Capital Markets, fractions of a millisecond matter; the first to detect the opportunity and get the order to market wins the prize. Ultra-low latency may not make for good headlines of the "bazillion events per second" variety, but it is often of more relevance to the success or failure of a CEP application.
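As a rough illustration of the tick-to-order measurement described above, here is a minimal sketch. The event fields, the threshold test and the `send_order` callback are all hypothetical stand-ins for a real trading algorithm; the point is simply where the clock is started and stopped:

```python
import time

def handle_market_data(event, send_order):
    """Timestamp the tick on entry, apply (stand-in) trading logic,
    send the order, and return the elapsed tick-to-order time."""
    t_in = time.perf_counter_ns()
    if event["bid"] > event["threshold"]:   # placeholder for real algo logic
        send_order({"side": "buy", "qty": 100})
    return (time.perf_counter_ns() - t_in) / 1_000  # microseconds

sent = []
lat_us = handle_market_data({"bid": 101.5, "threshold": 101.0}, sent.append)
print(f"tick-to-order latency: {lat_us:.1f} us")
print(sent)  # [{'side': 'buy', 'qty': 100}]
```

In a real deployment the measured interval would also span the adapter transformations on the way in and out, which is why those are included in the single-digit-millisecond benchmark discussed below.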

In our experience over the last 8 years developing a CEP engine, we have seen ceiling latency requirements come down hugely – the benchmark in Algo Trading, for example, is now single-digit milliseconds, including any adapter transformation to normalise messages from Market Data Feeds and Order Management Systems. At the same time we have seen the complexity of trading algorithms increasing. Testing of algorithms is becoming more and more important, but testing can never cover all the corner cases that can occur when an algorithm is released into the white water of real market data – cf. the turmoil in the markets caused by the events of the last few weeks …

So when an algo does not seem to be behaving as it should, how can we diagnose what’s going on? We could get the algo to generate detailed logs – but this is hardly going to help us meet the latency requirements described above, nor give us an easy way to recreate the situation in a controlled environment. We want our CEP engine to give us ultra-low latency, but we also need full disclosure to aid later analysis and optimisation.

The Apama CEP Engine solves this with a hybrid approach; rather than generate extensive logging in the application, the Apama platform can capture every event input to the algo in an integrated event store, with minuscule additional overhead. This “replay log” can then be played back in a test environment, at different playback speeds, with application logging turned way up to investigate exactly what an algorithm did and why it did it. It can also be used as a means to validate a fix once the algo has been tweaked.
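The essence of the record-and-replay idea can be sketched in a few lines. This is not Apama's event store – just a toy illustration, with an assumed JSON-lines log format, of capturing each input event with its arrival timestamp and re-delivering the stream with the inter-event gaps scaled by a speed factor:

```python
import json, os, tempfile, time

def record(log_path, events):
    """Append each input event, with its arrival timestamp, to the replay log."""
    with open(log_path, "a") as f:
        for ev in events:
            f.write(json.dumps({"ts": ev["ts"], "body": ev["body"]}) + "\n")

def replay(log_path, handler, speed=1.0):
    """Re-deliver logged events in order, scaling inter-event gaps by `speed`."""
    prev_ts = None
    with open(log_path) as f:
        for line in f:
            rec = json.loads(line)
            if prev_ts is not None and speed > 0:
                time.sleep((rec["ts"] - prev_ts) / speed)
            prev_ts = rec["ts"]
            handler(rec["body"])

log = os.path.join(tempfile.mkdtemp(), "replay.log")
record(log, [{"ts": 0.000, "body": "tick A"}, {"ts": 0.050, "body": "tick B"}])
seen = []
replay(log, seen.append, speed=1000.0)  # the 50 ms gap is replayed in 50 us
print(seen)  # ['tick A', 'tick B']
```

Because the capture path only appends a timestamped record per event, the overhead on the hot path stays small, while the test environment is free to replay the stream slowly (with verbose logging) or very fast.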

The playback approach relies on a fundamental property of the CEP engine, namely determinism. The engine will always produce the same results, in the same order, when presented with the same stream of events. This needs to extend to temporal logic – allowing behaviour to be accurately reproduced even with data played 1000 times faster than it occurred in reality.
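One way to make temporal logic deterministic is to drive all timers from a virtual clock advanced by event timestamps rather than by the wall clock. The toy engine below (my own illustration, not Apama's implementation) fires any due timers, in order, before delivering each event – so the output is identical whether the stream arrives in real time or 1000 times faster:

```python
import heapq

class VirtualClockEngine:
    """Toy engine: timers are ordered against event timestamps, never the
    wall clock, so results are identical at any playback speed."""
    def __init__(self):
        self.now = 0.0
        self.timers = []  # heap of (fire_time, seq, callback); seq breaks ties
        self.seq = 0
        self.output = []

    def set_timer(self, delay, callback):
        heapq.heappush(self.timers, (self.now + delay, self.seq, callback))
        self.seq += 1

    def on_event(self, ts, name):
        # Fire every timer due at or before this event's timestamp, in order.
        while self.timers and self.timers[0][0] <= ts:
            fire_at, _, cb = heapq.heappop(self.timers)
            self.now = fire_at
            cb()
        self.now = ts
        self.output.append(name)
        if name == "tick":  # example rule: a tick arms a 5-second timeout
            self.set_timer(5.0, lambda: self.output.append("timeout"))

engine = VirtualClockEngine()
for ts, name in [(0.0, "tick"), (2.0, "quote"), (7.0, "quote")]:
    engine.on_event(ts, name)
print(engine.output)  # ['tick', 'quote', 'timeout', 'quote']
```

Note that nothing here reads the system clock: feed the same `(ts, name)` stream in at any speed and the timeout still fires between the two quotes, exactly as it did in production.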

With this kind of determinism we can have our cake and eat it – the ultra-low latency required of a CEP engine, together with full diagnostics for application refinement.

