
October 2007

Thursday, October 25, 2007

Hitting the nail on the head

Posted by Louis Lovas

In my continuing research into CEP vendors' use of SQL as the language of choice for CEP applications, I am reminded of a past industry initiative – but only in reverse! Not so long ago (relatively speaking, of course) a new technology platform was being promoted by a number of vendors, both large and small. That new technology was Application Servers. The core message and supporting technology were a backlash against the prior de facto standards (client/server, host-based, etc.). The message from App Server vendors was: separate your user interface from your business logic from your data. This message resonated well in the industry for many reasons. As the dust settled on the AppServer landscape, a number of leading vendors (commercial and open source) emerged – still with that same message. The value of separating UIs from business logic from data was clear then and still is today. User interfaces are like the fashion and electronics industries: everyone wants the coolest (UI) gadgets available (witness Adobe Flex and Microsoft WPF). Business logic is central to the operational effectiveness of the enterprise. As for the data, once it was de-coupled, business logic components could break out of their silos and access data from all parts of an organization. SOA has now superseded AppServers as the leading-edge technology for commercial applications – but that separation message still rings true.

Now a new paradigm enters the arena – Complex Event Processing. Numerous vendors are parading their platform in front of prospects, customers, analysts and the market at large. The dust is far from settled but a couple of paradigms are starting to emerge. One builds on the archetype SQL database syntax which I'll refer to as an Event Query Language (EQL). The other model is one that builds upon classic development languages, those used for building complete applications. I'll refer to that as an Event Programming Language (EPL).

What I've begun to see from the CEP EQL (SQL-based, remember, from my definition) vendors is an attempt to convince the industry that SQL (with its streaming extensions) is a language for building CEP applications. As such, they are violating the separation rules (of business logic and data) that I outlined above. Furthermore, the mashing together of business logic and data makes building different iterations of such applications, and evolving those iterations over time, tremendously cumbersome. Business logic is ever evolving and must be easily adaptable to changing business climates, competitive pressures and regulatory agencies. The semantics of this logic should be easily articulated using an appropriate metaphor. An EPL by definition provides the syntactic wealth of expression for this purpose. A rules-style metaphor is also a viable alternative. Apama's MonitorScript, a mature EPL, and the Apama Scenario Modeler, which provides that rules-style metaphor, are well suited to the purpose of building complete CEP applications.

Not to be too inflammatory, SQL is well suited for filtering or enriching data, whether that is from traditional relational databases or from streaming data sources (via the streaming extensions). However, it's no more suitable for the semantic expression of business logic for CEP than it is (or ever was) for traditional commercial applications (in any deployed form; host-based, client/server, AppServer or SOA).

Looking at a few EQL examples you'll begin to see the pattern that I'm referring to. First, a simple, easy to understand example:

SELECT symbol, VWAP(price) FROM Ticker [RANGE 15 minutes]

From this simple SQL statement it's easy to see basic filtering and enrichment of the raw Ticker data. The result set is computed over a sliding 15-minute window, per symbol, with a calculated value – VWAP. However, once you move beyond simple enrichment into complex condition detection, the language becomes horribly unwieldy. Furthermore, once you add the need to manage state (see When all you have is a hammer everything looks like a nail), you've moved beyond unwieldy to undecipherable or, more likely, impossible to implement.
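For contrast, the same rolling computation written out imperatively makes the implicit behavior visible. This is a hypothetical Python sketch, not Apama MonitorScript or any vendor's actual engine; the tick fields and window size are assumptions:

```python
from collections import deque

class RollingVWAP:
    """Maintains a volume-weighted average price over a sliding time window."""

    def __init__(self, window_seconds=15 * 60):
        self.window = window_seconds
        self.ticks = deque()   # (timestamp, price, volume), oldest first
        self.pv_sum = 0.0      # running sum of price * volume
        self.vol_sum = 0.0     # running sum of volume

    def update(self, timestamp, price, volume):
        # Add the new tick, then evict anything older than the window.
        self.ticks.append((timestamp, price, volume))
        self.pv_sum += price * volume
        self.vol_sum += volume
        while self.ticks and self.ticks[0][0] <= timestamp - self.window:
            _, old_price, old_vol = self.ticks.popleft()
            self.pv_sum -= old_price * old_vol
            self.vol_sum -= old_vol

    def vwap(self):
        return self.pv_sum / self.vol_sum if self.vol_sum else None
```

In practice one such accumulator would be kept per symbol – the grouping that the SQL form leaves implicit.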

Here's one example published by an EQL vendor:

CREATE VIEW vwap_stream (vwap_price) AS RStream(SELECT symbol, VWAP(price) FROM ticker [RANGE 15 minutes]);

CREATE VIEW vwap_outside_price(vwap_outside_count) AS SELECT COUNT(*) AS price_outside_vwap FROM ticker, vwap_stream [range 15 minutes] WHERE price - vwap_price > 0.02*price AND symbol = "MSFT";

CREATE VIEW trade_cond_stream (matching_row_count) AS SELECT COUNT(*) FROM ticker [RANGE 2 minute] RECOGNIZE ONE ROW PER MATCH PATTERN [S T] DEFINE S AS |price - PREV(price)| <= .05*PREV(price) AND symbol = "S&P" DEFINE T AS (price >= 1.05*PREV(price) AND symbol = "IBM") OR (price <= 1.02*PREV(price) AND symbol = "MSFT");

Here's the narrative of what it's intended to accomplish:

IF MSFT price moves outside 2% of MSFT-15-minute-VWAP FOLLOWED BY S&P moving by 0.5% AND IBM's price moves up by 5% OR MSFT's price moves down by 2% ALL within a 2 minute time window THEN [Signal] BUY MSFT and SELL IBM;

One note: the final part, "THEN BUY MSFT and SELL IBM", is an exception. In the EQL example there is no provision to take the BUY/SELL action, only a means to signal it. Implementing the action is left as an exercise for the user.
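To make the contrast concrete, here is one way the detection in that narrative might look when written imperatively, with the state carried explicitly. This is an illustrative Python sketch with simplified conditions (the S&P leg is omitted, since the index would first have to be computed at runtime); it is not the vendor's code, and the field names and thresholds are assumptions:

```python
class BuySellSignalDetector:
    """Stage 1: MSFT strays more than 2% from its 15-minute VWAP.
    Stage 2: FOLLOWED within 2 minutes BY an IBM up-move of 5% or an
    MSFT down-move of 2%, at which point a BUY/SELL signal is emitted."""

    FOLLOW_WINDOW = 120  # seconds

    def __init__(self):
        self.trigger_time = None   # when the VWAP condition last fired
        self.prev = {}             # last seen price per symbol

    def on_msft_vwap(self, timestamp, price, vwap):
        # Stage 1: MSFT price outside 2% of its rolling VWAP.
        if abs(price - vwap) > 0.02 * vwap:
            self.trigger_time = timestamp

    def on_tick(self, timestamp, symbol, price):
        prev = self.prev.get(symbol)
        self.prev[symbol] = price
        if prev is None or self.trigger_time is None:
            return None
        if timestamp - self.trigger_time > self.FOLLOW_WINDOW:
            self.trigger_time = None   # stage 1 expired: reset the state
            return None
        # Stage 2: the follow-on moves, as plain if/else logic.
        if symbol == "IBM" and price >= 1.05 * prev:
            return ("BUY MSFT", "SELL IBM")
        if symbol == "MSFT" and price <= 0.98 * prev:
            return ("BUY MSFT", "SELL IBM")
        return None
```

Every piece of state – the trigger timestamp, the previous prices, the expiry of the two-minute window – is visible and under the programmer's control, rather than buried in the query processor's implicit behavior.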

A bit of commentary…

  • One minor point in the example (with respect to the narrative): there isn't a symbol called S&P on the U.S. equities market (assumed to be Ticker). S&P is an acronym for the Standard and Poor's index of 500 leading companies in leading industries of the U.S. economy. One will not find that index on a ticker; it needs to be calculated at runtime, like the VWAP value.
  • The vendor in question highlights the terseness of the syntax. While it's clearly terse, it's arguable whether terseness is a virtue. Perl is quite terse, and I've rarely heard anyone singing its praises. In fact, overly terse languages are often referred to as "write-only code": someone wrote it, but no one can read it.
  • There are only a few recognizable idioms – most of the processing logic is pre-defined, implicit behavior of the query processor. For an applications programmer who needs to both understand and control the code that implements true business logic, this presents a most disconcerting situation, and certainly one that I would not feel comfortable owning.
  • Lastly, there is the violation of separation: multiple streams of data – both raw (Ticker) and derived (vwap_stream, etc.) – are inseparable from the semantics of detecting a "BUY/SELL condition".

Engineers (also known as programmers, though that term has long since lost its glamour) still prefer languages with a well-known, tried-and-true vernacular. It gives them the control and wide-ranging expressiveness necessary to implement business logic and to have confidence in its correctness and behavior. State-of-the-art optimizers and just-in-time compilers far outweigh any presumed benefits of a terse syntax.

To conclude, the vendors of Application Servers (and now SOA) learned early on that separation of user interfaces from business logic from data was essential. The purveyors of EQL languages are violating this cardinal rule of Separation. Unfortunately, it was inevitable given the restrictiveness of the SQL syntax.

Saturday, October 20, 2007

To be or not to be (part 2)

Posted by John Trigg

As a follow-on to Richard's description of determinism, its importance to a CEP architecture, and how it is an inherent characteristic of Apama, here is an interesting piece from Hans Gilde on the potential downfalls when determinism (and many other issues) is not considered in an event replay solution.

Friday, October 19, 2007

The Opportunity for Business Intelligence: Is it Evolution or Revolution?

Posted by John Trigg

Some recent news on improvements and changes in approaches to BI architectures caught my eye. New technologies suggest that there may be alternatives to traditional BI architectures (see the recent posting by Curt Monash on in-memory BI and Philip Howard of the Bloor Group on data warehouse appliances). Though I am not intimately familiar with these new approaches, they seem to offer the kind of blazing speed and applicability to certain areas (for instance, in-memory analytics and activity monitoring) that overlap with the capabilities of CEP applications.

Maybe a new turf war is on the horizon.

In an article in DM Review earlier this year, Larry Goldman of AmberLeaf took on the daunting task of determining whether a new event processing technology is required to support a more responsive BI architecture. Larry posed a series of questions to help decide whether you should go the CEP route or can make do with existing technology. In light of the new commentary referenced above, I’d like to augment and question some of the thoughts in the Goldman article to show that there are other criteria that argue for going the CEP platform route and that, as we are fond of saying, it’s not just about ‘feeds and speeds.’

(Excerpted from DM Review January 2007, Customer Intelligence: Event Processing Alphabet Soup) with comments interspersed:

1. Do I already have competencies in real-time messaging and streaming? If you do, you may not need an application [specifically designed for CEP]. If you don't, these products may decrease the learning curve.

Agreed that one may have competencies in real-time messaging and streaming in terms of accepting the data and storing it, but are you processing it as it arrives? You must also consider what benefit you can draw from handling this data ‘in flight’ versus persisting, querying and analyzing it.

2. Can my reporting infrastructure handle operational BI, scaling to hundreds or thousands of users? If it cannot, these tools may be able to scale without forcing you to be a performance guru.

Can my infrastructure handle operational BI? What is operational BI? I believe it’s the notion that traditional BI tools do a great job of mining vast quantities of captured, processed and transformed data to produce graphs, charts and metrics. But how do you turn those graphs, charts and metrics into actions? That is what operational BI is looking at, and this is where the intersection with BAM, CEP, and EDA comes into play.

3. Can users easily identify or specify events to track? If they can't, these tools may help you identify and monitor events without IT involvement.

Can users easily identify or specify events to track?  One of the things that I think is on the forefront in CEP is technology that can determine or detect meaningful patterns, rather than be programmed or setup to react to known/defined patterns.  We see this as a major wave for CEP evolution.

4. What does real time mean to me? How fast do I need to make decisions? Do I have the people or the processes to react in real time?

I don’t disagree with that.  This was central to the recent Roy Schulte presentation on BAM at the Gartner CEP conference in Orlando (September 2007).  Roy has created strata to show that there are different applications and verticals that have different perceptions of real-time, ranging from those measured in milliseconds (e.g. trading) to those measured in minutes and hours (e.g. supply chain management).

5. Perhaps there is a fifth question here, one that presents the unique capabilities of CEP to the audience. Do I need to monitor event data across time windows (A and B happen within X of one another [or not])? Do I need to monitor large numbers of permutations of each rule simultaneously? Do I need to derive or infer activity from my event flows? Traditional query-based approaches struggle with these issues, especially if the demand or query refresh rate is high.
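The "A and B happen within X of one another" question above is exactly the kind of temporal correlation that is awkward in a query-refresh model but simple to state as an event rule. A minimal Python sketch (the event names and window size are illustrative assumptions):

```python
from collections import deque

def followed_by(events, first, second, within):
    """Yield (t_a, t_b) pairs where event `first` is followed by event
    `second` within `within` seconds. `events` is an iterable of
    (timestamp, name) tuples in time order."""
    pending = deque()  # timestamps of `first` events still inside the window
    for ts, name in events:
        if name == first:
            pending.append(ts)
        elif name == second:
            # Evict `first` occurrences that are now too old to match.
            while pending and ts - pending[0] > within:
                pending.popleft()
            for t_a in pending:
                yield (t_a, ts)

events = [(0, "A"), (5, "B"), (30, "A"), (80, "B")]
matches = list(followed_by(events, "A", "B", within=60))  # → [(0, 5), (30, 80)]
```

The explicit `pending` queue is the state a refresh-driven query has no natural place to keep.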

As the world of traditional BI architecture evolves and users look to determine whether CEP based architectures are appropriate, it is important to note that there may be additional benefits to the use of CEP rather than just ‘trading up’. Why not look at the two technologies as two parts to a greater solution? Augmenting an existing BI infrastructure with CEP is one approach (in which one applies event processing logic to the streams before they are passed into the data warehouse/analysis layer) as is augmenting a CEP solution with analytics/KPI from an existing BI infrastructure. There are opportunities for both sets of technology and collaboration in this instance may help to clarify rather than obfuscate for the target user.


Monday, October 08, 2007

When all you have is a hammer everything looks like a nail

Posted by Louis Lovas

In the nascent world of CEP technology there are a lot of vendors beating their chests, proclaiming they have the solution for algorithmic trading in capital markets. In reviewing both the marketing material and documentation of some of these vendors, I have to both laugh and shake my head. The scenarios and example code snippets tell me two things: 1) many of the vendors are bluffing; they really don’t know anything about algo trading, and 2) the examples are clearly pilfered from other, more established vendors.

Algorithmic trading is a very diverse and broad field, and I don’t claim to be an expert in all aspects, no one person could make such a claim. With a focused lens, I’ll briefly explore a few aspects of the algorithmic-driven trading strategies. Strategies loosely fall into two categories 1) ‘Alpha’ or signal generation strategies and 2) Execution strategies.  Alpha in this context means one very simple thing – to make money. Execution on the other hand is about submitting and managing orders in-the-market to minimize market impact. In reality, many Alpha strategies will leverage one or more Execution strategies when their Alpha-seeking logic detects a buy/sell signal, so they are not mutually exclusive.

The goal of minimizing market impact and finding liquidity means that Execution strategies can easily be as complex, if not more complex than their Alpha counterparts. From a software development viewpoint, one rather large requirement of Execution strategies is the need to manage state. That state is the status of the outstanding orders - what amount has been filled, timers on how long they’ve been in play, rejections, cancellations, amendments, simple exception handling and the like. In other words, you don’t simply place an order in the market – you must continuously manage that order; that’s fundamental to an algorithmic trading solution.
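As an illustration of the kind of state an execution strategy must carry, here is a stripped-down Python sketch of a single order's lifecycle. The states, field names, and re-pricing behavior are assumptions for illustration, not any particular FIX implementation or vendor API:

```python
class ManagedOrder:
    """Tracks one working order through fills, rejects, and expiry."""

    def __init__(self, symbol, side, quantity, price, timeout):
        self.symbol, self.side = symbol, side
        self.quantity = quantity     # total shares requested
        self.filled = 0              # shares filled so far
        self.price = price
        self.timeout = timeout       # max seconds in the market
        self.placed_at = None
        self.state = "NEW"

    def place(self, now):
        self.placed_at = now
        self.state = "WORKING"

    def on_fill(self, shares):
        self.filled += shares
        self.state = "FILLED" if self.filled >= self.quantity else "PARTIAL"

    def on_reject(self, reason):
        self.state = "REJECTED"

    def remaining(self):
        return self.quantity - self.filled

    def on_timer(self, now, new_price=None):
        # An expired order is withdrawn and any remainder re-priced
        # as a fresh child order (itself another state machine).
        if self.state in ("WORKING", "PARTIAL") and now - self.placed_at >= self.timeout:
            self.state = "EXPIRED"
            if new_price is not None and self.remaining() > 0:
                return ManagedOrder(self.symbol, self.side,
                                    self.remaining(), new_price, self.timeout)
        return None
```

Every fill, reject, and timer callback mutates this state, and a strategy working dozens of child orders holds dozens of these machines at once – precisely the bookkeeping a declarative query has nowhere to put.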

We call attention to this because a ‘pilfered’ example of algorithmic trading logic was recently published by one vendor in an attempt to describe how their (SQL) CEP engine is well suited to algo trading. The narrative description, which by the way reads word-for-word from Apama marketing literature, describes an ‘Algo’ that monitors the movement of two stocks: “If the price of MSFT moves by more than 2% beyond a 15 minute window of its VWAP and IBM moves up by 5% …. THEN BUY MSFT and SELL IBM”.  Note that when we present this example it serves as a simple point of reference to illustrate the inherent logic flow and temporal nature of CEP and it is quickly followed by an expansion about how CEP must also be tied into the realities of the business scenario.

The reality is that in the case of trading, large volumes of shares are not simply dumped on the market.  You don’t just “buy” MSFT and “sell” IBM.  In the real world, such trades are executed in the market using numerous execution or clipping strategies. Shares might be sold based on the historic volume curve of the last three weeks of activity, or by participating in the market flow for the current day. It could perform best-execution across an aggregate book comprised of multiple market makers and/or Exchanges. Or the Execution strategy could follow the bid price for a 30 minute time period, then re-price any remaining quantity (where re-pricing could be another mini-algo in itself).  The point of this is to note that when a BUY/SELL signal is issued that marks the beginning not the end of the algorithmic trading strategy. The strategy must manage the state of each outstanding order responding to numerous conditions: the Exchange as it fills (or doesn’t fill) an order, re-pricing if the market moves or as time-in-market expires, cancellations, etc.
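A historic-volume-curve execution of the kind described above can be sketched in a few lines of Python. The intervals and curve fractions here are invented for illustration; a real desk would derive them from weeks of recorded volume data:

```python
def slice_by_volume_curve(total_shares, curve):
    """Split a parent order into child-order sizes proportional to a
    historic intraday volume curve. `curve` maps each time interval to
    the fraction of the day's volume historically traded in it."""
    slices = []
    allocated = 0
    for interval, fraction in curve.items():
        shares = int(total_shares * fraction)
        slices.append((interval, shares))
        allocated += shares
    # Fold any rounding remainder into the final slice.
    if slices and allocated < total_shares:
        interval, shares = slices[-1]
        slices[-1] = (interval, shares + total_shares - allocated)
    return slices

# Hypothetical three-bucket morning curve (fractions sum to 1.0).
curve = {"09:30-10:30": 0.40, "10:30-11:30": 0.35, "11:30-12:30": 0.25}
schedule = slice_by_volume_curve(100000, curve)
```

Each scheduled slice would then be placed as its own managed order – the scheduling is trivial; the ongoing management of each child order is where the real complexity lives.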

Given the inherent complexity of state management in this situation, I fail to see how a query language such as SQL can manage the requisite ‘state’ in such a circumstance. I’m sure that with sufficient brain-power and developer effort, some arcane syntax for managing state can be contrived. But to paraphrase an old saying: “when all you have is SQL, everything looks like a table”. Programmers building real-world applications don’t (and simply cannot) limit themselves to thinking in arcane “SELECT blah, blah, blah” syntax. Their thought processes rely on the core intrinsic capabilities of programming languages: “for” loops, “while” loops and good old “if-then-else” statements. Without them, building many different iterations of such applications, and evolving those iterations over time, will prove tremendously cumbersome.

Constructing a full CEP application, whether in trading or in any other problem space is no different and requires attention to all aspects of event management – from filtering to actual handling of complex event objects.  Without a focus on the complete business requirements of the application, you may end up with content for a white paper, but you’ll not be creating applications that customers can actually use.