
Thursday, October 25, 2007

Hitting the nail on the head

Posted by Louis Lovas

In my continuing research into CEP vendors' use of SQL as the language of choice for CEP applications, I am reminded of a past industry initiative, only in reverse. Not so long ago (relatively speaking, of course) a new technology platform was being promoted by a number of vendors, both large and small. That new technology was the Application Server. The core message and supporting technology were a backlash against the prior de facto standards (i.e., client/server, host-based, etc.). The message from App Server vendors was: separate your user interface from your business logic from your data. This message resonated well in the industry for many reasons. As the dust settled on the App Server landscape, a number of leading vendors (commercial and open source) emerged, still with that same message.

The value of separating UIs from business logic from data was clear then and still is today. User interfaces are like the fashion and electronics industries: everyone wants the coolest (UI) gadgets available (witness Adobe Flex and Microsoft WPF). Business logic is central to the operational effectiveness of the enterprise. As for the data, once it was decoupled, business logic components could break out of their silos and access data from all parts of an organization. SOA has now superseded App Servers as the leading-edge technology for commercial applications, but that separation message still rings true.

Now a new paradigm enters the arena: Complex Event Processing. Numerous vendors are parading their platforms in front of prospects, customers, analysts and the market at large. The dust is far from settled, but a couple of approaches are starting to emerge. One builds on the archetypal SQL database syntax, which I'll refer to as an Event Query Language (EQL). The other builds upon classic development languages, those used for building complete applications. I'll refer to that as an Event Programming Language (EPL).

What I've begun to see from the CEP EQL (SQL-based, remember, by my definition) vendors is an attempt to convince the industry that SQL (with its streaming extensions) is a language for building CEP applications. As such, they are violating the separation rules (of business logic and data) that I outlined above. Furthermore, the mashing together of business logic and data makes building different iterations of such applications, and evolving those iterations over time, tremendously cumbersome. Business logic is ever evolving and must be easily adaptable to changing business climates, competitive pressures and regulatory demands. The semantics of this logic should be easily articulated using an appropriate metaphor. An EPL by definition provides the syntactic wealth of expression for this purpose. A Rules-style metaphor is also a viable alternative. Apama's MonitorScript, a mature EPL, and the Apama Scenario Modeler, which provides that Rules-style metaphor, are well suited to building complete CEP applications.

Not to be too inflammatory: SQL is well suited for filtering or enriching data, whether that data comes from traditional relational databases or from streaming sources (via the streaming extensions). However, it's no more suitable for the semantic expression of business logic for CEP than it is (or ever was) for traditional commercial applications, in any deployed form: host-based, client/server, App Server or SOA.

Looking at a few EQL examples, you'll begin to see the pattern that I'm referring to. First, a simple, easy-to-understand example:

SELECT symbol, VWAP(price) FROM Ticker [RANGE 15 minutes]

From this simple SQL statement it's easy to see basic filtering and enrichment of the raw Ticker data. The result set is temporally organized into 15-minute buckets and grouped by symbol, with one calculated value: VWAP. However, once you move beyond simple enrichment into complex condition detection, the language becomes horribly unwieldy. Furthermore, once you add the need to manage state (see "When all you have is a hammer everything looks like a nail"), you've moved beyond unwieldy to undecipherable or, more likely, impossible to implement.
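To make concrete what the query processor is doing implicitly in that one-line query, here is a minimal sketch in Python of a per-symbol VWAP over a sliding 15-minute window. It is an illustration only, not any vendor's implementation: the Tick fields (including a volume field, which the query above does not show), the WindowedVwap class and the assumption that ticks arrive in timestamp order are all mine.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Tick:
    """One trade event; the field names are assumptions for this sketch."""
    symbol: str
    price: float
    volume: float
    timestamp: float  # seconds since an arbitrary epoch


class WindowedVwap:
    """Per-symbol VWAP over a sliding time window (15 minutes by default)."""

    def __init__(self, window_seconds: float = 15 * 60) -> None:
        self.window_seconds = window_seconds
        self._ticks = defaultdict(deque)  # symbol -> ticks still inside the window

    def update(self, tick: Tick) -> float:
        """Add a tick, evict expired ones, and return the symbol's current VWAP."""
        window = self._ticks[tick.symbol]
        window.append(tick)
        # Evict ticks that are a full window (or more) older than the newest tick.
        cutoff = tick.timestamp - self.window_seconds
        while window and window[0].timestamp <= cutoff:
            window.popleft()
        total_volume = sum(t.volume for t in window)
        if total_volume == 0:
            return tick.price  # no traded volume in the window; fall back to last price
        return sum(t.price * t.volume for t in window) / total_volume


if __name__ == "__main__":
    vwap = WindowedVwap()
    for t in (Tick("MSFT", 30.0, 100, 0), Tick("MSFT", 32.0, 300, 60)):
        print(t.symbol, vwap.update(t))
```

The windowing, eviction and grouping are all visible here, which is exactly the behavior the SQL form hides behind the RANGE clause.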

Here's one example published by an EQL vendor:

CREATE VIEW vwap_stream (vwap_price) AS RStream(SELECT symbol, VWAP(price) FROM ticker [RANGE 15 minutes]);

CREATE VIEW vwap_outside_price(vwap_outside_count) AS SELECT COUNT(*) AS price_outside_vwap FROM ticker, vwap_stream [range 15 minutes] WHERE price - vwap_price > 0.02*price AND symbol = "MSFT";

CREATE VIEW trade_cond_stream (matching_row_count) AS SELECT COUNT(*) FROM ticker [RANGE 2 minute] RECOGNIZE ONE ROW PER MATCH PATTERN [S T] DEFINE S AS |price - PREV(price)| <= .05*PREV(price) AND symbol = "S&P" DEFINE T AS (price >= 1.05*PREV(price) AND symbol = "IBM") OR (price <= 1.02*PREV(price) AND symbol = "MSFT");

Here's the narrative of what it's intended to accomplish:

IF MSFT price moves outside 2% of MSFT-15-minute-VWAP FOLLOWED BY S&P moving by 0.5% AND IBM's price moves up by 5% OR MSFT's price moves down by 2% ALL within a 2 minute time window THEN [Signal] BUY MSFT and SELL IBM;

One note: the final part, "THEN BUY MSFT and SELL IBM", is an exception. In the EQL example there is no provision to take the BUY/SELL action, only a means to signal it. Implementing the action is left as an exercise for the user.
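For contrast, here is a rough sketch of how the same narrative might look with explicit state in a general-purpose language. It is written in Python purely for illustration; it is not MonitorScript and not any vendor's API. The BuySellDetector class, its field names and the on_signal hook are hypothetical, and the rule is only one possible reading of the narrative: MSFT leaves a 2% band around its VWAP, then the S&P value moves by at least 0.5% tick to tick, then either IBM rises 5% or MSFT falls 2%, all within two minutes of the first step. The MSFT VWAP is assumed to be supplied by the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

WINDOW_SECONDS = 120.0  # the narrative's 2-minute window


@dataclass
class BuySellDetector:
    """Explicit-state sketch of the narrative rule (one possible reading)."""
    on_signal: Callable[[float], None]  # action hook, kept apart from detection
    last_price: Dict[str, float] = field(default_factory=dict)
    armed_at: Optional[float] = None    # time MSFT left its VWAP band
    index_moved: bool = False           # S&P condition seen since arming

    def _reset(self) -> None:
        self.armed_at = None
        self.index_moved = False

    def on_tick(self, symbol: str, price: float, now: float,
                msft_vwap: Optional[float] = None) -> None:
        prev = self.last_price.get(symbol)
        self.last_price[symbol] = price
        # Relative change against the previous tick for the same symbol.
        change = (price - prev) / prev if prev else None

        # Discard a partial match once the 2-minute window has passed.
        if self.armed_at is not None and now - self.armed_at > WINDOW_SECONDS:
            self._reset()

        if self.armed_at is None:
            # Step 1: MSFT moves more than 2% away from its 15-minute VWAP.
            if symbol == "MSFT" and msft_vwap and abs(price - msft_vwap) > 0.02 * msft_vwap:
                self.armed_at = now
            return

        if change is None:
            return
        if symbol == "S&P" and abs(change) >= 0.005:
            # Step 2: the index value moves by at least 0.5%.
            self.index_moved = True
        elif self.index_moved and (
            (symbol == "IBM" and change >= 0.05)
            or (symbol == "MSFT" and change <= -0.02)
        ):
            # Step 3: signal only; what to do about it is the caller's decision.
            self.on_signal(now)
            self._reset()


if __name__ == "__main__":
    detector = BuySellDetector(on_signal=lambda t: print("signal at", t))
    detector.on_tick("IBM", 100.0, 0)
    detector.on_tick("S&P", 1500.0, 1)
    detector.on_tick("MSFT", 31.0, 10, msft_vwap=30.0)
    detector.on_tick("S&P", 1510.0, 40)
    detector.on_tick("IBM", 106.0, 70)
```

The state (armed_at, index_moved) and the expiry of a partial match are explicit, and the BUY/SELL action sits behind on_signal rather than inside the detection logic.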

A bit of commentary…

  • One minor point about the example (with respect to the narrative): there isn't a symbol called S&P on the U.S. equities market (assumed to be Ticker). S&P is shorthand for the Standard and Poor's index of 500 leading companies in leading industries of the U.S. economy. One will not find that index on a Ticker; it needs to be calculated at runtime, like the VWAP value.
  • The vendor in question highlights the terseness of the syntax. While it's clearly terse, it's arguable whether terseness is a virtue. Perl is quite terse and I've rarely heard anyone singing its praises. In fact, overly terse languages are often referred to as "write-only code", meaning someone wrote it but no one can read it.
  • There are only a few recognizable idioms; most of the processing logic is pre-defined, implicit behavior of the query processor. For an applications programmer who needs to both understand and control the code that implements true business logic, this is a most disconcerting situation, and certainly not code that I would feel comfortable owning.
  • Lastly, there is the violation of separation: multiple streams of data, both raw (Ticker) and derived (vwap_stream, etc.), are inseparable from the semantics of detecting a "BUY/SELL condition".

Engineers (also known as programmers, but that term has long since lost its glamour) still prefer languages with a well-known, tried-and-true vernacular. It gives them the control and wide-ranging expressiveness necessary to implement business logic and to have confidence in its correctness and behavior. The benefits of state-of-the-art optimizers and just-in-time compilers far outweigh any presumed benefits of a terse syntax.

To conclude, the vendors of Application Servers (and now SOA) learned early on that separation of user interfaces from business logic from data was essential. The purveyors of EQLs are violating this cardinal rule of separation. Unfortunately, that was inevitable given the restrictiveness of the SQL syntax.