Hitting the nail on the head
Posted by Louis Lovas
In my continuing research into CEP vendors' use of SQL as the language of choice
for CEP applications, I am reminded of a past industry initiative, but only in
reverse! Not so long ago (relatively speaking, of course) a new technology
platform was being promoted by a number of vendors, both large and small. That
new technology was Application Servers. The core message and supporting
technology was a backlash against the prior de facto standards (e.g.,
client/server, host-based). The message from App Server vendors was: separate
your user interface from your business logic from your data. This message
resonated well in the industry for many reasons. As the dust settled on the
AppServer landscape, a number of leading vendors (commercial and open source)
emerged, still with that same message. The value of separating UIs from
business logic from data was clear then and still is today. User interfaces are
like the fashion and electronics industries: everyone wants the coolest (UI)
gadgets available (witness Adobe Flex and Microsoft WPF). Business logic is
central to the operational effectiveness of the enterprise. As for the data,
once it was decoupled, business logic components could break out of their silos
and access data from all parts of an organization. SOA has now superseded
AppServers as the leading-edge technology for commercial applications, but that
separation message still rings true.
Now a new paradigm enters the arena: Complex Event Processing. Numerous vendors
are parading their platforms in front of prospects, customers, analysts and the
market at large. The dust is far from settled, but a couple of approaches are
starting to emerge. One builds on the archetypal SQL database syntax, which
I'll refer to as an Event Query Language (EQL). The other builds upon classic
development languages, those used for building complete applications. I'll
refer to that as an Event Programming Language (EPL).
What I've begun to see from the CEP EQL vendors (SQL-based, remember, by my
definition) is an attempt to convince the industry that SQL (with its streaming
extensions) is a language for building CEP applications. In doing so they are
violating the separation rules (of business logic and data) that I outlined
above. Furthermore, mashing together business logic and data makes building
different iterations of such applications, and evolving those iterations over
time, tremendously cumbersome. Business logic is ever evolving and must be
easily adaptable to changing business climates, competitive pressures and
regulatory requirements. The semantics of this logic should be easily
articulated using an appropriate metaphor. An EPL by definition provides the
syntactic wealth of expression for this purpose. A Rules-style metaphor is also
a viable alternative. Apama's Monitorscript, a mature EPL, and the Apama
Scenario Modeler, which provides that Rules-style metaphor, are well suited for
the purpose of building complete CEP applications.
Not to be too inflammatory: SQL is well suited for filtering or enriching data,
whether that is from traditional relational databases or from streaming data
sources (via the streaming extensions). However, it's no more suitable for the
semantic expression of business logic for CEP than it is (or ever was) for
traditional commercial applications (in any deployed form: host-based,
client/server, AppServer or SOA).
Looking at a few EQL examples, you'll begin to see the pattern that I'm
referring to. First, a simple, easy-to-understand example:
SELECT symbol, VWAP(price)
FROM Ticker [RANGE 15 minutes]
From this simple SQL statement it's easy to see basic filtering and enrichment
of the raw Ticker data. The result set is temporally organized into 15-minute
buckets and grouped by symbol with a calculated value, VWAP (volume-weighted
average price). However, once you move beyond simple enrichment into complex
condition detection, the language becomes horribly unwieldy. Furthermore, once
you add the need to manage state (see "When all you have is a hammer everything
looks like a nail") you've moved beyond unwieldy to undecipherable or, more
likely, impossible to implement.
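For comparison, here is a minimal sketch of the same per-symbol VWAP enrichment written in a general-purpose language (Python). It is an illustration only, not any vendor's implementation. It assumes each tick carries a traded volume (the query above passes only price to VWAP, but the calculation needs volume) and it treats [RANGE 15 minutes] as a sliding window. The class name RollingVwap and the on_tick signature are hypothetical.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 15 * 60  # assumed meaning of [RANGE 15 minutes]


class RollingVwap:
    """Per-symbol volume-weighted average price over a sliding time window."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window = window_seconds
        # symbol -> deque of (timestamp_seconds, price, volume), oldest first
        self.ticks = defaultdict(deque)

    def on_tick(self, symbol, ts, price, volume):
        """Record one tick and return the symbol's current VWAP (None if no volume)."""
        q = self.ticks[symbol]
        q.append((ts, price, volume))
        # Evict ticks that have aged out of the window.
        while q and q[0][0] <= ts - self.window:
            q.popleft()
        total_volume = sum(v for _, _, v in q)
        notional = sum(p * v for _, p, v in q)
        return notional / total_volume if total_volume else None
```

Feeding it ticks one at a time, for example RollingVwap().on_tick("MSFT", 0, 10.0, 100), returns the VWAP for that symbol as of that tick. Note that the windowing and eviction, implicit in the query, are explicit here.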
Here's one
example published by an EQL vendor:
CREATE VIEW vwap_stream (vwap_price) AS
  RStream(SELECT symbol, VWAP(price) FROM ticker [RANGE 15 minutes]);
CREATE VIEW vwap_outside_price(vwap_outside_count) AS
  SELECT COUNT(*) AS price_outside_vwap
  FROM ticker, vwap_stream [range 15 minutes]
  WHERE price - vwap_price > 0.02*price AND symbol = "MSFT";
CREATE VIEW trade_cond_stream (matching_row_count) AS
  SELECT COUNT(*) FROM ticker [RANGE 2 minute]
  RECOGNIZE ONE ROW PER MATCH
  PATTERN [S T]
  DEFINE S AS |price - PREV(price)| <= .05*PREV(price) AND symbol = "S&P"
  DEFINE T AS (price >= 1.05*PREV(price) AND symbol = "IBM")
           OR (price <= 1.02*PREV(price) AND symbol = "MSFT");
Here's the
narrative of what it's intended to accomplish:
IF MSFT price moves outside
2% of MSFT-15-minute-VWAP FOLLOWED BY S&P moving by 0.5% AND IBM's price
moves up by 5% OR MSFT's price moves down by 2% ALL within a 2 minute time
window THEN [Signal] BUY MSFT and SELL IBM;
One note: the final part, "THEN BUY MSFT and SELL IBM", is an exception. In the
EQL example there is no provision to take the BUY/SELL action, only a means to
signal it. Implementing the action is left as an exercise for the user.
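To make the contrast concrete, the narrative can be sketched as an explicit state machine in a general-purpose language (Python here). This is one possible reading of the narrative, not the vendor's semantics and not Apama code. It assumes that the MSFT 15-minute VWAP is supplied by a separate component, that an index value arrives as ticks labelled "S&P", that "moving by" means the change since the previous tick of the same symbol, and that the 2-minute window starts when the MSFT condition fires. The name BuySellDetector and its fields are hypothetical. Like the EQL version, it only signals the trade.

```python
from dataclasses import dataclass, field

WINDOW_SECONDS = 120  # "ALL within a 2 minute time window"


@dataclass
class BuySellDetector:
    state: str = "IDLE"      # IDLE -> ARMED -> INDEX_MOVED -> (signal) -> IDLE
    armed_at: float = 0.0    # time the MSFT/VWAP condition fired
    last_price: dict = field(default_factory=dict)  # symbol -> previous price

    def on_tick(self, symbol, ts, price, msft_vwap=None):
        """Feed one tick; return a signal string when the pattern completes, else None."""
        prev = self.last_price.get(symbol)
        self.last_price[symbol] = price
        change = (price - prev) / prev if prev else 0.0  # fractional move since last tick

        # Abandon a partial match once the 2-minute window has elapsed.
        if self.state != "IDLE" and ts - self.armed_at > WINDOW_SECONDS:
            self.state = "IDLE"

        if self.state == "IDLE":
            # Step 1: MSFT trades more than 2% away from its 15-minute VWAP.
            if symbol == "MSFT" and msft_vwap and abs(price - msft_vwap) > 0.02 * msft_vwap:
                self.state, self.armed_at = "ARMED", ts
        elif self.state == "ARMED":
            # Step 2: the index moves by at least 0.5% in either direction.
            if symbol == "S&P" and abs(change) >= 0.005:
                self.state = "INDEX_MOVED"
        elif self.state == "INDEX_MOVED":
            # Step 3: IBM up at least 5%, or MSFT down at least 2%.
            if (symbol == "IBM" and change >= 0.05) or (symbol == "MSFT" and change <= -0.02):
                self.state = "IDLE"
                return "BUY MSFT, SELL IBM"
        return None
```

Each step of the condition and the state it depends on is visible and individually testable, and the detector knows nothing about how the VWAP or the index value is produced.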
A bit of
commentary…
- One minor point in the example (with respect to the narrative): there isn't
a symbol called S&P on the U.S. equities market (assumed to be Ticker).
S&P here is shorthand for the Standard & Poor's index of 500 leading
companies in leading industries of the U.S. economy. One will not find that
index on a Ticker; it needs to be calculated at runtime, like the VWAP value.
- The vendor in question highlights the terseness of the syntax. While it's
clearly terse, it's arguable whether terseness is a virtue. Perl is quite
terse and I've rarely heard anyone singing its praises. In fact, overly terse
languages are often referred to as "write-only code", meaning someone wrote
it but no one can read it.
- There are only a few
recognizable idioms – most of the processing logic is pre-defined implicit
behavior of the query processor. For an applications programmer who needs
to both understand and control the code that implements true business
logic this presents a most disconcerting situation. And certainly one that
I would not feel comfortable owning.
- Lastly, there is the violation of separation: multiple streams of data, both
raw (Ticker) and derived (vwap_stream, etc.), are inseparable from the
semantics of detecting a "BUY/SELL condition".
Engineers (also known as programmers, but that term has long since lost its
glamour) still prefer languages with a well-known, tried-and-true vernacular.
It gives them the control and wide-ranging expressiveness necessary to
implement business logic and to have confidence in its correctness and
behavior. The benefits of state-of-the-art optimizers and just-in-time
compilers far outweigh any presumed benefits of a terse syntax.
To conclude, the vendors of Application Servers (and now SOA) learned early on
that separation of user interfaces from business logic from data was essential.
The purveyors of EQLs are violating this cardinal rule of separation.
Unfortunately, that was inevitable given the restrictiveness of the SQL syntax.