Louis Lovas

Saturday, June 14, 2008

Successful languages - show me the code please

The SIFMA show has just ended. It was a great success: we announced a few new partnerships and our latest version, Apama 4.0. We had a constant flow of people coming by our booth, which of course increased dramatically at 3:00 pm each day when we opened the bar. We also had a magician who mesmerized the crowd with his sleight-of-hand exploits.  While our fearless leader John Bates had a seemingly endless stream of journalist and analyst interviews, my colleagues and I did a three-day tour of (booth) duty.  In addition to showing off the new Apama 4.0 Studio, we had demonstrations of our various Accelerators in Pricing, Smart Order Routing, Market Surveillance, FX Aggregation and good ol' Algo Trading.

SIFMA is a true technology show; vendors large and small had dazzling displays of their wares - hard and soft. For the average attendee it was the quintessential kid-in-a-candy-store experience. It was truly geek heaven.  My compatriots and I fielded a wide range of questions about the Apama platform, from the basic explanation of CEP in Capital Markets to how Apama is deployed in a wide range of asset classes.  A consistent theme I heard from many of the attendees coming by our booth was "show me some code". The challenge of course was explaining a programming language in five minutes. It made me think of a recent blog by Mark Tsimelzon of Coral8 on what makes a programming language successful.

One of the most prominent characteristics of CEP, yet one of the most contentious, is language. Mark's reference to slashdot links to a well-written tutorial by Daniel Pietraru on the success of general-purpose programming languages and the likelihood of newcomers (e.g. Ruby, Python) unseating the mainstays (e.g. Java, C++). In a nutshell the answer to that question is an emphatic no. Interestingly, there are numerous aspects of this research on language that are applicable to CEP.

Event Processing Languages come in many shapes and sizes. They build upon prior art, deriving their base syntax and semantic logic from a variety of older languages, whether that's SQL or general purpose languages like Java or C++. The marketing departments of all vendors trumpet the merits of their chosen course. In the past, I certainly have not been too quiet about this particular topic myself (although I have softened a bit lately).  The very fact that the language of CEP is so fractured among vendors is a clear sign of CEP's lack of maturity.  Yet it's what makes us unique, it's our special sauce and quite frankly gives this blog community the level of interest it enjoys. Standardization and commoditization have that Borg sense of sameness that I find a bit dull and boring (oh I've probably just incited a riot by making that statement).

There are a number of attributes that make for a successful programming language. The mainstays of C, C++, Java and (now) C# hold a commanding 49.915% popularity rating. Daniel Pietraru ascribes this success to a number of reasons, the first being similar syntax. A recognizable syntax is what gives languages that sense of familiarity, as Mark Tsimelzon so thoughtfully pointed out.  But I believe there is one aspect of a programming language that is perhaps the most important of all: readability. Readability plays a huge role in the long term survival of not only a language but the software projects built in it.  The Java language, as shown in the popularity chart of this tutorial, holds a commanding lead (20.176%) over all other general purpose languages. This success is arguably due to a natural evolution, a Darwinian survival of the fittest so to speak (or most popular as the case may be). Its authors wisely pruned the obtuse and wildly unreadable aspects of C++.  It's interesting to note where SQL (or PL/SQL to be specific) ranks; you'll have to check that for yourself.

How does all this relate to the language of CEP? The same set of characteristics still rings true. A familiar, recognizable syntax is important, yet readability is vitally important. This is the approach we at Apama have taken. Our EPL, MonitorScript, is purpose-built for the event processing paradigm yet has the familiar readability of those mainstay languages like Java.  MonitorScript is predominantly an imperative programming language with declarative constructs purposed for the event paradigm. Yet even the declarative on all Tick(symbol="IBM") is an easily understood concept.  In a short five minute session a Java, C++ or C# programmer will be able to not only see a familiar syntax but also read the semantics of even a reasonably complex MonitorScript application.  This makes the first impression of Apama and our EPL a good one.

 

Monday, June 02, 2008

CEP Maturity Models



With the contentious debate on CEP maturity brewing, Tim Bass is using the Gartner Hype Cycle to indicate that CEP is in the Technology Trigger phase - certainly an arguable point. Considering that the next two phases in the Gartner Hype Cycle are "Peak of Inflated Expectations" and "Trough of Disillusionment", I'm not so sure we are at such an early phase. Those two terms give the illusion CEP is very nascent and I personally don't think so. One's view of the maturity of a software platform is predicated on past experiences and use cases. In my most recent blog on high availability, the idea of a mandate on maturity was a point I was attempting to convey in the notion of Lost Opportunity vs. Loss-Less.  A Loss-Less use case clearly requires a mature platform to support such a business critical function.

As with most software infrastructure platforms, CEP being no exception, one can describe or categorize multiple maturity models. Two stand out: what a platform has and what a platform does.

A CEP platform has (or should have) development tools, deployment & management tools, connectivity adapters, database adapters, dashboards, a robust architecture for reliability, scalability and high availability.  All of these infrastructure capabilities have a maturity life cycle within a CEP platform and are paramount to customer IT organizations responsible for the care and feeding of applications.
What a CEP platform does refers to what sort of applications one can build with the technology. Can one just do simple pattern detection or infinitely more complex analytics?

These two categories have independent maturity life cycles but are inter-related. To illustrate, consider the following fictitious example. Say we have an application whose sole purpose is to count. What this application does is count an integer number continually for each connected client. It simply needs to count upwards - 7x24 without missing a beat; failure to count means catastrophic business failure with financial and legal ramifications. It starts out counting for just a few clients but over time grows to support thousands or tens of thousands of clients.   In order to support the deployment of our application, we need a platform with the reliability demanded of a 7x24 operation. A high availability architecture is mandated in the event of a failure and it must be able to scale as the business grows. This simple example illustrates the point that even the simplest application, if critical to the business, needs a mature platform.  Just substitute for the counting application any more complex use-case - Smart Order Routing, Dark Pool trading, Fraud detection, etc.

What a CEP platform has tracks independently of what it is capable of doing. The maturity of what a CEP platform has is much more quantifiable; we just need to look at other platforms such as enterprise class Application Servers, Message Systems and Database systems and for the most part follow suit. What CEP does is likely what Tim is referring to when he states we're in the Technology Trigger phase.  This maturation process will proceed much more slowly and will expand as new use cases are discovered.



Wednesday, May 14, 2008

High Availability a mandate for success



As CEP has moved into the mainstream of mission critical business applications, it gives rise to a whole set of new challenges for the purveyors of CEP technologies.  There is a mandate for software platforms that want to play in the enterprise class big leagues.  That mandate encompasses many capabilities over and above the core functionality of a platform, be it CEP or any other software technology. Witness the maturity of common infrastructure platforms today: whether it's messaging, databases, application servers or other, they all share a set of capabilities or 'ilities necessary to be considered in the enterprise class.

A software platform typically starts its life with an initial appeal of either providing greater business value and/or improved developer productivity. As it moves from being the cool new toy on the bleeding edge, employed for new and emerging use cases, it starts to get more widely deployed in critical business functions. As this maturity occurs, the IT user community wants... demands that the core set of its capabilities be broadened to better support the deployed production environment.  Two primary areas are Management and Monitoring and High Availability.

As a brief description, a highly available platform is one that is tolerant of faults.  Faults or failures can happen to either software or hardware components. A system that can detect failures and recover from them is considered to be highly available. There is a tremendous amount of academic research and technology focused on fault tolerant systems. Availability is the tolerance to failure achieved through redundancy and/or persistence. My intent here is to simply provide a pragmatic perspective to the challenges and solutions for high availability in CEP.

Lost Opportunity vs. Loss-Less

As a broad categorization one can divide CEP applications into two groups.  The first group is rather broad itself and encompasses use-cases such as sensor monitoring, business or system activity monitoring and in Capital Markets certain types of algo trading. From the perspective of high availability I've lumped all of these (and similar) use cases together because of their failure context.  If for any reason there is a failure that brings down the CEP application (or a key component thereof), the down time represents lost opportunity. While the system is down one loses the opportunity for... an Algo to make a trade, for a fraudulent trade to be detected, for a threat condition to be recognized and so on and so forth.  When the system is once again operational, the lost data is largely irrelevant, as in the algo trading case. Market data is very time sensitive and as time advances the value of that data diminishes, eventually reaching zero.  In other cases, lost data could be recovered from persistent sources (more on this later).  The overall point of this category of CEP use cases is that failures do not represent catastrophic business failure but a moment lost.

The second categorization tips the scale in the other direction. The classic examples in Capital Markets are pricing engines, crossing engines, Dark Pools and Smart Order Routers. In these examples the CEP application is, in effect, the trading destination for client Order flow.  Institutional investors (the clients) place orders to buy or sell large quantities of an asset (stock, currency, options, etc.) with these trading systems. There are both legal and regulatory requirements to ensure the transaction is completed accurately, in the best possible manner (i.e. best execution) and without loss of information.  This type of CEP application mandates a loss-less model. It is paramount that incoming streams of data (i.e. client orders) are not lost for any reason. Failures of CEP systems deployed for these purposes do represent catastrophic business failure and all the ramifications that can lead to.  As such, highly available architectures become a mandate for deployment into the production arena.

High Availability Variants

High Availability architectures fall on a continuum, starting from the very basic ability to recover application state to continuous operation with seamless fail over. Choosing the right architecture for any particular deployment is based on the categorization I outline above (Lost Opportunity and Loss-Less) and the dollar-value cost of up time.

A basic premise of all high availability architectures is the notion of redundancy. Having redundant components provides that safeguard against failure. If a component (software or hardware) fails, the secondary or backup instance assumes a primary role.  In the variants I outline below (cold, warm, continuous) the chief difference between them is the fail over time and consequently downtime duration.

Persistence is another component in a high availability deployment. Persistence for recovery of application state. This may sound somewhat obvious, but it's typically not transparent to CEP applications nor is disk persistence the only means of recovery. Many of the external services CEP applications connect to include a measure of persistence and therefore provide a level of recovery themselves. Reliable message systems, distributed cache engines and in Capital Markets FIX servers provide a means of recovering application state.


Cold Standby

A cold standby deployment provides availability via complete replication of the primary system (both hardware and software). The replica or backup servers are considered cold because they do not actively or passively participate in the business function of the primary system. In fact, they could actually be powered off.  In the event of a failure of the primary system, the standby machine assumes control. The process by which that fail over occurs could be one of many. In the simplest case, a person performs the fail over manually. This could entail switching network and storage cables, and the like. In a more automated case, clustering software would be employed. These products provide automated detection of faults and automatic fail over of network, SAN and application components (e.g. CEP engines) to backup servers.

A few noteworthy points about cold standby.

     
  1. Fail over time can be quite lengthy, measurable in the tens of minutes at best. If the fail over mechanism is manual, the first order of business is a means to detect a failure in the first place. Obviously the application stops working, but having a means to monitor runtime health is always a first line of defense against failure. I'll cover runtime Management and Monitoring in another blog.  Even in clustered environments, the failure detection and fail over sequence can cause downtime of such duration that a data recovery scheme is necessary.
  2. Loss of data is very likely. In the streaming data paradigm of CEP, failure of the application does not turn off the spigot. A cold standby model inherently incurs a downtime period between failure detection and startup of the standby system. If the application environment is in the Lost Opportunity category then loss of data is largely insignificant.
  3. Data persistence and recovery become an inherent part of the CEP application design. CEP applications must be designed to persist state at appropriate intervals. In the event of a failure, the standby system can recover the application state from the persistent store. As part of the recovery process, the application must be able to reconcile the last known persisted state with reality. For example, in the algo trading environment a trading application keeps track of Orders in-market and current position. To safeguard against losing this information in the event of a failure, as the state of Orders and position changes it is persisted to disk (or a SAN device). During the recovery phase, the CEP application must reconcile the Order state read from the persistent store with the state as known by the execution venue(s). A non-trivial task to say the least and one that is very dependent on, and unique to, specific trading destinations. If connectivity is via FIX, that protocol provides an Order Status Request feature that can be used to facilitate this reconciliation.
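
To make the persist-then-reconcile idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function names (persist_state, load_state, reconcile), the JSON file layout and the shape of an order record are all hypothetical, and the venue_orders argument stands in for whatever the execution venue reports during recovery (for example, answers to FIX Order Status Requests). It is not how Apama or any particular platform does it.

```python
import json
import os
import tempfile


def persist_state(path, state):
    """Write the state to a temp file, then rename it over the old snapshot.

    The rename is atomic, so a crash mid-write never leaves a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, path)


def load_state(path):
    """Read the last persisted snapshot (called by the standby on start-up)."""
    with open(path) as handle:
        return json.load(handle)


def reconcile(persisted_orders, venue_orders):
    """Compare persisted orders with the venue's view (assumed source of truth).

    Returns the reconciled orders plus a list of (order_id, reason) discrepancies
    that the application or an operator would need to act on.
    """
    reconciled = {}
    discrepancies = []
    for order_id, local in persisted_orders.items():
        venue = venue_orders.get(order_id)
        if venue is None:
            discrepancies.append((order_id, "unknown at venue"))
            continue
        if venue != local:
            discrepancies.append((order_id, "status changed during downtime"))
        reconciled[order_id] = venue
    for order_id, venue in venue_orders.items():
        if order_id not in persisted_orders:
            discrepancies.append((order_id, "not persisted before failure"))
            reconciled[order_id] = venue
    return reconciled, discrepancies
```

The primary would call persist_state each time Orders or position change. After a fail over, the standby calls load_state and then reconcile before resuming trading. How often to persist is the trade-off: more frequent writes shrink the reconciliation gap but add latency.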



Warm Standby

Similar to a cold standby deployment, availability is achieved in a warm standby system through redundancy.  However, the similarity ends there. In a warm standby environment the standby or secondary system (or systems) are generally on line with their own private connection to external data sources and networks. However, they operate in a strictly passive mode, leaving all the actual business functions to the active primary system.  The primary and secondary systems, acting in a master-slave relationship, keep tabs on one another via a heartbeat scheme. A fail over is initiated when the primary system fails to respond to a heartbeat request from the standby.

A few noteworthy points about warm standby.

     
  1. Minimal fail over time (as compared to cold standby).  Since the secondary system (or systems) are up and running, typically with their own connection to external systems, they can quickly assume control in the event of failure on the primary system.
  2. Loss of data is minimized. Application state, while actively maintained by the primary system, is also monitored and replicated by the secondary (passive) systems. So instead of persisting application state (i.e. Orders and positions) to disk, it is replicated in-memory on the secondary systems.  This of course implies reliable inter-connectivity between the systems. The most common means of accomplishing this is via a message bus. Similar to cold standby's data persistence, the propagation of state between the master and slave CEP engines becomes part of the application design.
  3. It's important to minimize or eliminate false-positives in the design of the heartbeat scheme. In the ideal case, heartbeating is done on an out-of-band connection between primary and secondary systems, out of the application processing code path and any input/output data queues.  This ensures heartbeating is not (or only minimally) affected by fluctuations in processing load or congestion within the application itself.
  4. There can be multiple secondary (slave) systems providing multiple levels of redundancy to ensure the absolute minimal downtime.
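
The heartbeat scheme can be sketched in a few lines of Python. This is a hedged illustration, not any vendor's implementation: the HeartbeatMonitor class, its parameter names and the rule of requiring several consecutive missed intervals before promoting the standby are assumptions chosen to show one simple way of damping false-positives. Time is passed in explicitly so the logic is easy to test; a real deployment would feed it a monotonic clock and receive replies over the out-of-band connection.

```python
class HeartbeatMonitor:
    """Runs on the standby and decides when to take over from the primary."""

    def __init__(self, interval_seconds=1.0, missed_threshold=3):
        self.interval_seconds = interval_seconds
        # Require several missed intervals so one delayed reply does not
        # trigger a spurious fail over.
        self.missed_threshold = missed_threshold
        self.last_reply_time = None
        self.role = "standby"

    def on_heartbeat_reply(self, now):
        """Record that the primary answered a heartbeat request at time `now`."""
        self.last_reply_time = now

    def check(self, now):
        """Return the current role, promoting to primary if too many beats were missed."""
        if self.role == "primary":
            return self.role
        if self.last_reply_time is None:
            # No reply seen yet: start the clock rather than failing over at once.
            self.last_reply_time = now
            return self.role
        missed = int((now - self.last_reply_time) / self.interval_seconds)
        if missed >= self.missed_threshold:
            self.role = "primary"
        return self.role
```

With a one second interval and a threshold of three, the standby takes over roughly three seconds after the last reply. Lowering the threshold shortens fail over time at the cost of more false-positives.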
       



Continuous Operation

A high availability architecture that provides continuous operation or continuous availability implies a seamless fail over model. Like the standby designs, redundancy is the key ingredient. However, unlike either cold or warm standby where the redundant systems are passive, in the continuous environment multiple redundant systems are peers and execute in parallel. Each system runs a full complement of the application code and its supporting infrastructure (i.e. CEP engine). To make this model function effectively, external connectivity is not uniquely wired to each instance but multiplexed by an independent component that manages both the inbound and outbound data streams to and from all redundant systems. Inbound, the data is multiplexed to each instance; outbound, this multiplexor manages the duplicate streams using a first out wins model. Since each peer instance is executing in tandem, they are producing duplicate output. In the design of a continuous operation high availability architecture duplicates are a natural by-product. Managing the duplicate output is vitally important.

A few noteworthy points about continuous operation.

     
  1. Fail over is truly seamless or in fact non-existent. If one of the participant instances drops out (i.e. fails) it's an inconsequential event since the remaining systems are fully active and continue to operate.
  2. Implies strict determinism in the CEP engine. Since all redundant systems are actively engaged at all times, they must produce the exact same output streams given that they receive the same input streams.
  3. Failed instances that are brought back on line must have a means to catch up.  Since processing does not stop, a cleanly restarted system must have a means to load state from one of the currently running instances.
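
The first out wins handling of duplicate output can be illustrated with a small Python sketch. Everything here is hypothetical: the OutputMultiplexer class, the replica function standing in for a deterministic CEP engine, and the assumption that every output carries a sequence number that is identical across peers (which is exactly what strict determinism buys you).

```python
class OutputMultiplexer:
    """Forwards the first copy of each output and drops later duplicates."""

    def __init__(self):
        self.seen = set()      # sequence numbers already forwarded
        self.forwarded = []    # what actually went downstream

    def on_output(self, instance_id, sequence_number, payload):
        """Return True if this copy was forwarded, False if it was a duplicate."""
        if sequence_number in self.seen:
            return False
        self.seen.add(sequence_number)
        self.forwarded.append((sequence_number, payload))
        return True


def replica(events):
    """Deterministic stand-in for a peer engine: same input, same output."""
    return [(seq, "ORDER " + symbol) for seq, symbol in events]
```

Whichever peer emits a given sequence number first wins, and if one peer drops out the other's copies simply become the ones that are forwarded. A production version would bound the seen set (for example with a high-water mark) rather than let it grow forever.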


Opposing Forces
 
High availability is fast becoming a key constituent for CEP. The three basic variants I've outlined are just the starting points when considering options for a highly available deployment. There are cost considerations to take into account, not only from a dollars and cents viewpoint but also in manageability, complexity and performance. These are factors to weigh in evaluating or designing a high availability solution. Throughput and low-latency are typically paramount objectives in any CEP application. Wrapping that application in any variant of high availability can impact the overall performance. There is overhead in persisting state to disk or replicating that state across a message bus to secondary systems. To minimize the impact, there are disk and message subsystem solutions designed for extreme low-latency. RamSAN devices leverage solid state disk technology for improved storage performance. RTI and 29West offer high-speed low-latency reliable messaging. These technologies and others can be employed within a high availability architecture to keep the performance demons at bay.

CEP technology is clearly reaching a level of critical mass as a platform for mainstream business, especially in Capital Markets. Most if not all CEP vendors provide some high availability options within their platform. However CEP is not an island within itself, it lives within a larger infrastructure and the degree of high availability is only as good as the weakest link.  Availability requirements are not just something to add-on after the fact, but should be part of the overall design of the infrastructure, connectivity and the deployed applications.

Thanks for reading...
Louie

 

Thursday, April 17, 2008

RAD tools for CEP: the good, the bad and ...

On my drive home from work one evening I was doing my usual auto-pilot routine, listening to the news on the radio when I heard an interesting technology-related story. It was a discussion on how technology jargon seems to slowly creep into our every day language. Most of the story slipped by me, but there was that one phrase that stuck with me. Just imagine 10 or 15 years ago if you said to someone "I Googled his Blog on my Blackberry", what sort of reaction would you get? A blank stare? Deer in the headlights? Made me chuckle to myself.

Over the past decade (or two) a plethora of new technology has become part of our every day lives. Over that same period, the software business has witnessed incredible innovations, the maturing of long-standing technologies and new spins on tried and true ideas.

Rapid Application Development or RAD as it's been known has been around for many years and applied to the application development process in all sorts of ways. These generally graphical tools have been layered over traditional development languages and platforms all with the lofty goal of shortening the application development cycle. RAD tools have been given a number of different names like Visual Programming, Application Designers, Scenario Modelers, etc. RAD tools for building commercial database applications have been around in various forms for over 20 years, and they seem to resurface each time a new technology for application development surfaces. CEP is no exception; most if not all CEP vendors have a RAD tool as part of their product offering. RAD tools offer big promises, but do they deliver?

RAD tools are not a panacea, nor are they simply demo building toys either. Keeping the right perspective is important to getting the most out of them. RAD tools come in all shapes and sizes. For CEP the visual programming paradigm seems to have hit home. Given a designer's canvas one can visually wire together abstract elements or components to form an application. These components called smartblocks or operators are dropped on to a canvas and allow the programmer to wire the outputs of one to the inputs of another. CEP applications typically start with connections to one or more raw streams of data. These raw stream operators are then wired to other operators to perform a number of tasks – such as a derived event stream with a set of temporally sensitive statistics. Some CEP RAD designers also allow the programmer to add rules and execute actions on these streams of data in a work-flow like manner, again in that visual programming style.
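
For readers who have not seen one of these designers, the wiring a canvas expresses graphically is roughly what the following Python sketch does in code. It is purely illustrative: the Operator, MovingAverage and Collect classes are invented for this example and do not correspond to any vendor's smartblocks or operators. A raw stream operator is wired to an operator that derives a time-windowed average, which is wired to a sink.

```python
from collections import deque


class Operator:
    """A block on the canvas: receives events and emits to wired outputs."""

    def __init__(self):
        self.downstream = []

    def wire_to(self, other):
        """Connect this operator's output to another operator's input."""
        self.downstream.append(other)
        return other

    def emit(self, event):
        for operator in self.downstream:
            operator.on_event(event)

    def on_event(self, event):
        # The base operator acts as a raw stream: it passes events through.
        self.emit(event)


class MovingAverage(Operator):
    """Derives a stream of average prices over a sliding time window."""

    def __init__(self, window_seconds):
        super().__init__()
        self.window_seconds = window_seconds
        self.window = deque()

    def on_event(self, event):
        timestamp, price = event
        self.window.append((timestamp, price))
        # Drop samples that have aged out of the window.
        while self.window[0][0] <= timestamp - self.window_seconds:
            self.window.popleft()
        average = sum(p for _, p in self.window) / len(self.window)
        self.emit((timestamp, average))


class Collect(Operator):
    """A sink that records whatever reaches it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def on_event(self, event):
        self.events.append(event)
```

Wiring is then one line, for example source.wire_to(MovingAverage(10.0)).wire_to(Collect()), which is the textual equivalent of dragging two connectors on a canvas.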

RAD tools are often on the front lines when vendors show their wares to prospects and in the hands of an expert they're a thing of beauty. Whipping up a prototype application is always a big win and generally provides proof points of the technology. By their very nature they promote a certain style of programming. Unfortunately that style, while great for quickly churning out functioning applications, is somewhat of an opposing force to good application design methodology. As I mentioned, the value (or point) of using RAD tooling is to shorten the development cycle. Customers' initial reactions to this are always positive. Software development as we all are painfully aware is a labor intensive task. Anything to shorten the time and therefore reduce costs is clearly perceived to be a bonus. But RAD tools do not always promote the best practices. Please don't read this the wrong way; I'm not saying RAD tools inherently create badly designed apps. It's more the mindset and environment in which they're used. I'm sure we've all been there; first we build a prototype of something. Next thing you know it's pre-production and then deployed live (ahhh!). If you're lucky all's well that ends well. But applications need constant care and feeding. Even if the application's platform provides many of the 'ilities (i.e. scalability, reliability, high-availability, etc.) apps still need a good dose of proper design to ensure they can stand the test of time (i.e. long-term maintenance).

The visual programming metaphor of RAD tools typically bears little resemblance to traditional development languages and therefore the techniques to develop and debug tend to be equally unconventional. Due to their ability to rapidly try out ideas, RAD tools promote a trial and error style of code development (i.e. try something, if it doesn't work try something else and so on and so forth…). I like to think of this as stimulus-response or behavioral code development. The developer simply observes the behavior of the RAD-generated application based on how the various abstract components are assembled. If it does not behave as expected, it's generally quick and easy to make a change and try again. The very short path from coding to testing promotes this style of development.

There are a number of other aspects of RAD that present challenges to application development. The visual programming metaphor is more restrictive in its ability to fully express semantic logic. There are always portions of applications too complex to express in that visual metaphor. Being able to seamlessly integrate with or connect to components written in standard languages helps to avoid working against the RAD tool instead of leveraging its power.  RAD tools are often code generators, meaning the assemblage of abstract components is run through a parser that spits out source code in a traditional language or, in the case of CEP, the vendor's EPL.  Debugging RAD-generated applications can therefore present a challenge.  It would be ideal to be able to debug from the same high-level operator view point used to create them, instead of trying to decipher machine-generated code.

If RAD was really a bad idea, I doubt it would have survived as a staple in software development for two decades. That short development cycle is a major benefit in fast moving business climates. Competitive pressures, especially in tight margin environments like Capital Markets have strained IT organizations to the limits. Tools to quickly push new and updated applications to the forefront can make a huge difference to the bottom line. But in the end, employing good design methodologies is a key part of the application development cycle. Rapid Application Development tools are just one means to breathe life into that design.

Wednesday, February 20, 2008

Apama Monitorscript by Example

There have been a couple recent posts on Apama's Monitorscript language, both here and here. To provide a bit more insight into the Apama EPL, below is a working sample that demonstrates a number of its capabilities. The language includes a declarative nature for defining and registering listeners for specific event types and it has a java-like syntax for imperative logic.  The language provides a balance between a recognizable vernacular and a purposed nature for event processing.

 

Example narrative

Prior to an annotated walk-thru of the code sample, I thought it would help to first explain its purpose and what event streams it's processing. This simple example defines a work dispatcher. It receives a request in the form of an event (AddSymbol) to dispatch a discrete listener against an event stream of market depth (bids and asks) events. This discrete listener processes the market depth for a specific symbol.  The actual work performed as it pertains to this example is inconsequential and is represented by an empty method (processDepth). Additionally, once a listener is dispatched it also listens for a request to terminate.

 

The subtleness of this example is its ability to leverage the simplicity of the Apama EPL and the power of the runtime engine wherein it executes. Thousands or even tens of thousands of listeners can be dispatched each running in its own independent context processing its unique slice of the streaming market data.

 

In reality there are a number of techniques that can be employed within the MonitorScript EPL to accomplish this sort of work dispatcher. The EPL includes a spawn operator which I've outlined in a previous blog. The spawn operator is the primary means for establishing independent worker threads and is the basis for instance creation. The example below focuses on event listeners to define discrete units of work.

 

1 package com.apamax.sample;

2 monitor ProcessMarket {
3  sequence <string> symbols; // contains list of symbols to process.
4   com.apama.marketdata.SubscribeDepth subDepth;
5   com.apama.marketdata.Depth adepth;
6  dictionary< string, string > emptyDict;

7  action onload {

8    // Listen for incoming AddSymbol events and
9    // add to symbols list if not already present
10    AddSymbol addSymbol;
11    on all AddSymbol(): addSymbol {
12     if symbols.indexOf(addSymbol.symbol) = -1 then {
13       string local_symbol := addSymbol.symbol;
14       symbols.append(local_symbol);

15        // Subscribe to this symbol
16       route com.apama.marketdata.SubscribeDepth("", "", local_symbol, emptyDict );

17       // wait for 20.0 seconds, if no depth event received, terminate
18       listener waitListener;
19       waitListener := on wait(20.0) {
20          route RemoveSymbol(local_symbol);
21       }

22       listener depthListener;
23       depthListener := on all com.apama.marketdata.Depth(symbol=local_symbol):adepth {
24           waitListener.quit();
25           processDepth(adepth);
26       }

27       // Listen for RemoveSymbol events and remove from symbols list,
28       // unsubscribe & quit
29       RemoveSymbol removeSymbol;
30       on RemoveSymbol(symbol=local_symbol): removeSymbol {
31           integer index := symbols.indexOf(removeSymbol.symbol);
32           if index != -1 then {
33              symbols.remove(index);
34             processRemove(removeSymbol.symbol);
35
36             // Unsubscribe to this symbol
37             route com.apama.marketdata.UnsubscribeDepth("", "",
                                                            removeSymbol.symbol,
                                                            emptyDict );
38             depthListener.quit();
39             }
40       }
41     }
42     else {
43       log "Debug: Ignored (existing) Add Symbol Event = " + addSymbol.symbol;
44     }
45    }
46 }

47    action processDepth(com.apama.marketdata.Depth d) {
48       // Do something
49    }

50    action processRemove(string s) {
51       // Do something.
52    }
53 }

 

 

Example Annotation

 

In describing this example, the first point to note is that the event definitions are not included. For the sake of brevity they're assumed to be defined elsewhere; there are only a few anyway. They can be categorized into two logical groups: control events (AddSymbol, RemoveSymbol, SubscribeDepth, UnsubscribeDepth) and data events (Depth). This categorization is only for a semantic understanding of the example; there is no such classification in the language. Additionally, MonitorScript has a syntax that is easily recognizable to anyone schooled in Java, C++ and other classic languages.

 

A monitor (line 2) defines the encapsulating block definition. Similar to a Java class, it is typically scoped to a package namespace (line 1). Monitors are the main block scope and a typical Apama application is made up of many monitors that interact with one another by sending and receiving events. Within a monitor one can declare events, define actions (i.e. methods) and variables (integers, strings, floats, etc.). This example defines a handful of monitor-scoped variables. The language also supports a number of complex data types; the sequence and dictionary both use a C++ template style declaration to define array types and collection types respectively (lines 3 and 6).

 

The onload action (line 7) is the main entry point of a monitor. When a monitor is loaded into the runtime engine, its onload action is immediately invoked. This work dispatcher example is entirely implemented within this action; for the sake of brevity, it's a simple way to describe the language. Line 10 defines an instance of an AddSymbol event and line 11 declares a listener for all occurrences of this event type. The remainder of the functionality of this example is scoped to the encapsulating block of this listener (lines 12 – 45). This is an important note, since the intent is to receive and process multiple AddSymbol events (potentially thousands) where each AddSymbol will cause the invocation (dispatch) of a discrete unit of work that is represented by this encapsulating block of code. Within this block of code we communicate with other monitors and establish a number of unique listeners for this unique symbol name.

 

The route statement (line 16) sends a SubscribeDepth event. Route is the standardized form of communication between monitors. Under the covers, the route statement causes the event to be placed at the head of the engine's input queue, so it becomes the next event to be processed by the engine. Semantically, routing a SubscribeDepth event starts the flow of Depth events for this symbol (i.e. local_symbol). Lines 22-26 establish a listener to receive the stream of Depth events for this symbol, calling the action processDepth upon receipt of each one.
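For readers who want to see the head-of-queue idea in isolation, here is a toy model written in Python rather than MonitorScript. It is only a sketch of the queueing behavior described above, not how the Apama engine is implemented; the ToyEngine class and its send, route and run methods are names I made up for illustration, and it only models a single routed event per handler call.

```python
from collections import deque


class ToyEngine:
    """Hypothetical toy queue: external events join the tail, routed events jump to the head."""

    def __init__(self):
        self.queue = deque()
        self.processed = []

    def send(self, event):
        # Events arriving from outside the engine are appended to the tail.
        self.queue.append(event)

    def route(self, event):
        # A routed event is placed at the head, so it is processed next.
        self.queue.appendleft(event)

    def run(self, handler):
        # Drain the queue, recording the order in which events were processed.
        while self.queue:
            event = self.queue.popleft()
            self.processed.append(event)
            handler(self, event)


def handler(engine, event):
    # Loosely mirrors line 16 of the example: an AddSymbol triggers a routed subscription.
    kind, symbol = event
    if kind == "AddSymbol":
        engine.route(("SubscribeDepth", symbol))


engine = ToyEngine()
engine.send(("AddSymbol", "IBM"))
engine.send(("AddSymbol", "MSFT"))
engine.run(handler)
print(engine.processed)
```

In this model the SubscribeDepth for IBM is processed before the AddSymbol for MSFT, even though MSFT was queued first, which is the ordering the route statement is described as giving.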

 

In addition to establishing a Depth listener, this block of code also creates a wait timer in lines 17-21. The purpose of this timer is to terminate this dispatched unit of work for this unique symbol if we do not receive an initial Depth event within 20 seconds. Line 24 kills that wait listener once the Depth events start flowing. Termination is handled by the RemoveSymbol listener declared at line 30. Note that since it will be executing within the context of a specific symbol's unit of work we're only interested in receiving a single occurrence of RemoveSymbol. This is specified in the on statement – sans the all modifier. Upon receipt of a RemoveSymbol event we unsubscribe, remove the symbol's entry from the list and terminate (i.e. quit) the Depth listener for this symbol. Like AddSymbol, RemoveSymbol control events can arrive from another monitor or a client connected to the runtime engine.

 

I hope this simple example sheds light on the simplicity, elegance and power of the Apama MonitorScript EPL.

 

Post Script …

 

After posting this blog, one of my esteemed colleagues with a much better command of the MonitorScript language offered a few refinements to avoid the need to manually handle termination (i.e. lines 17 – 21 in the code snippet). It does add one new control event, Terminate, but it avoids the need to use listener variables.

 

 

on com.apama.marketdata.Depth(symbol=local_symbol):adepth and not wait (20.0) {
   on all com.apama.marketdata.Depth(symbol=local_symbol):adepth and not Terminate(local_symbol) {
      processDepth(adepth);
   }
}

...

on RemoveSymbol(symbol=local_symbol):removeSymbol {
   ...
   route Terminate(removeSymbol.symbol);
}

 

 

 

This enhancement shows the declaration of complex (or compound) listeners against multiple event streams (and a timeout condition) concurrently. This is a commonly used technique in MonitorScript – and clearly quite powerful.
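To spell out how I read the nested listener, here is a small simulation in Python. It reflects my interpretation of the snippet and is not authoritative Apama semantics: the outer listener fires once if a Depth event arrives before the 20 second wait elapses, and the inner listener then handles every later Depth event until a Terminate arrives. The function name run_compound_listener and the tuple layout of the events are assumptions made up for this sketch.

```python
def run_compound_listener(events, symbol, timeout=20.0):
    """Return the Depth payloads the inner listener would handle.

    events: list of (seconds_since_listener_creation, kind, symbol, payload),
    sorted by time. This layout is hypothetical, chosen for the sketch.
    """
    handled = []
    armed = False  # becomes True once the outer listener has fired
    for t, kind, sym, payload in events:
        if sym != symbol:
            continue  # listeners only match their own symbol
        if not armed:
            if t >= timeout:
                break  # wait(20.0) fired first, so the outer listener never matches
            if kind == "Depth":
                armed = True  # outer listener fires once and sets up the inner one
            continue
        if kind == "Terminate":
            break  # "and not Terminate(...)" ends the inner listener
        if kind == "Depth":
            handled.append(payload)
    return handled


events = [
    (1.0, "Depth", "IBM", "d1"),
    (2.0, "Depth", "IBM", "d2"),
    (2.5, "Depth", "MSFT", "x"),
    (3.0, "Depth", "IBM", "d3"),
    (4.0, "Terminate", "IBM", None),
    (5.0, "Depth", "IBM", "d4"),
]
print(run_compound_listener(events, "IBM"))
print(run_compound_listener([(25.0, "Depth", "IBM", "late")], "IBM"))
```

Note that in this reading the first Depth event only arms the inner listener; whether it should also be passed to processDepth is a detail the snippet leaves open.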

 

Wednesday, February 13, 2008

Thoughts on the Bitter Pill

In reading a recent CEP blog, A Bitter Pill To Swallow by Tim Bass, and Opher's commentary, I was reminded of a few thoughts I've had recently on the role of CEP in the modern trading platform for Capital Markets.

First, let me say that I believe Apama takes a much larger view and includes much more in the CEP stack than many other vendors. Naturally, we’ve turned this into a competitive advantage. This larger perspective encompasses many aspects beyond the core CEP engine. That includes dashboards, connectivity adapters, development and deployment tooling, vertical-focused application starter kits and of course language. That said, I believe the CEP industry is still quite nascent and only loosely defined. However I think it is gaining clarity – from both an industry view and customer/prospect expectations. What’s the role of CEP in financial services and where does it fit in the total software stack of fully deployed trading platforms? A few recent experiences have drawn me to the conclusion that CEP technology is just one of many elements in that infrastructure.

  • AITE Report on High Performance Infrastructure

The AITE report is a review of the infrastructure components that make up a trading platform, including various sorts of connectivity (market data, order management, messaging), distributed cache and CEP. The report did not articulate the actual trading applications (i.e. strategies) themselves; those are unique to any one customer. The point being, those trading apps are viewed as independent and not defined as part of the trading infrastructure (a rather obvious point frankly). The technology used for strategy implementation is left as “an exercise for the user”.  My interpretation of this… CEP is part of the infrastructure and not necessarily the application. Without question, I clearly understand that the division between infrastructure and application can be quite gray, and with CEP lacking an absolute definition it's not all that surprising. Within Apama we are driven towards providing a total platform for our customers. As such, we undertake the task of providing both the infrastructure and a framework for applications. This framework takes the form of ready-to-use functions, components and starter kits or accelerators as we call them. Ideally, customers only need to inject their own IP. Our platform lends itself to that end – a one-stop-shop if you will. Our future plans are focused on this breadth in the platform because that's the sort of demand we face. I don't think this necessarily broadens the definition of CEP; it simply means CEP as a technology cannot exist in a vacuum. It needs the standard complement of tooling both within itself and for outside integration to the IT-centric world at large. This is the stuff one comes to expect as maturity overtakes hype.

  • Standardization on EP benchmarking

There have been a few initiatives to define vendor-neutral benchmarks for CEP. Additionally, a few vendors have published benchmark studies, BEA in particular. Efforts to define independent benchmarks will inevitably do more than just define test cases for CEP; they will also shape the definition of what CEP is and does. I would expect any effort to benchmark CEP engines will be somewhat narrowly focused on just the core capability of doing something to or with streaming data. Many of the elements that make CEP truly commercially viable, from tools for development and visualization to deployment, runtime management and high availability, will not be part of the equation. Practically speaking it stands to reason this is the case. It's impossible to create a neutral benchmark for these things. However, it would be ideal for benchmarks to include real-world usage. Some finance examples would be a Pricing engine, Index Arbitrage, Spread trading, Crossing and VWAP trading. I would not expect all CEP products to be fit-for-purpose to meet the needs of these examples. Furthermore, I'm not convinced one could say these were all CEP use cases either.

  • Discussions with prospects

Over the course of the past year I've met with numerous clients. There are a couple of observations I've made with respect to the growing understanding of CEP technology. First, I've noticed an increasing number of them use the phrase "researching CEP platforms" in conversation on their next-generation trading platform. While this might appear innocuous, it represents a shift in thinking. CEP is gaining wider awareness, as is its general applicability to trading. It was not so long ago that the term CEP was not mentioned either by us or the client – it had no apparent meaning to the opportunity at hand. However, times are changing and CEP awareness is growing; unfortunately this does not necessarily mean there is complete clarity on its definition. This lack of definition, I believe, has caused (future) adopters of CEP to look at the various vendors' products through different lenses. For example, if a particular CEP product is not fit-for-purpose for implementing a Pricing Engine, that does not necessarily mean it cannot provide a component of a Pricing Engine, with the remainder (or a significant portion) being implemented in a more traditional manner. In a competitive bid environment, I am convinced clients draw the line-in-the-sand at a different place for different vendors based on their fit-for-purpose. Unfortunately, this causes CEP products to be judged on unequal footing.

Another validation that shows CEP is gaining awareness but is still immature is a recent report from the Forrester group on CEP Adoption. To paraphrase, CEP adoption is being driven by and used within the business side of organizations sans IT. Business tends to be on the bleeding edge of technology simply because they're looking for any nugget that can give them a competitive edge. IT, on the other hand, has to face the challenging realities of daily care and feeding of software. Their priorities are in stability and manageability, attributes of mature technologies.

As the CEP hype curve starts to level off, it will follow the same path as all other hyped technology to eventual commoditization. Its usage will coalesce around a few paradigms and an industry definition will start to solidify. The items I mention above indicate that this is happening already.
 

Wednesday, November 28, 2007

Design Patterns in CEP – The instance Life Cycle

Much has been written on CEP design patterns. You can find two very good whitepapers on the subject on the Complex Event Processing website. These whitepapers and various blogs very clearly and succinctly describe a number of use-cases that show how CEP engines with their complex pattern detection and temporal awareness can be used effectively to derive real business value from streaming data sources. However, in reading these documents and contrasting them to my own experiences with Apama CEP applications, I find they miss a very real and fundamental design pattern: that of instance life cycle management. I see Complex Event Processing as much more than pattern detection; it is the need for pattern detection and the expression of business logic executing within a managed life cycle context.

What I mean by the instance life cycle is the need to invoke multiple application instances upon receipt of specific control events or the detection of event patterns. The execution context and the application instance that runs within it define a unit of work. The lifetime of instances is driven directly by the application's semantics. Instances might have a long or short life span. There are three basic traits of the instance life cycle: creation, modification and deletion. Creation is simply the need to establish a new object instance and context for its execution. Modification is the need to support dynamic changes to the operational state (i.e. runtime parameters). Lastly, there is support for terminating a running instance – either abruptly or gracefully. The instance life cycle can be managed by various means - other parts of an application or from a user's dashboard. Interfaces to create new, modify existing or delete instances, given adequate security privileges, should be present. The runtime environment of the CEP engine should provide a means to establish and manage these application instances.

To be a bit more definitive, instances can represent many differing aspects of an application. In the Algo Trading world 'instances' generally refer to running trading strategies. The basic template of a strategy is to take a set of input parameters during creation. For example, an Arbitrage strategy might take two instrument names and a Bollinger Band range as input. Modifying the running strategy might be the ability to increase or decrease the Band range. Deleting the strategy would cause it to either complete or cancel any outstanding orders it's managing and then terminate. Given this example, executing multiple concurrent instances of the Arbitrage strategy should be a straightforward task and the CEP engine should provide the corresponding language semantics to make this a simple task.
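To make the create, modify and delete traits concrete, here is a minimal sketch of the Arbitrage example in Python (not MonitorScript). Everything in it is hypothetical: the ArbitrageStrategy and StrategyManager classes, their fields and the order identifiers are invented for illustration, and a real strategy would of course do actual trading work between creation and deletion.

```python
from dataclasses import dataclass, field


@dataclass
class ArbitrageStrategy:
    """Hypothetical strategy instance holding its runtime parameters."""
    instrument_a: str
    instrument_b: str
    band_range: float
    open_orders: list = field(default_factory=list)
    running: bool = True

    def terminate(self):
        # Cancel whatever is outstanding, then mark the instance as stopped.
        cancelled = list(self.open_orders)
        self.open_orders.clear()
        self.running = False
        return cancelled


class StrategyManager:
    """Hypothetical registry managing the life cycle of strategy instances."""

    def __init__(self):
        self._instances = {}
        self._next_id = 1

    def create(self, instrument_a, instrument_b, band_range):
        # Creation: establish a new instance and hand back its identifier.
        instance_id = self._next_id
        self._next_id += 1
        self._instances[instance_id] = ArbitrageStrategy(
            instrument_a, instrument_b, band_range)
        return instance_id

    def modify(self, instance_id, band_range):
        # Modification: change a runtime parameter of a running instance.
        self._instances[instance_id].band_range = band_range

    def delete(self, instance_id):
        # Deletion: remove the instance and let it clean up its orders.
        return self._instances.pop(instance_id).terminate()

    def get(self, instance_id):
        return self._instances[instance_id]

    def count(self):
        return len(self._instances)


manager = StrategyManager()
first = manager.create("IBM", "MSFT", 2.0)
second = manager.create("VOD.L", "VOD.N", 1.5)
manager.modify(first, 2.5)
manager.get(second).open_orders.append("order-1")
cancelled = manager.delete(second)
print(manager.count(), manager.get(first).band_range, cancelled)
```

The two instances run side by side with their own parameters, and deleting one leaves the other untouched, which is the behavior I would expect a CEP engine to make easy.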

Instances are not limited to this top-level construct of a strategy. They could represent many things in a CEP engine. Using another finance example, a CEP application designed for best execution of client orders would represent each client or parent order as an instance. Each subsequent child order also represents a managed instance with its own life cycle. For a CEP application monitoring a manufacturing environment, instances might represent parts moving through a production process from raw material to finished goods. A managed instance could represent a mortgage application as it flows through qualification in a BAM application within a bank.

In some respects I'm not identifying a revelation or an idea all that new. Commercial applications of all sorts incorporate the notion of instance life cycle management. It's a classic design pattern that is provided in some form in both development languages and application frameworks (i.e. AppServers). The point I'm attempting to make is that one should be wary of getting enamored with the uniqueness of CEP. It's not just about design patterns for filtering and enrichment but also about incorporating classic and traditional constructs for application design and implementation. In traditional languages such as Java and C++ the instance design pattern can take numerous forms. It can be as simple as the new operator. Thus an instance can be represented as a new object instance of a class. An instance could take the form of a thread via CreateThread or, in the old-school Unix mentality, an instance can be a forked process. Granted, these are not all equivalent; there are pros and cons in using them for life cycle management. Furthermore, frameworks such as J2EE application servers provide their own instance management schemes to make the application development task of instance management easier. CEP engines that are implemented using an embedded model, specifically those that run within the context of a host language (i.e. Java) and/or infrastructure (i.e. AppServer), can leverage these inherent lifecycle instance management capabilities.

Other languages have managed to leverage the best of these options to create safer, more scalable instance management. Apama's MonitorScript event processing language (EPL) supports an application threading model via a spawn operator. Each instance is referred to as an mThread. From this core capability in the EPL, the life cycle design pattern is built. The spawn operator allows Apama CEP applications to be built and designed to leverage the best of thread-based concurrent programming (simpler coding model, code isolation and encapsulation, etc.) without the worry of all the complex issues that surround the use of system threads (thrashing, non-deterministic scheduling, poor portability, poor scalability). mThreads in MonitorScript guarantee predictable behavior, are fully deterministic, fully portable and massively scalable. This capability is not an adjunct to the Apama platform. It's a core capability inherent in the language and heavily leveraged by many of the supporting development and runtime tools.

CEP platforms that narrowly define the paradigm of event stream processing as a language for filtering and enrichment leave many very important aspects of application management undefined and as such challenging to implement. The uniqueness and the benefits of CEP are clearly evident in the many documented design patterns. However, CEP cannot ignore the mature aspects of traditional application design such as instance life cycle management.

Tuesday, November 13, 2007

Taking Aim

I am both humbled and appreciative of all the accolades, constructive comments (hey, fix that misspelled word) and yes, criticism on my latest blog about using SQL for Complex Event Processing. I was expecting some measure of response, as shown by this rebuttal, given the somewhat polarizing nature of using the SQL language for CEP applications. For every viewpoint there is always an opposing, yet arguably valid outlook. I welcome any and all commentary. Again, thanks to all for reading and commenting.

There were two main themes of the criticism that I received. One was on the viability of the SQL language, the other on my commentary on the use of Java and C++ for CEP applications. I would like to clarify and reinforce a few points that I made.

I chose the aggregation use-case as an example to highlight limitations of SQL because it's one that I have recent experience with. I have been both directly and indirectly involved in six aggregation projects for our financial services customers over the past year. In those endeavors I've both learned much and leveraged much. As I tried to describe in a condensed narrative, aggregation is a challenging problem, one that is best solved by use of complex nested data structures and associated program logic. Trying to represent this in SQL is a tall order given the flat 2-dimensional nature of SQL tables and the limited ability for semantic expression. To try to further explain this, I have taken a snippet of a MonitorScript-based implementation, presenting it as a bit of pseudo code that describes the nested structure I am referring to. I've intentionally avoided any specific language syntax and I've condensed the structures to just the most relevant elements. But suffice to say, defining these structures is clearly possible in Java, C++ and Apama's MonitorScript. I would also like to give credit where it's due and acknowledge many of my colleagues for the (abridged) definition I'm using below from one of our customer implementations.

structure Provider {
    string symbol; // symbol (unique to this provider)
    string marketId; // the market identifier of the price point
    integer quantity; // quantity the provider is offering
    float timestamp; // time of the point
    float cost; // transaction cost for this provider
    hashmap<string,string> extraParams; // used for storing extra information on point
}

structure PricePoint {
    float price; // the price (either a bid or ask)
    array<Provider> providers; // array of providers at this price
    integer totalQty; // total quantity across all providers at this price
}

structure AggregatedOrderBook {
    integer sequenceId; // counter incremented each time the book is updated
    integer totalBidQuantity; // total volume available on the bid side
    integer totalAskQuantity; // total volume available on the ask side
    integer totalProviders; // total number of providers
    array<PricePoint> bids; // list of bids, sorted by price
    array<PricePoint> asks; // list of asks, sorted by price
}

An aggregation engine would create an instance of AggregatedOrderBook for each symbol, tracking prices per market data Provider. As market data quotes arrive they are decomposed and inserted (sort/merged) into the appropriate PricePoint and total values are calculated. This is an oversimplification of what transpires per incoming quote, but the aim here is to provide a simplified yet representative example of the complexities in representing an Aggregated Order Book.
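As a rough illustration of the sort/merge step, here is a pared-down version of those structures in Python. It is a sketch under my own assumptions and not the customer implementation: the add_quote method and the "bid"/"ask" side strings are invented for the example, most fields are omitted, and it never replaces or removes a provider's earlier quote, which a real aggregator would have to do.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    market_id: str
    quantity: int


@dataclass
class PricePoint:
    price: float
    providers: list = field(default_factory=list)
    total_qty: int = 0


@dataclass
class AggregatedOrderBook:
    sequence_id: int = 0
    total_bid_quantity: int = 0
    total_ask_quantity: int = 0
    bids: list = field(default_factory=list)  # highest price first
    asks: list = field(default_factory=list)  # lowest price first

    def add_quote(self, side, price, market_id, quantity):
        """Hypothetical sort/merge: insert a new price level or merge into an existing one."""
        levels = self.bids if side == "bid" else self.asks
        # Walk past every level that is strictly better than the incoming price.
        index = 0
        while index < len(levels) and self._better(side, levels[index].price, price):
            index += 1
        # Create the level if this price is not in the book yet.
        if index == len(levels) or levels[index].price != price:
            levels.insert(index, PricePoint(price))
        point = levels[index]
        point.providers.append(Provider(market_id, quantity))
        point.total_qty += quantity
        if side == "bid":
            self.total_bid_quantity += quantity
        else:
            self.total_ask_quantity += quantity
        self.sequence_id += 1

    @staticmethod
    def _better(side, existing, new):
        # Bids rank higher prices first; asks rank lower prices first.
        return existing > new if side == "bid" else existing < new


book = AggregatedOrderBook()
book.add_quote("bid", 10.0, "A", 100)
book.add_quote("bid", 10.5, "B", 200)
book.add_quote("bid", 10.0, "B", 50)
book.add_quote("ask", 11.0, "A", 10)
book.add_quote("ask", 10.8, "B", 20)
print([p.price for p in book.bids], [p.price for p in book.asks])
```

Even this toy version needs nested lists and per-level totals that are updated on every quote, which is the kind of structure I find awkward to express in flat SQL tables.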

Furthermore, after each quote is processed and the aggregate order book is updated it's imperative that it be made available to trading strategies expeditiously. Minimizing the signal-to-trade latency is a key measure of success of algorithmic trading. Aggregation is a heavyweight, compute intensive operation. It takes a lot of processing power to aggregate 1,000 symbols across 5 Exchanges. As such, it is one (of many) opposing forces to the goal of minimizing latency. So this presents yet another critical aspect of aggregation: how best to design it so that it can deliver its content to eagerly awaiting strategies. One means of minimizing that latency is to have the aggregation component and trading strategies co-resident within the CEP runtime engine. Passing (or otherwise providing) the aggregated order book to the strategies becomes a simple 'tap-on-the-shoulder' coding construct. But it does imply the CEP language has the semantic expressiveness to design and implement both aggregation and trading strategies and then the ability to load and run them side-by-side within the CEP engine. Any other model implies not only multiple languages (i.e. Java and streamSQL) but likely some sort of distributed, networked model. Separating aggregation from its consumers, the trading strategies, will likely incur enough overhead that it impacts that all-important signal-to-trade latency measure. I do realize that the CEP vendors using a streaming SQL variant have begun to add imperative syntax to support complex procedural logic and "loop" constructs, something I'm quite glad to see happening. It only validates the claim I've been making all along. The SQL language at its core is unsuitable for full-fledged CEP-style applications. The unfortunate side effect of these vendor-specific additions is that it will fracture attempts at standardization.

In my previous blog, I wanted to point out the challenges of the SQL language to both implement logic and manage application state. To that end, I provided a small snippet of a streamSQL variant. A criticism leveled against it states that it's an unnecessarily inefficient bit of code. I won't argue that point, and I won't take credit for writing it either. I simply borrowed it from a sample application provided with another SQL-based CEP product. The sample code a vendor includes with their product is all too often taken as gospel. A customer's expectation is that it represents best practice usage. Vendors should take great care in providing samples, portions of which inevitably end up in production code.

The second criticism I received was on a few unintentionally scathing comments I made against Java and C++. I stated that using C++ and/or Java "means you start an application's implementation at the bottom rung of the ladder". My intent was to draw an analogy to CEP with its language and surrounding infrastructure. All CEP engines provide much more than just language. They provide a runtime engine or virtual machine, connectivity components, visualization tools and management/deployment tools. CEP vendors like all infrastructure vendors live and die by the features, performance and quality of their product. All too often I've witnessed customers take a "not invented here" attitude. They may survey the (infrastructure) landscape and decide "we can do better". For a business' IT group chartered with servicing the business to think they can implement infrastructure themselves is a naïve viewpoint. Granted, on occasion requirements might be so unique that the only choice is to start slinging C++ code, but weighing the merits of commercial (and open source) infrastructure should not be overlooked.

My goal in this and past blogs is to provide concrete use-cases and opinions on CEP drawn from my own experiences with designing, building and deploying Apama CEP applications. In doing so I was quite aware that I am drawing a big red bull's-eye on my back, making me an easy target for detractors to take aim. Surprisingly, I have received much more positive commentary than I ever expected and fully professional criticisms. I thank all who have taken the time to read my editorials; I am quite flattered.