
Wednesday, May 27, 2009

Location Tracking, fit-for-purpose design patterns in CEP

Posted by Louis Lovas


As the CEP community is well aware, the technology often gets compared and contrasted to traditional data processing practices. It's a topic that ebbs and flows like the ocean's tides, a conversation that occurs regularly in various forums, with clients, analysts and of course within this community at large. Far be it from me to stir things up again, but I believe there is some credible evidence the critics draw upon. This touched a nerve not too long ago when Curt Monash hammered the CEP community. The response was swift, but frankly unconvincing.

In many respects, I believe this argument is rooted in how some CEP vendors have architected their products. Many vendors treat event stream processing as a variant of database processing. They see streaming data as just a database in motion and therefore have designed and built their products myopically around that paradigm. By doing so those vendors (possibly inadvertently) have plotted a course where the fit-for-purpose of their products is focused on use-cases that are data-intake dominant. They can consume the fire-hose of data, filter, aggregate and enrich it temporally, but little else. What is missing in this equation is the definition of the semantics of the application, whether that is a custom application such as algo trading, monitoring telco subscribers or some business intelligence (BI) dashboard. To those vendors, that is viewed as adjunct or external (i.e. the client) and solved with the typical complement of technologies (Java, C++, .NET), considered outside of the direct domain of CEP.

While this paradigm clearly does work, it incites the (CEP) critics: "Where are the distinguishing characteristics? Why can't I just do this with traditional data processing technologies?"

A challenging question when so many CEP products are stuck with that look and feel of a database; even a few of the academic projects I've reviewed seem to be myopically centered on this same paradigm. It reminds me of that television commercial with the tag line "Stay Within the Lines. The Lines Are Our Friends" (I think it was for an SUV). Quite frankly such thinking does not represent the real world. Time to think outside the box (or table as the case may be).

Yes, the real world is full of in-motion entities, most often interacting with each other in some way. Cars and trucks careening down the freeway zigzag from one lane to another at high speed with the objective of reaching a destination in the shortest possible time without a collision. It would make an interesting CEP application to track and monitor the location and movement of such things.

In fact, location tracking is beginning to show signs of being a common use-case for the Apama platform. Not long ago we announced a new customer, Royal Dirkzwager, that uses Apama to track ship movements in sea lanes. My colleagues Matt Rothera and David Olson recently published a webinar on maritime logistics. This webinar follows the same basic premise as the Royal Dirkzwager use-case, that of tracking the location of ships at sea. In fact, we aren't the only ones seeing activity in location tracking; here's a similar use-case for CEP in location-based defense intelligence. The basic idea is the ability to track the movement of some entity, typically in relation to other entities: are they getting closer together (i.e. collision detection) or moving further apart (i.e. collision avoidance)? Are they moving at all? At what speed? Will they reach a destination at an appropriate time? A CEP system needs, at its core, both temporal and geospatial concepts to easily support this paradigm. Here's a handful of examples where this applies:

  • Tracking ship movements at sea (as I mentioned with Royal Dirkzwager, and the Apama webinar on maritime logistics)
  • Airplanes moving into and out of an airspace
  • Baggage movement in an airport
  • Delivery trucks en route to destinations
  • Service-enabled mobile phones delivering content as people move through shopping and urban areas
  • Men, machines and material moving on the battlefield


These are just a handful of location tracking use-cases for which the Apama platform is well suited.

Another colleague, Mark Scannell, has written a location tracking tutorial that is part of the Apama Studio product. It is a great example of the power of the Apama EPL for building location tracking CEP applications. The tutorial provides a narrative description explaining its purpose and implementation. Below I've included a small snippet of that example to highlight the elegant simplicity, yet powerful efficiency, of the Apama EPL. If you're in need of a brief introduction to the Apama EPL, you can find that here, the beginning of a three-part series on the language.


Location Tracking in the Apama EPL.
// Track me - the tracked entity
action trackMe() {
       
  // Call self again when new location is detected
  on LocationUpdate( id = me.id ):me {
     trackMe();
  }

  // Detect a neighbor in our local area -
  // do this by excluding events that are for our id,
  // which will also cause us to reset this listener
  // through calling trackMe() again.
 
  LocationUpdate n;
  on all LocationUpdate( x in [ me.x - 1.0 : me.x + 1.0 ],
                         y in [ me.y - 1.0 : me.y + 1.0 ] ):n and
     not LocationUpdate( id = me.id ) {

     // Increment number of neighbors that have been spotted
     // and update the alarm
     spotted := spotted + 1;
     updateAlarm();           

     // Decrement count of spotted neighbors after one second
     on wait ( 1.0 ) {
       spotted := spotted - 1;
       updateAlarm();
     }
  }
}

As a brief introduction, the Location Tracker tutorial is designed to track the movement of Entities (i.e. cars, ships, trucks, planes, or any of those things I listed above) in relation to other Entities within a coordinate system or grid. An entity is considered a neighbor if it is within 2 grid units (-1, +1) of any other entity. The grid and the units within the grid are largely irrelevant for the syntactic definition of entity tracking. Their semantic meaning, on the other hand, comes from the context of a specific use-case (i.e. a shipping harbor, air space, battlefield, etc.).

From the tutorial I pulled a single action, trackMe, which contains the heart and soul of the tracking logic. As entities move they produce LocationUpdate events. The event contains the entity's unique id and the X,Y coordinate of the new location. This trackMe action is designed to track their movement by monitoring LocationUpdate events. For each unique entity there is a spawned monitor instance (a running micro-thread, so to speak) of this trackMe action.

The idea is that when an entity moves, its new location is instantly compared against all other tracked entities (except of course itself; witness the recursive call to trackMe when ids match (id = me.id)) to determine if it has become a neighbor (remember the 2 grid units). This is elegantly implemented with the listener "on all LocationUpdate( x in [me.x - 1.0 : me.x + 1.0], ...". In a narrative sense, this can be read as "If the X,Y coordinate of this entity's new location is within 2 grid units (-1.0, +1.0) of me, then identify it as a neighbor and update an alarm condition" (via a call to updateAlarm()).

This small bit of code (about 20 lines) exhibits an immensely powerful geospatial concept: the ability to track the movement of 100's, 1000's, even 10,000's of entities against each other as they move, and of course this is accomplished with millisecond latency.


 


This small example demonstrates a few of the characteristics of the Apama EPL, specifically that it is an integrated, well-typed procedural language with event expressions. It allows you to code complex event conditions elegantly, expressed in the middle of your procedural program. This lets you focus on the logic of your application instead of just selecting the right event condition.

However, to get a clear picture, the language of CEP is just one aspect of an overall platform. The Apama strategy has also been focused on a whole-product principle, one where the whole is greater than the sum of the parts. As a mandate to our vision we have outlined four key defining characteristics: 1) a scalable, high-performing event engine; 2) tools for the rapid creation of event processing applications supporting business and IT users; 3) visualization technologies for rich interactive dashboards; and 4) an integration fabric for the rapid construction of high-performance, robust adapters to bridge into the external world.

The possibility exists that CEP will struggle to break out as a truly unique technology platform when so many just see a variant of database technology. It's time to break out of the box, drive over the lines and succinctly answer the critics' questions. CEP is not about tables, rows and columns but events. Events that are often artifacts of the real world. A world that is in constant motion, be it ships, planes, cars, trucks, phones, or you and me. Information flows from it all in many forms, but that does not mean we have to squeeze it into the database paradigm.

Once again thanks for reading, 
Louie


Thursday, April 09, 2009

Scalable concurrency, a design pattern in the Apama EPL

Posted by Louis Lovas


This is my final installment in a series devoted to a specific example in the Apama EPL. I began this example by describing the basic design pattern of a consumer/producer. Further enhancements enabled multiple consumers and, as a result, the instance idiom. Finally, below, I will again enhance this consumer/producer by illustrating how one can leverage multi-core processors for massive scalability and parallelism.

As I have mentioned before, instances, or 'sub-monitors' as they're often referred to in the Apama EPL, define a discrete unit of work. That unit of work represents a set of business logic, however large (a complete application scenario) or small (a simple analytic). Instances are created on demand using the spawn operator in the language. Each scenario instance is invoked with a unique set of input parameters that represent that occurrence. Each instance can then uniquely maintain its own reference data, timers and event streams, in effect its own state. In general programming patterns this is known as a factory behavioral model, but we've extended it to include an execution model.

To provide a means to leverage multi-core processors, the Apama EPL provides a syntax and a simple semantic to allow those instances to execute in parallel. We do this with a language feature called contexts. These are silos of execution which take the factory model to the next level. A context defines a logical container that holds and executes instances of a scenario (of the same or differing types). The EPL provides a semantic for inter-context communication; there is no need for mutexes, semaphores or other locking schemes, thus avoiding the deadlock-prone code patterns typical of imperative languages such as Java. Each context in effect has its own logical input queue to which events are streamed from external sources or other contexts. Behind contexts our CEP engine squeezes the most out of operating system threads to make maximum use of multi-core processors.

The same CEP engine can create multiple contexts (a context pool, as you'll see in the code example below); they can be used to hold and execute multiple scenario instances, and those instances can create sub-contexts for additional parallelism. If, for example, these instances are an application for pricing options and require a compute-intensive calculation such as Black-Scholes, additional contexts can be spawned for these calculations. Furthermore, sub-contexts can be designed as shared compute services to be leveraged by multiple scenario instances running in different (parallel) contexts.
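
To make the sub-context idea concrete, here is a minimal sketch in the Apama EPL. The OptionPricer monitor, its Result event and the placeholder arithmetic are all invented for illustration; only the context, spawn ... to and enqueue ... to constructs are the language features being shown.

monitor OptionPricer {

    event Result {
        float value;
    }

    action onload {
        // a private sub-context dedicated to the heavy calculation
        context calcCtx := context("PricingCalc", false);
        context mainCtx := context.current();

        // off-load the compute-intensive work, telling it where to reply
        spawn compute(100.0, 95.0, 0.25, mainCtx) to calcCtx;

        Result r;
        on Result():r {
            log "option value = " + r.value.toString();
        }
    }

    action compute(float spot, float strike, float vol, context caller) {
        // placeholder arithmetic standing in for a real Black-Scholes pricer
        float value := (spot - strike) * vol;
        enqueue Result(value) to caller;
    }
}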

Contexts take the factory model and extend it to include a parallel execution model with a few simple keywords in the EPL as you'll soon see below.

The enhancements to the Item consumer/producer include a Context Pool, for which I've listed the code below, and the enhanced Item Producer that leverages it. The interface is unchanged except for one new event, and the Consumer (client) has a minor revision (thus adhering to my belief that an EPL should follow the principles of structured programming, modularity and encapsulation, that I blogged on at the start of this series). The complete example for this revision is available here and requires Apama version 4.1 (or later, of course).





The Context Pool.

package com.apamax.sample;


event ContextPool {
    integer numContexts;
    sequence<context> contexts;
    integer idx;
   
    action create(integer nc, string name) {
        self.numContexts := nc;
        while(nc > 0) {
            contexts.append(context(name, false));
            nc := nc - 1;
        }
    }
   
    action getContext() returns context {
        context c:= contexts[idx];
        idx := idx + 1;
        if(idx=numContexts) then {
            idx := 0;
        }
        return c;       
    }
}


The ContextPool as implemented here is a general-purpose utility that provides a pool of contexts via a create method (i.e. action) and a means to distribute a workload across them using a simple round-robin technique each time the getContext action is called.
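
As a usage sketch (the PoolUser monitor and its doWork action are invented for illustration), spreading eight units of work across a pool of four contexts might look like this:

package com.apamax.sample;

monitor PoolUser {

    action onload {
        ContextPool pool := new ContextPool;
        pool.create(4, "Workers");

        integer i := 0;
        while(i < 8) {
            // round-robin: work units 0..7 land on contexts 0,1,2,3,0,1,2,3
            context c := pool.getContext();
            spawn doWork(i) to c;
            i := i + 1;
        }
    }

    action doWork(integer n) {
        log "work unit " + n.toString();
    }
}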

As I mentioned above, contexts are mapped to operating system threads, so judicious use of the create action is expected. The basic rule of thumb is that the total number of contexts should equal the number of cores on a server. One noteworthy point: contexts can be public or private. A public context means that event listeners running within it can receive event streams from external sources (i.e. adapters); listeners within a private context can only receive events that are directed to the context via the enqueue statement in application logic running in another context. For my example, this context pool utility creates private contexts: context(name, false)

I've leveraged another general capability of the Apama EPL in the implementation of this context pool: actions on events. You'll notice these two actions are enclosed in an event definition which is part of our com.apamax.sample package.

In keeping with its charter of structured programming, actions on events provide a means to promote code modularity by encapsulating reusable utility functions (like a context pool).


 


The (parallel) Item Producer.
package com.apamax.sample;


monitor ItemService {
   
  event ClearUserID {
      integer id;
  }

            
  integer count := 0;
  float price := 0.0;
   
  action onload {
      ContextPool cf:=new ContextPool;
      cf.create(4, "ClientService");
   
      // list of subscriber (user) identifiers
      sequence<integer> ids := new sequence<integer>;
       
      SubscribeToItems s;
      on all SubscribeToItems():s {
          if ids.indexOf(s.subscriberId)= -1 then {
              context c:= cf.getContext();
              ids.append(s.subscriberId);
              route SubscriptionResponse(s.subscriberId, c);
              on completed SubscriptionResponse() {
                  spawn startSubscriptions(s.subscriberId, s.item_name,
                                           context.current()) to c; 
              } 
          }
      }
       
      ClearUserID c;
      on all ClearUserID():c {
          log "in " + c.toString();   
          integer index := ids.indexOf(c.id);
          if index != -1 then {
              ids.remove(index);
          }
      }
  }

  action startSubscriptions(integer this_subscriberId, string name,
                            context mainContext) {
      log "in startSubscriptions";
       
      on all wait(0.1) and not UnsubscribeFromItems(subscriberId =
                                               this_subscriberId) {
          route Item(this_subscriberId, name, count, price);
          count := count + 1;
          price := price + 0.1;
      }

      on UnsubscribeFromItems(subscriberId = this_subscriberId){
          enqueue ClearUserID(this_subscriberId) to mainContext;
      }       
  }
 
}



To get a general sense of what the multi-instance Item Producer code is intended to do, I suggest a quick scan of my last installment; this revision does not change that basic foundation, it only parallelizes it. It is worth pointing out how little the code and design have changed, yet this implementation has the ability to scale massively to tens of thousands of instances across multiple processor cores. Clearly this is just a simple example that does very little real work (producing Item events). However, structurally it's a model that represents how one would design such a scalable service in the Apama EPL.

The parallel Item Producer (like its previous incarnation) manages multiple uniquely identified Consumers. For that it must maintain a list of identifiers, one for each Consumer. But this time, the Producer instance created on behalf of the Consumer is spawned into a context: spawn startSubscriptions(s.subscriberId, s.item_name, context.current()) to c; We're still passing the subscriberId and item_name (the instance parameters), but we also pass the context handle of the main context (context.current()). This is necessary for the inter-context communication.

The Consumer implementation has undergone a minor change to support this parallelized execution mode to match the Producer. A good design pattern is to ensure that monitors that frequently pass events operate within the same context. This is not a hard-and-fast rule, only one that limits the amount of inter-context communication (i.e. enqueueing). I've enhanced the interface slightly: there is a new event, SubscriptionResponse, that is used as a response to subscription requests (on all SubscribeToItems()). This event is used to communicate back to the client the context handle of the Producer spawned on its behalf. Once the Consumer receives this event, it also spawns into this same context. By doing so, both the Producer and Consumer operate as they always did, sending Item events (route Item(this_subscriberId, name, count, price)) and handling termination (on UnsubscribeFromItems). Within each context, the producer/consumer still adheres to the single-cast event passing scheme where it creates and sends uniquely tagged Item events. The Consumer and the Interface are included in the download (not shown here for brevity's sake).
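
That said, based on the description above, a minimal sketch of the revised Consumer might look something like the following. The SubscriptionResponse field names (subscriberId, ctx), the SubscribeToItems field ordering and the literal subscriber id are my assumptions, not the actual downloadable code.

package com.apamax.sample;

monitor ItemClient {

    integer myId := 42; // hypothetical unique subscriber id
    SubscriptionResponse r;
    Item item;

    action onload {
        route SubscribeToItems(myId, "sample");

        // the response carries the context handle of the Producer spawned
        // on our behalf; spawn ourselves into that same context so Item
        // events stay within one context
        on SubscriptionResponse(subscriberId = myId):r {
            context target := r.ctx;
            spawn receiveItems() to target;
        }
    }

    action receiveItems() {
        on all Item(subscriberId = myId):item {
            log "Got an item: " + item.toString();
        }
    }
}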

Two additional noteworthy points to highlight in this Producer implementation.

1) The on completed SubscriptionResponse() listener.  The completed  keyword indicates that this listener wakes up after the SubscriptionResponse  event has been delivered.  This way we can guarantee that our Consumer has received this event and has the context handle before spawning the Producer.

2) To process UnsubscribeFromItems events, the statement enqueue ClearUserID(this_subscriberId) to mainContext; is executed. This statement sends an event to the listener (on all ClearUserID) which executes in another context. Recall that the action startSubscriptions is the target of the spawn operator, so it is the main body of code for which multiple instances are parallelized, running in contexts (from the pool). The onload action, which is controlling all of this spawning, is logically considered the main context. Due to the strong semantic for inter-context communication, events must be enqueued to another context's input queue. Each context in effect has its own input queue, and with the context handle the inter-context communication mechanism is defined. So to communicate the client termination request from the spawned instance running in a private context, the ClearUserID event must be enqueued to the main context where the appropriate listener is waiting.

Routing (i.e. route Item(...)) is still possible, but routed events stay within the boundaries of the context where the Producer and its corresponding Consumer reside. To logically expand the example, multiple Consumers could reside in the same context (i.e. a multi-cast design pattern, as I described in the previous revision of this example).

 

This example is designed to illustrate the simplicity of parallelism in the Apama EPL. With just a few simple statements, one can quickly and easily leverage multi-core processor technologies for massive scalability.

As I mentioned earlier, this is the final entry for this specific example; if you're just seeing this for the first time you can start from the beginning (only three short segments) here. I hope this has been informative and provided some insight into the Apama EPL. I plan to have many more code examples in the future on various use cases.

You can download the complete example here with the consumers, interface and producer. Any questions or comments, just let me know,
Louie


Monday, March 23, 2009

We're going on Twitter

Posted by Giles Nelson

Louis Lovas and I, Giles Nelson, have started using Twitter to comment and respond to exciting things happening in the world of CEP (and perhaps beyond, occasionally!).

The intent is to complement this blog. We'll be using Twitter to, perhaps, more impulsively report our thinking. We see Twitter as another good way to communicate thoughts and ideas.

We would be delighted if you chose to follow our "twitterings" (to use the lingo), and we'll be happy to follow you too.

Click here to follow Louis and here to follow Giles (you'll need to sign up for a Twitter account).

Tuesday, March 10, 2009

The instance design pattern in the Apama EPL

Posted by Louis Lovas



In this, my third installment in a series devoted to the Apama Event Processing Language (EPL), I'll continue where I left off last time, in which I described the basic design of event passing for a consumer/producer model. For this revision I've extended the model to support multiple consumers. This introduces the instance management feature of the Apama EPL. Instances, or 'sub-monitors' as they're often referred to, define a discrete unit of work. The unit of work can be very small (an analytic calculation) or very large (a whole trading algo). Each instance gets spawned with its own parameter set, listens for its own event streams and operates in a very singleton manner. By which I mean, within the semantics of the application an instance need only be concerned with managing itself, not other instances. Overall, it is a factory behavioral model extended to include an execution model. This is a key aspect of the Apama EPL, making a common application requirement simple to implement, robust in design, and highly performant in the CEP model.

The Apama CEP engine manages the execution of these sub-monitors (also known as mThreads internally). In a typical production deployment, there would be 100's or 1000's of sub-monitors running. The spawn operator is the single statement in the language that accomplishes this unique feature. Spawn is basically a self-replicating scheme with certain behavioral rules. The main rule: the cloned instance does not get the active listeners (i.e. on all someEvent...) of the parent; it must establish its own. Actually it's the perfect model for that factory-like behavior. The clone does not want its parent's listeners, but creates its own based on the parameters passed, such as the symbol name in a trading algo or the subscriberId in our Producer example below. Speaking of our example ...

For the sake of brevity, I've just listed the extended Producer side of my consumer/producer example below. For the complete example, you can download it here.

The (extended) Item Producer.
package com.apamax.sample;

monitor ItemService {
   
    event ClearUserID {
        integer id;
    }
   
    UnsubscribeFromItems u;
        
    integer count := 0;
    float price := 0.0;
    listener l;
   
    action onload {
        // list of subscriber (user) identifiers
        sequence<integer> ids := [];
       
        SubscribeToItems s;
        on all SubscribeToItems():s {
            if ids.indexOf(s.subscriberId) = -1 then {
                ids.append(s.subscriberId);           
                 spawn startSubscriptions(s.subscriberId, s.item_name);      
            }
        }
       
        ClearUserID c;
        on all ClearUserID():c {
            log "in " + c.toString();
            integer index := ids.indexOf(c.id);
            if index != -1 then {
                ids.remove(index);
            }
        }
    }

    action startSubscriptions(integer this_subscriberId, string name) {
        log "in startSubscriptions";
       
        l := on all wait(0.1) {
            route Item(this_subscriberId, name, count, price);
            count := count + 1;
            price := price + 0.1;
        }

        on UnsubscribeFromItems(subscriberId = this_subscriberId):u {
            stopSubscriptions(u.subscriberId, u.item_name);
        }       
    }
   
    action stopSubscriptions(integer subscriberId, string name) {
        // do something to stop routing events
        log "in stopSubscriptions";
        l.quit();
        route ClearUserID(subscriberId);
    }
}

           


To get a general sense of what this bit of code is intended to do, I suggest a quick scan of my previous installment where I introduced this example.

The extended Item Producer is expected to manage multiple uniquely identified consumers. For that it must maintain a list of identifiers, one for each consumer. It does that by appending and removing entries from an array (sequence<integer> ids). This is a common idiom for tracking identifiers; syntactically it's similar to many imperative languages.

This example uses a single-cast event passing scheme where the Producer routes Item events uniquely tagged to the consumer (route Item(this_subscriberId, name, count, price)).

On the consumer side, Item events are listened for based on a subscriberId (on all Item(subscriberId = myId)). It's the uniqueness of subscriberId (one per consumer) that defines this as a single-cast design. A common twist to this is a multi-cast event passing scheme (not to be confused with UDP multicast) where multiple consumers might be requesting the same information (i.e. the item_name in our example). A well understood example of this would be a market data adapter providing trade data for the same symbol to multiple consumers. The Item Producer would change very little to support a multi-cast event passing scheme, as the sketch below suggests.
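
To illustrate, here is a rough sketch of a multi-cast Consumer; the monitor is invented and assumes the extended Item event used in this example:

package com.apamax.sample;

monitor MultiCastItemClient {

    Item item;

    action onload {
        // filter on the shared item name rather than a unique subscriber id;
        // one routed Item event now matches listeners in every consumer
        // that registered interest in "sample"
        on all Item(item_name = "sample"):item {
            log "Got an item: " + item.toString();
        }
    }
}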

In the listener "on all SubscribeToItems()", we spawn to the action startSubscriptions when we receive a SubscribeToItems event from a consumer. We pass the consumer's identifier (s.subscriberId) and the item (s.item_name) as parameters to instantiate the new sub-monitor. A new mThread of execution is created for the sub-monitor, and it begins executing, producing Item events. The parent monitor continues waiting (listening) for another SubscribeToItems event.

You'll also notice the use of a private event, ClearUserID; its purpose is to communicate between the spawned sub-monitor(s) and the main (parent) monitor when an UnsubscribeFromItems request is received. This is necessary since the parent monitor manages the ids of connected consumers. A spawned sub-monitor uses this event simply to inform of its termination.

The event paradigm in the Apama EPL extends far beyond the notion of processing data events. In one sense you could categorize events as data and control. Data events are consumed and processed by the CEP application. Control events direct the semantics of the application.




 

This example is designed to illustrate a few powerful yet easy-to-use features of the Apama EPL:

  1. To highlight that managing multiple consumers (clients) becomes a simple and safe programming task in the Apama EPL. Instance management is an intrinsic design pattern based on the commonly understood factory model. We did not reinvent the wheel; we simply refined it and made it approachable in the CEP paradigm.
  2. Events are not just application data to be processed by monitors. They provide semantic control of an application as well.

Here's the complete example with the consumers, interface and producer.

Once again thanks for reading,
Louie


Wednesday, February 25, 2009

Structured Programming in the Apama EPL

Posted by Louis Lovas



This is my second installment in a series devoted to the Apama Event Processing Language, MonitorScript. In my introduction to the Apama EPL I described the basic structure of the language, some concepts and terminology. I also included the obligatory Hello World sample. In this entry I'll continue that basic tutorial, delving into the elemental structure of the EPL. I've decided to do this to highlight the very simple fact that the language adheres to the principles of structured programming. Just so we're all on the same page, here's a definition I found on the web:

DEFINITION - Structured programming (sometimes known as modular programming) is a subset of procedural programming that enforces a logical structure on the program being written to make it more efficient and easier to understand and modify.


That's a definition that is clearly irrefutable. In order for a language to live up to that definition it must have a number of fundamentals. Code modularity, or developing code as separate modules: this allows for parallel development, improves maintainability and allows modules to be plug-replaceable. Encapsulation is a means of hiding an implementation or the inner workings of a calculation or algorithm. Lastly there are interfaces and instances; this architectural pattern may be applied to the design and implementation of applications which transmit events between loosely coupled software components. These are just a few of those fundamentals. These essential elements of languages have been part of the scene for decades. They are present in procedural programming languages like C and in object-oriented languages like C++ and Java.

Lest we forget our roots as software engineers: with all the hype surrounding CEP and its constituent domain-specific languages, it's important to do a reality check and not get swept up and forget the basic principles necessary for creating long-standing, maintainable code. Last year I wrote a piece on readability as a criterion for a successful language; it is primarily based on a development language having these basic fundamentals: modularity, encapsulation, interfaces and instances.

Apama's EPL, MonitorScript, has all these capabilities, as I will describe below. It's what allows us to build reusable framework components and solutions. We've done so with our Capital Markets Framework and Solution Accelerators for FX Aggregation, Smart Order Routing, Surveillance, Bond Pricing, etc. These are components written in our EPL that have the plug'n'play modularity to be redeployed across multiple customers.

To illustrate this idea of structure - modularity, encapsulation and interfaces I'll use a short example of a producer and consumer. This is a common design pattern or idiom we use extensively. The Apama EPL's event paradigm extends not only to the type of CEP applications we enable but also to the nature of the language itself. If you're familiar with message passing languages such as Erlang this will be a familiar concept. Different modules that make up an application communicate with one another by passing messages (or events as is the case). 

In this example I have a service or producer monitor that generates and sends Item events, a client or consumer monitor that consumes Item events, and an interface for the interaction between the two. If the term Monitor seems strange, I've defined a set of terms and concepts in my introduction; I suggest a quick review of that to get up to speed.

The interface to the monitor, defined below, is the set of events it receives and transmits. The event definitions for these events are declared within a package name (i.e. com.apamax.sample). Apama's EPL supports Java-like package names for name-space isolation to strengthen that modularity notion.


The Item Interface.

package com.apamax.sample;

event SubscribeToItems {
    string item_name;
}

event UnsubscribeFromItems {
    string item_name;
}

event Item {
   string  item_name;
   integer item_count;
   float   item_price;
}



The Item Interface is simply a set of prescribed events: two to start/stop the flow of Items, plus the Item definition itself. As I mentioned, it uses a subscription idiom. We use this notion extensively where a Monitor is a layer over some subscription-based service such as a market data adapter. A real market data interface would be much more extensive. Here, I've scaled it back for the sake of simplicity, but you can imagine a more robust interface including error handling and status events, something like the hypothetical sketch below.
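
For instance, such an interface might grow error and status events along these lines (a hypothetical extension, not part of the sample):

package com.apamax.sample;

// hypothetical additions for a more robust interface
event ItemError {
    string item_name;
    string reason;
}

event ItemStatus {
    string item_name;
    boolean available;
}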

 



The Item Consumer.
package com.apamax.sample;

monitor ItemClient {

    SubscribeToItems subscribeToItems;
    Item item;
   
    action onload {
        subscribeToItems.item_name := "sample";
        route subscribeToItems;


        on all Item():item {
            log "Got an item: " +  item.toString();
            if (item.item_count > 10) then {
               route UnsubscribeFromItems("sample");
               log "All done.";
            }
        }
    }
}



The Item consumer is also a Monitor in the com.apamax.sample namespace. It is a user of our event interface to the Item service and as such is interested in receiving events of type Item. The interface defines the means to do this by subscribing; the consumer simply has to create a SubscribeToItems event and forward it to the producer service.

As I mentioned earlier, the Apama EPL adheres to an event paradigm as a fundamental characteristic of the language. The route statement is a means by which Monitors communicate. This is another precept that underlines the fundamentals of modularity and encapsulation.

Once a subscription request has been sent (route subscribeToItems), the consumer listens for Item events (on all Item()). In this simple example, we're just looking to receive them all without any filtering or pattern matching. I will explore event pattern matching, both simple and complex, in a follow-up blog.

To complete the picture, the sample tests a field in the Item event and terminates the subscription if it exceeds a constant value, (item_count > 10).


The Item Producer.
package com.apamax.sample;

monitor ItemService {
   
    SubscribeToItems subItems;
    UnsubscribeFromItems unsubItems;
    integer cnt := 0;
    float price := 0.0;
    listener l;
   
    action onload {

        on all SubscribeToItems():subItems {
                startItems(subItems.item_name);         
        }

        on all UnsubscribeFromItems():unsubItems {
                stopItems(unsubItems.item_name);
        }
    }

    action startItems(string name) {
        l := on all wait(0.1) {
            route Item(name, cnt, price);
            cnt := cnt + 1;
            price := price + 0.1;
        }
    }

   
    action stopItems(string name) {
        // do something to stop routing events
        l.quit();
    }
}



The Item producer is also in the com.apamax.sample namespace. It defines listeners for SubscribeToItems and UnsubscribeFromItems, two events from our interface. Typically, subscriptions would be managed on a per-user basis, thus allowing multiple consumers to subscribe to our Item service. This is a detail I will outline in a subsequent installment along with a few other related features such as instance management.

Once a subscription request has been received, the startItems action (i.e. a method) is invoked to continuously route Item events to the consumer every 0.1 seconds (on all wait(0.1) ...). Again, in a real world scenario, this particular portion of a service would be more involved, possibly managing the interaction with an external adapter, for example.

For terminating a subscription on behalf of the client (on all UnsubscribeFromItems()), we simply terminate the wait listener (a looping construct) set up in startItems.


 


This example is designed to illustrate a few common principles in the Apama EPL:

  1. To stress that the fundamentals of structured programming are ever present: modularity, encapsulation and interfaces. Two benefits of Apama's modularity that are noteworthy relate to the plug'n'play idea I mentioned earlier. a) As is typical with modular programming, revised modules can be introduced with little or no impact as long as interfaces are left intact. This is also true of the Apama EPL. b) Those revised modules (Monitors) can be replaced in a running system; a shutdown is not required. Furthermore, the modularity also extends beyond the boundary of a single CEP engine to multiple CEP engines in an Event Processing Network (EPN) with no actual code change.
  2. The event paradigm is inherent in the language not just for application events but for the entire application design center.

In future installments on the Apama EPL, I'll delve into a few more of the language constructs that extend this basic idiom (multiple consumers, spawning, and parallelism).

Once again thanks for reading,
Louie


Monday, January 19, 2009

A Transparent New Year

Posted by Louis Lovas


It's been a while since I've put a few thoughts in print on these pages on event processing, but that's not to say I've not been busy. We've wrapped up quite a successful year in the Apama division within Progress Software. For this year, 2009, we've got quite a number of new features and capabilities in store for the Apama CEP platform. We will be extending the breadth and depth of our event processing language (EPL) to meet the challenges we've seen in the market and a vision we have for the future. Of course you will see announcements in our standard marketing press releases, but I (and others) will explore the technical merits of those new features in these blog pages in much more detail. Something we've not done that much of in the past; there are many historical reasons for that, not really worth explaining. Suffice to say our intention is to be more transparent in the coming year.

To kick that off, it's worth starting a bit of a tutorial on the Apama EPL, a language we call MonitorScript.  I'll begin with the basics here and in subsequent blogs build upon these main concepts, providing insight into the power and flexibility of our EPL. And as we release new extensions and capabilities it will be easier to explain the benefits of those new features. So without further ado, here is the first installment.

First a few basic concepts...

  • Apama MonitorScript is an imperative programming language with a handful of declarative statements. This is an important consideration and one we highlight as a distinction of our platform against competitive platforms that are based on a declarative programming model. The imperative model provides a more natural style of development, similar to traditional languages like Java and C++.
  • The language is executed at runtime by our CEP engine called the Correlator.  

Second a few basic terms...

  • A monitor defines the outermost block scope, similar to a class in Java or C++. It is the basic building block of Apama applications. A typical application is made up of many monitors. As you might imagine, monitors need to interact with one another; I'll explore that capability in a later blog.
  • An event defines a well-ordered structure of data. The EPTS Glossary definition has a handful of great examples of events.
  • A listener, defined by the on statement, declares or registers interest in an event or event pattern. In a large application the Correlator runtime engine will typically process 1,000's or even 10,000's of registered event patterns. In our example below we just have one.
  • An action defines a method or function, similar to Java or C++.
  • The onload() action is a reserved name (along with a few others) that is analogous to main() in a Java program.
To put it all together... "A monitor typically listens for events and takes one or more actions when an event pattern is triggered."


The language sports a number of capabilities that will be familiar to anyone schooled in Java, C++ or any traditional imperative programming language. I won't bore you with all the nuances of data types and other obvious basics; those are well articulated in our product documentation and customer training courses. I will, however, focus on the unique paradigm of event processing.

Apama MonitorScript, a basic monitor.

event StockTrade {
  string symbol;
  float price;
  integer quantity;
}

monitor StockTradeWatch {

  StockTrade Trade;

  action onload {
    on all StockTrade():Trade processTick;
  }

  action processTick {
    log "StockTrade event received" +
        " Symbol = " + Trade.symbol +
        " Price = " + Trade.price.toString() +
        " Quantity = " + Trade.quantity.toString() at INFO;
  }
}


This monitor, called StockTradeWatch, defines an event of type StockTrade. Events can originate from many sources in the Apama platform; the most obvious would be an adapter connected to a source of streaming data (i.e. stock trades, as the example shows), but they can come from files, databases, other monitors, even monitors in other Correlators.

The onload action declares a listener for events of type StockTrade (i.e. on all StockTrade). When the monitor StockTradeWatch receives StockTrade events, the action processTick is invoked. As you can see in this example, we simply log a message to indicate that it occurred. The obvious intent is that this monitor will receive a continuous stream of StockTrade events, each one processed in turn by the processTick action. One can be more selective in the event pattern with a listener, such as on all StockTrade(symbol="IBM"). I will explore the details of event patterns and complex listeners later on.
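
As a sketch, embedding that selective pattern in the monitor might look like this; the comparison filter (price > 100.0) in the event template is my assumption, not a construct demonstrated in this post:

monitor IBMTradeWatch {

  StockTrade Trade;

  action onload {
    // only IBM trades above a threshold price invoke processTick;
    // the filtering happens inside the Correlator, not in our code
    on all StockTrade(symbol = "IBM", price > 100.0):Trade processTick;
  }

  action processTick {
    log "IBM trade: " + Trade.quantity.toString() +
        " @ " + Trade.price.toString() at INFO;
  }
}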

As I mentioned, I've started with a simple example that shows the basics of the Apama EPL, MonitorScript. It demonstrates the simplicity by which one can declare interest in a particular event pattern, receive matching events and act on them in your application (i.e. action).

In subsequent installments I will demonstrate more complex features highlighting the power of the language. That's all for now.


You can find the second installment here.
Regards,
Louie


Wednesday, February 20, 2008

Apama Monitorscript by Example

Posted by Louis Lovas

There have been a couple of recent posts on Apama's MonitorScript language, both here and here. To provide a bit more insight into the Apama EPL, below is a working sample that demonstrates a number of its capabilities. The language includes a declarative nature for defining and registering listeners for specific event types, and it has a Java-like syntax for imperative logic. The language provides a balance between a recognizable vernacular and a purposed nature for event processing.

 

Example narrative

Prior to an annotated walk-through of the code sample, I thought it would help to first explain its purpose and what event streams it processes. This simple example defines a work dispatcher. It receives a request in the form of an event (AddSymbol) to dispatch a discrete listener against an event stream of market depth (bids and asks) events. This discrete listener processes the market depth for a specific symbol. The actual work performed, as it pertains to this example, is inconsequential and is represented by an empty method (processDepth). Additionally, once a listener is dispatched it also listens for a request to terminate.

 

The subtlety of this example is its ability to leverage the simplicity of the Apama EPL and the power of the runtime engine wherein it executes. Thousands or even tens of thousands of listeners can be dispatched, each running in its own independent context processing its unique slice of the streaming market data.

 

In reality there are a number of techniques that can be employed within the MonitorScript EPL to accomplish this sort of work dispatcher. The EPL includes a spawn operator, which I've outlined in a previous blog. The spawn operator is the primary means for establishing independent worker threads and is the basis for instance creation. The example below focuses on event listeners to define discrete units of work.

 

1  package com.apamax.sample;

2  monitor ProcessMarket {
3    sequence<string> symbols;   // contains list of symbols to process
4    com.apama.marketdata.SubscribeDepth subDepth;
5    com.apama.marketdata.Depth adepth;
6    dictionary<string, string> emptyDict;

7    action onload {

8      // Listen for incoming AddSymbol events and
9      // add to symbols list if not already present
10     AddSymbol addSymbol;
11     on all AddSymbol(): addSymbol {
12       if symbols.indexOf(addSymbol.symbol) = -1 then {
13         string local_symbol := addSymbol.symbol;
14         symbols.append(local_symbol);

15         // Subscribe to this symbol
16         route com.apama.marketdata.SubscribeDepth("", "", local_symbol, emptyDict);

17         // wait for 20.0 seconds; if no depth event received, terminate
18         listener waitListener;
19         waitListener := on wait(20.0) {
20           route RemoveSymbol(local_symbol);
21         }

22         listener depthListener;
23         depthListener := on all com.apama.marketdata.Depth(symbol=local_symbol):adepth {
24           waitListener.quit();
25           processDepth(adepth);
26         }

27         // Listen for RemoveSymbol events and remove from symbols list,
28         // unsubscribe & quit
29         RemoveSymbol removeSymbol;
30         on RemoveSymbol(symbol=local_symbol): removeSymbol {
31           integer index := symbols.indexOf(removeSymbol.symbol);
32           if index != -1 then {
33             symbols.remove(index);
34             processRemove(removeSymbol.symbol);
35
36             // Unsubscribe to this symbol
37             route com.apama.marketdata.UnsubscribeDepth("", "", removeSymbol.symbol, emptyDict);
38             depthListener.quit();
39           }
40         }
41       }
42       else {
43         log "Debug: Ignored (existing) Add Symbol Event = " + addSymbol.symbol;
44       }
45     }
46   }

47   action processDepth(com.apama.marketdata.Depth d) {
48     // Do something
49   }

50   action processRemove(string s) {
51     // Do something.
52   }
53 }

Example Annotation

 

In describing this example, the first point to note is that the event definitions are not included; for the sake of brevity they're assumed to be defined elsewhere. Actually, there are only a few anyway. They can be categorized into two logical groups: control events (AddSymbol, RemoveSymbol, SubscribeDepth, UnsubscribeDepth) and data events (Depth). This categorization is only for a semantic understanding of the example; there is no such classification in the language. Additionally, MonitorScript has a syntax easily recognizable to anyone schooled in Java, C++ and other classic languages.

 

A monitor (line 2) defines the encapsulating block definition. Similar to a Java class, it is typically scoped to a package name space (line 1). Monitors are the main block scope, and a typical Apama application is made up of many monitors that interact with one another by sending and receiving events. Within a monitor one can declare events, define actions (i.e. methods) and variables (integers, strings, floats, etc.). This example defines a handful of monitor-scoped variables. The language also supports a number of complex data types; the sequence and dictionary both use a C++ template-style declaration to define array types and collection types respectively (lines 3 and 6).

 

The onload action (line 7) is the main entry point of a monitor. When a monitor is loaded into the runtime engine, its onload action is immediately invoked. This work dispatcher example is entirely implemented within this action; for the sake of brevity it's a simple way to describe the language. Line 10 defines an instance of an AddSymbol event, and line 11 declares a listener for all occurrences of this event type. The remainder of the functionality of this example is scoped to the encapsulating block of this listener (lines 12 – 45). This is an important note, since the intent is to receive and process multiple AddSymbol events (potentially 1,000's), where each AddSymbol causes the invocation (dispatch) of a discrete unit of work that is represented by this encapsulating block of code. Within this block of code we communicate with other monitors and establish a number of unique listeners for this unique symbol name.

 

The route statement (line 16) sends a SubscribeDepth event. Route is the standardized form of communication between monitors. Under the covers, the route statement causes the event to be placed at the head of the engine's input queue, thus becoming the next event to be processed by the engine. Semantically, routing a SubscribeDepth event starts the flow of Depth events for this symbol (i.e. local_symbol). Lines 22-26 establish a listener to receive the stream of Depth events for this symbol, calling the action processDepth upon receipt of each one.

 

In addition to establishing a Depth listener, this block of code also creates a wait timer in lines 17-21. The purpose of this timer is to terminate this dispatched unit of work for this unique symbol if we do not receive an initial Depth event within 20 seconds. Line 24 kills that wait listener once the Depth events start flowing. Termination is handled by the RemoveSymbol listener declared at line 30. Note that since it will be executing within the context of a specific symbol's unit of work, we're only interested in receiving a single occurrence of RemoveSymbol. This is specified in the on statement – sans the all modifier. Upon receipt of a RemoveSymbol event we unsubscribe, remove the symbol's entry from the list and terminate (i.e. quit) the Depth listener for this symbol. Like AddSymbol, RemoveSymbol control events can arrive from another monitor or a client connected to the runtime engine.

 

I hope this simple example sheds light on the simplicity, elegance and power of the Apama MonitorScript EPL.

 

Post Script …

 

After posting this blog, one of my esteemed colleagues with a much better command of the MonitorScript language offered a few refinements to avoid the need to manually handle termination (i.e. lines 17 – 21 in the code snippet). It does add one new control event, Terminate, but it avoids the need to use listener variables.

 

 

on com.apama.marketdata.Depth(symbol=local_symbol):adepth and not wait(20.0) {
   // process the initial Depth event too, then hand off to the inner listener
   processDepth(adepth);
   on all com.apama.marketdata.Depth(symbol=local_symbol):adepth and not Terminate(local_symbol) {
      processDepth(adepth);
   }
}

...

on RemoveSymbol(symbol=local_symbol):removeSymbol {
   ...
   route Terminate(removeSymbol.symbol);
}

This enhancement shows the declaration of complex (or compound) listeners against multiple event streams (and a timeout condition) concurrently. This is a commonly used technique in MonitorScript – and clearly quite powerful.

 

Monday, February 11, 2008

Apama CEP Code Snippet

Posted by Chris Martins

We've posted examples of coding in alternative CEP languages in the past to illustrate how concise or verbose those approaches might be in expressing an event processing function.  An example of Apama's language has made an appearance in a blog posting by Lab49.  In the example, the code defines very crisply an operation in which the system responds to incoming price events, ensuring that you skip intermediate events and always process the latest event.


Tuesday, November 13, 2007

Taking Aim

Posted by Louis Lovas

I am both humbled by and appreciative of all the accolades, constructive comments (hey, fix that misspelled word) and, yes, criticism of my latest blog about using SQL for Complex Event Processing. I was expecting some measure of response, as shown by this rebuttal, given the somewhat polarizing nature of using the SQL language for CEP applications. For every viewpoint there is always an opposing, yet arguably valid, outlook. I welcome any and all commentary. Again, thanks to all reading and commenting.

There were two main themes to the criticism that I received. One was on the viability of the SQL language, the other on my commentary on the use of Java and C++ for CEP applications. I would like to clarify and reinforce a few points that I made.

I chose the aggregation use-case as an example to highlight limitations of SQL because it's one that I have recent experience with. I have been both directly and indirectly involved in six aggregation projects for our financial services customers over the past year. In those endeavors I've both learned much and leveraged much. As I tried to describe in a condensed narrative, aggregation is a challenging problem, one that is best solved by use of complex nested data structures and associated program logic. Trying to represent this in SQL is a tall order given the flat two-dimensional nature of SQL tables and the limited ability for semantic expression. To try to further explain this, I have taken a snippet of a MonitorScript-based implementation, presenting it as a bit of pseudo code that describes the nested structure I am referring to. I've intentionally avoided any specific language syntax and I've condensed the structures to just the most relevant elements. But suffice to say, defining these structures is clearly possible in Java, C++ and Apama's MonitorScript. I would also like to give credit where it's due and acknowledge many of my colleagues for the (abridged) definition I'm using below, from one of our customer implementations.

structure Provider {
    string symbol;                       // symbol (unique to this provider)
    string marketId;                     // the market identifier of the price point
    integer quantity;                    // quantity the provider is offering
    float timestamp;                     // time of the point
    float cost;                          // transaction cost for this provider
    hashmap<string,string> extraParams;  // used for storing extra information on point
}

structure PricePoint {
    float price;                // the price (either a bid or ask)
    array<Provider> providers;  // array of providers at this price
    integer totalQty;           // total quantity across all providers at this price
}

structure AggregatedOrderBook {
    integer sequenceId;        // counter incremented each time the book is updated
    integer totalBidQuantity;  // total volume available on the bid side
    integer totalAskQuantity;  // total volume available on the ask side
    integer totalProviders;    // total number of providers
    array<PricePoint> bids;    // list of bids, sorted by price
    array<PricePoint> asks;    // list of asks, sorted by price
}
An aggregation engine would create an instance of AggregatedOrderBook for each symbol, tracking prices per market data Provider. As market data quotes arrive, they are decomposed and inserted (sort/merged) into the appropriate PricePoint, and total values are calculated. This is an oversimplification of what transpires per incoming quote, but the aim here is to provide a simplified yet representative example of the complexities of representing an Aggregated Order Book.

Furthermore, after each quote is processed and the aggregate order book is updated, it's imperative that it be made available to trading strategies expeditiously. Minimizing the signal-to-trade latency is a key measure of success in algorithmic trading. Aggregation is a heavyweight, compute-intensive operation. It takes a lot of processing power to aggregate 1,000 symbols across 5 exchanges. As such, it is one (of many) opposing forces to the goal of minimizing latency. So this presents yet another critical aspect of aggregation: how best to design it so that it can deliver its content to eagerly awaiting strategies. One means of minimizing that latency is to have the aggregation component and trading strategies co-resident within the CEP runtime engine. Passing (or otherwise providing) the aggregated order book to the strategies becomes a simple 'tap-on-the-shoulder' coding construct. But it does imply the CEP language has the semantic expressiveness to design and implement both aggregation and trading strategies, and then the ability to load and run them side-by-side within the CEP engine. Any other model implies not only multiple languages (i.e. Java and streamSQL) but likely some sort of distributed, networked model. Separating aggregation from its consumers, the trading strategies, will likely incur enough overhead that it impacts that all-important signal-to-trade latency measure. I do realize that the CEP vendors using a streaming SQL variant have begun to add imperative syntax to support complex procedural logic and "loop" constructs, something I'm quite glad to see happening. It only validates the claim I've been making all along: the SQL language at its core is unsuitable for full-fledged CEP-style applications. The unfortunate side effect of these vendor-specific additions is that they will fracture attempts at standardization.
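
In Apama EPL terms, that 'tap on the shoulder' could be as simple as a routed event. The BookUpdated event and Strategy monitor below are invented to illustrate the co-residency point, not taken from a real implementation:

// hypothetical notification routed by the aggregation component each
// time it finishes updating a symbol's book
event BookUpdated {
    string symbol;
    integer sequenceId;
}

monitor Strategy {

    BookUpdated bu;

    action onload {
        // co-resident with the aggregator, so receipt is an in-memory
        // hand-off rather than a network hop
        on all BookUpdated(symbol = "EURUSD"):bu {
            log "book update " + bu.sequenceId.toString() + " for " + bu.symbol;
        }
    }
}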

In my previous blog, I wanted to point out the challenges the SQL language faces in both implementing logic and managing application state. To that end, I provided a small snippet of a streamSQL variant. A criticism leveled against it states that it's an unnecessarily inefficient bit of code. I won't argue that point, and I won't take credit for writing it either. I simply borrowed it from a sample application provided with another SQL-based CEP product. The sample code a vendor includes with their product is all too often taken as gospel; a customer's expectation is that it represents best-practice usage. Vendors should take great care in providing samples, portions of which inevitably end up in production code.

The second criticism I received was on a few unintentionally scathing comments I made against Java and C++. I stated that using C++ and/or Java "means you start an application's implementation at the bottom rung of the ladder". My intent was to draw an analogy to CEP with its language and surrounding infrastructure. All CEP engines provide much more than just a language: a runtime engine or virtual machine, connectivity components, visualization tools and management/deployment tools. CEP vendors, like all infrastructure vendors, live and die by the features, performance and quality of their products. All too often I've witnessed customers take a "not invented here" attitude; they survey the (infrastructure) landscape and decide "we can do better". For a business's IT group, chartered with servicing the business, to think it can implement infrastructure itself is a naïve viewpoint. Granted, on occasion requirements might be so unique that the only choice is to start slinging C++ code, but weighing the merits of commercial (and open source) infrastructure should not be overlooked.

My goal in this and past blogs is to provide concrete use-cases and opinions on CEP drawn from my own experiences designing, building and deploying Apama CEP applications. In doing so I was quite aware that I was drawing a big red bulls-eye on my back, making me an easy target for detractors to take aim at. Surprisingly, I have received much more positive commentary than I ever expected, and the criticisms have been thoroughly professional. I thank all who have taken the time to read my editorials; I am quite flattered.

Monday, November 05, 2007

Bending the Nail

Posted by Louis Lovas

In my recent blogs (When all you have is a hammer everything looks like a nail and Hitting the nail on the head) one could conclude that I've been overly inflammatory toward SQL-based CEP products. I really have no intention of being seditious, just factual. I've been building and designing software far too long to have an emotional response to newly hyped products or platforms. Witnessing both my own handiwork and many compelling technologies fade from glory all too soon has steeled me against any fervent reactions. I've always thought the software business is for the young. As with most professions, one gets hardened by years of experience, some good, some only painful lessons. Nonetheless, over time that skill, knowledge and experience condenses. The desire to share that (hard-won) wisdom is all too often futile; the incognizant young are too busy repeating the same mistakes to take notice. Funny thing is … wisdom is an affliction that inevitably strikes us all.

Well, enough of the philosophical meanderings; just the facts please …

In a recent blog, I explored the need for CEP-based applications to manage state. As a representative example, I used the algo-trading example of managing Orders-in-Market. The need to process incoming market data and to take care of Orders placed is paramount to the goals of algorithmic trading. I'll delve a bit deeper into the state management requirement, this time focusing on the management of complex market data, the input, if you will, to the algorithmic engine. Aggregation of market data is a trend emerging across all asset classes in Capital Markets. Simply put, aggregation is the process of collecting and ordering quote data (bids & asks) from multiple sources of liquidity into a consolidated Order Book. In the end, this is a classic sort/merge problem. Incoming quotes are dissected and inserted into a cache organized by symbol, exchange and/or market maker, and sorted by Bid and Ask price.

Aggregation of market data is applicable to many asset classes (i.e. Equities, Foreign Exchange and Fixed Income). The providers of liquidity in any asset class share a number of common constructs but an equal number of unique oddities. For the aggregation engine, there are likewise common requirements (i.e. sorting/merging) and a few unique nuances. It's the role of the aggregation engine to understand each provider's subtleties and normalize them for the consuming audience. For example, different Equities Exchanges (or banks providing FX liquidity) can use slightly different symbol naming conventions. Likewise, transaction costs can (or should) have an influence on the quote prices. Many FX providers put a time-to-live (TTL) on their streaming quotes, which implies the aggregation engine has to handle price expirations (and subsequently eject expired prices from its cache), as sketched below. In the event of a network (or other) disconnection, the cache must be cleansed of that provider's (now stale) prices. The aggregation engine must account for these (and a host of other needs) since its role is to provide a single unified view of an Order Book to trading applications.

The trading applications can be on both sides of the equation. A typical Buy-side consumer is a Best Execution algo: client orders or Prop desk orders are filled by sweeping the aggregate book from the top. For Market Makers, aggregation can be the basis for a Request For Quote (RFQ) system.
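
To illustrate just one of those provider nuances, here is a minimal Java sketch of quote expiration and disconnect handling. The cache layout, names and the caller-driven expiry pass are my own illustrative assumptions, not taken from any product or customer code.

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch: evict quotes whose time-to-live has elapsed,
// and purge a provider's prices wholesale on disconnect.
public class QuoteCache {
    static class Quote {
        final String provider;
        final double price;
        final long expiresAtMillis;   // absolute expiry derived from the TTL
        Quote(String provider, double price, long ttlMillis) {
            this.provider = provider;
            this.price = price;
            this.expiresAtMillis = System.currentTimeMillis() + ttlMillis;
        }
    }

    private final Map<String, Quote> quotesByKey = new HashMap<>();

    void put(String key, Quote q) { quotesByKey.put(key, q); }

    // Drop prices whose TTL has elapsed (run periodically or per update).
    void expireStale() {
        long now = System.currentTimeMillis();
        Iterator<Quote> it = quotesByKey.values().iterator();
        while (it.hasNext()) {
            if (it.next().expiresAtMillis <= now) { it.remove(); }
        }
    }

    // On disconnect, every price from that provider is stale and must go.
    void purgeProvider(String provider) {
        Iterator<Quote> it = quotesByKey.values().iterator();
        while (it.hasNext()) {
            if (it.next().provider.equals(provider)) { it.remove(); }
        }
    }
}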

At first glance, one would expect SQL-based CEP engines to handle this use-case effectively. After all, sorting and merging (joining) is a common use of SQL in the database world, and streaming SQL does provide Join and Gather type operators. However, the complexities of an aggregation model quickly outstrip the use of SQL as an efficient means of implementation. The model requires managing/caching a complex multi-dimensional data structure. For each symbol, multiple arrays of a price structure are necessary, one for the bid side, another for the ask side. Each element in the price structure would include the total quantity available at that price and a list of providers. Each provider entry, in turn, ends up being a complex structure in itself, since it would include any symbol mapping, transaction costing, expiration and connectivity information. At the top level of the aggregate book would be a summation of the total volume available (per symbol, of course). Algos more interested in complete order fulfillment (i.e. fill-or-kill) would want this summary view.

Using stream SQL to attempt to accomplish this would mean flattening this logical multi-dimensional object into the row/column format of a SQL table. SQL tables can contain only scalar values; multidimensionality can only be achieved by employing multiple tables. I don't mean to imply this is undesirable or illogical; initially it seems like a natural fit. However, an Aggregated Book is more than just its structure; as I mentioned above, it also embodies a wealth of processing logic. In the end one would be bending the SQL language to perform unnatural acts in any attempt to implement this complex use-case.

To illustrate an unnatural act, here's a very simple streamSQL example. The purpose of this bit of code is to increment an integer counter, (TradeID = TradeID + 1) on receipt of every tick (TradesIn) event and produce a new output stream of ticks (Trades_with_ID) that now includes that integer counter - a trade identifier of sorts.

 

CREATE INPUT STREAM TradesIn (
    Symbol string(5),
    Volume int,
    Price double
);

CREATE MEMORY TABLE TradeIDTable (
    TradeID int,
    RowPointer int,
    PRIMARY KEY(RowPointer) USING btree
);

CREATE STREAM Trades_with_ID;

INSERT INTO TradeIDTable (RowPointer, TradeID)
    SELECT 1 AS RowPointer, 0 AS TradeID
    FROM TradesIn
ON DUPLICATE KEY UPDATE
    TradeID = TradeID + 1
RETURNING
    TradesIn.Symbol      AS Symbol,
    TradesIn.Volume      AS Volume,
    TradesIn.Price       AS Price,
    TradeIDTable.TradeID AS TradeID
INTO Trades_with_ID;

 

The state to manage and the processing logic in this small stream SQL snippet amount to no more than incrementing an integer counter (i.e. i = i + 1). To accomplish this very simple task, a memory table (TradeIDTable) is used to INSERT and then SELECT a single row (1 AS RowPointer) that contains the incrementing integer (ON DUPLICATE KEY UPDATE TradeID = TradeID + 1) whenever a new TradesIn event is received. In a way, a rather creative use of SQL, don't you think? However, simply extrapolate the state requirements beyond TradeID int and the processing logic beyond TradeID = TradeID + 1, and you quickly realize you would be bending the language to the point of breaking.
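
For contrast, here is a minimal sketch of the same logic in Java (the class and field names merely mirror the streamSQL example above). The whole memory table and upsert dance collapses to one field and one increment.

// Hypothetical sketch: the trade-identifier logic in an imperative language.
public class TradeIdAssigner {
    static class Trade {
        String symbol; int volume; double price; int tradeId;
        Trade(String s, int v, double p) { symbol = s; volume = v; price = p; }
    }

    private int tradeId = 0;

    // Called for each incoming tick; returns the tick tagged with an identifier.
    public Trade onTrade(String symbol, int volume, double price) {
        Trade t = new Trade(symbol, volume, price);
        t.tradeId = ++tradeId;   // i = i + 1, and nothing more
        return t;
    }
}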

In the commercial application world, relational databases are an entrenched and integral component, and SQL is the language applications use to interact with those databases. As applications have grown in complexity, their data needs have grown in complexity as well. One outgrowth of this is a breed of application service known as Object-Relational (O/R) mapping. O/R mapping technologies have emerged to fill the impedance mismatch between an application's object view of data and SQL's flat, two-dimensional view. A wealth of O/R products are available today, so the need for such technologies clearly exists.
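
As a point of reference, here is a minimal sketch of what that mapping looks like with JPA-style annotations, one common flavor of O/R technology; the entities and fields below are hypothetical, loosely echoing the book structures above.

import java.util.List;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.OneToMany;

// Hypothetical sketch: the nested object graph an application wants,
// which the O/R layer must spread across flat relational tables.
@Entity
public class PricePointEntity {
    @Id
    private long id;

    private double price;

    // One price level holds many providers; relationally this nesting
    // becomes a second table joined back via a key.
    @OneToMany
    private List<ProviderEntity> providers;
}

@Entity
class ProviderEntity {
    @Id
    private long id;
    private String symbol;
    private int quantity;
}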

Why am I mentioning O/R technologies in a CEP blog? Simply to emphasize the point that the SQL language, as validated by the very existence of O/R technologies in the commercial space, is a poor choice for CEP applications. As I've mentioned in previous blogs, programming languages that provide the vernacular to express both complex structures (objects) and complex semantics (programming logic) are as necessary for aggregation as they are for Orders-in-Market or any CEP application.

So what sort of language is appropriate for CEP? Well, there is always the choice of Java or C++. Traditional languages such as Java and C++ clearly provide this expressiveness and can be used to build applications in any domain. However, trailing along behind that expressiveness is risk: using these languages means you start an application's implementation at the bottom rung of the ladder, and the risk associated with this is evident in many a failed project. A step up is domain-specific languages. For the domain of streaming data, Event Programming Languages (EPLs) are the clear winners. Like C++ and Java they contain syntax for defining complex objects (like an Aggregated Order Book) and imperative execution, but they also include a number of purpose-built declarative constructs specifically designed to process streaming data efficiently. Apama's MonitorScript is one such EPL.