Louis Lovas

Sunday, June 21, 2009

High Frequency Trading driving the need to build quickly, run fast

Posted by Louis Lovas



In just about any race there is usually a starting point and a finish line, unless of course you are in an arms race.  For that sort of race there may have been some nebulous beginning in the distant past, but there is no finish line. The race just keeps sprinting along, each competitor angling for an edge, regularly recharging their ammunition supply with some new weaponry to get ahead however slight or temporary.

I recently read an interesting article describing High Frequency Trading as embroiled in an arms race. I certainly believe it's well entrenched in such a conflict, but frankly this combat has arguably had a beneficial net effect, especially in that it has contributed to a wellspring of invention, inspiring the creative spirit in all the supporting attributes that make High Frequency Trading a reality. Behind any trader (and trading firm) is an entire armada, including the vendors supplying the underlying hardware, networks, software platforms and trading applications. They are all immersed in the war. As new hardware, software and/or algos are deployed, the trader can do battle and speed ahead, even if just for a short while. Competitive pressures, increasing market volatility, regulatory imperatives, risk mitigation and a host of other challenges are the land mines and roadside bombs on the long and winding road, stalling progress and forcing the re-tooling and re-stocking of ammunition (i.e. algo strategies). There is no time to stop and catch your breath or stand on the roadside.

Sang Lee from the Aite Group reports that High Frequency Trading has had a significant impact on the overall market, providing greater liquidity, tighter spreads and overall improving the quality of the market. At the macro level these are great advancements and mark a natural evolutionary step driven by so many market changes in recent years (i.e. electronic trading venues, adoption of CEP platforms for algo trading, etc.) in Equities and beyond (i.e. FX and Futures & Options). Down in the trenches, the battles rage on day by day as a multitude of traders and an untold number of algo strategies provide the market liquidity by moving in and out of positions in milliseconds (or even less time). The trading firms engaged in this never-ending conflict impose a set of imperatives on the software infrastructure used to build and deploy algos on the High Frequency battlefield:

Rapid development and customization of algos

Algo strategies in the High Frequency world have a limited lifetime. They soon become obsolete (i.e. whatever alpha they exploited has disappeared due to competition, economic changes, or other circumstances). To react and respond to this inevitability, having the right sort of tooling to recalibrate strategies is a necessity. This includes graphical modeling tools for Quants to prototype ideas quickly, backtest against historic data, test in a scalable manner to instill confidence prior to production rollout and, lastly, dynamically parameterize strategies from graphical dashboards. And not forgetting the code-slinging types, an Integrated Development Environment (IDE) supporting event processing language (EPL) development for lower-level tasks.


Abstracting over increasingly complex strategy logic

Supporting Quants with a rich and robust set of functionality from the basics (connectivity to markets) to the advanced (Linear Algebra, Black Scholes, and other statistical functions).


Support for the 'ilities (availability, security, reliability, ...) to manage the mundane

Deploy with confidence. An important role of software infrastructure is to instill confidence that deployed strategies are always available, securely accessed and run without failure.


Support for scalable performance, providing high throughput and low latency

This is probably the paramount requirement in the arms race of High Frequency Trading. The race to the micro-second is pushing hardware and software vendors alike. Parallelism in CEP engines like Apama's Correlator can leverage multi-core processor architectures like the Intel Nehalem.


Along with my colleague Dan Hubscher, I have recorded a 30-minute webinar that describes how the Apama platform, along with the Apama Algorithmic Trading Accelerator, meets these imperatives.

The pre-recorded webinar is available here: Apama Algorithmic Trading Accelerator, Build Quickly, Run Fast.

Once again thanks for reading (plus watching and listening to the webinar in this case); you can also follow me on Twitter, here.
Louie



Wednesday, May 27, 2009

Location Tracking, fit-for-purpose design patterns in CEP

Posted by Louis Lovas


As the CEP community is well aware, the technology often gets compared and contrasted to traditional data processing practices. It's a topic that ebbs and flows like the ocean's tides. It's a conversation that occurs regularly in various forums, with clients, analysts and of course within this community at large. Far be it from me to stir things up again, but I believe there is some credible evidence the critics draw upon. This touched a nerve not too long ago when Curt Monash hammered the CEP community. The response was swift, but frankly unconvincing.

In many respects, I believe this argument is rooted in how some CEP vendors have architected their products. Many vendors treat event stream processing as a variant of database processing. They see streaming data as just a database in motion and have therefore designed and built their products myopically around that paradigm. By doing so those vendors (possibly inadvertently) have plotted a course where the fit-for-purpose of their products is focused on use-cases that are data-intake dominant. They can consume the fire-hose of data, filter, aggregate and enrich it temporally, but little else. What is missing in this equation is the definition of the semantics of the application, whether that is a custom application such as algo trading, monitoring telco subscribers or some business intelligence (BI) dashboard. To those vendors, that is viewed as adjunct or external (i.e. the client), solved with the typical complement of technologies (java, C++, .NET) and considered outside the direct domain of CEP.

While this paradigm clearly does work, it incites the (CEP) critics: "Where are the distinguishing characteristics? Why can't I just do this with traditional data processing technologies?"

A challenging question when so many CEP products are stuck with the look and feel of a database; even a few of the academic projects I've reviewed seem to be myopically centered on this same paradigm. It reminds me of that television commercial with the tag line: "Stay Within the Lines. The Lines Are Our Friends." (I think it was for an SUV). Quite frankly such thinking does not represent the real world. Time to think outside the box (or table as the case may be).

Yes, the real world is full of in-motion entities, most often interacting with each other in some way. Cars and trucks careening down the freeway zigzag from one lane to another at high speed with the objective of reaching a destination in the shortest possible time without a collision. It would make an interesting CEP application to track and monitor the location and movement of such things.

In fact, location tracking is beginning to show signs of being a common use-case with the Apama platform. Not long ago we announced a new customer, Royal Dirkzwager, that uses Apama to track ship movements in sea lanes. My colleagues Matt Rothera and David Olson recently published a webinar on maritime logistics. This webinar follows the same basic premise as the Royal Dirkzwager use-case, that of tracking the location of ships at sea. In fact, we aren't the only ones seeing activity in location tracking; here's a similar use-case for CEP in location-based defense intelligence. The basic idea is the ability to track the movement of some entity, typically in relation to other entities: are they getting closer together (i.e. collision detection) or moving further apart (i.e. collision avoidance)? Are they moving at all, and at what speed? Will they reach a destination at an appropriate time? A CEP system needs, at its core, both temporal and geospatial concepts to easily support this paradigm. Here's a handful of examples where this applies:

  • Tracking ship movements at sea (as I mentioned with Royal Dirkzwager, and the Apama webinar on maritime logistics)
  • Airplanes moving into and out of an airspace
  • Baggage movement in an airport
  • Delivery trucks en route to destinations
  • Service-enabled mobile phones delivering content as people move through shopping and urban areas
  • Men, machines and material moving on the battlefield


These are just a handful of location tracking use-cases for which the Apama platform is well suited.

Another colleague, Mark Scannell, has written a location tracking tutorial that is part of the Apama Studio product. This is a great example that exemplifies the power of the Apama EPL for building location tracking CEP applications. The tutorial provides a narrative description explaining its purpose and the implementation. Below I've included a small snippet of that example to highlight the elegant simplicity and powerful efficiency of the Apama EPL. If you're in need of a brief introduction to the Apama EPL, you can find that here, the beginning of a three-part series on the language.


Location Tracking in the Apama EPL
// Track me - the tracked entity
action trackMe() {
       
  // Call self again when new location is detected
  on LocationUpdate( id = me.id ):me {
     trackMe();
  }

  // Detect a neighbor in our local area -
  // do this by excluding events that are for our id,
  // which will also cause us to reset this listener
  // through calling trackMe() again.
 
  LocationUpdate n;
  on all LocationUpdate( x in [ me.x - 1.0 : me.x + 1.0 ],
                         y in [ me.y - 1.0 : me.y + 1.0 ] ):n and
     not LocationUpdate( id = me.id ) {

     // Increment number of neighbors that have been spotted
     // and update the alarm
     spotted := spotted + 1;
     updateAlarm();           

     // Decrement count of spotted neighbors after one second
     on wait ( 1.0 ) {
       spotted := spotted - 1;
       updateAlarm();
     }
  }
}

As a brief introduction, the Location Tracker tutorial is designed to track the movement of Entities (i.e. cars, ships, trucks, planes, or any of those things I listed above) in relation to other Entities within a coordinate system or grid. An entity is considered a neighbor if it is within 2 grid units (-1,+1) of any other entity. The grid and the units within it are largely irrelevant to the syntactic definition of entity tracking; their semantic meaning, on the other hand, comes from the context of a specific use-case (i.e. a shipping harbor, air space, battlefield, etc.).

From the tutorial I pulled a single action, trackMe; it contains the heart and soul of the tracking logic. As entities move they produce LocationUpdate events. The event contains the entity's unique id and the X,Y coordinate of the new location. This trackMe action is designed to track their movement by monitoring LocationUpdate events. For each unique entity there is a spawned monitor instance (a running micro-thread so to speak) of this trackMe action.
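
For reference, neither the LocationUpdate event definition nor the per-entity spawning is shown in the snippet above; based on the fields the listeners reference, they would look roughly like the following sketch (my own reconstruction, so the tutorial's actual code may well differ):

// Reconstructed from the listener patterns above - not the tutorial's exact code
event LocationUpdate {
   integer id;   // unique identifier of the moving entity
   float x;      // current grid coordinates
   float y;
}

// One way the tutorial might spawn a tracker per unique entity.
// knownIds, me and this action live at monitor scope, alongside trackMe().
sequence<integer> knownIds;   // defaults to an empty sequence
LocationUpdate me;

action spawnTrackers() {
   LocationUpdate lu;
   on all LocationUpdate():lu {
      if knownIds.indexOf(lu.id) = -1 then {
         knownIds.append(lu.id);
         me := lu;          // the spawned instance copies this state
         spawn trackMe();   // each entity gets its own tracker instance
      }
   }
}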

The idea is that when an entity moves, its new location is instantly compared against all other tracked entities (except, of course, itself; witness the recursive call to trackMe when ids match (id = me.id)) to determine if it has become a neighbor (remember the 2 grid units). This is elegantly implemented with the listener "on all LocationUpdate( x in [me.x - 1.0 : me.x + 1.0], ...". In a narrative sense, this can be read as "If the X,Y coordinate of this entity's new location is within 2 grid units (-1.0, +1.0) of me, then identify it as a neighbor and update an alarm condition" (via a call to updateAlarm()).

This small bit of code (about 20 lines) exhibits an immensely powerful geospatial concept, the ability to track the movement of 100's, 1000's even 10,000's of entities against each other as they move, and of course this is accomplished with millisecond latency.


 


This small example demonstrates a few of the characteristics of the Apama EPL, specifically that it is an integrated, well-typed procedural language with event expressions. It lets you express complex event conditions elegantly in the middle of your procedural program, so you can focus on the logic of your application instead of just selecting the right event condition.

However to get a clear picture, the language of CEP is just one aspect of an overall platform. The Apama strategy has also been focused on a whole product principle, one where the whole is greater than the sum of the parts. As a mandate to our vision we have outlined four key defining characteristics: 1) A scalable, high performing Event Engine. 2) Tools for rapid creation of event processing applications supporting business and IT users. 3) Visualization technologies for rich interactive dashboards and 4) An Integration fabric for the rapid construction of high performance, robust adapters to bridge into the external world.

The possibility exists that CEP will struggle to break out as a truly unique technology platform when so many just see a variant of database technology. It's time to break out of the box, drive over the lines and succinctly answer the critics' questions. CEP is not about tables, rows and columns but events. Events that are often artifacts of the real world. A world that is in constant motion, be it ships, planes, cars, trucks, phones, or you and me. Information flows from it all in many forms, but that does not mean we have to squeeze it into the database paradigm.

Once again thanks for reading, 
Louie


Thursday, April 09, 2009

Scalable concurrency, a design pattern in the Apama EPL

Posted by Louis Lovas


This is my final installment in a series devoted to a specific example in the Apama EPL. I began this example by describing the basic design pattern of a  consumer/producer.  Further enhancements enabled multiple consumers and as a result the instance idiom.  Finally below, I will again enhance this consumer/producer by illustrating how one can leverage multi-core processors for massive scalability and parallelism.

As I have mentioned before, instances or 'sub-monitors' as they're often referred to in the Apama EPL define a discrete unit of work. That unit of work represents a set of business logic, however large (a complete application scenario) or small (a simple analytic). Instances are created on demand using the spawn operator in the language. Each scenario instance is invoked with a unique set of input parameters that represent that occurrence. Each instance can then uniquely maintain its own reference data, timers and event streams, in effect its own state. In general programming patterns this is known as a factory behavioral model, but we've extended it to include an execution model.

To provide a means to leverage multi-core processors, the Apama EPL provides a syntax and a simple semantic to allow those instances to execute in parallel. We do this with a language feature called contexts. These are silos of execution which take the factory model to the next level. A context defines a logical container that holds and executes instances of a scenario (of the same or differing types). The EPL provides a semantic for inter-context communication; there is no need for mutexes, semaphores or other locking schemes, thus avoiding the deadlock-prone code patterns typical of imperative languages such as java. Each context in effect has its own logical input queue to which events are streamed from external sources or other contexts. Behind contexts, our CEP engine squeezes the most out of operating system threads to make maximum use of multi-core processors.

The same CEP engine can create multiple contexts (a context pool, as you'll soon see in the code example below); these can hold and execute multiple scenario instances, and those instances can in turn create sub-contexts for additional parallelism. If, for example, these instances are an application for pricing Options and require a compute-intensive calculation such as Black Scholes, additional contexts can be spawned for these calculations. Furthermore, sub-contexts can be designed as shared compute services to be leveraged by multiple scenario instances running in different (parallel) contexts.

Contexts take the factory model and extend it to include a parallel execution model with a few simple keywords in the EPL as you'll soon see below.

The enhancements to the Item consumer/producer include a Context Pool, which I've listed the code for below, and an enhanced Item Producer that leverages it. The interface is unchanged except for one new event, and the Consumer (client) has a minor revision (thus adhering to my belief that an EPL should follow the structured-programming principles of modularity and encapsulation that I blogged about at the start of this series). The complete example for this revision is available here and requires Apama version 4.1 (or later of course).
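
That new interface event isn't reproduced below, but judging from how the Producer constructs it (route SubscriptionResponse(s.subscriberId, c)), a plausible definition would be along these lines (the field names are my own guesses; the definitive version is in the download):

package com.apamax.sample;

// Hypothetical definition of the new interface event - it carries the
// subscriber's id and the handle of the context the Producer spawned into.
event SubscriptionResponse {
    integer subscriberId;
    context serviceContext;
}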





The Context Pool

package com.apamax.sample;


event ContextPool {
    integer numContexts;
    sequence<context> contexts;
    integer idx;
   
    action create(integer nc, string name) {
        self.numContexts := nc;
        while(nc > 0) {
            contexts.append(context(name, false));
            nc := nc - 1;
        }
    }
   
    action getContext() returns context {
        context c := contexts[idx];
        idx := idx + 1;
        if (idx = numContexts) then {
            idx := 0;
        }
        return c;       
    }
}


The ContextPool as implemented here is a general-purpose utility that provides a pool of contexts via a create method (i.e. action) and a means to distribute a workload across them in simple round-robin fashion each time the getContext action is called.

As I mentioned above, contexts are mapped to operating system threads, so judicious use of the create action is expected; the basic rule of thumb is that the total number of contexts should equal the number of cores on the server. One noteworthy point: contexts can be public or private. A public context means that event listeners running within it can receive event streams from external sources (i.e. adapters); listeners within a private context can only receive events directed to the context via the enqueue statement from application logic running in another context. For my example, this context pool utility creates private contexts: context(name, false).
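
As an aside, the boolean flag on the context constructor is what selects between the two modes; a small illustrative sketch, assuming the same constructor signature used by the pool above:

monitor ContextKinds {
    action onload {
        // public context: listeners running in it can receive
        // externally sourced event streams (e.g. from adapters)
        context pub := context("MarketData", true);

        // private context: only receives events enqueued to it
        // from logic running in another context
        context priv := context("Worker", false);
    }
}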

I've leveraged another general capability of the Apama EPL in the implementation of this context pool, that of actions on events. You'll notice these two actions are enclosed in an event definition which is part of our com.apamax.sample package.

In keeping with its charter of structured programming, actions on events provide a means to promote code modularity by encapsulating reusable utility functions (like a context pool).


 


The (parallel) Item Producer
package com.apamax.sample;


monitor ItemService {
   
  event ClearUserID {
      integer id;
  }

            
  integer count := 0;
  float price := 0.0;
   
  action onload {
      ContextPool cf := new ContextPool;
      cf.create(4, "ClientService");
   
      // list of subscriber (user) identifiers
      sequence<integer> ids := new sequence<integer>;
       
      SubscribeToItems s;
      on all SubscribeToItems():s {
          if ids.indexOf(s.subscriberId) = -1 then {
              context c := cf.getContext();
              ids.append(s.subscriberId);
              route SubscriptionResponse(s.subscriberId, c);
              on completed SubscriptionResponse() {
                  spawn startSubscriptions(s.subscriberId, s.item_name,
                                           context.current()) to c; 
              } 
          }
      }
       
      ClearUserID c;
      on all ClearUserID():c {
          log "in " + c.toString();   
          integer index := ids.indexOf(c.id);
          if index != -1 then {
              ids.remove(index);
          }
      }
  }

  action startSubscriptions(integer this_subscriberId, string name,
                            context mainContext) {
      log "in startSubscriptions";
       
      on all wait(0.1) and not UnsubscribeFromItems(subscriberId =
                                               this_subscriberId) {
          route Item(this_subscriberId, name, count, price);
          count := count + 1;
          price := price + 0.1;
      }

      on UnsubscribeFromItems(subscriberId = this_subscriberId){
          enqueue ClearUserID(this_subscriberId) to mainContext;
      }       
  }
 
}



To get a general sense of what the multi-instance Item Producer code is intended to do, I suggest a quick scan of my last installment; this revision does not change that basic foundation, it only parallelizes it. It is worth pointing out how little the code and design have changed, yet this implementation can scale massively to tens of thousands of instances across multiple processor cores. Clearly this is just a simple example that does very little real work (producing Item events). However, structurally it's a model that represents how one would design such a scalable service in the Apama EPL.

The parallel Item Producer (like its previous incarnation) manages multiple uniquely identified Consumers. For that it must maintain a list of identifiers, one for each Consumer. But this time, the Producer instance created on behalf of the Consumer is spawned into a context: spawn startSubscriptions(s.subscriberId, s.item_name, context.current()) to c; We're still passing the subscriberId and item_name (the instance parameters), but we also pass the context handle of the main context (context.current()). This is necessary for the inter-context communication.

The Consumer implementation has undergone a minor change to support this parallelized execution mode and match the Producer. A good design pattern is to ensure that monitors that frequently pass events operate within the same context. This is not a hard-and-fast rule, only one that limits the amount of inter-context communication (i.e. enqueueing). I've enhanced the interface slightly: there is a new event, SubscriptionResponse, that is used as a response to subscription requests (on all SubscribeToItems()). This event is used to communicate back to the client the context handle of the Producer spawned on its behalf. Once the Consumer receives this event, it also spawns into this same context. By doing so, both the Producer and Consumer operate as they always did, sending Item events (route Item(this_subscriberId, name, count, price)) and handling termination (on UnsubscribeFromItems). Within each context, the producer/consumer still adheres to the single-cast event passing scheme where it creates and sends uniquely tagged Item events. The Consumer and the Interface are included in the download (not shown here for brevity's sake).
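
That said, based on the description above, the revised Consumer would look roughly like the following sketch (my own reconstruction; event field names and ordering are assumptions, the definitive version ships with the download):

package com.apamax.sample;

monitor ItemClient {

    integer myId := 1;   // this consumer's identifier (illustrative)

    action onload {
        route SubscribeToItems(myId, "sample");

        // Wait for the Producer to tell us which context it spawned into,
        // then spawn our own consuming logic into that same context.
        SubscriptionResponse r;
        on SubscriptionResponse(subscriberId = myId):r {
            spawn consumeItems() to r.serviceContext;
        }
    }

    action consumeItems() {
        Item item;
        on all Item(subscriberId = myId):item {
            log "Got an item: " + item.toString();
            if (item.item_count > 10) then {
                route UnsubscribeFromItems(myId, "sample");
            }
        }
    }
}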

Two additional noteworthy points to highlight in this Producer implementation.

1) The on completed SubscriptionResponse() listener.  The completed  keyword indicates that this listener wakes up after the SubscriptionResponse  event has been delivered.  This way we can guarantee that our Consumer has received this event and has the context handle before spawning the Producer.

2) To process UnsubscribeFromItems events, the statement enqueue ClearUserID(this_subscriberId) to mainContext; is executed. This statement sends an event to the listener (on all ClearUserID) which executes in another context. Recall that the action startSubscriptions is the target of the spawn operator, so it is the main body of code for which multiple instances are parallelized, running in contexts (from the pool). The onload action, which is controlling all of this spawning, is logically considered the main context. Due to the strong semantic for inter-context communication, events must be enqueued to another context's input queue. Each context in effect has its own input queue, and the context handle is what defines the inter-context communication path. So to communicate the client termination request from the spawned instance running in a private context, the ClearUserID event must be enqueued to the main context where the appropriate listener is waiting.

Routing (i.e. route Item(...)) is still possible, but routed events stay within the boundaries of the context where the Producer and its corresponding Consumer reside. To logically expand the example, multiple Consumers could reside in the same context (i.e. a multi-cast design pattern as I described in the previous revision of this example).

 

This example is designed to illustrate the simplicity of parallelism in the Apama EPL. With just a few simple statements, one can quickly and easily leverage multi-core processor technologies for massive scalability.

As I mentioned earlier, this is the final entry for this specific example; if you're just seeing this for the first time you can start from the beginning (only three short segments) here. I hope this has been informative and provided some insight into the Apama EPL. I plan to have many more code examples in the future on various use cases.

You can download the complete example here with the consumers, interface and producer. Any questions or comments, just let me know,
Louie


Friday, April 03, 2009

Apama showcased at Intel Nehalem Launch

Posted by Louis Lovas


I have a need for speed. A most fitting phrase for Intel's Nehalem (Xeon 5500) Processor Series. Earlier this week, with great fanfare, Intel launched their new Xeon 5500 Processor at the NASDAQ MarketSite Tower in New York. It was a spectacular event with mammoth displays, videos, demonstrations, guest speakers and a grand crowd enjoying the sights, the news and of course the abundant refreshments. I had the privilege of a front-row seat during the keynote from Sean Maloney, Intel's Executive VP for Sales and Marketing. In that keynote presentation Sean spoke of the vast performance gains Intel has attained with the Xeon 5500 processor over previous generations, yet with lower power consumption. The main thrust of Sean's address was to stress the relevance of the Xeon 5500 to Wall Street.

To showcase this relevance Intel chose a few of their key partners to be part of this keynote presentation: the Apama CEP Platform, Rapid Addition and Thomson Reuters. Sean, with the assistance of his colleague Adam Moran, ran the demonstrations himself in front of a crowd of about 200+ Wall Street dignitaries and a crew from the press. I had that front-row seat to answer any follow-on questions stemming from the demonstrations.

To showcase both the Apama CEP platform and the Intel Xeon 5500 Processor - which we did in two distinct demonstrations for Sean - we wanted to highlight the scalable architecture of the Apama platform with something that is near and dear to Wall Street - profit. For the first demonstration, we took one of our standard high-frequency, alpha-seeking strategies (Statistical Arbitrage) and scaled it to 1600 instances across 16 threads on 8 cores. During the demonstration market data was piped into all 1600 instances at a whirlwind pace, measuring the throughput and monitoring the overall P&L. This of course was done on both the new Xeon 5500 (code-name Nehalem) and the previous generation of processors, the Xeon 5400 (code-name Harpertown). The crowd watched a live side-by-side comparison on dashboards, which I've captured below:

Xeon 5400 (Harpertown) showing the baseline of the performance benchmark for a high-frequency Trading Strategy (Stat-Arb)


Xeon 5500 (Nehalem) showing the improvements achieved in Profit and Throughput for a high-frequency Trading Strategy (Stat-Arb)

The two main improvements achieved by the Nehalem processors are a more than doubling of throughput (2.31 vs. 1.01) and a greater profit margin ($15,688,647 vs. $7,504,386). The increased profit was achieved through an increase in the number of arbitrage opportunities due to the greater throughput. All this across the same market data over the same time span.

The second demonstration Sean and Adam presented was the Apama Smart Order Router which was also run on both processor generations (Nehalem and Harpertown) showing the performance gains in Order processing achievable by the Nehalem processor technology.

With the most recent version of the Apama CEP product, we've provided not just a scalable platform but a complete stack, which has always been our vision: one of scalability and usability. We offer tooling for developers and analysts alike. The Apama platform includes a full-fledged event processing language (EPL), a dashboard designer and graphical modeling tools. Yet we've not ignored the java developers out there; we support a java language binding to our CEP engine that can play alongside our EPL and modeling tools. Behind that scalable CEP engine, it's connectivity that makes a platform real, so we provide over 50 adapters to market data, order and execution venues, and standard databases and message buses.

Intel also did a UK launch of Nehalem in London on April 1st, the day after the New York launch, where my colleague Gareth Smith, the Apama Product Manager, was a panelist. He spoke on our experiences partnering with Intel over these past few months.

Nehalem is truly a remarkable achievement by Intel; in fact, they've publicly stated it's the most significant processor achievement since the Pentium Pro release back in 1995. I was proud to be part of this significant event and its possibilities for Apama, the CEP industry and Capital Markets. After the main speaking engagements I spoke with numerous people who were equally exuberant. As I've often said, in Capital Markets we're in a race to the micro-second. We've just edged a bit closer.

Monday, March 23, 2009

We're going on Twitter

Posted by Giles Nelson

Louis Lovas and I, Giles Nelson, have started using Twitter to comment and respond to exciting things happening in the world of CEP (and perhaps beyond occasionally!).

The intent is to complement this blog. We'll be using Twitter to, perhaps, more impulsively report our thinking. We see Twitter as another good way to communicate thoughts and ideas.

We would be delighted if you chose to follow our "twitterings" (to use the lingo), and we'll be happy to follow you too.

Click here to follow Louis and here to follow Giles (you'll need to signup for a Twitter account).

Wednesday, March 18, 2009

Scalable Performance - a sneak preview of what's ahead for Apama

Posted by Louis Lovas

Engaging the Transmission - Scalable Performance When it Matters Most
We will soon be formally announcing the next major version of the Apama product, but I thought I would give a bit of a sneak preview here. We've added a number of significant enhancements to the product; one in particular I want to highlight is a new parallel execution model. Our CEP engine's new parallelism allows massive vertical scalability on multi-core hardware. We can leverage 8-way, 16-way, even 32-way processors with ease. We've enabled this capability by extending the Apama EPL with four new keywords. To explain requires a bit of background ...

In the Apama EPL, we have the notion of a “sub-monitor”, which can be thought of as a separate parameterized “instance” of a set of business logic that encompasses a scenario, a trading strategy for example. Each sub-monitor can load and maintain its own data (e.g. consider a sub-monitor as corresponding to a different occurrence, with different timeouts, key indicators, client data, etc.) and can set up its own listeners to key into a (different or overlapping) subset of the event stream. This allows us to easily model factory-like behavior, with each sub-monitor maintaining its own state and possibly listening for different (sequences of) events – but applying the same basic logic, including the actions to take when an event condition becomes true. We call this our micro-threading or mThread architecture.

In the latest Apama release, v4.1, we extended this and introduced the notion of contexts. These are silos of execution which take the factory model to the next level. “context” is a reserved word in Apama 4.1 – it defines a collection of sub-monitors which can inter-operate in a protected address space, with strong semantics for inter-context communication (via event passing), similar in concept to the Erlang message-passing programming model. Most importantly, it is also the unit of parallelization – allowing the same CEP engine to spawn multiple “contexts” which key into the event flow but execute in parallel on multi-core architectures.
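
To make that concrete, here is a minimal sketch using the context syntax that appears in the v4.1 examples elsewhere on these pages (the event and monitor names here are purely illustrative):

event WorkDone {
    integer workId;
}

monitor ParallelSketch {

    action onload {
        // create a private context and spawn a unit of work into it;
        // it executes in parallel with this (the main) context
        context c := context("Worker", false);
        spawn doWork(context.current()) to c;

        on WorkDone() {
            log "work finished in the spawned context";
        }
    }

    action doWork(context mainContext) {
        // ... business logic executing on another core ...
        enqueue WorkDone(1) to mainContext;
    }
}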

Contexts in Apama's EPL adhere to our language's basic event paradigm, providing a safe semantic for concurrency yet avoiding the typical race conditions common to multi-threaded programming in other languages (i.e. java), which require the use of mutexes, semaphores, etc.

"The world IS concurrent. It IS parallel. Things happen all over the place at the same time. I could not drive my car on the highway if I did not intuitively understand the notion of concurrency; pure message-passing concurrency is what we do all the time."

A great quote from Joe Armstrong, the creator of Erlang, and one that we've taken to heart in the design of the parallelism now available in our EPL. Our approach is based on a deep understanding of the types of applications being built with the Apama platform; our broad customer base provided us that advantage. For example, we took our Smart Order Router Solution Accelerator, enhanced it to use the context model, ran a performance benchmark and achieved a 6.5 times increase in overall capacity on an 8-core server while holding to a steady low-latency threshold (notice we also improved overall performance in the v4.1 version over previous versions as well).

This is a graph that compares the Capacity (number of concurrent open orders that can be processed in a specific timescale) of the Equities Smart Order Router for an increasing number of symbols. The comparison was on three versions of the Apama product. In the parallel version we have modified the SOR to partition symbols across contexts (each context goes on a processor core). The machine used for the experiment was an 8-core Intel Server. 

Apama's new enhanced parallel execution model is a reflection of how our customers use CEP technology to build out real-world applications. The competitive landscape of CEP dwells on performance, with wild claims of speed often without substance. It reminds me of a teenager revving their car's engine at a traffic light. You can see the needle on the tachometer race up and the engine makes a lot of noise, but to what purpose? The new release of the Apama CEP platform shows that we know how and when to engage the transmission.

Tuesday, March 10, 2009

The instance design pattern in the Apama EPL

Posted by Louis Lovas



In this, my third installment in a series devoted to the Apama Event Processing Language (EPL), I'll continue where I left off last time, in which I described the basic design of event passing for a consumer/producer model. For this revision I've extended the model to support multiple consumers. This introduces the instance management feature of the Apama EPL. Instances, or 'sub-monitors' as they're often referred to, define a discrete unit of work. The unit of work can be very small (an analytic calculation) or very large (a whole trading algo). Each instance gets spawned with its own parameter set, listens for its own event streams and operates in a very singleton manner. By which I mean, within the semantics of the application an instance need only be concerned with managing itself, not other instances. Overall, it is a factory behavioral model extended to include an execution model. This is a key aspect of the Apama EPL, making a common application requirement simple to implement, robust in design, and highly performant in the CEP model.

The Apama CEP engine manages the execution of these sub-monitors (also known as mThreads internally). In a typical production deployment, there would be 100's or 1000's of sub-monitors running. The spawn operator is the single statement in the language that accomplishes this unique feature. Spawn is basically a self-replicating scheme with certain behavioral rules. The main rule: the cloned instance does not get the active listeners (i.e. on all someEvent...) of the parent. It must establish its own. Actually it's the perfect model for that factory-like behavior. The clone does not want its parent's listeners, but creates its own based on the parameters passed, such as the symbol name in a trading algo or the subscriberId in our Producer example below. Speaking of our example ...

For the sake of brevity, I've just listed the extended Producer side of my consumer/producer example below. For the complete example, you can download it here.

The (extended) Item Producer.
package com.apamax.sample;

monitor ItemService {
   
    event ClearUserID {
        integer id;
    }
   
    UnsubscribeFromItems u;
        
    integer count := 0;
    float price := 0.0;
    listener l;
   
    action onload {
        // list of subscriber (user) identifiers
        sequence<integer> ids := [];
       
        SubscribeToItems s;
        on all SubscribeToItems():s {
            if ids.indexOf(s.subscriberId) = -1 then {
                ids.append(s.subscriberId);           
                 spawn startSubscriptions(s.subscriberId, s.item_name);      
            }
        }
       
        ClearUserID c;
        on all ClearUserID():c {
            log "in " + c.toString();   
            integer index := ids.indexOf(c.id);
            if index != -1 then {
                ids.remove(index);
            }
        }
    }

    action startSubscriptions(integer this_subscriberId, string name) {
        log "in startSubscriptions";
       
        l := on all wait(0.1) {
            route Item(this_subscriberId, name, count, price);
            count := count + 1;
            price := price + 0.1;
        }

        on UnsubscribeFromItems(subscriberId = this_subscriberId):u {
            stopSubscriptions(u.subscriberId, u.item_name);
        }       
    }
   
    action stopSubscriptions(integer subscriberId, string name) {
        // do something to stop routing events
        log "in stopSubscriptions";
        l.quit();
        route ClearUserID(subscriberId);
    }
}

           


To get a general sense of what this bit of code is intended to do, I suggest a quick scan of my previous installment where I introduced this example.

The extended Item Producer is expected to manage multiple uniquely identified consumers. For that it must maintain a list of identifiers, one for each consumer. It does that by appending and removing entries from an array (sequence<integer> ids). This is a common idiom for tracking identifiers; syntactically it's similar to many imperative languages.

This example uses a single-cast event passing scheme where the Producer routes Item events uniquely tagged to the consumer (route Item(this_subscriberId, name, count, price)).

On the consumer side, Item events are listened for based on a subscriberId (on all Item(subscriberId = myId)). It's the uniqueness of subscriberId (one per consumer) that defines this as a single-cast design. A common twist on this is a multi-cast event passing scheme (not to be confused with UDP multicast), where multiple consumers might be requesting the same information (i.e. the item_name in our example). A well-understood example of this would be a market data adapter providing trade data for the same symbol to multiple consumers. The Item Producer would change very little to support a multi-cast event passing scheme, as sketched below.
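
For illustration only, a hypothetical multi-cast variant might drop the per-subscriber tagging on the producer side and have consumers filter on the item name instead (this sketch is mine, reverting to the original three-field Item event, and is not part of the downloadable example):

// Producer side: a single stream of Item events per item name,
// shared by every consumer interested in that name
// (count, price and l are monitor-scope fields, as before)
action startItems(string name) {
    l := on all wait(0.1) {
        route Item(name, count, price);   // no subscriberId tag
        count := count + 1;
        price := price + 0.1;
    }
}

// Consumer side: filter on the item name rather than a unique id
Item item;
on all Item(item_name = "sample"):item {
    log "Got a shared item: " + item.toString();
}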

In the listener "on all SubscribeToItems()", we spawn to the action startSubscriptions when we receive a  SubscribeToItems event from a consumer. We pass the parameters of  the consumer's identifier (s.subscriberId) and the item (s.item_name) to instantiate the new sub-monitor.  A new mThread of execution is created for the sub-monitor and it begins executing producing Item events. The parent monitor continues waiting (listening) for another SubscribeToItems event.

You'll also notice the use of a private event, ClearUserID; its purpose is to communicate between the spawned sub-monitor(s) and the main (parent) monitor when an UnsubscribeFromItems request is received. This is necessary since the parent monitor manages the ids of connected consumers. A spawned sub-monitor uses this event simply to signal its termination.

The event paradigm in the Apama EPL extends far beyond the notion of processing data events. In one sense you could categorize events as data and control. Data events are consumed and processed by the CEP application. Control events direct the semantics of the application.




 

This example is designed to illustrate a few powerful yet easy to use features of the Apama EPL:

  1. To highlight the notion that managing multiple consumers (clients) becomes a simple and safe programming task in the Apama EPL. Instance management is an intrinsic design pattern based on the commonly understood factory model. We did not reinvent the wheel, we simply refined it and made it approachable in the CEP paradigm.
  2. Events are not just application data to be processed by monitors. They provide semantic control of an application as well.

Here's the complete example with the consumers, interface and producer.

Once again thanks for reading,
Louie


Wednesday, February 25, 2009

Structured Programming in the Apama EPL

Posted by Louis Lovas



This is my second installment on a series devoted to the Apama Event Processing Language, MonitorScript. In my introduction to the Apama EPL I described the basic structure of the language, some concepts and terminology. I also included the obligatory Hello World sample. In this entry I'll continue that basic tutorial delving into the elemental structure of the EPL. I've decided to do this to highlight the very simple fact that the language adheres to the principles of structured programming.  Just so we're all on the same page, here's a definition I found on the web:

DEFINITION - Structured programming (sometimes known as modular programming) is a subset of procedural programming that enforces a logical structure on the program being written to make it more efficient and easier to understand and modify.


That's a definition that is clearly irrefutable. In order for a language to live up to it, the language must have a number of fundamentals. Code modularity, or developing code as separate modules: this allows for parallel development, improves maintainability and allows modules to be plug-replaceable. Encapsulation is a means of hiding the implementation or inner workings of a calculation or algorithm. Lastly there are Interfaces and instances, an architectural pattern that may be applied to the design and implementation of applications which transmit events between loosely coupled software components. These are just a few of those fundamentals. These essential elements of languages have been part of the scene for decades. They are present in procedural programming languages like C and in object-oriented languages like C++ and java.

Lest we forget our roots as software engineers: with all the hype surrounding CEP and its constituent domain-specific languages, it's important to do a reality check and not get swept up and forget the basic principles necessary for creating long-standing, maintainable code. Last year I wrote a piece on readability as a criterion for a successful language; this is primarily based on a development language having these basic fundamentals: modularity, encapsulation, interfaces and instances.

Apama's EPL, MonitorScript, has all these capabilities, as I will describe below. It's what allows us to build reusable framework components and solutions, which we've done with our Capital Markets Framework and Solution Accelerators for FX Aggregation, Smart Order Routing, Surveillance, Bond Pricing, etc. These are components written in our EPL that have the plug'n'play modularity to be redeployed across multiple customers.

To illustrate this idea of structure - modularity, encapsulation and interfaces - I'll use a short example of a producer and consumer. This is a common design pattern or idiom we use extensively. The Apama EPL's event paradigm extends not only to the type of CEP applications we enable but also to the nature of the language itself. If you're familiar with message-passing languages such as Erlang this will be a familiar concept. Different modules that make up an application communicate with one another by passing messages (or events, as is the case here).

In this example I have a service or producer monitor that generates and sends Item events, a client or consumer monitor that consumes Item events, and an interface for the interaction between the two. If the term Monitor seems strange, I've defined a set of terms and concepts in my introduction; I suggest a quick review of that to get up to speed.

The interface to the monitor, defined below, is the set of events it receives and transmits. The event definitions for these events are declared within a package name (i.e. com.apamax.sample). Apama's EPL supports java-like package names for namespace isolation to strengthen that modularity notion.


The Item Interface.

package com.apamax.sample;

event SubscribeToItems {
    string item_name;
}

event UnsubscribeFromItems {
    string item_name;
}

event Item {
   string  item_name;
   integer item_count;
   float   item_price;
}



The Item Interface is simply a set of prescribed events: two used by the consumer to start and stop receiving Items, and the Item definition itself. As I mentioned, it uses a subscription idiom. We use this notion extensively where a Monitor is a layer over some subscription-based service such as a market data adapter. A real market data interface would be much more extensive. Here, I've scaled it back for the sake of simplicity, but you can imagine a more robust interface including error handling and status events.

 



The Item Consumer
package com.apamax.sample;

monitor ItemClient {

    SubscribeToItems subscribeToItems;
    Item item;
   
    action onload {
        subscribeToItems.item_name := "sample";
        route subscribeToItems;


        on all Item():item {
            log "Got an item: " +  item.toString();
            if (item.item_count > 10) then {
               route UnsubscribeFromItems("sample");
               log "All done.";
            }
        }
    }
}



The Item consumer is also a Monitor in the com.apamax.sample namespace. It is a user of our event interface to the Item service and as such is interested in receiving events of type Item. The interface defines the means to do this by subscribing. The consumer simply has to create a SubscribeToItems event and forward it to the producer service.

As I mentioned earlier, the Apama EPL adheres to an event paradigm as a fundamental characteristic of the language. The route statement is a means by which Monitors communicate. This is another precept and underlines the fundamentals of modularity and encapsulation.

Once a subscription request has been sent (route subscribeToItems), the consumer listens for Item events (on all Item()). In this simple example, we're just looking to receive them all without any filtering or pattern matching; a more selective variation is sketched below. I will explore event pattern matching - both the simple and the complex - in a follow-up blog.
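
For example, a consumer interested only in a particular item could filter directly in the listener (a hypothetical variation on the consumer above):

// only receive Item events whose item_name is "sample"
Item item;
on all Item(item_name = "sample"):item {
    log "Got a sample item: " + item.toString();
}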

To complete the picture, the sample tests a field in the Item event and terminates the subscription if it exceeds a constant value, (item_count > 10).


The Item Producer.
package com.apamax.sample;

monitor ItemService {
   
    SubscribeToItems subItems;
    UnsubscribeFromItems unsubItems;
    integer cnt := 0;
    float price := 0.0;
    listener l;
   
    action onload {

        on all SubscribeToItems():subItems {
                startItems(subItems.item_name);         
        }

        on all UnsubscribeFromItems():unsubItems {
                stopItems(unsubItems.item_name);
        }
    }

    action startItems(string name) {
        l := on all wait(0.1) {
            route Item(name, cnt, price);
            cnt := cnt + 1;
            price := price + 0.1;
        }
    }

   
    action stopItems(string name) {
        // do something to stop routing events
        l.quit();
    }
}



The Item producer is also in the com.apamax.sample namespace. It defines listeners for SubscribeToItems and UnsubscribeFromItems, two events from our interface. Typically, subscriptions would be managed on a per-user basis, thus allowing multiple consumers to subscribe to our Item service. This is a detail I will outline in a subsequent installment along with a few other related features such as instance management.

Once a subscription request has been received, the startItems action (i.e. a method) is invoked to continuously route Item events to the consumer every 0.1 seconds (on all wait(0.1) ...). Again, in a real-world scenario this particular portion of a service would be more involved, possibly managing the interaction with an external adapter, for example.

For terminating a subscription on behalf of the client (on all UnsubscribeFromItems()), we simply terminate the wait listener (a looping construct) set up in startItems.


 


This example is designed to illustrate a few common principles in the Apama EPL:

  1. To stress that the fundamentals of structured programming are ever-present: modularity, encapsulation and interfaces. Two benefits of Apama's modularity that are noteworthy relate to the plug'n'play idea I mentioned earlier. a) As is typical with modular programming, revised modules can be introduced with little or no impact as long as interfaces are left intact. This is also a truism with the Apama EPL. b) Those revised modules (Monitors) can be replaced in a running system; a shutdown is not required. Furthermore, the modularity also extends beyond the boundary of a single CEP engine to multiple CEP engines in an Event Processing Network (EPN) with no actual code change.
  2. The event paradigm is inherent in the language not just for application events but for the entire application design center.

In future installments on the Apama EPL, I'll delve into a few more of the language constructs that extend this basic idiom (multiple consumers, spawning, and parallelism).

Once again thanks for reading,
Louie


Monday, February 16, 2009

Sun hosts Developers Workshop of complementary technology for low latency

Posted by Louis Lovas


Last month I was invited to speak at a Developers Workshop sponsored by Sun Microsystems on building low-latency trading applications. I had a 25-minute time slot to fill, with the goal of educating an audience of 110 architects, engineers and technical managers from Wall Street on CEP and Apama's vision of it. I'm usually able to sequester an audience for a lot longer than that to give them my perspective, and since I tend to ramble just a bit, it was a tall order for me to whittle it down to this shorter time slot.

To go one step further, I also did a demonstration of the Apama Algorithmic Trading Accelerator (ATA) and our graphical modeling tool, Event Modeler™. So I had to move fast to accomplish this.

Since this was a Sun-sponsored event, there were a number of sessions devoted to Solaris and Sun hardware. Sun has done some great work with the Solaris operating system to leverage multi-core processors for scaling low-latency applications. Still, you need ample knowledge and expertise to be able to fine-tune the OS to unlock that capability. There were a few demonstrations of how, using a few of the Solaris OS command tools, you can better apportion processor cycles to starving applications and achieve tremendous gains in performance. One has to be quite scholarly in Solaris systems management so as not to shoot oneself in the foot.

Besides myself representing Apama and CEP, I thought Sun did a great job of bringing together a group of complementary vendors that touched upon the theme of the workshop - low latency. Just to highlight a few... Patrick May from Gigaspaces discussed distributed cache technology and Jitesh Ghai from 29West described the benefits of low-latency messaging to the trading infrastructure. I've always considered both of these technologies very complementary to CEP. Among many other uses, distributed caching engines provide a basis for a recoverability model for CEP applications. Low-latency messaging brings new possibilities for architecting CEP applications in a distributed model.

As for me, I presented a number of CEP themes in my talk:

1) Drivers for CEP Adoption. 

Fitting with the theme of the workshop, the drivers for CEP adoption are the increasing demands for low-latency applications. It's the race to the micro-second on Wall Street, whether we like it or not. Additionally, the need for rapid development and deployment of complex applications is pushing CEP technology into the hands of sophisticated business users. Graphical modeling tools empower these astute users; the ill-prepared will get left behind.

2) Evolution of CEP  in Capital Markets.

From single-asset broker algos, to cross-asset, cross-border smart order routing, to real-time risk and surveillance - CEP is growing and maturing in Capital Markets on numerous fronts.


3) Anatomy of  CEP Architecture.

I presented a macro-level anatomy of the Apama vision of that architecture. There are 4 key components: 1) the CEP engine itself. 2) Integration to the external world. 3) Tools for development and deployment. 4) Visualization across a wide spectrum from richly interactive desktop click trading to widely distributed web-based user interfaces.

Lastly, I want to thank my pal Gareth Smith, the Apama Product Manager and slide designer extraordinaire, for these slides on the architecture. He's a wiz at putting ideas into compelling visuals.

You can download the slides of my Apama CEP presentation here.

As always thanks for reading,
Louie


Monday, January 19, 2009

A Transparent New Year

Posted by Louis Lovas


It's been a while since I've put a few thoughts in print on these pages on event processing, but that's not to say I've not been busy. We've wrapped up quite a successful year in the Apama division within Progress Software. For this year, 2009, we've got quite a number of new features and capabilities in store for the Apama CEP platform. We will be extending the breadth and depth of our event processing language (EPL) to meet the challenges we've seen in the market and the vision we have for the future. Of course you will see announcements in our standard marketing press releases, but I (and others) will explore the technical merits of those new features in these blog pages in much more detail - something we've not done that much of in the past. There are many historical reasons for that, not really worth explaining. Suffice it to say, our intention is to be more transparent in the coming year.

To kick that off, it's worth starting a bit of a tutorial on the Apama EPL, a language we call MonitorScript.  I'll begin with the basics here and in subsequent blogs build upon these main concepts, providing insight into the power and flexibility of our EPL. And as we release new extensions and capabilities it will be easier to explain the benefits of those new features. So without further ado, here is the first installment.

First a few basic concepts...

  • Apama MonitorScript is an imperative programming language with a handful of declarative statements. This is an important consideration and one we highlight as a distinction in our platform against competitive platforms that are based on a declarative programming model. The imperative model provides a more natural style of development similar to traditional languages like java and C++.
  • The language is executed at runtime by our CEP engine called the Correlator.  

Second a few basic terms...

  • A monitor defines the outermost block scope, similar to a class in java or C++. It is the basic building block of Apama applications. A typical application is made up of many monitors. As you might imagine, monitors need to interact with one another; I'll explore that capability in a later blog.
  • An event defines a well-ordered structure of data. The EPTS Glossary definition has a handful of great examples of Events.
  • A listener, defined by the on statement, declares or registers interest in an event or event pattern. In a large application it's the Correlator runtime engine that will typically process 1,000's or even 10,000's of registered event patterns. In our example below we just have one.
  • An action defines a method or function similar to java or C++. 
  • The onload() action is a reserved name (along with a few others) that is analogous to main() in a java program.
To put it all together... "A monitor typically listens for events and takes one or more actions when an event pattern is triggered."


The language sports a number of capabilities that will be familiar to anyone schooled in java, C++ or any traditional imperative programming language. I won't bore you with all the nuances of data types and other obvious basics; those are well articulated in our product documentation and customer training courses. I will, however, focus on the unique paradigm of event processing.

Apama MonitorScript, a basic monitor.

event StockTrade {
  string symbol;
  float price;
  integer quantity;
}

monitor StockTradeWatch {

  StockTrade Trade;

    action onload {
      on all StockTrade():Trade processTick;
    }

    action processTick {
      log "StockTrade event received" +
      " Symbol = " + Trade.symbol +
      " Price = "  + Trade.price.toString() +
      " Quantity = " + Trade.quantity.toString() at INFO;
    }
}


This monitor, called StockTradeWatch, defines an event of type StockTrade. Events can originate from many sources in the Apama platform; the most obvious would be an adapter connected to a source of streaming data (i.e. stock trades, as the example shows), but they can come from files, databases, other monitors, even monitors in other Correlators.
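
As a tiny illustration of that last point, another monitor in the same Correlator could feed StockTradeWatch simply by routing events (a hypothetical sketch, assuming the StockTrade definition above is loaded alongside it):

monitor StockTradeFeeder {
    action onload {
        // route a single synthetic trade; any monitor listening for
        // StockTrade events (such as StockTradeWatch) will receive it
        route StockTrade("IBM", 101.25, 500);
    }
}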

The onload action declares a listener for events of type StockTrade (i.e. on all StockTrade). When the monitor StockTradeWatch receives StockTrade events, the action processTick is invoked. As you can see in this example we simply log a message to indicate that it occurred. The obvious intent is that this monitor will receive a continuous stream of StockTrade events, each one will be processed in-turn by the processTick action.  One can be more selective in the event pattern with a listener, such as on all StockTrade(symbol="IBM"). I will explore the details of event patterns and complex listeners later on.

As I mentioned, I've started with a simple example that shows the basics of the Apama EPL, MonitorScript. It demonstrates the simplicity by which one can declare interest in a particular event pattern, receive matching events and act on them in your application (i.e. action).

 In subsequent installments I will demonstrate more complex features highlighting the power of the language.  That's all for now.


You can find the second installment here.
Regards,
Louie