
April 2009

Friday, April 24, 2009

Get Yourself Connected

Posted by Richard Bentley

Our integration with Vhayu Velocity, announced earlier this week, is the latest connectivity option we've added to the Apama platform. This follows hot on the heels of the launch of our adapter for the Brazilian BOVESPA market, and got me musing about the importance of connectivity for our Apama CEP platform - and the effort we need to invest to develop and maintain it. All in all we now support more than 40 distinct adapters (as listed here) and are adding more all the time (we have 5 new adapters in development right now).

The importance of connectivity cannot be overstated. You could have the best CEP engine on the planet, but if there's no way of getting events in and out of it then it's of zero use - the proverbial Ferrari with no wheels. For basic infrastructure, you can go some way with strong support for standard middlewares through JMS, databases through ODBC/JDBC, and so on. In Capital Markets however, where latency is often a critical success factor, what matters is direct connectivity to the market, which means developing against proprietary and extensive market-specific APIs; good support for the FIX protocol is most definitely necessary, but it is nowhere near sufficient for the low-latency, high-frequency trading world.

Of course, once you have strong connectivity to a particular market then that becomes an enabler - our support for the native market data and order execution APIs of the Chicago Mercantile Exchange (CME), for example, is allowing Apama to play in the proprietary trading world of the Futures and Commodities markets with great success, and it is no surprise that Apama is doing particularly well in Brazil! But such connectivity does not come cheap; as exchanges regularly rev their APIs, adapters need constant care and feeding, and there are always new markets and systems that need to be connected, particularly for a platform like Apama which plays across all asset classes, on a global stage. Apama currently retains a team of 10 Engineers and QA staff who *just* build and maintain our adapters, and we regularly second other personnel to this team to cope with demand spikes.

We took a decision early on to invest in developing our own connectivity; we clearly had options - there is no shortage of intermediaries out there who purport to connect to thousands of different systems. But there's always that one system they don't support - or don't happen to support on the platform you need - so you never get away from having to build and maintain some connectivity yourself. And then of course there's the need to integrate with proprietary systems - when you start off by targeting your product at the Tier 1 investment banks, there are an awful lot of in-house systems you need to hook up to!

In fact, our earliest clients were all of this kind, requiring proprietary connectivity to their home-grown systems. Overall this has been a boon for Apama, as we were forced almost from day one to focus on building a toolkit that allowed us to rapidly develop new connectivity - resulting in our Integration Adapter Framework (IAF). All Apama connectivity is built using the IAF, and as a result benefits from common configuration and management interfaces, a uniform latency measurement framework, real-time status reporting and more.

Building and supporting the IAF, developing our 40+ (and counting) adapters, keeping up with exchange upgrades and so on requires a huge investment of time and effort. Whilst this might be a much less glamorous aspect of developing a CEP Platform (my colleagues in Engineering touchingly refer to it as "the filth"), it is a hugely critical one.

... and whatever happened to the Stereo MCs anyway?

Tuesday, April 14, 2009

Empower the Business, but Keep Control

Posted by Matt Rothera

I was reading a recent blog post from Opher Etzion at IBM in which he made reference to some of the challenges with innovation and the IT department. He states:

Richard's question about innovation and IT department is more complicated one, however, there is some truth in his observation, based on my own experience in an IT department of a large organization, IT departments may be more conservative than their users, and typically will be cautious about be anything that is giving a "programming capabilities" to the users (and is conceived as "losing the control"). Since many event processing applications' value to the business users is doing exactly this (giving more power to the business user to do their own programming) it is sometimes a challenge to get it through the IT department, but this is a longer discussion...


While I have had many great conversations over the past couple of years with IT visionaries about CEP and event processing, I would agree with Opher that, in general, there is “conservatism” when it comes to exposing “programming capabilities” to users.  There is a paradox with CEP, because knowledge of the genuinely useful CEP rules usually lies with the “Line of Business” or the “End User/Partner”.  While it definitely makes sense to push management and creation of CEP rules out to those with the most domain knowledge (such as business analysts and operational business users), the thought of completely losing control of this environment sometimes “trumps” the benefits that can be gained from such an agile CEP environment.  I find this very similar to the early days of SOA, when web services were just coming of age.  There was a notion that end users or business users would simply be able to consume these services at their desktop and orchestrate their own business processes.  However, the reality of “infinite loops”, “complex data mediation”, and the overall requirement to still fundamentally write “programming logic” let pragmatism win the day.


At Progress Apama, we provide tools for three different sets of users to get involved in the creation and management of CEP rules: IT, Business Analysts, and End Users.  However, the architecture recognizes that as more power is exposed to business analysts and end users, there is a greater chance of “loss of control”.  For this reason, we provide a “delegated model”, in which each layer of user (starting with IT) selectively exposes capabilities to the business analysts, who in turn selectively expose parameters to the end users.


For the business analyst, IT creates very simple “Smart Block Interfaces”, which capture the necessary programming logic and expose information to the modeling environment in a straightforward manner.  IT keeps control by exposing the data, analytics, and operations that offer the best combination of power and control for the analyst.  Apama provides a platform for IT to, in effect, create a custom environment, or Domain Specific Language, for the analyst, completely “fit for purpose” for the task at hand.  This not only allows the analyst to be extremely productive in their specific area of expertise, but also allows IT to limit the risk, as opposed to exposing all of the raw CEP capabilities to this class of user.


For the end user, the business analyst follows a similar process.  Based on the model, the analyst exposes parameters with constraints to the end user.  This allows the end user simply to set parameters within the given constraints, either to change existing CEP rule instances or to create new instances “on the fly” with the desired parameters.  Allowing the end user to set these parameters through highly contextual dashboards again provides the right balance of power and control when getting end users involved in the process.
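
To make this concrete, here is a rough sketch in the Apama EPL of what a constrained, end-user-adjustable parameter might look like. The event, monitor and field names are invented for illustration; they are not taken from an actual deployment or from the Apama tooling itself.

// Hypothetical event sent from an end-user dashboard
event SetAlertThreshold {
    integer userId;
    float threshold;
}

// The analyst fixes the allowable range; the end user may only move
// the threshold within it
monitor PriceAlertRule {
    float minThreshold := 1.0;
    float maxThreshold := 100.0;
    float threshold := 10.0;

    action onload {
        SetAlertThreshold s;
        on all SetAlertThreshold():s {
            if (s.threshold >= minThreshold and s.threshold <= maxThreshold) then {
                threshold := s.threshold;
            } else {
                log "Rejected out-of-range threshold from user " + s.userId.toString();
            }
        }
    }
}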


Finally, the back-testing environment provides the last piece of the puzzle, allowing business analysts to validate their new models against historical production data.  By running the models through historical event data, the model and its outputs can be verified prior to deployment into production.  Similarly, user-supplied parameters can be validated against the historical data in the same way, providing assurance that the parameters will have the anticipated effect.


In summary, I believe that the business benefits of CEP will largely be recognized “closer to the end user and the business”.  However, I don’t believe that this means needing to find clever methods of bypassing IT.  A pragmatic approach incorporates a variety of users within the organization, and makes the most out of each skill set in the enterprise.  The total result is measured by the collaboration of all the users, rather than the efforts of any specific user type within the organization.

Thursday, April 09, 2009

Scalable concurrency, a design pattern in the Apama EPL

Posted by Louis Lovas


This is my final installment in a series devoted to a specific example in the Apama EPL. I began this example by describing the basic design pattern of a consumer/producer.  Further enhancements enabled multiple consumers and, as a result, the instance idiom.  Finally, below, I enhance this consumer/producer once more by illustrating how one can leverage multi-core processors for massive scalability and parallelism.

As I have mentioned before, instances - or 'sub-monitors' as they're often referred to in the Apama EPL - define a discrete unit of work. That unit of work represents a set of business logic, however large (a complete application scenario) or small (a simple analytic).  Instances are created on demand using the spawn operator in the language. Each scenario instance is invoked with a unique set of input parameters that represent that occurrence. Each instance can then uniquely maintain its own reference data, timers and event streams - in effect, its own state.  In general programming terms this is known as a factory behavioral model, but we've extended it to include an execution model.

To provide a means to leverage multi-core processors, the Apama EPL provides a simple syntax and semantics that allow those instances to execute in parallel. We do this with a language feature called contexts. These are silos of execution which take the factory model to the next level. A context defines a logical container that holds and executes instances of a scenario (of the same or differing types). The EPL provides a semantic for inter-context communication; there is no need for mutexes, semaphores or other locking schemes, which avoids the deadlock-prone code patterns typical of imperative languages such as Java. Each context in effect has its own logical input queue to which events are streamed from external sources or other contexts.  Behind contexts, our CEP engine squeezes the most out of operating system threads to make maximum use of multi-core processors.
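
In a nutshell, the mechanics look like this inside a monitor (a minimal sketch using only the keywords discussed here; processOrders and StartPricing are hypothetical names):

// create a private context and run a scenario instance in it
context c := context("Pricing", false);
spawn processOrders("MSFT") to c;

// send an event to logic running in that context
enqueue StartPricing("MSFT") to c;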

The same CEP engine can create multiple contexts (a context pool, as you'll see in the code example below), which can be used to hold and execute multiple scenario instances; additionally, those instances can create sub-contexts for further parallelism. If, for example, those instances form an application for pricing options and require a compute-intensive calculation such as Black-Scholes, additional contexts can be spawned for those calculations. Furthermore, sub-contexts can be designed as shared compute services to be leveraged by multiple scenario instances running in different (parallel) contexts.

Contexts take the factory model and extend it to include a parallel execution model with a few simple keywords in the EPL as you'll soon see below.

The enhancements to the Item consumer/producer include a Context Pool, for which I've listed the code below, and an enhanced Item Producer that leverages it. The interface is unchanged except for one new event, and the Consumer (client) has a minor revision (thus adhering to my belief that an EPL should follow the structured-programming principles of modularity and encapsulation that I blogged about at the start of this series).  The complete example for this revision is available here and requires Apama version 4.1 (or later, of course).





The Context Pool

package com.apamax.sample;


event ContextPool {
    integer numContexts;
    sequence<context> contexts;
    integer idx;

    // create a pool of 'nc' private contexts sharing the given name
    action create(integer nc, string name) {
        self.numContexts := nc;
        while (nc > 0) {
            contexts.append(context(name, false));
            nc := nc - 1;
        }
    }

    // hand out contexts in simple round-robin order
    action getContext() returns context {
        context c := contexts[idx];
        idx := idx + 1;
        if (idx = numContexts) then {
            idx := 0;
        }
        return c;
    }
}


The ContextPool implemented here is a general-purpose utility that provides a pool of contexts via a create method (i.e. action) and a means to distribute a workload across them, using a simple round-robin technique each time the getContext action is called.
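
For example, from inside a monitor's onload action, usage would look roughly like this (a sketch only; someWork is a hypothetical action):

ContextPool pool := new ContextPool;
pool.create(4, "Workers");        // a pool of four private contexts

context c := pool.getContext();   // round-robin selection
spawn someWork() to c;            // run an instance in the chosen context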

As I mentioned above, contexts are mapped to operating system threads, so judicious use of the create action is expected. The basic rule of thumb is that the total number of contexts should equal the number of cores on the server.  One noteworthy point: contexts can be public or private. A public context means that event listeners running within it can receive event streams from external sources (i.e. adapters); listeners within a private context can only receive events that are directed to the context via the enqueue statement in application logic running in another context. For this example, the context pool utility creates private contexts: context(name, false)

I've leveraged another general capability of the Apama EPL in the implementation of this context pool: actions on events. You'll notice that the two actions above are enclosed in an event definition, which is part of our com.apamax.sample package.

In keeping with its structured-programming charter, actions on events provide a means to promote code modularity by encapsulating reusable utility functions (like a context pool).
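
As a trivial illustration of the same idiom (a hypothetical utility, not part of the example download), state and the actions that operate on it can travel together in one reusable event type:

event RunningAverage {
    float sum;
    integer count;

    action add(float value) {
        self.sum := self.sum + value;
        self.count := self.count + 1;
    }

    action mean() returns float {
        if (count = 0) then {
            return 0.0;
        }
        return sum / count.toFloat();
    }
}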




The (parallel) Item Producer
package com.apamax.sample;


monitor ItemService {
   
  event ClearUserID {
      integer id;
  }

            
  integer count := 0;
  float price := 0.0;
   
  action onload {
      // a pool of four private contexts to spread the work across
      ContextPool cf := new ContextPool;
      cf.create(4, "ClientService");

      // list of subscriber (user) identifiers
      sequence<integer> ids := new sequence<integer>;

      SubscribeToItems s;
      on all SubscribeToItems():s {
          if ids.indexOf(s.subscriberId) = -1 then {
              context c := cf.getContext();
              ids.append(s.subscriberId);
              // tell the client which context its Producer will run in
              route SubscriptionResponse(s.subscriberId, c);
              on completed SubscriptionResponse() {
                  // spawn the Producer instance into the pooled context
                  spawn startSubscriptions(s.subscriberId, s.item_name,
                                           context.current()) to c;
              }
          }
      }

      // spawned instances enqueue ClearUserID back to this (main)
      // context when their client unsubscribes
      ClearUserID c;
      on all ClearUserID():c {
          log "in " + c.toString();
          integer index := ids.indexOf(c.id);
          if index != -1 then {
              ids.remove(index);
          }
      }
  }

  action startSubscriptions(integer this_subscriberId, string name,
                            context mainContext) {
      log "in startSubscriptions";

      // produce an Item event every 100ms until this client unsubscribes
      on all wait(0.1) and not UnsubscribeFromItems(subscriberId =
                                               this_subscriberId) {
          route Item(this_subscriberId, name, count, price);
          count := count + 1;
          price := price + 0.1;
      }

      on UnsubscribeFromItems(subscriberId = this_subscriberId) {
          // notify the main context so it can clear this subscriber's id
          enqueue ClearUserID(this_subscriberId) to mainContext;
      }
  }
 
}



To get a general sense of what the multi-instance Item Producer code is intended to do, I suggest a quick scan of my last installment; this revision does not change that basic foundation, it only parallelizes it. It is worth pointing out how little the code and design have changed, yet this implementation has the ability to scale massively to tens of thousands of instances across multiple processor cores.  Clearly this is just a simple example that does very little real work (producing Item events). However, structurally it's a model that represents how one would design such a scalable service in the Apama EPL.

The parallel Item Producer (like its previous incarnation) manages multiple uniquely identified Consumers. For that it must maintain a list of identifiers, one for each Consumer.  But this time, the Producer instance created on behalf of the Consumer is spawned into a context:  spawn startSubscriptions(s.subscriberId, s.item_name, context.current()) to c; We're still passing the subscriberId and item_name (the instance parameters), but we also pass the context handle of the main context (context.current()).  This is necessary for the inter-context communication.

The Consumer implementation has undergone a minor change to support this parallelized execution mode and match the Producer.  A good design pattern is to ensure that monitors that frequently pass events operate within the same context. This is not a hard-and-fast rule, only one that limits the amount of inter-context communication (i.e. enqueueing).  I've enhanced the interface slightly: there is a new event, SubscriptionResponse, that is used as a response to subscription requests (on all SubscribeToItems()).  This event communicates back to the client the context handle of the Producer spawned on its behalf. Once the Consumer receives this event, it also spawns into this same context. By doing so, both the Producer and Consumer operate as they always did, sending Item events (route Item(this_subscriberId, name, count, price)) and handling termination (on UnsubscribeFromItems).  Within each context, the producer/consumer still adheres to the single-cast event passing scheme in which it creates and sends uniquely tagged Item events. The Consumer and the Interface are included in the download (not shown here for brevity's sake).
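
For reference, a sketch of roughly what the revised Consumer looks like is shown below. The real one is in the download; in particular, the field names used in the match expressions, the name of the context field inside SubscriptionResponse (producerContext here) and the item name are my assumptions for illustration.

package com.apamax.sample;

monitor ItemConsumer {
    integer myId := 42;

    action onload {
        // ask the Producer for a subscription
        route SubscribeToItems(myId, "SomeItem");

        // the response carries the context the Producer instance was spawned into
        SubscriptionResponse r;
        on SubscriptionResponse(subscriberId = myId):r {
            spawn consumeItems(myId) to r.producerContext;
        }
    }

    action consumeItems(integer id) {
        Item i;
        on all Item(subscriberId = id):i {
            log "Received " + i.toString();
        }
    }
}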

Two additional points are worth highlighting in this Producer implementation.

1) The on completed SubscriptionResponse() listener.  The completed keyword indicates that this listener fires after the SubscriptionResponse event has been delivered.  This way we can guarantee that our Consumer has received the event, and therefore has the context handle, before the Producer is spawned.

2) To process UnsubscribeFromItems events, the statement enqueue ClearUserID(this_subscriberId) to mainContext; is executed.  This statement sends an event to a listener (on all ClearUserID) executing in another context. Recall that the action startSubscriptions is the target of the spawn operator, so it is the main body of code for which multiple instances are parallelized, running in contexts from the pool. The onload action, which controls all of this spawning, is logically considered the main context. Because of the strict semantics for inter-context communication, events must be enqueued to another context's input queue. Each context in effect has its own input queue, and the context handle is what identifies the target of that inter-context communication. So to communicate the client's termination request from the spawned instance running in a private context, the ClearUserID event must be enqueued to the main context, where the appropriate listener is waiting.

Routing (i.e. route Item(...)) is still possible, but routed events stay within the boundaries of the context where the Producer and its corresponding Consumer reside.  To logically expand the example, multiple Consumers could reside in the same context (i.e. the multi-cast design pattern I described in the previous revision of this example).


This example is designed to illustrate the simplicity of parallelism in the Apama EPL. With just a few simple statements, one can quickly and easily leverage multi-core processor technologies for massive scalability.

As I mentioned earlier, this is the final entry for this specific example; if you're just seeing it for the first time you can start from the beginning (only three short segments) here. I hope this has been informative and provided some insight into the Apama EPL; I plan to have many more code examples in the future covering various use cases.

You can download the complete example here with the consumers, interface and producer. Any questions or comments, just let me know,
Louie


Friday, April 03, 2009

Apama showcased at Intel Nehalem Launch

Posted by Louis Lovas


I have a need for speed - a most fitting phrase for Intel's Nehalem (Xeon 5500) processor series. Earlier this week, with great fanfare, Intel launched the new Xeon 5500 processors at the NASDAQ MarketSite Tower in New York. It was a spectacular event with mammoth displays, videos, demonstrations, guest speakers and a grand crowd enjoying the sights, the news and of course the abundant refreshments. I had the privilege of a front-row seat during the keynote from Sean Maloney, Intel's Executive VP for Sales and Marketing. In that keynote presentation Sean spoke of the vast performance gains Intel has attained with the Xeon 5500 over previous generations, delivered with lower power consumption.  The main thrust of Sean's address was to stress the relevance of the Xeon 5500 to Wall Street.

To showcase this relevance, Intel chose a few of its key partners to be part of the keynote presentation: the Apama CEP Platform, Rapid Addition and Thomson Reuters.  Sean, with the assistance of his colleague Adam Moran, ran the demonstrations themselves in front of a crowd of 200+ Wall Street dignitaries and a crew from the press. I had that front-row seat to answer any follow-on questions stemming from the demonstrations.

To showcase both the Apama CEP platform and the Intel Xeon 5500 processor, which we did in two distinct demonstrations for Sean, we wanted to highlight the scalable architecture of the Apama platform with something near and dear to Wall Street: profit. For the first demonstration, we took one of our standard high-frequency, alpha-seeking strategies (Statistical Arbitrage) and scaled it to 1600 instances across 16 threads on 8 cores. During the demonstration, market data was piped into all 1600 instances at a whirlwind pace while we measured the throughput and monitored the overall P&L. This of course was done on both the new Xeon 5500 (code-name Nehalem) and the previous generation of processor, the Xeon 5400 (code-name Harpertown).  The crowd watched a side-by-side comparison live on dashboards, which I've captured below:

Xeon 5400 (Harpertown) showing the baseline of the performance benchmark for a high-frequency Trading Strategy (Stat-Arb)


Xeon 5500 (Nehalem) showing the improvements achieved in Profit and Throughput for a high-frequency Trading Strategy (Stat-Arb)

The two main improvements achieved by the Nehalem processors are a more than doubling of throughput (2.31 vs. 1.01) and a greater profit ($15,688,647 vs. $7,504,386). The increased profit came from an increase in the number of arbitrage opportunities captured, due to the greater throughput - all across the same market data over the same time span.

The second demonstration Sean and Adam presented was the Apama Smart Order Router, which was also run on both processor generations (Nehalem and Harpertown), showing the gains in order processing performance achievable with the Nehalem processor technology.

With the most recent version of the Apama CEP product, we've provided not just a scalable platform but a complete stack, which has always been our vision: one of scalability and usability. We offer tooling for developers and analysts alike. The Apama platform includes a full-fledged event processing language (EPL), a dashboard designer and graphical modeling tools. Yet we've not ignored the Java developers out there: we support a Java language binding to our CEP engine that can play alongside our EPL and modeling tools.  Behind that scalable CEP engine, it's connectivity that makes a platform real, which is why we provide over 50 adapters to market data, order and execution venues, and standard databases and message buses.

Intel also held a UK launch of Nehalem in London on April 1st, the day after the New York launch, where my colleague Gareth Smith, the Apama Product Manager, was a panelist. He spoke about our experiences partnering with Intel over the past few months.

Nehalem is truly a remarkable achievement by Intel; in fact, they've publicly stated it's their most significant processor advance since the Pentium Pro release back in 1995. I was proud to be part of this significant event and its possibilities for Apama, the CEP industry and Capital Markets. After the main speaking engagements I spoke with numerous people who were equally exuberant. As I've often said, in Capital Markets we're in a race to the microsecond. We've just edged a bit closer.