Louis Lovas

Friday, November 20, 2009

Exploration of Apama 4.2 Feature Set Podcast

Posted by Apama Audio

Louis Lovas, Chief Architect of Progress Apama, discusses aspects of the Apama 4.2 release that focus on application developer productivity and how Apama enhances an organization’s ability to build event-driven applications.

Wednesday, November 04, 2009

Apama 4.2 Deeper Exploration: Enhanced Support for Parallelism

Posted by Apama Audio

In this podcast Louis Lovas, Apama Architect, discusses some of the details of the enhanced parallelism that is now available in Progress Apama 4.2.



Monday, October 19, 2009

Progress Apama Announcing Latest Release 4.2

Posted by Apama Audio

As a follow-up to the Louie Lovas blog posting on October 16th, this podcast captures a discussion between David Olson and Giles Nelson on Apama 4.2 features.


Friday, October 16, 2009

Apama 4.2 release - Cruising in the fast lane

Posted by Louis Lovas

The Apama engineering team has done it once again. True to our record of releasing significant new features in the Apama product every 6 months, the v4.2 release is hot off the presses with major new functionality. The Apama roadmap is driven by a keen sense of our customers' requirements, the competitive landscape and an opportunistic zeal. The engineering team is a dedicated R&D team driven to excellence and quality, and we are dedicated to delivering value to our customers. A consistent comment we've heard from analysts and customers alike concerns the maturity of the Apama product.

The current v4.2 release, the third in the v4.x family, adds significant enhancements in three concurrent themes - Performance, Productivity and Integration. This consistent thematic model is one we've held for a number of years. Below I've touched upon the highlights of the current release along these themes:


  • Performance
High Performance Parallelism for Developers.  The Apama Event Processing Language (EPL) provides a set of features uniquely suited to building scalable event-driven applications.  The language natively offers capabilities for event handling, correlating event streams, pattern matching, defining temporal logic and more. Equally important, the language provides a flexible means to process events in parallel.  For this we provide a context model and a new high performance scheduler. Contexts can be thought of as silos of execution, where CEP applications run in parallel. The scheduler's role is to manage the runtime execution in an intelligent, high-performance way and to leverage the underlying operating system threading model. It’s via the context architecture that the Apama Correlator squeezes the most out of operating system threads to achieve maximum use of multi-core processors for massive vertical scalability. For IT developers, this is an effective and efficient means to build high performance, low latency CEP applications without the pitfalls of thread-based programming, such as deadlocks and race conditions.

High Performance Parallelism for Business Analysts.  Not to be left out of the race, we've also ensured the scalable parallelism provided in the Apama CEP engine is available through our graphical modeling tool, the Event Modeler. We've had this graphical modeling capability since the very first release of Apama. This tool, designed for analysts, quantitative researchers and of course developers, allows you to design and build complete CEP applications in a graphical model.  Parallelism is as easy as an automatic transmission: simply select P for parallel.

  • Productivity

Real men do use Debuggers (and Profilers too). The Apama Studio now sports major new functionality for development: a source-level debugger and a production profiler. Building applications for an event-driven world presents new programming challenges. Having state-of-the-art development tools for this paradigm is a mandate. The Apama EPL is the right language for building event-driven applications - now we have a source-level debugger designed for this event paradigm. Available in the Eclipse-based Apama Studio, it lets you set breakpoints to suspend applications at specific points, examine the contents of program variables and single-step through code. It works in concert with our parallelism as well. The profiler is a means to examine deployed Apama applications to identify possible bottlenecks in CPU usage.

Jamming with Java. We've enhanced our support for Java for building CEP applications. The Apama Studio includes a complete set of wizards for creating monitors, listeners, and events to improve the development process when building Java-based CEP applications in Apama.

  • Integration

The (relational) world plays the event game. While we have provided connectivity to relational databases for many years, we've made a significant redesign in the architecture of how we do it with the new Apama Database Connector (ADBC). The ADBC provides a universal interface to any database and includes standard connectors to ODBC and JDBC.  Through the ADBC, Apama applications can store and retrieve data in standard database formats using general database queries, effectively turning these relational engines into timeseries databases. The data can be used for application enrichment and playback purposes. To manage playback, the Apama Studio includes a new Data Player that enables back-testing and event playback from a range of data sources via the ADBC. One can replay event data, and time itself, at varying speeds. The CEP application under test behaves in a temporally consistent way even as data is replayed at lightning speed.
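To make the idea of replaying "time itself" more concrete, here is a minimal Python sketch (not Apama EPL, and not how the Data Player is actually implemented). It assumes a hypothetical `VirtualClock` that is driven by event timestamps rather than the wall clock, so that timers set by the application fire in the correct order relative to replayed events no matter how fast the data is pushed through. All names here are invented for illustration.

```python
import heapq


class VirtualClock:
    """Hypothetical clock driven by replayed event timestamps, not wall time."""

    def __init__(self):
        self.now = 0.0
        self._timers = []  # heap of (due_time, sequence, callback)
        self._seq = 0      # tie-breaker so callbacks are never compared

    def call_later(self, delay, callback):
        """Schedule callback to run `delay` seconds of virtual time from now."""
        heapq.heappush(self._timers, (self.now + delay, self._seq, callback))
        self._seq += 1

    def advance_to(self, timestamp):
        """Fire every timer due at or before `timestamp`, then move the clock."""
        while self._timers and self._timers[0][0] <= timestamp:
            due, _, callback = heapq.heappop(self._timers)
            self.now = due
            callback()
        self.now = max(self.now, timestamp)


def replay(events, clock, handler):
    """Feed (timestamp, payload) pairs to handler as fast as possible."""
    for timestamp, payload in events:
        clock.advance_to(timestamp)  # pending timers fire before the event
        handler(timestamp, payload)


def demo():
    """An 'order' event arms a 1-second timeout; later ticks are replayed."""
    clock = VirtualClock()
    log = []

    def handler(timestamp, payload):
        log.append((timestamp, payload))
        if payload == "order":
            clock.call_later(1.0, lambda: log.append((clock.now, "timeout")))

    replay([(0.0, "order"), (0.5, "tick"), (2.0, "tick")], clock, handler)
    return log
```

In this sketch the timeout lands between the tick at 0.5 and the tick at 2.0, exactly where it would have fallen in real time, even though the replay itself takes essentially no wall-clock time.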

Cruising at memory speed with MemoryStore. The MemoryStore is a massively scalable in-memory caching facility with in-built navigation, persistence and visualization functionality.  This allows CEP applications, which typically scan, correlate and discard data very quickly, to retain selected portions in memory for later access at extreme speed. This could be for managing a financial Order Book, Payments or other data elements that the application needs to be able to access quickly at a user’s request. Furthermore, if required, the in-memory image can be persisted to a relational database for recovery or other retrieval purposes. Lastly, the MemoryStore allows selected portions of the in-memory cache to be automatically mapped to dashboards.

Well, those are the highlights. There were also about a dozen other features within each of these three themes, just too numerous to mention.

We are committed to improving the Apama product by listening to our many customers, paying close attention to the ever-changing competitive landscape and researching new opportunities.

Again thanks for reading, you can also follow me at twitter, here.
Louie



Thursday, October 08, 2009

If You Build It They Will Come, An Apama Algo Webinar

Posted by Louis Lovas

If You Build It They Will Come
My colleague Dan Hubscher and I just finished the first of a two-part Webinar entitled "Build Quickly, Run Fast". In this Webinar we explained and demonstrated Apama as an Algo platform for high frequency and order execution algorithms.

As I've blogged in the recent past, High Frequency trading is an arms race.  The need to build quickly is a demanding requirement to keep ahead in the race. Being armed with the right tools is paramount. Rapid development and customization of strategies using graphical modeling tools provides the leverage necessary to keep pace with fast moving markets.

To that point, in this webinar I demonstrated a couple of algo examples. The first was a complete strategy that incorporates an alpha element with multiple order execution options. In designing and building strategies, the trading signal detection is just the first part of the problem. This typically involves an analytic calculation over the incoming market data within some segment or window of time. For example, a moving average calculation smooths out the peaks and valleys, or the volatility, of an instrument's price. Once the signal is detected it's time to trade and manage the order's executions. This is a key distinction between other CEP products and the Apama platform for building trading strategies. While it's possible to define an Event Flow in most or all CEP products for data enrichment and data analysis (i.e. the signal detection), for most other CEP products you have to switch out to some other environment and language to build the rules to manage the executions. The Apama platform is about building complete event-driven applications. So trade signal detection and order executions, whether a simple iceberg execution or something much more complex, can easily be designed, built and backtested in the same Apama graphical modeling environment. (Of course, for those more inclined to traditional development tools and methodologies, Apama offers a full suite of developer tools: an EPL, debugger, profiler and Java support.)
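As a rough illustration of the signal-detection half of such a strategy, the following Python sketch (not Apama EPL, and not the strategy shown in the webinar) computes a fast and a slow simple moving average over incoming prices and emits a signal when the two cross. The function names, the window sizes and the BUY/SELL labels are all assumptions made for the example.

```python
from collections import deque


class MovingAverage:
    """Simple moving average over the most recent `size` prices."""

    def __init__(self, size):
        self.window = deque(maxlen=size)
        self.total = 0.0

    def update(self, price):
        # Subtract the value about to be evicted before the deque drops it.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(price)
        self.total += price
        return self.total / len(self.window)

    def full(self):
        return len(self.window) == self.window.maxlen


def crossover_signals(prices, fast=3, slow=5):
    """Return (index, "BUY" | "SELL") each time the fast average crosses the slow."""
    fast_ma, slow_ma = MovingAverage(fast), MovingAverage(slow)
    signals = []
    prev_diff = None
    for i, price in enumerate(prices):
        f = fast_ma.update(price)
        s = slow_ma.update(price)
        if not slow_ma.full():
            continue  # wait until the slow window has enough data
        diff = f - s
        if prev_diff is not None:
            if prev_diff <= 0 < diff:
                signals.append((i, "BUY"))   # fast crossed above slow
            elif prev_diff >= 0 > diff:
                signals.append((i, "SELL"))  # fast crossed below slow
        prev_diff = diff
    return signals
```

For example, `crossover_signals([10, 10, 10, 10, 10, 11, 12, 13, 12, 10, 8, 7])` flags a BUY as prices start rising and a SELL once they roll over. Managing the resulting orders, which is the part the post emphasizes, is deliberately left out of this sketch.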

MovingCrossover Image


The second example in the Webinar demonstration was to build a small, but working, strategy from scratch. I did this live in full view of the attendees. For this I did a basic price momentum strategy, which tracked the velocity of price movements. The trading signal was a parameterized threshold which indicated when the price moved up (or down) a specific amount within a specific duration.
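A price momentum signal of this kind can be sketched in a few lines of Python (again, an illustration rather than the Apama model built in the webinar). The sketch assumes ticks arrive as `(timestamp_seconds, price)` pairs in time order and interprets the threshold as "the price has moved by at least `amount` relative to the oldest tick still inside the last `duration` seconds". The parameter names and that interpretation are assumptions.

```python
from collections import deque


def momentum_signals(ticks, amount, duration):
    """Return (timestamp, "UP" | "DOWN", move) whenever the threshold is hit."""
    window = deque()  # recent (timestamp, price) pairs, oldest first
    signals = []
    for ts, price in ticks:
        window.append((ts, price))
        # Discard ticks that fall outside the look-back duration.
        while ts - window[0][0] > duration:
            window.popleft()
        move = price - window[0][1]
        if move >= amount:
            signals.append((ts, "UP", move))
        elif move <= -amount:
            signals.append((ts, "DOWN", move))
        else:
            continue
        # Re-arm: start measuring the next move from the current price.
        window.clear()
        window.append((ts, price))
    return signals
```

With `amount=1.0` and `duration=5`, a run of ticks that climbs a full point within five seconds produces an "UP" signal; the same climb spread over a longer period does not, because the older ticks have already dropped out of the window.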

This webinar is focused on highlighting the ever-present challenges investment firms face in high frequency trading:
  • Fears of the Black Box
  • The simple fact that markets are continually evolving
  • First Mover Advantage
  • Customization is king
Together with my colleague Dan Hubscher, I use the Build Quickly webinar to describe how the Apama platform delivers solutions that meet these needs and challenges in the Capital Markets industry.

Stay tuned for a link to the recording and don't forget to dial in to part II where we focus on performance requirements and characteristics. Again thanks for reading (plus watching the webinar), you can also follow me at twitter, here.

A follow up note, here's the link to the recordings for both part I and part II on Build Quickly Run Fast.

Louie



Wednesday, September 30, 2009

EPTS, the Symposium of Trento

Posted by Louis Lovas

How many angels can dance on the head of a pin? I suppose that was a question debated at the Council of Trent that took place in Trento, Italy back in the 16th century. However, the Event Processing Technical Society's (EPTS) annual symposium just last week took up residence in Trento to discuss and debate a host of lofty topics on event processing.

  • CEP's role and relationship to BPM (or more appropriately event-driven BPM)
  • Event Processing in IT Systems management
  • Event-based systems for Robotics
  • EPTS Working Groups ...
While the sessions and discussions on event processing did not have the global significance of angels on pin heads or the Counter Reformation, they did provide a clear indication of just how broadly and deeply event-based systems can reach. Whether it's a business application monitoring mortgage applications, IT management systems in a Network Operation Center, bedside monitoring systems in a hospital or a robot packing pancakes into boxes, they all have a common underpinning: consuming and correlating streaming event data.

Granted, not everyone approaches it with the same viewpoint. IT Systems Management people don't think about processing and correlating events; they think about device management, KPIs, Alerts and the like. Someone building or managing a business process is likely concerned with managing Orders - validating them, stock allocations, warehouses and shipments. Nonetheless, a common framework model behind these systems is event processing.

Two of my favorite sessions at the EPTS Symposium were a panel session on the EPTS Mission and an open forum on Grand Challenges, a brainstorming session focused on identifying barriers to the adoption of CEP.

EPTS Mission

Four panelists, myself included, presented their expectations of the EPTS and its role as an industry consortium, its goals and what improvements can be made. As a baseline, the EPTS does have an existing mission statement defined as ...

To promote understanding and advancement of Event Processing technologies, to assist in the development of Standards to ensure long-term growth, and to provide a cooperative and inclusive environment for communication and learning.


Given this mission statement and my own expectations, there are a number of basic things the EPTS should provide to those uninitiated in event processing:

Awareness: Provide commercial business and industry the necessary knowledge of event processing as a technology supported by numerous vendors with continuing research in academia.
Definition: Provide a concise and definitive meaning of event processing, a Taxonomy of Event Processing so to speak. This is both from the horizontal technology perspective and also a vertical focus for a handful of specific industries. It's often difficult for business people to understand technology without the context of a business or application focus.
Differentiation: Provide a clear distinction that defines event processing and distinguishes it from other technologies. Event processing is available in many forms, and this symposium provided evidence of that.  Much of it is available in specialized form, as in IT Systems management. There are also pure play event processing (CEP) vendors, such as Progress/Apama. But there are also Rules engines, Business Intelligence platforms, Analytic platforms, etc. This easily presents a bewildering world filled with choice and with conflicting and overlapping marketing messages. The EPTS is in the perfect position to provide clarity in defining what is CEP and what isn't.
Cooperative: Event Processing rarely operates in a vacuum. There are many synergistic technologies that closely pair with CEP. Often this can have a specific vertical business flavor, but often it's other platform technology such as BPM and temporal databases.


The EPTS has four working groups that have been active for the last year: Use-cases, Reference Architecture, Language Analysis and Glossary. To a large extent the working groups have provided, and are still working towards, a clear definition of CEP. However, there is still a need to highlight the salient value of event processing. For specific vertical domains, the value of CEP is clear-cut simply because the fit and function is tailor made. In Capital Markets, for example, algo trading has all the hallmarks of a CEP application - high performance, low latency, temporal analytics and a streaming data paradigm fit-for-purpose. However, there are other application domains where CEP is equally viable but much more subtle.  I believe the EPTS can provide a vendor-neutral taxonomy of event processing - from the basics to the advanced. It can explain why event processing is unique and different, why language is important and how it is synergistic with a host of other technologies. To this end, the group has decided to form two new working groups to focus on many of these areas. Clearly a forward thinking move.

The Event Processing Technical Society is an organization made up of both vendors and academics. We're held together by a common thread, a goal that the whole is greater than the sum of the parts and that our collective will benefit all even as many of us are undeniably competitors.

Once again thanks for reading,  you can also follow me at twitter, here.
Louie



Sunday, August 09, 2009

Riding the Crest of the Wave... the Forrester Wave

Posted by Louis Lovas


Within just a few short days of its announcement, news of the Forrester CEP Wave has spread to all corners of the globe. From trade magazines, online journals and blogs to Facebook and Twitter, the headlines are everywhere. A Google search yields thousands of hits.  "Independent Research names Progress® Apama® as a Standout Leader in CEP ..."

The Forrester team of Mike Gualtieri and John Rymer state 'The Fledgling CEP Platform Market Is Vibrant, Competitive, And Dynamic'. Of course those of us that have been immersed in event processing for the past few years already knew that. It was our job to convince Mike and John. On behalf of Progress Apama and the CEP community, I would like to extend a word of thanks and appreciation to both of them for their efforts, diligence and patience in putting this Wave together. It was an enormous task, given they reviewed 9 CEP products and vendor strategies in depth. Considering this was the first CEP Wave, and that they also had to define an initial blueprint on CEP by which to evaluate vendors, they did a commendable job. Well done. You can get a complimentary copy of the pdf version from us here.

It was quite a few months ago when I and a few of my esteemed colleagues began the CEP Wave process. In the abstract it was not too much different from responding to the questions in a prospect's RFP/RFI, for which I and my colleagues have much practice. However, a difference that I found unique was the format. A client proposal is generally a Word document where one can provide plenty of written detail, and diagrams to depict product architecture and function. Forrester Waves are MS Excel spreadsheets. Vendors' responses to the Wave's questions have to fit into an Excel cell. Being a long-winded person, I found it a challenge to achieve the succinctness dictated by the confines of a cell.  My colleagues were quite helpful to this end.

In short order, the benefits of the spreadsheet format became clear. While many documents - proposals, reviews, evaluations or others - become static paper the moment they're published, that is not the case with Forrester's Wave. There is a clear intent behind Forrester's use of the spreadsheet format; it creates a living/dynamic document for their clients.  Spreadsheets by their very nature can be interactive. Spreadsheet formulas can accept user input and recalculate. This capability is exactly what Forrester leverages in the CEP Wave.

The Forrester CEP Wave is divided into three categories:
  • Current Offering: A platform feature breakdown, development and deployment tools and performance characteristics.
  • Strategy: The vendor's investment for the future.
  • Market Presence: Customer base.
Within each of these categories is an entire litany of subcategories containing features and criteria by which the product and vendor are measured. Each is assigned a weight as deemed appropriate by Forrester in reviewing the CEP industry at large.  Each vendor is then judged on its merits and scored. The most important aspect of this is the weighting. This is the key that gives the Wave that dynamic nature. From a client perspective, the weighting can be adjusted to suit your specific requirements. If, for example, your shop is Windows-only, you don't need a high weight on multi-platform support, so you can lower that value. Likewise, if you have a strong need for high availability/disaster recovery you can increase that weighting. Making these adjustments will tune the Wave for your specific requirements. You will then see how vendors stack up against each other with your customized weights. By doing so, what you will find is that the Apama platform pops to the top of the list more often than not.
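The recalculation that the spreadsheet performs can be pictured as a weighted average per vendor. The Python sketch below shows that arithmetic only; the vendor names, criteria, scores and weights are entirely made up for illustration and are not taken from the Forrester Wave.

```python
def weighted_scores(scores, weights):
    """Weighted average score per vendor for the given criterion weights."""
    total_weight = sum(weights.values())
    return {
        vendor: sum(weights[c] * by_criterion[c] for c in weights) / total_weight
        for vendor, by_criterion in scores.items()
    }


def rank(scores, weights):
    """Vendors ordered from highest to lowest weighted score."""
    totals = weighted_scores(scores, weights)
    return sorted(totals, key=totals.get, reverse=True)


# Invented example data: two hypothetical vendors scored 0-5 on three criteria.
SCORES = {
    "Vendor A": {"multi_platform": 5, "high_availability": 2, "dev_tools": 3},
    "Vendor B": {"multi_platform": 1, "high_availability": 5, "dev_tools": 3},
}

# Hypothetical default weights, then weights tuned for a Windows-only shop
# that cares mostly about high availability/disaster recovery.
DEFAULT_WEIGHTS = {"multi_platform": 5, "high_availability": 2, "dev_tools": 3}
TUNED_WEIGHTS = {"multi_platform": 1, "high_availability": 6, "dev_tools": 3}
```

With the default weights in this invented data set, `rank(SCORES, DEFAULT_WEIGHTS)` puts Vendor A first; after lowering the multi-platform weight and raising high availability, `rank(SCORES, TUNED_WEIGHTS)` puts Vendor B first. The individual scores never change, only the weights do.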

Once again thanks for reading, you can follow me at twitter, here.
Louie



Sunday, June 21, 2009

High Frequency Trading driving the need to build quickly, run fast

Posted by Louis Lovas



In just about any race there is usually a starting point and a finish line, unless of course you are in an arms race.  For that sort of race there may have been some nebulous beginning in the distant past, but there is no finish line. The race just keeps sprinting along, each competitor angling for an edge, regularly recharging their ammunition supply with some new weaponry to get ahead however slight or temporary.

I recently read an interesting article describing High Frequency Trading as embroiled in an arms race. I certainly believe it's well entrenched in such a conflict, but frankly this combat has arguably had a beneficial net effect, especially in that it has contributed to the wellspring of invention, inspiring the creative spirit in all the supporting attributes that make High Frequency Trading a reality. Behind any trader (and trading firm) is an entire armada including the vendors supplying the underlying hardware, networks, software platforms and trading applications.  They are all immersed in the war.  As new hardware, software and/or algos are deployed, they allow the trader to do battle and speed ahead even if it's just for a short while.  Competitive pressures, increasing market volatility, regulatory imperatives, risk mitigation and a host of other challenges are the land mines and roadside bombs on the long and winding road that stall and slow progress, forcing retooling and restocking of the ammunition (i.e. algo strategies). There is no time to stop and catch your breath or stand on the roadside.

Sang Lee from the Aite Group reports that High Frequency Trading has had a significant impact on the overall market, providing greater liquidity, tighter spreads and overall improving the quality of the market.  At the macro level these are great advancements and mark a natural evolutionary step due to so many market changes in recent years (i.e. electronic trading venues, adoption of CEP platforms for algo trading, etc.) in Equities and beyond (i.e. FX and Futures & Options).  Down in the trenches, the battles rage on day by day as a multitude of traders and an untold number of algo strategies provide the market liquidity by moving in and out of positions in milliseconds (or even less time). The trading firms engaged in this never ending conflict drive a set of imperatives on software infrastructures for building and deploying algos in the High Frequency battlefield:

Rapid development and customization of algos

Algo strategies in the High Frequency world have a limited lifetime. They soon become obsolete (i.e. whatever alpha they took advantage of has disappeared due to the competition, economic changes, or other situations).  To react and respond to this inevitability, having the right sort of tooling to recalibrate strategies is a necessity. This includes graphical modeling tools for Quants to prototype ideas quickly, backtest with historic data and test in a scalable manner to instill confidence prior to production rollout, and lastly dynamic parameterization of strategies from graphical dashboards. Not forgetting the code-slinging types, it also includes an Integrated Development Environment (IDE) supporting event processing language (EPL) development for more low-level tasks.


Abstracting over increasingly complex strategy logic

Supporting Quants with a rich and robust set of functionality from the basics (connectivity to markets) to the advanced (Linear Algebra, Black Scholes, and other statistical functions).


Support for the 'ilities (availability, security, reliability, ...) to manage the mundane

Deploy with confidence. An important role of software infrastructure is to instill confidence that deployed strategies are always available, securely accessed and run without failure.


Support for scalable performance, providing high throughput and low latency

This is probably the paramount requirement in the arms race of High Frequency Trading.  The race to the microsecond is pushing both hardware and software vendors alike. Parallelism in CEP engines like Apama's Correlator can leverage multi-core processor architectures like the Intel Nehalem.


Along with my colleague Dan Hubscher, I have recorded a 30-minute webinar that describes how the Apama platform, along with the Apama Algorithmic Trading Accelerator, meets these imperatives.

The pre-recorded webinar is available here:  Apama Algorithmic Trading Accelerator, Build Quickly, Run Fast.

Once again thanks for reading (plus watching and listening to the webinar in this case), you can also follow me at twitter, here.
Louie



Wednesday, May 27, 2009

Location Tracking, fit-for-purpose design patterns in CEP

Posted by Louis Lovas


As the CEP community is well aware, the technology often gets compared and contrasted to traditional data processing practices. It's a topic that ebbs and flows like the ocean's tides. It's a conversation that occurs regularly in various forums, with clients, analysts and of course within this community at large. Far be it from me to stir things up again, but I believe there is some credible evidence the critics draw upon. This touched a nerve not too long ago when Curt Monash hammered the CEP community.  The response was swift, but frankly unconvincing.

In many respects, I believe this argument is rooted in how some CEP vendors have architected their products. Many vendors focus on event stream processing as a variant of database processing. They see streaming data as just a database in motion and therefore have designed and built their products myopically around that paradigm. By doing so those vendors (possibly inadvertently) have plotted a course where the fit-for-purpose of their products is focused on use-cases that are data-intake dominant. They can consume the fire-hose of data, filter, aggregate and enrich it temporally, but little else. What is missing in this equation is the definition of the semantics of the application, whether that is a custom application such as algo trading, monitoring telco subscribers or some business intelligence (BI) dashboard. To those vendors, that is viewed as adjunct or external (i.e. the client), solved with the typical complement of technologies (Java, C++, .NET) and considered outside of the direct domain of CEP.

While this paradigm clearly does work, it incites the (CEP) critics; "where's the distinguishing characteristics? why can't I just do this with traditional data processing technologies?".

A challenging question when so many CEP products are stuck with that look and feel of a database; even a few of the academic projects I've reviewed seem to be myopically centered on this same paradigm. It reminds me of that television commercial with the tag line:  "Stay Within the Lines. The Lines Are Our Friends." (I think it was for an SUV). Quite frankly such thinking does not represent the real world. Time to think outside the box (or table as the case may be).

Yes, the real world is full of in-motion entities, most often interacting with each other in some way. Cars and trucks careening down the freeway zigzag from one lane to another at high speed with the objective of reaching a destination in the shortest possible time without a collision.  It would be an interesting CEP application to track and monitor the location and movement of such things.

In fact, location tracking is beginning to show signs of being a common use-case with the Apama platform. Not long ago we announced a new customer, Royal Dirkzwager, that uses Apama to track ship movements in sea lanes. My colleagues Matt Rothera and David Olson recently published a webinar on maritime logistics. This webinar follows the same basic premise as the Royal Dirkzwager use-case, that of tracking the location of ships at sea.  In fact, we aren't the only ones seeing activity in location tracking; here's a similar use-case for CEP in location-based defense intelligence.  The basic idea is the ability to track the movement of some entity, typically in relation to other entities: are they getting closer together (i.e. collision detection) or moving further apart (i.e. collision avoidance)? Are they moving at all? At what speed? Will they reach a destination at an appropriate time? A CEP system needs, at its core, the ability to have both temporal and geospatial concepts to easily support this paradigm.  Here's a handful of examples where this applies:

  • Tracking ship movements at sea (as I mentioned with Royal Dirkzwager, and the Apama webinar on maritime logistics)
  • Airplanes moving into and out of an airspace
  • Baggage movement in an airport
  • Delivery trucks en route to destinations
  • Service-enabled mobile phones delivering content as people move through shopping and urban areas
  • Men, machines and material moving on the battlefield


These are just a handful of location tracking use-cases for which the Apama platform is well suited.

Another colleague, Mark Scannell, has written a location tracking tutorial that is part of the Apama Studio product. This is a great example of the power of the Apama EPL for building location tracking CEP applications. The tutorial provides a narrative description explaining its purpose and the implementation. Below I've included a small snippet of that example to highlight the elegant simplicity, yet powerful efficiency, of the Apama EPL. If you're in need of a brief introduction to the Apama EPL, you can find that here, the beginning of a three part series on the language.


Location Tracking in the Apama EPL
// Track me - the tracked entity
action trackMe() {
       
  // Call self again when new location is detected
  on LocationUpdate( id = me.id ):me {
     trackMe();
  }

  // Detect a neighbor in our local area -
  // do this by excluding events that are for our id,
  // which will also cause us to reset this listener
  // through calling trackMe() again.
 
  LocationUpdate n;
  on all LocationUpdate( x in [ me.x - 1.0 : me.x + 1.0 ],
                         y in [ me.y - 1.0 : me.y + 1.0 ] ):n and
     not LocationUpdate( id = me.id ) {

     // Increment number of neighbors that have been spotted
     // and update the alarm
     spotted := spotted + 1;
     updateAlarm();           

     // Decrement count of spotted neighbors after one second
     on wait ( 1.0 ) {
       spotted := spotted - 1;
       updateAlarm();
     }
  }
}

As a brief introduction, the Location Tracker tutorial is designed to track the movement of Entities (i.e. cars, ships, trucks, planes, or any of those things I listed above) in relation to other Entities within a coordinate system or grid. An entity is considered a neighbor if it falls within a window 2 grid units wide (from -1 to +1 on each axis) around any other entity. The grid and the units within the grid are largely irrelevant to the syntactic definition of entity tracking. Their semantic meaning, on the other hand, lies within the context of a specific use-case (i.e. a shipping harbor, air space, battlefield, etc.).

From the tutorial I pulled a single action, trackMe, which contains the heart and soul of the tracking logic.  As entities move they produce LocationUpdate events. The event contains the entity's unique id and the X,Y coordinate of the new location. This trackMe action is designed to track their movement by monitoring LocationUpdate events. For each unique entity there is a spawned monitor instance (a running micro-thread so to speak) of this trackMe action.

The idea is that when an entity moves, its new location is instantly compared against all other tracked entities (except of course itself; witness the recursive call to trackMe when ids match (id = me.id)) to determine if it has become a neighbor (remember the 2 grid units).  This is elegantly implemented with the listener "on all LocationUpdate( x in [me.x - 1.0 : me.x + 1.0], ...". In a narrative sense, this can be read as "If the X,Y coordinate of this entity's new location is within 2 grid units (-1.0, +1.0) of me then identify it as a neighbor and update an alarm condition" (via a call to updateAlarm()).

This small bit of code (about 20 lines) exhibits an immensely powerful geospatial concept: the ability to track the movement of hundreds, thousands, even tens of thousands of entities against each other as they move, and of course this is accomplished with millisecond latency.
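For readers who don't know EPL, the same neighbor logic can be approximated in plain Python. This is only a sketch under stated assumptions: the `LocationTracker` class and its method names are hypothetical, time is passed in explicitly as `now`, and the brute-force loop over all known entities stands in for the Correlator's listener matching, which it does not attempt to reproduce.

```python
class LocationTracker:
    """Hypothetical brute-force analogue of the trackMe listeners."""

    def __init__(self, reach=1.0, hold=1.0):
        self.reach = reach    # +/- distance on each axis that counts as nearby
        self.hold = hold      # seconds a sighting stays counted
        self.positions = {}   # entity id -> last known (x, y)
        self.sightings = {}   # entity id -> expiry times of its sightings

    def update(self, entity_id, x, y, now):
        """Record a move and return the ids of entities that spotted the mover."""
        spotted_by = []
        for other_id, (ox, oy) in self.positions.items():
            if other_id == entity_id:
                continue  # an entity never counts itself as a neighbor
            if abs(x - ox) <= self.reach and abs(y - oy) <= self.reach:
                # The mover landed inside other_id's window: count a sighting
                # that expires after `hold` seconds, like "on wait(1.0)".
                self.sightings.setdefault(other_id, []).append(now + self.hold)
                spotted_by.append(other_id)
        self.positions[entity_id] = (x, y)
        return spotted_by

    def spotted(self, entity_id, now):
        """Number of unexpired neighbor sightings for entity_id at time now."""
        live = [t for t in self.sightings.get(entity_id, []) if t > now]
        self.sightings[entity_id] = live  # prune expired sightings
        return len(live)
```

A short usage example: after `t = LocationTracker()`, `t.update("a", 0.0, 0.0, now=0.0)` and `t.update("b", 0.5, 0.5, now=0.1)`, entity "a" has one live sighting until the hold period runs out. The EPL version expresses the same thing declaratively, with one listener per tracked entity rather than an explicit loop.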


 


This small example demonstrates a few characteristics of the Apama EPL, specifically that it is an integrated, well-typed procedural language with event expressions. It allows you to express complex event conditions elegantly in the middle of your procedural program, so you can focus on the logic of your application instead of just selecting the right event condition.

However, to get a clear picture, the language of CEP is just one aspect of an overall platform. The Apama strategy has also been focused on a whole product principle, one where the whole is greater than the sum of the parts. As a mandate to our vision we have outlined four key defining characteristics: 1) A scalable, high performing Event Engine. 2) Tools for rapid creation of event processing applications supporting business and IT users. 3) Visualization technologies for rich interactive dashboards. 4) An Integration fabric for the rapid construction of high performance, robust adapters to bridge into the external world.

The possibility exists that CEP will struggle to break out as a truly unique technology platform when so many just see a variant of database technology.  It's time to break out of the box, drive over the lines and succinctly answer the critics' questions. CEP is not about tables, rows and columns but events. Events that are often artifacts of the real world. A world that is in constant motion, be it ships, planes, cars, trucks, phones, or you and me. Information flows from it all in many forms, but that does not mean we have to squeeze it into the database paradigm.

Once again thanks for reading, 
Louie


Thursday, April 09, 2009

Scalable concurrency, a design pattern in the Apama EPL

Posted by Louis Lovas


This is my final installment in a series devoted to a specific example in the Apama EPL. I began this example by describing the basic design pattern of a  consumer/producer.  Further enhancements enabled multiple consumers and as a result the instance idiom.  Finally below, I will again enhance this consumer/producer by illustrating how one can leverage multi-core processors for massive scalability and parallelism.

As I have mentioned before, instances or 'sub-monitors' as they're often referred to in the Apama EPL define a discrete unit of work. That unit of work represents a set of business logic however large (a complete application scenario) or small (a simple analytic).  Instances are created on demand using the spawn operator in the language. Each scenario instance is invoked with a unique set of input parameters that represent that occurrence. Each instance can then uniquely maintain its own reference data, timers and event streams, in effect its own state.  In general programming patterns this is known as a factory behavioral model but we've extended it to include an execution model.

To provide a means to leverage multi-core processors, the Apama EPL provides a syntax and a simple semantic to allow those instances to execute in parallel. We do this with a language feature called contexts. These are silos of execution which take the factory model to the next level. A context defines a logical container that holds and executes instances of a scenario (of the same or differing types). The EPL provides a semantic for inter-context communication, so there is no need for mutexes, semaphores or other locking schemes, thus avoiding the common deadlock code patterns typical of imperative languages such as Java. Each context in effect has its own logical input queue to which events are streamed from external sources or other contexts.  Behind contexts, our CEP engine squeezes the most out of operating system threads to make maximum use of multi-core processors.
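For readers who think in threads, the idea of a silo with its own input queue can be modeled roughly in Python as follows. This is purely an analogy under my own assumptions (the Context class, its enqueue and close methods and the handled list are all hypothetical); it says nothing about how the correlator is actually implemented.

```python
import queue
import threading


class Context:
    """Toy model of a context: one worker thread draining one private input queue."""

    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()   # the only way in from other contexts
        self.handled = []            # state touched only by this context's thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def enqueue(self, event):
        """Hand an event to this context; the caller never touches its state."""
        self.inbox.put(event)

    def _run(self):
        while True:
            event = self.inbox.get()
            if event is None:        # sentinel used by close()
                break
            self.handled.append(event)

    def close(self):
        """Stop the worker once every event queued so far has been handled."""
        self.inbox.put(None)
        self._thread.join()
```

Because each silo's state is touched only by its own thread and everything else arrives through the queue, the application code never needs a lock. That is the property the EPL gives you without having to write any of this plumbing.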

The same CEP engine can create multiple contexts (a context pool, as you'll soon see in the code example below). They can be used to hold and execute multiple scenario instances, and those instances can in turn create sub-contexts for additional parallelism. If, for example, these instances are an application for pricing Options and require a compute-intensive calculation such as Black-Scholes, additional contexts can be spawned for these calculations. Furthermore, sub-contexts can be designed as shared compute services to be leveraged by multiple scenario instances running in different (parallel) contexts.

Contexts take the factory model and extend it to include a parallel execution model with a few simple keywords in the EPL as you'll soon see below.

The enhancements to the Item consumer/producer include a Context Pool which I've listed the code for below and the enhanced Item Producer that leverages it. The interface is unchanged except for one new event and the Consumer (client) has a minor revision  (thus adhering to my belief that an EPL should follow the principles of structured programming of modularity and encapsulation that I've blogged on at the start of this series).  The complete example for this revision is available here and requires Apama version 4.1 (or later of course).





The Context Pool

package com.apamax.sample;


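event ContextPool {
    integer numContexts;          // size of the pool
    sequence<context> contexts;   // the pooled context handles
    integer idx;                  // next context to hand out
   
    // Build a pool of nc private contexts that all share the given name.
    action create(integer nc, string name) {
        self.numContexts := nc;
        while(nc > 0) {
            contexts.append(context(name, false));
            nc := nc - 1;
        }
    }
   
    // Return the next context in round-robin order, wrapping at the end.
    action getContext() returns context {
        context c:= contexts[idx];
        idx := idx + 1;
        if(idx=numContexts) then {
            idx := 0;
        }
        return c;       
    }
}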


The ContextPool as implemented here is a general-purpose utility that provides a pool of contexts via a create method (i.e. action) and a means to distribute a workload across them using a simple round-robin technique each time the getContext action is called.
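The round-robin behavior is easy to see in a small Python model of the same pool. This is a sketch under my own assumptions, not a translation of the EPL: the pooled items are plain string labels rather than real contexts, and I have added an index to each label (which the EPL version does not do) only so the rotation is visible.

```python
class ContextPool:
    """Python model of the pool: hands out its members in round-robin order."""

    def __init__(self, num_contexts, name):
        if num_contexts < 1:
            raise ValueError("a pool needs at least one context")
        # Labels stand in for context handles; the index suffix is illustrative.
        self.contexts = [f"{name}-{i}" for i in range(num_contexts)]
        self.idx = 0

    def get_context(self):
        """Return the next member and advance, wrapping back to the first."""
        c = self.contexts[self.idx]
        self.idx = (self.idx + 1) % len(self.contexts)
        return c
```

With a pool of three, seven successive calls to get_context hand out members 0, 1, 2, 0, 1, 2, 0, so the workload spreads evenly no matter how many subscribers arrive.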

As I mentioned above, contexts are mapped to operating system threads, so judicious use of the create action is expected. The basic rule of thumb is that the total number of contexts should equal the number of cores on a server.  One noteworthy point: contexts can be public or private. A public context means that event listeners running within it can receive event streams from external sources (i.e. adapters). Listeners within a private context can only receive events that are directed to the context via the enqueue statement in application logic running in another context. For my example, this context pool utility creates private contexts: context(name, false)

I've leveraged another general capability of the Apama EPL in the implementation of this context pool, that of actions on events. You'll notice these two actions are enclosed in an event definition which is part of our com.apamax.sample package.

In keeping with its charter of structured programming, the EPL's actions on events provide a means to promote code modularity by encapsulating reusable utility functions (like a context pool).


 


The (parallel) Item Producer
package com.apamax.sample;


monitor ItemService {
   
  event ClearUserID {
      integer id;
  }

            
  integer count := 0;
  float price := 0.0;
   
  action onload {
      ContextPool cf:=new ContextPool;
      cf.create(4, "ClientService");
   
      // list of subscriber (user) identifiers
      sequence<integer> ids := new sequence<integer>;
       
      SubscribeToItems s;
      on all SubscribeToItems():s {
          if ids.indexOf(s.subscriberId)= -1 then {
              context c:= cf.getContext();
              ids.append(s.subscriberId);
              route SubscriptionResponse(s.subscriberId, c);
              on completed SubscriptionResponse() {
                  spawn startSubscriptions(s.subscriberId, s.item_name,
                                           context.current()) to c; 
              } 
          }
      }
       
      ClearUserID c;
      on all ClearUserID():c {
          log "in " + c.toString();   
          integer index := ids.indexOf(c.id);
          if index != -1 then {
              ids.remove(index);
          }
      }
  }

  action startSubscriptions(integer this_subscriberId, string name,
                            context mainContext) {
      log "in startSubscriptions";
       
      on all wait(0.1) and not UnsubscribeFromItems(subscriberId =
                                               this_subscriberId) {
          route Item(this_subscriberId, name, count, price);
          count := count + 1;
          price := price + 0.1;
      }

      on UnsubscribeFromItems(subscriberId = this_subscriberId){
          enqueue ClearUserID(this_subscriberId) to mainContext;
      }       
  }
 
}



To get a general sense of what the multi-instance Item Producer code is intended to do, I suggest a quick scan of my last installment. This revision does not change that basic foundation; it only parallelizes it. It is worth pointing out how little the code and design have changed, yet this implementation has the ability to scale massively to tens of thousands of instances across multiple processor cores.  Clearly this is just a simple example that does very little real work (producing Item events). Structurally, however, it's a model that represents how one would design such a scalable service in the Apama EPL.

The parallel Item Producer (like its previous incarnation) manages multiple uniquely identified Consumers. For that it must maintain a list of identifiers, one for each Consumer.  But this time, the Producer instance created on behalf of the Consumer is spawned into a context:  spawn startSubscriptions(s.subscriberId, s.item_name, context.current()) to c; We're still passing the subscriberId and item_name (the instance parameters), but we also pass the context handle of the main context (context.current()).   This is necessary for the inter-context communication.

The Consumer implementation has undergone a minor change to support this parallelized execution mode and match the Producer.  A good design pattern is to ensure that monitors that frequently pass events operate within the same context. This is not a hard-and-fast rule, only one that limits the amount of inter-context communication (i.e. enqueueing).  I've enhanced the interface slightly: there is a new event, SubscriptionResponse, that is used as a response to subscription requests (on all SubscribeToItems()).  This event is used to communicate back to the client the context handle of the Producer spawned on its behalf. Once the Consumer receives this event, it also spawns into this same context. By doing so, both the Producer and Consumer operate as they always did, sending Item events (route Item(this_subscriberId, name, count, price)) and handling termination (on UnsubscribeFromItems).  Within each context, the producer/consumer still adheres to that single-cast event passing scheme where it creates and sends uniquely tagged Item events. The Consumer and the Interface are included in the download (not shown here for brevity's sake).

Two additional points are worth highlighting in this Producer implementation.

1) The on completed SubscriptionResponse() listener.  The completed  keyword indicates that this listener wakes up after the SubscriptionResponse  event has been delivered.  This way we can guarantee that our Consumer has received this event and has the context handle before spawning the Producer.

2) To process UnsubscribeFromItems events, the statement: enqueue ClearUserID(this_subscriberId) to mainContext; is executed.  This statement is used to send an event to the listener (on all ClearUserID), which executes in another context. Recall that the action startSubscriptions is the target of the spawn operator. So this is the main body of code for which multiple instances run in parallel in contexts (from the pool). The onload action, which controls all of this spawning, is logically considered the main context. Due to the strong semantic for inter-context communication, events must be enqueued to another context's input queue. Each context in effect has its own input queue, and the context handle is what defines the inter-context communication mechanism. So to communicate the client termination request from the spawned instance running in a private context, the ClearUserID event must be enqueued to the main context, where the appropriate listener is waiting.
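The hand-off described in point 2 can be modeled with a short Python sketch. It is deliberately single-threaded so the ordering is easy to follow, and every name in it (MainContext, subscribe, drain, on_unsubscribe) is a hypothetical stand-in rather than anything from the Apama API.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class ClearUserID:
    """Stand-in for the ClearUserID event carrying the subscriber id."""
    id: int


class MainContext:
    """Models the main context: owns the id list and its own input queue."""

    def __init__(self):
        self.inbox = queue.Queue()
        self.ids = []

    def subscribe(self, subscriber_id):
        """Register a subscriber once; repeat requests are ignored."""
        if subscriber_id in self.ids:
            return False
        self.ids.append(subscriber_id)
        return True

    def drain(self):
        """Handle queued events, playing the role of the ClearUserID listener."""
        while not self.inbox.empty():
            event = self.inbox.get()
            if isinstance(event, ClearUserID) and event.id in self.ids:
                self.ids.remove(event.id)


def on_unsubscribe(subscriber_id, main_inbox):
    """Runs in the spawned instance: it cannot touch ids, so it enqueues instead."""
    main_inbox.put(ClearUserID(subscriber_id))
```

The point of the design is that the list of identifiers has exactly one owner. The spawned instance never edits it directly; it only posts a request to the owner's queue, which is why no locking is needed around ids.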

Routing (i.e. route Item(...)) is still possible, but routed events stay within the boundaries of the context where the Producer and its corresponding Consumer reside.  To logically expand the example, multiple Consumers could reside in the same context (i.e. a multi-cast design pattern as I described in the previous revision of this example).

 

This example is designed to illustrate the simplicity of parallelism in the Apama EPL. With just a few simple statements, one can quickly and easily leverage multi-core processor technologies for massive scalability.

As I mentioned earlier, this is the final entry for this specific example. If you're just seeing this for the first time, you can start from the beginning (only three short segments) here. I hope this has been informative and provided some insight into the Apama EPL. I plan to have many more code examples in the future on various use cases.

You can download the complete example here with the consumers, interface and producer. Any questions or comments, just let me know,
Louie