
May 2008

Wednesday, May 28, 2008

Apama Wins FX Algo Award

Posted by Chris Martins

The Apama platform notched another award last week, this time at Forex Network New York, an event hosted by Profit & Loss Magazine on May 22nd. Apama was selected to receive a Digital Market Award as the Best Algorithmic Trading Platform for its ability to power unique FX trading strategies. FX has proven to be a particularly strong market opportunity for the Apama platform's CEP capabilities. With its breadth of connectivity to FX liquidity venues and its rich set of tools, Apama is quite attractive to organizations that want to quickly capitalize on FX trading opportunities with their own unique strategies.


Thursday, May 22, 2008

Waters Survey - Pick Your CEP Vendor

Posted by Chris Martins

While it may not have the excitement of Hillary vs. Barack (or David vs. David for you American Idol fans), we would nevertheless encourage all those eligible (investment firms, hedge funds, brokerages and exchanges) to vote in the Annual Waters Survey, currently being conducted. Progress is a contestant in two categories: Data Stream Management (#11) and CEP (#20). Not all the products included in those categories really belong there, but we'll leave it to the voters to reach those conclusions. Voting ends at midnight on May 28th.


Tuesday, May 20, 2008

TradeTech Recap

Posted by Chris Martins

Our colleague Dr. Giles Nelson, CTO of Progress Software EMEA and a co-founder of Apama, took some time to share his thoughts about this year's TradeTech Paris, which we have captured in the audio file below.

Wednesday, May 14, 2008

High Availability a mandate for success

Posted by Louis Lovas



As CEP has moved into the mainstream of mission-critical business applications, it has given rise to a whole set of new challenges for the purveyors of CEP technologies. There is a mandate for software platforms that want to play in the enterprise-class big leagues. That mandate encompasses many capabilities over and above the core functionality of a platform, be it CEP or any other software technology. Witness the maturity of common infrastructure platforms today: whether messaging, databases, application servers or others, they all share a set of capabilities, the "ilities", necessary to be considered enterprise class.

A software platform typically starts its life with the initial appeal of providing greater business value and/or improved developer productivity. As it moves from being the cool new toy on the bleeding edge, employed for new and emerging use cases, it starts to get more widely deployed in critical business functions. As this maturation occurs, the IT user community wants, indeed demands, that the platform's core set of capabilities be broadened to better support the deployed production environment. Two primary areas are Management and Monitoring, and High Availability.

As a brief description, a highly available platform is one that is tolerant of faults. Faults or failures can happen to either software or hardware components. A system that can detect failures and recover from them is considered highly available. There is a tremendous amount of academic research and technology focused on fault-tolerant systems. Availability is tolerance to failure, achieved through redundancy and/or persistence. My intent here is simply to provide a pragmatic perspective on the challenges and solutions for high availability in CEP.

Lost Opportunity vs. Loss-Less

As a broad categorization, one can divide CEP applications into two groups. The first group is rather broad itself and encompasses use cases such as sensor monitoring, business or system activity monitoring and, in Capital Markets, certain types of algo trading. From the perspective of high availability I've lumped all of these (and similar) use cases together because of their failure context. If for any reason a failure brings down the CEP application (or a key component thereof), the downtime represents lost opportunity. While the system is down one loses the opportunity for an algo to make a trade, for a fraudulent trade to be detected, for a threat condition to be recognized, and so on. When the system is once again operational, the lost data is largely irrelevant, as in the algo trading case: market data is very time-sensitive, and as time advances the value of that data diminishes, eventually reaching zero. In other cases, lost data could be recovered from persistent sources (more on this later). The overall point about this category of CEP use cases is that failures do not represent catastrophic business failure, but a moment lost.

The second category tips the scale in the other direction. The classic examples in Capital Markets are pricing engines, crossing engines, Dark Pools and Smart Order Routers. In these examples the CEP application is, in effect, the trading destination for client order flow. Institutional investors (the clients) place orders to buy or sell large quantities of an asset (stock, currency, options, etc.) with these trading systems. There are both legal and regulatory requirements to ensure the transaction is completed accurately, in the best possible manner (i.e. best execution) and without loss of information. This type of CEP application mandates a loss-less model. It is paramount that incoming streams of data (i.e. client orders) are not lost for any reason. Failures of CEP systems deployed for these purposes do represent catastrophic business failure, with all the ramifications that can entail. As such, highly available architectures become a mandate for deployment into the production arena.

High Availability Variants

High availability architectures fall on a continuum, starting from the very basic ability to recover application state and extending to continuous operation with seamless failover. Choosing the right architecture for any particular deployment is based on the categorization I outlined above (Lost Opportunity and Loss-Less) and the dollar value of uptime.

A basic premise of all high availability architectures is the notion of redundancy. Having redundant components provides the safeguard against failure: if a component (software or hardware) fails, the secondary or backup instance assumes the primary role. In the variants I outline below (cold, warm, continuous) the chief difference between them is the failover time and, consequently, the duration of downtime.

Persistence is another component in a high availability deployment: persistence for the recovery of application state. This may sound somewhat obvious, but it's typically not transparent to CEP applications, nor is disk persistence the only means of recovery. Many of the external services CEP applications connect to include a measure of persistence and therefore provide a level of recovery themselves. Reliable messaging systems, distributed cache engines and, in Capital Markets, FIX servers all provide a means of recovering application state.
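
To make that recovery path concrete, here is a minimal sketch of disk-based checkpointing in Python. The state layout, file name and use of JSON are illustrative assumptions, not features of any particular CEP product; the one essential idea is the atomic write, so a crash mid-checkpoint never leaves a corrupt recovery file.

```python
import json
import os
import tempfile

CHECKPOINT_PATH = "app_state.json"  # hypothetical location for the checkpoint

def checkpoint(state: dict) -> None:
    """Atomically persist application state so a standby can recover it."""
    # Write to a temp file first; a crash here leaves the old checkpoint intact.
    fd, tmp_path = tempfile.mkstemp(dir=".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # force the bytes to disk before the rename
    os.replace(tmp_path, CHECKPOINT_PATH)  # atomic swap on POSIX systems

def recover() -> dict:
    """Load the last good checkpoint, or start empty if none exists."""
    if not os.path.exists(CHECKPOINT_PATH):
        return {}
    with open(CHECKPOINT_PATH) as f:
        return json.load(f)
```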


Cold Standby

A cold standby deployment provides availability via complete replication of the primary system (both hardware and software). The replica or backup servers are considered cold because they do not actively or passively participate in the business function of the primary system; in fact, they could actually be powered off. In the event of a failure of the primary system, the standby machine assumes control. The process by which that failover occurs could be one of many. In the simplest case, a human operator performs the failover manually, which could entail switching network and storage cables and the like. In a more automated case, clustering software would be employed. These products provide automated detection of faults and automatic failover of network, SAN and application components (i.e. CEP engines) to backup servers.

A few noteworthy points about cold standby.

     
  1. Failover time can be quite lengthy, measured in the tens of minutes at best. If the failover mechanism is manual, the first order of business is a means to detect the failure in the first place. Obviously the application stops working, but having a means to monitor runtime health is always a first line of defense against failure. I'll cover runtime Management and Monitoring in another blog. Even in clustered environments, the failure detection and failover sequence can cause downtime of such duration that a data recovery scheme is necessary.
  2. Loss of data is very likely. In the streaming data paradigm of CEP, failure of the application does not turn off the spigot. A cold standby model inherently incurs a downtime period between failure detection and startup of the standby system. If the application environment is in the Lost Opportunity category, then loss of data is largely insignificant.
  3. Data persistence and recovery become an inherent part of the CEP application design. CEP applications must be designed to persist state at appropriate intervals. In the event of a failure, the standby system can recover the application state from the persistent store. As part of the recovery process, the application must be able to reconcile the last known persisted state with reality. For example, in the algo trading environment a trading application keeps track of orders in-market and current position. To safeguard against losing this information in the event of a failure, the state of orders and position is persisted to disk (or a SAN device) as it changes. During the recovery phase, the CEP application must reconcile the order state read from the persistent store with the state as known by the execution venue(s). This is a non-trivial task, to say the least, and one that is very dependent on, and unique to, specific trading destinations. If connectivity is via FIX, that protocol provides an Order Status Request feature that can be used to facilitate this reconciliation; a sketch of that reconciliation step follows this list.
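
As promised in point 3, here is a sketch of the reconciliation step in Python. The venue_status mapping stands in for answers gathered via a FIX Order Status Request or an equivalent venue-specific query; the field names and the UNKNOWN marker are hypothetical.

```python
def reconcile_orders(persisted: dict, venue_status: dict) -> dict:
    """Merge checkpointed order state with the execution venue's view.

    persisted    : order_id -> last state written to the persistent store
    venue_status : order_id -> state the venue reports after restart
                   (e.g. collected via FIX Order Status Request)
    The venue is authoritative: fills that occurred while the system
    was down must override the stale checkpoint.
    """
    reconciled = {}
    for order_id, local in persisted.items():
        remote = venue_status.get(order_id)
        if remote is None:
            # Venue has no record; the order may never have arrived,
            # so flag it for manual or rule-driven follow-up.
            reconciled[order_id] = {**local, "state": "UNKNOWN"}
        else:
            reconciled[order_id] = remote  # trust the venue's view
    # Adopt orders the venue saw but the checkpoint missed (sent after
    # the last checkpoint, just before the failure).
    for order_id, remote in venue_status.items():
        reconciled.setdefault(order_id, remote)
    return reconciled
```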



Warm Standby

Similar to a cold standby deployment, availability in a warm standby system is achieved through redundancy. However, the similarity ends there: in a warm standby environment the standby or secondary system (or systems) is generally online with its own private connection to external data sources and networks. It operates, however, in a strictly passive mode, leaving all the actual business functions to the active primary system. The primary and secondary systems, acting in a master-slave relationship, keep tabs on one another via a heartbeat scheme. A failover is initiated when the primary system fails to respond to a heartbeat request from the standby.

A few noteworthy points about warm standby.

     
  1. Minimal failover time (as compared to cold standby). Since the secondary system (or systems) is up and running, typically with its own connection to external systems, it can quickly assume control in the event of a failure of the primary system.
  2. Loss of data is minimized. Application state, while actively maintained by the primary system, is also monitored and replicated by the secondary (passive) systems. So instead of persisting application state (i.e. orders and positions) to disk, it is replicated in-memory on the secondary systems. This of course implies reliable interconnectivity between the systems, most commonly via a message bus. As with cold standby's data persistence, the propagation of state between the master and slave CEP engines becomes part of the application design.
  3. It's important to minimize or eliminate false positives in the design of the heartbeat scheme. In the ideal case, heartbeating is done on an out-of-band connection between the primary and secondary systems, out of the application processing code path and any input/output data queues. This ensures heartbeating is not (or is only minimally) affected by fluctuations in processing load or congestion within the application itself. A minimal heartbeat sketch follows this list.
  4. There can be multiple secondary (slave) systems, providing multiple levels of redundancy to ensure the absolute minimum of downtime.
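
To illustrate the heartbeat scheme, below is a minimal sketch of the standby's side of the exchange, using a plain UDP socket as the out-of-band channel. The address, probe interval, timeout and miss threshold are all illustrative assumptions; real deployments would typically rely on clustering or messaging middleware for this.

```python
import socket
import time

HEARTBEAT_ADDR = ("primary-host", 9999)  # hypothetical out-of-band endpoint
INTERVAL_S = 0.5       # how often the standby probes the primary
TIMEOUT_S = 2.0        # how long to wait for each reply
MISSES_ALLOWED = 3     # require several misses, to suppress false positives

def monitor_primary(promote) -> None:
    """Run on the warm standby: probe the primary, fail over on silence."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(TIMEOUT_S)
    misses = 0
    while True:
        try:
            sock.sendto(b"ping", HEARTBEAT_ADDR)
            sock.recvfrom(32)   # primary answers each ping with a pong
            misses = 0          # healthy; reset the counter
        except socket.timeout:
            misses += 1         # tolerate transient stalls
            if misses >= MISSES_ALLOWED:
                promote()       # assume the active (primary) role
                return
        time.sleep(INTERVAL_S)
```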
       



Continuous Operation

A high availability architecture that provides continuous operation, or continuous availability, implies a seamless failover model. Like the standby designs, redundancy is the key ingredient. However, unlike either cold or warm standby, where the redundant systems are passive, in the continuous environment multiple redundant systems are peers and execute in parallel. Each system runs a full complement of the application code and its supporting infrastructure (i.e. CEP engine). To make this model function effectively, external connectivity is not uniquely wired to each instance but is multiplexed by an independent component that manages both the inbound and outbound data streams to and from all redundant systems. Inbound, the data is multiplexed to each instance; outbound, this multiplexor manages the duplicate streams using a first-out-wins model. Since each peer instance is executing in tandem, they produce duplicate output. In the design of a continuous operation high availability architecture, duplicates are a natural by-product, and managing the duplicate output is vitally important.
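
Here is a sketch of that first-out-wins multiplexor, under one stated assumption: each logical output event carries an identifier derived deterministically from the input stream, so duplicates from slower peers can be recognized and dropped.

```python
class FirstOutWinsMux:
    """Forward one copy of each logical output produced by N peer engines."""

    def __init__(self, downstream):
        self.downstream = downstream  # callable that receives unique events
        self.seen = set()             # ids already forwarded; in practice this
                                      # set must be bounded or aged out

    def on_output(self, event_id: str, event) -> None:
        if event_id in self.seen:
            return                    # a slower peer emitted a duplicate: drop
        self.seen.add(event_id)
        self.downstream(event)        # the first copy out wins
```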

A few noteworthy points about continuous operation.

     
  1. Failover is truly seamless, or in fact non-existent. If one of the participant instances drops out (i.e. fails), it's an inconsequential event, since the remaining systems are fully active and continue to operate.
  2. It implies strict determinism in the CEP engine. Since all redundant systems are actively engaged at all times, they must produce exactly the same output streams given that they receive the same input streams; see the sketch after this list.
  3. Failed instances that are brought back online must have a means to catch up. Since processing does not stop, a cleanly restarted system must have a means to load state from one of the currently running instances.
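
The determinism requirement in point 2 is easy to state and easy to violate. The toy processor below shows the kind of discipline it demands, keying every decision off data carried in the events themselves; the event fields are illustrative.

```python
class DeterministicEngine:
    """Toy event processor: identical inputs must yield identical outputs."""

    def __init__(self):
        self.positions = {}  # symbol -> net position

    def on_trade(self, event: dict) -> list:
        # Use the event's own timestamp, never the wall clock: peer engines
        # process the same event at slightly different real-world moments.
        ts = event["timestamp"]
        sym = event["symbol"]
        self.positions[sym] = self.positions.get(sym, 0) + event["qty"]
        # Emit in sorted order so container iteration order can never make
        # one peer's output stream diverge from another's.
        return [(ts, s, p) for s, p in sorted(self.positions.items())]
```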


Opposing Forces
 
High availability is fast becoming a key requirement for CEP. The three basic variants I've outlined are just the starting points when considering options for a highly available deployment. There are cost considerations to take into account, not just in dollars and cents but also in manageability, complexity and performance. These are factors to weigh in evaluating or designing a high availability solution. Throughput and low latency are typically paramount objectives in any CEP application, and wrapping that application in any variant of high availability can impact overall performance. There is overhead in persisting state to disk or replicating that state across a message bus to secondary systems. To minimize the impact, there are disk and messaging subsystems designed for extreme low latency: RamSAN devices leverage solid-state disk technology for improved storage performance, and RTI and 29West offer high-speed, low-latency reliable messaging. These technologies and others can be employed within a high availability architecture to keep the performance demons at bay.

CEP technology is clearly reaching a level of critical mass as a platform for mainstream business, especially in Capital Markets. Most if not all CEP vendors provide some high availability options within their platforms. However, CEP is not an island unto itself: it lives within a larger infrastructure, and the degree of high availability is only as good as the weakest link. Availability requirements are not something to add on after the fact; they should be part of the overall design of the infrastructure, connectivity and the deployed applications.

Thanks for reading...
Louie

 

Thursday, May 01, 2008

An Apama Hat Trick

Posted by Chris Martins

Last week proved to be a busy one for Apama on the marketing front as we issued three separate announcements in conjunction with our presence at the TradeTech show in Paris.  Two of the announcements focused on customers, while the third focused on work that a partner is jointly doing with Apama in the area of market surveillance.

  • ING Wholesale Banking announced that it is expanding its use of Apama, previously focused on algorithms for Benelux Small and Mid-Caps.  ING has re-engineered those algorithms to address markets in Hungary and Poland with a variety of features that include hybrid cross asset algorithms that leverage ING’s direct access within those emerging markets.
  • SEB, the Scandinavian financial group, announced it will expand its use of Apama to deliver advanced order flow monitoring services within a compliance application.  This was a second SEB announcement, following one last year regarding the SEB deployment of Apama to support client trading in Exchange-traded equities and futures.
  • And lastly, but certainly not least, together with Detica, a key partner, we jointly announced a Market Surveillance Accelerator. Accelerators are extensions to the core Apama platform that help our customers jumpstart their deployments, incorporating business logic and other components like sample dashboards and adapters for connectivity. In this instance, we are combining the technical know-how and experience of Detica and Apama – both of which are now supporting projects at the FSA and Turquoise – to address the growing demand for real-time market surveillance capabilities. We’ve previously announced Accelerators for Market Aggregation and Smart Order Routing. And there'll be more to come.

These three announcements collectively illustrate that a key part of the value of the Apama CEP platform is its versatility. Apama may initially be deployed in support of a specific application, like algorithmic trading, or a specific asset class, but with experience many of our customers expand their use to different asset classes, different geographic markets and/or entirely different applications – like compliance, risk or market surveillance.