
May 2007

Wednesday, May 16, 2007

Complex Event Processing at CERN

Posted by Giles Nelson

This week I visited CERN in Switzerland, the European Organisation for Nuclear Research, which is a customer of Progress. It was an astonishing and inspiring visit. CERN is in the final stages of building the Large Hadron Collider (LHC), which is due to go into production late this year. The LHC consists of a 27km loop in which protons will be accelerated and collided at unprecedented energies to give us new insights into the building blocks of matter. In particular, the search is on for the Higgs boson, originally predicted in a paper dating from the 1960s. Finding it will fill a gap in the Standard Model of elementary particles and forces, and will help in furthering a "theory of everything". A particular highlight was going down nearly 100m underground to look at the ATLAS experiment, a truly massive particle detector. It is made up of a number of different components, each of which detects different types of particles: muons, photons, electrons and many others. The huge magnets which form part of the detector are cooled with liquid helium down to -269 degrees C to make them superconducting (and therefore more powerful). Viewing all this brought home what a remarkable engineering effort it all is.

Anyway, what has all this got to do with events? Well, through a number of presentations that CERN staff were kind enough to give us during the day, it became apparent that their whole world revolves around events and the processing of them. "Event" is a term they use often, to describe the information gathered by the detectors which sit around the collider. Every proton collision creates a set of events which need to be analysed and filtered to determine which are of real interest. For example, there are two ways in which a proton collision can produce two Z particles. One is predicted to involve a Higgs boson, so the sequence of events to look for is something like "proton collision, followed by a Higgs boson, followed by two Z particles". To identify such temporally correlated sets of events, the raw events are propagated up through three levels of filtering before finally being sent to a central computing resource for further research and analysis. Up to 40 million collisions take place every second. These are first analysed in FPGA hardware, which reduces the 40 million collisions to a few thousand of interest. These are further filtered in software to produce, finally, a few hundred, which are then sent to other computing systems for further analysis.
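The "followed by" pattern described here is the kind of temporal correlation a CEP engine looks for. The Python below is a minimal sketch under stated assumptions, not how CERN's trigger systems actually work: the event kinds (`"collision"`, `"higgs"`, `"z"`) are hypothetical labels, timestamps are in arbitrary units, and `match_sequence` is a name made up for this example. It matches the pattern in order, lets unrelated events interleave, and abandons a partial match once it can no longer complete inside the time window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # hypothetical label, e.g. "collision", "higgs", "z"
    time: float  # timestamp in arbitrary units


def match_sequence(events, pattern, window):
    """Return the lists of events that match `pattern` in order, where each
    whole match spans at most `window` time units. Unrelated events may be
    interleaved; each event is used by at most one match."""
    matches = []
    partial = []  # events matched so far in the current attempt
    for ev in events:
        # Drop a partial match that can no longer complete inside the window.
        if partial and ev.time - partial[0].time > window:
            partial = []
        # Advance the match if this event is the next one the pattern expects.
        if ev.kind == pattern[len(partial)]:
            partial.append(ev)
            if len(partial) == len(pattern):
                matches.append(partial)
                partial = []
    return matches
```

This greedy version tracks only one partial match at a time; an engine handling many overlapping occurrences would typically keep a set of concurrent partial matches instead.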

It's not only collider events that CERN needs to handle. CERN also has a newly built central control centre, part of which is used to monitor CERN's technical infrastructure. About 35,000 separate sensors monitor everything from fire, to electricity substations, to coolant plants. These sensors currently produce about 1.6M events per day, all of which have to be propagated to a central point for analysis. In turn, these 1.6M are reduced to 600K events which are reviewed by human operators. Most are inconsequential (for example, the 18kV power supply is still producing 18kV), but some will require attention. By analysing these appropriately, CERN can ensure that the colliders are running as smoothly and as safely as possible. With billions of euros invested so far in the LHC, keeping the collider up and running as continuously as possible is a top priority.
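For a sense of scale, 1.6M events per day averages out to about 18.5 events per second (1,600,000 / 86,400), and reducing them to 600K means roughly 37.5% reach the operators. One simple way to suppress the "still producing its nominal output" class of event is a deadband filter, sketched below in Python. This is an assumption about the kind of rule involved rather than a description of CERN's actual system; the sensor names, nominal values and 2% tolerance are all hypothetical.

```python
def filter_significant(readings, nominal, tolerance):
    """Yield only the (sensor_id, value) readings that deviate from that
    sensor's nominal value by more than `tolerance` (a fractional deadband).
    Readings from sensors with no known nominal value are always forwarded,
    so that unexpected sources are not silently dropped."""
    for sensor_id, value in readings:
        expected = nominal.get(sensor_id)
        if expected is None or abs(value - expected) > tolerance * abs(expected):
            yield sensor_id, value


# Hypothetical usage: nominal values keyed by made-up sensor IDs.
# nominal = {"ps-18kv": 18.0, "coolant-flow": 100.0}
# alerts = list(filter_significant(stream, nominal, tolerance=0.02))
```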

The visit proved a fascinating insight into the world of particle physics and the data processing challenges it produces. It really showed event processing at its most extreme.

Monday, May 14, 2007

In Piam Memoriam Fundatoris Nostri

Posted by John Bates

There have been a number of exchanges recently on the cep-interest group and on this blog on the topic of “the origins of event processing and CEP”. As someone who has been involved in event processing research and products for 17 years, I’ve been asked to add a perspective here. Wow, this makes me feel old.

Although I started researching composite/complex event processing as part of my PhD at Cambridge in 1990, I certainly wasn’t the first, so I can’t claim to have invented CEP. As Opher Etzion correctly observed in an email to cep-interest, my experience was also that this discipline originated in the “active database” community. Much work was done prior to 1990 on adding events and event composition to databases. “ECA rules” (Event-Condition-Action rules) were a popular way of describing the complex/composite event processing logic.

When I was experimenting with multimedia and sensor technologies in the early 90s, trying to figure out how to build applications around distributed asynchronous occurrences (such as tagged people changing location), I realized that building “event-driven” applications in a distributed context was a new and challenging problem from a number of angles. I looked for prior work in the area. Although I didn’t find any work specifically on distributed event-driven applications, I was able to draw on the active database community. In particular, a paper by Gehani et al. on “composite event expressions” (as recently mentioned by Opher) looked ideal for the applications I had in mind. This paper outlined an algebra for composing events and a model for implementing the resulting state machines. I implemented the Gehani model as part of my early work. While it was a great concept, it had a number of shortcomings:

  • Although it claimed to be able to compose any complex event sequence, it was incredibly difficult to compose even a simple scenario.
  • It didn’t consider the issues of distributed event sources, such as network delays, out-of-order events etc.
  • It didn’t consider the key issue of performance: how to process large volumes of events against a large number of active composite event expressions.
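The second shortcoming, out-of-order events from distributed sources, is commonly handled with a reorder buffer: hold events briefly and release them in timestamp order once no earlier event can still arrive. The Python below is a sketch of that general technique, not of the Gehani model or of Apama; it assumes a known upper bound `max_delay` on how late any event can be, and `ReorderBuffer` is a hypothetical name.

```python
import heapq


class ReorderBuffer:
    """Re-sequence events arriving out of order from distributed sources,
    ASSUMING no event arrives more than `max_delay` time units late."""

    def __init__(self, max_delay):
        self.max_delay = max_delay
        self._heap = []                 # entries: (timestamp, seq, payload)
        self._seq = 0                   # tie-breaker keeps equal timestamps stable
        self._latest = float("-inf")    # newest timestamp seen so far

    def push(self, timestamp, payload):
        """Buffer one event; return the events that are now safe to release."""
        heapq.heappush(self._heap, (timestamp, self._seq, payload))
        self._seq += 1
        self._latest = max(self._latest, timestamp)
        # Nothing older than this watermark can still arrive (by assumption).
        return self._release(self._latest - self.max_delay)

    def flush(self):
        """Release everything still buffered, e.g. at end of stream."""
        return self._release(float("inf"))

    def _release(self, watermark):
        out = []
        while self._heap and self._heap[0][0] <= watermark:
            ts, _, payload = heapq.heappop(self._heap)
            out.append((ts, payload))
        return out
```

An event arriving later than `max_delay` would still be released out of order, so choosing the bound trades added latency against correctness.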

Active databases had only considered events within the database, and databases were built around a store-index-query model, which is not well suited to such fast-moving updates. In order to make composite events a practical way of building applications, the above shortcomings had to be addressed.

Composite event expressions were only one aspect of my work initially, but as the volumes of real-time data continued to grow and new sources of data continued to emerge, it became clear that the work in distributed composite/complex event processing had legs. It also seemed to excite many people.

There were of course the cynics. Many of my Cambridge colleagues thought that events were already well understood in hard and soft real-time systems and in operating systems – and that’s where they belonged. It is true that event processing has been part of systems for several decades. Traditional systems handle events in the operating system. However, never before had events been exposed at the user level as ad hoc occurrences, requiring specific responses. There was a new requirement for applications that could “see” and respond to events.

Some closed “event-based systems”, such as GUI toolkits like X Windows, allowed users to handle events in “callback routines”. For example, when someone clicks on a window, a piece of user code could be called. This approach tried to make the most of traditional imperative languages, and make them somewhat event-based. But this paradigm is tough to program and debug – and events are implicit rather than explicit. Also, the program has to enter an “event loop” in order to handle these events – to make up for the fact that the programming paradigm wasn’t designed to handle events explicitly.
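To make the callback style concrete, here is a toy Python event loop. The `EventLoop` class and its methods are hypothetical and are not the X Windows or Xlib API. Handlers are registered per event type, and the program must enter `run()` to receive anything. Each callback sees one event in isolation, so a relationship such as "click followed by keypress" would have to be tracked by hand in shared state, which is the sense in which events remain implicit.

```python
from collections import deque


class EventLoop:
    """A toy callback-style event loop: handlers are registered per event
    type and invoked one at a time as events are dequeued."""

    def __init__(self):
        self._handlers = {}    # event type -> list of callbacks
        self._queue = deque()  # pending (event_type, data) pairs

    def on(self, event_type, callback):
        """Register a callback to be invoked for every event of this type."""
        self._handlers.setdefault(event_type, []).append(callback)

    def post(self, event_type, data=None):
        """Queue an event for later dispatch."""
        self._queue.append((event_type, data))

    def run(self):
        # The program has to enter the loop and stay there to receive events.
        while self._queue:
            event_type, data = self._queue.popleft()
            for callback in self._handlers.get(event_type, []):
                callback(data)
```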

So we began to realize that events had to be explicit “first class citizens” in development paradigms. Specifically, we saw a new set of capabilities would be required:

  • An event engine – a service specifically designed to look for and respond to complex event patterns. This engine must be able to receive events from distributed sources and handle distributed systems issues.
  • An event algebra – a way of expressing event expressions, composing events using temporal, logical and spatial logic, with associated actions. These might be accessible through a custom language or perhaps even through extensions of existing languages.
  • Event storage – a service specifically designed to capture, preserve in temporal order and analyze historic event sequences.

A number of my colleagues in Apama worked on these and other areas. As far as a research community went, we published mostly in distributed systems journals and conferences, such as SIGOPS. We worked closely with other institutions interested in events, such as Trinity College Dublin.

In 1998 I, along with a colleague, Giles Nelson, decided to start Apama. I later found out that, concurrently with this and coming from different communities, other academics had also founded companies – Mani Chandy with iSpheres and David Luckham with ePatterns. These companies had different experiences – David told me ePatterns unfortunately overspent and became a casualty of the 2000 Internet bubble bursting. David of course went on to write a very successful book on event processing. iSpheres went on to do brilliantly in energy trading but was hurt by the Enron meltdown and struggled to compete with Apama in capital markets. Apama focused primarily on capital markets, with some focus on telco and defence, and went on to be very successful, being acquired by Progress in 2005. Interestingly, long after these pioneering companies were started, several new startups appeared – all claiming to have invented CEP!!

So that’s my potted history of CEP. I don’t think any of us can claim to have invented it. I think some of us can claim to have been founding pioneers in taking it into the distributed world. Some others of us can claim to have started pioneering companies. All of us in this community are still at an early stage – and it is going to get even more fun.

There’s one bit I haven’t talked about yet – and that’s terminology. Most researchers originally called this area “composite event processing”. The term “complex event processing” now seems to be popular – due to David’s book. There are some arguments about the differences between “complex/composite event processing” and “event stream processing”. From my perspective, when Apama was acquired by Progress, Mark Palmer invented the term “event stream processing” to avoid using the word “complex” – which Roy Schulte from Gartner thought would put off customers looking for ease-of-use. However, it then seemed that the industry decided that event stream processing and complex event processing were different – the former being about handling “simple occurrences” in “ordered event streams” and the latter being about handling “complex occurrences” in “unordered event streams”. In my opinion, any system that can look for composite patterns in events from potentially distributed sources is doing complex/composite event processing. Yes, there may be issues to do with reordering, but there may not be. It depends on the event sources and the application.

It's often tricky to work out "who was first" in academic developments.  But it's good to know we have some excellent pioneers active within our CEP community who all deserve a lot of respect.

Monday, May 07, 2007

CEP and Capital Markets: What's the fuss all about?

Posted by Richard Bentley

There has been some comment on this blog recently on whether Capital Markets is the most or the least demanding of domains for CEP. I'm not going to comment on that, but what I will say is that Capital Markets is certainly one of the most *fertile*! Look at the websites of any of the CEP vendors mentioned on these pages and you'll find references to such esoterica as Algo Trading, Risk Management, Surveillance, Dynamic Pricing and so much more - all applications of CEP in the Capital Markets domain.

So why does CEP find such a willing audience in Capital Markets? The obvious answer lies in the huge (and rising) volumes of real-time market data feeds (the NYSE OPRA options feed alone can deliver more than 100,000 messages per second [ref]); the need to take immediate action in response to patterns in these event streams; and the money to be made by those that get their orders to market first. "High Frequency Trading" was a term barely heard on the trading floor 5 years ago, but is today very much a reality - and it is the emergence of CEP technologies that has made it so.

But it is not just the problem of dealing with the burgeoning volumes of real-time data that are driving Capital Markets firms' adoption of CEP - even the Regulatory bodies are lending CEP vendors a helping hand! The European Union's Markets in Financial Instruments Directive (MiFID), for example, comes into effect on November 1 and, amongst its many provisions, removes the current barriers that prevent listing and trading stocks on any European Stock Exchange. A fragmentation of the European Equity market is widely expected, with firms listing on multiple exchanges simultaneously, and the reduction in barriers to entry fostering the emergence of new exchanges like Chi-X and Turquoise. This fragmentation resonates with the current situation in the Foreign Exchange Markets and, to a lesser extent, in US Equity markets; but why is it of interest for CEP?

The answer lies in the term "Best Execution". If Wile E. Coyote wants to buy a chunk of Acme Corp for example (and he really should ...), then he wants to pay the cheapest price ... but if the stock is listed on 5 markets, priced in several different currencies, with different transaction costs for each market, and is changing on each of these exchanges 10 times a second ... then determining where the "best" is at any one point is no mean feat! You might think that the banks might decide not to bother, but don't worry - the regulators are helping us out here as well! From Nov 1 the brokers will be obligated to offer their clients "Best Execution" - a fact that is driving a huge amount of interest in CEP from Capital Markets right now.
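A naive version of that calculation might look like the Python sketch below: convert each venue's offer into a common base currency, add that venue's transaction cost, and pick the cheapest. Everything here is hypothetical (venue names, prices, FX rates and basis-point fees are made up), and it deliberately ignores order-book depth, latency and the other subtleties of "Best". In a CEP setting, something like this would be re-evaluated every time any input quote or FX rate changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    venue: str       # hypothetical venue name
    ask: float       # offer price per share, in the venue's local currency
    currency: str    # local currency code
    fee_bps: float   # transaction cost in basis points of notional


def best_venue(quotes, fx_to_base, quantity):
    """Return (venue, all-in cost in base currency) of the cheapest venue for
    buying `quantity` shares: ask converted to base currency, plus fees."""
    def all_in_cost(q):
        notional = q.ask * quantity * fx_to_base[q.currency]
        return notional * (1 + q.fee_bps / 10_000)

    best = min(quotes, key=all_in_cost)
    return best.venue, all_in_cost(best)
```

For example, with made-up quotes of 10.00 GBP (5 bps), 14.70 EUR (2 bps) and 10.01 GBP (0.5 bps) and an assumed rate of 0.68 GBP per EUR, buying 1,000 shares is cheapest on the EUR venue at about 9,998 GBP all-in, even though its headline price looks highest.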

And we haven't even begun to talk about the subtleties of what "Best" might actually mean for those brokers' clients yet ... nor that it's not enough for the brokers to meet their clients' definitions of "Best" - they've got to be able to prove they did too! The ability to capture market data from multiple real-time feeds, capture order execution details, and demonstrate that the executions obtained were in fact the "Best", given prevailing market conditions, will focus attention not only on real-time event processing - but on real-time event capture and replay - one of the key technical underpinnings of a mature CEP offering.
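Capture and replay can be sketched as an append-only log of timestamped events plus a function that replays them in time order up to a chosen moment. The Python below is a minimal illustration under assumptions: events are stored as JSON lines in a local file, and the function names and the `"ask"`/`"fill"` payload fields are hypothetical. `best_ask_at` replays the feeds up to an execution time to reconstruct the best prevailing offer, the kind of evidence a broker might need to show a fill was the "Best".

```python
import json


def capture(events, path):
    """Append each (timestamp, source, payload) event to `path` as one JSON line."""
    with open(path, "a") as f:
        for ts, source, payload in events:
            f.write(json.dumps({"ts": ts, "source": source, "payload": payload}) + "\n")


def replay(path, until=None):
    """Yield captured events in timestamp order, stopping after `until` if given."""
    with open(path) as f:
        records = [json.loads(line) for line in f]
    records.sort(key=lambda r: r["ts"])  # stable sort keeps capture order on ties
    for r in records:
        if until is not None and r["ts"] > until:
            break
        yield r["ts"], r["source"], r["payload"]


def best_ask_at(path, t):
    """Reconstruct the lowest prevailing ask across all feeds as of time `t`."""
    latest = {}  # source -> most recent ask seen from that source
    for _, source, payload in replay(path, until=t):
        if "ask" in payload:
            latest[source] = payload["ask"]
    return min(latest.values()) if latest else None
```

Comparing `best_ask_at(path, execution_time)` with the captured fill price gives a simple after-the-fact check; a production system would also need indexing, durability and far richer market state than a single best ask.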

The MiFID regulations, and the similar "Reg NMS" regulations in the U.S., have focused the attention of many of our Capital Markets' clients on real-time market Aggregation, Smart Order Routing, Transaction Cost Analysis and the like. It's clearly a great time to be in the CEP business ... and we've not even touched on real-time risk, pricing, surveillance ... (next time).

I said I wasn't going to comment but: most or least demanding? Who cares ...