Serge Blais

Thursday Feb 21, 2008

CQL, Great, yet another standard....


CQL, A New Monitoring Paradigm ?

Well, not really a standard. It is more of a maths-based principle, a proposed solution for Data Stream Management Systems. It is studied in the very large database (VLDB) field, under stream data management. See here for the broader subject. For this post, we will focus on the following:

"CQL is an expressive SQL-based declarative language for registering continuous queries against streams and updatable relations." RefA

Said otherwise, this is a live query, connected to a stream of events, that updates its result as time passes. CQL is a language specialized in querying time-based data flows. If you would like to look at a sample (aside from the ones provided in the references), take a look at my fishy example...

For the enterprise developer, this has the potential to become as relevant to the job as SQL is. CQL is used by the Intelligent Event Processor (iep) of the Open ESB bus (and soon the Java CAPS 6 implementation). But first things first. Let's take a look at what it is, and what it actually means to us, enterprise techies...

What are the Problems we're Trying to Solve?

CQL is not only for application monitoring. It also provides business and technical solutions to time-based problems, such as:

  • How many accesses were done by Fred in the last minute?
  • Did we get any sales of red cars in the last week?
  • Is the number of JMS messages in the broker increasing over time? What changed?
  • Do we have about the same number of requests served per minute in each of the instances of the cluster?
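To make the first question concrete, here is a minimal sketch in plain Python (not CQL, and not the iep API) of what "in the last minute" means. The `Access` type, the field names and the sample events are all made up for illustration. The point is that the same question gives a different answer depending on when you ask it.

```python
from typing import NamedTuple


class Access(NamedTuple):
    """Hypothetical event type: one access by one user."""
    timestamp: float  # seconds since an arbitrary start
    user: str


def count_accesses(events, user, now, window_seconds=60.0):
    """Count accesses by `user` with now - window_seconds < timestamp <= now."""
    return sum(
        1
        for e in events
        if e.user == user and now - window_seconds < e.timestamp <= now
    )


events = [
    Access(0.0, "Fred"),
    Access(30.0, "Wilma"),
    Access(45.0, "Fred"),
    Access(90.0, "Fred"),
]

# Asked at t=50, the window covers Fred's accesses at t=0 and t=45.
print(count_accesses(events, "Fred", now=50.0))
# Asked at t=120, only the access at t=90 is still inside the window.
print(count_accesses(events, "Fred", now=120.0))
```

A continuous query engine keeps this answer up to date for you as events arrive, instead of rescanning the whole history on every call as this sketch does.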

While all of these problems can be answered using current technologies, CQL promises to do so more elegantly and more easily. I always say that all problems can be solved using Assembler, if you have enough time. Well, CQL promises to help resolve time-based problems faster than other technologies, i.e. with fewer resources. But for this, we have to put our student hat back on and learn a new way of dealing with this type of question. Luckily for us, CQL is built as an extension of SQL, so we just need to learn what is new about it.

How does CQL apply to Open ESB Based Systems?

Open ESB allows the integration of a CQL engine (the Intelligent Event Processor - iep service engine) seamlessly with the other components of the bus. Consequently, the iep service engine can be triggered by a message on the Normalized Message Router (NMR), or it can expose an external access through a binding component (BC).

Introducing Some Basic CQL Concepts

The following diagram is a very basic picture of what an iep process looks like. There is

  1. an input stream
  2. a method to capture the data window we need,
  3. and an output.
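The three elements above can be sketched as three chained Python generators. This is only an illustration of the shape of such a process, not how iep is implemented. The function names, the "last n tuples" window and the averaging output are assumptions chosen for the example.

```python
from collections import deque


def input_stream(readings):
    """Stage 1: the input stream, yielding (timestamp, value) tuples in order."""
    for timestamp, value in readings:
        yield timestamp, value


def last_n_window(stream, n):
    """Stage 2: capture the data window we need, here the n most recent tuples."""
    window = deque(maxlen=n)  # older tuples fall out automatically
    for item in stream:
        window.append(item)
        yield list(window)


def average_output(windows):
    """Stage 3: the output, one (timestamp, average) tuple per window."""
    for window in windows:
        latest_timestamp = window[-1][0]
        average = sum(value for _, value in window) / len(window)
        yield latest_timestamp, average


readings = [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)]
results = list(average_output(last_n_window(input_stream(readings), n=2)))
for timestamp, average in results:
    print(timestamp, average)
```

Each stage only knows about the tuples flowing through it, which is what lets you swap the window or the output without touching the input.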

Bear in mind that these are just some of the elements you will find in the iep processor, and this is by no means a replacement for the documentation. This is just to give you a flavor of what it is...

Stream of Events

This is the feed of data sets that are being processed. The stream can be an input or an output.

Each data set is a tuple (an unordered composite data structure). Each tuple is assigned a timestamp, which is the time at which the data entered the stream. Who assigns this timestamp is application specific. The specific composition of the tuple is known as the Schema. You can see the schema as the list of columns in a SQL table.
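As a rough picture of a tuple, its schema and its timestamp, here is a small Python sketch. The `StreamTuple` class, the `make_tuple` helper and the `ACCESS_SCHEMA` column names are hypothetical and are not iep types. The caller may supply the timestamp, which mirrors the fact that who assigns it is application specific.

```python
import time
from dataclasses import dataclass

# Hypothetical schema: the list of "columns" every tuple of this stream must have.
ACCESS_SCHEMA = ("user", "resource")


@dataclass(frozen=True)
class StreamTuple:
    timestamp: float  # when the data entered the stream
    values: dict      # unordered mapping of column name to value


def make_tuple(schema, values, timestamp=None):
    """Build a stream tuple, rejecting values that do not match the schema."""
    if set(values) != set(schema):
        raise ValueError(f"expected columns {sorted(schema)}, got {sorted(values)}")
    if timestamp is None:
        # Assumption: if the producer gives no timestamp, stamp on arrival.
        timestamp = time.time()
    return StreamTuple(timestamp, dict(values))


event = make_tuple(ACCESS_SCHEMA, {"user": "Fred", "resource": "/home"}, timestamp=5.0)
print(event)
```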

Relations

For the sake of this discussion, this is the same as a relational table. "A relation is a collection of tables that have the same schema, and which are all indexed by time." RefB

Tables

A table will be a concrete repository for the result of a relation. "Tables are a finite collection of events that belong to the same schema." [...] "The Table operator provides a snapshot of the current state of a relation." RefB
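One way to picture the difference between a relation and a table is the following Python sketch. It is my own toy model, not the iep implementation: the `Relation` class and its method names are invented. The relation remembers every change with its timestamp, and `table_at` plays the role of the Table operator by returning the snapshot at a given time.

```python
class Relation:
    """Toy time-indexed relation: every change is recorded with its timestamp."""

    def __init__(self):
        self._changes = []  # list of (timestamp, "insert" or "delete", row)

    def insert(self, timestamp, row):
        self._changes.append((timestamp, "insert", row))

    def delete(self, timestamp, row):
        self._changes.append((timestamp, "delete", row))

    def table_at(self, timestamp):
        """Return the snapshot (a finite list of rows) as of `timestamp`."""
        rows = []
        # sorted() is stable, so changes with equal timestamps keep arrival order.
        for change_time, kind, row in sorted(self._changes, key=lambda c: c[0]):
            if change_time > timestamp:
                break
            if kind == "insert":
                rows.append(row)
            elif row in rows:
                rows.remove(row)
        return rows


sales = Relation()
sales.insert(1, ("Fred", "red car"))
sales.insert(2, ("Wilma", "blue car"))
sales.delete(3, ("Fred", "red car"))

print(sales.table_at(2))  # both rows are present at t=2
print(sales.table_at(3))  # Fred's row has been removed by t=3
```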

Transformations and Filtering

These are the operations that can be done on the stream. While filtering keeps the same schema in and out, a transformation will potentially produce a new schema.
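The schema difference between the two is easy to see in a small Python sketch. The sales tuples and the function names below are invented for the example: the filter drops tuples but leaves the columns alone, while the transformation emits tuples with different columns.

```python
def keep_red_cars(sales):
    """Filter: fewer tuples may come out, but each one keeps the input schema."""
    return [sale for sale in sales if sale["color"] == "red"]


def to_revenue(sales):
    """Transformation: output tuples have a new schema (model, revenue)."""
    return [
        {"model": sale["model"], "revenue": sale["price"] * sale["quantity"]}
        for sale in sales
    ]


sales = [
    {"model": "coupe", "color": "red", "price": 20000, "quantity": 2},
    {"model": "sedan", "color": "blue", "price": 15000, "quantity": 1},
]

red_sales = keep_red_cars(sales)
revenues = to_revenue(sales)

print(sorted(red_sales[0]))  # same column names as the input
print(sorted(revenues[0]))   # different column names from the input
```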

Sliding Windows

"...a window that at any point of time contains a historical snapshot of a finite portion of the stream." RefA page 9. So this is our two bridges in the river example. A sliding window sets the context of the CQL operation.
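Here is a minimal time-based sliding window in Python, to show how the "finite portion of the stream" moves along as tuples arrive. The `SlidingWindow` class is hypothetical, and it assumes tuples arrive in timestamp order, which a real engine cannot always count on.

```python
from collections import deque


class SlidingWindow:
    """Keep only the tuples whose timestamp is within `span` of the newest one."""

    def __init__(self, span):
        self.span = span
        self._items = deque()  # (timestamp, value), oldest first

    def add(self, timestamp, value):
        # Assumption: timestamps never go backwards.
        self._items.append((timestamp, value))
        self._evict(timestamp)

    def _evict(self, now):
        # Drop everything that has slid out of the window (now - span, now].
        while self._items and self._items[0][0] <= now - self.span:
            self._items.popleft()

    def snapshot(self):
        """The values currently visible through the window."""
        return [value for _, value in self._items]


window = SlidingWindow(span=60)
window.add(0, "a")
window.add(30, "b")
window.add(61, "c")       # "a" is now more than 60 seconds old and drops out
print(window.snapshot())
window.add(90, "d")       # "b" drops out in turn
print(window.snapshot())
```

Any query you run against `snapshot()` is automatically scoped to the window, which is the sense in which the window sets the context of the operation.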

IEP Processor, Development Process

  • Start with the result you want

Following a use case based approach, it only makes sense to build a system when you know where you wanna go, just like building mockup screens of a new system, to validate what it should look like and what functionality it should expose. In iep, knowing what you will do with the results (display it, forward it, store it,...) will help in defining its format.

  • Make sure you have access to the inputs you need

The systems to which you need to connect may not produce the data you need. You may need to ask another group to generate the stream you need, or you may need to ask for access to it.

  • Map out the steps to getting to your results

At this point, you know the ins and outs of your iep process. You have access to the data, and the result will make sense (and be useful) to someone. Time to establish the transformation of the instreams into the outstreams.

  • Don't be afraid to go out of the iep processor if you need to!

IEP is still in its infancy. Some transformations that you would need for your solution, like a Fourier transform for example, may not be available. You may need to provide a temporary stream to an internal processor, and listen to a stream coming out of this processor. Don't be afraid to integrate with other technologies (and raw coding), but remember that you may be skewing your timestamps by doing so.

Quick usage ideas and samples...

Pre-made iep processors? Yes, a virgin area for pattern development... Here are some problem domains that may gain from using CQL in their solutions. This is, by no stretch of the imagination, a complete list, or the only way to resolve such problems.

Comparing input streams, and emitting an event if the variance is too large.

Applicable to:

  • Load balancing monitoring
  • Broker monitoring (consumers/producers)
  • Business level concepts (production chains, call centers,...)
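For the load balancing case, the idea could look like the following Python sketch. The function name, the event layout, the node names and the `max_variance` threshold are all assumptions for illustration. It takes the request counts seen on each instance over the same window and emits an event only when the population variance across instances exceeds the threshold.

```python
from statistics import pvariance


def check_balance(counts_per_instance, max_variance=100.0):
    """Return an alert event if the counts differ too much, else None."""
    variance = pvariance(counts_per_instance.values())
    if variance <= max_variance:
        return None
    return {
        "type": "unbalanced-cluster",  # hypothetical event name
        "variance": variance,
        "counts": dict(counts_per_instance),
    }


balanced = {"node-a": 98, "node-b": 102, "node-c": 100}
unbalanced = {"node-a": 98, "node-b": 102, "node-c": 40}

print(check_balance(balanced))    # no event, the instances are close
print(check_balance(unbalanced))  # an event, node-c is far behind
```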

Flagging high flyers

Applicable to:

  • Quality control (out of norm elements)
  • Too many resources consumed (CPU, memory, etc.)
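A simple way to flag high flyers, sketched in Python under my own assumptions: take the samples in the current window and flag those sitting more than `k` standard deviations above the mean. The process names, the readings and the default `k=1.5` are made up, and other outlier rules would work just as well.

```python
from statistics import mean, pstdev


def high_flyers(samples, k=1.5):
    """Return the (name, value) samples more than k std deviations above the mean."""
    values = [value for _, value in samples]
    threshold = mean(values) + k * pstdev(values)
    return [(name, value) for name, value in samples if value > threshold]


# Hypothetical CPU usage (percent) per process over the current window.
cpu_usage = [
    ("proc-1", 10),
    ("proc-2", 12),
    ("proc-3", 11),
    ("proc-4", 9),
    ("proc-5", 95),
]

print(high_flyers(cpu_usage))
```

With very few samples a single outlier inflates the standard deviation a lot, so a large `k` may never trigger. That is why the default here is a modest 1.5.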

Time based trends

Applicable to:

  • Increase/Decrease in traffic monitoring (high/low)
  • Denial of service attack detection (tuple based detection)
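Trend detection can be sketched in Python by counting events per fixed time bucket and checking whether the most recent buckets keep growing. Both helper names, the 60 second bucket and the "last three buckets strictly increasing" rule are assumptions picked to keep the example short.

```python
from collections import Counter


def counts_per_bucket(timestamps, bucket_seconds):
    """Count events in consecutive fixed-size buckets, starting at time 0."""
    if not timestamps:
        return []
    buckets = Counter(int(t // bucket_seconds) for t in timestamps)
    # Fill in empty buckets with 0 so that gaps in traffic stay visible.
    return [buckets.get(i, 0) for i in range(max(buckets) + 1)]


def is_increasing(counts, last_n=3):
    """True if the last `last_n` bucket counts are strictly increasing."""
    recent = counts[-last_n:]
    return len(recent) == last_n and all(a < b for a, b in zip(recent, recent[1:]))


# Hypothetical request arrival times, in seconds.
arrivals = [5, 70, 80, 130, 140, 150, 190, 200, 210, 220]
per_minute = counts_per_bucket(arrivals, bucket_seconds=60)

print(per_minute)
print(is_increasing(per_minute))
```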

Telco Applications

Applicable to:

  • Notification to a cell phone (SIP Listener) of a home/corporate event.

Banking/Investing Applications

Applicable to:

  • Transaction Monitoring (on an account, a stock, etc...)

And of course, Business Activity Monitoring (BAM)

Applicable to:

  • Trend Detection
  • Sales Decomposition (last week, last day, last hour)
  • What is currently being built on the production floor
  • Calls in queue for a call center.
  • ...

Further Readings...

Here are some papers and tutorials I found interesting. Take a look at these to familiarize yourself with this field and way of thinking.

Linear Road: A Stream Data Management Benchmark.-Arvind Arasu .- Proceedings of the 30th VLDB Conference .- Toronto, Canada, 2004

IEP SE Page (Workshop, How to, ...)

IEP Quick Start Tutorial

Hands on lab by Sang Shin. A slight note on this one: you can use SOAP UI to test your deployment. So if you wish, you do not need to use the source code to test the service, as recommended by the exercise. Just use SOAP UI, and it will be fine. If you play around with this, you'll notice that the WS request doesn't get any response, even when there is an error in the schema. So you'll have to keep an eye on the Glassfish log file for monitoring. (This is with the Nov 2007 version of Open ESB.)

Open esb, developer's guide to iep

Next Steps...

For me, the next step is to start implementing some small sample of the iep (see the fishing expedition), and see how I can integrate this into the XACML SE. There are many ways to provide for this integration, and this will serve as a method of exploring these methods.

References

RefA. The CQL Continuous Query Language: Semantic Foundations and Query Execution .- Arvind Arasu and Shivnath Babu and Jennifer Widom .- Stanford University

RefB. Open esb, developer's guide to iep
