Thursday Aug 14, 2008

I have two excuses for having fallen somewhat silent over the past month or so, neither of which invokes the myth of the Medusa. The first is that, this being the middle of the season as far as motorsports is concerned, I've been spending far too much time in the service of spending less of it on a per lap basis. But more to the point, I have failed to be more energetic on these pages because most of my fuel budget has been invested in the first of a series of rewrites of CWS (Caroline Web Server). While outwardly the changes perhaps appear somewhat minimalist, the innards, somewhat akin to cresting the uphill at Lime Rock, have been significantly displaced.

As pC begins to take shape as a commercially viable platform, most of the team is rethinking much of what we've learned and accomplished over the past two years. We are aware that this might be our last opportunity to get it right, for once we go commercial, much of pC's interfaces and structure will be far less malleable, for better or worse. One of the key issues that has been wonking us for a while now is how, where and when to make both the platform and DE layers less Java-centric. Were this 1979, our approach would likely have been to supply bindings for whatever languages we could think of. If this were the fall of 1996, perhaps we might have been seduced by the siren's song of IDLs and CORBA. But in today's world, it seems far more reasonable to navigate these waters by opting for the simplest, most neutral representation one might find, and matey, that has to be JSON.

While the platform might be going in a slightly different direction (more on that as it develops), in the case of CWS we chose to obliterate the current Jeri/RMI implementation in favor of the more RESTful landscape afforded by JSON over HTTPS. While the user's view of CWS is of a single package, in reality it serves two disjoint functions: it uploads, installs and initially configures the on-grid payload, and it provides the command interface for controlling the farm. The installer/agentserver launcher portion of CWS is accomplished purely through WebDAV and platform API communications, so it remained largely intact. But on the control side, everything has changed, and hopefully for the better. How many non-Java environments will wish to control Java servlet deployments is certainly questionable, but just to plant a little seed: the way CWS is being reorganized going forward, it hopefully wouldn't take a whole lot of effort to run something other than Glassfish as the web server instance. Besides, Glassfish isn't just for Java anymore, what with its Ruby and PHP support.

Being a devotee of simplicity, largely due to my somewhat limited mental capacity, once I wrapped my head around the notion of a document driven API, I found working with JSON to be quite enjoyable. In the end, what had been a dozen or more Java methods boiled down to three main URLs and documents.

 URL       CONTENT TYPE                            HTTP METHOD
 /control  application/json                        GET, PUT, DELETE
 /info     application/json                        GET
 /deploy   multipart/form-data, application/json   POST, DELETE


The control document instructs the CWS agent server to update the current state of the server farm per the contents of the following JSON document:

   {"serverCount" : 0 | 1-32 | "0", "1-32" | "auto",
    "minServers" : 1-32,
    "maxServers" : 1-32      
   }

This simple document handles all the control features for the farm. The min and max server fields are only used if the serverCount attribute is set to "auto", in which case the CWS agentserver will monitor load and flex within the min/max server count specified. Otherwise, set the serverCount to non-zero and the farm runs with the specified number of servers; set it to zero and all the server instances vanish. Cool, huh? Commands can be issued to the agent server at the appropriate external IP address or DNS host binding at port 9464, which gives any on or off grid non-Java application the ability to flex the farm through simple documents sent over HTTPS. A GET retrieves the last command issued to the farm, while a PUT will set the farm state to the requested configuration. A DELETE will destroy the farm entirely.
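For anyone playing along from outside of Java, here is a minimal Python sketch of building a control document and preparing the PUT. The port and field names come from the description above; the host name farm.example.com, the helper names, and the choice to leave out minServers/maxServers when not flexing automatically are my own assumptions, not part of CWS.

```python
import json
import urllib.request

CWS_PORT = 9464  # port named in the post


def control_document(server_count, min_servers=None, max_servers=None):
    """Build the /control document as a dict.

    server_count is 0 (all instances vanish), 1-32 (fixed size) or "auto".
    min_servers/max_servers only matter when server_count is "auto".
    """
    if server_count == "auto":
        if min_servers is None or max_servers is None:
            raise ValueError("auto flexing needs minServers and maxServers")
        # Assumption: the minimum may not exceed the maximum.
        if not 1 <= min_servers <= max_servers <= 32:
            raise ValueError("need 1 <= minServers <= maxServers <= 32")
        return {"serverCount": "auto",
                "minServers": min_servers,
                "maxServers": max_servers}
    count = int(server_count)  # the document accepts 3 as well as "3"
    if not 0 <= count <= 32:
        raise ValueError("serverCount must be 0-32 or 'auto'")
    return {"serverCount": count}


def control_request(host, document):
    """Prepare (but do not send) a PUT of the document to /control."""
    return urllib.request.Request(
        url=f"https://{host}:{CWS_PORT}/control",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the prepared request to urllib.request.urlopen would actually send it; something like control_request("farm.example.com", control_document("auto", 1, 16)) asks for automatic flexing between one and sixteen servers.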

The info document presents a view of the last captured state of the farm. The contents of the JSON document appear as follows:
   {"port":80,
    "monitoringInfo":{"serverCount":"2",
                      "upTime":"0",
                      "suggestedServerCount":"0",
                      "intensity":"0",
                      "averageRequestTime":"0.0",
                      "arrivalRate":"0.0",
                      "serviceRate":"0.0",
                      "projectedIntensity":"0",
                      "queuingProbability":"0"},
    "serverCount":3,
    "autoFlexing":true,
    "state":"Running",
    "maxServers":16,
    "minServers":1,
    "networkName":"cwsfarmNetwork",
    "requestedCount":1,
    "extAddrName":"cwsfarmExtIpAddr",
    "farmName":"cwsfarm",
    "version":"0.3.0"
   }
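Note that the values inside monitoringInfo arrive as strings. A client that wants to do arithmetic on them has to convert first; the following is a small sketch of one way to do that in Python. The function names are hypothetical, and treating every string in monitoringInfo as a number is an assumption drawn only from the sample above.

```python
import json


def _to_number(value):
    """Convert "2" to 2 and "0.0" to 0.0."""
    try:
        return int(value)
    except ValueError:
        return float(value)


def parse_info(text):
    """Parse an /info document, turning monitoringInfo strings into numbers."""
    info = json.loads(text)
    monitoring = info.get("monitoringInfo", {})
    for name, value in monitoring.items():
        if isinstance(value, str):
            monitoring[name] = _to_number(value)
    return info
```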

The deploy document is used to deploy, undeploy and list applications associated with the server farm. Applications are deployed by providing a URL which contains the target name of the war to be deployed. The contents of the war are transmitted as multipart form data. To deploy or replace an application, a POST is issued to /deploy/my.war where my.war is the application to be installed. The contents of the form data are unwound onto the grid and then submitted to the farm for deployment. To undeploy an application, a DELETE is issued to /deploy/mycontextpath where mycontextpath is that of the application to be undeployed. A GET of /deploy returns a list of the deployed war files (in the next release this will be changed to reflect the deployed context paths).
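As a rough illustration, the deploy and undeploy requests might be prepared from Python as below. The URL shapes come from the paragraph above; the form field name "file", the octet-stream part type and the helper names are assumptions on my part, since the exact form layout isn't spelled out here.

```python
import urllib.request
import uuid


def deploy_request(base_url, war_name, war_bytes, field_name="file"):
    """Prepare a POST to /deploy/<war_name> carrying the war as multipart/form-data."""
    boundary = uuid.uuid4().hex
    # One form part holding the war; field_name is a guess, not a CWS constant.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{war_name}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/deploy/{war_name}",
        data=head + war_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def undeploy_request(base_url, context_path):
    """Prepare a DELETE to /deploy/<context_path>."""
    return urllib.request.Request(
        url=f"{base_url}/deploy/{context_path.lstrip('/')}",
        method="DELETE",
    )
```

Neither helper touches the network; handing the result to urllib.request.urlopen is what would send it.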

One of the cool details of the implementation is that the agentserver uses the Java internal lightweight HTTP server, which hopefully means little appreciable bloat to the memory footprint. In fact, document retrieval via a browser is very, very fast, so performance looks on par with, if not better than, the previous methodology. On grid the documents are unencrypted, as we employ the platform's L7 Virtual Service to handle the encryption/decryption for off grid clients, so requests from on grid within the same network can be done in plain text. So while the Java API is still in there and available for those for whom Java remains their first love, we've now exposed a mechanism by which others can play too. Not a golden fleece perhaps, but a pleasant voyage nonetheless.

Tuesday Jun 10, 2008

Spent quite a full day today hanging around the Boston waterfront, generally in the shadows, trying to get a sense of just what vision the term Enterprise 2.0 was attempting to conjure. Certainly, there were quotable quotes aplenty from the luminaries to draw upon. But of course, these days anything that even remotely resembles a movement requires such mantras and they're always open to wide interpretation. So by mid morning it hadn't come as a shock that Beelzebub had taken up his usual residence in the particulars.

In some ways and at some levels, I'm a disinterested observer. Even on the best of days, I really can't pretend to get excited about wikified-socially-networked CRMs. Apparently, this is not as true for hard core devotees still fully immersed in bluish versions of the enterprise software world. These acolytes seem intent on lighting candles at the altar of subsuming all the latest internet kit and simply deploying it atop or alongside everything they've come to know and love over the past few decades. Bring in the new Gods to stand with the old. I wish them well. But I'm supremely skeptical that they'll succeed in boiling this ocean simply by replacing or augmenting the current set of tools with a healthy dose of cool, contemporary, collective collaboration.

Maybe I'm missing something, got it all wrong or am just of too suspicious a nature, but I couldn't shake the feeling this particular brand of Nirvana was issuing forth from big traditional software houses who were simply salivating over the prospect that finally there was a new way to sell spreadsheets and databases. Regardless, the faction embracing collaboration as the primary benefit of harnessing Web 2.0 concepts seems to ignore some of its primary tenets, namely the notions of rapid innovation, creation, synthesis and ease of connectivity. I heard a lot of talk around simplifying UIs, but nothing about the same aesthetics being applied to hardware and software architecture, deployment, scale, or any of the other characteristics that must be present for such flexibility. Maybe that will be in tomorrow's talks. In the meantime, it strikes me that public web-based SaaS inherits an extremely important property of the web itself, namely Darwinism. Create a meme that no one cares about and it invariably goes the way of the trilobites, only far more rapidly. Will enterprise SaaS, if not similarly conceived and deployed, retain such properties when subsumed internally? Surely, the ability to rapidly create, cleanse, reshape and replace software is a far more fundamental and compelling notion than embracing blogs and wikis for managing professional interactions. Unfortunately, for the technically inclined, it may very well be that in the back room of the enterprise no one can hear you scream.

Yet while aspects of what was presented seemed like simply a new riff on an old theme, judging by the murmurs of the attendees, their view is still a little cloudy. Perhaps this is because while many have a fairly good idea of what they think they want, they have yet to figure out just how to pull the whole deal off. As a result, despite the fact that only a few of the multitude had actually heard of pC, I'm encouraged. Maybe they are sensing what we think we already know.  Truly new paradigms demand truly new platforms.

Wednesday May 28, 2008

Perhaps the most interesting and potentially controversial feature of the current Caroline Web Server (CWS) implementation is its ability to monitor incoming request load and automatically allocate or deallocate server instances in response to it. Auto-flexing is an optional feature which can be enabled either through the CWS command line interface or, when CWS is bound as a component of a larger service, through the CWS API. In this article, I will briefly present an overview of this feature and how it is currently implemented.

A Bit of Background
Though CWS architecture is the subject of a future article, it's worth mentioning up front that the primary on-grid components of CWS are the agent server (AS) and the web server instances themselves. The agent server is a pure Project Caroline (pC) artifact, while the web servers are GFv3 web tier code wrapped by a pC aware launcher. While there can be upwards of thirty-two web server instances, there is but one agent server. The agent server acts as a control surface through which deployers can manage the lifecycle of both the servers and their applications. Beyond this, the agent server also acts as a data collection point for statistics related to how well the individual servers are performing. To do so, the AS maintains back channel communications with each web server.

The Knobs We Use to Automatically Flex
We all tend to be prisoners of our past experiences and CWS, particularly in the case of auto-flexing, is no exception. Having co-authored the flexing code for the Sun Grid's ComputeServer Project, I didn't find it completely insane, at least initially, to adopt a similar strategy and utilize the principles underlying the work of A. K. Erlang to provide both the basis for monitoring information and the foundation for self-directed auto-flexing. The algorithm employed requires that we understand two important characteristics of the system. We need to be able to obtain the number of requests for service that are made against the system, and we need to be able to characterize the rate at which these requests are completed. Ideally, we'd like to impose a requirement on the Caroline API to provide the raw measurements at the hardware switch on a per virtual service basis, but alas, our currently supported hardware affords us no such monitoring notion. This leaves us with the remaining option, which is to instrument the web servers themselves and periodically gather up the numbers for publication.

Fortunately, GFv3 incorporates Grizzly as its connector technology. The Grizzly API makes available a number of interesting statistics for our use. Currently, we need to fabricate numbers from what it provides. To count the received requests, when we take a snapshot of the numbers we tally all the processed requests (both successful and failed) and the current depth of the queue, which yields the total number of requests made upon a given server for the time period in question. Determining the average amount of time taken to process a given request is a bit more complicated. As Grizzly is multi-threaded but only returns the cumulative amount of time spent processing threads, it is necessary for us to query the server periodically to determine the average number of threads in use during the polling period. When the AS requests statistics from an individual server, the number returned reflects this average. As this is a somewhat imprecise number, one notable difficulty here is that it is possible to exceed real time: number_of_requests_processed * average_processing_time really shouldn't exceed the polling interval, although in practice this can be seen on occasion.
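To make the bookkeeping concrete, here is a small Python sketch of the tally and of the real-time sanity check just described. The Snapshot fields are hypothetical stand-ins for whatever a server reports per polling interval; they are not Grizzly API names, and the sketch does not attempt the thread averaging itself.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical per-server counters read once per polling interval."""
    succeeded: int               # requests completed successfully
    failed: int                  # requests that failed
    queue_depth: int             # requests still waiting at snapshot time
    avg_processing_time: float   # seconds per request, as reported by the server


def requests_received(snap):
    """Processed requests (successful and failed) plus the current queue depth."""
    return snap.succeeded + snap.failed + snap.queue_depth


def busy_time(snap):
    """Estimated seconds spent processing: requests processed * average time."""
    return (snap.succeeded + snap.failed) * snap.avg_processing_time


def exceeds_real_time(snap, polling_interval):
    """True when the estimate claims more work than the interval could hold."""
    return busy_time(snap) > polling_interval
```

A monitor could use exceeds_real_time to flag or clamp a suspicious sample before it is averaged in; whether CWS does anything of the sort is not something I'm claiming here.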

Given that the demand level for service is rather unpredictable, at least from the CWS vantage point, the AS employs some simple, perhaps ill conceived, modifications of Erlang. Rather than accumulate load averages over an indefinite time period, at the moment the AS maintains a cache of the ten most recent samples. These samples are taken with a regular periodicity, currently once every ten seconds, and the results are then averaged. Currently, the algorithm does not weight the most recent sample over its predecessors, although it might in future. A further modification is that we do not wait to flex until the algorithm predicts that requests for service are spending time enqueued; rather, we flex when the Erlang intensity level reaches 0.90 (as opposed to 1.0). Observationally, testing seems to indicate that while the servers do ultimately respond to load, there is a tendency to lag actual demand (most likely due to the unweighted nature of the sample cache) and thus, to some extent, throttle it. For spikes in usage this seems somewhat less than desirable, but on the other hand, the system does seem to respond without much tendency to hunt while under constant load.
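A rough Python sketch of that scheme follows: a ten sample window, an unweighted average, and a flex threshold of 0.90. The post doesn't give the exact formula CWS uses, so "intensity" here is assumed to be the textbook offered load per server (arrival rate times average request time, divided by the server count), and the class and method names are made up for illustration.

```python
import math
from collections import deque

WINDOW = 10        # most recent samples kept (from the post)
THRESHOLD = 0.90   # flex before intensity reaches 1.0 (from the post)


class FlexMonitor:
    def __init__(self, min_servers, max_servers):
        self.min_servers = min_servers
        self.max_servers = max_servers
        self.samples = deque(maxlen=WINDOW)  # old samples fall off automatically

    def add_sample(self, arrival_rate, avg_request_time):
        """Record one sample: requests/second and seconds/request."""
        self.samples.append((arrival_rate, avg_request_time))

    def offered_load(self):
        """Unweighted mean arrival rate times mean request time, in Erlangs."""
        if not self.samples:
            return 0.0
        n = len(self.samples)
        mean_rate = sum(s[0] for s in self.samples) / n
        mean_time = sum(s[1] for s in self.samples) / n
        return mean_rate * mean_time

    def intensity(self, servers):
        """Assumed definition: load per server; 1.0 means every server is busy."""
        return self.offered_load() / servers

    def suggested_servers(self):
        """Smallest count keeping intensity at or below the threshold, clamped."""
        needed = math.ceil(self.offered_load() / THRESHOLD)
        return max(self.min_servers, min(self.max_servers, needed))
```

Because every sample in the window counts equally, a sudden spike only moves the average by a tenth per sample, which is one way to picture the lag described above.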

But Wait, There's More
In the case of a two tiered application constructed as a servlet tied to a pC Database resource, CWS auto-flexing may very well provide all the dynamic allocation necessary to meet user demand. But undoubtedly, there are other more complex architectures which might employ multiple horizontally scaled components. In these instances, CWS directed auto-flexing might not be as appropriate, as the actual performance bottlenecks might reside elsewhere. In these cases, what can be done?

We believe that the answer lies in coupling the numbers that CWS produces with those of other components making up the service. Our postulate is that CWS will be called upon to provide the visibility necessary such that an outside policy engine could make well reasoned decisions about where and when to allocate more resources. To that end, CWS provides both an API for specifying the running number of servers to deploy as well as an interface for grabbing the numbers that it employs to make decisions around auto-flexing. For complex services, a dedicated policy engine, fully aware of the application's architecture, can make use of this data to dynamically provision not only the web tier, but other aspects of the application. We expect to test this assertion as more pC components become available.
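Purely as a thought experiment, the core of such a policy engine could be as small as the Python function below. Everything in it is an assumption for illustration: the dictionary keys (serverCount, minServers, maxServers, monitoringInfo, suggestedServerCount) stand in for whatever numbers CWS exposes, the backend_saturated flag is a made-up signal from some other tier, and a suggestion of zero is read as "no suggestion".

```python
def decide_web_tier(info, backend_saturated):
    """Return a server count request for the web tier, or None to leave it alone.

    info is a dict of CWS numbers; backend_saturated is a hypothetical
    signal from another component (for example the database tier).
    """
    current = int(info["serverCount"])
    suggested = int(info["monitoringInfo"]["suggestedServerCount"])
    if suggested == 0:  # assumption: zero means CWS has no suggestion yet
        return None
    # Stay within the bounds the farm itself reports.
    target = max(int(info["minServers"]), min(int(info["maxServers"]), suggested))
    if target == current:
        return None
    # Adding web servers will not help if the bottleneck is behind them.
    if backend_saturated and target > current:
        return None
    return {"serverCount": target}
```

The point of the sketch is only the shape of the decision: the web tier's own numbers are one input among several, and the engine is free to decline a suggestion when the rest of the service says more web servers would not help.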

Conclusions
One of the more important parts of the pC value proposition is trying to eliminate the need for human interaction in the day to day running of service deployments. pC's API has allowed for the creation of a web tier that is capable of managing resources in response to actual demand. While still in its infancy, CWS's ability to automatically respond to load hints at interesting new ways to enhance efficiency and independence of action for software as a service solutions.



Tuesday May 20, 2008

We of Project Caroline (pC) have been pretty busy lately given our recent initial roll out of the Public Evaluation (PE) grid. While there's little to suggest that the pace of our development will be slowing anytime soon, personally, I've felt the need to begin to scratch an itch that's been bothering me for a while. I'm hoping these pages will provide the right vehicle for doing so. Given the explosion of interest in Cloud computing, scrawling a few random words about pC and the stuff I happen to be working on seems the best way to find relief.

By way of introduction, my current responsibilities on the project revolve around web tier issues. Specifically, I've been charged with the design and production of the Caroline Web Server (CWS). CWS is actually somewhat of a misnomer, as the CWS service allows for the deployment of a farm of horizontally scaled web servers. Currently the business portion of CWS utilizes bits and pieces of the new Glassfish v3 to act as the container. CWS surrounds this engine with the appropriate logic to control both the servers' lifecycle as well as application deployment into the farm. So to first order, a lot of what will wind up on these pages will hopefully be of interest to those having dealings with pC from a front end perspective.

But primarily, my intent here is to blog mostly on random anecdotal experiences with the pC Platform and its API. Like our development partners, I too am a consumer of the platform and my hope is that we can start a dialogue on a variety of topics. The few that I currently have in my head for upcoming articles are the subtleties of using the pC Filesystem resources, metadata usage and conventions, CWS features and improvements, as well as an article or two on how CWS and the unzip utility were put together.

Whether all this will be purely cathartic or perhaps serve a higher purpose, only time and the availability of it will tell. In either case, we'll get started in earnest right after the upcoming release of CWS 0.3.0 is completed in the next couple of days. In the meantime, I bid you peace.

This blog copyright 2008 by Ron Mann

