Carrying the Load: Enabling the CWS Auto-flexing Feature.
A Bit of Background
Though CWS architecture is the subject of a future article, it's worth mentioning up front that the primary on-grid components of CWS are the agent server (AS) and the web server instances themselves. The agent server is a pure Project Caroline (pC) artifact, while the web servers are GFv3 web tier code wrapped by a pC-aware launcher. While there can be upwards of thirty-two web server instances, there is but one agent server. The agent server acts as a control surface through which deployers can manage the lifecycle of both the servers and their applications. Beyond this, the agent server also acts as a data collection point for statistics on how well the individual servers are performing. To do so, the AS maintains back-channel communications with each web server.
The Knobs We Use to Automatically Flex
We all tend to be prisoners of our past experiences and CWS, particularly in the case of auto-flexing, is no exception. Having co-authored the flexing code for the Sun Grid's ComputeServer Project, I didn't find it completely insane, at least initially, to adopt a similar strategy and use the principles underlying the work of A. K. Erlang to provide both the basis for monitoring information and the foundation for self-directed auto-flexing. The algorithm employed requires that we understand two important characteristics of the system. We need to be able to obtain the number of requests for service that are made against the system, and we need to be able to characterize the rate at which these requests are completed. Ideally, we'd like to impose a requirement on the Caroline API to provide the raw measurements at the hardware switch on a per-virtual-service basis, but alas, our currently supported hardware affords us no such monitoring facility. This leaves us with the remaining option, which is to instrument the web servers themselves and periodically gather up the numbers for publication.
Fortunately, GFv3 incorporates Grizzly as its connector technology. The Grizzly API makes available a number of interesting statistics for our use. Currently, we need to fabricate our numbers from what it provides. To count the received requests, when we take a snapshot of the numbers, we tally all the processed requests (both successful and failed) and the current depth of the queue, which yields the total number of requests made upon a given server for the time period in question. Determining the average amount of time taken to process a given request is a bit more complicated. As Grizzly is multi-threaded, but only returns the cumulative amount of time spent processing across its threads, it is necessary for us to query the server periodically to determine the average number of threads in use during the polling period. When the AS requests statistics from an individual server, the number returned reflects this average. As this is a somewhat imprecise number, one notable difficulty here is that it is possible to exceed real time; that is, number_of_requests_processed * average_processing_time really shouldn't exceed the polling interval, although in practice it occasionally does.
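The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration in Python, not the CWS or Grizzly code: the Snapshot fields and function names are hypothetical, and dividing the cumulative processing time by the average thread count is our assumption about how the per-request average might be derived.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One polling-period snapshot for a single web server (hypothetical fields)."""
    succeeded: int                     # requests processed successfully
    failed: int                        # requests processed but failed
    queue_depth: int                   # requests still waiting in the queue
    cumulative_processing_secs: float  # processing time summed across threads
    avg_threads_in_use: float          # average busy threads over the period


def requests_received(s: Snapshot) -> int:
    """Total requests made upon the server: processed plus still enqueued."""
    return s.succeeded + s.failed + s.queue_depth


def average_processing_time(s: Snapshot) -> float:
    """Approximate wall-clock seconds per processed request.

    Assumption: cumulative thread time is scaled down by the average number
    of threads in use, then divided by the number of processed requests.
    """
    processed = s.succeeded + s.failed
    if processed == 0 or s.avg_threads_in_use <= 0:
        return 0.0
    return s.cumulative_processing_secs / s.avg_threads_in_use / processed


def exceeds_real_time(s: Snapshot, polling_interval_secs: float) -> bool:
    """Flag the anomaly where processed * average time exceeds the interval."""
    processed = s.succeeded + s.failed
    return processed * average_processing_time(s) > polling_interval_secs
```

For a snapshot with 95 successes, 5 failures, a queue depth of 20, 16 seconds of cumulative thread time, and an average of 4 threads in use, requests_received reports 120 requests and exceeds_real_time is false for a ten-second polling interval. If the thread average is underestimated, the same check can trip, which is the imprecision noted above.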
Given that the demand level for service is rather unpredictable, at least from the CWS vantage point, the AS employs some simple, perhaps ill-conceived, modifications of Erlang. Rather than accumulate load averages over an indefinite time period, at the moment the AS maintains a cache of the ten most recent samples. These samples are taken at a regular interval, currently once every ten seconds, and the results are then averaged. Currently, the algorithm does not weight the most recent sample over its predecessors, although it might in future. A further modification is that we do not wait to flex until the algorithm predicts that requests for service are spending time enqueued; rather, we flex when the Erlang intensity level reaches 0.90 (as opposed to 1.0). Testing seems to indicate that while the servers do ultimately respond to load, there is a tendency to lag actual demand (most likely due to the unweighted nature of the sample cache) and thus, to some extent, to throttle it. For spikes in usage this seems somewhat less than desirable, but on the other hand, the system does seem to respond without much tendency to hunt while under constant load.
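A rough sketch of the sample cache and the flex threshold might look like the following. The class and method names are hypothetical, and the intensity formula (mean offered load in Erlangs divided by the number of running servers) is our assumption about how the per-server intensity could be computed; the window of ten samples, the ten-second interval, and the 0.90 threshold come from the description above.

```python
from collections import deque


class FlexMonitor:
    """Unweighted sliding window of load samples (illustrative only)."""

    def __init__(self, window=10, interval_secs=10.0, threshold=0.90):
        self.samples = deque(maxlen=window)  # oldest sample drops off automatically
        self.interval_secs = interval_secs
        self.threshold = threshold

    def add_sample(self, requests, avg_processing_secs):
        """Record one polling period: request count and mean processing time."""
        self.samples.append((requests, avg_processing_secs))

    def intensity(self, servers):
        """Mean offered load per server over the cached samples.

        Each sample's load in Erlangs is (requests / interval) * service time.
        All samples carry equal weight, as in the current algorithm.
        """
        if not self.samples or servers < 1:
            return 0.0
        loads = [r * t / self.interval_secs for r, t in self.samples]
        return sum(loads) / len(loads) / servers

    def should_flex_up(self, servers):
        """Flex before queueing is predicted: at 0.90 rather than 1.0."""
        return self.intensity(servers) >= self.threshold
```

With ten samples of 400 requests at 0.05 seconds each, the offered load is about 2 Erlangs, so two servers sit at an intensity of roughly 1.0 and should_flex_up returns true, while three servers do not trigger a flex. Feeding a single such sample after nine idle ones yields an intensity of only about 0.1 on two servers, which illustrates the lag behind sudden spikes.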
But Wait, There's More
In the case of a two-tiered application constructed as a servlet tied to a pC Database resource, CWS auto-flexing may very well provide all the dynamic allocation necessary to meet user demand. But undoubtedly, there are other, more complex architectures which might employ multiple horizontally scaled components. In these instances, CWS-directed auto-flexing might not be as appropriate, as the actual performance bottlenecks might reside elsewhere. In these cases, what can be done?
We believe that the answer lies in coupling the numbers that CWS produces with those of other components making up the service. Our postulate is that CWS will be called upon to provide the visibility necessary for an outside policy engine to make well-reasoned decisions about where and when to allocate more resources. To that end, CWS provides both an API for specifying the running number of servers to deploy and an interface for grabbing the numbers that it employs to make decisions about auto-flexing. For complex services, a dedicated policy engine, fully aware of the application's architecture, can make use of this data to dynamically provision not only the web tier, but other aspects of the application. We expect to test this assertion as more pC components become available.
Conclusions
One of the more important parts of the pC value proposition is trying to eliminate the need for human interaction in the day-to-day running of service deployments. pC's API has allowed for the creation of a web tier that is capable of managing resources in response to actual demand. While still in its infancy, CWS's ability to automatically respond to load hints at interesting new ways to enhance efficiency and independence of action for software-as-a-service solutions.