This is a simple example review using the project review and programme audit framework that I wrote about in the preceding article.
In reality any review would be much more complex: many more facts would be gathered, many of them would be interrelated, and those relationships would need to be analysed before any result could be defined. However, I wanted to at least demonstrate how the framework 'works'.
The example I've chosen is hopefully one that is reasonably familiar to many of my readers: n-tier (typically 3-tier) web architecture. For those of you unfamiliar with this type of technology stack, let me reassure you that the framework is still valid and that the example is simple enough to demonstrate how the sections relate to produce the overall review.
Imagine, if you will, a large enterprise that currently has some problems with its online systems...
- Problem(s)
- Our Web Site won't scale to meet demand
- Fact(s)
- every so often web servers stop processing requests
- web servers keep going into a wait state
- the web servers are dependent upon the application servers
- application servers occasionally stop when responding to requests from the web servers
- the application servers 'pause' because they are swapping connections and session state between DB instances
- the application servers are dependent on the DB servers
- DB servers are randomly crashing
- when a DB server crashes the DB service fails over to another DB server (in an active / passive pair)
- the DB servers were due to be patched
- the DB servers have not yet been patched
- Result(s)
- Because the DB servers have not received the patch that fixes a known problem, they have a tendency to run out of resources, which causes the DB service to fail over to the other server in its active / passive pair. The application servers, which manage session state, then attempt to switch to the new DB server and pause whilst this is going on. The web servers are affected in turn, because they are waiting to connect to application servers that are busy negotiating the new DB server connection.
- Conclusion(s)
- unless the web site improves its stability, it will be unable to scale, or to be scaled further. It would therefore continue to fail to meet demand and the organisation's plans for online growth and uptake, potentially damaging the company's brand and online reputation
- Recommendation(s)
- patch the DB servers
- perform regression, user acceptance (UAT), non-functional, high-availability (HA), backup and recovery, and performance testing as appropriate
- investigate implementing systems monitoring tools and technology that can be connected at every layer in an n-tier system, so as to identify possible root causes more easily and quickly
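The way the facts in this example chain together (web servers depend on application servers, which depend on DB servers) can be sketched in a few lines of Python. This is only an illustration of the reasoning, not part of the framework itself: the tier names, the `DEPENDS_ON` and `SYMPTOMS` dictionaries and the `trace_root_causes` function are all hypothetical names I have made up, and the symptoms are simply paraphrased from the facts listed above. The idea is that a tier showing symptoms is a root-cause candidate only when none of the tiers it depends on is showing symptoms too.

```python
from collections import deque

# Hypothetical model of the example: each tier maps to the tiers it depends on.
DEPENDS_ON = {
    "web": ["app"],
    "app": ["db"],
    "db": [],
}

# Symptoms paraphrased from the Fact(s) list, keyed by tier (illustrative only).
SYMPTOMS = {
    "web": "stops processing requests and goes into a wait state",
    "app": "pauses while swapping connections and session state",
    "db": "crashes and fails over; patch not yet applied",
}


def trace_root_causes(start, depends_on, symptoms):
    """Follow symptomatic dependencies from `start`.

    Returns (visited, roots): the tiers examined in order, and the
    symptomatic tiers that have no symptomatic dependency beneath them.
    """
    visited = []
    roots = []
    seen = {start}
    queue = deque([start])
    while queue:
        tier = queue.popleft()
        visited.append(tier)
        # Only dependencies that are themselves showing symptoms are followed.
        unhealthy_deps = [d for d in depends_on.get(tier, []) if d in symptoms]
        if tier in symptoms and not unhealthy_deps:
            roots.append(tier)
        for dep in unhealthy_deps:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return visited, roots


visited, roots = trace_root_causes("web", DEPENDS_ON, SYMPTOMS)
for tier in visited:
    print(f"{tier}: {SYMPTOMS[tier]}")
print("root-cause candidates:", roots)
```

Starting from the web tier, the walk passes through the application tier and ends at the DB tier, which matches the result reached in the review. A real review would have far more facts and a much messier graph, so treat this as a thinking aid rather than a diagnostic tool.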
What I hope to have demonstrated is the viability of the framework, how interdependent the data and facts gathered can be, and that those data and facts hold the key to resolving the issues at hand. Hopefully this is an adequate example to draw out those themes, and I wish you all the best of luck in the reviews and audits you go on to perform. Let me know if you use the framework I describe and how you get on with implementing it.