Friday Oct 19, 2007

Sizing the Sun Identity Manager Repository is once again one of those staple challenges in Sun Identity Manager. I have not seen a good answer for how to do this. So once again, read on if you are interested in understanding how it is done.

Where to Start?

The Sun Identity Manager Application is quite complicated and has literally thousands of touch points. Some of those touch points are more important than others, but all in all quite a challenge to get your head around. The Repository is no exception!

It seems that, for whatever reason, I am always asked to size the database before any real analysis or design has been done. For all intents and purposes this is utterly ridiculous, although I grant that if you are in the market for an enterprise provisioning system it is nice to get a total budget picture. Still, enterprise scheduling and budgeting should not take the place of common sense. Please take the time to size your Identity Manager Repository correctly; this makes ALL the difference in the world when it comes to performance!

This article will assist in the initial sizing for the Sun Identity Manager Repository.

How data is stored

Before I get into the weeds on how to size the repository, I thought it might be beneficial to explain at a very high level how the Sun Identity Manager Repository works. This is not so much about how the database works as about how we store data and why.

The Sun Identity Manager Repository is made up of several tables; the relevant tables are defined below:

a.       object – The object table stores most configuration objects of the Sun Identity Manager application. Configuration objects consist of basically any object in the application. For example, SystemConfiguration, TaskDefinition, WorkItem, Form, Resource, etc. However, the object table does NOT store any User objects or TaskInstances aka stateful workflows.

b.      task – The task table stores the non-static instances of Workflows, also known as TaskInstances and WorkItems.

c.       org – The org table stores all Sun Identity Manager organizational information. This is NOT any org information for managed systems.

d.      userobj – The userobj table stores all the Sun Identity Manager enterprise identities. These identities are simply containers for accounts on managed systems.

e.       account – The account table holds the accounts identified on a managed system. Note: if you have heard the term “building the account index”, this is the table being referred to. Each managed system will have a set number of accounts in this table after reconciliation and/or account linking.

f.       log – The log table stores all audit events and log type information.

If you were to closely examine the database scripts used to create the tables above you would notice more tables than what I have defined. I have intentionally left those tables out of this article; the relevant tables are listed.

A key concept to understand in the Sun Identity Manager Repository is that all data is stored in two ways. Each table has indexed and keyed columns used to query objects, and each table has an XML column used to store the entire ASCII representation of the object (depending on the database engine this is typically a BLOB or MEDIUMTEXT data type). This is because all Sun Identity Manager objects are serialized from Java objects to ASCII XML for storage in the Repository.

The application, at a high level, queries by the indexed columns, pulls back the XML ASCII text and then deserializes the XML into Java objects. These objects are usually made available through the use of views (UserView, PasswordView, etc).
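To make the "indexed columns plus XML column" idea concrete, here is a minimal sketch in Python using an in-memory SQLite database. The table name userobj_demo, its columns and the XML element names are assumptions made up for illustration; they are not the actual Repository schema or the product's XML format.

```python
import sqlite3
import xml.etree.ElementTree as ET


def save_user(conn, user_id, name, attributes):
    """Serialize the whole object to XML and store it next to the indexed columns."""
    root = ET.Element("User", id=user_id, name=name)
    for key, value in attributes.items():
        ET.SubElement(root, "Attribute", name=key, value=value)
    xml_text = ET.tostring(root, encoding="unicode")
    conn.execute(
        "INSERT INTO userobj_demo (id, name, xml) VALUES (?, ?, ?)",
        (user_id, name, xml_text),
    )


def load_user(conn, name):
    """Query by the indexed column, then deserialize the XML back into an object."""
    row = conn.execute(
        "SELECT xml FROM userobj_demo WHERE name = ?", (name,)
    ).fetchone()
    root = ET.fromstring(row[0])
    return {a.get("name"): a.get("value") for a in root.findall("Attribute")}


# Hypothetical two-part layout: queryable columns plus one XML text column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE userobj_demo (id TEXT PRIMARY KEY, name TEXT, xml TEXT)")
conn.execute("CREATE INDEX userobj_demo_name ON userobj_demo (name)")

save_user(conn, "U1", "jdoe", {"email": "jdoe@example.com", "dept": "IT"})
attrs = load_user(conn, "jdoe")
print(attrs)
```

The point of the sketch is only the shape of the round trip: the query never looks inside the XML, and everything not promoted to a column lives in the XML text.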

Non Normalized Database

One final point on database design and function that is critical to understanding sizing. The Sun Identity Manager Repository is NOT normalized. The data is indexed and we do sometimes only query non XML data but for the most part, the database is what I refer to as an XML database (I am sure there is another industry term but this makes the most sense to me).

What do I mean by not normalized? Let me give a simple example. Let’s say that I was designing a database for a retail application. In my retail application I would have Items; Items are included in Transactions. In this very simplistic example I would need at least two tables, retail_item and retail_transaction. I would most likely have a join table called retail_transaction_join. The relationship would be a one-to-many from retail_transaction to retail_item. Assume that the retail_item and retail_transaction tables have primary keys item_id and transaction_id and that the join table simply uses these as foreign keys.  This design would be normalized, meaning that none of the data is duplicated (other than the keys) across more than one table.  If you would like more information and perhaps a better explanation please refer to the following web site (http://en.wikipedia.org/wiki/Database_normalization).

Sun Identity Manager does NOT have a normalized database. Because most data is stored in XML, every time a new attribute, object, etc. is added to the XML, the XML BLOB grows by the size of the added entity. A simple example: each new ResourceAccount added to the User object adds effectively 1024 bytes of data to the User object (this is calculated as defined below).

Sizing the database

In order to accurately size the Sun Identity Manager Repository one must consider all the information given to you and implied. The challenge is to understand what is implied and how it affects the database. I want to stress, even with all the information provided in this article, this process is very much a challenging endeavor.

Step 1.  Consider the Number of Users and Number of Resource Accounts

The first step in sizing the Repository is identifying the base number of Users in the system and the number of Resource Accounts each user will have assigned. In most cases this is a simple process, but it will require some estimation; I would estimate slightly higher than what the client is telling you. A typical user count is 50k, with 5 Resource Accounts per user.

UC = 50000

RC = 5

Once the number of Users and Resource Accounts has been identified, calculate the base footprint of the User object without any Resource Accounts assigned; this number needs to be in bytes. This could be done in several ways, and I would recommend finding the most accurate method possible. Let UI = the size of the base User object. In my research this value was 8000 bytes. Note: It is very important to consider User Extended Attributes in calculating the footprint of the User object, especially if you have a large number of User Extended Attributes.

UI = 8000 bytes

Then add a Resource Account to the user and recalculate the size of the User Object (once again in bytes). The difference of these two values is the size each Resource Account adds to the base User object footprint. Subsequently, we can derive an equation to tell us how large x number of Users are with y number of resources assigned to them. Once again in my research this value was 1024 bytes.

RI = 1024 bytes
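One way to take these two measurements is to export the User object as XML before and after assigning a Resource Account and compare the byte counts. The sketch below assumes you already have both XML strings in hand; the XML snippets are made-up placeholders, not real Identity Manager output, and the function names are mine.

```python
def xml_size_bytes(xml_text):
    """Size of the serialized object in bytes (UTF-8 encoded)."""
    return len(xml_text.encode("utf-8"))


def resource_increment(base_xml, one_account_xml):
    """RI: bytes added to the User object by one Resource Account."""
    return xml_size_bytes(one_account_xml) - xml_size_bytes(base_xml)


# Placeholder XML, for illustration only.
base_user = "<User name='jdoe'></User>"
user_with_account = "<User name='jdoe'><ResourceInfo name='AD'/></User>"

ui = xml_size_bytes(base_user)                         # stands in for UI
ri = resource_increment(base_user, user_with_account)  # stands in for RI
print(ui, ri)
```

Real objects are far larger than these placeholders, which is why the figures above are in the thousands of bytes. Averaging over several representative users is likely to give a better number than a single sample.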

Finally once the base value of the User object and the value of each subsequent Resource Account are defined we can derive an equation to demonstrate the base footprint of the Users.

U = (((RI)RC)+UI)UC
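As a quick check of the equation, here it is as a small Python function using the values from this step. The function and parameter names are mine; the figures are the ones given above.

```python
def user_footprint_bytes(uc, rc, ui, ri):
    """U = ((RI * RC) + UI) * UC, all sizes in bytes."""
    return ((ri * rc) + ui) * uc


u = user_footprint_bytes(uc=50_000, rc=5, ui=8_000, ri=1_024)
u_mb = u / 1024 / 1024
print(f"U = {u} bytes, about {u_mb:.0f} megabytes")
```

With these inputs each user costs 13,120 bytes, so 50,000 users come to 656,000,000 bytes, or roughly 626 megabytes.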

Step 2.  Consider the Number of requests per day

In order to understand why requests equate to storage in the database you must understand what a TaskInstance is and, subsequently, what sub processes could be spawned from it. In general terms I would be more concerned with simply understanding how many workflows you will be executing. Another consideration is whether you have a particularly large workflow that will be executed often. An example of this would be an Access Request with multiple Approvals or perhaps a New Hire Request Form. These workflows will consume more database space than the standard workflows.

First make an approximation of the base footprint of a workflow aka TaskInstance.

PI = 6000 bytes

The next step is to understand how many in memory workflows will be executed. This will be the aggregate of all the Users, Administrators and System processes.

Identify how many requests per day will be coming from the User population; a good number to start with is 2% of the user count. I count all workflow activities: any end user task, administrative function, or system process. The important thing to remember is that all of these eventually wind up in the Repository as TaskInstances. The following equation is used to calculate the Process Count:

PC = UC * 0.02

Finally, calculate the Process Footprint by the following (260 represents the max working days in a year):

P = PC * PI * 260
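The two equations in this step can be sketched in Python as follows. The function names are mine. Note that the 2% starting point gives 1000 requests per day for 50,000 users, whereas the worked example later in this article uses PC = 500, so both are shown.

```python
WORKING_DAYS = 260  # max working days in a year, as used in this article


def requests_per_day(uc, rate=0.02):
    """PC = UC * rate, rounded to a whole number of requests."""
    return round(uc * rate)


def process_footprint_bytes(pc, task_size, days=WORKING_DAYS):
    """P = PC * PI * 260, in bytes."""
    return pc * task_size * days


pc_at_two_percent = requests_per_day(50_000)              # 2% starting point
p = process_footprint_bytes(pc=500, task_size=6_000)      # worked-example value
p_mb = p / 1024 / 1024
print(pc_at_two_percent, p, f"{p_mb:.0f} megabytes")
```

Because P scales linearly with PC, doubling the request rate doubles the TaskInstance footprint, so it is worth settling on the rate before trusting the total.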

Step 3.  Consider the Number of WorkItems per Request

It is important to understand that Work Items are not just Approvals; WorkItems manifest themselves in a number of ways. If you are not familiar with what a WorkItem is or need more information I would highly encourage you to understand this prior to making any assumptions in this section.

First Identify the base footprint of a WorkItem, as mentioned above this must be done by looking at all the possible manifestations of a WorkItem and averaging them.

WI = 1500 bytes

Next calculate the total number of WorkItems per Process. In my experience this number gets adjusted several times over the course of an estimate. For this article I am using 3, which is an arbitrary choice.

WC = 3

Finally, calculate the WorkItem base footprint per the Process Count as identified above. The following equation is used (260 represents the max working days in a year):

W = WC * WI * PC * 260
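A short Python sketch of the WorkItem equation, using the values from this step and the PC = 500 figure used in the worked example later in the article. The function name is mine.

```python
def workitem_footprint_bytes(wc, wi, pc, days=260):
    """W = WC * WI * PC * 260, in bytes."""
    return wc * wi * pc * days


w = workitem_footprint_bytes(wc=3, wi=1_500, pc=500)
w_mb = w / 1024 / 1024
print(f"W = {w} bytes, about {w_mb:.0f} megabytes")
```

Since WC is the value most likely to be revised, keeping it as a parameter makes it easy to rerun the estimate as the workflow design firms up.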

Step 4.  Consider the Audit Log size

Understanding the Log count and size is one of the more important sizing issues. As you can imagine audit data is critical to the Sun Identity Management Solution. Audit Logs are more importantly going to grow the fastest and stay around the longest in the repository. AuditEvents aka Audit Log entries are generated quite a bit in the application by default, additionally events can be user defined. It is very important to understand where these events come from and how much data is stored with each.

First identify the base footprint of an AuditEvent; as mentioned above, this must be done by looking at all the possible manifestations of an AuditEvent and averaging them. I typically use 5000 bytes, although I have seen this number fluctuate quite a bit.

AI = 5000 bytes

Next, we need to figure out how many audit events are triggered by each process as defined above. This is relatively straightforward but could be a challenge in more complex environments (3 is the number of Audit Events per request).

AC = PC * 3

Finally the footprint of the Audit Events is calculated by the following (260 represents the max working days in a year):

A = AC * AI * 260
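The audit calculation in Python, assuming 3 Audit Events per request and the PC = 500 figure used in the worked example later in the article. The function names are mine.

```python
def audit_events_per_day(pc, events_per_request=3):
    """AC = PC * (Audit Events per request)."""
    return pc * events_per_request


def audit_footprint_bytes(ac, ai, days=260):
    """A = AC * AI * 260, in bytes."""
    return ac * ai * days


ac = audit_events_per_day(pc=500)
a = audit_footprint_bytes(ac=ac, ai=5_000)
a_mb = a / 1024 / 1024
print(ac, a, f"{a_mb:.0f} megabytes")
```

This covers a single year of audit data. If the retention requirement is longer than a year, a reasonable assumption is to multiply A by the number of years retained.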

Step 5.  Understanding the Base Install Footprint

Sun Identity Manager installs a set amount of data into the Repository. This number changes with every release and is not static. When sizing the Repository it is very important to understand this value. I will use 76000 bytes for this example, but the value should be verified for every sizing done.

B = 76000 bytes

Putting it all together

Once each of the above steps has been completed it is time to put it all together and understand our database footprint. As you can imagine this is relatively straightforward, just math. The following example will put it all together:

In this example I am using 50k Users with 5 Resource Accounts per user and 3 Work Items per request.

Step 1

UC = 50000                      Number of Users

RC = 5                              Number of Resource Accounts per User

UI = 8000 bytes                User object size

RI = 1024 bytes                Resource Account object size

Equation: U = (((RI)RC)+UI)UC

Calculation:

(((((1024) 5 ) + 8000) 50000 ) / 1024 ) / 1024 = 626 megabytes  

Step 2

PI = 6000 bytes               Task Instance size

PC =  500                        Number of processes per day (1% of UC in this example; the 2% starting point would give 1000)

Equation: P = PC * PI * 260

Calculation:

((500 * 6000 * 260) / 1024) / 1024 = 744 megabytes  

Step 3

WI = 1500 bytes                Work Item size

WC =  3                             Number of Work Items per Process

Equation: W = WC * WI * PC * 260

Calculation:

((3 * 1500 * 500 * 260)  / 1024) / 1024 = 558 megabytes   

Step 4

AI = 5000 bytes                   Audit Event size

AC = PC * 3                       Number of Audit Events per day (3 per Process)

Equation: A = AC * AI * 260

Calculation:

((1500 * 5000 * 260)  / 1024) /1024  = 1860 megabytes   

Final Calculation:

The final calculation is pretty straight forward; simply add each of the previous calculations to get a total.

Final

U =  626 megabytes  

P =  744 megabytes  

W =  558  megabytes   

A = 1860 megabytes   

Equation: F = U + P + W + A

Calculation:

(626 + 744 + 558 + 1860)  / 1024   = 3.7 gigabytes   
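For anyone who would rather script this than punch it into a calculator, here is a minimal Python sketch that reproduces the whole example. The class and field names are mine, and the defaults are the example values from this article, not recommendations.

```python
from dataclasses import dataclass

MB = 1024 * 1024
GB = 1024 * MB


@dataclass
class SizingInputs:
    uc: int = 50_000   # number of Users
    rc: int = 5        # Resource Accounts per User
    ui: int = 8_000    # base User object size, bytes
    ri: int = 1_024    # size added per Resource Account, bytes
    pi: int = 6_000    # TaskInstance size, bytes
    pc: int = 500      # processes per day
    wi: int = 1_500    # WorkItem size, bytes
    wc: int = 3        # WorkItems per process
    ai: int = 5_000    # AuditEvent size, bytes
    events_per_process: int = 3
    days: int = 260    # working days per year


def footprint_bytes(s):
    """Return the U, P, W, A components and their sum F, all in bytes."""
    u = ((s.ri * s.rc) + s.ui) * s.uc
    p = s.pc * s.pi * s.days
    w = s.wc * s.wi * s.pc * s.days
    a = (s.pc * s.events_per_process) * s.ai * s.days
    return {"U": u, "P": p, "W": w, "A": a, "F": u + p + w + a}


result = footprint_bytes(SizingInputs())
for name in ("U", "P", "W", "A"):
    print(f"{name} = {result[name] / MB:.0f} megabytes")
print(f"F = {result['F'] / GB:.1f} gigabytes")
```

To try a different scenario, pass overrides such as SizingInputs(uc=100_000, pc=2_000). The sketch sums exact byte counts rather than rounded megabytes, so the last digit can differ slightly from a hand calculation.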

List of equations

For those of you that don’t want to read through all of this, here are the equations spelled out. Enjoy

Variables

UC = 50000                        Number of Users

RC = 5                                Number of Resource Accounts per User

UI = 8000 bytes                  User object size

RI = 1024 bytes                  Resource Account object size

PI = 6000 bytes                  Task Instance size

PC =  500                           Number of processes per day

WI = 1500 bytes                 Work Item size

WC =  3                              Number of Work Items per Process

AI = 5000 bytes                  Audit Event size

AC = PC * 3                       Number of Audit Events per day (3 per Process)

Equations

Equation: U = ((RI * RC) + UI)UC          User footprint

Equation: P = PC * PI * 260                     Task Instance footprint

Equation: W = WC * WI * PC * 260       Work Item footprint

Equation: A = AC * AI * 260                   Audit Event footprint

Equation: F = U + P + W + A                   Final footprint

Summary

If you are wondering why this hasn’t been documented somewhere, well, I guess you can see why by reading this. It is pretty complicated and changes quite a bit from release to release.  If you are on SWAN, I have created a sizing tool using Sun Identity Manager (http://gunga.central.sun.com:6060/idm/user). This tool is still in Beta mode and has some hiccups; nonetheless, you could validate your sizing estimates against the output of the tool.

Another aspect of Database Sizing is the hardware required to deploy the database, along with modifying the database create scripts to properly configure the database once you have it sized appropriately. These two additional topics will be addressed in my next blog entries.

-          Enjoy

Tuesday Oct 02, 2007

A colleague and I were discussing the required configuration of the Sun Identity Manager Remote Gateway for a production system and it dawned on me that I don’t think I have really ever seen a proposed reference architecture for the Gateway. Here you go:

Installation 

How and where does the Gateway get installed? This seems to be the question that comes up the most and is a logical starting point for this technical “how to”. The Gateway installs as a Windows Service and should be installed on a Windows 2000 server; do not install it on NT with Active Directory Client Extensions installed. Why? I have tried this in the past and had some issues, and the Engineering team doesn’t advise it either.

The Gateway binds to the directory through a serverless bind thus needs to be installed on systems that know about the Domain. Does this mean it has to be on a Domain Controller? No, the Gateway can be installed on any system matching the criteria above.

When the Gateway is installed as a service, it typically runs as the local system account. If you change this then you will need to add a couple of rights to the user, “Act As Operating System” and “Bypass Traverse Checking”.  This is not to be confused with the Active Directory Administrator account used by the resource adapter. These are two entirely different things. The Gateway Service Account does need the ability to Run Actions and Create Home Directories. There are other configuration possibilities but by far these are the most common.

Trusts 

What about Trusts? Trusts present some issues, but generally speaking, as long as the Gateway is in the same forest as the domain to be managed or there is a trust between domains, the bind will succeed. Note, this is also spelled out in the product documentation.

Why use the LDAP hostname? LDAP hostnames are generally a good idea and I would recommend configuring it where possible. The concept is that when you specify the LDAP hostname you are basically telling IDM to bind to a specific Domain Controller. However, per the product documentation and my experience you do NOT have to give a Domain Controller. If you give a DNS name of the AD Domain instead of an IP address and the Gateway system is configured to return multiple IP’s then you avoid having to point IDM LDAP hostname to a specific domain controller. Why does that matter? It matters in that you can essentially have fault tolerance at the Domain Controller level.

High Availability 

High Availability and Fault Tolerance are also of concern when deriving the Identity Manager logical architecture and in respect to this topic the Sun Remote Gateway. The Gateway should NOT be load balanced! Instead I recommend two Gateway servers (or more depending on function and scale) behind a network device capable of handling active/passive modes, this is commonly handled by a load balancer but I do not want to give the perception that these Gateways should be load balanced. Configure Identity Manager to point to the network device and let the network device determine which Gateway to use based on availability.  I would not recommend OS clustering or attempting to cluster the Gateway, it is really not required in most cases.
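The active/passive behaviour described above is normally implemented by the network device itself, not by anything you write. Purely to illustrate the selection logic, here is a small Python sketch. The host names, the port number and the TCP connect check are all assumptions for the example and say nothing about how any particular device or the Gateway actually reports health.

```python
import socket


def tcp_alive(host, port, timeout=2.0):
    """Assumed health check: the Gateway counts as up if a TCP connect succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_gateway(gateways, port, is_alive=tcp_alive):
    """Active/passive: always prefer the first Gateway, fall back in list order.

    Unlike load balancing, traffic is never spread across Gateways; the
    passive one is used only while the active one is unavailable.
    """
    for host in gateways:
        if is_alive(host, port):
            return host
    return None


# Hypothetical host names and port; the health check is stubbed so that the
# first Gateway looks down and the passive one takes over.
chosen = pick_gateway(
    ["gateway1.example.com", "gateway2.example.com"],
    port=9999,
    is_alive=lambda host, port: host == "gateway2.example.com",
)
print(chosen)
```

The ordered list is what distinguishes this from load balancing: as soon as the first Gateway is reachable again, it is the one selected.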

Server Specifications 

Server specs are another question that comes up quite a bit. The answer is not very straightforward. However, generally speaking, the following is very common:

    Pentium 4, 3.x GHz, dual core

    2 GB of RAM

Logical Diagrams

The following diagrams illustrate common configurations.

 

Single Domain Configuration

You will notice in the diagram above that Sun Identity Manager is configured to administer one AD Domain. This design is the most common and is very stable. The network device IP is configured in the Resource Adapter; in the event a Gateway Server fails, the network device automatically redirects traffic to the available Gateway Server.

 

Multiple Domain Configuration

You will notice in the diagram above that Sun Identity Manager is configured to administer two AD Domains configured with two-way trusts. This design is less common but is possible; notice that IDM could be managing either domain. Because a Gateway Server is configured in each domain and the trusts exist, this configuration works beautifully and provides pretty good redundancy. However, I would recommend having two Gateway servers per domain. As in most cases, IDM will have a resource adapter configured for each domain being managed.

Multiple Domain Configuration - Highly Available

This is the same configuration as above with the addition of two Gateway Servers, one for each domain. This would be the most recommended configuration for two domains. Well that sums up my tech blog on the Sun Remote Gateway. If you have any questions or would like to comment please feel free to do so, I am always open to suggestions and feedback.

Friday Aug 24, 2007

I was out of the office last week, on one of my all too familiar last minute trips to an unfamiliar place with little information or direction on what I was going to do when I got there. I decided it would be a good idea to hire one of those airport shuttle services. The price of gasoline and the long drive from my home to the airport were not something I looked forward to, so this was a nice change. The driver was late, as I expected; I had asked him to come earlier than I needed, so I was fine with it. I had pre-paid his tip so I did not need to fumble around with cash at the last minute; I assumed that if they got me to the airport at all I would be happy with the service. I arrived at the airport a little early, as I typically do. The driver was very apologetic and felt that he had ultimately let me down, not done his job. I assured him that it was no big deal and I thanked him for his safe driving and his kindness. After all, he was a very kind and interesting older gentleman, and I genuinely liked him. When I arrived inside the airport the security line was so long I could not see the end. I could not believe the number of people in the airport. Once again, I was ultimately unaffected by the situation.

So what does this story have to do with anything? The simple fact is, without my 10 years of experience flying 20 to 40 weeks a year, the outcome of my little story would have been much different. I would probably have been very upset at the driver (and the company that hired him). I would potentially have missed my flight, causing a bad situation with my client and my employer. All in all I would not have been a happy camper. The truth of the matter is, I have been in that exact situation more times than I can count. I don't know a single consultant that hasn't. Why was this time different? Were the stars aligned? Did traffic suddenly disappear and allow the drive to be much faster? Did the distance get shorter? Of course not; the difference was that I knew what to expect. My expectations prompted my actions, and therefore assured me (or gave me the illusion) that I was in control of the situation.

The moral of this story is to give yourself plenty of time to get to the airport. Just kidding! There is, however, a serious lesson to be learned here. If you equate my simple story to that of an enterprise software project, is there any difference in my responsibility to clients? I would argue there is not; it is my obligation to help clients understand and to set expectations appropriately. Honesty based on experience doesn't always mean that a project is going to go on without a hiccup, but it certainly has done me justice. Hence the implied value of the consultant: to say to our clients "I have seen this before; this is what happened under these circumstances". The value of our experience, I believe, is not measurable in simple terms. If I could brain dump my 15 years of experience to a junior level consultant I would. I can train him/her, but nothing, absolutely nothing, can replace experience in the field.

As consultants and SMEs, do we have the ability to do something about these simple oversights? I believe that we can prepare consultants for the field, much like an army would prepare its soldiers. Perhaps this is where the term "bootcamp" comes from in respect to product specific training. Ultimately, Senior Enterprise Consultants are the key to the success of enterprise software. Without their real world practical experience, we would always miss our flights.

This blog copyright 2009 by Christopher Timmerman