Sizing The Sun Identity Manager Repository
Sizing the Sun Identity Manager Repository is once again one of those staple challenges in Sun Identity Manager. I have not seen a good answer for how to do this. So once again, read on if you are interested in understanding how it is done.
Where to Start?
The Sun Identity Manager Application is quite complicated and has literally thousands of touch points. Some of those touch points are more important than others, but all in all it is quite a challenge to get your head around. The Repository is no exception!
It seems to me that, for whatever reason, I am always asked to size the database prior to doing any real analysis or design. For all intents and purposes this is utterly ridiculous, though granted, if you are in the market for an enterprise provisioning system it would be nice to get a total budget picture. Still, enterprise scheduling and budgeting should not take the place of common sense. Please take the time to size your Identity Manager Repository correctly; this makes ALL the difference in the world when it comes to performance!
This article will assist in the initial sizing for the Sun Identity Manager Repository.
How data is stored
Before I get into the weeds on how to size the repository I thought it might be beneficial to explain at a very high level how the Sun Identity Manager Repository works. This is not so much about how the database works as about how we store data and why.
The Sun Identity Manager Repository consists of several tables; the relevant tables are defined below:
a. object – The object table stores most configuration objects of the Sun Identity Manager application. Configuration objects consist of basically any object in the application. For example, SystemConfiguration, TaskDefinition, WorkItem, Form, Resource, etc. However, the object table does NOT store any User objects or TaskInstances aka stateful workflows.
b. task – The task table stores the non-static instances of Workflows, also known as TaskInstances, and WorkItems.
c. org – The org table stores all Sun Identity Manager organizational information. This is NOT any org information for managed systems.
d. userobj – The userobj table stores all the Sun Identity Manager enterprise identities. These identities are simply containers for accounts on managed systems.
e. account – The account table holds the accounts identified on a managed system. Note: if you have heard the term “building the account index”, this is the table being referred to. Each managed system will have a set number of accounts in this table after reconciliation and/or account linking.
f. log – The log table stores all audit events and log type information.
If you were to closely examine the database scripts used to create the tables above you would notice more tables than what I have defined. I have intentionally left those tables out of this article; the relevant tables are listed.
A key concept to understand in the Sun Identity Manager Repository is that all data is stored in two ways. Each table has indexed and keyed columns used to query objects, and each table has an XML column used to store the entire ASCII representation of the object (depending on the database engine this is typically a BLOB or MEDIUMTEXT data type). This is because all Sun Identity Manager objects are serialized from Java objects to ASCII XML for storage in the Repository.
The application, at a high level, queries by the indexed columns, pulls back the XML ASCII text and then deserializes the XML into Java objects. These objects are usually made available through the use of views (UserView, PasswordView, etc).
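The store-and-load pattern described above can be sketched in a few lines. This is only an illustration of the idea (indexed columns for lookup, an XML column for the full object), not the product's actual schema or serializer: the table name `demo_object`, its columns, and the XML element names are all made up for the example, and SQLite stands in for whatever database engine you deploy on.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical table: two indexed lookup columns plus the full object as XML text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_object (id INTEGER PRIMARY KEY, type TEXT, name TEXT, xml TEXT)")
conn.execute("CREATE INDEX demo_object_lookup ON demo_object (type, name)")


def to_xml(obj: dict) -> str:
    """Serialize an in-memory object to XML text (made-up element names)."""
    root = ET.Element("Object", {"type": obj["type"], "name": obj["name"]})
    for key, value in obj["attributes"].items():
        ET.SubElement(root, "Attribute", {"name": key, "value": str(value)})
    return ET.tostring(root, encoding="unicode")


def from_xml(xml_text: str) -> dict:
    """Deserialize XML text back into an in-memory object."""
    root = ET.fromstring(xml_text)
    attributes = {a.get("name"): a.get("value") for a in root.findall("Attribute")}
    return {"type": root.get("type"), "name": root.get("name"), "attributes": attributes}


def save(obj: dict) -> None:
    # The indexed columns duplicate a few fields that also live inside the XML.
    conn.execute(
        "INSERT INTO demo_object (type, name, xml) VALUES (?, ?, ?)",
        (obj["type"], obj["name"], to_xml(obj)),
    )


def load(obj_type: str, name: str):
    # Query by the indexed columns, then rebuild the object from the XML column.
    row = conn.execute(
        "SELECT xml FROM demo_object WHERE type = ? AND name = ?", (obj_type, name)
    ).fetchone()
    return from_xml(row[0]) if row else None
```

Note that every attribute added to the object makes the XML column larger, which is the behavior the sizing steps later in this article try to estimate.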
Non Normalized Database
One final point on database design and function that is critical to understanding sizing. The Sun Identity Manager Repository is NOT normalized. The data is indexed and we do sometimes only query non XML data but for the most part, the database is what I refer to as an XML database (I am sure there is another industry term but this makes the most sense to me).
What do I mean by not normalized? Let me give a simple example. Let’s say that I was designing a database for a retail application. In my retail application I would have Items; Items are included in Transactions. In this very simplistic example I would need at least two tables, retail_item and retail_transaction. I would most likely have a join table called retail_transaction_join. The relationship would be many-to-many between retail_transaction and retail_item: a transaction has many items, and an item can appear in many transactions. Assume that the retail_item and retail_transaction tables have primary keys item_id and transaction_id and that the join table simply uses these as foreign keys. This design would be normalized, meaning that none of the data is duplicated (other than the keys) across more than one table. If you would like more information and perhaps a better explanation please refer to the following web site (http://en.wikipedia.org/wiki/Database_normalization).
Sun Identity Manager does NOT have a normalized database. Because most data is stored in XML, every time a new attribute, object, etc. is added to the XML the size of the XML BLOB grows by the size of each added entity. A simple example would be that each new ResourceAccount added to the User object adds effectively 1024 bytes (this is calculated as defined below) of data to the User object.
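For readers who prefer to see it, here is a minimal sketch of the three retail tables using SQLite from Python. The table names and key columns come from the example above; the `description` and `purchased_on` columns and the sample rows are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE retail_item (
    item_id     INTEGER PRIMARY KEY,
    description TEXT NOT NULL              -- hypothetical column
);
CREATE TABLE retail_transaction (
    transaction_id INTEGER PRIMARY KEY,
    purchased_on   TEXT NOT NULL           -- hypothetical column
);
-- The join table holds only the two keys, so no item or
-- transaction data is duplicated anywhere.
CREATE TABLE retail_transaction_join (
    transaction_id INTEGER NOT NULL REFERENCES retail_transaction(transaction_id),
    item_id        INTEGER NOT NULL REFERENCES retail_item(item_id)
);
""")

conn.executemany("INSERT INTO retail_item VALUES (?, ?)", [(1, "Widget"), (2, "Gadget")])
conn.execute("INSERT INTO retail_transaction VALUES (100, '2007-10-19')")
conn.executemany("INSERT INTO retail_transaction_join VALUES (?, ?)", [(100, 1), (100, 2)])


def items_in_transaction(transaction_id: int) -> list:
    """Return the item descriptions for one transaction via the join table."""
    rows = conn.execute(
        "SELECT i.description FROM retail_transaction_join j "
        "JOIN retail_item i ON i.item_id = j.item_id "
        "WHERE j.transaction_id = ? ORDER BY i.item_id",
        (transaction_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Adding an item to a transaction here costs one small row in the join table, and the item's own data is stored exactly once.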
Sizing the database
In order to accurately size the Sun Identity Manager Repository one must consider all the information given to you and implied. The challenge is to understand what is implied and how it affects the database. I want to stress, even with all the information provided in this article, this process is very much a challenging endeavor.
Step 1. Consider the Number of Users and Number of Resource Accounts
The first step in sizing the Repository is identifying the base number of Users in the system and the number of Resource Accounts each user will have assigned. In most cases this is a simple process, but it will require some estimation; I would estimate slightly higher than what the client is telling you. A typical user count is 50k, with 5 Resource Accounts per user.
UC = 50000
RC = 5
Once the number of Users and Resource Accounts has been identified, calculate the base footprint of the User object without any Resource Accounts assigned; this number needs to be in bytes. This could be done in several ways, and I would recommend finding the most accurate method possible. Let UI = the size of the base User object. In my research this value was 8000 bytes. Note: It is very important to consider User Extended Attributes in calculating the footprint of the User object, especially if you have a large number of User Extended Attributes.
UI = 8000 bytes
Then add a Resource Account to the user and recalculate the size of the User Object (once again in bytes). The difference of these two values is the size each Resource Account adds to the base User object footprint. Subsequently, we can derive an equation to tell us how large x number of Users are with y number of resources assigned to them. Once again in my research this value was 1024 bytes.
RI = 1024 bytes
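The measurement itself is just a subtraction. A rough sketch, assuming you can get hold of the serialized XML of a user before and after linking one Resource Account (for example from an export), and assuming the text is stored as UTF-8; the XML snippets below are toy stand-ins, not real User objects:

```python
def byte_size(xml_text: str) -> int:
    """Size of a serialized object in bytes, assuming UTF-8 storage."""
    return len(xml_text.encode("utf-8"))


def per_account_increment(base_xml: str, one_account_xml: str) -> int:
    """Bytes added to the User object by a single Resource Account."""
    return byte_size(one_account_xml) - byte_size(base_xml)


# Toy stand-ins for the two exports (made-up element names).
base_user = "<User name='jdoe'></User>"
user_with_account = "<User name='jdoe'><Account resource='LDAP'/></User>"

UI = byte_size(base_user)                                  # base User footprint
RI = per_account_increment(base_user, user_with_account)   # per-account growth
```

With real exports you would repeat this for a handful of representative users and resources and average the results.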
Finally once the base value of the User object and the value of each subsequent Resource Account are defined we can derive an equation to demonstrate the base footprint of the Users.
U = (((RI)RC)+UI)UC
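As a quick sanity check, the equation translates directly into code. The function names are my own; the default values are the ones used in this article.

```python
def user_footprint(uc: int = 50000, rc: int = 5, ui: int = 8000, ri: int = 1024) -> int:
    """U = ((RI * RC) + UI) * UC, in bytes."""
    return (ri * rc + ui) * uc


def to_megabytes(size_in_bytes: int) -> float:
    """Convert bytes to megabytes using 1024-based units, as in this article."""
    return size_in_bytes / 1024 / 1024
```

For the default inputs, `round(to_megabytes(user_footprint()))` comes out at 626.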
Step 2. Consider the Number of requests per day
In order to understand why requests equate to storage allocation in the database you must understand what a TaskInstance is and subsequently what sub processes could be spawned from it. In general terms I would be more concerned with simply understanding how many workflows you will be executing. Another consideration would be whether you have a particularly large workflow that will be executed often. An example of this would be an Access Request with multiple Approvals or perhaps a New Hire Request Form. These workflows will consume more database space than the standard workflows.
First make an approximation of the base footprint of a workflow aka TaskInstance.
PI = 6000 bytes
The next step is to understand how many in memory workflows will be executed. This will be the aggregate of all the Users, Administrators and System processes.
Identify how many requests will be coming from the User population per day; a good number to start with is 2% of the user count. I consider all workflow activities: any end user task, administrative function, or system process. The important thing to remember is that all these eventually wind up in the Repository as TaskInstances. The following equation is used to calculate the Process Count:
PC = UC * 0.02
Finally, calculate the Process Footprint by the following (260 represents the max working days in a year):
P = PC * PI * 260
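Here is the same step as a small sketch. The function names are mine, and the request rate is left as a parameter because it is only a starting estimate (the worked example later in this article uses PC = 500 per day).

```python
WORKING_DAYS_PER_YEAR = 260


def process_count(uc: int, request_rate: float = 0.02) -> int:
    """PC: estimated number of processes (TaskInstances) per day."""
    return round(uc * request_rate)


def process_footprint(pc: int, pi: int = 6000) -> int:
    """P = PC * PI * 260, in bytes per year."""
    return pc * pi * WORKING_DAYS_PER_YEAR
```

Keeping the rate as a parameter makes it easy to see how sensitive the yearly footprint is to that one guess.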
Step 3. Consider the Number of WorkItems per Request
It is important to understand that Work Items are not just Approvals; WorkItems manifest themselves in a number of ways. If you are not familiar with what a WorkItem is or need more information I would highly encourage you to understand this prior to making any assumptions in this section.
First identify the base footprint of a WorkItem; as mentioned above, this must be done by looking at all the possible manifestations of a WorkItem and averaging them.
WI = 1500 bytes
Next calculate the total number of WorkItems per Process. In my experience this number gets adjusted several times over the course of an estimate. For this article I am using 3, an arbitrary value chosen for illustration.
WC = 3
Finally, calculate the WorkItem base footprint per the Process Count as identified above. The following equation is used (260 represents the max working days in a year):
W = WC * WI * PC * 260
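In code form (function name mine, defaults taken from this article):

```python
WORKING_DAYS_PER_YEAR = 260


def workitem_footprint(pc: int, wc: int = 3, wi: int = 1500) -> int:
    """W = WC * WI * PC * 260, in bytes per year."""
    return wc * wi * pc * WORKING_DAYS_PER_YEAR
```

With PC = 500 this gives 585,000,000 bytes, or roughly 558 megabytes.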
Step 4. Consider the Audit Log size
Understanding the Log count and size is one of the more important sizing issues. As you can imagine, audit data is critical to the Sun Identity Manager solution. More importantly, Audit Logs are going to grow the fastest and stay around the longest in the repository. AuditEvents aka Audit Log entries are generated quite a bit in the application by default; additionally, events can be user defined. It is very important to understand where these events come from and how much data is stored with each.
First identify the base footprint of an AuditEvent; as mentioned above, this must be done by looking at all the possible manifestations of an AuditEvent and averaging them. I typically use 5000 bytes, though I have seen this number fluctuate quite a bit.
AI = 5000 bytes
Next, we need to figure out how many audit events are triggered from each process as we have defined above. This is relatively straightforward but could be a challenge in more complex environments (3 is the number of Audit Events per request).
AC = PC * 3
Finally the footprint of the Audit Events is calculated by the following (260 represents the max working days in a year):
A = AC * AI * 260
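A sketch of the audit calculation, with the number of Audit Events per process pulled out as its own parameter since it varies by environment (the worked example in this article uses 3). The function names are mine.

```python
WORKING_DAYS_PER_YEAR = 260


def audit_count(pc: int, events_per_process: int = 3) -> int:
    """AC: number of Audit Events generated per day."""
    return pc * events_per_process


def audit_footprint(ac: int, ai: int = 5000) -> int:
    """A = AC * AI * 260, in bytes per year."""
    return ac * ai * WORKING_DAYS_PER_YEAR
```

With PC = 500 and 3 events per process, AC is 1500 and the yearly footprint is about 1860 megabytes, which is why the audit log usually dominates the total.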
Step 5. Understanding the Base Install Footprint
Sun Identity Manager installs a set amount of data into the Repository. This number changes with every release and is not static. When sizing the Repository it is very important to understand this value. I will use 76000 bytes for this example, but the value should be verified for every sizing done.
B = 76000 bytes
Putting it all together
Once each of the above steps has been completed it is time to put it all together and understand our database footprint. As you can imagine this is relatively straightforward, just math. The following example will put it all together:
In this example I am using 50k Users with 5 Resource Accounts per user, 500 processes per day and 3 Work Items per request. Note that 500 processes per day is 1% of the user population; the 2% starting point from Step 2 would give 1000.
Step 1
UC = 50000 (Number of Users)
RC = 5 (Number of Resource Accounts per User)
UI = 8000 bytes (User object size)
RI = 1024 bytes (Resource Account object size)
Equation: U = ((RI * RC) + UI) * UC
Calculation:
((((1024 * 5) + 8000) * 50000) / 1024) / 1024 = 626 megabytes
Step 2
PI = 6000 bytes (Task Instance size)
PC = 500 (Number of processes per day)
Equation: P = PC * PI * 260
Calculation:
((500 * 6000 * 260) / 1024) / 1024 = 744 megabytes
Step 3
WI = 1500 bytes (Work Item size)
WC = 3 (Number of Work Items per Process)
Equation: W = WC * WI * PC * 260
Calculation:
((3 * 1500 * 500 * 260) / 1024) / 1024 = 558 megabytes
Step 4
AI = 5000 bytes (Audit Event size)
AC = PC * 3 = 1500 (Number of Audit Events per day, at 3 per Process)
Equation: A = AC * AI * 260
Calculation:
((1500 * 5000 * 260) / 1024) / 1024 = 1860 megabytes
Final Calculation:
The final calculation is pretty straightforward; simply add each of the previous calculations to get a total.
Final
U = 626 megabytes
P = 744 megabytes
W = 558 megabytes
A = 1860 megabytes
Equation: F = U + P + W + A
Calculation:
(626 + 744 + 558 + 1860) / 1024 = 3.7 gigabytes
List of equations
For those of you that don’t want to read through all of this, here are the equations spelled out. Enjoy
Variables
UC = 50000 (Number of Users)
RC = 5 (Number of Resource Accounts per User)
UI = 8000 bytes (User object size)
RI = 1024 bytes (Resource Account object size)
PI = 6000 bytes (Task Instance size)
PC = 500 (Number of processes per day)
WI = 1500 bytes (Work Item size)
WC = 3 (Number of Work Items per Process)
AI = 5000 bytes (Audit Event size)
AC = PC * 3 (Number of Audit Events per day, at 3 per Process)
Equations
Equation: U = ((RI * RC) + UI) * UC (User footprint)
Equation: P = PC * PI * 260 (Task Instance footprint)
Equation: W = WC * WI * PC * 260 (Work Item footprint)
Equation: A = AC * AI * 260 (Audit Event footprint)
Equation: F = U + P + W + A (Final footprint)
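If you would rather plug numbers into a script than a spreadsheet, the equations fit in one small Python sketch. The class and function names are my own, the defaults are this article's example values, and the result is an estimate only: it follows the equations above exactly, so it does not add the base install footprint from Step 5.

```python
from dataclasses import dataclass

WORKING_DAYS_PER_YEAR = 260


@dataclass
class SizingInputs:
    UC: int = 50000   # number of Users
    RC: int = 5       # Resource Accounts per User
    UI: int = 8000    # User object size, bytes
    RI: int = 1024    # Resource Account size, bytes
    PI: int = 6000    # Task Instance size, bytes
    PC: int = 500     # processes per day
    WI: int = 1500    # Work Item size, bytes
    WC: int = 3       # Work Items per process
    AI: int = 5000    # Audit Event size, bytes
    events_per_process: int = 3  # Audit Events per process


def footprints(s: SizingInputs) -> dict:
    """Return each footprint and the total F, all in bytes."""
    ac = s.PC * s.events_per_process
    parts = {
        "U": (s.RI * s.RC + s.UI) * s.UC,
        "P": s.PC * s.PI * WORKING_DAYS_PER_YEAR,
        "W": s.WC * s.WI * s.PC * WORKING_DAYS_PER_YEAR,
        "A": ac * s.AI * WORKING_DAYS_PER_YEAR,
    }
    parts["F"] = parts["U"] + parts["P"] + parts["W"] + parts["A"]
    return parts


def to_gigabytes(size_in_bytes: int) -> float:
    """Convert bytes to gigabytes using 1024-based units."""
    return size_in_bytes / 1024 ** 3


if __name__ == "__main__":
    result = footprints(SizingInputs())
    for name, size in result.items():
        print(f"{name} = {size / 1024 ** 2:,.0f} megabytes")
    print(f"F = {to_gigabytes(result['F']):.1f} gigabytes")
```

To size a different deployment, override only the inputs that change, for example `footprints(SizingInputs(UC=120000, PC=2400))`.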
Summary
If you are wondering why this hasn’t been documented somewhere, well, I guess you can see why by reading this. It is pretty complicated and changes quite a bit from release to release. If you are on SWAN I have created a sizing tool using Sun Identity Manager (http://gunga.central.sun.com:6060/idm/user). This tool is still in Beta mode and has some hiccups; nonetheless you could validate your sizing estimates against the output of the tool.
Another aspect of Database Sizing is the hardware required to deploy the database and the modification of the database create scripts to properly configure the database once you have it sized appropriately. These two additional topics will be addressed in my next blog entries.
- Enjoy

I wanted to send a quick note to apologize for the first post on this topic. I made a bit of a mistake in my calculations; the sizing was correct (had I used bytes), but the end result was incorrect. My data was off quite a bit. The issue was that I had wanted to start all the calculations with kilobytes instead of bytes. And I did, except I did not go back and convert my bytes to kilobytes, so all my calculations were off by roughly a factor of 1000. If you divide my original output by 1024 you will get the same number which is displayed today. I apologize for any confusion, and to the gentleman that caught it, I think we both came to the same realization at pretty much the same time. Enjoy-
Posted by Christopher Timmerman on October 19, 2007 at 02:35 PM CDT #
Your calculation seems very informative. However, the info was overshadowed by your photograph and I cannot read the first few paragraphs.
Check the formatting of the page.
Posted by gopi on October 31, 2007 at 10:18 AM CDT #
Can you tell me what browser you are using? I have checked this on IE and Mozilla and have not seen any issues. Thanks for the feedback.
Posted by Christopher Timmerman on October 31, 2007 at 10:47 AM CDT #
Great article. Your blog is high quality and I would encourage you to give us more!
One slight fix to your example: "one-to-many from retail_transaction to retial_item."
I believe that should be many-to-many, transactions have many items and items can be in many transactions.
Posted by David Lotts on December 04, 2007 at 12:21 PM CST #
You should also mention the existence of debug/Show_Sizes.jsp, which will give you the min / max / avg / total size for each object in a particular repository. Also, the sizings do not include indexes, which can be pretty sizable.
I would caveat that the task instance size can vary dramatically. For one of my customers, they are around 20K each.
I am not sure whether you wanted to mention the "attr" tables for completeness ...
Posted by Tim Corder on February 28, 2008 at 01:36 PM CST #