John Brady's Weblog

Oracle and System Performance

Monday July 25, 2005

Niagara: Measuring Performance and Sharing

Sun's future Niagara processor has many implications for measuring the performance of a system and the behaviour of the applications running on it. The execution core of a single-threaded processor is either utilised or not at any given point in time – all or nothing. With a multi-threaded processor like Niagara, utilisation of the execution core can take any value between 0% and 100% – there are shades of grey between black and white.

Our goal is to maximise the performance of our applications on the computer hardware configuration we have available to us. One aspect of this is to understand how much of the system's capacity is being used. Another aspect is to understand how the resources are being used by each of the applications running on it. With single-threaded processors the simplistic on/off view is close enough to the truth, and we have been used to it for many years. With multi-threaded processors we can either retain this simplistic view, or we can enhance it and be honest about the existence of the shades of grey.

Until now a processor has had only a single execution core, capable of executing instructions from one thread at a time. Switching between threads required intervention by the operating system, which executed kernel instructions to save the state of the current thread and then chose another thread to schedule on the processor. So only one thread was ever executing on a given processor at any time.

The operating system, such as Solaris, records the elapsed time that each thread spends running on a processor, and treats the processor as busy (utilised) for that period. This elapsed time is added to two running totals: the CPU consumption of the thread, and the busy time of the processor, from which its overall utilisation is calculated.
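As a rough illustration of this bookkeeping, here is a minimal Python sketch. The names (ThreadStats, ProcessorStats, account_run) are invented for this example and are not Solaris interfaces; the real kernel accounting is certainly more involved.

```python
from dataclasses import dataclass


@dataclass
class ThreadStats:
    cpu_time: float = 0.0   # running total of CPU consumed by the thread


@dataclass
class ProcessorStats:
    busy_time: float = 0.0  # running total of time the processor was busy


def account_run(thread, processor, start, end):
    """Charge one on-processor interval to both the thread and the processor."""
    elapsed = end - start
    thread.cpu_time += elapsed
    processor.busy_time += elapsed


def utilisation(processor, interval):
    """Fraction of the observation interval during which the processor was busy."""
    return processor.busy_time / interval


# Hypothetical example: one thread runs for 2s, is descheduled, then runs for 1s,
# all within a 10s observation interval.
t, p = ThreadStats(), ProcessorStats()
account_run(t, p, 0.0, 2.0)
account_run(t, p, 5.0, 6.0)
print(t.cpu_time, utilisation(p, 10.0))
```

Note that both totals are driven purely by scheduling: nothing here says what the thread actually achieved while it was on the processor.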

With a Niagara processor the operating system can still record the same pieces of data about each thread it schedules on each execution core, and they are still valid (i.e. they reflect the scheduling performed by the operating system). However, they no longer tell the full story. Some things are missing which could be very relevant for a multi-threaded processor such as Niagara. We will now also be interested in the efficiency of the processor – how much of its potential capacity has been used and how much is spare – and in how effectively a thread has been able to execute on it.

A Niagara processor will have 8 execution cores. Each of these execution cores will be able to concurrently deal with 4 threads. So Solaris will see 32 separate processors for scheduling threads on. However, there are in reality only 8 execution cores, not 32. Each execution core is being shared by 4 separate threads, with the execution core switching between their instruction streams during relatively long memory accesses by one thread to keep itself efficiently utilised. But at any instant in time the execution core is only executing one instruction at a time, not four instructions at a time.
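To make the arithmetic concrete, the sketch below models the 8 x 4 layout. It assumes, purely for illustration, that the logical processors are numbered contiguously, so that ids 0-3 share core 0, ids 4-7 share core 1, and so on. How Solaris will actually number them is not something I know; the function names are mine.

```python
CORES = 8             # physical execution cores in the processor
THREADS_PER_CORE = 4  # hardware threads sharing each core


def logical_cpu_count():
    """Number of processors the operating system sees for scheduling."""
    return CORES * THREADS_PER_CORE


def core_of(logical_cpu):
    """Physical core behind a logical processor (assumed contiguous numbering)."""
    if not 0 <= logical_cpu < logical_cpu_count():
        raise ValueError("no such logical processor")
    return logical_cpu // THREADS_PER_CORE


def siblings(logical_cpu):
    """All logical processors that share an execution core with this one."""
    first = core_of(logical_cpu) * THREADS_PER_CORE
    return list(range(first, first + THREADS_PER_CORE))


print(logical_cpu_count())  # 32 schedulable processors, but only 8 cores
print(core_of(5), siblings(5))
```

Two threads placed on logical processors 4 and 5 look independent to the scheduler, but under this numbering they would be competing for the same execution core.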

Four threads concurrently executing on a Niagara processor can behave differently depending on which execution cores the operating system schedules them on. If the four threads are each scheduled onto different execution cores, and presuming that there are no other active threads, then there is no interference between the execution of the threads. This is similar to having four separate single-threaded processors.

If, instead, the four threads are all scheduled onto the same execution core, then they will be sharing that execution core, and there will be some form of interference between them, as each in turn has its instructions executed while the other threads access data from memory.

Due to this sharing of the execution core, I would now like to know about the utilisation of the physical execution cores in the Niagara processor, as well as the normal thread level scheduling. I would like to know about the Niagara's utilisation both in terms of number of concurrently executing threads, and overall efficiency (how many potential execution cycles were successfully used and how many were not due to lack of a ready instruction to execute).
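The two measures I have in mind can be written down in a few lines. This is only a sketch: it assumes that per-hardware-thread busy times and per-core cycle counts are available from somewhere, which is exactly the open question, and the function and parameter names are hypothetical.

```python
def core_metrics(thread_busy_times, interval, issued_cycles, total_cycles):
    """Return (thread occupancy, execution efficiency) for one execution core.

    thread_busy_times: time each hardware thread had a software thread scheduled
    interval:          length of the observation period (same unit as above)
    issued_cycles:     cycles in which the core actually executed an instruction
    total_cycles:      cycles available to the core in the period
    """
    occupancy = sum(thread_busy_times) / (len(thread_busy_times) * interval)
    efficiency = issued_cycles / total_cycles
    return occupancy, efficiency


# Hypothetical figures: all four hardware threads were scheduled for the whole
# 10s period, yet the core executed an instruction in only 400 of 1000 cycles.
occupancy, efficiency = core_metrics([10.0, 10.0, 10.0, 10.0], 10.0, 400, 1000)
print(occupancy, efficiency)
```

With these made-up figures the scheduling view says the core is fully occupied, while the efficiency view says more than half of its execution cycles went unused.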

Obviously, if an execution core is not at maximum efficiency and has unused execution cycles due to long memory accesses, then it is worth adding an extra thread of execution to it. If it is at maximum efficiency already then we would be better off adding in extra, separate execution cores to handle extra threads.
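A toy simulation shows the effect. It is a deliberately crude model, not a description of how Niagara really works: every thread is assumed to execute a fixed number of instructions and then stall on memory for a fixed number of cycles, the core executes at most one instruction per cycle, and it picks among ready threads in round-robin order. All names and numbers are invented for the example.

```python
def simulate_core(num_threads, compute_cycles, stall_cycles, total_cycles):
    """Return the fraction of cycles in which the core executed an instruction."""
    remaining = [compute_cycles] * num_threads  # instructions left before next stall
    ready_at = [0] * num_threads                # cycle at which each thread is ready
    next_start = 0                              # round-robin starting point
    used = 0
    for cycle in range(total_cycles):
        for offset in range(num_threads):
            t = (next_start + offset) % num_threads
            if ready_at[t] <= cycle:
                used += 1                       # one instruction executed this cycle
                remaining[t] -= 1
                if remaining[t] == 0:           # thread now waits on memory
                    ready_at[t] = cycle + 1 + stall_cycles
                    remaining[t] = compute_cycles
                next_start = (t + 1) % num_threads
                break                           # only one instruction per cycle
    return used / total_cycles


# One instruction followed by a three-cycle memory stall, over 400 cycles.
for n in (1, 2, 4, 8):
    print(n, simulate_core(n, 1, 3, 400))
```

Under these assumptions the core's efficiency rises as threads are added until the stalls are fully covered, and adding threads beyond that point gains nothing: they simply wait for each other. That is the point at which extra execution cores become the better investment.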

Knowing this information would help us make the right decisions on how to size our systems, and judge how much extra work they could take on. So the big question is, how is Solaris going to actually measure and report both CPU utilisation and thread CPU usage (consumption) on multi-threaded Niagara processors? This is another interesting set of challenges arising from the new design principles behind Niagara.

( Jul 25 2005, 02:33:44 PM BST )
