Sameer Seth's Weblog

Wednesday May 31, 2006

Solaris kernel is pre-emptible by default

It is a fact that the Solaris kernel can be pre-empted at any point in time. To verify this, we fortunately had a system call that causes the kernel to loop in while(1) forever, and we had a 2-CPU machine. We made that system call from an application, which caused CPU 1 to loop forever. We then started a while(1) loop in user land, gave the process *real time* priority and asked the kernel to bind it to CPU 1, on which the kernel was already looping in while(1). The moment we did this, the system hung. The cause of the hang was that the while(1) inside the kernel got *pre-empted* and was switched to CPU 2, while CPU 1 started running the *user land while(1) with real time priority*. As a result, both CPUs were looping in while(1).

This makes it clear that the Solaris kernel is pre-emptible between CPUs by default. The next question was: when the kernel is looping on a CPU, who forces it to switch to a different CPU? It turned out that a check for pre-emption is made on return from an *interrupt*. If a thread with higher priority than the current thread is runnable on the CPU, the OS forces the current thread to switch to a different CPU. That is how the kernel loop got moved to the other CPU.
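
The check on return from interrupt can be pictured with a small toy model. This is only a sketch in Python, not Solaris code: the `Thread` class, the `on_interrupt_return` function and the priority numbers are all made up for illustration, and the real dispatcher is far more involved.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Thread:
    name: str
    priority: int                    # assumption: larger number = higher priority
    bound_cpu: Optional[int] = None  # None = may run on any CPU


def on_interrupt_return(cpu: int, current: Thread,
                        runnable: List[Thread]) -> Tuple[Thread, Optional[Thread]]:
    """Pick the thread that continues on `cpu` after an interrupt.

    Returns (thread to run, thread that was pre-empted or None).
    """
    # Only threads that are allowed to run on this CPU compete for it.
    candidates = [t for t in runnable if t.bound_cpu in (None, cpu)]
    if not candidates:
        return current, None
    best = max(candidates, key=lambda t: t.priority)
    if best.priority > current.priority:
        return best, current   # the current thread has to move to another CPU
    return current, None


# The experiment from this post, with illustrative priority values.
kernel_loop = Thread("kernel while(1)", priority=10)
rt_loop = Thread("user while(1), real time", priority=100, bound_cpu=1)

running, preempted = on_interrupt_return(1, kernel_loop, [rt_loop])
print(running.name, "now runs on CPU 1;", preempted.name, "has to move")
```

In the experiment the pre-empted kernel loop then landed on CPU 2, which is why both CPUs ended up spinning.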

To summarise:

1. The Solaris kernel can be preempted if some higher priority thread (RT) is runnable on the same CPU.
2. To avoid this, we have to explicitly ask the kernel not to preempt us while we are executing critical code.
3. If we are holding a spin lock in the kernel, we won't get pre-empted.
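
The three points can be folded into one predicate. Again this is a hypothetical Python sketch rather than kernel code: `may_preempt` and its parameters are names invented here, and the spin-lock rule simply encodes point 3 as stated.

```python
def may_preempt(current_pri: int, runnable_pri: int,
                preempt_disabled: bool = False,
                spin_locks_held: int = 0) -> bool:
    """Toy version of the summary; larger number = higher priority (assumption)."""
    if preempt_disabled:        # point 2: we asked the kernel not to preempt us
        return False
    if spin_locks_held > 0:     # point 3: spin lock holders are not preempted
        return False
    return runnable_pri > current_pri   # point 1


print(may_preempt(10, 100))                         # plain kernel code
print(may_preempt(10, 100, preempt_disabled=True))  # critical code
print(may_preempt(10, 100, spin_locks_held=1))      # inside a spin lock
```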

A small question that comes to mind here: what would have happened if there had been only one CPU? I am not sure that Solaris would come back to user land if a while(1) loop is started in the kernel. It was only because there were 2 CPUs on the system that the kernel while(1) got switched to another CPU. To verify this, we can run a shell with real time priority on a single-CPU machine. From this shell we can start the same application, which invokes the system call that starts the while(1) loop within the kernel. If we are able to come back to the shell prompt, this confirms that the kernel was preempted for the higher priority process. If so, the kernel is completely preemptible in the following situation:
- Kernel code is executing without any lock held
Otherwise we can assume that there are pre-emption points in the Solaris kernel (when releasing locks etc.) where the check is made to yield the CPU.

The kernel may also be preempted while holding a lock. If a kernel control path is holding multiple locks, it may be forced to yield the CPU when it releases the first lock while the other locks are still held. To avoid this, we may explicitly set flags that tell the kernel not to yield.



Linux 2.4, on the other hand, is non-preemptible by default. A kernel control path has to yield by itself to give up the CPU, or it gives up the CPU when it goes to sleep on some wait queue. Linux 2.6, in contrast, can be configured to be preemptible. While returning from an interrupt/exception, it checks whether scheduling is required to yield the CPU to a higher priority task, even if it was the kernel that was interrupted. If the kernel code was holding a spin lock, was inside the scheduler, or was handling soft IRQs, it is not preempted; otherwise it can be. When a spin lock is released, a check is also made whether pre-emption is required, i.e. whether any higher priority thread is runnable on the CPU.
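
One way to picture the 2.6 rules is a per-CPU nesting counter plus a "reschedule needed" flag, in the spirit of the kernel's preempt_count and need_resched. The Python class below is only a toy model with invented method names; it ignores soft IRQs and the scheduler itself, which would raise the counter in the same way as a spin lock does here.

```python
class Cpu:
    """Toy model of preemption bookkeeping on one CPU (not kernel code)."""

    def __init__(self):
        self.preempt_count = 0     # > 0 means "do not preempt right now"
        self.need_resched = False  # a higher priority task is waiting
        self.switches = 0          # how many times we actually preempted

    def _maybe_schedule(self):
        # Preempt only when nothing forbids it and somebody is waiting.
        if self.preempt_count == 0 and self.need_resched:
            self.need_resched = False
            self.switches += 1

    def spin_lock(self):
        self.preempt_count += 1

    def spin_unlock(self):
        self.preempt_count -= 1
        self._maybe_schedule()     # the check made when a spin lock is released

    def interrupt_return_to_kernel(self, higher_priority_runnable: bool):
        if higher_priority_runnable:
            self.need_resched = True
        self._maybe_schedule()     # the check made on return from interrupt


cpu = Cpu()
cpu.spin_lock()
cpu.spin_lock()                                # two locks held
cpu.interrupt_return_to_kernel(True)           # deferred: locks are held
cpu.spin_unlock()                              # still one lock held
print("switches with one lock left:", cpu.switches)
cpu.spin_unlock()                              # last lock released
print("switches after last unlock:", cpu.switches)
```

Because the counter nests, releasing the first of several locks does not trigger the switch; only the release that brings the counter back to zero does.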

Normally, while returning from an interrupt/exception, Linux just checks whether the CPU was running in kernel mode before the event (from the CS register). If so, it simply resumes the kernel from where it had left off, even if the timer interrupt has occurred and the time slice of the current process has elapsed.
An attempt has been made to make the MontaVista Linux kernel fully pre-emptible.

Beware: Solaris kernel programmers have to be careful while designing their code, because they need to know the points where the kernel should not be pre-empted. Linux kernel programmers, in contrast, can be more relaxed as far as pre-emption is concerned, because they know that only they themselves can preempt the kernel. In the 2.6 Linux kernel, pre-emption points are designed such that pre-emption happens only after critical code has executed. ( May 31 2006, 08:31:49 AM PDT )

Thursday May 11, 2006

TCP/IP overview on Solaris

Introduction

This write-up gives an overview of the TCP/IP stack implementation on Solaris. The discussion starts with some STREAMS concepts such as queues and messages. It then explains how TCP/IP is built as modules that fit into the STREAMS framework. After that we see how packets traverse up and down the stream in the stack as messages. There is a brief discussion of the SYNC queue (STREAMS framework) and how it is used for asynchronously processing TCP/IP messages, followed by a short discussion of the service routines that process the messages on a queue's message queue when the queue is not able to pass a message on to the next module. There is also a short discussion of the squeue processing related to the FireEngine design. Finally we see how normal data and TCP urgent data (OOB data) are processed at the TCP, stream head and sockfs levels. The write-up is based entirely on knowledge gained through experience, code browsing and reading some documents. I'd say it gives a fair idea of the TCP/IP implementation on Solaris, though there may be errors in my understanding of the subject. There is plenty of scope for modification, and comments are always welcome. For a better understanding of STREAMS concepts, please refer to the STREAMS Programming Guide available at docs.sun.com. For a better understanding of TCP/IP concepts, please refer to TCP/IP Illustrated, Vol. 1 by W. Richard Stevens.

Please find the complete writeup here...

Important kernel data structures related to TCP/IP and to kernel streams/squeues, and how they are linked, can be found here...



Some of the commands to debug TCP/IP on Solaris using the 'scat' crash analyser
On Solaris 10, scat is a debugger used to analyse live kernel memory or a crash dump. I'd like to introduce some scat commands that can be used to debug streams and the TCP/IP stack.
scat> stream -s
The output of this command lists all the streams currently active on the system. Each entry shows the stream pointer, the modules stacked on the stream, the messages in the queue/syncq, and QFULL information.
scat> stream findproc [stream address]
The stream address can be taken from the 'stream -s' output and used to find the process to which the stream belongs. The output contains the process name, the file descriptor associated with the stream, and the vnode address associated with the stream.
scat> sdump [stream vnode address] sonode
From the vnode address associated with the stream (from the 'stream findproc' output) we can get to the sonode structure. The sonode contains all the information about the socket.
scat> stream -l [stream address]
We can use the stream address to find out the details of the stream. This command gives detailed information about all the modules stacked on the stream, the details of the queue_t structure for each module (including the stream head), a complete list of the messages queued on each queue, and the state of the queues of all modules and of the stream head. The q_ptr field of the queue_t structure points to the private data of the module. For the TCP module, q->q_ptr points to the conn_s structure. conn_s is the connection structure for TCP and contains all connection-specific information. The conn_tcp field of the conn_s structure points to the tcp_t structure for the TCP connection. For all of this we need the address of TCP's queue_t, which we can get from the 'stream -l' output. We can then use the sdump command of scat to examine the queue_t, conn_s and tcp_t structures in the following way:

for queue_t structure

scat> sdump [TCP queue address] queue_t


for TCP conn_s structure

scat> sdump [TCP queue address] queue_t q_ptr
this will give us address of conn_s structure for the TCP

scat> sdump [conn_s address] conn_s
here we get the dump of the conn_s structure for the TCP.

for TCP tcp_t structure

scat> sdump [conn_s address] conn_s conn_tcp
this will give us address of tcp_t structure for the TCP

scat> sdump [tcp_t address] tcp_t
here we get the dump of the tcp_t structure for the TCP.
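
The sdump sequence above just follows two pointers: q_ptr of the TCP queue leads to the conn_s, and conn_tcp of the conn_s leads to the tcp_t. Here is a small Python model of that chain. Only the q_ptr and conn_tcp field names come from the text above; the class names and the string-valued `tcp_state` field are placeholders for illustration, not the real structure layout.

```python
from dataclasses import dataclass


@dataclass
class TcpT:             # stands in for tcp_t
    tcp_state: str      # placeholder field, for illustration only


@dataclass
class ConnS:            # stands in for conn_s
    conn_tcp: TcpT      # points to the tcp_t of this connection


@dataclass
class QueueT:           # stands in for the TCP module's queue_t
    q_ptr: ConnS        # module private data; for TCP this is the conn_s


def tcp_from_queue(q: QueueT) -> TcpT:
    """Follow queue_t -> conn_s -> tcp_t, like the sdump steps above."""
    conn = q.q_ptr          # sdump [TCP queue address] queue_t q_ptr
    return conn.conn_tcp    # sdump [conn_s address] conn_s conn_tcp


tcp_queue = QueueT(q_ptr=ConnS(conn_tcp=TcpT(tcp_state="ESTABLISHED")))
print(tcp_from_queue(tcp_queue))
```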

Note: in the entire write-up, queue maps to queue_t, message block maps to mblk_t, and data block maps to dblk_t. I have not used the actual field names of the data structures, just names that denote the fields. For example, readp and writep are mentioned for the message block; these correspond to the read pointer and write pointer of the mblk_t. opensolaris.org is the best place to browse the source code.

( May 11 2006, 04:44:03 AM PDT )

