Monday August 20, 2007

lx brand internals - part1

If you are here, I assume that you have already read the BrandZ design document, available at http://opensolaris.org/os/community/brandz/design/, and are interested in the real low-level details of how the lx brand does its magic. If so, continue...

This post may get modified in the future as my understanding of things improves :-)

Now that you have an lx zone [say, lx1] installed, what happens when you issue 'zoneadm -z lx1 boot'?
/usr/lib/zones/zoneadmd is the workhorse. It does all the heavy lifting with respect to setting up the virtual platform; within it, server() is the routine that does the actual work.
The zsched process of a zone is the equivalent of sched in the global zone: all kernel threads related to the zone hang off the zsched process.
zoneadmd creates zsched through server()->zone_ready()->vplat_create()->zone_create()->newproc()
zsched() launches zone_start_init(), which will eventually launch the 'init' of the installed lx brand [say, the CentOS 3.0 init].
brand_platform_iter_gmounts() mounts the file systems listed in the config file /usr/lib/brand/lx/platform.xml [entries starting with the keyword 'global'].
There are no Inherited-Package-Dirs for lx brands.
The rest of the file systems from the same XML file [entries starting with the keyword 'mount'] are mounted next.
The boot/halt scripts allow the brand to perform any custom configuration, etc.

lx brand boot callback: lx_support is called from zone_bootup()->do_subproc() to do any additional setup before the zone is booted, say, setting the 'restart' attribute for init.
Then the zone_boot() system call is made. As the comment in zone.c says, it doesn't need to do any work other than find the status set by the init() of the brand.

Understanding how the brand's init gets started sheds a lot of light on how BrandZ works:
zone_start_init() execs /sbin/init of the brand.
elfexec() realizes that this is a branded process, and hence execs the helper library:
    /Brandz/lx1/root/native/usr/lib/lx_brand.so.1
through lx_elfexec(), and then maps in the Linux binary and its interpreter as follows:
 
CPU     ID                    FUNCTION:NAME
  1  48637                  elf32exec:entry   zsched              vp: /Brandz/lx1/root/sbin/init
              genunix`gexec+0x374
              genunix`exec_common+0x471
              genunix`exec_init+0x275
              genunix`start_init_common+0xfb
              genunix`zone_start_init+0x3d
              unix`thread_start+0x8

  1  48637                  elf32exec:entry   zsched                vp:  /Brandz/lx1/root/native/usr/lib/lx_brand.so.1
              lx_brand`lx_elfexec+0xfd
              elfexec`elf32exec+0xf3e
              genunix`gexec+0x374
              genunix`exec_common+0x471
              genunix`exec_init+0x275
              genunix`start_init_common+0xfb
              genunix`zone_start_init+0x3d
              unix`thread_start+0x8

  1  48627            mapexec32_brand:entry   zsched          vp: /Brandz/lx1/root/sbin/init
              lx_brand`lx_elfexec+0x169
              elfexec`elf32exec+0xf3e
              genunix`gexec+0x374
              genunix`exec_common+0x471
              genunix`exec_init+0x275
              genunix`start_init_common+0xfb
              genunix`zone_start_init+0x3d
              unix`thread_start+0x8

  1  48627            mapexec32_brand:entry   zsched           vp: /Brandz/lx1/root/lib/ld-2.3.2.so
              lx_brand`lx_elfexec+0x2aa
              elfexec`elf32exec+0xf3e
              genunix`gexec+0x374
              genunix`exec_common+0x471
              genunix`exec_init+0x275
              genunix`start_init_common+0xfb
              genunix`zone_start_init+0x3d
              unix`thread_start+0x8

The AT_ENTRY aux vector entry points to _start() of lx_brand.so.1; the Solaris linker jumps to it after its own bootstrapping.
exec_args() copies the arguments to the user stack. Execution in userland starts at _rt_boot(), and the stack looks something like this:

                       
                        ----------------
                        |              |  <-- %ebp
                        ----------------
                        |    argc      |
                        ----------------
                        |    argv[]    |
                        ----------------
                        |    NULL      |
                        ----------------
                        |    envp      |
                        ----------------
                        |    NULL      |
                        ----------------
                        |    auxv[]    |
                        ----------------
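
To make the layout above concrete, here is a minimal Python sketch that models the initial stack as a flat list of words and then locates AT_ENTRY by skipping over argv and envp. AT_ENTRY = 9 and AT_NULL = 0 are the standard ELF aux vector type codes; the helper names, the pointer values and the entry address are made up for illustration and do not come from the lx brand sources.

```python
# Standard ELF aux vector type codes.
AT_NULL = 0    # terminates the aux vector
AT_ENTRY = 9   # entry point the linker jumps to after bootstrapping

def build_initial_stack(argv_ptrs, envp_ptrs, auxv):
    """Return the words of the initial user stack, lowest address first."""
    words = [len(argv_ptrs)]          # argc
    words += argv_ptrs + [0]          # argv[] followed by NULL
    words += envp_ptrs + [0]          # envp[] followed by NULL
    for a_type, a_val in auxv:        # auxv[] as (type, value) pairs
        words += [a_type, a_val]
    words += [AT_NULL, 0]             # AT_NULL entry ends the vector
    return words

def find_auxv(words, wanted):
    """Walk past argc, argv and envp, then scan auxv for `wanted`."""
    argc = words[0]
    i = 1 + argc + 1                  # skip argc, argv[] and its NULL
    while words[i] != 0:              # skip envp[] ...
        i += 1
    i += 1                            # ... and its NULL
    while words[i] != AT_NULL:
        if words[i] == wanted:
            return words[i + 1]
        i += 2
    return None

# Hypothetical values: two argv pointers, one envp pointer, and a made-up
# address standing in for _start() of lx_brand.so.1.
HYPOTHETICAL_START = 0xfefd0000
stack = build_initial_stack([0x8047f00, 0x8047f10], [0x8047f20],
                            [(AT_ENTRY, HYPOTHETICAL_START)])
entry = find_auxv(stack, AT_ENTRY)
```

With the lx brand, the value found this way is the address of _start() in lx_brand.so.1 rather than the entry point of the Linux binary.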


When the linker turns over control to the entry point of the executable [i.e. _start() in our case], the stack looks exactly as it did when the kernel gave control to the linker, i.e. at _rt_boot():
_rt_boot()->setup()->elf_entry_pt()->_start()

_start() calls lx_init(), which registers the user-space handler with the lx brand module.
The handler is lx_handler_table(), which looks like this:

LM2`lx_brand.so.1`lx_handler_table:             pushl  $0x0
LM2`lx_brand.so.1`lx_handler_table+5:           jmp    +0x1112  <LM2`lx_brand.so.1`lx_handler>
LM2`lx_brand.so.1`lx_handler_table+0xa:         nop   
LM2`lx_brand.so.1`lx_handler_table+0xb:         nop   
LM2`lx_brand.so.1`lx_handler_table+0xc:         nop   
LM2`lx_brand.so.1`lx_handler_table+0xd:         nop   
LM2`lx_brand.so.1`lx_handler_table+0xe:         nop   
LM2`lx_brand.so.1`lx_handler_table+0xf:         nop   
LM2`lx_brand.so.1`lx_handler_table+0x10:        pushl  $0x10
LM2`lx_brand.so.1`lx_handler_table+0x15:        jmp    +0x1102  <LM2`lx_brand.so.1`lx_handler>
LM2`lx_brand.so.1`lx_handler_table+0x1a:        nop   
LM2`lx_brand.so.1`lx_handler_table+0x1b:        nop

Where we jump into this table from the kernel is dictated by the system call made. lx_handler() takes the system call number as argument and proceeds to emulate it.
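
The disassembly above can be checked with a little arithmetic. The sketch below assumes the usual 32-bit x86 encodings (a 5-byte pushl with a 32-bit immediate, and a 5-byte jmp whose 32-bit displacement is measured from the end of the jmp). Under that assumption, both jumps shown land on the same lx_handler address, and each 16-byte slot needs six bytes of nop padding. The function names are illustrative only.

```python
ENTRY_SIZE = 16   # bytes per slot, from the +0x0 / +0x10 spacing in the dump
PUSH_LEN = 5      # assumed: pushl imm32 (1 opcode byte + 4 immediate bytes)
JMP_LEN = 5       # assumed: jmp rel32 (1 opcode byte + 4 displacement bytes)

def slot_offset(index):
    """Offset of slot `index` from the start of the table."""
    return index * ENTRY_SIZE

def jmp_target(entry_offset, displacement):
    """Offset (from the start of the table) that the slot's jmp lands on."""
    jmp_offset = entry_offset + PUSH_LEN
    return jmp_offset + JMP_LEN + displacement

# Displacements copied from the two jmp lines in the disassembly.
target_slot0 = jmp_target(slot_offset(0), 0x1112)
target_slot1 = jmp_target(slot_offset(1), 0x1102)
padding = ENTRY_SIZE - (PUSH_LEN + JMP_LEN)

print(hex(target_slot0), hex(target_slot1), padding)  # 0x111c 0x111c 6
```

Both slots resolve to table offset 0x111c, which is consistent with every slot funnelling into the single lx_handler() routine, and the six padding bytes match the six nops in the dump. Note also that in the dump the value each slot pushes (0x0, 0x10) equals that slot's own offset in the table.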

OK, so now the question is: how do we initially get into the kernel when an lx process makes a system call?

32-bit Linux uses 'int 80' (interrupt vector 0x80) to make a system call; BrandZ replaces the handler with its own for branded processes:
  - brand_interpositioning_enable() installs the new handler on a context switch
  - brand_interpositioning_disable() restores the old handler while switching out
  - The new handler is: brand_sys_int80
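
As a rough mental model of the two routines above (not the kernel code itself), think of the int80 handler as one slot in a per-CPU table that is rewritten around branded threads. Everything in this Python sketch other than the name brand_sys_int80 is hypothetical: the Cpu and Thread classes, the switch_in/switch_out helpers and the 'default_int80' placeholder.

```python
INT80 = 0x80  # vector used by 32-bit Linux for system calls

class Thread:
    def __init__(self, name, branded):
        self.name = name
        self.branded = branded

class Cpu:
    def __init__(self):
        # Placeholder for whatever handler the kernel normally installs.
        self.handlers = {INT80: "default_int80"}
        self.saved = None

def switch_in(cpu, thread):
    """Models brand_interpositioning_enable(): install the brand handler."""
    if thread.branded:
        cpu.saved = cpu.handlers[INT80]
        cpu.handlers[INT80] = "brand_sys_int80"

def switch_out(cpu, thread):
    """Models brand_interpositioning_disable(): restore the old handler."""
    if thread.branded and cpu.saved is not None:
        cpu.handlers[INT80] = cpu.saved
        cpu.saved = None

cpu = Cpu()
lx_thread = Thread("lx init", branded=True)
native_thread = Thread("native sh", branded=False)

switch_in(cpu, lx_thread)
during_lx = cpu.handlers[INT80]      # brand handler while the lx thread runs
switch_out(cpu, lx_thread)
switch_in(cpu, native_thread)
during_native = cpu.handlers[INT80]  # untouched for a native thread
```

The point of doing this per context switch is that only branded processes ever see the replacement handler; native processes on the same CPU are unaffected.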
Let us take a closer look at brand_sys_int80. In this post we focus on the 64-bit Solaris kernel:

Upon int80, the stack looks like this when we enter brand_sys_int80() [refer to 5-27 in Vol. 3A of the Intel
programmer's manual]:
[1]> <rsp,5/Jn
0xffffff00046c3fd8:
                            feb888bd  <-- %EIP of linux app that caused the 'int 80'    
                            43            <-- Code Segment selector
                            212          <-- EFLAGS
                            8047cec   <-- %esp - stack pointer as of 'int 80'   
                            4b            <-- %SS
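
The five words in that dump can be decoded with the standard x86 layouts: a segment selector is index << 3 | table indicator << 2 | RPL, and in the flags word bit 9 is IF (interrupts enabled), bit 4 is AF and bit 1 is a reserved bit that always reads as 1. The sketch below applies that to the values shown above; the function names are mine.

```python
def decode_selector(sel):
    """Split an x86 segment selector into (index, table_indicator, rpl)."""
    return sel >> 3, (sel >> 2) & 1, sel & 3

# Flag bits relevant to the value in the dump.
FLAG_BITS = {1: "reserved(1)", 4: "AF", 9: "IF"}

def decode_flags(flags):
    """Names of the known flag bits set in `flags`, lowest bit first."""
    return [name for bit, name in sorted(FLAG_BITS.items())
            if flags & (1 << bit)]

# Values copied from the <rsp,5/Jn output above.
frame = {"ip": 0xfeb888bd, "cs": 0x43, "flags": 0x212,
         "sp": 0x8047cec, "ss": 0x4b}

cs = decode_selector(frame["cs"])
ss = decode_selector(frame["ss"])
flags = decode_flags(frame["flags"])
print(cs, ss, flags)  # (8, 0, 3) (9, 0, 3) ['reserved(1)', 'AF', 'IF']
```

Both selectors carry RPL 3, as expected for an interrupt taken from user mode, and IF shows that interrupts were enabled in the application at the time of the 'int 80'.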

brand_sys_int80() switches to the kernel stack of the running thread, retrieves a lot of brand information,
and then jumps to lx_brand_int80_callback(). At the time of the jump, the stack looks like this:

[1]> <rsp,8/Jn
0xffffff00046c3ee8:
                           feb888bd              <-- return address into app's user land
                           ffffff014ba301c0    <-- Pointer to proc brand data structure
                           ffffff0152573708    <-- lwp brand data
                           ffffff00046c3fd8     <-- Same as above 'rsp'
                           fffffffff7cddc60        <-- lx_brand`lx_brand_int80_callback()

lx_brand_int80_callback() takes the system call number present in %rax and left-shifts it by 4 [as each entry
in lx_handler_table() is 16 bytes long] to get an offset into the table; the table address plus that offset is the
address it needs to jump to in order to emulate the system call. It then swaps the return address available on the
stack with this newly computed address, so that when it issues the return from interrupt [iretq], execution starts
off at the new address. The actual userland address from the application is saved into %rax.
The stack as of iretq:
[1]> <rsp,6/Jn
0xffffff00046c3fd8:            
                           fefddb40       
                           43             
                           212            
                           8047cec        
                           4b   
[1]> <rax=J
                           feb888bd

See, %rax now holds what used to be on the stack! And what is on the stack is the address in lx_handler_table()
to jump to. From here on it executes the emulation code, which is the subject of the next post...
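
The address computation and return-address swap described above can be modelled in a few lines. This is a sketch of the idea rather than the assembly in lx_brand_int80_callback(). The frame values are the ones from the dumps, but the post never shows the handler table's base address or the system call number, so HYPOTHETICAL_TABLE_BASE and HYPOTHETICAL_SYSCALL are made-up values picked only so that the arithmetic lands on the fefddb40 seen above.

```python
ENTRY_SHIFT = 4  # each lx_handler_table() entry is 16 bytes, i.e. 1 << 4

# Hypothetical: neither value appears in the post.
HYPOTHETICAL_TABLE_BASE = 0xfefdd000
HYPOTHETICAL_SYSCALL = 0xb4

def int80_callback(frame, rax, table_base):
    """Return (new_frame, new_rax) after redirecting the return address.

    frame is [ip, cs, flags, sp, ss] as pushed by the CPU; rax holds the
    Linux system call number on entry.
    """
    handler_addr = table_base + (rax << ENTRY_SHIFT)
    saved_return = frame[0]
    new_frame = [handler_addr] + frame[1:]
    return new_frame, saved_return   # iretq resumes at handler_addr

# Frame as dumped on entry to brand_sys_int80().
frame_in = [0xfeb888bd, 0x43, 0x212, 0x8047cec, 0x4b]
frame_out, rax_out = int80_callback(frame_in, HYPOTHETICAL_SYSCALL,
                                    HYPOTHETICAL_TABLE_BASE)
```

Only the first word of the frame changes; the selectors, flags and user stack pointer are left alone, which matches the two dumps, and the original return address travels along in %rax.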
( Aug 20 2007, 02:38:51 AM PDT )