Michael Haines's Weblog

xVM Server (Hypervisor) DTrace and FMA
xVM Server and DTrace

We (Sun) have developed a new 'xpv' DTrace provider for xVM Server. Why is this important? Because the hypervisor is isolated from dom0, there is currently no way to apply DTrace directly to the hypervisor. The new xpv provider instead allows you to trace the interaction between dom0 and the hypervisor. It consists of SDT probes added to the privcmd device driver (which is installed as part of Solaris). You can list the available probes with the following command:
# dtrace -l -i 'xpv:::'

Understanding the details of these probes requires a fair amount of knowledge about the Solaris xVM Server interface, but simply enabling all of them gives you a quick, high-level introduction to the steps involved in creating, destroying, and migrating domains. To enable every probe, run:
# dtrace -n 'xpv::: {}'

We developed the provider primarily for debugging: it lets you monitor how the control tools interact with the hypervisor and, specifically, which hypercalls they execute.
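Because the default output prints one probe per line in the form `CPU ID FUNCTION:NAME`, a trace saved to a file (for example with `dtrace -o trace.txt -n 'xpv::: {}'`) can be post-processed to see which hypercalls were made. The Python sketch below is a hypothetical helper, not part of xVM Server or DTrace. It assumes the privcmd hypercall wrappers are named `privcmd_HYPERVISOR_<hypercall>`, as in the example traces in this post, and counts one call per `-start` probe.

```python
import re
from collections import Counter

# Default dtrace output line: "<CPU> <ID> <FUNCTION>:<NAME>".
# Header lines ("CPU ID FUNCTION:NAME") and blank lines do not match.
PROBE_LINE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\S+):(\S+)\s*$")

# Assumption: hypercall wrappers in privcmd share this prefix (as in the traces shown).
HCALL_PREFIX = "privcmd_HYPERVISOR_"


def parse_probe(line):
    """Return (cpu, probe_id, function, name), or None for non-probe lines."""
    m = PROBE_LINE.match(line)
    if m is None:
        return None
    cpu, probe_id, function, name = m.groups()
    return int(cpu), int(probe_id), function, name


def count_hypercalls(lines):
    """Count '-start' probes per hypercall, keyed by wrapper name minus the prefix."""
    counts = Counter()
    for line in lines:
        rec = parse_probe(line)
        if rec is None:
            continue
        _, _, function, name = rec
        # Only hypercall wrappers are counted; e.g. do_privcmd_mmap is skipped.
        if function.startswith(HCALL_PREFIX) and name.endswith("-start"):
            counts[function[len(HCALL_PREFIX):]] += 1
    return counts
```

For example, `count_hypercalls(open("trace.txt")).most_common()` lists the hypercalls from most to least frequent.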

An 'xm destroy <domain>' operation triggers the following probes:
CPU     ID                    FUNCTION:NAME
  1       6315                 privcmd_HYPERVISOR_domctl:dom-destroy-start
  1       6312                 privcmd_HYPERVISOR_domctl:dom-destroy-end

An 'xm create <domain>' gives you this:
CPU     ID                    FUNCTION:NAME
  1       6314                 privcmd_HYPERVISOR_domctl:dom-create-start
  1       6313                 privcmd_HYPERVISOR_domctl:dom-create-end
  1       6298                 privcmd_HYPERVISOR_event_channel_op:evtchn-op-start
  1       6297                 privcmd_HYPERVISOR_event_channel_op:evtchn-op-end
  1       6298                 privcmd_HYPERVISOR_event_channel_op:evtchn-op-start
  1       6297                 privcmd_HYPERVISOR_event_channel_op:evtchn-op-end
  1       6308                 privcmd_HYPERVISOR_memory_op:set-memory-map-start
  1       6299                 privcmd_HYPERVISOR_memory_op:set-memory-map-end
  1       6304                 privcmd_HYPERVISOR_memory_op:populate-physmap-start
  1       6301                 privcmd_HYPERVISOR_memory_op:populate-physmap-end
  1       6325                 do_privcmd_mmap:mmap-start
  1       6324                 do_privcmd_mmap:mmap-entry
  1       6323                 do_privcmd_mmap:mmap-end
[... about 100000 lines deleted ...]
  1       6325                 do_privcmd_mmap:mmap-start
  1       6324                 do_privcmd_mmap:mmap-entry
  1       6323                 do_privcmd_mmap:mmap-end
  1       6318                 privcmd_HYPERVISOR_domctl:setvcpucontext-start
  1       6309                 privcmd_HYPERVISOR_domctl:setvcpucontext-end
  1       6317                 privcmd_HYPERVISOR_domctl:dom-unpause-start
  1       6310                 privcmd_HYPERVISOR_domctl:dom-unpause-end

Note: The output shown above came from running the following command:
# dtrace -n 'xpv::: {}'
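Since nearly all of these probes come in `-start`/`-end` pairs, the very long `xm create` trace can be reduced to a compact sequence of operations. The sketch below is again a hypothetical Python helper, not part of any Sun tool. It assumes the probes fire sequentially (as they do in the single-CPU trace above), checks that every `-start` is closed by the matching `-end`, ignores intermediate probes such as `mmap-entry`, and run-length encodes consecutive repeats such as the long run of `mmap` operations.

```python
import re
from itertools import groupby

# Default dtrace output line: "<CPU> <ID> <FUNCTION>:<NAME>"; captures FUNCTION and NAME.
PROBE_LINE = re.compile(r"^\s*\d+\s+\d+\s+(\S+):(\S+)\s*$")


def completed_ops(lines):
    """Yield operation names (probe name minus '-start'/'-end') as each one completes.

    A stack tolerates nested operations; a ValueError is raised for an '-end'
    without its '-start', or for operations still open when the trace ends.
    """
    stack = []
    for line in lines:
        m = PROBE_LINE.match(line)
        if m is None:
            continue  # header, blank line, or elision marker
        name = m.group(2)
        if name.endswith("-start"):
            stack.append(name[: -len("-start")])
        elif name.endswith("-end"):
            op = name[: -len("-end")]
            if not stack or stack[-1] != op:
                raise ValueError(f"unmatched end probe: {name}")
            stack.pop()
            yield op
        # any other probe (e.g. 'mmap-entry') is ignored
    if stack:
        raise ValueError(f"unterminated operations: {stack}")


def collapse(ops):
    """Run-length encode consecutive identical operations as (op, count) pairs."""
    return [(op, sum(1 for _ in group)) for op, group in groupby(ops)]
```

Applied to the part of the `xm create` trace shown above, `collapse(completed_ops(lines))` yields `[('dom-create', 1), ('evtchn-op', 2), ('set-memory-map', 1), ('populate-physmap', 1), ('mmap', 2), ('setvcpucontext', 1), ('dom-unpause', 1)]`; on the full trace the `mmap` count would be far larger.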

xVM Server and FMA

As of approximately one month ago (I need to get an update on this for you, which I will do), FMA support was completely non-existent under xVM Server! In practice, that means essentially none of the FMA functionality works right now.

The hypervisor presents virtualized hardware to us and does not readily let us look under the hood. For example, logging out error telemetry requires a few register reads. By default, however, xVM Server does not let even dom0 read those MSRs (model-specific registers), and even if it did, two back-to-back MSR reads are not guaranteed to come from the same physical CPU, because your vCPU could migrate between them. As a result, we cannot read CPU/memory error telemetry, cannot perform diagnosis, and therefore cannot offline CPUs and the like. Resources that have been faulted and retired while running on bare metal will not be retired when you boot xVM Server.

Similarly, FMA support for PCIe is hampered by the lack of MSI (message signaled interrupt) support in xVM Server: the host bridge has no way to raise an interrupt to the CPU when it observes an error. Some telemetry can be gleaned after the fact (when polling), but much of it is lost and full diagnosis cannot be done.

In the recent project putback to snv_76, engineering laid the foundations for reusing the existing x86 FMA code under xVM Server. As far as I know, we hope to deliver initial FMA support for xVM Server, covering at least CPU and memory, fairly quickly. To my knowledge, however, work has not yet begun; as I said above, I need to get an up-to-date status on this. So, stand by...


@ 10:19 AM GMT+00:00