Monday Oct 13, 2008
If we refer to the System Topology diagram in my previous blog,
we find that the internal disks of the T5440 are connected to PCIe-0. Hence
it is not possible to remove PCIe-0 from the Primary (or Control)
Domain. However, it is possible to remove PCIe-1, PCIe-2 and PCIe-3 from
the Primary Domain and allocate them to IO Domains.
In order to create an IO domain using PCIe-1, PCIe-1 has to be removed from the Primary Domain. This would cause the Primary Domain to lose its primary network interface if it has been using the On-board NICs. However, if a network card is available on PCIe-0, the primary network for the Primary Domain can be switched to the ports on that card before removing PCIe-1 from the Primary Domain. If an additional network card is not available, it should still be possible to remove PCIe-1 from the Primary Domain and create an IO domain (let us call it the Secondary Domain) managing devices off PCIe-1. In such a case, the Primary Domain would provide the boot-disk service to the Secondary Domain, and the Secondary Domain would provide the primary network service for the Primary Domain. The pseudo-steps below outline how this can be done.
- In the Primary Domain
  - set the number of VCPUs to 8 (this is just an example number of VCPUs)
  - set the memory to 8GB (just an example size of memory)
  - create a vdisk-server
  - remove PCIe-1 from its control
    - This would cause the Primary Domain to lose its network after reboot
- Reboot the Primary Domain and log back into the Primary Domain from Console
- To cause VCPUs for the Secondary Domain to be allocated from T1 (refer to the Topology Diagram), create a dummy domain with the remaining 56 VCPUs from T0. Bind the dummy domain.
- Associate a vdiskserverdevice as the boot-device for Secondary Domain
- Create the Secondary Domain
  - set the number of VCPUs to 8
  - set the memory to 8GB
  - add PCIe-1 to it
  - add the vdiskserverdevice as the vdisk for this domain
- Bind, install-OS and boot the domain
- Create a vswitch-device on the Secondary Domain
- Reboot the Secondary Domain
- Create a vnet-device for the Primary Domain associated with the above vswitch-device
- Plumb and configure the vnet device on the Primary Domain (assuming the On-Board network ports are connected to the primary network of the Data Center). Now the Primary Domain should have the primary network available.
- Remove the dummy domain and proceed with creating other domains.
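The figure of 56 VCPUs for the dummy domain follows from simple arithmetic. The sketch below assumes that each chip exposes 64 hardware threads (8 cores with 8 threads each) and that VCPUs are handed out from T0 first, as the steps above imply. The function name is only illustrative.

```python
# Hardware threads on one chip: 8 cores x 8 threads per core (assumption
# based on the UltraSPARC T2 family description).
STRANDS_PER_CHIP = 64


def filler_vcpus(primary_vcpus, strands_per_chip=STRANDS_PER_CHIP):
    """Return how many VCPUs a placeholder domain must hold so that
    the first chip (T0) is fully used up."""
    if not 0 < primary_vcpus <= strands_per_chip:
        raise ValueError("the Primary Domain must fit on the first chip")
    return strands_per_chip - primary_vcpus


# With 8 VCPUs kept by the Primary Domain, the dummy domain takes the rest.
print(filler_vcpus(8))  # 56
```

Once the dummy domain is bound with that many VCPUs, the next domain created should draw its VCPUs from the following chip.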
With the above technique, when the Primary Domain is rebooted,
the Secondary Domain may seem to pause until the Primary Domain boots
back. Similarly when the Secondary Domain is rebooted, the Primary
Domain's primary network may appear to freeze until the Secondary
Domain comes back online. But that is far better than losing all the
domains and the applications running in those domains.
Monday Oct 13, 2008
The Sun Fire T5440 can have at most 4 UltraSPARC T2 Plus processors. Each processor is directly connected to one quarter of the entire system memory, with 1 Gigabyte memory interleaving, and owns a PCIe Root Complex. When fully populated with processors and memory, Solaris can see 256 CPUs and 512GB of memory. That is a lot for many applications, except for some large databases. With this class of system, it is not usually possible to consume the entire system with a single instance of most applications. But that is in fact a very good opportunity to consolidate a bunch of such applications on this system using LDOMs, thereby reducing power consumption and rack space. An example is the SugarCRM application. It is a web based application written in PHP with a MySQL database backend. Yun Chew has written a nice blog demonstrating how to consolidate the SugarCRM application on this system using LDOMs. I can think of many such applications that can be consolidated on this and on T5140 and T5240 based systems.
In the work done by Yun referred to above, there was no need to create any IO domains. But because the T5440 has 4 PCIe Root Complexes, it is possible to create up to 4 IO domains for applications sensitive to IO performance. Such applications, like databases, can be run in an IO domain so that the application has direct access to the physical disks. The other domains, such as application server domains, can access the database over a virtual NIC. Each of the application server domains can have another virtual NIC to communicate with the external world.
The good thing about LDOMs based virtualization is that, even if the Primary Domain goes down, other domains continue to be functional. Many other virtualization technologies do not have this advantage, which is why Live Migration is so critical for them.
To get the best performance out of an LDOMs based application deployment, it is important to understand the system topology a bit, so that it becomes easier to determine what to place where. I have tried to create a sketch of the system topology below for reference.

When creating domains, the IO and CPU requirements of the applications that would run in the virtualized environment should be estimated. The IO performance of a virtualized 1Gig network and of a virtualized disk is the same as native. But compared to native IO, virtualized IO consumes more CPU cycles, often in the range of 5%-25%, depending on the size and frequency of the IO. Hence, when doing resource planning for an LDOMs environment, a few questions should be considered to get the best performance from the T5440:
- Is the application CPU intensive?
  - Does it scale up with additional CPUs?
- Is the application Disk or Network IO intensive?
  - Moderately IO intensive applications would consume less than 50% of the maximum IO capacity of the device
- Is the application both CPU and IO intensive?
- How many interrupt sources would the domain need to manage?
  - PCIe based Fibre Channel HBAs normally have 2 interrupt sources
  - PCIe based 1G network devices have either 1 or 2 interrupt sources, while 10G network devices have 8 interrupt sources
  - Each virtualized IO device created out of vsw or vds has 1 interrupt source
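As a rough illustration of the last question, the interrupt sources for a planned domain can be tallied from the per-device figures listed above. The device labels and the example device mix are hypothetical, and the 1G NIC is counted at its upper figure of 2 as a conservative assumption.

```python
# Interrupt sources per device type, taken from the list above.
# A 1G NIC has 1 or 2 sources; 2 is assumed here to stay conservative.
INTERRUPT_SOURCES = {
    "fc_hba": 2,      # PCIe Fibre Channel HBA
    "nic_1g": 2,      # PCIe 1G network device (upper figure)
    "nic_10g": 8,     # 10G network device
    "virtual_io": 1,  # each virtual device created out of vsw or vds
}


def total_interrupt_sources(devices):
    """Sum the interrupt sources for a mapping of device type -> count."""
    return sum(INTERRUPT_SOURCES[kind] * count
               for kind, count in devices.items())


# Hypothetical IO domain: 2 FC HBAs, one 10G NIC, 4 virtual devices served.
example = {"fc_hba": 2, "nic_10g": 1, "virtual_io": 4}
print(total_interrupt_sources(example))  # 2*2 + 1*8 + 4*1 = 16
```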
The number of VCPUs that need to be allocated to a domain depends largely on the ability of the application to make good use of the VCPUs. In addition to the VCPUs needed by the application, extra VCPUs should be allocated to handle interrupts. For optimal performance, VCPUs should be allocated to a domain in multiples of at least 4, and preferably in multiples of 8 where possible.
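The sizing rule above can be written down as a small helper. This is only a sketch: how many extra VCPUs to reserve for interrupts is left as an input, since that depends on the workload, and the function name is made up for illustration.

```python
def round_up_vcpus(app_vcpus, interrupt_vcpus=0, multiple=8):
    """Add the VCPUs reserved for interrupts to the application's needs,
    then round the total up to the next multiple (8 preferred, 4 minimum)."""
    needed = app_vcpus + interrupt_vcpus
    if needed <= 0 or multiple <= 0:
        raise ValueError("VCPU counts and multiple must be positive")
    # Ceiling division, then scale back up to a whole multiple.
    return -(-needed // multiple) * multiple


print(round_up_vcpus(10, interrupt_vcpus=2))              # 16
print(round_up_vcpus(10, interrupt_vcpus=2, multiple=4))  # 12
```

Since each core has 8 hardware threads, a multiple of 8 corresponds to a whole number of cores' worth of threads.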
In the next section, I will describe how to create an IO domain with Inter-IO Domain Dependency.
Monday Oct 13, 2008
With the introduction of Chip Multi-Threading (CMT) in the SPARC
Processor Family, a new sun4v based architecture was also introduced.

This sun4v interface allows the Operating System to communicate with the hardware via a layer called the Hypervisor. The Hypervisor provides a Hardware Abstraction to the Operating System. The Hypervisor itself is not an Operating System and is delivered with the platform, bundled with the Firmware. With this layer in place, it becomes possible to carve out different groups of actual Hardware components and present them to the Operating System.

This combination of the Hypervisor and a sun4v based Operating System is the key enabler for LDOMs. LDOMs is supported on all UltraSPARC T1 and UltraSPARC T2 based systems. There are some nice documents about LDOMs, as well as discussion forums that you can join to post your questions.
LDOMs Concept
An UltraSPARC T1 processor is equipped with up to 8 cores, with 4 Hardware Threads (Strands) per core. Each Hardware Thread is seen as a CPU by the Operating System. An UltraSPARC T2 Processor is also equipped with up to 8 cores per chip, with 8 Hardware Threads per core.
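Because every Hardware Thread shows up as a CPU, the CPU count seen by the Operating System is just the product of chips, cores and threads. A minimal sketch (the function name is illustrative):

```python
def visible_cpus(chips, cores_per_chip, threads_per_core):
    """Number of CPUs the Operating System sees: one per hardware thread."""
    return chips * cores_per_chip * threads_per_core


print(visible_cpus(1, 8, 4))  # one UltraSPARC T1 chip: 32
print(visible_cpus(1, 8, 8))  # one UltraSPARC T2 chip: 64
print(visible_cpus(4, 8, 8))  # four 8-core, 8-thread chips: 256
```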
When creating domains, CPUs are allocated to a domain. A CPU allocated to one domain cannot be shared with another domain. Similarly, when memory is allocated to a domain, the same memory cannot be allocated to another domain. Hence CPU and memory are partitioned across domains. However, IO devices such as network cards or disks can be shared. When sharing disks, a single slice of a disk cannot be shared with multiple domains, but different slices of a disk can be allocated to different domains. It is also possible to create large files on a mounted filesystem and make a file available to a domain as a disk.
The UltraSPARC T1 based T2000 has 2 PCI-e Root Complexes. The UltraSPARC T2 based T5120 and T5220 have 1 PCI-e Root Complex along with 2 On-Chip 10 Gigabit Ethernet ports. The UltraSPARC T2 Plus based T5140 and T5240 also have 2 PCI-e Root Complexes, and the T5440 has 4 PCI-e Root Complexes. It is possible to allocate a Root Complex to a Guest Domain so that the Guest has direct access to the devices connected to that Root Complex.
LDOMs Components
- Primary Domain - This is the default, or first, domain that is available with a new system. Initially all system resources remain allocated to this domain. This is the only domain that can be used to configure other domains. This domain is sometimes referred to as the Control Domain.
- Service Domain - A domain that provides disk and network services to other domains. For example, if a Guest Domain makes a Disk Image stored in its filesystem available for booting another domain, then it can be called a Service Domain.
- IO Domain - A domain that owns physical IO devices. When such a domain shares its devices with another domain, it can also be termed a Service Domain.
- Guest Domain - A domain that depends on any of the above three domains for its IO services.
- Virtual Disk Client (vdc) - A device driver component active in the Guest Domain that provides a disk view to the domain
- Virtual Disk Server (vds) - A device driver component active in the Service Domain that is responsible for the physical IO after receiving requests from the vdc
- Virtual Network Client (vnet) - Similar to vdc above, but provides a Virtual NIC service to the Guest
- Virtual Network Switch (vsw) - A switch implementation that communicates with vnet on one side and with the NIC device driver on the other side
- Virtual Console Concentrator (vcc) - Provides Console access to a Guest Domain
- MAU - These are the On-Chip Cryptographic Co-Processors. There is 1 MAU per core.
Steps for Creating a Domain
- Some CPU and Memory resources must be removed from the Primary Domain so that they can be allocated to other domains
- A vcc instance needs to be created in the Primary Domain
- A vsw instance and a vds (Virtual Disk Server) instance need to be created
- At this time a Guest Domain can be created
  - It should be assigned a Console Port (vcc)
  - Its vdc should be associated with a Virtual Disk Service
  - Its vnet should be associated with a vsw
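The steps above map fairly directly onto `ldm` subcommands. The sketch below only builds the command lines as strings so they can be reviewed before anything is run. The service names (`primary-vcc0`, `primary-vsw0`, `primary-vds0`), the port range, the volume and device names, and the example arguments are all assumptions, and the exact syntax should be checked against the `ldm` man page for the installed LDOMs release.

```python
def guest_domain_commands(name, vcpus, memory, backend, net_dev,
                          console_port, primary_vcpus=8,
                          primary_memory="8G"):
    """Return the ldm command lines for the steps listed above.

    All service, volume and device names are illustrative assumptions.
    """
    volume = f"{name}-vol"
    return [
        # Shrink the Primary Domain so resources are free for other domains.
        f"ldm set-vcpu {primary_vcpus} primary",
        f"ldm set-memory {primary_memory} primary",
        # Create the console, network switch and disk services.
        "ldm add-vcc port-range=5000-5100 primary-vcc0 primary",
        f"ldm add-vsw net-dev={net_dev} primary-vsw0 primary",
        "ldm add-vds primary-vds0 primary",
        # Create the Guest Domain and give it CPU and memory.
        f"ldm add-domain {name}",
        f"ldm add-vcpu {vcpus} {name}",
        f"ldm add-memory {memory} {name}",
        # Assign a console port, a virtual disk and a virtual NIC.
        f"ldm set-vcons port={console_port} {name}",
        f"ldm add-vdsdev {backend} {volume}@primary-vds0",
        f"ldm add-vdisk vdisk0 {volume}@primary-vds0 {name}",
        f"ldm add-vnet vnet0 primary-vsw0 {name}",
    ]


# Hypothetical guest "ldg1" backed by a whole disk and an e1000g NIC.
for line in guest_domain_commands("ldg1", 8, "8G", "/dev/dsk/c1t1d0s2",
                                  "e1000g0", 5000):
    print(line)
```

After these commands, the domain would still need to be bound and started (`ldm bind-domain`, `ldm start-domain`). The sketch also leaves out saving the configuration and any reboot of the Primary Domain that a given release may require.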
Tony Shoumack wrote a nice blueprint to provide detailed help with domain creation using LDOMs.
The per-core FPUs of the UltraSPARC T2 and UltraSPARC T2 Plus are just functional units of the core. When a Domain needs to execute Floating Point instructions, the core associated with the Domain takes care of it.
If the Domain needs to accelerate Cryptographic Operations by offloading them to the On-Chip Cryptographic Co-Processors, then MAUs need to be assigned to the domain.
In the next section, I will cover how to allocate devices and CPU to get the best performance.