Charlie Chen's Weblog

Thursday Nov 29, 2007

How does MSI/MSI-X work - MSI part?

1. Background 

MSI (Message Signaled Interrupts) raises an interrupt by sending an in-band message, a memory write transaction on the PCI bus, instead of asserting one of the conventional out-of-band PCI INTx pins.

To use MSI, the system sets up the registers of the function's MSI capability in PCI configuration space. In short, it writes the message address register (32 or 64 bits) and the message data register, then sets the enable bit in the message control register. When the device later wants to raise an interrupt, it writes the value held in the data register to the address held in the address register.
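The setup sequence above can be sketched as a small Python model. This is a minimal sketch, assuming a function whose 256-byte configuration space is modeled as a `bytearray` and whose MSI capability sits at offset 0x50 (a hypothetical choice); the class `MsiFunction` and the helper `setup_msi` are invented for illustration and are not an OS or PCI library API. Offsets follow the MSI capability layout: control at +2, address at +4, upper address at +8 (64-bit only), and data at +0x0C (64-bit) or +0x08 (32-bit). A dictionary stands in for system memory.

```python
import struct

MSI_CAP_ID = 0x05
CTRL_ENABLE = 1 << 0   # message control bit 0
CTRL_64BIT = 1 << 7    # message control bit 7


class MsiFunction:
    """Toy PCI function with an MSI capability in a 256-byte config space."""

    def __init__(self, cap_off=0x50, is_64bit=True):
        self.cfg = bytearray(256)
        self.cap = cap_off                   # hypothetical capability offset
        self.cfg[cap_off] = MSI_CAP_ID       # capability ID
        self.cfg[cap_off + 1] = 0x00         # next pointer: last capability
        self.write16(cap_off + 2, CTRL_64BIT if is_64bit else 0)

    # Little-endian config space accessors.
    def read16(self, off):
        return struct.unpack_from("<H", self.cfg, off)[0]

    def read32(self, off):
        return struct.unpack_from("<I", self.cfg, off)[0]

    def write16(self, off, value):
        struct.pack_into("<H", self.cfg, off, value)

    def write32(self, off, value):
        struct.pack_into("<I", self.cfg, off, value)

    def data_off(self):
        # The data register follows the upper address only in the 64-bit layout.
        is_64 = self.read16(self.cap + 2) & CTRL_64BIT
        return self.cap + (0x0C if is_64 else 0x08)

    def raise_interrupt(self, memory):
        """Device side: write the data value to the programmed address."""
        ctrl = self.read16(self.cap + 2)
        if not ctrl & CTRL_ENABLE:
            return False                     # MSI disabled: nothing is sent
        addr = self.read32(self.cap + 4)
        if ctrl & CTRL_64BIT:
            addr |= self.read32(self.cap + 8) << 32
        memory[addr] = self.read16(self.data_off())   # the in-band message
        return True


def setup_msi(fn, address, data):
    """System side: program address and data, then set the enable bit last."""
    ctrl = fn.read16(fn.cap + 2)
    fn.write32(fn.cap + 4, address & 0xFFFFFFFF)
    if ctrl & CTRL_64BIT:
        fn.write32(fn.cap + 8, address >> 32)
    fn.write16(fn.data_off(), data)
    fn.write16(fn.cap + 2, ctrl | CTRL_ENABLE)
```

For example, after `setup_msi(f, 0xFEE00000, 0x4041)` a call to `f.raise_interrupt(mem)` stores 0x4041 under key 0xFEE00000 in `mem`; both values are arbitrary. Setting the enable bit last means the function never signals with a half-programmed address/data pair.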

MSI-X is an extension of MSI that supports more vectors: MSI supports at most 32 vectors per function, while MSI-X supports up to 2048.

Using MSI can lower interrupt latency by giving each kind of interrupt its own vector and handler. When the kernel sees the message, it vectors directly to the interrupt service routine associated with that address/data pair. The address/data pair (the vector) is allocated by the system, and the driver registers its handler for that vector.
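The split between the system allocating vectors and the driver registering handlers can be illustrated with a toy dispatch table. This sketch assumes the message data simply names a vector number; the class `VectorTable` and its methods `alloc_vector`, `request` and `deliver` are hypothetical and are not the Linux kernel interface. The vector range 0x40 to 0xEF is likewise an arbitrary assumption.

```python
from typing import Callable, Dict, Optional


class VectorTable:
    """Toy model: the system hands out vectors, drivers attach handlers."""

    def __init__(self, first: int = 0x40, last: int = 0xEF):
        self._free = list(range(first, last + 1))       # hypothetical vector pool
        self._handlers: Dict[int, Callable[[], str]] = {}

    def alloc_vector(self) -> int:
        """System side: take the next free vector from the common pool."""
        return self._free.pop(0)

    def request(self, vector: int, handler: Callable[[], str]) -> None:
        """Driver side: register the handler for an allocated vector."""
        self._handlers[vector] = handler

    def deliver(self, vector: int) -> Optional[str]:
        """Kernel side: the vector alone selects the handler (one lookup)."""
        handler = self._handlers.get(vector)
        return handler() if handler else None
```

A NIC driver might, for instance, take one vector for receive and one for transmit completion. Because each source has its own vector, `deliver` is a single table lookup; with a shared INTx line the kernel would instead have to run each handler registered on that line to find the source.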

Because the system allocates vectors from a common pool for all kinds of PCI devices, it gets one general mechanism for reporting interrupts quickly.

2. MSI registers

capability ID: 8b, value 0x05, identifies this capability as MSI.

next pointer: 8b, the offset of the next capability, 0 if this is the last one.

message control: 16b
bit 8: per-vector masking capable (the mask and pending registers are present)
bit 7: 64-bit address capable (the message upper address register is present)
bits 6-4: multiple message enable, the number of vectors allocated by the system, encoded as a power of 2 (a value of n means 2^n vectors)
bits 3-1: multiple message capable, the number of vectors requested by the function, encoded the same way
bit 0: MSI enable; cleared by default. The driver should not use this bit to mask interrupts.
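The fields above can be decoded with a few shifts and masks. In this minimal sketch the helper names `decode_msi_control` and `set_allocated` are hypothetical; `set_allocated` shows how system software might fill in bits 6-4, refusing counts that are not a power of two or that exceed what the function requested.

```python
def decode_msi_control(ctrl):
    """Decode the 16-bit MSI message control register."""
    return {
        "enabled": bool(ctrl & 0x1),                 # bit 0
        "requested": 1 << ((ctrl >> 1) & 0x7),       # bits 3-1, 2**n vectors
        "allocated": 1 << ((ctrl >> 4) & 0x7),       # bits 6-4, 2**n vectors
        "addr_64bit": bool(ctrl & (1 << 7)),         # bit 7
        "per_vector_mask": bool(ctrl & (1 << 8)),    # bit 8
    }


def set_allocated(ctrl, nvec):
    """Return ctrl with bits 6-4 set so that nvec vectors are allocated."""
    if nvec <= 0 or nvec & (nvec - 1):
        raise ValueError("vector count must be a power of two")
    if nvec > decode_msi_control(ctrl)["requested"]:
        raise ValueError("cannot allocate more vectors than the function requested")
    log2 = nvec.bit_length() - 1
    return (ctrl & ~0x70) | (log2 << 4)              # clear bits 6-4, then set
```

For example, 0x0187 decodes as enabled, 8 vectors requested, 1 allocated, 64-bit addressing and per-vector masking; `set_allocated(0x0187, 4)` returns 0x01A7. Note that encodings above 5 (more than 32 vectors) are reserved, and this sketch does not reject them when decoding.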

message address: 32b

message upper address: 32b, present only when bit 7 of the message control register indicates 64-bit address support.

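message data: 16b, assigned by the system. When more than one vector is allocated, the function may modify the low-order bits of this value (the low n bits when 2^n vectors are allocated) to identify which interrupt source is signaling.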
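The low-bit rule for message data reduces to one masking step. In this sketch the function name `message_data_for` is hypothetical; `allocated` is the vector count from bits 6-4 of message control, and `vector` is the index of the interrupt source within the function.

```python
def message_data_for(base_data, allocated, vector):
    """Data value the function writes when signaling `vector`.

    With 2**n vectors allocated, only the low n bits of the system-assigned
    value may change; the upper bits are sent unmodified.
    """
    if not 0 <= vector < allocated:
        raise ValueError("vector index out of range")
    low_bits = allocated - 1                  # n ones, since allocated is 2**n
    return (base_data & ~low_bits & 0xFFFF) | vector
```

For instance, with a system-assigned value of 0x4040 and 4 vectors allocated, vector 3 is signaled with data 0x4043.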

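mask bits: 32b
pending bits: 32b
Both registers are present only when per-vector masking is supported (bit 8 of message control). Each vector is numbered and corresponds to one bit in the mask and pending registers. A set mask bit prohibits the function from sending the associated message; if the function needs to send that message while it is masked, it sets the corresponding pending bit instead and sends the message once the vector is unmasked. This mechanism lets software disable or defer message sending on a per-vector basis.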
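The mask/pending interplay can be modeled as follows. This is a toy sketch: the class `MsiMasking` is hypothetical, keeps the two registers as plain integers, and records the messages that actually go out in the list `sent`.

```python
class MsiMasking:
    """Per-vector mask and pending behaviour of an MSI function (toy model)."""

    def __init__(self, allocated):
        self.allocated = allocated
        self.mask = 0        # mask bits register
        self.pending = 0     # pending bits register
        self.sent = []       # vectors whose messages were actually sent

    def _bit(self, vector):
        if not 0 <= vector < self.allocated:
            raise ValueError("vector index out of range")
        return 1 << vector

    def signal(self, vector):
        """The device wants to raise `vector`."""
        bit = self._bit(vector)
        if self.mask & bit:
            self.pending |= bit          # masked: defer by setting pending
        else:
            self.sent.append(vector)     # unmasked: message goes out now

    def set_mask(self, vector, masked):
        """Software writes the mask bit for `vector`."""
        bit = self._bit(vector)
        if masked:
            self.mask |= bit
            return
        self.mask &= ~bit
        if self.pending & bit:           # unmasking releases the deferred message
            self.pending &= ~bit
            self.sent.append(vector)
```

Signaling a masked vector twice still leaves a single pending bit, so only one message is sent when it is unmasked.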

Thursday Nov 15, 2007

Virtualization race prediction

Virtualization is essentially an efficient way of managing hardware resources, a job better done by firmware and the operating system. So once OS and processor vendors realize that virtualization is an indispensable tool for their customers, this kind of resource management will no longer be a toy but an internal organ. That is bad news for independent virtualization vendors such as VMware.

A reasonable prediction is that VMware will have a hard time against its competitors: Xen on Linux, Microsoft's Virtual PC/Virtual Server, Sun's built-in Solaris Containers, Oracle's upcoming offering, and so on.

Virtualization, like everything else in system software land, should be free and as invisible as possible; at the very least, users should not have to pay much for what is essentially an administration tool.

Thursday Nov 08, 2007

Simulating an Ethernet controller - concepts

To simulate an Ethernet controller, we need to:

1) simulate just enough of the registers' side effects to mimic the chip's logic;
2) control when events occur, to control the chip's performance.

Register access on a chip works much like RAM access, with one exception: reading or writing a register can have side effects, such as starting a packet transmission or clearing a status register on read, while RAM is completely passive. These side effects are the key to simulating an IO module's logic, no matter what kind of module it is or how complex: a SCSI controller, a simple UART, or a NIC.
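The point about side effects can be made concrete with a toy register file. The NIC, its register offsets (`SCRATCH`, `TX_LEN`, `TX_DOORBELL`, `INT_STATUS`) and their semantics are all invented for illustration: the scratch register behaves like RAM, a write to the doorbell transmits a staged frame, and the interrupt status register is read-to-clear.

```python
class ToyNic:
    """Register file of a hypothetical NIC; offsets and semantics are invented."""

    SCRATCH = 0x00       # plain storage: behaves like RAM
    TX_LEN = 0x04        # length of the frame staged in tx_buffer
    TX_DOORBELL = 0x08   # write: transmit the staged frame (side effect)
    INT_STATUS = 0x0C    # read: return and clear the cause bits (read-to-clear)

    INT_TX_DONE = 1 << 0

    def __init__(self):
        self.regs = {self.SCRATCH: 0, self.TX_LEN: 0,
                     self.TX_DOORBELL: 0, self.INT_STATUS: 0}
        self.tx_buffer = b""
        self.wire = []               # frames "sent" onto the simulated link

    def read(self, off):
        value = self.regs[off]
        if off == self.INT_STATUS:
            self.regs[off] = 0       # side effect: reading clears the status
        return value

    def write(self, off, value):
        if off == self.TX_DOORBELL:
            # Side effect: the write itself starts a transmission.
            self.wire.append(self.tx_buffer[: self.regs[self.TX_LEN]])
            self.regs[self.INT_STATUS] |= self.INT_TX_DONE
        else:
            self.regs[off] = value   # other registers just store the value
```

Reading `SCRATCH` twice returns the same value, as RAM would, while reading `INT_STATUS` twice returns the cause bit only the first time.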

Making the asynchronous path from IO up to the CPU predictable is as important as interpreting registers correctly. Here a simulator differs from the real world, and the difference matters a great deal. In the real world, modules (the CPU, memory, and the various IO modules) communicate in two ways: synchronously and asynchronously.

synchronous: from the CPU down to IO. Every access is issued by the CPU and can be predicted, so this path is deterministic.
asynchronous: from IO up to the CPU. Interrupts arrive at arbitrary times, so this path is not deterministic.

In a simulator, we need to handle the asynchronous path in a semi-deterministic way: at the very least, an asynchronous event must happen in time, or at least not be delayed so much that it drastically changes the behavior of applications in the upper layers.

The method is to time every event that originates in the IO world. A simulator must already implement some form of central clock; the task here is to tie IO events to that clock.

As a result, asynchronous events become synchronous ones. The essential point is that every event in a simulator must be timed. Doing that cleverly is the hard part.
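One common way to tie IO events to a central clock is a discrete-event queue ordered by simulated time, sketched below. This is an assumption about how such a simulator could work, not a description of any particular simulator; `SimClock`, `schedule` and `advance` are hypothetical names. An IO model never fires an interrupt on its own: it schedules a callback a given number of ticks ahead, and the callback runs only when the central clock reaches that tick, so the same inputs always produce the same interrupt timing.

```python
import heapq
import itertools


class SimClock:
    """Central simulated clock with a time-ordered queue of IO events."""

    def __init__(self):
        self.now = 0
        self._queue = []                 # entries: (due_tick, seq, callback)
        self._seq = itertools.count()    # tie-breaker: FIFO order at equal ticks

    def schedule(self, delay, callback):
        """An IO model asks for `callback` to run `delay` ticks from now."""
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), callback))

    def advance(self, ticks):
        """Move time forward, firing every event that falls due, in time order."""
        end = self.now + ticks
        while self._queue and self._queue[0][0] <= end:
            due, _, callback = heapq.heappop(self._queue)
            self.now = due               # callbacks observe their exact due tick
            callback()
        self.now = end
```

For example, scheduling an "rx interrupt" 100 ticks ahead and a "tx done interrupt" 30 ticks ahead, then advancing by 50 fires only the second one (at tick 30); advancing by another 50 fires the first at tick 100. Advancing the clock in small steps bounds how long the rest of the simulator can run before it sees a due event, which is one way to meet the "not delayed too much" requirement above.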