Nikolay Igotti
SECR talk
On 28 Oct I will be giving a talk on VirtualBox at the SECR conference: "VirtualBox: Struggle for Performance in Type 2 Hypervisors" (see this link). So if you'll be in Moscow around that time, come by; I will try to make it both technical and fun.
Posted at 04:18PM Oct 18, 2009 by nike in Sun | Comments[0]
Bug investigation (part 1 - tale of sir VIF)
The last year at VirtualBox was fun, but it brought such a heavy load that I had almost no time to blog. Now let's fix this a bit. There are several almost-detective stories of recent bug hunting in the VBox core I'd like to share. They show the level of complexity of modern virtualizers, and how bugs in system software may manifest to applications in rather unexpected ways (and the guest kernel is an application for us :)).
Let's start with this little bugger. It is almost 2 years old, which is a lot for a VBox bug. The bug manifested itself as: "I'm receiving the following message when certain commands are run in FreeBSD 6.2 and VirtualBox 1.4.0: sigreturn: eflags 0x80247"
This error was one of the nasty blockers to running FreeBSD reliably as a guest, and as I thought it was a good idea to support it, I looked into that bug. As was pointed out in the bug report, the message is triggered by:
    if (!EFL_SECURE(eflags & ~PSL_RF, regs->tf_eflags & ~PSL_RF)) {
        printf("sigreturn: eflags = 0x%x\n", eflags);
        return (EINVAL);
    }
in the FreeBSD kernel's sys/i386/i386/machdep.c. This check essentially means that some bit in the EFLAGS CPU register that FreeBSD considers insecure was sometimes toggled.
The bit of interest is the VIF bit (0x80000 in the value above), which has a rather convoluted story behind it.
When Intel wanted to keep compatibility with legacy 16-bit DOS code while running protected-mode OSes, they introduced the so-called VM86 mode, and in the early 90s extended it with virtual interrupt support. This mode was Intel's first take on virtualization, and calling it clumsy is something of a compliment. The VIF (and VIP) flags are part of that extension. VIF is a virtualized version of the IF flag (the interrupt enable flag). If a DOS application were allowed to modify the real IF, it could disable interrupts and render the whole system unusable. So instead, the cli instruction in VM86 mode affects only the VIF flag. At the same time, the pushf and popf instructions, which are (almost) the only way to access EFLAGS, were modified so that the value of the VIF bit is placed in the IF bit position. And as VIF is bit 19 of EFLAGS, it is not visible in the 16-bit version of pushf.
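The behaviour described above can be modelled in a few lines of C. This is only a simplified sketch of the IF/VIF substitution, assuming the virtual interrupt extension is active; it ignores IOPL handling, VIP and every other detail of the real instructions, and the function names are made up for illustration.

```c
#include <stdint.h>

/* EFLAGS bit masks (bit 9 = IF, bit 19 = VIF). */
#define EFL_IF   (UINT32_C(1) << 9)
#define EFL_VIF  (UINT32_C(1) << 19)

/* Hypothetical model of cli executed by VM86 code:
   only the virtual flag is cleared, the real IF is left alone. */
static uint32_t vm86_cli(uint32_t eflags)
{
    return eflags & ~EFL_VIF;
}

/* Hypothetical model of 16-bit pushf executed by VM86 code:
   the pushed image carries VIF in the IF bit position, and
   bit 19 itself does not fit into the 16-bit image. */
static uint16_t vm86_pushf16(uint32_t eflags)
{
    uint16_t image = (uint16_t)(eflags & 0xFFFFu & ~EFL_IF);
    if (eflags & EFL_VIF)
        image |= (uint16_t)EFL_IF;
    return image;
}
```

So a DOS program that runs cli and then pushf sees "interrupts disabled" in its flags image, while the real IF never changes.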
So now the cause of the bug: FreeBSD sometimes executes BIOS calls in VM86 mode, which may modify the VIF flag on the CPU. When a protected-mode interrupt (such as the timer used for task scheduling) arrived at the wrong moment (while the VIF flag was toggled), our dynamic recompiler wasn't clearing the VIF flag (as, according to the Intel/AMD instruction manuals, it shouldn't). All subsequent EFLAGS accesses have the VIF flag masked, so to the OS it looked like the VIF bit toggled at random. The fix was not that hard: just mask out the VIF and VIP bits in EFLAGS when taking interrupts in VM86 mode, as those bits make no sense outside of VM86 mode.
The next post will cover the story of the most time-consuming bug I ever worked on (about 80 hours of continuous hacking).
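The idea of the fix can be sketched as follows. This is not the actual VirtualBox recompiler code; the helper name is hypothetical, and using the VM bit of the EFLAGS value to detect "interrupted in VM86 mode" is an assumption made to keep the example self-contained.

```c
#include <stdint.h>

/* EFLAGS bit masks (bit 17 = VM, bit 19 = VIF, bit 20 = VIP). */
#define EFL_VM   (UINT32_C(1) << 17)
#define EFL_VIF  (UINT32_C(1) << 19)
#define EFL_VIP  (UINT32_C(1) << 20)

/* Hypothetical helper: given the EFLAGS value at the moment an
   interrupt is taken, drop VIF and VIP if the interrupted code
   was running in VM86 mode. Other contexts are left untouched. */
static uint32_t strip_vif_vip_on_interrupt(uint32_t eflags)
{
    if (eflags & EFL_VM)
        eflags &= ~(EFL_VIF | EFL_VIP);
    return eflags;
}
```

With the value from the bug report plus the VM bit (0xA0247), the helper returns 0x20247, so the stray 0x80000 no longer leaks into protected-mode code.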
Posted at 12:23PM Mar 12, 2009 by nike in Sun | Comments[0]
Long absolute jumps on AMD64
Sometimes it is necessary to call or jump to an absolute address on 64-bit AMD. Unfortunately, the x86_64 instruction set only allows 32-bit displacements in direct calls and jumps, so the traditional approach is to move the desired address into a register and call or jump through it. That requires a scratch register, or a push/pop of a register, which is a problem for a jump if we don't want to touch registers. Here I suggest an alternative approach, using the ret instruction for long jumps. While not too complicated, this trick can help some compiler/JIT writers handle very long jumps.
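DECLINLINE(void) tcg_out_pushq(TCGContext *s, tcg_target_long val)
{
    tcg_out8(s, 0x68);  /* push imm32: sign-extended to 64 bits, subs 8 from rsp */
    tcg_out32(s, val);  /* imm32 (low half of val) */
    /* push sign-extends imm32, so the high half on the stack must be
       patched whenever val is not the sign extension of its low 32 bits
       (this includes values like 0x80000000 with a zero high half) */
    if (val != (tcg_target_long)(int32_t)val)
    {
        tcg_out8(s, 0xc7); /* movl $imm32, 4(%rsp) */
        tcg_out8(s, 0x44);
        tcg_out8(s, 0x24);
        tcg_out8(s, 0x04);
        tcg_out32(s, ((uint64_t)val) >> 32); /* imm32 (high half of val) */
    }
}
DECLINLINE(void) tcg_out_long_jmp(TCGContext *s, tcg_target_long dst)
{
    tcg_out_pushq(s, dst); /* target address becomes the "return address" */
    tcg_out8(s, 0xc3);     /* ret: pops it and jumps there */
}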
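For readers without the TCG sources at hand, here is a standalone sketch of the same push/ret trick that writes the byte sequence into a plain buffer. The helper names and the buffer interface are made up for this example and are not part of TCG. Since push imm32 sign-extends its operand in 64-bit mode, the sketch emits the high-half store whenever the sign-extended low half differs from the target.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: append a 32-bit value in little-endian order. */
static size_t emit_le32(uint8_t *buf, size_t pos, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        buf[pos++] = (uint8_t)(v >> (8 * i));
    return pos;
}

/* Hypothetical helper: emit "push dst; ret" for a 64-bit target.
   buf must have room for 14 bytes; returns the number of bytes written. */
static size_t emit_long_jmp(uint8_t *buf, uint64_t dst)
{
    size_t pos = 0;
    uint32_t lo = (uint32_t)dst;
    uint32_t hi = (uint32_t)(dst >> 32);
    /* Value that "push imm32" alone leaves on the stack. */
    uint64_t pushed = (lo & UINT32_C(0x80000000))
                    ? (UINT64_C(0xFFFFFFFF00000000) | lo)
                    : lo;

    buf[pos++] = 0x68;                 /* push imm32 */
    pos = emit_le32(buf, pos, lo);
    if (pushed != dst) {
        buf[pos++] = 0xC7;             /* movl $imm32, 4(%rsp) */
        buf[pos++] = 0x44;
        buf[pos++] = 0x24;
        buf[pos++] = 0x04;
        pos = emit_le32(buf, pos, hi);
    }
    buf[pos++] = 0xC3;                 /* ret */
    return pos;
}
```

The short form takes 6 bytes and the long form 14, and no register is touched in either case.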
Posted at 03:20PM Oct 23, 2008 by nike in Sun | Comments[0]
Python API to the VirtualBox VM
One of the important advantages of the VirtualBox virtualization solution is its powerful public API, which allows control over every aspect of virtual machine configuration and execution. Last month I was working on Python and Java bindings to that API. Those bindings are shipped with the VirtualBox 2.0 SDK.
There are two families of API bindings: SOAP and XPCOM.
SOAP allows controlling remote VMs over HTTP, while XPCOM performs much better and exposes certain functionality not available with SOAP. They use very different technologies (SOAP is procedural, while XPCOM is OOP), but as both are ultimately APIs to the same VirtualBox functionality, we kept the original semantics in the bindings. So, other than connection establishment, code can be written in such a way that it doesn't care which communication channel to the VirtualBox instance is used.
As an example of how flexible and powerful those APIs are, I developed an extensible Python command line shell for VirtualBox, usable as a simpler CLI alternative to the GUI. The same shell code can work with either a SOAP or an XPCOM connection to VirtualBox. To start the XPCOM version of the shell:
- download VirtualBox 2.0 for your platform (Linux and Solaris Python bindings are officially supported)
- download SDK
- unpack SDK
- run the following to start the shell:
cd sdk/bindings/xpcom/python/sample
export VBOX_PROGRAM_PATH=/opt/VirtualBox-2.0.0/
export PYTHONPATH=..:$VBOX_PROGRAM_PATH
./vboxshell.py
To add your own command to the shell, define a function like this:
def showvdiCmd(ctx, args):
    mach = argsToMach(ctx, args)
    if mach is None:
        return 0
    hdd = mach.getHardDisk(ctx['ifaces'].StorageBus.IDE, 0, 0)
    print 'HDD0 info: id=%s desc="%s" size=%dM location=%s' % (hdd.id, hdd.description, hdd.size, hdd.location)
    return 0
and add the following line to the commands map:
'vdiinfo':['Show VDI info', showvdiCmd],
Then you can run it like this: vdiinfo Win32 (or however your VM of interest is named).
Easy, isn't it? Moreover this command will work not only with XPCOM bindings, but with SOAP too.
This example also shows how to access VirtualBox constants in a toolkit-neutral manner: the 'ifaces' field of the context contains reflection information usable to get the values of constants.
Actually, the SDK ships bindings to the VirtualBox API for other languages too, including Java and C++, but I personally find Python the easiest to start with. You can ask questions about VirtualBox language bindings here (not only Python), and I will try to help.
Posted at 01:26PM Sep 05, 2008 by nike in Sun | Comments[15]
Informative paper on memory
Ulrich Drepper wrote a pretty interesting, yet somewhat longish and probably too detailed, paper on memory management. I wouldn't recommend it to everybody, but software people who want to look cool knowing that an SRAM cell needs 6 transistors, while a DRAM cell needs only 1 transistor and 1 capacitor, should read it for sure.
Seriously speaking, this paper could be of interest if you want to understand what really goes on when you do MOV EAX,[ECX].
Posted at 11:23PM Aug 07, 2008 by nike in Sun | Comments[0]
Back at Sun
Now I'm back at Sun again, and I definitely love this place! This time I'm working on the VirtualBox project, OS virtualization software. The project looks very interesting, so my future postings will cover what I encounter in my journeys deep into the kernel and back.
If you have questions I'm able to answer, feel free to ask me in the comments.
Update: thanks everybody who welcomed me back!
Posted at 08:07PM Aug 06, 2008 by nike in Sun | Comments[5]
Leaving Sun
Starting Aug 17 I leave Sun Microsystems. My personal e-mail is igotti@gmail.com. Have fun!
PS: I have no other technical blog yet; you may look at my Livejournal blog, but it's in Russian and generally doesn't have that much info yet.
PPS: my last project, compressed object pointers, is now of production quality, is being reviewed, and hopefully will be integrated into the Hotspot workspace rather soon.
Posted at 03:15PM Aug 15, 2007 by nike in Personal | Comments[6]
FS neutral data recovery tool
A success story of data recovery from an ext3 FS. [Read More]
Posted at 11:18AM Aug 02, 2007 by nike in Personal | Comments[0]
Explicit template instantiation in shared libraries
When explicit template instantiation saves the day. [Read More]
Posted at 11:09PM Jul 19, 2007 by nike in Sun | Comments[0]
Double mapping of memory regions on Unix
Mapping the same physical memory pages onto several different virtual addresses at the same time from userland. [Read More]
Posted at 04:54PM Jul 14, 2007 by nike in Sun | Comments[0]
Hotspot internals Q&A
If you have a question on Hotspot VM internals, feel free to ask here. [Read More]
Posted at 12:22PM Jul 08, 2007 by nike in Sun | Comments[15]
ILP64, LP64, LLP64
What do LP64, LLP64 and ILP64 stand for? [Read More]
Posted at 10:32AM Jul 08, 2007 by nike in Sun | Comments[1]
Raw page table access
Solaris x86 code demonstrating raw access to the CPU's page table. As usual, don't try this on sensitive machines (although this code is pretty safe). [Read More]
Posted at 06:26PM Jul 05, 2007 by nike in Sun | Comments[0]
Debugger for Win32 (v2)
Mini-debugger for Win32 that can trace even statically linked binaries, not only imported symbols. [Read More]
Posted at 04:09PM Jul 04, 2007 by nike in Sun | Comments[0]
C mini-contest
Mini-contest: "how well do you know C"? [Read More]
Posted at 04:32PM Jul 03, 2007 by nike in Personal | Comments[6]
Sunday Oct 18, 2009