Written on 2015-06-27 18:57:55
For a more abstract view, there exist different perspectives on virtualization.
This post intends to give a practical overview of these perspectives and the currently available technologies. Keep in mind that this is a work in progress and will receive additional content in the future; this notice will be removed once it is complete.
One set of hardware is emulated in software on top of another, such that, to the guest, no distinction seems to exist.
The virtualization software makes sure that all hardware (CPU, chipset, I/O, ...) instructions of the guest are translated for the host CPU, such that a completely different set of hardware seems to be present.
That way, at a big cost in performance, architectures other than the one provided by the host system can be made available, e.g. MIPS / ARM / SPARC on x86.
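The performance hit is easy to see in a sketch: an emulator has to decode and re-implement every single guest instruction in host code. The following toy interpreter (instruction set entirely made up for illustration) shows that even a trivial guest program costs many host operations per guest instruction:

```python
# Toy sketch of hardware (CPU) emulation: every guest instruction is
# fetched, decoded, and re-implemented in host code. This per-instruction
# overhead is why full emulation is slow. Opcodes here are invented.

def emulate(program):
    """Interpret a list of (opcode, operands) 'guest' instructions."""
    regs = {"r0": 0, "r1": 0}
    for op, *args in program:
        if op == "mov":          # mov reg, immediate
            regs[args[0]] = args[1]
        elif op == "add":        # add dst, src
            regs[args[0]] += regs[args[1]]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs

# Three guest instructions, dozens of host instructions to interpret them:
print(emulate([("mov", "r0", 40), ("mov", "r1", 2), ("add", "r0", "r1")]))
# {'r0': 42, 'r1': 2}
```

Real emulators like QEMU mitigate this with dynamic binary translation rather than pure interpretation, but the principle of the overhead is the same.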
The guest OS runs natively without changes.
The CPU is not emulated, only the chipset and other hardware are. Some CPU instructions may still be intercepted or altered, but no emulation takes place CPU-wise.
This yields far better performance than hardware emulation does, but you usually have to stick with the host's architecture.
No hardware emulation takes place, but the host offers an API for hardware access to the guests.
Different architectures will NOT run.
Guest operating systems may need to have their kernels patched so that this API can be used. Xen has different operating modes, depending on the degree of paravirtualization being used.
Note: KVM is not a pure paravirtualizer; it merely provides paravirtualized drivers alongside fully virtualized ones. It also uses qemu under the hood for hardware emulation.
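The idea behind paravirtualization can be sketched as follows: instead of poking at (emulated) hardware and getting trapped, the guest's patched driver knowingly calls an API ("hypercalls") offered by the host. All class and method names below are made up for illustration:

```python
# Conceptual sketch of paravirtualization: the guest kernel knows it is
# virtualized and talks to a host-provided API instead of to hardware.
# No trap-and-emulate dance is needed; one direct call does the work.

class Hypervisor:
    """Host side: offers an explicit API instead of emulating a device."""
    def __init__(self):
        self.log = []

    def hypercall_write(self, data):
        self.log.append(data)        # the host performs the real I/O
        return len(data)

class ParavirtGuest:
    """Guest side: its patched 'driver' uses the API directly."""
    def __init__(self, hv):
        self.hv = hv

    def write(self, data):
        return self.hv.hypercall_write(data)

hv = Hypervisor()
guest = ParavirtGuest(hv)
guest.write("hello")
print(hv.log)  # ['hello']
```

In KVM's case the paravirtualized drivers mentioned above play the role of `ParavirtGuest`: the guest still runs on real CPU hardware, but I/O goes through a cooperative interface.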
No hardware emulation takes place, and the operating system kernel is shared.
Software: (a patched kernel is needed, thus only backported changes = bad.)
Just leave the technologies that need kernel patches alone; here's why I think that is the better choice:
The same development will eventually take place as happened with KVM vs. Xen. All major Linux distributions chose KVM as the primary virtualization technique once a solution (read: KVM) was present within the mainline kernel; Xen was dropped. I'd be astonished if this were different with OpenVZ vs. LXC.
LXC just got fresh support in Proxmox and will likely supersede Virtuozzo there in the future. (But that's just an educated guess of mine.)
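The shared-kernel idea behind all of these container technologies can be sketched in a few lines: there is exactly one kernel, and each "container" is just a process tree with its own isolated view (namespaces) on top of it. This is a purely illustrative toy model, not real isolation:

```python
# Toy model of OS-level virtualization: one shared host kernel, and
# per-container state (hostname, filesystem view, ...) layered on top.
# Nothing is emulated and nothing is copied.

class Kernel:
    """The single, shared host kernel."""
    def __init__(self):
        self.version = "4.1"

class Container:
    def __init__(self, kernel, hostname):
        self.kernel = kernel          # shared with every other container
        self.hostname = hostname      # isolated per-container state

host_kernel = Kernel()
web = Container(host_kernel, "web01")
db = Container(host_kernel, "db01")

# Both containers report the same kernel, because there is only one:
print(web.kernel is db.kernel)   # True
print(web.kernel.version)        # 4.1
```

This is also why a container can never run a different kernel (or a different OS family) than its host, in contrast to the hardware virtualization approaches above.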
Currently there is a lot of fuss about docker for 'app virtualization'.
docker used to use LXC as a backend, but nowadays they develop their own library/userland tool called libcontainer to manage the OS facilities their product needs.
lmctfy ('let me contain that for you'), which has the same scope as docker, is currently stalled according to the github project readme:
lmctfy is currently stalled as we migrate the core concepts to libcontainer and build a standard container management library that can be used by many projects.
Here a minimal OS acts as the hypervisor and virtual machine manager, and most interaction flows directly between the VM and the processor, without passing through the hypervisor OS.
A regular OS like any Linux distribution, a Windows variant or Mac OS X is used, and your virtualization software is installed on top of it.
All system calls have to pass through the emulated/virtualized hardware that is provided by the host OS, i.e. every call passes through the host OS / the hypervisor.
This is simply all the container stuff, where a guest OS runs as another process (tree) within the host OS.
Hardware virtualization purely in software is costly and slow. Nowadays processors usually provide instruction set extensions like VT-x (Intel), VIA VT (VIA) or AMD-V (AMD), depending on the manufacturer.
These implement access control specifically for virtualization, in addition to the rings we will talk about in a minute.
With VT-x there basically exist two modes:
Hypervisors run in VMX Root Op mode.
VMs do not; they run in non-root operation.
If a VM executes privileged ring-0 instructions (see below) in non-root operation, the hypervisor can catch these instructions since it runs in root operation, basically implementing trapping.
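The trapping mechanism can be sketched as follows: the guest runs its unprivileged instructions natively, but any privileged instruction causes a "VM exit" into the hypervisor, which emulates its effect and resumes the guest. The instruction names below are x86 mnemonics, the rest is a made-up toy model:

```python
# Sketch of trap-and-emulate with VT-x: code running in VMX non-root
# operation executes normally until it hits a privileged instruction,
# which causes a VM exit into the hypervisor (root operation).

PRIVILEGED = {"out", "hlt", "wrmsr"}   # illustrative subset

def run_guest(instructions):
    """Return the list of instructions that caused a VM exit."""
    exits = []
    for insn in instructions:
        if insn in PRIVILEGED:
            # VM exit: the hypervisor takes over, emulates the effect
            # (e.g. forwards the I/O to a virtual device), then resumes.
            exits.append(insn)
        # unprivileged instructions run directly on the real CPU
    return exits

print(run_guest(["add", "mov", "out", "add", "hlt"]))  # ['out', 'hlt']
```

The win over pure emulation is that the common case (the `add`s and `mov`s) never leaves the real CPU; only the rare privileged instruction pays the exit cost.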
Prior to this, binary code from the VM was passed to the HV and translated on the fly for safety reasons. With these extra instructions, this of course works much faster.
To further speed things up, there also exist hardware implementations of 'Nested Paging' / SLAT (second level address translation). These are called EPT ('Extended Page Tables', Intel) or RVI ('Rapid Virtualization Indexing', AMD) and perform the second translation stage in hardware, replacing software-managed 'shadow page tables'. That way MMU (the CPU's memory management unit) intensive workloads can usually be sped up.
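What "second level" means is easiest to see as two chained lookups: the guest's own page tables map guest-virtual to guest-physical addresses, and the hypervisor's tables map guest-physical to host-physical. EPT/RVI do the second lookup in hardware. Single-entry "page tables" below, purely for illustration:

```python
# Sketch of second level address translation (SLAT / nested paging):
# every guest memory access is translated twice. Without EPT/RVI the
# hypervisor must maintain combined 'shadow page tables' in software;
# with them, the hardware walks both tables itself.

guest_page_table = {0x1000: 0x5000}   # guest-virtual  -> guest-physical
host_page_table = {0x5000: 0x9000}    # guest-physical -> host-physical

def translate(guest_virtual):
    guest_physical = guest_page_table[guest_virtual]   # stage 1: guest OS
    host_physical = host_page_table[guest_physical]    # stage 2: EPT/RVI
    return host_physical

print(hex(translate(0x1000)))  # 0x9000
```

Workloads that change mappings often (lots of process creation, fork-heavy servers) benefit most, since every software shadow-table update the hypervisor would otherwise do simply disappears.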
Also, if your hypervisor is slow as hell, you may have to turn these CPU virtualization features on in the BIOS. Some mainboards have them deactivated by default (for whatever reason).
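On a Linux host you can check whether the CPU advertises the extensions at all by looking for the `vmx` (Intel VT-x) or `svm` (AMD-V) flag in /proc/cpuinfo. A small hedged sketch, run here against a sample string instead of the real file:

```python
# Check a /proc/cpuinfo dump for the CPU virtualization flags:
# 'vmx' = Intel VT-x, 'svm' = AMD-V. If the flag is missing despite a
# capable CPU, the feature is likely disabled in the BIOS.

def has_virt_flags(cpuinfo_text):
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = line.split(":", 1)[1].split()
            return bool({"vmx", "svm"} & set(flags))
    return False

sample = "processor : 0\nflags : fpu vme de pse tsc msr vmx sse2\n"
print(has_virt_flags(sample))  # True

# On a real Linux host you would do:
# with open("/proc/cpuinfo") as f:
#     print(has_virt_flags(f.read()))
```

Note that an absent flag can mean either an old CPU or the BIOS switch mentioned above; the flag only shows what the OS currently sees.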
These are separations such that processes within a given ring may only execute a subset of the processor instructions available to processes in the lower rings. To reach a lower ring, a kind of API is provided via interrupts, and context switches are necessary for the transitions.
Rings can be implemented purely in software (slow; google 'binary translation'), but nowadays hardware support (instructions within the processor, way faster, see above) is used for this.
First, an overview of which ring permits which level of hardware-enforced access in protected mode on an x86 CPU: (There exist some more modes, of course. ;))
Another term for the rings is hierarchical protection domains. They are mechanisms to secure execution of hardware-level instructions in the processor.
E.g. processes running in ring 0 have direct memory access and do not have to go through virtual memory, where RAM access would be limited for security reasons.
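A toy model of the ring check: each instruction carries the most privileged ring it requires, and executing it from a higher-numbered (less privileged) ring traps. The required-ring values below are illustrative, not taken from any real CPU manual:

```python
# Toy model of x86 protection rings: ring 0 is most privileged, ring 3
# least. Executing a privileged instruction from too high a ring causes
# a trap (on real hardware, typically a general protection fault).

REQUIRED_RING = {"mov": 3, "hlt": 0, "in": 0}   # illustrative values

def execute(instruction, current_ring):
    if current_ring > REQUIRED_RING[instruction]:
        return "trap: general protection fault"
    return "ok"

print(execute("mov", 3))  # ok   (userspace may move data around)
print(execute("hlt", 3))  # trap (userspace may not halt the CPU)
print(execute("hlt", 0))  # ok   (the kernel may)
```

This is exactly the mechanism a hypervisor exploits for trapping: the guest kernel believes it is in ring 0, but its privileged instructions still end up in the hypervisor's hands.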
According to the virtualbox manual, usually only rings 0 and 3 are used. But virtualbox also happens to use ring 1 for security reasons. See the aforementioned manual for more information on how this works.
When ring protection is coupled to certain processor modes, it is basically the well-known differentiation between kernel space and user space.
Depending on which ring the guest mostly operates in, the virtualization classification differs, which is why this part was included in the post in the first place.