Everything* You Ever Wanted to Know about Virtualization (but Were Afraid to Ask)
This blog series is about virtualization provided by a type-1 hypervisor, with an emphasis on the embedded use case on an ARMv8 Cortex-A processor. Even with that narrowed scope, this will be a four-part series. This first entry provides an overview of what virtualization is, some of its history, the benefits of using it on embedded systems, and a quick look at the hardware features that make it affordable in terms of CPU overhead and memory usage. Subsequent entries will go into more detail about how these hardware features are used in virtualization solutions such as the Xen Project hypervisor.
What Is Virtualization?
Techopedia defines virtualization as the “creation of a virtual resource such as a server, desktop, operating system, file, storage or network.”
Digging deeper, “virtual,” especially in the computing world, means “not physically existing as such but made by software to appear to do so.” So, virtualization is the act of creating a resource that doesn’t physically exist as such, but is made by software to appear as if it does. Another way to put it: virtualization is faking out the type or amount of a physical resource, such as CPUs, memory, storage media, and I/O, with a different type or amount of resource.
Sometimes—especially in ye olden days like the ’90s and early ’00s—this was done primarily with software, but at the cost of significant performance loss. The best results are achieved when that virtualization software can use hardware features to make the fake-out easier, reducing the number of CPU cycles required to perform the virtualization.
The basic concept of virtualization isn’t new. This is what operating systems have been doing for decades. For example, threading is faking out the number of CPUs. Virtual memory is faking out the amount or location of memory. File systems fake out the number of storage media, to the extent that a program running in user-space can act for all intents and purposes as if it is the only one with access to those resources. It is even possible for an operating system or user-space program to create and manage a virtual machine in which another, potentially different, operating system can run. This is called a Type II Virtual Machine Monitor (VMM), aka a hypervisor.
However, the amount of virtualization that an operating system provides quickly hits limits caused by performance degradation; software virtualization and abstraction always comes with a performance cost. In contrast, a Type I hypervisor runs directly on the hardware without relying on any services provided by an operating system.
In both cases, software running in those virtual machines believes it has exclusive access to one or more CPU cores, memory, and I/O that may or may not reflect the real hardware. This ability to create a virtual environment for software to run in is very powerful and provides a number of benefits.
A Brief History of Virtualization
The idea of abstracting hardware and virtualizing it for software is not new; it has appeared in the literature since the 1960s. In 1974, Popek and Goldberg formalized the definition of the hypervisor as a piece of software with the following three essential characteristics:
- The VMM provides an environment for programs which is essentially identical with the original machine;
- Programs run in this environment show at worst only minor decreases in speed;
- The VMM is in complete control of system resources.
The virtual machine is the environment created by the VMM.
Virtualization techniques, including the use of hypervisors, first saw widespread use in the server market. Initially most businesses hosted a single application on a server due to the relative parity between an application’s processing requirements and the server’s processing power, and also due to early limitations of the server’s OS when running multiple applications simultaneously.
Over time, increases in computing power resulted in excess capacity, creating demand for a way to consolidate applications that was also easy to manage. Demand also grew for backwards compatibility with obsolescent operating systems, as this enabled businesses to continue using legacy applications that depended on them.
VMware provided the first commercially available hypervisors for x86 computers that met both of these needs, starting in 2001. An initial consolidation ratio of 5:1 allowed businesses to eliminate four out of five servers. Recent VMware reports show consolidation ratios of 15:1. As companies migrated more of their services to run in virtual machines, it became possible to outsource data center services entirely, and infrastructure-as-a-service and platform-as-a-service in the cloud became a viable business model, allowing those companies to host their applications on remote virtual servers.
Hypervisors remain a key software component enabling the efficient use of processor resources for on-demand, elastic, metered cloud services from Amazon Web Services, Microsoft Azure, and Google Cloud.
Benefits of Embedded Virtualization
The first obvious benefit of embedded virtualization is the ability to run multiple operating systems. There are a number of different operating systems in use on embedded products, including Linux. Being able to use different operating systems side-by-side on the same chip gives designers the flexibility to mix capabilities and functionality from different operating systems, for example pairing a real-time operating system like FreeRTOS or RTEMS with a GUI-rich OS like Linux. It also avoids the need to port software from one OS to another. However, there are other, less obvious benefits as well, stemming from virtualization’s ability to provide isolation, enable new capabilities, and reduce product development risk.
An important benefit of virtualization is that virtual machines can be used to provide strong isolation between software functions (SWFs). As a single hardware resource becomes responsible for hosting multiple software functions, isolation is necessary to restrain growth in complexity and to mitigate the impacts of concurrent execution of those software functions. Without isolation, the number of possible interactions between pairs of software functions grows quadratically with the number of functions consolidated—n functions yield n(n − 1)/2 possible pairings—dramatically increasing the effort to analyze and describe the interactions and side effects. In their 1974 paper, Popek and Goldberg state that “isolation, in the sense of protection of the virtual machine environment, is meant to be implied as a result of the third characteristic defined above.”
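To make the combinatorics concrete, here is a small illustrative sketch (the function names are invented for this example) that counts the possible pairwise interactions among a set of consolidated software functions:

```python
from itertools import combinations

def pairwise_interactions(n):
    """Number of distinct pairs among n co-resident software functions.

    n * (n - 1) / 2 grows quadratically: doubling the functions roughly
    quadruples the interactions that must be analyzed.
    """
    return n * (n - 1) // 2

# Hypothetical software functions consolidated onto one processor.
functions = ["display", "comms", "logging", "control"]
pairs = list(combinations(functions, 2))

# Four functions already produce six pairings to analyze.
assert len(pairs) == pairwise_interactions(len(functions)) == 6
```

Going from 4 to 10 consolidated functions raises the pair count from 6 to 45, which is why isolation between VMs matters more as consolidation increases.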
Isolation is also useful for providing fault containment by preventing erroneous or malicious behavior in one software function, or application, from affecting another. Early literature going back to the ’70s identifies the need for spatial separation for fault containment, which is commonly achieved by restricting the memory locations a SWF can access, thus protecting the data and instructions of the other SWFs in a consolidated system. ARINC 651 also identifies the need for temporal separation, later expanded upon in ARINC 653, where a SWF in one partition cannot affect the execution time or time to access a resource of a SWF in another partition.
Isolation is also useful for enforcing greater decoupling between software components. Coupling between software components leads to various issues with development, integration, maintenance, and future migrations. This is because coupling creates complex dependencies between software components, often implicit or unknown, such that a change to, or addition of, a software component often has a wide-reaching and unexpected ripple effect throughout the system. Isolation through the use of virtual machines can enforce strong decoupling in which any dependencies between software functions are made explicit, making it easier to understand and eliminate unintended or unexpected interactions.
Enable New and Improved Features
The capabilities provided by virtualization can also be used to enable new features and improve old ones. As already mentioned, the isolation provided by virtual machines allows for enhanced security and safety, as it becomes possible to run functions in isolation, i.e. to sandbox them, so that a breach or failure in one VM is contained to that VM alone. Even a security vulnerability in one VM’s OS need not compromise functions in another VM, providing defense in depth.
The capability to consolidate disparate software functions enables the implementation of a centralized monitoring and management (MM) function that operates externally to the software functions being monitored. This MM function could be used to detect and dynamically respond to breaches and faults, for example restarting faulted VMs or terminating compromised VMs before an attacker can exploit them. A centralized monitoring function could also prove useful in embedded applications that place a premium on uptime: it could detect or predict when a VM is faulting, or about to fault, and ready a backup VM to take over with minimal loss of service.
There are other use cases that are common in the server world, where VMs are managed algorithmically by other programs, being created, copied, migrated, or destroyed in response to predefined stimuli. Virtualization enables guest migration, where the entire software stack, or part of it, can be moved from one VM to another, potentially on another platform entirely. This could be an important enabler for self-healing systems. Migration can also help with live system upgrades: the system operator could patch the OS or a service-critical library in a backup copy of the VM, test the patched VM to validate correct operation, and then migrate the actively running application to it, again with minimal loss of service.
Another use case seen in the server market is the ability to perform load balancing, either by dynamically controlling the number of VMs running to meet the current demands, or by migrating VMs to a computing resource closer to where the processing is actually needed, reducing traffic on the network. These are all capabilities that could be coming soon to an embedded system near you.
Reduce Program and Product Risk
Virtualization can be used to reduce program risk by providing the means to reconcile contradictory requirements. The most obvious example is the case where two pre-existing applications are needed for a product, each developed to run on a different operating system; here the contradictory requirements concern which OS to use. Other examples include software functions developed to different safety or security levels, where isolation allows you to avoid certifying all of your software to the highest level, or software under different license agreements, where you might want to keep your secret-sauce software separate from copyleft-licensed open-source code.
Long-lived programs can also benefit from the ability to add new VMs to the system at a later date, creating a path for future upgrades. Likewise, in a system using VMs, it becomes easier to migrate to newer hardware, especially if the hardware is backward compatible, as ARMv8 is with ARMv7. Even when it isn’t, thanks to Moore’s law newer processors will have even greater processing capabilities, and emulation can be used in a VM to provide the environment necessary to run legacy software.
Virtualization can also be used to reduce the risk of system failure at runtime. The dynamic load balancing mentioned earlier is one way to reduce that risk, but virtualization also makes it easy to provide redundancy for key functionality by running a second copy of the same VM. Combined with the centralized monitoring described above, the redundant VM can even be kept in a standby state and brought active only when data indicates a critical function is experiencing issues or is about to fail.
Key Hardware Features
In recent years, processors designed for the embedded market have started including hardware support for virtualization. These features make the most sense for multi-core, system-on-chip processors, which can best leverage the benefits of the technique. Below is a list of the key hardware features that future blog entries will address in more detail:
- New execution modes for the hypervisor
- On ARM, these are called Exception Levels (ELs), with the hypervisor running at EL2
- Two-stage memory management units (MMUs) and system MMUs (SMMUs)
- Used to confine a virtual machine, and the I/O devices allocated to it, to specific memory regions
- Interrupt virtualization, especially for timers
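As a preview of how the two MMU translation stages compose, here is a toy model (page tables reduced to Python dictionaries; all page numbers are invented for illustration, and real hardware walks multi-level tables rather than flat maps). A guest virtual address is translated by the guest OS’s stage-1 tables into an intermediate physical address (IPA), which the hypervisor’s stage-2 tables then translate into a machine physical address—so the guest never sees, and cannot reach, memory the hypervisor hasn’t mapped for it:

```python
PAGE_SIZE = 0x1000  # 4 KiB pages, as with ARMv8's 4K translation granule

# Stage 1 (owned by the guest OS): guest virtual page -> intermediate physical page
stage1 = {0x400: 0x80, 0x401: 0x81}
# Stage 2 (owned by the hypervisor): intermediate physical page -> machine physical page
stage2 = {0x80: 0x200, 0x81: 0x3F0}

def translate(va, s1, s2):
    """Compose both translation stages; a miss at either stage raises a fault."""
    page, offset = divmod(va, PAGE_SIZE)
    ipa_page = s1.get(page)
    if ipa_page is None:
        raise MemoryError("stage-1 fault: handled by the guest OS")
    pa_page = s2.get(ipa_page)
    if pa_page is None:
        raise MemoryError("stage-2 fault: handled by the hypervisor")
    return pa_page * PAGE_SIZE + offset

# Guest VA 0x400123 -> IPA page 0x80 -> machine PA 0x200123; the offset is preserved.
assert translate(0x400123, stage1, stage2) == 0x200123
```

The key point of the sketch is the asymmetry of control: the guest can change stage 1 freely, but only the hypervisor edits stage 2, which is what confines the VM (and, via the SMMU, its DMA-capable I/O devices) to its assigned memory.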