

# **Memory Virtualization**







MCSN - N.Tonellotto - Distributed Enabling Platforms

















ISTITUTO DI SCIENZA E TECNOLOGIE DELL'INFORMAZIONE "A. FAEDO"















# Memory Management Unit

UNIVERSITÀ DI PISA

- The MMU implements the virtual address space
  - Handles the memory accesses requested by the CPU
  - It is a hardware component
  - Uses data structures stored in main memory











#### Virtual Address



















































- The complete physical address of the memory cell is determined by combining the page address from the page directory with the lower bits of the virtual address
- More than one entry in the page directory can point to the same physical address
- The page directory entry also contains some additional information about the page
  - Access permissions, etc.
- The page directory data structure is stored in main memory
  - The OS has to allocate contiguous physical memory and store the base address of this memory region in a special CPU register
  - The OS individually sets each entry in the page directory
- Layout example (x86)
  - Address space: 32 bits
  - Page size: 4MB, i.e., 22 bits to address every byte in the page (offset)
  - Page directory size: 1024 entries, i.e., 10 bits to address every entry (directory)
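The single-level scheme above can be sketched as a toy model. It assumes the page directory is a Python list whose entries are hypothetical `(base_address, writable)` tuples rather than the real x86 entry format; the names `split` and `translate` are illustrative only.

```python
# Toy single-level translation with the 32-bit x86 layout above:
# 10-bit directory index + 22-bit offset (4MB pages).
# Entries are hypothetical (base_address, writable) tuples, not real x86 PDEs.
OFFSET_BITS = 22                         # 4MB page -> 2**22 bytes
DIR_BITS = 10                            # 1024 directory entries
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def split(va: int) -> tuple[int, int]:
    """Split a 32-bit virtual address into (directory index, offset)."""
    if not 0 <= va < (1 << (DIR_BITS + OFFSET_BITS)):
        raise ValueError("not a 32-bit address")
    return va >> OFFSET_BITS, va & OFFSET_MASK


def translate(page_directory: list, va: int, write: bool = False) -> int:
    index, offset = split(va)
    entry = page_directory[index]
    if entry is None:
        raise LookupError(f"page fault: directory entry {index} not present")
    base, writable = entry
    if write and not writable:           # permission info lives in the entry
        raise PermissionError("protection fault: page is read-only")
    return base | offset                 # page address + low bits of the VA


page_directory = [None] * (1 << DIR_BITS)
page_directory[1] = (0x0C00_0000, True)   # two entries sharing one physical page,
page_directory[2] = (0x0C00_0000, False)  # the second one read-only

print(hex(translate(page_directory, 0x0040_1234)))  # 0xc001234
print(hex(translate(page_directory, 0x0080_1234)))  # 0xc001234
```

Both virtual addresses land on the same physical byte, illustrating that several directory entries may point to the same page while carrying different permissions.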









- The typical page size is 4KB, not 4MB
  - Selector: 20 bits, offset: 12 bits
- The page table then has $2^{20}$ = 1,048,576 entries
  - If each page table entry is 4 bytes, the page table size is 4MB
- Each process needs its own page table
  - 256 processes would occupy 1GB of memory just for page tables
- The solution is to use multiple levels of page tables, which represent a huge, sparse page directory
  - Address space regions which are not actually used do not require allocated memory
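The arithmetic behind these bullets can be checked directly. The second half is an assumed comparison with a classic two-level 32-bit layout (10 + 10 + 12 bits, 4-byte entries); the 8MB process is a hypothetical example, assumed to be aligned on 4MB boundaries.

```python
# Page table sizes for 4KB pages on a 32-bit address space (figures above),
# plus a hypothetical two-level (10 + 10 + 12 bit) comparison.
ADDRESS_BITS = 32
OFFSET_BITS = 12                              # 4KB pages
PTE_SIZE = 4                                  # bytes per entry, as assumed above

entries = 1 << (ADDRESS_BITS - OFFSET_BITS)   # 2**20 = 1,048,576 entries
table_bytes = entries * PTE_SIZE              # flat table: 4MB per process
processes = 256
total_bytes = processes * table_bytes         # 1GB just for page tables

# Two levels: a 4KB table of 1024 entries maps 4MB, and only the tables that
# cover used regions exist. Hypothetical process using 8MB of aligned memory:
used_bytes = 8 * 2**20
coverage = 1024 * (1 << OFFSET_BITS)          # bytes mapped by one second-level table
tables_needed = -(-used_bytes // coverage)    # ceiling division -> 2
two_level_bytes = (1 + tables_needed) * 4096  # top-level directory + 2 tables

print(f"{entries:,} entries, {table_bytes // 2**20}MB per flat table, "
      f"{total_bytes // 2**30}GB for {processes} processes")
print(f"two-level tables for 8MB: {two_level_bytes // 1024}KB")
```

The flat layout pays for the whole address space up front, while the sparse two-level layout pays roughly in proportion to the memory actually used (12KB instead of 4MB in this example).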











- The process of determining the physical address is called page tree walking
  - Some processors do it in hardware, others need help from the OS
- With 4 levels (e.g., x86-64 with 4KB pages and 512 entries per directory), a small program might get by with using just one directory at each of levels 2, 3 and 4 and a few level 1 directories
- 1GB of contiguous memory can be addressed with one directory for levels 2 to 4 and 512 directories for level 1
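A minimal sketch of the 4-level tree walk, assuming the x86-64 layout (9 index bits per level, 512 entries per directory, 4KB pages). Directories are modelled as Python dicts created only on demand, which is what keeps the tree sparse; the frame addresses and function names are hypothetical. Counting the allocated directories reproduces the figures above: 4 for a 2MB region and 515 (1 + 1 + 1 + 512) for 1GB.

```python
# Toy 4-level page tree walk with the x86-64 layout: 9 index bits per level
# (512 entries per directory) and a 12-bit offset (4KB pages). Directories
# are dicts created on demand, so unused regions cost no memory at all.
LEVELS = 4
INDEX_BITS = 9
OFFSET_BITS = 12
PAGE = 1 << OFFSET_BITS
INDEX_MASK = (1 << INDEX_BITS) - 1


def indices(va: int) -> tuple[list[int], int]:
    """Per-level directory indices, level 4 (top) first, and the page offset."""
    idx = [(va >> (OFFSET_BITS + INDEX_BITS * (level - 1))) & INDEX_MASK
           for level in range(LEVELS, 0, -1)]
    return idx, va & (PAGE - 1)


def map_page(root: dict, va: int, frame: int) -> None:
    idx, _ = indices(va)
    node = root
    for i in idx[:-1]:
        node = node.setdefault(i, {})   # allocate a lower-level directory only if needed
    node[idx[-1]] = frame               # level-1 entry holds the 4KB frame address


def walk(root: dict, va: int) -> int:
    """Page tree walk: one directory lookup per level, then add the offset."""
    idx, offset = indices(va)
    node = root
    for level, i in zip(range(LEVELS, 0, -1), idx):
        if i not in node:
            raise LookupError(f"page fault: missing entry at level {level}")
        node = node[i]
    return node | offset


def count_directories(node: dict, level: int = LEVELS) -> int:
    if level == 1:                      # level-1 entries point to frames, not directories
        return 1
    return 1 + sum(count_directories(child, level - 1) for child in node.values())


def map_region(size: int, frame_base: int) -> dict:
    """Map `size` bytes of contiguous virtual memory starting at address 0."""
    root: dict = {}
    for n in range(size // PAGE):
        map_page(root, n * PAGE, frame_base + n * PAGE)
    return root


small = map_region(2 * 2**20, 0x4000_0000)   # 2MB, hypothetical frames
big = map_region(2**30, 0x4000_0000)         # 1GB
print(count_directories(small), count_directories(big))   # 4 515
print(hex(walk(small, 0x1234)))                            # 0x40001234
```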



Monday, 17 December 2012






































## Virtualizing Virtual Memory

























- The VMM must map guest virtual addresses (GVAs) to host physical addresses (HPAs)
- The guest OS maintains its own virtual memory page table in the guest physical memory
- The VMM maintains a mapping from each guest physical memory page to the host physical memory page
  - PMAP data structure
- At first sight, the VMM could intercept the MMU hardware requests to translate GVAs
  - Monitoring the PTBR (page table base register)
  - Two memory accesses, to the guest virtual memory page table and to the PMAP
- What about the TLB?
- So what the hell is a shadow page table?
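The composition the VMM has to compute can be sketched as two dictionary lookups. This is a simplified model, assuming single-level tables, 4KB pages and hypothetical page numbers; a shadow page table is essentially a cache of the result of this composition, so that the hardware needs a single lookup.

```python
# Two lookups per GVA: guest page table, then PMAP (page numbers, 4KB pages).
# Both tables are hypothetical dicts with made-up page numbers.
PAGE_BITS = 12
PAGE_MASK = (1 << PAGE_BITS) - 1

guest_page_table = {0x10: 0x2A, 0x11: 0x2B}   # GVA page -> GPA page (guest OS)
pmap = {0x2A: 0x7F3, 0x2B: 0x105}             # GPA page -> HPA page (VMM)


def gva_to_hpa(gva: int) -> int:
    vpn, offset = gva >> PAGE_BITS, gva & PAGE_MASK
    gppn = guest_page_table[vpn]   # 1st access: guest page table
    hppn = pmap[gppn]              # 2nd access: VMM's PMAP
    return (hppn << PAGE_BITS) | offset


print(hex(gva_to_hpa(0x10123)))    # 0x7f3123
```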







- The VMM must intercept all VM instructions that manipulate:
  - The hardware TLB contents
  - The guest OS page tables
- The actual hardware TLB is updated based on separate shadow page tables
  - They contain the guest virtual to host physical address mapping
- The VMM must write-protect the host frames containing the guest page tables, so that every guest update traps into the VMM!
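A toy model of the shadow page table mechanism described above, assuming single-level tables and page-number granularity. The class and method names (`ShadowMMU`, `access`, `guest_writes_pte`) are hypothetical and do not correspond to any real VMM; the write trap is modelled as a method call rather than a real protection fault.

```python
class ShadowMMU:
    """Toy shadow page table (page numbers only, single level).

    `shadow` (GVA page -> HPA page) is what the real MMU/TLB would use. The
    VMM fills it lazily from the guest page table and the PMAP, and keeps it
    coherent by trapping guest writes to the write-protected guest tables.
    """

    def __init__(self, guest_pt: dict, pmap: dict):
        self.guest_pt = guest_pt   # GVA page -> GPA page, maintained by the guest
        self.pmap = pmap           # GPA page -> HPA page, maintained by the VMM
        self.shadow = {}           # GVA page -> HPA page, composed by the VMM
        self.hidden_faults = 0     # faults handled by the VMM, invisible to the guest

    def access(self, vpn: int) -> int:
        if vpn not in self.shadow:
            self.hidden_faults += 1
            gppn = self.guest_pt[vpn]          # KeyError here = a genuine guest page fault
            self.shadow[vpn] = self.pmap[gppn]
        return self.shadow[vpn]

    def guest_writes_pte(self, vpn: int, gppn: int) -> None:
        # The store to the write-protected frame traps into the VMM, which
        # applies it on the guest's behalf and drops the stale shadow entry.
        self.guest_pt[vpn] = gppn
        self.shadow.pop(vpn, None)


mmu = ShadowMMU({1: 0x2A}, {0x2A: 0x7F3, 0x2B: 0x105})
print(hex(mmu.access(1)), hex(mmu.access(1)), mmu.hidden_faults)  # 0x7f3 0x7f3 1
mmu.guest_writes_pte(1, 0x2B)
print(hex(mmu.access(1)), mmu.hidden_faults)                      # 0x105 2
```

Only the first access and the access after the guest's page table update go through the VMM; all other accesses hit the shadow mapping directly.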


























Example: process 2 in the guest OS wants to access its memory page number 1






#### The MMU is driven by events

- Generated by the guest
  - Write instructions to control registers (in particular the PTBR)
  - Page invalidation instructions (in case of page faults)
  - Accesses to missing or protected entries
- Generated by the host
  - Changes in the PMAP translation (GPA → HPA)
    - GPA → HVA changes
    - HVA → HPA changes
  - Memory pressure
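The event list above can be read as the input of a dispatcher that keeps the shadow tables coherent. The sketch is hypothetical: it assumes one shadow table per guest page table base, entries that remember the guest physical page they were derived from, GPA → HVA and HVA → HPA changes folded into a single `PMAP_CHANGE` event, and a deliberately simple memory-pressure policy. It is not KVM's actual code.

```python
from enum import Enum, auto


class Event(Enum):
    PTBR_WRITE = auto()       # guest: write to a control register (the PTBR)
    INVALIDATE_PAGE = auto()  # guest: page invalidation instruction
    PMAP_CHANGE = auto()      # host: a GPA -> HPA translation changed
    MEMORY_PRESSURE = auto()  # host: shadow tables must give memory back


class ShadowTables:
    """Hypothetical per-guest-page-table shadow tables.

    tables: guest PTBR value -> {GVA page: (GPA page, HPA page)}
    """

    def __init__(self):
        self.tables = {}
        self.active_ptbr = None

    def on_event(self, event: Event, arg=None) -> None:
        if event is Event.PTBR_WRITE:
            # Guest switched address space: switch to (or create) its shadow table.
            self.active_ptbr = arg
            self.tables.setdefault(arg, {})
        elif event is Event.INVALIDATE_PAGE and self.active_ptbr is not None:
            # Guest invalidated one virtual page: drop the matching shadow entry.
            self.tables[self.active_ptbr].pop(arg, None)
        elif event is Event.PMAP_CHANGE:
            # Guest physical page `arg` was remapped on the host: every shadow
            # entry derived from it, in every table, is now stale.
            for table in self.tables.values():
                for vpn in [v for v, (gppn, _) in table.items() if gppn == arg]:
                    del table[vpn]
        elif event is Event.MEMORY_PRESSURE:
            # Simplistic policy (assumed): free all inactive shadow tables.
            self.tables = {p: t for p, t in self.tables.items()
                           if p == self.active_ptbr}
```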



























































#### Hardware Assisted Virtualization












- Difficulties of shadow page tables
  - Complex software-only implementation
  - Page faults and synchronization are critical mechanisms
  - Host memory overhead
- Why do we need them?
  - The MMU was not designed for virtualization
  - The MMU is not aware of the two-level address translation
- New CPUs support two-level address translation in hardware!
  - Nested Page Tables, a.k.a. Rapid Virtualization Indexing (AMD)
  - Extended Page Tables (Intel)
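With hardware two-level translation, every guest physical address met during the guest's page walk, including the guest page table pointers themselves, must in turn be translated through the host tables. The sketch simply enumerates the memory references of one such two-dimensional walk on a TLB miss, assuming $n$ guest levels and $m$ host levels and no caching of intermediate translations (real CPUs do cache them). For $n = m = 4$ the count is $(n+1)(m+1)-1 = 24$, versus 4 for a native walk, which is why TLB misses cost more with nested paging.

```python
def nested_walk(guest_levels: int = 4, host_levels: int = 4) -> list[str]:
    """List the memory references of one 2D page walk, assuming a TLB miss
    and no caching of intermediate translations."""
    refs = []
    gpa = "guest PTBR"                     # the guest's root table is itself a GPA
    for level in range(guest_levels, 0, -1):
        # Translate the GPA of the guest table with a full host walk...
        refs += [f"host L{h} entry for {gpa}" for h in range(host_levels, 0, -1)]
        # ...then read the guest entry, which yields the next GPA.
        refs.append(f"guest L{level} entry")
        gpa = f"GPA from guest L{level}"
    # The final GPA (the data page) needs one more full host walk.
    refs += [f"host L{h} entry for {gpa}" for h in range(host_levels, 0, -1)]
    return refs


print(len(nested_walk()))       # 24
print(len(nested_walk(1, 1)))   # 3
```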






Figure: guest virtual address → guest physical address (guest OS) → host physical address (host VMM)



## **Nested Page Tables**
















- An application uses the OS interfaces to explicitly allocate and deallocate virtual memory during execution
  - malloc and free from the GNU C Library (glibc)
- In a non-virtualized environment, the OS assumes it owns all the physical memory in the system
- Hardware does not explicitly provide interfaces to allocate and free physical memory
- The OS implements its own mechanism to track memory allocations
  - Allocated and free lists
- The VMM must implement analogous data structures
  - Allocation is easy, via interception of memory accesses
  - Deallocation is hard: free lists cannot be intercepted







- The VMM cannot reclaim host physical memory when the guest OS frees guest physical memory
- The VMM does not allocate host physical memory on every VM memory allocation
- The VMM only allocates host physical memory when the VM touches guest physical memory that it has never touched before
- The guest OS then reuses the same (already backed) host physical memory for subsequent allocations

VM's "host memory" usage $\leq$ VM's "guest memory" size + VM's "overhead" memory
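A minimal model of this behaviour, assuming page-granularity bookkeeping and a trivial host allocator; the `VMMemory` class is hypothetical. The VMM only observes first touches, while guest frees are invisible, so host usage only grows until it reaches the bound above.

```python
class VMMemory:
    """Hypothetical per-VM bookkeeping in the VMM, at page granularity."""

    def __init__(self, guest_pages: int, overhead_pages: int):
        self.guest_pages = guest_pages
        self.overhead_pages = overhead_pages
        self.pmap = {}             # GPA page -> HPA page, filled on first touch
        self._next_hpa = 0         # trivial host allocator (assumption)

    def touch(self, gppn: int) -> int:
        """Guest accesses a guest physical page."""
        if not 0 <= gppn < self.guest_pages:
            raise ValueError("outside guest physical memory")
        if gppn not in self.pmap:          # first touch traps: back it now
            self.pmap[gppn] = self._next_hpa
            self._next_hpa += 1
        return self.pmap[gppn]

    def host_usage(self) -> int:
        return len(self.pmap) + self.overhead_pages


vm = VMMemory(guest_pages=4, overhead_pages=1)
for gppn in (0, 1):
    vm.touch(gppn)        # guest allocates pages 0 and 1
# The guest frees pages 0 and 1 by putting them on ITS OWN free list:
# nothing traps, so the VMM keeps them backed.
for gppn in (0, 1):
    vm.touch(gppn)        # later allocations reuse the same guest pages
print(vm.host_usage())    # 3
for gppn in range(4):
    vm.touch(gppn)
print(vm.host_usage())    # 5 == guest_pages + overhead_pages, the upper bound
```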







- The VMM must reserve enough host physical memory to back all of each VM's guest physical memory
  - Plus its overhead memory
- Overcommitment then seems not to be supportable
  - Memory is overcommitted if the sum of the VMs' memory exceeds the host memory
- Overcommitment benefits:
  - Higher memory utilization
    - If a VM does not completely use its committed memory, another VM can benefit from it
  - Higher memory consolidation
    - VMs have small footprints, so more VMs can be hosted at the same time
- To support memory overcommitment, the VMM must be able to reclaim host memory
  - Transparent page sharing
  - Ballooning
  - Host swapping
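A small sketch of the overcommitment condition, together with a hypothetical reclamation policy that tries the three techniques in the order listed and keeps host swapping as the last resort. The function names and the reclaimable amounts are assumed inputs for illustration; a real VMM would estimate them from its own state.

```python
def overcommitted(vm_memory_mb: list[int], host_memory_mb: int) -> bool:
    """Memory is overcommitted when the VMs' configured memory exceeds the host's."""
    return sum(vm_memory_mb) > host_memory_mb


def reclaim_plan(needed_mb: int,
                 reclaimable_mb: dict[str, int]) -> tuple[list[tuple[str, int]], int]:
    """Hypothetical policy: use the techniques in order, host swapping last.

    `reclaimable_mb` is an assumed estimate of what each technique could
    reclaim right now. Returns the plan and any remaining shortfall.
    """
    plan = []
    for technique in ("page sharing", "ballooning", "host swapping"):
        if needed_mb <= 0:
            break
        amount = min(needed_mb, reclaimable_mb.get(technique, 0))
        if amount > 0:
            plan.append((technique, amount))
            needed_mb -= amount
    return plan, needed_mb


print(overcommitted([4096] * 4, 12288))   # True: 16GB of VMs on a 12GB host
print(reclaim_plan(1500, {"page sharing": 600, "ballooning": 800,
                          "host swapping": 4096}))
# ([('page sharing', 600), ('ballooning', 800), ('host swapping', 100)], 0)
```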








- Some VMs can have identical sets of memory content
  - Several VMs running the same OS
  - Several VMs executing the same applications
  - Several VMs accessing the same user data
- Reduce memory usage by reclaiming redundant copies of identical pages
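The bullets do not say how identical pages are found. One common approach, assumed here, is to hash page contents, confirm candidate matches with a full comparison, map the duplicates to a single read-only host page, and break the sharing with copy-on-write when a VM writes to it. The sketch is a toy model of that approach with hypothetical names, not any particular hypervisor's implementation.

```python
import hashlib


class PageSharer:
    """Toy transparent page sharing across VMs (names are hypothetical)."""

    def __init__(self):
        self.by_hash = {}     # content hash -> candidate host page
        self.content = {}     # host page -> bytes
        self.refs = {}        # host page -> number of guest pages mapped to it
        self.pmap = {}        # (vm, guest page) -> host page
        self._next = 0

    def _new_host_page(self, data: bytes) -> int:
        hpn, self._next = self._next, self._next + 1
        self.content[hpn], self.refs[hpn] = data, 0
        return hpn

    def map_page(self, vm: str, gppn: int, data: bytes) -> None:
        key = hashlib.sha256(data).digest()
        hpn = self.by_hash.get(key)
        if hpn is None or self.content[hpn] != data:   # confirm with a full compare
            hpn = self._new_host_page(data)
            self.by_hash[key] = hpn
        self.refs[hpn] += 1
        self.pmap[(vm, gppn)] = hpn

    def write(self, vm: str, gppn: int, data: bytes) -> None:
        # Shared pages are mapped read-only; a write faults into the VMM,
        # which first gives the writer a private copy (copy-on-write).
        hpn = self.pmap[(vm, gppn)]
        if self.refs[hpn] > 1:
            self.refs[hpn] -= 1
            hpn = self._new_host_page(self.content[hpn])
            self.refs[hpn] = 1
            self.pmap[(vm, gppn)] = hpn
        self.content[hpn] = data

    def host_pages(self) -> int:
        return sum(1 for n in self.refs.values() if n > 0)


zero, kernel = bytes(4096), b"same kernel code".ljust(4096, b"\0")
tps = PageSharer()
for vm in ("vm1", "vm2"):
    tps.map_page(vm, 0, zero)
    tps.map_page(vm, 1, kernel)
    tps.map_page(vm, 2, f"{vm} private data".encode())
print(tps.host_pages())          # 4 host pages back 6 guest pages
tps.write("vm2", 0, b"dirty")
print(tps.host_pages())          # 5: vm2 got a private copy of the zero page
```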










- The guest OS is not aware of the host memory status
  - In particular, it does not free memory if the host is running out of it
- A pseudo driver is installed in each guest OS
  - Balloon driver
  - No exposed interfaces to the guest OS
  - Privately communicates with the VMM only
  - It requests memory allocations from the guest OS, depending on its "size"
- If the VMM needs to reclaim two pages, it sets the balloon size to two pages
- After allocation, these two pages are "pinned"
  - The guest OS ensures that pinned pages will never be paged out to disk
- After pinning, the VMM can safely reclaim the corresponding host physical pages
  - Nobody actually relies on their content (read or write)
- When the balloon deflates, the driver releases the "pins"








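- Guest OS swap space is critical: when the balloon inflates under memory pressure, the guest OS may have to page out other memory to its own swap space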
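The inflate/deflate protocol above can be sketched as follows, assuming page-granularity bookkeeping. `GuestOS`, `BalloonDriver`, `set_size` and the shared `vmm_reclaimed` set are hypothetical stand-ins for the guest allocator, the pseudo driver and its private channel to the VMM.

```python
class GuestOS:
    """Hypothetical guest with a free list of guest physical pages."""

    def __init__(self, pages: int):
        self.free = set(range(pages))
        self.pinned = set()        # never paged out to the guest swap device

    def allocate_pinned(self) -> int:
        if not self.free:
            # A real guest would now page out other memory to its own swap
            # space to satisfy the request: guest swap space is critical.
            raise MemoryError("guest out of free pages")
        gppn = self.free.pop()
        self.pinned.add(gppn)
        return gppn

    def release(self, gppn: int) -> None:
        self.pinned.discard(gppn)
        self.free.add(gppn)


class BalloonDriver:
    """Pseudo driver: no guest-facing interface, talks only to the VMM."""

    def __init__(self, guest: GuestOS, vmm_reclaimed: set):
        self.guest = guest
        self.vmm_reclaimed = vmm_reclaimed  # GPA pages whose host pages the VMM took back
        self.balloon = set()

    def set_size(self, pages: int) -> None:
        """Called by the VMM over its private channel."""
        while len(self.balloon) < pages:         # inflate
            gppn = self.guest.allocate_pinned()
            self.balloon.add(gppn)
            self.vmm_reclaimed.add(gppn)         # nobody uses it: safe to reclaim
        while len(self.balloon) > pages:         # deflate
            gppn = self.balloon.pop()
            self.vmm_reclaimed.discard(gppn)     # VMM backs it again on next touch
            self.guest.release(gppn)


guest, reclaimed = GuestOS(pages=8), set()
driver = BalloonDriver(guest, reclaimed)
driver.set_size(2)
print(len(reclaimed), len(guest.free))   # 2 6
driver.set_size(0)
print(len(reclaimed), len(guest.free))   # 0 8
```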














- Transparent page sharing and ballooning have performance impacts on the VMM
- Host swapping is used if VMM performance is critical
  - When a VM is started, the VMM creates a separate swap file for the virtual machine
  - When necessary, the VMM can swap out the guest memory to its swap file
- The VMM performance is guaranteed
- The VM performance is severely degraded
- Double paging problem
  - Assume the hypervisor swaps out a guest physical page
  - The guest OS may then page out the same guest physical page
    - If the guest is also under memory pressure
  - This causes the page to be swapped in from the hypervisor swap device and immediately paged out to the virtual machine's virtual swap device
  - Note that no algorithm can handle all these pathological cases properly
- Due to the potentially high performance penalty for VMs, host swapping is the last resort to reclaim memory from a VM
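The double paging scenario can be traced with a toy model that only logs disk I/O; the function name and the set-based representation are hypothetical. Page 7 has already been swapped out by the hypervisor when the guest, under its own memory pressure, picks the same page for eviction.

```python
def guest_pages_out(page: int, hypervisor_swapped: set, io_log: list) -> None:
    """Guest OS evicts `page` to its virtual swap device (toy model)."""
    if page in hypervisor_swapped:
        # To write the page out, the guest must read it; the page is not in
        # host memory, so the hypervisor first swaps it back in...
        io_log.append(("hypervisor swap-in", page))
        hypervisor_swapped.discard(page)
    # ...only for the guest to write it straight out to its own swap device.
    io_log.append(("guest swap-out", page))


swapped, log = {7}, []            # the hypervisor already swapped out page 7
guest_pages_out(7, swapped, log)  # double paging: 2 disk I/Os
guest_pages_out(3, swapped, log)  # normal guest paging: 1 disk I/O
print(log)
# [('hypervisor swap-in', 7), ('guest swap-out', 7), ('guest swap-out', 3)]
```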





# References



- Performance Evaluation of Intel EPT Hardware Assist
  - <http://www.vmware.com/pdf/Perf_ESX_Intel-EPT-eval.pdf>
- AMD-V™ Nested Paging
  - <http://developer.amd.com/assets/NPT-WP-1%201-final-TM.pdf>
- The x86 kvm shadow mmu
  - <http://www.mjmwired.net/kernel/Documentation/kvm/mmu.txt>
- What Every Programmer Should Know About Memory
  - <http://www.akkadia.org/drepper/cpumemory.pdf>
- Understanding Memory Resource Management in VMware® ESX™ Server
  - <http://www.vmware.com/files/pdf/perf-vsphere-memory_management.pdf>

