Stateless allocators and memory resources

Slides: https://guoci.github.io/stateless_alloc_pmr/

Abstract

  • A common use case for a std::pmr::memory_resource object is to use it to construct a std::pmr::polymorphic_allocator object, then use it in a std::pmr container.
  • Here we explore an alternative: using a std::pmr::memory_resource with a stateless allocator.

Abstract

Common usage:
std::pmr::memory_resource → std::pmr::polymorphic_allocator → std::pmr container

Alternative:
std::pmr::memory_resource → stateless allocator → container using stateless allocator

Why should you care about this

  • Some containers do not need type erasure on the allocator type
  • For example, temporary containers that are used only within a block scope
  • Less refactoring is needed for existing code to use a std::pmr::memory_resource
  • The space and compute overhead associated with a stateful allocator is eliminated

Introduction to std::pmr::memory_resource

  • From cppreference: memory resources implement memory allocation strategies that can be used by std::pmr::polymorphic_allocator
    1. std::pmr::new_delete_resource: uses global operator new and operator delete
    2. std::pmr::null_memory_resource: throws std::bad_alloc on allocate
    3. std::pmr::(un)synchronized_pool_resource: pool resources following the object pool pattern; unsynchronized_pool_resource is for single-threaded use, synchronized_pool_resource is thread-safe
    4. std::pmr::monotonic_buffer_resource: single-threaded bump allocator
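
To make the list above concrete, here is a minimal sketch (not from the slides) that combines two of these resources: a std::pmr::monotonic_buffer_resource bumping through a 64-byte stack buffer, with std::pmr::null_memory_resource as its upstream so that exhausting the buffer throws std::bad_alloc instead of falling back to the heap. The function name fits_in_small_buffer and the buffer size are illustrative assumptions.

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <new>

// Hypothetical helper: returns true if `bytes` can be served from a 64-byte
// stack buffer, false if the request has to go upstream (which always throws).
bool fits_in_small_buffer(std::size_t bytes) {
    std::array<std::byte, 64> buffer{};
    std::pmr::monotonic_buffer_resource bump(buffer.data(), buffer.size(),
                                             std::pmr::null_memory_resource());
    try {
        void* p = bump.allocate(bytes, 1);
        bump.deallocate(p, bytes, 1);  // no-op: a monotonic resource frees on destruction
        return true;
    } catch (const std::bad_alloc&) {
        return false;  // buffer exhausted and the null upstream refused
    }
}
```

Swapping the upstream for std::pmr::new_delete_resource() would turn the failure case into an ordinary heap allocation.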

Introduction to PMR containers

  • A std::pmr container uses a std::pmr::polymorphic_allocator.
  • For example, std::pmr::vector<T> is an alias template for std::vector<T, std::pmr::polymorphic_allocator<T>>.
  • std::pmr::polymorphic_allocator wraps a pointer to a std::pmr::memory_resource
Reference: https://en.cppreference.com/w/cpp/memory/polymorphic_allocator
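
A short sketch of the common usage described above, assuming C++17. The static_assert restates the alias relationship; vector_uses_resource is a hypothetical helper name used only for illustration.

```cpp
#include <memory_resource>
#include <type_traits>
#include <vector>

// std::pmr::vector<T> is std::vector with a polymorphic allocator.
static_assert(std::is_same_v<std::pmr::vector<int>,
                             std::vector<int, std::pmr::polymorphic_allocator<int>>>);

// Hypothetical helper: fills a pmr vector from `resource` and reports whether
// the vector's allocator still refers to that resource.
bool vector_uses_resource(std::pmr::memory_resource* resource) {
    // The memory_resource* converts implicitly to a polymorphic_allocator<int>.
    std::pmr::vector<int> values(resource);
    for (int i = 0; i < 100; ++i) values.push_back(i);
    return values.get_allocator().resource() == resource;
}
```

Each std::pmr::vector instance carries its own resource pointer, which is what makes the allocator stateful.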

Using stateless allocator with memory resource

  • A stateless allocator relies on global state, e.g. malloc
  • The allocation strategy can be changed by modifying that global state
    1. replacing malloc/free, e.g. with jemalloc, TCMalloc, or mimalloc
    2. modifying global variables in code

Using stateless allocator with memory resource, implementation

  1. Implement an allocator with the same interface as std::allocator
  2. Define a global pointer to a memory resource
  3. Modify the allocator's (de)allocate member functions to use that pointer for (de)allocation

Allocator implementation

Header only, 45 lines of code
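
The code from this slide is not reproduced here. As a stand-in, the following is a minimal sketch of what such an allocator could look like, assuming C++17. The names global_resource, global_resource_allocator and global_vector are hypothetical, and the sketch omits details a production header would want, such as an overflow check on n * sizeof(T).

```cpp
#include <cstddef>
#include <memory_resource>
#include <type_traits>
#include <vector>

// Hypothetical global pointer read by every allocator instance.
// It starts at the new/delete resource so it is never null.
inline std::pmr::memory_resource* global_resource = std::pmr::new_delete_resource();

// Hypothetical stateless allocator: no data members, all instances compare equal.
template <class T>
struct global_resource_allocator {
    using value_type = T;

    global_resource_allocator() noexcept = default;
    template <class U>
    global_resource_allocator(const global_resource_allocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        // No overflow check on n * sizeof(T) in this sketch.
        return static_cast<T*>(global_resource->allocate(n * sizeof(T), alignof(T)));
    }
    void deallocate(T* p, std::size_t n) noexcept {
        global_resource->deallocate(p, n * sizeof(T), alignof(T));
    }
};

template <class T, class U>
bool operator==(const global_resource_allocator<T>&,
                const global_resource_allocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const global_resource_allocator<T>&,
                const global_resource_allocator<U>&) noexcept { return false; }

// The allocator adds no per-container state.
static_assert(std::is_empty_v<global_resource_allocator<int>>);

// Hypothetical convenience alias for a vector using this allocator.
template <class T>
using global_vector = std::vector<T, global_resource_allocator<T>>;
```

One consequence of this design: a block must be deallocated through the same resource that allocated it, so the global pointer should not change while containers that allocated through it are still alive.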

Using stateless allocator with memory resource

  1. Construct a std::pmr::memory_resource object and initialize that global pointer to point to it
  2. Set the type of the allocator of your container to this allocator type

Usage
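
The usage code from this slide is not reproduced here. The two steps above might look like the following sketch, which assumes C++17. A compact copy of a hypothetical stateless allocator is repeated so the block compiles on its own; scoped_global_resource and sum_with_local_pool are also hypothetical names, and the RAII guard is an added convenience rather than something the slides describe.

```cpp
#include <cstddef>
#include <list>
#include <memory_resource>

// Hypothetical global pointer and stateless allocator (compact form).
inline std::pmr::memory_resource* global_resource = std::pmr::new_delete_resource();

template <class T>
struct global_resource_allocator {
    using value_type = T;
    global_resource_allocator() noexcept = default;
    template <class U>
    global_resource_allocator(const global_resource_allocator<U>&) noexcept {}
    T* allocate(std::size_t n) {
        return static_cast<T*>(global_resource->allocate(n * sizeof(T), alignof(T)));
    }
    void deallocate(T* p, std::size_t n) noexcept {
        global_resource->deallocate(p, n * sizeof(T), alignof(T));
    }
    friend bool operator==(const global_resource_allocator&,
                           const global_resource_allocator&) noexcept { return true; }
    friend bool operator!=(const global_resource_allocator&,
                           const global_resource_allocator&) noexcept { return false; }
};

// Hypothetical RAII guard: installs a resource, restores the previous one on exit.
class scoped_global_resource {
    std::pmr::memory_resource* previous_;
public:
    explicit scoped_global_resource(std::pmr::memory_resource* r) noexcept
        : previous_(global_resource) { global_resource = r; }
    ~scoped_global_resource() { global_resource = previous_; }
    scoped_global_resource(const scoped_global_resource&) = delete;
    scoped_global_resource& operator=(const scoped_global_resource&) = delete;
};

// Sums 0..n-1 after storing them in a list backed by a block-local bump allocator.
long sum_with_local_pool(int n) {
    std::pmr::monotonic_buffer_resource pool;  // step 1: construct the resource...
    scoped_global_resource guard(&pool);       // ...and point the global at it
    std::list<int, global_resource_allocator<int>> values;  // step 2: pick the allocator type
    for (int i = 0; i < n; ++i) values.push_back(i);
    long sum = 0;
    for (int v : values) sum += v;
    return sum;  // values is destroyed before guard and pool (reverse declaration order)
}
```

The declaration order matters: the list is destroyed while the pool is still installed, so every node is returned to the resource it came from.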

A real world example

  • Make PMR permeate nlohmann json parser
  • Jason refactored nlohmann json to make it accept std::pmr::polymorphic_allocators, but observed a performance regression
  • Using the stateless allocator approach with a std::pmr::memory_resource and an unmodified nlohmann json, I observed a 30% decrease in parse time

Benchmarking nlohmann json

Comparison to std::pmr containers

  1. Less refactoring of existing code
  2. No reliance on allocator propagation: with std::pmr containers, non-allocator-aware types prevent propagation of an allocator instance, for example a std::vector of std::array of std::string
  3. No extra pointer needed in each container
  4. De-virtualization of virtual function calls
  5. No type erasure
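
Points 2 and 3 can be illustrated with a small sketch, assuming C++17 and the default resource left unchanged. The helper inner_string_uses and the two constants are hypothetical names. std::vector<int> with std::allocator stands in for a container with a stateless allocator.

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <string>
#include <vector>

// Point 2: std::array is not allocator-aware, so the vector's resource is not
// passed on to the pmr::string elements nested inside it.
// Hypothetical helper: reports whether a nested string picked up `resource`.
bool inner_string_uses(std::pmr::memory_resource* resource) {
    std::pmr::vector<std::array<std::pmr::string, 2>> rows(resource);
    rows.emplace_back();  // strings are default-constructed with the default resource
    return rows[0][0].get_allocator().resource() == resource;
}

// Point 3: a polymorphic allocator stores a memory_resource*, which is part of
// every container object; an empty (stateless) allocator typically adds nothing.
constexpr std::size_t plain_vector_size = sizeof(std::vector<int>);
constexpr std::size_t pmr_vector_size = sizeof(std::pmr::vector<int>);
```

With a stateless allocator there is nothing to propagate, so the nested strings would draw from whichever resource the global pointer currently names.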

Devirtualization

  1. Using a memory resource with stateless allocator needs no polymorphism
  2. Some compilers cannot devirtualize calls to std::pmr::memory_resource member functions (demonstrated on godbolt in the slides)

Devirtualize std::pmr::memory_resource

No need to reimplement everything.
  1. Derive a struct from a std::pmr memory resource class
  2. Use the final keyword on the struct
  3. Hide the non-virtual allocate/deallocate member functions
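
One possible reading of these three steps, as a hedged sketch assuming C++17: final_monotonic_resource is a hypothetical name, and forwarding to the protected do_allocate/do_deallocate with a qualified call is an assumption about how the hiding could be done, not necessarily how the slides do it.

```cpp
#include <cstddef>
#include <memory_resource>

// Hypothetical wrapper: `final` plus non-virtual allocate/deallocate that hide
// the base-class versions, so calls made through this static type can be
// resolved at compile time.
struct final_monotonic_resource final : std::pmr::monotonic_buffer_resource {
    // Inherit the base-class constructors.
    using std::pmr::monotonic_buffer_resource::monotonic_buffer_resource;

    void* allocate(std::size_t bytes,
                   std::size_t alignment = alignof(std::max_align_t)) {
        // Qualified call: bound statically instead of through the vtable.
        return std::pmr::monotonic_buffer_resource::do_allocate(bytes, alignment);
    }
    void deallocate(void* p, std::size_t bytes,
                    std::size_t alignment = alignof(std::max_align_t)) {
        std::pmr::monotonic_buffer_resource::do_deallocate(p, bytes, alignment);
    }
};
```

For the hidden functions to be selected, the global pointer used by the stateless allocator has to be declared with this derived type rather than as std::pmr::memory_resource*.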

Vtable location

  • In my benchmarks I also noticed that if I statically link the C++ standard library (-static-libstdc++) or
  • used the following,
  • there is no devirtualization in the generated assembly, but the virtual function call penalty seems to be gone
  • This is likely because the vtable is then not in a shared library, but I am not sure

Allocator benchmarks

Benchmark adapted from: https://en.cppreference.com/w/cpp/memory/monotonic_buffer_resource

Repeated push_back on std::list

  1. global new/delete based allocators
    std::allocator: baseline
    pmr::polymorphic_allocator with pmr::new_delete_resource: 1.6x
    stateless allocator with pmr::new_delete_resource: 1.0x
  2. pmr::monotonic_buffer_resource

    2x gains with stateless allocator vs pmr::polymorphic_allocator

  3. Penalty for virtual function calls is measurable in this benchmark, but may not be significant in a real world program

Benchmarking code

Results

Benchmark conclusions

Devirtualization helps with performance, but performance depends on more than the speed of allocation/deallocation:

  1. cache effects
  2. memory fragmentation
  3. cross node access in NUMA systems
  4. etc...

Given the space and compute advantages of a stateless allocator, a regression compared to using a pmr::polymorphic_allocator is unlikely.

End

Comments, questions?