Analysis of memory allocator usage in the O3DE (Open 3D Engine) codebase.
O3DE uses a custom memory allocation system. Raw new/delete and malloc/free should not be used directly. All allocations flow through the AZ::IAllocator interface with tracking, profiling, and debugging support.
Core files: Code/Framework/AzCore/AzCore/Memory/
IAllocator (interface)
└── AllocatorBase (adds profiling/tracking)
    ├── SystemAllocator - Default general-purpose (uses HphaSchema internally)
    ├── OSAllocator - Direct OS allocations, untracked
    ├── PoolAllocator - Fixed-size pool, non-thread-safe
    ├── ThreadPoolAllocator - Fixed-size pool, thread-local pools
    └── SimpleSchemaAllocator<T> - Generic wrapper for custom schemas
ChildAllocatorSchema<Parent> - Pass-through for memory categorization
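The pass-through categorization noted for ChildAllocatorSchema can be illustrated with a minimal standalone sketch. This is not O3DE code: `ParentAllocator` and `TrackedChild` are hypothetical names, and the parent here is plain `std::malloc`/`std::free` rather than an AZ allocator. The sketch only shows the idea that the child owns no heap of its own and adds a named byte counter on top of the parent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical parent allocator: a thin wrapper over std::malloc/std::free.
struct ParentAllocator
{
    static void* Allocate(std::size_t bytes) { return std::malloc(bytes); }
    static void Deallocate(void* ptr) { std::free(ptr); }
};

// Hypothetical pass-through allocator: owns no heap, only a name and a byte counter.
template <typename Parent>
class TrackedChild
{
public:
    explicit TrackedChild(const char* name) : m_name(name) {}

    void* Allocate(std::size_t bytes)
    {
        void* ptr = Parent::Allocate(bytes); // memory comes from the parent's heap
        if (ptr != nullptr)
        {
            m_bytesInUse += bytes; // accounted under this child's name
        }
        return ptr;
    }

    // The caller passes the size back so the counter can be decremented.
    void Deallocate(void* ptr, std::size_t bytes)
    {
        if (ptr != nullptr)
        {
            Parent::Deallocate(ptr);
            m_bytesInUse -= bytes;
        }
    }

    const char* Name() const { return m_name; }
    std::size_t BytesInUse() const { return m_bytesInUse; }

private:
    const char* m_name;
    std::size_t m_bytesInUse = 0;
};
```

A `TrackedChild<ParentAllocator> physics("PhysicsMemory")` instance would report only the bytes allocated through it, while the memory itself still lives in the parent's heap.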
SystemAllocator is the default allocator for most allocations. Internally it uses HPHA (High Performance Heap Allocator), based on Dimitar Lazarov's algorithm.
// Implicitly used by default
void* p = azmalloc(1024);
azfree(p);
OSAllocator makes direct OS heap allocations (malloc/free). It is not tracked by the allocator manager and is used for debug infrastructure and bootstrap allocations.
PoolAllocator and ThreadPoolAllocator are optimized for many small, fixed-size allocations. ThreadPoolAllocator maintains per-thread pools to avoid lock contention.
// Good for frequently allocated small objects
class SmallObject
{
public:
    AZ_CLASS_ALLOCATOR(SmallObject, AZ::PoolAllocator);
};
A child allocator is a named allocator that delegates to a parent but tracks allocations separately. This is useful for per-subsystem memory accounting.
AZ_CHILD_ALLOCATOR_WITH_NAME(
PhysicsAllocator,
"PhysicsMemory",
"{GUID}",
AZ::SystemAllocator
);
class RigidBody
{
public:
    AZ_CLASS_ALLOCATOR(RigidBody, PhysicsAllocator);
};
// Allocations tracked under "PhysicsMemory" but use SystemAllocator's heap
O3DE includes arena-style allocators where individual frees are disabled:
Gems/Atom/RHI/Code/Include/Atom/RHI/LinearAllocator.h
Used for per-frame GPU resource allocations in the Atom renderer.
class LinearAllocator final : public Allocator
{
    VirtualAddress Allocate(size_t byteCount, size_t byteAlignment) override;
    void DeAllocate(VirtualAddress offset) override; // NO-OP
    void GarbageCollect() override; // Reset after N cycles
    void GarbageCollectForce() override; // Immediate reset
};
- Allocate() bumps a cursor forward (O(1), ~3 instructions)
- DeAllocate() does nothing (individual frees are ignored)
- GarbageCollect() resets the cursor after m_garbageCollectLatency cycles
- GarbageCollectForce() immediately resets the cursor to 0
Supports deferred reclamation for GPU resources still in-flight.
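The cursor-bump and deferred-reset behavior described above can be sketched in standalone C++. This is a simplified illustration under stated assumptions, not the Atom implementation: `FrameArena`, `InvalidOffset`, `BytesUsed()` and the cycle counter are hypothetical, offsets stand in for `VirtualAddress`, and the exact rule for when the latency counter triggers a reset is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical offset-based arena: hands out offsets into a range it does not own.
class FrameArena
{
public:
    static constexpr std::size_t InvalidOffset = static_cast<std::size_t>(-1);

    FrameArena(std::size_t capacityInBytes, unsigned garbageCollectLatency)
        : m_capacity(capacityInBytes)
        , m_latency(garbageCollectLatency)
    {
    }

    // Bump the cursor forward. `alignment` must be a power of two.
    std::size_t Allocate(std::size_t byteCount, std::size_t alignment)
    {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::size_t aligned = (m_cursor + alignment - 1) & ~(alignment - 1);
        if (aligned > m_capacity || byteCount > m_capacity - aligned)
        {
            return InvalidOffset; // out of space
        }
        m_cursor = aligned + byteCount;
        return aligned;
    }

    // Individual frees are ignored.
    void DeAllocate(std::size_t) {}

    // Intended to be called once per cycle (for example once per frame).
    // Assumption: the reset happens on the first call after `m_latency`
    // calls have been skipped, giving in-flight work time to finish.
    void GarbageCollect()
    {
        if (m_cyclesSinceReset == m_latency)
        {
            GarbageCollectForce();
        }
        else
        {
            ++m_cyclesSinceReset;
        }
    }

    // Immediate reset of the cursor to the start of the range.
    void GarbageCollectForce()
    {
        m_cursor = 0;
        m_cyclesSinceReset = 0;
    }

    std::size_t BytesUsed() const { return m_cursor; }

private:
    std::size_t m_capacity;
    unsigned m_latency;
    std::size_t m_cursor = 0;
    unsigned m_cyclesSinceReset = 0;
};
```

With a latency of 2, the first two `GarbageCollect()` calls leave the cursor untouched and the third resets it, which is the deferral that lets data from earlier cycles stay valid while the GPU may still be reading it.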
From Gems/Atom/RHI/Code/Source/RHI/LinearAllocator.cpp:
VirtualAddress LinearAllocator::Allocate(size_t byteCount, size_t byteAlignment)
{
    VirtualAddress addressCurrentAligned{ AlignUp(m_descriptor.m_addressBase.m_ptr + m_byteOffsetCurrent, byteAlignment) };
    size_t byteCountAligned = AlignUp(byteCount, byteAlignment);
    size_t nextByteAddress = (addressCurrentAligned.m_ptr - m_descriptor.m_addressBase.m_ptr) + byteCountAligned;
    if (nextByteAddress > m_descriptor.m_capacityInBytes)
        return VirtualAddress::CreateNull(); // Out of space
    m_byteOffsetCurrent = nextByteAddress;
    return addressCurrentAligned;
}
void LinearAllocator::DeAllocate(VirtualAddress offset)
{
    (void)offset; // Intentional no-op
}
void LinearAllocator::GarbageCollectForce()
{
    m_byteOffsetCurrent = 0; // Reset cursor to beginning
}
Code/Framework/AzCore/AzCore/JSON/RapidjsonAllocatorAdapter.h
Fixed-size stack buffer for temporary JSON parsing.
template<size_t SizeN, size_t AlignN = alignof(AZStd::byte)>
class RapidjsonStackAllocator
{
    static constexpr bool kNeedFree = false;
    void* Malloc(size_t size); // Bump cursor
    void* Realloc(...); // Extend or copy
    static void Free(void*) { } // No-op
};
Classes must declare their allocator to use O3DE's memory system:
class MyClass
{
public:
    AZ_CLASS_ALLOCATOR(MyClass, AZ::SystemAllocator);
    // Optional alignment: AZ_CLASS_ALLOCATOR(MyClass, AZ::SystemAllocator, 16);
};
This macro generates:
- operator new / operator delete using the specified allocator
- AZ_CLASS_ALLOCATOR_Allocate() / AZ_CLASS_ALLOCATOR_DeAllocate() static helpers
- Disabled array new[] / delete[] (asserts if called)
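To make the generated members concrete, here is a minimal standalone sketch of class-level `operator new`/`operator delete` routed through an allocator. It is not the expansion of AZ_CLASS_ALLOCATOR: `SKETCH_CLASS_ALLOCATOR`, `CountingAllocator` and `Particle` are hypothetical, and the sketch rejects array forms at compile time with `= delete`, whereas the notes above describe a runtime assert.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical allocator that counts live allocations.
struct CountingAllocator
{
    static int& LiveAllocations()
    {
        static int count = 0;
        return count;
    }

    static void* Allocate(std::size_t bytes)
    {
        void* ptr = std::malloc(bytes);
        if (ptr == nullptr)
        {
            throw std::bad_alloc();
        }
        ++LiveAllocations();
        return ptr;
    }

    static void Deallocate(void* ptr) noexcept
    {
        if (ptr != nullptr)
        {
            std::free(ptr);
            --LiveAllocations();
        }
    }
};

// Hypothetical macro: injects allocator-backed new/delete into a class
// and disables the array forms.
#define SKETCH_CLASS_ALLOCATOR(Allocator)                                  \
    static void* operator new(std::size_t size)                            \
    {                                                                      \
        return Allocator::Allocate(size);                                  \
    }                                                                      \
    static void operator delete(void* ptr) noexcept                        \
    {                                                                      \
        Allocator::Deallocate(ptr);                                        \
    }                                                                      \
    static void* operator new[](std::size_t) = delete;                     \
    static void operator delete[](void*) = delete;

class Particle
{
public:
    SKETCH_CLASS_ALLOCATOR(CountingAllocator)

    float m_x = 0.0f;
    float m_y = 0.0f;
};
```

With this in place, `new Particle()` and `delete p` go through `CountingAllocator`, and `new Particle[4]` fails to compile.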
For header/source separation:
// Header
class MyClass
{
public:
    AZ_CLASS_ALLOCATOR_DECL
};
// Source (.cpp)
AZ_CLASS_ALLOCATOR_IMPL(MyClass, AZ::SystemAllocator);
// Basic allocation (SystemAllocator)
void* p = azmalloc(size);
void* p = azmalloc(size, alignment);
void* p = azmalloc(size, alignment, AllocatorType);
// Zero-initialized
void* p = azcalloc(size);
void* p = azcalloc(size, alignment);
void* p = azcalloc(size, alignment, AllocatorType);
// Reallocation
void* p = azrealloc(ptr, newSize);
void* p = azrealloc(ptr, newSize, alignment);
void* p = azrealloc(ptr, newSize, alignment, AllocatorType);
// Deallocation
azfree(ptr);
azfree(ptr, AllocatorType);
azfree(ptr, AllocatorType, size, alignment); // Full info for debugging
// Create object (calls constructor)
MyClass* obj = azcreate(MyClass, (ctorArg1, ctorArg2), AZ::SystemAllocator);
// Destroy object (calls destructor + frees)
azdestroy(obj, AZ::SystemAllocator, MyClass);
// Shorthand for SystemAllocator
MyClass* obj = azcreate(MyClass, (args));
azdestroy(obj);
size_t size = azallocsize(ptr, AllocatorType);
Use the AZStdAlloc wrapper with AZStd containers:
// Compile-time allocator binding
AZStd::vector<int, AZStdAlloc<AZ::SystemAllocator>> vec;
AZStd::list<Entity, AZStdAlloc<AZ::PoolAllocator>> entities;
// Runtime allocator binding
AZStd::vector<int, AZStdIAllocator> vec(&myAllocatorInstance);
// Functor-based (deferred allocator lookup)
AZStd::vector<int, AZStdFunctorAllocator> vec(&GetMyAllocator);
AZ::AllocatorManager is a singleton that manages all registered allocators:
AZ::AllocatorManager& mgr = AZ::AllocatorManager::Instance();
// Iterate allocators
for (int i = 0; i < mgr.GetNumAllocators(); ++i)
{
    AZ::IAllocator* alloc = mgr.GetAllocator(i);
    size_t used = alloc->NumAllocatedBytes();
}
// Force garbage collection on all allocators
mgr.GarbageCollect();
// Dump statistics
mgr.DumpAllocators();
// Out-of-memory callback
mgr.AddOutOfMemoryListener([](AZ::IAllocator* alloc, size_t size, size_t align) {
    // Handle OOM
});
AZ::AllocatorInstance<T>::Get() returns the singleton instance of any allocator:
AZ::IAllocator& alloc = AZ::AllocatorInstance<AZ::SystemAllocator>::Get();
alloc.allocate(1024, 16);
struct AllocatorDebugConfig
{
    AllocatorDebugConfig& StackRecordLevels(int levels); // Callstack capture depth
    AllocatorDebugConfig& ExcludeFromDebugging(bool exclude); // Skip tracking
    AllocatorDebugConfig& UsesMemoryGuards(bool use); // Buffer overrun detection
    AllocatorDebugConfig& MarksUnallocatedMemory(bool marks); // Pattern fill
};
For debugging, allocators can track detailed allocation info:
const AZ::Debug::AllocationRecords* records = allocator->GetRecords();
// Contains: size, alignment, callstack, thread ID, timestamp
Enable tracking:
AZ::AllocatorManager::Instance().SetTrackingMode(AZ::Debug::AllocationRecords::Mode::Full);
AZ::IAllocator is the core interface that all allocators implement:
class IAllocator
{
    virtual AllocateAddress allocate(size_type byteSize, align_type alignment) = 0;
    virtual size_type deallocate(pointer ptr, size_type byteSize, align_type alignment) = 0;
    virtual AllocateAddress reallocate(pointer ptr, size_type newSize, align_type newAlignment) = 0;
    virtual size_type get_allocated_size(pointer ptr, align_type alignment) const = 0;
    virtual void GarbageCollect() { }
    virtual size_type NumAllocatedBytes() const { return 0; }
    virtual const char* GetName() const;
    virtual bool IsReady() const { return true; }
    // Debug/profiling
    virtual AllocatorDebugConfig GetDebugConfig() { return {}; }
    virtual void SetProfilingActive(bool active) { }
    virtual bool IsProfilingActive() const { return false; }
};
- Always use AZ_CLASS_ALLOCATOR for classes that will be heap-allocated
- Use PoolAllocator for small, frequently allocated objects of uniform size
- Use ChildAllocator to track memory usage by subsystem without creating a separate heap
- Use azcreate/azdestroy for objects, azmalloc/azfree for raw memory
- Avoid array new[]; use AZStd::vector instead
- Call GarbageCollect() periodically to return unused memory to the OS
- Use LinearAllocator for frame-scoped allocations that can be bulk-freed
Benchmarks comparing O3DE allocators against each other and standard malloc/free. Run on Apple M2 (12 cores @ 2.4 GHz), macOS, profile build.
Benchmark source: Gems/Atom/RHI/Code/Tests/LinearAllocatorBenchmarks.cpp
| Allocator | Time | Throughput | vs malloc |
|---|---|---|---|
| LinearAllocator | 2.86 µs | 349M items/sec | ~20x faster |
| malloc/free | 56.0 µs | 17.8M items/sec | 1x |
| SystemAllocator | 95.2 µs | 10.5M items/sec | 0.6x |
| Allocator | 1000 allocs | 5000 allocs | 10000 allocs |
|---|---|---|---|
| LinearAllocator | ~1.5 µs | ~3.7 µs | ~12.6 µs |
| malloc/free | 30.2 µs | 154 µs | 308 µs |
| SystemAllocator | 42.8 µs | 217 µs | 430 µs |
| Allocator | 1000 allocs | 5000 allocs | 10000 allocs |
|---|---|---|---|
| malloc/free | 64.2 µs | 368 µs | 891 µs |
| SystemAllocator | 126 µs | 611 µs | 1319 µs |
| Allocations/Frame | LinearAllocator Time | Throughput |
|---|---|---|
| 100 | 0.28 µs | 355M items/sec |
| 500 | 1.44 µs | 346M items/sec |
| 1000 | 2.91 µs | 344M items/sec |
| Capacity | Time | Throughput |
|---|---|---|
| 64 KB | 1.5 µs | 42 GB/s |
| 256 KB | 3.7 µs | 66 GB/s |
| 1 MB | 12.6 µs | 78 GB/s |
| 16 MB | 191 µs | 82 GB/s |
- LinearAllocator is 20-35x faster than traditional allocators for "allocate many, free all" patterns
- malloc is roughly 1.4x to 2x faster than SystemAllocator for bulk allocation/deallocation, depending on the benchmark
- SystemAllocator overhead comes from:
- Per-allocation tracking and profiling hooks
- HPHA heap management (free lists, coalescing)
- Thread-safety synchronization
- LinearAllocator scales linearly - maintains ~350M items/sec regardless of allocation count
- Throughput increases with capacity, likely due to better cache utilization in sequential access
Arena allocation (LinearAllocator):
- Allocate: Single pointer bump (~3 instructions)
- Deallocate: No-op (0 instructions)
- Reset: Single pointer assignment
Traditional allocation (SystemAllocator, malloc):
- Allocate: Free list search, splitting, bookkeeping
- Deallocate: Free list insertion, coalescing checks
- No bulk reset - must free each allocation individually
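The difference in work between the two patterns can be seen in a small timing harness. This is a rough standalone sketch, not the benchmark source referenced in this document: `TimeMallocFree` and `TimeArenaReset` are hypothetical names, the arena is a plain cursor over a `std::vector` buffer, and absolute timings will vary by machine and optimization level (an optimizer may also remove some of the work).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Allocate `count` blocks with malloc, then free each one individually.
// Returns elapsed nanoseconds.
long long TimeMallocFree(std::size_t count, std::size_t blockSize)
{
    assert(blockSize > 0);
    std::vector<void*> blocks(count, nullptr);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i)
    {
        blocks[i] = std::malloc(blockSize); // per-allocation heap bookkeeping
    }
    for (void* block : blocks)
    {
        std::free(block); // one call per allocation, no bulk reset
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Bump a cursor through one preallocated buffer, then release everything
// by resetting the cursor. Returns elapsed nanoseconds and reports the
// peak cursor position through `peakBytes`.
long long TimeArenaReset(std::size_t count, std::size_t blockSize, std::size_t& peakBytes)
{
    assert(blockSize > 0);
    std::vector<unsigned char> buffer(count * blockSize);
    std::vector<void*> blocks(count, nullptr);
    std::size_t cursor = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i)
    {
        blocks[i] = buffer.data() + cursor; // allocate: pointer bump
        cursor += blockSize;
    }
    peakBytes = cursor;
    cursor = 0; // free all: single assignment
    const auto stop = std::chrono::steady_clock::now();
    (void)cursor;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Calling both with the same `count` and `blockSize` and comparing the returned durations gives a rough feel for the gap; it is no substitute for the Google Benchmark runs reported in the tables.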
These benchmarks simulate actual Atom renderer patterns including memory initialization.
Benchmark source: Gems/Atom/RHI/Code/Tests/RealisticLinearAllocatorBenchmarks.cpp
Allocates contiguous memory for DrawPacket headers, DrawItem arrays, sort keys, filter masks, and SRG pointers.
| Allocator | 100 packets | 1000 packets | 5000 packets |
|---|---|---|---|
| LinearAllocator | 1.17 µs | 12.3 µs | 63.9 µs |
| SystemAllocator | 7.14 µs | 72.4 µs | 362 µs |
| malloc/free | 8.26 µs | 86.6 µs | 471 µs |
LinearAllocator is ~6x faster for DrawPacket construction.
Combined allocation pattern simulating a complete rendering frame.
| Scene Complexity | LinearAllocator | SystemAllocator | malloc | Allocations |
|---|---|---|---|---|
| 1 (simple) | 2.12 µs | 10.8 µs | 12.8 µs | 170 |
| 5 (medium) | 11.3 µs | 53.7 µs | 66.9 µs | 850 |
| 10 (complex) | 22.6 µs | 108 µs | 130 µs | 1,700 |
LinearAllocator is ~5x faster for mixed frame workloads.
Tests allocator behavior over multiple frames.
| Allocations/Frame | LinearAllocator | SystemAllocator | malloc |
|---|---|---|---|
| 100 | 0.017 ms | 0.44 ms | 0.48 ms |
| 500 | 0.091 ms | 2.22 ms | 2.26 ms |
| 1000 | 0.18 ms | 4.46 ms | 4.41 ms |
| 2000 | 0.36 ms | 9.01 ms | 8.88 ms |
LinearAllocator is ~25x faster for sustained frame loads.
Includes memory initialization (write) and traversal (read) to test cache locality.
| Allocator | 1000 packets | Throughput |
|---|---|---|
| LinearAllocator | 16.9 µs | 59M items/sec |
| SystemAllocator | 78.6 µs | 12.7M items/sec |
| malloc/free | 90.9 µs | 11.2M items/sec |
LinearAllocator is ~5x faster even when including memory access patterns.
The realistic benchmarks show a 5-25x advantage (vs 20-35x in microbenchmarks) because:
- Memory initialization (memset) time is included
- Larger allocations reduce relative allocator overhead
- Cache effects from memory access patterns are included
The advantage is still substantial and represents real-world performance gains.
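The quoted multipliers can be cross-checked against the tables with simple division (baseline time over LinearAllocator time). The sketch below does that for one row per benchmark. `Speedup`, `InRange` and `CheckSummaryClaims` are hypothetical helper names, the tolerance bands are an arbitrary choice, and the inputs are copied from the tables in this document.

```cpp
#include <cassert>

// Speedup of a candidate over a baseline: baseline time divided by candidate time.
// Both times must use the same unit.
constexpr double Speedup(double baselineTime, double candidateTime)
{
    return baselineTime / candidateTime;
}

// True when `value` lies in [low, high].
constexpr bool InRange(double value, double low, double high)
{
    return value >= low && value <= high;
}

void CheckSummaryClaims()
{
    // Microbenchmark (microseconds): malloc/free 56.0, SystemAllocator 95.2, LinearAllocator 2.86.
    assert(InRange(Speedup(56.0, 2.86), 19.0, 21.0)); // "~20x" vs malloc
    assert(InRange(Speedup(95.2, 2.86), 32.0, 35.0)); // upper end of "20-35x"

    // DrawPacket construction, 1000 packets: SystemAllocator 72.4 vs LinearAllocator 12.3.
    assert(InRange(Speedup(72.4, 12.3), 5.5, 6.5)); // "~6x"

    // Mixed frame, complexity 10: SystemAllocator 108 vs LinearAllocator 22.6.
    assert(InRange(Speedup(108.0, 22.6), 4.5, 5.5)); // "~5x"

    // Sustained load, 1000 allocations/frame (milliseconds): 4.46 vs 0.18.
    assert(InRange(Speedup(4.46, 0.18), 24.0, 26.0)); // "~25x"

    // Cache-locality test, 1000 packets: SystemAllocator 78.6 vs LinearAllocator 16.9.
    assert(InRange(Speedup(78.6, 16.9), 4.5, 5.5)); // "~5x"
}
```

The checks pass with the table values as given, so the "~5x", "~6x", "~20x" and "~25x" summaries are consistent with the measured rows they describe.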
| Use Case | Recommended Allocator |
|---|---|
| Frame-scoped render data | LinearAllocator |
| Temporary parsing buffers | RapidjsonStackAllocator |
| Long-lived game objects | SystemAllocator |
| Many small uniform objects | PoolAllocator |
| Per-subsystem tracking | ChildAllocator |
| Bootstrap/debug infrastructure | OSAllocator |
# Build with benchmarks enabled
cmake --preset mac-ninja # or windows-vs2022
cmake --build build/mac_ninja --target Atom_RHI.Tests --config profile
# Run allocator benchmarks
./build/mac_ninja/bin/profile/AzTestRunner \
    ./build/mac_ninja/bin/profile/libAtom_RHI.Tests.dylib \
    AzRunBenchmarks --benchmark_filter="Linear|System|Malloc"
Note: Tests must be enabled on Mac by setting ATOM_RHI_TRAIT_BUILD_SUPPORTS_TEST=TRUE in Gems/Atom/RHI/Code/Platform/Mac/AtomRHITests_traits_mac.cmake.