Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
@copilot Please address the following issues in this PR:
- Critical Bugs
- Design Issues
- Testing Gaps
All 12 issues addressed in commit …, covering critical bugs, design improvements, the segment API, and new tests (OOM, thread safety, …). All 24 new tests pass on 2 ranks; all 2255 existing tests pass.
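A thread-safety test of the kind mentioned above can be sketched in plain Python: several threads repeatedly allocate and free through one shared allocator while a checker verifies that no two live ranges overlap. `ToyAllocator` and `stress` are hypothetical stand-ins, not the real `VMemPow2Allocator` API; the defaults simply mirror the 4 threads × 20 cycles named in this PR.

```python
import threading


class ToyAllocator:
    """Hypothetical first-fit allocator over [0, capacity); not the real API."""

    def __init__(self, capacity):
        self._lock = threading.Lock()
        self._free = [(0, capacity)]  # free (offset, size) ranges
        self.live = {}  # offset -> size of each live allocation

    def alloc(self, size):
        with self._lock:
            for i, (off, sz) in enumerate(self._free):
                if sz >= size:
                    # Carve the request from the front of the first fitting range.
                    if sz == size:
                        self._free.pop(i)
                    else:
                        self._free[i] = (off + size, sz - size)
                    self.live[off] = size
                    return off
            raise RuntimeError("out of memory")

    def free(self, off):
        with self._lock:
            # No coalescing: acceptable here because every request has the same size.
            self._free.append((off, self.live.pop(off)))


def stress(allocator, threads=4, cycles=20, size=256):
    """Run alloc/free cycles on several threads; return any problems observed."""
    held = set()  # (offset, size) ranges currently held by some worker
    check_lock = threading.Lock()
    problems = []

    def worker():
        try:
            for _ in range(cycles):
                off = allocator.alloc(size)
                with check_lock:
                    # Overlap with a range another worker still holds means a race bug.
                    for o, s in held:
                        if off < o + s and o < off + size:
                            problems.append(("overlap", off, o))
                    held.add((off, size))
                with check_lock:
                    held.discard((off, size))
                allocator.free(off)
        except Exception as exc:  # surface worker failures to the caller
            problems.append(("error", repr(exc)))

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return problems
```

Sizing the capacity to exactly `threads * size` leaves no slack, so a broken lock can surface either as an overlap or as a spurious out-of-memory error.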
- Added `VMemPow2Allocator` in `iris/allocators/vmem_pow2_allocator.py`
- Updated `iris/allocators/__init__.py` to export `VMemPow2Allocator`
- Updated `iris/symmetric_heap.py` to support `allocator_type="vmem_pow2"`
- Added `tests/unittests/test_vmem_pow2_allocator.py` with comprehensive tests

Critical bugs:
- `import_external_tensor` uses `aligned_export_size` (export is called first)
- `owns_tensor` checks the `data_ptr()` range for all tensors; zero-element tensors are no longer unconditionally claimed
- `weakref.finalize` on `tensor.untyped_storage()` (survives `reshape()` and other view ops), plus a `_pending_free` deque (avoids a lock-reentry deadlock)
- `_remap_free_block` releases the old block (`mem_unmap` + `mem_release`), creates a new one (`mem_create` + `mem_map`), then restores `mem_set_access(base_va, cumulative, ...)`. Releasing at free time is incompatible with ROCm's requirement that `hipMemSetAccess` start from `base_va` (calling it on sub-ranges gives HIP error 1).

Design improvements:
- `_CUDAArrayInterface` defined once at module level
- `_element_size()` backed by a module-level `_DTYPE_ELEMENT_SIZE` cache
- `import_external_tensor` fixed with `try`/`finally`
- `va_multiplier` parameter removed

New tests:
- `test_vmem_pow2_oom`: `RuntimeError` on VA exhaustion
- `test_vmem_pow2_thread_safety`: 4 threads × 20 alloc/free cycles
- `test_vmem_pow2_close` + `test_vmem_pow2_close_disables_finalizers`
- `print()` calls removed from tests

Segment API:
- `get_allocation_segments()` returns `(offset, size, va, generation)` 4-tuples; `symmetric_heap` uses `(offset, size, generation)` as the dedup key and handles stale peer mappings
- `vmem_allocator.py` returns `generation=0` in segments (backward compatible)
- … (`ptr==0` note, deque thread-safety note)
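One reading of the `aligned_export_size` fix is that an importer must map the same rounded size the exporter used, not the raw request size. The sketch below assumes (the PR text does not confirm this) that requests are rounded up to a power-of-two size class and then to an allocation granularity; `next_pow2`, `align_up`, and the 4096-byte granularity are hypothetical illustrations.

```python
def next_pow2(n: int) -> int:
    """Smallest power of two >= n (hypothetical size-class rounding)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def align_up(n: int, granularity: int) -> int:
    """Round n up to a multiple of granularity (e.g. a VMM allocation granularity)."""
    return (n + granularity - 1) // granularity * granularity


GRANULARITY = 4096  # hypothetical; real code would query the driver for this

request = 5000  # bytes asked for by the caller
# Computed once on the exporting side; the importer should reuse this value.
export_size = align_up(next_pow2(request), GRANULARITY)
```

Because the exporter computes `export_size` first, passing that value through to the import side removes any chance of the two sides rounding differently.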
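The `owns_tensor` change amounts to an address-range test applied to every tensor, empty ones included. A minimal sketch on plain integers, assuming ownership means the byte range `[ptr, ptr + nbytes)` lies inside the heap's `[base, base + size)`; `owns_range` is a hypothetical helper, and a real version would pass in `tensor.data_ptr()` and `tensor.numel() * tensor.element_size()`.

```python
def owns_range(ptr: int, nbytes: int, heap_base: int, heap_size: int) -> bool:
    """Hypothetical check: True only if [ptr, ptr + nbytes) lies inside the heap.

    A zero-element tensor is not claimed merely because it is empty; its
    data pointer must still fall inside the heap like any other tensor's.
    """
    if ptr == 0:
        return False  # a null data pointer cannot point into the heap mapping
    heap_end = heap_base + heap_size
    if not (heap_base <= ptr < heap_end):
        return False
    return ptr + nbytes <= heap_end
```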
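The `weakref.finalize` plus `_pending_free` combination resembles a common deferred-free pattern: the finalizer only enqueues the freed offset, and the next allocator call drains the queue while holding the lock. If the finalizer freed the block directly, a garbage-collection pass triggered while the same thread already held a non-reentrant `threading.Lock` could deadlock. This sketch is hypothetical: `DeferredFreeAllocator` is not the PR's class, and a plain `Storage` object stands in for `tensor.untyped_storage()`, which view ops such as `reshape()` generally share, so the finalizer fires only after the last reference is gone.

```python
import threading
import weakref
from collections import deque


class Storage:
    """Stand-in for a tensor's untyped storage (must support weak references)."""


class DeferredFreeAllocator:
    """Hypothetical bump allocator whose blocks are freed when their storage dies."""

    def __init__(self):
        self._lock = threading.Lock()  # deliberately non-reentrant
        self._pending_free = deque()  # deque appends/pops are documented as thread-safe
        self._next = 0
        self.live = {}  # offset -> size

    def _drain_pending(self):
        # Caller must hold self._lock.
        while self._pending_free:
            self.live.pop(self._pending_free.popleft(), None)

    def alloc(self, storage, size):
        with self._lock:
            self._drain_pending()
            off = self._next
            self._next += size  # a real allocator would reuse freed space
            self.live[off] = size
        # The callback only enqueues and never touches self._lock, so it is
        # safe even if garbage collection runs it while the lock is held.
        weakref.finalize(storage, self._pending_free.append, off)
        return off
```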
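Keying peer mappings on `(offset, size, generation)` rather than `(offset, size)` lets a consumer notice when the block at an offset has been remapped. A minimal sketch, assuming each segment is an `(offset, size, va, generation)` tuple and that a remap bumps the generation; `sync_peer_mappings` and the `import_fn` callback are hypothetical names, not the `symmetric_heap` API.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Segment = Tuple[int, int, int, int]  # (offset, size, va, generation)
Key = Tuple[int, int, int]  # (offset, size, generation)


def sync_peer_mappings(
    segments: Iterable[Segment],
    mapped: Dict[Key, object],
    import_fn: Callable[[Key], object],
) -> Tuple[List[Key], List[Key]]:
    """Bring `mapped` in line with `segments`; return (added, stale) keys."""
    current = {(off, size, gen) for off, size, _va, gen in segments}
    # Anything mapped under an older generation (or no longer present) is stale.
    stale = [key for key in mapped if key not in current]
    for key in stale:
        del mapped[key]  # a real heap would also unmap the peer's old import here
    added = sorted(key for key in current if key not in mapped)
    for key in added:
        mapped[key] = import_fn(key)
    return added, stale
```

An allocator that always reports `generation=0` yields the same keys on every call, so nothing is ever flagged as stale for it.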