Dynamic slicing for array on pinned_host
#33158
Unanswered
mathieu-reymond
asked this question in Q&A
I am testing a setup where a large buffer `x` (let's say too large to fit on the GPU) lives in pinned host memory. Subsets of the buffer are periodically updated with data `y` (resulting from computations on the GPU). Here is a small example: `y` is moved to pinned memory, and `x` is updated accordingly. This works without any issues. However, if I use dynamic slices instead, I get warnings and the code fails.
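Roughly, the setup looks like this (a minimal sketch matching the jaxprs below; the device setup, function names, and sizes are illustrative):

```python
import jax
import jax.numpy as jnp

dev = jax.devices()[0]
pinned = jax.sharding.SingleDeviceSharding(dev, memory_kind="pinned_host")

# Large buffer kept in pinned host memory (2**29 float32, ~2 GiB).
x = jax.device_put(jnp.zeros(2**29, dtype=jnp.float32), pinned)
y = jnp.arange(5)  # stand-in for data computed on the GPU

@jax.jit
def update_static(x, y):
    # Variant 1: static slice -- lowers to a scatter with
    # update_window_dims=(0,). This works.
    y = jax.device_put(y.astype(jnp.float32), pinned)
    return x.at[0:5].set(y)

@jax.jit
def update_dynamic(x, y):
    # Variant 2: integer-array indices -- lowers to a scatter with
    # inserted_window_dims=(0,). This warns and then fails.
    idx = jax.device_put(jnp.arange(5), pinned)
    y = jax.device_put(y.astype(jnp.float32), pinned)
    return x.at[idx].set(y)

x = update_static(x, y)   # fine
x = update_dynamic(x, y)  # warnings, then an error
```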
When I look at both variants' jaxprs, I don't see any obvious difference. Both seem to use the same `scatter` operation, except for `update_window_dims`. Here is the variant without dynamic slicing:

```
{ lambda ; a:f32<host>[536870912] b:i32[5]. let
    c:f32[5] = convert_element_type[new_dtype=float32 weak_type=False] b
    d:f32<host>[5] = device_put[
      copy_semantics=(ArrayCopySemantics.REUSE_INPUT,)
      devices=(SingleDeviceSharding(device=CudaDevice(id=0), memory_kind=pinned_host),)
      srcs=(None,)
    ] c
    e:i32[1] = broadcast_in_dim[broadcast_dimensions=() shape=(1,) sharding=None] 0:i32[]
    f:f32<host>[536870912] = scatter[
      dimension_numbers=ScatterDimensionNumbers(update_window_dims=(0,),
        inserted_window_dims=(), scatter_dims_to_operand_dims=(0,),
        operand_batching_dims=(), scatter_indices_batching_dims=())
      indices_are_sorted=True
      mode=GatherScatterMode.FILL_OR_DROP
      unique_indices=True
      update_consts=()
      update_jaxpr=None
    ] a e d
  in (f,) }
```

and here is the one with dynamic slicing (that fails):
```
{ lambda ; a:f32<host>[536870912] b:i32[5]. let
    c:i32[5] = iota[dimension=0 dtype=int32 shape=(5,) sharding=None]
    d:i32<host>[5] = device_put[
      copy_semantics=(ArrayCopySemantics.REUSE_INPUT,)
      devices=(SingleDeviceSharding(device=CudaDevice(id=0), memory_kind=pinned_host),)
      srcs=(None,)
    ] c
    e:f32[5] = convert_element_type[new_dtype=float32 weak_type=False] b
    f:f32<host>[5] = device_put[
      copy_semantics=(ArrayCopySemantics.REUSE_INPUT,)
      devices=(SingleDeviceSharding(device=CudaDevice(id=0), memory_kind=pinned_host),)
      srcs=(None,)
    ] e
    g:bool<host>[5] = lt d 0:i32[]
    h:i32<host>[5] = add d 536870912:i32[]
    i:i32<host>[5] = select_n g d h
    j:i32<host>[5,1] = broadcast_in_dim[broadcast_dimensions=(0,) shape=(5, 1) sharding=None] i
    k:f32<host>[536870912] = scatter[
      dimension_numbers=ScatterDimensionNumbers(update_window_dims=(),
        inserted_window_dims=(0,), scatter_dims_to_operand_dims=(0,),
        operand_batching_dims=(), scatter_indices_batching_dims=())
      indices_are_sorted=False
      mode=GatherScatterMode.FILL_OR_DROP
      unique_indices=False
      update_consts=()
      update_jaxpr=None
    ] a j f
  in (k,) }
```

What is `update_window_dims`? Can someone explain what makes these two snippets behave so differently? Is there any alternative way to solve my initial problem, i.e., having a large buffer in pinned memory that I don't want to copy to the GPU, but want to update with data coming from the GPU? I would welcome any pointers.
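For concreteness, the kind of dynamic update I am after could also be written with `jax.lax.dynamic_update_slice`, where only the start offset is dynamic and the update keeps a static shape; this is just a sketch of the pattern, and I have not verified whether this path behaves any better on `pinned_host`:

```python
import jax

@jax.jit
def update_at_offset(x, y, offset):
    # Only the start offset is dynamic; the update window has a static
    # shape, so this lowers to dynamic_update_slice rather than scatter.
    return jax.lax.dynamic_update_slice(x, y, (offset,))
```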