For elements_per_thread propagation, memory operands should be decoupled from register dataflow. This allows:
- Writing N elements per thread to memory
- Reading M elements per thread from the same memory location

When M differs from N, this represents a valid resharding/redistribution, not an error.
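
The decoupling can be sketched roughly as follows. This is a minimal toy model, not the project's actual propagation pass: the `Op` class, its `register_operands` and `memory_operands` fields, and the `propagate` function are all hypothetical names introduced for illustration. The sketch unifies elements_per_thread only along register edges, so a write and a read that share a buffer never constrain each other.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Op:
    """Hypothetical op node; the field names are assumptions for this sketch."""
    name: str
    # Names of ops whose register results this op consumes.
    register_operands: List[str] = field(default_factory=list)
    # Names of memory buffers this op touches; propagation never follows these.
    memory_operands: List[str] = field(default_factory=list)
    elements_per_thread: Optional[int] = None


def propagate(ops: List[Op]) -> Dict[str, Optional[int]]:
    """Unify elements_per_thread along register edges until a fixed point."""
    by_name = {op.name: op for op in ops}
    changed = True
    while changed:
        changed = False
        for op in ops:
            for src_name in op.register_operands:
                src = by_name[src_name]
                # Register edges constrain both endpoints, so unify both ways.
                for known, other in ((src, op), (op, src)):
                    if known.elements_per_thread is None:
                        continue
                    if other.elements_per_thread is None:
                        other.elements_per_thread = known.elements_per_thread
                        changed = True
                    elif other.elements_per_thread != known.elements_per_thread:
                        raise ValueError(
                            f"register mismatch between {src.name} and {op.name}"
                        )
    return {op.name: op.elements_per_thread for op in ops}


ops = [
    Op("producer", elements_per_thread=4),
    # Writes N = 4 elements per thread to "buf".
    Op("write", register_operands=["producer"], memory_operands=["buf"]),
    # Reads M = 8 elements per thread from the same "buf".
    Op("read", memory_operands=["buf"], elements_per_thread=8),
    Op("consumer", register_operands=["read"]),
]

result = propagate(ops)
print(result)
```

In this sketch the write ends up with 4 elements per thread and the consumer of the read with 8, and no error is raised even though both ops reference the same buffer. A mismatch along a register edge would still be reported.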