I found three issues in GPGPU-Sim's current implementation of generic addressing:

1. `whichspace( addr_t addr )` calculates the memory space boundaries from hardcoded hardware specs (`MAX_STREAMING_MULTIPROCESSORS = 64`, `SHARED_MEM_SIZE_MAX = 64*1024`). These no longer match the Volta architecture and lead to incorrect simulated performance counters (e.g., the number of global load instructions).
2. `whichspace( addr_t addr )` determines whether an address is in the global, shared, or local space. However, according to the CUDA documentation, a generic address can also fall into the constant space.
3. Volta supports reconfigurable L1/shared memory sizes. Hence, the shared address space window size might change with kernel requirements, which perhaps should be an input to `whichspace( addr_t addr )`.
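For context, the partitioning that Issue 1 refers to has roughly this shape (a simplified, compilable sketch with assumed constants and layout, not a verbatim copy of the GPGPU-Sim source):

```cpp
#include <cassert>
#include <cstdint>

using addr_t = uint64_t;

// Assumed compile-time hardware caps (values illustrative of the
// pre-Volta limits the issue describes).
constexpr addr_t GLOBAL_HEAP_START = 0xC0000000ULL;
constexpr addr_t SHARED_MEM_SIZE_MAX = 64 * 1024;  // 64 KB per-SM cap
constexpr addr_t LOCAL_MEM_SIZE_MAX = 8 * 1024;
constexpr addr_t MAX_STREAMING_MULTIPROCESSORS = 64;
constexpr addr_t MAX_THREAD_PER_SM = 2048;

// The generic address space is carved into fixed windows below the
// global heap: one shared window per SM, then the local windows.
constexpr addr_t TOTAL_SHARED_MEM =
    MAX_STREAMING_MULTIPROCESSORS * SHARED_MEM_SIZE_MAX;
constexpr addr_t TOTAL_LOCAL_MEM =
    MAX_STREAMING_MULTIPROCESSORS * MAX_THREAD_PER_SM * LOCAL_MEM_SIZE_MAX;
constexpr addr_t SHARED_GENERIC_START = GLOBAL_HEAP_START - TOTAL_SHARED_MEM;
constexpr addr_t LOCAL_GENERIC_START = SHARED_GENERIC_START - TOTAL_LOCAL_MEM;

enum space_t { global_space, shared_space, local_space };

// Classify a generic address by comparing it against the fixed windows.
space_t whichspace(addr_t addr) {
  if (addr >= GLOBAL_HEAP_START) return global_space;
  if (addr >= SHARED_GENERIC_START) return shared_space;
  if (addr >= LOCAL_GENERIC_START) return local_space;
  return global_space;  // statically allocated data below the windows
}
```

With Volta-class parts (V100 has 80 SMs and allows up to 96 KB of shared memory per SM), addresses past the assumed 64 KB / 64-SM windows spill into a neighboring window or get misclassified, which is what skews the per-space instruction counters.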
A long-term fix for Issue 1 might be to set these parameters from the gpgpusim.config file. One of the problems with hardcoded specs is that the same load instruction might generate memory traffic to different memory spaces when executed on a different SM.
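One hypothetical shape for such a fix (the struct and field names are my assumptions for illustration, not the actual GPGPU-Sim config options): the window boundaries become functions of values parsed from gpgpusim.config, so every SM classifies addresses against the same configured layout.

```cpp
#include <cassert>
#include <cstdint>

using addr_t = uint64_t;

enum space_t { global_space, shared_space, local_space };

// Hypothetical container for values parsed from gpgpusim.config
// (field names are illustrative, not real config options).
struct generic_addr_layout {
  addr_t global_heap_start;    // base of the global heap window
  addr_t shared_mem_per_sm;    // configured shared size, e.g. 96 KB on Volta
  addr_t local_mem_per_thread;
  addr_t num_sm;               // e.g. 80 on V100
  addr_t threads_per_sm;

  // Boundaries are derived from the configured sizes rather than macros.
  addr_t shared_generic_start() const {
    return global_heap_start - num_sm * shared_mem_per_sm;
  }
  addr_t local_generic_start() const {
    return shared_generic_start() -
           num_sm * threads_per_sm * local_mem_per_thread;
  }
};

// Same classification as before, but against config-derived boundaries.
space_t whichspace(addr_t addr, const generic_addr_layout &cfg) {
  if (addr >= cfg.global_heap_start) return global_space;
  if (addr >= cfg.shared_generic_start()) return shared_space;
  if (addr >= cfg.local_generic_start()) return local_space;
  return global_space;  // statically allocated data below the windows
}
```

This would also address Issue 3, since the per-kernel reconfigured shared size could be fed into the same layout object.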
Regarding Issue 2, I am not 100% sure whether GPGPU-Sim handles the constant space differently, so that issue might not be valid.
Regarding Issue 3, I am also not entirely sure whether the real hardware changes the shared memory window size when L1/shared is reconfigured. One way to validate this would be to run some microbenchmarks on real hardware using the address space predicate functions (e.g., PTX `isspacep` or CUDA's `__isShared()` intrinsic).
I can potentially make these fixes and submit a pull request. However, since this is not a super quick fix, I am wondering whether any other simulator developers have thoughts on these issues.