Hello, I need a cudaDeviceSynchronize-style call to wait for a kernel to finish, but no kind of synchronization can be called from a device function after version 11.6.
May I request an example of how to handle this?
Here's the code that I'm trying to run:
__global__ void NNFeedForwardNormalMultiple(double* __restrict__ values, double* __restrict__ weigths, double* result, int inputsize, int outputsize) {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    int outputidx = idx / outputsize;
    int inputidx = idx % outputsize;
    if (outputidx >= outputsize || inputidx >= inputsize) {
        return;
    }
    atomicAdd(&result[outputidx], values[inputidx] * weigths[outputsize*outputidx + inputidx]);
}

__global__ void NNFeedForwardNormalActivate(double* __restrict__ biases, double* result, int size) {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx >= size) {
        return;
    }
    result[idx] = 1.0 / (1.0 + exp(-(result[idx] + biases[idx])));
}

__global__ void NNFeedForwardNormal(double* __restrict__ values, double* __restrict__ weigths, double* result, double* __restrict__ biases, int inputsize, int outputsize) {
    int blocksize = (inputsize * outputsize + THREADS_PER_BLOCK - 1)/THREADS_PER_BLOCK;
    NNFeedForwardNormalMultiple<<<blocksize, THREADS_PER_BLOCK>>>(values, weigths, result, inputsize, outputsize);
    // normally a cudaDeviceSynchronize()-style call would go here to wait for the child kernel to finish
    NNFeedForwardNormalActivate<<<(outputsize + THREADS_PER_BLOCK - 1)/THREADS_PER_BLOCK, THREADS_PER_BLOCK>>>(biases, result, outputsize);
}
Thanks!
I'm not sure what you're referring to; cudaDeviceSynchronize() is still a valid and supported API. Usually, though, you don't want to launch all of your work into the default stream but rather use streams explicitly, in which case you'd use cudaStreamSynchronize(). Also keep in mind that any work launched into a stream will still complete sequentially: if you launch kernel_a, kernel_b, and kernel_c into the same stream, they will run in that order.
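For reference, here is a minimal host-side sketch of that suggestion, assuming the launches happen from host code; the kernels, names, and sizes below are placeholders for illustration, not the code from the original post:

#include <cuda_runtime.h>

// Placeholder kernels, only here to illustrate stream ordering.
__global__ void kernelA(float* data, int n) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n) data[i] = static_cast<float>(i);
}
__global__ void kernelB(float* data, int n) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1024;
    float* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Both launches go into the same explicit stream, so kernelB cannot start
    // until kernelA has finished; no extra synchronization is needed between them.
    kernelA<<<(n + 255) / 256, 256, 0, stream>>>(d_data, n);
    kernelB<<<(n + 255) / 256, 256, 0, stream>>>(d_data, n);

    // Block the host until all work queued in the stream has completed.
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(d_data);
    return 0;
}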
> I'm not sure what you're referring to; cudaDeviceSynchronize() is still a valid and supported API
No, it is not supported when called from a __device__ or __global__ function after CUDA 11.6. I also tried cudaStreamSynchronize() and other synchronization calls, but none of them can be called from a device function.
> Also keep in mind that any work launched into a stream will still complete sequentially
Also no: when you launch two different kernels without any synchronization between them, they run at almost the same time. In a case like mine, where the first kernel does calculations that take some time, the activation kernel starts activating neurons before the matrix multiplication has finished, and then more values get added after the activation because the first kernel is still running.
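If it helps, one workaround, assuming the parent kernel exists only to launch the two children, is to move those launches to the host: kernels launched into the same stream execute in launch order, so the activation kernel cannot start before the accumulation kernel has finished. A rough sketch under that assumption (NNFeedForwardNormalHost is a hypothetical wrapper name, THREADS_PER_BLOCK is assumed to be defined elsewhere, and the kernels are the ones from the original post):

#include <cuda_runtime.h>

// Hypothetical host-side replacement for the __global__ NNFeedForwardNormal parent kernel.
// All pointers are assumed to be device pointers; THREADS_PER_BLOCK is assumed defined elsewhere.
void NNFeedForwardNormalHost(double* values, double* weigths, double* result,
                             double* biases, int inputsize, int outputsize,
                             cudaStream_t stream) {
    int multBlocks = (inputsize * outputsize + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
    int actBlocks  = (outputsize + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;

    // Same stream, so the two kernels run strictly in launch order:
    // activation only starts once the accumulation kernel has finished.
    NNFeedForwardNormalMultiple<<<multBlocks, THREADS_PER_BLOCK, 0, stream>>>(
        values, weigths, result, inputsize, outputsize);
    NNFeedForwardNormalActivate<<<actBlocks, THREADS_PER_BLOCK, 0, stream>>>(
        biases, result, outputsize);

    // Only needed if the host has to wait for the result at this point.
    cudaStreamSynchronize(stream);
}

The same launch-order guarantee holds for the default stream on the host, so the explicit stream is optional here; it just makes it easier to overlap independent work later.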