BIG gpu matrix division
Hi there,
I tried to perform a per-element division of two long 1D GPU matrices, and it fails with the following exception:
Invalid Configuration Argument - This error means that the dimension of either the specified grid of blocks (dimGrid) , or number of threads in a block (dimBlock), is incorrect. In such a case, the dimension is either zero or the dimension is larger than it should be. This error will only occur if you dynamically determine the dimensions.
After tracing down to the source I found
const dim3 grid(divUp(cols, block.x), divUp(rows, block.y));
and
const dim3 block(Policy::block_size_x, Policy::block_size_y);
Since each of the GPU matrices has 729,632 rows and 1 column, the resulting grid size is 1 by 91,204 by 1 according to the policy
struct DefaultTransformPolicy
{
enum {
block_size_x = 32,
block_size_y = 8,
shift = 4
};
};
which does not fit my case, because 91,204 exceeds the maximum of 65,535 blocks allowed in the grid's y dimension.
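For reference, the arithmetic behind that grid size can be checked on the host. This is only a minimal sketch: `divUp` is assumed to be a plain ceiling division, `Dim3` is a hypothetical stand-in for CUDA's `dim3`, and `gridFor` and `fitsGridY` are helper names made up for illustration. The value 65,535 is the CUDA upper bound for the y dimension of a grid.

```cpp
// Ceiling division; assumed to behave like the library's divUp.
constexpr unsigned divUp(unsigned total, unsigned grain) {
    return (total + grain - 1) / grain;
}

// Hypothetical stand-in for CUDA's dim3, so this compiles without the toolkit.
struct Dim3 {
    unsigned x, y, z;
};

// Block shape taken from DefaultTransformPolicy.
constexpr unsigned kBlockX = 32;
constexpr unsigned kBlockY = 8;

// CUDA limit for the y dimension of a grid.
constexpr unsigned kMaxGridY = 65535;

// Mirrors: const dim3 grid(divUp(cols, block.x), divUp(rows, block.y));
constexpr Dim3 gridFor(unsigned rows, unsigned cols) {
    return Dim3{divUp(cols, kBlockX), divUp(rows, kBlockY), 1};
}

// True when the y extent is non-zero and within the limit.
constexpr bool fitsGridY(Dim3 grid) {
    return grid.y > 0 && grid.y <= kMaxGridY;
}
```

With 729,632 rows and 1 column, `gridFor(729632, 1)` gives x = 1 and y = 91,204, so `fitsGridY` returns false, which is consistent with the "Invalid Configuration Argument" error.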
I was wondering how this policy is decided. Is it possible to override it within my own code, without rebuilding the library?
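One possible workaround that leaves the policy alone, offered as an untested idea: a per-element operation does not depend on the 2D shape, so the same data could be viewed as a single long row instead of a single long column. Assuming the library is OpenCV's CUDA module and the matrices are continuous, `cv::cuda::GpuMat::reshape` might be used for that. The sketch below only checks the grid arithmetic for both layouts; `ceilDiv`, `gridX` and `gridY` are hypothetical helpers, not library functions.

```cpp
// Ceiling division; assumed to behave like the library's divUp.
constexpr unsigned ceilDiv(unsigned total, unsigned grain) {
    return (total + grain - 1) / grain;
}

// Block shape taken from DefaultTransformPolicy.
constexpr unsigned kBlockX = 32;
constexpr unsigned kBlockY = 8;

// Number of elements in each matrix from the question.
constexpr unsigned kElems = 729632;

// Grid extents for a matrix with the given number of columns / rows.
constexpr unsigned gridX(unsigned cols) { return ceilDiv(cols, kBlockX); }
constexpr unsigned gridY(unsigned rows) { return ceilDiv(rows, kBlockY); }

// Column layout (rows = kElems, cols = 1): the y extent is too large.
constexpr unsigned kColumnLayoutGridY = gridY(kElems);  // 91204

// Row layout (rows = 1, cols = kElems): the y extent collapses to 1
// and the x extent stays well below 65,535.
constexpr unsigned kRowLayoutGridX = gridX(kElems);  // 22801
constexpr unsigned kRowLayoutGridY = gridY(1);       // 1
```

In the row layout the grid would be 22,801 by 1 by 1, which fits even the 65,535 limit that older devices impose on the x dimension.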
Same problem here. How can you modify the Policy? I have not found a solution.