python - PyCUDA/CUDA: Causes of non-deterministic launch failures?


Anyone following the CUDA tag may have seen a few of my queries regarding a project I'm involved in; for those who haven't, I'll summarize. (Sorry for the long question in advance.)

There are three kernels: one generates a data set based on input variables (it deals with bit-combinations, so it can grow exponentially), one solves the generated linear systems, and a reduction kernel gets the final result out. These three kernels are run over and over again as part of an optimisation algorithm for a particular system.

On my dev machine (GeForce 9800GT, running under CUDA 4.0) this works perfectly, every time, no matter what I throw at it (up to a computational limit implied by the exponential growth mentioned above). On the test machine (4x Tesla S1070, only one used, under CUDA 3.1), the exact same code (Python base, PyCUDA interface to the CUDA kernels) produces exact results for "small" cases, but in mid-range cases the solving stage fails on random iterations.

Previous problems I've had with this code came down to numerical instability of the problem, and were deterministic in nature (i.e. it failed at the same stage every time); this one, frankly, is pissing me off, because it fails whenever it wants to.

As such, I don't have a reliable way of breaking the CUDA code out of the Python framework to do proper debugging, and PyCUDA's debugger support is questionable, to say the least.

I've checked the usual things, such as pre-kernel-invocation checks of free memory on the device, and the occupancy calculations for the grid and block allocations look fine. I'm not doing any crazy 4.0-specific stuff; I free everything I allocate on the device at each iteration, and I've fixed all the data types as floats.
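As an aside on those grid/block calculations: the ceiling division used to size a 1-D grid always rounds up, so the last block normally contains threads whose global index runs past the end of the data. A minimal sketch of the arithmetic (the function name and block size are illustrative, not from the question):

```python
import math

def launch_config(n_items, block_size=256):
    """Compute a 1-D grid size covering n_items with the given block size.

    Ceiling division rounds up, so grid_size * block_size usually exceeds
    n_items; the excess "padding" threads must be guarded inside the kernel
    or they will read/write out of bounds.
    """
    grid_size = math.ceil(n_items / block_size)
    total_threads = grid_size * block_size
    padding = total_threads - n_items  # threads that need an in-kernel guard
    return grid_size, total_threads, padding

# Example: 1000 items in blocks of 256 -> 4 blocks, 1024 threads, 24 padded
grid, total, pad = launch_config(1000, 256)
```

If the occupancy numbers look fine but the padding threads are unguarded, the launch can still succeed or fail nondeterministically depending on what memory happens to lie past the array.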

TL;DR: has anyone come across gotchas in CUDA 3.1 that I haven't seen in the release notes, or issues with PyCUDA's autoinit memory-management environment that would cause intermittent launch failures on repeated invocations?

Have you tried:

cuda-memcheck python yourapp.py

You likely have an out-of-bounds memory access.
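If cuda-memcheck does flag an access, the usual culprit is a missing index guard in one of the kernels. A minimal sketch of a guarded kernel as it would be passed to PyCUDA's `SourceModule` (the kernel name and signature are illustrative, not taken from the question):

```python
# Hypothetical kernel source string, as it would be compiled with
# pycuda.compiler.SourceModule(kernel_src).  The `if (i < n)` guard stops
# the padding threads in the last block from touching memory past the end
# of the array -- the classic cause of the out-of-bounds accesses that
# cuda-memcheck reports.
kernel_src = """
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)   /* without this guard, threads with i >= n corrupt memory */
        data[i] *= factor;
}
"""
```

Whether an unguarded access actually crashes a launch depends on the surrounding memory layout, which differs between the 9800GT/CUDA 4.0 and Tesla/CUDA 3.1 machines; that would also explain why the failure is intermittent on one box and invisible on the other.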
