What is the difference between using a CPU timer and a CUDA timer event to measure the time taken for the execution of some CUDA code? Which of these should a CUDA programmer use, and why?
CPU timer usage would involve calling cudaThreadSynchronize before any time is noted. For noting the time, clock() could be used, or a high-resolution performance counter such as QueryPerformanceCounter (on Windows) could be queried.
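For concreteness, the host-timer approach might look roughly like the sketch below. The `scale` kernel and the buffer size are made up purely for illustration. Two substitutions are assumptions on my part: `std::chrono::steady_clock` stands in for `clock()` / `QueryPerformanceCounter` as a portable host timer, and `cudaDeviceSynchronize` is used because `cudaThreadSynchronize` is deprecated in current CUDA runtimes.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, present only to give the timer something to measure.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d_data, 0, n * sizeof(float));

    // Drain any previously queued GPU work before noting the start time.
    cudaDeviceSynchronize();
    auto start = std::chrono::steady_clock::now();

    scale<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f);

    // Kernel launches are asynchronous, so wait for completion
    // before noting the end time.
    cudaDeviceSynchronize();
    auto stop = std::chrono::steady_clock::now();

    double ms = std::chrono::duration<double, std::milli>(stop - start).count();
    printf("host timer: %.3f ms\n", ms);

    cudaFree(d_data);
    return 0;
}
```

Note that the measured interval here includes the launch overhead and the cost of the synchronization calls, not just the kernel's execution on the device.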
A CUDA timer event would involve recording an event before and after the code using cudaEventRecord. At a later time, the elapsed time would be obtained by calling cudaEventSynchronize on the events, followed by cudaEventElapsedTime to obtain the elapsed time.
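The event-based approach might look roughly like this. Again, the `scale` kernel and sizes are hypothetical; the event calls themselves (`cudaEventCreate`, `cudaEventRecord`, `cudaEventSynchronize`, `cudaEventElapsedTime`, `cudaEventDestroy`) are the standard CUDA runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, present only to give the timer something to measure.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Both events are enqueued in the default stream (0), around the kernel.
    cudaEventRecord(start, 0);
    scale<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f);
    cudaEventRecord(stop, 0);

    // Block the host until the stop event has actually been reached on the GPU.
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // result is in milliseconds
    printf("event timer: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```

Synchronizing on the stop event alone is sufficient in this sketch, since the start event precedes it in the same stream.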
The answer to the first part of the question is that cudaEvents timers are based off high-resolution counters on board the GPU, and they have lower latency and better resolution than a host timer because they come "off the metal". You should expect sub-microsecond resolution from the cudaEvents timers. You should prefer them for timing GPU operations for precisely that reason. The per-stream nature of cudaEvents can also be useful for instrumenting asynchronous operations such as simultaneous kernel execution and overlapped copy and kernel execution. Doing that sort of time measurement is impossible using host timers.
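As a rough illustration of the per-stream point, here is a sketch that times a host-to-device copy and a kernel independently while they are issued on two different streams. The kernel, buffer names, and sizes are all hypothetical, and whether the two operations actually overlap depends on the device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, present only to give the timer something to measure.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h_src = nullptr, *d_copy = nullptr, *d_work = nullptr;
    // Pinned host memory is needed for the copy to be truly asynchronous.
    if (cudaMallocHost(&h_src, bytes) != cudaSuccess) return 1;
    if (cudaMalloc(&d_copy, bytes) != cudaSuccess) return 1;
    if (cudaMalloc(&d_work, bytes) != cudaSuccess) return 1;
    for (int i = 0; i < n; ++i) h_src[i] = 1.0f;
    cudaMemset(d_work, 0, bytes);

    cudaStream_t copyStream, kernelStream;
    cudaStreamCreate(&copyStream);
    cudaStreamCreate(&kernelStream);

    cudaEvent_t copyStart, copyStop, kernelStart, kernelStop;
    cudaEventCreate(&copyStart);
    cudaEventCreate(&copyStop);
    cudaEventCreate(&kernelStart);
    cudaEventCreate(&kernelStop);

    // Each pair of events brackets one operation in its own stream.
    cudaEventRecord(copyStart, copyStream);
    cudaMemcpyAsync(d_copy, h_src, bytes, cudaMemcpyHostToDevice, copyStream);
    cudaEventRecord(copyStop, copyStream);

    cudaEventRecord(kernelStart, kernelStream);
    scale<<<(n + 255) / 256, 256, 0, kernelStream>>>(d_work, n, 2.0f);
    cudaEventRecord(kernelStop, kernelStream);

    // Wait for both streams to reach their stop events.
    cudaEventSynchronize(copyStop);
    cudaEventSynchronize(kernelStop);

    float copyMs = 0.0f, kernelMs = 0.0f;
    cudaEventElapsedTime(&copyMs, copyStart, copyStop);
    cudaEventElapsedTime(&kernelMs, kernelStart, kernelStop);
    printf("copy: %.3f ms, kernel: %.3f ms\n", copyMs, kernelMs);

    cudaEventDestroy(copyStart);
    cudaEventDestroy(copyStop);
    cudaEventDestroy(kernelStart);
    cudaEventDestroy(kernelStop);
    cudaStreamDestroy(copyStream);
    cudaStreamDestroy(kernelStream);
    cudaFree(d_copy);
    cudaFree(d_work);
    cudaFreeHost(h_src);
    return 0;
}
```

One caveat worth checking against the runtime documentation: for events recorded in non-default streams, `cudaEventElapsedTime` may report a somewhat longer time than the bracketed operation alone took, because the record operations are themselves asynchronous.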
EDIT: I won't answer the last paragraph because you deleted it.