Inspect copies between host and device
Created by: masterleinad
Running nvprof shows:
Time(%)      Time  Calls       Avg       Min       Max  Name
 11.39%  821.00ms    217  3.7834ms  1.4080us  108.56ms  [CUDA memcpy HtoD]
 10.96%  790.22ms    389  2.0314ms  1.5360us  101.28ms  [CUDA memcpy DtoH]
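Together, host-to-device and device-to-host copies account for roughly 22% of the time nvprof reports (about 1.6 s over 606 calls). The averages are in the millisecond range while the minima are only a microsecond or so, so the profile seems to mix a few large transfers with many tiny ones. We need to investigate where all these copies come from.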
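As a starting point, the two memcpy rows from the nvprof summary can be tallied with a small script, which makes it easier to compare runs once copies start being removed. This is only a sketch: the helper names (`to_ms`, `parse_rows`) are made up for illustration, and it assumes the summary rows keep the whitespace-separated layout shown in this report.

```python
import re

# The two memcpy rows, copied from the nvprof summary in this report.
SUMMARY = """\
11.39% 821.00ms 217 3.7834ms 1.4080us 108.56ms [CUDA memcpy HtoD]
10.96% 790.22ms 389 2.0314ms 1.5360us 101.28ms [CUDA memcpy DtoH]
"""

# Factors to convert the unit suffixes used in the summary to milliseconds.
UNIT_TO_MS = {"ns": 1e-6, "us": 1e-3, "ms": 1.0, "s": 1e3}


def to_ms(text):
    """Convert a duration such as '821.00ms' or '1.4080us' to milliseconds."""
    match = re.fullmatch(r"([0-9.]+)(ns|us|ms|s)", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return float(match.group(1)) * UNIT_TO_MS[match.group(2)]


def parse_rows(summary):
    """Parse summary rows into dicts (hypothetical helper, assumed layout)."""
    rows = []
    for line in summary.splitlines():
        if not line.strip():
            continue
        # The name column may contain spaces, so split off only six fields.
        percent, total, calls, avg, low, high, name = line.split(maxsplit=6)
        rows.append({
            "name": name,
            "percent": float(percent.rstrip("%")),
            "total_ms": to_ms(total),
            "calls": int(calls),
            "avg_ms": to_ms(avg),
            "min_ms": to_ms(low),
            "max_ms": to_ms(high),
        })
    return rows


rows = parse_rows(SUMMARY)
total_ms = sum(row["total_ms"] for row in rows)
total_percent = sum(row["percent"] for row in rows)
total_calls = sum(row["calls"] for row in rows)

for row in rows:
    # A large max/avg ratio hints that a few big transfers dominate.
    ratio = row["max_ms"] / row["avg_ms"]
    print(f"{row['name']}: {row['calls']} calls, max/avg = {ratio:.1f}")

print(f"memcpy total: {total_ms:.2f} ms, {total_percent:.2f}%, {total_calls} calls")
```

The summary only says how much time the copies take, not which call sites issue them, so this is useful mainly as a baseline to compare against while tracking down the individual transfers.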