Inspect copies between host and device

Created by: masterleinad

Running nvprof shows the following memcpy activity:

Time(%) Time     Calls Avg       Min       Max      Name
11.39%  821.00ms 217   3.7834ms  1.4080us  108.56ms [CUDA memcpy HtoD]
10.96%  790.22ms 389   2.0314ms  1.5360us  101.28ms [CUDA memcpy DtoH]

Together these 606 copies take about 1.6 s, roughly 22% of the time in the nvprof summary. We need to investigate where all these copies come from.
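
To make the size of the problem easy to track while we work on this, here is a minimal sketch that sums the two memcpy rows from the nvprof summary. The helper names (`to_ms`, `parse_row`, `summarize`) are hypothetical and not part of any existing tool. It assumes the row layout shown above (percentage, total time, calls, avg, min, max, name).

```python
import re

# The two memcpy rows copied verbatim from the nvprof summary in this issue.
NVPROF_ROWS = """\
11.39%  821.00ms 217   3.7834ms  1.4080us  108.56ms [CUDA memcpy HtoD]
10.96%  790.22ms 389   2.0314ms  1.5360us  101.28ms [CUDA memcpy DtoH]
"""

# Conversion factors from the time suffixes used above to milliseconds.
UNIT_TO_MS = {"ns": 1e-6, "us": 1e-3, "ms": 1.0, "s": 1e3}


def to_ms(text):
    """Convert a time string such as '1.4080us' to milliseconds."""
    match = re.fullmatch(r"([0-9.]+)(ns|us|ms|s)", text)
    if match is None:
        raise ValueError(f"unrecognized time value: {text!r}")
    return float(match.group(1)) * UNIT_TO_MS[match.group(2)]


def parse_row(line):
    """Split one summary row into its fields (assumed column order, see above)."""
    percent, total, calls, avg, tmin, tmax, name = line.split(None, 6)
    return {
        "percent": float(percent.rstrip("%")),
        "total_ms": to_ms(total),
        "calls": int(calls),
        "avg_ms": to_ms(avg),
        "min_ms": to_ms(tmin),
        "max_ms": to_ms(tmax),
        "name": name.strip(),
    }


def summarize(text):
    """Aggregate percentage, total time, and call count over all rows."""
    rows = [parse_row(line) for line in text.splitlines() if line.strip()]
    return {
        "percent": sum(row["percent"] for row in rows),
        "total_ms": sum(row["total_ms"] for row in rows),
        "calls": sum(row["calls"] for row in rows),
    }
```

Calling `summarize(NVPROF_ROWS)` gives the combined share, total time, and number of copies, which can be compared after each change. The summary alone does not say where the copies originate. nvprof's `--print-gpu-trace` mode lists each copy individually with its size and start time, which may be a better starting point for attributing them.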