Replace tuple-style cast with pointer/stride access

assigned to @rfbird

mentioned in issue #1 (closed)

changed the description

added 1 commit

5e5bc638 - storing raw member pointers and strides instead of single data block pointer

Well, that worked like a dream! Nice work Stuart.

Not only did it eliminate the indirect addressing (the indexed/scatter stores are gone from the report), but it also shrank the estimated scalar loop cost from 422 to 153 and raised the estimated speedup from 2.59x to a basically flat 8x (7.99x). All internal loops with fixed bounds are also fully unrolled. Right now I see no reason to believe this won't run pretty quick!

Details below:

Old:

   remark #15309: vectorization support: normalized vectorization overhead 0.098
   remark #15301: SIMD LOOP WAS VECTORIZED
   remark #15463: unmasked indexed (or scatter) stores: 10 
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 422  
   remark #15477: vector cost: 162.500 
   remark #15478: estimated potential speedup: 2.590

New:

   remark #15301: SIMD LOOP WAS VECTORIZED
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 153
   remark #15477: vector cost: 19.000
   remark #15478: estimated potential speedup: 7.990
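
For readers following along, here is a minimal sketch of the pointer/stride access pattern the commit title describes. It is not the code from this merge request: the layout constants, `MemberView`, and `push` are hypothetical names made up for illustration, and the tile layout (three position components followed by a mass, each `kVecLen` doubles long) is an assumption. The idea is that each member gets its own raw pointer and tile stride up front, so the inner loop is plain unit-stride indexing with fixed bounds rather than a cast through the whole data block on every access.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout of one tile inside a flat block of doubles:
// pos[3][kVecLen] followed by mass[kVecLen].
constexpr std::size_t kVecLen = 8;                // inner array length (assumed)
constexpr std::size_t kPosOffset = 0;             // start of pos within a tile
constexpr std::size_t kMassOffset = 3 * kVecLen;  // start of mass within a tile
constexpr std::size_t kTileSize = 4 * kVecLen;    // doubles per tile

// Hypothetical member view: a raw pointer to the member in tile 0 plus the
// distance, in doubles, to the same member in the next tile.
struct MemberView {
    double* data;
    std::size_t tile_stride;

    // Element i of component d in tile s. The component stride is kVecLen,
    // so i is the unit-stride index.
    double& operator()(std::size_t s, std::size_t i, std::size_t d = 0) const {
        return data[s * tile_stride + d * kVecLen + i];
    }
};

// Toy kernel: pos += dt * mass for every component of every particle.
void push(std::vector<double>& block, std::size_t num_tiles, double dt) {
    // Pointers and strides are resolved once, outside the loops.
    const MemberView pos{block.data() + kPosOffset, kTileSize};
    const MemberView mass{block.data() + kMassOffset, kTileSize};

    for (std::size_t s = 0; s < num_tiles; ++s)
        for (std::size_t d = 0; d < 3; ++d)
            // Fixed bounds and contiguous access: a candidate for SIMD.
            for (std::size_t i = 0; i < kVecLen; ++i)
                pos(s, i, d) += dt * mass(s, i);
}
```

Whether a given compiler vectorizes this loop depends on flags and target; the reports above are for the real code, not this sketch.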

Nice, Bob! Going to add a few more template-free accessors for rank and extent to help with C interfaces and then we should be able to merge. Adding @germann and @gchen so they are up to date.
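
As a rough illustration of what "template-free accessors for rank and extent" could look like, here is a sketch. It is an assumption about the general approach, not the accessors added in this merge request: `MemberShape`, `make_shape`, `ShapeTable`, and `kMaxRank` are hypothetical names. The compile-time member types are queried once with `std::rank` and `std::extent`, and the results are stored so that callers can look them up with plain runtime integers, which is the kind of signature a C interface can wrap.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

constexpr std::size_t kMaxRank = 4;  // hypothetical upper bound on member rank

// Runtime description of one member: its rank and the extent of each dimension.
struct MemberShape {
    std::size_t rank;
    std::array<std::size_t, kMaxRank> extent;
};

// Capture the compile-time shape of an array type T (e.g. double[3][3]).
// std::extent yields 0 for dimensions beyond the rank of T.
template <class T>
MemberShape make_shape() {
    static_assert(std::rank<T>::value <= kMaxRank, "member rank too large");
    return MemberShape{
        std::rank<T>::value,
        {{std::extent<T, 0>::value, std::extent<T, 1>::value,
          std::extent<T, 2>::value, std::extent<T, 3>::value}}};
}

// Hypothetical table of member shapes. The accessors take runtime indices
// only, so they need no template arguments at the call site.
template <class... Members>
class ShapeTable {
public:
    std::size_t rank(std::size_t member) const {
        return shapes_[member].rank;
    }
    std::size_t extent(std::size_t member, std::size_t dim) const {
        return shapes_[member].extent[dim];
    }

private:
    std::array<MemberShape, sizeof...(Members)> shapes_ = {
        {make_shape<Members>()...}};
};
```

For example, with `ShapeTable<double[3], double, float[3][3]> t;`, the call `t.rank(2)` gives 2 and `t.extent(0, 0)` gives 3.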

added 1 commit

258c4850 - adding runtime rank and extent accessors to AoSoA and Slice

added 1 commit

466ca78e - adding multidimensional stride test

added 1 commit

2201ae84 - updating examples

added 1 commit

c74e9a95 - Adding runtime accessors to memory data to enable vectorization.

unmarked as a Work In Progress

merged

mentioned in commit 0faac5b6

assigned to @uy7

Also, just a note that I did verify our CUDA UVM implementation is still functioning with the new access patterns.

Merged by Slattery, Stuart (Mar 9, 2018 7:25pm UTC)