DCOPY() vs memcpy() vs C code? ACML slacks?

It is not uncommon to have to copy arrays and data around in HPC applications. There are three ways you can do this:

  • string.h memcpy()
  • BLAS1’s DCOPY()
  • C code and let the compiler optimize it
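To make the three options concrete, here is a minimal sketch of each one in C. The function names are mine, not from STREAM or ACML. The DCOPY() wrapper assumes the BLAS library exports the Fortran-style symbol dcopy_ with every argument passed by reference, which is the common convention but not guaranteed for every library, so it is only compiled when the hypothetical USE_BLAS macro is defined.

```c
#include <stddef.h>
#include <string.h>

/* Way 1: string.h memcpy(); the size is given in bytes. */
void copy_memcpy(double *dst, const double *src, size_t n)
{
    memcpy(dst, src, n * sizeof(double));
}

#ifdef USE_BLAS
/* Way 2: BLAS level-1 DCOPY.
 * Assumption: the library exports the Fortran symbol as dcopy_ and
 * takes all arguments by reference. Check your BLAS documentation. */
extern void dcopy_(const int *n, const double *x, const int *incx,
                   double *y, const int *incy);

void copy_dcopy(double *dst, const double *src, int n)
{
    const int one = 1;  /* unit stride for both arrays */
    dcopy_(&n, src, &one, dst, &one);
}
#endif

/* Way 3: plain C loop, left to the compiler to optimize. */
void copy_loop(double *restrict dst, const double *restrict src, size_t n)
{
    for (size_t j = 0; j < n; j++)
        dst[j] = src[j];
}
```

All three produce the same result for non-overlapping arrays; the only question is how fast each one gets there.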

Of these three, I expected DCOPY() to be the fastest, then memcpy(), and last the C code. Oh, was I wrong. I ran the STREAM benchmark on an AMD Opteron 2218 using the PGI 7.2 compilers:

  • memcpy() 3056.8 MB/s
  • DCOPY() 5727.4 MB/s
  • C code 5737.7 MB/s
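For reference, this is roughly how a copy bandwidth number like the ones above is computed. STREAM counts a copy as one read plus one write per element and reports MB/s with 1 MB = 10^6 bytes. The sketch below uses my own function names and a simple clock() timer; STREAM itself uses a wall-clock timer and reports the best of several trials, so treat this as an illustration rather than a replacement for the benchmark.

```c
#include <stddef.h>
#include <time.h>

/* Bandwidth of copying n doubles in the given time.
 * Each element is read once and written once, hence the factor 2. */
double copy_mb_per_s(size_t n, double seconds)
{
    double bytes = 2.0 * (double)n * sizeof(double);
    return bytes / seconds / 1.0e6;
}

/* Hypothetical helper: time one pass of the plain C copy loop.
 * clock() measures CPU time, so this is only meaningful single threaded. */
double time_copy_loop(double *restrict dst, const double *restrict src, size_t n)
{
    clock_t t0 = clock();
    for (size_t j = 0; j < n; j++)
        dst[j] = src[j];
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Use arrays much larger than the caches, otherwise you are measuring cache speed and not memory bandwidth.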

So memcpy() is much slower than I thought. It is about the same speed as what I get when I optimize the crap out of the C code using the GNU C compiler, but we are not using the GNU compiler. DCOPY() reaches its speed with either the PGI or the GNU compiler, which is good to know for portability, so I still recommend using DCOPY() over memcpy().

The great performance of the C code comes from the PGI compiler. Turns out the compiler is too smart for us:

pgcc stream.c -fastsse -Minline -Minfo
main:
   164, Generated vector sse code for inner loop
   181, Generated vector sse code for inner loop
        Generated 1 prefetch instructions for this loop
   208, Memory copy idiom, loop replaced by call to __c_mcopy8
   218, Generated vector sse code for inner loop

Notice how on line 208 of stream.c the memory copy loop was replaced by a call to __c_mcopy8. Looks like PGI is smart and has its own high-speed routines for operations like this built into the compiler. Nice work, guys.

Where ACML starts to slack is in its use of AMD's multiple memory controllers. Its DCOPY() is not threaded, and thus the OpenMP version does not run any faster than the single-threaded version, while the compiler-generated copy gets even faster with more memory controllers!

  • DCOPY() 2 threads: 4525.8 MB/s
  • C code 2 threads: 10934.6 MB/s

As you can see, ACML has room for improvement in making use of the multiple memory controllers available in the Opteron platform. Good compilers can do operations as simple as copying arrays of doubles faster using:

 #pragma omp for

Again: test, test, test.
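A minimal sketch of the threaded C copy, assuming the combined "parallel for" form of the pragma so the function works on its own (a bare "omp for" has to sit inside an existing parallel region). The function name is mine. OpenMP has to be switched on at compile time, with -mp for pgcc or -fopenmp for gcc; without it the pragma is ignored and you get the serial loop.

```c
#include <stddef.h>

/* Each thread copies its own chunk of the arrays.
 * Built without OpenMP, this is just the plain serial copy loop. */
void copy_omp(double *restrict dst, const double *restrict src, long n)
{
    long j;
#pragma omp parallel for
    for (j = 0; j < n; j++)
        dst[j] = src[j];
}
```

Run it with the thread count set through OMP_NUM_THREADS and compare against the single-threaded number on your own machine.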
