HPC: The Future and Vector Processing

The three of us at the CAC were all invited to a talk at the University Hospital on HPC (High Performance Computing) in medicine. A few cool things came to mind.

  • HPC is being used in real medical applications
  • Vector processing is back (think Cell)
  • HPC is going to be underutilized

The first point is obvious, and it's cool. Until now HPC has really only been used for weather prediction; anything else was research. Now there is a need to build simple clusters, and gateways to them, so a doctor can pull data right from a DNA sequencer or MRI machine.

(I will get this graphic later.)

Vector processing is back. As much as IBM wants to call the eight units hanging off the Cell CPU SPEs (Synergistic Processing Elements), to an application the Cell is just a vector CPU. There are a few problems with this. First, vector processing is not taught in classes at all if you're a Computer Science major. And when I mention vector, SIMD, SSE, or 3DNow! to a grad student who is trying to graduate in 3 years and is just learning to code, they have no clue.

Already in the graphic above you can see that if you use the vector SSE unit on the CPU in your laptop, the time to solution is cut by more than half. This problem is only going to get worse with things like the Cell and GPUs. All these systems are massive SIMD engines that the people writing code at the college level know nothing about. Classes do not exist to teach such things, and the tools are not there to abstract it away.
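To make "use the vector SSE unit" concrete, here is a minimal C sketch of a SAXPY loop (y = a*x + y) written twice: once as plain scalar code and once with SSE intrinsics from <xmmintrin.h>, which work on four floats per instruction. SAXPY is just a stand-in kernel picked for illustration, not the code behind the graphic, and the function names are hypothetical. The SSE path is only compiled when the compiler defines __SSE__ (every x86-64 compiler does); on other targets it falls back to the scalar loop.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_*_ps */
#endif

/* Scalar version: one element per loop iteration. */
void saxpy_scalar(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Vector version: four elements per SSE instruction where available. */
void saxpy_sse(size_t n, float a, const float *x, float *y)
{
    size_t i = 0;
#if defined(__SSE__)
    __m128 va = _mm_set1_ps(a);              /* broadcast a into all four lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);     /* load x[i..i+3] (no alignment required) */
        __m128 vy = _mm_loadu_ps(y + i);     /* load y[i..i+3] */
        vy = _mm_add_ps(_mm_mul_ps(va, vx), vy);
        _mm_storeu_ps(y + i, vy);            /* store four results at once */
    }
#endif
    /* Leftover elements (or the whole array on non-SSE targets), one at a time. */
    for (; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Both functions compute the same thing; the difference is how many elements each instruction touches. How much faster the SSE version runs depends on memory bandwidth and the compiler, so measure on your own machine before trusting any particular factor.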

My prediction: 80% of the code run on HPC systems at universities will not use the SPEs, GPUs, or vector units, wasting huge amounts of resources.

Of the code run that's written by grad students, I expect 95% not to use them.

Now, I'm not saying we should not go this route; I think we have to. Look at the performance of the Cell: 200 GFLOPS when working on single-precision floats. An Intel or AMD CPU can't do half that, even with SSE3, which doubled the performance of AMD and Intel CPUs when (can you guess it?) using only the SSE3 VECTOR units! Scalar performance is still a drag and will not improve (much).
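Where does a figure like 200 GFLOPS come from? A back-of-the-envelope peak for the eight SPEs, assuming the 3.2 GHz clock of shipping Cell parts and one 4-wide single-precision fused multiply-add (2 flops per lane) per SPE per cycle:

\[
8\ \text{SPEs} \times 3.2\ \text{GHz} \times \underbrace{4\ \text{lanes} \times 2\ \tfrac{\text{flops}}{\text{lane}}}_{8\ \text{flops/cycle}} = 204.8\ \text{GFLOPS}
\]

That is a theoretical peak, and every factor in it beyond the clock comes from the SIMD width. Code that never touches the vector lanes gives up most of it.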

We need to build tools to make these units accessible, but many are already around. IMSL, NAG, and similar libraries already provide high-level solver methods that use the vector units, and I expect them to build versions for the Cell. There are also the lower-level BLAS and LAPACK, in the form of MKL, ACML, ATLAS, and GOTO. All are free to researchers and provide huge performance gains, mostly through memory blocking and vector unit use. Many are also already parallel, with no need to know OpenMP or MPI. Talk to your local HPC admin, or email me at brockp@mlds-netowrks.com.
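Memory blocking is the less obvious half of that. Here is a minimal C sketch of the idea, assuming square row-major matrices of doubles: a naive triple loop next to a tiled version that works on BLOCK x BLOCK tiles so the data being reused stays in cache. The block size of 32 and the function names are hypothetical, illustrative choices; MKL, ACML, ATLAS, and GOTO use far more elaborate, hand-tuned blocking plus SIMD kernels, so in real code you would call their DGEMM rather than write this yourself.

```c
#include <stddef.h>

/* Tile edge length: an illustrative guess; real libraries tune this per cache. */
#define BLOCK 32

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Naive C = A * B for n x n row-major matrices. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            /* B is walked down a column: each step jumps n doubles in memory. */
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Blocked C = A * B: same arithmetic, reordered to reuse cached tiles. */
void matmul_blocked(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n * n; i++)
        C[i] = 0.0;
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t kk = 0; kk < n; kk += BLOCK)
            for (size_t jj = 0; jj < n; jj += BLOCK)
                /* Multiply one tile of A by one tile of B into one tile of C. */
                for (size_t i = ii; i < min_sz(ii + BLOCK, n); i++)
                    for (size_t k = kk; k < min_sz(kk + BLOCK, n); k++) {
                        double a = A[i * n + k];
                        /* Unit stride in j: friendly to caches and to SIMD. */
                        for (size_t j = jj; j < min_sz(jj + BLOCK, n); j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

The inner loop of the blocked version also runs along contiguous memory, which is exactly the shape a compiler can vectorize; that is one reason blocking and vector units go hand in hand.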

So I think we already have the tools, and they are just not being used enough. So what's the problem?

Education.

Courses, either formal classes or seminars, need to be available and pushed by faculty so that students writing new code find out these tools exist. Just teaching students to use:

pgcc -fastsse -O3 -ipo -Minfo
gcc -O3

would make huge performance leaps on campus systems.
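Those flags matter because the compiler can vectorize simple loops for you. A minimal C sketch, with hypothetical function names: the first loop is unit-stride with no dependence between iterations, and restrict promises the compiler the arrays don't overlap, so an optimizing compiler at -O3 (or pgcc with -fastsse) is free to emit packed SSE instructions for it. The second is a running sum where each iteration needs the previous result, which blocks that straightforward vectorization. -Minfo makes pgcc report which loops it vectorized; on recent GCC versions, -fopt-info-vec does something similar. Whether a given loop actually gets vectorized depends on the compiler and its version, so check the report rather than assuming.

```c
#include <stddef.h>

/* Element-wise product: independent iterations, contiguous access, and
   restrict-qualified pointers make this a good auto-vectorization candidate. */
void vmul(size_t n, const float *restrict x, const float *restrict y,
          float *restrict z)
{
    for (size_t i = 0; i < n; i++)
        z[i] = x[i] * y[i];
}

/* Running (prefix) sum: y[i] depends on the freshly updated y[i - 1],
   a loop-carried dependence that prevents simple vectorization. */
void prefix_sum(size_t n, float *y)
{
    for (size_t i = 1; i < n; i++)
        y[i] += y[i - 1];
}
```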

Sigh. I started teaching such classes, but I can only do so much. You can find what I have done so far at:

www.umich.edu/~brockp
