November 16th, 2007
by brockp
Learned about iWARP today. I think this is the one major new thing I learned at SC this year. We talked to two vendors about their NICs, what iWARP is, and how it works.
iWARP looks like it’s just RDMA over TCP. It uses standard Ethernet switches (which is good for our Force10) with CX4 or twisted pair, but you need NICs that have the extra ability built into them. As for MPI library support, iWARP is in OFED, which OpenMPI supports and which we already use to support our Infiniband. The really killer thing about this is that the NIC will do this fancy RDMA over TCP and still do regular TCP/IP in the kernel when talking to other devices.
While latency does not look as good as Infiniband, it’s sub-10 microseconds, which is great and in most cases good enough. Some vendors also made claims about maintaining latency when a large number of connections are made at once. I plan to test this myself with some MPI_Alltoall() tests on different numbers of processors. If the Infiniband shows the behavior claimed, I will try to lean on my boss to request some demo hardware.
As a final point, I really think the sweet spot for iWARP would be 1Gbps cards that use twisted pair and keep the price low. I would gladly put one of these cards in every node I had and use it as my only interface if it was in the sub-$500 range. This is much less than Infiniband, because with Infiniband you have to buy an extra switch and use those damn CX4 cables ($#%^).
I don’t know, but I was impressed, and if the price is right (which it DOES NOT appear to be at first look) it’s a great low-cost way to go. I would really like to hear what others have to say on this; if you have tried iWARP, email me at brockp@mlds-networks.com or feel free to comment below.
Brock
November 15th, 2007
by brockp
Wednesday:
I am really starting to see a lot of PMPI tools. I personally like OPT from Allinea. While OPT may not have all the features of Tau and others, and it’s not free, I really think they do a good job presenting data to users in a simple way. When I spoke with Allinea at their booth yesterday, I really felt they have momentum and a customer focus.
Some things I would like to see in OPT though:
- More PAPI Hardware Counters
- Support for Serial programs
- Profile information from functions that don’t call MPI functions
- Speed up database operations for large profile runs
In all, I have been happy with the tool and am confident they will add the missing features to match, if not pass, Tau and friends.
Cluster Resources:
These guys make the wonderful Moab scheduler we use at U of M. We saw a neat tool they had made with a visual effects lab called Y-Film (sorry, no link) to simplify all the rendering that needs to be done on a day-to-day basis. The user can do cool things like ‘render every 10th frame’ or ‘render backwards’ to see how their work is coming. This is really cool: it works at the user level, so artists who are normally stuck with just their desktop until they make their final project can speed up production time by using a cluster. On top of that, their cluster was made up of PS3s, a supported configuration that keeps the price even lower for small shops. The PS3 was fast, too: a frame that took 90 seconds on his MacBook Pro with a 2GHz Core Duo took ~40 on the PS3, at half the cost. Neat tool.
After this we met with CR about how Moab has been working for us and some issues we had. These guys are great: they really cared about what we had to say and treated us very well, even though they now have many larger customers (Amazon, Yahoo, LLNL, etc.).
To close, it turns out my boss Andy Caird is now famous:

November 14th, 2007
by brockp
Monday:
Microsoft had a reception at the Reno Art Museum. The opening talk was really good. The speaker showed some good trends in the Top500 list over the years. One nice fact: his laptop reached over 1GFlop/s, which was the same as the #500 machine in 1994. The vast improvements in power and size have been just wonderful.
The most important topic was a library called PLASMA (Parallel Linear Algebra Software for Multicore Architectures). This would be the next-gen update to things like BLAS and LAPACK (LAPACK was already the update to LINPACK). Here are some slides on PLASMA, though they are not the ones from the talk: Plasma Slides.
I would really like some feedback on memory bandwidth from main memory vs. flop/s. The Cell CPU was discussed as an example for these cases. The next version of the Cell claims 100GFlop/s double precision. Current DDR in the nodes we have at the CAC is around 5GByte/s per memory controller. OK, follow the math: 5GByte/s / 100GFlop/s = 51MBytes/s per GFlop/s, or 51KBytes/s per MFlop/s. It looks like this is fine for BLAS level 3, when you can get your ratio of floating point operations to slow memory references to n/2. But what about BLAS level 2? Many codes still require this. What pressure can be placed on the industry to up memory performance? Don’t even talk to me about computing on GPUs and PCI-e speeds.
Tuesday:
The keynote was interesting, way over my head, but the thing I took from the talk that I would really like is making home devices simple, so they do not require huge amounts of power and development time.
Torque/Moab BOF:
Personally I love Torque and Moab. Torque is a simple, easy-to-use resource manager; I have used it for a while and it solves most of our problems. Some highlights coming:
- Cpuset support: we really want this. Right now users run like mad on our SGI Altix 4700.
- Job arrays: this could be useful, though right now maybe not in our environment. I would like to see this work well enough to make the Matlab DCT (Distributed Computing Toolbox) tie-ins for PBSPro work with Torque.
Moab
- Non-blocking commands
- Job Templates
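Going back to the memory bandwidth math from Monday's talk, here is a rough sketch of the same arithmetic in Python. The assumptions are mine, not the speaker's: 8-byte doubles, 1 GByte = 1024 MBytes (which reproduces the ~51 figure), a simple model where a kernel is limited by either peak flops or main memory bandwidth, and the usual textbook estimates of about n/2 flops per memory word for matrix-matrix multiply (BLAS level 3) and about 2 for matrix-vector multiply (BLAS level 2). The function names are made up for illustration.

```python
# Back-of-the-envelope machine balance using the numbers quoted in the post.
# Assumptions (not from the talk): 8-byte doubles, 1 GByte = 1024 MBytes,
# and a kernel limited by either peak flops or main memory bandwidth.

MEM_BW_MBYTES_PER_S = 5 * 1024  # ~5 GByte/s per memory controller
PEAK_GFLOPS = 100               # claimed double-precision peak of the next Cell
BYTES_PER_DOUBLE = 8


def mbytes_per_s_per_gflop(mem_bw_mbytes_per_s, peak_gflops):
    """Memory bandwidth available per GFlop/s of peak compute."""
    return mem_bw_mbytes_per_s / peak_gflops


def attainable_gflops(flops_per_word, mem_bw_mbytes_per_s, peak_gflops):
    """Flop rate a kernel can sustain if every word comes from main memory."""
    words_per_s = mem_bw_mbytes_per_s * 1024 * 1024 / BYTES_PER_DOUBLE
    bandwidth_bound = flops_per_word * words_per_s / 1e9
    return min(peak_gflops, bandwidth_bound)


balance = mbytes_per_s_per_gflop(MEM_BW_MBYTES_PER_S, PEAK_GFLOPS)
print(f"{balance:.1f} MBytes/s per GFlop/s")

n = 4096  # hypothetical matrix dimension
blas3 = attainable_gflops(n / 2, MEM_BW_MBYTES_PER_S, PEAK_GFLOPS)  # ~n/2 flops per word
blas2 = attainable_gflops(2, MEM_BW_MBYTES_PER_S, PEAK_GFLOPS)      # ~2 flops per word
print(f"BLAS3-like kernel: {blas3:.1f} GFlop/s")
print(f"BLAS2-like kernel: {blas2:.2f} GFlop/s")
```

Under these assumptions the BLAS level 3 case can hit the peak, while the BLAS level 2 case is stuck at a small fraction of it, which is the worry above.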
Some Strange Stuff:
This is my first blog post, and I know I have bad, if not awful, writing skills, but this is funny:
