<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>MLDS Blog</title>
	<link>http://www.mlds-networks.com/components/com_mojo</link>
	<description>Blogs from members of the MLDS team</description>
	<pubDate>Thu, 21 Aug 2008 01:44:48 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.0</generator>
	<language>en</language>
			<item>
		<title>Lustre 1.6.5.1 and Sun X4500&#8217;s</title>
		<link>http://www.mlds-networks.com/index.php/component/option,com_mojo/Itemid,29/p,46/</link>
		<comments>http://www.mlds-networks.com/index.php/component/option,com_mojo/Itemid,29/p,46/#comments</comments>
		<pubDate>Thu, 21 Aug 2008 01:44:48 +0000</pubDate>
		<dc:creator>brockp</dc:creator>
		
	<category>hpc</category>
	<category>cac</category>
		<guid isPermaLink="false">/?p=46</guid>
		<description><![CDATA[Just finished installing two Sun X4500s, AKA Thumpers. Our systems have 48 1 TB drives (yes, 1000 GB, not 1024 GB) and 16 GB of RAM.  The intention was to run Lustre on them and move data from our old 2.7 TB Lustre 1.6.4 setup.
Today we finished the install. Final usable disk space is 49 TB, with 8 Gbit/s [...]]]></description>
			<content:encoded><![CDATA[<p>Just finished installing two Sun <a href="http://www.sun.com/servers/x64/x4500/">X4500</a>s, AKA Thumpers. Our systems have 48 1 TB drives (yes, 1000 GB, not 1024 GB) and 16 GB of RAM.  The intention was to run <a href="http://www.lustre.org">Lustre</a> on them and move data from our old 2.7 TB Lustre 1.6.4 setup.</p>
<p>Today we finished the install. Final usable disk space is 49 TB, with 8 Gbit/s of bandwidth provided by two sets of four 1 Gbit links bonded with 802.3ad.  We have so much less usable disk than raw disk because the drives are broken into 14 RAID groups with spares and external journals.  Oh, and don&#8217;t forget the two boot drives. We decided to go with external journals to help with the random nature of our workload.  Our old disk system was an NFS mount provided by some proprietary data movers. It suffered not from raw performance but from the load of 2000+ CPUs asking for IO of different patterns to many different files at the same time.  There just wasn&#8217;t a way to provide a single namespace with a simple NFS server. </p>
<p>We hope that having Lustre write metadata to six 15,000 RPM disks in RAID 0+1, on a server with 16 GB of RAM for cache, will help metadata performance. Random IO to data will by default be spread across 14 different arrays, 7 per X4500.  This should keep heads moving in parallel for multiple requests. The best part is we can just add more X4500s later as IO and space needs grow. Lastly, both the metadata and object data arrays have external journals, keeping journal IO for the underlying ldiskfs filesystem independent of data IO.  This should also help with the many different requests hitting the servers.</p>
<h3>Performance</h3>
<p>The only real performance test we did was bandwidth from a single host to a single array, which was limited by the 1 Gbit/s link speed of the host. We ran no metadata tests; if you know of any good ones, please email or comment.</p>
<h3> MPI-IO Performance</h3>
<p>This was a big reason for putting Lustre on our system.  NFS and <a href="http://www-unix.mcs.anl.gov/romio/">ROMIO</a> don&#8217;t play well together. Lustre was built with MPI-IO in mind and thus works out of the box.  Writing to a single file using <a href="http://www-unix.mcs.anl.gov/mpi/www/www3/MPI_File_write.html">MPI_File_write()</a> on ten 4-CPU Opteron 2218 nodes with 1 Gbit/s Ethernet reached 650 MB/s write speed. At this point the CPUs on the X4500s were saturated.  I think higher speeds could be reached using <a href="http://en.wikipedia.org/wiki/TCP_Offload_Engine"> TOE</a> or 10 Gbit cards (where TOE is implied).  Even better speeds might be reached using <a href="http://en.wikipedia.org/wiki/Infiniband">InfiniBand</a>, as its RDMA abilities may free CPU resources. Note Lustre does support InfiniBand and can support both InfiniBand and TCP networks at the same time. </p>
<p>
The example code for MPI_File_write() was taken from: <a href="http://beige.ucs.indiana.edu/I590/node86.html#2661">Beige.ucs.indiana.edu</a>.</p>
<pre>
module swap pgi/7.2
module swap openmpi/1.2.6-pgi
mpicc mkrandfiles.c

lfs setstripe data14 0 -1 -1
mpirun -np 40 ./a.out -f data14 -l 300

longest_io_time       = 82.115636 seconds
total_number_of_bytes = 50331648000
transfer rate         = 584.541535 MB/s
</pre>
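<p>As a sanity check on the output above, the reported rate looks like total bytes divided by the longest IO time, with 1 MB taken as 2^20 bytes. The short C sketch below redoes that arithmetic with the numbers copied from the run. The helper name is made up for illustration, and the per-rank figure assumes the bytes were split evenly across the 40 ranks.</p>
<pre>
#include &lt;stdio.h&gt;

/* Hypothetical helper: bytes moved in `seconds`, reported in MB/s
   where 1 MB = 2^20 bytes (the unit the benchmark output appears to use). */
static double transfer_rate_mib(double total_bytes, double seconds)
{
    return total_bytes / seconds / (1024.0 * 1024.0);
}

int main(void)
{
    const double total_bytes = 50331648000.0; /* total_number_of_bytes */
    const double longest_io  = 82.115636;     /* longest_io_time in seconds */
    const int    ranks       = 40;            /* mpirun -np 40 */

    printf("rate     = %.2f MB/s\n", transfer_rate_mib(total_bytes, longest_io));
    printf("per rank = %.0f MB\n", total_bytes / ranks / (1024.0 * 1024.0));
    return 0;
}
</pre>
<p>This prints a rate of 584.54 MB/s, matching the benchmark, and 1200 MB written per rank.</p>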
<p>The hope is that MPI-IO capability at this scale will allow new forms of research to be done on our clusters.  In the Winter term I expect to teach researchers a course on using the parallel IO abilities of <a href="http://hdf.ncsa.uiuc.edu/HDF5/">HDF5</a>.
</p>
<p>Questions: brockp@mlds-networks.com
</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mlds-networks.com/components/com_mojo/wp-feed.php?feed=rss2&amp;p=46</wfw:commentRss>
		</item>
		<item>
		<title>How your Hybrid was Paid for by the Lower/Middle Class</title>
		<link>http://www.mlds-networks.com/index.php/component/option,com_mojo/Itemid,29/p,45/</link>
		<comments>http://www.mlds-networks.com/index.php/component/option,com_mojo/Itemid,29/p,45/#comments</comments>
		<pubDate>Wed, 13 Aug 2008 05:08:55 +0000</pubDate>
		<dc:creator>brockp</dc:creator>
		
	<category>Uncategorized</category>
	<category>Finance</category>
		<guid isPermaLink="false">/?p=45</guid>
		<description><![CDATA[Some thoughts on a government intervention that I am sure most people would, at face value, call great, but that really redistributes wealth from the lower and lower middle classes to the upper middle and upper classes.
It was all done with Hybrid cars.  Turns out our all wise government decided we need to have tax credits [...]]]></description>
			<content:encoded><![CDATA[<p>Some thoughts on a government intervention that I am sure most people would, at face value, call great, but that really redistributes wealth from the lower and lower middle classes to the upper middle and upper classes.</p>
<p>It was all done with Hybrid cars.  Turns out our all wise government decided we need to have tax credits when buying a new hybrid car (<a href="http://www.hybridcars.com/federal-incentives.html">Reference</a>). At face value sounds great, this will encourage production of hybrids because it will lower the cost to get a hybrid thus increasing the number of people who can afford them.  It gets support from green/auto/union etc. </p>
<p>What is the side effect of this policy though?  Turns out in absolute terms it&#8217;s not all great. Hybrids, being new and complicated, are kinda pricey.  The cheapest base model I found was around $20,000 (<a href="http://www.eartheasy.com/live_hybrid_cars.htm">2008 Toyota</a>).  This is out of the price range of a regular household. Remember the median household income is around $47,000/yr, and that&#8217;s on average a family including kids.  So the only people who can really afford these cars are upper middle class and wealthy people. </p>
<p>So this is where the short change comes in.  Only wealthy people can afford these cars and thus they get the tax break.  Everyone pays taxes though.  So everyone pays in to provide these breaks but only those who are rich can get them.</p>
<p>Now there are some technical arguments against this, mostly based on the fact that we run a deficit, so taxes as a whole do not pay for these credits; they are just financed.  In general, though, when was the last time you saw someone who says we should raise taxes on the rich (roll back the Bush tax cuts) also say we should get rid of the tax breaks on hybrids?  I bet most who want to do the first would not want to do the second. In the end it&#8217;s all money.</p>
<p>Just trying to keep people honest.  If anyone can think of a reason why we should do such things please try to convince me.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mlds-networks.com/components/com_mojo/wp-feed.php?feed=rss2&amp;p=45</wfw:commentRss>
		</item>
		<item>
		<title>What we spend 1/3rd of our life doing</title>
		<link>http://www.mlds-networks.com/index.php/component/option,com_mojo/Itemid,29/p,44/</link>
		<comments>http://www.mlds-networks.com/index.php/component/option,com_mojo/Itemid,29/p,44/#comments</comments>
		<pubDate>Sun, 10 Aug 2008 05:37:06 +0000</pubDate>
		<dc:creator>brockp</dc:creator>
		
	<category>Uncategorized</category>
	<category>Finance</category>
		<guid isPermaLink="false">/?p=44</guid>
		<description><![CDATA[Ok random thought time.  What do we spend 1/3rd of our life doing?  We engage in an economy. We work, we buy, we sell.  When I sit back and think about it, the share of our lives we spend engaging in some form of the national economy is quite large.  I also think [...]]]></description>
			<content:encoded><![CDATA[<p>Ok random thought time.  What do we spend 1/3rd of our life doing?  We engage in an economy. We work, we buy, we sell.  When I sit back and think about it, the share of our lives we spend engaging in some form of the national economy is quite large.  I also think it&#8217;s the part that&#8217;s most fragmented.  I don&#8217;t think people think about the connection between work, buying, and freedom.
</p>
<p>In the last 50 years or so the government has had more and more control over our lives by getting involved with the economy. Organizations like Fannie Mae and the FDA in their own way intended to help us have caused problems all their own. I am now going to regurgitate thoughts onto paper so hang on.
</p>
<p>How does Fannie Mae help us? How does Fannie Mae hurt us?  I think Fannie helps some by keeping the cost of owning a home lower (see my older post about how I don&#8217;t know if this is a good thing or not), but the problems it can cause are worse than the benefit.  Many of these organizations are put in place as though they can never fail. Problem is, anything built by man can fail, and does, given the right circumstances. And when a government entity fails it can threaten to really hurt a market.  Notice I don&#8217;t say take down.  You can never kill a market; as long as there are two free people, each with something the other wants, the market will work. It&#8217;s more a question of how well it would work.</p>
<p>Take large banks. Right now we see many of them having trouble because they got in over their heads with subprime loans, just like Fannie Mae did in the name of &#8216;promoting home ownership&#8217;.  Not all the banks did though.  There are many banks, many of them very large, and when one of them fails it hurts.  But it is not possible for all banks to fail; there are just too many.  So even with the worst kind of misbehavior the damage is limited.  Now let&#8217;s look at similar government organizations.  While Fannie Mae and Freddie Mac both serve the same purpose, they are only two institutions, and it&#8217;s much simpler for two to screw up and fail than many.  It&#8217;s just the statistics of it.  We now find ourselves where the government has to bail them out at a huge cost.  The bottom, and the consequences, are <b>worse</b> because Fannie Mae exists than if it had never been created in the first place. Without them blood would still have been spilled, and the initial shock might have been greater, but how far the market could fall, should government do nothing, would be much less.</p>
<p>
Because government got involved in the private property business, that is, encouraging the ownership of one type of property (real estate), the problems can be and are worse.  Then the amazing thing happens that only happens in government, which I will explain next.</p>
<p>All institutions have intentions and results.  The hard part is getting from the first (intention) to the second (results) and having them be equal.  In the market, if your results get too far from your intentions you go under.  People lose their jobs (that&#8217;s great) and the organizations that do make it from intention to results grow. Those who worked at the now failed organization are freed to work in more efficient and successful organizations.  That is, the market is self-correcting, though it is not kind. In the end we are all better off.</p>
<p>Government institutions also have intentions and results.  The difference here is that when the results don&#8217;t land near the intentions, instead of failing, more often money and resources and authority (new power granted by Congress) are added, making the failed beast even bigger.  This is what amazes me.  Fannie Mae screwed up; it put its fingers into the wrong pot and got bit. In a working market it would be eaten up by those more prudent, and better heads would lead from then on.  Instead the same people will be taking money from taxpayers (who are now bearing the risk of a publicly traded company, not the shareholders) to keep them up, and even relaxing the rules on what Fannie is allowed to buy!  Does this make any sense? </p>
<p>Fannie is just one example; there are many more. Find one time where a government organization has failed and the governing body, be it Congress or a local board, said &#8220;shut it down, it failed&#8221;.  No, we keep trying with the same broken POS!  Tear it down, maybe try something radically new, but judge them on their results and admit failure.  I think history shows failure by such organizations more often than success.
</p>
<p>In the end, again I feel my brain pushing me more and more towards Ben Franklin style thinking. It is true, I think, that the Libertarians are the closest thing we have to our founding fathers.  If I was ever asked if I would rather be a man like the founding fathers or the current crop of leaders, my answer would be quick and simple.  </p>
<p>End Rant.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mlds-networks.com/components/com_mojo/wp-feed.php?feed=rss2&amp;p=44</wfw:commentRss>
		</item>
		<item>
		<title>Xen acm.so: undefined symbol: Py_InitModule4</title>
		<link>http://www.mlds-networks.com/index.php/component/option,com_mojo/Itemid,29/p,43/</link>
		<comments>http://www.mlds-networks.com/index.php/component/option,com_mojo/Itemid,29/p,43/#comments</comments>
		<pubDate>Sun, 10 Aug 2008 03:01:02 +0000</pubDate>
		<dc:creator>brockp</dc:creator>
		
	<category>mlds</category>
		<guid isPermaLink="false">/?p=43</guid>
		<description><![CDATA[When doing some updates on a few servers I have that run Xen I ran into a few errors. These are just some notes about how to fix the problem.


After doing the updates I noticed that the Xen processes were not started up and that none of the Xen commands worked.  The errors being:


 [...]]]></description>
			<content:encoded><![CDATA[<p>When doing some updates on a few servers I have that run <a href="http://www.xen.org">Xen</a> I ran into a few errors. These are just some notes about how to fix the problem.
</p>
<p>
After doing the updates I noticed that the Xen processes were not started up and that none of the Xen commands worked.  The errors being:
</p>
<pre>
  File "/usr/lib/python/xen/util/security.py", line 25, in &lt;module&gt;
    from xen.lowlevel import acm
ImportError: /usr/lib/python/xen/lowlevel/acm.so: undefined symbol: Py_InitModule4
</pre>
<p>
The reason for this is that Python, which Xen relies on heavily, was updated from 2.4 to 2.5. I had the old Xen source around so I could rebuild Xen, but I did not want to take the risk of borking the systems, which are physically over an hour&#8217;s drive away. Turns out, though, all you need to rebuild is the Xen user space tools and daemon. The hypervisor and kernel do not need to be rebuilt.
</p>
<pre>
cd xen-3.0.4-1
make clean
make tools
make install-tools
/etc/init.d/xend restart
</pre>
<p>
That is all.  Quite simple.  Note these are old machines used for testing, do not run such an old version of Xen.
</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mlds-networks.com/components/com_mojo/wp-feed.php?feed=rss2&amp;p=43</wfw:commentRss>
		</item>
		<item>
		<title>I hate my mac</title>
		<link>http://www.mlds-networks.com/index.php/component/option,com_mojo/Itemid,29/p,42/</link>
		<comments>http://www.mlds-networks.com/index.php/component/option,com_mojo/Itemid,29/p,42/#comments</comments>
		<pubDate>Tue, 05 Aug 2008 01:28:14 +0000</pubDate>
		<dc:creator>brockp</dc:creator>
		
	<category>Uncategorized</category>
		<guid isPermaLink="false">/?p=42</guid>
		<description><![CDATA[Not all the time, just every time I try to do something it fights me. I was asked to create some videos like I have done in the past.  Problem is to deal with the heating problems Apple decided to just crank the speeds up on the fans. So now even sitting idle my [...]]]></description>
			<content:encoded><![CDATA[<p>Not all the time, just every time I try to do something it fights me. I was asked to create some videos like I have done in the past.  Problem is to deal with the heating problems Apple decided to just crank the speeds up on the fans. So now even sitting idle my fans run at 4000-5000RPM and doing anything they jump to 6000.  Point being my microphone (used to dub voice over the videos) is picking up all this racket. New mic in my future.  </p>
<p>/rant
</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mlds-networks.com/components/com_mojo/wp-feed.php?feed=rss2&amp;p=42</wfw:commentRss>
		</item>
		<item>
		<title>DGEMM() The Function every HPC user should know</title>
		<link>http://www.mlds-networks.com/index.php/component/option,com_mojo/Itemid,29/p,41/</link>
		<comments>http://www.mlds-networks.com/index.php/component/option,com_mojo/Itemid,29/p,41/#comments</comments>
		<pubDate>Fri, 01 Aug 2008 00:15:02 +0000</pubDate>
		<dc:creator>brockp</dc:creator>
		
	<category>hpc</category>
		<guid isPermaLink="false">/?p=41</guid>
		<description><![CDATA[It&#8217;s true, DGEMM() should be used by everyone!  What should be noted though is if your problem does not require doubles then use SGEMM()!!!
This post is a reply to an email I received from a student about my post on CUBlas running on Nvidia graphics cards. 
The Question

How can I run the Dgemm on [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s true, DGEMM() should be used by everyone!  What should be noted though is if your problem does not require doubles then use SGEMM()!!!</p>
<p>This post is a reply to an email I received from a student about my post on CUBlas running on <a href="http://www.nvidia.com/object/tesla_c1060.html">Nvidia</a> graphics cards. </p>
<h2>The Question</h2>
<p><strong><br />
How can I run the Dgemm on GPUs, how to support the dgemm() on GPUs which only support the single precision. Thanks, if you have done that, could you please share the code with me? Thanks<br />
</strong></p>
<p>
The answer is you can&#8217;t. The current crop of cards, including those branded for HPC-only use like the <a href="http://www.nvidia.com/object/tesla_computing_solutions.html"> Tesla 8</a> cards, can only do SINGLE. The reason for this is simple. Graphics did not require DOUBLE, and the first generation of HPC-targeted cards were really re-branded workstation cards. There is also an added real cost: DOUBLE requires considerably more transistors. Many modern CPUs running a code in SINGLE can run it up to twice as fast, because a vector register holds twice as many SINGLE values as DOUBLE values, so it can work on twice as many numbers at a time.  I.e. real cost savings in performance and silicon.
</p>
<p>
Now there was some talk in PLASMA (<a href="http://sti.cc.gatech.edu/Slides/Kurzak-070618.pdf">Parallel Linear Algebra for Multicore</a>) about swapping in and out of SINGLE and DOUBLE.  Look on page 5 for an example; in this case it was used on the <a href="http://en.wikipedia.org/wiki/Cell_microprocessor"> IBM Cell BE</a>.  The Cell reaches over 100 GFlop in SINGLE, but instead of dropping performance by half for DOUBLE it dropped to 14 GFlop! Thus the reasoning behind PLASMA&#8217;s swapping in and out of SINGLE. NOTE: This was resolved in the <a href="http://en.wikipedia.org/wiki/Cell_microprocessor#PowerXCell_8i_Variant"> PowerXCell 8i </a> CPU used on Roadrunner.  I still think this is a great idea.
</p>
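<p>
To make the idea concrete, here is a minimal C sketch, assuming the technique meant here is what is usually called mixed-precision iterative refinement: factor the matrix once in SINGLE, then repeatedly compute the residual in DOUBLE and solve for a correction with the cheap SINGLE factors. This is not PLASMA&#8217;s code. The function names, the 3x3 test matrix and the fixed sweep count are all made up for illustration, and a real code would call tuned LAPACK/BLAS routines instead of these loops.
</p>
<pre>
#include &lt;math.h&gt;
#include &lt;stdio.h&gt;

#define N 3

/* LU-factor m in place in SINGLE precision with partial pivoting. */
static void lu_factor(float m[N][N], int piv[N])
{
    for (int k = 0; k &lt; N; k++) {
        int p = k;
        for (int i = k + 1; i &lt; N; i++)
            if (fabsf(m[i][k]) &gt; fabsf(m[p][k]))
                p = i;
        piv[k] = p;
        for (int j = 0; j &lt; N; j++) {          /* swap rows k and p */
            float t = m[k][j];
            m[k][j] = m[p][j];
            m[p][j] = t;
        }
        for (int i = k + 1; i &lt; N; i++) {
            m[i][k] /= m[k][k];                /* multiplier, kept in L */
            for (int j = k + 1; j &lt; N; j++)
                m[i][j] -= m[i][k] * m[k][j];
        }
    }
}

/* Solve (LU) d = P r in SINGLE precision, reusing the stored factors. */
static void lu_solve(float m[N][N], int piv[N], double r[N], float d[N])
{
    for (int i = 0; i &lt; N; i++)
        d[i] = (float)r[i];
    for (int k = 0; k &lt; N; k++) {              /* apply the row swaps */
        float t = d[k];
        d[k] = d[piv[k]];
        d[piv[k]] = t;
    }
    for (int k = 0; k &lt; N; k++)                /* forward substitution */
        for (int i = k + 1; i &lt; N; i++)
            d[i] -= m[i][k] * d[k];
    for (int i = N - 1; i &gt;= 0; i--) {         /* back substitution */
        for (int j = i + 1; j &lt; N; j++)
            d[i] -= m[i][j] * d[j];
        d[i] /= m[i][i];
    }
}

/* r = b - A x in DOUBLE precision; returns the largest |r[i]|. */
static double residual(double A[N][N], double b[N], double x[N], double r[N])
{
    double worst = 0.0;
    for (int i = 0; i &lt; N; i++) {
        r[i] = b[i];
        for (int j = 0; j &lt; N; j++)
            r[i] -= A[i][j] * x[j];
        if (fabs(r[i]) &gt; worst)
            worst = fabs(r[i]);
    }
    return worst;
}

/* Factor once in SINGLE, refine x in DOUBLE; returns the final residual. */
static double solve_mixed(double A[N][N], double b[N], double x[N], int sweeps)
{
    float m[N][N], d[N];
    int piv[N];
    double r[N];

    for (int i = 0; i &lt; N; i++) {
        x[i] = 0.0;
        for (int j = 0; j &lt; N; j++)
            m[i][j] = (float)A[i][j];
    }
    lu_factor(m, piv);
    for (int s = 0; s &lt; sweeps; s++) {
        residual(A, b, x, r);                  /* DOUBLE */
        lu_solve(m, piv, r, d);                /* SINGLE */
        for (int i = 0; i &lt; N; i++)
            x[i] += d[i];                      /* DOUBLE update */
    }
    return residual(A, b, x, r);
}

int main(void)
{
    double A[N][N] = {{4, 1, 2}, {1, 5, 1}, {2, 1, 6}};
    double b[N] = {1, 2, 3};
    double x[N];

    for (int sweeps = 1; sweeps &lt;= 3; sweeps++)
        printf("sweeps=%d  max residual=%.3e\n", sweeps, solve_mixed(A, b, x, sweeps));
    return 0;
}
</pre>
<p>
The expensive step, the factorization, runs entirely in SINGLE. Only the residual and the update are done in DOUBLE, and for a well-conditioned matrix like this one a few sweeps should bring the answer to DOUBLE accuracy.
</p>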
<p>
So the full answer on running DGEMM() (Double-precision GEneral Matrix Multiply) on current graphics cards is you can&#8217;t.  But don&#8217;t give up hope: keep working with CUBlas, and only use DOUBLE if you need to. Even then keep CUBlas and similar projects in mind; the Nvidia <a href="http://www.nvidia.com/object/tesla_c1060.html"> Tesla 10 </a> series introduced DOUBLE support, which should make all this pain go away.
</p>
<p>
The short-term solution for high-performance DGEMM() is <a href="http://www.tacc.utexas.edu/resources/software/#blas"> Goto BLAS</a>.  There is a threaded version which can take advantage of things like multicore and SMP.  I have gotten my best HPL numbers using Goto and it is a great tool.  If you&#8217;re not using a common platform Goto was built for, the vendor-provided libraries (ESSL, ACML, MKL etc.) work great and many are threaded.  Core 2 and Barcelona with modern BLAS libs are twice as fast per clock as previous generations.
</p>
<p>Sorry there is no good solution, but I hope I gave you enough tools to get by till Tesla 10 hardware is out. If you have questions email me; I am also available for consulting if you want me to get my hands dirty.<br />
Comments welcome.</p>
<p>Brock E. Palen</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mlds-networks.com/components/com_mojo/wp-feed.php?feed=rss2&amp;p=41</wfw:commentRss>
		</item>
		<item>
		<title>Observation of how things should be done</title>
		<link>http://www.mlds-networks.com/index.php/component/option,com_mojo/Itemid,29/p,40/</link>
		<comments>http://www.mlds-networks.com/index.php/component/option,com_mojo/Itemid,29/p,40/#comments</comments>
		<pubDate>Fri, 18 Jul 2008 22:45:46 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
	<category>Uncategorized</category>
		<guid isPermaLink="false">/?p=40</guid>
		<description><![CDATA[This vacation I spent a few days with the people of the Marine Mammal Center.  I will defer to their website to see what they do.  In general this is a non-profit that goes out and saves/rehabs sick/injured marine mammals. I was lucky enough to engage in both the care of animals currently [...]]]></description>
			<content:encoded><![CDATA[<p>This vacation I spent a few days with the people of the <a href="http://www.marinemammalcenter.org">Marine Mammal Center. </a> I will defer to their website to see what they do.  In general this is a non-profit that goes out and saves/rehabs sick/injured marine mammals. I was lucky enough to engage in both the care of animals currently in their care yet to be released and pick up a newly found stranded Sea Lion.
</p>
<p>
Now those of you who know me know I am a big believer in small government and that most (if not all) giving should be done by private organizations funded by private people and staffed by private people.
</p>
<p>
All I can say is the MMC is a <b>wonderful</b> example of this.  This organization is fully (short of the 3% provided by research grants) funded by private donations.  The MMC is very efficient.  They have to be; there isn&#8217;t any magic money being taken from taxpayers. Because of this they have to show results. They have a wonderful public education program which probably saves more seals from mistakes made by the public than their active recovery work saves each year.  They have a very active community of volunteers, many of whom have been there 5 years or more.  If you have ever worked in a similar organization you will know that keeping people around so long is very hard. The MMC is doing a wonderful job of this.  They also do a great job training new people in the care of such animals so they can do this elsewhere.</p>
<p>
In short the MMC does its job very very well.  Better than any similar case that relies on government money or keeping a single Senator/Politician happy. They respond to their own wish to see the animals thrive and work to do that in the best way they can.
</p>
<p>
It is to me a model of the way charity in the United States should be done.  Private people working towards their private interest in protecting and helping these animals. They have a wonderful new facility being built that is fully funded by private persons that will enable them to help yet more animals.  They are grateful for these people who have provided the funds and are also grateful for all those who while they may not donate cash donate their time and their backs.
</p>
<p>
Just a wonderful example, wonderful.  I have never worked with any government organization that could even compare in how far a dollar or man hour goes.  I would say they are at least 3x as efficient as the next government-run program.
</p>
<p>
Everyone at the MMC, thank you very much, and thanks for putting up with my questions. I wish you the best in your efforts, and yes, I think the Elles are very cute and lazy.
</p>
<p>
Remember they run on private donations do what you can:  <a href="http://www.marinemammalcenter.org/waystogive/donate.asp"> Donate</a></p>
<p>Source of income for operations data:<br />
<a href="http://www.marinemammalcenter.org/about_us/financials.asp">http://www.marinemammalcenter.org/about_us/financials.asp</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mlds-networks.com/components/com_mojo/wp-feed.php?feed=rss2&amp;p=40</wfw:commentRss>
		</item>
		<item>
		<title>CUDA Blas</title>
		<link>http://www.mlds-networks.com/index.php/component/option,com_mojo/Itemid,29/p,39/</link>
		<comments>http://www.mlds-networks.com/index.php/component/option,com_mojo/Itemid,29/p,39/#comments</comments>
		<pubDate>Mon, 14 Jul 2008 19:12:45 +0000</pubDate>
		<dc:creator>brockp</dc:creator>
		
	<category>hpc</category>
		<guid isPermaLink="false">/?p=39</guid>
		<description><![CDATA[I am a huge fan of BLAS and I wish more people used it.  I have not figured any numbers out but I think U of M (My day job) might waste enough money per year to fund a position to teach faculty and researchers to use BLAS, CUDA and NAG/IMSL.  For example [...]]]></description>
			<content:encoded><![CDATA[<p>I am a huge fan of BLAS and I wish more people used it.  I have not worked out any numbers, but I think U of M (my day job) might waste enough money per year to fund a position to teach faculty and researchers to use BLAS, CUDA and NAG/IMSL.  For example, it is easy to show how on the same hardware using DGEMM() instead of some DO loops can go 10x faster while consuming the same capital resources (computer, facility space) and consumables (power, cooling).  This problem will only get worse as more and more research computing happens at the University.
</p>
<p>So once you have users using BLAS and friends what is the next big leap in performance one can extract from hardware? Nvidia has a great option called CUBlas.  It is part of their CUDA kit for doing general computing work on Nvidia graphics cards. </p>
<p>I was able to port a simple matrix multiply code to CUBlas in an hour, so most codes that depend on this should find they can use CUBlas quickly and easily.</p>
<p>Calculations on graphics cards have to be done from the video buffer memory on the card.  In my case this was 256MB, so I could not do very large problems. This should not be much of an issue, as it is easy to copy data from the host memory (RAM) to the card memory (video buffer).  Also, for larger problems I think some of the out-of-core approaches used by codes like <a href="http://www.pardiso-project.org/">Pardiso</a> and DGEMM() could be implemented on GPUs, using system RAM where we used disk in the past and treating the card memory as the RAM of old.</p>
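<p>To put a number on &#8220;not very large&#8221;: a square SGEMM needs three DIM x DIM float matrices on the card at once. The C sketch below finds the largest DIM that fits. It assumes the full 256MB is free for the matrices, which is optimistic since the display and driver will use some of it; the helper name is made up for illustration.</p>
<pre>
#include &lt;stdio.h&gt;

/* Largest DIM such that `matrices` DIM x DIM float matrices fit in `bytes`. */
static long max_dim(long long bytes, int matrices)
{
    long long dim = 0;
    while ((dim + 1) * (dim + 1) * matrices * (long long)sizeof(float) &lt;= bytes)
        dim++;
    return (long)dim;
}

int main(void)
{
    long long card = 256LL * 1024 * 1024;   /* 256MB of video memory */
    printf("max DIM = %ld\n", max_dim(card, 3));
    return 0;
}
</pre>
<p>With 4-byte floats this prints a max DIM of 4729, so anything much beyond a 4700 x 4700 multiply has to be split into blocks.</p>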
<p> Ok technical details, basic form was allocate memory on the card, copy from host memory to card memory, call the function and copy results back: </p>
<pre>
//pointers to memory on the card
float* d_A = 0;
float* d_B = 0;
float* d_C = 0;
//every CUstatus below should be checked against CUBLAS_STATUS_SUCCESS
cublasStatus CUstatus;

CUstatus = cublasInit();

//allocate three DIM x DIM float matrices on the card
CUstatus = cublasAlloc(DIM*DIM, sizeof(d_A[0]), (void**)&#038;d_A);
CUstatus = cublasAlloc(DIM*DIM, sizeof(d_B[0]), (void**)&#038;d_B);
CUstatus = cublasAlloc(DIM*DIM, sizeof(d_C[0]), (void**)&#038;d_C);

//copy the host arrays a, b, c into card memory
CUstatus = cublasSetVector(DIM*DIM, sizeof(a[0]), a, 1, d_A, 1);
CUstatus = cublasSetVector(DIM*DIM, sizeof(b[0]), b, 1, d_B, 1);
CUstatus = cublasSetVector(DIM*DIM, sizeof(c[0]), c, 1, d_C, 1);

//C = alpha*A*B + beta*C on the card, no transposes;
//for square DIM x DIM matrices M, N, K, lda, ldb and ldc are all DIM
cublasSgemm('n','n', M, N, K, alpha, d_A, lda, d_B, ldb, beta, d_C, ldc);

//cublasSgemm returns nothing, so ask for the last error
CUstatus = cublasGetError();

//copy back
CUstatus = cublasGetVector(DIM*DIM, sizeof(c[0]), d_C, 1, c, 1);

//free memory on the card
CUstatus = cublasFree(d_A);
CUstatus = cublasFree(d_B);
CUstatus = cublasFree(d_C);

//shutdown cublas
cublasShutdown();
</pre>
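<p>When porting, it helps to have a slow CPU version of the same operation to check the card&#8217;s answer against. The sketch below is a plain C reference for the 'n','n' case used above: C = alpha*A*B + beta*C with matrices stored column by column, as BLAS expects. The name sgemm_ref and the tiny 2x2 test are made up for illustration; this is not part of CUBlas.</p>
<pre>
#include &lt;stdio.h&gt;

/* Hypothetical CPU reference for the 'n','n' case:
   C = alpha * A * B + beta * C, column-major like BLAS. */
static void sgemm_ref(int m, int n, int k, float alpha,
                      const float *A, int lda, const float *B, int ldb,
                      float beta, float *C, int ldc)
{
    for (int j = 0; j &lt; n; j++)
        for (int i = 0; i &lt; m; i++) {
            float sum = 0.0f;
            for (int p = 0; p &lt; k; p++)
                sum += A[i + p * lda] * B[p + j * ldb];
            C[i + j * ldc] = alpha * sum + beta * C[i + j * ldc];
        }
}

int main(void)
{
    /* 2x2 matrices stored column by column: A = [1 3; 2 4], B = identity */
    float a[4] = {1, 2, 3, 4};
    float b[4] = {1, 0, 0, 1};
    float c[4] = {10, 10, 10, 10};

    sgemm_ref(2, 2, 2, 2.0f, a, 2, b, 2, 1.0f, c, 2);
    for (int i = 0; i &lt; 2; i++)
        printf("%g %g\n", c[i], c[i + 2]);   /* row i of C */
    return 0;
}
</pre>
<p>This prints the rows 12 16 and 14 18, i.e. 2*A + 10. Running the same inputs through cublasSgemm and comparing element by element (with a small tolerance for larger matrices) is a quick way to catch a mixed-up leading dimension or row-major/column-major confusion.</p>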
<p>
Quite simple. It&#8217;s all a C/FORTRAN library, so you need nothing special other than an Nvidia card and the CUBlas library.  If you want the full source email me at: <a href="mailto:brockp@mlds-networks.com">brockp@mlds-networks.com</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mlds-networks.com/components/com_mojo/wp-feed.php?feed=rss2&amp;p=39</wfw:commentRss>
		</item>
		<item>
		<title>GPGPU&#8217;s</title>
		<link>http://www.mlds-networks.com/index.php/component/option,com_mojo/Itemid,29/p,38/</link>
		<comments>http://www.mlds-networks.com/index.php/component/option,com_mojo/Itemid,29/p,38/#comments</comments>
		<pubDate>Mon, 14 Jul 2008 18:54:25 +0000</pubDate>
		<dc:creator>brockp</dc:creator>
		
	<category>hpc</category>
		<guid isPermaLink="false">/?p=38</guid>
		<description><![CDATA[One of the hot things in HPC right now is GPGPU&#8217;s. The general idea is to use graphics cards with their very high memory bandwidth and massive parallel ALU to do general computation.  Think math on graphics chips. This is a great idea because graphics companies have the scale of the consumer market to [...]]]></description>
			<content:encoded><![CDATA[<p>One of the hot things in HPC right now is <a href="http://www.gpgpu.org/">GPGPU&#8217;s</a>. The general idea is to use graphics cards, with their very high memory bandwidth and massively parallel ALUs, to do general computation.  Think math on graphics chips. This is a great idea because graphics companies have the scale of the consumer market to keep prices down and innovation up.  Jeez, I love capitalism. The performance of these cards is much, much higher than a general purpose Intel or AMD CPU, and even higher than can be had from many purpose-built accelerators like those from <a href="http://www.clearspeed.com/">Clear Speed</a>, and it is available at a lower cost. </p>
<p>Now GPGPU is not without its problems. Most of the problems though are slated to be resolved by <a href="http://www.nvidia.com/object/tesla_computing_solutions.html"> Nvidia</a> and <a href="http://ati.amd.com/products/Radeonhd4800/index.html">ATI/AMD</a>. These problems are:</p>
<ul>
<li>Lack of DOUBLE support</li>
<li>Lack of simple programming interfaces</li>
<li>Lack of standard interface from both major vendors</li>
<li>Lack of large memory space</li>
<li>Lack of Scheduling</li>
</ul>
<p>
Most serious HPC applications require DOUBLE support.  Current cards only support SINGLE, but the Nvidia Tesla 10 cards now support DOUBLE and ATI sees this need also.  So this will be solved soon.  Yes, all my MD (protein folding) folks will tell me we don&#8217;t need DOUBLE, but they will get in fist fights over it with my FEA (meshing) friends.  I of course need to support both on the clusters.  In any case this is being solved.
</p>
<p>
Programming APIs are in the works. Nvidia has the wonderful CUDA, which I will write about later. The worst part of CUDA is all the hard work it involves that the average PhD candidate will not understand. Currently they can&#8217;t program FORTRAN or C efficiently, and CUDA asks for so much more.
</p>
<p>
CUDA does have CUBlas, and I can&#8217;t stress how much I love it.  CUBlas allowed me, in an hour, to convert the heavy work portion of a code I had to use the graphics card, with no special NVCC compiler.  I was quite happy.  This is what I think should be done.  I have always said most people should just make their code into an LU problem and then use BLAS and LAPACK to factor it.  Well, all of BLAS and LAPACK (PLASMA) should be implemented in CUDA and be a library that FORTRAN and C programmers link against.  Easy, right?  Well, simpler than writing CUDA, still harder than raw C or FORTRAN, but the benefits are huge.  I really hope to see Nvidia and ATI implement PLASMA.
</p>
<p>I will cover the other points when I write about using CUBLAS.  For now I will focus on why I think Nvidia should use PLASMA over regular BLAS.
</p>
<p>
PLASMA (Parallel Linear Algebra Software for Multicore Architectures) was made mostly for CPUs like the Cell BE from IBM. The first rendition of the Cell could do DOUBLE, but it performed much better in SINGLE. PLASMA wanted to ease this pain by taking LAPACK and running the parts of each operation that don&#8217;t require DOUBLE in SINGLE, extracting the full performance of the Cell CPU.  Now that the Cell BE has full DOUBLE support it acts more like regular CPUs, in that the CPU is only half as fast at DOUBLE as at SINGLE.
</p>
<p>
This is still twice as fast, though!  Why don&#8217;t we do this in all math libraries?  As long as changing between types (DOUBLE -> SINGLE, SINGLE -> DOUBLE) is cheap, we should do this.  I think this matters more and more on larger systems, because going from 1 Gflop to 2 Gflop might not matter much, but going from 1 Tflop to 2 Tflop is a big deal. GPUs are so much faster, and the market that pushes their development (the consumer graphics market) does not require DOUBLE, so the more the HPC world can leverage SINGLE the better, with the speed bonus to boot.
</p>
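<p>
One common way to do the heavy work in SINGLE and still end up with a DOUBLE-quality answer is mixed-precision iterative refinement.  The sketch below is only an illustration of that idea; whether it matches what PLASMA does internally is my assumption.  All function names are made up, SINGLE is emulated by rounding through a 32-bit float, and the matrix is assumed to be non-singular and reasonably well conditioned.  The expensive LU factorization runs in SINGLE, and only the cheap residual and update run in DOUBLE.
</p>

```python
import struct


def to_single(x):
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_single(a):
    """LU with partial pivoting; every arithmetic result is rounded to SINGLE."""
    n = len(a)
    lu = [[to_single(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Pick the row with the largest pivot candidate in column k.
        p = max(range(k, n), key=lambda r: abs(lu[r][k]))
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = to_single(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = to_single(lu[i][j] - to_single(lu[i][k] * lu[k][j]))
    return lu, perm


def lu_solve_single(lu, perm, b):
    """Forward and back substitution in SINGLE using the stored factors."""
    n = len(lu)
    y = [to_single(b[perm[i]]) for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] = to_single(y[i] - to_single(lu[i][j] * y[j]))
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            y[i] = to_single(y[i] - to_single(lu[i][j] * y[j]))
        y[i] = to_single(y[i] / lu[i][i])
    return y


def residual(a, x, b):
    """r = b - A x, computed in DOUBLE against the original matrix."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x))
            for row, bi in zip(a, b)]


def solve_mixed(a, b, sweeps=5):
    """Factor once in SINGLE, then refine the answer with DOUBLE residuals."""
    lu, perm = lu_factor_single(a)
    x = lu_solve_single(lu, perm, b)
    for _ in range(sweeps):
        r = residual(a, x, b)              # cheap, DOUBLE
        d = lu_solve_single(lu, perm, r)   # reuses the SINGLE factors
        x = [xi + di for xi, di in zip(x, d)]
    return x


# Small made-up test system with a known answer.
a = [[4.1, 1.3, 2.7], [1.3, 3.9, 0.6], [2.7, 0.6, 5.3]]
x_true = [1.0 / 3.0, -2.0 / 7.0, 0.9]
b = [sum(aij * xj for aij, xj in zip(row, x_true)) for row in a]

lu, perm = lu_factor_single(a)
x_single = lu_solve_single(lu, perm, b)
x_mixed = solve_mixed(a, b)

err_single = max(abs(u - v) for u, v in zip(x_single, x_true))
err_mixed = max(abs(u - v) for u, v in zip(x_mixed, x_true))
print("SINGLE only error:", err_single)
print("mixed precision error:", err_mixed)
```

<p>
The factorization is the O(n^3) part and each refinement sweep is only O(n^2), so on hardware where SINGLE runs twice as fast as DOUBLE nearly all of the time is spent at the faster rate.
</p>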
<p>
Please post any comments and questions, or email me at brockp@mlds-networks.com
</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mlds-networks.com/components/com_mojo/wp-feed.php?feed=rss2&amp;p=38</wfw:commentRss>
		</item>
		<item>
		<title>Thoughts on Government Housing</title>
		<link>http://www.mlds-networks.com/index.php/component/option,com_mojo/Itemid,29/p,37/</link>
		<comments>http://www.mlds-networks.com/index.php/component/option,com_mojo/Itemid,29/p,37/#comments</comments>
		<pubDate>Sat, 28 Jun 2008 03:43:04 +0000</pubDate>
		<dc:creator>brockp</dc:creator>
		
	<category>Finance</category>
		<guid isPermaLink="false">/?p=37</guid>
		<description><![CDATA[You may think I am going to talk about welfare, well I am and I am not. I am going to talk about welfare for the middle/Upper classes.
Recent episodes of Econ Talk have made me think harder about some of the things the government does.
The Federal Government subsidizes people who own homes. Yes people who [...]]]></description>
			<content:encoded><![CDATA[<p>You may think I am going to talk about welfare, well I am and I am not. I am going to talk about welfare for the middle/Upper classes.</p>
<p>Recent episodes of <a href="http://www.econtalk.org">EconTalk</a> have made me think harder about some of the things the government does.</p>
<p>The Federal Government subsidizes people who own homes. Yes, people who own homes. It is called the mortgage interest deduction. The Federal Government allows those who itemize to deduct any interest paid on their mortgage.</p>
<p>For many this reduces their taxable income by $10,000 or more a year.  My problem is not that there is a deduction (I like the flat tax, or negative income tax).  I can&#8217;t for the life of me figure out why you would really want to promote home ownership.  Really, what does it provide?  There is the usual argument that people care for a home better when they own it, etc.  I don&#8217;t think this is it.  I think the benefit goes to the states.  If this is the case, just give some fraction of income taxes to the states!
</p>
<p>
Here is why I think that.  I think people think very differently about renting vs. owning.  Currently I am willing to rent a small two-bedroom apartment and keep a roommate, but I am not willing to buy a similar kind of property.  What this does is create demand for more high-quality housing (no one wants to buy rental-quality housing).  Because there is more high-quality housing, property taxes are higher per capita than they would be without a deduction (an incentive to own) on your house.
</p>
<p>
While I think that is the theory, I do not think that <b>is</b> what actually happens.  Because the deduction on mortgage interest has been around for so long, people expect it.  I doubt many would have bought as expensive a house if they didn&#8217;t expect to get 1/3rd of their interest back from the government. So it might have helped some when first put in place, because the prices of real estate would not have adjusted quickly.  By now I am sure the benefit of the deduction is baked into the prices of all houses.  I don&#8217;t think many could even afford their homes without the deduction.  I am almost certain, though, that if there were no deduction that exact same house&#8217;s price would be lower by around the amount of the deduction.
</p>
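<p>
Here is a toy version of that argument as a few lines of Python.  Everything in it is an assumption made up for illustration: the buyer has a fixed after-tax budget for interest each year, the loan is treated as interest-only, and the budget, the rate and the function name are invented.  The only number taken from above is the 1/3rd of interest that comes back from the government.
</p>

```python
def affordable_loan(after_tax_budget, rate, refund_fraction):
    """Largest interest-only loan a buyer can carry on a fixed after-tax budget.

    refund_fraction is the share of interest paid that comes back through
    the deduction (0.0 means no deduction at all).
    """
    gross_interest = after_tax_budget / (1.0 - refund_fraction)
    return gross_interest / rate


budget = 12_000.0  # hypothetical yearly after-tax money available for interest
rate = 0.06        # hypothetical mortgage interest rate

without_deduction = affordable_loan(budget, rate, 0.0)
with_deduction = affordable_loan(budget, rate, 1.0 / 3.0)

print("loan without deduction:", round(without_deduction))
print("loan with deduction:   ", round(with_deduction))
print("ratio:", with_deduction / without_deduction)
```

<p>
In this toy model every buyer can bid about one and a half times as much for the same out-of-pocket cost, so if everyone gets the deduction the extra bidding power ends up in the price rather than in the buyer&#8217;s pocket.
</p>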
<p>
So why do prices rise to match whatever benefit gets added? Well, the first rule, I think, for prices adjusting to match something like the deduction is that it must be long term.  Prices must be given enough time to float and absorb the benefit.  Thus a one-time coupon would not have the same effect on prices.  A good example is Art Van.  Here in Michigan, Art Van is always advertising a sale with huge amounts off sticker.  I have come to think that the sale price is the real price and that the &#8220;sticker price&#8221; is just some made-up number.  The rest is marketing.  On the other hand, when another store has its once-a-year sale I see value in the sale. Not so much for Art Van and its sales 100% of the time.
</p>
<p>
We are simple animals and like a deal, but we all fall victim to &#8220;what have you done for me today?&#8221;  Because the deduction has been around for so long, we don&#8217;t care any more.  It&#8217;s just an extra hoop we jump through because we have to, because the price of houses requires it.
</p>
<p>
The biggest point I hope anyone who reads this takes away (if you do read it, please comment) is that the market will adjust to give housing of type X to someone who produces output Y.  The price of house X will adjust to equal some % of the time of the worker who has output Y.  Thus if the market says programmers should have above-average earnings, they should have above-average housing, and the price will reflect all deductions and market realities.  People will trade some amount of their time (read: output from work) for housing.  And that time will be exchanged for a house that takes a builder&#8217;s time, a logger&#8217;s time, etc.  The addition of the deduction does not help any person&#8217;s ability to acquire a house in the long run.  The price of the house will go up (from its deduction-free price) to match the deduction.  In the end the programmer gets the house a programmer is willing to trade his output for.
</p>
<p>
I think this idea, that we trade our time and effort for the output of others (and they trade theirs for ours) and that we always come out at equilibrium, is what a large number of people don&#8217;t understand.  Any addition to or subtraction from the cost will just be subtracted or added back in over time, so that the amount of output traded stays equal.
</p>
<p>
I will close with this: I <b>really</b> hate inflation. Fed, go jump in a creek.
</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mlds-networks.com/components/com_mojo/wp-feed.php?feed=rss2&amp;p=37</wfw:commentRss>
		</item>
	</channel>
</rss>
