Using Index’s to make Gold Fast
We use Gold an allocation manager built by Cluster Resources to collect data on ran jobs. We don’t currently use it for allocation management.
While Scott Jackson did a great job creating gold, it was not built for speed. He backed Gold’s data store with the PostgresQL (PSQL) database but did not create any indexes on the database. This caused PSQL to run very sub optimally. Once Gold started showing some performance problems back a year or so ago we put some work into speeding up gold. This work was added to Gold 2.1.5.0. This solved our problems for a while, till now.
Gold was running a large number of quires of the form:
SELECT g_reservation_allocation.g_id,g_reservation_allocation.g_amount FROM g_reservation, g_reservation_allocation WHERE ( g_reservation.g_id=g_reservation_allocation.g_reservation AND g_reservation.g_start_time< ='1226009628' AND g_reservation.g_end_time>‘1226009628′ AND g_reservation_allocation.g_account=’2′ ) AND g_reservation.g_deleted!=’True’ AND g_reservation_allocation.g_deleted!=’True’;
This query even with index’s on a few of the WHERE conditions PSQL was doing a sequental scan. Using the ‘EXPLAIN ANALYZE’ syntax we see:
Nested Loop (cost=0.00..53411.23 rows=1 width=12) (actual time=2878.228..18789.857 rows=502 loops=1)
-> Seq Scan on g_reservation_allocation (cost=0.00..47592.58 rows=1657 width=16) (actual time=2870.689..17636.409 rows=928 loops=1)
Filter: ((g_account = 2) AND ((g_deleted)::text <> ‘True’::text))
-> Index Scan using g_id_idx2 on g_reservation (cost=0.00..3.50 rows=1 width=4) (actual time=1.238..1.239 rows=1 loops=928)
Index Cond: (g_reservation.g_id = “outer”.g_reservation)
Filter: ((g_start_time < = 1226009628) AND (g_end_time > 1226009628) AND ((g_deleted)::text <> ‘True’::text))
Total runtime: 1707.245 ms
So even with the indexes PSQL is not optimizing the where condition. The worse being the check for !=g_delted.
As you can see an 1.7 second query is way to long, if we want to start 100’s of jobs a minute. To speed this up we found that PSQL lets indexes have WHERE conditions. These conditions lets PSQL know right away where to find data where g_delted!=’True’.
create index g_reservation_acct_idx ON g_reservation_allocation (g_account) WHERE g_deleted!='True'; create index g_id_where_idx ON g_reservation (g_id) WHERE g_deleted!='True';
The first index lowered the query time to around 16ms. The second index is chosen over a regular index just because it is a smaller search area. It was really added for another common slow query that Gold does. Thus it only speeds up the above query to about 8.5ms.
Hash Join (cost=1462.31..1546.42 rows=1 width=12) (actual time=3.846..7.951 rows=502 loops=1)
Hash Cond: ("outer".g_reservation = "inner".g_id)
-> Index Scan using g_reservation_acct_idx on g_reservation_allocation (cost=0.00..75.85 rows=1649 width=16) (actual time=0.020..2.175 rows=928 loops=1)
Index Cond: (g_account = 2)
-> Hash (cost=1462.02..1462.02 rows=117 width=4) (actual time=3.763..3.763 rows=502 loops=1)
-> Index Scan using g_id_where_idx on g_reservation (cost=0.00..1462.02 rows=117 width=4) (actual time=0.134..2.876 rows=502 loops=1)
Filter: ((g_start_time < = 1226009628) AND (g_end_time > 1226009628))
Total runtime: 8.492 ms
So our performance per/query went from 1600ms to ~10ms. Not bad, now we can start 100’s of jobs per minutes. So why was this performance so much better? The reason our performance started to become a problem the second time around was due to how many jobs we had run. g_reservation_allocation had 1,649,899 records. Not huge but big enough that if your doing a SecScan like PSQL was doing running over a gradually growing table would get slower and slower. Having an index of only valid jobs (g_delted!=True) gives PSQL an oprotunity to know exactly very quickly where to look on disk. Of which there was only 928 records. This number will only change with number of running jobs. Which will grow over time as we add hardware, or we run more serial jobs. In any case it will grow much slower than the total number of jobs ever ran. We should hit 2 million records very soon.
Moral is, if your using Gold add the above index’s. It will help you out greatly. You can find my email notifying Gold Users here: http://www.supercluster.org/pipermail/gold-users/2008-November/000123.html. I hope Scott adds this index to the next release of Gold so you don’t have to add it your self.
You can always email me questions at: brockp@mlds-networks.com

