
Archive for June, 2010

Overlay deploy timeline on Ganglia graphs

Monday, June 28th, 2010

Don't you sometimes wish you had a visual indicator of when code was deployed to production? Something like this :-)

Deploy timeline shown on a load graph

Here is how you can add a deploy timeline to your Ganglia graphs, or for that matter to any tool that uses RRDs, such as Cacti, Munin or collectd.

Background

RRDtool supports so-called VRULEs. From the rrdgraph_graph documentation:

VRULE:time#color[:legend][:dashes[=on_s[,off_s[,on_s,off_s]...]][:dash-offset=offset]]

Draw a vertical line at time. Its color is composed from three hexadecimal numbers specifying the rgb color components (00 is off, FF is maximum) red, green and blue followed by an optional alpha. Optionally, a legend box and string is printed in the legend section. time may be a number or a variable from a VDEF. It is an error to use vnames from DEF or CDEF here. Dashed lines can be drawn using the dashes modifier. See LINE for more details.

What we want to do is add a VRULE for each deployment. For example, the three lines in the graph above were generated with these VRULEs:

VRULE:1277731886#FF00FF:"Deploys" VRULE:1277721886#FF00FF VRULE:1277711886#FF00FF

Implementation

The easiest way to add these to Ganglia is to modify graph.php in Ganglia Web. Look for the following two lines at the end of the file:

$command .=  array_key_exists('extras', $rrdtool_graph) ? ' '.$rrdtool_graph['extras'].' ' : '';
$command .=  " $rrdtool_graph[series]";

Then append your own VRULEs after them, e.g.

$command .= " VRULE:" . $time . "#FF00FF:\"Deploys\"";

Obviously you have to pull in $time from wherever you keep track of your deploy times. You can also get creative: use different colors for different deploys, change the legend labels, or add VRULEs only to certain graphs, e.g. load or CPU. Here is a quick and dirty way to do it:

$deploy_times = array(1278082860, 1279393200);
foreach ($deploy_times as $key => $time) {
  # Attach the "Deploys" legend to the first VRULE only.
  if ($key == 0) {
    $command .= " VRULE:" . $time . "#FF00FF:\"Deploys\"";
  } else {
    $command .= " VRULE:" . $time . "#FF00FF";
  }
}

Now you just have to make sure each new deploy time gets appended to the array.
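Hard-coding the array is fine for a quick test. As a minimal sketch, assuming you record each deploy as one Unix timestamp per line in a plain text file (the file format and both function names below are hypothetical, not part of Ganglia), the VRULE arguments could be built like this:

```php
<?php
// Hypothetical helper: read deploy times (one Unix timestamp per line)
// from a plain text file. Blank and non-numeric lines are skipped.
function load_deploy_times(string $path): array
{
    $times = [];
    if (!is_readable($path)) {
        return $times;  // no deploy log means no VRULEs
    }
    foreach (file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $line = trim($line);
        if (ctype_digit($line)) {
            $times[] = (int) $line;
        }
    }
    return $times;
}

// Hypothetical helper: turn deploy times into rrdtool VRULE arguments.
// Duplicates are dropped, times are sorted, and the legend is attached
// to the first VRULE only so "Deploys" shows up once in the legend.
function deploy_vrules(array $times, string $color = 'FF00FF', string $legend = 'Deploys'): string
{
    $times = array_unique($times);
    sort($times);
    $args = '';
    foreach ($times as $i => $time) {
        $args .= " VRULE:" . $time . "#" . $color;
        if ($i === 0) {
            // Quoted because graph.php passes $command through the shell.
            $args .= ":\"" . $legend . "\"";
        }
    }
    return $args;
}
```

In graph.php the hard-coded loop would then shrink to something like `$command .= deploy_vrules(load_deploy_times('/path/to/deploys.txt'));`, where the path is only a placeholder. Your deploy script could append the output of `date +%s` to that file. The $color and $legend parameters are there so you can call it once per kind of deploy with a different color.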

Alternate implementations

An alternate implementation is to create an RRD file whenever you deploy, then overlay that graph on top of an existing graph. The trouble is that you have to worry about scaling the graph, and I never could get it quite right.

Credit

Thanks go to the Circonus guys :-) since they made me think of vertical lines instead of trying the RRD overlay. Also thanks to @toredash for pointing me in the right RRDtool direction by suggesting HRULE.

Velocity Conference 2010 takeaways

Sunday, June 27th, 2010

Velocity 2010 was an excellent conference. Following are my takeaways. There is tons more, but these are some of the things that made a good impression and are likely not hard to do.

Web performance optimization

Mobile performance optimization

Most of these recommendations are taken from Maximiliano Firtman's Mobile Web High Performance talk. You can view the slides here.

  • Avoid jQuery unless you really need it. Check out slide 90: it takes 1.8 seconds on an iPhone and 4 seconds on Android to download and parse jQuery. Use mobile-optimized frameworks such as baseJS and XUI instead.
  • Avoid DNS lookups and minimize the number of requests, since they are slow.
  • Embed CSS and JavaScript in the home page itself, then download the external CSS and JS after onload.
  • Use inline images (slide 56) and pictograms.
  • Avoid redirects.
  • Use native constructs, especially for WebKit browsers, e.g. -webkit-text-stroke.
  • Keynote announced their mobile testing tool for desktops, which looks promising: http://mite.keynote.com/

SSL/Security

  • According to Google, SSL overhead these days is minimal: around 1% on today's servers.
  • My pet peeve about the presentation is that they advised everyone to use less secure settings, i.e. 1024-bit keys and the RC4 cipher, to improve performance. Adding SSL to otherwise insecure connections is certainly an improvement, but that advice should be qualified: probably fine for e-mail, probably bad for financial sites.

Scalability

  • Hidden Scalability Gotchas in Memcached and Friends, by Neil Gunther (author of Guerrilla Capacity Planning) and Shanti Subramanyam, discussed their findings on memcached. They applied quantitative analysis to different memcached versions, and according to Neil's model memcached 1.4.5 has higher contention than 1.2.8.
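Neil's model is presumably his Universal Scalability Law (USL). In its usual form, the relative capacity C at N concurrent clients is given below, where σ is the contention (serialization) coefficient and κ the coherency (crosstalk) coefficient, both fitted from measured throughput. "Higher contention" corresponds to a larger fitted σ; with κ = 0 the formula reduces to Amdahl's law, and for κ > 0 capacity peaks near N*.

```latex
C(N) = \frac{N}{1 + \sigma (N - 1) + \kappa N (N - 1)},
\qquad
N^{*} = \left\lfloor \sqrt{\frac{1 - \sigma}{\kappa}} \right\rfloor

% Worked example: \sigma = 0.1,\ \kappa = 0,\ N = 10
C(10) = \frac{10}{1 + 0.1 \cdot 9} = \frac{10}{1.9} \approx 5.3
```

So with just 10% serialization, 10 threads deliver only about 5.3 times the throughput of one.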

Culture

GangliaView – automatically rotate Ganglia metrics

Wednesday, June 16th, 2010

GangliaView is a simple web app that automatically rotates selected Ganglia metrics. We use it to rotate key metrics, with large graphs showing the last hour and last day and smaller graphs showing the last week and last month. A sample screen looks like this:

GangliaView is derived from CactiView, with a number of changes to make it work with Ganglia and with frames removed. You can download it from here:

http://github.com/vvuksan/ganglia-misc

Non-Dell SSDs/drives not supported until Q2 2011

Wednesday, June 16th, 2010

I am writing up this post so perhaps I can save some poor sysadmin from chasing their own tail. If you ever receive the following error messages using PERC H700 or H800 controllers

Jun 15 14:00:17 db07 Server Administrator:  Storage Service EventID: 2335  Controller event log: PD 04(e0x20/s4) is  not supported:  Controller 0 (PERC H700 Integrated)
Jun 15 14:00:18  db07 Server Administrator: Storage Service EventID: 2334  Controller  event log: Inserted: PD 05(e0x20/s5):  Controller 0 (PERC H700  Integrated)
Jun 15 14:00:18 db07 Server Administrator: Storage  Service EventID: 2335  Controller event log: PD 05(e0x20/s5) is not  supported:  Controller 0 (PERC H700 Integrated)

it is due to the following:

http://www.standalone-sysadmin.com/blog/2010/04/dell-reverses-position-on-3rd-party-drives/

Please note this will not be fixed until Q2 2011.
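If you want to check a box for these rejections rather than eyeball syslog, here is a minimal sketch that picks out the EventID 2335 "is not supported" lines shown above and returns the affected physical disks. The function name is hypothetical, and it assumes the OMSA message wording matches the excerpt in this post.

```php
<?php
// Hypothetical helper: return the physical disks (e.g. "PD 04(e0x20/s4)")
// that the controller reported as "not supported" (EventID 2335).
function unsupported_disks(array $lines): array
{
    $disks = [];
    foreach ($lines as $line) {
        // \s+ tolerates the irregular spacing seen in the real log lines.
        if (preg_match('/EventID:\s*2335\b.*?(PD\s+\S+)\s+is\s+not\s+supported/', $line, $m)) {
            $disks[] = preg_replace('/\s+/', ' ', $m[1]);
        }
    }
    // A disk may be reported more than once; list each one a single time.
    return array_values(array_unique($disks));
}
```

You might call it as `unsupported_disks(file('/var/log/messages', FILE_IGNORE_NEW_LINES))`; the log path is an assumption and varies by distribution.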

Beauty of aggregate line graphs

Saturday, June 5th, 2010

If you saw a graph like this:

90th percentile response time consolidated line graph

Would it mean anything to you :-) ? The first time I was introduced to these graphs I thought they were pointless, since you couldn't really see much. That was until I saw something like this:

Netstat consolidated line graph

This was post-release. Can you spot something wrong :-) ? Obviously the color scheme is somewhat off in this last graph; we later reworked it (visible in the top graph). We also have another set of graphs that drill down into per-host aggregations, since we run multiple Resin instances on the same machine, so you can find the misbehaving instance.

You can make these graphs pretty easily using Ganglia's custom report graphs. I will try to post some of the ones we use in the next couple of days.

For those wondering what 90th percentile response time is, you can read my post Monitoring your website performance via 90th percentile response time.
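As a quick refresher, here is a minimal sketch of the calculation, assuming the nearest-rank definition: sort the samples and take the one at rank ceil(0.9 × n). Other tools interpolate between samples and can give slightly different numbers; the function name is my own.

```php
<?php
// Nearest-rank percentile: sort ascending and return the sample at
// 1-based rank ceil(p/100 * n). For p = 90, 90% of requests were at
// least this fast.
function percentile(array $values, float $p = 90.0): float
{
    if (count($values) === 0) {
        throw new InvalidArgumentException('no samples');
    }
    sort($values);
    // Multiply before dividing so e.g. 90 * 10 / 100 is exactly 9.
    $rank = (int) ceil($p * count($values) / 100);
    $rank = max(1, $rank);
    return (float) $values[$rank - 1];
}
```

For ten response times in milliseconds, `percentile([120, 80, 95, 300, 110, 90, 100, 85, 105, 2500])` returns 300.0: the single 2500 ms outlier falls in the slowest 10% and does not move the result.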
