Archive for May, 2010

Devops homebrew part deux

Thursday, May 27th, 2010

This is the second part to the devops homebrew post.

I forgot a couple of things in my first post, so here are a few more observations.

Change is an ongoing process

All the changes I talked about in the first post took a long time. It took more than a year to get issues assessed, discussed, designed, implemented and tested, so don't expect quick progress. It's like open heart surgery: you don't have time to stop everything and start from scratch.

No hardcoded paths

Perhaps this one should be obvious, but it is really important to make the app relocatable, i.e. the app should assume all the files it needs are within its container. This means that every file reference should be relative to the base container directory. For example, all the WARs and configuration files would be placed in /run/base and the startup script would pass that location in as a variable, i.e. -DBASEDIR=/run/base. The application should then use BASEDIR instead of hardcoding /run/base.
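Here is a minimal sketch of the same idea in Python rather than Java. It assumes the startup script exports an environment variable named BASEDIR (a stand-in for the -DBASEDIR JVM property above), and the helper names and the conf/app.properties path are made up for illustration.

```python
import os
from pathlib import Path


def base_dir(env=os.environ) -> Path:
    # BASEDIR is supplied by the startup script (assumption: passed as an
    # environment variable). Fall back to the current working directory.
    return Path(env.get("BASEDIR", os.getcwd()))


def resolve(relative: str, env=os.environ) -> Path:
    # Every file reference goes through this function, so no other code
    # needs to mention /run/base or any other absolute location.
    if os.path.isabs(relative):
        raise ValueError(f"expected a path relative to BASEDIR, got {relative!r}")
    return base_dir(env) / relative


if __name__ == "__main__":
    # Hypothetical config file name, used only for illustration.
    print(resolve("conf/app.properties"))
```

Moving the app to a different directory then only requires changing the value the startup script passes in.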

Tools, tools, tools

One of the critical operations responsibilities is providing and building tools for use by other groups such as technical support, development, QA etc. This goes beyond using tools for configuration management and deployment; it also means building tools that enable other groups to do their jobs more effectively. For instance, at one job we used to interface with hundreds of external LDAP/IMAP sources for authentication/authorization purposes. This was fraught with problems, since the people running these services would often misconfigure firewalls (not whitelist the right IP), have expired or self-signed SSL certificates, use the wrong LDAP base DNs etc. This would chew up a lot of professional services, dev and ops time, since looking at the application logs often gave incomplete answers. It could also take a couple of iterations to fix the problem, chewing up even more time. We ended up building a simple web page that enabled professional services to quickly validate the service, i.e. does DNS resolve, can I open a TCP connection to the target port, is the SSL certificate expired etc. This greatly reduced the workload and the time to resolution. At another job, technical support would often need production settings but, for compliance reasons, couldn't have unfettered access to the systems. For them we built a web app that gave a read-only view of the settings they needed. I'm sure you can think of other cases where a little automation can yield huge efficiencies.
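The three checks behind a validation page like the LDAP/IMAP one described above could look something like this. It is a rough sketch using only the Python standard library, not the tool we actually built; the function name and the result keys are made up for illustration.

```python
import socket
import ssl
import time


def check_endpoint(host, port, use_tls=True, timeout=5.0):
    """Run the checks in order and report how far we got."""
    result = {"dns": False, "tcp": False, "tls": None,
              "cert_days_left": None, "error": None}
    try:
        # 1. Does DNS resolve?
        socket.getaddrinfo(host, port)
        result["dns"] = True

        # 2. Can we open a TCP connection to the target port?
        with socket.create_connection((host, port), timeout=timeout) as sock:
            result["tcp"] = True
            if use_tls:
                result["tls"] = False  # flipped to True after the handshake
                # 3. The default context verifies the chain and hostname, so
                #    expired or self-signed certificates fail the handshake.
                context = ssl.create_default_context()
                with context.wrap_socket(sock, server_hostname=host) as tls:
                    not_after = tls.getpeercert()["notAfter"]
                    expires = ssl.cert_time_to_seconds(not_after)
                    result["tls"] = True
                    result["cert_days_left"] = int((expires - time.time()) // 86400)
    except OSError as exc:
        # DNS, connection, timeout and TLS errors are all OSError subclasses.
        result["error"] = f"{type(exc).__name__}: {exc}"
    return result
```

Calling check_endpoint("ldap.example.com", 636) (a placeholder host) would cover an LDAPS source, and port 993 an IMAPS one. This only handles services that speak TLS from the first byte; STARTTLS and checking the LDAP base DN would need protocol-specific code.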

Use underpowered QA environments

This may be controversial, since lots of people are of the opinion that QA should be as close to an exact replica of production as possible. That is true if you are doing performance tests. However, if you have an underpowered environment, some issues are likely to crop up that otherwise wouldn't. It is very hard to simulate production load, so underpowered environments give you valuable data points. For example, our primary QA environment ran on a couple of virtualized servers with a modest disk space allocation, i.e. 10 GB. On more than one occasion we caught serious code deficiencies when the growing query log (turned on in QA) triggered low disk space alerts. If we had had bigger disks we might have missed these. This doesn't preclude having a separate environment just for running performance tests; just use the underpowered environment for everything else.
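The low disk space alert doesn't need to be anything fancy. A minimal sketch of such a check follows; the 10% threshold and the function names are assumptions for illustration, not what our monitoring actually used.

```python
import shutil


def is_low(free: int, total: int, min_free_fraction: float = 0.10) -> bool:
    # True when the free share of the disk drops below the threshold.
    return total > 0 and free / total < min_free_fraction


def check_path(path: str, min_free_fraction: float = 0.10) -> bool:
    # shutil.disk_usage reports total, used and free bytes for the
    # filesystem that contains `path`.
    usage = shutil.disk_usage(path)
    return is_low(usage.free, usage.total, min_free_fraction)


if __name__ == "__main__":
    if check_path("/"):
        print("low disk space on /")
```

On a 10 GB disk this fires once less than 1 GB is free, which a growing query log can reach quickly.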

Dev vs ops

There is often conflict between dev and ops due to stereotypes and poor communication, but very often it comes down to misaligned business goals. For instance, I have frequently seen and experienced conflict with devs when they were under intense pressure to deliver a feature on a tight deadline. This often happens in startups that cater to large businesses, universities or government organizations, where a large sales deal is contingent on a particular feature being implemented. It leads to poor implementation, poor QA, production issues etc., which, coupled with a poor division of labor, causes frustration and resentment. Being woken up numerous times in the middle of the night due to a production issue quickly wears people out. Therefore it is important to strike a balance between ops and dev goals and the overall business goals.

One possible approach is to get together and discuss the following issues:

  • Ops, dev and QA should jointly assess new product functionality and how it affects each of these groups. Very often product management, sales and marketing will discuss new features only with dev, who may not appreciate the difficulty of certain ops decisions.
  • Division of responsibility - discuss whose responsibility it is to fix things when they break. There is a spectrum here: at one end ops does first-level troubleshooting and then hands off to developers; at the other end developers run and deploy in production, with ops in a supporting role running the services and tools that enable the application.
  • Off-hours coverage - this is probably the most contentious one since no one likes being woken up at night; however, developers should be on the hook for "pager duty". It doesn't have to be regularly, but at least once in a while. That is really the only way for them to walk in ops' shoes. For some organizations this may be a non-issue since their stuff never breaks in off hours ;-) .
  • Ops should involve devs in running production by educating them about the monitoring and performance gathering systems, so that they can see the effect of their coding first hand. For instance, you can implement "monitoring duty", where each week a different person from either the dev or the ops team is tasked with reviewing performance metrics and looking for things that are out of whack.
  • Discuss how you can make each other's lives easier. There are always areas where you can complement each other's skills and create something that helps everyone.
  • Most important, don't forget that a dose of humility goes a long way :-) .

Vonage the new Baby Bell

Thursday, May 13th, 2010

It is sometimes amazing to me how new upstarts morph into their own arch enemies. Case in point is Vonage. For years I had Vonage service at home as a backup phone service. I was on a 500 minute plan for $14.99+taxes. This was a great plan for me as I didn't use the phone much. However, at some point they decided that was too little money and hiked the price up to $16.99 (something like that). It may seem like a small difference but I figured I might be better off elsewhere. I ended up switching to Galaxy Voice, since they had more flexible calling plans, and I am using them to this day.

We recently expanded our office space and needed a phone line added to a conference room. Since I had my old Vonage adapter at home, I figured I would bring it in and we'd use it. I thought it would be as easy as going to Vonage's web site, supplying the phone adapter ID and my credit card number, and I would be set. It wasn't so. After entering the phone ID I got this message:

The MAC address you entered is associated with an existing Vonage account. Please call our Customer Care department at 1-866-293-5676 for immediate assistance.

I called the number and spoke to someone in customer service. This took about 20 minutes, during which the person kept re-asking for the same data before concluding that they couldn't help me and that I would have to talk to tech support. The tech support guy was equally unhelpful. Basically I could not activate a device that had ever been used before, since the system "knew" about it. Talk about having a piece of useless technological trash. At that point I was sufficiently frustrated to end the call. I tweeted about my experience, and a day later I was contacted by Vonage's Twitter team about having someone at customer service contact me. I thought I'd give it a go. I got a call, and this experience was not a whole lot better than the previous ones. The person kept asking me for my personal information, including name, billing address, the credit card number I had used for paying bills and the e-mail address I had used. Since this was more than a year ago and I have dozens of e-mail addresses, I said I couldn't remember. At that point I ended the call since I was sufficiently frustrated. I was willing to give these people money, yet they were making me jump through all these hoops. I don't get it.

It occurred to me later that this was very similar to the experiences I had with a local phone company whenever I moved and had to jump through all sorts of bureaucratic hoops to make sure all my features stayed the same afterwards.

Installing RedHat 6 Enterprise DomU under Xen

Tuesday, May 11th, 2010

Recently I downloaded the RedHat 6 Enterprise beta (RHEL6). I wanted to install it as a Xen guest (DomU) on top of an existing CentOS 5 Xen host. Unfortunately it did not work out of the box. I ran

virt-install --prompt

on the Xen host, which let me install RHEL6. However, when the installation rebooted I was greeted with this error message:

fs = fsimage.open(file, get_fs_offset(file))
IOError: [Errno 95] Operation not supported

Fortunately Karanbir Singh had a blog post about this at

http://www.karan.org/blog/index.php/2010/04/28/rhel6-xen-domu-on-a-centos-5-dom0

The differences I found were that I had to make the root partition an ext2 filesystem as well. I also found that I couldn't review the partition layout when I ran the installation in text mode; I had to use VNC to be able to set the proper partition types.
