Archive for the ‘Linux’ Category

Using Jenkins as a Cron Server

Monday, August 22nd, 2011

There are a number of problems with cron that cause system administrators lots of grief, the big ones being manageability, cron spam and auditability. To fix some of these issues I have lately started using Jenkins. Jenkins is an open source Continuous Integration server with lots of features that make it a great cron replacement for a number of uses. These are some of the problems it solves for me:


Auditability

Jenkins can be configured to retain logs of all the jobs it has run. You can set it up to keep, for example, the last 10 runs or only the last 2 weeks of logs. This is incredibly useful since jobs sometimes fail silently, so it pays to keep the output instead of sending it to /dev/null.

Centralized management

My most important jobs are now centralized. I can export all Jenkins jobs as XML and check them into a repository. If I need to execute a job on a remote host, I simply have Jenkins ssh to it and run the command remotely. Alternatively you can use Jenkins slaves.
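If you want to automate the "export and check in" part, a minimal sketch is below. It assumes the usual on-disk layout where each job lives in $JENKINS_HOME/jobs/<job name>/config.xml; the function name and the default paths are my own examples, so adjust them to your install.

```shell
#!/bin/bash
# Sketch: copy every Jenkins job definition into a directory kept under
# version control. Assumes the layout $JENKINS_HOME/jobs/<job>/config.xml.
# export_jenkins_jobs and the default paths are hypothetical examples.
export_jenkins_jobs() {
  local jenkins_home="${1:-/var/lib/jenkins}"   # typical for distro packages
  local dest="${2:-$HOME/jenkins-jobs}"         # hypothetical repository checkout
  local cfg job
  mkdir -p "$dest" || return 1
  for cfg in "$jenkins_home"/jobs/*/config.xml; do
    [ -e "$cfg" ] || continue                   # the glob matched nothing
    job=$(basename "$(dirname "$cfg")")         # directory name = job name
    mkdir -p "$dest/$job" && cp "$cfg" "$dest/$job/config.xml" || return 1
  done
}
```

For example, run export_jenkins_jobs /var/lib/jenkins /srv/jenkins-jobs from a nightly job and then git add -A && git commit inside /srv/jenkins-jobs (a hypothetical checkout). Copying only the config.xml files keeps build logs and workspaces out of the repository.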

Cron Spam

Cron spam is a common problem, and a number of tools exist to deal with it. To avoid it, I have Jenkins alert me only when a job fails, i.e. when it exits with a return code other than 0. In addition, you can use the awesome Jenkins Text Finder plugin, which lets you specify words or regular expressions to look for in the console output; a match can be used to mark the job unstable. For example, in the Text Finder configuration I checked

X Also search the console output

and specified

Regular expression ([Ee]rror*).*

This has saved our bacon: we were using a script that "swallowed" the error code from the mysqldump command and exited normally. Text Finder caught this:

mysqldump: Error 2020: Got packet bigger than 'max_allowed_packet' bytes when dumping table `users` at row: 234

Happily, we caught this one in time.
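If you're stuck with plain cron for some jobs, you can approximate the Text Finder check with grep. This is a minimal sketch, not how the plugin is implemented; check_console and the UNSTABLE/SUCCESS labels are hypothetical names, and the default pattern is the regular expression from the Text Finder configuration above.

```shell
#!/bin/bash
# Sketch of the idea behind the Text Finder check: scan a job's console log
# for an error pattern and flag the run even when the exit code was 0.
# check_console and the UNSTABLE/SUCCESS labels are hypothetical.
check_console() {
  local logfile="$1"
  local pattern="${2:-([Ee]rror*).*}"   # same expression as the Text Finder config
  if grep -Eq -- "$pattern" "$logfile"; then
    echo "UNSTABLE"
    return 1
  fi
  echo "SUCCESS"
}
```

A cron wrapper can redirect the job's output to a log file and then call check_console on it, so a script that swallows mysqldump's exit code still gets flagged.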

Job dependency

Often you will have job dependencies, e.g. a main backup job where you first dump a database locally and then upload it off-site or to the cloud. The way we have done this in the past is to leave a sufficiently large window between the first job and the next one to be sure the first job has finished. That says nothing about what to do if the first job fails; likely the second one will fail too. With Jenkins I no longer have to do that: I can simply tell Jenkins to trigger "backup to the cloud" once the local DB backup concludes successfully.
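For comparison, this is roughly the "only run the next step if the previous one succeeded" behaviour expressed in plain shell. It's a sketch under my own naming (run_chain is hypothetical), meant to show the dependency logic that the Jenkins trigger gives you for free, along with logs and notifications.

```shell
#!/bin/bash
# Sketch: run each step only if the previous one succeeded.
# run_chain STEP1 STEP2 ...: each step is a command string run with sh -c.
# Stops at the first failing step and returns its exit code.
run_chain() {
  local step rc
  for step in "$@"; do
    sh -c "$step"
    rc=$?
    if [ "$rc" -ne 0 ]; then
      echo "step failed (exit $rc): $step" >&2
      return "$rc"
    fi
  done
  return 0
}
```

For example, run_chain '/usr/local/bin/dump_db.sh' '/usr/local/bin/upload_backup.sh' (hypothetical script names) runs the upload only if the dump exited with 0. On a single crontab line, dump_db.sh && upload_backup.sh gives the same ordering, just without the history.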

Test immediately

While you are adding a job it's useful to test whether the job runs properly. With cron you often had to wait until the job executed at, e.g., 3 am to discover that PATH wasn't set properly or that there was some other problem with your environment. With Jenkins I can click Build Now and the job runs immediately.
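If some jobs still live in cron, you can at least reproduce cron's sparse environment before 3 am comes around. The sketch below assumes the defaults documented in crontab(5) for Vixie/ISC cron (SHELL=/bin/sh, PATH=/usr/bin:/bin); other cron implementations may set things up differently, and run_like_cron is a hypothetical name.

```shell
#!/bin/bash
# Sketch: run a command in an environment close to the one cron provides,
# so PATH problems show up now instead of at 3 am. The PATH and SHELL values
# are the Vixie/ISC cron defaults from crontab(5); other crons may differ.
run_like_cron() {
  # env -i drops everything inherited from the interactive shell
  env -i HOME="$HOME" LOGNAME="$(id -un)" SHELL=/bin/sh PATH=/usr/bin:/bin \
    /bin/sh -c "$1"
}
```

For example, run_like_cron '/usr/local/bin/backup.sh' (hypothetical path): if it fails here with "command not found", it would have failed under cron too.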

Easy setup

Setting up jobs is easy. Engineers set up their own jobs by copying an existing job and modifying it to do what they need. I don't remember the last time someone asked me how to do it :-).

What I don't use Jenkins for

I don't use Jenkins to run jobs that collect metrics or anything else that has to run very frequently.


Install OpenStack Nova easily using Chef and Nova-Solo

Wednesday, September 1st, 2010

Inspired by Cloudscaling's Swift-Solo, and excited about being able to create my own cloud, I am announcing the Nova-Solo project. OpenStack Nova is the compute portion of OpenStack, a project building an open source stack for running an Amazon EC2-type service. Nova-Solo is a set of Opscode Chef recipes that let you quickly get most parts of the Nova stack up and running. You can fetch it from GitHub at

At this time Nova-Solo targets Ubuntu 10.04 and relies on Soren Hansen's package repository for all the necessary packages. The following Nova services are installed:

  • Cloud controller
  • Object store
  • Volume store
  • API server
  • Compute Server

Soren's package archive is a bit outdated, so some things don't work. For example, you can create users, generate credentials and upload files into buckets, but you can't register an image. Soren has said he is building new packages, and I am doing the same, so hopefully things will improve quickly. The Nova code is definitely alpha quality, so beware. To get started, use git to clone the nova-solo repository and off you go:

git clone git://

In the future, as things stabilize, we'll make adjustments to support multiple compute servers (pieces for that are already in Nova-Solo), other distributions like RHEL/CentOS, etc.

Bootstrapping your cloud environment with puppet and mcollective

Wednesday, July 28th, 2010

This is a "recipe" for bootstrapping your whole environment in case of a disaster, e.g. your data center goes dark, or when you are migrating from one environment to another. This guide differs from others in that it uses mcollective and DNS to give you greater flexibility in deploying and bootstrapping environments. Alternatives include ec2-boot-init by R.I. Pienaar and Grig Gheorghiu's "Bootstrapping EC2 images as Puppet clients".


With two disk images, your code repository and your DB backup, you can rebuild your whole environment from scratch in a relatively short time. This could be adapted to generic cloud provisioning; however, the use case I'm trying to address is disaster recovery. We use DNS so that we can keep hostnames consistent between environments, i.e. mail01 will be the mail server in every environment instead of domU-1-2-3-4 in one, rack-2345 in another, etc.

Set up a master node image

The master node controls all the other nodes. Most importantly, it holds all your configuration management data. You will need to install the following:

  • mcollective with ActiveMQ
  • DnsMasq
  • Puppet from Puppet Labs

1. You will need a DNS name from a dynamic DNS provider such as DynDNS. Once you have it, write a shell script that runs at boot and maps that DNS name to your EC2 private IP. Let's say we want our controller station to be known as we can do something like this:

IP=`facter ipaddress`
# Delete any entries from hosts
sed -i "/" /etc/hosts
echo "${IP}" >> /etc/hosts

2. Set up ActiveMQ to be used with mcollective
3. Set up mcollective

Configure as the stomp host in both the client and the server mcollective configuration.

4. Install dnsmasq. You don't need to configure anything, since by default dnsmasq reads /etc/hosts and serves those names over DNS.

5. Install puppetmaster and configure it any way you want.

6. Image it
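The /etc/hosts update in step 1 can be written with the controller's name passed in as an argument. This is a minimal sketch; set_hosts_entry is a hypothetical helper, and the name you pass is whatever you registered with your dynamic DNS provider. It compares names exactly with awk instead of using them as a sed regular expression, so the dots in a hostname are not treated as wildcards.

```shell
#!/bin/bash
# Sketch: set_hosts_entry NAME IP [HOSTS_FILE]
# Drops every line whose last field is exactly NAME, then appends "IP NAME".
set_hosts_entry() {
  local name="$1" ip="$2" hosts="${3:-/etc/hosts}"
  local tmp
  tmp=$(mktemp) || return 1
  # Exact string comparison on the last field, no regex involved
  awk -v n="$name" '$NF != n' "$hosts" > "$tmp" || { rm -f "$tmp"; return 1; }
  printf '%s %s\n' "$ip" "$name" >> "$tmp"
  cat "$tmp" > "$hosts"    # rewrite in place, keeping the file's permissions
  rm -f "$tmp"
}
```

At boot you would call something like set_hosts_entry <your controller name> "$(facter ipaddress)"; dnsmasq serves the new entry the next time it reads /etc/hosts.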

Set up a generic/worker node image

You will need to install the following:

  • Mcollective
  • puppet agent

1. On the worker node you need to configure the server piece of mcollective and make sure its stomp host points to the master.

2. Create a reboot agent (we'll discuss how to use it later). Please visit for an example. Create a new file, e.g. reboot.rb, and paste this code into it:

module MCollective
  module Agent
    # SimpleRPC agent with a single "reboot" action
    class Reboot < RPC::Agent
      def reboot_action
        # Reboot the node immediately
        `/sbin/shutdown -r now`
      end
    end
  end
end

Copy the resulting file to the mcollective agent directory.

3. Add the following script to the boot sequence:

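MASTER=`host | grep address | cut -f4 -d" "`
IS_ALREADY_SET=`grep -c /etc/resolv.conf`
# Only edit resolv.conf if it has not been pointed at the master yet
if [ "$IS_ALREADY_SET" -lt 1 ]; then
  sed -i "s/^search .*/search" /etc/resolv.conf
  # Put the master's DNS (dnsmasq) in front of the existing nameservers
  sed -i "s/^nameserver/nameserver ${MASTER}\nnameserver/g" /etc/resolv.conf
fi
# Set hostname from the reverse DNS entry for this node's IP
IP=`facter ipaddress`
MY_HOST=`/bin/ipcalc --silent --hostname ${IP} | cut -f2 -d=`
hostname ${MY_HOST}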

This tells your worker nodes to use the controller's DNS for resolving names, and it sets the hostname from the reverse DNS lookup of the node's IP.

4. Get the mcollective puppet plugin from github

5. Image it
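The resolv.conf part of the boot script in step 3 can be sketched as a function that takes the master's IP. use_master_dns is a hypothetical name; unlike the script above it only handles the nameserver entry (not the search domain), and it puts the master first so the resolver asks dnsmasq before any other server.

```shell
#!/bin/bash
# Sketch: use_master_dns MASTER_IP [RESOLV_CONF]
# Prepends "nameserver MASTER_IP" unless that exact line is already present.
use_master_dns() {
  local master="$1" conf="${2:-/etc/resolv.conf}"
  local tmp
  if grep -qxF "nameserver $master" "$conf"; then
    return 0                      # already set, leave the file alone
  fi
  tmp=$(mktemp) || return 1
  { printf 'nameserver %s\n' "$master"; cat "$conf"; } > "$tmp" && cat "$tmp" > "$conf"
  rm -f "$tmp"
}
```

For example, call use_master_dns "$MASTER" right after MASTER has been looked up; running it a second time leaves the file unchanged.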

Bringing up the environment

You will need to start the master instance first since that's the instance that everyone will be talking to. As soon as it's up you can start up as many instances as you'd like.

While you wait, rsync your Puppet manifests and configuration to the master node.

To find out which nodes are up and available, run mc-ping on the master; you should get a response similar to this:

# mc-ping               time=77.21 ms
domu-12-31-55-11-22-18.compute-1.internal time=188.76 ms

The trouble is that the worker nodes' hostnames are set to Amazon's internal names. We want recognizable names, e.g. mail01.

To do so, simply add the IP of the worker instance and its name to /etc/hosts on the master, e.g.

echo "" >> /etc/hosts

Then reload the dnsmasq configuration:

/etc/init.d/dnsmasq reload

What this buys you is reverse DNS resolution for the node. For the new name to take effect you need to reboot the worker node, so that the boot script sets the hostname again. Since the reboot agent is already on the worker nodes, all we have to do is run the following command on the master node:

./mc-rpc -F hostname=domu-12-31-55-11-22-18 reboot reboot

This will seek out the domu-12-31-55-11-22-18 host and reboot it. Once the machine is back up it will advertise its new name :-), i.e. running mc-ping will show you this:

# mc-ping           time=47.59 ms               time=80.71 ms

Now let's activate Puppet. From the master node run:

# mc-puppetd -F hostname=mail01 runonce

 * [ ============================================================> ] 1 / 1

Finished processing 1 / 1 hosts in 1051.23 ms

Once that is done, puppetca should list the node's certificate request:

# puppetca --list

Sign it

# puppetca --sign

Now you can simply run

# mc-puppetd -F hostname=mail01 enable

and off you go. Now lather, rinse, repeat to get the rest of the instances going. You would certainly want to automate this further but I leave that exercise to you :-).

If you are looking for an easy cross-cloud API, check out my post "Provision to cloud in 5 minutes using fog".

Store your cron output for analysis and correlation with cronologger

Tuesday, July 6th, 2010

For the longest time I have wanted to get rid of the dozen or so cron messages I receive every morning about things like DB backups, DB cleanups/vacuums, reporting, etc. There are a number of solutions out there to help you manage cron spam, such as cronic, shush and cronwrap. They help by e-mailing you only if there is a problem, but they don't store the cron output itself. To get around that issue I have developed cronologger, which can be downloaded from

Cronologger is a BASH script that stores all cron output in a database. I am using CouchDB since it is a great document-oriented database that allows me to add attachments (blobs) to a document. I assume it would not be hard to adapt it to MongoDB, Riak and others.
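To give an idea of the approach, here is a minimal sketch in the same spirit (not cronologger's actual code): run the job, keep its combined output in a file and emit a JSON document with start and end times, duration and exit code. The function and field names are hypothetical.

```shell
#!/bin/bash
# Sketch of the cronologger idea (not its actual code). Field names are
# hypothetical.

# Escape backslashes and double quotes for a JSON string (single-line input).
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# run_and_record OUTFILE CMD [ARGS...]
# Runs CMD with stdout and stderr saved to OUTFILE, prints a JSON metadata
# document on stdout and returns CMD's exit code.
run_and_record() {
  local outfile="$1"; shift
  local start end rc
  start=$(date +%s)
  "$@" > "$outfile" 2>&1
  rc=$?
  end=$(date +%s)
  printf '{"host":"%s","command":"%s","start":%s,"end":%s,"duration":%s,"exit_code":%s}\n' \
    "$(json_escape "$(uname -n)")" "$(json_escape "$*")" \
    "$start" "$end" "$((end - start))" "$rc"
  return "$rc"
}
```

The document can then be stored through CouchDB's HTTP API, e.g. curl -X POST -H 'Content-Type: application/json' -d "$doc" http://localhost:5984/cronlog (the database name is an example), with the log file added to that document as an attachment.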

Some of the benefits of this utility are

  • Reduce cron spam
  • Provide the ability to correlate adverse effects by overlaying cron events on e.g. Ganglia graphs
  • Provide a better report of all the batch jobs that ran, diff them with past jobs if they should look the same, etc.
  • Provide the ability to easily view what is currently running across the whole infrastructure, i.e. jobs with job_duration < 0
  • Review historical output

I am still working on a web GUI for most of these things. I will gladly accept patches and new contributions.

Tip: To view a list of documents in a CouchDB database you can use the _utils web interface, e.g. http://localhost:5984/_utils/

Installing RedHat 6 Enterprise DomU under Xen

Tuesday, May 11th, 2010

Recently I downloaded the Red Hat Enterprise Linux 6 beta (RHEL6). I wanted to install it as a Xen guest (DomU) on top of an existing CentOS 5 Xen host. Unfortunately it did not work out of the box. I ran

virt-install --prompt

on the Xen host, which let me install RHEL6; however, when the newly installed guest rebooted I was greeted with this error message:

fs =, get_fs_offset(file))
IOError: [Errno 95] Operation not supported

Fortunately Karanbir Singh had a blog post about this at

The differences I found were that I had to make the root partition an ext2 filesystem as well, and that I couldn't review the partition layout when running the installation in text mode. I had to use VNC to be able to set the proper partition types.

Customizing iomega StorCenter ix4-200d with ipkg

Wednesday, April 28th, 2010

I have the iomega StorCenter ix4-200d. It is a nice little NAS with a number of decent features, including an rsync server. Unfortunately there were a couple of things I wanted to fix; for example, rsync was at version 2.6.9, which does not support incremental recursion (added in rsync 3.0). The machine runs a custom Linux distribution, so I figured someone must have worked out how to customize it. I found part of the answer here

To enable SSH, log in to your StorCenter as administrator and go to https://<storcenterIP>/support.html. Turn on SSH access; the StorCenter will reboot. After that you can ssh into the box as root, where the password is your admin password with soho prepended, i.e. if your web GUI password is secret then the root password is sohosecret.

That post has a way to bootstrap Debian on the box; however, I found an easier solution: the StorCenter ships with the ipkg utility, which is similar to apt-get and yum. To enable the proper repositories I searched and found them here

An easy way to add them is to cut and paste the following:

cat <<EOF > /etc/ipkg.conf
src cross
src native
EOF

Then type

ipkg update

After that you can check the list of available packages by typing

ipkg list | less

To install packages type

ipkg install <package_name>

Please note that packages are installed under /opt, so adjust your paths accordingly, e.g. screen is installed in


Hope this helps someone
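One more thing: since the binaries end up under /opt, you'll probably want those directories on your PATH. The sketch below assumes the usual Optware-style layout of /opt/bin and /opt/sbin, so check with ls /opt on your box first; add_opt_path is a hypothetical name.

```shell
#!/bin/sh
# Sketch: prepend the ipkg directories to PATH, skipping any already present.
# /opt/bin and /opt/sbin are assumptions based on the usual Optware layout.
add_opt_path() {
  for d in /opt/sbin /opt/bin; do
    case ":$PATH:" in
      *":$d:"*) ;;                 # already on PATH, nothing to do
      *) PATH="$d:$PATH" ;;
    esac
  done
  export PATH
}
```

Calling it from your shell profile puts the ipkg versions of tools such as rsync ahead of the stock ones, and calling it twice doesn't add duplicates.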