Quantcast

Archive for the ‘Uncategorized’ Category

Operating on Dell RAID arrays cheatsheet

Wednesday, November 23rd, 2011

I have to infrequently add new drives to Dell RAID arrays like H700. For some reason it takes me couple searches to find the info so here so I can find it later.

List all drives

/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL

Create a RAID array (e.g. RAID 0)

/opt/MegaRAID/MegaCli/MegaCli64 -CfgLdAdd -r0 [32:4, 32:5] -aALL

List working RAID arrays

/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL

Confirm you got the right RAID array e.g. Virtual Disk 1

/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -L1 -aALL

Delete RAID array

/opt/MegaRAID/MegaCli/MegaCli64 -CfgLdDel -L1 -aALL

				

Monitoring links and monitoring anti-patterns video

Sunday, June 5th, 2011

John Vincent aka. lusis has started an interesting conversation surrounding monitoring on Freenode on channel he named ##monitoringsucks. He has also done an awesome job of starting up a Github project of the same name that is shaping up to be a nice collection of links to tools and blog posts. Check it out

https://github.com/monitoringsucks/

Also we just got a hold of  the monitoring anti-patterns Ignite Talk from Devopsdays Boston by Alexis Lê-Quôc aka. @alq. It is a short video (5 minutes) so it's definitely worth seeing.

Misconceptions about RRD storage

Tuesday, December 14th, 2010

I want to address the misconceptions about RRD (Round-Robin Database) that seem to crop up often even among seasoned sysadmins. Complaints can be summarized with these two points

  • RRD doesn't offer high resolution ie. after about an hour it's all averages and I want to knows what was the metric value last year at this hour and minute
  • Data drops off/is destroyed after a year - I want to keep my data forever, disk is cheap etc.

Those are valid points however none of them are the fault of RRD. RRD is a circular buffer so in order to be able to write into it you have to precreate it (otherwise it wouldn't be a circular buffer :-) ). Obviously more data points you store bigger the RRD file will be. To illustrate the point Ganglia Monitoring uses following defaults to create RRDs

RRAs "RRA:AVERAGE:0.5:1:244" "RRA:AVERAGE:0.5:24:244" "RRA:AVERAGE:0.5:168:244" "RRA:AVERAGE:0.5:672:244" "RRA:AVERAGE:0.5:5760:374"

This will create multiple circular buffers within the same RRD database file. In order to make sense out of this you need to know what the polling interval is ie. how often do you write into RRDs. In Ganglia's case the default is 15 seconds so

  • "RRA:AVERAGE:0.5:1:244" says write actual values (:1:) for every polling interval. Save last 244 of those so in our case we'll have 61 minutes worth of actual data points. Since it's a circular buffer data older than 61 minutes will be "dropped"
  • "RRA:AVERAGE:0.5:24:244" says average 24 values (:24:), 24 * 15 seconds = 360 seconds = 6 minutes. 244 of those times 6 is a whole day
  • You can do the next two :-)
  • Last one "RRA:AVERAGE:0.5:5760:374" says average whole day (5760 * 15 seconds = 1440 minutes = 1 day) worth of values and store it in 374 points ie. little more than a year

When graphing RRDtool is smart enough to use the buffer which gives you the most data points. To store all this data RRD file will use about 12kBytes. Thus if you want higher resolution you will need to change the definition e.g. you could do this

"RRA:AVERAGE:0.5:1:2137440"

which will give you one year worth of data points with no averaging with 15 second interval. Trouble is the size of this RRD file is 17 Mbytes. This may not seem as bad but one of the RRD drawbacks is that every time you add data to an RRD the whole file is written over so if you have 1000 metrics you can be potentially writing 17 GBs of data every 15 seconds. This may be a problem depending how many metrics you are keeping track of. There are alternatives which increase throughput such as storing RRDs in RAMdisk or using rrdcached. Alternatively you can opt to keep 2 weeks worth of data points with e.g.

"RRA:AVERAGE:0.5:1:81984"

which will result in size of about 650 kBytes per RRD file. Or you can do something else altogether. Flip side of RRD is that there are no indexes to maintain, no tables that need to be rotated.

Update: I was wrong about the whole RRD file needing to be updated. In retrospect it makes sense and I apologize for providing the wrong info. You can read comment from Tobi Oetiker (creator of rrdtool) in comments below for more detail. This is actually awesome news since there is very little downside in making larger RRDs.

As far as Ganglia you can modify the defaults in /etc/ganglia/gmetad.conf file. You can also use gmetad-python which allows you to write your own plugins and store metric data in both RRD format, SQL or any other storage engine of your choice.

More on RRDtool can be found here

http://oss.oetiker.ch/rrdtool/tut/rrdtutorial.en.html

PHP 5.3 name spaces separator

Monday, August 16th, 2010

I am posting this to help others that may encounter a similar problem.

I have been doing some PHP development recently using Predis, a PHP Redis library. While instantiating the Redis\Client object I get

Warning: Unexpected character in input:  '\' (ASCII=92) state=1 in .....

Problem was explained in this issue

http://github.com/nrk/predis/issues/closed#issue/11

If you are still running on PHP 5.2 you should use the backported version of Predis and not the mainline library which targets only PHP >= 5.3 (the backslash is the namespace separator in PHP 5.3).

More discussion of this change can be found here.

http://giorgiosironi.blogspot.com/2009/09/introspection-of-php-namespaces.html

GangliaView – automatically rotate Ganglia metrics

Wednesday, June 16th, 2010

GangliaView is a simple web app that allows you to automatically rotate selected Ganglia metrics. We use it to rotate key metrics with large graphs showing last hour and last day and smaller graphs showing last week and last month. A sample screen looks like this

GangliaView is derived from CactiView with a number of changes to make it work with Ganglia and removal of frames. You can download it from here

http://github.com/vvuksan/ganglia-misc

Vonage the new Baby Bell

Thursday, May 13th, 2010

It is sometimes amazing to me how new upstarts morph into their own arch enemies. Case in point is Vonage. For years I used to have Vonage service at home as a backup phone service. I was on a 500 minute plan for $14.99+taxes. This was a great plan for me as I didn't use the phone much. However at some point they decided that was too little money and they hiked up the price to $16.99 (something like that). It may seem like a small difference but I figured I may be better of elsewhere. I ended up switching to Galaxy Voice which I am using to this day since they had more flexible calling plans.

We recently expanded our office space and we needed a phone line added to a conference room. Since I had my old Vonage adapter at home I figured I would bring it and we'd use it. I thought it would be as easy as going to Vonage's web site, supplying the phone adapter ID and my credit card number and I would be set. It wasn't so. After entering the phone ID I got this message

The MAC address you entered is associated with an existing Vonage account. Please call our Customer Care department at 1-866-293-5676 for immediate assistance.

I called the number and spoke to someone in Customer service. This took about 20 minutes while the person kept re-asking for the same data and concluded that they couldn't help me and that I would have to talk to tech support. Tech support guy was equally unhelpful. Basically I could not activate a device that was ever used before since the system "knew" about it. Talk about having a piece of useless technological trash. At that point I was sufficiently frustrated to end the call. I tweeted about my experience and a day later I was contacted by Vonage's Twitter team about having someone at customer service contact me. I thought I'd give it a go. I got a call and this experience was not a whole lot better than the previous ones. Person kept asking me for my personal information including name, billing address, what was the credit card number I used for paying bills and the e-mail address I used. Since this was more than a year ago and I have dozens of e-mail addresses I said I couldn't remember. At that point I ended the call since I was sufficiently frustrated. I was willing to give these people money yet they were making me jump through all this hoops. I don't get it.

It occurred to me later that this was very similar to experiences that I had with a local phone company when I would move and I would have to get through all these bureaucratic hoops to make sure all my features stayed the same after I moved.

Devops homebrew

Friday, April 9th, 2010

There has been quite a bit of discussion about Devops and what it means. @blueben has suggested we start a Devops patterns cookbook so people can learn what worked or didn't work. This is the description of the environment we implemented at a previous job. Some of these things may or may not work for you. I will try to keep it short.

Environment background

7 distinct applications/products that had to be deployed and tested ie. base/core application, messaging platform, reporting app etc. All applications were Java based running on either Tomcat or Jboss.

Application design for deployment

These are some of the key points

  1. Application should have a sane default configuration options. Any option should be overrideable by an external file. In most cases you only need to override database credentials (host, username, password). Goal is to be able to use the same binary across multiple environments.
  2. Application should expose key internal metrics. We for instance asked for a simple key/value pairs web page ie. JMSenqueue=OK etc. This is important because there are lots of things that can break inside the application which external monitoring may miss like JMS message can't be enqueued, etc.
  3. Keep release notes actions to a minimum. Release notes are often not followed or partially followed thus make sure point 1. is followed and/or try to automate everything else.

Continuous Integration

We used CruiseControl for Continuous Integration. It was used solely to make sure that someone didn't break the build.

Creating releases

Developers are in charge of building and packaging releases. This primarily because QA or Ops will not know what to do if a build fails (this is Java remember). Each release has to be clearly labeled with the version and tagged in the repository. For example Location 1.1.5 will be packaged as location-1.1.5.tar.gz. Archives should contain only WAR (Tomcat) or EAR (Jboss) files and DB patch files. Releases are to be deposited into an appropriate file share ie. /share/releases/location.

Deployment

In order to eliminate most manual deployment steps and support all the different applications we decided to write our own deployment tool. First we started off with a data model which roughly broke down to

  1. Applications – can use different app server containers ie. Tomcat/JBoss, may/will have configuration files that can be either key/value pairs or templates. For every application we also specified a start and stop script (hotdeploy was not an option due to bad experiences with our code).
  2. Domains/Customers – we wanted a single Dashboard that would allow us to deploy to multiple environments e.g. QA staging (current release), QA development (next scheduled release), Dev playbox, etc. Each of these domains had their own set of applications they could deploy with their own configuration options

First we wrote a command line tool that was capable of doing something like this

$ deployer –version 1.2.5 –server web10 –domain joedev –app base –action deploy 

What this would do is

  1. Find and unpack the proper app server container e.g. jboss-4.2.3.tar.gz
  2. Overlay WAR/EAR files for the name version e.g. base-1.2.5.tar.gz
  3. Build configuration files and scripts
  4. Stop the server on the remote box (if it's running)
  5. Rsync the contents of the packaged release
  6. Make sure Apache AJP proxy is configured to proxy traffic and do Apache reload
  7. Start up the server

One of the main reason we started off with a command line tool is that we could easily write batch scripts to upgrade whole set of machines. This was borne out of pain of having to upgrade 200 instances via a web GUI at another job.

Once deployer was working we wrote a web GUI that interfaced with it. You could do things like View running config (what config options are actually on the appserver), Stop, Restart, Deploy (particular version), Reconfig (apply config changes) and Undeploy. We also added the ability to change or add configuration options to the application specific override files. Picture is worth thousand words. This is a tiny snippet how it approximately looked for one domain

This was a big win since QA or developers no longer needed to have someone from ops deploy software.

DB patching

Another big win was "automated" DB patching. Every application would have a table called Patch with a list of DB patches that were already applied. We also agreed that every app would have dbpatches directory in the app archive which would contain a list of patches named with version and order in which they should be applied e.g.

  • 2.54.01-addUserColumn.sql
  • 2.54.02-dropUidColumn.sql

During deployment startup script would compare contents of the patch table and a list of dbpatches and apply any missing ones. If the patch script failed e-mail would be sent to the QA or dev in charge of particular domain.

A slightly modified process was used in production to try to reduce down time ie. things like adding a column could be done at any time. Automated process was largely there to make QA's job easier.

QA and testing

When a release was ready QA would deploy the release themselves. If there was a deployment problem they would attempt to troubleshoot it themselves then contact the appropriate person. Most of the times it was an app problem ie. particular library didn't get commited etc. This was a huge win since we avoided a lots of "waterfall" problems by allowing QA to self-service themselves.

Production

Production environment was strictly controlled. Only ops and couple key engineers had access to it. Reason was we tried to keep the environment as stable as possible. Thus ad hoc changes were frowned upon. If you needed to make a change you would either have to commit a change into the configuration management system (puppet) or use the deployment tool.

Production deployment

The day before the release QA would open up a ticket listing all the applications and versions that needed to be deployed. On the morning of the deployment (that was our low time) someone from ops, development and whole QA team engaged in deploying the app and resolving any observed issues.

Monitoring

Regular metrics such as CPU utilization, load etc. were collected. In addition we kept track of internal metrics and set up adequate alerts. This is an ongoing process since over time you discover what your key metrics are and what their thresholds are ie. number of threads, number of JDBC connections etc.

Things that didn't work so well or were challenging

  1. One of the toughest parts was getting developers' attention to add "goodies" for ops. Specifically exposing application internals was often put off until eventually we would have an outage and lack of having the metric resulted in extended outage.
  2. Deployment tool took couple tries to get right. Even as it was there were couple things I would have done differently ie. not relying on a relational database for the data model since it made it difficult to create diffs (you had to dump the whole DB). I'd likely go with JSON so that diffs could be easily reviewed and committed.
  3. Other issues I can't recall right now :-)

Wrapup

This is the shortest description I could write. There are a number of things I glossed over and omitted so that this is not too long. I may write about those on another occasion. Perhaps the key take away should be that Ops should focus on developing tools that either automate things or allow its customers (QA, dev, technical support, etc.) to self-service themselves.

Update: There is a second part to this posts

Devops religion wars

Tuesday, April 6th, 2010

I have been trying to stay out of the devops arguments but it seems that they are slowly devolving into religious wars. It seems that each group ie. devops and non-devops is convinced that they are in possession of "eternal self-evident truths" and that everyone else is unenlightened hater or similar.  Proof in point is following post

http://brian.moonspot.net/devops-dealnews

Brian describes their devops process which seems reasonable to me. What is most important is that it works for him, his group and his site.

Unfortunately comments devolve from there. A non-devops person raises a good point about the process however does it with poor style and insulting language. Response is to compare devops and non-devops approach with giving man a fish vs. teaching someone to fish. It goes from there. It's all just too silly. Firstly I am not aware of definite devops definition. Secondly every environment is different. What may work for you may not work everywhere else. I really doubt that continuous deployment would work if your web app was used in providing emergency medical care. That said things have changed and availability expectations have increased so cooperation between development and ops is critical. Therefore let's try to stop with the silly arguments and try to learn from each other. Most of all avoid insulting language. I realize we all get frustrated at times but it really devalues your view.

Password complexity madness

Friday, January 22nd, 2010

You know the pitch. Each time you create an account for a "secure" site you are forced to come up with a complex password ie. you need to have a number, a capitalized letter, perhaps a special character such as + or -. Trouble is policies differ so on one site password has to be a minimum length, maximum length, some don't allow special characters etc. The thing is at one point in time this made sense and was required to keep basic security but it may not make sense today.

Ages ago computer systems (in particular UNIX systems) used to store passwords in a hashed format (hash . You can read more on cryptographic hashes on Wikipedia. The trouble is that these hashes were available for any user to see ie. you could copy a password file (/etc/passwd) or use YP/NIS tools to get a list of all passwords in an organization. Once you have the password file you do not know what the passwords are however you can take a word dictionary start computing hashes since a particular password will always convert to the same hash and compare it if there are any matches in your password file. If you find a match you know have "discovered" users password. This is often referred to as off-line password cracking since it allows you derive passwords without interacting with the target system. This has many advantages since you can try millions of passwords quickly and the target system's administrator will not be alerted. Based on this fact password policies were instituted that mandated password complexity since passwords with complexity ie. 9pc_miu would be nearly impossible or very hard to break (it may take years to break it). This made sense then.

However it doesn't make much sense now since on most systems regular users have no access to the password hashes. On UNIX systems "shadow" (/etc/shadow) is used to hide them or you may be using LDAP which has the capability of hiding password hashes, etc. The only users that have access to those hashes are administrator however they have other ways of acquiring your passwords. Thus your real exposures in order of importance are

  • Trivial passwords or easily guessable password ie. 123456, 1234, date of birth
  • Using same password across different sites ie. this is a problem if e.g. site A.com gets hacked and hackers are able to determine your password and log into site B.com

I actually feel that password complexity breeds poor security since people will write down complex passwords instead of remembering them. Just remember how many times have you seen passwords on post-it notes on someone's monitor. Perhaps it is time to scrap the password complexity and use something simpler.

Keeping an eye on binary log growth

Thursday, October 1st, 2009

Recently I got a report that some pages on the site were extremely slow. Looking at the web server metrics didn't show anything new however mySQL DB metrics showed a definite change

MySQL server CPU utilization

MySQL server CPU utilization

ie. at the end of Week 38 there is an increase in CPU utilization. Nearly 60% increase. Interestingly enough there was a new software release at the end of Week 38 which pointed to either a bug or a new feature. Luckily I have been collecting mySQL metrics using this gmetric script. This led me to these two graphs

mysqlupdate

mysqlinsert

So nearly double number of inserts and nearly triple the updates. Using mysqlbinlog I analyzed the update and insert statements and was able to identify the two culprit INSERT and UPDATE statements then sent it off to developers.

I also observed that had I watched the binary log growth I may have identified this earlier since there were a lot more binary logs for the period since the release. Thus mysql average binary log growth rate gmetric was born :-) . Now all I need to do is find out what normal growth rate is and if it goes outside of that norm use Nagios to send me a non-urgent alert.


Switch to our mobile site