
Archive for the ‘Uncategorized’ Category

Software doesn’t run itself

Sunday, September 13th, 2009

Perhaps I should no longer be surprised, but I am, by the article mentioned in this blog post:

http://www.nakedcapitalism.com/2009/09/another-lehman-mess-no-one-can-run-the-software.html

In particular this:

Once it went bankrupt, the staff who supported these systems “evaporated”, according to Steven O’Hanlon, president of Numerix, a pricing and valuation company which is working with Lehman Brothers Holding Inc to unwind the derivatives portfolio.

These days computer systems are the lifeblood of your company, so allowing critical technical staff to simply "evaporate" is mind-boggling. Granted, the company imploded, but I would still think that someone should have figured out, going into bankruptcy, that they should set aside money to pay for their maintenance.

The ultimate problem, as pointed out in the blog post on Naked Capitalism, is that documentation is usually skimped on since it "doesn't provide value". I would also add that when people say "code is documented" they usually don't mention whether their systems infrastructure is documented. That can sometimes be an even bigger impediment. At a previous job there was a Perl CGI script that most people didn't know about and even fewer understood. If that script didn't work, our whole load balancing infrastructure would "mysteriously" fail, since app servers wouldn't register themselves with the web servers, leading to a full-blown outage. It was such an obscure "feature" that you could literally spend weeks chasing other avenues because the real cause was so non-obvious.

Also, I would not take comfort in having the source code to an application. A lot of customers of startups will write into their contracts that if the startup goes bust they get access to the source code. That may sound nice, but it doesn't mean you will necessarily be able to run it. There are so many "secret" recipes and undocumented workarounds involved in running most complex pieces of software that you should really be cautious.

In closing, if you care that your software runs, make sure you keep around at least a couple of folks who have run it.


Simple “web service” for Ganglia metrics

Friday, September 11th, 2009

Here is a simple PHP script that lets you get current Ganglia metrics. You will need a Ganglia web installation. Drop this script somewhere on that web server, then invoke it via e.g.

http://mygangliaserver/ganglia/metric.php?server=web1&metric_name=load_one

Here server is the name of the server for which you want metrics and metric_name is the exact name of the metric you are looking for, e.g. load_one, disk_free etc. The only thing returned is either an ERROR message or the actual value.

<?php

# Path to the Ganglia web installation
$GANGLIA_WEB = "/var/www/html/ganglia";

# Both parameters are required and must be plain, non-empty strings
foreach ( array('server', 'metric_name') as $param ) {
    if ( !isset($_GET[$param]) || !is_string($_GET[$param]) || $_GET[$param] === '' ) {
        exit("ERROR: Missing parameter $param");
    }
}

include_once "$GANGLIA_WEB/conf.php";
include_once "$GANGLIA_WEB/get_context.php";
# Set up for cluster summary
$context = "cluster";
include_once "$GANGLIA_WEB/functions.php";
include_once "$GANGLIA_WEB/ganglia.php";
include_once "$GANGLIA_WEB/get_ganglia.php";

# Find the FQDN of the supplied server name among all known hosts.
# This is a substring match, so "web1" also matches "web10";
# the first match wins.
$fqdn = null;
foreach ( array_keys($metrics) as $host ) {
    if ( strpos($host, $_GET['server']) !== false ) {
        $fqdn = $host;
        break;
    }
}

if ( $fqdn !== null ) {
    if ( isset($metrics[$fqdn][$_GET['metric_name']]['VAL']) ) {
        echo $metrics[$fqdn][$_GET['metric_name']]['VAL'];
    } else {
        echo "ERROR: Metric value not found";
    }
} else {
    echo "ERROR: Host not found";
}

?>

Nothing fancy. It contains only rudimentary error checking, so please be gentle :-) . Feel free to extend it to satisfy your needs. Also, this is likely not scalable if you have hundreds of hosts and tons of requests.
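On the consuming side, a small client only has to build the URL and tell an ERROR message apart from a value. Here is a minimal Python sketch of that. It assumes the script is reachable at the example URL above; the function names (build_url, parse_response, get_metric) are made up for illustration and are not part of Ganglia.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed location of the script; adjust to wherever metric.php was dropped.
BASE_URL = "http://mygangliaserver/ganglia/metric.php"


def build_url(server, metric_name, base_url=BASE_URL):
    """Build the request URL with properly escaped query parameters."""
    query = urlencode({"server": server, "metric_name": metric_name})
    return base_url + "?" + query


def parse_response(body):
    """Return the metric value as a string, or raise on an ERROR reply."""
    text = body.strip()
    if not text or text.startswith("ERROR"):
        raise ValueError(text or "empty response")
    return text


def get_metric(server, metric_name, base_url=BASE_URL, timeout=5):
    """Fetch a single metric value from the web service (hypothetical helper)."""
    url = build_url(server, metric_name, base_url)
    with urlopen(url, timeout=timeout) as resp:
        return parse_response(resp.read().decode("utf-8", errors="replace"))
```

Calling get_metric("web1", "load_one") would then return the current value as a string, or raise ValueError carrying the ERROR text. The value is left as a string on purpose since not every Ganglia metric is numeric.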

Broken hostname resolution and PAM don’t mix

Wednesday, September 9th, 2009

I don't mean PAM the cooking spray but Pluggable Authentication Modules. I was asked to change some DNS settings for a set of hosts, i.e. move them from one domain to another, e.g. from domain.com to domain.net. At the end of the process the head node all of a sudden started refusing logins with the following error message

fatal: Access denied for user vvuksan by PAM account configuration

It took some hair pulling, but after a while I concluded that the head node's hostname was still set to the old name, e.g. server5.domain.com, which was no longer resolvable. As soon as the hostname was changed, i.e.

% hostname server5.domain.net

things automagically started working again. Hope this prevents someone from going bald :-) .
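A quick sanity check after a DNS move like this is to ask whether the machine's own hostname still resolves. Below is a minimal Python sketch of that check. The function name hostname_resolves and the injectable resolver argument are my own choices for illustration, and a successful lookup does not by itself prove that PAM is happy.

```python
import socket


def hostname_resolves(name=None, resolver=socket.gethostbyname):
    """Return True if the given name (default: this machine's hostname) resolves."""
    name = name or socket.gethostname()
    try:
        resolver(name)
        return True
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError
        return False
```

Running hostname_resolves() on each host right after the change would flag machines whose hostname still points at the old domain.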

Cloud computing’s Achilles Heel

Tuesday, September 1st, 2009

I have touched upon this issue before; however, here are some illustrations of what I think is cloud computing's Achilles heel. It has to do with shared hardware and virtualization. In my case I have a Drupal site running in a Xen guest on top of a Xen host. For whatever reason, while being indexed by a Google bot, Apache went "crazy", allocating tons and tons of memory and swapping heavily. At this point the Xen guest is nearly unusable since the load is close to 100.

xen-guest

Now let's look at what is happening to the underlying Xen host, i.e. the one that runs the Xen guest

xen-host

Yikes. If you had another instance on this particular Xen host, you can bet that instance would be severely affected. The trouble is that you may not really be aware of it, since you do not have access to the underlying hardware. You may be scratching your head over why all of a sudden you are getting subpar performance. Also, if you are a cloud provider, how do you deal with situations like this? Do you simply shut down machines that exceed certain performance thresholds? What if this happens to be a production database server which is doing a database dump and should be "allowed" to thrash the disk? What if you shut it down and corrupt the customer's database? It gets real tricky real quick.

Also, forget about oversubscription. You need only one poorly behaving guest to ruin it for everyone else, and the more you oversubscribe, the greater the risk of performance degradation.

Trouble with cloud computing

Sunday, August 23rd, 2009

While we are on the subject of cloud computing, the real problem with it is that, rightfully or not, it has been portrayed as the computing infrastructure "savior". Just check out the description on Wikipedia (and I am not really picking on Wikipedia, just using it as a representative quote)

Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet. Users need not have knowledge of, expertise in, or control over the technology infrastructure in the "cloud" that supports them.

Now that you have heard that kind of "pitch", it is hardly surprising to hear what a great idea it would be to move everything off to the cloud and thus avoid any capital expenses on equipment, avoid having to maintain the hardware, get "unlimited" scalability etc. Trouble is that this is only a small piece of the whole infrastructure puzzle. In reality the only thing that clouds allow me to do is easily "create" and "destroy" hardware (I guess they abstract hardware). That is certainly a nice feature and no doubt has some value. However, clouds don't "automatically" scale (they need some type of middleware to do that), they don't automatically configure themselves (configuration management or manual work does that), nor do they automatically monitor or alert on stuff in YOUR application. You have to do that. Lots of it.

This actually reminds me of the "managed hosting" pitch a lot of colocation providers will try to sell you on. In some cases they would scoff at the fact that you just wanted plain Jane colocation, i.e. a cabinet, some amount of power, dual network drops etc. No, they wanted to sell you on stuff like managing your OS and updates, doing backups etc. so that you can spend your time on more "productive" tasks. That is all nice, but it provides little to no value to me. I can install an OS and all the updates through a mixture of PXE boot, console access and configuration management in less than 10 minutes, and I know exactly what needs to be backed up. Do I really need /usr/ backed up on every web machine? No.

In closing, while evaluating cloud computing make sure you look hard at what problem you are really trying to solve. Clouds do not in themselves provide you with magical ponies. You still have to do most of the work.

Cloud stress or why computing clouds are not for everyone

Friday, August 21st, 2009

Yesterday Slashdot featured a story about a study conducted to evaluate the response times of the major cloud infrastructure providers

http://tech.slashdot.org/story/09/08/20/0327205/Amazon-MS-Google-Clouds-Flop-In-Stress-Tests

One of the main findings was that "Response times on the service also varied by a factor of twenty depending on the time of day the services were accessed".

Unfortunately this is not a surprise to me. One of the main issues with shared infrastructure is, well ... sharing. There will always be a user or a couple of users who will, for one reason or another, use the infrastructure inefficiently, and this will end up degrading everyone's performance. For example, you may have a shared database machine and a user who decides to do full backups daily. Guess what: while those backups are running, your other users will be severely impacted.

Things are even more complicated in the cloud, since you are usually running a virtualized instance which shares a piece of physical hardware with other virtualized instances. As such you have very little insight into what the other instances are doing, and they may be doing a lot to degrade your performance. Even though most virtualization technologies promise isolation, i.e. control over how much I/O or CPU a particular instance gets, practice is different. For instance, I run a number of Xen hosts/guests and I can see that if a particular Xen guest goes crazy, i.e. starts thrashing the disk, all the other Xen guests start "seeing" higher CPU I/O wait.

This leads me to a story of sorts. Some time ago I signed up for service from an inexpensive VM vendor (we're talking $10-20/month cheap) so I could run my own web server and mail. The machine was excruciatingly slow most of the time, so slow that typing commands at the prompt took a couple of seconds, yet I wasn't running anything on it. After I installed Ganglia I noticed that CPU I/O wait was about 10% most of the time and the one-minute load average was around 4. Remember, I hadn't even installed anything else on this machine. They moved me to a different machine but the same thing happened, so I cancelled the service. The company was obviously oversubscribing its machines or it had lots and lots of "abusers".
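If you don't have Ganglia handy, the same I/O wait figure can be approximated from inside a Linux guest by sampling the aggregate cpu line in /proc/stat twice and comparing the counters. The Python sketch below does that. The helper names are made up for illustration, and the steal column (time the hypervisor spent running something else) is only reported by some kernel and hypervisor combinations, so it is treated as 0 when missing.

```python
import time

# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
# The later guest counters are skipped because they are already part of user/nice.
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def cpu_times(stat_text):
    """Parse the aggregate 'cpu' line of /proc/stat text into a dict of counters."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            # Older kernels report fewer columns; treat the missing ones as 0.
            values += [0] * (len(FIELDS) - len(values))
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate cpu line found")


def percent_between(before, after, field):
    """Share of elapsed CPU time (in percent) spent in `field` between two samples."""
    total = sum(after.values()) - sum(before.values())
    if total <= 0:
        return 0.0
    return 100.0 * (after[field] - before[field]) / total


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart, and return the iowait percentage."""
    with open(path) as f:
        before = cpu_times(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = cpu_times(f.read())
    return percent_between(before, after, "iowait")
```

On an idle guest, a sample_iowait() result that stays around 10% would match the symptom described above. Asking percent_between for "steal" instead gives a rough idea of how much CPU time the neighbors are taking.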

I am not trying to say that clouds are useless, but for years (since EC2 was in beta) I have heard a lot of preposterous claims about clouds, even to the point where it was suggested we should run a backup data center on EC2 since it was "cheaper". Even if we could get past the security concerns, i.e. not being able to run VLANs, having your traffic cross a shared bridged network interface etc., I just don't see how, if you need any type of reliable performance, you can rely on clouds to deliver it. Sure, you could try to get clever with the load balancer, but there is always the potential that a set of your visitors will end up on a web server that is affected by someone else's process or, worse, that all of a sudden your site is terribly slow and there is literally no explanation for it. Try to explain that to your boss :-) .

That said, there are obviously cases where clouds could be great, e.g. when you need to scale quickly from, let's say, a handful of machines to dozens of machines and then dispose of them when you are done. There are likely other scenarios, but you really have to evaluate it application by application.

Radio 101 Zagreb Podcasting

Tuesday, August 11th, 2009

Hijacking this blog for some musings (originally in Croatian) :-)

A couple of weeks ago I put up, at

http://radio101.podzone.net/

recordings of the news programs of Zagreb's Radio 101. I did it because I was fed up that nothing of the sort had been done until then. As far as I am concerned that is a great pity, since back in 1996, after the attempt to shut the Radio down, Damir Džeko and I started a Radio 101 podcast. It was probably one of the first podcasts on the Web. Unfortunately I don't have a copy of the site from that time, but here is an archive from archive.org from 1998

http://web.archive.org/web/19980113130105/vukovar.unm.edu/r101/audio/

On several occasions I talked, unsuccessfully, with people at the Radio about them taking over the recording and archiving of such podcasts, but that obviously did not yield results. That is not so important now, though. What matters is what we do in the future. Radio 101 is a cult institution, and the Radio's news programs are, in my opinion, part of Croatia's cultural heritage and as such should be preserved for future generations. That may sound silly to some, but how else do you convey to future generations what was happening in the early 2000s if not through audio/visual recordings? Reading dry news is one thing, but listening to shows like Parliament Show or Speaker's Corner, where ordinary people voice their troubles and their opinions, is another. I also don't know whether anyone can really convey Aktualac or Uj Fuj :) .

A couple of weeks ago I sent an e-mail to Silvije Vrbanac, since he knows me at least somewhat, and got a reply that podcasting is being worked on. Unfortunately I have heard that story before, so I would like to somehow apply pressure to make it happen as soon as possible. After all, I put together my own podcasting setup, available at http://radio101.podzone.net/, in less than 3 hours. Bravo me :) . If it isn't hard, let's get it done.

Once that is done, it would be wise to arrange archiving of the material with, for example, the National and University Library, so that someone other than the Radio is also looking after it.

