Monday, February 26, 2007

A Few Notes On Quality

James McGovern talks about quality in a recent post. In the shadow of six-sigma, TQM, ISO 9xxx and CMMI, we need to remind ourselves that the pursuit of quality has led some to the brink of insanity. Perhaps it's the methods we have been cajoled into employing for measuring quality that are insane.

Feigenbaum, Deming and their contemporaries were the first to apply statistical analysis to quality management in the 1930s. In Total Quality Control, Feigenbaum states, “In the phrase ‘quality-control’, the word quality does not have the popular meaning of ‘best’ in any absolute sense. To industry, it means ‘best for certain customer conditions’.” Perfection can be measured because there must be one, ultimate example of the desired product. The existence of an exemplar is the basis for six-sigma and all the other quality control methods in industry. However, this poses a bit of a problem for software development.

How so? Well, in most cases the goal of software development is to build a single instance of an exemplar! Using methods designed to measure defects or deviations from an exemplar in order to create that exemplar seems a little unrealistic. Does that mean we fall back to "best for certain customer conditions"? Of course, this raises the question: what is quality and how is quality measured?

This very question vexed Robert M. Pirsig so much that he was eventually incarcerated in a psychiatric facility for a number of years. In his world-famous book, Zen and the Art of Motorcycle Maintenance: An Inquiry into Values, Pirsig writes, “Quality … you know what it is, yet you don’t know what it is. But that’s self-contradictory. But some things are better than others, that is, they have more quality. But when you try to say what the quality is, apart from the things that have it, it all goes poof! There’s nothing to talk about. But if you can’t say what Quality is, how do you know what it is, or how do you even know that it even exists?” Pirsig’s approach was to establish a stable system. Using one of his college classes as an experiment in quality, he informed his students that he would be withholding letter grades just in their class until the end of the semester. “Then a hoped-for phenomenon began. During the third or fourth week some of the A students began to get nervous and started to turn in superb work and hang around after class with questions that fished for some indication as to how they were doing. The B and high-C students began to notice this and work a little and bring up the quality of their papers to a more usual level. The low C, D and future Fs began to show up for class just to see what was going on.” Pirsig had hoped and predicted that his students knew exactly what quality was and, without any measure of their progress whatsoever, would know exactly what they had to do to improve the quality of their work. He then moves into a long dialog about the philosophy of quality and how quality is defined and recognized. However, he does not manage to find how quality can be measured.

I think we all know when a piece of software is crap. Instead of exerting effort devising methods to measure quality, enforce quality, etc., why can't we just spend all that superfluous time and effort preventing problems in the first place and fixing them when they occur?

Monday, February 12, 2007

Open Source Software Policy

Earlier, I posted about the open source software policy I developed for my employer. Optaros, Inc. has a similar policy which it has published for the benefit of us all.

Lifecycle or Immortality

Vilas talks about exceptions to the enterprise architecture as slums. I have to agree. In fact, as EAs, we need to plan for their emergence. Let's face it: people will find ways to work around processes, rules and guidelines if enough money is at stake and, frankly, that's the way it should be. Our enterprises exist to make money and, aside from regulations, everything that does not support that goal is fluff. There is a large difference between governance and policing. Too often, we forget this and start EA practices from the point of enforcement rather than guidance and containment. The best way to manage deviations from the grand plan is to anticipate them by providing a lightweight process for granting architectural waivers.

Let's revisit the SDLC acronym. The last half is Life Cycle. Death is a part of the Life Cycle. Yes, there is an old maxim that says "software lives forever" and there is truth to that; that is also the kind of immortality that needs to be well-defined within an enterprise architecture. Therefore, don't make slums immortal. Give them a lifespan. Why don't we want slums around in the first place? Well, they are difficult to govern and that leads to expensive maintenance. It should be relatively easy to show the long-term maintenance costs for a slum in its business plan. At some point, the system will either need to be turned off because of TCO or be refactored into the EA to lower its cost of operations.

When designing the process for granting a waiver, ensure that there is a Retire or Re-factor clause in the plan stating either a date the software is to be removed from production or the date re-factoring is to begin. Clever designers will keep the existing EA as a constraint when building the slum so that when it comes time to retire or re-factor, the cost of doing so is as low as possible.
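A waiver with a Retire or Re-factor clause can be captured as a small record that refuses to exist without a deadline. The following is only a minimal sketch: the class name, field names and validation rules are all hypothetical, and it reads "either ... or" as meaning exactly one of the two dates is set.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ArchitecturalWaiver:
    """Hypothetical waiver record; every name here is an illustrative assumption."""

    system_name: str
    granted_on: date
    retire_by: Optional[date] = None           # date the software leaves production
    refactor_starts_on: Optional[date] = None  # date re-factoring into the EA begins

    def __post_init__(self) -> None:
        # Retire or Re-factor clause: exactly one of the two dates must be set.
        if (self.retire_by is None) == (self.refactor_starts_on is None):
            raise ValueError("set either a retire date or a re-factor start date")
        if self.deadline() <= self.granted_on:
            raise ValueError("the deadline must fall after the grant date")

    def deadline(self) -> date:
        """Return whichever of the two clause dates was supplied."""
        return self.retire_by or self.refactor_starts_on

    def is_overdue(self, today: date) -> bool:
        """True once the slum has outlived the lifespan it was granted."""
        return today > self.deadline()


# Example: a one-year waiver for a made-up system.
waiver = ArchitecturalWaiver(
    system_name="reporting-slum",
    granted_on=date(2007, 2, 12),
    retire_by=date(2008, 2, 12),
)
```

Making the deadline mandatory at creation time means an immortal slum cannot be recorded by accident; someone has to extend the waiver on purpose.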

Friday, February 09, 2007

Wishlist Item!!

Now this is a 16-bit processor I would just love to have!

Latency and Design

Once again, we are reminded that bandwidth can be bought but latency lives forever. I think that over the past 10 years or so, we have largely ignored this axiom or, worse, never really learned the difference. In an attempt to simplify our architectures, we have added layers and have relied on physical deployment to keep the architecture neat and tidy. Separation of concerns was enforced by the wire: client-server, 3-tier, n-tier, etc. Worse, each box represented one application or sub-component of a single application. This approach has been hawked as a viable pattern for a distributed application. Well, it's not; it's all about distributing an application. There is a difference and yet we still haven't learned much about how to design and build distributed applications. Why is this an issue? Well, it's not until we start running out of room in the datacenter or we actually want to get the most out of our desk/laptop with a multi-core processor. See where I'm going with this?

Consider a case where we have Photoshop, a 4-core CPU and a very large photo. Which approach is going to get the job done with the least latency:

  1. 4 copies of Photoshop processing 1 photo each,
  2. 4 copies of Photoshop, each processing 1/4 of the photo,
  3. 1 copy of Photoshop sending 1/4 of the task in parallel to each core?

OK, that's a bit of an unfair question. Let me rephrase: which approach will have the most latency? If you answered #1, you are most likely right. So, why do we continue to build web and enterprise applications like this when time is money? That, by the way, is an entirely fair question and I'm sure your CFO would like to know the answer.
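To put rough numbers on the three approaches, here is a back-of-the-envelope model. It is only a sketch under loud assumptions: the work divides evenly across cores, the 40-second photo and the overhead figures are made up, and the function names are hypothetical.

```python
def latency_whole_photo(photo_seconds: float) -> float:
    """Approach 1: each copy works through a whole photo on a single core."""
    return photo_seconds


def latency_split_photo(photo_seconds: float, cores: int, overhead_seconds: float) -> float:
    """Approaches 2 and 3: the photo is divided across cores.

    overhead_seconds stands in for splitting, merging and coordination;
    it is an assumed figure, not a measurement.
    """
    return photo_seconds / cores + overhead_seconds


PHOTO_SECONDS = 40.0  # assumed single-core time for the very large photo
CORES = 4

# Approach 1: the large photo still waits for one core to finish all of it.
approach_1 = latency_whole_photo(PHOTO_SECONDS)

# Approach 2: four separate copies, assumed to pay more to split and merge.
approach_2 = latency_split_photo(PHOTO_SECONDS, CORES, overhead_seconds=4.0)

# Approach 3: one copy fanning out to the cores, assumed to pay less.
approach_3 = latency_split_photo(PHOTO_SECONDS, CORES, overhead_seconds=1.0)

for name, seconds in [("#1", approach_1), ("#2", approach_2), ("#3", approach_3)]:
    print(f"approach {name}: {seconds:.1f} s")
```

Approach 1 may well finish four photos in the same 40 seconds, which is good throughput, but the person waiting on one large photo sees no improvement at all.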

Saturday, February 03, 2007

Project Black Box

While at Sun Labs last week, I had an opportunity to see Project Black Box. In a nutshell, the idea is to take a standard 8'x20' ISO shipping container, put doors on both ends and re-model the interior, turning the container into a portable data center. It's an interesting concept that I'm sure has been knocked about for several years, and I recall several companies have created mobile data centers, but this one is a bit different.

The most interesting aspect of the Black Box is the potential compute and storage density. While there are 8 standard 40U racks, 7 are available for customer use. After taking away space for cables, etc., that leaves 7 racks with 38U of space each. Plus, the racks slide out sideways into the central aisle, allowing simultaneous access to the front and rear of the rack from each end of the service aisle. Combined with the well-designed cooling and air circulation system, the same equipment takes about 30% less floor space than it would in a typical data center. Once A/C, raised floor, and structure are factored in, the savings are even more compelling.
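The rack figures above work out as follows. This is just the arithmetic from the post; the per-square-foot number assumes the nominal 8'x20' exterior footprint, which ignores wall thickness and any clearance needed around the container.

```python
CUSTOMER_RACKS = 7        # racks available for customer use
USABLE_U_PER_RACK = 38    # rack units left per rack after cabling, etc.
CONTAINER_WIDTH_FT = 8    # nominal exterior width
CONTAINER_LENGTH_FT = 20  # nominal exterior length

usable_rack_units = CUSTOMER_RACKS * USABLE_U_PER_RACK        # 266 U
footprint_sq_ft = CONTAINER_WIDTH_FT * CONTAINER_LENGTH_FT    # 160 sq ft
rack_units_per_sq_ft = usable_rack_units / footprint_sq_ft

print(f"{usable_rack_units} U in {footprint_sq_ft} sq ft "
      f"({rack_units_per_sq_ft:.2f} U per sq ft)")
```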

Sun plans to sell the unequipped box for $300K-$400K; the servers and storage are extra. The customer needs to supply water, power and network access. It will be interesting to see what kinds of new business concepts the Black Box will enable.