Following is an overview of root-cause analysis, in the manner of a fable ... with a time machine! “Root cause analysis” is a common term in industry and in news articles like the recent one about the origins of cracks in a shield building at the Davis-Besse nuclear power reactor, but it's impossible to summarize in a few sentences, so most writers just assume that readers know the method. In fact, root cause analysis is a whole array of methods, developed over decades by many people.
And now for "The Traveler's Tale," Part I.
MEMO
From: Richard C. Asplundh, Staff Engineer, Rapid Prototyping Div.
To: Boss
Re: Mixed Results with Prototype X-1A Time Machine
Date: September 4, 1877
Via: Catamount Brand Whiskey bottle
Well, boss, the new time machine works! In one direction, anyway! I'm leaving my progress report in a bottle and burying it where I hope someone will find it in your time zone and send it to the home office. Meanwhile, I'm staying busy back here and trusting that you still have the meter running, paycheck-wise.
Remember how you asked me to do just one little test run before lunch with the X-1A? As in, “Ricky, old boy, how about you go backward just a few minutes, and try out the Back-to-Base Homing Mode?”
I'm here to say the homing mode doesn't work yet. Also, you need to tell the lab rats that they miscalibrated the ChronoCounter, because that "couple of minutes" they dialed in was considerably more than a century.
Fortunately, because of my deep training in all forms of Root Cause Analysis, I've been able to make myself useful back here, pending some help from your direction.
Before I get down to details, here's the view from 40-K feet: I get a job in a hard-rock gold mine. I immediately learn that the boys at the Acme Mine are smack in the middle of an all-out business crisis. But the crisis can't withstand my root-cause skills for long, and that's me without my laptop, or my notebook, or my stack of proprietary software. As I always like to say, root cause analysis is more headware than hardware. Me and my rough-hewn buddies get things sorted out, against all odds.
The big adventure starts like this: There's a haze, I feel dizzy, then find myself in a mountain forest.
The machine's control panel says it's just a few minutes before I
left, but I can see that this doesn't look like the inside of our
company's Southwest South Dakota Warehouse at all. I wait a day in case
you might send a rescue squad from the future, but no dice, so I
decide to trudge off and meet the natives, whoever they may be.
I have to say it's pretty exciting to go off exploring when you don't know if you're fifty years off the beam or fifty thousand. But I find a grubby town and get my bearings: I've touched down outside of Deadwood, Dakota Territory. It's August 16, 1877, about three years into the Black Hills gold rush.
Nothing too glamorous about this side of the Old West: unpainted slab-sided buildings up high, and mud down low. I head for a “help wanted” sign, walk inside, and a thin geezer with a visor says they need a man to tend the couple of dozen Missouri mules that live in a stable at the bottom of the Acme Mine.
Graybeard identifies himself as Too-Tall Johanson. He says that mules pull the ore cars from the heading down a little iron track, back to the main shaft, where a steam engine drags the rock to the surface. The mules live down there, never seeing daylight … like our IT people.
I tell the guy behind the counter that I don't know one end of a mule from the other. But Too-Tall hires me for two dollars a day, hands me a shovel, and says I'll pick it up. “The way things are going at this mine, it won't be for long anyway,” Too-Tall says. This is about when my fact-finding antennae go into hyperactivity.
“The owners in Frisco are about to close us down and everything else is going wrong too!” says he. “Cain't hardly understand it!” I clap Too-Tall on the shoulder and tell him that help has arrived from an unexpected direction. He shakes his head and I go off to grab some worn-out old miner togs from a heap in the back, which feel like they were hacked out of old pine shingles. I buy a carbide lamp on credit at the company store, and down into the dark I go.
After a week I switch to the midnight shift. That way, I can give a few pointers to the mine's baseball team at batting practice after dinner. The team is in the cellar, with 43 losses and 26 wins.
The mine is in even worse shape.
According to the boys, things seemed to go south all of a sudden.
Starting about two weeks before I drop in, gold-ore production
took a nosedive first in quantity and then quality. And all of a
sudden there were new, weird problems nobody had seen before. Miners
are a superstitious bunch and morale took a tumble.
I find out why nobody wanted the mule-tending job: there was an unexplained explosion in the mule stable a week back, and it has the whole crew spooked that another mule is about to blow up. Meanwhile, all the mules have belly aches and make a lot of noise. We are buying gallons of Brother Jubal's patent medicine and mixing it in the water trough, but it doesn't help.
I find out plenty of other things the first week. I buy some paper and a box of pencils and make a stack of notes back at the bunkhouse. Soon it's time to start my root-cause analysis, frontier-style. It's the world's first. (As I explained at a staff picnic last year, while the core concepts behind root causes are recognizable in Aristotle's notions of moral responsibility and determinism, the discipline didn't get going until Operations Research during World War II and the postwar study of loss control. So right now I'm seventy years ahead of the competition.)
I initiate my work one Saturday night when I'm whooping it up with the graveyard shift in the Dirty Dog Saloon. I shove aside the shot glasses and peanut shells, pull out two sheets of foolscap with my incident description, and pass them around to the boys for a review.
There isn't room for all my work product in one whiskey bottle, but it went like this:
“Series of mishaps and problems at the Acme Mine beginning around August 2, continuing to date. Most time-critical is rapid deterioration in gold-ore production. Problem first noticed with downward trend in ore deliveries in tons/day, falling to 33% below targets. Persisted for two weeks. Tonnage recovered to acceptable range by August 12 but starting August 7, assayed ore quality at the stamp mill fell from 10 troy oz/ton to 1 troy oz/ton. Monthly gross revenue from stamp mill dropped 32% year over year. Owners plan to close mine at end of fiscal.”
I was careful to word this
like the classic I.D.: focus on describing the most serious symptom
and don't point fingers or guess about solutions. Too early for that!
Well, incident descriptions are something new to the crowd at the Dirty Dog Saloon, so I buy another
round of rotgut liquor and warm them up to the idea. The audience
even adds a couple of bullet points, along with a bullet hole
following gunplay between a tinhorn gambler and a placer miner two
tables over.
As you know, my next task is the Problem Statement, describing the deviation from the desired state, and of course it's scoped to stay within our field of control. I draft a short paragraph, in declarative sentences, stating the goal to achieve.
Now the hard part is about to begin: getting buy-in from the powers that be. Something tells me that they don't place a lot of faith (yet) in statistics and decision trees.
Which follows in the second whiskey bottle, so stay tuned for Part II of "Root Cause Analysis: The Time Traveler's Tale."