Comments about technological history, system fractures, and human resilience from James R. Chiles, the author of Inviting Disaster: Lessons from the Edge of Technology (HarperBusiness 2001; paperback 2002) and The God Machine: From Boomerangs to Black Hawks, the Story of the Helicopter (Random House, 2007, paperback 2008)

Saturday, July 7, 2012

Root Cause Analysis: The Traveler's Tale, Part 2

Recap of Part 1: In the first installment about our analytically-inclined time traveler, Rick Asplundh plans an easy test of the company's time machine at the request of the R&D Division chief, intending to go back in time just a few minutes and then return. But the uncalibrated machine goes back much further, dumping him in the Black Hills outside Deadwood, Dakota Territory. Date: 1877. And the “return to home” button doesn't work either. Asplundh waits by the stranded machine, then goes into town to find work. He takes a job at a gold mine tending mules underground, and quickly learns that the underground workings are deep in a business crisis. Earlier that month, something very odd started afflicting the mules and the mining gear too, reducing daily tonnage, and then ore quality. 

Seeing the miners spiral into despair, Asplundh realizes he has a skill nobody in 1877 has: training in root cause analysis! He convenes a group of fellow miners at a saloon and takes them through the initial scoping steps. Rick uses their bullet points to write up an incident description, but that's just the beginning. He needs the Acme mine management to empower a team and get behind a business-improvement process that won't be invented for another seventy years. Now, Part 2 of the memo that Rick is writing to his boss, “A Traveler's Tale.”

       ~ ~ ~ ~ ~ ~ ~ ~ ~

           SECOND MEMO

From: Richard C. Asplundh, Staff Engineer, Rapid Prototyping Div.
To:      Boss
Re:     Mixed Results with Prototype X-1A Time Machine, Continued
Date:   September 6, 1877
Via:    "Winged-Victory" Brand beer bottle

Picking up from the first memo: I was on the way to Too-Tall Johnson's office after getting off work one morning, pondering how I could get the mine bosses to take my incident description and root-cause analysis seriously.

I find the front office abuzz with sports talk. Forget the gold mine going bust: today's panic is the mine's baseball team, because the playoffs are coming up. Another mine has hired away our coach, and odds are long. It looks like the crises are just piling up into a big explosion, but I am not discouraged. Using principles from Eliyahu Goldratt's Theory of Constraints, I think I'll try an Evaporating Cloud diagram. It's one way to work through apparently unsolvable dilemmas to a win-win solution by breaking down preconceptions. I go to a dusty counter, unroll a sheet of butcher paper, and draw the boxes and arrows, Goldratt-style: The mine bosses want the business problem to be fixed but they are so discouraged they don't want to invest any effort or staff … unless it's for the baseball team. I probably could help the mine by nominating and leading a root-cause investigation team if I could get one with knowledge and implementation authority, but that's a hard sell in the 1870s. I've got a true dilemma on my hands that, if left unsolved, will wreck the business and my confidence as well.

Goldratt said that every dilemma offers a way out – no exceptions. The word “team” sticks in my mind. I decide to focus my thinking on the mine's nine and its place in this puzzling picture, then Eureka!

I evaporate the cloud by injecting more details and getting past old assumptions, just like Goldratt described in his business novel The Goal. Fact: the Acme Nine needs a head coach. Fact: I coached Little League baseball. Fact: from going to the games, I know that our batting roster has almost all the mining knowledge I need, from hard-rock miners to drillers, bookkeepers to blacksmiths to the steam hoist operators. We even have a shift boss as a third bagger, and we'll need a supervisor to carry the weight when the going gets tough, as every root-cause process does. It's got a few holes but I can figure a way to fill out any knowledge gaps, or my initials aren't RCA.

I turn, call for attention, and shout these immortal words: “What if I coach the team, and finish out the season?”

A pause, then “Huzzah for Rick!” they say. They know from my color commentary in the stands that I understand the game, and I've shown them a few pitches outside the Gem Theater that turned their heads. Or was it the beer?

“But first!” I pull out my first-draft Problem Statement and ask for some advice. After some editing around the potbelly stove, it reads like this when signed by the mine superintendent:

“Problem: Beginning in early August the Acme Mine began suffering from a mysterious, abrupt production crisis, first reflected in daily tonnage and then a declining ore assay. Together these have slashed revenue and threaten to close the operation. Goal: Find the root cause and meet or exceed revenue targets before the next quarterly report to Corporate.”

“One more thing,” I add as I fold up the paper, and it's one that is critical to scoping my investigation and pursuing implementation schemes. “We've got a problem in the infield, so I need to switch out our shortstop with a Welshman who works in the powder magazine. I'm thinking Scorch-Face Smith.” Yes, Scorch is a good switch-hitter when he's sober, the superintendent agrees, but he wants to know why. I think fast and say daily exposure to nitroglycerine is going to give Scorch that extra burst of speed to steal some bases.

The real reason is that the first wave of data-gathering has given me an intuition (yes, hunches are a valid part of any root-cause investigation, if followed with data and logical cause-and-effect rigor) that we can't dig up our root cause without an explosives guy on board.

Why? I hear that the powder monkeys know more about some problems down below than they have let on. And that reluctance doesn't necessarily mean they are part of the root cause. Typical for root cause investigations: somebody's always dragging their feet, or actively misdirecting the effort, and guilt may have nothing to do with it. They act out of fear their department will get blamed or maybe they'll have to do more work because somebody else screwed up. I'd say that office politics is in the top three reasons why root-cause efforts fail, along with lack of persistence and poor followup at the end of the improvement process, during implementation.

So on top of everything else in managing my new 1870s lifestyle and running the mule team downstairs, I'm supposed to coach a pretty rough bunch into a championship baseball team. I teach them a few pitching and hitting tricks that Abner Doubleday never thought up. They appreciate the tips and that's why they're willing to indulge my root-cause work on the side. They think I'm half-addled – I can't even keep my story straight on where I came from. That's okay – whatever it takes!

I don't have to remind you that mustering the right investigation team is key: they need to have subject-matter expertise, good interpersonal skills to dig out the “who, what, when, where, why, and how” information, and finally the ability to implement a solution. Because this is looking like we have a long and tangled chain of causation that produces multiple symptoms – nothing simple here – they need a good facilitator to keep them on track. That's me.

One of my jobs is to keep the root cause team/baseball team in the containment zone. That means sticking to the range of relevant operations: not all events that affect business operations, but only those that they have the ability to change. So when they start blaming gold prices in Frisco, that's out of bounds. I'm not looking for cheery faces all the time: I usually find constructive disagreement gives the best results.

And there's plenty of unhappiness to keep me happy. Right away I have to put a new man in right field since I need a really smart guy from the steam-drilling department. He's got bad eyesight and this would have caused an uproar except that our pitcher is developing quite a slider and not many balls are getting out to right field.

Two weeks pass. The baseball team has won a few games nobody expected and now it's time to introduce my newly energized baseball team to the Ishikawa fishbone diagram. “Now, I know what you're going to say, so just hold your horses,” I start out. “Scorch, you were going to say that fishbones have their detractors, but! I like fishbone diagrams because filling them out brings forth rich speculation about the full range of possible causes and connections. So bear with me.” These are hard-rock miners who haven't been doing a lot of diagramming on the frontier or any kind of symbolic logic for that matter, so it's not easy to sell them on the idea, but I insist that this is part of my strategy to raise team morale before the playoffs.

So after the practice each afternoon, and with a few gallons of beer that I buy, we flesh out the fishbone. You'd recognize it right away: problem statement goes on the right, linked to a spine that ties together branches representing major categories of possible causes. We settle on six fishbone categories adjusted for the setting:
  • Machines: Steam drills (drills, boilers that supply steam); tools for dressing bits; tools used at the working face in the mine; Hoist gear; Mine cars; Mine rails; Timbers to hold up workings
  • Men: Miners (poor training, poor safety discipline, labor unrest); Hoist operators; Mule wranglers and stablehands (I make sure to include my department, to make the point that no one is immune from the evil eye)
  • Materials used up daily: Dynamite; Coal used for raising steam; Steel drill bits; Water quality
  • Management: Superintendent; shift foremen
  • Measurement: Assay office, scalehouse that weighs tonnage, surveys of underground workings
  • Mine (Environment): Ore bodies; Temperature; Humidity; Flooding; Fire; Cave-ins
Normally I'd keep my fishbone nice and dry in a conference room with a big whiteboard, using markers and sticky notes. A good fishbone is something the Japanese keep posted in meeting rooms for months, in fact. But in the Wild West, I settle for a whitewashed wall on the back of a toolshed by the baseball diamond, and that's my whiteboard. We just nail the pieces of paper to the board.

It can be hard to move the notes around on such a board but we've got to have something that keeps the paper from blowing away in the Chinook wind. Four-Finger Halloran, the second baseman, says we should move the mining engineer from “Men” to “Management” since he does the geology and his plans guide the excavations. The bat-boy, who's wanted in Texas for a murder or two, speaks up: “Yep, if the engineer was on the job! He was visiting his uncle over in the Glory Hole Mine last month, stepped on a rotten trapdoor and fell down a shaft. He's been laid up all month.” I tell them not to jump to conclusions like blaming it all on the absent engineer. But something here is worth pursuing, so I enrich this stem with some notes and move the Mine Engineer stem and all its twigs to a new spot under the Management branch. I need a nail to hold it and one of the boys obliges with a throwing-knife: it lodges in the paper and misses me. I give him a hard look but we now have a set of possible intermediate causes and will add earlier causes to them.

As I've said a million times or more, the typical fishbone diagram starts as no more than a categorized, broad-span brainstorm list based on current knowledge, in which the main categories show up as four to eight big branches. Within each branch are stems. Here, “drills” and “hoists” are stems under “Machines.” Each stem that looks promising to the RCA team deserves the addition of twigs that list the events leading to the specific problem.
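For the folks back home who would rather see this as data than as paper nailed to a toolshed, here is a minimal Python sketch of the branch, stem, and twig structure. It is only an illustration, not a standard tool: the names `fishbone`, `add_twig`, and `move_stem` are my own inventions, and the stems are abbreviated from the six categories listed earlier, with the mining engineer starting under “Men” as he did on our wall.

```python
# Fishbone sketch: branch -> stem -> list of twigs (notes on earlier causes).
# Branch and stem names are abbreviated from the memo's six categories;
# the helper function names are hypothetical, invented for this example.
fishbone = {
    "Machines": {"Steam drills": [], "Hoist gear": [], "Mine cars": [],
                 "Mine rails": [], "Timbers": []},
    "Men": {"Miners": ["poor training", "poor safety discipline", "labor unrest"],
            "Hoist operators": [],
            "Mule wranglers and stablehands": [],
            "Mining engineer": []},
    "Materials": {"Dynamite": [], "Coal": [], "Steel drill bits": [],
                  "Water quality": []},
    "Management": {"Superintendent": [], "Shift foremen": []},
    "Measurement": {"Assay office": [], "Scalehouse": [],
                    "Underground surveys": []},
    "Mine (Environment)": {"Ore bodies": [], "Temperature": [], "Humidity": [],
                           "Flooding": [], "Fire": [], "Cave-ins": []},
}


def add_twig(diagram, branch, stem, note):
    """Attach a note (a possible earlier cause) to a stem."""
    diagram[branch][stem].append(note)


def move_stem(diagram, stem, source, target):
    """Move a stem, with all of its twigs, from one branch to another."""
    diagram[target][stem] = diagram[source].pop(stem)


# Enrich the stem with a note, then move it as Halloran suggested.
add_twig(fishbone, "Men", "Mining engineer",
         "laid up all month after a fall down a shaft")
move_stem(fishbone, "Mining engineer", "Men", "Management")

for branch, stems in fishbone.items():
    print(f"{branch}: {', '.join(stems)}")
```

The nested dictionary keeps the twigs attached to their stem, so moving a stem carries its notes along, which is more than I can say for paper in a Chinook wind.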

It's this later work, probing into possible causes with methods like the Five Whys, that brings the project to a good conclusion. I don't believe that using “why” exactly five times for each symptom is always the best method, and sometimes a better question is “how could,” but the boys seem to prefer the Whys. There are too many permutations to investigate to the Nth degree, but there's enough wisdom on the baseball team to pick the most promising theories for a closer look. I buck them up by saying that the odds are good that somewhere in this burgeoning list is our root cause, or causes, along with the chain of events that led to the bad events described in the Problem Statement. Now we need facts to prove, or disprove, whether the circled, top-priority causes were a factor.

“Okay. Now set aside all your guesses on why the mine is in so much trouble,” I say. “Just go out and gather the data and let the bones fall where they may.” At first the boys struggle with this, since it could reflect badly on their department, but I compare it to a bloodhound. Who can lead a bloodhound when it picks up a scent? Nobody! The bloodhound finds its own trail.

Duly instructed if not inspired, the Acme Nine baseball/root-cause team indulges me by bringing new information after each practice. Some of it is in tabular form. I gather it up and try out a Pareto chart – trying to find the classic 20% of causes that will bring 80% of the benefit. Paretos need a heap of statistics and these are sparse along the frontier. All I have for each day are things like mining tonnages, water in the sump, steam generation, and staffing levels. The mining engineer is still out from his fall down the shaft, so I don't have the benefit of his expertise. In fact, it may be better that he stays away. The mood was pretty ugly last week, because some of the fellows thought he had sabotaged the mine.

We circle a dozen possible causes to start with, though we'll probably have to dig into more of them later. To illustrate how hard they work at this, take the Steam Drill Problem, which is just one twig under the branch labeled Machines:

Why – Level 1 “The steam drills are under-performing by 35% when making holes for placement of explosives - Why?” They guess that it could be the drills, the driller, or the steam supply. The drill squad goes off to investigate and reports back to the full team that nothing is wrong with the drills or drillers, but the steam flow is low.

So that leads to the Second Why: “Why is the supply of steam to the drills low?” The team says it could be due to at least five causes, including the boiler machinery (such as plugged tubes, from mineral scaling), or a change in fuel supply, or efficiency of the steam lines. The “steam team” goes off to check and reports back that supply pressure is nominal at the boiler side, but strangely low at the far end of the hoses. A clue!

Now the Third Why: “Why is steam pressure low at the drills' inlet?” Before charging off, I have them brainstorm all possible causes: there could be too much hose in the run, the hose could be leaky, or maybe it's plugged with rust or debris. “Give me numbers!” I cry, and the steam team hustles off to check. They discover that the hoses are leaking more than usual: an average of 2.5 leaks per 20 feet of hose. We are starting to close in on a contributing cause, I'm sure.

And a fourth Why: “Why are there so many leaks in the steam hoses?” They hypothesize as follows: the leather or rubber could be getting old, the man in charge of patching the routine holes might be falling down on the job, there could be sabotage, or something could be wearing them out faster. A newly appointed Hose Team goes off to investigate.

So you get the idea. By now everybody in the mine begins to see that using Five Whys is no shortcut, and that root-cause work is more perspiration than inspiration.
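If it helps to see the chain laid out, here is a minimal Python sketch of the Steam Drill Whys so far. The questions, hypotheses, and findings are taken from the account above; the `Why` class and the two helper functions are hypothetical names of my own, and the sketch assumes Python 3.9 or later.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Why:
    """One level of a Five Whys chain (hypothetical structure for illustration)."""
    question: str
    hypotheses: list[str]
    finding: Optional[str] = None  # None means the team is still investigating


chain = [
    Why("Why are the steam drills under-performing by 35%?",
        ["the drills", "the driller", "the steam supply"],
        "Drills and drillers are fine, but steam flow is low."),
    Why("Why is the supply of steam to the drills low?",
        ["boiler machinery (plugged tubes)", "change in fuel supply",
         "efficiency of the steam lines"],
        "Pressure is nominal at the boiler but low at the far end of the hoses."),
    Why("Why is steam pressure low at the drills' inlet?",
        ["too much hose in the run", "leaky hose",
         "hose plugged with rust or debris"],
        "Hoses are leaking: an average of 2.5 leaks per 20 feet."),
    Why("Why are there so many leaks in the steam hoses?",
        ["aging leather or rubber", "patching man falling down on the job",
         "sabotage", "something wearing them out faster"]),
]


def causal_trail(whys):
    """Findings confirmed so far, in order from symptom toward root cause."""
    return [w.finding for w in whys if w.finding is not None]


def open_questions(whys):
    """Questions that still need data before the next Why can be asked."""
    return [w.question for w in whys if w.finding is None]


for step, finding in enumerate(causal_trail(chain), start=1):
    print(f"Why {step}: {finding}")
print("Still open:", open_questions(chain))
```

Each finding has to be backed by data before the next question is asked, which is why the fourth entry stays open until the Hose Team reports back.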

Meanwhile I'm compiling a detailed chronology for the Change Analysis. That's a complete list of events at the mine, day to day, over the last six weeks. It took a lot of work, but I like chronologies: a chronology is one way to sift through the causes and effects, in this case, what might have driven the deterioration in production statistics. My Pareto chart tells me that there's no significant difference in ore production between the day shift and the night shift, but there was a striking change in tonnage over time in one of three payzones where miners are working. That payzone, called Queen of the North, had been accounting for most of the revenue, and now it's way down, so the Pareto chart indicates a cause or causes should be found there.
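The arithmetic behind a Pareto ranking is simple enough to sketch in a few lines of Python. This is a minimal illustration, not my actual mine data: the function name `pareto` is my own, the tonnage figures are invented placeholders, and “Payzone B” and “Payzone C” are made-up names standing in for the other two payzones.

```python
def pareto(losses, cutoff=0.8):
    """Rank contributors by size and keep the 'vital few' that together
    reach the cutoff share of the total.

    Returns a list of (name, loss, cumulative_share) tuples.
    """
    total = sum(losses.values())
    if total <= 0:
        return []
    ranked = sorted(losses.items(), key=lambda item: item[1], reverse=True)
    vital_few, running = [], 0.0
    for name, loss in ranked:
        if running / total >= cutoff:
            break  # the contributors already kept cover the cutoff share
        running += loss
        vital_few.append((name, loss, running / total))
    return vital_few


# Hypothetical drop in daily tonnage by payzone (placeholder numbers only).
tonnage_drop = {
    "Queen of the North": 85,
    "Payzone B": 10,
    "Payzone C": 5,
}

for name, loss, share in pareto(tonnage_drop):
    print(f"{name}: {loss} tons lost, {share:.0%} of the total so far")
```

With placeholder numbers like these, one payzone alone crosses the 80% line, which is the kind of lopsided picture that tells a team where to dig first.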

We carry on for another week and I feel like we're closing in on it. The Acme baseball team is on a winning streak. Better yet, a plausible chain of causation is emerging out of our many Cause and Effect Diagrams and a Current Reality Tree.

I start them working on an Anticipatory Failure Analysis. Shortly afterward the crisis hits, which it always does at some point among us root-cause practitioners. But this crisis is a little more pointed: namely, the point of a Colt .45. When I walk into the office, there's a committee waiting for me. Or shall I say a posse? I'll finish my report when and if they let me out of the hoosegow.

         ~ ~ ~ ~ ~ ~ ~ ~ ~
Conclusion in Part 3!
