Monthly Archive for February, 2010

Human Error and Incremental Risk

As something of a follow-up to my last post on Aviation Safety, I heard this story about Toyota’s now very public quality concerns on NPR while driving my not-Prius to work last week.

Driving a Toyota may seem like a pretty risky idea these days. For weeks now, we’ve been hearing scary stories about sudden acceleration, failing brakes and car recalls. But as NPR’s Jon Hamilton reports, assessing the risk of driving a Toyota may have more to do with emotion than statistics.

Emotion trumping statistics in a news article?  Say it isn’t so!

Mr. LEONARD EVANS (Physicist, author, Traffic Safety): The whole history of U.S. traffic safety has been one focusing on the vehicle, one of the least important factors that affects traffic safety.

HAMILTON: Studies show that the vehicle itself is almost never the sole cause of the accident. Drivers, on the other hand, are wholly to blame most of the time. A look at data on Toyotas from the National Highway Traffic Safety Administration confirms this pattern.

Evans says his review of the data show that in the decade ending in 2008, about 22,000 people were killed in vehicles made by Toyota or Lexus.

Mr. EVANS: All these people were killed because of factors that had absolutely nothing to do with any vehicle defect.

HAMILTON: Evans says during that same period, it’s possible, though not yet certain, that accelerator problems in Toyotas played a role in another 19 deaths, or about two each year. Evans says people should take comfort in the fact that even if an accelerator does stick, drivers should usually be able to prevent a crash.

(bold mine)

From 1998 to 2008, about 2,200 people per year (out of about 35,000 total vehicle deaths per year) died in Toyotas because of some sort of non-engineering failure.  During that same period, just under two people per year were killed due to the possible engineering failure.  So all this ado is about, at most, a 0.09% increase in the Toyota-specific death rate and a 0.005% increase in the overall traffic death rate.
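For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The inputs are the round figures quoted above (22,000 deaths over the decade, 19 possibly defect-related deaths, roughly 35,000 traffic deaths per year); the variable names are mine.

```python
# Round figures quoted in the NPR story and the paragraph above.
toyota_deaths_decade = 22_000         # deaths in Toyota/Lexus vehicles over the decade
suspected_defect_deaths = 19          # deaths possibly tied to accelerator problems
all_traffic_deaths_per_year = 35_000  # approximate total vehicle deaths per year
years = 10

toyota_deaths_per_year = toyota_deaths_decade / years      # 2,200 per year
defect_deaths_per_year = suspected_defect_deaths / years   # 1.9 per year

# Extra deaths expressed as a percentage of each baseline.
toyota_increase_pct = 100 * defect_deaths_per_year / toyota_deaths_per_year
overall_increase_pct = 100 * defect_deaths_per_year / all_traffic_deaths_per_year

print(f"Toyota-specific increase: {toyota_increase_pct:.2f}%")  # 0.09%
print(f"Overall increase: {overall_increase_pct:.3f}%")         # 0.005%
```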

So why is the response so out of proportion to the actual scope of the problem?  Because the risk is being imposed on the driver by the manufacturer.

Mr. ROPEIK [Risk communication consultant]: Imposed risk always feels much worse than the same risk if you chose to do it yourself. Like if you get into one of these Toyotas and they work fine, but you drive 90 miles an hour after taking three drinks. That won’t feel as scary, even though it’s much riskier, because you’re choosing to do it yourself.

And, lest we forget, even in the case where the accelerator did stick there was still a certain degree of human error:

Mr. EVANS: The weakest brakes are stronger than the strongest engine. And the normal instinctive reaction when you’re in trouble ought to be to apply the brakes.

My frustration is that when I compare the reality of the data with most of the reporting on the subject, I think of Hudson’s NSFW “Game Over” rant. (Corrected from “Hicks” per the comments.  Thanks, 3 of 5!)

After all, given that you’re more likely to die in your home (41%) than in your car (35%), you’re still statistically safer taking to the road than sitting home cowering in fear of your Prius.

Human Error

In his ongoing role of “person who finds things that I will find interesting,” Adam recently sent me a link to a paper titled “THE HUMAN FACTORS ANALYSIS AND CLASSIFICATION SYSTEM–HFACS,” which discusses the role of people in aviation accidents.  From the abstract:

Human error has been implicated in 70 to 80% of all civil and military aviation accidents. Yet, most accident reporting systems are not designed around any theoretical framework of human error. As a result, most accident databases are not conducive to a traditional human error analysis, making the identification of intervention strategies onerous. What is required is a general human error framework around which new investigative methods can be designed and existing accident databases restructured. Indeed, a comprehensive human factors analysis and classification system (HFACS) has recently been developed to meet those needs.

Consider that pilots, whether private, commercial, or military, are one of the more stringently trained and regulated groups of people on the planet.  This is due, at least in part, to the history of aviation.  As the report notes,

In the early years of aviation, it could reasonably be said that, more often than not, the aircraft killed the pilot. That is, the aircraft were intrinsically unforgiving and, relative to their modern counterparts, mechanically unsafe. However, the modern era of aviation has witnessed an ironic reversal of sorts. It now appears to some that the aircrew themselves are more deadly than the aircraft they fly (Mason, 1993; cited in Murray, 1997). In fact, estimates in the literature indicate that between 70 and 80 percent of aviation accidents can be attributed, at least in part, to human error (Shappell & Wiegmann, 1996).

Once upon a time, operating an airplane was so dangerous that only highly-skilled experts could do it, and even then the equipment would get out of their control and crash.  Later (yet still almost twenty years ago), the equipment improved to the point that equipment failure no longer overshadowed operator error, but planes still get out of control and crash.

Other than the fact that pilots are almost universally still highly-skilled and/or trained operators, this doesn’t sound all that different from the evolution of computing.

Flight has obviously never seen its adoption rate explode the way PCs did in the Age of the Web, but there is still a strong parallel between aircraft accidents and Information Security failures.  The parallel becomes even stronger once the paper gets into James Reason’s “Swiss Cheese” model of understanding root causes of aircraft accidents.

Reason identifies four factors that interact with each other to increase accident rates, which I’ll paraphrase as:

  1. Unsafe Acts:  This is the cause of the active failure (i.e. crash), such as a poor decision or a failure to watch the instruments or otherwise recognize that an unsafe situation was forming or occurring
  2. Preconditions for Unsafe Acts:  Situations that increase risk of an accident, such as miscommunication between aircrew members or with others outside the aircraft, such as air traffic control
  3. Unsafe Supervision:  Failures of management or leadership to recognize when they are, for example, pairing inexperienced pilots together in less-than-optimal conditions
  4. Organizational Influences:  Usually business-level decisions, such as reducing training hours to reduce costs

How familiar does this sound?  If you’ve ever read an IT Audit report, this should seem painfully familiar, even if only analogously.  The paper provides a strong taxonomy within each area, and I could easily drill down at least one more level into each one.  Read the paper to learn more and become a better professional problem solver, security-related or otherwise.

For example, consider a real-world case I dealt with recently.  This is an easy example which ties the four levels together more neatly than many, so consider it an “Example-Size Problem” and extend as you see appropriate.

The incident was the loss of sensitive business information, which I personally believe hurt the company in a negotiation:

  1. Unsafe Act:  The VP left his unencrypted laptop unattended while at a meeting — this was the Active Failure/Unsafe Act that led to the Mishap
  2. Preconditions:  The VP assumed that others were watching his laptop, but did not explicitly confirm this fact
  3. Unsafe Supervision:  Despite knowing that Executives are high-risk users with regard to sensitive information on their laptops, the IT Executive Support Team had recommended against deploying Full-Disk Encryption on executives’ laptops because they feared being held accountable if an executive lost information due to an encryption system failure
  4. Organizational Influences:  While a Laptop Encryption Policy existed and specified that the VP’s laptop should have been encrypted for multiple reasons, the policy was widely ignored, there was no cultural pressure to ensure that mobile information was protected, and thus compliance was unacceptably low.  No pressure to comply was generated by Executive management because the cost associated with doing so was considered to be prohibitive.

In this case, the damage (opportunity cost) of lost revenue due to that single lost laptop was many multiples of the complete cost of deploying a Full-Disk Encryption system.  Unfortunately, in the absence of a comprehensive analysis of the series of failures leading up to the unsafe act, the real root cause of an incident may be ignored or mis-assigned, leading to either an incomplete or unsustainable remediation course.
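One way to keep a review from stopping at the unsafe act is to record findings against all four levels and check which ones are still empty. The sketch below is only an illustration of that idea: the `Level`, `Finding`, and `missing_levels` names are mine, not part of HFACS or the paper, and the summaries are condensed from the laptop example above.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """The four levels as paraphrased in this post (hypothetical names)."""
    UNSAFE_ACT = 1
    PRECONDITION = 2
    UNSAFE_SUPERVISION = 3
    ORGANIZATIONAL_INFLUENCE = 4


@dataclass
class Finding:
    level: Level
    summary: str


def missing_levels(findings):
    """Return the levels an incident review has not yet addressed."""
    covered = {f.level for f in findings}
    return [level for level in Level if level not in covered]


laptop_incident = [
    Finding(Level.UNSAFE_ACT, "Unencrypted laptop left unattended at a meeting"),
    Finding(Level.PRECONDITION, "Assumed others were watching the laptop; never confirmed"),
    Finding(Level.UNSAFE_SUPERVISION, "Support team recommended against full-disk encryption"),
    Finding(Level.ORGANIZATIONAL_INFLUENCE, "Encryption policy existed but was widely ignored"),
]

# A review that stops at the unsafe act leaves three levels unexamined.
shallow_review = laptop_incident[:1]
print([level.name for level in missing_levels(shallow_review)])

# The full analysis covers every level, so nothing is reported as missing.
print(missing_levels(laptop_incident))
```

The point of the structure is simply that "blame the VP" shows up as an incomplete record rather than a finished investigation.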

When incidents occur, it’s rare to see a true and honest assessment of not just what went wrong, but why.  Too often, in fact, the culture seems to be to put it down to “nobody could have predicted it.”  Reject these assessments.  To improve an organization, we must refuse to accept these explanations.  Instead, find the root cause–all the way up to the Organizational Influences–and then Fix It.

Pie charts are not always wrong

In a comment, Wade says “I’ll be the contrarian here and take the position that using pie charts is not always bad.” And he’s right. Pie charts are not always bad. There are times when they’re ok. As Wade says “If you have 3-4 datapoints, a pie can effectively convey what one is intending to present.” Which is true. But in every case I’ve seen, those situations are as well served with a small bar graph.

What’s the least contrived situation in which a pie chart is better than a bar graph or table? (Pac man and pies are two obvious examples.)

Symantec State of Security 2010 Report Out

http://www.symantec.com/content/en/us/about/presskits/SES_report_Feb2010.pdf

Thanks to big yellow for not making us register!  Oh, and Adam thanks you for not using pie charts…

The Visual Display of Quantitative Information

In Verizon’s post, “A Comparison of [Verizon's] DBIR with UK breach report,” we see:

[Image: pie-charts-suck.jpg]

Quick: which is larger, the grey slice on top, or the grey slice on the bottom? And ought grey be used for “sophisticated” or “moderate”?


I’m confident that both organizations are focused on accurate reporting. I am optimistic that this small example of the utility of pie charts will inform report writers. The report writers and their graphics departments, loving their customers, will move to bar charts to help them compare numbers between sources.

I’m confident that not using pie charts is a best practice.

Elsewhere: “The only time it makes sense to use a pie chart.”

And elsewhere: “The Visual Display of Quantitative Information, 2nd edition”

Adam & Andy Jaquith: A conversation

In December, Andy Jaquith and I had a fun conversation about info security with Bill Brenner listening in. The transcript is at “Meeting of the Minds,” and the audio is here.

Measuring the unmeasurable — inspiration from baseball

The New School approach to information security promotes the idea that we can make better security decisions if we can measure the effectiveness of alternatives.  Critics argue that so much of information security is unmeasurable, especially factors that shape risk, that quantitative approaches are futile.  In my opinion, that is just a critique of our current methods and instruments, not any proof of ultimate infeasibility.  What we need are major innovations in metrics, instrumentation, and such.

We can take inspiration from other fields.  Consider this innovation in statistical value management in baseball, a.k.a. the “Moneyball” approach:

Evaluating fielding is baseball’s hardest math. There are just too many unknowns in a play. How much ground did Jeter cover? How fast was the ball moving? In essence: How unlikely was it that he’d catch the ball?   [...]

Sportvision’s FieldFX camera system records the action while object-recognition software identifies each fielder and runner, as well as the ball. After a play, the system spits out data for every movement: the trajectory of the ball, how far the fielder ran, and so on. “After an amazing catch by an outfielder, we can compare his speed and route to the ball with our database and show the TV audience that this player performed so well that 80 percent of the league couldn’t have made that catch,” says Ryan Zander, Sportvision’s manager of baseball products. That information, he says, will allow a much more quantitative measure of exactly what is an error.
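To make the “80 percent of the league couldn’t have made that catch” idea concrete, here is a deliberately crude sketch. It is not how FieldFX works; it assumes the only thing that matters is whether a fielder’s top speed covers the required distance in the available time, ignoring reaction time and route. The function name and every number are made up for illustration.

```python
def share_unable_to_make_catch(distance_ft, hang_time_s, league_top_speeds_ft_s):
    """Fraction of fielders whose top speed is below what the play required."""
    required_speed = distance_ft / hang_time_s
    too_slow = [s for s in league_top_speeds_ft_s if s < required_speed]
    return len(too_slow) / len(league_top_speeds_ft_s)


# Hypothetical top speeds in feet per second; not real player data.
league_top_speeds = [24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0]

# A hypothetical play: 94.5 feet to cover in 3.0 seconds, i.e. 31.5 ft/s required.
share = share_unable_to_make_catch(
    distance_ft=94.5,
    hang_time_s=3.0,
    league_top_speeds_ft_s=league_top_speeds,
)
print(f"{share:.0%} of this hypothetical league could not have made the catch")
```

The security analogue is the same move: once an event is instrumented, “how hard was that?” becomes a comparison against a population rather than a judgment call.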

Happy Valentine’s Day!

They say that Y equals m-x plus b
(well, when you remove the uncertainty).
So let me reveal a secret confession:
You’re the solution to my least squares obsession.

stolen from the applied statistics blog

Open Security Foundation Looking for Advisors

Open Security Foundation – Advisory Board – Call for Nominations:

The Open Security Foundation (OSF) is an internationally recognized 501(c)(3) non-profit public organization seeking senior leaders capable of providing broad-based perspective on information security, business management and fundraising to volunteer for an Advisory Board. The Advisory Board will provide insight and guidance when developing future plans, an open forum for reviewing community feedback and a broader view when prioritizing potential new services.

I figure readers of this blog should be interested in helping drive open data sources.

Best Practices for Defeating the term “Best Practices”

I don’t like the term “Best Practices.” Andrew and I railed against it in the book (pages 36-38). I’ve made comments like “torture is a best practice,” “New best practice: think” and Alex has asked “Are Security “Best Practices” Unethical?”

But people keep using it. Worse, my co-workers are now using it just to watch me get spun up. My continued snark is clearly a Best Practice because I keep doing it despite evidence that it doesn’t work.

I’d love to hear your experiences. What are proven or effective practices for getting people to stop using the term?