Checklists and Information Security

by adam on April 10, 2012

I’ve never been a fan of checklists. Too often, checklists replace thinking and consideration. In the book, Andrew and I wrote:

CardSystems had the required security certification, but its security was compromised, so where did things goo wrong? Frameworks such as PCI are built around checklists. Checklists compress complex issues into a list of simple questions. Someone using a checklist might therefore think he had done the right thing, when in fact he had not addressed the problems in depth…Conventional wisdom presented in short checklists makes security look easy.

So it took a while and a lot of recommendations for me to get around to reading “The Checklist Manifesto” by Atul Gawande. And I’ll admit, I enjoyed it. It’s a very well-written, fast-paced little book that’s garnered a lot of fans for very good reasons.

What’s more, much as it pains me to say it, I think that security can learn a lot from the Checklist Manifesto. One objection that I’ve had is that security is simply too complex. But so is the human body. From the Manifesto:

[It] is far from obvious that something as simple as a checklist could be of substantial help. We may admit that errors and oversights occur–even devastating ones. But we believe our jobs are too complicated to reduce to a checklist. Sick people, for instance, are phenomenally more various than airplanes. A study of forty-one thousand trauma patients in the state of Pennsylvania–just trauma patients–found that they had 1,224 different injury-related diagnoses in 32,261 unique combinations. That’s like having 32,261 kinds of airplane to land. Mapping out the proper steps for every case is not possible, and physicians have been skeptical that a piece of paper with a bunch of little boxes would improve matters.

The Manifesto also addresses the point we wrote above, that “someone using a checklist might think he’d done the right thing”:

Plus, people are individual in ways that rockets are not–they are complex. No two pneumonia patients are identical. Even with the same bacteria, the same cough and shortness of breath, the same low oxygen levels, the same antibiotic, one patient might get better and the other might not. A doctor must be prepared for unpredictable turns that checklists seem completely unsuited to address. Medicine contains the entire range of problems–the simple, the complicated, and the complex–and there are often times when a clinician has to just do what needs to be done. Forget the paperwork. Take care of the patient.

So it’s important to understand that checklists don’t replace professional judgement, they supplement it and help people remember complex steps under stress.

So while I think security can learn a lot from The Checklist Manifesto, the lessons may not be what you expect. Quoting the book that inspired this blog again:

A checklist implies that there is an authoritative list of the “right” things to do, even if no evidence of that simplicity exists. This in turn contributes to the notion that information security is a more mature discipline than it really is.

For example, turning back to the Manifesto:

Surgery has, essentially, four big killers wherever it is done in the world: infection, bleeding, unsafe anesthesia, and what can only be called the unexpected. For the first three, science and experience have given us some straightforward and valuable preventive measures we think we consistently follow but don’t.

I think what we need, before we get to checklists, is more data to understand what the equivalents of infection, bleeding and unsafe anesthesia are. Note that those categories didn’t spring out of someone’s mind, thinking things through from first principles. They came from data. And those data show that some risks are bigger than others:

But compared with the big global killers in surgery, such as infection, bleeding, and unsafe anesthesia, fire is exceedingly rare. Of the tens of millions of operations per year in the United States, it appears only about a hundred involve a surgical fire and vanishingly few of those a fatality. By comparison, some 300,000 operations result in a surgical site infection, and more than eight thousand deaths are associated with these infections. We have done far better at preventing fires than infections. [So fire risks are generally excluded from surgical checklists.]

Security has no way to exclude insiders the fire risk. We throw everything into lists like PCI. The group who updates PCI is not provided in depth incident reports about the failures that occurred over the last year or over the life of the failure. When security fails, rather than asking, ‘did the checklist work’, the PCI council declares that they’ve violated the 11th commandment, and are thus not compliant. And so we dan’t improve the checklists. (Compare and contrast: don’t miss the long section of the Manifesto on how Boeing tests and re-tests their checklists.)

One last quote before I close. Gawande surveys many fields, including how large buildings are built and delivered. He talks to a project manager putting up a huge new hospital building:

Joe Salvia had earlier told me that the major advance in the science of construction over the last few decades has been the perfection of tracking and communication.

Nothing for us security thought leaders to learn. But before I tell you to move along, I’d like to offer up an alpha-quality DO-CHECK checklist for improving security after an incident:

  1. Have you addressed the breach and gotten the attackers out?
  2. Have you notified your customers, shareholders, regulators and other stakeholders?
  3. Did you prepare an after-incident report?
  4. Did you use Veris, the taxonomy in Microsoft’s SIR v11 or some other way to clarify ambiguous terms?
  5. Have you released the report so others can learn?

I believe that if we all start using such a checklist, we’ll set up a feedback loop, and empower our future selves to make better, and more useful checklists to help us make things more secure.

3 comments

Excellent post. I add the following to your alpha:
– Did the failed controls have a metric?
If yes, was the target appropriate?
If no, should you periodically measure performance and what is the initial target value?

Honest question: Why don’t checklists include performance measurement beyond periodic audits?

I think it’s because the checklist stakeholders believe measurement is too expensive. Until they have an incident, they’re correct. Perhaps we need another question on the checklist:
Has management made the explicit decision to wait for an incident before investing in control performance measurement?

by Jared on April 10, 2012 at 5:35 pm. Reply #

I’d like to expand the items on your checklist slightly, simply because I find compound checklist items more difficult to grok.

1. Have you addressed the immediate concerns?
a. Have you addressed the breach?
b. Have you gotten the attackers out?
2. Have you notified all appropriate parties?
a. Have you notified management?
b. Have you notified regulators?
c. Have you notified shareholders?
d. Have you notified customers?
e. Have you notified all other stakeholders not already mentioned?

Et cetera, et cetera.

by Cliff Barbier on April 12, 2012 at 7:16 pm. Reply #

I’ve started writing about the future of security compliance, would be nice to hear your thoughts: http://wp.me/p2mKzA-5

by atorm on April 28, 2012 at 7:59 pm. Reply #

Leave your comment

Not published.

If you have one.