Archive for the 'Data Analysis' Category

Data void: False Positives

There’s a good post at Gartner pointing out the lack of data reported by vendors or customers regarding the false positive rates for anti-spam solutions.  

Although Gartner customers almost never complain about false positive rates, I wonder if false positives are underestimated. End users rarely complain about false positives, but they are very vocal about reporting Spam in their inbox. Box Sentry (www.boxsentry.com) recently ran tests in a number of organizations and found that the false positive rate in some organizations using popular anti-spam tools was as high as 13% of legitimate emails. The largest proportion of false positives in their study was legitimate person-to-person traffic.  While it could be that these organizations have over-tuned their systems to block more Spam at the expense of quarantining more legit email, the reality was that the email administrators had no idea they had such a high false positive rate because they never checked.  Have you?

Going further, it would be very valuable to estimate the cost of false positives.

As I’ve discussed in a previous post, this is just another instance of a general problem in the security industry.  You can’t do rational analysis of effectiveness, cost-effectiveness, risk, and the rest without some estimate of false positive rates and their costs.
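
To make the two quantities concrete, here is a minimal sketch of how a mail administrator might compute a false positive rate and a rough cost from a hand-labelled sample of filtered mail. The counts, the `FilterCounts` structure, and the cost per wrongly quarantined message are all hypothetical, chosen only so that the rate lands on the 13% figure mentioned above; none of them come from the Box Sentry study.

```python
from dataclasses import dataclass


@dataclass
class FilterCounts:
    """Hand-labelled outcomes for a sample of mail (hypothetical structure)."""
    legit_delivered: int    # true negatives
    legit_quarantined: int  # false positives
    spam_quarantined: int   # true positives
    spam_delivered: int     # false negatives


def false_positive_rate(counts: FilterCounts) -> float:
    """Share of legitimate mail that the filter wrongly quarantined."""
    legit_total = counts.legit_delivered + counts.legit_quarantined
    if legit_total == 0:
        raise ValueError("sample contains no legitimate mail")
    return counts.legit_quarantined / legit_total


def false_positive_cost(counts: FilterCounts, cost_per_message: float) -> float:
    """Rough cost: wrongly quarantined messages times an assumed unit cost."""
    return counts.legit_quarantined * cost_per_message


# Made-up sample: 1,000 legitimate messages and 9,000 spam messages.
sample = FilterCounts(
    legit_delivered=870,
    legit_quarantined=130,
    spam_quarantined=8900,
    spam_delivered=100,
)
fp_rate = false_positive_rate(sample)
fp_cost = false_positive_cost(sample, cost_per_message=5.0)  # $5 is an assumption
print(f"false positive rate: {fp_rate:.1%}, estimated cost: ${fp_cost:,.2f}")
```

The hard part in practice is the unit cost, which depends on what the missed message was. The point of the sketch is only that neither number exists until someone samples the quarantine.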

Symantec State of Security 2010 Report Out

http://www.symantec.com/content/en/us/about/presskits/SES_report_Feb2010.pdf

Thanks to big yellow for not making us register!  Oh, and Adam thanks you for not using pie charts…

The Visual Display of Quantitative Information

In Verizon’s post, “A Comparison of [Verizon's] DBIR with UK breach report,” we see:

pie-charts-suck.jpg

Quick: which is larger, the grey slice on top, or the grey slice on the bottom? And ought grey be used for “sophisticated” or “moderate”?


I’m confident that both organizations are focused on accurate reporting. I am optimistic that this small example of the utility of pie charts will inform report writers. The report writers and their graphics departments, loving their customers, will move to bar charts to help them compare numbers between sources.

I’m confident that not using pie charts is a best practice.

Elsewhere: “The only time it makes sense to use a pie chart.”

And elsewhere: “The Visual Display of Quantitative Information, 2nd edition”

Does It Matter If The APT Is “New”?

As best as I can describe the characteristics of the threat agents that would fit the label of APT, that threat community is very, very real.  It’s been around forever (someone mentioned first use of the term being 1993 or something) – we dealt with threat agents you would describe as “APT” at MicroSolved when I was there in 2001-2005.  We dealt with it as a firewall vendor at Progressive Systems in 1998.  This isn’t an “is the APT real?” blogpost.

That said, I wanted to talk about why there should be still more discussion around the APT.  Hogfly at the Forensic Incident Response blog asks:

“What should matter is how successful they have been. What should matter is defending ourselves. What should matter is how and where we share this information. What should matter is taking this information to those with the ability to do something about it. What should matter is taking the fight to the enemy.

So I ask again, does it matter if this threat is new?”

My response is that it actually matters very much.

We are hearing a new label.  Whether the label originated from “the cool kids” or not, it’s being co-opted by marketing.  And right now, we’re sort of in this important window of trying to get some understanding, some significant amount of intersubjectivity about what the APT is and what it means to a broader audience.  Once that’s established, then we can try to understand what to do.  But why does it matter if the threat is new or old?

There is a significant increase in the use of the term.  When it’s a BusinessWeek cover story (2008, btw), it gets seen by people.  What we need to understand is if this “new” visibility is the result of either a change in the threat landscape or a change in the marketing landscape.

IS APT A SHIFT IN FREQUENCY, A SHIFT IN CAPABILITY, OR A SHIFT IN BOTH FREQUENCY AND CAPABILITY?

If it is a change in the threat landscape, we need to understand what aspect of the landscape is changing.  The shift could be said to be one of a few scenarios:

1.)  More attacks on the same targets by the same actors. That is, are the government, the defense industrial base, and other targets attractive to certain nation-states experiencing a new volume of threat events?

2.) More attacks on new targets by the same actors. That is, are the nation-state actors finding new targets?  If so, are their targets of choice changing from organizations that are antagonistic to the policy desires of the sponsor state (certainly the Mandiant report reads like the Chinese are after anyone who threatens their political stability), to other targets – like retailers or hospitals (has, as Mandiant says, the APT become *everyone’s* problem)?

3.)  More attacks on the same targets by new actors. That is, it’s not just the usual suspects.  If *this* is the case, then we’re seeing a fundamental shift in the capabilities of threats.  That is, bad guys who used to be dumb just got a lot smarter thanks to the dissemination of skills/resources (sharing of technique, new access to advanced toolsets, etc) and they are going after all those people who were worrying about the APT in 2003.

4.)  More attacks on new targets by new actors. That is, the bad guys who used to be dumb just got a lot smarter and are now trying to use their new smarts against victims who heretofore had not had to worry about the APT.

Finally, the other option is that there is no shift in frequency or capability, but there is a shift in marketing budgets.  I tried to run a google trend on “Advanced Persistent Threat” but got:

Your terms – “Advanced Persistent Threat” – do not have enough search volume to show graphs.

And “APT” trend search was clouded by other things that shared the same TLA.

WHAT DO YOU THINK?

I’m not sure what we’re seeing.  I was personally disappointed by the Mandiant report’s lack of demographics and frequency information.  I’m ready to believe that we’re seeing a fundamental shift in distributions concerning the threat agents, but there wasn’t anything in the report to support that notion.  I will leave you with a couple of items from the Verizon Report, though, and I’ll let you draw your own conclusions, given that the Verizon data set isn’t heavy on what we might call the Defense Industrial Base – those folks already live and breathe this stuff  – and this data is from 2008.

SOURCE OF ATTACKING IP

TARGETED VS. OPPORTUNISTIC ATTACKS

TREND IN USE OF CUSTOMIZED MALWARE

TIME TO DISCOVERY

NotObvious On Heartland

I posted this also to the securitymetrics.org mailing list.  Sorry if discussing in multiple  venues ticks you off.

The Not Obvious blog has an interesting write up on the Heartland Breach and impact.  From the blog post:

“Heartland has had to pay other fines to Visa and MasterCard, but the total of $12.6 million they have set aside to handle the one-time costs is a drop in the bucket compared $1.5 billion in 2008 revenue and does not really even skim much off the top of the $161 million in profits from that same year (the numbers for 2009 look to be tracking the same). It is almost a guarantee that any member of the class action who submits a claim will see many years of scrutiny before receiving any payment, something which Heartland can factor into their yearly financial plans (and accommodate for by increasing fees).”

For thought:

  1. One wonders how much a “sufficient” (loaded term, of course) InfoSec program for a company like Heartland costs on an annual basis.
  2. Does this set a sort of “worst case” bounds to impact distributions?
  3. If so, how does a worst case impact of ~$13million (US) impact security management at retailers (politically)?

Chris Soghoian’s Surveillance Metrics

I also posted about this on Emergent Chaos, but since our readership doesn’t fully overlap, I’m commenting on it here as well.

Chris Soghoian has just posted some of his new research into government electronic surveillance here in the US. The numbers are truly astounding (Sprint, for instance, provided geo-location data on customers eight million times in thirteen months).

There’s lots of great data on what’s being collected versus what’s being reported as collected. I know you’ll all be shocked to know that surveillance is dramatically underreported. It’s all very, very interesting. Check it out.

For Those Not In The US (or even if you are)

I’d like to wish US readers a happy Thanksgiving. For those outside of the US, I thought this would be a nice little post for today: A pointer to an article in the Financial Times,

Baseball’s love of statistics is taking over football

Those who indulge my passion for analysis and for sport know that I love baseball and love how the “Moneyball” approach challenged decades of dogma in the national pastime with scientific analysis.  Today’s Financial Times discusses how Chelsea (“The Blues” – UK football team) collaborates with the Boston Red Sox (the most superficial bandwagon team ever in baseball) on decision making and analytics.

Go Blues

Best lines:

“Mike Forde, Chelsea’s performance director, visits the US often. “The first time I went to the Red Sox,” he says of the Boston baseball team, “I sat there for eight hours, in a room with no windows, only flipcharts. I walked out of there saying, ‘Wow, that is one of the most insightful conversations on sport I have ever had.’ It was not: ‘What are you doing here? You do not know anything about our sport.’ That was totally irrelevant. It was: ‘How do you make decisions on players? What information do you use? How do we approach the same problems?’”

and:

“Forde sees his task as “risk management”.”

Huh.

Rational Ignorance: The Users’ view of security

Cormac Herley at Microsoft Research has done us all a favor and released a paper So Long, And No Thanks for the Externalities:  The Rational Rejection of Security Advice by Users which opens its abstract with:

It is often suggested that users are hopelessly lazy and unmotivated on security questions. They chose weak passwords, ignore security warnings, and are oblivious to certificates errors. We argue that users’ rejection of the security advice they receive is entirely rational from an economic perspective.

And you know it’s going to be good when they write:

Thus we find that most security advice simply offers a poor cost-benefit tradeoff to users and is rejected.  Security advice is a daily burden, applied to the whole population, while an upper bound on the benefit is the harm suffered by the fraction that become victims annually.  When that fraction is small, designing security advice that is beneficial is very hard.  For example, it makes little sense to burden all users with a daily task to spare 0.01% of them a modest annual pain.

People are not stupid.  They make what we, as relative experts on the topic of security, perceive to be bad decisions, but this paper argues that their behavior is rational.

[W]e argue for a third view, which is that users’ rejection of the security advice they receive is entirely rational from an economic viewpoint.  The advice offers to shield them from the direct costs of attacks, but burdens them with increased indirect costs, or externalities. Since the direct costs are generally small relative to the indirect ones they reject this bargain. Since victimization is rare, and imposes a one-time cost, while security advice applies to everyone and is an ongoing cost, the burden ends up being larger than that caused by the ill it addresses.

The paper provides both a good and accessible overview of externalities and rational behavior using spam as an example.

For example, Kanich et al. [32] document a campaign of 350 million spam messages sent for $2731 worth of sales made. If 1% of the spam made it into in-boxes, and each message in an inbox absorbed 2 seconds of the recipient’s time this represents 1944 hours of user time wasted, or $28188 at twice the US minimum wage of $7.25 per hour.

Coincidentally, we get a little over 300 million spam messages into our corporate email gateways every month, which means that I can compare the cost-per-delete-click (at $7.25/hour) against the cost of our corporate spam filtering contract without having to do any real math.  We pay about $50,000/month for filtering, which means that we’re getting a pretty good deal, especially since our white-collar employees cost over $14/hour.

That’s just the time that would be spent seeing and deleting the message, don’t forget.  Fourteen dollars per hour completely ignores the cost of attention disruption (much more than two seconds) and the Direct Losses, either because I cannot quantify them, which causes the entire argument to appear specious in the eyes of Senior Leadership, or because I am not at liberty to disclose enough detail to pass the “cannot quantify” test.
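
For anyone who does want to do the math, the arithmetic behind both the paper's example and the comparison above can be sketched in a few lines. The function name is mine, and the monthly figure assumes that, without filtering, every one of the 300 million messages would reach an inbox and cost two seconds to delete; that is an assumption about the framing, not a measured number.

```python
SECONDS_PER_HOUR = 3600
US_MINIMUM_WAGE = 7.25  # dollars per hour, as quoted from the paper


def hours_spent_deleting(messages: int,
                         fraction_reaching_inbox: float,
                         seconds_per_message: float = 2.0) -> float:
    """User hours spent looking at and deleting spam that reaches inboxes."""
    delivered = messages * fraction_reaching_inbox
    return delivered * seconds_per_message / SECONDS_PER_HOUR


# The paper's example: 350 million messages, 1% delivered, 2 seconds each.
kanich_hours = hours_spent_deleting(350_000_000, 0.01)
# The quoted $28188 matches whole hours (1944) times twice the minimum wage.
kanich_cost = int(kanich_hours) * 2 * US_MINIMUM_WAGE

# The gateway comparison: 300 million messages a month, assumed all delivered.
monthly_hours = hours_spent_deleting(300_000_000, 1.0)
monthly_cost_min_wage = monthly_hours * US_MINIMUM_WAGE
monthly_cost_white_collar = monthly_hours * 14.0
FILTER_CONTRACT_PER_MONTH = 50_000

print(f"paper example: {int(kanich_hours)} hours, ${kanich_cost:,.0f}")
print(f"unfiltered month at minimum wage: ${monthly_cost_min_wage:,.0f}")
print(f"unfiltered month at $14/hour: ${monthly_cost_white_collar:,.0f}")
print(f"filtering contract: ${FILTER_CONTRACT_PER_MONTH:,}")
```

Under those assumptions the delete-click time alone comes to roughly $1.2 million a month at minimum wage, which is why the $50,000 contract looks like a good deal even before attention costs and direct losses are counted.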

They then go on to document in fairly accessible models why password complexity, anti-phishing awareness, and SSL Errors are cost-inefficient, and get into a favorite topic of mine, the difficulty of defining security losses or the benefit from adding safeguards at the end-user level.  This section should be mandatory reading for any security person who attempts to talk to non-security people about the topic–i.e. all of us.

What’s missing from the paper, though, is the next logical step of analysis, the appropriate Risk Management strategy in response to the information presented. Hopefully that will be the follow-on paper, because as it was, it felt like a bit of a cliff-hanger to me.  All of the discussion assumes that mitigation is the only option.  This may feel right from a Security perspective, but it’s probably not the correct risk management decision.

To manage the risk in these cases, though, I see a strong argument for risk transfer.  High-Impact, Low-Likelihood events are best managed by aggregating the risk into a pool and spreading the cost across the pool, i.e. buying insurance against these losses.  If you could buy anti-phishing insurance for $1/person/year (which, realistically, is multiples of what it could cost if 200 million people all bought in) rather than throwing large, uncoordinated piles of money at ineffective awareness training or technical countermeasures which will probably be out-innovated by the attackers in hours or days, why wouldn’t you?

Why have anti-virus vendors not thought of this?  If your AV vendor said they would also insure you against Direct Losses (having your bank account cleaned out) for your $50/year subscription, would that differentiate them enough to win your business?
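
The pooling argument can be sketched as a break-even premium: expected loss per person, grossed up for the insurer's overhead. Everything numeric here is an assumption for illustration. The 0.01% victim fraction is borrowed from the paper's own illustrative figure quoted above, while the $500 average loss and the 50% overhead ratio are made up; the function name is hypothetical too.

```python
def break_even_premium(victim_fraction: float,
                       mean_loss: float,
                       overhead_ratio: float = 0.0) -> float:
    """Annual per-person premium at which pool income covers expected payouts.

    victim_fraction: share of the pool that suffers a loss in a year
    mean_loss:       average payout per victim, in dollars
    overhead_ratio:  share of each premium dollar spent on admin and margin
    """
    if not 0.0 <= victim_fraction <= 1.0:
        raise ValueError("victim_fraction must be between 0 and 1")
    if not 0.0 <= overhead_ratio < 1.0:
        raise ValueError("overhead_ratio must be in [0, 1)")
    expected_loss_per_person = victim_fraction * mean_loss
    return expected_loss_per_person / (1.0 - overhead_ratio)


POOL_SIZE = 200_000_000  # the 200 million people mentioned above

premium = break_even_premium(
    victim_fraction=0.0001,  # 0.01%, the paper's illustrative figure
    mean_loss=500.0,         # assumed average direct loss per victim
    overhead_ratio=0.5,      # assumed: half of each premium dollar is overhead
)
pool_income = POOL_SIZE * premium

print(f"break-even premium: ${premium:.2f} per person per year")
print(f"pool income at that premium: ${pool_income:,.0f} per year")
```

With these made-up inputs the break-even premium is well under $1/person/year, which is the shape of the claim above. Real victimization rates and loss sizes would move the number, and those are exactly the data the industry does not publish.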

By all means, we should continue to work on the challenges of improving the security experience and reducing the risk of using computers.  More accurately, though, we should be reducing the amount that must be experienced by users at all to improve security of their information and transactions.

How to Value Digital Assets (Web Sites, etc.)

Many security management methods don’t rely on valuing digital assets.  They get by with crude classifications (e.g. “critical”, “important”, etc.).  But if you need to do financial justification or economic analysis of security investments or alternative architectures, especially risk analysis, then you need something more precise and defensible.

This tutorial article presents one method aimed at helping line-of-business managers (“business owners” of digital assets) make economically rational decisions.  It’s somewhat simplistic, but it does take some time and effort.  Yet it should be feasible for most organizations if you really care about getting good answers.  Warning: No simple spreadsheet formulas will do the job.  Resist the temptation to put together magic valuation formulas based on traffic, unique visits, etc.

(This is a long post, so read on if you want the full explanation…) 

Botnet Research

Rob Lemos has a new article up on the MIT Technology Review, about some researchers from UC Santa Barbara who spent several months studying the Mebroot Botnet. They found some fascinating stuff and I’m looking forward to reading the paper when it’s finally published. While the vast majority of infected machines were Windows based (64% XP, 23% Vista), 6.4% were running either OS X Tiger or Leopard, demonstrating yet again that just because you have a Mac doesn’t mean you are safe. More interesting to me was:

The researchers also discovered that nearly 70 percent of those redirected by Mebroot–as classified by Internet address–were vulnerable to one of almost 40 vulnerabilities regularly used by the most popular infection toolkits designed to compromise computer systems. About half that number were vulnerable to the six specific vulnerabilities used by the Mebroot toolkit.

The research suggests that users need to update more often, says UCSB’s Vigna.

Unfortunately, until the paper comes out we won’t know which vulnerabilities were being used and how old they are. Hopefully, that will be explained further as it would be really interesting to see how this data compares with what Verizon found in their research.