Friday, September 21, 2012

Information Classification

Information classification is the process of arranging information by shared characteristics.  It is a lot harder than meets the eye.  And if it’s the employees classifying data (increasingly a non-issue as there is way too much information), it’s more art than science, and more contextual guesstimating than measuring.  It is even harder when a user must determine what is a formal record (required for legal or regulatory reasons) and what is not.  And there is a really good explanation for why that’s the reality.

Today, around the globe, employees do business, real business, on Facebook, Twitter, blogs, SharePoint, text messages and email.  As email has been the business tool of choice for many years, and as billions of emails are sent in business every day, it’s a good place to start exploring just why 100% exactitude in classifying is not a reality.

Let’s delve into an example to start to understand just how complicated the mere act of classifying information can be. 

Lily, the manager of the sales support unit, gets the following email from Teddy, the head of the leasing business unit.  You make the call: is it a record and, if so, how should it be classified?

“Thanks for overseeing the Ace Leasing deal.  I thought your assistant manager, Dylan did a good job and I think he is ready for bigger challenges and a boost in pay.  It would have been useful if he brought contracting in sooner.  We should really think about how to make the documentation process touch fewer hands and simpler over all.  Also, we need to get implementation services involved ASAP.  Please have Riley from contracting confirm the pricing, as it wasn’t on the attached proposal.

Best, Teddy. 
BTW-say hi to your daughter Cooper.”

Millions of emails like this one are sent every day, all day long.  If you were asked to classify it, would you say it’s a record requiring long-term retention? If you did, what kind of record is it?

If you asked any employee to determine the business value of the email, they could classify it in many different, albeit CORRECT, ways.  Most employees predictably classify information from a parochial perspective, based on their own work experience.  If Lily, the recipient, classified it, her view would be colored by the utility of the email for her job or department. In that case maybe it’s a sales record which should be put in the Ace Leasing file.  On the other hand, as a manager she may see it as an HR-related record, which recommends advancing Dylan and getting him a pay raise.  Maybe it should even go into Lily’s personnel folder as being complimentary of her good management of her unit.  Maybe the email is a record for the contracting department, or instructions for the implementation of the project. Maybe it’s also a record for the Business Process Improvement team, to fix a business process that management thinks is broken.  Fact is, it could properly be classified as all those types of records.  Different record types have different retention periods associated with them.  And depending upon who does the classifying and what business unit they are from, the result may be substantially different.
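To make the point concrete, here is a minimal sketch of what happens when one email carries several valid classifications at once. The record types, the retention periods in years, and the "keep it for the longest applicable period" rule are all assumptions made up for illustration; a real retention schedule would supply its own values and its own tie-breaking rule.

```python
# Hypothetical record types and retention periods (in years).
# These numbers are illustrative only, not taken from any real schedule.
RETENTION_YEARS = {
    "sales": 7,
    "hr": 6,
    "contracting": 10,
    "implementation": 3,
    "process_improvement": 2,
}


def retention_for(record_types):
    """Return the longest retention period among the assigned record types.

    Assumed rule: when an item fits several record types, keep it for the
    longest of their retention periods. Returns None if no type applies.
    """
    if not record_types:
        return None  # not classified as a record at all
    return max(RETENTION_YEARS[t] for t in record_types)


# Teddy's email could defensibly be filed under every one of these types.
teddy_email = ["sales", "hr", "contracting", "implementation", "process_improvement"]
print(retention_for(teddy_email))

# Lily, filing it only as a sales record, would arrive at a different answer.
print(retention_for(["sales"]))
```

The same email gets a different retention period depending on who files it, which is exactly the inconsistency described above.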

Not surprisingly, employees are not particularly good at classifying information, even the smart ones, and if they don’t need to do it, they won’t, and they don’t even care. Now imagine each employee touches 100 information nuggets daily that need classification.  This partly explains why classification is so difficult.  It also makes the point that there are many subjective right answers. I believe many records could be properly classified in different correct ways.  We sometimes think there is only one right way.

For almost a decade I have been thinking about the use of auto-classification technology to classify and manage information.  I used to think it wasn’t ready for prime time.  Today it is really powerful when used properly.  I then got hung up on lawyers attacking it, given its known failure rate.  I got over that, as they attack everything anyway, and reasonableness and information volumes dictate relying on technology to do the heavy classification lifting.  Given information volumes, expecting employees to do the classifying is like asking your auditors to count the grains of sand on the beach and classify them according to size and shape.  And now I am down to how effective the technology has to be to allow your classification to be done by a computer.  There are no hard and fast rules about confidence ratings or efficacy scores (sometimes referred to as F-Score), even though most people would be substantially comforted if there were simple rules for what was good or good enough.
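For readers who have not met the F-Score before: in its common form (F1) it is the harmonic mean of precision (how often the tool is right when it assigns a category) and recall (how many of the items that belong in a category the tool actually found). The sketch below computes it for one category by comparing the tool's output with a human-reviewed sample. The function name and the sample labels are made up for illustration.

```python
def f_score(predicted, actual, category):
    """Compute the F1 score for one category.

    predicted: labels assigned by the classification tool
    actual:    labels assigned by human reviewers for the same items
    """
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p == category and a == category)
    false_pos = sum(1 for p, a in pairs if p == category and a != category)
    false_neg = sum(1 for p, a in pairs if p != category and a == category)
    if true_pos == 0:
        return 0.0  # nothing correctly found, so the score is zero
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical five-item sample: the tool disagrees with the reviewers once.
predicted = ["hr", "sales", "sales", "hr", "sales"]
actual = ["hr", "sales", "hr", "hr", "sales"]
print(round(f_score(predicted, actual, "sales"), 2))  # 0.8
```

Note that the score is only as good as the human-reviewed sample it is measured against, and that sample is subject to the same disagreements between reviewers described earlier.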

I know employees are not good at classification.  I know that employees don’t have time to do it and even if they did, they usually won’t get it right.  I know people classify information in different ways and rarely are consistent from employee to employee.  I know information volumes for most big businesses are growing at 20-50% per year.  I know computers can do classification.  I know it is not simple or cheap to do auto-classification.  I know it takes upfront effort to get auto-classification right.  I know that a company can’t dispose of business information without some diligence process to ensure that records are retained and evidence is preserved.  I know that I have concluded that every big business needs to consider defensible disposition of information using technology to make it happen.  In the end, I know people will attack the process and they will attack the auto-classification soft underbelly—the failure rate, the confidence score, the F-Score.  I used to think it had to be above 90% to be good enough. Then I thought well maybe 80% is good enough.

Well, I have changed my thinking because the paradigm bounding my thoughts on this topic was flawed. As the classification tool crawls, it uses linguistic and numerical analysis to determine what something is and how to classify it properly.  If the software tells me it believes it’s correct with a confidence score of 51% or higher, that means the software probably got it right, but maybe there is another category that is also a good option.  In the end, people do exactly what the technology does, but we hold technology to a different and higher standard.  I am not sure what the right confidence score is, but I think we need to give technology a chance and not look for reasons to dismiss its utility. Nothing’s perfect, including your employees.
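One way to picture "probably right, but another category is also a good option" is the sketch below. It assumes the tool reports an independent confidence score between 0 and 1 for each category; the 0.51 threshold comes from the paragraph above, while the function name, the 0.15 "close runner-up" margin, and the sample scores are assumptions chosen only for illustration.

```python
def pick_categories(scores, threshold=0.51, margin=0.15):
    """Return (best_category, close_alternatives) from per-category scores.

    scores:    dict mapping category name to a confidence between 0 and 1
    threshold: minimum confidence needed to accept the top category
    margin:    how close a runner-up must be to count as "also a good option"
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best, best_score = ranked[0]
    if best_score < threshold:
        return None, []  # not confident enough; route to a person
    alternatives = [cat for cat, s in ranked[1:] if best_score - s <= margin]
    return best, alternatives


# Hypothetical scores for Teddy's email.
scores = {"sales": 0.62, "hr": 0.55, "contracting": 0.20}
print(pick_categories(scores))  # ('sales', ['hr'])
```

A result like this is not a failure of the tool. It is the same judgment Lily would face, written down with numbers attached.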


Wednesday, June 6, 2012

Kahn’s 4 Keys to Defensible Disposition

With virtually no companies methodically applying retention rules to their ever-growing information heaps, and no practical way for employees to discern what is needed and what is digital data debris, you need to be thinking about how you will defensibly dispose of info crud.  After all, “innocent” technology folks have been forced to defend claims of destruction of evidence for merely recycling systems to make room for more stuff.  So here are Kahn’s 4 Keys to Defensible Disposition.   

1.   There is sufficient diligence (including review, audit, analysis by human and/or technology) to determine that the information subject to disposition is no longer needed for records retention or legal purposes.
2.   The analysis and diligence process is managed by individuals without any personal interest or incentive in the disposition of the specific content subject to disposition, and any disposition is undertaken with the agreement and oversight of the law department and relevant business unit heads.
3.   The disposition process followed is documented, routinized and repeatable and all disposition actions taken are authorized, final, complete and irreversible.
4.   Prior to any disposition, the affected business unit heads and the legal representative will receive sufficient notification of the proposed disposition actions to be able to immediately stop the disposition process if questions arise as to the appropriateness or legality of the disposition.
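The four keys above can be read as a checklist that a disposition request must pass before anything is deleted. Here is a minimal sketch of that idea. The class, its field names, and the messages are hypothetical and only illustrate the shape of such a gate; they do not describe any particular product or process.

```python
from dataclasses import dataclass


@dataclass
class DispositionRequest:
    """Hypothetical record of what has been done before a disposition."""
    diligence_done: bool        # Key 1: review/audit shows content is no longer needed
    reviewer_independent: bool  # Key 2: reviewers have no personal stake in the content
    legal_approved: bool        # Key 2: law department agrees
    business_approved: bool     # Key 2: relevant business unit heads agree
    process_documented: bool    # Key 3: documented, routinized, repeatable process
    notice_sent: bool           # Key 4: advance notice given to legal and business heads
    objection_raised: bool      # Key 4: someone asked to stop the disposition


def blockers(req):
    """Return the reasons this disposition may not proceed (empty if none)."""
    checks = [
        (req.diligence_done, "Key 1: diligence not completed"),
        (req.reviewer_independent, "Key 2: reviewer has a personal interest"),
        (req.legal_approved, "Key 2: law department has not agreed"),
        (req.business_approved, "Key 2: business unit heads have not agreed"),
        (req.process_documented, "Key 3: process is not documented"),
        (req.notice_sent, "Key 4: advance notice was not sent"),
        (not req.objection_raised, "Key 4: an objection is outstanding"),
    ]
    return [message for ok, message in checks if not ok]


request = DispositionRequest(True, True, True, True, True, True, objection_raised=True)
print(blockers(request))  # ['Key 4: an objection is outstanding']
```

Returning every blocker, rather than stopping at the first, also leaves a record of why a disposition was held up, which fits the documentation requirement in Key 3.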

Friday, February 10, 2012

Keep Clouds Floating

Who do you do business with? When you need to park your information does it matter what parking lot you select? Do you select based on cost? Do you select based on functionality? Perhaps based on both? What matters most?

I do believe in the cloud. I don’t believe in parking information assets with the cheapest cloud or the one that has a questionable future life. If information is worth storing then it must be worth protecting and having access to in the future. If you have any question about whether or not the Cloud will be floating next week, and you don’t know if you will have access to your data, then you should care.

Imagine a company builds a “cyberlocker” business in the Cloud. Basically it’s a cloud storage provider with a cool moniker. Let’s call the business Megaupload for fun. And let’s say Megaupload decides to use other cloud storage providers to park your data, sort of like outsourcing the “storage in the cloud” to another “storage in the cloud” provider. But let’s say Megaupload is alleged to have done some IP thievery, the government pursues it for the alleged criminal wrongdoing and, as a result, the US government closes Megaupload’s cloud doors for business.

And because the doors were closed without warning, you don’t have access to your information. What if you never get it back?

Imagine no more, because if you read the February 1, 2012 USA Today article entitled “Legit Megaupload users cut off from their files” you will realize the story is real and the Cloud risks you fear can come true. Kick the Cloud tires hard. Check the Cloud doors for tightness. Make sure the Cloud is mature and well financed and isn’t going away any time soon.

Information matters. Keep Clouds floating.

Are you kidding me

Tuesday, January 3, 2012

Bad Information Can Be Deadly

Bad info kills. Is it true that Yemeni officials gave the US bad intelligence info, prompting a missile strike which killed a Yemeni political figure instead of the al Qaeda leader the US was told it was targeting? Acting on bad info in any business affects results in major ways. No doubt Jabir Shabwani, the guy “mistakenly” killed, would agree that bad info can be deadly.

Are You Killing Me?

Read more in the Wall Street Journal, “U.S. Doubts Intelligence That Led to Yemen Strike” on December 29, 2011

Take Information Management Seriously

Criminal charges are being brought against BP engineers for the Deepwater Horizon disaster, the Gulf explosion that took 11 lives and created the worst environmental accident in US history. Apparently, the guys gave bad information to regulators which downplayed the risks of the deep water drilling operations. Do you think the engineers, who are being CRIMINALLY prosecuted, would make the same decisions as before if they got a “do over”? If providing bad information, destroying needed information and not retaining information can be the basis of prosecution, then shouldn’t we be taking its management more seriously?

Just saying, Are You Kidding Me?

Read more in the Wall Street Journal “Criminal Charges Are Prepared in BP Spill” December 29, 2011.