In which Jill dispels the maxim that sharing is caring.

I always marvel at the amazing web of relationships I’ve developed in my analytics career. I run into people who’ve worked with my team, people who know my clients, blog readers, and friends of friends. Analytics and data people have long memories. It’s best just to be nice to everyone because you never know who might be your next colleague (or boss!).

It was fun to get a note from a retail expert who attended a few of my keynotes and kept the conversation going:

Hi, Jill. I saw your keynote on data strategy at TDWI Las Vegas back in 2011. Then you appeared here in Rome in 2013, and I attended your talk with several colleagues.

One of your topics was classifying data, and we’re still talking about it. You showed a few examples of how one of your clients had established data categories. We are now at the point where we’ve established our own data classification scheme but cannot agree on which data should be “shared” and which data should be “public.” I’m not sure we even understand the difference! Can you help?

–Stefano, Rome

Ciao, Stefano! Since we saw each other last, the industry’s focus has shifted from authoritative and harmonized master data to harnessing streaming sensor data in real time — and everything in between. We’ve learned two things:

  • Companies aren’t capturing all the data they claim to need
  • Companies are only using a fraction of the data they capture

How do we reconcile these two lessons? By understanding the data in the context of its usage. Given data’s increasing volumes and complexities, classifying data is a good way to start.

Here’s an example of a data classification:

[Figure: Data Classifications, by Jill Dyché]

This example presents tiers of data based on organizational breadth or reach. It establishes how available and pervasive certain data is. However, this is by no means the only way to classify data. Data may also be classified by:

  • Audience type: For instance, business people viewing daily sales reports have different consumption behaviors than data scientists
  • Security tier: Which access rules will be assigned to different data types
  • Data source: Different systems of origin mandate different provisioning and usage rules
  • Value: Certain data might be more critical to your company’s risk decisions (think credit risk score) or to its financial performance (think profitability)

Shared data is different from public data. Public data is out there for the world to see, freely and inarguably viewable by a range of individuals inside and outside your company’s four walls. Your stock history, list of strategic business partners, and the weather data you use to optimize delivery routes are all examples of public data.

Shared data, on the other hand, involves specific parties. Your sales team might share product defect feedback with R&D. Your campaign managers might share PII (personally identifiable information) with your privacy office for compliance purposes. Shared data is more confined, so parties who don’t share or use it might not even know it exists.
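If your team wants to record this distinction in a data catalog, here is a minimal Python sketch of one way to do it. Every name in it (`Classification`, `DataAsset`, `can_view`, and the team labels) is hypothetical and made up for illustration; it isn't taken from any client's scheme, and a real catalog would carry far more metadata.

```python
from dataclasses import dataclass, field
from enum import Enum


class Classification(Enum):
    """Two of the tiers discussed above (hypothetical labels)."""
    PUBLIC = "public"
    SHARED = "shared"


@dataclass(frozen=True)
class DataAsset:
    """A catalog entry; the field names are illustrative assumptions."""
    name: str
    classification: Classification
    # Only meaningful for SHARED data: the specific parties involved.
    shared_with: frozenset = field(default_factory=frozenset)


def can_view(asset: DataAsset, team: str) -> bool:
    """Public data is viewable by anyone; shared data only by the named parties."""
    if asset.classification is Classification.PUBLIC:
        return True
    return team in asset.shared_with


# Examples drawn from the text: stock history is public,
# product defect feedback is shared between sales and R&D.
stock_history = DataAsset("stock_history", Classification.PUBLIC)
defect_feedback = DataAsset(
    "product_defect_feedback",
    Classification.SHARED,
    frozenset({"sales", "r_and_d"}),
)
```

The point of the sketch is that a shared asset carries an explicit list of parties, while a public one needs none. A team outside that list, say marketing, would get `False` from `can_view(defect_feedback, "marketing")`.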

The good news is the bad news: the burden is on the teams that govern and manage corporate data to know the difference between data that is public and data that is merely shared. Understanding data’s context will determine which classification fits best. Buona fortuna!

Original post on “Q&A with Jill Dyché” column on