One of the innovations we’ve worked on at DataBanker is our method for measuring the somewhat abstract concept of “Privacy.” The measurement and analysis of privacy factors is surfaced in a Privacy Score, which provides feedback on an entire data portfolio as well as a graphic measurement of privacy disclosure for a single data source.
Before I’m done explaining how I scored privacy, I have to climb a mountain and explain a little social network and graph theory. However, before we uncoil our rope, I’d like to address the question of why I decided to provide what appears to be the first practical Privacy Score.
My first task upon joining that earlier company was to gather data samples from various sources so that their file formats could be analyzed and the upload and conversion software developed. Friends and family were asked for their data, and their uniform response was “Why, and what are you going to do with it?” After explaining the company’s personal data marketplace business proposition (when you’re the first, it may take a couple of tries), it turned out that a key part of the “what are you going to do with it?” question was really about our security and privacy practices. The question was expected, but the level of concern from friends and family was not. People who felt I could be trusted with key assets and their personal health were asking me this question, and not sending their data until it was sufficiently answered. It must be a REALLY important question.
We responded by incorporating privacy as a key component of our business plan and service architecture. We developed innovative methods and designs so that those concerns were answered, and we filed for patents on several of the ideas we discovered in treading this new path. The concepts and design behind the Privacy Score were among those filings. Yeah, I’m kind of proud of that.
That’s the “Why?”, now let’s start the climb and I’ll explain the “How?”.
As luck would have it, at the time I started gathering data for the marketplace inventory, I was also reading Duncan Watts’ excellent book, Six Degrees: The Science of a Connected Age. As I was reading, I started thinking about how reputation and privacy were closely linked, and how they were examples of a “cascade” type of power law relationship. To put it into layman’s terms: If small rocks are stacking up on a hillside ledge, the rock pile will build up until slides have a high probability of happening with each new addition. It’s a gradual build-up with occasional small, limited releases, until a big rock comes along and knocks it all away.
The core idea is easy enough to grasp: release enough information about yourself and you will eventually lose some privacy. That’s a pretty simple power law function, where all elements carry roughly equal weight and keep accumulating until small slides start happening. The probability of a slide – a metaphor for an impact to your relationships – increases greatly as you reach the knee of the curve and keep adding stuff.
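The build-up can be sketched numerically. The toy Python function below shows how the probability of a slide stays low at first and climbs sharply near the knee of the curve; the function shape, knee, and exponent are my own illustrative choices, not the values from the actual scoring model:

```python
def slide_probability(n_elements, knee=20.0, exponent=3.0):
    """Toy power-law build-up: probability of a small 'slide' (a
    reputation impact) as roughly equal-weight elements pile up.
    Hypothetical shape and constants, purely for illustration."""
    x = n_elements / knee
    # Saturating curve: near zero below the knee, rising steeply past it.
    return x ** exponent / (1.0 + x ** exponent)

# Risk grows slowly at first, then sharply once past the knee.
low = slide_probability(5)    # few elements released
high = slide_probability(40)  # well past the knee
```

With five elements released the toy probability stays small; at twice the knee it is most of the way to certainty, matching the “gradual build-up, then slides” intuition.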
What was missing from many privacy discussions was the big thing that comes crashing like the Kool-Aid Man through your rockpile, and sends your reputation flying. That’s the “KAM Factor”.
The base for the Privacy Score is a power law function modified by a kicker – the “KAM Factor”. The next tweak to the equation addressed the question: once your privacy and reputation have sustained the KAM release, how much worse can things get? It turns out that they can get a bit worse. If Punchy (from Hawaiian Punch) and a Red Bull come running down the mountain with the Kool-Aid Man, the rock pile is going to release with a little more energy than the Kool-Aid Man alone. If your reputation suffers because you are an embezzler, and on your way to court you also get cited for littering, you’ll find that the judge can think even less of you. The power law remains active as a minor component of your Privacy Score once the KAM Factor has been established as the main driver of privacy and reputation.
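To make that shape concrete, here is a hedged Python sketch of a score where the power law keeps building until a KAM element appears, after which the largest KAM dominates, additional KAMs add a little more energy, and the power law stays on as a minor term. The constants and weightings are assumptions for illustration, not the patented formula:

```python
def privacy_risk(element_weights, kam_weights=(), knee=20.0, exponent=3.0):
    """Toy score: power-law build-up from ordinary elements, dominated
    by a 'KAM Factor' kicker once a high-sensitivity element is present.
    All shapes and constants are hypothetical."""
    x = sum(element_weights) / knee
    base = x ** exponent / (1.0 + x ** exponent)  # gradual build-up
    if not kam_weights:
        return base
    # The biggest rock dominates; extra KAMs (Punchy, the Red Bull)
    # contribute a smaller additional release of energy.
    ordered = sorted(kam_weights, reverse=True)
    kam = ordered[0] + 0.1 * sum(ordered[1:])
    # The power law remains active, but only as a minor component.
    return min(1.0, kam + 0.1 * base)
```

In this sketch, an embezzlement-sized KAM (weight 0.8) pushes the score far above the ordinary build-up, and a littering citation on top of it (weight 0.3) nudges it a bit higher still.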
The above description provides a map to the shape of, and key factors in, the privacy scoring model. What’s not described is how the sensitivity weights for the data elements are assigned. That requires an understanding of which data elements are KAMs – in other words, which elements people will be sensitive about releasing. There are two key components to sensitivity concerns, which can be simply stated as: “What do they know?” (we call this “Persona”) and “Do they know that it was me?” (this is “Identity”).
Persona elements describe the container that is a person: what kinds of things they like, what products they own, and what they do. A well-rounded Persona is a reputation impact that hasn’t happened yet, but one that could occur if an Identity assignment is made.
Identity elements are those that decrease one’s anonymity. Identity elements are things like a name, an online alias, an email address, an SSN, or an account number. A frequented location is also a type of Identity element, but one that behaves differently from the naming elements just described. A location can also be a Persona element, describing the environment in which you did your activity. When the sensitivity scores for Identity elements are added up, location-related elements are therefore separated out and treated differently before finally being added into both the Identity and Persona components of the Privacy Score.
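As a sketch of that separation, the following Python partitions released elements into Identity, Persona, and Location buckets before any scoring; the element names and the taxonomy sets are hypothetical examples drawn from the description above, not a real schema:

```python
# Hypothetical element taxonomy; the names are illustrative only.
IDENTITY = {"name", "alias", "email", "ssn", "account_number"}
PERSONA = {"likes", "products_owned", "activities"}
LOCATION = {"home_location", "frequented_location"}

def partition_elements(elements):
    """Split released elements into Identity, Persona, and Location
    buckets. Location is pulled out first so it can be treated
    differently, then folded into both Identity and Persona scoring."""
    buckets = {"identity": [], "persona": [], "location": []}
    for element in elements:
        if element in LOCATION:
            buckets["location"].append(element)
        elif element in IDENTITY:
            buckets["identity"].append(element)
        elif element in PERSONA:
            buckets["persona"].append(element)
    return buckets
```

A release containing a name, a list of likes, and a frequented location would land in all three buckets, with the location later contributing to both component scores.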
Once the Identity and Persona component scores are calculated, they are factored together to produce the Privacy Score. That Privacy Score is an estimate of the risk of reputation impact from the information you are releasing. It’s worth stating that the Privacy Score does not include a factor for whether the potential impact would be positive or negative for your reputation.
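One hedged way to picture the factoring is multiplicative: a rich Persona with no Identity linkage (or the reverse) yields a low score, because impact requires both knowing things and knowing who. The source doesn’t give the actual combination formula, so this Python sketch, including how location folds into both sides, is purely illustrative:

```python
def privacy_score(identity_score, persona_score, location_score=0.0):
    """Toy combination of component scores (each assumed in [0, 1]).
    Location contributes to both components; multiplicative factoring
    is an assumption, not the method from the actual model."""
    identity = min(1.0, identity_score + location_score)
    persona = min(1.0, persona_score + location_score)
    # High risk only when both "what they know" and "who it was"
    # are substantially established.
    return identity * persona
```

Under this assumption, a detailed Persona with zero Identity scores near zero, while strong values on both sides drive the score toward one.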
The Privacy Score is calculated using probabilities and sensitivity weights appropriate for a general audience. If your data contains values that are far from normal, or if someone already holds a secret about you that is also released to them in an aggregated but detailed report, your probability of reputation impact increases. Likewise, the default sensitivity values assigned to single elements may differ from what you would set for those elements yourself. A personalized service should let you tune those sensitivities to your individual tastes.
The Privacy Score algorithm provides a unique view into the amount of information you are releasing about yourself, but it also has limits. Please spend time to understand how those limits relate to your circumstances and feelings around privacy. Remember that you own the final responsibility for your Privacy and Reputation.