Nana Your Business! – My Perception of American Privacy

Knock Knock.
Who’s There?
Nana.
Nana Who?
Nana your business!

As I read through the press, blogs and message boards these past few weeks, I see a lot of what I suspect are misconceptions about the nature of the privacy we enjoy. So much of the common perception seems as shallow as the knock-knock joke above, with little more nuance than the exclamation point at the end of a child’s joke. Yet defining what we mean by privacy, and what we can expect of it, requires context. Even then, the term means very different things to different people at different times.

With that said, the first area I think of when it comes to privacy is health care, though that may just be because it comes up in so many of the groups to which I belong (MS groups and groups taking care of sick kids). I know when I go to the doctor I have to sign Health Insurance Portability and Accountability Act of 1996 (HIPAA) forms. To tell the truth, I bet most people think the acronym stands for Health Information Privacy Act, without even noticing there is another “A” at the end to make them pause. How many people know that what they sign simply says they have been given an opportunity to see the privacy practices of the establishment, and that they have the right to request additional restrictions on who may see or use their data? This is far different from how many seem to envision HIPAA, but then how many of us read the law or even take the time to look it up online?

Speaking of which, how many of us read all of the terms of agreement or access pages that pop up on so many websites? Last week, I attended a lecture given by Paul Ohm where he talked about the mistaken impression many of us have about our privacy. Here’s a hint: we don’t have much, if any. When it comes to those agreements at which most of us barely glance, he quoted a study saying that if U.S. consumers actually read all of them, it would take more than 3 billion hours, or said another way, roughly ten hours for every man, woman and child in the U.S. There is a reason we don’t read them. There is simply too much to process without grinding our lives to a halt.

Here is one of Paul Ohm’s papers: http://ssrn.com/abstract=1450006

After listening to him and the case studies, I would bet that combining a few publicly accessible databases would reveal the first and last name of every poster on every message board to which I belong, along with the addresses where we live. For a small example of how easy it is to identify people, how many other people do you think share your zip code, birthday (including year) and sex? For 87% of us the answer is 0 (page 1705 of the paper). Page 1717 has three great examples of data we thought secure enough to protect our privacy failing to do so. My favorite is William Weld, then governor of Massachusetts, assuring the public that medical records were safe because the agency releasing the database to researchers had stripped “SSN, address, name and other explicit identifiers” before making the data available. A graduate student bought a public database (the local voter rolls) for about $20, matched it against the released records, and sent the governor’s health records, including his diagnoses and prescriptions, to his office.
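To make the zip code, birthday and sex trick concrete, here is a minimal sketch of that kind of matching. Everything in it is my own invention for illustration: the records, the names, and the helper functions `key` and `link` are hypothetical, and a real attack would deal with far messier data.

```python
# Toy linkage attack. Every record below is invented for illustration.

# A "de-identified" medical release: names and SSNs are stripped,
# but zip code, birth date and sex are left in for researchers.
medical_release = [
    {"zip": "02138", "birth": "1950-03-14", "sex": "M", "diagnosis": "condition A"},
    {"zip": "02138", "birth": "1962-11-02", "sex": "F", "diagnosis": "condition B"},
    {"zip": "02139", "birth": "1950-03-14", "sex": "M", "diagnosis": "condition C"},
]

# A public list (think voter roll) that carries names plus the same three fields.
public_list = [
    {"name": "Pat Example", "zip": "02138", "birth": "1950-03-14", "sex": "M"},
    {"name": "Lee Sample", "zip": "02138", "birth": "1962-11-02", "sex": "F"},
    {"name": "Sam Placeholder", "zip": "02139", "birth": "1971-06-30", "sex": "M"},
]


def key(record):
    """The three fields both data sets share."""
    return (record["zip"], record["birth"], record["sex"])


def link(release, roster):
    """Attach a name to every released row whose key matches exactly one person."""
    names_by_key = {}
    for person in roster:
        names_by_key.setdefault(key(person), []).append(person["name"])

    matches = []
    for row in release:
        names = names_by_key.get(key(row), [])
        if len(names) == 1:  # a unique match is a re-identification
            matches.append((names[0], row["diagnosis"]))
    return matches


reidentified = link(medical_release, public_list)
for name, diagnosis in reidentified:
    print(name, "->", diagnosis)
```

Neither list is alarming on its own. The damage comes from the three fields they have in common.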

So how do we collect and use data without disclosing individuals’ information? Many companies and government agencies swear to protect your data, but how do they propose to do it? The three main ways I see this done are 1) stripping personal information from the files, 2) not releasing data at any level of detail that would let someone pick out another person’s information in a released data cell, or 3) distorting all of the data by enough to preserve “confidentiality” without distorting the results to the point of making the data useless for the purposes for which it was collected.

For the first method, stripping personal information, we saw how well the approach worked for William Weld in the previous example. Usually, I see approaches 1 and 2 combined. The problem is keeping track of all the relationships among data cells. If I can isolate all the records for people in a state, then all the records for people in a social group, then all the people who own cars and left state A on a given date, and all the people who… Remember the data used above to pick out a single person? That was only 3 data points at a single point in time. We provide so much more all the time. Even our cell phones alone give rough locations at multiple points in time.
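Here is a rough sketch of how stacking those queries shrinks the crowd. The person IDs and the query results are made up, and I am assuming each query simply hands back a set of IDs; the point is only how quickly the overlap collapses.

```python
# Hypothetical person IDs returned by separate queries, each harmless on its own.
queries = {
    "lives in state A": {101, 102, 103, 104, 105, 106},
    "belongs to a social group": {102, 104, 105, 109},
    "owns a car": {101, 102, 104, 105, 110},
    "left state A on a given date": {104, 111},
}

candidates = None  # nobody ruled out yet
sizes = []         # how many candidates survive after each query

for label, ids in queries.items():
    # Keep only the people who appear in every result seen so far.
    candidates = set(ids) if candidates is None else candidates & ids
    sizes.append(len(candidates))
    print(f"after '{label}': {len(candidates)} candidate(s) left")

print("remaining:", candidates)
```

No single result set names anyone, yet four of them together leave one person standing.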

Of course there is always option 3, where all of the data is distorted a little bit. On its surface this seems a bit reassuring, because suddenly there is no single point for my record. There are simply circles of possibility in which my true data point lies. The problem with this approach comes from much the same place as the example above. If I provide enough data points/circles, eventually my record is identifiable. The only limits on the ability to identify my point are the number of databases and the ability of computers to draw the circles of possibility and find the space shared by all of them. With all the data we provide voluntarily and the ever increasing speed of computers, how secure is my information?
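A small sketch of the overlapping circles idea, flattened to one dimension so each circle becomes a range on a number line. I am assuming every release shifts the true value by no more than a fixed amount and that whoever is looking knows that amount; the reported values, `NOISE_BOUND` and the `narrow` helper are all hypothetical.

```python
# Assumption: each release moves the true value by at most NOISE_BOUND,
# and the person doing the matching knows that bound.
NOISE_BOUND = 5.0

# Hypothetical distorted values for the same person from four different releases.
reported = [47.0, 40.0, 38.5, 47.5]


def narrow(reports, bound):
    """Intersect the ranges implied by each report and track the shrinking width."""
    low, high = float("-inf"), float("inf")
    widths = []
    for value in reports:
        low = max(low, value - bound)    # the true value cannot be below this
        high = min(high, value + bound)  # or above this
        widths.append(high - low)
    return low, high, widths


low, high, widths = narrow(reported, NOISE_BOUND)
print("width after each release:", widths)
print("true value lies between", low, "and", high)
```

Each release alone leaves a range ten units wide. Four of them together leave a range one unit wide, which is about as good as knowing the undistorted value.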

Now it’s true you may not know my exact answer, location or whatever other piece of data you are mining for. However, you know enough to make rough guesses about much of my life. When it comes to predicting facts about me now or how I will react in the future, were you ever going to do better? By this I mean even the most basic facts about individuals and their identification have some randomness on any given day. Years ago, I worked on a survey given every 5 years where one of the questions was about race. One of the most confusing things as an analyst was coming to understand why and how the same people reported different races than they had reported 5 years prior. There were logical reasons, but the point remains. Even the most basic things about us are not set in stone. So a rough picture of me and my likely predispositions and actions defines me pretty darn well… possibly even as well as the undistorted picture.

In summary, we live in a database-driven world. With or without this most recent Supreme Court DNA ruling or the evidence of phone companies cooperating with the NSA, the preconceptions some people seem to have about the privacy they enjoy are incredibly suspect. I just can’t get excited about the recent news. The type of privacy many of those who are upset think we had was discarded long ago. What’s more, we threw it or signed it away ourselves in the name of expediency.

What we need is a new, socially acceptable definition of the privacy we can reasonably expect. I’ll start. I want society to be able to approve, in broad terms, the types of uses allowed for data sets. The 3 billion hours of reading on privacy rights for every web site, along with different rules for each doctor’s office or hospital, creates an unrealistic definition of “acceptance.” It is too easy to change parts of these agreements without consumer awareness.

If we take the collection of data for granted, perhaps it is time to fall back to the next bulwark, the usage of data collected about individuals. As I understand things, this is the more European model for safeguarding privacy, but I know less about their concept of privacy.  Knowing ours is complicated enough.
