Digging Into My Facebook Data

Piles of Books

Facebook has been in the spotlight lately, over a variety of issues related to how they collect and use the data of their “members”. Which means they’re doing a lot of apologizing and tinkering with their system, hoping to avoid more negative publicity and political interference.

But even without the recent problems, Facebook would be making alterations to their data policies, because of new laws in the European Union that go into effect next month. Among other features, the General Data Protection Regulation (GDPR) will give citizens of the EU the right to see the data companies have collected on them.

Which is probably one reason why Facebook is now offering a way to download a copy of the information they have on you. You’ll find a link to make the request under General Account Settings.

If you’re an active Facebook user, be prepared for a large file. They will be sending your entire timeline, all the messages you’ve sent and received, every photo and video you’ve uploaded, and more.

My file, however, was not large at all: a zipped file of 74 KB.

Although I registered for a Facebook account ten years ago, I’ve never posted anything in that time1 and very rarely comment on the posts of others. The only reasons I open the app a few times a month are to see the latest photos from friends and relatives, and to read new comics from Bloom County. I’m just not very social I guess.

In fact, the only even slightly interesting part of my Facebook data is in the Ads section, where we find a list of advertisers with my contact info. First advertiser: Cyndi Lauper. Farther down is Rod Stewart. Very odd.

The rest of the list includes a few companies I use regularly or from whom I’ve requested information. And many sites dealing with crowdfunding I’ve never heard of. I’m very sure I did not click on any ads for these firms in Facebook or on articles related to them.

All of which leads to a basic question: why did Facebook send my information to those advertisers? What did their algorithms find in my bland profile and very sparse timeline that led to those matches? I suspect some of this data came from the harvesting Facebook does on other websites.

Anyway, check out the data Facebook has stored in your account. You may find something even more interesting.


The image, piles of old-fashioned data, was taken by Michael Coghlan, posted to his Flickr account, and used under a Creative Commons license.

1. Ok, maybe not never. I found one post I made in April 2010: “Still on my ongoing effort to figure out the appeal of Facebook and why I would want to spend time on it. At least the iPad makes it easier than the iPhone app. :-)”. I’m still working on that.

The Problem Is Greater Than Facebook

Following up on the previous post, a few more random thoughts related to the current Facebook data security mess.

First, the problem with the collection and use of personal data extends far beyond Facebook. Google, Twitter, Instagram, WhatsApp1, Snapchat, and many other social media companies all offer services you don’t pay for.

All make money through selling you, their “members”, to advertisers. All have long, legally detailed terms of service, which you agreed to (even if you didn’t read them), that allow them to use your contributions and data in pretty much any way they want. Which brings up copyright issues that are a whole ‘nother rant.

But it’s not just social media collecting your data. Plenty of companies that charge for products and services – Apple, Samsung, Amazon, your phone and cable companies, your supermarket, gas station, and big box stores (remember your loyalty card?) – collect valuable data on your buying habits. And pretty much anything else they can find. Information they can use to make even more profits.

It will be interesting to see whether Europe’s new data security laws, which take effect in May, will impact the behavior of Facebook and the others. One major goal of the legislation is to give users more control over their data, including the ability to have some of it deleted. Facebook and other data-driven companies, on the other hand, are dependent on users willingly giving over their information and not caring what happens next.

Over here in the US, despite calls for investigation and pending lawsuits, our current laws probably don’t cover this situation. It’s also very unclear what new regulations on Facebook and other social media companies would look like, considering the long tradition of free speech rights in this country. Plus, if actual data breaches of the past are any indication, there isn’t a lot of political will to do anything related to consumer protection.

I’ve seen many calls on Twitter and elsewhere to delete your Facebook accounts. That’ll show them. Except it probably won’t, since the people who actually follow through are a very, very small fraction of their overall membership. Plus, Facebook will still have your data and has the infrastructure in place to continue following you around the web.

On top of everything else, Facebook makes it very difficult to actually delete an account. Bill Fitzgerald, my go-to guy for understanding data security and privacy issues, has some recommendations for people who want to try. If you’d rather continue using Facebook, check out Wired’s guide to the complicated world of their privacy and security settings.

Finally, when Mark Zuckerberg’s name comes up in the news, does anyone else picture Jesse Eisenberg in The Social Network? Considering Zuck’s, shall we say, “relaxed” attitude towards the privacy of his customers, I’m beginning to think the portrayal of him in that film wasn’t all that far from real life. Maybe he needs to hire Eisenberg to front him and get Aaron Sorkin to write the script. Certainly would be more entertaining.


Cartoon is by the wonderful Randall Munroe, posted at his site xkcd and used under a Creative Commons license. Check out his book What If? in which he answers absurd hypothetical questions with real science.

1. Instagram and WhatsApp are both owned by Facebook.

Selling Your Personal Data Is Their Business

Grid

You probably noticed that Facebook was in the headlines again this week.

Social media, TV pundits, and politicians were outraged over high profile investigative reports in the New York Times and the Guardian claiming that personal information on 50 million Facebook users had been harvested by a researcher in 2014 and used to create targeted political ads for the Trump campaign.

The details, of course, are far more complicated.1

For one thing, too many reports are calling what happened a “data breach”, often comparing it in some way to the Equifax story from last year. But the term breach implies that someone outside of Facebook, in this case a researcher for the UK-based data analysis company Cambridge Analytica, broke in and stole the information.

In fact, the researcher followed Facebook’s rules and only collected information from something like 270,000 users, all of whom consented to the process. Then, thanks to the Facebook terms of service and API2 that applied in 2014, he was also able to harvest data from all of their friends, which brings us to the 50 million number most often quoted.
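To get a feel for how consent from a relatively small group turns into such a large number, here is a rough back-of-the-envelope sketch. It uses only the two approximate figures quoted above and assumes, for simplicity, that friend lists don't overlap (in real life they certainly do), so treat the result as a ballpark and nothing more.

```python
# Approximate figures from the reports discussed above
consenting_users = 270_000      # people who agreed to share their data
profiles_reached = 50_000_000   # the total most often quoted

# Simplifying assumption: no overlap between friend lists
amplification = profiles_reached / consenting_users

print(f"Roughly {amplification:.0f} profiles reached per consenting user")
```

In other words, under that assumption each person who clicked "agree" brought a couple of hundred friends along without their knowledge.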

So, rather than having personal data stolen, Facebook gave it away. Or more likely, sold it.

Because that is their business model. It’s why the company has a market cap of around half a trillion dollars and CEO Zuckerberg has a net worth north of $60 billion.3

Facebook is very successful at collecting data from its more than two billion active members and then selling it to advertisers. Cambridge Analytica was one more advertiser and it didn’t matter that their ads were misleading and dishonest (at best). As long as the funds transfer went through.

Whatever you call this particular abuse of member data, it’s only the latest in a long string of arrogant and clueless decisions the company has made over its short history. And, even with new privacy laws in Europe and Congress critters fighting over the opportunity to hold hearings, it probably won’t be the last.

And this is as good a time as any to again point out two facts about Facebook that anyone with an account should remember (but probably doesn’t):

1. Facebook is a multinational corporation, not a community. Communities are built by people and, while it’s possible to create one using an online platform, the company itself is not going to make it happen.

2. Facebook membership is free. Which means you are not the company’s customer; you are the product they sell to advertisers. Monetizing your content and data is their first, maybe their only, concern.


I’m not sure the image has anything to do with this story.

1. In addition to the two articles linked above (the Times piece is probably a little better), Wired has done some of the best analysis of this story. This piece is a good place to begin.

2. API is application programming interface, the rules established by tech companies that allow outside code to communicate with their systems. In most cases, companies like Facebook provide very specific instructions as to what can be done with APIs.

3. Both took a big hit on Monday when Facebook’s stock dropped hard after investors spent the weekend digesting the Times and Guardian reports from Friday.

The European Approach to Protecting Your Data

Almost four years ago, the highest court in the European Union (EU)1 ruled that citizens of member countries had a “right to be forgotten”. Of course, that ruling left some holes and more than a few questions. But it did trigger some increasingly public conversations around the general topic of privacy and personal data.

That discussion, paired with some massive data breaches at high profile companies, led the EU Parliament to create a new set of laws2 dealing with data security and privacy. Those rules, the General Data Protection Regulation (GDPR), will become effective in the EU beginning in May.

In general, the GDPR sets strict guidelines for the kind of data that can be collected from individuals by companies and organizations, and how that data can be used. That data includes anything that can be used to specifically identify a person (including social media posts, location info, photographs, etc.), as well as not so obviously personal information like race, religion, and politics.

GDPR also requires companies to obtain more specific consent from the user and to explain more clearly how their data will be used. Specifically excluded is vague language like “Improving users’ experience”, “marketing purposes”, or “future research”. Companies must also make it easy for users to withdraw their consent and are then required to delete the material they’ve collected.

So what has any of this got to do with those of us not living in Europe? Plenty.

While the regulations are specific to the member countries of the EU, most of what I’ve read about them suggests that all of us in the US, and elsewhere in the world, will likely be affected by them.

The law applies to any company or organization that does business in the EU member countries and collects personal data from their citizens. That includes many based in the US, familiar names like Facebook, Google, Microsoft, Apple, and more. Since most multinational corporations shuffle information around the world, it’s very likely that they will need to adapt their data handling practices everywhere, not just in Europe.

Plus the law also provides for some pretty hefty penalties for misusing or failing to secure the data, including fines of up to €20 million or 4% of “global turnover”, whichever is larger. To put that in some perspective, €20m (about $24m US at the moment) is pocket change for Facebook. 4% of their total income is not.
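For the geeks in the audience, the "whichever is larger" rule is easy to sketch in a few lines of Python. This is only my reading of the cap as described above, not legal advice, and the turnover figures in the loop are hypothetical numbers picked to show where the 4% figure takes over from the flat €20 million. The function name is my own invention.

```python
FLAT_CAP_EUR = 20_000_000   # the fixed ceiling of 20 million euros
TURNOVER_PERCENT = 4        # the alternative ceiling: 4% of global turnover


def max_gdpr_fine(global_turnover_eur):
    """Return the upper limit on a fine: the larger of the two ceilings."""
    turnover_based = global_turnover_eur * TURNOVER_PERCENT / 100
    return max(FLAT_CAP_EUR, turnover_based)


# Hypothetical annual turnover figures, for illustration only
for turnover in (100_000_000, 500_000_000, 40_000_000_000):
    print(f"turnover €{turnover:,} -> maximum fine €{max_gdpr_fine(turnover):,.0f}")
```

The break-even point is a turnover of €500 million. Below that, the flat €20 million is the ceiling. Above it, the percentage applies, which is why the rule matters most to the biggest companies.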

I know, all of this is pretty geeky stuff.

However, it’s also important if you’re concerned about the data most companies are already collecting about you and others. If you’re interested in more details of the GDPR in basic, non-legal language, check out this rough guide to GDPR and/or this short summary directed at US corporations.

Of course, the EU laws are not perfect. There will likely be much confusion when they take effect, and when the first lawsuits follow not long after. It will be interesting to see whether the big data collectors will be forced to change their behavior. Or will they just find new ways to continue their current practices? After all, our information is the foundation of their massive profits.

Beyond that, there’s also the larger question of whether the US should implement similar laws. It’s not likely to happen in this political climate, with political “leaders” who claim that the “free market” will protect us all. But maybe some outside pressure on US-based companies will effect some needed change.


The map is from the BBC, showing the current configuration of the European Union. Of course, their home country, the United Kingdom, is in the process of a very contentious “Brexit” from the EU, so that map could change in 2019. In more than one way if the people of Scotland and Northern Ireland make some hard decisions.

1. Very tangential side note: I love that the official anthem of the EU is based on Beethoven’s “Ode to Joy”. Certainly more uplifting music than the militaristic tones of most national anthems.

2. In some of what I’ve read, experts say that GDPR isn’t so much “new” law as it is a clarification of many different data and privacy regulations that are already on the books, combined with court rulings. Either way, GDPR is likely going to change the way companies do business in the EU, and possibly elsewhere.

Average Doesn’t Necessarily Mean Fair

 

In all the yelling back and forth (aka “discussion”) about net neutrality rules last year, much was said and written about the “average” internet user in the US.

The FCC chairman, who led efforts to kill them, and his supporters claimed that competition in the market would take care of any issues related to ISPs who try to slow or block competitors on their networks. According to this theory, customers could just switch to another provider if their current ISP begins to play with the traffic.

Except that “average” doesn’t mean everyone is equal, and is usually a crappy way to understand any issue.

The map above illustrates just how bad the internet market is for most locations in the US. It uses actual data about the availability of high bandwidth access, the kind necessary to fully benefit from modern web services, and clearly demonstrates that it “varies greatly based on where you live”.

Average in this example is weighted very heavily in favor of metropolitan areas where households are likely to have at least two high bandwidth choices when it comes to internet service providers. The darker colors on the map show areas with fewer choices and slower speeds.
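Here is a small sketch of how a household-weighted average can look fine while hiding the people with little or no choice. To be clear, the numbers below are entirely made up for illustration and are not taken from the map or any real data set.

```python
# Hypothetical areas: (households, high-bandwidth providers available)
areas = [
    (900_000, 3),  # large metro
    (600_000, 2),  # smaller metro
    (150_000, 1),  # small town
    (100_000, 0),  # rural county
]

total_households = sum(households for households, _ in areas)

# Average number of providers, weighted by how many households live in each area
weighted_avg = sum(households * isps for households, isps in areas) / total_households

# Households that cannot switch providers because they have one option or none
no_choice = sum(households for households, isps in areas if isps <= 1)

print(f"Average providers per household: {weighted_avg:.1f}")
print(f"Households with no real choice: {no_choice:,} "
      f"({no_choice / total_households:.0%} of the total)")
```

With these invented numbers the "average" household has more than two providers to pick from, yet a quarter of a million households have one or none.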

But even in those lighter areas, like that splotch of white around Washington DC where I live, choice doesn’t necessarily mean competition. Our two major ISPs offer the same packages at the same price. And once the furor over this issue dies down, both are equally likely to favor their own content over competitors. We do have a few other, smaller, options buzzing around but they are not equivalent, even if the chairman wants us to believe they are.

So, maybe “average” is acceptable to those who dislike all governmental regulation. But it’s not to the millions who are below, and far below, that average.


On another issue, this map was created using data from a variety of public sources and ESRI’s wonderful Story Map application. You can zoom in to the county level to get more information, although you should pull the map into its own window for best results.