Careless researchers expose millions of Facebook users’ sensitive data

If you needed another reason to stop sharing intimate information with apps on Facebook or Facebook itself, consider this newest revelation: academics at the University of Cambridge have been using the data harvested through myPersonality, a popular personality app, as a basis for a tool used for targeting adverts based on personality types.

Access to the tool was reserved for those who paid for it but, by now, we’re all used to companies earning money by using our data.

The worse news in this case is that they put the data on a website to share with other researchers (for free) without thoroughly anonymizing it and that, for four years, this data was accessible to anyone who discovered access credentials posted on GitHub.

Poorly anonymized and secured data

According to New Scientist reporters, the data found sitting on the aforementioned site includes status updates, personal details (age, gender, relationship status) and “Big Five” personality scores of millions of users who used the app.

This information isn’t tied to a name or username, but to a unique ID associated with each user’s data set. As far as anonymization goes, that’s not nearly good enough: anyone could easily discover the Facebook identity of each user based on some of that data. And, as Facebook insists on real names being used for opening user accounts, in most cases this means that the data is tied to a real-world name and identity.
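To illustrate why swapping names for IDs is weak protection, here is a minimal Python sketch that counts how many records share the same combination of age, gender and relationship status. The records, field names and helper functions are all invented for this example and are not taken from the myPersonality data set; the check itself is a simple k-anonymity measurement, not a description of how anyone actually analyzed the data.

```python
from collections import Counter

# Hypothetical records: field names and values are invented for illustration
# and are NOT taken from the myPersonality data set.
records = [
    {"uid": "a1", "age": 34, "gender": "F", "relationship": "married"},
    {"uid": "b2", "age": 34, "gender": "F", "relationship": "married"},
    {"uid": "c3", "age": 52, "gender": "M", "relationship": "single"},
    {"uid": "d4", "age": 27, "gender": "F", "relationship": "engaged"},
]

# Attributes that are not names, but can still narrow down who someone is.
QUASI_IDENTIFIERS = ("age", "gender", "relationship")


def group_sizes(rows, keys=QUASI_IDENTIFIERS):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def k_anonymity(rows, keys=QUASI_IDENTIFIERS):
    """Return the size of the smallest group; 1 means at least one row is unique."""
    return min(group_sizes(rows, keys).values())


def unique_ids(rows, keys=QUASI_IDENTIFIERS):
    """List the pseudonymous IDs whose attribute combination appears only once."""
    sizes = group_sizes(rows, keys)
    return [row["uid"] for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


print(k_anonymity(records))
print(unique_ids(records))
```

In this toy set, two of the four IDs have an attribute combination nobody else shares, so anyone who knows those three facts about a person can pick out their record. Adding free-text status updates, as in the real data, would only make matching easier.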

Free access to this data was limited to scholars and researchers working at a variety of companies, who registered as collaborators on the project and received login credentials to the site. Unfortunately, one of these academics shared their credentials with students working on a course project, and the students inadvertently published them on GitHub.

So, for four years, anyone who discovered the credentials was free to use them and access the sensitive data.

Who’s to blame for this mess?

Everyone in this sorry mess is trying to shift the blame onto someone else.

The University of Cambridge says the app was created by David Stillwell, one of the two University of Cambridge academics behind the myPersonality project, well before he joined the university, and so it did not go through the university’s ethical approval processes.

Facebook suspended the app from its platform in early April, ostensibly due to policy violations (a poor explanation of how the collected data would be shared).

Stillwell says that Facebook was aware of the details of the myPersonality project for many years, and definitely before it changed its platform policies in 2014, when it reduced the data apps could access. He also says that researchers who were given access to the data set had to agree not to de-anonymize the data.

While his first point is valid, adequate anonymization of the data set surely had to be the project team’s responsibility. Alternatively, the team could have found a better way for the data to be used without providing direct access to it.

As it stands, the data was accessed and potentially de-anonymized and used by who knows how many authorized and unauthorized individuals and organizations, and copies of it are likely in too many hands for it ever to be permanently and unquestionably destroyed.

As an interesting side note: Dr. Aleksandr Kogan, the man who created the Facebook app that harvested user information that ended up being used by Cambridge Analytica, was at one time part of the project. He was still part of it in 2013, when Stillwell and his colleagues refused to provide Cambridge Analytica with access to their data because, they said, the company intended to use it for political purposes.