Data can provide information, information can lead to insight and knowledge, and knowledge is power. It’s no wonder, then, that seemingly everybody in this modern, computerized world of ours loves to suck data from everyone and everything.
We are getting used to it, more or less, but many are striving and fighting to set healthy boundaries for data collection and use.
Ethical considerations for data collection and use
Laura Norén, director of research at Obsidian Security and a sociologist with an interest in the social impact of technology and the ethics of data science, notes that there are four typical ethical considerations that come up in data-saturated projects:
- Privacy
- Individuals’ rights regarding their data
- Informed consent
- Fairness, accuracy, and transparency of data-informed, algorithmic technologies
With regard to privacy, the basic question is: are we protecting people’s ability to manage the way they want to be perceived by others, including corporations that make decisions about their creditworthiness, employability, character, and insurability?
“The second point is related to the first. We are entering a technological moment in which it is fairly easy to collect and store a more complete, lasting data portrait of an individual than we have ever been able to do before, and the implications of this are still unfolding,” she explains.
Do individuals have rights to their data after they die? Should it be deleted or can it be used? If it’s the latter, how will this affect the friends and family the deceased leave behind? Do they want the details of their lives, conversations, and relationships shared, and what can they do if they don’t? The right to be forgotten is being debated around the world. The EU’s legal implementation of the right to be forgotten is a good, if imperfect, first step.
When it comes to informed consent, theory and practice are greatly mismatched. It is good that privacy policies and terms of service exist and tell people what is happening with their personal data, but some of them aren’t written in an accessible way, which often leaves end users unable to understand them and give meaningful consent.
“The European Commission has started to investigate improvements to informed consent by requiring transparent, accessible language instead of dense legalese. Still, if data trails can potentially last for a very long period of time, and if the technological cost of communicating with individuals has declined, it may be beneficial to develop ongoing consenting procedures to replace the once-and-forever approach,” she opines.
Finally, as human history is riddled with bias and technologists draw on data generated from our biased organizations and cultures, the algorithms are likely to produce biased results.
“Some of these biases may lead to objectionable interventions, some may not. For instance, if policing algorithms replicate or exacerbate racial bias in police stops and arrests, we have a fairness problem. If the data and the algorithm cannot be audited so that we know why they are behaving as they do, we have a transparency problem that could be covering up an accuracy problem,” she explains.
The good news is that, within scholarly and open source communities, a great deal of effort is being made to encourage researchers and practitioners to share their data and their code so that other scholars can attempt to reproduce their work in a process akin to a scientific audit.
“Computer science and data science degree programs often require students to take courses that train them in the art and science of privacy protection, doing reproducible work, data encryption, bias detection, and research ethics. These courses set an expectation for engineers and developers to have the responsibility to consider the impact their work has on people and communities,” she also notes.
But coming up with a rigid set of rules and tools to address social impacts and social problems can be a fool’s errand, she believes.
“Flexible principles and an iterative process of monitoring (and tooling) is usually easier to implement and modify as technologies, regulations, and data availability changes. That said, there are some baseline considerations that should be adopted. Where privacy-sensitive data is not necessary to have, it should not be captured or stored. When privacy-sensitive data is captured, it should be encrypted at rest and in motion.”
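The first of those baseline considerations, not capturing privacy-sensitive data that isn't needed, can be illustrated with a minimal Python sketch. The field names and the `ALLOWED_FIELDS` allowlist are hypothetical, invented for this example, and do not describe any real product. Encryption at rest and in motion is deliberately left out: in practice it would be handled by a vetted cryptographic library and transport security rather than hand-written code.

```python
# Hypothetical allowlist: the only fields this imaginary service needs.
ALLOWED_FIELDS = {"account_id", "plan", "signup_date"}


def minimize(record: dict) -> dict:
    """Return a copy of `record` that keeps only allowlisted fields.

    Anything not explicitly allowed (for example a birth date or a
    home address) is dropped before the record is stored.
    """
    return {key: value for key, value in record.items() if key in ALLOWED_FIELDS}


# Example input with two privacy-sensitive fields the service does not need.
raw_record = {
    "account_id": "a-1001",
    "plan": "team",
    "signup_date": "2018-03-01",
    "birth_date": "1980-05-17",
    "home_address": "1 Example Street",
}

stored_record = minimize(raw_record)
```

An allowlist is used rather than a blocklist so that a newly added sensitive field is excluded by default until someone decides it is needed.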
Scientists as guardians of data
Data science is not the first field to grapple with the ethical impacts of science.
“Leaning on some of the great medical ethicists who ask themselves where to draw the line between trial subjects, scientists, and pharmaceutical companies and genetic material, I frame this issue as one of capable guardianship,” Norén says.
A capable guardian takes proper safeguards to make sure a data set is used in the way the data subjects are permitting it to be used. She encrypts it to prevent hackers from getting their hands on sensitive information. She explores using differential privacy techniques to further protect the most sensitive data, and knows how she will terminate the guardianship if subjects no longer trust her.
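To make the mention of differential privacy more concrete, here is a minimal Python sketch of one common technique, the Laplace mechanism applied to a counting query. It is an illustration under stated assumptions and not a description of what Norén or Obsidian Security use: the function names, the example data, and the choice of `epsilon` are all hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace distribution centered at 0.

    The difference of two independent exponential samples with the
    same rate is Laplace-distributed with scale 1 / rate.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the items in `values` matching `predicate`.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon. A smaller
    epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for value in values if predicate(value))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: ages of ten data subjects.
ages = [23, 35, 41, 29, 52, 61, 38, 44, 27, 33]
rng = random.Random()
noisy_over_40 = private_count(ages, lambda age: age > 40, epsilon=0.5, rng=rng)
```

The released value is a float and can even be negative. That is expected: the noise is what keeps any single person's presence in the data set from being inferred from the answer.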
“Capable guardianship is a great way to think about what our obligations are to the data we have about individuals and avoid turning the conversation into one of data ownership. The ownership model doesn’t fit well with data because data can so easily be replicated and thus refuses typical economic logic that ties value to scarcity,” she explains.
Building an ethical cybersecurity product
At Obsidian Security, the biggest challenge the team is encountering is responding to interest. They are taking the time needed to build a technologically and sociologically rigorous product and are almost at the point of taking on their first pool of clients.
The goal, after all, is to ship a product that will lower the risk, so they definitely won’t ship it without privacy protections in place.
“Almost by default, defensive cybersecurity provides broad social benefits. In other words, creating better cybersecurity products is a social good,” she notes.
“Keeping data about employees or customers in the organizations that they have consented to do business with fits with the basic principle of consented relationships. Providing alerts about data breaches – another typical cybersecurity goal – is also in line with basic ethical principles around transparency.”
The cybersecurity industry has the added advantage of being full of cynics and skeptics, who are quite likely to trade convenience for security.
For companies who want to increase transparency, one solution might be to add role-based permissions and an audit trail, she points out.
“Role-based permissions restrict access to the smallest number of people, making it less likely that sensitive data falls into the wrong hands or is toyed with by curious minds. Audit trails around access and use of data further cements the idea that these are precious data whose usage will be closely monitored. Anyone misbehaving around the data assets can then be retrained or have their permissions downgraded.”
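The combination of role-based permissions and an audit trail can be sketched in a few lines of Python. This is a simplified illustration, assuming a hypothetical `AuditedStore` class and made-up role names. A production system would persist the audit log somewhere tamper-resistant and take roles from an identity provider instead of a function argument.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical mapping of roles to the actions they may perform.
ROLE_PERMISSIONS = {
    "analyst": {"read_aggregates"},
    "privacy_officer": {"read_aggregates", "read_records"},
}


@dataclass
class AuditedStore:
    """In-memory record store that logs every access attempt."""

    records: dict
    audit_log: list = field(default_factory=list)

    def read_record(self, user: str, role: str, record_id: str):
        allowed = "read_records" in ROLE_PERMISSIONS.get(role, set())
        # Log the attempt whether or not it is allowed, so that
        # denied requests are visible to reviewers too.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "record_id": record_id,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"role {role!r} may not read individual records")
        return self.records[record_id]


store = AuditedStore(records={"r1": {"name": "Example Person"}})
record = store.read_record("dana", "privacy_officer", "r1")

try:
    store.read_record("sam", "analyst", "r1")
except PermissionError:
    pass  # the denied attempt is still recorded in store.audit_log
```

Writing the log entry before the permission check is enforced means that curious or mistaken access attempts leave a trace, which is what makes follow-up such as retraining or downgrading permissions possible.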