Philosophical Ethics for Data Science

by Jo Hardin

In philosophy departments, classes and modules centered on data ethics are nothing new. The ethical challenges of working with data are not fundamentally different from the ethical challenges philosophers have always faced. But putting an ethical framework around data science principles (see here and here) is indeed new for most data scientists, and many of us are woefully under-prepared to teach so far outside our comfort zone.

We start with a grounding in the definition of ethics:

Ethics, also called moral philosophy, has three main branches:

Applied ethics “is a branch of ethics devoted to the treatment of moral problems, practices, and policies in personal life, professions, technology, and government.”

Ethical theory “is concerned with the articulation and the justification of the fundamental principles that govern the issues of how we should live and what we morally ought to do. Its most general concerns are providing an account of moral evaluation and, possibly, articulating a decision procedure to guide moral action.”

Metaethics “is the attempt to understand the metaphysical, epistemological, semantic, and psychological, presuppositions and commitments of moral thought, talk, and practice.”

While, unfortunately, there are myriad examples of ethical data science problems (see, for example, blog posts bookclub and data feminism), today’s entry connects some of the broader data science ethics issues with the existing philosophical literature. We point out that we are only scratching the surface and a deeper dive might involve education in related philosophical fields (epistemology, metaphysics, or philosophy of science), philosophical methodologies, and ethical schools of thought. You can peruse all of these through, for example, a course or readings introducing the discipline of philosophy. Indeed, maybe this blog post will whet your appetite to create a vital interdisciplinary bridge (here between philosophy and data science) which can only enhance both fields.

Below we provide some thoughts on how to approach a data science problem using a philosophical lens.

Case Study to Structure Ethical Discussion

Many ethics case studies provided in a classroom setting describe algorithms, built on data, that are meant to predict outcomes. Large-scale algorithmic decision making presents particular ethical predicaments because of both the scale of impact and the “black-box” sense of how the algorithm is generating predictions.

Consider the well-known issue of using facial recognition software in policing. There are many questions surrounding the policing issue: what are the action options with respect to the outcome of the algorithm? What are the good and bad aspects of each action and how are these to be weighed against each other?

The two main ethical concerns surrounding facial recognition software break down into how the algorithm is developed and how the algorithm is used. When thinking about the questions below, reflect on the good aspects and the bad aspects and how one might weigh the good against the bad.

Creating the algorithm

  • What data should be used?
    • If the accuracy rates of the algorithm differ based on the demographics of the subgroups within the data, is more data and testing required?
  • Who should tune the algorithm, and what criteria should they use?
    • Who should be involved in decisions on the tuning parameters of the algorithm?
    • Which optimization criteria should be used (e.g., accuracy? false positive rate? false negative rate?)
  • Issues of access:
    • Who should own or have control of the facial image data?
      • Do individuals have a right to keep their facial image private from being in databases?
      • Do individuals have a right to be notified that their facial image is in the database? For example, if I ring someone’s doorbell and my face is captured in a database, do I need to be told? [While traditional human subjects and IRB requirements necessitate consent to be included in any research project, in most cases it is legal to photograph a person without their consent.]
    • Should the data be accessible to researchers working to make the field more equitable? What if allowing accessibility thereby makes the data accessible to bad actors?
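
To make the questions about subgroup accuracy and optimization criteria concrete for students, here is a minimal sketch in Python that computes accuracy, false positive rate, and false negative rate separately for each demographic group. Everything in it is an assumption made for illustration: the function name `error_rates_by_group`, the group labels, and the toy records are invented, and they do not come from any real facial recognition system or dataset.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute accuracy, FPR, and FNR for each group.

    `records` is an iterable of (group, actual, predicted) tuples, where
    `actual` and `predicted` are booleans (True = "match").
    This is a hypothetical helper written for classroom discussion.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, actual, predicted in records:
        if actual and predicted:
            key = "tp"  # correctly flagged as a match
        elif actual:
            key = "fn"  # a true match that was missed
        elif predicted:
            key = "fp"  # wrongly flagged as a match
        else:
            key = "tn"  # correctly not flagged
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        total = sum(c.values())
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        rates[group] = {
            "accuracy": (c["tp"] + c["tn"]) / total,
            # Rates are undefined (None) if a group has no such cases.
            "false_positive_rate": c["fp"] / negatives if negatives else None,
            "false_negative_rate": c["fn"] / positives if positives else None,
        }
    return rates


# Made-up toy data: (group, actual, predicted). Not real measurements.
records = (
    [("group_a", True, True)] * 3
    + [("group_a", True, False)] * 1
    + [("group_a", False, False)] * 3
    + [("group_a", False, True)] * 1
    + [("group_b", True, True)] * 4
    + [("group_b", False, False)] * 2
    + [("group_b", False, True)] * 2
)

rates = error_rates_by_group(records)
for group, group_rates in sorted(rates.items()):
    print(group, group_rates)
```

In this toy data the two groups have the same overall accuracy, yet members of `group_b` are wrongly flagged twice as often as members of `group_a`. That gap is one way to show students why the choice of optimization criterion is an ethical decision and not only a technical one.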

Using the algorithm

  • Issues of personal impact:
    • The software might make it easier to accurately associate an individual with a crime, but it might also make it easier to mistakenly associate an individual with a crime. How should the pros and cons be weighed against each other?
    • Do individuals have a right to know, correct, or delete personal information included in a database?
  • Issues of societal impact:
    • Is it permissible to use facial recognition software which has been trained primarily on Caucasian faces, given that this results in false positive and false negative rates that are not equally distributed across racial lines?
    • While the software might make it easier to protect against criminal activity, it also makes it easier to undermine specific communities when their members are mistakenly identified with criminal activity. How should the pros and cons for different communities be weighed against each other?
  • Issues of money:
    • Is it permissible for a software company to profit from an algorithm while having no financial responsibility for its misuse or negative impacts?
    • Who should pay the court fees and missed work hours of those who were mistakenly accused of crimes?

To settle the questions above, we need to study various ethical theories, and it turns out that the different theories may lead us to different conclusions. As non-philosophers, we recognize that the suggested readings and ideas may come across as overwhelming. If you are overwhelmed, we suggest that you choose one ethical theory, think carefully about how it informs decision making, and help your students to connect the ethical framework to a data science case study.

Some Readings in Philosophy

In order to break down the algorithmic steps outlined above, students probably need some grounding in ethical structures. You might choose to use only one of the following as a lens to investigate an ethical case study. Alternatively, you may provide snippets from a variety of sources and compare how the different ethical frameworks would inform the algorithmic decisions.

We note that some of the links above point to reference articles (instead of original or scholarly works). While we recognize the value in reading original sources, we also have experience in the difficulty of grappling with new philosophical ideas (especially for students in disciplines outside of philosophy or ethics). If you are using original sources in your classroom that are accessible to data science students, please add the sources to our discussion of this blog, as we are certain that the citations would be much appreciated in the data science community!


Thanks much to Julie Tannenbaum and Michael Spezio whose careful reads of previous drafts led to a much improved blog entry.

About this blog

Last summer we wrote a series of blog entries designed to start conversations around teaching data science, Teach Data Science. We covered topics such as data science software, data ingestion, data technologies, data wrangling, visualization & exploration, communication, and key reports and findings on data science.

One key element that was lacking from our 2019 blog was a discussion of, and a commitment to, teaching the ethical aspects of data science. We now find ourselves in the summer of 2020, overwhelmed by the state of the world and re-committed to addressing the ethical challenges so that data science can be a positive force for change.

Although none of us are experts in ethics, we have all included ethics discussions in our classrooms for many years. In the weeks to come, we will share some of the ways we engage our students in these important topics. We will provide resources for readings, examples, datasets, and exercises. We believe that data ethics are part of every data science analysis and classroom experience, and we hope that this summer’s blog will entice you into presenting ethical dilemmas and related conversations to your students early and often.

During the summer of 2020, we plan to write a dozen blog entries. We hope that you bookmark the site and check in regularly. Want a reminder? Sign up for emails at!forum/teach-data-science (you must be logged into Google to sign up).