Today’s blog is a compilation of datasets and data sources for a data science classroom that aims to bring in relevant, timely information about the issues of the day. We hope that the datasets below can be used in conjunction with some of this summer’s previous blogs, for example, those considering
- health implications when describing COVID data,
- language around describing social justice data, and
- learning outcomes for getting the most out of ethical data science discussions.
Before linking to the data, we encourage you to reflect on how data are collected and what impact poor data collection can have on any ensuing conclusions.
- In criminal justice datasets, race designation is often the reporting officer’s guess. Consider California Assembly Bill No. 953 from 2015.
- In their Data Equity Framework, We All Count details seven stages of looking at data projects, including data collection & sourcing.
The requirements for equitable data collection are complex. It’s not as simple as trying to ask everyone and not leave people out. Sample selection is important of course, but so is survey design, collector behaviour, scope and scale, cultural translation, collection mediums, data corruption, compatibility and fidelity and much more. It’s super worth doing, if for no other reason than your data will be more useful.
Christina Abraham discusses the impact of over-simplifying racial categories.
The Schusterman Family Foundation writes about “How we collect data determines whose voice is heard” and has provided guidance in “More Than Numbers: A Guide Toward Diversity, Equity and Inclusion in Data Collection.”
Researchers at the Urban Institute have put together a report, “The Alarming Lack of Data on Latinos in the Criminal Justice System.”
Campaign Zero has created the Police Scorecard Data to evaluate how police departments interact with the communities they serve.
The Stanford Open Policing Project provides information on over 200 million traffic stops across 42 states.
The Citizens Police Data Project collects and publishes information about police misconduct in Chicago.
The Police Data Initiative promotes responsible policing through the use of open data.
The National Archive of Criminal Justice Data curates data on criminal justice, with close to 3,000 studies and datasets.
ProPublica has compiled datasets related to criminal justice on a wealth of issues.
FiveThirtyEight’s data include many studies related to criminal justice.
- ProPublica has compiled datasets related to the environment.
- An article from the Pulitzer-winning Washington Post series, “Dangerous new hot zones are spreading around the world,” with data sources and an explanation of the data.
Race & gender
Tidy Tuesday dataset on African American Achievements.
Tidy Tuesday dataset on the slave trade.
A Washington Post article, “Postal Service warns 46 states their voters could be disenfranchised by delayed mail-in ballots.” Jacob Bogage is collecting data and will likely post it publicly.
Large Data Archives
DrivenData crowd-sources solving data science problems with positive social impact.
FiveThirtyEight is a data journalism website which started by doing political analyses but now uses data to cover politics, science, economics, and lifestyle. They provide access to many of their datasets.
About this blog
Last summer we wrote a series of blog entries, Teach Data Science, designed to start conversations around teaching data science. We covered topics such as data science software, data ingestion, data technologies, data wrangling, visualization & exploration, communication, and key reports and findings on data science.
One key element that was lacking in our 2019 blog was a discussion of, and a commitment to, teaching the ethical aspects of data science. We have now found ourselves in the summer of 2020, overwhelmed by the state of the world and recommitted to engaging with the ethical challenges that can help data science be a positive force for change.
Although none of us are experts in ethics, we have all included ethics discussions in our classrooms for many years. In the weeks to come, we will share some of the ways we engage our students in these important topics. We will provide resources for readings, examples, datasets, and exercises. We believe that data ethics are part of every data science analysis and classroom experience, and we hope that this summer’s blog will entice you into presenting ethical dilemmas and related conversations to your students early and often.
During the summer of 2020, we wrote a dozen or so blog entries. We hope that you bookmark the site and check in regularly. Want a reminder? Sign up for emails at https://groups.google.com/forum/#!forum/teach-data-science (you must be logged into Google to sign up).