Diversity in Data Science & Statistics

by Jo Hardin

Calls for Diversity

Data science consists not only of tools, methods, and problems to solve, but also of the people who make up the statistics & data science community. The National Academies Report on Data Science for Undergraduates (see previous blog post at: https://teachdatascience.com/nasem) includes a section on “Ensuring Broad Participation,” which reiterates the importance of creating an inclusive community where all views are heard and supported. In their report they say:

According to the South Big Data Innovation Hub’s Keeping Data Science Broad, “the variety of perspectives such diversity [in terms of race, gender, religious affiliation, socioeconomic status, ethnicity, and first-generation status] provides is as essential as that provided by the trans-disciplinary nature of data science for innovation and growth of the field” (Rawlings-Goss et al., 2018, p. 29). The report explains that the first step in creating a more inclusive environment is to ensure that students and faculty alike—at all types of educational institutions—have equitable access to resources (e.g., high-quality data, tools, technology, adaptable and appropriate curriculum, and advisors). Also crucial to retaining broad participation in data science is a “culturally relevant curriculum,” a more diverse faculty, and collaborations between majority-serving and minority-serving institutions (Rawlings-Goss et al., 2018, p. 31).

Thus, it is the responsibility of academic institutions to ensure inclusion and broad participation and engagement in data science programs. Master (Nov 7, 2017, webinar) suggests that data science programs at higher education institutions increase exposure to data science fields, broaden beliefs about who belongs in these fields, challenge students’ beliefs about fixed abilities, and show that data science can make a difference in society in order to broaden participation and engagement in data science. Williams (Nov 7, 2017, webinar) suggests that faculty adjust curriculum to be more inclusive, create opportunities for students to engage in community data, affirm student ability, and create diverse teams of students. The efforts highlighted by Master and Williams not only lead to increased engagement, but they also stand to sustain participation of underrepresented populations in data science. If data science is to avoid a similar decrease in participation that occurred in the 1980s in computer science among female students, it is imperative that underrepresented students are supported both academically and through mentorship, recognizing the opportunities that the field of data science presents and the value they can add to it.

Some of the introductory data science courses described in this report have made inclusion and broad participation a central goal, shaping pedagogy, technical infrastructure, and staffing.

Additionally, the more perspectives that are brought to a field, the better off the field will be. A more diverse organization is more interesting, more socially conscious, and smarter. Scientific American recently laid out the benefits:

Decades of research by organizational scientists, psychologists, sociologists, economists and demographers show that socially diverse groups (that is, those with a diversity of race, ethnicity, gender and sexual orientation) are more innovative than homogeneous groups. (Phillips, 2014)

And hopefully it also goes without saying that, as educators, we all recognize that diversifying is just the right thing to do. But calls for diversifying any community are not always embraced or understood. We lay out a few ways that the statistics, data science, machine learning, and R communities are working to build inclusive spaces.


Those who follow the #rstats hashtag know that the community of R users is quite welcoming to women and gender minorities. A recent article by Reshama Shaikh details some of the positive steps that the larger R community has taken to become more inclusive.

A big part of that effort has come through the network of R-Ladies. From the R-Ladies website:

As a diversity initiative, R-Ladies’ mission is to achieve proportionate representation by encouraging, inspiring, and empowering the minorities currently underrepresented in the R community. R-Ladies’ primary focus, therefore, is on supporting the R enthusiasts who identify as an underrepresented minority to achieve their programming potential, by building a collaborative global network of R leaders, mentors, learners, and developers to facilitate individual and collective progress worldwide.


The ASA Committee on Minorities in Statistics organizes an annual StatFest Conference aimed at encouraging undergraduate students from traditionally marginalized backgrounds (in particular, Black, Hispanic, and Native students) to consider careers and graduate studies in the statistical sciences.

The conference includes:

  • presentations from established professionals, academic leaders, and current graduate students that will help attendees understand the opportunities and routes for success in the field,
  • opportunities for networking,
  • opportunities for attendees to submit and present posters describing their research, and
  • panel forums that will provide information and tips for a rewarding graduate student experience, achieving success as an academic statistician, opportunities in the private and government arenas, among other topics.

The first StatFest took place at Spelman College in 2001, and Fall 2019 (September 21, 2019) will be the 19th annual StatFest, hosted at The University of Texas Health Science Center at Houston. Encourage your undergraduate students of color to attend! Sign up here.

The Committee on Minorities in Statistics is also putting together a pre-JSM Diversity Workshop and Mentoring Program. Sign up here.

LGBTQ+ Resources

Recently, Significance magazine put together a comprehensive set of resources for statisticians and data scientists to use in the classroom and the workplace. Their focus on creating inclusive spaces addresses issues specific to the LGBTQ+ community (e.g., pronouns), but their approach to creating a safe space applies to all classrooms, professional settings, and individuals. Relatedly, Significance published an article, “Friends and allies: LGBT+ inclusion in statistics and data science,” that shares advice and makes recommendations for creating more inclusive and supportive statistics and data science spaces for gender non-conforming and LGBTQ+ persons.

Additionally, the resources above suggest inclusive data sets and offer tips for using data in class. For educators, introducing data thoughtfully, in a respectful and inclusive manner, seems a low bar that we should all be able to clear when working with students.

Moving Forward

It is worth pointing out that those who support data scientists who are women, LGBTQ+ persons, differently abled persons, and people of color continue to play a role in improving the dynamics of the R community. Shout out to those in positions of privilege who, when offered an invitation to speak at a conference, ask about the gender, racial, LGBTQ+, and able-bodied balance of the other speakers. The New York R Conference did a great job last month with their female speakers. Here’s to continued support of data scientists whose voices are not typically or publicly lifted up.

About this blog

Each day during the summer of 2019 we intend to add a new entry to this blog on a given topic of interest to educators teaching data science and statistics courses. Each entry is intended to provide a short overview of why the topic is interesting and how it can be applied to teaching. We anticipate that these introductory pieces can be digested daily in 20- or 30-minute chunks that will leave you in a position to decide whether to explore more or integrate the material into your own classes. By following along for the summer, we hope that you will develop a clearer sense of the fast-moving landscape of data science. Sign up for emails at https://groups.google.com/forum/#!forum/teach-data-science (you must be logged into Google to sign up).

We always welcome comments on entries and suggestions for new ones.