Practical Data Science: an introduction to the PeerJ collection

by Nicholas Horton

In 2017, Jenny Bryan and Hadley Wickham published the “Practical Data Science for Stats” PeerJ collection. (The papers were also published in a special issue of The American Statistician.)

The “Practical Data Science for Stats” Collection contains a series of short papers focused on the practical side of data science workflows and statistical analysis.

There are many aspects of day-to-day data analytical work that are almost absent from the conventional statistics literature and curriculum. And yet these activities account for a considerable share of the time and effort of data analysts and applied statisticians.

The goal of the collection is to increase the visibility and adoption of modern data analytical workflows and facilitate the transfer of tools and frameworks between industry and academia, between software engineering and Stats/CS, and across different domains.

We think that the set of papers is an invaluable contribution to the pedagogy of data science, particularly for those whose work and training have primarily been in statistics.

There are many ways to integrate the ideas into your data science classes.

  • One approach, built around the COPSS volume Past, Present, and Future of Statistical Science, is to have students pick a chapter and give a lightning talk (no more than three minutes and three slides) that describes why they picked the entry, one thing they learned, and one question that they still have. My rubric also includes having a “compelling opening line” and pushing their slides to the class GitHub repository by a given deadline. I could imagine doing the same type of activity using the PeerJ papers instead of the COPSS book.

  • Past blog entries have discussed directly incorporating some of the ideas into a classroom. For example, consider the three GitHub entries: GitHub, GitHub in RStudio, GitHub Classroom.

  • Stay tuned for future entries on the Teaching Data Science blog that are informed by the excellent series of PeerJ articles and ideas!

The papers in the PeerJ collection may also make great additions to your summer reading list.

About this blog

Each day during the summer of 2019 we intend to add a new entry to this blog on a given topic of interest to educators teaching data science and statistics courses. Each entry is intended to provide a short overview of why the topic is interesting and how it can be applied to teaching. We anticipate that these introductory pieces can be digested daily in 20- or 30-minute chunks that will leave you in a position to decide whether to explore more or integrate the material into your own classes. By following along for the summer, we hope that you will develop a clearer sense of the fast-moving landscape of data science. Sign up for emails at!forum/teach-data-science (you must be logged into Google to sign up).

We always welcome comments on entries and suggestions for new ones.