Introduction: A summer of data science education

Why another Data Science Education blog?

This is an exciting time to be teaching students how to extract meaning from data. Amidst the flood of information available in almost all domains, there has been a flourishing of powerful, open-source tools to help with the process. For instructors, the many changes can be hard to keep up with. In this blog, we’re hoping to create a roadmap for faculty development that will ease the learning curve and help busy people incorporate new tools and approaches into their teaching.

Each day during the summer we intend to add a new entry on a given topic, along with a short overview of why it is interesting and how it can be applied to teaching. We intend to keep the entries short and easy to comprehend, with the goal that they will motivate you to dive deeper. We hope that these introductory pieces can be digested daily in 20- or 30-minute chunks that will leave you in a position to decide whether to explore more or integrate the material into your own classes. We’ll include next steps and additional readings to allow you to explore more as you have interest and time. Our focus will be the R environment (e.g., tidyverse and RStudio) with occasional mention of other relevant tools.

There is definitely an art to googling well that not everyone (including the three of us) can master. The data science field is also moving quickly, so answers from useful sites such as Stack Overflow may quickly become out of date. Our ambition is that by reading the short overview entries, a variety of instructors will take the opportunity to learn more about the exciting developments in data science and statistics.

To get started, we plan to blog daily during summer 2019, starting on Tuesday, May 28th. We hope that you bookmark the site and check in regularly. Want a reminder? Sign up for emails at https://groups.google.com/forum/#!forum/teach-data-science (you must be logged into Google to sign up).

What topics will the blog cover?

We plan to cover the entire data science analysis cycle:

data ingestion, data technologies, and data wrangling
visualization and exploration
workflow and reproducibility
communication and reporting

as well as providing overviews of key reports and findings.

We welcome suggestions for topics: don’t hesitate to share your ideas (guest entries are also welcome!).

Who are we?

Hunter

Hunter Glanz (twitter) is an Assistant Professor of Statistics and Data Science at California Polytechnic State University (Cal Poly, San Luis Obispo). He received a BS in Mathematics and a BS in Statistics from Cal Poly, San Luis Obispo, followed by an MA and PhD in Statistics from Boston University. He maintains a passion for machine learning and statistical computing, and enjoys advancing education efforts in these areas. In particular, Cal Poly’s courses in R, SAS, and Python give him the opportunity to connect students with exciting data science topics amidst a firm grounding in communication of statistical ideas. Hunter serves on numerous committees and in organizations dedicated to delivering cutting-edge statistical and data science content to students and professionals alike. The ASA’s DataFest event at UCLA, for example, has been an extremely rewarding experience for the teams of Cal Poly students Hunter has had the pleasure of advising.

Jo

Jo Hardin (twitter) is a statistician at Pomona College who is passionate about statistics and data science education for all. She received her BA from Pomona College and her MS and PhD degrees from the University of California, Davis. Two years of working at the Fred Hutchinson Cancer Research Center in Seattle, WA got her hooked on analyzing high-throughput data (e.g., simultaneous gene expression of thousands of genes). Much of her theoretical work has focused on computational approaches to statistical problems in genetics. Jo works hard to provide her students with current best practices, including teaching the tidyverse in Introduction to Statistics. Beyond the classroom, she has also worked to engage students in statistics and data science: she has worked on the ASA’s curriculum guidelines task force, sent groups of students to UCLA’s DataFest competition, and hosted a local StatFest conference. Through her endeavors, she is working to help undergraduates learn about analyzing data in the wild and gain the skills to be effective and ethical when tasked with making claims from data. This fall she will be teaching statistics inside the California Rehabilitation Center through the international Inside-Out Prison Exchange Program.

Nick

Nick Horton (twitter) is Beitzel Professor of Technology and Society and Professor of Statistics and Data Science at Amherst College. His recent work has focused on statistics and data science education. Nick is a fellow of the American Statistical Association and the American Association for the Advancement of Science. He chaired the Committee of Presidents of Statistical Societies and the ASA Curriculum Guidelines for Undergraduate Programs in Statistical Science workgroup. Nick serves on the National Academies Committee for Applied and Theoretical Statistics and is a co-author of the 2018 “Undergraduate Data Science: Opportunities and Options” consensus study report and the ASA’s revised GAISE (Guidelines for Assessment and Instruction in Statistics Education) College report.