Teach Data Science

Data Privacy

9 Jul, 2020 education ethics

Data privacy is an increasingly important problem. The flood of data available through sensors, smartphones, and our interactions on the internet has great potential to improve our lives and address long-standing problems. Yet it also raises critical questions about misuse of such data. What is data privacy? And how is it different from anonymous or secure data? Data privacy is concerned with three main components:

Ethical Data Viz

6 Jul, 2020 visualization ethics

Arguably, data have the broadest impact in engaging readers, changing minds, and determining policy when they are presented graphically. It is this potential for enormous impact that requires a data scientist to think most carefully about how their visualizations are created and subsequently consumed. Many of us already teach data visualization in our statistics and data science classes. Therefore, introducing an ethical framework and a theory of valid graphics will be a natural fit for many classes.

Re-Introduction: Ethics in Data Science Education

1 Jul, 2020 education ethics

Why more Data Science Education blogging? Last summer we wrote a series of blog entries, Teach Data Science, designed to start conversations around teaching data science. We covered topics such as data science software, data ingestion, data technologies, data wrangling, visualization & exploration, communication, and key reports and findings on data science. One key element that was lacking from our 2019 blog was a discussion about and a commitment to teaching the ethical aspects of data science.

Keeping Busy with Data Science

13 May, 2020 education communication

It has become increasingly clear that many college students have found themselves without summer plans. Unfortunately, this blog entry is not a list of possible employment opportunities. Instead, it is a compilation of statistics and data science projects to enhance a summer spent socially distant. The list below represents opportunities at a variety of levels. Whether you are just beginning or quite advanced, there are many ideas for you.

Next Steps

1 Aug, 2019 R Studio education

To finish out the summer, we leave you with one last blog entry. The links below provide information about upcoming endeavors related to data science education. As we become aware of other projects, we are likely to add to the list. Feel free to check back to see what is new on the horizon. Thanks for all the great feedback that we’ve gotten over the summer. Here’s to many future discussions on data science education.

Closing: A summer of data science education

31 Jul, 2019 R Studio education

We don’t know about you, but at the end of this project we find ourselves rejuvenated, empowered, and somewhat exhausted. In writing ten weeks of daily blog entries, we have learned a tremendous amount in terms of technical skills, pedagogical ideas to try in our fall courses, and ways to connect to the amazing & large data science community. One of the main goals we set for ourselves this summer was to create a roadmap for faculty development to “ease the learning curve and help busy people incorporate new tools and approaches into their teaching.”

More cloud computing: data science is not done on a laptop

30 Jul, 2019 cloud computing education Amazon web services cluster computing grid computing workflow high performance computing authentication SQL BigQuery Azure Google cloud platform computing

Previous blog entries have discussed cloud-based servers (RStudio Server and JupyterHub) and parallel/grid/cluster computing. Today we will expand upon these ideas to discuss at a high level how data science students can leverage cloud-based tools to undertake their analyses in a flexible manner. Our discussion is motivated by several recent papers and blog posts that describe how complex, real-world data science computation can be structured in ways that would not have been feasible in past years without herculean efforts.

One model to rule them all

29 Jul, 2019 R confounding causal inference modeling inference statistics python

As we near the end of our summer posts, we’ve started to think more broadly about statistics as well as data science courses. Today’s post considers a broad question relevant for many courses: how can we teach statistical thinking without having to resort to introducing a profusion of tests? Jonas Kristoffer Lindeløv proposed an elegant approach using the idea that common statistical tests are linear models.
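As one small illustration of the idea that common tests are linear models, the sketch below compares a pooled-variance two-sample t statistic with the slope t statistic from regressing the outcome on a 0/1 group indicator. This is not Lindeløv's code: the function names and the data are made up for illustration, and it covers only the equal-variance t-test.

```python
import math
from statistics import mean


def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming equal variances in both groups."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    # Within-group sum of squares, pooled across both groups.
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled_var = ss / (na + nb - 2)
    return (mb - ma) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def slope_t_statistic(x, y):
    """t statistic for the slope in the least-squares fit y = b0 + b1 * x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    # Residual sum of squares and the standard error of the slope.
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    return b1 / se_b1


# Hypothetical measurements for two groups (illustrative values only).
group_a = [4.1, 5.0, 5.6, 4.8, 5.3]
group_b = [6.2, 5.9, 6.8, 7.1, 6.4]

# Encode group membership as a 0/1 indicator and stack the outcomes.
indicator = [0] * len(group_a) + [1] * len(group_b)
outcome = group_a + group_b

t_test = pooled_t_statistic(group_a, group_b)
t_model = slope_t_statistic(indicator, outcome)

# The two statistics agree up to floating-point rounding.
print(math.isclose(t_test, t_model))
```

With a 0/1 indicator, the fitted slope is the difference in group means and the residual variance is the pooled variance, which is why the two statistics coincide.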

Counting commits and peer code review

28 Jul, 2019 R Markdown data ingestation data wrangling github ferpa purrr rvest html code review

Today’s guest entry by Amelia McNamara (University of St. Thomas) describes a creative way that she tackled a problem in one of her upper level courses. One note: The JSM is underway. Looking for interesting talks? Try Mine’s excellent Shiny for JSM 2019 app, for those of you in Denver. This past semester, I taught two sections of a course called Advanced Statistical Software (yes, I’m aware of the acronym. We’re changing the course title soon…).

Data assertion and checks via testthat

25 Jul, 2019 R reproducibility data checking consistency checking workflow data ingestation

Reproducibility and Replicability: On May 7, 2019, the National Academies of Sciences, Engineering, and Medicine published the article “New report examines reproducibility and replicability in science.” The report recommends “ways that researchers, academic institutions, journals, and funders should help strengthen rigor and transparency in order to improve the reproducibility and replicability of scientific research.” Reproducibility is at the core of data acumen and needs to be stressed at all levels of the data science curriculum.
