Blogging My Way Through R for Data Science

R for Data Science by Hadley Wickham and Garrett Grolemund is a go-to resource for the data-curious who want to learn R and the tidyverse by curling up with a good book. After spending the summer interning on Garrett’s RStudio Academy team, I wanted to brush up on my basics (and pick up skills I have otherwise been missing), so what better way than to methodically work my way through the text? And what better way to hold myself accountable than to blog about it and put my progress out into the #rstats community to get feedback and encouragement?

Resources for those studying from R4DS are numerous already. There are crowdsourced solutions manuals and even a vibrant R4DS community on Twitter and Slack. With that in mind, what can I possibly offer that isn’t just another reproduction of the book’s analyses of the nycflights13, gapminder, and other standard data sets?

One of the strengths that I recognized immediately in the book’s preface is its emphasis on workflow:

The R4DS Workflow: Import; Tidy; Transform; Explore: Transform, Visualize, Model, Repeat; Communicate

We import data into a data frame in R; tidy it into a stored form consistent with its meaning; transform it by narrowing our focus to interesting observations (and creating new variables and summaries); explore it with visualization and models; and (the part that this blog is, in part, going to play) communicate the results of our analysis.
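To make the steps above concrete, here is a minimal sketch of the workflow in tidyverse code. It is only an illustration under some loud assumptions: the use-area codes, column names, and counts are invented for the example (they are not real Grand Canyon data), and the modeling step is left out to keep things short.

```r
library(tidyverse)

# Import: write a tiny, made-up CSV to a temporary file, then read it back.
# The use-area codes (AAA, BBB) and all counts are hypothetical.
csv_path <- tempfile(fileext = ".csv")
write_lines(
  c("use_area,2019,2020",
    "AAA,120,45",
    "BBB,300,110"),
  csv_path
)
permits_wide <- read_csv(csv_path)

# Tidy: reshape so each row is one use area in one year.
permits <- permits_wide %>%
  pivot_longer(-use_area, names_to = "year", values_to = "user_nights") %>%
  mutate(year = as.integer(year))

# Transform: summarise total user nights per use area.
summary_tbl <- permits %>%
  group_by(use_area) %>%
  summarise(total_nights = sum(user_nights), .groups = "drop")

# Visualize: one line per use area across years.
p <- ggplot(permits, aes(x = year, y = user_nights, colour = use_area)) +
  geom_line()
```

Printing `p` draws the plot, and `summary_tbl` is the kind of small table that would eventually feed the communicate step.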

Bingo. That’s my focus: as I progress through the book chapter by chapter, I’ll share how I apply this workflow to one of my personal projects, an analysis of backcountry area use at Grand Canyon National Park that I’m calling GrandR (pronounced grander). As the project progresses, I’ll update this blog with a play-by-play of the interesting R tricks I use along the way. I’ll also update the corresponding GitHub repository with the data, R scripts, and (ultimately) a report of my findings. While you can always comment below (with utteranc.es), feel free to share your thoughts and feedback with me directly on GitHub, Twitter, or via email at david.failing@gmail.com.

Map of Backcountry Use Areas and Classifications at Grand Canyon National Park

David Failing

Lead Data Scientist and Consultant

Mathematics Instructor