Recommended Materials

This is a non-exhaustive list of materials that I have used to learn about statistics, programming, and visualization. I’m always trying to learn new things, so this list will be updated when I come across something I like! If you decide to buy something here, please consider using the links on this page as they are referral links which help to support the blog. Enjoy!

Books

R

The Art of R Programming: A Tour of Statistical Software Design
This was actually the first book I picked up about R because it was on the shelf at my library, but I would not recommend it as a first book for learning R. However, if you want to get into writing R scripts or programs, this is a great resource.

ggplot2: Elegant Graphics for Data Analysis (Use R!)
If you’re planning on creating visualizations with R, you’ll certainly want to look at using ggplot2. Although it is open source and the documentation is online, this book goes into more depth with examples, tips, and tricks. It is also a great way to support the developer.

R Cookbook (O’Reilly Cookbooks)
The “Cookbook” series from O’Reilly is great for when you know your way around a software program but need a quick way to check how to do something. For some reason, there are a few things in R that I routinely forget, and I know I can get a good answer from this book. It’s usually faster than poring over Stack Overflow for a few minutes.

Programming

Code Complete: A Practical Handbook of Software Construction, Second Edition
This book is a few years old now, but the concepts in it are still rock-solid. If you don’t have a career as a programmer, some of it won’t apply, but most of it does. When I started programming just as a hobby, this book really helped me start to see the bigger picture of software development. Now I can apply many of the concepts I read about in my career too.

Software

R
Considered the premier statistical software by many, R is a complete environment for data analysis, statistical modeling, and visualization. Additionally, it is free, open-source, and has a very active community. The learning curve is steep, but worth it.

ggplot2
Although it is just an R package, I feel the need to mention it. I use it for almost every R-related post because the graphics are so much better than R’s default graphics package. It takes some time to get used to the syntax, but there is a lot of information online about the package. The developer, Hadley Wickham, is also frequently involved on the ggplot2 message boards and Stack Overflow.

Microsoft Excel
Excel sometimes gets complaints, but it does a lot of things really well. Sorting and cleaning data is a breeze, simple bar charts look great, and formulas are great for manipulating text data into other formats. I often flip between R and Excel when working with data.

Python
I love using Python for short automated tasks. Putting together a script to scrape data or do some statistical analysis is very easy and usually only takes up a few dozen lines of code. I also find that Python lets me think less about the code and more about my task as compared to lower-level programming languages. Additionally, with the plethora of libraries for Python, many common tasks have already been programmed for you.
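To give a feel for how short such a script can be, here is a minimal sketch of a quick statistical summary using only Python’s standard library. The function name summarize and the sample numbers are made up for illustration; they are not taken from any particular project.

```python
import statistics


def summarize(values):
    """Return a few basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


# Made-up sample data, purely for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = summarize(sample)

for name, value in summary.items():
    print(f"{name}: {value}")
```

Swapping the hard-coded list for values read from a file or a web page is where the “few dozen lines” usually go.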

Education

Udacity

Udacity is my favorite online education resource. There are two features that make it stand out:

  1. The classes are designed for the platform. Instead of retrofitting college classes for the internet, these classes are designed specifically for Udacity. This means the videos are all 2-5 minutes long, self-paced, and present the opportunity for frequent, but concise, knowledge checks.
  2. The in-class code editor works brilliantly. You never have to leave the website to complete the course work. All of the programming quizzes and assignments are completed directly in the browser. Furthermore, the code compiles and runs very quickly on Udacity’s servers, so you only have to wait a few seconds to see if you got the question correct.

Unfortunately, Udacity has recently started to push their “Nanodegree” program with special pay-for access to the material. However, if you go digging, you can still find the full classes available for free; it just takes a few more clicks. It’s not entirely clear to me what the advantage of the pay access is, so I’ve never tried it.

If you are looking for great content by industry leaders, Udacity should be your first stop. But if you can’t find something you’re interested in, keep reading.

edX

Until recently, edX seemed to be lacking in features, but I’m now really happy with their platform. They have access to dozens of classes from top schools like Harvard, MIT, and Stanford. Most of the classes follow a video lecture > quiz > homework/project cycle, making it feel more like a college class. The lecture videos are longer than Udacity’s, usually about 10-20 minutes apiece. Additionally, the coursework is usually more involved than what you may find elsewhere. Since edX allows more flexibility to teachers, you will most likely need to do some or all of the following: download software onto your computer, download code for homework, sign up at external websites to participate in discussions and/or grading, etc. This can be a bit of a turnoff, but the quality and diversity of the courses keep me coming back.

Coursera

Coursera is very similar to edX, and in fact most of what I said about edX also applies to Coursera. Even the websites look similar! The only real difference I have noticed is that the quality of the material on Coursera is much more hit-or-miss than on edX. With Coursera, some of the classes are just teachers recording themselves talking into their laptop webcam from inside a closet. My completion rate on Coursera is probably below 50% because I get bored or annoyed with the lectures so much more easily than on the other sites. You can still find some gems here, but be prepared to not love everything you sign up for.