Plotting Seinfeld Episode Scores by Season Using ggplot2

I recently made a post over at Reddit showing the IMDB scores of Seinfeld episodes by season. It got a lot of traffic, but as it turned out, the data I was using was incorrect/outdated/malformed. I also got a lot of questions about how I made the graph. This post should help answer those questions!

The Data

I got the revised data from IMDB. You can access a clean CSV version here.

Plotting in R with ggplot2

This is the plot we’re going to make:

[Figure: Episode ratings by season]

Here is the code used:

#Read the data into R
#I copied from Excel; you can use read.csv() too

#Load ggplot2 
#(use install.packages("ggplot2") if you don't have it yet)

#Use the Cairo library for anti-aliased images in Windows

#Make the plot

#Use aesthetics to set the axes and season coloration
#Note: By setting color here, stat_smooth will plot separate fits, one per season

  #Use any method you like, loess is default but I specified lm
  #Clean up
  labs(x="Episode #",y="Average Rating",title="Seinfeld Episode IMDB Ratings by Season",color="Season")+
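
For anyone who wants something to run end to end, here is a minimal sketch that follows the commented steps above. It is not the exact code behind the plot: the data frame name `seinfeld`, the column names `Season`, `Episode`, and `Rating`, the file name in the commented-out `read.csv()` call, and the made-up placeholder ratings are all assumptions for illustration. Swap in the real CSV to get the real picture.

```r
# Sketch only: names and data below are assumptions, not the post's real data.
library(ggplot2)

# With the real CSV you might read it like this (hypothetical file name):
# seinfeld <- read.csv("seinfeld_ratings.csv")

# Made-up placeholder ratings so the example runs on its own.
set.seed(1)
seinfeld <- data.frame(
  Season  = factor(rep(1:3, each = 5)),   # season as a factor, so colors are discrete
  Episode = 1:15,                          # running episode number
  Rating  = round(runif(15, min = 7.5, max = 9.5), 1)
)

# Mapping color in the top-level aes() means stat_smooth() inherits the
# grouping and fits a separate line for each season.
p <- ggplot(seinfeld, aes(x = Episode, y = Rating, color = Season)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +  # linear fit instead of the default smoother
  labs(x = "Episode #", y = "Average Rating",
       title = "Seinfeld Episode IMDB Ratings by Season", color = "Season")

print(p)
```

If you move `color = Season` out of the top-level `aes()` and into `geom_point()` only, `stat_smooth()` no longer sees the grouping and draws a single fit across all seasons, which is the behavior the note in the comments is pointing at.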
  1. rob

    The two links you provided for IMDB were 404’d for me. How did you obtain the data from IMDB? I tried looking around on the site and it said that you could access it via FTP; could you possibly show how you got it?

    Thanks 🙂

