Plotting Seinfeld Episode Scores by Season Using ggplot2

I recently posted a graph on Reddit showing the IMDB scores of Seinfeld episodes by season. It got a lot of traffic, but it turned out that the data I was using was incorrect, outdated, and malformed. I also got a lot of questions about how I made the graph. This post should help to answer those questions!

The Data

I got the revised data from IMDB. You can access a clean CSV version here.
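
If you would rather load the CSV file directly instead of copying from Excel, something like the sketch below should work. The file here is a temporary stand-in with placeholder values (not real ratings), and I am assuming the CSV has at least the Season and Rating columns that the plotting code relies on; point csv_path at wherever you saved the real file.

```r
# Hypothetical stand-in file: placeholder values, NOT real IMDB ratings.
# Replace csv_path with the location of the CSV you downloaded.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Season,Rating",
  "1,8.0",
  "1,8.1",
  "2,8.3"
), csv_path)

# read.csv() assumes comma-separated values with a header row by default
sf <- read.csv(csv_path, header = TRUE)

# Check that the columns the plot needs are present and correctly typed
str(sf)
```

If str() shows Rating as character rather than numeric, the file probably contains stray text in that column and needs cleaning before plotting.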

Plotting in R with ggplot2

This is the plot we’re going to make:

Episode ratings by season

Here is the code used:

#Read the data into R
#I copied from Excel; you can use read.csv() too
sf<-read.table("clipboard",sep="\t",header=TRUE)

#Load ggplot2 
#(use install.packages("ggplot2") if you don't have it yet)
library(ggplot2)

#OPTIONAL
#Use the Cairo library for anti-aliased images in Windows
library(Cairo)
CairoWin()

#Make the plot

#Use aesthetics to set the axes and season coloration
#Note: Because color is set here, stat_smooth will plot a separate fit for each season

ggplot(sf,aes(x=seq_len(nrow(sf)),y=Rating,color=factor(Season)))+
  geom_point()+
  #Use any method you like, loess is default but I specified lm
  stat_smooth(method="lm")+
  #Clean up
  labs(x="Episode #",y="Average Rating",title="Seinfeld Episode IMDB Ratings by Season",color="Season")+
  #Note: ylim() drops any points outside 7-9 (with a warning) before fitting
  ylim(c(7,9))
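
The note about color and stat_smooth is easier to see side by side. Here is a minimal sketch using synthetic stand-in data (random numbers, not real ratings) with an explicit Episode column that I added for illustration. In the first plot color is mapped in the main aes(), so the smoother inherits it and fits one line per season. In the second, color is mapped only inside geom_point(), so the smoother sees a single group and fits one overall line.

```r
library(ggplot2)

# Synthetic stand-in data (NOT real IMDB ratings): 3 seasons, 5 episodes each
set.seed(1)
sf <- data.frame(
  Season = rep(1:3, each = 5),
  Rating = round(runif(15, min = 7.5, max = 8.8), 1)
)
# Hypothetical helper column: running episode number across all seasons
sf$Episode <- seq_len(nrow(sf))

# Color in the main aes(): stat_smooth inherits it, one fit per season
p_by_season <- ggplot(sf, aes(x = Episode, y = Rating, color = factor(Season))) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x) +
  labs(color = "Season")

# Color only in geom_point(): stat_smooth sees one group, one overall fit
p_overall <- ggplot(sf, aes(x = Episode, y = Rating)) +
  geom_point(aes(color = factor(Season))) +
  stat_smooth(method = "lm", formula = y ~ x) +
  labs(color = "Season")
```

Print p_by_season or p_overall at the console to draw either one. Which you want depends on the question: per-season fits show the trend within each season, while the single fit shows the trend across the whole run.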

3 thoughts on “Plotting Seinfeld Episode Scores by Season Using ggplot2”

  1. rob

    The two links you provided for IMDB were 404’d for me. How did you obtain the data from IMDB? I tried looking around on the site and it said that you could access it via FTP; could you possibly show how you got it?

    Thanks 🙂
