Plotting Seinfeld Episode Scores by Season Using ggplot2
I recently made a post over at Reddit showing the IMDB scores of Seinfeld episodes by season. It got a lot of traffic, but as it turned out, the data I was using was incorrect/outdated/malformed. I also got a lot of questions about how I made the graph, and this post should answer them!
The Data
I got the revised data from IMDB. You can access a clean CSV version here.
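If you'd rather not go through the clipboard, here's a minimal sketch of loading the CSV with base R's read.csv(). The helper name load_ratings and the file name in the usage line are hypothetical. It only assumes the CSV has the Season and Rating columns that the plotting code below relies on.

```r
# Hypothetical helper: read the episode CSV and check that the columns
# the ggplot2 code relies on (Season and Rating) are present
load_ratings <- function(path) {
  sf <- read.csv(path, header = TRUE, stringsAsFactors = FALSE)
  missing_cols <- setdiff(c("Season", "Rating"), names(sf))
  if (length(missing_cols) > 0) {
    stop("CSV is missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  sf
}

# Example usage (the file name is an assumption; use wherever you saved the CSV):
# sf <- load_ratings("seinfeld.csv")
```

Failing early on a missing column gives a clearer message than the "object 'Rating' not found" error ggplot2 would otherwise throw at plot time.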
Plotting in R with ggplot2
This is the plot we’re going to make:
Here is the code used:
#Read the data into R
#I copied from Excel; reading "clipboard" is most reliable on Windows,
#so read.csv() on the saved CSV is the portable option
sf<-read.table("clipboard",sep="\t",header=TRUE)
#Load ggplot2
#(use install.packages("ggplot2") if you don't have it yet)
library(ggplot2)
#OPTIONAL
#Use the Cairo library for anti-aliased images in Windows
library(Cairo)
CairoWin()
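#Make the plot
#Number the episodes in airdate order (assumes the rows are already sorted that way)
sf$Episode<-seq_len(nrow(sf))
#Use aesthetics to set the axes and season coloration
#Note: mapping color to Season here makes stat_smooth fit one line per season
ggplot(sf,aes(x=Episode,y=Rating,color=factor(Season)))+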
geom_point()+
#Use any method you like, loess is default but I specified lm
stat_smooth(method="lm")+
#Clean up
labs(x="Episode #",y="Average Rating",title="Seinfeld Episode IMDB Ratings by Season",color="Season")+
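#coord_cartesian() zooms to 7-9 without dropping points outside that range;
#ylim() would remove those points and change the fitted lines
coord_cartesian(ylim=c(7,9))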
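To see the numbers behind the colored trend lines, here's a rough base-R sketch of what stat_smooth(method="lm") does once color is mapped to Season: a separate straight-line fit of rating against episode number for each season. The helper name season_slopes is hypothetical, and like the plot it assumes the rows are in airdate order and that the data has Season and Rating columns.

```r
# Hypothetical helper: one lm(Rating ~ Episode) per season, mirroring the
# per-season lines stat_smooth(method = "lm") draws when color = factor(Season)
season_slopes <- function(df) {
  # Running episode number across the whole series (local copy only;
  # assumes rows are sorted by airdate)
  df$Episode <- seq_len(nrow(df))
  # Split the data by season and fit a straight line to each piece
  fits <- lapply(split(df, df$Season), function(d) lm(Rating ~ Episode, data = d))
  # Return the slope of each fit, named by season
  sapply(fits, function(m) coef(m)[["Episode"]])
}
```

Calling season_slopes(sf) returns one slope per season in rating points per episode; a positive value means that season's line tilts upward on the plot.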
