I recently made a post on Reddit showing the IMDB scores of Seinfeld episodes by season. It got a lot of traffic, but as it turned out, the data I was using was incorrect/outdated/malformed. I also got a lot of questions about how I made the graph. This post should help answer those questions!
Plotting in R with ggplot2
This is the plot we’re going to make:
Here is the code used:
    # Read the data into R
    # I copied from Excel; you can use read.csv() too
    sf <- read.table("clipboard", sep = "\t", header = TRUE)

    # Load ggplot2
    # (use install.packages("ggplot2") if you don't have it yet)
    library(ggplot2)

    # OPTIONAL
    # Use the Cairo library for anti-aliased images in Windows
    library(Cairo)
    CairoWin()

    # Make the plot
    # Use aesthetics to set the axes and season coloration
    # Note: By setting color here, stat_smooth will plot separate fits
    ggplot(sf, aes(x = seq_len(nrow(sf)), y = Rating, color = factor(Season))) +
      geom_point() +
      # Use any method you like; loess is the default, but I specified lm
      stat_smooth(method = "lm") +
      # Clean up
      labs(x = "Episode #", y = "Average Rating",
           title = "Seinfeld Episode IMDB Ratings by Season", color = "Season") +
      ylim(c(7, 9))
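If you are not on Windows, or would rather not go through the clipboard, here is a rough sketch of the read.csv() route mentioned in the comments. It is not the exact code behind the plot above. The rows in `csv_text` are placeholder values made up for illustration (not real IMDB ratings), the `Episode` column is my own addition, and the `seinfeld.csv` and `seinfeld.png` file names are hypothetical.

```r
library(ggplot2)

# Placeholder rows for illustration only; these are NOT real IMDB ratings.
# With a real file you would call read.csv("seinfeld.csv") instead
# (hypothetical file name).
csv_text <- "Season,Rating
1,7.8
1,8.1
1,8.0
2,8.3
2,8.2
2,8.6
"
sf <- read.csv(text = csv_text, header = TRUE)

# Store the running episode number as a column (an assumption of this sketch)
# rather than computing it inside aes()
sf$Episode <- seq_len(nrow(sf))

p <- ggplot(sf, aes(x = Episode, y = Rating, color = factor(Season))) +
  geom_point() +
  stat_smooth(method = "lm") +
  labs(x = "Episode #", y = "Average Rating", color = "Season") +
  # Zoom the y axis without discarding any points
  coord_cartesian(ylim = c(7, 9))

# Save to a file instead of opening a Cairo window (hypothetical file name)
# ggsave("seinfeld.png", p, width = 8, height = 5)
```

One design note: ylim() removes points outside the range before the fits are computed, while coord_cartesian() only zooms the view. If any episode falls outside 7 to 9, the two can give slightly different lines, so pick whichever behavior you actually want.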