I recently saw this hazard rate analysis of RuPaul’s Drag Race and was inspired to do some analysis on the one reality show I watch: Project Runway.
I’ve found that the downside of hazard rates is that they are very inaccurate on small data sets and at the “edges” of the time series. So instead of a hazard analysis, I decided to look at the factors that determine success on Project Runway.
I gathered biographical data from the BravoTV and MyLifetime websites and show outcome data from Wikipedia. There have been 155 regular season contestants and I used 113 in my training data set and 42 in my testing data set. There’s not enough data to predict past winners (there have only been 10 seasons), but there was enough for me to predict past finalists.
For each contestant, I gathered their age; sex; city of residence population; education; whether the designer has their own line; raw number of wins, high scores, safe scores, and low scores; and percent of wins, high scores, safe scores, and low scores. (Data available here.)
Using the tree library for R, I fit a tree with the following model:
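    library(tree)

    ## Read the tab-separated data copied to the clipboard
    pr <- read.table("clipboard", sep = "\t", header = TRUE)
    ## Bin log population into 6 equal-width intervals
    pr$popcut <- cut(log(pr$Population), 6)

    ## Generate random samples to split dataset (about 80% training)
    tf <- as.logical(rbinom(155, 1, .8))
    prtrain <- pr[tf, ]
    prtest <- pr[!tf, ]

    ## Make the tree model (fit on the training rows)
    prt <- tree(PlacedSeason ~ Age + popcut + Win + High + Safe + Low +
                  WinPct + HighPct + SafePct + LowPct,
                data = prtrain)

    ## Prune to compensate for over-fitting
    ## (the second positional argument of prune.tree is k, the cost-complexity parameter)
    prt.prune <- prune.tree(prt, 4)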
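One thing to note about the split: `rbinom(155, 1, .8)` puts each row in the training set with probability 0.8, so the sizes of the two sets vary from run to run. If you want an exact 113/42 split that you can reproduce, a minimal sketch in base R might look like the following. The `pr_demo` data frame and the seed are assumptions for illustration only, standing in for the real data.

```r
# Sketch: an exact 113/42 split with a fixed seed (base R only).
# `pr_demo` is a hypothetical stand-in for the real `pr` data frame.
set.seed(42)  # arbitrary seed (an assumption) so the split is repeatable
pr_demo <- data.frame(id = 1:155)

# Draw 113 distinct row indices without replacement.
train_idx <- sample(nrow(pr_demo), 113)

train_demo <- pr_demo[train_idx, , drop = FALSE]
test_demo  <- pr_demo[-train_idx, , drop = FALSE]

nrow(train_demo)  # 113
nrow(test_demo)   # 42
```

Using `sample()` fixes the size of each set, while the `rbinom()` approach only fixes the expected proportion.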
Surprisingly, the model ended up needing just two variables: number of wins and number of high scores. Here’s the visual version of the tree. To read a decision tree, evaluate the statement at each node: if it is true, follow the left branch; if it is false, follow the right branch.
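To make the left/right reading rule concrete, here is a rough sketch of a two-variable tree written out as nested `if`/`else` statements. The function name `predict_finalist` and both cut points are hypothetical placeholders, not the split points of the fitted tree; I only picked values that agree with the predictions mentioned in this post.

```r
# Sketch: a two-split classification tree written as nested if/else.
# win_cut and high_cut are hypothetical placeholders, NOT the fitted splits.
predict_finalist <- function(win, high, win_cut = 1.5, high_cut = 4.5) {
  if (win < win_cut) {        # statement is true: go left
    if (high < high_cut) {    # true again: left leaf
      "Went Home"
    } else {                  # false: right leaf
      "Finalist"
    }
  } else {                    # statement is false: go right
    "Finalist"
  }
}

predict_finalist(win = 2, high = 0)  # "Finalist"
predict_finalist(win = 1, high = 3)  # "Went Home"
```

Each `if` corresponds to one node in the plot, and each returned string corresponds to a leaf.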
The pruned tree was 88% accurate on the test data set. Here’s where it made mistakes:
Name | Season | Result | Predicted | Win | High | Safe | Low |
---|---|---|---|---|---|---|---|
Austin Scarlett | 1 | Went Home | Finalist | 2 | 0 | 1 | 5 |
Jerell Scott | 5 | Went Home | Finalist | 3 | 3 | 4 | 2 |
Carol Hannah Whitfield | 6 | Finalist | Went Home | 1 | 4 | 7 | 0 |
Mila Hermanovski | 7 | Finalist | Went Home | 1 | 3 | 3 | 6 |
Sonjia Williams | 10 | Went Home | Finalist | 3 | 2 | 4 | 2 |
Fabio Costa | 10 | Finalist | Went Home | 1 | 4 | 5 | 3 |
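It can help to see which direction the errors run. The snippet below is a small sketch that retypes the six rows from the table above into a data frame (the name `errors` and the column names are my own choices for illustration) and cross-tabulates the actual result against the prediction.

```r
# The six misclassified designers, copied from the table above.
errors <- data.frame(
  name      = c("Austin Scarlett", "Jerell Scott", "Carol Hannah Whitfield",
                "Mila Hermanovski", "Sonjia Williams", "Fabio Costa"),
  result    = c("Went Home", "Went Home", "Finalist",
                "Finalist", "Went Home", "Finalist"),
  predicted = c("Finalist", "Finalist", "Went Home",
                "Went Home", "Finalist", "Went Home")
)

# Cross-tabulate the actual outcome against the model's prediction.
error_table <- table(actual = errors$result, predicted = errors$predicted)
print(error_table)

# Count each direction of error.
false_finalists  <- sum(errors$result == "Went Home" & errors$predicted == "Finalist")
missed_finalists <- sum(errors$result == "Finalist" & errors$predicted == "Went Home")
```

The mistakes split evenly: three designers were predicted to be finalists but went home, and three finalists were predicted to go home.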
You might be asking: can we predict successful designers before the season starts? I tried models using only age, sex, city of residence population, education, and whether the designer has their own line, but unfortunately did not find one that was predictive. That would suggest that the show is mostly about design skill and personality and not about the designers’ backgrounds.
That being said, does the model suggest anyone will go to the finale now that season 11 has started? As it turns out, the model would predict Daniel to go to the finale because of his two wins. Richard and Stanley are the next closest based on their one win and three high scores (each). However, it’s also possible that the team dynamic of this season makes the model invalid. I couldn’t find any Vegas odds for Project Runway, but my money is on Daniel, Stanley, and Michelle.