# How to get to the Project Runway finale

I recently saw this hazard rate analysis of RuPaul’s Drag Race and was inspired to do some analysis on the one reality show I watch:  Project Runway.

I’ve found that the downside of hazard rates is that they become very inaccurate with small data sets and at the “edges” of the time series. Instead of a hazard analysis, I decided to look at the factors that determine success on Project Runway.

I gathered biographical data from the BravoTV and MyLifetime websites and show outcome data from Wikipedia. There have been 155 regular season contestants and I used 113 in my training data set and 42 in my testing data set. There’s not enough data to predict past winners (there have only been 10 seasons), but there was enough for me to predict past finalists.

For each contestant, I gathered their age; sex; city of residence population; education; whether the designer has their own line; raw number of wins, high scores, safe scores, and low scores; and percent of wins, high scores, safe scores, and low scores. (Data available here.)

Using the tree library for R, I fit a tree with the following model:

```r
library(tree)

## Bin the log of city population into 6 equal-width intervals
pr$popcut <- cut(log(pr$Population), 6)

## Generate random samples to split dataset
tf <- as.logical(rbinom(155, 1, .8))
prtrain <- pr[tf, ]
prtest <- pr[!tf, ]

## Make the tree model (assumed: fit on the training split)
prt <- tree(PlacedSeason ~ Age + popcut + Win + High +
              Safe + Low + WinPct + HighPct + SafePct + LowPct,
            data = prtrain)

## Prune to compensate for over-fitting
prt.prune <- prune.tree(prt, 4)
```
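Because `rbinom` draws each row independently, the sizes of the two sets change from run to run. A minimal sketch of a repeatable, fixed-size split follows, assuming the 113/42 sizes mentioned earlier. The seed value and the stand-in data frame `pr_demo` are illustrative placeholders, not part of the original analysis.

```r
# Stand-in for the real `pr` data frame (hypothetical; only the row count matters here)
pr_demo <- data.frame(id = 1:155)

set.seed(42)                                    # arbitrary seed so the split is repeatable
train_idx <- sample(nrow(pr_demo), size = 113)  # 113 distinct row numbers for training

pr_demo_train <- pr_demo[train_idx, , drop = FALSE]
pr_demo_test  <- pr_demo[-train_idx, , drop = FALSE]  # the remaining 42 rows

nrow(pr_demo_train)
nrow(pr_demo_test)
```

Sampling row indices without replacement guarantees the two sets never overlap and always have the same sizes.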

Surprisingly, the model ended up needing just two variables: number of wins and number of high scores. Here’s the visual version of the tree. To read a decision tree, evaluate the statement at each node: if it is true, go left; if it is false, go right.

Decision tree for Project Runway finalists.
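The same reading rule can be written as nested `if`/`else` statements. The sketch below is only an illustration of how a two-variable tree works: the cutoffs `win_cutoff` and `high_cutoff`, the shape of the tree, and the function name `predict_finalist` are all hypothetical and are not the split points of the fitted model.

```r
# Hypothetical cutoffs, for illustration only (not taken from the fitted tree)
win_cutoff  <- 1.5
high_cutoff <- 4.5

predict_finalist <- function(win, high) {
  if (win < win_cutoff) {
    # Statement is true: follow the left branch and test the next statement
    if (high < high_cutoff) "Went Home" else "Finalist"
  } else {
    # Statement is false: follow the right branch straight to a leaf
    "Finalist"
  }
}

predict_finalist(win = 2, high = 0)
predict_finalist(win = 0, high = 1)
```

Each `if` corresponds to one node in the diagram, and each returned string corresponds to a leaf.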

This model was 88% accurate on the test data set. Here’s where it made mistakes:

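| Name | Season | Result | Predicted | Win | High | Safe | Low |
|---|---|---|---|---|---|---|---|
| Austin Scarlett | 1 | Went Home | Finalist | 2 | 0 | 1 | 5 |
| Jerell Scott | 5 | Went Home | Finalist | 3 | 3 | 4 | 2 |
| Carol Hannah Whitfield | 6 | Finalist | Went Home | 1 | 4 | 7 | 0 |
| Mila Hermanovski | 7 | Finalist | Went Home | 1 | 3 | 3 | 6 |
| Sonjia Williams | 10 | Went Home | Finalist | 3 | 2 | 4 | 2 |
| Fabio Costa | 10 | Finalist | Went Home | 1 | 4 | 5 | 3 |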
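Accuracy here means the share of test-set contestants whose predicted outcome matches what actually happened. A minimal sketch of that calculation follows, using short made-up vectors rather than the real test set; the names `actual` and `predicted` are placeholders.

```r
# Made-up outcomes for five imaginary contestants (not real show data)
outcomes  <- c("Finalist", "Went Home")
actual    <- factor(c("Finalist", "Went Home", "Went Home", "Finalist", "Went Home"),
                    levels = outcomes)
predicted <- factor(c("Finalist", "Went Home", "Finalist", "Finalist", "Went Home"),
                    levels = outcomes)

# Cross-tabulate actual vs. predicted; off-diagonal cells are the mistakes
confusion <- table(actual = actual, predicted = predicted)
print(confusion)

# Accuracy is the fraction of matching pairs
accuracy <- mean(actual == predicted)
accuracy
```

With the fitted model, the predictions would presumably come from something like `predict(prt.prune, prtest, type = "class")`, assuming `PlacedSeason` is stored as a factor.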

The question you might be asking is: can we predict successful designers before the season starts? I looked at age, sex, city of residence population, education, and whether the designer has their own line, but unfortunately did not find a model with any real predictive power. That would suggest that the show is mostly about design skill and personality and not about the designers’ backgrounds.

That being said, does the model suggest anyone will go to the finale now that season 11 has started? As it turns out, the model would predict Daniel to go to the finale because of his two wins. Richard and Stanley are the next closest, with one win and three high scores each. However, it’s also possible that the team dynamic of this season makes the model invalid. I couldn’t find any Vegas odds for Project Runway, but my money is on Daniel, Stanley, and Michelle.
