NaNoGenMo is an idea created by Darius Kazemi to parody NaNoWriMo. Instead of writing a novel, developers write programs that generate 50k+ word “novels”. This series of posts will document my participation throughout the month.
With the scaffolding from my last post in place, it was fairly easy to substitute real words for the sample words. The most significant change I made was implementing a simple cache: there’s no reason to search the graph and calculate the probabilities for the same word more than once.
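The caching idea can be sketched roughly like this. This is a hypothetical illustration, not my actual implementation: `concordance` is assumed to be a mapping from each word to a counter of the words that follow it, and the names (`next_word_distribution`, `pick_next`, `_cache`) are made up for the example.

```python
import random

# Memoized distributions: computing the follower probabilities for a
# word is assumed to be the expensive step, so do it at most once.
_cache = {}

def next_word_distribution(word, concordance):
    """Return (followers, weights) for `word`, computed at most once."""
    if word not in _cache:
        followers = concordance.get(word, {})
        total = sum(followers.values())
        words = list(followers)
        weights = [followers[w] / total for w in words] if total else []
        _cache[word] = (words, weights)
    return _cache[word]

def pick_next(word, concordance):
    """Sample a follower of `word` according to its concordance counts."""
    words, weights = next_word_distribution(word, concordance)
    if not words:
        return None
    return random.choices(words, weights=weights)[0]
```

Repeated calls for the same word hit the cache instead of re-walking the graph, which matters when generating tens of thousands of words from a vocabulary that is far smaller.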
As I expected, this method produced mostly gibberish. Using only word concordances means grammatical correctness is guaranteed only between each adjacent pair of words; there is no guarantee that the sentence as a whole is grammatical. Nonetheless, I did end up with some humorous text. Here are a few of my favorites.
GEORGE: OH THAT’S NEWMAN ALL RIGHT. THE CORNER.
ELAINE: I’M VERY VERY AWKWARD.
ELAINE: SO HE HAD ORGASMS?
GEORGE: WELL MAYBE. HOW? MOVE EDWARD SCISSORHANDS. I’M SPECIAL. NO PROBLEM FOR BREAKFAST. SO LENNY BRUCE USED.
JERRY: FIRST DATE WAS IT WAS.
GEORGE: YOU TELL ME? SHE SAID THANK YOU SLEEPY.
KRAMER: MINIPLEX MULTITHEATER.
GEORGE: OH THAT SOUP NAZI.
If I redid this, I would look for a cleaner data source. Mine was filled with typos, inconsistencies, and non-standard punctuation, which made seeding the database difficult. I would also count “…” as its own distinct sentence terminator. Finally, a similar approach that would produce less gibberish is using Markov chains instead of simple concordances.
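The difference between the two approaches can be sketched as follows. A pairwise concordance conditions each word only on the single word before it; an order-2 Markov chain conditions on the previous two words, so locally grammatical runs survive longer. This is a generic illustration (the function names and `order` parameter are mine, not from my code):

```python
import random
from collections import defaultdict

def build_chain(tokens, order=2):
    """Map each `order`-word state to the list of words observed after it."""
    chain = defaultdict(list)
    for i in range(len(tokens) - order):
        key = tuple(tokens[i:i + order])
        chain[key].append(tokens[i + order])
    return chain

def generate(chain, seed, length=20):
    """Random-walk the chain starting from the `seed` state."""
    state = seed
    out = list(state)
    for _ in range(length):
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state never continues in the source
        out.append(random.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out)
```

With `order=1` this degrades to roughly what the concordance approach does; raising the order trades novelty for coherence, since longer states increasingly just replay the source text.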
All of my code and the final “Missing Season” are available on GitHub.