Let's Talk Data

NaNoGenMo 2014 Dev Diary #1: Concordance with Neo4j

NaNoGenMo is an idea created by Darius Kazemi to parody NaNoWriMo. Instead of writing a novel, developers write programs to generate 50k+ word “novels”. This series of posts will document my participation throughout the month.

Generating an original novel with software is certainly a Hard Problem, but the rules of NaNoGenMo are lax enough that programmers of any level can participate. It also seems to be a perfect opportunity to explore new technologies. In my case, I wanted to experiment with modeling language in a graph database.

Conceptually, it is possible to model every sentence of a corpus in a single graph. Words follow one another in sentences, which creates natural links between them. Consider the corpus “I am hungry. You are hungry. Hungry people eat food.” We can model that corpus in the following manner:
All sentences begin at the node marked “12” and end at the node marked “11”. This shows, for example, that sentences can start with “hungry” or contain “hungry” in the middle. Additionally, “I am” is not a complete sentence in this corpus. This describes the concept of concordances—the ordering of words in a corpus.
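To make the idea concrete without a database, here is a minimal Python sketch that builds the same kind of graph for the example corpus as a dictionary of "word -> words that can follow it". The names `build_concordance`, `START`, and `END` are my own stand-ins (for the nodes marked “12” and “11”), and the sentence splitting is deliberately naive.

```python
import re
from collections import defaultdict

# Hypothetical stand-ins for the start node ("12") and end node ("11").
START, END = "<start>", "<end>"

def build_concordance(corpus):
    """Map each word to the set of words that can directly follow it."""
    graph = defaultdict(set)
    # Naive split on sentence-ending punctuation.
    for sentence in re.split(r"[.?!]", corpus):
        words = sentence.lower().split()
        if not words:
            continue  # skip the empty piece after the final period
        path = [START] + words + [END]
        # Link every word to the word that follows it.
        for current, following in zip(path, path[1:]):
            graph[current].add(following)
    return dict(graph)

graph = build_concordance("I am hungry. You are hungry. Hungry people eat food.")
print(sorted(graph[START]))   # ['hungry', 'i', 'you']
print(END in graph["am"])     # False: "I am" is not a complete sentence here
```

The second line of output reflects the point above: no sentence in this corpus ends right after “am”, so that node has no link to the end node.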

With this idea in mind, I decided that I wanted to create a text generated from a concordance graph. I have already shown that Seinfeld transcripts make for an interesting and amusing corpus, so I will probably use that as my source again. To get my feet wet, I wanted to start with an extremely limited corpus. And what’s better than the intentionally limited Green Eggs and Ham?

I honestly thought a good chunk of this post would be about installing Neo4j, but these two lines did it for me:
brew install neo4j
neo4j start
I first populate the graph with three nodes: statement start, question start, and sentence end.
create (s:Start {type:'statement'});
create (s:Start {type:'question'});
create (e:End);
Next, I populate the graph one sentence at a time. The merge query acts as a “get or create” which is applied to each word. Sentences that end in a question mark start at the “question start” node and other sentences start at the “statement start” node. Each word in a sentence then has a concordance to the next, with the final word terminating at the “end” node.

Let’s see how this works for the first sentence, “I am Sam”.
//"get or create" I
merge (w:Word {word:'I'}) return w;

//The sentence does not end in a question mark so find the
//"statement start" node, find the "I" node (which now must exist),
//and link them with the "BEGINS" relationship
match (s:Start {type:'statement'})
match (word:Word {word: 'I'})
create (s)-[:BEGINS]->(word);

//"get or create" AM
merge (w:Word {word:'AM'}) return w;

//Find the "I" node and the "AM" node and link them with the "CONCORDANCE" relationship
match (a:Word {word: 'I'})
match (b:Word {word: 'AM'})
create (a)-[:CONCORDANCE]->(b);

//"get or create" SAM
merge (w:Word {word:'SAM'}) return w;

//Find the "AM" node and the "SAM" node and link them with the "CONCORDANCE" relationship
match (a:Word {word: 'AM'})
match (b:Word {word: 'SAM'})
create (a)-[:CONCORDANCE]->(b);

//The sentence ends. Find the "SAM" node and the "end" node and link
//them with a "TERMINATES" relationship
match (word:Word {word:'SAM'})
match (e:End)
create (word)-[:TERMINATES]->(e);
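Writing these statements by hand for every sentence gets tedious, so a small script can generate them. This is only a sketch: `sentence_to_cypher` is a hypothetical helper, it merges all the words first and then creates the links (which should produce the same graph as the interleaved order above), and it builds queries with plain string formatting for readability. A real loader would more likely send the words as query parameters through a Neo4j driver, since a word containing an apostrophe would break these strings.

```python
def sentence_to_cypher(sentence):
    """Return the Cypher statements that load one sentence (hypothetical helper)."""
    # Questions hang off the "question" start node, everything else off "statement".
    start_type = "question" if sentence.strip().endswith("?") else "statement"

    # Strip surrounding punctuation and upper-case each word.
    words = [w.strip('.,!?"').upper() for w in sentence.split()]
    words = [w for w in words if w]
    if not words:
        return []

    # "get or create" every word in the sentence
    statements = ["merge (w:Word {word:'%s'}) return w;" % w for w in words]

    # Link the start node to the first word
    statements.append(
        "match (s:Start {type:'%s'}) match (word:Word {word:'%s'}) "
        "create (s)-[:BEGINS]->(word);" % (start_type, words[0]))

    # Link each word to the one that follows it
    for a, b in zip(words, words[1:]):
        statements.append(
            "match (a:Word {word:'%s'}) match (b:Word {word:'%s'}) "
            "create (a)-[:CONCORDANCE]->(b);" % (a, b))

    # Link the last word to the end node
    statements.append(
        "match (word:Word {word:'%s'}) match (e:End) "
        "create (word)-[:TERMINATES]->(e);" % words[-1])
    return statements

for statement in sentence_to_cypher("I am Sam."):
    print(statement)
```

For “I am Sam.” this prints seven statements: three merges, one BEGINS link, two CONCORDANCE links, and one TERMINATES link.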
After repeating this for all of the sentences, a complete graph of the book is available to query. For example, we can find all of the nodes that can start a question:
match (s:Start {type:"question"})-[:BEGINS]->(w) return s, w;
Notice that some of these words are themselves connected. Since these words appear more than once, we can also count the occurrences:
match (s:Start {type:"question"})-[:BEGINS]->(w) return w, count(*);

w	count(*)
IN	2
COULD	3
WOULD	9
DO	1
YOU	2
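The counts work because a new BEGINS relationship is created for every sentence, so the pattern matches once per question and `count(*)` groups those matches by word. The same tally can be sketched in plain Python. The sentences below are a small hand-picked sample rather than the full text, so the numbers will not match the table above, and `question_starters` is a hypothetical helper.

```python
from collections import Counter

# A hand-picked sample, not the whole book.
sample = [
    "Do you like green eggs and ham?",
    "Would you like them here or there?",
    "Would you like them in a house?",
    "I do not like them, Sam-I-am.",
]

def question_starters(sentences):
    """Count how often each word opens a question."""
    starters = Counter()
    for sentence in sentences:
        sentence = sentence.strip()
        if sentence.endswith("?"):
            # One tally per question, like one BEGINS relationship per sentence.
            starters[sentence.split()[0].upper()] += 1
    return starters

counts = question_starters(sample)
print(counts["WOULD"], counts["DO"])  # 2 1
```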

With this proof of concept in place, my next task is going to be parsing and loading the Seinfeld transcripts into Neo4j.

By Phillip Johnson | November 1, 2014 | Programming | No Comments |

Prevalence of #occupycentral in Hong Kong Instagrams

Many Hong Kong citizens are currently protesting for democratic reform in downtown Hong Kong. As is expected, the Chinese netizens have taken to social media to spread their message. Instagram in particular was in the press as there were reports of the Chinese government blocking access to the service. Nonetheless, many users in Hong Kong were still able to get Instagrams posted. Using the Instagram API, I gathered geocoded instagrams in Hong Kong tagged “#occupycentral”.

The tag had some rumblings late last week, but really exploded over the past few days.
Although Instagram users around the world used this tag, I wanted to get a visual of where in Hong Kong the posts were coming from. By far, the majority are clustered around the downtown area, although there are some stragglers further away. Also note the few posts from Victoria Harbor.
It is common for Instagram users to tag photos with multiple hashtags. This can increase visibility of posts since users often browse media by tag. Of the Instagrams that were tagged “#occupycentral”, these are the most common other tags. I was surprised to see the popularity of “#umbrellarevolution”, even surpassing the Chinese versions of “#occupycentral”.
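Counting the co-occurring tags is a simple tally. The sketch below assumes each post has already been reduced to a list of its hashtags; the `posts` data is made-up placeholder input, not results from the API, and `co_tags` is a hypothetical helper.

```python
from collections import Counter

def co_tags(posts, target="occupycentral"):
    """Count the other tags on posts that carry the target tag."""
    counts = Counter()
    for tags in posts:
        # Normalize case and drop the leading "#"; a set avoids double counting.
        normalized = {tag.lower().lstrip("#") for tag in tags}
        if target in normalized:
            counts.update(normalized - {target})
    return counts

# Placeholder data for illustration only.
posts = [
    ["#occupycentral", "#umbrellarevolution", "#hongkong"],
    ["#OccupyCentral", "#hongkong"],
    ["#hongkong", "#food"],
]
print(co_tags(posts).most_common(2))
# [('hongkong', 2), ('umbrellarevolution', 1)]
```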
If you’d like to see some of the actual photos being uploaded, check out this live map from Geofeedia.

By Phillip Johnson | October 1, 2014 | Portfolio, Viz | No Comments |

One Year of My Workout Data

Penn Jillette would say that there are two kinds of people in the world: skinny fucks and fat fucks. While he places himself in the latter category, I am definitely part of team skinny fuck. Around this time last year I started casually lifting weights. In typical LTD fashion, I also started tracking my weight and workouts.

This chart shows my body weight gain, approximately 10% over the year.

As for my workouts, I tracked the exercise, amount of weight, and number of reps. I don’t know what the standard is for recording free weight, but I made my recordings “per limb” so that a bench press of 30lbs means 30lbs per arm. Any days where I skipped a particular exercise were marked as 0lbs. (Mouseover to highlight.)
One thing this graph hides is the number of reps. For example, a transition from 10 reps of 20lbs to 5 reps of 25lbs shows up only as an increase in weight. This is the same graph except with the y-axis showing the weight multiplied by the number of reps.
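The arithmetic behind the second graph is simple, shown here for the example above (`volume` is just a name I picked for weight times reps):

```python
def volume(weight_lbs, reps):
    """Total weight moved in one set: weight multiplied by reps."""
    return weight_lbs * reps

before = volume(20, 10)  # 10 reps of 20lbs
after = volume(25, 5)    # 5 reps of 25lbs
print(before, after)     # 200 125
```

So although the weight went up by 5lbs, the weight-times-reps value dropped from 200 to 125.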
I’m still tracking my data and next year I’ll be able to do an update with double the data!

By Phillip Johnson | September 16, 2014 | Viz | No Comments |

How to Write a Text Adventure in Python

People new to programming often ask for suggestions of what projects they should work on and a common reply is, “Write a text adventure game!” I think there are even some popular tutorials floating around that assign this as homework since I see it so much. This is a really good suggestion for a few reasons:

The concept is familiar and fun (everyone loves games!)
They can be written using core libraries
The UI is the console

But new programmers often struggle with knowing where to start. That’s why I wrote and published Make Your Own Python Text Adventure. This book is a structured approach to learning Python that teaches the fundamentals of the language, while also guiding the development of your own customizable text adventure game.

For those of you who know some Python and just need a little guidance, there’s an abbreviated version of the book material here on the blog. It assumes you are familiar with basic programming concepts (if-statements, loops, objects, etc.), but are still new to writing full applications.

Just looking for some code? You can view the tutorial version of the game on GitHub.

By Phillip Johnson | August 28, 2014 | Programming | 127 Comments |

How to Write a Text Adventure in Python Part 4: The Game Loop

This is an abbreviated version of the book Make Your Own Python Text Adventure.

The end is near, we’re almost ready to play the game! We’ll finish this series by implementing the game loop and receiving input from the human player.

The Game Loop

While some applications follow a discrete set of steps and terminate, a game typically just “keeps going”. The only way the program stops is if the player wins, loses, or quits. To handle this behavior, games usually run inside a loop. On each iteration, the game state is updated and input is received from the human player. In graphical games, the loop runs many times per second. Since we don’t need to continually refresh the player’s screen for a text game, our code will actually pause until the player provides input. Our game loop is going to reside in a new module game.py.

import world
from player import Player

def play():
    world.load_tiles()
    player = Player()
    while player.is_alive() and not player.victory:
        # Loop begins here
        pass  # placeholder so the snippet is valid Python; the body is filled in below

Before play begins, we load our world from the text file and create a new Player object. Next, we begin the loop. Note the two conditions we check: if the player is alive and if victory has not been achieved. For this game, the only way to lose is by dying. However, there isn’t any code yet that lets the player win. In my story, I want the player to escape the cave alive. If they do that, they win. To implement this behavior, we’re going to add a very simple room and place it into our world. Switch back to tiles.py and add this class:

class LeaveCaveRoom(MapTile):
    def intro_text(self):
        return """
        You see a bright light in the distance...
        ... it grows as you get closer! It's sunlight!
        
        Victory is yours!
        """

    def modify_player(self, player):
        player.victory = True

Don’t forget to include one of these rooms somewhere in your map.txt file. Now that the player can win, let’s finish the game loop.

def play():
    world.load_tiles()
    player = Player()
    #These lines load the starting room and display the text
    room = world.tile_exists(player.location_x, player.location_y)
    print(room.intro_text())
    while player.is_alive() and not player.victory:
        room = world.tile_exists(player.location_x, player.location_y)
        room.modify_player(player)
        # Check again since the room could have changed the player's state
        if player.is_alive() and not player.victory:
            print("Choose an action:\n")
            available_actions = room.available_actions()
            for action in available_actions:
                print(action)
            action_input = input('Action: ')
            for action in available_actions:
                if action_input == action.hotkey:
                    player.do_action(action, **action.kwargs)
                    break

The first thing the loop does is find out what room the player is in and then execute the behavior for that room. If the player is alive and has not won after the behavior executes, we prompt the human player for input. This is done using the built-in input() function. If the human player provided a matching hotkey, then we execute the associated action using the do_action method.
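If you want to poke at the loop in isolation, here is a stripped-down, self-contained sketch of the same flow. Everything in it is a stand-in rather than the tutorial's real code: the `Action` and `Player` classes are simplified, the "world" is just a position counter with an exit at `exit_at`, and `play` takes a `read_input` parameter (my addition) so the loop can be driven by scripted input instead of the keyboard.

```python
class Action:
    """Simplified stand-in: a player method plus the hotkey that triggers it."""
    def __init__(self, method, name, hotkey, **kwargs):
        self.method = method
        self.name = name
        self.hotkey = hotkey
        self.kwargs = kwargs

    def __str__(self):
        return "{}: {}".format(self.hotkey, self.name)


class Player:
    """Simplified stand-in for the tutorial's Player."""
    def __init__(self):
        self.hp = 10
        self.victory = False
        self.position = 0

    def is_alive(self):
        return self.hp > 0

    def move(self, dx):
        self.position += dx

    def do_action(self, action, **kwargs):
        # Look the method up by name on this instance, then call it.
        action_method = getattr(self, action.method.__name__, None)
        if action_method:
            action_method(**kwargs)


def play(read_input=input, exit_at=2):
    player = Player()
    while player.is_alive() and not player.victory:
        # Stand-in for room.modify_player(): reaching the exit wins the game.
        if player.position == exit_at:
            player.victory = True
        # Check again since the "room" could have changed the player's state
        if player.is_alive() and not player.victory:
            actions = [Action(Player.move, "Move east", "e", dx=1)]
            for action in actions:
                print(action)
            choice = read_input("Action: ")
            for action in actions:
                if choice == action.hotkey:
                    player.do_action(action, **action.kwargs)
                    break
    return player


# Scripted input: one unknown key (ignored), then two moves east.
scripted = iter(["x", "e", "e"])
winner = play(read_input=lambda prompt: next(scripted))
print(winner.victory)  # True
```

Unrecognized input simply falls through the hotkey check and the loop asks again, which is the same behavior as the full version above.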

The last thing we need to include is an instruction for Python to know that play() should run when running the file. Include these lines at the bottom of the game.py module:

if __name__ == "__main__":
    play()

To run the program, navigate to the folder containing the adventuretutorial package in your console and run python adventuretutorial/game.py. If you get warnings about packages, try setting your PYTHONPATH environment variable manually. Have fun!

Where to go from here

Congratulations! You now have a working text adventure game. With the information learned here, you should be able to quickly add your own custom items, enemies, and tiles. If you’re up for more of a challenge, here are some of the features included in Make Your Own Python Text Adventure:

An easier and more flexible way to build your world (no text files or reflection!)
A game economy where the player can buy and sell items
The ability for players to heal during and between fights
Difficulty settings to make the game harder or easier

By Phillip Johnson | August 28, 2014 | Programming | 134 Comments |