How to use Python generators to save memory

I recently saw a Reddit thread where someone was asking for help managing memory in Python. It was a pretty short script, but it contained the following function:

def generate():
    charlist = "abcdefghijklmnopqrstuvwxyz"
    return [''.join(i) for i in itertools.permutations(charlist)]

Well, there's your problem. Needless to say, I think I found the source of the memory issue.

This function returns every possible ordering of the lowercase letters a through z. Since there are 26 letters, there are 26! permutations—which is a really big number. To give you an idea, this is more strings than there are grains of sand on Earth and stars in the universe. (Interesting side fact: any shuffled deck of cards is just one of 52! possible orderings, which means it is almost certain that no other deck of cards has ever been in that same order.)

The user reported 20% of their memory being used, but I’m surprised even that happened. My back-of-the-envelope estimate suggests that (26! permutations) * (26 letters) * (1 byte per letter) is far more than would fit in anyone’s computer memory. It’s possible Python does some optimization of string storage. Regardless, by using a generator we can reduce the memory footprint to almost nothing.
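That estimate is easy to check. Here’s a quick sketch (using Python 3, even though the profiling sessions below are Python 2):

```python
import math

# Rough lower bound: 26! permutations, 26 letters each, 1 byte per letter.
# This ignores Python's per-string object overhead, which only makes it worse.
estimated_bytes = math.factorial(26) * 26

print("%.2e bytes" % estimated_bytes)  # on the order of 10^28 bytes
```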

Let’s first take a look at the memory footprint when generating permutations of just 10 characters. I used the guppy package to gather the stats; it currently only works with Python 2.

Python 2.7.6 (default, Sep  9 2014, 15:04:36) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from guppy import hpy
>>> import itertools
>>> hp = hpy()
>>> start = hp.heap()
>>> start[0]
Partition of a set of 11866 objects. Total size = 977040 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  11866 100   977040 100    977040 100 str

After the imports, a profile shows ~11,000 strings in memory using about 1 MB. Now we’ll do our expensive calculation.

>>> x = [''.join(p) for p in itertools.permutations('0123456789')]
>>> end = hp.heap()
>>> used = end - start
>>> used[0]
Partition of a set of 3628805 objects. Total size = 174182672 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 3628805 100 174182672 100 174182672 100 str

As expected, we have created 10! (~3.6 million) new strings, which take up 174 MB of memory.

Instead of generating all of these strings at once and storing them in memory, we can use a generator. A generator is lazy: instead of creating all of the strings immediately, it will generate one string at a time on demand. Switching from a list comprehension to a generator expression is trivial: simply replace the brackets ([ ]) with parentheses.
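For example, here is the swap on a small three-letter alphabet (a minimal sketch, in Python 3 syntax):

```python
import itertools

# List comprehension: every string is built and stored up front.
eager = [''.join(p) for p in itertools.permutations('abc')]

# Generator expression: identical except for the parentheses; strings
# are produced one at a time, on demand.
lazy = (''.join(p) for p in itertools.permutations('abc'))

print(eager)       # ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
print(next(lazy))  # abc
```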

Here’s what the memory usage looks like with a generator:

Python 2.7.6 (default, Sep  9 2014, 15:04:36) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import itertools
>>> from guppy import hpy
>>> hp = hpy()
>>> start = hp.heap()
>>> x = (''.join(p) for p in itertools.permutations('0123456789'))
>>> end = hp.heap()
>>> used = end - start
>>> used
Partition of a set of 32 objects. Total size = 5272 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      6  19     1680  32      1680  32 dict (no owner)
     1      3   9     1488  28      3168  60 types.FrameType
     2      2   6      560  11      3728  71 dict of guppy.etc.Glue.Owner
     3      6  19      488   9      4216  80 tuple
     4      9  28      480   9      4696  89 str
     5      2   6      256   5      4952  94 types.CodeType
     6      2   6      144   3      5096  97 guppy.etc.Glue.Owner
     7      1   3       96   2      5192  98 itertools.permutations
     8      1   3       80   2      5272 100 types.GeneratorType

Wow! We only created a handful of strings (probably due to the variable names). None of the permutations have actually been calculated at this point. To get the permutations, we can call x.next() (next(x) in Python 3) and iterate through one string at a time. Also note that the generator itself takes up a trivial 80 bytes.
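Pulling values out lazily looks like this (a sketch in Python 3; itertools.islice takes just the first few values without materializing the rest):

```python
import itertools

perms = (''.join(p) for p in itertools.permutations('0123456789'))

# islice lazily pulls only the first five strings; the remaining
# ~3.6 million permutations are never materialized.
first_five = list(itertools.islice(perms, 5))
print(first_five)
# ['0123456789', '0123456798', '0123456879', '0123456897', '0123456978']
```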

What we probably won’t save is CPU time. Using Python’s timeit module, it took 9.326 seconds on a 2.9GHz i7 to generate one million permutations of the letters. Generating all 26! permutations on my computer would still take trillions of years. So bear in mind that even with a generator, you should only plan on producing a number of objects you can generate in a reasonable amount of time.
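Here is roughly how such a timing could be gathered (a sketch using timeit; the 9.326-second figure above is from my machine, and your numbers will differ):

```python
import itertools
import timeit

def million_permutations():
    # Lazily generate and discard one million 26-letter permutations.
    gen = (''.join(p) for p in
           itertools.permutations('abcdefghijklmnopqrstuvwxyz'))
    for _ in itertools.islice(gen, 1000000):
        pass

elapsed = timeit.timeit(million_permutations, number=1)
print("%.3f seconds" % elapsed)
```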

By using a generator, we saw that it is now memory-safe to create 26! permutations since they will only be generated one at a time, on demand. Generators will not work for all situations, but any time you have a large number of objects and you only need to see one at a time, you might consider using a generator to save memory.

Review: edX Artificial Intelligence

Artificial intelligence has always been something that has interested me, but more in the “I can beat you at chess” sense than the Data from Star Trek sense. Perhaps because the former seems like something I can reasonably design whereas the latter is so far impossible. Thankfully for me, the edX Artificial Intelligence class (aka BerkeleyX: CS188.1x) deals strictly with problem solving.

I was really impressed by the quality of the lectures in this class. They are mostly videos of Dan Klein teaching the class at Berkeley, accompanied by really great PowerPoint slides. I’ve noticed that the best teachers are those who are experts in their field but still choose to teach intro classes. Professor Klein clearly falls into that category, and his passion really comes through in this class. He covers a wide variety of AI topics including search, adversaries, MDPs, and lotteries. Getting to see these ideas come to life in “real-time” through the programming challenges is very encouraging and helps to solidify the concepts learned in lecture. I also appreciated that the weeks with programming challenges did not have any other requirements such as lectures or quizzes.

The main complaint I have is a technical complaint about the edX interface. Homework questions can be answered an unlimited number of times and the final exam questions can be answered 1-2 times. However, after answering a question incorrectly, you cannot simply change your answer and recheck it. Instead, you must hit “reset” which clears your incorrect answer(s) and even more frustratingly sometimes changes the problem! This is not that big of a deal for multiple choice questions, but there were many multi-part, math-based questions where you could get, say, eight out of ten answers correct yet have to redo the entire problem after hitting reset. Based on the discussion forums, some students ended up writing scripts to solve the questions given certain inputs. I went more low-tech and built some spreadsheets to do the same thing. I never got the feeling that this was expected. At most the questions say something to the effect of, “We suggest you work these out on paper before answering the questions.” I understand they may want to prevent students from just randomly guessing, but this just felt punishing.

A smaller complaint, really more of an annoyance, is the number of edX-external websites involved in the class. Although participation was not required, discussion took place on Piazza instead of edX’s built-in discussion boards. I immediately thought, “Oh great, yet another ‘novel’ approach to the discussion board” and, go figure, it was nothing special and even seemed to lack some common message board features. Another example (also not required) was submitting practice final exams to GradeScope for grading.

Despite these issues, I would still definitely recommend the class. The content is of such high quality that it’s easy to overlook some frustration here and there. If you are looking for an advanced programming MOOC or have an interest in AI, this is a great place to start.

An Experiment with Scala.js

We don’t usually think of Scala as a front-end language, but Scala.js is challenging that assumption. The framework gives you easy access to the DOM and therefore the canvas, making it trivial to develop an HTML5 application.

To demonstrate this, I built an A* maze solver entirely in Scala.js.

A summary of my discoveries:
Pros:

  • Obviously, you get everything you love about Scala (the language) and get to avoid everything you hate about JavaScript (the language).
  • Rapid prototyping and development of small web apps
  • Fast and highly-optimized code

Cons:

  • SBT is a pain to work with. Versions must line up, and the SBT spell must be incanted exactly to get your project to build.
  • You can’t run code or tests from the IDE (well, at least IntelliJ). One workaround I saw is to create two projects: one for Scala and one for Scala.js and integrate them in the IDE. Feels clumsy.
  • You can only use code that has been compiled for Scala.js which means that most libraries and any Java code you have won’t work out of the box.

I wouldn’t be ready to recommend Scala.js in production, but it is definitely fun to play with, and I hope more people try it out and expand the ecosystem.

Full code for this project is available on GitHub.

How to Use Recursion in Scala

When learning about recursion, a common example is the Fibonacci sequence. In simple terms, we say that the nth Fibonacci number is equal to the sum of the (n-1)th Fibonacci number and the (n-2)th Fibonacci number. By adding the rule that the 0th and 1st Fibonacci numbers are 0 and 1 respectively, it’s possible to determine the value of any number in the series. This is pretty simple to define:

  def fib(n: Long): Long = n match {
    case 0 => 0
    case 1 => 1
    case _ => fib(n - 1) + fib(n - 2)
  }

This is a really beautiful function because it purely describes the rules above. If n is 0 or 1, we exit the function; otherwise, we recursively calculate the (n-1)th and (n-2)th Fibonacci numbers. Unfortunately, pure functions like this are not always efficient in Scala.

Consider how the computer would work through fib(4). It must traverse a tree of functions until it gets to either fib(1) or fib(0):

                    fib(4)
                 /         \
          fib(3)              fib(2)
          /   \               /   \
     fib(2)    fib(1)    fib(1)    fib(0)
     /   \
fib(1)    fib(0)

Since we told the computer the values of fib(1) and fib(0), it can finally calculate the total by summing these together: 3.

This tree quickly increases in complexity (on the order of 2^n) to the point where the function is almost unusable for even small inputs like 30 or 50. While most Scala programmers seem to agree that a functional style is preferable to an imperative style, the language itself allows us to do either. Should we sacrifice beauty for practicality? Let’s consider an imperative approach.

  def fib() = {
    var n = 4
    var a = 0
    var b = 1
    var c = 1
    while(n > 0) {
      a = b
      b = c
      c = a + b
      n = n - 1
    }
    a
  }

This code is a bit harder to read. To start out with, we have to declare our n value inside the function which feels wrong. Then we seed a few variables with the start of the Fibonacci sequence: 0, 1, 1. Here’s a chart to show how the computer processes this loop:

  n       a   b   c
  [seed]  0   1   1
  4       1   1   2
  3       1   2   3
  2       2   3   5
  1       3   5   8

Notice how in each iteration of the loop the old value of a drops off? This saves both time and memory. It saves time because there are fewer iterations of the loop to run through (just 4 instead of 8), and it saves memory because we are only keeping track of a, b, c, and n instead of all the branches of the tree in the previous function.

So while we now have an efficient method, it feels a bit dirty and is not easy to understand simply by looking at the code. Is there a way to balance intuitiveness and efficiency? In fact, we can take advantage of tail recursion in Scala.

  def fib(n: Long): Long = {
    def fibHelper(n: Long, a: Long, b: Long): Long = {
      if (n == 0) a
      else fibHelper(n - 1, b, a + b)
    }
    fibHelper(n, 0, 1)
  }

This function is “tail recursive” because it is recursive (fibHelper calls itself), but the recursive call is the last operation in the function, so the computer never needs to keep more than the current call’s state around. Just like the imperative approach, this function “forgets” the previous a value. By taking a tail-recursive approach, we get the performance of the imperative method with a relatively “pure” function: it does not resort to modifying variables, nor does it require n to be defined inside the method.

Here’s a performance comparison for calculating Fibonacci 35:

  Type             Stack Memory (bytes)  Time (ms)
  Naive Recursion  78892                 78.8502
  Imperative       5056                  0.0030
  Tail Recursion   5248                  0.0037

Tail recursion is so preferable that the Scala compiler will try to apply this optimization automatically when compiling to bytecode. On my version (2.10.4), this function was optimized automatically. However, you can provide a compiler hint with the @tailrec annotation right above the recursive method (here, fibHelper). This makes the requirement explicit: the compiler will throw a compile-time error if the function is not truly tail recursive.
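Annotated, the function from above would look like this (a sketch; the annotation changes nothing at runtime, it only asserts the optimization at compile time):

```scala
import scala.annotation.tailrec

def fib(n: Long): Long = {
  // @tailrec fails compilation if fibHelper is not truly tail recursive.
  @tailrec
  def fibHelper(n: Long, a: Long, b: Long): Long =
    if (n == 0) a
    else fibHelper(n - 1, b, a + b)
  fibHelper(n, 0, 1)
}
```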

Much of this was inspired by the discussion of recursion in Structure and Interpretation of Computer Programs. For more details, I highly recommend this book.

Implementing Minimax in Scala: Alpha-Beta Pruning

Last time, we built a naive minimax algorithm in Scala. While it was effective, it was unfortunately very slow. To increase the performance of the algorithm, we’ll implement a common variant of minimax with what is called “alpha-beta pruning”. Again, if you want to see the full code right away, head over to GitHub.

Recall that the minimax algorithm works by playing out all possible games from a starting game state. During the course of the analysis, paths will undoubtedly be encountered that have no hope of resulting in victory. Alpha-beta pruning removes those as possible paths and stops searching any deeper down them. This is accomplished by checking on each iteration whether the score returned is worse than the best score already encountered.

  private def minimax(state:GameState):Int = {
    minimize(state, maxDepth, Integer.MIN_VALUE, Integer.MAX_VALUE)
  }

  private def minimize(state:GameState, depth:Int, alpha:Int, beta:Int):Int = {
    if(state.isTerminal || depth == 0) return state.hVal
    var newBeta = beta
    state.availableMoves.foreach(move => {
      val newState = state.makeMove(move)
      newBeta = math.min(newBeta, maximize(newState, depth - 1, alpha, newBeta))
      if (alpha >= newBeta) return alpha
    })
    newBeta
  }

You should see a lot of similarity here. The big difference is that instead of just accepting the result of the maximize function as the best result, β is now the smaller of the previous β value and the result of maximize. Next comes the critical check: if α ≥ β, we immediately exit the function and return α as the best possible score.

The maximize function is of course quite similar, except this time we return β if no better score is found.

  private def maximize(state:GameState, depth:Int, alpha:Int, beta:Int):Int = {
    if(state.isTerminal || depth == 0) return state.hVal
    var newAlpha = alpha
    state.availableMoves.foreach(move => {
      val newState = state.makeMove(move)
      newAlpha = math.max(newAlpha, minimize(newState, depth - 1, newAlpha, beta))
      if (newAlpha >= beta) return beta
    })
    newAlpha
  }

When I ran this code in the same game as the naive algorithm, the processing time was an order of magnitude faster.