How to use Python generators to save memory

I recently saw a Reddit thread where someone was asking for help managing memory in Python. It was a pretty short script, but it contained the following function:

import itertools

def generate():
    charlist = "abcdefghijklmnopqrstuvwxyz"
    return [''.join(i) for i in itertools.permutations(charlist)]

Well, there’s your problem. Needless to say, I think I found the culprit.

This function returns every possible ordering of the lowercase letters a through z. Since there are 26 letters, there are 26! permutations, which is about 4 × 10^26. To give you an idea of the scale, that is more strings than there are grains of sand on Earth or stars in the observable universe. (Interesting side fact: any shuffled deck of cards is just one of 52! possible orderings, which means it is almost certain that no deck of cards has ever been shuffled into that same order before.)

The user reported 20% of their memory being used, but I’m surprised it got even that far. A back-of-the-envelope calculation says that (26! permutations) × (26 letters) × (1 byte per letter) is on the order of 10^28 bytes, far more than would fit into anyone’s computer memory. It’s possible Python does some optimization of string storage. Regardless, using a generator, we can reduce the memory footprint to almost nothing.
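A quick sanity check of that estimate (this snippet is mine, and it deliberately ignores Python’s per-object overhead, which only makes things worse):

import math

# 26! permutation strings at 26 one-byte characters each.
n_strings = math.factorial(26)   # about 4.03e26 permutations
total_bytes = n_strings * 26     # about 1.05e28 bytes, roughly 10^16 TB
print(total_bytes)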

Let’s first take a look at the memory footprint when generating permutations of just 10 characters. I used the guppy package to gather the stats; at the time of writing, it only works on Python 2.

Python 2.7.6 (default, Sep  9 2014, 15:04:36) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from guppy import hpy
>>> import itertools
>>> hp = hpy()
>>> start = hp.heap()
>>> start[0]
Partition of a set of 11866 objects. Total size = 977040 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  11866 100   977040 100    977040 100 str

After the imports, a heap profile shows roughly 12,000 strings in memory, taking up about 1MB. Now we’ll do our expensive calculation.

>>> x = [''.join(p) for p in itertools.permutations('0123456789')]
>>> end = hp.heap()
>>> used = end - start
>>> used[0]
Partition of a set of 3628805 objects. Total size = 174182672 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 3628805 100 174182672 100 174182672 100 str

As expected, we have created 10! (3,628,800) new strings, which take up 174MB of memory. That works out to roughly 48 bytes per 10-character string once Python’s per-object overhead is included.

Instead of generating all of these strings at once and storing them in memory, we can use a generator. A generator is lazy: instead of creating all of the strings immediately, it generates one string at a time, on demand. Switching from a list comprehension to a generator expression is trivial: simply replace the brackets ([ ]) with parentheses.
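Applied to the function from the Reddit thread, the fix is essentially a one-character change:

import itertools

def generate():
    charlist = "abcdefghijklmnopqrstuvwxyz"
    # Parentheses instead of brackets: a generator expression that yields
    # one permutation string at a time rather than building a 26!-element list.
    return (''.join(p) for p in itertools.permutations(charlist))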

Here’s what the memory usage looks like with a generator:

Python 2.7.6 (default, Sep  9 2014, 15:04:36) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import itertools
>>> from guppy import hpy
>>> hp = hpy()
>>> start = hp.heap()
>>> x = (''.join(p) for p in itertools.permutations('0123456789'))
>>> end = hp.heap()
>>> used = end - start
>>> used
Partition of a set of 32 objects. Total size = 5272 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      6  19     1680  32      1680  32 dict (no owner)
     1      3   9     1488  28      3168  60 types.FrameType
     2      2   6      560  11      3728  71 dict of guppy.etc.Glue.Owner
     3      6  19      488   9      4216  80 tuple
     4      9  28      480   9      4696  89 str
     5      2   6      256   5      4952  94 types.CodeType
     6      2   6      144   3      5096  97 guppy.etc.Glue.Owner
     7      1   3       96   2      5192  98 itertools.permutations
     8      1   3       80   2      5272 100 types.GeneratorType

Wow! We only created a handful of strings (probably due to the variable names). None of the permutations have actually been calculated at this point. To get them, we can call x.next() (or the built-in next(x)) and iterate through one string at a time. Also note that the generator itself takes up a trivial 80 bytes.
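Continuing the session, pulling values out of the generator looks like this; since the input to itertools.permutations is sorted, results come out in lexicographic order, so these are the first two permutations:

>>> x.next()
'0123456789'
>>> x.next()
'0123456798'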

What we probably won’t save on is CPU time. Using Python’s timeit module, it took 9.326 seconds on a 2.9GHz i7 to generate one million permutations of the letters. At that rate, generating all 26! permutations on my computer would still take trillions of years. So bear in mind that even with a generator, you should only plan on producing a number of objects you can get through in a reasonable amount of time.
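Here’s a rough benchmark along those lines (a sketch; the variable name gen is mine, and your timings will vary by machine). It times one million next() calls against the 26-letter generator:

import timeit

setup = "import itertools; gen = (''.join(p) for p in itertools.permutations('abcdefghijklmnopqrstuvwxyz'))"

# Time one million next() calls, i.e. the first million permutations.
print(timeit.timeit('next(gen)', setup=setup, number=1000000))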

By using a generator, we saw that it is now memory-safe to work through the 26! permutations, since they are generated one at a time, on demand. Generators will not work in every situation, but any time you have a large number of objects and only need to see one at a time, consider using a generator to save memory.
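As a closing illustration (the stopping condition here is arbitrary, just something that matches early), this is the shape of loop that benefits: only the current permutation string exists in memory, and we can bail out as soon as we find what we want.

import itertools

def generate():
    charlist = "abcdefghijklmnopqrstuvwxyz"
    return (''.join(p) for p in itertools.permutations(charlist))

# Scan permutations one at a time; nothing is stored beyond the current string.
for perm in generate():
    if perm.endswith('zyx'):   # arbitrary condition; matches on the 6th permutation
        print(perm)
        break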
