I recently saw a Reddit thread where someone was asking for help managing memory in Python. It was a pretty short script, but it contained the following function:
def generate(): charlist = "abcdefghijklmnopqrstuvwxyz" return [''.join(i) for i in itertools.permutations(charlist)]
Needless to say, I think I found the memory problem.
This function returns every possible ordering of the lower case letters a through z. Since there are 26 letters, this means there are 26! permutations—which is a really big number. To give you an idea, this results in more strings than there are grains of sand on Earth and stars in the universe. (Interesting side fact: any shuffled deck of cards is just 1/52! possible decks which means almost certainly no other card deck has ever been ordered that same way.)
The user reported 20% of his memory being used, but I’m surprised that even happened. My back of the envelope suggests that (26! permutations) * (26 letters) * (1 byte per letter) is way more than would fit into anyone’s computer memory. It’s possible Python does some optimization of string storage. Regardless, using a generator, we can reduce the memory footprint to almost nothing.
Let’s take a look first at the memory footprint when generating permutations for just 10 characters. I used the guppy package to gather the stats which currently only works for Python 2.
Python 2.7.6 (default, Sep 9 2014, 15:04:36) [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> from guppy import hpy >>> import itertools >>> hp = hpy() >>> start = hp.heap() >>> start Partition of a set of 11866 objects. Total size = 977040 bytes. Index Count % Size % Cumulative % Kind (class / dict of class) 0 11866 100 977040 100 977040 100 str
After the imports, a profile shows ~11,000 string in memory using about 1MB of memory. Now we’ll do our expensive calculation.
>>> x = [''.join(p) for p in itertools.permutations('0123456789')] >>> end = hp.heap() >>> used = end - start >>> used Partition of a set of 3628805 objects. Total size = 174182672 bytes. Index Count % Size % Cumulative % Kind (class / dict of class) 0 3628805 100 174182672 100 174182672 100 str
As expected we have created ~10! new strings which take up 174MB of memory.
Instead of generating all of these strings at once and storing them in memory, we can instead use a generator. A generator is lazy: instead of creating all of the strings immediately, it will generate one string at a time on demand. Switching from a list comprehension to a generator comprehension is trivial. Simply replace the brackets (
[ ]) with parentheses.
Here’s what the memory usage looks like with a generator:
Python 2.7.6 (default, Sep 9 2014, 15:04:36) [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import itertools >>> from guppy import hpy >>> hp = hpy() >>> start = hp.heap() >>> x = (''.join(p) for p in itertools.permutations('0123456789')) >>> end = hp.heap() >>> used = end - start >>> used Partition of a set of 32 objects. Total size = 5272 bytes. Index Count % Size % Cumulative % Kind (class / dict of class) 0 6 19 1680 32 1680 32 dict (no owner) 1 3 9 1488 28 3168 60 types.FrameType 2 2 6 560 11 3728 71 dict of guppy.etc.Glue.Owner 3 6 19 488 9 4216 80 tuple 4 9 28 480 9 4696 89 str 5 2 6 256 5 4952 94 types.CodeType 6 2 6 144 3 5096 97 guppy.etc.Glue.Owner 7 1 3 96 2 5192 98 itertools.permutations 8 1 3 80 2 5272 100 types.GeneratorType
Wow! We only created a handful strings (probably due to the variable names). None of the permutations have actually been calculated at this point. To get those permutations, we can call
x.next() and iterate through one string at a time. Also take note that the generator itself takes up a trivial 80 bytes.
What we probably won’t save on is CPU time. Using Python’s timeit module, it took 9.326 seconds on a 2.9GHz i7 to generate one million permutations of the letters. To actually generate all 26! permutations on my computer would still take trillions of years. So bear in mind that when you use a generator, you still should only be creating a number of objects that you can generate in a reasonable amount of time.
By using a generator, we saw that it is now memory-safe to create 26! permutations since they will only be generated one at a time, on demand. Generators will not work for all situations, but any time you have a large number of objects and you only need to see one at a time, you might consider using a generator to save memory.