Introducing Scalinear 0.1: A Simple Linear Algebra Library for Scala

Earlier this month I was dabbling with WebGL in Scala.js and found myself running into a wall with the matrix math. Since Scala provides no native support for matrices and many WebGL projects leverage Sylvester.js, I was stuck trying to find a comparable Scala library from which I could poach code. (Scala.js interprets Scala source code).

The best I found was Breeze, which does indeed provide a lot of good support for matrices and vectors. Unfortunately, it is a large library and does a lot more than I needed. Pulling down the necessary code was impractical. So I found myself building my own simple library to tide me over. The result was Scalinear: a simple, no-frills library for matrix and vector support in Scala.


Vectors are created simply:

scala> val v = Vector(2,4,6,8)
v: com.letstalkdata.scalinear.Vector[Int] = [2, 4, 6, 8]

They also support arithmetic in an intuitive manner:

scala> v / 2
res1: com.letstalkdata.scalinear.Vector[Int] = [1, 2, 3, 4]

scala> val u = Vector(3,3,3,3)
u: com.letstalkdata.scalinear.Vector[Int] = [3, 3, 3, 3]

scala> v + u
res2: com.letstalkdata.scalinear.Vector[Int] = [5, 7, 9, 11]


Matrices are syntactically similar to Vertices:

scala> val A = Matrix(Vector(1,2), Vector(3,4), Vector(5,6))
A: com.letstalkdata.scalinear.Matrix[Int] =
[1, 2]
[3, 4]
[5, 6]

scala> val B = Matrix(Vector(1,2,3), Vector(4,5,6))
B: com.letstalkdata.scalinear.Matrix[Int] =
[1, 2, 3]
[4, 5, 6]

scala> A * B
res0: com.letstalkdata.scalinear.Matrix[Int] =
[9, 12, 15]
[19, 26, 33]
[29, 40, 51]

By design, the library is very limited. It’s definitely not intended as a replacement for something like Breeze. Instead, it’s a simple solution when only basic linear algebra support is needed. Having said that, I’d love to know if there are missing functions that would make your life easier. Please take a look at the project and feel free to leave feedback below.

More experimenting with Scala.js

When I was younger, I remember spending way too much time with a puzzle much like this one. Mine was diamond shaped, and on the back there were triangles to create a 3D pyramid with the same shapes. It came with a solution guide, and at the time it seemed like there were thousands of solutions. In retrospect, there were probably only a few hundred.

The simplicity of the game made it seem accessible as an early programming project, and indeed it was one of the first graphical applications I tried to make. My GitHub log tells me the first commit was in July of 2012, and I can’t believe it’s only been three years since I started getting serious about programming.

I never really finished that project and have been wanting to revisit it for some time now. When I decided to learn more about HTML5 canvas graphics, it seemed like the perfect opportunity. But I kind of hate JavaScript. Whenever I find myself writing JavaScript, I feel like I’m hacking my way through the code to fix some weird UI behavior. So it seemed obvious to me combine my recent dabbling in Scala.js with my exploration of HTML5.

The result was a first draft of Hexiles.
A screen shot of the hexiles puzzle game

Play the game!
Fork the game on GitHub!

How to use Python generators to save memory

I recently saw a Reddit thread where someone was asking for help managing memory in Python. It was a pretty short script, but it contained the following function:

def generate():
    charlist = "abcdefghijklmnopqrstuvwxyz"
    return [''.join(i) for i in itertools.permutations(charlist)]

Well there's your problem.Needless to say, I think I found the memory problem.

This function returns every possible ordering of the lower case letters a through z. Since there are 26 letters, this means there are 26! permutations—which is a really big number. To give you an idea, this results in more strings than there are grains of sand on Earth and stars in the universe. (Interesting side fact: any shuffled deck of cards is just 1/52! possible decks which means almost certainly no other card deck has ever been ordered that same way.)

The user reported 20% of his memory being used, but I’m surprised that even happened. My back of the envelope suggests that (26! permutations) * (26 letters) * (1 byte per letter) is way more than would fit into anyone’s computer memory. It’s possible Python does some optimization of string storage. Regardless, using a generator, we can reduce the memory footprint to almost nothing.

Let’s take a look first at the memory footprint when generating permutations for just 10 characters. I used the guppy package to gather the stats which currently only works for Python 2.

Python 2.7.6 (default, Sep  9 2014, 15:04:36) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from guppy import hpy
>>> import itertools
>>> hp = hpy()
>>> start = hp.heap()
>>> start[0]
Partition of a set of 11866 objects. Total size = 977040 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  11866 100   977040 100    977040 100 str

After the imports, a profile shows ~11,000 string in memory using about 1MB of memory. Now we’ll do our expensive calculation.

>>> x = [''.join(p) for p in itertools.permutations('0123456789')]
>>> end = hp.heap()
>>> used = end - start
>>> used[0]
Partition of a set of 3628805 objects. Total size = 174182672 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 3628805 100 174182672 100 174182672 100 str

As expected we have created ~10! new strings which take up 174MB of memory.

Instead of generating all of these strings at once and storing them in memory, we can instead use a generator. A generator is lazy: instead of creating all of the strings immediately, it will generate one string at a time on demand. Switching from a list comprehension to a generator comprehension is trivial. Simply replace the brackets ([ ]) with parentheses.

Here’s what the memory usage looks like with a generator:

Python 2.7.6 (default, Sep  9 2014, 15:04:36) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import itertools
>>> from guppy import hpy
>>> hp = hpy()
>>> start = hp.heap()
>>> x = (''.join(p) for p in itertools.permutations('0123456789'))
>>> end = hp.heap()
>>> used = end - start
>>> used
Partition of a set of 32 objects. Total size = 5272 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      6  19     1680  32      1680  32 dict (no owner)
     1      3   9     1488  28      3168  60 types.FrameType
     2      2   6      560  11      3728  71 dict of guppy.etc.Glue.Owner
     3      6  19      488   9      4216  80 tuple
     4      9  28      480   9      4696  89 str
     5      2   6      256   5      4952  94 types.CodeType
     6      2   6      144   3      5096  97 guppy.etc.Glue.Owner
     7      1   3       96   2      5192  98 itertools.permutations
     8      1   3       80   2      5272 100 types.GeneratorType

Wow! We only created a handful strings (probably due to the variable names). None of the permutations have actually been calculated at this point. To get those permutations, we can call and iterate through one string at a time. Also take note that the generator itself takes up a trivial 80 bytes.

What we probably won’t save on is CPU time. Using Python’s timeit module, it took 9.326 seconds on a 2.9GHz i7 to generate one million permutations of the letters. To actually generate all 26! permutations on my computer would still take trillions of years. So bear in mind that when you use a generator, you still should only be creating a number of objects that you can generate in a reasonable amount of time.

By using a generator, we saw that it is now memory-safe to create 26! permutations since they will only be generated one at a time, on demand. Generators will not work for all situations, but any time you have a large number of objects and you only need to see one at a time, you might consider using a generator to save memory.

Review: edX Artificial Intelligence

Artificial intelligence has always been something that has interested me, but more in the “I can beat you at chess” sense than the Data from Star Trek sense. Perhaps because the former seems like something I can reasonably design whereas the latter is so far impossible. Thankfully for me, the edX Artificial Intelligence class (aka BerkeleyX: CS188.1x) deals strictly with problem solving.

I was really impressed by the quality of the lectures in this class. They are mostly videos of Dan Klein teaching the class at Berkeley accompanied by really great Power Point slides. I’ve noticed that the best teachers are those who are experts in their field, but still choose to teach intro classes. Professor Klein clearly falls into that category and his passion really comes through in this class. He covers a wide variety of AI topics including search, adversaries, MDPs, and lotteries. Getting to see these ideas come to life in “real-time” through the programming challenges is very encouraging and helps to solidify the concepts learned in lecture. I also appreciated that the weeks with programming challenges did not have any other requirements such as lectures or quizzes.

The main complaint I have is a technical complaint about the edX interface. Homework questions can be answered an unlimited number of times and the final exam questions can be answered 1-2 times. However, after answering a question incorrectly, you cannot simply change your answer and recheck it. Instead, you must hit “reset” which clears your incorrect answer(s) and even more frustratingly sometimes changes the problem! This is not that big of a deal for multiple choice questions, but there were many multi-part, math-based questions where you could get, say, eight out of ten answers correct yet have to redo the entire problem after hitting reset. Based on the discussion forums, some students ended up writing scripts to solve the questions given certain inputs. I went more low-tech and built some spreadsheets to do the same thing. I never got the feeling that this was expected. At most the questions say something to the effect of, “We suggest you work these out on paper before answering the questions.” I understand they may want to prevent students from just randomly guessing, but this just felt punishing.

A smaller complaint, really more of an annoyance, is the number of edX-external websites involved in the class. Although participation is not required, discussion took place on Piazza instead of edX’s built in discussion boards. I immediately thought, “Oh great, yet another ‘novel’ approach to the discussion board” and, go figure, it was nothing special and even seemed to lack some common message board features. Another example (also not required) was submitting practice final exams to GradeScope for grading.

Despite these issues, I would still definitely recommend the class. The content is of such high quality that it’s easy to overlook some frustration here and there. If you are looking for an advanced programming MOOC or have an interest in AI, this is a great place to start.

An Experiment with Scala.js

We don’t usually think of Scala as a front-end language, but Scala.js is challenging that assumption. The framework gives you easy access to the DOM and therefore the canvas, making it trivial to develop an HTML5 application.

To demonstrate this, I built an A* maze solver entirely in Scala.js:
A summary of my discoveries:

  • Obviously, you get everything you love about Scala (the language) and get to avoid everything you hate about JavaScript (the language).
  • Rapid prototyping and development of small web apps
  • Fast and highly-optimized code


  • SBT is a pain to work with. Versions must line up, and the SBT spell must be incanted exactly to get your project to build.
  • You can’t run code or tests from the IDE (well, at least IntelliJ). One workaround I saw is to create two projects: one for Scala and one for Scala.js and integrate them in the IDE. Feels clumsy.
  • You can only use code that has been compiled for Scala.js which means that most libraries and any Java code you have won’t work out of the box.

I wouldn’t be ready to recommend Scala.js in production, but it is definitely fun to play with and I hope more people try it out and expand the ecosystem.

Full code for this project available on GitHub.