How to combine Scala pattern matching with regex

Scala’s pattern matching is arguably one of its most powerful features and is straightforward to use when matching on patterns like x::xs vs. x vs. Nil, but you can also use it to match regular expressions. This short tutorial will show you how to use pattern matching and regex to parse a simple DSL for filtering search results.

The domain of the tutorial is a library system where users can search by author, title, or year. They can also combine filters to narrow the search results. We’ll start by defining some objects to work with.

case class Book(title:String, author:String, year:Int)

val books = List(
  Book("Moby Dick", "Herman Melville", 1851),
  Book("A Tale of Two Cities", "Charles Dickens", 1859),
  Book("Oliver Twist", "Charles Dickens", 1837),
  Book("The Adventures of Tom Sawyer", "Mark Twain", 1876),
  Book("The Left Hand of Darkness", "Ursula Le Guin", 1969),
  Book("Never Let Me Go", "Kazuo Ishiguro", 2005)
)

To filter the books, we need to supply one or more predicates. A predicate is a function that accepts a Book and returns a Boolean. Our goal is to turn something like “author=Charles Dickens” into a predicate. For starters, we need to be able to parse out the user-supplied value “Charles Dickens”. Scala’s Regex class lets you surround parts of a pattern with parentheses to form groups, which can then be extracted as values. The example in the documentation is val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r. You can see there are three groups defined: one each for year, month, and day. Here are the patterns we’ll allow to constrain search results:

val authorEquals = """author=([\w\s]+)""".r
val authorLike   = """author~([\w\s]+)""".r
val titleEquals  = """title=([\w\s]+)""".r
val titleLike    = """title~([\w\s]+)""".r
val yearBefore   = """year<(\d+)""".r
val yearAfter    = """year>(\d+)""".r
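
To see group extraction in isolation, here is a minimal sketch that uses the date pattern quoted from the documentation. The helper name describeDate is an assumption made for illustration only; each parenthesised group is bound, in order, to a name in the case clause.

```scala
// The date pattern from the Scala documentation: three capture groups.
val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r

// describeDate is a hypothetical helper, not part of the tutorial's code.
def describeDate(s: String): String = s match {
  // Groups are bound left to right: year, month, day (all as Strings).
  case date(year, month, day) => s"year=$year month=$month day=$day"
  // Anything that is not exactly in the yyyy-mm-dd shape lands here.
  case _                      => "not a date"
}

val parsed   = describeDate("2004-01-20")   // "year=2004 month=01 day=20"
val rejected = describeDate("January 2004") // "not a date"
```

Note that the extracted groups are always strings, which is why the year filters later need a conversion to Int.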

Remember that the goal is to return a predicate for each filter. The syntax for an anonymous predicate is (b:Book) => [boolean]. Using our example, we could create a predicate (b:Book) => b.author == "Charles Dickens". To make the function generic, we need to be able to extract the supplied author value from the filter. Using the predefined regular expressions combined with pattern matching, we can do just that.

def parseFilter(filterString:String):Book => Boolean = filterString match {
  case authorEquals(value) => (b:Book) => b.author == value
}

The filterString is passed in and pattern matched against the pre-defined regular expression authorEquals. Since we declared one group in the expression, we can name that group (value) and then use that group as a variable. Here’s the complete function that includes all of the expressions.

def parseFilter(filterString:String):Book => Boolean = filterString match {
  case authorEquals(value) => (b:Book) => b.author == value
  case authorLike(value)   => (b:Book) => b.author.contains(value)
  case titleEquals(value)  => (b:Book) => b.title == value
  case titleLike(value)    => (b:Book) => b.title.contains(value)
  case yearBefore(value)   => (b:Book) => b.year < value.toInt
  case yearAfter(value)    => (b:Book) => b.year > value.toInt
  case _                   => (b:Book) => false
}

The last case catches any filter that doesn’t match a pattern and returns a predicate that does not match any book. The practical result is that an invalid filter returns no search results.
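
It helps to know when a filter counts as “not matching”. A Regex used as a pattern has to match the entire input string, so stray characters before or after a filter send it to the fallback case. The sketch below illustrates this with the authorEquals pattern; extractAuthor is a hypothetical helper introduced only for the demonstration, and the call to unanchored shows the opt-in alternative the Regex class provides.

```scala
val authorEquals = """author=([\w\s]+)""".r

// extractAuthor is a hypothetical helper used only to show what matches.
def extractAuthor(s: String): Option[String] = s match {
  case authorEquals(value) => Some(value)
  case _                   => None
}

val exact    = extractAuthor("author=Mark Twain")    // Some("Mark Twain")
val trailing = extractAuthor("author=Mark Twain!")   // None: "!" is outside [\w\s]
val leading  = extractAuthor("by author=Mark Twain") // None: extra text before the filter

// An unanchored copy of the same regex accepts a match anywhere in the input.
val looseAuthor = authorEquals.unanchored
val found = "by author=Mark Twain" match {
  case looseAuthor(value) => Some(value)
  case _                  => None
}
```

The character class [\w\s] also excludes punctuation such as periods, apostrophes, and hyphens, so a value like “Ursula K. Le Guin” would fall through to the fallback case with these patterns.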

Finally, we need to be able to check a book against one or more filters. The forall method is true only if all of the filters match the given book.

def checkBook(b:Book, filterString:String) = {
  val filters = filterString.split(",").map(s => parseFilter(s))
  filters.forall(_(b))
}
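
One thing to be aware of is that checkBook splits and parses the filter string again for every book it checks. A possible variation, sketched below, parses the string once and returns a single combined predicate. The name compileFilters is an assumption, parseFilter is cut down to two cases to keep the sketch short, and the trim call is an addition that tolerates spaces around the commas.

```scala
case class Book(title: String, author: String, year: Int)

val authorEquals = """author=([\w\s]+)""".r
val yearAfter    = """year>(\d+)""".r

// Cut-down stand-in for the tutorial's parseFilter (two cases only).
def parseFilter(filterString: String): Book => Boolean = filterString match {
  case authorEquals(value) => (b: Book) => b.author == value
  case yearAfter(value)    => (b: Book) => b.year > value.toInt
  case _                   => (b: Book) => false
}

// compileFilters is a hypothetical name: parse once, reuse for every book.
def compileFilters(filterString: String): Book => Boolean = {
  // trim is an addition so that "a=b, c>d" works with a space after the comma.
  val filters = filterString.split(",").map(s => parseFilter(s.trim))
  (b: Book) => filters.forall(f => f(b))
}

val sampleBooks = List(
  Book("A Tale of Two Cities", "Charles Dickens", 1859),
  Book("Oliver Twist", "Charles Dickens", 1837)
)

val dickensAfter1840 = compileFilters("author=Charles Dickens, year>1840")
val result = sampleBooks.filter(dickensAfter1840)
```

Because the result is an ordinary Book => Boolean, it can be passed straight to filter without a wrapper lambda.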

We now have everything in place to filter the books according to our search string. Here are some examples:

books.filter(b => checkBook(b, "author=Charles Dickens"))
res0: List[Book] = List(
    Book(A Tale of Two Cities,Charles Dickens,1859),
    Book(Oliver Twist,Charles Dickens,1837))

books.filter(b => checkBook(b, "author=Charles Dickens,year>1840"))
res1: List[Book] = List(
    Book(A Tale of Two Cities,Charles Dickens,1859))

books.filter(b => checkBook(b, "title~of"))
res2: List[Book] = List(
    Book(A Tale of Two Cities,Charles Dickens,1859),
    Book(The Adventures of Tom Sawyer,Mark Twain,1876),
    Book(The Left Hand of Darkness,Ursula Le Guin,1969))

Try to add some more filters such as “starts with” or “year equals” to get practice working with regex matching.
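
If you want something to compare your attempt against, here is one possible sketch. The operators are assumptions chosen for illustration: “^” for “starts with” and “=” for “year equals”. The names titleStartsWith, yearEquals, and parseExtraFilter are likewise hypothetical; in practice the two cases would be added to parseFilter itself.

```scala
case class Book(title: String, author: String, year: Int)

// Hypothetical operators: "^" means starts-with, "=" means year equality.
// The caret is escaped because an unescaped ^ is a regex anchor.
val titleStartsWith = """title\^([\w\s]+)""".r
val yearEquals      = """year=(\d+)""".r

def parseExtraFilter(filterString: String): Book => Boolean = filterString match {
  case titleStartsWith(value) => (b: Book) => b.title.startsWith(value)
  case yearEquals(value)      => (b: Book) => b.year == value.toInt
  // Same fallback as the tutorial: an unknown filter matches nothing.
  case _                      => (b: Book) => false
}

val mobyDick = Book("Moby Dick", "Herman Melville", 1851)
val leftHand = Book("The Left Hand of Darkness", "Ursula Le Guin", 1969)

val startsWithThe = parseExtraFilter("title^The")
val publishedIn1851 = parseExtraFilter("year=1851")
```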
