Scala Notebook

Charles Kubicek

Replace Iterations With Transformations

The Scala collections library is extremely powerful and one of the most compelling parts of Scala. To use it effectively requires re-thinking the approach to programming with collections and data in general.

In Java we think about data processing in terms of iterations; we iterate over collections to create new collections by applying some logic. Here’s a simple example:

Java Example
List<Customer> customers = getCustomers();

List<String> result = new ArrayList<String>();

for(Customer c: customers){
  if(c.age > 18) result.add(c.firstName);
}

At the core of data processing with Scala collections are functions that take other functions as parameters, known as higher-order functions. This relatively simple concept has a profound effect not only on how we do data processing, but on how we think about data processing in general. In many cases the function parameter is defined anonymously, as a lambda, without being part of a class or object.

Here’s what the above example looks like using higher-order functions in Scala:

Scala Example
val customers:List[Customer] = getCustomers()

val result = customers
               .filter(c => c.age > 18)
               .map(c => c.firstName)

  • filter is a method on the customers List that takes a function as a parameter. The signature of filter states that the function parameter must take one parameter of the type contained in the collection, and return a Boolean. Filter returns a new collection that contains the values for which the function returned true. You may hear the function parameter referred to as a predicate when used in a filter.

  • map is also a method on the customers List that takes a function, but in this case the function can return a value of any type. Map returns a new collection containing the results of applying the function parameter to each element of the List. The idea is you’re mapping from an input to an output.
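
To make the two signatures concrete, here is a minimal sketch with the function types written out. The Customer case class, the sample data and the names isOver18 and toFirstName are assumptions for illustration; the post itself only implies that a customer has an age and a firstName.

```scala
// Hypothetical Customer type; only the age and firstName fields are implied by the post.
case class Customer(firstName: String, age: Int)

// Sample data, invented for illustration.
val sampleCustomers: List[Customer] = List(
  Customer("Ann", 34),
  Customer("Bob", 17),
  Customer("Cara", 18),
  Customer("Dev", 52)
)

// filter expects a predicate: a function from the element type to Boolean.
val isOver18: Customer => Boolean = c => c.age > 18
val adults: List[Customer] = sampleCustomers.filter(isOver18)

// map expects a function from the element type to any other type, here String.
val toFirstName: Customer => String = c => c.firstName
val adultNames: List[String] = adults.map(toFirstName)
// adultNames == List("Ann", "Dev")
```

Note how the result type follows the function passed in: filter keeps a List[Customer], while map with a Customer => String function produces a List[String].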

The example above can be shortened by replacing the named parameter ‘c’ with an underscore placeholder:

With Underscores
val result = customers
               .filter(_.age > 18)
               .map(_.firstName)

The previous examples defined their function parameters ‘in-place’, meaning at the point of use. We could instead define the functions elsewhere and pass them by name:

With Pre-Defined functions
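// Each function takes a Customer, because that is the element type of the List
def over18(customer: Customer) = customer.age > 18
def firstName(customer: Customer) = customer.firstName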

val result = customers
               .filter(over18)
               .map(firstName)
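
The pre-defined variant works because a method can be passed wherever a function of the matching type is expected. The following is a minimal sketch, assuming a hypothetical Customer case class with firstName and age fields and invented sample data, that writes the three spellings side by side so they can be compared:

```scala
// Hypothetical Customer type and data, assumed for illustration.
case class Customer(firstName: String, age: Int)

val people = List(Customer("Ann", 34), Customer("Bob", 17), Customer("Dev", 52))

// Pre-defined methods; each takes a Customer because that is the List's element type.
def over18(customer: Customer): Boolean = customer.age > 18
def firstName(customer: Customer): String = customer.firstName

// Explicit lambdas, underscore placeholders, and pre-defined methods.
val withLambdas     = people.filter(c => c.age > 18).map(c => c.firstName)
val withUnderscores = people.filter(_.age > 18).map(_.firstName)
val withMethods     = people.filter(over18).map(firstName)
// All three evaluate to List("Ann", "Dev")
```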

The three Scala examples do the same thing; I’ll let you decide which syntax you prefer. The important things to note about the Scala examples compared to the Java example are:

  • There is no longer a loop. Both map and filter encapsulate the looping/iterating so you don’t have to do it; you just provide the transformation function.
  • The List functions return new Lists. This removes the need to define a collection to put results into; again, this is taken care of for you. Scala’s List is an immutable, persistent data structure, so the original List is left untouched and the new List refers to the same element objects rather than copying them.
  • There is no longer an ‘if’ block. Filter encapsulates the ‘if’ so you don’t have to do it; you just provide the predicate.
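
The three points above can be seen by writing the same logic both ways in Scala. This is only a sketch: the Customer case class, the sample data and the function names namesByLoop and namesByTransform are assumptions made up for the comparison.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical Customer type and data, assumed for illustration.
case class Customer(firstName: String, age: Int)

val original = List(Customer("Ann", 34), Customer("Bob", 17))

// Iteration style: explicit loop, explicit 'if', explicit mutable result collection.
def namesByLoop(cs: List[Customer]): List[String] = {
  val buffer = ListBuffer.empty[String]
  for (c <- cs) {
    if (c.age > 18) buffer += c.firstName
  }
  buffer.toList
}

// Transformation style: filter hides the loop and the 'if', map builds the result List.
def namesByTransform(cs: List[Customer]): List[String] =
  cs.filter(_.age > 18).map(_.firstName)

val looped      = namesByLoop(original)
val transformed = namesByTransform(original)
// Both are List("Ann"); original still contains both customers.
```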

With the iterations and conditional logic taken care of for you, all you have to do is provide the transformations you wish to happen as higher-order functions. Another bonus is that immutable programs are easier to write when using transformations as there is no need to define intermediate collections which contain results.

After getting used to programming this way I have found that my code is clearer, there is less of it and it’s easier to digest and verify. But I’ve also found the way I think about processing data has fundamentally changed. With fundamental concepts like map, filter and a few others I feel like I’m better able to visualise and think about how data in my program flows. I’ve also made fewer mistakes.

The Scala collections library has many operations that you can use instead of writing iterations, and the more familiar you are with them the better. Here are some of them to whet your appetite:

  • Map transforms each item in the collection by applying a function
  • Filter returns the items in a collection that match a predicate
  • Fold accumulates the items from a collection into a single value
  • Flatten transforms a collection of collections into one collection
  • Zip transforms two collections into one containing pairs of items
  • Take returns the first n items in a collection
  • TakeWhile returns the leading items of a collection for as long as they match some predicate
  • Partition transforms a collection into two collections according to some predicate
  • Slice returns a given range of items from a collection
  • GroupBy groups items together according to a given function
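
As a rough illustration of the operations listed above, here is each one applied to a small List of numbers. The values are invented for the example, and foldLeft is used as one concrete form of fold; the comments show the results I would expect.

```scala
val numbers = List(1, 2, 3, 4, 5, 6)

val doubled  = numbers.map(_ * 2)                      // List(2, 4, 6, 8, 10, 12)
val evens    = numbers.filter(_ % 2 == 0)              // List(2, 4, 6)
val total    = numbers.foldLeft(0)(_ + _)              // 21
val flat     = List(List(1, 2), List(3)).flatten       // List(1, 2, 3)
val zipped   = List(1, 2, 3).zip(List("a", "b", "c"))  // List((1,a), (2,b), (3,c))
val firstTwo = numbers.take(2)                         // List(1, 2)
val leading  = numbers.takeWhile(_ < 4)                // List(1, 2, 3)
val parts    = numbers.partition(_ <= 3)               // (List(1, 2, 3), List(4, 5, 6))
val middle   = numbers.slice(1, 4)                     // List(2, 3, 4): from index 1 up to, not including, 4
val byParity = numbers.groupBy(_ % 2 == 0)             // Map(false -> List(1, 3, 5), true -> List(2, 4, 6))
```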

Scala Notebook: Introduction

In this blog I publish my own notes about the things I’ve learnt while learning to use Scala, coming from a Java background.

There are lots of good blogs and books, but I found some of this material quite hard to comprehend initially and only started to ‘get it’ after doing stuff myself. The aim of this blog is to capture the bits that were missing for me.