In a previous related blog we discussed how pure functions are memoizable. In this blog, we discuss another benefit from this list: pure functions are easier to parallelize.

Functional style of programming makes code more expressive and concise. As we get more comfortable with functional style, where it makes sense, we can lean more towards functional style instead of the imperative style. However, we have to be careful to avoid impure lambda expressions. Let's discuss with an example.

Here's a piece of code that computes the sum of square root of prime numbers from 2 to a given number:

```
import java.util.*;
import java.util.stream.IntStream;
public class Sample {
private static double sumOfSqrtsOfPrimes = 0.0;
public static boolean isPrime(int number) {
return number > 1 &&
IntStream.range(2, number)
.noneMatch(i -> number % i == 0);
}
public static void main(String[] args) {
int number = 500000;
long start = System.nanoTime();
for(int i = 1; i <= number; i++) {
if(isPrime(i))
sumOfSqrtsOfPrimes += Math.sqrt(i);
}
long end = System.nanoTime();
System.out.printf("Time Taken: %g sec\n", (end - start)/1e9);
System.out.println(sumOfSqrtsOfPrimes);
}
}
```

The code within the `main()`

function is imperative in style. It sequentially iterates through each number from `1`

to the given `number`

. It computes the square root of each prime number it finds in that range and adds the value to the `sumOfSqrtsOfPrimes`

variable.

Let's run the code to see how long this code takes to run.

```
Time Taken: 64.6315 sec
1.896851199544097E7
```

It took a little over a minute to run. Let's now change the code in `main()`

from imperative to functional style, but rather naively.

```
public static void main(String[] args) {
int number = 500000;
long start = System.nanoTime();
IntStream.rangeClosed(1, number)
.boxed()
.filter(Sample::isPrime)
.map(Math::sqrt)
.forEach(value -> sumOfSqrtsOfPrimes += value); //This is bad
long end = System.nanoTime();
System.out.printf("Time Taken: %g sec\n", (end - start)/1e9);
System.out.println(sumOfSqrtsOfPrimes);
}
```

First, we get a range of values using the `rangeClosed`

method of `IntStream`

, convert that to a `Stream<Integer>`

using the `boxed()`

function. Then we filter the `Stream<Integer>`

to get only prime numbers, then compute the square root of those values. All is well so far in the code. In the final step, however, we provide to `sumOfSqrtsOfPrimes`

. We'll discuss the consequence of this shortly, but first let's run the code and check on the result and performance.

```
Time Taken: 45.8868 sec
1.896851199544097E7
```

The result of the computation is exactly the same. The time is better as well, but this solution is sequential much like the imperative solution.

Compared to the imperative solution, the functional solution is much easier to parallelize. In order to parallelize the imperative solution we have to manage a pool of threads and schedule the execution on different threads. In the functional solution, we can simply ask for the execution to be parallelized. That fits well with the spirit of functional style—tell what we want without getting into the details of how to do that.

Unfortunately, there's a catch as we'll see soon.

Let's parallelize the execution of this code first.

```
IntStream.rangeClosed(1, number)
.boxed()
.parallel()
.filter(Sample::isPrime)
.map(Math::sqrt)
.forEach(value -> sumOfSqrtsOfPrimes += value); //This is bad
```

We only had to make a small change—inserted a call to `parallel()`

right before calling the `filter()`

operation. Now, instead of sequentially working on each of the number in the range, the operations will be performed in parallel. The number of threads on which the operations will be scheduled, by default, depends on the number of cores on the system. The program should run faster. Let's take a look at the result of the run.

```
Time Taken: 11.6477 sec
1.896639027717821E7
```

It certainly took a lot less time. Before we go out to celebrate, let's compare the sum of the square root of the primes computed by the different versions. Hmm, we got `1.896639027717821E7`

while the previous two versions printed `1.896851199544097E7`

.

No point running really fast to produce the wrong result.

The culprit is shared mutability. The lambda expression we passed to the `forEach`

method is not a pure-function. When called from multiple threads, the lambdas ran into a race condition while updating the shared mutable variable. Trying to synchronize this access is a wrong solution—we get dragged into the complexity and details we can easily avoid by avoiding explicit mutability.

Even though the functional style is highly capable of parallelization, purity of lambda expressions is vital.

Using impure functions with functional style is like having indigestion when there's surplus gourmet food on the table.

Let's fix the ill-written function composition, to do the right things—avoid mutation, make the lambda expression pure. We'll first create a sequential version.

```
double sumOfSqrtsOfPrimes =
IntStream.rangeClosed(1, number)
.boxed()
.filter(Sample::isPrime)
.map(Math::sqrt)
.reduce(0.0, Double::sum);
```

Doing the right things was not hard. We avoided any explicit mutation in the code. The function function `sum`

of `Double`

is a pure function. We defined a local variable `sumOfSqrtsOfPrimes`

and set it to the result from the function composition. All the functions we used in the function composition are guaranteed by the JDK to be thread-safe.

Let's take a look at how this version performs.

```
Time Taken: 46.3244 sec
1.896851199544097E7
```

First, the result is consistent with the previous sequential versions. The time the code took was similar to the previous ill-written functional style. But, this version is not only easier to parallelize, it's safe as well. Let's do that.

```
double sumOfSqrtsOfPrimes =
IntStream.rangeClosed(1, number)
.boxed()
.parallel()
.filter(Sample::isPrime)
.map(Math::sqrt)
.reduce(0.0, Double::sum);
```

We parallelized this version much like we tried with the earlier version. But, now we're in the right direction. The lambdas given to the `filter`

and the `map`

functions are run concurrently. In addition, the lambda given to the `reduce`

function is also run concurrently, but the `reduce`

function will perform a divide-and-conquer operation and merge the partial results of computations in a thread-safe manner. The result of this computation will not only be faster than the sequential version, but the result be correct as well:

```
Time Taken: 10.8499 sec
1.8968511995440975E7
```

As we get more comfortable and familiar with functional style, we have to remember to keep the lambda expressions pure. In particular, we should avoid shared mutability. I've received emails from programmers asking why their code fails to produce proper result when they add `parallel`

. Often times the problem has been shared mutability from lambda expressions. Let's be careful not to fall into such trap.

In this series of blogs, we've looked at five benefits so far. In the next blog we'll look at another benefit: pure functions can be lazy.