In some previous posts I have mentioned my struggles with the performance of the computations needed to implement the ARMA strategies in practice. Finally I have found a worthy solution, and as usual, there is a programming pattern to learn from it – avoid loops in R. 🙂

My first approach was to optimize the algorithms. Willing to trade some quality of the models to gain in performance, I tried a few alternatives, but I didn’t like neither of them. Then I concentrated on improving the overall R performance. After applying a few easy to do things I had to look for something more substantial.

For a little bit I toyed with the idea to use GPU, but although they can provide massive performance improvements, quite often they require a lot of specialized code and this alone can postpone using the system for months.

Then I took a step back, and reconsidered the issues. I am running two expensive tasks each day, on an 8-core (Intel i7 2600K processor, 4 core with hyper-threading) machine. Since each task is a single R process, I realized that I am not using the CPU maximum capacity. So I considered splitting each task in pieces, manually, but (luckily) before doing so, I decided to google for R parallelism.

The solution I finally came to was to use the multicore R package. The only changes I needed to make to my code, was to remove the loops! As an illustration, let’s take the dumbest example, let’s suppose we are computing sqrt with the following code:

for( ii in 1:100 ) { print( sqrt( ii ) ) }

The transformed, *mutlicore-friendly* code looks like:

ll = c() for( ii in 1:100 ) { ll[ii] = ii } print( lapply( ll, sqrt ) )

Why is the last code *multicore-friendly*? Because one can transparently switch to *mclapply* from the multicore package:

library( multicore) ll = c() for( ii in 1:100 ) { ll[ii] = ii } print( mclapply( ll, sqrt ), mc.cores=multicore:::detectCores( ) )

The last version will “lapply” sqrt to each element in the array using as many threads as there are cores in the system! Assuming an 8-core system, the first 8 sqrts will be computed in parallel, and then a new sqrt will be started as soon as one of the previous finishes. Notice the line specifying the number of cores, the package is supposed to detect the number of cores on initialization, but that’s not the case on my system.

This pattern worked perfectly for the ARMA strategy (and for any other strategy computing all required outcomes in similar fashion for that matter): On each day, we need to compute the different actions to be taken for different closing prices. The only invariant in the loop body, ie between different closing prices, is the particular closing price for that iteration. So, I did exactly the same as what I did for sqrt – computed all interesting prices in a loop and then passed everything to mclapply to do the work!

A small and low-risk code change (easy to verify using a known-to-work function) resulted in almost 4 times performance improvement (I run each of the two instruments I currently trade with mc.cores == 4, that’s why the factor of only 4)!

Make sure to remember this patter next time you consider using a loop – I certainly will.