The Average Investor's Blog

A software developer view on the markets

Archive for the ‘coding’ Category

Yet another reason to avoid loops in R

Posted by The Average Investor on Jul 12, 2011

In some previous posts I have mentioned my struggles with the performance of the computations needed to implement the ARMA strategies in practice. Finally I have found a worthy solution, and as usual, there is a programming pattern to learn from it – avoid loops in R. 🙂

My first approach was to optimize the algorithms. Willing to trade some quality of the models to gain in performance, I tried a few alternatives, but I didn’t like neither of them. Then I concentrated on improving the overall R performance. After applying a few easy to do things I had to look for something more substantial.

For a little bit I toyed with the idea to use GPU, but although they can provide massive performance improvements, quite often they require a lot of specialized code and this alone can postpone using the system for months.

Then I took a step back, and reconsidered the issues. I am running two expensive tasks each day, on an 8-core (Intel i7 2600K processor, 4 core with hyper-threading) machine. Since each task is a single R process, I realized that I am not using the CPU maximum capacity. So I considered splitting each task in pieces, manually, but (luckily) before doing so, I decided to google for R parallelism.

The solution I finally came to was to use the multicore R package. The only changes I needed to make to my code, was to remove the loops! As an illustration, let’s take the dumbest example, let’s suppose we are computing sqrt with the following code:

for( ii in 1:100 )
   print( sqrt( ii ) )

The transformed, mutlicore-friendly code looks like:

ll = c()
for( ii in 1:100 )
   ll[ii] = ii

print( lapply( ll, sqrt ) )

Why is the last code multicore-friendly? Because one can transparently switch to mclapply from the multicore package:

library( multicore)

ll = c()
for( ii in 1:100 )
   ll[ii] = ii

print( mclapply( ll, sqrt ), mc.cores=multicore:::detectCores( ) )

The last version will “lapply” sqrt to each element in the array using as many threads as there are cores in the system! Assuming an 8-core system, the first 8 sqrts will be computed in parallel, and then a new sqrt will be started as soon as one of the previous finishes. Notice the line specifying the number of cores, the package is supposed to detect the number of cores on initialization, but that’s not the case on my system.

This pattern worked perfectly for the ARMA strategy (and for any other strategy computing all required outcomes in similar fashion for that matter): On each day, we need to compute the different actions to be taken for different closing prices. The only invariant in the loop body, ie between different closing prices, is the particular closing price for that iteration. So, I did exactly the same as what I did for sqrt – computed all interesting prices in a loop and then passed everything to mclapply to do the work!

A small and low-risk code change (easy to verify using a known-to-work function) resulted in almost 4 times performance improvement (I run each of the two instruments I currently trade with mc.cores == 4, that’s why the factor of only 4)!

Make sure to remember this patter next time you consider using a loop – I certainly will.

Posted in coding, R | 4 Comments »

Low-hanging R Optimizations on Ubuntu

Posted by The Average Investor on Jul 1, 2011

A friend of mine brought my attention recently to the fact that the default R install is way to generic and thus sub-optimal. While I didn’t go all the way rebuilding everything from scratch, I did find a few cheap steps one can do to help things a little.

Simply install the libatlas3gf-base package. That’s all, but it boosts the performance on some R calculations many fold. This package is an optimized BLAS library, which R can use out of the box.

My next step was to enable some SSE instructions when packages are compiled. To do that one needs to overwrite some compiler settings. First, I needed to find the R home path on my system: R.home() returned /usr/lib64/R on my Ubuntu 11.04. The I created a file called /usr/lib64/R/etc/ with the following content:

CFLAGS = -std=gnu99 -O3 -pipe -msse4.2
CXXFLAGS = -O3 -pipe -msse4.2

FCFLAGS = -O3 -pipe -msse4.2
FFLAGS = -O3 -pipe -msse4.2

How did I figured out what to add? Well, I looked up the defaults for these settings in /usr/lib64/R/etc/Makeconf and combined them with what I had in mind (adding SSE4.2 by default). I also removed the default -g. Now, when a new package is installed and compiled, you should see the options. For existing packages, uninstall them using remove.packages and then install them back using install.packages. I start R as root (sudo R) for these operations.

Does your CPU support SSE and what version? Run grep -i sse /proc/cpuinfo.

Last I noticed these two variables in Makeconf:

LAPACK_LIBS = -llapack
BLAS_LIBS = -lblas

The next thing I may try when my current simulations finish is to install optimized development libraries for BLAS and LAPACK (libatlas-dev for instance) and then change the above definitions …

Posted in coding, R | 6 Comments »

ARMA Models for Trading, Part III

Posted by The Average Investor on May 2, 2011

All posts in this series were combined into a single, extended tutorial and posted on my new blog.

In the last post I showed how to pick the parameters for the ARMA model. The next step is to determine the position at the close. One way to do that is by a one day ahead prediction, if the prediction comes negative (remember the series we are operating on is the daily returns) then the desired position is short, otherwise it’s long.


getSymbols("SPY", from="1900-01-01")
SPY.rets = diff(log(Ad(SPY)))
SPY.arma = armaFit(~arma(0, 2), data=as.ts(tail(SPY.rets,500)))
predict(SPY.arma, n.ahead=1, doplot=F)

Now, to build an indicator for back testing, one can walk the daily return series and at each point perform the steps we covered so far. The main loop looks like (in pseudocode):

for(ii in history:length(dailyRetSeries))
   tt = as.ts(tail(head(dailyRetSeries, ii), history))
   ttArma = findBestArma()
   predict(ttArma, n.ahead=1, doplot=F)

Where history is the look-back period to consider at each point, I usually use 500, which is about two years of data. Although the above code is simply an illustration, I hope the main idea is pretty clear by now.

As mentioned earlier, findBestArma needs to be surrounded by a tryCatch block. Same goes for the predict – it may fail to converge. What I do is to have predict included in findBestArma, ignoring models for which the prediction fails.

Another improvement is to use ARMA together with GARCH. The latter is a powerful method to model the clustered volatility typically found in financial series. Sounds complex, but it turns out to be pretty straightforward in R. Just to give you an idea:


getSymbols("SPY", from="1900-01-01")
SPY.rets = diff(log(Ad(SPY)))
SPY.garch = garchFit(~arma(0, 2) + garch(1, 1), data=as.ts(tail(SPY.rets, 500)))
predict(SPY.garch, n.ahead=1, doplot=F)

That’s all I have to say on the theoretical side. I will finish this series with more implementation details and some back testing results in the next post …

Posted in coding, R, Strategies | Tagged: , , , | 1 Comment »

Open-sourcing some of my automation code

Posted by The Average Investor on Feb 20, 2011

To automate my trading I use a mix of scripts. Everything goes – R, Python, shell, C++, etc. For some time now I have been satisfied with the tools I have created. They run once a day, gather data from EODDATA, update the database, run some R magic to decide what needs to be done for trading and finally, I get an email with the report. Good enough for me!

All the tools are available from the quantscript project at Google code. Be aware that this is work in very early progress, and be aware that this is not for mass use – you need to know what you are doing. 😉 In any case, I hope these scripts prove useful in your trading!

Posted in coding, eoddata, R | Leave a Comment »

%d bloggers like this: