The Average Investor's Blog

A software developer view on the markets

Archive for the ‘R’ Category

Were markets exceptionally volatile in 2011?

Posted by The Average Investor on Jan 3, 2012

2011 was a volatile year, no doubt about that, but was it exceptionally so from a historic point of view? To quantify the volatility, I used the Dow Jones Industrial average, which goes back to 1928 on Yahoo Finance:


getSymbols("^DJI", from="1900-01-01")

dji = Cl(DJI["/2011"])

djiVol = aggregate(
               as.numeric(format(index(dji), "%Y")),
               function(ss) coredata(tail(TTR:::volatility(
                                                   calc="close"), 1)))
# The result is 0.8214286, the 82nd quantile

A volatile year no doubt, but once again confirming the fact that, in markets behaviour at least, history does repeat itself. The following plot clearly shows that the volatility experienced during the great depression dwarves the recent levels:

Dow Jones Annual Volatility

Next, out of pure curiosity, I also computed how much money were to be made with these levels of volatility:


getSymbols("^DJI", from="1900-01-01")

# Compute the absolute returns
absRets = abs(ROC(Cl(DJI["/2011"])))
absRets = reclass(ifelse(,0,absRets),absRets)

# Summarize annually
yy = as.numeric(format(index(absRets), "%Y"))
zz = aggregate(absRets, yy, function(ss) tail(cumprod(1+ss),1))

# The result is 10.64

That’s right, an owner of a crystal ball would have been able to multiply his money 10-fold in 2011 alone! For further comparison, in 1932, the owner of the same ball would have been able to multiply his money … 590 times!


Posted in R | 2 Comments »

More orthodox ARMA/GARCH trading

Posted by The Average Investor on Dec 15, 2011

The system described in the earlier series for ARMA trading was in fact an “extreme” version of the more common, orthodox approach prevailing in the literature. Recently I tried using R to reproduce the results of a particular paper, and that lead to a lot of new developments …

How is typically ARMA trading simulated? The data is split into two sets. The first set is used for model estimation, an in-sample testing. Once the model parameters are determined, the model performance is tested and evaluated using the second set, the out-of-sample forecasting. The first set is usually a few times larger than the second and spans four or more years of data (1000+ trading days).

I wanted to be able to repeat the first step once in a while (weekly, monthly, etc) and to use the determined parameters for forecasts until the next calibration. Now, it’s easier to see why I classified my earlier approach as an “extreme” – it does the model re-evaluation on a daily basis. In any case, I wanted to build a framework to test the more orthodox approach.

To test such an approach, I needed to perform а “rolling” forecast (have mercy if that’s not the right term). Let’s assume we use weekly model calibration. Each Friday (or whatever the last day of the week is) we find the best model according to some criteria. At this point we can forecast one day ahead entirely based on previous data. Once the data for Monday arrives, we can forecast Tuesday, again entirely based on previous data, etc.

My problem was that the package I am using, fGarch, doesn’t support rolling forecasts. So before attempting to implement this functionality, I decided to look around for other packages (thanks god I didn’t jump to coding).

At first, my search led me to the forecast package. I was encouraged – it has exactly the forecast function I needed (in fact, it helped me figure out exactly what I need;)). The only problem – it supports only mean models, ARFIMA, no GARCH.

Next I found the gem – the rugarch package. Not only it implements a few different GARCH models, but it also supports ARFIMA mean models! I found the documentation and examples quite easy to follow too, not to mention that there is an additional introduction. All in all – a superb job!

Needless to say this finding left me feeling like a fat kid in a candy store (R is simply amazing in this regard!). Most likely you will be hearing about mew tests soon, meanwhile let’s finish the post with a short illustration of the rugarch package (single in-sample model training with out-of-sample forecast):


getSymbols("SPY", from="1900-01-01")
spyRets = na.trim( ROC( Cl( SPY ) ) )

# Train over 2000-2004, forecast 2005
ss = spyRets["2000/2005"]
outOfSample = NROW(ss["2005"])

spec = ugarchspec(
            mean.model=list(armaOrder=c(4,5), include.mean=T),
fit = ugarchfit(spec=spec, data=ss, out.sample=outOfSample)
fore = ugarchforecast(fit, n.ahead=1, n.roll=outOfSample)

# Build some sort of indicator base on the forecasts
ind = xts(head(as.array(fore)[,2,],-1),["2005"]))
ind = ifelse(ind < 0, -1, 1)

# Compute the performance
mm = merge( ss["2005"], ind, all=F )

# Output (last line): 2005-12-30  1.129232

Hats down to brilliancy!

Posted in R, Strategies | Tagged: , , , | 12 Comments »

Pre-computing a trading plan in parallel

Posted by The Average Investor on Nov 11, 2011

R version 2.14 introduced a new package, called parallel. This new package combines the functionality from two previous packages: snow and multicore. Since I was using multicore to parallelise my computations, I had to migrate to the new package and decided to publish some code.

Often trading strategies are tested using the daily closing price both to determine the position and to perform the trading. Since we need to pre-compute an action plan, parallelisation may be necessary if the computations are heavy.

The code at the end of this post is pre-computing the actions for the CSS Analytics’ DVI indicator. The entry point in the code is as follows:

library( quantmod )
library( parallel )

# Load the code at the end of this post

# Get the SPY ETF from Yahoo
getSymbols( "SPY", from="1900-01-01" )

# Compute the actions
computeDVIActionPar( Cl( SPY ), range=10, cores=8 )

This basically requests to compute the position for all possible closing prices between -10% and +10%, parallelising the work 8 fold. The output of the command is something like:

   Price    Pct Position
1 111.59 -10.00        1
2 127.97   3.21       -1
3 136.38  10.00       -1

This output tells us that if the SPY doesn’t advance more than 3.21%, closing above $127.97, we should establish a long position at the close, otherwise – short. With that knowledge, and depending on the current position, what is left to do is to go to our Interactice Broker account and to put a limit on-close order. The complete code for the missing functions follows.

computeOneDVIAction = function( close, x )
   x[tail( index( x ), 1 )] = close
   dvi = DVI( x )
   val = as.numeric( tail( dvi$dvi, 1 ) )

   # Short if DVI > 0.5, long otherwise
   if( val ) )
      return( 0 )
   else if( val > 0.5 )
      return( -1 )

   return( 1 )

computeDVIActionPar = function( x, step=0.01, range=5, cores )
   require( quantmod, quietly=TRUE )
   require( parallel, quietly=TRUE )

   prices = c( )
   positions = c( )

   latestClose = as.numeric( coredata( last( x ) ) )

   # Shift to the left to use the last entry as the "guessed" close
   yy = lag( x, -1 )

   # range is percentages
   range = range / 100

   # Compute the vector with all closing prices within the range
   close = latestClose * ( 1 - range )
   lastClose = latestClose * ( 1 + range )

   close = round( close / step ) * step
   numSteps = ( close - latestClose ) / step + 1

   close = round( close, 2 )
   lastClose = ceiling( lastClose * 100 ) / 100

   closes = close

      if( close >= lastClose ) break

      close = round( latestClose + step*numSteps, 2 )

      numSteps = numSteps + 1

      closes = c( closes, close )

   # Detect the cores if not supplied
   if( missing( cores ) )
      cores = parallel:::detectCores()

   res = mclapply( closes,
                   x = yy,
                   mc.cores = cores )

   # Summarize the positions
   prices = c()
   pcts = c()
   positions = c()

   # Impossible position
   lastPosition = -1e9

   len = length( closes )
   for( ii in 1:(len - 1) )
      if( res[[ii]] != lastPosition )
         positions = append( positions, res[[ii]] )
         prices = append( prices, closes[ii] )
         pcts = append( pcts, round( ( closes[ii] - latestClose ) /
                                     latestClose * 100, 2 ) )
         lastPosition = res[[ii]]

   positions = append( positions, res[[len]] )
   prices = append( prices, closes[ii] )
   pcts = append( pcts, round( ( closes[len] - latestClose ) /
                               latestClose * 100, 2 ) )

   df = data.frame( prices, pcts, positions )
   colnames( df ) = c( "Price", "Pct", "Position" )
   return( df )

Posted in R, Strategies | 5 Comments »

Covered Call ETF Performance

Posted by The Average Investor on Nov 1, 2011

Covered call ETFs have become quite popular lately. Living in Canada, I have been holding a couple Canadian members of this family for the last few months. When I purchased them, I liked the benefits and since I wasn’t expecting any bull markets on the horizon, I bought some. These were new products back them, so I promised myself to do some more detailed analysis at a later point.

Today was that later point. I took Horizons HXT and HEX ETFs. There are more details on the web site, but in general, HXT is a TSX60 ETF with re-invested dividends, while HEX is the covered called version, paying dividends on monthly basis. HEX was introduced in April and I made my purchase a few months later. Before jumping to the results let’s try to state my expectations. I was expecting after dividends HEX to outperform HXT. Seriously, weren’t the last few months the “best” by definition environment for covered call ETFs?

Now, here is the performance chart:



This chart was created using the following code:

library( quantmod )
library( ggplot2 )

# Get the symbols
getSymbols( c("HXT.TO", "HEX.TO"), from="1900-01-01")

# Align the dates
mm = merge( Ad( HXT.TO ), Ad( HEX.TO ), all=F )

# Compute the returns
hxtRets = dailyReturn( mm[,1] )
hexRets = dailyReturn( mm[,2] )

# Compute the growth
hxtGrowth = cumprod( 1 + hxtRets )
hexGrowth = cumprod( 1 + hexRets )

# Build a data frame for ggplot
df = data.frame(
            row.names=seq(1, length(time(hxtGrowth))))
colnames(df) = c("Date", "HXT", "HEX")

# Plot
gg = ggplot( df, aes( Date ) )
gg = gg + geom_line( aes( y=HXT, color="HXT" ) )
gg = gg + geom_line( aes( y=HEX, color="HEX" ) )
gg = gg + opts( title="HXT vs HEX" )
gg = gg + xlab( "Date" ) + ylab( "Growth" )
gg = gg + scale_color_manual( name="ETFs", values=c("HXT"="blue", "HEX"="red"))

Let’s put it this way – I am disappointed by this chart. Not only the covered call ETF performed worse, it did so with the same level of volatility (just eyeballing the chart). There is even more to it – the above chart assumes perfect dividend re-investment. While there is DRIP in Canada, there are no fractional shares. It’s probably insignificant, but certainly something to keep in mind for products that yield 10-20% annually. Last but not least, HXT does not pay any dividends – they are reinvested and as of recently, its trading is free if your stock broker is Scotia iTrade.

The above chart is not the only tool I used for this analysis, I also maintain a spreadsheet to track the exact performance of these ETFs. Unfortunately, the results of my spreadsheet looks similar to my chart.

The moral of the story – if something looks too good be true, probably it is. The media hype is always a suspect, even from reliable sources like the venerable BNN.

Posted in R, Strategies | 4 Comments »

R. I. P. EMA

Posted by The Average Investor on Oct 20, 2011

That’s right, I am moving away from exponential moving averages. Originally, I decided to use them somewhat arbitrary, probably because they tend to swing faster. Last night, after spending two and half hours debugging an issue which yet again turned out to be a particular property of these averages, I made my mind. I am back to simple moving averages and probably weighted moving averages for faster convergence.

What is the annoying property of exponential moving averages? They are recursive. In other words, each consecutive value is computed using the previous value. So what’s the problem? Here is an illustration:

library( quantmod )

getSymbols( "SPY", from="1900-01-01" )

spyEMA = EMA( tail( Cl( SPY ), 300 ), 200 )
print( as.numeric( last( spyEMA ) ) )

spyEMA = EMA( tail( Cl( SPY ), 400 ), 200 )
print( as.numeric( last( spyEMA ) ) )

Guess what the output is? That’s right – it’s different!

[1] 123.0065
[1] 123.4964

Not quite beneficial for trading research and since it has manifested on multiple occasions, it’s certainly time to move on.

Posted in R | 1 Comment »

Summarizing Returns with R

Posted by The Average Investor on Aug 2, 2011

Often I like to see the performance of a trading strategy summarized annually, quarterly or by month. In R, we start off with the summary function:

aggregateReturns = function( xx, leverage=1 )
   return( tail( cumprod( 1 + xx*leverage ), 1 ) )

Given a series xx, usually a chunk of the original, this function returns the accumulative returns for the period. The leverage is useful to somewhat simulate leveraged ETFs.

The rest is to call this function for various periods:

summarizeDailyReturns = function(
      leverage=1 )
   stopifnot( length( index( ss ) ) == length( index( indicator ) ) )
   stopifnot( is.xts( ss ) )
   stopifnot( is.xts( indicator ) )

   if( tolower( returns ) == "opentoclose" )
      stopifnot( has.Op( ss ) && has.Cl( ss ) )
      rets = Cl( ss ) / Op( ss ) - 1
      stopifnot( has.Ad( ss ) || has.Cl( ss ) )
      if( has.Ad( ss ) )
         rets = Ad( ss ) / lag( Ad( ss ) ) - 1
         rets = Cl( ss ) / lag( Cl( ss ) ) - 1

   rets[as.character(head( index( ss ), 1 ))] = 0
   rets = as.xts( indicator * coredata( rets ) )

   if( tolower( period ) == "annually" )
      yy = as.numeric( format( index( rets ), "%Y" ) )
      rets = aggregate( rets, yy, aggregateReturns, leverage )
   else if( tolower( period ) == "quarterly" )
      rets = aggregate( rets, as.yearqtr, aggregateReturns, leverage )
   else if( tolower( period ) == "monthly" )
      rets = aggregate( rets, as.yearmon, aggregateReturns, leverage )

   return( round( rets, 4 ) )

I will finish the post with an illustration how to use this function (assuming the two functions reside in returns.R from the current directory):



getSymbols("^GSPC", from="1900-01-01")

# The indicator - buy and hold: 1 every day
ind = xts(rep(1, length(index(GSPC))),

summarizeDailyReturns(GSPC, ind, returns="OpenToClose")
summarizeDailyReturns(GSPC, ind, leverage=2, period="quarterly")
summarizeDailyReturns(GSPC, ind, leverage=2, period="daily")

The last call computes the compound growth on a daily basis. 🙂

The final version of the function is available from the quntscript project.

Posted in R | Leave a Comment »

Yet another reason to avoid loops in R

Posted by The Average Investor on Jul 12, 2011

In some previous posts I have mentioned my struggles with the performance of the computations needed to implement the ARMA strategies in practice. Finally I have found a worthy solution, and as usual, there is a programming pattern to learn from it – avoid loops in R. 🙂

My first approach was to optimize the algorithms. Willing to trade some quality of the models to gain in performance, I tried a few alternatives, but I didn’t like neither of them. Then I concentrated on improving the overall R performance. After applying a few easy to do things I had to look for something more substantial.

For a little bit I toyed with the idea to use GPU, but although they can provide massive performance improvements, quite often they require a lot of specialized code and this alone can postpone using the system for months.

Then I took a step back, and reconsidered the issues. I am running two expensive tasks each day, on an 8-core (Intel i7 2600K processor, 4 core with hyper-threading) machine. Since each task is a single R process, I realized that I am not using the CPU maximum capacity. So I considered splitting each task in pieces, manually, but (luckily) before doing so, I decided to google for R parallelism.

The solution I finally came to was to use the multicore R package. The only changes I needed to make to my code, was to remove the loops! As an illustration, let’s take the dumbest example, let’s suppose we are computing sqrt with the following code:

for( ii in 1:100 )
   print( sqrt( ii ) )

The transformed, mutlicore-friendly code looks like:

ll = c()
for( ii in 1:100 )
   ll[ii] = ii

print( lapply( ll, sqrt ) )

Why is the last code multicore-friendly? Because one can transparently switch to mclapply from the multicore package:

library( multicore)

ll = c()
for( ii in 1:100 )
   ll[ii] = ii

print( mclapply( ll, sqrt ), mc.cores=multicore:::detectCores( ) )

The last version will “lapply” sqrt to each element in the array using as many threads as there are cores in the system! Assuming an 8-core system, the first 8 sqrts will be computed in parallel, and then a new sqrt will be started as soon as one of the previous finishes. Notice the line specifying the number of cores, the package is supposed to detect the number of cores on initialization, but that’s not the case on my system.

This pattern worked perfectly for the ARMA strategy (and for any other strategy computing all required outcomes in similar fashion for that matter): On each day, we need to compute the different actions to be taken for different closing prices. The only invariant in the loop body, ie between different closing prices, is the particular closing price for that iteration. So, I did exactly the same as what I did for sqrt – computed all interesting prices in a loop and then passed everything to mclapply to do the work!

A small and low-risk code change (easy to verify using a known-to-work function) resulted in almost 4 times performance improvement (I run each of the two instruments I currently trade with mc.cores == 4, that’s why the factor of only 4)!

Make sure to remember this patter next time you consider using a loop – I certainly will.

Posted in coding, R | 4 Comments »

ARMA Models for Trading, Part VI

Posted by The Average Investor on Jul 6, 2011

All posts in this series were combined into a single, extended tutorial and posted on my new blog.

In the fourth posting in this series, we saw the performance comparison between the ARMA strategy and buy-and-hold over the last approximately 10 years. Over the last few weeks (it does take time, believe me) I back-tested the ARMA strategy over the full 60 years (since 1950) of S&P 500 historic data. Let’s take a look at the full results.

ARMA vs Buy-and-Hold

ARMA vs Buy-and-Hold

It looks quite good to me. In fact, it looks so impressive that I have been looking for bugs in the code since. 🙂 Even on a logarithmic chart the performance of this method is stunning. Moreover, the ARMA strategy achieves this performance with a maximum drawdown of only 41.18% vs 56.78% for the S&P 500. Computing the S&P 500 returns and drawdowns is simple:


getSymbols("^GSPC", from="1900-01-01")
gspcRets = Ad(GSPC) / lag(Ad(GSPC)) - 1
gspcRets[as.character(head(index(Ad(GSPC)),1))] = 0
gspcBHGrowth = cumprod( 1 + gspcRets )

The above code will produce the 10 biggest drawdowns in the return series. To compute the ARMA strategy growth, we first need the daily indicator. This indicator is what took so long to compute. It is in Excel format (since WordPress doesn’t allow csv files). To use the file in R, save it as csv, without any quotes, and then import it in R via:

gspcArmaInd = as.xts( read.zoo(file="gspc.all.csv", format="%Y-%m-%d", header=T, sep=",") )

The first column is the date, the second the position for this day: 1 for long, -1 for short, 0 for none. Note, the position is already aligned with the day of the return (it is computed at the close of the previous day), in other words, no need to shift right via lag. The indicator needs to be multiplied with the S&P 500 daily returns, and then we can follow the above path. The next two columns are the number of auto regressive and the number of moving average coefficients for the model giving the best fit and used for the prediction. The GARCH components are always (1,1).

The only thing I didn’t like was the number of trades, a bit high for my taste – 6,346, or a trade every 2.35 days on average. This has the potential to eat most of the profits, more so in the past than today, however (lower transaction costs, higher liquidity). Still, taking into account the gains from this strategy together with the exceptional liquidity of S&P 500 instruments (SPY for instance trades about 167.7 million shares lately), should suffice to keep a significant amount of the profits.

Last, the simple R-code that produced this nice chart from the two growth vectors is worth showing:

png(width=480, height=480, filename="~/ttt/growth.png")
plot(log(gspcArmaGrowth), col="darkgreen", main="Arma vs Buy-And-Hold")
lines(log(gspcBHGrowth), col="darkblue")
legend(x="topleft", legend=c("ARMA", "Buy and Hold"), col=c("darkgreen", "darkblue"), lty=c(1,1), bty="n")

Pretty neat if you ask me! Code snippets like this one are what makes me believe command line interface is the most powerful interface.

Posted in Market Timing, R, Strategies | Tagged: , , , | 7 Comments »

Low-hanging R Optimizations on Ubuntu

Posted by The Average Investor on Jul 1, 2011

A friend of mine brought my attention recently to the fact that the default R install is way to generic and thus sub-optimal. While I didn’t go all the way rebuilding everything from scratch, I did find a few cheap steps one can do to help things a little.

Simply install the libatlas3gf-base package. That’s all, but it boosts the performance on some R calculations many fold. This package is an optimized BLAS library, which R can use out of the box.

My next step was to enable some SSE instructions when packages are compiled. To do that one needs to overwrite some compiler settings. First, I needed to find the R home path on my system: R.home() returned /usr/lib64/R on my Ubuntu 11.04. The I created a file called /usr/lib64/R/etc/ with the following content:

CFLAGS = -std=gnu99 -O3 -pipe -msse4.2
CXXFLAGS = -O3 -pipe -msse4.2

FCFLAGS = -O3 -pipe -msse4.2
FFLAGS = -O3 -pipe -msse4.2

How did I figured out what to add? Well, I looked up the defaults for these settings in /usr/lib64/R/etc/Makeconf and combined them with what I had in mind (adding SSE4.2 by default). I also removed the default -g. Now, when a new package is installed and compiled, you should see the options. For existing packages, uninstall them using remove.packages and then install them back using install.packages. I start R as root (sudo R) for these operations.

Does your CPU support SSE and what version? Run grep -i sse /proc/cpuinfo.

Last I noticed these two variables in Makeconf:

LAPACK_LIBS = -llapack
BLAS_LIBS = -lblas

The next thing I may try when my current simulations finish is to install optimized development libraries for BLAS and LAPACK (libatlas-dev for instance) and then change the above definitions …

Posted in coding, R | 6 Comments »

ARMA Models for Trading, Part IV

Posted by The Average Investor on May 31, 2011

All posts in this series were combined into a single, extended tutorial and posted on my new blog.

The last post promised to show some back testing results for the ARMA techniques. I decided to use the S&P 500 index for this purpose.

ARMA vs Buy and Hold

ARMA vs Buy and Hold

What really impresses me in the above char it the staggering performance of this approach during the financial crisis. As if it was feeding on the increased market volatility! This fact is also illustrated by the side by side annual comparisons, which I usually use to get an idea of the long term performance.

Year ARMA Buy and Hold
2001 -25.80% -13.04%
2002 7.14% -23.37%
2003 28.04% 26.38%
2004 13.66% 8.99%
2005 8.59% 3.00%
2006 17.26% 13.62%
2007 1.85% 3.53%
2008 100.96% -38.49%
2009 9.75% 23.45
2010 16.82% 12.78

Both systems performed similarly during the 2000/2002 recession, however, there is a stark difference in the performance during the 2008/2009 crisis. While buy and hold was losing money, the ARMA method registered it’s best year ever.

Let’s close with the answers to some obvious questions:

Why did I show only the performance of the last 10 years?
Believe it or not this is a very expensive computation. In fact, I have assembled a powerful i7 CPU machine just to be able to back test and trade these models. Almost 2 full days were necessary to compute the ARMA index for the past 10 years! I will consider publishing back testing for previous years, however, since S&P 500 is not a primary instrument for my trading, that’s not too likely to happen.

What parameters I used to compute the ARMA index?
The code is close to the code I have published in previous posts. I have used the fGarch package and my code always picked a GARCH(1,1) model. For the ARMA part, my code used all models between (0,0,1,1) and (5,5,1,1). I used 500 days of history. I mostly used the skewed Generalized Normal Distribution, specified by the cond.dist=”sged” parameter of garchFit, however, I obtained similar results using cond.dist=”sstd”.

It has become a lengthy and time-consuming post, so I will stop here. If I ever decide to do another post on this topic – I would likely discuss the obstacles I encountered while trading these models on a daily basis …

Posted in R, Strategies | Tagged: , , , | 9 Comments »

%d bloggers like this: