## ARMA Models for Trading, Part II

Posted by The Average Investor on Apr 21, 2011

All posts in this series were combined into a single, extended tutorial and posted on my new blog.

We left the last post at the point of determining the best ARMA model. Before continuing the discussion, however, I would like to make a few points that might seem a bit questionable or unclear:

- We model the daily returns instead of the prices. There are multiple reasons: this way financial series usually become stationary, we need some way to "normalize" a series, etc.
- We use the *diff* and *log* functions to compute the daily returns instead of percentages. Not only is this a standard practice in statistics, but it also provides a damn good approximation for small moves.
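To illustrate the second point, here is a small sketch (the price series is made up purely for demonstration) showing that the log returns closely track the percentage returns:

```r
# Made-up prices, for illustration only
prices = c( 100, 101, 99.5, 102, 103.5 )

# Log returns: diff of the log prices
logRet = diff( log( prices ) )

# Simple percentage returns, for comparison
pctRet = diff( prices ) / head( prices, -1 )

# For small moves the two are nearly identical, since log(1+x) ~ x
round( logRet, 4 )
round( pctRet, 4 )
```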

Now back to choosing the best-fitting ARMA model. A well-known statistic to measure goodness of fit is the AIC (Akaike Information Criterion). Once the fitting is done, the value of the AIC statistic is accessible via:

```r
xxArma = armaFit( xx ~ arma( 5, 1 ), data=xx )
xxArma@fit$aic
```

There are other statistics of course, some of which, like the BIC, penalize models with more parameters more heavily to avoid over-parametrization; typically, however, the results are quite similar.
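As a sketch of the difference, base R's *arima* (used here instead of *armaFit* purely for illustration) produces fits on which both criteria can be computed; the simulated series and model orders below are arbitrary:

```r
# Simulate an AR(2) series for demonstration
set.seed( 42 )
xx = arima.sim( model=list( ar=c( 0.5, -0.3 ) ), n=500 )

fit = arima( xx, order=c( 2, 0, 0 ) )

AIC( fit )   # Akaike Information Criterion: penalty of 2 per parameter
BIC( fit )   # Bayesian Information Criterion: penalty of log(n) per parameter
```

With only 500 observations and a handful of parameters the two criteria usually agree on the ranking, which is the point made above.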

To summarize, all we need is a loop through all parameter combinations we deem reasonable, for instance from 0 to 5, inclusive, both for the AR order (the first component) and the MA order (the second component); for each parameter pair we fit the model, and finally we pick the model with the lowest AIC (or some other statistic). The full code for *findBestArma* is at the end of the post.

In the code below, note that sometimes *armaFit* fails to find a fit and exits with an error, which would terminate the loop immediately. *findBestArma* handles this problem by wrapping the call in *tryCatch*, which catches any error or warning and returns a logical value (FALSE) instead of interrupting everything and exiting with an error. Thus we can distinguish an erroneous from a normal function return simply by checking the type of the result. A bit messy probably, but it works.

```r
library( fArma )

findBestArma = function( xx, minOrder=c(0,0), maxOrder=c(5,5), trace=FALSE )
{
   bestAic = 1e9
   len = NROW( xx )

   for( p in minOrder[1]:maxOrder[1] )
   for( q in minOrder[2]:maxOrder[2] )
   {
      # Skip the degenerate (0,0) model
      if( p == 0 && q == 0 )
      {
         next
      }

      formula = as.formula( paste( sep="", "xx ~ arma(", p, ",", q, ")" ) )

      # On error or warning, return FALSE instead of aborting the loop
      fit = tryCatch( armaFit( formula, data=xx ),
                      error=function( err ) FALSE,
                      warning=function( warn ) FALSE )

      if( !is.logical( fit ) )
      {
         fitAic = fit@fit$aic
         if( fitAic < bestAic )
         {
            bestAic = fitAic
            bestFit = fit
            bestModel = c( p, q )
         }

         if( trace )
         {
            ss = paste( sep="", "(", p, ",", q, "): AIC = ", fitAic )
            print( ss )
         }
      }
      else
      {
         if( trace )
         {
            ss = paste( sep="", "(", p, ",", q, "): None" )
            print( ss )
         }
      }
   }

   if( bestAic < 1e9 )
   {
      return( list( aic=bestAic, fit=bestFit, model=bestModel ) )
   }

   return( FALSE )
}
```
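A hypothetical usage sketch, assuming the *fArma* package is installed and *findBestArma* has been sourced; the simulated series stands in for a real return series:

```r
library( fArma )

# Simulate an ARMA(1,1) series as a stand-in for daily log returns
set.seed( 7 )
xx = arima.sim( model=list( ar=0.4, ma=0.2 ), n=1000 )

best = findBestArma( xx, trace=TRUE )
if( !is.logical( best ) )
{
   print( best$model )   # the (p,q) pair with the lowest AIC
   print( best$aic )
}
```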

## ARMA Models for Trading, Part I « The Average Investor's Blog said

[…] there are more robust statistical methods to do that. More on that in the next post … […]

## bgpl said

hi, nice post.

however, I am confused..

You only have two parameters for the ARMA anyway. So, using AIC to compare different iterations of this essentially becomes a comparison of the actual fit, not of the degree of parametrization. In other words, AIC reflects both fit and number of parameters, but since your number of parameters is the same across different iterations, it devolves to the fit.

AIC probably makes sense to be employed for comparison among multiple strategies with different parameters and different numbers of parameters, but in this context, I fail to see how it helps – I am likely missing something here.

would the same results be obtained merely by using the least-error instead of AIC ?

thanks !

## The Average Investor said

Hi, the two parameters are in fact the orders of the ARMA model, i.e. the number of AR and MA terms used in the fit. For example, (3,5,1,1) describes a fit using three AR components, five MA components, and GARCH(1,1) parameters. Once we find a fit for (3,5,1,1) and another for (5,3,1,1), how do we choose between them? Which one is better? Using AIC for this purpose seems to be a common practice. I have toyed with other ideas – one of them is to use the in-sample returns of the model to choose the better one, a very greedy approach. Never implemented it though …
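To sketch the comparison described above: using *garchFit* from the *fGarch* package (the slot layout below is what I recall from that package), two combined ARMA+GARCH specifications can be compared by their information criteria:

```r
library( fGarch )

# Simulated stand-in for a daily return series
set.seed( 11 )
xx = arima.sim( model=list( ar=0.3 ), n=1000 )

# Two candidate specifications; keep whichever has the lower AIC
fit1 = garchFit( ~ arma( 3, 5 ) + garch( 1, 1 ), data=xx, trace=FALSE )
fit2 = garchFit( ~ arma( 5, 3 ) + garch( 1, 1 ), data=xx, trace=FALSE )

fit1@fit$ics[ "AIC" ]
fit2@fit$ics[ "AIC" ]
```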