
A guest post by Timothy Pratley, who currently works for Tideworks Technology as a Development Manager building Traffic Control software for logistics clients including SSA Marine, CSX, and BNSF.

Modeling the behavior of stocks is fun and thought-provoking. My goal in this post is to demonstrate some features of Clojure by simulating the S&P 500 index. You will now see the sequence operations we discussed last week in a more meaningful context.

We will cover the following themes: random walk simulation, functional generalization, why persistent data structures matter, and data exploration. Nothing in this post is investment advice. The full source code used in this post can be found at https://github.com/timothypratley/stocksim.

## Stepwise simulation

“If people do not believe that mathematics is simple, it is only because they do not realize how complicated life is.” – John von Neumann

The random walk hypothesis can be stated as: the price tomorrow is the price today plus novelty:
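A minimal sketch of such a step (the name `rand-step` comes from the post; the exact body is my reconstruction from the description that follows):

```clojure
;; Offset the price by a uniform random value in [-0.5, 0.5).
(defn rand-step [price]
  (+ price (- (rand) 0.5)))
```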

Calling `rand-step` produces a new price offset by -0.5 to 0.5 from some input price.

A sequence of prices is produced with `iterate` calling `rand-step` on an initial price 1, then calling `rand-step` on the result, and then `rand-step` on the result of that, and so on. This is an infinite series, but we only take as many prices as we need.
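The mechanics of `iterate` are easy to see with a deterministic step in place of the random one:

```clojure
;; iterate builds the lazy infinite series; take realizes a finite prefix.
(take 5 (iterate #(+ % 0.5) 1))
;; => (1 1.5 2.0 2.5 3.0)
```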

Let’s declare a `sim` function that uses the Incanter library to plot a time series from calling a step function for 30 years of 252 active trading days:

The `sim` function is passed a `step` function for calculating the next value in the series:
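A sketch of `sim` and its invocation. The real post charts the series with Incanter; here the plotting call is reduced to a comment so the sketch stays dependency-free:

```clojure
(def days (* 30 252))  ; 30 years of 252 trading days

;; Iterate the step from price 1, take `days` prices, and (with
;; Incanter on the classpath) chart them.
(defn sim [step]
  (let [prices (take days (iterate step 1))]
    ;; (incanter.core/view
    ;;   (incanter.charts/xy-plot (range days) prices))
    prices))

;; rand-step as described above: uniform noise in [-0.5, 0.5).
(defn rand-step [price]
  (+ price (- (rand) 0.5)))

(sim rand-step)
```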

This produces a chart of the random walk obtained by iteratively calling `rand-step` (see the random step figure):

And here is a figure showing the S&P 500 index:

There are two observations we can make about the S&P 500 index:

1. The exponential growth implies relative change with a positive bias.
2. Most changes are small, but a few are large.

A better guess for novelty is: `previous * (random near zero + bias)`.

An interesting property of evenly distributed random numbers is that if you add a bunch together you get numbers that approximate a normal distribution:

Adding together 100 numbers between 0 and 1 will give us a number between 0 and 100. Starting the reduction with -50 centers the result around 0. The final result is a random number that will mostly be close to 0. Let’s plot a thousand of these random normal numbers:
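A sketch of such a generator; the count of 100 and the -50 offset follow the text, and the histogram call shows how one might plot the thousand draws with Incanter:

```clojure
;; Sum 100 uniform [0,1) draws, starting the reduction at -50, so the
;; result is centered on 0 and approximately normally distributed.
(defn rand-normal []
  (reduce + -50 (repeatedly 100 rand)))

;; With Incanter:
;; (incanter.core/view
;;   (incanter.charts/histogram (repeatedly 1000 rand-normal)))
```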

Here is the simulated normal distribution:

Our function is producing random numbers mostly near zero, but sometimes larger. This is just what we want, so now we can use it to define a `step` function that adds a biased random normal relative to the current price:

We offset the price by the product of `price` and our special random number. We then chart a new random walk using our new step function:
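A sketch of such a step. The scale and bias constants here are my guesses, tuned only “by eye” in the spirit of the post, not its exact values:

```clojure
(defn rand-normal []
  (reduce + -50 (repeatedly 100 rand)))

;; Offset the price by price * (scaled random normal + small positive
;; bias). 0.01 and 0.0005 are illustrative constants.
(defn normal-step [price]
  (+ price (* price (+ 0.0005 (* 0.01 (rand-normal))))))
```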

Here is the biased normal step:

I find it striking that such a formulation exhibits similar behavior to the S&P 500 index. Regardless of what real forces move the index, the end result is hard to distinguish from randomness.

Now let’s establish a baseline of nominal fixed interest. I emphasize nominal because real interest rates fluctuate:

I define `fixed-step` to take an extra `rate` parameter because I intend to use it later with different rates:
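A sketch of a daily-compounding fixed-interest step under that assumption:

```clojure
;; Nominal fixed interest: compound a fraction of the annual rate on
;; each of the 252 trading days.
(defn fixed-step [rate price]
  (* price (+ 1 (/ rate 252))))

;; Partially applying the rate yields a one-argument step, e.g.
;; (sim (partial fixed-step 0.08))
```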

Capturing an argument like this is called partial application of a function. Here we created a partial application of `fixed-step` (a new function that takes only one argument) and passed it to `sim`. Here is the fixed interest step:

\$1 grows to about \$11 over 30 years at a constant 8% per annum rate reinvested daily.
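That figure checks out arithmetically:

```clojure
;; 8% pa reinvested daily over 30 years of 252 trading days:
(Math/pow (+ 1 (/ 0.08 252)) (* 252 30))
;; => roughly 11.02
```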

The three different `step` functions all work with the same `sim` function to produce very different charts.

## Steps that require history

“History is the witness that testifies to the passing of time; it illumines reality, vitalizes memory, provides guidance in daily life and brings us tidings of antiquity.” – Cicero

How would a random walk behave under the influence of momentum?

Momentum is a very obvious concept, but at this stage we can only guess at the specific mechanics. I will define momentum as the 5-day average relative to the 50-day average. This may strike you as arbitrary, because it is!

When the short-term average is below the long-term average, we bias movement with a 0.1% drag. Otherwise we bias movement with a 0.1% push.

Now we encounter a problem. Momentum depends on history, but our step function argument is the current price. Well, let’s solve that by passing the entire history instead! Our step will build up a history by appending values as it goes:
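A sketch of such a history-based step; `momentum-step` and `moving-average` are my names, and the noise constants mirror the earlier normal step rather than the post's exact values:

```clojure
;; Average of the most recent n prices (or fewer, early in the history).
(defn moving-average [n history]
  (let [recent (take-last n history)]
    (/ (reduce + recent) (count recent))))

(defn rand-normal []
  (reduce + -50 (repeatedly 100 rand)))

;; The step receives the whole history, biases the move by momentum
;; (5-day vs 50-day average), and appends the new price.
(defn momentum-step [history]
  (let [price  (peek history)       ; peek is O(1) on vectors
        bias   (if (< (moving-average 5 history)
                      (moving-average 50 history))
                 -0.001             ; 0.1% drag
                 0.001)             ; 0.1% push
        price' (+ price (* price (+ bias (* 0.01 (rand-normal)))))]
    (conj history price')))
```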

Our simulation starts with a history of one price represented as a vector `[1]`, and we are interested in the last history iteration after our chosen number of days. Compare this with our original `sim`: not much has changed. We could refactor all of the steps to avoid duplicating the function. Alternatively, we could use partial applications of the unchanged steps composed with the `last` function to get the last history value, which is the price to act on. If we anticipate coding multiple influences like momentum, we might generalize further to create a step that combines several multiplicative factors calculated from history. In the accompanying source code, I do this in `history-step`. Generalization requires a little refactoring, but the changes are minor and we have good options as our goals change. And there is no boilerplate code to manage.
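The reshaped simulation might look like this (a sketch; `sim-history` is my name for it, and charting is again elided):

```clojure
(def days (* 30 252))

;; Iterate the history-building step from [1]; after `days` steps the
;; vector holds the whole price series to chart.
(defn sim-history [step]
  (nth (iterate step [1]) days))

;; A price-based step can be adapted by reading the last price, e.g.
;; (sim-history (fn [history] (conj history (rand-step (peek history)))))
```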

Before we look at the result, pause to consider how we are building up data. Clojure vectors are immutable. A new vector is created in every step. To do this efficiently, Clojure vectors share structure. This is a really important thing to keep in mind about functional programming. Smart data structures make convenient means of expression computationally feasible. Without them, the performance would be terrible due to allocating and copying memory.
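A quick demonstration of why this is cheap: `conj` on a large vector returns a new vector that shares almost all of its structure with the original, which is left untouched:

```clojure
(def v  (vec (range 1000000)))
(def v' (conj v :extra))   ; effectively constant time: shares structure

(count v)   ; => 1000000 (v is unchanged)
(count v')  ; => 1000001
```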

Alright, how does our momentum-affected simulation look?

Here is the biased normal step with momentum:

This simulation is bittersweet. It behaves how I feel the index should, but not how it actually does. Some other definition might work better. Other data series might be more momentum driven. The 10 year treasury rate does appear to follow such a notion of momentum.

## Getting closer to the data

“The Dolphin can take you into the data.” – Johnny Mnemonic

I have been picking parameters by experimenting with what “looks right.” It is now time to see how the data is really behaving:

To calculate the changes we use the pairing trick published earlier, putting the `prices` sequence beside itself and using `map` to divide each current price by the previous one, resulting in a sequence of daily change ratios.
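The pairing trick in code (a sketch; `changes` is my name for it):

```clojure
;; Pair each price with its successor by mapping / over the sequence
;; offset against itself; map stops at the shorter sequence.
(defn changes [prices]
  (map / (rest prices) prices))

(changes [1 2 4 2])
;; => (2 2 1/2)
```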

`juxt` creates a function that calls every function you pass it. How meta is that! `get-stats` is thus a function that returns a vector of the results of calling each input function on the changes sequence.
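A sketch of `get-stats` built with `juxt`; the three stat functions here are illustrative reconstructions of the ones described:

```clojure
;; juxt returns a function that calls each given function on the same
;; arguments and collects the results in a vector.
(def get-stats
  (juxt #(/ (reduce + %) (count %))   ; mean change ratio
        #(apply max %)                ; largest rise
        #(apply min %)))              ; largest fall

(get-stats [0.9 1.1 1.0])
;; => [1.0 1.1 0.9]
```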

The results are mean: 0.34%, largest rise: 11.5%, and largest fall: -20.5%. Here is the daily change ratio S&P 500 index:

Volatility tends to grow and shrink, indicating the steps are not independent. See the S&P 500 index change distribution:

The distribution of change ratios matches our intuition that they fall in a band with a slight positive bias. There are considerable outliers indicating “black swan events”: unexpected events that change investor expectations dramatically.

Rather than generating random numbers, we can sample the real data distribution:

Sequence lookup is order `n` (like scanning a list); vector lookup is order 1 (like indexing an array). When drawing a random `nth` element, we want a vector, not a sequence, for performance.
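A sampled step might be sketched like this, with a small stand-in vector in place of the real Yahoo change ratios:

```clojure
;; Stand-in for the vector of real ^GSPC daily change ratios.
(def change-ratios [0.99 1.0 1.01 1.02])

;; rand-nth on a vector indexes in constant time.
(defn sample-step [price]
  (* price (rand-nth change-ratios)))
```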

This is a random walk, drawing change ratios randomly from all real change ratios in the Yahoo data for ^GSPC (The S&P 500 index):

Using the real change ratio distribution yields a more convincing simulation.

## Many simulations

“A ship is safe in harbor, but that’s not what ships are for.” – William G.T. Shedd

So, what good is a simulation? We can quantify the risk implied by our assumptions by running a simulation many times and observing the distribution of total return. To find the return of a single simulation:
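One way to express it (a sketch; `total-return` is my name for it):

```clojure
(def days (* 30 252))

;; Final price of one random walk that starts at 1.
(defn total-return [step]
  (nth (iterate step 1) days))
```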

Now run a `sample-step` random walk 1000 times to get 1000 total returns. Plot the returns and calculate the mean:
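Sketched end to end, again with a stand-in vector in place of the real change ratios:

```clojure
(def days (* 30 252))
(def change-ratios [0.99 1.0 1.01 1.02])  ; stand-in for real data
(defn sample-step [price] (* price (rand-nth change-ratios)))
(defn total-return [step] (nth (iterate step 1) days))

;; 1000 independent 30-year walks and the mean of their final values.
(def returns (repeatedly 1000 #(total-return sample-step)))
(def mean-return (/ (reduce + returns) (count returns)))

;; Plot with Incanter:
;; (incanter.core/view (incanter.charts/histogram returns))
```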

See the simulated 30 year returns:

The simulated average case is that a \$1 investment returns \$12.8 over the long term, which is equivalent to an 8.5% per annum rate.

8.5% pa average return is in line with conventional wisdom, but the most frequent returns are sadly lower. The implication is that under our random model assumption the index is partially like a bond (likely to reach 5x initial value), and partially like a lottery ticket (you might get up to 100x return if lucky), giving an average of 13x return. For an anchored underlying economic model we might hope instead to see results distributed closer to the average.

We only have one S&P index, but we can use sampling to examine all of the 30-year periods in the past. To do this, partition the data into 30-year sequences for every possible day that has enough subsequent data:
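A sketch of the windowing; a window-length parameter keeps the helpers testable on small data, while the post uses 30 years of 252 days:

```clojure
;; Every contiguous window of `days` prices, advancing one day at a time.
(defn holding-periods [days prices]
  (partition days 1 prices))

;; Return multiple over each window: final price / initial price.
(defn period-returns [days prices]
  (map #(/ (last %) (first %)) (holding-periods days prices)))

;; On real data (gspc-prices is a hypothetical series):
;; (period-returns (* 30 252) gspc-prices)
```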

By now, you should be seeing a theme of functions that call functions that call functions that operate on data. Data is the interface, and functions are the means of abstraction. Programming in Clojure is all about making functions to transform data, composing existing functions, thinking of good names for functions, and looking for opportunities to generalize. See all 30-year performance periods for the S&P 500:

The average case return for all periods in the Yahoo ^GSPC data was 7.9x (6.9% pa). The peak frequency was at 5x (5.4% pa), but a lottery aspect pushed the average return up. 30 year holding returns were between 3x (3.7% pa) and 20x (10% pa).

Contrary to the simulated results, there are no outliers at 100x return, suggesting there may be an anchoring component at work. Tantalizingly, the total distribution looks to be made up of overlapping distributions, hinting that the random behavior may be linked to several large driving components. Such observations are highly speculative and very likely an artifact of overfitting, but they do leave the door open to believing that there are non-random factors out there to find.

## Conclusion

Coding in Clojure is all about functions and data. Persistent data structures make accumulating large results, like a complete history, feasible. Partial functions and composed functions are convenient paths to generalization.

Be sure to look at the Clojure resources that you can find in Safari Books Online.

## Safari Books Online has the content you need

Clojure Inside Out is a video where you’ll not only learn how to tackle practical problems with this functional language, but also how to think in Clojure, and why you should want to. Neal Ford (software architect and meme wrangler at ThoughtWorks) and Stuart Halloway (CEO of Relevance, Inc.) show you what makes programming with Clojure swift, surgical, and accurate.

Clojure Programming helps you learn the fundamentals of Clojure with examples relating it to the languages you already know, whether you’re focused on data modeling, concurrency and parallelism, web programming, statistics and data analysis, or more.

The Joy of Clojure goes beyond the syntax and shows how to write fluent, idiomatic Clojure code. You will learn to approach programming challenges from a functional perspective and master the Lisp techniques that make Clojure so elegant and efficient. This book will help you think about problems the “Clojure way,” and recognize when you simply need to change the way you program.

Practical Clojure is the first definitive reference for the Clojure language, providing both an introduction to functional programming in general and a more specific introduction to Clojure’s features. This book demonstrates the use of the language through examples, including features such as STM and immutability, which may be new to programmers coming from other languages.