### Volatility and the square root of time

When I was a rookie, I asked one of the senior members of my team how to compute the volatility of an asset. His answer was as follow:

That’s simply an annualized standard deviation. If you are using daily data:

1. Compute the daily returns of the asset,
2. Compute the standard deviation of these returns,
3. Multiply the standard deviation by the square root of 260 (because there are about 260 business days in a year).

“Of course, he added, if you are using weekly returns you have to multiply by the square root of 52 and if you are using monthly data you should multiply by the square root of 12. Simple as that.

And so I did for years, not even trying to understand why I was multiplying the standard deviation by the square root of time.

Time passed by and one day, I found time to solve that mystery. Then, I understood why this formula makes perfect sense but, more importantly, I realized that the calculations made by most people I know in the financial industry, including seasoned investment professionals, are dead wrong.

Let me explain.

### Discrete returns

Let $P_t$ be the price of an asset (that doesn’t pay any interim income such as dividends or coupons for the sake of simplicity) on day $t$. Then, the way most people would compute the return ($\delta_t$) on that asset from day $t-1$ to day $t$ is:

$$\delta_t = \frac{P_t}{P_{t-1}} - 1$$

Now suppose we have price data for $T$ days with $t \in{\{0, 1, 2, .., T\}}$ and, therefore, $T-1$ daily of $\delta_t$, and we want to compute the return over the whole period ($\delta_T$). We would use:

$$\delta_T = \prod_{t=1}^T(1+\delta_t)-1$$

For instance, using the data for the DAX from the EuStockMarkets dataset in R:

> P <- as.numeric(EuStockMarkets[, "DAX"])
> T <- length(P)
>
> dP <- P[-1]/P[-T]-1
> prod(1+dP)-1
[1] 2.360688
>

Indeed, it's equivalent to:

> P[T]/P[1]-1
[1] 2.360688
>

And the mean return $\bar{\delta_t}$ over the same period is given by:

$$\bar{\delta_t} = (1+\delta_T)^{1/T}-1$$

In R:

> rT <- prod(1+dP)-1
> (1+rT)^(1/T)-1
[1] 0.0006519036
>

Or:

$$\bar{\delta_t} = \left( \frac{P_T}{P_0} \right)^{1/T}-1$$

Note we use $P_0$ because $\delta_1$ is the return between $t=0$ and $t=1$. In R:

> (P[T]/P[1])^(1/T)-1
[1] 0.0006519036
>

Using all this, it's easy to compute what the annualized return of the DAX was over that period. We have a daily mean return ($\bar{\delta_t}$) and we assume a year is 260 business days long; it follows that:

$$\bar{\delta_{260}} = (1+\bar{\delta_t})^{260}-1$$

In R:

> mT <- (1+rT)^(1/T)-1
> (1+mT)^260-1
[1] 0.1846409
>

In words: over that period, the DAX has returned 18.5% per annum on average.

Now, back to the subject: what is the volatility — that is, the annualized standard deviation — of our daily returns ($\delta_t$)? We know how to compute the standard deviation of daily returns; in R:

> sd(dP)
[1] 0.01028088
>

And, according to my former colleague, we should multiply this by $\sqrt{260}$:

> sd(dP)*sqrt(260)
[1] 0.1657742
>

Well, guess what: this is wrong and, not only it is wrong, it doesn't mean anything.

### Continuous returns

Using discrete returns is absolutely correct for most uses but, when computating a volatility (and, therefore, a Sharpe ratio or an information ratio) you should use log returns or, as I like to call them, continuous returns:

$$\delta_t = \ln{\left(\frac{P_t}{P_{t-1}}\right)}$$

Or, equivalently:

$$\delta_t = \ln{(P_t)} - \ln{(P_{t-1})}$$

Let me explain why.

The critical property of continuous returns is that the total (continuous) return over the whole period (the $T$ days) is a sum:

$$\delta_T = \sum_{t=1}^T\delta_t$$

In R:

> dP <- diff(log(P))
> sum(dP)
[1] 1.212146
>

Is, indeed, equivalent to:

> log(P[T]/P[1])
[1] 1.212146
>

As a result, the mean daily (continuous) return over the whole period ($\bar{\delta_t}$) is simply the arithmetic mean of the daily (continuous) return:

> mean(dP)
[1] 0.0006520417
>

And, you guessed it, the annualized (continuous) return of the DAX was over that period is given by:

> mean(dP)*260
[1] 0.1695309
>

Starting from this, let's go for a (random) walk.

### Random walk

Following Jules Regnault [1], consider a random variable $\delta_t$ that follow some distribution (we don’t care which one) over $T$ periods and just make two basic assumptions: (i) the distribution is stable across time and (ii) the observations of $\delta_t$ are independent one from another.

Denoting $\mu$ the mean of the distribution which is supposed to be stable (assumption i), we know that the sum of $T$ observations will follow a mean $\mu_T$ given by:

$$\mu_T = \mu \times T$$

Now, that distribution also has a standard deviation ($\sigma$): what is the standard deviation of the sum of $T$ observations?

Thanks to the Bienaymé formula [2], we know that the variance of the sum of uncorrelated (assumption ii) random variables is the sum of their variances:

$$Var{\left(\sum_{t=1}^T\delta_t \right)} = \sum_{t=1}^T Var{(\delta_t)}$$

Since we have assumed the distribution is stable (assumption i), so does the variante ($\sigma^2$). From which:

$$Var{\left(\sum_{t=1}^T\delta_t \right)} = \sigma^2 \times T$$

Therefore, standard deviation of the sum of $T$ observations:

$$\sqrt{Var{\left(\sum_{t=1}^T\delta_t \right)}} = \sigma \times \sqrt{T}$$

Here you go: here is the square root of time.

What we are computing here is the standard deviation of a sum and this is indeed how we accumulate continuous returns; not the way we accumulate periodic returns (it's a product). In other words, that formula only make sense if your daily (weekly, monthly... whatever) returns are computed as continuous returns.

The volatility is simply that very same calculation over a standard period of one year (here, $T=260$ days). In R:

> P <- as.numeric(EuStockMarkets[, "DAX"])
> T <- 260
> dP <- diff(log(P))
> mean(dP)*T
[1] 0.1695309
> sd(dP)*sqrt(T)
[1] 0.166096
>

Even better, since we have assumed that our $\delta_t$ are independent, the Central Limit Theorem tells us that, after a large enough number of observations ($T$), their sum should follow (or, at least, be close of) — guess what — a normal distribution.

In other words, with our two assumptions and using continuous returns, we are able to compute the mean and the standard deviation of a normal distribution; which basically means that we knows everything else.

### A demo

Let's make a step-by-step demo with the DAX data.

P <- as.numeric(EuStockMarkets[, "DAX"])
dP <- diff(log(P))
T <- 260
# After T days, we should have:
mT <- mean(dP)*T
sT <- sd(dP)*sqrt(T)

Now, we’re going to use  ecdf to generate 1000 random series of length T that follow the empirical distribution of the DAX:

N <- 1000
dist <- ecdf(dP)
Rt <- matrix(quantile(dist, runif(T*N)), T, N)

Let's plot them:

Ct <- apply(Rt, 2, cumsum)
cols <- heat.colors(N)
op <- par(mar = rep(5, 4))
plot(1:T, Ct[, 1], type = "n", ylim = c(-.5, .7), cex.lab = .7,
cex.axis = .7, cex.main = .8, main = "Figure 1")
for(i in 1:N) lines(1:T, Ct[, i], col = cols[i])
par(op)

You shoud get something like:

Now let's see what the distribution looks like after T days:

# The mean:
mean(Ct[T, ])
# Compare to:
mT
# The standard deviation:
sd(Ct[T, ])
# Compare to:
sT

Lastly, compare the empirical distribution at time T with a normal distribution with mean mT and standard deviation sT:

# Density estimate:
d <- density(Ct[T, ], from = -.5, to = .8)

# Normal distribution with our estimated mean/sd (volatility):
y <- dnorm(d$x, mT, sT) # Plot: op <- par(mar = rep(5, 4)) plot(d$x, d$y, type = "l", xlim = c(-.5, .8), cex.lab = .7, cex.axis = .7, cex.main = .8, main = "Figure 2") lines(d$x, y, col = "red")
par(op)


It should look like this:

Pretty close right?

From this, one can compute the probability associated with any level of return and it is clear that the higher the volatility, the more likely you are to face losses [3].

### Takeaways

First and should you only remember one thing from that post: a volatility should always be computed using continuous (a.k.a log) returns. Any other calculation is false. Period.

Second, most financial models don’t assume anything about the distribution of (say) daily returns: saying that, after $T$ periods, the distribution will be normally distributed is just a consequence of the CLT.

Lastly, the whole random walk thing is based on just two assumptions (i and ii); if you’re looking for witnesses, this is where you should start.

---
[1] Jules Augustin Frédéric Regnault, a French stock broker who first suggested the concept of a random walk of prices in 1863.
[2] Named after Irénée-Jules Bienaymé (1796-1878), one of the last great French statisticians.
[3] For price probabilities, you'll need to use the corresponding log-normal distribution (see ?dlnorm).

### Querying FRED from R

FRED is an amazingly useful website provided by the Federal Reserve Bank of St. Louis compiling over 500k time series from 87 different sources. Here are 2 short R functions to retrieve FRED data in R. You'll need rjson (with admin privileges  : install.packages("rjson")) and a valid API key (register for free here) to run them.

iFRED returns basic information about a time series as a list. For instance, assuming you’re interested by the Russell 1000 index Total Market Index (RU1000TR):

> sid <- "RU1000TR"
> iFRED(sid)
$id [1] "RU1000TR"$realtime_start
[1] "2018-01-23"

\$realtime_end
[1] "2018-01-23"

(...)


qFRED retrieves the time series itself as a matrix (with dates as row names). Id addition to sid, there are 6 other optional argument:

• from: the start date as a Date object (defaults to 1776-07-04);

• to: the end date as a Date object (defaults to Sys.Date());

• units: one of lin (levels, the default), chg (change), ch1 (change from one year ago), pch (percent change), pc1 (percent change from one year ago), pca (compounded annual rate of change), cch (continuously compounded rate of change), cca (continuously compounded annual rate of change) or log (natural log);

• freq: one of d (daily, the default), w (weekly), bw(bi-weekly), m (monthly), q (quarterly), sa (semiannual), a (annual), wef (weekly ending Friday), weth (weekly ending Thursday), wew (weekly ending Wednesday), wetu (weekly ending Tuesday), wem (weekly ending Monday), wesu (weekly ending Sunday), wesa (weekly ending Saturday), bwew (bi-weekly ending wednesday) or bwem (bi-weekly ending Monday).

• aggreg: when using freq, how should data be aggregated? One of eop (end of period, the default), avg (average) or sum (sum)

• na.rm: logical, should missing values (NAs) be removed from the output?

For instance, always with RU1000TR:

> from <- as.Date("2018-01-01")
> qFRED(sid, from)
RU1000TR
2018-01-01       NA
2018-01-02  8346.42
2018-01-03  8398.08
2018-01-04  8431.26
2018-01-05  8487.97
2018-01-08  8504.18
(...)

Let's say you want weekly data with end-of-the-period observations:

> qFRED(sid, from, freq = "w", aggreg = "eop")
RU1000TR
2018-01-05  8487.97
2018-01-12  8624.19
2018-01-19  8698.12
2018-01-26       NA

Same thing but with weekly averages:

> qFRED(sid, from, freq = "w", aggreg = "avg")
RU1000TR
2018-01-05  8415.93
2018-01-12  8543.91
2018-01-19  8653.44
2018-01-26       NA

Percent change from one year ago:

> qFRED(sid, from, units = "pc1")
RU1000TR
2018-01-01       NA
2018-01-02 21.66084
2018-01-03 21.54941
2018-01-04 22.16739
2018-01-05 22.54979
2018-01-08 23.23220
(...)

The code is on Github. Don't forget to replace .FRED_api_key with your own API key!

### ChallengeR #6

Votre dernière mission consistait à coder le tricheur parfait dans mon petit tournois d’algos pour un dilemme du prisonnier répété.

Et le gagnant est @AlekVladNevski qui nous propose une fonction (ici un peu reprise par mes soins) qui va modifier la matrice des paiements :

msp  = function(p, o, n = 2000) {
z <- if(match.call()[[1]]=='f1') c(5, 0) else c(0, 5)
m <- rep(z, each = 4)
dim(m) <- rep(2,3)
assign("m", m, envir = parent.frame(n=1))
return(FALSE)
}

#### Challenger #6

Votre mission, si vous l’acceptez consiste à coder une fonction nbchar qui compte le nombre de caractères du corps d’une autre fonction R, sans tenir compte des espaces, des indentations et des retours à la ligne. Par exemple, avec :

area = function(radius) {
}

Vous devriez vérifier que :

> nbchar(area)
[1] 11
>

C’est la fonction pour laquelle nbchar(nbchar) sera le plus petit qui remportera ce challenge. Vous avez jusqu'au vendredi 26 janvier ; soumettez vos réponses ci-dessous :

Coller votre code ici :

Validez pour enregistrer votre réponse.

### HTML form to Google Sheet

I’ve been trying to figure out how to do this for a while so maybe it will be useful to someone else. I want to create an HTML form here that sends all the inputs to a google sheet. Demo:

Here is the form:

First Name:

Last Name:

This is where the confirmation message will appear after submission.

And it all ends up in that sheet.

Worked? So here is how to do it:

First, quite obviously, you need and HMTL form. Here's mine:

We'll need to update the link_to_google_script thing after next step.

Then we need a Google Sheet. Go to your Google Drive, create one and save its ID somewhere (that the long string in the url between d/ and /edit).

Now, still in Google Drive, create a new Google Apps Script, remove any code in there and relace it with this:

Just replace the id_of_your_google_sheet with the ID of your own sheet.

Once done, save you script, select Publish, Deploy as web app, make sure that "anyone, even anonymous" has access to the app and Publish. After a bit of authorization procedures, you should get this:

Copy that link and paste it in your HTML instead of link_to_google_script then add at the bottom of your HTML page:

Here you go, it should be working!

### ChallengeR #5

Votre dernière mission consistait à trouver la fonction locf (qui remplace les NA par la dernière valeur connue) la plus compacte possible (étant bien entendu, comme d'habitude, que les espaces, les indentations et les retours à la ligne ne sont pas considérés comme des caractères). L’exemple que je vous ai donné était le suivant :

> x <- 10:20
> x[5:6] <- NA
> locf(x)
[1] 10 11 12 13 13 13 16 17 18 19 20
>

Ce faisant, j’ai commis une petite erreur : j’aurais dû préciser que locf devrait aussi faire ça :

> x <- 10:20
> x[c(1:2, 5:6)] <- NA
> locf(x)
[1] NA NA 12 13 13 13 16 17 18 19 20
>

En tenant compte de cet oubli de ma part, vous êtes deux à remporter ce 4ème ChallengeR avec un corps de fonction d’à peine 34 caractères.

La première solution est de @dickoah :

locf = function(x) {
x[cummax((!is.na(x)) * 1:length(x))]
}

La seconde, qui présente l'avantage de traiter les NA en début de vecteur, nous est proposée par @navarre_julien :

locf = function(x) {
i = !is.na(x)
c(NA, x[i])[cumsum(i)+1]
}

Notez que, pour répondre strictement à mon énoncé (et donc sans supporter les NA en début de vecteur), on pouvait raccourcir la version de Julien comme suit :

locf = function(x) {
i = !is.na(x)
x[i][cumsum(i)]
}

La fonction locf idéale est donc quelque chose du genre :

locf = function(x) {
if(is.matrix(x)) {
res <- apply(x, 2, locf)
} else {
i <- !is.na(x)
res <- c(NA, x[i])[cumsum(i)+1]
}
res
}

Bravo à tous les deux ! Voici donc ChallengeR, 5ème du nom.

Votre mission, si vous l’acceptez, consiste à coder le tricheur parfait dans mon petit tournois d’algos pour un dilemme du prisonnier répété. Vous devez concevoir une fonction de la forme :

msp = function(p, o, n = 2000) {
# faire quelque chose...
return(res)
}

De telle sorte qu’elle pulvérise tous les records en trichant vilement. Edit : Tout le code est accessible sur Github.

### ChallengeR #4

Bon, vous êtes non seulement forts mais vous êtes en plus de grands malades. Votre dernière mission consistait à trouver une fonction R qui renvoie la suite de Syracuse de n'importe quel entier x jusqu'à ce qu'elle atteigne 1. Il y a deux façons de faire ça :

En passant par une fonction récursive :

syracuse = function(x) {
a <- tail(x, 1)
if(a == 1) {
return(x)
} else {
syracuse(c(x, ifelse(a%%2, a*3+1, a/2)))
}
}

Ou en passant par une fonction récursive anonyme :

syracuse = function(x) {
a <- tail(x, 1)
if(a == 1) {
return(x)
} else {
Recall(c(x, ifelse(a%%2, a*3+1, a/2)))
}
}

Je dis que vous êtes des grands malades parce que, sur Twitter, vous avez assez rapidement décidé de trouver la fonction la plus condensée possible (sans parler de celui qui voulait faire ça sans if ni else). Du coup, je vous donne la version la plus courte, trouvée par @AlekVladNevski :

syracuse = function(x) {
c(x, if(x>1) Recall(if(x%%2) x*3+1 else x/2 ))
}

Nous allons donc pouvoir passer à l'étape 4.

Considérez le vecteur x suivant :

x <- 10:20
x[5:6] <- NA

Votre mission, si vous l’acceptez, consiste à coder trouver la fonction locf (pour Last Observation Copied Forward) qui, comme son nom le suggère, remplace les NA par la dernière observation connue :

> locf(x)
[1] 10 11 12 13 13 13 16 17 18 19 20
>

Le gagnant sera celui qui proposera la fonction qui utilise le moins de caractères possibles (étant entendu que les espaces, les indentations et autres retours à la ligne ne son *pas* considérés comme des caractères).

### Iterated prisoner’s dilemma

This is an iterated prisoner’s dilemma between yourself and 9 unknown opponents. Each match will last for 10 rounds and you’ll only know who was your opponent at the end of the match. For each round, you must choose between Cooperate (C) or Defect (D).

The payment matrix is :

 C D C [3,3] [0,5] D [5,0] [1,1]

In words:

• If you both cooperate ([C,C]), you'll get 3 points each;
• If you cooperate while your opponent defects ([C, D]), you'll get 0 points but he'll get 5 points;
• If you defect while your opponent cooperates ([D, C]), you'll get 5 points and he'll get nothing;
• If you both defect ([D,D]), you'll get 1 point each.

The number of points earned by each of you during one specific match are calculated at the bottom of the tables and you'll get you total score at the end of the post.

##### Match 1

This is round 1. What do you do?

 You Opponent - - - - - - - - - - - - - - - - - - - - 0 0
##### Match 2

This is round 1. What do you do?

 You Opponent - - - - - - - - - - - - - - - - - - - - 0 0
##### Match 3

This is round 1. What do you do?

 You Opponent - - - - - - - - - - - - - - - - - - - - 0 0
##### Match 4

This is round 1. What do you do?

 You Opponent - - - - - - - - - - - - - - - - - - - - 0 0
##### Match 5

This is round 1. What do you do?

 You Opponent - - - - - - - - - - - - - - - - - - - - 0 0
##### Match 6

This is round 1. What do you do?

 You Opponent - - - - - - - - - - - - - - - - - - - - 0 0
##### Match 7

This is round 1. What do you do?

 You Opponent - - - - - - - - - - - - - - - - - - - - 0 0
##### Match 8

This is round 1. What do you do?

 You Opponent - - - - - - - - - - - - - - - - - - - - 0 0
##### Match 9

This is round 1. What do you do?

 You Opponent - - - - - - - - - - - - - - - - - - - - 0 0
##### Conclusion

Still there? Great!

• Always Defects always defects regardless of what you do;
• Tit-for-Tat cooperates first and then reproduce your last move;
• Grudger cooperates until you defect; then defects all the time;
• Always Cooperates always cooperates regardless of what you do;
• Detective cooperates, defects, cooperates then checks what you did on the 3rd round: if you haven't retaliated, it defects all the time; otherwise it plays Tit-for-Tat;
• Random just cooperates or defect randomly (with a probability of 1/2);
• Win-Stay-Lose-Shift cooperates first then, if you've cooperated on the last round, repeat its last move; otherwise switches;
• Tit-For-Two-Tats same as Tit-for-Tat but only retaliates after two defections in a row;
• Alternates defects first and then just alternates regardless of what you do.