Everything you wish to know about correlations but are afraid to ask

11institutetext: Instituto de Física de Líquidos y Sistemas Biológicos (IFLySiB), Universidad Nacional de La Plata and CCT CONICET La Plata, Consejo Nacional de Investigaciones Científicas y Técnicas, Calle 59 no. 789, B1900BTE La Plata, Argentina.
22institutetext: Departamento de Física, Facultad de Ciencias Exactas, Universidad Nacional de La Plata, Argentina
22email: tgrigera@iflysib.unlp.edu.ar, WWW home page: http://iflysib14.iflysib.unlp.edu.ar/tomas

Everything you wish to know about time correlations but are afraid to ask

Tomás S. Grigera
Abstract

We discuss the various definitions of time correlation functions and how to estimate them from experimental or simulation data. We start with the various definitions, both in real and in Fourier space, and explain how to extract from them a characteristic time scale. We then study how to estimate the correlation functions, i.e. how to obtain a good approximation to them from a sample of data obtained experimentally. Finally we discuss some practical details that arise in the actual computation of these estimates, and we describe some relevant algorithms.

1 Introduction

This chapter is about the definitions and practical computation of time correlation functions, i.e. the mathematical tools that enable us to find and study dynamical correlations in physical systems. Why do we care about correlations? Science is about understanding how things work (or, more ambitiously, how nature works [1]). The question of what “understanding” something really means is not one we plan to answer or even discuss here, but most of us would probably agree that it involves (possibly among other things) knowledge of a causation mechanism: to know how the state of the system at some time influences the behavior of the same system at a later time. Now of course correlation does not imply causality: if A and B are correlated, it might be that A causes B, or that B causes A, or that something else causes both A and B. So correlations do not (directly) tell us about cause and effect. However, causality is not directly measurable, while correlations are. Correlations do not provide us with a causality mechanism, but do constrain the cause-effect relationships we might care to imagine: to explain how a system works, you are free to come up with whatever mechanism (theory) you wish, but if that mechanism does not produce the kind of correlations actually observed, then it cannot be right.

Correlations play a major role in the study of systems of many particles, where the prediction (and maybe even the observation) of detailed particle trajectories is out of the question, due to the large number of variables involved. Instead, one takes a statistical approach, predicting (macroscopic) temporal correlations rather than microscopic trajectories. Conversely, in a macroscopic experiment one does not measure particle trajectories, but records quantities that are the result of the collective instantaneous state of many microscopic components. The exact variations of the observed quantities are due to factors beyond our control (we say they are noisy), but the overall trends and the correlations between the quantities measured at successive times furnish meaningful information about the system’s dynamics. Moreover, time correlations provide information on the dynamics even in the absence of overall trends (e.g. when a physical system is in thermodynamic equilibrium). But time correlations are also greatly useful for the study of systems quite more complex than physical systems in thermodynamic equilbrium: in particular, of special interest for this volume, biological systems.

We will thus proceed to present the mathematical definitions of time correlation functions and their Fourier transforms (§2), to discuss the definition and meaning of the correlation time (§3) and finally to turn to the question of computing time correlations from experimental data (§4).

But before moving on, I would like to make two clarifications. First, this chapter is about questions you are afraid to ask, not because they are so advanced that they touch very dark and well-kept secrets, but because they are so basic that you are embarrassed to ask. No honest scientific question should be embarrassing, but you know. Second, this is perhaps not everything you want to know about time correlations. In particular, I have not included any material regarding their theoretical calculation. But I do discuss all (well, most) of what you need to know to compute them from experimental or simulation data, starting from their various possible definitions and touching many practical and sometimes nasty details. I apologize if these clarifications are disappointing, but you must agree that “almost all you need to know to be able to compute time correlations from experimental data, some of which is so basic that you are embarrassed to ask” would make for considerably less catchy title.

2 Definition of time correlation functions

2.1 Correlation and covariance

Let x and y be two random variables and p(x,y) their joint probability density. We use to represent the appropriate averages, e.g. the mean of x is x=xp(x)𝑑x (recall that the probability distribution of x can be obtained from the joint probability, p(x)=𝑑yp(x,y), and in this context is called marginal probability) and its variance is Varx=(xx)2. Let us write

Cxy=xy=xyp(x,y)𝑑x𝑑y. (1)

Eq. (1) defines the correlation of x and y. Their covariance is defined as

Covx,y=(xx)(yy)=xyxy. (2)

A property of the covariance is that it is bounded by the product of the standard deviations (provided of course that they exist, which cannot always be taken for granted):

Covx,y2VarxVary. (3)

The Pearson correlation coefficient, or Pearson correlation for short, is

rx,y=Covx,yVarxVary. (4)

From the inequality (3) it follows that the Pearson coefficient is bounded, 1rx,y1. It can be shown that the equality holds only when the relationship between x and y is linear [2, §2.12].

The variables are said to be uncorrelated if their covariance is null:

Covx,y=0xy=xy(uncorrelated). (5)

Absence of correlation implies that the variance of the sum is the sum of the variances, because

Varx+y=Varx+Vary+2Covx,y, (6)

but is weaker than independence: independence means that p(x,y)=p(x)p(y). In the particular case when the joint probability p(x,y) is Gaussian, Covx,y=0 is equivalent to x and y being independent, but this not true in general. On the other hand it is clear that independence does imply absence of correlation. The covariance, or the correlation coefficient, can be said to measure the degree of linear association between x and y, since it is possible to build a nonlinear dependence of x on y that yields zero covariance (see Ch. 2 of [2]).

2.2 Fluctuating quantities as stochastic processes

Consider now an experimentally observable quantity, A, that provides some useful information on a property of a system of interest. We will usually assume here for simplicity that A is a scalar, but the present considerations can be rather easily generalized to vector quantities. A can represent the magnetization of a material, the number of bacteria in a certain culture, the rainfall in a certain area, the prize of a share in the stock market, etc. We assume that A can be measured repeatedly and locally in time, so that we actually deal with a function A(t).

We wish to compare the values of A measured at different times. However, we are interested in cases where A is noisy, i.e. subject to random fluctuations that arise because of our incomplete knowledge of the variables affecting the system’s evolution, or because of our inability control them with enough precision (e.g. we do not know all variables that can affect the variation of the price of a stock market asset, we do not know all the interactions in a magnetic system, we cannot control all the degrees of freedom of a thermal bath that fixes the temperature of a sample). The quantity A(t) is then a random variable, and to compare it at different values of its argument we will resort to the correlation and covariance defined in the previous section. This will lead us to define time correlation functions, and this section is devoted to stating their precise definitions.

Let us stress that a statement like “A(t1) is larger than A(t2)” is useless in practice, because even if it is meaningful for a particular realization of the measurement process (experiment), the noisy character of the observable means that a different repetition of the experiment, under the same conditions, will yield a different function A(t). Actually repeating the experiment may or may not be feasible depending on the case, but we assume that we know enough about the system to be able to assert that a hypothetical repetition of the experiment would not exactly reproduce the original A(t). For clarity, it may be easier to imagine that several copies of the system are made and let evolve in parallel under the same conditions, each copy then producing a signal slightly different from that of the other copies.

We note that the expression “under the same conditions” is implicitly qualified in some system-dependent way. Clearly we expect that two strictly identical copies of a system evolving under exactly identical conditions will produce the same function A(t). Same conditions here must be understood in a statistical way: the system is prepared by choosing a random initial configuration extracted from a well-defined probability distribution, or two identical copies evolve with a random dynamics with known statistical properties (e.g. coupled to a thermal bath at given temperature and pressure). We are excluding from consideration cases where the fluctuations are mainly due to experimental errors. If that where the only source of noise, one could in principle repeat the measurement enough times so that the average A(t) is known with enough precision. A(t) would then be an accurate description of the system’s actual evolution, and the correlations we are about to study would be dominated by properties of the measurement process rather than by the dynamics of the system itself. Instead we are interested in the opposite case: experimental error is negligible, and the fluctuations of the observable are due to some process intrinsic to the system. Indeed in many cases (such as macroscopic observables of systems in thermodynamic equilibrium) the average of the signal is uninteresting (it’s a constant), but the time correlation function unveils interesting details of the system’s dynamics.

2.2.1 Stochastic processes

From our discussion of A as a fluctuating quantity, it is clear that A(t) is not an ordinary function. Rather, at each particular value of time, A(t) is a random variable, and A(t) as a whole is a random function, or stochastic process. A stochastic process A(t) is then a family of random variables indexed by t. Thus there must exist a family of probability densities p(A,t) that allows to compute all moments An(t) and in general any average f(A(t)). However, P(A,t) is not enough to characterize the stochastic process, because in general the variables A(t) at different times are not independent. Thus p(A,t) is actually a marginal distribution of some more complicated multivariate probability density.

It is natural to imagine a functional P[A(t)] that gives the joint probability for all random variables A(t). However, an infinite-dimensional probability distribution is a wild beast to ride, and we shall content ourselves with the (infinite) set of n-variable joint distributions

Pn(A1,t1,A2,t2,,An,tn). (7)

Most stochastic processes can be completely specified by the set of all joint probabilities of the form (7) for all n and all possible choices of t1,,tn. The sense of “most” is highly technical [2], but should include all processes of interest to physicists. Here we shall need only the set corresponding to n=2 (which trivially gives also the set for n=1 marginalizing on the second variable).

The notion of stationary processes is an important one, connected with the thermodynamic idea of equilibrium. Completely stationary processes are those for which all the joint probability distributions that define the process are translation invariant, that is

Pn(A1,t1,,An,tn)=P(A1,t1+s,,An,tn+s),s,n,ti. (8)

A less restrictive notion is that of stationary processes up order M. This is defined by the requirement that all the joint moments up to order M exist and are time-translation invariant:

Am1(t1)Am2(t2)Amn(tn)=Am1(t1+s)Am2(t2+s)Amn(tn+s) (9)

for all s, n, {t1,,tn} and {m1,,mn} such that m1+m2++mnM. This is less restrictive not only because of the bound on the number of random variables considered, but also because invariance is imposed only on the moments, and not on the joint distributions themselves.

In completely stationary processes the time origin is irrelevant: no matter when one performs an experiment, the statistical properties of the observed signal are always the same. In particular, this implies that the average A(t) is constant in time (but does not mean that time correlations are trivial). This is the situation one expects when observing a system in thermodynamic equilibrium. For processes stationary up to order M, time origin is irrelevant for moments up to order M. For a stationary process up to first order, one can assert that A(t) is a constant independent of t, and for a process stationary up to second order one can in addition assert that A2(t) (and hence the variance) is time-independent, and that A(t1)A(t2)=A(0)A(t2t1) (using s=t1), i.e. that all second-order moments are function only of the time difference.

2.3 Time correlation functions

The time correlation function is the correlation of the random variables A(t0) and A(t0+t), i.e. the second order moment

C(t0,t)=A(t0)A(t0+t)=𝑑A1𝑑A2P(A1,t0,A2,t0+t)A1A2, (10)

where the star stands for complex conjugate. The time difference t is sometimes called lag, and the function is called self correlation or autocorrelation to emphasize the fact that it is the correlation of the same observable quantity measured at different times. Note however that from the point of view of probability theory A(t0) and A(t0+t) are two different (though usually not independent) random variables. One can also define cross-correlations of two observables at different times:

CAB(t0,t)=A(t0)B(t0+t). (11)

The star again indicates complex conjugate. From now on however we shall restrict ourselves to real quantities and omit it in the following equations.

One can also define a correlation function of the fluctations δA(t)=A(t)A(t). This is called the connected time correlation function111The name comes from diagrammatic perturbation theory, because it can be shown that only connected Feynman diagrams appear in the expansion of this quantity (see e.g. [3, ch. 8]).,

Cc(t0,t)=δA(t0)δA(t0+t)=C(t0,t)A(t0)A(t0+t). (12)

Remembering (5) it is clear that this function is zero when the variables are uncorrelated (which we expect to happen for t). For this reason this function is often more useful than the correlation (10), which for uncorrelated variables takes the value A(t0)A(t0+t).

The names employed here are usual in the physics literature (e.g. [3, 4]). In the mathematical statistics literature, the connected correlation function is called autocovariance (in fact it is just the covariance of A(t0) and A(t0+t)), while the name autocorrelation is reserved for the normalized connected correlation defined below (15).

In what follows, unless otherwise stated, we will assume that we are dealing with processes stationary at least up to second order, so that the time correlation is a function only of the lag t. In this case the difference between the connected and nonconnected correlation functions is a constant,

Cc(t)=A(0)A(t)A2=C(t)A2,(stationary process). (13)

Finally, let us define a normalized connected correlation so that its absolute value is bounded by 1, in analogy with the Pearson coefficient:

ρ(t0,t) =Cc(t0,t)Cc(t0,0)Cc(t0+t,0), (14)
ρ(t) =Cc(t)Cc(0), (stationary). (15)

2.3.1 Properties of the time correlation function

From the definition of the correlation function, it is easy to see that in the stationary case the following hold:

  1. 1.

    Cc(0)=VarA.

  2. 2.

    |Cc(t)|Cc(0) for all t.

  3. 3.

    If A(t) is real-valued, the correlation is even: C(t)=C(t).

Also, for sufficiently long lags one expects that the variables become independent and correlation is lost, so that C(t)A2. Thus the connected correlation should tend to zero, and it will typically have a Fourier transform (see § 2.4)

2.3.2 Example

Let us conclude this section of definitions with an example. Fig. 1 shows on the left two synthetic stochastic signals, generated with the random process described in §4.2.2. On the right there are the corresponding connected correlations. The two signals look different, and this difference is captured by Cc(t). We can say that fluctuations for the lower signal are more persistent: when some value of the signal is reached, it tends to stay at similar values for longer, compared to the other signal. This is reflected in the slower decay of Cc(t).

Refer to caption
Figure 1: Two stochastic signals (left) and their respective connected time correlations (right). Correlation times are τ4.5 (top), τ100 (bottom). The signals were generated through Eq. (44) with μ=0, σ2=2, N=105, and w=0.8 (top), w=0.99 (bottom).

2.4 Fourier transforms of time correlations

Time correlation functions are often studied in frequency space, either because they are obtained experimentally in the frequency domain (e.g. in techniques like dielectric spectroscopy), because the Fourier transforms are easier to compute or handle analytically, because they provide an easier or alternative interpretation of the fluctuating process, or for other practical or theoretical reasons. Although the substance is the same, the precise definitions used can vary. One must pay attention to i) the convention used for the Fourier transform pairs and ii) the exact variables that are transformed.

Here we define the Fourier transform pairs as

f~(ω)=𝑑tf(t)eiωt,f(t)=dω2πf~(ω)eiωt, (16)

but some authors choose the opposite sign for the forward transform and/or a different placement for the 1/2π factor (sometimes splitting it between the forward and inverse transforms). Depending on the convention a factor 2π can appear or disappear in some relations, like (20) below.

As for the second point above, time correlations are defined in terms of two times, t1 and t2 (which we chose to write as t0 and t0+t). One can transform one or both of t1 and t2 or t0 and t. Let us consider two different choices, useful in different circumstances. First take the connected time correlation (12) and do a Fourier transform on t:

Cc(t0,ω)=𝑑teiωtCc(t0,t0+t)=eiωt0δA(t0)δA~(ω), (17)

where δA~(ω) stands for the Fourier transform of δA(t). This definition is convenient when there is an explicit dependence on t0 but the evolution with t0 is slow, as in the physical aging of glassy systems (see e.g. [5]); one then studies a time-dependent spectrum. If on the other hand Cc is stationary, it is more convenient to write t1=t0, t2=t0+t and do a double transform in t1, t2:

C~c(ω1,ω2)=𝑑t1𝑑t2eiω1t1eiω2t2C(t1,t2t1)=𝑑t1𝑑t2eiω1t1eiω2t2δA(t1)δA(t2)=δA~(ω1)δA~(ω2)=𝑑t1𝑑t2ei(ω1+ω2)t1eiω2(t2t1)Cc(t2t1)=(2π)δ(ω1+ω2)C~c(ω2), (18)

where C~c(ω) is the Fourier transform of the stationary connected correlation with respect to t and we have used the integral representation of Dirac’s delta, δ(ωω)=(2π)1eit(ωω)𝑑t. As the above shows, the transform is zero unless ω1=ω2. For this reason it is useful to define the reduced correlation,

CcR(ω)=δA~(ω)δA~(ω)=δA~(ω)δA~(ω), (19)

where the rightmost equality holds when A is real.

The transform C~c(ω) is a well-defined function, because the connected correlation decays to zero (and usually fast enough). We can then say

C~c(ω)=12πδ(0)CcR(ω). (20)

This relation furnishes a fast way to compute Cc(t) in practice (§4.3). But let us comment on the at first sight baffling infinite factor relating the reduced correlation to C~c(ω). This originates in the fact that δA~(ω) cannot exist as an ordinary function, since we have assumed that A(t) is stationary. This implies that the signal has an infinite duration, and that, its statistical properties being always the same, it cannot decay to zero for long times as required for its Fourier transform to exist. The Dirac delta can be understood as originating from a limiting process where one considers signals of duration T (with suitable, usually periodic, boundary conditions) and then takes T. Then 2πδ(ω=0)=𝑑teitω|ω=0=𝑑t=T. These considerations can be made rigorous by defining the signal’s power spectrum,

h(ω)=limT1TAT(ω)AT(ω),AT(ω)=T/2T/2A(t)eiωt𝑑t, (21)

and then proving that h(ω)=C~c(ω) [2, §4.7, 4.8].

3 Correlation time

The connected correlation function measures how correlation is gradually lost as time elapses. One often seeks for a summary of this detailed information, in the form of a time scale that measures the interval it takes for significant decorrelation to happen: this is the correlation time222The term relaxation time is often used interchangeably with correlation time. The relaxation time is the timescale for the system to return to stationarity after an external perturbation is applied. For systems in thermodynamic equilibrium, the fluctuation-dissipation theorem [6] implies that the two are equal. τ. While some definitions imply that after a correlation time the connected correlation has descended to a prescribed level, this quantity is usually better understood as a scale, which precise meaning depends on the details of the shape of the correlation function. It is most useful to compare correlation functions of similar shape, which change as some environmental conditions varies (e.g. when one studies a given system at a set of different temperatures). You should also keep in mind that correlation functions can be quite complicated, and it may be appropriate to describe them using more than one scale (e.g. they could be a superposition of exponential decays). Thus when one speaks of the correlation time one means (or should mean) the largest of them. For the purpose of comparing different decays to see which one is slower, it is less relevant which of the times one chooses as description of the decay as long as a) one chooses the same scale in all cases and b) one knows that all of them scale in the same way when the control variable is changed. We now discuss several possible definitions of τ.

3.1 Correlation threshold

A simple, “quick and dirty”, way to define a correlation time is to choose it as the time it takes for the (normalized, connected) correlation to drop below a prescribed threshold ϵ:

ρ(t=τ)=ϵ. (22)

This definition is that is easy to apply to experimental data, and can be useful to compare correlation functions as long as they have the same shape. But it can be ambiguous if the decay displays oscillations, or if many time scales are present, and if it is inadequate when applied to functions of different shapes (see Fig. 2) or to power laws.

Refer to caption
Figure 2: Different possible shapes for the decay of the time correlation. Top left: simple exponential. Top right: simple exponential (black curve) and double exponential (red curve) decay. In this case, the threshold criterion (here ϵ=0.1) labels the red curve as the fastest, but it clearly has a longer tail. Bottom left: stretched exponential. The average correlation time (26) is τ22.7 (black curve), τ92.6 (red curve). Bottom right: exponentially damped harmonic oscillations.

3.2 Fit parameter

Sometimes an analytical expression can be fit to the correlation function, and a time scale extracted from the fit parameters. For example, if one can fit an exponential decay,

Cc(t)=Aet/τ, (23)

then the fitted value of the time scale τ can be used as correlation time (which in this particular case coincides with the threshold definition using ϵ=1/e). However, real-life time correlations are usually more complicated than a simple exponential. One can perhaps find more complicated functions that fit the correlation, but be wary of the proliferation of parameters333“With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.” Attributed to John von Neumann [7]. (as when fitting the sum of two, three, n exponentials).

You may may get away with fitting the only the last part of the decay with (23), if you can fit a sizeable part of the “tail”. The rationale is that the first part of the decay is dominated by fast microscopic processes one is (often) not interested in, so that by fitting the tail one obtains a good estimate of the correlation time of the slowest process. The problem with this strategy is how to decide when the tail starts; if the τ obtained this way is too sensitive to how this choice is made then it is probably no good.

A function widely used to describe non-exponential decays, employing only three parameters, is the Kolrausch-William-Watts or stretched exponential function,

Cc(t)=Ae(t/τ)β. (24)

Here τ is a time scale and the stretching exponent β controls the shape (β<1 gives a stretched exponential, i.e. a function with a longer tail compared to a simple exponential of the same τ, while β>1 produces instead a compressed exponential). However comparing the τ of two functions with different β can be misleading (see Fig. 2). A better description of the decay using a single number can be achieved by considering the correlation as a superposition of exponential processes,

ρ(t)=0w(τ)et/τ𝑑τ, (25)

which defines the correlation time distribution function w(τ) as essentially the inverse Laplace transform of ρ(t) [8]. Then the average correlation time is τ=τw(τ)𝑑τ. For the stretched exponential one finds [8]

τ=τβΓ(1β), (26)

where Γ(x) is Euler’s gamma function.

3.3 Integral time

A quite general way to define a correlation time is from the integral of the normalized connected correlation,

τint=0ρ(t)𝑑t. (27)

Clearly for a pure exponential decay τint=τ. In general, if ρ(t)=f(t/τ) then τint=constτ. The integral time is related to the variance of the estimate of the mean of the signal (see §4.2 below and [9, §2]). With some care, it can be computed from experimental or simulation data, avoiding the difficulties encountered when using thresholds or fitting functions. The procedure explained next is expected to work well if long enough sequences are at disposal and the decay does not display strong oscillations or anticorrelation [9].

If C^k(c)Cc(kΔt) is the estimate of the stationary connected correlation (obtained as explained in §4.2), then the integral can be approximated by a discrete sum, but the sum cannot run over all available values k=0,,N1, because as discussed in §4.2 the variance of C^k(c) for k near N1 is large, so that the sum k=0N1C^k(c) is dominated by statistical noise (more precisely, its variance does not go to zero for N). A way around this difficulty is [9, §3] to cut-off the integral at a finite time tc=cΔt such that cN but the correlation is already small at t=tc (implying that tc is larger than a few times τ). Thus τint is defined self-consistently as

τint=0ατintρ(t)𝑑t, (28)

where α should be chosen larger than about 5, and within a region of values such that τint(α) is approximately independent of α. Longer tails will require larger values of α; we have found adequate values to be as large as 20 in some cases. To solve (28) you can compute τ(M)=kMC^k(c)/C^0(c) starting with M=1 and increasing M until ατ(M)>M.

3.4 Correlation time from spectral content

Another useful definition of correlation time can be obtained from the Fourier representation of the normalized correlation,

ρ~(ω)=𝑑tρ(t)eiωt. (29)

Because ρ(t) is normalized, dω2πρ~(ω)=1, a characteristic frequency ω0 (and a characteristic time τ0=1/ω0) can be defined such that ρ~(ω) for ω[ω0,ω0] holds half of the spectrum [10], i.e.

ω0ω0dω2πρ~(ω)=12. (30)

This this definition of can be expressed directly in the time domain writing

ω0ω0dω2π𝑑tρ(t)eiωt=20𝑑tρ(t)ω0ω0dω2πeiωt=2π𝑑tρ(t)sinω0tt, (31)

where we have used the fact that ρ(t) is even. Then the correlation time is defined by

0dttρ(t)sin(tτ0)=π4. (32)

It can be seen that if ρ(t)=f(t/τ), then τ0 is proportional to τ (it suffices to change the integration variable to u=t/τ in the integral above).

An advantage of this definition is that it copes well with the case when inertial effects are important and manifest in (damped) oscillations of the correlation function (see Fig. 2). In particular, for harmonic oscillations of frequency ν, τ0=1/(2πν) while τint is undefined.

4 Computing time correlations from experimental data

In this section we examine in detail how to compute in practice the time correlation function of a signal recorded in an experiment or produced in numerical simulation. Up till know we have discussed the theoretical definitions of correlation functions, which are given in terms of averages over some probability distribution, or ensemble. However, when dealing with experimental or simulation data we do not have direct access to the probability distribution, but only to a set of samples, i.e. results from experiments, distributed randomly according to an unknown distribution. We must try to compute the averages we want (the correlation functions), as accurately as possible, using these samples. This is what the field of statistics is about: building estimators that allow to compute the quantities of interest as best as possible from the available samples.

We assume that the experiment records a scalar signal with a uniform sampling interval Δt, so that we are handled N real-valued and time-ordered values forming a sequence ai, with i=1,,N. It is understood that if the data are digitally sampled from a continuous time process, proper filtering has been applied444According to the Nyquist sampling theorem, if the signal has frequency components higher than half the sampling frequency (i.e. if the Fourier transform is nonzero for ωπ/Δt) then the signal cannot be accurately reconstructed from the discrete samples; in particular the high frequencies will “polute” the low frequencies (an effect called aliasing). Thus the signal should be passed through an analog low-pass filter before sampling. See [11, §12.1] for a concise self-contained discussion, or [2, §7.1].. In what follows we shall measure time in units of the sampling interval, so that in the formulas and algorithms below we shall make no use of Δt. To recover the original time units one must simply remember that ai=A(ti) with ti=t0+(i1)Δt. For the stationary case we shall write Ck=C(tk) where tk is the time difference, tk=kΔt and k=0,,N1, and in the non-stationary case Ci,k=C(ti,tk).

4.1 Stationary vs. non-stationary signals

It is clearly hopeless to attempt to estimate an ensemble average unless it is possible to obtain many samples under the same conditions (i.e., if one is throwing dice, one should throw many times the same dice). This implies there is a huge difference in how one can treat a sample from a stationary process vs. a sample from a non-stationary one. If the process is stationary, we can essentially consider the samples ai of one sequence as different repetitions of the same experiment. Estimation then basically consists in replacing ensemble averages with time averages, e.g. one estimates the mean A as a¯, with

a¯=1Ni=0N1ai, (33)

and the stationary correlation with formula (37) below. There is more to be said (see §4.2 below), but in essence the situation is this.

On the other hand, if the process is not stationary it means that the samples of a single sequence are different experiments, in the sense that the conditions have changed from one sample to another (i.e. the system has evolved in some nontrivial way). In this case the correlation depends on two times (not just on their difference) and the mean itself can depend on time. The only way to estimate the mean or the correlation function is to obtain many sequences ai(n), n=1,,M reinitializing the system to the same macroscopic conditions each time (in a simulation, one can for example restart the simulation with the same parameters but changing the random number seed). Time averaging is no good in this case, instead the ensemble average is replaced by average across the different sequences, i.e. the (time-dependent) mean is estimated as

A(ti)ai¯=1Mn=1Mai(n), (34)

and the time correlation as

Cc(ti,tk)C^i,k(c)=1MnMδai(n)δai+k(n),δai(n)=ai(n)ai¯, (35)

where the hat distinguishes an estimate from the actual quantity. These estimators have the desirable property that they “get better and better” when the number of samples M grows, if the samples are independent and the ensemble has finite variance. More precisely, C^i,k(c)C(ti,ti) as M (and similarly for a¯). This property is called consistency, (see [2, §5.2]) and is guaranteed because C^i,k(c)=C(ti,ti) and ai¯=A(ti) (i.e. the estimators are unbiased) and their variance vanishes for M (because the samples are independent).

If one has several sequences sampled from a stationary system, it is possible to combine the two averages: one first computes the stationary correlation estimate (37) for each sequence, and then averages the different correlation estimates (over n at fixed lag k). It is clearly wrong to average the sequences themselves before computing the correlation, as this will tend to approach the (stationary) ensemble mean A for all times, destroying the dynamical correlations.

Before turning to estimation in the stationary case, let us stress that it is always sensible to check whether the sequence is actually stationary. A first check on the mean can be done dividing the sequence in several sections and computing the average of each section, then looking for a possible systematic trend. If this check passes, then one should compute the time correlation of each section and then checking that all of them show the same decay (using fewer sections than in the first check, as longer sections will be needed to obtain meaningful estimates of the correlation function). It is important to note that this second check is necessary even if the first one passes; as we have noted above it is possible for the mean to be time-independent while the time correlation depends on two times (i.e. the stochastic process is only stationary to first order).

4.2 Estimation of the time correlation of a stationary process

If our samples ai are form a stationary signal, we build our estimators using time averages in lieu of ensemble averages, in particular the estimator for A is (33). As we said above, the idea is that we regard the samples as different realizations of the same experiment. However, they are not independent realizations, but correlated realizations. The estimate (33) is still good in the sense that it is consistent, but it has higher error than the equivalent estimate built with independent samples, because it has a higher variance. The variance of the estimate with correlated samples is [9]

Vara¯2τintN[A2A2], (36)

i.e. 2τint times larger than the variance in the independent sample case, where τint is the integral correlation time (27). In this sense N/2τint can be thought of as the number of “effectively independent” samples.

To estimate the connected correlation function we use

C^k(c)=1Nkj=1Nkδajδaj+k,δaj=aja¯, (37)

where a¯ is usually the estimate (33), although the true sample mean A can be used if known. If the true mean is used, the estimator (37) is unbiased, otherwise it is asymptotically unbiased, i.e. the bias tends to zero for N, provided the Fourier transform of Cc(t) exists. More precisely, C^c,k=Cc(tk)α/N, where α=2πVaraC~c(ω=0). The variance of the estimate C^c,k is O(1/(Nk)) [2, §5.3]. This is sufficient to show that, at fixed k, the estimator is consistent, i.e. C^k(c)Cc(tk) for N. However the variance increases for increasing k, and thus the tail of C^c,k is considerably noisy. In practice, for k near N the estimate is mostly useless, and the usual rule of thumb is to use C^c,k only for kN/2.

It is natural to propose an estimate for the nonconnected correlation function doing a time average, analagous to (37),

C^k=1Nkj=1Nkajaj+k. (38)

However, although C^k is unbiased, it can have a large variance when the signal has a mean value larger than the typical fluctuation (see §4.2.1). In general, the connected estimator should be used (an exception may be when an experiment can only furnish many short independent samples of the signal, see the example below in §4.2.2).

Another asymptotically unbiased estimator of the connected correlation can be obtained by using 1/N instead of 1/(Ni) as a the prefactor of the sum in (37). Calling this estimator Cc,k, it holds that Cc,k=Cc(tk)α/NkC(tk)/Nαk/N2, where α is defined as before, and again α=0 if the exact sample mean is used. This has the unpleasant property that the bias depends on k, while the bias of C^c,k is independent of its argument, and smaller in magnitude. The advantage of Cc,k is that its variance is O(1/N) independent of k, thus it has a less noisy tail. Many authors prefer Cc,k due to its smaller variance and to the fact that it strictly preserves properties of the correlation (like property 2 of § 2.3.1), which may not hold for C^c,k. Here we stick to C^c,k, as usual in the physics literature (e.g. [9, 12, 13]), so that we avoid worrying about possible distortions of the shape at small k. In practice however, it seems that as long as N is greater than 10τ (a necessary requirement in any case, see below), and for kN/2, there is little numerical difference between the two estimates.

4.2.1 Two properties of the estimator

We must mention two important features of the estimator that are highly relevant when attempting to compute numerically the time correlation. The first is that the difference between the non-connected and connected estimators is not a¯2, but

C^kC^c,k=a¯[1NkjNkaj+1NkjNkaj+ka¯], (39)

as it is not difficult to compute. The difference does tend to a¯2 for N, but in a finite sample fluctuations are important, especially at large k. Fluctuations are additionally amplified by a factor a¯, so that when the signal mean is large with respect to its variance, the estimate C^k is completely washed out by statistical fluctuations. In practice, this means that while it is true that C(t) and Cc(t) differ by a constant term (A2), in general it is a bad idea to compute C^k and subtract a¯2 to obtain an estimate of the connected correlation. Instead, one computes the connected correlation directly (by computing first an estimate of the mean and using (37)) and then, if needed, adds a¯2 to obtain an estimation of C(t).

Another important fact is that the estimator of the connected correlation will always have zero, whatever the length of the sample used, even if Nτ. To see this, consider the quantity

Bi=(Ni)C^c,i=j=1Niδajδaj+i, (40)

and compute the sum

i=0N1Bi=i=0N1j=1Niδajδaj+i=i=0N1j=1Nk=1Nδajδakδk,j+i. (41)

But i=0N1δk,j+i equals 1 if kj and 0, so

i=0N1Bi=j=1Nk=jNδajδak=12jkNδajδak+j=1N(δaj)2=12[j=1Nδaj]2+12j=1N(δaj)2=12j=1N(δaj)2>0, (42)

where the last equality follows because jδaj=0. Now we can easily do the same sum starting from i=1:

i=1N1Bi=B0+i=0N1Bi=12j=1N(δaj)2<0. (43)

This shows that at least some of the Bi must be negative. But since B0>0, the conclusion is that Bk, and hence C^c,k, which differs from it by a positive factor, must change sign at least once for k1.

The practical consequence of this is that when N is of the order of τ, or smaller, the estimate C^c,k will suffer from strong finite-size effects, and its shape will be quite different from the actual Cc(t). In particular, since C^c,k will intersect the x-axis, it can mislead you into thinking that τ is smaller than N when in fact it is several times larger. Be suspicious if C^c,k changes sign once and stays very anticorrelated. Anticorrelation may be a feature of the actual Cc(t), but if the sample is long enough, the estimate should decay until correlation is lost (noisily oscillating around zero). One must always perform tests estimating the correlation with different values of N: if the shape of the correlation at short times depends on N, then N must be increased until one finds that estimates for different values of N coincide for lags up to a few times the correlation time (see example below).

Once a sample-size-independent estimate has been obtained, the correlation time can be estimated (§3), and it must be checked that self-consistently N is several times larger than τ.

4.2.2 Example

To illustrate the above considerations, we generate a correlated sequence from the recursion

ai=wai1+(1w)ξi, (44)

where w[0,1] is a parameter and the ξi are uncorrelated Gaussian random variables with mean μ and variance σ2. Assuming the signal is stationary it is not difficult to find

a=μ,(aμ)2=1w1+wσ2,Ck(c)=(aiμ)(ai+kμ)=σ2wk, (45)

so that the correlation time is τ=1/logw.

We used the above recursion to generate artificial sequences and computed their time correlation functions with the estimates discussed above. Fig. 3 shows the problem that can face the non-connected estimate. When the average of the signal is smaller than or of the order of the noise amplitude (as in the top panels), one can get away with using (38). However if μσ, the non-connected estimate is swamped by noise, while the connected estimate is essentially unaffected (bottom panels). Hence, if one is considering only one sequence, one should always use the connected estimator. The situation might be different if many sequences, corresponding to different realizations of the same experiment, are available (see the example after next).

Refer to caption
Figure 3: Connected vs. nonconnected estimate. The estimate of C(t) (equation (38), left panels) is much worse that the estimates of Cc(t) (equation (37), right panels). The (artificial) signal was generated with (44). Top panels: μ=1; bottom panels: μ=100. In both cases, σ2=1, w=0.95 (τ20) and length N=5104.

In Fig. 4 we see how using samples that are too short affects the correlation estimates. The same artificial signal was generated with different lengths. For the shorter lengths, it is seen that the correlation estimate crosses the t axis (as we have shown it must) but does not show signs of losing correlation. One might hope that N=100010τ is enough (the estimate starts to show a flat tail), but comparing to the result of doubling the length shows that it is still suffering from finite-length effects. It is seen that a sequence at least 20τ to 50τ long is needed to get the initial part of the normalized connected correlation more or less right, while a length of about 1000τ is necessary to obtain a good estimate up to t5τ. The unnormalized estimator suffers still more from finite size due to the increased error in the determination of the variance (left panel).

Refer to caption
Figure 4: Finite size effects. Estimates of the connected correlation (top) and normalized connected correlation (bottom) for sequence (44) of different lengths as indicated, together with the analytical result Cc(t)=σ2wt. Parameters are μ=0, σ2=1, w=0.99 (τ100).

If the experimental situation is such that it is impossible to obtain sequences much longer than the correlation time, one can get some information on the time correlation if it is possible to repeat the experiment so as to obtain several independent and statistically equivalent sample sequences. In Fig. 5 take several (say M) sequences with the same parameters as in the previous example, but quite short (N=2002τ). As we have shown, it is not possible to obtain a moderately reasonable estimate of Cc(t) using one such sequence (as is also clear from the M=1 case of Fig. 5, where the analytical result is plotted for comparison). However, the figure shows how one may benefit from the multiple repetitions of the (short) experiment by averaging together all the estimates. The averaged connected estimates are always far from the actual correlation function, even for M=500 (the case which contains in total 105 points, which proved quite satisfactory in the previous example): this is consequence of the fact that all connected estimates must become negative. Instead, the averaged non-connected estimates approach quite closely the analytical result even though not reaching the decorrelated region555Note that in this case μ=0 so that fluctuations are larger than the average. If that were not the case, one may attempt to compute a connected correlation estimate by using all sequences to estimate the average, then substracting this same average to all sequences.. Although it is tricky to try to extract a correlation time from this estimate (one may fit the initial decay but it is not possible to know whether the final decay will follow this trend or whether some very slow tail is present), this procedure at least offers a way to obtain some dynamical information in the face of experimental limitations.

Refer to caption
Figure 5: Effect of averaging the estimates of many short sequences. We show the connected (left) and non-connected (right) estimates of M different samples of sequence (44) as indicated in the legend, with N=200, μ=0, σ2=1, w=0.99 (τ100).

4.3 Algorithms to compute the estimators

The estimators can be computed numerically by straightforward implementation of equations (35) or (37), although in the stationary case it is much more efficient to compute the connected correlation through relation (21) using a fast Fourier transform (FFT) algorithm. Let us focus on the stationary case and examine in some detail these algorithms.

Algorithm 1 presents the direct method. It is straightforward to translate the pseudo-code to an actual language of your choice. Apart from some missing variable declarations, the only thing to consider is that it is probably not convenient (or even illegal in some languages, as in classic C) to return a large array, and it is better to define C as an output argument, using a pointer or reference (as e.g. FORTRAN or C do by default) to avoid copying large blocks of data. The advantages of this algorithm are that it is self-contained and simple to understand and implement. Its main disadvantage is that, due to the double loop of lines 8–11, it runs in a time that grows as N2. For N up to about 105 this algorithm is perfectly fine: a good implementation in a compiled language should should run in a few seconds in a modern computer. But this time grows quickly; in the author’s computer N=5105 takes 35 seconds, for N=106 the time is two and a half minutes. In contrast, the algorithm with FFT takes 1 second for N=106 and 11 seconds for N=107.

Algorithm 1 Compute the connected correlation of sequence a (of length N) using the direct O(N2) method. The connected correlation is returned in vector C.
1:function timecorr(a,N)
2:     μ0 Compute average
3:     for i=1,,N do
4:         μμ+ai      
5:     μμ/N
6:     for i=1,,N do Clear C vector
7:         Ci0      
8:
9:     for i=1,,N do Correlation loop
10:         daiμ
11:         for k=0,,Ni do
12:              Ck+1Ck+1+d(ai+kμ)               
13:
14:     for i=1,,N do Normalize and return
15:         CiCi/(Ni1)      
16:     return C

So, if you need the correlation of really long sequences, the FFT-based algorithm, though more difficult to get running, pays off with huge savings in CPU time at essentially the same numerical precision. The idea of the algorithm is to compute the Fourier transform of the signal, use (20) to obtain the Fourier transform of the connected correlation, then transform back to obtain Cc(t). This is faster than algorithm 1 because the clever FFT algorithm can compute the Fourier transform in a time that is O(NlogN).

Actually, we need discrete versions of the Fourier transform formulas (as we remarked before, the Fourier transform of the continuous time signal does not exist). The discrete Fourier transform (DFT) and its inverse operation are defined [11, §12.1] as (it is convenient to let the subindex of ai run from 0 to N1 to write the following two equations),

a~k=j=0N1e2πijk/Naj,aj=1Nk=0N1e2πijk/Na~k, (46)

where we note that the inverse DFT effectively extends the sequence periodically (with period N). The discrete version of (20) is [11, §13.2]

D~k =|a~k|2, where Dj =k=0N1akak+j, (47)

and where the definition of Dj makes use of the (assumed) periodicity of ai. Dj is almost our estimate (37): we only need to take care of the normalization and of the fact that due to the assumed periodicity of ai some past times are regarded as future, e.g. for k=10, in the sum there appear the terms a0a10 up to aN11aN1 (which are fine), but also aN10a0 through aN1a9, which we do not want included. This is fixed by padding the original signal with N zeros at the end, i.e. setting ak=0 for k=N,,2N1 and ignoring the values of Dj for jN.

In summary, to compute the connected correlation using FFT the steps are i) estimate the mean and substract from the ai, ii) add N zeros to the end of the sequence, iii) compute the DFT of the sequence, iv) compute the squared modulus of the transform, iv) compute the inverse DFT of the squared modulus, v) multiply by the 1/(Ni) prefactor. Pseudocode for this algorithm is presented as algorithm 2.

Algorithm 2 Compute the connected correlation of sequence a (of length N) using a fast Fourier transform. This algorithm is O(NlogN).
1:function timecorr(a,N)
2:     μ0 Compute average
3:     for i=1,,N do
4:         μμ+ai      
5:     μμ/N
6:     for i=0,,N do Substract the average from signal
7:         aiaiμ      
8:     for i=N,,2N do Pad with 0s at the end
9:         ai0      
10:
11:     bFFT(a,2N) Compute the FFT of a as a vector of length 2N
12:     for i=1,,2N do Compute squared modulus of b
13:         bi|bi|2 Note that the Fourier transform is complex      
14:     CIFFT(b,2N) Inverse FFT
15:
16:     Cresize(C,N) Discard the last N elements of C
17:     for i=1,,N do Normalize and return
18:         CiCi/(Ni1)      
19:     return C

To translate this into an actual programming language the comments made for algorithm 1 apply, and in addition some extra work is needed for lines 10–13. First, one needs to choose an FFT routine. If you are curious about the FFT algorithm, you can read for example [11, Ch. 12] or [14], but writing an FFT routine is not easy, and implementing a state-of-the-art FFT is stuff for professionals. Excellent free-software implementations of the FFT can be found on Internet. FFTW [15], at http://www.fftw.org deserves mention as particularly efficient, although it is a large library and a bit complex to use. Note that some simpler implementations require that N be a power of two, failing or using a slow O(N2) algorithm if the requiriment is not fulfilled. Also pay attention to i) the difference between “inverse” and “backward” DFTs (the latter lacks the 1/N factor), ii) how the routine expects the data to be placed in the input array, iii) how it is returned, and iv) whether the transform is done “in place” (i.e. overwriting the original data) or not. If the routine is a “complex FFT” it will expect complex input data (so that for real sequences you will have to set to zero the imaginary part of the ai), while if it is a “real FFT” routine it will typically arrange (“pack”) the output data in some way, making use of the discrete equivalent of the A~(ω)=A(ω) symmetry so as to return N real numbers instead of 2N (the real and imaginary parts of the complete DFT). This affects the way you must compute the squared modulus (lines 11–12). For example, for the packing used by the FFTW real routines, lines 11–12 translate to (in C)

b[0]*=b[0];
for (int i=1; i<N; i++) {
b[i] = b[i]*b[i] + b[2*N-i]*b[2*N-i];
b[2*N-i] = 0;
}
b[N]*=b[N];

5 Conclusion

I have tried to convey the basic notions about time correlation functions as well as some practical advice on how to compute them from actual data. I hope this account will be useful for students and researchers finding themselves in need to compute time correlations.

On closing, I wish to thank the colleagues with whom I have worked over the years, and which have helped shape my understanding of time correlations through discussions on concepts and practicalities, in particular A. Cavagna, I. Giardina, V. Martín-Mayor, G. Parisi and the late P. Verrocchio.

Last but not least, I thank D. Chialvo for contributing the title, for organizing and inviting me to the Complexity Weekend, for agreeing that a tutorial of this sort should be written, and for editing and getting this volume to the press.

References

  • [1] P. Bak, How Nature Works: The Science of Self-Organized Criticality (Copernicus, 1996). 10.1007/978-1-4757-5426-1
  • [2] M.B. Priestley, Spectral Analysis and Time Series (Academic Press, 1981)
  • [3] J.J. Binney, N.J. Dowrick, A.J. Fisher, M.E.J. Newman, The Theory of Critical Phenomena: An Introduction to the Renormalization Group (Clarendon Press, 1992)
  • [4] J.P. Hansen, I.R. McDonald, Theory of Simple Liquids with Applications to Soft Matter, 4th edn. (Academic Press, 2013)
  • [5] L.F. Cugliandolo, in Slow Relaxations and Nonequilibrium Dynamics in Condensed Matter, ed. by J.L. Barrat, M. Feigelman, J. Kurchan, no. LXXVII in Les Houches Summer School (Springer, 2004)
  • [6] R. Kubo, M. Toda, N. Hashitsume, Statistical Physics II (Springer, 1998)
  • [7] F. Dyson, Nature 427, 297 (2004)
  • [8] C.P. Lindsey, G.D. Patterson, J. Chem. Phys. 73, 3348 (1980)
  • [9] A.D. Sokal, in Functional Integration: Basics and Applications (1996 Cargèse School), ed. by C. DeWitt-Morette, P. Cartier, A. Folacci (Plenum, New York, 1997)
  • [10] B.I. Halperin, P.C. Hohenberg, Phys. Rev. 177(2), 952 (1969). 10.1103/PhysRev.177.952
  • [11] W.H. Press, B.P. Flannery, S.A. Teukolsky, W.T. Vetterling, Numerical Recipes in C: The Art of Scientific Computing, Second Edition, 2nd edn. (Cambridge University Press, Cambridge ; New York, 1992)
  • [12] M.P. Allen, D.J. Tildesley, Computer Simulation of Liquids (Clarendon Press, Oxford, 1987)
  • [13] M.E.J. Newman, G. Barkema, Monte Carlo Methods in Statistical Physics (Oxford University Press, Oxford, 1999)
  • [14] P. Duhamel, M. Vetterli, Signal Processing 19, 259 (1990)
  • [15] M. Frigo, S.G. Johnson, Proceedings of the IEEE 3, 216 (2005)