This post is a follow-up to the previous one, Discrete Fourier Transform – Trig Parameters in Principle. The titles are deliberately similar – but you will want to distinguish them.
Having shown you how to find trigonometric parameters of a sine wave, I want to show you a real example. I found this in Peter Bloomfield’s “Fourier analysis of time series: an introduction”, 978-0-471-88948-9.
The data, however, I found by searching the Internet. Although the book provides a couple of sources – the data isn’t there any longer.
According to Bloomfield, these numbers are the magnitude of a variable star at midnight on 600 consecutive nights. They have been rescaled.
Before we get started, let me point out that I am a rookie at using the Discrete Fourier Transform. I knew, for example, to expect bin leakage in the previous post… but it hasn’t been all that long that I’ve known about it. Still, what we’re about to do is pretty simple and everything works out fine.
And here is what they look like.
As much as I would like to simply dive in, let me be just a little cautious. Before I compute the DFT (Discrete Fourier Transform), I want to confirm that there is no significant trend in that data.
(I know, I can see that there’s no trend. But if there were, how might we quantify it?)
Let me construct a time variable, i.e. observation number… and show you the beginning and end of it.
Let me construct a data matrix with obs as independent and magn as dependent… fit a line to the data… and print the parameter table (and R^2) of the resulting fit:
I wonder if that’s the smallest R^2 I have ever seen. Anyway, the trend term (the slope of the line, labeled OBS) is insignificant – so we don’t need to remove the trend – there isn’t one. The constant term is close to the mean value:
then the constant would be exactly the mean value. (I told it to use none of the variables, but I didn’t rule out a constant, so that’s what we got: only a constant.) We expect the constant to go through the mean value, from what we know of least-squares regression.
I would, as you might expect, be prepared to remove higher-order trends from the data. I would then compute the DFT of the residuals – that’s how to remove whatever trend was fitted.
Computing the DFT of the data
Now, cut to the chase – compute the DFT of the data, and display a very small sample – just about the smallest possible sample – of the answers:
Now drop the first bin… get the absolute values of the fft values… then multiply each of them by 2 and divide by the number of observations:
Let’s look at the answers:
Let’s zoom in on the left:
So the two bin numbers I care about are 18+3 = 21 and 18+7 = 25 (the horizontal label 1 is associated with bin 19). The associated amplitudes are the numbers stored in the bins…
while the associated periods are the number of observations divided by the numbers of the bins:
I need to save the associated frequencies – the inverses of the periods:
Fitting two frequencies as sine/cosine pairs
Now, let’s just try fitting two sine/cosine pairs to the data.
Because I have no reason to believe that I have two pure sine waves, or two pure cosine waves. I have to assume that they are offset – phase shifted – from the origin. And my high school trig taught me that
that is, that a phase-sifted sine is a linear combination of sin(wt) and cos(wt).
More generally, if we had an amplitude C on the LHS, then the expansion would be:
which is to say that given C and , we may compute A and B as
Conversely, those equations tell us how to get the sine and cosine of given A, B, C. But then square each of those equations and add them:
which is to say that given A and B, we may compute C as
Then, of course, .
In other words, we should get the same answer whether we fit sin/cos pairs or a phase shifted sine (or a phase shifted cosine). (Hmm. What if the software ends up with C negative? I’m sure we get an equivalent answer.)
As I said, I will first seek a fit using sin/cos pairs for each frequency. Then, just to be blindingly obvious about the equivalence, I will seek two functions of the form .
We use our results from the FFT for initial guesses, and we ask Mathematica® for a Nonlinear Model Fit. We are asking for a constant term (), two frequencies (f and f2), and four amplitudes (A and B for f, a and b for f2). I provide initial guesses for the constant, and for the two frequencies. The defaults are zero, I believe.
Here’s the call for the fit, followed by its parameter table:
Pretty solid t-statistics there! Every one of them is highly significant. In particular, we seem to have been right in fitting two frequencies, not one.
There is an adjusted R Squared – very high:
It’s too bad he combined the factors of with the frequencies, but so be it.
The mean from the fitted equation is
from the DFT or the data itself.
What’s going on? Well, consider the function we fitted. The mean of one sine/cosine pair would be zero over an integral number of periods; the mean of the other sine/cosine pair is zero over an integral number of its periods – but the two periods don’t ever coincide, so over any finite interval the mean of the combined trig functions is not zero.
A little bit of the mean value is caught up in the trig terms; that’s why the fitted constant term isn’t exactly the mean value of the data.
At least, that’s what I think.
The frequencies from the least-squares fit are
from the DFT.
Let’s look at the residuals:
Let’s look at the absolute values of the DFT – just as we have before – of the residuals:
The largest values are less than 0.18 – whereas our original two peaks were about 8.
I think it’s safe to say that we’ve extracted all the cyclic structure.
Fitting two frequencies as phase-shifted sines
Just to be convincing, let me now ask Mathematica to fit two phase-shifted sines. That is, for each frequency, I fit . I am looking for a constant term , two amplitudes (B, b), and two phase shifts . As before, I provide initial guesses – the same guesses – for the frequencies and the constant.
First the call for the fit and its parameter table:
There is an adjusted R Squared – the same as before:
And that’s why I don’t worry about trying to extract phase information from the DFT: I can get it in the fit.
And here’s the newly-found fitted equation expanded:
As we expected, they’re the same fit, expressed in two different ways.
Fitting only one frequency
I want to show you, finally, what would happen if we had tried to fit only the dominant frequency – the one whose bin contained the largest absolute value. That is, we have a constant term () two amplitudes (A and B) for a cos/sin pair, and one frequency (f).
Yes, all the t-statistics are significant, and the Adjusted R^2 isn’t bad. But let’s look at the residuals.
In your face! The residuals after I fit only one frequency are still periodic. That tells us that there’s another frequency… i.e. that the 2nd peak in the DFT really was significant. Of course, we already knew that, because the coefficients were unquestionably significant for terms involving both frequencies.
So I have finally figured out how to use one of the commonplace tools in my personal time series toolbox. As usual, it’s just a matter of getting the details right.