## pure sine

I want to look at the Discrete Fourier Transform (henceforth the DFT) of a perfect sine wave. How the heck can I interpret the DFT of some data, if I don’t understand what it tells me in the simplest possible case?

As usual, I am using Mathematica®… in fact I’m using version 7.

Suppose we have the following function: a pure sine wave, with mean 3, amplitude 5, and period 7. (I habitually use prime numbers in examples so that I can see where they end up in any answers.)

Actually, let me do this in stages. First let the mean be zero:

.

Over the range from 0 to 21, we have the following:

We see that the period is, indeed, 7. Now let me construct the function I really want, with mean 3:

.

That is, we now have

(You see now why I did it in stages? The period is certainly around 7, but it’s not perfectly clear, because of the vertical offset.)

I have implicitly and partially set a unit of time, by saying that the period is 7. But is that 7 seconds? 7 years?

Let’s just suppose that it’s 7 samples, i.e. 7 times the sampling interval, whatever that may be. Then we have 7 samples per period, i.e. 7 samples per cycle, hence the frequency is 1/7 cycle per sample.

Note that I have not specified the time unit. I don’t need to… not for what we’re about to do. If it turns out that the samples are daily, then the period is 7 days and the frequency is 1/7 cycle/day. But we can incorporate time units after we done the following.

Now get 21 samples starting from t = 0. Note that 21 is an exact multiple of the period 7. You may already know why that is significant… but I will be showing you why, anyway.

I want both a 2D list (called d1) that includes the time, and a 1D list (called data) that is just the function values; I print only the 1D list. (The second command says to keep the last row – the function values – of the 2D list.)

That is, we have the following samples from our function (from 0 to 20 inclusive):

Let’s take the DFT (Discrete Fourier Transform), and let’s set a variable (nobs) for the number of data points. Let me also save the DFT with both a specific name (sin1) and a generic name (fft). (You know, as I recall, I do not use the specific name anywhere in what I posted. Oh, well.) Let me remark that the “Fast Fourier Transform” is, strictly speaking, one algorithm (OK, possibly a family of algorithms) for computing the DFT, not the DFT itself. Nevertheless, many of us say “FFT” when we mean “DFT” – and my generic name reflects that!

The setting FourierParameters -> {1, -1} makes one choice out of many for the exact definition of the DFT. This is the same choice I made in the post about Fourier Series and Fourier Transforms.

According to the documentation, Mathematica’s general formula is… and this choice of parameters leads to:

Compute… the command is simply “Fourier”:

The second output line is the number of observations (21). The first value on the first line, 63, is the “DC component”, associated with a frequency of zero. Your software – if it is not Mathematica – may return it somewhere else in the answer list (the center, perhaps?).

Can we determine the mean value of the original data? Yes:

That is, we took the first value – the DC component – and divided by the number of observations. Yes, of course, we could have computed the mean directly from the data, instead of extracting it from the DFT. Still, since it falls out anyway, why bother with a time-domain computation?

Can we get the period and the amplitude? Yes.

Drop the first number from the list… get the magnitudes of the rest of the numbers… then multiply by 2 and divide by the number of observations:

Bin 3 is picking out both the amplitude (5) – and tells us the period, which is the number of observations divided by the bin number:

Many books, of course, would have simply assured us that all of these values could be computed from the DFT… but I like knowing exactly what form of the DFT I need, and exactly what I have to do to get the values.

So, we have recovered our trig parameters of 3, 5, 7 from the DFT of our sample of 21 points.

## pure sine plus noise

Having figured out how to get the mean, amplitude, and period, let’s try adding noise to our beautiful sine wave. We still have

.

Add a random variable to the 21 samples – normal with mean zero and standard deviation 2. Here is the 2D list of x vs. the samples plus the random numbers:

The call “SeedRandom[1234]” initializes the pseudo-random number generator with the value 1234 – this guarantees that the table of pseudo-random numbers is always the same: this computation is random but repeatable.

As before, I get both a 2D list (d1) including x (for graphing), and a 1D list (data) of just the function values (for transforming). The black dots, of course, are the noisy sample.

As before, get the DFT and the number of observations:

As before, we get the mean by dividing the DC component (the first number in the list) by the number of observations:

So our estimate of the mean is 3.07 instead of 3.

Now drop that first number from the list… get the magnitudes of the complex numbers… then multiply by 2 and divide by the number of observations:

Plot that last list, of scaled absolute values:

Bin 3 is still picking out both the amplitude (5.31) and the period, which is 7:

That is, the bin number is converted to a period; the bin absolute value is the amplitude. Because of the noise, the amplitude and mean are off a bit. The period is still exactly right.

I’m not surprised.

Let me show something that might be surprising, at least until we think about it.

## pure sine but 22 samples

We have the very same function:

.

But this time we take 22 samples (again starting from t = 0). Here are the values whose DFT we will compute:

Here is a picture of the function and the samples:

The picture reminds us that we have no noise in this case. We get the DFT and the number of observations:

We compute the mean by dividing the first number by the number of observations:

Beautiful. But don’t be misled. The newest (last) observation was

If we were to add a 23rd observation, its value would not be 3, and the DC component – the sample mean – would not be 3. In other words, its a special case that the mean of 22 observations is still 3.

Next, as before, drop that first value from the list… get the magnitudes of the complex numbers… then multiply by 2 and divide by the number of observations:

Plot that last list.

As before, the value of bin 3 gives us the amplitude, and the number of the bin, 3, gives us the period, which is written both as a fraction and as a decimal:

So the amplitude and the period are off – just because the number of samples is not an exact multiple of the period.

What’s happening is that the possible periods are determined by the number of observations. The possible periods are

With 22 observations, a period of 7 is no longer a possible answer. But 22/7 isn’t the right answer either – so we should expect that there isn’t a single nonzero number in the 3rd bin.

That is, we expect the answer has to be spread out… one might say it doesn’t fit in bin 3.

This phenomenon – that the period cannot be confined to one bin – is called bin leakage. Let me emphasize that there was no noise in this example, only 22 samples instead of 21.

Note that two things have happened. The true period of 7 is not a possible answer… and the neighboring bins are no longer zero. The true amplitude of 5 is spread out over neighboring bins.

Now, just how often do you think that the number of samples will turn out to be a multiple of the period? What if there are more than one period?

Those are rhetorical questions. The correct answer to the question I didn’t ask is:

**Expect bin leakage; it should be the usual occurrence.**

I want to do one last thing with this example.

## shift the sine (i.e. the origin)

We start with the same underlying continuous function…

.

This time we take our nicely behaved 21 samples starting from x = 1 instead of x = 0.

That is, we have the following samples from our function (from 1 to 21 inclusive):

Compute the DFT:

We see complex numbers in bin 3 and bin 19 – and nowhere else. What we’ve actually gotten is a linear combination of the DFT for a pure sine and for a pure cosine – which is equivalent to the DFT for a phase-shifted sine. I decided not to worry about it… I’ll show you why in the next post.

We still compute the mean by dividing the first number by the number of observations:

Next, as before, drop that first value from the list… get the magnitudes of the complex numbers… then multiply by 2 and divide by the number of observations:

Our complex values have turned out to have magnitude 5. Bin 3 is still picking out both the amplitude (5) – and the period, which is the number of observations divided by the bin number:

It is still true that with 21 observations, the amplitude of the trig function is specified by bin 3; there’s no bin leakage in this case. But the DFT itself is different from the first case, when our samples began at x = 0. We have in effect shifted the origin, and our function is no longer a pure sine wave. It’s either got a phase or is the sum of a sine and a cosine.

There are many other things I could talk about, but let me refrain. This has been a compact example illustrating how we can get approximate trig parameters from the Discrete Fourier Transform.

The key facts are that we can read off the mean, the amplitude, and the period from the DFT… noise in the signal will affect our answers… and, most importantly, bin leakage will occur whenever the number of samples is not an exact multiple of the frequency or frequencies.

The next post will be a real-world example. And you’ll see how I cope with phases.

(If your software requires that your sample length be a power of 2, you may have difficulties. One recommendation is that you add zeroes to your sample, to get a power-of-2 sample size. Remember that you did that when you try to determine the trig values. I may look at this down the road.)

## Leave a Reply