digression: eigenvalue 1

Before I proceed with consequences of orthogonality, I need to mention an omission. For one thing, I have gotten so caught up in the properties I’ve been looking at, that I have forgotten one of the crucial ones we used earlier. For another thing, the consequence which I have forgotten is still a little strange to me.

The consequence (or consequences)?

• That the dilation equation could be written as an eigenvalue equation,
• that the existence of a scaling function seems to be equivalent to an eigenvalue = 1,
• and that the values of the scaling function at the integers are given by the corresponding eigenvector.

This has been crucial to some of our work: the recursion which I use for computing the scaling function is initialized by setting the values of the scaling function at the integers — that is, by finding the eigenvector with eigenvalue 1. Recursion — especially when combined with a lookup table — is very easy and very powerful; but we absolutely had to have initial values, and that eigenvector provided them. (I did this for the Daubechies D4, and for four wavelet systems with 6 nonzero h’s.)

The strangeness?

If I had never seen it fail, it wouldn’t seem strange. I expect that I will show you an example some day. Furthermore, understanding this seems to require that I move into the frequency domain. And I’m not ready to do that yet. (Actually, I’ve begun doing it, but unfortunately I’m still at the stage where some of my calculations result in what I know to be wrong answers. Details, details. When I get this all ironed out, it should be a great example…. And during the time it has taken to finish another draft of this post, I seem to have it. Yes!)

This is probably a good time to remind you that I am not yet past the stage of laying things out so that I recognize them when I see them in my reading. Oh, they make a lot more sense than they did when I started, but my understanding is still rather fragmented.

The eigenvalue properties are still mysterious, as is the assertion that the sum of the odd h’s is equal to the sum of the even h’s.

Okay, we have one powerful theorem and one example (linear splines) which the theorem does not cover. Let’s look at what they have in common.

For both of them, we have five of six conditions in common… nested spaces $V_j\$ which are closed under translation and which have a scaling property. For the theorem, we have the sixth condition — that we have an orthonormal basis for the space $V_0\$. For the example — the linear splines — we have a basis for $V_0$ but it is not even orthogonal.

They both, however, have an orthogonal direct sum decomposition, which we describe by saying that $W_j\$ is the orthogonal complement of $V_j\$ in $V_{j+1}\$; for example,:

$V_1 = V_0 \oplus W_0\$.

Continuing on, we write such things as

$V_2 = V_1 \oplus W_1 = V_0 \oplus W_0 \oplus W_1\$

$V_3 = V_2 \oplus W_2 = V_1 \oplus W_1 \oplus W_2 = V_0 \oplus W_0 \oplus W_1 \oplus W_2\$.

The decomposition

$V_3 = V_1 \oplus W_1 \oplus W_2\$

is interesting because it shows us that there is nothing special about $V_0\$. Furthermore, we could go in the other direction: there is nothing special about non-negative indices.

$V_0 = V_{-1} \oplus W_{-1} = V_{-2} \oplus W_{-2} \oplus W_{-1}\$.

The first important thing to realize is that any two W spaces are orthogonal to each other. This means that wavelets are orthogonal across scales.

The challenge is knowing if a W space is closed under translation and has the scaling property. The orthogonal direct sum decomposition guarantees that every function in one W space is orthogonal to every function in every other W space. For our linear spline example, where I can exhibit the mother wavelet, it seems clear — and dangerous as that assumption may be, I’m going to make it — it seems clear that $W_0$ is closed under translation and that every W space has the scaling property.

Our theorem guarantees that, if we have an orthonormal basis for $V_0\$; but our theorem does not apply to the linear splines.

The second important thing to realize is that the space $V_0\$ is orthogonal to $W_0\$, and in general $V_j\$ is orthogonal to $W_j\$. But more, the nesting of spaces tells us that $V_j\$ is orthogonal to $W_k\$ whenever k is greater than or equal to j. This means that scaling functions are orthogonal to some wavelets but not necessarily all.

To be specific, the direct sum decomposition

$V_3 = V_0 \oplus W_0 \oplus W_1 \oplus W_2\$

tells us that $V_0\$ is orthogonal to $W_k\$ for some of the k’s greater than or equal to 0; and the generalization to arbitrary j…

$V_j = V_0 \oplus W_0 \oplus ... \oplus W_j\$

tells us that in fact $V_0\$ is orthogonal to $W_k\$ for all k greater than or equal to 0. And we know there’s nothing special about the 0 subscript: for any m < j we may write

$V_j = V_m \oplus W_m \oplus ... \oplus W_j\$.

So. An orthogonal direct sum decomposition tells us a lot about wavelets and scaling functions across scales.

Let me remind us that our powerful theorem told us more: we have orthonormal bases for both $V_0\$ and $W_0\$ — and hence orthonormal bases for every $V_j\$ and $W_j\$.

Now, what are the additional consequences specifically of having an orthonormal basis for $V_0\$?

One of them is developed in the proof of the theorem (for which see Daubechies “Ten Lectures on Wavelets”), wherein we explicitly constructed the mother wavelet. We have that the mother wavelet $\psi$ satisfies the equation

$\psi(t) = \sum_{n} g(n)\ \sqrt{2}\ \varphi(2t-n)\$.

and that $\psi \in W_0\$ if and only if the g’s are given by a magic recipe:

$g(n) = \pm (-1)^n h(M-n)\$, for M any odd integer.

The -n in M-n says that we reverse the h’s; the $(-1)^n\$ says that we alternate the signs; the$\pm\$ says that we may choose to change the first sign or the second; the M says that we may shift the h’s; but in fact we may only shift by an odd integer. Since zero is even, we must shift by at least 1. Oh, we also have that the number of (non-zero) g’s is the same as the number of (non-zero) h’s.

The h’s, you recall, come from the dilation equation for the scaling function $\varphi\$,

$\varphi(t) = \sum_n {h(n)\ \sqrt{2}\ \ \varphi(2\ t - n)}$

Our example of the linear splines, whose translates are not all orthogonal, shows us that the magic recipe does not work if the translates are not orthogonal. To be specific, our h’s were

$\left\{\frac{1}{2 \sqrt{2}},\frac{1}{\sqrt{2}},\frac{1}{2 \sqrt{2}}\right\}\$

and our g’s were
$\left\{\frac{1}{24},-\frac{1}{4},\frac{5}{12},-\frac{1}{4},\frac{1}{24}\right\}\$.

This example also shows us that the number of g’s can be different from the number of h’s.

By contrast, the Haar, the D4, and the D6 etc. scaling functions and wavelets satisfy our powerful theorem, and their mother wavelets were given by that magic formula for the g’s.

Two other results which I discussed in the previous post were:

• the magic recipe guarantees that the sum of the g’s is zero;
• and if the of the g’s is zero, we expect the integral of the mother wavelet to be zero.

For the Haar and Daubechies D4, D6 etc., both of those conditions hold. For the linear spline example, however, only the second condition holds. The magic recipe implies that the integral of the mother wavelet is zero, but the integral can be zero even if we do not use the magic recipe.

Next, there is an extremely important consequence of having an orthonormal basis for $V_0\$. We can derive the following quadratic condition:

$\sum_n h(n) h(n-2k) = \delta(k)\$,

where $\delta(0) = 1\$ and $\delta(k) = 0\$ for k ≠ 0.

Perhaps surprisingly, it is independant of the norm ($\sqrt{P}\$) of the scaling function, but it does depend on A (our customary $\sqrt{2}\$) in the dilation equation. You should recall that I use P to denote the value of the integral of the scaling function squared:

$P := \int \varphi(x)^2 \, dx\$.

I should point out that any consequence of orthonormality which is independent of P is, in fact, a consequence of mere orthogonality. (By orthonormality — or orthogonality –, in this context, I mean that the translates of the scaling function are orthonormal — or orthogonal.)

The quadratic condition has one immediate corollary, for k = 0: the sum of the squared h’s is 1,

$\sum_n h(n) h(n) = 1\$.

Some quick algebra indicates that the general formula would be

$\sum_n h(n) h(n-2k) = \frac{2}{A^2} \delta(k)\$.

I’ll show you later just how important that corollary is.

There is another crucial consequence of orthonormality. I believe it is a corollary of the quadratic condition, but I have usually seen it associated with the frequency domain.

I believe the quadratic condition implies that the number of non-zero h’s is even (if it is finite). In fact, having decided that, I find it in Strang & Nguyen (p. 148), which changes my belief to virtual certainty. Good. Incidentally, they call it “double shift orthogonality”.

Here’s how I got it.

Whenever I write it out for an odd number of h’s, I can show that the product of the first and last must be zero — so if one of them is zero, we just ended up with an even number of nonzero h’s. To be specific, if we assume that we have 3 h’s, h0, h1, h2, then we get the condition

h0 h2 = 0,

which requires that either h0 = 0 or h2 = 0, and then there are only 2 nonzero h’s.

This is why we have D2 (i.e. Haar), D4, D6, etc. but we do not have, for example, D3. That is, there is no orthonormal Daubechies’ wavelet family with three nonzero h’s. (The notations are not universal; Daubechies herself wrote D2 for what I am used to seeing as D4 today.)

Of course, the linear splines have an odd number of h’s — but the point is, they are not orthogonal — they’re allowed to have an odd number because the quadratic condition does not hold.

I do not know how useful the following are, but let me note that we have a quadratic condition relating h’s and g’s:

$\sum_n h(n) g(n-2k) = 0\$.

And we have one for just the g’s, too; I think that for general A, it looks just like the condition on the h’s:

$\sum_n g(n) g(n-2k) = \frac{2}{A^2} \delta(k)\$.

(If we are using the magic recipe to get the g’s from the h’s, do we need either of those conditions? I suppose we could use them to catch a typo in a given set of g’s or h’s….)

There is one last consequence I wish to show you. Again, I am not sure how useful this is, but it is unusual enough to be worth noticing, and we may very well find a use for it. This consequence is that

$(\int \varphi(x) \, dx) ^2 = \int (\varphi(x)^2)\, dx\$,

which I would write as

E^2 = P,

$E = \sqrt{P} = || \varphi ||\$.

That is, the L^2 norm $\sqrt{P}$ of the scaling function is equal to its integral E. This is unusual. It requires both orthogonality and the partition of unity, and might also require — as usual — an interchange of integration and an infinite series. Let me show you.

If the translates of the scaling function are orthogonal, then we have

$\int \varphi(x) \varphi(x-k) \, dx = P \delta(k)\$.

(That P is my notation, not Burrus et al. As I said before, you have every right to curse me out if you are reading Burrus et al. I am sorry that I did not understand their notation, but I’m not going to change mine now.)

Take the sum — or infinite series — over all k, and get

$\sum_k \int \varphi(x) \varphi(x-k) \, dx = P\$.

Interchange, assuming we may, and write

$\int \varphi(x) \sum_k \varphi(x-k) \, dx = P\$.

But one of the consequences of the dilation equation was a so-called partition of unity — the sum of all integer translates of the scaling function is equal to the integral E of the scaling function…

$\sum_k \varphi(x-k) = E\$

so by replacing the summation we get

$E \int \varphi(x)\, dx = P\$,

and then by replacing the integral we get

E^2 = P,

QED.

Summary

So. Here is what we have. Six “consequences” of the dilation equation:

• The sum of the h’s = 2/A.
• $\varphi(t) = \sum_n {h(n)\ A\ \ \varphi(2\ t - n)}\$.
• $E = \int\ \varphi(t)\ dt \$ .
• The sum of the even h’s = the sum of the odd h’s.
• $\sum_k { \varphi(\frac{k}{2^j})} =E\ 2^j$
• $\sum_k { \varphi(t+k)} = E \$

A wavelet equation like the dilation equation serves to define coefficients g(n).

• $\psi(t) = \sum_{n} g(n)\ A\ \varphi(2t-n)\$.

We have the utterly crucial property that if the translates of the scaling function are orthonormal, then the g’s are given by a magic recipe:

• $g(n) = \pm (-1)^n h(M-n)\$, for M any odd integer.

We have two observations about the sum of the g’s.

• the magic recipe guarantees that the sum of the g’s is zero;
• and if the of the g’s is zero, we expect the integral of the mother wavelet to be zero.

If the translates of the scaling function are merely orthogonal, so that the squared-norm is not 1…

$P := \int \varphi(x)^2 \, dx ={ || \varphi ||}^2 \ne 1\$,

then we get the following six — or seven — results:

• the quadratic condition $\sum_n h(n) h(n-2k) =\frac{2}{A^2} \delta(k)\$;
• its corollary $\sum_n h(n) h(n) = \frac{2}{A^2}\$;
• its second corollary: the number of nonzero h’s is even, if finite;
• it’s relatives $\sum_n h(n) g(n-2k) = 0\$ and $\sum g(n) g(n-2k) = \frac{2}{A^2} \delta(k)\$;
• the $L^2\$ norm of the scaling function is equal to its integral: $E = \sqrt{P} = || \varphi ||\$;
• its corollary E = 1 if and only if P = 1.

Now you might understand why I confused E and P: if the translates of the scaling functions are at least orthogonal, then either P and E are both equal to one, or neither is equal to one. If the translates of the scaling functions are orthonormal, then E = P = 1 — they look like the same thing. And I mixed them up when I first started working through Burrus et al. because on the first pass I was focused on the case E = P without realizing it. My bad.

I don’t know about you, but I need to be careful. If we have orthogonality but not orthonormality, then we need both E and P. If we do not have even orthogonality, we may not care about P but we will still need E.

To put it another way, any consequence of orthogonality which is independent of P is also independent of E.

Two examples

Let me close by giving you two examples of how important the quadratic corollary is.

For the first example, let us look for a scaling function with two coefficients h0 and h1, whose integer translates are orthonormal. Write the dilation equation as usual with $A = \sqrt{2}\$, which gives us the sum of the h’s (a linear condition):

$\sum_n h(n) = \sqrt{2}\$.

Then write the corollary of the quadratic condition also for $A = \sqrt{2}\$:

$\sum_n h(n) h(n) = 2/2 = 1\$.

That is, we have two equations in two unknowns,

$\begin{array}{l} \text{h0}+\text{h1}=\sqrt{2} \\ \text{h0}^2+\text{h1}^2=1\end{array}$

and their solution is unique: $h0 = h1 = 1/\sqrt{2}\$. The Haar system. Don’t bother looking for another orthonormal set.

So. From the sum of the h’s and the quadratic condition, we can derive the Haar scaling function and show that it is the only one which leads to an orthonormal basis with two h’s.

For the second example, let us look for a scaling function with 4 coefficients, whose integer translates form an orthonormal basis for $V_0\$. Sound familiar?

Again, we have the linear condition on the sum of the h’s…

$h0 + h1 + h2 + h3 = \sqrt{2}\$ ;

$h0^2 + h1^2 + h2^2 + h3^2 = 1\$.

But we also have the general quadratic condition itself, for N=4,

$h(1) h(1-2 k)+h(2) h(2-2 k)+h(3) h(3-2 k)+h(0) h(-2 k)=\delta(k)\$

and then for k = 1 it reduces to…

$h(0) h(2)+h(1) h(3)=0\$.

(For other values of k, we get 0 = 0.)

We have three equations in four unknowns:

$\begin{array}{l} h(0)+h(1)+h(2)+h(3)=\sqrt{2} \\ h(0)^2+h(1)^2+h(2)^2+h(3)^2=1 \\ h(0) h(2)+h(1) h(3)=0\end{array}$

We do not get a unique solution but one of the solutions is, indeed, the Daubechies D4 scaling function.

I could write out the family of solutions in terms of h[0] as a parameter, but they are not simple. I’m not going to write them out.

I’m just going to recall the family of solutions as given by Burrus et al (and which I’ve already shown you),

$h(0)=\frac{-\cos (a)+\sin (a)+1}{2 \sqrt{2}}$

$h(1)=\frac{\cos (a)+\sin (a)+1}{2 \sqrt{2}}$

$h(2)=\frac{+\cos (a)-\sin (a)+1}{2 \sqrt{2}}$

$h(3)=\frac{-\cos (a)-\sin (a)+1}{2 \sqrt{2}}$

(Of course, we could actually verify that these solutions do satisfy our original 3 equations. I’m pretty sure I did that for myself. In fact, I think that’s how I find the mistake in the original post!)

The Daubechies h’s are found by setting $a = \pi/3\$.

I still haven’t shown why the Daubechies are special, but we’re closer: now, at least, we know that they are one of an infinite number of solutions which satisfy 3 linear and quadratic conditions on the 4 h’s.

I also haven’t shown how we would get the solutions in terms of the angle a. Well, I don’t know yet. There is a chance that another derivation of the Daubechies D4 will proceed via trig polynomials and give us this compact family of solutions. There is also a chance that getting the solutions in terms of an angle is simply black magic.

I’ll try to find out, as we proceed.