Wavelets: Multiresolution Analysis (MRA)

By and large I try not to flee into cold, formal mathematics, not here. You can find all too many books that will give you just the mathematics (and some of them are invaluable references). On the other hand, sometimes I am way too vague. Let me try to give you a clear statement of what I’m going on about when I say “multiresolution analysis” (MRA).

(Don’t misunderstand me. When I’m studying something, I’m sitting there with collections of definitions, theorems, proofs, and examples – trying to make sense of them. I just think that the collection itself is not a substitute for understanding.)

There’s a lot to be said for having a clearly stated theorem. Working through this has had a large impact on my own grasp of the properties of wavelets.

The most lively summary of multiresolution analysis can be found in the first couple of pages of chapter 5 of Daubechies’ “Ten Lectures on Wavelets”. The most lively introduction can be found on pages 10-16.

I have more than a few, close to several, books that seem to present the following concepts in the same way. First, they use multi-resolution analysis to describe orthonormal wavelets; then they use filter banks to describe biorthogonal wavelets; finally, they explain how biorthogonal wavelets could be described by a modified multi-resolution analysis.

No one (of these authors at my fingertips) starts with a modified multiresolution analysis, instead of ending with it. And so I’m not going to either. Although I have not shown you biorthogonal wavelets yet, I have shown you “pre-wavelets” (which I would prefer to call semi-orthogonal wavelets), so we will have something in hand while we look at the standard multi-resolution analysis leading to orthonormal wavelets.

In other words, although I will start the same way these authors do, I will be waving a generalization in our faces, probably with the very next wavelet post.

I want to summarize Daubechies’ introduction to multi-resolution analysis (pp. 129-130). If you own a copy of her book, you should open it to chapter 5 and read the master. If you don’t own a copy of her book, and you are seriously interested in wavelets, then — at the very least — you should find a colleague or a library that owns a copy. But you don’t have to do it right away.

Here’s my summary of the beginning of her chapter 5.

And if something comes out wrong here, you should probably (ahem, almost certainly) assume that she got it right and I got it wrong. Oh, I hope it turns out that I’ve clearly distinguished my comments from her summary. Let’s go.

She says, “Multiresolution analysis provides a natural framework for the understanding of wavelet bases, and for the construction of new examples.”

So how does she describe multiresolution analysis? It is a collection of spaces $V_j\$ which satisfy 6 conditions (whose equation numbers from her text I have included).

First (5.1.1), let’s suppose we have an infinite ladder of closed spaces, one inside another:

$\dotsm\ V_{-3} \subset\ V_{-2} \subset\ V_{-1} \subset\ V_{0} \subset\ V_{1} \subset\ V_{2}\ \dotsm\ \$

(That is not her notation. She uses the negatives of those indices for the ordering of the subspaces, but I will continue to follow Burrus et al. She said, “I choose here the same nesting order (the more negative the index, the larger the space) as in the ladder of Sobolev spaces. This is also the order that follows naturally from the notation of non-orthogonal wavelets…. It is nonstandard, however: Meyer (1990) uses the reverse ordering, more in accordance with established practices in harmonic analysis.”)

Let me empahsize: I have displayed, and Burrus and I are using, an order in which the more positive the index, the larger the space. Where we have, in particular,

$V_{0} \subset\ V_{1}$

she writes $V_{0} \subset\ V_{-1}\$.

Let me also emphasize that the notation is not standardized, and in any given book, you need to see which of the two conventions they are using.

Second and third, let’s suppose that the union of these spaces is all of $L^2(R)\$, the space of (equivalence classes of Lebesgue) square-integrable real-valued functions on the real line R, and their intersection is the {0}-space; that is, (5.1.2),

$\cup\ V_i = L^2(R)\$

and (5.1.3)

$\cap\ V_i = \{0\}\$.

Burrus has those conditions; i just didn’t mention them before. One of our goals is to construct a basis for all of $L^2(R)\$.

Fourth (5.1.4), “… the multi-resolution aspect is a consequence of the additional requirement…”

$f(x) \in V_j \text{ if and only if } f(2^{-j} x) \in V_0\$.

(I did have to change that a little bit because I switched from her index convention on the $V_j\$.)

Furthermore, I phrased that the other way, and so did Burrus:

$f(x) \in V_0 \text{ if and only if }f(2^{j} x) \in V_j\$.

I can’t see that it matters, given that the conditions are “if and only if”.

Fifth (5.1.5), we will also require that the space $V_0\$ be invariant under translation:

$f(x) \in V_0 \text{ if and only if } f(x-k) \in V_0\$ for all integers k.

Sixth (5.1.6) and finally, we will assume that there exists a scaling function $\varphi\$ (she uses the symbol $\phi\$) such that its integer translates are an orthonormal basis for $V_0\$.

NOTE that the scaling function $\varphi\$ is merely one example, a special case, of the functions $f \in V_0\$. She, like most of my references, is distinguishing the scaling and translation properties of the spaces (with elements f) from the scaling and translation properties of the basis $\{\varphi(x-k)\}\$.

This was not what Burrus, and I, did. I followed him in starting with the scaling function $\varphi\$, defining $V_0\$ as its space of translates, $V_1$ as the space of scaled translates, and then requiring that $V_0 \subset V_1\$. Instead of talking about arbitrary elements f in $V_0\$, we spoke specifically about the basis generated by integer translates of $\varphi\$.

This is a very natural thing to do, since we have the Haar and Daubechies orthonormal wavelets in our hands…. More precisely, since we have the orthonormal scaling functions in our hands, it is natural to use them to define the spaces $V_j\$.

The general multiresolution analysis is stated, by contrast, as though we define the spaces $V_j\$ and then find an orthonormal basis for $V_0\$. Well, that what’s Daubechies did, to get her wavelets.

(In fact, I avoided talking about whether the $\varphi\ (x-k)$ were a basis. I haven’t even defined what I mean by a basis! Let me back up and see what Daubechies said earlier. Hmm. Not much at all. It appears to me that she takes for granted the notion of a basis in a Hilbert space. Let me do the same for a while. And I’ll comment at the end.)

Back to Daubechies.

“The basic tenet of multiresolution analysis is that whenever a collection of closed subspaces satisfies [those 6 conditions], then there exists an orthonormal wavelet basis… of $L^2(R)\$…”

$\psi_{j,k} (x) := 2^{j/2} \psi(2^j\ x - k)\$

such that (5.1.7, projection formula)

$P_{j+1} f = P_j f + \sum_{k \in Z} \ \psi_{j,k}\$.

where $P_j$ is the orthogonal projection operator onto $V_j\$. She continues: (5.1.2) ensures that $\lim_{j \to \infty} P_j f = f \text{ for all } f \in L^2(R)\$. In my words, functions in our $V_j\$ spaces are better approximations of functions in $L^2(R)\$ as we go to higher indices.

This is it in a nutshell. The key theorem is that those 6 assumptions get us an orthonormal basis $\psi_{j,k} (x) \text{ for } L^2(R)\$.

Along the way, we get several other results. First, I should emphasize that we just said that all the wavelets are mutually orthogonal, across scale and under translation.

For another thing, the proof is by construction, so we actually know what the $\psi\$ are. They live in (our familiar) $W_j\$ spaces, each of which is the orthogonal complement of $V_j\$ in $V_{j+1}\$:

$V_{j+1} = V_j \oplus W_j\$.

(She will obtain our general formula for the g coefficients of $\psi$ from the h coefficients of $\varphi\$.

She points out that the $W_j\$ spaces inherit the scaling property:

$f(x) \in W_j\text{ if and only if }f(2^{-j}) \in W_0\$.

Equivalently (it seems to me), $f(x) \in W_0\text{ if and only if }f(2^{j}) \in W_j\$.

She also points out that (5.1.7, that projection formula) tells us that for each fixed j, $\psi_{j,k}(x)\$ will be an orthonormal basis of $W_j\$; even better, once we know $\psi_{j,k}(x)\$ is an orthonormal basis of $W_0\$, then for each j, $\psi_{j,k}(x)\$ will be an orthonormal basis of $W_j\$.

Hence it suffices to get one specific $\psi\$, the mother wavelet.

She tells us something similar for the $V_j\$ spaces: from the basis for $V_0\$, we get a basis for every $V_j\$. If we define and denote

$\varphi_{j,n} (x) := 2^{j/2} \varphi(2^j\ x - n)\$,

for integers j,n

then $\varphi_{j,n}(x)\$ is an orthonormal basis for $V_j\$.

(We check that. For j = 0, we have

$\varphi_{0,n}(x) = \varphi(x-n)\$,

which are translates of the scaling function

and

$\varphi_{0,0}(x) = \varphi(x)\$

is the scaling function itself. So this formula does work for $V_0\$. As does the formula for the $\psi_{j,k}(x)\$.)

We get all the V,W orthogonal decompositions…

$W_j\$ orthog to $W_k\$ if j ≠ k.

$V_j = V_J \oplus_{k=0}^{j-J-1}\ W_{J+k}\$.

(I had to change that from what she had because of the different sign convention on indices. I hope I got it right, but we’ve already seen it in special cases.)

$L^2(R) = \oplus_{j \in Z}\ W_j\$.

(That last one better be true, since it corresponds to the $\psi_{j,k}(x)\$ being a basis for $L^2(R)\$ !)

To construct $\psi\$ she derives a few more properties, which I will omit, since we’ll see some of them soon enough (and others pertain to the frequency domain, which we’ll see down the road). Perhaps it is more important that she obtains $\psi\$ as the inverse Fourier transform of another function. As a result, the $\psi\$ may be distributions instead of functions.

The properties I wanted to make sure we saw here are:

• the $\psi_{j,k}\$ are an orthonormal basis for $L^2(R)\$;
• for fixed j, the $\psi_{j,k}\$ are an orthonormal basis for $W_j\$;
• the $W_j\$ spaces have the scaling property: $f(x) \in W_j \text{ if and only if }f(2^{-j}) \in W_0\$;
• for fixed j, the $\varphi_{j,k}\$ are an orthonormal basis for $V_j\$.
• Projections and direct sums

At some point when I was looking at this — I don’t remember whether it was her projection operator or someone else’s — I finally remembered that direct sum decompositions correspond precisely to projection operators. That is, if we have a (possibly non-orthogonal) direct sum of subspaces,

V = U + W,

then the linear operator P which maps V onto U is a projection, and I – P is the projection onto W. And we have a very convenient test: a linear operator P is a projection if and only if it is idempotent:

$P^2 = P\$.

Furthermore, the direct sum is orthogonal

$V = U \oplus V\$

if and only if the projection P is self-adjoint as well as idempotent.

How could I have forgotten that projections and direct sums go together? Well, I’ll try not to forget again. If one is on stage, then so is the other; one may be in the background, and the other in the foreground, but they are both on stage.

I might as well make a few more remarks. In finite dimensional spaces, it’s straightforward to get a projection onto a subspace. If we have $U \subset V\$, then we start with a basis for U, extend it to a basis for V, and then define the projection operator by its action on that basis for V: if e is a basis vector in U, then Pe = e; if e is a basis vector not in U, then Pe = 0.

In infinite dimensions we have more work to do. in fact, my recollection is that John von Neumann created the axiomatic theory of infinite dimensional complete inner product (i.e. Hilbert) spaces by working with the properties of orthogonal projections.

closed nested spaces

Why did Daubechies assume that the $V_i$ were closed? Well, let me just quote the appendix of Halmos’s Finite Dimensional Vector Spaces: “In the discussion of manifolds, functionals, and transformations the situation becomes uncomfortable if we do not make a concession to the topology of Hilbert space. Good generalizations of all our statements for the finite dimensional case can be proved if we consider closed linear manifolds, continuous linear functionals, and bounded linear transformations. (In a finite dimensional space every linear manifold is closed, every linear functional is continuous, and every linear transformation is bounded.)”

In other words, she made the standard assumption of closed subspaces for working in infinite dimensions.

The Lebesgue integral and the space L^2

If you have never seen the Lebesgue integral, think of $L^2(R)\$ as a space of functions whose squares are Riemann integrable (it isn’t); the subtlety here is that we really consider two functions to be the same if they differ only on a set of measure zero — think, “at a finite or countable number of points”. We’re not just using functions so well behaved that their squares have a Riemann integral, but if you’ve never seen this, let it go for now.

Actually, before we let it go, there’s another way to think of the Lebesgue integral…. I’m certain of this for compact intervals [a,b], so that’s how I’ll state it. Even though we’re interested in $L^2(R)\$, the following property of a compact interval may be helpful. The space of continuous functions on [a,b] with inner product the Riemann integral is not complete: not all Cauchy sequences converge to a limit in the space. But it’s a metric space, so we can complete it – and the result is $L^2[a,b]\$. That is, $L^2[a,b]\$ is to $C^2[a,b]\$ as R is to Q.

You might counter with, “But C[a,b] is complete!”. Yes, with a different norm. It’s actually $C^\infty[a,b]\$ that is complete, since the max norm corresponds to uniform convergence.

Let me be precise. We take the set X of continuous functions {x, y, …} defined on a compact interval J = [a,b]. Then we consider two different metrics. One is the sup (or max) metric:

d(x,y) = $max_{t \in J} |x(t) - y(t)|\$.

Then the resulting metric space, usually denoted C[a,b], is complete: the limit of any Cauchy sequence exists in the space. Furthermore, convergence in that space is uniform convergence.

But let’s take a different metric.

$(\int_a^b (x(t)-y(t))^2 \, dt)^{\frac{1}{2}}\$.

Then the resulting metric space is not complete; but its completion is precisely $L^2[a,b]\$. That’s one answer to the question, “Why should I care about the Lebesgue integral?”

A basis for an infinite dimensional space

The issue is that if a vector is written as an infinite series of coeffcients times basis vectors, then we must worry about whether that infinite series converges to a unique answer independent of the order of the basis vectors, as well as about convergence for any one particular order…. I’ve actually just taken a refreshing walk thru a few books, but even a summary of what’s involved when we talk about a basis is will take a full post in its own right.

If you’re curious, however, we’re talking about Schauder bases rather than Hamel, and we’re concerned about unconditional versus conditional bases, and Riesz bases, and (gulp) frames (which are spanning sets that need not be linearly independent). And just for fun: every linear space has a Hamel basis (if you believe in the Axiom of Choice, i.e. Zorn’s Lemma), but not every separable normed linear space has a Schauder basis. And, since the proof of existence of a Hamel basis requires the Axiom of Choice, we have a non-constructive proof: there is a basis, but can you find it? By contrast, I think we can find a Schauder basis if one exists at all.

wavelets without MRA?

Let me quote Daubechies. From p. 136: “Even though every orthonormal wavelet basis of practical interest, known to this date [1992], is associated with a multiresolution analysis, it is possible to construct ‘pathological’ $\psi$ such that the [$\psi_{j,k} (x) := 2^{j/2} \psi(2^j\ x - k)\$] constitute an orthonormal basis for $L^2(R)$ but are not derivable from a multiresolution analysis.” (I altered the signs on her right-hand-side indices.) She provides an example.

In other words, MRA is a very convenient framework from which to get wavelets, but not every wavelet comes from a MRA. Still, every wavelet of practical interest did. (I don’t know if that is still true, more than 15 years later. Let’s keep our eyes open.)

Oh, one of her endnotes says that if the mother wavelet has compact support, then we can find an MRA which leads to it.

Next, we will return to consequences of orthogonality, now that I have more context and understanding.

14 Responses to “Wavelets: Multiresolution Analysis (MRA)”

1. andrei Says:

hello rip,

i found by chance your website when i was looking for something that’s bothering me lately. i’m reading hernandez & weiss’ “first course in wavelets” for my ms degree final exam. in the definition of an MRA, the condition of invariance under integer translation for the space V0 is not listed, but after it is said that the functions phi_0,k=phi(x-k) (defined just like in daubechies) are all in V0 and they say that this follows from the fact that the integer translates of phi are an orthonormal basis for V0 (in your text, condition (5.1.6) ). this is not at all clear, because it’s not specified that a basis for a subspace contains elements from the subspace itself, not from L2(R). in many books it’s also required that V0 is invariant under integer translation, but in others this condition is missing.
can you give me an argument that the functions phi_0,k are in V0 for all integers k without assuming that V0 is invariant under integer translations?
thank you.

andrei

2. rip Says:

Thank you, Andrei, for the question. I learned something. (And I find it satisfying that my library served me well.)

As I understand it, you have asked one question: why are the phi_0,k in V0? And you have raised another: why is the invariance of V0 sometimes omitted?

First question. You said…

“…. in the definition of an MRA, the condition of invariance under integer translation for the space V0 is not listed, but after it is said that the functions phi_0,k=phi(x-k) (defined just like in daubechies) are all in V0 and they say that this follows from the fact that the integer translates of phi are an orthonormal basis for V0 (in your text, condition (5.1.6) ). this is not at all clear, because it’s not specified that a basis for a subspace contains elements from the subspace itself, not from L2(R).”

Well… they must be. The vectors that make up a basis for a space S are themselves in the space S — even if we didn’t assume that, each one of them is trivially in the span of the basis (since each is 1 times itself plus zeroes times the other basis vectors).

“in many books it’s also required that V0 is invariant under integer translation, but in others this condition is missing. can you give me an argument that the functions phi_0,k are in V0 for all integers k without assuming that V0 is invariant under integer translations?”

Sure; I just did. That the phi_0,k are a basis for V0 implies the phi_0,k are in V_0. I need not even think about whether V0 invariant. (You’re asking for such an argument because they didn’t give us invariance; don’t worry about it.)

The real insight is this: that the phi_0,k are a basis also implies that V_0 is invariant under integer translation: if f(t) is any function in V_0, then f(t-k) is also in V_0 for all integer k.

Say it loud: condition 5.1.5 is redundant; it follows from 5.1.6. That’s why it’s often omitted!

I had not realized that. Hernandez and Weiss did not say it was redundant — they simply didn’t list it! — but they provide a reference (Madych, “Some Elementary Properties of Multiresolution Analysis of L2(Rn)” in Chui, “Wavelets: A Tutorial in Theory and Applications” ISBN 0-12-174590-2) that does say it explicitly. (And, as we would expect, Daubechies herself did not say that 5.1.5 was redundant, not in the “Ten Lectures”.)

Madych doesn’t prove that 5.1.5 is redundant (“It is clear…”) but once said, it does seem clear to me. If f(t) is in V_0, then it has an expansion in terms of the basis. We have basis functions and coefficients. Now slide the basis functions by k (replace a_j phi_j(t) by a_j phi_j(t-k) ), and we’re looking at the expansion of f(t-k). Since f(t-k) is in the span of the basis, it’s in V_0.

If this isn’t clear — or if I’ve made a mistake — please let me know.

Rip

• andrei Says:

Thank you very much, the ideas are clear now. I guess I tried to prove that phi_0,k are in V0 using only the definition from Hernandez and Weiss (applied to V0 instead of L2(R) ): “{phi_0,k: k integer} is an orthornormal basis for V0 if every f in V0 is a combination of phi_0,k (with real coefficients and the equality between f and this expansion taking place in L2 norm; without saying that phi_0,k are in V0 for all k). But this is equivalent to the fact that V0 is the span of the phi_0,k and I tried to avoid this equivalence (now I don’t understand why I did so).
Your argument “If f(t) is in V_0, then it has an expansion in terms of the basis. We have basis functions and coefficients. Now slide the basis functions by k (replace a_j phi_j(t) by a_j phi_j(t-k) ), and we’re looking at the expansion of f(t-k). Since f(t-k) is in the span of the basis, it’s in V_0.” is the most thorough thing you can say about my question.
When I first read that statement, it was all clear. The next morning I re-read that line and I couldn’t convince myself why it was true. Maybe sometimes is better to stick to the first ideas and not be so scrupulous with this kind of details, because it will take more time to get to the heart of the real “problems”.

Andrei

3. rip Says:

You’re quite welcome, I’m glad to have helped.

Thanks again for asking; as I said, I learned something, too: that Daubechies’ 6 assumptions are redundant.

Rip

4. Sper Says:

Hi Rip,

I have just discovered your blog (i.e., today) and so far I think it provides a very interesting and exquisite insight on the topic I have looked into (wavelets).

I am studying MRA and wavelet spaces at the moment and am trying to solve some questions concerning simple properties of the wavelet spaces, starting from the properties we know for the approximation spaces (more specifically, I am trying to convince myself that the scale invariance and shift invariance of W0 for the wavelet spaces also hold.) Trivial as this may be to show, I can’t seem to be able to get “back” to the W’s. I enter the approximation
spaces, where I can derive some conclusions, and then apparently I can’t “return” in the wavelet spaces to show the property I am looking for. Could you give me any ideas (regardless of the notation convention followed)?
Thank you very much for you time,

Sperantza

5. rip Says:

Hi Sperantza,

Thanks for the question… I haven’t looked at my wavelet posts for a while, and it’s good to see them again.

To show scale invariance, for example, let me just show that g(t) in W0 implies g(2t) in W1.

g(t) in W0 implies g(t) in V1 because W0 is a subspace of V1.
Then g(2t) is in V2 = V1 + W1, and all we have to do is show g(2t) in W1 rather than in V1.

To do that, take any f(2t) in V1, then f(t) is in V0, and the integral of f(2t) g(2t) is 4 times the integral of f(t) g(t) — which is zero, because f(t) and g(t) are in V0 and W0 respectively. But the zero integral of f(2t) g(2t) says that g(2t) is in W1 instead of in V1, because f was arbitrary in V1.

I hope this helps.

rip

6. Sper Says:

It’s interesting how all maths people (not too many, myself included) when asked first came up with this – I dare say – somewhat incorrect solution to the problem.

The solution is okay up to this point:

Then g(2t) is in V2 = V1 + W1, and all we have to do is show g(2t) in W1 rather than in V1.

As you correctly imply, V1 and W1 are disjoint sets (well, if we ignore the 0 vector, which is their only common element). Nevertheless, as I am sure you know, V1 + W1 \neq V1 \cup W1 (i.e., V2 is not the union of V1 with W1). So this is the point where I think the argument goes wrong. g(2t) not belonging to V1 does not necessarily imply that it is in W1, for this very reason.

The only approach I thought of that seems to get me to a flawless argument is to write g(t) in terms of the basis functions in W0, and then change t to 2t and try and express f(2t) as a linear combination of the basis functions of W1.

Please let me know what you think.

Thank you so much,

Sperantza

• rip Says:

Hi Sperantza,

You can do it your way: we do, after all, have orthonormal bases for both the $V_i\$ and the $W_i\$.

Although I used “+” rather than $\oplus\$, out of laziness, I am absolutely certain that $V2 = V1 \oplus W1\$ (etc.), i.e. that V2 is the orthogonal direct sum of W1 and V1; equivalently, that $W_i\$ is the orthogonal complement of $V_i\$ in $V_{i+1}\$.

(I’m surprised you disagree. Maybe we need to talk about that.)

This means that every function in V2 may be written uniquely as the sum of a function in V1 and a function in W1. What I think I’ve done is show that for g(2t), its hypothetical function in V1 is zero, leaving g(2t) only in W1.

You are right that there may be functions in V2 which are in both V1 and W1 — but such a function is not orthogonal to an _arbitrary_ f(2t) in V1.

By all means, please let me know if you still think I’m wrong.

rip

• Sper Says:

rip,

I’m not sure I disagreed to the fact that W_i is the orthogonal complement of V_i in V_{i+1}; I think all I said was that V_{i+1} is not the union of the two (i.e., of V_i and W_i) . It seemed to me that your argument relied on this, but from going through your latest post I found out that I was mistaken in this assumption.

One potential reason for our seeming divergence of opinions regrading this question, is that in the convention I am following, the approximation and wavelet spaces are not orthogonal complements, but only algebraic complements; consequently, I believe, all bases are not orthogonal, but only Riesz.

I will take some more time and think about all implications and let you know of my conclusions.

Thank you once more for taking the time!

Sperantza

7. Sper Says:

In my previous post, when I wrote “orthogonal” bases, I meant “orthonormal” bases.

8. rip Says:

Hi Sperantza,

Good, we have different assumptions.

What are your texts? Maybe I need a few more books.

rip

9. Sper Says:

My main text is a coursepack which unfortunately has not been published yet (aside from the university edition) so I’m afraid I can’t refer you to it.

From what I’ve seen, it seems to follow the assumptions in Stephan Mallat’s “A wavelet tour of signal processing” (please see 7.4.1. from the second edition), as it appears to provide a more general (less constrained) framework.

Sperantza

10. rip Says:

Thanks, I own Mallat. I’ll get to it someday, but I don’t know when. Bi-orthogonal wavelets have been on my list for a while….

rip

11. Sper Says:

Thank you.

Anyway, in case I do come up with another solution than the -rather lengthy and perhaps not so elegant, but quite straight forward – expansion in terms of the bases functions, I will write back.

All the best,

Sperantza