\(\newcommand{\atantwo}{\text{atan2}} \) \(\newcommand{\sgn}{\text{sgn }} \) \(\newcommand{\R}{\mathbb{R}} \) \(\newcommand{\vec}[1]{\boldsymbol{\mathbf{#1}}} \) \(\newcommand{\ver}[1]{\boldsymbol{\mathbf{\hat #1}}} \) \(\newcommand{\tensalg}[1]{\mathcal{T}(#1)} \)
Prerequisites: Trying to understand polar coordinates, How to build stuff out of vector spaces.
Welcome back to the humble world of understanding polar coordinates, for the YouTube rabbit hole addict turned mathematician.
In this installment, we are gonna take what little we know about exterior algebras and apply it to manifolds. And we are gonna do it by taking a little trip through integration. For intuition on what differential forms mean in \(\R^n\), please refer to Terence Tao’s exposition.
I decided to split this post into two parts. This part will give intuition as to how the exterior product can be related to integration in manifolds, and use that as an excuse to build on the algebraic side of exterior powers from where we left off in How to build stuff out of vector spaces. We will learn how these algebraic constructs can encode geometric information in manifolds, and introduce ourselves to the language they bring with them. In the second part, we will go fully into integration and differential operators in manifolds.
So let’s start with one chart \((U,\phi)\) of our smooth manifold. Our first idea might be to use the chart to move back to Euclidean space and integrate there. So let’s do that!
$$ \int_U f = \int_{\phi(U)} f(\phi^{-1}(x))dx^1\dots dx^n $$
where \(x=(x^1,\dots,x^n)\) is the integration variable in Euclidean space. But this variable is exactly the local coordinates set up by the chart!
Now, let’s see if and how this changes when we choose another overlapping chart \((U,\varphi)\), setting up local coordinates \(y\).
$$ \int_{\varphi(U)} f(\varphi^{-1}(y))dy^1\dots dy^n = \int_{\phi(U)} f(\phi^{-1}(x))\left|\frac{\partial y}{\partial x}(x)\right| dx^1\dots dx^n $$
Where we did the (completely standard, in Euclidean space) change of variables \(\varphi^{-1}(y)=\phi^{-1}(x)\) and \(\partial y/\partial x\) is the Jacobian determinant of the transition function \(\varphi\circ\phi^{-1}\) at each point of the integration domain. Recall the Jacobian exists because the manifold is smooth!
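To make that Jacobian factor concrete before we go on, here is a minimal sympy sketch of the Euclidean change of variables, with polar coordinates playing the role of the second set of coordinates (the function and region are made up for illustration): the Jacobian determinant of \((r,\theta)\mapsto(x,y)\) is the factor \(r\) that keeps the two integrals equal.

```python
import sympy as sp

x, y, r, th = sp.symbols('x y r theta', positive=True)

# Integrate f over the quarter disk of radius 1, first in Cartesian coordinates...
f = x**2 + y**2
cart = sp.integrate(f, (y, 0, sp.sqrt(1 - x**2)), (x, 0, 1))

# ...then in polar coordinates, where the Jacobian determinant of
# (r, theta) -> (x, y) supplies the extra factor (it comes out to r).
jac = sp.Matrix([[sp.cos(th), -r * sp.sin(th)],
                 [sp.sin(th),  r * sp.cos(th)]]).det()
polar = sp.integrate(f.subs({x: r * sp.cos(th), y: r * sp.sin(th)}) * jac,
                     (r, 0, 1), (th, 0, sp.pi / 2))

assert sp.simplify(cart - polar) == 0   # both equal pi/8
```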
Now we are stuck. The integral of the function changes as we change coordinates. That’s not what we wanted. We want the integral to remain the same no matter what coordinates we choose. And we cannot just demand that the Jacobian determinant be \(\pm 1\).
It turns out a function is not the “correct” object to integrate against a general manifold, simply because it does not transform properly. Recall the change of variables formula from multivariable calculus:
$$ \int_{\phi(U)} f(y)dy^1\dots dy^n = \int_{U} f(\phi(x))\left|\det d\phi(x)\right| dx^1\dots dx^n $$
In our case, this \(\phi\) is going to be our transition function. So any object worth integrating in a manifold is going to have to transform as \([\omega]_x(x)=|\det d\phi(x)|[\omega]_y(\phi(x))\), where \([\cdot]_x\) indicates we are expressing an object in local coordinates \(x\).
We already know of an object whose transformation involves the Jacobian matrix. If you recall the post Trying to understand polar coordinates, the gradient of a scalar function on a manifold transforms as \([df]_x(x)=d\phi(x)^\top[df]_y(\phi(x))\). This is not quite what we wanted, but in the context of integration, gradients are integrated against curves (or, more generally, 1-dimensional smooth manifolds). For a 1-dimensional smooth manifold, \(d\phi(x)^\top\) is a \(1\times 1\) matrix, i.e. a scalar. So all we require is that said scalar is positive. We say a manifold is positively oriented if the Jacobian determinant of every one of its transition functions is positive. We can now define the integral of the gradient of a scalar function on a 1-dimensional manifold \(\gamma\) as:
$$ \int_\gamma df = \int_{t(\gamma)} [df]_t(t)dt$$
Note I assumed the manifold could be covered by the chart \(t\). If that is not the case, simply split the manifold up and integrate each individual chart. What we did was set up our gradient in local coordinates, move our curve to \(\R\) and integrate there. This is exactly what we do when we parameterize curves in \(\R^n\) to integrate them. However, there are a couple of peculiarities. Let’s do an example to clear them up.
Say we have a curve \(\gamma\) embedded in \(\R^2\) parameterized by \(r(t)=(t,2t)\) with \(t\in (0,1)\). What we now need is a chart that will take each point in the curve to a point in \(\R\). That is easily achieved with the map \((t,2t)\mapsto t\). Now, let \(f(x,y)=xy\) be a function on \(\gamma\). The gradient in the ambient space \(\R^2\) is \(df_{(x,y)}=(y,x)\). But what we are really interested in is the gradient in the manifold, which is 1-dimensional! Intuitively, we want the component of the gradient that goes along the curve, what we usually write \(\nabla f\cdot \dot\gamma\). That’s because we only care about how our function changes within the manifold. The ambient space is irrelevant and indeed cannot even be assumed to be there.

Let’s get to that the manifold way: by writing the gradient in coordinates. What we want is \(\nabla (f\circ t^{-1})(t) = [df]_t(t)=4t\). Please note I abused (or rather overloaded) notation: \(t\) represents both a chart and the independent variable. This gradient is just a tangent vector: for each \(t\), there corresponds a point in the curve, and \([df]_t(t)\) is a tangent vector attached to it, written in the given coordinates. And if we change coordinates by choosing a parametrization \(s(t’)=({t’}^2,2{t’}^2)\) with \(t’\in (0,1)\), we can prove the integral remains exactly the same. Note this new chart maps our curve to the same domain \((0,1)\), but the coordinates are actually different. That is, \(t\) and \(t’\) map the same point \(p\) in the curve to two different outputs (here \(t’=\sqrt{t}\)).
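Here is a quick sympy check of this example (my own sketch; the symbol names are arbitrary): it computes the gradient in both charts and confirms the two integrals agree.

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)
f = lambda x, y: x * y                    # the ambient function f(x, y) = x y

# Chart t: the curve is (t, 2t) with t in (0, 1)
df_t = sp.diff(f(t, 2 * t), t)            # [df]_t = 4 t
# Chart t': the curve is (t'^2, 2 t'^2) with t' in (0, 1); call the symbol s
df_s = sp.diff(f(s**2, 2 * s**2), s)      # [df]_{t'} = 8 t'^3

# Same integral in either chart
assert sp.integrate(df_t, (t, 0, 1)) == sp.integrate(df_s, (s, 0, 1)) == 2
```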
This is looking very much like the things we can integrate against manifolds are vector fields, things like gradients. We are getting closer… but still nope.
To see why not, note that in an \(n\)-dimensional manifold, tangent vectors do actually transform according to the \(n\times n\) Jacobian matrix, which only reduces to the desired determinant in the 1-dimensional case. However, this mismatch is not the real issue. Here is the thing: I claim that what we were integrating up there was not actually a vector field. Yes, it was the gradient. But not the vector field gradient.
If you have studied calculus over general Banach spaces, you might be familiar with what the notation \(df\) actually represents in the precise sense. Given \(f:V\to W\) between Banach spaces, \(df_{x_0}\), when it exists, represents the best linear approximation to \(\Delta f_{x_0}(h) = f(x_0+h)-f(x_0)\). That is, the gradient can be thought of as a linear transformation \(df_x(h) = \nabla f(x)\cdot h\) (note that in this case, \(W=\R\) and \(h\in V\)). More precisely, it is a linear functional. And what is that \(h\)? Well, it is the displacement vector from \(x_0\). It is a tangent vector. That is precisely what the \(\nabla f\cdot \dot\gamma\) up there was all about: the derivative of the parametrization of a curve spits out tangent vectors, and the gradient is operating on them to spit out scalars.
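A two-line numerical check of that statement (my own illustration, with a made-up point and displacement): the pairing of the differential with a small displacement vector \(h\) matches the actual increment of \(f\) up to second-order error.

```python
import numpy as np

f = lambda p: p[0] * p[1]                  # f(x, y) = x y
grad_f = lambda p: np.array([p[1], p[0]])  # nabla f = (y, x)

x0 = np.array([1.0, 2.0])
h = 1e-6 * np.array([0.3, -0.7])           # a small displacement (tangent) vector

increment = f(x0 + h) - f(x0)              # Delta f_{x0}(h)
linear = grad_f(x0) @ h                    # df_{x0}(h) = nabla f(x0) . h
print(increment, linear)                   # agree up to O(|h|^2)
```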
It turns out the correct objects to integrate are precisely linear functionals. The way to visualize it is as follows. For an \(n\)-dimensional smooth manifold, the tangent space will work as a local linear approximation to the manifold (think of it as a tangent plane to a smooth surface). Hence, a basis for a tangent space will pretty much define the space locally, by forming an infinitesimal parallelotope. If you set up a surface integral, you will see exactly how these tangent vectors show up. Then, an object \(\omega_x\) that takes this basis of \(T_xM\) to a scalar is assigning a (signed) volume to that infinitesimal patch of manifold, which is exactly what we need to set up a Riemann integral.
$$ \int_U \omega \approx \sum_i \omega_{x_i}(v_{i_1},\dots,v_{i_n}) $$
Here, \(U\) is a chart, \(x_i\) are sample points in it and the \(v_{i_j}\) are the basis vectors for \(T_{x_i}M\). Note that the tangent vectors don’t show up in the integral, which only depends on \(\omega\). That’s because the specific choice of tangent basis vectors depends on the choice of parametrization, and we expect our integral to be independent of said choice. A clear example is \(dr=\dot \gamma dt\) for line integrals: the integral does not depend on how “big” or “small” that derivative is at all!
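To see the sum in action on the running example (a sketch with made-up sample counts): partition the parameter interval, scale the tangent vector of the curve by the cell width, and feed the result to \(\omega = df\).

```python
import numpy as np

# Riemann sum for the line integral of df over gamma(t) = (t, 2t), t in (0, 1),
# with f(x, y) = x y, so df = (y, x) as a covector field along the curve.
N = 10_000
ts = (np.arange(N) + 0.5) / N               # midpoint sample points in (0, 1)
dt = 1.0 / N

df = np.stack([2 * ts, ts], axis=1)         # (y, x) evaluated along the curve
edge = np.array([1.0, 2.0]) * dt            # tangent vector gamma'(t), scaled to the cell

print(np.sum(df @ edge))                    # ~2.0, the exact value of the integral
```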
Note this approach is fundamentally different from trying to integrate a function. The basis vectors are aware of the local geometry and work as the equivalent of the “surface patch” (the infinitesimal surface normal in surface integrals). Functions by themselves are only aware of points.
Now let’s talk about the functional \(\omega_{x}(v_1,\dots,v_n)\). As we discussed, it has a clear geometric interpretation. Intuitively, being something we expect to integrate, we can expect the following properties (there is a quick numerical sanity check right after the list):
- If two input vectors are the same, the parallelotope is degenerate and the functional outputs 0.
- If we multiply any of the vectors by a scalar, the infinitesimal volume is scaled by that same scalar. The functional is multilinear.
- If we swap two vectors around, the orientation of the infinitesimal patch is reversed (just like a surface normal). Hence, the sign of the functional changes.
- By the properties above, if two input vectors are linearly dependent, the functional outputs 0.
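The determinant of the matrix whose columns are the input vectors is the prototypical example of such a functional; here is a quick numerical sanity check of the four properties (random vectors, my own sketch).

```python
import numpy as np

rng = np.random.default_rng(0)
omega = lambda *vs: np.linalg.det(np.column_stack(vs))      # a model n-form on R^3

u, v, w = rng.standard_normal((3, 3))

assert np.isclose(omega(u, u, w), 0.0)                      # repeated vector: degenerate, 0
assert np.isclose(omega(3 * u, v, w), 3 * omega(u, v, w))   # scaling one input scales the volume
assert np.isclose(omega(v, u, w), -omega(u, v, w))          # swapping two inputs flips the sign
assert np.isclose(omega(u, v, 2 * u - 5 * v), 0.0)          # linearly dependent inputs give 0
```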
For more intuition into how this makes sense and why these things are actually the right thing to integrate, please refer to Terence Tao’s exposition. It’s amazing.
But wait, these are exactly the properties of the \(n\)-th exterior power \(\bigwedge^nT_xM\). And \(\omega_x\) is a linear functional on that vector space, so it’s actually a member of its dual \(\bigwedge^nT_x^*M\)!
The space \(T_x^*M\) is called the cotangent space.
Let’s proceed as in Trying to understand polar coordinates and hunt down a basis for that vector space, and understand how these objects change as we change coordinates. Pick a point \(p\) in a chart \((U,x)\). Then, a basis of \(T_pM\) in coordinates \(x\) is given by the \(\partial/\partial x^i\). The basis of the cotangent space in the same coordinates is defined by how it acts on the tangent space basis
$$ dx^i\left( \frac{\partial}{\partial x^j} \right) = \delta_i^j \tag{1}$$
where \(\delta_i^j\) is the Kronecker delta (a function that is 1 when \(i=j\) and 0 otherwise).
You might wonder why I denoted the basis vectors \(dx^i\), as in the differentials of the coordinate functions. Well, we established a differential of a function is a linear functional eating up a tangent vector and returning a scalar. That differential evaluated at a specific direction (tangent vector) represents how much the function changes in that direction. The coordinate function \(x^i\) should change along its coordinate direction given by its tangent basis vector \(\partial/\partial x^i\), and should not change along the other basis vectors. An easy example of this would be \(\R^2\) in polar coordinates: the coordinate \(r\) changes when you move infinitesimally in the direction of a \(\mathbf{\hat r}\) tangent vector (radially), but not when you move infinitesimally along the direction of a \(\mathbf{\hat \theta}\) one.
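Here is that polar example in sympy (my own sketch): the columns of the Jacobian are \(\partial/\partial r\) and \(\partial/\partial\theta\) written in Cartesian components, the rows of its inverse are \(dr\) and \(d\theta\), and pairing them gives exactly the Kronecker delta of equation (1).

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
x, y = r * sp.cos(th), r * sp.sin(th)

# Columns: the tangent basis vectors d/dr and d/dtheta in Cartesian components.
J = sp.Matrix([[sp.diff(x, r), sp.diff(x, th)],
               [sp.diff(y, r), sp.diff(y, th)]])

# Rows of the inverse: the coordinate differentials dr and dtheta in the same components.
dr_dtheta = J.inv()

# dx^i(d/dx^j) = delta^i_j: pairing rows with columns gives the identity matrix.
assert (dr_dtheta * J).applyfunc(sp.simplify) == sp.eye(2)
```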
In fact, the differential of a coordinate function \(x^i\) in a Hilbert space (which the tangent space is not, yet!) corresponds exactly to the inner product in \(\R^n\): \(dx^i(v)=\langle \mathbf{\hat e}_i; [v]_x \rangle\), where \([v]_x\) is the vector expressed in coordinates \(x\). This is a consequence of the Riesz representation theorem.
That’s why members of \(\bigwedge^kT_x^*M\) for \(k=0,\dots,n\) (chosen smoothly, one at each point \(x\)) are called differential \(k\)-forms, and can be integrated against a \(k\)-dimensional submanifold \(S\) of the \(n\)-dimensional manifold \(M\).
A differential \(k\)-form can be written as:
$$ \omega(v_1\wedge\dots\wedge v_k)=(\omega^1\wedge\dots\wedge\omega^k)(v_1\wedge\dots\wedge v_k)=\det\left[\omega^i(v_j)\right]_{i,j=1}^{k} $$
where \(\omega\in\bigwedge^kT_x^*M\), \(v_i\in T_xM\) and \(\omega^i\in T_x^*M\). The determinant of pairings is forced by the multilinear, alternating properties we listed above; when the matrix \(\omega^i(v_j)\) happens to be diagonal, it reduces to the simple product \(\omega^1(v_1)\omega^2(v_2)\cdots\omega^k(v_k)\). We can prove all differential \(k\)-forms can be written as \(k\)-fold exterior products of differential 1-forms (and linear combinations thereof).
In fact, this applies to any general linear transformation. A linear map \(T:V\to W\) induces a linear map \(\tilde T: \bigwedge^kV\to\bigwedge^kW\) by \(\tilde T(v_1\wedge\dots\wedge v_k)=T(v_1)\wedge\dots\wedge T(v_k)\), extended linearly to the whole domain.
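To make the induced map tangible, here is a small numpy sketch (my own construction, not from the post): the components of \(v\wedge w\) in the basis \(e_i\wedge e_j\), \(i<j\), are the \(2\times 2\) minors of the pair, and the matrix of \(\tilde T\) on \(\bigwedge^2\R^3\) is the matrix of \(2\times 2\) minors of \(T\) (the second compound matrix).

```python
import numpy as np
from itertools import combinations

def wedge2(v, w):
    """Components of v ^ w in the basis e_i ^ e_j with i < j."""
    return np.array([v[i] * w[j] - v[j] * w[i]
                     for i, j in combinations(range(len(v)), 2)])

def induced(T):
    """Matrix of the induced map on the 2nd exterior power: all 2x2 minors of T."""
    pairs = list(combinations(range(T.shape[0]), 2))
    return np.array([[np.linalg.det(T[np.ix_(rows, cols)]) for cols in pairs]
                     for rows in pairs])

rng = np.random.default_rng(1)
T = rng.standard_normal((3, 3))
v, w = rng.standard_normal((2, 3))

# T~(v ^ w) = T(v) ^ T(w)
assert np.allclose(induced(T) @ wedge2(v, w), wedge2(T @ v, T @ w))
```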
Now let’s see what happens with a differential 1-form when we change coordinates. Recall \(x\) coordinates gave us a basis as in equation (1). The basis vectors are just gradients, so we know how those change when we change the basis: as \(dx^i_j =dy^i_k d\alpha_j^k \), where \(y=\alpha(x)\). Note I am using the Einstein summation convention, from the post How to build stuff out of vector spaces. Namely, \(dx, dy\) are matrices with each row containing a basis vector and \(d\alpha\) is the change of basis matrix for the dual vector space. Concretely, this \(d\alpha\) is the Jacobian of the inverse transition \(\alpha^{-1}\), i.e. \(\partial x/\partial y\): acting on the right on row vectors, it turns the \(x\)-components of a covector into its \(y\)-components, consistent with the transformation rule for gradients we recalled above. Now, postulate a differential 1-form given by:
$$ \omega_p = \sum_{i=1}^n f_i(p){dx^i}_p $$
with \(p\in M\) and \(f_i:M\to \R\). Note each \(f_i(p)\) is a scalar, so the coordinate expression of \(\omega_p\) in \({\R^n}^*\) in \(x\)-coordinates is the row vector \([f_1(p)\dots f_n(p)]\). If we change the basis (omitting the \(p\) for clarity):
$$ \omega_j = f_i\, dy^i_k\, d\alpha_j^k $$
Where, once again, we used the Einstein summation convention. This does not transform according to the determinant of the Jacobian matrix, precisely because it is a 1-form in an \(n\)-dimensional manifold. However, if we took a 1-dimensional submanifold (i.e. a curve), we would observe 1-forms transform as needed. To convince yourself of that, go to our example of the gradient on a curve sitting in \(\R^2\), use the charts \(t\) and \(t’\) and apply the change of coordinates formula to the gradient.
A general \(k\)-form is built by picking \(k\) elements from the ordered basis \((dx^1,\dots ,dx^n)\) to form a new basis vector \(dx^{i_1}\wedge\dots\wedge dx^{i_k}\) with \(1\leq i_1<\dots < i_k \leq n\). Then, a general \(k\)-form can be written as:
$$ \omega = \sum_{1\leq i_1<\dots < i_k \leq n} f_{i_1\cdots i_k}dx^{i_1}\wedge\dots\wedge dx^{i_k} $$
Finding the dimension of this space, the \(k\)-th exterior power of the cotangent space \(\bigwedge^kT_p^*M\) (or of any vector space in general), is an easy combinatorial problem and comes out to \(\frac{n!}{(n-k)!k!}\). The change of coordinates formula can be found by transforming each individual \(dx^i\) and using the properties of the exterior product. Feel free to calculate it yourself, it comes out to an index hellhole and is therefore left as an exercise to the reader.
What we are going to do is work out the special case of an \(n\)-form. The space of \(n\)-forms is 1-dimensional, so they can simply be written as \(\omega = f\,dx^1\wedge\dots\wedge dx^n\). Transforming each individual coordinate differential:
$$ \omega = f (dy^{1}d\alpha) \wedge\dots\wedge (dy^{n}d\alpha) $$
The linear transformation \(d\alpha\) applied to the individual factors induces a linear transformation on the \(n\)-th exterior power, which is a 1-dimensional vector space. That means the action of the induced linear transformation is simply multiplication by a scalar, and that scalar turns out to be the determinant of \(d\alpha\). We are left with:
$$ \omega = f \det d\alpha \cdot dy^1 \wedge\dots\wedge dy^n $$
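As a concrete check (sympy again, with Cartesian \((x,y)\) playing the role of the \(x\)-coordinates and polar \((r,\theta)\) the role of the \(y\)-coordinates, so that \(d\alpha = \partial(x,y)/\partial(r,\theta)\)): expanding \(dx\wedge dy\) in the basis \(dr\wedge d\theta\) produces exactly \(\det d\alpha = r\), the familiar polar area element.

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
x, y = r * sp.cos(th), r * sp.sin(th)

# dx and dy expanded in the (dr, dtheta) basis: rows of the Jacobian d(x, y)/d(r, theta)
dx = [sp.diff(x, r), sp.diff(x, th)]
dy = [sp.diff(y, r), sp.diff(y, th)]

# dx ^ dy = (dx_r * dy_theta - dx_theta * dy_r) dr ^ dtheta
coeff = sp.simplify(dx[0] * dy[1] - dx[1] * dy[0])
print(coeff)   # r  ->  dx ^ dy = r dr ^ dtheta
```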
To understand why the determinant is the appropriate scalar here, note \(d\alpha\) acts on the parallelotope determined by the rows of \(dy\), and the determinant is exactly the scaling factor for the volume of the parallelotope. In fact, this motivates a more abstract definition of determinant of a linear transformation \(A:V\to V\) as the unique scalar \(a\) such that \(\tilde A\omega = a\omega \) with \(V\) an \(n\)-dimensional vector space and \(\omega\in\bigwedge^n V\). In order to get comfortable dealing with \(n\)-forms, let’s prove that.
Take an \(n\)-form \(\omega = e_1\wedge\dots\wedge e_n \in\bigwedge^n V\), where \(B=\{e_i\}_{i=1}^n\) is a basis of \(V\). Since the \(n\)-th exterior power is a 1-dimensional vector space, every element in it is a scalar multiple of \(\omega\). So if we prove that for any linear transformation \(T:V\to V\), the induced transformation acts as \(\tilde T(\omega)=\det T\cdot\omega\) for that particular \(\omega\), it follows for the entire \(\bigwedge^n V\) by linearity. Furthermore, if we prove it for \(V=\R^n\) and \(B\) being the standard basis, it follows for any \(n\)-dimensional vector space by the coordinate isomorphism. With that in mind, if we define \(T\) by its action on the basis \(T(e_i)=a_i\), the corresponding matrix in \(B\) is given by \([T]_{BB}=[a_1\cdots a_n]\). Each column is given by \(a_i=[a_{1,i}\cdots a_{n,i}]^\top\). Now, let's evaluate:
$$ \tilde T(\omega) = T(e_1)\wedge\dots\wedge T(e_n) = a_1\wedge\dots\wedge a_n $$
We express each \(a_i\) as a linear combination of the \(e_i\) and use multilinearity:
$$ \tilde T(\omega) = \left( \sum_{j_1} a_{j_1,1}e_{j_1} \right) \wedge\dots\wedge \left( \sum_{j_n} a_{j_n,n}e_{j_n}\right) = \sum_{j_1}\cdots\sum_{j_n} a_{j_1,\dots , j_n} e_{j_1}\wedge\dots\wedge e_{j_n} $$
The tuple of subscripts \((j_1,\dots , j_n)\) is known as a multi-index, and objects like \(a_{j_1,\dots , j_n}\) can be appropriately formulated in the language of tensors. More specifically, here \(a_{j_1,\dots , j_n}\) is just shorthand for the product \(a_{j_1,1}\cdots a_{j_n,n}\).
We have many terms in this sum. Many of them will contain a repeated factor \(e_{j_a} = e_{j_b}\), and hence the wedge product will be zero. We want all of the \(e_i\) involved in the wedge product to be different in order for it to be non-zero, so only the terms with all the indices \(j_i\) different from each other survive. Since we have \(n\) factors in the wedge product and each index runs from 1 to \(n\), we are looking at terms such that \(\sigma=(j_1,\dots ,j_n)\) is a permutation of \((1,\dots , n)\). Denote the set of all such permutations of \((1,\dots , n)\) by \(S_n\). Our expression becomes:
$$ \tilde T(\omega) = \sum_{\sigma\in S_n} a_\sigma\,\,\, e_{\sigma_1}\wedge\dots\wedge e_{\sigma_n} $$
Now, recall we can move around the factors in the wedge product. If there are two factors, changing the order adds a factor of \(-1\), because the product is anticommutative. For \(n\) factors, we need to count the number of swaps we make. For instance, going from \(a\wedge b\wedge c\) to \(b\wedge c\wedge a\) requires that we first swap \(a\) with \(c\), then \(b\) with \(c\), so \(a\wedge b\wedge c = (-1)^2 b\wedge c\wedge a\) (because there were 2 swaps). There are other ways of taking \(a\wedge b\wedge c\) to \(b\wedge c\wedge a\), because the decomposition of a permutation into swaps (or transpositions) is not unique, but the parity of the number of swaps is: every decomposition of a given permutation uses either an even number of swaps or an odd number, never both. So if a permutation \(\sigma\) can be done in \(k\) transpositions, we define \(\sgn\sigma = (-1)^k\), and this is well-defined. This is closely related to the theory of symmetric and cyclic groups, and the Levi-Civita symbol/tensor which we will encounter later.
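A tiny helper makes the sign concrete (my own sketch; counting inversions gives the same parity as counting transpositions):

```python
def sgn(perm):
    """Sign of a permutation of (1, ..., n), via the parity of its inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

# a ^ b ^ c -> b ^ c ^ a corresponds to the permutation (2, 3, 1): two swaps, sign +1
assert sgn((2, 3, 1)) == 1
# a single swap of two factors is always a sign flip
assert sgn((2, 1, 3)) == -1
```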
With that in mind, since \(\sigma = (\sigma_1,\dots , \sigma_n)\) is a permutation of \((1,\dots , n)\), we can swap the factors of the wedge product around and add the appropriate \(\pm 1\) factor:
$$ \tilde T(\omega) = \sum_{\sigma\in S_n} \sgn\sigma\cdot a_\sigma\,\,\, e_1\wedge\dots\wedge e_n $$
And the remaining factor is exactly the determinant of \(T\):
$$ \det T = \sum_{\sigma\in S_n} \sgn\sigma\cdot a_\sigma $$
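That is the Leibniz formula for the determinant. As a sanity check (my own sketch), it agrees with numpy's determinant on a random matrix:

```python
import numpy as np
from itertools import permutations

def sgn(perm):
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def leibniz_det(A):
    """det T = sum over sigma of sgn(sigma) * a_{sigma_1,1} ... a_{sigma_n,n}."""
    n = A.shape[0]
    return sum(sgn(sigma) * np.prod([A[sigma[i], i] for i in range(n)])
               for sigma in permutations(range(n)))

A = np.random.default_rng(2).standard_normal((4, 4))
assert np.isclose(leibniz_det(A), np.linalg.det(A))
```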
Permutations end up being important in the theory of differential forms. More generally, this is the beginning of the language of group theory making its way into the mathematics of physics.
That’s it for this post. See you in the next one!