\(\newcommand{\atantwo}{\text{atan2}} \) \(\newcommand{\R}{\mathbb{R}} \) \(\newcommand{\vec}[1]{\boldsymbol{\mathbf{#1}}} \) \(\newcommand{\ver}[1]{\boldsymbol{\mathbf{\hat #1}}} \) \(\newcommand{\tensalg}[1]{\mathcal{T}(#1)} \)
First post: Trying to understand polar coordinates
Welcome to the second installment of this post series!
Plot twist: The next posts will have nothing to do with polar coordinates. They are things I discovered while answering my questions about polar coordinates, but as usually happens in mathematics and life, what you discover along the way is more worthwhile than the final answer.
So, we are moving towards differential geometry. Eventually, I hope to reach basic special or even general relativity. We will see.
This post is going to be way more basic though. We are going to take a bunch of finite-dimensional vector spaces over the same field, and build some other interesting vector spaces out of them. That’s it.
A humble welcome to the world of tensors and differential forms.
Stumble 1: Tensors and differential forms
Since I did not know how to understand polar coordinates, I turned to another question that had been on my mind. Specifically, I kept hearing “tensor, tensor, tensor, differential form”, but I was not sure what they were. So I went and found out. What I found is something I could fully understand… but it seemed to bear little resemblance to what people referred to when they talked about tensors and differential forms.
That’s because when people generally talk about tensors and differential forms, they do so in the context of manifolds. We will carry the contents of this post into that context in a later post.
The game starts with a vector space \(V\) over a field \(K\). The tensor algebra \(\tensalg{V}\) is something that’s super easy to describe: it’s the free algebra \(F(V)\). That simply means you start out with all of your \(v \in V\) and then take associative, bilinear formal products however you like. That is, the elements of \(\tensalg{V}\cong F(V)\) are just expressions of the form \(v_1 \otimes \dots \otimes v_r \) which we then can add up to get stuff like \(v_1 \otimes \dots \otimes v_r + u_1 \otimes \dots \otimes u_s\). The product has the following properties:
- The product is bilinear (and by extension, multilinear). That is, it distributes over vector addition and scalars can be pulled out of the product. Just like normal multiplication.
- The product is associative. We have \(v_1\otimes (v_2\otimes v_3) = (v_1\otimes v_2)\otimes v_3 = v_1\otimes v_2\otimes v_3 \).
And that’s it. The product is not commutative, and does not have any extra properties. Elements of the free algebra of \(V\) are called tensors on \(V\). That’s step one to understanding the basics of tensors.
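To make the "purely formal" flavour of this concrete, here is a minimal sketch of the free algebra in Python. The representation is my own assumption for illustration: a tensor is a dictionary from "words" (tuples of basis labels) to coefficients, and the names `add`, `scale` and `tensor` are hypothetical helpers, not a standard library.

```python
from collections import defaultdict

# A tensor is a dict mapping a word (tuple of basis labels) to a coefficient.
# The basis vector e_i is the one-letter word (i,); the scalar 1 is the empty word ().

def add(s, t):
    """Sum of two tensors: add coefficients word by word."""
    out = defaultdict(float)
    for word, c in list(s.items()) + list(t.items()):
        out[word] += c
    return {w: c for w, c in out.items() if c != 0}

def scale(a, t):
    """Multiply every coefficient by the scalar a."""
    return {w: a * c for w, c in t.items() if a * c != 0}

def tensor(s, t):
    """Formal product: concatenate words, multiply coefficients."""
    out = defaultdict(float)
    for w1, c1 in s.items():
        for w2, c2 in t.items():
            out[w1 + w2] += c1 * c2
    return {w: c for w, c in out.items() if c != 0}

e1, e2 = {(1,): 1.0}, {(2,): 1.0}
u = add(e1, scale(2.0, e2))  # u = e1 + 2 e2

# Bilinearity: (e1 + 2 e2) ⊗ e1 = e1⊗e1 + 2 e2⊗e1
assert tensor(u, e1) == {(1, 1): 1.0, (2, 1): 2.0}
# Associativity: e1 ⊗ (e2 ⊗ u) = (e1 ⊗ e2) ⊗ u
assert tensor(e1, tensor(e2, u)) == tensor(tensor(e1, e2), u)
# No commutativity: e1 ⊗ e2 and e2 ⊗ e1 are different words
assert tensor(e1, e2) != tensor(e2, e1)
```

Concatenating tuples is associative and the double loop distributes over sums, so the two properties in the list hold by construction and nothing else is imposed.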
Let’s go for step 2.
Until now, tensors were a purely syntactic construction. We built them by simply concatenating vectors from our vector space with product and sum symbols. Then we manipulate them with the rules for vector spaces together with associativity and bilinearity. To get another view on tensors, we will first need to talk about multilinear maps. The journey throughout multilinear maps starts at dual spaces.
The dual space of a vector space \(V\) is denoted \(V^*\). It is the space of all (continuous) linear functionals \(V\to K\). There is a lot to say about dual spaces, but all we will need is that, for finite-dimensional \(V\), the dual of the dual \(V^{**}\) is “the same” as \(V\). This is actually a very interesting issue to think about… perhaps in another post.
Since we are interested in multilinear functionals, that is, maps \(V_1 \times \dots \times V_N \to K\) (each \(V_i\) is a vector space over \(K\)), we can think about how to build them out of elements of our dual spaces. That is actually easy enough. We can have a bilinear functional \(V_1 \times V_2 \to K\) by taking \(\omega_1 \in V_1^*\) and \(\omega_2 \in V_2^*\) and defining our bilinear functional \(T\) as:
$$ \forall v_1\in V_1,\forall v_2\in V_2: T(v_1,v_2)=\omega_1(v_1)\omega_2(v_2) $$
To build a multilinear functional \(T:V_1 \times \dots \times V_N \to K\), we do exactly the same thing, and we get:
$$ \forall v_i\in V_i: T(v_1,\dots ,v_N)=\omega_1(v_1)\cdot\cdot\cdot\omega_N(v_N) $$
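We denote such a multilinear functional as… surprise: \(T=\omega_1\otimes\dots\otimes\omega_N\). Not every multilinear functional is a single product like this, but we can prove that every multilinear functional can be written as a finite sum of such products (remember our spaces are finite-dimensional).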
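The definition of the product of functionals translates almost word for word into code. This is only a sketch under my own assumptions: vectors are plain lists of components in some fixed basis, and `covector` and `tensor_product` are hypothetical helper names.

```python
def covector(components):
    """Linear functional v -> sum_i w_i v^i, given its components w_i."""
    return lambda v: sum(w * x for w, x in zip(components, v))

def tensor_product(*functionals):
    """(ω_1 ⊗ ... ⊗ ω_N)(v_1, ..., v_N) = ω_1(v_1) ··· ω_N(v_N)."""
    def T(*vectors):
        assert len(vectors) == len(functionals)
        result = 1
        for omega, v in zip(functionals, vectors):
            result *= omega(v)
        return result
    return T

omega1 = covector([1, 2])      # a functional on V_1 = R^2
omega2 = covector([0, 1, -1])  # a functional on V_2 = R^3
T = tensor_product(omega1, omega2)

v1, u1, v2 = [3, 1], [1, 1], [2, 5, 1]

# T(v1, v2) = ω1(v1) · ω2(v2) = 5 · 4
assert T(v1, v2) == 20

# Linearity in the first slot: T(v1 + u1, v2) = T(v1, v2) + T(u1, v2)
v1_plus_u1 = [a + b for a, b in zip(v1, u1)]
assert T(v1_plus_u1, v2) == T(v1, v2) + T(u1, v2)
```

Note that the two factors live on different spaces (here of dimensions 2 and 3), which is exactly why the slots of \(T\) cannot be swapped.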
It’s also possible to prove that this product of \(\omega_i\) is multilinear (and it is clearly associative!). That means we can throw all of these multilinear functionals inside a vector space, which we conveniently call \(V_1^*\otimes \dots \otimes V_N^*\) (simply because the product is between members of \(V_i^*\)). This is called a tensor space, and is a more general object than our previous tensor algebra. In fact, the tensor algebra on \(V\) is simply:
$$ \tensalg{V} = (K) \oplus (V) \oplus (V\otimes V) \oplus (V\otimes V \otimes V) \oplus \dots $$
Where \(\oplus\) is the direct sum of vector spaces. Note the \(K\) is there as the “empty product”, a constant linear functional.
But this is still not what is commonly called just “a tensor”. No. What is commonly called “a tensor” in physics is a multilinear functional on \(V^p \times {V^*}^q\), that is, a member of \({V^*}^{\otimes p}\otimes V^{\otimes q}\) (you can guess \(V^{\otimes q}\) is the \(\otimes\) product of \(V\) with itself \(q\) times).
What use is all of this? The intuitive answer (for me, at least), is to be found in the formalism derived from this: (abstract) index notation.
Let’s build up to it. We will denote a vector \(\mathbf{v}\) as \(v^i\). You should think of the index (not exponent) \(i\) as a label of the components of \(\mathbf{v}\) in some basis (we don’t care which one). On the other hand, we will denote a covector \(\mathbf{w}\) as \(w_j\), where the index \(j\) should be thought of in the same way: a component label in an arbitrary basis. Now recall that \(w_j\), as a covector, is a linear functional on vectors. We will denote the application of the functional \(w_i\) to a vector \(v^i\) like this:
$$ \mathbf{w}(\mathbf{v}) = w_i v^i $$
As a side note, Riesz’s representation theorem tells us that if \(V\) happens to be a Hilbert space (i.e. it is a complete inner product space), then there exists a unique \(\mathbf{x}\in V\) such that:
$$ \mathbf{w}(\mathbf{v}) = w_i v^i = \langle \mathbf{x} \, ; \mathbf{v} \rangle $$
That is, in a Hilbert space, all linear functionals (technically, only continuous linear functionals) can be expressed as an inner product! That means essentially that applying the functional \(\mathbf{w}\) is the same thing as taking the inner product with \(\mathbf{x}\):
$$ \mathbf{w}(\cdot) = \langle \mathbf{x} \, ; \cdot \rangle $$
So we can associate \(\mathbf{w} \in V^*\) with \(\mathbf{x} \in V\). For instance, in \(\R^n\) (and by extension in finite-dimensional vector spaces with an induced inner product) that means every vector \(\mathbf{x}\) is also a linear functional given by:
$$ \mathbf{x}(\mathbf{v}) = x_i v^i = \langle \mathbf{x} \, ; \mathbf{v} \rangle = \mathbf{x}\cdot \mathbf{v} = \displaystyle \sum_{i=1}^{n} x_i v_i $$
That means in this case, the notation \(w_i v^i\) actually means summing over the index \(i\). This is what is known as Einstein summation convention.
Writing the application of a linear functional \(w_i\) to a vector \(v^i\) as \(w_i v^i\) (repeating the index) suggests we can extend this formalism to other, more general linear transformations. We write a linear transformation from a vector space to itself as \(L_i^j\), so its application to a vector \(v^i\) is:
$$ L_i^j v^i = \displaystyle \sum_{i=1}^{n} L_i^j v^i = (Lv)^j $$
Where \(Lv=L(v)\) is the transformed vector. What happens is that the repeated indices get summed away and disappear. We call this “index contraction”.
A linear transformation is a special case of a tensor. In the case above, \(L_i^j\) took a vector and returned a vector. But the resulting vector \((Lv)^j\) is an operation on covectors! We can interpret \(L_i^j\) as taking a vector \(v^i\) to return a vector \(L_i^j v^i = (Lv)^j\) which can then take a covector \(w_j\) to return a scalar \((Lv)^j w_j\). More succinctly, \(L_i^j\) takes a vector \(v^i\) and a covector \(w_j\) to return a scalar \(L_i^j v^i w_j\).
In symbols, our tensor index formalism looks like: \(L(v,w) = L_i^j v^i w_j\).
Note how this adds a whole new level of versatility! Instead of just passing vectors to linear transformations to get vectors, you can:
- Pass them a scalar \(\lambda\) to give you a linear transformation \(\lambda L_i^j\)
- Pass them a vector \(v^i\) to give you a vector \(L_i^j v^i = (Lv)^j\)
- Pass them a covector \(w_j\) to give you a covector \(L_i^j w_j = (wL)_i\)
- Pass them a vector \(v^i\) and a covector \(w_j\) to give you a scalar \(L_i^j v^i w_j = wLv\)
- Pass them a linear transformation \(M_k^i\) to give you another linear transformation \(L_i^j M_k^i = (LM)_k^j = LM\)
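The items in this list are all the same operation (sum over a repeated index) applied to different slots. A small sketch with plain nested lists, assuming the storage convention `L[j][i]` for \(L_i^j\) (upper index as row); the function names are hypothetical and chosen only for this illustration:

```python
def apply_to_vector(L, v):
    """(Lv)^j = L_i^j v^i"""
    return [sum(L[j][i] * v[i] for i in range(len(v))) for j in range(len(L))]

def apply_to_covector(L, w):
    """(wL)_i = L_i^j w_j"""
    return [sum(L[j][i] * w[j] for j in range(len(w))) for i in range(len(L[0]))]

def full_contraction(L, v, w):
    """Scalar L_i^j v^i w_j"""
    return sum(L[j][i] * v[i] * w[j]
               for j in range(len(w)) for i in range(len(v)))

def compose(L, M):
    """(LM)_k^j = L_i^j M_k^i"""
    return [[sum(L[j][i] * M[i][k] for i in range(len(M)))
             for k in range(len(M[0]))]
            for j in range(len(L))]

L = [[1, 2], [3, 4]]
M = [[0, 1], [1, 0]]
v = [1, -1]   # components v^i
w = [2, 1]    # components w_j

Lv = apply_to_vector(L, v)
wL = apply_to_covector(L, w)

assert Lv == [-1, -1]
assert wL == [5, 8]
# Contracting both slots agrees with applying w to Lv, and wL to v.
assert full_contraction(L, v, w) == sum(a * b for a, b in zip(w, Lv)) == -3
assert full_contraction(L, v, w) == sum(a * b for a, b in zip(wL, v))
assert compose(L, M) == [[2, 1], [4, 3]]
```

In matrix language these are \(Lv\), \(w^{T}L\), \(w^{T}Lv\) and \(LM\); the index notation just makes it explicit which slot is being fed.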
Now you can extend this to tensors, which are multilinear functionals. You represent a tensor \(T\) that takes \(p\) vectors and \(q\) covectors as \(T_{j_1…j_p}^{i_1…i_q}\). Therefore:
$$ T(v_{(1)},…,v_{(p)},w_{(1)},…,w_{(q)})=T_{j_1…j_p}^{i_1…i_q} {v_{(1)}}^{j_1}…{v_{(p)}}^{j_p} {w_{(1)}}_{i_1}…{w_{(q)}}_{i_q} $$
I know, this is confusing! The things \((1)…(p)\) and \((1)…(q)\) in parentheses are labels of our lists of vectors and covectors. The \(v\)s are the vectors, while the \(w\)s are the covectors. The indices not in parentheses are the actual indices we talked about.
The pair \((p,q)\) (which is the number of lower and upper indices respectively) is the rank of the tensor. For reasons I won’t explain here, a rank-\((p,q)\) tensor is called a \(p\)-covariant \(q\)-contravariant tensor. Similarly, a vector is called a contravariant vector, while a covector is called a covariant vector.
Of course, our index formalism allows us to pass only some of the arguments and obtain a lower rank tensor (because some indices are killed in the process). For instance, passing a single vector \(v\) into the first slot gives a new tensor \(S\) of rank \((p-1,q)\):
$$ T_{j_1 j_2…j_p}^{i_1…i_q} v^{j_1} = S_{j_2…j_p}^{i_1…i_q} $$
But we can also “build up”. And this relates directly to the tensor product.
The tensor product allows you to build tensors. If, again, you let the \(v\)s represent the vectors, while the \(w\)s represent the covectors:
$$ v_{(1)}\otimes …\otimes v_{(q)} \otimes w_{(1)}\otimes …\otimes w_{(p)} = {v_{(1)}}^{i_1}…{v_{(q)}}^{i_q} {w_{(1)}}_{j_1}… {w_{(p)}}_{j_p} = T_{j_1…j_p}^{i_1…i_q} $$
Which is a rank-\((p,q)\) tensor (meaning you can pass \(p\) vectors and \(q\) covectors to it). If you want to compute the output of this tensor when given another tensor, you once again do it by index contraction. But to “build it up” we simply concatenated upper and lower indices. In this way, killing indices corresponds to performing an inner product, and this concatenation usually goes by the name outer product.
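Here is a hedged sketch of the smallest interesting case: the outer product of one vector and one covector, followed by contractions. The component lists and the names `outer`, `contract_vector` and `contract_all` are assumptions made up for this example.

```python
def outer(a, w):
    """Outer product T_j^i = a^i w_j, stored as T[i][j]."""
    return [[ai * wj for wj in w] for ai in a]

def contract_vector(T, u):
    """Kill the lower index: S^i = T_j^i u^j (what remains is a vector)."""
    return [sum(T[i][j] * u[j] for j in range(len(u))) for i in range(len(T))]

def contract_all(T, u, c):
    """Kill both indices: T_j^i u^j c_i (what remains is a scalar)."""
    return sum(T[i][j] * u[j] * c[i]
               for i in range(len(T)) for j in range(len(T[0])))

a = [1, 2]    # vector components a^i
w = [3, -1]   # covector components w_j
T = outer(a, w)
assert T == [[3, -1], [6, -2]]

u = [1, 1]    # a vector to feed in
c = [0, 1]    # a covector to feed in

# Feeding u leaves the vector (w_j u^j) a^i = 2 * a
assert contract_vector(T, u) == [2, 4]
# Feeding both gives the product of the two pairings: (w_j u^j)(c_i a^i) = 2 * 2
assert contract_all(T, u, c) == 4
```

Building up multiplies components without summing (indices are concatenated), while evaluating sums over each repeated index (indices are killed).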
So what is a tensor?
- A multilinear functional \(V^p \times {V^*}^q \to K\)
- A member of the tensor product \({V}^{\otimes q} \otimes {V^*}^{\otimes p}\)
- More generally, you can throw in all tensors (of all ranks) on a single space, with addition, scalar multiplication and the tensor product (what is called an algebra)
- They are represented as:
$$ v_{(1)}\otimes …\otimes v_{(q)} \otimes w_{(1)}\otimes …\otimes w_{(p)} = {v_{(1)}}^{i_1}…{v_{(q)}}^{i_q} {w_{(1)}}_{j_1}… {w_{(p)}}_{j_p} = T_{j_1…j_p}^{i_1…i_q} $$
Which endows them with a powerful index formalism.
Next up is the exterior algebra. It’s a bit similar. Well, it is almost exactly the same, with one slight tweak. The exterior algebra of a vector space is almost the same as its tensor algebra, but with the condition that \(v\otimes v = 0\). To tell the two apart, the product is now written \(v_1 \wedge v_2\), with \(v_1,v_2 \in V\). We proceed in the same way as with the tensor algebra: we build products and add them up. The resulting structure is the exterior algebra \(\bigwedge V\). But our little new restriction \(v\wedge v = 0\) means some things turn out a bit different. In addition to the properties of the tensor product, you can prove that:
- The product is anticommutative. For \(v_1,v_2 \in V\), \(v_1 \wedge v_2 = -(v_2 \wedge v_1)\).
- If two factors in the product are the same, the product is zero. Even more, because of multilinearity, if two factors are linearly dependent, then the product is zero.
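Both properties can be checked by hand using only bilinearity and \(v\wedge v=0\). A short worked derivation; the basis vectors \(e_1,e_2\) and the scalars \(a,b,c,d\) are illustrative names I am introducing here. First, anticommutativity, by expanding the square of a sum:

$$ 0 = (u+v)\wedge(u+v) = u\wedge u + u\wedge v + v\wedge u + v\wedge v = u\wedge v + v\wedge u $$

Then the product of two arbitrary vectors in a two-dimensional space:

$$ (a e_1 + b e_2)\wedge(c e_1 + d e_2) = ad\, e_1\wedge e_2 + bc\, e_2\wedge e_1 = (ad-bc)\, e_1\wedge e_2 $$

The coefficient \(ad-bc\) is a \(2\times 2\) determinant, which vanishes exactly when the two factors are linearly dependent.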
We say \(\bigwedge V\) is the quotient algebra of \(\tensalg{V}\) by the two-sided ideal generated by all elements of the form \(v\otimes v\), and write, somewhat informally, \(\bigwedge V = \tensalg{V} / v\otimes v\).
We can again also define the \(k\)-th exterior power of \(V\), denoted \(\bigwedge^k V\), which restricts the exterior algebra to just products of exactly \(k\) vectors in \(V\). Elements of the \(k\)-th exterior power of \(V\) are called \(k\)-vectors. Similarly to the tensor algebra, we have:
$$ \bigwedge V = \bigwedge\nolimits^0 V \oplus \bigwedge\nolimits^1 V \oplus \bigwedge\nolimits^2 V \oplus \dots $$
Where \(\bigwedge^0 V=K\). Note this direct sum is not infinite, since \(\bigwedge^k V = 0\) for \(k\) greater than the dimension of \(V\).
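One way to see why the sum stops is to list a basis. Assuming a basis \(e_1,\dots,e_n\) of \(V\), anticommutativity lets us sort the factors of any product of basis vectors, and repeated factors give zero, so the products \(e_{i_1}\wedge\dots\wedge e_{i_k}\) with \(i_1<\dots<i_k\) span \(\bigwedge^k V\) (they are in fact a basis). A small sketch that enumerates them; `exterior_basis` is a hypothetical helper name:

```python
from itertools import combinations
from math import comb

def exterior_basis(n, k):
    """Index tuples (i1 < ... < ik) labelling the basis k-vectors e_i1 ∧ ... ∧ e_ik."""
    return list(combinations(range(1, n + 1), k))

n = 3
assert exterior_basis(n, 2) == [(1, 2), (1, 3), (2, 3)]

# Dimensions of the exterior powers for k = 0, ..., n + 1
dims = [len(exterior_basis(n, k)) for k in range(n + 2)]
assert dims == [1, 3, 3, 1, 0]                     # nothing left once k > n
assert dims == [comb(n, k) for k in range(n + 2)]  # binomial coefficients
assert sum(dims) == 2 ** n                         # dimension of the whole exterior algebra
```

So for an \(n\)-dimensional \(V\) the whole exterior algebra has dimension \(2^n\), in contrast with the tensor algebra, which is infinite-dimensional as soon as \(V\neq 0\).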
With this, you can work out bases for tensor and exterior products of vector spaces. If you think about it a little bit more, you can think about how a linear transformation on a vector space \(V\) induces a unique linear transformation in tensor and exterior powers (and its tensor and exterior algebra). This ultimately leads into defining these algebras by means of universal properties. Cute.
But what’s all this for?
This question will be ultimately resolved when we carry these constructions into the context of manifolds. But for now, let’s just say tensors and \(k\)-vectors ultimately seem to encode geometric constructs.
How tensors relate to geometry looks simple at first glance. They are, after all, just linear transformations. The power of tensors lies more in the language they provide us to talk about more complex (multi)linear transformations. Either way, here is a perhaps unexpected example of a tensor: an inner product.
$$ \langle \cdot ; \cdot \rangle : V\times V \to K $$
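Well… this is a multilinear map (at least when \(K=\R\); over \(\mathbb{C}\) an inner product is conjugate-linear in one of its slots). So we are done. We can write the inner product as a tensor \(T_{ij}\), and now the inner product between \(v\) and \(u\) is \(T_{ij}v^iu^j\).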
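As a quick sanity check of the formula \(T_{ij}v^iu^j\), here is a sketch in \(\R^2\). The identity matrix gives back the usual dot product; the second matrix `gram` is an arbitrary symmetric positive-definite example I picked for illustration, and `bilinear` is a hypothetical helper name.

```python
def bilinear(T, v, u):
    """Evaluate T_ij v^i u^j, with T stored as T[i][j]."""
    return sum(T[i][j] * v[i] * u[j]
               for i in range(len(v)) for j in range(len(u)))

identity = [[1, 0], [0, 1]]  # components of the standard dot product
gram = [[2, 1], [1, 3]]      # another inner product (assumed example)

v, u = [1, 2], [3, -1]

# With T_ij the identity we recover the ordinary dot product.
assert bilinear(identity, v, u) == sum(a * b for a, b in zip(v, u)) == 1

# A different T_ij gives a different inner product on the same vectors...
assert bilinear(gram, v, u) == 5
# ...which is still symmetric and positive on v.
assert bilinear(gram, v, u) == bilinear(gram, u, v)
assert bilinear(gram, v, v) > 0
```

The point is that the vectors did not change, only the tensor \(T_{ij}\) did: the inner product is extra data on the space, and the index notation keeps track of it explicitly.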
The true power of all this will manifest itself when we talk about integration of forms and tensor fields on manifolds. I realize this post was perhaps all syntax and little true content. But this language is extremely useful to express powerful ideas. We shall soon see.