\(\newcommand{\atantwo}{\text{atan2}} \) \(\newcommand{\R}{\mathbb{R}} \) \(\newcommand{\vec}[1]{\boldsymbol{\mathbf{#1}}} \) \(\newcommand{\ver}[1]{\boldsymbol{\mathbf{\hat #1}}} \) \(\newcommand{\tensalg}[1]{\mathcal{T}(#1)} \)

Everyone seems comfortable with polar coordinates when they see them.  Just another way to describe points on a plane, right? Not for me. For some reason, they made me very uncomfortable.

We can translate between Cartesian and polar coordinates as:

$$ \begin{align} x &= r\cos \theta \\ y &= r\sin \theta \end{align} $$

Of course, this transformation is invertible, and we can get:

$$ \begin{align} r &= \sqrt{x^2+y^2} \\ \theta &= \atantwo(y,x) \end{align} $$
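As a quick numerical sanity check of the round trip (the function and variable names below are mine; note that `math.atan2` returns angles in \((-\pi,\pi]\)):

```python
import math

def to_polar(x, y):
    # r = sqrt(x^2 + y^2), theta = atan2(y, x)
    return math.hypot(x, y), math.atan2(y, x)

def to_cartesian(r, theta):
    return r * math.cos(theta), r * math.sin(theta)

x, y = 3.0, 4.0
r, theta = to_polar(x, y)
x2, y2 = to_cartesian(r, theta)
assert abs(x2 - x) < 1e-12 and abs(y2 - y) < 1e-12
```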

The transformation is also differentiable. However, it is not linear. That’s where my problems started.

You see, \((x,y)\) is a vector. You can write it in coordinates other than the standard ones by applying a change of basis. However, when we map \((x,y) \mapsto (r,\theta)\), the pair \((r,\theta)\) does not behave like a vector. You cannot add these pairs component-wise. You cannot multiply them by a scalar.

All right then. Let’s just view \(\R^2\) as a set, and this map as a correspondence with \([0,\infty)\times [0,2\pi) \). This seems a bit arbitrary. After all, we are describing the same plane in different ways, and this way of seeing things does not tell us that. This is just an arbitrary set isomorphism. Uninteresting. We should see some underlying geometry going on.

And then it gets even worse! We are told we can write vectors in the vector space \(\R^2\) down as \(\vec{r}=r \ver{r} \). Here:

$$ \ver{r}=\begin{bmatrix} \cos \theta \\ \sin \theta \end{bmatrix} $$

So it looks like vector spaces do have something to do with this idea of “change of coordinates”. In fact, we just wrote \((x,y)\) using our new expressions. To complicate things even more, professors usually call this \(\ver{r}\) a “new basis vector”, and tell you it depends on position. They will then give you a differentiable trajectory \(\vec{r}(t)\) and come up with this brilliant idea: let’s compute the velocity! Here we go:

$$ \dfrac{d\vec r}{dt} = \dfrac{dr}{dt}\ver r + r \dfrac{d\ver r}{dt} $$

You see, this new “basis vector” also depends on time, so you gotta use the product rule. And, by the way, there is another basis vector I haven’t told you about.

$$ \ver{\theta}=\begin{bmatrix} -\sin \theta \\ \cos \theta \end{bmatrix} = \dfrac{\partial \ver r}{\partial \theta} $$

I don’t know about you, but I couldn’t tell what these were, exactly. They are certainly not being used as a basis for our position vector, since we just write \(\vec{r}=r \ver{r} \). But when we calculate the velocity, or the acceleration, they do show up as a basis:

$$ \dfrac{d\vec r}{dt} = \dfrac{dr}{dt}\ver r + r \dfrac{d\theta}{dt}\ver\theta $$

$$ \dfrac{d^2\vec r}{dt^2} = \left[ \dfrac{d^2r}{dt^2} - r\left( \dfrac{d\theta}{dt} \right)^2 \right] \ver r + \left[ 2\dfrac{dr}{dt}\dfrac{d\theta}{dt} + r\dfrac{d^2\theta}{dt^2} \right] \ver \theta $$
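These formulas follow from repeatedly applying the product and chain rules to \(\vec r = r\ver r\). If you would rather not trust the algebra, here is a sketch that checks the acceleration formula symbolically with sympy (the variable names are mine):

```python
import sympy as sp

t = sp.symbols('t')
r = sp.Function('r')(t)
theta = sp.Function('theta')(t)

# Position in Cartesian components: (r cos(theta), r sin(theta))
pos = sp.Matrix([r * sp.cos(theta), r * sp.sin(theta)])
acc = pos.diff(t, 2)

# The moving frame at the current point
r_hat = sp.Matrix([sp.cos(theta), sp.sin(theta)])
theta_hat = sp.Matrix([-sp.sin(theta), sp.cos(theta)])

# Claimed radial and angular components of the acceleration
a_r = r.diff(t, 2) - r * theta.diff(t)**2
a_theta = 2 * r.diff(t) * theta.diff(t) + r * theta.diff(t, 2)

# The difference should simplify to the zero vector
assert sp.simplify(acc - (a_r * r_hat + a_theta * theta_hat)) == sp.zeros(2, 1)
```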

These “basis vectors” depend on position. They assign a basis to each point in space, in such a way that we can “describe” vectors “attached” to that point, such as velocities or accelerations. That is something we can understand conceptually: we just draw orthogonal arrows at each point of the plane. But still, what’s going on behind the scenes, formally? I had never seen a mapping from a vector (position) to a basis of a vector space “attached” to that vector. That kind of thing never showed up in linear algebra class.

And this is where things get interesting.

Our goal is to understand polar coordinates. Here is more or less how I stumbled upon each of the pieces.

Stumble 2: Smooth manifolds and tangent spaces

Yep, we are not starting in order. My “stumble 1” will be in another post.

“Manifold” is a thing I kept hearing about. They turned out to be a key piece of the puzzle. Specifically, I want to talk about smooth manifolds.

An \(n\)-dimensional manifold is what you get when you take many copies of \(\R^n\), deform them continuously and then put the pieces together (technically, the resulting topological space needs to be Hausdorff, so we can safely talk about certain things). We say a manifold is locally homeomorphic to \(\R^n\). Let’s put this more precisely!

A manifold \(M\) comes equipped with charts. Every point \(x\in M\) has an open neighborhood \(U\) such that there is a homeomorphism (continuous map with a continuous inverse) \(\phi\) between \(U\) and an open ball in Euclidean space. The ordered pair \((U,\phi)\) is called a chart, and it gives a way to relate each piece of \(M\) to something we know really well. That is, if you get really close to a point on \(M\), it looks just as if you continuously twisted \(\R^n\) a little bit. These nice tools are collected in an atlas \(\mathcal{A} = \{(U_\alpha,\phi_\alpha) \}_\alpha \), where the \(U_\alpha\) cover all of \(M\). Of course, many atlases may end up defining the same manifold (compare this to how many different parameterizations can produce the same curve or surface). By the way, charts \((U,\phi)\) are often said to give local coordinates on \(U\).

For example, the boundary of a unit square is a manifold. To see that, you can parameterize each side, and then glue together the parameterizations to make four charts (charts need to be maps between open sets, and they need to cover the entire manifold in order to make an atlas). Here is how you could do it.

Each pair of “parentheses” shows where the corresponding chart starts and ends. You can see there is overlap between them.

Yes, we are starting to get to an important point. But first, we need additional structure. For reasons that will be clear below, we would like our manifolds to be smooth.

We say two charts \(\phi_i, \phi_j\) on domains \(U_i, U_j\) are \(\mathcal{C}^k\)-compatible or \(\mathcal{C}^k\)-related if \(\phi_i \circ \phi_j^{-1}: \phi_j(U_i\cap U_j) \to \phi_i(U_i\cap U_j)\) is \(\mathcal{C}^k\). \(\mathcal{C}^k\)-compatibility is reflexive and symmetric (strictly speaking it is not transitive in general, but it behaves like an equivalence relation once we restrict attention to the charts of an atlas).

Now imagine you already have a manifold with an atlas of \(\mathcal{C}^\infty\)-related charts. Include in this atlas all other charts which are \(\mathcal{C}^\infty\)-related to the charts you already have in the atlas. Continue this process until you don’t have more charts left to include.  This is your unique maximal atlas, and your manifold is now a \(\mathcal{C}^\infty\) or smooth manifold.

You might wonder why we went through all this trouble. The reason for this is that we want to be able to talk about differentiability of functions \(f:M\to \R\). We could say \(f\) is differentiable on \(U_i\) whenever \(f\circ \phi_i^{-1}\) is. But that fails if we consider an overlapping chart \((U_j,\phi_j)\) with \(U_i\cap U_j \neq \emptyset \), since even if \(f\circ \phi_i^{-1}\) is differentiable, \(f\circ \phi_j^{-1}\) need not be. Indeed, if we restrict the domains to \(U_i\cap U_j\):

$$ f \circ \phi_j^{-1} = f\circ \phi_i^{-1} \circ (\phi_i \circ \phi_j^{-1}) $$

And, given that \(f\circ \phi_i^{-1}\) is differentiable, \(f\circ \phi_j^{-1}\) is guaranteed to be differentiable only when the transition map \(\phi_i \circ \phi_j^{-1}\) is. So we are stuck with a function that may or may not be differentiable at a point depending on our arbitrary choice of charts. That’s not good.

But wait! We just said \(\phi_i \circ \phi_j^{-1}\) is always smooth in a smooth manifold. Bingo! We now have a consistent notion of differentiability, but just on manifolds that happen to be smooth. Note \(f \circ \phi_j^{-1} = f\circ \phi_i^{-1} \circ (\phi_i \circ \phi_j^{-1})\) is an equality between a scalar field on \(\R^n\) and the composition of a scalar field on \(\R^n\) and a vector field on \(\R^n\). We know how to differentiate these. Let’s do it:

$$ \nabla [f \circ \phi_j^{-1}] = \vec{J}_{\phi_i \, \circ \, \phi_j^{-1}}^\top \nabla [f \circ \phi_i^{-1}]  $$

Where \(\vec{J}_{\phi_i\, \circ \, \phi_j^{-1}}^\top\) denotes the transpose of the Jacobian matrix of \(\phi_i \circ \phi_j^{-1}\). This tells us how the gradient of \(f\) changes when we change charts. Or, in more familiar terms, it tells us how the gradient of \(f\) changes when we change coordinates.
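We can check this chain-rule identity on a concrete pair of charts. The sketch below (sympy; the specific test function \(f\) and the polar-to-Cartesian transition map are my choices) verifies that the gradient computed through one chart equals the transposed Jacobian of the transition map applied to the gradient computed through the other:

```python
import sympy as sp

x, y = sp.symbols('x y')
r, th = sp.symbols('r theta', positive=True)

# f written through the Cartesian chart phi_i (the identity)
f_i = x**2 * y

# Transition map phi_i o phi_j^{-1}: (r, theta) -> (x, y)
trans = sp.Matrix([r * sp.cos(th), r * sp.sin(th)])

# f written through the polar chart phi_j
f_j = f_i.subs([(x, trans[0]), (y, trans[1])])

grad_i = sp.Matrix([f_i.diff(x), f_i.diff(y)])   # gradient through chart i
grad_j = sp.Matrix([f_j.diff(r), f_j.diff(th)])  # gradient through chart j
J = trans.jacobian([r, th])                      # Jacobian of the transition map

# grad_j = J^T grad_i, with grad_i evaluated at the corresponding point
rhs = J.T * grad_i.subs([(x, trans[0]), (y, trans[1])])
assert sp.simplify(grad_j - rhs) == sp.zeros(2, 1)
```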

Let’s talk a bit about coordinates. Say we have a point \(p\in U \subseteq M\), and a chart \((U,\phi)\) around it. The mapping \(\phi\) translates our points on the manifold to points on \(\R^n\). That means each \(p\in U\) has corresponding coordinates \(\phi(p)=(x^1,\dots ,x^n)\). In other words, \(\phi\) sets up local coordinates on \(U\), given by the coordinate functions \(\phi(p)=(x^1(p),\dots ,x^n(p))\). These coordinate functions project a point in the manifold to its \(i\)-th coordinate relative to some specified chart. So we can say things like: charts \(\phi, \varphi\) on the same domain give rise to different coordinates \(x^i, y^i\).

In the image, you can see how this plays out. The inverse of the chart \(x\) moves the coordinate grid from some region in \(\R^n\) onto an open set on the manifold.

Now we will make this a little bit more interesting (though I do hope you are already interested). We said the gradient “changes” when we change coordinates. However, the gradient is still a single, well-defined object. It should not change its direction or magnitude just because we chose another arbitrary chart. To dive into this question, let’s throw more stuff into the mix.

To start thinking about directions, we will consider a smooth curve on the manifold \(\gamma: [0,1]\to M\). Differentiability for curves is defined in a way analogous to that of functions. You can probably guess where this is going: we will try to define tangent vectors. Let’s get us an arbitrary smooth function \(f:M\to\R\). Now, let’s look at \(f\circ\gamma\), which is just \(f\) restricted to the curve in the manifold. If we make the curve start at \(\gamma(0)=p\), then calculating its derivative at \(t=0\) should tell us how \(f\) changes as we move a tiny bit away from the starting point. You can think of it this way. In multivariable calculus, the quantity:

$$ \left. \dfrac{d(f\circ \gamma)(t)}{dt}\right|_{t=0} $$

Is the directional derivative of \(f\) in the direction \(\gamma'(0)\), and we have:

$$ \left. \dfrac{d(f\circ \gamma)(t)}{dt}\right|_{t=0} = \nabla f(\gamma(0)) \cdot \gamma'(0) $$
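As a quick numerical check of this multivariable identity (the specific \(f\), point \(p\), and curve \(\gamma\) below are my choices), note that only \(\gamma'(0)\) matters, not the higher-order behavior of the curve:

```python
import math

def f(x, y):
    return math.sin(x) * y

def grad_f(x, y):
    return (math.cos(x) * y, math.sin(x))

p = (0.5, 1.5)
v = (2.0, -1.0)  # gamma'(0)

def gamma(t):
    # A curve with gamma(0) = p and gamma'(0) = v; the t^2 terms are irrelevant at t = 0
    return (p[0] + v[0] * t + t**2, p[1] + v[1] * t + 3 * t**2)

h = 1e-6
num = (f(*gamma(h)) - f(*gamma(-h))) / (2 * h)     # d(f o gamma)/dt at t = 0
exact = sum(g * c for g, c in zip(grad_f(*p), v))  # grad f(p) . gamma'(0)
assert abs(num - exact) < 1e-6
```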

This is what we are trying to model. Going back to the context of a manifold, we have:

$$ \begin{align*} \left. \dfrac{d(f\circ \gamma)(t)}{dt}\right|_{t=0} &= \left. \dfrac{d(f\circ \phi^{-1} \circ \phi \circ \gamma)(t)}{dt}\right|_{t=0} \\ &= \displaystyle \sum_i \left. \dfrac{\partial (f\circ \phi^{-1})(x)}{\partial x_i}\right|_{\phi(p)} \left. \dfrac{d(\phi \circ \gamma)^i(t)}{dt}\right|_{t=0} \end{align*} $$

Where \((\phi \circ \gamma)^i\) refers to the \(i\)-th component. This is the equivalent of our directional derivative. We may recognize:

$$ v^i = \left. \dfrac{d(\phi \circ \gamma)^i(t)}{dt}\right|_{t=0} $$

As the components of our tangent vector in some basis. Which basis? Well, we don’t know yet! Let’s find out.

The first thing we are going to do is modify our notation a little bit. These \(\phi\) everywhere are getting annoying. If \(x=(x^1,…,x^n)\) are local coordinates set up by \(\phi\), we write:

$$ \left( \dfrac{\partial}{\partial x^i} \right)_p (f) = \left. \dfrac{\partial (f\circ \phi^{-1})(x)}{\partial x_i}\right|_{\phi(p)} $$

You can notice one expression has \(\partial x_i\) and the other has \(\partial x^i\). The first one refers to the partial derivative of any function \(\R^n \to \R\) with respect to the \(i\)-th input. The second one refers to a partial derivative of a function \(M\to \R\) when written in coordinates \((x^1,…,x^n)\).

Going back to the components \(v^i\), we see they are defined in terms of a smooth curve and with respect to a chart. Two problems here. First, we may have another curve \(\alpha\) such that we also have:

$$ v^i = \left. \dfrac{d(\phi \circ \alpha)^i(t)}{dt}\right|_{t=0} $$

So we cannot simply define:

$$ \vec v = \left. \dfrac{d(\phi \circ \gamma)(t)}{dt}\right|_{t=0} $$

As our tangent vector. If we did, many different curves would give rise to the very same tangent vector. Also… tangent vectors would depend on our arbitrary choice of chart! Instead, we define a tangent vector at \(p\) as an equivalence class of curves \([\gamma]\) such that for all \(\gamma_1,\gamma_2 \in [\gamma]\), we have \(\gamma_1(0)=\gamma_2(0)=p\) and the first-order contact relation:

$$ \left. \dfrac{d(\phi \circ \gamma_1)(t)}{dt}\right|_{t=0} = \left. \dfrac{d(\phi \circ \gamma_2)(t)}{dt}\right|_{t=0} $$

This is usually called “the equivalence class of curves initialized at \(\gamma(0)=p\) under first-order contact”. The second problem is that our equivalence classes seem to depend on the chart. But they don’t! If two curves agree on their derivative at a point on a given chart, then their derivatives when taken with any chart also agree. In other words, changing coordinates leaves our equivalence classes invariant.
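Here is a tiny symbolic sketch of first-order contact (the example curves are mine, and the chart is the identity on \(\R^2\)): two visibly different curves through the same point land in the same equivalence class because their derivatives at \(t=0\) agree.

```python
import sympy as sp

t = sp.symbols('t')
p = sp.Matrix([1, 2])
v = sp.Matrix([3, -1])

# Two different curves through p with the same velocity at t = 0
gamma1 = p + t * v
gamma2 = p + t * v + t**2 * sp.Matrix([5, 7])  # extra second-order wiggle

# Both are initialized at p
assert gamma1.subs(t, 0) == p and gamma2.subs(t, 0) == p
# First-order contact: same derivative at t = 0, so [gamma1] = [gamma2]
assert gamma1.diff(t).subs(t, 0) == gamma2.diff(t).subs(t, 0)
```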

Now we have a consistent definition of tangent vectors: a tangent vector at \(p\) is an equivalence class of curves initialized at \(p\) under the equivalence relation:

$$ \gamma_1 \equiv \gamma_2 \iff  \left. \dfrac{d(\phi \circ \gamma_1)(t)}{dt}\right|_{t=0} = \left. \dfrac{d(\phi \circ \gamma_2)(t)}{dt}\right|_{t=0} $$

And we can prove that if two curves are equivalent on a given chart, then they are equivalent on any chart. For this to be true, we need a smooth manifold.

We can make this set of equivalence classes into a vector space. To do that, we just need the fact that changing charts does not alter our equivalence classes and define:

$$ [\gamma_1]+[\gamma_2]=[\gamma_3] $$

Where \(\gamma_3\) is such that for any chart:

$$ (\phi \circ \gamma_3)'(0) = (\phi \circ \gamma_1)'(0) + (\phi \circ \gamma_2)'(0) $$

We define scalar multiplication analogously. This makes these things into a vector space, which we appropriately call the tangent space of \(M\) at \(p\), denoted \(T_pM\).

We will now hunt down a basis for our new vector space \(T_pM\). To do that, we will return to:

$$ \left. \dfrac{d(f\circ \gamma)(t)}{dt}\right|_{t=0} = \displaystyle \sum_i \left( \dfrac{\partial}{\partial x^i} \right)_p (f) \left. \dfrac{d(x^i \circ \gamma)(t)}{dt}\right|_{t=0} $$

Here we recognize the components of a vector, \(v^i=(x^i\circ \gamma)'(0)\), with respect to some basis in coordinates \(x^i\). In this sense, each tangent vector gives us a directional derivative for \(f\), which we can calculate given a chart. And each directional derivative gives us the components of a vector in a given chart. There is a one-to-one correspondence between directional derivatives at a point and tangent vectors at that point.

Not only that, but \(f\) in that expression was a completely arbitrary smooth function. Since the equality holds for any \(f\), we are actually talking about a relation between differential operators:

$$ D_\gamma = \displaystyle \sum_i \left. \dfrac{d(x^i \circ \gamma)(t)}{dt}\right|_{t=0} \left( \dfrac{\partial}{\partial x^i} \right)_p $$

Where \(D_\gamma\) is the directional derivative at \(p\) in the direction of \([\gamma]\). This is the directional derivative expressed in coordinates. But hey, there is a one-to-one correspondence of these things with tangent vectors. Whenever that happens, mathematicians just wave their hands and say they are just two different ways of writing the same object. It’s good for business. So tangent vectors at \(p\) in a given chart are written as:

$$ \vec v = \displaystyle \sum_i \left. \dfrac{d(x^i \circ \gamma)(t)}{dt}\right|_{t=0} \left( \dfrac{\partial}{\partial x^i} \right)_p $$

And the things:

$$ \vec{e}_i = \left( \dfrac{\partial}{\partial x^i} \right)_p $$

Are our basis vectors!

There are obviously \(n\) basis vectors, so \(T_pM\) is an \(n\)-dimensional vector space. As such, it is isomorphic to \(\R^n\) and we can think of this (for now!) as each point of the manifold having a copy of \(\R^n\) attached to it. We will later see to what extent this is true.

One last thing. Our vector:

$$ \vec v = \displaystyle \sum_i v^i \left( \dfrac{\partial}{\partial x^i} \right)_p $$

Was given on a specified chart. What if we wanna change charts? We are looking for the change of coordinates formula.

Recall that:

$$ \left( \dfrac{\partial}{\partial x^i} \right)_p (f) = \left. \dfrac{\partial (f\circ x^{-1})(x)}{\partial x_i}\right|_{x(p)} $$

We introduce new local coordinates \(y^i\) and a little trick:

$$ \begin{align*} \left. \dfrac{\partial (f\circ y^{-1})}{\partial y_i}\right|_{y(p)} &= \left. \dfrac{\partial (f\circ x^{-1}\circ x\circ y^{-1})}{\partial y_i}\right|_{y(p)} \\ &= \displaystyle \sum_j \left. \dfrac{\partial (f\circ x^{-1})}{\partial x_j} \right|_{x(p)} \left. \dfrac{\partial(x^j \circ y^{-1})}{\partial y_i} \right|_{y(p)} \end{align*} $$

Using our new notation:

$$ \begin{align*} \left. \dfrac{\partial (f\circ x^{-1})}{\partial x_j} \right|_{x(p)} &= \left( \dfrac{\partial}{\partial x^j} \right)_{p} (f) \\  \left. \dfrac{\partial(x^j \circ y^{-1})}{\partial y_i} \right|_{y(p)} &= \left( \dfrac{\partial}{\partial y^i} \right)_{p} (x^j) \end{align*} $$

And we may rewrite the previous equality as:

$$ \left( \dfrac{\partial}{\partial y^i} \right)_{p} = \displaystyle \sum_j  \left( \dfrac{\partial}{\partial x^j} \right)_{p} \left( \dfrac{\partial}{\partial y^i} \right)_{p} (x^j) $$

Or even:

$$ \dfrac{\partial}{\partial y^i} = \displaystyle \sum_j \dfrac{\partial x^j}{\partial y^i}  \dfrac{\partial}{\partial x^j} $$

This is the change of coordinates formula for the basis vectors. It’s saying that the partial derivative operator in coordinates \(y^i\) is a linear combination of partial derivative operators in overlapping coordinates \(x^j\), with the coefficients being the partial derivatives of the coordinate functions \(x^j\) taken through the chart given by the \(y^i\).
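We can verify this operator identity for a concrete change of coordinates by applying both sides to a test function (sympy; the test function is my choice, with Cartesian coordinates \(x^j\) and polar coordinates \(y^i = (r,\theta)\)):

```python
import sympy as sp

x, y = sp.symbols('x y')
r, th = sp.symbols('r theta', positive=True)

# Coordinate functions x^j expressed through the polar chart y = (r, theta)
x_expr = r * sp.cos(th)
y_expr = r * sp.sin(th)

# An arbitrary test function, written in Cartesian coordinates
f = sp.exp(x) * y + x * y**2
f_polar = f.subs([(x, x_expr), (y, y_expr)])

# Left side: (partial/partial r) applied to f in polar coordinates
lhs = f_polar.diff(r)

# Right side: (dx/dr) df/dx + (dy/dr) df/dy, then re-expressed in polar
rhs = (x_expr.diff(r) * f.diff(x) + y_expr.diff(r) * f.diff(y)).subs(
    [(x, x_expr), (y, y_expr)])

assert sp.simplify(lhs - rhs) == 0
```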

We have to figure out how the components transform when we change coordinates. The components in coordinates \(y^i\) were:

$$ v^i_y = \left. \dfrac{d(y^i \circ \gamma)(t)}{dt}\right|_{t=0} $$

To express this in terms of coordinates \(x^i\):

$$ \begin{align*} v^i_y &= \left. \dfrac{d(y^i \circ x^{-1} \circ x \circ \gamma)(t)}{dt}\right|_{t=0} \\ &= \displaystyle \sum_j \left( \dfrac{\partial y^i}{\partial x^j} \right)_p \left. \dfrac{d(x^j \circ\gamma)(t)}{dt} \right|_{t=0} \\ &= \displaystyle \sum_j \left( \dfrac{\partial y^i}{\partial x^j} \right)_p v^j_x \end{align*} $$

So whenever we want to know how to express a vector under new coordinates, we simply use this formula to map \((v_x^j)_j \mapsto v_y^i\) for each \(i\) and then we write:

$$ \vec v = \displaystyle \sum_i v_y^i \dfrac{\partial}{\partial y^i} $$

Now let’s give an example. We will look at \(\R^2\) as a smooth manifold.

To make \(\R^2\) into a smooth manifold we need a maximal atlas of \(\mathcal{C}^\infty\)-related charts. It’s easy to come up with one: start with the identity function \(I(x)=x\) and add in all functions defined on open subsets of \(\R^2\) that are \(\mathcal{C}^\infty\)-related to the identity. This gives us all diffeomorphisms from open sets in \(\R^2\) onto their images, which together form our maximal atlas \(\mathcal{U}\). \(\mathcal{U}\) is called the usual smooth structure on \(\R^2\).

Here are some charts to use as an example. We will use polar coordinates on \(\R^2\) together with Cartesian coordinates (the identity function) on some open set around the origin. This gives:

$$ (x,y) \xrightarrow{\phi_\text{polar}} (r,\theta) = \left(\sqrt{x^2+y^2},\atantwo(y,x)\right) $$

$$ (x,y) \xrightarrow{\phi_\text{Cartesian}} (x,y) $$

Technically, the angle fails to be continuous on the non-negative \(x\)-axis (and is undefined at the origin), so we should exclude that set from the domain of polar coordinates. We can fix this, but we won’t need to. When the charts overlap, the first expression also tells us what \(\phi_\text{polar} \circ \phi_\text{Cartesian}^{-1} \) is.

What are our tangent spaces? Pick a point \(p=(p_1,p_2)\in\R^2\). A tangent vector at this point is an equivalence class \([\gamma] \) of curves initialized at \(p\) such that given coordinates \(x\):

$$ \gamma_1,\gamma_2\in [\gamma] \iff (x\circ\gamma_1)'(0)=(x\circ\gamma_2)'(0) $$

In particular, we may use the identity function as a chart, giving:

$$ \gamma_1,\gamma_2\in [\gamma] \iff (\gamma_1)'(0)=(\gamma_2)'(0) $$

So a tangent vector at \(p\) is the class of all curves starting at \(p\) such that they have the same derivative (we are using the fact that changing charts leaves our classes invariant here!). Recalling that the derivative is a vector tangent to the curve, what we really have is that each equivalence class corresponds to the tangent vector that happens to be the derivative at \(t=0\) of all the curves in that class. So each equivalence class is really a tangent vector at \(p\)! Just little (or big, whatever) arrows attached to \(p\).

The equivalence class records two pieces of information: the point \(p\) at which the curves are initialized and the derivative of the curves at \(t=0\) when taken through a chart.

We may write the tangent vectors in a basis using coordinates \(x^i\):

$$ \vec v = v^1 \left( \dfrac{\partial}{\partial x^1} \right)_{(p_1,p_2)} + v^2 \left( \dfrac{\partial}{\partial x^2} \right)_{(p_1,p_2)} $$

Now, let’s take our charts to be the ones described above (polar + Cartesian). Our vector above ends up being (assuming we are away from the origin):

$$ \vec v = v_r \left( \dfrac{\partial}{\partial r} \right)_{(p_1,p_2)} + v_\theta \left( \dfrac{\partial}{\partial \theta} \right)_{(p_1,p_2)} $$

Now recall the change of coordinates formula for a vector. We will write this vector down in Cartesian coordinates \((x,y)\). Using our change of coordinates formula for our components, we have:

$$ \begin{align*} v_x &= \dfrac{\partial x}{\partial r} v_r + \dfrac{\partial x}{\partial \theta} v_\theta \\ v_y &= \dfrac{\partial y}{\partial r} v_r + \dfrac{\partial y}{\partial \theta} v_\theta \end{align*} $$

Calculating the partial derivatives:

$$ \begin{align*} v_x &= v_r \cos\theta - r v_\theta \sin\theta \\ v_y &= v_r \sin\theta + r v_\theta \cos\theta \end{align*} $$

Putting it all together:

$$ \vec v = (v_r \cos\theta - r v_\theta \sin\theta) \dfrac{\partial}{\partial x} + (v_r \sin\theta + r v_\theta \cos\theta) \dfrac{\partial}{\partial y} $$

The \((r,\theta)\) refer to the coordinates of the point to which \(\vec v\) is tangent. The \(v_r, v_\theta\) are the components of the vector written out in that coordinate system. You may recognize this written out in matrix form:

$$ \vec v = \begin{bmatrix} \cos\theta & - r \sin\theta \\ \sin\theta &  r \cos\theta \end{bmatrix} \begin{bmatrix} v_r \\ v_\theta \end{bmatrix} $$

This transformation matrix takes the components of \(\vec v\) in polar coordinates and spits out its components in Cartesian coordinates.
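Here is a numerical sketch checking exactly that (numpy; the sample point and components are my choices): applying the matrix to polar components reproduces the Cartesian velocity of a curve that realizes the tangent vector.

```python
import numpy as np

r, theta = 2.0, 0.7      # the point, in polar coordinates
vr, vtheta = 0.3, -1.1   # polar components of a tangent vector there

# Change-of-basis matrix from polar to Cartesian components
M = np.array([[np.cos(theta), -r * np.sin(theta)],
              [np.sin(theta),  r * np.cos(theta)]])
vx, vy = M @ np.array([vr, vtheta])

# Cross-check with a curve realizing this tangent vector:
# r(t) = r + vr t, theta(t) = theta + vtheta t, pushed through the polar map
def curve(t):
    rt, tht = r + vr * t, theta + vtheta * t
    return np.array([rt * np.cos(tht), rt * np.sin(tht)])

h = 1e-6
fd = (curve(h) - curve(-h)) / (2 * h)  # finite-difference velocity in Cartesian
assert np.allclose([vx, vy], fd, atol=1e-6)
```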

By the way, check out the columns of the matrix. Look who decided to show up!

$$ \begin{align*} \text{first column} &= \ver r \\ \text{second column} &= r\ver \theta \end{align*} $$

So this matrix is really a change of basis matrix! The difference is that we have one matrix for each tangent space within the chart (since the matrix depends on \((r,\theta)\)). The things \((\ver r, \ver \theta)\) form a basis for each tangent space. That’s why they “depend on position”. They actually depend on which tangent space we want a basis for!

You may be wondering where this extra \(r\) factor comes from. Truth is, the basis given by partial derivatives is not always orthonormal. Here \(\partial/\partial\theta = r\ver\theta\) has length \(r\), and the \(1/r\) is the normalization factor.

We can get our “position-dependent basis vectors” (oh boy, that phrase was so confusing for me!) in yet another way: by transforming our Cartesian basis vectors with our change of coordinates formula.

$$ \begin{align*} \dfrac{\partial}{\partial r} &= \dfrac{\partial x}{\partial r}\dfrac{\partial}{\partial x} + \dfrac{\partial y}{\partial r}\dfrac{\partial}{\partial y} \\ \dfrac{\partial}{\partial \theta} &= \dfrac{\partial x}{\partial \theta}\dfrac{\partial}{\partial x} + \dfrac{\partial y}{\partial \theta}\dfrac{\partial}{\partial y} \end{align*} $$

Calculating the partial derivatives:

$$ \begin{align*} \dfrac{\partial}{\partial r} &= \cos\theta \dfrac{\partial}{\partial x} + \sin\theta \dfrac{\partial}{\partial y} = \ver r \\ \dfrac{\partial}{\partial \theta} &= -r\sin\theta \dfrac{\partial}{\partial x} + r\cos\theta \dfrac{\partial}{\partial y} = r\ver \theta \end{align*} $$

And, again, we got them. There is a comment to make here. The partial derivatives \(\partial/\partial x\) and \(\partial/\partial y\) are with respect to coordinates in the identity chart (which we called Cartesian coordinates). If you go back to the definition of this notation above, you will see these work out to be exactly the partial derivatives we are used to, as they should. The only difference is that they come with an “evaluate at this specific point”. This difference comes from the fact that \(\partial/\partial x\) and \(\partial/\partial y\) are basis vectors for a given tangent space, and hence depend on position.

Now, as basis vectors they tell us how a function changes when we move from our point \(p\) in the direction given by the vector in \(T_p\R^2\). In this case, the standard partial derivatives tell us how functions change when we move in the \(x\) and \(y\) directions. And the partial derivatives in Cartesian coordinates are the standard partial derivatives! That means these “basis vectors” point in the same direction as our “standard basis for \(\R^2\)”. The only difference is, again, that they come with evaluation at the point \(p\).

So the \(\partial/\partial x\) and \(\partial/\partial y\) are essentially the standard basis vectors, with the extra piece of information that they start at \(p\). It’s common to write this out as:

$$ \begin{align*} \left( \dfrac{\partial}{\partial x} \right)_p &= (p, \ver x) \\ \left( \dfrac{\partial}{\partial y} \right)_p &= (p, \ver y) \end{align*} $$

Where \(\ver x, \ver y\) are the standard basis vectors. That gives us a new way to write out our tangent spaces on \(\R^n\) as:

$$ (p,\vec v) \in T_p \R^n $$

Where \(\vec v \in \R^n\). The vector operations on \(T_p \R^n\) are then defined as usual:

$$ \begin{align*} (p,\vec v_1) + (p,\vec v_2) &= (p,\vec v_1 + \vec v_2) \\ \lambda (p,\vec v) &= (p,\lambda \vec v) \end{align*} $$

You can imagine this as moving over a copy of \(\R^n\) to the point \(p\in \R^n\). Like this.
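A minimal sketch of this bookkeeping in Python (the class and method names are mine): a tangent vector is a pair \((p,\vec v)\), and the vector operations only combine vectors attached to the same point.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TangentVector:
    """An element (p, v) of T_p R^n: a vector v attached to the point p."""
    p: tuple
    v: tuple

    def __add__(self, other):
        # Addition is only defined within a single tangent space
        assert self.p == other.p, "cannot add vectors attached to different points"
        return TangentVector(self.p, tuple(a + b for a, b in zip(self.v, other.v)))

    def scale(self, lam):
        return TangentVector(self.p, tuple(lam * a for a in self.v))

u = TangentVector((1.0, 2.0), (3.0, 0.5))
w = TangentVector((1.0, 2.0), (-1.0, 1.5))
assert (u + w).v == (2.0, 2.0)
assert u.scale(2.0).v == (6.0, 1.0)
```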

So why did we pull off all that obscure equivalence class stuff with curves? Why did we not define the tangent space in this way for general manifolds?

Because we can’t. Say we do. Imagine our manifold is a sphere. A sphere \(S\) is a 2-dimensional manifold (we are talking about a surface, so it locally looks like \(\R^2\)). We attach a copy of \(\R^2\) to a point \(p\). Then I give you the vector \((p,(1,2)^\top)\in T_pS\).

Which direction is this tangent vector pointing in? As you can see, you would need a convention on how exactly you are orienting this tangent plane. Our definition in terms of equivalence classes of curves does not have this problem. It clearly formulates a vector in terms of how an arbitrary “test” function changes in a given direction.

So what are polar coordinates? The answer itself is pretty boring. They are just a chart on the manifold \(\R^2\), giving rise to local coordinates and hence basis vectors for each tangent space. That’s how they are “position-dependent”: we are modelling physical space as a smooth manifold! Soon we won’t even care about that anymore; they are just an example. But what we discovered on the way is a whole new world.

Still, there will probably be more posts about this. You damn \((r,\theta)\), I’m not done with you yet!