Higher Homotopy Groups via Vector Calculus


This semester I taught an undergraduate course on topology (continuity, compactness, connectedness, and basic homotopy theory) and on the last day of class I decided to give a brief introduction to the theory of higher homotopy groups. For motivation, consider the classical Brouwer fixed point theorem:

Theorem: Every continuous function f \colon B^2 \to B^2 has a fixed point, where B^2 is the closed unit ball in the plane.
Proof: Suppose f has no fixed point, meaning x and f(x) are distinct for every x \in B^2. Define a function r \colon B^2 \to S^1 (where S^1 is the boundary circle of B^2) as follows. Given x \in B^2 there is a unique line in the plane containing both x and f(x), so there is a unique line segment containing x whose endpoints consist of f(x) and a point on S^1. Define r(x) to be the endpoint on S^1. Explicit calculations (using the continuity of f) show that r is continuous, and moreover if x \in S^1 then r(x) = x. A continuous function from a topological space X to a subset A \subseteq X which restricts to the identity on A is called a retraction; we have shown that if there is a continuous function f \colon B^2 \to B^2 with no fixed points then there is a retraction r \colon B^2 \to S^1.
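The "explicit calculations" for r can be made concrete. Here is a minimal Python sketch, assuming points are stored as (x, y) tuples; the function name retract and the sample map f(x) = -x/2 are hypothetical choices for illustration. Since the theorem says every continuous f has a fixed point, the sample map has one too (the origin), so r is only evaluated away from it.

```python
import math

def retract(x, fx):
    """Return the point where the ray from fx through x meets the unit circle.

    x and fx are points of the closed unit disk with x != fx.
    """
    dx, dy = x[0] - fx[0], x[1] - fx[1]          # direction d of the ray
    a = dx * dx + dy * dy                        # |d|^2 > 0 because x != fx
    b = 2 * (fx[0] * dx + fx[1] * dy)            # 2 * (fx . d)
    c = fx[0] ** 2 + fx[1] ** 2 - 1              # |fx|^2 - 1 <= 0
    # Solve |fx + t d|^2 = 1 and keep the larger root, which satisfies t >= 1.
    t = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
    return (fx[0] + t * dx, fx[1] + t * dy)

def f(x):
    # Hypothetical sample map f(x) = -x/2; its only fixed point is the origin.
    return (-x[0] / 2, -x[1] / 2)

boundary_point = (math.cos(1.0), math.sin(1.0))
interior_point = (0.3, -0.2)
r_boundary = retract(boundary_point, f(boundary_point))
r_interior = retract(interior_point, f(interior_point))
```

For this particular f the formula reduces to r(x) = x/|x|, so points on S^1 are left where they are and interior points are pushed out to the circle.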

Let us use algebraic topology to prove that there is no such retraction. Let i \colon S^1 \to B^2 denote the inclusion map, so that r \circ i \colon S^1 \to S^1 is the identity. Passing to the induced homomorphism on fundamental groups, this shows that r_* \circ i_* \colon \pi_1(S^1) \to \pi_1(S^1) is the identity and hence r_* is surjective. But \pi_1(B^2) is the trivial group since B^2 is contractible and \pi_1(S^1) \cong \mathbb{Z}, so r_* \colon \pi_1(B^2) \to \pi_1(S^1) could not possibly be surjective, a contradiction. QED

One might wonder if the argument above works for the closed unit ball B^{n+1} \subseteq \mathbb{R}^{n+1}. Indeed, the first part of the argument works in higher dimensions almost verbatim, and one gets that any continuous function f \colon B^{n+1} \to B^{n+1} with no fixed points gives rise to a retraction r \colon B^{n+1} \to S^n onto the boundary sphere. But the second part of the argument fails: the fundamental group of S^n is trivial for n > 1, so there is no contradiction. The solution is to replace the fundamental group \pi_1 with the higher homotopy group \pi_n; whereas \pi_1(X) is the group of homotopy classes of continuous maps S^1 \to X, \pi_n(X) is the group of homotopy classes of continuous maps S^n \to X (of course, all spaces, maps, and homotopies must have base points).

In the proof of the Brouwer fixed point theorem above, we only needed three properties of the fundamental group:

  • Every continuous map f \colon X \to Y induces a group homomorphism f_* \colon \pi_1(X) \to \pi_1(Y).
  • If f_1, f_2 \colon X \to Y are homotopic continuous maps then (f_1)_* = (f_2)_* \colon \pi_1(X) \to \pi_1(Y).
  • \pi_1(S^1) is not the trivial group.

The first two of these properties generalize to higher homotopy groups with almost identical proofs. The counterpart of the third property, namely that \pi_n(S^n) is not the trivial group, is considerably more difficult. One typically computes \pi_1(S^1) using covering space theory, but there is no counterpart of covering space theory for higher homotopy groups. (Well, such a theory does exist in a manner of speaking, but it is much more complicated than covering space theory.)

To actually compute \pi_n(S^n) one needs some rather powerful tools in algebraic topology, such as the Freudenthal suspension theorem or the Hurewicz isomorphism. The difficulty of this computation is still a bit mysterious to me, and was the subject of one of my recent MathOverflow questions. Even the more modest goal of proving that \pi_n(S^n) is non-trivial is quite a bit more challenging for n > 1 than for n = 1. Nevertheless, I came up with an argument in the case n = 2 based on vector calculus which is suitable for undergraduates; I don’t think I’ve seen this exact argument written down anywhere else, so I thought I would write it up here. It is adapted from a more standard argument involving Stokes’ theorem on manifolds which works in any dimension (but which requires a semester’s worth of manifold theory to understand).

I will prove the following statement:

Main Theorem: There is no continuous retraction r \colon B^3 \to S^2.

This alone is enough to prove the Brouwer fixed point theorem for B^3 without having to worry about higher homotopy groups, but in fact it implies that \pi_2(S^2) is nontrivial. Pick a base point p \in S^2 and consider the identity map I \colon S^2 \to S^2. This determines a class [I] \in \pi_2(S^2,p), so if \pi_2(S^2,p) is the trivial group then there is a base point preserving homotopy between I and the constant map e_p \colon S^2 \to S^2 given by e_p(x) = p. This homotopy is a continuous map H \colon S^2 \times [0,1] \to S^2 which satisfies:

  • H(x,0) = x
  • H(x,1) = p
  • H(p,t) = p for all t

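Given such a homotopy, define r \colon B^3 \to S^2 by r(x) = H(\frac{x}{|x|}, 1 - |x|) if x \neq 0 and r(0) = p. If x \in S^2 then r(x) = H(x, 0) = x, and r is continuous at 0 because H(y, t) converges to p uniformly in y as t \to 1 (by compactness of S^2). Thus r is a retraction, contradicting the main theorem.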

To prove the main theorem we need a technical lemma:

Lemma: If there is a continuous retraction r \colon B^3 \to S^2 then there is a smooth retraction.

The proof of this lemma uses some slightly complicated analysis, but ultimately it is fairly standard; see the final chapter of Gamelin and Greene’s “Introduction to Topology”, for example. The only other non-trivial input required to prove the main theorem is the following classical result from vector calculus:

Divergence Theorem: Let E be a compact subset of \mathbb{R}^3 whose boundary is a piecewise smooth surface \partial E, let n denote the outward unit normal field on \partial E, and let F be a smooth vector field on E. Then:

\int_{\partial E} F \cdot n\, dS = \int_E div F\, dV

Here div (“divergence”) is the differential operator div(P,Q,R) = P_x + Q_y + R_z.

Proof of Main Theorem:
By the previous lemma it suffices to show that there is no smooth retraction from B^3 to S^2, so suppose r \colon B^3 \to S^2 is such a retraction and denote its component functions by r = (P,Q,R). Thus r may be viewed as a smooth vector field on B^3; since r(v) = v for v \in \partial B^3 = S^2 we have P(x,y,z) = x, Q(x,y,z) = y, and R(x,y,z) = z for every (x,y,z) \in S^2.

Consider the smooth vector field F = P(\nabla Q \times \nabla R) where \nabla is the gradient operator. We will compute the integral of F over S^2 in two different ways and get two different answers, giving a contradiction. Both computations will use the divergence theorem:

\int_{S^2} F \cdot n\, dS = \int_{B^3} div F\, dV

The first computation uses a bit of vector calculus. By the product rule for the divergence of a function multiplied by a vector field, we have:

div(P (\nabla Q \times \nabla R)) = \nabla P \cdot (\nabla Q \times \nabla R) + P div(\nabla Q \times \nabla R)

The second term on the right-hand side vanishes by the product rule for the divergence of the cross product of two vector fields:

div(\nabla Q \times \nabla R) = \nabla R \cdot curl(\nabla Q) - \nabla Q \cdot curl(\nabla R) = 0

Here we used the fact that the curl of the gradient of any smooth function is the zero vector.

According to the standard “triple product” formula from vector algebra, the first term \nabla P \cdot (\nabla Q \times \nabla R) is the determinant of the Jacobian matrix J_r associated to r whose rows consist of \nabla P, \nabla Q, and \nabla R. I claim that this determinant is zero. Since r takes values in S^2 we have that P^2 + Q^2 + R^2 = 1; differentiating both sides of this equation with respect to x gives P P_x + Q Q_x + R R_x = 0, or equivalently r \cdot r_x = 0. Similarly r \cdot r_y = 0 and r \cdot r_z = 0, so the vectors r_x, r_y, and r_z are all orthogonal to the same nonzero vector r and hence there is a nontrivial dependence relation between them. But r_x, r_y, and r_z are the columns of J_r, so it follows that \det(J_r) = 0. We conclude:

\int_{S^2} F \cdot n\, dS = 0

by the divergence theorem. (The various identities used in this argument all appear in the Wikipedia page on Vector Calculus Identities, with the notation div F = \nabla \cdot F and curl F = \nabla \times F.)
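The vanishing of the triple product can be sanity-checked numerically for any smooth map (P, Q, R) into S^2. The sketch below uses radial projection v -> v/|v| as a hypothetical sample map; it is not a retraction of B^3 (it is undefined at the origin), but it does take values in S^2, which is all the determinant argument uses. The Jacobian is approximated by central differences, so the result is only zero up to rounding error.

```python
import math

def g(v):
    """Hypothetical sample map into S^2: radial projection v -> v/|v| (v != 0)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def jacobian(func, v, h=1e-6):
    """Central-difference Jacobian; entry [i][j] approximates d func_i / d v_j."""
    cols = []
    for j in range(3):
        plus, minus = list(v), list(v)
        plus[j] += h
        minus[j] -= h
        fp, fm = func(plus), func(minus)
        cols.append([(fp[i] - fm[i]) / (2 * h) for i in range(3)])
    # cols[j] is the j-th partial derivative; transpose so the rows are gradients.
    return [[cols[j][i] for j in range(3)] for i in range(3)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

point = [0.4, -0.3, 0.5]
J = jacobian(g, point)
triple_product = det3(J)   # approximates grad P . (grad Q x grad R) at the point
```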

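Now let us compute the same integral using the fact that r(x,y,z) = (x,y,z) on S^2. The integrand F \cdot n = P (\nabla Q \times \nabla R) \cdot n on S^2 depends only on the values of P and on the derivatives of Q and R in directions tangent to S^2, since the components of \nabla Q and \nabla R along n contribute nothing to (\nabla Q \times \nabla R) \cdot n. All of these are determined by the values of r on S^2, so the surface integral is unchanged if we replace P, Q, and R with x, y, and z. Using P = x, Q = y, and R = z we calculate that P (\nabla Q \times \nabla R) = (x,0,0) and hence div(P (\nabla Q \times \nabla R)) = 1. By the divergence theorem we get: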

\int_{S^2} F \cdot n\, dS = \int_{B^3} 1\, dV = vol(B^3) \neq 0

This is a contradiction. QED
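The second computation can also be checked directly, without the divergence theorem. This is a rough numerical sketch (the function name and grid sizes are arbitrary choices) that integrates the field (x, 0, 0) against the outward normal of the unit sphere using a midpoint rule in spherical coordinates and compares the result with vol(B^3) = 4\pi/3.

```python
import math

def flux_of_x_field(n_phi=400, n_theta=400):
    """Midpoint-rule approximation of the flux of (x, 0, 0) through the unit sphere."""
    d_phi = math.pi / n_phi
    d_theta = 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_phi):
        phi = (i + 0.5) * d_phi
        for j in range(n_theta):
            theta = (j + 0.5) * d_theta
            x = math.sin(phi) * math.cos(theta)
            # On the unit sphere n = (x, y, z), so (x, 0, 0) . n = x * x,
            # and the area element is sin(phi) d_phi d_theta.
            total += x * x * math.sin(phi) * d_phi * d_theta
    return total

flux = flux_of_x_field()
ball_volume = 4 * math.pi / 3
```

The two numbers agree up to the discretization error of the grid, and in particular the flux is visibly nonzero.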


The Alternating Group is Simple III


I will now conclude my series of posts about the alternating group by proving that A_n is simple for n \geq 5. Just as with A_5 I stole this argument from Dummit & Foote; while I feel I might have been able to come up with the argument for A_5 on my own, the argument for A_n is a bit too clever for me. If anyone knows who came up with it, please let me know.

We are going to use again and again the formula for conjugating permutations from my last post, so I will repeat it here for reference:

Lemma 1: Let \sigma = (a_1 a_2 \ldots a_k) be a cycle and let \tau be any permutation. Then \tau \sigma \tau^{-1} = (\tau(a_1) \tau(a_2) \ldots \tau(a_k))

Let us jump right into the proof of the main result:

Theorem: A_n is simple for every n \geq 5.

Proof: We use induction on n. The base case, n = 5, was handled in the last post. So assume that A_{n-1} is simple, and let H be a proper normal subgroup of A_n, n \geq 6. Our aim is to show that H is the trivial group.

Our first step is to prove that no non-identity element of H can fix any symbol. Let G_i denote the subgroup of A_n consisting of all elements that fix the symbol i; by Lemma 1 we have \tau G_i \tau^{-1} = G_{\tau(i)} for any permutation \tau. Note that G_i \cong A_{n-1} for each i, so if H intersects some G_i nontrivially then G_i \subseteq H by the induction hypothesis. Moreover, since any G_j can be obtained from G_i by conjugation and H is normal, we have that G_j \subseteq H for all j.

Now, any element of A_n can be written as the product of pairs of transpositions. A pair of transpositions can only permute up to four symbols, so since n \geq 5 every pair of transpositions fixes at least one symbol and hence is in some G_i. Thus every element of A_n can be written as a product of permutations each of which is in some G_i; since G_i \subseteq H, it follows that A_n \subseteq H, contradicting our assumption that H is a proper subgroup.

So no non-identity element of H can fix any symbol. Consequently, if two elements of H agree on even one symbol then they must be the same, for if \tau_1(i) = \tau_2(i) then \tau_2^{-1} \tau_1 fixes i and hence is the identity. To complete the proof we will use this observation to show that the identity is the only element of H.

  • No element of H can contain a k-cycle for k \geq 3:
    Suppose \sigma \in H contains a k-cycle (a_1 a_2 a_3 \ldots). Since n \geq 5 it is possible to choose \tau \in A_n which fixes a_1 and a_2 but not a_3 (for instance a 3-cycle on a_3 and two other symbols). By Lemma 1 we have:
    \tau (a_1 a_2 a_3 \ldots) \tau^{-1} = (a_1 a_2 \tau(a_3) \ldots)
    Thus \sigma and \tau \sigma \tau^{-1} are two permutations in H which agree on a_1 but not on a_2; this is a contradiction.
  • No element of H can be the product of disjoint 2-cycles:
    Suppose such an element \sigma were to exist. Since n \geq 6 and \sigma can’t fix any symbols, it must be the product of at least three disjoint 2-cycles:
    \sigma = (a_1 a_2)(a_3 a_4)(a_5 a_6)\ldots
    Let \tau = (a_1 a_2)(a_3 a_5). We have:
    \tau \sigma \tau^{-1} = (a_1 a_2)(a_5 a_4)(a_3 a_6)\ldots
    This time \sigma and \tau \sigma \tau^{-1} agree on a_1 and a_2 but not on a_3, a contradiction.

We conclude that no element of H can have a cycle of length larger than 1; this means that H is the trivial group.
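The conjugation in the 2-cycle case is easy to check by machine. In this small sketch the symbols 1, ..., 6 stand in for a_1, ..., a_6, and permutations are stored as Python dicts; the helper names are hypothetical.

```python
def from_cycles(cycles, n):
    """Build a permutation of {1, ..., n} as a dict from a list of disjoint cycles."""
    perm = {i: i for i in range(1, n + 1)}
    for cycle in cycles:
        for pos, a in enumerate(cycle):
            perm[a] = cycle[(pos + 1) % len(cycle)]
    return perm

def conjugate(tau, sigma):
    """Return tau sigma tau^{-1}."""
    tau_inv = {v: k for k, v in tau.items()}
    return {i: tau[sigma[tau_inv[i]]] for i in sigma}

sigma = from_cycles([(1, 2), (3, 4), (5, 6)], 6)
tau = from_cycles([(1, 2), (3, 5)], 6)
rho = conjugate(tau, sigma)

# Symbols on which sigma and its conjugate take the same value.
agree = [i for i in sorted(sigma) if sigma[i] == rho[i]]
```

Here rho is (1 2)(5 4)(3 6), and the two permutations agree exactly on the symbols 1 and 2, so they are distinct elements that agree on a symbol.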

The Alternating Group is Simple II


In my last post I described the alternating group and its place in the world of groups. I will now prove that A_5 is simple, and in the third and final post of this series I will prove that A_n is simple for n \geq 5. The plan of attack is as follows: first I will carry out some preliminary analysis of conjugacy in S_n and A_n, and then by identifying all conjugacy classes in A_5 I will prove that A_5 is simple. I will then prove that A_n is simple for n \geq 5 by induction. I’m not sure who invented this argument; all I know is that I learned it in Dummit & Foote.

Conjugacy Classes in the Alternating Group

To understand the normal subgroups of a group it is very useful to first think carefully about its conjugacy classes; this is because a normal subgroup is necessarily a union of conjugacy classes. Fortunately conjugation in the symmetric group is easy to understand using “cycle notation”. A k-cycle in S_n is a permutation which fixes all but k symbols a_1, \ldots, a_k and which acts on these symbols as:

a_1 \to a_2 \to \ldots \to a_k \to a_1

The notation for this cycle is (a_1 a_2 \ldots a_k). It is not hard to show that every permutation decomposes as the product of disjoint cycles, and the decomposition is unique up to reordering the cycles. Indeed, cycle notation makes it particularly easy to understand conjugation.

Lemma 1: Let \sigma = (a_1 a_2 \ldots a_k) be a cycle and let \tau be any permutation. Then \tau \sigma \tau^{-1} = (\tau(a_1) \tau(a_2) \ldots \tau(a_k))

Proof: For i < k we have \tau \sigma \tau^{-1}(\tau(a_i)) = \tau \sigma(a_i) = \tau(a_{i+1}) and similarly \tau \sigma \tau^{-1}(\tau(a_k)) = \tau(a_1).

The lemma extends easily to the case where \sigma is the product of cycles, so we see that conjugation by \tau preserves the cycle structure of \sigma while relabelling the symbols in the cycle. In particular, two elements of S_n are conjugate if and only if the number and lengths of cycles are the same. For instance, (12)(345) is conjugate to (124)(35) in S_5 but not to (12345).
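Lemma 1 and the cycle-type criterion can be tested on the examples just mentioned. A minimal sketch, with permutations of {1, ..., 5} stored as dicts and hypothetical helper names:

```python
def from_cycles(cycles, n):
    """Build a permutation of {1, ..., n} as a dict from a list of disjoint cycles."""
    perm = {i: i for i in range(1, n + 1)}
    for cycle in cycles:
        for pos, a in enumerate(cycle):
            perm[a] = cycle[(pos + 1) % len(cycle)]
    return perm

def compose(p, q):
    """Return the permutation p o q (apply q first, then p)."""
    return {i: p[q[i]] for i in q}

def inverse(p):
    return {v: k for k, v in p.items()}

def cycle_type(p):
    """Sorted list of cycle lengths, including fixed points."""
    seen, lengths = set(), []
    for start in p:
        if start not in seen:
            length, a = 0, start
            while a not in seen:
                seen.add(a)
                a = p[a]
                length += 1
            lengths.append(length)
    return sorted(lengths)

sigma = from_cycles([(1, 2), (3, 4, 5)], 5)
tau = from_cycles([(4, 5)], 5)
conjugated = compose(compose(tau, sigma), inverse(tau))
# Lemma 1 predicts the cycles (tau(1) tau(2))(tau(3) tau(4) tau(5)) = (1 2)(3 5 4).
predicted = from_cycles([(1, 2), (3, 5, 4)], 5)
```

Comparing cycle_type values then separates (12)(345) and (124)(35), which share the type [2, 3], from the 5-cycle (12345).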

Note that conjugacy in A_n is a little more subtle. A k-cycle is even if and only if k is odd, but not all k-cycles are conjugate in A_n. For instance the transposition \tau = (45) conjugates (12345) to (12354) in S_5, but there is no even permutation which conjugates (12345) to (12354) and hence they are not conjugate in A_5.

To prove that A_5 is simple, we will need to determine the sizes of all of its conjugacy classes. We will do this using the following tool:

Lemma 2: Let g be an element of a group G, let Z_G(g) be the centralizer of g (i.e. the set of all elements of G which commute with g) and let C_G(g) denote the conjugacy class of g. Then |Z_G(g)| \cdot |C_G(g)| = |G|

Proof: Let G act on itself by conjugation. The orbit of g under this action is C_G(g) and the stabilizer is Z_G(g), so the result follows from the orbit-stabilizer theorem.

We will apply this lemma as follows. First we will use our understanding of conjugacy in S_n to identify the centralizer of a cycle. From that it is easy to identify the centralizer of a cycle in A_n, and that will allow us to count the conjugates of a cycle in A_n.

Proposition 3: Let \sigma \in S_n be a k-cycle. Then:
Z_{S_n}(\sigma) = \{\sigma^i \tau:\: 0 \leq i < k,\, \tau \in S_{n-k}\}

Proof: By Lemma 1, the conjugates of \sigma in S_n are precisely the k-cycles. To specify a k-cycle one must specify the symbols in the k-cycle and the order in which they appear; there are \frac{n!}{k!(n-k)!} ways to choose k symbols and k! different orders in which they can appear, though k of the orders define the same cyclic permutation. Thus there are \frac{n!}{k!(n-k)!} \cdot (k-1)! = \frac{n!}{k \cdot (n-k)!} conjugates of \sigma; by Lemma 2, |Z_{S_n}(\sigma)| = k \cdot (n-k)!.

The permutation \sigma^i clearly commutes with \sigma. Any permutation \tau which fixes the k symbols that \sigma acts on also commutes with \sigma, and the subgroup of all such permutations is isomorphic to S_{n-k}. Thus the permutations \sigma^i \tau, \tau \in S_{n-k}, all commute with \sigma; there are k \cdot (n-k)! distinct permutations of this form, so they make up the entire centralizer of \sigma.
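The count |Z_{S_n}(\sigma)| = k \cdot (n-k)! can be confirmed by brute force for small n. The sketch below assumes symbols are numbered 0, ..., n-1 (rather than 1, ..., n) and stores a permutation as a tuple p with p[i] the image of i.

```python
from itertools import permutations
from math import factorial

def centralizer_size(sigma, n):
    """Count the permutations tau of {0, ..., n-1} with tau sigma = sigma tau."""
    count = 0
    for tau in permutations(range(n)):
        if all(tau[sigma[i]] == sigma[tau[i]] for i in range(n)):
            count += 1
    return count

n, k = 5, 3
sigma = (1, 2, 0, 3, 4)          # the 3-cycle 0 -> 1 -> 2 -> 0, fixing 3 and 4
size = centralizer_size(sigma, n)
conjugates = factorial(n) // size   # Lemma 2: |C(g)| = |G| / |Z(g)|
```

For a 3-cycle in S_5 this gives a centralizer of order 6 and 20 conjugates, matching the formulas in the proof.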

Simplicity of A_5

We are now ready to prove the main result of this post:

Theorem: A_5 is simple.

Proof: The only possible cycle structures of non-identity elements in A_5 are (123), (12345), and (12)(34). Recall that in S_5 the cycle structure completely determines the conjugacy class; in A_5 some of these conjugacy classes may split. Let us analyze each conjugacy class in turn using Proposition 3.

  • (123): The centralizer of (123) in S_5 consists of the six permutations (123)^i \tau where i = 0, 1, 2 and \tau is either the identity or (45), so (123) has 120/6 = 20 conjugates in S_5. If \tau = (45) then (123)^i \tau is odd, so the centralizer in A_5 has only three elements and hence the number of conjugates is still 60/3 = 20. Thus all 3-cycles are conjugate in A_5.
  • (12345): The centralizer of (12345) in both S_5 and A_5 is just the cyclic subgroup \{(12345)^i\}, so there are 120/5 = 24 conjugates in S_5 and 60/5 = 12 conjugates in A_5. The other 12 elements in the S_5 conjugacy class are accounted for by the A_5 conjugacy class of (12354) which is disjoint from that of (12345).
  • (12)(34): It is straightforward to check that (12)(34) commutes with the identity, itself, (13)(24) and (14)(23). If \tau does not fix the symbol 5 then \tau (12)(34) \tau^{-1} \neq (12)(34) by Lemma 1, so (12)(34) does not commute with \tau. A similar argument shows that (12)(34) does not commute with any 3-cycle, so the centralizer has exactly 4 elements and hence (12)(34) has 60/4 = 15 conjugates in A_5.

Including the identity, we have accounted for the conjugacy classes of all 60 elements of A_5: 60 = 1 + 20 + 12 + 12 + 15. So let H be a normal subgroup of A_5. Since H is normal it is the union of conjugacy classes (including the identity), so |H| is the sum of 1 and some subset of \{20, 12, 12, 15\}. But |H| must also divide |A_5| = 60; checking cases the only possible choices for |H| are 1 and 60.
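The whole argument can be replayed by brute force. This sketch (symbols numbered 0, ..., 4, permutations stored as tuples, helper names hypothetical) computes the conjugacy classes of A_5 directly and then carries out the "checking cases" step by listing every sum of class sizes that includes the identity and divides 60.

```python
from itertools import combinations, permutations

def is_even(p):
    """A permutation is even exactly when it has an even number of inversions."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return inversions % 2 == 0

def compose(p, q):
    """Return p o q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

A5 = [p for p in permutations(range(5)) if is_even(p)]

remaining, class_sizes = set(A5), []
while remaining:
    g = next(iter(remaining))
    cls = {compose(compose(t, g), inverse(t)) for t in A5}   # conjugacy class of g
    class_sizes.append(len(cls))
    remaining -= cls
class_sizes.sort()

# A normal subgroup is a union of classes containing the identity,
# and its order must divide |A_5| = 60.
non_identity = class_sizes[1:]        # drop the identity's class of size 1
possible_orders = sorted({
    1 + sum(subset)
    for r in range(len(non_identity) + 1)
    for subset in combinations(non_identity, r)
    if 60 % (1 + sum(subset)) == 0
})
```

The class sizes come out as 1, 12, 12, 15, 20, and the only admissible orders are 1 and 60.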

The Alternating Group is Simple I


This past week I covered an abstract algebra course at Columbia, and I decided to prove that the alternating group A_n is simple. I in fact did this in the same algebra class last year, but in the intervening months I almost entirely forgot how the argument goes. So I decided to write it up here while it’s still fresh in my mind. It’s a very nice – and fairly elementary – little application of some important ideas in group theory. In this post I’m going to give some background and explain the significance of the simplicity of A_n, and in the sequel I will go through the proof.

The Alternating Group

I am forced to assume that the reader is comfortable with basic group theory, but I’ll begin by reviewing some of the key ideas. Recall that the symmetric group S_n is the group of permutations of n symbols 1, 2, \ldots, n. A transposition is a permutation which swaps exactly two symbols and leaves the others fixed; it is not hard to see that any permutation can be expressed as the product of transpositions. A permutation is said to be even (respectively, odd) if it can be written as the product of an even (respectively, odd) number of transpositions.

Definition: The alternating group A_n is the subgroup of S_n consisting of all even permutations.

A_n is a normal subgroup of S_n of index 2; the objective of this series of posts is to prove that A_n is simple for n \geq 5, meaning its only normal subgroups are itself and the trivial group. The significance of this property is that if a group G has a normal subgroup H then one can form the quotient group G/H, and often one can infer properties of G from properties of H and G/H. So simple groups are in a sense the “atoms” from which all other groups are built, though it should be noted that H and G/H alone do not uniquely determine G.
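For small n the definitions above are easy to experiment with. The following sketch (symbols numbered 0, ..., n-1, helper name hypothetical) sorts a permutation by transpositions, uses the parity of the number of swaps to decide whether it is even, and confirms that A_4 contains exactly half of the elements of S_4.

```python
from itertools import permutations
from math import factorial

def transposition_count(p):
    """Number of swaps used to sort p; its parity is the parity of p."""
    p, swaps = list(p), 0
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]   # one transposition; it puts j in position j
            swaps += 1
    return swaps

n = 4
even_permutations = [p for p in permutations(range(n))
                     if transposition_count(p) % 2 == 0]
```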

The Classification of Simple Groups

Classifying all finite simple groups was one of the great achievements of 20th century mathematics, and like many great mathematical achievements it went almost completely unnoticed by the rest of the world. The classification theorem asserts that all finite simple groups fit into a few infinite families (one of which is the family of alternating groups) with precisely 26 exceptions, the so-called sporadic simple groups. A shameless plug: when I was an undergraduate I did an REU project with Igor Kriz which involved making little computer games based on the sporadic simple groups; later we wrote a Scientific American article about them.

In any event, the classification program took decades to complete and spans thousands of pages written by dozens of mathematicians, and its completion seems to have essentially killed off finite group theory as an active area of research (though from what I understand there are lots of open problems in representation theory for finite groups). Given how monumental the effort was and how few people are still working in finite group theory, I worry that in a few decades all the experts will retire or die and there will be nobody left who understands the proof. It’s a good illustration of the principle that mathematicians tend to care much more about questions than answers.

The Alternating Group and Galois Theory

Aside from their role in the classification program, the alternating groups play a crucial role in the theory of polynomial equations. Indeed, the very notion of a group was invented to understand the structure of solutions to polynomial equations, and the group A_5 is the star of the show.

Everyone learns in high school algebra that there is a formula for the roots of a quadratic equation ax^2 + bx + c = 0:

x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}

Less well known is that there is also a cubic formula and quartic formula for degree three and four equations, respectively. These formulas date back to the 16th century, and it was a frustratingly difficult open problem to find a formula for the roots of a polynomial equation of degree five. It wasn’t until the 19th century that Abel and Galois independently realized that no such formula exists! Abel’s proof came first, but I don’t know what it was; Galois’ argument is the one that survived. Here is a brief sketch.

Galois’ key idea was to focus on the symmetries exhibited by the roots of a polynomial equation. More precisely, he considered their symmetries relative to the rational numbers; there are well-known techniques for finding rational roots of polynomials, so he was interested in the structure of the irrational roots. Let’s look at a couple examples:

  • The roots of x^2 - 2 are \pm \sqrt{2}, so you can get from one root to the other by multiplying by -1. Thus the cyclic group C_2 naturally exhibits the symmetries of the roots.
  • The roots of x^4 - 1 are i, -i, 1, and -1. Notice that you can cycle through the roots just by looking at powers of i: i^0 = 1, i^1 = i, i^2 = -1, i^3 = -i and i^4 = 1. Thus the symmetries of the roots are given by the cyclic group C_4.
  • The roots of (x^2 - 2)(x^2 - 3) are \pm \sqrt{2} and \pm \sqrt{3}. The roots \sqrt{2} and -\sqrt{2} are interchangeable, as are \sqrt{3} and -\sqrt{3}, but over the rational numbers there is a sort of asymmetry between \sqrt{2} and \sqrt{3}. Thus the symmetry group is C_2 \times C_2.

Of course, one can make all this precise using the language of field extensions. The upshot is that the symmetry groups help characterize what it means to find a formula for the roots of a polynomial equation. As in the example above, equations of the form x^n - a = 0 have cyclic symmetry group C_n. So if the quintic formula had the form \sqrt[5]{a + \sqrt[3]{b}}, for instance, then the symmetry group could be decomposed into a C_5 part and a C_3 part corresponding to the fifth root and cube root, respectively. More precisely, a polynomial equation can be solved by radicals if and only if its symmetry group G has a decomposition

G = G_0 \supseteq G_1 \supseteq \ldots \supseteq G_n = \{1\}

where G_i is a normal subgroup of G_{i-1} and G_{i-1}/G_i is cyclic. Groups with this property are said to be solvable due to the connection with solving equations.

Now, there exist polynomials of degree 5 whose symmetry group is the full symmetric group S_5 (in fact there are many). S_5 contains A_5 as a normal subgroup with quotient C_2, but once we have proved that A_5 is simple we will know that it is not solvable: it has no nontrivial normal subgroups whatsoever, let alone one with a cyclic quotient. This argument shows that there cannot be a general formula in the spirit of the quadratic, cubic, or quartic formulas, but it also shows even more: it gives you a criterion (solvability of the symmetry group) to determine when there is a formula for the roots of a specific polynomial.



It’s been almost a year since I last blogged, and I’ve spent much of that time feeling guilty about not blogging enough. So here we are. I was lured out of my state of blog-apathy by a recent post by mathbabe; in fact, this will be the second of my very few blog posts inspired by that blog. If you’re not already reading that blog, you really should – it’s a brilliant mix of math, politics, economics, data, and sex.

In any event, mathbabe was commenting on a video which has apparently been making its way around the internet. In this video, some mathematicians (Or perhaps physicists? What are string theorists calling themselves these days?) attempted to explain the mind-boggling “fact” that

1 + 2 + 3 + 4 + 5 +... = -1/12

Watch the video if you like, but by now a number of other mathematicians have rightfully pointed out that most of the fishy manipulations in the video amount to fraudulent nonsense which can be used to justify just about anything. This infuriates me, because the people who made the video could have used the opportunity to legitimately blow people’s minds by placing the equation above (which does make sense, from the right point of view!) in its proper context and explaining some beautiful mathematics.

I don’t have the apparatus to make a cool video, but I do have a blog. So I’m going to make an attempt to do what I think the video should have done (I am not optimistic that my attempt will get picked up by Slate, of course). Instead of adding up all of the positive integers, I’m going to start by adding up all of the powers of two:

2 + 4 + 8 + 16 +... = -2

We still get a negative number, so this equation should be just as counter-intuitive as the original one (though admittedly -1/12 is pretty bizarre). Our strategy for making sense of both equations will be the same:

  1. Write down an equation which makes sense (both logically and intuitively) in a narrow context
  2. Observe that the right-hand side of the equation actually makes sense in a much larger context than the left-hand side
  3. Use the right-hand side as a proxy for the left-hand side in the larger context

This strategy, called analytic continuation by mathematicians, is extremely powerful. But the basic idea is really quite simple, and it is even familiar in the context of language. When you “log in” to your e-mail account, your messages are likely organized into various “folders”, some of which are in your “inbox” and some of which are in your “outbox”. Perhaps you have a list of your friends’ e-mail “addresses” in your “address book”. The words that I put in quotes all began life in the narrow context of physical reality but have been extended to the new context of the internet; your e-mail inbox is not a physical box anywhere in the world, nor does your e-mail address refer to an actual place you can go. Someone at some point in the history of the internet realized that physical mail is a good metaphor for the electronic messages people send each other, and thus the language surrounding physical mail actually makes sense in the context of the internet.

The strange equations that I wrote above are mathematical counterparts of taking a word such as “hyperlink” which only really makes sense in the context of the internet and applying it to real world mail. You would end up with a sentence which looks pretty bizarre, but there would nevertheless be a certain logic to it.

Let’s see how this all plays out mathematically. We’ll start with something that isn’t likely to stir up much controversy:

1/2 + 1/4 + 1/8 +... = 1

This is the mathematical counterpart of the observation that if you walk across half of a room, then a quarter of the room, then an eighth, and so on then you will have crossed the whole room. (Of course, there are some philosophical questions to be raised by the fact that the phrase “and so on” took the place of an infinite number of actions. Even non-controversial infinite series deserve serious thought.)

You might also convince yourself that

1/3 + 1/9 + 1/27 +... = 1/2

It might not be obvious that the answer is 1/2, but this answer is at least plausible: we start with a number which is smaller than 1/2 and add increasingly tiny numbers to it. And if you plug numbers into a calculator you will get good numerical evidence that this equation makes sense; the further out you go in the sum, the closer you get to 1/2. In general, if the absolute value of s is a number smaller than 1, we have:

s + s^2 + s^3 +... = \frac{s}{1 - s}

Of course, the only context in which the left-hand side really makes sense is when |s| < 1; this ensures that the powers of s get very small very fast and thus the sum settles near a particular value. If |s| \geq 1 then there is no such guarantee: the powers of s do not get smaller, and for s \geq 1 you can get a number as large as you want by adding up enough terms in the sum.

The right-hand side, on the other hand, makes sense in a much larger context: we can plug in any number except s = 1! In particular, we can plug s = 2 into \frac{s}{1-s} to obtain \frac{2}{1-2} = -2. Since the expressions s + s^2 + s^3 +... and \frac{s}{1-s} agree when |s| < 1, it makes sense to use the latter expression as a proxy for the former at other values of s, such as s = 2. In other words, it is not entirely stupid to write:

2 + 4 + 8 +... = -2
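The calculator experiment is easy to reproduce. In this small sketch, exact fractions are used so that no rounding gets in the way; the function names are hypothetical. It compares partial sums of s + s^2 + s^3 +... with the proxy s/(1 - s) at s = 1/2 and s = 1/3, where they converge to each other, and at s = 2, where only the proxy makes sense.

```python
from fractions import Fraction

def partial_sum(s, terms):
    """s + s^2 + ... + s^terms, computed exactly."""
    return sum(s ** k for k in range(1, terms + 1))

def closed_form(s):
    """The right-hand side s / (1 - s), defined for every s except 1."""
    return s / (1 - s)

half, third, two = Fraction(1, 2), Fraction(1, 3), Fraction(2)
gap_half = closed_form(half) - partial_sum(half, 30)      # tiny: the sum is settling at 1
gap_third = closed_form(third) - partial_sum(third, 30)   # tiny: the sum is settling at 1/2
diverging = partial_sum(two, 30)                          # huge: no settling at s = 2
proxy = closed_form(two)                                  # the proxy value -2
```

After 30 terms the gap at s = 1/2 is exactly 1/2^30, while the partial sum at s = 2 has already passed two billion.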

There is some theory which makes this equation even less stupid: \frac{s}{1-s} is (in a sense which can be made precise) the only sensible way to extend s + s^2 + s^3 +... beyond the set |s| < 1. Properly justifying this requires techniques coming from one of the most beautiful subjects in all of mathematics: the calculus of complex numbers. It should not be at all obvious, but in the end this whole discussion is really all about the mysterious powers of complex numbers.

The same techniques allow us to analyze the sum 1 + 2 + 3 +... which got this post started; this time, our starting point is the Riemann Zeta function:

\frac{1}{1^s} + \frac{1}{2^s} + \frac{1}{3^s} +... = \zeta(s)

This time the sum on the left-hand side makes sense as long as s > 1, but the same tools described above imply that the Riemann Zeta function can be “analytically continued” to allow any input except s = 1, and its value at s = -1 can be calculated to be -1/12. This calculation could occupy another entire blog post, so I will not go any further than that at this time.
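For readers who want to see the number -1/12 come out of a computation, here is a short sketch. It does not carry out the analytic continuation; it assumes the standard formula \zeta(-n) = -B_{n+1}/(n+1) for integers n \geq 1, where B_m are the Bernoulli numbers, a fact which is not derived in this post. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(m):
    """Return [B_0, ..., B_m] from the standard recurrence (so B_1 = -1/2)."""
    B = [Fraction(1)]
    for n in range(1, m + 1):
        B.append(-sum(comb(n + 1, k) * B[k] for k in range(n)) / (n + 1))
    return B

def zeta_at_negative_integer(n):
    """Assumed formula for the continued zeta function: zeta(-n) = -B_{n+1}/(n+1), n >= 1."""
    return -bernoulli_numbers(n + 1)[n + 1] / (n + 1)

value = zeta_at_negative_integer(1)   # uses B_2 = 1/6
```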

Now that I have explained the sense in which it is not completely stupid to say that the sum of all the positive integers is -1/12, I would like to conclude by arguing that it still is pretty stupid. Notice that according to the reasoning described in this post we did not assign the sum a value by thinking about it intrinsically as we can with, for instance, 1/2 + 1/4 + 1/8 +...; instead we related the sum to the Riemann Zeta function and analyzed that function. But there are infinitely many other possible functions which have a similar relationship to 1 + 2 + 3 +..., and many of them will assign different values to the series following the steps outlined here. In fact, you can use these steps to justify giving the sum any value you want. Still, the Riemann Zeta function enjoys a privileged position in mathematics (and physics) so -1/12 is a pretty good choice.

The Geometry of Curves of Constant Width


Today I will finally fulfill my earlier promise to revisit the geometry of curves of constant width. I doubt anyone was going to hold me to this promise, but it’s generally good to keep your promises even if you only made them to your own blog. In any event, I have two goals in this post:

  1. Prove that if two curves have the same constant width then they have the same length (perimeter).
  2. Prove that among all curves with a given constant width the circle encloses the largest area.

Let us begin by providing some precise definitions. Recall that a plane curve is simply a continuous function \gamma \colon [0,1] \to \mathbb{R}^2, and a plane curve is closed if it begins and ends at the same point, i.e. \gamma(0) = \gamma(1). A closed curve is simple if it intersects itself only at the endpoints, meaning \gamma(t_1) = \gamma(t_2) only if t_1 = t_2 or t_1 and t_2 are both endpoints of the interval [0,1]. The most basic fact about simple closed curves is that they divide the plane into two disconnected regions: a bounded piece (the “inside”) and an unbounded piece (the “outside”). This is called the Jordan curve theorem, and as far as I know the simplest proofs use some reasonably sophisticated ideas in algebraic topology (though only a mathematician would think it even needs to be proved!).

Given a simple closed curve \gamma, let C_\gamma denote the image of \gamma, i.e. the set of all points in the plane that \gamma passes through, and let D_\gamma denote C_\gamma together with the points “inside” \gamma. A line in the plane is said to be a supporting line for D_\gamma if it intersects C_\gamma but does not pass through any interior points of D_\gamma. The set D_\gamma is closed and bounded, so there are exactly two distinct supporting lines for D_\gamma in any given direction. The set of directions in the plane can be parametrized by an angle \theta between 0 and \pi (with the understanding that 0 and \pi represent the same direction). Thus we define a “width” function w_\gamma on the set of directions by letting w_\gamma(\theta) denote the distance between the supporting lines for D_\gamma in the direction \theta. Here’s what the width looks like in an example:


Finally, we say that \gamma has constant width if w_\gamma is constant. The goal is to prove that any two curves of constant width w have the same length, and that among all curves of constant width w the circle of diameter w has the largest area. Before proceeding, we need to understand the geometry of constant width curves a little better.
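The width function is easy to experiment with numerically. The following Python sketch is my own illustration under some assumptions: it represents each shape by a finite list of boundary points, and it uses the fact that the two supporting lines in direction \theta are parallel to (\cos\theta, \sin\theta), so the distance between them is the spread of the points along the perpendicular direction. The helper names `width` and `arc` are made up for this example, and the Reuleaux triangle (three circular arcs of radius w centred at the vertices of an equilateral triangle of side w) is used as a standard non-circular example of constant width.

```python
import math


def width(points, theta):
    """Distance between the two supporting lines with direction theta.

    The supporting lines are parallel to (cos theta, sin theta), so the
    distance between them is the spread of the points along the normal.
    """
    nx, ny = -math.sin(theta), math.cos(theta)
    projections = [x * nx + y * ny for x, y in points]
    return max(projections) - min(projections)


def arc(center, radius, start, stop, samples=2000):
    """Sample points on a circular arc (angles in radians)."""
    cx, cy = center
    return [
        (cx + radius * math.cos(start + (stop - start) * k / samples),
         cy + radius * math.sin(start + (stop - start) * k / samples))
        for k in range(samples + 1)
    ]


w = 1.0
circle = arc((0.0, 0.0), w / 2, 0.0, 2 * math.pi)
square = [(0.0, 0.0), (w, 0.0), (w, w), (0.0, w)]

# Reuleaux triangle: three arcs of radius w centred at the vertices
# of an equilateral triangle with side w.
a, b, c = (0.0, 0.0), (w, 0.0), (w / 2, w * math.sqrt(3) / 2)
reuleaux = (arc(a, w, 0.0, math.pi / 3)
            + arc(b, w, 2 * math.pi / 3, math.pi)
            + arc(c, w, 4 * math.pi / 3, 5 * math.pi / 3))

# Print the smallest and largest width over directions 0, 1, ..., 179 degrees.
for name, shape in [("circle", circle), ("square", square), ("reuleaux", reuleaux)]:
    widths = [width(shape, math.pi * k / 180) for k in range(180)]
    print(name, round(min(widths), 4), round(max(widths), 4))
```

Up to sampling error, the circle and the Reuleaux triangle should report the same minimum and maximum width, while the square's width varies between its side length and its diagonal.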

Specifically, we want to show that every curve \gamma of constant width is convex, meaning D_\gamma contains the line segment between any two of its points. In fact we will prove something a bit stronger: \gamma is strictly convex, meaning it is convex and C_\gamma contains no line segments (so that the line segment joining any two points in D_\gamma, apart from possibly its endpoints, actually lies in the interior of D_\gamma). This requires a nice little trick that I couldn’t figure out on my own; special thanks to Ian Agol for helping me out on MathOverflow.

Proposition: Every curve of constant width is strictly convex.
Proof: Let H_\gamma denote the convex hull of D_\gamma; this is by definition the smallest convex set which contains D_\gamma. According to a general fact from convex geometry, the boundary of H_\gamma consists only of points in the boundary of D_\gamma and possibly line segments joining points in the boundary of D_\gamma. So we will show that the boundary of H_\gamma contains no line segments, implying that H_\gamma = D_\gamma and hence that D_\gamma is strictly convex.

According to another general fact from convex geometry the supporting lines for H_\gamma are precisely the same as the supporting lines for D_\gamma, and hence H_\gamma has the same constant width w as D_\gamma. So assume that the boundary of H_\gamma contains a line segment joining two points a and b. Since H_\gamma is convex, the line \ell passing through a and b is a supporting line for H_\gamma. There is exactly one other supporting line for H_\gamma parallel to this line; let c denote a point where it intersects H_\gamma. Consider the triangle abc; its height is precisely w, the width of H_\gamma, so we have that w is strictly smaller than at least one of dist(a,c) or dist(b,c). Assume w < dist(a,c) and consider the supporting lines for H_\gamma which are perpendicular to the line segment joining a and c. The points a and c must lie between (or possibly on) these supporting lines, but the distance between the supporting lines is w since H_\gamma has constant width. We conclude that w < dist(a,c) \leq w, a contradiction.

The reason why strict convexity is important to us is that lines intersect strictly convex curves in a very predictable way:

Lemma: Let \gamma be a closed strictly convex curve and let L be a line which intersects C_\gamma. Then L intersects C_\gamma exactly once if it is a supporting line or exactly twice if it is not.
Proof: Note that the intersection of two convex sets is again convex, so the intersection I = L \cap D_\gamma is a convex subset of a line. Since D_\gamma is closed and bounded the same must be true of the intersection, so the only possibility is that I is a closed interval [a,b] with a \leq b. Note that interior points of [a,b] correspond to interior points of D_\gamma and the boundary points a and b correspond to boundary points of D_\gamma, so we have that a = b if and only if L is a supporting line and a < b otherwise. Thus supporting lines intersect C_\gamma exactly once and any other line which intersects C_\gamma does so exactly twice.

We are now ready to calculate the length of a constant width curve. Our strategy is to use the main result of my previous post, “The Mathematics of Throwing Noodles at Paper.” There we saw that if one randomly tosses a curve of length \ell at a lined sheet of paper with line spacing d then the expected number of line intersections is given by \frac{2 \ell}{\pi d}. So let us toss our curve of constant width w at a lined sheet of paper with line spacing w. The curve must intersect at least one line and it can’t intersect three or more lines, so it either intersects exactly one line or exactly two lines. The curve intersects exactly two lines if and only if they are supporting lines, and hence each line intersects the curve exactly once by the lemma above. If the curve intersects exactly one line then it cannot be a supporting line and thus the lemma implies that the curve intersects the line exactly twice. In either case the total number of intersections is exactly 2, and thus the expected number of intersections is 2. Therefore

2 = \frac{2 \ell}{\pi w}

and hence \ell = \pi w. Thus every curve of constant width w has length \pi w, an assertion consistent at least with the circle of diameter w. The result is called Barbier’s Theorem, and it has a variety of different proofs; I find the argument using geometric probability to be the most beautiful.
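The tossing argument can also be tested by simulation. The Python sketch below is an illustration under my own assumptions, not a proof: it approximates a Reuleaux triangle of width w by a polygon, tosses it at the lines y = kw using a random rotation and a random vertical shift, and counts crossings by checking which pairs of consecutive polygon vertices fall in different bands between lines. The helper names, the number of sample points, and the number of tosses are arbitrary choices.

```python
import math
import random


def arc(center, radius, start, stop, samples=300):
    """Sample points on a circular arc (angles in radians)."""
    cx, cy = center
    return [
        (cx + radius * math.cos(start + (stop - start) * k / samples),
         cy + radius * math.sin(start + (stop - start) * k / samples))
        for k in range(samples + 1)
    ]


def reuleaux(w):
    """Polygonal approximation of a Reuleaux triangle of constant width w."""
    a, b, c = (0.0, 0.0), (w, 0.0), (w / 2, w * math.sqrt(3) / 2)
    return (arc(a, w, 0.0, math.pi / 3)
            + arc(b, w, 2 * math.pi / 3, math.pi)
            + arc(c, w, 4 * math.pi / 3, 5 * math.pi / 3))


def count_crossings(points, theta, offset, spacing):
    """Count crossings with the lines y = k * spacing after a rotation and shift."""
    heights = [x * math.sin(theta) + y * math.cos(theta) + offset
               for x, y in points]
    bands = [math.floor(h / spacing) for h in heights]
    # Consecutive points in different bands are separated by a line;
    # index -1 wraps around, which closes the curve.
    return sum(abs(bands[i] - bands[i - 1]) for i in range(len(bands)))


random.seed(0)  # fixed seed so the experiment is repeatable
w = 1.0
curve = reuleaux(w)
tosses = 1000
counts = [
    count_crossings(curve, random.uniform(0, 2 * math.pi),
                    random.uniform(0, w), w)
    for _ in range(tosses)
]
mean_crossings = sum(counts) / tosses
print(mean_crossings)
```

The average number of crossings should come out very close to 2, in line with \frac{2 \ell}{\pi w} = 2 when \ell = \pi w. Because the polygon is very slightly narrower than the true curve, an extremely rare toss could miss every line.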

We have now settled the length question; what about area? In fact, to place an upper bound on the area inside a constant width curve we will simply use our length calculation together with the following landmark theorem in geometry:

Theorem: Let \ell be the length of a simple closed curve in the plane and let A be the area that it encloses. Then:
4\pi A \leq \ell^2
with equality if and only if the curve is a circle.

In other words, among all curves with a given length the circle is the unique curve which encloses the largest area. This theorem is called the isoperimetric inequality, and it has many beautiful proofs, generalizations, and applications. Our claim about the area enclosed by constant width curves is an immediate corollary since they all have the same length (given a fixed width): substituting \ell = \pi w gives A \leq \frac{\pi w^2}{4}, which is exactly the area of the circle of diameter w. I originally intended to prove the isoperimetric inequality in this post using geometric probability, but I would need to take some time to explain how to calculate area probabilistically and I think the post is long enough as it is. Perhaps I will revisit this in the future.
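As a concrete check of both results, the following Python sketch compares a Reuleaux triangle of width w with the circle of diameter w. It is an illustration under my own assumptions: the curve is approximated by a polygon, the length is the sum of the polygon's edge lengths, and the area comes from the shoelace formula. The helper names are made up for this example.

```python
import math


def arc(center, radius, start, stop, samples=2000):
    """Sample points on a circular arc (angles in radians)."""
    cx, cy = center
    return [
        (cx + radius * math.cos(start + (stop - start) * k / samples),
         cy + radius * math.sin(start + (stop - start) * k / samples))
        for k in range(samples + 1)
    ]


def reuleaux(w):
    """Polygonal approximation of a Reuleaux triangle of constant width w."""
    a, b, c = (0.0, 0.0), (w, 0.0), (w / 2, w * math.sqrt(3) / 2)
    return (arc(a, w, 0.0, math.pi / 3)
            + arc(b, w, 2 * math.pi / 3, math.pi)
            + arc(c, w, 4 * math.pi / 3, 5 * math.pi / 3))


def perimeter(points):
    """Length of the closed polygon through the points, in order."""
    return sum(math.dist(points[i - 1], points[i]) for i in range(len(points)))


def area(points):
    """Area of the closed polygon through the points (shoelace formula)."""
    twice_area = sum(
        points[i - 1][0] * points[i][1] - points[i][0] * points[i - 1][1]
        for i in range(len(points))
    )
    return abs(twice_area) / 2


w = 1.0
curve = reuleaux(w)
length = perimeter(curve)
enclosed = area(curve)
circle_area = math.pi * w ** 2 / 4

print(length, math.pi * w)      # both close to pi * w
print(enclosed, circle_area)    # the Reuleaux triangle encloses less
print(4 * math.pi * enclosed <= length ** 2)
```

The two curves have essentially the same length, but the Reuleaux triangle encloses a visibly smaller area than the circle, as the isoperimetric inequality demands.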

Gender and the Mathematical Community


I still haven’t posted all that much in this blog, and essentially nothing research-related. I’ve been writing a bit offline, and I’ll probably adapt some of what I’ve been thinking about into blog form fairly soon. In this post I’d like to address some issues related to sexism and gender bias in the mathematical (and perhaps broader scientific) community. I think about these issues rather often, but I’m writing about them now because of recent posts in The Accidental Mathematician (Izabella Laba’s blog) and mathbabe (Cathy O’Neil’s blog).

The thrust of Izabella Laba’s post (entitled “Gender Bias 101 for Mathematicians”) is that gender bias in the mathematical community is not limited to a few grouchy old codgers, but rather that it is a systematic cultural and psychological phenomenon which afflicts everybody. There are two potentially controversial assertions implicit in this statement:

  1. Gender bias in the mathematical community exists.
  2. Gender bias in the mathematical community is pervasive and systematic.

The first assertion is pretty hard to argue with, though I’m sure some people still try. Every math department with which I have been affiliated is *massively* male dominated, and there is ample evidence that hiring practices, salaries, journals, etc. are stacked in favor of men. I’m not going to try to document or justify this in any detail because I don’t have the facts available at my fingertips and because the issue has been argued to my satisfaction elsewhere (e.g. in the Accidental Mathematician).

The second assertion might be more surprising to some, and it’s the one I want to discuss here. Izabella Laba’s post quotes a recent study in which faculty from research oriented universities were presented with applications for a lab manager position with randomly assigned male or female names. The study found that a given application with a male name at the top was consistently rated more highly than the same application with a female name. Interestingly enough, the pattern was independent of the gender of the faculty evaluator: female professors were just as biased as male professors. Cathy O’Neil contributes another study which shows that 15-year-old girls outperform 15-year-old boys in science exams in some countries but not others (not in the United States), indicating that gender gaps in science are cultural rather than biological.

Both of these studies are quite compelling, and I’m sure there are others which point to the same conclusion. My intention is to participate in this discussion subjectively rather than objectively. In short, I am going to use the rest of this post to analyze my own gender-oriented biases. Something feels a bit self-indulgent about this exercise, but I think it will be healthy for me even if it isn’t useful for anyone else.

I will begin by admitting outright that I am biased against women. I consider myself to be a pretty progressive guy – perhaps even more progressive than most – and I think that most people who know me would say that overall I do a good job of treating women with the same respect with which I treat men. But this is not because I don’t have biases, it’s because I work very hard to identify them and eliminate them or at least minimize their impact on my behavior. I am unqualified to generalize my own psychological observations to everyone else, but I suspect that it is neurologically almost impossible for a person socialized in 20th or 21st century American society to avoid gender biases: we are bombarded with overt and covert messages about gender constantly and starting at a very young age. Given what I have been learning lately about how insignificant our conscious thought processes are in comparison to our subconscious psychological machinery, these messages must take their toll.

What forms do gender biases take? There are many answers with varying applicability to me. Here is a non-comprehensive unordered list that I have assembled from reading things online, talking to people, and making my own observations.

  • Intelligence and Competence Bias: This is simply the assumption that women are less intelligent or less competent than men. I have heard numerous stories in which Andrew launches into a lengthy explanation to Barbara about a subject in which Barbara is more of an expert than Andrew. Here is a particularly cringe-inducing example of this. I tend not to offer unsolicited explanations to men or women very often, and when providing solicited explanations I usually make an effort to identify my audience’s background, so I don’t think I am terribly guilty of this particular behavior. Instead, I notice this bias in myself when I am seeking an expert on a particular subject and I am presented with a male option and a female option. Sometimes I catch myself behaving or thinking according to the assumption that the male expert is more knowledgeable or more adept than the female expert even if I have no particular reason to make such a judgement. I have to force myself to think deliberately about what I know and don’t know when making these sorts of comparisons.
  • Experience Bias: Lately I have started noticing a disturbing pattern in my judgements about a person’s age, experience, education level, etc.: my estimates are consistently lower than reality for women and higher than reality for men. I have heard many stories from women in which they are demoted from faculty member to graduate student or from graduate student to undergraduate by a male interlocutor, and I am embarrassed to admit that I have done this before. I have also heard stories in which a female graduate student or faculty member has been assumed to be a secretary or staff person; I don’t necessarily consider this to be a “demotion,” but I doubt I would appreciate it if it happened to me. These days I try to avoid guessing somebody’s position or experience level at all, and if I do make a guess it’s generally “faculty” regardless of gender (in a university setting). Still, it requires conscious effort on my part.
  • Common Ground Bias: This is the assumption that, all things being equal, I will have more in common with a male than a female. This bias is fairly understandable – there are, after all, real biological and social differences between men and women – but I think it has unfortunate consequences in an academic setting. Few of my mathematical conversations with my peers begin, “Hi, my name is Paul. Would you like to have a conversation about elliptic cohomology?” Instead, they typically begin with the usual introductory social graces and lead into mathematical territory after a basic rapport has been established. This rapport is more difficult to establish with a person with whom I assume I will have a harder time identifying before the conversation even begins, and consequently I am more likely to engage in mathematical conversations with my male colleagues than my female colleagues. I don’t know how socially isolated women in math departments feel, but I suspect that it’s more of a problem than I realize. My plan for reducing the impact of this bias is to simply be more bold and less awkward about engaging people in conversation, but this isn’t always easy.
  • Sexual Biases: I am a heterosexual man who is attracted to intelligent and ambitious women, and the women that one finds employed in a math department often fit this description. Sexual attraction is firmly rooted in extremely powerful subconscious processes, and I am certain it affects my interactions with my female colleagues in ways that I don’t fully understand. If nothing else, it consumes some measure of my mental energy that is liberated when I’m interacting with men. It seems very hard to deal with the subconscious aspects of this bias, but I long ago adopted a mechanism which at least helps me manage the factors that are under my control. I decided early on in graduate school that I would categorically avoid romantically pursuing anyone in my own department. This allows me to sidestep the hazards associated with workplace romances in general, but mainly it helps me ensure that I treat all of my colleagues as professionally as possible. I don’t know how often the average woman in a math department is forced to deal with romantic overtures from her male colleagues, but given the highly skewed gender ratios I’m guessing it’s more than I imagine. I am also largely ignorant of the consequences of this behavior.

I’m sure there are other biases worth mentioning, but this list feels like a good start. One interesting supplementary observation about biases in general is that thinking about them leads to an unfortunate feedback loop: worrying about biases against women affects my behavior toward women. I think this effect is fairly minimal in comparison to the consequences of ignoring my biases and failing to monitor my behavior at all, but it’s there all the same.

My final remark about this subject is that there are many other bias issues which are also largely ignored by the mathematical and scientific community. I have encountered some discussion of racial bias in science, but I have heard almost no discussion about biases related to sexual orientation. If anyone reading this is aware of any studies or references about these issues, I would be interested in seeing them. Also, in this post I have focused on the effects of bias on my interactions with my colleagues, but the way my biases manifest themselves in my teaching is a whole other subject which I might take up in the future.
