Higher Homotopy Groups via Vector Calculus

This semester I taught an undergraduate course on topology (continuity, compactness, connectedness, and basic homotopy theory) and on the last day of class I decided to give a brief introduction to the theory of higher homotopy groups. For motivation, consider the classical Brouwer fixed point theorem:

Theorem: Every continuous function f \colon B^2 \to B^2 has a fixed point, where B^2 is the closed unit ball in the plane.
Proof:
Suppose f has no fixed point, meaning x and f(x) are distinct for every x \in B^2. Define a function r \colon B^2 \to S^1 (where S^1 is the boundary circle of B^2) as follows. Given x \in B^2 there is a unique line in the plane containing both x and f(x), so there is a unique line segment containing x whose endpoints consist of f(x) and a point on S^1. Define r(x) to be the endpoint on S^1. Explicit calculations (using the continuity of f) show that r is continuous, and moreover if x \in S^1 then r(x) = x. A continuous function from a topological space X to a subset A \subseteq X which restricts to the identity on A is called a retraction; we have shown that if there is a continuous function f \colon B^2 \to B^2 with no fixed points then there is a retraction r \colon B^2 \to S^1.

Let us use algebraic topology to prove that there is no such retraction. Let i \colon S^1 \to B^2 denote the inclusion map, so that r \circ i \colon S^1 \to S^1 is the identity. Passing to the induced homomorphism on fundamental groups, this shows that r_* \circ i_* \colon \pi_1(S^1) \to \pi_1(S^1) is the identity and hence r_* is surjective. But \pi_1(B^2) is the trivial group since B^2 is contractible and \pi_1(S^1) \cong \mathbb{Z}, so r_* \colon \pi_1(B^2) \to \pi_1(S^1) could not possibly be surjective, a contradiction. QED

One might wonder if the argument above works for the closed unit ball B^{n+1} \subseteq \mathbb{R}^{n+1}. Indeed, the first part of the argument works in higher dimensions almost verbatim, and one gets that any continuous function f \colon B^{n+1} \to B^{n+1} with no fixed points gives rise to a retraction r \colon B^{n+1} \to S^n onto the boundary sphere. But the second part of the argument fails: the fundamental group of S^n is trivial for n > 1, so there is no contradiction. The solution is to replace the fundamental group \pi_1 with the higher homotopy group \pi_n; whereas \pi_1(X) is the group of homotopy classes of continuous maps S^1 \to X, \pi_n(X) is the group of homotopy classes of continuous maps S^n \to X (of course, all spaces, maps, and homotopies must have base points).

In the proof of the Brouwer fixed point theorem above, we only needed three properties of the fundamental group:

  • Every continuous map f \colon X \to Y induces a group homomorphism f_* \colon \pi_1(X) \to \pi_1(Y).
  • If f_1, f_2 \colon X \to Y are homotopic continuous maps then (f_1)_* = (f_2)_* \colon \pi_1(X) \to \pi_1(Y).
  • \pi_1(S^1) is not the trivial group.

The first two of these properties generalize to higher homotopy groups with almost identical proofs. The counterpart of the third property, namely that \pi_n(S^n) is not the trivial group, is considerably more difficult. One typically computes \pi_1(S^1) using covering space theory, but there is no counterpart of covering space theory for higher homotopy groups. (Well, such a theory does exist in a manner of speaking, but it is much more complicated than covering space theory.)

To actually compute \pi_n(S^n) one needs some rather powerful tools in algebraic topology, such as the Freudenthal suspension theorem or the Hurewicz isomorphism. The difficulty of this computation is still a bit mysterious to me, and was the subject of one of my recent MathOverflow questions. Even the more modest goal of proving that \pi_n(S^n) is non-trivial is quite a bit more challenging for n > 1 than for n = 1. Nevertheless, I came up with an argument in the case n = 2 based on vector calculus which is suitable for undergraduates; I don’t think I’ve seen this exact argument written down anywhere else, so I thought I would write it up here. It is adapted from a more standard argument involving Stokes’ theorem on manifolds which works in any dimension (but which requires a semester’s worth of manifold theory to understand).

I will prove the following statement:

Main Theorem: There is no continuous retraction f \colon B^3 \to S^2.

This alone is enough to prove the Brouwer fixed point theorem for B^3 without having to worry about higher homotopy groups, but in fact it implies that \pi_2(S^2) is nontrivial. Pick a base point p \in S^2 and consider the identity map I \colon S^2 \to S^2. This determines a class [I] \in \pi_2(S^2,p), so if \pi_2(S^2,p) is the trivial group then there is a base point preserving homotopy between I and the constant map e_p \colon S^2 \to S^2 given by e_p(x) = p. This homotopy is a continuous map H \colon S^2 \times [0,1] \to S^2 which satisfies:

  • H(x,0) = x
  • H(x,1) = p
  • H(p,t) = p for all t

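Given such a homotopy, define r \colon B^3 \to S^2 by r(x) = H(\frac{x}{|x|}, 1 - |x|) if x \neq 0 and r(0) = p. It is not hard to check that r is a retraction (it is continuous at the origin because H(y,1) = p for every y \in S^2, and r(x) = H(x,0) = x when |x| = 1), contradicting the main theorem.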

To prove the main theorem we need a technical lemma:

Lemma: If there is a continuous retraction r \colon B^3 \to S^2 then there is a smooth retraction.

The proof of this lemma uses some slightly complicated analysis, but ultimately it is fairly standard; see the final chapter of Gamelin and Greene’s “Introduction to Topology”, for example. The only other non-trivial input required to prove the main theorem is the following classical result from vector calculus:

Divergence Theorem: Let E be a compact subset of \mathbb{R}^3 whose boundary is a piecewise smooth surface \partial E, let n denote the outward unit normal field on \partial E, and let F be a smooth vector field on E. Then:

\int_{\partial E} F \cdot n\, dS = \int_E div F\, dV

Here div (“divergence”) is the differential operator div(P,Q,R) = P_x + Q_y + R_z.

Proof of Main Theorem:
By the previous lemma it suffices to show that there is no smooth retraction from B^3 to S^2, so suppose r \colon B^3 \to S^2 is such a retraction and denote its component functions by r = (P,Q,R). Thus r may be viewed as a smooth vector field on B^3; since r(v) = v for v \in \partial B^3 = S^2 we have P(x,y,z) = x, Q(x,y,z) = y, and R(x,y,z) = z for every (x,y,z) \in S^2.

Consider the smooth vector field F = P(\nabla Q \times \nabla R) where \nabla is the gradient operator. We will compute the flux of F through S^2 in two different ways and get two different answers, giving a contradiction. Both computations will use the divergence theorem:

\int_{S^2} F \cdot n\, dS = \int_{B^3} div F\, dV

The first computation uses a bit of vector calculus. By the product rule for the divergence of a function multiplied by a vector field, we have:

div(P (\nabla Q \times \nabla R)) = \nabla P \cdot (\nabla Q \times \nabla R) + P div(\nabla Q \times \nabla R)

The second term on the right-hand side vanishes by the product rule for the divergence of the cross product of two vector fields:

div(\nabla Q \times \nabla R) = \nabla R \cdot curl(\nabla Q) - \nabla Q \cdot curl(\nabla R) = 0

Here we used the fact that the curl of the gradient of any smooth function is the zero vector.

According to the standard “triple product” formula from vector algebra, the first term \nabla P \cdot (\nabla Q \times \nabla R) is the determinant of the Jacobian matrix J_r of r, whose rows are \nabla P, \nabla Q, and \nabla R. I claim that this determinant is zero. Since r takes values in S^2 we have that P^2 + Q^2 + R^2 = 1; differentiating both sides of this equation with respect to x gives P P_x + Q Q_x + R R_x = 0, or equivalently r \cdot r_x = 0. Similarly r \cdot r_y = 0 and r \cdot r_z = 0, so the vectors r_x, r_y, and r_z are all orthogonal to the same nonzero vector r and hence there is a nontrivial dependence relation between them. But r_x, r_y, and r_z are the columns of J_r, so it follows that \det(J_r) = 0. Hence div F = 0 on B^3, and we conclude:

\int_{S^2} F \cdot n\, dS = 0

by the divergence theorem. (The various identities used in this argument all appear in the Wikipedia page on Vector Calculus Identities, with the notation div F = \nabla \cdot F and curl F = \nabla \times F.)
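The key step above, that three gradients of the components of a sphere-valued map have zero triple product, can be checked numerically. The following is only a sketch: the map `sphere_map` is a hypothetical example chosen for illustration (it is smooth on B^3 and takes values in S^2, but it is of course not a retraction), and the derivatives are approximated by central differences rather than computed exactly.

```python
import math

def sphere_map(x, y, z):
    """Hypothetical smooth map B^3 -> S^2: shift by (2, 0, 0), then normalize."""
    u, v, w = x + 2.0, y, z
    norm = math.sqrt(u * u + v * v + w * w)
    return (u / norm, v / norm, w / norm)

def jacobian(f, p, h=1e-5):
    """Central-difference Jacobian of f at p; row i approximates the gradient of component i."""
    rows = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        plus = list(p)
        minus = list(p)
        plus[j] += h
        minus[j] -= h
        f_plus, f_minus = f(*plus), f(*minus)
        for i in range(3):
            rows[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return rows

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def triple_product_of_gradients(f, p):
    """Approximates grad P . (grad Q x grad R) for f = (P, Q, R) at the point p."""
    return det3(jacobian(f, p))
```

For instance, `triple_product_of_gradients(sphere_map, (0.3, -0.2, 0.5))` should be zero up to the finite-difference error, in line with the claim that the determinant vanishes.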

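Now let us compute the same integral using the fact that r(x,y,z) = (x,y,z) on S^2. The integrand F \cdot n = P (\nabla Q \times \nabla R) \cdot n involves only the values of P on S^2 and the derivatives of Q and R in directions tangent to S^2, because the normal component of a cross product depends only on the tangential components of its two factors. On S^2 these agree with the corresponding quantities for the functions x, y, and z, so we may replace F by the field x (\nabla y \times \nabla z) = (x,0,0) without changing the integral. This field has divergence 1, so by the divergence theorem we get: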

\int_{S^2} F \cdot n\, dS = \int_{B^3} 1\, dV = vol(B^3) \neq 0

This is a contradiction. QED
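As a sanity check on the value vol(B^3), the flux of the field (x,0,0) through S^2 can also be computed directly. The sketch below assumes three standard facts that are not part of the argument above: n = (x,y,z) on S^2, the three coordinates contribute equally by symmetry, and S^2 has area 4\pi.

```latex
\int_{S^2} (x,0,0) \cdot n \, dS
  = \int_{S^2} x^2 \, dS
  = \frac{1}{3} \int_{S^2} (x^2 + y^2 + z^2) \, dS
  = \frac{1}{3} \cdot 4\pi
  = \frac{4\pi}{3}
  = \mathrm{vol}(B^3)
```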

The Alternating Group is Simple I

This past week I covered an abstract algebra course at Columbia, and I decided to prove that the alternating group A_n is simple. I in fact did this in the same algebra class last year, but in the intervening months I almost entirely forgot how the argument goes. So I decided to write it up here while it’s still fresh in my mind. It’s a very nice – and fairly elementary – little application of some important ideas in group theory. In this post I’m going to give some background and explain the significance of the simplicity of A_n, and in the sequel I will go through the proof.

The Alternating Group


I am forced to assume that the reader is comfortable with basic group theory, but I’ll begin by reviewing some of the key ideas. Recall that the symmetric group S_n is the group of permutations of n symbols 1, 2, \ldots, n. A transposition is a permutation which swaps exactly two symbols and leaves the others fixed; it is not hard to see that any permutation can be expressed as the product of transpositions. A permutation is said to be even (respectively, odd) if it can be written as the product of an even (respectively, odd) number of transpositions.

Definition: The alternating group A_n is the subgroup of S_n consisting of all even permutations.

A_n is a normal subgroup of S_n of index 2; the objective of this series of posts is to prove that A_n is simple for n \geq 5, meaning its only normal subgroups are itself and the trivial group. The significance of this property is that if a group G has a normal subgroup H then one can form the quotient group G/H, and often one can infer properties of G from properties of H and G/H. So simple groups are in a sense the “atoms” from which all other groups are built, though it should be noted that H and G/H alone do not uniquely determine G.

The Classification of Simple Groups

Classifying all finite simple groups was one of the great achievements of 20th century mathematics, and like many great mathematical achievements it went almost completely unnoticed by the rest of the world. The classification theorem asserts that all finite simple groups fit into a few infinite families (one of which is the family of alternating groups) with precisely 26 exceptions, the so-called sporadic simple groups. A shameless plug: when I was an undergraduate I did an REU project with Igor Kriz which involved making little computer games based on the sporadic simple groups; later we wrote a Scientific American article about them.

In any event, the classification program took decades to complete and spans thousands of pages written by dozens of mathematicians, and its completion seems to have essentially killed off finite group theory as an active area of research (though from what I understand there are lots of open problems in representation theory for finite groups). Given how monumental the effort was and how few people are still working in finite group theory, I worry that in a few decades all the experts will retire or die and there will be nobody left who understands the proof. It’s a good illustration of the principle that mathematicians tend to care much more about questions than answers.

The Alternating Group and Galois Theory

Aside from their role in the classification program, the alternating groups play a crucial role in the theory of polynomial equations. Indeed, the very notion of a group was invented to understand the structure of solutions to polynomial equations, and the group A_5 is the star of the show.

Everyone learns in high school algebra that there is a formula for the roots of a quadratic equation ax^2 + bx + c = 0:

x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}

Less well known is that there is also a cubic formula and quartic formula for degree three and four equations, respectively. These formulas date back to the 16th century, and it was a frustratingly difficult open problem to find a formula for the roots of a polynomial equation of degree five. It wasn’t until the 19th century that Abel and Galois independently realized that no such formula exists! Abel’s proof came first, but I don’t know what it was; Galois’ argument is the one that survived. Here is a brief sketch.

Galois’ key idea was to focus on the symmetries exhibited by the roots of a polynomial equation. More precisely, he considered their symmetries relative to the rational numbers; there are well-known techniques for finding rational roots of polynomials, so he was interested in the structure of the irrational roots. Let’s look at a couple examples:

  • The roots of x^2 - 2 are \pm \sqrt{2}, so you can get from one root to the other by multiplying by -1. Thus the cyclic group C_2 naturally exhibits the symmetries of the roots.
  • The roots of x^4 - 1 are i, -i, 1, and -1. Notice that you can cycle through the roots just by looking at powers of i: i^0 = 1, i^1 = i, i^2 = -1, i^3 = -i and i^4 = 1. Thus the symmetries of the roots are given by the cyclic group C_4.
  • The roots of (x^2 - 2)(x^2 - 3) are \pm \sqrt{2} and \pm \sqrt{3}. The roots \sqrt{2} and -\sqrt{2} are interchangeable, as are \sqrt{3} and -\sqrt{3}, but over the rational numbers there is a sort of asymmetry between \sqrt{2} and \sqrt{3}. Thus the symmetry group is C_2 \times C_2.

Of course, one can make all this precise using the language of field extensions. The upshot is that the symmetry groups help characterize what it means to find a formula for the roots of a polynomial equation. As in the example above, equations of the form x^n - a = 0 have cyclic symmetry group C_n. So if the quintic formula had the form \sqrt[5]{a + \sqrt[3]{b}}, for instance, then the symmetry group could be decomposed into a C_5 part and a C_3 part corresponding to the fifth root and cube root, respectively. More precisely, a polynomial equation can be solved by radicals if and only if its symmetry group G has a decomposition

G = G_0 \supseteq G_1 \supseteq \ldots \supseteq G_n = \{1\}

where G_i is a normal subgroup of G_{i-1} and G_{i-1}/G_i is cyclic. Groups with this property are said to be solvable due to the connection with solving equations.
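To make the definition concrete, here is one such decomposition for the symmetric group S_4. It is offered as an illustration rather than as part of the argument; the names V_4 = \{1, (12)(34), (13)(24), (14)(23)\} and C_2 = \{1, (12)(34)\} are notation introduced for this example.

```latex
S_4 \supseteq A_4 \supseteq V_4 \supseteq C_2 \supseteq \{1\},
\qquad
S_4 / A_4 \cong C_2, \quad
A_4 / V_4 \cong C_3, \quad
V_4 / C_2 \cong C_2, \quad
C_2 / \{1\} \cong C_2
```

Every quotient is cyclic, so S_4 is solvable, which is consistent with the existence of a quartic formula.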

Now, there exist polynomials of degree 5 whose symmetry group is the full symmetric group S_5 (in fact there are many). S_5 contains A_5 as a normal subgroup with quotient C_2, but once we have proved that A_5 is simple we will know that S_5 is not solvable: A_5 is not cyclic, and it has no proper nontrivial normal subgroups whatsoever, let alone one with a cyclic quotient. This argument shows that there cannot be a general formula in the spirit of the quadratic, cubic, or quartic formulas, but it also shows even more: it gives you a criterion (solvability of the symmetry group) to determine when there is a formula for the roots of a specific polynomial.
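The simplicity of A_5 can also be confirmed by brute force. The sketch below is not the proof promised for the sequel, and it only handles n = 5. It relies on two standard facts: a normal subgroup is a union of conjugacy classes that contains the identity, and its order divides the order of the group. All function names are made up for this illustration.

```python
from itertools import combinations, permutations

def compose(p, q):
    """Return the permutation 'p after q' on {0, ..., n-1}."""
    return tuple(p[i] for i in q)

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def is_even(p):
    """A permutation is even exactly when its number of inversions is even."""
    inversions = sum(1 for i, j in combinations(range(len(p)), 2) if p[i] > p[j])
    return inversions % 2 == 0

def conjugacy_class_sizes(group):
    """Sorted sizes of the conjugacy classes of a group given as a list of tuples."""
    remaining = set(group)
    sizes = []
    while remaining:
        x = next(iter(remaining))
        cls = {compose(compose(g, x), inverse(g)) for g in group}
        sizes.append(len(cls))
        remaining -= cls
    return sorted(sizes)

def possible_normal_subgroup_orders(class_sizes, group_order):
    """Orders of unions of classes that contain the identity and divide the group order.

    Assumes class_sizes is sorted and its first entry (1) is the identity's class.
    """
    others = class_sizes[1:]
    orders = set()
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            order = 1 + sum(subset)
            if group_order % order == 0:
                orders.add(order)
    return sorted(orders)

A5 = [p for p in permutations(range(5)) if is_even(p)]
sizes = conjugacy_class_sizes(A5)
orders = possible_normal_subgroup_orders(sizes, len(A5))
```

The only candidate orders that survive are 1 and 60, so the only normal subgroups of A_5 are the trivial group and A_5 itself.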

The Geometry of Curves of Constant Width

Today I will finally fulfill my earlier promise to revisit the geometry of curves of constant width. I doubt anyone was going to hold me to this promise, but it’s generally good to keep your promises even if you only made them to your own blog. In any event, I have two goals in this post:

  1. Prove that if two curves have the same constant width then they have the same length (perimeter).
  2. Prove that among all curves with a given constant width the circle encloses the largest area.

Let us begin by providing some precise definitions. Recall that a plane curve is simply a continuous function \gamma \colon [0,1] \to \mathbb{R}^2, and a plane curve is closed if it begins and ends at the same point, i.e. \gamma(0) = \gamma(1). A closed curve is simple if it intersects itself only at the endpoints, meaning \gamma(t_1) = \gamma(t_2) only if t_1 = t_2 or t_1 and t_2 are both endpoints of the interval [0,1]. The most basic fact about simple closed curves is that they divide the plane into two disconnected regions: a bounded piece (the “inside”) and an unbounded piece (the “outside”). This is called the Jordan curve theorem, and as far as I know the simplest proofs use some reasonably sophisticated ideas in algebraic topology (though only a mathematician would think it even needs to be proved!)

Given a simple closed curve \gamma, let C_\gamma denote the image of \gamma, i.e. the set of all points in the plane that \gamma passes through, and let D_\gamma denote C_\gamma together with the points “inside” \gamma. A line in the plane is said to be a supporting line for D_\gamma if it intersects C_\gamma but does not pass through any interior points of D_\gamma. The set D_\gamma is closed and bounded, so there are exactly two distinct supporting lines for D_\gamma in any given direction. The set of directions in the plane can be parametrized by an angle \theta between 0 and \pi (with the understanding that 0 and \pi represent the same direction). Thus we define a “width” function w_\gamma on the set of directions by letting w_\gamma(\theta) denote the distance between the supporting lines for D_\gamma in the direction \theta. Here’s what the width looks like in an example:

[Figure: the width w_\gamma(\theta) of a curve in the direction \theta]

Finally, we say that \gamma has constant width if w_\gamma is constant. The goal is to prove that any two curves of constant width w have the same length, and that among all curves of constant width w the circle of diameter w has the largest area. Before proceeding, we need to understand the geometry of constant width curves a little better.

Specifically, we want to show that every curve \gamma of constant width is convex, meaning D_\gamma contains the line segment between any two of its points. In fact we will prove something a bit stronger: \gamma is strictly convex, meaning it is convex and C_\gamma contains no line segments (so that the line segment joining any two points in D_\gamma lies in the interior of D_\gamma, except possibly for its endpoints). This requires a nice little trick that I couldn’t figure out on my own; special thanks to Ian Agol for helping me out on MathOverflow.

Proposition: Every curve of constant width is strictly convex.
Proof: Let H_\gamma denote the convex hull of D_\gamma; this is by definition the smallest convex set which contains D_\gamma. According to a general fact from convex geometry, the boundary of H_\gamma consists only of points in the boundary of D_\gamma and possibly line segments joining points in the boundary of D_\gamma. So we will show that the boundary of H_\gamma contains no line segments, implying that H_\gamma = D_\gamma and hence that D_\gamma is strictly convex.

According to another general fact from convex geometry the supporting lines for H_\gamma are precisely the same as the supporting lines for D_\gamma, and hence H_\gamma has the same constant width w as D_\gamma. So assume that the boundary of H_\gamma contains a line segment joining two points a and b. Since H_\gamma is convex, the line \ell passing through a and b is a supporting line for H_\gamma. There is exactly one other supporting line for H_\gamma parallel to this line; let c denote a point where it intersects H_\gamma. Consider the triangle abc; its height is precisely w, the width of H_\gamma, so we have that w is strictly smaller than at least one of dist(a,c) or dist(b,c). Assume w < dist(a,c) and consider the supporting lines for H_\gamma which are perpendicular to the line segment joining a and c. The points a and c must lie between (or possibly on) these supporting lines, but the distance between the supporting lines is w since H_\gamma has constant width. We conclude that w < dist(a,c) \leq w, a contradiction.
QED

The reason why strict convexity is important to us is that lines intersect strictly convex curves in a very predictable way:

Lemma: Let \gamma be a closed strictly convex curve and let L be a line which intersects C_\gamma. Then L intersects C_\gamma exactly once if it is a supporting line or exactly twice if it is not.
Proof: Note that the intersection of two convex sets is again convex, so the intersection I = L \cap D_\gamma is a convex subset of a line. Since D_\gamma is closed and bounded the same must be true of the intersection, so the only possibility is that I is a closed interval [a,b] with a \leq b. Note that interior points of [a,b] correspond to interior points of D_\gamma and the boundary points a and b correspond to boundary points of D_\gamma, so we have that a = b if and only if L is a supporting line and a < b otherwise. Thus supporting lines intersect C_\gamma exactly once and any other line which intersects C_\gamma does so exactly twice.
QED

We are now ready to calculate the length of a constant width curve. Our strategy is to use the main result of my previous post, “The Mathematics of Throwing Noodles at Paper.” There we saw that if one randomly tosses a curve of length \ell at a lined sheet of paper with line spacing d then the expected number of line intersections is given by \frac{2 \ell}{\pi d}. So let us toss our curve of constant width w at a lined sheet of paper with line spacing w. The curve must intersect at least one line and it can’t intersect three or more lines, so it either intersects exactly one line or exactly two lines. The curve intersects exactly two lines if and only if they are supporting lines, and hence each line intersects the curve exactly once by the lemma above. If the curve intersects exactly one line then it cannot be a supporting line and thus the lemma implies that the curve intersects the line exactly twice. In either case the total number of intersections is exactly 2, and thus the expected number of intersections is 2. Therefore

2 = \frac{2 \ell}{\pi w}

and hence \ell = \pi w. Thus every curve of constant width w has length \pi w, an assertion consistent at least with the circle of diameter w. The result is called Barbier’s Theorem, and it has a variety of different proofs; I find the argument using geometric probability to be the most beautiful.
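Barbier's Theorem can be tested numerically on a curve of constant width other than the circle. The sketch below uses the Reuleaux triangle (three circular arcs of radius w centered at the vertices of an equilateral triangle of side w), which is brought in here purely as an example. It samples the boundary, measures the width in many directions as the spread of the projections onto the normal, and approximates the length by a polygon through the sample points.

```python
import math

def reuleaux_points(w, samples_per_arc=2000):
    """Sample the boundary of a Reuleaux triangle of width w, counterclockwise."""
    centers = [(0.0, 0.0), (w, 0.0), (w / 2, w * math.sqrt(3) / 2)]
    start_angles = [0.0, 2 * math.pi / 3, 4 * math.pi / 3]
    points = []
    for (cx, cy), a0 in zip(centers, start_angles):
        # Each arc has radius w and spans an angle of pi/3 around its center.
        for k in range(samples_per_arc):
            t = a0 + (math.pi / 3) * k / samples_per_arc
            points.append((cx + w * math.cos(t), cy + w * math.sin(t)))
    return points

def width(points, theta):
    """Approximate distance between the two supporting lines with direction theta."""
    # Project the sample points onto the unit normal of the direction theta.
    nx, ny = -math.sin(theta), math.cos(theta)
    projections = [x * nx + y * ny for x, y in points]
    return max(projections) - min(projections)

def perimeter(points):
    """Length of the closed polygon through the sample points."""
    count = len(points)
    return sum(math.dist(points[i], points[(i + 1) % count]) for i in range(count))
```

With `pts = reuleaux_points(2.0)`, the value of `width(pts, theta)` should stay close to 2 for every `theta`, and `perimeter(pts)` should be close to `2 * math.pi`, up to discretization error.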

We have now settled the length question; what about area? In fact, to place an upper bound on the area inside a constant width curve we will simply use our length calculation together with the following landmark theorem in geometry:

Theorem: Let \ell be the length of a simple closed curve in the plane and let A be the area that it encloses. Then:
4\pi A \leq \ell^2
with equality if and only if the curve is a circle.

In other words, among all curves with a given length the circle is the unique curve which encloses the largest area. This theorem is called the isoperimetric inequality, and it has many beautiful proofs, generalizations, and applications. Our claim about the area enclosed by constant width curves is an immediate corollary since they all have the same length (given a fixed width). I originally intended to prove the isoperimetric inequality in this post using geometric probability, but I would need to take some time to explain how to calculate area probabilistically and I think the post is long enough as it is. Perhaps I will revisit this in the future.
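Spelled out, the corollary is a one-line computation: substitute the length \ell = \pi w of a curve of constant width w into the isoperimetric inequality.

```latex
4\pi A \leq \ell^2 = (\pi w)^2
\quad\Longrightarrow\quad
A \leq \frac{\pi w^2}{4} = \pi \left( \frac{w}{2} \right)^2
```

The right-hand side is the area of the circle of diameter w, and equality holds only for that circle.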

The Mathematics of Throwing Noodles at Paper

An Experiment

Get out a sheet of paper and draw parallel lines on it spaced two inches apart. Take a one-inch long needle and repeatedly toss it onto the sheet of paper, counting the total number of needle tosses and the number of times the needle touches one of the lines. What do you expect the ratio of the number of tosses to the number of line intersections to be? Equivalently, what is the probability that a randomly tossed needle intersects a line? I don’t think the answer is obvious, but the ratio turns out to be the ubiquitous number \pi \approx 3.14159..., so the probability is \frac{1}{\pi}. You can in principle use this to experimentally calculate \pi, though you unfortunately need to toss the needle an impractical number of times in order to get a reasonable estimate.

This experiment is known as Buffon’s Needle Experiment; in this post I’m going to explore a more general and seemingly more difficult phenomenon called Buffon’s Noodle Experiment. The setup is the same as before, only instead of tossing a needle (a line segment), we’ll toss a rigid “noodle” in the shape of any desired plane curve. We’ll find that the statistics of noodle crossings are determined just by the length, and not the specific shape, of the noodle in question. Thus the noodle experiment makes a profound connection between geometry and probability theory, a connection which helps solve difficult problems in both areas. This is encapsulated by a beautiful tool called Crofton’s Formula, the first result in an area of mathematics called “Integral Geometry” (or alternatively “Geometric Probability,” depending on whom you ask).

Part of the beauty of the integral geometry approach to Buffon’s needle experiment is that it involves almost no calculations. There are other approaches that involve writing out probability density functions and calculating double integrals, but the argument that I will give below involves only basic (but surprisingly subtle) ideas in probability theory and calculus. It’s a great example of a tricky problem that can be solved through careful abstract thought.

The Explanation

To come to grips with the needle and noodle experiments, we will try to answer the following question: what is the expected number of times that a randomly thrown noodle will cross lines on lined paper with line spacing d? We count with multiplicities: if the noodle intersects the same line twice, that counts as two line intersections.

Let X be the random variable which represents the number of line intersections for a given noodle. Recall that the expected value of X is given by:

E(X) = \sum_{n=0}^\infty n P(X = n)

where P(X = n) represents the probability that the number of line intersections is exactly n. Here are a few basic observations about these expectations in the case where the noodle is actually a needle (i.e. a line segment).

  • The expectation for a needle of length smaller than d (the line spacing) is just P(X = 1) because such a needle can cross at most one line.
  • The expectation for any needle depends only on the length of the needle. Thus if X_\ell denotes the random variable which represents the number of line intersections for a needle of length \ell, we can define a function f by f(\ell) = E(X_\ell).

Now, consider a noodle made up of exactly two needles of lengths \ell_1 and \ell_2 joined rigidly end-to-end. Denote the random variables representing the number of crossings for the two needles by X_1 and X_2, respectively; then the random variable representing the number of line intersections for the noodle is just X_1 + X_2. Note that X_1 and X_2 are not independent since the needles are joined, but it is nevertheless true that

E(X_1 + X_2) = E(X_1) + E(X_2) = f(\ell_1) + f(\ell_2)

A priori the expectation E(X_1 + X_2) depends on the lengths \ell_1 and \ell_2 of the two needles as well as the angle at which they are joined, but the calculation above shows that the expectation is actually independent of the angle. Therefore we can calculate the expectation just by considering the case where the angle is such that the two needles form a line segment, i.e. a needle of length \ell_1 + \ell_2. From this we conclude that

f(\ell_1 + \ell_2) = f(\ell_1) + f(\ell_2)

We can iterate this argument for any noodle made by chaining together n needles of lengths \ell_1, \ldots, \ell_n to conclude that:

f(\sum_{i=1}^n \ell_i) = \sum_{i=1}^n f(\ell_i)

In particular, if the needles all have the same length, we conclude that f(n \ell) = n f(\ell) for any positive integer n and any positive real number \ell. By dividing a needle into n equal pieces, this also shows that f(\frac{1}{n} \ell) = \frac{1}{n}f(\ell). Combining these two facts, we conclude that f(a \ell) = a f(\ell) for any positive rational number a. The expected number of crossings for a longer line segment is at least as large as the expected number of crossings for a shorter one, so we also know that the function f is non-decreasing. Also, we have that f(0) = 0 (i.e. the line segment of length zero doesn’t intersect any lines). Basic calculus tells us that the only non-decreasing function f which satisfies f(0) = 0, f(\ell_1 + \ell_2) = f(\ell_1) + f(\ell_2), and f(a \ell) = a f(\ell) for every positive rational number a is the function

f(\ell) = C \ell

where C is some constant. This is already a pretty strong statement about the expected number of crossings for a needle, but we can do even better. Take any noodle of total length \ell made up of n line segments of lengths \ell_1, \ldots, \ell_n and let X be the random variable representing the number of crossings for that noodle; as above, we have:

E(X) = \sum_{i=1}^n f(\ell_i) = \sum_{i=1}^n C \ell_i = C \sum_{i=1}^n \ell_i = C \ell

Thus the expected number of crossings for piecewise linear noodles is still just C times the length of the noodle.

Now take any noodle which is in the shape of a piecewise smooth curve \gamma. Borrowing another fact from basic calculus, \gamma is the uniform limit of a sequence of piecewise linear curves \gamma_n whose lengths converge to the length of \gamma (note that this second condition is not automatically implied by the first). Since the expectation for each of the piecewise linear noodles is given by the formula C \ell, this formula holds for any piecewise smooth noodle. Thus we have proved:

Proposition: Let X denote the random variable corresponding to the number of line crossings for a rigid piecewise smooth noodle of length \ell tossed at a lined sheet of paper with line spacing d. Then E(X) = C \ell for some constant C which depends only on d.

Already we have proven something that wasn’t really obvious at the outset: the expected number of intersections depends only on the length, and not on the shape, of the noodle! It remains only to calculate the constant C; since this constant is the same for any noodle (of any shape and any length), it suffices to work out just one example. There is one particular noodle for which this calculation is especially easy: the circle whose diameter is d (the spacing of the lines). This is because no matter how you drop such a circle on a sheet of lined paper whose lines are spaced d apart, the circle *must* meet the lines in exactly two points (either it crosses one line twice or it touches two adjacent lines once each) and therefore the expected number of intersections is simply 2. The length of the circle of diameter d is \pi d, so we have 2 = \pi d C and hence C = \frac{2}{\pi d}. Thus the expectation formula is E(X) = \frac{2 \ell}{\pi d}.
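The formula E(X) = \frac{2 \ell}{\pi d} can be tested by simulation. The following is a minimal Monte Carlo sketch for piecewise linear noodles, under the assumption that a "random toss" means a uniformly random rotation together with a uniformly random vertical shift relative to the lines y = kd (horizontal shifts do not affect the count). The function names and the two sample noodles in the usage note are invented for this illustration.

```python
import math
import random

def crossings(vertices, d, angle, offset):
    """Count line crossings of a polyline after rotating by angle and shifting by offset.

    The lines are y = k*d for integers k, so only the rotated y-coordinates matter.
    Crossings are counted with multiplicity, segment by segment.
    """
    ys = [x * math.sin(angle) + y * math.cos(angle) + offset for x, y in vertices]
    return sum(abs(math.floor(b / d) - math.floor(a / d)) for a, b in zip(ys, ys[1:]))

def expected_crossings(vertices, d, tosses, seed=0):
    """Monte Carlo estimate of E(X) for a rigid polyline noodle."""
    rng = random.Random(seed)
    total = 0
    for _ in range(tosses):
        angle = rng.uniform(0, 2 * math.pi)
        offset = rng.uniform(0, d)
        total += crossings(vertices, d, angle, offset)
    return total / tosses

def length(vertices):
    """Total length of the polyline through the given vertices."""
    return sum(math.dist(p, q) for p, q in zip(vertices, vertices[1:]))
```

For example, the straight noodle `[(0, 0), (3, 0)]` and the bent noodle `[(0, 0), (1, 0), (1, 1), (0, 1)]` both have length 3, so with `d = 1` both estimates should settle near `2 * 3 / math.pi` as the number of tosses grows, even though the shapes differ.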

In the example at the beginning of this post, we considered a needle of length 1 (so that \ell = 1) tossed at a sheet of lined paper with line spacing d = 2. According to our formula, this means that the expected number of line crossings is simply \frac{1}{\pi}. But as we observed above, the expected number of crossings for a needle which is shorter than the line spacing of the paper is simply the probability that at least one crossing will occur. Therefore this probability is \frac{1}{\pi}. According to the law of large numbers, this means that the ratio of the number of needle tosses to the number of crossings approaches \pi as the number of tosses tends to infinity.
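Here is a rough simulation of the needle experiment itself, a sketch under the assumption that the needle's center height and its angle are independent and uniformly distributed. Note the circularity: the code uses `math.pi` to sample the angle, so it illustrates the law of large numbers rather than offering a practical way to compute \pi. The name `estimate_pi` is hypothetical.

```python
import math
import random

def estimate_pi(tosses, needle=1.0, spacing=2.0, seed=0):
    """Estimate pi from Buffon's needle experiment, assuming needle <= spacing.

    The probability of a crossing is 2 * needle / (pi * spacing), so pi is
    approximated by 2 * needle * tosses / (spacing * crossings). With the
    default values this is just the ratio tosses / crossings.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(tosses):
        # Height of the needle's center above the nearest line below it.
        center = rng.uniform(0, spacing)
        angle = rng.uniform(0, math.pi)
        half_height = 0.5 * needle * math.sin(angle)
        # The needle crosses a line if it reaches y = 0 or y = spacing.
        if center - half_height < 0 or center + half_height > spacing:
            hits += 1
    return (2 * needle * tosses) / (spacing * hits)
```

The statistical error of the estimate shrinks only like one over the square root of the number of tosses, which is why a reasonable approximation of \pi requires so many of them.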

Crofton’s Formula

Having accounted for Buffon’s needle and noodle experiments, let us reflect on what we have done. We set out to answer a question about the statistics of tossing noodles at paper, and we found that the answer to the question is an explicit formula involving only the length of the noodle and some constants. Flipping this formula around, notice that this gives us a surprising way to calculate the length of the noodle:

\ell = \frac{\pi d E(X)}{2}

In other words, length is a quantity which can be measured and manipulated using statistical techniques. You might not be convinced right away that this is a useful way to think about length, but in fact there are a variety of difficult theorems in geometry which have remarkably easy proofs when they are translated into this language.

Actually, it’s useful to think about all of this in a slightly different way. Instead of throwing the noodle at the paper, we’ll imagine throwing the paper at the noodle. In other words, we’ll ask a slightly different question: what is the average number of times that a random line in the plane intersects a given plane curve? This question is conceptually a little more problematic than the old question because it is not completely clear what the phrase “random line” should mean; Bertrand’s Paradox is a good illustration of the subtleties involved.

Here is the right meaning of the phrase for our purposes. The set of all oriented lines in the plane can be parametrized by two coordinates: the (signed) distance r from a line to the origin (a number from -\infty to \infty) and the direction \theta (an angle) in which it points (a number from 0 to 2\pi). With this parametrization, we can interpret a “random line” to simply be a random point (r,\theta) in the strip (-\infty,\infty) \times [0,2\pi]. (To placate the highly mathematically literate members of my readership, I’ll remark that the space of lines in the plane is a homogeneous space which has a measure invariant under rigid motions, unique up to scaling, and this space differs from the strip with Lebesgue measure by a set of measure zero.)

Now, associated to any piecewise smooth curve \gamma in the plane is a function n_{\gamma}(r,\theta) which represents the number of times that the line determined by the values (r,\theta) intersects \gamma. Crofton’s formula relates the average value of this function to the length of \gamma:

Crofton’s Formula: \text{Length}(\gamma) = \frac{1}{4} \iint n_{\gamma}(r,\theta)\, dr\, d\theta

This formula can be proved using more or less the same procedure that we used to calculate the expected number of crossings for a noodle thrown at a sheet of lined paper: argue that the integral on the right-hand side is additive for line segments attached end-to-end, use an approximation argument to show that it agrees with length up to a multiplicative constant, and fix the constant by calculating a single explicit example (the circle is once again a good choice).
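The constant \frac{1}{4} can be checked against the circle. For a circle \gamma of radius R centered at the origin (a choice made here only for convenience), the oriented line with coordinates (r,\theta) meets \gamma twice when |r| < R and not at all when |r| > R, so:

```latex
\frac{1}{4} \iint n_{\gamma}(r,\theta)\, dr\, d\theta
  = \frac{1}{4} \int_0^{2\pi} \int_{-R}^{R} 2 \, dr\, d\theta
  = \frac{1}{4} \cdot 2\pi \cdot 4R
  = 2\pi R
```

This agrees with the circumference of the circle; the tangent lines with |r| = R form a set of measure zero and do not affect the integral.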

As I alluded to at the beginning of this post, Crofton’s formula is the tip of a very deep iceberg. There are analogous formulas for area, volume, and a plethora of interesting geometric quantities. In my next post I will use Crofton’s formula to deduce some facts about curves of constant width, tying this post together with my post on the Coaster Roller.