Theorem: Every continuous function $f \colon B^2 \to B^2$ has a fixed point, where $B^2$ is the closed unit ball in the plane.
Proof:
Suppose $f$ has no fixed point, meaning $x$ and $f(x)$ are distinct for every $x \in B^2$. Define a function $r \colon B^2 \to S^1$ (where $S^1$ is the boundary circle of $B^2$) as follows. Given $x$, there is a unique line in the plane containing both $x$ and $f(x)$, so there is a unique line segment containing $x$ whose endpoints consist of $f(x)$ and a point on $S^1$. Define $r(x)$ to be the endpoint on $S^1$. Explicit calculations (using the continuity of $f$) show that $r$ is continuous, and moreover if $x \in S^1$ then $r(x) = x$. A continuous function from a topological space $X$ to a subset $A$ which restricts to the identity on $A$ is called a retraction; we have shown that if there is a continuous function $f \colon B^2 \to B^2$ with no fixed points then there is a retraction $r \colon B^2 \to S^1$.
Let us use algebraic topology to prove that there is no such retraction. Let $i \colon S^1 \to B^2$ denote the inclusion map, so that $r \circ i \colon S^1 \to S^1$ is the identity. Passing to the induced homomorphisms on fundamental groups, this shows that $r_* \circ i_* \colon \pi_1(S^1) \to \pi_1(S^1)$ is the identity and hence $r_* \colon \pi_1(B^2) \to \pi_1(S^1)$ is surjective. But $\pi_1(B^2)$ is the trivial group since $B^2$ is contractible and $\pi_1(S^1) \cong \mathbb{Z}$, so $r_*$ could not possibly be surjective, a contradiction. QED
One might wonder if the argument above works for the closed unit ball $B^n$ in $\mathbb{R}^n$. Indeed, the first part of the argument works in higher dimensions almost verbatim, and one gets that any continuous function $f \colon B^n \to B^n$ with no fixed points gives rise to a retraction $r \colon B^n \to S^{n-1}$ onto the boundary sphere. But the second part of the argument fails: the fundamental group of $S^{n-1}$ is trivial for $n \geq 3$, so there is no contradiction. The solution is to replace the fundamental group $\pi_1$ with the higher homotopy group $\pi_{n-1}$; whereas $\pi_1(X)$ is the group of homotopy classes of continuous maps $S^1 \to X$, $\pi_{n-1}(X)$ is the group of homotopy classes of continuous maps $S^{n-1} \to X$ (of course, all spaces, maps, and homotopies must have base points).
In the proof of the Brouwer fixed point theorem above, we only needed three properties of the fundamental group:

1. A continuous map $f \colon X \to Y$ induces a group homomorphism $f_* \colon \pi_1(X) \to \pi_1(Y)$, in such a way that identity maps induce identity homomorphisms and $(f \circ g)_* = f_* \circ g_*$.
2. The fundamental group of a contractible space (such as $B^2$) is trivial.
3. $\pi_1(S^1)$ is not the trivial group.
The first two of these properties generalize to higher homotopy groups with almost identical proofs. The counterpart of the third property, namely that $\pi_{n-1}(S^{n-1})$ is not the trivial group, is considerably more difficult. One typically computes $\pi_1(S^1)$ using covering space theory, but there is no counterpart of covering space theory for higher homotopy groups. (Well, such a theory does exist in a manner of speaking, but it is much more complicated than covering space theory.)
To actually compute $\pi_{n-1}(S^{n-1})$ one needs some rather powerful tools in algebraic topology, such as the Freudenthal suspension theorem or the Hurewicz isomorphism. The difficulty of this computation is still a bit mysterious to me, and was the subject of one of my recent MathOverflow questions. Even the more modest goal of proving that $\pi_{n-1}(S^{n-1})$ is non-trivial is quite a bit more challenging for $n \geq 3$ than for $n = 2$. Nevertheless, I came up with an argument in the case $n = 3$ based on vector calculus which is suitable for undergraduates; I don’t think I’ve seen this exact argument written down anywhere else, so I thought I would write it up here. It is adapted from a more standard argument involving Stokes’ theorem on manifolds which works in any dimension (but which requires a semester’s worth of manifold theory to understand).
I will prove the following statement:
Main Theorem: There is no continuous retraction $r \colon B^3 \to S^2$.
This alone is enough to prove the Brouwer fixed point theorem for $B^3$ without having to worry about higher homotopy groups, but in fact it implies that $\pi_2(S^2)$ is nontrivial. Pick a base point $p \in S^2$ and consider the identity map $\iota \colon S^2 \to S^2$. This determines a class $[\iota] \in \pi_2(S^2)$, so if $\pi_2(S^2)$ is the trivial group then there is a base point preserving homotopy between $\iota$ and the constant map $c_p$ given by $c_p(x) = p$. This homotopy is a continuous map $H \colon S^2 \times [0,1] \to S^2$ which satisfies:

$$H(x, 0) = x, \qquad H(x, 1) = p, \qquad H(p, t) = p$$

for all $x \in S^2$ and $t \in [0,1]$.
Given such a homotopy, define $r \colon B^3 \to S^2$ by $r(x) = H\!\left(\frac{x}{\|x\|}, 1 - \|x\|\right)$ if $x \neq 0$ and $r(0) = p$. It is not hard to check that $r$ is a retraction, contradicting the main theorem.
To prove the main theorem we need a technical lemma:
Lemma: If there is a continuous retraction $B^3 \to S^2$ then there is a smooth retraction.
The proof of this lemma uses some slightly complicated analysis, but ultimately it is fairly standard; see the final chapter of Gamelin and Greene’s “Introduction to Topology”, for example. The only other non-trivial input required to prove the main theorem is the following classical result from vector calculus:
Divergence Theorem: Let $E$ be a compact subset of $\mathbb{R}^3$ whose boundary is a piecewise smooth surface $S$, let $\mathbf{n}$ denote the outward unit normal field on $S$, and let $\mathbf{F}$ be a smooth vector field on $E$. Then:

$$\iint_S \mathbf{F} \cdot \mathbf{n}\, dS = \iiint_E \nabla \cdot \mathbf{F}\, dV$$
Here $\nabla \cdot \mathbf{F}$ (“divergence”) is the differential operator $\frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z}$ applied to the component functions $F_1, F_2, F_3$ of $\mathbf{F}$.
Proof of Main Theorem:
By the previous lemma it suffices to show that there is no smooth retraction from $B^3$ to $S^2$, so suppose $r$ is such a retraction and denote its component functions by $f$, $g$, and $h$, so that $r = (f, g, h)$. Thus $r$ may be viewed as a smooth vector field on $B^3$; since $r(x) = x$ for $x \in S^2$ we have $f(x) = x_1$, $g(x) = x_2$, and $h(x) = x_3$ for every $x = (x_1, x_2, x_3) \in S^2$.
Consider the smooth vector field $\mathbf{F} = f\, \nabla g \times \nabla h$, where $\nabla$ is the gradient operator. We will compute the integral of $\nabla \cdot \mathbf{F}$ over $B^3$ in two different ways and get two different answers, giving a contradiction. Both computations will use the divergence theorem:

$$\iint_{S^2} \mathbf{F} \cdot \mathbf{n}\, dS = \iiint_{B^3} \nabla \cdot \mathbf{F}\, dV$$
The first computation uses a bit of vector calculus. By the product rule for the divergence of a function multiplied by a vector field, we have:

$$\nabla \cdot \mathbf{F} = \nabla f \cdot (\nabla g \times \nabla h) + f\, \nabla \cdot (\nabla g \times \nabla h)$$
The second term on the right-hand side vanishes by the product rule for the divergence of the cross product of two vector fields:

$$\nabla \cdot (\nabla g \times \nabla h) = (\nabla \times \nabla g) \cdot \nabla h - \nabla g \cdot (\nabla \times \nabla h) = 0$$
Here we used the fact that the curl of the gradient of any smooth function is the zero vector.
According to the standard “triple product” formula from vector algebra, the first term $\nabla f \cdot (\nabla g \times \nabla h)$ is the determinant of the Jacobian matrix $Dr$ associated to $r$, whose rows consist of $\nabla f$, $\nabla g$, and $\nabla h$. I claim that this determinant is zero. Since $r$ takes values in $S^2$ we have that $f^2 + g^2 + h^2 = 1$; differentiating both sides of this equation with respect to $x$ gives $2f \frac{\partial f}{\partial x} + 2g \frac{\partial g}{\partial x} + 2h \frac{\partial h}{\partial x} = 0$, or equivalently $r \cdot \frac{\partial r}{\partial x} = 0$. Similarly $r \cdot \frac{\partial r}{\partial y} = 0$ and $r \cdot \frac{\partial r}{\partial z} = 0$, so the vectors $\frac{\partial r}{\partial x}$, $\frac{\partial r}{\partial y}$, and $\frac{\partial r}{\partial z}$ are all orthogonal to the same nonzero vector $r$ and hence there is a nontrivial dependence relation between them. But $\frac{\partial r}{\partial x}$, $\frac{\partial r}{\partial y}$, and $\frac{\partial r}{\partial z}$ are the columns of $Dr$, so it follows that $\det Dr = 0$. We conclude:
$$\iint_{S^2} \mathbf{F} \cdot \mathbf{n}\, dS = \iiint_{B^3} \nabla \cdot \mathbf{F}\, dV = \iiint_{B^3} \det Dr\, dV = 0$$

by the divergence theorem. (The various identities used in this argument all appear in the Wikipedia page on Vector Calculus Identities, with the notation $\psi = f$ and $\mathbf{A} = \nabla g \times \nabla h$.)
Now let us compute the same integral using the fact that $r(x) = x$ on $S^2$. The integrand $\mathbf{F} \cdot \mathbf{n} = f\,(\nabla g \times \nabla h) \cdot \mathbf{n}$ involves only the values of $f$ and the tangential derivatives of $g$ and $h$ along $S^2$, so it is determined by the restriction of $r$ to $S^2$, which is the identity map. Using $f = x$, $g = y$, and $h = z$ we calculate that $\mathbf{F} = x\,(\nabla y \times \nabla z) = (x, 0, 0)$ and hence $\nabla \cdot \mathbf{F} = 1$. By the divergence theorem we get:

$$\iint_{S^2} \mathbf{F} \cdot \mathbf{n}\, dS = \iiint_{B^3} 1\, dV = \frac{4\pi}{3} \neq 0$$
This is a contradiction. QED
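The vanishing of the Jacobian determinant for sphere-valued maps is also easy to check numerically. Here is a small Python sketch; the map `r` below (radial projection onto the sphere) is just an illustrative example of a smooth map into $S^2$, not the retraction from the proof:

```python
import random

def r(x, y, z):
    # a hypothetical smooth map into the unit sphere S^2: radial projection
    n = (x * x + y * y + z * z) ** 0.5
    return (x / n, y / n, z / n)

def jacobian_det(p, eps=1e-6):
    # central-difference Jacobian of r at the point p, then its determinant
    cols = []
    for i in range(3):
        hi, lo = list(p), list(p)
        hi[i] += eps
        lo[i] -= eps
        cols.append([(a - b) / (2 * eps) for a, b in zip(r(*hi), r(*lo))])
    (a, b, c), (d, e, f), (g, h, k) = zip(*cols)  # rows of the Jacobian
    return a * (e * k - f * h) - b * (d * k - f * g) + c * (d * h - e * g)

random.seed(0)
p = [random.uniform(0.5, 1.0) for _ in range(3)]
print(abs(jacobian_det(p)) < 1e-6)   # True: the determinant vanishes
```

The same check works for any other smooth sphere-valued map you care to write down.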
We are going to use again and again the formula for conjugating permutations from my last post, so I will repeat it here for reference:
Lemma 1: Let $\sigma = (s_1\ s_2\ \cdots\ s_k)$ be a cycle and let $\tau$ be any permutation. Then

$$\tau \sigma \tau^{-1} = (\tau(s_1)\ \tau(s_2)\ \cdots\ \tau(s_k))$$
Let us jump right into the proof of the main result:
Theorem: $A_n$ is simple for every $n \geq 5$.
Proof: We use induction on $n$. The base case, $n = 5$, was handled in the last post. So assume that $A_{n-1}$ is simple, $n \geq 6$, and let $H$ be a proper normal subgroup of $A_n$. Our aim is to show that $H$ is the trivial group.
Our first step is to prove that no non-identity element of $H$ can fix any symbol. Let $A_n^i$ denote the subgroup of $A_n$ consisting of all elements that fix the symbol $i$; by Lemma 1 we have $\tau A_n^i \tau^{-1} = A_n^{\tau(i)}$ for any permutation $\tau \in A_n$. Note that $A_n^i \cong A_{n-1}$ for each $i$, so if $H$ intersects some $A_n^i$ nontrivially then $A_n^i \subseteq H$ by the induction hypothesis (the intersection $H \cap A_n^i$ is a nontrivial normal subgroup of $A_n^i$). Moreover, since any $A_n^j$ can be obtained from $A_n^i$ by conjugation and $H$ is normal, we have that $A_n^j \subseteq H$ for all $j$.
Now, any element of $A_n$ can be written as the product of pairs of transpositions. A pair of transpositions can only permute up to four symbols, so since $n \geq 5$ every pair of transpositions fixes at least one symbol and hence lies in some $A_n^i$. Thus every element of $A_n$ can be written as a product of permutations each of which is in some $A_n^i$; since each $A_n^i \subseteq H$, it follows that $H = A_n$, contradicting our assumption that $H$ is a proper subgroup.
So no non-identity element of $H$ can fix any symbol. Consequently, if two elements of $H$ agree on even one symbol then they must be the same, for if $\sigma(i) = \tau(i)$ then $\tau^{-1}\sigma \in H$ fixes $i$ and hence is the identity. To complete the proof we will use this observation to show that the identity is the only element of $H$.
Suppose some $\sigma \in H$ has a cycle of length at least $3$, say $\sigma = (s_1\ s_2\ s_3\ \cdots)\cdots$, and let $\tau$ be the $3$-cycle $(s_3\ s_4\ s_5)$, where $s_4$ and $s_5$ are symbols distinct from $s_1, s_2, s_3$. Then $\tau \sigma \tau^{-1} \in H$ agrees with $\sigma$ at $s_1$ (both send it to $s_2$) but sends $s_2$ to $s_4$ rather than $s_3$, so it is a different element of $H$ agreeing with $\sigma$ on a symbol, which is impossible. Thus every non-identity element of $H$ is a product of disjoint transpositions, say $\sigma = (s_1\ s_2)(s_3\ s_4)(s_5\ s_6)\cdots$ (since $\sigma$ moves all $n \geq 6$ symbols, there are at least three transpositions). Conjugating by the $3$-cycle $(s_3\ s_4\ s_5)$ produces the element $(s_1\ s_2)(s_4\ s_5)(s_3\ s_6)\cdots \in H$, which agrees with $\sigma$ at $s_1$ but sends $s_3$ to $s_6$ rather than $s_4$, again impossible. We conclude that no element of $H$ can have a cycle of length larger than $1$; this means that $H$ is the trivial group.
QED
To understand the normal subgroups of a group it is very useful to first think carefully about its conjugacy classes; this is because a normal subgroup is by definition the union of conjugacy classes. Fortunately conjugation in the symmetric group $S_n$ is easy to understand using “cycle notation”. A $k$-cycle in $S_n$ is a permutation which fixes all but $k$ symbols $s_1, s_2, \ldots, s_k$ and which acts on these symbols as:

$$s_1 \mapsto s_2 \mapsto \cdots \mapsto s_k \mapsto s_1$$
The notation for this cycle is $(s_1\ s_2\ \cdots\ s_k)$. It is not hard to show that every permutation decomposes as the product of disjoint cycles, and the decomposition is unique up to reordering the cycles. Indeed, cycle notation makes it particularly easy to understand conjugation.
Lemma 1: Let $\sigma = (s_1\ s_2\ \cdots\ s_k)$ be a cycle and let $\tau$ be any permutation. Then

$$\tau \sigma \tau^{-1} = (\tau(s_1)\ \tau(s_2)\ \cdots\ \tau(s_k))$$
Proof: For $i < k$ we have $\tau\sigma\tau^{-1}(\tau(s_i)) = \tau\sigma(s_i) = \tau(s_{i+1})$ and similarly $\tau\sigma\tau^{-1}(\tau(s_k)) = \tau(s_1)$, while $\tau\sigma\tau^{-1}$ fixes every symbol not of the form $\tau(s_i)$.
QED
The lemma extends easily to the case where $\sigma$ is the product of cycles, so we see that conjugation by $\tau$ preserves the cycle structure of $\sigma$ while relabelling the symbols in the cycles. In particular, two elements of $S_n$ are conjugate if and only if the numbers and lengths of their cycles are the same. For instance, $(1\ 2\ 3)(4\ 5)$ is conjugate to $(2\ 4\ 1)(3\ 5)$ in $S_5$ but not to $(1\ 2)(3\ 4)$.
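The relabelling rule is also easy to check by machine. Below is a small Python sketch; the dict-based permutation helpers are ad hoc conveniences for this post, not a standard library API:

```python
def compose(p, q):
    # (p . q)(x) = p(q(x)); permutations stored as dicts
    return {x: p[q[x]] for x in q}

def inverse(p):
    return {v: k for k, v in p.items()}

def cycle(symbols, n=5):
    # the cycle (s1 s2 ... sk) as a permutation of {1, ..., n}
    perm = {i: i for i in range(1, n + 1)}
    for a, b in zip(symbols, symbols[1:] + symbols[:1]):
        perm[a] = b
    return perm

sigma = cycle([1, 2, 3])
tau = {1: 3, 2: 5, 3: 4, 4: 1, 5: 2}  # an arbitrary permutation of {1,...,5}
conj = compose(compose(tau, sigma), inverse(tau))
print(conj == cycle([tau[1], tau[2], tau[3]]))  # True: conj is the cycle (3 5 4)
```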
Note that conjugacy in the alternating group $A_n$ is a little more subtle. A $k$-cycle is even if and only if $k$ is odd, but not all $k$-cycles need be conjugate in $A_n$. For instance the transposition $(4\ 5)$ conjugates $(1\ 2\ 3\ 4\ 5)$ to $(1\ 2\ 3\ 5\ 4)$ in $S_5$, but there is no even permutation which conjugates $(1\ 2\ 3\ 4\ 5)$ to $(1\ 2\ 3\ 5\ 4)$ and hence they are not conjugate in $A_5$.
To prove that $A_5$ is simple, we will need to determine the sizes of all of its conjugacy classes. We will do this using the following tool:
Lemma 2: Let $g$ be an element of a finite group $G$, let $Z(g)$ be the centralizer of $g$ (i.e. the set of all elements of $G$ which commute with $g$) and let $C(g)$ denote the conjugacy class of $g$. Then

$$|C(g)| = \frac{|G|}{|Z(g)|}$$
Proof: Let $G$ act on itself by conjugation. The orbit of $g$ under this action is $C(g)$ and the stabilizer of $g$ is $Z(g)$, so the result follows from the orbit-stabilizer theorem.
QED
We will apply this lemma as follows. First we will use our understanding of conjugacy in $S_n$ to identify the centralizer of a cycle in $S_n$. From that it is easy to identify the centralizer of a cycle in $A_n$, and that will allow us to count the conjugates of a cycle in $A_n$.
Proposition 3: Let $\sigma \in S_n$ be a $k$-cycle. Then:

$$|Z(\sigma)| = k \cdot (n-k)!$$
Proof: By Lemma 1, the conjugates of $\sigma$ in $S_n$ are precisely the $k$-cycles. To specify a $k$-cycle one must specify the symbols in the $k$-cycle and the order in which they appear; there are $\binom{n}{k}$ ways to choose the $k$ symbols and $k!$ different orders in which they can appear, though $k$ of the orders define the same cyclic permutation. Thus there are $\binom{n}{k}(k-1)!$ conjugates of $\sigma$; by Lemma 2, $|Z(\sigma)| = \frac{n!}{\binom{n}{k}(k-1)!} = k \cdot (n-k)!$.
The permutation $\sigma$ clearly commutes with its own powers $\sigma^i$. Any permutation which fixes the $k$ symbols that $\sigma$ acts on also commutes with $\sigma$, and the subgroup of all such permutations is isomorphic to $S_{n-k}$. Thus the permutations $\sigma^i \pi$, where $0 \leq i < k$ and $\pi$ fixes the symbols appearing in $\sigma$, all commute with $\sigma$; there are $k \cdot (n-k)!$ distinct permutations of this form, so they make up the entire centralizer of $\sigma$.
QED
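For the skeptical, both Proposition 3 and Lemma 2 can be verified by brute force for small $n$. Here is a Python sketch checking the case of a $3$-cycle in $S_5$ (permutations are represented as tuples on the symbols $0, \ldots, 4$):

```python
from itertools import permutations
from math import factorial

n = 5

def compose(p, q):
    # (p . q)(i) = p(q(i)), permutations as tuples on symbols 0..n-1
    return tuple(p[q[i]] for i in range(n))

def inverse(p):
    inv = [0] * n
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

sigma = (1, 2, 0, 3, 4)               # the 3-cycle (0 1 2), fixing 3 and 4
group = list(permutations(range(n)))  # all of S_5

conjugates = {compose(compose(t, sigma), inverse(t)) for t in group}
centralizer = [t for t in group if compose(t, sigma) == compose(sigma, t)]

print(len(conjugates), len(centralizer))  # 20 and 6, as Proposition 3 predicts
print(len(conjugates) * len(centralizer) == factorial(n))  # True (Lemma 2)
```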
We are now ready to prove the main result of this post:
Theorem: $A_5$ is simple.
Proof: The only possible cycle structures of non-identity elements in $A_5$ are $3$-cycles, $5$-cycles, and products of two disjoint transpositions. Recall that in $S_5$ the cycle structure completely determines the conjugacy class; in $A_5$ some of these conjugacy classes may split. Let us analyze each conjugacy class in turn using Proposition 3. The centralizer in $S_5$ of a $3$-cycle has order $3 \cdot 2! = 6$, of which only $3$ elements are even, so by Lemma 2 the $20$ $3$-cycles form a single conjugacy class of size $60/3 = 20$ in $A_5$. The centralizer of a $5$-cycle has order $5 \cdot 0! = 5$ and consists entirely of even permutations, so each conjugacy class of $5$-cycles in $A_5$ has size $60/5 = 12$; since there are $24$ $5$-cycles in all, they split into two classes of size $12$. Finally, the $15$ products of two disjoint transpositions form a single conjugacy class of size $60/4 = 15$ (the centralizer in $S_5$ of such an element has order $8$, of which $4$ elements are even).
Including the identity, we have accounted for the conjugacy classes of all elements of $A_5$: $60 = 1 + 15 + 20 + 12 + 12$. So let $H$ be a normal subgroup of $A_5$. Since $H$ is normal it is the union of conjugacy classes (including the class of the identity), so $|H|$ is the sum of $1$ and some subset of $\{15, 20, 12, 12\}$. But $|H|$ must also divide $|A_5| = 60$; checking cases, the only possible choices for $|H|$ are $1$ and $60$.
QED
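The entire counting argument can be double-checked by brute force. The following Python sketch computes the conjugacy classes of $A_5$ directly and confirms that the only unions of classes containing the identity whose size divides $60$ have size $1$ or $60$:

```python
from itertools import combinations, permutations

n = 5

def compose(p, q):
    # (p . q)(i) = p(q(i)), permutations as tuples on symbols 0..n-1
    return tuple(p[q[i]] for i in range(n))

def inverse(p):
    inv = [0] * n
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

def sign(p):
    # parity of the number of inversions
    s = 1
    for i in range(n):
        for j in range(i + 1, n):
            if p[i] > p[j]:
                s = -s
    return s

A5 = [p for p in permutations(range(n)) if sign(p) == 1]

# partition A5 into conjugacy classes
classes, seen = [], set()
for g in A5:
    if g not in seen:
        cls = frozenset(compose(compose(t, g), inverse(t)) for t in A5)
        classes.append(cls)
        seen |= cls

sizes = sorted(len(c) for c in classes)
print(sizes)  # [1, 12, 12, 15, 20]

# sizes of unions of classes that contain the identity and divide |A5| = 60
valid = sorted(
    s for r in range(1, len(classes) + 1)
    for sub in combinations(classes, r)
    if any(len(c) == 1 for c in sub)
    for s in [sum(len(c) for c in sub)]
    if 60 % s == 0
)
print(valid)  # [1, 60]: a normal subgroup must be trivial or all of A5
```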
Definition: The alternating group $A_n$ is the subgroup of $S_n$ consisting of all even permutations.
$A_n$ is a normal subgroup of $S_n$ of index $2$; the objective of this series of posts is to prove that $A_n$ is simple for $n \geq 5$, meaning its only normal subgroups are itself and the trivial group. The significance of this property is that if a group $G$ has a normal subgroup $H$ then one can form the quotient group $G/H$, and often one can infer properties of $G$ from properties of $H$ and $G/H$. So simple groups are in a sense the “atoms” from which all other groups are built, though it should be noted that $H$ and $G/H$ alone do not uniquely determine $G$.
Classifying all finite simple groups was one of the great achievements of 20th century mathematics, and like many great mathematical achievements it went almost completely unnoticed by the rest of the world. The classification theorem asserts that all finite simple groups fit into a few infinite families (one of which is the family of alternating groups) with precisely 26 exceptions, the so-called sporadic simple groups. A shameless plug: when I was an undergraduate I did an REU project with Igor Kriz which involved making little computer games based on the sporadic simple groups; later we wrote a Scientific American article about them.
In any event, the classification program took decades to complete and spans thousands of pages written by dozens of mathematicians, and its completion seems to have essentially killed off finite group theory as an active area of research (though from what I understand there are lots of open problems in representation theory for finite groups). Given how monumental the effort was and how few people are still working in finite group theory, I worry that in a few decades all the experts will retire or die and there will be nobody left who understands the proof. It’s a good illustration of the principle that mathematicians tend to care much more about questions than answers.
Aside from their role in the classification program, the alternating groups play a crucial role in the theory of polynomial equations. Indeed, the very notion of a group was invented to understand the structure of solutions to polynomial equations, and the group $A_5$ is the star of the show.
Everyone learns in high school algebra that there is a formula for the roots of a quadratic equation $ax^2 + bx + c = 0$:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
Less well known is that there is also a cubic formula and quartic formula for degree three and four equations, respectively. These formulas date back to the 16th century, and it was a frustratingly difficult open problem to find a formula for the roots of a polynomial equation of degree five. It wasn’t until the 19th century that Abel and Galois independently realized that no such formula exists! Abel’s proof came first, but I don’t know what it was; Galois’ argument is the one that survived. Here is a brief sketch.
Galois’ key idea was to focus on the symmetries exhibited by the roots of a polynomial equation. More precisely, he considered their symmetries relative to the rational numbers; there are well-known techniques for finding rational roots of polynomials, so he was interested in the structure of the irrational roots. Let’s look at a couple examples:
Of course, one can make all this precise using the language of field extensions. The upshot is that the symmetry groups help characterize what it means to find a formula for the roots of a polynomial equation. As in the example above, equations of the form $x^n = a$ have cyclic symmetry group $\mathbb{Z}/n$. So if the quintic formula involved, say, a fifth root nested inside a cube root, then the symmetry group could be decomposed into a $\mathbb{Z}/5$ part and a $\mathbb{Z}/3$ part corresponding to the fifth root and cube root, respectively. More precisely, a polynomial equation can be solved by radicals if and only if its symmetry group $G$ has a decomposition

$$G = G_0 \supseteq G_1 \supseteq \cdots \supseteq G_k = \{e\}$$
where each $G_{i+1}$ is a normal subgroup of $G_i$ and each quotient $G_i/G_{i+1}$ is cyclic. Groups with this property are said to be solvable due to the connection with solving equations.
Now, there exist polynomials of degree $5$ whose symmetry group is the full symmetric group $S_5$ (in fact there are many). $S_5$ contains $A_5$ as a normal subgroup with quotient $\mathbb{Z}/2$, but once we have proved that $A_5$ is simple we will know that $S_5$ is not solvable: $A_5$ has no nontrivial normal subgroups whatsoever, let alone one with a cyclic quotient. This argument shows that there cannot be a general formula in the spirit of the quadratic, cubic, or quartic formulas, but it also shows even more: it gives you a criterion (solvability of the symmetry group) to determine when there is a formula for the roots of a specific polynomial.
In any event, mathbabe was commenting on a video which has apparently been making its way around the internet. In this video, some mathematicians (Or perhaps physicists? What are string theorists calling themselves these days?) attempted to explain the mind-boggling “fact” that

$$1 + 2 + 3 + 4 + \cdots = -\frac{1}{12}$$
Watch the video if you like, but by now a number of other mathematicians have rightfully pointed out that most of the fishy manipulations in the video amount to fraudulent nonsense which can be used to justify just about anything. This infuriates me, because the people who made the video could have used the opportunity to legitimately blow people’s minds by placing the equation above (which does make sense, from the right point of view!) in its proper context and explaining some beautiful mathematics.
I don’t have the apparatus to make a cool video, but I do have a blog. So I’m going to make an attempt to do what I think the video should have done (I am not optimistic that my attempt will get picked up by Slate, of course). Instead of adding up all of the positive integers, I’m going to start by adding up all of the powers of two:

$$1 + 2 + 4 + 8 + \cdots = -1$$
We still get a negative number, so this equation should be just as counter-intuitive as the original one (though admittedly $-\frac{1}{12}$ is pretty bizarre). Our strategy for making sense of both equations will be the same:

1. Find a function of a variable $x$ which agrees with the sum for those values of $x$ where the sum actually makes sense.
2. Plug a value of $x$ where the sum does not make sense into the function anyway.
This strategy, called analytic continuation by mathematicians, is extremely powerful. But the basic idea is really quite simple, and it is even familiar in the context of language. When you “log in” to your e-mail account, your messages are likely organized into various “folders”, some of which are in your “inbox” and some of which are in your “outbox”. Perhaps you have a list of your friends’ e-mail “addresses” in your “address book”. The words that I put in quotes all began life in the narrow context of physical reality but have been extended to the new context of the internet; your e-mail inbox is not a physical box anywhere in the world, nor does your e-mail address refer to an actual place you can go. Someone at some point in the history of the internet realized that physical mail is a good metaphor for the electronic messages people send each other, and thus the language surrounding physical mail actually makes sense in the context of the internet.
The strange equations that I wrote above are mathematical counterparts of taking a word such as “hyperlink” which only really makes sense in the context of the internet and applying it to real world mail. You would end up with a sentence which looks pretty bizarre, but there would nevertheless be a certain logic to it.
Let’s see how this all plays out mathematically. We’ll start with something that isn’t likely to stir up much controversy:

$$\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots = 1$$
This is the mathematical counterpart of the observation that if you walk across half of a room, then a quarter of the room, then an eighth, and so on then you will have crossed the whole room. (Of course, there are some philosophical questions to be raised by the fact that the phrase “and so on” took the place of an infinite number of actions. Even non-controversial infinite series deserve serious thought.)
You might also convince yourself that

$$1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots = 2$$
It might not be obvious that the answer is $2$, but this answer is at least plausible: we start with a number which is smaller than $2$ and add increasingly tiny numbers to it. And if you plug numbers into a calculator you will get good numerical evidence that this equation makes sense; the further out you go in the sum, the closer you get to $2$. In general, if the absolute value of $x$ is a number smaller than $1$, we have:

$$1 + x + x^2 + x^3 + \cdots = \frac{1}{1-x}$$
Of course, the only context in which the left-hand side really makes sense is when $|x| < 1$; this ensures that the powers of $x$ get very small very fast and thus the sum settles near a particular value. If $|x| \geq 1$ then there is no such guarantee: the powers of $x$ do not get smaller, and for $x \geq 1$ you can get a number as large as you want by adding up enough terms in the sum.
The right-hand side, on the other hand, makes sense in a much larger context: we can plug in any number except $x = 1$! In particular, we can plug $x = 2$ into $\frac{1}{1-x}$ to obtain $\frac{1}{1-2} = -1$. Since the expressions $1 + x + x^2 + \cdots$ and $\frac{1}{1-x}$ agree when $|x| < 1$, it makes sense to use the latter expression as a proxy for the former at other values of $x$, such as $x = 2$. In other words, it is not entirely stupid to write:

$$1 + 2 + 4 + 8 + \cdots = -1$$
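If you want to experiment, a few lines of Python make the “proxy” idea concrete: the partial sums match $\frac{1}{1-x}$ when $|x| < 1$, and the closed form happily accepts $x = 2$:

```python
def partial_sum(x, terms):
    # 1 + x + x^2 + ... + x^(terms - 1)
    return sum(x ** k for k in range(terms))

def closed_form(x):
    return 1 / (1 - x)

print(abs(partial_sum(0.5, 50) - closed_form(0.5)))  # essentially zero
print(closed_form(2))  # -1.0: the "value" assigned to 1 + 2 + 4 + 8 + ...
```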
There is some theory which makes this equation even less stupid: $\frac{1}{1-x}$ is (in a sense which can be made precise) the only sensible way to extend the sum $1 + x + x^2 + \cdots$ beyond the set $|x| < 1$. Properly justifying this requires techniques coming from one of the most beautiful subjects in all of mathematics: the calculus of complex numbers. It should not be at all obvious, but in the end this whole discussion is really all about the mysterious powers of complex numbers.
The same techniques allow us to analyze the sum which got this post started; this time, our starting point is the Riemann Zeta function:

$$\zeta(s) = 1 + \frac{1}{2^s} + \frac{1}{3^s} + \frac{1}{4^s} + \cdots$$
This time the sum on the left-hand side makes sense as long as $s > 1$, but the same tools described above imply that the Riemann Zeta function can be “analytically continued” to allow any input except $s = 1$, and its value at $s = -1$ can be calculated to be $-\frac{1}{12}$. Plugging $s = -1$ into the sum formally gives $1 + 2 + 3 + 4 + \cdots$, which is the sense in which that sum “equals” $-\frac{1}{12}$. This calculation could occupy another entire blog post, so I will not go any further than that at this time.
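For the curious, the value at $s = -1$ really can be computed with elementary arithmetic. One standard tool (not the only one) is Hasse's globally convergent series for the zeta function; at negative integer inputs the series has only finitely many nonzero terms, so exact rational arithmetic suffices. A Python sketch:

```python
from fractions import Fraction
from math import comb

def zeta_at_negative_integer(s, terms=30):
    # Hasse's globally convergent series for zeta(s), specialized to an
    # integer s < 0 (each inner sum is an integer, and the inner sums
    # vanish once n exceeds the degree -s, so the series terminates)
    total = Fraction(0)
    for n in range(terms):
        inner = sum((-1) ** k * comb(n, k) * (k + 1) ** (-s) for k in range(n + 1))
        total += Fraction(inner, 2 ** (n + 1))
    return total / (1 - Fraction(2) ** (1 - s))

print(zeta_at_negative_integer(-1))  # -1/12
print(zeta_at_negative_integer(-3))  # 1/120
```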
Now that I have explained the sense in which it is not completely stupid to say that the sum of all the positive integers is $-\frac{1}{12}$, I would like to conclude by arguing that it still is pretty stupid. Notice that according to the reasoning described in this post we did not assign the sum a value by thinking about it intrinsically, as we can with, for instance, $\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots$; instead we related the sum to the Riemann Zeta function and analyzed that function. But there are infinitely many other possible functions which have a similar relationship to $1 + 2 + 3 + \cdots$, and many of them will assign different values to the series following the steps outlined here. In fact, you can use these steps to justify giving the sum any value you want. Still, the Riemann Zeta function enjoys a privileged position in mathematics (and physics), so $-\frac{1}{12}$ is a pretty good choice.
Let us begin by providing some precise definitions. Recall that a plane curve is simply a continuous function $\gamma \colon [a, b] \to \mathbb{R}^2$, and a plane curve is closed if it begins and ends at the same point, i.e. $\gamma(a) = \gamma(b)$. A closed curve is simple if it intersects itself only at the endpoints, meaning $\gamma(s) = \gamma(t)$ only if $s = t$ or $s$ and $t$ are both endpoints of the interval $[a, b]$. The most basic fact about simple closed curves is that they divide the plane into two disconnected regions: a bounded piece (the “inside”) and an unbounded piece (the “outside”). This is called the Jordan curve theorem, and as far as I know the simplest proofs use some reasonably sophisticated ideas in algebraic topology (though only a mathematician would think it even needs to be proved!).
Given a simple closed curve $\gamma$, let $C$ denote the image of $\gamma$, i.e. the set of all points in the plane that $\gamma$ passes through, and let $D$ denote $C$ together with the points “inside” $\gamma$. A line in the plane is said to be a supporting line for $\gamma$ if it intersects $C$ but does not pass through any interior points of $D$. The set $D$ is closed and bounded, so there are exactly two distinct supporting lines for $\gamma$ in any given direction. The set of directions in the plane can be parametrized by an angle $\theta$ between $0$ and $\pi$ (with the understanding that $0$ and $\pi$ represent the same direction). Thus we define a “width” function $w(\theta)$ on the set of directions by letting $w(\theta)$ denote the distance between the supporting lines for $\gamma$ in the direction $\theta$. Here’s what the width looks like in an example:
Finally, we say that $\gamma$ has constant width $d$ if $w(\theta) = d$ for every $\theta$. The goal is to prove that any two curves of constant width $d$ have the same length, and that among all curves of constant width $d$ the circle of diameter $d$ has the largest area. Before proceeding, we need to understand the geometry of constant width curves a little better.
Specifically, we want to show that every curve of constant width is convex, meaning $D$ contains the line segment between any two of its points. In fact we will prove something a bit stronger: $D$ is strictly convex, meaning it is convex and its boundary $C$ contains no line segments (so that the line segment joining any two points of $C$ lies, apart from its endpoints, in the interior of $D$). This requires a nice little trick that I couldn’t figure out on my own; special thanks to Ian Agol for helping me out on mathoverflow.
Proposition: Every curve of constant width is strictly convex.
Proof: Let $K$ denote the convex hull of $D$; this is by definition the smallest convex set which contains $D$. According to a general fact from convex geometry, the boundary of $K$ consists only of points in the boundary of $D$ and possibly line segments joining points in the boundary of $D$. So we will show that the boundary of $K$ contains no line segments, implying that $K = D$ and hence that $D$ is strictly convex.
According to another general fact from convex geometry the supporting lines for $K$ are precisely the same as the supporting lines for $\gamma$, and hence $K$ has the same constant width $d$ as $\gamma$. So assume that the boundary of $K$ contains a line segment joining two points $p$ and $q$. Since $K$ is convex, the line passing through $p$ and $q$ is a supporting line for $K$. There is exactly one other supporting line for $K$ parallel to this line; let $r$ denote a point where it intersects $K$. Consider the triangle $pqr$; its height is precisely $d$, the width of $K$, so we have that $d$ is strictly smaller than at least one of the side lengths $|pr|$ or $|qr|$. Assume $|pr| > d$ and consider the supporting lines for $K$ which are perpendicular to the line segment joining $p$ and $r$. The points $p$ and $r$ must lie between (or possibly on) these supporting lines, but the distance between the supporting lines is $d$ since $K$ has constant width. We conclude that $|pr| \leq d$, a contradiction.
QED
The reason why strict convexity is important to us is that lines intersect strictly convex curves in a very predictable way:
Lemma: Let $\gamma$ be a closed strictly convex curve and let $\ell$ be a line which intersects it. Then $\ell$ intersects $\gamma$ exactly once if it is a supporting line or exactly twice if it is not.
Proof: Note that the intersection of two convex sets is again convex, so $\ell \cap D$ is a convex subset of a line. Since $\ell$ and $D$ are closed and $D$ is bounded, the same must be true of the intersection, so the only possibility is that $\ell \cap D$ is a closed interval with endpoints $x$ and $y$ (possibly with $x = y$). By strict convexity the interior points of this interval are interior points of $D$, while $x$ and $y$ are boundary points of $D$, i.e. points of $C$. Thus $\ell$ passes through an interior point of $D$ if and only if $x \neq y$, so $x = y$ if and only if $\ell$ is a supporting line. It follows that supporting lines intersect $\gamma$ exactly once and any other line which intersects $\gamma$ does so exactly twice.
QED
We are now ready to calculate the length of a constant width curve. Our strategy is to use the main result of my previous post, “The Mathematics of Throwing Noodles at Paper.” There we saw that if one randomly tosses a curve of length $L$ at a lined sheet of paper with line spacing $d$ then the expected number of line intersections is given by $\frac{2L}{\pi d}$. So let us toss our curve of constant width $d$ at a lined sheet of paper with line spacing $d$. The curve must intersect at least one line and it can’t intersect three or more lines, so it either intersects exactly one line or exactly two lines. The curve intersects exactly two lines if and only if they are both supporting lines, and hence each line intersects the curve exactly once by the lemma above. If the curve intersects exactly one line then it cannot be a supporting line and thus the lemma implies that the curve intersects the line exactly twice. In either case the total number of intersections is exactly $2$, and thus the expected number of intersections is $2$. Therefore

$$\frac{2L}{\pi d} = 2$$
and hence $L = \pi d$. Thus every curve of constant width $d$ has length $\pi d$, an assertion consistent at least with the circle of diameter $d$ (whose circumference is $\pi d$). The result is called Barbier’s Theorem, and it has a variety of different proofs; I find the argument using geometric probability to be the most beautiful.
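As a sanity check on the “always exactly two intersections” step, here is a quick Python simulation for the simplest constant width curve, a circle of diameter $d$ tossed at lines spaced $d$ apart:

```python
import random

def circle_intersections(d=1.0, trials=10_000, seed=1):
    # horizontal lines at heights 0, d, 2d, ...; drop a circle of diameter d
    rng = random.Random(seed)
    counts = set()
    r = d / 2
    for _ in range(trials):
        c = rng.uniform(0, d)     # height of the center above the nearest lower line
        hits = 0
        for line in (0.0, d):     # the only two lines the circle can reach
            if abs(c - line) < r:
                hits += 2         # the line cuts the circle: two crossings
            elif abs(c - line) == r:
                hits += 1         # a supporting (tangent) line: one touch point
        counts.add(hits)
    return counts

print(circle_intersections())  # {2}: every toss gives exactly two intersections
```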
We have now settled the length question; what about area? In fact, to place an upper bound on the area inside a constant width curve we will simply use our length calculation together with the following landmark theorem in geometry:
Theorem: Let $L$ be the length of a simple closed curve in the plane and let $A$ be the area that it encloses. Then:

$$A \leq \frac{L^2}{4\pi}$$
with equality if and only if the curve is a circle.
In other words, among all curves with a given length the circle is the unique curve which encloses the largest area. This theorem is called the isoperimetric inequality, and it has many beautiful proofs, generalizations, and applications. Our claim about the area enclosed by constant width curves is an immediate corollary since they all have the same length (given a fixed width). I originally intended to prove the isoperimetric inequality in this post using geometric probability, but I would need to take some time to explain how to calculate area probabilistically and I think the post is long enough as it is. Perhaps I will revisit this in the future.
The thrust of Izabella Laba’s post (entitled “Gender Bias 101 for Mathematicians”) is that gender bias in the mathematical community is not limited to a few grouchy old codgers, but rather that it is a systematic cultural and psychological phenomenon which afflicts everybody. There are two potentially controversial assertions implicit in this statement:

1. Gender bias is a genuine problem in the mathematical community.
2. Gender bias afflicts everybody, not just a few bad actors.
The first assertion is pretty hard to argue with, though I’m sure some people still try. Every math department with which I have been affiliated is *massively* male dominated, and there is ample evidence that hiring practices, salaries, journals, etc. are stacked in favor of men. I’m not going to try to document or justify this in any detail because I don’t have the facts available at my fingertips and because the issue has been argued to my satisfaction elsewhere (e.g. in the Accidental Mathematician).
The second assertion might be more surprising to some, and it’s the one I want to discuss here. Izabella Laba’s post quotes a recent study in which faculty from research oriented universities were presented with applications for a lab manager position with randomly assigned male or female names. The study found that a given application with a male name at the top was consistently rated more highly than the same application with a female name. Interestingly enough, the pattern was independent of the gender of the faculty evaluator: female professors were just as biased as male professors. Cathy O’Neil contributes another study which shows that 15-year-old girls outperform 15-year-old boys in science exams in some countries but not others (not in the United States), indicating that gender gaps in science are cultural rather than biological.
Both of these studies are quite compelling, and I’m sure there are others which point to the same conclusion. My intention is to participate in this discussion subjectively rather than objectively. In short, I am going to use the rest of this post to analyze my own gender-oriented biases. Something feels a bit self-indulgent about this exercise, but I think it will be healthy for me even if it isn’t useful for anyone else.
I will begin by admitting outright that I am biased against women. I consider myself to be a pretty progressive guy – perhaps even more progressive than most – and I think that most people who know me would say that overall I do a good job of treating women with the same respect with which I treat men. But this is not because I don’t have biases, it’s because I work very hard to identify them and eliminate them or at least minimize their impact on my behavior. I am unqualified to generalize my own psychological observations to everyone else, but I suspect that it is neurologically almost impossible for a person socialized in 20th or 21st century American society to avoid gender biases: we are bombarded with overt and covert messages about gender constantly and starting at a very young age. Given what I have been learning lately about how insignificant our conscious thought processes are in comparison to our subconscious psychological machinery, these messages must take their toll.
What forms do gender biases take? There are many answers with varying applicability to me. Here is a non-comprehensive unordered list that I have assembled from reading things online, talking to people, and making my own observations.
I’m sure there are other biases worth mentioning, but this list feels like a good start. One interesting supplementary observation about biases in general is that thinking about them leads to an unfortunate feedback loop: worrying about biases against women affects my behavior toward women. I think this effect is fairly minimal in comparison to the consequences of ignoring my biases and failing to monitor my behavior at all, but it’s there all the same.
My final remark about this subject is that there are many other bias issues which are also largely ignored by the mathematical and scientific community. I have encountered some discussion of racial bias in science, but I have heard almost no discussion about biases related to sexual orientation. If anyone reading this is aware of any studies or references about these issues, I would be interested in seeing them. Also, in this post I have focused on the effects of bias on my interactions with my colleagues, but the way my biases manifest themselves in my teaching is a whole other subject which I might take up in the future.
Get out a sheet of paper and draw parallel lines on it spaced two inches apart. Take a one-inch-long needle and repeatedly toss it onto the sheet of paper, counting the total number of needle tosses and the number of times the needle touches one of the lines. What do you expect the ratio of the number of tosses to the number of line intersections to be? Equivalently, what is the probability that a randomly tossed needle intersects a line? I don’t think the answer is obvious, but it turns out to involve the ubiquitous number $\pi$: the probability is exactly $\frac{1}{\pi}$. You can in principle use this to experimentally calculate $\pi$, though you unfortunately need to toss the needle an impractical number of times in order to get a reasonable estimate.
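A computer can toss the needle for us. Here is a minimal Monte Carlo sketch of the experiment; the function name and parameter choices are my own, not part of any standard library:

```python
import math
import random

def needle_toss_ratio(tosses, needle_len=1.0, spacing=2.0, seed=0):
    """Toss a needle onto paper with horizontal lines `spacing` apart and
    return (number of tosses) / (number of tosses that touched a line)."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(tosses):
        y = rng.uniform(0.0, spacing)           # height of the needle's center
        theta = rng.uniform(0.0, math.pi)       # orientation of the needle
        half = (needle_len / 2.0) * math.sin(theta)  # vertical half-extent
        # The needle touches a line when its vertical span reaches 0 or spacing.
        if y - half < 0.0 or y + half > spacing:
            crossings += 1
    return tosses / crossings

print(needle_toss_ratio(200_000))  # hovers around pi
```

Even with two hundred thousand tosses the estimate is only good to a couple of decimal places, which illustrates how impractical the physical experiment would be.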
This experiment is known as Buffon’s Needle Experiment; in this post I’m going to explore a more general and seemingly more difficult phenomenon called Buffon’s Noodle Experiment. The setup is the same as before, only instead of tossing a needle (a line segment), we’ll toss a rigid “noodle” in the shape of any desired plane curve. We’ll find that the statistics of noodle crossings are determined just by the length, and not the specific shape, of the noodle in question. Thus the noodle experiment makes a profound connection between geometry and probability theory, a connection which helps solve difficult problems in both areas. This is encapsulated by a beautiful tool called Crofton’s Formula, the first result in an area of mathematics called “Integral Geometry” (or alternatively “Geometric Probability,” depending on whom you ask).
Part of the beauty of the integral geometry approach to Buffon’s needle experiment is that it involves almost no calculations. There are other approaches that involve writing out probability density functions and calculating double integrals, but the argument that I will give below involves only basic (but surprisingly subtle) ideas in probability theory and calculus. It’s a great example of a tricky problem that can be solved through careful abstract thought.
To come to grips with the needle and noodle experiments, we will try to answer the following question: what is the expected number of times that a randomly thrown noodle will cross lines on lined paper with line spacing $d$? We count with multiplicities: if the noodle intersects the same line twice, that counts as two line intersections.
Let $X$ be the random variable which represents the number of line intersections for a given noodle. Recall that the expected value of $X$ is given by:

$$E[X] = \sum_{n=0}^{\infty} n \cdot P(X = n)$$

where $P(X = n)$ represents the probability that the number of line intersections is exactly $n$. Here are a few basic observations about these expectations in the case where the noodle is actually a needle (i.e. a line segment).
Now, consider a noodle made up of exactly two needles of lengths $L_1$ and $L_2$ joined rigidly end-to-end. Denote the random variables representing the number of crossings for the two needles by $X_1$ and $X_2$, respectively; then the random variable representing the number of line intersections for the noodle is just $X = X_1 + X_2$. Note that $X_1$ and $X_2$ are not independent since the needles are joined, but by linearity of expectation it is nevertheless true that

$$E[X] = E[X_1] + E[X_2]$$
A priori the expectation $E[X]$ depends on the lengths $L_1$ and $L_2$ of the two needles as well as the angle at which they are joined, but the calculation above shows that the expectation is actually independent of the angle. Therefore we can calculate the expectation just by considering the case where the angle is such that the two needles form a single line segment, i.e. a needle of length $L_1 + L_2$. Writing $f(L)$ for the expected number of crossings of a needle of length $L$, we conclude that

$$f(L_1 + L_2) = f(L_1) + f(L_2)$$
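This angle-independence is easy to check empirically. Below is a sketch of my own (the function and its parameters are hypothetical, not from any library) which tosses a rigid two-segment noodle at lined paper; only the vertical coordinates matter for counting crossings of horizontal lines:

```python
import math
import random

def bent_noodle_expectation(angle, l1=0.7, l2=0.5, spacing=2.0,
                            tosses=200_000, seed=1):
    """Estimate the expected number of line crossings for two needles of
    lengths l1 and l2 joined rigidly at the given angle, tossed onto
    horizontal lines spaced `spacing` apart."""
    rng = random.Random(seed)

    def crossings(a, b):
        # Number of lines (at integer multiples of `spacing`) strictly
        # between heights a and b.
        lo, hi = min(a, b), max(a, b)
        return math.floor(hi / spacing) - math.floor(lo / spacing)

    total = 0
    for _ in range(tosses):
        y = rng.uniform(0.0, spacing)             # height of the joint
        theta = rng.uniform(0.0, 2 * math.pi)     # orientation of needle 1
        y1 = y + l1 * math.sin(theta)             # far endpoint of needle 1
        y2 = y + l2 * math.sin(theta + angle)     # far endpoint of needle 2
        total += crossings(y, y1) + crossings(y, y2)
    return total / tosses

straight = bent_noodle_expectation(angle=math.pi)   # one straight needle of length 1.2
bent = bent_noodle_expectation(angle=math.pi / 3)   # a sharply bent noodle
# The two estimates agree to within Monte Carlo error.
```

The two needles are correlated (they share a joint), yet the estimated expectations agree, just as linearity of expectation predicts.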
We can iterate this argument for any noodle made by chaining together needles of lengths $L_1, \ldots, L_n$ to conclude that:

$$E[X] = f(L_1) + \cdots + f(L_n)$$
In particular, if the needles all have the same length $x$, we conclude that $f(nx) = n f(x)$ for any positive integer $n$ and any positive real number $x$. By dividing a needle into $n$ equal pieces, this also shows that $f(x/n) = \frac{1}{n} f(x)$. Combining these two facts, we conclude that $f(qx) = q f(x)$ for any rational number $q$. The expected number of crossings for a longer line segment is at least as large as the expected number of crossings for a shorter one, so we also know that the function $f$ is non-decreasing. Also, we have that $f(0) = 0$ (i.e. the line segment of length zero doesn’t intersect any lines). Basic calculus tells us that the only non-decreasing function which satisfies $f(0) = 0$ and $f(qx) = q f(x)$ for every rational number $q$ is the function

$$f(x) = Cx$$
where $C$ is some constant. This is already a pretty strong statement about the expected number of crossings for a needle, but we can do even better. Take any noodle of total length $L$ made up of line segments of lengths $L_1, \ldots, L_n$ and let $X$ be the random variable representing the number of crossings for that noodle; as above, we have:

$$E[X] = f(L_1) + \cdots + f(L_n) = C L_1 + \cdots + C L_n = CL$$
Thus the expected number of crossings for piecewise linear noodles is still just $C$ times the length of the noodle.
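Incidentally, the “basic calculus” invoked above is a short squeeze argument, which can be spelled out as follows (writing $f$ as before for the expected number of crossings as a function of needle length):

```latex
\text{For a real number } x > 0 \text{ and rationals } q_1 < x < q_2, \text{ monotonicity gives}
\qquad
q_1 f(1) = f(q_1) \;\le\; f(x) \;\le\; f(q_2) = q_2 f(1).
```

Letting $q_1$ and $q_2$ tend to $x$ squeezes $f(x) = x f(1)$, which is exactly the linear formula with $C = f(1)$.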
Now take any noodle which is in the shape of a piecewise smooth curve $\gamma$. Borrowing another fact from basic calculus, $\gamma$ is the uniform limit of a sequence of piecewise linear curves whose lengths converge to the length of $\gamma$ (note that this second condition is not automatically implied by the first). Since the expectation for each of the piecewise linear noodles is given by the formula $E[X] = CL$, this formula holds for any piecewise smooth noodle. Thus we have proved:
Proposition: Let $X$ denote the random variable corresponding to the number of line crossings for a rigid piecewise smooth noodle of length $L$ tossed at a lined sheet of paper. Then

$$E[X] = CL$$

for some universal constant $C$.
Already we have proven something that wasn’t really obvious at the outset: the expected number of intersections depends only on the length, and not on the shape, of the noodle! It remains only to calculate the constant $C$; since this constant is the same for any noodle (of any shape and any length), it suffices to work out just one example. There is one particular noodle for which this calculation is especially easy: the circle whose diameter is $d$ (the spacing of the lines). This is because no matter how you drop such a circle on a sheet of lined paper whose lines are spaced $d$ apart, the circle *must* intersect exactly two lines, and therefore the expected number of intersections is simply $2$. The length of the circle of diameter $d$ is $\pi d$, so we have $C \pi d = 2$ and hence $C = \frac{2}{\pi d}$. Thus the expectation formula is

$$E[X] = \frac{2L}{\pi d}$$
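The shape-independence can be sanity-checked numerically. Here is a sketch of my own (names and parameters are hypothetical) that estimates the crossing count for a semicircular noodle, approximated by a fine polygonal chain, and compares it with the formula $E[X] = \frac{2L}{\pi d}$ derived above:

```python
import math
import random

def semicircle_crossings(radius=0.8, spacing=2.0, tosses=100_000, seed=2):
    """Estimate E[X] for a rigid semicircular noodle of length pi * radius,
    approximated by a polygonal chain, tossed at horizontal lines."""
    rng = random.Random(seed)
    segs = 200  # segments in the polygonal approximation of the semicircle
    pts = [(radius * math.cos(math.pi * k / segs),
            radius * math.sin(math.pi * k / segs)) for k in range(segs + 1)]
    total = 0
    for _ in range(tosses):
        y0 = rng.uniform(0.0, spacing)          # random vertical translation
        phi = rng.uniform(0.0, 2 * math.pi)     # random rotation
        c, s = math.cos(phi), math.sin(phi)
        heights = [y0 + x * s + y * c for x, y in pts]  # rotated y-coordinates
        # Count lines (multiples of `spacing`) crossed by each small segment.
        total += sum(abs(math.floor(b / spacing) - math.floor(a / spacing))
                     for a, b in zip(heights, heights[1:]))
    return total / tosses

estimate = semicircle_crossings()
predicted = 2 * (math.pi * 0.8) / (math.pi * 2.0)  # E[X] = 2L / (pi d)
# The Monte Carlo estimate matches the prediction to within sampling error.
```

Any other shape of the same length (a zigzag, a spiral) would give the same expectation, which is the content of the Proposition.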
In the example at the beginning of this post, we considered a needle of length $1$ (so that $L = 1$) tossed at a sheet of lined paper with line spacing $d = 2$. According to our formula, this means that the expected number of line crossings is simply $\frac{2 \cdot 1}{\pi \cdot 2} = \frac{1}{\pi}$. But as we observed above, the expected number of crossings for a needle which is shorter than the line spacing of the paper is simply the probability that at least one crossing will occur. Therefore this probability is $\frac{1}{\pi}$. According to the law of large numbers, this means that the ratio of the number of needle tosses to the number of crossings approaches $\pi$ as the number of tosses tends to infinity.
Having accounted for Buffon’s needle and noodle experiments, let us reflect on what we have done. We set out to answer a question about the statistics of tossing noodles at paper, and we found that the answer to the question is an explicit formula involving only the length of the noodle and some constants. Flipping this formula around, notice that this gives us a surprising way to calculate the length of the noodle:

$$L = \frac{\pi d}{2} E[X]$$
In other words, length is a quantity which can be measured and manipulated using statistical techniques. You might not be convinced right away that this is a useful way to think about length, but in fact there are a variety of difficult theorems in geometry which have remarkably easy proofs when they are translated into this language.
Actually, it’s useful to think about all of this in a slightly different way. Instead of throwing the noodle at the paper, we’ll imagine throwing the paper at the noodle. In other words, we’ll ask a slightly different question: what is the average number of times that a random line in the plane intersects a given plane curve? This question is conceptually a little more problematic than the old one because it is not completely clear what the phrase “random line” should mean; Bertrand’s Paradox is a good illustration of the subtleties involved.
Here is the right meaning of the phrase for our purposes. The set of all oriented lines in the plane can be parametrized by two coordinates: the signed distance $p$ from the line to the origin (a number from $-\infty$ to $\infty$) and the direction $\theta$ (an angle) in which it points (a number from $0$ to $2\pi$). With this parametrization, we can interpret a “random line” to simply be a random point in the strip $(-\infty, \infty) \times [0, 2\pi)$. (To placate the highly mathematically literate members of my readership, I’ll remark that the space of lines in the plane is topologically a homogeneous space which has a unique translation invariant measure, and this space differs from the strip with Lebesgue measure by a set of measure zero.)
Now, associated to any piecewise smooth curve $\gamma$ in the plane is a function $n_\gamma(p, \theta)$ which represents the number of times that the line determined by the values $(p, \theta)$ intersects $\gamma$. Crofton’s formula relates the average value of this function to the length of $\gamma$:
Crofton’s Formula: $$\operatorname{length}(\gamma) = \frac{1}{4} \int_0^{2\pi} \int_{-\infty}^{\infty} n_\gamma(p, \theta) \, dp \, d\theta$$
This formula can be proved using more or less the same procedure that we used to calculate the expected number of crossings for a noodle thrown at a sheet of lined paper: argue that the integral on the right-hand side is additive for line segments attached end-to-end, use an approximation argument to show that it agrees with length up to a multiplicative constant, and fix the constant by calculating a single explicit example (the circle is once again a good choice).
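The circle makes a good numerical spot-check too. The sketch below (my own code; the cutoff `p_max` for the improper integral is an arbitrary choice, valid because the integrand vanishes once $|p|$ exceeds the radius) estimates the Crofton integral for the unit circle, whose length is $2\pi$:

```python
import math
import random

def crofton_circle_length(p_max=2.0, samples=200_000, seed=3):
    """Monte Carlo estimate of (1/4) * iint n(p, theta) dp dtheta for the
    unit circle. p is sampled uniformly from [-p_max, p_max]; by rotational
    symmetry the theta-integral just contributes a factor of 2*pi."""
    rng = random.Random(seed)

    def crossings(p):
        # A line at signed distance p from the origin meets the unit circle
        # twice when |p| < 1 (tangency happens with probability zero).
        return 2 if abs(p) < 1.0 else 0

    mean = sum(crossings(rng.uniform(-p_max, p_max))
               for _ in range(samples)) / samples
    integral = mean * (2 * p_max) * (2 * math.pi)  # mean * area of the strip sampled
    return integral / 4

print(crofton_circle_length())  # close to 2 * pi, the circle's circumference
```

The factor of $\frac{1}{4}$ is exactly what makes the answer come out to the circumference rather than some multiple of it.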
As I alluded to at the beginning of this post, Crofton’s formula is the tip of a very deep iceberg. There are analogous formulas for area, volume, and a plethora of other interesting geometric quantities. In my next post I will use Crofton’s formula to deduce some facts about curves of constant width, tying this post together with my post on the Coaster Roller.
I started volunteering at MoMath as an “integrator” this past Sunday; the primary role of the integrator is to float around the museum helping people who seem confused by an exhibit. A secondary role is to operate the few exhibits which require staff supervision, and I was taught how to operate an exhibit called the “coaster roller”. The coaster roller works as follows. There is a small pit full of peculiar-looking objects and a sort of “raft” which sits on top of them (I’ll take a picture next time I visit the museum). The objects are about the size of basketballs, but they are not spherical. Here’s a picture of what they might look like, stolen from the internet:
A person (or several children) can sit in the raft and pull themselves along using ropes, rolling over the strange shapes at the bottom.
So what’s so special about the shape of these objects? The important property that they each possess is that they have constant width, meaning that if one of the objects fits in a vise when it is pointing in one direction then it fits in the same vise when it is pointing in any other direction. This is important because if the width varied then the distance between the raft and the floor would vary as it rolled along and the ride would get pretty bumpy!
The constant width property is possessed by a sphere but not by a cube: the width of a cube is larger when measured between opposite corners than when measured between opposite faces. If you think about it for a moment, you might find it hard to convince yourself that bodies of constant width which aren’t spheres can even exist! But in fact they exist in abundance; I will explain a procedure which allows you to construct infinitely many different ones.
To begin, note that given any plane curve of constant width one can obtain a corresponding surface of constant width by rotating the curve in three-dimensional space (you may have already noticed that each of the surfaces in the picture above is rotationally symmetric). Not all surfaces of constant width arise this way, but in any event this shows that it is enough to look at curves of constant width if we are happy just finding a few basic examples.
To construct a curve of constant width, begin by drawing an equilateral triangle with side length $r$. Then draw the circle of radius $r$ centered at each of the three vertices of the triangle. The boundary of the intersection of the disks enclosed by these circles is then a curve of constant width $r$. Can you figure out why? Here’s a picture, stolen from Wikipedia:
In fact, the same procedure works if you start with any regular polygon with an odd number of sides (thanks to Ben Levitt from MoMath who corrected my original claim that it works for any regular polygon). The curves of constant width obtained in this way are often called Reuleaux polygons for their discoverer (even though they aren’t technically polygons).
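If you don’t want to take the constant-width claim on faith, you can check it numerically. Here is a sketch of my own that samples the boundary of a Reuleaux triangle of width 1 (three circular arcs, each centered at a vertex of the equilateral triangle) and measures the distance between supporting lines in many directions:

```python
import math

def reuleaux_boundary(r=1.0, n=600):
    """Sample points along the boundary of the Reuleaux triangle built on an
    equilateral triangle with side length r."""
    a, b = (0.0, 0.0), (r, 0.0)
    c = (r / 2, r * math.sqrt(3) / 2)
    # (arc center, start angle, end angle) for the three arcs of radius r.
    arcs = [(a, 0.0, math.pi / 3),
            (b, 2 * math.pi / 3, math.pi),
            (c, 4 * math.pi / 3, 5 * math.pi / 3)]
    pts = []
    for (cx, cy), t0, t1 in arcs:
        for k in range(n):
            t = t0 + (t1 - t0) * k / (n - 1)
            pts.append((cx + r * math.cos(t), cy + r * math.sin(t)))
    return pts

def width(points, theta):
    """Distance between the two supporting lines perpendicular to direction theta."""
    proj = [x * math.cos(theta) + y * math.sin(theta) for x, y in points]
    return max(proj) - min(proj)

pts = reuleaux_boundary()
widths = [width(pts, 2 * math.pi * k / 360) for k in range(360)]
print(min(widths), max(widths))  # both very close to 1.0
```

Every direction reports the same width as the triangle’s side length, up to the tiny error from sampling the arcs discretely.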
There are all sorts of interesting mathematical questions one can ask about bodies of constant width. Here are some useful facts:
In my next blog post, I plan to discuss some of these facts in greater detail. If you want to read more in the meantime, I recommend The Enjoyment of Math by Rademacher and Toeplitz, two great 20th century mathematicians. Actually, I recommend that book even if you don’t care to read more about bodies of constant width!
For my inaugural blog entry, I’m going to write about one of my favorite exhibits in the museum. It’s based on a seemingly simple game which conceals some surprising mathematical secrets. I first heard about the game from Prof. Mel Hochster during the summer after I graduated from the University of Michigan in 2007. He was teaching a summer course on the Fibonacci numbers to talented high school students (I was a course assistant) and he used the game to illustrate the notion of isomorphism, a term which mathematicians use to describe seemingly different phenomena which are secretly the same.
Anyway, here’s the game. All you need is a friend, a piece of paper, and a pencil. Write the numbers 1 through 9 at the top of the page, and take turns with your friend choosing a number, crossing out each number once it has been chosen. The object of the game is to be the first person to select exactly three numbers which add up to 15.
Example:
You pick 5, I pick 9.
You pick 4, I pick 6.
You pick 8, I pick 3.
You pick 2, and you win: 5 + 8 + 2 = 15! (Note that I didn’t win even though I picked 9 and 6 because I needed *exactly* three numbers which add up to 15.)
If you have a friend nearby, give the game a try. It’s a pretty challenging game, and its structure isn’t particularly obvious. Does either player have a winning strategy? Can either player force a draw? What is the best starting move? If you think about the game for long enough, you might eventually be able to provide answers to some or all of these questions, but it will probably take some effort.
The beautiful thing about the game is that it is completely equivalent (isomorphic, in fact!) to a much simpler game that almost everybody understands. To see what’s going on, we’ll use a so-called magic square:
The property possessed by this table of numbers which makes it “magic” is that each row, column, and diagonal adds up to 15. Moreover, every triplet of numbers from 1 to 9 which adds up to 15 is represented as some row, column, or diagonal. Let’s go through the example game above one more time, only this time we’ll draw an “X” through each number that you picked and an “O” through each number that I picked. Here’s what it looks like at the end:
As you can hopefully see, the game is nothing more than tic-tac-toe in disguise!
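A quick script makes both “magic” properties precise. The particular square below is one standard arrangement; any rotation or reflection of it works equally well:

```python
from itertools import combinations

# One standard 3x3 magic square.
square = [[2, 7, 6],
          [9, 5, 1],
          [4, 3, 8]]

# Its eight lines: three rows, three columns, two diagonals.
lines = ([tuple(row) for row in square] +
         [tuple(square[r][c] for r in range(3)) for c in range(3)] +
         [(square[0][0], square[1][1], square[2][2]),
          (square[0][2], square[1][1], square[2][0])])

# Every line sums to 15...
assert all(sum(line) == 15 for line in lines)

# ...and every 3-element subset of 1..9 that sums to 15 is one of the lines.
winning_triples = {frozenset(t) for t in combinations(range(1, 10), 3)
                   if sum(t) == 15}
assert winning_triples == {frozenset(line) for line in lines}
print(len(lines))  # 8 lines, matching the 8 winning triples exactly
```

This exact correspondence between winning triples and tic-tac-toe lines is what makes the two games isomorphic: a move in one game translates to a move in the other, and winning positions translate to winning positions.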
At the museum, they came up with a clever way to turn this game into an exhibit. The two players play the game on a computer screen, but one player is seated inside a concealed booth with a magic square. The poor player without the magic square has to labor through a lot of arithmetic, while the player in the booth just has to play tic-tac-toe.
I think this game (and the corresponding exhibit) is one of the best non-technical illustrations of what mathematics is all about.
These are all extremely important themes in mathematics, and I hope to explore each of them further in future blog posts.