Use of unconditional probability

Filed under: math — jlm @ 21:06

Recently I had a discussion with some friends about the probability that a certain card was among the first n cards dealt from a shuffled deck of m cards. Here I’m going to only cover the specific case of a deck consisting of the 13 spades, and the chance that the ace is one of the first two cards dealt — n=2 and m=13. We came up with four ways to perform this elementary probability calculation.

Combinatorics: There are 12C1 ways to have the ace and one other spade, without regard to order. There are 13C2 ways to deal two spades. So the probability is 12C1/13C2 = 12/78 = 2/13.

Trickery: The chance the first card isn’t the ace is 12/13. The chance that the second card, dealt from the remaining 12 cards, also isn’t the ace is 11/12. So the chance that neither is the ace is (12/13)⋅(11/12) = 11/13. Thus the chance that one of the first two cards is the ace is 1 − 11/13 = 2/13.

Conditional probability: The chance that the first card is the ace is 1/13 and the chance it isn’t is 12/13. If we represent the event of the first card being the ace as 1 and the event the second card is the ace as 2, then that is P(1) = 1/13 and P(~1) = 12/13. Now, P(2|~1) = 1/12. ∴ P(1 or 2) = P(1) + P(2|~1)⋅P(~1) = 1/13 + (1/12)⋅(12/13) = 1/13 + 1/13 = 2/13.

Unconditional probability: P(1 or 2) = P(1) + P(2) − P(1 and 2) = 1/13 + 1/13 − 0 = 2/13. (The second card, considered on its own, is equally likely to be any of the 13 cards, so P(2) = 1/13; and the ace can’t be both cards, so P(1 and 2) = 0.)
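All four calculations are easy to check exactly. Here’s a quick sketch in Python (not part of the original discussion), using exact rational arithmetic so no floating-point fuzz creeps in:

```python
from fractions import Fraction
from math import comb

# Combinatorics: C(12,1) / C(13,2)
p_comb = Fraction(comb(12, 1), comb(13, 2))
# Trickery: 1 − (12/13)·(11/12)
p_trick = 1 - Fraction(12, 13) * Fraction(11, 12)
# Conditional probability: P(1) + P(2|~1)·P(~1)
p_cond = Fraction(1, 13) + Fraction(1, 12) * Fraction(12, 13)
# Unconditional probability: P(1) + P(2) − P(1 and 2)
p_uncond = Fraction(1, 13) + Fraction(1, 13) - 0

assert p_comb == p_trick == p_cond == p_uncond == Fraction(2, 13)
```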

All four methods generalize to dealing n cards out of m, but the one using unconditional probability is not only the simplest, it also generalizes the most cleanly. Yet most of us found it the least satisfying solution. Maybe because it was too simple?


Ouroboros in Std.Logic

Filed under: math — jlm @ 18:14

This post dissects a very short yet fascinating and enigmatic proof that can be found in the Logic module of the standard library of the Lean proof verifier:

theorem iff_not_self : ¬(a ↔ ¬a)
  | H => let f h := H.1 h h
         f (H.2 f)

OK, the fact being proved isn’t enigmatic — it’s stating that a proposition’s truth value is never the same as its own negation’s — but the proof thereof sure is. If that looks to you like some incomprehensible formal logic circling to eat its own tail, that’s because it kinda is.

However, it’s written extremely densely, which inhibits comprehension — which is sad, because the way it works is very interesting. It’s quite the rabbit hole, though, so let’s take our trusty spade and start digging. The very first thing we encounter is that Lean implicitly converts the ¬(a ↔ ¬a) into (a ↔ ¬a) → False. This is because Lean uses dependent type theory to represent logical statements and their proofs, so we need to discuss how these kinds of theories are used with formal logic.
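To see the circle concretely before we dig further, here’s a sketch of the same proof with the implicit proposition and the intermediate term given explicit names and type ascriptions (the name iff_not_self' and the comments are mine, not Std’s):

```lean
-- H : a ↔ ¬a gives us H.1 : a → ¬a and H.2 : ¬a → a.
theorem iff_not_self' {a : Prop} : ¬(a ↔ ¬a) := fun H =>
  -- Any proof h of a refutes itself: H.1 h : ¬a, so H.1 h h : False.
  have f : ¬a := fun h => H.1 h h
  -- But H.2 f : a, and f consumes it — the proof eats its own tail.
  f (H.2 f)
```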



Calculating limits with hyperreals

Filed under: math — jlm @ 18:37

Here’s a simple (first-year-calculus level) math puzzle:
What’s the limit as n → ∞ of nx·(x^(1/(nx)) − 1) for x > 0?

The right insight makes it work out rather elegantly, but suppose your math muse is AWOL cavorting in Cancun today — is there a more systematic way to approach these kinds of problems? It turns out that the tools of Robinson’s hyperreal analysis are very good at attacking this stuff.

Since we’re looking at a limit as n approaches infinity, the first thing to do is to simply set n to infinity — or rather, to an arbitrary infinite hyperreal (all the positive infinite hyperreals are indistinguishable in a deep sense). The second step is to write everything in terms of infinitesimals, so we introduce the infinitesimal m = 1/n and our expression becomes x·((x^(1/x))^m − 1)/m.

Now we want to get everything expressed as polynomials of infinitesimals (here just m) of as low an order as lets things work out, so we expand the Maclaurin series of all the non-polynomial-in-m stuff. Here that’s only f(m) ≝ (x^(1/x))^m, which expands to f(0) + f′(0)·m + O(m²) = 1 + (ln x^(1/x))·m + O(m²), because f′(m) = (ln x^(1/x))·(x^(1/x))^m.

Plugging this back into x·((x^(1/x))^m − 1)/m gives x·((ln x^(1/x))·m + O(m²))/m = x·(ln x^(1/x) + O(m)) = x·(ln x^(1/x)) + x·O(m). Now, x·(ln x^(1/x)) = ln (x^(1/x))^x = ln x. Also, x·O(m) = O(m) since x is finite. So nx·(x^(1/(nx)) − 1) simplifies into ln x + O(m).

Since m is infinitesimal, O(m) disappears when we take the standard part and transfer back into ordinary reals, making n a limit index variable once again, and we have lim_(n→∞) nx·(x^(1/(nx)) − 1) = ln x. Now, this is certainly more work than substituting in w ≝ 1/(nx) and applying l’Hôpital’s rule, but the general technique works on all kinds of limit calculations where a-ha! moments become few and far between.
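As a numeric sanity check (mine, not from the post): evaluating nx·(x^(1/(nx)) − 1) at a large finite n should land close to ln x, with the O(m) leftover shrinking like 1/n.

```python
import math

def expr(n, x):
    # n·x·(x^(1/(n·x)) − 1); the claim is that this tends to ln x as n → ∞
    return n * x * (x ** (1.0 / (n * x)) - 1.0)

# The error term is O(1/n), so at n = 10^6 we expect agreement well inside 1e-4.
for x in (0.5, 2.0, 10.0):
    assert abs(expr(10**6, x) - math.log(x)) < 1e-4
```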


Why cubie orientations are preserved

Filed under: math — jlm @ 10:59

A while ago in “Rotating only one cubie”, I said there was “extra structure” to the Rubik’s Cube’s moves that preserves the sum-total of all corner cubie orientations, and of all edge cubie orientations, but didn’t preserve the sum-total of center cubie orientations. I didn’t get into that then, because it’s hard to describe, and why they’re preserved wasn’t the point of that post. But I did make a note to get to it later, as I wasn’t aware of any good explanations of it that I could link to, and it’s high time for that “later” to become “now”.

The dynamics of the corner and edge cubies are pretty different in this regard, which is probably expected by everybody who’s played around with the Rubik’s Cube that much — edge and corner cubies have very distinct characters in general. As is usual with the Rubik’s Cube, the corner cubies are easier to think about. (I think this is mostly because there are only 8 of them, while there are 12 edge cubies.) So, let’s do them first.



Rotating only one cubie

Filed under: math — jlm @ 22:00

I’ve entertained myself for countless hours with the Rubik’s Cube, but always as kind of a pointless diversion. It’s something to keep my mind occupied, but I’ve never seriously studied it. I’ve built up a lot of intuition, but when it comes to the group theory that the cubes embody, I have nothing to say. One of the bedrock concepts I’d developed from my intuition is that it’s impossible to rotate only one cubie. You can flip two edge cubies (any two), or four or all twelve, but not one or three. Similarly, you can rotate a corner cubie by 120°, but only if you cancel it out by rotating another corner cubie (any of them) by −120°, or rotate two others by 120° as well to add up to a full 360° (and so be equivalent to no rotation). Why is this? My intuitive argument is that a single 90° rotation of a face rotates each of the four edge cubies of the face by 90° and the same for the four corner cubies.

[Figures: a quarter face-turn rotates 4 edge cubies; a quarter face-turn rotates 4 corner cubies]

Since each turn rotates the edge cubies by 360° in total and every sequence is a composition of face turns, the “total rotation” of all the edge cubies together is a multiple of 360° and so equivalent to 0, meaning there can only be an even number of 180° edge cubie flips. The corner cubie rotations sum the same way, hence the corner cubie rotations that don’t cancel themselves out must sum to a multiple of 360°. The conclusions are correct: no sequence can do an odd number of edge flips or corner rotations that aren’t equivalent to zero. However, the argument is unsound: adding rotations around different axes together willy-nilly like this can (in other contexts) produce complete nonsense — there’s some extra structure in the Rubik’s Cube that makes this OK [for different reasons explained here].

Anyway — I’ve “known” for 35 years or so that you cannot rotate a single cubie of the cube, you must also rotate other cubies of its type to cancel it out. But then, semi-recently, I encountered for the first time a “picture cube” where the orientations of all six center cubies had to be right. The first time I scrambled and solved it was fairly easy: first I solved it like the normal cube, and after that it was easy to work out how to rotate one center cubie by 90° and another by −90°. The second time I scrambled and “solved” it I was left with one center cubie upside down and everything else as it should be. And my brain kind of broke. The one center cubie had rotated 180° and everything else was in place. This had to be impossible, you can’t rotate just one cubie, you need to maintain a balance, there has to be another rotation to cancel it out, doesn’t there?!?! I felt like I had taken a number, multiplied and divided it by another number, and gotten a completely new number as a result. Gradually my faculties returned to me, and I thought back to why I thought you can’t have an unbalanced rotation, and it was blatantly obvious why the argument didn’t apply: a face turn rotates only one center cubie, so it does alter the total rotation of all the cubies after all. Each quarter turn preserves the total rotation of the edge and corner cubies, but alters the total rotation of the center cubies by 90°.

It took me about 15 minutes to completely shake off the feeling of unreality and get myself thinking in the paradigm where center cubies can all rotate independently. In another five minutes, it occurred to me that if I double-swapped the opposing edge cubies of one face and double-swapped the opposing corner ones as well, that was equivalent to a 180° face turn except the center cubie wouldn’t be rotated. That let me solve the configuration the cube was in. What about a 90° rotation? Impossible: the parity of the number of face quarter-turns was also the parity of the number of quarter-turns of all the center cubies summed together. So I had a full solution to the picture cube. Playing around with it further, I optimized the double-double-swap into a double-swap of an edge-and-corner-cubie complex and figured that was as far as I could take it. Yet, I’m sure that what’s going to stick with me the most is the eerie sense of irreality of staring at that lone center cubie, a thing that couldn’t be flipped, but was.
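The running totals in that argument can be tabulated directly. This is a sketch of the bookkeeping only (not a cube simulator): each quarter face-turn rotates 4 edge cubies and 4 corner cubies by 90° apiece, and exactly 1 center cubie by 90°.

```python
# Totals after a sequence of `turns` quarter face-turns, per the argument above.
for turns in range(1, 50):
    edge_total = 4 * 90 * turns      # four edges rotated 90° per turn
    corner_total = 4 * 90 * turns    # four corners rotated 90° per turn
    center_total = 90 * turns        # exactly one center rotated 90° per turn

    assert edge_total % 360 == 0     # edge rotations always sum to a multiple of 360°
    assert corner_total % 360 == 0   # likewise for corners
    # ... but the center quarter-turn count has the same parity as the turn count,
    # which is why a lone 90° center rotation is impossible while 180° is not.
    assert (center_total // 90) % 2 == turns % 2
```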


Quine, Hofstadter, Gödel again

Filed under: math, philosophy — jlm @ 20:25

During some recent traveling, I re-read some sections of Douglas Hofstadter’s Gödel, Escher, Bach. Of particular interest was a step in the proof of Kurt Gödel’s first incompleteness theorem which involved substituting a variable of a formula with the formula itself. Hofstadter calls this peculiar operation “quining” after the logician W. V. Quine, who wrote extensively about self-reference. As with the previous times I read through that part, I noticed that the operation didn’t specify a variable for substitution like substitutions generally do, but instead performed substitution on all free variables, which is something I haven’t encountered anywhere else. This oddity wasn’t even mentioned, much less justified. Unlike after my previous reads, this time I decided to figure out why it was performed that way.

Now, the more interesting parts of Gödel’s Incompleteness Theorem involve what we now call Gödel coding to demonstrate that classes of logical systems can represent quining their own formulas and verifying their own proofs, the latter of which was very surprising at the time. But those parts turn out to be unnecessary for dealing with this small facet of Gödel’s proof, so let’s just deal with the proof’s final proposition, which is comparatively simple and straightforward: given that a logical system supports the operations of quining and proof verification (and logical conjunction and negation, and an existential quantifier), that logical system is incomplete (or inconsistent).



Solving the quartic using “obvious” steps

Filed under: math — jlm @ 21:40

Looking over the derivations of solutions to quartics (fourth-degree polynomials), I was a little disturbed at how, except for René Descartes’s, the derivations all have steps where “magic” occurs, by which I mean that a step does something mathematically valid but very unintuitive, which just happens to make stuff later on work out just right.

So I wondered whether one could take an approach like Lodovico Ferrari’s, but do it without his “magic”. The basic idea of Ferrari’s approach is to convert the depressed* quartic equation f(x) = x⁴ + αx² + βx + γ = 0 into (x²+p)² = m(x+q)², because taking the square root of both sides produces the quadratic equation x²+p = ±(x+q)√m.

(x²+p)² = m(x+q)² expands into x⁴ + 2px² + p² = m(x²+2xq+q²), which is x⁴ + (2p−m)x² − 2mqx + p² − mq² = 0. We can name that polynomial g(x). From above, we know that the roots of g are the same as the roots of x² ± (x+q)√m + p, and the quadratic formula easily gives us those roots in terms of m, p, & q.

g ≡ f iff α = 2p−m and β = −2mq and γ = p²−mq². Thus our task is to get values for p, q, & m which satisfy these three equations.
m = −β/2q and p = ½(α+m), so p = ½(α − β/2q).
γ = p² − mq² = p² + q²β/2q = p² + ½βq = [½(α − β/2q)]² + ½βq = ¼[α² − αβ/q + β²/(4q²) + 2βq].
4γq² = α²q² − αβq + ¼β² + 2βq³.
2βq³ + (α² − 4γ)q² − αβq + ¼β² = 0, which is just a cubic in q, and cubics are something we’re presumed to know how to solve. Using q we get m and p from m = −β/2q and p = ½(α+m).

With these values of m, p, & q, f ≡ g. So the roots of g(x) from above are the roots of f(x), et voilà, we’re done, no magic needed.
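The recipe is easy to check numerically. Here’s a sketch in Python (my own code, with hypothetical names; it leans on numpy.roots for the resolvent cubic and the plain quadratic formula for the rest):

```python
import cmath
import numpy as np

def depressed_quartic_roots(alpha, beta, gamma):
    """Roots of x^4 + alpha*x^2 + beta*x + gamma, for beta != 0."""
    # Resolvent cubic in q: 2βq³ + (α² − 4γ)q² − αβq + ¼β² = 0 (any root works)
    q = np.roots([2 * beta, alpha**2 - 4 * gamma, -alpha * beta, beta**2 / 4])[0]
    m = -beta / (2 * q)        # from β = −2mq
    p = (alpha + m) / 2        # from α = 2p − m
    sm = cmath.sqrt(m)
    roots = []
    for s in (1, -1):
        # x² + p = s·√m·(x + q)  ⇔  x² − s·√m·x + (p − s·√m·q) = 0
        b, c = -s * sm, p - s * sm * q
        d = cmath.sqrt(b * b - 4 * c)
        roots += [(-b + d) / 2, (-b - d) / 2]
    return roots

# Check on f(x) = x⁴ − 5x² + 4x + 2 (α = −5, β = 4, γ = 2):
for r in depressed_quartic_roots(-5, 4, 2):
    assert abs(r**4 - 5 * r**2 + 4 * r + 2) < 1e-6
```

(If β = 0 the quartic is biquadratic and the quadratic formula alone suffices, which is why the q = 0 case never arises here.)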


*A depressed nth degree polynomial is one in which the (n−1)th term has a coefficient of zero, so a depressed quartic is a quartic equation with no cubic term. For any nth degree polynomial F(x) = x^n + a_(n−1)·x^(n−1) + a_(n−2)·x^(n−2) + …, there is a depressed nth degree polynomial f(x) = x^n + b_(n−2)·x^(n−2) + … such that F(x+s) = f(x), because the coefficient of the (n−1)th term of F(x+s) is ns + a_(n−1), which can always be made zero by selecting s to be −a_(n−1)/n.


Why was algebra so difficult to discover?

Filed under: math — jlm @ 18:48

(tl;dr: Asking a bunch of questions, and not making any guesses.)
Consider integer arithmetic prior to the discovery of algebra. How do you go about a really foundational task such as defining division? You give everything involved names, like so: A number called the dividend divided by another number called the divisor is a number called the quotient with a non-negative number called the remainder if the divisor times the quotient plus the remainder is the dividend and the remainder is smaller than the divisor. How do you get anything done with that kind of word salad? No, when you want to actually do some number theory, you go with: n÷d = q R r is equivalent to n = q∙d + r ∧ 0 ≤ r < d. This terminology is taught because you teach division well before you teach algebra, but it’s so useless afterwards that I very well might have never used this sense of the word “dividend” all these years since I learned algebra. Anyone skeptical that basic grade school arithmetic is far harder to grasp without algebra is invited to explain how come long division gets the right answer without using it.
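That compact definition is exactly the relation integer division satisfies; a two-line check in Python (mine, not from the post):

```python
# n = q·d + r with 0 ≤ r < d: the algebraic definition of division with remainder.
n, d = 47, 5
q, r = divmod(n, d)   # q = 9, r = 2
assert n == q * d + r and 0 <= r < d
```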

So fine, algebra is incredibly useful, but utility has no relevance to ease of understanding. I remember having some confusion when I was introduced to it. It was along the lines of: “What number is x?” “That can be any number.” “*boggle*”. Yet, those named terms above, they’re just placeholders too: “What number is the dividend?” “That can be any number.” “Oh, okay.” — How come that’s different?

The Greeks were using letters to represent arbitrary points in geometry centuries earlier than the Arabs did that for arbitrary numbers. In teaching geometry we don’t get the problem “What point is A?” “That can be any point.” “*boggle*” — What makes “x can be any number” so much harder than “A can be any point”?


Fun with metafunctions

Filed under: math — jlm @ 23:14

In mathematics, a function is a mapping from one set (the domain) to another (not necessarily distinct) set (the range). Now, you can cause all kinds of weirdness if you let elements of these sets be functions themselves. For example, you can define a function k from the integers to the functions from integers to integers, mapping each integer to the “constant” function of that integer: if j = k(x) for some x in Z, then j(y) = x for all y in Z. j is a function which carries the x inside it, waiting to be applied to an argument; the argument is ignored, and the hidden x emerges. Written more densely, k(x)(y) = x for all x, y in Z.
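In Python terms (a sketch, using a closure):

```python
def k(x):
    # k(x) is the "constant" function j: j ignores its argument and returns x.
    def j(y):
        return x
    return j

assert k(7)(99) == 7   # k(x)(y) = x for all x, y
```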

But mathematics doesn’t allow you to do what’d be the most fun: calling a function on itself. This is because you’d have to include a function in its own domain, which leads to an infinite regress: the formal definition of the function includes specifying its domain, which contains the function, whose formal definition includes its domain … and you can’t build such a thing in set theory.

Okay, suppose I try to get around this problem by introducing a set Σ containing non-function elements which maps 1:1 to the functions from Σ to the integers Z. This can’t work because Σ has to be “bigger” than itself. So, suppose Σ only maps to a few of these functions. This seems like it should do, and I can just declare that any several functions I’m interested in have mappings to Σ, and functions obtained by Cantor diagonalization simply won’t. Let a be the function that maps (some of the functions from Σ to Z) onto Σ and b be its inverse.

Neither a nor b is a function from Σ to Z, so I can’t apply a(a) or b(b), so my attempts at self-applying need to be more subtle.
Let’s define a function from Σ to Z, and declare it to be an element of a’s domain:
    f(ξ) = b(ξ)(ξ) + 1
Here ξ is an element of Σ, so b(ξ) is a function from Σ to Z, and b(ξ)(ξ) an integer, so it seems to work out.

I declared f to be in a’s domain, so a(f) exists and is an element of Σ, let’s call it θ.
Now I can finally achieve something close to application-on-itself with f(a(f)) = f(θ).
What can we say about this? Well, its value depends on b(θ)(θ).
But b is the inverse of a, so b(θ) = f, and so b(θ)(θ) = f(θ). But we defined f so that f(ξ) and b(ξ)(ξ) could never have the same value!
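In an untyped language you can actually build this and watch the contradiction play out. Here’s a hypothetical Python model (names a, b, f, θ mirror the post; tokens stand in for Σ’s non-function elements): evaluating f(θ) demands f(θ) = f(θ) + 1, which manifests as infinite regress.

```python
a = {}  # function -> token (the map onto Σ)
b = {}  # token -> function (its inverse)

def register(fn):
    token = object()          # a fresh non-function element of Σ
    a[fn] = token
    b[token] = fn
    return token

def f(xi):
    # The diagonal function: f(ξ) = b(ξ)(ξ) + 1
    return b[xi](xi) + 1

theta = register(f)           # θ = a(f)

# f(θ) = b(θ)(θ) + 1 = f(θ) + 1, so the evaluation never bottoms out:
try:
    f(theta)
    outcome = "terminated"
except RecursionError:
    outcome = "infinite regress"
```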

See what fun functions of functions returning functions are.


Haiku proof

Filed under: math — jlm @ 12:01

Product of all primes
Incremented won’t factor
Thus a larger prime.

I wonder if it’ll catch on…
