Reviewing Set Theory
- What are sets?
- Preliminary notions of Logic
This chapter is intended on making sure that everyone is on the same page regarding the fundamentals of set theory. This won't be a deep dive into the material: see Halmos's Set Theory for a more detailed treatment.
That being said, it may be the case that all or most of the material here is obvious. In that case, feel free to skip ahead.
What are sets?
Here's an informal definition:
A set is a collection of objects, which themselves are sets.
This is a simple idea: there's one kind of structure in the universe, and it consists of structures of the same kind. But this isn't a definition. You can't define sets in terms of sets. So what are they?
Any attempt at formalizing a definition of sets will likely lead nowhere. What are objects? What are things? What does it mean for something to be a collection?
Even more confusing is the fact that not all collections of sets we can imagine are sets, for reasons we'll see later in this post. Thus, we adopt an even more nebulous term: class. We call any "definable" collection of sets a class. What does definable mean? There's no good definition, other than "can be written down in some form". A great definition that is.
At some level, the fundamental issue we are facing is that of existence. What does it mean for a set to exist? Does it exist in the same way a grape exists? How about dark matter? There aren't any satisfying resolutions to these questions, since they are more questions of philosophy than mathematics. The perspectives I outlined in the background may offer some comfort from these questions.
However, to do any mathematics, we have to assume something, and for the purpose of this course, it makes sense to assume two things: 1. sets exist, and 2. deductive reasoning works (we'll talk more about the second later).
Preliminary notions of Logic
Before we dive into sets, we have to briefly cover some concepts we will cover in more detail in Chapters 2 and 3.
Sentences
There are plenty of statements whose truth value is known to us. For example: \(1=2\) is false, "pigs fly" is false, and the Jets will never win a super bowl is true (hehe). We call these kinds of statements atomic sentences if they can be expressed in our logical language. We call these sorts of sentences atomic as we can form more complicated ones from gluing together atomic sentences appropriately.
Now that we have atomic sentences, we want to form more interesting sentences by gluing together atomic sentences. To do so, we use logical operands.
There are many logical operands, but the following are the most commonly used ones.
Order | Symbol | Name | Description |
---|---|---|---|
\(0\) | \(\top\) | True | Outputs true. |
\(0\) | \(\bot\) | False | Outputs false. |
\(1\) | \(\lnot\) | Not | Outputs the opposite of the input. |
\(2\) | \(\vee\) | Or | Outputs true if at least one input is true. |
\(2\) | \(\wedge\) | And | Outputs true if at least all inputs are true. |
\(2\) | \(\veebar\) | Xor | Outputs true if exactly one input is true. |
\(2\) | \(\to\) | Implies | Outputs true if second input is true whenever the first is. |
A sentence is a logical operand with inputs as other sentences (potentially atomic).
Logical Formulas
Sentences are nice in the sense that they have a fixed truth value, but what if we wanted to replace the atomic formulas in a sentence with variables? In this case, the sentence becomes a function on the variables that outputs true or false depending on them.
For example: If I had the sentence "Pigs can fly or \(1=1\)", I can replace "Pigs can fly" with \(X\) and \(1=1\) with \(Y\) and create the formula "\(X\) or \(Y\)". This by itself no longer has a truth value, but once I input in the truth values of \(X\) and \(Y\), then it does. We call our modified sentence a sentence map for this reason and denote it as \(S(X,Y)\).
In this example, we call \(X\) and \(Y\) free variables, as their values are required as input. Now suppose I want to write the following statement as a sentence map: "For all even numbers \(x\), \(x=2\)". In this case, despite \(x\) being a variable, it is clearly not free.
To account for this, we add a new tool to our toolkit: quantifiers. Quantifiers take a sentence map with \(n\) free variables, and output a sentence map with \(n-1\) free variables. There are two quantifiers we will use in this course.
- \(\forall\) (For all): Given a sentence map \(S(x_1,x_2,\ldots,x_k)\) and some \(1\leq i\leq k\), $$\forall x_i S(x_1,\ldots,x_k)$$is a sentence map with \(n-1\) free variables \(x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_k\) that evaluates to true iff (given values for the free variables) for all \(x_i\), \(S(x_1,\ldots,x_k)\) is true.
- \(\exists\) (There exists): Given a sentence map \(S(x_1,x_2,\ldots,x_k)\) and some \(1\leq i\leq k\), $$\exists x_i S(x_1,\ldots,x_k)$$is a sentence map with \(n-1\) free variables \(x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_k\) that evaluates to true iff (given values for the free variables) there exists some \(x_i\) such that \(S(x_1,\ldots,x_k)\) is true.
Quantifiers permit us to create far more complicated sentence maps. We call sentence maps that permit quantifiers and such that all variables take on sets as values logical formulas. Here are some examples.
- \(A\in B\): This is a logical formula with two free variables, \(A\) and \(B\), that evaluates to true if \(A\) is a element of \(B\). We sometimes use just \(\in\) to refer to this logical formula.
- \(\forall X,\;X\in A\to X\in B\): This is a logical formula of order \(2\), like the previous example. When it is true, we say that \(A\) is a subset of \(B\), and we sometimes notate this logical formula as \(A\subset B\).
- Sometimes, we don't want our quantifiers to range over all possibles sets. Suppose that I want to write the logical formula \(\forall x, S(x)\), but I only want the quantifier to consider sets that are elements of \(y\). Then, I can write the following logical formula $$\forall x, S(x)\vee x\notin y$$This use case shows up so often that we have a shorthand for it: \(\forall x\in y, S(x)\).
- Similar to the previous example, if we want to write \(\exists x, S(x)\) but want \(x\) to range over the elements of \(y\), I can write $$\exists x, S(x)\wedge x\in y$$Our shorthand for the formula above is \(\exists x\in y, S(x)\).
- Given a logical formula \(S^*(X,Y)\) such that for any set \(X\), there exists exactly one \(Y\) such that \(S^*(X,Y)\) is true, we call \(S\) a set map. We view \(S^*\) as mapping \(X\) to \(Y\) (which we write as \(S^*_X=Y\) or \(S^*(X)=Y\)).
ZFC
ZFC, which stands for Zermelo Fraenkel with Choice, named after mathematicians Ernst Zermelo, Abraham Fraenkel, and Johnny Choice (I know, I know, bad joke), is an axiomatic system (i.e., a system of rules) that govern the behavior of sets. Due to a bit of luck, and the relative well-behavior of this system, it has become the most popular axiomatic system for set theory, but certainly not the only one. In this course, we'll only concern ourselves with ZFC.
So what are the axioms (i.e., rules) of set theory? While some are almost trivially obvious, others are a bit more complicated. Below is a table of their logical formulas, but you aren't expected to understand most of them immediately.
Name | Formula |
---|---|
Axiom of Extensionality | \(\forall X,Y,\;(X\subset Y\wedge Y\subset X)\to X=Y\) |
Axiom of Null Set | \(\exists X\forall Y,\;\lnot Y\in X\) |
Axiom schema of Replacement | For each set map \(S^*\), \(\forall X\exists Y\lnot(\exists Z,\;Z\in X\veebar S^*_Z\in Y)\) |
Axiom of Power Set | \(\forall X\exists Y(\lnot\exists Z,\;Z\subset X\veebar Z\in Y)\) |
Axiom of Union | \(\forall X\exists Y\forall Z,\;Z\in X\to Z\subset Y\) |
Axiom of Regularity | \(\forall X\exists Y,\; Y\in X\wedge X\cap Y=\varnothing\) |
Axiom of Infinity | If \(S^*\) is the successor map, \(\exists Y,\;\varnothing\in Y\wedge\forall Z\;Z\in Y\to S^*_Z\in Y\) |
Axiom of Choice | \(\forall X,\;\varnothing\in X\vee\left(\exists f:X\to\bigcup X,\;\forall A\in X,\;f(A)\in A\right)\) |
There is another axiom schema and another axiom that do appear in some presentations of ZFC, but are implied by the axioms above. They are presented in the table below:
Name | Formula |
---|---|
Axiom schema of Specification | For each order \(1\) logical formula \(\phi\), \(\forall X\exists Y,\;\lnot(\exists Z,\;Z\in Y\veebar(Z\in X\wedge\phi(Z)))\) |
Axiom of Pairing | \(\forall X,Y\exists Z,\;X\in Z\wedge Y\in Z\) |
Axiom of Extensionality
The axiom of extensionality tells us that if \(X\subset Y\) and \(Y\subset X\), then \(X\) and \(Y\) are the same set. This tells us that sets are defined by their elements. As a silly example, most humans share over \(99\%\) of their DNA, but you wouldn't say that they are \(99\%\) the same. But for sets, their defining quality is their elements. Nothing else.
Axiom of Null Set
The Axiom of the Null Set tells us that there exists a set with no elements. We denote this set by \(\varnothing\). Without this axiom, not only would a few of the other axioms be ill-defined, but we wouldn't be able to know whether there even is a set in our system!
Axiom schema of replacement
This isn't an axiom, but rather an (infinite) collection of axioms. For each set map \(S^*\), we have an axiom that says that for any set \(X\), there exists a set \(Y\) whose elements are precisely those of \(X\) mapped by the set map \(S^*\).
As a silly example, let \(H\) be the set map that sends all sets to \(\varnothing\). Then, the axiom of replacement would generate the set \(\{\varnothing\}\) from any non-null set \(X\).
As we'll see in later chapters, there is no axiomatization of ZFC with finitely many statements, so this axiom schema (or something similar) is necessary.
"Axiom" schema of specification
As mentioned above, these aren't axioms in our presentation (proof below), but they are hugely important both historically and mathematically. In earlier axiomatic systems for set theory, the axiom schema of specification was replaced by a slightly more general Axiom schema of comprehension, which had the following sentence for each logical formula \(\phi\) of order \(1\):$$\exists Y,\;\lnot(\exists Z,\;Z\in Y\veebar\phi(Z))$$Do you see the difference? Both axiom schemas essentially say that given a logical formula, there exists a set that contains all sets that satisfy the formula. But axiom schema of specification contains an additional constraint: it contains all of the sets satisfying the formula that belong to a particular set. In fact, without this additional constraint, the axiom schema of comprehension leads to a direct contradiction
Obviously, these sorts of contradictions are no bueno. Ideally, we'd like to ensure that our system doesn't have any. We call such a system consistent, but ensuring that ZFC is consistent is surprisingly hard.
As a useful application, given sets \(A,B\), consider the order \(1\) logical formula \(\lnot X\in B\). The axiom schema of specification thus implies that there exists a set of all the elements in \(A\) that are not in \(B\). Denote this set by \(A\setminus B\).
I claim that this set is the desired set. For any \(Z\in X\), if \(\phi(Z)\), then the set map sends \(Z\to Z\), so \(Z\in Y\). And, for any \(Z\in Y\), either \(\phi(Z)\) or \(Z=E\), but \(\phi(E)\), and thus we are done.
Given an order \(1\) logical formula \(\phi\) and a set \(X\), we often notate the set generated by the axiom schema of specification on \(\phi\) applied to \(X\) by \(\{x\in X\mid\phi(x)\}\).
Axiom of Power Set
This axiom gives us our main way of constructing larger sets from smaller ones. The axiom tells us that for any set \(X\), there is a set \(Y\) that contains all of the subsets of \(X\). By the axiom schema of specification, this axiom implies that there is a set that consists only of the subsets of \(X\). We call this set the power set of \(X\), and denote it by \(\cP(X)\).
Axiom of Pairing
Like the axiom schema of specification, this axiom is implied by the presentation of ZFC above, by the following proof
Axiom of Union
The axiom of union is almost the opposite of the axiom of the power set: it gives us a way of "unpacking" sets by constructing a set of the elements of the elements. It implies that for any set \(X\), there exists a set \(Y\) that has every element of \(X\) as a subset. By the axiom schema of specification, we can let \(Y\) be the set of elements of elements of \(X\). We call \(Y\) the union of \(X\), and denote it by \(\bigcup X\). We call this construction the union because of the following result:
As an application of the result above, we can construct a special set map, which we call the successor map. It sends a set \(x\) to \(x\cup\{x\}\). This is well defined since pairing tells us that \(\{x\}\) exists and union gives us the rest.
Another application is intersection. Given set \(X\), we can define \(\cap X\) to be the set containing those elements that are members of every single element of \(X\). Does \(\cap X\) always exist? The answer is yes!
I claim that \(Z\setminus\bigcup R=\cap X\). If there exists some \(A\in X\) such that \(Y\notin A\), then \(Y\in\psi(A)\), so \(Y\in\bigcup R\). If \(Y\in\bigcup R\), then \(Y\in\phi(A)\) for some \(A\in X\), so \(Y\notin A\). This completes the proof.
Via the same trick that we used for the axiom of union, we can define the intersection of two sets \(A\cap B\) as well. Now that we have all of the basic axioms down, we can start to build some interesting structures.
Suppose \(A=C\) and \(B\neq D\) (thus, we can assume that \(C\neq D\) since if \(A=B\) and \(C=D\), this condition wouldn't hold). Then \(\{A,B\}\neq\{C\}\) and \(\{A,B\}\neq\{C,D\}\) since \(\{A,B\}\setminus\{A\}=\{B\}\) or \(\varnothing\), neither of which is equal to \(\{C,D\}\setminus\{A\}=\{D\}\).
The result above is useful, since it gives us a structure on which equality requires equality of each of the components in order. Thus, we call \(\{\{A\},\{A,B\}\}\) the ordered pair of \(A,B\), and notate it as \((A,B)\) (Note: \((A,B)=(B,A)\) only if \(A=B\)).
Now for a quick definition dump. These definitions should be familiar.
- Reflexivity: \(\lnot x\lt x\)
- Anti-symmetry: \(x\lt y\to\lnot y\lt x\)
- Transitivity: \(x\lt y\wedge y\lt z\to x\lt z\).
- Trichotomy: \(\lnot (x\lt y \vee y\lt x)\to x=y\)
- Reflexivity: \(a\sim a\)
- Symmetry: \(a\sim b\to b\sim a\)
- Transitivity: \(a\sim b\wedge b\sim c\to a\sim c\)
Given functions \(f:A\to B\) and \(g:B\to C\), we can construct a function \(g\circ f:A\to C\) as follows: for all \(x\in A\), \(g\circ f(x)=g(f(x))\). We call this function composition
Now for some assorted function results
Given \(B\), a set of non-empty disjoint sets, let \(\sim\) be the relation \(a\sim b\) iff there exists \(C\in B\), \(a,b\in C\). This satisfies the property by definition, so all we need to show is that the relation is an equivalence relation. Let \(x,y,z\in\bigcup B\)
- Reflexivity: there exists \(C\in B\) such that \(x\in C\), so \(x\sim x\).
- Symmetry: If \(x,y\in C\), then \(y,x\in C\)
- Transitivity: if \(a,b\in C\) and \(b,c\in D\), \(C=D\) since the elements of \(B\) are disjoint, so \(a,b,c\in C\).
- There exists a set of functions from \(A\to B\). We call this \(B^A\).
- There exists a set of partial orders on \(A\). There exists a set of orders on \(A\).
- There exists a set of injections on \(A,B\). Also a set of surjections on \(A,B\), and a set of bijections on \(A,B\).
- There exists a set of equivalence relations on \(A\).
- Injectivity: Suppose \(x,y\in A\) such that \(h(x)=h(y)\). Note that \(x\in Y\) iff \(f_*(h(x))\in Y\) for all \(x\in A\) (\(f_*\) is the left inverse of \(f\)), so either \(x,y\in Y\) or neither are.
If \(x,y\in Y\), then letting \(x^*,y^*\in B\) such that \(g(x^*)=x\), \(g(y^*)=y\), we have $$h(x)=g_*(g(x^*))=x^*=y^*=g_*(g(y^*))=h(y)$$, and so \(g(x^*)=g(y^*)\) and thus \(x=y\).
If \(x,y\notin Y\), then injectivity is given by the injectivity of \(f\). - Surjectivity: Let \(y\in B\). If \(g(y)\in Y\), then \(h(g(y))=g_*(g(y))=y\), showing surjectivity. If \(g(y)\notin Y\), then \(y\notin B\setminus f(A)\) since \(g(B\setminus f(A))\subset Y\). Thus, \(y\in f(A)\), and so there exists some \(x\) such that \(f(x)=y\). Thus, \(h(x)=y\).
Axiom of Regularity
In my mind, this is perhaps the most mysterious of the axioms. Unlike the others, which gives ways to construct new sets, this axiom restricts the sets we can work with. So why do we need this?
This is a great question, but not one we are ready to answer yet, so we'll have to wait until the next few posts to understand its role better. The main implication of this axiom we'll use will be in proving the well ordering principle.
Axiom of Infinity
The axiom of infinity perhaps has the most variations between textbooks, but the core concept is the same. We pick a nice set map (in our case, the successor map), and use it to generate an infinite set by applying it arbitrarily many times.
Barring the axiom of infinity, none of the axioms give us a way to construct an infinite set. This is where the axiom of infinity comes in. So how does the axiom of infinity imply that we have an infinite set?
Before we use these results, we need to construct our infinite set of choice. Let \(I\) be the set generated by the Axiom of Infinity. Let $$C=\{J\in\cP(I)\mid \varnothing\in J\wedge (\forall Z,\;Z\in J\to S(Z)\in J)\}$$Clearly, \(I\in C\), so it isn't empty. Moreover, since \(\varnothing\in J\) for all \(J\in C\), \(\varnothing\in\cap C\). This implies that \(\cap C\in C\).
We give \(\cap C\) a special name: \(\bbN\), or the natural numbers. We'll now invoke the axiom of regularity in proving one of the most important properties of \(\bbN\).
Intuitively, the principle of mathematical induction should tell you that all elements of \(\bbN\) are finite, although we don't know what it means yet for a set to be finite. To make some of the notation less clunky, we define \(0\) as \(\varnothing\), \(1\) as \(S(0)\), \(2\) as \(S(1)\), etc. Now, our goal is to order the natural numbers.
Let \(<\) be the relation \(\{(x,y)\in\bbN\times\bbN\mid x\subsetneq y\}\). I claim that this relation is an order on \(\bbN\).
- Reflexivity: \(n\subsetneq n\) is false by regularity on \(\{n\}\).
- Anti-symmetry: \(n\subsetneq m\) and \(m\subsetneq n\) is false by regularity on \(\{\{n\},\{m\}\}\).
- Transitivity: \(n\subsetneq m\subsetneq k\) implies \(n\subsetneq k\) by definition.
- Trichotomy: \(n\subsetneq m\), \(m\subsetneq n\), or \(m=n\) is true by trichotomy
Now that we've established the basic properties of the naturals, we can begin to talk about what it means for a set to be infinite. In fact, our definition will be very simple.
In particular, this implies that \(\bbN\) is infinite. This makes a lot of sense, since an injection is a bijection with its image, a subset of the codomain \(B\). However, there's an alternate definition of infinite set that is also commonly used.
For those who have seen Hilbert's Hotel before, this definition should make sense. Infinite sets have proper subsets that are also infinite. Right?
Well, not exactly. In the axiomatic system ZF (without choice), there are sets that are Dedekind-infinite but not infinite. However, in ZFC, these notions are equivalent, which is proved here.
Ideally, we'd want to ensure that none of the naturals (the elements of \(\bbN\)) should be infinite. To prove that, we prove a few theorems.
- Injectivity: If \(g(a)=g(b)\) and \(a,b\neq i\), then \(a=b\) by injectivity of \(f\). Note that \(f(n)\neq g(a)\) for all \(a\neq i\) by construction, so this shows injectivity.
- Surjectivity: For \(y\in m\), if \(y=f(n)\), then \(y=g(i)\). Otherwise, there exists \(a\in n\) such that \(g(a)=y\) by surjectivity of \(f\).
Axiom of Choice
This is perhaps the most infamous of the axioms of ZFC (so infamous that it has its own letter in the name!), but seems rather innocent. All it says is that given any set \(X\), either it has the empty set as an element, or, there is a function from the set \(X\) to \(\bigcup X\) that sends each set to an element within the set.
The idea is that this function is making a choice for each set: which element am I going to map to? And while this may not seem too consequential, think about the axiom this way. Suppose I had a set of pairs of socks, and I need a choice function to pick one sock from each pair. I could write a rule, like, "always pick the left sock". But if I have a collection of sets, but each set has its own unique structure, there isn't any general rule I can point to in choosing an element from every single set. For an infinitely sized set, you'd need an infinite set of choices.
The reason that it is infamous is because it is the only axiom that implies the existence of an object, without actually constructing it. For more details about the history of the axiom and its importance, see this.
As our first application of the Axiom of Choice (outside of Dedekind-infiniteness), we'll prove the counterpart to the left inverse theorem.
In fact, every surjection having a right inverse implies the axiom of choice. To see why, consider some set \(X\) such that \(\varnothing\notin X\). Then by replacement, replace each element \(S\) of \(X\) by \(\{S\}\times S\) to form the set \(Y\). Now, consider the function \(f:\bigcup Y\to X\) that sends \((S,x)\) to \(S\). This function is surjective since each \(S\in X\) is non-empty. Thus, there exists a right inverse \(g\). Thus, for every \(S\in X\), \(g(S)=(S,x)\) for some \(x\in S\). Letting \(h:\bigcup Y\to\bigcup X\) be the map \((S,x)\mapsto x\), we have a choice function.
In one perspective, a choice function can be viewed as an ordered pair, since we are selecting one element out of each member of the set. This motivates a definition of a general cartesian product.