Week 12 Notes
In weeks 1-3, we studied permutations, combinations, and the many things one can do with them. In this week's notes, we'll be going over some more advanced topics in combinatorics that certainly are useful, but show up less frequently.
Recurrence Relations
Recurrence relations are equations that relate each term of a sequence to earlier terms. As a common example, consider the recurrence relation for the Fibonacci numbers.$$f_{n+1}=f_n+f_{n-1}$$
The Fibonacci numbers are defined by the recurrence relation above, and the following values: \(f_0=0\), \(f_1=1\). We call these starting values the initial conditions of the recurrence.
The Fibonacci recurrence is a special type of recurrence relation known as a linear recurrence, which is of the following form: $$a_n=c_1a_{n-1}+c_2a_{n-2}+\ldots+c_ka_{n-k}$$These recurrences are called linear for the same reason linear polynomials are called linear: all of the terms have degree \(1\). Note that for the recurrence above, we need \(k\) initial conditions in order to determine the sequence. Can you see why?
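To make the role of the \(k\) initial conditions concrete, here is a minimal sketch of computing such a recurrence directly: once the coefficients \(c_1,\ldots,c_k\) and the \(k\) starting values are fixed, every later term is forced.

```python
def linear_recurrence(coeffs, initial, n_terms):
    """coeffs = [c_1, ..., c_k]; initial = [a_0, ..., a_{k-1}]."""
    a = list(initial)
    while len(a) < n_terms:
        # a_n depends only on the previous k terms, so the k starting
        # values determine the entire sequence.
        a.append(sum(c * a[-i] for i, c in enumerate(coeffs, start=1)))
    return a

# Fibonacci: c_1 = c_2 = 1 with initial conditions a_0 = 0, a_1 = 1.
print(linear_recurrence([1, 1], [0, 1], 10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```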
We care a lot about linear recurrences since we have a great way of solving them.
Characteristic Polynomial
Let's take our general linear recurrence from before.
$$a_n=c_1a_{n-1}+c_2a_{n-2}+\ldots+c_ka_{n-k}$$For those of you who know linear algebra, you'll know that the dimension of the space of solutions to this recurrence is at most \(k\) (as \(k\) initial conditions determine the sequence). So, we simply need to find \(k\) independent solutions to this recurrence, and then all other solutions can be written in terms of these \(k\) base solutions.
But how do we find the \(k\) base solutions? We guess that the sequence is a geometric sequence. In other words, we guess that \(a_n=x^n\) for some \(x\neq0\). Plugging this guess in, we get $$x^n-c_1x^{n-1}-c_2x^{n-2}-\ldots-c_kx^{n-k}=0$$Since \(x\neq0\), we can divide both sides by \(x^{n-k}\) to get $$x^k-c_1x^{k-1}-\ldots-c_k=0$$
This is a degree \(k\) polynomial! We call this polynomial the characteristic polynomial of the linear recurrence. Thus, there are \(k\) roots (counted with multiplicity), which we call \(x_1,\ldots,x_k\). And, \(a_n=x_i^n\) is a valid solution to our recurrence for each \(1\leq i\leq k\). Assuming these roots are distinct, every solution is of the form $$a_n=b_1x_1^n+b_2x_2^n+\ldots+b_kx_k^n$$for constants \(b_1,\ldots,b_k\).
We call the equation above the general form of the solution to the linear recurrence. To find the specific values of \(b_1,\ldots,b_k\) for our recurrence, we plug in our initial conditions, and we get a system of \(k\) equations in \(k\) variables, which we can solve. Let's do an example to show how this is done.
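As a quick sketch of the method using the Fibonacci numbers from earlier: the characteristic polynomial of \(f_{n+1}=f_n+f_{n-1}\) is \(x^2-x-1\), whose roots are \(x_{1,2}=\frac{1\pm\sqrt5}2\). The general form is \(f_n=b_1x_1^n+b_2x_2^n\), and the initial conditions \(f_0=0\), \(f_1=1\) give the system \(b_1+b_2=0\), \(b_1x_1+b_2x_2=1\), whose solution is \(b_1=\frac1{\sqrt5}\), \(b_2=-\frac1{\sqrt5}\). Thus $$f_n=\frac1{\sqrt5}\left(\frac{1+\sqrt5}2\right)^n-\frac1{\sqrt5}\left(\frac{1-\sqrt5}2\right)^n$$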
So that gives us a way to solve for any solution to a linear recurrence, right? Well, not so fast. What if the characteristic polynomial has repeated roots? In other words, what if some root appears more than once?
In that case, we have the following theorem: if \(x_i\) is a root of the characteristic polynomial with multiplicity \(m\), then \(x_i^n,\ nx_i^n,\ \ldots,\ n^{m-1}x_i^n\) are all solutions to the recurrence.
This solves our problem, since it guarantees that we will have \(k\) solutions to our recurrence. Here is an example of a recurrence with a repeated root.
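For instance, the recurrence \(a_n=4a_{n-1}-4a_{n-2}\) has characteristic polynomial \(x^2-4x+4=(x-2)^2\), a double root at \(2\), so the theorem gives the general form \(a_n=(b_1+b_2n)2^n\). Here is a quick numerical check of that form (a minimal sketch):

```python
# Check that a_n = (b1 + b2*n) * 2**n satisfies a_n = 4*a_{n-1} - 4*a_{n-2}
# for an arbitrary choice of the constants b1, b2.
def satisfies_recurrence(b1, b2, n_terms=12):
    a = [(b1 + b2 * n) * 2**n for n in range(n_terms)]
    return all(a[n] == 4 * a[n - 1] - 4 * a[n - 2] for n in range(2, n_terms))

print(satisfies_recurrence(b1=3, b2=-1))  # True for any b1, b2
```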
Non-linear Recurrences
What do we do if the recurrence is not linear? We expand out a few terms and hope we can see a pattern. Let's do an example.
Let's try expanding the recurrence a bit to see if a pattern emerges.$$a_n=2a_{n-1}+1=2(2a_{n-2}+1)+1=4a_{n-2}+2+1$$$$=8a_{n-3}+4+2+1$$Can you see a pattern yet? Using the pattern (and the initial condition \(a_0=0\)) to fully expand the recurrence,$$a_n=2^na_0+2^{n-1}+\ldots+1=\sum\limits_{i=0}^{n-1}2^i=2^n-1$$
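A quick numerical check of the closed form (a minimal sketch, using the initial condition \(a_0=0\) from the expansion above):

```python
# Verify that iterating a_n = 2*a_{n-1} + 1 from a_0 = 0 matches a_n = 2**n - 1.
a = 0
for n in range(1, 11):
    a = 2 * a + 1
    assert a == 2**n - 1
print("closed form a_n = 2**n - 1 verified for n = 1..10")
```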
There are no general rules for solving all recurrences, so if you see a non-linear recurrence on a contest, you can be sure that there's some nice pattern waiting for you if you expand it.
Induction
The principle of mathematical induction, otherwise known as just induction or PMI, is confusing for how simple it is.
Before explaining what PMI is, let's do an example.
How does the above proof work? It works as follows:
- The base case tells us \(S_1=1^2\).
- Then, since \(2-1=1\), the inductive case tells us that \(S_2=2^2\).
- Since \(3-1=2\), the inductive case tells us that \(S_3=3^2\).
Do you see the pattern here? The idea behind induction is like that of dominoes. If I line them up really nicely so that each domino will knock over the next (the inductive case), then we only need to knock over the first domino (the base case) for all of the dominoes to fall.
Induction allows us to prove a lot of interesting results. Here are a few more examples of useful induction proofs to give you a flavor of what they look like.
- Base Case: If \(n=0\), both sides of the equation are \(0\), and thus equal.
- Inductive Case: Suppose \((1+2+\ldots+n-1)^2=1^3+2^3+\ldots+(n-1)^3\). Then, $$(1+2+\ldots+n)^2=(1+2+\ldots+n-1)^2+2n(1+2+\ldots+n-1)+n^2$$$$=(1^3+\ldots+(n-1)^3)+2n\cdot\frac{n(n-1)}2+n^2$$$$=(1^3+\ldots+(n-1)^3)+n^3-n^2+n^2=1^3+\ldots+n^3$$
- Base Case: When \(n=0\), \(x^{2n+1}+y^{2n+1}=x+y\), so \(x+y\) obviously has \(x+y\) as a factor.
- Inductive Case: Suppose \(x^{2n-1}+y^{2n-1}\) has \(x+y\) as a factor. Then, by modular arithmetic, $$x^{2n-1}+y^{2n-1}\equiv0\bmod{x+y}$$By mod properties, $$x^{2n-1}\equiv-y^{2n-1}\bmod{x+y}$$Since \(x^2-y^2=(x+y)(x-y)\), \(x^2-y^2\equiv0\bmod{x+y}\), so by mod properties, $$x^{2n-1}\cdot x^2\equiv-y^{2n-1}\cdot y^2\bmod{x+y}$$$$x^{2n+1}+y^{2n+1}\equiv0\bmod{x+y}$$
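Both results above are easy to spot-check numerically; here is a minimal sketch:

```python
# Check (1 + 2 + ... + n)^2 = 1^3 + 2^3 + ... + n^3 for small n.
assert all(sum(range(1, n + 1)) ** 2 == sum(i**3 for i in range(1, n + 1))
           for n in range(0, 20))

# Check that x + y divides x^(2n+1) + y^(2n+1) for small positive integers.
assert all((x**(2 * n + 1) + y**(2 * n + 1)) % (x + y) == 0
           for x in range(1, 6) for y in range(1, 6) for n in range(0, 6))

print("both identities hold on the tested ranges")
```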
There are many variants of induction. The most useful variant is called Strong Induction. In this variant, we prove the following step for our property \(P\).
- Inductive Case: We show that for any natural number \(n\), if for all natural numbers \(m\) such that \(m\lt n\), property \(P\) is true, then property \(P\) is true for \(n\) as well.
Interestingly, for strong induction, you don't need a base case (why?). It is called strong since, instead of assuming that the property is true only for the natural number immediately below \(n\), we assume it is true for all natural numbers below \(n\). This can be useful in some induction proofs, like the following.
- Inductive Case: Given \(n\gt1\), suppose that every natural number \(m\) with \(1\lt m\lt n\) can be written as a product of primes. There are two possibilities.
- There exists a natural number \(a\) such that \(1\lt a\lt n\) and \(a\) is a factor of \(n\). In this case, \(b=\frac na\) is also a natural number. Since \(a\gt1\), \(b\lt n\), and since \(a\lt n\), \(b\gt1\). So, both \(a\) and \(b\) can be written as a product of primes. So \(n\) can be written as the product of \(a\)'s primes times the product of \(b\)'s primes.
- If Case 1 is not true, then by the definition of prime numbers, \(n\) is prime, and thus is a product of prime numbers.
This proof would have been near impossible to do with just vanilla induction, but strong induction is good enough to prove the result. Here are some exercises to test your understanding.
There is another variation of induction known as infinite descent. In essence, infinite descent is the principle that there is no infinite sequence of decreasing natural numbers. The first known use of this property is in the proof of the irrationality of \(\sqrt2\).
Now for some more interesting applications.
Let \(S_1=a_1+\ldots+a_{2n+1}\). Since \(a_1=0\), by the property that the remaining elements can be split into sets of equal sums, \(S_1\) is even. And, since removing any element leaves an even sum \(S_1-a_i\), every \(a_i=S_1-(S_1-a_i)\) is even. So, let \(b_i=\frac{a_i}2\), which are non-negative integers. Clearly, the given property holds with \(S_2=b_1+b_2+\ldots+b_{2n+1}\). Repeating this process ad infinitum gives us an infinite strictly decreasing sequence of non-negative integers \(a_{2n+1}\gt b_{2n+1}\gt\ldots\) whenever \(a_{2n+1}\gt0\), which is a contradiction. Thus, \(a_{2n+1}=0\), which implies all of the \(a_i\) are zero.
Pigeonhole Principle
The pigeonhole principle is another example of an idea that's very simple, but can be used in very interesting ways.
Imagine that there are 10 pigeons, and 9 pigeonholes (areas for pigeons to roam). If each pigeon needs to go into a hole, then clearly at least one hole has more than one pigeon.
This argument generalizes as follows. If I have \(n\) pigeons and \(k\) holes, then there is a hole with at least \(\left\lceil\frac nk\right\rceil\) pigeons (\(\lceil\,\rceil\) is the ceiling function, which returns the smallest integer greater than or equal to its input). Let's take a look at some examples of this principle in use.
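Here is a tiny simulation of the generalized principle (a minimal sketch): however the pigeons are assigned, the fullest hole always meets the guarantee.

```python
from collections import Counter
from math import ceil
import random

# However 10 pigeons are assigned to 9 holes, some hole receives at least
# ceil(10/9) = 2 pigeons.
pigeons, holes = 10, 9
assignment = [random.randrange(holes) for _ in range(pigeons)]
fullest = max(Counter(assignment).values())
assert fullest >= ceil(pigeons / holes)
print(f"fullest hole: {fullest} pigeons (guaranteed at least {ceil(pigeons / holes)})")
```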
We first used the principle in Week 8, in the proof of Bezout's Lemma. For those of you who didn't know this principle before this week, I'd suggest revisiting those notes to understand that proof. Here's a more fun application of the principle:
Weird, huh? Let's look at some non-trivial applications.
Challenge: Functional Equations
In algebra, we are usually presented with a function first, and then we study its properties. However, in a functional equation, we are given the property first, and then we must deduce all functions that have the given property. For example, the problem might be to find all functions \(f :\mathbb R \to\mathbb R\) such that$$f(f(x)-f(f(y)))=x-y$$for all \(x,y\in\mathbb R\), where \(\mathbb R\) is the symbol for the real numbers.
Before we start studying such equations, let's clarify some basic notation.
The notation \(f:A\to B\) means that \(f\) is a function with domain \(A\) and codomain \(B\). The domain of a function is the set of inputs to the function, and the codomain is the set in which its outputs lie.
Functional equations in general are very difficult to solve. However, if you see them on a contest, you can bet that there is some nice trick that will make them solvable. We'll review some of the more common tricks.
Substitution
With many functional equations, the best place to start is to substitute in values for the variables. \(0\) and \(1\) are often good candidates, but any substitution which leads to a simplified equation is helpful.
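For instance, in the equation \(f(f(x)-f(f(y)))=x-y\) from the introduction to this section, substituting \(y=x\) immediately gives $$f(f(x)-f(f(x)))=0$$for every \(x\), so \(0\) is attained as an output of \(f\); small facts like this are often the first foothold in a solution.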
Separating Variables
If substitution isn't getting you anywhere, separating variables can be very useful. We saw this technique in the previous exercise, where we isolated \(y\) on one side of the equation, which allowed us to find the form of the solution. We can also apply this technique with multiple variables.
There doesn't seem to be an easy way to see if there are other solutions. So, let's try to simplify our work by substituting in a new function. Let \(g(x)=f(x)-f(0)\). Note that since \(f(x)=g(x)+f(0)\), \(g(x)\) has the same domain and codomain as \(f(x)\). And, if \(f(x)\) is a solution to the functional equation, $$x(f(x)-f(0))-y(f(y)-f(0))=(x-y)(f(x+y)-f(0))$$is an equality, so \(g(x)\) is also a solution to the functional equation. Why is this a useful substitution? Because \(g(0)=0\). Letting \(y=-x\), we get $$x(g(x)+g(-x))=0$$Thus, \(-g(x)=g(-x)\) for all \(x\neq0\), and the same holds at \(x=0\) since \(g(0)=0\). We can use this fact by letting \(x=u\), \(y=-v\) in the functional equation for \(g\) to get $$ug(u)-vg(v)=(u+v)g(u-v)$$Letting \(x=u\), \(y=v\) gives us $$ug(u)-vg(v)=(u-v)g(u+v)$$Since \(u\) and \(v\) can be any real numbers, so can \(u+v\) and \(u-v\). So, letting \(a=u+v\) and \(b=u-v\), and equating the two right-hand sides, we have the equation $$ag(b)=bg(a)$$In particular, with \(b=1\), \(g(a)=ag(1)\). This proves that \(f(x)=(f(1)-f(0))x+f(0)\), i.e. every solution is of the form \(f(x)=cx+d\), which shows that we have found all solutions.
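As a sanity check (a sketch; the problem statement isn't reproduced here, but the substituted equality above suggests the equation being solved is \(xf(x)-yf(y)=(x-y)f(x+y)\)), every affine function does satisfy it:

```python
# Numerically verify that f(x) = c*x + d satisfies x*f(x) - y*f(y) = (x-y)*f(x+y)
# on random inputs, for an arbitrary choice of c and d.
import random

def f(x, c=2.5, d=-7.0):
    return c * x + d

for _ in range(1000):
    x, y = random.uniform(-10, 10), random.uniform(-10, 10)
    assert abs(x * f(x) - y * f(y) - (x - y) * f(x + y)) < 1e-9
print("f(x) = c*x + d passes the check on random inputs")
```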
Iterated Functions
When a function is applied repeatedly to an input, we call the resulting function iterated. For example, we can write$$g^3(x)=g(g(g(x)))=g\circ g\circ g(x)$$We often prefer the last notation when there is a lot of iteration, as the nested parentheses become distracting and the exponent might be confused with repeated multiplication.
With iterated functions, there are two things to look out for. The first are cyclic functions. A function \(g\) is called cyclic of order \(n\) if the following equation holds for all \(x\) in the domain of \(g\):$$\underbrace{g\circ g\circ\ldots\circ g}_{n\text{ times}}(x)=x$$Hidden cyclic functions are often the key to solving iterated functional equations.
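For example (a small illustration, not tied to a particular problem above), \(g(x)=\frac1{1-x}\) is cyclic of order \(3\) wherever the compositions are defined: $$g(g(x))=\frac1{1-\frac1{1-x}}=\frac{x-1}x,\qquad g(g(g(x)))=\frac1{1-\frac{x-1}x}=x$$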
The second concept to look out for is commutativity. If \(f^n(x)=g(x)\) for all \(x\) for some \(n\gt0\), then $$f^{n+1}(x)=f(g(x))=g(f(x))$$for all \(x\).
Fixed Points
Another recurring concept in functional equations is that of fixed points. Given a function \(f:A\to B\), \(x\in A\cap B\) is said to be a fixed point of \(f\) if \(f(x)=x\). Recognizing functions with fixed points can quickly simplify functional equations.
For any \(y\in S\), let \(x=1\) in the given functional equation, which implies $$f(f(y))=yf(1)$$Since \(y\) is a fixed point, \(f(f(y))=f(y)=y\), so \(y=yf(1)\). Since \(y\gt0\), we have \(f(1)=\frac yy=1\), and thus \(1\) is a fixed point of \(f\). Are there any others?
There aren't, due to the second condition. Let \(y\in S\), \(x=\frac1y\) in the functional equation, which implies $$\begin{aligned}f\left(\frac1yf(y)\right)&=yf\left(\frac 1y\right)\\1&=yf\left(\frac 1y\right)\end{aligned}$$This implies \(\frac1y\in S\). Moreover, for any \(a,b\in S\), $$\begin{aligned}f(af(b))&=bf(a)\\f(ab)&=ab\end{aligned}$$Thus, \(ab\in S\). So, if some \(y\in S\) satisfies \(y\gt1\), then \(y^n\in S\) for any natural \(n\), which contradicts the limit approaching \(0\) (since \(y^n\) grows to infinity). On the other hand, if some \(y\in S\) satisfies \(y\lt1\), then \(\frac1y\in S\) with \(\frac1y\gt1\), which is a contradiction by the argument earlier. Thus, \(S=\{1\}\). More importantly, this tells us that \(xf(x)=1\) for all \(x\): setting \(y=x\) in the functional equation gives \(f(xf(x))=xf(x)\), so \(xf(x)\) is a fixed point for every \(x\). Thus, \(f(x)=\frac1x\) is the only possible solution. Plugging it into the functional equation shows that it is in fact a solution.
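A quick numerical confirmation (a sketch; the full problem statement isn't restated above, but the manipulations use the relation \(f(xf(y))=yf(x)\) on the positive reals):

```python
# Check that f(x) = 1/x satisfies f(x*f(y)) = y*f(x) for random positive reals;
# note also that 1/x -> 0 as x -> infinity, matching the limit condition.
import random

f = lambda x: 1 / x
for _ in range(1000):
    x, y = random.uniform(0.1, 100.0), random.uniform(0.1, 100.0)
    assert abs(f(x * f(y)) - y * f(x)) < 1e-9
print("f(x) = 1/x passes the check on random positive inputs")
```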
Injective, Surjective, Monotonic
Often, finding solutions to functional equations comes down to recognizing that they are either injective, surjective, monotonic, or some combination of these.
We discussed injective mappings in the Challenge Section in Week 10, but the idea can be applied to functional equations as well. To summarize, an injective function is one in which distinct inputs always map to distinct outputs (no two inputs map to the same output).
Surjective functions are ones in which the range and codomain are the same. In other words, for every possible output, there exists an input that maps to that output. As an example, the function \(f:\mathbb R\to\mathbb R\) which is defined as \(f(x)=x^2\) is not surjective since there is no \(x\in\mathbb R\) that maps to \(-1\).
Monotonic functions either always increase or always decrease. With these three types of functions, we can learn a lot about the space of possible solutions to a functional equation.
Suppose \(a,b\in\mathbb R\) such that \(f(a)=f(b)\). We'd like to show that \(a=b\) to prove \(f\) is injective. Let \(y=a\) and \(y=b\) in the given functional equation to get $$\begin{aligned}af(x)&=f(xf(a))+f(f(x)+f(a))-f(x+f(a))\\bf(x)&=f(xf(b))+f(f(x)+f(b))-f(x+f(b))\end{aligned}$$Since \(f(a)=f(b)\), the right hand sides of both equations are equal, so \(af(x)=bf(x)\). Thus, either \(f(x)=0\) for all \(x\), or \(a=b\), implying injectivity.
It's not hard to see that \(f(x)=0\) is a solution to the functional equation. If \(f\) is not identically zero, it is injective. To use this, ideally we want to have an equation of the form \(f(\text{something})=f(\text{other thing})\), so that we can conclude something and other thing are equal. By playing around with values, we get this sort of equation by setting \(x=0\) and \(y=1\) to get $$f(f(0)+f(1))=f(f(1))$$With this and injectivity, we know that \(f(0)+f(1)=f(1)\), and thus \(f(0)=0\). Now, plug \(y=0\) into the functional equation to get \(f(f(x))=f(x)\). Since \(f\) is injective, this implies \(f(x)=x\). We can check that this also is a solution. Thus, we have two solutions: \(f(x)=0\) and \(f(x)=x\).
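As a final sanity check (a sketch; the equation isn't restated above, but the substitutions shown are consistent with \(yf(x)=f(xf(y))+f(f(x)+f(y))-f(x+f(y))\)), both candidate solutions pass a numerical test:

```python
# Numerically check a candidate solution against
# y*f(x) = f(x*f(y)) + f(f(x) + f(y)) - f(x + f(y)).
import random

def satisfies(f, trials=1000):
    for _ in range(trials):
        x, y = random.uniform(-10, 10), random.uniform(-10, 10)
        lhs = y * f(x)
        rhs = f(x * f(y)) + f(f(x) + f(y)) - f(x + f(y))
        if abs(lhs - rhs) > 1e-9:
            return False
    return True

print(satisfies(lambda x: 0.0), satisfies(lambda x: x))  # True True
```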
Many times, as in the problem above, we aren't able to prove injectivity of the function in the functional equation outright. But the process of trying to do so helps progress our solution.
For any \(b\in\mathbb R\), we want to find some \(a\in\mathbb R\) such that \(f(a)=b\). Let \(y=-f(x)\) in the functional equation to get $$f(0)-2x=f(f(-f(x))-x)$$As gross as this looks, it proves surjectivity, since for any \(b\in\mathbb R\), letting \(x=\frac{f(0)-b}2\) and \(a=f(-f(x))-x\) shows \(f(a)=b\).
Now let's use surjectivity. As a first idea, perhaps the fact that there exists some \(c\in\mathbb R\) such that \(f(c)=0\) is useful. Letting \(x=c\) gives us $$f(y)=2c+f(f(y)-c)$$Since \(f\) is surjective, \(f(y)\) can be any real value, so we can substitute \(x\) for \(f(y)\) to get $$x=2c+f(x-c)$$So, \(f(x)=x-c\) is the only class of possible solutions. We can plug this back into the functional equation to see that it works for all constants \(c\).
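To double-check (a sketch; the two substitutions above are consistent with the equation \(f(y+f(x))=2x+f(f(y)-x)\), though the original statement isn't shown here), the family \(f(x)=x-c\) passes a numerical test for any constant \(c\):

```python
# Numerically verify f(x) = x - c against f(y + f(x)) = 2*x + f(f(y) - x).
import random

def satisfies(c, trials=1000):
    f = lambda t: t - c
    for _ in range(trials):
        x, y = random.uniform(-10, 10), random.uniform(-10, 10)
        if abs(f(y + f(x)) - (2 * x + f(f(y) - x))) > 1e-9:
            return False
    return True

print(all(satisfies(c) for c in (-3.0, 0.0, 2.5)))  # True
```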
Surjectivity is not all that common to see in functional equation problems, but a tell-tale sign to use it is when you can write the functional equation in the form \(f(\text{mess})=\text{something}\), and the "something" can be any value of the codomain.
Base Case: \(m=1\): Since \(f(2)=2\) and the function is monotonic, \(f(1)=1\).
Inductive Case: Assume the claim is true for \(m\). Then, \(f(2^m)=2^m\), and thus \(f(2^{m+1})=f(2\cdot2^m)=2^{m+1}\). And, we have $$2^m=f(2^m)\lt f(2^m+1)\lt f(2^m+2)\lt\ldots\lt f(2^{m+1}-1)\lt f(2^{m+1})=2^{m+1}$$We have to fit \(2^{m+1}-2^m-1\) values strictly in between \(2^m\) and \(2^{m+1}\), which is only possible if \(f(n)=n\) for all \(2^m\lt n\lt 2^{m+1}\). Thus, \(f(n)=n\) is the only possible solution, which can be checked to be a solution.
The approach used in this problem is known as squeezing, and is very common in problems with monotonic conditions. Squeezing only works when the codomain is discrete (like the integers, where there is no room for extra values between consecutive ones).
Traps!
Now that we've covered plenty of approaches to solving functional equations, let's go over some common traps and mistakes to avoid when solving them.
The most common one is unexpected solutions. Often, when solving functional equations, we have to make a guess as to the form of the functions that satisfy them. While this may lead to some solutions, trying to prove that these solutions are the only ones can be arduous. If you are struggling, consider the possibility that there are solutions you haven't accounted for.
As silly as the above example is, unexpected solutions show up all of the time. Just because you can't think of a solution, doesn't mean it is not there. Here's another example.
The answer is no. We've ignored the possibility that \(f(2a)=0\), and the possibility that \(f(2a)=4f(a)\). The correct solutions are the following functions, for any integer \(a\):
- \(f(n)=an^2\)
- \(f(n)=0\) for even \(n\) and \(f(n)=a\) for odd \(n\)
- \(f(n)=0\) for \(n\equiv0\bmod4\), \(f(n)=a\) for odd \(n\), \(f(n)=4a\) for \(n\equiv2\bmod4\)
The final trap is nuanced and difficult to explain, and thus won't appear in most competitions. However, contest writers have included it many times over the years, so I'd feel wrong to not include it.
In the partial solution above, we find that for every positive real \(t\), either \(f(t)=t\) or \(f(t)=\frac1t\). But that doesn't mean for all \(x\), \(f(x)=x\) or for all \(x\), \(f(x)=\frac1x\). What if \(f(x)=x\) for all \(x\in(0,1)\) and \(f(x)=\frac1x\) for all \(x\geq1\)? That satisfies the first statement, but not the second. This is known as the pointwise value trap.
In fact, the two solutions we found are the only two solutions, but more work needs to be done to show this (can you complete the proof?)
To summarize, functional equations are really hard, and the only way to get used to the tools above is to practice. Try applying the tools in order of how they appeared in this section (first try substituting, then separating variables, etc).