A common task in analysis is to obtain bounds on sums
or integrals
where
is some simple region (such as an interval) in one or more dimensions, and
is an explicit (and
elementary) non-negative expression involving one or more variables (such as
or
), and possibly also some additional parameters. Often, one would be content with an order of magnitude upper bound such as
or
where we use
(or
or
) to denote the bound
for some constant
; sometimes one wishes to also obtain the matching lower bound, thus obtaining
or
where
is synonymous with
. Finally, one may wish to obtain a more precise bound, such as
where
is a quantity that goes to zero as the parameters of the problem go to infinity (or some other limit). (For a deeper dive into asymptotic notation in general, see
this previous blog post.)
Here are some typical examples of such estimation problems, drawn from recent questions on MathOverflow:
Compared to other estimation tasks, such as that of controlling oscillatory integrals, exponential sums, singular integrals, or expressions involving one or more unknown functions (that are only known to lie in some function spaces, such as an space), high-dimensional geometry (or alternatively, large numbers of random variables), or number-theoretic structures (such as the primes), estimation of sums or integrals of non-negative elementary expressions is a relatively straightforward task, and can be accomplished by a variety of methods. The art of obtaining such estimates is typically not explicitly taught in textbooks, other than through some examples and exercises; it is typically picked up by analysts (or those working in adjacent areas, such as PDE, combinatorics, or theoretical computer science) as graduate students, while they work through their thesis or their first few papers in the subject.
Somewhat in the spirit of this previous post on analysis problem solving strategies, I am going to try here to collect some general principles and techniques that I have found useful for these sorts of problems. As with the previous post, I hope this will be something of a living document, and encourage others to add their own tips or suggestions in the comments.
— 1. Asymptotic arithmetic —
Asymptotic notation is designed so that many of the usual rules of algebra and inequality manipulation continue to hold, with the caveat that one has to be careful if subtraction or division is involved. For instance, if one knows that and , then one can immediately conclude that and , even if are negative (note that the notation or automatically forces to be non-negative). Equivalently, we have the rules
and more generally we have the triangle inequality
Again, we stress that this sort of rule implicitly requires the
to be non-negative, and that claims such as
and
are simply
false. As a rule of thumb, if your calculations have arrived at a situation where a signed or oscillating sum or integral appears
inside the big-O notation, or on the right-hand side of an estimate, without being “protected” by absolute value signs, then you have probably made a serious error in your calculations.
Another rule of inequalities that is inherited by asymptotic notation is that if one has two bounds
for the same quantity
, then one can combine them into the unified asymptotic bound
This is an example of a “free move”: a replacement of bounds that does not lose any of the strength of the original bounds, since of course
(2) implies
(1). In contrast, other ways to combine the two bounds
(1), such as taking the geometric mean
while often convenient, are not “free”: the bounds
(1) imply the averaged bound
(3), but the bound
(3) does not imply
(1). On the other hand, the inequality
(2), while it does not concede any logical strength, can require more calculation to work with, often because one ends up splitting up cases such as
and
in order to simplify the minimum. So in practice, when trying to establish an estimate, one often starts by using conservative bounds such as
(2) in order to maximize one’s chances of getting any proof (no matter how messy) of the desired estimate, and only after such a proof is found, one tries to look for more elegant approaches using less efficient bounds such as
(3).
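Since the bounds above are stated abstractly, here is a small numerical illustration (with arbitrary sample values) of why the minimum bound (2) is "free" while the geometric mean (3) concedes strength: for non-negative bounds A and B on the same quantity, one always has min(A, B) ≤ sqrt(A·B) ≤ max(A, B), and the gap can be enormous when A and B differ greatly in size.

```python
import math

# For non-negative bounds A and B on the same quantity, the unified bound
# min(A, B) is always at least as strong as the geometric mean sqrt(A * B).
samples = [(1.0, 1.0), (1.0, 4.0), (0.01, 100.0), (1e-6, 1e6)]
for A, B in samples:
    assert min(A, B) <= math.sqrt(A * B) <= max(A, B)

# The gap: with A = 1e-6 and B = 1e6, the minimum is 1e-6 but the geometric
# mean is about 1, so passing to (3) concedes roughly a factor of a million.
gap = math.sqrt(1e-6 * 1e6) / min(1e-6, 1e6)
assert gap > 1e5
```

The point is the asymmetry: a bound by the minimum always implies a bound by the geometric mean, but not conversely.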
For instance, suppose one wanted to show that the sum
was convergent. Lower bounding the denominator term
by
or by
, one obtains the bounds
and also
so by applying
(2) we obtain the unified bound
To deal with this bound, we can split into the two contributions
, where
dominates, and
, where
dominates. In the former case we see (from the ratio test, for instance) that the sum
is absolutely convergent, and in the latter case we see that the sum
is also absolutely convergent, so the entire sum is absolutely convergent. But once one has this argument, one can try to streamline it, for instance by taking the geometric mean of
(4),
(5) rather than the minimum to obtain the weaker bound
and now one can conclude without decomposition just by observing the absolute convergence of the doubly infinite sum
. This is a less “efficient” estimate, because one has conceded a lot of the decay in the summand by using
(6) (the summand used to be exponentially decaying in
, but is now only polynomially decaying), but it is still sufficient for the purpose of establishing absolute convergence.
One of the key advantages of dealing with order of magnitude estimates, as opposed to sharp inequalities, is that the arithmetic becomes tropical. More explicitly, we have the important rule
whenever
are non-negative, since we clearly have
In particular, if
, then
. That is to say, given two orders of magnitudes, any term
of equal or lower order to a “main term”
can be discarded. This is a very useful rule to keep in mind when trying to estimate sums or integrals, as it allows one to discard many terms that are not contributing to the final answer. It also interacts well with monotone operations, such as raising to a power
; for instance, we have
if
and
is a fixed positive constant, whilst
if
. Finally, this relation also sets up the fundamental
divide and conquer strategy for estimation: if one wants to prove a bound such as
, it will suffice to obtain a decomposition
or at least an upper bound
of
by some bounded number of components
, and establish the bounds
separately. Typically the
will be (morally at least) smaller than the original quantity
– for instance, if
is a sum of non-negative quantities, each of the
might be a subsum of those same quantities – which means that such a decomposition is a “free move”, in the sense that it does not risk making the problem harder. (This is because, if the original bound
is to be true, each of the new objectives
must also be true, and so the decomposition can only make the problem logically easier, not harder.) The only costs to such decomposition are that your proofs might be
times longer, as you may be repeating the same arguments
times, and that the implied constants in the
bounds may be worse than the implied constant in the original
bound. However, in many cases these costs are well worth the benefits of being able to simplify the problem into smaller pieces. As mentioned above, once one successfully executes a divide and conquer strategy, one can go back and try to reduce the number of decompositions, for instance by unifying components that are treated by similar methods, or by replacing strong but unwieldy estimates with weaker, but more convenient estimates.
The above divide and conquer strategy does not directly apply when one is decomposing into an unbounded number of pieces , . In such cases, one needs an additional gain in the index that is summable in in order to conclude. For instance, if one wants to establish a bound of the form , and one has located a decomposition or upper bound
that looks promising for the problem, then it would suffice to obtain exponentially decaying bounds such as
for all
and some constant
, since this would imply
thanks to the geometric series formula. (Here it is important that the implied constants in the asymptotic notation are uniform in
; a
-dependent bound such as
would be useless for this application, as then the growth of the implied constant in
could overwhelm the exponential decay in the
factor). Exponential decay is in fact overkill; polynomial decay such as
would already be sufficient, although harmonic decay such as
is not quite enough (the sum
diverges logarithmically), although in many such situations one could try to still salvage the bound by working a lot harder to squeeze some additional logarithmic factors out of one’s estimates. For instance, if one can improve
(8) to
for all
and some constant
, then one can still conclude the desired bound, since (by the integral test) the sum
converges (and one can treat the
term separately if one already has
(8)).
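The distinction between summable and non-summable decay in the index j can be illustrated numerically, using the representative decay rates 2^(-j) (geometric), j^(-2) (polynomial), and 1/j (harmonic) discussed above:

```python
import math

J = 10**5  # number of pieces in the decomposition

# Geometric and polynomial decay are summable: the partial sums stay bounded
# uniformly in J, so the decomposition loses only a constant.
geometric = sum(2.0**(-j) for j in range(1, J + 1))
polynomial = sum(j**-2.0 for j in range(1, J + 1))
assert geometric <= 1.0 + 1e-9
assert polynomial <= math.pi**2 / 6 + 1e-9

# Harmonic decay is not summable: the partial sum grows like log J
# (plus the Euler-Mascheroni constant, about 0.5772), a logarithmic loss.
harmonic = sum(1.0 / j for j in range(1, J + 1))
assert abs(harmonic - (math.log(J) + 0.5772)) < 0.01
```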
Sometimes, when trying to prove an estimate such as , one has identified a promising decomposition with an unbounded number of terms
(where
is finite but unbounded) but is unsure of how to proceed next. Often the next thing to do is to study the extreme terms
and
of this decomposition, and first try to establish (the presumably simpler) tasks of showing that
and
. Often once one does so, it becomes clear how to combine the treatments of the two extreme cases to also treat the intermediate cases, obtaining a bound
for each individual term, leading to the inferior bound
; this can then be used as a starting point to hunt for additional gains, such as the exponential or polynomial gains mentioned previously, that could be used to remove this loss of
. (There are more advanced techniques, such as those based on controlling moments such as the square function
, or trying to understand the precise circumstances in which a “large values” scenario
occurs, and how these scenarios interact with each other for different
, but these are beyond the scope of this post, as they are rarely needed when dealing with sums or integrals of elementary functions.)
If one is faced with the task of estimating a doubly infinite sum , it can often be useful to first think about how one would proceed in estimating when is very large and positive, and how one would proceed when is very large and negative. In many cases, one can simply decompose the sum into two pieces such as and and use whatever methods you came up with to handle the two extreme cases; in some cases one also needs a third argument to handle the case when is of bounded (or somewhat bounded) size, in which case one may need to divide into three pieces such as , , and . Sometimes there will be a natural candidate for the places where one is cutting the sum, but in other situations it may be best to just leave these cut points as unspecified parameters initially, obtain bounds that depend on these parameters, and optimize at the end. (Typically, the optimization proceeds by trying to balance the magnitude of a term that is increasing with respect to a parameter, with one that is decreasing. For instance, if one ends up with a bound such as for some parameter and quantities , it makes sense to select to balance the two terms. Or, if faced with something like for some , then something like would be close to the optimal choice of parameter. And so forth.)
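The balancing step in the last parenthetical can be checked numerically. Taking, as a representative stand-in for the bound discussed there, a quantity of the shape A/N + B·N (one term decreasing in the parameter N, one increasing), the balanced choice N = sqrt(A/B) makes the two terms equal, and by the AM-GM inequality it is in fact exactly optimal:

```python
import math

def bound(A, B, N):
    # A typical two-term bound: one term decreasing in N, one increasing.
    return A / N + B * N

A, B = 1000.0, 0.1
N_star = math.sqrt(A / B)  # balancing choice: the two terms are equal here

# At the balance point both terms equal sqrt(A*B), giving 2*sqrt(A*B).
assert abs(bound(A, B, N_star) - 2 * math.sqrt(A * B)) < 1e-9

# No other choice of N does better (AM-GM: A/N + B*N >= 2*sqrt(A*B)).
for N in [1.0, 10.0, 50.0, 200.0, 10**4]:
    assert bound(A, B, N) >= bound(A, B, N_star) - 1e-9
```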
— 1.1. Psychological distinctions between exact and asymptotic arithmetic —
The adoption of the “divide and conquer” strategy requires a certain mental shift from the “simplify, simplify” strategy that one is taught in high school algebra. In the latter strategy, one tries to collect terms in an expression to make them as short as possible, for instance by working with a common denominator, with the idea that unified and elegant-looking expressions are “simpler” than sprawling expressions with many terms. In contrast, the divide and conquer strategy is intentionally extremely willing to greatly increase the total length of the expressions to be estimated, so long as each individual component of the expressions appears easier to estimate than the original one. Both strategies are still trying to reduce the original problem to a simpler problem (or collection of simpler sub-problems), but the metric by which one judges whether the problem has become simpler is rather different.
A related mental shift that one needs to adopt in analysis is to move away from the exact identities that are so prized in algebra (and in undergraduate calculus), as the precision they offer is often unnecessary and distracting for the task at hand; moreover, such identities often fail to generalize to more complicated contexts in which exact formulae are no longer available. As a simple example, consider the task of estimating the expression
where
is a parameter. With a trigonometric substitution, one can evaluate this expression exactly as
; however, the presence of the arctangent can be inconvenient if one has to do further estimation tasks (for instance, if
depends in a complicated fashion on other parameters, which one then also wants to sum or integrate over). Instead, by observing the trivial bounds
and
one can combine them using
(2) to obtain the upper bound
and similar arguments also give the matching lower bound, thus
This bound, while cruder than the exact answer of
, is often good enough for many applications (particularly in situations where one is willing to concede constants in the bounds), and can be more tractable to work with than the exact answer. Furthermore, these arguments can be adapted without difficulty to treat similar expressions such as
for any fixed exponent
, which need not have closed form exact expressions in terms of elementary functions such as the arctangent when
is non-integer.
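For a concrete check, take the expression in question to be int_0^N dt/(1+t^2) = arctan(N) (a representative example of the type discussed, evaluating to an arctangent); the claimed comparability with min(N, 1), up to absolute constants, can then be verified numerically:

```python
import math

# Representative example: int_0^N dt/(1+t^2) = arctan(N), which should be
# comparable to the crude bound min(N, 1) uniformly in N > 0.
for N in [0.001, 0.1, 1.0, 10.0, 1000.0]:
    exact = math.atan(N)
    crude = min(N, 1.0)
    ratio = exact / crude
    # The ratio stays between pi/4 and pi/2 over all N > 0, so the two
    # quantities agree up to absolute constants.
    assert 0.5 <= ratio <= math.pi / 2 + 1e-9
```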
As a general rule, instead of relying exclusively on exact formulae, one should seek approximations that are valid up to the degree of precision that one seeks in the final estimate. For instance, suppose one wishes to establish the bound
for all sufficiently small
. If one was clinging to the exact identity mindset, one could try to look for some trigonometric identity to simplify the left-hand side exactly, but the quicker (and more robust) way to proceed is just to use Taylor expansion up to the specified accuracy
to obtain
which one can invert using the geometric series formula
to obtain
from which the claim follows. (One could also have computed the Taylor expansion of
by repeatedly differentiating the secant function, but as this is a series that is usually not memorized, this can take a little bit more time than just computing it directly to the required accuracy as indicated above.) Note that the notion of “specified accuracy” may have to be interpreted in a relative sense if one is planning to multiply or divide several estimates together. For instance, if one wishes to establish the bound
for small
, one needs an approximation
to the sine function that is accurate to order
, but one only needs an approximation
to the cosine function that is accurate to order
, because the cosine is to be multiplied by
. Here the key is to obtain estimates that have a
relative error of
, compared to the main term (which is
for cosine, and
for sine).
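Assuming the expansion in question is sec(x) = 1 + x^2/2 + O(x^4) (as obtained from cos(x) = 1 - x^2/2 + O(x^4) and the geometric series), one can confirm numerically that the error is indeed of fourth order:

```python
import math

# sec(x) = 1/cos(x) should equal 1 + x^2/2 up to an O(x^4) error for small x.
for x in [0.1, 0.05, 0.01]:
    err = abs(1.0 / math.cos(x) - (1.0 + x**2 / 2))
    # The actual fourth-order coefficient of sec is 5/24, so the error
    # behaves like (5/24) * x^4; in particular it is at most x^4 here.
    assert err <= x**4
```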
The following table lists some common approximations that can be used to simplify expressions when one is only interested in order of magnitude bounds (with an arbitrarily small constant):
The quantity… | has magnitude comparable to … | provided that… |
| | or |
| | |
, , | | |
| | |
| | real |
| | real |
| | |
| | |
, , | | |
| | |
, | | |
| | real |
| | , |
| | |
| | , |
| | , |
| | , |
| | |
| | |
On the other hand, some exact formulae are still very useful, particularly if the end result of that formula is clean and tractable to work with (as opposed to involving somewhat exotic functions such as the arctangent). The geometric series formula, for instance, is an extremely handy exact formula, so much so that it is often desirable to control summands by a geometric series purely to use this formula (we already saw an example of this in (7)). Exact integral identities, such as
or more generally
for
(where
is the
Gamma function) are also quite commonly used, and fundamental exact integration rules such as the change of variables formula, the Fubini–Tonelli theorem or integration by parts are all essential tools for an analyst trying to prove estimates. Because of this, it is often desirable to estimate a sum by an integral. The
integral test is a classic example of this principle in action: a more quantitative version of this test is the bound
whenever
are integers and
is monotone decreasing, or the closely related bound
whenever
are reals and
is monotone (either increasing or decreasing); see Lemma 2 of
this previous post. Such bounds allow one to switch back and forth quite easily between sums and integrals as long as the summand or integrand behaves in a mostly monotone fashion (for instance, if it is monotone increasing on one portion of the domain and monotone decreasing on the other). For more precision, one could turn to more advanced relationships between sums and integrals, such as the
Euler-Maclaurin formula or the
Poisson summation formula, but these are beyond the scope of this post.
Exercise 1 Suppose obeys the quasi-monotonicity property whenever . Show that for any integers .
Exercise 2 Use (11) to obtain the “cheap Stirling approximation”
for any natural number . (Hint: take logarithms to convert the product into a sum.)
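A numerical sanity check of the cheap Stirling approximation — here assumed to take the shape n! = exp(n log n - n + O(log n)) — is easy to perform (this of course does not substitute for solving the exercise via (11)):

```python
import math

# Check that log(n!) - (n log n - n) stays of size O(log n).
for n in [10, 100, 1000, 10000]:
    discrepancy = math.lgamma(n + 1) - (n * math.log(n) - n)
    # By the full Stirling formula the discrepancy is ~ (1/2) log(2 pi n),
    # which is indeed O(log n):
    assert 0 < discrepancy <= 2 * math.log(n)
```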
With practice, you will be able to identify any term in a computation which is already “negligible” or “acceptable” in the sense that its contribution is always going to lead to an error that is smaller than the desired accuracy of the final estimate. One can then work “modulo” these negligible terms and discard them as soon as they appear. This can help remove a lot of clutter in one’s arguments. For instance, if one wishes to establish an asymptotic of the form
for some main term
and lower order error
, any component of
that one can already identify to be of size
is negligible and can be removed from
“for free”. Conversely, it can be useful to
add negligible terms to an expression, if it makes the expression easier to work with. For instance, suppose one wants to estimate the expression
This is a partial sum for the zeta function
so it can make sense to add and subtract the tail
to the expression
(12) to rewrite it as
To deal with the tail, we switch from a sum to the integral using
(10) to bound
giving us the reasonably accurate bound
One can sharpen this approximation somewhat using
(11) or the Euler–Maclaurin formula; we leave this to the interested reader.
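Taking, for concreteness, the partial sum in (12) to be sum_{n <= N} 1/n^2 (so that the relevant zeta value is zeta(2) = pi^2/6), the integral-test bound on the tail — sum_{n > N} 1/n^2 <= int_N^infty dx/x^2 = 1/N — can be confirmed numerically:

```python
import math

# Partial sums of 1/n^2 approach zeta(2) = pi^2/6, with the tail controlled
# by the integral test: sum_{n > N} 1/n^2 <= int_N^infty dx/x^2 = 1/N.
zeta2 = math.pi**2 / 6
for N in [10, 100, 1000]:
    partial = sum(1.0 / n**2 for n in range(1, N + 1))
    tail = zeta2 - partial
    assert 0 < tail <= 1.0 / N
```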
Another psychological shift when switching from algebraic simplification problems to estimation problems is that one has to be prepared to let go of constraints in an expression that complicate the analysis. Suppose for instance we now wish to estimate the variant
of
(12), where we are now restricting
to be
square-free. An identity from analytic number theory (the
Euler product identity) lets us calculate the exact sum
so as before we can write the desired expression as
Previously, we applied the integral test
(10), but this time we cannot do so, because the restriction to square-free integers destroys the monotonicity. But we can simply remove this restriction:
Heuristically at least, this move only “costs us a constant”, since a positive fraction (
, in fact) of all integers are square-free. Now that this constraint has been removed, we can use the integral test as before and obtain the reasonably accurate asymptotic
— 2. More on decomposition —
The way in which one decomposes a sum or integral such as or is often guided by the “geometry” of , and in particular where is large or small (or whether various component terms in are large or small relative to each other). For instance, if comes close to a maximum at some point , then it may make sense to decompose based on the distance to , or perhaps to treat the cases and separately. (Note that does not literally have to be the maximum in order for this to be a reasonable decomposition; if it is in “within reasonable distance” of the maximum, this could still be a good move. As such, it is often not worthwhile to try to compute the maximum of exactly, especially if this exact formula ends up being too complicated to be useful.)
If an expression involves a distance between two quantities , it is sometimes useful to split into the case where is much smaller than (so that ), the case where is much smaller than (so that ), or the case when neither of the two previous cases apply (so that ). The factors of here are not of critical importance; the point is that in each of these three cases, one has some hope of simplifying the expression into something more tractable. For instance, suppose one wants to estimate the expression
in terms of the two real parameters
, which we will take to be distinct for the sake of this discussion. This particular integral is simple enough that it can be evaluated exactly (for instance using contour integration techniques), but in the spirit of Principle 1, let us avoid doing so and instead try to decompose this expression into simpler pieces. A graph of the integrand reveals that it peaks when
is near
or near
. Inspired by this, one can decompose the region of integration into three pieces:
- (i) The region where .
- (ii) The region where .
- (iii) The region where .
(This is not the only way to cut up the integral, but it will suffice. Often there is no “canonical” or “elegant” way to perform the decomposition; one should just try to find a decomposition that is convenient for the problem at hand.)
The reason why we want to perform such a decomposition is that in each of the three cases, one can simplify how the integrand depends on . For instance, in region (i), we see from the triangle inequality that is now comparable to , so that this contribution to (13) is comparable to
Using a variant of
(9), this expression is comparable to
The contribution of region (ii) can be handled similarly, and is also comparable to
(14). Finally, in region (iii), we see from the triangle inequality that
are now comparable to each other, and so the contribution of this region is comparable to
Now that we have centered the integral around
, we will discard the
constraint, upper bounding this integral by
On the one hand this integral is bounded by
and on the other hand we can bound
and so we can bound the contribution of (iii) by
. Putting all this together, and dividing into the cases
and
, one can soon obtain a total bound of
for the entire integral. One can also adapt this argument to show that this bound is sharp up to constants, thus
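For the sake of a concrete check, suppose the integral (13) is int_R dx / ((1 + (x-a)^2)(1 + (x-b)^2)), a natural instance of the two-peak situation described above (this choice, like the crude Riemann-sum discretization below, is an assumption made for illustration). A direct numerical computation then confirms the comparability with 1/(1 + (a-b)^2):

```python
import math

def integrand(x, a, b):
    return 1.0 / ((1 + (x - a)**2) * (1 + (x - b)**2))

def integral(a, b, lo=-300.0, hi=300.0, n=120000):
    # Crude midpoint Riemann sum; the tails beyond |x| ~ 300 decay like
    # x^(-4) and are negligible at this accuracy.
    h = (hi - lo) / n
    return h * sum(integrand(lo + (i + 0.5) * h, a, b) for i in range(n))

for a, b in [(0.0, 1.0), (0.0, 10.0), (-5.0, 5.0)]:
    candidate = 1.0 / (1 + (a - b)**2)
    ratio = integral(a, b) / candidate
    # Comparability up to absolute constants (for this integrand the exact
    # value is 2*pi/((a-b)^2 + 4), so the ratio stays in [pi/2, 2*pi]).
    assert 1.0 <= ratio <= 2 * math.pi + 1e-6
```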
A powerful and common type of decomposition is dyadic decomposition. If the summand or integrand involves some quantity in a key way, it is often useful to break up into dyadic regions such as , so that , and then sum over . (One can tweak the dyadic range here with minor variants such as , or replace the base by some other base, but these modifications usually have at most a minor aesthetic impact on the arguments.) For instance, one could break up a sum
into dyadic pieces
and then seek to estimate each dyadic block
separately (hoping to get some exponential or polynomial decay in
). The classical technique of
Cauchy condensation is a basic example of this strategy. But one can also dyadically decompose other quantities than
. For instance one can perform a “vertical” dyadic decomposition (in contrast to the “horizontal” one just performed) by rewriting
(15) as
since the summand
is
, we may simplify this to
This now converts the problem of estimating the sum
(15) to the more combinatorial problem of estimating the size of the dyadic level sets
for various
. In a similar spirit, we have
where
denotes the Lebesgue measure of a set
, and now we are faced with a geometric problem of estimating the measure of some explicit set. This allows one to use geometric intuition to solve the problem, instead of multivariable calculus:
Exercise 3 Let be a smooth compact submanifold of . Establish the bound
for all , where the implied constants are allowed to depend on . (This can be accomplished either by a vertical dyadic decomposition, or a dyadic decomposition of the quantity .)
Exercise 4 Solve problem (ii) from the introduction to this post by dyadically decomposing in the variable.
Remark 5 By such tools as (10), (11), or Exercise 1, one could convert the dyadic sums one obtains from dyadic decomposition into integral variants. However, if one wished, one could “cut out the middle-man” and work with continuous dyadic decompositions rather than discrete ones. Indeed, from the integral identity
for any , together with the Fubini–Tonelli theorem, we obtain the continuous dyadic decomposition
for any quantity that is positive whenever is positive. Similarly if we work with integrals rather than sums. This version of dyadic decomposition is occasionally a little more convenient to work with, particularly if one then wants to perform various changes of variables in the parameter which would be tricky to execute if this were a discrete variable.
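One concrete identity of the advertised shape, offered here as a plausible instance (the normalizing factor of log 2 depends on the convention chosen for the dyadic range), is the observation that every positive quantity is captured by exactly one logarithmic unit of scales:

```latex
% For any y > 0, the set of scales \lambda with \lambda < y \leq 2\lambda
% is exactly the interval [y/2, y), whose logarithmic measure is constant:
\int_0^\infty 1_{\lambda < y \leq 2\lambda} \, \frac{d\lambda}{\lambda}
  = \int_{y/2}^{y} \frac{d\lambda}{\lambda}
  = \log 2.
% Dividing by \log 2, inserting this factor of 1 into a sum of positive
% terms f(n), and interchanging sum and integral (Fubini--Tonelli, as all
% terms are non-negative) gives the continuous dyadic decomposition
\sum_n f(n)
  = \frac{1}{\log 2} \int_0^\infty
      \Big( \sum_{n \,:\, \lambda < f(n) \leq 2\lambda} f(n) \Big)
      \, \frac{d\lambda}{\lambda}.
```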
— 3. Exponential weights —
Many sums involve expressions that are “exponentially large” or “exponentially small” in some parameter. A basic rule of thumb is that any quantity that is “exponentially small” will likely give a negligible contribution when compared against quantities that are not exponentially small. For instance, if an expression involves a term of the form for some non-negative quantity , which can be bounded on at least one portion of the domain of summation or integration, then one expects the region where is bounded to provide the dominant contribution. For instance, if one wishes to estimate the integral
for some
, this heuristic suggests that the dominant contribution should come from the region
, in which one can bound
simply by
and obtain an upper bound of
To make such a heuristic precise, one can perform a dyadic decomposition in the exponential weight , or equivalently perform an additive decomposition in the exponent , for instance writing
Exercise 6 Use this decomposition to rigorously establish the bound
for any .
Exercise 7 Solve problem (i) from the introduction to this post.
More generally, if one is working with a sum or integral such as
or
with some exponential weight
and a lower order amplitude
, then one typically expects the dominant contribution to come from the region where
comes close to attaining its maximal value. If this maximum is attained on the boundary, then the weight typically decays like a geometric series away from the boundary, and one can often get a good estimate by comparison with such a series. For instance, suppose one wants to estimate the error function
for
. In view of the complete integral
we can rewrite this as
The exponential weight
attains its maximum at the left endpoint
and decays quickly away from that endpoint. One could estimate this by dyadic decomposition of
as discussed previously, but a slicker way to proceed here is to use the convexity of
to obtain a geometric series upper bound
for
, which on integration gives
giving the asymptotic
for
.
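Assuming the convexity bound used here is e^(-t^2) <= e^(-z^2) e^(-2z(t-z)) for t >= z (so that integration over t >= z gives the upper bound e^(-z^2)/(2z)), one can verify numerically that this geometric-series-type bound is valid, and sharp up to constants, in the regime z >= 1:

```python
import math

# Tail of the gaussian: int_z^infty e^{-t^2} dt = (sqrt(pi)/2) * erfc(z).
for z in [1.0, 2.0, 4.0]:
    tail = (math.sqrt(math.pi) / 2) * math.erfc(z)
    geometric_bound = math.exp(-z**2) / (2 * z)
    # The convexity bound is an upper bound, and loses at most a constant
    # factor for z >= 1 (the ratio tends to 1 as z -> infinity).
    assert tail <= geometric_bound
    assert tail >= 0.5 * geometric_bound
```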
Exercise 8 In the converse direction, establish the upper bound
for some absolute constant and all .
Exercise 9 If for some , show that
(Hint: estimate the ratio between consecutive binomial coefficients and then control the sum by a geometric series).
When the maximum of the exponent occurs in the interior of the region of summation or integration, then one can get good results by some version of Laplace’s method. For simplicity we will discuss this method in the context of one-dimensional integrals
where
attains a non-degenerate global maximum at some interior point
. The rule of thumb here is that
The heuristic justification is as follows. The main contribution should be when
is close to
. Here we can perform a Taylor expansion
since at a non-degenerate maximum we have
and
. Also, if
is continuous, then
when
is close to
. Thus we should be able to estimate the above integral by the gaussian integral
which can be computed to equal
as desired.
Let us illustrate how this argument can be made rigorous by considering the task of estimating the factorial of a large number. In contrast to what we did in Exercise 2, we will proceed using a version of Laplace’s method, relying on the integral representation
As
is large, we will consider
to be part of the exponential weight rather than the amplitude, writing this expression as
where
The function
attains a global maximum at
, with
and
. We will therefore decompose this integral into three pieces
where
is a radius parameter which we will choose later, as it is not immediately obvious for now what the optimal value of this parameter is (although the previous heuristics do suggest that
might be a reasonable choice).
The main term is expected to be the middle term, so we shall use crude methods to bound the other two terms. For the first part where , is increasing so we can crudely bound and thus
(We expect
to be much smaller than
, so there is not much point to saving the tiny
term in the
factor.) For the third part where
,
is decreasing, but bounding
by
would not work because of the unbounded nature of
; some additional decay is needed. Fortunately, we have a strict increase
for
, so by the intermediate value theorem we have
and after a short calculation this gives
Now we turn to the important middle term. If we assume
, then we will have
in the region
, so by Taylor’s theorem with remainder
If we assume that
, then the error term is bounded and we can exponentiate to obtain
for
and hence
If we also assume that
, we can use the error function type estimates from before to estimate
Putting all this together, and using
(17) to estimate
, we conclude that
so if we select for instance, we obtain the Stirling approximation
One can improve the error term by a finer decomposition than
(16); we leave this as an exercise to the interested reader.
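The end product of this Laplace-method computation, Stirling's approximation n! ≈ sqrt(2 pi n) (n/e)^n, can be verified numerically (working with logarithms to avoid overflow):

```python
import math

def stirling(n):
    # Log of the leading-order Stirling approximation sqrt(2*pi*n)*(n/e)^n,
    # computed in log-space to avoid overflow for large n.
    return 0.5 * math.log(2 * math.pi * n) + n * math.log(n) - n

for n in [10, 100, 1000]:
    log_factorial = math.lgamma(n + 1)  # log(n!)
    err = abs(log_factorial - stirling(n))
    # The next correction in the Stirling series is 1/(12n), so the
    # log-scale error decays like 1/n:
    assert err <= 1.0 / (10 * n)
```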
Remark 10 It can be convenient to do some initial rescalings to this analysis to achieve a nice normalization; see this previous blog post for details.
Exercise 11 Solve problem (iii) from the introduction. (Hint: extract out the term to write as the exponential factor , placing all the other terms (which are of polynomial size) in the amplitude function . The function will then attain a maximum at ; perform a Taylor expansion and mimic the arguments above.)