An Introduction to Real Analysis
John K. Hunter

Department of Mathematics, University of California at Davis

The author was supported in part by the NSF. Thanks to Janko Gravner for a number of corrections and comments.

Abstract. These are some notes on introductory real analysis. They cover the properties of the real numbers, sequences and series of real numbers, limits of functions, continuity, differentiability, sequences and series of functions, and
Riemann integration. They don’t include multi-variable calculus or contain any problem sets. Optional sections are starred.

© John K. Hunter, 2014

Contents

Chapter 1. Sets and Functions
1.1. Sets
1.2. Functions
1.3. Composition and inverses of functions
1.4. Indexed sets
1.5. Relations
1.6. Countable and uncountable sets

Chapter 2. Numbers
2.1. Integers
2.2. Rational numbers
2.3. Real numbers: algebraic properties
2.4. Real numbers: ordering properties
2.5. The supremum and infimum
2.6. Real numbers: completeness
2.7. Properties of the supremum and infimum

Chapter 3. Sequences
3.1. The absolute value
3.2. Sequences
3.3. Convergence and limits
3.4. Properties of limits
3.5. Monotone sequences
3.6. The lim sup and lim inf
3.7. Cauchy sequences
3.8. Subsequences
3.9. The Bolzano-Weierstrass theorem

Chapter 4. Series
4.1. Convergence of series
4.2. The Cauchy condition
4.3. Absolutely convergent series
4.4. The comparison test
4.5. * The Riemann ζ-function
4.6. The ratio and root tests
4.7. Alternating series
4.8. Rearrangements
4.9. The Cauchy product
4.10. * Double series
4.11. * The irrationality of e

Chapter 5. Topology of the Real Numbers
5.1. Open sets
5.2. Closed sets
5.3. Compact sets
5.4. Connected sets
5.5. * The Cantor set

Chapter 6. Limits of Functions
6.1. Limits
6.2. Left, right, and infinite limits
6.3. Properties of limits

Chapter 7. Continuous Functions
7.1. Continuity
7.2. Properties of continuous functions
7.3. Uniform continuity
7.4. Continuous functions and open sets
7.5. Continuous functions on compact sets
7.6. The intermediate value theorem
7.7. Monotonic functions

Chapter 8. Differentiable Functions
8.1. The derivative
8.2. Properties of the derivative
8.3. The chain rule
8.4. Extreme values
8.5. The mean value theorem
8.6. Taylor’s theorem
8.7. * The inverse function theorem
8.8. * L’Hôpital’s rule

Chapter 9. Sequences and Series of Functions
9.1. Pointwise convergence
9.2. Uniform convergence
9.3. Cauchy condition for uniform convergence
9.4. Properties of uniform convergence
9.5. Series

Chapter 10. Power Series
10.1. Introduction
10.2. Radius of convergence
10.3. Examples of power series
10.4. Algebraic operations on power series
10.5. Differentiation of power series
10.6. The exponential function
10.7. * Smooth versus analytic functions

Chapter 11. The Riemann Integral
11.1. The supremum and infimum of functions
11.2. Definition of the integral
11.3. The Cauchy criterion for integrability
11.4. Continuous and monotonic functions
11.5. Linearity, monotonicity, and additivity
11.6. Further existence results
11.7. * Riemann sums
11.8. * The Lebesgue criterion

Chapter 12. Properties and Applications of the Integral
12.1. The fundamental theorem of calculus
12.2. Consequences of the fundamental theorem
12.3. Integrals and sequences of functions
12.4. Improper Riemann integrals
12.5. * Principal value integrals
12.6. The integral test for series
12.7. Taylor’s theorem with integral remainder

Chapter 13. Metric, Normed, and Topological Spaces
13.1. Metric spaces
13.2. Normed spaces
13.3. Open and closed sets
13.4. Completeness, compactness, and continuity
13.5. Topological spaces
13.6. * Function spaces
13.7. * The Minkowski inequality

Bibliography

Chapter 1

Sets and Functions

We understand a “set” to be any collection M of certain distinct objects of our thought or intuition (called the “elements” of M ) into a whole.
(Georg Cantor, 1895)
In mathematics you don’t understand things. You just get used to them.
(Attributed to John von Neumann)
In this chapter, we define sets, functions, and relations and discuss some of their general properties. This material can be referred back to as needed in the subsequent chapters.

1.1. Sets
A set is a collection of objects, called the elements or members of the set. The objects could be anything (planets, squirrels, characters in Shakespeare’s plays, or other sets) but for us they will be mathematical objects such as numbers, or sets of numbers. We write x ∈ X if x is an element of the set X and x ∉ X if x is not an element of X.
If the definition of a “set” as a “collection” seems circular, that’s because it is. Conceiving of many objects as a single whole is a basic intuition that cannot be analyzed further, and the notions of “set” and “membership” are primitive ones. These notions can be made mathematically precise by introducing a system of axioms for sets and membership that agrees with our intuition and proving other set-theoretic properties from the axioms.
The most commonly used axioms for sets are the ZFC axioms, named somewhat inconsistently after two of their founders (Zermelo and Fraenkel) and one of their axioms (the Axiom of Choice). We won’t state these axioms here; instead, we use
“naive” set theory, based on the intuitive properties of sets. Nevertheless, all the set-theory arguments we use can be rigorously formalized within the ZFC system.

Sets are determined entirely by their elements. Thus, the sets X, Y are equal, written X = Y , if x ∈ X if and only if x ∈ Y.
It is convenient to define the empty set, denoted by ∅, as the set with no elements.
(Since sets are determined by their elements, there is only one set with no elements!)
If X ≠ ∅, meaning that X has at least one element, then we say that X is nonempty.
We can define a finite set by listing its elements (between curly brackets). For example, X = {2, 3, 5, 7, 11} is a set with five elements. The order in which the elements are listed or repetitions of the same element are irrelevant. Alternatively, we can define X as the set whose elements are the first five prime numbers. It doesn’t matter how we specify the elements of X, only that they are the same.
Infinite sets can’t be defined by explicitly listing all of their elements. Nevertheless, we will adopt a realist (or “platonist”) approach towards arbitrary infinite sets and regard them as well-defined totalities. In constructive mathematics and computer science, one may be interested only in sets that can be defined by a rule or algorithm — for example, the set of all prime numbers — rather than by infinitely many arbitrary specifications, and there are some mathematicians who consider infinite sets to be meaningless without some way of constructing them. Similar issues arise with the notion of arbitrary subsets, functions, and relations.
1.1.1. Numbers. The infinite sets we use are derived from the natural and real numbers, about which we have a direct intuitive understanding.
Our understanding of the natural numbers 1, 2, 3, . . . derives from counting.
We denote the set of natural numbers by
N = {1, 2, 3, . . . } .
We define N so that it starts at 1. In set theory and logic, the natural numbers are defined to start at zero, but we denote this set by N0 = {0, 1, 2, . . . }. Historically, the number 0 was a later addition to the number system, primarily by Indian mathematicians in the 5th century AD. The ancient Greek mathematicians, such as Euclid, defined a number as a multiplicity and didn’t consider 1 to be a number either. Our understanding of the real numbers derives from durations of time and lengths in space. We think of the real line, or continuum, as being composed of an (uncountably) infinite number of points, each of which corresponds to a real number, and denote the set of real numbers by R. There are philosophical questions, going back at least to Zeno’s paradoxes, about whether the continuum can be represented as a set of points, and a number of mathematicians have disputed this assumption or introduced alternative models of the continuum. There are, however, no known inconsistencies in treating R as a set of points, and since Cantor’s work it has been the dominant point of view in mathematics because of its precision, power, and simplicity.

We denote the set of (positive, negative and zero) integers by
Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . . }, and the set of rational numbers (ratios of integers) by
Q = {p/q : p, q ∈ Z and q ≠ 0}.
The letter “Z” comes from “Zahl” (German for “number”) and “Q” comes from
“quotient.” These number systems are discussed further in Chapter 2.
Although we will not develop any complex analysis here, we occasionally make use of complex numbers. We denote the set of complex numbers by
C = {x + iy : x, y ∈ R},
where we add and multiply complex numbers in the natural way, with the additional identity that i² = −1, meaning that i is a square root of −1. If z = x + iy ∈ C, we call x = Re z the real part of z and y = Im z the imaginary part of z, and we call
|z| = √(x² + y²)
the absolute value, or modulus, of z. Two complex numbers z = x + iy, w = u + iv are equal if and only if x = u and y = v.
1.1.2. Subsets. A set A is a subset of a set X, written A ⊂ X or X ⊃ A, if every element of A belongs to X; that is, if x ∈ A implies that x ∈ X.
We also say that A is included in X.1 For example, if P is the set of prime numbers, then P ⊂ N, and N ⊂ R. The empty set ∅ and the whole set X are subsets of any set X. Note that X = Y if and only if X ⊂ Y and Y ⊂ X; we often prove the equality of two sets by showing that each one includes the other.
In our notation, A ⊂ X does not imply that A is a proper subset of X (that is, a subset of X not equal to X itself), and we may have A = X. This notation for non-strict inclusion is not universal; some authors use A ⊂ X to denote strict inclusion, in which A ≠ X, and A ⊆ X to denote non-strict inclusion, in which A = X is allowed.
Definition 1.1. The power set P(X) of a set X is the set of all subsets of X.
Example 1.2. If X = {1, 2, 3}, then
P(X) = {∅, {1}, {2}, {3}, {2, 3}, {1, 3}, {1, 2}, {1, 2, 3}} .
The power set of a finite set with n elements has 2ⁿ elements because, in defining a subset, we have two independent choices for each element (does it belong to the subset or not?). In Example 1.2, X has 3 elements and P(X) has 2³ = 8 elements. The power set of an infinite set, such as N, consists of all finite and infinite subsets and is infinite. We can define finite subsets of N, or subsets with finite
1By contrast, we say that an element x ∈ X is contained in X, in which cases the singleton set
{x} is included in X. This terminological distinction is not universal, but it is almost always clear from the context whether one is referring to an element of a set or a subset of a set. In fact, before the development of the contemporary notation for set theory, Dedekind [3] used the same symbol (⊆) to denote both membership of elements and inclusion of subsets.


complements, by listing finitely many elements. Some infinite subsets, such as the set of primes or the set of squares, can be defined by giving a definite rule for membership. We imagine that a general subset A ⊂ N is “defined” by going through the elements of N one by one and deciding for each n ∈ N whether n ∈ A or n ∉ A.
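The counting argument for the size of the power set can be made concrete by enumerating subsets of a small set as bitmasks, where bit i of the mask decides whether the i-th listed element belongs to the subset. A sketch in Python (the helper name `power_set` is ours, not a standard library function):

```python
# Enumerate all subsets of a finite set via bitmasks: bit i of the mask
# decides whether the i-th listed element belongs to the subset.
def power_set(xs):
    xs = list(xs)
    return [{xs[i] for i in range(len(xs)) if mask >> i & 1}
            for mask in range(2 ** len(xs))]

subsets = power_set({1, 2, 3})
print(len(subsets))  # 2^3 = 8 subsets, matching Example 1.2
```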
If X is a set and P is a property of elements of X, we denote the subset of X consisting of elements with the property P by {x ∈ X : P (x)}.
Example 1.3. The set {n ∈ N : n = k² for some k ∈ N} is the set of perfect squares {1, 4, 9, 16, 25, . . . }. The set
{x ∈ R : 0 < x < 1} is the open interval (0, 1).
1.1.3. Set operations. The intersection A ∩ B of two sets A, B is the set of all elements that belong to both A and B; that is, x ∈ A ∩ B if and only if x ∈ A and x ∈ B.
Two sets A, B are said to be disjoint if A ∩ B = ∅; that is, if A and B have no elements in common.
The union A ∪ B is the set of all elements that belong to A or B; that is, x ∈ A ∪ B if and only if x ∈ A or x ∈ B.
Note that we always use ‘or’ in an inclusive sense, so that x ∈ A ∪ B if x is an element of A or B, or both A and B. (Thus, A ∩ B ⊂ A ∪ B.)
The set-difference of two sets B and A is the set of elements of B that do not belong to A,
B \ A = {x ∈ B : x ∉ A} .
If we consider sets that are subsets of a fixed set X that is understood from the context, then we write Ac = X \ A to denote the complement of A ⊂ X in X. Note that (Ac )c = A.
Example 1.4. If
A = {2, 3, 5, 7, 11} ,   B = {1, 3, 5, 7, 9, 11}
then
A ∩ B = {3, 5, 7, 11} ,   A ∪ B = {1, 2, 3, 5, 7, 9, 11} .
Thus, A ∩ B consists of the natural numbers between 1 and 11 that are both prime and odd, while A ∪ B consists of the numbers that are either prime or odd (or both). The set differences of these sets are
B \ A = {1, 9} ,   A \ B = {2} .

Thus, B \ A is the set of odd numbers between 1 and 11 that are not prime, and
A \ B is the set of prime numbers that are not odd.
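Example 1.4 can be reproduced directly with Python’s built-in set type, whose operators `&`, `|`, and `-` are intersection, union, and set-difference:

```python
# The sets of Example 1.4.
A = {2, 3, 5, 7, 11}      # primes between 1 and 11
B = {1, 3, 5, 7, 9, 11}   # odd numbers between 1 and 11

print(sorted(A & B))  # intersection:   [3, 5, 7, 11]
print(sorted(A | B))  # union:          [1, 2, 3, 5, 7, 9, 11]
print(sorted(B - A))  # B \ A:          [1, 9]
print(sorted(A - B))  # A \ B:          [2]
```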


These set operations may be represented by Venn diagrams, which can be used to visualize their properties. In particular, if A, B ⊂ X, we have De Morgan’s laws:
(A ∪ B)ᶜ = Aᶜ ∩ Bᶜ ,   (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ .
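For finite sets, De Morgan’s laws can be checked mechanically. The sketch below fixes a small universe X and verifies both identities for one choice of A and B; this is an illustration, not a proof:

```python
# Complements are taken relative to a fixed finite universe X.
X = set(range(1, 13))
A = {2, 3, 5, 7, 11}
B = {1, 3, 5, 7, 9, 11}

def complement(S):
    return X - S

# De Morgan's laws for this A and B.
assert complement(A | B) == complement(A) & complement(B)
assert complement(A & B) == complement(A) | complement(B)
print("De Morgan's laws hold for this choice of A and B")
```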

The definitions of union and intersection extend to larger collections of sets in a natural way.
Definition 1.5. Let C be a collection of sets. Then the union of C is
⋃C = {x : x ∈ X for some X ∈ C} ,
and the intersection of C is
⋂C = {x : x ∈ X for every X ∈ C} .
If C = {A, B}, then this definition reduces to our previous one for A ∪ B and
A ∩ B.
The Cartesian product X × Y of sets X, Y is the set of all ordered pairs (x, y) with x ∈ X and y ∈ Y . If X = Y , we often write X × X = X². Two ordered pairs (x₁, y₁), (x₂, y₂) in X × Y are equal if and only if x₁ = x₂ and y₁ = y₂. Thus,
(x, y) ≠ (y, x) unless x = y. This contrasts with sets, where {x, y} = {y, x}.
Example 1.6. If X = {1, 2, 3} and Y = {4, 5} then
X × Y = {(1, 4), (1, 5), (2, 4), (2, 5), (3, 4), (3, 5)} .
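Example 1.6 corresponds to `itertools.product` from Python’s standard library, which enumerates the ordered pairs in the same order:

```python
from itertools import product

X = [1, 2, 3]
Y = [4, 5]
# All ordered pairs (x, y) with x from X and y from Y.
print(list(product(X, Y)))
# [(1, 4), (1, 5), (2, 4), (2, 5), (3, 4), (3, 5)]

assert (1, 4) != (4, 1)   # order matters in pairs, unlike in sets
```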
Example 1.7. The Cartesian product of R with itself is the Cartesian plane R2 consisting of all points with coordinates (x, y) where x, y ∈ R.
The Cartesian product of finitely many sets is defined analogously.
Definition 1.8. The Cartesian product of n sets X₁, X₂, . . . , Xₙ is the set of ordered n-tuples,
X₁ × X₂ × · · · × Xₙ = {(x₁, x₂, . . . , xₙ) : xᵢ ∈ Xᵢ for i = 1, 2, . . . , n} ,
where (x₁, x₂, . . . , xₙ) = (y₁, y₂, . . . , yₙ) if and only if xᵢ = yᵢ for every i = 1, 2, . . . , n.

1.2. Functions
A function f : X → Y between sets X, Y assigns to each x ∈ X a unique element f (x) ∈ Y . Functions are also called maps, mappings, or transformations. The set
X on which f is defined is called the domain of f and the set Y in which it takes its values is called the codomain. We write f : x ↦ f (x) to indicate that f is the function that maps x to f (x).
Example 1.9. The identity function idX : X → X on a set X is the function idX : x ↦ x that maps every element to itself.
Example 1.10. Let A ⊂ X. The characteristic (or indicator) function of A, χA : X → {0, 1},


is defined by
χA (x) = 1 if x ∈ A,   χA (x) = 0 if x ∉ A.
Specifying the function χA is equivalent to specifying the subset A.

Example 1.11. Let A, B be the sets in Example 1.4. We can define a function f : A → B by
f (2) = 7,   f (3) = 1,   f (5) = 11,   f (7) = 3,   f (11) = 9,
and a function g : B → A by
g(1) = 3,   g(3) = 7,   g(5) = 2,   g(7) = 2,   g(9) = 5,   g(11) = 11.

Example 1.12. The square function f : N → N is defined by
f (n) = n²,
which we also write as f : n ↦ n². The equation g(n) = √n, where √n is the positive square root, defines a function g : N → R, but h(n) = ±√n does not define a function since it doesn’t specify a unique value for h(n). Sometimes we use a convenient oxymoron and refer to h as a multi-valued function.
One way to specify a function is to explicitly list its values, as in Example 1.11.
Another way is to give a definite rule, as in Example 1.12. If X is infinite and f is not given by a definite rule, then neither of these methods can be used to specify the function. Nevertheless, we suppose that a general function f : X → Y may be
“defined” by picking for each x ∈ X a corresponding value f (x) ∈ Y .
If f : X → Y and U ⊂ X, then we denote the restriction of f to U by f |U : U → Y , where f |U (x) = f (x) for x ∈ U .
In defining a function f : X → Y , it is crucial to specify the domain X of elements on which it is defined. There is more ambiguity about the choice of codomain, however, since we can extend the codomain to any set Z ⊃ Y and define a function g : X → Z by g(x) = f (x). Strictly speaking, even though f and g have exactly the same values, they are different functions since they have different codomains. Usually, however, we will ignore this distinction and regard f and g as being the same function.
The graph of a function f : X → Y is the subset Gf of X × Y defined by
Gf = {(x, y) ∈ X × Y : x ∈ X and y = f (x)} .
For example, if f : R → R, then the graph of f is the usual set of points (x, y) with y = f (x) in the Cartesian plane R². Since a function is defined at every point in its domain, there is some point (x, y) ∈ Gf for every x ∈ X, and since the value of a function is uniquely defined, there is exactly one such point. In other words, for each x ∈ X the “vertical line” Lx = {(x, y) ∈ X × Y : y ∈ Y } through x intersects the graph of a function f : X → Y in exactly one point: Lx ∩ Gf = {(x, f (x))}.
Definition 1.13. The range, or image, of a function f : X → Y is the set of values ran f = {y ∈ Y : y = f (x) for some x ∈ X} .
A function is onto if its range is all of Y ; that is, if for every y ∈ Y there exists x ∈ X such that y = f (x).


A function is one-to-one if it maps distinct elements of X to distinct elements of Y ; that is, if
x₁, x₂ ∈ X and x₁ ≠ x₂ implies that f (x₁) ≠ f (x₂).
An onto function is also called a surjection, a one-to-one function an injection, and a one-to-one, onto function a bijection.
Example 1.14. The function f : A → B defined in Example 1.11 is one-to-one but not onto, since 5 ∉ ran f , while the function g : B → A is onto but not one-to-one, since g(5) = g(7).
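Finite functions such as those of Example 1.11 can be stored as dictionaries, which turns the one-to-one and onto checks of Example 1.14 into short tests. The helper names below are ours:

```python
# The functions of Example 1.11, stored as dictionaries (input -> output).
f = {2: 7, 3: 1, 5: 11, 7: 3, 11: 9}            # f : A -> B
g = {1: 3, 3: 7, 5: 2, 7: 2, 9: 5, 11: 11}      # g : B -> A
A = {2, 3, 5, 7, 11}
B = {1, 3, 5, 7, 9, 11}

def one_to_one(h):
    # Distinct inputs give distinct values exactly when no value repeats.
    return len(set(h.values())) == len(h)

def onto(h, codomain):
    return set(h.values()) == codomain

print(one_to_one(f), onto(f, B))   # True False: 5 is not in ran f
print(one_to_one(g), onto(g, A))   # False True: g(5) = g(7) = 2
```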

1.3. Composition and inverses of functions
The successive application of mappings leads to the notion of the composition of functions.
Definition 1.15. The composition of functions f : X → Y and g : Y → Z is the function g ◦ f : X → Z defined by
(g ◦ f )(x) = g(f (x)) .
The order of application of the functions in a composition is crucial and is read from right to left. The composition g ◦ f can only be defined if the domain of g includes the range of f , and the existence of g ◦ f does not imply that f ◦ g even makes sense.
Example 1.16. Let X be the set of students in a class and f : X → N the function that maps a student to her age. Let g : N → N be the function that adds up the digits in a number, e.g., g(1729) = 19. If x ∈ X is 23 years old, then (g ◦ f )(x) = 5, but (f ◦ g)(x) makes no sense, since students in the class are not natural numbers.
Even if both g ◦ f and f ◦ g are defined, they are, in general, different functions.
Example 1.17. If f : A → B and g : B → A are the functions in Example 1.11, then g ◦ f : A → A is given by
(g ◦ f )(2) = 2,   (g ◦ f )(3) = 3,   (g ◦ f )(5) = 11,   (g ◦ f )(7) = 7,   (g ◦ f )(11) = 5,
and f ◦ g : B → B is given by
(f ◦ g)(1) = 1,   (f ◦ g)(3) = 3,   (f ◦ g)(5) = 7,   (f ◦ g)(7) = 7,   (f ◦ g)(9) = 11,   (f ◦ g)(11) = 9.
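Since the functions of Example 1.11 are finite, their compositions can be tabulated mechanically. A sketch, where `compose` is our helper rather than a library function:

```python
# Composition of the finite functions from Example 1.11, as dictionaries.
f = {2: 7, 3: 1, 5: 11, 7: 3, 11: 9}        # f : A -> B
g = {1: 3, 3: 7, 5: 2, 7: 2, 9: 5, 11: 11}  # g : B -> A

def compose(outer, inner):
    # (outer o inner)(x) = outer(inner(x)); needs ran(inner) inside dom(outer).
    return {x: outer[inner[x]] for x in inner}

print(compose(g, f))  # g o f : {2: 2, 3: 3, 5: 11, 7: 7, 11: 5}
print(compose(f, g))  # f o g : {1: 1, 3: 3, 5: 7, 7: 7, 9: 11, 11: 9}
```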

A one-to-one, onto function f : X → Y has an inverse f⁻¹ : Y → X defined by
f⁻¹(y) = x if and only if f (x) = y.
Equivalently, f⁻¹ ◦ f = idX and f ◦ f⁻¹ = idY . A value f⁻¹(y) is defined for every y ∈ Y since f is onto, and it is unique since f is one-to-one. If f : X → Y is one-to-one but not onto, then one can still define an inverse function f⁻¹ : ran f → X whose domain is the range of f .
The use of the notation f⁻¹ to denote the inverse function should not be confused with its use to denote the reciprocal function; it should be clear from the context which meaning is intended.


Example 1.18. If f : R → R is the function f (x) = x³, which is one-to-one and onto, then the inverse function f⁻¹ : R → R is given by
f⁻¹(x) = x^(1/3).
On the other hand, the reciprocal function g = 1/f is given by
g(x) = 1/x³,   g : R \ {0} → R.
The reciprocal function is not defined at x = 0 where f (x) = 0.

If f : X → Y and A ⊂ X, then we let
f (A) = {y ∈ Y : y = f (x) for some x ∈ A}
denote the set of values of f on points in A. Similarly, if B ⊂ Y , we let
f⁻¹(B) = {x ∈ X : f (x) ∈ B}
denote the set of points in X whose values belong to B. Note that f⁻¹(B) makes sense as a set even if the inverse function f⁻¹ : Y → X does not exist.
Example 1.19. Define f : R → R by f (x) = x². If A = (−2, 2), then f (A) = [0, 4).
If B = (0, 4), then f⁻¹(B) = (−2, 0) ∪ (0, 2).
If C = (−4, 0), then f⁻¹(C) = ∅.
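For finite sets, the image f(A) and preimage f⁻¹(B) can be computed by direct enumeration, in the spirit of Example 1.19 but on an integer grid. The helper names `image` and `preimage` are ours:

```python
# Image and preimage by enumeration, with f(x) = x^2 on a finite domain.
def image(f, A):
    return {f(x) for x in A}

def preimage(f, domain, B):
    return {x for x in domain if f(x) in B}

square = lambda x: x * x
domain = set(range(-3, 4))                            # {-3, ..., 3}

print(sorted(image(square, {-2, -1, 0, 1, 2})))       # [0, 1, 4]
print(sorted(preimage(square, domain, {1, 4})))       # [-2, -1, 1, 2]
print(preimage(square, domain, {-4, -1}))             # set(): no square is negative
```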
Finally, we introduce operations on a set.
Definition 1.20. A binary operation on a set X is a function f : X × X → X.
We think of f as “combining” two elements of X to give another element of X. One can also consider higher-order operations, such as ternary operations f : X × X × X → X, but we will only use binary operations.
Example 1.21. Addition a : N × N → N and multiplication m : N × N → N are binary operations on N, where
a(x, y) = x + y,   m(x, y) = xy.

1.4. Indexed sets
We say that a set X is indexed by a set I, or X is an indexed set, if there is an onto function f : I → X. We then write
X = {xᵢ : i ∈ I}
where xᵢ = f (i). For example,
{1, 4, 9, 16, . . . } = {n² : n ∈ N} .
The set X itself is the range of the indexing function f , and it doesn’t depend on how we index it. If f isn’t one-to-one, then some elements are repeated, but this doesn’t affect the definition of the set X. For example,
{−1, 1} = {(−1)ⁿ : n ∈ N} = {(−1)ⁿ⁺¹ : n ∈ N} .


If C = {Xᵢ : i ∈ I} is an indexed collection of sets Xᵢ, then we denote the union and intersection of the sets in C by
⋃_{i∈I} Xᵢ = {x : x ∈ Xᵢ for some i ∈ I} ,   ⋂_{i∈I} Xᵢ = {x : x ∈ Xᵢ for every i ∈ I} ,
or similar notation.
Example 1.22. For n ∈ N, define the intervals
Aₙ = [1/n, 1 − 1/n] = {x ∈ R : 1/n ≤ x ≤ 1 − 1/n},
Bₙ = (−1/n, 1/n) = {x ∈ R : −1/n < x < 1/n}.
Then
⋃_{n∈N} Aₙ = (0, 1),   ⋂_{n∈N} Bₙ = {0}.

The general statement of De Morgan’s laws for a collection of sets is as follows.
Proposition 1.23 (De Morgan). If {Xᵢ ⊂ X : i ∈ I} is a collection of subsets of a set X, then
(⋃_{i∈I} Xᵢ)ᶜ = ⋂_{i∈I} Xᵢᶜ ,   (⋂_{i∈I} Xᵢ)ᶜ = ⋃_{i∈I} Xᵢᶜ .
Proof. We have x ∈ (⋃_{i∈I} Xᵢ)ᶜ if and only if x ∉ Xᵢ for every i ∈ I, which holds if and only if x ∈ ⋂_{i∈I} Xᵢᶜ. Similarly, x ∈ (⋂_{i∈I} Xᵢ)ᶜ if and only if x ∉ Xᵢ for some i ∈ I, which holds if and only if x ∈ ⋃_{i∈I} Xᵢᶜ.
The following theorem summarizes how unions and intersections map under functions.
Theorem 1.24. Let f : X → Y be a function. If {Yⱼ ⊂ Y : j ∈ J} is a collection of subsets of Y , then
f⁻¹(⋃_{j∈J} Yⱼ) = ⋃_{j∈J} f⁻¹(Yⱼ) ,   f⁻¹(⋂_{j∈J} Yⱼ) = ⋂_{j∈J} f⁻¹(Yⱼ) ;
and if {Xᵢ ⊂ X : i ∈ I} is a collection of subsets of X, then
f(⋃_{i∈I} Xᵢ) = ⋃_{i∈I} f (Xᵢ) ,   f(⋂_{i∈I} Xᵢ) ⊂ ⋂_{i∈I} f (Xᵢ) .
Proof. We prove only the results for the inverse image of a union and the image of an intersection; the proof of the remaining two results is similar.
If x ∈ f⁻¹(⋃_{j∈J} Yⱼ), then there exists y ∈ ⋃_{j∈J} Yⱼ such that f (x) = y. Then y ∈ Yⱼ for some j ∈ J and x ∈ f⁻¹(Yⱼ), so x ∈ ⋃_{j∈J} f⁻¹(Yⱼ). It follows that
f⁻¹(⋃_{j∈J} Yⱼ) ⊂ ⋃_{j∈J} f⁻¹(Yⱼ) .
Conversely, if x ∈ ⋃_{j∈J} f⁻¹(Yⱼ), then x ∈ f⁻¹(Yⱼ) for some j ∈ J, so f (x) ∈ Yⱼ and f (x) ∈ ⋃_{j∈J} Yⱼ, meaning that x ∈ f⁻¹(⋃_{j∈J} Yⱼ). It follows that
⋃_{j∈J} f⁻¹(Yⱼ) ⊂ f⁻¹(⋃_{j∈J} Yⱼ) ,
which proves that the sets are equal.
If y ∈ f(⋂_{i∈I} Xᵢ), then there exists x ∈ ⋂_{i∈I} Xᵢ such that f (x) = y. Then x ∈ Xᵢ and y ∈ f (Xᵢ) for every i ∈ I, meaning that y ∈ ⋂_{i∈I} f (Xᵢ). It follows that
f(⋂_{i∈I} Xᵢ) ⊂ ⋂_{i∈I} f (Xᵢ) .
The only case in which we don’t always have equality is for the image of an intersection, and we may get strict inclusion here if f is not one-to-one.
Example 1.25. Define f : R → R by f (x) = x². Let A = (−1, 0) and B = (0, 1).
Then A ∩ B = ∅ and f (A ∩ B) = ∅, but f (A) = f (B) = (0, 1), so f (A) ∩ f (B) = (0, 1) ≠ f (A ∩ B).
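A finite analogue of Example 1.25, again with f(x) = x²: when f is not one-to-one, the image of an intersection can be strictly smaller than the intersection of the images. An illustration with two-element sets:

```python
# A finite analogue of Example 1.25 with f(x) = x^2.
def image(f, S):
    return {f(x) for x in S}

f = lambda x: x * x
A = {-2, -1}
B = {1, 2}

assert A & B == set()                         # A and B are disjoint...
assert image(f, A & B) == set()               # ...so f(A ∩ B) is empty,
assert image(f, A) & image(f, B) == {1, 4}    # but the images overlap
print("f(A ∩ B) is strictly smaller than f(A) ∩ f(B)")
```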
Next, we generalize the Cartesian product of finitely many sets to the product of possibly infinitely many sets.
Definition 1.26. Let C = {Xᵢ : i ∈ I} be an indexed collection of sets Xᵢ. The Cartesian product of C is the set of functions that assign to each index i ∈ I an element xᵢ ∈ Xᵢ. That is,
∏_{i∈I} Xᵢ = { f : I → ⋃_{i∈I} Xᵢ : f (i) ∈ Xᵢ for every i ∈ I } .
For example, if I = {1, 2, . . . , n}, then f defines an ordered n-tuple of elements (x₁, x₂, . . . , xₙ) with xᵢ = f (i) ∈ Xᵢ, so this definition is equivalent to our previous one. If Xᵢ = X for every i ∈ I, then ∏_{i∈I} Xᵢ is simply the set of functions from I to X, and we also write it as
Xᴵ = {f : I → X} .
We can think of this set as the set of ordered I-tuples of elements of X.
Example 1.27. A sequence of real numbers (x₁, x₂, x₃, . . . , xₙ, . . . ) ∈ Rᴺ is a function f : N → R. We study sequences and their convergence properties in Chapter 3.
Example 1.28. Let 2 = {0, 1} be a set with two elements. Then a subset A ⊂ I can be identified with its characteristic function χA : I → 2 by: i ∈ A if and only if χA (i) = 1. Thus, A ↦ χA is a one-to-one map from P(I) onto 2ᴵ.
Before giving another example, we introduce some convenient notation.


Definition 1.29. Let
Σ = {(s₁, s₂, s₃, . . . , sₖ, . . . ) : sₖ = 0, 1}
denote the set of all binary sequences; that is, sequences whose terms are either 0 or 1.
Example 1.30. Let 2 = {0, 1}. Then Σ = 2ᴺ, where we identify a sequence (s₁, s₂, . . . , sₖ, . . . ) with the function f : N → 2 such that sₖ = f (k). We can also identify Σ and 2ᴺ with P(N) as in Example 1.28. For example, the sequence (1, 0, 1, 0, 1, . . . ) of alternating ones and zeros corresponds to the function f : N → 2 defined by
f (k) = 1 if k is odd,   f (k) = 0 if k is even,
and to the set {1, 3, 5, 7, . . . } ⊂ N of odd natural numbers.
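The identification in Example 1.30 can be sketched for finite prefixes of a binary sequence: a 0/1 string corresponds to the set of positions (counted from 1) holding a 1. The helper name is ours:

```python
# A finite prefix of a binary sequence corresponds to the subset of
# indices (counted from 1) where the sequence equals 1.
def sequence_to_subset(s):
    return {k for k, bit in enumerate(s, start=1) if bit == 1}

alternating = [1, 0, 1, 0, 1, 0, 1, 0]      # prefix of (1, 0, 1, 0, 1, ...)
print(sequence_to_subset(alternating))      # the odd numbers {1, 3, 5, 7}
```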

1.5. Relations
A binary relation R on sets X and Y is a definite relation between elements of X and elements of Y . We write xRy if x ∈ X and y ∈ Y are related. One can also define relations on more than two sets, but we shall consider only binary relations and refer to them simply as relations. If X = Y , then we call R a relation on X.
Example 1.31. Suppose that S is a set of students enrolled in a university and B is a set of books in a library. We might define a relation R on S and B by: s ∈ S has read b ∈ B.
In that case, sRb if and only if s has read b. Another, probably inequivalent, relation is: s ∈ S has checked b ∈ B out of the library.
When used informally, relations may be ambiguous (did s read b if she only read the first page?), but in mathematical usage we always require that relations are definite, meaning that one and only one of the statements “these elements are related” or “these elements are not related” is true.
The graph GR of a relation R on X and Y is the subset of X × Y defined by
GR = {(x, y) ∈ X × Y : xRy} .
This graph contains all of the information about which elements are related. Conversely, any subset G ⊂ X ×Y defines a relation R by: xRy if and only if (x, y) ∈ G.
Thus, a relation on X and Y may be (and often is) defined as a subset of X × Y . As for sets, it doesn’t matter how a relation is defined, only what elements are related.
A function f : X → Y determines a relation F on X and Y by: xF y if and only if y = f (x). Thus, functions are a special case of relations. The graph GR of a general relation differs from the graph GF of a function in two ways: there may be elements x ∈ X such that (x, y) ∉ GR for any y ∈ Y , and there may be x ∈ X such that (x, y) ∈ GR for many y ∈ Y .
For example, in the case of the relation R in Example 1.31, there may be some students who haven’t read any books, and there may be other students who have


read lots of books, in which case we don’t have a well-defined function from students to books.
Two important types of relations are orders and equivalence relations, and we define them next.
1.5.1. Orders. A primary example of an order is the standard order ≤ on the natural (or real) numbers. This order is a linear or total order, meaning that two numbers are always comparable. Another example of an order is inclusion ⊂ on the power set of some set; one set is “smaller” than another set if it is included in it.
This order is a partial order (provided the original set has at least two elements), meaning that two subsets need not be comparable.
Example 1.32. Let X = {1, 2}. The collection of subsets of X is
P(X) = {∅, A, B, X} ,   A = {1},   B = {2}.

We have ∅ ⊂ A ⊂ X and ∅ ⊂ B ⊂ X, but A ⊄ B and B ⊄ A, so A and B are not comparable under ordering by inclusion.
The general definition of an order is as follows.
Definition 1.33. An order ≼ on a set X is a binary relation on X such that for every x, y, z ∈ X:
(a) x ≼ x (reflexivity);
(b) if x ≼ y and y ≼ x then x = y (antisymmetry);
(c) if x ≼ y and y ≼ z then x ≼ z (transitivity).
An order is a linear, or total, order if for every x, y ∈ X either x ≼ y or y ≼ x, otherwise it is a partial order.
If ≼ is an order, then we also write y ≽ x instead of x ≼ y, and we define a corresponding strict order ≺ by
x ≺ y if x ≼ y and x ≠ y.

There are many ways to order a given set (with two or more elements).
Example 1.34. Let X be a set. One way to partially order the subsets of X is by inclusion, as in Example 1.32. Another way is to say that A ≼ B for A, B ⊂ X if and only if A ⊃ B, meaning that A is “smaller” than B if A includes B. Then ≼ is an order on P(X), called ordering by reverse inclusion.
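The incomparability in Example 1.32 is easy to exhibit with Python’s non-strict subset test `<=`, which plays the role of ≼ for ordering by inclusion:

```python
# Ordering the subsets of X = {1, 2} by inclusion, as in Example 1.32.
# Python's <= on sets is exactly non-strict inclusion.
X = {1, 2}
A, B = {1}, {2}
empty = set()

assert empty <= A and A <= X            # ∅ ⊂ A ⊂ X
assert empty <= B and B <= X            # ∅ ⊂ B ⊂ X
assert not (A <= B) and not (B <= A)    # A and B are incomparable
print("inclusion is only a partial order on P(X)")
```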
1.5.2. Equivalence relations. Equivalence relations decompose a set into disjoint subsets, called equivalence classes. We begin with an example of an equivalence relation on N.
Example 1.35. Fix N ∈ N and say that m ∼ n if m ≡ n (mod N ), meaning that m − n is divisible by N . Two numbers are related by ∼ if they have the same remainder when divided by N . Moreover, N is the union of N equivalence classes, consisting of numbers with remainders 0, 1, . . . , N − 1 modulo N .


The definition of an equivalence relation differs from the definition of an order only by changing antisymmetry to symmetry, but order relations and equivalence relations have completely different properties.
Definition 1.36. An equivalence relation ∼ on a set X is a binary relation on X such that for every x, y, z ∈ X:
(a) x ∼ x (reflexivity);
(b) if x ∼ y then y ∼ x (symmetry);
(c) if x ∼ y and y ∼ z then x ∼ z (transitivity).
For each x ∈ X, the set of elements equivalent to x,
[x/ ∼] = {y ∈ X : x ∼ y} , is called the equivalence class of x with respect to ∼. When the equivalence relation is understood, we write the equivalence class [x/ ∼] simply as [x]. The set of equivalence classes of an equivalence relation ∼ on a set X is denoted by X/ ∼.
Note that each element of X/ ∼ is a subset of X, so X/ ∼ is a subset of the power set P(X) of X.
The following theorem is the basic result about equivalence relations. It says that an equivalence relation on a set partitions the set into disjoint equivalence classes.
Theorem 1.37. Let ∼ be an equivalence relation on a set X. Every equivalence class is non-empty, and X is the disjoint union of the equivalence classes of ∼.
Proof. If x ∈ X, then the reflexivity of ∼ implies that x ∈ [x]. Therefore every equivalence class is non-empty and the union of the equivalence classes is X.
To prove that the union is disjoint, we show that for every x, y ∈ X either
[x] ∩ [y] = ∅ (if x ≁ y) or [x] = [y] (if x ∼ y).
Suppose that [x] ∩ [y] ≠ ∅. Let z ∈ [x] ∩ [y] be an element in both equivalence classes. If x1 ∈ [x], then x1 ∼ z and z ∼ y, so x1 ∼ y by the transitivity of ∼, and therefore x1 ∈ [y]. It follows that [x] ⊂ [y]. A similar argument applied to y1 ∈ [y] implies that [y] ⊂ [x], and therefore [x] = [y]. In particular, y ∈ [x], so x ∼ y. On the other hand, if [x] ∩ [y] = ∅, then y ∉ [x] since y ∈ [y], so x ≁ y.
□
There is a natural projection π : X → X/ ∼, given by π(x) = [x], that maps each element of X to the equivalence class that contains it. Conversely, we can index the collection of equivalence classes
X/ ∼ = {[a] : a ∈ A} by a subset A of X which contains exactly one element from each equivalence class.
It is important to recognize, however, that such an indexing involves an arbitrary choice of a representative element from each equivalence class, and it is better to think in terms of the collection of equivalence classes, rather than a subset of elements.

Example 1.38. The equivalence classes of N relative to the equivalence relation m ∼ n if m ≡ n (mod 3) are given by
I0 = {3, 6, 9, . . . },   I1 = {1, 4, 7, . . . },   I2 = {2, 5, 8, . . . }.


The projection π : N → {I0 , I1 , I2 } maps a number to its equivalence class; e.g., π(101) = I2 . We can choose {1, 2, 3} as a set of representative elements, in which case I0 = [3], I1 = [1], I2 = [2], but any other set A ⊂ N of three numbers with remainders 0, 1, 2 (mod 3) will do.
For example, if we choose A = {7, 15, 101}, then
I0 = [15],   I1 = [7],   I2 = [101].
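As an illustration, the projection π and the classes of Example 1.38 can be sketched in code (a Python sketch; the names `pi` and `classes` are ours, not the text's):

```python
# Sketch of Example 1.38: congruence mod 3 on an initial segment of N.
# pi maps a number to a label for its equivalence class (its remainder).

def pi(n, modulus=3):
    """Project n onto its equivalence class, labeled by n mod 3."""
    return n % modulus

# Group the numbers 1..12 into classes by remainder: labels 0, 1, 2
# correspond to I0, I1, I2.
classes = {}
for n in range(1, 13):
    classes.setdefault(pi(n), []).append(n)

print(classes[0])   # the start of I0 = {3, 6, 9, ...}
print(pi(101))      # 2, so pi(101) = I2, as in the text
```

Any choice of representatives, such as {7, 15, 101}, labels the same three classes, since pi(15) = 0, pi(7) = 1, and pi(101) = 2.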

1.6. Countable and uncountable sets
One way to show that two sets have the same “size” is to pair off their elements.
For example, if we can match up every left shoe in a closet with a right shoe, with no right shoes left over, then we know that we have the same number of left and right shoes. That is, we have the same number of left and right shoes if there is a one-to-one, onto map f : L → R, or one-to-one correspondence, from the set L of left shoes to the set R of right shoes.
We refer to the “size” of a set as measured by one-to-one correspondences as its cardinality. This notion enables us to compare the cardinality of both finite and infinite sets. In particular, we can use it to distinguish between “smaller” countably infinite sets, such as the integers or rational numbers, and “larger” uncountably infinite sets, such as the real numbers.
Definition 1.39. Two sets X, Y have equal cardinality, written X ≈ Y , if there is a one-to-one, onto map f : X → Y . The cardinality of X is less than or equal to the cardinality of Y , written X ≼ Y , if there is a one-to-one (but not necessarily onto) map g : X → Y .
If X ≈ Y , then we also say that X, Y have the same cardinality. We don’t define the notion of a “cardinal number” here, only the relation between sets of
“equal cardinality.”
Note that ≈ is an equivalence relation on any collection of sets. In particular, it is transitive because if X ≈ Y and Y ≈ Z, then there are one-to-one and onto maps f : X → Y and g : Y → Z, so g ◦ f : X → Z is one-to-one and onto, and
X ≈ Z. We may therefore divide any collection of sets into equivalence classes of sets with equal cardinality.
It follows immediately from the definition that ≼ is reflexive and transitive. Furthermore, as stated in the following Schröder-Bernstein theorem, if X ≼ Y and Y ≼ X, then X ≈ Y . This result allows us to prove that two sets have equal cardinality by constructing one-to-one maps that need not be onto. The statement of the theorem is intuitively obvious but the proof, while elementary, is surprisingly involved and can be omitted without loss of continuity. (We will only use the theorem once, in the proof of Theorem 5.67.)

Theorem 1.40 (* Schröder-Bernstein). If X, Y are sets such that there are one-to-one maps f : X → Y and g : Y → X, then there is a one-to-one, onto map h : X → Y .


Proof. We divide X into three disjoint subsets XX , XY , X∞ with different mapping properties as follows.
Consider a point x1 ∈ X. If x1 is not in the range of g, then we say x1 ∈ XX .
Otherwise there exists y1 ∈ Y such that g(y1 ) = x1 , and y1 is unique since g is one-to-one. If y1 is not in the range of f , then we say x1 ∈ XY . Otherwise there exists a unique x2 ∈ X such that f (x2 ) = y1 . Continuing in this way, we generate a sequence of points x1 , y1 , x2 , y2 , . . . , xn , yn , xn+1 , . . . with xn ∈ X, yn ∈ Y and g(yn ) = xn , f (xn+1 ) = yn .

We assign the starting point x1 to a subset in the following way: (a) x1 ∈ XX if the sequence terminates at some xn ∈ X that isn’t in the range of g; (b) x1 ∈ XY if the sequence terminates at some yn ∈ Y that isn’t in the range of f ; (c) x1 ∈ X∞ if the sequence never terminates.
Similarly, if y1 ∈ Y , then we generate a sequence of points y1 , x1 , y2 , x2 , . . . , yn , xn , yn+1 , . . . with xn ∈ X, yn ∈ Y by f (xn ) = yn , g(yn+1 ) = xn ,

and we assign y1 to a subset YX , YY , or Y∞ of Y as follows: (a) y1 ∈ YX if the sequence terminates at some xn ∈ X that isn’t in the range of g; (b) y1 ∈ YY if the sequence terminates at some yn ∈ Y that isn’t in the range of f ; (c) y1 ∈ Y∞ if the sequence never terminates.
We claim that f : XX → YX is one-to-one and onto. First, if x ∈ XX , then f (x) ∈ YX because the sequence generated by f (x) coincides with the sequence generated by x after its first term, so both sequences terminate at a point in X.
Second, if y ∈ YX , then there is x ∈ X such that f (x) = y, since otherwise the sequence would terminate at y ∈ Y , meaning that y ∈ YY . Furthermore, we must have x ∈ XX because the sequence generated by x is a continuation of the sequence generated by y and therefore also terminates at a point in X. Finally, f is one-to-one on XX since f is one-to-one on X.
The same argument applied to g : YY → XY implies that g is one-to-one and onto, so g −1 : XY → YY is one-to-one and onto.
Finally, similar arguments show that f : X∞ → Y∞ is one-to-one and onto: If x ∈ X∞ , then the sequence generated by f (x) ∈ Y doesn’t terminate, so f (x) ∈ Y∞ ; and every y ∈ Y∞ is the image of a point x ∈ X which, like y, generates a sequence that does not terminate, so x ∈ X∞ .
It then follows that h : X → Y defined by

h(x) = f (x) if x ∈ XX ,   h(x) = g −1 (x) if x ∈ XY ,   h(x) = f (x) if x ∈ X∞

is a one-to-one, onto map from X to Y .


We can use the cardinality relation to describe the “size” of a set by comparing it with standard sets.
Definition 1.41. A set X is:
(1) Finite if it is the empty set or X ≈ {1, 2, . . . , n} for some n ∈ N;
(2) Countably infinite (or denumerable) if X ≈ N;
(3) Infinite if it is not finite;
(4) Countable if it is finite or countably infinite;
(5) Uncountable if it is not countable.
We’ll take for granted some intuitively obvious facts which follow from the definitions. For example, a finite, non-empty set is in one-to-one correspondence with {1, 2, . . . , n} for a unique natural number n ∈ N (the number of elements in the set), a countably infinite set is not finite, and a subset of a countable set is countable. According to Definition 1.41, we may divide sets into disjoint classes of finite, countably infinite, and uncountable sets. We also distinguish between finite and infinite sets, and countable and uncountable sets. We will show below, in Theorem 2.19, that the set of real numbers is uncountable, and we refer to its cardinality as the cardinality of the continuum.
Definition 1.42. A set X has the cardinality of the continuum if X ≈ R.
One has to be careful in extrapolating properties of finite sets to infinite sets.
Example 1.43. The set of squares
S = {1, 4, 9, 16, . . . , n2 , . . . } is countably infinite since f : N → S defined by f (n) = n2 is one-to-one and onto.
It may appear surprising at first that the set N can be in one-to-one correspondence with an apparently "smaller" proper subset S, since this doesn't happen for finite sets. In fact, assuming the axiom of choice, one can show that a set is infinite if and only if it has the same cardinality as a proper subset. Dedekind (1888) used this property to give a definition of infinite sets that did not depend on the natural numbers N.
Next, we prove some results about countable sets. The following proposition states a useful necessary and sufficient condition for a set to be countable.
Proposition 1.44. A non-empty set X is countable if and only if there is an onto map f : N → X.
Proof. If X is countably infinite, then there is a one-to-one, onto map f : N → X.
If X is finite and non-empty, then for some n ∈ N there is a one-to-one, onto map g : {1, 2, . . . , n} → X. Choose any x ∈ X and define the onto map f : N → X by f (k) = g(k) if k = 1, 2, . . . , n, and f (k) = x if k = n + 1, n + 2, . . . .


Conversely, suppose that such an onto map exists. We define a one-to-one, onto map g recursively by omitting repeated values of f . Explicitly, let g(1) = f (1).
Suppose that n ≥ 1 and we have chosen n distinct g-values g(1), g(2), . . . , g(n). Let
An = {k ∈ N : f (k) ≠ g(j) for every j = 1, 2, . . . , n} denote the set of natural numbers whose f -values are not already included among the g-values. If An = ∅, then g : {1, 2, . . . , n} → X is one-to-one and onto, and X is finite. Otherwise, let kn = min An , and define g(n + 1) = f (kn ), which is distinct from all of the previous g-values. Either this process terminates, and X is finite, or we go through all the f -values and obtain a one-to-one, onto map g : N → X, and
X is countably infinite.
If X is a countable set, then we refer to an onto function f : N → X as an enumeration of X, and write X = {xn : n ∈ N}, where xn = f (n).
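The construction in the proof of Proposition 1.44, omitting repeated f-values to obtain a repetition-free enumeration, can be sketched as follows (a Python sketch; the function name is ours, and X is taken finite so the process visibly terminates):

```python
# Sketch of the proof of Proposition 1.44: given an onto map f : N -> X,
# build an enumeration g without repeats by skipping f-values seen before.

def enumerate_without_repeats(f, how_many):
    """Return the first `how_many` distinct values f(1), f(2), ... in order.

    Assumes f takes at least `how_many` distinct values, so the loop ends.
    """
    seen = []
    k = 1
    while len(seen) < how_many:
        value = f(k)
        if value not in seen:   # k lies in A_n: f(k) differs from g(1),...,g(n)
            seen.append(value)
        k += 1
    return seen

# An onto map from N to X = {0, 1, 2} with many repeated values.
f = lambda k: k % 3
print(enumerate_without_repeats(f, 3))  # [1, 2, 0]
```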
Proposition 1.45. The Cartesian product N × N is countably infinite.
Proof. Define a linear order ≼ on ordered pairs of natural numbers as follows:

(m, n) ≼ (m′ , n′ ) if either m + n < m′ + n′ , or m + n = m′ + n′ and n < n′ .

That is, we arrange N × N in a table

(1, 1) (1, 2) (1, 3) (1, 4) . . .
(2, 1) (2, 2) (2, 3) (2, 4) . . .
(3, 1) (3, 2) (3, 3) (3, 4) . . .
(4, 1) (4, 2) (4, 3) (4, 4) . . .
 . . .
and list it along successive diagonals from bottom-left to top-right as
(1, 1), (2, 1), (1, 2), (3, 1), (2, 2), (1, 3), (4, 1), (3, 2), (2, 3), (1, 4), . . . .
We define f : N → N × N by setting f (n) equal to the nth pair in this order; for example, f (7) = (4, 1). Then f is one-to-one and onto, so N × N is countably infinite. □

Theorem 1.46. A countable union of countable sets is countable.
Proof. Let {Xn : n ∈ N} be a countable collection of countable sets. From Proposition 1.44, there is an onto map fn : N → Xn . We define

g : N × N → ⋃n∈N Xn

by g(n, k) = fn (k). Then g is also onto. From Proposition 1.45, there is a one-to-one, onto map h : N → N × N, and it follows that

g ◦ h : N → ⋃n∈N Xn

is onto, so Proposition 1.44 implies that the union of the Xn is countable.


The next theorem gives a fundamental example of an uncountable set, namely the set of all subsets of natural numbers. The proof uses a “diagonal” argument due to Cantor (1891), which is of frequent use in analysis. Recall from Definition 1.1 that the power set of a set is the collection of all its subsets.
Theorem 1.47. The power set P(N) of N is uncountable.
Proof. Let C ⊂ P(N) be a countable collection of subsets of N,

C = {An ⊂ N : n ∈ N} .

Define a subset A ⊂ N by

A = {n ∈ N : n ∉ An } .

Then A ≠ An for every n ∈ N, since either n ∈ A and n ∉ An or n ∉ A and n ∈ An . Thus, A ∉ C. It follows that no countable collection of subsets of N includes all of the subsets of N, so P(N) is uncountable.
This theorem has an immediate corollary for the set Σ of binary sequences defined in Definition 1.29.
Corollary 1.48. The set Σ of binary sequences has the same cardinality as P(N) and is uncountable.
Proof. By Example 1.30, the set Σ is in one-to-one correspondence with P(N), which is uncountable.
It is instructive to write the diagonal argument in terms of binary sequences. Suppose that S = {sn ∈ Σ : n ∈ N} is a countable set of binary sequences that begins, for example, as follows:

s1 = 0 0 1 1 0 1 . . .
s2 = 1 1 0 0 1 0 . . .
s3 = 1 1 0 1 1 0 . . .
s4 = 0 1 1 0 0 0 . . .
s5 = 1 0 0 1 1 1 . . .
s6 = 1 0 0 1 0 0 . . .
. . .

Then we get a sequence s ∉ S by going down the diagonal and switching the values from 0 to 1 or from 1 to 0. For the previous sequences, this gives s = 1 0 1 1 0 1 . . . .
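This diagonal flip is easy to express in code (a Python sketch; `diagonal_flip` is our name, and the six sequences above are truncated to six terms):

```python
# Cantor's diagonal argument on binary sequences: produce a sequence that
# differs from the n-th listed sequence in its n-th place.

def diagonal_flip(S):
    """Given a list of equal-length 0/1 strings, flip the diagonal entries."""
    return ''.join('1' if S[n][n] == '0' else '0' for n in range(len(S)))

S = ['001101', '110010', '110110', '011000', '100111', '100100']
print(diagonal_flip(S))  # '101101', which differs from every sequence in S
```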
We will show in Theorem 5.67 below that Σ and P(N) are also in one-to-one correspondence with R, so both have the cardinality of the continuum.
A similar diagonal argument to the one used in Theorem 1.47 shows that for every set X the cardinality of the power set P(X) is strictly greater than the cardinality of X. In particular, the cardinality of P(P(N)) is strictly greater than the cardinality of P(N), the cardinality of P(P(P(N))) is strictly greater than


the cardinality of P(P(N)), and so on. Thus, there are many other uncountable cardinalities apart from the cardinality of the continuum.
Cantor (1878) raised the question of whether or not there are any sets whose cardinality lies strictly between that of N and P(N). The statement that there are no such sets is called the continuum hypothesis, which may be formulated as follows.

Hypothesis 1.49 (Continuum). If C ⊂ P(N) is infinite, then either C ≈ N or C ≈ P(N).
The work of Gödel (1940) and Cohen (1963) established the remarkable result that the continuum hypothesis cannot be proved or disproved from the standard axioms of set theory (assuming, as we believe to be the case, that these axioms are consistent). This result illustrates a fundamental and unavoidable incompleteness in the ability of any finite system of axioms to capture the properties of any mathematical structure that is rich enough to include the natural numbers.

Chapter 2

Numbers

God created the integers and the rest is the work of man. (Leopold Kronecker, in an after-dinner speech at a conference, Berlin, 1886)

"God created the integers and the rest is the work of man." This maxim spoken by the algebraist Kronecker reveals more about his past as a banker who grew rich through monetary speculation than about his philosophical insight. There is hardly any doubt that, from a psychological and, for the writer, ontological point of view, the geometric continuum is the primordial entity. If one has any consciousness at all, it is consciousness of time and space; geometric continuity is in some way inseparably bound to conscious thought. (René Thom, 1986)

In this chapter, we describe the properties of the basic number systems. We briefly discuss the integers and rational numbers, and then consider the real numbers in more detail.
The real numbers form a complete number system which includes the rational numbers as a dense subset. We will summarize the properties of the real numbers in a list of intuitively reasonable axioms, which we assume in everything that follows.
These axioms are of three types: (a) algebraic; (b) ordering; (c) completeness. The completeness of the real numbers is what distinguishes them from the rational numbers and is the essential property for analysis.
The rational numbers may be constructed from the natural numbers as pairs of integers, and there are several ways to construct the real numbers from the rational numbers. For example, Dedekind used cuts of the rationals, while Cantor used equivalence classes of Cauchy sequences of rational numbers. The real numbers that are constructed in either way satisfy the axioms given in this chapter.
These constructions show that the real numbers are as well-founded as the natural numbers (at least, if we take set theory for granted), but they don’t lead to any new properties of the real numbers, and we won’t describe them here.


2.1. Integers
Why then is this view [the induction principle] imposed upon us with such an irresistible weight of evidence? It is because it is only the affirmation of the power of the mind which knows it can conceive of the indefinite repetition of the same act, when that act is once possible. (Poincaré, 1902)
The set of natural numbers, or positive integers, is
N = {1, 2, 3, . . . } .
We add and multiply natural numbers in the usual way. (The formal algebraic properties of addition and multiplication on N follow from the ones stated below for R.)
An essential property of the natural numbers is the following induction principle, which expresses the idea that we can reach every natural number by counting upwards from one.
Axiom 2.1. Suppose that A ⊂ N is a set of natural numbers such that: (a) 1 ∈ A;
(b) n ∈ A implies (n + 1) ∈ A. Then A = N.
This principle, together with appropriate algebraic properties, is enough to completely characterize the natural numbers. For example, one standard set of axioms is the Peano axioms, first stated by Dedekind [3], but we won’t describe them in detail here.
As an illustration of how induction can be used, we prove the following result for the sum of the first n squares, written in summation notation as

∑_{k=1}^{n} k^2 = 1^2 + 2^2 + 3^2 + · · · + n^2 .

Proposition 2.2. For every n ∈ N,

∑_{k=1}^{n} k^2 = (1/6) n(n + 1)(2n + 1).

Proof. Let A be the set of n ∈ N for which this identity holds. It holds for n = 1, so 1 ∈ A. Suppose the identity holds for some n ∈ N. Then

∑_{k=1}^{n+1} k^2 = ∑_{k=1}^{n} k^2 + (n + 1)^2
= (1/6) n(n + 1)(2n + 1) + (n + 1)^2
= (1/6) (n + 1)(2n^2 + 7n + 6)
= (1/6) (n + 1)(n + 2)(2n + 3).

It follows that the identity holds when n is replaced by n + 1. Thus n ∈ A implies that (n + 1) ∈ A, so A = N, and the proposition follows by induction.


Note that the right hand side of the identity in Proposition 2.2 is always an integer, as it must be, since one of n, n + 1 is divisible by 2 and one of n, n + 1,
2n + 1 is divisible by 3.
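A quick numerical check of Proposition 2.2 and of the divisibility remark above, which is of course an illustration rather than a proof, can be written as:

```python
# Verify the sum-of-squares formula and the divisibility of the numerator
# n(n+1)(2n+1) by 6 for the first fifty natural numbers.

def sum_of_squares(n):
    """Compute 1^2 + 2^2 + ... + n^2 directly."""
    return sum(k * k for k in range(1, n + 1))

for n in range(1, 51):
    assert sum_of_squares(n) == n * (n + 1) * (2 * n + 1) // 6
    assert n * (n + 1) * (2 * n + 1) % 6 == 0   # the closed form is an integer
print("verified for n = 1, ..., 50")
```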
Equations for the sum of the first n cubes,

∑_{k=1}^{n} k^3 = (1/4) n^2 (n + 1)^2 ,

and other powers can be proved by induction in a similar way. Another example of a result that can be proved by induction is the Euler-Binet formula in Proposition 3.9 for the terms in the Fibonacci sequence.
One defect of such a proof by induction is that although it verifies the result, it does not explain where the original hypothesis comes from. A separate argument is often required to come up with a plausible hypothesis. For example, it is reasonable to guess that the sum of the first n squares might be a cubic polynomial in n. The possible values of the coefficients can then be found by evaluating the first few sums, after which the general result may be verified by induction.
The set of integers consists of the natural numbers, their negatives (or additive inverses), and zero (the additive identity):
Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . . } .
We can add, subtract, and multiply integers in the usual way. In algebraic terminology, (Z, +, ·) is a commutative ring with identity.
Like the natural numbers N, the integers Z are countably infinite.
Proposition 2.3. The set of integers Z is countably infinite.
Proof. The function f : N → Z defined by f (1) = 0 and, for n ≥ 1,

f (2n) = n,   f (2n + 1) = −n,

is one-to-one and onto.
The function in the previous proof corresponds to listing the integers as
0, 1, −1, 2, −2, 3, −3, 4, −4, 5, −5, . . . .
Alternatively, but less directly, we can prove Proposition 2.3 by writing
Z = −N ∪ {0} ∪ N as a countable union of countable sets and applying Theorem 1.46.

2.2. Rational numbers
A rational number is a ratio of integers. We denote the set of rational numbers by

Q = { p/q : p, q ∈ Z and q ≠ 0 },

where we may cancel common factors from the numerator and denominator, meaning that

p1 /q1 = p2 /q2 if and only if p1 q2 = p2 q1 .


We can add, subtract, multiply, and divide (except by 0) rational numbers in the usual way. In algebraic terminology, (Q, +, ·) is a field. We state the field axioms explicitly for R in Axiom 2.6 below.
We can construct Q from Z as the collection of equivalence classes in Z×Z\{0} with respect to the equivalence relation (p1 , q1 ) ∼ (p2 , q2 ) if p1 q2 = p2 q1 . The usual sums and products of rational numbers are well-defined on these equivalence classes.
The rational numbers are linearly ordered by their standard order, and this order is compatible with the algebraic structure of Q. Thus, (Q, +, ·, <) is an ordered field. Like the integers, the rational numbers are countably infinite, as the following proof shows.

Proof. Let Q+ = {x ∈ Q : x > 0} denote the set of positive rational numbers, and define the onto (but not one-to-one) map

g : N × N → Q+ ,   g(p, q) = p/q.

Let h : N → N × N be a one-to-one, onto map, as obtained in Proposition 1.45, and define f : N → Q+ by f = g ◦ h. Then f : N → Q+ is onto, and Proposition 1.44 implies that Q+ is countable. It follows that Q = Q− ∪ {0} ∪ Q+ , where Q− ≈ Q+ denotes the set of negative rational numbers, is countable.
Alternatively, we can write

Q = ⋃_{q∈N} {p/q : p ∈ Z}

as a countable union of countable sets, and use Theorem 1.46. As we prove in Theorem 2.19, the real numbers are uncountable, so there are many "more" irrational numbers than rational numbers.

2.3. Real numbers: algebraic properties
The algebraic properties of R are summarized in the following axioms, which state that (R, +, ·) is a field.
Axiom 2.6. There exist binary operations a, m : R × R → R, written a(x, y) = x + y and m(x, y) = x · y = xy, and elements 0, 1 ∈ R such that for all x, y, z ∈ R:
(a) x + 0 = x (existence of an additive identity 0);
(b) for every x ∈ R there exists y ∈ R such that x+y = 0 (existence of an additive inverse y = −x);
(c) x + (y + z) = (x + y) + z (addition is associative);
(d) x + y = y + x (addition is commutative);
(e) x1 = x (existence of a multiplicative identity 1);
(f) for every x ∈ R \ {0}, there exists y ∈ R such that xy = 1 (existence of a multiplicative inverse y = x−1 );
(g) x(yz) = (xy)z (multiplication is associative);
(h) xy = yx (multiplication is commutative);
(i) (x + y)z = xz + yz (multiplication is distributive over addition).


Axioms (a)–(d) say that R is a commutative group with respect to addition; axioms (e)–(h) say that R \ {0} is a commutative group with respect to multiplication; and axiom (i) says that addition and multiplication are compatible, in the sense that they satisfy a distributive law.
All of the usual algebraic properties of addition, subtraction (subtracting x means adding −x), multiplication, and division (dividing by x means multiplying by x−1 ) follow from these axioms, although we will not derive them in detail. The natural number n ∈ N is obtained by adding one to itself n times, the integer −n is its additive inverse, and p/q = pq −1 , where p, q are integers with q ≠ 0, is a rational number. Thus, N ⊂ Z ⊂ Q ⊂ R.

2.4. Real numbers: ordering properties
The real numbers have a natural order relation that is compatible with their algebraic structure. We visualize the ordered real numbers as the real line, with smaller numbers to the left and larger numbers to the right.
Axiom 2.7. There is a strict linear order < on R such that for all x, y, z ∈ R:
(a) either x < y, x = y, or x > y;
(b) if x < y then x + z < y + z;
(c) if x < y and z > 0, then xz < yz.
For any a, b ∈ R with a ≤ b, we define the open intervals
(−∞, b) = {x ∈ R : x < b} ,
(a, b) = {x ∈ R : a < x < b} ,
(a, ∞) = {x ∈ R : a < x} , the closed intervals
(−∞, b] = {x ∈ R : x ≤ b} ,
[a, b] = {x ∈ R : a ≤ x ≤ b} ,
[a, ∞) = {x ∈ R : a ≤ x} , and the half-open intervals
(a, b] = {x ∈ R : a < x ≤ b} ,
[a, b) = {x ∈ R : a ≤ x < b} .
All standard properties of inequalities follow from Axiom 2.6 and Axiom 2.7.
For example: if x < y and z < 0, then xz > yz, meaning that the direction of an inequality is reversed when it is multiplied by a negative number; and x2 > 0 for every x = 0. In future, when we write an inequality such as x < y, we will implicitly require that x, y ∈ R.
Real numbers satisfy many inequalities. A simple, but fundamental, example is the following.
Proposition 2.8. If x, y ∈ R, then

xy ≤ (1/2)(x^2 + y^2 ),


with equality if and only if x = y.
Proof. We have
0 ≤ (x − y)2 = x2 − 2xy + y 2 , with equality if and only if x = y, so 2xy ≤ x2 + y 2 .


On writing x = √a, y = √b, where a, b ≥ 0, in the result of Proposition 2.8, we get that

√(ab) ≤ (a + b)/2,

which says that the geometric mean of two nonnegative numbers is less than or equal to their arithmetic mean, with equality if and only if the numbers are equal.
A geometric interpretation of this inequality is that the square-root of the area of a rectangle is less than or equal to one-quarter of its perimeter, with equality if and only if the rectangle is a square. Thus, a square encloses the largest area among all rectangles of a given perimeter, which is a simple form of an isoperimetric inequality. The arithmetic-geometric mean inequality generalizes to more than two numbers: If n ∈ N and a1 , a2 , . . . , an ≥ 0 are nonnegative real numbers, then

(a1 a2 . . . an )^{1/n} ≤ (a1 + a2 + · · · + an )/n,

with equality if and only if all of the ak are equal. For a proof, see e.g., Steele [13].
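A numerical illustration, not a proof, of the n-variable inequality (Python; `am` and `gm` are our names):

```python
# Check the arithmetic-geometric mean inequality on a few sample lists,
# including the equality case where all entries coincide.

import math

def am(a):
    """Arithmetic mean of a nonempty list of nonnegative numbers."""
    return sum(a) / len(a)

def gm(a):
    """Geometric mean of a nonempty list of nonnegative numbers."""
    return math.prod(a) ** (1 / len(a))

assert gm([2.0, 8.0]) <= am([2.0, 8.0])              # 4 <= 5
assert gm([1.0, 2.0, 3.0, 4.0]) <= am([1.0, 2.0, 3.0, 4.0])
assert math.isclose(gm([5.0, 5.0, 5.0]), am([5.0, 5.0, 5.0]))  # equality
print("AM-GM holds on the samples")
```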

2.5. The supremum and infimum
Next, we use the ordering properties of R to define the supremum and infimum of a set of real numbers. These concepts are of central importance in analysis. In particular, in the next section we use them to state the completeness property of
R.
First, we define upper and lower bounds.
Definition 2.9. A set A ⊂ R of real numbers is bounded from above if there exists a real number M ∈ R, called an upper bound of A, such that x ≤ M for every x ∈ A. Similarly, A is bounded from below if there exists m ∈ R, called a lower bound of A, such that x ≥ m for every x ∈ A. A set is bounded if it is bounded both from above and below.
Equivalently, a set A is bounded if A ⊂ I for some bounded interval I = [m, M ].
Example 2.10. The interval (0, 1) is bounded from above by every M ≥ 1 and from below by every m ≤ 0. The interval (−∞, 0) is bounded from above by every
M ≥ 0, but it is not bounded from below. The set of integers Z is not bounded from above or below.
If A ⊂ R, we define −A ⊂ R by
−A = {y ∈ R : y = −x for some x ∈ A} .
For example, if A = (0, ∞) consists of the positive real numbers, then −A =
(−∞, 0) consists of the negative real numbers. A number m is a lower bound of


A if and only if M = −m is an upper bound of −A. Thus, every result for upper bounds has a corresponding result for lower bounds, and we will often consider only upper bounds.
Definition 2.11. Suppose that A ⊂ R is a set of real numbers. If M ∈ R is an upper bound of A such that M ≤ M′ for every upper bound M′ of A, then M is called the least upper bound or supremum of A, denoted

M = sup A.

If m ∈ R is a lower bound of A such that m ≥ m′ for every lower bound m′ of A, then m is called the greatest lower bound or infimum of A, denoted

m = inf A.
If A = {xi : i ∈ I} is an indexed subset of R, we also write

sup A = sup_{i∈I} xi ,   inf A = inf_{i∈I} xi .

As an immediate consequence of the definition, we note that the supremum (or infimum) of a set is unique if one exists: If M, M′ are suprema of A, then M ≤ M′ since M′ is an upper bound of A and M is a least upper bound; similarly, M′ ≤ M, so M = M′. Furthermore, the supremum of a nonempty set A is always greater than or equal to its infimum if both exist. To see this, choose any x ∈ A. Since inf A is a lower bound and sup A is an upper bound of A, we have inf A ≤ x ≤ sup A.
If sup A ∈ A, then we also denote it by max A and refer to it as the maximum of
A; and if inf A ∈ A, then we also denote it by min A and refer to it as the minimum of A. As the following examples illustrate, sup A and inf A may or may not belong to A, so the concepts of supremum and infimum must be clearly distinguished from those of maximum and minimum.
Example 2.12. Every finite set of real numbers
A = {x1 , x2 , . . . , xn } is bounded. Its supremum is the greatest element, sup A = max{x1 , x2 , . . . , xn }, and its infimum is the smallest element, inf A = min{x1 , x2 , . . . , xn }.
Both the supremum and infimum of a finite set belong to the set.
Example 2.13. If A = (0, 1), then every M ≥ 1 is an upper bound of A. The least upper bound is M = 1, so sup(0, 1) = 1.
Similarly, every m ≤ 0 is a lower bound of A, so inf(0, 1) = 0.
In this case, neither sup A nor inf A belong to A. The set R = (0, 1) ∩ Q of rational numbers in (0, 1), the closed interval B = [0, 1], and the half-open interval C = (0, 1] all have the same supremum and infimum as A. Neither sup R nor inf R belong to
R, while both sup B and inf B belong to B, and only sup C belongs to C.


Example 2.14. Let

A = { 1/n : n ∈ N }

be the set of reciprocals of the natural numbers. Then sup A = 1, which belongs to A, and inf A = 0, which does not belong to A.
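Example 2.14 can be illustrated numerically (a Python sketch on finite truncations of A; this is evidence, not a proof):

```python
# On the finite truncation {1/n : n <= N}, the maximum is always 1, which
# is the supremum of A, while the minimum 1/N is positive but approaches
# the infimum 0 as N grows.

def truncation_extremes(N):
    """Return (max, min) of the first N reciprocals 1/1, ..., 1/N."""
    A = [1 / n for n in range(1, N + 1)]
    return max(A), min(A)

for N in (10, 100, 1000):
    hi, lo = truncation_extremes(N)
    assert hi == 1.0 and lo == 1 / N > 0
print("sup A = 1 is attained; inf A = 0 is approached but never attained")
```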

A set must be bounded from above to have a supremum (or bounded from below to have an infimum), but the following notation for unbounded sets is convenient.
We introduce a system of extended real numbers

R̄ = {−∞} ∪ R ∪ {∞},

which includes two new elements denoted −∞ and ∞, ordered so that −∞ < x < ∞ for every x ∈ R.
Definition 2.15. If a set A ⊂ R is not bounded from above, then sup A = ∞, and if A is not bounded from below, then inf A = −∞.
For example, sup N = ∞ and inf R = −∞. We also define sup ∅ = −∞ and inf ∅ = ∞, since, by a strict interpretation of logic, every real number is both an upper and a lower bound of the empty set. With these conventions, every set of real numbers has a supremum and an infimum in R̄. Moreover, we may define the supremum and infimum of sets of extended real numbers in an obvious way; for example, sup A = ∞ if ∞ ∈ A and inf A = −∞ if −∞ ∈ A.
While R̄ is linearly ordered, we cannot make it into a field however we extend addition and multiplication from R to R̄. Expressions such as ∞ − ∞ or 0 · ∞ are inherently ambiguous. To avoid any possible confusion, we will give explicit definitions in terms of R alone for every expression that involves ±∞. Moreover, when we say that sup A or inf A exists, we will always mean that it exists as a real number, not as an extended real number. To emphasize this meaning, we will sometimes say that the supremum or infimum "exists as a finite real number."

2.6. Real numbers: completeness
The rational numbers Q and real numbers R have similar algebraic and order properties (they are both densely ordered fields). The crucial property that distinguishes
R from Q is its completeness. There are two main ways to define the completeness of R. The first, which we describe here, is based on the order properties of R and the existence of suprema. The second, which we describe in Chapter 3, is based on the metric properties of R and the convergence of Cauchy sequences.
We begin with an example that illustrates the difference between Q and R.
Example 2.16. Define A ⊂ Q by
A = {x ∈ Q : x^2 < 2}.

Then A is bounded from above by every M ∈ Q+ such that M^2 > 2. Nevertheless, A has no supremum in Q because √2 is irrational: for every upper bound M ∈ Q there exists M′ ∈ Q such that √2 < M′ < M , so M isn't a least upper bound of A in Q. On the other hand, A has a supremum in R, namely sup A = √2.


The following axiomatic property of the real numbers is called Dedekind completeness. Dedekind (1872) showed that the real numbers are characterized by the condition that they are a complete ordered field (that is, by Axiom 2.6, Axiom 2.7, and Axiom 2.17).
Axiom 2.17. Every nonempty set of real numbers that is bounded from above has a supremum.
Since inf A = − sup(−A) and A is bounded from below if and only if −A is bounded from above, it follows that every nonempty set of real numbers that is bounded from below has an infimum. The restriction to nonempty sets in Axiom 2.17 is necessary, since the empty set is bounded from above, but its supremum does not exist.
As a first application of this axiom, we prove that R has the Archimedean property, meaning that no real number is greater than every natural number.
Theorem 2.18. If x ∈ R, then there exists n ∈ N such that x < n.
Proof. Suppose, for contradiction, that there exists x ∈ R such that x ≥ n for every n ∈ N. Then x is an upper bound of N, so N has a supremum M = sup N ∈ R.
Since n + 1 ∈ N for every n ∈ N, we have n + 1 ≤ M for every n ∈ N, which implies that n ≤ M − 1 for every n ∈ N. But then M − 1 is an upper bound of N, which contradicts the assumption that M is a least upper bound.
By taking reciprocals, we also get from this theorem that for every ε > 0 there exists n ∈ N such that 0 < 1/n < ε.

These results say roughly that there are no infinite or infinitesimal real numbers. This property is consistent with our intuitive picture of a real line R that does not “extend past the natural numbers,” where the natural numbers are obtained by counting upwards from 1. Robinson (1961) introduced extensions of the real numbers, called non-standard real numbers, which form non-Archimedean ordered fields with both infinite and infinitesimal elements, but they do not satisfy
Axiom 2.17.
The following proof of the uncountability of R is based on its completeness and is Cantor’s original proof (1874). The idea is to show that given any countable set of real numbers, there are additional real numbers in the “gaps” between them.
Theorem 2.19. The set of real numbers is uncountable.
Proof. Suppose that
S = {x1, x2, x3, . . . , xn, . . . }
is a countably infinite set of distinct real numbers. We will prove that there is a real number x ∈ R that does not belong to S.
If x1 is the largest element of S, then no real number greater than x1 belongs to S. Otherwise, we select recursively from S an increasing sequence of real numbers ak and a decreasing sequence bk as follows. Let a1 = x1 and choose b1 = xn1 where n1 is the smallest integer such that xn1 > a1. Then xn ∉ (a1, b1) for all 1 ≤ n ≤ n1. If xn ∉ (a1, b1) for all n ∈ N, then no real number in (a1, b1) belongs to S, and we are done (e.g., take x = (a1 + b1)/2). Otherwise, choose a2 = xm2 where m2 > n1 is the smallest integer such that a1 < xm2 < b1. Then xn ∉ (a2, b1) for all 1 ≤ n ≤ m2. If xn ∉ (a2, b1) for all n ∈ N, we are done. Otherwise, choose b2 = xn2 where n2 > m2 is the smallest integer such that a2 < xn2 < b1.
Continuing in this way, we either stop after finitely many steps and get an interval that is not included in S, or we get subsets {a1, a2, . . . } and {b1, b2, . . . } of {x1, x2, . . . } such that
a1 < a2 < · · · < ak < · · · < bk < · · · < b2 < b1.
It follows from the construction that for each n ∈ N, we have xn ∉ (ak, bk) when k is sufficiently large. Let
a = sup{ak : k ∈ N},  b = inf{bk : k ∈ N},
which exist by the completeness of R. Then a ≤ b (see Proposition 2.22 below) and x ∉ S if a ≤ x ≤ b, which proves the result.
This theorem shows that R is uncountable, but it doesn’t show that R has the same cardinality as the power set P(N) of the natural numbers, whose uncountability was proved in Theorem 1.47. In Theorem 5.67, we show that R has the same cardinality as P(N); this provides a second proof that R is uncountable and shows that P(N) has the cardinality of the continuum.

2.7. Properties of the supremum and infimum
In this section, we collect some properties of the supremum and infimum for later use. This section can be referred back to as needed.
First, we state an equivalent way to characterize the supremum and infimum, which is an immediate consequence of Definition 2.11.
Proposition 2.20. If A ⊂ R, then M = sup A if and only if: (a) M is an upper bound of A; (b) for every M′ < M there exists x ∈ A such that x > M′. Similarly, m = inf A if and only if: (a) m is a lower bound of A; (b) for every m′ > m there exists x ∈ A such that x < m′.
We frequently use this proposition as follows: (a) if M is an upper bound of A, then sup A ≤ M; (b) if A is nonempty and bounded from above, then for every ε > 0, there exists x ∈ A such that x > sup A − ε. Similarly: (a) if m is a lower bound of A, then m ≤ inf A; (b) if A is nonempty and bounded from below, then for every ε > 0, there exists x ∈ A such that x < inf A + ε.
Making a set smaller decreases its supremum and increases its infimum. In the following inequalities, we allow the sup and inf to be extended real numbers.
Proposition 2.21. Suppose that A, B are subsets of R such that A ⊂ B. Then sup A ≤ sup B, and inf A ≥ inf B.
Proof. The result is immediate if B = ∅, since then A = ∅ as well, so we may assume that B is nonempty. If B is not bounded from above, then sup B = ∞, so sup A ≤ sup B.
If B is bounded from above, then sup B is an upper bound of B. Since A ⊂ B, it follows that sup B is an upper bound of A, so sup A ≤ sup B. Similarly, either inf B = −∞ or inf B is a lower bound of A, so inf A ≥ inf B.

32

2. Numbers

The next proposition states that if every element in one set is less than or equal to every element in another set, then the sup of the first set is less than or equal to the inf of the second set.
Proposition 2.22. Suppose that A, B are nonempty sets of real numbers such that x ≤ y for all x ∈ A and y ∈ B. Then sup A ≤ inf B.
Proof. Fix y ∈ B. Since x ≤ y for all x ∈ A, it follows that y is an upper bound of A, so sup A is finite and sup A ≤ y. Hence, sup A is a lower bound of B, so inf B is finite and sup A ≤ inf B.
If A ⊂ R and c ∈ R, then we define cA = {y ∈ R : y = cx for some x ∈ A}.
Multiplication of a set by a positive number multiplies its sup and inf; multiplication by a negative number multiplies them and exchanges their roles.
Proposition 2.23. If c ≥ 0, then
sup cA = c sup A,  inf cA = c inf A.
If c < 0, then
sup cA = c inf A,  inf cA = c sup A.
Proof. The result is obvious if c = 0. If c > 0, then cx ≤ M if and only if x ≤ M/c, which shows that M is an upper bound of cA if and only if M/c is an upper bound of A, so sup cA = c sup A. If c < 0, then cx ≤ M if and only if x ≥ M/c, so M is an upper bound of cA if and only if M/c is a lower bound of A, so sup cA = c inf A. The remaining results follow similarly.
If A, B ⊂ R, then we define
A + B = {z ∈ R : z = x + y for some x ∈ A, y ∈ B} ,
A − B = {z ∈ R : z = x − y for some x ∈ A, y ∈ B} .
Proposition 2.24. If A, B are nonempty sets, then
sup(A + B) = sup A + sup B,  inf(A + B) = inf A + inf B,
sup(A − B) = sup A − inf B,  inf(A − B) = inf A − sup B.
Proof. The set A + B is bounded from above if and only if A and B are bounded from above, so sup(A + B) exists if and only if both sup A and sup B exist. In that case, if x ∈ A and y ∈ B, then x + y ≤ sup A + sup B, so sup A + sup B is an upper bound of A + B, and therefore sup(A + B) ≤ sup A + sup B.
To get the inequality in the opposite direction, suppose that ε > 0. Then there exist x ∈ A and y ∈ B such that
x > sup A − ε/2,  y > sup B − ε/2.
It follows that
x + y > sup A + sup B − ε
for every ε > 0, which implies that sup(A + B) ≥ sup A + sup B.
Thus, sup(A + B) = sup A + sup B. It follows from this result and Proposition 2.23 that
sup(A − B) = sup A + sup(−B) = sup A − inf B.
The proof of the results for inf(A + B) and inf(A − B) is similar, or we can apply the results for the supremum to −A and −B.
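For finite sets, sup and inf reduce to max and min, so the identities above can be spot-checked numerically. The following is a minimal Python sketch; the particular sets A and B are arbitrary choices made for illustration.

```python
# Numerical check of the sup/inf identities for sums and differences of sets,
# on finite samples where sup = max and inf = min. The sets are arbitrary.
A = {-1.5, 0.0, 2.0, 3.5}
B = {-4.0, 1.0, 2.5}

A_plus_B = {x + y for x in A for y in B}
A_minus_B = {x - y for x in A for y in B}

assert max(A_plus_B) == max(A) + max(B)   # sup(A + B) = sup A + sup B
assert min(A_plus_B) == min(A) + min(B)   # inf(A + B) = inf A + inf B
assert max(A_minus_B) == max(A) - min(B)  # sup(A - B) = sup A - inf B
assert min(A_minus_B) == min(A) - max(B)  # inf(A - B) = inf A - sup B
```

Of course, a finite check is no substitute for the proof; it only illustrates the statement.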
Finally, we prove that taking the supremum over a pair of indices gives the same result as taking successive suprema over each index separately.
Proposition 2.25. Suppose that
{xij : i ∈ I, j ∈ J}
is a doubly-indexed set of real numbers. Then
sup_{(i,j)∈I×J} xij = sup_{i∈I} sup_{j∈J} xij.
Proof. For each a ∈ I, we have {a} × J ⊂ I × J, so
sup_{j∈J} xaj ≤ sup_{(i,j)∈I×J} xij.
Taking the supremum of this inequality over a ∈ I, and replacing 'a' by 'i', we get that
sup_{i∈I} sup_{j∈J} xij ≤ sup_{(i,j)∈I×J} xij.
To prove the reverse inequality, first note that if sup_{(i,j)∈I×J} xij is finite, then given ε > 0 there exist a ∈ I, b ∈ J such that
xab > sup_{(i,j)∈I×J} xij − ε.
It follows that
sup_{j∈J} xaj > sup_{(i,j)∈I×J} xij − ε,
and therefore that
sup_{i∈I} sup_{j∈J} xij > sup_{(i,j)∈I×J} xij − ε.
Since ε > 0 is arbitrary, we have
sup_{i∈I} sup_{j∈J} xij ≥ sup_{(i,j)∈I×J} xij.
Similarly, if sup_{(i,j)∈I×J} xij = ∞, then given M ∈ R there exist a ∈ I, b ∈ J such that xab > M, and it follows that
sup_{i∈I} sup_{j∈J} xij ≥ sup_{j∈J} xaj > M.
Since M is arbitrary, we have sup_{i∈I} sup_{j∈J} xij = ∞, which completes the proof.

Chapter 3

Sequences

In this chapter, we discuss sequences. We say what it means for a sequence to converge, and define the limit of a convergent sequence. We begin with some preliminary results about the absolute value, which can be used to define a distance function, or metric, on R. In turn, convergence is defined in terms of this metric.

3.1. The absolute value
Definition 3.1. The absolute value of x ∈ R is defined by
|x| = x if x ≥ 0,  |x| = −x if x < 0.

Some basic properties of the absolute value are the following.
Proposition 3.2. For all x, y ∈ R:
(a) |x| ≥ 0 and |x| = 0 if and only if x = 0;
(b) | − x| = |x|;
(c) |x + y| ≤ |x| + |y| (triangle inequality);
(d) |xy| = |x| |y|.
Proof. Parts (a), (b) follow immediately from the definition. Part (c) remains valid if we change the signs of both x and y or exchange x and y. Therefore we can assume that x ≥ 0 and |x| ≥ |y| without loss of generality, in which case x + y ≥ 0.
If y ≥ 0, corresponding to the case when x and y have the same sign, then
|x + y| = x + y = |x| + |y|.
If y < 0, corresponding to the case when x and y have opposite signs and x + y > 0, then
|x + y| = x + y = |x| − |y| < |x| + |y|,
which proves (c). Part (d) remains valid if we change x to −x or y to −y, so we can assume that x, y ≥ 0 without loss of generality. Then xy ≥ 0 and |xy| = xy = |x||y|.
One useful consequence of the triangle inequality is the following reverse triangle inequality.
Proposition 3.3. If x, y ∈ R, then
||x| − |y|| ≤ |x − y|.
Proof. By the triangle inequality,
|x| = |x − y + y| ≤ |x − y| + |y| so |x| − |y| ≤ |x − y|. Similarly, exchanging x and y, we get |y| − |x| ≤ |x − y|, which proves the result.
We can give an equivalent condition for the boundedness of a set by using the absolute value instead of upper and lower bounds as in Definition 2.9.
Proposition 3.4. A set A ⊂ R is bounded if and only if there exists a real number
M ≥ 0 such that
|x| ≤ M for every x ∈ A.
Proof. If the condition in the proposition holds, then M is an upper bound of A and −M is a lower bound, so A is bounded. Conversely, if A is bounded from above by M′ and from below by m′, then |x| ≤ M for every x ∈ A, where M = max{|m′|, |M′|}.
A third way to say that a set is bounded is in terms of its diameter.
Definition 3.5. Let A ⊂ R. The diameter of A is diam A = sup {|x − y| : x, y ∈ A} .
Then a set is bounded if and only if its diameter is finite.
Example 3.6. If A = (−a, a), then diam A = 2a, and A is bounded. If A = (−∞, a), then diam A = ∞, and A is unbounded.
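For a finite set of reals, the diameter of Definition 3.5 reduces to the difference between the largest and smallest elements. A brief Python check, with an arbitrary sample set:

```python
# Diameter of a finite set: the supremum of |x - y| over all pairs, which for
# a finite set of reals equals max(A) - min(A). The sample set is arbitrary.
A = [-3.0, 0.5, 2.0, 4.0]
diam = max(abs(x - y) for x in A for y in A)
assert diam == max(A) - min(A) == 7.0
```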

3.2. Sequences
A sequence (xn ) of real numbers is an ordered list of numbers xn ∈ R, called the terms of the sequence, indexed by the natural numbers n ∈ N. We often indicate a sequence by listing the first few terms, especially if they have an obvious pattern.
Of course, no finite list of terms is, on its own, sufficient to define a sequence.

Figure 1. A plot of the first 40 terms in the sequence xn = (1 + 1/n)^n, illustrating that it is monotone increasing and converges to e ≈ 2.718, whose value is indicated by the dashed line.

Example 3.7. Here are some sequences:
1, 8, 27, 64, . . . ,  xn = n^3;
1, 1/2, 1/3, 1/4, . . . ,  xn = 1/n;
1, −1, 1, −1, . . . ,  xn = (−1)^(n+1);
(1 + 1), (1 + 1/2)^2, (1 + 1/3)^3, . . . ,  xn = (1 + 1/n)^n.
Note that unlike sets, where elements are not repeated, the terms in a sequence may be repeated.
The formal definition of a sequence is as a function on N, which is equivalent to its definition as a list.
Definition 3.8. A sequence (xn ) of real numbers is a function f : N → R, where xn = f (n).
We can consider sequences of many different types of objects (for example, sequences of functions) but for now we only consider sequences of real numbers, and we will refer to them as sequences for short. A useful way to visualize a sequence (xn) is to plot the graph of xn ∈ R versus n ∈ N. (See Figure 1 for an example.)


If we want to indicate the range of the index n ∈ N explicitly, we write the sequence as (xn)_{n=1}^∞. Sometimes it is convenient to start numbering a sequence from a different integer, such as n = 0 instead of n = 1. In that case, a sequence (xn)_{n=0}^∞ is a function f : N0 → R where xn = f(n) and N0 = {0, 1, 2, 3, . . . }, and similarly for other starting points.
Every function f : N → R defines a sequence, corresponding to an arbitrary choice of a real number xn ∈ R for each n ∈ N. Some sequences can be defined explicitly by giving an expression for the nth term, as in Example 3.7; others can be defined recursively. That is, we specify the value of the initial term (or terms) in the sequence, and define xn as a function of the previous terms (x1, x2, . . . , xn−1).
A well-known example of a recursive sequence is the Fibonacci sequence (Fn),
1, 1, 2, 3, 5, 8, 13, . . . ,
which is defined by F1 = F2 = 1 and
Fn = Fn−1 + Fn−2  for n ≥ 3.

That is, we add the two preceding terms to get the next term. In general, we cannot expect to solve a recursion relation to get an explicit expression for the nth term in a sequence, but the recursion relation for the Fibonacci sequence is linear with constant coefficients, and it can be solved to give an expression for Fn called the
Euler-Binet formula.
Proposition 3.9 (Euler-Binet formula). The nth term in the Fibonacci sequence is given by
Fn = (1/√5) [φ^n − (−1/φ)^n],  φ = (1 + √5)/2.
Proof. The terms in the Fibonacci sequence are uniquely determined by the linear difference equation
Fn − Fn−1 − Fn−2 = 0,  n ≥ 3,
with the initial conditions
F1 = 1,  F2 = 1.
We see that Fn = r^n is a solution of the difference equation if r satisfies
r^2 − r − 1 = 0,
which gives
r = φ or −1/φ,  φ = (1 + √5)/2 ≈ 1.61803.
By linearity,
Fn = Aφ^n + B(−1/φ)^n
is a solution of the difference equation for arbitrary constants A, B. This solution satisfies the initial conditions F1 = F2 = 1 if
A = 1/√5,  B = −1/√5,
which proves the result.


Alternatively, once we know the answer, we can prove Proposition 3.9 by induction. The details are left as an exercise. Note that although the right-hand side of the equation for Fn involves the irrational number √5, its value is an integer for every n ∈ N.
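Such an induction can also be checked numerically. The Python sketch below compares the recursion with the Euler-Binet formula; the function names are our own, and floating-point arithmetic forces us to round the formula's value to the nearest integer before comparing.

```python
# Compare the recursive definition of the Fibonacci numbers with the
# Euler-Binet formula. Floating-point arithmetic is approximate, so the
# formula's value is rounded to the nearest integer.
from math import sqrt

phi = (1 + sqrt(5)) / 2  # the golden ratio

def fib_recursive(n):
    """F_1 = F_2 = 1, F_n = F_{n-1} + F_{n-2}, computed iteratively."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a

def fib_binet(n):
    return round((phi ** n - (-1 / phi) ** n) / sqrt(5))

assert [fib_recursive(n) for n in range(1, 8)] == [1, 1, 2, 3, 5, 8, 13]
assert all(fib_recursive(n) == fib_binet(n) for n in range(1, 40))
```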
The number φ appearing in Proposition 3.9 is called the golden ratio. It has the property that subtracting 1 from it gives its reciprocal:
φ − 1 = 1/φ.

Geometrically, this property means that the removal of a square from a rectangle whose sides are in the ratio φ leaves a rectangle whose sides are in the same ratio.
The number φ was originally defined in Euclid’s Elements as the division of a line in
“extreme and mean ratio,” and Ancient Greek architects arguably used rectangles with this proportion in the Parthenon and other buildings. During the Renaissance, φ was referred to as the “divine proportion.” The first use of the term “golden section” appears to be by Martin Ohm, brother of the physicist Georg Ohm, in a book published in 1835.

3.3. Convergence and limits
Roughly speaking, a sequence (xn ) converges to a limit x if its terms xn get arbitrarily close to x for all sufficiently large n.
Definition 3.10. A sequence (xn) of real numbers converges to a limit x ∈ R, written
x = lim_{n→∞} xn,  or xn → x as n → ∞,
if for every ε > 0 there exists N ∈ N such that
|xn − x| < ε  for all n > N.
A sequence converges if it converges to some limit x ∈ R, otherwise it diverges.
Although we don’t show it explicitly in the definition, N is allowed to depend on . Typically, the smaller we choose , the larger we have to make N . One way to view a proof of convergence is as a game: If I give you an > 0, you have to come up with an N that “works.” Also note that xn → x as n → ∞ means the same thing as |xn − x| → 0 as n → ∞.
It may appear obvious that a limit is unique if one exists, but this fact requires proof. Proposition 3.11. If a sequence converges, then its limit is unique.
Proof. Suppose that (xn) is a sequence such that xn → x and xn → x′ as n → ∞. Let ε > 0. Then there exist N, N′ ∈ N such that
|xn − x| < ε/2  for all n > N,
|xn − x′| < ε/2  for all n > N′.
Choose any n > max{N, N′}. Then, by the triangle inequality,
|x − x′| ≤ |x − xn| + |xn − x′| < ε/2 + ε/2 = ε.
Since this inequality holds for all ε > 0, we must have |x − x′| = 0 (otherwise it would be false for ε = |x − x′|/2 > 0), so x = x′.
The following notation for sequences that “diverge to infinity” is convenient.
Definition 3.12. If (xn) is a sequence, then
lim_{n→∞} xn = ∞,  or xn → ∞ as n → ∞,
if for every M ∈ R there exists N ∈ N such that
xn > M  for all n > N.
Also,
lim_{n→∞} xn = −∞,  or xn → −∞ as n → ∞,
if for every M ∈ R there exists N ∈ N such that
xn < M  for all n > N.

That is, xn → ∞ as n → ∞ means the terms of the sequence (xn ) get arbitrarily large and positive for all sufficiently large n, while xn → −∞ as n → ∞ means that the terms get arbitrarily large and negative for all sufficiently large n. The notation xn → ±∞ does not mean that the sequence converges.
To illustrate these definitions, we discuss the convergence of the sequences in
Example 3.7.
Example 3.13. The terms in the sequence
1, 8, 27, 64, . . . ,  xn = n^3,
eventually exceed any real number, so n^3 → ∞ as n → ∞ and this sequence does not converge. Explicitly, let M ∈ R be given, and choose N ∈ N such that N > M^(1/3). (If −∞ < M < 1, we can choose N = 1.) Then for all n > N, we have
n^3 > N^3 > M,
which proves the result.
Example 3.14. The terms in the sequence
1, 1/2, 1/3, 1/4, . . . ,  xn = 1/n,
get closer to 0 as n gets larger, and the sequence converges to 0:
lim_{n→∞} 1/n = 0.
To prove this limit, let ε > 0 be given. Choose N ∈ N such that N > 1/ε. (Such a number exists by the Archimedean property of R stated in Theorem 2.18.) Then for all n > N,
|1/n − 0| = 1/n < 1/N < ε,
which proves that 1/n → 0 as n → ∞.
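This proof is constructive: given ε > 0, any N > 1/ε works. A small Python sketch of the "ε-N game" for xn = 1/n; the helper `witness_N` is our own illustrative name.

```python
# Sketch of the epsilon-N game for the sequence x_n = 1/n -> 0: given eps,
# produce an N > 1/eps and verify that later terms are within eps of 0.
from math import floor

def witness_N(eps):
    """Return an N (guaranteed by the Archimedean property) with N > 1/eps."""
    return floor(1 / eps) + 1

for eps in [0.5, 0.1, 0.01, 0.001]:
    N = witness_N(eps)
    # every term beyond N is within eps of the limit 0 (checked on a sample)
    assert all(abs(1 / n - 0) < eps for n in range(N + 1, N + 1000))
```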


Example 3.15. The terms in the sequence
1, −1, 1, −1, . . . ,  xn = (−1)^(n+1),
oscillate back and forth infinitely often between 1 and −1, but they do not approach any fixed limit, so the sequence does not converge. To show this explicitly, note that for every x ∈ R we have either |x − 1| ≥ 1 or |x + 1| ≥ 1. It follows that there is no N ∈ N such that |xn − x| < 1 for all n > N. Thus, Definition 3.10 fails for ε = 1 however we choose x ∈ R, and the sequence does not converge.
Example 3.16. The convergence of the sequence
(1 + 1), (1 + 1/2)^2, (1 + 1/3)^3, . . . ,  xn = (1 + 1/n)^n,
illustrated in Figure 1, is less obvious. Its terms are given by
xn = an^n,  an = 1 + 1/n.
As n increases, we take larger powers of numbers that get closer to one. If a > 1 is any fixed real number, then a^n → ∞ as n → ∞, so the sequence (a^n) does not converge (see Proposition 3.31 below for a detailed proof). On the other hand, if a = 1, then 1^n = 1 for all n ∈ N, so the sequence (1^n) converges to 1. Thus, there are two competing factors in the sequence with increasing n: an → 1 but n → ∞. It is not immediately obvious which of these factors "wins."
In fact, they are in balance. As we prove in Proposition 3.32 below, the sequence converges with
lim_{n→∞} (1 + 1/n)^n = e,
where 2 < e < 3. This limit can be taken as the definition of e ≈ 2.71828.
For comparison, one can also show that
lim_{n→∞} (1 + 1/n^2)^n = 1,  lim_{n→∞} (1 + 1/n)^(n^2) = ∞.
In the first case, the rapid approach of an = 1 + 1/n^2 to 1 "beats" the slower growth in the exponent n, while in the second case, the rapid growth in the exponent n^2 "beats" the slower approach of an = 1 + 1/n to 1.
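These three competing limits can be illustrated numerically. In the Python sketch below, the cutoffs and tolerances are arbitrary choices, and the floating-point values are only approximations of the true limits.

```python
# Numerical illustration of the three limits: (1 + 1/n)^n -> e,
# (1 + 1/n^2)^n -> 1, and (1 + 1/n)^(n^2) -> infinity.
from math import e

n = 10_000
assert abs((1 + 1 / n) ** n - e) < 1e-3       # base and exponent in balance
assert abs((1 + 1 / n ** 2) ** n - 1) < 1e-3  # base approaches 1 too fast

m = 100  # kept small so the huge value still fits in a float
assert (1 + 1 / m) ** (m ** 2) > 1e40         # exponent wins: terms blow up
```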
An important property of a sequence is whether or not it is bounded.
Definition 3.17. A sequence (xn ) of real numbers is bounded from above if there exists M ∈ R such that xn ≤ M for all n ∈ N, and bounded from below if there exists m ∈ R such that xn ≥ m for all n ∈ N. A sequence is bounded if it is bounded from above and below, otherwise it is unbounded.
An equivalent condition for a sequence (xn) to be bounded is that there exists M ≥ 0 such that |xn| ≤ M for all n ∈ N.


Example 3.18. The sequence (n^3) is bounded from below but not from above, while the sequences (1/n) and ((−1)^(n+1)) are bounded. The sequence
1, −2, 3, −4, 5, −6, . . . ,  xn = (−1)^(n+1) n,
is not bounded from below or above.
We then have the following property of convergent sequences.
Proposition 3.19. A convergent sequence is bounded.
Proof. Let (xn) be a convergent sequence with limit x. There exists N ∈ N such that
|xn − x| < 1  for all n > N.
The triangle inequality implies that
|xn| ≤ |xn − x| + |x| < 1 + |x|  for all n > N.
Defining
M = max{|x1|, |x2|, . . . , |xN|, 1 + |x|},
we see that |xn| ≤ M for all n ∈ N, so (xn) is bounded.
Thus, boundedness is a necessary condition for convergence, and every unbounded sequence diverges; for example, the unbounded sequence in Example 3.13 diverges. On the other hand, boundedness is not a sufficient condition for convergence; for example, the bounded sequence in Example 3.15 diverges.
The boundedness, or convergence, of a sequence (xn)_{n=1}^∞ depends only on the behavior of the infinite "tails" (xn)_{n=N}^∞ of the sequence, where N is arbitrarily large. Equivalently, the sequence (xn)_{n=1}^∞ and the shifted sequences (xn+N)_{n=1}^∞ have the same convergence properties and limits for every N ∈ N. As a result, changing a finite number of terms in a sequence doesn't alter its boundedness or convergence, nor does it alter the limit of a convergent sequence. In particular, the existence of a limit gives no information about how quickly a sequence converges to its limit.
Example 3.20. Changing the first hundred terms of the sequence (1/n) from 1/n to n, we get the sequence
1, 2, 3, . . . , 99, 100, 1/101, 1/102, 1/103, . . . ,
which is still bounded (although by 100 instead of by 1) and still convergent to 0. Similarly, changing the first billion terms in the sequence doesn't change its boundedness or convergence.
We introduce some convenient terminology to describe the behavior of "tails" of a sequence.
Definition 3.21. Let P (x) denote a property of real numbers x ∈ R. If (xn ) is a real sequence, then P (xn ) holds eventually if there exists N ∈ N such that P (xn ) holds for all n > N ; and P (xn ) holds infinitely often if for every N ∈ N there exists n > N such that P (xn ) holds.


For example, (xn) is bounded if there exists M ≥ 0 such that |xn| ≤ M eventually; and (xn) does not converge to x ∈ R if there exists ε0 > 0 such that |xn − x| ≥ ε0 infinitely often.
Note that if a property P holds infinitely often according to Definition 3.21, then it does indeed hold infinitely often: If N = 1, then there exists n1 > 1 such that P (xn1 ) holds; if N = n1 , then there exists n2 > n1 such that P (xn2 ) holds; then there exists n3 > n2 such that P (xn3 ) holds, and so on.

3.4. Properties of limits
In this section, we prove some order and algebraic properties of limits of sequences.
3.4.1. Monotonicity. Limits of convergent sequences preserve (non-strict) inequalities.
Theorem 3.22. If (xn) and (yn) are convergent sequences and xn ≤ yn for all n ∈ N, then
lim_{n→∞} xn ≤ lim_{n→∞} yn.

Proof. Suppose that xn → x and yn → y as n → ∞. Then for every ε > 0 there exist P, Q ∈ N such that
|x − xn| < ε/2  for all n > P,
|y − yn| < ε/2  for all n > Q.
Choosing n > max{P, Q}, we have
x = xn + x − xn < yn + ε/2 = y + yn − y + ε/2 < y + ε.
Since x < y + ε for every ε > 0, it follows that x ≤ y.

This result, of course, remains valid if the inequality xn ≤ yn holds only for all sufficiently large n. Limits need not preserve strict inequalities. For example,
1/n > 0 for all n ∈ N but limn→∞ 1/n = 0.
It follows immediately that if (xn) is a convergent sequence with m ≤ xn ≤ M for all n ∈ N, then
m ≤ lim_{n→∞} xn ≤ M.
The following "squeeze" or "sandwich" theorem is often useful in proving the convergence of a sequence by bounding it between two simpler convergent sequences with equal limits.
Theorem 3.23 (Sandwich). Suppose that (xn) and (yn) are convergent sequences of real numbers with the same limit L. If (zn) is a sequence such that
xn ≤ zn ≤ yn  for all n ∈ N,
then (zn) also converges to L.


Proof. Let ε > 0 be given, and choose P, Q ∈ N such that
|xn − L| < ε  for all n > P,
|yn − L| < ε  for all n > Q.
If N = max{P, Q}, then for all n > N,
−ε < xn − L ≤ zn − L ≤ yn − L < ε,
which implies that |zn − L| < ε. This proves the result.
It is essential here that (xn ) and (yn ) have the same limit.
Example 3.24. If xn = −1, yn = 1, and zn = (−1)^(n+1), then xn ≤ zn ≤ yn for all n ∈ N, the sequence (xn) converges to −1 and (yn) converges to 1, but (zn) does not converge.
As one consequence, we show that we can take absolute values inside limits.
Corollary 3.25. If xn → x as n → ∞, then |xn | → |x| as n → ∞.
Proof. By the reverse triangle inequality,
0 ≤ | |xn | − |x| | ≤ |xn − x|, and the result follows from Theorem 3.23.
3.4.2. Linearity. Limits respect addition and multiplication. In proving the following theorem, we need to show that the sequences converge, not just obtain expressions for their limits.
Theorem 3.26. Suppose that (xn) and (yn) are convergent real sequences and c ∈ R. Then the sequences (cxn), (xn + yn), and (xn yn) converge, and
lim_{n→∞} cxn = c lim_{n→∞} xn,
lim_{n→∞} (xn + yn) = lim_{n→∞} xn + lim_{n→∞} yn,
lim_{n→∞} (xn yn) = (lim_{n→∞} xn)(lim_{n→∞} yn).
Proof. We let
x = lim_{n→∞} xn,  y = lim_{n→∞} yn.
The first statement is immediate if c = 0. Otherwise, let ε > 0 be given, and choose N ∈ N such that
|xn − x| < ε/|c|  for all n > N.
Then
|cxn − cx| < ε  for all n > N,
which proves that (cxn) converges to cx.
For the second statement, let ε > 0 be given, and choose P, Q ∈ N such that
|xn − x| < ε/2  for all n > P,  |yn − y| < ε/2  for all n > Q.
Let N = max{P, Q}. Then for all n > N, we have
|(xn + yn) − (x + y)| ≤ |xn − x| + |yn − y| < ε,
which proves that (xn + yn) converges to x + y.


For the third statement, note that since (xn) and (yn) converge, they are bounded, and there exists M > 0 such that
|xn|, |yn| ≤ M  for all n ∈ N,
and |x|, |y| ≤ M. Given ε > 0, choose P, Q ∈ N such that
|xn − x| < ε/(2M)  for all n > P,  |yn − y| < ε/(2M)  for all n > Q,
and let N = max{P, Q}. Then for all n > N,
|xn yn − xy| = |(xn − x)yn + x(yn − y)|
≤ |xn − x| |yn| + |x| |yn − y|
≤ M (|xn − x| + |yn − y|)
< ε,
which proves that (xn yn) converges to xy.
Note that the convergence of (xn + yn ) does not imply the convergence of (xn ) and (yn ) separately; for example, take xn = n and yn = −n. If, however, (xn ) converges then (yn ) converges if and only if (xn + yn ) converges.

3.5. Monotone sequences
Monotone sequences have particularly simple convergence properties.
Definition 3.27. A sequence (xn ) of real numbers is increasing if xn+1 ≥ xn for all n ∈ N, decreasing if xn+1 ≤ xn for all n ∈ N, and monotone if it is increasing or decreasing. A sequence is strictly increasing if xn+1 > xn , strictly decreasing if xn+1 < xn , and strictly monotone if it is strictly increasing or strictly decreasing.
We don’t require a monotone sequence to be strictly monotone, but this usage isn’t universal. In some places, “increasing” or “decreasing” is used to mean
“strictly increasing” or “strictly decreasing.” In that case, what we call an increasing sequence is called a nondecreasing sequence and a decreasing sequence is called nonincreasing sequence. We’ll use the more easily understood direct terminology.
Example 3.28. The sequence
1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, . . . is monotone increasing but not strictly monotone increasing; the sequence (n3 ) is strictly monotone increasing; the sequence (1/n) is strictly monotone decreasing; and the sequence ((−1)n+1 ) is not monotone.
Bounded monotone sequences always converge, and unbounded monotone sequences diverge to ±∞.
Theorem 3.29. A monotone sequence of real numbers converges if and only if it is bounded. If (xn) is monotone increasing and bounded, then
lim_{n→∞} xn = sup{xn : n ∈ N},
and if (xn) is monotone decreasing and bounded, then
lim_{n→∞} xn = inf{xn : n ∈ N}.
Furthermore, if (xn) is monotone increasing and unbounded, then
lim_{n→∞} xn = ∞,
and if (xn) is monotone decreasing and unbounded, then
lim_{n→∞} xn = −∞.

Proof. If the sequence converges, then by Proposition 3.19 it is bounded.
Conversely, suppose that (xn ) is a bounded, monotone increasing sequence.
The set of terms {xn : n ∈ N} is bounded from above, so by Axiom 2.17 it has a supremum x = sup{xn : n ∈ N}.
Let ε > 0. From the definition of the supremum, there exists an N ∈ N such that xN > x − ε. Since the sequence is increasing, we have xn ≥ xN for all n > N, and therefore x − ε < xn ≤ x. It follows that
|xn − x| < ε  for all n > N,
which proves that xn → x as n → ∞.
which proves that xn → x as n → ∞.
If (xn ) is an unbounded monotone increasing sequence, then it is not bounded from above, since it is bounded from below by its first term x1 . Hence, for every
M ∈ R there exists N ∈ N such that xN > M . Since the sequence is increasing, we have xn ≥ xN > M for all n > N , which proves that xn → ∞ as n → ∞.
The result for a monotone decreasing sequence (xn ) follows similarly, or by applying the previous result to the monotone increasing sequence (−xn ).
The fact that every bounded monotone sequence has a limit is another way to express the completeness of R. For example, this is not true in Q: an increasing sequence of rational numbers that converges to √2 is bounded from above in Q (for example, by 2) but has no limit in Q.
We sometimes use the notation xn ↑ x to indicate that (xn ) is a monotone increasing sequence that converges to x, and xn ↓ x to indicate that (xn ) is a monotone decreasing sequence that converges to x, with a similar notation for monotone sequences that diverge to ±∞. For example, 1/n ↓ 0 and n3 ↑ ∞ as n → ∞.
The following propositions give some examples of monotone sequences. In the proofs, we use the binomial theorem, which we state without proof.
Theorem 3.30 (Binomial). If x, y ∈ R and n ∈ N, then
(x + y)^n = Σ_{k=0}^{n} C(n, k) x^(n−k) y^k,  C(n, k) = n!/(k!(n − k)!).
Here, n! = 1 · 2 · 3 · · · n and, by convention, 0! = 1. The binomial coefficients
C(n, k) = [n(n − 1)(n − 2) · · · (n − k + 1)] / [1 · 2 · 3 · · · k],
read "n choose k," give the number of ways of choosing k objects from n objects, order not counting. For example,
(x + y)^2 = x^2 + 2xy + y^2,
(x + y)^3 = x^3 + 3x^2 y + 3xy^2 + y^3,
(x + y)^4 = x^4 + 4x^3 y + 6x^2 y^2 + 4xy^3 + y^4.
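The expansion can be checked with exact integer arithmetic; Python's `math.comb` computes the binomial coefficient directly. A brief sketch, with arbitrary test values:

```python
# Check of the binomial theorem for small n using exact integer arithmetic.
# math.comb(n, k) is the binomial coefficient n!/(k!(n-k)!).
from math import comb

def binomial_expand(x, y, n):
    """Evaluate the right-hand side sum of the binomial theorem."""
    return sum(comb(n, k) * x ** (n - k) * y ** k for k in range(n + 1))

for n in range(6):
    assert binomial_expand(3, -2, n) == (3 + (-2)) ** n  # equals 1 for every n
assert comb(4, 2) == 6  # the coefficient of x^2 y^2 in (x + y)^4
```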
We also recall the sum of a geometric series: if a ≠ 1, then
Σ_{k=0}^{n} a^k = (1 − a^(n+1))/(1 − a).
Proposition 3.31. The geometric sequence (a^n)_{n=0}^∞,
1, a, a^2, a^3, . . . ,
is strictly monotone decreasing if 0 < a < 1, with
lim_{n→∞} a^n = 0,
and strictly monotone increasing if 1 < a < ∞, with
lim_{n→∞} a^n = ∞.
Proof. If 0 < a < 1, then 0 < a^(n+1) = a · a^n < a^n, so the sequence (a^n) is strictly monotone decreasing and bounded from below by zero. Therefore by Theorem 3.29 it has a limit x ∈ R. Theorem 3.26 implies that
x = lim_{n→∞} a^(n+1) = lim_{n→∞} a · a^n = a lim_{n→∞} a^n = ax.
Since a ≠ 1, it follows that x = 0.
If a > 1, then a^(n+1) = a · a^n > a^n, so (a^n) is strictly increasing. Let a = 1 + δ where δ > 0. By the binomial theorem, we have
a^n = (1 + δ)^n = Σ_{k=0}^{n} C(n, k) δ^k = 1 + nδ + (1/2)n(n − 1)δ^2 + · · · + δ^n > 1 + nδ.
Given M ≥ 0, choose N ∈ N such that N > M/δ. Then for all n > N, we have
a^n > 1 + nδ > 1 + Nδ > M,
so a^n → ∞ as n → ∞.
The next proposition proves the existence of the limit for e in Example 3.16.
Proposition 3.32. The sequence (x_n) with
$$x_n = \left(1 + \frac{1}{n}\right)^n$$
is strictly monotone increasing and converges to a limit e with 2 < e < 3.


Proof. By the binomial theorem,
$$\left(1 + \frac{1}{n}\right)^n = \sum_{k=0}^{n} \binom{n}{k} \frac{1}{n^k}$$
$$= 1 + n \cdot \frac{1}{n} + \frac{n(n-1)}{2!} \cdot \frac{1}{n^2} + \frac{n(n-1)(n-2)}{3!} \cdot \frac{1}{n^3} + \dots + \frac{n(n-1)(n-2)\cdots 2 \cdot 1}{n!} \cdot \frac{1}{n^n}$$
$$= 2 + \frac{1}{2!}\left(1 - \frac{1}{n}\right) + \frac{1}{3!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) + \dots + \frac{1}{n!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{n-1}{n}\right).$$
Each of the terms in the sum on the right hand side is a positive increasing function of n, and the number of terms increases with n. Therefore (xn ) is a strictly increasing sequence, and xn > 2 for every n ≥ 2. Moreover, since 0 ≤ (1 − k/n) < 1 for 1 ≤ k ≤ n, we have
$$\left(1 + \frac{1}{n}\right)^n < 2 + \frac{1}{2!} + \frac{1}{3!} + \dots + \frac{1}{n!} < 2 + \frac{1}{2} + \frac{1}{2^2} + \dots + \frac{1}{2^{n-1}} < 3,$$
using k! ≥ 2^{k−1} for k ≥ 2. Thus (x_n) is bounded from above, and since it is strictly increasing, Theorem 3.29 implies that it converges to a limit e with 2 < e ≤ 3. Keeping the term 1/3! = 1/6 exactly and bounding the remaining terms by the geometric series gives x_n < 2 + 1/2 + 1/6 + 1/4 < 3, so in fact 2 < e < 3. $\square$

3.6. The lim sup and lim inf

Theorem 3.41. Let (x_n) be a sequence of real numbers. Then y = lim sup_{n→∞} x_n if and only if −∞ ≤ y ≤ ∞ satisfies one of the following conditions.
(1) −∞ < y < ∞ and for every ε > 0: (a) there exists N ∈ N such that x_n < y + ε for all n > N; (b) for every N ∈ N there exists n > N such that x_n > y − ε.


(2) y = ∞ and for every M ∈ R, there exists n ∈ N such that xn > M , i.e., (xn ) is not bounded from above.
(3) y = −∞ and for every m ∈ R there exists N ∈ N such that xn < m for all n > N , i.e., xn → −∞ as n → ∞.
Similarly,
$$z = \liminf_{n\to\infty} x_n$$
if and only if −∞ ≤ z ≤ ∞ satisfies one of the following conditions.
(1) −∞ < z < ∞ and for every ε > 0: (a) there exists N ∈ N such that x_n > z − ε for all n > N; (b) for every N ∈ N there exists n > N such that x_n < z + ε.
(2) z = −∞ and for every m ∈ R, there exists n ∈ N such that xn < m, i.e., (xn ) is not bounded from below.
(3) z = ∞ and for every M ∈ R there exists N ∈ N such that xn > M for all n > N , i.e., xn → ∞ as n → ∞.
Proof. We prove the result for lim sup. The result for lim inf follows by applying this result to the sequence (−xn ).
First, suppose that y = lim sup xn and −∞ < y < ∞. Then (xn ) is bounded from above and yn = sup {xk : k ≥ n} ↓ y as n → ∞.
Therefore, for every ε > 0 there exists N ∈ N such that y_N < y + ε. Since x_n ≤ y_N for all n > N, this proves (1a). To prove (1b), let ε > 0 and suppose that N ∈ N is arbitrary. Since y_N ≥ y is the supremum of {x_n : n ≥ N}, there exists n ≥ N such that x_n > y_N − ε ≥ y − ε, which proves (1b).
Conversely, suppose that −∞ < y < ∞ satisfies condition (1) for the lim sup.
Then, given any ε > 0, (1a) implies that there exists N ∈ N such that
$$y_n = \sup\{x_k : k \ge n\} \le y + \epsilon \qquad \text{for all } n > N,$$
and (1b) implies that y_n > y − ε for all n ∈ N. Hence, |y_n − y| ≤ ε for all n > N, so y_n → y as n → ∞, which means that y = lim sup_{n→∞} x_n.
We leave the verification of the equivalence for y = ±∞ as an exercise.
Next we give a necessary and sufficient condition for the convergence of a sequence in terms of its lim inf and lim sup.
Theorem 3.42. A sequence (x_n) of real numbers converges if and only if
$$\liminf_{n\to\infty} x_n = \limsup_{n\to\infty} x_n = x$$
are finite and equal, in which case lim_{n→∞} x_n = x. Furthermore, the sequence diverges to ∞ if and only if
$$\liminf_{n\to\infty} x_n = \limsup_{n\to\infty} x_n = \infty,$$
and diverges to −∞ if and only if
$$\liminf_{n\to\infty} x_n = \limsup_{n\to\infty} x_n = -\infty.$$


Proof. First suppose that
$$\liminf_{n\to\infty} x_n = \limsup_{n\to\infty} x_n = x$$
for some x ∈ R. Then y_n ↓ x and z_n ↑ x as n → ∞, where
$$y_n = \sup\{x_k : k \ge n\}, \qquad z_n = \inf\{x_k : k \ge n\}.$$
Since z_n ≤ x_n ≤ y_n, the "sandwich" theorem implies that lim_{n→∞} x_n = x.

Conversely, suppose that the sequence (x_n) converges to a limit x ∈ R. Then for every ε > 0, there exists N ∈ N such that
$$x - \epsilon < x_n < x + \epsilon \qquad \text{for all } n > N.$$
It follows that
$$x - \epsilon \le z_n \le y_n \le x + \epsilon \qquad \text{for all } n > N.$$
Therefore y_n, z_n → x as n → ∞, so lim sup x_n = lim inf x_n = x.
The sequence (xn ) diverges to ∞ if and only if lim inf xn = ∞, and then lim sup xn = ∞, since lim inf xn ≤ lim sup xn . Similarly, (xn ) diverges to −∞ if and only if lim sup xn = −∞, and then lim inf xn = −∞.
If lim inf x_n ≠ lim sup x_n, then we say that the sequence (x_n) oscillates. The difference lim sup x_n − lim inf x_n provides a measure of the size of the oscillations in the sequence as n → ∞.
Every sequence has a finite or infinite lim sup, but not every sequence has a limit (even if we include sequences that diverge to ±∞). The following corollary gives a convenient way to prove the convergence of a sequence without having to refer to the limit before it is known to exist.
Corollary 3.43. Let (xn ) be a sequence of real numbers. Then (xn ) converges with limn→∞ xn = x if and only if lim supn→∞ |xn − x| = 0.
Proof. If lim_{n→∞} x_n = x, then lim_{n→∞} |x_n − x| = 0, so
$$\limsup_{n\to\infty} |x_n - x| = \lim_{n\to\infty} |x_n - x| = 0.$$
Conversely, if lim sup_{n→∞} |x_n − x| = 0, then
$$0 \le \liminf_{n\to\infty} |x_n - x| \le \limsup_{n\to\infty} |x_n - x| = 0,$$
so lim inf_{n→∞} |x_n − x| = lim sup_{n→∞} |x_n − x| = 0. Theorem 3.42 implies that lim_{n→∞} |x_n − x| = 0, or lim_{n→∞} x_n = x. $\square$
Note that the condition lim inf n→∞ |xn − x| = 0 doesn’t tell us anything about the convergence of (xn ).
Example 3.44. Let x_n = 1 + (−1)^n. Then (x_n) oscillates between 0 and 2, and
$$\liminf_{n\to\infty} x_n = 0, \qquad \limsup_{n\to\infty} x_n = 2.$$
The sequence is non-negative and its lim inf is 0, but the sequence does not converge.
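The tail sequences y_n = sup{x_k : k ≥ n} and z_n = inf{x_k : k ≥ n} that define the lim sup and lim inf can be approximated on a finite stretch of the sequence. A small sketch (illustrative, not part of the notes; the truncation at 100 terms is an arbitrary choice):

```python
# Approximate lim sup / lim inf of x_n = 1 + (-1)^n via finite tail sup/inf.
# Illustrative sketch: with a finite list, "sup of the tail" becomes max of
# the remaining entries.

x = [1 + (-1)**n for n in range(1, 101)]   # 0, 2, 0, 2, ...
y = [max(x[n:]) for n in range(90)]        # nonincreasing tail suprema
z = [min(x[n:]) for n in range(90)]        # nondecreasing tail infima
limsup, liminf = y[-1], z[-1]              # 2 and 0, as in Example 3.44
```

Every tail still contains both values 0 and 2, so the tail suprema are constantly 2 and the tail infima constantly 0.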


3.7. Cauchy sequences
Cauchy has become unbearable. Every Monday, broadcasting the known facts he has learned over the week as a discovery. I believe there is no historical precedent for such a talent writing so much awful rubbish. This is why I have relegated him to the rank below us. (Jacobi in a letter to
Dirichlet, 1841)
The Cauchy condition is a necessary and sufficient condition for the convergence of a real sequence that depends only on the terms of the sequence and not on its limit. Furthermore, the completeness of R can be defined by the convergence of
Cauchy sequences, instead of by the existence of suprema. This approach defines completeness in terms of the distance properties of R rather than its order properties and generalizes to other metric spaces that don’t have a natural ordering.
Roughly speaking, a Cauchy sequence is a sequence whose terms eventually get arbitrarily close together.
Definition 3.45. A sequence (x_n) of real numbers is a Cauchy sequence if for every ε > 0 there exists N ∈ N such that
$$|x_m - x_n| < \epsilon \qquad \text{for all } m, n > N.$$

Theorem 3.46. A sequence of real numbers converges if and only if it is a Cauchy sequence.

Proof. First suppose that (x_n) converges to a limit x ∈ R. Then for every ε > 0 there exists N ∈ N such that
$$|x_n - x| < \frac{\epsilon}{2} \qquad \text{for all } n > N.$$
It follows that if m, n > N, then
$$|x_m - x_n| \le |x_m - x| + |x - x_n| < \epsilon,$$
which implies that (x_n) is Cauchy. (This direction doesn't use the completeness of R; for example, it holds equally well for sequences of rational numbers that converge in Q.)
Conversely, suppose that (x_n) is Cauchy. Then there is N_1 ∈ N such that
$$|x_m - x_n| < 1 \qquad \text{for all } m, n > N_1.$$
It follows that if n > N_1, then
$$|x_n| \le |x_n - x_{N_1+1}| + |x_{N_1+1}| \le 1 + |x_{N_1+1}|.$$
Hence the sequence is bounded, with
$$|x_n| \le \max\{|x_1|, |x_2|, \dots, |x_{N_1}|, 1 + |x_{N_1+1}|\}.$$
Since the sequence is bounded, its lim sup and lim inf exist. We claim they are equal. Given ε > 0, choose N ∈ N such that the Cauchy condition in Definition 3.45 holds. Then
$$x_n - \epsilon < x_m < x_n + \epsilon \qquad \text{for all } m \ge n > N.$$
It follows that for all n > N we have
$$x_n - \epsilon \le \inf\{x_m : m \ge n\}, \qquad \sup\{x_m : m \ge n\} \le x_n + \epsilon,$$
which implies that
$$\sup\{x_m : m \ge n\} - \epsilon \le \inf\{x_m : m \ge n\} + \epsilon.$$
Taking the limit as n → ∞, we get that
$$\limsup_{n\to\infty} x_n - \epsilon \le \liminf_{n\to\infty} x_n + \epsilon,$$
and since ε > 0 is arbitrary, we have
$$\limsup_{n\to\infty} x_n \le \liminf_{n\to\infty} x_n.$$

It follows that lim sup x_n = lim inf x_n, so Theorem 3.42 implies that the sequence converges. $\square$

3.8. Subsequences
A subsequence of a sequence (x_n),
$$x_1, x_2, x_3, \dots, x_n, \dots,$$
is a sequence (x_{n_k}) of the form
$$x_{n_1}, x_{n_2}, x_{n_3}, \dots, x_{n_k}, \dots,$$
where n_1 < n_2 < n_3 < · · · < n_k < · · · .
Example 3.47. A subsequence of the sequence (1/n),
$$1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \frac{1}{5}, \dots,$$
is the sequence (1/k²),
$$1, \frac{1}{4}, \frac{1}{9}, \frac{1}{16}, \frac{1}{25}, \dots.$$
Here, n_k = k². On the other hand, the sequences
$$1, 1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \frac{1}{5}, \dots, \qquad \frac{1}{2}, 1, \frac{1}{3}, \frac{1}{4}, \frac{1}{5}, \dots$$
aren't subsequences of (1/n), since n_k is not a strictly increasing function of k in either case.
The standard short-hand notation for subsequences used above is convenient but not entirely consistent, and the notion of a subsequence is a bit more involved than it might appear at first sight. To explain it in more detail, we give the formal definition of a subsequence as a function on N.
Definition 3.48. Let (xn ) be a sequence, where xn = f (n) and f : N → R. A sequence (yk ), where yk = g(k) and g : N → R, is a subsequence of (xn ) if there is a strictly increasing function φ : N → N such that g = f ◦ φ. In that case, we write φ(k) = nk and yk = xnk .
Example 3.49. In Example 3.47, the sequence (1/n) corresponds to the function f (n) = 1/n and the subsequence (1/k 2 ) corresponds to g(k) = 1/k 2 . Here, g = f ◦φ with φ(k) = k 2 .
Note that since the indices in a subsequence form a strictly increasing sequence of integers (nk ), it follows that nk → ∞ as k → ∞.
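Definition 3.48 can be transcribed directly into code. The sketch below is illustrative (the function names f, phi, g simply mirror the notation of Example 3.49):

```python
# A subsequence as a composition g = f ∘ phi with phi strictly increasing
# (Definition 3.48, using the data of Example 3.49). Illustrative sketch.

def f(n):
    return 1.0 / n          # the sequence x_n = 1/n

def phi(k):
    return k * k            # strictly increasing index map n_k = k^2

def g(k):
    return f(phi(k))        # the subsequence y_k = x_{n_k} = 1/k^2

first_terms = [g(k) for k in range(1, 5)]   # 1, 1/4, 1/9, 1/16
```

Any strictly increasing phi : N → N produces a subsequence this way; a non-monotone phi would not.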


Proposition 3.50. Every subsequence of a convergent sequence converges to the limit of the sequence.
Proof. Suppose that (x_n) is a convergent sequence with lim_{n→∞} x_n = x and (x_{n_k}) is a subsequence. Let ε > 0. There exists N ∈ N such that |x_n − x| < ε for all n > N. Since n_k → ∞ as k → ∞, there exists K ∈ N such that n_k > N if k > K. Then k > K implies that |x_{n_k} − x| < ε, so lim_{k→∞} x_{n_k} = x. $\square$
A useful criterion for the divergence of a sequence follows immediately from this result and the uniqueness of limits.
Corollary 3.51. If a sequence has subsequences that converge to different limits, then the sequence diverges.
Example 3.52. The sequence ((−1)n+1 ),
1, −1, 1, −1, 1, . . . , has subsequences (1) and (−1) that converge to different limits, so it diverges.
In general, we define the limit set of a sequence to be the set of all limits of its convergent subsequences.
Definition 3.53. The limit set of a sequence (xn ) is the set
{x ∈ R : there is a subsequence (xnk ) such that xnk → x as k → ∞} of limits of all of its convergent subsequences.
The limit set of a convergent sequence consists of a single point, namely its limit.

Example 3.54. The limit set of the divergent sequence ((−1)^{n+1}),
$$1, -1, 1, -1, 1, \dots,$$
contains two points, and is {−1, 1}.
Example 3.55. Let {rn : n ∈ N} be an enumeration of the rational numbers in [0, 1]. Every x ∈ [0, 1] is a limit of a subsequence (rnk ). To obtain such a subsequence recursively, choose n1 = 1, and for each k ≥ 2 choose a rational number rnk such that |x − rnk | < 1/k and nk > nk−1 . This is always possible since the rational numbers are dense in [0, 1] and every interval contains infinitely many terms of the sequence. Conversely, if rnk → x, then 0 ≤ x ≤ 1 since 0 ≤ rnk ≤ 1.
Thus, the limit set of (rn ) is the interval [0, 1].
Finally, we state a characterization of the lim sup and lim inf of a sequence in terms of its limit set, where we use the usual conventions about ±∞. We leave the proof as an exercise.
Theorem 3.56. Suppose that (x_n) is a sequence of real numbers with limit set S. Then
$$\limsup_{n\to\infty} x_n = \sup S, \qquad \liminf_{n\to\infty} x_n = \inf S.$$

3.9. The Bolzano-Weierstrass theorem
The Bolzano-Weierstrass theorem is a fundamental compactness result. It allows us to deduce the convergence of a subsequence from the boundedness of a sequence without having to know anything specific about the limit. In this respect, it is analogous to the result that a monotone increasing sequence converges if it is bounded from above, and it also provides another way of expressing the completeness of R.
Theorem 3.57 (Bolzano-Weierstrass). Every bounded sequence of real numbers has a convergent subsequence.
Proof. Suppose that (x_n) is a bounded sequence of real numbers. Let
$$M = \sup_{n\in\mathbb{N}} x_n, \qquad m = \inf_{n\in\mathbb{N}} x_n,$$
and define the closed interval I_0 = [m, M]. Divide I_0 = L_0 ∪ R_0 in half into two closed intervals, where
$$L_0 = [m, (m+M)/2], \qquad R_0 = [(m+M)/2, M].$$
At least one of the intervals L0 , R0 contains infinitely many terms of the sequence, meaning that xn ∈ L0 or xn ∈ R0 for infinitely many n ∈ N (even if the terms themselves are repeated).
Choose I_1 to be one of the intervals L_0, R_0 that contains infinitely many terms, and choose n_1 ∈ N such that x_{n_1} ∈ I_1. Divide I_1 = L_1 ∪ R_1 in half into two closed intervals. One or both of the intervals L_1, R_1 contains infinitely many terms of the sequence. Choose I_2 to be one of these intervals and choose n_2 > n_1 such that x_{n_2} ∈ I_2. This is always possible because I_2 contains infinitely many terms of the sequence. Divide I_2 in half, pick a closed half-interval I_3 that contains infinitely many terms, and choose n_3 > n_2 such that x_{n_3} ∈ I_3. Continuing in this way, we get a nested sequence of intervals
$$I_1 \supset I_2 \supset I_3 \supset \dots \supset I_k \supset \dots$$
of length |I_k| = 2^{−k}(M − m), together with a subsequence (x_{n_k}) such that x_{n_k} ∈ I_k.
Let ε > 0 be given. Since |I_k| → 0 as k → ∞, there exists K ∈ N such that |I_k| < ε for all k > K. Furthermore, since x_{n_k} ∈ I_k ⊂ I_{K+1} for all k > K, we have
$$|x_{n_j} - x_{n_k}| \le |I_{K+1}| < \epsilon \qquad \text{for all } j, k > K.$$
This proves that (x_{n_k}) is a Cauchy sequence, and therefore it converges by Theorem 3.46. $\square$
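The bisection in this proof can be imitated on a finite piece of a bounded sequence. The sketch below is illustrative only: with finitely many terms, "contains infinitely many terms" is replaced by "contains terms not yet used," so the loop simply stops when the data are exhausted.

```python
from math import sin

# Bisection construction from the Bolzano-Weierstrass proof, run on a finite
# piece of the bounded sequence x_n = sin(n). Illustrative sketch only.

x = [sin(n) for n in range(1, 5001)]
lo, hi = min(x), max(x)
indices = []                 # chosen n_1 < n_2 < ... with x_{n_k} in interval k
last = -1
for _ in range(30):
    mid = (lo + hi) / 2.0
    left = [i for i in range(last + 1, len(x)) if lo <= x[i] <= mid]
    right = [i for i in range(last + 1, len(x)) if mid < x[i] <= hi]
    pool = left if len(left) >= len(right) else right
    if not pool:
        break                # finite data exhausted (the proof never stops here)
    if pool is left:
        hi = mid             # keep the left half-interval
    else:
        lo = mid             # keep the right half-interval
    last = pool[0]
    indices.append(last)

increasing = all(i < j for i, j in zip(indices, indices[1:]))
```

The chosen indices are strictly increasing by construction, and the interval containing the later terms halves at every step, which is what drives the Cauchy estimate in the proof.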
The subsequence obtained in the proof of this theorem is not unique. In particular, if the sequence does not converge, then for some k ∈ N both the left and right intervals Lk and Rk contain infinitely many terms of the sequence. In that case, we can obtain convergent subsequences with different limits, depending on our choice of Lk or Rk . This loss of uniqueness is a typical feature of compactness arguments.
We can, however, use the Bolzano-Weierstrass theorem to give a criterion for the convergence of a sequence in terms of the convergence of its subsequences. It states that if every convergent subsequence of a bounded sequence has the same limit, then the entire sequence converges to that limit.
Theorem 3.58. If (xn ) is a bounded sequence of real numbers such that every convergent subsequence has the same limit x, then (xn ) converges to x.


Proof. We will prove that if a bounded sequence (xn ) does not converge to x, then it has a convergent subsequence whose limit is not equal to x.
If (x_n) does not converge to x, then there exists ε_0 > 0 such that |x_n − x| ≥ ε_0 for infinitely many n ∈ N. We can therefore find a subsequence (x_{n_k}) such that
$$|x_{n_k} - x| \ge \epsilon_0$$
for every k ∈ N. The subsequence (x_{n_k}) is bounded, since (x_n) is bounded, so by the Bolzano-Weierstrass theorem it has a convergent subsequence (x_{n_{k_j}}). If
$$\lim_{j\to\infty} x_{n_{k_j}} = y,$$
then |x − y| ≥ ε_0, so x ≠ y, which proves the result. $\square$

Chapter 4

Series

Divergent series are the devil, and it is a shame to base on them any demonstration whatsoever. (Niels Henrik Abel, 1826)
This series is divergent, therefore we may be able to do something with it. (Oliver Heaviside, quoted by Kline)
In this chapter, we apply our results for sequences to series, or infinite sums.
The convergence and sum of an infinite series is defined in terms of its sequence of finite partial sums.

4.1. Convergence of series
A finite sum of real numbers is well-defined by the algebraic properties of R, but in order to make sense of an infinite series, we need to consider its convergence. We say that a series converges if its sequence of partial sums converges, and in that case we define the sum of the series to be the limit of its partial sums.
Definition 4.1. Let (a_n) be a sequence of real numbers. The series
$$\sum_{n=1}^{\infty} a_n$$
converges to a sum S ∈ R if the sequence (S_n) of partial sums
$$S_n = \sum_{k=1}^{n} a_k$$
converges to S as n → ∞. Otherwise, the series diverges.

If a series converges to S, we write
$$S = \sum_{n=1}^{\infty} a_n.$$


We also say a series diverges to ±∞ if its sequence of partial sums does. As for sequences, we may start a series at other values of n than n = 1 without changing its convergence properties. It is sometimes convenient to omit the limits on a series when they aren't important, and write it as $\sum a_n$.
Example 4.2. If |a| < 1, then the geometric series with ratio a converges and its sum is
$$\sum_{n=0}^{\infty} a^n = \frac{1}{1-a}.$$
This series is simple enough that we can compute its partial sums explicitly,
$$S_n = \sum_{k=0}^{n} a^k = \frac{1-a^{n+1}}{1-a}.$$
As shown in Proposition 3.31, if |a| < 1, then a^n → 0 as n → ∞, so S_n → 1/(1 − a), which proves the result.
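The closed form for the partial sums can be checked numerically; a short sketch (illustrative, with an arbitrary ratio a):

```python
# Geometric series: accumulated partial sum vs. the closed form
# (1 - a^(n+1)) / (1 - a), and vs. the limit 1 / (1 - a). Illustrative sketch.

a = -0.7
S, term = 0.0, 1.0
for _ in range(200):                 # accumulate 1 + a + a^2 + ... + a^199
    S += term
    term *= a
closed_form = (1 - a**200) / (1 - a)
limit = 1 / (1 - a)
err_closed = abs(S - closed_form)    # rounding error only
err_limit = abs(S - limit)           # |a|^200 is negligible here
```

Because |a| < 1, the difference between S_n and the limit is |a^{n+1}/(1 − a)|, which decays geometrically.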
The geometric series diverges to ∞ if a ≥ 1, and diverges in an oscillatory fashion if a ≤ −1. The following examples consider the cases a = ±1 in more detail.

Example 4.3. The series
$$\sum_{n=1}^{\infty} 1 = 1 + 1 + 1 + \dots$$
diverges to ∞, since its nth partial sum is S_n = n.
Example 4.4. The series
$$\sum_{n=1}^{\infty} (-1)^{n+1} = 1 - 1 + 1 - 1 + \dots$$
diverges, since its partial sums
$$S_n = \begin{cases} 1 & \text{if $n$ is odd,} \\ 0 & \text{if $n$ is even,} \end{cases}$$
oscillate between 0 and 1.
This series illustrates the dangers of blindly applying algebraic rules for finite sums to series. For example, one might argue that
S = (1 − 1) + (1 − 1) + (1 − 1) + · · · = 0 + 0 + 0 + · · · = 0, or that
S = 1 + (−1 + 1) + (−1 + 1) + · · · = 1 + 0 + 0 + · · · = 1, or that
1 − S = 1 − (1 − 1 + 1 − 1 + . . . ) = 1 − 1 + 1 − 1 + 1 − · · · = S, so 2S = 1 or S = 1/2.
The Italian mathematician and priest Luigi Grandi (1710) suggested that these results were evidence in favor of the existence of God, since they showed that it was possible to create something out of nothing.


Telescoping series of the form
$$\sum_{n=1}^{\infty} (a_n - a_{n+1})$$
are another class of series whose partial sums
$$S_n = a_1 - a_{n+1}$$
can be computed explicitly and then used to study their convergence. We give one example.

Example 4.5. The series
$$\sum_{n=1}^{\infty} \frac{1}{n(n+1)} = \frac{1}{1\cdot 2} + \frac{1}{2\cdot 3} + \frac{1}{3\cdot 4} + \frac{1}{4\cdot 5} + \dots$$
converges to 1. To show this, we observe that
$$\frac{1}{n(n+1)} = \frac{1}{n} - \frac{1}{n+1},$$
so
$$\sum_{k=1}^{n} \frac{1}{k(k+1)} = \sum_{k=1}^{n}\left(\frac{1}{k} - \frac{1}{k+1}\right) = \frac{1}{1} - \frac{1}{2} + \frac{1}{2} - \frac{1}{3} + \frac{1}{3} - \frac{1}{4} + \dots + \frac{1}{n} - \frac{1}{n+1} = 1 - \frac{1}{n+1},$$
and it follows that
$$\sum_{k=1}^{\infty} \frac{1}{k(k+1)} = 1.$$
A condition for the convergence of series with positive terms follows immediately from the condition for the convergence of monotone sequences.
Proposition 4.6. A series $\sum a_n$ with positive terms a_n ≥ 0 converges if and only if its partial sums
$$\sum_{k=1}^{n} a_k \le M$$
are bounded from above by some M; otherwise it diverges to ∞.

Proof. The partial sums $S_n = \sum_{k=1}^{n} a_k$ of such a series form a monotone increasing sequence, and the result follows immediately from Theorem 3.29. $\square$
Although we have only defined sums of convergent series, divergent series are not necessarily meaningless. For example, the Cesàro sum C of a series $\sum a_n$ is defined by
$$C = \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} S_k, \qquad S_n = a_1 + a_2 + \dots + a_n.$$


That is, we average the first n partial sums of the series, and let n → ∞. One can prove that if a series converges to S, then its Cesàro sum exists and is equal to S, but a series may be Cesàro summable even if it is divergent.

Example 4.7. For the series $\sum (-1)^{n+1}$ in Example 4.4, we find that
$$\frac{1}{n} \sum_{k=1}^{n} S_k = \begin{cases} 1/2 + 1/(2n) & \text{if $n$ is odd,} \\ 1/2 & \text{if $n$ is even,} \end{cases}$$
since the S_n's alternate between 0 and 1. It follows that the Cesàro sum of the series is C = 1/2. This is, in fact, what Grandi believed to be the "true" sum of the series.
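Cesàro averaging is easy to carry out directly; a small sketch (illustrative; the truncation at 10000 terms is arbitrary):

```python
# Cesàro sum of Grandi's series 1 - 1 + 1 - 1 + ...: average the partial
# sums S_1, ..., S_n. Illustrative sketch of Example 4.7.

terms = [(-1)**(n + 1) for n in range(1, 10001)]   # 1, -1, 1, -1, ...
partial, partials = 0, []
for t in terms:
    partial += t
    partials.append(partial)                        # 1, 0, 1, 0, ...
cesaro = sum(partials) / len(partials)              # average of the S_k
```

With an even number of terms the average of the partial sums is exactly 1/2, matching the Cesàro sum computed in the example.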
Cesàro summation is important in the theory of Fourier series. There are also many other ways to sum a divergent series or assign a meaning to it (for example, as an asymptotic series), but we won't discuss them further here.

4.2. The Cauchy condition
The following Cauchy condition for the convergence of series is an immediate consequence of the Cauchy condition for the sequence of partial sums.
Theorem 4.8 (Cauchy condition). The series
$$\sum_{n=1}^{\infty} a_n$$
converges if and only if for every ε > 0 there exists N ∈ N such that
$$\left|\sum_{k=m+1}^{n} a_k\right| = |a_{m+1} + a_{m+2} + \dots + a_n| < \epsilon \qquad \text{for all } n > m > N.$$

Proof. The series converges if and only if the sequence (S_n) of partial sums is Cauchy, meaning that for every ε > 0 there exists N such that
$$|S_n - S_m| = \left|\sum_{k=m+1}^{n} a_k\right| < \epsilon \qquad \text{for all } n > m > N,$$
which proves the result. $\square$
A special case of this theorem is a necessary condition for the convergence of a series, namely that its terms approach zero. This condition is the first thing to check when considering whether or not a given series converges.
Theorem 4.9. If the series
$$\sum_{n=1}^{\infty} a_n$$
converges, then
$$\lim_{n\to\infty} a_n = 0.$$

Proof. If the series converges, then it is Cauchy. Taking m = n − 1 in the Cauchy condition in Theorem 4.8, we find that for every ε > 0 there exists N ∈ N such that |a_n| < ε for all n > N, which proves that a_n → 0 as n → ∞. $\square$


Example 4.10. The geometric series $\sum a^n$ converges if |a| < 1, and in that case a^n → 0 as n → ∞. If |a| ≥ 1, then a^n does not approach 0 as n → ∞, which implies that the series diverges. The condition that the terms of a series approach zero is not, however, sufficient to imply convergence. The following series is a fundamental example.
Example 4.11. The harmonic series
$$\sum_{n=1}^{\infty} \frac{1}{n} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \dots$$
diverges, even though 1/n → 0 as n → ∞. To see this, we collect the terms in successive groups of powers of two,
$$\sum_{n=1}^{\infty} \frac{1}{n} = 1 + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8}\right) + \left(\frac{1}{9} + \frac{1}{10} + \dots + \frac{1}{16}\right) + \dots$$
$$> 1 + \frac{1}{2} + \left(\frac{1}{4} + \frac{1}{4}\right) + \left(\frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8}\right) + \left(\frac{1}{16} + \frac{1}{16} + \dots + \frac{1}{16}\right) + \dots$$
$$= 1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \dots.$$
In general, for every n ≥ 1, we have
$$\sum_{k=1}^{2^{n+1}} \frac{1}{k} = 1 + \frac{1}{2} + \sum_{j=1}^{n} \sum_{k=2^j+1}^{2^{j+1}} \frac{1}{k} > 1 + \frac{1}{2} + \sum_{j=1}^{n} 2^j \cdot \frac{1}{2^{j+1}} = 1 + \frac{1}{2} + \frac{n}{2} = \frac{n}{2} + \frac{3}{2},$$
so the series diverges. We can similarly obtain an upper bound for the partial sums,
$$\sum_{k=1}^{2^{n+1}} \frac{1}{k} < 1 + \frac{1}{2} + \sum_{j=1}^{n} 2^j \cdot \frac{1}{2^j} = n + \frac{3}{2},$$
since each group of 2^j terms is bounded by 2^j · 1/2^j = 1.

4.3. Absolutely convergent series

A series $\sum a_n$ converges absolutely if the series of absolute values
$$\sum_{n=1}^{\infty} |a_n|$$
converges. Given a sequence (a_n), we write a_n = a_n^+ − a_n^−, where
$$a_n^+ = \max(a_n, 0), \qquad a_n^- = \max(-a_n, 0)$$
are the positive and negative parts of a_n, so that |a_n| = a_n^+ + a_n^−.
Proposition 4.17. An absolutely convergent series converges. Moreover,
$$\sum_{n=1}^{\infty} a_n$$
converges absolutely if and only if the series
$$\sum_{n=1}^{\infty} a_n^+, \qquad \sum_{n=1}^{\infty} a_n^-$$
of positive and negative terms both converge. Furthermore, in that case,
$$\sum_{n=1}^{\infty} a_n = \sum_{n=1}^{\infty} a_n^+ - \sum_{n=1}^{\infty} a_n^-, \qquad \sum_{n=1}^{\infty} |a_n| = \sum_{n=1}^{\infty} a_n^+ + \sum_{n=1}^{\infty} a_n^-.$$

Proof. If $\sum a_n$ is absolutely convergent, then $\sum |a_n|$ is convergent, so it satisfies the Cauchy condition. Since
$$\left|\sum_{k=m+1}^{n} a_k\right| \le \sum_{k=m+1}^{n} |a_k|,$$
the series $\sum a_n$ also satisfies the Cauchy condition, and therefore it converges.

For the second part, note that
$$0 \le \sum_{k=m+1}^{n} a_k^+ \le \sum_{k=m+1}^{n} |a_k|, \qquad 0 \le \sum_{k=m+1}^{n} a_k^- \le \sum_{k=m+1}^{n} |a_k|,$$
$$\sum_{k=m+1}^{n} |a_k| = \sum_{k=m+1}^{n} a_k^+ + \sum_{k=m+1}^{n} a_k^-,$$
which shows that $\sum |a_n|$ is Cauchy if and only if both $\sum a_n^+$, $\sum a_n^-$ are Cauchy. It follows that $\sum |a_n|$ converges if and only if both $\sum a_n^+$, $\sum a_n^-$ converge. In that case, we have
$$\sum_{n=1}^{\infty} a_n = \lim_{n\to\infty} \sum_{k=1}^{n} a_k = \lim_{n\to\infty} \left(\sum_{k=1}^{n} a_k^+ - \sum_{k=1}^{n} a_k^-\right) = \lim_{n\to\infty} \sum_{k=1}^{n} a_k^+ - \lim_{n\to\infty} \sum_{k=1}^{n} a_k^- = \sum_{n=1}^{\infty} a_n^+ - \sum_{n=1}^{\infty} a_n^-,$$
and similarly for $\sum |a_n|$, which proves the proposition. $\square$

It is worth noting that this result depends crucially on the completeness of R.
Example 4.18. Suppose that $a_n^+, a_n^- \in \mathbb{Q}^+$ are positive rational numbers such that
$$\sum_{n=1}^{\infty} a_n^+ = \sqrt{2}, \qquad \sum_{n=1}^{\infty} a_n^- = 2 - \sqrt{2},$$
and let a_n = a_n^+ − a_n^−. Then
$$\sum_{n=1}^{\infty} a_n = \sum_{n=1}^{\infty} a_n^+ - \sum_{n=1}^{\infty} a_n^- = 2\sqrt{2} - 2 \notin \mathbb{Q}, \qquad \sum_{n=1}^{\infty} |a_n| = \sum_{n=1}^{\infty} a_n^+ + \sum_{n=1}^{\infty} a_n^- = 2 \in \mathbb{Q}.$$
Thus, the series converges absolutely in Q, but it doesn't converge in Q.

4.4. The comparison test
One of the most useful ways of showing that a series is absolutely convergent is to compare it with a simpler series whose convergence is already known.
Theorem 4.19 (Comparison test). Suppose that b_n ≥ 0 and
$$\sum_{n=1}^{\infty} b_n$$
converges. If |a_n| ≤ b_n, then
$$\sum_{n=1}^{\infty} a_n$$
converges absolutely.

Proof. Since $\sum b_n$ converges it satisfies the Cauchy condition, and since
$$\sum_{k=m+1}^{n} |a_k| \le \sum_{k=m+1}^{n} b_k,$$
the series $\sum |a_n|$ also satisfies the Cauchy condition. Therefore $\sum a_n$ converges absolutely. $\square$
Example 4.20. The series
$$\sum_{n=1}^{\infty} \frac{1}{n^2} = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \dots$$
converges by comparison with the telescoping series in Example 4.5. We have
$$\sum_{n=1}^{\infty} \frac{1}{n^2} = 1 + \sum_{n=1}^{\infty} \frac{1}{(n+1)^2}, \qquad 0 \le \frac{1}{(n+1)^2} < \frac{1}{n(n+1)},$$
and we also get the explicit upper bound
$$\sum_{n=1}^{\infty} \frac{1}{n^2} < 1 + \sum_{n=1}^{\infty} \frac{1}{n(n+1)} = 2.$$
The exact value of the sum is π²/6.

Example 4.21. The p-series
$$\sum_{n=1}^{\infty} \frac{1}{n^p}$$
converges if p > 1 and diverges if 0 < p ≤ 1. A proof of convergence for p > 1 and divergence for 0 < p ≤ 1, using the integral test, is given in Example 12.44.


4.5. * The Riemann ζ-function
Example 4.21 justifies the following definition.
Definition 4.22. The Riemann ζ-function is defined for 1 < s < ∞ by
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}.$$
For instance, as stated in Example 4.20, we have ζ(2) = π²/6. In fact, Euler (1755) discovered a general formula for the value ζ(2n) of the ζ-function at even natural numbers,
$$\zeta(2n) = (-1)^{n+1} \frac{(2\pi)^{2n} B_{2n}}{2(2n)!}, \qquad n = 1, 2, 3, \dots,$$
where the coefficients B_{2n} are the Bernoulli numbers (see Example 10.19). In particular,
$$\zeta(4) = \frac{\pi^4}{90}, \qquad \zeta(6) = \frac{\pi^6}{945}, \qquad \zeta(8) = \frac{\pi^8}{9450}, \qquad \zeta(10) = \frac{\pi^{10}}{93555}.$$

On the other hand, the values of the ζ-function at odd natural numbers are harder to study. For instance,
$$\zeta(3) = \sum_{n=1}^{\infty} \frac{1}{n^3} = 1.2020569\dots$$
is called Apéry's constant. It was proved to be irrational by Apéry (1979), but a simple explicit expression for ζ(3) is not known (and likely doesn't exist).
The Riemann ζ-function is intimately connected with number theory and the distribution of primes. Every positive integer n has a unique factorization
$$n = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_k^{\alpha_k},$$
where the p_j are primes and the exponents α_j are positive integers. Using the geometric series in Example 4.2, we have
$$\left(1 - \frac{1}{p^s}\right)^{-1} = 1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \frac{1}{p^{3s}} + \frac{1}{p^{4s}} + \dots.$$
By expanding the products and rearranging the resulting sums, one can see that
$$\zeta(s) = \prod_{p} \left(1 - \frac{1}{p^s}\right)^{-1},$$
where the product is taken over all primes p, since every possible prime factorization of a positive integer appears exactly once in the sum on the right-hand side. The infinite product here is defined as a limit of finite products,
$$\prod_{p} \left(1 - \frac{1}{p^s}\right)^{-1} = \lim_{N\to\infty} \prod_{p \le N} \left(1 - \frac{1}{p^s}\right)^{-1}.$$
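The Euler product can be sampled numerically. The sketch below is illustrative only; the prime cutoff 47 and the series cutoff are arbitrary choices, and both truncations approach ζ(2) = π²/6 ≈ 1.6449.

```python
# Truncated Euler product vs. truncated Dirichlet series for zeta(2).
# Illustrative sketch: both converge to pi^2/6 as the cutoffs grow.

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
s = 2
product = 1.0
for p in primes:
    product *= 1.0 / (1.0 - p ** (-s))         # factor (1 - p^-s)^-1
series = sum(1.0 / n ** s for n in range(1, 100000))
gap = abs(product - series)
```

The truncated product equals the sum of 1/n² over exactly those n whose prime factors are all ≤ 47, which is why it slightly undershoots the full series.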


Using complex analysis, one can show that the ζ-function may be extended in a unique way to an analytic (i.e., differentiable) function of a complex variable s = σ + it ∈ C,
$$\zeta : \mathbb{C} \setminus \{1\} \to \mathbb{C},$$
where σ = Re s is the real part of s and t = Im s is the imaginary part. The ζ-function has a singularity at s = 1, called a simple pole, where it goes to infinity like 1/(s − 1), and it is equal to zero at the negative even integers s = −2, −4, . . . , −2n, . . . . These zeros are called the trivial zeros of the ζ-function. Riemann (1859) made the following conjecture.

Hypothesis 4.23 (Riemann hypothesis). Except for the trivial zeros, the only zeros of the Riemann ζ-function occur on the line Re s = 1/2.
If true, the Riemann hypothesis has significant consequences for the distribution of primes (and many other things); roughly speaking, it implies that the prime numbers are “randomly distributed” among the natural numbers (with density
1/ log n near a large integer n ∈ N). Despite enormous efforts, this conjecture has neither been proved nor disproved, and it remains one of the most significant open problems in mathematics (perhaps the most significant open problem).

4.6. The ratio and root tests
In this section, we describe the ratio and root tests, which provide explicit sufficient conditions for the absolute convergence of a series that can be compared with a geometric series. These tests are particularly useful in studying power series, but they aren’t effective in determining the convergence or divergence of series whose terms do not approach zero at a geometric rate.
Theorem 4.24 (Ratio test). Suppose that (a_n) is a sequence of nonzero real numbers such that the limit
$$r = \lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right|$$
exists or diverges to infinity. Then the series
$$\sum_{n=1}^{\infty} a_n$$
converges absolutely if 0 ≤ r < 1 and diverges if 1 < r ≤ ∞.

Proof. If r < 1, choose s such that r < s < 1. Then there exists N ∈ N such that
$$\left|\frac{a_{n+1}}{a_n}\right| < s \qquad \text{for all } n > N.$$
It follows that
$$|a_n| \le M s^n \qquad \text{for all } n > N,$$
where M is a suitable constant. Therefore $\sum a_n$ converges absolutely by comparison with the convergent geometric series $\sum M s^n$.

If r > 1, choose s such that r > s > 1. There exists N ∈ N such that
$$\left|\frac{a_{n+1}}{a_n}\right| > s \qquad \text{for all } n > N,$$


so that |an | ≥ M sn for all n > N and some M > 0. It follows that (an ) does not approach 0 as n → ∞, so the series diverges.
Example 4.25. Let a ∈ R, and consider the series
$$\sum_{n=1}^{\infty} n a^n = a + 2a^2 + 3a^3 + \dots.$$
Then
$$\lim_{n\to\infty} \left|\frac{(n+1)a^{n+1}}{n a^n}\right| = |a| \lim_{n\to\infty} \left(1 + \frac{1}{n}\right) = |a|.$$
By the ratio test, the series converges if |a| < 1 and diverges if |a| > 1; the series also diverges if |a| = 1. The convergence of the series for |a| < 1 is explained by the fact that the geometric decay of the factor a^n is more rapid than the algebraic growth of the coefficient n.

Example 4.26. Let p > 0 and consider the p-series
$$\sum_{n=1}^{\infty} \frac{1}{n^p}.$$
Then
$$\lim_{n\to\infty} \frac{1/(n+1)^p}{1/n^p} = \lim_{n\to\infty} \frac{1}{(1+1/n)^p} = 1,$$
so the ratio test is inconclusive. In this case, the series diverges if 0 < p ≤ 1 and converges if p > 1, which shows that either possibility may occur when the limit in the ratio test is 1.

The root test provides a criterion for convergence of a series that is closely related to the ratio test, but it doesn't require that the limit of the ratios of successive terms exists.
Theorem 4.27 (Root test). Suppose that (a_n) is a sequence of real numbers and let
$$r = \limsup_{n\to\infty} |a_n|^{1/n}.$$
Then the series
$$\sum_{n=1}^{\infty} a_n$$
converges absolutely if 0 ≤ r < 1 and diverges if 1 < r ≤ ∞.
Proof. First suppose 0 ≤ r < 1. If 0 < r < 1, choose s such that r < s < 1, and let t = r/s, so that r < t < 1. If r = 0, choose any 0 < t < 1. Since $t > \limsup_{n\to\infty} |a_n|^{1/n}$, Theorem 3.41 implies that there exists N ∈ N such that
$$|a_n|^{1/n} < t \qquad \text{for all } n > N.$$
Therefore |a_n| < t^n for all n > N, where t < 1, so it follows that the series converges by comparison with the convergent geometric series $\sum t^n$.


Next suppose 1 < r ≤ ∞. If 1 < r < ∞, choose s such that 1 < s < r, and let t = r/s, so that 1 < t < r. If r = ∞, choose any 1 < t < ∞. Since $t < \limsup_{n\to\infty} |a_n|^{1/n}$, Theorem 3.41 implies that
$$|a_n|^{1/n} > t \qquad \text{for infinitely many } n \in \mathbb{N}.$$
Therefore |a_n| > t^n for infinitely many n ∈ N, where t > 1, so (a_n) does not approach zero as n → ∞, and the series diverges. $\square$
The root test may succeed where the ratio test fails.
Example 4.28. Consider the geometric series with ratio 1/2,
$$\sum_{n=1}^{\infty} a_n = \frac{1}{2} + \frac{1}{2^2} + \frac{1}{2^3} + \frac{1}{2^4} + \frac{1}{2^5} + \dots, \qquad a_n = \frac{1}{2^n}.$$
Then (of course) both the ratio and root tests imply convergence, since
$$\lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| = \limsup_{n\to\infty} |a_n|^{1/n} = \frac{1}{2} < 1.$$
Now consider the series obtained by switching successive odd and even terms,
$$\sum_{n=1}^{\infty} b_n = \frac{1}{2^2} + \frac{1}{2} + \frac{1}{2^4} + \frac{1}{2^3} + \frac{1}{2^6} + \dots, \qquad b_n = \begin{cases} 1/2^{n+1} & \text{if $n$ is odd,} \\ 1/2^{n-1} & \text{if $n$ is even.} \end{cases}$$
For this series,
$$\frac{b_{n+1}}{b_n} = \begin{cases} 2 & \text{if $n$ is odd,} \\ 1/8 & \text{if $n$ is even,} \end{cases}$$
and the ratio test doesn't apply, since the required limit does not exist. (The series still converges at a geometric rate, however, because the decrease in the terms by a factor of 1/8 for even n dominates the increase by a factor of 2 for odd n.) On the other hand,
$$\limsup_{n\to\infty} |b_n|^{1/n} = \frac{1}{2},$$
so the root test still works. In fact, as we discuss in Section 4.8, since the series is absolutely convergent, every rearrangement of it converges to the same sum.

4.7. Alternating series
An alternating series is one in which successive terms have opposite signs. If the terms in an alternating series have decreasing absolute values and converge to zero, then the series converges however slowly its terms approach zero. This allows us to prove the convergence of some series which aren’t absolutely convergent.
Example 4.29. The alternating harmonic series from Example 4.14 is
$$\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \dots.$$
The behavior of its partial sums is shown in Figure 1, which illustrates the idea of the convergence proof for alternating series.

Figure 1. A plot of the first 40 partial sums Sn of the alternating harmonic series in Example 4.14. The odd partial sums decrease and the even partial sums increase to the sum of the series log 2 ≈ 0.6931, which is indicated by the dashed line.

Theorem 4.30 (Alternating series). Suppose that (a_n) is a decreasing sequence of nonnegative real numbers, meaning that 0 ≤ a_{n+1} ≤ a_n, such that a_n → 0 as n → ∞. Then the alternating series
$$\sum_{n=1}^{\infty} (-1)^{n+1} a_n = a_1 - a_2 + a_3 - a_4 + a_5 - \dots$$
converges.
Proof. Let
$$S_n = \sum_{k=1}^{n} (-1)^{k+1} a_k$$
denote the nth partial sum. If n = 2m − 1 is odd, then
$$S_{2m-1} = S_{2m-3} - a_{2m-2} + a_{2m-1} \le S_{2m-3},$$
since (a_n) is decreasing, and
$$S_{2m-1} = (a_1 - a_2) + (a_3 - a_4) + \dots + (a_{2m-3} - a_{2m-2}) + a_{2m-1} \ge 0.$$
Thus, the sequence (S_{2m−1}) of odd partial sums is decreasing and bounded from below by 0, so S_{2m−1} ↓ S⁺ as m → ∞ for some S⁺ ≥ 0.
Similarly, if n = 2m is even, then
$$S_{2m} = S_{2m-2} + a_{2m-1} - a_{2m} \ge S_{2m-2},$$


and
S2m = a1 − (a2 − a3 ) − (a4 − a5 ) − · · · − (a2m−1 − a2m ) ≤ a1 .
Thus, (S_{2m}) is increasing and bounded from above by a_1, so S_{2m} ↑ S_− as m → ∞ for some S_− ≤ a_1.
Finally, note that

    lim_{m→∞} (S_{2m−1} − S_{2m}) = lim_{m→∞} a_{2m} = 0,

so S_+ = S_−, which implies that the series converges to this common value.
The proof also shows that the sum S satisfies S_{2m} ≤ S ≤ S_{2m−1}, so S is bounded from below and above by all even and odd partial sums, respectively, and that the error |S_n − S| is less than the first term a_{n+1} in the series that is neglected.
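This error estimate can be tested numerically for the alternating harmonic series, whose sum is log 2. A small Python sketch (the function name is ours):

```python
import math

def alt_harmonic_partial(n):
    """nth partial sum of the alternating harmonic series."""
    return sum((-1.0) ** (k + 1) / k for k in range(1, n + 1))

S = math.log(2)  # sum of the alternating harmonic series (Example 4.14)
# |S_n - S| should be smaller than the first neglected term 1/(n + 1).
bound_holds = all(
    abs(alt_harmonic_partial(n) - S) < 1.0 / (n + 1) for n in range(1, 200)
)
```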
Example 4.31. The alternating p-series

    ∑_{n=1}^∞ (−1)^{n+1}/n^p

converges for every p > 0. The convergence is absolute for p > 1 and conditional for 0 < p ≤ 1.

4.8. Rearrangements
A rearrangement of a series is a series that consists of the same terms in a different order. The convergence of rearranged series may initially appear to be unconnected with absolute convergence, but absolutely convergent series are exactly those series whose sums remain the same under every rearrangement of their terms. On the other hand, a conditionally convergent series can be rearranged to give any sum we please, or to diverge.
Example 4.32. A rearrangement of the alternating harmonic series in Example 4.14 is

    1 − 1/2 − 1/4 + 1/3 − 1/6 − 1/8 + 1/5 − 1/10 − 1/12 + ...,

where we put two negative even terms between each of the positive odd terms. The behavior of its partial sums is shown in Figure 2. As proved in Example 12.47, this series converges to one-half of the sum of the alternating harmonic series. The sum of the alternating harmonic series can change under rearrangement because it is conditionally convergent.
Note also that both the positive and negative parts of the alternating harmonic series diverge to infinity, since

    1 + 1/3 + 1/5 + 1/7 + ... > 1/2 + 1/4 + 1/6 + 1/8 + ...,

    1/2 + 1/4 + 1/6 + 1/8 + ... = (1/2)(1 + 1/2 + 1/3 + 1/4 + ...),

and the harmonic series diverges. This is what allows us to change the sum by rearranging the series.


Figure 2. A plot of the first 40 partial sums S_n of the rearranged alternating harmonic series in Example 4.32. The series converges to half the sum of the alternating harmonic series, (1/2) log 2 ≈ 0.3466. Compare this picture with Figure 1.
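The convergence of this rearrangement to (1/2) log 2 can be illustrated numerically. In the Python sketch below (our own construction of the term order), each positive odd term 1/(2k − 1) is followed by the two negative even terms −1/(4k − 2) and −1/(4k):

```python
import math

def rearranged_partial_sum(n_terms):
    """Partial sum of 1 - 1/2 - 1/4 + 1/3 - 1/6 - 1/8 + 1/5 - ...."""
    terms = []
    k = 1
    while len(terms) < n_terms:
        terms.append(1.0 / (2 * k - 1))   # positive odd term
        terms.append(-1.0 / (4 * k - 2))  # first negative even term
        terms.append(-1.0 / (4 * k))      # second negative even term
        k += 1
    return sum(terms[:n_terms])

approx = rearranged_partial_sum(30000)
half_log2 = 0.5 * math.log(2)  # ≈ 0.3466
```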

The formal definition of a rearrangement is as follows.
Definition 4.33. A series

    ∑_{m=1}^∞ b_m

is a rearrangement of a series

    ∑_{n=1}^∞ a_n

if there is a one-to-one, onto function f : N → N such that b_m = a_{f(m)}.
If ∑ b_m is a rearrangement of ∑ a_n with n = f(m), then ∑ a_n is a rearrangement of ∑ b_m, with m = f^{−1}(n).
Theorem 4.34. If a series is absolutely convergent, then every rearrangement of the series converges to the same sum.
Proof. First, suppose that

    ∑_{n=1}^∞ a_n

is a convergent series with a_n ≥ 0, and let

    ∑_{m=1}^∞ b_m,    b_m = a_{f(m)},

be a rearrangement. Given ε > 0, choose N ∈ N such that

    0 ≤ ∑_{k=1}^∞ a_k − ∑_{k=1}^N a_k < ε.

Since f : N → N is one-to-one and onto, there exists M ∈ N such that

    {1, 2, . . . , N} ⊂ f^{−1}({1, 2, . . . , M}),

meaning that all of the terms a_1, a_2, . . . , a_N are included among the b_1, b_2, . . . , b_M. For example, we can take M = max{m ∈ N : 1 ≤ f(m) ≤ N}; this maximum is well-defined since there are finitely many such m (in fact, N of them).
If m > M, then

    ∑_{k=1}^N a_k ≤ ∑_{j=1}^m b_j ≤ ∑_{k=1}^∞ a_k,

since the b_j’s include all the a_k’s in the left sum, all the b_j’s are included among the a_k’s in the right sum, and a_k, b_j ≥ 0. It follows that

    0 ≤ ∑_{k=1}^∞ a_k − ∑_{j=1}^m b_j < ε

for all m > M, which proves that

    ∑_{j=1}^∞ b_j = ∑_{k=1}^∞ a_k.

If ∑ a_n is a general absolutely convergent series, then from Proposition 4.17 the positive and negative parts of the series,

    ∑_{n=1}^∞ a_n^+,    ∑_{n=1}^∞ a_n^−,

converge. If ∑ b_m is a rearrangement of ∑ a_n, then ∑ b_m^+ and ∑ b_m^− are rearrangements of ∑ a_n^+ and ∑ a_n^−, respectively. It follows from what we’ve just proved that they converge and

    ∑_{m=1}^∞ b_m^+ = ∑_{n=1}^∞ a_n^+,    ∑_{m=1}^∞ b_m^− = ∑_{n=1}^∞ a_n^−.

Proposition 4.17 then implies that ∑ b_m is absolutely convergent and

    ∑_{m=1}^∞ b_m = ∑_{m=1}^∞ b_m^+ − ∑_{m=1}^∞ b_m^− = ∑_{n=1}^∞ a_n^+ − ∑_{n=1}^∞ a_n^− = ∑_{n=1}^∞ a_n,

which proves the result.

Figure 3. A plot of the first 40 partial sums S_n of the rearranged alternating harmonic series described in Example 4.35, which converges to √2.

Conditionally convergent series behave completely differently from absolutely convergent series under rearrangement. As Riemann observed, they can be rearranged to give any sum we want, or to diverge. Before giving the proof, we illustrate the idea with an example.
Example 4.35. Suppose we want to rearrange the alternating harmonic series

    1 − 1/2 + 1/3 − 1/4 + 1/5 − 1/6 + ...

so that its sum is √2 ≈ 1.4142. We choose positive terms until we get a partial sum that is greater than √2, which gives 1 + 1/3 + 1/5; followed by negative terms until we get a sum less than √2, which gives 1 + 1/3 + 1/5 − 1/2; followed by positive terms until we get a sum greater than √2, which gives

    1 + 1/3 + 1/5 − 1/2 + 1/7 + 1/9 + 1/11 + 1/13;

followed by another negative term −1/4 to get a sum less than √2; and so on. The first 40 partial sums of the resulting series are shown in Figure 3.
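The greedy procedure just described is easy to implement. A Python sketch (assuming the target √2 from the example; the function name is ours):

```python
import math

def greedy_rearrangement(target, n_steps):
    """Greedily reorder 1 - 1/2 + 1/3 - ... so its partial sums approach
    target: add unused positive (odd-denominator) terms while at or below
    the target, and unused negative (even-denominator) terms while above it."""
    next_odd, next_even = 1, 2
    s = 0.0
    for _ in range(n_steps):
        if s <= target:
            s += 1.0 / next_odd
            next_odd += 2
        else:
            s -= 1.0 / next_even
            next_even += 2
    return s

approx = greedy_rearrangement(math.sqrt(2), 100000)
```

Since the terms of the series tend to zero, the partial sums overshoot the target by less and less, exactly as in the proof of Theorem 4.36 below.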
Theorem 4.36. If a series is conditionally convergent, then it has rearrangements that converge to an arbitrary real number and rearrangements that diverge to ∞ or −∞.
Proof. Suppose that ∑ a_n is conditionally convergent. Since the series converges, a_n → 0 as n → ∞. If both the positive part ∑ a_n^+ and negative part ∑ a_n^− of the series converge, then the series converges absolutely; and if only one part diverges, then the series diverges (to ∞ if ∑ a_n^+ diverges, or −∞ if ∑ a_n^− diverges). Therefore both ∑ a_n^+ and ∑ a_n^− diverge. This means that we can make sums of successive positive or negative terms in the series as large as we wish.
Suppose S ∈ R. Starting from the beginning of the series, we choose successive positive or zero terms in the series until their partial sum is greater than or equal to S. Then we choose successive strictly negative terms, starting again from the beginning of the series, until the partial sum of all the terms is strictly less than
S. After that, we choose successive positive or zero terms until the partial sum is greater than or equal to S, followed by negative terms until the partial sum is strictly less than S, and so on. The partial sums are greater than S by at most the value of the last positive term retained, and are less than S by at most the value of the last negative term retained. Since a_n → 0 as n → ∞, it follows that the rearranged series converges to S.
A similar argument shows that we can rearrange a conditionally convergent series to diverge to ∞ or −∞, and that we can rearrange the series so that it diverges in a finite or infinite oscillatory fashion.
The previous results indicate that conditionally convergent series behave in many ways more like divergent series than absolutely convergent series.

4.9. The Cauchy product
In this section, we prove a result about the product of absolutely convergent series that is useful in multiplying power series. It is convenient to begin numbering the terms of the series at n = 0.
Definition 4.37. The Cauchy product of the series

    ∑_{n=0}^∞ a_n,    ∑_{n=0}^∞ b_n

is the series

    ∑_{n=0}^∞ (∑_{k=0}^n a_k b_{n−k}).

The Cauchy product arises formally by term-by-term multiplication and rearrangement:
(a0 + a1 + a2 + a3 + . . . ) (b0 + b1 + b2 + b3 + . . . )
= a0 b0 + a0 b1 + a0 b2 + a0 b3 + · · · + a1 b0 + a1 b1 + a1 b2 + . . .
+ a2 b0 + a2 b1 + · · · + a3 b0 + . . .
= a0 b0 + (a0 b1 + a1 b0 ) + (a0 b2 + a1 b1 + a2 b0 )
+ (a0 b3 + a1 b2 + a2 b1 + a3 b0 ) + . . . .
In general, writing m = n − k, we have formally that

    (∑_{n=0}^∞ a_n)(∑_{n=0}^∞ b_n) = ∑_{k=0}^∞ ∑_{m=0}^∞ a_k b_m = ∑_{n=0}^∞ ∑_{k=0}^n a_k b_{n−k}.


There are no convergence issues about the individual terms in the Cauchy product, since ∑_{k=0}^n a_k b_{n−k} is a finite sum.
Theorem 4.38 (Cauchy product). If the series

    ∑_{n=0}^∞ a_n,    ∑_{n=0}^∞ b_n

are absolutely convergent, then the Cauchy product is absolutely convergent and

    ∑_{n=0}^∞ (∑_{k=0}^n a_k b_{n−k}) = (∑_{n=0}^∞ a_n)(∑_{n=0}^∞ b_n).

Proof. For every N ∈ N, we have

    ∑_{n=0}^N |∑_{k=0}^n a_k b_{n−k}| ≤ ∑_{n=0}^N ∑_{k=0}^n |a_k||b_{n−k}|
                                      ≤ (∑_{k=0}^N |a_k|)(∑_{m=0}^N |b_m|)
                                      ≤ (∑_{n=0}^∞ |a_n|)(∑_{n=0}^∞ |b_n|).

Thus, the Cauchy product is absolutely convergent, since the partial sums of its absolute values are bounded from above.
Since the series for the Cauchy product is absolutely convergent, any rearrangement of it converges to the same sum. In particular, the subsequence of partial sums given by

    (∑_{n=0}^N a_n)(∑_{n=0}^N b_n) = ∑_{n=0}^N ∑_{m=0}^N a_n b_m

corresponds to a rearrangement of the Cauchy product, so

    ∑_{n=0}^∞ ∑_{k=0}^n a_k b_{n−k} = lim_{N→∞} (∑_{n=0}^N a_n)(∑_{n=0}^N b_n) = (∑_{n=0}^∞ a_n)(∑_{n=0}^∞ b_n).

In fact, as we discuss in the next section, since the series of term-by-term products of absolutely convergent series converges absolutely, every rearrangement of the product series — not just the one in the Cauchy product — converges to the product of the sums.
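The theorem can be illustrated with two geometric series, where all three sums are known in closed form. In the Python sketch below (our own example, not from the text), a_n = (1/2)^n and b_n = (1/3)^n, whose sums are 2 and 3/2, so the Cauchy product should sum to 3:

```python
def cauchy_product_partial(a, b):
    """Partial sum of the Cauchy product built from terms a[0..N], b[0..N]."""
    N = len(a) - 1
    return sum(sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(N + 1))

N = 60
a = [0.5 ** n for n in range(N + 1)]          # geometric series summing to 2
b = [(1.0 / 3.0) ** n for n in range(N + 1)]  # geometric series summing to 3/2
product = cauchy_product_partial(a, b)        # should approach 2 * (3/2) = 3
```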

4.10. * Double series
A double series is a series of the form

    ∑_{m,n=1}^∞ a_{mn},

where the terms a_{mn} are indexed by a pair of natural numbers m, n ∈ N. More formally, the terms are defined by a function f : N × N → R, where a_{mn} = f(m, n).
In this section, we consider sums of double series; this material is not used later on.
There are many different ways to define the sum of a double series since, unlike the index set N of a series, the index set N × N of a double series does not come with a natural linear order. Our main interest here is in giving a definition of the sum of a double series that does not depend on the order of its terms. As for series, this unordered sum exists if and only if the double series is absolutely convergent.
If F ⊂ N × N is a finite subset of pairs of natural numbers, then we denote by

    ∑_F a_{mn} = ∑_{(m,n)∈F} a_{mn}

the partial sum of all terms a_{mn} whose indices (m, n) belong to F.
Definition 4.39. The unordered sum of nonnegative real numbers a_{mn} ≥ 0 is

    ∑_{N×N} a_{mn} = sup_{F∈F} ∑_F a_{mn},

where the supremum is taken over the collection F of all finite subsets F ⊂ N × N. The unordered sum converges if this supremum is finite and diverges to ∞ if this supremum is ∞.
In other words, the unordered sum of a double series of nonnegative terms is the supremum of the set of all finite partial sums of the series. Note that this supremum exists if and only if the finite partial sums of the series are bounded from above.
Example 4.40. The unordered sum

    ∑_{(m,n)∈N×N} 1/(m + n)^p = 1/(1 + 1)^p + 1/(1 + 2)^p + 1/(2 + 1)^p + 1/(1 + 3)^p + 1/(2 + 2)^p + ...

converges if p > 2 and diverges if p ≤ 2. To see this, first note that if
T = {(m, n) ∈ N × N : 2 ≤ m + n ≤ N } is a “triangular” set of indices, then
    ∑_T 1/(m + n)^p = ∑_{m=1}^{N−1} ∑_{n=1}^{N−m} 1/(m + n)^p = ∑_{k=2}^N (k − 1)/k^p ≥ (1/2) ∑_{k=2}^N 1/k^{p−1},

since the diagonal m + n = k contains k − 1 index pairs and k − 1 ≥ k/2 for k ≥ 2.

It follows that

    ∑_{N×N} 1/(m + n)^p ≥ (1/2) ∑_{k=2}^N 1/k^{p−1}

for every N ∈ N, so the double series diverges if p ≤ 2 since the (p − 1)-series diverges. Moreover, if p > 2 and F is a finite subset of N × N, then there exists a triangular set T such that F ⊂ T , so that

    ∑_F 1/(m + n)^p ≤ ∑_T 1/(m + n)^p = ∑_{k=2}^N (k − 1)/k^p < ∑_{k=2}^∞ 1/k^{p−1}.

It follows that the unordered sum converges if p > 2, with

    ∑_{N×N} 1/(m + n)^p = ∑_{k=2}^∞ (k − 1)/k^p.

Note that this double p-series converges only if its terms approach zero at a faster rate than the terms in a single p-series (of degree greater than 2 in (m, n) rather than degree greater than 1 in n).
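The identity between the triangular sum and its diagonal form can be checked numerically; the Python sketch below is our own check, not part of the text:

```python
def triangular_sum(p, N):
    """Sum of 1/(m+n)^p over the triangle 2 <= m + n <= N with m, n >= 1."""
    return sum(1.0 / (m + n) ** p
               for m in range(1, N) for n in range(1, N - m + 1))

def diagonal_sum(p, N):
    """Equivalent diagonal form: the diagonal m + n = k holds k - 1 terms."""
    return sum((k - 1) / float(k) ** p for k in range(2, N + 1))
```

For p = 3 the sums stay bounded as N grows, while for p = 2 they keep increasing, in line with the dichotomy above.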
We define a general unordered sum of real numbers by summing its positive and negative terms separately. (Recall the notation in Definition 4.15 for the positive and negative parts of a real number.)
Definition 4.41. The unordered sum of a double series of real numbers is

    ∑_{N×N} a_{mn} = ∑_{N×N} a_{mn}^+ − ∑_{N×N} a_{mn}^−,

where a_{mn} = a_{mn}^+ − a_{mn}^− is the decomposition of a_{mn} into its positive and negative parts. The unordered sum converges if both ∑ a_{mn}^+ and ∑ a_{mn}^− are finite; diverges to ∞ if ∑ a_{mn}^+ = ∞ and ∑ a_{mn}^− is finite; diverges to −∞ if ∑ a_{mn}^+ is finite and ∑ a_{mn}^− = ∞; and is undefined if both ∑ a_{mn}^+ and ∑ a_{mn}^− diverge to ∞.
This definition does not require us to order the index set N × N in any way; in fact, the same definition applies to any series of real numbers

    ∑_{i∈I} a_i

whose terms are indexed by an arbitrary set I. A sum over a set of indices will always denote an unordered sum.
Definition 4.42. A double series

    ∑_{m,n=1}^∞ a_{mn}

of real numbers converges absolutely if the unordered sum

    ∑_{N×N} |a_{mn}| < ∞

converges.
The following result is a straightforward consequence of the definitions and the completeness of R.


Proposition 4.43. An unordered sum ∑ a_{mn} of real numbers converges if and only if it converges absolutely, and in that case

    ∑_{N×N} |a_{mn}| = ∑_{N×N} a_{mn}^+ + ∑_{N×N} a_{mn}^−.

Proof. First, suppose that the unordered sum

    ∑_{N×N} a_{mn}

converges absolutely. If F ⊂ N × N is any finite subset, then

    ∑_F a_{mn}^± ≤ ∑_F |a_{mn}| ≤ ∑_{N×N} |a_{mn}|.

It follows that the finite partial sums of the positive and negative terms of the series are bounded from above, so the unordered sum converges.

Conversely, suppose that the unordered sum converges. If F ⊂ N × N is any finite subset, then

    ∑_F |a_{mn}| = ∑_F a_{mn}^+ + ∑_F a_{mn}^− ≤ ∑_{N×N} a_{mn}^+ + ∑_{N×N} a_{mn}^−.

It follows that the finite partial sums of the absolute values of the terms in the series are bounded from above, so the unordered sum converges absolutely. Furthermore,

    ∑_{N×N} |a_{mn}| ≤ ∑_{N×N} a_{mn}^+ + ∑_{N×N} a_{mn}^−.

To prove the inequality in the other direction, let ε > 0. There exist finite sets F_+, F_− ⊂ N × N such that

    ∑_{F_+} a_{mn}^+ > ∑_{N×N} a_{mn}^+ − ε/2,    ∑_{F_−} a_{mn}^− > ∑_{N×N} a_{mn}^− − ε/2.

Let F = F_+ ∪ F_−. Then, since a_{mn}^± ≥ 0, we have

    ∑_F |a_{mn}| = ∑_F a_{mn}^+ + ∑_F a_{mn}^− ≥ ∑_{F_+} a_{mn}^+ + ∑_{F_−} a_{mn}^− > ∑_{N×N} a_{mn}^+ + ∑_{N×N} a_{mn}^− − ε.

Since ε > 0 is arbitrary, it follows that

    ∑_{N×N} |a_{mn}| ≥ ∑_{N×N} a_{mn}^+ + ∑_{N×N} a_{mn}^−,

which completes the proof.

Next, we define rearrangements of a double series into single series and show that every rearrangement of a convergent unordered sum converges to the same sum.

Definition 4.44. A rearrangement of a double series

    ∑_{m,n=1}^∞ a_{mn}

is a series of the form

    ∑_{k=1}^∞ b_k,    b_k = a_{σ(k)},

where σ : N → N × N is a one-to-one, onto map.
Example 4.45. The rearrangement corresponding to the map f : N → N × N defined in the proof of Proposition 1.45 is given by

    ∑_{m,n=1}^∞ a_{mn} = a_{11} + a_{21} + a_{12} + a_{31} + a_{22} + a_{13} + a_{41} + a_{32} + a_{23} + a_{14} + ....

Theorem 4.46. If the unordered sum of a double series of real numbers converges, then every rearrangement of the double series into a single series converges to the unordered sum.
Proof. Suppose that the unordered sum ∑ a_{mn} converges with

    ∑_{N×N} a_{mn}^+ = S_+,    ∑_{N×N} a_{mn}^− = S_−,

and let

    ∑_{k=1}^∞ b_k

be a rearrangement of the double series corresponding to a map σ : N → N × N. For k ∈ N, let

    F_k = {(m, n) ∈ N × N : (m, n) = σ(j) for some 1 ≤ j ≤ k},

so that

    ∑_{j=1}^k b_j = ∑_{F_k} a_{mn} = ∑_{F_k} a_{mn}^+ − ∑_{F_k} a_{mn}^−.

Given ε > 0, choose finite sets F_+, F_− ⊂ N × N such that

    S_+ − ε < ∑_{F_+} a_{mn}^+ ≤ S_+,    S_− − ε < ∑_{F_−} a_{mn}^− ≤ S_−,

and define N ∈ N by

    N = max{j ∈ N : σ(j) ∈ F_+ ∪ F_−}.

If k ≥ N, then F_k ⊃ F_+ ∪ F_− and, since a_{mn}^± ≥ 0,

    ∑_{F_+} a_{mn}^+ ≤ ∑_{F_k} a_{mn}^+ ≤ S_+,    ∑_{F_−} a_{mn}^− ≤ ∑_{F_k} a_{mn}^− ≤ S_−.

It follows that

    S_+ − ε < ∑_{F_k} a_{mn}^+ ≤ S_+,    S_− − ε < ∑_{F_k} a_{mn}^− ≤ S_−,

which implies that

    |∑_{j=1}^k b_j − (S_+ − S_−)| < ε.

This inequality proves that the rearrangement ∑ b_k converges to the unordered sum S_+ − S_− of the double series.

The rearrangement of a double series into a single series is one natural way to interpret a double series in terms of single series. Another way is to use iterated sums of single series.
Given a double series ∑ a_{mn}, one can define two iterated sums, obtained by summing first over one index followed by the other:

    ∑_{m=1}^∞ (∑_{n=1}^∞ a_{mn}) = lim_{M→∞} ∑_{m=1}^M (lim_{N→∞} ∑_{n=1}^N a_{mn});

    ∑_{n=1}^∞ (∑_{m=1}^∞ a_{mn}) = lim_{N→∞} ∑_{n=1}^N (lim_{M→∞} ∑_{m=1}^M a_{mn}).
As the following example shows, these iterated sums may not be equal, even if both of them converge.
Example 4.47. Define a_{mn} by

    a_{mn} = 1 if n = m + 1;  −1 if m = n + 1;  0 otherwise.

Then, by writing out the terms a_{mn} in a table, one can see that

    ∑_{n=1}^∞ a_{mn} = 1 if m = 1, and 0 otherwise;
    ∑_{m=1}^∞ a_{mn} = −1 if n = 1, and 0 otherwise,

so that

    ∑_{m=1}^∞ (∑_{n=1}^∞ a_{mn}) = 1,    ∑_{n=1}^∞ (∑_{m=1}^∞ a_{mn}) = −1.

Note that for this series

    ∑_{N×N} |a_{mn}| = ∞,

so it is not absolutely convergent. Furthermore, both of the sums

    ∑_{N×N} a_{mn}^+,    ∑_{N×N} a_{mn}^−

diverge to ∞, so the unordered sum is not well-defined.
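The two iterated sums of this example can be computed directly. In the Python sketch below (our own truncation scheme), the inner index runs one step further than the outer one, so every truncated row or column keeps both of its nonzero entries:

```python
def a(m, n):
    """Example 4.47: +1 just above the diagonal, -1 just below, 0 elsewhere."""
    if n == m + 1:
        return 1
    if m == n + 1:
        return -1
    return 0

SIZE = 100
# Summing rows first: only row m = 1 contributes (its +1 has no matching -1).
sum_rows_first = sum(sum(a(m, n) for n in range((1), SIZE + 2))
                     for m in range(1, SIZE + 1))
# Summing columns first: only column n = 1 contributes a -1.
sum_cols_first = sum(sum(a(m, n) for m in range(1, SIZE + 2))
                     for n in range(1, SIZE + 1))
```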


The following basic result, which is a special case of Fubini’s theorem, guarantees that both of the iterated sums of an absolutely convergent double series exist and are equal to the unordered sum. It also gives a criterion for the absolute convergence of a double series in terms of the convergence of its iterated sums. The key point of the proof is that a double sum over a “rectangular” set of indices is equal to an iterated sum. We can then estimate sums over arbitrary finite subsets of indices in terms of sums over rectangles.
Theorem 4.48 (Fubini for double series). A double series of real numbers converges absolutely if and only if either one of the iterated series

    ∑_{m=1}^∞ (∑_{n=1}^∞ |a_{mn}|),    ∑_{n=1}^∞ (∑_{m=1}^∞ |a_{mn}|)

converges. In that case, both iterated series converge to the unordered sum of the double series:

    ∑_{m=1}^∞ (∑_{n=1}^∞ a_{mn}) = ∑_{N×N} a_{mn} = ∑_{n=1}^∞ (∑_{m=1}^∞ a_{mn}).

Proof. First suppose that one of the iterated sums of the absolute values exists. Without loss of generality, we suppose that

    ∑_{m=1}^∞ (∑_{n=1}^∞ |a_{mn}|) < ∞.

Let F ⊂ N × N be a finite subset. Choose M, N ∈ N such that m ≤ M and n ≤ N for all (m, n) ∈ F. Then F ⊂ R, where the rectangle R is given by R = {1, 2, . . . , M} × {1, 2, . . . , N}, so that

    ∑_F |a_{mn}| ≤ ∑_R |a_{mn}| = ∑_{m=1}^M ∑_{n=1}^N |a_{mn}| ≤ ∑_{m=1}^∞ (∑_{n=1}^∞ |a_{mn}|).

Thus, the finite partial sums of ∑ |a_{mn}| are bounded from above, so the unordered sum converges absolutely.

Conversely, suppose that the unordered sum converges absolutely. Then, using Proposition 2.25 and the fact that the supremum of partial sums of non-negative terms over rectangles in N × N is equal to the supremum over all finite subsets, we get that

    ∑_{m=1}^∞ (∑_{n=1}^∞ |a_{mn}|) = sup_{M∈N} ∑_{m=1}^M (sup_{N∈N} ∑_{n=1}^N |a_{mn}|)
                                   = sup_{(M,N)∈N×N} ∑_{m=1}^M ∑_{n=1}^N |a_{mn}|
                                   = ∑_{N×N} |a_{mn}|.

Thus, the iterated sums converge to the unordered sum. Moreover, we have similarly that

    ∑_{m=1}^∞ (∑_{n=1}^∞ a_{mn}^+) = ∑_{N×N} a_{mn}^+,    ∑_{m=1}^∞ (∑_{n=1}^∞ a_{mn}^−) = ∑_{N×N} a_{mn}^−,

which implies that

    ∑_{m=1}^∞ (∑_{n=1}^∞ a_{mn}) = ∑_{m=1}^∞ (∑_{n=1}^∞ a_{mn}^+) − ∑_{m=1}^∞ (∑_{n=1}^∞ a_{mn}^−) = ∑_{N×N} a_{mn}.
The preceding results show that the sum of an absolutely convergent double series is unambiguous; unordered sums, sums of rearrangements into single series, and iterated sums all converge to the same value. On the other hand, the sum of a conditionally convergent double series depends on how it is defined, e.g., on how one chooses to rearrange the double series into a single series. We conclude by describing one other way to define double sums.
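Fubini’s theorem for double series can be illustrated with the separable terms a_{mn} = 1/(2^m 3^n) (our own example, not from the text), whose unordered sum is (∑ 1/2^m)(∑ 1/3^n) = 1 · (1/2) = 1/2:

```python
def term(m, n):
    """Separable terms a_mn = 1/(2^m * 3^n) for m, n >= 1 (our own example)."""
    return 1.0 / (2 ** m * 3 ** n)

M = N = 50
# Both iterated sums should agree and approach 1/2.
iterated_mn = sum(sum(term(m, n) for n in range(1, N + 1))
                  for m in range(1, M + 1))
iterated_nm = sum(sum(term(m, n) for m in range(1, M + 1))
                  for n in range(1, N + 1))
```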
Example 4.49. A natural way to generalize the sum of a single series to a double series, going back to Cauchy (1821) and Pringsheim (1897), is to say that

    ∑_{m,n=1}^∞ a_{mn} = S

if for every ε > 0 there exist M, N ∈ N such that

    |∑_{i=1}^m ∑_{j=1}^n a_{ij} − S| < ε

for all m > M and n > N. We write this definition briefly as

    ∑_{m,n=1}^∞ a_{mn} = lim_{M,N→∞} ∑_{m=1}^M ∑_{n=1}^N a_{mn}.

That is, we sum over larger and larger “rectangles” of indices in the N × N-plane. An absolutely convergent series converges in this sense to the same sum, but some conditionally convergent series also converge. For example, using this definition of the sum, we have

    ∑_{m,n=1}^∞ (−1)^{m+n}/(mn) = lim_{M,N→∞} ∑_{m=1}^M ∑_{n=1}^N (−1)^{m+n}/(mn)
        = (lim_{M→∞} ∑_{m=1}^M (−1)^m/m)(lim_{N→∞} ∑_{n=1}^N (−1)^n/n)
        = (∑_{n=1}^∞ (−1)^n/n)^2 = (log 2)^2,

but the series is not absolutely convergent, since the sums of both its positive and negative terms diverge to ∞.
This definition of a sum is not, however, as easy to use as the unordered sum of an absolutely convergent series. For example, it does not satisfy Fubini’s theorem
(although one can show that if the sum of a double series exists in this sense and the iterated sums also exist, then they are equal [16]).

4.11. * The irrationality of e
In this section, we use series to prove that e is an irrational number. In Proposition 3.32, we defined e ≈ 2.71828 . . . as the limit

    e = lim_{n→∞} (1 + 1/n)^n.
We first obtain an alternative expression for e as the sum of a series.
Proposition 4.50.

    e = ∑_{n=0}^∞ 1/n!.

Proof. Using the binomial theorem, as in the proof of Proposition 3.32, we find that

    (1 + 1/n)^n = 2 + (1/2!)(1 − 1/n) + (1/3!)(1 − 1/n)(1 − 2/n) + ...
        + (1/k!)(1 − 1/n)(1 − 2/n) · · · (1 − (k − 1)/n) + ...
        + (1/n!)(1 − 1/n)(1 − 2/n) · · · (2/n)(1/n)
      < 2 + 1/2! + 1/3! + 1/4! + · · · + 1/n!.

Taking the limit of this inequality as n → ∞, we get that

    e ≤ ∑_{n=0}^∞ 1/n!.

To get the reverse inequality, we observe that for every 2 ≤ k ≤ n,

    (1 + 1/n)^n ≥ 2 + (1/2!)(1 − 1/n) + (1/3!)(1 − 1/n)(1 − 2/n) + ... + (1/k!)(1 − 1/n)(1 − 2/n) · · · (1 − (k − 1)/n).

Fixing k and taking the limit as n → ∞, we get that

    e ≥ ∑_{j=0}^k 1/j!.

Then, taking the limit as k → ∞, we find that

    e ≥ ∑_{n=0}^∞ 1/n!,

which proves the result.
This series for e is very rapidly convergent. The next proposition gives an explicit error estimate.
Proposition 4.51. For every n ∈ N,

    0 < e − ∑_{k=0}^n 1/k! < 1/(n · n!).
Chapter 5. Topology of the Real Numbers

5.1. Open sets

Definition 5.1. A set G ⊂ R is open if for every x ∈ G there exists a δ > 0 such that G ⊃ (x − δ, x + δ).
The entire set of real numbers R is obviously open, and the empty set ∅ is open since it satisfies the definition vacuously (there is no x ∈ ∅).
Example 5.2. The open interval I = (0, 1) is open. If x ∈ I, then

    I ⊃ (x − δ, x + δ),    δ = min{x/2, (1 − x)/2} > 0.

Similarly, every finite or infinite open interval (a, b), (−∞, b), or (a, ∞) is open.
Example 5.3. The half-open interval J = (0, 1] isn’t open, since 1 ∈ J and
(1 − δ, 1 + δ) isn’t a subset of J for any δ > 0, however small.
The next proposition states a characteristic property of open sets.
Proposition 5.4. An arbitrary union of open sets is open, and a finite intersection of open sets is open.

Proof. Suppose that {A_i ⊂ R : i ∈ I} is an arbitrary collection of open sets. If x ∈ ⋃_{i∈I} A_i, then x ∈ A_i for some i ∈ I. Since A_i is open, there is a δ > 0 such that A_i ⊃ (x − δ, x + δ), and therefore

    ⋃_{i∈I} A_i ⊃ (x − δ, x + δ),

which proves that ⋃_{i∈I} A_i is open.

Suppose that {A_i ⊂ R : i = 1, 2, . . . , n} is a finite collection of open sets. If x ∈ ⋂_{i=1}^n A_i, then x ∈ A_i for every 1 ≤ i ≤ n. Since A_i is open, there is a δ_i > 0 such that A_i ⊃ (x − δ_i, x + δ_i). Let

    δ = min{δ_1, δ_2, . . . , δ_n} > 0.

Then we see that

    ⋂_{i=1}^n A_i ⊃ (x − δ, x + δ),

which proves that ⋂_{i=1}^n A_i is open.

The previous proof fails for an infinite intersection of open sets, since we may have δi > 0 for every i ∈ N but inf{δi : i ∈ N} = 0.
Example 5.5. The interval

    I_n = (−1/n, 1/n)

is open for every n ∈ N, but the intersection

    ⋂_{n=1}^∞ I_n = {0}

is not open.
In fact, every open set in R is a countable union of disjoint open intervals, but we won’t prove it here.
5.1.1. Neighborhoods. Next, we introduce the notion of the neighborhood of a point, which often gives clearer, but equivalent, descriptions of topological concepts than ones that use open intervals.
Definition 5.6. A set U ⊂ R is a neighborhood of a point x ∈ R if
U ⊃ (x − δ, x + δ) for some δ > 0. The open interval (x − δ, x + δ) is called a δ-neighborhood of x.
A neighborhood of x needn’t be an open interval about x; it just has to contain one. Some people require that a neighborhood also be an open set, but we don’t; we’ll specify that a neighborhood is open when it’s needed.
Example 5.7. If a < x < b, then the closed interval [a, b] is a neighborhood of x, since it contains the interval (x − δ, x + δ) for sufficiently small δ > 0. On the other hand, [a, b] is not a neighborhood of the endpoints a, b since no open interval about a or b is contained in [a, b].
We can restate the definition of open sets in terms of neighborhoods as follows.


Definition 5.8. A set G ⊂ R is open if every x ∈ G has a neighborhood U such that G ⊃ U .
In particular, an open set is itself a neighborhood of each of its points.
We can restate Definition 3.10 for the limit of a sequence in terms of neighborhoods as follows.
Proposition 5.9. A sequence (xn ) of real numbers converges to a limit x ∈ R if and only if for every neighborhood U of x there exists N ∈ N such that xn ∈ U for all n > N .
Proof. First suppose the condition in the proposition holds. Given ε > 0, let U = (x − ε, x + ε) be an ε-neighborhood of x. Then there exists N ∈ N such that x_n ∈ U for all n > N, which means that |x_n − x| < ε. Thus, x_n → x as n → ∞.

Conversely, suppose that x_n → x as n → ∞, and let U be a neighborhood of x. Then there exists ε > 0 such that U ⊃ (x − ε, x + ε). Choose N ∈ N such that |x_n − x| < ε for all n > N. Then x_n ∈ U for all n > N, which proves the condition.

5.1.2. Relatively open sets. We define relatively open sets by restricting open sets in R to a subset.
Definition 5.10. If A ⊂ R then B ⊂ A is relatively open in A, or open in A, if
B = A ∩ G where G is open in R.
Example 5.11. Let A = [0, 1]. Then the half-open intervals (a, 1] and [0, b) are open in A for every 0 ≤ a < 1 and 0 < b ≤ 1, since
(a, 1] = [0, 1] ∩ (a, 2),

[0, b) = [0, 1] ∩ (−1, b)

and (a, 2), (−1, b) are open in R. By contrast, neither (a, 1] nor [0, b) is open in R.
The neighborhood definition of open sets generalizes to relatively open sets.
First, we define relative neighborhoods in the obvious way.
Definition 5.12. If A ⊂ R then a relative neighborhood in A of a point x ∈ A is a set V = A ∩ U where U is a neighborhood of x in R.
As we show next, a set is relatively open if and only if it contains a relative neighborhood of every point.
Proposition 5.13. A set B ⊂ A is relatively open in A if and only if every x ∈ B has a relative neighborhood V in A such that B ⊃ V .
Proof. Assume that B is open in A. Then B = A ∩ G where G is open in R. If x ∈ B, then x ∈ G, and since G is open, there is a neighborhood U of x in R such that G ⊃ U . Then V = A ∩ U is a relative neighborhood of x with B ⊃ V .
Conversely, assume that every point x ∈ B has a relative neighborhood Vx =
A ∩ U_x in A such that V_x ⊂ B, where U_x is a neighborhood of x in R. Since U_x is a neighborhood of x, it contains an open neighborhood G_x ⊂ U_x. We claim that B = A ∩ G, where

    G = ⋃_{x∈B} G_x.


It then follows that G is open, since it’s a union of open sets, and therefore B = A∩G is relatively open in A.
To prove the claim, we show that B ⊂ A ∩ G and B ⊃ A ∩ G. First, B ⊂ A ∩ G since x ∈ A ∩ Gx ⊂ A ∩ G for every x ∈ B. Second, A ∩ Gx ⊂ A ∩ Ux ⊂ B for every x ∈ B. Taking the union over x ∈ B, we get that A ∩ G ⊂ B.

5.2. Closed sets
Sets are not doors. (Attributed to James Munkres.)
Closed sets are defined topologically as complements of open sets.
Definition 5.14. A set F ⊂ R is closed if its complement F^c = {x ∈ R : x ∉ F} is open.
Example 5.15. The closed interval I = [0, 1] is closed since
I c = (−∞, 0) ∪ (1, ∞) is a union of open intervals, and therefore it’s open. Similarly, every finite or infinite closed interval [a, b], (−∞, b], or [a, ∞) is closed.
The empty set ∅ and R are both open and closed; they’re the only such sets.
Most subsets of R are neither open nor closed (so, unlike doors, “not open” doesn’t mean “closed” and “not closed” doesn’t mean “open”).
Example 5.16. The half-open interval I = (0, 1] isn’t open because it doesn’t contain any neighborhood of the right endpoint 1 ∈ I. Its complement
I^c = (−∞, 0] ∪ (1, ∞) isn’t open either, since it doesn’t contain any neighborhood of 0 ∈ I^c. Thus, I isn’t closed either.
Example 5.17. The set of rational numbers Q ⊂ R is neither open nor closed.
It isn’t open because every neighborhood of a rational number contains irrational numbers, and its complement isn’t open because every neighborhood of an irrational number contains rational numbers.
Closed sets can also be characterized in terms of sequences.
Proposition 5.18. A set F ⊂ R is closed if and only if the limit of every convergent sequence in F belongs to F .
Proof. First suppose that F is closed and (x_n) is a convergent sequence of points x_n ∈ F such that x_n → x. Then every neighborhood of x contains points x_n ∈ F. It follows that x ∉ F^c, since F^c is open and every y ∈ F^c has a neighborhood U ⊂ F^c that contains no points in F. Therefore, x ∈ F.

Conversely, suppose that the limit of every convergent sequence of points in F belongs to F. Let x ∈ F^c. Then x must have a neighborhood U ⊂ F^c; otherwise for every n ∈ N there exists x_n ∈ F such that x_n ∈ (x − 1/n, x + 1/n), so x = lim x_n, and x is the limit of a sequence in F. Thus, F^c is open and F is closed.


Example 5.19. To verify that the closed interval [0, 1] is closed from Proposition 5.18, suppose that (xn ) is a convergent sequence in [0, 1]. Then 0 ≤ xn ≤ 1 for all n ∈ N, and since limits preserve (non-strict) inequalities, we have
    0 ≤ lim_{n→∞} x_n ≤ 1,

meaning that the limit belongs to [0, 1]. On the other hand, the half-open interval
I = (0, 1] isn’t closed since, for example, (1/n) is a convergent sequence in I whose limit 0 doesn’t belong to I.
Closed sets have complementary properties to those of open sets stated in
Proposition 5.4.
Proposition 5.20. An arbitrary intersection of closed sets is closed, and a finite union of closed sets is closed.
Proof. If {F_i : i ∈ I} is an arbitrary collection of closed sets, then every F_i^c is open. By De Morgan’s laws in Proposition 1.23, we have

    (⋂_{i∈I} F_i)^c = ⋃_{i∈I} F_i^c,

which is open by Proposition 5.4. Thus ⋂_{i∈I} F_i is closed. Similarly, the complement of a finite union of closed sets is open, since

    (⋃_{i=1}^n F_i)^c = ⋂_{i=1}^n F_i^c,

so a finite union of closed sets is closed.
The union of infinitely many closed sets needn’t be closed.
Example 5.21. If I_n is the closed interval

    I_n = [1/n, 1 − 1/n],

then the union of the I_n is an open interval:

    ⋃_{n=1}^∞ I_n = (0, 1).

If A is a subset of R, it is useful to consider different ways in which a point x ∈ R can belong to A or be “close” to A.
Definition 5.22. Let A ⊂ R be a subset of R. Then x ∈ R is:
(1) an interior point of A if there exists δ > 0 such that A ⊃ (x − δ, x + δ);
(2) an isolated point of A if x ∈ A and there exists δ > 0 such that x is the only point in A that belongs to the interval (x − δ, x + δ);
(3) a boundary point of A if for every δ > 0 the interval (x − δ, x + δ) contains points in A and points not in A;
(4) an accumulation point of A if for every δ > 0 the interval (x−δ, x+δ) contains a point in A that is distinct from x.


When the set A is understood from the context, we refer, for example, to an
“interior point.”
Interior and isolated points of a set belong to the set, whereas boundary and accumulation points may or may not belong to the set. In the definition of a boundary point x, we allow the possibility that x itself is a point in A belonging to (x − δ, x + δ), but in the definition of an accumulation point, we consider only points in A belonging to (x − δ, x + δ) that are distinct from x. Thus an isolated point is a boundary point, but it isn’t an accumulation point. Accumulation points are also called cluster points or limit points.
We illustrate these definitions with a number of examples.
Example 5.23. Let I = (a, b) be an open interval and J = [a, b] a closed interval.
Then the set of interior points of I or J is (a, b), and the set of boundary points consists of the two endpoints {a, b}. The set of accumulation points of I or J is the closed interval [a, b] and I, J have no isolated points. Thus, I, J have the same interior, isolated, boundary and accumulation points, but J contains its boundary points and all of its accumulation points, while I does not.
Example 5.24. Let a < c < b and suppose that
A = (a, c) ∪ (c, b) is an open interval punctured at c. Then the set of interior points is A, the set of boundary points is {a, b, c}, the set of accumulation points is the closed interval
[a, b], and there are no isolated points.
Example 5.25. Let
A = {1/n : n ∈ N}.
Then every point of A is an isolated point, since a sufficiently small interval about 1/n doesn’t contain 1/m for any integer m ≠ n, and A has no interior points. The set of boundary points of A is A ∪ {0}. The point 0 ∉ A is the only accumulation point of A, since every open interval about 0 contains 1/n for sufficiently large n.
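These point classifications lend themselves to a quick numerical sanity check. The sketch below is illustrative only: it truncates A = {1/n : n ∈ N} to finitely many points and tests the defining conditions directly (the helper names are ours, not the text's).

```python
# Illustrative check for A = {1/n : n in N}, truncated to n <= 1000.
A = [1 / n for n in range(1, 1001)]

def is_isolated(x, points, delta):
    # x is isolated if (x - delta, x + delta) meets the set only at x itself
    return all(y == x or not (x - delta < y < x + delta) for y in points)

# Each point 1/n is isolated: a radius smaller than the gap to its
# neighbors works, e.g. half of 1/n - 1/(n+1).
n = 10
delta = 0.5 * (1 / n - 1 / (n + 1))
assert is_isolated(1 / n, A, delta)

# 0 is an accumulation point: every interval (-delta, delta) contains a
# point 1/n distinct from 0 (true for every delta > 0, since 1/n -> 0).
for delta in [0.5, 0.1, 0.01]:
    assert any(0 < y < delta for y in A)
```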
Example 5.26. The set N of natural numbers has no interior or accumulation points. Every point of N is both a boundary point and an isolated point.
Example 5.27. The set Q of rational numbers has no interior or isolated points, and every real number is both a boundary and accumulation point of Q.
Example 5.28. The Cantor set C defined in Section 5.5 below has no interior points and no isolated points. The set of accumulation points and the set of boundary points of C is equal to C.
The following proposition gives a sequential definition of an accumulation point.
Proposition 5.29. A point x ∈ R is an accumulation point of A ⊂ R if and only if there is a sequence (xn ) in A with xn ≠ x for every n ∈ N such that xn → x as n → ∞.
Proof. Suppose x ∈ R is an accumulation point of A. Definition 5.22 implies that for every n ∈ N there exists xn ∈ A \ {x} such that xn ∈ (x − 1/n, x + 1/n). It follows that xn → x as n → ∞.


Conversely, if x is the limit of a sequence (xn ) in A with xn ≠ x, and U is a neighborhood of x, then xn ∈ U \ {x} for sufficiently large n ∈ N, which proves that x is an accumulation point of A.
Example 5.30. If
A = {1/n : n ∈ N},
then 0 is an accumulation point of A, since (1/n) is a sequence in A such that 1/n → 0 as n → ∞. On the other hand, 1 is not an accumulation point of A, since the only sequences in A that converge to 1 are the ones whose terms eventually equal 1, and the terms are required to be distinct from 1.
We can also characterize open and closed sets in terms of their interior and accumulation points.
Proposition 5.31. A set A ⊂ R is:
(1) open if and only if every point of A is an interior point;
(2) closed if and only if every accumulation point belongs to A.
Proof. If A is open, then it is an immediate consequence of the definitions that every point in A is an interior point. Conversely, if every point x ∈ A is an interior point, then there is an open neighborhood Ux ⊂ A of x, so
A = ⋃_{x∈A} Ux
is a union of open sets, and therefore A is open.
If A is closed and x is an accumulation point, then Proposition 5.29 and Proposition 5.18 imply that x ∈ A. Conversely, if every accumulation point of A belongs to A, then every x ∈ Ac has a neighborhood with no points in A, so Ac is open and
A is closed.

5.3. Compact sets
The significance of compact sets is not as immediately apparent as the significance of open sets, but the notion of compactness plays a central role in analysis. One indication of its importance already appears in the Bolzano-Weierstrass theorem
(Theorem 3.57).
Compact sets may be characterized in many different ways, and we will give the two most important definitions. One is based on sequences (every sequence has a convergent subsequence), and the other is based on open sets (every open cover has a finite subcover).
We will prove that a subset of R is compact if and only if it is closed and bounded. For example, every closed, bounded interval [a, b] is compact. There are, however, many other compact subsets of R. In Section 5.5 we describe a particularly interesting example called the Cantor set.
We emphasize that although the compact sets in R are exactly the closed and bounded sets, this isn’t their fundamental definition; rather it’s an explicit description of what compact sets look like in R. In more general spaces than R, closed and


bounded sets need not be compact, and it’s the properties defining compactness that are the crucial ones. Chapter 13 has further explanation.
5.3.1. Sequential definition. Intuitively, a compact set confines every sequence of points in the set so much that the sequence must accumulate at some point of the set. This implies that a subsequence converges to an accumulation point and leads to the following definition.
Definition 5.32. A set K ⊂ R is sequentially compact if every sequence in K has a convergent subsequence whose limit belongs to K.
Note that we require that the subsequence converges to a point in K, not to a point outside K.
We usually abbreviate “sequentially compact” to “compact,” but sometimes we need to distinguish explicitly between the sequential definition of compactness given above and the topological definition given in Definition 5.52 below.
Example 5.33. The open interval I = (0, 1) is not compact. The sequence (1/n) in I converges to 0, so every subsequence also converges to 0 ∉ I. Therefore, (1/n) has no convergent subsequence whose limit belongs to I.
Example 5.34. The set A = Q ∩ [0, 1] of rational numbers in [0, 1] is not compact. If (rn ) is a sequence of rational numbers 0 ≤ rn ≤ 1 that converges to 1/√2, then every subsequence also converges to 1/√2 ∉ A, so (rn ) has no subsequence that converges to a point in A.
Example 5.35. The set N is closed, but it is not compact. The sequence (n) in N has no convergent subsequence since every subsequence diverges to infinity.
As these examples illustrate, a compact set must be closed and bounded. Conversely, the Bolzano-Weierstrass theorem implies that every closed, bounded subset of R is compact. This fact may be taken as an alternative statement of the theorem.
Theorem 5.36 (Bolzano-Weierstrass). A subset of R is sequentially compact if and only if it is closed and bounded.
Proof. First, assume that K ⊂ R is sequentially compact. Let (xn ) be a sequence in K that converges to x ∈ R. Then every subsequence of (xn ) also converges to x, so the compactness of K implies that x ∈ K. It follows from Proposition 5.18 that
K is closed. Next, suppose for contradiction that K is unbounded. Then there is a sequence (xn ) in K such that |xn | → ∞ as n → ∞. Every subsequence of (xn ) is also unbounded and therefore diverges, so (xn ) has no convergent subsequence.
This contradicts the assumption that K is sequentially compact, so K is bounded.
Conversely, assume that K ⊂ R is closed and bounded. Let (xn ) be a sequence in K. Then (xn ) is bounded since K is bounded, and Theorem 3.57 implies that
(xn ) has a convergent subsequence. Since K is closed the limit of this subsequence belongs to K, so K is sequentially compact.
Example 5.37. Every closed, bounded interval [a, b] is compact.
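The bisection argument behind the Bolzano-Weierstrass theorem can be sketched in code. The routine below is a heuristic illustration and not the proof of Theorem 3.57: given a finite sample of a bounded sequence in [a, b], it repeatedly keeps a half-interval containing at least half of the remaining sample points, mimicking the "one half contains infinitely many terms" step.

```python
def accumulation_point(sample, a, b, steps=50):
    """Bisection sketch: keep whichever half of [a, b] holds at least
    half of the remaining sample points, then shrink onto it."""
    pts = list(sample)
    for _ in range(steps):
        m = (a + b) / 2
        left = [p for p in pts if p <= m]
        right = [p for p in pts if p > m]
        if len(left) >= len(right):
            b, pts = m, left
        else:
            a, pts = m, right
    return (a + b) / 2

# The bounded sequence x_n = (-1)^n (1 + 1/n) accumulates at 1 and -1;
# the bisection homes in on one of the two accumulation points.
xs = [(-1) ** n * (1 + 1 / n) for n in range(1, 2000)]
c = accumulation_point(xs, -2.0, 2.0)
assert abs(abs(c) - 1) < 0.01
```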


Example 5.38. Let
A = {1/n : n ∈ N}.
Then A is not compact, since it isn’t closed. However, the set K = A ∪ {0} is closed and bounded, so it is compact.
Example 5.39. The Cantor set defined in Section 5.5 is compact.
For later use, we prove a useful property of compact sets in R which follows from Theorem 5.36.
Proposition 5.40. If K ⊂ R is compact, then K has a maximum and minimum.
Proof. Since K is compact it is bounded and therefore it has a (finite) supremum
M = sup K. From the definition of the supremum, for every n ∈ N there exists xn ∈ K such that
M − 1/n < xn ≤ M.
It follows from the ‘squeeze’ theorem that xn → M as n → ∞. Since K is closed, M ∈ K, which proves that K has a maximum. A similar argument shows that m = inf K belongs to K, so K has a minimum.
Example 5.41. The bounded closed interval [0, 1] is compact and its maximum 1 and minimum 0 belong to the set, while the open interval (0, 1) is not compact and its supremum 1 and infimum 0 do not belong to the set. The unbounded, closed interval [0, ∞) is not compact, and it has no maximum.
Example 5.42. The set A in Example 5.38 is not compact and its infimum 0 does not belong to the set, but the compact set K has 0 as a minimum value.
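A minimal numerical illustration of Proposition 5.40, using the sets from the examples above (finite truncations, so Python's max/min stand in for sup/inf):

```python
# K = {0} together with {1/n : n in N}, truncated: compact, so the
# supremum and infimum are attained.
K = [0.0] + [1 / n for n in range(1, 51)]
assert max(K) == 1.0   # sup K = 1 is a maximum, attained at n = 1
assert min(K) == 0.0   # inf K = 0 is a minimum, attained at the added point

# Sampling the non-compact open interval (0, 1) away from its endpoints:
# the supremum 1 is approached but never attained by the samples.
samples = [i / 100 for i in range(1, 100)]
assert max(samples) < 1.0
```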
Compact sets have the following nonempty intersection property.
Theorem 5.43. Let {Kn : n ∈ N} be a decreasing sequence of nonempty compact sets of real numbers, meaning that
K1 ⊃ K2 ⊃ · · · ⊃ Kn ⊃ Kn+1 ⊃ · · · and Kn ≠ ∅ for every n ∈ N. Then
⋂_{n=1}^∞ Kn ≠ ∅.
Moreover, if diam Kn → 0 as n → ∞, then the intersection consists of a single point.
Proof. For each n ∈ N, choose xn ∈ Kn . Since (xn ) is a sequence in the compact set K1 , it has a convergent subsequence (xnk ) with xnk → x as k → ∞. Then xnk ∈ Kn for all k sufficiently large that nk ≥ n. Since a “tail” of the subsequence belongs to Kn and Kn is closed, we have x ∈ Kn for every n ∈ N. Hence, x ∈ ⋂_{n=1}^∞ Kn , and the intersection is nonempty.
If x, y ∈ ⋂_{n=1}^∞ Kn , then x, y ∈ Kn for every n ∈ N, so |x − y| ≤ diam Kn . If diam Kn → 0 as n → ∞, then |x − y| = 0, so x = y and ⋂_{n=1}^∞ Kn consists of a single point.

5. Topology of the Real Numbers

We refer to a decreasing sequence of sets as a nested sequence. In the case when each Kn = [an , bn ] is a compact interval, the preceding result is called the nested interval theorem.
Example 5.44. The nested compact intervals [0, 1 + 1/n] have nonempty intersection [0, 1]. Here, diam[0, 1 + 1/n] → 1 as n → ∞, and the intersection consists of an interval. The nested compact intervals [0, 1/n] have nonempty intersection
{0}, which consists of a single point since diam[0, 1/n] → 0 as n → ∞. On the other hand, the nested half-open intervals (0, 1/n] have empty intersection, as do the nested unbounded, closed intervals [n, ∞). In particular, Theorem 5.43 doesn’t hold if we replace “compact” by “closed.”
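The contrast in Example 5.44 can be checked directly. A small sketch, with finitely many intervals standing in for the infinite nested family:

```python
# Nested compact intervals K_n = [0, 1/n]: diameters shrink to 0 and the
# single common point 0 lies in every K_n.
nested = [(0.0, 1.0 / n) for n in range(1, 101)]
for (a1, b1), (a2, b2) in zip(nested, nested[1:]):
    assert a1 <= a2 and b2 <= b1          # each interval contains the next
assert all(a <= 0.0 <= b for a, b in nested)

# Contrast: for the half-open intervals (0, 1/n], any candidate point
# x > 0 eventually falls outside, so no point survives in all of them.
x = 0.015
assert any(not (0.0 < x <= 1.0 / n) for n in range(1, 101))
```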
Example 5.45. Define a nested sequence A1 ⊃ A2 ⊃ · · · of non-compact sets by
An = {1/k : k = n, n + 1, n + 2, . . . },
so A1 = A where A is the set considered in Example 5.38. Then
⋂_{n=1}^∞ An = ∅.
If we add 0 to the An to make them compact and define Kn = An ∪ {0}, then the intersection
⋂_{n=1}^∞ Kn = {0}
is nonempty.
5.3.2. Topological definition. To give a topological definition of compactness in terms of open sets, we introduce the notion of an open cover of a set.
Definition 5.46. Let A ⊂ R. A cover of A is a collection of sets {Ai ⊂ R : i ∈ I} whose union contains A,
⋃_{i∈I} Ai ⊃ A.
An open cover of A is a cover such that Ai is open for every i ∈ I.
Example 5.47. Let Ai = (1/i, 2). Then C = {Ai : i ∈ N} is an open cover of (0, 1], since
⋃_{i=1}^∞ (1/i, 2) = (0, 2) ⊃ (0, 1].
On the other hand, C is not a cover of [0, 1] since its union does not contain 0. If, for any δ > 0, we add the interval B = (−δ, δ) to C, then
⋃_{i=1}^∞ (1/i, 2) ∪ B = (−δ, 2) ⊃ [0, 1],
so C′ = C ∪ {B} is an open cover of [0, 1].


Example 5.48. If Ai = (i − 1, i + 1), then {Ai : i ∈ Z} is an open cover of R. On the other hand, if Bi = (i, i + 1), then {Bi : i ∈ Z} is not an open cover of R, since its union doesn’t contain any of the integers. Finally, if Ci = [i, i + 1), then {Ci : i ∈ Z} is a cover of R by disjoint, half-open intervals, but it isn’t an open cover. Thus, to get an open cover, we need the intervals to “overlap”.
Example 5.49. Let {ri : i ∈ N} be an enumeration of the rational numbers ri ∈ [0, 1], and fix ε > 0. Define Ai = (ri − ε, ri + ε). Then {Ai : i ∈ N} is an open cover of [0, 1] since every irrational number x ∈ [0, 1] can be approximated to within ε by some rational number. Similarly, if I = [0, 1] \ Q denotes the set of irrational numbers in [0, 1], then {(x − ε, x + ε) : x ∈ I} is an open cover of [0, 1].
In this case, the cover consists of uncountably many sets.
Next, we define subcovers.
Definition 5.50. Suppose that C = {Ai ⊂ R : i ∈ I} is a cover of A ⊂ R. A subcover S of C is a sub-collection S ⊂ C that covers A, meaning that
S = {Aik ∈ C : k ∈ J},  ⋃_{k∈J} Aik ⊃ A.
A finite subcover is a subcover {Ai1 , Ai2 , . . . , Ain } that consists of finitely many sets.
Example 5.51. Consider the cover C = {Ai : i ∈ N} of (0, 1] in Example 5.47, where Ai = (1/i, 2). Then {A2j : j ∈ N} is a subcover. There is, however, no finite subcover {Ai1 , Ai2 , . . . , Ain }, since if N = max{i1 , i2 , . . . , in } then
⋃_{k=1}^n Aik = (1/N, 2),
which does not contain the points x ∈ (0, 1) with 0 < x ≤ 1/N . On the other hand, the cover C′ = C ∪ {(−δ, δ)} of [0, 1] does have a finite subcover. For example, if N ∈ N is such that 1/N < δ, then
{(1/N, 2), (−δ, δ)}
is a finite subcover of [0, 1] consisting of two sets (whose union is the same as the original cover).
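The two-set subcover in Example 5.51 can be produced mechanically. A sketch (the helper names and the grid check are ours; any N with 1/N < δ works):

```python
import math

def two_set_subcover(delta):
    """Smallest N with 1/N < delta, so {(1/N, 2), (-delta, delta)}
    covers [0, 1]."""
    N = math.floor(1 / delta) + 1
    assert 1 / N < delta
    return N

def covered(x, N, delta):
    # x lies in (1/N, 2) or in (-delta, delta)
    return (1 / N < x < 2) or (-delta < x < delta)

delta = 0.0625          # chosen exactly representable in binary
N = two_set_subcover(delta)
# Every grid point of [0, 1] lies in one of the two sets.
assert all(covered(i / 1000, N, delta) for i in range(1001))
```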
Having introduced this terminology, we give a topological definition of compact sets.
Definition 5.52. A set K ⊂ R is compact if every open cover of K has a finite subcover.
First, we illustrate Definition 5.52 with several examples.
Example 5.53. The collection of open intervals
{Ai : i ∈ N},  Ai = (i − 1, i + 1)
is an open cover of the natural numbers N, since
⋃_{i=1}^∞ Ai = (0, ∞) ⊃ N.
However, no finite sub-collection
{Ai1 , Ai2 , . . . , Ain }
covers N, since if N = max{i1 , i2 , . . . , in }, then
⋃_{k=1}^n Aik ⊂ (0, N + 1),
which does not contain the integers n ≥ N + 1. Thus, N is not compact.
Example 5.54. Consider the open intervals
Ai = (1/2^i − 1/2^{i+1}, 1/2^i + 1/2^{i+1}),
which get smaller as they get closer to 0. Then {Ai : i = 0, 1, 2, . . . } is an open cover of the open interval (0, 1); in fact
⋃_{i=0}^∞ Ai = (0, 3/2) ⊃ (0, 1).
However, no finite sub-collection
{Ai1 , Ai2 , . . . , Ain }
of intervals covers (0, 1), since if N = max{i1 , i2 , . . . , in }, then
⋃_{k=1}^n Aik ⊂ (1/2^N − 1/2^{N+1}, 3/2),
so it does not contain the points in (0, 1) that are sufficiently close to 0. Thus, (0, 1) is not compact. Example 5.51 gives another example of an open cover of (0, 1) with no finite subcover.
Example 5.55. The collection of open intervals {A0 , A1 , A2 , . . . } in Example 5.54 isn’t an open cover of the closed interval [0, 1] since 0 doesn’t belong to their union.
We can get an open cover {A0 , A1 , A2 , . . . , B} of [0, 1] by adding to the Ai an open interval B = (−δ, δ), where δ > 0 is arbitrarily small. In that case, if we choose n ∈ N sufficiently large that
1/2^n − 1/2^{n+1} < δ,
then {A0 , A1 , A2 , . . . , An , B} is a finite subcover of [0, 1] since
⋃_{i=0}^n Ai ∪ B = (−δ, 3/2) ⊃ [0, 1].
Points sufficiently close to 0 belong to B, while points further away belong to Ai for some 0 ≤ i ≤ n. The open cover of [0, 1] in Example 5.51 is similar.
As the previous example suggests, and as follows from the next theorem, every open cover of [0, 1] has a finite subcover, and [0, 1] is compact.
Theorem 5.56 (Heine-Borel). A subset of R is compact if and only if it is closed and bounded.


Proof. The most important direction of the theorem is that a closed, bounded set is compact.
First, we prove that a closed, bounded interval K = [a, b] is compact. Suppose that C = {Ai : i ∈ I} is an open cover of [a, b], and let
B = {x ∈ [a, b] : [a, x] has a finite subcover S ⊂ C} .
We claim that sup B = b. The idea of the proof is that any open cover of [a, x] must cover a larger interval since the open set that contains x extends past x.
Since C covers [a, b], there exists a set Ai ∈ C with a ∈ Ai , so [a, a] = {a} has a subcover consisting of a single set, and a ∈ B. Thus, B is non-empty and bounded from above by b, so c = sup B ≤ b exists. Assume for contradiction that c < b.
Then [a, c] has a finite subcover
{Ai1 , Ai2 , . . . , Ain }, with c ∈ Aik for some 1 ≤ k ≤ n. Since Aik is open and a ≤ c < b, there exists δ > 0 such that [c, c + δ) ⊂ Aik ∩ [a, b]. Then {Ai1 , Ai2 , . . . , Ain } is a finite subcover of [a, x] for c < x < c + δ, contradicting the definition of c, so sup B = b. Moreover, the following argument shows that, in fact, b = max B.
Since C covers [a, b], there is an open set Ai0 ∈ C such that b ∈ Ai0 . Then
(b − δ, b + δ) ⊂ Ai0 for some δ > 0, and since sup B = b there exists c ∈ B such that b − δ < c ≤ b. Let {Ai1 , . . . , Ain } be a finite subcover of [a, c]. Then
{Ai0 , Ai1 , . . . , Ain } is a finite subcover of [a, b], which proves that [a, b] is compact.
Now suppose that K ⊂ R is a closed, bounded set, and let C = {Ai : i ∈ I} be an open cover of K. Since K is bounded, K ⊂ [a, b] for some closed bounded interval [a, b], and, since K is closed, C′ = C ∪ {Kc } is an open cover of [a, b].
From what we have just proved, [a, b] has a finite subcover that is included in C′.
Omitting Kc from this subcover, if necessary, we get a finite subcover of K that is included in the original cover C.
To prove the converse, suppose that K ⊂ R is compact. Let Ai = (−i, i). Then
⋃_{i=1}^∞ Ai = R ⊃ K,
so {Ai : i ∈ N} is an open cover of K, which has a finite subcover {Ai1 , Ai2 , . . . , Ain }. Let N = max{i1 , i2 , . . . , in }. Then
K ⊂ ⋃_{k=1}^n Aik = (−N, N ),
so K is bounded.
To prove that K is closed, we prove that Kc is open. Suppose that x ∈ Kc . For i ∈ N, let
Ai = [x − 1/i, x + 1/i]c = (−∞, x − 1/i) ∪ (x + 1/i, ∞).
Then {Ai : i ∈ N} is an open cover of K, since
⋃_{i=1}^∞ Ai = (−∞, x) ∪ (x, ∞) ⊃ K.
Since K is compact, there is a finite subcover {Ai1 , Ai2 , . . . , Ain }. Let N = max{i1 , i2 , . . . , in }. Then
K ⊂ ⋃_{k=1}^n Aik = (−∞, x − 1/N ) ∪ (x + 1/N, ∞),
which implies that (x − 1/N, x + 1/N ) ⊂ Kc . This proves that Kc is open and K is closed.
The following corollary is an immediate consequence of what we have proved.
Corollary 5.57. A subset of R is compact if and only if it is sequentially compact.
Proof. By Theorem 5.36 and Theorem 5.56, a subset of R is compact or sequentially compact if and only if it is closed and bounded.
Corollary 5.57 generalizes to an arbitrary metric space, where a set is compact if and only if it is sequentially compact, although a different proof is required. By contrast, Theorem 5.36 and Theorem 5.56 do not hold in an arbitrary metric space, where a closed, bounded set need not be compact.

5.4. Connected sets
A connected set is, roughly speaking, a set that cannot be divided into “separated” parts. The formal definition is as follows.
Definition 5.58. A set of real numbers A ⊂ R is disconnected if there are disjoint open sets U, V ⊂ R such that A ∩ U and A ∩ V are nonempty and
A = (A ∩ U ) ∪ (A ∩ V ).
A set is connected if it is not disconnected.
The condition A = (A ∩ U ) ∪ (A ∩ V ) is equivalent to U ∪ V ⊃ A. If A is disconnected as in the definition, then we say that the open sets U , V separate A.
It is easy to give examples of disconnected sets. As the following examples illustrate, any set of real numbers that is “missing” a point is disconnected.
Example 5.59. The set A = {0, 1} consisting of two points is disconnected. For example, let U = (−1/2, 1/2) and V = (1/2, 3/2). Then U , V are open and U ∩ V = ∅. Furthermore, A ∩ U = {0} and A ∩ V = {1} are nonempty, and A = (A ∩ U ) ∪ (A ∩ V ).
Similarly, the union of half-open intervals [0, 1/2) ∪ (1/2, 1] is disconnected.
Example 5.60. The set R \ {0} is disconnected since R \ {0} = (−∞, 0) ∪ (0, ∞).
Example 5.61. The set Q of rational numbers is disconnected. For example, let U = (−∞, √2) and V = (√2, ∞). Then U , V are disjoint open sets, Q ∩ U and Q ∩ V are nonempty, and U ∪ V = R \ {√2} ⊃ Q.
In general, it is harder to prove that a set is connected than disconnected, because one has to show that there is no way to separate it by open sets. However, the ordering properties of R enable us to characterize its connected sets: they are exactly the intervals.
First, we give a precise definition of an interval.


Definition 5.62. A set of real numbers I ⊂ R is an interval if x, y ∈ I and x < y implies that z ∈ I for every x < z < y.
That is, an interval is a set with the property that it contains all the points between any two points in the set.
We claim that, according to this definition, an interval contains all the points that lie between its infimum and supremum. The infimum and supremum may be finite or infinite, and they may or may not belong to the interval. Depending on which of these possibilities occur, we see that an interval is any open, closed, half-open, bounded, or unbounded interval of the form
∅,  (a, b),  [a, b],  [a, b),  (a, b],  (a, ∞),  [a, ∞),  (−∞, b),  (−∞, b],  R,
where a, b ∈ R and a ≤ b. If a = b, then [a, a] = {a} is an interval that consists of a single point, which, like the empty set, satisfies the definition vacuously.
Thus, Definition 5.62 is consistent with the usual definition of an interval.
To prove the previous claim, suppose that I is an interval and let a = inf I, b = sup I where −∞ ≤ a, b ≤ ∞. If a > b, then I = ∅, and if a = b, then I consists of a single point {a}. Otherwise, −∞ ≤ a < b ≤ ∞. In that case, the definition of the infimum and supremum implies that for every a′, b′ ∈ R with a < a′ < b′ < b, there exist x, y ∈ I such that a ≤ x < a′ and b′ < y ≤ b. Since I is an interval, it follows that I ⊃ [x, y] ⊃ [a′, b′], and since a′ > a, b′ < b are arbitrary, it follows that
I ⊃ (a, b). Moreover, since a = inf I and b = sup I, the interval I cannot contain any points x ∈ R such that x < a or x > b.
The slightly tricky part of the following theorem is the proof that every interval is connected.
Theorem 5.63. A set of real numbers is connected if and only if it is an interval.
Proof. First, suppose that A ⊂ R is not an interval. Then there are a, b ∈ A and c ∉ A such that a < c < b. If U = (−∞, c) and V = (c, ∞), then a ∈ A ∩ U , b ∈ A ∩ V , and A = (A ∩ U ) ∪ (A ∩ V ), so A is disconnected. It follows that every connected set is an interval.
To prove the converse, suppose that I ⊂ R is not connected. We will show that I is not an interval. Let U , V be open sets that separate I. Choose a ∈ I ∩ U and b ∈ I ∩ V , where we can assume without loss of generality that a < b. Let
c = sup (U ∩ [a, b]).
We will prove that a < c < b and c ∉ I, meaning that I is not an interval. If a ≤ x < b and x ∈ U , then U ⊃ [x, x + δ) for some δ > 0, so x ≠ sup (U ∩ [a, b]). Thus, c ≠ a, and if a < c < b, then c ∉ U . If a < y ≤ b and y ∈ V , then V ⊃ (y − δ, y] for some δ > 0, and therefore (y − δ, y] is disjoint from U , which implies that y ≠ sup (U ∩ [a, b]). It follows that c ≠ b and c ∉ U ∪ V , so a < c < b and c ∉ I, which completes the proof.


Figure 1. An illustration of the removal of middle-thirds from an interval in the construction of the Cantor set. The figure shows the interval [0, 1] and the first four sets F1 , F2 , F3 , F4 , going from top to bottom.

5.5. * The Cantor set
One of the most interesting examples of a compact set is the Cantor set, which is obtained by “removing middle-thirds” from closed intervals in [0, 1], as illustrated in Figure 1.
We define a nested sequence (Fn ) of sets Fn ⊂ [0, 1] as follows. First, we remove the middle-third from [0, 1] to get F1 = [0, 1] \ (1/3, 2/3), or
F1 = I0 ∪ I1 ,  I0 = [0, 1/3],  I1 = [2/3, 1].
Next, we remove middle-thirds from I0 and I1 , which splits I0 \ (1/9, 2/9) into
I00 ∪ I01 and I1 \ (7/9, 8/9) into I10 ∪ I11 , to get
F2 = I00 ∪ I01 ∪ I10 ∪ I11 ,
I00 = [0, 1/9],  I01 = [2/9, 1/3],  I10 = [2/3, 7/9],  I11 = [8/9, 1].
Then we remove middle-thirds from I00 , I01 , I10 , and I11 to get
F3 = I000 ∪ I001 ∪ I010 ∪ I011 ∪ I100 ∪ I101 ∪ I110 ∪ I111 ,
I000 = [0, 1/27],  I001 = [2/27, 1/9],  I010 = [2/9, 7/27],  I011 = [8/27, 1/3],
I100 = [2/3, 19/27],  I101 = [20/27, 7/9],  I110 = [8/9, 25/27],  I111 = [26/27, 1].


Continuing in this way, we get at the nth stage a set of the form
Fn = ⋃_{s∈Σn} Is ,
where Σn = {(s1 , s2 , . . . , sn ) : sk = 0, 1} is the set of binary n-tuples. Furthermore, each Is = [as , bs ] is a closed interval, and if s = (s1 , s2 , . . . , sn ), then
as = ∑_{k=1}^n 2sk /3^k ,  bs = as + 1/3^n .
In other words, the left endpoints as are the points in [0, 1] that have a finite base three expansion consisting entirely of 0’s and 2’s.
We can verify this formula for the endpoints as , bs of the intervals in Fn by induction. It holds when n = 1. Assume that it holds for some n ∈ N. If we remove the middle-third of length 1/3^{n+1} from the interval [as , bs ] of length 1/3^n with s ∈ Σn , then we get the original left endpoint, which may be written as
as′ = ∑_{k=1}^{n+1} 2s′k /3^k ,
where s′ = (s′1 , . . . , s′n+1 ) ∈ Σn+1 is given by s′k = sk for k = 1, . . . , n and s′n+1 = 0. We also get a new left endpoint as″ = as + 2/3^{n+1} , which may be written as
as″ = ∑_{k=1}^{n+1} 2s″k /3^k ,
where s″k = sk for k = 1, . . . , n and s″n+1 = 1. Moreover, bs′ = as′ + 1/3^{n+1} and bs″ = as″ + 1/3^{n+1} , which proves that the formula for the endpoints holds for n + 1.
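The endpoint formula can be verified computationally with exact rational arithmetic. A sketch generating the intervals of Fn from binary n-tuples (the function name is ours):

```python
from fractions import Fraction
from itertools import product

def intervals(n):
    """Closed intervals [a_s, b_s] of F_n, indexed by binary n-tuples s,
    with a_s = sum of 2*s_k / 3^k and b_s = a_s + 1/3^n."""
    out = {}
    for s in product([0, 1], repeat=n):
        a = sum(Fraction(2 * sk, 3 ** k) for k, sk in enumerate(s, start=1))
        out[s] = (a, a + Fraction(1, 3 ** n))
    return out

F2 = intervals(2)
assert F2[(0, 1)] == (Fraction(2, 9), Fraction(1, 3))   # I_01 = [2/9, 1/3]
assert F2[(1, 0)] == (Fraction(2, 3), Fraction(7, 9))   # I_10 = [2/3, 7/9]

# F_{n+1} is obtained from F_n by removing middle thirds, so the total
# length of F_n is (2/3)^n.
total = sum(b - a for a, b in intervals(5).values())
assert total == Fraction(2, 3) ** 5
```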
Definition 5.64. The Cantor set C is the intersection
C = ⋂_{n=1}^∞ Fn
of the nested sequence of sets (Fn ) defined above.
The compactness of C follows immediately from its definition.
Theorem 5.65. The Cantor set is compact.
Proof. The Cantor set C is bounded, since it is a subset of [0, 1]. Each of the sets Fn is closed, because it is a finite union of closed intervals, so, from Proposition 5.20, their intersection is closed. It follows that C is closed and bounded, and Theorem 5.36 implies that it is compact.
The Cantor set C is clearly nonempty since the endpoints as , bs of Is are contained in Fn for every finite binary sequence s and every n ∈ N. These endpoints form a countably infinite set. What may be initially surprising is that there are uncountably many other points in C that are not endpoints. For example, 1/4 has the infinite base three expansion 1/4 = 0.020202 . . . , so it is not one of the endpoints, but, as we will show, it belongs to C because it has a base three expansion consisting entirely of 0’s and 2’s.
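The base three expansion 1/4 = 0.020202 . . . can be checked with exact arithmetic. A sketch (a digit-extraction loop, not a full membership test, since points with two ternary expansions would need both checked):

```python
from fractions import Fraction

def ternary_digits(x, n):
    """First n base-3 digits of x in [0, 1), computed exactly."""
    digits = []
    for _ in range(n):
        x *= 3
        d = int(x)        # integer part is the next digit
        digits.append(d)
        x -= d
    return digits

# 1/4 = 0.020202... in base 3: all digits are 0 or 2, so 1/4 lies in the
# Cantor set even though it is not an endpoint of any interval I_s.
assert ternary_digits(Fraction(1, 4), 8) == [0, 2, 0, 2, 0, 2, 0, 2]

# 1/2 = 0.1111... in base 3 contains the digit 1, and it is removed at
# the first stage of the construction (it lies in (1/3, 2/3)).
assert ternary_digits(Fraction(1, 2), 4) == [1, 1, 1, 1]
```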


Let Σ be the set of binary sequences in Definition 1.29. The idea of the next theorem is that each binary sequence picks out a unique point of the Cantor set by telling us whether to choose the left or the right interval at each stage of the “middle-thirds” construction. For example, 1/4 corresponds to the sequence
(0, 1, 0, 1, 0, 1, . . . ), and we get it by alternately choosing left and right intervals.
Theorem 5.66. The Cantor set has the same cardinality as Σ.
Proof. We use the same notation as above. Let s = (s1 , s2 , . . . , sk , . . . ) ∈ Σ, and define sn = (s1 , s2 , . . . , sn ) ∈ Σn . Then (Isn ) is a nested sequence of intervals such that diam Isn = 1/3n → 0 as n → ∞. Since each Isn is a compact interval,
Theorem 5.43 implies that there is a unique point
x ∈ ⋂_{n=1}^∞ Isn ⊂ C.
Thus, s → x defines a function f : Σ → C. Furthermore, this function is one-to-one: if two sequences differ in the nth place, say, then the corresponding points in C belong to different intervals Isn at the nth stage of the construction, and therefore the points are different since the intervals are disjoint.
Conversely, if x ∈ C, then x ∈ Fn for every n ∈ N and there is a unique sn ∈ Σn such that x ∈ Isn . The intervals (Isn ) are nested, so there is a unique sequence s = (s1 , s2 , . . . , sk , . . . ) ∈ Σ, such that sn = (s1 , s2 , . . . , sn ). It follows that f : Σ → C is onto, which proves the result.
The argument also shows that x ∈ C if and only if it is a limit of left endpoints as , meaning that
x = ∑_{k=1}^∞ 2sk /3^k ,  sk = 0, 1.
In other words, x ∈ C if and only if it has a base 3 expansion consisting entirely of 0’s and 2’s. Note that this condition does not exclude 1, which corresponds to the sequence (1, 1, 1, 1, . . . ) or “always pick the right interval,” and
1 = 0.2222 · · · = ∑_{k=1}^∞ 2/3^k .
We may use Theorem 5.66, together with the Schröder-Bernstein theorem, to prove that Σ, P(N) and R have the same uncountable cardinality of the continuum. It follows, in particular, that the Cantor set has the same cardinality as R, even though it appears, at first sight, to be a very sparse subset.
Theorem 5.67. The set R of real numbers has the same cardinality as P(N).
Proof. The inclusion map f : C → R, where f (x) = x, is one-to-one, so C ≼ R. From Theorem 5.66 and Corollary 1.48, we have C ≈ Σ ≈ P(N), so P(N) ≼ R.
Conversely, the map from real numbers to their Dedekind cuts, given by
g : R → P(Q),  g : x → {r ∈ Q : r < x},
is one-to-one, so R ≼ P(Q). Since Q is countably infinite, P(N) ≈ P(Q), so R ≼ P(N). The conclusion then follows from Theorem 1.40.


Another proof of this theorem, which doesn’t require the Schröder-Bernstein theorem, can be given by associating binary sequences in Σ with binary expansions of real numbers in [0, 1]:
h : (s1 , s2 , . . . , sk , . . . ) → ∑_{k=1}^∞ sk /2^k .
Some real numbers, however, have two distinct binary expansions; e.g.,
1/2 = 0.10000 · · · = 0.01111 · · · .
There are only countably many such numbers, so they do not affect the cardinality of [0, 1], but they complicate the explicit construction of a one-to-one, onto map f : Σ → R by this approach. An alternative method is to represent real numbers by continued fractions instead of binary expansions, but we won’t describe these proofs in more detail here.

Chapter 6

Limits of Functions

In this chapter, we define limits of functions and describe their properties.

6.1. Limits
We begin with the ε-δ definition of the limit of a function.
Definition 6.1. Let f : A → R, where A ⊂ R, and suppose that c ∈ R is an accumulation point of A. Then
lim_{x→c} f (x) = L
if for every ε > 0 there exists a δ > 0 such that
0 < |x − c| < δ and x ∈ A implies that |f (x) − L| < ε.
We also denote limits by the ‘arrow’ notation f (x) → L as x → c, and often leave it to be implicitly understood that x ∈ A is restricted to the domain of f .
Note that it follows directly from the definition that
lim_{x→c} f (x) = L if and only if lim_{x→c} |f (x) − L| = 0.

In defining a limit as x → c, we do not consider what happens when x = c, and a function needn’t be defined at c for its limit to exist. This is the case, for example, when we define the derivative of a function as a limit of its difference quotients. Moreover, even if a function is defined at c and its limit as x → c exists, the value of the function need not equal the limit. In fact, the condition that limx→c f (x) = f (c) defines the continuity of f at c. We study continuous functions in Chapter 7.
Example 6.2. Let A = [0, ∞) \ {9} and define f : A → R by
f (x) = (x − 9)/(√x − 3).


We claim that
lim_{x→9} f (x) = 6.
To prove this, let ε > 0 be given. If x ∈ A, then √x − 3 ≠ 0, and dividing this factor into the numerator we get f (x) = √x + 3. It follows that
|f (x) − 6| = |√x − 3| = |x − 9|/(√x + 3) ≤ (1/3)|x − 9|.
Thus, if δ = 3ε, then x ∈ A and |x − 9| < δ implies that |f (x) − 6| < ε.
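The estimate |f (x) − 6| ≤ |x − 9|/3 and the choice δ = 3ε can be spot-checked numerically (the sample points below are arbitrary):

```python
import math

def f(x):
    # f(x) = (x - 9)/(sqrt(x) - 3), defined on A = [0, oo) \ {9}
    return (x - 9) / (math.sqrt(x) - 3)

eps = 1e-3
delta = 3 * eps  # the delta from the proof

# Points with 0 < |x - 9| < delta should satisfy |f(x) - 6| < eps.
for x in [9 - 0.9 * delta, 9 - 0.1 * delta, 9 + 0.1 * delta, 9 + 0.9 * delta]:
    assert 0 < abs(x - 9) < delta
    assert abs(f(x) - 6) < eps
```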
Like the limits of sequences, limits of functions are unique.
Proposition 6.3. The limit of a function is unique if it exists.
Proof. Suppose that f : A → R and c ∈ R is an accumulation point of A ⊂ R. Assume that
lim_{x→c} f (x) = L1 ,  lim_{x→c} f (x) = L2 ,
where L1 , L2 ∈ R. For every ε > 0 there exist δ1 , δ2 > 0 such that
0 < |x − c| < δ1 and x ∈ A implies that |f (x) − L1 | < ε/2,
0 < |x − c| < δ2 and x ∈ A implies that |f (x) − L2 | < ε/2.
Let δ = min(δ1 , δ2 ) > 0. Then, since c is an accumulation point of A, there exists x ∈ A such that 0 < |x − c| < δ. It follows that
|L1 − L2 | ≤ |L1 − f (x)| + |f (x) − L2 | < ε.
Since this holds for arbitrary ε > 0, we must have L1 = L2 .

Note that in this proof we used the requirement in the definition of a limit that c is an accumulation point of A. The limit definition would be vacuous if it were applied to a non-accumulation point, and in that case every L ∈ R would be a limit. We can rephrase the ε-δ definition of limits in terms of neighborhoods. Recall from Definition 5.6 that a set V ⊂ R is a neighborhood of c ∈ R if V ⊃ (c − δ, c + δ) for some δ > 0, and (c − δ, c + δ) is called a δ-neighborhood of c.
Definition 6.4. A set U ⊂ R is a punctured (or deleted) neighborhood of c ∈ R if U ⊃ (c − δ, c) ∪ (c, c + δ) for some δ > 0. The set (c − δ, c) ∪ (c, c + δ) is called a punctured (or deleted) δ-neighborhood of c.
That is, a punctured neighborhood of c is a neighborhood of c with the point c itself removed.
Definition 6.5. Let f : A → R, where A ⊂ R, and suppose that c ∈ R is an accumulation point of A. Then
limx→c f(x) = L
if and only if for every neighborhood V of L, there is a punctured neighborhood U of c such that x ∈ A ∩ U implies that f(x) ∈ V.

This is essentially a rewording of the ε-δ definition. If Definition 6.1 holds and V is a neighborhood of L, then V contains an ε-neighborhood of L, so there is a punctured δ-neighborhood U of c such that f maps U ∩ A into V, which verifies Definition 6.5. Conversely, if Definition 6.5 holds and ε > 0, then V = (L − ε, L + ε) is a neighborhood of L, so there is a punctured neighborhood U of c such that f maps U ∩ A into V, and U contains a punctured δ-neighborhood of c, which verifies Definition 6.1.
The next theorem gives an equivalent sequential characterization of the limit.
Theorem 6.6. Let f : A → R, where A ⊂ R, and suppose that c ∈ R is an accumulation point of A. Then
limx→c f(x) = L
if and only if
limn→∞ f(xn) = L
for every sequence (xn) in A with xn ≠ c for all n ∈ N such that limn→∞ xn = c.

Proof. First assume that the limit exists and is equal to L. Suppose that (xn) is any sequence in A with xn ≠ c that converges to c, and let ε > 0 be given. From Definition 6.1, there exists δ > 0 such that |f(x) − L| < ε whenever 0 < |x − c| < δ, and since xn → c there exists N ∈ N such that 0 < |xn − c| < δ for all n > N. It follows that |f(xn) − L| < ε whenever n > N, so f(xn) → L as n → ∞.
To prove the converse, assume that the limit does not exist or is not equal to L. Then there is an ε0 > 0 such that for every δ > 0 there is a point x ∈ A with 0 < |x − c| < δ but |f(x) − L| ≥ ε0. Therefore, for every n ∈ N there is an xn ∈ A such that
0 < |xn − c| < 1/n, |f(xn) − L| ≥ ε0.
It follows that xn ≠ c and xn → c, but f(xn) does not converge to L, so the sequential condition does not hold. This proves the result.
A non-existence proof for a limit directly from Definition 6.1 is often awkward.
(One has to show that for every L ∈ R there exists ε0 > 0 such that for every δ > 0 there exists x ∈ A with 0 < |x − c| < δ and |f(x) − L| ≥ ε0.) The previous theorem gives a convenient way to show that a limit of a function does not exist.
Corollary 6.7. Suppose that f : A → R and c ∈ R is an accumulation point of A.
Then limx→c f (x) does not exist if either of the following conditions holds:
(1) There are sequences (xn), (yn) in A with xn, yn ≠ c such that
limn→∞ xn = limn→∞ yn = c, but limn→∞ f(xn) ≠ limn→∞ f(yn).
(2) There is a sequence (xn) in A with xn ≠ c such that limn→∞ xn = c but the sequence (f(xn)) diverges.

Figure 1. A plot of the function y = sin(1/x), with the hyperbola y = 1/x shown in red, and a detail near the origin.

Example 6.8. Define the sign function sgn : R → R by
sgn x = 1 if x > 0, 0 if x = 0, −1 if x < 0.
Then the limit
limx→0 sgn x
doesn't exist. To prove this, note that (1/n) is a non-zero sequence such that 1/n → 0 and sgn(1/n) → 1 as n → ∞, while (−1/n) is a non-zero sequence such that −1/n → 0 and sgn(−1/n) → −1 as n → ∞. Since the sequences of sgn-values have different limits, Corollary 6.7 implies that the limit does not exist.
Example 6.9. The limit
limx→0 1/x,
corresponding to the function f : R \ {0} → R given by f(x) = 1/x, doesn't exist. For example, if (xn) is the non-zero sequence given by xn = 1/n, then 1/n → 0 but the sequence of values f(xn) = n diverges to ∞.

Example 6.10. The limit
limx→0 sin(1/x),
corresponding to the function f : R \ {0} → R given by f(x) = sin(1/x), doesn't exist. (See Figure 1.) For example, the non-zero sequences (xn), (yn) defined by
xn = 1/(2πn), yn = 1/(2πn + π/2)
both converge to zero as n → ∞, but the limits
limn→∞ f(xn) = 0, limn→∞ f(yn) = 1
are different.
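Corollary 6.7 can be illustrated numerically for this example; the sketch below is ours (the index n = 1000 is an arbitrary choice). The two sequence points are both close to 0, while the corresponding values of sin(1/x) sit near 0 and 1.

```python
import math

def f(x):
    return math.sin(1 / x)

n = 1000  # arbitrary large index
xn = 1 / (2 * math.pi * n)                # f(xn) = sin(2*pi*n) = 0
yn = 1 / (2 * math.pi * n + math.pi / 2)  # f(yn) = sin(2*pi*n + pi/2) = 1

assert xn < 1e-3 and yn < 1e-3  # both sequences are near 0
assert abs(f(xn)) < 1e-6        # values near 0 ...
assert abs(f(yn) - 1) < 1e-6    # ... and near 1, so there is no common limit
```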


Like sequences, functions must satisfy a boundedness condition if their limit is to exist. Before stating this condition, we define the supremum and infimum of a function, which are the supremum or infimum of its range.
Definition 6.11. If f : A → R is a real-valued function, then
supA f = sup {f(x) : x ∈ A}, infA f = inf {f(x) : x ∈ A}.
A function is bounded if its range is bounded.
Definition 6.12. If f : A → R, then f is bounded from above if supA f is finite, bounded from below if inf A f is finite, and bounded if both are finite. A function that is not bounded is said to be unbounded.
Example 6.13. If f : [0, 2] → R is defined by f(x) = x², then
sup[0,2] f = 4, inf[0,2] f = 0,
so f is bounded.
Example 6.14. If f : (0, 1] → R is defined by f(x) = 1/x, then
sup(0,1] f = ∞, inf(0,1] f = 1,
so f is bounded from below, not bounded from above, and unbounded. Note that if we extend f to a function g : [0, 1] → R by defining, for example,
g(x) = 1/x if 0 < x ≤ 1, 0 if x = 0,
then g is still unbounded on [0, 1].
Equivalently, a function f : A → R is bounded if supA |f | is finite, meaning that there exists M ≥ 0 such that
|f (x)| ≤ M for every x ∈ A.
If B ⊂ A, then we say that f is bounded from above on B if supB f is finite, with similar terminology for bounded from below on B, and bounded on B.
Example 6.15. The function f : (0, 1] → R defined by f (x) = 1/x is unbounded, but it is bounded on every interval [δ, 1] with 0 < δ < 1. The function g : R → R defined by g(x) = x2 is unbounded, but it is bounded on every finite interval [a, b].
We also introduce a notion of being bounded near a point.
Definition 6.16. Suppose that f : A → R and c is an accumulation point of A.
Then f is locally bounded at c if there is a neighborhood U of c such that f is bounded on A ∩ U .
Example 6.17. The function f : (0, 1] → R defined by f (x) = 1/x is locally bounded at every 0 < c ≤ 1, but it is not locally bounded at 0.
Proposition 6.18. Suppose that f : A → R and c is an accumulation point of A.
If limx→c f (x) exists, then f is locally bounded at c.

Proof. Let limx→c f(x) = L. Taking ε = 1 in the definition of the limit, we get that there exists a δ > 0 such that
0 < |x − c| < δ and x ∈ A implies that |f(x) − L| < 1.
Let U = (c − δ, c + δ). If x ∈ A ∩ U and x ≠ c, then
|f(x)| ≤ |f(x) − L| + |L| < 1 + |L|,
so f is bounded on A ∩ U. (If c ∈ A, then |f| ≤ max{1 + |L|, |f(c)|} on A ∩ U.)
As for sequences, boundedness is a necessary but not sufficient condition for the existence of a limit.
Example 6.19. The limit
limx→0 1/x
considered in Example 6.9 doesn't exist because the function f : R \ {0} → R given by f(x) = 1/x is not locally bounded at 0.

Example 6.20. The function f : R \ {0} → R defined by
f(x) = sin(1/x)
is bounded, but limx→0 f(x) doesn't exist.

6.2. Left, right, and infinite limits
We can define other kinds of limits in an obvious way. We list some of them here and give examples, whose proofs are left as an exercise. All these definitions can be combined in various ways and have obvious equivalent sequential characterizations.
Definition 6.21 (Right and left limits). Let f : A → R, where A ⊂ R. If c ∈ R is an accumulation point of {x ∈ A : x > c}, then f has the right limit
limx→c+ f(x) = L
if for every ε > 0 there exists a δ > 0 such that
c < x < c + δ and x ∈ A implies that |f(x) − L| < ε.
If c ∈ R is an accumulation point of {x ∈ A : x < c}, then f has the left limit
limx→c− f(x) = L
if for every ε > 0 there exists a δ > 0 such that
c − δ < x < c and x ∈ A implies that |f(x) − L| < ε.
Equivalently, the right limit of f at c is the limit of the restriction f|A+ of f to the set A+ = {x ∈ A : x > c},
limx→c+ f(x) = limx→c f|A+(x),
and analogously for the left limit.


Example 6.22. For the sign function in Example 6.8, we have
limx→0− sgn x = −1, limx→0+ sgn x = 1,
although the corresponding two-sided limit does not exist.
The existence and equality of the left and right limits implies the existence of the limit.
Proposition 6.23. Suppose that f : A → R, where A ⊂ R, and c ∈ R is an accumulation point of both {x ∈ A : x > c} and {x ∈ A : x < c}. Then
limx→c f(x) = L
if and only if
limx→c− f(x) = limx→c+ f(x) = L.
Proof. It follows immediately from the definitions that the existence of the limit implies the existence of the left and right limits with the same value. Conversely, if both left and right limits exist and are equal to L, then given ε > 0, there exist δ1 > 0 and δ2 > 0 such that
c − δ1 < x < c and x ∈ A implies that |f(x) − L| < ε,
c < x < c + δ2 and x ∈ A implies that |f(x) − L| < ε.
Choosing δ = min(δ1, δ2) > 0, we get that
0 < |x − c| < δ and x ∈ A implies that |f(x) − L| < ε,
which shows that the limit exists.
Next we introduce some convenient definitions for various kinds of limits involving infinity. We emphasize that ∞ and −∞ are not real numbers (what is sin ∞, for example?) and all these definitions have precise translations into statements that involve only real numbers.
Definition 6.24 (Limits as x → ±∞). Let f : A → R, where A ⊂ R. If A is not bounded from above, then
limx→∞ f(x) = L
if for every ε > 0 there exists an M ∈ R such that
x > M and x ∈ A implies that |f(x) − L| < ε.
If A is not bounded from below, then
limx→−∞ f(x) = L
if for every ε > 0 there exists an m ∈ R such that
x < m and x ∈ A implies that |f(x) − L| < ε.


Sometimes we write +∞ instead of ∞ to indicate that it denotes arbitrarily large, positive values, while −∞ denotes arbitrarily large, negative values.
It follows from the definitions that
limx→∞ f(x) = limt→0+ f(1/t), limx→−∞ f(x) = limt→0− f(1/t),
and it is often useful to convert one of these limits into the other.
Example 6.25. We have
limx→∞ x/√(1 + x²) = 1, limx→−∞ x/√(1 + x²) = −1.
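These limits can be probed numerically. In the sketch below the tolerances are heuristic choices of ours, motivated by the estimate |f(x) − 1| ≈ 1/(2x²) for large x > 0.

```python
import math

def f(x):
    return x / math.sqrt(1 + x * x)

# As x -> oo, f(x) -> 1; equivalently f(1/t) -> 1 as t -> 0+.
for t in [1e-3, 1e-6]:
    assert abs(f(1 / t) - 1) < t * t  # |f(x) - 1| ~ 1/(2 x^2) with x = 1/t

# As x -> -oo, f(x) -> -1.
assert abs(f(-1e6) + 1) < 1e-9
```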

Definition 6.26 (Divergence to ±∞). Let f : A → R, where A ⊂ R, and suppose that c ∈ R is an accumulation point of A. Then
limx→c f(x) = ∞
if for every M ∈ R there exists a δ > 0 such that
0 < |x − c| < δ and x ∈ A implies that f(x) > M,
and
limx→c f(x) = −∞
if for every m ∈ R there exists a δ > 0 such that
0 < |x − c| < δ and x ∈ A implies that f(x) < m.
The notation limx→c f (x) = ±∞ is simply shorthand for the property stated in this definition; it does not mean that the limit exists, and we say that f diverges to ±∞.
Example 6.27. We have
limx→0 1/x² = ∞, limx→∞ 1/x² = 0.
Example 6.28. We have
limx→0+ 1/x = ∞, limx→0− 1/x = −∞.
How would you define these statements precisely? Note that
limx→0 1/x ≠ ±∞,
since 1/x takes arbitrarily large positive (if x > 0) and negative (if x < 0) values in every two-sided neighborhood of 0.
Example 6.29. None of the limits
limx→0+ (1/x) sin(1/x), limx→0− (1/x) sin(1/x), limx→0 (1/x) sin(1/x)
is ∞ or −∞, since (1/x) sin(1/x) oscillates between arbitrarily large positive and negative values in every one-sided or two-sided neighborhood of 0.


Example 6.30. We have
limx→∞ (1/x − x³) = −∞, limx→−∞ (1/x − x³) = ∞.
How would you define these statements precisely and prove them?

6.3. Properties of limits
The properties of limits of functions follow from the corresponding properties of sequences and the sequential characterization of the limit in Theorem 6.6. We can also prove them directly from the ε-δ definition of the limit.
6.3.1. Order properties. As for limits of sequences, limits of functions preserve (non-strict) inequalities.
Theorem 6.31. Suppose that f, g : A → R and c is an accumulation point of A. If
f(x) ≤ g(x) for all x ∈ A,
and limx→c f(x), limx→c g(x) exist, then
limx→c f(x) ≤ limx→c g(x).

Proof. Let
limx→c f(x) = L, limx→c g(x) = M.
Suppose for contradiction that L > M, and let
ε = (L − M)/2 > 0.
From the definition of the limit, there exist δ1, δ2 > 0 such that
|f(x) − L| < ε if x ∈ A and 0 < |x − c| < δ1,
|g(x) − M| < ε if x ∈ A and 0 < |x − c| < δ2.
Let δ = min(δ1, δ2). Since c is an accumulation point of A, there exists x ∈ A such that 0 < |x − c| < δ, and it follows that
f(x) − g(x) = [f(x) − L] + (L − M) + [M − g(x)] > L − M − 2ε = 0,
which contradicts the assumption that f(x) ≤ g(x).
Finally, we state a useful “sandwich” or “squeeze” criterion for the existence of a limit.
Theorem 6.32. Suppose that f, g, h : A → R and c is an accumulation point of A. If
f(x) ≤ g(x) ≤ h(x) for all x ∈ A
and
limx→c f(x) = limx→c h(x) = L,
then the limit of g(x) as x → c exists and
limx→c g(x) = L.

We leave the proof as an exercise. We often use this result, without comment, in the following way: if
0 ≤ f(x) ≤ g(x) or |f(x)| ≤ g(x)
and g(x) → 0 as x → c, then f(x) → 0 as x → c.
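As a numerical illustration of this use of the squeeze criterion (the sample function x sin(1/x), which reappears in Example 7.12, and the sample points are our choice):

```python
import math

def f(x):
    return x * math.sin(1 / x)

def g(x):
    return abs(x)  # bounding function, g(x) -> 0 as x -> 0

# |f(x)| <= g(x) at every sample point while g -> 0 along the samples,
# so f -> 0 as x -> 0 by the squeeze criterion (Theorem 6.32).
for k in range(1, 8):
    x = 10.0 ** (-k)
    assert abs(f(x)) <= g(x)
    assert abs(f(-x)) <= g(-x)
```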
It is essential for the bounding functions f, h in Theorem 6.32 to have the same limit.
Example 6.33. We have
−1 ≤ sin(1/x) ≤ 1 for all x ≠ 0
and
limx→0 (−1) = −1, limx→0 1 = 1,
but
limx→0 sin(1/x)
does not exist.

6.3.2. Algebraic properties. Limits of functions respect algebraic operations.
Theorem 6.34. Suppose that f, g : A → R, c is an accumulation point of A, and the limits
limx→c f(x) = L, limx→c g(x) = M
exist. Then
limx→c kf(x) = kL for every k ∈ R,
limx→c [f(x) + g(x)] = L + M,
limx→c [f(x)g(x)] = LM,
limx→c f(x)/g(x) = L/M if M ≠ 0.

Proof. We prove the results for sums and products from the definition of the limit, and leave the remaining proofs as an exercise. All of the results also follow from the corresponding results for sequences.
First, we consider the limit of f + g. Given ε > 0, choose δ1, δ2 > 0 such that
0 < |x − c| < δ1 and x ∈ A implies that |f(x) − L| < ε/2,
0 < |x − c| < δ2 and x ∈ A implies that |g(x) − M| < ε/2,
and let δ = min(δ1, δ2) > 0. Then 0 < |x − c| < δ and x ∈ A implies that
|f(x) + g(x) − (L + M)| ≤ |f(x) − L| + |g(x) − M| < ε,
which proves that lim(f + g) = lim f + lim g.
To prove the result for the limit of the product, first note that from the local boundedness of functions with a limit (Proposition 6.18) there exist δ0 > 0 and K > 0 such that |g(x)| ≤ K for all x ∈ A with 0 < |x − c| < δ0. Choose δ1, δ2 > 0 such that
0 < |x − c| < δ1 and x ∈ A implies that |f(x) − L| < ε/(2K),
0 < |x − c| < δ2 and x ∈ A implies that |g(x) − M| < ε/(2|L| + 1).
Let δ = min(δ0, δ1, δ2) > 0. Then for 0 < |x − c| < δ and x ∈ A,
|f(x)g(x) − LM| = |(f(x) − L)g(x) + L(g(x) − M)|
≤ |f(x) − L| |g(x)| + |L| |g(x) − M|
< (ε/(2K)) · K + |L| · ε/(2|L| + 1)
< ε,
which proves that lim(fg) = lim f lim g.
Chapter 7

Continuous Functions

In this chapter, we define continuous functions and study their properties.

7.1. Continuity
Continuous functions are functions that take nearby values at nearby points.
Definition 7.1. Let f : A → R, where A ⊂ R, and suppose that c ∈ A. Then f is continuous at c if for every ε > 0 there exists a δ > 0 such that
|x − c| < δ and x ∈ A implies that |f(x) − f(c)| < ε.
A function f : A → R is continuous if it is continuous at every point of A, and it is continuous on B ⊂ A if it is continuous at every point in B.
The definition of continuity at a point may be stated in terms of neighborhoods as follows.
Definition 7.2. A function f : A → R, where A ⊂ R, is continuous at c ∈ A if for every neighborhood V of f (c) there is a neighborhood U of c such that x ∈ A ∩ U implies that f (x) ∈ V .
The ε-δ definition corresponds to the case when V is an ε-neighborhood of f(c) and U is a δ-neighborhood of c.
Note that c must belong to the domain A of f in order to define the continuity of f at c. If c is an isolated point of A, then the continuity condition holds automatically since, for sufficiently small δ > 0, the only point x ∈ A with |x − c| < δ is x = c, and then 0 = |f(x) − f(c)| < ε. Thus, a function is continuous at every isolated point of its domain, and isolated points are not of much interest.
If c ∈ A is an accumulation point of A, then the continuity of f at c is equivalent to the condition that
limx→c f(x) = f(c),
meaning that the limit of f as x → c exists and is equal to the value of f at c.
Example 7.3. If f : (a, b) → R is defined on an open interval, then f is continuous on (a, b) if and only if
limx→c f(x) = f(c) for every a < c < b,
since every point of (a, b) is an accumulation point.
Example 7.4. If f : [a, b] → R is defined on a closed, bounded interval, then f is continuous on [a, b] if and only if
limx→c f(x) = f(c) for every a < c < b,
limx→a+ f(x) = f(a), limx→b− f(x) = f(b).

Example 7.5. Suppose that
A = {0, 1, 1/2, 1/3, . . . , 1/n, . . .}
and f : A → R is defined by
f(0) = y0, f(1/n) = yn
for some values y0, yn ∈ R. Then 1/n is an isolated point of A for every n ∈ N, so f is continuous at 1/n for every choice of yn. The remaining point 0 ∈ A is an accumulation point of A, and the condition for f to be continuous at 0 is that
limn→∞ yn = y0.

As for limits, we can give an equivalent sequential definition of continuity, which follows immediately from Theorem 6.6.
Theorem 7.6. If f : A → R and c ∈ A is an accumulation point of A, then f is continuous at c if and only if
limn→∞ f(xn) = f(c)
for every sequence (xn) in A such that xn → c as n → ∞.
In particular, f is discontinuous at c ∈ A if there is a sequence (xn) in the domain A of f such that xn → c but f(xn) does not converge to f(c).
Let’s consider some examples of continuous and discontinuous functions to illustrate the definition.

Example 7.7. The function f : [0, ∞) → R defined by f(x) = √x is continuous on [0, ∞). To prove that f is continuous at c > 0, we note that for 0 ≤ x < ∞,
|f(x) − f(c)| = |√x − √c| = |x − c|/(√x + √c) ≤ (1/√c)|x − c|,
so given ε > 0, we can choose δ = ε√c > 0 in the definition of continuity. To prove that f is continuous at 0, we note that if 0 ≤ x < δ where δ = ε² > 0, then
|f(x) − f(0)| = √x < ε.
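The choice δ = ε√c can be spot-checked on a grid; in this sketch c = 4 and ε = 0.01 are arbitrary illustrative values of ours, and a finite grid only samples the implication rather than proving it.

```python
import math

c, eps = 4.0, 0.01
delta = eps * math.sqrt(c)  # delta = eps*sqrt(c), as in Example 7.7

# Every grid point x >= 0 with |x - c| < delta satisfies |sqrt(x) - sqrt(c)| < eps.
for i in range(1, 1000):
    x = (c - delta) + 2 * delta * i / 1000
    assert abs(x - c) < delta
    assert abs(math.sqrt(x) - math.sqrt(c)) < eps
```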


Example 7.8. The function sin : R → R is continuous on R. To prove this, we use the trigonometric identity for the difference of sines and the inequality |sin x| ≤ |x|:
|sin x − sin c| = |2 cos((x + c)/2) sin((x − c)/2)| ≤ 2 |sin((x − c)/2)| ≤ |x − c|.
It follows that we can take δ = ε in the definition of continuity for every c ∈ R.

Example 7.9. The sign function sgn : R → R, defined by
sgn x = 1 if x > 0, 0 if x = 0, −1 if x < 0,
is not continuous at 0 since limx→0 sgn x does not exist (see Example 6.8). The left and right limits of sgn at 0,
limx→0− sgn x = −1, limx→0+ sgn x = 1,
do exist, but they are unequal. We say that sgn has a jump discontinuity at 0.
Example 7.10. The function f : R → R defined by
f(x) = 1/x if x ≠ 0, 0 if x = 0,
is not continuous at 0 since limx→0 f(x) does not exist (see Example 6.9). The left and right limits of f at 0 do not exist either, and we say that f has an essential discontinuity at 0.
Example 7.11. The function f : R → R defined by
f(x) = sin(1/x) if x ≠ 0, 0 if x = 0
is continuous at every c ≠ 0 (see Example 7.21 below) but discontinuous at 0 because limx→0 f(x) does not exist (see Example 6.10).
Example 7.12. The function f : R → R defined by
f(x) = x sin(1/x) if x ≠ 0, 0 if x = 0
is continuous at every point of R. (See Figure 1.) The continuity at c ≠ 0 is proved in Example 7.22 below. To prove continuity at 0, note that for x ≠ 0,
|f(x) − f(0)| = |x sin(1/x)| ≤ |x|,
so f(x) → f(0) as x → 0. If we had defined f(0) to be any value other than 0, then f would not be continuous at 0. In that case, f would have a removable discontinuity at 0.

Figure 1. A plot of the function y = x sin(1/x) and a detail near the origin with the lines y = ±x shown in red.

Example 7.13. The Dirichlet function f : R → R defined by
f(x) = 1 if x ∈ Q, 0 if x ∉ Q
is discontinuous at every c ∈ R. If c ∉ Q, choose a sequence (xn) of rational numbers such that xn → c (possible since Q is dense in R). Then xn → c and f(xn) → 1 but f(c) = 0. If c ∈ Q, choose a sequence (xn) of irrational numbers such that xn → c; for example, if c = p/q, we can take
xn = p/q + √2/n,
since xn ∈ Q would imply that √2 ∈ Q. Then xn → c and f(xn) → 0 but f(c) = 1. Alternatively, by taking a rational sequence (xn) and an irrational sequence (x̃n) that converge to c, we can see that limx→c f(x) does not exist for any c ∈ R.
Example 7.14. The Thomae function f : R → R is defined by
f(x) = 1/q if x = p/q ∈ Q \ {0} where p and q > 0 are relatively prime, 0 if x ∉ Q or x = 0.
Figure 2 shows the graph of f on [0, 1]. The Thomae function is continuous at 0 and at every irrational number, and discontinuous at every nonzero rational number.
To prove this claim, first suppose that x = p/q ∈ Q \ {0} is rational and nonzero. Then f(x) = 1/q > 0, but for every δ > 0, the interval (x − δ, x + δ) contains irrational points y such that f(y) = 0 and |f(x) − f(y)| = 1/q. The definition of continuity therefore fails if 0 < ε ≤ 1/q, and f is discontinuous at x.
Second, suppose that x ∉ Q is irrational. Given ε > 0, choose n ∈ N such that 1/n < ε. There are finitely many rational numbers r = p/q in the interval (x − 1, x + 1) with p, q relatively prime and 1 ≤ q ≤ n; we list them as {r1, r2, . . . , rm}. Choose
δ = min{|x − rk| : k = 1, 2, . . . , m}

to be the distance from x to the closest such rational number. Then δ > 0 since x ∉ Q. Furthermore, if |x − y| < δ, then either y is irrational and f(y) = 0, or y = p/q in lowest terms with q > n and f(y) = 1/q < 1/n < ε. In either case,
|f(x) − f(y)| = |f(y)| < ε,
which proves that f is continuous at x ∉ Q. The continuity of f at 0 follows immediately from the inequality 0 ≤ f(x) ≤ |x| for all x ∈ R.
Figure 2. A plot of the Thomae function in Example 7.14 on [0, 1].
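The Thomae function can be evaluated exactly at rational points with the standard-library fractions module, since Fraction stores p/q in lowest terms. The sketch below (the name thomae is ours) accepts only exact rational inputs; irrational numbers are not representable in floating point, so the irrational case is not modeled.

```python
from fractions import Fraction

def thomae(x: Fraction) -> Fraction:
    # f(p/q) = 1/q with p, q relatively prime and q > 0; f(0) = 0.
    if x == 0:
        return Fraction(0)
    return Fraction(1, x.denominator)  # Fraction is stored in lowest terms

assert thomae(Fraction(1, 2)) == Fraction(1, 2)
assert thomae(Fraction(2, 4)) == Fraction(1, 2)  # reduced before evaluation
assert thomae(Fraction(0)) == 0

# Near x = 1/2, rationals with large denominators take small values,
# while thomae(1/2) = 1/2: the jump that makes f discontinuous at 1/2.
assert thomae(Fraction(499, 1000)) == Fraction(1, 1000)
```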
We give a rough classification of discontinuities of a function f : A → R at an accumulation point c ∈ A as follows.
(1) Removable discontinuity: limx→c f(x) = L exists but L ≠ f(c), in which case we can make f continuous at c by redefining f(c) = L (see Example 7.12).
(2) Jump discontinuity: limx→c f(x) doesn't exist, but both the left and right limits limx→c− f(x), limx→c+ f(x) exist and are different (see Example 7.9).
(3) Essential discontinuity: limx→c f(x) doesn't exist and at least one of the left or right limits limx→c− f(x), limx→c+ f(x) doesn't exist (see Examples 7.10, 7.11, 7.13).

7.2. Properties of continuous functions
The basic properties of continuous functions follow from those of limits.
Theorem 7.15. If f, g : A → R are continuous at c ∈ A and k ∈ R, then kf, f + g, and f g are continuous at c. Moreover, if g(c) ≠ 0, then f /g is continuous at c.
Proof. This result follows immediately from Theorem 6.34.
A polynomial function is a function P : R → R of the form
P(x) = a0 + a1x + a2x² + · · · + anxⁿ,
where a0, a1, a2, . . . , an are real coefficients. A rational function R is a ratio of polynomials P, Q:
R(x) = P(x)/Q(x).
The domain of R is the set of points in R where Q ≠ 0.
Corollary 7.16. Every polynomial function is continuous on R and every rational function is continuous on its domain.
Proof. The constant function f(x) = 1 and the identity function g(x) = x are continuous on R. Repeated application of Theorem 7.15 for scalar multiples, sums, and products implies that every polynomial is continuous on R. It also follows that a rational function R = P/Q is continuous at every point where Q ≠ 0.
Example 7.17. The function f : R → R given by
f(x) = (x + 3x³ + 5x⁵)/(1 + x² + x⁴)
is continuous on R since it is a rational function whose denominator never vanishes.

In addition to forming sums, products and quotients, another way to build up more complicated functions from simpler functions is by composition. We recall that the composition g ◦ f of functions f , g is defined by (g ◦ f )(x) = g (f (x)). The next theorem states that the composition of continuous functions is continuous; note carefully the points at which we assume f and g are continuous.
Theorem 7.18. Let f : A → R and g : B → R where f (A) ⊂ B. If f is continuous at c ∈ A and g is continuous at f (c) ∈ B, then g ◦ f : A → R is continuous at c.
Proof. Let ε > 0 be given. Since g is continuous at f(c), there exists η > 0 such that
|y − f(c)| < η and y ∈ B implies that |g(y) − g(f(c))| < ε.
Next, since f is continuous at c, there exists δ > 0 such that
|x − c| < δ and x ∈ A implies that |f(x) − f(c)| < η.
Combining these inequalities, we get that
|x − c| < δ and x ∈ A implies that |g(f(x)) − g(f(c))| < ε,
which proves that g ◦ f is continuous at c.
Corollary 7.19. Let f : A → R and g : B → R where f (A) ⊂ B. If f is continuous on A and g is continuous on f (A), then g ◦ f is continuous on A.
Example 7.20. The function
f(x) = 1/sin x if x ≠ nπ for n ∈ Z, 0 if x = nπ for n ∈ Z
is continuous on R \ {nπ : n ∈ Z}, since it is the composition of x → sin x, which is continuous on R, and y → 1/y, which is continuous on R \ {0}, and sin x ≠ 0 when x ≠ nπ. It is discontinuous at x = nπ because it is not locally bounded at those points.

Example 7.21. The function
f(x) = sin(1/x) if x ≠ 0, 0 if x = 0
is continuous on R \ {0}, since it is the composition of x → 1/x, which is continuous on R \ {0}, and y → sin y, which is continuous on R.
Example 7.22. The function
f(x) = x sin(1/x) if x ≠ 0, 0 if x = 0
is continuous on R \ {0} since it is a product of functions that are continuous on R \ {0}. As shown in Example 7.12, f is also continuous at 0, so f is continuous on R.

7.3. Uniform continuity
Uniform continuity is a subtle but powerful strengthening of continuity.
Definition 7.23. Let f : A → R, where A ⊂ R. Then f is uniformly continuous on A if for every ε > 0 there exists a δ > 0 such that
|x − y| < δ and x, y ∈ A implies that |f(x) − f(y)| < ε.
The key point of this definition is that δ depends only on ε, not on x, y. A uniformly continuous function on A is continuous at every point of A, but the converse is not true.
To explain this point in more detail, note that if a function f is continuous on A, then given ε > 0 and c ∈ A, there exists δ(ε, c) > 0 such that
|x − c| < δ(ε, c) and x ∈ A implies that |f(x) − f(c)| < ε.
If for some ε0 > 0 we have
infc∈A δ(ε0, c) = 0
however we choose δ(ε0, c) > 0 in the definition of continuity, then no δ0(ε0) > 0 depending only on ε0 works simultaneously for every c ∈ A. In that case, the function is continuous on A but not uniformly continuous.
Before giving some examples, we state a sequential condition for uniform continuity to fail.
Proposition 7.24. A function f : A → R is not uniformly continuous on A if and only if there exist ε0 > 0 and sequences (xn), (yn) in A such that
limn→∞ |xn − yn| = 0 and |f(xn) − f(yn)| ≥ ε0 for all n ∈ N.

Proof. If f is not uniformly continuous, then there exists ε0 > 0 such that for every δ > 0 there are points x, y ∈ A with |x − y| < δ and |f(x) − f(y)| ≥ ε0. Choosing xn, yn ∈ A to be any such points for δ = 1/n, we get the required sequences.
Conversely, if the sequential condition holds, then for every δ > 0 there exists n ∈ N such that |xn − yn| < δ and |f(xn) − f(yn)| ≥ ε0. It follows that the uniform continuity condition in Definition 7.23 cannot hold for any δ > 0 if ε = ε0, so f is not uniformly continuous.

Example 7.25. Example 7.8 shows that the sine function is uniformly continuous on R, since we can take δ = ε for every x, y ∈ R.
Example 7.26. Define f : [0, 1] → R by f(x) = x². Then f is uniformly continuous on [0, 1]. To prove this, note that for all x, y ∈ [0, 1] we have
|x² − y²| = |x + y| |x − y| ≤ 2|x − y|,
so we can take δ = ε/2 in the definition of uniform continuity. Similarly, f(x) = x² is uniformly continuous on any bounded set.
Example 7.27. The function f(x) = x² is continuous but not uniformly continuous on R. We have already proved that f is continuous on R (it's a polynomial). To prove that f is not uniformly continuous, let
xn = n, yn = n + 1/n.
Then
limn→∞ |xn − yn| = limn→∞ 1/n = 0,
but
|f(xn) − f(yn)| = (n + 1/n)² − n² = 2 + 1/n² ≥ 2 for every n ∈ N.
It follows from Proposition 7.24 that f is not uniformly continuous on R. The problem here is that, in order to prove the continuity of f at c, given ε > 0 we need to make δ(ε, c) smaller as c gets larger, and δ(ε, c) → 0 as c → ∞.
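The sequences in this example can be checked directly; a short numerical sketch (the range of indices is our choice):

```python
def f(x):
    return x * x

for n in range(1, 100):
    xn = float(n)
    yn = n + 1.0 / n
    assert abs(xn - yn) < 2.0 / n     # |xn - yn| = 1/n -> 0
    assert abs(f(xn) - f(yn)) >= 2.0  # but |f(xn) - f(yn)| = 2 + 1/n^2 >= 2
```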
Example 7.28. The function f : (0, 1] → R defined by
f(x) = 1/x
is continuous but not uniformly continuous on (0, 1]. It is continuous on (0, 1] since it's a rational function whose denominator x is nonzero in (0, 1]. To prove that f is not uniformly continuous, we define xn, yn ∈ (0, 1] for n ∈ N by
xn = 1/n, yn = 1/(n + 1).
Then |xn − yn| → 0 as n → ∞, but
|f(xn) − f(yn)| = (n + 1) − n = 1 for every n ∈ N.
It follows from Proposition 7.24 that f is not uniformly continuous on (0, 1]. The problem here is that given ε > 0, we need to make δ(ε, c) smaller as c gets closer to 0, and δ(ε, c) → 0 as c → 0+.
The non-uniformly continuous functions in the last two examples were unbounded. However, even bounded continuous functions can fail to be uniformly continuous if they oscillate arbitrarily quickly.


Example 7.29. Define f : (0, 1] → R by
f(x) = sin(1/x).
Then f is continuous on (0, 1], but it isn't uniformly continuous on (0, 1]. To prove this, define xn, yn ∈ (0, 1] for n ∈ N by
xn = 1/(2nπ), yn = 1/(2nπ + π/2).
Then |xn − yn| → 0 as n → ∞, but
|f(xn) − f(yn)| = |sin(2nπ + π/2) − sin 2nπ| = 1 for all n ∈ N.

It isn’t a coincidence that these examples of non-uniformly continuous functions have domains that are either unbounded or not closed. We will prove in Section 7.5 that a continuous function on a compact set is uniformly continuous.

7.4. Continuous functions and open sets
Let f : A → R be a function. Recall that if B ⊂ A, then the image of B under f is the set
f(B) = {y ∈ R : y = f(x) for some x ∈ B},
and if C ⊂ R, then the inverse image, or preimage, of C under f is the set
f⁻¹(C) = {x ∈ A : f(x) ∈ C}.
The next example illustrates how open sets behave under continuous functions.
Example 7.30. Define f : R → R by f(x) = x², and consider the open interval I = (1, 4). Then both f(I) = (1, 16) and f⁻¹(I) = (−2, −1) ∪ (1, 2) are open. There are two intervals in the inverse image of I because f is two-to-one on f⁻¹(I). On the other hand, if J = (−1, 1), then
f(J) = [0, 1), f⁻¹(J) = (−1, 1),
so the inverse image of the open interval J is open, but the image is not.
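The image and preimage here can be probed on a finite grid; the sketch below is ours (a grid samples, and cannot prove, openness, and the resolution is arbitrary).

```python
def f(x):
    return x * x

def in_preimage(x, a, b):
    # x lies in f^{-1}((a, b)) if and only if f(x) lies in (a, b)
    return a < f(x) < b

grid = [-3 + 6 * i / 10000 for i in range(10001)]

# Sampled preimage of I = (1, 4) falls in (-2, -1) U (1, 2).
pre = [x for x in grid if in_preimage(x, 1, 4)]
assert pre and all((-2 < x < -1) or (1 < x < 2) for x in pre)

# For J = (-1, 1), the image f(J) = [0, 1) contains its endpoint 0,
# which is why f(J) is not open.
assert all(0 <= f(x) < 1 for x in grid if -1 < x < 1)
assert f(0.0) == 0.0
```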
Thus, a continuous function needn’t map open sets to open sets. As we will show, however, the inverse image of an open set under a continuous function is always open. This property is the topological definition of a continuous function; it is a global definition in the sense that it is equivalent to the continuity of the function at every point of its domain.
Recall from Section 5.1 that a subset B of a set A ⊂ R is relatively open in
A, or open in A, if B = A ∩ U where U is open in R. Moreover, as stated in
Proposition 5.13, B is relatively open in A if and only if every point x ∈ B has a relative neighborhood C = A ∩ V such that C ⊂ B, where V is a neighborhood of x in R.
Theorem 7.31. A function f : A → R is continuous on A if and only if f −1 (V ) is open in A for every set V that is open in R.


Proof. First assume that f is continuous on A, and suppose that c ∈ f⁻¹(V). Then f(c) ∈ V, and since V is open it contains an ε-neighborhood
Vε(f(c)) = (f(c) − ε, f(c) + ε)
of f(c). Since f is continuous at c, there is a δ-neighborhood
Uδ(c) = (c − δ, c + δ)
of c such that
f(A ∩ Uδ(c)) ⊂ Vε(f(c)).
This statement just says that if |x − c| < δ and x ∈ A, then |f(x) − f(c)| < ε. It follows that
A ∩ Uδ(c) ⊂ f⁻¹(V),
meaning that f⁻¹(V) contains a relative neighborhood of c. Therefore f⁻¹(V) is relatively open in A.
Conversely, assume that f⁻¹(V) is open in A for every open V in R, and let c ∈ A. Then the preimage of the ε-neighborhood (f(c) − ε, f(c) + ε) is open in A, so it contains a relative δ-neighborhood A ∩ (c − δ, c + δ). It follows that |f(x) − f(c)| < ε if |x − c| < δ and x ∈ A, which means that f is continuous at c.
As one illustration of how we can use this result, we prove that continuous functions map intervals to intervals. (See Definition 5.62 for what we mean by an interval.) In view of Theorem 5.63, this is a special case of the fact that continuous functions map connected sets to connected sets (see Theorem 13.82).
Theorem 7.32. Suppose that f : I → R is continuous and I ⊂ R is an interval.
Then f (I) is an interval.
Proof. Suppose that f(I) is not an interval. Then, by Theorem 5.63, f(I) is disconnected, and there exist nonempty, disjoint open sets U, V such that U ∪ V ⊃ f(I). Since f is continuous, f⁻¹(U), f⁻¹(V) are open in I by Theorem 7.31. Furthermore, f⁻¹(U), f⁻¹(V) are nonempty and disjoint, and f⁻¹(U) ∪ f⁻¹(V) = I. It follows that I is disconnected, and therefore I is not an interval by Theorem 5.63. This shows that if I is an interval, then f(I) is also an interval.
We can also define open functions, or open mappings. Although they form an important class of functions, they aren’t as fundamental as continuous functions.
Definition 7.33. An open mapping on a set A ⊂ R is a function f : A → R such that f (B) is open in R for every set B ⊂ A that is open in A.
A continuous function needn’t be open, but if f : A → R is continuous and one-to-one, then f −1 : f (A) → R is open.
Example 7.34. Example 7.30 shows that the square function f : R → R defined by f(x) = x² is not an open mapping on R. On the other hand, f : [0, ∞) → R defined by f(x) = x² is open, because it is one-to-one with a continuous inverse f⁻¹ : [0, ∞) → R given by

    f⁻¹(x) = √x.

7.5. Continuous functions on compact sets
Continuous functions on compact sets have especially nice properties. For example, they are bounded and attain their maximum and minimum values, and they are uniformly continuous. Since a closed, bounded interval is compact, these results apply, in particular, to continuous functions f : [a, b] → R.
First, we prove that the continuous image of a compact set in R is compact.
This is a special case of the fact that continuous functions map compact sets to compact sets (see Theorem 13.82).
Theorem 7.35. If K ⊂ R is compact and f : K → R is continuous, then f(K) is compact.

Proof. We will give two proofs, one using sequences and the other using open covers.

First, we show that f(K) is sequentially compact. Let (yn) be a sequence in f(K). Then yn = f(xn) for some xn ∈ K. Since K is compact, the sequence (xn) has a convergent subsequence (xni) such that

    limi→∞ xni = x,

where x ∈ K. Since f is continuous on K,

    limi→∞ f(xni) = f(x).

Writing y = f(x), we have y ∈ f(K) and

    limi→∞ yni = y.

Therefore every sequence (yn) in f(K) has a convergent subsequence whose limit belongs to f(K), so f(K) is compact.
As an alternative proof, we show that f(K) has the Heine-Borel property. Suppose that {Vi : i ∈ I} is an open cover of f(K). Since f is continuous, Theorem 7.31 implies that f⁻¹(Vi) is open in K, so {f⁻¹(Vi) : i ∈ I} is an open cover of K. Since K is compact, there is a finite subcover

    f⁻¹(Vi1), f⁻¹(Vi2), . . . , f⁻¹(ViN)

of K, and it follows that

    {Vi1, Vi2, . . . , ViN}

is a finite subcover of the original open cover of f(K). This proves that f(K) is compact.

Note that compactness is essential here; it is not true, in general, that a continuous function maps closed sets to closed sets.
Example 7.36. Define f : R → R by

    f(x) = 1/(1 + x²).

Then [0, ∞) is closed but f([0, ∞)) = (0, 1] is not.

Figure 3. A plot of the function y = x + x sin(1/x) on [0, 2/π] and a detail near the origin.

The following result is one of the most important properties of continuous functions on compact sets.
Theorem 7.37 (Weierstrass extreme value). If f : K → R is continuous and
K ⊂ R is compact, then f is bounded on K and f attains its maximum and minimum values on K.
Proof. The image f(K) is compact by Theorem 7.35. Proposition 5.40 implies that f(K) is bounded and that its maximum M and minimum m belong to f(K). Therefore there are points x, y ∈ K such that f(x) = M, f(y) = m, and f attains its maximum and minimum on K.
Example 7.38. Define f : [0, 1] → R by

    f(x) = 1/x if 0 < x ≤ 1,
           0   if x = 0.

Then f is unbounded on [0, 1] and has no maximum value (f does, however, have a minimum value of 0, attained at x = 0). In this example, [0, 1] is compact but f is discontinuous at 0, which shows that a discontinuous function on a compact set needn’t be bounded.
Example 7.39. Define f : (0, 1] → R by f(x) = 1/x. Then f is unbounded on (0, 1] with no maximum value (f does, however, have a minimum value of 1, attained at x = 1). In this example, f is continuous but the half-open interval (0, 1] isn’t compact, which shows that a continuous function on a non-compact set needn’t be bounded.

Example 7.40. Define f : (0, 1) → R by f(x) = x. Then

    inf x∈(0,1) f(x) = 0,    sup x∈(0,1) f(x) = 1,

but f(x) ≠ 0 and f(x) ≠ 1 for any 0 < x < 1. Thus, even if a continuous function on a non-compact set is bounded, it needn’t attain its supremum or infimum.

Example 7.41. Define f : [0, 2/π] → R by

    f(x) = x + x sin(1/x) if 0 < x ≤ 2/π,
           0              if x = 0.

(See Figure 3.) Then f is continuous on the compact interval [0, 2/π], so by Theorem 7.37 it attains its maximum and minimum. For 0 ≤ x ≤ 2/π, we have 0 ≤ f(x) ≤ 4/π since |sin(1/x)| ≤ 1. Thus, the minimum value of f is 0, attained at x = 0. It is also attained at infinitely many other interior points of the interval,

    xn = 1/(2nπ + 3π/2),    n = 0, 1, 2, 3, . . . ,

where sin(1/xn) = −1. The maximum value of f is 4/π, attained at x = 2/π.
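The extreme values in this example can be checked numerically. The following Python sketch is an illustration added here, not from the notes; the grid resolution is an arbitrary choice. It samples f on [0, 2/π], recovers the minimum 0 at x = 0, finds the maximum at the right endpoint x = 2/π, and evaluates f at one of the interior minimizers xn.

```python
import math

# f from Example 7.41, extended by f(0) = 0.
def f(x):
    return x + x * math.sin(1.0 / x) if x > 0 else 0.0

b = 2 / math.pi
xs = [b * k / 100000 for k in range(100001)]
values = [f(x) for x in xs]

min_val = min(values)   # the minimum 0, attained at x = 0
max_val = max(values)   # the maximum, attained at the right endpoint x = 2/pi
print(min_val, max_val)

# One of the interior minimizers x_n = 1/(2n*pi + 3*pi/2) (here n = 1):
x1 = 1 / (2 * math.pi + 1.5 * math.pi)
print(f(x1))   # essentially 0, since sin(1/x1) = -1
```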
Finally, we prove that continuous functions on compact sets are uniformly continuous.

Theorem 7.42. If f : K → R is continuous and K ⊂ R is compact, then f is uniformly continuous on K.

Proof. Suppose for contradiction that f is not uniformly continuous on K. Then from Proposition 7.24 there exist ε0 > 0 and sequences (xn), (yn) in K such that

    limn→∞ |xn − yn| = 0    and    |f(xn) − f(yn)| ≥ ε0 for every n ∈ N.

Since K is compact, there is a convergent subsequence (xni) of (xn) such that

    limi→∞ xni = x ∈ K.

Moreover, since (xn − yn) → 0 as n → ∞, it follows that

    limi→∞ yni = limi→∞ [xni − (xni − yni)] = limi→∞ xni − limi→∞ (xni − yni) = x,

so (yni) also converges to x. Then, since f is continuous on K,

    limi→∞ |f(xni) − f(yni)| = |limi→∞ f(xni) − limi→∞ f(yni)| = |f(x) − f(x)| = 0,

but this contradicts the non-uniform continuity condition |f(xni) − f(yni)| ≥ ε0. Therefore f is uniformly continuous.
Example 7.43. The function f : [0, 2/π] → R defined in Example 7.41 is uniformly continuous on [0, 2/π], since it is continuous and [0, 2/π] is compact.
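The sequence criterion of Proposition 7.24 used in the proof above can be seen concretely for the function f(x) = 1/x of Example 7.39, which is continuous but not uniformly continuous on (0, 1]. The Python sketch below is an illustration added here; the particular sequences are one convenient choice.

```python
# Proposition 7.24's criterion: f is not uniformly continuous on A if there
# are sequences (x_n), (y_n) in A with |x_n - y_n| -> 0 while
# |f(x_n) - f(y_n)| >= e0 > 0.  Here f(x) = 1/x on (0, 1].
def f(x):
    return 1.0 / x

gaps = []    # |x_n - y_n|: tends to 0
jumps = []   # |f(x_n) - f(y_n)|: stays at 1
for n in range(1, 1001):
    xn, yn = 1.0 / (n + 1), 1.0 / n
    gaps.append(abs(xn - yn))
    jumps.append(abs(f(xn) - f(yn)))

print(gaps[-1], jumps[-1])   # the gap shrinks, the jump does not
```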

7.6. The intermediate value theorem
The intermediate value theorem states that a continuous function on an interval takes on all values between any two of its values. We first prove a special case.
Theorem 7.44 (Intermediate value). Suppose that f : [a, b] → R is a continuous function on a closed, bounded interval. If f (a) < 0 and f (b) > 0, or f (a) > 0 and f (b) < 0, then there is a point a < c < b such that f (c) = 0.

Proof. Assume for definiteness that f(a) < 0 and f(b) > 0. (If f(a) > 0 and f(b) < 0, consider −f instead of f.) The set

    E = {x ∈ [a, b] : f(x) < 0}

is nonempty, since a ∈ E, and E is bounded from above by b. Let

    c = sup E ∈ [a, b],

which exists by the completeness of R. We claim that f(c) = 0.

Suppose for contradiction that f(c) ≠ 0. Since f is continuous at c, there exists δ > 0 such that

    |x − c| < δ and x ∈ [a, b] implies that |f(x) − f(c)| < (1/2)|f(c)|.

If f(c) < 0, then c ≠ b and

    f(x) = f(c) + f(x) − f(c) < f(c) − (1/2)f(c)

for all x ∈ [a, b] such that |x − c| < δ, so f(x) < (1/2)f(c) < 0. It follows that there are points x ∈ E with x > c, which contradicts the fact that c is an upper bound of E.

If f(c) > 0, then c ≠ a and

    f(x) = f(c) + f(x) − f(c) > f(c) − (1/2)f(c)

for all x ∈ [a, b] such that |x − c| < δ, so f(x) > (1/2)f(c) > 0. It follows that there exists η > 0 such that c − η ≥ a and f(x) > 0 for c − η ≤ x ≤ c. In that case, c − η < c is an upper bound for E, since c is an upper bound and f(x) > 0 for c − η ≤ x ≤ c, which contradicts the fact that c is the least upper bound. This proves that f(c) = 0. Finally, c ≠ a, b since f is nonzero at the endpoints, so a < c < b.
We give some examples to show that all of the hypotheses in this theorem are necessary.

Example 7.45. Let K = [−2, −1] ∪ [1, 2] and define f : K → R by

    f(x) = −1 if −2 ≤ x ≤ −1,
           1  if 1 ≤ x ≤ 2.

Then f(−2) < 0 and f(2) > 0, but f doesn’t vanish at any point in its domain. Thus, in general, Theorem 7.44 fails if the domain of f is not a connected interval [a, b].
Example 7.46. Define f : [−1, 1] → R by

    f(x) = −1 if −1 ≤ x < 0,
           1  if 0 ≤ x ≤ 1.

Then f(−1) < 0 and f(1) > 0, but f doesn’t vanish at any point in its domain. Here, f is defined on an interval but it is discontinuous at 0. Thus, in general, Theorem 7.44 fails for discontinuous functions.

As one immediate consequence of the intermediate value theorem, we show that the real numbers contain the square root of 2.
Example 7.47. Define the continuous function f : [1, 2] → R by f(x) = x² − 2. Then f(1) < 0 and f(2) > 0, so Theorem 7.44 implies that there exists 1 < c < 2 such that c² = 2. Moreover, since x² − 2 is strictly increasing on [0, ∞), there is a unique such positive number, and we have proved the existence of √2.

We can get more accurate approximations to √2 by repeatedly bisecting the interval [1, 2]. For example, f(3/2) = 1/4 > 0, so 1 < √2 < 3/2, and f(5/4) < 0, so 5/4 < √2 < 3/2, and so on. This bisection method is a simple, but useful, algorithm for computing numerical approximations of solutions of f(x) = 0 where f is a continuous function.
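The bisection steps just described can be sketched in code. The Python below is an illustration added here, not part of the notes; the stopping tolerance is an arbitrary choice, and the sign bookkeeping assumes f changes sign on the starting interval, as in Theorem 7.44.

```python
# Repeated bisection of [a, b] for a continuous f with a sign change.
def bisect(f, a, b, tol=1e-12):
    fa = f(a)
    while b - a > tol:
        c = 0.5 * (a + b)
        fc = f(c)
        if fc == 0.0:
            return c
        if fa * fc < 0:      # sign change in [a, c]: the root is there
            b = c
        else:                # otherwise the root is in [c, b]
            a, fa = c, fc
    return 0.5 * (a + b)

root = bisect(lambda x: x * x - 2, 1.0, 2.0)
print(root)   # ~ 1.414213562..., an approximation of sqrt(2)
```

Each pass halves the interval, so about 40 iterations suffice for the tolerance above.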
Note that we used the existence of a supremum in the proof of Theorem 7.44. If we restrict f(x) = x² − 2 to rational numbers, f : A → Q where A = [1, 2] ∩ Q, then f is continuous on A, f(1) < 0 and f(2) > 0, but f(c) ≠ 0 for any c ∈ A since √2 is irrational. This shows that the completeness of R is essential for Theorem 7.44 to hold. (Thus, in a sense, the theorem actually describes the completeness of the continuum R rather than the continuity of f!)
The general statement of the Intermediate Value Theorem follows immediately from this special case.
Theorem 7.48 (Intermediate value theorem). Suppose that f : [a, b] → R is a continuous function on a closed, bounded interval. Then for every d strictly between f (a) and f (b) there is a point a < c < b such that f (c) = d.
Proof. Suppose, for definiteness, that f (a) < f (b) and f (a) < d < f (b). (If f (a) > f (b) and f (b) < d < f (a), apply the same proof to −f , and if f (a) = f (b) there is nothing to prove.) Let g(x) = f (x) − d. Then g(a) < 0 and g(b) > 0, so
Theorem 7.44 implies that g(c) = 0 for some a < c < b, meaning that f (c) = d.
As one consequence of our previous results, we prove that a continuous function maps compact intervals to compact intervals.
Theorem 7.49. Suppose that f : [a, b] → R is a continuous function on a closed, bounded interval. Then f ([a, b]) = [m, M ] is a closed, bounded interval.
Proof. Theorem 7.37 implies that m ≤ f(x) ≤ M for all x ∈ [a, b], where m and M are the minimum and maximum values of f on [a, b], so f([a, b]) ⊂ [m, M].
Moreover, there are points c, d ∈ [a, b] such that f (c) = m, f (d) = M .
Let J = [c, d] if c ≤ d or J = [d, c] if d < c. Then J ⊂ [a, b], and Theorem 7.48 implies that f takes on all values in [m, M ] on J. It follows that f ([a, b]) ⊃ [m, M ], so f ([a, b]) = [m, M ].
First we give an example to illustrate the theorem.
Example 7.50. Define f : [−1, 1] → R by

    f(x) = x − x³.

Then, using calculus to compute the maximum and minimum of f, we find that

    f([−1, 1]) = [−M, M],    M = 2/(3√3).

This example illustrates that f([a, b]) ≠ [f(a), f(b)] unless f is increasing.
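The value M computed by calculus (critical points at x = ±1/√3) can be confirmed numerically. The Python sketch below is an illustration added here; the grid size is an arbitrary choice.

```python
import math

# Sample f(x) = x - x^3 on [-1, 1] and compare with M = 2/(3*sqrt(3)),
# the maximum found by calculus (critical points at x = +-1/sqrt(3)).
def f(x):
    return x - x ** 3

M = 2 / (3 * math.sqrt(3))
xs = [-1 + 2 * k / 200000 for k in range(200001)]
values = [f(x) for x in xs]
max_val, min_val = max(values), min(values)
print(max_val, M)    # both ~ 0.3849
print(min_val, -M)
```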
Next we give some examples to show that the continuity of f and the connectedness and compactness of the interval [a, b] are essential for Theorem 7.49 to hold.

Example 7.51. Let sgn : [−1, 1] → R be the sign function defined in Example 6.8. Then sgn is a discontinuous function on the compact interval [−1, 1], and the range sgn([−1, 1]) = {−1, 0, 1} consists of three isolated points and is not an interval.

Example 7.52. In Example 7.45, the function f : K → R is continuous on a compact set K, but f(K) = {−1, 1} consists of two isolated points and is not an interval.

Example 7.53. The continuous function f : R → R in Example 7.36 maps the unbounded, closed interval [0, ∞) to the half-open interval (0, 1].
The last example shows that a continuous function may map a closed but unbounded interval to an interval which isn’t closed (or open). Nevertheless, as shown in Theorem 7.32, a continuous function always maps intervals to intervals, although the intervals may be open, closed, half-open, bounded, or unbounded.

7.7. Monotonic functions
Monotonic functions have continuity properties that are not shared by general functions.
Definition 7.54. Let I ⊂ R be an interval. A function f : I → R is increasing if

    f(x1) ≤ f(x2) if x1, x2 ∈ I and x1 < x2,

strictly increasing if

    f(x1) < f(x2) if x1, x2 ∈ I and x1 < x2,

decreasing if

    f(x1) ≥ f(x2) if x1, x2 ∈ I and x1 < x2,

and strictly decreasing if

    f(x1) > f(x2) if x1, x2 ∈ I and x1 < x2.

An increasing or decreasing function is called a monotonic function, and a strictly increasing or strictly decreasing function is called a strictly monotonic function.
A commonly used alternative (and, unfortunately, incompatible) terminology is “nondecreasing” for “increasing,” “increasing” for “strictly increasing,” “nonincreasing” for “decreasing,” and “decreasing” for “strictly decreasing.” According to our terminology, a constant function is both increasing and decreasing. Monotonic functions are also referred to as monotone functions.

Theorem 7.55. If f : I → R is monotonic on an interval I, then the left and right limits of f,

    limx→c− f(x),    limx→c+ f(x),

exist at every interior point c of I.
Proof. Assume for definiteness that f is increasing. (If f is decreasing, we can apply the same argument to −f, which is increasing.) We will prove that

    limx→c− f(x) = sup E,    E = {f(x) ∈ R : x ∈ I and x < c}.

The set E is nonempty since c is an interior point of I, so there exists x ∈ I with x < c, and E is bounded from above by f(c) since f is increasing. It follows that L = sup E ∈ R exists. (Note that L may be strictly less than f(c)!)

Suppose that ε > 0 is given. Since L is a least upper bound of E, there exists y0 ∈ E such that L − ε < y0 ≤ L, and therefore x0 ∈ I with x0 < c such that f(x0) = y0. Let δ = c − x0 > 0. If c − δ < x < c, then x0 < x < c and therefore f(x0) ≤ f(x) ≤ L since f is increasing and L is an upper bound of E. It follows that

    L − ε < f(x) ≤ L if c − δ < x < c,

which proves that limx→c− f(x) = L.

A similar argument, or the same argument applied to g(x) = −f(−x), shows that

    limx→c+ f(x) = inf {f(x) ∈ R : x ∈ I and x > c}.

We leave the details as an exercise.
Similarly, if I = (a, b] has right-endpoint b ∈ I and f is monotonic on I, then the left limit limx→b− f (x) exists, although it may not equal f (b), and if a ∈ I is a left-endpoint, then the right limit limx→a+ f (x) exists, although it may not equal f (a).
Corollary 7.56. Every discontinuity of a monotonic function f : I → R at an interior point of the interval I is a jump discontinuity.
Proof. If c is an interior point of I, then the left and right limits of f at c exist by the previous theorem. Moreover, assuming for definiteness that f is increasing, we have f(x) ≤ f(c) ≤ f(y) for all x, y ∈ I with x < c < y, and since limits preserve inequalities,

    limx→c− f(x) ≤ f(c) ≤ limx→c+ f(x).

If the left and right limits are equal, then the limit exists and is equal to the left and right limits, so

    limx→c f(x) = f(c),

meaning that f is continuous at c. In particular, a monotonic function cannot have a removable discontinuity at an interior point of its domain (although it can have one at an endpoint of a closed interval). If the left and right limits are not equal,

then f has a jump discontinuity at c, so f cannot have an essential discontinuity either. One can show that a monotonic function has, at most, a countable number of discontinuities, and it may have a countably infinite number, but we omit the proof. By contrast, the non-monotonic Dirichlet function is discontinuous at every point of R, so it has uncountably many discontinuities.

Chapter 8

Differentiable Functions

A differentiable function is a function that can be approximated locally by a linear function.

8.1. The derivative
Definition 8.1. Suppose that f : (a, b) → R and a < c < b. Then f is differentiable at c with derivative f′(c) if

    limh→0 [f(c + h) − f(c)]/h = f′(c).

The domain of f′ is the set of points c ∈ (a, b) for which this limit exists. If the limit exists for every c ∈ (a, b), then we say that f is differentiable on (a, b).
Graphically, this definition says that the derivative of f at c is the slope of the tangent line to y = f (x) at c, which is the limit as h → 0 of the slopes of the lines through (c, f (c)) and (c + h, f (c + h)).
We can also write

    f′(c) = limx→c [f(x) − f(c)]/(x − c),

since if x = c + h, the conditions 0 < |x − c| < δ and 0 < |h| < δ in the definitions of the limits are equivalent. The ratio

    [f(x) − f(c)]/(x − c)

is undefined (0/0) at x = c, but it doesn’t have to be defined in order for the limit as x → c to exist.
Like continuity, differentiability is a local property. That is, the differentiability of a function f at c and the value of the derivative, if it exists, depend only on the values of f in an arbitrarily small neighborhood of c. In particular, if f : A → R
where A ⊂ R, then we can define the differentiability of f at any interior point c ∈ A since there is an open interval (a, b) ⊂ A with c ∈ (a, b).
8.1.1. Examples of derivatives. Let us give a number of examples that illustrate differentiable and non-differentiable functions.
Example 8.2. The function f : R → R defined by f(x) = x² is differentiable on R with derivative f′(x) = 2x, since

    limh→0 [(c + h)² − c²]/h = limh→0 [h(2c + h)]/h = limh→0 (2c + h) = 2c.

Note that in computing the derivative, we first cancel by h, which is valid since h ≠ 0 in the definition of the limit, and then set h = 0 to evaluate the limit. This procedure would be inconsistent if we didn’t use limits.
Example 8.3. The function f : R → R defined by

    f(x) = x² if x > 0,
           0  if x ≤ 0,

is differentiable on R with derivative

    f′(x) = 2x if x > 0,
            0  if x ≤ 0.

For x > 0, the derivative is f′(x) = 2x as above, and for x < 0, we have f′(x) = 0. For x = 0, we consider the limit

    limh→0 [f(h) − f(0)]/h = limh→0 f(h)/h.

The right limit is

    limh→0+ f(h)/h = limh→0+ h = 0,

and the left limit is

    limh→0− f(h)/h = 0.

Since the left and right limits exist and are equal, the limit also exists, and f is differentiable at 0 with f′(0) = 0.
Next, we consider some examples of non-differentiability at discontinuities, corners, and cusps.
Example 8.4. The function f : R → R defined by

    f(x) = 1/x if x ≠ 0,
           0   if x = 0,

is differentiable at every x ≠ 0 with derivative f′(x) = −1/x², since for c ≠ 0,

    limh→0 [f(c + h) − f(c)]/h = limh→0 [1/(c + h) − 1/c]/h
                              = limh→0 [c − (c + h)]/[hc(c + h)]
                              = −limh→0 1/[c(c + h)]
                              = −1/c².

However, f is not differentiable at 0 since the limit

    limh→0 [f(h) − f(0)]/h = limh→0 (1/h − 0)/h = limh→0 1/h²

does not exist.
Example 8.5. The sign function f(x) = sgn x, defined in Example 6.8, is differentiable at every x ≠ 0 with f′(x) = 0, since in that case f(x + h) − f(x) = 0 for all sufficiently small h. The sign function is not differentiable at 0, since

    limh→0 (sgn h − sgn 0)/h = limh→0 (sgn h)/h,

and

    (sgn h)/h = 1/h  if h > 0,
                −1/h if h < 0,

is unbounded in every neighborhood of 0, so its limit does not exist.
Example 8.6. The absolute value function f(x) = |x| is differentiable at every x ≠ 0 with derivative f′(x) = sgn x. It is not differentiable at 0, however, since

    limh→0 [f(h) − f(0)]/h = limh→0 |h|/h = limh→0 sgn h

does not exist. (The right limit is 1 and the left limit is −1.)

Example 8.7. The function f : R → R defined by f(x) = |x|^(1/2) is differentiable at every x ≠ 0 with

    f′(x) = (sgn x)/(2|x|^(1/2)).

If c > 0, then using the difference of two squares to rationalize the numerator, we get

    limh→0 [f(c + h) − f(c)]/h = limh→0 [(c + h)^(1/2) − c^(1/2)]/h
                              = limh→0 [(c + h) − c] / {h[(c + h)^(1/2) + c^(1/2)]}
                              = limh→0 1/[(c + h)^(1/2) + c^(1/2)]
                              = 1/(2c^(1/2)).

If c < 0, we get the analogous result with a negative sign. However, f is not differentiable at 0, since

    limh→0+ [f(h) − f(0)]/h = limh→0+ 1/h^(1/2)

does not exist.
Example 8.8. The function f : R → R defined by f(x) = x^(1/3) is differentiable at every x ≠ 0 with

    f′(x) = 1/(3x^(2/3)).

To prove this result, we use the identity for the difference of cubes,

    a³ − b³ = (a − b)(a² + ab + b²),

and get for c ≠ 0 that

    limh→0 [f(c + h) − f(c)]/h = limh→0 [(c + h)^(1/3) − c^(1/3)]/h
                              = limh→0 [(c + h) − c] / {h[(c + h)^(2/3) + (c + h)^(1/3)c^(1/3) + c^(2/3)]}
                              = limh→0 1/[(c + h)^(2/3) + (c + h)^(1/3)c^(1/3) + c^(2/3)]
                              = 1/(3c^(2/3)).

However, f is not differentiable at 0, since

    limh→0 [f(h) − f(0)]/h = limh→0 1/h^(2/3)

does not exist.
Finally, we consider some examples of highly oscillatory functions.
Example 8.9. Define f : R → R by

    f(x) = x sin(1/x) if x ≠ 0,
           0          if x = 0.

It follows from the product and chain rules proved below that f is differentiable at every x ≠ 0 with derivative

    f′(x) = sin(1/x) − (1/x) cos(1/x).

However, f is not differentiable at 0, since

    limh→0 [f(h) − f(0)]/h = limh→0 sin(1/h),

which does not exist.
Example 8.10. Define f : R → R by

    f(x) = x² sin(1/x) if x ≠ 0,
           0           if x = 0.
Figure 1. A plot of the function y = x2 sin(1/x) and a detail near the origin with the parabolas y = ±x2 shown in red.

Then f is differentiable on R. (See Figure 1.) It follows from the product and chain rules proved below that f is differentiable at every x ≠ 0 with derivative

    f′(x) = 2x sin(1/x) − cos(1/x).

Moreover, f is differentiable at 0 with f′(0) = 0, since

    limh→0 [f(h) − f(0)]/h = limh→0 h sin(1/h) = 0.

In this example, limx→0 f′(x) does not exist, so although f is differentiable on R, its derivative f′ is not continuous at 0.
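Both limits in this example can be checked numerically. The Python sketch below is an illustration added here; the sample points are chosen to make the oscillation of f′ near 0 visible.

```python
import math

# Example 8.10's function and (for x != 0) its derivative.
def f(x):
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

def fprime(x):
    return 2 * x * math.sin(1.0 / x) - math.cos(1.0 / x)

# The difference quotient at 0 is h*sin(1/h), which tends to f'(0) = 0:
q = [(f(h) - f(0)) / h for h in (1e-2, 1e-4, 1e-6)]
print(q)   # entries bounded by |h|, shrinking to 0

# ... but f' itself oscillates near 0: at x = 1/(2n*pi) it is close to -1,
# while at x = 1/((2n+1)*pi) it is close to +1, so lim f'(x) as x -> 0 fails.
a = fprime(1 / (200 * math.pi))
b = fprime(1 / (201 * math.pi))
print(a, b)
```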
8.1.2. Derivatives as linear approximations. Another way to view Definition 8.1 is to write

    f(c + h) = f(c) + f′(c)h + r(h)

as the sum of a linear (or, strictly speaking, affine) approximation f(c) + f′(c)h of f(c + h) and a remainder r(h). In general, the remainder also depends on c, but we don’t show this explicitly since we’re regarding c as fixed.

As we prove in the following proposition, the differentiability of f at c is equivalent to the condition

    limh→0 r(h)/h = 0.

That is, the remainder r(h) approaches 0 faster than h, so the linear term in h provides a leading-order approximation to f(c + h) when h is small. We also write this condition on the remainder as

    r(h) = o(h) as h → 0,

pronounced “r is little-oh of h as h → 0.”

Graphically, this condition means that the graph of f near c is close to the line through the point (c, f(c)) with slope f′(c). Analytically, it means that the function

    h ↦ f(c + h) − f(c)
is approximated near c by the linear function h ↦ f′(c)h. Thus, f′(c) may be interpreted as a scaling factor by which a differentiable function f shrinks or stretches lengths near c.

If |f′(c)| < 1, then f shrinks the length of a small interval about c by (approximately) this factor; if |f′(c)| > 1, then f stretches the length of an interval by (approximately) this factor; if f′(c) > 0, then f preserves the orientation of the interval, meaning that it maps the left endpoint to the left endpoint of the image and the right endpoint to the right endpoint; if f′(c) < 0, then f reverses the orientation of the interval, meaning that it maps the left endpoint to the right endpoint of the image and vice versa.
We can use this description as a definition of the derivative.
Proposition 8.11. Suppose that f : (a, b) → R. Then f is differentiable at c ∈ (a, b) if and only if there exist a constant A ∈ R and a function r : (a − c, b − c) → R such that

    f(c + h) = f(c) + Ah + r(h),    limh→0 r(h)/h = 0.

In that case, A = f′(c).

Proof. First suppose that f is differentiable at c according to Definition 8.1, and define

    r(h) = f(c + h) − f(c) − f′(c)h.

Then

    limh→0 r(h)/h = limh→0 {[f(c + h) − f(c)]/h − f′(c)} = 0,

so the condition in the proposition holds with A = f′(c).

Conversely, suppose that f(c + h) = f(c) + Ah + r(h) where r(h)/h → 0 as h → 0. Then

    limh→0 [f(c + h) − f(c)]/h = limh→0 [A + r(h)/h] = A,

so f is differentiable at c with f′(c) = A.
Example 8.12. For Example 8.2 with f(x) = x², we get

    (c + h)² = c² + 2ch + h²,

so r(h) = h², which goes to zero at a quadratic rate as h → 0.
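Numerically, the little-oh condition is visible at a glance here: the ratio r(h)/h is simply h, so it shrinks linearly. The Python sketch below is an illustration added here; the point c and step sizes are arbitrary choices.

```python
# For f(x) = x^2 at c, the remainder is r(h) = h^2, so r(h)/h = h -> 0.
def f(x):
    return x * x

c, fprime_c = 2.0, 4.0
ratios = []
for h in (0.1, 0.01, 0.001):
    r = f(c + h) - f(c) - fprime_c * h   # remainder of the linear approximation
    ratios.append(r / h)
print(ratios)   # ~ [0.1, 0.01, 0.001]
```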
Example 8.13. For Example 8.4 with f(x) = 1/x, we get

    1/(c + h) = 1/c − (1/c²)h + r(h)

for c ≠ 0, where the quadratically small remainder is

    r(h) = h²/[c²(c + h)].
8.1.3. Left and right derivatives. For the most part, we will use derivatives that are defined only at the interior points of the domain of a function. Sometimes, however, it is convenient to use one-sided left or right derivatives that are defined at the endpoint of an interval.

Definition 8.14. Suppose that f : [a, b] → R. Then f is right-differentiable at a ≤ c < b with right derivative f′(c+) if

    limh→0+ [f(c + h) − f(c)]/h = f′(c+)

exists, and f is left-differentiable at a < c ≤ b with left derivative f′(c−) if

    limh→0− [f(c + h) − f(c)]/h = limh→0+ [f(c) − f(c − h)]/h = f′(c−).

A function is differentiable at a < c < b if and only if the left and right derivatives at c both exist and are equal.
Example 8.15. If f : [0, 1] → R is defined by f(x) = x², then

    f′(0+) = 0,    f′(1−) = 2.

These left and right derivatives remain the same if f is extended to a function defined on a larger domain, say

    f(x) = x²  if 0 ≤ x ≤ 1,
           1   if x > 1,
           1/x if x < 0.

For this extended function we have f′(1+) = 0, which is not equal to f′(1−), and f′(0−) does not exist, so the extended function is not differentiable at either 0 or 1.
Example 8.16. The absolute value function f(x) = |x| in Example 8.6 is left- and right-differentiable at 0 with left and right derivatives

    f′(0+) = 1,    f′(0−) = −1.

These are not equal, and f is not differentiable at 0.
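The one-sided quotients can be computed directly. The Python sketch below is an illustration added here; the step sizes are arbitrary choices.

```python
# One-sided difference quotients of f(x) = |x| at c = 0.
def q(h):
    return (abs(0.0 + h) - abs(0.0)) / h

right = [q(h) for h in (0.1, 0.001, 1e-6)]
left = [q(-h) for h in (0.1, 0.001, 1e-6)]
print(right)   # every right quotient equals 1
print(left)    # every left quotient equals -1
```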

8.2. Properties of the derivative
In this section, we prove some basic properties of differentiable functions.
8.2.1. Differentiability and continuity. First we discuss the relation between differentiability and continuity.
Theorem 8.17. If f : (a, b) → R is differentiable at c ∈ (a, b), then f is continuous at c.

Proof. If f is differentiable at c, then

    limh→0 [f(c + h) − f(c)] = limh→0 {[f(c + h) − f(c)]/h · h}
                            = limh→0 [f(c + h) − f(c)]/h · limh→0 h
                            = f′(c) · 0
                            = 0,

which implies that f is continuous at c.
For example, the sign function in Example 8.5 has a jump discontinuity at 0 so it cannot be differentiable at 0. The converse does not hold, and a continuous function needn’t be differentiable. The functions in Examples 8.6, 8.8, 8.9 are continuous but not differentiable at 0. Example 9.24 describes a function that is continuous on R but not differentiable anywhere.
In Example 8.10, the function is differentiable on R, but the derivative f′ is not continuous at 0. Thus, while a function f has to be continuous to be differentiable, if f is differentiable its derivative f′ need not be continuous. This leads to the following definition.

Definition 8.18. A function f : (a, b) → R is continuously differentiable on (a, b), written f ∈ C¹(a, b), if it is differentiable on (a, b) and f′ : (a, b) → R is continuous.
For example, the function f(x) = x² with derivative f′(x) = 2x is continuously differentiable on R, whereas the function in Example 8.10 is not continuously differentiable at 0. As this example illustrates, functions that are differentiable but not continuously differentiable may behave in rather pathological ways. On the other hand, the behavior of continuously differentiable functions, whose graphs have continuously varying tangent lines, is more-or-less consistent with what one expects.

8.2.2. Algebraic properties of the derivative. A fundamental property of the derivative is that it is a linear operation. In addition, we have the following product and quotient rules.
Theorem 8.19. If f, g : (a, b) → R are differentiable at c ∈ (a, b) and k ∈ R, then kf, f + g, and fg are differentiable at c with

    (kf)′(c) = kf′(c),
    (f + g)′(c) = f′(c) + g′(c),
    (fg)′(c) = f′(c)g(c) + f(c)g′(c).

Furthermore, if g(c) ≠ 0, then f/g is differentiable at c with

    (f/g)′(c) = [f′(c)g(c) − f(c)g′(c)]/g²(c).


Proof. The first two properties follow immediately from the linearity of limits stated in Theorem 6.34. For the product rule, we write

    (fg)′(c) = limh→0 [f(c + h)g(c + h) − f(c)g(c)]/h
             = limh→0 {[f(c + h) − f(c)]g(c + h) + f(c)[g(c + h) − g(c)]}/h
             = limh→0 [f(c + h) − f(c)]/h · limh→0 g(c + h) + f(c) · limh→0 [g(c + h) − g(c)]/h
             = f′(c)g(c) + f(c)g′(c),

where we have used the properties of limits in Theorem 6.34 and the fact that g is continuous at c, which follows from Theorem 8.17. The quotient rule follows by a similar argument, or by combining the product rule with the chain rule, which implies that (1/g)′ = −g′/g². (See Example 8.22 below.)
Example 8.20. We have 1′ = 0 and x′ = 1. Repeated application of the product rule implies that x^n is differentiable on R for every n ∈ N with

    (x^n)′ = nx^(n−1).

Alternatively, we can prove this result by induction: The formula holds for n = 1. Assuming that it holds for some n ∈ N, we get from the product rule that

    (x^(n+1))′ = (x · x^n)′ = 1 · x^n + x · nx^(n−1) = (n + 1)x^n,

and the result follows. It also follows by linearity that every polynomial function is differentiable on R, and from the quotient rule that every rational function is differentiable at every point where its denominator is nonzero. The derivatives are given by their usual formulae.
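The power rule can be checked numerically with a central difference quotient, which converges faster than the one-sided quotient. The Python sketch below is an illustration added here; the base point c and the step size h are arbitrary choices.

```python
# Central difference quotients [f(c+h) - f(c-h)]/(2h) approximate f'(c).
def deriv(f, c, h=1e-6):
    return (f(c + h) - f(c - h)) / (2 * h)

c = 1.5
for n in range(1, 6):
    approx = deriv(lambda x: x ** n, c)
    exact = n * c ** (n - 1)
    print(n, approx, exact)   # the two columns agree to many digits
```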

8.3. The chain rule
The chain rule states that the composition of differentiable functions is differentiable. The result is quite natural if one thinks in terms of derivatives as linear maps. If f is differentiable at c, it scales lengths by a factor f′(c), and if g is differentiable at f(c), it scales lengths by a factor g′(f(c)). Thus, the composition g ◦ f scales lengths at c by a factor g′(f(c)) · f′(c). Equivalently, the derivative of a composition is the composition of the derivatives (regarded as linear maps).
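This scaling heuristic is easy to test numerically. The Python sketch below is an illustration added here; the functions f(x) = x², g = sin and the point c = 0.7 are arbitrary choices.

```python
import math

# Check (g o f)'(c) = g'(f(c)) f'(c) with a central difference quotient.
def diff(F, c, h=1e-6):
    return (F(c + h) - F(c - h)) / (2 * h)

f = lambda x: x * x          # f'(x) = 2x
g = math.sin                 # g'(y) = cos(y)

c = 0.7
lhs = diff(lambda x: g(f(x)), c)     # derivative of the composition at c
rhs = math.cos(f(c)) * (2 * c)       # g'(f(c)) * f'(c)
print(lhs, rhs)   # the two values agree to many digits
```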
We will prove the chain rule by showing that the composition of remainder terms in the linear approximations of f and g leads to a similar remainder term in the linear approximation of g ◦ f . The argument is complicated by the fact that we have to evaluate the remainder of g at a point that depends on the remainder of f , but this complication should not obscure the simplicity of the final result.
Theorem 8.21 (Chain rule). Let f : A → R and g : B → R where A ⊂ R and f(A) ⊂ B, and suppose that c is an interior point of A and f(c) is an interior point of B. If f is differentiable at c and g is differentiable at f(c), then g ◦ f : A → R is differentiable at c and

    (g ◦ f)'(c) = g'(f(c)) f'(c).

Proof. Since f is differentiable at c, there is a function r(h) such that

    f(c + h) = f(c) + f'(c)h + r(h),    lim_{h→0} r(h)/h = 0,

and since g is differentiable at f(c), there is a function s(k) such that

    g(f(c) + k) = g(f(c)) + g'(f(c))k + s(k),    lim_{k→0} s(k)/k = 0.

It follows that

    (g ◦ f)(c + h) = g(f(c) + f'(c)h + r(h))
                   = g(f(c)) + g'(f(c)) · (f'(c)h + r(h)) + s(f'(c)h + r(h))
                   = g(f(c)) + g'(f(c))f'(c) · h + t(h)

where

    t(h) = g'(f(c)) · r(h) + s(φ(h)),    φ(h) = f'(c)h + r(h).

Since r(h)/h → 0 as h → 0, we have

    lim_{h→0} t(h)/h = lim_{h→0} s(φ(h))/h.

We claim that this limit exists and is zero, and then it follows from Proposition 8.11 that g ◦ f is differentiable at c with

    (g ◦ f)'(c) = g'(f(c)) f'(c).

To prove the claim, we use the facts that

    φ(h)/h → f'(c) as h → 0,    s(k)/k → 0 as k → 0.

Roughly speaking, we have φ(h) ∼ f'(c)h when h is small and therefore

    s(φ(h))/h ∼ s(f'(c)h)/h → 0    as h → 0.

In detail, let ε > 0 be given. We want to show that there exists δ > 0 such that

    |s(φ(h))|/|h| < ε    if 0 < |h| < δ.

First, choose δ_1 > 0 such that

    |r(h)|/|h| < |f'(c)| + 1    if 0 < |h| < δ_1.

If 0 < |h| < δ_1, then

    |φ(h)| ≤ |f'(c)||h| + |r(h)| < |f'(c)||h| + (|f'(c)| + 1)|h| < (2|f'(c)| + 1)|h|.

Next, choose η > 0 so that

    |s(k)|/|k| < ε/(2|f'(c)| + 1)    if 0 < |k| < η.

(We include a “1” in the denominator on the right-hand side to avoid a division by zero if f'(c) = 0.) Finally, define δ_2 > 0 by

    δ_2 = η/(2|f'(c)| + 1),

and let δ = min(δ_1, δ_2) > 0. If 0 < |h| < δ and φ(h) ≠ 0, then 0 < |φ(h)| < η, so

    |s(φ(h))| ≤ [ε/(2|f'(c)| + 1)] · |φ(h)| < ε|h|.

If φ(h) = 0, then s(φ(h)) = 0, so the inequality holds in that case also. This proves that

    lim_{h→0} s(φ(h))/h = 0.

Example 8.22. Suppose that f is differentiable at c and f(c) ≠ 0. Then g(y) = 1/y is differentiable at f(c), with g'(y) = −1/y^2 (see Example 8.4). It follows that the reciprocal function 1/f = g ◦ f is differentiable at c with

    (1/f)'(c) = g'(f(c)) f'(c) = −f'(c)/f(c)^2.

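The chain rule can also be illustrated numerically. In the sketch below (our own, not from the text) we take f(x) = x^2 and g(y) = 1/y, so that g ◦ f is x ↦ 1/x^2, and compare a difference quotient of the composition with g'(f(c)) f'(c); the helper `sym_diff` and the test point c = 3 are arbitrary choices.

```python
# Illustrative numerical check of (g o f)'(c) = g'(f(c)) * f'(c).
# The functions, the point c, and the helper are assumptions of this sketch.

def sym_diff(f, x, h=1e-6):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x**2        # f'(x) = 2x
g = lambda y: 1.0 / y     # g'(y) = -1/y**2
c = 3.0

lhs = sym_diff(lambda x: g(f(x)), c)   # derivative of the composition at c
rhs = (-1.0 / f(c)**2) * (2 * c)       # g'(f(c)) * f'(c) = -f'(c)/f(c)**2
assert abs(lhs - rhs) < 1e-6
```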
The chain rule gives an expression for the derivative of an inverse function. In terms of linear approximations, it states that if f scales lengths at c by a nonzero factor f'(c), then f⁻¹ scales lengths at f(c) by the factor 1/f'(c).

Proposition 8.23. Suppose that f : A → R is a one-to-one function on A ⊂ R with inverse f⁻¹ : B → R where B = f(A). Assume that f is differentiable at an interior point c ∈ A and f⁻¹ is differentiable at f(c), where f(c) is an interior point of B. Then f'(c) ≠ 0 and

    (f⁻¹)'(f(c)) = 1/f'(c).

Proof. The definition of the inverse implies that f⁻¹(f(x)) = x. Since f is differentiable at c and f⁻¹ is differentiable at f(c), the chain rule implies that

    (f⁻¹)'(f(c)) f'(c) = 1.

Dividing this equation by f'(c) ≠ 0, we get the result. Moreover, it follows that f⁻¹ cannot be differentiable at f(c) if f'(c) = 0.

Alternatively, setting d = f(c), we can write the result as

    (f⁻¹)'(d) = 1/f'(f⁻¹(d)).

Proposition 8.23 is not entirely satisfactory because it assumes the existence and differentiability of an inverse function. We will return to this question in
Section 8.7 below, but we end this section with some examples that illustrate the

necessity of the condition f'(c) ≠ 0 for the existence and differentiability of the inverse.

Example 8.24. Define f : R → R by f(x) = x^2. Then f'(0) = 0 and f is not invertible on any neighborhood of the origin, since it is non-monotone and not one-to-one. On the other hand, if f : (0, ∞) → (0, ∞) is defined by f(x) = x^2, then f'(x) = 2x ≠ 0 and the inverse function f⁻¹ : (0, ∞) → (0, ∞) is given by

    f⁻¹(y) = √y.

The formula for the inverse of the derivative gives

    (f⁻¹)'(x^2) = 1/f'(x) = 1/(2x),

or, writing x = f⁻¹(y),

    (f⁻¹)'(y) = 1/(2√y),

in agreement with Example 8.7.
Example 8.25. Define f : R → R by f(x) = x^3. Then f is strictly increasing, one-to-one, and onto. The inverse function f⁻¹ : R → R is given by f⁻¹(y) = y^(1/3). Then f'(0) = 0 and f⁻¹ is not differentiable at f(0) = 0. On the other hand, f⁻¹ is differentiable at non-zero points of R, with

    (f⁻¹)'(x^3) = 1/f'(x) = 1/(3x^2),

or, writing x = y^(1/3),

    (f⁻¹)'(y) = 1/(3y^(2/3)),

in agreement with Example 8.8.
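The formula (f⁻¹)'(y) = 1/(3y^(2/3)) can be checked against a difference quotient. Everything in this sketch (the helper `sym_diff`, the step size, the test point y = 8) is an illustrative choice of ours, not part of the text.

```python
# Numerical check of (f^{-1})'(y) = 1/f'(f^{-1}(y)) for f(x) = x**3,
# away from the non-differentiable point y = 0. Choices below are
# assumptions of this sketch.

def sym_diff(f, x, h=1e-6):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

finv = lambda y: y ** (1.0 / 3.0)   # inverse of f(x) = x**3 for y > 0
y = 8.0
x = finv(y)                          # x = 2, so f'(x) = 3*x**2 = 12
assert abs(sym_diff(finv, y) - 1.0 / (3 * x**2)) < 1e-6
```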

8.4. Extreme values
One of the most useful applications of the derivative is in locating the maxima and minima of functions.
Definition 8.26. Suppose that f : A → R. Then f has a global (or absolute) maximum at c ∈ A if f (x) ≤ f (c) for all x ∈ A, and f has a local (or relative) maximum at c ∈ A if there is a neighborhood U of c such that f (x) ≤ f (c) for all x ∈ A ∩ U .
Similarly, f has a global (or absolute) minimum at c ∈ A if f(x) ≥ f(c) for all x ∈ A, and f has a local (or relative) minimum at c ∈ A if there is a neighborhood U of c such that f(x) ≥ f(c) for all x ∈ A ∩ U.

If f has a (local or global) maximum or minimum at c ∈ A, then f is said to have a (local or global) extreme value at c.
Theorem 7.37 states that a continuous function on a compact set has a global maximum and minimum but does not say how to find them. The following fundamental result goes back to Fermat.
Theorem 8.27. If f : A ⊂ R → R has a local extreme value at an interior point c ∈ A and f is differentiable at c, then f'(c) = 0.
Proof. If f has a local maximum at c, then f(x) ≤ f(c) for all x in a δ-neighborhood (c − δ, c + δ) of c, so

    [f(c + h) − f(c)]/h ≤ 0    for all 0 < h < δ,

which implies that

    f'(c) = lim_{h→0+} [f(c + h) − f(c)]/h ≤ 0.

Moreover,

    [f(c + h) − f(c)]/h ≥ 0    for all −δ < h < 0,

which implies that

    f'(c) = lim_{h→0−} [f(c + h) − f(c)]/h ≥ 0.

It follows that f'(c) = 0. If f has a local minimum at c, then the signs in these inequalities are reversed, and we also conclude that f'(c) = 0.
For this result to hold, it is crucial that c is an interior point, since we look at the sign of the difference quotient of f on both sides of c. At an endpoint, we get the following inequality condition on the derivative. (Draw a graph!)
Proposition 8.28. Let f : [a, b] → R. If the right derivative of f exists at a, then: f'(a+) ≤ 0 if f has a local maximum at a; and f'(a+) ≥ 0 if f has a local minimum at a. Similarly, if the left derivative of f exists at b, then: f'(b−) ≥ 0 if f has a local maximum at b; and f'(b−) ≤ 0 if f has a local minimum at b.

Proof. If the right derivative of f exists at a, and f has a local maximum at a, then there exists δ > 0 such that f(x) ≤ f(a) for a ≤ x < a + δ, so

    f'(a+) = lim_{h→0+} [f(a + h) − f(a)]/h ≤ 0.

Similarly, if the left derivative of f exists at b, and f has a local maximum at b, then f(x) ≤ f(b) for b − δ < x ≤ b, so f'(b−) ≥ 0. The signs are reversed for local minima at the endpoints.
In searching for extreme values of a function, it is convenient to introduce the following classification of points in the domain of the function.

Definition 8.29. Suppose that f : A ⊂ R → R. An interior point c ∈ A such that f is not differentiable at c or f'(c) = 0 is called a critical point of f. An interior point where f'(c) = 0 is called a stationary point of f.
Theorem 8.27 limits the search for maxima or minima of a function f on A to the following points.
(1) Boundary points of A.
(2) Critical points of f :
(a) interior points where f is not differentiable;
(b) stationary points where f'(c) = 0.
Additional tests are required to determine which of these points gives local or global extreme values of f . In particular, a function need not attain an extreme value at a critical point.
Example 8.30. If f : [−1, 1] → R is the function

    f(x) = { x    if −1 ≤ x ≤ 0,
             2x   if 0 < x ≤ 1,

then x = 0 is a critical point since f is not differentiable at 0, but f does not attain a local extreme value at 0. The global maximum and minimum of f are attained at the endpoints x = 1 and x = −1, respectively, and f has no other local extreme values.

Example 8.31. If f : [−1, 1] → R is the function f(x) = x^3, then x = 0 is a critical point since f'(0) = 0, but f does not attain a local extreme value at 0. The global maximum and minimum of f are attained at the endpoints x = 1 and x = −1, respectively, and f has no other local extreme values.
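A short numerical sketch (ours, not the text's) makes Example 8.31 concrete: x = 0 is a stationary point of f(x) = x^3, yet f takes values below and above f(0) arbitrarily close to 0, so it is not a local extremum. The step size and sample points are arbitrary choices.

```python
# Sketch: a stationary point need not be an extreme point.

f = lambda x: x**3

# f'(0) = 0, estimated by a symmetric difference quotient:
h = 1e-6
assert abs((f(h) - f(-h)) / (2 * h)) < 1e-6

# yet 0 is neither a local maximum nor a local minimum:
assert f(-0.01) < f(0.0) < f(0.01)
```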

8.5. The mean value theorem
The mean value theorem is a key result that connects the global behavior of a function f : [a, b] → R, described by the difference f(b) − f(a), to its local behavior, described by the derivative f' : (a, b) → R. We begin by proving a special case.
Theorem 8.32 (Rolle). Suppose that f : [a, b] → R is continuous on the closed, bounded interval [a, b], differentiable on the open interval (a, b), and f (a) = f (b).
Then there exists a < c < b such that f'(c) = 0.

Proof. By the Weierstrass extreme value theorem, Theorem 7.37, f attains its global maximum and minimum values on [a, b]. If these are both attained at the endpoints, then f is constant, and f'(c) = 0 for every a < c < b. Otherwise, f attains at least one of its global maximum or minimum values at an interior point a < c < b. Theorem 8.27 implies that f'(c) = 0.
Note that we require continuity on the closed interval [a, b] but differentiability only on the open interval (a, b). This proof is deceptively simple, but the result is not trivial. It relies on the extreme value theorem, which in turn relies on the completeness of R. The theorem would not be true if we restricted attention to functions defined on the rationals Q.

The mean value theorem is an immediate consequence of Rolle’s theorem: for a general function f with f (a) = f (b), we subtract off a linear function to make the values of the resulting function equal at the endpoints.
Theorem 8.33 (Mean value). Suppose that f : [a, b] → R is continuous on the closed, bounded interval [a, b] and differentiable on the open interval (a, b). Then there exists a < c < b such that

    f'(c) = [f(b) − f(a)]/(b − a).

Proof. The function g : [a, b] → R defined by

    g(x) = f(x) − f(a) − [f(b) − f(a)]/(b − a) · (x − a)

is continuous on [a, b] and differentiable on (a, b) with

    g'(x) = f'(x) − [f(b) − f(a)]/(b − a).

Moreover, g(a) = g(b) = 0. Rolle's Theorem implies that there exists a < c < b such that g'(c) = 0, which proves the result.

Graphically, this result says that there is a point a < c < b at which the slope of the tangent line to the graph y = f(x) is equal to the slope of the chord between the endpoints (a, f(a)) and (b, f(b)).
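For a concrete function one can sometimes exhibit the point c whose existence the theorem asserts. The sketch below (our own illustration) does this for f(x) = x^2 on [0, 2]; for this particular f the mean value point is the midpoint of the interval, a closed form special to x^2, since in general the theorem only asserts existence.

```python
# Mean value theorem sketch for f(x) = x**2 on [a, b]: the chord slope
# equals f'(c) = 2c at the midpoint c = (a + b)/2. Interval is our choice.

f = lambda x: x**2
a, b = 0.0, 2.0

chord_slope = (f(b) - f(a)) / (b - a)    # slope of the chord = 2.0
c = (a + b) / 2.0                        # mean value point for x**2
assert a < c < b
assert abs(chord_slope - 2 * c) < 1e-12  # f'(c) = 2c matches the chord
```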
As a first application, we prove a converse to the obvious fact that the derivative of a constant function is zero.
Theorem 8.34. If f : (a, b) → R is differentiable on (a, b) and f'(x) = 0 for every a < x < b, then f is constant on (a, b).
Proof. Fix x_0 ∈ (a, b). The mean value theorem implies that for all x ∈ (a, b) with x ≠ x_0,

    f'(c) = [f(x) − f(x_0)]/(x − x_0)

for some c between x_0 and x. Since f'(c) = 0, it follows that f(x) = f(x_0) for all x ∈ (a, b), meaning that f is constant on (a, b).
Corollary 8.35. If f, g : (a, b) → R are differentiable on (a, b) and f'(x) = g'(x) for every a < x < b, then f(x) = g(x) + C for some constant C.
Proof. This follows from the previous theorem since (f − g)' = 0.
We can also use the mean value theorem to relate the monotonicity of a differentiable function with the sign of its derivative. (See Definition 7.54 for our terminology for increasing and decreasing functions.)
Theorem 8.36. Suppose that f : (a, b) → R is differentiable on (a, b). Then f is increasing if and only if f'(x) ≥ 0 for every a < x < b, and decreasing if and only if f'(x) ≤ 0 for every a < x < b. Furthermore, if f'(x) > 0 for every a < x < b then f is strictly increasing, and if f'(x) < 0 for every a < x < b then f is strictly decreasing.

Proof. If f is increasing and a < x < b, then

    [f(x + h) − f(x)]/h ≥ 0

for all sufficiently small h (positive or negative), so

    f'(x) = lim_{h→0} [f(x + h) − f(x)]/h ≥ 0.

Conversely, if f' ≥ 0 and a < x < y < b, then by the mean value theorem there exists x < c < y such that

    [f(y) − f(x)]/(y − x) = f'(c) ≥ 0,

which implies that f(x) ≤ f(y), so f is increasing. Moreover, if f'(c) > 0, we get f(x) < f(y), so f is strictly increasing.

The results for a decreasing function f follow in a similar way, or we can apply the previous results to the increasing function −f.
Note that although f' > 0 implies that f is strictly increasing, the fact that f is strictly increasing does not imply that f' > 0.
Example 8.37. The function f : R → R defined by f(x) = x^3 is strictly increasing on R, but f'(0) = 0.
If f is continuously differentiable and f'(c) > 0, then f'(x) > 0 for all x in a neighborhood of c and Theorem 8.36 implies that f is strictly increasing near c.
This conclusion may fail if f is not continuously differentiable at c.
Example 8.38. Define f : R → R by

    f(x) = { x/2 + x^2 sin(1/x)   if x ≠ 0,
             0                    if x = 0.

Then f is differentiable on R with

    f'(x) = { 1/2 − cos(1/x) + 2x sin(1/x)   if x ≠ 0,
              1/2                            if x = 0.

Every neighborhood of 0 includes intervals where f' < 0 or f' > 0, in which f is strictly decreasing or strictly increasing, respectively. Thus, despite the fact that f'(0) > 0, the function f is not strictly increasing in any neighborhood of 0. As a result, no local inverse of the function f exists on any neighborhood of 0.
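The sign changes of f' near 0 in this example occur at explicit points: at x = 1/(2πn) we have cos(1/x) = 1 and sin(1/x) = 0, so f'(x) = −1/2. The sketch below (our own illustration) evaluates f' at a few such points; the specific values of n are arbitrary choices.

```python
import math

# f'(x) = 1/2 - cos(1/x) + 2*x*sin(1/x) from Example 8.38 takes
# negative values arbitrarily close to 0. Sample points are our choice.

def fprime(x):
    return 0.5 - math.cos(1.0 / x) + 2 * x * math.sin(1.0 / x)

for n in (10, 100, 1000):
    x = 1.0 / (2 * math.pi * n)   # here cos(1/x) = 1, so f'(x) is about -1/2
    assert fprime(x) < 0

assert fprime(1.0 / math.pi) > 0  # while f' > 0 at other points near 0
```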

8.6. Taylor’s theorem
If f : (a, b) → R is differentiable on (a, b) and f' : (a, b) → R is differentiable, then we define the second derivative f'' : (a, b) → R of f as the derivative of f'. We define higher-order derivatives similarly. If f has derivatives f^(n) : (a, b) → R of all orders n ∈ N, then we say that f is infinitely differentiable on (a, b).
Taylor’s theorem gives an approximation for an (n + 1)-times differentiable function in terms of its Taylor polynomial of degree n.

Definition 8.39. Let f : (a, b) → R and suppose that f has n derivatives f', f'', ..., f^(n) : (a, b) → R on (a, b). The Taylor polynomial of degree n of f at a < c < b is

    P_n(x) = f(c) + f'(c)(x − c) + (1/2!) f''(c)(x − c)^2 + ··· + (1/n!) f^(n)(c)(x − c)^n.

Equivalently,

    P_n(x) = Σ_{k=0}^{n} a_k (x − c)^k,    a_k = (1/k!) f^(k)(c).

We call a_k the kth Taylor coefficient of f at c. The computation of the Taylor polynomials in the following examples is left as an exercise.
Example 8.40. If P(x) is a polynomial of degree n, then P_n(x) = P(x).
Example 8.41. The Taylor polynomial of degree n of e^x at x = 0 is

    P_n(x) = 1 + x + (1/2!)x^2 + ··· + (1/n!)x^n.

Example 8.42. The Taylor polynomial of degree 2n of cos x at x = 0 is

    P_2n(x) = 1 − (1/2!)x^2 + (1/4!)x^4 − ··· + (−1)^n (1/(2n)!) x^(2n).

We also have P_(2n+1) = P_2n since the Taylor coefficients of odd order are zero.
Example 8.43. The Taylor polynomial of degree 2n + 1 of sin x at x = 0 is

    P_(2n+1)(x) = x − (1/3!)x^3 + (1/5!)x^5 − ··· + (−1)^n (1/(2n+1)!) x^(2n+1).

We also have P_(2n+2) = P_(2n+1).
Example 8.44. The Taylor polynomial of degree n of 1/x at x = 1 is

    P_n(x) = 1 − (x − 1) + (x − 1)^2 − ··· + (−1)^n (x − 1)^n.
Example 8.45. The Taylor polynomial of degree n of log x at x = 1 is

    P_n(x) = (x − 1) − (1/2)(x − 1)^2 + (1/3)(x − 1)^3 − ··· + (−1)^(n+1) (1/n)(x − 1)^n.

We write f(x) = P_n(x) + R_n(x), where R_n is the error, or remainder, between f and its Taylor polynomial P_n. The next theorem is one version of Taylor's theorem, which gives an expression for the remainder due to Lagrange. It can be regarded as a generalization of the mean value theorem, which corresponds to the case n = 1. The idea of the proof is to subtract a suitable polynomial from the function and apply Rolle's theorem, just as we proved the mean value theorem by subtracting a suitable linear function.

Theorem 8.46 (Taylor with Lagrange Remainder). Suppose that f : (a, b) → R has n + 1 derivatives on (a, b) and let a < c < b. For every a < x < b, there exists ξ between c and x such that

    f(x) = f(c) + f'(c)(x − c) + (1/2!) f''(c)(x − c)^2 + ··· + (1/n!) f^(n)(c)(x − c)^n + R_n(x)

where

    R_n(x) = (1/(n+1)!) f^(n+1)(ξ)(x − c)^(n+1).

Proof. Fix x, c ∈ (a, b). For t ∈ (a, b), let

    g(t) = f(x) − f(t) − f'(t)(x − t) − (1/2!) f''(t)(x − t)^2 − ··· − (1/n!) f^(n)(t)(x − t)^n.

Then g(x) = 0 and

    g'(t) = −(1/n!) f^(n+1)(t)(x − t)^n.

Define

    h(t) = g(t) − ((x − t)/(x − c))^(n+1) g(c).

Then h(c) = h(x) = 0, so by Rolle's theorem, there exists a point ξ between c and x such that h'(ξ) = 0, which implies that

    g'(ξ) + (n + 1) (x − ξ)^n/(x − c)^(n+1) · g(c) = 0.

It follows from the expression for g' that

    (1/n!) f^(n+1)(ξ)(x − ξ)^n = (n + 1) (x − ξ)^n/(x − c)^(n+1) · g(c),

and using the expression for g in this equation, we get the result.
Note that the remainder term

    R_n(x) = (1/(n+1)!) f^(n+1)(ξ)(x − c)^(n+1)

has the same form as the (n+1)th term in the Taylor polynomial of f, except that the derivative is evaluated at a (typically unknown) intermediate point ξ between c and x, instead of at c.
Example 8.47. Let us prove that

    lim_{x→0} (1 − cos x)/x^2 = 1/2.

By Taylor's theorem,

    cos x = 1 − (1/2)x^2 + (1/4!)(cos ξ)x^4

for some ξ between 0 and x. It follows that for x ≠ 0,

    (1 − cos x)/x^2 − 1/2 = −(1/4!)(cos ξ)x^2.

Since |cos ξ| ≤ 1, we get

    |(1 − cos x)/x^2 − 1/2| ≤ (1/4!)x^2,

which implies that

    lim_{x→0} [(1 − cos x)/x^2 − 1/2] = 0.

Note that as well as proving the limit, Taylor's theorem gives an explicit upper bound for the difference between (1 − cos x)/x^2 and its limit 1/2. For example,

    |(1 − cos(0.1))/(0.1)^2 − 1/2| ≤ 1/2400.

Numerically, we have

    1/2 − (1 − cos(0.1))/(0.1)^2 ≈ 0.00041653,    1/2400 ≈ 0.00041667.

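The bound in Example 8.47 is easy to check numerically. The following sketch (our own, with arbitrarily chosen sample points) verifies the inequality |(1 − cos x)/x^2 − 1/2| ≤ x^2/4! and reproduces the two numbers quoted above.

```python
import math

# Check of the Lagrange remainder bound |(1 - cos x)/x**2 - 1/2| <= x**2/4!
# at a few sample points (our choices), plus the quoted x = 0.1 values.

for x in (0.5, 0.1, 0.01):
    err = abs((1 - math.cos(x)) / x**2 - 0.5)
    assert err <= x**2 / math.factorial(4)

# the specific numbers quoted in the text:
assert abs((0.5 - (1 - math.cos(0.1)) / 0.1**2) - 0.00041653) < 1e-8
assert abs(1.0 / 2400 - 0.00041667) < 1e-8
```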
In Section 12.7, we derive an alternative expression for the remainder R_n as an integral.

8.7. * The inverse function theorem
The inverse function theorem gives a sufficient condition for a differentiable function f to be locally invertible at a point c with differentiable inverse: namely, that f is continuously differentiable at c and f'(c) ≠ 0. Example 8.24 shows that one cannot expect the inverse of a differentiable function f to exist locally at c if f'(c) = 0, while Example 8.38 shows that the condition f'(c) ≠ 0 is not, on its own, sufficient to imply the existence of a local inverse.
Before stating the theorem, we give a precise definition of local invertibility.
Definition 8.48. A function f : A → R is locally invertible at an interior point c ∈ A if there exist open neighborhoods U of c and V of f(c) such that f|_U : U → V is one-to-one and onto, in which case f has a local inverse (f|_U)⁻¹ : V → U.
The following examples illustrate the definition.
Example 8.49. If f : R → R is the square function f(x) = x^2, then a local inverse at c = 2 with U = (1, 3) and V = (1, 9) is defined by

    (f|_U)⁻¹(y) = √y.

Similarly, a local inverse at c = −2 with U = (−3, −1) and V = (1, 9) is defined by

    (f|_U)⁻¹(y) = −√y.

In defining a local inverse at c, we require that it maps an open neighborhood V of f(c) onto an open neighborhood U of c; that is, we want (f|_U)⁻¹(y) to be “close” to c when y is “close” to f(c), not some more distant point that f also maps “close” to f(c). Thus, the one-to-one, onto function g defined by

    g : (1, 9) → (−2, −1) ∪ [2, 3),    g(y) = { −√y   if 1 < y < 4,
                                               √y    if 4 ≤ y < 9,

is not a local inverse of f at c = 2 in the sense of Definition 8.48, even though g(f(2)) = 2 and both compositions

    f ◦ g : (1, 9) → (1, 9),    g ◦ f : (−2, −1) ∪ [2, 3) → (−2, −1) ∪ [2, 3)

are identity maps, since U = (−2, −1) ∪ [2, 3) is not a neighborhood of 2.


Example 8.50. The function f : R → R defined by

    f(x) = { cos(1/x)   if x ≠ 0,
             0          if x = 0,

is locally invertible at every c ∈ R with c ≠ 0 and c ≠ 1/(nπ) for every n ∈ Z.
Theorem 8.51 (Inverse function). Suppose that f : A ⊂ R → R and c ∈ A is an interior point of A. If f is differentiable in a neighborhood of c, f'(c) ≠ 0, and f' is continuous at c, then there are open neighborhoods U of c and V of f(c) such that f has a local inverse (f|_U)⁻¹ : V → U. Furthermore, the local inverse function is differentiable at f(c) with derivative

    [(f|_U)⁻¹]'(f(c)) = 1/f'(c).
Proof. Suppose, for definiteness, that f'(c) > 0 (otherwise, consider −f). By the continuity of f', there exists an open interval U = (a, b) containing c on which f' > 0. It follows from Theorem 8.36 that f is strictly increasing on U. Writing

    V = f(U) = (f(a), f(b)),

we see that f|_U : U → V is one-to-one and onto, so f has a local inverse on V, which proves the first part of the theorem.

It remains to prove that the local inverse (f|_U)⁻¹, which we denote by f⁻¹ for short, is differentiable. First, since f is differentiable at c, we have

    f(c + h) = f(c) + f'(c)h + r(h)

where the remainder r satisfies

    lim_{h→0} r(h)/h = 0.

Since f'(c) > 0, there exists δ > 0 such that

    |r(h)| ≤ (1/2) f'(c)|h|    for |h| < δ.

It follows from the differentiability of f that, if |h| < δ,

    f'(c)|h| = |f(c + h) − f(c) − r(h)|
             ≤ |f(c + h) − f(c)| + |r(h)|
             ≤ |f(c + h) − f(c)| + (1/2) f'(c)|h|.

Absorbing the term proportional to |h| on the right-hand side of this inequality into the left-hand side and writing f(c + h) = f(c) + k, we find that

    (1/2) f'(c)|h| ≤ |k|    for |h| < δ.

Choosing δ > 0 small enough that (c − δ, c + δ) ⊂ U, we can express h in terms of k as

    h = f⁻¹(f(c) + k) − f⁻¹(f(c)).

Using this expression in the expansion of f evaluated at c + h,

    f(c + h) = f(c) + f'(c)h + r(h),

we get that

    f(c) + k = f(c) + f'(c)[f⁻¹(f(c) + k) − f⁻¹(f(c))] + r(h).

Simplifying and rearranging this equation, we obtain the corresponding expansion for f⁻¹ evaluated at f(c) + k,

    f⁻¹(f(c) + k) = f⁻¹(f(c)) + (1/f'(c)) k + s(k),

where the remainder s is given by

    s(k) = −(1/f'(c)) r(h) = −(1/f'(c)) r(f⁻¹(f(c) + k) − f⁻¹(f(c))).

Since f'(c)|h|/2 ≤ |k|, it follows that

    |s(k)|/|k| ≤ (2/f'(c)^2) · |r(h)|/|h|.

Therefore, by the “sandwich” theorem and the fact that h → 0 as k → 0,

    lim_{k→0} |s(k)|/|k| = 0.

This result proves that f⁻¹ is differentiable at f(c) with

    (f⁻¹)'(f(c)) = 1/f'(c).

The expression for the derivative of the inverse also follows from Proposition 8.23, but only once we know that f⁻¹ is differentiable at f(c).
One can show that Theorem 8.51 remains true under the weaker hypothesis that the derivative exists and is nonzero in an open neighborhood of c, but in practice, we almost always apply the theorem to continuously differentiable functions.
The inverse function theorem generalizes to functions of several variables, f : A ⊂ R^n → R^n, with a suitable generalization of the derivative of f at c as the linear map f'(c) : R^n → R^n that approximates f near c. A different proof of the existence of a local inverse is required in that case, since one cannot use monotonicity arguments. As an example of the application of the inverse function theorem, we consider a simple problem from bifurcation theory.
Example 8.52. Consider the transcendental equation

    y = x − k(e^x − 1),

where k ∈ R is a constant parameter. Suppose that we want to solve for x ∈ R given y ∈ R. If y = 0, then an obvious solution is x = 0. The inverse function theorem applied to the continuously differentiable function f(x; k) = x − k(e^x − 1) implies that there are neighborhoods U, V of 0 (depending on k) such that the equation has a unique solution x ∈ U for every y ∈ V provided that the derivative

[Figure 2. Graph of y = f(x; k) for the function in Example 8.52: (a) k = 0.5 (green); (b) k = 1 (blue); (c) k = 1.5 (red). When y is sufficiently close to zero, there is a unique solution for x in some neighborhood of zero unless k = 1.]

of f with respect to x at 0, given by f_x(0; k) = 1 − k, is non-zero, i.e., provided that k ≠ 1 (see Figure 2).

[Figure 3. Plot of the solutions for x of the nonlinear equation x = k(e^x − 1) as a function of the parameter k (see Example 8.52). The point (x, k) = (0, 1) where the two solution branches cross is called a bifurcation point.]

Alternatively, we can fix a value of y, say y = 0, and ask how the solutions of the corresponding equation for x,

    x − k(e^x − 1) = 0,

depend on the parameter k. Figure 3 plots the solutions for x as a function of k for 0.2 ≤ k ≤ 2. The equation has two different solutions for x unless k = 1. The branch of nonzero solutions crosses the branch of zero solutions at the point (x, k) = (0, 1), called a bifurcation point. The implicit function theorem, which is a generalization of the inverse function theorem, implies that a necessary condition for a solution (x_0, k_0) of the equation f(x; k) = 0 to be a bifurcation point, meaning that the equation fails to have a unique solution branch x = g(k) in some neighborhood of (x_0, k_0), is that f_x(x_0; k_0) = 0.
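As a hypothetical numerical companion to this example, the sketch below locates the nonzero solution branch for one parameter value, k = 1.5, by bisection; the bracketing interval is chosen by inspecting the graph and is an assumption of this sketch.

```python
import math

# Bisection for the nonzero root of g(x) = x - k*(e**x - 1) with k = 1.5.
# The bracket [-1.0, -0.4] satisfies g(-1.0) < 0 < g(-0.4) and is our choice.

k = 1.5
g = lambda x: x - k * (math.exp(x) - 1.0)

lo, hi = -1.0, -0.4
assert g(lo) < 0 < g(hi)
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if g(mid) < 0:
        lo = mid
    else:
        hi = mid
root = 0.5 * (lo + hi)

assert abs(g(root)) < 1e-9
assert root < -0.5   # a solution distinct from the trivial branch x = 0
```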

8.8. * L'Hôpital's rule

In this section, we prove a rule (much beloved by calculus students) for the evaluation of indeterminate limits of the form 0/0 or ∞/∞. Our proof uses the following generalization of the mean value theorem.
Theorem 8.53 (Cauchy mean value). Suppose that f, g : [a, b] → R are continuous on the closed, bounded interval [a, b] and differentiable on the open interval (a, b). Then there exists a < c < b such that

    f'(c)[g(b) − g(a)] = [f(b) − f(a)] g'(c).

Proof. The function h : [a, b] → R defined by

    h(x) = [f(x) − f(a)][g(b) − g(a)] − [f(b) − f(a)][g(x) − g(a)]

is continuous on [a, b] and differentiable on (a, b) with

    h'(x) = f'(x)[g(b) − g(a)] − [f(b) − f(a)] g'(x).

Moreover, h(a) = h(b) = 0. Rolle's Theorem implies that there exists a < c < b such that h'(c) = 0, which proves the result.
If g(x) = x, then this theorem reduces to the usual mean value theorem (Theorem 8.33). Next, we state one form of l'Hôpital's rule.

Theorem 8.54 (l'Hôpital's rule: 0/0). Suppose that f, g : (a, b) → R are differentiable functions on a bounded open interval (a, b) such that g'(x) ≠ 0 for x ∈ (a, b) and

    lim_{x→a+} f(x) = 0,    lim_{x→a+} g(x) = 0.

Then

    lim_{x→a+} f'(x)/g'(x) = L    implies that    lim_{x→a+} f(x)/g(x) = L.

Proof. We may extend f, g : [a, b) → R to continuous functions on [a, b) by defining f(a) = g(a) = 0. If a < x < b, then by the mean value theorem, there exists a < c < x such that

    g(x) = g(x) − g(a) = g'(c)(x − a) ≠ 0,

so g ≠ 0 on (a, b). Moreover, by the Cauchy mean value theorem (Theorem 8.53), there exists a < c < x such that

    f(x)/g(x) = [f(x) − f(a)]/[g(x) − g(a)] = f'(c)/g'(c).


Since c → a+ as x → a+, the result follows. (In fact, since a < c < x, the δ that “works” for f'/g' also “works” for f/g.)
Example 8.55. Using l'Hôpital's rule twice (verify that all of the hypotheses are satisfied!), we find that

    lim_{x→0+} (1 − cos x)/x^2 = lim_{x→0+} (sin x)/(2x) = lim_{x→0+} (cos x)/2 = 1/2.
Analogous results and proofs apply to left limits (x → a−), two-sided limits (x → a), and infinite limits (x → ∞ or x → −∞). Alternatively, one can reduce these limits to the right limit considered in Theorem 8.54.
For example, suppose that f, g : (a, ∞) → R are differentiable, g' ≠ 0, and f(x) → 0, g(x) → 0 as x → ∞. Assuming that a > 0 without loss of generality, we define F, G : (0, 1/a) → R by

    F(t) = f(1/t),    G(t) = g(1/t).

The chain rule implies that

    F'(t) = −(1/t^2) f'(1/t),    G'(t) = −(1/t^2) g'(1/t).

Replacing limits as x → ∞ by equivalent limits as t → 0+ and applying Theorem 8.54 to F, G, all of whose hypotheses are satisfied if the limit of f'(x)/g'(x) as x → ∞ exists, we get

    lim_{x→∞} f(x)/g(x) = lim_{t→0+} F(t)/G(t) = lim_{t→0+} F'(t)/G'(t) = lim_{x→∞} f'(x)/g'(x).

A less straightforward generalization is to the case when g and possibly f have infinite limits as x → a+. In that case, we cannot simply extend f and g by continuity to the point a. Instead, we introduce two points a < x < y < b and consider the limits x → a+ followed by y → a+.
Theorem 8.56 (l'Hôpital's rule: ∞/∞). Suppose that f, g : (a, b) → R are differentiable functions on a bounded open interval (a, b) such that g'(x) ≠ 0 for x ∈ (a, b) and

    lim_{x→a+} |g(x)| = ∞.

Then

    lim_{x→a+} f'(x)/g'(x) = L    implies that    lim_{x→a+} f(x)/g(x) = L.

Proof. Since |g(x)| → ∞ as x → a+, we have g ≠ 0 near a, and we may assume without loss of generality that g ≠ 0 on (a, b). If a < x < y < b, then the mean value theorem implies that g(x) − g(y) ≠ 0, since g' ≠ 0, and the Cauchy mean value theorem implies that there exists x < c < y such that

    [f(x) − f(y)]/[g(x) − g(y)] = f'(c)/g'(c).

We may therefore write

    f(x)/g(x) = [f(x) − f(y)]/[g(x) − g(y)] · [g(x) − g(y)]/g(x) + f(y)/g(x)
              = f'(c)/g'(c) · [1 − g(y)/g(x)] + f(y)/g(x).

It follows that

    |f(x)/g(x) − L| ≤ |f'(c)/g'(c) − L| + |f'(c)/g'(c)| · |g(y)/g(x)| + |f(y)/g(x)|.

Given ε > 0, choose δ > 0 such that

    |f'(c)/g'(c) − L| < ε    for a < c < a + δ.

Then, since a < c < y, we have for all a < x < y < a + δ that

    |f(x)/g(x) − L| < ε + (|L| + ε)|g(y)/g(x)| + |f(y)/g(x)|.

Fixing y, taking the lim sup of this inequality as x → a+, and using the assumption that |g(x)| → ∞, we find that

    lim sup_{x→a+} |f(x)/g(x) − L| ≤ ε.

Since ε > 0 is arbitrary, we have

    lim sup_{x→a+} |f(x)/g(x) − L| = 0,

which proves the result.
Alternatively, instead of using the lim sup, we can verify the limit explicitly by an “ε/3”-argument. Given ε > 0, choose η > 0 such that

    |f'(c)/g'(c) − L| < ε/3    for a < c < a + η,

choose a < y < a + η, and let δ_1 = y − a > 0. Next, choose δ_2 > 0 such that

    |g(x)| > (3/ε)(|L| + ε/3)|g(y)|    for a < x < a + δ_2,

and choose δ_3 > 0 such that

    |g(x)| > (3/ε)|f(y)|    for a < x < a + δ_3.

Let δ = min(δ_1, δ_2, δ_3) > 0. Then for a < x < a + δ, we have

    |f(x)/g(x) − L| ≤ |f'(c)/g'(c) − L| + |f'(c)/g'(c)| · |g(y)/g(x)| + |f(y)/g(x)| < ε/3 + ε/3 + ε/3 = ε,

which proves the result.

We often use this result when both f (x) and g(x) diverge to infinity as x → a+ , but no assumption on the behavior of f (x) is required.
As for the previous theorem, analogous results and proofs apply to other limits (x → a−, x → a, or x → ±∞). There are also versions of l'Hôpital's rule that imply the divergence of f(x)/g(x) to ±∞, but we consider here only the case of a finite limit L.
Example 8.57. Since e^x → ∞ as x → ∞, we get by l'Hôpital's rule that

    lim_{x→∞} x/e^x = lim_{x→∞} 1/e^x = 0.

Similarly, since x → ∞ as x → ∞, we get by l'Hôpital's rule that

    lim_{x→∞} (log x)/x = lim_{x→∞} (1/x)/1 = 0.

That is, e^x grows faster than x and log x grows slower than x as x → ∞. We also write these limits using “little oh” notation as x = o(e^x) and log x = o(x) as x → ∞.
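The growth comparisons x = o(e^x) and log x = o(x) are easy to observe numerically; the sample points in the sketch below are arbitrary choices made for illustration.

```python
import math

# Numerical illustration (not a proof) that x/e**x -> 0 and log(x)/x -> 0
# as x -> infinity, at a few sample points of our choosing.

for x in (10.0, 50.0, 100.0):
    assert x / math.exp(x) < 1.0 / x         # x/e**x decays faster than 1/x
    assert 0 < math.log(x) / x < 10.0 / x    # log x grows much slower than x
```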
Finally, we note that one cannot use l'Hôpital's rule “in reverse” to deduce that f'/g' has a limit if f/g has a limit.
Example 8.58. Let f(x) = x + sin x and g(x) = x. Then f(x), g(x) → ∞ as x → ∞ and

    lim_{x→∞} f(x)/g(x) = lim_{x→∞} [1 + (sin x)/x] = 1,

but the limit

    lim_{x→∞} f'(x)/g'(x) = lim_{x→∞} (1 + cos x)

does not exist.

Chapter 9

Sequences and Series of
Functions

In this chapter, we define and study the convergence of sequences and series of functions. There are many different ways to define the convergence of a sequence of functions, and different definitions lead to inequivalent types of convergence. We consider here two basic types: pointwise and uniform convergence.

9.1. Pointwise convergence
Pointwise convergence defines the convergence of functions in terms of the convergence of their values at each point of their domain.
Definition 9.1. Suppose that (f_n) is a sequence of functions f_n : A → R and f : A → R. Then f_n → f pointwise on A if f_n(x) → f(x) as n → ∞ for every x ∈ A.

We say that the sequence (f_n) converges pointwise if it converges pointwise to some function f, in which case

    f(x) = lim_{n→∞} f_n(x).

Pointwise convergence is, perhaps, the most obvious way to define the convergence of functions, and it is one of the most important. Nevertheless, as the following examples illustrate, it is not as well-behaved as one might initially expect.
Example 9.2. Suppose that fn : (0, 1) → R is defined by

fn(x) = n/(nx + 1).

Then, since x ≠ 0,

lim_{n→∞} fn(x) = lim_{n→∞} 1/(x + 1/n) = 1/x,

so fn → f pointwise, where f : (0, 1) → R is given by f(x) = 1/x. We have |fn(x)| < n for all x ∈ (0, 1), so each fn is bounded on (0, 1), but the pointwise limit f is not. Thus, pointwise convergence does not, in general, preserve boundedness.

Example 9.3. Suppose that fn : [0, 1] → R is defined by fn(x) = x^n. If 0 ≤ x < 1, then x^n → 0 as n → ∞, while if x = 1, then x^n → 1 as n → ∞. So fn → f pointwise, where

f(x) = 0 if 0 ≤ x < 1,  f(x) = 1 if x = 1.

Although each fn is continuous on [0, 1], the pointwise limit f is not (it is discontinuous at 1). Thus, pointwise convergence does not, in general, preserve continuity.
Example 9.4. Define fn : [0, 1] → R by

fn(x) = 2n²x if 0 ≤ x ≤ 1/(2n),
fn(x) = 2n²(1/n − x) if 1/(2n) < x < 1/n,
fn(x) = 0 if 1/n ≤ x ≤ 1.

If 0 < x ≤ 1, then fn(x) = 0 for all n ≥ 1/x, so fn(x) → 0 as n → ∞; and if x = 0, then fn(x) = 0 for all n, so fn(x) → 0 also. It follows that fn → 0 pointwise on [0, 1]. This is the case even though max fn = n → ∞ as n → ∞. Thus, a pointwise convergent sequence (fn) of functions need not be uniformly bounded (that is, bounded independently of n), even if it converges to zero.
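The behavior in Example 9.4 is easy to check numerically. The following sketch (the helper name `f` and the sample points are ours, not from the text) evaluates the tent functions at a fixed point and at their peak:

```python
def f(n, x):
    # The piecewise "tent" function of Example 9.4: a spike of height n
    # supported on [0, 1/n].
    if 0 <= x <= 1 / (2 * n):
        return 2 * n**2 * x
    elif 1 / (2 * n) < x < 1 / n:
        return 2 * n**2 * (1 / n - x)
    else:
        return 0.0

# At any fixed x > 0, f(n, x) = 0 once n >= 1/x, so fn -> 0 pointwise.
print(f(100, 0.5))      # 0.0
# But the peak value f(n, 1/(2n)) = n is unbounded in n.
print(f(100, 1 / 200))  # 100.0
```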
Example 9.5. Define fn : R → R by

fn(x) = (sin nx)/n.

Then fn → 0 pointwise on R. The sequence (fn′) of derivatives fn′(x) = cos nx does not converge pointwise on R; for example, fn′(π) = (−1)^n does not converge as n → ∞. Thus, in general, one cannot differentiate a pointwise convergent sequence. This behavior isn't limited to pointwise convergent sequences, and happens because the derivative of a small, rapidly oscillating function can be large.

Example 9.6. Define fn : R → R by

fn(x) = x²/√(x² + 1/n).

If x ≠ 0, then

lim_{n→∞} x²/√(x² + 1/n) = x²/|x| = |x|,

while fn(0) = 0 for all n ∈ N, so fn → |x| pointwise on R. Moreover,

fn′(x) = (x³ + 2x/n)/(x² + 1/n)^{3/2} → 1 if x > 0, 0 if x = 0, −1 if x < 0.

The pointwise limit |x| isn't differentiable at 0 even though all of the fn are differentiable on R and the derivatives fn′ converge pointwise on R. (The fn's "round off" the corner in the absolute value function.)
Example 9.7. Define fn : R → R by

fn(x) = (1 + x/n)^n.

Then, by the limit formula for the exponential, fn → e^x pointwise on R.
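As a quick numerical sketch of Example 9.7 (the helper name and sample values below are ours), the values (1 + x/n)^n approach e^x as n grows:

```python
import math

def f(n, x):
    # fn(x) = (1 + x/n)^n, which converges pointwise to e^x.
    return (1 + x / n) ** n

for n in [10, 100, 10**6]:
    print(f(n, 2.0))  # approaches e^2 = 7.389056...

print(abs(f(10**6, 2.0) - math.exp(2.0)) < 1e-4)  # True
```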

9.2. Uniform convergence
In this section, we introduce a stronger notion of convergence of functions than pointwise convergence, called uniform convergence. The difference between pointwise convergence and uniform convergence is analogous to the difference between continuity and uniform continuity.
Definition 9.8. Suppose that (fn) is a sequence of functions fn : A → R and f : A → R. Then fn → f uniformly on A if, for every ε > 0, there exists N ∈ N such that n > N implies that |fn(x) − f(x)| < ε for all x ∈ A.
When the domain A of the functions is understood, we will often say fn → f uniformly instead of uniformly on A.
The crucial point in this definition is that N depends only on ε and not on x ∈ A, whereas for a pointwise convergent sequence N may depend on both ε and x. A uniformly convergent sequence is always pointwise convergent (to the same limit), but the converse is not true. If a sequence converges pointwise, it may happen that for some ε > 0 one needs to choose arbitrarily large N's for different points x ∈ A, meaning that the sequences of values converge arbitrarily slowly on A. In that case a pointwise convergent sequence of functions is not uniformly convergent.

Example 9.9. The sequence fn(x) = x^n in Example 9.3 converges pointwise on [0, 1] but not uniformly on [0, 1]. For 0 ≤ x < 1, we have

|fn(x) − f(x)| = x^n.

If 0 < ε < 1, we cannot make x^n < ε for all 0 ≤ x < 1 however large we choose n. The problem is that x^n converges to 0 at an arbitrarily slow rate for x sufficiently close to 1. There is no difficulty in the rate of convergence at 1 itself, since fn(1) = 1 for every n ∈ N. As we will show, the uniform limit of continuous functions is continuous, so since the pointwise limit of the continuous functions fn is discontinuous, the sequence cannot converge uniformly on [0, 1]. The sequence does, however, converge uniformly to 0 on [0, b] for every 0 ≤ b < 1; given ε > 0, we take N large enough that b^N < ε.
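This can be seen numerically by comparing worst-case errors over a grid (a sketch with our own helper and grid sizes): on [0, b] with b < 1 the supremum of x^n shrinks to 0, while on points approaching 1 it does not.

```python
def sup_on_grid(n, b, points=1000):
    # Approximate sup of x^n over [0, b] on a uniform grid;
    # the maximum is attained at x = b, so this equals b**n.
    return max((b * k / points) ** n for k in range(points + 1))

print(sup_on_grid(50, 0.9))    # about 0.005: uniform convergence on [0, 0.9]
print(sup_on_grid(50, 0.999))  # about 0.95: no uniform convergence on [0, 1)
```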


Example 9.10. The pointwise convergent sequence in Example 9.4 does not converge uniformly. If it did, it would have to converge to the pointwise limit 0, but

fn(1/(2n)) = n,

so for no ε > 0 does there exist an N ∈ N such that |fn(x) − 0| < ε for all x ∈ A and n > N, since this inequality fails if x = 1/(2n) and n ≥ ε.
Example 9.11. The functions in Example 9.5 converge uniformly to 0 on R, since

|fn(x)| = |sin nx|/n ≤ 1/n,

so |fn(x) − 0| < ε for all x ∈ R if n > 1/ε.
9.3. Cauchy condition for uniform convergence
The Cauchy condition in Definition 3.45 provides a necessary and sufficient condition for a sequence of real numbers to converge. There is an analogous uniform
Cauchy condition that provides a necessary and sufficient condition for a sequence of functions to converge uniformly.
Definition 9.12. A sequence (fn) of functions fn : A → R is uniformly Cauchy on A if for every ε > 0 there exists N ∈ N such that m, n > N implies that |fm(x) − fn(x)| < ε for all x ∈ A.
The key part of the following proof is the argument to show that a pointwise convergent, uniformly Cauchy sequence converges uniformly.
Theorem 9.13. A sequence (fn ) of functions fn : A → R converges uniformly on
A if and only if it is uniformly Cauchy on A.
Proof. Suppose that (fn) converges uniformly to f on A. Then, given ε > 0, there exists N ∈ N such that

|fn(x) − f(x)| < ε/2 for all x ∈ A if n > N.

It follows that if m, n > N then

|fm(x) − fn(x)| ≤ |fm(x) − f(x)| + |f(x) − fn(x)| < ε for all x ∈ A,

which shows that (fn) is uniformly Cauchy.

Conversely, suppose that (fn) is uniformly Cauchy. Then for each x ∈ A, the real sequence (fn(x)) is Cauchy, so it converges by the completeness of R. We define f : A → R by

f(x) = lim_{n→∞} fn(x),

and then fn → f pointwise.

To prove that fn → f uniformly, let ε > 0. Since (fn) is uniformly Cauchy, we can choose N ∈ N (depending only on ε) such that

|fm(x) − fn(x)| < ε/2 for all x ∈ A if m, n > N.

Let n > N and x ∈ A. Then for every m > N we have

|fn(x) − f(x)| ≤ |fn(x) − fm(x)| + |fm(x) − f(x)| < ε/2 + |fm(x) − f(x)|.

Since fm(x) → f(x) as m → ∞, we can choose m > N (depending on x, but it doesn't matter since m doesn't appear in the final result) such that

|fm(x) − f(x)| < ε/2.

It follows that if n > N, then

|fn(x) − f(x)| < ε for all x ∈ A,

which proves that fn → f uniformly.

Alternatively, we can take the limit as m → ∞ in the uniform Cauchy condition to get, for all x ∈ A and n > N, that

|f(x) − fn(x)| = lim_{m→∞} |fm(x) − fn(x)| ≤ ε/2 < ε.

9.4. Properties of uniform convergence
In this section we prove that, unlike pointwise convergence, uniform convergence preserves boundedness and continuity. Uniform convergence does not preserve differentiability any better than pointwise convergence. Nevertheless, we give a result that allows us to differentiate a convergent sequence; the key assumption is that the derivatives converge uniformly.
9.4.1. Boundedness. First, we consider the uniform convergence of bounded functions.

Theorem 9.14. Suppose that fn : A → R is bounded on A for every n ∈ N and fn → f uniformly on A. Then f : A → R is bounded on A.
Proof. Taking ε = 1 in the definition of uniform convergence, we find that there exists N ∈ N such that

|fn(x) − f(x)| < 1 for all x ∈ A if n > N.

Choose some n > N. Then, since fn is bounded, there is a constant M ≥ 0 such that

|fn(x)| ≤ M for all x ∈ A.

It follows that

|f(x)| ≤ |f(x) − fn(x)| + |fn(x)| < 1 + M for all x ∈ A,

meaning that f is bounded on A.
In particular, it follows that if a sequence of bounded functions converges pointwise to an unbounded function, then the convergence is not uniform.


Example 9.15. The sequence of functions fn : (0, 1) → R in Example 9.2, defined by

fn(x) = n/(nx + 1),

cannot converge uniformly on (0, 1), since each fn is bounded on (0, 1), but the pointwise limit f(x) = 1/x is not. The sequence (fn) does, however, converge uniformly to f on every interval [a, 1) with 0 < a < 1. To prove this, we estimate for a ≤ x < 1 that

|fn(x) − f(x)| = |n/(nx + 1) − 1/x| = 1/(x(nx + 1)) < 1/(nx²) ≤ 1/(na²).

Thus, given ε > 0, choose N = 1/(a²ε). Then

|fn(x) − f(x)| < ε for all x ∈ [a, 1) if n > N,

which proves that fn → f uniformly on [a, 1). Note that

|f(x)| ≤ 1/a for all x ∈ [a, 1),

so the uniform limit f is bounded on [a, 1), as Theorem 9.14 requires.
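A numerical sketch of this estimate (the grid and helper name are ours): the worst error of fn on [a, 1) stays below the bound 1/(na²).

```python
def err(n, x):
    # |fn(x) - f(x)| for fn(x) = n/(nx + 1) and f(x) = 1/x
    return abs(n / (n * x + 1) - 1 / x)

a, n = 0.25, 1000
grid = [a + (1 - a) * k / 10**4 for k in range(10**4)]
worst = max(err(n, x) for x in grid)
print(worst <= 1 / (n * a**2))  # True: matches the bound 1/(n a^2)
```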
9.4.2. Continuity. One of the most important properties of uniform convergence is that it preserves continuity. We use an "ε/3" argument to get the continuity of the uniform limit f from the continuity of the fn.
Theorem 9.16. If a sequence (fn ) of continuous functions fn : A → R converges uniformly on A ⊂ R to f : A → R, then f is continuous on A.
Proof. Suppose that c ∈ A and let ε > 0. Then, for every n ∈ N,

|f(x) − f(c)| ≤ |f(x) − fn(x)| + |fn(x) − fn(c)| + |fn(c) − f(c)|.

By the uniform convergence of (fn), we can choose n ∈ N such that

|fn(x) − f(x)| < ε/3 for all x ∈ A,

and for such an n it follows that

|f(x) − f(c)| < |fn(x) − fn(c)| + 2ε/3.

(Here, we use the fact that fn is close to f at both x and c, where x is an arbitrary point in a neighborhood of c; this is where we use the uniform convergence in a crucial way.)

Since fn is continuous on A, there exists δ > 0 such that

|fn(x) − fn(c)| < ε/3 if |x − c| < δ and x ∈ A,

which implies that

|f(x) − f(c)| < ε if |x − c| < δ and x ∈ A.

This proves that f is continuous.


This result can be interpreted as justifying an "exchange in the order of limits":

lim_{n→∞} lim_{x→c} fn(x) = lim_{x→c} lim_{n→∞} fn(x).

Such exchanges of limits always require some sort of condition for their validity; in this case, the uniform convergence of fn to f is sufficient, but pointwise convergence is not.
It follows from Theorem 9.16 that if a sequence of continuous functions converges pointwise to a discontinuous function, as in Example 9.3, then the convergence is not uniform. The converse is not true, however, and the pointwise limit of a sequence of continuous functions may be continuous even if the convergence is not uniform, as in Example 9.4.
9.4.3. Differentiability. The uniform convergence of differentiable functions does not, in general, imply anything about the convergence of their derivatives or the differentiability of their limit. As noted above, this is because the values of two functions may be close together while the values of their derivatives are far apart (if, for example, one function varies slowly while the other oscillates rapidly, as in Example 9.5). Thus, we have to impose strong conditions on a sequence of functions and their derivatives if we hope to prove that fn → f implies fn′ → f′.
The following example shows that the limit of the derivatives need not equal the derivative of the limit even if a sequence of differentiable functions converges uniformly and their derivatives converge pointwise.
Example 9.17. Consider the sequence (fn) of functions fn : R → R defined by

fn(x) = x/(1 + nx²).

Then fn → 0 uniformly on R. To see this, we write

|fn(x)| = (1/√n) · √n|x|/(1 + nx²) = (1/√n) · t/(1 + t²),

where t = √n|x|. We have

t/(1 + t²) ≤ 1/2 for all t ∈ R,

since (1 − t)² ≥ 0 implies that 2t ≤ 1 + t². Using this inequality, we get

|fn(x)| ≤ 1/(2√n) for all x ∈ R.

Hence, given ε > 0, choose N = 1/(4ε²). Then

|fn(x)| < ε for all x ∈ R if n > N,

which proves that (fn) converges uniformly to 0 on R. (Alternatively, we could get the same result by using calculus to compute the maximum value of |fn| on R.)

Each fn is differentiable with

fn′(x) = (1 − nx²)/(1 + nx²)².


It follows that fn′ → g pointwise as n → ∞, where

g(x) = 0 if x ≠ 0,  g(x) = 1 if x = 0.

The convergence is not uniform, since g is discontinuous at 0. Thus, fn → 0 uniformly, but fn′(0) → 1, so the limit of the derivatives is not the derivative of the limit.

However, we do get a useful result if we strengthen the assumptions and require that the derivatives converge uniformly, not just pointwise. The proof involves a slightly tricky application of the mean value theorem.
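The two claims in Example 9.17 can be checked numerically (the helper names and sample grid are ours): sup |fn| ≤ 1/(2√n), while fn′(0) = 1 for every n.

```python
import math

def f(n, x):
    return x / (1 + n * x**2)

def fprime(n, x):
    return (1 - n * x**2) / (1 + n * x**2) ** 2

n = 10**6
# Worst value of |fn| over a grid on [-0.5, 0.5]; the true sup is 1/(2 sqrt(n)).
sup = max(abs(f(n, k / 1000 - 0.5)) for k in range(1001))
print(sup <= 1 / (2 * math.sqrt(n)))  # True: fn -> 0 uniformly
print(fprime(n, 0.0))                 # 1.0, for every n
```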
Theorem 9.18. Suppose that (fn) is a sequence of differentiable functions fn : (a, b) → R such that fn → f pointwise and fn′ → g uniformly for some f, g : (a, b) → R. Then f is differentiable on (a, b) and f′ = g.
Proof. Let c ∈ (a, b), and let ε > 0 be given. To prove that f′(c) = g(c), we estimate the difference quotient of f in terms of the difference quotients of the fn:

|(f(x) − f(c))/(x − c) − g(c)| ≤ |(f(x) − f(c))/(x − c) − (fn(x) − fn(c))/(x − c)| + |(fn(x) − fn(c))/(x − c) − fn′(c)| + |fn′(c) − g(c)|,

where x ∈ (a, b) and x ≠ c. We want to make each of the terms on the right-hand side of the inequality less than ε/3. This is straightforward for the second term (since fn is differentiable) and the third term (since fn′ → g). To estimate the first term, we approximate f by fm, use the mean value theorem, and let m → ∞.

Since fm − fn is differentiable, the mean value theorem implies that there exists ξ between c and x such that

(fm(x) − fm(c))/(x − c) − (fn(x) − fn(c))/(x − c) = ((fm − fn)(x) − (fm − fn)(c))/(x − c) = fm′(ξ) − fn′(ξ).

Since (fn′) converges uniformly, it is uniformly Cauchy by Theorem 9.13. Therefore there exists N1 ∈ N such that

|fm′(ξ) − fn′(ξ)| < ε/3 for all ξ ∈ (a, b) if m, n > N1,

which implies that

|(fm(x) − fm(c))/(x − c) − (fn(x) − fn(c))/(x − c)| < ε/3.

Taking the limit of this inequality as m → ∞, and using the pointwise convergence of (fm) to f, we get that

|(f(x) − f(c))/(x − c) − (fn(x) − fn(c))/(x − c)| ≤ ε/3 for n > N1.

Next, since (fn′) converges to g, there exists N2 ∈ N such that

|fn′(c) − g(c)| < ε/3 for all n > N2.

Choose some n > max(N1, N2). Then the differentiability of fn implies that there exists δ > 0 such that

|(fn(x) − fn(c))/(x − c) − fn′(c)| < ε/3 if 0 < |x − c| < δ.

Putting these inequalities together, we get that

|(f(x) − f(c))/(x − c) − g(c)| < ε if 0 < |x − c| < δ,

which proves that f is differentiable at c with f′(c) = g(c).
Like Theorem 9.16, Theorem 9.18 can be interpreted as giving sufficient conditions for an exchange in the order of limits:

lim_{n→∞} lim_{x→c} (fn(x) − fn(c))/(x − c) = lim_{x→c} lim_{n→∞} (fn(x) − fn(c))/(x − c).

It is worth noting that in Theorem 9.18 the derivatives fn′ are not assumed to be continuous. If they are continuous, then one can use Riemann integration and the fundamental theorem of calculus to give a simpler proof (see Theorem 12.21).

9.5. Series
The convergence of a series is defined in terms of the convergence of its sequence of partial sums, and any result about sequences is easily translated into a corresponding result about series.
Definition 9.19. Suppose that (fn) is a sequence of functions fn : A → R. Let (Sn) be the sequence of partial sums Sn : A → R defined by

Sn(x) = Σ_{k=1}^{n} fk(x).

Then the series

S(x) = Σ_{n=1}^{∞} fn(x)

converges pointwise to S : A → R on A if Sn → S pointwise on A as n → ∞, and uniformly to S on A if Sn → S uniformly on A.
We illustrate the definition with a series whose partial sums we can compute explicitly.

Example 9.20. The geometric series

Σ_{n=0}^{∞} x^n = 1 + x + x² + x³ + ⋯

has partial sums

Sn(x) = Σ_{k=0}^{n} x^k = (1 − x^{n+1})/(1 − x).

Thus, Sn(x) → 1/(1 − x) as n → ∞ if |x| < 1, and (Sn(x)) diverges if |x| ≥ 1, meaning that

Σ_{n=0}^{∞} x^n = 1/(1 − x) pointwise on (−1, 1).

Since 1/(1 − x) is unbounded on (−1, 1), Theorem 9.14 implies that the convergence cannot be uniform.

The series does, however, converge uniformly on [−ρ, ρ] for every 0 ≤ ρ < 1. To prove this, we estimate for |x| ≤ ρ that

|Sn(x) − 1/(1 − x)| = |x|^{n+1}/|1 − x| ≤ ρ^{n+1}/(1 − ρ).

Since ρ^{n+1}/(1 − ρ) → 0 as n → ∞, given any ε > 0 there exists N ∈ N, depending only on ε and ρ, such that

0 ≤ ρ^{n+1}/(1 − ρ) < ε for all n > N.

It follows that

|Σ_{k=0}^{n} x^k − 1/(1 − x)| < ε for all x ∈ [−ρ, ρ] and all n > N,

which proves that the series converges uniformly on [−ρ, ρ].
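As a numerical sketch of the uniform estimate above (the grid and helper name are ours):

```python
def S(n, x):
    # Partial sum of the geometric series.
    return sum(x**k for k in range(n + 1))

rho, n = 0.5, 20
grid = [-rho + 2 * rho * k / 1000 for k in range(1001)]
worst = max(abs(S(n, x) - 1 / (1 - x)) for x in grid)
print(worst <= rho ** (n + 1) / (1 - rho))  # True: uniform error bound on [-rho, rho]
```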
The Cauchy condition for the uniform convergence of sequences immediately gives a corresponding Cauchy condition for the uniform convergence of series.

Theorem 9.21. Let (fn) be a sequence of functions fn : A → R. The series Σ_{n=1}^{∞} fn converges uniformly on A if and only if for every ε > 0 there exists N ∈ N such that

|Σ_{k=m+1}^{n} fk(x)| < ε for all x ∈ A and all n > m > N.

Proof. Let

Sn(x) = Σ_{k=1}^{n} fk(x) = f1(x) + f2(x) + ⋯ + fn(x).

From Theorem 9.13, the sequence (Sn), and therefore the series Σ fn, converges uniformly if and only if for every ε > 0 there exists N such that

|Sn(x) − Sm(x)| < ε for all x ∈ A and all n, m > N.

Assuming n > m without loss of generality, we have

Sn(x) − Sm(x) = fm+1(x) + fm+2(x) + ⋯ + fn(x) = Σ_{k=m+1}^{n} fk(x),

so the result follows.


The following simple criterion for the uniform convergence of a series is very useful. The name comes from the letter traditionally used to denote the constants, or "majorants," that bound the functions in the series.

Theorem 9.22 (Weierstrass M-test). Let (fn) be a sequence of functions fn : A → R, and suppose that for every n ∈ N there exists a constant Mn ≥ 0 such that

|fn(x)| ≤ Mn for all x ∈ A,  Σ_{n=1}^{∞} Mn < ∞.

Then

Σ_{n=1}^{∞} fn(x)

converges uniformly on A.

Proof. The result follows immediately from the observation that Σ fn is uniformly Cauchy if Σ Mn is Cauchy.

In detail, let ε > 0 be given. The Cauchy condition for the convergence of a real series implies that there exists N ∈ N such that

Σ_{k=m+1}^{n} Mk < ε for all n > m > N.

Then for all x ∈ A and all n > m > N, we have

|Σ_{k=m+1}^{n} fk(x)| ≤ Σ_{k=m+1}^{n} |fk(x)| ≤ Σ_{k=m+1}^{n} Mk < ε.

Thus, Σ fn satisfies the uniform Cauchy condition in Theorem 9.21, so it converges uniformly.

Example 9.23. Returning to Example 9.20, we consider the geometric series

Σ_{n=0}^{∞} x^n.

If |x| ≤ ρ where 0 ≤ ρ < 1, then

|x^n| ≤ ρ^n,  Σ_{n=0}^{∞} ρ^n < ∞.

The M-test, with Mn = ρ^n, implies that the series converges uniformly on [−ρ, ρ].
Example 9.24. The series

f(x) = Σ_{n=1}^{∞} (1/2^n) cos(3^n x)

converges uniformly on R by the M-test, since

|(1/2^n) cos(3^n x)| ≤ 1/2^n,  Σ_{n=1}^{∞} 1/2^n = 1.

It then follows from Theorem 9.16 that f is continuous on R. (See Figure 1.)

[Figure 1. Graph of the Weierstrass continuous, nowhere differentiable function y = Σ_{n=0}^{∞} 2^{−n} cos(3^n x) on one period [0, 2π].]

Taking the formal term-by-term derivative of the series for f, we get a series whose coefficients grow with n,

−Σ_{n=1}^{∞} (3/2)^n sin(3^n x),

so we might expect that there are difficulties in differentiating f. As Figure 2 illustrates, the function doesn't look smooth at any length-scale. Weierstrass (1872) proved that f is not differentiable at any point of R. Bolzano (1830) had also constructed a continuous, nowhere differentiable function, but his results weren't published until 1922. Subsequently, Takagi (1903) constructed a function similar to the Weierstrass function whose nowhere-differentiability is easier to prove. Such functions were considered to be highly counter-intuitive and pathological at the time Weierstrass discovered them, and they weren't well-received by many prominent mathematicians.

[Figure 2. Details of the Weierstrass function for 4 ≤ x ≤ 4.1 (left) and 4 ≤ x ≤ 4.01 (right), showing its self-similar, fractal behavior under rescalings.]
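The M-test bound also controls truncation error: cutting the Weierstrass series after N terms changes its value by at most Σ_{n>N} 2^{−n} = 2^{−N}, uniformly in x. A numerical sketch (the helper name and sample point are ours):

```python
import math

def partial(N, x):
    # Partial sum of the Weierstrass series: sum_{n=1}^{N} 2^-n cos(3^n x).
    return sum(math.cos(3**n * x) / 2**n for n in range(1, N + 1))

x = 1.234
# The tail bound 2^-20 holds at every x, by the M-test estimate.
print(abs(partial(30, x) - partial(20, x)) <= 2.0**-20)  # True
```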

If the Weierstrass M-test applies to a series of functions to prove uniform convergence, then it also implies that the series converges absolutely, meaning that

Σ_{n=1}^{∞} |fn(x)| < ∞ for every x ∈ A.

Thus, the M-test is not applicable to series that converge uniformly but not absolutely.
Absolute convergence of a series is completely different from uniform convergence, and the two concepts shouldn’t be confused. Absolute convergence on A is a pointwise condition for each x ∈ A, while uniform convergence is a global condition that involves all points x ∈ A simultaneously. We illustrate the difference with a rather trivial example.
Example 9.25. Let fn : R → R be the constant function

fn(x) = (−1)^{n+1}/n.

Then Σ fn converges on R to the constant function f(x) = c, where

c = Σ_{n=1}^{∞} (−1)^{n+1}/n

is the sum of the alternating harmonic series (c = log 2). The convergence of Σ fn is uniform on R, since the terms in the series do not depend on x, but the convergence isn't absolute at any x ∈ R, since the harmonic series

Σ_{n=1}^{∞} 1/n

diverges to infinity.

Chapter 10

Power Series

In discussing power series it is good to recall a nursery rhyme:
“There was a little girl
Who had a little curl
Right in the middle of her forehead
When she was good
She was very, very good
But when she was bad
She was horrid.”
(Robert Strichartz [14])
Power series are one of the most useful types of series in analysis. For example, we can use them to define transcendental functions such as the exponential and trigonometric functions (as well as many other less familiar functions).

10.1. Introduction
A power series (centered at 0) is a series of the form

Σ_{n=0}^{∞} an x^n = a0 + a1 x + a2 x² + ⋯ + an x^n + ⋯,

where the constants an are the coefficients. If all but finitely many of the an are zero, then the power series is a polynomial function, but if infinitely many of the an are nonzero, then we need to consider the convergence of the power series.

The basic facts are these: every power series has a radius of convergence 0 ≤ R ≤ ∞, which depends on the coefficients an. The power series converges absolutely in |x| < R and diverges in |x| > R, and the convergence is uniform on every interval |x| ≤ ρ where 0 ≤ ρ < R. If R > 0, then the sum of the power series is infinitely differentiable in |x| < R, and its derivatives are given by differentiating the original power series term-by-term.


Power series work just as well for complex numbers as real numbers, and are in fact best viewed from that perspective. We will consider here only real-valued power series, although many of the results extend immediately to complex-valued power series.
Definition 10.1. Let (an)_{n=0}^{∞} be a sequence of real numbers and c ∈ R. The power series centered at c with coefficients an is the series

Σ_{n=0}^{∞} an (x − c)^n.

Example 10.2. The following are power series centered at 0:

Σ_{n=0}^{∞} x^n = 1 + x + x² + x³ + x⁴ + ⋯,

Σ_{n=0}^{∞} (1/n!) x^n = 1 + x + (1/2)x² + (1/6)x³ + (1/24)x⁴ + ⋯,

Σ_{n=0}^{∞} (n!) x^n = 1 + x + 2x² + 6x³ + 24x⁴ + ⋯,

Σ_{n=0}^{∞} (−1)^n x^{2^n} = x − x² + x⁴ − x⁸ + ⋯.

An example of a power series centered at 1 is

Σ_{n=1}^{∞} ((−1)^{n+1}/n) (x − 1)^n = (x − 1) − (1/2)(x − 1)² + (1/3)(x − 1)³ − (1/4)(x − 1)⁴ + ⋯.

The power series in Definition 10.1 is a formal, algebraic expression, since we haven't said anything yet about its convergence. By changing variables (x − c) → x, we can assume without loss of generality that a power series is centered at 0, and we will do so whenever it's convenient.

10.2. Radius of convergence
First, we prove that every power series has a radius of convergence.
Theorem 10.3. Let

Σ_{n=0}^{∞} an (x − c)^n

be a power series. There is a non-negative, extended real number 0 ≤ R ≤ ∞ such that the series converges absolutely for 0 ≤ |x − c| < R and diverges for |x − c| > R. Furthermore, if 0 ≤ ρ < R, then the power series converges uniformly on the interval |x − c| ≤ ρ, and the sum of the series is continuous in |x − c| < R.

Proof. We assume without loss of generality that c = 0. Suppose the power series

Σ_{n=0}^{∞} an x0^n

converges for some x0 ∈ R with x0 ≠ 0. Then its terms converge to zero, so they are bounded and there exists M ≥ 0 such that

|an x0^n| ≤ M for n = 0, 1, 2, ….

If |x| < |x0|, then

|an x^n| = |an x0^n| |x/x0|^n ≤ M r^n,  r = |x/x0| < 1.

Comparing the power series with the convergent geometric series Σ M r^n, we see that Σ an x^n is absolutely convergent. Thus, if the power series converges for some x0 ∈ R, then it converges absolutely for every x ∈ R with |x| < |x0|.

Let

R = sup { |x| ≥ 0 : Σ an x^n converges }.

If R = 0, then the series converges only for x = 0. If R > 0, then the series converges absolutely for every x ∈ R with |x| < R, since it converges for some x0 ∈ R with |x| < |x0| < R. Moreover, the definition of R implies that the series diverges for every x ∈ R with |x| > R. If R = ∞, then the series converges for all x ∈ R.

Finally, let 0 ≤ ρ < R and suppose |x| ≤ ρ. Choose σ > 0 such that ρ < σ < R. Then Σ |an σ^n| converges, so |an σ^n| ≤ M for some M ≥ 0, and therefore

|an x^n| = |an σ^n| |x/σ|^n ≤ |an σ^n| (ρ/σ)^n ≤ M r^n,

where r = ρ/σ < 1. Since Σ M r^n < ∞, the M-test (Theorem 9.22) implies that the series converges uniformly on |x| ≤ ρ, and then it follows from Theorem 9.16 that the sum is continuous on |x| ≤ ρ. Since this holds for every 0 ≤ ρ < R, the sum is continuous in |x| < R.
The following definition therefore makes sense for every power series.

Definition 10.4. If the power series

Σ_{n=0}^{∞} an (x − c)^n

converges for |x − c| < R and diverges for |x − c| > R, then 0 ≤ R ≤ ∞ is called the radius of convergence of the power series.

Theorem 10.3 does not say what happens at the endpoints x = c ± R, and in general the power series may converge or diverge there. We refer to the set of all points where the power series converges as its interval of convergence, which is one of

(c − R, c + R), (c − R, c + R], [c − R, c + R), [c − R, c + R].

We won't discuss here any general theorems about the convergence of power series at the endpoints (e.g., the Abel theorem). Also note that a power series need not converge uniformly on |x − c| < R.
Theorem 10.3 does not give an explicit expression for the radius of convergence of a power series in terms of its coefficients. The ratio test gives a simple, but useful, way to compute the radius of convergence, although it doesn't apply to every power series.

Theorem 10.5. Suppose that an ≠ 0 for all sufficiently large n and the limit

R = lim_{n→∞} |an/an+1|

exists or diverges to infinity. Then the power series

Σ_{n=0}^{∞} an (x − c)^n

has radius of convergence R.

Proof. Let

r = lim_{n→∞} |an+1 (x − c)^{n+1}/(an (x − c)^n)| = |x − c| lim_{n→∞} |an+1/an|.

By the ratio test, the power series converges if 0 ≤ r < 1, or |x − c| < R, and diverges if 1 < r ≤ ∞, or |x − c| > R, which proves the result.
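A numerical sketch of the formula R = lim |an/an+1| (the helper and sample coefficient sequences are ours):

```python
import math

def ratio(a, n):
    # |a(n) / a(n+1)|, whose limit (if it exists) is the radius of convergence.
    return abs(a(n) / a(n + 1))

print(ratio(lambda n: 1.0, 1000))                  # 1.0: geometric series, R = 1
print(ratio(lambda n: 1 / n, 1000))                # about 1.001: tends to R = 1
print(ratio(lambda n: 1 / math.factorial(n), 50))  # about 51: grows without bound, R = infinity
```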

The root test gives an expression for the radius of convergence of a general power series.

Theorem 10.6 (Hadamard). The radius of convergence R of the power series

Σ_{n=0}^{∞} an (x − c)^n

is given by

R = 1/lim sup_{n→∞} |an|^{1/n},

where R = 0 if the lim sup diverges to ∞, and R = ∞ if the lim sup is 0.

Proof. Let

r = lim sup_{n→∞} |an (x − c)^n|^{1/n} = |x − c| lim sup_{n→∞} |an|^{1/n}.

By the root test, the series converges if 0 ≤ r < 1, or |x − c| < R, and diverges if 1 < r ≤ ∞, or |x − c| > R, which proves the result.

This theorem provides an alternate proof of Theorem 10.3 from the root test; in fact, our proof of Theorem 10.3 is more-or-less a proof of the root test.

10.3. Examples of power series
We consider a number of examples of power series and their radii of convergence.
Example 10.7. The geometric series

Σ_{n=0}^{∞} x^n = 1 + x + x² + ⋯

has radius of convergence

R = lim_{n→∞} 1/1 = 1.

It converges to

Σ_{n=0}^{∞} x^n = 1/(1 − x) for |x| < 1,

and diverges for |x| > 1. At x = 1, the series becomes

1 + 1 + 1 + 1 + ⋯,

and at x = −1 it becomes

1 − 1 + 1 − 1 + 1 − ⋯,

so the series diverges at both endpoints x = ±1. Thus, the interval of convergence of the power series is (−1, 1). The series converges uniformly on [−ρ, ρ] for every 0 ≤ ρ < 1 but does not converge uniformly on (−1, 1) (see Example 9.20). Note that although the function 1/(1 − x) is well-defined for all x ≠ 1, the power series only converges when |x| < 1.
Example 10.8. The series

Σ_{n=1}^{∞} (1/n) x^n = x + (1/2)x² + (1/3)x³ + (1/4)x⁴ + ⋯

has radius of convergence

R = lim_{n→∞} (1/n)/(1/(n + 1)) = lim_{n→∞} (1 + 1/n) = 1.

At x = 1, the series becomes the harmonic series

Σ_{n=1}^{∞} 1/n = 1 + 1/2 + 1/3 + 1/4 + ⋯,

which diverges, and at x = −1 it is minus the alternating harmonic series

Σ_{n=1}^{∞} (−1)^n/n = −1 + 1/2 − 1/3 + 1/4 − ⋯,

which converges, but not absolutely. Thus the interval of convergence of the power series is [−1, 1). The series converges uniformly on [−ρ, ρ] for every 0 ≤ ρ < 1 but does not converge uniformly on (−1, 1).
Example 10.9. The power series

Σ_{n=0}^{∞} (1/n!) x^n = 1 + x + (1/2!)x² + (1/3!)x³ + ⋯

has radius of convergence

R = lim_{n→∞} (1/n!)/(1/(n + 1)!) = lim_{n→∞} (n + 1)!/n! = lim_{n→∞} (n + 1) = ∞,

so it converges for all x ∈ R. The sum is the exponential function

e^x = Σ_{n=0}^{∞} (1/n!) x^n.

In fact, this power series may be used to define the exponential function. (See Section 10.6.)
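As a numerical sketch (the helper name is ours), the partial sums of this series converge rapidly to the exponential:

```python
import math

def exp_partial(N, x):
    # Partial sum sum_{n=0}^{N} x^n / n! of the exponential series.
    return sum(x**n / math.factorial(n) for n in range(N + 1))

print(exp_partial(0, 3.0))                                # 1.0
print(abs(exp_partial(30, 3.0) - math.exp(3.0)) < 1e-10)  # True
```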
Example 10.10. The power series

Σ_{n=0}^{∞} ((−1)^n/(2n)!) x^{2n} = 1 − (1/2!)x² + (1/4!)x⁴ − ⋯

has radius of convergence R = ∞, and it converges for all x ∈ R. Its sum, cos x, provides an analytic definition of the cosine function.
Example 10.11. The power series

Σ_{n=0}^{∞} ((−1)^n/(2n + 1)!) x^{2n+1} = x − (1/3!)x³ + (1/5!)x⁵ − ⋯

has radius of convergence R = ∞, and it converges for all x ∈ R. Its sum, sin x, provides an analytic definition of the sine function.
Example 10.12. The power series

Σ_{n=0}^{∞} (n!) x^n = 1 + x + (2!)x² + (3!)x³ + (4!)x⁴ + ⋯

has radius of convergence

R = lim_{n→∞} n!/(n + 1)! = lim_{n→∞} 1/(n + 1) = 0,

so it converges only for x = 0. If x ≠ 0, its terms grow larger once n > 1/|x|, and |(n!)x^n| → ∞ as n → ∞.
Example 10.13. The series

Σ_{n=1}^{∞} ((−1)^{n+1}/n) (x − 1)^n = (x − 1) − (1/2)(x − 1)² + (1/3)(x − 1)³ − ⋯

has radius of convergence

R = lim_{n→∞} |((−1)^{n+1}/n)/((−1)^{n+2}/(n + 1))| = lim_{n→∞} (n + 1)/n = lim_{n→∞} (1 + 1/n) = 1,

so it converges if |x − 1| < 1 and diverges if |x − 1| > 1. At the endpoint x = 2, the power series becomes the alternating harmonic series

1 − 1/2 + 1/3 − 1/4 + ⋯,

which converges. At the endpoint x = 0, the power series becomes minus the harmonic series

−(1 + 1/2 + 1/3 + 1/4 + ⋯),

which diverges. Thus, the interval of convergence is (0, 2].


[Figure 1. Graph of the lacunary power series y = Σ_{n=0}^{∞} (−1)^n x^{2^n} on [0, 1). It appears relatively well-behaved; however, the small oscillations visible near x = 1 are not a numerical artifact.]

Example 10.14. The power series

Σ_{n=0}^{∞} (−1)^n x^{2^n} = x − x² + x⁴ − x⁸ + x^{16} − x^{32} + ⋯,

with

an = (−1)^k if n = 2^k,  an = 0 if n ≠ 2^k,

has radius of convergence R = 1. To prove this, note that the series converges for |x| < 1 by comparison with the convergent geometric series Σ |x|^n, since

|an x^n| = |x|^n if n = 2^k,  |an x^n| = 0 if n ≠ 2^k,

so |an x^n| ≤ |x|^n. If |x| > 1, then the terms do not approach 0 as n → ∞, so the series diverges. Alternatively, we have

|an|^{1/n} = 1 if n = 2^k,  |an|^{1/n} = 0 if n ≠ 2^k,

so

lim sup_{n→∞} |an|^{1/n} = 1,

and the Hadamard formula (Theorem 10.6) gives R = 1. The series does not converge at either endpoint x = ±1, so its interval of convergence is (−1, 1).
In this series, there are successively longer gaps (or “lacuna”) between the powers with non-zero coefficients. Such series are called lacunary power series, and

188

10. Power Series

0.52

0.51
0.508

0.51

0.506
0.504

0.5

y

y

0.502
0.49

0.5
0.498

0.48

0.496
0.494

0.47

0.492
0.46
0.9

0.92

0.94

0.96

0.98

0.49
0.99

1

0.992

0.994

x

0.996

0.998

1

x

n

∞ n 2 near x = 1,
Figure 2. Details of the lacunary power series n=0 (−1) x showing its oscillatory behavior and the nonexistence of a limit as x → 1− .

they have many interesting properties. For example, although the series does not converge at x = 1, one can ask if

    lim_{x→1⁻} Σ_{n=0}^∞ (−1)^n x^{2^n}

exists. In a plot of this sum on [0, 1), shown in Figure 1, the function appears relatively well-behaved near x = 1. However, Hardy (1907) proved that the function has infinitely many, very small oscillations as x → 1⁻, as illustrated in Figure 2, and the limit does not exist. Subsequent results by Hardy and Littlewood (1926) showed, under suitable assumptions on the growth of the "gaps" between non-zero coefficients, that if the limit of a lacunary power series as x → 1⁻ exists, then the series must converge at x = 1. Since the lacunary power series considered here does not converge at 1, its limit as x → 1⁻ cannot exist. For further discussion of lacunary power series, see [4].
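For fixed x < 1 the terms x^{2^n} decay extremely fast, so the sum is easy to evaluate numerically; the sketch below (illustrative Python with an arbitrary truncation) reproduces the apparently tame values near 0.5, though it is far too coarse to resolve Hardy's oscillations.

```python
# Numerically evaluate f(x) = sum_{n>=0} (-1)^n x^(2^n) on [0, 1). The terms
# x^(2^n) decay extremely fast for fixed x < 1, so a fixed truncation (60
# terms, an arbitrary choice) is more than enough for the x used here.
def lacunary(x, nterms=60):
    return sum((-1) ** n * x ** (2 ** n) for n in range(nterms))

for x in (0.9, 0.99, 0.999):
    print(x, lacunary(x))   # values hovering around 0.5
```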

10.4. Algebraic operations on power series
We can add, multiply, and divide power series in a standard way. For simplicity, we consider power series centered at 0.
Proposition 10.15. If R, S > 0 and the functions

    f(x) = Σ_{n=0}^∞ a_n x^n   in |x| < R,     g(x) = Σ_{n=0}^∞ b_n x^n   in |x| < S

are sums of convergent power series, then

    (f + g)(x) = Σ_{n=0}^∞ (a_n + b_n) x^n   in |x| < T,
    (f g)(x) = Σ_{n=0}^∞ c_n x^n             in |x| < T,

where T = min(R, S) and

    c_n = Σ_{k=0}^n a_{n−k} b_k.

Proof. The power series expansion of f + g follows immediately from the linearity of limits. The power series expansion of f g follows from the Cauchy product (Theorem 4.38), since power series converge absolutely inside their intervals of convergence, and

    (Σ_{n=0}^∞ a_n x^n)(Σ_{n=0}^∞ b_n x^n) = Σ_{n=0}^∞ (Σ_{k=0}^n a_{n−k} x^{n−k} · b_k x^k) = Σ_{n=0}^∞ c_n x^n.

It may happen that the radius of convergence of the power series for f + g or f g is larger than the radius of convergence of the power series for f and g. For example, if g = −f, then the radius of convergence of the power series for f + g = 0 is ∞ whatever the radius of convergence of the power series for f.
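The Cauchy-product formula c_n = Σ_{k=0}^n a_{n−k} b_k is easy to compute with directly; a minimal Python sketch (names are illustrative), checked on the identity (1 + x + x² + ...)(1 − x) = 1:

```python
# Cauchy product of two power series, truncated to the first N coefficients:
# c_n = sum_{k=0}^{n} a_{n-k} * b_k.
def cauchy_product(a, b):
    n = min(len(a), len(b))
    return [sum(a[m - k] * b[k] for k in range(m + 1)) for m in range(n)]

# Check on (1 + x + x^2 + ...) * (1 - x) = 1: every product coefficient past
# the constant term cancels.
geom = [1] * 8             # coefficients of 1/(1 - x)
one_minus_x = [1, -1] + [0] * 6
print(cauchy_product(geom, one_minus_x))   # [1, 0, 0, 0, 0, 0, 0, 0]
```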
The reciprocal of a convergent power series that is nonzero at its center also has a power series expansion.
Proposition 10.16. If R > 0 and

    f(x) = Σ_{n=0}^∞ a_n x^n   in |x| < R

is the sum of a power series with a_0 ≠ 0, then there exists S > 0 such that

    1/f(x) = Σ_{n=0}^∞ b_n x^n   in |x| < S.

The coefficients b_n are determined recursively by

    b_0 = 1/a_0,     b_n = −(1/a_0) Σ_{k=0}^{n−1} a_{n−k} b_k   for n ≥ 1.

Proof. First, we look for a formal power series expansion (i.e., without regard to its convergence)

    g(x) = Σ_{n=0}^∞ b_n x^n

such that the formal Cauchy product f g is equal to 1. This condition is satisfied if

    (Σ_{n=0}^∞ a_n x^n)(Σ_{n=0}^∞ b_n x^n) = Σ_{n=0}^∞ (Σ_{k=0}^n a_{n−k} b_k) x^n = 1.

Matching the coefficients of x^n, we find that

    a_0 b_0 = 1,     a_0 b_n + Σ_{k=0}^{n−1} a_{n−k} b_k = 0   for n ≥ 1,

which gives the stated recursion relation.


To complete the proof, we need to show that the formal power series for g has a nonzero radius of convergence. In that case, Proposition 10.15 shows that f g = 1 inside the common interval of convergence of f and g, so 1/f = g has a power series expansion. We assume without loss of generality that a_0 = 1; otherwise replace f by f/a_0.

The power series for f converges absolutely and uniformly on compact sets inside its interval of convergence, so the function

    Σ_{n=1}^∞ |a_n| |x|^n

is continuous in |x| < R and vanishes at x = 0. It follows that there exists δ > 0 such that

    Σ_{n=1}^∞ |a_n| |x|^n ≤ 1   for |x| ≤ δ.

Then f(x) ≠ 0 for |x| < δ, since

    |f(x)| ≥ 1 − Σ_{n=1}^∞ |a_n| |x|^n > 0,

so 1/f(x) is well defined.

We claim that

    |b_n| ≤ 1/δ^n   for n = 0, 1, 2, ....

The proof is by induction. Since b_0 = 1, this inequality is true for n = 0. If n ≥ 1 and the inequality holds for b_k with 0 ≤ k ≤ n − 1, then by taking the absolute value of the recursion relation for b_n, we get

    |b_n| ≤ Σ_{k=1}^n |a_k| |b_{n−k}| ≤ Σ_{k=1}^n |a_k|/δ^{n−k} = (1/δ^n) Σ_{k=1}^n |a_k| δ^k ≤ (1/δ^n) Σ_{k=1}^∞ |a_k| δ^k ≤ 1/δ^n,

so the inequality holds for b_k with 0 ≤ k ≤ n, and the claim follows.

We then get that

    lim sup_{n→∞} |b_n|^{1/n} ≤ 1/δ,

so the Hadamard formula in Theorem 10.6 implies that the radius of convergence of Σ b_n x^n is greater than or equal to δ > 0, which completes the proof.

An immediate consequence of these results for products and reciprocals of power series is that quotients of convergent power series are given by convergent power series, provided that the denominator is nonzero.
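The recursion of Proposition 10.16 runs directly with exact rational arithmetic; a small Python sketch (the test series is an illustrative choice), checked on f(x) = 1 − x, whose reciprocal is the geometric series:

```python
from fractions import Fraction

# b_0 = 1/a_0,  b_n = -(1/a_0) * sum_{k=0}^{n-1} a_{n-k} b_k  (n >= 1),
# the recursion for the reciprocal power series of Proposition 10.16.
def reciprocal_coeffs(a, N):
    b = [Fraction(1) / a[0]]
    for n in range(1, N):
        b.append(-sum(a[n - k] * b[k] for k in range(n)) / a[0])
    return b

# 1/(1 - x) = 1 + x + x^2 + ..., so every coefficient should equal 1.
a = [Fraction(1), Fraction(-1)] + [Fraction(0)] * 10
print(reciprocal_coeffs(a, 6))   # six 1's, as exact fractions
```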
Proposition 10.17. If R, S > 0 and

    f(x) = Σ_{n=0}^∞ a_n x^n   in |x| < R,     g(x) = Σ_{n=0}^∞ b_n x^n   in |x| < S

are the sums of power series with b_0 ≠ 0, then there exists T > 0 and coefficients c_n such that

    f(x)/g(x) = Σ_{n=0}^∞ c_n x^n   in |x| < T.

The previous results do not give an explicit expression for the coefficients in the power series expansion of f/g or a sharp estimate for its radius of convergence.
Using complex analysis, one can show that the radius of convergence of the power series for f/g centered at 0 is equal to the distance from the origin to the nearest singularity of f/g in the complex plane. We will not discuss complex analysis here, but we consider two examples.
Example 10.18. Replacing x by −x² in the geometric power series from Example 10.7, we get the following power series centered at 0:

    1/(1 + x²) = 1 − x² + x⁴ − x⁶ + · · · = Σ_{n=0}^∞ (−1)^n x^{2n},

which has radius of convergence R = 1. From the point of view of real functions, it may appear strange that the radius of convergence is 1, since the function 1/(1 + x²) is well-defined on R, has continuous derivatives of all orders, and has power series expansions with nonzero radius of convergence centered at every c ∈ R. However, when 1/(1 + z²) is regarded as a function of a complex variable z ∈ C, one sees that it has singularities at z = ±i, where the denominator vanishes, and | ± i| = 1, which explains why R = 1.
Example 10.19. The function f : R → R defined by f(0) = 1 and

    f(x) = (e^x − 1)/x   for x ≠ 0

has the power series expansion

    f(x) = Σ_{n=0}^∞ x^n/(n + 1)!,

with infinite radius of convergence. The reciprocal function g : R → R of f is given by g(0) = 1 and

    g(x) = x/(e^x − 1)   for x ≠ 0.

Proposition 10.16 implies that

    g(x) = Σ_{n=0}^∞ b_n x^n

has a convergent power series expansion at 0, with b_0 = 1 and

    b_n = −Σ_{k=0}^{n−1} b_k/(n − k + 1)!   for n ≥ 1.

The numbers B_n = n! b_n are called Bernoulli numbers. They may be defined as the coefficients in the power series expansion

    x/(e^x − 1) = Σ_{n=0}^∞ (B_n/n!) x^n.

The function x/(e^x − 1) is called the generating function of the Bernoulli numbers, where we adopt the convention that x/(e^x − 1) = 1 at x = 0.
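The recursion for b_n = B_n/n! is well suited to exact arithmetic; a small illustrative Python sketch (standard values such as B_2 = 1/6 appear only as checks):

```python
from fractions import Fraction
from math import factorial

# Bernoulli numbers from the recursion of Example 10.19 for b_n = B_n / n!:
# b_0 = 1,  b_n = -sum_{k=0}^{n-1} b_k / (n - k + 1)!  for n >= 1.
def bernoulli(N):
    b = [Fraction(1)]
    for n in range(1, N + 1):
        b.append(-sum(b[k] / factorial(n - k + 1) for k in range(n)))
    return [factorial(n) * b[n] for n in range(N + 1)]   # B_n = n! * b_n

B = bernoulli(12)
print(B[1], B[2], B[4], B[12])   # -1/2 1/6 -1/30 -691/2730
```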


A number of properties of the Bernoulli numbers follow from their generating function. First, we observe that

    x/(e^x − 1) + x/2 = (x/2) · (e^{x/2} + e^{−x/2})/(e^{x/2} − e^{−x/2})

is an even function of x. It follows that

    B_0 = 1,     B_1 = −1/2,

and B_n = 0 for all odd n ≥ 3. Thus, the power series expansion of x/(e^x − 1) has the form

    x/(e^x − 1) = 1 − x/2 + Σ_{n=1}^∞ (B_{2n}/(2n)!) x^{2n}.

The recursion formula for b_n can be written in terms of B_n as

    Σ_{k=0}^n (n+1 choose k) B_k = 0,

which implies that the Bernoulli numbers are rational. For example, one finds that

    B_2 = 1/6,  B_4 = −1/30,  B_6 = 1/42,  B_8 = −1/30,  B_10 = 5/66,  B_12 = −691/2730.

As the sudden appearance of the large irregular prime number 691 in the numerator of B_12 suggests, there is no simple pattern for the values of B_{2n}, although they continue to alternate in sign.¹ The Bernoulli numbers have many surprising connections with number theory and other areas of mathematics; for example, as noted in Section 4.5, they give the values of the Riemann zeta function at even natural numbers.
Using complex analysis, one can show that the radius of convergence of the power series for z/(e^z − 1) at z = 0 is equal to 2π, since the closest zeros of the denominator e^z − 1 to the origin in the complex plane occur at z = ±2πi, where |z| = 2π. Given this fact, the Hadamard formula (Theorem 10.6) implies that

    lim sup_{n→∞} |B_n/n!|^{1/n} = 1/(2π),

which shows that at least some of the Bernoulli numbers B_n grow very rapidly (factorially) as n → ∞.
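This growth rate is visible numerically; the sketch below (illustrative, with an arbitrary cutoff at n = 20) compares |B_{2n}/(2n)!|^{1/(2n)} with 1/(2π) ≈ 0.1592:

```python
from fractions import Fraction
from math import factorial, pi

# Ratios |B_{2n}/(2n)!|^(1/(2n)), which by the Hadamard formula should
# approach 1/(2*pi) as n grows.
def bernoulli_over_factorial(N):
    b = [Fraction(1)]
    for n in range(1, N + 1):
        b.append(-sum(b[k] / factorial(n - k + 1) for k in range(n)))
    return b   # b[n] = B_n / n!

b = bernoulli_over_factorial(40)
vals = [abs(float(b[2 * n])) ** (1 / (2 * n)) for n in range(5, 21)]
print(vals[-1], 1 / (2 * pi))   # the ratio creeps toward 1/(2*pi)
```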

Finally, we remark that we have proved that algebraic operations on convergent power series lead to convergent power series. If one is interested only in the formal algebraic properties of power series, and not their convergence, one can introduce a purely algebraic structure called the ring of formal power series (over the field R) in a variable x,

    R[[x]] = { Σ_{n=0}^∞ a_n x^n : a_n ∈ R },

with sums and products on R[[x]] defined in the obvious way:

    Σ_{n=0}^∞ a_n x^n + Σ_{n=0}^∞ b_n x^n = Σ_{n=0}^∞ (a_n + b_n) x^n,
    (Σ_{n=0}^∞ a_n x^n)(Σ_{n=0}^∞ b_n x^n) = Σ_{n=0}^∞ (Σ_{k=0}^n a_{n−k} b_k) x^n.

¹A prime number p is said to be irregular if it divides the numerator of B_{2n}, expressed in lowest terms, for some 2 ≤ 2n ≤ p − 3; otherwise it is regular. The smallest irregular prime number is 37, which divides the numerator of B_32 = −7709321041217/510, since 7709321041217 = 37 · 683 · 305065927. There are infinitely many irregular primes, and it is conjectured that there are infinitely many regular primes. A proof of this conjecture is, however, an open problem.

10.5. Differentiation of power series

We saw in Section 9.4.3 that, in general, one cannot differentiate a uniformly convergent sequence or series term by term. We can, however, differentiate power series, and they behave as nicely as one could imagine in this respect. The sum of a power series

    f(x) = a_0 + a_1 x + a_2 x² + a_3 x³ + a_4 x⁴ + ...

is infinitely differentiable inside its interval of convergence, and its derivative

    f′(x) = a_1 + 2a_2 x + 3a_3 x² + 4a_4 x³ + ...

is given by term-by-term differentiation. To prove this result, we first show that the term-by-term derivative of a power series has the same radius of convergence as the original power series. The idea is that the geometric decay of the terms of the power series inside its radius of convergence dominates the algebraic growth of the factor n that comes from taking the derivative.
Theorem 10.20. Suppose that the power series

    Σ_{n=0}^∞ a_n (x − c)^n

has radius of convergence R. Then the power series

    Σ_{n=1}^∞ n a_n (x − c)^{n−1}

also has radius of convergence R.

Proof. Assume without loss of generality that c = 0, and suppose |x| < R. Choose ρ such that |x| < ρ < R, and let

    r = |x|/ρ,     0 < r < 1.

To estimate the terms in the differentiated power series by the terms in the original series, we rewrite their absolute values as follows:

    |n a_n x^{n−1}| = (n/ρ)(|x|/ρ)^{n−1} |a_n ρ^n| = (n r^{n−1}/ρ) |a_n ρ^n|.

The ratio test shows that the series Σ n r^{n−1} converges, since

    lim_{n→∞} (n + 1) r^n / (n r^{n−1}) = lim_{n→∞} (1 + 1/n) r = r < 1,

so the sequence (n r^{n−1}) is bounded, by M say. It follows that

    |n a_n x^{n−1}| ≤ (M/ρ) |a_n ρ^n|   for all n ∈ N.

The series Σ |a_n ρ^n| converges, since ρ < R, so the comparison test implies that Σ n a_n x^{n−1} converges absolutely.

Conversely, suppose |x| > R. Then Σ |a_n x^n| diverges (since Σ a_n x^n diverges) and

    |n a_n x^{n−1}| ≥ (1/|x|) |a_n x^n|   for n ≥ 1,

so the comparison test implies that Σ n a_n x^{n−1} diverges. Thus the series have the same radius of convergence.
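Theorem 10.20 can be illustrated on the geometric series, whose term-by-term derivative Σ n x^{n−1} still has radius 1 and sums to 1/(1 − x)² for |x| < 1; a small numerical sketch:

```python
# Partial sums of sum_{n>=1} n x^(n-1), the term-by-term derivative of the
# geometric series; inside |x| < 1 it converges to 1/(1-x)^2.
def diff_geometric(x, N=2000):
    return sum(n * x ** (n - 1) for n in range(1, N + 1))

for x in (0.5, 0.9):
    print(x, diff_geometric(x), 1 / (1 - x) ** 2)
```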

Theorem 10.21. Suppose that the power series

    f(x) = Σ_{n=0}^∞ a_n (x − c)^n   for |x − c| < R

has radius of convergence R > 0 and sum f. Then f is differentiable in |x − c| < R and

    f′(x) = Σ_{n=1}^∞ n a_n (x − c)^{n−1}   for |x − c| < R.

Proof. The term-by-term differentiated power series converges in |x − c| < R by Theorem 10.20. We denote its sum by

    g(x) = Σ_{n=1}^∞ n a_n (x − c)^{n−1}.

Let 0 < ρ < R. Then, by Theorem 10.3, the power series for f and g both converge uniformly in |x − c| < ρ. Applying Theorem 9.18 to their partial sums, we conclude that f is differentiable in |x − c| < ρ and f′ = g. Since this holds for every 0 < ρ < R, it follows that f is differentiable in |x − c| < R and f′ = g, which proves the result.
Repeated application of Theorem 10.21 implies that the sum of a power series is infinitely differentiable inside its interval of convergence and its derivatives are given by term-by-term differentiation of the power series. Furthermore, we can get an expression for the coefficients an in terms of the function f ; they are simply the
Taylor coefficients of f at c.
Theorem 10.22. If the power series

    f(x) = Σ_{n=0}^∞ a_n (x − c)^n

has radius of convergence R > 0, then f is infinitely differentiable in |x − c| < R and

    a_n = f^{(n)}(c)/n!.

Proof. We assume c = 0 without loss of generality. Applying Theorem 10.21 to the power series

    f(x) = a_0 + a_1 x + a_2 x² + a_3 x³ + · · · + a_n x^n + ...

k times, we find that f has derivatives of every order in |x| < R, and

    f′(x) = a_1 + 2a_2 x + 3a_3 x² + · · · + n a_n x^{n−1} + ...,
    f′′(x) = 2a_2 + (3 · 2)a_3 x + · · · + n(n − 1) a_n x^{n−2} + ...,
    f′′′(x) = (3 · 2 · 1)a_3 + · · · + n(n − 1)(n − 2) a_n x^{n−3} + ...,
    ⋮
    f^{(k)}(x) = (k!) a_k + · · · + (n!/(n − k)!) a_n x^{n−k} + ...,

where all of these power series have radius of convergence R. Setting x = 0 in these series, we get

    a_0 = f(0),   a_1 = f′(0),   ...,   a_k = f^{(k)}(0)/k!,   ...,

which proves the result (after replacing 0 by c).
One consequence of this result is that power series with different coefficients cannot converge to the same sum.

Corollary 10.23. If two power series

    Σ_{n=0}^∞ a_n (x − c)^n,     Σ_{n=0}^∞ b_n (x − c)^n

have nonzero radius of convergence and are equal in some neighborhood of c, then a_n = b_n for every n = 0, 1, 2, ....

Proof. If the common sum in |x − c| < δ is f(x), we have

    a_n = f^{(n)}(c)/n!,     b_n = f^{(n)}(c)/n!,

since the derivatives of f at c are determined by the values of f in an arbitrarily small open interval about c, so the coefficients are equal.

10.6. The exponential function

We showed in Example 10.9 that the power series

    E(x) = 1 + x + x²/2! + x³/3! + · · · + x^n/n! + ...

has radius of convergence ∞. It therefore defines an infinitely differentiable function E : R → R.

Term-by-term differentiation of the power series, which is justified by Theorem 10.21, implies that

    E′(x) = 1 + x + x²/2! + · · · + x^{n−1}/(n − 1)! + ...,

so E′ = E. Moreover, E(0) = 1. As we show below, there is a unique function with these properties, which are shared by the exponential function e^x. Thus, this power series provides an analytical definition of e^x = E(x). All of the other familiar properties of the exponential follow from its power-series definition, and we will prove a few of them here.

First, we show that e^x e^y = e^{x+y}. For the moment, we continue to write the function e^x as E(x) to emphasize that we use nothing beyond its power series definition.

Proposition 10.24. For every x, y ∈ R,

    E(x)E(y) = E(x + y).
Proof. We have

    E(x) = Σ_{j=0}^∞ x^j/j!,     E(y) = Σ_{k=0}^∞ y^k/k!.

Multiplying these series term-by-term and rearranging the sum as a Cauchy product, which is justified by Theorem 4.38, we get

    E(x)E(y) = Σ_{j=0}^∞ Σ_{k=0}^∞ (x^j/j!)(y^k/k!) = Σ_{n=0}^∞ Σ_{k=0}^n x^{n−k} y^k/((n − k)! k!).

From the binomial theorem,

    Σ_{k=0}^n x^{n−k} y^k/((n − k)! k!) = (1/n!) Σ_{k=0}^n (n!/((n − k)! k!)) x^{n−k} y^k = (x + y)^n/n!.

Hence,

    E(x)E(y) = Σ_{n=0}^∞ (x + y)^n/n! = E(x + y),

which proves the result.
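Proposition 10.24 can be checked numerically with truncated series; a short Python sketch (the truncation level 40 is an arbitrary choice, far beyond double precision for the arguments used):

```python
from math import factorial

# Truncated exponential series E_N(x) = sum_{n<=N} x^n / n!.
def E(x, N=40):
    return sum(x ** n / factorial(n) for n in range(N + 1))

# E(1)E(2) and E(3) should agree to within floating-point accuracy.
print(E(1.0) * E(2.0), E(3.0))   # both ~ e^3 ~ 20.0855
```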
In particular, since E(0) = 1, it follows that

    E(−x) = 1/E(x).

We have E(x) > 0 for all x ≥ 0, since all of the terms in its power series are positive, and then E(−x) = 1/E(x) > 0, so E(x) > 0 for all x ∈ R.
Next, we prove that the exponential is characterized by the properties E′ = E and E(0) = 1. This is a simple uniqueness result for an initial value problem for a linear ordinary differential equation.

Proposition 10.25. Suppose that f : R → R is a differentiable function such that

    f′ = f,     f(0) = 1.

Then f = E.

Proof. Suppose that f′ = f. Using the equation E′ = E, the fact that E is nonzero on R, and the quotient rule, we get

    (f/E)′ = (f′E − fE′)/E² = (fE − fE)/E² = 0.

It follows from Theorem 8.34 that f/E is constant on R. Since f(0) = E(0) = 1, we have f/E = 1, which implies that f = E.
In view of this result, we now write E(x) = e^x. The following proposition, which we use below in Section 10.7.2, shows that e^x grows faster than any power of x as x → ∞.

Proposition 10.26. Suppose that n is a non-negative integer. Then

    lim_{x→∞} x^n/e^x = 0.

Proof. The terms in the power series of e^x are positive for x > 0, so for every k ∈ N

    e^x = Σ_{j=0}^∞ x^j/j! > x^k/k!   for all x > 0.

Taking k = n + 1, we get for x > 0 that

    0 < x^n/e^x < x^n/(x^{n+1}/(n + 1)!) = (n + 1)!/x.

Since 1/x → 0 as x → ∞, the result follows.
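A numerical sketch of Proposition 10.26, using the explicit bound x^n/e^x < (n + 1)!/x from the proof (n = 5 is an arbitrary illustrative choice):

```python
from math import exp

# x^n / e^x for fixed n = 5, together with the proof's bound (n+1)!/x = 720/x.
n = 5
for x in (10.0, 50.0, 100.0):
    ratio = x ** n / exp(x)
    print(x, ratio, 720.0 / x)   # the ratio decays far faster than the bound
```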
The logarithm log : (0, ∞) → R can be defined as the inverse of the exponential function exp : R → (0, ∞), which is strictly increasing on R since its derivative is strictly positive. Having the logarithm and the exponential, we can define the power function for all exponents p ∈ R by

    x^p = e^{p log x},   x > 0.

Other transcendental functions, such as the trigonometric functions, can be defined in terms of their power series, and these can be used to prove their usual properties.
We will not carry all this out in detail; we just want to emphasize that, once we have developed the theory of power series, we can define all of the functions arising in elementary calculus from the first principles of analysis.

10.7. * Smooth versus analytic functions
The power series theorem, Theorem 10.22, looks similar to Taylor’s theorem, Theorem 8.46, but there is a fundamental difference. Taylor’s theorem gives an expression for the error between a function and its Taylor polynomials. No question of convergence is involved. On the other hand, Theorem 10.22 asserts the convergence of an infinite power series to a function f . The coefficients of the Taylor polynomials and the power series are the same in both cases, but Taylor’s theorem approximates f by its Taylor polynomials Pn (x) of degree n at c in the limit x → c with n fixed, while the power series theorem approximates f by Pn (x) in the limit n → ∞ with x fixed.


10.7.1. Taylor’s theorem and power series. To explain the difference between
Taylor’s theorem and power series in more detail, we introduce an important distinction between smooth and analytic functions: smooth functions have continuous derivatives of all orders, while analytic functions are sums of power series.
Definition 10.27. Let k ∈ N. A function f : (a, b) → R is C^k on (a, b), written f ∈ C^k(a, b), if it has continuous derivatives f^{(j)} : (a, b) → R of orders 1 ≤ j ≤ k. A function f is smooth (or C^∞, or infinitely differentiable) on (a, b), written f ∈ C^∞(a, b), if it has continuous derivatives of all orders on (a, b).
In fact, if f has derivatives of all orders, then they are automatically continuous, since the differentiability of f (k) implies its continuity; on the other hand, the existence of k derivatives of f does not imply the continuity of f (k) . The statement
“f is smooth” is sometimes used rather loosely to mean “f has as many continuous derivatives as we want,” but we will use it to mean that f is C ∞ .
Definition 10.28. A function f : (a, b) → R is analytic on (a, b) if for every c ∈ (a, b) the function f is the sum in a neighborhood of c of a power series centered at c with nonzero radius of convergence.
Strictly speaking, this is the definition of a real analytic function, and analytic functions are complex functions that are sums of power series. Since we consider only real functions here, we abbreviate “real analytic” to “analytic.”
Theorem 10.22 implies that an analytic function is smooth: If f is analytic on (a, b) and c ∈ (a, b), then there is an R > 0 and coefficients (a_n) such that

    f(x) = Σ_{n=0}^∞ a_n (x − c)^n   for |x − c| < R.

Then Theorem 10.22 implies that f has derivatives of all orders in |x − c| < R, and since c ∈ (a, b) is arbitrary, f has derivatives of all orders in (a, b). Moreover, it follows that the coefficients a_n in the power series expansion of f at c are given by Taylor's formula.
What is less obvious is that a smooth function need not be analytic. If f is smooth, then we can define its Taylor coefficients a_n = f^{(n)}(c)/n! at c for every n ≥ 0, and write down the corresponding Taylor series Σ a_n (x − c)^n. The problem is that the Taylor series may have zero radius of convergence if the derivatives of f grow too rapidly as n → ∞, in which case it diverges for every x ≠ c, or the Taylor series may converge, but not to f.
10.7.2. A smooth, non-analytic function. In this section, we give an example of a smooth function that is not the sum of its Taylor series.

It follows from Proposition 10.26 that if

    p(x) = Σ_{k=0}^n a_k x^k

is any polynomial function, then

    lim_{x→∞} p(x)/e^x = Σ_{k=0}^n a_k lim_{x→∞} x^k/e^x = 0.


Figure 3. Left: Plot y = φ(x) of the smooth, non-analytic function defined in Proposition 10.29. Right: A detail of the function near x = 0. The dotted line is the power function y = x⁶/50. The graph of φ near 0 is "flatter" than the graph of the power function, illustrating that φ(x) goes to zero faster than any power of x as x → 0⁺.

We will use this limit to exhibit a non-zero function that approaches zero faster than every power of x as x → 0⁺. As a result, all of its derivatives at 0 vanish, even though the function itself does not vanish in any neighborhood of 0. (See Figure 3.)

Proposition 10.29. Define φ : R → R by

    φ(x) = exp(−1/x) if x > 0,     φ(x) = 0 if x ≤ 0.

Then φ has derivatives of all orders on R and

    φ^{(n)}(0) = 0   for all n ≥ 0.

Proof. The infinite differentiability of φ at every x ≠ 0 follows from the chain rule. Moreover, its nth derivative has the form

    φ^{(n)}(x) = p_n(1/x) exp(−1/x) if x > 0,     φ^{(n)}(x) = 0 if x < 0,

where p_n(1/x) is a polynomial of degree 2n in 1/x. This follows, for example, by induction, since differentiation of φ^{(n)} shows that p_n satisfies the recursion relation

    p_{n+1}(z) = z² [p_n(z) − p_n′(z)],     p_0(z) = 1.
Thus, we just have to show that φ has derivatives of all orders at 0, and that these derivatives are equal to zero.
First, consider φ′(0). The left derivative φ′(0⁻) of φ at 0 is 0, since φ(0) = 0 and φ(h) = 0 for all h < 0. To find the right derivative, we write x = 1/h and use Proposition 10.26, which gives

    φ′(0⁺) = lim_{h→0⁺} (φ(h) − φ(0))/h = lim_{h→0⁺} exp(−1/h)/h = lim_{x→∞} x/e^x = 0.

Since both the left and right derivatives equal zero, we have φ′(0) = 0.

To show that all the derivatives of φ at 0 exist and are zero, we use a proof by induction. Suppose that φ^{(n)}(0) = 0, which we have verified for n = 1. The left derivative φ^{(n+1)}(0⁻) is clearly zero, so we just need to prove that the right derivative is zero. Using the form of φ^{(n)}(h) for h > 0 and Proposition 10.26, we get that

    φ^{(n+1)}(0⁺) = lim_{h→0⁺} (φ^{(n)}(h) − φ^{(n)}(0))/h = lim_{h→0⁺} p_n(1/h) exp(−1/h)/h = lim_{x→∞} x p_n(x)/e^x = 0,

which proves the result.
Corollary 10.30. The function φ : R → R defined by

    φ(x) = exp(−1/x) if x > 0,     φ(x) = 0 if x ≤ 0,

is smooth but not analytic on R.

Proof. From Proposition 10.29, the function φ is smooth, and the nth Taylor coefficient of φ at 0 is a_n = 0. The Taylor series of φ at 0 therefore converges to 0, so its sum is not equal to φ in any neighborhood of 0, meaning that φ is not analytic at 0.
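The flatness of φ at 0 is easy to observe numerically; a small Python sketch computing the difference quotients φ(h)/h, which the proof shows tend to 0 as h → 0⁺:

```python
from math import exp

# phi from Proposition 10.29, and its difference quotients phi(h)/h at 0,
# which tend to 0 as h -> 0+ even though phi(h) > 0 for every h > 0.
def phi(x):
    return exp(-1.0 / x) if x > 0 else 0.0

for h in (0.5, 0.1, 0.05, 0.01):
    print(h, phi(h) / h)   # the quotients collapse toward 0 very fast
```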
The fact that the Taylor polynomial of φ at 0 is zero for every degree n ∈ N does not contradict Taylor's theorem, which says that for every n ∈ N and x > 0 there exists 0 < ξ < x such that

    φ(x) = (φ^{(n)}(ξ)/n!) x^n.

Since the derivatives of φ are bounded, it follows that there is a constant C_n, depending on n, such that

    |φ(x)| ≤ C_n x^n   for all 0 < x < ∞.

Thus, φ(x) → 0 as x → 0⁺ faster than any power of x. But this inequality does not imply that φ(x) = 0 for x > 0, since C_n grows rapidly as n increases, and it is not the case that C_n x^n → 0 as n → ∞ for any x > 0, however small.
We can construct other smooth, non-analytic functions from φ.
Example 10.31. The function

    ψ(x) = exp(−1/x²) if x ≠ 0,     ψ(x) = 0 if x = 0,

is infinitely differentiable on R, since ψ(x) = φ(x²) is a composition of smooth functions.

The function in the next example is useful in many parts of analysis. Before giving the example, we introduce some terminology.
Definition 10.32. A function f : R → R has compact support if there exists R ≥ 0 such that f (x) = 0 for all x ∈ R with |x| ≥ R.
It isn't hard to construct continuous functions with compact support; one example that vanishes for |x| ≥ 1 is the piecewise-linear, triangular (or "tent") function

    f(x) = 1 − |x| if |x| < 1,     f(x) = 0 if |x| ≥ 1.

By matching left and right derivatives of piecewise-polynomial functions, we can similarly construct C¹ or C^k functions with compact support. Using φ, however, we can construct a smooth (C^∞) function with compact support, which might seem unexpected at first sight.
Example 10.33. The function

    η(x) = exp[−1/(1 − x²)] if |x| < 1,     η(x) = 0 if |x| ≥ 1,

is infinitely differentiable on R, since η(x) = φ(1 − x²) is a composition of smooth functions. Moreover, it vanishes for |x| ≥ 1, so it is a smooth function with compact support. Figure 4 shows its graph. This function is sometimes called a "bump" function.

The function φ defined in Proposition 10.29 illustrates that knowing the values of a smooth function and all of its derivatives at one point does not tell us anything about the values of the function at nearby points. This behavior contrasts with, and highlights, the remarkable property of analytic functions that the values of an analytic function and all of its derivatives at a single point of an interval determine the function on the whole interval.
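A quick sketch evaluating the bump function η (illustrative only), confirming the compact support and the value η(0) = e^{−1}:

```python
from math import exp

# The 'bump' function of Example 10.33: smooth on R, zero for |x| >= 1.
def eta(x):
    return exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

print(eta(0.0), eta(0.5), eta(1.0), eta(2.0))   # e^-1, ~0.264, 0.0, 0.0
```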
We make this principle of analytic continuation precise in the following proposition. The proof uses a common trick of going from a local result (equality of functions in a neighborhood of a point) to a global result (equality of functions on the whole of their connected domain) by proving that an appropriate subset is open, closed, and non-empty.


Figure 4. Plot of the smooth, compactly supported "bump" function defined in Example 10.33.

Proposition 10.34. Suppose that f, g : (a, b) → R are analytic functions on an open interval (a, b). If f^{(n)}(c) = g^{(n)}(c) for all n ≥ 0 at some point c ∈ (a, b), then f = g on (a, b).

Proof. Let

    E = { x ∈ (a, b) : f^{(n)}(x) = g^{(n)}(x) for all n ≥ 0 }.

The continuity of the derivatives f^{(n)}, g^{(n)} implies that E is closed in (a, b): If x_k ∈ E and x_k → x ∈ (a, b), then

    f^{(n)}(x) = lim_{k→∞} f^{(n)}(x_k) = lim_{k→∞} g^{(n)}(x_k) = g^{(n)}(x),

so x ∈ E, and E is closed.

The analyticity of f, g implies that E is open in (a, b): If x ∈ E, then f = g in some open interval (x − r, x + r) with r > 0, since both functions have the same Taylor coefficients and convergent power series centered at x, so f^{(n)} = g^{(n)} in (x − r, x + r), meaning that (x − r, x + r) ⊂ E, and E is open.

From Theorem 5.63, the interval (a, b) is connected, meaning that the only subsets that are open and closed in (a, b) are the empty set and the entire interval. But E ≠ ∅ since c ∈ E, so E = (a, b), which proves the result.
It is worth noting the choice of the set E in the preceding proof. For example, the proof would not work if we tried to use the set

    Ẽ = { x ∈ (a, b) : f(x) = g(x) }

instead of E. The continuity of f, g implies that Ẽ is closed, but Ẽ is not, in general, open.
One particular consequence of Proposition 10.34 is that a non-zero analytic function on R cannot have compact support, since an analytic function on R that is equal to zero on any interval (a, b) ⊂ R must equal zero on R. Thus, the nonanalyticity of the ‘bump’-function η in Example 10.33 is essential.

Chapter 11

The Riemann Integral

I know of some universities in England where the Lebesgue integral is taught in the first year of a mathematics degree instead of the Riemann integral, but I know of no universities in England where students learn the Lebesgue integral in the first year of a mathematics degree. (Approximate quotation attributed to T. W. Körner)

Let f : [a, b] → R be a bounded (not necessarily continuous) function on a compact (closed, bounded) interval. We will define what it means for f to be Riemann integrable on [a, b] and, in that case, define its Riemann integral ∫_a^b f.
The integral of f on [a, b] is a real number whose geometrical interpretation is the signed area under the graph y = f (x) for a ≤ x ≤ b. This number is also called the definite integral of f . By integrating f over an interval [a, x] with varying right end-point, we get a function of x, called an indefinite integral of f .
The most important result about integration is the fundamental theorem of calculus, which states that integration and differentiation are inverse operations in an appropriately understood sense. Among other things, this connection enables us to compute many integrals explicitly. We will prove the fundamental theorem in the next chapter. In this chapter, we define the Riemann integral and prove some of its basic properties.
Integrability is a less restrictive condition on a function than differentiability. Generally speaking, integration makes functions smoother, while differentiation makes functions rougher. For example, the indefinite integral of every continuous function exists and is differentiable, whereas the derivative of a continuous function need not exist (and typically doesn’t).
The Riemann integral is the simplest integral to define, and it allows one to integrate every continuous function as well as some not-too-badly discontinuous functions. There are, however, many other types of integrals, the most important of which is the Lebesgue integral. The Lebesgue integral allows one to integrate unbounded or highly discontinuous functions whose Riemann integrals do not exist,

and it has better mathematical properties than the Riemann integral. The definition of the Lebesgue integral is more involved, requiring the use of measure theory, and we will not discuss it here. In any event, the Riemann integral is adequate for many purposes, and even if one needs the Lebesgue integral, it is best to understand the Riemann integral first.

11.1. The supremum and infimum of functions
In this section we collect some results about the supremum and infimum of functions that we use to study Riemann integration. These results can be referred back to as needed. From Definition 6.11, the supremum or infimum of a function is the supremum or infimum of its range, and results about the supremum or infimum of sets translate immediately to results about functions. There are, however, a few differences, which come from the fact that we often compare the values of functions at the same point, rather than all of their values simultaneously.
Inequalities and operations on functions are defined pointwise as usual; for example, if f, g : A → R, then f ≤ g means that f (x) ≤ g(x) for every x ∈ A, and f + g : A → R is defined by (f + g)(x) = f (x) + g(x).
Proposition 11.1. Suppose that f, g : A → R and f ≤ g. Then

    sup_A f ≤ sup_A g,     inf_A f ≤ inf_A g.

Proof. If sup_A g = ∞, then sup_A f ≤ sup_A g. Otherwise, if f ≤ g and g is bounded from above, then

    f(x) ≤ g(x) ≤ sup_A g   for every x ∈ A.

Thus, f is bounded from above by sup_A g, so sup_A f ≤ sup_A g. Similarly, −f ≥ −g implies that sup_A(−f) ≥ sup_A(−g), so inf_A f ≤ inf_A g.
Note that f ≤ g does not imply that supA f ≤ inf A g; to get that conclusion, we need to know that f (x) ≤ g(y) for all x, y ∈ A and use Proposition 2.24.
Example 11.2. Define f, g : [0, 1] → R by f(x) = 2x, g(x) = 2x + 1. Then f < g and

    sup_{[0,1]} f = 2,   inf_{[0,1]} f = 0,   sup_{[0,1]} g = 3,   inf_{[0,1]} g = 1.

Thus, sup f > inf g even though f < g.
Example 11.3. Define f : [0, 1] → R by f (x) =

x if 0 ≤ x < 1,
0 if x = 1.

Then f < 1 on [0, 1] but sup[0,1] f = 1, and there is no point x ∈ [0, 1] such that f (x) = 1.


Next, we consider the supremum and infimum of linear combinations of functions. Multiplication of a function by a positive constant multiplies the inf or sup by that constant, while multiplication by a negative constant switches the inf and sup.
Proposition 11.4. Suppose that f : A → R is a bounded function and c ∈ R. If c ≥ 0, then

sup_A cf = c sup_A f,    inf_A cf = c inf_A f.

If c < 0, then

sup_A cf = c inf_A f,    inf_A cf = c sup_A f.

Proof. Apply Proposition 2.23 to the set {cf (x) : x ∈ A} = c{f (x) : x ∈ A}.
For sums of functions, we get an inequality.
Proposition 11.5. If f, g : A → R are bounded functions, then

sup_A (f + g) ≤ sup_A f + sup_A g,    inf_A (f + g) ≥ inf_A f + inf_A g.

Proof. Since f(x) ≤ sup_A f and g(x) ≤ sup_A g for every x ∈ A, we have

f(x) + g(x) ≤ sup_A f + sup_A g.

Thus, f + g is bounded from above by sup_A f + sup_A g, so

sup_A (f + g) ≤ sup_A f + sup_A g.

The proof for the infimum is analogous (or apply the result for the supremum to the functions −f , −g).
We may have strict inequality in Proposition 11.5 because f and g may take values close to their suprema (or infima) at different points.
Example 11.6. Define f, g : [0, 1] → R by f(x) = x, g(x) = 1 − x. Then

sup_{[0,1]} f = sup_{[0,1]} g = sup_{[0,1]} (f + g) = 1,

so sup(f + g) = 1 but sup f + sup g = 2. Here, f attains its supremum at 1, while g attains its supremum at 0.
Finally, we prove some inequalities that involve the absolute value.
Proposition 11.7. If f, g : A → R are bounded functions, then

|sup_A f − sup_A g| ≤ sup_A |f − g|,    |inf_A f − inf_A g| ≤ sup_A |f − g|.

Proof. Since f = f − g + g and f − g ≤ |f − g|, we get from Proposition 11.5 and Proposition 11.1 that

sup_A f ≤ sup_A (f − g) + sup_A g ≤ sup_A |f − g| + sup_A g,

so

sup_A f − sup_A g ≤ sup_A |f − g|.


Exchanging f and g in this inequality, we get

sup_A g − sup_A f ≤ sup_A |f − g|,

which implies that

|sup_A f − sup_A g| ≤ sup_A |f − g|.

Replacing f by −f and g by −g in this inequality, we get

|inf_A f − inf_A g| ≤ sup_A |f − g|,

where we use the fact that sup_A (−f) = − inf_A f.
Proposition 11.8. If f, g : A → R are bounded functions such that

|f(x) − f(y)| ≤ |g(x) − g(y)|    for all x, y ∈ A,

then

sup_A f − inf_A f ≤ sup_A g − inf_A g.

Proof. The condition implies that for all x, y ∈ A, we have

f(x) − f(y) ≤ |g(x) − g(y)| = max[g(x), g(y)] − min[g(x), g(y)] ≤ sup_A g − inf_A g,

which implies that

sup{f(x) − f(y) : x, y ∈ A} ≤ sup_A g − inf_A g.

From Proposition 2.24, we have

sup{f(x) − f(y) : x, y ∈ A} = sup_A f − inf_A f,

so the result follows.

11.2. Definition of the integral
The definition of the integral is more involved than the definition of the derivative.
The derivative is approximated by difference quotients, whereas the integral is approximated by upper and lower sums based on a partition of an interval.
We say that two intervals are almost disjoint if they are disjoint or intersect only at a common endpoint. For example, the intervals [0, 1] and [1, 3] are almost disjoint, whereas the intervals [0, 2] and [1, 3] are not.
Definition 11.9. Let I be a nonempty, compact interval. A partition of I is a finite collection {I1 , I2 , . . . , In } of almost disjoint, nonempty, compact subintervals whose union is I.
A partition of [a, b] with subintervals Ik = [xk−1 , xk ] is determined by the set of endpoints of the intervals a = x0 < x1 < x2 < · · · < xn−1 < xn = b.
Abusing notation, we will denote a partition P either by its intervals

P = {I1, I2, . . . , In}

or by the set of endpoints of the intervals

P = {x0, x1, x2, . . . , xn−1, xn}.

We'll adopt either notation as convenient; the context should make it clear which one is being used. There is always one more endpoint than interval.
Example 11.10. The set of intervals
{[0, 1/5], [1/5, 1/4], [1/4, 1/3], [1/3, 1/2], [1/2, 1]} is a partition of [0, 1]. The corresponding set of endpoints is
{0, 1/5, 1/4, 1/3, 1/2, 1}.
We denote the length of an interval I = [a, b] by

|I| = b − a.

Note that the sum of the lengths |Ik| = xk − xk−1 of the almost disjoint subintervals in a partition {I1, I2, . . . , In} of an interval I is equal to the length of the whole interval. This is obvious geometrically; algebraically, it follows from the telescoping series

∑_{k=1}^n |Ik| = ∑_{k=1}^n (xk − xk−1)
             = xn − xn−1 + xn−1 − xn−2 + · · · + x2 − x1 + x1 − x0
             = xn − x0
             = |I|.
Suppose that f : [a, b] → R is a bounded function on the compact interval I = [a, b] with

M = sup_I f,    m = inf_I f.

If P = {I1, I2, . . . , In} is a partition of I, let

Mk = sup_{Ik} f,    mk = inf_{Ik} f.

These suprema and infima are well-defined, finite real numbers since f is bounded. Moreover,

m ≤ mk ≤ Mk ≤ M.
If f is continuous on the interval I, then it is bounded and attains its maximum and minimum values on each subinterval, but a bounded discontinuous function need not attain its supremum or infimum.
We define the upper Riemann sum of f with respect to the partition P by

U(f; P) = ∑_{k=1}^n Mk |Ik| = ∑_{k=1}^n Mk (xk − xk−1),

and the lower Riemann sum of f with respect to the partition P by

L(f; P) = ∑_{k=1}^n mk |Ik| = ∑_{k=1}^n mk (xk − xk−1).

Geometrically, U(f; P) is the sum of the signed areas of rectangles based on the intervals Ik that lie above the graph of f, and L(f; P) is the sum of the signed areas of rectangles that lie below the graph of f. Note that

m(b − a) ≤ L(f; P) ≤ U(f; P) ≤ M(b − a).
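To make the definitions concrete, here is a small Python sketch (the function name and the sampling shortcut are ours, not the text's): it estimates Mk and mk on each subinterval by sampling f at many points, so for general f the returned values only approximate U(f; P) and L(f; P); for monotonic f they are exact, because the extreme values occur at the sampled endpoints.

```python
# Sketch: approximate the upper and lower Riemann sums U(f;P), L(f;P).
# sup/inf over each subinterval are estimated by dense sampling, so the
# result is an approximation for general f (exact for monotonic f).
def riemann_sums(f, endpoints, samples=1000):
    U = L = 0.0
    for a, b in zip(endpoints, endpoints[1:]):
        vals = [f(a + (b - a) * j / samples) for j in range(samples + 1)]
        U += max(vals) * (b - a)  # ~ M_k |I_k|
        L += min(vals) * (b - a)  # ~ m_k |I_k|
    return U, L

# f(x) = x on [0, 1] with 4 equal subintervals: U = 0.625, L = 0.375,
# bracketing the integral 1/2, consistent with m(b-a) <= L <= U <= M(b-a).
U, L = riemann_sums(lambda x: x, [0, 0.25, 0.5, 0.75, 1])
print(U, L)
```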
Let Π(a, b), or Π for short, denote the collection of all partitions of [a, b]. We define the upper Riemann integral of f on [a, b] by

U(f) = inf_{P∈Π} U(f; P).

The set {U(f; P) : P ∈ Π} of all upper Riemann sums of f is bounded from below by m(b − a), so this infimum is well-defined and finite. Similarly, the set {L(f; P) : P ∈ Π} of all lower Riemann sums is bounded from above by M(b − a), and we define the lower Riemann integral of f on [a, b] by

L(f) = sup_{P∈Π} L(f; P).

These upper and lower sums and integrals depend on the interval [a, b] as well as the function f, but to simplify the notation we won't show this explicitly. A commonly used alternative notation for the upper and lower integrals is

U(f) = \overline{∫_a^b} f,    L(f) = \underline{∫_a^b} f.

Note the use of "lower-upper" and "upper-lower" approximations for the integrals: we take the infimum of the upper sums and the supremum of the lower sums. As we show in Proposition 11.22 below, we always have L(f) ≤ U(f), but in general the upper and lower integrals need not be equal. We define Riemann integrability by their equality.
Definition 11.11. A function f : [a, b] → R is Riemann integrable on [a, b] if it is bounded and its upper integral U(f) and lower integral L(f) are equal. In that case, the Riemann integral of f on [a, b], denoted by

∫_a^b f(x) dx,    ∫_a^b f,    ∫_{[a,b]} f,

or similar notations, is the common value of U(f) and L(f).
An unbounded function is not Riemann integrable. In the following, "integrable" will mean "Riemann integrable," and "integral" will mean "Riemann integral," unless stated explicitly otherwise.
11.2.1. Examples. Let us illustrate the definition of Riemann integrability with a number of examples.
Example 11.12. Define f : [0, 1] → R by

f(x) = { 1/x if 0 < x ≤ 1,
       { 0 if x = 0.

Then

∫_0^1 (1/x) dx

isn't defined as a Riemann integral because f is unbounded. In fact, if

0 < x1 < x2 < · · · < xn−1 < 1

is a partition of [0, 1], then

sup_{[0,x1]} f = ∞,

so the upper Riemann sums of f are not well-defined.
An integral with an unbounded interval of integration, such as

∫_1^∞ (1/x) dx,

also isn't defined as a Riemann integral. In this case, a partition of [1, ∞) into finitely many intervals contains at least one unbounded interval, so the corresponding Riemann sum is not well-defined. A partition of [1, ∞) into bounded intervals (for example, Ik = [k, k + 1] with k ∈ N) gives an infinite series rather than a finite Riemann sum, leading to questions of convergence.

One can interpret the integrals in this example as limits of Riemann integrals, or improper Riemann integrals,

∫_1^∞ (1/x) dx = lim_{r→∞} ∫_1^r (1/x) dx,    ∫_0^1 (1/x) dx = lim_{ε→0+} ∫_ε^1 (1/x) dx,

but these are not proper Riemann integrals in the sense of Definition 11.11. Such improper Riemann integrals involve two limits: a limit of Riemann sums to define the Riemann integrals, followed by a limit of Riemann integrals. Both of the improper integrals in this example diverge to infinity. (See Section 12.4.)

Next, we consider some examples of bounded functions on compact intervals.
Example 11.13. The constant function f(x) = 1 on [0, 1] is Riemann integrable, and

∫_0^1 1 dx = 1.

To show this, let P = {I1, I2, . . . , In} be any partition of [0, 1] with endpoints

{0, x1, x2, . . . , xn−1, 1}.

Since f is constant,

Mk = sup_{Ik} f = 1,    mk = inf_{Ik} f = 1    for k = 1, . . . , n,

and therefore

U(f; P) = L(f; P) = ∑_{k=1}^n (xk − xk−1) = xn − x0 = 1.

Geometrically, this equation is the obvious fact that the sum of the areas of the rectangles over (or, equivalently, under) the graph of a constant function is exactly equal to the area under the graph. Thus, every upper and lower sum of f on [0, 1] is equal to 1, which implies that the upper and lower integrals

U(f) = inf_{P∈Π} U(f; P) = inf{1} = 1,    L(f) = sup_{P∈Π} L(f; P) = sup{1} = 1

are equal, and the integral of f is 1.


More generally, the same argument shows that every constant function f(x) = c is integrable and

∫_a^b c dx = c(b − a).

The following is an example of a discontinuous function that is Riemann integrable.
Example 11.14. The function f : [0, 1] → R defined by

f(x) = { 0 if 0 < x ≤ 1,
       { 1 if x = 0

is Riemann integrable, and

∫_0^1 f dx = 0.

To show this, let P = {I1, I2, . . . , In} be a partition of [0, 1]. Then, since f(x) = 0 for x > 0,

Mk = sup_{Ik} f = 0,    mk = inf_{Ik} f = 0    for k = 2, . . . , n.

The first interval in the partition is I1 = [0, x1], where 0 < x1 ≤ 1, and

M1 = 1,    m1 = 0,

since f(0) = 1 and f(x) = 0 for 0 < x ≤ x1. It follows that

U(f; P) = x1,    L(f; P) = 0.

Thus, L(f) = 0 and

U(f) = inf{x1 : 0 < x1 ≤ 1} = 0,

so U(f) = L(f) = 0 are equal, and the integral of f is 0. In this example, the infimum of the upper Riemann sums is not attained and U(f; P) > U(f) for every partition P.
A similar argument shows that a function f : [a, b] → R that is zero except at finitely many points in [a, b] is Riemann integrable with integral 0.
The next example is a bounded function on a compact interval whose Riemann integral doesn’t exist.
Example 11.15. The Dirichlet function f : [0, 1] → R is defined by

f(x) = { 1 if x ∈ [0, 1] ∩ Q,
       { 0 if x ∈ [0, 1] \ Q.

That is, f is one at every rational number and zero at every irrational number. This function is not Riemann integrable. If P = {I1, I2, . . . , In} is a partition of [0, 1], then

Mk = sup_{Ik} f = 1,    mk = inf_{Ik} f = 0,

since every interval of non-zero length contains both rational and irrational numbers. It follows that

U(f; P) = 1,    L(f; P) = 0

for every partition P of [0, 1], so U(f) = 1 and L(f) = 0 are not equal.


The Dirichlet function is discontinuous at every point of [0, 1], and the moral of the last example is that the Riemann integral of a highly discontinuous function need not exist. Nevertheless, some fairly discontinuous functions are still Riemann integrable.

Example 11.16. The Thomae function defined in Example 7.14 is Riemann integrable. The proof is left as an exercise.
Theorem 11.58 and Theorem 11.61 below give precise statements of the extent to which a Riemann integrable function can be discontinuous.
11.2.2. Refinements of partitions. As the previous examples illustrate, a direct verification of integrability from Definition 11.11 is unwieldy even for the simplest functions because we have to consider all possible partitions of the interval of integration. To give an effective analysis of Riemann integrability, we need to study how upper and lower sums behave under the refinement of partitions.
Definition 11.17. A partition Q = {J1, J2, . . . , Jm} is a refinement of a partition P = {I1, I2, . . . , In} if every interval Ik in P is an almost disjoint union of one or more intervals Jℓ in Q.
Equivalently, if we represent partitions by their endpoints, then Q is a refinement of P if Q ⊃ P , meaning that every endpoint of P is an endpoint of Q. We don’t require that every interval — or even any interval — in a partition has to be split into smaller intervals to obtain a refinement; for example, every partition is a refinement of itself.
Example 11.18. Consider the partitions of [0, 1] with endpoints

P = {0, 1/2, 1},    Q = {0, 1/3, 2/3, 1},    R = {0, 1/4, 1/2, 3/4, 1}.

Thus, P, Q, and R partition [0, 1] into intervals of equal length 1/2, 1/3, and 1/4, respectively. Then Q is not a refinement of P but R is a refinement of P.
Given two partitions, neither one need be a refinement of the other. However, two partitions P, Q always have a common refinement; the smallest one is R = P ∪ Q, meaning that the endpoints of R are exactly the endpoints of P or Q (or both).

Example 11.19. Let P = {0, 1/2, 1} and Q = {0, 1/3, 2/3, 1}, as in Example 11.18.
Then Q isn't a refinement of P and P isn't a refinement of Q. The partition S = P ∪ Q, or

S = {0, 1/3, 1/2, 2/3, 1},

is a refinement of both P and Q. The partition S is not a refinement of R, but T = R ∪ S, or

T = {0, 1/4, 1/3, 1/2, 2/3, 3/4, 1},

is a common refinement of all of the partitions {P, Q, R, S}.
As we show next, refining partitions decreases upper sums and increases lower sums. (The proof is easier to understand than it is to write out — draw a picture!)


Theorem 11.20. Suppose that f : [a, b] → R is bounded, P is a partition of [a, b], and Q is a refinement of P. Then

U(f; Q) ≤ U(f; P),    L(f; P) ≤ L(f; Q).

Proof. Let

P = {I1, I2, . . . , In},    Q = {J1, J2, . . . , Jm}

be partitions of [a, b], where Q is a refinement of P, so m ≥ n. We list the intervals in increasing order of their endpoints. Define

Mk = sup_{Ik} f,    mk = inf_{Ik} f,    M′ℓ = sup_{Jℓ} f,    m′ℓ = inf_{Jℓ} f.

Since Q is a refinement of P, each interval Ik in P is an almost disjoint union of intervals in Q, which we can write as

Ik = ⋃_{ℓ=pk}^{qk} Jℓ

for some indices pk ≤ qk. If pk < qk, then Ik is split into two or more smaller intervals in Q, and if pk = qk, then Ik belongs to both P and Q. Since the intervals are listed in order, we have

p1 = 1,    p_{k+1} = qk + 1,    qn = m.

If pk ≤ ℓ ≤ qk, then Jℓ ⊂ Ik, so

M′ℓ ≤ Mk,    m′ℓ ≥ mk    for pk ≤ ℓ ≤ qk.

Using the fact that the sum of the lengths of the J-intervals is the length of the corresponding I-interval, we get that

∑_{ℓ=pk}^{qk} M′ℓ |Jℓ| ≤ ∑_{ℓ=pk}^{qk} Mk |Jℓ| = Mk ∑_{ℓ=pk}^{qk} |Jℓ| = Mk |Ik|.

It follows that

U(f; Q) = ∑_{ℓ=1}^m M′ℓ |Jℓ| = ∑_{k=1}^n ∑_{ℓ=pk}^{qk} M′ℓ |Jℓ| ≤ ∑_{k=1}^n Mk |Ik| = U(f; P).

Similarly,

∑_{ℓ=pk}^{qk} m′ℓ |Jℓ| ≥ ∑_{ℓ=pk}^{qk} mk |Jℓ| = mk |Ik|,

and

L(f; Q) = ∑_{ℓ=1}^m m′ℓ |Jℓ| = ∑_{k=1}^n ∑_{ℓ=pk}^{qk} m′ℓ |Jℓ| ≥ ∑_{k=1}^n mk |Ik| = L(f; P),

which proves the result.
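A quick numeric illustration of Theorem 11.20 (the helper below is our name, not the text's): the sums are exact here because f(x) = x² is increasing on [0, 1], so the sup and inf on each subinterval are the endpoint values.

```python
# For increasing f, M_k = f(x_k) and m_k = f(x_{k-1}), so U and L are
# exact finite sums. Refining P into Q decreases U and increases L.
def sums_increasing(f, pts):
    U = sum(f(b) * (b - a) for a, b in zip(pts, pts[1:]))
    L = sum(f(a) * (b - a) for a, b in zip(pts, pts[1:]))
    return U, L

f = lambda x: x * x
P = [0, 0.5, 1]
Q = [0, 0.25, 0.5, 0.75, 1]   # a refinement of P
UP, LP = sums_increasing(f, P)
UQ, LQ = sums_increasing(f, Q)
print(UP, UQ)  # U(f;Q) <= U(f;P)
print(LP, LQ)  # L(f;P) <= L(f;Q)
```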
It follows from this theorem that all lower sums are less than or equal to all upper sums, not just the lower and upper sums associated with the same partition.
Proposition 11.21. If f : [a, b] → R is bounded and P, Q are partitions of [a, b], then

L(f; P) ≤ U(f; Q).


Proof. Let R be a common refinement of P and Q. Then, by Theorem 11.20,

L(f; P) ≤ L(f; R),    U(f; R) ≤ U(f; Q).

It follows that

L(f; P) ≤ L(f; R) ≤ U(f; R) ≤ U(f; Q).

An immediate consequence of this result is that the lower integral is always less than or equal to the upper integral.
Proposition 11.22. If f : [a, b] → R is bounded, then
L(f ) ≤ U (f ).
Proof. Let

A = {L(f; P) : P ∈ Π},    B = {U(f; P) : P ∈ Π}.

From Proposition 11.21, L ≤ U for every L ∈ A and U ∈ B, so Proposition 2.22 implies that sup A ≤ inf B, or L(f) ≤ U(f).

11.3. The Cauchy criterion for integrability
The following theorem gives a criterion for integrability that is analogous to the
Cauchy condition for the convergence of a sequence.
Theorem 11.23. A bounded function f : [a, b] → R is Riemann integrable if and only if for every ε > 0 there exists a partition P of [a, b], which may depend on ε, such that

U(f; P) − L(f; P) < ε.
Proof. First, suppose that the condition holds. Let ε > 0 and choose a partition P that satisfies the condition. Then, since U(f) ≤ U(f; P) and L(f; P) ≤ L(f), we have

0 ≤ U(f) − L(f) ≤ U(f; P) − L(f; P) < ε.

Since this inequality holds for every ε > 0, we must have U(f) − L(f) = 0, and f is integrable.
Conversely, suppose that f is integrable. Given any ε > 0, there are partitions Q, R such that

U(f; Q) < U(f) + ε/2,    L(f; R) > L(f) − ε/2.

Let P be a common refinement of Q and R. Then, by Theorem 11.20,

U(f; P) − L(f; P) ≤ U(f; Q) − L(f; R) < U(f) − L(f) + ε.

Since U(f) = L(f), the condition follows.
If U(f; P) − L(f; P) < ε, then U(f; Q) − L(f; Q) < ε for every refinement Q of P, so the Cauchy condition means that a function is integrable if and only if its upper and lower sums get arbitrarily close together for all sufficiently refined partitions.

It is worth considering in more detail what the Cauchy condition in Theorem 11.23 implies about the behavior of a Riemann integrable function.
Definition 11.24. The oscillation of a bounded function f on a set A is

osc_A f = sup_A f − inf_A f.
If f : [a, b] → R is bounded and P = {I1, I2, . . . , In} is a partition of [a, b], then

U(f; P) − L(f; P) = ∑_{k=1}^n sup_{Ik} f · |Ik| − ∑_{k=1}^n inf_{Ik} f · |Ik| = ∑_{k=1}^n osc_{Ik} f · |Ik|.

A function f is Riemann integrable if we can make U (f ; P ) − L(f ; P ) as small as we wish. This is the case if we can find a sufficiently refined partition P such that the oscillation of f on most intervals is arbitrarily small, and the sum of the lengths of the remaining intervals (where the oscillation of f is large) is arbitrarily small.
For example, the discontinuous function in Example 11.14 has zero oscillation on every interval except the first one, where the function has oscillation one, but the length of that interval can be made as small as we wish.
Thus, roughly speaking, a function is Riemann integrable if it oscillates by an arbitrarily small amount except on a finite collection of intervals whose total length is arbitrarily small. Theorem 11.58 gives a precise statement.
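The oscillation formula can be checked numerically for the step function of Example 11.14 (the helper name is ours; the sampled sup/inf are exact for this particular f because the left endpoint 0 is always a sample point):

```python
# U(f;P) - L(f;P) = sum of osc(f; I_k) * |I_k|. For the step function
# of Example 11.14 (f(0) = 1, f(x) = 0 for x > 0) the oscillation is 1
# on the first subinterval and 0 on the rest, so the gap is just the
# length of the first subinterval.
def upper_lower_gap(f, pts, samples=500):
    gap = 0.0
    for a, b in zip(pts, pts[1:]):
        vals = [f(a + (b - a) * j / samples) for j in range(samples + 1)]
        gap += (max(vals) - min(vals)) * (b - a)
    return gap

step = lambda x: 1.0 if x == 0 else 0.0   # Example 11.14

# With n equal subintervals the gap is 1/n, which -> 0 as n grows.
for n in (2, 10, 100):
    print(n, upper_lower_gap(step, [k / n for k in range(n + 1)]))
```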
One direct consequence of the Cauchy criterion is that a function is integrable if we can estimate its oscillation by the oscillation of an integrable function.
Proposition 11.25. Suppose that f, g : [a, b] → R are bounded functions and g is integrable on [a, b]. If there exists a constant C ≥ 0 such that

osc_I f ≤ C osc_I g

on every interval I ⊂ [a, b], then f is integrable.
Proof. If P = {I1, I2, . . . , In} is a partition of [a, b], then

U(f; P) − L(f; P) = ∑_{k=1}^n osc_{Ik} f · |Ik| ≤ C ∑_{k=1}^n osc_{Ik} g · |Ik| ≤ C [U(g; P) − L(g; P)].

Thus, f satisfies the Cauchy criterion in Theorem 11.23 if g does, which proves that f is integrable if g is integrable.
We can also use the Cauchy criterion to give a sequential characterization of integrability.

Theorem 11.26. A bounded function f : [a, b] → R is Riemann integrable if and only if there is a sequence (Pn) of partitions such that

lim_{n→∞} [U(f; Pn) − L(f; Pn)] = 0.


In that case,

∫_a^b f = lim_{n→∞} U(f; Pn) = lim_{n→∞} L(f; Pn).

Proof. First, suppose that the condition holds. Then, given ε > 0, there is an n ∈ N such that U(f; Pn) − L(f; Pn) < ε, so Theorem 11.23 implies that f is integrable and U(f) = L(f).
Furthermore, since U(f) ≤ U(f; Pn) and L(f; Pn) ≤ L(f), we have

0 ≤ U(f; Pn) − U(f) = U(f; Pn) − L(f) ≤ U(f; Pn) − L(f; Pn).

Since the limit of the right-hand side is zero, the 'squeeze' theorem implies that

lim_{n→∞} U(f; Pn) = U(f) = ∫_a^b f.

It also follows that

lim_{n→∞} L(f; Pn) = lim_{n→∞} U(f; Pn) − lim_{n→∞} [U(f; Pn) − L(f; Pn)] = ∫_a^b f.

Conversely, if f is integrable then, by Theorem 11.23, for every n ∈ N there exists a partition Pn such that

0 ≤ U(f; Pn) − L(f; Pn) < 1/n,

and U(f; Pn) − L(f; Pn) → 0 as n → ∞.
Note that if the limits of U(f; Pn) and L(f; Pn) both exist and are equal, then

lim_{n→∞} [U(f; Pn) − L(f; Pn)] = lim_{n→∞} U(f; Pn) − lim_{n→∞} L(f; Pn),

so the conditions of the theorem are satisfied. Conversely, the proof of the theorem shows that if the limit of U(f; Pn) − L(f; Pn) is zero, then the limits of U(f; Pn) and L(f; Pn) both exist and are equal. This isn't true for general sequences, where one may have lim(an − bn) = 0 even though lim an and lim bn don't exist.
Theorem 11.26 provides one way to prove the existence of an integral and, in some cases, evaluate it.
Example 11.27. Consider the function f(x) = x² on [0, 1]. Let Pn be the partition of [0, 1] into n intervals of equal length 1/n with endpoints xk = k/n for k = 0, 1, 2, . . . , n. If Ik = [(k − 1)/n, k/n] is the kth interval, then

sup_{Ik} f = x_k²,    inf_{Ik} f = x_{k−1}²,

since f is increasing. Using the formula for the sum of squares

∑_{k=1}^n k² = (1/6) n(n + 1)(2n + 1),

we get

U(f; Pn) = ∑_{k=1}^n x_k² · (1/n) = (1/n³) ∑_{k=1}^n k² = (1/6)(1 + 1/n)(2 + 1/n)

[Figure 1. Upper and lower Riemann sums for Example 11.27 with n = 5, 10, 50 subintervals of equal length. The panel titles give the values: upper sums 0.44, 0.385, 0.3434 and lower sums 0.24, 0.285, 0.3234.]


and

L(f; Pn) = ∑_{k=1}^n x_{k−1}² · (1/n) = (1/n³) ∑_{k=1}^{n−1} k² = (1/6)(1 − 1/n)(2 − 1/n).

(See Figure 1.) It follows that

lim_{n→∞} U(f; Pn) = lim_{n→∞} L(f; Pn) = 1/3,

and Theorem 11.26 implies that x² is integrable on [0, 1] with

∫_0^1 x² dx = 1/3.

The fundamental theorem of calculus, Theorem 12.1 below, provides a much easier way to evaluate this integral, but the Riemann sums provide the basic definition of the integral.
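The computation in Example 11.27 is easy to check numerically (the helper name is ours): the finite sums match the closed forms, and both squeeze down to the integral 1/3.

```python
# Numeric check of Example 11.27: for f(x) = x^2 on [0, 1] with n equal
# subintervals, U = (1/6)(1 + 1/n)(2 + 1/n) and L = (1/6)(1 - 1/n)(2 - 1/n),
# and both tend to 1/3 with U > 1/3 > L.
def upper_lower_x2(n):
    U = sum((k / n) ** 2 * (1 / n) for k in range(1, n + 1))
    L = sum(((k - 1) / n) ** 2 * (1 / n) for k in range(1, n + 1))
    return U, L

for n in (10, 100, 1000):
    U, L = upper_lower_x2(n)
    print(n, U, L)  # both approach 1/3 as n grows
```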

11.4. Continuous and monotonic functions
The Cauchy criterion leads to the following fundamental result that every continuous function is Riemann integrable. To prove this result, we use the fact that a continuous function oscillates by an arbitrarily small amount on every interval of a sufficiently refined partition.
Theorem 11.28. A continuous function f : [a, b] → R on a compact interval is
Riemann integrable.

Proof. A continuous function on a compact set is bounded, so we just need to verify the Cauchy condition in Theorem 11.23.

Let ε > 0. A continuous function on a compact set is uniformly continuous, so there exists δ > 0 such that

|f(x) − f(y)| < ε/(b − a)    for all x, y ∈ [a, b] such that |x − y| < δ.

Choose a partition P = {I1, I2, . . . , In} of [a, b] such that |Ik| < δ for every k; for example, we can take n intervals of equal length (b − a)/n with n > (b − a)/δ.

Since f is continuous, it attains its maximum and minimum values Mk and mk on the compact interval Ik at points xk and yk in Ik. These points satisfy |xk − yk| < δ, so

Mk − mk = f(xk) − f(yk) < ε/(b − a).


The upper and lower sums of f therefore satisfy

U(f; P) − L(f; P) = ∑_{k=1}^n Mk |Ik| − ∑_{k=1}^n mk |Ik|
                 = ∑_{k=1}^n (Mk − mk)|Ik|
                 < (ε/(b − a)) ∑_{k=1}^n |Ik|
                 = ε,

and Theorem 11.23 implies that f is integrable.
Example 11.29. The function f(x) = x² on [0, 1] considered in Example 11.27 is integrable since it is continuous.
Another class of integrable functions consists of monotonic (increasing or decreasing) functions.
Theorem 11.30. A monotonic function f : [a, b] → R on a compact interval is
Riemann integrable.
Proof. Suppose that f is monotonic increasing, meaning that f(x) ≤ f(y) for x ≤ y. Let Pn = {I1, I2, . . . , In} be a partition of [a, b] into n intervals Ik = [xk−1, xk] of equal length (b − a)/n, with endpoints

xk = a + k(b − a)/n,    k = 0, 1, . . . , n.

Since f is increasing,

Mk = sup_{Ik} f = f(xk),    mk = inf_{Ik} f = f(xk−1).

Hence, summing a telescoping series, we get

U(f; Pn) − L(f; Pn) = ∑_{k=1}^n (Mk − mk)(xk − xk−1)
                   = ((b − a)/n) ∑_{k=1}^n [f(xk) − f(xk−1)]
                   = ((b − a)/n) [f(b) − f(a)].

It follows that U(f; Pn) − L(f; Pn) → 0 as n → ∞, and Theorem 11.26 implies that f is integrable.

The proof for a monotonic decreasing function f is similar, with

sup_{Ik} f = f(xk−1),    inf_{Ik} f = f(xk),

or we can apply the result for increasing functions to −f and use Theorem 11.32 below.

[Figure 2. The graph of the monotonic function in Example 11.31 with a countably infinite, dense set of jump discontinuities.]

Monotonic functions needn’t be continuous, and they may be discontinuous at a countably infinite number of points.
Example 11.31. Let {qk : k ∈ N} be an enumeration of the rational numbers in [0, 1) and let (ak) be a sequence of strictly positive real numbers such that

∑_{k=1}^∞ ak = 1.

Define f : [0, 1] → R by

f(x) = ∑_{k∈Q(x)} ak,    Q(x) = {k ∈ N : qk ∈ [0, x)},

for x > 0, and f(0) = 0. That is, f(x) is obtained by summing the terms in the series whose indices k correspond to the rational numbers qk such that 0 ≤ qk < x. For x = 1, this sum includes all the terms in the series, so f(1) = 1. For every 0 < x < 1, there are infinitely many terms in the sum, since the rationals are dense in [0, x), and f is increasing, since the number of terms increases with x.
By Theorem 11.30, f is Riemann integrable on [0, 1]. Although f is integrable, it has a jump discontinuity at every rational number in [0, 1), a countably infinite set that is dense in [0, 1]. The function is continuous elsewhere (the proof is left as an exercise).
Figure 2 shows the graph of f corresponding to the enumeration

{0, 1/2, 1/3, 2/3, 1/4, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6, 5/6, 1/7, . . .}

of the rational numbers in [0, 1) and ak = (6/π²)(1/k²). What is its Riemann integral?


11.5. Linearity, monotonicity, and additivity
The integral has the following three basic properties.
(1) Linearity:

∫_a^b cf = c ∫_a^b f,    ∫_a^b (f + g) = ∫_a^b f + ∫_a^b g.

(2) Monotonicity: if f ≤ g, then

∫_a^b f ≤ ∫_a^b g.

(3) Additivity: if a < c < b, then

∫_a^c f + ∫_c^b f = ∫_a^b f.

These properties are analogous to the corresponding properties of sums (or convergent series):

∑_{k=1}^n cak = c ∑_{k=1}^n ak,    ∑_{k=1}^n (ak + bk) = ∑_{k=1}^n ak + ∑_{k=1}^n bk;

∑_{k=1}^n ak ≤ ∑_{k=1}^n bk    if ak ≤ bk;

∑_{k=1}^m ak + ∑_{k=m+1}^n ak = ∑_{k=1}^n ak.

In this section, we prove these properties and derive a few of their consequences.
11.5.1. Linearity. We begin by proving the linearity. First we prove linearity with respect to scalar multiplication and then linearity with respect to sums.
Theorem 11.32. If f : [a, b] → R is integrable and c ∈ R, then cf is integrable and

∫_a^b cf = c ∫_a^b f.

Proof. Suppose that c ≥ 0. Then for any set A ⊂ [a, b], we have

sup_A cf = c sup_A f,    inf_A cf = c inf_A f,

so U(cf; P) = cU(f; P) for every partition P. Taking the infimum over the set Π of all partitions of [a, b], we get

U(cf) = inf_{P∈Π} U(cf; P) = inf_{P∈Π} cU(f; P) = c inf_{P∈Π} U(f; P) = cU(f).

Similarly, L(cf; P) = cL(f; P) and L(cf) = cL(f). If f is integrable, then

U(cf) = cU(f) = cL(f) = L(cf),

which shows that cf is integrable and ∫_a^b cf = c ∫_a^b f.


Now consider −f. Since

sup_A (−f) = − inf_A f,    inf_A (−f) = − sup_A f,

we have

U(−f; P) = −L(f; P),    L(−f; P) = −U(f; P).

Therefore

U(−f) = inf_{P∈Π} U(−f; P) = inf_{P∈Π} [−L(f; P)] = − sup_{P∈Π} L(f; P) = −L(f),

L(−f) = sup_{P∈Π} L(−f; P) = sup_{P∈Π} [−U(f; P)] = − inf_{P∈Π} U(f; P) = −U(f).

Hence, −f is integrable if f is integrable and

∫_a^b (−f) = − ∫_a^b f.

Finally, if c < 0, then c = −|c|, and a successive application of the previous results shows that cf is integrable with ∫_a^b cf = c ∫_a^b f.
Next, we prove the linearity of the integral with respect to sums. If f, g are bounded, then f + g is bounded and

sup_I (f + g) ≤ sup_I f + sup_I g,    inf_I (f + g) ≥ inf_I f + inf_I g.

It follows that

osc_I (f + g) ≤ osc_I f + osc_I g,

so f + g is integrable if f, g are integrable. In general, however, the upper (or lower) sum of f + g needn't be the sum of the corresponding upper (or lower) sums of f and g. As a result, we don't get

∫_a^b (f + g) = ∫_a^b f + ∫_a^b g

simply by adding upper and lower sums. Instead, we prove this equality by estimating the upper and lower integrals of f + g from above and below by those of f and g.
Theorem 11.33. If f, g : [a, b] → R are integrable functions, then f + g is integrable, and

∫_a^b (f + g) = ∫_a^b f + ∫_a^b g.

Proof. We first prove that if f, g : [a, b] → R are bounded, but not necessarily integrable, then

U(f + g) ≤ U(f) + U(g),    L(f + g) ≥ L(f) + L(g).


Suppose that P = {I1, I2, . . . , In} is a partition of [a, b]. Then

U(f + g; P) = ∑_{k=1}^n sup_{Ik} (f + g) · |Ik|
           ≤ ∑_{k=1}^n sup_{Ik} f · |Ik| + ∑_{k=1}^n sup_{Ik} g · |Ik|
           = U(f; P) + U(g; P).
Let ε > 0. Since the upper integral is the infimum of the upper sums, there are partitions Q, R such that

U(f; Q) < U(f) + ε/2,    U(g; R) < U(g) + ε/2,

and if P is a common refinement of Q and R, then

U(f; P) < U(f) + ε/2,    U(g; P) < U(g) + ε/2.

It follows that

U(f + g) ≤ U(f + g; P) ≤ U(f; P) + U(g; P) < U(f) + U(g) + ε.

Since this inequality holds for arbitrary ε > 0, we must have U(f + g) ≤ U(f) + U(g). Similarly, we have L(f + g; P) ≥ L(f; P) + L(g; P) for all partitions P, and for every ε > 0, we get L(f + g) > L(f) + L(g) − ε, so L(f + g) ≥ L(f) + L(g).
For integrable functions f and g, it follows that

U(f + g) ≤ U(f) + U(g) = L(f) + L(g) ≤ L(f + g).

Since U(f + g) ≥ L(f + g), we have U(f + g) = L(f + g), and f + g is integrable. Moreover, there is equality throughout the previous inequality, which proves the result.

Although the integral is linear, the upper and lower integrals of non-integrable functions are not, in general, linear.
Example 11.34. Define f, g : [0, 1] → R by

f(x) = { 1 if x ∈ [0, 1] ∩ Q,        g(x) = { 0 if x ∈ [0, 1] ∩ Q,
       { 0 if x ∈ [0, 1] \ Q,               { 1 if x ∈ [0, 1] \ Q.

That is, f is the Dirichlet function and g = 1 − f. Then

U(f) = U(g) = 1,    L(f) = L(g) = 0,    U(f + g) = L(f + g) = 1,

so

U(f + g) < U(f) + U(g),    L(f + g) > L(f) + L(g).
The product of integrable functions is also integrable, as is the quotient provided it remains bounded. Unlike the integral of the sum, however, there is no way to express the integral of the product fg in terms of the integrals of f and g.

Theorem 11.35. If f, g : [a, b] → R are integrable, then fg : [a, b] → R is integrable. If, in addition, g ≠ 0 and 1/g is bounded, then f/g : [a, b] → R is integrable.

Proof. First, we show that the square of an integrable function is integrable. If f is integrable, then f is bounded, with |f | ≤ M for some M ≥ 0. For all x, y ∈ [a, b], we have f 2 (x) − f 2 (y) = |f (x) + f (y)| · |f (x) − f (y)| ≤ 2M |f (x) − f (y)|.
Taking the supremum of this inequality over x, y ∈ I ⊂ [a, b] and using Proposition 11.8, we get that sup(f 2 ) − inf (f 2 ) ≤ 2M sup f − inf f .
I

I

I

I

meaning that osc(f 2 ) ≤ 2M osc f.
I

I

If follows from Proposition 11.25 that f 2 is integrable if f is integrable.
Since the integral is linear, we then see from the identity

    fg = (1/4)[(f + g)² − (f − g)²]

that fg is integrable if f, g are integrable. We remark that the trick of representing a product as a difference of squares isn't a new one: the ancient Babylonians apparently used this identity, together with a table of squares, to compute products.
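The polarization identity itself is easy to spot-check numerically. A Python sketch (the functions sin and exp are arbitrary choices for illustration):

```python
# Numerical spot-check of fg = ((f + g)**2 - (f - g)**2) / 4 at many sample
# points; sin and exp are arbitrary choices.
import math

def check_identity(f, g, xs, tol=1e-12):
    for x in xs:
        lhs = f(x) * g(x)
        rhs = ((f(x) + g(x)) ** 2 - (f(x) - g(x)) ** 2) / 4
        # relative comparison, guarding against lhs near zero
        assert abs(lhs - rhs) <= tol * max(1.0, abs(lhs))
    return True

xs = [k / 100 for k in range(1, 101)]
print(check_identity(math.sin, math.exp, xs))
```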
In a similar way, if g ≠ 0 and |1/g| ≤ M, then

    |1/g(x) − 1/g(y)| = |g(x) − g(y)| / |g(x)g(y)| ≤ M² |g(x) − g(y)|.

Taking the supremum of this inequality over x, y ∈ I ⊂ [a, b], we get

    sup_I(1/g) − inf_I(1/g) ≤ M² (sup_I g − inf_I g),

meaning that osc_I(1/g) ≤ M² osc_I g, and Proposition 11.25 implies that 1/g is integrable if g is integrable. Therefore f/g = f · (1/g) is integrable.
11.5.2. Monotonicity. Next, we prove the monotonicity of the integral.
Theorem 11.36. Suppose that f, g : [a, b] → R are integrable and f ≤ g. Then

    ∫_a^b f ≤ ∫_a^b g.

Proof. First suppose that f ≥ 0 is integrable. Let P be the partition consisting of the single interval [a, b]. Then

    L(f; P) = inf_{[a,b]} f · (b − a) ≥ 0,

so

    ∫_a^b f ≥ L(f; P) ≥ 0.

If f ≤ g, then h = g − f ≥ 0, and the linearity of the integral implies that

    ∫_a^b g − ∫_a^b f = ∫_a^b h ≥ 0,

which proves the theorem.

One immediate consequence of this theorem is the following simple, but useful, estimate for integrals.
Theorem 11.37. Suppose that f : [a, b] → R is integrable and

    M = sup_{[a,b]} f,   m = inf_{[a,b]} f.

Then

    m(b − a) ≤ ∫_a^b f ≤ M(b − a).

Proof. Since m ≤ f ≤ M on [a, b], Theorem 11.36 implies that

    ∫_a^b m ≤ ∫_a^b f ≤ ∫_a^b M,

which gives the result.
This estimate also follows from the definition of the integral in terms of upper and lower sums, but once we’ve established the monotonicity of the integral, we don’t need to go back to the definition.
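As a numerical illustration of Theorem 11.37, here is a Python sketch with the arbitrary choice f(x) = x² on [0, 1], where ∫_0^1 f = 1/3, m = 0, and M = 1:

```python
# f(x) = x**2 on [0, 1]: m = 0, M = 1, and the integral 1/3 lies between
# m*(b - a) = 0 and M*(b - a) = 1. The integral is approximated by a
# composite midpoint Riemann sum; sup and inf are approximated by sampling.

def midpoint_sum(f, a, b, n):
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

f = lambda x: x * x
a, b, n = 0.0, 1.0, 1000
I = midpoint_sum(f, a, b, n)
m = min(f(a + k * (b - a) / n) for k in range(n + 1))  # crude inf via sampling
M = max(f(a + k * (b - a) / n) for k in range(n + 1))  # crude sup via sampling
assert m * (b - a) <= I <= M * (b - a)
print(round(I, 4))  # close to 1/3
```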
A further consequence is the intermediate value theorem for integrals, which states that a continuous function on a compact interval is equal to its average value at some point in the interval.
Theorem 11.38. If f : [a, b] → R is continuous, then there exists x ∈ [a, b] such that

    f(x) = (1/(b − a)) ∫_a^b f.

Proof. Since f is a continuous function on a compact interval, the extreme value theorem (Theorem 7.37) implies that it attains its maximum value M and its minimum value m. From Theorem 11.37,

    m ≤ (1/(b − a)) ∫_a^b f ≤ M.

By the intermediate value theorem (Theorem 7.44), f takes on every value between m and M, and the result follows.
As shown in the proof of Theorem 11.36, given linearity, monotonicity is equivalent to positivity:

    ∫_a^b f ≥ 0 if f ≥ 0.

We remark that even though the upper and lower integrals aren't linear, they are monotone.

Proposition 11.39. If f, g : [a, b] → R are bounded functions and f ≤ g, then

    U(f) ≤ U(g),   L(f) ≤ L(g).

Proof. From Proposition 11.1, we have for every interval I ⊂ [a, b] that

    sup_I f ≤ sup_I g,   inf_I f ≤ inf_I g.

It follows that for every partition P of [a, b], we have

    U(f; P) ≤ U(g; P),   L(f; P) ≤ L(g; P).

Taking the infimum of the upper inequality and the supremum of the lower inequality over P, we get that U(f) ≤ U(g) and L(f) ≤ L(g).
We can estimate the absolute value of an integral by taking the absolute value under the integral sign. This is analogous to the corresponding property of sums:

    |Σ_{k=1}^n a_k| ≤ Σ_{k=1}^n |a_k|.

Theorem 11.40. If f is integrable, then |f| is integrable and

    |∫_a^b f| ≤ ∫_a^b |f|.

Proof. First, suppose that |f| is integrable. Since

    −|f| ≤ f ≤ |f|,

we get from Theorem 11.36 that

    −∫_a^b |f| ≤ ∫_a^b f ≤ ∫_a^b |f|,

or

    |∫_a^b f| ≤ ∫_a^b |f|.
|f |. a To complete the proof, we need to show that |f | is integrable if f is integrable.
For x, y ∈ [a, b], the reverse triangle inequality gives
| |f (x)| − |f (y)| | ≤ |f (x) − f (y)|.
Using Proposition 11.8, we get that

    sup_I |f| − inf_I |f| ≤ sup_I f − inf_I f,

meaning that osc_I |f| ≤ osc_I f. Proposition 11.25 then implies that |f| is integrable if f is integrable.
In particular, we immediately get the following basic estimate for an integral.
Corollary 11.41. If f : [a, b] → R is integrable and M = sup_{[a,b]} |f|, then

    |∫_a^b f| ≤ M(b − a).

Finally, we prove a useful positivity result for the integral of continuous functions.

Proposition 11.42. If f : [a, b] → R is a continuous function such that f ≥ 0 and ∫_a^b f = 0, then f = 0.
Proof. Suppose for contradiction that f(c) > 0 for some a ≤ c ≤ b. For definiteness, assume that a < c < b. (The proof is similar if c is an endpoint.) Then, since f is continuous, there exists δ > 0 such that

    |f(x) − f(c)| ≤ f(c)/2   for c − δ ≤ x ≤ c + δ,

where we choose δ small enough that c − δ > a and c + δ < b. Using this inequality and the assumption that f ≥ 0, we get

    f(x) = f(c) + f(x) − f(c) ≥ f(c) − |f(x) − f(c)| ≥ f(c)/2   for c − δ ≤ x ≤ c + δ.

It follows that

    ∫_a^b f = ∫_a^{c−δ} f + ∫_{c−δ}^{c+δ} f + ∫_{c+δ}^b f ≥ 0 + (f(c)/2) · 2δ + 0 > 0.

This contradiction proves the result.
The assumption that f ≥ 0 is, of course, required; otherwise the integral of the function may be zero due to cancellation.
Example 11.43. The function f : [−1, 1] → R defined by f(x) = x is continuous and nonzero, but ∫_{−1}^{1} f = 0.
Continuity is also required; for example, the discontinuous function in Example 11.14 is nonzero, but its integral is zero.
11.5.3. Additivity. Finally, we prove additivity. This property refers to additivity with respect to the interval of integration, rather than linearity with respect to the function being integrated.
Theorem 11.44. Suppose that f : [a, b] → R and a < c < b. Then f is Riemann integrable on [a, b] if and only if it is Riemann integrable on [a, c] and [c, b].
Moreover, in that case,

    ∫_a^b f = ∫_a^c f + ∫_c^b f.

Proof. Suppose that f is integrable on [a, b]. Then, given ε > 0, there is a partition P of [a, b] such that U(f; P) − L(f; P) < ε. Let P′ = P ∪ {c} be the refinement of P obtained by adding c to the endpoints of P. (If c ∈ P, then P′ = P.) Then P′ = Q ∪ R, where Q = P′ ∩ [a, c] and R = P′ ∩ [c, b] are partitions of [a, c] and [c, b] respectively. Moreover,

    U(f; P′) = U(f; Q) + U(f; R),   L(f; P′) = L(f; Q) + L(f; R).

Since P′ is a refinement of P, we have U(f; P′) − L(f; P′) ≤ U(f; P) − L(f; P), and it follows that

    U(f; Q) − L(f; Q) = U(f; P′) − L(f; P′) − [U(f; R) − L(f; R)]
                      ≤ U(f; P) − L(f; P) < ε,

which proves that f is integrable on [a, c]. Exchanging Q and R, we get the proof for [c, b].

Conversely, if f is integrable on [a, c] and [c, b], then there are partitions Q of [a, c] and R of [c, b] such that

    U(f; Q) − L(f; Q) < ε/2,   U(f; R) − L(f; R) < ε/2.

Let P = Q ∪ R. Then

    U(f; P) − L(f; P) = U(f; Q) − L(f; Q) + U(f; R) − L(f; R) < ε,

which proves that f is integrable on [a, b].
Finally, if f is integrable, then with the partitions P, Q, R as above, we have

    ∫_a^b f ≤ U(f; P) = U(f; Q) + U(f; R)
            < L(f; Q) + L(f; R) + ε
            < ∫_a^c f + ∫_c^b f + ε.

Similarly,

    ∫_a^b f ≥ L(f; P) = L(f; Q) + L(f; R)
            > U(f; Q) + U(f; R) − ε
            > ∫_a^c f + ∫_c^b f − ε.

Since ε > 0 is arbitrary, we see that ∫_a^b f = ∫_a^c f + ∫_c^b f.

We can extend the additivity property of the integral by defining an oriented Riemann integral.

Definition 11.45. If f : [a, b] → R is integrable, where a < b, and a ≤ c ≤ b, then

    ∫_b^a f = −∫_a^b f,   ∫_c^c f = 0.

With this definition, the additivity property in Theorem 11.44 holds for all a, b, c ∈ R for which the oriented integrals exist. Moreover, if |f| ≤ M, then the estimate in Corollary 11.41 becomes

    |∫_a^b f| ≤ M|b − a|

for all a, b ∈ R (even if a ≥ b).
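The oriented integral can be sketched in code on top of any approximate integrator. In the following Python illustration (our own construction), the midpoint rule is used; it is exact for the linear test function f(x) = x, so oriented additivity can be checked for every ordering of a, b, c:

```python
# Definition 11.45 as code: extend an (approximate) integrator on [a, b] with
# a < b to arbitrary ordered endpoints by a sign flip. The midpoint rule is
# exact for linear integrands, so additivity over a, b, c can be checked to
# rounding accuracy with f(x) = x.

def midpoint_sum(f, a, b, n=1000):
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

def oriented_integral(f, a, b):
    if a == b:
        return 0.0                        # integral over a single point is zero
    if a < b:
        return midpoint_sum(f, a, b)
    return -midpoint_sum(f, b, a)         # reversed orientation flips the sign

f = lambda x: x
for a, c, b in [(0, 2, 1), (1, 0, 2), (2, 1, 0)]:  # c need not lie between a and b
    lhs = oriented_integral(f, a, b)
    rhs = oriented_integral(f, a, c) + oriented_integral(f, c, b)
    assert abs(lhs - rhs) < 1e-9
print("additivity holds for every ordering tested")
```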
The oriented Riemann integral is a special case of the integral of a differential form. It assigns a value to the integral of a one-form f dx on an oriented interval.

11.6. Further existence results
In this section, we prove several further useful conditions for the existence of the Riemann integral.
First, we show that changing the values of a function at finitely many points doesn't change its integrability or the value of its integral.
Proposition 11.46. Suppose that f, g : [a, b] → R and f(x) = g(x) except at finitely many points x ∈ [a, b]. Then f is integrable if and only if g is integrable, and in that case

    ∫_a^b f = ∫_a^b g.

Proof. It is sufficient to prove the result for functions whose values differ at a single point, say c ∈ [a, b]. The general result then follows by repeated application of this result.
Since f , g differ at a single point, f is bounded if and only if g is bounded. If f , g are unbounded, then neither one is integrable. If f , g are bounded, we will show that f , g have the same upper and lower integrals. The reason is that their upper and lower sums differ by an arbitrarily small amount with respect to a partition that is sufficiently refined near the point where the functions differ.
Suppose that f, g are bounded with |f|, |g| ≤ M on [a, b] for some M > 0. Let ε > 0. Choose a partition P of [a, b] such that

    U(f; P) < U(f) + ε/2.

Let Q = {I_1, ..., I_n} be a refinement of P such that |I_k| < δ for k = 1, ..., n, where

    δ = ε/(8M).

Then g differs from f on at most two intervals in Q. (This could happen on two intervals if c is an endpoint of the partition.) On such an interval I_k we have

    |sup_{I_k} g − sup_{I_k} f| ≤ sup_{I_k} |g| + sup_{I_k} |f| ≤ 2M,

and on the remaining intervals, sup_{I_k} g − sup_{I_k} f = 0. It follows that

    |U(g; Q) − U(f; Q)| < 2M · 2δ < ε/2.

Using the properties of upper integrals and refinements, we obtain that

    U(g) ≤ U(g; Q) < U(f; Q) + ε/2 ≤ U(f; P) + ε/2 < U(f) + ε.

Since this inequality holds for arbitrary ε > 0, we get that U(g) ≤ U(f). Exchanging f and g, we see similarly that U(f) ≤ U(g), so U(f) = U(g).

An analogous argument for lower sums (or an application of the result for upper sums to −f, −g) shows that L(f) = L(g). Thus U(f) = L(f) if and only if U(g) = L(g), in which case ∫_a^b f = ∫_a^b g.
Example 11.47. The function f in Example 11.14 differs from the 0-function at one point. It is integrable and its integral is equal to 0.

The conclusion of Proposition 11.46 can fail if the functions differ at a countably infinite number of points. One reason is that we can turn a bounded function into an unbounded function by changing its values at a countably infinite number of points.

Example 11.48. Define f : [0, 1] → R by

    f(x) = n if x = 1/n for n ∈ N,   f(x) = 0 otherwise.

Then f is equal to the 0-function except on the countably infinite set {1/n : n ∈ N}, but f is unbounded and therefore not Riemann integrable.
The conclusion of Proposition 11.46 can still fail, however, for bounded functions that differ at a countably infinite number of points.
Example 11.49. The Dirichlet function in Example 11.15 is bounded and differs from the 0-function on the countably infinite set of rationals, but it isn't Riemann integrable.

The Lebesgue integral is better behaved than the Riemann integral in this respect: two functions that are equal almost everywhere, meaning that they differ on a set of Lebesgue measure zero, have the same Lebesgue integrals. In particular, two functions that differ on a countable set have the same Lebesgue integrals (see Section 11.8).
The next proposition allows us to deduce the integrability of a bounded function on an interval from its integrability on slightly smaller intervals.
Proposition 11.50. Suppose that f : [a, b] → R is bounded and integrable on [a, r] for every a < r < b. Then f is integrable on [a, b] and

    ∫_a^b f = lim_{r→b⁻} ∫_a^r f.

Proof. Since f is bounded, |f| ≤ M on [a, b] for some M > 0. Given ε > 0, let

    r = b − ε/(4M)

(where we assume ε is sufficiently small that r > a). Since f is integrable on [a, r], there is a partition Q of [a, r] such that

    U(f; Q) − L(f; Q) < ε/2.

Then P = Q ∪ {b} is a partition of [a, b] whose last interval is [r, b]. The boundedness of f implies that

    sup_{[r,b]} f − inf_{[r,b]} f ≤ 2M.

Therefore

    U(f; P) − L(f; P) = U(f; Q) − L(f; Q) + (sup_{[r,b]} f − inf_{[r,b]} f) · (b − r)
                      < ε/2 + 2M · (b − r) = ε,

so f is integrable on [a, b] by Theorem 11.23. Moreover, using the additivity of the integral, we get

    |∫_a^b f − ∫_a^r f| = |∫_r^b f| ≤ M · (b − r) → 0   as r → b⁻.
An obvious analogous result holds for the left endpoint.
Example 11.51. Define f : [0, 1] → R by

    f(x) = sin(1/x) if 0 < x ≤ 1,   f(0) = 0.

Then f is bounded on [0, 1]. Furthermore, f is continuous and therefore integrable on [r, 1] for every 0 < r < 1. It follows from the left-endpoint analog of Proposition 11.50 that f is integrable on [0, 1].
The assumption in Proposition 11.50 that f is bounded on [a, b] is essential.
Example 11.52. The function f : [0, 1] → R defined by

    f(x) = 1/x for 0 < x ≤ 1,   f(0) = 0,

is continuous and therefore integrable on [r, 1] for every 0 < r < 1, but it's unbounded and therefore not integrable on [0, 1].
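The contrast between these two examples lies in the tail near 0, and it can be made quantitative. In the Python sketch below (an illustration of ours, with M = 1 for sin(1/x)), the tail of the bounded function is controlled by the bound M · r from the proof of Proposition 11.50, while the exact value −log r of ∫_r^1 dx/x diverges:

```python
# Tail behaviour near 0: for |f| <= M = 1 (f(x) = sin(1/x)) the tail satisfies
# |integral over [0, r]| <= M * r -> 0, so the integrals over [r, 1] converge
# as r -> 0+. For f(x) = 1/x the exact tail integral over [r, 1] is -log(r),
# which diverges instead.
import math

for r in [1e-1, 1e-2, 1e-3, 1e-4]:
    bounded_tail = 1.0 * r              # M * r, with M = sup |sin(1/x)| = 1
    unbounded_tail = math.log(1.0 / r)  # exact integral of 1/x over [r, 1]
    print(r, bounded_tail, round(unbounded_tail, 4))
```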
As a corollary of this result and the additivity of the integral, we prove a generalization of the integrability of continuous functions to piecewise continuous functions.

Theorem 11.53. If f : [a, b] → R is a bounded function with finitely many discontinuities, then f is Riemann integrable.
Proof. By splitting the interval into subintervals with the discontinuities of f at an endpoint and using Theorem 11.44, we see that it is sufficient to prove the result if f is discontinuous only at one endpoint of [a, b], say at b. In that case, f is continuous and therefore integrable on any smaller interval [a, r] with a < r < b, and Proposition 11.50 implies that f is integrable on [a, b].
Example 11.54. Define f : [0, 2π] → R by

    f(x) = sin(1/sin x) if x ≠ 0, π, 2π,   f(x) = 0 if x = 0, π, 2π.

Then f is bounded and continuous except at x = 0, π, 2π, so it is integrable on [0, 2π] (see Figure 3). This function doesn't have jump discontinuities, but Theorem 11.53 still applies.

Figure 3. Graph of the Riemann integrable function y = sin(1/sin x) in Example 11.54.

Figure 4. Graph of the Riemann integrable function y = sgn(sin(1/x)) in Example 11.55.

Example 11.55. Define f : [0, 1/π] → R by

    f(x) = sgn[sin(1/x)] if x ≠ 1/nπ for n ∈ N,   f(x) = 0 if x = 0 or x = 1/nπ for n ∈ N,

where sgn is the sign function,

    sgn x = 1 if x > 0,   sgn x = 0 if x = 0,   sgn x = −1 if x < 0.

Then f oscillates between 1 and −1 a countably infinite number of times as x → 0⁺ (see Figure 4). It has jump discontinuities at x = 1/(nπ) and an essential discontinuity at x = 0. Nevertheless, it is Riemann integrable. To see this, note that f is bounded on [0, 1/π] and piecewise continuous with finitely many discontinuities on [r, 1/π] for every 0 < r < 1/π. Theorem 11.53 implies that f is Riemann integrable on [r, 1/π], and then the left-endpoint analog of Proposition 11.50 implies that f is integrable on [0, 1/π].

11.7. * Riemann sums
Instead of using upper and lower sums, we can give an equivalent definition of the
Riemann integral as a limit of Riemann sums. This was, in fact, Riemann’s original definition [11], which he gave in 1854 in his Habilitationsschrift (a kind of postdoctoral dissertation required of German academics), building on previous work of
Cauchy who defined the integral for continuous functions.
It is interesting to note that the topic of Riemann’s Habilitationsschrift was not integration theory, but Fourier series. Riemann introduced a definition of the integral along the way so that he could state his results more precisely. Many of the fundamental developments of rigorous real analysis in the nineteenth century were motivated by problems related to Fourier series and their convergence.
Upper and lower sums were introduced subsequently by Darboux, and they simplify the theory. We won’t use Riemann sums here, but we will explain the equivalence of the definitions. We’ll say, temporarily, that a function is Darboux integrable if it satisfies Definition 11.11.
To give Riemann’s definition, we first define a tagged partition (P, C) of a compact interval [a, b] to be a partition
P = {I1 , I2 , . . . , In } of the interval together with a set
C = {c1 , c2 , . . . , cn } of points such that ck ∈ Ik for k = 1, . . . , n. (We think of the point ck as a “tag” attached to the interval Ik .)
If f : [a, b] → R, then we define the Riemann sum of f with respect to the tagged partition (P, C) by

    S(f; P, C) = Σ_{k=1}^n f(c_k) |I_k|.

That is, instead of using the supremum or infimum of f on the kth interval in the sum, we evaluate f at a point in the interval. Roughly speaking, a function is Riemann integrable if its Riemann sums approach the same value as the partition is refined, independently of how we choose the points c_k ∈ I_k.
As a measure of the refinement of a partition P = {I_1, I_2, ..., I_n}, we define the mesh (or norm) of P to be the maximum length of its intervals,

    mesh(P) = max_{1≤k≤n} |I_k| = max_{1≤k≤n} |x_k − x_{k−1}|.
Definition 11.56. A function f : [a, b] → R is Riemann integrable on [a, b] if there exists a number R ∈ R with the following property: for every ε > 0 there is a δ > 0 such that

    |S(f; P, C) − R| < ε

for every tagged partition (P, C) of [a, b] with mesh(P) < δ. In that case, R = ∫_a^b f is the Riemann integral of f on [a, b].
Note that

    L(f; P) ≤ S(f; P, C) ≤ U(f; P),

so the Riemann sums are "squeezed" between the upper and lower sums. The following theorem shows that the Darboux and Riemann definitions lead to the same notion of the integral, so it's a matter of convenience which definition we adopt as our starting point.
Theorem 11.57. A function is Riemann integrable (in the sense of Definition 11.56) if and only if it is Darboux integrable (in the sense of Definition 11.11). Furthermore, in that case, the Riemann and Darboux integrals of the function are equal.
Proof. First, suppose that f : [a, b] → R is Riemann integrable with integral R.
Then f is bounded on [a, b]; otherwise, it would be unbounded in some interval Ik of every partition P , and its Riemann sums with respect to P would be arbitrarily large for suitable points ck ∈ Ik , so no R ∈ R could satisfy Definition 11.56.
Let ε > 0. Since f is Riemann integrable, there is a partition P = {I_1, I_2, ..., I_n} of [a, b] such that

    |S(f; P, C) − R| < ε/2

for every set of points C = {c_k ∈ I_k : k = 1, ..., n}. If M_k = sup_{I_k} f, then there exists c_k ∈ I_k such that

    M_k − ε/(2(b − a)) < f(c_k).

It follows that

    Σ_{k=1}^n M_k |I_k| − ε/2 < Σ_{k=1}^n f(c_k) |I_k|,

meaning that U(f; P) − ε/2 < S(f; P, C). Since S(f; P, C) < R + ε/2, we get that

    U(f) ≤ U(f; P) < R + ε.

Similarly, if m_k = inf_{I_k} f, then there exists c_k ∈ I_k such that

    m_k + ε/(2(b − a)) > f(c_k),   Σ_{k=1}^n m_k |I_k| + ε/2 > Σ_{k=1}^n f(c_k) |I_k|,

and L(f; P) + ε/2 > S(f; P, C). Since S(f; P, C) > R − ε/2, we get that

    L(f) ≥ L(f; P) > R − ε.

These inequalities imply that

    L(f) + ε > R > U(f) − ε

for every ε > 0, and therefore L(f) ≥ R ≥ U(f). Since L(f) ≤ U(f), we conclude that L(f) = R = U(f), so f is Darboux integrable with integral R.

Conversely, suppose that f is Darboux integrable. The main point is to show that if ε > 0, then U(f; P) − L(f; P) < ε not just for some partition, but for every partition whose mesh is sufficiently small.

Let ε > 0 be given. Since f is Darboux integrable, there exists a partition Q such that

    U(f; Q) − L(f; Q) < ε/4.

Suppose that Q contains m intervals and |f| ≤ M on [a, b]. We claim that if

    δ = ε/(8mM),

then U(f; P) − L(f; P) < ε for every partition P with mesh(P) < δ.

To prove this claim, suppose that P = {I_1, I_2, ..., I_n} is a partition with mesh(P) < δ. Let P′ be the common refinement of P and Q whose endpoints consist of the endpoints of P or Q. Since a, b are common endpoints of P and Q, there are at most m − 1 endpoints of Q that are distinct from endpoints of P. Therefore, at most m − 1 intervals in P contain additional endpoints of Q and are strictly refined in P′, meaning that they are the union of two or more intervals in P′.

Now consider U(f; P) − U(f; P′). The terms that correspond to the same, unrefined intervals in P and P′ cancel. If I_k is a strictly refined interval in P, then the corresponding terms in each of the sums U(f; P) and U(f; P′) can be estimated by M|I_k|, and their difference by 2M|I_k|. There are at most m − 1 such intervals and |I_k| < δ, so it follows that

    U(f; P) − U(f; P′) < 2(m − 1)Mδ < ε/4.
Since P′ is a refinement of Q, we get

    U(f; P) < U(f; P′) + ε/4 ≤ U(f; Q) + ε/4 < L(f; Q) + ε/2.

It follows by a similar argument that

    L(f; P′) − L(f; P) < ε/4,

and

    L(f; P) > L(f; P′) − ε/4 ≥ L(f; Q) − ε/4 > U(f; Q) − ε/2.

Since L(f; Q) ≤ U(f; Q), we conclude from these inequalities that

    U(f; P) − L(f; P) < ε

for every partition P with mesh(P) < δ.
If D denotes the Darboux integral of f, then we have

    L(f; P) ≤ D ≤ U(f; P),   L(f; P) ≤ S(f; P, C) ≤ U(f; P).

Since U(f; P) − L(f; P) < ε for every partition P with mesh(P) < δ, it follows that

    |S(f; P, C) − D| < ε.

Thus, f is Riemann integrable with Riemann integral D.

Finally, we give a necessary and sufficient condition for Riemann integrability that was proved by Riemann himself (1854). (See [5] for further discussion.) To state the condition, we introduce some notation.
Let f : [a, b] → R be a bounded function. If P = {I_1, I_2, ..., I_n} is a partition of [a, b] and ε > 0, let A_ε(P) ⊂ {1, ..., n} be the set of indices k such that

    osc_{I_k} f = sup_{I_k} f − inf_{I_k} f ≥ ε   for k ∈ A_ε(P).

Similarly, let B_ε(P) ⊂ {1, ..., n} be the set of indices such that

    osc_{I_k} f < ε   for k ∈ B_ε(P).

That is, the oscillation of f on I_k is "large" if k ∈ A_ε(P) and "small" if k ∈ B_ε(P). We denote the sum of the lengths of the intervals in P where the oscillation of f is "large" by

    s_ε(P) = Σ_{k∈A_ε(P)} |I_k|.

Fixing ε > 0, we say that s_ε(P) → 0 as mesh(P) → 0 if for every η > 0 there exists δ > 0 such that mesh(P) < δ implies that s_ε(P) < η.
Theorem 11.58. A bounded function f : [a, b] → R is Riemann integrable if and only if s_ε(P) → 0 as mesh(P) → 0 for every ε > 0.
Proof. Let f : [a, b] → R be a bounded function with |f| ≤ M on [a, b].
First, suppose that the condition holds, and let ε > 0. If P is a partition of [a, b], then, using the notation above for A_ε(P), B_ε(P) and the inequality

    0 ≤ osc_{I_k} f ≤ 2M,

we get that

    U(f; P) − L(f; P) = Σ_{k=1}^n osc_{I_k} f · |I_k|
                      = Σ_{k∈A_ε(P)} osc_{I_k} f · |I_k| + Σ_{k∈B_ε(P)} osc_{I_k} f · |I_k|
                      ≤ 2M Σ_{k∈A_ε(P)} |I_k| + ε Σ_{k∈B_ε(P)} |I_k|
                      ≤ 2M s_ε(P) + ε(b − a).

By assumption, there exists δ > 0 such that s_ε(P) < ε if mesh(P) < δ, in which case

    U(f; P) − L(f; P) < ε(2M + b − a).
The Cauchy criterion in Theorem 11.23 then implies that f is integrable.
Conversely, suppose that f is integrable, and let ε > 0 be given. If P is a partition, we can bound s_ε(P) from above by the difference between the upper and lower sums as follows:

    U(f; P) − L(f; P) ≥ Σ_{k∈A_ε(P)} osc_{I_k} f · |I_k| ≥ ε Σ_{k∈A_ε(P)} |I_k| = ε s_ε(P).

Since f is integrable, for every η > 0 there exists δ > 0 such that mesh(P) < δ implies that

    U(f; P) − L(f; P) < εη.

Therefore, mesh(P) < δ implies that

    s_ε(P) ≤ (1/ε)[U(f; P) − L(f; P)] < η,

which proves the result.
This theorem has the drawback that the necessary and sufficient condition for Riemann integrability is somewhat complicated and, in general, isn’t easy to verify. In the next section, we state a simpler necessary and sufficient condition for
Riemann integrability.

11.8. * The Lebesgue criterion
Although the Dirichlet function in Example 11.15 is not Riemann integrable, it is
Lebesgue integrable. Its Lebesgue integral is given by
    ∫_0^1 f = 1 · |A| + 0 · |B|,

where A = [0, 1] ∩ Q is the set of rational numbers in [0, 1], B = [0, 1] \ Q is the set of irrational numbers, and |E| denotes the Lebesgue measure of a set E. The
Lebesgue measure of a subset of R is a generalization of the length of an interval which applies to more general sets. It turns out that |A| = 0 (as is true for any countable set of real numbers — see Example 11.60 below) and |B| = 1. Thus, the
Lebesgue integral of the Dirichlet function is 0.
A necessary and sufficient condition for Riemann integrability can be given in terms of Lebesgue measure. To state this condition, we first define what it means for a set to have Lebesgue measure zero.
Definition 11.59. A set E ⊂ R has Lebesgue measure zero if for every ε > 0 there is a countable collection of open intervals {(a_k, b_k) : k ∈ N} such that

    E ⊂ ⋃_{k=1}^∞ (a_k, b_k),   Σ_{k=1}^∞ (b_k − a_k) < ε.

The open intervals in this definition are not required to be disjoint, and they may "overlap."
Example 11.60. Every countable set E = {x_k ∈ R : k ∈ N} has Lebesgue measure zero. To prove this, let ε > 0 and for each k ∈ N define

    a_k = x_k − ε/2^{k+2},   b_k = x_k + ε/2^{k+2}.

Then E ⊂ ⋃_{k=1}^∞ (a_k, b_k) since x_k ∈ (a_k, b_k), and

    Σ_{k=1}^∞ (b_k − a_k) = Σ_{k=1}^∞ ε/2^{k+1} = ε/2 < ε,

so the Lebesgue measure of E is equal to zero. (The 'ε/2^k' trick used here is a common one in measure theory.)
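The ε/2^k trick can be checked by summing the interval lengths directly. A Python sketch (truncating the series at N terms, an arbitrary cutoff):

```python
# The eps/2^k trick: the k-th point is covered by an interval of length
# eps/2**(k+1), so the total length is eps/2 < eps however many points there
# are. We truncate the series at N = 50 terms (an arbitrary cutoff).
eps = 0.1
N = 50
total = sum(eps / 2 ** (k + 1) for k in range(1, N + 1))
print(total)  # very close to eps/2 = 0.05
```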
If E = [0, 1] ∩ Q consists of the rational numbers in [0, 1], then the set G = ⋃_{k=1}^∞ (a_k, b_k) described above encloses the dense set of rationals in a collection of open intervals the sum of whose lengths is arbitrarily small. This set isn't so easy to visualize. Roughly speaking, if ε is small and we look at a section of [0, 1] at a given magnification, then we see a few of the longer intervals in G with relatively large gaps between them. Magnifying one of these gaps, we see a few more intervals with large gaps between them; magnifying those gaps, we see a few more intervals, and so on. Thus, the set G has a fractal structure, meaning that it looks similar at all scales of magnification.

In general, we have the following result, due to Lebesgue, which we state without proof.
Theorem 11.61. A function f : [a, b] → R is Riemann integrable if and only if it is bounded and the set of points at which it is discontinuous has Lebesgue measure zero.

For example, the set of discontinuities of the Riemann-integrable function in
Example 11.14 consists of a single point {0}, which has Lebesgue measure zero. On the other hand, the set of discontinuities of the non-Riemann-integrable Dirichlet function in Example 11.15 is the entire interval [0, 1], which has Lebesgue measure one.
In particular, every bounded function with a countable set of discontinuities is
Riemann integrable, since such a set has Lebesgue measure zero. Riemann integrability of a function does not, however, imply that the function has only countably many discontinuities.
Example 11.62. The Cantor set C in Example 5.64 has Lebesgue measure zero. To prove this, using the same notation as in Section 5.5, we note that for every n ∈ N the set F_n ⊃ C consists of 2^n closed intervals I_s of length |I_s| = 3^{−n}. For every ε > 0 and s ∈ Σ_n, there is an open interval U_s of slightly larger length |U_s| = 3^{−n} + ε2^{−n} that contains I_s. Then {U_s : s ∈ Σ_n} is a cover of C by open intervals, and

    Σ_{s∈Σ_n} |U_s| = (2/3)^n + ε.

We can make the right-hand side as small as we wish by choosing n large enough and ε small enough, so C has Lebesgue measure zero.
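The cover-length bound (2/3)^n + ε is simple arithmetic and can be tabulated. A Python sketch (the values of n and ε are arbitrary):

```python
# 2**n intervals, each of length 3**(-n) + eps * 2**(-n), cover the Cantor
# set; their total length is (2/3)**n + eps, which -> 0 as n grows and
# eps shrinks.
def cover_length(n, eps):
    return 2 ** n * (3.0 ** (-n) + eps * 2.0 ** (-n))  # equals (2/3)**n + eps

for n in [1, 5, 10, 20]:
    print(n, cover_length(n, 1e-3))
```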
Let χ_C : [0, 1] → R be the characteristic function of the Cantor set,

    χ_C(x) = 1 if x ∈ C,   χ_C(x) = 0 otherwise.

By partitioning [0, 1] into the closed intervals {Ū_s : s ∈ Σ_n} and the closures of the complementary intervals, we see similarly that the upper Riemann sums of χ_C can be made arbitrarily small, so χ_C is Riemann integrable on [0, 1] with zero integral. The Riemann integrability of χ_C also follows from Theorem 11.61.

It is, however, discontinuous at every point of C. Thus, χC is an example of a
Riemann integrable function with uncountably many discontinuities.

Chapter 12

Properties and Applications of the Integral

In the integral calculus I find much less interesting the parts that involve only substitutions, transformations, and the like, in short, the parts that involve the known skillfully applied mechanics of reducing integrals to algebraic, logarithmic, and circular functions, than I find the careful and profound study of transcendental functions that cannot be reduced to these functions. (Gauss, 1808)

12.1. The fundamental theorem of calculus
The fundamental theorem of calculus states that differentiation and integration are inverse operations in an appropriately understood sense. The theorem has two parts: in one direction, it says roughly that the integral of the derivative is the original function; in the other direction, it says that the derivative of the integral is the original function.
In more detail, the first part states that if F : [a, b] → R is differentiable with integrable derivative F′, then

    ∫_a^b F′(x) dx = F(b) − F(a).

This result can be thought of as a continuous analog of the corresponding identity for sums of differences,

    Σ_{k=1}^n (A_k − A_{k−1}) = A_n − A_0.

The second part states that if f : [a, b] → R is continuous, then

    (d/dx) ∫_a^x f(t) dt = f(x).

This is a continuous analog of the corresponding identity for differences of sums,

    Σ_{j=1}^k a_j − Σ_{j=1}^{k−1} a_j = a_k.

The proof of the fundamental theorem consists essentially of applying the identities for sums or differences to the appropriate Riemann sums or difference quotients and proving, under appropriate hypotheses, that they converge to the corresponding integrals or derivatives.
We’ll split the statement and proof of the fundamental theorem into two parts.
(The numbering of the parts as I and II is arbitrary.)
12.1.1. Fundamental theorem I. First we prove the statement about the integral of a derivative.
Theorem 12.1 (Fundamental theorem of calculus I). If F : [a, b] → R is continuous on [a, b] and differentiable in (a, b) with F′ = f, where f : [a, b] → R is Riemann integrable, then

    ∫_a^b f(x) dx = F(b) − F(a).

Proof. Let

    P = {x_0, x_1, x_2, ..., x_{n−1}, x_n}

be a partition of [a, b], with x_0 = a and x_n = b. Then

    F(b) − F(a) = Σ_{k=1}^n [F(x_k) − F(x_{k−1})].

The function F is continuous on the closed interval [x_{k−1}, x_k] and differentiable in the open interval (x_{k−1}, x_k) with F′ = f. By the mean value theorem, there exists x_{k−1} < c_k < x_k such that

    F(x_k) − F(x_{k−1}) = f(c_k)(x_k − x_{k−1}).

Since f is Riemann integrable, it is bounded, and

    m_k(x_k − x_{k−1}) ≤ F(x_k) − F(x_{k−1}) ≤ M_k(x_k − x_{k−1}),

where

    M_k = sup_{[x_{k−1},x_k]} f,   m_k = inf_{[x_{k−1},x_k]} f.

Hence, L(f; P) ≤ F(b) − F(a) ≤ U(f; P) for every partition P of [a, b], which implies that L(f) ≤ F(b) − F(a) ≤ U(f). Since f is integrable, L(f) = U(f) = ∫_a^b f, and therefore F(b) − F(a) = ∫_a^b f.
In Theorem 12.1, we assume that F is continuous on the closed interval [a, b] and differentiable in the open interval (a, b) where its usual two-sided derivative is defined and is equal to f . It isn’t necessary to assume the existence of the right derivative of F at a or the left derivative at b, so the values of f at the endpoints are not necessarily determined by F . By Proposition 11.46, however, the integrability of f on [a, b] and the value of its integral do not depend on these values, so the statement of the theorem makes sense. As a result, we’ll sometimes

abuse terminology and say that "f is integrable on [a, b]" even if it's only defined on (a, b).
Theorem 12.1 imposes the integrability of F′ as a hypothesis. Every function F that is continuously differentiable on the closed interval [a, b] satisfies this condition, but the theorem remains true even if F′ is a discontinuous, Riemann integrable function.

Example 12.2. Define F : [0, 1] → R by
\[
F(x) = \begin{cases} x^2 \sin(1/x) & \text{if } 0 < x \le 1, \\ 0 & \text{if } x = 0. \end{cases}
\]
Then F is continuous on [0, 1] and, by the product and chain rules, differentiable in (0, 1]. It is also differentiable, but not continuously differentiable, at 0, with F′(0+) = 0. Thus,
\[
F'(x) = \begin{cases} -\cos(1/x) + 2x\sin(1/x) & \text{if } 0 < x \le 1, \\ 0 & \text{if } x = 0. \end{cases}
\]

The derivative F′ is bounded on [0, 1] and discontinuous only at one point (x = 0), so Theorem 11.53 implies that F′ is integrable on [0, 1]. This verifies all of the hypotheses in Theorem 12.1, and we conclude that
\[
\int_0^1 F'(x)\,dx = \sin 1.
\]
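This value can be checked numerically. The following Python computation (an illustrative check, not from the text; the helper `midpoint_integral` is our own) approximates the integral of F′ by midpoint Riemann sums and compares it with sin 1 ≈ 0.8415:

```python
import math

def F_prime(x):
    # Derivative of F(x) = x^2 sin(1/x) for x > 0, from Example 12.2.
    return -math.cos(1 / x) + 2 * x * math.sin(1 / x)

def midpoint_integral(g, a, b, n):
    # Midpoint-rule Riemann sum with n equal subintervals; never samples x = a.
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

approx = midpoint_integral(F_prime, 0.0, 1.0, 400_000)
print(abs(approx - math.sin(1)))  # small despite the oscillation near 0
```

The rapid oscillation of cos(1/x) near 0 slows the convergence of the Riemann sums, but the integral itself is perfectly well defined.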

There are, however, differentiable functions whose derivatives are unbounded or so discontinuous that they aren’t Riemann integrable.

Example 12.3. Define F : [0, 1] → R by F(x) = √x. Then F is continuous on [0, 1] and differentiable in (0, 1], with
\[
F'(x) = \frac{1}{2\sqrt{x}} \quad \text{for } 0 < x \le 1.
\]
This function is unbounded, so F′ is not Riemann integrable on [0, 1], however we define its value at 0, and Theorem 12.1 does not apply.
We can interpret the integral of F′ on [0, 1] as an improper Riemann integral (as discussed further in Section 12.4). The function F is continuously differentiable on [ε, 1] for every 0 < ε < 1, so
\[
\int_\epsilon^1 \frac{1}{2\sqrt{x}}\,dx = 1 - \sqrt{\epsilon}.
\]
Thus, we get the improper integral
\[
\lim_{\epsilon\to 0^+} \int_\epsilon^1 \frac{1}{2\sqrt{x}}\,dx = 1.
\]
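The truncated integrals behave exactly as the formula 1 − √ε predicts; a quick numerical check (helper names are our own, not from the text):

```python
import math

def midpoint_integral(g, a, b, n):
    # Midpoint-rule Riemann sum with n equal subintervals.
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

integrand = lambda x: 1 / (2 * math.sqrt(x))
for eps in (1e-1, 1e-2, 1e-3):
    num = midpoint_integral(integrand, eps, 1.0, 100_000)
    # Matches the exact value 1 - sqrt(eps), which tends to 1 as eps -> 0+.
    assert abs(num - (1 - math.sqrt(eps))) < 1e-6
```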

The construction of a function with a bounded, non-integrable derivative is more involved. It’s not sufficient to give a function with a bounded derivative that is discontinuous at finitely many points, as in Example 12.2, because such a function is Riemann integrable. Rather, one has to construct a differentiable function whose


derivative is discontinuous on a set of nonzero Lebesgue measure. Abbott [1] gives an example.
Finally, we remark that Theorem 12.1 remains valid for the oriented Riemann integral, since exchanging a and b reverses the sign of both sides.
12.1.2. Fundamental theorem of calculus II. Next, we prove the other direction of the fundamental theorem. We will use the following result, of independent interest, which states that the average of a continuous function on an interval approaches the value of the function as the length of the interval shrinks to zero. The proof uses a common trick of taking a constant inside an average.
Theorem 12.4. Suppose that f : [a, b] → R is integrable on [a, b] and continuous at a. Then
\[
\lim_{h\to 0^+} \frac{1}{h} \int_a^{a+h} f(x)\,dx = f(a).
\]
Proof. If k is a constant, then we have
\[
k = \frac{1}{h} \int_a^{a+h} k\,dx.
\]
(That is, the average of a constant is equal to the constant.) We can therefore write
\[
\frac{1}{h} \int_a^{a+h} f(x)\,dx - f(a) = \frac{1}{h} \int_a^{a+h} \left[ f(x) - f(a) \right] dx.
\]
Let ε > 0. Since f is continuous at a, there exists δ > 0 such that
\[
|f(x) - f(a)| < \epsilon \quad \text{for } a \le x < a + \delta.
\]
It follows that if 0 < h < δ, then
\[
\left| \frac{1}{h} \int_a^{a+h} f(x)\,dx - f(a) \right| \le \frac{1}{h} \cdot \sup_{a \le x \le a+h} |f(x) - f(a)| \cdot h \le \epsilon,
\]
which proves the result.
A similar proof shows that if f is continuous at b, then
\[
\lim_{h\to 0^+} \frac{1}{h} \int_{b-h}^{b} f = f(b),
\]
and if f is continuous at a < c < b, then
\[
\lim_{h\to 0^+} \frac{1}{2h} \int_{c-h}^{c+h} f = f(c).
\]
More generally, if f is continuous at c and {I_h : h > 0} is any collection of intervals with c ∈ I_h and |I_h| → 0 as h → 0+, then
\[
\lim_{h\to 0^+} \frac{1}{|I_h|} \int_{I_h} f = f(c).
\]

The assumption in Theorem 12.4 that f is continuous at the point about which we take the averages is essential.


Example 12.5. Let f : R → R be the sign function
\[
f(x) = \begin{cases} 1 & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ -1 & \text{if } x < 0. \end{cases}
\]
Then
\[
\lim_{h\to 0^+} \frac{1}{h} \int_0^h f(x)\,dx = 1, \qquad \lim_{h\to 0^+} \frac{1}{h} \int_{-h}^0 f(x)\,dx = -1,
\]
and neither limit is equal to f(0). In this example, the limit of the symmetric averages
\[
\lim_{h\to 0^+} \frac{1}{2h} \int_{-h}^h f(x)\,dx = 0
\]
is equal to f(0), but this equality doesn't hold if we change f(0) to a nonzero value (for example, if f(x) = 1 for x ≥ 0 and f(x) = −1 for x < 0) since the limit of the symmetric averages is still 0.
The second part of the fundamental theorem follows from this result and the fact that the difference quotients of F are averages of f .
Theorem 12.6 (Fundamental theorem of calculus II). Suppose that f : [a, b] → R is integrable and F : [a, b] → R is defined by
\[
F(x) = \int_a^x f(t)\,dt.
\]
Then F is continuous on [a, b]. Moreover, if f is continuous at a ≤ c ≤ b, then F is differentiable at c and F′(c) = f(c).
Proof. First, note that Theorem 11.44 implies that f is integrable on [a, x] for every a ≤ x ≤ b, so F is well-defined. Since f is Riemann integrable, it is bounded, and |f| ≤ M for some M ≥ 0. It follows that
\[
|F(x+h) - F(x)| = \left| \int_x^{x+h} f(t)\,dt \right| \le M|h|,
\]
which shows that F is continuous on [a, b] (in fact, Lipschitz continuous).
Moreover, we have
\[
\frac{F(c+h) - F(c)}{h} = \frac{1}{h} \int_c^{c+h} f(t)\,dt.
\]
It follows from Theorem 12.4 that if f is continuous at c, then F is differentiable at c with
\[
F'(c) = \lim_{h\to 0} \frac{F(c+h) - F(c)}{h} = \lim_{h\to 0} \frac{1}{h} \int_c^{c+h} f(t)\,dt = f(c),
\]
where we use the appropriate right or left limit at an endpoint.
The assumption that f is continuous is needed to ensure that F is differentiable.


Example 12.7. If
\[
f(x) = \begin{cases} 1 & \text{for } x \ge 0, \\ 0 & \text{for } x < 0, \end{cases}
\]
then
\[
F(x) = \int_0^x f(t)\,dt = \begin{cases} x & \text{for } x \ge 0, \\ 0 & \text{for } x < 0. \end{cases}
\]
The function F is continuous but not differentiable at x = 0, where f is discontinuous, since the left and right derivatives of F at 0, given by F′(0−) = 0 and F′(0+) = 1, are different.

12.2. Consequences of the fundamental theorem
The first part of the fundamental theorem, Theorem 12.1, is the basic tool for the exact evaluation of integrals. It allows us to compute the integral of a function f if we can find an antiderivative; that is, a function F such that F′ = f. There is no systematic procedure for finding antiderivatives. Moreover, an antiderivative of an elementary function (constructed from power, trigonometric, and exponential functions and their inverses) need not be, and often isn't, expressible in terms of elementary functions. By contrast, the rules of differentiation provide a mechanical algorithm for the computation of the derivative of any function formed from elementary functions by algebraic operations and compositions.
Example 12.8. For p = 0, 1, 2, . . . , we have
\[
\frac{d}{dx} \left( \frac{x^{p+1}}{p+1} \right) = x^p,
\]
and it follows that
\[
\int_0^1 x^p\,dx = \frac{1}{p+1}.
\]

Example 12.9. We can use the fundamental theorem to evaluate certain limits of sums. For example,
\[
\lim_{n\to\infty} \frac{1}{n^{p+1}} \sum_{k=1}^{n} k^p = \frac{1}{p+1},
\]
since the sum on the left-hand side is the upper sum of x^p on a partition of [0, 1] into n intervals of equal length. Example 11.27 illustrates this result explicitly for p = 2.
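This limit is easy to check numerically (an illustrative computation; `riemann_limit` is our name, not from the text):

```python
def riemann_limit(p, n):
    # (1/n^{p+1}) * sum_{k=1}^{n} k^p: the upper sum of x^p on a uniform
    # partition of [0, 1] into n subintervals.
    return sum(k ** p for k in range(1, n + 1)) / n ** (p + 1)

for p in (1, 2, 3):
    # The sums approach 1/(p+1); the error decays like 1/(2n).
    assert abs(riemann_limit(p, 100_000) - 1 / (p + 1)) < 1e-4
```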
Two important general consequences of the first part of the fundamental theorem are integration by parts and substitution (or change of variable), which come from inverting the product rule and chain rule for derivatives, respectively.
Theorem 12.10 (Integration by parts). Suppose that f, g : [a, b] → R are continuous on [a, b] and differentiable in (a, b), and f′, g′ are integrable on [a, b]. Then
\[
\int_a^b f g'\,dx = f(b)g(b) - f(a)g(a) - \int_a^b f' g\,dx.
\]

Proof. The function fg is continuous on [a, b] and, by the product rule, differentiable in (a, b) with derivative
\[
(fg)' = f'g + fg'.
\]
Since f, g, f′, g′ are integrable on [a, b], Theorem 11.35 implies that f′g, fg′, and (fg)′ are integrable. From Theorem 12.1, we get that
\[
\int_a^b f'g\,dx + \int_a^b fg'\,dx = \int_a^b (fg)'\,dx = f(b)g(b) - f(a)g(a),
\]
which proves the result.
Integration by parts says that we can move a derivative from one factor in an integral onto the other factor, with a change of sign and the appearance of a boundary term. The product rule for derivatives expresses the derivative of a product in terms of the derivatives of the factors. By contrast, integration by parts doesn't give an explicit expression for the integral of a product; it simply replaces one integral by another. This can sometimes be used to transform an integral into one that is easier to evaluate, but the importance of integration by parts goes far beyond its use as an integration technique.
Example 12.11. For n = 0, 1, 2, 3, . . . , let
\[
I_n(x) = \int_0^x t^n e^{-t}\,dt.
\]
If n ≥ 1, then integration by parts with f(t) = t^n and g′(t) = e^{−t} gives
\[
I_n(x) = -x^n e^{-x} + n \int_0^x t^{n-1} e^{-t}\,dt = -x^n e^{-x} + n I_{n-1}(x).
\]
Also, by the fundamental theorem of calculus,
\[
I_0(x) = \int_0^x e^{-t}\,dt = 1 - e^{-x}.
\]
It then follows by induction that
\[
I_n(x) = n! \left( 1 - e^{-x} \sum_{k=0}^{n} \frac{x^k}{k!} \right).
\]
Since x^k e^{−x} → 0 as x → ∞ for every k = 0, 1, 2, . . . , we get the improper integral
\[
\int_0^\infty t^n e^{-t}\,dt = \lim_{r\to\infty} \int_0^r t^n e^{-t}\,dt = n!.
\]
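Both the recursion and the limit n! can be verified numerically from the closed form (an illustrative check; `I_closed` is our name for the formula above):

```python
import math

def I_closed(n, x):
    # I_n(x) = n! * (1 - e^{-x} * sum_{k=0}^{n} x^k / k!)
    s = sum(x ** k / math.factorial(k) for k in range(n + 1))
    return math.factorial(n) * (1 - math.exp(-x) * s)

x = 3.0
for n in range(1, 6):
    # The recursion I_n(x) = -x^n e^{-x} + n I_{n-1}(x):
    rhs = -x ** n * math.exp(-x) + n * I_closed(n - 1, x)
    assert abs(I_closed(n, x) - rhs) < 1e-9

# For large x, I_n(x) is essentially n!  (here 5! = 120).
assert abs(I_closed(5, 100.0) - math.factorial(5)) < 1e-6
```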

This formula suggests an extension of the factorial function to complex numbers z ∈ C, called the Gamma function, which is defined for Re z > 0 by the improper, complex-valued integral
\[
\Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\,dt.
\]

In particular, Γ(n) = (n − 1)! for n ∈ N. The Gamma function is an important special function, which is studied further in complex analysis.
Next we consider the change of variable formula for integrals.


Theorem 12.12 (Change of variable). Suppose that g : I → R is differentiable on an open interval I and g′ is integrable on I. Let J = g(I). If f : J → R is continuous, then for every a, b ∈ I,
\[
\int_a^b f(g(x))\,g'(x)\,dx = \int_{g(a)}^{g(b)} f(u)\,du.
\]

Proof. For x ∈ J, let
\[
F(x) = \int_{g(a)}^{x} f(u)\,du.
\]
Since f is continuous, Theorem 12.6 implies that F is differentiable in J with F′ = f. The chain rule implies that the composition F ∘ g : I → R is differentiable in I, with
\[
(F \circ g)'(x) = f(g(x))\,g'(x).
\]
This derivative is integrable on [a, b] since f ∘ g is continuous and g′ is integrable. Theorem 12.1, the definition of F, and the additivity of the integral then imply that
\[
\int_a^b f(g(x))\,g'(x)\,dx = \int_a^b (F \circ g)'\,dx = F(g(b)) - F(g(a)) = \int_{g(a)}^{g(b)} f(u)\,du,
\]
which proves the result.

There is no assumption in this theorem that g is invertible, but we often use the theorem in that case. A continuous function maps an interval to an interval, and it is one-to-one if and only if it is strictly monotone. An increasing function preserves the orientation of the interval, while a decreasing function reverses it, in which case the integrals in the previous theorem are understood as oriented integrals.
This result can also be formulated in terms of non-oriented integrals. Suppose that g : I → J is one-to-one and onto from an interval I = [a, b] to an interval J = g(I) = [c, d], where c = g(a), d = g(b) if g is increasing, and c = g(b), d = g(a) if g is decreasing. Then
\[
\int_{g(I)} f(u)\,du = \int_I (f \circ g)(x)\,|g'(x)|\,dx.
\]
In this identity, both integrals are over positively oriented intervals and we include an absolute value in the Jacobian factor |g′(x)|. If g′ ≥ 0, then this identity is the same as the oriented form, while if g′ ≤ 0, then
\[
\int_I (f \circ g)(x)\,|g'(x)|\,dx = \int_a^b (f \circ g)(x)\,[-g'(x)]\,dx = -\int_{g(a)}^{g(b)} f(u)\,du = \int_{g(b)}^{g(a)} f(u)\,du = \int_{g(I)} f(u)\,du.
\]
f (u) du. g(I) Example 12.13. For every a > 0, the increasing, differentiable function g : R → R defined by g(x) = x3 maps (−a, a) one-to-one and onto (−a3 , a3 ) and preserves orientation. Thus, if f : [−a, a] → R is continuous, a3 a
3

2

f (x ) · 3x dx =

f (u) du.
−a3

−a

The decreasing, differentiable function g : R → R defined by g(x) = −x3 maps
(−a, a) one-to-one and onto (−a3 , a3 ) and reverses orientation. Thus,
−a3

a
3

2

f (−x ) · (−3x ) dx =

a3

f (u) du = − a3 −a

f (u) du.
−a3

The non-monotone, differentiable function g : R → R defined by g(x) = x2 maps
(−a, a) onto [0, a2 ). It is two-to-one, except at x = 0. The change of variables formula gives a2 a
2

f (x ) · 2x dx =

f (u) du = 0. a2 −a

The contributions to the original integral from [0, a] and [−a, 0] cancel since the integrand is an odd function of x.
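The first identity is easy to test numerically, say with f = cos and a = 1, where the right-hand side is ∫_{-1}^{1} cos u du = 2 sin 1 (a quick check; the helper is our own):

```python
import math

def midpoint_integral(g, a, b, n):
    # Midpoint-rule Riemann sum with n equal subintervals.
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

# Left-hand side: integral of cos(x^3) * 3x^2 over [-1, 1].
lhs = midpoint_integral(lambda x: math.cos(x ** 3) * 3 * x ** 2, -1.0, 1.0, 100_000)
rhs = 2 * math.sin(1.0)  # integral of cos(u) over [-1, 1]
assert abs(lhs - rhs) < 1e-6
```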
One consequence of the second part of the fundamental theorem, Theorem 12.6, is that every continuous function has an antiderivative, even if it can’t be expressed explicitly in terms of elementary functions. This provides a way to define transcendental functions as integrals of elementary functions.
Example 12.14. One way to define the natural logarithm log : (0, ∞) → R in terms of algebraic functions is as the integral
\[
\log x = \int_1^x \frac{1}{t}\,dt.
\]
This integral is well-defined for every 0 < x < ∞ since 1/t is continuous on the interval [1, x] if x > 1, or [x, 1] if 0 < x < 1. The usual properties of the logarithm follow from this representation. We have (log x)′ = 1/x by definition; and, for example, by making the substitution s = xt in the second integral in the following equation, when dt/t = ds/s, we get
\[
\log x + \log y = \int_1^x \frac{1}{t}\,dt + \int_1^y \frac{1}{t}\,dt = \int_1^x \frac{1}{t}\,dt + \int_x^{xy} \frac{1}{s}\,ds = \int_1^{xy} \frac{1}{t}\,dt = \log(xy).
\]
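Both the definition and the functional equation can be verified numerically (an illustrative check; `log_via_integral` is our name):

```python
import math

def log_via_integral(x, n=100_000):
    # Midpoint approximation of the integral of 1/t over [1, x], for x > 1.
    h = (x - 1) / n
    return h * sum(1 / (1 + (k + 0.5) * h) for k in range(n))

assert abs(log_via_integral(2.0) - math.log(2.0)) < 1e-8
# The functional equation log(xy) = log x + log y, here with x = 2, y = 3:
assert abs(log_via_integral(6.0) - log_via_integral(2.0) - log_via_integral(3.0)) < 1e-7
```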


Figure 1. Graphs of the error function y = F (x) (blue) and its derivative, the Gaussian function y = f (x) (green), from Example 12.15.

We can also define many non-elementary functions as integrals.
Example 12.15. The error function
\[
\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt
\]
is an antiderivative on R of the Gaussian function
\[
f(x) = \frac{2}{\sqrt{\pi}}\,e^{-x^2}.
\]
The error function isn't expressible in terms of elementary functions. Nevertheless, it is defined as a limit of Riemann sums for the integral. Figure 1 shows the graphs of f and F. The name "error function" comes from the fact that the probability of a Gaussian random variable deviating by more than a given amount from its mean can be expressed in terms of F. Error functions also arise in other applications; for example, in modeling diffusion processes such as heat flow.

Example 12.16. The Fresnel sine function S is defined by
\[
S(x) = \int_0^x \sin\left( \frac{\pi t^2}{2} \right) dt.
\]
The function S is an antiderivative of sin(πx²/2) on R (see Figure 2), but it can't be expressed in terms of elementary functions. Fresnel integrals arise, among other places, in analysing the diffraction of waves, such as light waves. From the perspective of complex analysis, they are closely related to the error function through the Euler formula e^{iθ} = cos θ + i sin θ.
Discontinuous functions may or may not have an antiderivative, and typically they don't. Darboux proved that every function f : (a, b) → R that is the derivative of a function F : (a, b) → R, where F′ = f at all points of (a, b), has the intermediate



Figure 2. Graphs of the Fresnel integral y = S(x) (blue) and its derivative y = sin(πx2 /2) (green) from Example 12.16.

value property. That is, for all c, d with a < c < d < b and all y between f(c) and f(d), there exists an x between c and d such that f(x) = y. A continuous derivative has this property by the intermediate value theorem, but a discontinuous derivative also has it. Thus, discontinuous functions without the intermediate value property, such as ones with a jump discontinuity, don't have an antiderivative. For example, the function F in Example 12.7 is not an antiderivative of the step function f on R since it isn't differentiable at 0.
In dealing with functions that are not continuously differentiable, it turns out to be more useful to abandon the idea of a derivative that is defined pointwise everywhere (pointwise values of discontinuous functions are somewhat arbitrary) and introduce the notion of a weak derivative. We won’t define or study weak derivatives here.

12.3. Integrals and sequences of functions
A fundamental question that arises throughout analysis is the validity of an exchange in the order of limits. Some sort of condition is always required.
In this section, we consider the question of when the convergence of a sequence of functions fn → f implies the convergence of their integrals ∫ fn → ∫ f. Here, we exchange a limit of a sequence of functions with a limit of the Riemann sums that define their integrals. The two types of convergence we'll discuss are pointwise and uniform convergence, which are defined in Chapter 9.
As we show first, the Riemann integral is well-behaved with respect to uniform convergence. The drawback to uniform convergence is that it’s a strong form of convergence, and we often want to use a weaker form, such as pointwise convergence, in which case the Riemann integral may not be suitable.


12.3.1. Uniform convergence. The uniform limit of continuous functions is continuous and therefore integrable. The next result shows, more generally, that the uniform limit of integrable functions is integrable. Furthermore, the limit of the integrals is the integral of the limit.
Theorem 12.17. Suppose that fn : [a, b] → R is Riemann integrable for each n ∈ N and fn → f uniformly on [a, b] as n → ∞. Then f : [a, b] → R is Riemann integrable on [a, b] and
\[
\int_a^b f = \lim_{n\to\infty} \int_a^b f_n.
\]

Proof. The main statement we need to prove is that f is integrable. Let ε > 0. Since fn → f uniformly, there is an N ∈ N such that if n > N, then
\[
f_n(x) - \frac{\epsilon}{b-a} < f(x) < f_n(x) + \frac{\epsilon}{b-a} \quad \text{for all } a \le x \le b.
\]
It follows from Proposition 11.39 that
\[
L\left( f_n - \frac{\epsilon}{b-a} \right) \le L(f), \qquad U(f) \le U\left( f_n + \frac{\epsilon}{b-a} \right).
\]
Since fn is integrable and upper integrals are greater than lower integrals, we get that
\[
\int_a^b f_n - \epsilon \le L(f) \le U(f) \le \int_a^b f_n + \epsilon
\]
for all n > N, which implies that
\[
0 \le U(f) - L(f) \le 2\epsilon.
\]
Since ε > 0 is arbitrary, we conclude that L(f) = U(f), so f is integrable. Moreover, it follows that for all n > N we have
\[
\left| \int_a^b f_n - \int_a^b f \right| \le \epsilon,
\]
which shows that ∫_a^b fn → ∫_a^b f as n → ∞.

Alternatively, once we know that the uniform limit of integrable functions is integrable, the convergence of the integrals follows directly from the estimate
\[
\left| \int_a^b f_n - \int_a^b f \right| = \left| \int_a^b (f_n - f) \right| \le \sup_{x\in[a,b]} |f_n(x) - f(x)| \cdot (b-a) \to 0 \quad \text{as } n \to \infty.
\]

Example 12.18. The function fn : [0, 1] → R defined by
\[
f_n(x) = \frac{n + \cos x}{n e^x + \sin x}
\]
converges uniformly on [0, 1] to f(x) = e^{−x} since, for 0 ≤ x ≤ 1,
\[
\left| \frac{n + \cos x}{n e^x + \sin x} - e^{-x} \right| = \left| \frac{\cos x - e^{-x} \sin x}{n e^x + \sin x} \right| \le \frac{1}{n}.
\]
It follows that
\[
\lim_{n\to\infty} \int_0^1 \frac{n + \cos x}{n e^x + \sin x}\,dx = \int_0^1 e^{-x}\,dx = 1 - \frac{1}{e}.
\]
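The uniform error bound 1/n carries over to the integrals, as a quick numerical check confirms (helper names are our own):

```python
import math

def midpoint_integral(g, a, b, n):
    # Midpoint-rule Riemann sum with n equal subintervals.
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

limit = 1 - 1 / math.e  # integral of e^{-x} over [0, 1]
for n in (10, 100, 1000):
    f_n = lambda x, n=n: (n + math.cos(x)) / (n * math.exp(x) + math.sin(x))
    I_n = midpoint_integral(f_n, 0.0, 1.0, 20_000)
    # |integral of f_n - integral of f| <= sup |f_n - f| <= 1/n on [0, 1]
    assert abs(I_n - limit) < 1 / n + 1e-6
```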


Example 12.19. Every power series
\[
f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n + \cdots
\]
with radius of convergence R > 0 converges uniformly on compact intervals inside the interval |x| < R, so we can integrate it term-by-term to get
\[
\int_0^x f(t)\,dt = a_0 x + \frac{1}{2} a_1 x^2 + \frac{1}{3} a_2 x^3 + \cdots + \frac{1}{n+1} a_n x^{n+1} + \cdots \quad \text{for } |x| < R.
\]

Example 12.20. If we integrate the geometric series
\[
\frac{1}{1-x} = 1 + x + x^2 + \cdots + x^n + \cdots \quad \text{for } |x| < 1,
\]
we get a power series for the logarithm,
\[
\log\left( \frac{1}{1-x} \right) = x + \frac{1}{2} x^2 + \frac{1}{3} x^3 + \cdots + \frac{1}{n} x^n + \cdots \quad \text{for } |x| < 1.
\]
For instance, taking x = 1/2, we get the rapidly convergent series
\[
\log 2 = \sum_{n=1}^{\infty} \frac{1}{n 2^n}
\]
for the irrational number log 2 ≈ 0.6931. This series was known and used by Euler.
For comparison, the alternating harmonic series in Example 12.46 also converges to log 2, but it does so extremely slowly and would be a poor choice for computing a numerical approximation.
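The difference in convergence speed is dramatic even after twenty terms (an illustrative computation):

```python
import math

# Euler's series log 2 = sum 1/(n 2^n): the tail after 20 terms is below 1e-7.
euler = sum(1 / (n * 2 ** n) for n in range(1, 21))
assert abs(euler - math.log(2)) < 1e-6

# Alternating harmonic series: 20 terms still leave an error of about 0.012.
alt = sum((-1) ** (n + 1) / n for n in range(1, 21))
assert abs(alt - math.log(2)) > 0.01
```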
Although we can integrate uniformly convergent sequences, we cannot in general differentiate them. In fact, it's often easier to prove results about the convergence of derivatives by using results about the convergence of integrals, together with the fundamental theorem of calculus. The following theorem provides sufficient conditions for fn → f to imply that fn′ → f′.
Theorem 12.21. Let fn : (a, b) → R be a sequence of differentiable functions whose derivatives fn′ : (a, b) → R are integrable on (a, b). Suppose that fn → f pointwise on (a, b) as n → ∞ and fn′ → g uniformly on (a, b), where g : (a, b) → R is continuous. Then f : (a, b) → R is continuously differentiable on (a, b) and f′ = g.

Proof. Choose some point a < c < b. Since fn′ is integrable, the fundamental theorem of calculus, Theorem 12.1, implies that
\[
f_n(x) = f_n(c) + \int_c^x f_n' \quad \text{for } a < x < b.
\]
Since fn → f pointwise and fn′ → g uniformly, we find that
\[
f(x) = f(c) + \int_c^x g.
\]
Since g is continuous, the other direction of the fundamental theorem, Theorem 12.6, implies that f is differentiable in (a, b) and f′ = g.


In particular, this theorem shows that the limit of a uniformly convergent sequence of continuously differentiable functions whose derivatives converge uniformly is also continuously differentiable.
The key assumption in Theorem 12.21 is that the derivatives fn′ converge uniformly, not just pointwise; the result is false if we only assume pointwise convergence of the fn′. In the proof of the theorem, we only use the assumption that fn(x) converges at a single point x = c. This assumption together with the assumption that fn′ → g uniformly implies that fn → f uniformly, where
\[
f(x) = \lim_{n\to\infty} f_n(c) + \int_c^x g.
\]
Thus, the theorem remains true if we replace the assumption that fn → f pointwise on (a, b) by the weaker assumption that lim_{n→∞} fn(c) exists for some c ∈ (a, b).
This isn’t an important change, however, because the restrictive assumption in the theorem is the uniform convergence of the derivatives fn , not the pointwise (or uniform) convergence of the functions fn .
The assumption that g = lim fn′ is continuous is needed to show the differentiability of f by the fundamental theorem, but the result remains true even if g isn't continuous. In that case, however, a different, and more complicated, proof is required, which is given in Theorem 9.18.
12.3.2. Pointwise convergence. On its own, the pointwise convergence of functions is never sufficient to imply convergence of their integrals.
Example 12.22. For n ∈ N, define fn : [0, 1] → R by
\[
f_n(x) = \begin{cases} n & \text{if } 0 < x < 1/n, \\ 0 & \text{if } x = 0 \text{ or } 1/n \le x \le 1. \end{cases}
\]
Then fn → 0 pointwise on [0, 1], but
\[
\int_0^1 f_n = 1 \quad \text{for every } n \in \mathbb{N}.
\]
By slightly modifying these functions to
\[
f_n(x) = \begin{cases} n^2 & \text{if } 0 < x < 1/n, \\ 0 & \text{if } x = 0 \text{ or } 1/n \le x \le 1, \end{cases}
\]
we get a sequence that converges pointwise to 0 but whose integrals diverge to ∞.
The fact that the fn are discontinuous is not important; we could replace the step functions by continuous “tent” functions or smooth “bump” functions.
The behavior of the integral under pointwise convergence in the previous example is unavoidable whatever definition of the integral one uses. A more serious defect of the Riemann integral is that the pointwise limit of Riemann integrable functions needn’t be Riemann integrable at all, even if it is bounded.


Example 12.23. Let {rk : k ∈ N} be an enumeration of the rational numbers in [0, 1] and define fn : [0, 1] → R by
\[
f_n(x) = \begin{cases} 1 & \text{if } x = r_k \text{ for some } 1 \le k \le n, \\ 0 & \text{otherwise.} \end{cases}
\]
Then each fn is Riemann integrable since it differs from the zero function at finitely many points. However, fn → f pointwise on [0, 1], where f is the Dirichlet function, which is not Riemann integrable.
This is another place where the Lebesgue integral has better properties than the Riemann integral. The pointwise (or pointwise almost everywhere) limit of
Lebesgue measurable functions is Lebesgue measurable. As Example 12.22 shows, we still need conditions to ensure the convergence of the integrals, but there are quite simple and general conditions for the Lebesgue integral (such as the monotone convergence and dominated convergence theorems).

12.4. Improper Riemann integrals
The Riemann integral is only defined for a bounded function on a compact interval
(or a finite union of such intervals). Nevertheless, we frequently want to integrate unbounded functions or functions on an infinite interval. One way to interpret such integrals is as a limit of Riemann integrals; these limits are called improper
Riemann integrals.
12.4.1. Improper integrals. First, we define the improper integral of a function that fails to be integrable at one endpoint of a bounded interval.
Definition 12.24. Suppose that f : (a, b] → R is integrable on [c, b] for every a < c < b. Then the improper integral of f on [a, b] is
\[
\int_a^b f = \lim_{\epsilon\to 0^+} \int_{a+\epsilon}^b f.
\]
The improper integral converges if this limit exists (as a finite real number); otherwise it diverges. Similarly, if f : [a, b) → R is integrable on [a, c] for every a < c < b, then
\[
\int_a^b f = \lim_{\epsilon\to 0^+} \int_a^{b-\epsilon} f.
\]
We use the same notation to denote proper and improper integrals; it should be clear from the context which integrals are proper Riemann integrals (i.e., ones given by Definition 11.11) and which are improper. If f is Riemann integrable on
[a, b], then Proposition 11.50 shows that its improper and proper integrals agree, but an improper integral may exist even if f isn’t integrable.
Example 12.25. If p > 0, then the integral
\[
\int_0^1 \frac{1}{x^p}\,dx
\]


isn’t defined as a Riemann integral since 1/xp is unbounded on (0, 1]. The corresponding improper integral is
1
0

1 dx = lim xp →0+

1

1 dx. xp

For p = 1, we have
1

1 − 1−p
1
dx =
,
xp
1−p
so the improper integral converges if 0 < p < 1, with
1
0

1
1
dx =
,
p x p−1

and diverges to ∞ if p > 1. The integral also diverges (more slowly) to ∞ if p = 1 since 1
1
1 dx = log . x Thus, we get a convergent improper integral if the integrand 1/xp does not grow too rapidly as x → 0+ (slower than 1/x).
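The exact antiderivative makes the dichotomy easy to see numerically (an illustrative computation; `tail_integral` is our name for the integral over [ε, 1]):

```python
def tail_integral(p, eps):
    # Exact value of the integral of x^{-p} over [eps, 1], valid for p != 1.
    return (1 - eps ** (1 - p)) / (1 - p)

# For p = 1/2 the values approach 1/(1 - p) = 2 as eps -> 0+.
for eps in (1e-2, 1e-4, 1e-6):
    assert abs(tail_integral(0.5, eps) - 2.0) < 3 * eps ** 0.5

# For p = 2 the values blow up, reflecting divergence of the improper integral.
assert tail_integral(2.0, 1e-6) > 1e5
```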
We define improper integrals on an unbounded interval as limits of integrals on bounded intervals.
Definition 12.26. Suppose that f : [a, ∞) → R is integrable on [a, r] for every r > a. Then the improper integral of f on [a, ∞) is
\[
\int_a^\infty f = \lim_{r\to\infty} \int_a^r f.
\]
Similarly, if f : (−∞, b] → R is integrable on [r, b] for every r < b, then
\[
\int_{-\infty}^b f = \lim_{r\to\infty} \int_{-r}^b f.
\]

Let’s consider the convergence of the integral of the power function in Example 12.25 at infinity rather than at zero.
Example 12.27. Suppose that p > 0. The improper integral
\[
\int_1^\infty \frac{1}{x^p}\,dx = \lim_{r\to\infty} \int_1^r \frac{1}{x^p}\,dx = \lim_{r\to\infty} \frac{r^{1-p} - 1}{1-p}
\]
converges to 1/(p − 1) if p > 1 and diverges to ∞ if 0 < p < 1. It also diverges (more slowly) if p = 1, since
\[
\int_1^\infty \frac{1}{x}\,dx = \lim_{r\to\infty} \int_1^r \frac{1}{x}\,dx = \lim_{r\to\infty} \log r = \infty.
\]
Thus, we get a convergent improper integral if the integrand 1/x^p decays sufficiently rapidly as x → ∞ (faster than 1/x).
A divergent improper integral may diverge to ∞ (or −∞) as in the previous examples, or — if the integrand changes sign — it may oscillate.


Example 12.28. Define f : [0, ∞) → R by
\[
f(x) = (-1)^n \quad \text{for } n \le x < n+1, \text{ where } n = 0, 1, 2, \ldots.
\]
Then 0 ≤ ∫_0^r f ≤ 1, and
\[
\int_0^n f = \begin{cases} 1 & \text{if } n \text{ is an odd integer,} \\ 0 & \text{if } n \text{ is an even integer.} \end{cases}
\]
Thus, the improper integral ∫_0^∞ f doesn't converge.

More general improper integrals may be defined as finite sums of improper integrals of the previous forms. For example, if f : [a, b] \ {c} → R is integrable on closed intervals not including a < c < b, then
\[
\int_a^b f = \lim_{\delta\to 0^+} \int_a^{c-\delta} f + \lim_{\epsilon\to 0^+} \int_{c+\epsilon}^b f;
\]
and if f : R → R is integrable on every compact interval, then
\[
\int_{-\infty}^{\infty} f = \lim_{s\to\infty} \int_{-s}^{c} f + \lim_{r\to\infty} \int_{c}^{r} f,
\]
where we split the integral at an arbitrary point c ∈ R. Note that each limit is required to exist separately.
Example 12.29. The improper Riemann integral
\[
\int_0^\infty \frac{1}{x^p}\,dx = \lim_{\epsilon\to 0^+} \int_\epsilon^1 \frac{1}{x^p}\,dx + \lim_{r\to\infty} \int_1^r \frac{1}{x^p}\,dx
\]
does not converge for any p ∈ R, since the integral either diverges at 0 (if p ≥ 1) or at infinity (if p ≤ 1).

Example 12.30. If f : [0, 1] → R is continuous and 0 < c < 1, then we define, as an improper integral,
\[
\int_0^1 \frac{f(x)}{|x-c|^{1/2}}\,dx = \lim_{\delta\to 0^+} \int_0^{c-\delta} \frac{f(x)}{|x-c|^{1/2}}\,dx + \lim_{\epsilon\to 0^+} \int_{c+\epsilon}^1 \frac{f(x)}{|x-c|^{1/2}}\,dx.
\]

Example 12.31. Consider the following integral, called a Frullani integral,
\[
I = \int_0^\infty \frac{f(ax) - f(bx)}{x}\,dx,
\]
where 0 < a < b and f : [0, ∞) → R is a continuous function whose limit as x → ∞ exists. We write this limit as
\[
f(\infty) = \lim_{x\to\infty} f(x).
\]
We interpret the integral as an improper integral I = I_1 + I_2, where
\[
I_1 = \lim_{\epsilon\to 0^+} \int_\epsilon^1 \frac{f(ax) - f(bx)}{x}\,dx, \qquad I_2 = \lim_{r\to\infty} \int_1^r \frac{f(ax) - f(bx)}{x}\,dx.
\]
Consider I_1. After making the substitutions s = ax and t = bx and using the additivity property of the integral, we get that
\[
I_1 = \lim_{\epsilon\to 0^+} \left( \int_{a\epsilon}^{a} \frac{f(s)}{s}\,ds - \int_{b\epsilon}^{b} \frac{f(t)}{t}\,dt \right) = \lim_{\epsilon\to 0^+} \int_{a\epsilon}^{b\epsilon} \frac{f(t)}{t}\,dt - \int_{a}^{b} \frac{f(t)}{t}\,dt.
\]


To evaluate the limit, we write
\[
\int_{a\epsilon}^{b\epsilon} \frac{f(t)}{t}\,dt = \int_{a\epsilon}^{b\epsilon} \frac{f(t) - f(0)}{t}\,dt + f(0) \int_{a\epsilon}^{b\epsilon} \frac{1}{t}\,dt = \int_{a\epsilon}^{b\epsilon} \frac{f(t) - f(0)}{t}\,dt + f(0) \log\frac{b}{a}.
\]
Since f is continuous at 0 and t ≥ aε in the interval of integration, which has length (b − a)ε, we have
\[
\left| \int_{a\epsilon}^{b\epsilon} \frac{f(t) - f(0)}{t}\,dt \right| \le \frac{(b-a)\epsilon}{a\epsilon} \cdot \max\{ |f(t) - f(0)| : a\epsilon \le t \le b\epsilon \} \to 0
\]
as ε → 0+. It follows that
\[
I_1 = f(0) \log\frac{b}{a} - \int_a^b \frac{f(t)}{t}\,dt.
\]
A similar argument gives
\[
I_2 = -f(\infty) \log\frac{b}{a} + \int_a^b \frac{f(t)}{t}\,dt.
\]
Adding these results, we conclude that
\[
\int_0^\infty \frac{f(ax) - f(bx)}{x}\,dx = \{ f(0) - f(\infty) \} \log\frac{b}{a}.
\]
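For a concrete instance, take f(x) = e^{−x}, so f(0) − f(∞) = 1 and the Frullani integral should equal log(b/a). A numerical check (the truncation limits and helper name are our choices, not from the text):

```python
import math

def frullani_numeric(a, b, lo=1e-6, hi=50.0, n=200_000):
    # Midpoint approximation of the integral of (e^{-ax} - e^{-bx})/x on [lo, hi];
    # the integrand extends continuously to 0 (with value b - a), so lo can be tiny.
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        total += (math.exp(-a * x) - math.exp(-b * x)) / x
    return total * h

assert abs(frullani_numeric(1.0, 2.0) - math.log(2.0)) < 1e-3
```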

12.4.2. Absolutely convergent improper integrals. The convergence of improper integrals is analogous to the convergence of series. A series Σ a_n converges absolutely if Σ |a_n| converges, and conditionally if Σ a_n converges but Σ |a_n| diverges. We introduce a similar definition for improper integrals and provide a test for the absolute convergence of an improper integral that is analogous to the comparison test for series.
Definition 12.32. An improper integral ∫_a^b f is absolutely convergent if the improper integral ∫_a^b |f| converges, and conditionally convergent if ∫_a^b f converges but ∫_a^b |f| diverges.

As part of the next theorem, we prove that an absolutely convergent improper integral converges (similarly, an absolutely convergent series converges).
Theorem 12.33. Suppose that f, g : I → R are defined on some finite or infinite interval I. If |f| ≤ g and the improper integral ∫_I g converges, then the improper integral ∫_I f converges absolutely. Moreover, an absolutely convergent improper integral converges.
Proof. To be specific, we suppose that f, g : [a, ∞) → R are integrable on [a, r] for every r > a and consider the improper integral
\[
\int_a^\infty f = \lim_{r\to\infty} \int_a^r f.
\]
A similar argument applies to other types of improper integrals.


First, suppose that f ≥ 0. Then
\[
\int_a^r f \le \int_a^r g \le \int_a^\infty g,
\]
so ∫_a^r f is a monotonic increasing function of r that is bounded from above. Therefore it converges as r → ∞.

In general, we decompose f into its positive and negative parts,
\[
f = f_+ - f_-, \qquad |f| = f_+ + f_-, \qquad f_+ = \max\{f, 0\}, \qquad f_- = \max\{-f, 0\}.
\]
We have 0 ≤ f± ≤ g, so the improper integrals of f± converge by the previous argument, and therefore so does the improper integral of f:
\[
\int_a^\infty f = \lim_{r\to\infty} \int_a^r (f_+ - f_-) = \lim_{r\to\infty} \int_a^r f_+ - \lim_{r\to\infty} \int_a^r f_- = \int_a^\infty f_+ - \int_a^\infty f_-.
\]
Moreover, since 0 ≤ f± ≤ |f|, we see that ∫_a^∞ f+ and ∫_a^∞ f− both converge if ∫_a^∞ |f| converges, and therefore so does ∫_a^∞ f, so an absolutely convergent improper integral converges.
Example 12.34. Consider the limiting behavior of the error function erf(x) in Example 12.15 as x → ∞, which is given by
\[
\frac{2}{\sqrt{\pi}} \int_0^\infty e^{-x^2}\,dx = \frac{2}{\sqrt{\pi}} \lim_{r\to\infty} \int_0^r e^{-x^2}\,dx.
\]

The convergence of this improper integral follows by comparison with e^{−x}, for example, since
\[
0 \le e^{-x^2} \le e^{-x} \quad \text{for } x \ge 1,
\]
and
\[
\int_1^\infty e^{-x}\,dx = \lim_{r\to\infty} \int_1^r e^{-x}\,dx = \lim_{r\to\infty} \left( e^{-1} - e^{-r} \right) = \frac{1}{e}.
\]

This argument proves that the error function approaches a finite limit as x → ∞, but it doesn't give the exact value, only an upper bound
\[
\frac{2}{\sqrt{\pi}} \int_0^\infty e^{-x^2}\,dx \le M, \qquad M = \frac{2}{\sqrt{\pi}} \left( \int_0^1 e^{-x^2}\,dx + \frac{1}{e} \right) \le \frac{2}{\sqrt{\pi}} \left( 1 + \frac{1}{e} \right).
\]
One can evaluate this improper integral exactly, with the result that
\[
\frac{2}{\sqrt{\pi}} \int_0^\infty e^{-x^2}\,dx = 1.
\]
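Numerically, truncating the integral at x = 5 already gives the limit to better than 1e-6, in agreement with the standard library's `math.erf` (an illustrative check; the helper is our own):

```python
import math

def midpoint_integral(g, a, b, n):
    # Midpoint-rule Riemann sum with n equal subintervals.
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

gauss = midpoint_integral(lambda x: (2 / math.sqrt(math.pi)) * math.exp(-x * x),
                          0.0, 5.0, 100_000)
assert abs(gauss - 1.0) < 1e-6      # the tail beyond x = 5 is below 1e-11
assert abs(math.erf(5.0) - 1.0) < 1e-10
```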


Figure 3. Graph of y = (sin x)/(1 + x^2) from Example 12.35. The dashed green lines are the graphs of y = ±1/x^2.

The standard trick to obtain this result (apparently introduced by Laplace) uses double integration, polar coordinates, and the substitution u = r²:
\[
\left( \int_0^\infty e^{-x^2}\,dx \right)^2 = \int_0^\infty \int_0^\infty e^{-x^2 - y^2}\,dx\,dy = \int_0^{\pi/2} \int_0^\infty e^{-r^2} r\,dr\,d\theta = \frac{\pi}{4} \int_0^\infty e^{-u}\,du = \frac{\pi}{4}.
\]
This formal computation can be justified rigorously, but we won't do so here. There are also many other ways to obtain the same result.
Example 12.35. The improper integral
\[
\int_0^\infty \frac{\sin x}{1+x^2}\,dx = \lim_{r\to\infty} \int_0^r \frac{\sin x}{1+x^2}\,dx
\]
converges absolutely, since
\[
\int_0^\infty \frac{\sin x}{1+x^2}\,dx = \int_0^1 \frac{\sin x}{1+x^2}\,dx + \int_1^\infty \frac{\sin x}{1+x^2}\,dx
\]
and (see Figure 3)
\[
\left| \frac{\sin x}{1+x^2} \right| \le \frac{1}{x^2} \quad \text{for } x \ge 1, \qquad \int_1^\infty \frac{1}{x^2}\,dx < \infty.
\]
The value of this integral doesn't have an elementary expression, but by using contour integration from complex analysis one can show that
\[
\int_0^\infty \frac{\sin x}{1+x^2}\,dx = \frac{1}{2e}\,\mathrm{Ei}(1) - \frac{e}{2}\,\mathrm{Ei}(-1) \approx 0.6468,
\]

12.5. * Principal value integrals

261

where Ei is the exponential integral function defined in Example 12.41.
Improper integrals, and the principal value integrals discussed below, arise frequently in complex analysis, and many such integrals can be evaluated by contour integration.

Example 12.36. The improper integral
\[
\int_0^\infty \frac{\sin x}{x}\,dx = \lim_{r\to\infty}\int_0^r \frac{\sin x}{x}\,dx = \frac{\pi}{2}
\]
converges conditionally. We leave the proof as an exercise. Note that there is no difficulty at $0$, since $\sin x/x \to 1$ as $x \to 0$, and comparison with the function $1/x$ doesn't imply absolute convergence at infinity because the improper integral $\int_1^\infty (1/x)\,dx$ diverges. There are many ways to show that the exact value of the improper integral is $\pi/2$. The standard method uses contour integration.
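The slow, conditional convergence can be seen numerically. The sketch below (illustrative only; the helper name `si` is ours) approximates the truncated integrals with a midpoint rule and watches them approach $\pi/2$.

```python
import math

def si(r, n=200_000):
    """Midpoint approximation of the integral of (sin x)/x over [0, r];
    the integrand extends continuously to 1 at x = 0, so there is no trouble there."""
    h = r / n
    return sum(math.sin((k + 0.5) * h) / ((k + 0.5) * h) for k in range(n)) * h

# The truncated integrals approach pi/2, but only at rate O(1/r),
# reflecting the conditional (not absolute) convergence.
for r in (10.0, 100.0, 1000.0):
    print(r, si(r), math.pi / 2.0)
```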
Example 12.37. Consider the limiting behavior of the Fresnel sine function $S(x)$ in Example 12.16 as $x \to \infty$. The improper integral
\[
\int_0^\infty \sin\left(\frac{\pi x^2}{2}\right)dx = \lim_{r\to\infty}\int_0^r \sin\left(\frac{\pi x^2}{2}\right)dx = \frac{1}{2}
\]
converges conditionally. This example may seem surprising since the integrand $\sin(\pi x^2/2)$ doesn't converge to $0$ as $x \to \infty$. The explanation is that the integrand oscillates more rapidly with increasing $x$, leading to a more rapid cancelation between positive and negative values in the integral (see Figure 2). The exact value can be found by contour integration, again, which shows that
\[
\int_0^\infty \sin\left(\frac{\pi x^2}{2}\right)dx = \frac{1}{\sqrt{2}}\int_0^\infty \exp\left(-\frac{\pi x^2}{2}\right)dx.
\]
Evaluation of the resulting Gaussian integral gives $1/2$.
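The increasingly rapid oscillation can be observed numerically: the truncated integrals hover around $1/2$ with slowly shrinking amplitude. This Python sketch is illustrative only; the helper name `fresnel_truncated` is ours.

```python
import math

def fresnel_truncated(r, n=200_000):
    """Midpoint approximation of the integral of sin(pi x^2 / 2) over [0, r]."""
    h = r / n
    return sum(math.sin(0.5 * math.pi * (((k + 0.5) * h) ** 2)) for k in range(n)) * h

# The truncated integrals oscillate around 1/2; the deviation decays like O(1/r)
# because of the faster and faster cancelation in the integrand.
for r in (5.0, 20.0, 50.0):
    print(r, fresnel_truncated(r))
```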

12.5. * Principal value integrals

Some integrals have a singularity that is too strong for them to converge as improper integrals but, due to cancelation between positive and negative parts of the integrand, they have a finite limit as a principal value integral. We begin with an example.

Example 12.38. Consider $f : [-1,1]\setminus\{0\} \to \mathbb{R}$ defined by
\[
f(x) = \frac{1}{x}.
\]
The definition of the integral of $f$ on $[-1,1]$ as an improper integral is
\[
\int_{-1}^1 \frac{1}{x}\,dx = \lim_{\delta\to 0^+}\int_{-1}^{-\delta}\frac{1}{x}\,dx + \lim_{\epsilon\to 0^+}\int_{\epsilon}^{1}\frac{1}{x}\,dx = \lim_{\delta\to 0^+}\log\delta - \lim_{\epsilon\to 0^+}\log\epsilon.
\]
Neither limit exists, so the improper integral diverges. (Formally, we get $\infty - \infty$.) If, however, we take $\delta = \epsilon$ and combine the limits, we get a convergent principal value integral, which is defined by
\[
\text{p.v.}\int_{-1}^{1}\frac{1}{x}\,dx = \lim_{\epsilon\to 0^+}\left(\int_{-1}^{-\epsilon}\frac{1}{x}\,dx + \int_{\epsilon}^{1}\frac{1}{x}\,dx\right) = \lim_{\epsilon\to 0^+}\left(\log\epsilon - \log\epsilon\right) = 0.
\]
The value of $0$ is what one might expect from the oddness of the integrand. A cancelation in the contributions from either side of the singularity is essential to obtain a finite limit.

The principal value integral of $1/x$ on a non-symmetric interval about $0$ still exists but is non-zero. For example, if $b > 0$, then
\[
\text{p.v.}\int_{-1}^{b}\frac{1}{x}\,dx = \lim_{\epsilon\to 0^+}\left(\int_{-1}^{-\epsilon}\frac{1}{x}\,dx + \int_{\epsilon}^{b}\frac{1}{x}\,dx\right) = \lim_{\epsilon\to 0^+}\left(\log\epsilon + \log b - \log\epsilon\right) = \log b.
\]
The crucial feature of a principal value integral is that we remove a symmetric interval around a singular point, or infinity. The resulting cancelation in the integral of a non-integrable function that changes sign across the singularity may lead to a finite limit.
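The importance of removing a *symmetric* hole can be seen numerically. In the sketch below (illustrative only; the names `midpoint`, `inv`, `sym`, and `asym` are ours), cutting out $(-\epsilon, \epsilon)$ gives a value near $0$, while cutting out the lopsided hole $(-2\epsilon, \epsilon)$ gives $\log(2\epsilon) - \log\epsilon = \log 2$ instead — so the answer depends on how the hole shrinks unless we insist on symmetry.

```python
import math

def midpoint(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

def inv(x):
    return 1.0 / x

# Symmetric hole (-eps, eps): the two log-divergences cancel, value near 0.
sym = midpoint(inv, -1.0, -1e-4) + midpoint(inv, 1e-4, 1.0)
# Lopsided hole (-2*eps, eps): the value is log 2 for every eps > 0,
# illustrating that only the symmetric choice defines the principal value.
asym = midpoint(inv, -1.0, -2e-4) + midpoint(inv, 1e-4, 1.0)
print(sym, asym, math.log(2.0))
```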
Definition 12.39. If $f : [a,b]\setminus\{c\} \to \mathbb{R}$ is integrable on closed intervals not including $a < c < b$, then the principal value integral of $f$ on $[a,b]$ is
\[
\text{p.v.}\int_a^b f = \lim_{\epsilon\to 0^+}\left(\int_a^{c-\epsilon} f + \int_{c+\epsilon}^{b} f\right).
\]
If $f : \mathbb{R} \to \mathbb{R}$ is integrable on compact intervals, then the principal value integral of $f$ on $\mathbb{R}$ is
\[
\text{p.v.}\int_{-\infty}^{\infty} f = \lim_{r\to\infty}\int_{-r}^{r} f.
\]
If the improper integral exists, then the principal value integral exists and is equal to the improper integral. As Example 12.38 shows, the principal value integral may exist even if the improper integral does not. Of course, a principal value integral may also diverge.
Example 12.40. Consider the principal value integral
\[
\text{p.v.}\int_{-1}^{1}\frac{1}{x^2}\,dx = \lim_{\epsilon\to 0^+}\left(\int_{-1}^{-\epsilon}\frac{1}{x^2}\,dx + \int_{\epsilon}^{1}\frac{1}{x^2}\,dx\right) = \lim_{\epsilon\to 0^+}\left(\frac{2}{\epsilon} - 2\right) = \infty.
\]
In this case, the function $1/x^2$ is positive and approaches $\infty$ on both sides of the singularity at $x = 0$, so there is no cancelation and the principal value integral diverges to $\infty$.
Principal value integrals arise frequently in complex analysis, harmonic analysis, and a variety of applications.

Figure 4. Graphs of the exponential integral $y = \operatorname{Ei}(x)$ (blue) and its derivative $y = e^x/x$ (green) from Example 12.41.

Example 12.41. The exponential integral Ei is a non-elementary function defined by
\[
\operatorname{Ei}(x) = \int_{-\infty}^{x}\frac{e^t}{t}\,dt.
\]
Its graph is shown in Figure 4. This integral has to be understood, in general, as an improper, principal value integral, and the function has a logarithmic singularity at $x = 0$.

If $x < 0$, then the integrand is continuous for $-\infty < t \le x$, and the integral is interpreted as an improper integral,
\[
\int_{-\infty}^{x}\frac{e^t}{t}\,dt = \lim_{r\to\infty}\int_{-r}^{x}\frac{e^t}{t}\,dt.
\]
This improper integral converges absolutely by comparison with $e^t$, since
\[
\left|\frac{e^t}{t}\right| \le e^t \quad\text{for } -\infty < t \le -1,
\]
and
\[
\int_{-\infty}^{-1} e^t\,dt = \lim_{r\to\infty}\int_{-r}^{-1} e^t\,dt = \lim_{r\to\infty}\left(e^{-1} - e^{-r}\right) = \frac{1}{e}.
\]
If $x > 0$, then the integrand has a non-integrable singularity at $t = 0$, and we interpret the integral as a principal value integral. We write
\[
\int_{-\infty}^{x}\frac{e^t}{t}\,dt = \int_{-\infty}^{-1}\frac{e^t}{t}\,dt + \int_{-1}^{x}\frac{e^t}{t}\,dt.
\]

The first integral is interpreted as an improper integral as before. The second integral is interpreted as a principal value integral
\[
\text{p.v.}\int_{-1}^{x}\frac{e^t}{t}\,dt = \lim_{\epsilon\to 0^+}\left(\int_{-1}^{-\epsilon}\frac{e^t}{t}\,dt + \int_{\epsilon}^{x}\frac{e^t}{t}\,dt\right).
\]
This principal value integral converges, since
\[
\text{p.v.}\int_{-1}^{x}\frac{e^t}{t}\,dt = \int_{-1}^{x}\frac{e^t - 1}{t}\,dt + \text{p.v.}\int_{-1}^{x}\frac{1}{t}\,dt = \int_{-1}^{x}\frac{e^t - 1}{t}\,dt + \log x.
\]
The first integral makes sense as a Riemann integral since the integrand has a removable singularity at $t = 0$, with
\[
\lim_{t\to 0}\frac{e^t - 1}{t} = 1,
\]
so it extends to a continuous function on $[-1, x]$.

Finally, if $x = 0$, then the integrand is unbounded at the left endpoint $t = 0$. The corresponding improper integral diverges, and $\operatorname{Ei}(0)$ is undefined.

The exponential integral arises in physical applications such as heat flow and radiative transfer. It is also related to the logarithmic integral
\[
\operatorname{li}(x) = \int_0^x \frac{dt}{\log t}
\]
by $\operatorname{li}(x) = \operatorname{Ei}(\log x)$. The logarithmic integral is important in number theory, where it gives an asymptotic approximation for the number of primes less than $x$ as $x \to \infty$.
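The identity splitting off the removable singularity can be checked numerically: computing the principal value by symmetric excision should agree with integrating $(e^t - 1)/t$ (a continuous function) and adding $\log x$. The sketch below is illustrative only; both helper names are ours.

```python
import math

def pv_exp_over_t(x, eps=1e-3, n=200_000):
    """p.v. integral of e^t/t over [-1, x] for x > 0, removing (-eps, eps)."""
    def midpoint(a, b):
        h = (b - a) / n
        return sum(math.exp(a + (k + 0.5) * h) / (a + (k + 0.5) * h)
                   for k in range(n)) * h
    return midpoint(-1.0, -eps) + midpoint(eps, x)

def via_removable(x, n=200_000):
    """Same quantity via the integral of (e^t - 1)/t over [-1, x] plus log x;
    the integrand extends continuously with value 1 at t = 0."""
    a, h = -1.0, (x + 1.0) / n
    def g(t):
        return (math.exp(t) - 1.0) / t if abs(t) > 1e-12 else 1.0
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h + math.log(x)

print(pv_exp_over_t(2.0), via_removable(2.0))
```

The two computations agree up to the small error from the finite excision parameter and the quadrature.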
Example 12.42. Let $f : \mathbb{R} \to \mathbb{R}$ and assume, for simplicity, that $f$ has compact support, meaning that $f = 0$ outside a compact interval $[-r, r]$. If $f$ is integrable, we define the Hilbert transform $Hf : \mathbb{R} \to \mathbb{R}$ of $f$ by the principal value integral
\[
Hf(x) = \frac{1}{\pi}\,\text{p.v.}\int_{-\infty}^{\infty}\frac{f(t)}{x - t}\,dt = \frac{1}{\pi}\lim_{\epsilon\to 0^+}\left(\int_{-\infty}^{x-\epsilon}\frac{f(t)}{x - t}\,dt + \int_{x+\epsilon}^{\infty}\frac{f(t)}{x - t}\,dt\right).
\]
Here, $x$ plays the role of a parameter in the integral with respect to $t$. We use a principal value because the integrand may have a non-integrable singularity at $t = x$. Since $f$ has compact support, the intervals of integration are bounded and there is no issue with the convergence of the integrals at infinity.
For example, suppose that $f$ is the step function
\[
f(x) = \begin{cases} 1 & \text{for } 0 \le x \le 1, \\ 0 & \text{for } x < 0 \text{ or } x > 1. \end{cases}
\]
If $x < 0$ or $x > 1$, then $t \ne x$ for $0 \le t \le 1$, and we get a proper Riemann integral
\[
Hf(x) = \frac{1}{\pi}\int_0^1 \frac{1}{x - t}\,dt = \frac{1}{\pi}\log\left|\frac{x}{x - 1}\right|.
\]
If $0 < x < 1$, then we get a principal value integral
\[
Hf(x) = \frac{1}{\pi}\lim_{\epsilon\to 0^+}\left(\int_0^{x-\epsilon}\frac{1}{x - t}\,dt + \int_{x+\epsilon}^{1}\frac{1}{x - t}\,dt\right) = \frac{1}{\pi}\lim_{\epsilon\to 0^+}\left(\log\frac{x}{\epsilon} + \log\frac{\epsilon}{1 - x}\right) = \frac{1}{\pi}\log\frac{x}{1 - x}.
\]
Thus, for $x \ne 0, 1$ we have
\[
Hf(x) = \frac{1}{\pi}\log\left|\frac{x}{x - 1}\right|.
\]
The principal value integral with respect to $t$ diverges if $x = 0, 1$ because $f(t)$ has a jump discontinuity at the point where $t = x$. Consequently the values $Hf(0)$, $Hf(1)$ of the Hilbert transform of the step function are undefined.
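The closed form for the Hilbert transform of the step function can be checked numerically. The sketch below (illustrative only; the helper name `hilbert_step` is ours) deletes a symmetric interval around $t = x$, whose contribution to the principal value is exactly zero, and integrates the smooth remainder; it is valid for the sample points chosen, where the deleted interval lies inside $[0,1]$ or misses it entirely.

```python
import math

def hilbert_step(x, delta=0.01, n=200_000):
    """(1/pi) p.v. integral of 1/(x - t) for t in [0, 1], computed by deleting
    the symmetric interval (x - delta, x + delta) around the singularity."""
    def midpoint(a, b):
        if b <= a:
            return 0.0
        h = (b - a) / n
        return sum(1.0 / (x - (a + (k + 0.5) * h)) for k in range(n)) * h
    return (midpoint(0.0, min(1.0, x - delta))
            + midpoint(max(0.0, x + delta), 1.0)) / math.pi

# Compare with the closed form (1/pi) log |x / (x - 1)| away from 0 and 1.
for x in (0.25, 0.5, 2.0):
    print(x, hilbert_step(x), math.log(abs(x / (x - 1.0))) / math.pi)
```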

12.6. The integral test for series

As a further application of the improper integral, we prove a useful test for the convergence or divergence of a monotone decreasing, positive series. The idea is to interpret the series as an upper or lower sum of an integral.

Theorem 12.43 (Integral test). Suppose that $f : [1,\infty) \to \mathbb{R}$ is a positive decreasing function (i.e., $0 \le f(x) \le f(y)$ for $x \ge y$). Let $a_n = f(n)$. Then the series
\[
\sum_{n=1}^{\infty} a_n
\]
converges if and only if the improper integral
\[
\int_1^\infty f(x)\,dx
\]
converges. Furthermore, the limit
\[
D = \lim_{n\to\infty}\left(\sum_{k=1}^{n} a_k - \int_1^n f(x)\,dx\right)
\]
exists, and $0 \le D \le a_1$.
Proof. Let
\[
S_n = \sum_{k=1}^{n} a_k, \qquad T_n = \int_1^n f(x)\,dx.
\]
The integral $T_n$ exists since $f$ is monotone, and the sequences $(S_n)$, $(T_n)$ are increasing since $f$ is positive.

Let
\[
P_n = \{[1,2], [2,3], \dots, [n-1,n]\}
\]
be the partition of $[1,n]$ into $n - 1$ intervals of length one. Since $f$ is decreasing,
\[
\sup_{[k,k+1]} f = a_k, \qquad \inf_{[k,k+1]} f = a_{k+1},
\]
and the upper and lower sums of $f$ on $P_n$ are given by
\[
U(f; P_n) = \sum_{k=1}^{n-1} a_k, \qquad L(f; P_n) = \sum_{k=1}^{n-1} a_{k+1}.
\]
Since the integral of $f$ on $[1,n]$ is bounded by its upper and lower sums, we get that
\[
S_n - a_1 \le T_n \le S_{n-1}.
\]
This inequality shows that $(T_n)$ is bounded from above by $S$ if $S_n \uparrow S$, and $(S_n)$ is bounded from above by $T + a_1$ if $T_n \uparrow T$, so $(S_n)$ converges if and only if $(T_n)$ converges, which proves the first part of the theorem.

Let $D_n = S_n - T_n$. Then the inequality shows that $a_n \le D_n \le a_1$; in particular, $(D_n)$ is bounded from below by zero. Moreover, since $f$ is decreasing,
\[
D_n - D_{n+1} = \int_n^{n+1} f(x)\,dx - a_{n+1} \ge f(n+1)\cdot 1 - a_{n+1} = 0,
\]
so $(D_n)$ is decreasing. Therefore $D_n \downarrow D$ where $0 \le D \le a_1$, which proves the second part of the theorem.
A basic application of this result is to the $p$-series.

Example 12.44. Applying Theorem 12.43 to the function $f(x) = 1/x^p$ and using Example 12.27, we find that
\[
\sum_{n=1}^{\infty}\frac{1}{n^p}
\]
converges if $p > 1$ and diverges if $0 < p \le 1$.

Theorem 12.43 is also useful for divergent series, since it tells us how quickly their partial sums diverge. We remark that one can obtain similar, but more accurate, asymptotic approximations than the one in the theorem for the behavior of the partial sums in terms of integrals, called the Euler-MacLaurin summation formulae.

Example 12.45. Applying the second part of Theorem 12.43 to the function $f(x) = 1/x$, we find that
\[
\lim_{n\to\infty}\left(\sum_{k=1}^{n}\frac{1}{k} - \log n\right) = \gamma,
\]
where the limit $0 \le \gamma < 1$ is the Euler constant.
Example 12.46. We can use the result of Example 12.45 to compute the sum $A$ of the alternating harmonic series
\[
A = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \dots.
\]
The partial sum of the first $2m$ terms is given by
\[
A_{2m} = \sum_{k=1}^{m}\frac{1}{2k-1} - \sum_{k=1}^{m}\frac{1}{2k} = \sum_{k=1}^{2m}\frac{1}{k} - 2\sum_{k=1}^{m}\frac{1}{2k} = \sum_{k=1}^{2m}\frac{1}{k} - \sum_{k=1}^{m}\frac{1}{k}.
\]
Here, we rewrite a sum of the odd terms in the harmonic series as the difference between the harmonic series and its even terms, then use the fact that a sum of the even terms in the harmonic series is one-half the sum of the series. It follows that
\[
\lim_{m\to\infty} A_{2m} = \lim_{m\to\infty}\left[\left(\sum_{k=1}^{2m}\frac{1}{k} - \log 2m\right) - \left(\sum_{k=1}^{m}\frac{1}{k} - \log m\right) + \log 2m - \log m\right].
\]
Since $\log 2m - \log m = \log 2$, we get that
\[
\lim_{m\to\infty} A_{2m} = \gamma - \gamma + \log 2 = \log 2.
\]
The odd partial sums also converge to $\log 2$, since
\[
A_{2m+1} = A_{2m} + \frac{1}{2m+1} \to \log 2 \quad\text{as } m \to \infty.
\]
Thus,
\[
\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n} = \log 2.
\]
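A short numerical check of this sum (illustrative only; the helper name `alt_harmonic_partial` is ours):

```python
import math

def alt_harmonic_partial(n):
    """Partial sum of the alternating harmonic series 1 - 1/2 + 1/3 - 1/4 + ..."""
    return sum((-1) ** (k + 1) / k for k in range(1, n + 1))

# The even and odd partial sums squeeze log 2 between them;
# the error after n terms is at most 1/(n + 1).
print(alt_harmonic_partial(10**6), math.log(2.0))
```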
Example 12.47. A similar calculation to the previous one can be used to compute the sum $S$ of the rearrangement of the alternating harmonic series
\[
S = 1 - \frac{1}{2} - \frac{1}{4} + \frac{1}{3} - \frac{1}{6} - \frac{1}{8} + \frac{1}{5} - \frac{1}{10} - \frac{1}{12} + \dots
\]
discussed in Example 4.32. The partial sum $S_{3m}$ of the series may be written in terms of partial sums of the harmonic series as
\[
S_{3m} = 1 - \frac{1}{2} - \frac{1}{4} + \dots + \frac{1}{2m-1} - \frac{1}{4m-2} - \frac{1}{4m} = \sum_{k=1}^{m}\frac{1}{2k-1} - \sum_{k=1}^{2m}\frac{1}{2k} = \sum_{k=1}^{2m}\frac{1}{k} - \frac{1}{2}\sum_{k=1}^{m}\frac{1}{k} - \frac{1}{2}\sum_{k=1}^{2m}\frac{1}{k}.
\]
It follows that
\[
\lim_{m\to\infty} S_{3m} = \lim_{m\to\infty}\left[\left(\sum_{k=1}^{2m}\frac{1}{k} - \log 2m\right) - \frac{1}{2}\left(\sum_{k=1}^{m}\frac{1}{k} - \log m\right) - \frac{1}{2}\left(\sum_{k=1}^{2m}\frac{1}{k} - \log 2m\right) + \log 2m - \frac{1}{2}\log m - \frac{1}{2}\log 2m\right].
\]
Since $\log 2m - \frac{1}{2}\log m - \frac{1}{2}\log 2m = \frac{1}{2}\log 2$, we get that
\[
\lim_{m\to\infty} S_{3m} = \gamma - \frac{1}{2}\gamma - \frac{1}{2}\gamma + \frac{1}{2}\log 2 = \frac{1}{2}\log 2.
\]
Finally, since
\[
\lim_{m\to\infty}\left(S_{3m+1} - S_{3m}\right) = \lim_{m\to\infty}\left(S_{3m+2} - S_{3m}\right) = 0,
\]
we conclude that the whole series converges to $S = \frac{1}{2}\log 2$.
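The effect of the rearrangement is striking numerically: the same terms, grouped one odd term then two even terms, sum to half as much. The sketch below is illustrative only; the helper name `rearranged_partial` is ours.

```python
import math

def rearranged_partial(m):
    """S_{3m} for the rearrangement 1 - 1/2 - 1/4 + 1/3 - 1/6 - 1/8 + ...:
    one odd-denominator term followed by two even-denominator terms."""
    return sum(1.0 / (2 * k - 1) - 1.0 / (4 * k - 2) - 1.0 / (4 * k)
               for k in range(1, m + 1))

# The rearranged series converges to (1/2) log 2, not log 2.
print(rearranged_partial(10**6), 0.5 * math.log(2.0))
```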

12.7. Taylor's theorem with integral remainder

In Theorem 8.46, we gave an expression for the error between a function and its Taylor polynomial of order $n$ in terms of the Lagrange remainder, which involves a pointwise value of the derivative of order $n + 1$ evaluated at some intermediate point. In the next theorem, we give an alternative expression for the error in terms of an integral of the derivative.

Theorem 12.48 (Taylor with integral remainder). Suppose that $f : (a,b) \to \mathbb{R}$ has $n + 1$ derivatives on $(a,b)$ and $f^{(n+1)}$ is Riemann integrable on every subinterval of $(a,b)$. Let $a < c < b$. Then for every $a < x < b$,
\[
f(x) = f(c) + f'(c)(x - c) + \frac{1}{2!}f''(c)(x - c)^2 + \dots + \frac{1}{n!}f^{(n)}(c)(x - c)^n + R_n(x),
\]
where
\[
R_n(x) = \frac{1}{n!}\int_c^x f^{(n+1)}(t)(x - t)^n\,dt.
\]
Proof. We use proof by induction. The formula is true for $n = 0$, since the fundamental theorem of calculus (Theorem 12.1) implies that
\[
f(x) = f(c) + \int_c^x f'(t)\,dt = f(c) + R_0(x).
\]
Assume that the formula is true for some $n \in \mathbb{N}_0$ and $f^{(n+2)}$ is Riemann integrable. Then, since
\[
(x - t)^n = -\frac{1}{n+1}\frac{d}{dt}(x - t)^{n+1},
\]
an integration by parts with respect to $t$ (Theorem 12.10) implies that
\[
R_n(x) = -\left[\frac{1}{(n+1)!}f^{(n+1)}(t)(x - t)^{n+1}\right]_{t=c}^{t=x} + \frac{1}{(n+1)!}\int_c^x f^{(n+2)}(t)(x - t)^{n+1}\,dt = \frac{1}{(n+1)!}f^{(n+1)}(c)(x - c)^{n+1} + R_{n+1}(x).
\]
Use of this equation in the formula for $n$ gives the formula for $n + 1$, which proves the result.

By making the change of variable $t = c + s(x - c)$, we can also write the remainder as
\[
R_n(x) = \frac{1}{n!}\left(\int_0^1 f^{(n+1)}(c + s(x - c))(1 - s)^n\,ds\right)(x - c)^{n+1}.
\]
In particular, if $|f^{(n+1)}(x)| \le M$ for $a < x < b$, then
\[
|R_n(x)| \le \frac{1}{n!}M\left(\int_0^1 (1 - s)^n\,ds\right)|x - c|^{n+1} = \frac{M}{(n+1)!}|x - c|^{n+1},
\]
which agrees with what one gets from the Lagrange remainder.

Thus, the integral form of the remainder is as effective as the Lagrange form in estimating its size from a uniform bound on the derivative. The integral form requires slightly stronger assumptions than the Lagrange form, since we need to assume that the derivative of order $n + 1$ is integrable, but its proof is straightforward once we have the integral. Moreover, the integral form generalizes to vector-valued functions $f : (a,b) \to \mathbb{R}^n$, while the Lagrange form does not.
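As a concrete check of Theorem 12.48, the sketch below (illustrative only; the helper name `taylor_remainder_exp` is ours) evaluates the integral remainder for $f = \exp$, where every derivative is again $\exp$, and compares it with the true error of the degree-$n$ Taylor polynomial at $c = 0$.

```python
import math

def taylor_remainder_exp(x, c, n, steps=100_000):
    """Integral remainder R_n(x) = (1/n!) * integral over [c, x] of
    f^{(n+1)}(t) (x - t)^n dt for f = exp, via the midpoint rule."""
    h = (x - c) / steps
    total = sum(math.exp(c + (k + 0.5) * h) * (x - (c + (k + 0.5) * h)) ** n
                for k in range(steps)) * h
    return total / math.factorial(n)

x, c, n = 1.0, 0.0, 4
poly = sum((x - c) ** k / math.factorial(k) for k in range(n + 1))
# The true Taylor error e - p_4(1) matches the remainder integral.
print(math.exp(x) - poly, taylor_remainder_exp(x, c, n))
```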

Chapter 13

Metric, Normed, and Topological Spaces
A metric space is a set X that has a notion of the distance d(x, y) between every pair of points x, y ∈ X. A fundamental example is R with the absolute-value metric d(x, y) = |x − y|, and nearly all of the concepts we discuss below for metric spaces are natural generalizations of the corresponding concepts for R.
A special type of metric space that is particularly important in analysis is a normed space, which is a vector space whose metric is derived from a norm. On the other hand, every metric space is a special type of topological space, which is a set with the notion of an open set but not necessarily a distance.
The concepts of metric, normed, and topological spaces clarify our previous discussion of the analysis of real functions, and they provide the foundation for wide-ranging developments in analysis. The aim of this chapter is to introduce these spaces and give some examples, but their theory is too extensive to describe here in any detail.

13.1. Metric spaces
A metric on a set is a function that satisfies the minimal properties we might expect of a distance.
Definition 13.1. A metric d on a set X is a function d : X × X → R such that for all x, y, z ∈ X:
(1) d(x, y) ≥ 0 and d(x, y) = 0 if and only if x = y (positivity);
(2) d(x, y) = d(y, x) (symmetry);
(3) d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality).
A metric space (X, d) is a set X with a metric d defined on X.

In general, many different metrics can be defined on the same set X, but if the metric on X is clear from the context, we refer to X as a metric space.
Subspaces of a metric space are subsets whose metric is obtained by restricting the metric on the whole space.
Definition 13.2. Let (X, d) be a metric space. A metric subspace (A, dA ) of (X, d) consists of a subset A ⊂ X whose metric dA : A × A → R is the restriction of d to A; that is, dA (x, y) = d(x, y) for all x, y ∈ A.
We can often formulate intrinsic properties of a subset A ⊂ X of a metric space
X in terms of properties of the corresponding metric subspace (A, dA ).
When it is clear that we are discussing metric spaces, we refer to a metric subspace as a subspace, but metric subspaces should not be confused with other types of subspaces (for example, vector subspaces of a vector space).
13.1.1. Examples. In the following examples of metric spaces, the verification of the properties of a metric is mostly straightforward and is left as an exercise.
Example 13.3. A rather trivial example of a metric on any set $X$ is the discrete metric
\[
d(x,y) = \begin{cases} 0 & \text{if } x = y, \\ 1 & \text{if } x \ne y. \end{cases}
\]
This metric is nevertheless useful in illustrating the definitions and providing counterexamples.
Example 13.4. Define d : R × R → R by d(x, y) = |x − y|.
Then d is a metric on R. The natural numbers N and the rational numbers Q with the absolute-value metric are metric subspaces of R, as is any other subset A ⊂ R.
Example 13.5. Define $d : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ by
\[
d(x,y) = |x_1 - y_1| + |x_2 - y_2|, \qquad x = (x_1, x_2), \quad y = (y_1, y_2).
\]
Then $d$ is a metric on $\mathbb{R}^2$, called the $\ell^1$ metric. (Here, "$\ell^1$" is pronounced "ell-one.") For example, writing $z = (z_1, z_2)$, we have
\[
d(x,y) = |x_1 - z_1 + z_1 - y_1| + |x_2 - z_2 + z_2 - y_2| \le |x_1 - z_1| + |z_1 - y_1| + |x_2 - z_2| + |z_2 - y_2| = d(x,z) + d(z,y),
\]
so $d$ satisfies the triangle inequality. This metric is sometimes referred to informally as the "taxicab" metric, since it's the distance one would travel by taxi on a rectangular grid of streets.
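The three metric axioms of Definition 13.1 can be exercised on random points. The sketch below (illustrative only; the helper name `d1` is ours) brute-force checks positivity, symmetry, and the triangle inequality for the taxicab metric on a random sample.

```python
import itertools
import random

def d1(x, y):
    """The l^1 ("taxicab") metric on R^2."""
    return abs(x[0] - y[0]) + abs(x[1] - y[1])

random.seed(0)
pts = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(50)]
for x, y, z in itertools.product(pts, repeat=3):
    assert d1(x, y) >= 0                               # positivity
    assert d1(x, y) == d1(y, x)                        # symmetry
    assert d1(x, y) <= d1(x, z) + d1(z, y) + 1e-12     # triangle inequality
print("metric axioms hold on the sample")
```

Such a check is no proof, of course, but it is a cheap sanity test when experimenting with candidate distance functions.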
Example 13.6. Define $d : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ by
\[
d(x,y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}, \qquad x = (x_1, x_2), \quad y = (y_1, y_2).
\]
Then $d$ is a metric on $\mathbb{R}^2$, called the Euclidean, or $\ell^2$, metric. It corresponds to the usual notion of distance between points in the plane. The triangle inequality is geometrically obvious, but an analytical proof is non-trivial (see Theorem 13.26 below).

Figure 1. The graph of a function $f \in C([0,1])$ is in blue. A function whose distance from $f$ with respect to the sup-norm is less than $0.1$ has a graph that lies inside the dotted red lines $y = f(x) \pm 0.1$, e.g., the green graph.

Example 13.7. Define $d : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ by
\[
d(x,y) = \max\left(|x_1 - y_1|, |x_2 - y_2|\right), \qquad x = (x_1, x_2), \quad y = (y_1, y_2).
\]
Then $d$ is a metric on $\mathbb{R}^2$, called the $\ell^\infty$, or maximum, metric.

Example 13.8. Define $d : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ for $x = (x_1, x_2)$, $y = (y_1, y_2)$ as follows: if $(x_1, x_2) \ne k(y_1, y_2)$ for every $k \in \mathbb{R}$, then
\[
d(x,y) = \sqrt{x_1^2 + x_2^2} + \sqrt{y_1^2 + y_2^2};
\]
and if $(x_1, x_2) = k(y_1, y_2)$ for some $k \in \mathbb{R}$, then
\[
d(x,y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}.
\]
That is, $d(x,y)$ is the sum of the Euclidean distances of $x$ and $y$ from the origin, unless $x$ and $y$ lie on the same line through the origin, in which case it is the Euclidean distance from $x$ to $y$. Then $d$ defines a metric on $\mathbb{R}^2$.

In England, $d$ is sometimes called the "British Rail" metric, because all the train lines radiate from London (located at $0$). To take a train from town $x$ to town $y$, one has to take a train from $x$ to $0$ and then take a train from $0$ to $y$, unless $x$ and $y$ are on the same line, when one can take a direct train.
Example 13.9. Let $C(K)$ denote the set of continuous functions $f : K \to \mathbb{R}$, where $K \subset \mathbb{R}$ is compact; for example, $K = [a,b]$ is a closed, bounded interval. If $f, g \in C(K)$, define
\[
d(f,g) = \sup_{x\in K}|f(x) - g(x)| = \|f - g\|_\infty, \qquad \|f\|_\infty = \sup_{x\in K}|f(x)|.
\]

Figure 2. The unit balls $B_1(0)$ on $\mathbb{R}^2$ for different metrics: they are the interior of a diamond ($\ell^1$-norm), a circle ($\ell^2$-norm), or a square ($\ell^\infty$-norm). The $\ell^\infty$-ball of radius $1/2$ is also indicated by the dashed line.

The function $d : C(K) \times C(K) \to \mathbb{R}$ is well-defined, since a continuous function on a compact set is bounded, and $d$ is a metric on $C(K)$. Two functions are close with respect to this metric if their values are close at every point $x \in K$. (See Figure 1.) We refer to $\|f\|_\infty$ as the sup-norm of $f$. Section 13.6 has further discussion.
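In computations one often approximates this sup-norm distance by taking the maximum over a fine grid; for continuous functions the grid maximum converges to the true supremum as the grid is refined. A small sketch (illustrative only; the helper name `sup_dist` is ours):

```python
import math

def sup_dist(f, g, a=0.0, b=1.0, n=10_000):
    """Grid approximation of d(f, g) = sup over [a, b] of |f(x) - g(x)|."""
    return max(abs(f(a + k * (b - a) / n) - g(a + k * (b - a) / n))
               for k in range(n + 1))

# d(sin, x) on [0, 1]: since x - sin x is increasing there, the sup is
# attained at the endpoint x = 1, giving 1 - sin 1 ~ 0.1585.
print(sup_dist(math.sin, lambda x: x))
```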
13.1.2. Open and closed balls. A ball in a metric space is analogous to an interval in R.
Definition 13.10. Let $(X,d)$ be a metric space. The open ball $B_r(x)$ of radius $r > 0$ and center $x \in X$ is the set of points whose distance from $x$ is less than $r$,
\[
B_r(x) = \{y \in X : d(x,y) < r\}.
\]
The closed ball $\bar{B}_r(x)$ of radius $r > 0$ and center $x \in X$ is the set of points whose distance from $x$ is less than or equal to $r$,
\[
\bar{B}_r(x) = \{y \in X : d(x,y) \le r\}.
\]
The term "ball" is used to denote a "solid ball," rather than the "sphere" of points whose distance from the center $x$ is equal to $r$.

Example 13.11. Consider $\mathbb{R}$ with its standard absolute-value metric, defined in Example 13.4. Then the open ball $B_r(x) = \{y \in \mathbb{R} : |x - y| < r\}$ is the open interval of radius $r$ centered at $x$, and the closed ball $\bar{B}_r(x) = \{y \in \mathbb{R} : |x - y| \le r\}$ is the closed interval of radius $r$ centered at $x$.

Example 13.12. For $\mathbb{R}^2$ with the Euclidean metric defined in Example 13.6, the ball $B_r(x)$ is an open disc of radius $r$ centered at $x$. For the $\ell^1$-metric in Example 13.5, the ball is a diamond of diameter $2r$, and for the $\ell^\infty$-metric in Example 13.7, it is a square of side $2r$. The unit ball $B_1(0)$ for each of these metrics is illustrated in Figure 2.

Example 13.13. Consider the space $C(K)$ of continuous functions $f : K \to \mathbb{R}$ on a compact set $K \subset \mathbb{R}$ with the sup-norm metric defined in Example 13.9. The ball $B_r(f)$ consists of all continuous functions $g : K \to \mathbb{R}$ whose values are within $r$ of the values of $f$ at every $x \in K$. For example, for the function $f$ shown in Figure 1 with $r = 0.1$, the open ball $B_r(f)$ consists of all continuous functions $g$ whose graphs lie between the red lines.

One has to be a little careful with the notion of balls in a general metric space, because they don't always behave the way their name suggests.

Example 13.14. Let $X$ be a set with the discrete metric given in Example 13.3. Then $B_r(x) = \{x\}$ consists of a single point if $0 < r \le 1$, and $B_r(x) = X$ is the whole space if $r > 1$. (See also Example 13.44.)

As another example, what are the open balls for the metric in Example 13.8?
A set in a metric space is bounded if it is contained in a ball of finite radius.

Definition 13.15. Let $(X,d)$ be a metric space. A set $A \subset X$ is bounded if there exist $x \in X$ and $0 \le R < \infty$ such that $d(x,y) \le R$ for all $y \in A$, meaning that $A \subset \bar{B}_R(x)$.

Unlike $\mathbb{R}$, or a vector space, a general metric space has no distinguished origin, but the center point of the ball is not important in this definition of a bounded set. The triangle inequality implies that $d(y,z) < R + d(x,y)$ if $d(x,z) < R$, so
\[
B_R(x) \subset B_{R'}(y) \quad\text{for } R' = R + d(x,y).
\]
Thus, if Definition 13.15 holds for some $x \in X$, then it holds for every $x \in X$. We can say equivalently that $A \subset X$ is bounded if the metric subspace $(A, d_A)$ is bounded.

Example 13.16. Let $X$ be a set with the discrete metric given in Example 13.3. Then $X$ is bounded, since $X = B_r(x)$ if $r > 1$ and $x \in X$.

Example 13.17. A subset $A \subset \mathbb{R}$ is bounded with respect to the absolute-value metric if $A \subset (-R, R)$ for some $0 < R < \infty$.

Example 13.18. Let $C(K)$ be the space of continuous functions $f : K \to \mathbb{R}$ on a compact set defined in Example 13.9. The set $\mathcal{F} \subset C(K)$ of all continuous functions $f : K \to \mathbb{R}$ such that $|f(x)| \le 1$ for every $x \in K$ is bounded, since $d(f,0) = \|f\|_\infty \le 1$ for all $f \in \mathcal{F}$. The set of constant functions $\{f : f(x) = c \text{ for all } x \in K\}$ isn't bounded, since $\|f\|_\infty = |c|$ may be arbitrarily large.
We define the diameter of a set in an analogous way to Definition 3.5 for subsets of $\mathbb{R}$.

Definition 13.19. Let $(X,d)$ be a metric space and $A \subset X$. The diameter $0 \le \operatorname{diam} A \le \infty$ of $A$ is
\[
\operatorname{diam} A = \sup\{d(x,y) : x, y \in A\}.
\]
It follows from the definitions that $A$ is bounded if and only if $\operatorname{diam} A < \infty$.

The notions of an upper bound, lower bound, supremum, and infimum in $\mathbb{R}$ depend on its order properties. Unlike properties of $\mathbb{R}$ based on the absolute value, they do not generalize to an arbitrary metric space, which isn't equipped with an order relation.

13.2. Normed spaces

In general, there are no algebraic operations defined on a metric space, only a distance function. Most of the spaces that arise in analysis are vector, or linear, spaces, and the metrics on them are usually derived from a norm, which gives the "length" of a vector. We assume that the reader is familiar with the basic theory of vector spaces, and we consider only real vector spaces.

Definition 13.20. A normed vector space $(X, \|\cdot\|)$ is a vector space $X$ together with a function $\|\cdot\| : X \to \mathbb{R}$, called a norm on $X$, such that for all $x, y \in X$ and $k \in \mathbb{R}$:

(1) $0 \le \|x\| < \infty$ and $\|x\| = 0$ if and only if $x = 0$;
(2) $\|kx\| = |k|\,\|x\|$;
(3) $\|x + y\| \le \|x\| + \|y\|$.

The properties in Definition 13.20 are natural ones to require of a length. The length of $x$ is $0$ if and only if $x$ is the $0$-vector; multiplying a vector by a scalar $k$ multiplies its length by $|k|$; and the length of the "hypotenuse" $x + y$ is less than or equal to the sum of the lengths of the "sides" $x$, $y$. Because of this last interpretation, property (3) is called the triangle inequality. We also refer to a normed vector space as a normed space for short.

Proposition 13.21. If $(X, \|\cdot\|)$ is a normed vector space, then $d : X \times X \to \mathbb{R}$ defined by $d(x,y) = \|x - y\|$ is a metric on $X$.

Proof. The metric-properties of $d$ follow directly from the properties of a norm in Definition 13.20. The positivity is immediate. Also, we have
\[
d(x,y) = \|x - y\| = \|-(x - y)\| = \|y - x\| = d(y,x),
\]
\[
d(x,y) = \|x - z + z - y\| \le \|x - z\| + \|z - y\| = d(x,z) + d(z,y),
\]
which proves the symmetry of $d$ and the triangle inequality.
If X is a normed vector space, then we always use the metric associated with its norm, unless stated specifically otherwise.
A metric associated with a norm has the additional properties that for all $x, y, z \in X$ and $k \in \mathbb{R}$,
\[
d(x + z, y + z) = d(x,y), \qquad d(kx, ky) = |k|\,d(x,y),
\]
which are called translation invariance and homogeneity, respectively. These properties imply that the open balls $B_r(x)$ in a normed space are rescaled, translated versions of the unit ball $B_1(0)$.

Example 13.22. The set of real numbers $\mathbb{R}$ with the absolute-value norm $|\cdot|$ is a one-dimensional normed vector space.

Example 13.23. The discrete metric in Example 13.3 on $\mathbb{R}$, and the metric in Example 13.8 on $\mathbb{R}^2$, are not derived from a norm. (Why?)
Example 13.24. The space $\mathbb{R}^2$ with any of the norms defined for $x = (x_1, x_2)$ by
\[
\|x\|_1 = |x_1| + |x_2|, \qquad \|x\|_2 = \sqrt{x_1^2 + x_2^2}, \qquad \|x\|_\infty = \max\left(|x_1|, |x_2|\right)
\]
is a normed vector space. The corresponding metrics are the "taxicab" metric in Example 13.5, the Euclidean metric in Example 13.6, and the maximum metric in Example 13.7, respectively.

The norms in Example 13.24 are special cases of a fundamental family of $\ell^p$-norms on $\mathbb{R}^n$. All of the $\ell^p$-norms reduce to the absolute-value norm if $n = 1$, but they are different if $n \ge 2$.
Definition 13.25. For $1 \le p < \infty$, the $\ell^p$-norm $\|\cdot\|_p : \mathbb{R}^n \to \mathbb{R}$ is defined for $x = (x_1, x_2, \dots, x_n) \in \mathbb{R}^n$ by
\[
\|x\|_p = \left(|x_1|^p + |x_2|^p + \dots + |x_n|^p\right)^{1/p}.
\]
The $\ell^2$-norm is called the Euclidean norm. For $p = \infty$, the $\ell^\infty$-norm $\|\cdot\|_\infty : \mathbb{R}^n \to \mathbb{R}$ is defined by
\[
\|x\|_\infty = \max\left(|x_1|, |x_2|, \dots, |x_n|\right).
\]
The notation for the $\ell^\infty$-norm is explained by the fact that
\[
\|x\|_\infty = \lim_{p\to\infty}\|x\|_p.
\]
Moreover, consistent with its name, the $\ell^p$-norm is a norm.

Theorem 13.26. Let $1 \le p \le \infty$. The space $\mathbb{R}^n$ with the $\ell^p$-norm is a normed vector space.

Proof. The space $\mathbb{R}^n$ is an $n$-dimensional vector space, so we just need to verify the properties of the norm. The positivity and homogeneity of the $\ell^p$-norm follow immediately from its definition. We verify the triangle inequality here only for the cases $p = 1, \infty$.

Let $x = (x_1, x_2, \dots, x_n)$ and $y = (y_1, y_2, \dots, y_n)$ be points in $\mathbb{R}^n$. For $p = 1$, we have
\[
\|x + y\|_1 = |x_1 + y_1| + |x_2 + y_2| + \dots + |x_n + y_n| \le |x_1| + |y_1| + |x_2| + |y_2| + \dots + |x_n| + |y_n| = \|x\|_1 + \|y\|_1.
\]
For $p = \infty$, we have
\[
\begin{aligned}
\|x + y\|_\infty &= \max\left(|x_1 + y_1|, |x_2 + y_2|, \dots, |x_n + y_n|\right) \\
&\le \max\left(|x_1| + |y_1|, |x_2| + |y_2|, \dots, |x_n| + |y_n|\right) \\
&\le \max\left(|x_1|, |x_2|, \dots, |x_n|\right) + \max\left(|y_1|, |y_2|, \dots, |y_n|\right) = \|x\|_\infty + \|y\|_\infty.
\end{aligned}
\]
The proof of the triangle inequality for $1 < p < \infty$ is more difficult and is given in Section 13.7.

We can use Definition 13.25 to define $\|x\|_p$ for any $0 < p \le \infty$. However, if $0 < p < 1$, then $\|\cdot\|_p$ doesn't satisfy the triangle inequality, so it is not a norm. This explains the restriction $1 \le p \le \infty$.
Although the $\ell^p$-norms are numerically different for different values of $p$, they are equivalent in the following sense (see Corollary 13.29).

Definition 13.27. Let $X$ be a vector space. Two norms $\|\cdot\|_a$, $\|\cdot\|_b$ on $X$ are equivalent if there exist strictly positive constants $M \ge m > 0$ such that
\[
m\|x\|_a \le \|x\|_b \le M\|x\|_a \quad\text{for all } x \in X.
\]
Geometrically, two norms are equivalent if and only if an open ball with respect to either one of the norms contains an open ball with respect to the other. Equivalent norms define the same open sets, convergent sequences, and continuous functions, so there are no topological differences between them.

The next theorem shows that every $\ell^p$-norm is equivalent to the $\ell^\infty$-norm. (See Figure 2.)

Theorem 13.28. Suppose that $1 \le p < \infty$. Then, for every $x \in \mathbb{R}^n$,
\[
\|x\|_\infty \le \|x\|_p \le n^{1/p}\|x\|_\infty.
\]
Proof. Let $x = (x_1, x_2, \dots, x_n) \in \mathbb{R}^n$. Then for each $1 \le i \le n$, we have
\[
|x_i| \le \left(|x_1|^p + |x_2|^p + \dots + |x_n|^p\right)^{1/p} = \|x\|_p,
\]
which implies that
\[
\|x\|_\infty = \max\{|x_i| : 1 \le i \le n\} \le \|x\|_p.
\]
On the other hand, since $|x_i| \le \|x\|_\infty$ for every $1 \le i \le n$, we have
\[
\|x\|_p \le \left(n\|x\|_\infty^p\right)^{1/p} = n^{1/p}\|x\|_\infty,
\]
which proves the result.
As an immediate consequence, we get the equivalence of the $\ell^p$-norms.

Corollary 13.29. The $\ell^p$- and $\ell^q$-norms on $\mathbb{R}^n$ are equivalent for every $1 \le p, q \le \infty$.

Proof. We have
\[
n^{-1/q}\|x\|_q \le \|x\|_\infty \le \|x\|_p \le n^{1/p}\|x\|_\infty \le n^{1/p}\|x\|_q.
\]
With more work, one can prove that $\|x\|_q \le \|x\|_p$ for $1 \le p \le q \le \infty$, meaning that the unit ball with respect to the $\ell^q$-norm contains the unit ball with respect to the $\ell^p$-norm.
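These inequalities are easy to exercise numerically. The sketch below (illustrative only; the helper name `lp_norm` is ours) checks the two-sided bound of Theorem 13.28 and the monotonicity $\|x\|_q \le \|x\|_p$ for $p \le q$ on a sample vector.

```python
def lp_norm(x, p):
    """The l^p-norm on R^n; p = float('inf') gives the maximum norm."""
    if p == float("inf"):
        return max(abs(xi) for xi in x)
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

x = [3.0, -4.0, 1.0]
n = len(x)
inf = float("inf")
for p in (1, 2, 4):
    # Theorem 13.28: the l^p-norm is squeezed between the max norm
    # and n^(1/p) times the max norm.
    assert lp_norm(x, inf) <= lp_norm(x, p) <= n ** (1.0 / p) * lp_norm(x, inf)
# Monotonicity in p: ||x||_q <= ||x||_p for p <= q.
assert lp_norm(x, 4) <= lp_norm(x, 2) <= lp_norm(x, 1)
print("inequalities verified for", x)
```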

13.3. Open and closed sets
There are natural definitions of open and closed sets in a metric space, analogous to the definitions in R. Many of the properties of such sets in R carry over immediately to general metric spaces.
Definition 13.30. Let X be a metric space. A set G ⊂ X is open if for every x ∈ G there exists r > 0 such that Br (x) ⊂ G. A subset F ⊂ X is closed if
F c = X \ F is open.
We can rephrase this definition more geometrically in terms of neighborhoods.
Definition 13.31. Let X be a metric space. A set U ⊂ X is a neighborhood of x ∈ X if Br (x) ⊂ U for some r > 0.
Definition 13.30 then states that a subset of a metric space is open if and only if every point in the set has a neighborhood that is contained in the set. In particular, a set is open if and only if it is a neighborhood of every point in the set.
Example 13.32. If d is the discrete metric on a set X in Example 13.3, then every subset A ⊂ X is open, since for every x ∈ A we have B1/2 (x) = {x} ⊂ A. Every set is also closed, since its complement is open.
Example 13.33. Consider $\mathbb{R}^2$ with the Euclidean norm (or any other $\ell^p$-norm). If $f : \mathbb{R} \to \mathbb{R}$ is a continuous function, then
\[
E = \left\{(x_1, x_2) \in \mathbb{R}^2 : x_2 < f(x_1)\right\}
\]
is an open subset of $\mathbb{R}^2$. If $f$ is discontinuous, then $E$ needn't be open. We leave the proofs as an exercise.
Example 13.34. If (X, d) is a metric space and A ⊂ X, then B ⊂ A is open in the metric subspace (A, dA ) if and only if B = A ∩ G where G is an open subset of
X. This is consistent with our previous definition of relatively open sets in A ⊂ R.
Open sets with respect to one metric on a set need not be open with respect to another metric. For example, every subset of R with the discrete metric is open, but this is not true of R with the absolute-value metric.
Consistent with our terminology, open balls are open and closed balls are closed.
Proposition 13.35. Let X be a metric space. If x ∈ X and r > 0, then the open ball Br(x) is open and the closed ball B̄r(x) is closed.
Proof. Suppose that y ∈ Br(x) where r > 0, and let ε = r − d(x, y) > 0. The triangle inequality implies that Bε(y) ⊂ Br(x), which proves that Br(x) is open. Similarly, if y ∈ B̄r(x)ᶜ and ε = d(x, y) − r > 0, then the triangle inequality implies that Bε(y) ⊂ B̄r(x)ᶜ, which proves that B̄r(x)ᶜ is open and B̄r(x) is closed.
The next theorem summarizes three basic properties of open sets.
Theorem 13.36. Let X be a metric space.
(1) The empty set ∅ and the whole set X are open.
(2) An arbitrary union of open sets is open.


13. Metric, Normed, and Topological Spaces

(3) A finite intersection of open sets is open.
Proof. Property (1) follows immediately from Definition 13.30. (The empty set satisfies the definition vacuously: since it has no points, every point has a neighborhood that is contained in the set.)
To prove (2), let {Gi ⊂ X : i ∈ I} be an arbitrary collection of open sets. If
x ∈ ⋃_{i∈I} Gi,
then x ∈ Gi for some i ∈ I. Since Gi is open, there exists r > 0 such that Br(x) ⊂ Gi, and then
Br(x) ⊂ ⋃_{i∈I} Gi.
Thus, the union ⋃_{i∈I} Gi is open.
To prove (3), let {G1, G2, . . . , Gn} be a finite collection of open sets. If
x ∈ ⋂_{i=1}^n Gi,
then x ∈ Gi for every 1 ≤ i ≤ n. Since Gi is open, there exists ri > 0 such that B_{r_i}(x) ⊂ Gi. Let r = min(r1, r2, . . . , rn) > 0. Then Br(x) ⊂ B_{r_i}(x) ⊂ Gi for every 1 ≤ i ≤ n, which implies that
Br(x) ⊂ ⋂_{i=1}^n Gi.
Thus, the finite intersection ⋂_{i=1}^n Gi is open.

The previous proof fails if we consider the intersection of infinitely many open sets {Gi : i ∈ I} because we may have inf{ri : i ∈ I} = 0 even though ri > 0 for every i ∈ I.
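A concrete instance in R (our own illustration, not from the text): the intervals G_n = (−1/n, 1/n) are open for every n, but their intersection over all n ∈ N is {0}, which is not open. The sketch below checks membership directly:

```python
# G_n = (-1/n, 1/n) is open for each n, with radius r_n = 1/n -> 0,
# so inf{r_n} = 0 and the intersection over all n collapses to {0}.
def in_G(n, x):
    return -1.0 / n < x < 1.0 / n

# Any x != 0 drops out of G_n once n > 1/|x| ...
x = 0.01
assert in_G(10, x) and not in_G(200, x)
# ... while 0 belongs to every G_n.
assert all(in_G(n, 0.0) for n in range(1, 1000))
```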
The properties of closed sets follow by taking complements of the corresponding properties of open sets and using De Morgan’s laws, exactly as in the proof of
Proposition 5.20.
Theorem 13.37. Let X be a metric space.
(1) The empty set ∅ and the whole set X are closed.
(2) An arbitrary intersection of closed sets is closed.
(3) A finite union of closed sets is closed.
The following relationships of points to sets are entirely analogous to the ones in Definition 5.22 for R.
Definition 13.38. Let X be a metric space and A ⊂ X.
(1) A point x ∈ A is an interior point of A if Br (x) ⊂ A for some r > 0.
(2) A point x ∈ A is an isolated point of A if Br (x) ∩ A = {x} for some r > 0, meaning that x is the only point of A that belongs to Br (x).
(3) A point x ∈ X is a boundary point of A if, for every r > 0, the ball Br (x) contains a point in A and a point not in A.


(4) A point x ∈ X is an accumulation point of A if, for every r > 0, the ball Br(x) contains a point y ∈ A such that y ≠ x.
A set is open if and only if every point is an interior point and closed if and only if every accumulation point belongs to the set.
We define the interior, boundary, and closure of a set as follows.
Definition 13.39. Let A be a subset of a metric space. The interior A◦ of A is the set of interior points of A. The boundary ∂A of A is the set of boundary points. The closure of A is Ā = A ∪ ∂A.
It follows that x ∈ Ā if and only if the ball Br(x) contains some point in A for every r > 0. The next proposition gives equivalent topological definitions.
Proposition 13.40. Let X be a metric space and A ⊂ X. The interior of A is the largest open set contained in A,
A◦ = ⋃ {G ⊂ A : G is open in X},
the closure of A is the smallest closed set that contains A,
Ā = ⋂ {F ⊃ A : F is closed in X},
and the boundary of A is their set-theoretic difference,
∂A = Ā \ A◦.
Proof. Let A1 denote the set of interior points of A, as in Definition 13.38, and let
A2 = ⋃ {G ⊂ A : G is open}.
If x ∈ A1, then there is an open neighborhood G ⊂ A of x, so G ⊂ A2 and x ∈ A2. It follows that A1 ⊂ A2. To get the opposite inclusion, note that A2 is open by Theorem 13.36. Thus, if x ∈ A2, then A2 ⊂ A is a neighborhood of x, so x ∈ A1 and A2 ⊂ A1. Therefore A1 = A2, which proves the result for the interior.
Next, Definition 13.38 and the previous result imply that
(Ā)ᶜ = (Aᶜ)◦ = ⋃ {G ⊂ Aᶜ : G is open}.
Using De Morgan's laws, and writing Gᶜ = F, we get that
Ā = (⋃ {G ⊂ Aᶜ : G is open})ᶜ = ⋂ {F ⊃ A : F is closed},
which proves the result for the closure.
Finally, if x ∈ ∂A, then x ∈ Ā = A ∪ ∂A, and no neighborhood of x is contained in A, so x ∉ A◦. It follows that x ∈ Ā \ A◦ and ∂A ⊂ Ā \ A◦. Conversely, if x ∈ Ā \ A◦, then every neighborhood of x contains points in A, since x ∈ Ā, and every neighborhood contains points in Aᶜ, since x ∉ A◦. It follows that x ∈ ∂A and Ā \ A◦ ⊂ ∂A, which completes the proof.
It follows from Theorem 13.36 and Theorem 13.37 that the interior A◦ is open, the closure Ā is closed, and the boundary ∂A is closed. Furthermore, A is open if and only if A = A◦, and A is closed if and only if A = Ā.
Let us illustrate these definitions with some examples, whose verification we leave as an exercise.


Example 13.41. Consider R with the absolute-value metric. If I = (a, b) and J = [a, b], then I◦ = J◦ = (a, b), Ī = J̄ = [a, b], and ∂I = ∂J = {a, b}. Note that I = I◦, meaning that I is open, and J = J̄, meaning that J is closed. If A = {1/n : n ∈ N}, then A◦ = ∅ and Ā = ∂A = A ∪ {0}. Thus, A is neither open (A ≠ A◦) nor closed (A ≠ Ā). If Q is the set of rational numbers, then Q◦ = ∅ and Q̄ = ∂Q = R. Thus, Q is neither open nor closed. Since Q̄ = R, we say that Q is dense in R.
Example 13.42. Let A be the open unit ball in R² with the Euclidean metric,
A = {(x, y) ∈ R² : x² + y² < 1}.
Then A◦ = A, the closure of A is the closed unit ball
Ā = {(x, y) ∈ R² : x² + y² ≤ 1},
and the boundary of A is the unit circle
∂A = {(x, y) ∈ R² : x² + y² = 1}.
Example 13.43. Let A be the open unit ball with the x-axis deleted, in R² with the Euclidean metric,
A = {(x, y) ∈ R² : x² + y² < 1, y ≠ 0}.
Then A◦ = A, the closure of A is the closed unit ball
Ā = {(x, y) ∈ R² : x² + y² ≤ 1},
and the boundary of A consists of the unit circle and the x-axis segment,
∂A = {(x, y) ∈ R² : x² + y² = 1} ∪ {(x, 0) ∈ R² : |x| ≤ 1}.
Example 13.44. Suppose that X is a set containing at least two elements with the discrete metric defined in Example 13.3. If x ∈ X, then the open unit ball is B₁(x) = {x}, and its closure is again {x}, since {x} is closed. On the other hand, the closed unit ball is B̄₁(x) = X. Thus, in a general metric space, the closure of an open ball of radius r > 0 need not be the closed ball of radius r.

13.4. Completeness, compactness, and continuity
A sequence (xn ) in a set X is a function f : N → X, where xn = f (n) is the nth term in the sequence.
Definition 13.45. Let (X, d) be a metric space. A sequence (xn) in X converges to x ∈ X, written xn → x as n → ∞ or
lim_{n→∞} xn = x,
if for every ε > 0 there exists N ∈ N such that n > N implies that d(xn, x) < ε.
That is, xn → x in X if d(xn, x) → 0 in R. Equivalently, xn → x as n → ∞ if for every neighborhood U of x there exists N ∈ N such that xn ∈ U for all n > N.


Example 13.46. If d is the discrete metric on a set X, then a sequence (xn ) converges in (X, d) if and only if it is eventually constant. That is, there exists x ∈ X and N ∈ N such that xn = x for all n > N ; and, in that case, the sequence converges to x.
Example 13.47. For R with its standard absolute-value metric, Definition 13.45 is the definition of the convergence of a real sequence.
As for subsets of R, we can give a sequential characterization of closed sets in a metric space.
Theorem 13.48. A subset F ⊂ X of a metric space X is closed if and only if the limit of every convergent sequence in F belongs to F .
Proof. First suppose that F is closed, meaning that Fᶜ is open. If (xn) is a sequence in F and x ∈ Fᶜ, then there is a neighborhood U ⊂ Fᶜ of x which contains no terms of the sequence, so (xn) cannot converge to x. Thus, the limit of every convergent sequence in F belongs to F.
Conversely, suppose that F is not closed. Then F c is not open, and there exists a point x ∈ F c such that every neighborhood of x contains points in F . Choose xn ∈ F such that xn ∈ B1/n (x). Then (xn ) is a sequence in F whose limit x does not belong to F , which proves the result.
We define the completeness of metric spaces in terms of Cauchy sequences.
Definition 13.49. Let (X, d) be a metric space. A sequence (xn) in X is a Cauchy sequence if for every ε > 0 there exists N ∈ N such that m, n > N implies that d(xm, xn) < ε.
Every convergent sequence is Cauchy: if xn → x, then given ε > 0 there exists N such that d(xn, x) < ε/2 for all n > N, and then for all m, n > N we have
d(xm, xn) ≤ d(xm, x) + d(xn, x) < ε.
Complete spaces are ones in which the converse is also true.
Definition 13.50. A metric space is complete if every Cauchy sequence converges.
Example 13.51. If d is the discrete metric on a set X, then (X, d) is a complete metric space since every Cauchy sequence is eventually constant.
Example 13.52. The space (R, | · |) is complete, but the metric subspace (Q, | · |) is not complete.
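The incompleteness of Q can be made concrete with exact rational arithmetic (our own illustration, not from the text): the Newton iterates for √2 form a Cauchy sequence of rationals with no rational limit.

```python
from fractions import Fraction

# Newton iteration x_{k+1} = (x_k + 2/x_k)/2 stays in Q but converges
# (in R) to the irrational number sqrt(2).
x = Fraction(2)
iterates = [x]
for _ in range(6):
    x = (x + 2 / x) / 2
    iterates.append(x)

# The sequence is Cauchy: consecutive gaps shrink rapidly ...
diffs = [abs(iterates[k + 1] - iterates[k]) for k in range(6)]
assert all(diffs[k + 1] < diffs[k] for k in range(5))
# ... but no iterate (and no rational at all) squares exactly to 2.
assert abs(float(x) ** 2 - 2.0) < 1e-12
assert all(y ** 2 != 2 for y in iterates)
```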
In a complete space, we have the following simple criterion for the completeness of a subspace.
Proposition 13.53. A subspace (A, dA ) of a complete metric space (X, d) is complete if and only if A is closed in X.
Proof. If A is a closed subspace of a complete space X and (xn ) is a Cauchy sequence in A, then (xn ) is a Cauchy sequence in X, so it converges to x ∈ X.
Since A is closed, x ∈ A, which shows that A is complete.


Conversely, if A is not closed, then by Theorem 13.48 there is a convergent sequence in A whose limit does not belong to A. Since it converges, the sequence is Cauchy, but it doesn't have a limit in A, so A is not complete.
The most important complete metric spaces in analysis are the complete normed spaces, or Banach spaces.
Definition 13.54. A Banach space is a complete normed vector space.
For example, R with the absolute-value norm is a one-dimensional Banach space. Furthermore, it follows from the completeness of R that every finite-dimensional normed vector space over R is complete. We prove this for the ℓ^p-norms given in Definition 13.25.
Theorem 13.55. Let 1 ≤ p ≤ ∞. The vector space Rⁿ with the ℓ^p-norm is a Banach space.
Proof. Suppose that (x_k)_{k=1}^∞ is a sequence of points
x_k = (x_{1,k}, x_{2,k}, . . . , x_{n,k})
in Rⁿ that is Cauchy with respect to the ℓ^p-norm. From Theorem 13.28,
|x_{i,j} − x_{i,k}| ≤ ‖x_j − x_k‖_p,
so each coordinate sequence (x_{i,k})_{k=1}^∞ is Cauchy in R. The completeness of R implies that x_{i,k} → x_i as k → ∞ for some x_i ∈ R. Let x = (x_1, x_2, . . . , x_n). Then, from Theorem 13.28 again,
‖x_k − x‖_p ≤ C max {|x_{i,k} − x_i| : i = 1, 2, . . . , n},
where C = n^{1/p} if 1 ≤ p < ∞ or C = 1 if p = ∞. Given ε > 0, choose N_i ∈ N such that |x_{i,k} − x_i| < ε/C for all k > N_i, and let N = max{N_1, N_2, . . . , N_n}. Then k > N implies that ‖x_k − x‖_p < ε, which proves that x_k → x with respect to the ℓ^p-norm. Thus, (Rⁿ, ‖·‖_p) is complete.
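The two inequalities driving the proof of Theorem 13.55 — each coordinate gap is at most the ℓ^p distance, and the ℓ^p distance is at most C = n^{1/p} times the largest coordinate gap — can be spot-checked numerically (our own sketch, for p = 2):

```python
def lp_norm(x, p):
    """The l^p norm for finite p."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

x = [0.3, -1.2, 0.7]
y = [0.1, -1.0, 0.9]
n, p = len(x), 2
diff = [a - b for a, b in zip(x, y)]
coord_gap = max(abs(d) for d in diff)

# Each coordinate difference is controlled by the l^p distance ...
assert all(abs(d) <= lp_norm(diff, p) + 1e-12 for d in diff)
# ... and the l^p distance by C = n^(1/p) times the largest coordinate gap.
assert lp_norm(diff, p) <= n ** (1.0 / p) * coord_gap + 1e-12
```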
The Bolzano-Weierstrass property provides a sequential definition of compactness in a general metric space.
Definition 13.56. A subset K ⊂ X of a metric space X is sequentially compact, or compact for short, if every sequence in K has a convergent subsequence whose limit belongs to K.
Explicitly, this condition means that if (xn ) is a sequence of points xn ∈ K then there is a subsequence (xnk ) such that xnk → x as k → ∞ where x ∈ K.
Compactness is an intrinsic property of a subset: K ⊂ X is compact if and only if the metric subspace (K, dK ) is compact.
Although this definition is similar to the one for compact sets in R, there is a significant difference between compact sets in a general metric space and in R.
Every compact subset of a metric space is closed and bounded, as in R, but it is not always true that a closed, bounded set is compact.
First, as the following example illustrates, a set must be complete, not just closed, to be compact. (A closed subset of R is complete because R is complete.)


Example 13.57. Consider the metric space Q with the absolute-value metric. The set [0, 2] ∩ Q is a closed, bounded subspace, but it is not compact, since a sequence of rational numbers that converges in R to an irrational number such as √2 has no convergent subsequence in Q.
Second, completeness and boundedness are not enough, in general, to imply compactness.
Example 13.58. Consider N, or any other infinite set, with the discrete metric
d(m, n) = 0 if m = n,  1 if m ≠ n.
Then N is complete and bounded with respect to this metric. However, it is not compact since xn = n is a sequence with no convergent subsequence, as is clear from Example 13.46.
The correct generalization to an arbitrary metric space of the characterization of compact sets in R as closed and bounded replaces “closed” with “complete” and
“bounded” with “totally bounded,” which is defined as follows.
Definition 13.59. Let (X, d) be a metric space. A subset A ⊂ X is totally bounded if for every ε > 0 there exists a finite set {x_1, x_2, . . . , x_n} of points in X such that
A ⊂ ⋃_{i=1}^n Bε(x_i).
The proof of the following result is then completely analogous to the proof of the Bolzano-Weierstrass theorem in Theorem 3.57 for R.
Theorem 13.60. A subset K ⊂ X of a metric space X is sequentially compact if and only if it is complete and totally bounded.
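For instance, [0, 1] is totally bounded: for any ε > 0, finitely many intervals of radius ε centered at ε, 3ε, 5ε, … cover it. The following sketch verifies such a cover on a fine grid (the helper function is ours, not from the text):

```python
import math

def finite_cover_centers(eps):
    """Centers of ceil(1/(2*eps)) open eps-intervals covering [0, 1] (our helper)."""
    n = math.ceil(1.0 / (2 * eps))
    return [(2 * k + 1) * eps for k in range(n)]

eps = 0.03
centers = finite_cover_centers(eps)
samples = [i / 1000.0 for i in range(1001)]
# Every sample point of [0, 1] lies within eps of some center.
assert all(any(abs(x - c) < eps + 1e-12 for c in centers) for x in samples)
assert len(centers) == math.ceil(1.0 / (2 * eps))  # finitely many balls
```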
The definition of the continuity of functions between metric spaces parallels the definitions for real functions.
Definition 13.61. Let (X, dX) and (Y, dY) be metric spaces. A function f : X → Y is continuous at c ∈ X if for every ε > 0 there exists δ > 0 such that dX(x, c) < δ implies that dY(f(x), f(c)) < ε.
The function is continuous on X if it is continuous at every point of X.
Example 13.62. A function f : R² → R, where R² is equipped with the Euclidean norm ‖·‖ and R with the absolute-value norm |·|, is continuous at c ∈ R² if ‖x − c‖ < δ implies that |f(x) − f(c)| < ε. Explicitly, if x = (x1, x2) and c = (c1, c2), this condition reads:
√((x1 − c1)² + (x2 − c2)²) < δ implies that |f(x1, x2) − f(c1, c2)| < ε.


Example 13.63. A function f : R → R², where R² is equipped with the Euclidean norm ‖·‖ and R with the absolute-value norm |·|, is continuous at c ∈ R if |x − c| < δ implies that ‖f(x) − f(c)‖ < ε. Explicitly, if f(x) = (f1(x), f2(x)), where f1, f2 : R → R, this condition reads: |x − c| < δ implies that
√([f1(x) − f1(c)]² + [f2(x) − f2(c)]²) < ε.
The previous examples generalize in a natural way to define the continuity of an m-component vector-valued function of n variables, f : Rⁿ → Rᵐ. The definition looks complicated if it is written out explicitly, but it is much clearer if it is expressed in terms of metrics or norms.
Example 13.64. Define F : C([0, 1]) → R by
F (f ) = f (0), where C([0, 1]) is the space of continuous functions f : [0, 1] → R equipped with the sup-norm described in Example 13.9, and R has the absolute value norm. That is,
F evaluates a function f(x) at x = 0. Thus F is a function acting on functions, and its values are scalars; such a function, which maps functions to scalars, is called a functional. Then F is continuous, since ‖f − g‖_∞ < ε implies that |f(0) − g(0)| < ε.
(That is, we may take δ = ε.)
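This continuity estimate is easy to test numerically (our own sketch, not from the text; functions are ordinary Python callables and the sup-norm is approximated on a grid):

```python
# |F(f) - F(g)| = |f(0) - g(0)| <= sup|f - g|, so delta = eps works.
def F(f):
    return f(0.0)

f = lambda x: x ** 2 + 1.0
g = lambda x: x ** 2 + 1.0 + 0.001 * (1.0 - x)

grid = [i / 1000.0 for i in range(1001)]
sup_dist = max(abs(f(x) - g(x)) for x in grid)  # approximate ||f - g||_inf
assert abs(F(f) - F(g)) <= sup_dist + 1e-12
```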
We also have a sequential characterization of continuity in a metric space.
Theorem 13.65. Let X and Y be metric spaces. A function f : X → Y is continuous at c ∈ X if and only if f(xn) → f(c) as n → ∞ for every sequence (xn) in X such that xn → c as n → ∞.
We define uniform continuity similarly to the real case.
Definition 13.66. Let (X, dX) and (Y, dY) be metric spaces. A function f : X → Y is uniformly continuous on X if for every ε > 0 there exists δ > 0 such that dX(x, y) < δ implies that dY(f(x), f(y)) < ε.
The proofs of the following theorems are identical to the proofs we gave for functions f : A ⊂ R → R. First, a function on a metric space is continuous if and only if the inverse images of open sets are open.
Theorem 13.67. A function f : X → Y between metric spaces X and Y is continuous on X if and only if f −1 (V ) is open in X for every open set V in Y .
Second, the continuous image of a compact set is compact.
Theorem 13.68. Let f : K → Y be a continuous function from a compact metric space K to a metric space Y . Then f (K) is a compact subspace of Y .
Third, a continuous function on a compact set is uniformly continuous.
Theorem 13.69. If f : K → Y is a continuous function on a compact set K, then f is uniformly continuous.


13.5. Topological spaces
A collection of subsets of a set X with the properties of the open sets in a metric space given in Theorem 13.36 is called a topology on X, and a set with such a collection of open sets is called a topological space.
Definition 13.70. Let X be a set. A collection T ⊂ P(X) of subsets of X is a topology on X if it satisfies the following conditions.
(1) The empty set ∅ and the whole set X belong to T .
(2) The union of an arbitrary collection of sets in T belongs to T .
(3) The intersection of a finite collection of sets in T belongs to T .
A set G ⊂ X is open with respect to T if G ∈ T , and a set F ⊂ X is closed with respect to T if F c ∈ T . A topological space (X, T ) is a set X together with a topology T on X.
We can put different topologies on a set with two or more elements. If the topology on X is clear from the context, then we simply refer to X as a topological space and we don’t specify the topology when we refer to open or closed sets.
Every metric space with the open sets in Definition 13.30 is a topological space; the resulting collection of open sets is called the metric topology of the metric space. There are, however, topological spaces whose topology is not derived from any metric on the space.
Example 13.71. Let X be any set. Then T = P(X) is a topology on X, called the discrete topology. In this topology, every set is both open and closed. This topology is the metric topology associated with the discrete metric on X in Example 13.3.
Example 13.72. Let X be any set. Then T = {∅, X} is a topology on X, called the trivial topology. The empty set and the whole set are both open and closed, and no other subsets of X are either open or closed.
If X has at least two elements, then this topology is different from the discrete topology in the previous example, and it is not derived from a metric. To see this, suppose that x, y ∈ X and x ≠ y. If d : X × X → R is a metric on X, then d(x, y) = r > 0 and Br(x) is a nonempty open set in the metric topology that doesn't contain y, so Br(x) ∉ T.
The previous example illustrates a separation property of metric topologies that need not be satisfied by non-metric topologies.
Definition 13.73. A topological space (X, T ) is Hausdorff if for every x, y ∈ X with x = y there exist open sets U, V ∈ T such that x ∈ U , y ∈ V and U ∩ V = ∅.
That is, a topological space is Hausdorff if distinct points have disjoint neighborhoods. In that case, we also say that the topology is Hausdorff. Nearly all topological spaces that arise in analysis are Hausdorff, including, in particular, metric spaces.
Proposition 13.74. Every metric topology is Hausdorff.


Proof. Let (X, d) be a metric space. If x, y ∈ X and x ≠ y, then d(x, y) = r > 0, and B_{r/2}(x), B_{r/2}(y) are disjoint open neighborhoods of x and y.
Compact sets are defined topologically as sets with the Heine-Borel property.
Definition 13.75. Let X be a topological space. A set K ⊂ X is compact if every open cover of K has a finite subcover. That is, if {Gi : i ∈ I} is a collection of open sets such that
K ⊂ ⋃_{i∈I} Gi,
then there is a finite subcollection {G_{i_1}, G_{i_2}, . . . , G_{i_n}} such that
K ⊂ ⋃_{k=1}^n G_{i_k}.
The Heine-Borel and Bolzano-Weierstrass properties are equivalent in every metric space.
Theorem 13.76. A metric space is compact if and only if it is sequentially compact.
We won’t prove this result here, but we remark that it does not hold for general topological spaces.
Finally, we give the topological definitions of convergence, continuity, and connectedness which are essentially the same as the corresponding statements for R.
We also show that continuous maps preserve compactness and connectedness.
The definition of the convergence of a sequence is identical to the statement in
Proposition 5.9 for R.
Definition 13.77. Let X be a topological space. A sequence (xn ) in X converges to x ∈ X if for every neighborhood U of x there exists N ∈ N such that xn ∈ U for every n > N .
The following definition of continuity in a topological space corresponds to
Definition 7.2 for R (with the relative absolute-value topology on the domain A of f ) and Theorem 7.31.
Definition 13.78. Let f : X → Y be a function between topological spaces X, Y .
Then f is continuous at x ∈ X if for every neighborhood V ⊂ Y of f (x), there exists a neighborhood U ⊂ X of x such that f (U ) ⊂ V . The function f is continuous on
X if f −1 (V ) is open in X for every open set V ⊂ Y .
These definitions are equivalent to the corresponding “ -δ” definitions in a metric space, but they make sense in a general topological space because they refer only to neighborhoods and open sets. We illustrate them with two simple examples.
Example 13.79. If X is a set with the discrete topology in Example 13.71, then a sequence converges to x ∈ X if and only if its terms are eventually equal to x, since
{x} is a neighborhood of x. Every function f : X → Y is continuous with respect to the discrete topology on X, since every subset of X is open. On the other hand, if Y has the discrete topology, then f : X → Y is continuous if and only if f −1 ({y}) is open in X for every y ∈ Y .


Example 13.80. Let X be a set with the trivial topology in Example 13.72. Then every sequence converges to every point x ∈ X, since the only neighborhood of x is
X itself. As this example illustrates, non-Hausdorff topologies have the unpleasant feature that limits need not be unique, which is one reason why they rarely arise in analysis. If Y has the trivial topology, then every function X → Y is continuous, since f −1 (∅) = ∅ and f −1 (Y ) = X are open in X. On the other hand, if X has the trivial topology and Y is Hausdorff, then the only continuous functions f : X → Y are the constant functions.
Our last definition of a connected topological space corresponds to Definition 5.58 for connected sets of real numbers (with the relative topology).
Definition 13.81. A topological space X is disconnected if there exist nonempty, disjoint open sets U , V such that X = U ∪ V . A topological space is connected if it is not disconnected.
The following proofs that continuous functions map compact sets to compact sets and connected sets to connected sets are the same as the proofs given in Theorem 7.35 and Theorem 7.32 for sets of real numbers. Note that a continuous function maps compact or connected sets in the opposite direction to open or closed sets, whose inverse images are open or closed.
Theorem 13.82. Suppose that f : X → Y is a continuous map between topological spaces X and Y . Then f (X) is compact if X is compact, and f (X) is connected if X is connected.
Proof. For the first part, suppose that X is compact. If {Vi : i ∈ I} is an open cover of f(X), then since f is continuous, {f⁻¹(Vi) : i ∈ I} is an open cover of X, and since X is compact there is a finite subcover
{f⁻¹(V_{i_1}), f⁻¹(V_{i_2}), . . . , f⁻¹(V_{i_n})}.
It follows that
{V_{i_1}, V_{i_2}, . . . , V_{i_n}}
is a finite subcover of the original open cover of f(X), which proves that f(X) is compact.
For the second part, suppose that f(X) is disconnected. Then there exist nonempty, disjoint open sets U, V in Y such that U ∪ V ⊃ f(X). Since f is continuous, f⁻¹(U), f⁻¹(V) are open, nonempty, disjoint sets such that
X = f⁻¹(U) ∪ f⁻¹(V),
so X is disconnected. It follows that f(X) is connected if X is connected.

13.6. * Function spaces
There are many function spaces, and their study is a central topic in analysis.
We discuss only one important example here: the space of continuous functions on a compact set equipped with the sup norm. We repeat its definition from
Example 13.9.


Definition 13.83. Let K ⊂ R be a compact set. The space C(K) consists of the continuous functions f : K → R. Addition and scalar multiplication of functions are defined pointwise in the usual way: if f, g ∈ C(K) and k ∈ R, then
(f + g)(x) = f (x) + g(x),

(kf )(x) = k (f (x)) .

The sup-norm of a function f ∈ C(K) is defined by
‖f‖_∞ = sup_{x∈K} |f(x)|.
Since a continuous function on a compact set attains its maximum and minimum values, for f ∈ C(K) we can also write
‖f‖_∞ = max_{x∈K} |f(x)|.
Thus, the sup-norm on C(K) is analogous to the ℓ^∞-norm on Rⁿ. In fact, if K = {1, 2, . . . , n} is a finite set with the discrete topology, then it is identical to the ℓ^∞-norm.
Our previous results on continuous functions on a compact set can be formulated concisely in terms of this space. The following characterization of uniform convergence in terms of the sup-norm is easily seen to be equivalent to Definition 9.8.
Definition 13.84. A sequence (fn) of functions fn : K → R converges uniformly on K to a function f : K → R if
lim_{n→∞} ‖fn − f‖_∞ = 0.

Similarly, we can rephrase Definition 9.12 for a uniformly Cauchy sequence in terms of the sup-norm.
Definition 13.85. A sequence (fn) of functions fn : K → R is uniformly Cauchy on K if for every ε > 0 there exists N ∈ N such that m, n > N implies that
‖fm − fn‖_∞ < ε.

Thus, the uniform convergence of a sequence of functions is defined in exactly the same way as the convergence of a sequence of real numbers, with the absolute value |·| replaced by the sup-norm ‖·‖_∞. Moreover, like R, the space C(K) is complete.
Theorem 13.86. The space C(K) with the sup-norm ‖·‖_∞ is a Banach space.

Proof. From Theorem 7.15, the sum of continuous functions and the scalar multiple of a continuous function are continuous, so C(K) is closed under addition and scalar multiplication. The algebraic vector-space properties for C(K) follow immediately from those of R.
From Theorem 7.37, a continuous function on a compact set is bounded, so ‖·‖_∞ is well-defined on C(K). The sup-norm is clearly non-negative, and ‖f‖_∞ = 0 implies that f(x) = 0 for every x ∈ K, meaning that f = 0 is the zero function.


We also have, for all f, g ∈ C(K) and k ∈ R, that
‖kf‖_∞ = sup_{x∈K} |kf(x)| = |k| sup_{x∈K} |f(x)| = |k| ‖f‖_∞,
‖f + g‖_∞ = sup_{x∈K} |f(x) + g(x)|
≤ sup_{x∈K} {|f(x)| + |g(x)|}
≤ sup_{x∈K} |f(x)| + sup_{x∈K} |g(x)|
= ‖f‖_∞ + ‖g‖_∞,
which verifies the properties of a norm.
Finally, Theorem 9.13 implies that a uniformly Cauchy sequence of continuous functions converges uniformly to a continuous function, so C(K) is complete.
For comparison with the sup-norm, we consider a different norm on C([a, b]) called the one-norm, which is analogous to the ℓ¹-norm on Rⁿ.
Definition 13.87. If f : [a, b] → R is a Riemann integrable function, then the one-norm of f is
‖f‖_1 = ∫_a^b |f(x)| dx.
Theorem 13.88. The space C([a, b]) of continuous functions f : [a, b] → R with the one-norm ‖·‖_1 is a normed space.
Proof. As shown in Theorem 13.86, C([a, b]) is a vector space. Every continuous function is Riemann integrable on a compact interval, so ‖·‖_1 : C([a, b]) → R is well-defined, and we just have to verify that it satisfies the properties of a norm.
Since |f| ≥ 0, we have ‖f‖_1 = ∫_a^b |f| ≥ 0. Furthermore, since f is continuous, Proposition 11.42 shows that ‖f‖_1 = 0 implies that f = 0, which verifies the positivity. If k ∈ R, then
‖kf‖_1 = ∫_a^b |kf| = |k| ∫_a^b |f| = |k| ‖f‖_1,
which verifies the homogeneity. Finally, the triangle inequality is satisfied since
‖f + g‖_1 = ∫_a^b |f + g| ≤ ∫_a^b (|f| + |g|) = ∫_a^b |f| + ∫_a^b |g| = ‖f‖_1 + ‖g‖_1.

Although C([a, b]) equipped with the one-norm ‖·‖_1 is a normed space, it is not complete, and therefore it is not a Banach space. The following example gives a non-convergent Cauchy sequence in this space.
Example 13.89. Define the continuous functions fn : [0, 1] → R by
fn(x) = 0 if 0 ≤ x ≤ 1/2,  n(x − 1/2) if 1/2 < x < 1/2 + 1/n,  1 if 1/2 + 1/n ≤ x ≤ 1.


If n > m, we have
‖fn − fm‖_1 = ∫_{1/2}^{1/2+1/m} |fn − fm| ≤ 1/m,
since |fn − fm| ≤ 1. Thus, ‖fn − fm‖_1 < ε for all m, n > 1/ε, so (fn) is a Cauchy sequence with respect to the one-norm.
We claim that if ‖f − fn‖_1 → 0 as n → ∞ where f ∈ C([0, 1]), then f would have to be
f(x) = 0 if 0 ≤ x ≤ 1/2,  1 if 1/2 < x ≤ 1,
which is discontinuous at 1/2, so (fn) does not have a limit in (C([0, 1]), ‖·‖_1).
To prove the claim, note that if ‖f − fn‖_1 → 0, then ∫_0^{1/2} |f| = 0 since
∫_0^{1/2} |f| = ∫_0^{1/2} |f − fn| ≤ ∫_0^1 |f − fn| → 0,
and Proposition 11.42 implies that f(x) = 0 for 0 ≤ x ≤ 1/2. Similarly, for every 0 < ε < 1/2, we get that ∫_{1/2+ε}^1 |f − 1| = 0, so f(x) = 1 for 1/2 < x ≤ 1.
The sequence (fn) is not uniformly Cauchy, since ‖fn − fm‖_∞ → 1 as n → ∞ for every m ∈ N, so this example does not contradict the completeness of (C([0, 1]), ‖·‖_∞).
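The bound ‖fn − fm‖_1 ≤ 1/m can be confirmed by numerical quadrature (our own sketch, not from the text; the midpoint rule makes the values approximate):

```python
def f(n, x):
    """The ramp functions of Example 13.89."""
    if x <= 0.5:
        return 0.0
    if x < 0.5 + 1.0 / n:
        return n * (x - 0.5)
    return 1.0

def one_norm_dist(n, m, steps=20000):
    """Midpoint-rule approximation of ||f_n - f_m||_1 on [0, 1]."""
    h = 1.0 / steps
    return sum(abs(f(n, (i + 0.5) * h) - f(m, (i + 0.5) * h))
               for i in range(steps)) * h

d = one_norm_dist(1000, 100)
assert d <= 1.0 / 100 + 1e-6  # the bound 1/m from the text
assert d > 0.0                # but the functions differ on (1/2, 1/2 + 1/m)
```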
The ℓ^∞-norm and the ℓ¹-norm on the finite-dimensional space Rⁿ are equivalent, but the sup-norm and the one-norm on C([a, b]) are not. In one direction, we have
∫_a^b |f| ≤ (b − a) · sup_{[a,b]} |f|,
so ‖f‖_1 ≤ (b − a) ‖f‖_∞, and ‖f − fn‖_∞ → 0 implies that ‖f − fn‖_1 → 0. As the following example shows, the converse is not true. There is no constant M such that ‖f‖_∞ ≤ M ‖f‖_1 for all f ∈ C([a, b]), and ‖f − fn‖_1 → 0 does not imply that ‖f − fn‖_∞ → 0.
Example 13.90. For n ∈ N, define the continuous function fn : [0, 1] → R by
fn(x) = 1 − nx if 0 ≤ x ≤ 1/n,  0 if 1/n < x ≤ 1.
Then ‖fn‖_∞ = 1 for every n ∈ N, but
‖fn‖_1 = ∫_0^{1/n} (1 − nx) dx = [x − nx²/2]_0^{1/n} = 1/(2n),
so ‖fn‖_1 → 0 as n → ∞.
Thus, unlike the finite-dimensional vector space Rⁿ, an infinite-dimensional vector space such as C([a, b]) has many inequivalent norms and many inequivalent notions of convergence.
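The spike functions of Example 13.90 can be checked numerically as well (our own sketch, not from the text; the one-norm is approximated by the midpoint rule):

```python
def f(n, x):
    """The spike functions of Example 13.90."""
    return 1.0 - n * x if x <= 1.0 / n else 0.0

def sup_norm(n, steps=10000):
    return max(abs(f(n, i / steps)) for i in range(steps + 1))

def one_norm(n, steps=100000):
    h = 1.0 / steps
    return sum(abs(f(n, (i + 0.5) * h)) for i in range(steps)) * h

for n in (1, 10, 100):
    assert abs(sup_norm(n) - 1.0) < 1e-9           # ||f_n||_inf = 1 for all n
    assert abs(one_norm(n) - 1.0 / (2 * n)) < 1e-3  # ||f_n||_1 = 1/(2n) -> 0
```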
The incompleteness of C([a, b]) with respect to the one-norm suggests that we use the larger space R([a, b]) of Riemann integrable functions on [a, b], which


includes some discontinuous functions. A slight complication arises from the fact that if f is Riemann integrable and ∫_a^b |f| = 0, then it does not follow that f = 0, so ‖f‖_1 = 0 does not imply that f = 0. Thus, ‖·‖_1 is not, strictly speaking, a norm on R([a, b]). We can, however, get a normed space of equivalence classes of Riemann integrable functions, by defining f, g ∈ R([a, b]) to be equivalent if ∫_a^b |f − g| = 0.
For instance, the function in Example 11.14 is equivalent to the zero function.
A much more fundamental defect of the space of (equivalence classes of) Riemann integrable functions with the one-norm is that it is still not complete. To get a space that is complete with respect to the one-norm, we have to use the space L¹([a, b]) of (equivalence classes of) Lebesgue integrable functions on [a, b]. This is another reason for the superiority of the Lebesgue integral over the Riemann integral: it leads to function spaces that are complete with respect to integral norms.
The inclusion of the smaller incomplete space C([a, b]) of continuous functions in the larger complete space L1 ([a, b]) of Lebesgue integrable functions is analogous to the inclusion of the incomplete space Q of rational numbers in the complete space R of real numbers.

13.7. * The Minkowski inequality
Inequalities are essential to analysis. Their proofs, however, may require considerable ingenuity, and there are often many different ways to prove the same inequality.
In this section, we complete the proof that the $\ell^p$-spaces are normed spaces by proving the triangle inequality given in Definition 13.25. This inequality is called the Minkowski inequality, and it is one of the most important inequalities in mathematics.
The simplest case is for the Euclidean norm with p = 2. We begin by proving the following fundamental Cauchy-Schwarz inequality.
Theorem 13.91 (Cauchy-Schwarz inequality). If $(x_1, x_2, \dots, x_n)$, $(y_1, y_2, \dots, y_n)$ are points in $\mathbb{R}^n$, then
$$\sum_{i=1}^{n} x_i y_i \le \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2} \left( \sum_{i=1}^{n} y_i^2 \right)^{1/2}.$$
Proof. Since $\left| \sum x_i y_i \right| \le \sum |x_i|\,|y_i|$, it is sufficient to prove the inequality for $x_i, y_i \ge 0$. Furthermore, the inequality is obvious if $x = 0$ or $y = 0$, so we assume that at least one $x_i$ and one $y_i$ is nonzero.
For every $\alpha, \beta \in \mathbb{R}$, we have
$$0 \le \sum_{i=1}^{n} (\alpha x_i - \beta y_i)^2.$$
Expanding the square on the right-hand side and rearranging the terms, we get that
$$2\alpha\beta \sum_{i=1}^{n} x_i y_i \le \alpha^2 \sum_{i=1}^{n} x_i^2 + \beta^2 \sum_{i=1}^{n} y_i^2.$$
We choose $\alpha, \beta > 0$ to "balance" the terms on the right-hand side,
$$\alpha = \left( \sum_{i=1}^{n} y_i^2 \right)^{1/2}, \qquad \beta = \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2}.$$
Then division of the resulting inequality by $2\alpha\beta$ proves the theorem.
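As an informal numerical sanity check (an editorial addition, not part of the original text), the Cauchy-Schwarz inequality can be tested on arbitrary vectors. The helper name `cauchy_schwarz_holds` and the floating-point tolerance are our own choices.

```python
import math
import random

def cauchy_schwarz_holds(x, y):
    """Check sum(x_i y_i) <= (sum x_i^2)^(1/2) * (sum y_i^2)^(1/2)."""
    lhs = sum(a * b for a, b in zip(x, y))
    rhs = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return lhs <= rhs + 1e-12  # small tolerance for floating-point rounding

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(5)]
    y = [random.uniform(-10, 10) for _ in range(5)]
    assert cauchy_schwarz_holds(x, y)

# Equality holds when y is a nonnegative scalar multiple of x:
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
lhs = sum(a * b for a, b in zip(x, y))
rhs = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
assert math.isclose(lhs, rhs)
```

The equality case mirrors the proof: with $y = cx$, choosing $\alpha, \beta$ as above makes $\alpha x_i - \beta y_i = 0$ for every $i$.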
The Minkowski inequality for p = 2 is an immediate consequence of the Cauchy-Schwarz inequality.
Corollary 13.92 (Minkowski inequality). If $(x_1, x_2, \dots, x_n)$ and $(y_1, y_2, \dots, y_n)$ are points in $\mathbb{R}^n$, then
$$\left( \sum_{i=1}^{n} (x_i + y_i)^2 \right)^{1/2} \le \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2} + \left( \sum_{i=1}^{n} y_i^2 \right)^{1/2}.$$

Proof. Expanding the square in the following equation and using the Cauchy-Schwarz inequality, we get
$$\begin{aligned}
\sum_{i=1}^{n} (x_i + y_i)^2 &= \sum_{i=1}^{n} x_i^2 + 2 \sum_{i=1}^{n} x_i y_i + \sum_{i=1}^{n} y_i^2 \\
&\le \sum_{i=1}^{n} x_i^2 + 2 \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2} \left( \sum_{i=1}^{n} y_i^2 \right)^{1/2} + \sum_{i=1}^{n} y_i^2 \\
&= \left[ \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2} + \left( \sum_{i=1}^{n} y_i^2 \right)^{1/2} \right]^2.
\end{aligned}$$
Taking the square root of this inequality, we obtain the result.
To prove the Minkowski inequality for general $1 < p < \infty$, we first define the Hölder conjugate $p'$ of $p$ and prove Young's inequality.

Definition 13.93. If $1 < p < \infty$, then the Hölder conjugate $1 < p' < \infty$ of $p$ is the number such that
$$\frac{1}{p} + \frac{1}{p'} = 1.$$
If $p = 1$, then $p' = \infty$; and if $p = \infty$, then $p' = 1$.

The Hölder conjugate of $1 < p < \infty$ is given explicitly by
$$p' = \frac{p}{p - 1}.$$
Note that if $1 < p < 2$, then $2 < p' < \infty$; and if $2 < p < \infty$, then $1 < p' < 2$. The number 2 is its own Hölder conjugate. Furthermore, if $p'$ is the Hölder conjugate of $p$, then $p$ is the Hölder conjugate of $p'$.

Theorem 13.94 (Young's inequality). Suppose that $1 < p < \infty$ and $1 < p' < \infty$ is its Hölder conjugate. If $a, b \ge 0$ are nonnegative real numbers, then
$$ab \le \frac{a^p}{p} + \frac{b^{p'}}{p'}.$$
Moreover, there is equality if and only if $a^p = b^{p'}$.
Proof. There are several ways to prove this inequality. We give a proof based on calculus. The result is trivial if $a = 0$ or $b = 0$, so suppose that $a, b > 0$. We write
$$\frac{a^p}{p} + \frac{b^{p'}}{p'} - ab = b^{p'} \left( \frac{1}{p} \frac{a^p}{b^{p'}} + \frac{1}{p'} - \frac{a}{b^{p'-1}} \right).$$
The definition of $p'$ implies that $p'/p = p' - 1$, so that
$$\frac{a^p}{b^{p'}} = \left( \frac{a}{b^{p'/p}} \right)^p = \left( \frac{a}{b^{p'-1}} \right)^p.$$
Therefore, we have
$$\frac{a^p}{p} + \frac{b^{p'}}{p'} - ab = b^{p'} f(t), \qquad f(t) = \frac{t^p}{p} + \frac{1}{p'} - t, \qquad t = \frac{a}{b^{p'-1}}.$$
The derivative of $f$ is
$$f'(t) = t^{p-1} - 1.$$
Thus, for $p > 1$, we have $f'(t) < 0$ if $0 < t < 1$, and Theorem 8.36 implies that $f$ is strictly decreasing; moreover, $f'(t) > 0$ if $1 < t < \infty$, so $f$ is strictly increasing. It follows that $f$ has a strict global minimum on $(0, \infty)$ at $t = 1$. Since
$$f(1) = \frac{1}{p} + \frac{1}{p'} - 1 = 0,$$
we conclude that $f(t) \ge 0$ for all $0 < t < \infty$, with equality if and only if $t = 1$. Furthermore, $t = 1$ if and only if $a = b^{p'-1}$, or $a^p = b^{p'}$. It follows that
$$\frac{a^p}{p} + \frac{b^{p'}}{p'} - ab \ge 0$$
for all $a, b \ge 0$, with equality if and only if $a^p = b^{p'}$, which proves the result.
For p = 2, Young’s inequality reduces to the more easily proved inequality in
Proposition 2.8.
Before continuing, we give a scaling argument which explains the appearance of the Hölder conjugate in Young's inequality. Suppose we look for an inequality of the form
$$ab \le M a^p + N b^q \qquad \text{for all } a, b \ge 0$$
for some exponents $p$, $q$ and some constants $M$, $N$. Any inequality that holds for all positive real numbers must remain true under rescalings. Rescaling $a \mapsto \lambda a$, $b \mapsto \mu b$ in the inequality (where $\lambda, \mu > 0$) and dividing by $\lambda\mu$, we find that it becomes
$$ab \le \frac{\lambda^{p-1}}{\mu} M a^p + \frac{\mu^{q-1}}{\lambda} N b^q.$$
We take $\mu = \lambda^{p-1}$ to make the first scaling factor equal to one, and then the inequality becomes
$$ab \le M a^p + \lambda^r N b^q, \qquad r = (p - 1)(q - 1) - 1.$$
If the exponent $r$ of $\lambda$ is nonzero, then we can violate the inequality by taking $\lambda$ sufficiently small (if $r > 0$) or sufficiently large (if $r < 0$), since it is clearly impossible to bound $ab$ by $a^p$ for all $b \in \mathbb{R}$. Thus, the inequality can only hold if $r = 0$, which implies that $q = p'$.
This argument does not, of course, prove the inequality, but it shows that the only possible exponents for which an inequality of this form can hold must satisfy $q = p'$. Theorem 13.94 proves that such an inequality does in fact hold in that case, provided $1 < p < \infty$.
Next, we use Young's inequality to deduce Hölder's inequality, which is a generalization of the Cauchy-Schwarz inequality for p = 2.
Theorem 13.95 (Hölder's inequality). Suppose that $1 < p < \infty$ and $1 < p' < \infty$ is its Hölder conjugate. If $(x_1, x_2, \dots, x_n)$ and $(y_1, y_2, \dots, y_n)$ are points in $\mathbb{R}^n$, then
$$\sum_{i=1}^{n} x_i y_i \le \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p} \left( \sum_{i=1}^{n} |y_i|^{p'} \right)^{1/p'}.$$
Proof. We assume without loss of generality that $x_i, y_i$ are nonnegative and $x, y \ne 0$. Let $\alpha, \beta > 0$. Then applying Young's inequality in Theorem 13.94 with $a = \alpha x_i$, $b = \beta y_i$ and summing over $i$, we get
$$\alpha\beta \sum_{i=1}^{n} x_i y_i \le \frac{\alpha^p}{p} \sum_{i=1}^{n} x_i^p + \frac{\beta^{p'}}{p'} \sum_{i=1}^{n} y_i^{p'}.$$
Then, choosing
$$\alpha = \left( \sum_{i=1}^{n} y_i^{p'} \right)^{1/p}, \qquad \beta = \left( \sum_{i=1}^{n} x_i^p \right)^{1/p'}$$
to "balance" the terms on the right-hand side, dividing by $\alpha\beta$, and using the fact that $1/p + 1/p' = 1$, we get Hölder's inequality.

Minkowski's inequality follows from Hölder's inequality.

Theorem 13.96 (Minkowski's inequality). Suppose that $1 < p < \infty$ and $1 < p' < \infty$ is its Hölder conjugate. If $(x_1, x_2, \dots, x_n)$ and $(y_1, y_2, \dots, y_n)$ are points in $\mathbb{R}^n$, then
$$\left( \sum_{i=1}^{n} |x_i + y_i|^p \right)^{1/p} \le \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p} + \left( \sum_{i=1}^{n} |y_i|^p \right)^{1/p}.$$

Proof. We assume without loss of generality that $x_i, y_i$ are nonnegative and $x, y \ne 0$. We split the sum on the left-hand side as follows:
$$\sum_{i=1}^{n} |x_i + y_i|^p = \sum_{i=1}^{n} |x_i + y_i| \, |x_i + y_i|^{p-1} \le \sum_{i=1}^{n} |x_i| \, |x_i + y_i|^{p-1} + \sum_{i=1}^{n} |y_i| \, |x_i + y_i|^{p-1}.$$
By Hölder's inequality, we have
$$\sum_{i=1}^{n} |x_i| \, |x_i + y_i|^{p-1} \le \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p} \left( \sum_{i=1}^{n} |x_i + y_i|^{(p-1)p'} \right)^{1/p'},$$
and using the fact that $p' = p/(p - 1)$, we get
$$\sum_{i=1}^{n} |x_i| \, |x_i + y_i|^{p-1} \le \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p} \left( \sum_{i=1}^{n} |x_i + y_i|^p \right)^{1 - 1/p}.$$
Similarly,
$$\sum_{i=1}^{n} |y_i| \, |x_i + y_i|^{p-1} \le \left( \sum_{i=1}^{n} |y_i|^p \right)^{1/p} \left( \sum_{i=1}^{n} |x_i + y_i|^p \right)^{1 - 1/p}.$$
Combining these inequalities, we obtain
$$\sum_{i=1}^{n} |x_i + y_i|^p \le \left[ \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p} + \left( \sum_{i=1}^{n} |y_i|^p \right)^{1/p} \right] \left( \sum_{i=1}^{n} |x_i + y_i|^p \right)^{1 - 1/p}.$$
Finally, dividing this inequality by $\left( \sum_{i=1}^{n} |x_i + y_i|^p \right)^{1 - 1/p}$, we get the result.
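To close the section, here is an informal numerical check of both Hölder's inequality and the $\ell^p$ triangle inequality (an editorial addition; the helper names `p_norm`, `holder_holds`, and `minkowski_holds` are ours, as is the floating-point tolerance).

```python
import random

def p_norm(x, p):
    """The l^p norm (sum |x_i|^p)^(1/p) of a finite sequence."""
    return sum(abs(t)**p for t in x) ** (1.0 / p)

def holder_holds(x, y, p):
    """Check sum(x_i y_i) <= ||x||_p * ||y||_q with q the Hölder conjugate."""
    q = p / (p - 1)
    lhs = sum(a * b for a, b in zip(x, y))
    return lhs <= p_norm(x, p) * p_norm(y, q) + 1e-9

def minkowski_holds(x, y, p):
    """Check the triangle inequality ||x + y||_p <= ||x||_p + ||y||_p."""
    s = [a + b for a, b in zip(x, y)]
    return p_norm(s, p) <= p_norm(x, p) + p_norm(y, p) + 1e-9

random.seed(1)
for p in (1.5, 2.0, 4.0):
    for _ in range(500):
        x = [random.uniform(-5, 5) for _ in range(6)]
        y = [random.uniform(-5, 5) for _ in range(6)]
        assert holder_holds(x, y, p)
        assert minkowski_holds(x, y, p)
```

For p = 2 both checks reduce to the Cauchy-Schwarz and Euclidean triangle inequalities proved earlier in the section.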

