Introduction
These are the lecture notes and class materials for Math 818 Introduction to Modern Algebra II in Spring 2026. This is the second part of a two-part course on groups, rings, modules, and fields. In this second half, we will discuss module theory, with a focus on the structure theory of modules over PIDs and applications to linear algebra, field theory, and Galois theory. A major goal of this course is to prepare graduate students for the PhD qualifying exam in algebra.
The lecture notes for Math 817 Introduction to Modern Algebra I can be found here: Math 817 lecture notes.
The lecture notes draw heavily on Eloísa Grifo’s Algebra Notes, which in turn draw from earlier lecture notes of Mark Walker and Alexandra Seceleanu. The textbook Abstract Algebra by Dummit and Foote is a good resource covering similar material.
Modules
11. An Introduction to Modules
Modules are a generalization of the concept of a vector space, allowing any ring of scalars in place of a field. While vector spaces are key examples of modules, many of the basic facts familiar from linear algebra become more subtle over a general ring.
11.1 Definitions and first examples
Definition 11.1.1
Let \(R\) be a ring with \(1 \neq 0\). A left \(R\)-module is an abelian group \((M,+)\) together with an action \(R \times M \to M\) of \(R\) on \(M\), written as \((r,m) \mapsto rm\), such that for all \(r,s \in R\) and \(m,n \in M\) we have the following:
- \((r + s)m = rm + sm\),
- \((rs)m = r(sm)\),
- \(r(m + n) = rm + rn\), and
- \(1m = m\).
A right \(R\)-module is an abelian group \((M,+)\) together with an action of \(R\) on \(M\), written as \(M \times R \to M, (m,r)\mapsto mr\), such that for all \(r,s \in R\) and \(m,n \in M\) we have
- \(m(r + s) = mr + ms\),
- \(m(rs) = (mr)s\),
- \((m + n)r = mr + nr\), and
- \(m1 = m\).
By default, we will be studying left \(R\)-modules. To make the writing less heavy, we will sometimes say \(R\)-module rather than left \(R\)-module whenever there is no ambiguity.
Remark 11.1.2
If \(R\) is a commutative ring, then any left \(R\)-module \(M\) may be regarded as a right \(R\)-module by setting \(m r := r m\). Likewise, any right \(R\)-module may be regarded as a left \(R\)-module. Thus for commutative rings, we just refer to modules, and not left or right modules.
Lemma 11.1.3 (Arithmetic in modules)
Let \(R\) be a ring with \(1_R \neq 0_R\) and \(M\) be an \(R\)-module. Then \(0_Rm = 0_M\) and \((-1_R)m = -m\) for all \(m \in M\).
Proof (of Lemma 11.1.3)
Let \(m \in M\). Then \[ 0_R m = (0_R + 0_R) m = 0_Rm + 0_Rm. \] Since \(M\) is an abelian group, the element \(0_Rm\) has an additive inverse, \(-0_Rm\), so adding it on both sides we see that \[ 0_M = 0_Rm. \] Moreover, \[ m + (-1_R)m = 1_R m + (-1_R)m = (1_R - 1_R)m = 0_R m = 0_M, \] so \((-1_R)m = -m\).
Typically, one first encounters modules in an undergraduate linear algebra course: the vector spaces from linear algebra are modules over fields. Later we will see that vector spaces are much simpler modules than modules over other rings. So while one might take linear algebra and vector spaces as an inspiration for what to expect from a module, be warned that this perspective can often be deceiving.
Definition 11.1.4
Let \(F\) be a field. A vector space over \(F\) is an \(F\)-module.
We will see more about vector spaces soon. Note that many of the concepts we will introduce have special names in the case of vector spaces. Here are some other important examples:
Lemma 11.1.5
Let \(M\) be a set with a binary operation \(+\). Then
- \(M\) is an abelian group if and only if \(M\) is a \(\mathbb{Z}\)-module.
- \(M\) is an abelian group such that \(nm:=\underbrace{ m + \cdots + m}_{n \textrm{ times}}=0_M\) for all \(m\in M\) if and only if \(M\) has a \(\mathbb{Z}/n\)-module structure.
Proof (of Lemma 11.1.5)
First, we show 1). If \(M\) is a \(\mathbb{Z}\)-module, then \((M,+)\) is an abelian group by definition of module. Conversely, if \((M,+)\) is an abelian group then there is a unique \(\mathbb{Z}\)-module structure on \(M\) given by the formulas below. The uniqueness of the \(\mathbb{Z}\) action follows from the identities below in which the right hand side is determined only by the abelian group structure of \(M\). The various identities follow from the axioms of a module: \[ \begin{cases} i \cdot m = (\underbrace{1 + \cdots + 1}_i) \cdot m = \underbrace{1 \cdot m + \cdots +1 \cdot m}_i = \underbrace{ m + \cdots + m}_i & \text{ if } i>0\\ 0 \cdot m = 0_M & \\ i \cdot m = - (-i) \cdot m = - (\underbrace{m + \cdots + m}_{-i}) & \text{ if } i<0. \end{cases} \] We leave it as an exercise to check that this \(\mathbb{Z}\)-action really satisfies the module axioms.
Now we show 2). If \(M\) is a \(\mathbb{Z}/n\)-module, then \((M,+)\) is an abelian group by definition, and \(nm= \underbrace{ m + \cdots + m}_n=\underbrace{[1]_n \cdot m + \cdots +[1]_n \cdot m}_n=[0]_nm=0_M\).
Conversely, there is a unique \(\mathbb{Z}/n\)-module structure on \(M\) given by the formulas below, which are analogous to the ones above: \[ \begin{cases} [i]_n \cdot m = (\underbrace{[1]_n+ \cdots + [1]_n}_i) \cdot m = \underbrace{[1]_n \cdot m + \cdots +[1]_n \cdot m}_i= \underbrace{ m + \cdots + m}_i & \text{ if } i>0\\ 0 \cdot m = 0_M &\\ [i]_n \cdot m = - (-[i]_n) \cdot m = - (\underbrace{m + \cdots + m}_{-i}) & \text{ if } i<0. \end{cases} \] These formulas are well-defined, meaning they are independent of the choice of representative for \([i]_n\), because of the assumption that \(nm=0_M\). Again checking that this \(\mathbb{Z}/n\)-action really satisfies the module axioms is left as an exercise.
The lemma above says in particular that any group of the form \[ G=\mathbb{Z}^\ell\times \mathbb{Z}/d_1\times \dots\times \mathbb{Z}/d_m \] is a \(\mathbb{Z}\)-module, and if \(\ell=0, m \geqslant 1\) and \(d_i \mid n\) for \(1 \leqslant i \leqslant m\) then \(G\) is also a \(\mathbb{Z}/n\)-module. In particular, the Klein group is a \(\mathbb{Z}/2\)-module.
In contrast to vector spaces, for \(M\) a module over a ring \(R\), it can happen that \(rm=0\) for some \(r \in R\) and \(m \in M\) such that \(r \neq 0_R\) and \(m \neq 0_M\). For example, in the Klein group \(K_4\) viewed as a \(\mathbb{Z}\)-module we have \(2m=0\) for all \(m \in K_4\).
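For another concrete instance of this phenomenon, view \(\mathbb{Z}/6\) as a \(\mathbb{Z}\)-module. Then \[ 2 \cdot [3]_6 = [6]_6 = [0]_6, \] even though \(2 \neq 0\) in \(\mathbb{Z}\) and \([3]_6 \neq [0]_6\) in \(\mathbb{Z}/6\).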
Example 11.1.6
- The trivial \(R\)-module is \(0=\{0\}\) with \(r0=0\) for any \(r\in R\).
- If \(R\) is any ring, then \(R\) is a left and right \(R\)-module via the action of \(R\) on itself given by its internal multiplication.
- If \(I\) is a left (respectively, right) ideal of a ring \(R\) then \(I\) is a left (respectively, right) \(R\)-module with respect to the action of \(R\) on \(I\) by internal multiplication.
- If \(R\) is a subring of a ring \(S\), then \(S\) is a (left or right) \(R\)-module with respect to the action of \(R\) on \(S\) by internal multiplication in \(S\).
- If \(R\) is a ring with \(1 \neq 0\), then \(R[x_1,\ldots,x_n]\) is an \(R\)-module for any \(n \geqslant 1\). This is a special case of the previous example, since \(R\) is a subring of \(R[x_1,\ldots,x_n]\).
- The standard free module over \(R\) of rank \(n\) is the set \[ R^n=\left\{\begin{bmatrix} r_1\\ \vdots\\r_n \end{bmatrix} \mid r_i\in R, 1 \leqslant i \leqslant n\right\} \] with componentwise addition and multiplication by elements of \(R\), as follows: \[ \begin{bmatrix} r_1\\ \vdots\\r_n \end{bmatrix} +\begin{bmatrix} r'_1\\ \vdots\\r'_n \end{bmatrix} =\begin{bmatrix} r_1+r'_1\\ \vdots\\r_n +r'_n\end{bmatrix} \text{ and } r\begin{bmatrix} r_1\\ \vdots\\r_n \end{bmatrix}=\begin{bmatrix} rr_1\\ \vdots\\ rr_n \end{bmatrix}. \] We will often write the elements of \(R^n\) as \(n\)-tuples \((r_1, \ldots, r_n)\) instead. Notice that \(R\) is the free \(R\)-module of rank \(1\).
- More generally, given a collection of \(R\)-modules \(\{ A_i \}\), the abelian groups \[ \prod_i A_i = \{ (a_i)_i \mid a_i \in A_i \}\] and \[ \bigoplus_i A_i = \{ (a_i)_i \mid a_i \in A_i, a_i = 0 \textrm{ for all but finitely many}\ i \,\} \] are \(R\)-modules with the \(R\)-action \(r(a_i) := (ra_i)\). They are called the direct product and direct sum, respectively.
- If \(R\) is a ring, let \( M_n(R)\) denote the ring of \(n \times n\) matrices with entries in \(R\). Then the set of \(n\times 1\) column vectors with entries in \(R\) is a left \(M_n(R)\)-module, and the set of \(1\times n\) row vectors with entries in \(R\) is a right \(M_n(R)\)-module, with the usual coordinatewise vector-plus-vector addition and usual matrix-times-vector multiplication.
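To make the last example concrete, take \(R = \mathbb{Z}\) and \(n = 2\): the left \(M_2(\mathbb{Z})\)-action on column vectors is the usual matrix-vector product, for instance \[ \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 5 \\ 1 \end{bmatrix} = \begin{bmatrix} 7 \\ 3 \end{bmatrix}. \]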
11.2 Submodules and restriction of scalars
Definition 11.2.1
Let \(R\) be a ring and let \(M\) be a left \(R\)-module. An \(R\)-submodule of \(M\) is a subset \(N\subseteq M\) that is an \(R\)-module with the same addition and \(R\)-action as \(M\).
Exercise 11.2.2
Show that a subset \(N\subseteq M\) is a submodule if and only if it is a subgroup under addition of \(M\) satisfying \(rn \in N\) for all \(r \in R\) and \(n \in N\).
Example 11.2.3
Every \(R\)-module \(M\) has two trivial submodules: \(M\) itself and the zero module \(0 = \{ 0_M \}\). A submodule \(N\) of \(M\) is nontrivial if \(N \neq M\) and \(N \neq 0\).
Lemma 11.2.4 (Submodule tests)
Let \(R\) be a ring with \(1 \neq 0\) and let \(M\) be a left \(R\)-module. Let \(N\) be a nonempty subset of \(M\).
- (Two-step test) \(N\) is an \(R\)-submodule of \(M\) if and only if \(n+n' \in N\) and \(rn \in N\) for all \(n,n'\in N\) and \(r\in R\).
- (One-step test) \(N\) is an \(R\)-submodule of \(M\) if and only if \(rn+n'\in N\) for all \(n,n'\in N\) and \(r\in R\).
Proof (of Lemma 11.2.4)
Exercise.
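For example, the one-step test quickly verifies that for a fixed \(n \in \mathbb{Z}\), the subset \(n\mathbb{Z}\) is a \(\mathbb{Z}\)-submodule of \(\mathbb{Z}\): it is nonempty since \(0 \in n\mathbb{Z}\), and for any \(na, nb \in n\mathbb{Z}\) and \(r \in \mathbb{Z}\) we have \[ r(na) + nb = n(ra + b) \in n\mathbb{Z}. \]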
Example 11.2.5
Let \(R\) be a ring and let \(M\) be a subset of \(R\). Then \(M\) is a left (respectively, right) \(R\)-submodule of \(R\) if and only if \(M\) is a left (respectively, right) ideal of \(R\).
Exercise 11.2.6
Let \(R\) be a ring and let \(A\) and \(B\) be submodules of an \(R\)-module \(M\). Then the sum of \(A\) and \(B\), \[ A + B := \{ a + b \mid a \in A, b \in B \}, \] is a submodule of \(M\). If \(\{A_i \ | \ i\in I\}\) is a collection of submodules of \(M\), then the intersection \( \bigcap_{i\in I} A_i\) is a submodule of \(M\).
Exercise 11.2.7
Let \(R\) be a commutative ring, let \(I\) be an ideal of \(R\), and let \(M\) be an \(R\)-module. Show that \[ IM := \left\{\sum_{k=1}^n j_km_k \mid n \geqslant 0, j_k \in I, m_k \in M \text{ for } 1 \leqslant k \leqslant n \right\} \] is a submodule of \(M\).
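For a concrete instance, take \(R = \mathbb{Z}\), \(I = 2\mathbb{Z}\), and \(M = \mathbb{Z}/6\). Then \[ IM = \{[0]_6, [2]_6, [4]_6\}, \] which is indeed a \(\mathbb{Z}\)-submodule of \(\mathbb{Z}/6\).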
Example 11.2.8
When \(R\) is a field, the submodules of a vector space \(V\) are precisely the subspaces of \(V\). When \(R = \mathbb{Z}\), then the class of \(R\)-modules is simply the class of all abelian groups. The submodules of a \(\mathbb{Z}\)-module \(M\) coincide with the subgroups of the abelian group \(M\).
Given an \(R\)-module \(M\), the ring \(R\) is sometimes referred to as the ring of scalars, by analogy to the vector space case. Given an action of a ring of scalars on a module, we can sometimes produce an action of a different ring of scalars on the same set, producing a new module structure.
Lemma 11.2.9 (Restriction of scalars)
Let \(\phi\!: R \to S\) be a ring homomorphism. Any left \(S\)-module \(M\) may be regarded via restriction of scalars as a left \(R\)-module with \(R\)-action defined by \(r m := \phi(r)m\) for any \(m\in M\). In particular, if \(R\) is a subring of a ring \(S\), then any left \(S\)-module \(M\) may be regarded via restriction of scalars as a left \(R\)-module with \(R\)-action defined by the action of the elements of \(R\) viewed as elements of \(S\).
Proof (of Lemma 11.2.9)
Let \(r,s \in R\) and \(m,n \in M\). One checks that the axioms in the definition of a module hold for the given action using properties of ring homomorphisms. For example: \[ (r+s)m=\phi(r + s)m= (\phi(r)+\phi(s))m=\phi(r)m + \phi(s)m=rm+sm. \] The remaining properties are left as an exercise.
Note that the second module structure on \(M\) obtained via restriction of scalars is induced by the original module structure, so the two are related. In general, one can give different module structures on the same abelian group over different, possibly unrelated, rings.
Example 11.2.10
If \(I\) is an ideal of a ring \(R\), applying restriction of scalars along the quotient homomorphism \(q\!:R\to R/I\) tells us that any left \(R/I\)-module is also a left \(R\)-module. In particular, applying this to the \(R/I\)-module \(R/I\) makes \(R/I\) a left and right \(R\)-module by restriction of scalars along the quotient homomorphism. Thus the \(R\)-action on \(R/I\) is given by \[ r \cdot (a + I) := ra + I. \]
The next example explains why restriction of scalars is called a restriction.
Example 11.2.11
Let \(R\) be a subring of \(S\), and let \(i\!: R \to S\) be the inclusion map, which must by definition be a ring homomorphism. Applying restriction of scalars to an \(S\)-module \(M\) via \(i\) is the same as simply restricting our scalars to the elements of \(R\).
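For example, restricting scalars along the inclusion \(\mathbb{R} \subseteq \mathbb{C}\) makes any \(\mathbb{C}\)-vector space an \(\mathbb{R}\)-vector space. In particular, \(\mathbb{C}\), viewed as a module over itself, becomes an \(\mathbb{R}\)-module with action \[ r \cdot (a + bi) := ra + rbi \quad \text{for } r \in \mathbb{R} \text{ and } a + bi \in \mathbb{C}. \]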
11.3 Module homomorphisms and isomorphisms
Definition 11.3.1
Given \(R\)-modules \(M\) and \(N\), an \(R\)-module homomorphism from \(M\) to \(N\) is a function \(f\!: M \to N\) such that for all \(r \in R\) and \(m,n \in M\) we have
- \(f(m+n)=f(m)+f(n)\)
- \(f(rm) = rf(m)\).
Remark 11.3.2
The condition \(f(m+n)=f(m)+f(n)\) says that \(f\) is a homomorphism of abelian groups, and the condition \(f(rm) = rf(m)\) says that \(f\) is \(R\)-linear, meaning that it preserves the \(R\)-action. Since \(f\) is a homomorphism of abelian groups, it follows that \(f(0) = 0\) must hold.
Definition 11.3.3
Let \(M\) and \(N\) be vector spaces over a field \(F\). A linear transformation from \(M\) to \(N\) is an \(F\)-module homomorphism \(M \to N\).
Example 11.3.4
Let \(R\) be a commutative ring and \(M\) be an \(R\)-module. For each \(r \in R\), the multiplication map \(\mu_r\!: M \to M\) given by \(\mu_r(m) = rm\) is a homomorphism of \(R\)-modules: indeed, by the definition of \(R\)-module we have \[ \mu_r(m+n) = r(m+n) = rm+rn = \mu_r(m) + \mu_r(n), \] and \[ \mu_r(sm) = r(sm) = (rs)m = (sr)m = s (rm) = s \mu_r(m). \] Note that if \(R\) is not commutative, the left multiplication map \(\mu_r\!:M\to M\) need not be a homomorphism of (left) \(R\)-modules.
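To see this failure concretely, let \(R = M_2(\mathbb{Q})\) act on itself by left multiplication, and take \[ r = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \qquad s = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \qquad m = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \] Then \[ \mu_r(sm) = rs = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \quad \text{but} \quad s\,\mu_r(m) = sr = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \] so \(\mu_r\) is not a homomorphism of left \(R\)-modules.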
Definition 11.3.5
An \(R\)-module homomorphism \(h\!: M \to N\) is an \(R\)-module isomorphism if there is an \(R\)-module homomorphism \(g: N \to M\) such that \(h \circ g = \mathrm{id}_N\) and \(g \circ h = \mathrm{id}_M\). We say \(M\) and \(N\) are isomorphic, denoted \(M \cong N\), if there exists an isomorphism \(M \to N\).
To check that an \(R\)-module homomorphism \(f\!: M \to N\) is an isomorphism, it is sufficient to check that it is bijective.
Exercise 11.3.6
Let \(f\!: M \to N\) be a homomorphism of \(R\)-modules. Show that if \(f\) is bijective, then its set-theoretic inverse \(f^{-1}\!: N \to M\) is an \(R\)-module homomorphism. Therefore, every bijective homomorphism of \(R\)-modules is an isomorphism.
One should think of a module isomorphism as a relabelling of the names of the elements of the module. If two modules are isomorphic, that means that they are essentially the same, up to renaming the elements.
Definition 11.3.7
Let \(f\!: M \to N\) be a homomorphism of \(R\)-modules. The kernel of \(f\) is \[ \ker(f) := \{ m \in M \mid f(m) = 0 \}. \] The image of \(f\), denoted \(\mathrm{im}(f)\) or \(f(M)\), is \[ \mathrm{im}(f) := \{ f(m) \mid m \in M \}. \]
Exercise 11.3.8
Let \(R\) be a ring with \(1 \neq 0\), let \(M\) be an \(R\)-module, and let \(N\) be an \(R\)-submodule of \(M\). Then the inclusion map \(i\!: N \to M\) is an \(R\)-module homomorphism.
Exercise 11.3.9
If \(f\!: M \to N\) is an \(R\)-module homomorphism, then \(\ker(f)\) is an \(R\)-submodule of \(M\) and \(\mathrm{im}(f)\) is an \(R\)-submodule of \(N\).
Definition 11.3.10
Let \(R\) be a ring and let \(M\) and \(N\) be \(R\)-modules. Then \(\mathrm{Hom}_R(M,N)\) denotes the set of all \(R\)-module homomorphisms from \(M\) to \(N\), and \(\mathrm{End}_R(M)\) denotes the set \(\mathrm{Hom}_R(M,M)\). We call \(\mathrm{End}_R(M)\) the endomorphism ring of \(M\), and elements of \(\mathrm{End}_R(M)\) are called endomorphisms of \(M\).
The endomorphism ring of an \(R\)-module \(M\) is called that because it is a ring, with multiplication given by composition of endomorphisms, \(0\) given by the zero map (the constant equal to \(0\)), and \(1\) given by the identity map. However, two homomorphisms from \(M\) to \(N\) are not composable unless \(M = N\), so \(\mathrm{Hom}_R(M,N)\) is not a ring.
When \(R\) is commutative, \(\mathrm{Hom}_R(M,N)\) is, however, an \(R\)-module; let us describe its \(R\)-module structure. Given \(f, g \in \mathrm{Hom}_R(M,N)\), \(f+g\) is the map defined by \[ (f+g)(m) := f(m) + g(m), \] and given \(r \in R\) and \(f \in \mathrm{Hom}_R(M,N)\), \(r \cdot f\) is the \(R\)-module homomorphism defined by \[ (r \cdot f) (m) := r \cdot f(m) = f(rm). \] The zero element of \(\mathrm{Hom}_R(M,N)\) is the zero map, the constant equal to \(0_N\).
Lemma 11.3.11
Let \(M\) and \(N\) be \(R\)-modules over a commutative ring \(R\). Then the addition and multiplication by scalars defined above make \(\mathrm{Hom}_R(M,N)\) an \(R\)-module.
Proof (of Lemma 11.3.11)
There are many things to check, including:
- The addition and the \(R\)-action are both well-defined: given \(f,g\in \mathrm{Hom}_R(M,N)\) and \(r\in R\), we always have \(f+g, rf\in \mathrm{Hom}_R(M,N)\).
- The axioms of an \(R\)-module are satisfied for \(\mathrm{Hom}_R(M,N)\).
We leave the details as exercises.
We will see later that for an \(n\)-dimensional vector space \(V\) over a field \(F\), there is an isomorphism of vector spaces \(\mathrm{Hom}_F(V,V)\cong M_n(F)\). This says that every linear transformation \(T:V\to V\) corresponds to some \(n\times n\) matrix. However, the story for general \(R\)-modules is a lot more complicated.
Lemma 11.3.12
For any commutative ring \(R\) and any \(R\)-module \(M\) there is an isomorphism of \(R\)-modules \(\mathrm{Hom}_R(R,M)\cong M\).
Proof (of Lemma 11.3.12)
Let \(f\!:M\to \mathrm{Hom}_R(R,M)\) be given for each \(m\in M\) by \(f(m)=\phi_m\) where \(\phi_m\) is the map defined by \(\phi_m(r)=rm\) for all \(r\in R\). Now we have many things to check:
- \(f\) is well-defined, meaning that for any \(m\in M\), its image \(f(m) = \phi_m\) is an element of \(\mathrm{Hom}_R(R,M)\), since \[ \phi_m(r_1+r_2)=(r_1+r_2)m=r_1m+r_2m=\phi_m(r_1)+\phi_m(r_2) \] \[ \phi_m(r_1r_2)=(r_1r_2)m=r_1(r_2m)=r_1\phi_m(r_2) \] for all \(r_1,r_2\in R\).
- \(f\) is an \(R\)-module homomorphism, since \[ \phi_{m_1+m_2}(r)=r(m_1+m_2)=rm_1+rm_2=\phi_{m_1}(r)+\phi_{m_2}(r) \] \[ \phi_{r'm}(r)=r(r'm)=(rr')m=r'(rm)=r'\phi_{m}(r) \]
- \(f\) is injective, since \(\phi_m=\phi_{m'}\) implies in particular that \(\phi_m(1_R)=\phi_{m'}(1_R)\), which by definition of \(\phi_{-}\) means that \(m=m'\).
- \(f\) is surjective, since for \(\psi \in \mathrm{Hom}_R(R,M)\) we have \(\psi(r)=\psi(r1_R)=r\psi(1_R)\) for all \(r\in R\), so \(\psi=\phi_{\psi(1_R)}\).
This shows that \(f\) is an \(R\)-module isomorphism.
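For example, taking \(R = \mathbb{Z}\) and \(M = \mathbb{Z}/n\), the lemma says \(\mathrm{Hom}_{\mathbb{Z}}(\mathbb{Z}, \mathbb{Z}/n) \cong \mathbb{Z}/n\): a \(\mathbb{Z}\)-module homomorphism \(\psi\!: \mathbb{Z} \to \mathbb{Z}/n\) is determined by the value \(\psi(1)\), which can be any element of \(\mathbb{Z}/n\), since \[ \psi(r) = \psi(r \cdot 1) = r\,\psi(1) \quad \text{for all } r \in \mathbb{Z}. \]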
Remark 11.3.13
Let \(R\) be a commutative ring with \(1 \neq 0\) and let \(M\) be an \(R\)-module. Then \(M\) is also an \(\mathrm{End}_R(M)\)-module with the action \(\phi m=\phi(m)\) for any \(\phi\in \mathrm{End}_R(M)\), \(m\in M\).
Definition 11.3.14
Let \(R\) be a ring, let \(M\) be an \(R\)-module, and let \(N\) be a submodule of \(M\). The quotient module \(M/N\) is the quotient group \(M/N\) with \(R\)-action defined by \[ r(m + N) := rm + N \] for all \(r \in R\) and \(m + N \in M/N\).
Lemma 11.3.15
Let \(R\) be a ring, let \(M\) be an \(R\)-module, and let \(N\) be a submodule of \(M\). The quotient module \(M/N\) is an \(R\)-module, and the quotient map \(q\!: M \to M/N\) is an \(R\)-module homomorphism with kernel \(\ker(q) = N\).
Proof (of Lemma 11.3.15)
Among the many things to check here, we will only check the well-definedness of the \(R\)-action on \(M/N\), and leave the others as exercises. To check well-definedness, consider \(m+N=m'+N\). Then \(m-m'\in N\), so \(r(m-m')\in N\) by the definition of submodule. This gives that \(rm-rm'\in N\), hence \(rm+N=rm'+N\).
Definition 11.3.16
Given an \(R\)-module \(M\) and a submodule \(N\) of \(M\), the map \(q\!: M \to M/N\) is the canonical quotient map, or simply the canonical map from \(M\) to \(M/N\).
Example 11.3.17
If \(R\) is a field, quotient modules are the same thing as quotient vector spaces. When \(R = \mathbb{Z}\), recall that \(\mathbb{Z}\)-modules are the same as abelian groups. Quotients of \(\mathbb{Z}\)-modules coincide with quotients of abelian groups.
Theorem 11.3.18 (Universal mapping property for quotient modules)
Let \(N\) be a submodule of \(M\), let \(T\) be an \(R\)-module, and let \(f\!: M \to T\) be an \(R\)-module homomorphism. If \(N \subseteq \ker f\), then the function \[ \begin{aligned} \overline{f}\!: M/N &\longrightarrow & T \\ m+N &\longmapsto& f(m) \end{aligned} \] is a well-defined \(R\)-module homomorphism. In fact, \(\overline{f}\!: M/N \to T\) is the unique \(R\)-module homomorphism such that \(\overline{f} \circ q = f\), where \(q\!: M \to M/N\) denotes the canonical map.
Proof (of Theorem 11.3.18)
By 817, we already know that \(\overline{f}\) is a well-defined homomorphism of groups under \(+\) and that it is the unique one such that \(\overline{f} \circ q = f\). It remains only to show \(\overline{f}\) is an \(R\)-linear map: \[ \overline{f}(r (m +N)) = \overline{f} (rm + N) = f(rm) = r f(m) = r \overline{f}(m + N). \] where the third equation uses that \(f\) preserves scaling.
Theorem 11.3.19 (First Isomorphism Theorem)
Let \(M\) and \(N\) be \(R\)-modules and let \(h: M \to N\) be an \(R\)-module homomorphism. Then \(\ker(h)\) is a submodule of \(M\) and there is an \(R\)-module isomorphism \(M/\ker(h) \cong \mathrm{im}(h)\).
Proof (of Theorem 11.3.19)
If we forget the multiplication by scalars in \(R\), by the First Isomorphism Theorem for Groups, we know that there is an isomorphism of abelian groups under \(+\), given by \[ \begin{aligned} \overline{h}\!: M/\mathrm{ker}(h) &\longrightarrow & \mathrm{im}(h) \\ m+\mathrm{ker}(h) &\longmapsto& h(m). \end{aligned} \] It remains only to show this map preserves multiplication by scalars. And indeed: \[ \begin{aligned} \overline{h}(r(m+\ker(h))) & = \overline{h}(rm+\ker(h)) & \textrm{by definition of the \(R\)-action on } M/\ker(h)\\ & = h(rm) & \textrm{by definition of } \overline{h} \\ & = rh(m) & \textrm{ since \(h\) is an \(R\)-module homomorphism} \\ & = r \overline{h}(m+ \ker h) & \textrm{by definition of } \overline{h}. \end{aligned} \]
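For example, consider the \(\mathbb{Z}\)-module homomorphism \(h\!: \mathbb{Z} \to \mathbb{Z}/6\) given by \(h(m) = [2m]_6\). Then \(\ker(h) = 3\mathbb{Z}\) and \(\mathrm{im}(h) = \{[0]_6, [2]_6, [4]_6\}\), so the theorem gives an isomorphism of \(\mathbb{Z}\)-modules \[ \mathbb{Z}/3\mathbb{Z} \cong \{[0]_6, [2]_6, [4]_6\}. \]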
Theorem 11.3.20 (Diamond Isomorphism Theorem)
Let \(A\) and \(B\) be submodules of \(M\), and let \(A + B = \{a+b \mid a \in A, b \in B\}\). Then \(A + B\) is a submodule of \(M\), \(A \cap B\) is a submodule of \(A\), and there is an \(R\)-module isomorphism \((A + B)/B \cong A/(A \cap B)\).
Proof (of Theorem 11.3.20)
We know that \(A+B\) and \(A \cap B\) are submodules of \(M\). By the Diamond Isomorphism Theorem for Groups, there is an isomorphism of abelian groups \(h\), given by \[\begin{aligned} h\!: A/(A \cap B) &\longrightarrow (A+B)/B \\ a + (A\cap B) &\longmapsto a + B\end{aligned} \] It remains only to show \(h\) preserves multiplication by scalars: \[ h(r(a+(A \cap B))) = h(ra + A \cap B) = ra + B = r(a +B) = rh(a + (A \cap B)). \]
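For a concrete check, take \(M = \mathbb{Z}\), \(A = 4\mathbb{Z}\), and \(B = 6\mathbb{Z}\). Then \(A + B = 2\mathbb{Z}\) and \(A \cap B = 12\mathbb{Z}\), so the theorem asserts \[ 2\mathbb{Z}/6\mathbb{Z} \cong 4\mathbb{Z}/12\mathbb{Z}, \] and indeed both sides are cyclic of order \(3\).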
Theorem 11.3.21 (Cancelling Isomorphism Theorem)
Let \(A\) and \(B\) be submodules of \(M\) with \(A \subseteq B\). Then there is an \(R\)-module isomorphism \((M/A)/(B/A) \cong M/B\).
Proof (of Theorem 11.3.21)
From 817, we know that \(B/A\) is a subgroup of \(M/A\) under \(+\). Given \(r \in R\) and \(b +A \in B/A\) we have \(r(b+A) = rb + A\), which belongs to \(B/A\) since \(rb \in B\). This proves \(B/A\) is a submodule of \(M/A\). By the Cancelling Isomorphism Theorem for Groups, there is an isomorphism of abelian groups \(h\), given by \[\begin{aligned} h\!: (M/A)/(B/A) &\longrightarrow M/B \\ (m+A) + B/A &\longmapsto m + B \end{aligned} \] and it remains only to show this map is \(R\)-linear: \[ \begin{aligned} h(r((m+A) + B/A)) = & h(r(m+A) + B/A) = h((rm + A) + B/A) \\ & = rm + B = r(m +B)\\ & = r h((m+A) + B/A). \end{aligned} \]
Theorem 11.3.22 (Lattice Isomorphism Theorem)
Let \(R\) be a ring, let \(N\) be an \(R\)-submodule of an \(R\)-module \(M\), and let \(q\!: M \to M/N\) be the quotient map. Then the function \[ \begin{aligned} \Psi\!: \{\, R\text{-submodules of } M \text{ containing } N \,\} &\longrightarrow \{\, R\text{-submodules of } M/N \,\} \\ K &\longmapsto K/N \end{aligned} \] is a bijection, with inverse defined by \[ \Psi^{-1}(T) := q^{-1}(T) = \{ a \in M \mid a+N \in T \} \] for each \(R\)-submodule \(T\) of \(M/N\). Moreover, \(\Psi\) and \(\Psi^{-1}\) preserve sums and intersections of submodules.
Proof (of Theorem 11.3.22)
From 817, we know there is a bijection between the set of subgroups of \(M\) that contain \(N\) and the set of subgroups of the quotient group \(M/N\), given by the same map \(\Psi\). We just need to prove that these maps send submodules to submodules. If \(K\) is a submodule of \(M\) containing \(N\), then by the Cancelling Isomorphism Theorem we know that \(K/N\) is a submodule of \(M/N\). If \(T\) is a submodule of \(M/N\), then \(q^{-1}(T)\) is an abelian group, by 817. For \(r \in R\) and \(m \in q^{-1}(T)\), we have \(q(m) \in T\), and hence \(q(rm) = rq(m) \in T\) too, since \(T\) is a submodule. This proves \(q^{-1}(T)\) is a submodule.
11.4 Module generators, bases and free modules
Definition 11.4.1
Let \(M\) be an \(R\)-module. A linear combination of finitely many elements \(a_1,\dots,a_n\) of \(M\) is an element of \(M\) of the form \(r_1a_1 + \dots + r_na_n\) for some \(r_1,\ldots,r_n \in R\).
Definition 11.4.2
Let \(R\) be a ring with \(1 \neq 0\) and let \(M\) be an \(R\)-module. For a subset \(A\) of \(M\), the submodule of \(M\) generated by \(A\) is \[ RA := \{r_1a_1 + \dots + r_na_n \mid n \geq 0, r_i \in R, a_i\in A\}. \] We say \(M\) is generated by \(A\) if \(M=RA\). If \(M\) is an \(F\)-vector space, we may say that \(M\) is spanned by a set \(A\) instead of generated by \(A\).
A module \(M\) is finitely generated if there is a finite subset \(A\) of \(M\) that generates \(M\). If \(A = \{a\}\) has a single element, the module \(RA = Ra\) is called cyclic.
Exercise 11.4.3
Let \(M\) be an \(R\)-module and let \(A \subseteq M\). Then \(RA\) is the smallest submodule of \(M\) containing \(A\), that is, \[ RA = \bigcap\limits_{\substack{A\subseteq N \\ N \text{ submodule of } M}} N. \]
Exercise 11.4.4
Being finitely generated and being cyclic are \(R\)-module isomorphism invariants.
Example 11.4.5
Let \(R\) be a ring with \(1 \neq 0\).
- \(R = R1\) is cyclic.
- \(R \oplus R\) is generated by \(\{(1,0),(0,1)\}\).
- \(R[x]\) is generated as an \(R\)-module by the set \(\{1,x,x^2,\ldots, x^n,\ldots\}\) of monic monomials in the variable \(x\).
- Let \(M = \mathbb{Z}[x,y]\). Then \(M\) is generated by
  - \(\{1,x,y\}\) as a ring,
  - \(\{1,y,y^2,\ldots, y^n,\ldots\}\) as a \(\mathbb{Z}[x]\)-module, and
  - \(\{x^iy^j \mid i,j \in \mathbb{Z}_{\geqslant 0}\}\) as a group (\(\mathbb{Z}\)-module).
Lemma 11.4.6
Let \(R\) be a ring with \(1 \neq 0\), let \(M\) be an \(R\)-module, and let \(N\) be an \(R\)-submodule of \(M\).
- If \(M\) is finitely generated as an \(R\)-module, then so is \(M/N\).
- If \(N\) and \(M/N\) are finitely generated as \(R\)-modules, then so is \(M\).
Proof (of Lemma 11.4.6)
The proof of (2) will be a problem set question. To show (1), note that if \(M=RA\) then \(M/N=R\bar{A}\), where \(\bar{A}=\{a+N \mid a\in A\}\).
Definition 11.4.7
Let \(M\) be an \(R\)-module and let \(A\) be a subset of \(M\). The set \(A\) is linearly independent if whenever \(r_1,\ldots,r_n \in R\) and \(a_1,\ldots ,a_n\) are distinct elements of \(A\) satisfying \(r_1a_1 + \dots + r_na_n = 0\), then \(r_1 = \dots = r_n = 0\). Otherwise \(A\) is linearly dependent.
Definition 11.4.8
A subset \(A\) of an \(R\)-module \(M\) is a basis of \(M\) if \(A\) is linearly independent and generates \(M\). An \(R\)-module M is a free \(R\)-module if \(M\) has a basis.
We will later see that over a field, every module is free. However, when \(R\) is not a field, there are \(R\)-modules that are not free; in fact, most modules are not free.
Example 11.4.9
Here are some examples of free modules:
- If we think of \(R\) as a module over itself, it is free with basis \(\{1\}\).
- The module \(R \oplus R\) is free with basis \(\{(1,0),(0,1)\}\).
- The \(R\)-module \(R[x]\) is free, and \(\{1,x,x^2,\ldots, x^n,\ldots\}\) is a basis.
- Let \(M = \mathbb{Z}[x,y]\). Then \(\{1,y,y^2,\ldots, y^n,\ldots\}\) is a basis for the \(\mathbb{Z}[x]\)-module \(M\), and \(\{x^iy^j \mid i,j \in \mathbb{Z}_{\geqslant 0}\}\) is a basis for the \(\mathbb{Z}\)-module \(M.\)
Example 11.4.10
\(\mathbb{Z}/2\) is not a free \(\mathbb{Z}\)-module. Indeed, suppose that \(A\) is a basis for \(\mathbb{Z}/2\); since \(\mathbb{Z}/2 \neq 0\), the set \(A\) must contain some element \(a\). Then \(2a=0\) with \(2 \neq 0\) in \(\mathbb{Z}\), so \(A\) cannot be linearly independent, a contradiction.
Lemma 11.4.11
If \(A\) is a basis of \(M\) then every nonzero element \(0\neq m\in M\) can be written uniquely as \(m=r_1a_1 + \dots + r_na_n\) with \(a_i\) distinct elements of \(A\) and \(r_i\neq 0\).
Proof (of Lemma 11.4.11)
Suppose that \(m\neq 0\) and \(A_1,A_2\) are finite subsets of \(A\) such that \[ m=\sum_{a\in A_1}r_aa=\sum_{b\in A_2}s_bb \] for some \(r_a, s_b \in R\). Then \[ \sum_{a\in A_1\cap A_2} (r_a-s_a)a+\sum_{a\in A_1\setminus A_2} r_aa-\sum_{a \in A_2\setminus A_1} s_aa=0. \] Since \(A\) is a linearly independent set, we conclude that \(r_a=s_a\) for \(a\in A_1\cap A_2\), \(r_a=0_R\) for \(a\in A_1\setminus A_2\), and \(s_a=0_R\) for \(a \in A_2\setminus A_1\). Set \[ B := \{a \in A_1\cap A_2 \mid r_a \neq 0_R\}. \] Then \[ m = \displaystyle\sum_{a\in B}r_aa \] is the unique way of writing \(m\) as a linear combination of elements of \(A\) with nonzero coefficients.
Theorem 11.4.12 (Universal mapping property for free modules)
Let \(R\) be a ring, \(M\) be a free \(R\)-module with basis \(B\), \(N\) be any \(R\)-module, and let \(j: B \to N\) be any function. Then there is a unique \(R\)-module homomorphism \(h: M \to N\) such that \(h(b) = j(b)\) for all \(b \in B\).
Proof (of Theorem 11.4.12)
We have two things to prove: existence and uniqueness.
Existence: Any \(0\neq m\in M\) can be written uniquely as \[ m=r_1b_1+\dots+r_nb_n \] with \(b_i\in B\) distinct and \(0 \neq r_i \in R\). Define \(h\!: M \to N\) by \[ \begin{cases} h(r_1b_1+\dots+r_nb_n) = r_1j(b_1) + \cdots +r_nj(b_n) & \text{ if } r_1b_1 + \cdots + r_nb_n \neq 0\\ h(0_M)=0_N \end{cases} \] One can check that this satisfies the conditions to be an \(R\)-module homomorphism (exercise!).
Uniqueness: Let \(h:M\to N\) be an \(R\)-module homomorphism such that \(h(b_i)=j(b_i)\). Then in particular \(h\!:(M,+)\to (N,+)\) is a group homomorphism and therefore \(h(0_M)=0_N\) by properties of group homomorphisms. Furthermore, if \(m=r_1b_1+\dots+r_nb_n\) then \[ h(m)=h(r_1b_1+\dots+r_nb_n)=r_1h(b_1)+\dots+r_nh(b_n)=r_1j(b_1)+\dots+r_nj(b_n) \] by the definition of homomorphism, and because \(h(b_i)=j(b_i)\).
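For instance, since \(\mathbb{Z}^2\) is a free \(\mathbb{Z}\)-module with basis \(B = \{e_1, e_2\}\), where \(e_1=(1,0)\) and \(e_2=(0,1)\), a \(\mathbb{Z}\)-module homomorphism out of \(\mathbb{Z}^2\) may be specified by choosing the images of \(e_1\) and \(e_2\) arbitrarily. For example, the function \(j(e_1) = [1]_5\), \(j(e_2) = [3]_5\) extends uniquely to the \(\mathbb{Z}\)-module homomorphism \[ h\!: \mathbb{Z}^2 \to \mathbb{Z}/5, \qquad h(a,b) = [a + 3b]_5. \]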
Corollary 11.4.13
Let \(A\) and \(B\) be sets of the same cardinality, and fix a bijection \(j:A\to B\). If \(M\) and \(N\) are free \(R\)-modules with bases \(A\) and \(B\) respectively, then there is an isomorphism of \(R\)-modules \(M \cong N\).
Proof (of Corollary 11.4.13)
Let \(g:M\to N\) and \(h:N\to M\) be the module homomorphisms induced by the bijection \(j:A\to B\) and its inverse \(j^{-1}:B\to A\), which exist by the UMP for free modules. We will show that \(h\) and \(g\) are inverse homomorphisms. First, note that \(g \circ h:N\to N\) is an \(R\)-module homomorphism and \((g \circ h)(b) = g(j^{-1}(b))=j(j^{-1}(b))=b\) for every \(b\in B\). Since the identity map \(\mathrm{id}_N\) is an \(R\)-module homomorphism and \(\mathrm{id}_N(b)=b\) for every \(b\in B\), by the uniqueness in the UMP for free modules we have \(g \circ h = \mathrm{id}_N\). Similarly, one shows that \(h \circ g = \mathrm{id}_M\).
The corollary gives that, up to isomorphism, there is only one free module with basis \(A\), provided such a module exists. But does a free module generated by a given set \(A\) exist? It turns out it does.
Definition 11.4.14
Let \(R\) be a ring and let \(A\) be a set. The free \(R\)-module generated by \(A\), denoted \(F_R(A)\), is the set of formal sums
\[ \begin{align*} F_R(A) &= \{r_1a_1 + \dots + r_na_n \mid n \geqslant 0, r_i \in R, a_i \in A\} \\ &= \left\lbrace \sum_{a \in A} r_aa \mid r_a \in R, r_a = 0 \text{ for all but finitely many }a \right\rbrace, \end{align*} \] with addition defined by \[ \left(\sum_{a \in A} r_aa\right) + \left(\sum_{a \in A} s_aa \right) = \sum_{a \in A} (r_a + s_a)a \] and \(R\)-action defined by \[ r \left(\sum_{a \in A} r_aa \right) = \sum_{a \in A} (rr_a)a. \]
Exercise 11.4.15
This construction \(F_R(A)\) results in an \(R\)-module, which is free with basis \(A\), and \(F_R(A)\cong \bigoplus_{a\in A}R\).
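The bookkeeping in this construction can be made concrete. Below is a small Python sketch (our own illustration, not part of the notes) modeling elements of \(F_R(A)\) for \(R=\mathbb{Z}\) as dictionaries from basis labels to nonzero integer coefficients; absent keys carry coefficient \(0\), so every element has finite support, exactly as in the definition.

```python
# Sketch (our own illustration): elements of F_R(A) for R = Z, modeled as
# dictionaries from basis labels (elements of A) to nonzero integer
# coefficients. Absent keys carry coefficient 0, so every element has
# finite support, matching the definition.

def add(m, n):
    """Addition of formal sums, dropping zero coefficients."""
    out = dict(m)
    for a, c in n.items():
        out[a] = out.get(a, 0) + c
        if out[a] == 0:
            del out[a]
    return out

def scale(r, m):
    """Action of a scalar r in R on a formal sum."""
    return {a: r * c for a, c in m.items() if r * c != 0}

# (2x + 3y) + (-2x + z) = 3y + z, and 2(2x + 3y) = 4x + 6y:
print(add({'x': 2, 'y': 3}, {'x': -2, 'z': 1}))  # {'y': 3, 'z': 1}
print(scale(2, {'x': 2, 'y': 3}))                # {'x': 4, 'y': 6}
```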
Theorem 11.4.16 (Uniqueness of rank over commutative rings)
Let \(R\) be a commutative ring with \(1 \neq 0\) and let \(M\) be a free \(R\)-module. If \(A\) and \(B\) are both bases for \(M\), then \(A\) and \(B\) have the same cardinality, meaning that there exists a bijection \(A \to B\).
Proof (of Theorem 11.4.16)
You will show this in the next problem set (at least in the case where \(M\) has a finite basis).
Definition 11.4.17
Let \(R\) be a commutative ring and let \(M\) be a free \(R\)-module. The rank of \(M\) is the cardinality of any basis of \(M\).
Example 11.4.18
Let \(R\) be a commutative ring with \(1 \neq 0\). The rank of \(R^n\) is \(n\). Note that any free \(R\)-module of rank \(n\) must be isomorphic to \(R^n\).
Earlier, we described the \(R\)-module structure on the direct sum of \(R\)-modules; this is how we construct \(R^n\), by taking the direct sum of \(n\) copies of the \(R\)-module \(R\). This construction can also be described as the direct product of \(n\) copies of \(R\). However, for infinite families the direct sum and the direct product are two genuinely different constructions.
Definition 11.4.19
Let \(R\) be a ring. Let \(\{ M_a \}_{a \in J}\) be a collection of \(R\)-modules. The direct product of the R-modules \(M_a\) is the Cartesian product \[ \prod_{a \in J} M_a := \{ (m_a)_{a \in J} \mid m_a \in M_a \} \] with addition defined by \[ (m_a)_{a \in J}+(n_a)_{a \in J} := (m_a+n_a)_{a \in J} \] and \(R\)-action defined by \[ r(m_a)_{a \in J} = (rm_a)_{a \in J}. \]
The direct sum of the \(R\)-modules \(M_a\) is the \(R\)-submodule \(\bigoplus_{a \in J} M_a\) of the direct product \(\prod_{a \in J} M_a\) given by \[ \bigoplus_{a \in J} M_a = \{(m_a)_{a \in J} \mid m_a = 0 \text{ for all but finitely many } a \}. \]
Exercise 11.4.20
The direct sum and the direct product of an arbitrary family of \(R\)-modules are \(R\)-modules.
Example 11.4.21
Suppose that the index set is finite, say \(J = \{1,\ldots,n\}\). Let \(M_1,\ldots,M_n\) be \(R\)-modules. The direct product module \(M_1 \times \dots \times M_n\) is the abelian group \(M_1 \times \dots \times M_n\) with ring action given by \(r(m_1,\ldots,m_n) = (rm_1,\ldots,rm_n)\) for all \(r \in R\) and \(m_i \in M_i\). Comparing the definitions we see that \[ M_1 \times \dots \times M_n = M_1 \oplus \dots \oplus M_n. \]
If \(M_i=R\) for \(1 \leqslant i \leqslant n\), then we denote \(R^n = \underbrace{R\times \dots \times R}_n=\underbrace{R\oplus \dots \oplus R}_n\).
It is useful to talk about maps from the factors/summands to the direct product/direct sum and conversely.
Definition 11.4.22
For \(i\in J\) the inclusion of the \(i\)-th factor into a direct product or direct sum is the map \[ \iota_i\!: M_i \to \prod_{a \in J} M_a \text{ or } \iota_i\!: M_i \to \bigoplus_{a \in J} M_a, \iota_i(m)=(m_a)_{a \in J}, \text{ where } m_a=\begin{cases} m & \text{ if } a = i \\ 0 & \text{ if } a \neq i \end{cases}. \]
For \(i\in J\) the \(i\)-th projection map from a direct product or a direct sum module is \[ \pi_i\!: \prod_{a \in J} M_a \to M_i \text{ or } \pi_i:\bigoplus_{a \in J} M_a \to M_i, \pi_i \left((m_a)_{a \in J}\right)=m_i. \]
Lemma 11.4.23
Projections from direct products or sums of \(R\)-modules, inclusions into direct products or sums of \(R\)-modules, and products of \(R\)-module homomorphisms are \(R\)-module homomorphisms. Furthermore, inclusions are injective, projections are surjective, and \[ \pi_i\circ \iota_i=\mathrm{id}_{M_i}. \] Also, \(\iota_i(M_i)\) is an \(R\)-submodule of the direct product/sum which is isomorphic to \(M_i\).
Note, however, that in general \(\iota_i\circ\pi_i\neq \mathrm{id}\).
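For a finite index set, the inclusions and projections of Definition 11.4.22 and the identities in Lemma 11.4.23 can be checked directly. Here is a small Python sketch (our own illustration) with \(n=3\) factors, each equal to \(\mathbb{Z}\):

```python
# Sketch (our own illustration): a finite direct sum M_1 x M_2 x M_3 with
# each M_i = Z, modeled as Python tuples. We implement the inclusion
# iota_i and projection pi_i of Definition 11.4.22 and check Lemma 11.4.23.

n = 3

def iota(i, m):
    """Inclusion of the i-th factor: m in slot i, zeros elsewhere."""
    return tuple(m if a == i else 0 for a in range(n))

def pi(i, t):
    """Projection onto the i-th factor."""
    return t[i]

# pi_i composed with iota_i is the identity on M_i ...
assert pi(1, iota(1, 7)) == 7
# ... but iota_i composed with pi_i is not the identity on the product:
assert iota(1, pi(1, (5, 7, 9))) == (0, 7, 0)
print("checked")
```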
12. Vector Spaces
We now turn our focus to vector spaces: modules over fields.
12.1 Classification of vector spaces and dimension
Recall that for a subset \(A\) of an \(F\)-vector space \(V\), the span of \(A\), denoted \(\mathrm{span}(A)\), is the subspace generated by \(A\):
\[ \mathrm{span}(A) := \left\{\sum_{i=1}^n c_i a_i \mid n \geqslant 0, c_i\in F, a_i \in A \right\}. \]
Lemma 12.1.1
Suppose \(I\) is a linearly independent subset of an \(F\)-vector space \(V\) and \(v \in V \setminus \mathrm{span}(I)\). Then \(I \cup \{v\}\) is also linearly independent.
Proof of Lemma 12.1.1
Let \(w_1, \dots, w_n\) be any list of distinct elements of \(I \cup \{v\}\) and suppose that \(\sum_i c_i w_i = 0\) for some \(c_i \in F\). If none of the \(w_i\)'s is equal to \(v\), then \(c_i = 0\) for all \(i\), since \(I\) is linearly independent. Without loss of generality, say \(w_1 = v\). If \(c_1 = 0\) then \(c_i = 0\) for all \(i\) by the same reasoning as in the previous case. If \(c_1 \ne 0\), then
\[ v = \sum_{i \geqslant 2} \frac{-c_i}{c_1} w_i \in \mathrm{span}(I), \]contrary to assumption. This proves that \(I \cup \{v\}\) is a linearly independent set.
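Over \(F = \mathbb{Q}\) or \(\mathbb{R}\), one can watch Lemma 12.1.1 in action numerically: adjoining a vector outside the span raises the rank of the matrix whose columns are the given vectors. A hedged sketch (assuming NumPy is available; the vectors are our own example):

```python
import numpy as np

# Numerical illustration of Lemma 12.1.1 (the vectors are our own example):
# if I is linearly independent and v lies outside span(I), then I together
# with v is again linearly independent. We detect independence of k vectors
# by checking that the matrix with those columns has rank k.

I = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
v = np.array([1.0, 2.0, 3.0])   # last coordinate nonzero, so v is not in span(I)

rank_I = np.linalg.matrix_rank(np.column_stack(I))
rank_Iv = np.linalg.matrix_rank(np.column_stack(I + [v]))
print(rank_I, rank_Iv)  # 2 3
```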
Theorem 12.1.2
Let \(V\) be an \(F\)-vector space and assume \(I \subseteq S \subseteq V\) are subsets such that \(I\) is linearly independent and \(S\) spans \(V\). Then there is a subset \(B\) with \(I \subseteq B \subseteq S\) such that \(B\) is a basis.
Before we prove this theorem, we note the following corollary:
Corollary 12.1.3 (Every vector space has a basis)
Every vector space \(V\) has a basis, and hence is a free module. Moreover, every linearly independent subset of \(V\) is contained in some basis, and every set of vectors that spans \(V\) contains some basis.
Proof of Corollary 12.1.3
For this first part, apply the theorem with \(I = \varnothing\) and \(S = V\). For the second and third, use \(I\) arbitrary and \(S = V\) and \(I = \varnothing\) and \(S\) arbitrary, respectively.
Example 12.1.4
\(\mathbb{R}\) has a basis as a \(\mathbb{Q}\)-vector space; just don't ask me what it looks like.
Proof of Theorem 12.1.2
Let \(\mathcal{P}\) denote the collection of all subsets \(X\) of \(V\) such that \(I \subseteq X \subseteq S\) and \(X\) is linearly independent. We make \(\mathcal{P}\) into a poset by the order relation given by set containment \(\subseteq\). We note that \(\mathcal{P}\) is not empty since, for example \(I \in \mathcal{P}\).
Let \(\mathcal{T}\) be any nonempty chain in \(\mathcal{P}\). Let \(Z = \bigcup_{Y \in \mathcal{T}} Y\). We claim \(Z \in \mathcal{P}\). Given \(z_1, \dots, z_m \in Z\), for each \(i\) we have \(z_i \in Y_i\) for some \(Y_i \in \mathcal{T}\). Since \(\mathcal{T}\) is totally ordered, one of \(Y_1, \dots, Y_m\) contains all the others and hence contains all the \(z_i\)'s. Since that \(Y_i\) is linearly independent, this shows \(z_1, \dots, z_m\) are linearly independent. Thus \(Z\) is linearly independent. Since \(\mathcal{T}\) is non-empty, \(Z \supseteq I\) and hence \(Z \in \mathcal{P}\). It is an upper bound for \(\mathcal{T}\) by construction.
By Zorn's Lemma, \(\mathcal{P}\) has a maximal element \(B\), which we claim is a basis for \(V\). Note that \(B\) is linearly independent and \(I \subseteq B \subseteq S\) by construction. We need to show that it spans \(V\). Suppose not. Since \(S\) spans \(V\), if \(S \subseteq \mathrm{span}(B)\), then \(\mathrm{span}(B)\) would have to be all of \(V\). So, there is at least one \(v \in S\) such that \(v \notin \mathrm{span}(B)\), and set \(X := B \cup \{v\}\). Clearly, \(I \subset X \subseteq S\) and, by Lemma 12.1.1, \(X\) is linearly independent. This shows that \(X\) is an element of \(\mathcal{P}\) that is strictly bigger than \(B\), contrary to the maximality of \(B\).
Corollary 12.1.5
Let \(F\) be a field and \(W\) be a subspace of the \(F\)-vector space \(V\). Then every basis of \(W\) extends to a basis of \(V\), that is, if \(B\) is a basis of \(W\) then there exists a basis \(\tilde{B}\) of \(V\) such that \(B\) is a subset of \(\tilde{B}\).
Proof of Corollary 12.1.5
Apply Theorem 12.1.2 with \(B = I\) and \(S = V\). Since \(B\) is a basis of \(W\), \(B\) is linearly independent, and \(B\) remains linearly independent when regarded as a subset of \(V\).
Remark 12.1.6
It is not true that, with the notation of the previous Corollary, if \(\tilde{B}\) is a basis of \(V\) then there exists a basis \(B\) of \(W\) such that \(B\) is a subset of \(\tilde{B}\). For instance, take \(F = \mathbb{R}\), \(V = \mathbb{R}^2\), \(\tilde{B} = \{(1,0), (0,1)\}\) and \(W\) the subspace spanned by \((1,1)\).
The following is an essential property of vector spaces that eventually will allow us to compare bases in terms of size.
Lemma 12.1.7 (Exchange Property)
Let \(B\) be a basis for a vector space \(V\) and consider any set of linearly independent vectors \(I \subseteq V\). Then there is a subset \(A \subseteq B\) such that
- \(|I|=|A|\) (meaning that there is a bijection \( I \leftrightarrow A\)), and
- \((B \smallsetminus A) \cup I\) is also a basis for \(V\).
Proof of Lemma 12.1.7
First we show we can swap out one element of \(B\) for a single vector \(v\). In fact, we will show the stronger statement that for any subset \(B_0 \subseteq B\) and any element \(v\notin \mathrm{span}(B_0)\), there is some \(b\in B\smallsetminus B_0\) such that \((B\smallsetminus \{b\}) \cup \{v\}\) is a basis for \(V\).
Since \(B\) is a basis, we can write
\[ v = \sum_i \lambda_i b_i \]for some distinct elements \(b_i \in B\) and scalars \(\lambda_i\in F\). Since \(v\notin \mathrm{span}(B_0)\), we have \(\lambda_i\neq 0\) for some \(b_i\notin B_0\); say \(b_1\notin B_0\) and \(\lambda_1\neq 0\). We claim that \(B' := (B\smallsetminus \{ b_1\}) \cup \{v\}\) is a basis.
\(B'\) is linearly independent: By Lemma 12.1.1, it suffices to show that \(v\notin \mathrm{span}(B\smallsetminus \{b_1\})\). Indeed, if it were, then writing \(v\) as a linear combination of \(B\smallsetminus \{b_1\}\) and the linear combination \(v = \sum_i \lambda_i b_i\) with \(\lambda_1\neq 0\) would give two different expressions of \(v\) in the basis \(B\), a contradiction.
\(B'\) spans \(V\): First, we note that it suffices to show that \(b_1\) is in the span of \(B'\): once we have shown this, we can write any element of \(V\) as a linear combination of elements of \(B\), and then replace the term involving \(b_1\) with a linear combination of elements of \(B'\) by substituting. To this end, by the contrapositive of Lemma 12.1.1 (applied to the linearly independent set \(B'\)), it suffices to show that \(B' \cup \{b_1\}\) is linearly dependent, and our starting expression \(v = \sum_i \lambda_i b_i\) provides exactly such a dependence relation.
This concludes the case of one element. For the general case, we set up a Zorn's Lemma argument. Consider the collection of pairs \((I', A')\) with \(I' \subseteq I\), \(A' \subseteq B\), and \(|I'|=|A'|\), with the property that \((B\smallsetminus A') \cup I'\) is a basis for \(V\). By a Zorn's Lemma argument (left as an exercise), there is a maximal such pair under the partial order \((I',A')\leq (I'',A'')\) if \(I'\subseteq I''\) and \(A'\subseteq A''\). Let \((I_0,A_0)\) be a maximal element. We will argue that \(I_0=I\).
To obtain a contradiction, suppose otherwise, and let \(a\in I \smallsetminus I_0\). Apply the special case above to the basis \((B\smallsetminus A_0) \cup I_0\) and the subset \(I_0\): since \(I\) is linearly independent, \(a\notin \mathrm{span}(I_0)\). Then by the special case, there is some \(b\in B\smallsetminus A_0\) such that \(\left(B\smallsetminus (A_0 \cup \{b\})\right) \cup (I_0 \cup \{a\})\) is a basis. This contradicts the maximality of \((I_0,A_0)\), so we deduce that \(I_0=I\) as required.
It follows that all bases for the same vector space have the same cardinality.
Theorem 12.1.8 (Dimension Theorem)
Let \(V\) be a vector space, and \(B,B'\) be two bases for \(V\). Then \(|B| = |B'|\), meaning there is a bijection \(B\leftrightarrow B'\).
Proof of Theorem 12.1.8
Let \(B, B'\) be two bases for \(V\). Applying the Exchange Lemma with \(I=B'\), there is a subset \(A\subseteq B\) with \(|A|=|B'|\), so there is an injective map \(B' \hookrightarrow B\). Switching roles, there is an injective map \(B\hookrightarrow B'\). It follows from a result in set theory (the Cantor–Bernstein theorem) that there is a bijection \( B\leftrightarrow B'\).
Corollary 12.1.9
Let \(F\) be a field. Let \(V\) be a vector space with a basis \(B\) and \(V'\) be a vector space with a basis \(B'\). Then \[ V\cong V' \quad \Longleftrightarrow \quad |B|=|B'|.\]
Proof of Corollary 12.1.9
The (\(\Leftarrow\)) implication is a special case of Corollary 11.4.13. For the (\(\Rightarrow\)) implication, we claim that if \(\phi: V\to V'\) is an isomorphism, then \(\phi(B)\) is a basis for \(V'\):
\(\phi(B)\) is linearly independent: Let \(\phi(b_1),\dots,\phi(b_n)\in \phi(B)\) and \( \lambda_1 \phi(b_1) + \cdots+ \lambda_n \phi(b_n) = 0\). Then \[ 0= \lambda_1 \phi(b_1) + \cdots+ \lambda_n \phi(b_n) = \phi( \lambda_1 b_1 + \cdots+ \lambda_n b_n)\] and \(\phi\) is injective so \( \lambda_1 b_1 + \cdots+ \lambda_n b_n=0\). Since \(B\) is linearly independent, we have \(\lambda_1 = \cdots=\lambda_n=0\).
\(\phi(B)\) spans \(V'\): Since \(\phi\) is surjective, for any \(v'\in V'\) we can write \(v'=\phi(v)\) for some \(v\in V\). Then we can write \(v=\lambda_1 b_1 + \cdots + \lambda_n b_n\) for some \(\lambda_i\in F\) and \(b_i\in B\), and then \[ v' = \phi(v) = \phi(\lambda_1 b_1 + \cdots + \lambda_n b_n) = \lambda_1 \phi(b_1) + \cdots+ \lambda_n \phi(b_n).\]
Thus, \(\phi(B)\) is a basis for \(V'\). Since \(\phi\) is a bijection, we have \(|B|=|\phi(B)|\), and by the previous Theorem, \(|\phi(B)|=|B'|\).
Definition 12.1.10
The dimension of a vector space \(V\), denoted \(\dim_F(V)\) or \(\dim(V)\), is the cardinality of any of its bases.
Example 12.1.11
\(\dim_F(F^n) = |\{e_1,e_2,\ldots,e_n\}| = n.\)
While one can talk about infinite cardinals, we'll generally say that dimension is a natural number or \(\infty\). We restate the results above specifically in the finite-dimensional case for easy reference.
Theorem 12.1.12 (Classification of finite dimensional vector spaces)
Let \(F\) be a field. Let \(V\) be a vector space of dimension \(n\), and \(W\) be a vector space of dimension \(m\).
- (1) \(V\cong W\) if and only if \(n = m\).
- (2) \(V \cong F^n\).
- (3) \(F^n \cong F^m\) if and only if \(m=n\).
Proof of Theorem 12.1.12
Part (1) follows from Corollary 12.1.9. Parts (2) and (3) are special cases, in light of Example 12.1.11.
Remark 12.1.13
Let us consider a few infinite-dimensional vector spaces.
Example 12.1.14
Consider the vector space \(F[x]\). This cannot be a finite dimensional vector space. Indeed, if \(\{f_1 , \dots , f_n\}\) were a basis, then setting
\[ M = \max_{1 \leqslant j \leqslant n}\{ \deg(f_j)\} \]we see that the element \(x^{M+1}\) is not in the span of \(\{f_1 , \dots , f_n\}\). We can find a basis for this space though. Consider the collection \(B = \{1, x, x^2 , \ldots \}\). This set is linearly independent and spans \(F[x]\), thus it forms a basis for \(F[x]\). This basis is countable, so \(\dim_F(F[x])= |\mathbb{N}|\).
Example 12.1.15
Consider the real vector space
\[ V := \mathbb{R}^\mathbb{N} = \mathbb{R}\times \mathbb{R}\times \mathbb{R} \times \cdots. \]This space can be identified with sequences \(\{a_n\}\) of real numbers. One might be interested in a basis for this vector space. At first glance, the most obvious choice for a basis would be \(E = \{e_1,e_2,\ldots\}\). It turns out that \(E\) is a basis for the direct sum \(\bigoplus_{i\in \mathbb{N}}\mathbb{R}\). However, it is immediate that this set does not span \(V\), as \(v = (1,1,\ldots)\) cannot be represented as a finite linear combination of these elements. Since \(v\) is not in \(\mathrm{span}(E)\), we know by Lemma 12.1.1 that \(E \cup \{v\}\) is a linearly independent set. However, this new set \(E \cup \{v\}\) does not span \(V\) either, as \((1, 2, 3, 4, \ldots)\) is not in the span of \(E \cup \{v\}\). We know that \(V\) has a basis, but it can be shown that no countable collection of vectors forms a basis for this space, and in fact \(\dim_\mathbb{R} (\mathbb{R}^\mathbb{N}) =|\mathbb{R}|\).
Example 12.1.17
Since \(\mathbb{Q}\) is a subring of \(\mathbb{R}\), we have that \(\mathbb{R}\) is a \(\mathbb{Q}\)-vector space, and likewise with \(\mathbb{C}\). One can show that \(\dim_{\mathbb{Q}}(\mathbb{R})=|\mathbb{R}|\), and \(\dim_{\mathbb{Q}}(\mathbb{C})=|\mathbb{C}| = |\mathbb{R}|\), so \(\mathbb{R}\cong \mathbb{C}\) as \(\mathbb{Q}\)-vector spaces. In particular, \((\mathbb{R},+)\cong (\mathbb{C},+)\) as groups.
We now deduce some formulas that relate the dimensions of various vector spaces.
Theorem 12.1.18
Let \(W\) be a subspace of a vector space \(V\). Then \[ \dim(V) = \dim(W) + \dim(V/W). \]
Here the dimension of a vector space is understood to be either a nonnegative integer or \(\infty\), and the arithmetic in the formula follows the rules \(n+\infty=\infty=\infty+\infty\) for any \(n\in \mathbb{Z}_{\geqslant 0}\). The proof follows from Problem #1 in Problem Set #2.
Example 12.1.19
Consider the vector space \(V = \mathbb{R}^2\) and its subspace \(W=\mathrm{span}\{e_1\}\). Then the quotient vector space \(V/W\) is, by definition,
\[ V/W=\{(x,y)+W \mid (x,y)\in \mathbb{R}^2\}. \]Looking at each coset we see that
\[ (x,y)+W=(x,y)+\mathrm{span}\{e_1\}=\{(x,y)+(a,0)\mid a\in \mathbb{R}\}=\{(t,y)\mid t\in \mathbb{R}\}, \]so \((x,y)+W\) is geometrically a line parallel to the \(x\)-axis and having the \(y\)-intercept \(y\). It is intuitively natural to identify such a line with its intercept, which gives a map
\[ V/W\to \mathrm{span}\{e_2\} \quad (x,y)+W \mapsto (0,y). \]It turns out that this map is a vector space isomorphism, hence \[ \dim(V/W) = \dim(\mathrm{span}\{e_2\}) = 1 \] and we can check that \[ \dim(W) + \dim(V/W) = 1+1 = 2 = \dim(V). \]
If \(V\) and \(W\) are both infinite dimensional vector spaces, it can happen that \(V/W\) is finite dimensional, but it can also happen that \(V/W\) is infinite dimensional.
Example 12.1.20
Let \(V=F[x]\), which we saw in Example 12.1.14 is an infinite dimensional vector space over \(F\). Fix a polynomial \(f\) with \(\deg(f)=d\), and note that the ideal \((f)\) of \(F[x]\) generated by \(f\) is also an \(F\)-vector subspace of \(F[x]\) via restriction of scalars. We will show later that \(\dim(F[x]/(f))=d\). In contrast, the subspace \(E\) of all even degree polynomials in \(F[x]\) together with the zero polynomial satisfies \(\dim(F[x]/E)=\infty\).
Definition 12.1.21
Let \(T\!: V \to W\) be a linear transformation. The nullspace of \(T\) is \(\ker(T)\). The rank of \(T\) is \(\dim(\mathrm{im}(T))\).
Corollary 12.1.22 (Rank-Nullity Theorem)
Let \(f\!: V \to W\) be a linear transformation. Then \[ \dim(\ker(f)) + \dim(\mathrm{im}(f)) = \dim(V). \]
Proof of Corollary 12.1.22
By the First Isomorphism Theorem for modules we have \(V/\ker(f)\cong\mathrm{im}(f)\), thus \[ \dim\left(V/\ker(f)\right)=\dim(\mathrm{im}(f)). \] By Theorem 12.1.18, we have \[ \dim(V)=\dim(\ker(f))+\dim\left(V/\ker(f)\right). \] Thus \[ \dim(V)=\dim(\ker(f))+\dim\left(V/\ker(f)\right) = \dim(\ker(f)) + \dim(\mathrm{im}(f)). \]
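For a concrete instance of the Rank-Nullity Theorem, one can compute both sides for an explicit matrix. The sketch below (assuming SymPy is available; the matrix is our own example) treats \(f: \mathbb{Q}^4 \to \mathbb{Q}^3\) given by multiplication by \(A\):

```python
import sympy as sp

# A concrete instance of the Rank-Nullity Theorem (Corollary 12.1.22):
# f: Q^4 -> Q^3 is multiplication by the matrix A below (our own example).

A = sp.Matrix([[1, 2, 3, 4],
               [2, 4, 6, 8],    # twice the first row, so the rank drops
               [0, 1, 0, 1]])

rank = A.rank()               # dim(im f)
nullity = len(A.nullspace())  # dim(ker f), from an explicit kernel basis
print(rank, nullity, rank + nullity)  # 2 2 4
```

Here \(\dim(\ker f) + \dim(\mathrm{im}\, f) = 2 + 2 = 4 = \dim(\mathbb{Q}^4)\), as the theorem predicts.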
12.2 Linear transformations and homomorphisms between free modules
Definition 12.2.1 (The matrix of a homomorphism between free modules)
Let \(R\) be a commutative ring. Let \(V\) be a finitely generated free \(R\)-module of rank \(n\), and let \(W\) be a finitely generated free \(R\)-module of rank \(m\). Let \(B=\{b_1, \dots, b_n\}\) and \(C=\{c_1, \dots, c_m\}\) be ordered bases of \(V\) and \(W\), respectively. Given an \(R\)-module homomorphism \(f: V \to W\), we define elements \(a_{j,i}\in R\) for \(1 \leqslant j \leqslant m\) and \(1 \leqslant i \leqslant n\) by the formulas
\[ f(b_i) = \sum_{j=1}^m a_{j,i} c_j. \tag{12.2.1}\label{eq-12-2-aij} \]The matrix
\[ [f]_B^C= \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \\ \end{bmatrix} \]is said to represent the homomorphism \(f\) with respect to the bases \(B\) and \(C\).
Remark 12.2.2
By Lemma 11.4.11, the coefficients \(a_{j,i}\) in equation \(\eqref{eq-12-2-aij}\) are uniquely determined by the \(f(b_i)\) and the elements of \(C\). The coefficients \(a_{j,i}\) corresponding to \(f(b_i)\) form the \(i\)th column of \([f]_B^C\). Note that \([f]_B^C\) is an \(m\times n\) matrix with entries in \(R\).
Definition 12.2.3
Let \(V\) and \(W\) be finite-dimensional \(F\)-vector spaces of dimensions \(n\) and \(m\) with ordered bases \(B\) and \(C\), respectively, and let \(f\!:V\to W\) be a linear transformation. The matrix \([f]_B^C\) is called the matrix of the linear transformation \(f\) with respect to the bases \(B\) and \(C\).
Example 12.2.4
If \(\mathrm{id}_V\!: V \to V\) is the identity automorphism of an \(n\)-dimensional free \(R\)-module \(V\), then for any basis \(B\) of \(V\) we have \(\mathrm{id}_V(b_i) = b_i\) for all \(i\) and hence
\[ [\mathrm{id}_V]^B_B = I_n. \]
Example 12.2.5
Let \(P_3\) denote the \(F\)-vector space of polynomials of degree at most 3 (including the zero polynomial) and consider the linear transformation \(d:P_3\to P_3\) given by taking the derivative, \(d(f)=f'\). Let \(B=\{1,x,x^2,x^3\}\). Then
\[ [d]_B^B= \begin{bmatrix} 0 & 1 &0 & 0 \\ 0 &0 & 2 & 0 \\ 0& 0& 0& 3 \\ 0 & 0 &0 & 0 \\ \end{bmatrix}. \]
Example 12.2.6
Let \(F\) be a field and consider a linear transformation \(f\!:V\to W\), where \(V=F^n\) and \(W=F^m\). Consider also the standard ordered bases \(B\) and \(C\), i.e. \(b_i=e_i\in V\) and \(c_i=e_i\in W\). Then for any
\[ v = \begin{bmatrix} \ell_1\\ \vdots\\ \ell_n \end{bmatrix} =\sum_i \ell_i b_i \]in \(V\) we have
\[ f \left( \sum \ell_i b_i \right) = \sum_i \ell_i f(b_i). \]Each \(f(b_i)\) can be written uniquely as a linear combination of the \(c_j\)'s as in \(\eqref{eq-12-2-aij}\):
\[ f(b_i) = \sum_j a_{j,i} c_j. \]Then we get
\[ f(v) = \sum_i\ell_i\left( \sum_{j} a_{j,i} c_j \right)= \sum_j \left(\sum_i a_{j,i} \ell_i\right) c_j. \]In other words, we have
\[ f(v) = \begin{bmatrix} \sum_i a_{1,i} \ell_i \\ \vdots\\ \sum_i a_{m,i} \ell_i \end{bmatrix} = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \\ \end{bmatrix} \cdot \begin{bmatrix} \ell_1\\\vdots\\ \ell_n\end{bmatrix} =[f]_B^C\cdot v. \]This says that any linear transformation \(f\!:F^n\to F^m\) is given by multiplication by a matrix, since we noticed above that \(f(v) = [f]_B^C\cdot v\). The same type of statement holds for free modules over commutative rings, and we will show it below in Proposition 12.2.7.
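The formula \(f(v) = [f]_B^C\cdot v\) can be tested on Example 12.2.5. The following sketch (our own check, assuming SymPy is available) compares the coordinates of \(d(p)\) predicted by the matrix with the actual derivative:

```python
import sympy as sp

# Check that applying the derivative map d of Example 12.2.5 agrees with
# multiplying coordinates by its matrix [d]_B^B, where B = {1, x, x^2, x^3}.

x = sp.symbols('x')
B = [sp.Integer(1), x, x**2, x**3]

D = sp.Matrix([[0, 1, 0, 0],
               [0, 0, 2, 0],
               [0, 0, 0, 3],
               [0, 0, 0, 0]])       # [d]_B^B

coords = sp.Matrix([5, 1, 0, -2])   # p = 5 + x - 2x^3 in B-coordinates
p = sum(c * b for c, b in zip(coords, B))

predicted = D * coords                                 # coordinates of d(p)
actual = sp.Poly(sp.diff(p, x), x).all_coeffs()[::-1]  # low degree first
actual += [0] * (4 - len(actual))                      # pad up to degree 3

print(list(predicted), actual)  # [1, 0, -6, 0] [1, 0, -6, 0]
```

Both computations produce the coordinates of \(p' = 1 - 6x^2\) in the basis \(B\).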
Proposition 12.2.7
Let \(R\) be a commutative ring . Let \(V\) and \(W\) be finitely generated free \(R\)-modules of ranks \(n\) and \(m\) respectively. Fixing ordered bases \(B\) for \(V\) and \(C\) for \(W\) gives an isomorphism of \(R\)-modules
\[ \mathrm{Hom}_R(V, W) \cong \mathrm{Mat}_{m,n}(R) \qquad f\mapsto [f]_B^C. \]If \(V=W\), so that in particular \(m=n\), and \(B=C\), then the above map is an \(R\)-module isomorphism \(\mathrm{End}_R(V)\cong\mathrm{Mat}_n(R)\), and an isomorphism of rings as well.
Proof of Proposition 12.2.7
Let \(\varphi\!:\mathrm{Hom}_R(V, W) \to \mathrm{Mat}_{m,n}(R)\) be defined by \(\varphi(f)=[f]_B^C\). We need to check that \(\varphi\) is a homomorphism of \(R\)-modules, which translates into \([f+g]_B^C=[f]_B^C+[g]_B^C\) and \([\lambda f]_B^C=\lambda[f]_B^C\) for any \(f,g \in \mathrm{Hom}_R(V, W)\) and \(\lambda\in R\). Let \(A=[f]_B^C\) and \(A'=[g]_B^C\). Then
\[ (f+g)(b_i)=f(b_i)+g(b_i)= \sum_j a_{j,i} c_j+ \sum_j a'_{j,i} c_j= \sum_j (a_{j,i}+a'_{j,i}) c_j \]gives \([f+g]_B^C=A+A'\) and
\[ (\lambda f)(b_i)=\lambda\left( \sum_j a_{j,i} c_j\right)= \sum_j (\lambda a_{j,i}) c_j \]gives \([\lambda f]_B^C=\lambda A\). We leave the proof that for \(f,g\in \mathrm{End}_R(V)\) we have \([f\circ g]_B^B=[f]_B^B[g]_B^B\) as an exercise.
Finally, the argument described in Example 12.2.6 also works over any commutative ring \(R\), and it can be adapted to any two chosen bases \(B\) and \(C\), showing that \(\varphi\) is a bijection.
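The compatibility of composition with matrix multiplication, stated as an exercise in the proof above, is easy to test numerically. A small sketch (our own example matrices, assuming NumPy is available):

```python
import numpy as np

# Test that composing linear maps corresponds to multiplying their
# matrices: f: R^2 -> R^3 and g: R^3 -> R^2 in the standard bases.

F = np.array([[1, 0],
              [2, 1],
              [0, 3]])       # matrix of f (3 x 2)
G = np.array([[1, 1, 0],
              [0, 1, 1]])    # matrix of g (2 x 3)

v = np.array([1, -1])

gofv = G @ (F @ v)                   # apply f, then g
assert (gofv == (G @ F) @ v).all()   # same as multiplying by the product GF
print(gofv)
```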
Corollary 12.2.8
For any field \(F\) and finite-dimensional \(F\)-vector spaces \(V\) and \(W\) of dimensions \(n\) and \(m\) respectively, \(\dim(\mathrm{Hom}_F(V, W))=mn\).
Proof of Corollary 12.2.8
The isomorphism \(\mathrm{Hom}_F(V, W) \cong \mathrm{Mat}_{m,n}(F)\) gives
\[ \dim \left(\mathrm{Hom}_F(V, W) \right) = \dim \left( \mathrm{Mat}_{m,n}(F) \right)=mn. \]
Exercise 12.2.9
Let \(R\) be a commutative ring and \(V\) be a free module with a basis \(B\). Let \(M\) be an arbitrary \(R\)-module and let \(\phi: V\to M\) be an \(R\)-module homomorphism. Then:
- \(\phi\) is injective if and only if \(\phi(B)\) is linearly independent.
- \(\phi\) is surjective if and only if \(\phi(B)\) generates \(M\).
- \(\phi\) is an isomorphism if and only if \(\phi(B)\) is a basis for \(M\).
Definition 12.2.10
Let \(R\) be a commutative ring and \(V\) be a free module with basis \(B=\{b_1,\dots,b_n\}\). Consider the \(R\)-module homomorphism \(\phi:V\to R^n\) with \(\phi(b_i)=e_i\). There is a unique such map by the UMP for free modules, and it is an isomorphism by the previous exercise. We call \(\phi(v)\) the vector of \(B\)-coordinates of \(v\), denoted \([v]_B\).
Remark 12.2.11
Note that
\[ [v]_B = (r_1,\dots,r_n) \Longleftrightarrow v = r_1 b_1 + \cdots + r_n b_n, \]since \(\phi(r_1 b_1 + \cdots + r_n b_n) = r_1 e_1 + \cdots + r_n e_n = (r_1,\dots,r_n)\) and \(\phi\) is injective.
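Concretely, computing \([v]_B\) amounts to solving the linear system whose coefficient matrix has the basis vectors as columns. A sketch (our own example, assuming NumPy is available):

```python
import numpy as np

# Computing B-coordinates (Definition 12.2.10): for the basis
# B = {(1, 1), (1, -1)} of R^2 (our own example), [v]_B solves the linear
# system Q r = v, where the columns of Q are the basis vectors.

Q = np.array([[1.0, 1.0],
              [1.0, -1.0]])
v = np.array([3.0, 1.0])

coords = np.linalg.solve(Q, v)       # [v]_B
print(coords)                        # [2. 1.]
assert np.allclose(Q @ coords, v)    # r1*b1 + r2*b2 recovers v
```

Indeed, \(v = 2\,(1,1) + 1\,(1,-1)\), so \([v]_B = (2,1)\).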
Proposition 12.2.12
Let \(R\) be a commutative ring. Let \(V\) be a free module with ordered basis \(B\) and \(W\) be a free module with ordered basis \(C\). Let \(f:V\to W\) be a linear transformation. Then
\[ [f(v)]_C = [f]_B^C \cdot [v]_B \]for all \(v\in V\).
Proof of Proposition 12.2.12
Let \(v\in V\) and write \([v]_B = (r_1,\dots,r_n)\), so \(v=\sum_j r_j b_j\). Write \([f]_B^C=[a_{i,j}]\). Then
\[ f(v) = f\!\left(\sum_j r_j b_j\right) = \sum_j r_j f(b_j) = \sum_j r_j \left(\sum_i a_{i,j} c_i\right) = \sum_i \left(\sum_j a_{i,j} r_j\right) c_i. \]Thus the \(i\)-th entry of \([f(v)]_C\) is \(\sum_j a_{i,j} r_j\). On the other hand, multiplying out \([f]_B^C \cdot [v]_B = [a_{i,j}](r_1,\dots,r_n)\), the \(i\)-th entry is also \(\sum_j a_{i,j} r_j\).
12.3 Change of basis
Definition 12.3.1
Let \(V\) be a finitely generated free module over a commutative ring \(R\), and let \(B\) and \(C\) be bases of \(V\). Let \(\mathrm{id}_V\) be the identity map on \(V\). Then \([\mathrm{id}_V]_B^{C}\) is a matrix called the change of basis matrix from \(B\) to \(C\).
In Theorem 12.3.6 we will show that \([\mathrm{id}_V]_B^{C}\) is invertible with inverse \(\left([\mathrm{id}_V]_B^{C}\right)^{-1}=[\mathrm{id}_V]_{C}^B\).
Example 12.3.2
Consider the subspace \(V = P_2\) of \(F[x]\) of all polynomials of degree up to \(2\), and the bases \(B = \{1, x, x^2\}\) and \(C = \{1,x-2,(x-2)^2\}\) of \(V\). We calculate the change of basis matrix. We have
\[ \begin{aligned} \mathrm{id}_V(1) &=1 ,\\ \mathrm{id}_V(x) &=2\cdot1+1\cdot(x-2), \\ \mathrm{id}_V(x^2) &=4\cdot1 +4\cdot(x-2)+1\cdot(x-2)^2. \end{aligned} \]Thus, the change of basis matrix is given by \([\mathrm{id}_V]_B^{C} = \begin{bmatrix} 1 & 2 & 4\\ 0 & 1 & 4\\ 0 & 0 & 1 \end{bmatrix}.\)
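The computation in this example can be verified mechanically: expand each element of \(B\) in powers of \(x-2\) and compare with the columns of \([\mathrm{id}_V]_B^{C}\). A sketch assuming SymPy is available (the verification strategy is our own):

```python
import sympy as sp

# Verify the change of basis matrix of Example 12.3.2: expand each element
# of B = {1, x, x^2} in powers of t = x - 2 and compare the coefficients
# with the corresponding column of [id]_B^C.

x, t = sp.symbols('x t')

P = sp.Matrix([[1, 2, 4],
               [0, 1, 4],
               [0, 0, 1]])          # claimed [id]_B^C

for j, b in enumerate([sp.Integer(1), x, x**2]):
    expanded = sp.expand(b.subs(x, t + 2))       # b rewritten in terms of t
    c = sp.Poly(expanded, t).all_coeffs()[::-1]  # coefficients, low degree first
    c += [0] * (3 - len(c))                      # pad with zeros up to degree 2
    assert c == list(P[:, j]), (j, c)

print("change of basis matrix verified")
```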
Lemma 12.3.3
If \(V,W,U\) are finitely generated free \(R\)-modules with ordered bases \(B\), \(C\), and \(D\), respectively, and \(f\!: V \to W\) and \(g\!: W \to U\) are \(R\)-module homomorphisms, then \([g\circ f]_B^D=[g]_C^D \cdot [f]_B^C.\)
Proof of Lemma 12.3.3
It suffices to check that \([g\circ f]_B^D \cdot p=[g]_C^D \cdot [f]_B^C \cdot p\) for any \(p\in R^n\) where \(n=\mathrm{rank}(V)\). (In fact, we can just take \(p=e_j\) for each \(j\), since \(Ae_j\) is the \(j\)th column of \(A\).) We can write \(p=[v]_B\) for some \(v\in V\). Then
\[ [g\circ f]_B^D [v]_B = [(g\circ f)(v)]_D = [g(f(v))]_D = [g]_C^D [f(v)]_C = [g]_C^D ([f]_B^C [v]_B) = ([g]_C^D [f]_B^C) [v]_B. \]
Definition 12.3.4
Let \(V\) be a finitely generated free module over a commutative ring \(R\). Two \(R\)-module homomorphisms \(f,g: V \to V\) are similar if there is a bijective linear transformation \(h: V \to V\) such that \(g = h\circ f \circ h^{-1}\). Two \(n \times n\) matrices \(A\) and \(B\) with entries in \(R\) are similar if there is an invertible \(n \times n\) matrix \(P\) such that \(B = PAP^{-1}\).
Remark 12.3.5
For elements \(A,B\in \textrm{GL}_n(R)\), the notions of similar and conjugate are the same.
Theorem 12.3.6
Let \(V, W\) be finitely generated free modules over a commutative ring \(R\), let \(B\) and \(B'\) be bases of \(V\), let \(C\) and \(C'\) be bases of \(W\), and let \(f: V \to W\) be a homomorphism. Then
\[ [f]_{B'}^{C'} = [\mathrm{id}_W]_C^{C'} [f]_B^C [\mathrm{id}_V]_{B'}^{B} \]In particular, if \(g\!: V \to V\) is an \(R\)-module homomorphism, then \([g]_B^B\) and \([g]_{B'}^{B'}\) are similar.
Proof of Theorem 12.3.6
Since \(f=\mathrm{id}_W\circ f\circ \mathrm{id}_V\), by Lemma 12.3.3 we have
\[ [f]_{B'}^{C'} = [\mathrm{id}_W]_C^{C'} [f]_B^C [\mathrm{id}_V]_{B'}^{B}. \]
Now set \(V=W,B=C,B'=C'\) and \(f=g\) in the displayed equation to obtain
\[ [g]_{B'}^{B'} = [\mathrm{id}_V]_B^{B'} [g]_B^B [\mathrm{id}_V]_{B'}^{B}=P[g]_B^B P^{-1}, \quad \text{where } P = [\mathrm{id}_V]_B^{B'}. \]
We now come to certain special changes of basis and their matrices:
Definition 12.3.7
Let \(R\) be a commutative ring, let \(M\) be a free \(R\)-module of finite rank \(n\), and let \(B = \{b_1,\dots ,b_n\}\) be an ordered basis for \(M\). An elementary basis change operation on the basis \(B\) is one of the following three types of operations to produce a new basis \(B'=\{b'_1,\dots,b'_n\}\):
- Replacing \(b_j\) by \(r b_i + b_j\) for some \(i \neq j\) and some \(r\in R\); that is, \(b'_j = rb_i + b_j\) and \(b'_k = b_k\) for \(k\neq j\).
- Replacing \(b_i\) by \(ub_i\) for some \(i\) and some unit \(u\) of \(R\); that is, \(b'_i= u b_i\) and \(b'_k = b_k\) for \(k\neq i\).
- Swapping the indices of \(b_i\) and \(b_j\) for some \(i \neq j\); that is, \(b'_i= b_j\), \(b'_j=b_i\), and \(b'_k = b_k\) for \(k\neq i,j\).
Definition 12.3.8
Let \(R\) be a commutative ring. An elementary column operation on a matrix \(A \in \mathrm{Mat}_{m,n}(R)\) is one of the following three types of operations:
- Adding an element of \(R\) times a column of \(A\) to a different column of \(A\).
- Multiplying a column of \(A\) by a unit of \(R\).
- Interchanging two columns of \(A\).
We define an elementary row operation analogously.
Definition 12.3.9
Let \(R\) be a commutative ring. An elementary matrix over \(R\) is an \(n \times n\) matrix of one of the following three forms:
- For \(r \in R\) and \(1 \leqslant i,j \leqslant n\) with \(i \neq j\), let \(E_{i,j}(r)\) be the matrix with \(1\)s on the diagonal, \(r\) in the \((i,j)\) position, and \(0\) everywhere else.
- For \(u \in R^\times\) and \(1\leqslant i \leqslant n\) let \(E_i(u)\) denote the matrix with \((i,i)\) entry \(u\), \((j,j)\) entry \(1\) for all \(j \neq i\), and \(0\) everywhere else.
- For \(1 \leqslant i,j \leqslant n\) with \(i \neq j\), let \(E_{(i,j)}\) denote the matrix with \(1\) in the \((i,j)\) and \((j,i)\) positions and in the \((l,l)\) positions for all \(l\not \in \{i,j\}\), and \(0\) in all other entries.
Remark 12.3.10
The elementary matrices \(E_i(u)\) and \(E_{(i,j)}\) are symmetric and the transpose of \(E_{i,j}(r)\) is \(E_{j,i}(r)\). In particular, the transpose of an elementary matrix is an elementary matrix.
Lemma 12.3.11
Let \(E\) be an \(n \times n\) elementary matrix.
- \(E\) is the change of basis matrix \([\mathrm{id}]_{B'}^B\) for the corresponding elementary basis change operation from \(B\) to \(B'\).
- If \(A \in \mathrm{Mat}_{m,n}(R)\), then the result of performing the corresponding elementary column operation on \(A\) is the product matrix \(AE\). Explicitly,
- \(AE_{i,j}(r)\) is the matrix obtained from \(A\) by replacing \[ \mathrm{col}_j(A) \quad \rightsquigarrow \quad \mathrm{col}_j(A) + r \cdot \mathrm{col}_i(A). \]
- \(AE_{i}(u)\) is the matrix obtained from \(A\) by replacing \[ \mathrm{col}_i(A)\quad \rightsquigarrow \quad u \cdot \mathrm{col}_i(A). \]
- \(A E_{(i,j)}\) is the matrix obtained from \(A\) by replacing \[ \begin{aligned} &\mathrm{col}_i(A)\quad \rightsquigarrow \quad \mathrm{col}_j(A)\\ &\mathrm{col}_j(A)\quad \rightsquigarrow \quad \mathrm{col}_i(A) \end{aligned} \]
- If \(B \in \mathrm{Mat}_{n,q}(R)\), then the result of performing the corresponding elementary row operation on \(B\) is the product matrix \(E^T B\). Explicitly,
- \(E_{i,j}(r) B\) is the matrix obtained from \(B\) by replacing \[ \mathrm{row}_i(B) \quad \rightsquigarrow \quad \mathrm{row}_i(B) + r \cdot \mathrm{row}_j(B). \]
- \(E_{i}(u) B\) is the matrix obtained from \(B\) by replacing \[ \mathrm{row}_i(B)\quad \rightsquigarrow \quad u \cdot \mathrm{row}_i(B). \]
- \(E_{(i,j)} B\) is the matrix obtained from \(B\) by replacing \[ \begin{aligned} &\mathrm{row}_i(B)\quad \rightsquigarrow \quad \mathrm{row}_j(B)\\ &\mathrm{row}_j(B)\quad \rightsquigarrow \quad \mathrm{row}_i(B) \end{aligned} \]
Proof of Lemma 12.3.11
- By definition, the \(j\)-th column of \([\mathrm{id}]_{B'}^B\) gives the coefficients for \(b'_j\) as a linear combination of the elements of \(B\). In each case, we check that the matrix \(E\) agrees with the specified combinations in the definition of the basis operation.
- It suffices to check this for a row vector, since the \(i\)-th row of \(AE\) can be computed as the \(i\)-th row of \(A\) multiplied by \(E\). Then one can verify this by case-by-case multiplication.
- Similar to (2).
Remark 12.3.12
To remember the relationship between elementary matrices and elementary operations, it suffices to remember that
- Row operations correspond to multiplication on the left and column operations correspond to multiplication on the right, and
- The elementary matrix corresponding to an elementary row or column operation is the matrix that results from applying that operation to the identity matrix.
Indeed, (2) follows from taking \(A=I\) or \(B=I\) in the Lemma.
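Both mnemonics can be checked directly in code. A minimal sketch, assuming NumPy and using 0-indexed positions:

```python
import numpy as np

A = np.array([[2, 1, 0],
              [3, 9, 5],
              [1, -2, 7]])
r = 4

# E_{1,2}(r) in the notes' 1-indexed notation: the identity with r at (0, 1).
# It equals the identity matrix with col_2 -> col_2 + r * col_1 applied.
E = np.eye(3, dtype=int)
E[0, 1] = r

B = A @ E                        # right multiplication: the column operation on A
expected = A.copy()
expected[:, 1] += r * A[:, 0]
assert np.array_equal(B, expected)

C = E.T @ A                      # left multiplication by E^T: the row operation
expected = A.copy()
expected[1, :] += r * A[0, :]
assert np.array_equal(C, expected)
```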
12.4 Determinants
We briefly cover some of the key facts about determinants that we will need later.
Definition 12.4.1
Let \(R\) be a commutative ring. We define the function
\[ \det: \mathrm{Mat}_{n\times n}(R) \to R \]by the rule
\[ \det(A) = \sum_{\sigma\in S_n} \mathrm{sgn}(\sigma) \prod_{i=1}^n {a_{i,\sigma(i)}} \]for a matrix \(A=[a_{i,j}]\). We call \(\det(A)\) the determinant of \(A\).
Example 12.4.2
If \(A= \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22}\end{bmatrix}\), then \(\det(A)=a_{11} a_{22} - a_{12} a_{21}\).
If \(A\) is an upper triangular matrix, so that \(a_{ij}=0\) for \(i>j\), then the only possibly nonzero term in the sum is the one for \(\sigma = \mathrm{id}\), and \(\det(A)\) is the product of the diagonal entries.
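The definition translates directly into code. Here is a pure-Python sketch of the Leibniz expansion; it has \(n!\) terms, so it is for illustration only:

```python
from itertools import permutations

def sign(sigma):
    """sgn(sigma) for a permutation given as a tuple, via counting inversions."""
    n = len(sigma)
    inversions = sum(1 for a in range(n) for b in range(a + 1, n)
                     if sigma[a] > sigma[b])
    return -1 if inversions % 2 else 1

def det(A):
    """Leibniz expansion: sum over sigma in S_n of sgn(sigma) * prod a_{i, sigma(i)}."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = sign(sigma)
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total

# The 2 x 2 case recovers a11*a22 - a12*a21.
assert det([[1, 2], [3, 4]]) == 1 * 4 - 2 * 3
# For an upper triangular matrix, only sigma = id contributes.
assert det([[2, 5, 7], [0, 3, 1], [0, 0, 4]]) == 2 * 3 * 4
```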
Definition 12.4.3
Let \(R\) be a commutative ring. Let \(\phi: \underbrace{R^n \times \cdots \times R^n}_{n-\text{times}} \to R\) be a function. We say that
-
\(\phi\) is multilinear if for each \(i=1,\dots,n\) we have
\[ \phi( v_1,\dots,v_{i-1}, v_i + v'_i, v_{i+1},\dots, v_n) = \phi( v_1,\dots,v_{i-1}, v_i, v_{i+1},\dots, v_n) + \phi( v_1,\dots,v_{i-1}, v'_i, v_{i+1},\dots, v_n) \]and
\[ \phi( v_1,\dots,v_{i-1}, r v_i, v_{i+1},\dots, v_n) = r\, \phi( v_1,\dots,v_{i-1}, v_i, v_{i+1},\dots, v_n) \]for all \(v_1,\dots,v_n, v'_i \in R^n\) and \(r\in R\); i.e., when all but one entry is fixed, the function \(R^n \to R\) in the remaining input is an \(R\)-module homomorphism.
-
\(\phi\) is alternating if \(\phi(v_1,\dots,v_n)=0\) whenever \(v_i=v_j\) for some \(i\neq j\).
Lemma 12.4.4
Let \(\phi: \underbrace{R^n \times \cdots \times R^n}_{n-\text{times}} \to R\) be a multilinear alternating function. Then for any \(\sigma\in S_n\) and any vectors \(v_1,\dots,v_n\in R^n\), we have
\[ \phi(v_{\sigma(1)}, v_{\sigma(2)},\dots,v_{\sigma(n)}) = \mathrm{sgn}(\sigma) \phi(v_1, v_2,\dots,v_n). \]Proof of Lemma 12.4.4
First, we consider the case of the transposition \((1\, 2)\). Note that
\[ \begin{aligned} 0 &= \phi(v_1+v_2, v_1+ v_2,\dots,v_n) = \phi(v_1, v_1+ v_2,\dots,v_n) + \phi(v_2, v_1+ v_2,\dots,v_n) \\ & = \phi(v_1, v_1,\dots,v_n) + \phi(v_2, v_1,\dots,v_n) + \phi(v_1, v_2,\dots,v_n) + \phi(v_2, v_2,\dots,v_n) \\ & = \phi(v_2, v_1,\dots,v_n) + \phi(v_1, v_2,\dots,v_n) , \end{aligned} \]so \(\phi(v_2, v_1,\dots,v_n) = - \phi(v_1, v_2,\dots,v_n)\). The case of an arbitrary transposition follows in the same way. For an arbitrary permutation \(\sigma\), we can write \(\sigma\) as a product of \(t\) transpositions for some \(t\). Applying the case of one transposition \(t\) times yields \[ \phi(v_{\sigma(1)},\dots,v_{\sigma(n)}) = (-1)^t \phi(v_1,\dots,v_n) = \mathrm{sgn}(\sigma)\, \phi(v_1,\dots,v_n). \]
Theorem 12.4.5
Let \(R\) be a commutative ring. Identify \(\mathrm{Mat}_{n\times n}(R)\) with \(\underbrace{R^n \times \cdots \times R^n}_{n-\text{times}}\) by mapping \(A\) to the \(n\)-tuple of columns of \(A\). Then \(\det\) is the unique function \(\mathrm{Mat}_{n\times n}(R) \to R\) that is multilinear, alternating, and satisfies \(\det(I) = 1\).
Proof of Theorem 12.4.5 (Sketch)
The verification that \(\det\) has these properties is straightforward but messy. To show uniqueness, we can use multilinearity to show that the value of a function with these properties is determined by its values when each column is a standard basis vector \(e_i\). We can then use the alternating property and Lemma 12.4.4 to show that the value is determined by the value at the identity matrix.
Our next goal is to prove the familiar multiplicative property for determinants.
Proposition 12.4.6
Let \(R\) be a commutative ring. Let \(A\) be a square matrix and let \(B\) be a matrix obtained from \(A\) by a single elementary column operation:
- If the operation is of type I, \(\det(B) = \det(A)\).
- If the operation is of type II, given by multiplying a column of \(A\) by a unit \(u\), then \(\det(B) = u \det(A)\).
- If the operation is of type III, \(\det(B) = - \det(A)\).
In particular, if \(A\) is an arbitrary matrix and \(E\) is an elementary matrix, then \(\det(AE)=\det(A)\det(E)\).
Proof of Proposition 12.4.6
The first case follows from multilinearity and alternating properties: For notational simplicity say \(A = (v_1, v_2, \dots)\) and \(B = (v_1 + rv_2, v_2, \dots)\). Then
\[ \det(B) = \det(v_1, v_2, \dots) + r \det(v_2, v_2, \dots) = \det(A) + r \cdot 0 = \det(A) \]The second case is immediate from (the second part of) \(R\)-multilinearity. The last case is a special case of Lemma 12.4.4.
The final claim comes from noting that \(\det(E)=1,u,-1\) in the three cases, respectively.
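A quick numerical check of all three cases, assuming SymPy is available for exact integer arithmetic:

```python
from sympy import Matrix, eye

A = Matrix([[2, 1, 0], [3, 9, 5], [1, -2, 7]])   # det(A) = 130

# Type I: add 4 * (column 1) to column 2, i.e. right-multiply by E_{1,2}(4).
E1 = eye(3)
E1[0, 1] = 4
assert (A * E1).det() == A.det()

# Type II: scale column 1 by the unit u = -1, i.e. right-multiply by E_1(-1).
E2 = eye(3)
E2[0, 0] = -1
assert (A * E2).det() == -A.det()

# Type III: swap columns 1 and 2, i.e. right-multiply by E_{(1,2)}.
E3 = Matrix([[0, 1, 0], [1, 0, 0], [0, 0, 1]])
assert (A * E3).det() == -A.det()

# In every case, det(A E) = det(A) det(E).
for E in (E1, E2, E3):
    assert (A * E).det() == A.det() * E.det()
```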
Corollary 12.4.7
For \(R = F\) a field, we have \(\det(A) \ne 0\) if and only if \(A\) is invertible.
Proof of Corollary 12.4.7
If \(A\) is not invertible, then the span of the columns of \(A\) is a proper subspace of \(F^n\) and hence the columns of \(A\) must be linearly dependent. Say the \(i\)-th column is a linear combination of the rest: \(v_i = \sum_{j \ne i} c_j v_j\). Then
\[ \det(v_1, \dots, v_n) = \sum_{j \ne i} c_j \det(\text{a matrix with the \(i\)-th and \(j\)-th columns equal}) = 0. \]If \(A\) is invertible, then we can write \(A\) as a product of elementary matrices (this is a result that we stated before, but will prove soon). The result thus follows from Proposition 12.4.6 and the fact that \(\det(I_n) = 1\).
Theorem 12.4.8
Let \(R\) be a commutative ring. Then for any matrices \(A, B \in \mathrm{Mat}_{n \times n}(R)\) we have
\[ \det(AB) = \det(A) \det(B). \]Proof of Theorem 12.4.8
First we will consider the case where \(R=F\) is a field.
If \(A\) is not invertible, neither is \(AB\), since \(\mathrm{im}(AB) \subseteq \mathrm{im}(A)\), and if \(B\) is not invertible, neither is \(AB\), since \(\ker(AB) \supseteq \ker(B)\). So, by Corollary 12.4.7, if either \(A\) or \(B\) is not invertible, both sides of the equation are \(0\).
Assume now that \(A\) and \(B\) are both invertible. Since we are working over a field, each can be written as a product of elementary matrices (the fact cited in the proof of Corollary 12.4.7), so we have
\[ A = E_1 \cdots E_n \]and
\[ B = F_1 \cdots F_m \]and hence
\[ AB = E_1 \cdots E_n F_1 \cdots F_m \]where the \(E_i\)'s and \(F_j\)'s are elementary matrices.
Applying Proposition 12.4.6 repeatedly gives
\[ \det(AB) = \det(E_1 \cdots E_n F_1 \cdots F_{m-1}) \det(F_m) = \cdots = \det(E_1) \cdots \det(E_n) \det(F_1) \cdots \det(F_m) \]and similarly
\[ \det(A) \det(B) = \left(\det(E_1) \cdots \det(E_n)\right) \left( \det(F_1) \cdots \det(F_m)\right). \]Now, for an integral domain \(R\), consider its fraction field \(F\), and identify \(R\subseteq F\) as a subring. To compute \(\det(A)\), \(\det(B)\), and \(\det(AB)\) we can replace \(R\) by \(F\), and are done by the field case.
One can apply a similar trick for arbitrary commutative rings, but we'll skip this for now.
Proposition 12.4.9
Let \(R\) be a commutative ring. Let \(A\in \mathrm{Mat}_{n\times m}(R)\) and \(B\in \mathrm{Mat}_{m\times n}(R)\) with \(m\geq n\). For a subset \(I=\{i_1,\dots,i_n\} \subseteq [m]\) with \(|I|=n\), let \(A_I\) denote the submatrix of \(A\) with columns indexed by \(I\) (in increasing order). Then
\[ \det(AB) \in \left( \{ \det(A_I) \mid I\subseteq [m],\ |I|=n\} \right). \]Proof of Proposition 12.4.9
Let \(a_1,\dots,a_m\) be the columns of \(A\). We can write the \(j\)-th column of \(AB\) as \(\sum_{i=1}^m b_{i,j} a_i\). Then, by multilinearity,
\[ \begin{aligned} \det(AB) &= \det \begin{bmatrix} \sum_{i_1=1}^m b_{i_1,1} a_{i_1} & \cdots & \sum_{i_n=1}^m b_{i_n,n} a_{i_n} \end{bmatrix} \\ &= \sum_{1\leq i_1,\dots,i_n \leq m} b_{i_1,1} \cdots b_{i_n,n} \det \begin{bmatrix} a_{i_1} & \cdots & a_{i_n} \end{bmatrix}. \end{aligned} \]By the alternating property, we can rewrite each \(\det\begin{bmatrix} a_{i_1} & \cdots & a_{i_n} \end{bmatrix}\) as either zero, or the determinant of a submatrix with columns \( i_1 < i_2 < \cdots < i_n \), up to sign. This gives \(\det(AB)\) as an \(R\)-linear combination of the determinants \(\det(A_I)\).
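We can sanity-check this over \(R=\mathbb{Z}\), where membership in the ideal generated by the maximal minors means \(\det(AB)\) is divisible by their gcd. A small sketch, assuming SymPy:

```python
from functools import reduce
from itertools import combinations
from math import gcd
from sympy import Matrix

A = Matrix([[2, 4, 6],
            [0, 2, 4]])          # 2 x 3
B = Matrix([[1, 0],
            [0, 1],
            [1, 1]])             # 3 x 2

# Maximal (2 x 2) minors of A, one for each choice of 2 of the 3 columns.
minors = [A[:, list(I)].det() for I in combinations(range(3), 2)]
g = reduce(gcd, (abs(int(m)) for m in minors))

# det(AB) lies in the ideal generated by the maximal minors of A,
# which over Z means it is divisible by their gcd.
assert int((A * B).det()) % g == 0
```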
Exercise 12.4.10
Let \(R\) be a commutative ring and \(A\in \mathrm{Mat}_{n\times n}(R)\). Then \(\det(A) = \det(A^\mathrm{T})\), where \(A^\mathrm{T}\) is the transpose of \(A\).
Chapter 13 Finitely generated modules over PIDs
We have seen that every module over a field is free. In contrast, whenever \(R\) is a commutative ring that is not a field, we can always construct modules that are not free. However, we will see that every module is still a quotient of a free module. Describing that quotient explicitly is giving a presentation for the module, similar to how we gave presentations for groups. We will study the particular case of finitely generated modules over PIDs in more detail.
13.1 The module presented by a matrix
Writing a given \(R\)-module \(M\) as a quotient of a free module is giving a presentation for \(M\). In 817, we studied presentations for groups; these consisted of a set of generators and a set (normal subgroup) of relations among these generators. Presentations are important for modules as well. In this case, the relations are encoded by a matrix, or equivalently by a homomorphism between a pair of free modules. We study below how the change of basis techniques can be applied to unravel the structure of a module starting with its presentation.
Definition 13.1.1
Let \(R\) be a commutative ring with \(1 \neq 0\), let \(A \in \mathrm{Mat}_{m,n}(R)\), and let \(t_A\!: R^n \to R^m\) be the \(R\)-module homomorphism represented by \(A\) with respect to the standard bases; i.e., the homomorphism given by the rule \(t_A(v)=Av\). The \(R\)-module presented by \(A\) is the \(R\)-module \(R^m/\mathrm{im}(t_A)\).
The \(R\)-module \(M\) presented by \(A \in \mathrm{Mat}_{m,n}(R)\) has \(m\) generators and \(n\) relations. Each row of \(A\) corresponds to a generator for \(M\), while each column encodes a relation among those generators. More precisely, the relations among the \(m\) generators are themselves generated by the \(n\) generators of \(\mathrm{im}(t_A)\), which are the images of the standard basis of \(R^n\) by \(t_A\).
Example 13.1.2
The \(\mathbb{Z}\)-module \(M = \mathbb{Z}/6\) is presented by
\[ \mathbb{Z} \xrightarrow{6} \mathbb{Z}, \]since \(M \cong \mathbb{Z}/ \mathrm{im}(t_6) = \mathbb{Z}/(6)\). Notice here we abused notation and wrote \(6\) instead of the \(1 \times 1\) matrix \(\left[ 6 \right]\).
Example 13.1.3
Let \(R = k[x,y]\), where \(k\) is a field, and \(I = (x,y)\). The \(R\)-module \(M = R/I\) has \(1\) generator, \(m = 1+I\), so we can write a presentation for \(M\) of the form \(F \xrightarrow{p} R\) for some free module \(F\) and some \(R\)-module homomorphism \(p\). To find such an \(F\), we need to ask about the relations among the generators of \(M\). For any \(a \in I\), we have the relation \(am = 0\), so \(I\) is the module of relations for this presentation of \(M\).
How many generators does the module of relations have? In this case, we need \(2\): the relations \(xm = 0\) and \(ym=0\) generate all the relations, since for any \(a \in I\), we can write \(a = rx+sy\) for some \(r, s \in R\), and thus \(am = 0\) can be rewritten as \(r(xm) + s(ym) = 0\), which is a linear combination of the two relations \(xm=0\) and \(ym=0\). Finally, we have the following presentation for \(M\):
\[ R^2 \xrightarrow{\begin{bmatrix} x & y \end{bmatrix}} R. \]Indeed, the image of \(\begin{bmatrix} x & y \end{bmatrix}\) is \((x,y)\), and \(M \cong R/(x,y)\).
Conversely, we might be given a matrix and ask what module it presents; one thing to keep in mind is that some presentations might be inefficient, either by having more generators or more relations than necessary. We want to answer two key questions: given a presentation for a module, how do we find a more efficient presentation? And how do we decide whether two different presentations give isomorphic modules? Keeping these goals in mind, let's try a more elaborate example.
Example 13.1.4
Consider the matrix
\[ A = \begin{bmatrix} 2 & 1 & 0 \\ 3 & 9 & 5 \\ 1 & -2 & 7 \\ 0 & 1 & 2 \\ \end{bmatrix}. \]What \(\mathbb{Z}\)-module \(M\) is presented by \(A\)? Formally, \(M\) is the quotient module \(M=\mathbb{Z}^4/\mathrm{im}(t_A)\), where \(t_A \!:\mathbb{Z}^3 \to \mathbb{Z}^4\) is defined by \(t_A(v)=Av\). Since \(\mathbb{Z}^4\) is generated by its standard basis elements \(\{e_1,e_2,e_3,e_4\}\), we deduce, as shown earlier, that \(M=\mathbb{Z}^4/\mathrm{im}(t_A)\) is generated by the cosets of the \(e_i\). To keep the notation short, we set \(m_i=e_i+\mathrm{im}(t_A)\).
Let \(N=\mathrm{im}(t_A)\) and note that \(N\) is the submodule of \(\mathbb{Z}^4\) generated by the columns of \(A\):
\[ N=R\left\{\begin{bmatrix} 2 \\ 3 \\ 1\\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 9 \\ -2 \\1 \end{bmatrix},\begin{bmatrix} 0 \\ 5 \\ 7\\2 \end{bmatrix}\right\}=R\{2e_1 + 3e_2 + e_3,e_1 + 9e_2 -2 e_3 + e_4, 5e_2 + 7 e_3 + 2 e_4 \}. \]Since \(N\) maps to \(0\) under the quotient map \(q\!:\mathbb{Z}^4\to M=\mathbb{Z}^4/N\), the relations of \(M\) can be written as
\[ \begin{cases} 2m_1 + 3m_2 + m_3 & = 0 \\ m_1 + 9m_2 -2 m_3 + m_4 & = 0 \\ 5m_2 + 7 m_3 + 2 m_4 & = 0. \end{cases} \]We can now see that this is a rather inefficient presentation, since we can clearly use the first equation to solve for \(m_3=-2m_1 - 3m_2\). This implies that \(M\) can be generated using only \(m_1, m_2\) and \(m_4\), that is
\[ M=R\{m_1,m_2,m_3,m_4\}=R\{m_1,m_2,m_4\}. \]This eliminates the first equation and the latter two become
\[ \begin{cases} 5m_1 + 15m_2 + m_4 & = 0 \\ -14m_1 -16m_2 + 2m_4 & = 0 \\ \end{cases} \]Now we can also eliminate \(m_4\), leaving just two generators \(m_1, m_2\) that satisfy
\[ -24m_1 -46m_2 = 0. \]Another way to do this is to look at the matrix \(A\) and use elementary row operations to "make zeros" in the 1st and 2nd columns, as follows:
\[ A = \begin{bmatrix} 2 & 1 & 0 \\ 3 & 9 & 5 \\ 1 & -2 & 7 \\ 0 & 1 & 2 \\ \end{bmatrix} \rightarrow \begin{bmatrix} 0 & 5 & -14 \\ 0 & 15 & -16 \\ 1 & -2 & 7 \\ 0 & 1 & 2 \\ \end{bmatrix} \rightarrow \begin{bmatrix} 0 & 0 & -24 \\ 0 & 0 & -46 \\ 1 & 0 & 11\\ 0 & 1 & 2 \\ \end{bmatrix} \]Eliminating the generators \(m_3\) and \(m_4\) amounts to dropping the first two columns (which are the 3rd and 4th standard basis vectors) as well as the last two rows. As we will prove soon, this shows that the \(\mathbb{Z}\)-module presented by \(A\) is isomorphic to the \(\mathbb{Z}\)-module presented by
\[ B= \begin{bmatrix} -24 \\ - 46 \end{bmatrix}. \]We can go further. Set \(m_1' := m_1 + 2m_2\). Then \(m_1'\) and \(m_2\) also form a generating set of \(M\). The relation on \(m_1, m_2\) translates to
\[ -24m'_1 +2 m_2 = 0 \]given by the matrix
\[ C = E_{2,1}(-2)B= \begin{bmatrix} -24 \\ 2 \end{bmatrix}. \]Note that we have done a row operation (subtract twice row 1 from row 2) to get from \(B\) to \(C\). Continuing in this fashion by adding 12 times row 2 to row 1, we also form
\[ D = E_{1,2}(12)C= \begin{bmatrix} 0 \\ 2 \end{bmatrix}. \]The last matrix \(D\) presents the module \(M'=\mathbb{Z}^2/\mathrm{im}(t_D)\) with generators \(a, b\), where
\[ a=e_1+\mathrm{im}(t_D), \quad b=e_2+\mathrm{im}(t_D) \]and relation \(2b = 0\). This module \(M'\) is isomorphic to our original module \(M\). As we will see, this proves \(M \cong \mathbb{Z} \oplus \mathbb{Z}/2\). An explicit isomorphism between \(M'\) and \(\mathbb{Z} \oplus \mathbb{Z}/2\) is induced by the unique \(\mathbb{Z}\)-module homomorphism \(\mathbb{Z}^2\to \mathbb{Z} \oplus \mathbb{Z}/2\) defined by
\[ e_1\mapsto (1,0) \textrm{ and } e_2\mapsto (0,[1]_2). \]Now notice that the kernel of this homomorphism is the submodule \((2e_2)\mathbb{Z}=\mathrm{im}(t_D)\). Then the first isomorphism theorem gives \(M'=\mathbb{Z}^2/\mathrm{im}(t_D)\cong \mathbb{Z} \oplus \mathbb{Z}/2\).
Lemma 13.1.5
Let \(R\) be a commutative ring with \(1 \neq 0\), \(A \in \mathrm{Mat}_{m,n}(R)\) and \(B \in \mathrm{Mat}_{m',n'}(R)\) for some \(m,n,m',n' \geqslant 1\). Then \(A\) and \(B\) present isomorphic \(R\)-modules if \(B\) can be obtained from \(A\) by any finite sequence of operations of the following form:
- \(B=PA\) for some invertible matrix \(P\); for example, an elementary row operation,
- \(B=AQ\) for some invertible matrix \(Q\); for example, an elementary column operation,
- deletion of the \(j\)th column and \(i\)th row of \(A\) if \(Ae_j = e_i\), that is, if the \(j\)th column of \(A\) is the vector \(e_i\),
- the reverse of 3: insertion of a row and column satisfying \(Ae_j = e_i\),
- deletion of a column of all \(0\)'s,
- the reverse of 5: insertion of a column of all \(0\)s.
Proof of Lemma 13.1.5
It is sufficient to show that each individual operation gives an isomorphism, as the composition of isomorphisms is an isomorphism.
We start with (2). Since \(Q\) is invertible, \(t_Q\) is an invertible homomorphism, so \(t_Q\) is surjective. Then \(\mathrm{im}(t_{AQ}) = \mathrm{im}(t_A \circ t_Q) = \mathrm{im}(t_A)\). Thus \(R^m/ \mathrm{im}(t_{AQ}) = R^m / \mathrm{im}(t_{A})\). Note that we have equality and not just isomorphism in this case.
In case (1), the map \(t_P : R^m \to R^m\) is an isomorphism. We have \(t_P (\mathrm{im}(t_A)) = \mathrm{im}(t_{PA})\). Then the map \[ R^m \xrightarrow{t_P} R^m \xrightarrow{\pi} R^m / \mathrm{im}(t_{PA})\] is surjective with kernel \(\mathrm{im}(t_A)\), so there is an isomorphism \(R^m/\mathrm{im}(t_A) \cong R^m/ \mathrm{im}(t_{PA})\) by the First Isomorphism Theorem.
For case (3), we have \(m'=m-1\) and \(n'=n-1\). Since \(R^m\) is free, by the UMP for free modules there is a unique \(R\)-module homomorphism \(p\!: R^{m} \to R^{m-1}\) sending
\[ e_k \mapsto \begin{cases} e'_k & \text{if } k < i, \\ 0 & \text{if } k = i, \\ e'_{k-1} & \text{if } k > i. \end{cases} \]Similarly, there is a unique \(R\)-module homomorphism \(q\!: R^{n} \to R^{n-1}\) sending
\[ e_k \mapsto \begin{cases} e'_k & \text{if } k < j, \\ 0 & \text{if } k = j, \\ e'_{k-1} & \text{if } k > j. \end{cases} \]Here the elements \(e_k\) are part of a standard basis for \(R^n\) or for \(R^m\), while the elements \(e'_k\) are part of a standard basis for \(R^{n-1}\) or for \(R^{m-1}\). Then the diagram
\[ \begin{array}{ccc} R^n & \xrightarrow{\;A\;} & R^m \\ \Big\downarrow\rlap{\scriptstyle q} & & \Big\downarrow\rlap{\scriptstyle p} \\ R^{n-1} & \xrightarrow{\;A'\;} & R^{m-1} \end{array} \]commutes, where \(A'\) denotes the matrix obtained from \(A\) by deleting its \(i\)th row and \(j\)th column. In particular, \(p(\mathrm{im}(t_A)) \subseteq \mathrm{im}(t_{A'})\) and so \(p\) induces an \(R\)-module homomorphism
\[ \overline{p}\!: M \to M' \]between the modules \(M = R^m/\mathrm{im}(t_A)\) and \(M' = R^{m-1}/\mathrm{im}(t_{A'})\) presented by \(A\) and \(A'\), and we claim \(\overline{p}\) is bijective.
Since \(p\) is onto, so is \(\overline{p}\). Suppose \(m \in \ker(\overline{p})\). Then \(m = v + \mathrm{im}(t_A)\) for some \(v \in R^m\) and \(p(v) \in \mathrm{im}(t_{A'})\). Say \(p(v) = A' w\). Since \(q\) is onto, \(w = q(u)\) for some \(u\). Then
\[ p(v - Au) = p(v) - p(Au) = p(v) - A'q(u) = p(v) - A'w = p(v) - p(v) = 0, \]and thus \(v - Au \in \ker(p)\). Now, the kernel of \(p\) is clearly \(Re_i\), so that \(v - Au = re_i\) for some \(r \in R\). Finally, since \(Ae_j = e_i\), we have \(A(re_j) = re_i = v - Au\) and hence \(v = A(u + re_j)\), which proves \(v=t_A(u+re_j) \in \mathrm{im}(t_A)\) and hence that \(m = 0\).
For (5), it is clear that the columns of \(A'\) generate the same submodule of \(R^m\) as do the columns of \(A\), and thus \(M = M'\).
Finally, for operations (4) and (6), since the isomorphism relation is symmetric, the statements of parts (3) and (5) show that parts (4) and (6) are true as well.
13.2 Existence of presentations
Which modules have presentations? If we take this in a broad sense, the answer is every module.
Theorem 13.2.1
Let \(R\) be a ring and \(M\) be a module. Then there exist free modules \(F,G\) and a homomorphism \(\alpha\colon G\to F\) such that \(M\cong F/ \mathrm{im}(\alpha)\).
Proof of Theorem 13.2.1
Let \(S\) be a generating set for \(M\), and let \(F= F_R(S)\) be the free module with basis \(S\). By the UMP for free modules, there is a unique homomorphism \(\phi\colon F \to M\) such that \(\phi(s) = s\) for each \(s\in S\). Since \(S\) generates \(M\), this map is surjective by an Exercise from earlier.
Let \(K=\ker(\phi)\), and let \(S'\) be a generating set for \(K\). Set \(G=F_R(S')\) to be the free module with basis \(S'\). By the UMP for free modules again, there is a unique \(\alpha\colon G\to F\) such that for each \(s'\in S'\), \(\alpha(s') =s'\). We claim that \(\mathrm{im}(\alpha) = K\) (left as an exercise).
Thus, by the First Isomorphism Theorem, \(M\cong F/K = F/\mathrm{im}(\alpha)\).
Suppose that \(R\) is commutative. In the setting of the previous Theorem, if \(F\) is free of rank \(m\), and \(G\) is free of finite rank \(n\), then by picking bases for \(F\) and \(G\), we can rewrite \(\alpha:G\to F\) as \(t_A: R^n \to R^m\) for some matrix \(A\).
Since this yields \(m\) generators for \(M\), such a finite presentation can exist only if \(M\) is finitely generated. Even then, finite generation of \(M\) does not by itself guarantee that the submodule of relations is finitely generated.
It might seem that no submodule of a finitely generated module could fail to be finitely generated itself, but indeed this happens!
Example 13.2.2
Let \(k\) be a field and \(R = k[x_1, x_2, \ldots]\) be a polynomial ring in infinitely many variables. When we think of \(R\) as a module over itself, it is finitely generated, by the element \(1\). However, there are submodules of \(R\) that are not finitely generated: for example, the ideal \((x_1, x_2, \ldots)\) generated by all the variables.
Exercise 13.2.3
Let \(M\) be a module and \(N\) be a submodule. If \(N\) and \(M/N\) are finitely generated, then \(M\) is finitely generated.
Theorem 13.2.4
Let \(R\) be a PID. Then every submodule of a finitely generated module is also finitely generated.
Proof of Theorem 13.2.4
We will first prove that for each \(n \geqslant 1\), every submodule of \(R^n\) is finitely generated. The base case \(n=1\) holds because a submodule of \(R^1\) is the same thing as an ideal of \(R\), and every ideal of the PID \(R\) is generated by one element. Assume \(n > 1\) and that every submodule of \(R^{n-1}\) is finitely generated. Let \(M\) be any submodule of \(R^n\). Define
\[ \pi\!: R^n \twoheadrightarrow R^1 \]to be the projection onto the last component of \(R^n\). The kernel of \(\pi\) may be identified with \(R^{n-1}\), and so \(N := \ker(\pi) \cap M\) is a submodule of \(R^{n-1}\). By assumption, \(N\) is finitely generated. The image \(\pi(M)\) is a submodule of \(R^1\), that is, an ideal of \(R\), and so it too is principal. Furthermore, by the First Isomorphism Theorem, \(M/N = M/(\ker(\pi) \cap M) \cong \pi(M)\). By the Exercise above, we deduce that \(M\) is a finitely generated module.
Now let \(T\) be any finitely generated \(R\)-module and \(N \subseteq T\) any submodule. Since \(T\) is finitely generated, there exists a surjective \(R\)-module homomorphism \(q\!: R^n \twoheadrightarrow T\) for some \(n\). Then \(q^{-1}(N)\) is a submodule of \(R^n\) and hence it is finitely generated by the case we already proved, say by elements \(v_1, \dots , v_m \in q^{-1}(N)\). We claim that \(q(v_1), \dots, q(v_m)\) generate \(N\). Given any \(a \in N\), since \(q\) is surjective we can find some \(b \in q^{-1}(N)\) such that \(q(b) = a\). Since \(v_1, \ldots, v_m\) generate \(q^{-1}(N)\), we can find \(c_1, \ldots, c_m \in R\) such that
\[ b = c_1 v_1 + \cdots + c_m v_m \implies c_1 q(v_1) + \cdots + c_m q(v_m) = q(c_1 v_1 + \cdots + c_m v_m) = q(b) = a. \]Theorem 13.2.5
Any finitely generated module \(M\) over a PID \(R\) has a finite presentation given by an \(m \times n\) matrix \(A\), that is, there is an isomorphism
\[ M\cong R^m/\mathrm{im}(t_A), \]where \(t_A\!: R^n \to R^m\) is the map on free modules \(t_A(v)=Av\) induced by \(A\).
Proof of Theorem 13.2.5
Let \(M\) be a finitely generated module over a PID. We follow the argument of Theorem 13.2.1. Choose a finite generating set \(y_1, \dots, y_m\) of \(M\) and obtain an \(R\)-module map \(\pi\!: R^m \to M\) that sends \(e_i\) to \(y_i\), using the UMP for free modules. Since every element of \(M\) is a linear combination of the \(y_i\), the map \(\pi\) is surjective. Notice, however, that this representation as a linear combination of the \(y_i\) is not necessarily unique, so \(\pi\) might have a nontrivial kernel.
Since \(R^m\) is finitely generated and \(R\) is a PID, the submodule \(\ker(\pi)\) is also finitely generated, say by \(z_1, \dots, z_n\). This too gives a surjective \(R\)-module map \(g\!: R^n \to \ker(\pi)\) sending \(e_i \mapsto z_i\). The composition of \(g\!: R^n \twoheadrightarrow \ker(\pi)\) with the inclusion \(\iota\!: \ker(\pi) \hookrightarrow R^m\) is an \(R\)-module homomorphism \(t=\iota \circ g:R^n \to R^m\), and hence, by the correspondence between matrices and homomorphisms, \(t\) is given by an \(m \times n\) matrix \(A=[t]_B^C\) with respect to the standard bases \(B\) of \(R^n\) and \(C\) of \(R^m\), meaning \(t=t_A\).
It remains to show that \(M\cong R^m/\mathrm{im}(t_A)\). First note that since \(t_A=\iota \circ g\) and \(g\) is surjective we have
\[ \mathrm{im}(t_A)=\mathrm{im}(\iota \circ g)=\iota(\mathrm{im}(g))=\iota(\ker(\pi))=\ker(\pi). \]By the First Isomorphism Theorem we now have
\[ M = \mathrm{im}(\pi)\cong R^m/\ker(\pi)= R^m/\mathrm{im}(t_A). \]13.3 Classification of finitely generated modules over PIDs
We just showed that any finitely generated module \(M\) over a PID has a finite presentation matrix \(A\). We will discuss a canonical form for such a matrix \(A\) and the consequences it has on determining the isomorphism type of \(M\).
Theorem 13.3.1 (Smith Normal Form (SNF))
Let \(R\) be a PID and let \(A \in \mathrm{Mat}_{m,n}(R)\). Then there exist invertible matrices \(P\) and \(Q\) such that \(M = PAQ = [a_{ij}]\) satisfies the following: all nondiagonal entries of \(M\) are \(0\), meaning \(a_{ij} = 0\) if \(i \neq j\), and the diagonal entries of \(M\) satisfy
\[ a_{11} \mid a_{22} \mid a_{33} \mid \cdots. \]Moreover, the number \(\ell\) of nonzero diagonal entries of \(M\) is uniquely determined by \(A\), and the nonzero diagonal entries \(a_{11},\ldots, a_{\ell \ell}\) are unique up to associates.
Furthermore, if \(R\) is a Euclidean domain, then \(P\) and \(Q\) can be chosen to be a product of elementary matrices.
Elementary row and column operations correspond to multiplication by elementary matrices on the left and on the right, respectively. Elementary matrices are invertible, and a product of invertible matrices is invertible, so any finite sequence of elementary row and column operations can be realized as multiplication by an invertible matrix on the left or on the right. However, in general not every invertible matrix is a product of elementary matrices. In fact, there are examples of PIDs \(R\) and matrices \(A\) for which the Smith Normal Form cannot be obtained by a sequence of elementary row and column operations alone. Such examples are not easy to produce, in part because when our PID \(R\) is nice enough, the Smith Normal Form can in fact be obtained by simply taking a sequence of elementary row and column operations. This is the case for Euclidean domains: over such rings, the Euclidean Algorithm for finding the gcd of two elements works, and it is the key step we will need to find a Smith Normal Form. When \(R\) is a general PID, however, we need to work a little harder.
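To illustrate the Euclidean case, here is a pure-Python sketch (an illustration, not the proof) that diagonalizes an integer matrix using only swaps and division-with-remainder row/column operations, then fixes up the divisibility chain using the standard fact that \(\mathrm{diag}(a,b)\) can be converted to \(\mathrm{diag}(\gcd(a,b),\mathrm{lcm}(a,b))\) by further elementary operations:

```python
from math import gcd

def smith_diagonal_Z(A):
    """Diagonal of a Smith Normal Form of an integer matrix, found using only
    swaps and Euclidean-division row/column operations. A sketch: it does not
    track the invertible matrices P and Q."""
    A = [row[:] for row in A]
    m, n = len(A), len(A[0])
    for t in range(min(m, n)):
        while True:
            # Pick a nonzero entry of smallest absolute value in the trailing block.
            entries = [(abs(A[i][j]), i, j)
                       for i in range(t, m) for j in range(t, n) if A[i][j]]
            if not entries:
                break                            # trailing block is zero
            _, i, j = min(entries)
            A[t], A[i] = A[i], A[t]              # swap rows t and i
            for row in A:
                row[t], row[j] = row[j], row[t]  # swap columns t and j
            clean = True
            for i in range(t + 1, m):            # clear column t by division
                q = A[i][t] // A[t][t]
                for j in range(n):
                    A[i][j] -= q * A[t][j]
                clean = clean and A[i][t] == 0
            for j in range(t + 1, n):            # clear row t likewise
                q = A[t][j] // A[t][t]
                for i in range(m):
                    A[i][j] -= q * A[i][t]
                clean = clean and A[t][j] == 0
            if clean:
                break  # no remainders left; repeat otherwise with a smaller pivot
    d = [abs(A[t][t]) for t in range(min(m, n))]
    # Enforce d_1 | d_2 | ... via diag(a, b) -> diag(gcd(a, b), lcm(a, b)).
    for s in range(len(d)):
        for t in range(s + 1, len(d)):
            g = gcd(d[s], d[t])
            if g:
                d[s], d[t] = g, d[s] * d[t] // g
    return d
```

On the presentation matrix of Example 13.1.4 this returns `[1, 1, 2]`, consistent with \(M \cong \mathbb{Z} \oplus \mathbb{Z}/2\).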
Before we prove Theorem 13.3.1, let's see how to classify modules over PIDs using the Smith Normal Form for their presentation matrix. First, we need a lemma on how to interpret the module presented by a matrix in Smith Normal Form; we leave the proof as an exercise.
Lemma 13.3.2
Let \(R\) be a commutative ring with \(1 \neq 0\), let \(m \geqslant n\), let \(A = [a_{ij}] \in \mathrm{Mat}_{m,n}(R)\) be a matrix such that all nondiagonal entries of \(A\) are \(0\), and let \(M\) be the \(R\)-module presented by \(A\). Then \(M \cong R^{m-n} \oplus R/(a_{11}) \oplus \dots \oplus R/(a_{nn})\).
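For instance, over \(R=\mathbb{Z}\) the lemma turns a diagonal presentation directly into a direct sum decomposition (a small illustrative example, not from the notes):

```latex
A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 6 \\ 0 & 0 & 0 \end{bmatrix}
\quad \Longrightarrow \quad
\mathbb{Z}^4/\mathrm{im}(t_A) \cong \mathbb{Z}^{4-3} \oplus \mathbb{Z}/(1) \oplus \mathbb{Z}/(2) \oplus \mathbb{Z}/(6)
\cong \mathbb{Z} \oplus \mathbb{Z}/2 \oplus \mathbb{Z}/6.
```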
Theorem 13.3.3 (Classification of finitely generated modules over a PID using invariant factors)
Let \(R\) be a PID and let \(M\) be a finitely generated module. Then there exist \(r \geqslant 0\), \(k \geqslant 0\), and nonzero nonunit elements \(d_1,\ldots, d_k\) of \(R\) satisfying \(d_1 \mid d_2 \mid \dots \mid d_k\) such that
\[ M \cong R^r \oplus R/(d_1) \oplus \dots \oplus R/(d_k). \]Moreover \(r\) and \(k\) are uniquely determined by \(M\), and the \(d_i\) are unique up to associates.
Proof of Theorem 13.3.3
Existence:
By Theorem 13.2.5, \(M\) has a presentation matrix \(A\). By Theorem 13.3.1, \(A\) can be put into Smith Normal Form \(B\), where the nonzero diagonal entries of \(B\) are \(b_1, \ldots, b_\ell\) and satisfy \(b_1 \mid b_2 \mid \cdots \mid b_\ell\). Moreover, \(\ell\) is uniquely determined and the \(b_i\) are uniquely determined up to associates (i.e., up to multiplication by units) by \(A\), hence by \(B\). By Lemma 13.1.5, \(M\) is isomorphic to the module presented by \(B\). By Lemma 13.3.2, this is isomorphic to
\[ M \cong R^r \oplus R/(b_1) \oplus \dots \oplus R/(b_\ell), \]where \(r = m - \ell\) accounts for the zero diagonal entries, since each \(R/(0) \cong R\) contributes a free summand. Finally, some of these \(b_i\) might be units; let \(d_1 \mid \cdots \mid d_k\) be the nonunits among the \(b_i\), and note that if \(u\) is a unit, then \(R/(u) = 0\). We conclude that
\[ M \cong R^r \oplus R/(d_1) \oplus \dots \oplus R/(d_k), \]as desired.
Uniqueness:
For uniqueness, we show by induction on \(\max\{k,k'\}\) that if \(\phi: M \to N\) is an isomorphism, where \[M = R^r \oplus R/(d_1) \oplus \dots \oplus R/(d_k) \quad \text{and} \quad N= R^{r'} \oplus R/(d'_1) \oplus \dots \oplus R/(d'_{k'})\] with \(d_1 \mid \cdots \mid d_k\) and \(d'_1 \mid \cdots \mid d'_{k'}\), then \(r=r'\), \(k=k'\), and \(d_i\) and \(d'_i\) are associates for all \(i\). If \(k=k'=0\) then this follows from invariance of rank for free modules over commutative rings, which you proved in the homework. If \(k>0\), write \(\iota\) for the inclusion of a summand into the direct sum, and consider the elements \(m=\iota(1+(d_k))\in M\) and \(n=\iota(1+(d'_{k'}))\in N\). Note that \(\mathrm{ann}_R(m)= (d_k)\), and since \(\phi\) is an isomorphism, we also have \(\mathrm{ann}_R(\phi(m))= (d_k)\). On the other hand, any element of \(N\) with nonzero annihilator must have \(R^{r'}\) component zero, and must then be annihilated by \(d'_{k'}\). It follows that \(d'_{k'}\in (d_k)\), so \(d_k \ | \ d'_{k'}\). Switching the roles of \(M\) and \(N\), we must also have \(d'_{k'} \ | \ d_k\), so \(d_k\) and \(d'_{k'}\) are associates. Now, we claim that there is an \(R\)-module isomorphism \(\psi: N \to N\) such that \(\psi(\phi(m)) = n\) (left as an exercise). It follows that \(\psi \circ \phi: M\to N\) is an isomorphism and \(\psi\circ \phi(\iota(R/(d_k))) = \iota(R/(d'_{k'}))\). We get an induced isomorphism \[ R^r \oplus R/(d_1) \oplus \dots \oplus R/(d_{k-1}) \cong M/ \iota(R/(d_k)) \cong N/ \iota(R/(d'_{k'})) \cong R^{r'} \oplus R/(d'_1) \oplus \dots \oplus R/(d'_{k'-1}).\] It then follows from the induction hypothesis that \(r=r'\), \(k=k'\), and \(d_i\) and \(d'_i\) are associates for \( i\lt k \). Since we already saw that \(d_k\) and \(d'_{k'}\) are associates, the induction is complete.
Definition 13.3.4
Let \(R\) be a PID, let \(r \geqslant 0, k \geqslant 0\), and let \(d_1,\ldots, d_k\) be nonzero nonunit elements of \(R\) satisfying \(d_1 \mid d_2 \mid \dots \mid d_k\). Let \(M\) be any \(R\)-module such that
\[ M \cong R^r \oplus R/(d_1) \oplus \dots \oplus R/(d_k). \]We say \(M\) has free rank \(r\) and invariant factors \(d_1,\ldots,d_k\).
Notice that the invariant factors of \(M\) are only defined up to multiplication by units.
Remark 13.3.5
The classification theorem can be interpreted as saying that \(M\) decomposes into a free submodule \(R^r\) and a torsion submodule \(\mathrm{Tor}(M)\cong R/(d_1) \oplus \dots \oplus R/(d_k)\).
Corollary 13.3.6 (Classification of finitely generated abelian groups)
Let \(G\) be a finitely generated abelian group. Then
\[ G \cong \mathbb{Z}^r \oplus \mathbb{Z}/n_1 \oplus \dots \oplus \mathbb{Z}/n_k \]for some \(r \geqslant 0\), \(k \geqslant 0\), and \(n_i \geqslant 2\) for all \(i\), satisfying \(n_{i+1} \mid n_i\) for all \(i\). Moreover, the integers \(r\), \(k\), and \(n_1,\ldots, n_k\) are uniquely determined by \(G\).
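Corollary 13.3.6 makes it possible to enumerate the abelian groups of a given finite order: each isomorphism class corresponds to exactly one divisibility chain. Here is a small Python sketch (the function name `invariant_factor_lists` is our own):

```python
def invariant_factor_lists(n, cap=None):
    """All lists [n_1, ..., n_k] with each n_i >= 2, n_{i+1} | n_i, and
    n_1 * n_2 * ... * n_k == n, as in Corollary 13.3.6: there is exactly one
    isomorphism class of abelian groups of order n per such list.
    cap, when set, bounds the next choice to divisors of the previous entry."""
    if n == 1:
        return [[]]
    lists = []
    for d in range(2, n + 1):
        if n % d == 0 and (cap is None or cap % d == 0):
            # d is a candidate for n_1; the rest must divide d and multiply to n/d
            lists.extend([d] + rest for rest in invariant_factor_lists(n // d, d))
    return lists
```

For \(n = 12\) this returns the chains \([6, 2]\) and \([12]\), i.e., the two groups \(\mathbb{Z}/6 \oplus \mathbb{Z}/2\) and \(\mathbb{Z}/12\).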
Example 13.3.7
Consider the \(\mathbb{Z}\)-module \(M\) presented by the matrix
\[ A=\begin{bmatrix} 1 & 6 & 5 & 2 \\ 2 & 1 & -1 & 0 \\ 3 & 0 & 3 & 0 \end{bmatrix}. \]We can obtain the Smith Normal Form as follows:
\[ A=\begin{bmatrix} 1 & 6 & 5 & 2 \\ 2 & 1 & -1 & 0 \\ 3 & 0 & 3 & 0 \end{bmatrix} \xrightarrow[R3 \to R3-3R1]{R2 \to R2-2R1} \begin{bmatrix} 1 & 6 & 5 & 2 \\ 0 & -11 & -11 & -4 \\ 0 & -18 & -12 & -6 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & -11 & -11 & -4 \\ 0 & -18 & -12 & -6 \end{bmatrix} \] \[ \xrightarrow{C2 \leftrightarrow C4} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & -4 & -11 & -11 \\ 0 & -6 & -12 & -18 \end{bmatrix} \xrightarrow[C4 \to C4 - 3C2]{C3 \to C3-2C2} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & -4 & -3 & 1 \\ 0 & -6 & 0 & 0 \end{bmatrix} \xrightarrow{C2 \leftrightarrow C4} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & -3 & -4 \\ 0 & 0 & 0 & -6 \end{bmatrix} \] \[ \rightarrow \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0\\ 0 & 0 & 0 & -6 \end{bmatrix} \xrightarrow{C3 \leftrightarrow C4} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0\\ 0 & 0 & -6 & 0 \end{bmatrix} \xrightarrow{C3 \to -C3} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0\\ 0 & 0 & 6 & 0 \end{bmatrix} . \]Thus the Smith normal form of \(A\) is
\[ B=\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 6 & 0 \end{bmatrix}, \]with invariant factor \(d_1=6\). Notice that the two ones are not invariant factors: we only care about nonunits. Therefore we have
\[ M\cong \mathbb{Z}/(1)\oplus \mathbb{Z}/(1)\oplus \mathbb{Z}/(6)\cong \mathbb{Z}/(6). \]Here is a spinoff of the classification theorem.
Theorem 13.3.8 (Classification of finitely generated modules over a PID using elementary divisors)
Let \(R\) be a PID and let \(M\) be a finitely generated module. Then there exist \(r \geqslant 0\), \(s \geqslant 0\), prime elements \(p_1,\ldots,p_s\) of \(R\) (not necessarily distinct), and \(e_1,\ldots ,e_s \geqslant 1\) such that
\[ M \cong R^r \oplus R/(p_1^{e_1}) \oplus \cdots \oplus R/(p_s^{e_s}). \]Moreover, \(r\) and \(s\) are uniquely determined by \(M\), and the list \(p_1^{e_1},\ldots, p_s^{e_s}\) is unique up to associates and reordering.
Proof of Theorem 13.3.8
First, write \(M\) in invariant factor form \(M \cong R^r \oplus R/(d_1) \oplus \dots \oplus R/(d_k)\). Then write each invariant factor as a product of prime powers
\[ d_i = \prod_{j=n_{i-1}+1}^{n_i} p_j^{e_j}, \]where \(0 = n_0 < n_1 < \dots < n_k = s\) and the primes \(p_j\) appearing in a given \(d_i\) are pairwise non-associate, and recall that by the CRT we have
\[ R/(d_i)\cong R/(p_{n_{i-1}+1}^{e_{n_{i-1}+1}})\oplus\dots\oplus R/(p_{n_i}^{e_{n_i}}). \]Substituting into the invariant factor form gives the desired result. Uniqueness follows from the uniqueness of the invariant factor form and of the prime factorizations of each \(d_i\).
Definition 13.3.9
Let \(R\) be a PID, let \(r \geqslant 0\), \(s \geqslant 0\), \(p_1,\ldots,p_s\) be prime elements of \(R\), and let \(e_1,\ldots,e_s \geqslant 1\). Let \(M\) be the \(R\)-module \(M \cong R^r \oplus R/(p_1^{e_1}) \oplus \dots \oplus R/(p_s^{e_s})\). The elements \(p_1^{e_1},\ldots, p_s^{e_s}\) of \(R\) are the elementary divisors of \(M\).
Be careful: a particular prime might appear repeatedly among the elementary divisors of a particular module.
Example 13.3.10
When \(R = \mathbb{Z}\) and \(M = \mathbb{Z}/(6)\), we can write \(M\cong \mathbb{Z}/(2)\oplus \mathbb{Z}/(3)\), so the elementary divisors are \(2\) and \(3\).
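The passage from invariant factors to elementary divisors in the proof of Theorem 13.3.8 is easy to automate over \(\mathbb{Z}\): factor each invariant factor by trial division and collect the prime powers. A small sketch (the function name is our own):

```python
def elementary_divisors(invariant_factors):
    """Split each invariant factor of a finite abelian group into prime
    powers via trial division, as in the CRT step of Theorem 13.3.8."""
    divisors = []
    for d in invariant_factors:
        p = 2
        while p * p <= d:
            if d % p == 0:
                e = 0
                while d % p == 0:
                    d //= p
                    e += 1
                divisors.append(p ** e)  # record the prime power p^e
            p += 1
        if d > 1:
            divisors.append(d)  # leftover prime factor
    return sorted(divisors)
```

For example, the invariant factors \(2 \mid 12\) split as \(2\), \(4\), \(3\), recovering \(\mathbb{Z}/2 \oplus \mathbb{Z}/12 \cong \mathbb{Z}/2 \oplus \mathbb{Z}/4 \oplus \mathbb{Z}/3\).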
Corollary 13.3.11
Let \(G\) be a finitely generated abelian group. Then there exist \(r,s \geqslant 0\), prime integers \(p_1, \ldots, p_s\), and positive integers \(e_i \geqslant 1\) such that
\[ G \cong \mathbb{Z}^r\oplus \mathbb{Z}/p_1^{e_1} \oplus \dots \oplus \mathbb{Z}/p_s^{e_s}. \]Moreover, \(r\) and \(s\) are uniquely determined by \(G\), and the list of prime powers \(p_1^{e_1},\ldots, p_s^{e_s}\) is unique up to reordering.
13.4 Proof of the Smith Normal Form Theorem
We have yet to show that every matrix over a PID has a Smith Normal Form.
Definition 13.4.1
Let \(R\) be a PID and \(A \in \mathrm{Mat}_{m,n}(R)\). For \(1\leq t \leq \min\{m,n\}\) we define \(I_t(A)\) to be the ideal generated by the \(t\times t\) minors of \(A\), i.e., the determinants of \(t\times t\) submatrices of \(A\). In particular, \(I_1(A)\) is the ideal generated by all entries of \(A\). We write \(\gcd(A)\) for a generator of \(I_1(A)\).
Note that \(\gcd(A)\) is only well-defined up to associates: in a domain, two elements generate the same ideal if and only if they are unit multiples of each other. We will write \(\gcd(A)=\gcd(B)\) to mean that \(I_1(A) = I_1(B)\), i.e., that any gcd of the entries of \(A\) is also a gcd of the entries of \(B\).
We will use the following fact that you proved in the Homework.
Lemma 13.4.2
Let \(R\) be a commutative ring. Let \(A \in \mathrm{Mat}_{m,n}(R)\) be any matrix and let \(P \in \mathrm{Mat}_m(R)\) and \(Q \in \mathrm{Mat}_n(R)\) be invertible matrices. Then for any \(t\geq 0\), we have \(I_t(A) = I_t(PAQ)\), where \(I_t\) denotes the ideal generated by all \(t\times t\) minors of a matrix.
In particular, \(\gcd(A) = \gcd(PAQ)\).
Lemma 13.4.3
Let \(R\) be a PID and \(x, y \in R\). Let \(g\) be a GCD of \(x\) and \(y\).
- There exists an invertible \(2 \times 2\) matrix \(P \in \mathrm{Mat}_{2}(R)\) such that \[ P \begin{bmatrix} x \\ y\end{bmatrix} = \begin{bmatrix} g \\ 0 \end{bmatrix}. \]
- If \(R\) is a Euclidean domain, then \(P\) can be chosen to be a product of elementary matrices.
Proof of Lemma 13.4.3
We first prove part (1). Let \(g\) be a gcd of \(x\) and \(y\), so \((x,y) = (g)\). Thus, there exist \(a, b \in R\) such that \(ax+by = g\). Since \(g\) divides \(x\), we can write \(x=g x'\) for some \(x'\in R\), and likewise we can write \(y=gy'\) for some \(y'\in R\). Then \(a g x' + b g y' = g\), and since \(R\) is a domain, \(a x' + b y' =1\). Set \(P= \begin{bmatrix} a & b \\ -y' & x' \end{bmatrix}\). Then \(P \begin{bmatrix} x \\ y\end{bmatrix} = \begin{bmatrix} g \\ 0\end{bmatrix}\), since \(-y' x + x' y = g(-y' x' + x' y') = 0\). To check that \(P\) is invertible, we claim that \(P^{-1} = \begin{bmatrix} x' & -b \\ y' & a\end{bmatrix}\), which can be checked directly by multiplying this matrix on both sides with \(P\).
We now prove part (2). Let \(N\) be a Euclidean norm for \(R\). We proceed by induction on \(M= \min\{N(x),N(y)\}\). If \(M=0\) then either \(x\) or \(y\) has norm zero and hence is a unit. Swapping rows if necessary (which is an elementary operation), we can assume \(x\) is a unit, and hence the GCD of \(x\) and \(y\). Then
\[ \begin{bmatrix} 1 & 0 \\ -yx^{-1} & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ 0 \end{bmatrix}. \]For the inductive step, again swapping if necessary, we can assume that \(N(x)\geq N(y)\). By the division algorithm, we can write \(x=qy+r\) with either \(r=0\) or \(N(r) \lt N(y)\). Note that the GCD \(g\) of \(x\) and \(y\) is equal to the GCD of \(y\) and \(r\). We have
\[ \begin{bmatrix} 1 & -q \\ 0 & 1\end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} r \\ y \end{bmatrix}. \]If \(r=0\) we are done; otherwise we can apply the induction hypothesis to find some product of elementary matrices \(P\) such that
\[ P \begin{bmatrix} r \\ y \end{bmatrix} = \begin{bmatrix} g \\ 0\end{bmatrix} . \]Then \(\begin{bmatrix} 1 & -q \\ 0 & 1\end{bmatrix} P\) is the product we seek.
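Over \(R = \mathbb{Z}\), the matrix \(P\) of Lemma 13.4.3 can be produced explicitly from the extended Euclidean algorithm. The following Python sketch (the function names are our own) builds \(P = \begin{bmatrix} a & b \\ -y' & x' \end{bmatrix}\) exactly as in the proof of part (1), assuming \(x, y \geqslant 0\) and \((x,y) \neq (0,0)\):

```python
def extended_gcd(x, y):
    """Return (a, b, g) with a*x + b*y == g == gcd(x, y), for x, y >= 0."""
    old_r, r = x, y
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_s, old_t, old_r

def bezout_matrix(x, y):
    """The invertible 2x2 matrix P of Lemma 13.4.3 over Z, together with g:
    P = [[a, b], [-y', x']] where ax + by = g, x = g x', y = g y',
    so that P @ (x, y)^T = (g, 0)^T and det(P) = a x' + b y' = 1."""
    a, b, g = extended_gcd(x, y)
    return [[a, b], [-y // g, x // g]], g
```

For \(x=12\), \(y=8\) this yields \(P = \begin{bmatrix} 1 & -1 \\ -2 & 3\end{bmatrix}\), with \(P\begin{bmatrix}12\\8\end{bmatrix} = \begin{bmatrix}4\\0\end{bmatrix}\) and \(\det P = 1\).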
By transposing the matrices in Lemma 13.4.3, we can show that there exists an invertible \(2 \times 2\) matrix \(Q\) such that
\[ \begin{bmatrix} x & y\end{bmatrix} Q = \begin{bmatrix} g & 0 \end{bmatrix}. \]We are now finally ready to show that every matrix over a PID can be put into Smith Normal Form.
Theorem 13.4.4
Let \(R\) be a PID and let \(A \in \mathrm{Mat}_{m,n}(R)\). There exist invertible matrices \(P\) and \(Q\) such that \(D = PAQ = [a_{ij}]\) satisfies the following: all nondiagonal entries of \(D\) are 0, meaning \(a_{ij} = 0\) if \(i \neq j\), and the diagonal entries of \(D\) satisfy
\[ a_{11} \mid a_{22} \mid a_{33} \mid \cdots. \]Moreover, the number \(\ell\) of nonzero entries of \(D\) is uniquely determined by \(A\), and the nonzero diagonal entries \(a_{11},\ldots, a_{\ell \ell}\) are unique up to multiplication by units. Furthermore, if \(R\) is a Euclidean domain, then \(P\) and \(Q\) can be chosen to be products of elementary matrices.
Proof of Theorem 13.4.4
We show the existence of such matrices \(P\), \(Q\) in steps.
Proof of existence: To construct \(P,Q\), we note first that it suffices to show the following:
Key Claim: Given a matrix \(A\), there exist invertible matrices \(P',Q'\) (products of elementary matrices, if \(R\) is Euclidean) such that
\[ P' A Q' = \begin{bmatrix} d_1 & 0 \\ 0 & B_1\end{bmatrix} \]with \(d_1=\gcd(A)\), where the \(0\)'s denote a row and a column of zeroes.
Indeed, once we have shown this, note that \(d_1\) divides every entry of \(B_1\), so \(d_1 \ | \ \gcd(B_1)\). Then we can apply the claim again to get
\[ P'' (P' A Q') Q'' = \begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & B_2 \end{bmatrix} \]where \(d_2= \gcd(B_1)\) and \(d_1 \ | \ d_2 \ | \ \gcd(B_2)\). Repeating this process ends in a matrix of the form we seek.
Proof of Key Claim, case 1: In the setting of the key claim, suppose that \(d_1= \gcd(A)\) occurs as some entry of \(A\). After swapping rows and columns (which corresponds to multiplication by elementary matrices), we can assume that \(d_1\) is the top left entry of \(A\). Now, every element of the first row and the first column (and every entry!) is a multiple of \(d_1\), so we can subtract a suitable multiple of row one from each row (which corresponds to multiplication by elementary matrices) to get zeroes in the other entries of the first row. After that, do the same with the columns. Then we are done.
Proof of Key Claim, case 2: Without assuming that \(d_1=\gcd(A)\) occurs as an entry of \(A\), choose an entry \(a\) of \(A\) for which the number of irreducible factors of \(a/d_1\) is minimal. We proceed by induction on the number of such factors; the base case where this number is zero is case 1 above. Using row and column operations, we can put \(a\) in the top left entry.
If \(a\) does not divide some entry \(b\) of the first row, use a column operation to put \(b\) in the \(1,2\) position. Using the Lemma, we can multiply by a suitable matrix to replace \(a,b\) by \(\gcd(a,b)\) and \(0\). Since \(a\) does not divide \(b\) and \(d_1 \ | \ \gcd(a,b) \ | \ a\), the number of irreducible factors of \(\gcd(a,b)/d_1\) must be strictly less than that of \(a/d_1\). We can then apply the induction hypothesis.
A similar argument applies if \(a\) does not divide some entry of the first column.
If \(a\) does divide every entry of the first row and first column, proceed as in case 1 to clear out the rest of the first row and column; then add to the first row a row containing an entry \(b\) that \(a\) does not divide, and proceed as in the first part of this case.
Proof of uniqueness: For a matrix of the form \(D\), the only nonzero minors are those whose choices of rows and columns coincide, and hence the nonzero \(t \times t\) minors of \(D\) are the products \(d_{s_1} \cdots d_{s_t}\) for \(s_1 < \cdots < s_t\). Since \(d_1 \cdots d_t\) divides each such product, it follows that \(I_t(A) = I_t(D) = (d_1 \cdots d_t)\). This proves uniqueness, for it shows that \(d_1, \dots, d_{\ell}\) are determined by \(A\) directly, up to units, without any choices.
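The uniqueness argument is itself an algorithm: by Lemma 13.4.2 the diagonal of the Smith Normal Form can be computed directly from gcds of minors, without performing any row or column operations. A Python sketch over \(\mathbb{Z}\) (the function names are our own; cofactor expansion is fine for the small matrices in our examples):

```python
from itertools import combinations
from math import gcd

def det_int(M):
    """Determinant of a square integer matrix by cofactor expansion."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det_int([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def snf_diagonal_via_minors(A):
    """Nonzero diagonal of the Smith Normal Form over Z, via the uniqueness
    argument: gcd of t x t minors generates I_t = (d_1 ... d_t), so the t-th
    diagonal entry is (generator of I_t) / (generator of I_{t-1})."""
    m, n = len(A), len(A[0])
    ds, prev = [], 1
    for t in range(1, min(m, n) + 1):
        g = 0  # gcd(0, x) == |x|, so g accumulates the gcd of all t x t minors
        for rows in combinations(range(m), t):
            for cols in combinations(range(n), t):
                g = gcd(g, det_int([[A[i][j] for j in cols] for i in rows]))
        if g == 0:
            break
        ds.append(g // prev)
        prev = g
    return ds
```

On the matrix of Example 13.3.7 this recovers the diagonal \(1, 1, 6\).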
Example 13.4.5
Consider the PID \(R = k[x]\), where \(k\) is any field, and the matrix
\[ A = \begin{bmatrix} x-1 & 0 \\ 1 & x-2 \end{bmatrix}. \]The first row has already been zeroed out, but unfortunately \(x-1\) does not divide \(1\). In this case, though, we can see that \(\gcd(A) = 1\), so we can switch the first and second rows to get
\[ \begin{bmatrix} 1 & x-2 \\ x-1 & 0 \end{bmatrix}. \]Now we zero out the rest of the first row and first column using row and column operations:
\[ \begin{bmatrix} 1 & x-2 \\ x-1 & 0 \end{bmatrix} \xrightarrow{R2 \to R2 - (x-1)R1} \begin{bmatrix} 1 & x-2 \\ 0 & -(x-1)(x-2) \end{bmatrix} \xrightarrow{C2 \to C2 - (x-2)C1} \begin{bmatrix} 1 & 0 \\ 0 & -(x-1)(x-2) \end{bmatrix}. \]This is a Smith Normal Form. If we prefer to not have that negative sign, we can multiply the second row by \(-1\), to obtain
\[ \begin{bmatrix} 1 & 0 \\ 0 & -(x-1)(x-2) \end{bmatrix} \xrightarrow{R2 \to -R2} \begin{bmatrix} 1 & 0 \\ 0 & (x-1)(x-2) \end{bmatrix}. \]There is only one invariant factor, which is \((x-1)(x-2)\). The \(k[x]\)-module \(M\) presented by \(A\) is
\[ M \cong k[x]/((x-1)(x-2)). \]If we prefer to write this in terms of elementary divisors, our module has two: \(x-1\) and \(x-2\), and it is isomorphic to
\[ M \cong k[x]/(x-1) \oplus k[x]/(x-2). \]Chapter 14 Canonical forms for endomorphisms
One obvious case where the structure theory of modules over PIDs applies is that of the integers: we obtain the structure theory of finitely generated abelian groups and a method to solve systems of linear equations over the integers. We now move to another key application of this theory: classifying linear transformations and matrices up to similarity.
14.1 The module associated to an \(F\)-linear endomorphism
For any linear transformation from an \(F\)-vector space to itself, we can construct a module over a polynomial ring \(F[x]\).
Definition 14.1.1
Let \(F\) be a field, and \(V\) be an \(F\)-vector space. Let \(\phi:V\to V\) be a linear transformation. For any polynomial
\[ f(x) = a_n x^n + \cdots + a_1 x + a_0, \]we have an \(F\)-linear transformation \(f(\phi): V\to V\) given by
\[ f(\phi) = a_n \phi^n + \cdots + a_1 \phi + a_0 \mathrm{id}_V. \]We define \(V_\phi\) to be the \(F[x]\)-module with underlying additive group \((V,+)\) and \(F[x]\)-action given by
\[ f(x) \cdot v = f(\phi)(v). \]That is, \(V_\phi\) is the same \(F\)-vector space as \(V\), with the larger ring \(F[x]\) acting on it, where the action of \(x\) on \(V\) is given by \(\phi\). Of course, one has to verify that this definition satisfies the module axioms. We leave this for you as an exercise if you want more practice with the module axioms.
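For \(V = F^n\) and \(\phi = t_A\) given by a matrix \(A\), the action \(f(x)\cdot v = f(\phi)(v)\) is straightforward to compute. A minimal Python sketch (the function name `poly_action` is our own), using Horner's rule to avoid forming matrix powers:

```python
def poly_action(coeffs, A, v):
    """The F[x]-action of Definition 14.1.1 for V = F^n and phi = t_A:
    computes f(x) . v = f(phi)(v), where coeffs = [a_0, a_1, ..., a_n]
    encodes f = a_0 + a_1 x + ... + a_n x^n, via Horner's rule."""
    def matvec(M, w):
        return [sum(M[i][j] * w[j] for j in range(len(w))) for i in range(len(M))]
    result = [coeffs[-1] * entry for entry in v]      # a_n * v
    for c in reversed(coeffs[:-1]):
        Aw = matvec(A, result)                        # apply phi
        result = [a + c * b for a, b in zip(Aw, v)]   # ... then add c * v
    return result
```

For the zero map \(\phi = 0\), only the constant term acts: \(f(x)\cdot v = a_0 v\).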
Every \(F[x]\)-module can be thought of this way.
Lemma 14.1.2
Let \( F\) be a field. Then there is a bijective correspondence \[ \{ F[x]\text{-modules } V \} \longleftrightarrow \{ \text{pairs } (W,\phi) \text{ with } W \text{ an } F\text{-vector space and } \phi \in \mathrm{End}_F(W) \}, \]given by \(V \mapsto (V, \mu_x)\) and \((W,\phi)\mapsto W_\phi\), where \(\mu_x:V\to V\) is the map of multiplication by \(x\) on the \(F[x]\)-module \(V\).
Proof of Lemma 14.1.2
If \(V\) is an \(F[x]\)-module then \(V\) is an \(F\)-vector space by restriction of scalars along the inclusion \(F \hookrightarrow F[x]\). Let \(\mu_x:V\to V\) be as above. To show that \(\mu_x\in \mathrm{End}_F(V)\), note that for any \(c\in F\) and \(v,v_1, v_2\in V\) the axioms of the \(F[x]\)-module give us
\[ \mu_x(v_1+v_2)=x(v_1+v_2)=xv_1+xv_2=\mu_x(v_1)+\mu_x(v_2) \textrm{ and } \mu_x(cv)=x(cv)=c \mu_x(v). \]The construction \((W,\phi) \to W_\phi\) was discussed above. It remains to check that these are mutually inverse constructions, which we leave for you.
Example 14.1.3
Let \(F=\mathbb{R}\), and \(V=\mathbb{R}^2\).
-
For \(\phi=0\), in the \(\mathbb{R}[x]\)-module \(V_\phi\), we have \(x \cdot \begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix}= \begin{bmatrix} 0 \\ 0 \end{bmatrix}\) and more generally
\[ (c_n x^n +\cdots + c_0) \cdot \begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix} = \begin{bmatrix} c_0 \lambda_1 \\ c_0 \lambda_2 \end{bmatrix}. \]The \(\mathbb{R}[x]\)-module \(V_\phi\) is evidently generated by \(e_1\) and \(e_2\), since any element of \(V_\phi\) can be written \(\lambda_1 e_1 + \lambda_2 e_2\). (We didn't even need to use the \(x\)'s!) Since \(x\) times any vector \(v\in V\) is zero, \(x\in \mathrm{ann}_{\mathbb{R}[x]}(V_\phi)\). You can check that \(\mathrm{ann}_{\mathbb{R}[x]}(V_\phi) = (x)\).
-
Let \(\phi\) be the linear transformation of rotation by \(\pi/3\) counterclockwise. The standard basis vectors \(e_1, e_2\) generate \(V_\phi\), since any element is an \(\mathbb{R}\)-linear combination of \(e_1,e_2\). Even better, \(V_\phi\) is generated by \(e_1\) as an \(\mathbb{R}[x]\)-module, since \(e_1\) and \(x e_1 = \begin{bmatrix} 1/2 \\ \sqrt{3}/2 \end{bmatrix}\) are \(\mathbb{R}\)-linearly independent, so any \(v\in V_\phi\) can be written as \(\lambda_1 e_1 + \lambda_2 x e_1 = (\lambda_2 x + \lambda_1) e_1\).
Since \(\phi^6\) is the identity on \(V\), we have \((x^6-1)v= \phi^6(v) - v = 0\) for all \(v\in V\). It follows that \(x^6-1\) annihilates \(V_\phi\). We will see soon that \(\mathrm{ann}_{\mathbb{R}[x]}(V_\phi) = (x^2 - x + 1)\).
We can give a presentation for \(V_\phi\) in general by choosing a basis for \(V\). Recall that given an \(F\)-vector space \(V\) with \(\dim_F(V)=n\) and an ordered basis \(B\) for \(V\) we have proven that \(\mathrm{End}_F(V)\cong \mathrm{Mat}_n(F)\) via the maps \(\phi\mapsto [\phi]_B^B\) and \(A\mapsto \phi_A\).
Theorem 14.1.4
Let \(F\) be a field, let \(V\) be an \(F\)-vector space of dimension \(n\), let \(\phi\!: V \to V\) be a linear transformation, let \(B\) be an ordered basis for \(V\), and let \(A = [\phi]_B^B\). Then the matrix \(xI_n - A \in \mathrm{Mat}_n(F[x])\) presents the \(F[x]\)-module \(V_\phi\).
Proof of Theorem 14.1.4
Let \(B=\{b_1, \dots, b_n\}\) be any basis for \(V\), and note that \(B\) is a generating set for \(V_\phi\) as a module over \(F[x]\). Thus \(V_\phi\) can be written as a quotient of \(F[x]^n\). More precisely, let \(e_1, \dots, e_n\) denote the standard \(F[x]\)-basis for the free \(F[x]\)-module \(F[x]^n\), and let \(\pi\!: F[x]^n \to V_\phi\) be the surjective \(F[x]\)-module homomorphism sending \(e_i\) to \(b_i\). By the First Isomorphism Theorem, we have \(V_\phi\cong F[x]^n/\ker(\pi)\). On the other hand, the matrix \(xI_n - A\) determines a map
\[ t_{xI_n - A}: F[x]^n \to F[x]^n, \]and to show that \(V_\phi\cong F[x]^n/\mathrm{im}(t_{xI_n - A})\) it suffices to show that \(\mathrm{im}(t_{xI_n - A}) = \ker(\pi)\). First we show \(\mathrm{im}(t_{xI_n - A}) \subseteq \ker(\pi)\); since \(\mathrm{im}(t_{xI_n - A})\) is generated by the elements \( t_{xI_n - A}(e_i)\), we show that each of these is in the kernel of \(\pi\). To this end, we recall that \(A e_i = [\phi]_B^B e_i\) is the \(i\)th column of \([\phi]_B^B\), which is the vector \( (\lambda_1,\dots,\lambda_n) = \sum_j \lambda_j e_j \) such that \(\phi(b_i) = \sum_j \lambda_j b_j\). In particular, \(\pi (A e_i) = \phi(b_i)\). Then
\[\pi(t_{xI_n - A}(e_i)) = \pi (x e_i - A e_i) = x \pi(e_i) - \pi(A e_i) = \phi(b_i) - \phi(b_i) = 0.\]This proves \(\mathrm{im}(t_{xI_n - A}) \subseteq \ker(\pi)\). It follows by UMP for quotient modules that there is a surjection of \(F[x]\)-modules
\[ W := F[x]^n/\mathrm{im}(t_{xI_n - A}) \twoheadrightarrow V_\phi. \]We may also regard this as a surjection of \(F\)-vector spaces. Since \(\dim_F(V_\phi) = n\) and the map above is surjective, the Rank-Nullity Theorem gives \(\dim_F(W)\geqslant n\). To establish that the map above is an isomorphism, it suffices to show that \(\dim_F(W) \leqslant n\).
Denote by \(\overline{e_i}=e_i+\mathrm{im}(t_{xI_n - A})\) the image of the standard basis of \(F[x]^n\) in \(W\). The \(i\)th column of \(xI_n - A\) gives the relation \(x \overline{e_i} = \sum_j a_{ji} \overline{e_j}\) in \(W\). It follows by induction that \(p(x)\, \overline{e_i}\) is an \(F\)-linear combination of \(\overline{e_1}, \ldots, \overline{e_n}\) for any polynomial \(p(x)\). Thus a typical element \(\sum_i g_i(x) \overline{e_i}\) of \(W\) belongs to the \(F\)-span of \(\overline{e_1}, \ldots, \overline{e_n}\); that is, \(\overline{e_1}, \ldots, \overline{e_n}\) span \(W\) as an \(F\)-vector space. Therefore, we have the desired inequality \(\dim_F(W) \leqslant n\), which completes our proof.
Corollary 14.1.5
Suppose \(F\) is a field, \(V\) is an \(F\)-vector space, and \(\phi\!:V\to V\) is a linear transformation. There exist unique monic polynomials \(g_1 | \cdots | g_k \in F[x]\) of positive degree and an \(F[x]\)-module isomorphism
\[ V_{\phi} \cong F[x]/(g_1)\oplus \cdots \oplus F[x]/(g_k). \]The polynomials \(g_1,\ldots, g_k\) are both the invariant factors of the \(F[x]\)-module \(V_{\phi}\) and the entries on the diagonal of the Smith normal form of \(xI_n-[\phi]_B^B\) for any basis \(B\) of \(V\).
Proof of Corollary 14.1.5
By Theorem 14.1.4, \(xI_n-[\phi]_B^B\) presents the \(F[x]\)-module \(V_\phi\), and the remainder of the statement is an immediate application of the Classification of finitely generated modules over PIDs to this special case, once we show that there is no free summand. Note that \(F[x]\) is an infinite dimensional vector space over \(F\), while \(V_\phi\) is finite dimensional. If \(V_\phi\) had a free summand, then it would contain an infinite \(F\)-linearly independent set, and thus it could not be finite dimensional.
Definition 14.1.6
The polynomials \(g_1,\ldots,g_k\) in Corollary 14.1.5 are called the invariant factors of the linear transformation \(\phi\).
Example 14.1.7
Let
\[ A= \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\in \mathrm{Mat}_2(\mathbb{Q}). \]Then
\[ xI_2-A= \begin{bmatrix} x-1 & -1 \\ 0 & x-1 \end{bmatrix}. \]We could compute the invariant factors of \(t_A\!:\mathbb{Q}^2\to\mathbb{Q}^2\) by appealing to the Smith Normal Form of \(xI_2-A\), but let us try another way. Let
\[ \begin{bmatrix} d_1 & 0 \\ 0 & d_2 \end{bmatrix} \]be the Smith Normal Form of \(xI_2-A\). Recall from the proof of Smith Normal Form that \(d_1\) is the gcd of the entries of \(xI_2-A\) and \(d_1d_2=\det(xI_2-A)\). Thus \(d_2=\det(xI_2-A)=(x-1)^2\) and \(d_1=1\). Therefore the only invariant factor of \(t_A\) is \((x-1)^2\).
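The shortcut used here works for any \(2\times 2\) matrix, since \(d_1 d_2 = \det(xI_2 - A) = x^2 - \mathrm{tr}(A)\,x + \det(A)\). A small Python check of the determinant computation (the function name and coefficient-list encoding are our own):

```python
def charpoly_2x2(A):
    """Coefficients [a_0, a_1, a_2] of det(xI_2 - A) = x^2 - tr(A) x + det(A),
    which equals the product d_1 d_2 of the diagonal entries of the
    Smith Normal Form of xI_2 - A."""
    trace = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [det, -trace, 1]
```

For \(A=\begin{bmatrix}1&1\\0&1\end{bmatrix}\) the output \([1,-2,1]\) encodes \(x^2-2x+1=(x-1)^2\), matching the computation above.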
We know that for linear transformations \(\alpha\neq \beta\) from \(V\) to \(V\), the modules \(V_\alpha\) and \(V_\beta\) are not literally equal, since the action of \(x\) is different. It is natural to ask when they are isomorphic. You will show the following in Problem Set \(\#7\):
Exercise 14.1.8
Let \(F\) be a field and \(V=F^n\). Let \(A,B\in \mathrm{Mat}_{n\times n}(F)\). Then \(V_{t_A}\) and \(V_{t_B}\) are isomorphic if and only if \(A\) and \(B\) are similar matrices, i.e., there is some invertible matrix \(P\) such that \(B=PAP^{-1}\).
14.2 Rational canonical form
You will show the following lemma in Problem Set #6:
Lemma 14.2.1
For a monic polynomial \(f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0\) with \(n \geqslant 1\), the classes of \(1, x, \dots, x^{n-1}\) form a basis for \(F[x]/(f(x))\) regarded as an \(F\)-vector space. Relative to this basis, the \(F\)-linear operator \(\mu_x: F[x]/(f(x)) \to F[x]/(f(x))\) defined by \(\mu_x(v)=xv\) is given by the following matrix:
\[ C(f) := \begin{bmatrix} 0 & 0 &\cdots & 0 & -a_0 \\ 1 & 0 & \cdots & 0 & -a_1 \\ 0 & 1 & \ddots & 0 & -a_2 \\ \vdots & \ddots & \ddots & \vdots & \vdots \\ 0 & \cdots & 0 & 1 & -a_{n-1} \\ \end{bmatrix} = \begin{bmatrix} 0 & \cdots & 0 & -a_0 \\ &&& -a_1 \\ & I_{n-1}&& \vdots\\ &&& -a_{n-1} \\ \end{bmatrix}. \]Definition 14.2.2
In the setup of Lemma 14.2.1, the matrix \(C(f)\) is called the companion matrix of the monic polynomial \(f\).
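Building \(C(f)\) from the coefficients of \(f\) is mechanical; here is a short Python sketch (the function name and the coefficient-list convention \([a_0,\dots,a_{n-1}]\) are our own):

```python
def companion_matrix(coeffs):
    """Companion matrix C(f) of Lemma 14.2.1 for the monic polynomial
    f = x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0, given coeffs = [a_0, ..., a_{n-1}]:
    ones on the subdiagonal, and -a_0, ..., -a_{n-1} down the last column."""
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1          # subdiagonal of ones
    for i in range(n):
        C[i][n - 1] = -coeffs[i]  # last column
    return C
```

For \(f = (x-1)^2 = x^2 - 2x + 1\) this gives \(\begin{bmatrix} 0 & -1 \\ 1 & 2 \end{bmatrix}\).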
Definition 14.2.3
Given square matrices \(A_1, \ldots, A_m\) with entries in a ring \(R\), not necessarily of the same size, we define the direct sum \(A_1 \oplus \cdots \oplus A_m\) to be the block diagonal matrix
\[ \begin{bmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & A_m \\ \end{bmatrix}. \]Remark 14.2.4
If \(f:V_1\to W_1\) and \(g:V_2\to W_2\) are linear transformations, then the map \(f\oplus g:V_1\oplus V_2\to W_1\oplus W_2\) given by \((f\oplus g)(a,c)= (f(a),g(c))\) is a linear transformation. If \(B_i\) is a basis for \(V_i\) and \(C_i\) is a basis for \(W_i\), and \(\iota_i: V_i \hookrightarrow V_1 \oplus V_2\) (and likewise \(\iota_i: W_i \hookrightarrow W_1 \oplus W_2\)) are the natural inclusions, then \(\mathcal{B}=\iota_1(B_1)\cup \iota_2(B_2)\) is a basis for \(V_1\oplus V_2\), \(\mathcal{C}=\iota_1(C_1)\cup \iota_2(C_2)\) is a basis for \(W_1\oplus W_2\), and
\[ [f\oplus g]_{\mathcal{B}}^{\mathcal{C}}= \begin{bmatrix} [f]_{B_1}^{C_1} & 0 \\ 0 & [g]_{B_2}^{C_2} \end{bmatrix}. \]Theorem 14.2.5 (Rational Canonical Form)
Let \(F\) be a field, \(V\) a finite dimensional \(F\)-vector space, and \(\phi \!: V \to V\) an \(F\)-linear transformation. There is a basis \(B\) of \(V\) such that
\[ [\phi]_B^B=C(g_1) \oplus \cdots \oplus C(g_k) = \begin{bmatrix} C(g_1) & 0 & 0 & \cdots & 0 \\ 0 & C(g_2) & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & & \vdots \\ 0 & 0 & \cdots & 0 & C(g_k) \\ \end{bmatrix} \]where \(g_1, \dots, g_k\) are the invariant factors of \(\phi\), meaning in particular they are monic polynomials of positive degree such that \(g_1 \mid g_2 \mid \cdots \mid g_k\). Moreover, the polynomials \(g_1, \dots, g_k\) are unique.
Proof of Theorem 14.2.5
By Corollary 14.1.5, \(V_\phi \cong \bigoplus_{i=1}^k F[x]/(g_i(x))\) for some unique \(g_i\) as in the statement. Set \(V_i=F[x]/(g_i(x))\) and note that \(V_\phi=V_1\oplus \dots \oplus V_k\). The map \(\mu _x\!:V_\phi\to V_\phi \) given by multiplication by \(x\) preserves each summand in this decomposition: \(\mu_x(V_i)\subseteq V_i\). Thus if we choose a basis \(B_i\) of each summand \(V_i\) and set \(B=\bigcup_{i=1}^k \iota_i(B_i)\), then by Remark 14.2.4, \(B\) is a basis of \(V_\phi\) and \([\phi]_B^B=[\phi|_{V_1}]_{B_1}^{B_1}\oplus \dots \oplus [\phi|_{V_k}]_{B_k}^{B_k}\). Taking \(B_i\) to be the basis of \(V_i\) given by the classes of \(1, x, \ldots, x^{\deg g_i - 1}\), the result now follows from Lemma 14.2.1.
Definition 14.2.6
In the setup of Theorem 14.2.5, the matrix \(C(g_1) \oplus \cdots \oplus C(g_k)\) is called the rational canonical form (RCF) of the linear transformation \(\phi\). The rational canonical form of a matrix \(A\in \mathrm{Mat}_n(F)\) is defined to be the rational canonical form of the endomorphism \(t_A\) represented by \(A\) with respect to the standard basis of \(F^n\).
Example 14.2.7
Let \( A= \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\in \mathrm{Mat}_2(\mathbb{Q})\). We showed earlier that the unique invariant factor of \(t_A\) is \( (x-1)^2\). Thus, the Rational Canonical Form of \(t_A\) is
\[ RCF(A)=C((x-1)^2)=C(x^2-2x+1)=\begin{bmatrix} 0 & -1 \\ 1 & 2 \end{bmatrix}. \]The rational canonical form is canonical in that it uniquely characterizes the similarity class of a matrix.
Theorem 14.2.8
Let \(F\) be a field and let \(A, B \in \mathrm{Mat}_n(F)\). The following are equivalent:
- \(A\) and \(B\) are similar matrices.
- \(A\) and \(B\) have the same Rational Canonical Form.
- \(A\) and \(B\) have the same invariant factors.
Proof of Theorem 14.2.8
To show \((1)\Rightarrow (2)\), suppose \(A\) is similar to \(B\). Then there exists an invertible matrix \(P\) such that \(B = PAP^{-1}\), and thus
\[ xI_n-B = xI_n - PAP^{-1} = P(xI_n - A) P^{-1}. \]Thus the matrices \(xI_n-A\) and \(xI_n-B\) are also similar. Moreover, similar matrices have the same Smith Normal Form, since the ideals of \(t\times t\) minors are equal, and thus \(A\) and \(B\) have the same Rational Canonical Form. The invariant factors can be read off of the Rational Canonical Form, and thus \((2)\Rightarrow (3)\).
Finally, to show \((3)\Rightarrow (1)\) notice that if \(A\) and \(B\) have the same invariant factors then by the Structure Theorem of Finitely Generated Modules over PIDs, there is an isomorphism of \(F[x]\)-modules \(F^n_{t_A} \cong F^n_{t_{B}}\), which implies by a homework problem in Problem Set #7 that \(A\) and \(B\) must be similar.
14.3 The Cayley-Hamilton Theorem
Definition 14.3.1
Let \(F\) be a field and let \(A \in \mathrm{Mat}_n(F)\). The characteristic polynomial of \(A\) is the polynomial \(c_A = \det(xI_n - A)\).
Definition 14.3.2
Let \(V\) be an \(F\)-vector space of dimension \(n\), and let \(\phi: V \to V\) be a linear transformation. The characteristic polynomial of \(\phi\), denoted \(c_\phi\), is the characteristic polynomial \(c_A\) for a matrix \(A = [\phi]_B^B\) with respect to some ordered basis \(B\) of \(V\).
Characteristic polynomials are well-defined.
Remark 14.3.3
We need to check that the characteristic polynomial of a linear transformation is invariant under base changes. More precisely, we need to check that if we choose two different bases \(B\) and \(B'\) for \(V\), then the matrices \(A = [\phi]_{B}^B\) and \(C = [\phi]_{B'}^{B'}\) have the same characteristic polynomial. First, recall that \(A\) and \(C\) are similar matrices, so \(C = PAP^{-1}\) for some invertible matrix \(P\). Moreover, the scalar matrix \(xI_n\) is in the center of \(\mathrm{Mat}_n(F[x])\), meaning it commutes with every matrix, and thus we have the following:
\[ \begin{aligned} \det(xI_n-C) & = \det(xI_n-PAP^{-1}) \\ & = \det(P(xI_n-A)P^{-1}) \\ & = \det(P) \det(xI_n - A) \det(P^{-1}) \\ & = \det(xI_n-A). \end{aligned} \]We conclude that \(A\) and \(C\) have the same characteristic polynomial.
Remark 14.3.4
For any matrices \(A\) and \(B\), \(c_{A \oplus B} = c_Ac_B\).
Definition 14.3.5
Let \(F\) be a field and let \(A \in \mathrm{Mat}_n(F)\). The minimal polynomial of \(A\), denoted \(m_A\), is the unique monic polynomial that generates the principal ideal
\[ \{ f(x) \in F[x] \mid f(A) = 0 \}. \]Definition 14.3.6
Let \(V\) be an \(F\)-vector space of dimension \(n\), and let \(\phi\!: V \to V\) be a linear transformation. The minimal polynomial of \(\phi\), denoted \(m_\phi\), is the unique monic polynomial generating the ideal \(\mathrm{ann}_{F[x]}(V_\phi)\) in the PID \(F[x]\).
Lemma 14.3.7
Let \(F\) be a field. Let \(V\) be an \(F\)-vector space of dimension \(n\) with basis \(B\) and let \(\phi: V \to V\) be a linear transformation. The minimal polynomial \(m_A\) of \(A = [\phi]_B^B\) satisfies \(m_A = m_\phi\).
Proof of Lemma 14.3.7
Since \(m_A\) and \(m_\phi\) are both monic, it's sufficient to show \({\ann}_{F[x]}(V_\phi)=(m_A)\). Indeed,
\[ \begin{aligned} f\in {\ann}_{F[x]}(V_\phi) & \iff f(x)v=0 \textrm{ for all } v\in V_\phi\\ & \iff f(A)v=0 \textrm{ for all } v\in V_\phi\\ & \iff \ker(f(A))= V_\phi\\ & \iff \rank(f(A))=0 && \textrm{by the Rank-Nullity Theorem}\\ & \iff f(A)=0\\ & \iff f\in (m_A) && \textrm{by definition of } m_A.\qedhere \end{aligned} \]Remark 14.3.8
If \(m(x)\) is the minimal polynomial of an endomorphism \(\phi\) and \(f(x)\) is another polynomial such that \(f(x)\) annihilates \(V_\phi\), then \(f(x) \in \ann(V_\phi) = (m(x))\), and thus \(m(x) \mid f(x)\).
Similarly, suppose that \(m(x)\) is the minimal polynomial of a matrix \(A\) and \(f(x)\) is another polynomial such that \(f(A)=0\). We know that \(m(x)\) is also the minimal polynomial of the linear transformation \(\phi: v \mapsto Av\), and that \(f(x)\) also annihilates \(V_\phi\). Thus we can also conclude that \(m(x)\mid f(x)\).
Lemma 14.3.9
Let \(F\) be a field, let \(V\) be a finite dimensional \(F\)-vector space, and \(\phi\!: V \to V\) be a linear transformation with invariant factors \(g_1 | \cdots | g_k\). Then \(c_\phi = g_1 \cdots g_k\) and \(m_\phi = g_k\).
Proof of Lemma 14.3.9
The product of the elements on the diagonal of the Smith Normal Form of \(xI_n-A\) is the determinant of \(xI_n-A\). Thus the product of the invariant factors \(g_1 \cdots g_k\) of \(V_\phi\) is the characteristic polynomial \(c_\phi\) of \(\phi\). Notice here that we chose our invariant factors \(g_1, \ldots, g_k\) to be monic, so that \(g_1 \cdots g_k\) is monic, and thus actually equal to \(c_\phi\) (not just up to multiplication by a unit).
By Problem Set 5, \(\ann_{F[x]}(V_\phi)=(g_k)\), and since \(g_k\) is monic we deduce that \(m_\phi=g_k\).
We can now prove the famous Cayley-Hamilton theorem.
Theorem 14.3.10 (Cayley-Hamilton)
Let \(F\) be a field, and let \(V\) be a finite dimensional \(F\)-vector space. If \(\phi: V \to V\) is a linear transformation, then \(m_\phi \mid c_\phi\), and hence \(c_\phi(\phi) = 0\). Similarly, for any matrix \(A \in \mathrm{Mat}_n(F)\) over a field \(F\) we have \(m_A | c_A\) and \(c_A(A) = 0\).
Proof of Theorem 14.3.10
Let \(A = [\phi]_B^B\) for some basis \(B\) of \(V\). Note that the statements about \(A\) and \(\phi\) are equivalent: by definition \(c_A = c_\phi\), while \(m_A = m_\phi\) because \(f(A) = 0\) if and only if \(f(\phi) = 0\). So write \(m = m_A = m_\phi\) and \(c = c_A = c_\phi\).
We know that \(m = g_k\) and \(c = g_1 \cdots g_k\), so \(m \mid c\). By definition, \(m(A) = 0\), and since \(m \mid c\), we conclude that \(c(A) = 0\).
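The theorem is easy to check by hand in small cases: for a \(2 \times 2\) matrix, \(c_A(x) = x^2 - (\operatorname{tr} A)x + \det(A)\), so Cayley-Hamilton predicts \(A^2 - (\operatorname{tr} A)A + (\det A)I_2 = 0\). A minimal Python sketch, using an arbitrary example matrix:

```python
def matmul(X, Y):
    """Multiply two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_scale(c, X):
    return [[c * e for e in row] for row in X]

def mat_add(X, Y):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]

A = [[1, 2], [3, 4]]                       # arbitrary example
tr = A[0][0] + A[1][1]                     # tr A = 5
det = A[0][0]*A[1][1] - A[0][1]*A[1][0]    # det A = -2
I = [[1, 0], [0, 1]]

# Evaluate c_A at A: A^2 - (tr A) A + (det A) I
cA_at_A = mat_add(mat_add(matmul(A, A), mat_scale(-tr, A)), mat_scale(det, I))
print(cA_at_A)   # [[0, 0], [0, 0]]
```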
Remark 14.3.11
As a corollary of the Cayley-Hamilton Theorem, we obtain that the minimal polynomial of \(\phi\!: V \to V\) has degree at most \(n = \dim(V)\), since \(m_\phi\) divides \(c_\phi\), which is a polynomial of degree \(n\).
Lemma 14.3.12
Let \(F\) be a field and let \(V\) be a finite dimensional \(F\)-vector space. If \(\phi: V \to V\) is a linear transformation with invariant factors \(g_1 \mid \cdots \mid g_k\), then \(c_\phi \mid m_\phi^k\).
Proof of Lemma 14.3.12
Since \(g_i\mid g_k\) for \(1\leqslant i \leqslant k\), we have \(c_\phi=g_1 \cdots g_k \mid g_k^k=m_\phi^k\).
It follows that \(c_\phi\) and \(m_\phi\) have the same roots, not counting multiplicities.
Definition 14.3.13
Let \(F\) be a field, let \(V\) be an \(F\)-vector space, and let \(\phi\!: V \to V\) be a linear transformation. A nonzero element \(v \in V\) satisfying \(\phi(v) = \lambda v\) for some \(\lambda \in F\) is an eigenvector of \(\phi\) with eigenvalue \(\lambda\). Similarly, given a matrix \(A \in \mathrm{Mat}_n(F)\), a nonzero \(v \in F^n\) satisfying \(Av = \lambda v\) for some \(\lambda \in F\) is an eigenvector of \(A\) with eigenvalue \(\lambda\).
Theorem 14.3.14
Let \(F\) be a field, let \(\phi: V \to V\) be a linear transformation of a finite dimensional \(F\)-vector space, and let \(\lambda \in F\). The following are equivalent:
- \(\lambda\) is an eigenvalue of \(\phi\).
- \(\lambda\) is a root of \(c_\phi\).
- \(\lambda\) is a root of \(m_\phi\).
Proof of Theorem 14.3.14
By the Cayley-Hamilton Theorem, \(m_\phi \mid c_\phi\), and thus \((3) \Rightarrow (2)\). On the other hand, by Lemma 14.3.12 we know that \(c_\phi \mid m_\phi^k\), so if \(c_\phi(\lambda) = 0\) then \(m_\phi(\lambda)^k = 0\), and since we are over a field, we conclude that \(m_\phi(\lambda) = 0\). This shows \((2) \Rightarrow (3)\).
Finally, to show that \((1) \Leftrightarrow (2)\), write \(A = [\phi]_B^B\) for some basis \(B\) of \(V\), and notice that the scalar \(\lambda\in F\) is an eigenvalue of \(\phi\), equivalently of \(A\), if and only if there is a nonzero solution \(v\) to \((\lambda I_n-A)v = 0\). This happens if and only if \(\lambda I_n - A\) has a nontrivial kernel, or equivalently if \(\lambda I_n - A\) is not invertible. Thus \(\lambda\in F\) is an eigenvalue of \(A\) if and only if it is a root of the characteristic polynomial \(c_A(x)=\det(xI_n-A)\), meaning \(c_A(\lambda)=0\).
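To illustrate the equivalence \((1) \Leftrightarrow (2)\) concretely, here is a minimal Python sketch using an arbitrary upper-triangular example with diagonal entries \(2\) and \(3\); for a triangular matrix the roots of \(c_A\), and hence the eigenvalues, are exactly the diagonal entries:

```python
def det2(M):
    """Determinant of a 2x2 matrix given as nested lists."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A = [[2, 1], [0, 3]]   # arbitrary upper-triangular example

# Roots of c_A(x) = det(xI_2 - A): search a small range of integer scalars
roots = [lam for lam in range(-10, 11)
         if det2([[lam - A[0][0], -A[0][1]],
                  [-A[1][0], lam - A[1][1]]]) == 0]
print(roots)   # [2, 3]

# An eigenvector for lambda = 3: solving (A - 3I_2)v = 0 gives v = (1, 1)
v = (1, 1)
Av = (A[0][0]*v[0] + A[0][1]*v[1], A[1][0]*v[0] + A[1][1]*v[1])
print(Av == (3 * v[0], 3 * v[1]))   # True
```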
Example 14.3.15
Let us find the minimal and characteristic polynomials of \(\phi: \mathbb{R}^2 \to \mathbb{R}^2\) given as rotation by \(\pi/3\) radians (that is, \(60^\circ\)) counterclockwise. We could write this down as a matrix and compute its characteristic polynomial, but a simpler way is to notice that \(\phi^3 = -I_2\), and so \(\phi\) satisfies the polynomial \(x^3 + 1 = (x+1)(x^2 - x + 1)\). Its minimal polynomial must therefore divide \(x^3 +1\). Since \(x^2 - x +1\) is irreducible in \(\mathbb{R}[x]\), we conclude that the minimal polynomial of \(\phi\), which we know has degree at most \(2\), must be either \(x+1\) or \(x^2 - x +1\). If \(m_\phi = x+1\), then \(\phi\) would be \(-I_2\), which is clearly incorrect. So the minimal polynomial of \(\phi\) must be \(x^2 - x +1\). By Cayley-Hamilton, this polynomial must divide the characteristic polynomial, and since the latter also has degree two, we conclude that
\[ c_\phi(x) = x^2 - x + 1. \]Since this is irreducible, in this example we have no choice for how to form the invariant factors: there must just be one of them, \(c_\phi(x)\) itself. So
\[ C(x^2 - x + 1) = \begin{bmatrix} 0 & -1 \\ 1 & 1 \\ \end{bmatrix} \]is the rational canonical form of \(\phi\).
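These claims are easy to verify numerically: the companion matrix above should satisfy the minimal polynomial \(x^2 - x + 1\) as well as the relation \(\phi^3 = -I_2\) we started from. A minimal Python sketch:

```python
def matmul(X, Y):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

C = [[0, -1], [1, 1]]   # companion matrix of x^2 - x + 1
I = [[1, 0], [0, 1]]

# m(C) = C^2 - C + I should be the zero matrix
C2 = matmul(C, C)
m_at_C = [[C2[i][j] - C[i][j] + I[i][j] for j in range(2)] for i in range(2)]
print(m_at_C)   # [[0, 0], [0, 0]]

# C^3 = -I_2, matching phi^3 = -I_2 for rotation by pi/3
C3 = matmul(C2, C)
print(C3)       # [[-1, 0], [0, -1]]
```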
Example 14.3.16
Let's find the minimal polynomial of
\[ \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \]By the Cayley-Hamilton Theorem, \(m_A(x) \mid c_A(x)\). The polynomial \(c_A(x)\) is easy to compute since this matrix is upper-triangular:
\[ c_A(x) = \det(xI_4 - A) = (x-1)^4. \]So \(m_A(x) = (x-1)^j\) for some \(j \leqslant 4\). By brute force, we verify that \((A-I_4)^3 \neq 0\) and thus it must be the case that \(m_A(x) = c_A(x) = (x-1)^4\).
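The brute-force verification can be scripted: setting \(N = A - I_4\), the claim is that \(N^3 \neq 0\) while \(N^4 = 0\). A minimal Python sketch:

```python
def matmul(X, Y):
    """Multiply two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 1],
     [0, 0, 0, 1]]
N = [[A[i][j] - (1 if i == j else 0) for j in range(4)] for i in range(4)]  # A - I_4

N2 = matmul(N, N)
N3 = matmul(N2, N)
N4 = matmul(N3, N)
zero = [[0] * 4 for _ in range(4)]
print(N3 != zero, N4 == zero)   # True True
```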
Example 14.3.17
Let's find the minimal polynomial of
\[ \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \]As in the previous example, \(c_A(x) = (x-1)^4\) and so by the Cayley-Hamilton Theorem \(m_A(x) = (x-1)^j\) for some \(j \leqslant 4\). This time we notice that \((A-I_4)^2 = 0\) and so, since \((A-I_4) \neq 0\), we have \(m_A(x) = (x-1)^2\).
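Again the check is mechanical: with \(N = A - I_4\) we have \(N \neq 0\) but \(N^2 = 0\). A minimal Python sketch:

```python
def matmul(X, Y):
    """Multiply two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 1, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 0, 1]]
N = [[A[i][j] - (1 if i == j else 0) for j in range(4)] for i in range(4)]  # A - I_4

N2 = matmul(N, N)
zero = [[0] * 4 for _ in range(4)]
print(N != zero, N2 == zero)   # True True
```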