Introduction
These are the lecture notes and class materials for Math 818 Introduction to Modern Algebra II in Spring 2026. This is the second part of a two-part course on groups, rings, modules, and fields. In this second half, we will discuss module theory, with a focus on the structure theory of modules over PIDs and applications to linear algebra, field theory, and Galois theory. A major goal of this course is to prepare graduate students for the PhD qualifying exam in algebra.
The lecture notes for Math 817 Introduction to Modern Algebra I can be found here: Math 817 lecture notes.
The lecture notes draw heavily on Eloísa Grifo’s Algebra Notes (PDF), which in turn draw from earlier lecture notes of Mark Walker and Alexandra Seceleanu. The textbook Abstract Algebra by Dummit and Foote is a good resource covering similar material.
Modules
11. An Introduction to Modules
Modules are a generalization of the concept of a vector space to any ring of scalars in place of a field. While vector spaces are key examples of modules, many of the basic facts we are used to from linear algebra are often a little more subtle over a general ring.
11.1 Definitions and first examples
Definition 11.1.1
Let \(R\) be a ring with \(1 \neq 0\). A left \(R\)-module is an abelian group \((M,+)\) together with an action \(R \times M \to M\) of \(R\) on \(M\), written as \((r,m) \mapsto rm\), such that for all \(r,s \in R\) and \(m,n \in M\) we have the following:
- \((r + s)m = rm + sm\),
- \((rs)m = r(sm)\),
- \(r(m + n) = rm + rn\), and
- \(1m = m\).
A right \(R\)-module is an abelian group \((M,+)\) together with an action of \(R\) on \(M\), written as \(M \times R \to M, (m,r)\mapsto mr\), such that for all \(r,s \in R\) and \(m,n \in M\) we have
- \(m(r + s) = mr + ms\),
- \(m(rs) = (mr)s\),
- \((m + n)r = mr + nr\), and
- \(m1 = m\).
By default, we will be studying left \(R\)-modules. To lighten the notation, we will sometimes say \(R\)-module rather than left \(R\)-module whenever there is no ambiguity.
Remark 11.1.2
If \(R\) is a commutative ring, then any left \(R\)-module \(M\) may be regarded as a right \(R\)-module by setting \(m r := r m\). Likewise, any right \(R\)-module may be regarded as a left \(R\)-module. Thus for commutative rings, we just refer to modules, and not left or right modules.
Lemma 11.1.3 (Arithmetic in modules)
Let \(R\) be a ring with \(1_R \neq 0_R\) and \(M\) be an \(R\)-module. Then \(0_Rm = 0_M\) and \((-1_R)m = -m\) for all \(m \in M\).
Proof (of Lemma 11.1.3)
Let \(m \in M\). Then \[ 0_R m = (0_R + 0_R) m = 0_Rm + 0_Rm. \] Since \(M\) is an abelian group, the element \(0_Rm\) has an additive inverse, \(-0_Rm\), so adding it on both sides we see that \[ 0_M = 0_Rm. \] Moreover, \[ m + (-1_R)m = 1_R m + (-1_R)m = (1_R - 1_R)m = 0_R m = 0_M, \] so \((-1_R)m = -m\).
Typically, one first encounters modules in an undergraduate linear algebra course: the vector spaces from linear algebra are modules over fields. Later we will see that vector spaces are much simpler modules than modules over other rings. So while one might take linear algebra and vector spaces as an inspiration for what to expect from a module, be warned that this perspective can often be deceiving.
Definition 11.1.4
Let \(F\) be a field. A vector space over \(F\) is an \(F\)-module.
We will see more about vector spaces soon. Note that many of the concepts we will introduce have special names in the case of vector spaces. Here are some other important examples:
Lemma 11.1.5
Let \(M\) be a set with a binary operation \(+\). Then
- \(M\) is an abelian group if and only if \(M\) is a \(\mathbb{Z}\)-module.
- \(M\) is an abelian group such that \(nm:=\underbrace{ m + \cdots + m}_{n \textrm{ times}}=0_M\) for all \(m\in M\) if and only if \(M\) has a \(\mathbb{Z}/n\)-module structure.
Proof (of Lemma 11.1.5)
First, we show 1). If \(M\) is a \(\mathbb{Z}\)-module, then \((M,+)\) is an abelian group by definition of module. Conversely, if \((M,+)\) is an abelian group then there is a unique \(\mathbb{Z}\)-module structure on \(M\) given by the formulas below. The uniqueness of the \(\mathbb{Z}\) action follows from the identities below in which the right hand side is determined only by the abelian group structure of \(M\). The various identities follow from the axioms of a module: \[ \begin{cases} i \cdot m = (\underbrace{1 + \cdots + 1}_i) \cdot m = \underbrace{1 \cdot m + \cdots +1 \cdot m}_i = \underbrace{ m + \cdots + m}_i & \text{ if } i>0\\ 0 \cdot m = 0_M & \\ i \cdot m = - (-i) \cdot m = - (\underbrace{m + \cdots + m}_{-i}) & \text{ if } i<0. \end{cases} \] We leave it as an exercise to check that this \(\mathbb{Z}\)-action really satisfies the module axioms.
Now we show 2). If \(M\) is a \(\mathbb{Z}/n\)-module, then \((M,+)\) is an abelian group by definition, and \(nm= \underbrace{ m + \cdots + m}_n=\underbrace{[1]_n \cdot m + \cdots +[1]_n \cdot m}_n=(\underbrace{[1]_n + \cdots + [1]_n}_n)\cdot m=[n]_n\cdot m=[0]_nm=0_M\).
Conversely, there is a unique \(\mathbb{Z}/n\)-module structure on \(M\) given by the formulas below, which are analogous to the ones above: \[ \begin{cases} [i]_n \cdot m = (\underbrace{[1]_n+ \cdots + [1]_n}_i) \cdot m = \underbrace{[1]_n \cdot m + \cdots +[1]_n \cdot m}_i= \underbrace{ m + \cdots + m}_i & \text{ if } i>0\\ 0 \cdot m = 0_M &\\ [i]_n \cdot m = - (-[i]_n) \cdot m = - (\underbrace{m + \cdots + m}_{-i}) & \text{ if } i<0. \end{cases} \] These formulas are well-defined, meaning they are independent of the choice of representative for \([i]_n\), because of the assumption that \(nm=0_M\). Again checking that this \(\mathbb{Z}/n\)-action really satisfies the module axioms is left as an exercise.
The proposition above says in particular that any group of the form \[ G=\mathbb{Z}^\ell\times \mathbb{Z}/d_1\times \dots\times \mathbb{Z}/d_m \] is a \(\mathbb{Z}\)-module, and if \(\ell=0, m \geqslant 1\) and \(d_i \mid n\) for \(1 \leqslant i \leqslant m\) then \(G\) is also a \(\mathbb{Z}/n\)-module. In particular, the Klein group is a \(\mathbb{Z}/2\)-module.
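Here is a quick brute-force check of this in Python. The script is an illustrative sketch, not part of the formal development: it verifies that the Klein group \(K_4 = \mathbb{Z}/2\times\mathbb{Z}/2\), with the repeated-addition action from Lemma 11.1.5, satisfies all the axioms of Definition 11.1.1 over \(\mathbb{Z}/2\).

```python
# Brute-force check that K4 = Z/2 x Z/2 with [i]_2 . m := m + ... + m (i times)
# satisfies the Z/2-module axioms of Definition 11.1.1.
from itertools import product

n = 2
K4 = list(product(range(2), repeat=2))  # the four elements of Z/2 x Z/2

def add(u, v):
    """Componentwise addition mod 2, the group law of K4."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

def act(i, m):
    """The action [i]_n . m = m + ... + m (i times); i % n makes it well defined."""
    out = (0, 0)
    for _ in range(i % n):
        out = add(out, m)
    return out

for m in K4:
    assert act(1, m) == m                                           # 1m = m
for r, s, m, m2 in product(range(n), range(n), K4, K4):
    assert act((r + s) % n, m) == add(act(r, m), act(s, m))         # (r+s)m = rm + sm
    assert act((r * s) % n, m) == act(r, act(s, m))                 # (rs)m = r(sm)
    assert act(r, add(m, m2)) == add(act(r, m), act(r, m2))         # r(m+n) = rm + rn
print("all Z/2-module axioms hold for K4")
```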
In contrast to vector spaces, for \(M\) a module over a ring \(R\), it can happen that \(rm=0\) for some \(r \in R\) and \(m \in M\) such that \(r \neq 0_R\) and \(m \neq 0_M\). For example, in the Klein group \(K_4\) viewed as a \(\mathbb{Z}\)-module we have \(2m=0\) for all \(m \in K_4\).
Example 11.1.6
- The trivial \(R\)-module is \(0=\{0\}\) with \(r0=0\) for any \(r\in R\).
- If \(R\) is any ring, then \(R\) is a left and right \(R\)-module via the action of \(R\) on itself given by its internal multiplication.
- If \(I\) is a left (respectively, right) ideal of a ring \(R\) then \(I\) is a left (respectively, right) \(R\)-module with respect to the action of \(R\) on \(I\) by internal multiplication.
- If \(R\) is a subring of a ring \(S\), then \(S\) is a (left or right) \(R\)-module with respect to the action of \(R\) on \(S\) by internal multiplication in \(S\).
- If \(R\) is a ring with \(1 \neq 0\), then \(R[x_1,\ldots,x_n]\) is an \(R\)-module for any \(n \geqslant 1\). This is a special case of (4).
- The standard free module over \(R\) of rank \(n\) is the set \[ R^n=\left\{\begin{bmatrix} r_1\\ \vdots\\r_n \end{bmatrix} \mid r_i\in R, 1 \leqslant i \leqslant n\right\} \] with componentwise addition and multiplication by elements of \(R\), as follows: \[ \begin{bmatrix} r_1\\ \vdots\\r_n \end{bmatrix} +\begin{bmatrix} r'_1\\ \vdots\\r'_n \end{bmatrix} =\begin{bmatrix} r_1+r'_1\\ \vdots\\r_n +r'_n\end{bmatrix} \text{ and } r\begin{bmatrix} r_1\\ \vdots\\r_n \end{bmatrix}=\begin{bmatrix} rr_1\\ \vdots\\ rr_n \end{bmatrix}. \] We will often write the elements of \(R^n\) as \(n\)-tuples \((r_1, \ldots, r_n)\) instead. Notice that \(R\) is the free \(R\)-module of rank \(1\).
- More generally, given a collection of \(R\)-modules \(\{ A_i \}\), the abelian groups \[ \prod_i A_i = \{ (a_i)_i \mid a_i \in A_i \}\] and \[ \bigoplus_i A_i = \{ (a_i)_i \mid a_i \in A_i, a_i = 0 \textrm{ for all but finitely many}\ i \,\} \] are \(R\)-modules with the \(R\)-action \(r(a_i) := (ra_i)\). They are called the direct product and direct sum, respectively.
- If \(R\) is a ring, let \( M_n(R)\) denote the ring of \(n \times n\) matrices with entries in \(R\). Then the set of \(n\times 1\) column vectors with entries in \(R\) is a left \(M_n(R)\)-module, and the set of \(1\times n\) row vectors with entries in \(R\) is a right \(M_n(R)\)-module, with the usual coordinatewise vector-plus-vector addition and usual matrix-times-vector multiplication.
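As a quick sanity check of the last example above, the module axiom \((rs)m = r(sm)\) for column vectors is exactly associativity of matrix multiplication. Here is a minimal numpy sketch over \(R = \mathbb{Z}\); the specific matrices and vectors are arbitrary choices for illustration.

```python
# Check the left-module axioms for 2 x 2 integer matrices acting on column
# vectors in Z^2; A, B, v, w are arbitrary sample data.
import numpy as np

A = np.array([[1, 2], [0, 3]])
B = np.array([[0, 1], [1, 1]])
v = np.array([[4], [5]])
w = np.array([[1], [-2]])

assert np.array_equal((A + B) @ v, A @ v + B @ v)   # (r+s)m = rm + sm
assert np.array_equal((A @ B) @ v, A @ (B @ v))     # (rs)m = r(sm)
assert np.array_equal(A @ (v + w), A @ v + A @ w)   # r(m+n) = rm + rn
assert np.array_equal(np.eye(2, dtype=int) @ v, v)  # 1m = m
```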
11.2 Submodules and restriction of scalars
Definition 11.2.1
Let \(R\) be a ring and let \(M\) be a left \(R\)-module. An \(R\)-submodule of \(M\) is a subset \(N\subseteq M\) that is an \(R\)-module with the same addition and \(R\)-action as \(M\).
Exercise 11.2.2
Show that a subset \(N\subseteq M\) is a submodule if and only if it is a subgroup under addition of \(M\) satisfying \(rn \in N\) for all \(r \in R\) and \(n \in N\).
Example 11.2.3
Every \(R\)-module \(M\) has two trivial submodules: \(M\) itself and the zero module \(0 = \{ 0_M \}\). A submodule \(N\) of \(M\) is nontrivial if \(N \neq M\) and \(N \neq 0\).
Lemma 11.2.4 (Submodule tests)
Let \(R\) be a ring with \(1 \neq 0\) and let \(M\) be a left \(R\)-module. Let \(N\) be a nonempty subset of \(M\).
- (Two-step test) \(N\) is an \(R\)-submodule of \(M\) if and only if \(n+n' \in N\) and \(rn \in N\) for all \(n,n'\in N\) and \(r\in R\).
- (One-step test) \(N\) is an \(R\)-submodule of \(M\) if and only if \(rn+n'\in N\) for all \(n,n'\in N\) and \(r\in R\).
Proof (of Lemma 11.2.4)
Exercise.
Example 11.2.5
Let \(R\) be a ring and let \(M\) be a subset of \(R\). Then \(M\) is a left (respectively, right) \(R\)-submodule of \(R\) if and only if \(M\) is a left (respectively, right) ideal of \(R\).
Exercise 11.2.6
Let \(R\) be a ring and let \(A\) and \(B\) be submodules of an \(R\)-module \(M\). Then the sum of \(A\) and \(B\), \[ A + B := \{ a + b \mid a \in A, b \in B \}, \] is a submodule of \(M\). If \(\{A_i \ | \ i\in I\}\) is a collection of submodules of \(M\), then the intersection \( \bigcap_{i\in I} A_i\) is a submodule of \(M\).
Exercise 11.2.7
Let \(R\) be a commutative ring with \(1\neq 0\), let \(I\) be an ideal of \(R\) and let \(M\) be an \(R\)-module. Show that \[ IM := \left\{\sum_{k=1}^n j_km_k \mid n \geqslant 0, j_k \in I, m_k \in M \text{ for } 1 \leqslant k \leqslant n \right\} \] is a submodule of \(M\).
Example 11.2.8
When \(R\) is a field, the submodules of a vector space \(V\) are precisely the subspaces of \(V\). When \(R = \mathbb{Z}\), then the class of \(R\)-modules is simply the class of all abelian groups. The submodules of a \(\mathbb{Z}\)-module \(M\) coincide with the subgroups of the abelian group \(M\).
Given an \(R\)-module \(M\), the ring \(R\) is sometimes referred to as the ring of scalars, by analogy to the vector space case. Given an action of a ring of scalars on a module, we can sometimes produce an action of a different ring of scalars on the same set, producing a new module structure.
Lemma 11.2.9 (Restriction of scalars)
Let \(\phi\!: R \to S\) be a ring homomorphism. Any left \(S\)-module \(M\) may be regarded via restriction of scalars as a left \(R\)-module with \(R\)-action defined by \(r m := \phi(r)m\) for any \(m\in M\). In particular, if \(R\) is a subring of a ring \(S\), then any left \(S\)-module \(M\) may be regarded via restriction of scalars as a left \(R\)-module with \(R\)-action defined by the action of the elements of \(R\) viewed as elements of \(S\).
Proof (of Lemma 11.2.9)
Let \(r,s \in R\) and \(m,n \in M\). One checks that the axioms in the definition of a module hold for the given action using properties of ring homomorphisms. For example: \[ (r+s)m=\phi(r + s)m= (\phi(r)+\phi(s))m=\phi(r)m + \phi(s)m=rm+sm. \] The remaining properties are left as an exercise.
Note that the second module structure on \(M\) obtained via restriction of scalars is induced by the original module structure, so the two are related. In general, one can give different module structures on the same abelian group over different, possibly unrelated, rings.
Example 11.2.10
If \(I\) is an ideal of a ring \(R\), applying restriction of scalars along the quotient homomorphism \(q\!:R\to R/I\) tells us that any left \(R/I\)-module is also a left \(R\)-module. In particular, applying this to the \(R/I\)-module \(R/I\) makes \(R/I\) a left and right \(R\)-module by restriction of scalars along the quotient homomorphism. Thus the \(R\)-action on \(R/I\) is given by \[ r \cdot (a + I) := ra + I. \]
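For a concrete instance (a worked example added for illustration): take \(R = \mathbb{Z}\) and \(I = 6\mathbb{Z}\), so \(q\!:\mathbb{Z}\to\mathbb{Z}/6\) is the usual quotient map. Restriction of scalars gives the familiar \(\mathbb{Z}\)-action on \(\mathbb{Z}/6\), for example \[ 4 \cdot (5 + 6\mathbb{Z}) = 20 + 6\mathbb{Z} = 2 + 6\mathbb{Z}. \]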
The next example explains why restriction of scalars is called a restriction.
Example 11.2.11
Let \(R\) be a subring of \(S\), and let \(i\!: R \to S\) be the inclusion map, which is a ring homomorphism. Applying restriction of scalars to an \(S\)-module \(M\) via \(i\) is the same as simply restricting our scalars to the elements of \(R\).
11.3 Module homomorphisms and isomorphisms
Definition 11.3.1
Given \(R\)-modules \(M\) and \(N\), an \(R\)-module homomorphism from \(M\) to \(N\) is a function \(f\!: M \to N\) such that for all \(r \in R\) and \(m,n \in M\) we have
- \(f(m+n)=f(m)+f(n)\)
- \(f(rm) = rf(m)\).
Remark 11.3.2
The condition \(f(m+n)=f(m)+f(n)\) says that \(f\) is a homomorphism of abelian groups, and the condition \(f(rm) = rf(m)\) says that \(f\) is \(R\)-linear, meaning that it preserves the \(R\)-action. Since \(f\) is a homomorphism of abelian groups, it follows that \(f(0) = 0\) must hold.
Definition 11.3.3
Let \(M\) and \(N\) be vector spaces over a field \(F\). A linear transformation from \(M\) to \(N\) is an \(F\)-module homomorphism \(M \to N\).
Example 11.3.4
Let \(R\) be a commutative ring and \(M\) be an \(R\)-module. For each \(r \in R\), the multiplication map \(\mu_r\!: M \to M\) given by \(\mu_r(m) = rm\) is a homomorphism of \(R\)-modules: indeed, by the definition of \(R\)-module we have \[ \mu_r(m+n) = r(m+n) = rm+rn = \mu_r(m) + \mu_r(n), \] and \[ \mu_r(sm) = r(sm) = (rs)m = (sr)m = s (rm) = s \mu_r(m). \] Note that if \(R\) is not commutative, the left multiplication map \(\mu_r\!:M\to M\) need not be a homomorphism of (left) \(R\)-modules.
Definition 11.3.5
An \(R\)-module homomorphism \(h\!: M \to N\) is an \(R\)-module isomorphism if there is an \(R\)-module homomorphism \(g: N \to M\) such that \(h \circ g = \mathrm{id}_N\) and \(g \circ h = \mathrm{id}_M\). We say \(M\) and \(N\) are isomorphic, denoted \(M \cong N\), if there exists an isomorphism \(M \to N\).
To check that an \(R\)-module homomorphism \(f\!: M \to N\) is an isomorphism, it is sufficient to check that it is bijective.
Exercise 11.3.6
Let \(f\!: M \to N\) be a homomorphism of \(R\)-modules. Show that if \(f\) is bijective, then its set-theoretic inverse \(f^{-1}\!: N \to M\) is an \(R\)-module homomorphism. Therefore, every bijective homomorphism of \(R\)-modules is an isomorphism.
One should think of a module isomorphism as a relabelling of the names of the elements of the module. If two modules are isomorphic, that means that they are essentially the same, up to renaming the elements.
Definition 11.3.7
Let \(f\!: M \to N\) be a homomorphism of \(R\)-modules. The kernel of \(f\) is \[ \ker(f) := \{ m \in M \mid f(m) = 0 \}. \] The image of \(f\), denoted \(\mathrm{im}(f)\) or \(f(M)\), is \[ \mathrm{im}(f) := \{ f(m) \mid m \in M \}. \]
Exercise 11.3.8
Let \(R\) be a ring with \(1 \neq 0\), let \(M\) be an \(R\)-module, and let \(N\) be an \(R\)-submodule of \(M\). Then the inclusion map \(i\!: N \to M\) is an \(R\)-module homomorphism.
Exercise 11.3.9
If \(f\!: M \to N\) is an \(R\)-module homomorphism, then \(\ker(f)\) is an \(R\)-submodule of \(M\) and \(\mathrm{im}(f)\) is an \(R\)-submodule of \(N\).
Definition 11.3.10
Let \(R\) be a ring and let \(M\) and \(N\) be \(R\)-modules. Then \(\mathrm{Hom}_R(M,N)\) denotes the set of all \(R\)-module homomorphisms from \(M\) to \(N\), and \(\mathrm{End}_R(M)\) denotes the set \(\mathrm{Hom}_R(M,M)\). We call \(\mathrm{End}_R(M)\) the endomorphism ring of \(M\), and elements of \(\mathrm{End}_R(M)\) are called endomorphisms of \(M\).
The endomorphism ring of an \(R\)-module \(M\) is called that because it is a ring, with multiplication given by composition of endomorphisms, \(0\) given by the zero map (the constant equal to \(0\)), and \(1\) given by the identity map. However, two homomorphisms from \(M\) to \(N\) are not composable unless \(M = N\), so \(\mathrm{Hom}_R(M,N)\) is not a ring.
When \(R\) is commutative, \(\mathrm{Hom}_R(M,N)\) is, however, an \(R\)-module; let us describe its \(R\)-module structure. Given \(f, g \in \mathrm{Hom}_R(M,N)\), \(f+g\) is the map defined by \[ (f+g)(m) := f(m) + g(m), \] and given \(r \in R\) and \(f \in \mathrm{Hom}_R(M,N)\), \(r \cdot f\) is the \(R\)-module homomorphism defined by \[ (r \cdot f) (m) := r \cdot f(m) = f(rm). \] The zero element of \(\mathrm{Hom}_R(M,N)\) is the zero map, the constant equal to \(0_N\).
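As a small worked example (added for illustration): let us compute \(\mathrm{Hom}_{\mathbb{Z}}(\mathbb{Z}/2, \mathbb{Z}/4)\). Any homomorphism \(f\) is determined by \(f([1]_2)\), and \[ 2f([1]_2) = f([2]_2) = f([0]_2) = 0 \] forces \(f([1]_2) \in \{[0]_4, [2]_4\}\). Thus \(\mathrm{Hom}_{\mathbb{Z}}(\mathbb{Z}/2, \mathbb{Z}/4)\) has exactly two elements, the zero map and the map sending \([1]_2 \mapsto [2]_4\), and as a \(\mathbb{Z}\)-module it is isomorphic to \(\mathbb{Z}/2\).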
Lemma 11.3.11
Let \(M\) and \(N\) be \(R\)-modules over a commutative ring \(R\). Then the addition and multiplication by scalars defined above make \(\mathrm{Hom}_R(M,N)\) an \(R\)-module.
Proof (of Lemma 11.3.11)
There are many things to check, including:
- The addition and the \(R\)-action are both well-defined: given \(f,g\in \mathrm{Hom}_R(M,N)\) and \(r\in R\), we always have \(f+g, rf\in \mathrm{Hom}_R(M,N)\).
- The axioms of an \(R\)-module are satisfied for \(\mathrm{Hom}_R(M,N)\).
We leave the details as exercises.
We will see later that for an \(n\)-dimensional vector space \(V\) over a field \(F\), there is an isomorphism of vector spaces \(\mathrm{End}_F(V)\cong M_n(F)\). This says that every linear transformation \(T\!:V\to V\) corresponds to some \(n\times n\) matrix. However, the story for general \(R\)-modules is a lot more complicated.
Lemma 11.3.12
For any commutative ring \(R\) with \(1\neq 0\) and any \(R\)-module \(M\) there is an isomorphism of \(R\)-modules \(\mathrm{Hom}_R(R,M)\cong M\).
Proof (of Lemma 11.3.12)
Let \(f\!:M\to \mathrm{Hom}_R(R,M)\) be given for each \(m\in M\) by \(f(m)=\phi_m\) where \(\phi_m\) is the map defined by \(\phi_m(r)=rm\) for all \(r\in R\). Now we have many things to check:
- \(f\) is well-defined, meaning that for any \(m\in M\), its image \(f(m) = \phi_m\) is an element of \(\mathrm{Hom}_R(R,M)\), since \[ \phi_m(r_1+r_2)=(r_1+r_2)m=r_1m+r_2m=\phi_m(r_1)+\phi_m(r_2) \] \[ \phi_m(r_1r_2)=(r_1r_2)m=r_1(r_2m)=r_1\phi_m(r_2) \] for all \(r_1,r_2\in R\).
- \(f\) is an \(R\)-module homomorphism, since \[ \phi_{m_1+m_2}(r)=r(m_1+m_2)=rm_1+rm_2=\phi_{m_1}(r)+\phi_{m_2}(r) \] \[ \phi_{r'm}(r)=r(r'm)=(rr')m=r'(rm)=r'\phi_{m}(r) \]
- \(f\) is injective, since \(\phi_m=\phi_{m'}\) implies in particular that \(\phi_m(1_R)=\phi_{m'}(1_R)\), which by definition of \(\phi_{-}\) means that \(m=m'\).
- \(f\) is surjective, since for \(\psi \in \mathrm{Hom}_R(R,M)\) we have \(\psi(r)=\psi(r1_R)=r\psi(1_R)\) for all \(r\in R\), so \(\psi=\phi_{\psi(1_R)}\).
This shows that \(f\) is an \(R\)-module isomorphism.
Remark 11.3.13
Let \(R\) be a commutative ring with \(1 \neq 0\) and let \(M\) be an \(R\)-module. Then \(M\) is also an \(\mathrm{End}_R(M)\)-module with the action \(\phi m=\phi(m)\) for any \(\phi\in \mathrm{End}_R(M)\), \(m\in M\).
Definition 11.3.14
Let \(R\) be a ring, let \(M\) be an \(R\)-module, and let \(N\) be a submodule of \(M\). The quotient module \(M/N\) is the quotient group \(M/N\) with \(R\)-action defined by \[ r(m + N) := rm + N \] for all \(r \in R\) and \(m + N \in M/N\).
Lemma 11.3.15
Let \(R\) be a ring, let \(M\) be an \(R\)-module, and let \(N\) be a submodule of \(M\). The quotient module \(M/N\) is an \(R\)-module, and the quotient map \(q\!: M \to M/N\) is an \(R\)-module homomorphism with kernel \(\ker(q) = N\).
Proof (of Lemma 11.3.15)
Among the many things to check here, we will only check the well-definedness of the \(R\)-action on \(M/N\), and leave the others as exercises. To check well-definedness, consider \(m+N=m'+N\). Then \(m-m'\in N\), so \(r(m-m')\in N\) by the definition of submodule. This gives that \(rm-rm'\in N\), hence \(rm+N=rm'+N\).
Definition 11.3.16
Given an \(R\)-module \(M\) and a submodule \(N\) of \(M\), the map \(q\!: M \to M/N\) is the canonical quotient map, or simply the canonical map from \(M\) to \(M/N\).
Example 11.3.17
If \(R\) is a field, quotient modules are the same thing as quotient vector spaces. When \(R = \mathbb{Z}\), recall that \(\mathbb{Z}\)-modules are the same as abelian groups. Quotients of \(\mathbb{Z}\)-modules coincide with quotients of abelian groups.
Theorem 11.3.18 (Universal mapping property for quotient modules)
Let \(N\) be a submodule of \(M\), let \(T\) be an \(R\)-module, and let \(f\!: M \to T\) be an \(R\)-module homomorphism. If \(N \subseteq \ker f\), then the function \[ \begin{aligned} \overline{f}\!: M/N &\longrightarrow & T \\ m+N &\longmapsto& f(m) \end{aligned} \] is a well-defined \(R\)-module homomorphism. In fact, \(\overline{f}\!: M/N \to T\) is the unique \(R\)-module homomorphism such that \(\overline{f} \circ q = f\), where \(q\!: M \to M/N\) denotes the canonical map.
Proof (of Theorem 11.3.18)
By 817, we already know that \(\overline{f}\) is a well-defined homomorphism of groups under \(+\) and that it is the unique one such that \(\overline{f} \circ q = f\). It remains only to show \(\overline{f}\) is an \(R\)-linear map: \[ \overline{f}(r (m +N)) = \overline{f} (rm + N) = f(rm) = r f(m) = r \overline{f}(m + N), \] where the third equation uses that \(f\) preserves scaling.
Theorem 11.3.19 (First Isomorphism Theorem)
Let \(N\) be an \(R\)-module and let \(h: M \to N\) be an \(R\)-module homomorphism. Then \(\ker(h)\) is a submodule of \(M\) and there is an \(R\)-module isomorphism \(M/\ker(h) \cong \mathrm{im}(h)\).
Proof (of Theorem 11.3.19)
If we forget the multiplication by scalars in \(R\), by the First Isomorphism Theorem for Groups, we know that there is an isomorphism of abelian groups under \(+\), given by \[ \begin{aligned} \overline{h}\!: M/\mathrm{ker}(h) &\longrightarrow & \mathrm{im}(h) \\ m+\mathrm{ker}(h) &\longmapsto& h(m). \end{aligned} \] It remains only to show this map preserves multiplication by scalars. And indeed: \[ \begin{aligned} \overline{h}(r(m+\ker(h))) & = \overline{h}(rm+\ker(h)) & \textrm{by definition of the \(R\)-action on } M/\ker(h)\\ & = h(rm) & \textrm{by definition of } \overline{h} \\ & = rh(m) & \textrm{ since \(h\) is an \(R\)-module homomorphism} \\ & = r \overline{h}(m+ \ker h) & \textrm{by definition of } \overline{h}. \end{aligned} \]
Theorem 11.3.20 (Diamond Isomorphism Theorem)
Let \(A\) and \(B\) be submodules of \(M\), and let \(A + B = \{a+b \mid a \in A, b \in B\}\). Then \(A + B\) is a submodule of \(M\), \(A \cap B\) is a submodule of \(A\), and there is an \(R\)-module isomorphism \((A + B)/B \cong A/(A \cap B)\).
Proof (of Theorem 11.3.20)
We know that \(A+B\) and \(A \cap B\) are submodules of \(M\). By the Diamond Isomorphism Theorem for Groups, there is an isomorphism of abelian groups \[\begin{aligned} h\!: A/(A \cap B) &\longrightarrow (A+B)/B \\ a + (A\cap B) &\longmapsto a + B\end{aligned} \] It remains only to show \(h\) preserves multiplication by scalars: \[ h(r(a+(A \cap B))) = h(ra + A \cap B) = ra + B = r(a +B) = rh(a + (A \cap B)). \]
Theorem 11.3.21 (Cancelling Isomorphism Theorem)
Let \(A\) and \(B\) be submodules of \(M\) with \(A \subseteq B\). Then there is an \(R\)-module isomorphism \((M/A)/(B/A) \cong M/B\).
Proof (of Theorem 11.3.21)
From 817, we know that \(B/A\) is a subgroup of \(M/A\) under \(+\). Given \(r \in R\) and \(b +A \in B/A\) we have \(r(b+A) = rb + A\), which belongs to \(B/A\) since \(rb \in B\). This proves \(B/A\) is a submodule of \(M/A\). By the Cancelling Isomorphism Theorem for Groups, there is an isomorphism of abelian groups \[\begin{aligned} h\!: (M/A)/(B/A) &\longrightarrow M/B \\ (m+A) + B/A &\longmapsto m + B \end{aligned} \] and it remains only to show this map is \(R\)-linear: \[ \begin{aligned} h(r((m+A) + B/A)) = & h(r(m+A) + B/A) = h((rm + A) + B/A) \\ & = rm + B = r(m +B)\\ & = r h((m+A) + B/A). \end{aligned} \]
Theorem 11.3.22 (Lattice Isomorphism Theorem)
Let \(R\) be a ring, let \(N\) be an \(R\)-submodule of an \(R\)-module \(M\), and let \(q\!: M \to M/N\) be the quotient map. Then the function \[ \begin{aligned} \Psi\!: \{\, R\text{-submodules of } M \text{ containing } N \,\} &\longrightarrow \{\, R\text{-submodules of } M/N \,\} \\ K &\longmapsto K/N \end{aligned} \] is a bijection, with inverse defined by \[ \Psi^{-1}(T) := q^{-1}(T) = \{ a \in M \mid a+N \in T \} \] for each \(R\)-submodule \(T\) of \(M/N\). Moreover, \(\Psi\) and \(\Psi^{-1}\) preserve sums and intersections of submodules.
Proof (of Theorem 11.3.22)
From 817, we know there is a bijection between the set of subgroups of \(M\) that contain \(N\) and the set of subgroups of the quotient group \(M/N\), given by the same map \(\Psi\). We just need to prove that these maps send submodules to submodules. If \(K\) is a submodule of \(M\) containing \(N\), then by the Cancelling Isomorphism Theorem we know that \(K/N\) is a submodule of \(M/N\). If \(T\) is a submodule of \(M/N\), then \(q^{-1}(T)\) is an abelian group, by 817. For \(r \in R\) and \(m \in q^{-1}(T)\), we have \(q(m) \in T\), and hence \(q(rm) = rq(m) \in T\) too, since \(T\) is a submodule. This proves \(q^{-1}(T)\) is a submodule.
11.4 Module generators, bases and free modules
Definition 11.4.1
Let \(M\) be an \(R\)-module. A linear combination of finitely many elements \(a_1,\dots,a_n\) of \(M\) is an element of \(M\) of the form \(r_1a_1 + \dots + r_na_n\) for some \(r_1,\ldots,r_n \in R\).
Definition 11.4.2
Let \(R\) be a ring with \(1 \neq 0\) and let \(M\) be an \(R\)-module. For a subset \(A\) of \(M\), the submodule of \(M\) generated by \(A\) is \[ RA := \{r_1a_1 + \dots + r_na_n \mid n \geq 0, r_i \in R, a_i\in A\}. \] We say \(M\) is generated by \(A\) if \(M=RA\). If \(M\) is an \(F\)-vector space, we may say that \(M\) is spanned by a set \(A\) instead of generated by \(A\).
A module \(M\) is finitely generated if there is a finite subset \(A\) of \(M\) that generates \(M\). If \(A = \{a\}\) has a single element, the module \(RA = Ra\) is called cyclic.
Exercise 11.4.3
Let \(M\) be an \(R\)-module and let \(A \subseteq M\). Then \(RA\) is the smallest submodule of \(M\) containing \(A\), that is, \[ RA = \bigcap\limits_{\substack{A\subseteq N \\ N \text{ submodule of } M}} N. \]
Exercise 11.4.4
Being finitely generated and being cyclic are \(R\)-module isomorphism invariants.
Example 11.4.5
Let \(R\) be a ring with \(1 \neq 0\).
- \(R = R1\) is cyclic.
- \(R \oplus R\) is generated by \(\{(1,0),(0,1)\}\).
- \(R[x]\) is generated as an \(R\)-module by the set \(\{1,x,x^2,\ldots, x^n,\ldots\}\) of monic monomials in the variable \(x\).
- Let \(M = \mathbb{Z}[x,y]\). Then \(M\) is generated by
- \(\{1,x,y\}\) as a ring,
- \(\{1,y,y^2,\ldots, y^n,\ldots\}\) as a \(\mathbb{Z}[x]\)-module, and
- \(\{x^iy^j \mid i,j \in \mathbb{Z}_{\geqslant 0}\}\) as a group (\(\mathbb{Z}\)-module).
Lemma 11.4.6
Let \(R\) be a ring with \(1 \neq 0\), let \(M\) be an \(R\)-module, and let \(N\) be an \(R\)-submodule of \(M\).
- If \(M\) is finitely generated as an \(R\)-module, then so is \(M/N\).
- If \(N\) and \(M/N\) are finitely generated as \(R\)-modules, then so is \(M\).
Proof (of Lemma 11.4.6)
The proof of (2) will be a problem set question. To show (1), note that if \(M=RA\) then \(M/N=R\bar{A}\), where \(\bar{A}=\{a+N \mid a\in A\}\).
Definition 11.4.7
Let \(M\) be an \(R\)-module and let \(A\) be a subset of \(M\). The set \(A\) is linearly independent if whenever \(r_1,\ldots,r_n \in R\) and \(a_1,\ldots ,a_n\) are distinct elements of \(A\) satisfying \(r_1a_1 + \dots + r_na_n = 0\), then \(r_1 = \dots = r_n = 0\). Otherwise \(A\) is linearly dependent.
Definition 11.4.8
A subset \(A\) of an \(R\)-module \(M\) is a basis of \(M\) if \(A\) is linearly independent and generates \(M\). An \(R\)-module \(M\) is a free \(R\)-module if \(M\) has a basis.
We will later see that over a field, every module is free. However, when \(R\) is not a field, there are \(R\)-modules that are not free; in fact, most modules are not free.
Example 11.4.9
Here are some examples of free modules:
- If we think of \(R\) as a module over itself, it is free with basis \(\{1\}\).
- The module \(R \oplus R\) is free with basis \(\{(1,0),(0,1)\}\).
- The \(R\)-module \(R[x]\) is free, and \(\{1,x,x^2,\ldots, x^n,\ldots\}\) is a basis.
- Let \(M = \mathbb{Z}[x,y]\). Then \(\{1,y,y^2,\ldots, y^n,\ldots\}\) is a basis for the \(\mathbb{Z}[x]\)-module \(M\), and \(\{x^iy^j \mid i,j \in \mathbb{Z}_{\geqslant 0}\}\) is a basis for the \(\mathbb{Z}\)-module \(M.\)
Example 11.4.10
\(\mathbb{Z}/2\) is not a free \(\mathbb{Z}\)-module. Indeed, suppose that \(A\) is a basis for \(\mathbb{Z}/2\) and \(a\in A\). Then \(2a=0\) while \(2 \neq 0\) in \(\mathbb{Z}\), so \(A\) cannot be linearly independent, a contradiction.
Lemma 11.4.11
If \(A\) is a basis of \(M\) then every nonzero element \(0\neq m\in M\) can be written uniquely as \(m=r_1a_1 + \dots + r_na_n\) with \(a_i\) distinct elements of \(A\) and \(r_i\neq 0\).
Proof (of Lemma 11.4.11)
Suppose that \(m\neq 0\) and \(A_1,A_2\) are finite subsets of \(A\) such that \[ m=\sum_{a\in A_1}r_aa=\sum_{b\in A_2}s_bb \] for some \(r_a, s_b \in R\). Then \[ \sum_{a\in A_1\cap A_2} (r_a-s_a)a+\sum_{a\in A_1\setminus A_2} r_aa-\sum_{a \in A_2\setminus A_1} s_aa=0. \] Since \(A\) is a linearly independent set, we conclude that \(r_a=s_a\) for \(a\in A_1\cap A_2\), \(r_a=0_R\) for \(a\in A_1\setminus A_2\), and \(s_a=0_R\) for \(a \in A_2\setminus A_1\). Set \[ B := \{a \in A_1\cap A_2 \mid r_a \neq 0_R\}. \] Then \[ m = \displaystyle\sum_{a\in B}r_aa \] is the unique way of writing \(m\) as a linear combination of elements of \(A\) with nonzero coefficients.
Theorem 11.4.12 (Universal mapping property for free modules)
Let \(R\) be a ring, \(M\) be a free \(R\)-module with basis \(B\), \(N\) be any \(R\)-module, and let \(j: B \to N\) be any function. Then there is a unique \(R\)-module homomorphism \(h: M \to N\) such that \(h(b) = j(b)\) for all \(b \in B\).
Proof (of Theorem 11.4.12)
We have two things to prove: existence and uniqueness.
Existence: Any \(0\neq m\in M\) can be written uniquely as \[ m=r_1b_1+\dots+r_nb_n \] with \(b_i\in B\) distinct and \(0 \neq r_i \in R\). Define \(h\!: M \to N\) by \[ \begin{cases} h(r_1b_1+\dots+r_nb_n) = r_1j(b_1) + \cdots +r_nj(b_n) & \text{ if } r_1b_1 + \cdots + r_nb_n \neq 0\\ h(0_M)=0_N \end{cases} \] One can check that this satisfies the conditions to be an \(R\)-module homomorphism (exercise!).
Uniqueness: Let \(h:M\to N\) be an \(R\)-module homomorphism such that \(h(b_i)=j(b_i)\). Then in particular \(h\!:(M,+)\to (N,+)\) is a group homomorphism and therefore \(h(0_M)=0_N\) by properties of group homomorphisms. Furthermore, if \(m=r_1b_1+\dots+r_nb_n\) then \[ h(m)=h(r_1b_1+\dots+r_nb_n)=r_1h(b_1)+\dots+r_nh(b_n)=r_1j(b_1)+\dots+r_nj(b_n) \] by the definition of homomorphism, and because \(h(b_i)=j(b_i)\).
Corollary 11.4.13
Let \(A\) and \(B\) be sets of the same cardinality, and fix a bijection \(j\!:A\to B\). If \(M\) and \(N\) are free \(R\)-modules with bases \(A\) and \(B\) respectively, then there is an isomorphism of \(R\)-modules \(M \cong N\).
Proof (of Corollary 11.4.13)
Let \(g\!:M\to N\) and \(h\!:N\to M\) be the module homomorphisms induced by the bijection \(j\!:A\to B\) and its inverse \(j^{-1}\!:B\to A\), which exist by the UMP for free modules. We will show that \(h\) and \(g\) are inverse homomorphisms. First, note that \(g \circ h\!:N\to N\) is an \(R\)-module homomorphism and \((g \circ h)(b) = g(j^{-1}(b))=j(j^{-1}(b))=b\) for every \(b\in B\). Since the identity map \(\mathrm{id}_N\) is an \(R\)-module homomorphism and \(\mathrm{id}_N(b)=b\) for every \(b\in B\), by the uniqueness in the UMP for free modules we have \(g \circ h = \mathrm{id}_N\). Similarly, one shows that \(h \circ g = \mathrm{id}_M\).
The corollary gives that, up to isomorphism, there is only one free module with basis \(A\), provided such a module exists. But does a free module generated by a given set \(A\) exist? It turns out it does.
Definition 11.4.14
Let \(R\) be a ring and let \(A\) be a set. The free \(R\)-module generated by \(A\), denoted \(F_R(A)\) is the set of formal sums
\[ \begin{align*} F_R(A) &= \{r_1a_1 + \dots + r_na_n \mid n \geqslant 0, r_i \in R, a_i \in A\} \\ &= \left\lbrace \sum_{a \in A} r_aa \mid r_a \in R, r_a = 0 \text{ for all but finitely many }a \right\rbrace, \end{align*} \] with addition defined by \[ \left(\sum_{a \in A} r_aa\right) + \left(\sum_{a \in A} s_aa \right) = \sum_{a \in A} (r_a + s_a)a \] and \(R\)-action defined by \[ r \left(\sum_{a \in A} r_aa \right) = \sum_{a \in A} (rr_a)a. \]
Exercise 11.4.15
This construction \(F_R(A)\) results in an \(R\)-module, which is free with basis \(A\), and \(F_R(A)\cong \bigoplus_{a\in A}R\).
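The formal sums in \(F_R(A)\) can be modeled directly on a computer. Below is a minimal Python sketch, assuming \(R = \mathbb{Z}\), that represents \(\sum_{a} r_a a\) as a dictionary from generators to nonzero coefficients, so that dictionary equality matches equality of formal sums.

```python
# Formal sums in F_Z(A) as dicts {generator: nonzero integer coefficient}.

def add(x, y):
    """Add two formal sums, dropping zero coefficients to keep a canonical form."""
    out = dict(x)
    for a, r in y.items():
        out[a] = out.get(a, 0) + r
        if out[a] == 0:
            del out[a]
    return out

def scale(r, x):
    """The Z-action r * (sum r_a a) = sum (r r_a) a."""
    return {a: r * ra for a, ra in x.items() if r * ra != 0}

# 2*(3a + b) + (-b) = 6a + b in F_Z({a, b, c})
x = {"a": 3, "b": 1}
print(add(scale(2, x), {"b": -1}))   # {'a': 6, 'b': 1}
```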
Theorem 11.4.16 (Uniqueness of rank over commutative rings)
Let \(R\) be a commutative ring with \(1 \neq 0\) and let \(M\) be a free \(R\)-module. If \(A\) and \(B\) are both bases for \(M\), then \(A\) and \(B\) have the same cardinality, meaning that there exists a bijection \(A \to B\).
Proof (of Theorem 11.4.16)
You will show this in the next problem set (at least in the case where \(M\) has a finite basis).
Definition 11.4.17
Let \(R\) be a commutative ring with \(1\neq 0\) and let \(M\) be a free \(R\)-module. The rank of \(M\) is the cardinality of any basis of \(M\).
Example 11.4.18
Let \(R\) be a commutative ring with \(1 \neq 0\). The rank of \(R^n\) is \(n\). Note that any free \(R\)-module of rank \(n\) must be isomorphic to \(R^n\).
Earlier, we described the \(R\)-module structure on the direct sum of \(R\)-modules; this is how we construct \(R^n\), by taking the direct sum of \(n\) copies of the \(R\)-module \(R\). This construction can also be described as the direct product of \(n\) copies of \(R\). However, the direct sum and direct product are two different constructions.
Definition 11.4.19
Let \(R\) be a ring. Let \(\{ M_a \}_{a \in J}\) be a collection of \(R\)-modules. The direct product of the R-modules \(M_a\) is the Cartesian product \[ \prod_{a \in J} M_a := \{ (m_a)_{a \in J} \mid m_a \in M_a \} \] with addition defined by \[ (m_a)_{a \in J}+(n_a)_{a \in J} := (m_a+n_a)_{a \in J} \] and \(R\)-action defined by \[ r(m_a)_{a \in J} = (rm_a)_{a \in J}. \]
The direct sum of the \(R\)-modules \(M_a\) is the \(R\)-submodule \(\bigoplus_{a \in J} M_a\) of the direct product \(\prod_{a \in J} M_a\) given by \[ \bigoplus_{a \in J} M_a = \{(m_a)_{a \in J} \mid m_a = 0 \text{ for all but finitely many } a \}. \]
Exercise 11.4.20
The direct sum and the direct product of an arbitrary family of \(R\)-modules are \(R\)-modules.
Example 11.4.21
Suppose that the index set is finite, say \(J = \{1, \ldots, n\}\). Let \(M_1,\ldots,M_n\) be \(R\)-modules. The direct product module \(M_1 \times \dots \times M_n\) is the abelian group \(M_1 \times \dots \times M_n\) with ring action given by \(r(m_1,\ldots,m_n) = (rm_1,\ldots,rm_n)\) for all \(r \in R\) and \(m_i \in M_i\). Comparing the definitions we see that \[ M_1 \times \dots \times M_n = M_1 \oplus \dots \oplus M_n. \]
If \(M_i=R\) for \(1 \leqslant i \leqslant n\), then we denote \(R^n = \underbrace{R\times \dots \times R}_n=\underbrace{R\oplus \dots \oplus R}_n\).
It is useful to talk about maps from the factors/summands to the direct product/direct sum and conversely.
Definition 11.4.22
For \(i\in J\) the inclusion of the \(i\)-th factor into a direct product or direct sum is the map \[ \iota_i\!: M_i \to \prod_{a \in J} M_a \text{ or } \iota_i\!: M_i \to \bigoplus_{a \in J} M_a, \iota_i(m)=(m_a)_{a \in J}, \text{ where } m_a=\begin{cases} m & \text{ if } a = i \\ 0 & \text{ if } a \neq i \end{cases}. \]
For \(i\in J\) the \(i\)-th projection map from a direct product or a direct sum module is \[ \pi_i\!: \prod_{a \in J} M_a \to M_i \text{ or } \pi_i:\bigoplus_{a \in J} M_a \to M_i, \pi_i \left((m_a)_{a \in J}\right)=m_i. \]
Lemma 11.4.23
Projections from direct products or sums of \(R\)-modules, inclusions into direct products or sums of \(R\)-modules, and products of \(R\)-module homomorphisms are \(R\)-module homomorphisms. Furthermore, inclusions are injective, projections are surjective, and \[ \pi_i\circ \iota_i=\mathrm{id}_{M_i}. \] Also, \(\iota_i(M_i)\) is an \(R\)-submodule of the direct product/sum which is isomorphic to \(M_i\).
Note, however, that \(\iota_i\circ\pi_i\neq \mathrm{id}\).
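A two-factor toy computation makes the asymmetry explicit. This is an illustrative Python sketch with \(J = \{1,2\}\) and \(M_1 = M_2 = \mathbb{Z}\), an assumption chosen for concreteness.

```python
# pi_2(iota_2(m)) = m for all m, but iota_2(pi_2(-)) is not the identity
# on Z x Z: it forgets the first coordinate.
def iota2(m):
    return (0, m)        # inclusion of the second factor

def pi2(t):
    return t[1]          # projection onto the second factor

assert pi2(iota2(7)) == 7                 # pi_2 after iota_2 is the identity
assert iota2(pi2((3, 7))) == (0, 7)       # iota_2 after pi_2 sends (3,7) to (0,7)
```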
12. Vector Spaces
We now turn our focus to vector spaces: modules over fields.
12.1 Classification of vector spaces and dimension
Recall that for a subset \(A\) of an \(F\)-vector space \(V\), the span of \(A\), denoted \(\mathrm{span}(A)\), is the subspace generated by \(A\):
\[ \mathrm{span}(A) := \left\{\sum_{i=1}^n c_i a_i \mid n \geqslant 0, c_i\in F, a_i \in A \right\}. \]
Lemma 12.1.1
Suppose \(I\) is a linearly independent subset of an \(F\)-vector space \(V\) and \(v \in V \setminus \mathrm{span}(I)\). Then \(I \cup \{v\}\) is also linearly independent.
Proof of Lemma 12.1.1
Let \(w_1, \dots, w_n\) be any list of distinct elements of \(I \cup \{v\}\) and suppose that \(\sum_i c_i w_i = 0\) for some \(c_i \in F\). If none of the \(w_i\)'s is equal to \(v\), then \(c_i = 0\) for all \(i\), since \(I\) is linearly independent. Without loss of generality, say \(w_1 = v\). If \(c_1 = 0\) then \(c_i = 0\) for all \(i\) by the same reasoning as in the previous case. If \(c_1 \ne 0\), then
\[ v = \sum_{i \geqslant 2} \frac{-c_i}{c_1} w_i \in \mathrm{span}(I), \]contrary to assumption. This proves that \(I \cup \{v\}\) is a linearly independent set.
Theorem 12.1.2
Let \(V\) be an \(F\)-vector space and assume \(I \subseteq S \subseteq V\) are subsets such that \(I\) is linearly independent and \(S\) spans \(V\). Then there is a subset \(B\) with \(I \subseteq B \subseteq S\) such that \(B\) is a basis.
Before we prove this theorem, we note the following corollary:
Corollary 12.1.3 (Every vector space has a basis)
Every vector space \(V\) has a basis, and hence is a free module. Moreover, every linearly independent subset of \(V\) is contained in some basis, and every set of vectors that spans \(V\) contains some basis.
Proof of Corollary 12.1.3
For the first part, apply the theorem with \(I = \varnothing\) and \(S = V\). For the second and third, use \(I\) arbitrary and \(S = V\), and \(I = \varnothing\) and \(S\) arbitrary, respectively.
Example 12.1.4
\(\mathbb{R}\) has a basis as a \(\mathbb{Q}\)-vector space; just don't ask me what it looks like.
Proof of Theorem 12.1.2
Let \(\mathcal{P}\) denote the collection of all subsets \(X\) of \(V\) such that \(I \subseteq X \subseteq S\) and \(X\) is linearly independent. We make \(\mathcal{P}\) into a poset by the order relation given by set containment \(\subseteq\). We note that \(\mathcal{P}\) is not empty since, for example, \(I \in \mathcal{P}\).
Let \(\mathcal{T}\) be any nonempty chain in \(\mathcal{P}\). Let \(Z = \bigcup_{Y \in \mathcal{T}} Y\). We claim \(Z \in \mathcal{P}\). Given \(z_1, \dots, z_m \in Z\), for each \(i\) we have \(z_i \in Y_i\) for some \(Y_i \in \mathcal{T}\). Since \(\mathcal{T}\) is totally ordered, one of \(Y_1, \dots, Y_m\) contains all the others and hence contains all the \(z_i\)'s. Since that set is linearly independent, \(z_1, \dots, z_m\) admit no nontrivial linear relation. Thus \(Z\) is linearly independent. Since \(\mathcal{T}\) is non-empty, \(Z \supseteq I\) and hence \(Z \in \mathcal{P}\). It is an upper bound for \(\mathcal{T}\) by construction.
By Zorn's Lemma, \(\mathcal{P}\) has a maximal element \(B\), which we claim is a basis for \(V\). Note that \(B\) is linearly independent and \(I \subseteq B \subseteq S\) by construction. We need to show that it spans \(V\). Suppose not. Since \(S\) spans \(V\), if \(S \subseteq \mathrm{span}(B)\), then \(\mathrm{span}(B)\) would have to be all of \(V\). So, there is at least one \(v \in S\) such that \(v \notin \mathrm{span}(B)\), and set \(X := B \cup \{v\}\). Clearly, \(I \subset X \subseteq S\) and, by Lemma 12.1.1, \(X\) is linearly independent. This shows that \(X\) is an element of \(\mathcal{P}\) that is strictly bigger than \(B\), contrary to the maximality of \(B\).
Corollary 12.1.5
Let \(F\) be a field and \(W\) be a subspace of the \(F\)-vector space \(V\). Then every basis of \(W\) extends to a basis of \(V\), that is, if \(B\) is a basis of \(W\) then there exists a basis \(\tilde{B}\) of \(V\) such that \(B\) is a subset of \(\tilde{B}\).
Proof of Corollary 12.1.5
Apply Theorem 12.1.2 with \(B = I\) and \(S = V\). Since \(B\) is a basis of \(W\), \(B\) is linearly independent, and \(B\) remains linearly independent when regarded as a subset of \(V\).
Remark 12.1.6
It is not true that, with the notation of the previous Corollary, if \(\tilde{B}\) is a basis of \(V\) then there exists a basis \(B\) of \(W\) such that \(B\) is a subset of \(\tilde{B}\). For instance, take \(F = \mathbb{R}\), \(V = \mathbb{R}^2\), \(\tilde{B} = \{(1,0), (0,1)\}\) and \(W\) the subspace spanned by \((1,1)\).
The following is an essential property of vector spaces that eventually will allow us to compare bases in terms of size.
Lemma 12.1.7 (Exchange Property)
Let \(B\) be a basis for a vector space \(V\) and consider any set of linearly independent vectors \(I \subseteq V\). Then there is a subset \(A \subseteq B\) such that
- \(|I|=|A|\) (meaning that there is a bijection \( I \leftrightarrow A\)), and
- \((B \smallsetminus A) \cup I\) is also a basis for \(V\).
Proof of Lemma 12.1.7
First we show we can swap out one element of \(B\) for a single vector \(v\). In fact, we will show the stronger statement that for any subset \(B_0 \subseteq B\) and any element \(v\notin \mathrm{span}(B_0)\), there is some \(b\in B\smallsetminus B_0\) such that \((B\smallsetminus \{b\}) \cup \{v\}\) is a basis for \(V\).
Since \(B\) is a basis, we can write
\[ v = \sum_i \lambda_i b_i \]for some elements \(b_i \in B\) and \(\lambda_i \in F\). Since \(v\notin \mathrm{span}(B_0)\), we have \(\lambda_i\neq 0\) for some \(b_i\notin B_0\); say \(b_1 \notin B_0\) and \(\lambda_1\neq 0\). We claim that \(B' := (B\smallsetminus \{ b_1\}) \cup \{v\}\) is a basis.
\(B'\) is linearly independent: By Lemma 12.1.1, it suffices to show that \(v\notin \mathrm{span}(B\smallsetminus \{b_1\})\). Indeed, if it were, then writing \(v\) as a linear combination of \(B\smallsetminus \{b_1\}\) and the linear combination \(v = \sum_i \lambda_i b_i\) with \(\lambda_1\neq 0\) would give two different expressions of \(v\) in the basis \(B\), a contradiction.
\(B'\) spans \(V\): First, we note that it suffices to show that \(b_1\) is in the span of \(B'\): once we have shown this, we can write any element of \(V\) as a linear combination of elements of \(B\), and then replace the term involving \(b_1\) by a linear combination of elements of \(B'\) by substituting. To this end, by the contrapositive of Lemma 12.1.1, it suffices to show that \(B' \cup \{b_1\}\) is linearly dependent, and rearranging our starting expression \(v = \sum_i \lambda_i b_i\) gives such a nontrivial relation, since the coefficient of \(v\) in it is \(1 \neq 0\).
This concludes the case of one element. For the general case, we set up a Zorn's Lemma argument. Consider the collection of pairs \((I', A')\) with \(I' \subseteq I\), \(A' \subseteq B\), and \(|I'|=|A'|\) with the property that \((B\smallsetminus A') \cup I'\) is a basis for \(V\). By a Zorn's Lemma argument (left as an exercise), there is a maximal such pair under the partial order \((I',A')\leq (I'',A'')\) if \(I'\subseteq I''\) and \(A'\subseteq A''\). Let \((I_0,A_0)\) be a maximal element. We will argue that \(I_0=I\).
To obtain a contradiction, suppose otherwise, and let \(a\in I \smallsetminus I_0\). Apply the special case above to the basis \((B\smallsetminus A_0) \cup I_0\) and the subset \(I_0\): since \(I\) is linearly independent, \(a\notin \mathrm{span}(I_0)\). Then by the special case, there is some \(b\in B\smallsetminus A_0\) such that \( (B\smallsetminus (A_0 \cup \{b\})) \cup (I_0 \cup \{a\})\) is a basis. This contradicts the maximality of \((I_0,A_0)\), so we deduce that \(I_0=I\) as required.
It follows that all bases for the same vector space have the same cardinality.
Theorem 12.1.8 (Dimension Theorem)
Let \(V\) be a vector space, and \(B,B'\) be two bases for \(V\). Then \(|B| = |B'|\), meaning there is a bijection \(B\leftrightarrow B'\).
Proof of Theorem 12.1.8
Let \(B, B'\) be two bases for \(V\). Applying the Exchange Property with \(I=B'\), there is a subset \(A\subseteq B\) with \(|A|=|B'|\), so there is an injective map \(B' \hookrightarrow B\). Switching roles, there is an injective map \(B\hookrightarrow B'\). It follows from a result in set theory (the Cantor-Bernstein theorem) that there is a bijection \( B\leftrightarrow B'\).
Corollary 12.1.9
Let \(F\) be a field. Let \(V\) be a vector space with a basis \(B\) and \(V'\) be a vector space with a basis \(B'\). Then \[ V\cong V' \quad \Longleftrightarrow \quad |B|=|B'|.\]
Proof of Corollary 12.1.9
The (\(\Leftarrow\)) implication is a special case of Corollary 11.4.13. For the (\(\Rightarrow\)) implication, we claim that if \(\phi: V\to V'\) is an isomorphism, then \(\phi(B)\) is a basis for \(V'\):
\(\phi(B)\) is linearly independent: Let \(\phi(b_1),\dots,\phi(b_n)\in \phi(B)\) and \( \lambda_1 \phi(b_1) + \cdots+ \lambda_n \phi(b_n) = 0\). Then \[ 0= \lambda_1 \phi(b_1) + \cdots+ \lambda_n \phi(b_n) = \phi( \lambda_1 b_1 + \cdots+ \lambda_n b_n)\] and \(\phi\) is injective so \( \lambda_1 b_1 + \cdots+ \lambda_n b_n=0\). Since \(B\) is linearly independent, we have \(\lambda_1 = \cdots=\lambda_n=0\).
\(\phi(B)\) spans \(V'\): Since \(\phi\) is surjective, for any \(v'\in V'\) we can write \(v'=\phi(v)\) for some \(v\in V\). Then we can write \(v=\lambda_1 b_1 + \cdots + \lambda_n b_n\) for some \(\lambda_i\in F\) and \(b_i\in B\), and then \[ v' = \phi(v) = \phi(\lambda_1 b_1 + \cdots + \lambda_n b_n) = \lambda_1 \phi(b_1) + \cdots+ \lambda_n \phi(b_n).\]
Thus, \(\phi(B)\) is a basis for \(V'\). Since \(\phi\) is a bijection, we have \(|B|=|\phi(B)|\), and by the previous Theorem, \(|\phi(B)|=|B'|\).
Definition 12.1.10
The dimension of a vector space \(V\), denoted \(\dim_F(V)\) or \(\dim(V)\), is the cardinality of any of its bases.
Example 12.1.11
\(\dim_F(F^n) = |\{e_1,e_2,\ldots,e_n\}| = n.\)
While one can talk about infinite cardinals, we'll generally say that dimension is a natural number or \(\infty\). We restate the results above specifically in the finite-dimensional case for easy reference.
Theorem 12.1.12 (Classification of finite dimensional vector spaces)
Let \(F\) be a field. Let \(V\) be a vector space of dimension \(n\), and \(W\) be a vector space of dimension \(m\).
- (1) \(V\cong W\) if and only if \(n = m\).
- (2) \(V \cong F^n\).
- (3) \(F^n \cong F^m\) if and only if \(m=n\).
Proof of Theorem 12.1.12
Part (1) follows from Corollary 12.1.9. Parts (2) and (3) are special cases, in light of Example 12.1.11.
Remark 12.1.13
Let us consider a few infinite-dimensional vector spaces.
Example 12.1.14
Consider the vector space \(F[x]\). This cannot be a finite dimensional vector space. Indeed, if \(\{f_1 , \dots , f_n\}\) were a basis, then setting
\[ M = \max_{1 \leqslant j \leqslant n}\{ \deg(f_j)\} \]we see that the element \(x^{M+1}\) would not be in the span of \(\{f_1 , \dots , f_n\}\). We can find a basis for this space, though. Consider the collection \(B = \{1, x, x^2 , \ldots \}\). This set is linearly independent and spans \(F[x]\), thus it forms a basis for \(F[x]\). This basis is countable, so \(\dim_F(F[x])= |\mathbb{N}|\).
Example 12.1.15
Consider the real vector space
\[ V := \mathbb{R}^\mathbb{N} = \mathbb{R}\times \mathbb{R}\times \mathbb{R} \times \cdots. \]This space can be identified with the set of sequences \((a_n)\) of real numbers. One might be interested in a basis for this vector space. At first glance, the most obvious choice for a basis would be \(E = \{e_1,e_2,\ldots\}\). It turns out that \(E\) is a basis for the direct sum \(\bigoplus_{i\in \mathbb{N}}\mathbb{R}\). However, it is immediate that this set does not span \(V\), as \(v = (1,1,\ldots)\) cannot be represented as a finite linear combination of these elements. Since \(v\) is not in \(\mathrm{span}(E)\), we know that \(E \cup \{v\}\) is a linearly independent set. However, this new set \(E \cup \{v\}\) does not span \(V\) either, as \((1, 2, 3, 4, \ldots)\) is not in the span of \(E \cup \{v\}\). We know that \(V\) has a basis, but it can be shown that no countable collection of vectors forms a basis for this space, and in fact \(\dim_\mathbb{R} (\mathbb{R}^\mathbb{N}) =|\mathbb{R}|\).
Example 12.1.17
Since \(\mathbb{Q}\) is a subring of \(\mathbb{R}\), we have that \(\mathbb{R}\) is a \(\mathbb{Q}\)-vector space, and likewise with \(\mathbb{C}\). One can show that \(\dim_{\mathbb{Q}}(\mathbb{R})=|\mathbb{R}|\), and \(\dim_{\mathbb{Q}}(\mathbb{C})=|\mathbb{C}| = |\mathbb{R}|\), so \(\mathbb{R}\cong \mathbb{C}\) as \(\mathbb{Q}\)-vector spaces. In particular, \((\mathbb{R},+)\cong (\mathbb{C},+)\) as groups.
We now deduce some formulas that relate the dimensions of various vector spaces.
Theorem 12.1.18
Let \(W\) be a subspace of a vector space \(V\). Then \[ \dim(V) = \dim(W) + \dim(V/W). \]
Here the dimension of a vector space is understood to be either a nonnegative integer or \(\infty\), and the arithmetic of the formula is understood to follow the rules \(n+\infty=\infty=\infty+\infty\) for any \(n\in \mathbb{Z}_{\geqslant 0}\). The proof follows from Problem #1 in Problem Set #2.
Example 12.1.19
Consider the vector space \(V = \mathbb{R}^2\) and its subspace \(W=\mathrm{span}\{e_1\}\). Then the quotient vector space \(V/W\) is, by definition,
\[ V/W=\{(x,y)+W \mid (x,y)\in \mathbb{R}^2\}. \]Looking at each coset we see that
\[ (x,y)+W=(x,y)+\mathrm{span}\{e_1\}=\{(x,y)+(a,0)\mid a\in \mathbb{R}\}=\{(t,y)\mid t\in \mathbb{R}\}, \]so \((x,y)+W\) is geometrically a line parallel to the \(x\)-axis and having the \(y\)-intercept \(y\). It is intuitively natural to identify such a line with its intercept, which gives a map
\[ V/W\to \mathrm{span}\{e_2\} \quad (x,y)+W \mapsto (0,y). \]It turns out that this map is a vector space isomorphism, hence \[ \dim(V/W) = \dim(\mathrm{span}\{e_2\}) = 1 \] and we can check that \[ \dim(W) + \dim(V/W) = 1+1 = 2 = \dim(V). \]
If \(V\) and \(W\) are both infinite dimensional vector spaces, it can happen that \(V/W\) is finite dimensional but also that it is infinite dimensional.
Example 12.1.20
Let \(V=F[x]\), which we saw in Example 12.1.14 is an infinite dimensional vector space over \(F\). Fix a polynomial \(f\) with \(\deg(f)=d\), and note that the ideal \((f)\) of \(F[x]\) generated by \(f\) is also an \(F\)-vector subspace of \(F[x]\) via restriction of scalars. We will show later that \(\dim(F[x]/(f))=d\). In contrast, the subspace \(E = \mathrm{span}\{1, x^2, x^4, \ldots\}\) of polynomials involving only even powers of \(x\) satisfies \(\dim(F[x]/E)=\infty\), since the images of \(x, x^3, x^5, \ldots\) in \(F[x]/E\) are linearly independent.
Definition 12.1.21
Let \(T\!: V \to W\) be a linear transformation. The nullspace of \(T\) is \(\ker(T)\). The rank of \(T\) is \(\dim(\mathrm{im}(T))\).
Corollary 12.1.22 (Rank-Nullity Theorem)
Let \(f\!: V \to W\) be a linear transformation. Then \[ \dim(\ker(f)) + \dim(\mathrm{im}(f)) = \dim(V). \]
Proof of Corollary 12.1.22
By the First Isomorphism Theorem for modules we have \(V/\ker(f)\cong\mathrm{im}(f)\), thus \[ \dim\left(V/\ker(f)\right)=\dim(\mathrm{im}(f)). \] By Theorem 12.1.18, we have \[ \dim(V)=\dim(\ker(f))+\dim\left(V/\ker(f)\right). \] Thus \[ \dim(V)=\dim(\ker(f))+\dim\left(V/\ker(f)\right) = \dim(\ker(f)) + \dim(\mathrm{im}(f)). \]
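For a concrete numerical check, here is a short sympy sketch, assuming \(F = \mathbb{Q}\), \(V = F^3\), \(W = F^2\), and an arbitrary sample matrix, that computes the rank and the nullity independently and compares their sum with \(\dim(V)\).

```python
# Verify rank + nullity = dim(V) for the map f(v) = A v, computing
# dim(im f) and dim(ker f) separately.
import sympy as sp

A = sp.Matrix([[1, 0, 2],
               [0, 1, 3]])        # f: F^3 -> F^2
rank = A.rank()                   # dim(im f) = 2
nullity = len(A.nullspace())      # dim(ker f) = 1, found from the null space
assert rank + nullity == A.cols   # 2 + 1 == 3 = dim(V)
```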
12.2 Linear transformations and homomorphisms between free modules
Definition 12.2.1 (The matrix of a homomorphism between free modules)
Let \(R\) be a commutative ring with \(1\neq 0\). Let \(V\) be a finitely generated free \(R\)-module of rank \(n\), and let \(W\) be a finitely generated free \(R\)-module of rank \(m\). Let \(B=\{b_1, \dots, b_n\}\) and \(C=\{c_1, \dots, c_m\}\) be ordered bases of \(V\) and \(W\). Given an \(R\)-module homomorphism \(f\!: V \to W\), we define elements \(a_{j,i}\in R\) for \(1 \leqslant j \leqslant m\) and \(1 \leqslant i \leqslant n\) by the formulas
\[ f(b_i) = \sum_{j=1}^m a_{j,i} c_j. \tag{12.2.1}\label{eq-12-2-aij} \]The matrix
\[ [f]_B^C= \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \\ \end{bmatrix} \]is said to represent the homomorphism \(f\) with respect to the bases \(B\) and \(C\).
Remark 12.2.2
By Lemma 11.4.11, the coefficients \(a_{j,i}\) in equation \(\eqref{eq-12-2-aij}\) are uniquely determined by the \(f(b_i)\) and the elements of \(C\). The coefficients \(a_{j,i}\) corresponding to \(f(b_i)\) form the \(i\)th column of \([f]_B^C\). Note that \([f]_B^C\) is an \(m\times n\) matrix with entries in \(R\).
Definition 12.2.3
Let \(V\) and \(W\) be finite-dimensional \(F\)-vector spaces of dimension \(n\) and \(m\) with ordered bases \(B\) and \(C\), respectively, and let \(f\!:V\to W\) be a linear transformation. The matrix \([f]_B^C\) is called the matrix of the linear transformation \(f\) with respect to the bases \(B\) and \(C\).
Example 12.2.4
If \(\mathrm{id}_V\!: V \to V\) is the identity automorphism of an \(n\)-dimensional free \(R\)-module \(V\), then for any basis \(B\) of \(V\) we have \(\mathrm{id}_V(b_i) = b_i\) for all \(i\) and hence
\[ [\mathrm{id}_V]^B_B = I_n. \]
Example 12.2.5
Let \(P_3\) denote the \(F\)-vector space of polynomials of degree at most 3 (including the zero polynomial) and consider the linear transformation \(d\!:P_3\to P_3\) given by taking the derivative: \(d(f)=f'\). Let \(B=\{1,x,x^2,x^3\}\). Then
\[ [d]_B^B= \begin{bmatrix} 0 & 1 &0 & 0 \\ 0 &0 & 2 & 0 \\ 0& 0& 0& 3 \\ 0 & 0 &0 & 0 \\ \end{bmatrix}. \]
Example 12.2.6
Let \(F\) be a field and consider a linear transformation \(f\!:V\to W\), where \(V=F^n\) and \(W=F^m\). Consider also the standard ordered bases \(B\) and \(C\), i.e. \(b_i=e_i\in V\) and \(c_i=e_i\in W\). Then for any
\[ v = \begin{bmatrix} \ell_1\\ \vdots\\ \ell_n \end{bmatrix} =\sum_i \ell_i b_i \]in \(V\) we have
\[ f \left( \sum \ell_i b_i \right) = \sum_i \ell_i f(b_i). \]Each \(f(b_i)\) can be written uniquely as a linear combination of the \(c_j\)'s as in \(\eqref{eq-12-2-aij}\):
\[ f(b_i) = \sum_j a_{j,i} c_j. \]Then we get
\[ f(v) = \sum_i\ell_i\left( \sum_{j} a_{j,i} c_j \right)= \sum_j \left(\sum_i a_{j,i} \ell_i\right) c_j. \]In other words, we have
\[ f(v) = \begin{bmatrix} \sum_i a_{1,i} \ell_i \\ \vdots\\ \sum_i a_{m,i} \ell_i \end{bmatrix} = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \\ \end{bmatrix} \cdot \begin{bmatrix} \ell_1\\\vdots\\ \ell_n\end{bmatrix} =[f]_B^C\cdot v. \]This says that any linear transformation \(f\!:F^n\to F^m\) is given by multiplication by a matrix, since we noticed above that \(f(v) = [f]_B^C\cdot v\). The same type of statement holds for free modules over commutative rings, and we will show it below in Proposition 12.2.7.
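To see Example 12.2.5 and the computation above in action, here is a short sympy sketch verifying that differentiating on \(P_3\) agrees with multiplying \(B\)-coordinates by \([d]_B^B\); the polynomial \(p\) is an arbitrary choice for illustration.

```python
# Check that d/dx on P_3 matches multiplication by the matrix [d]_B^B
# from Example 12.2.5, with B = {1, x, x^2, x^3}.
import sympy as sp

x = sp.symbols("x")
B = [1, x, x**2, x**3]                       # ordered basis of P_3
D = sp.Matrix([[0, 1, 0, 0],
               [0, 0, 2, 0],
               [0, 0, 0, 3],
               [0, 0, 0, 0]])                # [d]_B^B

p = 5 - x + 7*x**3
coords = sp.Matrix([5, -1, 0, 7])            # [p]_B
lhs = sp.expand(sp.diff(p, x))               # d(p) = p' = 21*x**2 - 1
rhs = sum(c * b for c, b in zip(D * coords, B))   # polynomial with coordinates [d]_B^B [p]_B
assert sp.expand(rhs - lhs) == 0
```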
Proposition 12.2.7
Let \(R\) be a commutative ring with \(1\neq 0\). Let \(V\) and \(W\) be finitely generated free \(R\)-modules of ranks \(n\) and \(m\) respectively. Fixing ordered bases \(B\) for \(V\) and \(C\) for \(W\) gives an isomorphism of \(R\)-modules
\[ \mathrm{Hom}_R(V, W) \cong \mathrm{Mat}_{m,n}(R) \qquad f\mapsto [f]_B^C. \]If \(V=W\), so that in particular \(m=n\), and \(B=C\), then the above map is an \(R\)-module isomorphism \(\mathrm{End}_R(V)\cong\mathrm{Mat}_n(R)\), and an isomorphism of rings as well.
Proof of Proposition 12.2.7
Let \(\varphi\!:\mathrm{Hom}_R(V, W) \to \mathrm{Mat}_{m,n}(R)\) be defined by \(\varphi(f)=[f]_B^C\). We need to check that \(\varphi\) is a homomorphism of \(R\)-modules, which translates into \([f+g]_B^C=[f]_B^C+[g]_B^C\) and \([\lambda f]_B^C=\lambda[f]_B^C\) for any \(f,g \in \mathrm{Hom}_R(V, W)\) and \(\lambda\in R\). Let \(A=[f]_B^C\) and \(A'=[g]_B^C\). Then
\[ (f+g)(b_i)=f(b_i)+g(b_i)= \sum_j a_{j,i} c_j+ \sum_j a'_{j,i} c_j= \sum_j (a_{j,i}+a'_{j,i}) c_j \]gives \([f+g]_B^C=A+A'\) and
\[ (\lambda f)(b_i)=\lambda\left( \sum_j a_{j,i} c_j\right)= \sum_j (\lambda a_{j,i}) c_j \]gives \([\lambda f]_B^C=\lambda A\). We leave the proof that for \(f,g\in \mathrm{End}_R(V)\) we have \([f\circ g]_B^B=[f]_B^B[g]_B^B\) as an exercise.
Finally, the argument described in Example 12.2.6 also works over any commutative ring \(R\), and it can be adapted to any two chosen bases \(B\) and \(C\), showing that \(\varphi\) is a bijection.
Corollary 12.2.8
For any field \(F\) and finite-dimensional \(F\)-vector spaces \(V\) and \(W\) of dimension \(n\) and \(m\) respectively, \(\dim(\mathrm{Hom}_F(V, W))=mn\).
Proof of Corollary 12.2.8
The isomorphism \(\mathrm{Hom}_F(V, W) \cong \mathrm{Mat}_{m,n}(F)\) gives
\[ \dim \left(\mathrm{Hom}_F(V, W) \right) = \dim \left( \mathrm{Mat}_{m,n}(F) \right)=mn. \]

Exercise 12.2.9
Let \(R\) be a commutative ring and \(V\) be a free module with a basis \(B\). Let \(M\) be an arbitrary \(R\)-module and let \(\phi: V\to M\) be an \(R\)-module homomorphism. Then:
- \(\phi\) is injective if and only if \(\phi(B)\) is linearly independent.
- \(\phi\) is surjective if and only if \(\phi(B)\) generates \(M\).
- \(\phi\) is an isomorphism if and only if \(\phi(B)\) is a basis for \(M\).
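Over a field, these criteria are easy to test in practice: by Proposition 12.2.7 we may represent \(\phi\) by its matrix, whose columns are the coordinate vectors of \(\phi(B)\), and compare ranks. A small sympy sketch (our own illustration, with a made-up matrix):

```python
# phi: Q^2 -> Q^3, represented by the matrix whose columns are phi(b_1), phi(b_2)
import sympy as sp

M = sp.Matrix([[1, 0], [2, 1], [0, 3]])
n, m = M.cols, M.rows

print(M.rank() == n)  # True: columns are linearly independent, so phi is injective
print(M.rank() == m)  # False: columns do not span Q^3, so phi is not surjective
```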
Definition 12.2.10
Let \(R\) be a commutative ring and \(V\) be a free module with basis \(B=\{b_1,\dots,b_n\}\). Consider the \(R\)-module homomorphism \(\phi:V\to R^n\) with \(\phi(b_i)=e_i\). There is a unique such map by the UMP for free modules, and it is an isomorphism by the previous exercise. We call \(\phi(v)\) the vector of \(B\)-coordinates of \(v\), denoted \([v]_B\).
Remark 12.2.11
Note that
\[ [v]_B = (r_1,\dots,r_n) \Longleftrightarrow v = r_1 b_1 + \cdots + r_n b_n, \]since \(\phi(r_1 b_1 + \cdots + r_n b_n) = r_1 e_1 + \cdots + r_n e_n = (r_1,\dots,r_n)\) and \(\phi\) is injective.
Proposition 12.2.12
Let \(R\) be a commutative ring. Let \(V\) be a free module with ordered basis \(B\) and \(W\) be a free module with ordered basis \(C\). Let \(f:V\to W\) be a linear transformation. Then
\[ [f(v)]_C = [f]_B^C \cdot [v]_B \]for all \(v\in V\).
Proof of Proposition 12.2.12
Let \(v\in V\) and write \([v]_B = (r_1,\dots,r_n)\), so \(v=\sum_j r_j b_j\). Write \([f]_B^C=[a_{i,j}]\). Then
\[ f(v) = f\!\left(\sum_j r_j b_j\right) = \sum_j r_j f(b_j) = \sum_j r_j \left(\sum_i a_{i,j} c_i\right) = \sum_i \left(\sum_j a_{i,j} r_j\right) c_i. \]Thus the \(i\)-th entry of \([f(v)]_C\) is \(\sum_j a_{i,j} r_j\). On the other hand, multiplying out \([f]_B^C \cdot [v]_B = [a_{i,j}](r_1,\dots,r_n)\), the \(i\)-th entry is also \(\sum_j a_{i,j} r_j\).
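Here is a small numerical check of Proposition 12.2.12 (a sympy sketch, not part of the notes), using the derivative map and matrix from Example 12.2.5:

```python
# Verify [d(v)]_B = [d]_B^B [v]_B for v = 1 + x + x^2 + x^3 in P_3.
import sympy as sp

M = sp.Matrix([[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3], [0, 0, 0, 0]])  # [d]_B^B
v_coords = sp.Matrix([1, 1, 1, 1])   # [v]_B with B = {1, x, x^2, x^3}
dv_coords = sp.Matrix([1, 2, 3, 0])  # [d(v)]_B, since d(v) = 1 + 2x + 3x^2
assert M * v_coords == dv_coords
```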
12.3 Change of basis
Definition 12.3.1
Let \(V\) be a finitely generated free module over a commutative ring \(R\), and let \(B\) and \(C\) be bases of \(V\). Let \(\mathrm{id}_V\) be the identity map on \(V\). Then \([\mathrm{id}_V]_B^{C}\) is a matrix called the change of basis matrix from \(B\) to \(C\).
In Theorem 12.3.6 we will show that \([\mathrm{id}_V]_B^{C}\) is invertible with inverse \(\left([\mathrm{id}_V]_B^{C}\right)^{-1}=[\mathrm{id}_V]_{C}^B\).
Example 12.3.2
Consider the subspace \(V = P_2\) of \(F[x]\) of all polynomials of degree up to \(2\), and the bases \(B = \{1, x, x^2\}\) and \(C = \{1,x-2,(x-2)^2\}\) of \(V\). We calculate the change of basis matrix. We have
\[ \begin{aligned} \mathrm{id}_V(1) &=1 ,\\ \mathrm{id}_V(x) &=2\cdot1+1\cdot(x-2), \\ \mathrm{id}_V(x^2) &=4\cdot1 +4\cdot(x-2)+1\cdot(x-2)^2. \end{aligned} \]Thus, the change of basis matrix is given by \([\mathrm{id}_V]_B^{C} = \begin{bmatrix} 1 & 2 & 4\\ 0 & 1 & 4\\ 0 & 0 & 1 \end{bmatrix}.\)
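We can confirm this computation with sympy (a sketch, not part of the notes): substituting \(x = t + 2\) expresses each \(b_j\) in powers of \(x-2\), and the resulting coefficients should match the columns of \([\mathrm{id}_V]_B^{C}\).

```python
# Check Example 12.3.2: column j of [id]_B^C holds the C-coordinates of b_j.
import sympy as sp

x, t = sp.symbols('x t')
P = sp.Matrix([[1, 2, 4], [0, 1, 4], [0, 0, 1]])  # claimed [id]_B^C

for j, b in enumerate([sp.Integer(1), x, x**2]):
    # powers of t below correspond to powers of (x - 2)
    c = sp.Poly(b.subs(x, t + 2), t).all_coeffs()[::-1]  # ascending order
    c = list(c) + [sp.Integer(0)] * (3 - len(c))
    assert list(P[:, j]) == c
```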
Lemma 12.3.3
If \(V,W,U\) are finitely generated free \(R\)-modules with ordered bases \(B\), \(C\), and \(D\), respectively, and \(f\!: V \to W\) and \(g\!: W \to U\) are \(R\)-module homomorphisms, then \([g\circ f]_B^D=[g]_C^D \cdot [f]_B^C.\)
Proof of Lemma 12.3.3
It suffices to check that \([g\circ f]_B^D \cdot p=[g]_C^D \cdot [f]_B^C \cdot p\) for any \(p\in R^n\) where \(n=\mathrm{rank}(V)\). (In fact, we can just take \(p=e_j\) for each \(j\), since \(Ae_j\) is the \(j\)th column of \(A\).) We can write \(p=[v]_B\) for some \(v\in V\). Then
\[ [g\circ f]_B^D [v]_B = [(g\circ f)(v)]_D = [g(f(v))]_D = [g]_C^D [f(v)]_C = [g]_C^D ([f]_B^C [v]_B) = ([g]_C^D [f]_B^C) [v]_B. \]

Definition 12.3.4
Let \(V\) be a finitely generated free module over a commutative ring \(R\). Two \(R\)-module homomorphisms \(f,g: V \to V\) are similar if there is a bijective linear transformation \(h: V \to V\) such that \(g = h\circ f \circ h^{-1}\). Two \(n \times n\) matrices \(A\) and \(B\) with entries in \(R\) are similar if there is an invertible \(n \times n\) matrix \(P\) such that \(B = PAP^{-1}\).
Remark 12.3.5
For elements \(A,B\in \textrm{GL}_n(R)\), the notions of similar and conjugate are the same.
Theorem 12.3.6
Let \(V, W\) be finitely generated free modules over a commutative ring \(R\), let \(B\) and \(B'\) be bases of \(V\), let \(C\) and \(C'\) be bases of \(W\), and let \(f: V \to W\) be a homomorphism. Then
\[ [f]_{B'}^{C'} = [\mathrm{id}_W]_C^{C'} [f]_B^C [\mathrm{id}_V]_{B'}^{B} \]In particular, if \(g\!: V \to V\) is an \(R\)-module homomorphism, then \([g]_B^B\) and \([g]_{B'}^{B'}\) are similar.
Proof of Theorem 12.3.6
Since \(f=\mathrm{id}_W\circ f\circ \mathrm{id}_V\), by Lemma 12.3.3 we have
\[ [f]_{B'}^{C'} = [\mathrm{id}_W]_C^{C'} [f]_B^C [\mathrm{id}_V]_{B'}^{B}. \]
Now set \(V=W,B=C,B'=C'\) and \(f=g\) in the displayed equation to obtain
\[ [g]_{B'}^{B'} = [\mathrm{id}_V]_B^{B'} [g]_B^B [\mathrm{id}_V]_{B'}^{B}=P[g]_B^B P^{-1}, \]where \(P=[\mathrm{id}_V]_B^{B'}\). Note that \(P\) is indeed invertible with \(P^{-1}=[\mathrm{id}_V]_{B'}^{B}\), since Lemma 12.3.3 gives \([\mathrm{id}_V]_B^{B'}[\mathrm{id}_V]_{B'}^{B}=[\mathrm{id}_V]_{B'}^{B'}=I\) and \([\mathrm{id}_V]_{B'}^{B}[\mathrm{id}_V]_B^{B'}=[\mathrm{id}_V]_{B}^{B}=I\); this proves the claim made after Definition 12.3.1.
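For a concrete instance of this similarity (a sympy sketch, not part of the notes), take \(g = d\), the derivative map on \(P_2\), with \(B\) and \(B' = C\) as in Example 12.3.2. Here one finds \([d]_{B'}^{B'} = [d]_B^B\), since \(d(1)=0\), \(d(x-2)=1\), and \(d((x-2)^2)=2(x-2)\), and the conjugation identity checks out:

```python
import sympy as sp

M_B  = sp.Matrix([[0, 1, 0], [0, 0, 2], [0, 0, 0]])  # [d]_B^B on P_2
M_Bp = sp.Matrix([[0, 1, 0], [0, 0, 2], [0, 0, 0]])  # [d]_{B'}^{B'}, computed by hand
P = sp.Matrix([[1, 2, 4], [0, 1, 4], [0, 0, 1]])     # [id]_B^{B'} from Example 12.3.2
assert M_Bp == P * M_B * P.inv()
```

We now come to certain special changes of basis and their matrices: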
Definition 12.3.7
Let \(R\) be a commutative ring, let \(M\) be a free \(R\)-module of finite rank \(n\), and let \(B = \{b_1,\dots ,b_n\}\) be an ordered basis for \(M\). An elementary basis change operation on the basis \(B\) is one of the following three types of operations to produce a new basis \(B'=\{b'_1,\dots,b'_n\}\):
- Replacing \(b_j\) by \(r b_i + b_j\) for some \(i \neq j\) and some \(r\in R\); that is, \(b'_j = rb_i + b_j\) and \(b'_k = b_k\) for \(k\neq j\).
- Replacing \(b_i\) by \(ub_i\) for some \(i\) and some unit \(u\) of \(R\); that is, \(b'_i= u b_i\) and \(b'_k = b_k\) for \(k\neq i\).
- Swapping the indices of \(b_i\) and \(b_j\) for some \(i \neq j\); that is, \(b'_i= b_j\), \(b'_j=b_i\), and \(b'_k = b_k\) for \(k\neq i,j\).
Definition 12.3.8
Let \(R\) be a commutative ring. An elementary column operation on a matrix \(A \in \mathrm{Mat}_{m,n}(R)\) is one of the following three types of operations:
- Adding an element of \(R\) times a column of \(A\) to a different column of \(A\).
- Multiplying a column of \(A\) by a unit of \(R\).
- Interchanging two columns of \(A\).
We define an elementary row operation analogously.
Definition 12.3.9
Let \(R\) be a commutative ring. An elementary matrix over \(R\) is an \(n \times n\) matrix of one of the following three forms:
- For \(r \in R\) and \(1 \leqslant i,j \leqslant n\) with \(i \neq j\), let \(E_{i,j}(r)\) be the matrix with \(1\)s on the diagonal, \(r\) in the \((i,j)\) position, and \(0\) everywhere else.
- For \(u \in R^\times\) and \(1\leqslant i \leqslant n\) let \(E_i(u)\) denote the matrix with \((i,i)\) entry \(u\), \((j,j)\) entry \(1\) for all \(j \neq i\), and \(0\) everywhere else.
- For \(1 \leqslant i,j \leqslant n\) with \(i \neq j\), let \(E_{(i,j)}\) denote the matrix with \(1\) in the \((i,j)\) and \((j,i)\) positions and in the \((l,l)\) positions for all \(l\not \in \{i,j\}\), and \(0\) in all other entries.
Remark 12.3.10
The elementary matrices \(E_i(u)\) and \(E_{(i,j)}\) are symmetric and the transpose of \(E_{i,j}(r)\) is \(E_{j,i}(r)\). In particular, the transpose of an elementary matrix is an elementary matrix.
Lemma 12.3.11
Let \(E\) be an \(n \times n\) elementary matrix.
- \(E\) is the change of basis matrix \([\mathrm{id}]_{B'}^B\) for the corresponding elementary basis change operation from \(B\) to \(B'\).
- If \(A \in \mathrm{Mat}_{m,n}(R)\), then the result of performing the corresponding elementary column operation on \(A\) is the product matrix \(AE\).
Explicitly,
- \(AE_{i,j}(r)\) is the matrix obtained from \(A\) by replacing \[ \mathrm{col}_j(A) \quad \rightsquigarrow \quad \mathrm{col}_j(A) + r \cdot \mathrm{col}_i(A). \]
- \(AE_{i}(u)\) is the matrix obtained from \(A\) by replacing \[ \mathrm{col}_i(A)\quad \rightsquigarrow \quad u \cdot \mathrm{col}_i(A). \]
- \(A E_{(i,j)}\) is the matrix obtained from \(A\) by replacing \[ \begin{aligned} &\mathrm{col}_i(A)\quad \rightsquigarrow \quad \mathrm{col}_j(A)\\ &\mathrm{col}_j(A)\quad \rightsquigarrow \quad \mathrm{col}_i(A) \end{aligned} \]
- If \(B \in \mathrm{Mat}_{n,q}(R)\), then the result of performing the corresponding elementary row operation on \(B\) is the product matrix \(E^T B\).
Explicitly,
- \(E_{i,j}(r) B\) is the matrix obtained from \(B\) by replacing \[ \mathrm{row}_i(B) \quad \rightsquigarrow \quad \mathrm{row}_i(B) + r \cdot \mathrm{row}_j(B). \]
- \(E_{i}(u) B\) is the matrix obtained from \(B\) by replacing \[ \mathrm{row}_i(B)\quad \rightsquigarrow \quad u \cdot \mathrm{row}_i(B). \]
- \(E_{(i,j)} B\) is the matrix obtained from \(B\) by replacing \[ \begin{aligned} &\mathrm{row}_i(B)\quad \rightsquigarrow \quad \mathrm{row}_j(B)\\ &\mathrm{row}_j(B)\quad \rightsquigarrow \quad \mathrm{row}_i(B) \end{aligned} \]
Proof of Lemma 12.3.11
- By definition, the \(j\)-th column of \([\mathrm{id}]_{B'}^B\) gives the coefficients for \(b'_j\) as a linear combination of the elements of \(B\). In each case, we check that the matrix \(E\) agrees with the specified combinations in the definition of the basis operation.
- It suffices to check this for a row vector, since the \(i\)-th row of \(AE\) can be computed as the \(i\)-th row of \(A\) multiplied by \(E\). Then one can verify this by case-by-case multiplication.
- Similar to (2).
Remark 12.3.12
To remember the relationship between elementary matrices and elementary operations, it suffices to remember that
- Row operations correspond to multiplication on the left and column operations correspond to multiplication on the right, and
- The elementary matrix corresponding to an elementary row or column operation is the matrix that results from applying that operation to the identity matrix.
Indeed, (2) follows from taking \(A=I\) or \(B=I\) in the Lemma.
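To make the dictionary concrete, here is a sympy sketch (not part of the notes) for \(E_{1,3}(r)\) with \(n = 3\): we build \(E\) by applying the column operation to the identity matrix, as in (2) of the remark, and then watch right and left multiplication act on a test matrix.

```python
import sympy as sp

r = 5
A = sp.Matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

E = sp.eye(3)
E[0, 2] = r  # E_{1,3}(r): apply col_3 -> col_3 + r*col_1 to the identity

# right multiplication performs the column operation on A
expected = A.copy()
expected[:, 2] = expected[:, 2] + r * expected[:, 0]
assert A * E == expected

# left multiplication by E^T performs the corresponding row operation
expected = A.copy()
expected[2, :] = expected[2, :] + r * expected[0, :]
assert E.T * A == expected
```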
12.4 Determinants
We briefly cover some of the key facts about determinants that we will need later.
Definition 12.4.1
Let \(R\) be a commutative ring. We define the function
\[ \det: \mathrm{Mat}_{n\times n}(R) \to R \]by the rule
\[ \det(A) = \sum_{\sigma\in S_n} \mathrm{sgn}(\sigma) \prod_{i=1}^n {a_{i,\sigma(i)}} \]for a matrix \(A=[a_{i,j}]\). We call \(\det(A)\) the determinant of \(A\).
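The formula can be checked directly against a computer algebra system; the following brute-force sympy sketch (not part of the notes) sums over all of \(S_n\). Of course the sum has \(n!\) terms, so this is a check, not a practical algorithm.

```python
# det(A) computed as the signed sum over all permutations, per Definition 12.4.1.
import itertools
import sympy as sp
from sympy.combinatorics import Permutation

def perm_det(A):
    n = A.rows
    total = sp.Integer(0)
    for p in itertools.permutations(range(n)):
        term = sp.Integer(1)
        for i in range(n):
            term *= A[i, p[i]]
        total += Permutation(list(p)).signature() * term
    return total

A = sp.Matrix([[1, 2, 0], [3, 4, 5], [0, 1, 2]])
assert perm_det(A) == A.det()  # agrees with sympy's built-in determinant
```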
Example 12.4.2
If \(A= \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22}\end{bmatrix}\), then \(\det(A)=a_{11} a_{22} - a_{12} a_{21}\).
If \(A\) is an upper triangular matrix, so that \(a_{ij}=0\) for \(i>j\), then the only permutation contributing a possibly nonzero term to the sum is \(\sigma=\mathrm{id}\), and \(\det(A)\) is the product of the diagonal entries.
Definition 12.4.3
Let \(R\) be a commutative ring. Let \(\phi: \underbrace{R^n \times \cdots \times R^n}_{n-\text{times}} \to R\) be a function. We say that
- \(\phi\) is multilinear if for each \(i=1,\dots,n\) we have
\[ \phi( v_1,\dots,v_{i-1}, v_i + v'_i, v_{i+1},\dots, v_n) = \phi( v_1,\dots,v_{i-1}, v_i, v_{i+1},\dots, v_n) + \phi( v_1,\dots,v_{i-1}, v'_i, v_{i+1},\dots, v_n) \]and
\[ \phi( v_1,\dots,v_{i-1}, r v_i, v_{i+1},\dots, v_n) = r\, \phi( v_1,\dots,v_{i-1}, v_i, v_{i+1},\dots, v_n) \]for all \(v_1,\dots,v_n, v'_i \in R^n\) and \(r\in R\); i.e., when all but one entry is fixed, the resulting function \(R^n \to R\) in the remaining entry is an \(R\)-module homomorphism.
- \(\phi\) is alternating if \(\phi(v_1,\dots,v_n)=0\) whenever \(v_i=v_j\) for some \(i\neq j\).
Lemma 12.4.4
Let \(\phi: \underbrace{R^n \times \cdots \times R^n}_{n-\text{times}} \to R\) be a multilinear alternating function. Then for any \(\sigma\in S_n\) and any vectors \(v_1,\dots,v_n\in R^n\), we have
\[ \phi(v_{\sigma(1)}, v_{\sigma(2)},\dots,v_{\sigma(n)}) = \mathrm{sgn}(\sigma) \phi(v_1, v_2,\dots,v_n). \]

Proof of Lemma 12.4.4
First, we consider the case of the transposition \((1\, 2)\). Note that
\[ \begin{aligned} 0 &= \phi(v_1+v_2, v_1+ v_2,\dots,v_n) = \phi(v_1, v_1+ v_2,\dots,v_n) + \phi(v_2, v_1+ v_2,\dots,v_n) \\ & = \phi(v_1, v_1,\dots,v_n) + \phi(v_2, v_1,\dots,v_n) + \phi(v_1, v_2,\dots,v_n) + \phi(v_2, v_2,\dots,v_n) \\ & = \phi(v_2, v_1,\dots,v_n) + \phi(v_1, v_2,\dots,v_n) , \end{aligned} \]so \(\phi(v_2, v_1,\dots,v_n) = - \phi(v_1, v_2,\dots,v_n)\). The case of an arbitrary transposition follows in the same way. For an arbitrary permutation \(\sigma\), we can write \(\sigma\) as a product of \(t\) transpositions for some \(t\). Applying the case of a single transposition \(t\) times yields \[ \phi(v_{\sigma(1)},\dots,v_{\sigma(n)}) = (-1)^t \phi(v_1,\dots,v_n) = \mathrm{sgn}(\sigma)\, \phi(v_1,\dots,v_n), \]since \(\mathrm{sgn}(\sigma) = (-1)^t\).
Theorem 12.4.5
Let \(R\) be a commutative ring. Identify \(\mathrm{Mat}_{n\times n}(R)\) with \(\underbrace{R^n \times \cdots \times R^n}_{n-\text{times}}\) by mapping \(A\) to the \(n\)-tuple of columns of \(A\). Then \(\det\) is the unique function \(\mathrm{Mat}_{n\times n}(R) \to R\) that is multilinear, alternating, and satisfies \(\det(I) = 1\).
Proof of Theorem 12.4.5 (Sketch)
The verification that \(\det\) has these properties is straightforward but messy. To show uniqueness, we can use multilinearity to show that the value of a function with these properties is determined by its values when each column is a standard basis vector \(e_i\). We can then use the alternating property and Lemma 12.4.4 to show that the value is determined by the value at the identity matrix.
Our next goal is to prove the familiar multiplicative property for determinants.
Proposition 12.4.6
Let \(R\) be a commutative ring. Let \(A\) be a square matrix and let \(B\) be a matrix obtained from \(A\) by a single elementary column operation:
- If the operation is of type I, \(\det(B) = \det(A)\).
- If the operation is of type II, given by multiplying a column of \(A\) by a unit \(u\), then \(\det(B) = u \det(A)\).
- If the operation is of type III, \(\det(B) = - \det(A)\).
In particular, if \(A\) is an arbitrary square matrix and \(E\) is an elementary matrix, then \(\det(AE)=\det(A)\det(E)\).
Proof of Proposition 12.4.6
The first case follows from multilinearity and the alternating property: for notational simplicity, say \(A = (v_1, v_2, \dots)\) and \(B = (v_1 + rv_2, v_2, \dots)\). Then
\[ \det(B) = \det(v_1, v_2, \dots) + r \det(v_2, v_2, \dots) = \det(A) + r \cdot 0 = \det(A) \]The second case is immediate from (the second part of) \(R\)-multilinearity. The last case is a special case of Lemma 12.4.4.
The final claim follows by noting that \(B=AE\) when \(B\) is obtained from \(A\) by the column operation corresponding to \(E\), and that \(\det(E)=1,\,u,\,-1\) in the three cases, respectively.
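A quick numerical confirmation of the three cases (a sympy sketch, not part of the notes), with a made-up \(3\times 3\) matrix:

```python
import sympy as sp

A = sp.Matrix([[2, 1, 0], [1, 3, 1], [0, 1, 4]])
d = A.det()

B1 = A.copy(); B1[:, 1] = B1[:, 1] + 7 * B1[:, 0]     # type I: col_2 += 7*col_1
assert B1.det() == d

B2 = A.copy(); B2[:, 0] = -1 * B2[:, 0]               # type II: scale col_1 by u = -1
assert B2.det() == -d

B3 = A.copy(); B3[:, 0], B3[:, 2] = A[:, 2], A[:, 0]  # type III: swap col_1, col_3
assert B3.det() == -d
```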
Corollary 12.4.7
For \(R = F\) a field, we have \(\det(A) \ne 0\) if and only if \(A\) is invertible.
Proof of Corollary 12.4.7
If \(A\) is not invertible, then the span of the columns of \(A\) is a proper subspace of \(F^n\) and hence the columns of \(A\) must be linearly dependent. Say the \(i\)-th column is a linear combination of the rest: \(v_i = \sum_{j \ne i} c_j v_j\). Then
\[ \det(v_1, \dots, v_n) = \sum_{j \ne i} c_j \det(\text{a matrix with the \(i\)-th and \(j\)-th columns equal}) = 0, \]using multilinearity in the \(i\)-th column and the alternating property. If \(A\) is invertible, then we can write \(A\) as a product of elementary matrices (this is a result that we stated before, but will prove soon). The result thus follows from Proposition 12.4.6 and the fact that \(\det(I_n) = 1\).
Theorem 12.4.8
Let \(R\) be a commutative ring. Then for any matrices \(A, B \in \mathrm{Mat}_{n \times n}(R)\) we have
\[ \det(AB) = \det(A) \det(B). \]

Proof of Theorem 12.4.8
First we will consider the case where \(R=F\) is a field.
If \(A\) is not invertible, neither is \(AB\), since \(\mathrm{im}(AB) \subseteq \mathrm{im}(A)\), and if \(B\) is not invertible, neither is \(AB\), since \(\ker(AB) \supseteq \ker(B)\). So, by Corollary 12.4.7, if either \(A\) or \(B\) is not invertible, both sides of the equation are \(0\).
Assume now that \(A\) and \(B\) are both invertible. Then, since every invertible matrix over a field is a product of elementary matrices (the fact quoted in the proof of Corollary 12.4.7), we can write
\[ A = E_1 \cdots E_n \]and
\[ B = F_1 \cdots F_m \]and hence
\[ AB = E_1 \cdots E_n F_1 \cdots F_m \]where the \(E_i\)'s and \(F_j\)'s are elementary matrices.
Applying Proposition 12.4.6 repeatedly gives
\[ \det(AB) = \det(E_1 \cdots E_n F_1 \cdots F_{m-1}) \det(F_m) = \cdots = \det(E_1) \cdots \det(E_n) \det(F_1) \cdots \det(F_m) \]and similarly
\[ \det(A) \det(B) = \left(\det(E_1) \cdots \det(E_n)\right) \left( \det(F_1) \cdots \det(F_m)\right). \]Now, for an integral domain \(R\), consider its fraction field \(F\), and identify \(R\subseteq F\) as a subring. To compute \(\det(A)\), \(\det(B)\), and \(\det(AB)\) we can replace \(R\) by \(F\), and are done by the field case.
One can apply a similar trick for arbitrary commutative rings, but we'll skip this for now.
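Even without carrying out that argument, we can spot-check the theorem over a ring that is neither a domain nor a field, say \(\mathbb{Z}/6\) (a sympy sketch, not part of the notes; we compute over \(\mathbb{Z}\) and reduce mod \(6\), which is legitimate because reduction is a ring homomorphism):

```python
import sympy as sp

A = sp.Matrix([[2, 3], [4, 1]])
B = sp.Matrix([[5, 2], [3, 3]])

# det(A) = -10 = 2 and det(B) = 9 = 3 (mod 6); their product is 0 in Z/6
assert (A * B).det() % 6 == (A.det() * B.det()) % 6
```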
Proposition 12.4.9
Let \(R\) be a commutative ring. Let \(A\in \mathrm{Mat}_{n\times m}(R)\) and \(B\in \mathrm{Mat}_{m\times n}(R)\) with \(m\geq n\). For a subset \(I=\{i_1,\dots,i_n\} \subseteq [m]\) with \(|I|=n\), let \(A_I\) denote the submatrix of \(A\) with columns indexed by \(I\) (in increasing order). Then
\[ \det(AB) \in \left( \{ \det(A_I) \mid I\subseteq [m],\ |I|=n\} \right). \]

Proof of Proposition 12.4.9
Let \(a_1,\dots,a_m\) be the columns of \(A\). We can write the \(j\)-th column of \(AB\) as \(\sum_{i=1}^m b_{i,j} a_i\). Then, by multilinearity,
\[ \begin{aligned} \det(AB) &= \det \begin{bmatrix} \sum_{i_1=1}^m b_{i_1,1} a_{i_1} & \cdots & \sum_{i_n=1}^m b_{i_n,n} a_{i_n} \end{bmatrix} \\ &= \sum_{1\leq i_1,\dots,i_n \leq m} b_{i_1,1} \cdots b_{i_n,n} \det \begin{bmatrix} a_{i_1} & \cdots & a_{i_n} \end{bmatrix}. \end{aligned} \]
By the alternating property, we can rewrite each
\(\det\begin{bmatrix} a_{i_1} & \cdots & a_{i_n} \end{bmatrix}\)
as either zero, or the determinant of a submatrix with columns
\(i_1 < \cdots < i_n\) arranged in increasing order. More precisely, \(\det\begin{bmatrix} a_{i_1} & \cdots & a_{i_n} \end{bmatrix}\) vanishes whenever two of the indices coincide, and otherwise equals \(\pm\det(A_I)\) for \(I=\{i_1,\dots,i_n\}\) by Lemma 12.4.4. Hence \(\det(AB)\) is an \(R\)-linear combination of the elements \(\det(A_I)\), and therefore lies in the ideal they generate.
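In fact, keeping track of the signs in this argument yields the classical Cauchy-Binet formula \(\det(AB) = \sum_{I} \det(A_I)\det(B^I)\), where \(B^I\) denotes the submatrix of \(B\) with rows indexed by \(I\); this refines the ideal-membership statement above. A sympy spot-check (a sketch, not part of the notes):

```python
import itertools
import sympy as sp

A = sp.Matrix([[1, 2, 3], [4, 5, 6]])    # 2 x 3
B = sp.Matrix([[1, 0], [2, 1], [0, 3]])  # 3 x 2
n, m = A.rows, A.cols

total = sp.Integer(0)
for I in itertools.combinations(range(m), n):
    A_I = A.extract(list(range(n)), list(I))  # columns of A indexed by I
    B_I = B.extract(list(I), list(range(n)))  # rows of B indexed by I
    total += A_I.det() * B_I.det()

assert total == (A * B).det()
```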