Introduction

These are the lecture notes and class materials for Math 817 Introduction to Modern Algebra I in Fall 2025. This is the first of a two-part course on groups, rings, modules, and fields. In this first half, we will discuss group theory, including group actions, and introduce rings. A major goal of this course is to prepare graduate students for the PhD qualifying exam in algebra.

The lecture notes draw heavily on Eloísa Grifo’s Algebra Notes, which in turn draw from earlier lecture notes of Mark Walker and Alexandra Seceleanu. The textbook Abstract Algebra by Dummit and Foote is a good resource covering similar material.

Groups

1. An Introduction to Groups

This class has four major topics: Groups, Rings, Modules, and Fields. Let us begin with group theory.

A group is a basic algebraic structure: it is simple enough to appear in many objects we care about, yet it carries enough structure that we can deduce general statements and theorems.

1.1 Definitions and first examples

Definition 1.1.1 (Binary operation)

A binary operation on a set \(S\) is a function \(S \times S \to S\). If the binary operation is denoted by \(\cdot\), we write \(x \cdot y\) for the image of \((x,y)\).

Remark 1.1.2

We often write \(xy\) instead of \(x \cdot y\) if the operation is clear from context.

Remark 1.1.3 ("Closed under \(\cdot\)")

We say that a set \(S\) is closed under the operation \(\cdot\) when we want to emphasize that for any \(x,y\in S\) the result \(xy\) belongs to \(S\). Closure is part of the definition of a binary operation and is implicitly assumed.

Definition 1.1.4 (Group)

A group is a set \(G\) with a binary operation \(\cdot\) (group multiplication) such that:

  • Associativity: For all \(x,y,z \in G\), \((x\cdot y)\cdot z = x\cdot(y\cdot z)\).
  • Identity element: There exists \(e\in G\) with \(e\cdot x = x = x\cdot e\) for all \(x\in G\).
  • Inverses: For each \(x\in G\) there is \(y\in G\) with \(xy=e=yx\).

The element \(e\) is the identity. For each \(x\in G\), an element \(y\) with \(xy=e=yx\) is an inverse of \(x\). The order of \(G\) is the number of elements in \(G\).

Example 1.1.5 (General linear group)

\[ GL_n(\mathbb{R}) := \{\, \text{invertible } n\times n \text{ matrices with entries in } \mathbb{R}\,\}. \]

This is a group under matrix multiplication: associativity holds, the identity matrix is the identity, and every element has an inverse by definition.

Vaguely, the definition of a group is motivated by the idea that a collection of functions from a set to itself that preserve some extra structure naturally satisfies the three axioms: for example, the general linear group consists of the functions from the vector space \(\mathbb{R}^n\) to itself that preserve the linear structure.

Remark 1.1.6 (Naming groups)

Although a group is the set and the operation, we often refer to it by the underlying set \(G\).

Remark 1.1.7 (Semigroups and monoids)

A set with an associative binary operation is a semigroup; if it has an identity, it is a monoid. While we will not study non-group monoids in this course, they’re useful objects.

Lemma 1.1.8 (Identity and inverses are unique)

  1. The identity \(e\) is unique.
  2. For each \(x\in G\), the inverse of \(x\) is unique.

Proof of Lemma 1.1.8

If \(e\) and \(e'\) are both identities, then \(e = ee' = e'\), so the identity is unique.

Now given \(x \in G\), suppose \(y\) and \(z\) are two inverses for \(x\), meaning that \(yx = xy = e\) and \(zx = xz = e\). Then

\[ \begin{aligned} z & = ez &&& \textrm{since $e$ is the identity}\\ & = (yx)z &&& \textrm{since $y$ is an inverse for $x$}\\ & = y(xz) &&& \textrm{by associativity}\\ & = ye &&& \textrm{since $z$ is an inverse for $x$}\\ & = y &&& \textrm{since $e$ is the identity}. \quad \square \end{aligned} \]

Remark 1.1.9

The same argument shows the identity in a monoid is unique (no inverses needed).

Notation 1.1.10

For \(x\in G\), write \(x^{-1}\) for its unique inverse.

Remark 1.1.11 (Left/right inverses in monoids)

In a monoid, an element might have left or right inverses (possibly several). If it has both a left and a right inverse, they are unique and equal (by adapting the uniqueness proof).

Exercise 1.1.12

Give an example of a monoid \(M\) and an element with a left inverse but not a right inverse.

Definition 1.1.13 (Powers)

For \(x\in G\) and integer \(n\ge 1\), define \(x^n\) as the product of \(x\) with itself \(n\) times.

Exercise 1.1.14 (Properties of group elements)

  1. If \(xy=xz\), then \(y=z\).
  2. If \(yx=zx\), then \(y=z\).
  3. \((x^{-1})^{-1}=x\).
  4. \((a_1\cdots a_n)^{-1}=a_n^{-1}\cdots a_1^{-1}\).
  5. \((x^{-1}yx)^n=x^{-1}y^n x\) for \(n\ge 1\).
  6. \((x^{-1})^n=(x^n)^{-1}\).

Notation 1.1.15

For \(n>0\), we define negative powers by the rule \(x^{-n}:= (x^n)^{-1}\). Note that \(x^{-n}=(x^{-1})^n\).

Exercise 1.1.16

Show that \(x^a x^b = x^{a+b}\) for all integers \(a,b\).

Definition 1.1.17 (Abelian group)

A group \(G\) is abelian if \(xy=yx\) for all \(x,y\in G\).

For abelian groups, we often write the operation as “\(+\)”, identity as \(0\), and inverse as \(-x\).

Example 1.1.18

The following are examples of abelian groups.
  • The trivial group \(\{e\}\).
  • \((\mathbb{Z},+), (\mathbb{Q},+), (\mathbb{R},+), (\mathbb{C},+)\).
  • \((\mathbb{Z}/n, +)\) (addition modulo \(n\)).
  • Let \(F\) be a field. If you are unfamiliar with fields, it is good enough for now to keep in mind the following examples: the real numbers \(\mathbb{R}\), the complex numbers \(\mathbb{C}\), and the modular rings \(\mathbb{Z}/(p)\) for a prime number \(p\). Then \(F^\times := F\setminus\{0\}\) is an abelian group under multiplication.

Example 1.1.19 (\(GL_n(F)\))

For a field \(F\), \(\displaystyle GL_n(F) = \{\text{invertible } n\times n \text{ matrices over } F\}\). If an \(n\times n\) matrix has a left inverse, it also has a right inverse (and vice-versa). For \(n\ge 2\), \(GL_n(F)\) is nonabelian for every field \(F\); \(GL_1(F) = F^\times\) is abelian.
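To see this noncommutativity concretely for \(n = 2\) and \(F = \mathbb{R}\), one can multiply two elementary matrices. The following is a minimal sketch in plain Python; the helper `matmul2` is ours, not a library routine:

```python
# Two invertible 2x2 real matrices that do not commute, showing GL_2(R) is
# nonabelian. Matrices are encoded as tuples of row tuples.

def matmul2(A, B):
    """Product of two 2x2 matrices."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

A = ((1, 1), (0, 1))  # det = 1, so invertible
B = ((1, 0), (1, 1))  # det = 1, so invertible

print(matmul2(A, B))  # ((2, 1), (1, 1))
print(matmul2(B, A))  # ((1, 1), (1, 2)) -- a different matrix
```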

Definition 1.1.20 (Center)

The center of a group is the subset \[ \mathcal{Z}(G) := \{\, x\in G \mid xy=yx \text{ for all } y\in G \,\}. \]

Remark 1.1.21

The center always contains the identity. If \(\mathcal{Z}(G)=\{e_G\}\), the center is said to be trivial.

\(G\) is abelian if and only if \(\mathcal{Z}(G)=G\).
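For a finite group given explicitly, the center can be computed by brute force. Here is a sketch in Python for \(GL_2(\mathbb{Z}/2)\) from Example 1.1.19; the helpers `mul` and `det` for \(2\times 2\) arithmetic mod \(2\) are our own, not library code:

```python
# Brute-force center of GL_2(Z/2): enumerate all invertible 2x2 matrices with
# entries in Z/2 and keep those that commute with everything.
from itertools import product

def mul(A, B):
    """2x2 matrix product with entries reduced mod 2."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) % 2 for j in range(2))
        for i in range(2)
    )

def det(A):
    return (A[0][0] * A[1][1] - A[0][1] * A[1][0]) % 2

# the group: all 2x2 matrices over Z/2 with nonzero determinant
G = [((a, b), (c, d))
     for a, b, c, d in product(range(2), repeat=4)
     if det(((a, b), (c, d))) == 1]

Z = [A for A in G if all(mul(A, B) == mul(B, A) for B in G)]
print(len(G), len(Z))  # 6 1: the group has 6 elements and trivial center
```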

Informal Definition 1.1.22 (Presentation)

A presentation specifies a group via generators and relations:

\[ G = \langle \text{generators} \mid \text{relations} \rangle . \]

A set \(S\) generates \(G\) if every element of \(G\) can be expressed as a finite product of elements of \(S\) and their inverses. A relation is an identity among such words.


We will return to presentations later in the class when we can give a proper definition.

Remark 1.1.23

We only take products of finitely many generators and their inverses; infinite products are not defined here.

Example 1.1.24

The group \(\mathbb{Z}\) has one generator \(1\) and no relations.

Example 1.1.25

The group of integers modulo \( n \) has presentation \[ \mathbb{Z}/n \cong \langle x \mid x^n = e \rangle . \]

Definition 1.1.26 (Cyclic / finitely generated)

A group is cyclic if generated by one element. It is finitely generated if generated by finitely many elements.

Example 1.1.27

\(\mathbb{Z}\) and \(\mathbb{Z}/n\) are cyclic.

Exercise 1.1.28

Prove that every cyclic group is abelian.

Exercise 1.1.29

Prove that \((\mathbb{Q}, +)\) and \(\operatorname{GL}_2(\mathbb{Z}_2)\) are not cyclic.

There is no algorithm that, given any group presentation as input, can decide whether the group is actually the trivial group with just one element. There even exists a presentation with finitely many generators and finitely many relations such that whether or not the group is actually the trivial group with just one element is independent of the standard axioms of mathematics! Nonetheless, finding and working with presentations of groups is a crucial technique in group theory.

1.2 Permutation groups

Definition 1.2.1 (Permutation group)

For any set \(X\), the permutation group on \(X\) is the set \(\mathrm{Perm}(X)\) of all bijective functions from \(X\) to itself equipped with the binary operation given by composition of functions.

Notation 1.2.2

For an integer \(n \geqslant 1\), we write \([n] := \{1,\ldots,n\}\) and \(S_n := \mathrm{Perm}([n])\). An element of \(S_n\) is called a permutation on \(n\) symbols, sometimes also called a permutation on \(n\) letters or \(n\) elements. The group \(S_n\) is also called the symmetric group on \(n\) symbols.

We can write an element \(\sigma\) of \(S_n\) as a table of values:

\[ \begin{array}{c||c|c|c|c|c} i & 1 & 2 & 3 & \cdots & n \\ \hline \sigma(i) & \sigma(1) & \sigma(2) & \sigma(3) & \cdots & \sigma(n) \\ \end{array} \]

We may also represent this using arrows, as follows:

\[ \begin{array}{rcl} 1 & \mapsto & \sigma(1) \\ 2 & \mapsto & \sigma(2) \\ \vdots & & \vdots \\ n & \mapsto & \sigma(n) \end{array} \]

Remark 1.2.3

To count the elements \(\sigma \in S_n\), note that

  • there are \(n\) choices for \(\sigma(1)\);
  • once \(\sigma(1)\) has been chosen, we have \(n-1\) choices for \(\sigma(2)\);
\(\vdots\)
  • once \(\sigma(1), \ldots, \sigma(n-1)\) have been chosen, there is a unique possible value for \(\sigma(n)\).

Thus the group \(S_n\) has \(n!\) elements.
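This count is easy to confirm by brute force. Here is a quick Python sketch, where (our convention) a permutation is modeled as the tuple \((\sigma(1), \ldots, \sigma(n))\):

```python
# Brute-force count of S_n for small n: itertools.permutations enumerates all
# bijections of [n] = {1, ..., n} as tuples (sigma(1), ..., sigma(n)).
import math
from itertools import permutations

counts = {n: len(list(permutations(range(1, n + 1)))) for n in range(1, 7)}
print(counts)  # {1: 1, 2: 2, 3: 6, 4: 24, 5: 120, 6: 720}

assert all(counts[n] == math.factorial(n) for n in counts)
```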

It is customary to use cycle notation for permutations.

Definition 1.2.4 (Cycle, \(m\)-cycle)

If \(i_1, \dots, i_m\) are distinct integers between \(1\) and \(n\), then \(\sigma=(i_1 \, i_2 \, \cdots i_m)\) denotes the element of \(S_n\) determined by

\[ \sigma(i_1)=i_2, \quad \sigma(i_2)=i_3, \quad \ldots, \quad \sigma(i_{m-1})=i_m, \quad \sigma(i_m)=i_1, \]

and which fixes all other elements:

\[ \sigma(j) = j \quad \text{for all } j \in [n] \setminus \{i_1, \dots, i_m\}. \]

Such a permutation is called a cycle or an \(m\)-cycle. In particular, we say that \(\sigma\) has length \(m\).

Remark 1.2.5

A 1-cycle is the identity permutation.

Notation 1.2.6

A 2-cycle is often called a transposition.

Remark 1.2.7

The cycles \((i_1 \cdots i_m)\) and \((j_1 \cdots j_m)\) represent the same permutation if and only if one list is a cyclic rearrangement of the other. Example: \((1 \, 2 \, 3) = (2 \, 3 \, 1)\) but \((1 \, 2 \, 3) \neq (2 \, 1 \, 3)\).

Remark 1.2.8

For \(\sigma = (i_1 \ldots i_m)\), any integer \(k\) gives

\[ \sigma^k(i_j) = i_{\,j+k \pmod{m}}. \]

Here we interpret \(j+k \!\pmod{m}\) to denote the unique integer \(1 \leqslant s \leqslant m\) such that

\[ s \equiv j+k \pmod m. \]
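As a sanity check on this formula, here is a small Python sketch; `cycle_apply` is a hypothetical helper (not library code), with a cycle stored as the tuple of its entries:

```python
# Sanity check of the formula sigma^k(i_j) = i_{j+k (mod m)} for powers of a
# cycle. A cycle is stored as the tuple (i_1, ..., i_m).

def cycle_apply(cycle, x, k=1):
    """Apply the cycle k times to x; elements not in the cycle are fixed."""
    if x not in cycle:
        return x
    j = cycle.index(x)
    return cycle[(j + k) % len(cycle)]

sigma = (2, 5, 3)  # the 3-cycle (2 5 3), viewed inside S_5

print(cycle_apply(sigma, 2))       # 5, since sigma sends 2 to 5
print(cycle_apply(sigma, 2, k=3))  # 2: applying a 3-cycle 3 times is the identity
print(cycle_apply(sigma, 4))       # 4 is fixed
```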

Notation 1.2.9

The product of cycles \((i_1 \cdots i_s)\) and \((j_1 \cdots j_t)\) is written \((i_1 \cdots i_s)(j_1 \cdots j_t)\), composed right-to-left.

Example 1.2.10

We claim that the permutation group \(\mathrm{Perm}(X)\) is nonabelian whenever the set \(X\) has \(3\) or more elements. Indeed, given three distinct elements \(x, y, z \in X\), consider the transpositions \((xy)\) and \((yz)\), and the products \((yz)(xy)\) and \((xy)(yz)\), where composition is read from right to left, as with function composition. Then

\[ \begin{array}{c|ccc} (yz)(xy): & x \mapsto z & y \mapsto x & z \mapsto y \\ (xy)(yz): & x \mapsto y & y \mapsto z & z \mapsto x \end{array} \]

Note that \((yz)(xy) \neq (xy)(yz)\), since for example the first one takes \(x\) to \(z\) while the second one takes \(x\) to \(y\).
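Taking \(x, y, z = 1, 2, 3\), this computation can be verified in \(S_3\) with a short Python sketch; the tuple encoding \((\sigma(1), \sigma(2), \sigma(3))\) and the helper `compose` are our own conventions:

```python
# Example 1.2.10 in S_3 with x, y, z = 1, 2, 3. Permutations are encoded as
# tuples (sigma(1), sigma(2), sigma(3)); compose works right-to-left.

def compose(f, g):
    """(f . g)(i) = f(g(i))."""
    return tuple(f[g[i] - 1] for i in range(len(f)))

xy = (2, 1, 3)  # the transposition (1 2)
yz = (1, 3, 2)  # the transposition (2 3)

print(compose(yz, xy))  # (3, 1, 2): sends 1 to 3, matching x -> z
print(compose(xy, yz))  # (2, 3, 1): sends 1 to 2, matching x -> y
```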

Lemma 1.2.11 (Disjoint cycles commute)

If \(\sigma_1 = (i_1 \, \cdots \, i_m)\) and \(\sigma_2 = (j_1 \, \cdots \, j_k)\) are cycles on disjoint sets of elements, then \(\sigma_1 \sigma_2 = \sigma_2 \sigma_1\).

Proof of Lemma 1.2.11

We need to show \(\sigma_1(\sigma_2(l)) = \sigma_2(\sigma_1(l))\) for all \(l \in [n]\). If \(l \notin \{i_1, \ldots, i_m, j_1, \dots, j_k\}\), then \(\sigma_1(l) = l = \sigma_2(l)\), so

\[ \sigma_1(\sigma_2(l)) = \sigma_1(l) = l \qquad \textrm{and} \qquad \sigma_2(\sigma_1(l)) = \sigma_2(l) = l. \]

If \(l \in \{j_1, \dots, j_k\}\), then \(\sigma_2(l) \in \{j_1, \dots, j_k\}\) and hence, since the subsets are disjoint, \(l\) and \(\sigma_2(l)\) are not in the set \(\{i_1, i_2, \dots, i_m\}\). It follows that \(\sigma_1\) fixes both \(l\) and \(\sigma_2(l)\), and thus

\[ \sigma_1(\sigma_2(l)) = \sigma_2(l) \quad \textrm{and} \quad \sigma_2(\sigma_1(l)) = \sigma_2(l). \]

The case when \(l \in \{i_1, \dots, i_m\}\) is analogous.
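For instance, one can spot-check the Lemma in \(S_5\) with the disjoint cycles \((1\,3\,5)\) and \((2\,4)\). The helpers below are our own sketches, not library functions:

```python
# Spot-check of the commuting of disjoint cycles in S_5: cycle_to_perm expands
# a cycle into the tuple encoding (sigma(1), ..., sigma(n)), and compose
# works right-to-left.

def cycle_to_perm(cycle, n):
    perm = list(range(1, n + 1))
    for idx, x in enumerate(cycle):
        perm[x - 1] = cycle[(idx + 1) % len(cycle)]
    return tuple(perm)

def compose(f, g):
    return tuple(f[g[i] - 1] for i in range(len(f)))

a = cycle_to_perm((1, 3, 5), 5)  # (3, 2, 5, 4, 1)
b = cycle_to_perm((2, 4), 5)     # (1, 4, 3, 2, 5)

print(compose(a, b) == compose(b, a))  # True: the disjoint cycles commute
```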

Theorem 1.2.12 (Products of disjoint cycles)

Every \(\sigma\in S_n\) can be written uniquely (up to order) as a product of disjoint cycles.

Remark 1.2.13

For the uniqueness part of the Theorem, one needs to establish a convention regarding 1-cycles: we need to decide whether the 1-cycles will be recorded. If we decide not to record \(1\)-cycles, this gives the shorter version of our factorization into cycles. If all the 1-cycles are recorded, this gives a longer version of our factorization, but this option has the advantage that it makes it clear what the size \(n\) of our group \(S_n\) is.

We will follow the first convention: we will write only \(m\)-cycles with \(m \geqslant 2\). Under this convention, the identity element of \(S_n\) is the empty product of disjoint cycles. We will, however, sometimes denote the identity by \((1)\) for convenience.

Proof of Theorem 1.2.12

Fix a permutation \(\sigma\). The key idea is to look at the orbits of \(\sigma\): for each \(x \in [n]\), its orbit by \(\sigma\) is the subset of \([n]\) of the form

\[ O_x=\{ \sigma(x), \sigma^2(x), \sigma^3(x), \ldots \} = \{\sigma^i(x) \mid i \geqslant 1 \}. \]

Notice that the orbits of two elements \(x\) and \(y\) are either the same orbit, which happens precisely when \(y \in O_x\), or disjoint. Since \([n]\) is a finite set, and \(\sigma\) is a bijection of \([n]\), we will eventually have \(\sigma^i(x) = \sigma^j(x)\) for some \(j > i\), but then applying \(\sigma^{-i}\) to both sides gives

\[ \sigma^{j-i}(x) = \sigma^{-i}(\sigma^{j}(x)) = \sigma^{-i}(\sigma^{i}(x)) = \sigma^0(x) = x. \]

Thus we can find the smallest positive integer \(n_x\) such that \(\sigma^{n_x}(x)=x\). Now for each \(x \in [n]\), we consider the cycle

\[ \tau_x = (\sigma(x) \,\, \sigma^2(x) \,\, \sigma^3(x) \, \cdots \, \sigma^{n_x}(x)). \]

Now let \(S\) be a set of indices for the distinct \(\tau_x\), where we do not include the \(\tau_x\) that are \(1\)-cycles. We claim that we can factor \(\sigma\) as

\[ \sigma=\prod_{i\in S}\tau_i. \]

To show this, consider any \(x \in [n]\). If \(x\) is fixed by \(\sigma\), then \(\tau_x\) is a \(1\)-cycle, so every \(\tau_i\) with \(i \in S\) fixes \(x\) as well, and both sides agree on \(x\). Otherwise, \(x\) must be of the form \(\sigma^j(i)\) for some \(i \in S\), given that our choice of \(S\) was exhaustive. On the right hand side, only \(\tau_i\) moves \(x\), and indeed by definition of \(\tau_i\) we have

\[ \tau_i(x) = \sigma^{j+1}(i) = \sigma(\sigma^j(i)) = \sigma(x). \]

This proves that

\[ \sigma=\prod_{i\in S}\tau_i. \]

As for uniqueness, note that if \(\sigma = \tau_1 \cdots \tau_s\) is a product of disjoint cycles, then each \(x \in [n]\) is moved by at most one of the cycles \(\tau_i\), since the cycles are all disjoint. Fix \(i\) such that \(\tau_i\) moves \(x\). We claim that

\[ \tau_i = (\sigma(x) \,\, \sigma^2(x) \,\,\sigma^3(x) \, \cdots \, \sigma^{n_x}(x)). \]

This will show that our product of disjoint cycles giving \(\sigma\) is the same (unique) product we constructed above. To do this, note that there is some integer \(s\) such that \(\tau_i^s(x) = x\), and

\[ \tau_i = (\tau_i(x) \,\, \tau_i^2(x) \,\, \tau_i^3(x) \, \cdots \, \tau_i^{s}(x)). \]

Thus we need only to prove that

\[ \tau_i^k(x) = \sigma^k(x) \]

for all integers \(k \geqslant 1\). Now by Lemma 1.2.11, disjoint cycles commute, and thus for each integer \(k \geqslant 1\) we have

\[ \sigma^k = \tau_1^k \cdots \tau_s^k. \]

But \(\tau_j\) fixes \(x\) whenever \(j \neq i\), so

\[ \sigma^k(x) = \tau_i^k (x). \]

We conclude that the integer \(n_x\) we defined before is the length of the cycle \(\tau_i\), and that

\[ \tau_i = (x \, \tau_i(x) \, \tau_i^2(x) \cdots \tau_i^{n_x-1}(x)) = (x \, \sigma(x) \, \sigma^2(x) \cdots \sigma^{n_x-1}(x)). \]

Thus this decomposition of \(\sigma\) as a product of disjoint cycles is the same decomposition we described above.

Example 1.2.14

For \(\sigma\in S_5\) given by

\[ \begin{array}{rcl} 1 & \mapsto & 3 \\ 2 & \mapsto & 4 \\ 3 & \mapsto & 5 \\ 4 & \mapsto & 2 \\ 5 & \mapsto & 1 \end{array} \]
its decomposition is \((135)(24)\).
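The orbit construction in the proof of Theorem 1.2.12 is directly algorithmic. Here is a sketch in Python that recovers the decomposition above; the dict encoding of \(\sigma\) is our own convention:

```python
# The orbit algorithm from the proof of Theorem 1.2.12. A permutation is
# encoded as a dict {i: sigma(i)}; 1-cycles are not recorded, per Remark 1.2.13.

def disjoint_cycles(sigma):
    seen, cycles = set(), []
    for start in sorted(sigma):
        if start in seen:
            continue
        orbit, x = [start], sigma[start]
        while x != start:       # follow the orbit of start under sigma
            orbit.append(x)
            x = sigma[x]
        seen.update(orbit)
        if len(orbit) > 1:      # skip 1-cycles
            cycles.append(tuple(orbit))
    return cycles

sigma = {1: 3, 2: 4, 3: 5, 4: 2, 5: 1}  # the permutation of Example 1.2.14
print(disjoint_cycles(sigma))  # [(1, 3, 5), (2, 4)]
```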

Definition 1.2.15 (Cycle type)

The cycle type of \(\sigma\) is the unordered list of lengths of cycles in its disjoint decomposition.

Example 1.2.16

\((3\,4)(1\,5)(2\,6\,7)(9\,8\,11)(15\,16\,17\,105\,114)\) in \(S_{156}\) has cycle type \(2,2,3,3,5\).

Exercise 1.2.17

Show \((i_1 \, i_2 \, \cdots \, i_p) = (i_1 \, i_2) (i_2 \, i_3) \cdots (i_{p-2} \, i_{p-1}) (i_{p-1} \, i_p)\).

Corollary 1.2.18

\(S_n\) is generated by transpositions.

Proof of Corollary 1.2.18

Given any permutation, we can decompose it as a product of cycles by Theorem 1.2.12. Thus it suffices to show that each cycle can be written as a product of transpositions. For a cycle \((i_1 \, i_2 \, \cdots \, i_p)\), one can show that

\[ (i_1 \, i_2 \, \cdots \, i_p) = (i_1 \, i_2)(i_2 \, i_3)\cdots(i_{p-2} \, i_{p-1})(i_{p-1} \, i_p), \]

which we leave as an exercise.
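The claimed factorization can be spot-checked numerically, for instance for the 4-cycle \((1\,2\,3\,4)\); the helpers are our own sketches, with permutations encoded as tuples \((\sigma(1), \ldots, \sigma(n))\) and composed right-to-left:

```python
# Numerical check of the factorization (1 2 3 4) = (1 2)(2 3)(3 4),
# composed right-to-left.

def cycle_to_perm(cycle, n):
    perm = list(range(1, n + 1))
    for idx, x in enumerate(cycle):
        perm[x - 1] = cycle[(idx + 1) % len(cycle)]
    return tuple(perm)

def compose(f, g):
    return tuple(f[g[i] - 1] for i in range(len(f)))

n = 4
lhs = cycle_to_perm((1, 2, 3, 4), n)        # (2, 3, 4, 1)
rhs = cycle_to_perm((1, 2), n)
for t in [(2, 3), (3, 4)]:
    rhs = compose(rhs, cycle_to_perm(t, n))  # multiply on the right

print(lhs == rhs)  # True
```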

Remark 1.2.19

Note however that when we write a permutation as a product of transpositions, such a product is no longer necessarily unique.

Example 1.2.20

If \(n \geqslant 2\), the identity in \(S_n\) can be written as \( (1 2) (1 2) \). In fact, any transposition is its own inverse, so we can write the identity as \( (i j)(i j) \) for any \(i \neq j\).

Exercise 1.2.21

Show \((cd)(ab) = (ab)(cd)\) and \((bc)(ab) = (ac)(bc)\) for distinct \(a,b,c,d\).

Theorem 1.2.22

Given a permutation \(\sigma \in S_n\), the parity (even vs. odd) of the number of transpositions in any representation of \(\sigma\) as a product of transpositions depends only on \(\sigma\).

Proof of Theorem 1.2.22

Suppose that \(\sigma\) is a permutation that can be written as a product of transpositions \(\beta_i\) and \(\lambda_j\) in two ways,

\[ \sigma = \beta_1 \cdots \beta_s = \lambda_1 \cdots \lambda_t \]

where \(s\) is even and \(t\) is odd. As we noted in Example 1.2.20, every transposition is its own inverse, so we conclude that

\[ e_{S_n} = \beta_1 \cdots \beta_s \lambda_t \cdots \lambda_1, \]

which is a product of \(s+t\) transpositions. This is an odd number, so it suffices to show that it is not possible to write the identity as a product of an odd number of transpositions.

Suppose the identity can be written as the product \((a_1 b_1) \cdots (a_k b_k)\), where each \(a_i \neq b_i\). A single transposition cannot be the identity, and thus \(k \neq 1\). So assume, for the sake of an argument by induction, that for a fixed \(k\), we know that every product of fewer than \(k\) transpositions that equals the identity must use an even number of transpositions. Since \(2\) is even, we might as well assume \(k \geqslant 3\). Now note that since \(k > 1\) and our product is the identity, some transposition \((a_i b_i)\) with \(i > 1\) must move \(a_1\); otherwise, the product would send \(a_1\) to \(b_1\), and it would not be the identity.

Of all the possible ways we can write the identity as a product of \(k\) transpositions \((a_1 b_1) \cdots (a_k b_k)\) with the first transposition \((a_1 b_1)\) fixed, choose one where the number \(N\) of transpositions in which \(a_1\) appears is smallest. The two rules in Exercise 1.2.21 allow us to rewrite the overall product, without changing the number of transpositions, in such a way that the transposition \((a_2 b_2)\) moves \(a_1\), meaning \(a_2 = a_1\) or \(b_2=a_1\). So let us assume that our product of transpositions has already been put in this form. Note also that \((a_i b_i) = (b_i a_i)\), so we might as well assume without loss of generality that \(a_2 = a_1\).

Case 1: When \(b_1 = b_2\), our product is

\[ (a_1 b_1) (a_1 b_1) (a_3 b_3) \cdots (a_k b_k), \]

but \((a_1 b_1) (a_1 b_1)\) is the identity, so we can rewrite our product using only \(k-2\) transpositions. By induction hypothesis, \(k-2\) is even, and thus \(k\) is even.

Case 2: When \(b_1 \neq b_2\), we can use Exercise 1.2.21 to write

\[ (a_1 b_1) (a_1 b_2) = (a_1 b_1) (b_2 a_1) = (a_1b_2)(b_1b_2). \]

Notice here that it matters that \(a_1\), \(b_1\), and \(b_2\) are all distinct, so that we can apply Exercise 1.2.21. So our product, which equals the identity, is

\[ (a_1 b_2)(b_1 b_2)(a_3 b_3) \cdots (a_k b_k). \]

The advantage of this shuffling is that while we have only changed the first two transpositions, we have decreased the number \(N\) of transpositions in which \(a_1\) appears. But this contradicts our choice of \(N\) to be smallest possible, so Case 2 cannot occur; by Case 1, we conclude that \(k\) is even.

Definition 1.2.23 (Parity of a permutation)

Consider a permutation \(\sigma \in S_n\). If \(\sigma = \tau_1 \cdots \tau_s\) is a product of transpositions, the sign of \(\sigma\) is \((-1)^s\); this is well defined by Theorem 1.2.22. Permutations with sign \(1\) are called even and those with sign \(-1\) are called odd. This is also called the parity of the permutation.

Example 1.2.24

The identity is even. Every transposition is odd.

Example 1.2.25

The 3-cycle \( (123) \) can be rewritten as \( (12)(23) \), a product of 2 transpositions, so the sign of \( (123) \) is \(1\). That is, it is an even permutation.
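Since an \(m\)-cycle factors into \(m-1\) transpositions by Exercise 1.2.17, the sign can be computed from the disjoint cycle decomposition as \((-1)^{\sum (\text{length}-1)}\). A sketch in Python, with a hypothetical helper `disjoint_cycles` implementing the orbit idea from the proof of Theorem 1.2.12 and \(\sigma\) encoded as a dict:

```python
# Sign from the disjoint cycle decomposition: an m-cycle is a product of
# m - 1 transpositions, so sign(sigma) = (-1)^(sum of (length - 1)).

def disjoint_cycles(sigma):
    seen, cycles = set(), []
    for start in sorted(sigma):
        if start in seen:
            continue
        orbit, x = [start], sigma[start]
        while x != start:
            orbit.append(x)
            x = sigma[x]
        seen.update(orbit)
        if len(orbit) > 1:
            cycles.append(tuple(orbit))
    return cycles

def sign(sigma):
    return (-1) ** sum(len(c) - 1 for c in disjoint_cycles(sigma))

print(sign({1: 2, 2: 3, 3: 1}))  # 1: the 3-cycle (1 2 3) is even
print(sign({1: 2, 2: 1, 3: 3}))  # -1: the transposition (1 2) is odd
```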

Exercise 1.2.26

Show every permutation is a product of adjacent transpositions of the form \((i \,\, i+1)\).

1.3 Dihedral groups

For any integer \(n \geqslant 3\), let \(P_n\) denote a regular \(n\)-gon. For the sake of concreteness, let us imagine \(P_n\) is centered at the origin with one of its vertices located along the positive \(x\)-axis. Note that the size of the polygon will not matter. Here are some examples:

Example 1.3.1

[Figures: the regular polygons \(P_3\) (an equilateral triangle), \(P_4\) (a square), and \(P_5\) (a regular pentagon).]

Definition 1.3.2

The dihedral group \(D_n\) is the set of symmetries of the regular \(n\)-gon \(P_n\) equipped with the binary operation given by composition.

Remark 1.3.3

There are competing notations for the group of symmetries of the \(n\)-gon. Some authors prefer to write it as \(D_{2n}\), since, as we will show, that is the order of the group. Democracy has dictated that we will be denoting it by \(D_n\), which indicates that we are talking about the symmetries of the \(n\)-gon. Some authors like to write \(D_{2 \times n}\), always keeping the \(2\), for example with \(D_{2 \times 3}\), to satisfy both camps.

Let \(d(-,-)\) denote the usual Euclidean distance between two points on the plane \(\mathbb{R}^2\). An isometry of the plane is a function \(f\!: \mathbb{R}^2 \to \mathbb{R}^2\) that is bijective and preserves the Euclidean distance, meaning that

\[ d(f(A),f(B))=d(A,B) \quad \textrm{ for all } A,B \in \mathbb{R}^2. \]

Though not obvious, it is a fact that if \(f\) preserves the distance between every pair of points in the plane, then it must be a bijection.

A symmetry of \(P_n\) is an isometry \(f\) of the plane that maps \(P_n\) to itself. By this we do not mean that \(f\) fixes each point of \(P_n\), but rather that we have an equality of sets \(f(P_n) = P_n\), meaning every point of \(P_n\) is mapped to a (possibly different) point of \(P_n\) and every point of \(P_n\) is the image of some point in \(P_n\) via \(f\).

With the notion of symmetry pinned down, Definition 1.3.2 of the dihedral group is now precise.

Remark 1.3.4

Let us informally verify that this really is a group. If \(f\) and \(g\) are in \(D_n\), then \(f \circ g\) is an isometry (since the composition of any two isometries is again an isometry) and

\[ (f \circ g)(P_n) = f(g(P_n)) = f(P_n) = P_n, \]

so that \(f \circ g \in D_n\). This proves composition is a binary operation on \(D_n\). Now note that associativity of composition is a general property of functions. The identity function on \(\mathbb{R}^2\), denoted \(\mathrm{id}_{\mathbb{R}^2}\), belongs to \(D_n\) and it is the identity element of \(D_n\). Finally, the inverse function of an isometry is also an isometry. Using this, we see that every element of \(D_n\) has an inverse.

Lemma 1.3.5

Every point on a regular polygon is completely determined, among all points on the polygon, by its distances to two adjacent vertices of the polygon.

Exercise 1.3.6

Prove Lemma 1.3.5.

Definition 1.3.7 (Rotations in \(D_n\))

Assume that the regular \(n\)-gon \(P_n\) is drawn in the plane with its center at the origin and one vertex on the \(x\)-axis. Let \(r\) denote the rotation about the origin by \(\frac{2\pi}{n}\) radians counterclockwise; this is an element of \(D_n\). Its inverse is the clockwise rotation by \(\frac{2 \pi}{n}\). This gives us rotations \(r^i\), where \(r^i\) is the counterclockwise rotation by \(\frac{2 \pi i}{n}\), for each \(i = 1, \ldots, n\). Notice that when \(i=n\) this is simply the identity map.

Example 1.3.8

Here are the rotations of \(D_3\).

[Figures: the three rotations of \(P_3\), shown by the positions of the vertex labels \(1, 2, 3\): the identity; the rotation by \(2\pi/3\); the rotation by \(4\pi/3\).]

Definition 1.3.9 (Reflections in \(D_n\))

For any line of symmetry of \(P_n\), reflection about that line gives an element of \(D_n\). When \(n\) is odd, each line of symmetry connects a vertex to the midpoint of the opposite side of \(P_n\). When \(n\) is even, there are two types of reflections: the ones about a line connecting two opposite vertices, and the ones about a line connecting the midpoints of opposite sides.

In both cases, these give us a total of \(n\) reflections.

Example 1.3.10

[Figures: the reflection lines in \(D_3\), each through a vertex and the midpoint of the opposite side, and the reflection lines in \(D_4\): vertical, horizontal, and the two diagonals.]

Notation 1.3.11 (Defining \(r\) and \(s\))

Fix \(n \geqslant 3\). We will consider two special elements of \(D_n\):

  • Let \(r\) denote the symmetry of \(P_n\) given by counterclockwise rotation by \(\frac{2 \pi}{n}\).
  • Let \(s\) denote a reflection symmetry of \(P_n\) that fixes at least one of the vertices of \(P_n\), as described in Definition 1.3.9. Let \(V_1\) be a vertex of \(P_n\) that is fixed by \(s\), and label the remaining vertices of \(P_n\) with \(V_2, \ldots, V_{n}\) by going counterclockwise from \(V_1\).

From now on, whenever we are talking about \(D_n\), the letters \(r\) and \(s\) will refer only to these specific elements. Finally, we will sometimes denote the identity element of \(D_n\) by \(\mathrm{id}\), since it is the identity map.

Theorem 1.3.12

The dihedral group \(D_n\) has \(2n\) elements.

Proof of Theorem 1.3.12

First, we show that \(D_n\) has order at most \(2n\). Any element \(\sigma \in D_n\) takes the polygon \(P_n\) to itself, and must in particular send vertices to vertices and preserve adjacencies, meaning that any two adjacent vertices remain adjacent after applying \(\sigma\). Fix two adjacent vertices \(A\) and \(B\). By Lemma 1.3.5, the location of every other point \(P\) on the polygon after applying \(\sigma\) is completely determined by the locations of \(\sigma(A)\) and \(\sigma(B)\). There are \(n\) distinct possibilities for \(\sigma(A)\), since it must be one of the \(n\) vertices of the polygon. But once \(\sigma(A)\) is fixed, \(\sigma(B)\) must be a vertex adjacent to \(\sigma(A)\), so there are at most \(2\) possibilities for \(\sigma(B)\). This gives us at most \(2n\) elements in \(D_n\).

Now we need only to present \(2n\) distinct elements in \(D_n\). We have described \(n\) reflections and \(n\) rotations for \(D_n\); we need only to see that they are all distinct. First, note that the only rotation that fixes any vertices of \(P_n\) is the identity. Moreover, if we label the vertices of \(P_n\) in order with \(1, 2, \ldots, n\), say by starting in a fixed vertex and going counterclockwise through each adjacent vertex, then the rotation by an angle of \(\frac{2 \pi i}{n}\) sends \(V_{1}\) to \(V_{i+1}\) for each \(i \lt n \), showing these \(n\) rotations are distinct. Now when \(n\) is odd, each of the \(n\) reflections fixes exactly one vertex, and so they are all distinct and disjoint from the rotations. Finally, when \(n\) is even, we have two kinds of reflections to consider. The reflections through a line connecting opposite vertices fix exactly two vertices, and are completely determined by which two vertices are fixed; since the identity fixes all \(n\) vertices and every other rotation fixes none, none of these matches any of the rotations we have already considered. The other reflections, the ones through the midpoints of two opposite sides, are completely determined by (one of) the two pairs of adjacent vertices that they switch. No rotation switches two adjacent vertices, and thus these give us brand new elements of \(D_n\).

In both cases, we have a total of \(2n\) distinct elements of \(D_n\) given by the \(n\) rotations and the \(n\) reflections.

Remark 1.3.13

Given an element of \(D_n\), we now know that it must be a rotation or a reflection. The rotations are the elements of \(D_n\) that preserve orientation, while the reflections are the elements of \(D_n\) that reverse orientation.

Remark 1.3.14

Any reflection is its own inverse. In particular, \(s^2 = \mathrm{id}\).

Remark 1.3.15

Note that \(r^j(V_1) = V_{1+j \pmod{n}}\) for any \(j\). Thus if \(r^j = r^i\) for some \(1 \leqslant i,j \leqslant n\), then we must have \(i=j\).

In fact, we have seen that \(r^n = \mathrm{id}\) and that the rotations \(\mathrm{id}, r, r^2, \ldots, r^{n-1}\) are all distinct, so \(|r| = n\). In particular, the inverse of \(r\) is \(r^{n-1}.\)

Lemma 1.3.16

Following Notation 1.3.11, we have \(srs = r^{-1}\).

Proof of Lemma 1.3.16

First, we claim that \(rs\) is a reflection. To see this, observe that \(s(V_1) = V_1\), so

\[ rs(V_1) = r(V_1) = V_2 \]

and

\[ rs(V_{2}) = r(V_{n}) = V_1, \]

since \(s\) fixes \(V_1\) and reverses the cyclic order of the vertices, so that \(s(V_2) = V_n\).

This shows that \(rs\) must be a reflection, since it reverses orientation. Reflections have order \(2\), so \(rsrs = (rs)^2 = \mathrm{id}\) and hence \(srs = r^{-1}.\)

Remark 1.3.17

Given \(|r| = n\) and \(|s| = 2\), as noted in Remarks 1.3.14 and 1.3.15, we can rewrite Lemma 1.3.16 as \[ srs^{-1} = r^{n-1}. \]

Exercise 1.3.18

Show that \(s r^i s^{-1} = r^{n-i}\) for all \(i\).

Theorem 1.3.19

Every element in \(D_n\) can be written uniquely as \(r^j\) or \(r^j s\) for \(0 \leqslant j \leqslant n-1\).

Proof of Theorem 1.3.19

Let \(\alpha\) be an arbitrary symmetry of \(P_n\). Note \(\alpha\) must fix the origin, since it is the center of mass of \(P_n\), and it must send each vertex to a vertex because the vertices are the points on \(P_n\) at largest distance from the origin. Thus \(\alpha(V_1) = V_{j+1}\) for some \(0 \leqslant j \leqslant n-1\) and therefore the element \(r^{-j}\alpha\) fixes \(V_1\) and the origin. The only elements of \(D_n\) that fix \(V_1\) are the identity and \(s\). Hence either \(r^{-j}\alpha = \mathrm{id}\) or \(r^{-j}\alpha = s\). We conclude that \(\alpha = r^j\) or \(\alpha = r^js\).

Notice that we have shown that \(D_n\) has exactly \(2n\) elements, and that there are \(2n\) distinct expressions of the form \(r^j\) or \(r^js\) for \(0 \leqslant j \leqslant n-1\). Thus each element of \(D_n\) can be written in this form in a unique way.

Remark 1.3.20

The elements \(s, rs, \dots, r^{n-1}s\) are all reflections since they reverse orientation. Alternatively, we can check these are all reflections by checking they have order \(2\). As we noted before, the elements \(\mathrm{id}, r, \dots, r^{n-1}\) are rotations, and preserve orientation.

Example 1.3.21

The group \(D_4\) has eight elements: The rotations are \(\mathrm{id}, r, r^2, r^3\) and the reflections are \(s, rs, r^2s, r^3s\).

Let us now give a presentation for \(D_n\).

Theorem 1.3.22

Let \(r:\mathbb{R}^2\to\mathbb{R}^2\) denote counterclockwise rotation around the origin by \(\frac{2\pi}{n}\) radians and let \(s:\mathbb{R}^2\to\mathbb{R}^2\) denote reflection about the \(x\)-axis. Set

\[ X_{2n}=\langle r,s \mid r^n = 1, s^2 = 1, srs^{-1} = r^{-1} \rangle. \]

Then \(D_n=X_{2n}\), that is,

\[ D_n = \langle r,s \mid r^n = 1, s^2 = 1, srs^{-1} = r^{-1} \rangle. \]

Proof of Theorem 1.3.22

Theorem 1.3.19 shows that \(\{r,s\}\) is a set of generators for \(D_n\). Moreover, we also know that the relations listed above \(r^n = 1, s^2 = 1, srs^{-1} = r^{-1}\) hold; the first two are easy to check, and the last one is Lemma 1.3.16. The only concern we need to deal with is that we may not have discovered all the relations of \(D_n\); or rather, we need to check that we have found enough relations so that any other valid relation follows as a consequence of the ones listed.

Let

\[ X_{2n}=\langle r,s \mid r^n = 1, s^2 = 1, srs^{-1} = r^{-1} \rangle. \]

If \(D_n\) satisfied more relations than \(X_{2n}\) does, then \(D_n\) would be a group of cardinality strictly smaller than that of \(X_{2n}\), meaning that \(|D_n|<|X_{2n}|\). This will become clearer once we properly define presentations. We will show below that in fact \(|X_{2n}|\leqslant 2n=|D_n|\), thus obtaining a contradiction.

Now we show that \(X_{2n}\) has at most \(2n\) elements using just the information contained in the presentation. By definition, since \(r\) and \(s\) generate \(X_{2n}\), every element \(x\in X_{2n}\) can be written as

\[ x = r^{m_1} s^{n_1} r^{m_2} s^{n_2} \cdots r^{m_j} s^{n_j} \]

for some \(j\) and (possibly negative) integers \(m_1, \dots, m_j, n_1, \dots, n_j\). Note that \(m_1\) could be \(0\), so that expressions beginning with a power of \(s\) are included in this list. As a consequence of the last relation, we have

\[ sr = r^{-1}s, \]

and it is not hard to see that this implies

\[ sr^m = r^{-m} s \]

for all \(m\). Thus, we can slide an \(s\) past a power of \(r\), at the cost of changing the sign of the power. Doing this repeatedly gives that we can rewrite \(x\) as

\[ x = r^M s^N. \]

By the first relation, \(r^n = 1\), from which it follows that \(r^a = r^b\) if \(a\) and \(b\) are congruent modulo \(n\). Thus we may assume \(0 \leqslant M \leqslant n-1.\) Likewise, we may assume \(0 \leqslant N \leqslant 1\). This gives a total of at most \(2n\) elements, and we conclude that \(X_{2n}\) must in fact be \(D_n.\)

Note that we have not shown that

\[ X_{2n}=\langle r,s \mid r^n = 1, s^2 = 1, srs^{-1} = r^{-1} \rangle \]

has at least \(2n\) elements using just the presentation. But for this particular example, since we know the group presented is the same as \(D_n\), we know from Theorem 1.3.19 that it has exactly \(2n\) elements.
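The rewriting process in the proof can be made concrete. Below is a small sketch (the encoding of words is my own convention, not the notes'): a word in \(r\), \(r^{-1}\), and \(s\) is scanned left to right while maintaining a normal form \(r^M s^N\), using the relations \(s r^m = r^{-m} s\), \(r^n = 1\), and \(s^2 = 1\).

```python
def normal_form(word, n):
    """Reduce a word in the generators of D_n to the normal form r^M s^N.

    `word` is a list of tokens: 'r', 'R' (meaning r^{-1}), or 's' (note that
    s = s^{-1}, so no separate token is needed). Returns (M, N) with
    0 <= M < n and N in {0, 1}.
    """
    M, N = 0, 0
    for g in word:
        if g == 's':
            N = (N + 1) % 2            # s^2 = 1
        else:
            m = 1 if g == 'r' else -1
            if N == 1:                 # slide r past the trailing s:
                m = -m                 # r^M s r^m = r^{M-m} s
            M = (M + m) % n            # r^n = 1
    return (M, N)

# srs reduces to r^{-1} = r^4 in D_5, matching Lemma 1.3.16:
assert normal_form(['s', 'r', 's'], 5) == (4, 0)
```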

1.4 The quaternions

For our last big example we mention the group of quaternions, written \(Q_8\).

Definition 1.4.1

The quaternion group \(Q_8\) is a group with \(8\) elements

\[ Q_8=\{ 1, -1, i, -i, j, -j, k, -k \} \]

satisfying the following relations: \(1\) is the identity element, and

\[ i^2 = -1, \quad j^2 = -1, \quad k^2 =-1, \quad ij = k, \quad jk = i, \quad ki = j, \]
\[ (-1)i = -i, \quad (-1)j = -j, \quad (-1)k = -k, \quad (-1)(-1) = 1. \]

Verifying directly that this really is a group is rather tedious, since checking associativity takes forever. Here is a better way: in the group \(\mathrm{GL}_2(\mathbb{C})\), define elements

\[ I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad A = \begin{bmatrix} \sqrt{-1} & 0 \\ 0 & -\sqrt{-1} \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \quad C = \begin{bmatrix} 0 & \sqrt{-1} \\ \sqrt{-1} & 0 \end{bmatrix} \]

where \(\sqrt{-1}\) denotes a complex number whose square is \(-1\), to avoid confusion with the symbol \(i \in Q_8\). Let \(-I, -A, -B, -C\) be the negatives of these matrices.

Then we can define an injective map \(f:Q_8\to \mathrm{GL}_2(\mathbb{C})\) by assigning

\[ \begin{aligned} 1 &\mapsto I, &\quad -1 &\mapsto -I\\ i &\mapsto A, &\quad -i &\mapsto -A \\ j &\mapsto B, &\quad -j &\mapsto -B \\ k &\mapsto C, &\quad -k &\mapsto -C. \end{aligned} \]

It can be checked directly that this map has the nice property (called being a group homomorphism) that

\[ f(xy)=f(x)f(y) \text{ for any elements } x,y\in Q_8. \]

Let us now prove associativity for \(Q_8\) using this information:

Claim 1.4.2

For any \(x,y,z\in Q_8\), we have \((xy)z=x(yz)\).

Proof of Claim 1.4.2

By using the property \(f(xy)=f(x)f(y)\) as well as associativity of multiplication in \(\mathrm{GL}_2(\mathbb{C})\) (marked by \(*\)) we obtain

\[ f((xy)z)=f(xy)f(z)=\left(f(x)f(y)\right)f(z)\stackrel{*}{=}f(x)\left(f(y)f(z)\right)=f(x)f(yz)=f(x(yz)). \]

Since \(f\) is injective and \(f((xy)z)=f(x(yz))\), we deduce \((xy)z=x(yz)\).

The subset \(\{\pm I, \pm A, \pm B, \pm C\}\) of \(\mathrm{GL}_2(\mathbb{C})\) is a subgroup (a term we define carefully later), meaning that it is closed under multiplication and taking inverses. (For example, \(AB= C\) and \(C^{-1} = -C\).) This proves that it really is a group, and one can check that it satisfies a list of identities analogous to the one satisfied by \(Q_8\).
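As a sanity check, the defining relations can be verified directly for the matrices \(I, A, B, C\) above; the snippet below (a sketch, using Python's built-in complex numbers for \(\sqrt{-1}\)) does exactly that.

```python
def mat_mul(X, Y):
    """2x2 matrix product, with matrices stored as tuples of rows."""
    return tuple(
        tuple(sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def neg(X):
    return tuple(tuple(-x for x in row) for row in X)

I = ((1, 0), (0, 1))
A = ((1j, 0), (0, -1j))   # plays the role of i
B = ((0, 1), (-1, 0))     # plays the role of j
C = ((0, 1j), (1j, 0))    # plays the role of k

assert mat_mul(A, A) == neg(I)   # i^2 = -1
assert mat_mul(B, B) == neg(I)   # j^2 = -1
assert mat_mul(C, C) == neg(I)   # k^2 = -1
assert mat_mul(A, B) == C        # ij = k
assert mat_mul(B, C) == A        # jk = i
assert mat_mul(C, A) == B        # ki = j
```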
This is an excellent motivation to talk about group homomorphisms.

1.5 Group homomorphisms

A group homomorphism is a function between groups that preserves the group structure.

Definition 1.5.1

Let \((G, \cdot_G)\) and \((H, \cdot_H)\) be groups. A (group) homomorphism from \(G\) to \(H\) is a function \(f: G \to H\) such that

\[ f(x \cdot_G y) = f(x) \cdot_H f(y). \]

Note that a group homomorphism need not be injective or surjective; it can be any function, as long as it preserves the product.

Definition 1.5.2

Let \(G\) and \(H\) be groups. A homomorphism \(f\!: G \to H\) is an isomorphism if there exists a homomorphism \(g: H \to G\) such that

\[ f \circ g = \mathrm{id}_H \textrm{ and } g \circ f = \mathrm{id}_G. \]

If \(f:G\to H\) is an isomorphism, \(G\) and \(H\) are called isomorphic, and we denote this by writing \(G\cong H\). An isomorphism \(G \longrightarrow G\) is called an automorphism of \(G\). We denote the set of all automorphisms of \(G\) by \(\mathrm{Aut}(G)\).

Remark 1.5.3

Two groups \(G\) and \(H\) are isomorphic if we can obtain \(H\) from \(G\) by renaming all the elements, without changing the group structure. One should think of an isomorphism \(f\!: G \xrightarrow{\,\cong\,} H\) of groups as saying that the multiplication tables of \(G\) and \(H\) are the same up to renaming the elements.

The multiplication rule \(\cdot_G\) for \(G\) can be visualized as a table with both rows and columns labeled by elements of \(G\), and with \(x \cdot_G y\) placed in row \(x\) and column \(y\). The isomorphism \(f\) sends \(x\) to \(f(x)\), \(y\) to \(f(y)\), and the table entry \(x \cdot_G y\) to the table entry \(f(x) \cdot_H f(y)\). The inverse map \(f^{-1}\) does the opposite.

Remark 1.5.4

Suppose that \(f\!: G \to H\) is an isomorphism. As a function, \(f\) has an inverse, and thus it must necessarily be a bijective function. Our definition, however, requires more: the inverse must in fact also be a group homomorphism.

Note that many books define a group isomorphism to be simply a bijective homomorphism; we will soon show that this is in fact equivalent to the definition we gave. There are, however, good reasons to define it as we did: in many contexts, such as sets, groups, rings, fields, or topological spaces, the correct meaning of the word “isomorphism” is “a morphism that has a two-sided inverse”. This explains our choice of definition.

Exercise 1.5.5

Let \(G\) be a group. Show that \(\mathrm{Aut}(G)\) is a group under composition.

Example 1.5.6

\(\,\)

  1. For any group \(G\), the identity map \(\mathrm{id}_G\!: G \to G\) is a group isomorphism.
  2. For all groups \(G\) and \(H\), the constant map \(f\!: G \to H\) with \(f(g) = e_H\) for all \(g \in G\) is a homomorphism, which we sometimes refer to as the trivial homomorphism.
  3. The exponential map and the logarithm map
    \[ \exp\!: (\mathbb{R}, +) \to (\mathbb{R}_{>0}, \cdot), \quad x \mapsto e^x \] \[ \ln\!: (\mathbb{R}_{>0}, \cdot) \to (\mathbb{R}, +), \quad y \mapsto \ln y \]
    are both isomorphisms, so \((\mathbb{R}, +)\cong (\mathbb{R}_{>0}, \cdot)\). In fact, these maps are inverse to each other.
  4. The function \(f\!: \mathbb{Z} \to \mathbb{Z}\) given by \(f(x) = 2x\) is a group homomorphism that is injective but not surjective.
  5. For any positive integer \(n\) and any field \(F\), the determinant map
    \[ \det\!: \mathrm{GL}_n(F) \to (F \setminus \{0\}, \cdot), \quad A \mapsto \det(A) \]
    is a group homomorphism. For \(n \geqslant 2\), the determinant map is not injective (you should check this!) and so it cannot be an isomorphism. It is however surjective: for each \(c \in F \setminus \{ 0 \}\), the diagonal matrix
    \[ \begin{pmatrix} c & & & \\ & 1 && \\ && \ddots & \\ &&& 1 \end{pmatrix} \]
    has determinant \(c\).
  6. Fix an integer \(n > 1\), and consider the function \(f\!: (\mathbb{Z},+) \to (\mathbb{C}^*,\cdot)\) given by \(f(m) = e^{\frac{2 \pi i m}{n}}\). This is a group homomorphism, but it is neither surjective nor injective. It is not surjective because the image only contains complex numbers \(x\) with \(|x| = 1\), and it is not injective because \(f(0) = f(n)\).
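Items like (6) are easy to spot-check numerically. The sketch below (the floating-point tolerance is my choice, not the notes') verifies the homomorphism property and the failure of injectivity for \(f(m) = e^{2\pi i m/n}\).

```python
import cmath

def f(m, n):
    """f(m) = e^{2*pi*i*m/n}, a homomorphism from (Z, +) to (C*, x)."""
    return cmath.exp(2j * cmath.pi * m / n)

n = 5
for a in range(-3, 4):
    for b in range(-3, 4):
        # homomorphism property: f(a + b) = f(a) f(b), up to rounding error
        assert abs(f(a + b, n) - f(a, n) * f(b, n)) < 1e-12
assert abs(f(0, n) - f(n, n)) < 1e-12   # f(0) = f(n), so f is not injective
```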

Lemma 1.5.7 (Properties of homomorphisms)

If \(f: G \to H\) is a homomorphism of groups, then

\[f(e_G) = e_H.\]

Moreover, for any \(x \in G\) we have

\[f(x^{-1}) = f(x)^{-1}.\]

Proof of Lemma 1.5.7

By definition, \[ f(e_G)f(e_G) = f(e_Ge_G) = f(e_G). \] Multiplying both sides by \(f(e_G)^{-1}\), we get \[ f(e_G) = e_H. \]

Now given any \(x \in G\), we have \[ f(x^{-1}) f(x) = f(x^{-1}x) = f(e_G) = e_H, \] and thus \(f(x^{-1}) = f(x)^{-1}\).

Remark 1.5.8

Let \(G\) be a cyclic group generated by the element \(g\). Then any homomorphism \(f\!: G \to H\) is completely determined by \(f(g)\), since any other element \(h \in G\) can be written as \(h = g^n\) for some integer \(n\), and \[ f(g^n) = f(g)^n. \]

More generally, given a group \(G\) and a set \(S\) of generators for \(G\), any homomorphism \(f\!: G \longrightarrow H\) is completely determined by the images of the generators in \(S\): the element \(g = s_1 \cdots s_m\), where \(s_i\) is either in \(S\) or the inverse of an element of \(S\), has image \[ f(g) = f(s_1 \cdots s_m) = f(s_1) \cdots f(s_m). \]

Note, however, that not all choices of images for the generators might actually give rise to a homomorphism; we need to check that the map determined by the given images of the generators is well-defined.

Definition 1.5.9

The image of a group homomorphism \(f\!: G \longrightarrow H\) is

\[\mathrm{im}(f) := \{f(g) \mid g \in G \}.\]

Notice that \(f\!: G \to H\) is surjective if and only if \(\mathrm{im}(f) = H\).

Definition 1.5.10

The kernel of a group homomorphism \(f\!: G \longrightarrow H\) is

\[\ker(f) := \{g \in G \mid f(g) = e_H\}.\]

Remark 1.5.11

Given any group homomorphism \(f\!: G \longrightarrow H\), we must have \(e_G \in \ker f\) by Lemma 1.5.7.

When the kernel of \(f\) is as small as possible, meaning \(\ker(f) = \{ e \}\), we say that the kernel of \(f\) is trivial. A homomorphism is injective if and only if it has a trivial kernel.

Lemma 1.5.12

A group homomorphism \(f: G \to H\) is injective if and only if \(\ker(f) = \{e_G\}\).

Proof of Lemma 1.5.12

First, note that \(e_G \in \ker f\) by Lemma 1.5.7. If \(f\) is injective, then \(e_G\) must be the only element that \(f\) sends to \(e_H\), and thus \(\ker(f) = \{ e_G \}\).

Now suppose \(\ker(f) = \{e_G\}\). If \(f(g) = f(h)\) for some \(g,h \in G\), then \[ f(h^{-1}g) = f(h^{-1})f(g) = f(h)^{-1}f(g) = e_H. \] But then \(h^{-1}g \in \ker(f)\), so we conclude that \(h^{-1}g = e_G\), and thus \(g = h\).

Example 1.5.13

First, number the vertices of \(P_n\) from \(1\) to \(n\) in any manner you like. Now define a function \(f\!: D_{n} \to S_n\) as follows: given any symmetry \(\alpha \in D_n\), set \(f(\alpha)\) to be the permutation of \([n]\) that records how \(\alpha\) permutes the vertices of \(P_n\) according to your labelling.

So \(f(\alpha) = \sigma\) where \(\sigma\) is the permutation that for all \(1 \leqslant i \leqslant n\), if \(\alpha\) sends the \(i\)th vertex to the \(j\)th one in the list, then \(\sigma(i) = j\). This map \(f\) is a group homomorphism.

Now suppose \(f(\alpha) = \mathrm{id}_{S_n}\). Then \(\alpha\) must fix all the vertices of \(P_n\), and thus \(\alpha\) must be the identity element of \(D_n\). We have thus shown that the kernel of \(f\) is trivial. By Lemma 1.5.12, this proves \(f\) is injective.

Lemma 1.5.14

Suppose \(f\!: G \to H\) is a group homomorphism. Then \(f\) is an isomorphism if and only if \(f\) is bijective.

Proof of Lemma 1.5.14

(\(\Rightarrow\)) A function \(f: X \to Y\) between two sets is bijective if and only if it has an inverse, meaning that there is a function \(g: Y \to X\) such that \(f \circ g = \mathrm{id}_Y\) and \(g \circ f = \mathrm{id}_X\). Our definition of group isomorphism implies that this must hold for any isomorphism (and more!), as we noted in Remark 1.5.4.

(\(\Leftarrow\)) If \(f\) is a bijective homomorphism, then as a function it has a set-theoretic two-sided inverse \(g\), as noted in the previous paragraph. But we need to show that this inverse \(g\) is actually a homomorphism. For any \(x,y \in H\), we have

\[ \begin{aligned} f(g(xy)) & = xy \quad && \textrm{since } f\circ g=\mathrm{id}_H \\ & = f(g(x))f(g(y)) \quad && \textrm{since } f\circ g=\mathrm{id}_H\\ & = f(g(x)g(y)) \quad && \textrm{since $f$ is a group homomorphism} . \end{aligned} \]

Since \(f\) is injective, we must have \(g(xy) = g(x)g(y)\). Thus \(g\) is a homomorphism, and \(f\) is an isomorphism.

Exercise 1.5.15

Let \(f\!: G \to H\) be an isomorphism. Show that for all \(x \in G\), we have \(|f(x)| = |x|\).

In other words, isomorphisms preserve the order of an element. This is an example of an isomorphism invariant.

Definition 1.5.16

An isomorphism invariant (of a group) is a property \(P\) (of groups) such that whenever \(G\) and \(H\) are isomorphic groups and \(G\) has the property \(P\), then \(H\) also has the property \(P\).

Theorem 1.5.17

The following are isomorphism invariants:

  1. the order of the group,
  2. the set of all the orders of elements in the group,
  3. the property of being abelian,
  4. the order of the center of the group,
  5. being finitely generated.

Recall that by definition two sets have the same cardinality if and only if they are in bijection with each other.

Proof of Theorem 1.5.17

Let \(f\!:G\to H\) be any group isomorphism.

  1. Since \(f\) is a bijection by Remark 1.5.4, we conclude that \(|G|=|H|\).
  2. We wish to show that \(\{|x| \ | \ x\in G\}= \{|y| \ | \ y\in H\}\). (\(\subseteq\)) follows from Exercise 1.5.15: given any \(x\in G\), we have \(|x| = |f(x)|\), which is the order of an element in \(H\). (\(\supseteq\)) follows from the previous statement applied to the group isomorphism \(f^{-1}\): given any \(y\in H\), we have \(f^{-1}(y)\in G\) and \(|y| = |f^{-1}(y)|\) is the order of an element of \(G\).
  3. For any \(y_1,y_2\in H\) there exist some \(x_1, x_2\in G\) such that \(f(x_i)=y_i\). Then we have \[ y_1y_2=f(x_1)f(x_2)=f(x_1x_2)\stackrel{*}{=}f(x_2x_1)=f(x_2)f(x_1)=y_2y_1, \] where \(*\) indicates the place where we used that \(G\) is abelian.
  4. Exercise. The idea is to show \(f\) induces an isomorphism \(\mathcal{Z}(G)\cong \mathcal{Z}(H)\).
  5. Exercise. Show that if \(S\) generates \(G\) then \(f(S)=\{f(s) \ | \ s\in S\}\) generates \(H\).

The easiest way to show that two groups are not isomorphic is to find an isomorphism invariant that they do not share.

Remark 1.5.18

Let \(G\) and \(H\) be two groups. If \(P\) is an isomorphism invariant, and \(G\) has \(P\) while \(H\) does not have \(P\), then \(G\) is not isomorphic to \(H\).

Example 1.5.19

  1. We have \(S_n\cong S_m\) if and only if \(n=m\), since \(|S_n| = n!\) and \(|S_m| = m!\) and the order of a group is an isomorphism invariant.
  2. Since \(\mathbb{Z}/6\) is abelian and \(S_3\) is not abelian, we conclude that \(\mathbb{Z}/6\ncong S_3\).
  3. You will show in Problem Set 2 that \(|Z(D_{24})|=2\), while \(S_4\) has trivial center. We conclude that \(D_{24}\ncong S_4\).

2. Group actions: a first look

We come to one of the central concepts in group theory: the action of a group on a set. Some would say this is the main reason one would study groups, so we want to introduce it early both as motivation for studying group theory but also because the language of group actions will be very helpful to us.

2.1 What is a group action?

Definition 2.1.1

For a group \((G, \cdot)\) and set \(S\), an action of \(G\) on \(S\) is a function

\[G \times S \to S,\]

typically written as \((g,s) \mapsto g \cdot s\), such that

  1. \(g \cdot (h \cdot s) = (gh) \cdot s\) for all \(g,h \in G\) and \(s\in S\).
  2. \(e_G \cdot s = s\) for all \(s \in S\).

Remark 2.1.2

To make the first axiom clearer, we will write \(\cdot\) for the action of \(G\) on \(S\) and no symbol (concatenation) for the multiplication of two elements in the group \(G\).

A group action of \(G\) on \(S\) is the same thing as a group homomorphism from \(G\) to the group \(\mathrm{Perm}(S)\) of permutations of \(S\), as the following lemma makes precise.

Lemma 2.1.3 (Permutation representation)

Consider a group \(G\) and a set \(S\).

  1. Suppose \(\cdot\) is an action of \(G\) on \(S\). For each \(g \in G\), let \(\mu_g\!:S\longrightarrow S\) denote the function given by \(\mu_g(s)=g \cdot s\). Then the function
    \[ \rho\!: G \to \mathrm{Perm}(S), \quad g \mapsto \mu_g \]
    is a well-defined homomorphism of groups.
  2. Conversely, if \(\rho: G \to \mathrm{Perm}(S)\) is a group homomorphism, then the rule
    \[g \cdot s := (\rho(g))(s)\]
    defines an action of \(G\) on \(S\).

Proof of Lemma 2.1.3

(1) Assume we are given an action of \(G\) on \(S\). We first need to check that for all \(g\), \(\mu_g\) really is a permutation of \(S\). We will show this by proving that \(\mu_g\) has a two-sided inverse; in fact, that inverse is \(\mu_{g^{-1}}\).

Indeed, we have

\[ \begin{aligned} (\mu_g\circ\mu_{g^{-1}})(s) &=\mu_g(\mu_{g^{-1}}(s)) & \text{ by the definition of composition}\\ &=g\cdot (g^{-1} \cdot s) & \text{ by the definition for } \mu_g \text{ and } \mu_{g^{-1}}\\ &=(gg^{-1})\cdot s & \text{ by the definition of a group action}\\ &=e_G\cdot s & \text{ by the definition of a group}\\ &= s &\text{ by the definition of a group action} \end{aligned} \]

thus \(\mu_g \circ \mu_{g^{-1}}=\mathrm{id}_S\), and a similar argument shows that \(\mu_{g^{-1}}\circ \mu_{g}=\mathrm{id}_S\) (exercise!). This shows that \(\mu_g\) has an inverse, and thus it is bijective; it must then be a permutation of \(S\).

Finally, we wish to show that \(\rho\) is a homomorphism of groups, so we need to check that \(\rho(gh)=\rho(g) \circ \rho(h)\). Equivalently, we need to prove that \(\mu_{gh}=\mu_g\circ\mu_{h}\). Now for all \(s\), we have

\[ \begin{aligned} \mu_{gh}(s) & = (gh) \cdot s & \textrm{ by definition of $\mu$} \\ & = g\cdot(h \cdot s) & \textrm{ by definition of a group action} \\ & =\mu_g\left(\mu_{h}(s)\right) & \textrm{by definition of } \mu_g \textrm{ and } \mu_h \\ & = (\mu_g \circ \mu_{h})(s). \end{aligned} \]

This proves that \(\rho\) is a homomorphism.

(2) On the other hand, given a homomorphism \(\rho\), the function

\[ (g,s) \mapsto g \cdot s = \rho(g)(s) \]
is an action, because

\[ \begin{aligned} g \cdot (h \cdot s) & = \rho(g)(\rho(h)(s)) & \textrm{by definition of $\cdot$}\\ & = (\rho(g) \circ \rho(h))(s) \\ & = \rho(gh)(s) & \textrm{since $\rho$ is a homomorphism} \\ & = (gh) \cdot s & \textrm{by definition of } \cdot, \end{aligned} \]

and \[ e_G \cdot s = \rho(e_G)(s) = \mathrm{id}(s) = s. \]

Definition 2.1.4

Given a group \(G\) acting on a set \(S\), the group homomorphism \(\rho\) associated to the action as defined in Lemma 2.1.3 is called the permutation representation of the action.

Definition 2.1.5

Let \(G\) be a group acting on a set \(S\). The equivalence relation on \(S\) induced by the action of \(G\), written \(\sim_G\), is defined by \(s\sim_G t\) if and only if there is a \(g \in G\) such that \(t=g\cdot s\). The equivalence classes of \(\sim_G\) are called orbits: the equivalence class

\[ \mathrm{Orb}_G(s) := \{g\cdot s \ | \ g\in G\} \]

is the orbit of \(s\). The set of equivalence classes with respect to \(\sim_G\) is written \(S/G\).

Lemma 2.1.6

Let \(G\) be a group acting on a set \(S\). Then

  1. The relation \(\sim_G\) really is an equivalence relation.
  2. For any \(s,t \in S\) either \(\mathrm{Orb}_G(s)=\mathrm{Orb}_G(t)\) or \(\mathrm{Orb}_G(s)\cap \mathrm{Orb}_G(t)=\emptyset\).
  3. The orbits of the action of \(G\) form a partition of \(S\): \(S=\bigcup_{s \in S} \mathrm{Orb}_G(s)\).

Proof of Lemma 2.1.6

Assume \(G\) acts on \(S\).

  1. We really need to prove three things: that \(\sim_G\) is reflexive, symmetric, and transitive.

    (Reflexive): We have \(x \sim_G x\) for all \(x \in S\) since \(x = e_G \cdot x\).

    (Symmetric): If \(x \sim_G y\), then \(y = g \cdot x\) for some \(g \in G\), and thus \[ g^{-1} \cdot y = g^{-1} \cdot (g \cdot x) = (g^{-1}g) \cdot x = e \cdot x = x, \] which shows that \(y \sim_G x\).

    (Transitive): If \(x \sim_G y\) and \(y \sim_G z\), then \(y = g \cdot x\) and \(z = h \cdot y\) for some \(g, h \in G\) and hence \[ z = h \cdot (g \cdot x) = (hg) \cdot x, \] which gives \(x \sim_G z\).

Parts (2) and (3) are formal properties of the equivalence classes for any equivalence relation.

Corollary 2.1.7

Suppose a group \(G\) acts on a finite set \(S\). Let \(s_1, \dots, s_k\) be a complete set of orbit representatives — that is, assume each orbit contains exactly one member of the list \(s_1, \dots, s_k\). Then

\[|S| = \sum_{i = 1}^k |\mathrm{Orb}_G(s_i)|.\]

Proof of Corollary 2.1.7

This is an immediate corollary of the fact that the orbits form a partition of \(S\).
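For finite sets, orbits can be computed mechanically by closing a point under the generators of the group. The sketch below (conventions mine: points are integers, permutations are dicts) computes orbits and confirms the counting formula of Corollary 2.1.7 in a small example.

```python
def orbit(s, generators):
    """Orbit of s under the group generated by `generators`, each a dict
    sending points to points. On a finite set, closing under the generators
    alone suffices, since inverses appear among their powers."""
    seen = {s}
    frontier = [s]
    while frontier:
        t = frontier.pop()
        for g in generators:
            u = g[t]
            if u not in seen:
                seen.add(u)
                frontier.append(u)
    return seen

# The permutation (0 1 2)(3 4) acting on {0, ..., 5} via the cyclic group
# it generates: the orbits should be {0,1,2}, {3,4}, and {5}.
g = {0: 1, 1: 2, 2: 0, 3: 4, 4: 3, 5: 5}
orbits, remaining = [], set(g)
while remaining:
    o = orbit(min(remaining), [g])
    orbits.append(o)
    remaining -= o
assert sorted(map(len, orbits)) == [1, 2, 3]
assert sum(len(o) for o in orbits) == len(g)   # |S| = sum of orbit sizes
```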

Remark 2.1.8

Let \(G\) be a group acting on \(S\). The associated group homomorphism \(\rho\) is injective if and only if it has trivial kernel, by Lemma 1.5.12. This is equivalent to the statement \(\mu_g = \mathrm{id}_S \implies g = e_G\). The latter can be written in terms of elements of \(S\): for each \(g \in G\),

\[g \cdot s = s \quad \textrm{for all } s \in S \implies g = e_G.\]

Definition 2.1.9

Let \(G\) be a group acting on a set \(S\). The action is faithful if the associated group homomorphism is injective. Equivalently, the action is faithful if and only if

\[g \cdot s = s \quad \textrm{for all } s \in S \implies g = e_G.\]

The action is transitive if for all \(p,q \in S\) there is \(g \in G\) such that \(q=g\cdot p\). Equivalently, the action is transitive if there is only one orbit, meaning that

\[\mathrm{Orb}_G(p)=S \textrm{ for all } p\in S.\]

2.2 Examples of group actions

Example 2.2.1 (Trivial action)

For any group \(G\) and any set \(S\), \(g \cdot s := s\) defines an action, the trivial action. The associated group homomorphism is the map

\[ \rho: G \longrightarrow \mathrm{Perm}(S),\quad g \longmapsto \mathrm{id}_S. \]

The trivial action is not faithful unless the group \(G\) is trivial; indeed, the corresponding group homomorphism is the trivial homomorphism.

Example 2.2.2

The group \(D_{n}\) acts on the vertices of \(P_n\), which we will label with \(V_1, \dots, V_{n}\) in a counterclockwise fashion, with \(V_1\) on the positive \(x\)-axis, as in Notation 1.3.11. Note that \(D_{n}\) acts on \(\{V_1, \dots, V_n \}\): for each \(g \in D_{n}\) and each integer \(1 \leqslant j \leqslant n\), we set

\[ g \cdot V_j = V_i \quad \textrm{ if and only if } \quad g(V_j)=V_i. \]

This satisfies the two axioms of a group action (check!).

Let \(\rho\!: D_{n} \to \mathrm{Perm}\!\left(\{V_1,\ldots,V_n\}\right)\cong S_n\) be the associated group homomorphism. Note that \(\rho\) is injective, because if an element of \(D_{n}\) fixes all \(n\) vertices of a polygon, then it must be the identity map. More generally, if an isometry of \(\mathbb{R}^2\) fixes any three noncollinear points, then it is the identity. To see this, note that given three noncollinear points, every point in the plane is uniquely determined by its distances from these three points (exercise!).

The action of \(D_{n}\) on the \(n\) vertices of \(P_n\) is faithful; in fact, we saw before that each \(\sigma \in D_n\) is completely determined by what it does to any two adjacent vertices.

Example 2.2.3 (group acting on itself by left multiplication)

Let \(G\) be any group and define an action \(\cdot\) of \(G\) on \(G\) (regarded as just a set) by the rule

\[ g \cdot x := g x. \]

This is an action, since multiplication is associative and \(e_G \cdot x = x\) for all \(x\); it is known as the left regular action of \(G\) on itself.

The left regular action of \(G\) on itself is faithful, since if \(g \cdot x = x\) for all \(x\) (or even for just one \(x\)), then \(g = e\). It follows that the associated homomorphism is injective. This action is also transitive: given any \(g \in G\), \(g = g \cdot e\), and thus \(\mathrm{Orb}_G(e) = G\).

Example 2.2.4 (conjugation)

Let \(G\) be any group. Define the conjugation action of \(G\) on itself by setting

\[ g\cdot x := gxg^{-1} \textrm{ for any } g,x\in G. \]

The action of \(G\) on itself by conjugation is not necessarily faithful. In fact, we claim that the kernel of the permutation representation \(\rho\!:G\to \mathrm{Perm}(G)\) for the conjugation action is the center \(\mathrm{Z}(G)\). Indeed,

\[ g\in \ker\rho\iff g\cdot x=x \textrm{ for all } x\in G \iff gxg^{-1}=x \textrm{ for all } x\in G \] \[ \iff gx=xg \textrm{ for all } x\in G \iff g\in \mathrm{Z}(G). \]

The orbits for this action are quite interesting, and we will study them in more detail later. This action is transitive only when \(G\) is trivial: note that \(\mathrm{Orb}_G(e) = \{ e \}\).
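The computation \(\ker \rho = \mathrm{Z}(G)\) can be illustrated by brute force in a small group. The sketch below (representing \(S_3\) as permutation tuples, a convention of mine) finds all elements that act trivially by conjugation.

```python
from itertools import permutations

def compose(f, g):
    """The composite f ∘ g of permutations of {0, 1, 2} stored as tuples."""
    return tuple(f[g[i]] for i in range(3))

def inverse(f):
    inv = [0] * 3
    for i, fi in enumerate(f):
        inv[fi] = i
    return tuple(inv)

S3 = list(permutations(range(3)))
# g lies in the kernel of the conjugation action iff g x g^{-1} = x for all x,
# i.e. iff g is in the center Z(S_3).
center = [g for g in S3
          if all(compose(compose(g, x), inverse(g)) == x for x in S3)]
assert center == [(0, 1, 2)]   # the center of S_3 is trivial
```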

3. Subgroups

Every time we define a new abstract structure consisting of a set \(S\) with some extra structure, we then want to consider subsets of \(S\) that inherit that special structure. It is now time to discuss subgroups.

3.1 Definition and examples

Definition 3.1.1

A nonempty subset \(H\) of a group \(G\) is a subgroup of \(G\) if \(H\) is a group under the multiplication law of \(G\). If \(H\) is a subgroup of \(G\), we write \(H \leq G\), or \(H \lt G\) if we want to indicate that \(H\) is a subgroup of \(G\) but \(H \neq G\).

Remark 3.1.2

Note that if \(H\) is a subgroup of \(G\), then necessarily \(H\) must be closed under the product in \(G\), meaning that for any \(x,y \in H\) we must have \(xy \in H\).

Remark 3.1.3

Let \(H\) be a subgroup of \(G\). Since \(H\) itself is a group, it has an identity element \(e_H\), and thus \[ e_H e_H = e_H \] in \(H\). But the product in \(H\) is just a restriction of the product of \(G\), so this equality also holds in \(G\). Multiplying by \(e_H^{-1}\), we conclude that \(e_H = e_G\).

In summary, if \(H\) is any subgroup of \(G\), then we must have \(e_G \in H\).

Example 3.1.4

Any group \(G\) has two trivial subgroups: \(G\) itself, and \(\{ e_G \}\).

Any subgroup \(H\) of \(G\) that is neither \(G\) nor \(\{ e_G \}\) is a nontrivial subgroup. A group might not have any nontrivial subgroups.

Example 3.1.5

The group \(\mathbb{Z}/2\) has no nontrivial subgroup.

Example 3.1.6

The following are strings of subgroups with the obvious group structure:

\[ \mathbb{Z} < \mathbb{Q} < \mathbb{R} < \mathbb{C} \quad \textrm{and} \quad \mathbb{Z}^\times < \mathbb{Q}^\times < \mathbb{R}^\times < \mathbb{C}^\times. \]

Lemma 3.1.7 (Subgroup tests)

Let \(H\) be a subset of a group \(G\).

  • Two-step test: If \(H\) is nonempty and closed under multiplication and taking inverses, then \(H\) is a subgroup of \(G\). More precisely, if for all \(x, y \in H\), we have \(xy \in H\) and \(x^{-1} \in H\), then \(H\) is a subgroup of \(G\).
  • One-step test: If \(H\) is nonempty and \(xy^{-1} \in H\) for all \(x,y \in H\), then \(H\) is a subgroup of \(G\).

Proof of Lemma 3.1.7

We prove the One-step test first. Assume \(H\) is nonempty and for all \(x,y \in H\) we have \(xy^{-1} \in H\). Since \(H\) is nonempty, there is some \(h \in H\), and hence \(e_G = hh^{-1} \in H\). Since \(e_Gx=x=xe_G\) for any \(x\in G\), and hence for any \(x \in H\), the element \(e_G\) is an identity element for \(H\). For any \(h \in H\), we have \(h^{-1} = e_Gh^{-1} \in H\), and since the equalities \(h^{-1}h = e_G = hh^{-1}\) hold in \(G\) and do not change when we restrict to \(H\), every element of \(H\) has an inverse inside \(H\). Finally, for every \(x,y \in H\) we must have \(y^{-1} \in H\), and thus

\[ xy = x(y^{-1})^{-1} \in H \]

so \(H\) is closed under the multiplication operation. This means that the restriction of the group operation of \(G\) to \(H\) is a well-defined group operation. This operation is associative by the axioms for the group \(G\). The axioms of a group have now been established for \((H, \cdot)\).

Now we prove the Two-step test. Assume \(H\) is nonempty and closed under multiplication and taking inverses. Then for all \(x,y\in H\) we must have \(y^{-1}\in H\) and thus \(xy^{-1}\in H\). Since the hypothesis of the One-step test is satisfied, we conclude that \(H\) is a subgroup of \(G\).

Lemma 3.1.8 (Examples of subgroups)

Let \(G\) be a group.

  1. If \(H\) is a subgroup of \(G\) and \(K\) is a subgroup of \(H\), then \(K\) is a subgroup of \(G\).
  2. Let \(J\) be any (index) set. If \(H_\alpha\) is a subgroup of \(G\) for all \(\alpha \in J\), then \(H=\bigcap_{\alpha\in J} H_\alpha\) is a subgroup of \(G\).
  3. If \(f: G \to H\) is a homomorphism of groups, then \(\mathrm{im}(f)\) is a subgroup of \(H\).
  4. If \(f: G \to H\) is a homomorphism of groups, and \(K\) is a subgroup of \(G\), then \[ f(K) := \{ f(g) \mid g \in K \} \] is a subgroup of \(H\).
  5. If \(f: G \to H\) is a homomorphism of groups, then \(\ker(f)\) is a subgroup of \(G\).
  6. The center \(\mathrm{Z}(G)\) is a subgroup of \(G\).

Proof of Lemma 3.1.8

  1. By definition, \(K\) is a group under the multiplication in \(H\), and the multiplication in \(H\) is the same as that in \(G\), so \(K\) is a subgroup of \(G\).
  2. First, note that \(H\) is nonempty since \(e_G \in H_\alpha\) for all \(\alpha\in J\). Moreover, given \(x,y\in H\), for each \(\alpha\) we have \(x,y \in H_\alpha\) and hence \(xy^{-1} \in H_\alpha\). It follows that \(xy^{-1} \in H\). By the One-step test, \(H\) is a subgroup of \(G\).
  3. Since \(G\) is nonempty, \(\mathrm{im}(f)\) must also be nonempty; for example, it contains \(f(e_G) = e_H\). If \(x,y \in \mathrm{im}(f)\), then \(x = f(a)\) and \(y = f(b)\) for some \(a,b \in G\), and hence \[ xy^{-1} =f(a)f(b)^{-1} = f(ab^{-1}) \in \mathrm{im}(f). \] By the One-step test, \(\mathrm{im}(f)\) is a subgroup of \(H\).
  4. The restriction \(g\!: K \to H\) of \(f\) to \(K\) is still a group homomorphism, and thus \(f(K) = \mathrm{im}(g)\) is a subgroup of \(H\).
  5. We use the One-step test. First, \(\ker(f)\) is nonempty, since \(e_G \in \ker(f)\) by Lemma 1.5.7. Moreover, if \(x, y \in \ker(f)\), meaning \(f(x)=f(y)=e_H\), then \[ f(xy^{-1})=f(x)f(y)^{-1}=e_He_H^{-1}=e_H. \] This shows that if \(x,y\in \ker(f)\) then \(xy^{-1}\in \ker(f)\), so by the One-step test, \(\ker(f)\) is a subgroup of \(G\).
  6. The center \(\mathrm{Z}(G)\) is the kernel of the permutation representation \(G\to \mathrm{Perm}(G)\) for the conjugation action, so \(\mathrm{Z}(G)\) is a subgroup of \(G\) since the kernel of a homomorphism is a subgroup.
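The subgroup tests above are easy to experiment with. Here is a quick Python sanity check (our own illustration, not part of the notes; the homomorphism \(f(x) = 3x\) on \(\mathbb{Z}/12\) is just a convenient example): we verify the One-Step Test for its kernel and image.

```python
# Brute-force One-Step subgroup Test in the additive group Z/12.
n = 12

def is_subgroup(S):
    # One-Step Test: nonempty and closed under x * y^{-1} (here x - y mod n).
    return bool(S) and all((x - y) % n in S for x in S for y in S)

def f(x):
    # A homomorphism f: Z/12 -> Z/12, f(x) = 3x mod 12.
    return (3 * x) % n

kernel = {x for x in range(n) if f(x) == 0}
image = {f(x) for x in range(n)}

print(sorted(kernel), sorted(image))  # [0, 4, 8] [0, 3, 6, 9]
assert is_subgroup(kernel) and is_subgroup(image)
```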

Example 3.1.9

For any field \(F\), the special linear group

\[ \mathrm{SL}_n(F) := \{A \mid A = n\times n \text{ matrix with entries in } F, \det(A)=1_F\} \]

is a subgroup of the general linear group \(\mathrm{GL}_n(F)\). To prove this, note that \(\mathrm{SL}_n(F)\) is the kernel of the determinant map \(\det\!:\mathrm{GL}_n(F)\to F^\times\), which is one of the homomorphisms in Example 1.5.6. By Lemma 3.1.8, this implies that \(\mathrm{SL}_n(F)\) is indeed a subgroup of \(\mathrm{GL}_n(F)\).

Definition 3.1.10

Let \(f\!:G\to H\) be a group homomorphism and \(K\leq H\). The preimage of \(K\) is given by

\[ f^{-1}(K) := \{g\in G \mid f(g)\in K\} \]

Exercise 3.1.11

Prove that if \(f\!:G\to H\) is a group homomorphism and \(K\leq H\), then the preimage of \(K\) is a subgroup of \(G\).

Exercise 3.1.12

The set of rotational symmetries \(\{ r^i \mid i \in \mathbb{Z} \} = \{\mathrm{id}, r, r^2, \dots, r^{n-1}\}\) of \(P_n\) is a subgroup of \(D_{n}\).

In fact, this is the subgroup generated by \(r\).

Definition 3.1.13

Given a group \(G\) and a subset \(X\) of \(G\), the subgroup of \(G\) generated by \(X\) is

\[ \langle X \rangle := \bigcap_{\substack{H \leq G \\ H \supseteq X}} H. \]

If \(X=\{x\}\) is a set with one element, then we write \(\langle X \rangle=\langle x \rangle\) and we refer to this as the cyclic subgroup generated by \(x\). More generally, when \(X = \{ x_1, \ldots, x_n \}\) is finite, we may write \(\langle x_1, \ldots, x_n \rangle\) instead of \(\langle X \rangle\). Finally, given two subsets \(X\) and \(Y\) of \(G\), we may sometimes write \(\langle X, Y \rangle\) instead of \(\langle X \cup Y \rangle\).

Remark 3.1.14

Note that by Lemma 3.1.8, \(\langle X \rangle\) really is a subgroup of \(G\). By definition, the subgroup generated by \(X\) is the smallest (with respect to containment) subgroup of \(G\) that contains \(X\), meaning that \(\langle X \rangle\) is contained in any subgroup that contains \(X\).

Remark 3.1.15

Do not confuse this notation with giving generators and relations for a group; here we are forgoing the relations and focusing only on writing a list of generators. Another key difference is that we have picked elements in a given group \(G\), but the subgroup they generate might not be \(G\) itself, but rather some other subgroup of \(G\).

Lemma 3.1.16

For a subset \(X\) of \(G\), the elements of \(\langle X \rangle\) can be described as:

\[ \langle X \rangle = \left\{x_1^{j_1} \cdots x_m^{j_m} \mid m \geqslant 0, j_1, \dots, j_m \in \mathbb{Z} \text{ and }x_1, \dots, x_m \in X \right\}. \]

Note that the product of no elements is by definition the identity.

Proof of Lemma 3.1.16

Let

\[ S= \left\{x_1^{j_1} \cdots x_m^{j_m} \mid m \geqslant 0, j_1, \dots, j_m \in \mathbb{Z} \text{ and }x_1, \dots, x_m \in X \right\}. \]

Since \(\langle X \rangle\) is a subgroup that contains \(X\), it is closed under products and inverses, and thus must contain all elements of \(S\). Thus \(\langle X \rangle \supseteq S\).

To show \(\langle X \rangle \subseteq S\), we will prove that the set \(S\) is a subgroup of \(G\) using the One-Step Test:

  • \(S \neq \emptyset\) since we allow \(m = 0\) and declare the empty product to be \(e_G\).
  • Let \(a\) and \(b\) be elements of \(S\), so that they can be written as \(a = x_1^{j_1} \cdots x_m^{j_m}\) and \(b= y_1^{i_1} \cdots y_n^{i_n}\). Then
    \[ ab^{-1} = x_1^{j_1} \cdots x_m^{j_m}(y_1^{i_1} \cdots y_n^{i_n})^{-1}= x_1^{j_1} \cdots x_m^{j_m} y_n^{-i_n} \cdots y_1^{-i_1} \in S. \]

Therefore, \(S\leq G\) and \(X\subseteq S\) (by taking \(m=1\) and \(j_1=1\)) and by the minimality of \(\langle X \rangle\) we conclude that \(\langle X \rangle\subseteq S\).

Example 3.1.17

The Lemma implies that for an element \(x\) of a group \(G\), \(\langle x\rangle=\{x^j \mid j\in \mathbb{Z}\}\).
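Concretely, in additive notation one can tabulate such a cyclic subgroup by brute force. The following Python snippet (an illustration of ours, not from the notes) computes \(\langle 8 \rangle\) in \(\mathbb{Z}/12\).

```python
from math import gcd

# The cyclic subgroup <x> = {x^j | j in Z} in the additive group Z/12,
# where the "j-th power" of x is the multiple j*x mod 12.
n, x = 12, 8
subgroup = {(j * x) % n for j in range(n)}  # j = 0, ..., n-1 suffices: the multiples cycle

print(sorted(subgroup))  # [0, 4, 8]
assert len(subgroup) == n // gcd(n, x)  # anticipates the order formula proved later (Theorem 3.3.6)
```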

Example 3.1.18

We showed in Theorem 1.3.19 that \(D_{n}=\langle r,s \rangle\), so \(D_{n}\) is the subgroup of \(D_{n}\) generated by \(\{r,s\}\). But do not mistake this for a presentation with no relations! In fact, these generators satisfy lots of relations, such as \(srs=r^{-1}\), which we proved in Lemma 1.3.16.

Example 3.1.19

For any \(n \geqslant 1\), we proved in Problem Set 2 that \(S_n\) is generated by the collection of adjacent transpositions \((i \quad i+1)\).
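Lemma 3.1.16 also suggests an algorithm for finite groups: starting from the identity, repeatedly multiply by generators until no new elements appear. The Python sketch below (ours; permutations are encoded as 0-indexed tuples, an illustrative choice) confirms that the three adjacent transpositions generate all of \(S_4\).

```python
from itertools import permutations

# Compute <X> by closure: starting from the identity, repeatedly multiply
# by the generators until nothing new appears (valid in a finite group,
# since inverses are positive powers there).
def compose(p, q):
    # Composition of 0-indexed permutation tuples: (p o q)(i) = p[q[i]].
    return tuple(p[i] for i in q)

def generated_subgroup(gens):
    identity = tuple(range(len(gens[0])))
    H, frontier = {identity}, {identity}
    while frontier:
        new = {compose(h, g) for h in frontier for g in gens} - H
        H |= new
        frontier = new
    return H

# Adjacent transpositions (1 2), (2 3), (3 4) of S_4, written 0-indexed.
gens = [(1, 0, 2, 3), (0, 2, 1, 3), (0, 1, 3, 2)]
G = generated_subgroup(gens)
print(len(G))  # 24
assert G == set(permutations(range(4)))
```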

Theorem 3.1.20 (Cayley's Theorem)

Every finite group of order \(n\) is isomorphic to a subgroup of the symmetric group \(S_n\).

Proof of Theorem 3.1.20

Suppose \(G\) is a finite group of order \(n\) and label the group elements of \(G\) from \(1\) to \(n\) in any way you like. The left regular action of \(G\) on itself determines a permutation representation \(\rho\!:G\to \mathrm{Perm}(G)\), which is injective. Note that since \(G\) has \(n\) elements, \(\mathrm{Perm}(G)\) is the group of permutations on \(n\) elements, and thus \(\mathrm{Perm}(G) \cong S_n\). By Lemma 3.1.8, \(\mathrm{im}(\rho)\) is a subgroup of \(S_n\). If we restrict \(\rho\) to its image, we get an isomorphism \(\rho\!: G \to \mathrm{im}(\rho)\). Hence \(G\cong \mathrm{im}(\rho)\), which is a subgroup of \(S_n\).
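The proof is entirely constructive and easy to carry out for a small group. The following Python sketch (our illustration, not from the notes) realizes \(\mathbb{Z}/4\) inside \(S_4\) via its left regular action.

```python
# Cayley's Theorem for G = Z/4: the left regular action sends g to the
# permutation x -> (g + x) mod 4 of the labels {0, 1, 2, 3}.
n = 4

def rho(g):
    return tuple((g + x) % n for x in range(n))

def compose(p, q):
    return tuple(p[q[x]] for x in range(n))

perms = {g: rho(g) for g in range(n)}
print(perms[1])  # (1, 2, 3, 0), a 4-cycle

# rho is a homomorphism into Perm({0,...,3}), which is isomorphic to S_4 ...
assert all(perms[(g + h) % n] == compose(perms[g], perms[h])
           for g in range(n) for h in range(n))
# ... and injective, so Z/4 is isomorphic to its image, a subgroup of S_4.
assert len(set(perms.values())) == n
```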

Remark 3.1.21

From a practical perspective, this is a nearly useless theorem. It is, however, a beautiful fact.

3.2 Subgroups vs isomorphism invariants

Some properties of a group \(G\) are inherited by all of its subgroups, but not all of them are. In this section, we collect examples illustrating some of the most important properties.

Theorem 3.2.1 (Lagrange's Theorem)

If \(H\) is a subgroup of a finite group \(G\), then \(|H|\) divides \(|G|\).

You will prove Lagrange's Theorem in the next problem set.

Exercise 3.2.2

Let \(G\) be a finite group. Suppose that \(A\) and \(B\) are subgroups of \(G\) such that \(\gcd(|A|, |B|) = 1\). Show that \(A \cap B = \{ e \}\).

Example 3.2.3 (Infinite group with finite subgroup)

The group \(\mathrm{SL}_2(\mathbb{R})\) is infinite, but the matrix

\[ A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \]

has order \(2\) and it generates the subgroup \(\langle A \rangle = \{ A, I \}\) with two elements.

Example 3.2.4 (Nonabelian group with abelian subgroup)

The dihedral group \(D_n\), with \(n \geqslant 3\), is nonabelian, while the subgroup of rotations (see Exercise 3.1.12) is abelian (for example, because it is cyclic; see Lemma 3.3.4 below).

To give an example of a finitely generated group with an infinitely generated subgroup, we have to work a bit harder.

Example 3.2.5 (Finitely generated group with infinitely generated subgroup)

Consider the subgroup \(G\) of \(\mathrm{GL}_2(\mathbb{Q})\) generated by

\[ A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \qquad \textrm{and} \qquad B = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}. \]

Let \(H\) be the subgroup of \(\mathrm{GL}_2(\mathbb{Q})\) given by

\[ H = \left\lbrace \begin{pmatrix} 1 & \tfrac{n}{\, 2^m} \\ 0 & 1 \end{pmatrix} \in G \;\middle|\; n, m \in \mathbb{Z} \right\rbrace. \]

We leave it as an exercise to check that this is indeed a subgroup of \(\mathrm{GL}_2(\mathbb{Q})\). Note that for all integers \(n\) and \(m\) we have

\[ A^n = \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix} \qquad \textrm{and} \qquad B^m = \begin{pmatrix} 2^m & 0 \\ 0 & 1 \end{pmatrix}, \]

and

\[ B^{-m} A^n B^m = \begin{pmatrix} 1 & \tfrac{n}{\, 2^m} \\ 0 & 1 \end{pmatrix} \in H. \]

Therefore, \(H\) is a subgroup of \(G\), and in fact

\[ H = \langle B^{-m} A^n B^m \mid n, m \in \mathbb{Z} \rangle. \]

While \(G = \langle A, B \rangle\) is finitely generated by construction, we claim that \(H\) is not. The issue is that

\[ \begin{pmatrix} 1 & \tfrac{a}{\, 2^b} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & \tfrac{c}{\, 2^d} \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & \tfrac{a}{\, 2^b} + \tfrac{c}{\, 2^d} \\ 0 & 1 \end{pmatrix}, \]

so the subgroup generated by any finite set of matrices in \(H\), say

\[ \left\langle \begin{pmatrix} 1 & \tfrac{n_1}{\, 2^{m_1}} \\ 0 & 1 \end{pmatrix}, \ldots, \begin{pmatrix} 1 & \tfrac{n_t}{\, 2^{m_t}} \\ 0 & 1 \end{pmatrix} \right\rangle \]

does not contain

\[ \begin{pmatrix} 1 & \tfrac{1}{\, 2^{N}} \\ 0 & 1 \end{pmatrix} \in H \]

with \(N = \max_i \{|m_i| \} + 1\). Thus \(H\) is infinitely generated.
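The matrix identity at the heart of this example can be double-checked with exact rational arithmetic. The following Python sketch (ours, not part of the notes) verifies \(B^{-m} A^n B^m = \begin{pmatrix} 1 & n/2^m \\ 0 & 1 \end{pmatrix}\) for a range of exponents.

```python
from fractions import Fraction

# Exact check of B^{-m} A^n B^m = [[1, n/2^m], [0, 1]] from the example above.
def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def A_pow(n):
    # A^n = [[1, n], [0, 1]].
    return [[Fraction(1), Fraction(n)], [Fraction(0), Fraction(1)]]

def B_pow(m):
    # B^m = [[2^m, 0], [0, 1]]; Fraction handles negative m exactly.
    return [[Fraction(2) ** m, Fraction(0)], [Fraction(0), Fraction(1)]]

for n in range(-3, 4):
    for m in range(-3, 4):
        C = mat_mul(mat_mul(B_pow(-m), A_pow(n)), B_pow(m))
        assert C == [[1, Fraction(n) * Fraction(2) ** -m], [0, 1]]

print(mat_mul(mat_mul(B_pow(-2), A_pow(3)), B_pow(2)))
```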

In the previous example, we constructed a group with two generators that has an infinitely generated subgroup. We will see in the next section that we couldn't have done this with fewer generators; in fact, the subgroups of a cyclic group are all cyclic.


Below we collect some important facts about the relationship between finite groups and their subgroups, including some explained by the examples above and others which we leave as an exercise.

Order of the group:

  • Every subgroup of a finite group is finite.
  • There exist infinite groups with finite subgroups; see Example 3.2.3.
  • Lagrange's Theorem: If \(H\) is a subgroup of a finite group \(G\), then \(|H|\) divides \(|G|\).

Orders of elements:

  • If \(H \leq G\), then the set of orders of elements of \(H\) is a subset of the set of orders of elements of \(G\).

Abelianity:

  • Every subgroup of an abelian group is abelian.
  • There exist nonabelian groups with abelian subgroups; see Example 3.2.4.
  • Every cyclic (sub)group is abelian.

Generators:

  • There exist a finitely generated group \(G\) and a subgroup \(H\) of \(G\) such that \(H\) is not finitely generated; see Example 3.2.5.
  • Every infinitely generated group has finitely generated subgroups.
    (This one is a triviality: we are just noting that even if the group is infinitely generated, we can always consider the subgroup generated by our favorite element, which is, by definition, finitely generated.)
  • Every subgroup of a cyclic group is cyclic; see Theorem 3.3.6.

3.3 Cyclic groups

Recall the definition of a cyclic group.

Definition 3.3.1

If \(G\) is a group generated by a single element, meaning that there exists \(x \in G\) such that \(G = \langle x \rangle\), then \(G\) is a cyclic group.

Remark 3.3.2

Given a cyclic group \(G\), we may be able to pick different generators for \(G\). For example, \(\mathbb{Z}\) is a cyclic group, and both \(1\) and \(-1\) are generators. More generally, for any element \(x\) in a group \(G\),

$$\langle x \rangle= \langle x^{-1}\rangle.$$

Example 3.3.3

The main examples of cyclic groups, in additive notation, are the following:

  • The group \((\mathbb{Z},+)\) is cyclic with generator 1 or -1.
  • The group \((\mathbb{Z}/n,+)\) of congruences modulo \(n\) is cyclic, since it is for example generated by \([1]\). Below we will find all the choices of generators for this group.

In fact, we will later prove that up to isomorphism these are the only examples of cyclic groups.

Let us record some important facts about cyclic groups which you have proved in problem sets:

Lemma 3.3.4

Every cyclic group is abelian.

Lemma 3.3.5

Let \(G\) be a group and \(x \in G\). If \(x^m = e\) then \(|x|\) divides \(m\).

Now we can use these to say more about cyclic groups.

Theorem 3.3.6

Let \(G=\langle x\rangle\), where \(x\) has finite order \(n\). Then

  1. \(|G|=|x|=n\) and \(G=\{e,x,\ldots,x^{n-1}\}\).
  2. For any integer \(k\), we have \(|x^k| = \dfrac{n}{\gcd(k,n)}\). In particular, $$\langle x^k\rangle =G \iff \gcd(n,k)=1.$$
  3. There is a bijection between the set of divisors of \(|G|\) and the set of subgroups of \(G\), given by

    \[ \Psi:\ \{ \text{divisors of } |G| \}\to \{ \text{subgroups of } G \}, \qquad \Psi(d)=\big\langle x^{|G|/d}\big\rangle, \]

    \[ \Phi:\ \{ \text{subgroups of } G \}\to \{ \text{divisors of } |G| \}, \qquad \Phi(H)=|H|. \]

    Thus all subgroups of \(G\) are cyclic, and there is a unique subgroup of each order.

Proof

  1. By Example 3.1.17, we know \(G=\{x^i \mid i \in \mathbb{Z}\}\). Now we claim that the elements $$e = x^0, x^1, \dots, x^{n-1}$$ are all distinct. Indeed, if \(x^i=x^j\) for some \(0\leqslant i<j<n\), then \(x^{j-i}=e\) and \(1 \leqslant j-i<n\), contradicting the minimality of the order \(n\) of \(x\). In particular, this shows that \(|G| \geqslant n\).

    Now take any \(m \in \mathbb{Z}\). By the Division Algorithm, we can write \(m = qn+r\) for some integers \(q, r\) with \(0 \le r < n\). Then $$x^m=x^{nq+r}=(x^n)^q x^r=x^r.$$ This shows that every element in \(G\) can be written in the form \(x^r\) with \(0 \leqslant r < n\), so $$G = \{x^0, x^1, \dots, x^{n-1}\} \qquad \textrm{and} \qquad |G| = n.$$

  2. This is part of the Homework.

  3. Consider any subgroup \(H\) of \(G\) with \(H \neq \{ e \}\), and set $$k := \min \{i \in \mathbb{Z} \mid i>0 \ \textrm{and}\ x^i\in H\}.$$ On the one hand, \(H \supseteq \langle x^k \rangle\), since \(x^k \in H\) and \(H\) is closed under products. Moreover, given any other positive integer \(i\) with \(x^i\in H\), we can again write \(i = kq+r\) for some integers \(q, r\) with \(0 \leqslant r < k\), and $$x^r = x^{i-kq} = x^i (x^{k})^{-q} \in H,$$ so by minimality of \(k\) we conclude that \(r = 0\). Therefore, \(k \mid i\), and thus we conclude that $$H = \langle x^k \rangle.$$ Now to show that \(\Psi\) is a bijection, we only need to prove that \(\Phi\) is a well-defined function and a two-sided inverse for \(\Psi\), and this we leave as an exercise.
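All three parts of the theorem can be confirmed by brute force in a small example. The Python sketch below (ours, not part of the notes) does this for \(\mathbb{Z}/12 = \langle 1 \rangle\) in additive notation.

```python
from math import gcd

# Brute-force check of Theorem 3.3.6 for Z/12 = <1>, in additive notation.
n = 12

def order(k):
    # Smallest j >= 1 with j*k = 0 in Z/n.
    j = 1
    while (j * k) % n != 0:
        j += 1
    return j

# Part 2: the order of k*1 is n / gcd(k, n), for every k.
assert all(order(k) == n // gcd(k, n) for k in range(n))

def cyclic(k):
    return frozenset(j * k % n for j in range(n))

# Part 3: the subgroups (all cyclic) biject with the divisors of 12,
# one subgroup of each order.
subgroups = {cyclic(k) for k in range(n)}
divisors = [d for d in range(1, n + 1) if n % d == 0]
print(sorted(len(H) for H in subgroups))  # [1, 2, 3, 4, 6, 12]
assert sorted(len(H) for H in subgroups) == divisors
```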

Corollary 3.3.7

Let \(G\) be any finite group and consider \(x \in G\). Then \(|x|\) divides \(|G|\).

Proof

The subgroup \(\langle x \rangle\) of \(G\) generated by \(x\) is a cyclic group, and since \(G\) is finite so is \(\langle x \rangle\). By Theorem 3.3.6 above, \(|x| = |\langle x \rangle|\), and by Lagrange's Theorem, the order of \(\langle x \rangle\) divides the order of \(G\).

There is a sort of quasi-converse to Theorem 3.3.6:

Exercise 3.3.8

Show that if a finite group \(G\) has a unique subgroup of order \(d\) for each positive divisor \(d\) of \(|G|\), then \(G\) must be cyclic.

We can say a little more about the bijection in Theorem 3.3.6. Notice how smaller subgroups (with respect to containment) correspond to smaller divisors of \(|G|\). We can make this observation rigorous by talking about partially ordered sets.

Definition 3.3.9

An order relation on a set \(S\) is a binary relation \(\leq\) that satisfies the following properties:

  • Reflexive: \(s \leq s\) for all \(s \in S\).
  • Antisymmetric: if \(a\leq b\) and \(b\leq a\), then \(a=b\).
  • Transitive: if \(a\leq b\) and \(b\leq c\), then \(a \leq c\).

A partially ordered set or poset consists of a set \(S\) endowed with an order relation \(\leq\), which we might indicate by saying that the pair \((S,\leq)\) is a partially ordered set.

Given a poset \((S, \leq)\) and a subset \(T \subseteq S\), an upper bound for \(T\) is an element \(s \in S\) such that \(t \leq s\) for all \(t \in T\), while a lower bound is an element \(s \in S\) such that \(s \leq t\) for all \(t \in T\). An upper bound \(s\) for \(T\) is called a supremum if \(s \leq u\) for all upper bounds \(u\) of \(T\), while a lower bound \(t\) for \(T\) is an infimum if \(l \leq t\) for all lower bounds \(l\) of \(T\). A lattice is a poset in which every two elements have a unique supremum and a unique infimum.

Remark 3.3.10

Note that the word unique can be removed from the definition of lattice. In fact, if a subset \(T \subseteq S\) has a supremum, then that supremum is necessarily unique. Indeed, given two suprema \(s\) and \(t\), then by definition \(s \leq t\), since \(s\) is a supremum and \(t\) is an upper bound for \(T\), but also \(t \leq s\) since \(t\) is a supremum and \(s\) is an upper bound for \(T\). By antisymmetry, we conclude that \(s=t\).

Example 3.3.11

The set of all positive integers is a poset with respect to divisibility, setting \(a\leq b\) whenever \(a\mid b\). In fact, this is a lattice: the supremum of \(a\) and \(b\) is \(\mathrm{lcm}(a,b)\) and the infimum of \(a\) and \(b\) is \(\gcd(a,b)\).
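This is easy to confirm computationally. The following Python sketch (ours, not part of the notes) checks the lower/upper bound conditions and the universal properties of gcd and lcm on a small range.

```python
from math import gcd

# In the divisibility poset on {1, ..., 30}, check that gcd is the
# infimum and lcm the supremum of every pair {a, b}.
def lcm(a, b):
    return a * b // gcd(a, b)

N = 30
for a in range(1, N + 1):
    for b in range(1, N + 1):
        g, l = gcd(a, b), lcm(a, b)
        assert a % g == 0 and b % g == 0   # g is a lower bound: g | a and g | b
        assert l % a == 0 and l % b == 0   # l is an upper bound: a | l and b | l
        # Greatest lower bound: every common divisor d divides g.
        assert all(g % d == 0 for d in range(1, N + 1) if a % d == 0 and b % d == 0)
        # Least upper bound: l divides every common multiple m.
        assert all(m % l == 0 for m in range(l, N * N + 1) if m % a == 0 and m % b == 0)

print(gcd(12, 18), lcm(12, 18))  # 6 36
```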

Example 3.3.12

Given a set \(S\), the power set of \(S\), meaning the set of all subsets of \(S\), is a poset with respect to containment, where the order is defined by \(A\leq B\) whenever \(A\subseteq B\). In fact, this is a lattice: the supremum of \(A\) and \(B\) is \(A\cup B\) and the infimum of \(A\) and \(B\) is \(A\cap B\).

Exercise 3.3.13

Show that the set of all subgroups of a group \(G\) is a poset with respect to containment, setting \(A \leq B\) if \(A \subseteq B\).

Lemma 3.3.14

The set of all subgroups of a group \(G\) is a lattice with respect to containment.

Proof

Let \(A\) and \(B\) be subgroups of \(G\). We need to prove that \(A\) and \(B\) have an infimum and a supremum. We claim that \(A \cap B\) is the infimum and \(\langle A, B \rangle\) is the supremum. First, these are both subgroups of \(G\), by Lemma 3.1.8 in the case of \(A \cap B\) and by definition for the other. Moreover, \(A \cap B\) is a lower bound for \(A\) and \(B\) and \(\langle A, B \rangle\) is an upper bound by definition. Finally, if \(H \leq A\) and \(H \leq B\), then every element of \(H\) is in both \(A\) and \(B\), and thus it must be in \(A \cap B\), so \(H \leq A \cap B\). Similarly, if \(A \leq H\) and \(B \leq H\), then \(\langle A, B \rangle \subseteq H\).

Remark 3.3.15

The bijection \(\Psi\) in Theorem 3.3.6 satisfies the following property: if \(d_1\mid d_2\) then \(\Psi(d_1)\subseteq \Psi(d_2)\). In other words, \(\Psi\) preserves the poset structure. This means that \(\Psi\) is a lattice isomorphism between the lattice of divisors of \(|G|\) and the lattice of subgroups of \(G\). Of course, the inverse map \(\Phi =\Psi^{-1}\) is also a lattice isomorphism.

Lemma 3.3.16 (Universal Mapping Property of a Cyclic Group)

Let \(G = \langle x \rangle\) be a cyclic group and let \(H\) be any other group.

  1. If \(|x| = n < \infty\), then for each \(y \in H\) such that \(y^n = e\), there exists a unique group homomorphism \(f\!: G \to H\) such that \(f(x) = y\).
  2. If \(|x| = \infty\), then for each \(y \in H\), there exists a unique group homomorphism \(f\!: G \to H\) such that \(f(x) = y\).

In both cases this unique group homomorphism is given by \(f(x^i)=y^i\) for any \(i \in \mathbb{Z}\).

Remark 3.3.17

We will later discuss a universal mapping property of any presentation. This is a particular case of that universal mapping property of a presentation, since a cyclic group is either presented by \(\langle x \mid x^n = e \rangle\) or \(\langle x \mid \textrm{--} \rangle\).

Proof

Recall that either \(G = \{e,x,x^2, \dots, x^{n-1}\}\) has exactly \(n\) elements if \(|x| = n\) or \(G = \{ x^i \mid i \in \mathbb{Z} \}\) with no repetitions if \(|x| = \infty\).

Uniqueness: We have already noted that any homomorphism is uniquely determined by the images of the generators of the domain in Remark 1.5.8, and that \(f\) must then be given by \(f(x^i) =f(x)^i = y^i\).

Existence: In either case, define \(f(x^i) = y^i\). We must show this function is a well-defined group homomorphism. To see that \(f\) is well-defined, suppose \(x^i=x^j\) for some \(i,j\in \mathbb{Z}\). Then, since \(x^{i-j}=e_G\), using Lemma 3.3.5 we have $$\begin{cases} n\mid i-j & \text{ if } |x|=n\\[2pt] i-j=0 & \text{ if } |x|=\infty \end{cases} \ \Rightarrow\ \begin{cases} y^{\,i-j}=y^{nk} & \text{ if } |x|=n\\[2pt] y^{\,i-j}=y^0 & \text{ if } |x|=\infty \end{cases} \ \Rightarrow\ y^{\,i-j}=e_H \ \Rightarrow\ y^ i=y^j.$$ Thus, if \(x^i=x^j\) then \(f(x^i)=y^i=y^j=f(x^j)\). In particular, if \(x^k = e\), then \(f(x^k) = e\), and \(f\) is well-defined.

The fact that \(f\) is a homomorphism is immediate: $$f(x^ix^j)=f(x^{i+j})=y^{i+j}=y^iy^j=f(x^i)f(x^j). $$
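In additive notation, the universal mapping property is easy to test numerically. The Python sketch below (a toy instance of ours; the numbers are an arbitrary choice) takes \(G = \mathbb{Z}/6 = \langle 1 \rangle\), \(H = \mathbb{Z}/3\), and \(y = 2\), which satisfies \(6y = 0\) in \(H\), and checks that \(f(i) = iy\) is well-defined and a homomorphism.

```python
# UMP of a cyclic group, additively: G = Z/6 = <1>, H = Z/3, y = 2 with 6*y = 0 in H.
n, m, y = 6, 3, 2

def f(i):
    # f sends the "i-th power" i*1 of the generator to i*y in H.
    return (i * y) % m

# Well-defined: exponents that are congruent mod 6 get the same image.
assert all(f(i) == f(i + n) for i in range(-20, 20))
# Homomorphism: f(i + j) = f(i) + f(j) in Z/3.
assert all(f((i + j) % n) == (f(i) + f(j)) % m for i in range(n) for j in range(n))
print([f(i) for i in range(n)])  # [0, 2, 1, 0, 2, 1]
```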

Definition 3.3.18

The infinite cyclic group is the group $$C_\infty := \{a^i \mid i \in \mathbb{Z}\}$$ with multiplication \(a^i a^j = a^{i+j}\).

For any natural number \(n\), the cyclic group of order \(n\) is the group $$C_n := \{a^i \mid i \in \{0,\dots,n-1\}\}$$ with multiplication \(a^i a^j = a^{(i+j) \bmod n}\).

Remark 3.3.19

The presentations for these groups are $$C_\infty = \langle a \mid \textrm{--} \rangle \qquad \textrm{ and } \qquad C_n = \langle a \mid a^n=e\rangle.$$

Theorem 3.3.20 (Classification Theorem for Cyclic Groups)

Every infinite cyclic group is isomorphic to \(C_\infty\). Every cyclic group of order \(n\) is isomorphic to \(C_n\).

Proof

Suppose \(G = \langle x \rangle\) with \(|x| = n\) or \(|x| = \infty\), and set $$H=\begin{cases} C_n & \textrm{if } |x| = n \\ C_\infty & \textrm{if } |x| = \infty. \end{cases}$$ By the Universal Mapping Property for cyclic groups, there are homomorphisms \(f\!: G \to H\) and \(g\!: H \to G\) such that \(f(x) = a\) and \(g(a) =x\). Now \(g \circ f\) is an endomorphism of \(G\) mapping \(x\) to \(x\). But the identity map also has this property, and so the uniqueness clause in the Universal Mapping Property for cyclic groups gives us \(g \circ f = \mathrm{id}_G\). Similarly, \(f \circ g = \mathrm{id}_H\). We conclude that \(f\) and \(g\) are isomorphisms.

Example 3.3.21

For a fixed \(n \geqslant 1\), $$\mu_n := \{z \in \mathbb{C} \mid z^n = 1\}$$ is a subgroup of \((\mathbb{C} \setminus \{0\}, \cdot)\). Since \( \| z^n \| = \|z\|^n =1\) for any \(z \in \mu_n\), we can write \(z = e^{ri}\) for some real number \(r\). Moreover, the equality \(1 = z^n = e^{nri}\) implies that \(nr\) is an integer multiple of \(2 \pi\). It follows that $$\mu_n = \{1, e^{2 \pi i/n}, e^{4 \pi i/n}, \cdots , e^{(n-1) 2 \pi i/n}\}$$ and that \(e^{2 \pi i/n}\) generates \(\mu_n\). Thus \(\mu_n\) is cyclic of order \(n\). This group is therefore isomorphic to \(C_n\), via the map \(C_n \to \mu_n\) given by \(a^j \mapsto e^{\frac{2 j \pi i}{n}}\).
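One can witness this isomorphism numerically, up to floating-point error. The Python sketch below (ours, not part of the notes; \(n = 8\) is an arbitrary choice) checks that the powers of \(e^{2\pi i/n}\) exhaust \(\mu_n\) and obey the \(C_n\) group law.

```python
import cmath

# The n-th roots of unity mu_n, generated by zeta = e^{2*pi*i/n}.
n = 8
zeta = cmath.exp(2j * cmath.pi / n)
roots = [zeta ** k for k in range(n)]

# Each power really is an n-th root of 1 (up to floating-point error) ...
assert all(abs(z ** n - 1) < 1e-9 for z in roots)
# ... the n powers of zeta are pairwise distinct, so zeta generates mu_n ...
assert all(abs(roots[i] - roots[j]) > 1e-9
           for i in range(n) for j in range(i + 1, n))
# ... and the group law matches C_n: zeta^i * zeta^j = zeta^((i+j) mod n).
assert all(abs(roots[i] * roots[j] - roots[(i + j) % n]) < 1e-9
           for i in range(n) for j in range(n))
print(len(roots))  # 8
```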

Exercise 3.3.22

Let \(p>0\) be a prime. Show that every group of order \(p\) is cyclic.

4. Quotient groups

Recall from your undergraduate algebra course the construction for the integers modulo \(n\): one starts with an equivalence relation \(\sim\) on \(\mathbb{Z}\), considers the set \(\mathbb{Z}/n\) of all equivalence classes with respect to this equivalence relation, and verifies that the operations on \(\mathbb{Z}\) give rise to well-defined binary operations on the set of equivalence classes. This idea still works if we replace \(\mathbb{Z}\) by an arbitrary group, but one has to be somewhat careful about what equivalence relation is used.

4.1 Equivalence relations on a group and cosets

Let \(G\) be a group and consider an equivalence relation \(\sim\) on \(G\). Let \(G/\sim\) denote the set of equivalence classes for \(\sim\) and write \([g]\) for the equivalence class that the element \(g \in G\) belongs to, that is

\[[x] := \{g \in G \mid g \sim x\}.\]

When does \(G/\sim\) acquire the structure of a group under the operation

\[[x] \cdot [y] := [xy] \ ?\]

Right away, we should be worried about whether this operation is well-defined, meaning that it is independent of our choice of representatives for each class. That is, if \([x] = [x']\) and \([y] = [y']\) then must \([xy] = [x'y']\)? In other words, if \(x \sim x'\) and \(y \sim y'\), must \(xy \sim x'y'\)?

Definition 4.1.1

We say an equivalence relation \(\sim\) on a group \(G\) is compatible with multiplication if \(x \sim y\) implies \(xz \sim yz\) and \(zx \sim zy\) for all \(x,y,z \in G\).

Lemma 4.1.2

For a group \(G\) and equivalence relation \(\sim\), the rule \([x] \cdot [y] = [xy]\) is well-defined and makes \(G/\sim\) into a group if and only if \(\sim\) is compatible with multiplication.

Proof of Lemma 4.1.2

To say that the rule \([x] \cdot [y] = [xy]\) is well-defined is to say that for all \(x, x', y, y' \in G\) we have

\[[x]=[x'] \textrm{ and } [y]=[y'] \implies [x][y]=[x'][y'].\]

That is, the rule is well-defined if and only if whenever \(x\sim x'\) and \(y\sim y'\), we have \(xy\sim x'y'\).

Assume \(\sim\) is compatible with multiplication. Then \(x\sim x'\) implies \(xy\sim x'y\) and \(y\sim y'\) implies \(x'y\sim x'y'\), hence by transitivity \(xy\sim x'y'\). Thus \([x] \cdot [y] = [xy]\) is well-defined.

Conversely, assume the rule \([x] \cdot [y] = [xy]\) is well-defined, so that

\[[x]=[x'] \textrm{ and } [y]=[y'] \implies [x][y]=[x'][y'].\]

Setting \(y=y'\) gives us

\[x\sim x' \implies xy\sim x'y.\]

Setting \(x=x'\) gives us

\[y\sim y' \implies xy\sim xy'.\]

Hence \(\sim\) is compatible with multiplication.

So now assume that the multiplication rule is well-defined, which we have now proved is equivalent to saying that \(\sim\) is compatible with the multiplication in \(G\). We need to prove that \(G/\sim\) really is a group. Indeed, since \(G\) itself is a group then given any \(x,y,z \in G\) we have

\[[x] \cdot ([y] \cdot [z]) = [x] \cdot [yz] = [x(yz)] = [(xy)z] = [xy][z] = ([x][y])[z].\]

Moreover, for all \(x \in G\) we have

\[[e_G] [x] = [e_G x] = [x] \qquad \textrm{and} \qquad [x] [e_G] = [x e_G] = [x],\]

so that \([e_G]\) is an identity for \(G/\sim\). Finally,

\[[x][x^{-1}] = [e_G] = e_{G/\sim},\]

so that every element in \(G/\sim\) has an inverse; in fact, this shows that \([x]^{-1} = [x^{-1}]\).

Definition 4.1.3

Let \(G\) be a group and let \(\sim\) be an equivalence relation on \(G\) that is compatible with multiplication. The quotient group is the set \(G/\sim\) of equivalence classes, with group multiplication \([x]\cdot [y] = [xy]\).

Example 4.1.4

Let \(G=\mathbb{Z}\) and fix an integer \(n \geqslant 1\). Let \(\sim\) be the equivalence relation given by congruence modulo \(n\), so \(\sim = \equiv \pmod n\). Then

\[(\mathbb{Z},+)/\sim = (\mathbb{Z}/n,+).\]

But how do we come up with equivalence relations that are compatible with the group law?

Definition 4.1.5 (Cosets)

Let \(H\) be a subgroup of a group \(G\). The left action of \(H\) on \(G\) is given by \[ h\cdot g=hg \quad \textrm{for } h\in H, g\in G. \] The equivalence relation \(\sim_H\) on \(G\) induced by the left action of \(H\) is \[ a \sim_H b \textrm{ if and only if } b = ha \text{ for some } h \in H. \] The equivalence class of \(g \in G\), also called the orbit of \(g\), and also called the right coset of \(H\) in \(G\) containing \(g\), is \[ Hg := \{hg \mid h \in H\}. \] That is, \( a \sim_H b\) if and only if \( Ha = Hb\). There is also a left coset of \(H\) in \(G\) containing \(g\), defined by \[ gH := \{gh \mid h \in H\}. \]

Example 4.1.6

Let \(G=\mathbb{Z}\) and \(H=\langle n\rangle=n\mathbb{Z}=\{nk \mid k\in \mathbb{Z}\}\). Then

\[ x\sim_{ n\mathbb{Z}} y \iff x=y+nk \textrm{ for some } k\in \mathbb{Z} \iff x\equiv y \!\!\!\!\pmod n. \]

Therefore the equivalence relation \(\sim_{ n\mathbb{Z}}\) is the same as congruence modulo \(n\) and the right and left cosets of \(n\mathbb{Z}\) in \(\mathbb{Z}\) are the congruence classes of integers modulo \(n\).

Lemma 4.1.7

Let \(H \leq G\). The following facts about left cosets are equivalent for \(x,y \in G\):

  1. The elements \(x\) and \(y\) belong to the same left coset of \(H\) in \(G\).
  2. \(x = yh\) for some \(h \in H\).
  3. \(y = xh\) for some \(h \in H\).
  4. \(y^{-1}x \in H\).
  5. \(x^{-1}y \in H\).
  6. \(xH = yH\).

Analogously, the following facts about right cosets are equivalent for all \(x,y \in G\):

  1. The elements \(x\) and \(y\) belong to the same right coset of \(H\) in \(G\).
  2. There exists \(h \in H\) such that \(x = hy\).
  3. There exists \(h \in H\) such that \(y = hx\).
  4. \(yx^{-1} \in H\).
  5. \(xy^{-1} \in H\).
  6. \(Hx = Hy\).

Proof of Lemma 4.1.7

We will only prove the statements about left cosets, since the statements about right cosets are analogous.

(1 ⇒ 2). Suppose that \(x\) and \(y\) belong to the same left coset \(gH\) of \(H\) in \(G\). Then \(x=ga\) and \(y=gb\) for some \(a,b \in H\), so \(g=yb^{-1}\) and therefore \(x=yb^{-1}a=yh\) where \(h=b^{-1}a\in H\).

(2 ⇔ 3). We have \(x=yh\) for some \(h \in H\) if and only if \(y = xh^{-1}\) and \(h^{-1} \in H\).

(2 ⇔ 4). We have \(x=yh\) for some \(h \in H\) if and only if \(y^{-1} x=h\in H\).

(4 ⇔ 5). Note that \(y^{-1} x\in H \Leftrightarrow (y^{-1} x)^{-1}\in H \iff x^{-1}y\in H\).

(2 ⇒ 6). Suppose \(x = ya\) for some \(a \in H\). Then by (2 ⇒ 3) we also have \(y = xb\) for some \(b \in H\). Note that for all \(h \in H\), we also have \(ah \in H\) and \(bh \in H\). Then

\[ xH= \{xh \mid h \in H\}=\{y(ah) \mid h \in H\} \subseteq yH \]

and

\[ yH= \{y h\mid h \in H\}=\{x(bh) \mid h \in H\} \subseteq xH. \]

Therefore, \(xH=yH\).

(6 ⇒ 1). Since \(e_G=e_H\in H\), we have \(x=xe_G\in xH\) and \(y=ye_G\in yH\). If \(xH=yH\), then \(x\) and \(y\) belong to the same left coset.

Lemma 4.1.9

For \(H \leq G\), the collection of left cosets of \(H\) in \(G\) forms a partition of \(G\), and similarly for the collection of right cosets:

\[ \bigcup_{x\in G} xH=G \]

and for all \(x,y \in G\), either \(xH = yH\) or \(xH \cap yH = \emptyset\).

Moreover, all left and right cosets have the same cardinality: for any \(x\in G\),

\[|xH| = |Hx|=|H|.\]

Proof of Lemma 4.1.9

Since the left (respectively, right) cosets are the equivalence classes for an equivalence relation, the first part of the statement is just a special case of a general fact about equivalence relations.

Let us nevertheless write a proof for the assertions for right cosets.

Every element \(g \in G\) belongs to at least one right coset, since \(e \in H\) gives us \(g \in Hg\). Thus

\[\bigcup_{x\in G} Hx=G.\]

Now we need to show any two cosets are either identical or disjoint: if \(Hx\) and \(Hy\) share an element, then it follows from (1 ⇒ 6) of Lemma 4.1.7 that \(Hx=Hy\). This proves that the right cosets partition \(G\).

To see that all right cosets have the same cardinality as \(H\), consider the function

\[\rho: H \to Hg \quad \textrm{defined by} \quad \rho(h) = hg.\]

This function \(\rho\) is surjective by construction. Moreover, if \(\rho(h) = \rho(h')\) then \(hg = h'g\) and thus \(h = h'\). Thus \(\rho\) is also injective, and therefore a bijection, so \(|Hg| = |H|\).

Definition 4.1.10

The number of left cosets of a subgroup \(H\) of a group \(G\) is denoted by \([G:H]\) and called the index of \(H\) in \(G\). Equivalently, the index \([G : H]\) is the number of right cosets of \(H\).

Corollary 4.1.11 (Lagrange's Theorem revisited)

If \(G\) is a finite group and \(H \leq G\), then

\[|G| = |H| \cdot [G : H].\]

In particular, \(|H|\) is a divisor of \(|G|\).

Example 4.1.12

For \(G =D_{n}\) and \(H = \langle s \rangle = \{e,s\}\), the left cosets \(gH\) of \(H\) in \(G\) are

\[ \{e, s\}, \quad \{r, rs\}, \quad \{r^2, r^2s\}, \cdots , \{r^{n-1}, r^{n-1}s\} \]

and the right cosets \(Hg\) are

\[ \{e, s\}, \quad \{r, r^{-1}s\}, \quad \{r^2, r^{-2}s\}, \cdots , \{r^{n-1}, r^{-n+1}s\}. \]

Note that these lists are not the same, but they do have the same length. For example, \(r\) is in the left coset \(\{ r, rs \}\), while its right coset is \(\{ r, r^{-1}s \}\).

We have \(|G| = 2n\), \(|H| = 2\) and \([G:H] = n\).

Keeping \(G = D_{n}\) but now letting \(K = \langle r \rangle\), the left cosets are \(K\) and

\[ sK = \{s, sr, \dots, sr^{n-1} \} =\{s, r^{n-1}s, r^{n-2}s,\dots, rs \} \]

and the right cosets are \(K\) and

\[ Ks = \{s, r^{n-1}s, r^{n-2}s,\dots, rs \}. \]

In this case \(sK = Ks\), and the left and right cosets are exactly the same. We have \(|G| = 2n\), \(|K| = n\) and \([G:K] = 2\).
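These coset computations are pleasant to automate. In the Python sketch below (our illustration, not part of the notes), elements of \(D_4\) are encoded as pairs \((i, a)\) standing for \(r^i s^a\), with the multiplication dictated by \(s r = r^{-1} s\); we recover the left and right cosets of \(\langle s \rangle\) described above.

```python
# Left and right cosets of H = <s> = {e, s} in the dihedral group D_4.
# An element (i, a) stands for r^i s^a, and s r^j = r^{-j} s.
n = 4

def mul(x, y):
    (i, a), (j, b) = x, y
    # r^i s^a * r^j s^b = r^{i+j} s^b if a = 0, and r^{i-j} s^{a+b} if a = 1.
    return ((i + j) % n, b) if a == 0 else ((i - j) % n, (a + b) % 2)

G = [(i, a) for i in range(n) for a in range(2)]
H = [(0, 0), (0, 1)]  # the subgroup <s>
left = {frozenset(mul(g, h) for h in H) for g in G}
right = {frozenset(mul(h, g) for h in H) for g in G}

print(len(left), len(right))   # 4 4, the index [G : H] = 8/2
assert left != right           # the left and right cosets of <s> differ
# The left cosets partition G, and each has |H| = 2 elements.
assert all(len(c) == 2 for c in left) and set().union(*left) == set(G)
```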

4.2 Normal subgroups

Definition 4.2.1 (Normal subgroup)

A subgroup \(N\) of a group \(G\) is normal in \(G\), written \(N \trianglelefteq G\), if \[ gNg^{-1} = N \quad \textrm{for all } g \in G. \]

Example 4.2.2

  1. The trivial subgroups \(\{e\}\) and \(G\) of a group \(G\) are always normal.
  2. Any subgroup of an abelian group is normal.
  3. For any group \(G\), \(\mathrm{Z}(G)\trianglelefteq G\).

Remark 4.2.3

The relation of being a normal subgroup is not transitive. For example, for

\[ V=\{e, (1 2)(3 4), (1 3)(2 4), (1 4)(2 3)\} \]

one can show that \(V \trianglelefteq S_4\), and since \(V\) is abelian (because you proved before that all groups with 4 elements are abelian!), the subgroup \(H = \{e,(12)(34)\}\) is normal in \(V\). But \(H\) is not normal in \(S_4\), since for example

\[ (1 3) [(1 2)(3 4)] (1 3)^{-1} = (3 2)(1 4) \notin H. \]
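This failure of transitivity can also be verified by brute force. Here is a short Python sketch (permutations written as 0-indexed tuples, an encoding of our own choosing):

```python
from itertools import permutations

def comp(p, q):  # composition: (p∘q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(len(p)))

def inv(p):  # inverse permutation
    r = [0] * len(p)
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)

S4 = list(permutations(range(4)))
e = (0, 1, 2, 3)
V = [e, (1, 0, 3, 2), (2, 3, 0, 1), (3, 2, 1, 0)]  # e, (12)(34), (13)(24), (14)(23)
H = [e, (1, 0, 3, 2)]                              # e, (12)(34)

def is_normal(N, G):
    return all(comp(comp(g, x), inv(g)) in N for g in G for x in N)

print(is_normal(V, S4), is_normal(H, V), is_normal(H, S4))  # True True False
```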

Lemma 4.2.4

Assume \(N\) is a subgroup of \(G\). The following conditions are equivalent.

  1. \(N\) is a normal subgroup of \(G\), meaning that \(gNg^{-1} = N\) for all \(g \in G\).
  2. We have \(gNg^{-1} \subseteq N\) for all \(g \in G\), meaning that \(gng^{-1} \in N\) for all \(n \in N\) and \(g \in G\).
  3. The right and left cosets of \(N\) agree. More precisely, \(gN = Ng\) for all \(g \in G\).
  4. We have \(gN \subseteq Ng\) for all \(g \in G\).
  5. We have \(Ng \subseteq gN\) for all \(g \in G\).

Proof of Lemma 4.2.4

Note that \(gNg^{-1} = N\) if and only if \(gN = Ng\), and hence (1) \(\iff\) (3).

The implication \((1) \Rightarrow (2)\) is immediate. Conversely, if \(gNg^{-1} \subseteq N\) for all \(g\), then

\[ N = g^{-1}(gNg^{-1})g \subseteq g^{-1}Ng \]

for all \(g \in G\); applying this with \(g^{-1}\) in place of \(g\) gives \(N \subseteq gNg^{-1}\). Thus (2) implies (1).

Finally, \((2)\), \((4)\), and \((5)\) are all equivalent since

\[ gNg^{-1} \subseteq N \iff gN \subseteq Ng \]

and

\[ g^{-1}Ng \subseteq N \iff Ng \subseteq gN. \]

Exercise 4.2.5

Kernels of group homomorphisms are normal.

We will see later that, conversely, all normal subgroups are kernels of group homomorphisms.

Exercise 4.2.6

Any subgroup of index two is normal.

Exercise 4.2.7

Preimages of normal subgroups are normal, that is, if \(f:G\to H\) is a group homomorphism and \(K\trianglelefteq H\), then \(f^{-1}(K)\trianglelefteq G\).

Remark 4.2.8

Let \(A \leq B\) be subgroups of a group \(G\). If \(A\) is a normal subgroup of \(G\), then in particular for all \(b \in B\) we have

\[ bab^{-1} \in A, \]

since \(b \in B \subseteq G\). Therefore, \(A\) is a normal subgroup of \(B\).

Example 4.2.9

Let us go back to Example 4.1.12, where we considered the group \(G =D_{n}\) and the subgroups

\[ H = \langle s \rangle = \{e,s\} \quad \textrm{and} \quad K = \langle r \rangle. \]

We showed that the left and right cosets of \(H\) are not the same, and thus \(H\) is not a normal subgroup of \(G\). We also showed that the left and right cosets of \(K\) are in fact the same, which proves that \(K\) is a normal subgroup of \(G\). Note that \(H\) is nevertheless a very nice group -- it is cyclic and thus abelian -- despite not being a normal subgroup of \(G\). This indicates that whether a subgroup \(H\) is a normal subgroup of \(G\) has a lot more to do with the relationship between \(H\) and \(G\) than with the properties of \(H\) as a group on its own.

Definition 4.2.10

The alternating group \(A_n\) is the subgroup of \(S_n\) generated by all products of two transpositions.

Remark 4.2.11

Recall that we proved that the sign of a permutation is well-defined. Notice also that the inverse of an even permutation must also be even, and the product of any two even permutations is even, and thus \(A_n\) can also be described as the set of all even permutations.

Lemma 4.2.12

For all \(n \geqslant 2\), \(A_n\trianglelefteq S_n\).

Proof of Lemma 4.2.12

Consider the sign map \(\operatorname{sign}\!: S_n \to (\{\pm 1\}, \cdot)\) that takes each permutation to its sign, meaning

\[ \operatorname{sign}(\sigma) = \begin{cases} 1 & \textrm{if $\sigma$ is even} \\ -1 & \textrm{if $\sigma$ is odd}. \end{cases} \]

This is a group homomorphism (exercise!), and by construction the kernel of \(\operatorname{sign}\) is \(A_n\). Since kernels of group homomorphisms are normal subgroups, we conclude that \(A_n\) must be a normal subgroup of \(S_n\).

Alternatively, we can prove the Lemma by showing that \(A_n\) is a subgroup of \(S_n\) of index \(2\).
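For a small case, both descriptions can be checked computationally. The sketch below (our own, with the sign computed by counting inversions) verifies that \(\operatorname{sign}\) is multiplicative on \(S_4\) and that its kernel, the even permutations, has index \(2\):

```python
from itertools import permutations

def sign(p):  # sign = (-1)^(number of inversions)
    m = len(p)
    inversions = sum(p[i] > p[j] for i in range(m) for j in range(i + 1, m))
    return -1 if inversions % 2 else 1

def comp(p, q):  # composition: (p∘q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(len(p)))

S4 = list(permutations(range(4)))
# sign is a homomorphism to ({1, -1}, *):
assert all(sign(comp(p, q)) == sign(p) * sign(q) for p in S4 for q in S4)

A4 = [p for p in S4 if sign(p) == 1]
print(len(S4), len(A4), len(S4) // len(A4))  # 24 12 2
```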

Our list of equivalent characterizations of normal subgroups implies that for all \(g \in G\) and \(n \in N\), we have \(gn = n'g\) for some \(n' \in N\), which is precisely what was needed to make the group law on \(G/\sim_H\) well-defined. Recall that

\[ a \sim_H b \textrm{ if and only if } b = ha \text{ for some } h \in H. \]

Lemma 4.2.13

Let \(G\) be a group. An equivalence relation \(\sim\) on \(G\) is compatible with multiplication if and only if \(\sim \,= \,\sim_N\) for some normal subgroup \(N \trianglelefteq G\).

Proof of Lemma 4.2.13

(\(\Rightarrow\)) Suppose \(\sim\) is compatible with multiplication, and set \(N := \{g \in G \mid g \sim e\}\). Then we claim that \(N\trianglelefteq G\) and \(\sim=\sim_N\).

To see that \(N\trianglelefteq G\), first we check that \(N\) is a subgroup of \(G\) using the two-step test. Since \(e\sim e\), \(N\) is nonempty. If \(g \in N\) and \(h\in N\), then \( g\sim e\) and \( h\sim e\), so \( gh \sim ge = g \sim e\), so \(gh\in N\), and \( e = g g^{-1} \sim e g^{-1} = g^{-1}\) so \(g^{-1} \in N\). Thus, \(N \leq G\). Now let \(n\in N\) and \(g\in G\). Since \(n\in N\), then \(n \sim e\), and thus since \(\sim\) is compatible with multiplication we conclude that for all \(g \in G\) we have

\[ gng^{-1}\sim geg^{-1} = e, \]

so \(gng^{-1} \in N\). This shows that \(gng^{-1} \in N\) for any \(n \in N\) and any \(g\in G\), and thus \(N\) is a normal subgroup of \(G\).

It remains to check that \(\sim=\sim_N\). Given any \(a, b \in G\), since \(\sim\) is compatible with multiplication we have

\[ a \sim b \iff ab^{-1} \sim bb^{-1} = e \iff ab^{-1} \in N, \]

where the first equivalence follows by multiplying on the right by \(b^{-1}\) (respectively, by \(b\)). Finally, \(ab^{-1} \in N\) if and only if \(ba^{-1} = (ab^{-1})^{-1} \in N\), meaning that \(b = na\) for some \(n \in N\), which says precisely that \(a \sim_N b\).

(\(\Leftarrow\)) If \(\sim \,= \,\sim_N\), let \(x,y,z\in G\) such that \(x\sim_N y\). Then \(y=nx\) for some \(n\in N\), so \(yz=nxz\) and

\[ zy=znx=zn(z^{-1}z)x=(znz^{-1})zx=n'zx \]

for some \(n'\in N\), where the last equality uses the normal subgroup property. We deduce that \(yz\sim_N xz\) and \(zy\sim_N zx\), so \(\sim_N\) is compatible with multiplication.

4.3 Quotient groups

Definition 4.3.1

Let \(N\) be a normal subgroup of a group \(G\). The quotient group \(G/N\) is the group \(G/\sim_N\), where \(\sim_N\) is the equivalence relation induced by the left action of \(N\) on \(G\). Thus \(G/N\) is the set of left cosets of \(N\) in \(G\), and the multiplication is given by \[ xN\cdot yN := (xy)N. \] The identity element is \(e_GN = N\), and for each \(g \in G\), the inverse of \(gN\) is \((gN)^{-1} = g^{-1}N\).

Remark 4.3.2

Note that, by a previous lemma, \(G/N\) is also the set of right cosets of \(N\) in \(G\) with multiplication given by \[ Nx\cdot Ny := N(xy). \]

In order to prove statements about a quotient \(G/N\), it is often useful to rewrite those statements in terms of elements in the original group \(G\), but one needs to be careful when translating.

Remark 4.3.3

Given a group \(G\) and a normal subgroup \(N\), equality in the quotient does not mean that the representatives are equal. By an earlier lemma, \[ gN = hN \iff gh^{-1} \in N. \] In particular, \(gN = N\) if and only if \(g \in N\).

Remark 4.3.4

Note that \(|G/N| = [G :N ]\). By Lagrange's Theorem, if \(G\) is finite then \[ \left | G/N\right | = \frac{|G|}{|N|}. \]

Example 4.3.5

We saw that the subgroup \(N = \langle r \rangle\) of \(D_{n}\) is normal. The quotient \(D_{n}/N\) has just two elements, \(N\) and \(sN\), and hence it must be cyclic of order \(2\), since that is, up to isomorphism, the only group of order \(2\). In fact, note that \(|N| = n\) and \(|D_n| = 2n\), so by Lagrange's Theorem \[ |D_n/N| = \frac{2n}{n} =2. \]

Example 4.3.6

The infinite dihedral group \(D_\infty\) is the set \[ D_\infty = \{r^i,r^is \mid i \in \mathbb{Z}\} \] together with the multiplication operation defined by \[ r^i \cdot r^j = r^{i+j}, \quad r^i \cdot (r^js) = r^{i+j}s, \quad (r^is) \cdot r^j = r^{i-j}s, \quad \textrm{and} \quad (r^is)(r^js) = r^{i-j}. \] One can show that \(D_\infty\) is the group with presentation \[ D_\infty=\langle r,s \mid s^2=e, srs=r^{-1}\rangle. \] Then \(\langle r^n \rangle \trianglelefteq D_\infty\) and \(D_\infty/ \langle r^n \rangle \cong D_{n}\) via the map \(r\langle r^n \rangle\mapsto r\) and \(s\langle r^n \rangle\mapsto s\).

Remark 4.3.7

In the example above, both groups \(D_\infty\) and \(\langle r^n \rangle\) are infinite, but \[ [D_\infty:\langle r^n \rangle]=\left|D_\infty/\langle r^n \rangle\right|=|D_{n}|=2n. \] This shows that the quotient of an infinite group by an infinite subgroup can be a finite group.

The quotient of an infinite group by an infinite subgroup can also be infinite. In contrast, a quotient of any finite group must necessarily be finite.

Lemma 4.3.8

Let \(G\) be a group and consider a normal subgroup \(N\) of \(G\). Then the map \[ \pi: G \longrightarrow G/N,\qquad g \longmapsto \pi(g)=gN. \] is a surjective group homomorphism with \(\ker(\pi)=N\).

Proof of Lemma 4.3.8

Surjectivity is immediate from the definition. Now we claim that \(\pi\) is a group homomorphism:

\[ \begin{aligned} \pi(gg') & =(gg')N && \hspace{2em} \textrm{by definition of } \pi \\ & = gN\cdot g'N && \hspace{2em} \textrm{by definition of the multiplication on } G/N \\ & = \pi(g)\pi(g') && \hspace{2em} \textrm{by definition of } \pi. \end{aligned} \]

Finally, by our lemma on cosets, we have \[ \ker(\pi)=\{g\in G\mid gN=e_GN\}=N. \]

Definition 4.3.9

Let \(G\) be any group and \(N\) be a normal subgroup of \(G\). The group homomorphism \[ \begin{array}{ccc} G & \stackrel{\pi}{\longrightarrow} & G/N\\ g & \mapsto & gN \end{array} \] is called the canonical (quotient) map, the canonical surjection, or the canonical projection of \(G\) onto \(G/N\).

The canonical projection is a surjective homomorphism. We might indicate that in our notation by writing \(\pi\!: G \twoheadrightarrow G/N\). More generally:

Notation 4.3.10

If \(f\!: A \to B\) is a surjective function, we might write \(f\!: A \twoheadrightarrow B\) to denote that surjectivity.

Normal subgroups are precisely those that can be realized as kernels of a group homomorphism.

Corollary 4.3.11

A subgroup \(N\) of a group \(G\) is normal in \(G\) if and only if \(N\) is the kernel of a homomorphism with domain \(G\).

Proof of Corollary 4.3.11

The kernel of any group homomorphism is a normal subgroup; we have just shown that every normal subgroup can be realized as the kernel of a group homomorphism.

Definition 4.3.12

Let \(G\) be any group. For \(x,y \in G\), the commutator of \(x\) and \(y\) is the element \[ [x,y] := xyx^{-1}y^{-1}. \]

The commutator subgroup or derived subgroup of \(G\), denoted by \(G'\) or \([G,G]\), is the subgroup generated by all commutators of elements in \(G\). More precisely, \[ [G,G] := \langle [x,y] \mid x, y \in G \rangle. \]

Remark 4.3.13

Note that \([x,y] = e\) if and only if \(xy = yx\). More generally, \([G,G] = \{e_G\}\) if and only if \(G\) is abelian.

The commutator subgroup measures how far \(G\) is from being abelian: if the commutator subgroup is as small as possible, then \(G\) is abelian, so a larger commutator subgroup indicates the group is somehow further from being abelian.

Remark 4.3.14 (The commutator is a normal subgroup)

A typical element of \([G,G]\) has the form \[ [x_1, y_1] \cdots [x_k, y_k] \qquad \textrm{ for } k \geqslant 1 \textrm{ and } x_1, \dots, x_k, y_1, \dots, y_k \in G. \] We do not need to explicitly include inverses since \[ [x,y]^{-1} = yxy^{-1}x^{-1} = [y,x]. \]

Exercise 4.3.15

Show that \([G,G]\) is a normal subgroup of \(G\).

Definition 4.3.16

Let \(G\) be a group and \([G,G]\) be its commutator subgroup. The associated quotient group \[ G^{\textrm{ab}} := G/[G,G] \] is called the abelianization of \(G\).

Remark 4.3.17

In this remark we will write \(G'\) instead of \([G,G]\) for convenience. The abelianization \(G/G'\) of any group \(G\) is abelian, since \[ [xG', yG'] = [x,y]G' = G' = e_{G/G'} \] for all \(x,y \in G\).

Exercise 4.3.18

Let \(G\) be any group. The abelianization of \(G\) is the largest quotient of \(G\) that is abelian, in the sense that if \(G/N\) is abelian for some normal subgroup \(N\), then \([G,G] \subseteq N\).
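For a concrete instance, the commutator subgroup of \(S_3\) can be computed by brute force. The sketch below (0-indexed permutation tuples, our own encoding) collects all commutators and closes them up under multiplication; the result is \(A_3\), so the abelianization of \(S_3\) has order \(2\):

```python
from itertools import permutations

def comp(p, q):  # composition: (p∘q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(len(p)))

def inv(p):  # inverse permutation
    r = [0] * len(p)
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)

G = list(permutations(range(3)))
# all commutators [x, y] = x y x^{-1} y^{-1}
commutators = {comp(comp(x, y), comp(inv(x), inv(y))) for x in G for y in G}

# close up under multiplication to get the generated subgroup [G, G]
derived = set(commutators)
changed = True
while changed:
    changed = False
    for a in list(derived):
        for b in list(derived):
            if comp(a, b) not in derived:
                derived.add(comp(a, b))
                changed = True

print(len(derived), len(G) // len(derived))  # 3 2: [S_3, S_3] = A_3 and |S_3^ab| = 2
```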

4.4 The Isomorphism Theorems for groups

Theorem 4.4.1 (Universal Mapping Property (UMP) of a Quotient Group)

Let \(G\) be a group and \(N\) a normal subgroup. Given any group homomorphism \(f\!: G \to H\) with \(N \subseteq \ker(f)\), there exists a unique group homomorphism \[ \overline{f}: G/N \to H \] such that \(\overline{f} \circ \pi = f\), where \(\pi\!: G \twoheadrightarrow G/N\) is the canonical projection; in other words, the triangle formed by \(f\), \(\pi\), and \(\overline{f}\) commutes.

Moreover, \(\mathrm{im}(f) = \mathrm{im}(\overline{f})\). In particular, if \(f\) is surjective, then \(\overline{f}\) is also surjective. Finally, \[ \ker(\overline{f}) = \ker(f)/N := \{ g N \mid f(g) = e_H\}. \]

Proof (of Theorem 4.4.1)

Suppose that such a homomorphism \(\overline{f}\) exists. Since \(f=\overline f\circ \pi\), then \(\overline{f}\) has to be given by \[ \overline{f}(gN) = \overline{f}(\pi(g)) = f(g). \] In particular, \(\overline f\) is necessarily unique. To show existence, we just need to show that this formula determines a well-defined homomorphism. Given \(xN = yN\), we have \[ y^{-1}x \in N \subseteq \ker(f) \] and so \[ f(y)^{-1} f(x) = f(y^{-1}x) = e \implies f(y) = f(x). \] This shows that \(\overline{f}\) is well-defined. Moreover, for any \(x,y \in G\), we have \[ \overline{f}((xN)(yN)) = \overline{f}((xy)N) = f(xy) = f(x)f(y) =\overline{f}(xN) \overline{f}(yN). \] Thus \(\overline{f}\) is a group homomorphism.

The fact that \(\mathrm{im} f=\mathrm{im} \overline f\) is immediate from the formula for \(\overline f\) given above, and hence \(f\) is surjective if and only if \(\overline f\) is surjective.

Finally, we have \[ xN \in \ker (\overline{f}) \iff \overline{f}(xN) = e_H \iff f(x) = e_H \iff x \in \ker(f). \] Therefore, if \(xN \in \ker (\overline{f})\) then \(xN \in \ker(f)/N\). On the other hand, if \(xN \in \ker(f)/N\) for some \(x \in G\), then \(xN = yN\) for some \(y \in \ker(f)\) and hence \(x = yz\) for some \(z \in N\). Since \(N \subseteq \ker(f)\), we have \(y, z \in \ker(f)\), and thus \(x = yz \in \ker(f)\).

In short, the UMP of quotient groups says that to give a homomorphism from a quotient \(G/N\) is the same as to give a homomorphism from \(G\) with kernel containing \(N\).

Corollary 4.4.2

Let \(G\) be any group and let \(A\) be an abelian group. Any group homomorphism \(f\!: G \to A\) must factor uniquely through the abelianization \(G^{\textrm{ab}}\) of \(G\): there exists a unique homomorphism \(\overline{f}\) such that \(f\) factors as the composition \[ f\!:G \stackrel{\pi}{\longrightarrow} G/[G,G] \stackrel{\overline{f}}{\longrightarrow} A. \]

Proof (of Corollary 4.4.2)

Let \(\pi\!: G \to G^\textrm{ab} = G/[G,G]\) be the canonical projection. Since \(A\) is abelian, then \[ f([x,y]) = [f(x),f(y)] = e \] for all \(x, y \in G\), and thus \([G,G] \subseteq \ker(f)\). By the Universal Mapping Property for quotient groups, the homomorphism \(f\) must uniquely factor as \[ f\!:G \stackrel{\pi}{\longrightarrow} G/[G,G] \stackrel{\overline{f}}{\longrightarrow} A. \]

The slogan for the previous result is that any homomorphism from a group \(G\) to any abelian group factors uniquely through the abelianization \(G/[G,G]\) of \(G\).

We are now ready for the First (and most important) Isomorphism Theorem.

Theorem 4.4.3 (First Isomorphism Theorem)

If \(f\!: G \to H\) is a homomorphism of groups, the map \(\overline{f}\) defined by \[ \begin{aligned} G/\ker(f) &\;\longrightarrow\; H \\ g\ker(f) &\;\longmapsto\; f(g) \end{aligned} \] induces an isomorphism \[ \overline{f}: G/\ker(f) \ {\cong} \ \mathrm{im}(f). \] In particular, if \(f\) is surjective, then \(f\) induces an isomorphism \(\overline{f}\!: G/\ker(f) \cong H\).

Proof (of Theorem 4.4.3)

Let us first restrict the target of \(f\) to \(\mathrm{im}(f)\), so that we can assume without loss of generality that \(f\) is surjective. By the Universal Mapping Property for quotient groups, there exists a (unique) homomorphism \(\overline{f}\) such that \(\overline{f} \circ \pi = f\), where \(\pi\!: G \to G/\ker(f)\) is the canonical projection. The kernel \(\ker(f)/\ker(f)\) of \(\overline{f}\) consists of just one element, the coset \(\ker(f)\) of the identity, and so \(\overline{f}\) is injective. Moreover, the Universal Mapping Property for quotient groups also says that the image of \(\overline{f}\) equals the image of \(f\), so \(\overline{f}\) is surjective. We conclude that \(\overline{f}\) is an isomorphism.

Example 4.4.4

Let \(F\) be a field and consider \(G = \mathrm{GL}_n(F)\) for some integer \(n \geqslant 1\). We claim that \(H = \mathrm{SL}_n(F)\), the \(n \times n\) matrices with determinant \(1\), is a normal subgroup of \(G = \mathrm{GL}_n(F)\). Indeed, given \(A \in \mathrm{GL}_n(F)\) and \(B \in \mathrm{SL}_n(F)\), then \[ \det(ABA^{-1}) = \det(A) \underset{1}{\underbrace{\det(B)}} \det(A)^{-1} = \det(A) \det(A)^{-1} = 1, \] so \(ABA^{-1} \in H\). The map \[ \det\!: \mathrm{GL}_n(F) \to (F^\times, \cdot) \] is a surjective group homomorphism whose kernel is, by definition, \(\mathrm{SL}_n(F)\). By the First Isomorphism Theorem, \[ \mathrm{GL}_n(F)/\mathrm{SL}_n(F) \cong (F^\times, \cdot). \]
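For a finite sanity check of this example, we can count matrices over the field \(\mathbb{Z}/3\) (a sketch of our own; each \(2 \times 2\) matrix is stored as a tuple \((a,b,c,d)\) for its four entries):

```python
from itertools import product

p = 3  # working over the field Z/3

def det(m):  # m = (a, b, c, d) stands for the 2x2 matrix [[a, b], [c, d]]
    a, b, c, d = m
    return (a * d - b * c) % p

GL = [m for m in product(range(p), repeat=4) if det(m) != 0]
SL = [m for m in GL if det(m) == 1]
print(len(GL), len(SL), len(GL) // len(SL))  # 48 24 2
```

The quotient has order \(2\), matching \(|F^\times| = |\{1, 2\}| = 2\) for \(F = \mathbb{Z}/3\).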

Example 4.4.5

Note that \(N = (\{\pm 1 \}, \cdot)\) is a subgroup of \(G = (\mathbb{R} \setminus \{0\}, \cdot)\), and \(N\) is normal in \(G\) since \(G\) is abelian. We claim that \(G/N\) is isomorphic to \((\mathbb{R}_{>0}, \cdot)\). To prove this, define \[ f\!: \mathbb{R}^\times \to \mathbb{R}_{>0} \] to be the absolute value function, so that \(f(r) = |r|\). Then \(f\) is a surjective homomorphism and its kernel is \(N\). The First Isomorphism Theorem gives \[ G/N \cong (\mathbb{R}_{>0}, \cdot). \]

Example 4.4.6

We showed that \(D_n/\langle r \rangle\) is isomorphic to the cyclic group of order \(2\). Let us now reprove that fact using the First Isomorphism Theorem.

Recall that \((\{\pm 1\}, \cdot)\) is a group with \(\cdot\) the usual multiplication. Define \(f\!: D_{n} \longrightarrow \{\pm 1\}\) by \[ f(\alpha) = \begin{cases} \, 1 & \textrm{ if $\alpha$ preserves orientation} \\ -1 & \textrm{ if $\alpha$ reverses orientation} \end{cases} = \begin{cases} \, 1 & \textrm{ if $\alpha$ is a rotation} \\ -1 & \textrm{ if $\alpha$ is a reflection}. \end{cases} \] One can show (exercise!) that this is a surjective homomorphism with kernel \(\ker f = \langle r \rangle\), and hence by the First Isomorphism Theorem \[ D_{n}/\langle r \rangle \cong (\{\pm 1\}, \cdot). \]

There are three other isomorphism theorems that we will discuss. The numbering of the other isomorphism theorems is not standard, so we prefer to give them somewhat descriptive names. Our second isomorphism theorem will be the Diamond Isomorphism Theorem. To set up the Diamond Isomorphism Theorem, we need some more background first.

Definition 4.4.7

Given subgroups \(H\) and \(K\) of a group \(G\), we define the subset \(HK\) of \(G\) by \[ HK := \{hk \mid h \in H, k\in K\}. \]

Note that \(HK\) is in general only a subset of \(G\), not a subgroup.

Remark 4.4.8

Given subgroups \(H\) and \(K\) of a group \(G\), note that \(H\) and \(K\) are both subgroups of \(HK\). For example, any element \(h \in H\) is in \(HK\) because \(e \in K\) and \(h = he \in HK\).

Exercise 4.4.9

Let \(H\) and \(K\) be subgroups of \(G\).

  1. The subset \(HK\) is a subgroup of \(G\) if and only if \(HK=KH\).
  2. If at least one of \(H\) or \(K\) is a normal subgroup of \(G\), then \[ HK\leq G \quad \textrm{and} \quad HK=KH=\langle H\cup K\rangle. \]

Warning! The identity \(HK=KH\) does not mean that every pair of elements from \(H\) and \(K\) must commute, as the example below will show; this is only an equality of sets.

Example 4.4.10

In \(D_{n}\), consider the subgroups \(H=\langle s\rangle\) and \(K=\langle r\rangle\). The work we did in computing the cosets of \(D_n\) shows that \[ HK=KH=D_{n}, \] but \(r\) and \(s\) do not commute. The fact that \(HK=KH\) can also be justified by observing that \(K\trianglelefteq D_{n}\) and using the exercise above.

Theorem 4.4.11 (Diamond Isomorphism Theorem)

Let \(G\) be a group, \(H \leq G\), and \(N \trianglelefteq G\). Then \[ HN \leq G, \quad N \cap H \trianglelefteq H, \quad N \trianglelefteq HN \] and there is an isomorphism \[ \frac{H}{N \cap H} \xrightarrow{\ \cong\ } \frac{HN}{N} \] given by \[ h \cdot (N \cap H) \mapsto hN. \]

Proof (of Theorem 4.4.11)

We leave the facts that \(HN \leq G\) and \(N \cap H \trianglelefteq H\) as exercises. Since \(N \trianglelefteq G\), then \(N \trianglelefteq HN\). Let \(\pi\!: HN \to \frac{HN}{N}\) be the canonical projection. Define \[ \begin{aligned} H &\xrightarrow{\ f\ }& HN/N \\ h &\longmapsto & f(h)=hN \end{aligned} \] This is a homomorphism, since it is the composition of homomorphisms \[ f\!: H \subseteq HN \stackrel{\pi}{\longrightarrow} \frac{HN}{N}, \] where the first map is just the inclusion. Moreover, \(f\) is surjective since \[ hnN = hN = f(h) \] for all \(h \in H\) and \(n \in N\). The kernel of \(f\) is \[ \ker(f)=\{h \in H \mid hN = N\} = H \cap N. \] The result now follows from the First Isomorphism Theorem applied to \(f\).

Corollary 4.4.12

If \(H\) and \(N\) are finite subgroups of \(G\) and \(N \trianglelefteq G\), then \[ |HN |= \frac{|H|\cdot |N| }{\left|H \cap N\right|}. \]

Proof (of Corollary 4.4.12)

By the Diamond Isomorphism Theorem, \[ \frac{H}{N \cap H} \cong \frac{HN}{N}. \] The result now follows from Lagrange's Theorem: \[ \frac{|H|}{|N \cap H|} = \frac{|HN|}{|N|}. \]

In fact, the corollary is also true without requiring that \(N\) is normal.
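The counting formula is easy to test on a small example where neither subgroup is normal, in line with the remark above. In \(S_3\), take \(H = \langle (1\,2) \rangle\) and \(K = \langle (1\,3) \rangle\) (a sketch with 0-indexed permutation tuples, our own encoding):

```python
def comp(p, q):  # composition: (p∘q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(len(p)))

e = (0, 1, 2)
H = [e, (1, 0, 2)]  # <(1 2)>
K = [e, (2, 1, 0)]  # <(1 3)>

HK = {comp(h, k) for h in H for k in K}
intersection = [x for x in H if x in K]
print(len(HK), len(H) * len(K) // len(intersection))  # 4 4
```

Note that \(|HK| = 4\) does not divide \(|S_3| = 6\), so this particular \(HK\) is not a subgroup, illustrating the caveat after Definition 4.4.7.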

Example 4.4.13

Fix a field \(F\) and an integer \(n \geqslant 1\). Let \(G = \mathrm{GL}_n(F)\) and \(N = \mathrm{SL}_n(F)\), and recall that we showed that \(N\) is a normal subgroup of \(G\). Let \(H\) be the set of diagonal invertible matrices, which one can show is also a subgroup of \(G\). One can show that every invertible matrix \(A\) can be written as a product of a diagonal matrix and a matrix of determinant \(1\), and thus \(H N = G\). By the Diamond Isomorphism Theorem, \[ H/(N \cap H) \cong G/N \] and since we showed that \[ G/N \cong (F^\times, \cdot), \] where \(F^\times = F \setminus \{0\}\), we get \[ H/(N \cap H) \cong (F^\times, \cdot). \]

We need to get a better understanding of the subgroups of a quotient group. That is the content of what is known as the Lattice Isomorphism Theorem.

Theorem 4.4.14 (Lattice Isomorphism Theorem)

Let \(G\) be a group and \(N\) a normal subgroup of \(G\), and let \(\pi\!: G \twoheadrightarrow G/N\) be the quotient map. There is an order-preserving bijection of posets (a lattice isomorphism) \[ \begin{aligned} &\{\text{subgroups of $G$ containing $N$}\} &\xrightarrow{\;\;\Psi\;\;} &\ \{\text{subgroups of $G/N$}\} \\[0.6em] &H &\mapsto \ &\Psi(H) = H/N \\[0.6em] &\Phi(A) = \pi^{-1}(A) = \{x \in G \mid \pi(x)\in A\} &\leftarrow \ &A \end{aligned} \] This bijection enjoys the following properties:

  1. Subgroups correspond to subgroups: \[ H \leq G \iff H/N \leq G/N. \]
  2. Normal subgroups correspond to normal subgroups: \[ H \trianglelefteq G \iff H/N \trianglelefteq G/N. \]
  3. Indices are preserved: \[ [G:H] = [G/N : H/N]. \]
  4. Intersections and subgroups generated by pairs are preserved: \[ H/N \cap K/N = (H \cap K)/N \quad \textrm{and} \quad \langle H/N \cup K/N \rangle = \langle H \cup K \rangle/N. \]

Proof (of Theorem 4.4.14)

We showed that the quotient map \(\pi\!:G\to G/N\) is a surjective group homomorphism. It will be useful to rewrite the maps in the statement of the theorem in terms of \(\pi\). Notice that \(\Psi(H)=H/N=\{hN \mid h\in H\}=\pi(H)\). Note that \(\Psi\) does indeed land in the correct codomain, since images of subgroups through group homomorphisms are subgroups, and thus \(\pi(H)\leq G/N\) for each \(H \leq G\). Thus \(\Psi\) is well-defined. We claim \(\Phi\) also lands in the correct codomain. Indeed, preimages of subgroups through group homomorphisms are subgroups, and thus in particular for each \(A \leq G/N\) we have \(\pi^{-1}(A)\leq G\). Moreover, for any \(A \leq G/N\) we have \(\{e_GN\} \subseteq A\), hence \[ N = \ker(\pi) = \pi^{-1}(\{e_GN\}) \subseteq \pi^{-1}(A) = \Phi(A). \] Thus \(\Phi\) is well-defined.

To show that \(\Psi\) is bijective, we will show that \(\Phi\) and \(\Psi\) are mutual inverses. First, note that since \(\pi\) is surjective, then \(\pi(\pi^{-1}(A))=A\) for all subgroups \(A\) of \(G/N\), and thus \[ (\Psi\circ\Phi)(A)=\pi(\pi^{-1}(A))=A. \] Moreover, \[ \begin{aligned} x\in \pi^{-1}(H/N) &\iff \pi(x)\in H/N \\ & \iff xN=hN & \text{ for some } h\in H \\ & \iff x\in hN & \text{ for some } h\in H \\ & \iff x\in H & \text{since } N\subseteq H. \end{aligned} \] Thus \[ (\Phi\circ\Psi)(H)=\pi^{-1}(\pi(H))=\pi^{-1}(H/N)=H. \] Thus, \(\Psi\) and \(\Phi\) are well-defined and inverse to each other. Since \(\pi\) and \(\pi^{-1}\) both preserve containments, each of \(\Psi\), \(\Psi^{-1}\) preserves containments as well.

Again, images and preimages of subgroups by group homomorphisms are subgroups, which proves (1). Moreover, if \(N \leq H \leq G\) and \(H \trianglelefteq G\), then \(ghg^{-1} \in H\) for all \(g \in G\) and all \(h \in H\), and thus \[ (gN)(hN)(gN)^{-1}= (ghg^{-1})N \in H/N. \] Therefore, if \(N \leq H \trianglelefteq G\), then \(H/N \trianglelefteq G/N\). Finally, the preimage of a normal subgroup is normal. We have now shown (2).

We leave (3) as an exercise, and (4) is a consequence of the more general fact that lattice isomorphisms preserve suprema and infima.

We record here what is left to do.

Exercise 4.4.15

Let \(G\) be a group and \(N\) a normal subgroup of \(G\), and let \(\pi\!: G \to G/N\) be the canonical projection. For all subgroups \(H\) of \(G\) with \(N \leq H\) and all subgroups \(A\) of \(G/N\), show that \[ [G:H] = [G/N : H/N] \quad \textrm{and} \quad [G: \pi^{-1}(A)] = [G/N : A]. \]

Theorem 4.4.16 (Cancelling Isomorphism Theorem)

Let \(G\) be a group, \(M \leq N \leq G\), \(M \trianglelefteq G\) and \(N \trianglelefteq G\). Then \[ M \trianglelefteq N, \qquad N/M \trianglelefteq G/M, \] and there is an isomorphism \[ \begin{aligned} \frac{(G/M)}{(N/M)} &\;\xrightarrow{\;\cong\;}\; G/N \\ gM &\;\mapsto\; gN \end{aligned} \]

Proof (of Theorem 4.4.16)

Since \(M\) is a normal subgroup of \(G\), it is also a normal subgroup of \(N\). Similarly, the fact that \(N\) is normal in \(G\) implies that \(N/M\) is normal in \(G/M\) by the Lattice Isomorphism Theorem.

The kernel of the canonical map \(\pi: G \twoheadrightarrow G/N\) contains \(M\), and so by the Universal Mapping Property of quotient groups we get an induced homomorphism \[ \phi\!: G/M \to G/N \] with \(\phi(gM) = \pi(g) = gN\). Moreover, we know \[ \ker(\phi) = \ker(\pi)/M = N/M. \] Finally, apply the First Isomorphism Theorem to \(\phi\).

We can now prove the statement about indices in the Lattice Isomorphism Theorem in the case of normal subgroups.

Corollary 4.4.17

Let \(G\) be a group and \(N\) a normal subgroup of \(G\), and let \(\pi\!: G \to G/N\) be the canonical projection. For all normal subgroups \(H\) of \(G\) with \(N \leq H\) and all normal subgroups \(A\) of \(G/N\), \[ [G:H] = [G/N : H/N] \quad \textrm{and} \quad [G: \pi^{-1}(A)] = [G/N : A]. \]

Proof (of Corollary 4.4.17)

By the Cancelling Isomorphism Theorem, \[ G/H \cong \frac{(G/N)}{(H/N)} \] and thus their orders are the same; in particular, \[ [G:H] = |G/H| = \left| \frac{(G/N)}{(H/N)} \right| = [G/N : H/N]. \]

4.5 Presentations as quotient groups

We can finally define group presentations in a completely rigorous manner.

Definition 4.5.1

Let \(A\) be a set. Consider the new set of symbols \[ A^{-1}=\{a^{-1} \mid a \in A\}. \] Consider the set of all finite words written using symbols in \(A \cup A^{-1}\), including the empty word. If a word \(w\) contains consecutive symbols \(a a^{-1}\) or \(a^{-1}a\), we can simplify \(w\) by erasing those two consecutive symbols, and we obtain a word that is equivalent to \(w\). If a word cannot be simplified any further, we say that it is reduced. Given any \(a \in A\), \(a^1\) denotes \(a\), to distinguish it from \(a^{-1}\).

The free group on \(A\), denoted \(F(A)\), is the set of all reduced words in \(A \cup A^{-1}\). In symbols, \[ F(A):=\{a_1^{i_1}a_2^{i_2}\cdots a_m^{i_m} \mid m \geqslant 0, a_j\in A, i_j\in\{-1,1\}\}. \] The set \(F(A)\) is a group with the operation in which any two words are multiplied by concatenation.
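Reduction of words can be implemented with a stack: scan the word once, cancelling whenever the next symbol is the inverse of the one on top. A Python sketch (symbols encoded as pairs \((a, \pm 1)\), a convention of our own):

```python
def reduce_word(word):
    # word: list of (letter, exponent) pairs with exponent in {1, -1}
    out = []
    for letter, exp in word:
        if out and out[-1] == (letter, -exp):
            out.pop()  # erase a consecutive pair a a^{-1} or a^{-1} a
        else:
            out.append((letter, exp))
    return out

# the word a b b^{-1} a^{-1} a reduces to a
print(reduce_word([("a", 1), ("b", 1), ("b", -1), ("a", -1), ("a", 1)]))  # [('a', 1)]
```

A single pass suffices because each cancellation only exposes the symbol below the top of the stack, which is then compared against the next incoming symbol.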

Example 4.5.2

The free group on a singleton set \(A=\{x\}\) is the infinite cyclic group \(C_\infty\).

Theorem 4.5.3 (Universal mapping property for free groups)

Let \(A\) be a set, let \(F(A)\) be the free group on \(A\), and let \(H\) be any group. Given a function \(g\! :A \to H\), there is a unique group homomorphism \(f\!:F(A) \to H\) satisfying \(f(a) = g(a)\) for all \(a \in A\).

Proof (of Theorem 4.5.3)

Let \(f\!:F(A) \to H\) be given by \[ f(a_1^{i_1}a_2^{i_2}\cdots a_m^{i_m})=g(a_1)^{i_1}g(a_2)^{i_2}\cdots g(a_m)^{i_m} \] for any \( m\geqslant 0\), \(a_j \in A\), and \(i_j\in\{-1,1\}\). To check that this is a well-defined function, note that \[ f(a_1^{i_1}a_2^{i_2}\cdots aa^{-1}\cdots a_m^{i_m})=g(a_1)^{i_1}g(a_2)^{i_2}\cdots g(a)g(a)^{-1}\cdots g(a_m)^{i_m}=f(a_1^{i_1}a_2^{i_2}\cdots a_m^{i_m}) \] for any \(a\in A\), and similarly for inserting \(a^{-1}a\). The fact that \(f\) is a group homomorphism and its uniqueness are left as an exercise.

Definition 4.5.4

Let \(G\) be a group and let \(R \subseteq G\) be a set. The normal subgroup of \(G\) generated by \(R\), denoted \(\langle R \rangle ^N\), is the set of all products of conjugates of elements of \(R\) and inverses of elements of \(R\). In symbols, \[ \langle R \rangle ^N= \{ g_1r_1^{i_1}g_1^{-1} \dots g_mr_m^{i_m}g_m^{-1} \mid m \geqslant 0, i_j \in \{1,-1\}, r_j \in R, g_j \in G \}. \]

Definition 4.5.5 (presentation (of a group))

Let \(A\) be a set and let \(R\) be a subset of the free group \(F(A)\). The group with presentation \[ \langle A \mid R \rangle = \langle A \mid \{r = e \mid r \in R\} \rangle \] is defined to be the quotient group \(F(A)/\langle R \rangle ^N\).

Example 4.5.6

Let \(A=\{x\}\) and consider \(R=\{x^n\}\). Then the group with presentation \(\langle A \mid R \rangle\) is the cyclic group of order \(n\): \[ C_n=\langle x \mid x^n=e\rangle = \frac{F(\{x\})}{\langle x^n\rangle ^N}=C_\infty/\langle x^n \rangle. \]

Example 4.5.7

Taking \(A=\{r,s\}\) and \(R=\{s^2,r^n,srsr\}\), \(\langle A \mid R \rangle\) is the usual presentation for \(D_{n}\): \[ D_{n}=\langle r,s \mid s^2=e,r^n=e,srsr=e \rangle =\frac{F\left( \{r,s\}\right)}{\langle s^2,r^n,srsr\rangle ^N}. \]

Theorem 4.5.8 (Universal mapping property of a presentation)

Let \(A\) be a set, let \(F(A)\) be the free group on \(A\), let \(R\) be a subset of \(F(A)\), and let \(H\) be a group. Let \(g\!:A \to H\) be a function satisfying the property that whenever \(r = a_1^{i_1} \cdots a_m^{i_m} \in R\), with each \(a_j \in A\) and \(i_j \in \{1,-1\}\), then \[ (g(a_1))^{i_1} \cdots (g(a_m))^{i_m} = e_H. \] Then there is a unique homomorphism \(\overline{f}\!: \langle A \mid R \rangle \to H\) satisfying \[ \overline f(a\langle R\rangle ^N) = g(a) \quad \textrm{for all } a \in A. \]

Proof (of Theorem 4.5.8)

By the universal mapping property of free groups, there is a unique group homomorphism \(f\!:F(A)\to H\) such that \(f(a)=g(a)\) for all \(a\in A\). Then for \[ r = a_1^{i_1} \cdots a_m^{i_m} \in R \] we have \[ f(r)=(g(a_1))^{i_1} \cdots (g(a_m))^{i_m} = e_H, \] showing that \(R\subseteq \ker(f)\). Since \(\ker(f)\trianglelefteq F(A)\) and \(\langle R\rangle ^N\) is the smallest normal subgroup containing \(R\), it follows that \(\langle R\rangle^N\subseteq \ker(f)\). By the UMP for quotient groups, \(f\) induces a group homomorphism \(\overline{f}\!: F(A)/\langle R\rangle ^N\to H\). Moreover, for each \(a\in A\) we have \[ g(a)=f(a)=\overline f(a\langle R\rangle ^N). \]

Remark 4.5.9

The universal property of a presentation says that to give a group homomorphism from a group \(G\) with a given presentation to a group \(H\) is the same as picking images for each of the generators that satisfy the same relations in \(H\) as those given in the presentation.

Example 4.5.10

To find a group homomorphism \(D_{n} \to \mathrm{GL}_2(\mathbb{R})\), it suffices to pick images for \(r\) and \(s\), say \(r\mapsto R, s\mapsto S\), and to verify that \[ S^2=I_2, \quad R^n=I_2, \quad SRSR=I_2. \] One can check that this does hold for the matrices \[ R=\begin{pmatrix} \cos\frac{2\pi}{n} & -\sin\frac{2\pi}{n} \\ \sin\frac{2\pi}{n} & \cos\frac{2\pi}{n}\end{pmatrix} \quad \textrm{and} \quad S=\begin{pmatrix} 0 & 1 \\ 1& 0\end{pmatrix}. \] By the UMP of the presentation there is a unique group homomorphism \(D_{n} \to \mathrm{GL}_2(\mathbb{R})\) that sends \(r\) to \(R\) and \(s\) to \(S\).
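The matrix relations can be confirmed numerically. The sketch below (with \(R\) the rotation by \(2\pi/n\) and \(S\) the reflection swapping the coordinate axes, taking \(n = 5\)) checks \(S^2 = R^n = SRSR = I_2\) up to floating-point error:

```python
import math

n = 5
t = 2 * math.pi / n
R = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]  # rotation by 2*pi/n
S = [[0.0, 1.0], [1.0, 0.0]]                                   # reflection across y = x

def mul(A, B):  # 2x2 matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def is_identity(A, tol=1e-9):
    return all(abs(A[i][j] - (1.0 if i == j else 0.0)) < tol for i in range(2) for j in range(2))

Rn = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(n):
    Rn = mul(Rn, R)  # compute R^n

print(is_identity(mul(S, S)), is_identity(Rn), is_identity(mul(mul(mul(S, R), S), R)))
# True True True
```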

Presentations of groups are remarkably complex mathematical constructions. What makes them so complicated is that \(\langle R \rangle^N\) is very hard to calculate in general. The following theorem is a negative answer to what is known as the Word Problem, and illustrates how complicated the story can become:

Theorem 4.5.11 (Boone-Novikov)

There exists a finite set \(A\) and a finite subset \(R\) of \(F(A)\) such that there exists no algorithm that determines whether a given element of \(\langle A \mid R \rangle\) is equal to the trivial element.

5. Group actions in action

5.1 Orbits and Stabilizers

Let \(G\) be a group acting on a set \(S\). Let us recall some notation and facts about group actions. The orbit of an element \(s \in S\) is

\[ \mathrm{Orb}_G(s)=\{g\cdot s \mid g\in G\}. \]

A permutation representation of a group \(G\) is a group homomorphism \(\rho\!: G \to \mathrm{Perm}(S)\) for some set \(S\). As we saw when we first discussed permutation representations, to give an action of \(G\) on a set \(S\) is equivalent to giving a permutation representation \(\rho\!: G \to \mathrm{Perm}(S)\), which is induced by the action via

\[ \rho(g)(s) = g \cdot s. \]

An action is faithful if the only element \(g\in G\) such that \(g\cdot s=s\) for all \(s\in S\) is \(g=e_G\). Equivalently, an action is faithful if \(\ker(\rho) = \{e_G\}\). An action is transitive if for all \(p,q \in S\) there is a \(g \in G\) such that \(q=g\cdot p\). Equivalently, an action is transitive if \(\mathrm{Orb}_G(p)=S\) for any \(p\in S\).

Definition 5.1.1

Let \(G\) be a group acting on a set \(S\). The stabilizer of an element \(s\) in \(S\) is the set of group elements that fix \(s\) under the action:

\[ \mathrm{Stab}_G(s)=\{g\in G \mid g\cdot s=s\}. \]

Definition 5.1.2

Let \(G\) be a group acting on a set \(S\). An element \(s \in S\) is a fixed point of the action if \(g \cdot s = s\) for all \(g \in G\).

Remark 5.1.3

Let \(G\) be a group acting on a set \(S\). An element \(s \in S\) is a fixed point if and only if \(\mathrm{Orb}_G(s) = \{ s \}\). Moreover, \(s\) is a fixed point if and only if \(\mathrm{Stab}_G(s) = G\).

The stabilizer of any element is always a subgroup of \(G\).

Lemma 5.1.4

Let \(G\) be a group acting on a set \(S\), and let \(s \in S\). The stabilizer \(\mathrm{Stab}_G(s)\) of \(s\) is a subgroup of \(G\).

Proof (of Lemma 5.1.4)

By definition of group action, \(e \cdot s = s\), so \(e \in \mathrm{Stab}_G(s)\). If \(x,y \in \mathrm{Stab}_G(s)\), then \((xy)\cdot s = x\cdot(y\cdot s) = x\cdot s = s\) and thus \(xy \in \mathrm{Stab}_G(s)\). If \(x \in \mathrm{Stab}_G(s)\), then

\[ xs = s \Rightarrow s = x^{-1}xs = x^{-1}s \Rightarrow x^{-1} \in \mathrm{Stab}_G(s). \]

Theorem 5.1.5 (Orbit-Stabilizer Theorem)

Let \(G\) be a group that acts on a set \(S\). For any \(s \in S\) we have

\[ |\mathrm{Orb}_G(s)| = [G: \mathrm{Stab}_G(s)]. \]

Proof (of Theorem 5.1.5)

Let \(\mathcal{L}\) be the collection of left cosets of \(\mathrm{Stab}_G(s)\) in \(G\). Let \(\alpha: \mathcal{L} \to \mathrm{Orb}_G(s)\) be given by

\[ \alpha(x \mathrm{Stab}_G(s)) = x \cdot s. \]

This function is well-defined and injective:

\[ x \mathrm{Stab}_G(s) = y \mathrm{Stab}_G(s)\iff x^{-1}y \in \mathrm{Stab}_G(s) \iff x^{-1}y \cdot s = s \iff y \cdot s = x \cdot s. \]

The function \(\alpha\) is surjective by definition of \(\mathrm{Orb}_G(s)\), and thus it is a bijection. Finally, we can now conclude that

\[ [G: \mathrm{Stab}_G(s)] = |\mathcal{L}| = |\mathrm{Orb}_G(s)|. \]

Corollary 5.1.6 (Orbit-Stabilizer Theorem part 2)

Let \(G\) be a finite group acting on a set \(S\). For any \(s \in S\) we have

\[ |G|=|\mathrm{Orb}_G(s)| \cdot |\mathrm{Stab}_G(s)|. \]

Proof (of Corollary 5.1.6)

This is a direct consequence of the Orbit-Stabilizer Theorem, since by Lagrange's Theorem

\[ [G: \mathrm{Stab}_G(s)]=|G|/|\mathrm{Stab}_G(s)|. \]
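As a computational aside (not in the original notes), the Orbit-Stabilizer count is easy to verify by brute force. The sketch below, in plain Python with permutations written as tuples of images, generates the dihedral group \(D_4\) acting on the four vertices of a square and checks \(|G| = |\mathrm{Orb}_G(s)| \cdot |\mathrm{Stab}_G(s)|\).

```python
from itertools import product

def compose(p, q):
    # (p o q)(i) = p(q(i)): apply q first, then p
    return tuple(p[i] for i in q)

def generate(gens):
    # closure of a generating set under composition (enough for finite groups)
    G = {tuple(range(len(gens[0])))}
    frontier = set(gens)
    while frontier:
        G |= frontier
        frontier = {compose(g, h) for g, h in product(gens, G)} - G
    return G

# D_4 as permutations of the square's vertices 0, 1, 2, 3 (in cyclic order)
r = (1, 2, 3, 0)  # rotation by 90 degrees
s = (0, 3, 2, 1)  # reflection across the diagonal through vertices 0 and 2
G = generate([r, s])

v = 0
orbit = {g[v] for g in G}            # Orb_G(v) = { g . v : g in G }
stab = {g for g in G if g[v] == v}   # Stab_G(v)

assert len(G) == 8
assert len(orbit) * len(stab) == len(G)   # |G| = |Orb| * |Stab|
```

Here the orbit of a vertex is all \(4\) vertices and its stabilizer has \(2\) elements (the identity and one reflection), matching \(8 = 4 \cdot 2\).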

Remark 5.1.7 (Orbit Formula)

Let \(G\) be a group acting on a finite set \(S\). The orbits of the action form a partition of \(S\). The one-element orbits correspond to the fixed points of the action. Pick one element in each of the other orbits, say \(s_1, \ldots, s_m\). This gives us

The Orbit Formula: \(|S| = (\text{the number of fixed points}) + \sum_{i=1}^m |\mathrm{Orb}_G(s_i)|.\)

By the Orbit-Stabilizer Theorem, we can rewrite this as

The Stabilizer Formula: \(|S| = (\text{the number of fixed points}) + \sum_{i=1}^m [G: \mathrm{Stab}_G(s_i)].\)

We will later see that these are very useful formulas.

We can now use these simple facts to do some explicit calculations with groups.

Example 5.1.8

Let \(G\) be the group of rotational (orientation-preserving) symmetries of the cube. To count the number of elements of \(G\), think about an isometry as picking up a cube lying on a table, moving it, and placing it back in the same location. To do this, one must pick a face to place on the table. This can be chosen in \(6\) ways. Once that face is chosen, one needs to decide where each vertex of that face goes, and this can be done in \(4\) ways. Thus \(|G|=24\).

We can restrict the action of \(G\) to the four lines that join opposite vertices of the cube; the group of permutations of the four lines is \(S_4\), so the corresponding permutation representation associated to this action is a group homomorphism \(\rho\!: G \to S_4\).

We claim that this homomorphism \(\rho\) is actually an isomorphism from \(G\) to \(S_4\). To see this, first label each vertex of the cube \(1\) through \(8\). Let \(a\), \(b\), \(c\), and \(d\) denote each of the four lines, and let us also label the vertices of the cube \(a\), \(b\), \(c\), or \(d\) according to which of the diagonal lines goes through that vertex.

[Figure: a wireframe cube, front square at the lower right and back square at the upper left, with a dashed edge marking the hidden part. Vertices labeled \(1, \dots, 8\).]
[Figure: the same cube with each vertex labeled by the letter (\(a\), \(b\), \(c\), or \(d\)) of the space diagonal passing through it; opposite vertices share the same letter.]

Now note that each face corresponds to a unique order on \(a\), \(b\), \(c\), \(d\), read counterclockwise from the outside of the cube:

\[ \begin{aligned} \text{The face } 1234 && \hspace{1em}\text{corresponds to} \hspace{1em} & adcb\\ \text{The face } 1256 && \hspace{1em}\text{corresponds to} \hspace{1em} & abdc \\ \text{The face } 1458 && \hspace{1em}\text{corresponds to} \hspace{1em} & adbc \\ \text{The face } 5678 && \hspace{1em}\text{corresponds to} \hspace{1em} & abcd \\ \text{The face } 2367 && \hspace{1em}\text{corresponds to} \hspace{1em} & acbd \\ \text{The face } 3478 && \hspace{1em}\text{corresponds to} \hspace{1em} & acdb. \end{aligned} \]

So suppose that \(g \in G\) fixes all of the four lines \(a\), \(b\), \(c\), \(d\). Then the face at the bottom must still read \(adcb\), which corresponds to \(1234\), and thus all the vertices of the cube in the bottom face must be fixed. We conclude that \(g\) must fix the entire cube, and thus \(g\) must be the identity.

Thus the action is faithful, and hence the permutation representation \(\rho\!: G\to S_4\) is injective. Moreover, we showed above that \(|G|=24=|S_4|\), and thus \(\rho\) is an injective function between two finite sets of the same size. We conclude that \(\rho\) must actually be a bijection, and thus an isomorphism.

The same group \(G\) also acts on the six faces of the cube. This action is transitive, since we can always pick up the cube and put it back on the table with any face on the top. Thus the one and only orbit for the action of \(G\) on the six faces of the cube has length \(6\). By the Orbit-Stabilizer Theorem, it follows that for any face \(f\) of the cube, its stabilizer has index \(6\) and, since we already know that \(|G|= 24\), the Orbit-Stabilizer Theorem gives us

\[ |\mathrm{Stab}_G(f)| = \frac{|G|}{|\mathrm{Orb}_G(f)|} = \frac{24}{6} = 4. \]

Thus, there are four symmetries that map \(f\) to itself. Indeed, they are the \(4\) rotations by \(0\), \(\frac{\pi}{2}\), \(\pi\) or \(\frac{3\pi}{2}\) about the line of symmetry passing through the midpoint of \(f\) and the midpoint of the opposite face.

Example 5.1.9

Let \(X\) be a regular dodecahedron, with \(12\) faces, centered at the origin in \(\mathbb{R}^3\).

Let \(G\) be the group of isometries of the dodecahedron that preserve orientation: \[ G := \{\alpha : \mathbb{R}^3 \to \mathbb{R}^3 \mid \text{$\alpha$ is an isometry, $\alpha$ preserves orientation, and } \alpha(X) = X\}. \] This is a subgroup of the group of all bijections from \(\mathbb{R}^3\) to \(\mathbb{R}^3\). Though not obvious, every element of \(G\) is given by a rotation about a line of symmetry. There are three kinds of such lines: those joining midpoints of opposite faces, those joining midpoints of opposite edges, and those joining opposite vertices. To count the number of elements of \(G\) informally, think about an isometry as picking up a dodecahedron that was lying on a table and replacing it in the same location. To do this, one must first pick one of the twelve faces to place on the table, and, for each possible face, there are five ways to orient it. Thus \[ |G|=12 \cdot 5 = 60. \]

Let us use the Orbit-Stabilizer Theorem to do this more formally. Note that \(G\) acts on the collection \(S\) of the \(12\) faces of \(X\). This action is transitive since it is possible to move any face to any other via an appropriate rotation. So, the one and only orbit has length \(12\). Letting \(F\) be any one of the faces, the orientation-preserving isometries of \(X\) that map \(F\) to itself are just the orientation-preserving elements of \(D_{5}\), of which there are \(5\). Indeed, these correspond to the five rotations of \(X\) by \(\frac{2 \pi j}{5}\) radians for \(j = 0,1,2,3,4\) about the axis of symmetry passing through the midpoint of \(F\) and the midpoint of the opposite face. Applying the Orbit-Stabilizer Theorem gives \[ |G|=|\mathrm{Orb}_G(F)|\cdot |\mathrm{Stab}_G(F)|=12 \cdot 5=60. \]

5.2 The class equation

The main goal of this subsection is to apply the Orbit-Stabilizer Formula to the action of \(G\) on itself by conjugation. Let \(G\) be a group. As we saw before, \(G\) acts on \(S = G\) by conjugation: the action is defined by \(g \cdot x=gxg^{-1}\).

Definition 5.2.1 (conjugate elements)

Let \(G\) be a group. Two elements \(g,g' \in G\) are conjugate if there exists \(h \in G\) such that \[ g' = hgh^{-1}. \] Equivalently, \(g\) and \(g'\) are conjugate if they are in the same orbit of the conjugation action. The conjugacy class of an element \(g \in G\) is \[ [g]_c := \{hgh^{-1} \mid h \in G\}. \] Equivalently, the conjugacy class of \(g\) is the orbit of \(g\) under the conjugation action.

Remark 5.2.2

Let \(G\) be any group. Then \(geg^{-1} = e\) for all \(g \in G\), and thus \([e]_c = \{ e \}\).

Let us study the conjugacy classes of \(S_n\). You proved in a problem set that two cycles in \(S_n\) are conjugate if and only if they have the same length:

Lemma 5.2.3

For any \(\sigma \in S_n\) and distinct integers \(i_1, \dots, i_p\), we have \[ \sigma (i_1 \, i_2 \, \cdots i_p) \sigma^{-1} = (\sigma(i_1) \, \cdots \, \sigma(i_p)). \]

Note that the right-hand side is indeed a \(p\)-cycle, since \(\sigma\) is injective. This generalizes to the following:

Theorem 5.2.4

Two elements of \(S_n\) are conjugate if and only if they have the same cycle type.

Proof (of Theorem 5.2.4)

Consider two conjugate elements of \(S_n\), say \(\alpha\) and \(\beta = \sigma\alpha \sigma^{-1}\). By earlier work, we may write \(\alpha\) as a product of disjoint cycles \(\alpha = \alpha_1 \cdots \alpha_m\). Then \[ \beta = \sigma\alpha \sigma^{-1} = (\sigma\alpha_1 \sigma^{-1}) \cdots (\sigma\alpha_m \sigma^{-1}). \] Since \(\alpha_1, \ldots, \alpha_m\) are disjoint cycles, the elements \((\sigma\alpha_1 \sigma^{-1}), \cdots, (\sigma\alpha_m \sigma^{-1})\) are also disjoint cycles, and \(\sigma\alpha_i \sigma^{-1}\) has the same length as \(\alpha_i\). We conclude that \(\alpha\) and \(\beta\) must have the same cycle type.

Conversely, consider two elements \(\alpha\) and \(\beta\) with the same cycle type. More precisely, assume \(\alpha = \alpha_1 \cdots \alpha_k\) and \(\beta = \beta_1 \cdots \beta_k\) are decompositions into disjoint cycles and that \(\alpha_i, \beta_i\) both have length \(p_i \geqslant 2\) for each \(i\). We need to prove that \(\alpha\) and \(\beta\) are conjugate. Let us start with the case \(k = 1\). Given two cycles of the same length, \[ \alpha = (i_1 \, \dots \, i_p) \quad \text{ and } \quad \beta = (j_1 \, \dots \, j_p). \] Any permutation \(\sigma\) such that \(\sigma(i_m) = j_m\) for all \(1 \leqslant m \leqslant p\) must satisfy \(\sigma\alpha \sigma^{-1} = \beta\).

Note that such \(\sigma\) has no restrictions on what it does to the set \(\{1, \dots, n\} \setminus \{i_1, \dots, i_p\}\): it can map \(\{1, \dots, n\} \setminus \{i_1, \dots, i_p\}\) bijectively to \(\{1, \dots, n\} \setminus \{j_1, \dots, j_p\}\) in any way possible. From this observation, the general case follows: since the cycles are disjoint, we can find a single permutation \(\sigma\) such that \(\sigma\alpha_i \sigma^{-1} = \beta_i\) for all \(i\).
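As a quick sanity check (not part of the notes), the relabeling rule \(\sigma (i_1 \, \cdots \, i_p) \sigma^{-1} = (\sigma(i_1) \, \cdots \, \sigma(i_p))\) can be verified computationally. Here is a minimal Python sketch, with permutations written as tuples of images on \(\{0, \dots, n-1\}\) and one arbitrarily chosen \(\sigma\).

```python
def compose(p, q):
    # apply q first, then p
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def cycle(indices, n):
    # the permutation of {0,...,n-1} given by the cycle (i_1 i_2 ... i_p)
    p = list(range(n))
    for a, b in zip(indices, indices[1:] + indices[:1]):
        p[a] = b
    return tuple(p)

n = 6
sigma = (2, 0, 5, 3, 1, 4)   # an arbitrary permutation of {0,...,5}
c = cycle([0, 3, 4], n)      # the cycle (0 3 4)

lhs = compose(compose(sigma, c), inverse(sigma))       # sigma c sigma^{-1}
rhs = cycle([sigma[i] for i in [0, 3, 4]], n)          # (sigma(0) sigma(3) sigma(4))
assert lhs == rhs
```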

Example 5.2.5

Given our computation on conjugates in the symmetric group, we can now write a complete list of the conjugacy classes of \(S_4\):

  1. The conjugacy class of the identity \(\{e\}\).
  2. The conjugacy class of \((12)\), which is the set of all two cycles and has \({4 \choose 2} = 6\) elements.
  3. The conjugacy class of \((123)\), which is the set of all three cycles and has \(4 \cdot 2 = 8\) elements.
  4. The conjugacy class of \((1234)\), which is the set of all four cycles and has \(3! = 6\) elements.
  5. The conjugacy class of \((12)(34)\), which is the set of all products of two disjoint \(2\)-cycles and has \(3\) elements.

We can check our work by recalling that the conjugacy classes partition \(S_4\), and indeed we counted \(24\) elements.
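The count above can also be double-checked by brute force (this check is not in the original notes): the Python sketch below computes every conjugacy class of \(S_4\) directly and confirms the class sizes \(1, 3, 6, 6, 8\).

```python
from itertools import permutations

def compose(p, q):
    # apply q first, then p
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

# S_4 as all permutations of {0, 1, 2, 3}, written as tuples of images
S4 = list(permutations(range(4)))

def conjugacy_class(g):
    return frozenset(compose(compose(h, g), inverse(h)) for h in S4)

classes = {conjugacy_class(g) for g in S4}
sizes = sorted(len(c) for c in classes)

assert sizes == [1, 3, 6, 6, 8]   # matches the list above
assert sum(sizes) == 24           # the classes partition S_4
```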

Example 5.2.6

Given our computation on conjugates in the symmetric group, we can now write a complete list of the conjugacy classes of \(S_5\):

  1. The conjugacy class of the identity \(\{e\}\).
  2. The conjugacy class of \((12)\), which is the set of all \(2\)-cycles and has \({5 \choose 2} = 10\) elements.
  3. The conjugacy class of \((123)\), containing all \(3\)-cycles, of size \(2! \cdot {5 \choose 3} = 20\) elements.
  4. The conjugacy class of \((1234)\), containing all \(4\)-cycles, of size \(5 \cdot 3! = 30\) elements.
  5. The conjugacy class of \((12345)\), which is the set of all \(5\)-cycles, and has \(4! = 24\) elements.
  6. The conjugacy class of \((12)(34)\), which is the set of all products of two disjoint \(2\)-cycles and has \(5 \cdot 3= 15\) elements.
  7. The conjugacy class of \((12)(345)\), which is the set of all products of a \(2\)-cycle by a \(3\)-cycle, and has \({5 \choose 2} \cdot 2! = 20\) elements.

We can check our work by noting that indeed \[ 1 + 10 + 20 + 30 + 24 + 15 + 20 = 120 = 5!. \]

Remark 5.2.7

For any nontrivial group \(G\), since \([e]_c = \{ e \}\) and the conjugacy classes partition \(G\), then \([g]_c \neq G\) for all \(g \in G\).

Definition 5.2.8 (centralizer)

Let \(G\) be a group and \(a\in G\). The centralizer of \(a\) is the set of elements of \(G\) that commute with \(a\): \[ C_G(a) := \{x \in G \mid xa=ax\}. \] More generally, given a subset \(S \subseteq G\), the centralizer of \(S\) is the set \[ C_G(S) := \{x \in G \mid xs=sx \textrm{ for all } s \in S \}. \]

Definition 5.2.9 (normalizer)

Let \(G\) be a group and consider a subset \(S \subseteq G\). The normalizer of \(S\) is the set \[ N_G(S) := \{g \in G \mid gSg^{-1}=S\}. \]

Exercise 5.2.10

Let \(G\) be a group and \(S \subseteq G\). Prove that the centralizer and the normalizer of \(S\) are subgroups of \(G\).

Lemma 5.2.11

Let \(S \subseteq G\) be any subset of a group \(G\). Then \(C_G(S) \subseteq N_G(S)\).

Proof (of Lemma 5.2.11)

Let \(G\) be a group and \(S \subseteq G\). If \(x \in C_G(S)\), then for all \(s \in S\) we have \[ xs=sx \implies xsx^{-1} = s \in S \textrm{ and } x^{-1}sx = s. \] Thus \(xSx^{-1} \subseteq S\) and \(x^{-1}Sx \subseteq S\). Now for any \(s \in S\) we have \(x^{-1}sx \in S\) and \(s\) can be written as \[ s = x(x^{-1}sx)x^{-1} \in xSx^{-1}. \] This shows that \(S \subseteq xSx^{-1}\). Thus \(xSx^{-1} = S\), and therefore \(x \in N_G(S)\).

Remark 5.2.12

If \(G\) is an abelian group, then for any \(a \in G\) we have \(C_G(a) = G = N_G(a)\).

Exercise 5.2.13

Let \(H\) be a subgroup of a group \(G\), and \(S\) a subset of \(H\). Then \[ C_{H}(S) = C_G(S) \cap H \quad \textrm{and} \quad N_{H}(S) = N_G(S) \cap H. \]

Exercise 5.2.14

Let \(G\) be a group and let \(H\) be a subgroup of \(G\). Show that \(N_G(H)/C_G(H)\) is isomorphic to a subgroup of the automorphism group \(\mathrm{Aut}(H)\) of \(H\).

Exercise 5.2.15

Let \(G\) be a group and \(H\) a subgroup of \(G\). Prove that if \(H\) is normal in \(G\), then so is \(C_G(H)\), and that \(G/C_G(H)\) is isomorphic to a subgroup of the automorphism group of \(H\).

Lemma 5.2.16

Let \(G\) be a group. Consider the action of \(G\) on \(G\) by conjugation, where \(g \cdot h = ghg^{-1}\). For all \(g \in G\), \[ \mathrm{Orb}_G(g) = [g]_c \quad \textrm{and} \quad \mathrm{Stab}_G(g)=C_G(g) \quad \textrm{and} \quad |[g]_c| = [G : C_G(g)]. \]

Proof (of Lemma 5.2.16)

The first statement is the definition of the conjugacy class of \(g\): \(\mathrm{Orb}_G(g) = [g]_c\). Moreover, by simply following the definitions we see that \[ h \in \mathrm{Stab}_G(g) \iff h \cdot g = g \iff hgh^{-1} = g \iff hg = gh \iff h \in C_G(g). \] Thus, \(\mathrm{Stab}_G(g)=C_G(g)\), and by the Orbit-Stabilizer Theorem, \[ |[g]_c| = |\mathrm{Orb}_G(g)| = [G : C_G(g)]. \]

Exercise 5.2.17

Let \(G\) be a group. Consider the action of \(G\) on the power set \[ P(G)=\{S\mid S\subseteq G\} \] of \(G\) by conjugation, meaning \(g \cdot S = gSg^{-1}\). For all \(S \in P(G)\), \[ \mathrm{Stab}_G(S)=N_G(S) \quad \textrm{and} \quad |\mathrm{Orb}_G(S)| = [G : N_G(S)]. \]

Corollary 5.2.18

For a finite group \(G\), the size of any conjugacy class divides \(|G|\).

Proof (of Corollary 5.2.18)

Let \(g \in G\). The order of the conjugacy class of \(g\) is the index of the centralizer: \[ |[g]_c| = [G : C_G(g)] \] By Lagrange's Theorem, the index of any subgroup must divide \(|G|\), and thus in particular \(|[g]_c|\) divides \(|G|\).

We will take the orbit equation and apply it to the special case of the conjugation action. In order to do that, all that remains is to identify the fixed points of the action.

Lemma 5.2.19

Let \(G\) be a group acting on itself by conjugation. An element \(g \in G\) is a fixed point of the conjugation action if and only \(g \in \mathrm{Z}(G)\).

Proof (of Lemma 5.2.19)

\((\Leftarrow)\) Suppose that \(g \in \mathrm{Z}(G)\). Then for all \(h \in G\), \(g\) commutes with \(h\), and thus \[ hgh^{-1} = (hg)h^{-1} = g(hh^{-1}) = g. \] Thus \(g\) is conjugate to only itself, meaning it is a fixed point for the conjugation action.

\((\Rightarrow)\) Conversely, suppose that \(g\) is a fixed point for the conjugation action. Then for all \(h \in G\), \[ hgh^{-1} = h \cdot g = g \implies hg=gh. \] Thus \(g \in \mathrm{Z}(G)\).

We can now write the Orbit Equation for the conjugation action; this turns out to be a very useful formula.

Theorem 5.2.20 (The Class Equation)

Let \(G\) be a finite group. For each conjugacy class of size greater than \(1\), pick a unique representative, and let \(g_1,\ldots, g_r \in G\) be the list of all the chosen representatives. Then \[ |G| = |\mathrm{Z}(G)| + \sum_{i=1}^r [G : C_G(g_i)]. \]

Proof (of Theorem 5.2.20)

The elements of \(\mathrm{Z}(G)\) are precisely the fixed points of the conjugation action. In particular, \(|\mathrm{Z}(G)|\) counts the number of orbits that have only one element. Because the orbits of the conjugation action partition \(G\), and the conjugacy classes are the orbits, the orbit equation gives \[ |G| = |\mathrm{Z}(G)| + \sum_{i=1}^r |[g_i]_c|. \] By the Orbit-Stabilizer Theorem, the size of each conjugacy class is the index of the corresponding centralizer: for each \(g_i\) as in the statement we have \[ |[g_i]_c| = [G : C_G(g_i)]. \] The class equation follows from substituting this into the equation above: \[ |G| = |\mathrm{Z}(G)| + \sum_{i=1}^r [G : C_G(g_i)]. \]
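To illustrate (this check is not in the notes), the class equation can be verified by brute force for a small nonabelian group. The sketch below does so for \(D_4\), written as permutations of the vertices of a square, whose class equation is \(8 = 2 + 2 + 2 + 2\).

```python
from itertools import product

def compose(p, q):
    # apply q first, then p
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def generate(gens):
    # closure of a generating set under composition (enough for finite groups)
    G = {tuple(range(len(gens[0])))}
    frontier = set(gens)
    while frontier:
        G |= frontier
        frontier = {compose(g, h) for g, h in product(gens, G)} - G
    return G

# D_4 as permutations of the vertices 0, 1, 2, 3 of a square
G = generate([(1, 2, 3, 0), (0, 3, 2, 1)])

center = {g for g in G if all(compose(g, h) == compose(h, g) for h in G)}
classes = {frozenset(compose(compose(h, g), inverse(h)) for h in G) for g in G}
big = [len(c) for c in classes if len(c) > 1]

# the class equation: |G| = |Z(G)| + (sum of the sizes of the big classes)
assert len(G) == len(center) + sum(big)
assert len(center) == 2 and sorted(big) == [2, 2, 2]
```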

Remark 5.2.21

The class equation is not very interesting if \(G\) is abelian, since there is only one term on the right hand side: \(|\mathrm{Z}(G)|\).

But when \(G\) is nonabelian, the class equation can lead us to discover some very interesting facts, despite its simplicity.

Exercise 5.2.22

Prove that if \(G\) is a nonabelian group of order \(21\), then there is only one possible class equation for \(G\), meaning that the numbers appearing in the class equation are uniquely determined up to permutation.

Corollary 5.2.23

If \(p\) is a prime number and \(G\) is a finite group of order \(p^m\) for some \(m > 0\), then \(\mathrm{Z}(G)\) is not the trivial group.

Proof (of Corollary 5.2.23)

Let \(g_1,\ldots g_r \in G\) be a list of unique representatives of all of the conjugacy classes of \(G\) of size greater than 1, as in the class equation. By construction, each \(g_i\) is not a fixed point of the action, and thus \(\mathrm{Stab}_G(g_i) \neq G\). We have \(C_G(g_i) = \mathrm{Stab}_G(g_i)\), so \(C_G(g_i) \neq G\). In particular, \([G:C_G(g_i)]\neq 1\). Since \(1\neq [G:C_G(g_i)]\) and \([G:C_G(g_i)]\) divides \(|G|=p^m\), we conclude that \(p\) divides \([G:C_G(g_i)]\) for each \(i\). From the class equation, we can now conclude that \(p\) divides \(|\mathrm{Z}(G)|\), and in particular \(|\mathrm{Z}(G)|\neq 1\).
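As an illustration (not in the notes), one can verify the corollary computationally on a group of order \(p^3\): the sketch below builds the Heisenberg group of upper unitriangular \(3 \times 3\) matrices over \(\mathbb{Z}/3\), which has order \(27\), and checks that its center has exactly \(3\) elements.

```python
from itertools import product

p = 3
# The Heisenberg group mod p: the triple (a, b, c) encodes the
# upper unitriangular matrix [[1, a, c], [0, 1, b], [0, 0, 1]] over Z/p.
def mul(x, y):
    a, b, c = x
    d, e, f = y
    # matrix multiplication, reduced mod p
    return ((a + d) % p, (b + e) % p, (c + f + a * e) % p)

G = list(product(range(p), repeat=3))   # |G| = p^3 = 27
center = [g for g in G if all(mul(g, h) == mul(h, g) for h in G)]

assert len(G) == 27
assert len(center) == 3                  # Z(G) = {(0, 0, c)}: nontrivial, of order p
assert all(g[:2] == (0, 0) for g in center)
```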

Exercise 5.2.24

Let \(p\) be prime and let \(G\) be a group of order \(p^m\) for some \(m \geqslant 1\). Show that if \(N\) is a nontrivial normal subgroup of \(G\), then \(N \cap Z(G) \ne \{e\}\). In fact, show that \(|N \cap Z(G)| = p^j\) for some \(j \geqslant 1\).

Lemma 5.2.25

Let \(G\) be a group and \(N\trianglelefteq G\). The conjugation action of \(G\) on itself induces an action by conjugation of \(G\) on \(N\). In particular, \(N\) is the disjoint union of some of the conjugacy classes in \(G\).

Proof (of Lemma 5.2.25)

Define the conjugation action of \(G\) on \(N\) by \(g\cdot n=gng^{-1}\) for all \(g\in G\) and \(n\in N\). Since \(N\trianglelefteq G\), this always gives us back an element of \(N\), and thus the action is well-defined. We can think of this action as a restriction of the action of \(G\) on itself by conjugation, and thus the two properties in the definition of an action hold for the action of \(G\) by conjugation on \(N\). Therefore, this is indeed an action. The orbits of elements \(n\in N\) under this action are the conjugacy classes \([n]_c\), and we have just shown that for all \(n \in N\), \([n]_c \subseteq N\). But every element in \(N\) belongs to some conjugacy class, thus the conjugacy classes of the elements of \(N\) partition \(N\).

Remark 5.2.26

The previous lemma says that the orbits of the conjugation action of \(G\) on a normal subgroup \(N\) are just the orbits of the conjugation action of \(G\) on itself that contain elements of \(N\) (and must thus be completely contained in \(N\)). In contrast, if \(N\) is a normal subgroup of \(G\), we can also consider the conjugation action of \(N\) on itself. If \(a\) and \(b\) are elements of \(N\) that are conjugate for the \(N\)-conjugation, then they must also be conjugate for the \(G\)-conjugation action, using the same element \(n \in N\) such that \(a =nbn^{-1}\). However, if \(a\) and \(b\) are conjugate for the \(G\)-conjugation, they might not necessarily be conjugate for the \(N\)-action, as all the elements \(g \in G\) such that \(a = gbg^{-1}\) could very well all be in \(G \setminus N\).

We will see examples of this in the next section, where we will study the special case of the alternating group.

5.3 The alternating group

Since \(A_n \leq S_n\), we know that if two elements of \(A_n\) are conjugate, then they have the same cycle type, as they are also conjugate elements of \(S_n\), and thus we can apply our computation of conjugacy classes in \(S_n\). But there is no reason for the converse to hold: given \(\alpha, \beta \in A_n\) of the same cycle type, the elements \(\sigma \in S_n\) such that \(\sigma \alpha \sigma^{-1} = \beta\) might all belong to \(S_n \setminus A_n\). Indeed, we will see that this does happen in some cases.

Example 5.3.1

The two permutations \((123)\) and \((132)\) are not conjugate in \(A_3\), despite having the same cycle type and thus being conjugate in \(S_3\) by our computation of conjugacy classes in \(S_n\). One can check this easily, for example, by conjugating \((123)\) by the \(3\) elements in \(A_3\).

Lemma 5.3.2

Let \(\sigma\) be an \(m\)-cycle in \(S_n\). Then \[ \sigma \in A_n \iff m \textrm{ is odd}. \]

Proof (of Lemma 5.3.2)

Recall from earlier that \[ (i_1 \, i_2 \, \cdots \, i_m) = (i_1 \, i_m) (i_1 \, i_{m-1}) \cdots (i_1 \, i_3) (i_1 \, i_2) \] is a product of \(m-1\) transpositions. Thus \(\sigma\) is even if and only if \(m-1\) is even, which happens if and only if \(m\) is odd.
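This decomposition into \(m - 1\) transpositions can also be verified computationally (not part of the notes); the sketch below checks it for an arbitrarily chosen \(5\)-cycle in \(S_7\), with permutations written as tuples of images on \(\{0, \dots, n-1\}\).

```python
def compose(p, q):
    # apply q first, then p
    return tuple(p[i] for i in q)

def cycle(indices, n):
    # the permutation of {0,...,n-1} given by the cycle (i_1 i_2 ... i_m)
    p = list(range(n))
    for a, b in zip(indices, indices[1:] + indices[:1]):
        p[a] = b
    return tuple(p)

n, idx = 7, [0, 2, 5, 1, 6]   # the 5-cycle (0 2 5 1 6) in S_7

# build (i1 im)(i1 i(m-1)) ... (i1 i2), applied right to left
prod = tuple(range(n))
for k in range(len(idx) - 1, 0, -1):
    prod = compose(prod, cycle([idx[0], idx[k]], n))

assert prod == cycle(idx, n)   # the product of m-1 transpositions is the m-cycle
```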

Lemma 5.3.3 (Conjugacy classes of \(A_5\))

The conjugacy classes of \(A_5\) are given by the following list:

  1. The singleton \(\{e\}\) is a conjugacy class.
  2. The conjugacy class of \((1 \, 2 \, 3 \, 4 \, 5)\) in \(A_5\) has \(12\) elements.
  3. The conjugacy class of \((2 \, 1 \, 3 \, 4 \, 5)\) in \(A_5\) has \(12\) elements, and it is disjoint from the conjugacy class of \((1 \, 2 \, 3 \, 4 \, 5)\).
  4. The collection of all three cycles, of which there are \(20\), forms a conjugacy class in \(A_5\).
  5. The collection of all products of two disjoint transpositions, of which there are \(15\), forms one conjugacy class in \(A_5\).

As a reality check, note that \(12 + 12 + 20 + 15 + 1 = 60 = |A_5|\).

Proof (of Lemma 5.3.3)

By the work above, the cycle types of elements of \(A_5\) are

  • five cycles, of which there are \(4! = 24\),
  • three cycles, of which there are \({5 \choose 3} 2 = 20\),
  • products of two disjoint transpositions, of which there are \(5 \cdot 3 = 15\), and
  • the unique \(1\)-cycle \(e\), and indeed \([e]_c = \{e\}\).

By our computation of conjugacy classes in \(S_n\), we know that two permutations are conjugate in \(S_5\) if and only if they have the same cycle type. It follows that each conjugacy class in \(A_5\) is contained in one of the cycle types above. The statement we are trying to prove asserts that the set of five cycles breaks apart into two conjugacy classes in \(A_5\), whereas in all the other cases, the conjugacy classes remain whole.

Claim: Fix a \(5\)-cycle \(\sigma\). The conjugacy class of \(\sigma\) in \(A_5\) has \(12\) elements.

By the Orbit-Stabilizer Theorem, \[ [S_5 : C_{S_5}(\sigma)] = |[\sigma]_c|, \] and by Lagrange's Theorem, \[ |C_{S_5}(\sigma)| = \frac{|S_5|}{[S_5 : C_{S_5}(\sigma)]}. \] By our computation of conjugacy classes in \(S_n\), \(|[\sigma]_c|\) is the number of \(5\)-cycles in \(S_5\), which is \(4!\). Thus \[ |C_{S_5}(\sigma)| = \frac{5!}{4!} = 5. \] Since every power of \(\sigma\) commutes with \(\sigma\), and there are \(5\) such elements, we conclude that \[ C_{S_5}(\sigma) = \{e, \sigma, \sigma^2, \sigma^3, \sigma^4\}. \] But these are all in \(A_5\), and thus by an earlier lemma we conclude that \[ C_{A_5}(\sigma) = C_{S_5}(\sigma) \cap A_5 = \{e, \sigma, \sigma^2, \sigma^3, \sigma^4\}. \] By the Orbit-Stabilizer Theorem and Lagrange's Theorem, \[ \text{the size of the conjugacy class of } \sigma \text{ in } A_5 = [A_5: C_{A_5}(\sigma)] = \frac{|A_5|}{|C_{A_5}(\sigma)|} = \frac{60}{5} = 12. \] This proves the claim.

We have now shown that the conjugacy class of each \(5\)-cycle has \(12\) elements, and all twenty-four \(5\)-cycles are in \(A_5\). Thus there are two conjugacy classes of \(5\)-cycles in \(A_5\). This shows that \(\sigma\) is only conjugate in \(A_5\) to half of the five cycles. If we pick two \(5\)-cycles \(\sigma\) and \(\tau\) that are not conjugate in \(A_5\), then \(\tau\) is conjugate to exactly \(12\) elements, which must be exactly the other \(5\)-cycles that \(\sigma\) is not conjugate to.

One can check that \((1 \, 2 \, 3 \, 4 \, 5)\) and \((2 \, 1 \, 3 \, 4 \, 5)\) are in fact not conjugate in \(A_5\). While they are conjugate in \(S_5\), it is via the element \((1 \,2)\), which is not in \(A_5\). Suppose that \(\alpha \in S_5\) is such that \[ \alpha (2 \, 1 \, 3 \, 4 \, 5) \alpha^{-1} = (1 \, 2 \, 3 \, 4 \, 5). \] Note that \(\tau = \alpha (1 \, 2)\) satisfies \[ \begin{aligned} \tau (1 \, 2 \, 3 \, 4 \, 5) & = \alpha (1 \, 2) (1 \, 2 \, 3 \, 4 \, 5) \\ & = \alpha (2 \, 1 \, 3 \, 4 \, 5) (1 \, 2) \\ & = (1 \, 2 \, 3 \, 4 \, 5) \alpha (1 \, 2) \\ & = (1 \, 2 \, 3 \, 4 \, 5) \tau, \end{aligned} \] where the second equality holds because \((1 \, 2)(1 \, 2 \, 3 \, 4 \, 5)(1 \, 2)^{-1} = (2 \, 1 \, 3 \, 4 \, 5)\), and the third because \(\alpha (2 \, 1 \, 3 \, 4 \, 5) = (1 \, 2 \, 3 \, 4 \, 5) \alpha\). Thus \(\alpha (1 \, 2) \in C_{S_5}((1 \, 2 \, 3 \, 4 \, 5))\), or equivalently, \[ \alpha \in C_{S_5}((1 \, 2 \, 3 \, 4 \, 5)) \cdot (1 \, 2). \] But we just proved that every element in \(C_{S_5}((1 \, 2 \, 3 \, 4 \, 5))\) is in \(A_5\), and thus even; this shows that every element in the coset \[ C_{S_5}((1 \, 2 \, 3 \, 4 \, 5)) \cdot (1 \, 2) \] is odd (as we multiplied by one transposition), and thus no such \(\alpha\) lies in \(A_5\). This proves (2) and (3).

Claim: All \(20\) three cycles are conjugate in \(A_5\).

Given two \(3\)-cycles \((a \, b \, c)\) and \((d \, e \, f)\) in \(S_5\), we already know that they are both in \(A_5\) and that there is a \(\sigma \in S_5\) such that \[ \sigma (a \, b \, c) \sigma^{-1} = (d \, e \, f). \] If \(\sigma \in A_5\), we are done. If \(\sigma \notin A_5\), let \(\{1, \dots, 5\} \setminus \{a,b,c\} = \{ x, y\}\). Then \(\sigma\) is a product of an odd number of transpositions, so \(\sigma (x \, y) \in A_5\). Moreover, \((x \, y)\) and \((a \, b \, c)\) are disjoint cycles, so they commute, and thus \[ (x \, y) (a \, b \, c) (x \, y)^{-1} = (a \, b \, c). \] Therefore, \[ (\sigma (x \, y)) (a \, b \, c) (\sigma (x \, y))^{-1} = (d \, e \, f), \] so \((a \, b \, c)\) and \((d \, e \, f)\) are conjugate in \(A_5\). This proves the claim.

Claim: All products of two disjoint transpositions are conjugate in \(A_5\).

Set \(\alpha = (1 \, 2)(3 \, 4)\). The conjugacy class of \(\alpha\) in \(S_5\) consists of all the products of two disjoint two-cycles, and there are \(15\) such elements. By the Orbit-Stabilizer Theorem, \[ 15 = |\text{the conjugacy class of } \alpha \text{ in } S_5| = [S_5: C_{S_5}(\alpha)]=\frac{120}{\left|C_{S_5}(\alpha)\right|}. \] Thus \[ \left|C_{S_5}(\alpha)\right| = \frac{120}{15} = 8. \] Since \(\alpha\) commutes with \(e\), \(\alpha\), \((1 \, 3)(2 \, 4)\), and \((1 \, 4)(2 \, 3)\), and each of these belongs to \(A_5\), we must have \(|C_{A_5}(\alpha)| \geqslant 4\). Since \[ C_{A_5}(\alpha) = C_{S_5}(\alpha) \cap A_5, \] it follows that \(|C_{A_5}(\alpha)|\) must divide both \(8\) and \(60\), and so must be \(1\), \(2\), or \(4\). We conclude that \(|C_{A_5}(\alpha)| = 4\). Thus \(\alpha\) is conjugate in \(A_5\) to \(60/4 = 15\) elements. Since there are \(15\) products of two disjoint two-cycles, they must all be conjugate to \(\alpha\), and thus the conjugacy class of \(\alpha\) in \(A_5\) is still the set of all products of two disjoint two-cycles.
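Having computed all of the conjugacy classes of \(A_5\) by hand, one can confirm the answer by brute force (this check is not in the notes): the Python sketch below lists the even permutations of \(\{0, \dots, 4\}\) and verifies that the conjugacy class sizes in \(A_5\) are \(1, 12, 12, 15, 20\).

```python
from itertools import permutations

def compose(p, q):
    # apply q first, then p
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def sign(p):
    # parity via counting inversions
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

# A_5 as the even permutations of {0, 1, 2, 3, 4}
A5 = [p for p in permutations(range(5)) if sign(p) == 1]

classes = {frozenset(compose(compose(h, g), inverse(h)) for h in A5) for g in A5}
sizes = sorted(len(c) for c in classes)

assert len(A5) == 60
assert sizes == [1, 12, 12, 15, 20]   # the 24 five-cycles split into two classes
```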

Now that we have completely calculated all the conjugacy classes of \(A_5\), our hard work will pay off: we can now prove a very important result in group theory.

Definition 5.3.4 (simple group)

A nontrivial group \(G\) is simple if it has no proper nontrivial normal subgroups.

Exercise 5.3.5

Let \(p\) be prime. Show that \(\mathbb{Z}/p\) is a simple group.

Theorem 5.3.6

The group \(A_5\) is a simple group.

Proof (of Theorem 5.3.6)

Suppose \(N \trianglelefteq A_5\). By Lagrange's Theorem, \(|N|\) divides \[ |A_5| = \frac{5!}{2} = 60. \] By Lemma 5.3.3, \(A_5\) has exactly four conjugacy classes of size greater than \(1\), of sizes \(12\), \(12\), \(15\), and \(20\). Since \(N\) is normal, it is a union of conjugacy classes of \(A_5\), and \(N\) contains \(\{e\}\). Thus \[ |N| = 1 + \text{ the sum of a sublist of the list } 12, 12, 15, 20. \] By checking the relatively small number of cases we see that \(|N| = 1\) or \(|N| = 60\) are the only possibilities, as the remaining options do not divide \(60\). Thus \(N = \{e\}\) or \(N = A_5\), and we conclude that \(A_5\) is simple.
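The case check at the end of the proof is small enough to automate (not part of the notes): the sketch below lists every possible value of \(|N| = 1 + (\text{a sublist sum of } 12, 12, 15, 20)\) and confirms that only \(1\) and \(60\) divide \(60\).

```python
from itertools import combinations

big_classes = [12, 12, 15, 20]   # the nontrivial conjugacy class sizes in A_5

# every candidate order |N| = 1 + (sum of a sublist of the class sizes)
possible = set()
for k in range(len(big_classes) + 1):
    for sub in combinations(big_classes, k):
        possible.add(1 + sum(sub))

divisors_of_60 = {n for n in possible if 60 % n == 0}
assert divisors_of_60 == {1, 60}   # only the trivial subgroup and A_5 itself survive
```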

In fact, \(A_n\) is simple for all \(n \geqslant 5\), but we will not prove this. In contrast, \(A_4\) is not simple: we have seen in class that it has a normal subgroup with four elements consisting of the identity and the three products of two disjoint transpositions.

Example 5.3.7

The alternating group \(A_3\) is simple and abelian since it has order \(3\).

Both \(A_1\) and \(A_2\) are the trivial group.

Theorem 5.3.8

Let \(n \geqslant 3\). The alternating group \(A_n\) is simple if and only if \(n \neq 4\).

In fact, one can show that \(A_5\) is the smallest nonabelian simple group, having \(60\) elements. This we will also not prove.

5.4 Other group actions with applications

Let's discuss a couple other group actions that often lead to useful information about the group doing the acting. The first one arises from the action of a group on the collection of left cosets of one of its subgroups. More precisely, let \(G\) be a group and \(H\) a subgroup, and let \(\mathcal{L}\) denote the collection of left cosets of \(H\) in \(G\): \[ \mathcal{L} = \{xH \mid x \in G\}. \] When \(H\) is normal, \(\mathcal{L}\) is the quotient group \(\mathcal{L} = G/H\), but note that we are not assuming that \(H\) is normal. Then \(G\) acts on \(\mathcal{L}\) via the rule \[ g \cdot (xH) := (gx)H. \] This action is transitive: for all \(x\), \[ xH = x \cdot (eH). \] The stabilizer of the element \(H \in \mathcal{L}\) is \[ \mathrm{Stab}_G(H) = \{x \in G \mid xH = H\} = H, \] which is consistent with the Orbit-Stabilizer Theorem, as indeed \[ \mathrm{Orb}_G(H) = \mathcal{L}, \quad \textrm{so } |\mathrm{Orb}(H)| = |\mathcal{L}| = [G : H], \] while \[ \mathrm{Stab}_G(H) = H, \quad \textrm{so } [G: \mathrm{Stab}_G(H)] = [G : H]. \]

As with any group action, this action induces a homomorphism \(\rho\!: G \to \mathrm{Perm}(\mathcal{L})\), given for each \(g \in G\) by

\[ \begin{aligned} \rho(g):\ &\mathcal{L} \longrightarrow \mathcal{L}\\ &xH \longmapsto (gx)H. \end{aligned} \]

If \(n = [G :H] = |\mathcal{L}|\) is finite, then \(\mathrm{Perm}(\mathcal{L}) \cong S_n\), so we have a homomorphism \(\rho\!: G \to S_n\).

Lemma 5.4.1

Let \(G\) be a group and \(H\) a subgroup of \(G\). Consider the action of \(G\) on the set \(\mathcal{L}\) of left cosets of \(H\), and the corresponding permutation representation \(\rho\!: G \to \mathrm{Perm}(\mathcal{L})\). Then \[ \ker(\rho) = \bigcap_{x \in G} xHx^{-1}. \] In particular, \(\ker(\rho) \subseteq H\).

Note that \(\displaystyle\bigcap_{x \in G} xHx^{-1}\) is the largest normal subgroup of \(G\) contained in \(H\).

Proof (of Lemma 5.4.1)

Note that

\[ \begin{aligned} g \in \ker(\rho) & \iff (gx)H = xH \textrm{ for all } x \in G \\ & \iff x^{-1}gx \in H \text{ for all } x \in G \\ & \iff g \in xHx^{-1} \text{ for all } x \in G. \end{aligned} \]

Thus

\[ \ker(\rho) = \bigcap_{x \in G} xHx^{-1}. \]

Since \(eHe^{-1} = H\), we conclude that \(\ker(\rho) \subseteq H\).

Remark 5.4.2

The action of \(G\) on the left cosets of \(H\) might be faithful or not. The Lemma above says that the action is faithful if and only if \[ \bigcap_{x \in G} x H x^{-1} = \{ e \}. \] If \(H\) is a normal subgroup of \(G\), then in fact \[ \bigcap_{x \in G} x H x^{-1} = H, \] and thus the action is not faithful unless \(H = \{e\}\).

Remark 5.4.3

Consider the subgroup \(H = \langle (12) \rangle\) of \(S_3\). The action of \(S_3\) on the left cosets of \(H\) is faithful: for example, taking \(\sigma = (13)\) we have \[ \sigma H \sigma^{-1} = \{ e, (13)(12)(13) \} = \{ e, (23) \}, \] and thus the permutation representation \(\rho\!: S_3 \to S_3\) associated with the action has \[ \ker \rho \subseteq \sigma H \sigma^{-1} \cap H = \{ e \}. \]
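This example is small enough to check exhaustively. The sketch below (an optional aside, with permutations of \(\{0,1,2\}\) encoded as tuples) confirms that the coset action of \(S_3\) on the left cosets of \(H = \langle (12) \rangle\) has trivial kernel:

```python
# Verify that the action of S_3 on the left cosets of H = <(1 2)> is
# faithful: the kernel of the permutation representation is trivial.
from itertools import permutations

def compose(p, q):
    return tuple(p[q[i]] for i in range(len(q)))

S3 = list(permutations(range(3)))
identity = (0, 1, 2)
H = [identity, (1, 0, 2)]  # {e, (1 2)} in 0-indexed notation

def left_coset(x):
    return frozenset(compose(x, h) for h in H)

cosets = {left_coset(x) for x in S3}
# g is in the kernel iff g sends every coset xH to itself, i.e. (gx)H = xH
kernel = [g for g in S3
          if all(left_coset(compose(g, x)) == left_coset(x) for x in S3)]
print(len(cosets), kernel)  # 3 [(0, 1, 2)]
```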

Theorem 5.4.4

Let \(G\) be a finite group and \(H\) a subgroup of index \(p\), where \(p\) is the smallest prime divisor of \(|G|\). Then \(H\) is normal.

Proof (of Theorem 5.4.4)

The action of \(G\) on the set of left cosets of \(H\) in \(G\) by left multiplication induces a homomorphism \(\rho\!: G \to S_p\). By the Lemma above, its kernel \(N := \ker(\rho)\) is contained in \(H\). By the First Isomorphism Theorem, \[ [G:N] = |G/N| = |\mathrm{im}(\rho)|. \] By Lagrange's Theorem, since \(\mathrm{im}(\rho)\) is a subgroup of \(S_p\), the index \([G:N] = |\mathrm{im}(\rho)|\) divides \(|S_p|=p!\). On the other hand, \([G:N]\) divides \(|G|\) by Lagrange's Theorem. Since \([G:N]\) divides both \(|G|\) and \(p!\), it must divide \(\gcd(|G|,p!)\). Since \(p\) is the smallest prime divisor of \(|G|\), we must have \[ \gcd(|G|, p!) = p. \] It follows that \([G:N]\) divides \(p\), and hence \([G:N] =1\) or \([G:N] = p\). But \(N \subseteq H\), and \(H\) is a proper subgroup of \(G\), so \(N \neq G\), and thus \([G:N] \neq 1\). Therefore, we conclude that \([G:N] = p\). Since \(N \subseteq H\) and \([G:H] = p = [G : N]\), we conclude that \(H = N\). In particular, \(H\) must be a normal subgroup of \(G\).

This generalizes an earlier exercise, which says that any subgroup of index \(2\) is normal.

Another interesting action arises from the following: Let \(G\) be a group and let \[ \mathcal{S}(G) = \{H \mid H \leq G \} \] be the collection of all subgroups of \(G\). Then \(G\) acts on \(\mathcal{S}(G)\) by \[ g \cdot H = gHg^{-1}. \]

Definition 5.4.5 (conjugate subgroups)

Two subgroups \(A\) and \(B\) of a group \(G\) are conjugate if there exists \(g \in G\) such that \(A = gBg^{-1}\).

Equivalently, two subgroups are conjugate if and only if they lie in the same orbit of the action of \(G\) on the set of its subgroups by conjugation.

Exercise 5.4.6

Let \(G\) be a group and let \[ \mathcal{S}(G) = \{H \mid H \leq G \}. \] Check that the rule \[ g \cdot H = gHg^{-1} \] defines an action of \(G\) on \(\mathcal{S}(G)\). Moreover, prove that given any subgroup \(H\) of \(G\), the stabilizer of \(H\) is given by \(N_G(H)\).

The normalizer \(N_G(H)\) is the largest subgroup of \(G\) that contains \(H\) as a normal subgroup, meaning that \(H \trianglelefteq N_G(H)\).

Exercise 5.4.7

Let \(G\) be a group and \(H\) be a subgroup of \(G\). Show that if \(K\) is any subgroup of \(G\) such that \(H \trianglelefteq K\), then \(K \leq N_G(H)\). In particular, \(H \trianglelefteq G\) if and only if \(N_G(H) = G\).

Lemma 5.4.8

Let \(G\) be a group and \(H\) be a subgroup of \(G\). The number of subgroups of \(G\) that are conjugate to \(H\) is equal to \([G: N_G(H)]\).

Proof (of Lemma 5.4.8)

The number of subgroups of \(G\) that are conjugate to \(H\) is just the size of the orbit of \(H\) under the action of \(G\) by conjugation on the set of subgroups of \(G\). By the Orbit-Stabilizer Theorem, the number of elements in the orbit of \(H\) is the index of the stabilizer. Finally, by above, the stabilizer of \(H\) is \(N_G(H)\).

Here is an application of this action:

Lemma 5.4.9

If \(G\) is finite and \(H\) is a proper subgroup of \(G\), then \[ G \neq \bigcup_x xHx^{-1}. \]

Proof (of Lemma 5.4.9)

First, suppose that \(H\) is normal. Then \(H = xHx^{-1}\) for all \(x \in G\), so \[ \bigcup_x xHx^{-1} = H \neq G. \] Now assume that \(H\) is not normal, so that \(N_G(H) \neq G\) and \([G: N_G(H)] \geqslant 2\). Since conjugation by \(x\) is a bijection, we have \(|H| = |xHx^{-1}|\) for all \(x\). Since there are \([G: N_G(H)]\) conjugates of \(H\) by Lemma 5.4.8, we get \[ \left| \, \bigcup_x xHx^{-1} \right| \leqslant [G:N_G(H)] \cdot |H|. \] In fact, this bound can be improved: there are at least two distinct conjugates of \(H\), and \(e\) is an element of all of them. This gives us \[ \left| \, \bigcup_x xHx^{-1} \right| \leqslant [G:N_G(H)] \cdot |H| - 1. \] But \(H \subseteq N_G(H)\), so \([G: N_G(H)] \leq [G:H]\). We conclude that \[ \left| \, \bigcup_x xHx^{-1} \right| \leqslant [G:H] \cdot |H| - 1 = |G| - 1. \]
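A brute-force check of the lemma in the smallest interesting case, \(G = S_3\) with the non-normal subgroup \(H = \langle (12) \rangle\) (an optional aside, with permutations encoded as tuples):

```python
# The union of all conjugates of H = <(1 2)> in S_3 has only 4 of the
# 6 elements: the identity together with the three transpositions.
from itertools import permutations

def compose(p, q):
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    q = [0] * len(p)
    for i, image in enumerate(p):
        q[image] = i
    return tuple(q)

G = list(permutations(range(3)))
H = [(0, 1, 2), (1, 0, 2)]  # {e, (1 2)}

union = {compose(compose(g, h), inverse(g)) for g in G for h in H}
print(len(union), len(G))  # 4 6
```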

Since \(|H| = |xHx^{-1}|\) for all \(x \in G\), we can fix a natural number \(n\), set \[ \mathcal{S}_n(G) := \{H \mid H \leq G \text{ and } |H| = n\}, \] and consider the action of \(G\) on \(\mathcal{S}_n(G)\) by conjugation. This idea will be exploited in the next section.

Exercise 5.4.10

Show that if \(G\) is a finite group acting transitively on a set \(S\) with at least two elements, then there exists \(g \in G\) with no fixed points, meaning \(g \cdot s \neq s\) for all \(s \in S\).

6. Sylow Theory

Sylow Theory is a very powerful technique for analyzing finite groups of relatively small order. One aspect of Sylow theory is that it allows us to deduce, in certain special cases, the existence of a unique subgroup of a given order, and thus it allows one to construct a normal subgroup.

6.1 Cauchy’s Theorem

We start by proving a very powerful statement: that every finite group whose order is divisible by \(p\) must have an element of order \(p\).

Theorem 6.1.1 (Cauchy’s Theorem)

If \(G\) is a finite group and \(p\) is a prime number dividing \(|G|\), then \(G\) has an element of order \(p\). In fact, there are at least \(p-1\) elements of order \(p\).

Proof (of Theorem 6.1.1)

Let \(S\) denote the set of ordered \(p\)-tuples of elements of \(G\) whose product is \(e\):

\[ S = \{(x_1, \dots, x_p) \mid x_i \in G \textrm{ and } x_1 x_2 \cdots x_p = e\}. \]

Consider

\[ G^{\, p-1} := \underbrace{G\times \dots\times G}_{p-1 \text{ factors }} \]

and the map

\[ \begin{aligned} \phi :\; G^{\,p-1} &\longrightarrow S\\ (x_1, \dots, x_{p-1}) &\longmapsto \bigl(x_1, \dots, x_{p-1},\, x_{p-1}^{-1} \cdots x_{1}^{-1}\bigr). \end{aligned} \]

Given the definition of \(S\), the map \(\phi\) does indeed land in \(S\). Moreover, \(\phi\) is bijective since the map \(\psi\!: S \to G^{p-1}\) given by

\[ \psi(x_1, \dots, x_{p}) = (x_1, \dots, x_{p-1}) \]

is a two-sided inverse of the map above. Therefore, \(|S| = |G^{p-1}| = |G|^{p-1}\).

Let \(C_p\) denote the cyclic subgroup of \(S_p\) of order \(p\) generated by the \(p\)-cycle

\[ \sigma = (1 \, 2 \, \cdots \, p). \]

The following rule gives an action of \(C_p\) on \(S\):

\[ \sigma^i \cdot (x_1, \dots, x_p) := (x_{\sigma^i(1)}, \dots, x_{\sigma^i(p)}) = (x_{1+i}, x_{2+i}, \dots, x_{p+i}), \]

where the indices are taken modulo \(p\). Note first that the shifted tuple still lies in \(S\): if \(x_1 x_2 \cdots x_p = e\), then \[ x_{i+1} \cdots x_p \, x_1 \cdots x_i = (x_1 \cdots x_i)^{-1} (x_1 x_2 \cdots x_p)(x_1 \cdots x_i) = e. \] We should also check that this is indeed an action. On the one hand, \(\sigma^0\) is the identity map, so

\[ e \cdot (x_1, \dots, x_p) =\sigma^0 \cdot (x_1, \dots, x_p) = (x_{\sigma^0(1)}, \dots, x_{\sigma^0(p)})=(x_1, \dots, x_p). \]

Moreover,

\[ \sigma^i \cdot\left( \sigma^j \cdot (x_1, \dots, x_p)\right)= \sigma^i \cdot (x_{1+j}, x_{2+j}, \dots, x_{p+j})= (x_{1+j+i}, x_{2+j+i}, \dots, x_{p+j+i}), \]

while

\[ (\sigma^i \sigma^j) \cdot (x_1, \dots, x_p)=\sigma^{i+j} \cdot (x_1, \dots, x_p)=(x_{1+i+j}, x_{2+i+j}, \dots, x_{p+i+j}). \]

Thus

\[ \sigma^i \cdot\left( \sigma^j \cdot (x_1, \dots, x_p)\right)=(\sigma^i \sigma^j) \cdot (x_1, \dots, x_p), \]

and we have shown that this is indeed an action.

Now let us consider the fixed points of this action. If

\[ \sigma \cdot (x_1, \dots, x_p) = (x_1, \dots, x_p), \]

then \(x_{i+1}=x_i\) for \(1 \leqslant i \leqslant p\), so it follows that

\[ x_1 = x_2 = \cdots = x_p. \]

Thus if \(\sigma \cdot (x_1, \dots, x_p) = (x_1, \dots, x_p)\), then \((x_1, \dots, x_p) = (x, \dots, x)\) for an element \(x\) satisfying \(x^p = x_1 \cdots x_p = e\). Moreover, if \(\sigma\) fixes \((x_1, \dots, x_p)\), then so does every element of \(C_p = \langle \sigma \rangle\). Therefore, the fixed points of this action correspond exactly to the elements \(x \in G\) such that \(x^p = e\). The element \((e, e, \dots, e)\) is a fixed point. Any other fixed point, meaning an orbit of size one, corresponds to an element of \(G\) of order \(p\), so we wish to show that there is at least one fixed point besides \((e, \dots, e)\).

By the Orbit-Stabilizer Theorem, the size of every orbit divides \(|C_p| = p\). Since \(p\) is prime, every orbit for this action has size \(1\) or \(p\). By the Orbit Equation,

\[ |S| = \# \text{ fixed points } + p \cdot \# \text{ orbits of size } p. \]

Since \(p\) divides \(|S|\), we conclude that \(p\) divides the number of fixed points. We already know that there is at least one fixed point, \((e, \dots, e)\). Thus there must be at least one other fixed point; in fact, at least \(p-1\) others, since the number of fixed points must then be at least \(p\).
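The counting argument can be watched in action on a small example. The following sketch (an optional aside) carries out the proof's bookkeeping for the additive group \(\mathbb{Z}/6\) and \(p = 3\), where the tuple condition \(x_1 x_2 \cdots x_p = e\) becomes \(x_1 + x_2 + x_3 \equiv 0 \pmod 6\):

```python
# Cauchy's Theorem bookkeeping for G = Z/6 and p = 3: |S| = |G|^(p-1),
# and the number of tuples fixed by the cyclic shift is divisible by p.
from itertools import product

n, p = 6, 3
S = [t for t in product(range(n), repeat=p) if sum(t) % n == 0]
assert len(S) == n ** (p - 1)  # |S| = |G|^{p-1}

# a tuple is fixed by the cyclic shift iff all its entries are equal
fixed = [t for t in S if t[1:] + t[:1] == t]
print(fixed)  # [(0, 0, 0), (2, 2, 2), (4, 4, 4)]
```

The two nonidentity fixed tuples correspond to the elements \(2\) and \(4\), which are precisely the \(p - 1 = 2\) elements of order \(3\) in \(\mathbb{Z}/6\).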

We now know that if \(p\) divides \(|G|\), then \(G\) has an element of order \(p\). However, this is not true if \(n\) divides \(|G|\) but \(n\) is not prime. In fact, \(G\) may not even have any subgroup of order \(n\).

Exercise 6.1.2

Prove that the converse to Lagrange’s Theorem is false: find a group \(G\) and an integer \(d>0\) such that \(d\) divides the order of \(G\) but \(G\) does not have any subgroup of order \(d\).

6.2 The Main Theorem of Sylow Theory

Definition 6.2.1

Let \(G\) be a finite group and \(p\) a prime. Write the order of \(G\) as \(|G| = p^e m\) where \(p \nmid m\). A \(p\)-subgroup of \(G\) is a subgroup of \(G\) of order \(p^k\) for some \(k\). A Sylow \(p\)-subgroup of \(G\) is a subgroup \(H \leq G\) such that \(|H| = p^e\).

Thus a Sylow \(p\)-subgroup of \(G\) is a subgroup whose order is the highest conceivable power of \(p\) according to Lagrange’s Theorem.

Definition 6.2.2

We will denote the collection of all Sylow \(p\)-subgroups of \(G\) by \(\mathrm{Syl}_p(G)\).

This is, of course, not very interesting unless \(e>0\). Nevertheless, we allow that case.

Remark 6.2.3

When \(p\) does not divide \(|G|\), we have \(e = 0\) and \(G\) has a unique Sylow \(p\)-subgroup, namely \(\{e\}\), which indeed has order \(p^0=1\).

Note that even if \(p\) does divide \(|G|\), it is a priori conceivable that \(\mathrm{Syl}_p(G) = \emptyset\) for some groups \(G\) and primes \(p\). We will prove that this cannot happen, and that is actually one of the hardest steps in establishing Sylow theory.

Example 6.2.4

Let \(p>2\) be a prime and consider the group \(D_p\). The subgroup \(\langle r \rangle\) is a Sylow \(p\)-subgroup, as it has order \(p\) and \(|D_p|=2p\). In fact, this is the only Sylow \(p\)-subgroup of \(D_{p}\): every group of order \(p\) is cyclic, and the only elements of order \(p\) in \(D_p\) are the nontrivial powers of \(r\).

In \(D_{n}\) for \(n\) odd, each of the subgroups \(\langle sr^j \rangle\), for \(j = 0, \dots, n-1\) is a Sylow \(2\)-subgroup. Since \(n\) is odd, only the reflections have order \(2\), and we have listed all the subgroups generated by reflections, so we conclude that the number of Sylow \(2\)-subgroups is \(n\).
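For odd \(n\), a Sylow \(2\)-subgroup of \(D_n\) has order \(2\), so counting Sylow \(2\)-subgroups amounts to counting elements of order \(2\). The sketch below (an optional aside) models \(D_n\) as permutations of the vertices \(0, \dots, n-1\) of a regular \(n\)-gon, with rotations \(i \mapsto i + k\) and reflections \(i \mapsto k - i\) taken modulo \(n\):

```python
# Count the elements of order 2 in D_n for an odd n: only the n
# reflections qualify, so D_n has exactly n Sylow 2-subgroups.
n = 7  # any odd n >= 3 works

rotations = [tuple((i + k) % n for i in range(n)) for k in range(n)]
reflections = [tuple((k - i) % n for i in range(n)) for k in range(n)]
D_n = rotations + reflections

def compose(p, q):
    return tuple(p[q[i]] for i in range(len(q)))

identity = tuple(range(n))
order_two = [g for g in D_n if g != identity and compose(g, g) == identity]
print(len(order_two))  # 7
```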

Example 6.2.5

If \(G\) is cyclic of finite order, there is a unique Sylow \(p\)-subgroup for each \(p\), since by the structure theorem for cyclic groups there is a unique subgroup of each order that divides \(|G|\): if \(G = \langle x \rangle\) and \(|x| = p^e m\) with \(p \nmid m\), then the unique Sylow \(p\)-subgroup of \(G\) is \(\langle x^m \rangle\).

Let \(G\) be a finite group and \(p\) a prime that divides \(|G|\). Then \(G\) acts on the set of its Sylow \(p\)-subgroups via conjugation. As of now, for all we know, this might be the action on the empty set. Sylow Theory is all about understanding this action very well. Before we can prove the main theorem, we need a technical lemma.

Lemma 6.2.6

Let \(G\) be a finite group, \(p\) a prime, \(P\) a Sylow \(p\)-subgroup of \(G\), and \(Q\) any \(p\)-subgroup of \(G\). Then \(Q \cap N_G(P) = Q \cap P\).

Proof (of Lemma 6.2.6)

\((\subseteq)\) Since \(P \leq N_G(P)\), we have \(Q \cap P \leq Q \cap N_G(P)\).

\((\supseteq)\) Let \(H := Q \cap N_G(P)\). Since \(H \subseteq N_G(P)\), we have \(PH=HP\), and thus \(PH\) is a subgroup of \(G\). By the Diamond Isomorphism Theorem, we have

\[ |PH| = \frac{|P| \cdot |H|}{|P \cap H|} \]

and since each of \(|P|\), \(|H|\), and \(|P \cap H|\) is a power of \(p\), we conclude that the order of \(PH\) is also a power of \(p\). In particular, \(PH\) is a \(p\)-subgroup of \(G\). On the other hand, \(P \leq PH\) and \(P\) is already a \(p\)-subgroup of the largest possible order, so we must have \(P = PH\). Note that \(H \leq PH\) always holds. We conclude that \(H \leq P\) and thus \(H \leq Q \cap P\).

Theorem 6.2.7 (Main Theorem of Sylow Theory)

Let \(p\) be a prime and assume \(G\) is a group of order \(p^e m\), where \(e \geqslant 0\) and \(\gcd(p,m) = 1\).

  1. There exists at least one Sylow \(p\)-subgroup of \(G\). In short, \(\mathrm{Syl}_p(G) \neq \emptyset\).
  2. If \(P\) is a Sylow \(p\)-subgroup of \(G\) and \(Q \leq G\) is any \(p\)-subgroup of \(G\), then \(Q \leq gPg^{-1}\) for some \(g \in G\). Moreover, any two Sylow \(p\)-subgroups are conjugate and the action of \(G\) on \(\mathrm{Syl}_p(G)\) by conjugation is transitive.
  3. We have
    \[ | \mathrm{Syl}_p(G)| \equiv 1 \mod{p}. \]
  4. For any \(P \in \mathrm{Syl}_p(G)\),
    \[ |\mathrm{Syl}_p(G)| = [G: N_G(P)], \]

    and hence

    \[ | \mathrm{Syl}_p(G)| \text{ divides } m. \]

Proof (of Theorem 6.2.7)

First we will prove \(G\) contains a subgroup of order \(p^e\) by induction on \(|G| = p^em\).

When \(|G| = 1\), \(\{e\}\) is a Sylow \(p\)-subgroup by our convention. In fact, this argument applies whenever \(e = 0\), so we may assume throughout the rest of the proof that \(p\) does divide \(|G|\). So suppose that \(p\) divides \(|G|\) and that every group of order \(n < |G|\) has a Sylow \(p\)-subgroup. We will consider two cases, depending on whether \(p\) divides \(|\mathrm{Z}(G)|\).

If \(p\) divides \(|\mathrm{Z}(G)|\), then by Cauchy’s Theorem there is an element \(z \in \mathrm{Z}(G)\) of order \(p\). Set \(N := \langle z \rangle\). Since \(z \in \mathrm{Z}(G)\), then for all \(g \in G\) we have

\[ gz^ig^{-1} = z^i \in N, \]

and thus \(N \trianglelefteq G\). Since

\[ |G/N| = \frac{|G|}{|N|} = \frac{p^em}{p} = p^{e-1}m, \]

by induction hypothesis \(G/N\) has a subgroup of order \(p^{e-1}\), which must then have index \(m\). By the Lattice Isomorphism Theorem, this subgroup corresponds to a subgroup of \(G\) of index \(m\), hence of order \(p^e\).

Now assume \(p\) does not divide \(|\mathrm{Z}(G)|\), and consider the class equation for \(G\): if \(g_1, \dots, g_k\) is a complete list of representatives for the noncentral conjugacy classes, with no class repeated, then

\[ |G| = |\mathrm{Z}(G)| + \sum_{i=1}^k [G: C_G(g_i)]. \]

Suppose that \(p\) divides \([G:C_G(g_i)]\) for all \(i\). Since \(p\) also divides \(|G|\), this would imply that \(p\) divides \(|\mathrm{Z}(G)|\), contradicting our assumption. We conclude that \(p\) does not divide \([G:C_G(g_i)]\) for some \(i\).

Note that \([G:C_G(g_i)]\) divides \(|G|\) by Lagrange’s Theorem, and thus it must divide \(m\). Set

\[ d := \frac{m}{[G:C_G(g_i)]}. \]

Then

\[ |C_G(g_i)|= \frac{|G|}{[G:C_G(g_i)]} = \frac{p^em}{[G:C_G(g_i)]} = p^e d, \]

and note that \(p\) does not divide \(d\) since it does not divide \(m\). Since \(g_i\) is not central, \(C_G(g_i) \neq G\), and in particular \(|C_G(g_i)| < |G|\). By induction hypothesis, \(C_G(g_i)\) contains a subgroup \(S\) of order \(p^e\). But \(S\) is also a subgroup of \(G\), and it has order \(p^e\), as desired. This completes the proof of \((1)\): we have shown that \(G\) contains a subgroup of order \(p^e\).

To prove \((2)\) and \((3)\), let \(P\) be a Sylow \(p\)-subgroup and let \(Q\) be any \(p\)-subgroup. Let \(\mathcal{S}_P\) denote the collection of all conjugates of \(P\):

\[ \mathcal{S}_P := \{ gPg^{-1} \mid g \in G\}. \]

By definition, \(G\) acts transitively on \(\mathcal{S}_P\) by conjugation. Restricting that action to \(Q\), we get an action of \(Q\) on \(\mathcal{S}_P\), though note that we do not know whether that action is transitive. The key to proving parts \((2)\) and \((3)\) of the Sylow Theorem is to analyze the action of \(Q\) on \(\mathcal{S}_P\).

Let \(\mathcal{O}_1, \dots, \mathcal{O}_s\) be the distinct orbits of the action of \(Q\) on \(\mathcal{S}_P\), and for each \(i\) pick a representative \(P_i \in \mathcal{O}_i\). Note that

\[ \begin{aligned} \mathrm{Stab}_Q(P_i) & = \{q \in Q \mid qP_iq^{-1} = P_i\} && \textrm{by the definition of the action}\\ & = N_Q(P_i) && \textrm{by definition of normalizer}\\ & = Q \cap N_G(P_i) && \\ & = Q \cap P_i && \text{by Lemma 6.2.6}. \end{aligned} \]

By the Orbit-Stabilizer Theorem, we have \(|\mathcal{O}_i| = [Q: Q \cap P_i]\), and thus, collecting the orbits,

\[ |\mathcal{S}_P |= \sum_{i=1}^s [Q: Q \cap P_i]. \label{E1030} \]

This equation holds for any \(p\)-subgroup \(Q\) of \(G\). In particular, we can take \(Q = P_1\). In this case, the first term in the sum is \([Q: Q \cap P_1] = [P_1 : P_1] = 1\), while for all \(i \neq 1\) we have

\[ Q \cap P_i = P_1 \cap P_i \neq P_1 = Q \implies [Q: Q \cap P_i] > 1. \]

But \(|Q|\) is a power of \(p\), so \([Q: Q \cap P_i]\) must be divisible by \(p\) for all \(i \neq 1\). We conclude that

\[ |\mathcal{S}_P| \equiv 1 \pmod{p}. \]

Note, however, that this does not yet prove part \((3)\), since we do not yet know that \(\mathcal{S}_P\) consists of all the Sylow \(p\)-subgroups. But we do have all the pieces we need to prove part (2). Suppose, by way of contradiction, that \(Q\) is a \(p\)-subgroup of \(G\) that is not contained in any of the subgroups in \(\mathcal{S}_P\). Then \(Q \cap P_i \neq Q\) for all \(i\), and thus every term on the right-hand side of

\[ |\mathcal{S}_P |= \sum_{i=1}^s [Q: Q \cap P_i] \]

is divisible by \(p\), contrary to the equation above. We conclude that \(Q\) must be contained in at least one of the subgroups in \(\mathcal{S}_P\). This proves the first part of \((2)\).

Moreover, if we take \(Q\) to be a Sylow \(p\)-subgroup of \(G\), then \(Q \leq gPg^{-1}\) for some \(g\), but \(Q\) and \(P\) are both Sylow \(p\)-subgroups of \(G\), so

\[ |Q| = |P| = |gPg^{-1}|. \]

We conclude that \(Q = gPg^{-1}\) is conjugate to \(P\). In particular, the conjugation action of \(G\) on \(\mathrm{Syl}_p(G)\) is transitive, and this finishes the proof of \((2)\).

This proves, in particular, that \(\mathcal{S}_P\) does in fact consist of all the Sylow \(p\)-subgroups, so part \((3)\) now follows from the congruence \(|\mathcal{S}_P| \equiv 1 \pmod{p}\) established above.

Finally, for any \(P \in \mathrm{Syl}_p(G)\), the stabilizer of \(P\) for the action of \(G\) on \(\mathrm{Syl}_p(G)\) by conjugation is \(N_G(P)\). Since we now know the action is transitive, the Orbit-Stabilizer Theorem says that

\[ |\mathrm{Syl}_p(G)| = [G: N_G(P)]. \]

Moreover, since \(P \leq N_G(P)\) and \(|P| = p^e\), it follows that \(p\) divides \(|N_G(P)|\), so

\[ |N_G(P)| = p^e d \]

for some \(d\) that divides \(m\). We conclude that

\[ [G: N_G(P)] = \frac{|G|}{|N_G(P)|} = \frac{p^em}{p^ed} = \frac{m}{d}, \]

so \([G: N_G(P)]\) divides \(m\).

Remark 6.2.8

In general, Cauchy’s Theorem can be deduced from part one of the Sylow Theorem. However, we used Cauchy’s Theorem to prove the Sylow Theorem, so it was important to prove Cauchy’s Theorem independently of Sylow theory.

To see how Cauchy’s Theorem follows from the Sylow Theorem, suppose that the prime \(p\) divides \(|G|\). Then by the Sylow Theorem there exists a Sylow \(p\)-subgroup \(P\) of \(G\). Pick any nontrivial element \(x \in P\). Then \(|x| = p^j\) for some \(j \geqslant 1\), since by Lagrange’s Theorem \(|x|\) must divide \(|P| = p^e\). Then \(y = x^{p^{j-1}}\) has order \(p\):

\[ y^p = \left( x^{p^{j-1}} \right)^p =x^{p \cdot p^{j-1}} = x^{p^{j}} = e. \]

Moreover, \(y^i \neq e\) for \(1 \leqslant i < p\), as otherwise \(|x|\) would divide \(ip^{j-1}\), giving

\[ |x| \leqslant ip^{j-1} < p^j. \]

Remark 6.2.9

Let \(G\) be a group. We saw that if \(H\) is the unique subgroup of \(G\) of a given finite order \(n\), then \(H\) must be a normal subgroup of \(G\). One consequence of the Main Theorem of Sylow Theory is a sort of converse to this: if \(G\) has multiple Sylow \(p\)-subgroups, then \(G\) has no normal Sylow \(p\)-subgroups, since any two Sylow \(p\)-subgroups must be conjugate to each other.

6.3 Using Sylow Theory

Using the Main Theorem of Sylow Theory, we can often find the exact number of Sylow \(p\)-subgroups, sometimes leading us to find normal subgroups. In particular, these techniques can be used to show that there are no normal subgroups of a particular order, as the next example will illustrate.

Example 6.3.1 (No simple groups of order \(12\))

Let us prove that there are no simple groups of order \(12\). To do that, let \(G\) be any group of order \(12 = 2^2 \cdot 3\). We will prove that \(G\) must have either a normal subgroup of order \(3\) or a normal subgroup of order \(4\).

First, consider \(n_2=|\mathrm{Syl}_2(G)|\). By the Main Theorem of Sylow Theory, \(n_2 \equiv 1 \!\pmod 2\) and \(n_2\) divides \(3\). This gives us \(n_2 \in \{ 1, 3 \}\). Similarly, \(n_3 = |\mathrm{Syl}_3(G)|\) satisfies

\[ n_3 \equiv 1 \!\!\pmod 3 \quad \textrm{and} \quad n_3 \mid 4, \]

so \(n_3 \in \{1, 4 \}\). If either of these numbers is \(1\), we have a unique subgroup of order \(4\) or of order \(3\), and such a subgroup must be normal.

Suppose that \(n_3 \neq 1\), which leaves us with \(n_3 = 4\). Let \(P_1\), \(P_2\), \(P_3\), and \(P_4\) be the Sylow \(3\)-subgroups of \(G\). Consider any \(i \neq j\). Since \(P_i \cap P_j\) is a subgroup of \(P_i\), its order must divide \(3\). On the other hand, \(P_i\) and \(P_j\) are distinct groups of order \(3\), so \(|P_i \cap P_j| < 3\), and we conclude that \(|P_i \cap P_j| = 1\). Therefore, \(P_i \cap P_j = \{e\}\) for all \(i \neq j\). Thus the set

\[ T := \bigcup_{i=1}^4 P_i \]

has \(9\) elements: the identity \(e\) and \(8\) other distinct elements. Since each \(P_i\) has order \(3\), those \(8\) elements must all have order \(3\). Note, moreover, that any other potential element of order \(3\) would generate its own Sylow \(3\)-subgroup, so this is a complete count of all the elements of order \(3\). We conclude that there are \(8\) elements of order \(3\) in \(G\).

In particular, there are \(9\) elements in \(G\) that are either the identity or have order \(3\), and thus there are only \(12-9=3\) elements in \(G\) of order not \(3\), say \(a, b, c\).

Now consider any Sylow \(2\)-subgroup \(Q\), which has \(4\) elements. None of its elements has order \(3\), so we must have \(Q = \{ e, a, b, c \}\). In particular, this shows that there is a unique Sylow \(2\)-subgroup, which must then be normal.
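The group \(A_4\) realizes the case \(n_3 = 4\), so the count above can be checked concretely there. The following sketch (an optional aside, with permutations encoded as tuples) counts the elements of order \(3\) in \(A_4\) and confirms that the remaining four elements form a subgroup, the unique Sylow \(2\)-subgroup:

```python
# In A_4 there are 8 elements of order 3; the other 4 elements form a
# subgroup (the Klein four-group), the unique Sylow 2-subgroup.
from itertools import permutations

def compose(p, q):
    return tuple(p[q[i]] for i in range(len(q)))

def is_even(p):
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p))
                     if p[i] > p[j])
    return inversions % 2 == 0

def order(p):
    e = tuple(range(len(p)))
    q, k = p, 1
    while q != e:
        q, k = compose(q, p), k + 1
    return k

A4 = [p for p in permutations(range(4)) if is_even(p)]
order_three = [g for g in A4 if order(g) == 3]
Q = {g for g in A4 if order(g) != 3}  # identity and order-2 elements
closed = all(compose(a, b) in Q for a in Q for b in Q)
print(len(order_three), len(Q), closed)  # 8 4 True
```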

Remark 6.3.2 (Warning!)

In the previous example, it would not be so easy to count the elements of order \(2\) and \(4\). We do know that every element in

\[ S := \bigcup_i Q_i \]

has order \(1\), \(2\), or \(4\), but the size of this set is harder to calculate. The issue is that \(Q_i \cap Q_j\) might have order \(2\) for distinct \(i\) and \(j\). The best we can say for sure is that \(S\) has at least \(4 + 4 - 2 = 6\) elements.

More generally, if \(P\) and \(Q\) are both subgroups of \(G\) of prime order \(p\), we can say that \(P \cap Q = \{ e \}\) using the same argument we employed in the previous example. However, if \(P\) and \(Q\) are two subgroups of order \(p^e\) with \(e \geqslant 2\), we can no longer guarantee that \(P \cap Q = \{ e \}\).

Example 6.3.3 (No simple groups of order \(80\))

Let \(G\) be a group of order \(80=5 \cdot 16\), and let \(n_2 = |\mathrm{Syl}_2(G)|\) and \(n_5 = |\mathrm{Syl}_5(G)|\). By the Main Theorem of Sylow Theory,

\[ n_2 \equiv 1 \!\!\pmod 2 \quad \text{and} \quad n_2 \mid 5 \implies n_2 \in \{1, 5\} \]

and

\[ n_5 \equiv 1 \!\!\pmod 5 \quad \text{and} \quad n_5 \mid 16 \implies n_5 \in \{1, 16\}. \]

If either \(n_2 = 1\) or \(n_5=1\), then the unique Sylow \(2\)-subgroup or \(5\)-subgroup would be normal. If \(G\) is a simple group, then we must have

\[ n_2 = 5 \quad \text{and} \quad n_5 = 16. \]

While the counting trick we used in the previous example would work, let us try a different tactic here.

Consider the action of \(G\) on \(\mathrm{Syl}_2(G)\) by conjugation, and let

\[ \rho\!: G \to S_5 \]

be the associated permutation representation. The action is transitive by the Main Theorem of Sylow Theory, so the map \(\rho\) is nontrivial and \(\ker(\rho) \neq G\). The image \(\mathrm{im}(\rho)\) is a subgroup of \(S_5\), and thus by Lagrange’s Theorem its order divides \(|S_5| = 120\). However, \(|G|=80\) does not divide \(120\), so the image of \(\rho\) cannot have \(80\) elements, and in particular \(\rho\) cannot be injective. It follows that \(\ker(\rho)\) is a nontrivial, proper normal subgroup of \(G\), a contradiction.

7. Products and finitely generated abelian groups

In this chapter we discuss some important ways to build new groups out of smaller pieces, and conversely, how to break some groups down into smaller pieces.

7.1 Direct products of groups

Definition 7.1.1

Let \(I\) be a set and consider a group \(G_i\) for each \(i \in I\). The direct product of the groups \(\{ G_i \}_{i \in I}\), denoted by \[ \prod_{i \in I} G_i, \] is the group with underlying set the Cartesian product \[ \prod_{i \in I} G_i \] equipped with the operation defined by \[ (g_i)_{i \in I} (h_i)_{i \in I} = (g_i h_i)_{i \in I}. \]

The direct sum of the groups \(G_i\) is the subgroup of the direct product of \(\{ G_i \}_{i \in I}\) given by \[ \bigoplus_{i \in I} G_i := \{(g_i)_{i \in I} \in \prod_{i \in I} G_i \mid g_i = e_{G_i} \text{ for all but finitely many } i \in I \}. \] In particular, the direct sum of \(\{ G_i \}_{i \in I}\) has the same operation as the direct product.

When \(I\) is finite, say \(I = \{ 1, \ldots, n \}\), we write \[ G_1 \times \cdots \times G_n := \prod_{i = 1}^n G_i. \]

Remark 7.1.2

When \(I\) is finite, the direct sum and the direct product of \(\{ G_i \}_{i \in I}\) coincide. This is the case we will be most interested in.

Exercise 7.1.3

Show that the direct product of a collection of groups is a group, and that the direct sum is a subgroup of the direct product.

Remark 7.1.4

If \(G_1, \ldots, G_n\) are all finite groups, then \[ |G_1 \times \cdots \times G_n| = |G_1| \cdots |G_n|. \]

Exercise 7.1.5

Let \(\{ G_i \}_{i \in I}\) be a collection of abelian groups. Show that \[ \prod_{i \in I} G_i \] is an abelian group.

Exercise 7.1.6

Let \(G\) and \(H\) be groups, and \(g \in G\) and \(h \in H\).

  1. Show that if \(|g|\) and \(|h|\) are both finite, then \(|(g,h)| = \operatorname{lcm}(|g|,|h|)\).
  2. Show that if at least one of \(g\) or \(h\) has infinite order, then \((g,h)\) also has infinite order.

Lemma 7.1.7 (CRT)

If \(\gcd(m,n)=1\), then \(\mathbb{Z}/m \times \mathbb{Z}/n\cong \mathbb{Z}/mn\).

Proof (of Lemma 7.1.7)

By the exercise above, \[ |(1,1)|=\operatorname{lcm}(m,n)=mn. \] But \(\mathbb{Z}/m\times \mathbb{Z}/n\) has order \(mn\), so \((1,1)\) is a generator for the group, which must then be cyclic. By the structure theory of finite cyclic groups, all cyclic groups of order \(mn\) are isomorphic to \(\mathbb{Z}/mn\), so \[ \mathbb{Z}/m\times \mathbb{Z}/n\cong \mathbb{Z}/mn. \]
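The key step, \(|(1,1)| = \operatorname{lcm}(m,n)\), is easy to test numerically. A small sketch (an optional aside):

```python
# The order of (1, 1) in Z/m x Z/n is the least k > 0 with k = 0 mod m
# and k = 0 mod n, i.e. lcm(m, n); it equals mn exactly when gcd(m, n) = 1.
from math import gcd

def order_of_one_one(m, n):
    k = 1
    while (k % m, k % n) != (0, 0):  # k * (1, 1) = (k mod m, k mod n)
        k += 1
    return k

for m, n in [(4, 9), (5, 7), (4, 6), (6, 10)]:
    o = order_of_one_one(m, n)
    assert o == m * n // gcd(m, n)  # the lcm
    print(m, n, o, o == m * n)
```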

Exercise 7.1.8

Show that the converse holds: for all integers \(m, n > 1\), if \[ \mathbb{Z}/m\times \mathbb{Z}/n\cong \mathbb{Z}/mn, \] then \(\gcd(m,n) = 1\).

Sometimes it is convenient to write the CRT in terms of prime factorization, as follows:

Theorem 7.1.9 (CRT)

Suppose \(m = p_1^{e_1} \cdots p_l^{e_l}\) for distinct primes \(p_1, \dots, p_l\). Then there is an isomorphism \[ \mathbb{Z}/m \cong \mathbb{Z}/(p_1^{e_1}) \times \cdots \times \mathbb{Z}/(p_l^{e_l}). \]

Recall that we showed that given a group \(G\) and subgroups \(H\) and \(K\), if \(H\) is normal then \(HK\) is a subgroup of \(G\). In fact, we can say more:

Theorem 7.1.10 (Recognition theorem for direct products)

Suppose \(G\) is a group with normal subgroups \(H \trianglelefteq G\) and \(K \trianglelefteq G\) such that \(H\cap K=\{e\}\). Then \(HK\cong H\times K\) via the isomorphism \(\theta\!: H \times K \to HK\) given by \[ \theta(h,k) = hk. \] Moreover, \[ H \cong \{(h,e)\mid h\in H\}\leq H\times K \] and \[ K\cong \{(e,k)\mid k\in K\}\leq H\times K. \]

Proof (of Theorem 7.1.10)

The hypothesis implies \(HK \leq G\). Moreover, consider any \(h \in H\) and any \(k \in K\). Since \(H\) is a normal subgroup, \[ khk^{-1} \in H, \] so also \[ [k,h]=khk^{-1}h^{-1}\in H. \] But \(K\) is also a normal subgroup, so similarly we obtain \[ [k,h]\in K. \] Therefore, \[ [k,h] \in H \cap K = \{ e \}, \] so \([k,h]= e\). We conclude that \[ hk = kh \quad \textrm{ for all } h\in H, k\in K. \]

The function \(\theta\) defined above must then satisfy \[ \begin{aligned} \theta((h_1, k_1) (h_2, k_2)) & = \theta(h_1h_2, k_1k_2) \\ & = (h_1h_2)(k_1k_2) \\ & = h_1(h_2k_1)k_2\\ & = (h_1k_1)(h_2k_2) \\ & = \theta(h_1, k_1) \theta(h_2, k_2) \end{aligned} \] and thus \(\theta\) is a homomorphism. Its kernel is \[ \ker(\theta) = \{(h,k) \mid k = h^{-1} \} = \{ (e,e) \} \] since \(H \cap K = \{e\}\). Moreover, \(\theta\) is surjective, as any element of \(HK\) is of the form \(hk\), and \[ \theta(h,k) = hk. \] This proves \(\theta\) is an isomorphism. Finally, restricting the codomain to any subgroup \(L\) of \(HK\) and the domain to \(\theta^{-1}(L)\) gives an isomorphism between \(\theta^{-1}(L)\) and \(L\), so in particular \[ H \cong \theta^{-1}(H)=\{(h,e)\mid h\in H\}\leq H\times K \] and \[ K\cong \theta^{-1}(K)=\{(e,k)\mid k\in K\}\leq H\times K. \]

Remark 7.1.11

If \(H\trianglelefteq G\) and \(K\trianglelefteq G\) are such that \(H\cap K=\{e\}\), then each element of \(HK\) can be written uniquely in the form \(hk\) with \(h \in H\) and \(k \in K\). This is a consequence of the fact that the map \(\theta\) is a bijection.

Definition 7.1.12

Let \(G\) be a group. If \(H \trianglelefteq G\) and \(K \trianglelefteq G\) are such that \(H \cap K=\{e\}\), then the subgroup \(HK\) of \(G\) is called the internal direct product of \(H\) and \(K\), while the group \(H\times K\) is called the external direct product of \(H\) and \(K\).

Example 7.1.13

Let \(G = D_{n}\), \(H = \langle r \rangle\) and \(K = \langle s \rangle\). Then \(H \cap K = \{e \}\), \(HK = G\), and \(H \trianglelefteq G\), but \(K\) is not normal in \(G\). So the Theorem does not apply to say that \(G\) is isomorphic to \(H \times K\). In fact, \(G\) is not isomorphic to \(H \times K\), since \(H \times K\) is abelian, while \(G\) is not. As we shall see, \(G\) is the semidirect product of \(H\) and \(K\).

7.2 Semidirect products

Remark 7.2.1

Let \(G\) be a group. Suppose we are given subgroups \(H\trianglelefteq G\) and \(K\leq G\) such that \(H\cap K=\{e\}\) but \(K\) is not normal. Then we still have \(HK \leq G\), but it is not necessarily true that the map \(\theta: H \times K \to HK\) defined by \(\theta(h,k) = hk\) is a group homomorphism. The issue is that given \(h \in H\) and \(k \in K\), while \[ khk^{-1} \in H \implies kh = h'k \textrm{ for some } h' \in H, \] we can no longer guarantee that \(kh=hk\). So given \(h_1, h_2 \in H\) and \(k_1, k_2 \in K\), suppose that \(k_1h_2 = h'_2k_1\). For \(\theta\) to be a homomorphism, we would need the following: \[ \theta(h_1, k_1) \theta(h_2, k_2) = (h_1k_1)(h_2k_2) = h_1h'_2k_1k_2=\theta(h_1h_2',k_1k_2). \] Thus we would need \[ (h_1, k_1)(h_2, k_2)=(h_1h_2',k_1k_2). \]

This motivates the following definition:

Definition 7.2.2

Let \(H\) and \(K\) be groups and let \(\rho\!: K \to \mathrm{Aut}(H)\) be a homomorphism. The (external) semidirect product induced by \(\rho\) is the set \(H \times K\) equipped with the binary operation defined by \[ (h_1,k_1)(h_2,k_2) := (h_1\rho(k_1)(h_2),k_1k_2). \] This group is denoted by \(H \rtimes_\rho K\).

The underlying set of \(H \rtimes_\rho K\) is the same as the direct product, but it is the operation that differs.

Remark 7.2.3

Note in particular that if \(H\) and \(K\) are finite, then \(|H \rtimes_\rho K| = |H| \cdot |K|\).

The proof that the semidirect product is indeed a group is straightforward but a bit messy, as we need to check all the group axioms.

Theorem 7.2.4

If \(H\) and \(K\) are groups and \(\rho\!: K \to \mathrm{Aut}(H)\) is a homomorphism, then \(H \rtimes_\rho K\) is a group.

Proof (of Theorem 7.2.4)

First, we show that the operation is associative. Indeed, \[ \begin{aligned} (y_1,x_1) \left( (y_2, x_2) (y_3, x_3) \right) & = (y_1,x_1) (y_2\rho(x_2)(y_3), x_2x_3) \\ & = (y_1\rho(x_1)\left(y_2\rho(x_2)(y_3)\right), x_1x_2x_3)\\ & = (y_1\rho(x_1)(y_2)(\rho(x_1)\circ \rho(x_2))(y_3), x_1x_2x_3)\\ & = (y_1\rho(x_1)(y_2)\rho(x_1x_2)(y_3), x_1x_2x_3)\\ & = (y_1 \rho(x_1)(y_2), x_1 x_2) (y_3, x_3) \\ & = \left( (y_1,x_1) (y_2, x_2) \right) (y_3, x_3). \end{aligned} \] To show that \((e,e)\) is a two-sided identity, consider any \(h \in H\) and \(k \in K\). Since \(\rho(k)\) is a homomorphism, \(\rho(k)(e) = e\), and thus \[ (h,k)(e,e) = (h\rho(k)(e),ke) = (he,ke) = (h,k). \] Moreover, since \(\rho\) is a homomorphism, \(\rho(e) = \mathrm{id}_H\), and thus \(\rho(e)(y) = \mathrm{id}_H(y)=y\) for any \(y \in H\), so that \[ (e,e)(h,k) = (e\rho(e)(h),ek) = (eh,ek)=(h,k). \] Finally, for any \(x \in H\) and \(y \in K\) we have \[ \begin{aligned} (x,y) (\rho(y^{-1})(x^{-1}), y^{-1}) &= (x \, \rho(y)\left(\rho(y^{-1})(x^{-1}) \right), yy^{-1})\\ & = (x (\rho(y) \circ \rho(y^{-1}))(x^{-1}), e) \\ &= (x \rho(e)(x^{-1}), e) & \textrm{since \(\rho\) is a homomorphism} \\ & = (xx^{-1},e) & \text{since } \rho(e) = \mathrm{id}_H\\ & =(e,e), \end{aligned} \] and similarly, \[ \begin{aligned} (\rho(y^{-1})(x^{-1}), y^{-1}) (x,y) &= (\rho(y^{-1})(x^{-1}) \rho(y^{-1})(x),y^{-1}y) \\ & = (\rho(y^{-1})(x^{-1}x), e) & \textrm{since } \rho(y^{-1}) \text{ is a homomorphism} \\ & = (\rho(y^{-1})(e), e) \\ & = (e, e) & \textrm{since } \rho(y^{-1}) \text{ is a homomorphism.} \end{aligned} \] Thus \((x,y)\) has an inverse, given by \[ (x,y)^{-1} = (\rho(y^{-1})(x^{-1}) ,y^{-1}). \] This completes the proof that the semidirect product is a group.
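To make the axioms concrete, here is a minimal Python sketch (all names are ours, purely illustrative) that brute-force-checks the group axioms for one small semidirect product, \(\mathbb{Z}/5 \rtimes_\rho \mathbb{Z}/2\), where \(\rho\) sends the generator of \(\mathbb{Z}/2\) to the inversion automorphism:

```python
from itertools import product

n = 5  # H = Z/n (written additively), K = Z/2; rho(1) is the inversion h -> -h

def rho(k):
    """rho: K -> Aut(H); rho(0) = id, rho(1) = inversion."""
    return (lambda h: h % n) if k % 2 == 0 else (lambda h: -h % n)

def mul(a, b):
    """The semidirect product operation (h1, k1)(h2, k2) = (h1 + rho(k1)(h2), k1 + k2)."""
    (h1, k1), (h2, k2) = a, b
    return ((h1 + rho(k1)(h2)) % n, (k1 + k2) % 2)

def inv(g):
    """The inverse formula (x, y)^{-1} = (rho(y^{-1})(x^{-1}), y^{-1}), additively."""
    x, y = g
    return (rho(-y)(-x % n), -y % 2)

G = list(product(range(n), range(2)))
e = (0, 0)
assert all(mul(mul(a, b), c) == mul(a, mul(b, c)) for a in G for b in G for c in G)
assert all(mul(e, g) == g == mul(g, e) for g in G)
assert all(mul(g, inv(g)) == e == mul(inv(g), g) for g in G)
```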

Example 7.2.5

Given any two groups \(H\) and \(K\), we can always take \(\rho\) to be the trivial homomorphism. In that case, \(H \rtimes_\rho K\) is just the usual direct product: for all \(h \in H\) and all \(k \in K\), \(\rho(k) = \mathrm{id}_H\), so \[ (h,k)(h',k') = (h\rho(k)(h'),kk') = (hh',kk'). \]

Theorem 7.2.6

Given groups \(H\) and \(K\) and a homomorphism \(\rho\!: K \to \mathrm{Aut}(H)\), the groups \(H\) and \(K\) are isomorphic to subgroups of \(H \rtimes_\rho K\), as follows: \[ H \cong \{(h,e)\mid h\in H\} \trianglelefteq H \rtimes_\rho K \text{ and } K \cong \{(e,k)\mid k\in K\} \leq H \rtimes_\rho K. \] Moreover, \[ \frac{(H \rtimes_\rho K )}{\{(h,e)\mid h\in H\}} \cong K. \]

Proof (of Theorem 7.2.6)

Consider the function \(i\!: H \to H \rtimes_\rho K\) given by \[ i(y) = (y, e). \] Then \(i\) is a homomorphism: \[ i(y_1) i(y_2) = (y_1,e)(y_2,e) = (y_1\rho(e)(y_2) , ee) = (y_1 y_2, e) = i(y_1y_2). \] Moreover, \(i\) is injective by construction, and hence its image is isomorphic to \(H\) by the First Isomorphism Theorem. We can describe \(\mathrm{im}(i)\) as the set of all elements whose second component is \(e\). The image \(\mathrm{im}(i)\) is normal since the second component of \[ (h,k) (a, e) (h,k)^{-1} = (h,k) (a, e) (\rho(k^{-1})(h^{-1}), k^{-1}) \] is \[ kek^{-1} = e, \] which shows that for any \((a,e) \in \mathrm{im}(i)\) and any \((h,k) \in H \rtimes_\rho K\), \[ (h,k) (a, e) (h,k)^{-1} \in \mathrm{im}(i). \] Let us write the image of \(i\), which we now know is a normal subgroup of \(H \rtimes_\rho K\), as \[ H' := \mathrm{im}(i) = \{(y,e) \mid y \in H\} \trianglelefteq H \rtimes_\rho K. \] Similarly, the function \[ j\!: K \to H \rtimes_\rho K \quad \text{given by } j(x) = (e,x) \] is also an injective homomorphism (exercise!), and thus its image \[ K' := \{(e,x) \mid x \in K \} \leq H \rtimes_\rho K \] is isomorphic to \(K\). Finally, given any \((h,k) \in H \rtimes_{\rho} K\), we can write \[ (h,k) = (h\rho(e)(e),k) = (h,e)(e,k) \in H'K', \] so \(H'K'= H \rtimes_\rho K\).

Consider the projection onto the second factor \[ \pi_2\!:H \rtimes_\rho K \to K, \] which is the map given by \[ \pi_2(x,y)=y. \] This is a group homomorphism, since the second component of \((x_1,y_1)(x_2,y_2)\) is \(y_1y_2\), and thus \[ \begin{aligned} \pi_2( (x_1,y_1)(x_2,y_2) ) = y_1y_2 = \pi_2(x_1,y_1) \, \pi_2(x_2,y_2). \end{aligned} \] Moreover, \(\pi_2\) is surjective by definition. Finally, \[ \ker(\pi_2)=\{(y,e_K)\mid y\in H\}=H'\cong H. \] By the First Isomorphism Theorem, we conclude that \[ (H \rtimes_\rho K )/ H' \cong K. \]

In the Theorem, we showed that \(\{(h,e)\mid h \in H\}\) is a normal subgroup of \(H \rtimes_\rho K\). However, \(\{(e,k)\mid k\in K\}\) is typically not a normal subgroup of \(H \rtimes_\rho K\). We will see a concrete example of this below.

Studying semidirect products is a great motivation for studying automorphism groups.

Exercise 7.2.8

Let \(C_n\) denote the cyclic group of order \(n \geqslant 2\), and consider the group \[ (\mathbb{Z}/n)^\times = \{ [j]_n \mid \gcd(j,n)=1\} \] with the binary operation given by the usual multiplication. Prove that \[ \mathrm{Aut}(C_n) \cong (\mathbb{Z}/n)^\times. \]

Remark 7.2.9

We can now count the number of elements in \(\mathrm{Aut}(C_n)\), since it is the number of integers \(1 \leqslant i <n\) that are coprime with \(n\). This number is given by what is known as the Euler \(\varphi\) function, \[ \varphi(n) = n \prod_{p \mid n} \left( 1 - \frac{1}{p} \right). \] Equivalently, if \(n = p_1^{a_1} \cdots p_k^{a_k}\), where \(p_1, \ldots, p_k\) are distinct primes and each \(a_i \geqslant 1\), then \[ \varphi(n) = \prod_{i=1}^k \left( p_i^{a_i-1} (p_i - 1) \right). \] In particular, if \(p\) is prime then \(| \mathrm{Aut}(\mathbb{Z}/p) | = p-1\).
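The product formula is easy to compute with. Here is a small Python sketch (the function name is ours), cross-checked against a brute-force count of the units of \(\mathbb{Z}/n\):

```python
from math import gcd

def phi(n):
    """Euler's totient via the product formula phi(n) = n * prod_{p | n} (1 - 1/p)."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            result -= result // p  # multiply result by (1 - 1/p), staying in integers
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:  # leftover prime factor
        result -= result // m
    return result

# phi(n) counts 1 <= j < n with gcd(j, n) = 1, i.e. |(Z/n)^x| = |Aut(C_n)|.
assert all(phi(n) == sum(1 for j in range(1, n) if gcd(j, n) == 1) for n in range(2, 200))
assert phi(7) == 6  # for p prime, |Aut(Z/p)| = p - 1
```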

Remark 7.2.10

We will show next semester that if \(p\) is prime, then \(\mathrm{Aut}(C_p) \cong (\mathbb{Z}/p)^\times\) is cyclic of order \(p-1\). We will, however, use this important fact now.

Exercise 7.2.11

Let \(p\) be a prime integer. Show that \[ \mathrm{Aut}(\underbrace{\mathbb{Z}/p\times \cdots\times\mathbb{Z}/p}_{n \text{ factors}})\cong \mathrm{GL}_n(\mathbb{Z}/p) \] and that these groups have order \((p^n-1)(p^n-p)(p^n-p^2)\cdots(p^n-p^{n-1})\).
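For small cases, the order formula in the exercise can be verified directly. The sketch below (illustrative only; the helper names are ours) brute-forces \(2\times 2\) matrices over \(\mathbb{Z}/p\), using the fact that a \(2\times 2\) matrix is invertible over \(\mathbb{Z}/p\) if and only if its determinant is nonzero mod \(p\):

```python
from itertools import product

def count_invertible(p):
    """Brute-force count of invertible 2 x 2 matrices over Z/p."""
    count = 0
    for a, b, c, d in product(range(p), repeat=4):
        if (a * d - b * c) % p != 0:  # invertible iff det != 0 mod p
            count += 1
    return count

def formula(n, p):
    """(p^n - 1)(p^n - p) ... (p^n - p^(n-1))."""
    result = 1
    for i in range(n):
        result *= p**n - p**i
    return result

for p in (2, 3, 5):
    assert count_invertible(p) == formula(2, p)
# e.g. |GL_2(Z/2)| = (4 - 1)(4 - 2) = 6
assert formula(2, 2) == 6
```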

To better understand semidirect products, we should also better understand what it means to have a homomorphism \(K \to \mathrm{Aut}(H)\).

Definition 7.2.12

Let \(G\) and \(H\) be groups. A (left) action of \(G\) on \(H\) via automorphisms is a pairing \(G \times H \to H\), written as \((g,h) \mapsto g \cdot h\), such that

  • For all \(g_1, g_2 \in G\) and \(h \in H\), \(g_1 \cdot (g_2 \cdot h) = (g_1 \cdot _G g_2) \cdot h\) .
  • For all \(h \in H\), \(e_G \cdot h = h\).
  • For all \(g \in G\) and all \(h_1, h_2 \in H\), \(g \cdot (h_1 \cdot_H h_2) = (g \cdot h_1) \cdot_H (g \cdot h_2)\).

Remark 7.2.13

Note that the first two axioms are just the axioms for a group action. So given a group action of \(G\) on \(H\), let \(\rho\!: G \to \mathrm{Perm}(H)\) be the corresponding permutation representation. If the action satisfies the third axiom, then that means that for each \(g \in G\), \(\rho(g)\) satisfies \[ \rho(g) (h_1 \cdot_H h_2) = \rho(g)(h_1) \, \rho(g)(h_2). \] This condition simply says that \(\rho(g)\) must be a homomorphism. Since \(\rho(g)\) is already a bijection, we conclude that \(\rho(g)\) must be an automorphism of \(H\). Conversely, given any homomorphism \(\rho\!: K \to \mathrm{Aut}(H)\), we can define a group action of \(K\) on \(H\) via automorphisms by setting \[ k \cdot h := \rho(k)(h). \] Since \(\mathrm{Aut}(H) \subseteq \mathrm{Perm}(H)\), we can extend \(\rho\) to a homomorphism \(K \to \mathrm{Perm}(H)\), which we saw is equivalent to the action of \(K\) on \(H\) we just defined. That action satisfies \[ \begin{aligned} k \cdot (h_1 \cdot_H h_2) & = \rho(k)(h_1 \cdot_H h_2) \\ & = \rho(k)(h_1) \cdot_H \rho(k)(h_2) & \text{since \(\rho\) is a homomorphism} \\ & = (k \cdot h_1) \cdot_H (k \cdot h_2) \end{aligned} \] In conclusion, we can now say that to give an action of \(G\) on \(H\) via automorphisms is to give a group homomorphism \[ \rho\!: G \to \mathrm{Aut}(H). \] Moreover, given a group \(K\) acting on a group \(H\) by automorphisms, we get an induced semidirect product \(H \rtimes_\rho K\), where \(\rho\!: K \to \mathrm{Aut}(H)\) is the corresponding homomorphism.

Exercise 7.2.14 (Conjugation action by automorphisms)

Fix a group \(G\), a normal subgroup \(H \trianglelefteq G\) and a subgroup \(K \leq G\). Show that the rule \[ k \cdot h = khk^{-1} \] for \(k \in K\) and \(h \in H\) determines an action of \(K\) on \(H\) via automorphisms, and the associated homomorphism \(\rho\!: K \to \mathrm{Aut}(H)\) is given by \[ \rho(k)(h) = khk^{-1}. \]

Now that we have a bit more context, let us look at some examples of semidirect products.

Example 7.2.15

Let \(K = \langle x \rangle\) be the cyclic group of order \(2\) and \(H = \langle y \rangle\) be the cyclic group of order \(n\) for some \(n \geqslant 2\). By the UMP for cyclic groups, to give a homomorphism out of \(K\) is to pick the image \(i\) of the generator \(x\), which must satisfy \(i^2 = e\). In particular, \(i\) must be either the identity or an element of order \(2\).

Since \(H\) is abelian, the inverse map \(f\!: H \longrightarrow H\) given by \(f(a)=a^{-1}\) is an automorphism of \(H\) (Exercise!). (In fact, we can say more: \(\mathrm{Aut}(H) \cong (\mathbb{Z}/n)^\times\). In particular, \(-1\) is an element of \((\mathbb{Z}/n)^\times\), and the associated automorphism sends \(y\) to \(y^{-1}\).) This automorphism \(f\) is not the identity but it is its own inverse, so it has order \(2\). Therefore, by the UMP for cyclic groups, there is a homomorphism \[ \rho\!: K \to \mathrm{Aut}(H) \quad \text{with} \quad \rho(x)(y) = y^{-1}. \] Consider the semidirect product \(G := H \rtimes_\rho K\). The elements of \(G\) are the tuples \((y^i, x^j)\) for \(0 \leqslant i \leqslant n-1\) and \(0 \leqslant j \leqslant 1\). In particular, \(|G|=2n\). Set \[ \tilde y = (y,e_K) \in G \quad \text{ and } \quad \tilde x = (e_H,x) \in G. \] Then \(\tilde y^n = (y,e_K)^n = (y^n,e_K) = (e_H, e_K)= e_G \) and \(\tilde x^2 = (e_H,x)^2 = (e_H, x^2) = (e_H, e_K)=~e_G\). Moreover, \[ \tilde x \tilde y \tilde x \tilde y = (e_H,x)(y,e_K)(e_H, x)(y,e_K) = (\rho(x)(y),x)(\rho(x)(y),x) = (y^{-1},x)(y^{-1},x) = (y^{-1}y, e) = e_G. \] Does this look familiar? Indeed, using our presentation for \(D_{n}\) from earlier and the UMP for presentations, we have a homomorphism \[ \theta\!: D_{n} \longrightarrow G \quad \text{given by } \quad \theta(r) = (y,e_K) \text{ and } \theta(s) = (e_H,x). \] Moreover, \(\theta\) is surjective since \[ \theta(r^is^j)=(y^i, x^j) \text{ for all } 0 \leqslant i \leqslant n-1, 0 \leqslant j \leqslant 1. \] Since \(|D_{n}|=|G|=2n\), this surjection must also be a bijection, and we conclude that \(\theta\) is an isomorphism. So the dihedral group is a semidirect product of the cyclic group of order \(n\) and the cyclic group of order \(2\): \[ D_{n} \cong \langle y \rangle \rtimes_\rho \langle x \rangle \] where \(\rho\) is the inverse map as described above.

So given any group, how can we recognize it is in fact a semidirect product?

Theorem 7.2.16 (Recognition theorem for internal semidirect products)

Let \(G\) be a group. Suppose we are given subgroups \(H\) and \(K\) of \(G\) such that \[ H \trianglelefteq G \qquad \text{and} \qquad H \cap K = \{e\}. \] Let \(\rho\!: K \to \mathrm{Aut}(H)\) be the permutation representation of the action of \(K\) on \(H\) via automorphisms given by conjugation in \(G\), meaning that \[ \rho(k)(h) = khk^{-1}. \] Then \[ HK \cong H \rtimes_\rho K \] via the isomorphism \(\theta\!: H \rtimes_\rho K \to G\) given by \(\theta(x,y) = xy\). Moreover, \[ H \cong \{(h,e) \in H \rtimes_\rho K \mid h\in H\} \quad \text{ and } \quad K \cong \{(e,k) \in H \rtimes_\rho K \mid k\in K\}. \]

Proof (of Theorem 7.2.16)

First, we show that \(\theta\) is a group homomorphism. Indeed, \[ \begin{aligned} \theta((y_1, x_1) (y_2, x_2)) & = \theta(y_1 \rho(x_1)(y_2), x_1 x_2) \\ & = y_1(x_1y_2x_1^{-1}) x_1 x_2 \\ & = y_1x_1y_2x_2 \\ & = \theta(y_1, x_1) \theta(y_2, x_2). \end{aligned} \] Since \(H \cap K = \{e\}\), the kernel of \(\theta\) is \[ \ker(\theta) = \{(y,x) \in H \rtimes_{\rho} K \mid y = x^{-1} \} = \{ (e,e) \}. \] By construction, the image of \(\theta\) is \(HK\). Therefore, \(\theta\) induces an isomorphism onto \(HK\). Finally, \[ \theta^{-1}(H) = \{(h,e)\mid h\in H\} \quad \text{ and } \quad \theta^{-1}(K) = \{(e,k)\mid k\in K\}. \]

Definition 7.2.17

Given subgroups \(H\) and \(K\) of \(G\) such that \(H \trianglelefteq G\), \(HK = G\), and \(H \cap K = \{e\}\), we say that \(G\) is the internal semidirect product of \(H\) and \(K\).

Example 7.2.18

Consider \(G = D_{n}\) and its subgroups \(H = \langle r \rangle\) and \(K = \langle s \rangle\). Then \(H \trianglelefteq G\), \(K \leq G\), \(HK = G\) and \(H \cap K = \{e\}\). By the recognition theorem, \(G \cong H \rtimes_{\rho} K\), where \(\rho\!: K \to \mathrm{Aut}(H)\) is given by \[ \rho(s)(r^i) = sr^is^{-1} = r^{n-i}. \] The last equality is from an earlier computation. Note in particular that \(K\) is not a normal subgroup of \(G\). We had already seen that \(G\) is not the internal direct product of \(H\) and \(K\), but we now know it is their internal semidirect product; this matches the external semidirect product description of \(D_n\) from the earlier example.

For a fixed pair of groups \(H\) and \(K\), different actions of \(K\) on \(H\) via automorphisms can result in isomorphic semidirect products. Indeed, determining when \(H \rtimes_{\rho} K \cong H \rtimes_{\rho'} K\) is in general a tricky business. Here is an example of this:

Example 7.2.19

Let \(n \geqslant 3\) and consider \(G = S_n\), \(H = A_n\), and \(K = \langle (1 \, 2) \rangle\). Then \(H \trianglelefteq G\), \(K \leq G\), \(HK = G\) and \(K \cap H = \{e\}\). Note that \(K \cong C_2\) is the cyclic group with \(2\) elements. By the recognition theorem, \[ S_n \cong A_n \rtimes_\rho C_2 \] where \(\rho\!: C_2 \longrightarrow \mathrm{Aut}(A_n)\) sends \(x\) to conjugation by \((1 \, 2)\). Similarly, we can also consider the subgroup \(H' = \langle (1 \, 3) \rangle = (1 \,3 \, 2) \langle (1 \, 2) \rangle (1 \,3 \, 2)^{-1}\) of \(S_n\), and we also have \[ S_n \cong A_n \rtimes_{\rho'} C_2 \] where \(\rho'\!: C_2 \to \mathrm{Aut}(A_n)\) sends \(x\) to conjugation by \((1 \,3)\).

However, the actions determined by \(\rho\) and \(\rho'\) are not identical: for \(n \geqslant 4\), for example, \[ \rho(x)\left((1 \, 2)(3 \, 4)\right) = (1 \, 2)(3 \, 4) \quad \text{ and } \quad \rho'(x)\left((1 \, 2)(3 \, 4)\right) = (2 \, 3)(1 \, 4). \] Yet \[ A_n \rtimes_{\rho} C_2 \cong S_n \cong A_n \rtimes_{\rho'} C_2. \]

One good reason why this happened in this case is that \(K\) and \(H'\) are conjugate in \(S_n\).

Exercise 7.2.20

Let \(K\) be a finite cyclic group and let \(H\) be an arbitrary group. Suppose \(\phi\!: K \to \mathrm{Aut}(H)\) and \(\theta\!: K \to \mathrm{Aut}(H)\) are homomorphisms whose images are conjugate subgroups of \(\mathrm{Aut}(H)\); that is, suppose there is \(\sigma \in \mathrm{Aut}(H)\) such that \(\sigma \phi(K) \sigma^{-1} = \theta(K)\). Then \(H \rtimes_\phi K \cong H \rtimes_\theta K\).

Example 7.2.21

Let \(K\) be a cyclic group of prime order \(p\) and \(H\) be a group such that \(\mathrm{Aut}(H)\) has a unique subgroup of order \(p\). Suppose \(\phi\!: K \to \mathrm{Aut}(H)\) and \(\theta\!: K \to \mathrm{Aut}(H)\) are any two nontrivial maps. Then \(\phi\) and \(\theta\) are injective, since each kernel is a normal subgroup of the simple group \(K\) that is not all of \(K\), hence trivial. Thus the images of \(\phi\) and \(\theta\) are both the unique subgroup of \(\mathrm{Aut}(H)\) of order \(p\), and in particular they must be equal. The Exercise then applies to give \(H \rtimes_\phi K \cong H \rtimes_\theta K\).

Remark 7.2.22

If \(\rho\!: K \longrightarrow \mathrm{Aut}(H)\) is a nontrivial homomorphism, then the semidirect product \(H \rtimes_\rho K\) is never abelian. Indeed, all we need is to consider any \(k \in K\) such that \(\rho(k) \neq \mathrm{id}_H\), so that \(\rho(k)(h) \neq h\) for some \(h \in H\), and note that \[ (e,k)(h,e) = (\rho(k)(h),k) \quad \text{while} \quad (h,e)(e,k) = (h\rho(e)(e),k) = (h,k). \]

Thus we can use semidirect products to construct nonabelian groups. Given an integer \(n \geqslant 2\), to construct a nonabelian group we might set out to find groups \(K\) and \(H\) such that \[ |K| |H| = n \] and such that there exists a nontrivial homomorphism \[ \rho\!: K \to \mathrm{Aut}(H). \]

7.3 Finitely generated abelian groups

Recall that a group \(G\) is finitely generated if \(G=\langle A \rangle\) for some finite set \(A\).

Remark 7.3.1

Any finite group \(G\) is finitely generated, since we can take \(A=G\). However, a finitely generated group need not be finite: for example, \(\mathbb{Z}\) is cyclic, hence finitely generated, but infinite.

The main theorem of this section is a special case of a much more general theorem we will prove in the Spring: the classification of finitely generated modules over PIDs. Thus we leave the proof for next semester.

Theorem 7.3.2 (Fundamental Theorem of Finitely Generated Abelian Groups: Invariant Factor Form)

Let \(G\) be a finitely generated abelian group. There exist integers \(r \geqslant 0\), \(t \geqslant 0\), and \(n_i \geqslant 2\) for \(1 \leqslant i \leqslant t\), satisfying \(n_1 \mid n_2 \mid \cdots \mid n_t\) such that \[ G \cong \mathbb{Z}^r \times \mathbb{Z}/n_1 \times \dots \times \mathbb{Z}/n_t. \] Moreover, the list \(r,n_1,\ldots , n_t\) is uniquely determined by \(G\).

Definition 7.3.3

In the Fundamental Theorem of Finitely Generated Abelian Groups: Invariant Factor Form, the number \(r\) is the rank of \(G\), the numbers \(n_1,\ldots,n_t\) are the invariant factors of \(G\), and the decomposition of \(G\) in this form is the invariant factor decomposition of \(G\).

Remark 7.3.4

A finitely generated abelian group is finite if and only if its rank is \(0\). A special case of the classification theorem is that if \(G\) is a finite abelian group then \[ G \cong \mathbb{Z}/n_1 \times \dots \times \mathbb{Z}/n_t \] for a unique list of integers \(n_i \geqslant 2\) such that \(n_1 | n_2 | \cdots | n_t\).

Here is another version of the classification theorem:

Theorem 7.3.5 (Fundamental Theorem of Finitely Generated Abelian Groups: Elementary Divisor Form)

Let \(G\) be a finitely generated abelian group. Then there exist integers \(r \geqslant 0\) and \(s \geqslant 0\), not necessarily distinct positive prime integers \(p_1, \cdots, p_s\), and integers \(a_i \geqslant 1\) for \(1 \leqslant i \leqslant s\) such that \[ G \cong \mathbb{Z}^r\times \mathbb{Z}/p_1^{a_1} \times \cdots \times \mathbb{Z}/p_s^{a_s}. \] Moreover, \(r\) and \(s\) are uniquely determined by \(G\), and the list of prime powers \(p_1^{a_1}, \dots, p_s^{a_s}\) is unique up to the ordering.

Definition 7.3.6

In the Fundamental Theorem of Finitely Generated Abelian Groups: Elementary Divisor Form, the number \(r\) is the rank of \(G\), the \(p_i^{a_i}\) are the elementary divisors of \(G\), and the decomposition of \(G\) is called the elementary divisor decomposition of \(G\).

The two forms of the classification theorem are equivalent, which we can prove using the CRT. Rather than giving a careful proof of this equivalence, we will now see in examples how the CRT allows us to go between invariant factors and elementary divisors.

Example 7.3.7 (Converting elementary divisors to invariant factors)

Suppose \(G\) is a finitely generated abelian group of rank \(3\) with elementary divisors \(4, 8, 9, 27, 25\). This means that \[ G \cong \mathbb{Z}^{3} \times \mathbb{Z}/4 \times \mathbb{Z}/8 \times \mathbb{Z}/9 \times \mathbb{Z}/27 \times \mathbb{Z}/25. \] By the CRT, \[ \mathbb{Z}/8 \times \mathbb{Z}/27 \times \mathbb{Z}/25 \cong \mathbb{Z}/(8 \cdot 27 \cdot 25) \quad \text{and} \quad \mathbb{Z}/4\times \mathbb{Z}/9 \cong \mathbb{Z}/(4 \cdot 9), \] so that \[ G \cong \mathbb{Z}^{3} \times \mathbb{Z}/(8 \cdot 27 \cdot 25) \times \mathbb{Z}/(4 \cdot 9) = \mathbb{Z}^{3} \times \mathbb{Z}/5400 \times \mathbb{Z}/36. \] Since \(36 \mid 5400\), we conclude that \(G\) has rank \(3\) and invariant factors \(36\) and \(5400\).

Example 7.3.8 (Converting invariant factors to elementary divisors)

Let \[ G \cong \mathbb{Z}^{4} \times \mathbb{Z}/6 \times \mathbb{Z}/36 \times \mathbb{Z}/180. \] Then by the CRT, \[ G \cong \mathbb{Z}^{4} \times \mathbb{Z}/2 \times \mathbb{Z}/3 \times \mathbb{Z}/4 \times \mathbb{Z}/9 \times \mathbb{Z}/4 \times \mathbb{Z}/5 \times \mathbb{Z}/9, \] which is the elementary divisor form for \(G\).

Example 7.3.9

Let \(G = \mathbb{Z}/60 \times \mathbb{Z}/50\). This group is finite and abelian, and thus \(r=0\), but the decomposition above is in neither invariant factor form nor elementary divisor form.

Applying the CRT to \(60 = 12 \cdot 5 = 2^2 \cdot 3 \cdot 5\) and \(50 = 2 \cdot 5^2\), we have \[ \mathbb{Z} / 60 \cong \mathbb{Z} / 4 \times \mathbb{Z} / 3 \times \mathbb{Z} / 5 \quad \text{and} \quad \mathbb{Z}/50 \cong \mathbb{Z}/2 \times \mathbb{Z}/25 \] so \[ G \cong \mathbb{Z}/2 \times \mathbb{Z}/4 \times \mathbb{Z}/3 \times \mathbb{Z}/5 \times \mathbb{Z}/25. \] This gives the elementary divisor decomposition: \(G\) has rank \(0\) and elementary divisors \(2\), \(4\), \(3\), \(5\), and \(25\). Applying the CRT again, in a different way, gives \[ G \cong \mathbb{Z}/(4 \cdot 3 \cdot 25) \times \mathbb{Z}/(2 \cdot 5) = \mathbb{Z}/300 \times \mathbb{Z}/10. \] This is the invariant factor decomposition: \(G\) has rank \(0\) and invariant factors \(10\) and \(300\).
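The passage from elementary divisors to invariant factors can be mechanized: group the prime powers by prime, sort each list, and multiply across. Here is a Python sketch (the function name is ours, purely illustrative):

```python
from collections import defaultdict

def invariant_factors(elementary_divisors):
    """Combine a multiset of prime powers into invariant factors n_1 | n_2 | ... | n_t."""
    by_prime = defaultdict(list)
    for q in elementary_divisors:
        p = next(d for d in range(2, q + 1) if q % d == 0)  # the prime with q = p^a
        by_prime[p].append(q)
    for powers in by_prime.values():
        powers.sort(reverse=True)  # the largest power of each prime contributes to n_t
    t = max(len(v) for v in by_prime.values())
    factors = []
    for i in range(t):
        n = 1
        for powers in by_prime.values():
            if i < len(powers):
                n *= powers[i]
        factors.append(n)
    return sorted(factors)  # ascending, so the divisibility chain n_1 | ... | n_t holds

# Elementary divisors 2, 4, 3, 5, 25 combine to invariant factors 10, 300.
assert invariant_factors([2, 4, 3, 5, 25]) == [10, 300]
```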

This theorem makes classifying the finite abelian groups of a given order a very quick matter.

Example 7.3.10

Let us classify the abelian groups of order \(75\). First, note that \(75 = 5^2 \cdot 3\). The two possible elementary divisor decompositions are \[ \mathbb{Z}/25 \times \mathbb{Z}/3 \quad \text{ and } \quad \mathbb{Z}/5 \times \mathbb{Z}/5 \times \mathbb{Z}/3. \] Note that the two groups above are not isomorphic. This is part of the theorem, but to see this directly, note that there is an element of order \(25\) in \(\mathbb{Z}/25 \times \mathbb{Z}/3\), namely \(([1]_{25},[0]_3)\) whereas every element \((a,b,c)\in \mathbb{Z}/5 \times \mathbb{Z}/5 \times \mathbb{Z}/3\) has order \[ |(a,b,c)| =\mathrm{lcm}(|a|, |b|, |c|)\leqslant 3 \cdot 5 = 15, \] since \(|a|, |b| \in \{1,5\}\) and \(|c| \in \{1,3\}\).

Alternatively, the two possible invariant factor decompositions are \[ \mathbb{Z}/75 \quad \text{ or } \quad \mathbb{Z}/15 \times \mathbb{Z}/5. \] They are also not isomorphic, as the second option has no elements of order \(75\).
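The element-order argument used above is easy to check by machine. This sketch (function name ours, illustrative only) computes the largest element order in a finite direct product of cyclic groups, using \(|(a_1,\dots,a_k)| = \operatorname{lcm}_i\left(m_i/\gcd(a_i,m_i)\right)\):

```python
from itertools import product
from math import gcd, lcm

def max_order(moduli):
    """Largest element order in Z/m1 x ... x Z/mk."""
    best = 1
    for elt in product(*(range(m) for m in moduli)):
        # the order of a in Z/m is m / gcd(a, m); take the lcm across components
        best = max(best, lcm(*(m // gcd(a, m) for a, m in zip(elt, moduli))))
    return best

assert max_order([25, 3]) == 75    # Z/25 x Z/3 is cyclic of order 75
assert max_order([5, 5, 3]) == 15  # every element order is at most 15, so the groups differ
assert max_order([75]) == 75 and max_order([15, 5]) == 15  # same check, invariant factor form
```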

Remark 7.3.11

Let \(n = p_1^{e_1} \cdots p_k^{e_k}\) for distinct positive prime integers \(p_1, \dots, p_k\) and integers \(e_i \geqslant 1\). The classification of finitely generated abelian groups implies that there are \(p(e_1) \cdots p(e_k)\) isomorphism classes of abelian groups of order \(n\), where \(p(m)\) is the number of partitions of \(m\). For example, for \(n = 2^4 \cdot 3^5 \cdot 5^2\) there are \[ p(4) p(5) p(2) = 5 \cdot 7 \cdot 2 = 70 \] abelian groups of order \(n\) up to isomorphism.
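The partition numbers appearing here are easy to compute recursively; a short Python sketch (the function is ours, purely illustrative):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def partitions(m, largest=None):
    """Number of partitions of m into parts of size at most `largest` (default: no bound)."""
    if largest is None:
        largest = m
    if m == 0:
        return 1
    return sum(partitions(m - part, part) for part in range(1, min(m, largest) + 1))

assert (partitions(2), partitions(4), partitions(5)) == (2, 5, 7)
# Abelian groups of order 2^4 * 3^5 * 5^2, up to isomorphism:
assert partitions(4) * partitions(5) * partitions(2) == 70
```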

7.4 Classifying finite groups of a given order

We can now combine the ideas from Sylow theory, (semi)direct products and the classification theorem for finitely generated abelian groups to classify the isomorphism classes of groups of a given order. You have already done some examples of this kind, such as the following problem set question:

Exercise 7.4.1

Show that any group of order \(6\) is isomorphic either to \(\mathbb{Z}/6\) or to \(D_3\).

Here is an example of the type of classification theorem we can prove.

Theorem 7.4.2

Let \(p<q\) be primes.

  1. If \(p\) does not divide \(q-1\), there is a unique group of order \(pq\) up to isomorphism, the cyclic group \(C_{pq}\).
  2. If \(p\) divides \(q-1\), there are exactly two groups of order \(pq\) up to isomorphism, the cyclic group \(C_{pq}\) and a nonabelian group.

Proof (of Theorem 7.4.2)

Let \(G\) be a group of order \(pq\) and let \(n_q = |\mathrm{Syl}_q(G)|\). Since \(n_q \equiv 1 \pmod{q}\), \(n_q \mid p\), \(p\) is prime, and \(q > p\), we must have \(n_q = 1\). Thus, the unique Sylow \(q\)-subgroup \(H\) is a normal subgroup.1

Now let \(K\) be a Sylow subgroup of order \(p\). Since \(H\) is normal, we know that \(HK\) is a subgroup of \(G\). By Lagrange's Theorem, \(|H\cap K|\) divides \(|H|\) and \(|H\cap K|\) divides \(|K|\). Therefore, \(H\cap K=\{e_G\}\). By an earlier exercise, \[ |HK| = \frac{|H||K|}{|H\cap K|} = \frac{q\cdot p}{1} = pq = |G| \] and so \(HK=G\). The recognition theorem for semidirect products thus yields that \[ G \cong H \rtimes_\rho K \] for some homomorphism \(\rho\!: K \longrightarrow \mathrm{Aut}(H)\). Note that \(H\) and \(K\) are cyclic, since they have prime order. Let us identify \(H\) with \(C_q = \langle x \mid x^q \rangle\) and \(K\) with \(C_p = \langle y \mid y^p \rangle\). Then \[ G \cong C_q \rtimes_\rho C_p \qquad \text{ for some homomorphism \(\rho: C_p \to \mathrm{Aut}(C_q)\).} \] We just need to classify all such semidirect products up to isomorphism. By the UMP of cyclic groups, the homomorphism \(\rho\!: C_p \longrightarrow \mathrm{Aut}(C_q)\) is uniquely determined by the image of the generator \(y\), which must be an element \(\alpha \in \mathrm{Aut}(C_q)\) with \(\alpha^p = \mathrm{id}\). Given such an \(\alpha\), we have \(\rho(y) = \alpha\) and more generally \(\rho(y^i) = \alpha^{i}\).

Recall that \(\mathrm{Aut}(C_q)\) is cyclic of order \(q-1\). On the other hand, the order of \(\mathrm{im}(\rho)\) divides both \(|C_p| = p\) and \(|\mathrm{Aut}(C_q)| = q-1\). In particular, there is a nontrivial homomorphism \(\rho\) if and only if \(p \mid q-1\).

If \(p\) does not divide \(q-1\), then \(\rho\) is trivial, and by an earlier Lemma and the CRT we have \[ G \cong C_q \times C_p \cong C_{pq}. \]

If \(p\) does divide \(q-1\), there is at least one nontrivial \(\rho\). We still have \(G \cong C_{pq}\) if \(\rho\) is trivial. When \(\rho\) is nontrivial, \(G\) is not abelian, giving us at least two isomorphism classes. It remains to show that if \(\rho_1\) and \(\rho_2\) are any two nontrivial homomorphisms from \(C_p\) to \(\mathrm{Aut}(C_q)\), then the resulting semidirect products are isomorphic.

Since \(\mathrm{Aut}(C_q)\) is a cyclic group and \(p\) divides its order, it has a unique subgroup of order \(p\). Thus \(\mathrm{im}(\rho_1) = \mathrm{im}(\rho_2)\), and the Exercise above gives \[ C_q \rtimes_{\rho_1} C_p \cong C_q \rtimes_{\rho_2} C_p. \]

1 Alternatively, \(H\) is normal since \([G:H]=p\) is the smallest prime that divides \(|G|\).

Example 7.4.3

If \(p =2\) and \(q\) is any odd prime, then there are two groups of order \(2q\) up to isomorphism: \(C_{2q}\) and \(D_{q}\).

Rings

8. An Introduction to Rings

The next major topic in this class is rings.

8.1 Definitions and examples

Definition 8.1.1

A ring is a set \(R\) equipped with two binary operations, \(+\) and \(\cdot\), satisfying:

  • \((R,+)\) is an abelian group. We use additive notation: the identity element for \(+\) is denoted by \(0\) and the inverse of an element \(r\) for \(+\) is written as \(-r\).
  • The operation \(\cdot\) is associative, making \((R,\cdot)\) a semigroup.
  • There is a multiplicative identity element, written as \(1\), such that \[ 1 \cdot a = a = a \cdot 1 \] for all \(a \in R\), and thus \((R, \cdot)\) is a monoid.
  • Distributivity: For all \(a,b,c \in R\), we have \[ a \cdot (b + c) = a \cdot b + a \cdot c \quad \text{and} \quad (a + b) \cdot c = a \cdot c + b \cdot c. \]
  • We also require \(0 \neq 1\).

We sometimes write \(0_R\) and \(1_R\) if we need to emphasize what ring these elements live in.

Definition 8.1.2

An object satisfying just the first three conditions, but without a multiplicative identity, is a nonunital ring or a rng. To emphasize that \(R\) has a multiplicative identity, one might say that a ring is unital.

While some authors consider nonunital rings, in this class all our rings will be unital.

Remark 8.1.3

If we drop the requirement that \(0 \neq 1\), we may consider the zero ring, which is the set \(\{ 0 \}\) together with the only possible operations on it. Conversely, if \(1 = 0\) in a ring, then \(R = \{0\}\), since in this case all \(a \in R\) satisfy \(a \cdot 0 = 0\) and hence \(a = a \cdot 1 = a \cdot 0 = 0\).

Example 8.1.4

The integers with the usual addition and multiplication form a ring \((\mathbb{Z},+,\cdot)\).

Remark 8.1.5

The last condition, asking that \(1 \neq 0\), is not universal: some authors allow the zero ring, which is the ring with only one element. Requiring \(0 \neq 1\) is really asking that \(R\) should have at least two elements.

Lemma 8.1.6 (Ring arithmetic)

The following hold for any ring \(R\) and all \(a,b \in R\):

  1. \(a \cdot 0 = 0 = 0 \cdot a\),
  2. \(({-}a)b = -(ab) = a({-}b)\),
  3. \(({-}a)({-}b) = ab\).
  4. \(1\) is unique, and
  5. \(({-}1)a = -a\).

Proof of Lemma 8.1.6

  1. Note that \[ a \cdot 0 = a \cdot (0+0) = a \cdot 0 + a \cdot 0. \] By subtracting \(a \cdot 0\) from both sides, we conclude that \(a \cdot 0 = 0\). Analogously, \(0 \cdot a = 0\).

  2. By distributivity, \[ ab + ({-}a)b = (a-a)b = 0 \cdot b = 0. \] Thus \(({-}a)b = -ab\). Analogously, \(a({-}b) = -ab\).

  3. Applying the previous property twice, and noting that \(-(-x)=x\) by properties of group operations, we get \[ ({-}a)({-}b) = -(a({-}b)) = -(-ab) = ab. \]

  4. Note that \((R, \cdot)\) is a monoid, and thus the identity \(1\) is unique, as is the case for groups.

  5. By item 2, \(({-}1)a = -(1 \cdot a) = -a\).

There are some additional conditions we might ask a ring to satisfy that are so important they have their own names:

Definition 8.1.7

A ring \(R\) is

  • a commutative ring if \(\cdot\) is commutative, meaning that for all \(a,b \in R\) \[ a \cdot b = b \cdot a. \] (The word abelian is never used in the context of rings, except to say things like “the additive group \((R,+)\) is abelian”.)
  • a noncommutative ring if it is not commutative.
  • a division ring if \((R - \{0\}, \cdot)\) is a group, meaning that every nonzero element has a multiplicative inverse.
  • a field if it is a commutative division ring.

We are now ready to see many examples of rings.

Example 8.1.8

  1. The ring \(\mathbb{Z}\) is a commutative ring.
  2. Let \(n \geqslant 2\). The set \(\mathbb{Z}/n\) of integers modulo \(n\) is a commutative ring under addition and multiplication modulo \(n\). Note that \(\mathbb{Z}/n\) is a field if and only if \(n\) is prime.
  3. The familiar sets of numbers \(\mathbb{Q}\), \(\mathbb{R}\), \(\mathbb{C}\) are fields.
  4. (Matrix ring) If \(R\) is any ring, not necessarily commutative, then the set \(\mathrm{Mat}_{n}(R)\) of \(n \times n\) matrices with entries in \(R\) is a ring with the usual rules for addition and multiplication of square matrices.
  5. (The endomorphism ring of an abelian group) Let \(A = (A, +)\) be any abelian group, and set \(\mathrm{End}_{Ab}(A)\) to be the collection of endomorphisms of \(A\) — that is, the set of group homomorphisms \(f\!: A \longrightarrow A\) from \(A\) to itself. This set of endomorphisms \(\mathrm{End}_{Ab}(A)\) is a ring with pointwise addition \[ (f + g)(a) := f(a) + g(a) \] and multiplication given by composition of functions \[ f \cdot g := f \circ g. \] The additive identity is the \(0\)-map and the multiplicative identity is the identity map. This is almost always a noncommutative ring.
  6. (The real Hamiltonian quaternion ring) Let \(i\), \(j\), \(k\) be formal symbols and set \(\mathcal{H}\) to be the four dimensional \(\mathbb{R}\)-vector space consisting of all expressions of the form \(a + bi + cj + dk\) with \(a,b,c,d \in \mathbb{R}\). We claim that this can be given a ring structure, as follows. Addition is vector space addition: \[ (a + bi + cj + dk) +(a' + b'i + c'j + d'k) = (a + a') + (b + b') i + (c + c')j + (d + d')k. \] Moreover, multiplication is uniquely determined by the axioms of a ring together with the rules \[ i^2 = j^2 = k^2 = -1,\quad -ji = ij = k,\quad -kj = jk = i,\quad -ik = ki = j. \] and the fact that the real coefficients commute with each other and \(i\), \(j\), \(k\).
    It is not obvious that the multiplication defined in this way satisfies associativity, but in fact it does; checking this amounts to conditions very similar to those for the associativity of the quaternion group \(Q_8\), which we discussed earlier.
    This ring \(\mathcal{H}\) is a division ring, since one can check that \[ (a + bi + cj + dk)^{-1} = \frac{a - bi - cj - dk}{\|a + bi + cj + dk\|} \] where \[ \|a + bi + cj + dk \| := a^2 + b^2 + c^2 + d^2. \] In the equation above, \(\|a + bi + cj + dk \|\) is a nonzero real number if \(a + bi + cj + dk\) is not the zero element. The quantity \(\|a + bi + cj + dk \|\) is called the norm of the quaternion \(a + bi + cj + dk\).
  7. If \(X\) is a set and \(R\) is a ring, let \(\mathrm{Fun}(X, R)\) be the collection of set-theoretic functions from \(X\) to \(R\), and consider the pointwise addition and multiplication of functions: \[ (f + g)(x) := f(x) + g(x) \qquad \text{and} \qquad (f \cdot g)(x) := f(x) \cdot g(x) \] The set \(\mathrm{Fun}(X,R)\) is a ring with these operations. In this ring, the zero is the function that is constantly equal to zero, and the identity is the constant function equal to \(1\). If \(X\) is a finite set and \(|X|=n\), then \(\mathrm{Fun}(X,R)\) may be identified with \(R^n=\underbrace{R \times \cdots \times R}_n\), the direct product of \(n\) copies of \(R\).
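The quaternion multiplication rules in Example 8.1.8 can be experimented with concretely. The following Python sketch is an informal aside, not part of the formal development: it encodes a quaternion \(a + bi + cj + dk\) as a tuple \((a,b,c,d)\), and the helper names `qmul` and `qinv` are just illustrative choices.

```python
from fractions import Fraction

def qmul(p, q):
    """Product of quaternions encoded as tuples (a, b, c, d) = a + bi + cj + dk,
    expanded using i^2 = j^2 = k^2 = -1, ij = k, jk = i, ki = j."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def qinv(p):
    """Inverse via the conjugate-over-norm formula of Example 8.1.8."""
    a, b, c, d = p
    n = Fraction(a*a + b*b + c*c + d*d)   # the norm, nonzero when p != 0
    return (a / n, -b / n, -c / n, -d / n)

i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
assert qmul(i, j) == k and qmul(j, i) == (0, 0, 0, -1)   # ij = k, ji = -k
p = (1, 2, -1, 3)
assert qmul(p, qinv(p)) == (1, 0, 0, 0)                  # p * p^{-1} = 1
```

The last assertion is exactly the division-ring property: every nonzero quaternion has a two-sided multiplicative inverse.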

Just like with groups, there are constructions that allow us to take old rings and build new ones.

Definition 8.1.9 (Direct product of rings)

Let \(R\) and \(S\) be two rings. The cartesian product \(R\times S\) has a natural ring structure with addition and multiplication defined componentwise: \[ (a,b)+(c,d)=(a+c,b+d) \qquad \text{and } \qquad (a,b)\cdot(c,d)=(a\cdot c,b\cdot d). \] The additive identity is \(0_{R \times S} = (0_R, 0_{S})\) and the multiplicative identity is \(1_{R \times S} = (1_R, 1_{S})\).

Exercise 8.1.10

Check that the direct product of two rings is a ring. Moreover, prove that \(R \times S\) is a commutative ring if and only if \(R\) and \(S\) are both commutative.

Exercise 8.1.11

Show that the direct product of two fields is never a field.

Definition 8.1.12 (Polynomial ring)

If \(R\) is any ring and \(x\) is a “variable”, then \(R[x]\) denotes the collection of \(R\)-linear combinations of powers of \(x\) — i.e., formal expressions of the form \[ r_0 + r_1x + r_2x^2 + \cdots + r_nx^n \] with \(n \geqslant 0\) and \(r_i \in R\), and two such expressions are deemed equal if their coefficients are the same.

We make \(R[x]\) into a ring by the usual rule for adding and multiplying polynomial expressions, treating \(x\) as commuting with all elements of \(R\). So \[ (r_0 + r_1x + r_2x^2 + \cdots + r_nx^n) + (r'_0 + r'_1x + r'_2x^2 + \cdots + r'_mx^m) = (r_0 + r'_0) + (r_1+ r'_1)x + \cdots \] or more precisely, setting \(r_i = 0\) for \(i>n\) and \(r_i'=0\) for \(i>m\), \[ (r_0 + r_1x + r_2x^2 + \cdots + r_nx^n) + (r'_0 + r'_1x + r'_2x^2 + \cdots + r'_mx^m) = \sum_{i=0}^{\max\{m,n\}} (r_i+r_i') x^i, \] while \[ (r_0 + r_1x + r_2x^2 + \cdots + r_nx^n) \cdot (r'_0 + r'_1x + r'_2x^2 + \cdots + r'_mx^m) = \sum_{k} \left( \, \sum_{a+b = k} r_ar'_b \right) x^k. \] This ring \(R[x]\) is the polynomial ring in one variable over \(R\). One can also talk about polynomial rings in many variables. For a finite set of variables \(x_1, \ldots, x_n\), the ring \(R[x_1, \ldots, x_n]\) can be constructed inductively by setting \[ R[x_1, \ldots, x_n] = R[x_1, \ldots, x_{n-1}][x_n]. \] More generally, given an infinite set of variables \(X\), an element in the polynomial ring \(R[X]\) is obtained by formally adding finitely many monomials in \(X\) with coefficients in \(R\), which are terms of the form \(r x_1^{a_1} \cdots x_n^{a_n}\) with \(x_i \in X\) and integers \(a_i \geqslant 0\). Each polynomial in \(R[X]\) uses only finitely many variables, and thus sums and products of two elements are computed as in the polynomial ring in that finite set of variables.
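The addition and convolution formulas above can be sketched in a few lines of Python. This is an informal illustration (representing a polynomial by its list of coefficients \([r_0, r_1, \ldots]\)); the helper names `poly_add` and `poly_mul` are just illustrative choices.

```python
def poly_add(f, g):
    """Add polynomials given as coefficient lists [r0, r1, ...],
    padding the shorter one with zeros as in the displayed formula."""
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [a + b for a, b in zip(f, g)]

def poly_mul(f, g):
    """Multiply polynomials: the coefficient of x^k is the sum of
    r_a * r'_b over all a + b = k (the convolution formula)."""
    h = [0] * (len(f) + len(g) - 1)
    for a, ra in enumerate(f):
        for b, rb in enumerate(g):
            h[a + b] += ra * rb
    return h

# (1 + x)(1 - x) = 1 - x^2 in Z[x]
assert poly_mul([1, 1], [1, -1]) == [1, 0, -1]
assert poly_add([1, 2], [0, 0, 3]) == [1, 2, 3]
```

Replacing the integer coefficients by elements of any ring \(R\) gives exactly the ring operations of \(R[x]\).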

Exercise 8.1.13

Check that if \(R\) is a ring then so is \(R[x]\). Moreover, show that if \(R\) is commutative, then so is \(R[x]\).

We will later discuss polynomial rings in more detail. For now, we note that when one says a polynomial ring, one often means a polynomial ring over a field.

8.2 Units and Zerodivisors

Elements in a ring might have certain special properties:

Definition 8.2.1

An element \(a\) of a ring is called a unit if there exists \(b \in R\) such that \(ab = 1\) and \(ba = 1\). The set of all units of a ring \(R\) is denoted \(R^{\times}\).

Exercise 8.2.2

Show that if \(a\) is a unit in a ring \(R\), then there is a unique \(b \in R\) such that \(ab=1\) and \(ba=1\).

Definition 8.2.3

Let \(a\) be a unit in a ring \(R\). The unique \(b \in R\) such that \(ab=1=ba\) is called the inverse of \(a\), denoted by \(a^{-1}\).

Exercise 8.2.4

Show that the set of units in a ring \(R\) forms a group \((R^{\times}, \cdot)\) with respect to multiplication.

Example 8.2.5

  1. The units in \(\mathbb{Z}\) are \(\mathbb{Z}^{\times} = \{ \pm 1 \}\).
  2. For all \(n \geqslant 2\),
    \[(\mathbb{Z}/n)^{\times} = \{ [j]_n \mid \gcd(j,n)=1 \}.\]
  3. For all \(n \geqslant 1\) and any field \(F\),
    \[\mathrm{Mat}_n(F)^{\times} = \mathrm{GL}_n(F).\]
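The description of the units of \(\mathbb{Z}/n\) in Example 8.2.5 is easy to check computationally. Here is an informal Python sketch; the helper name `units_mod` is just an illustrative choice.

```python
from math import gcd

def units_mod(n):
    """The unit group of Z/n: the classes [j] with gcd(j, n) = 1."""
    return [j for j in range(n) if gcd(j, n) == 1]

assert units_mod(8) == [1, 3, 5, 7]
assert len(units_mod(7)) == 6       # Z/7 is a field: every nonzero class is a unit

# The units are closed under multiplication, as Exercise 8.2.4 predicts:
U = units_mod(12)
assert all((a * b) % 12 in U for a in U for b in U)
```

The count `len(units_mod(n))` is of course Euler's totient of \(n\).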

Exercise 8.2.6

Let \(R\) be a ring. Find all the units of \(R[x]\).

Definition 8.2.7

A zerodivisor in a ring \(R\) is an element \(x \in R\) such that \(x \neq 0\) but either \(xy = 0\) or \(yx=0\) for some \(y \neq 0\).

Example 8.2.8

The ring \(\mathrm{Mat}_2(\mathbb{R})\) has many zerodivisors: for example,

\[A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}\]
is a zerodivisor since \(A^2 = 0\).

Example 8.2.9

In the ring \(\mathbb{Z}/6\), the element \([2]_6\) is a zerodivisor since \([2]_6[3]_6 = 0\).

Lemma 8.2.10

Let \(R\) be any ring. There is no element \(r \in R\) that is both a unit and a zerodivisor.

Proof (of Lemma 8.2.10)

Suppose that \(a\) is both a zerodivisor and a unit. Then there exists \(b \neq 0\) such that \(ab=0\) or \(ba=0\). Multiplying either of these equations by \(a^{-1}\) gives \(b=0\), which is a contradiction.
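For a small concrete check of Lemma 8.2.10, one can list the zerodivisors of \(\mathbb{Z}/n\) by brute force and compare with the units. This Python sketch is an informal aside; the helper name `zerodivisors_mod` is just an illustrative choice.

```python
from math import gcd

def zerodivisors_mod(n):
    """Nonzero classes [x] in Z/n with xy = 0 mod n for some nonzero [y]."""
    return [x for x in range(1, n)
            if any((x * y) % n == 0 for y in range(1, n))]

assert zerodivisors_mod(6) == [2, 3, 4]   # e.g. [2][3] = [0] in Z/6

# No class is both a unit and a zerodivisor, as Lemma 8.2.10 guarantees:
units = {j for j in range(6) if gcd(j, 6) == 1}
assert units.isdisjoint(zerodivisors_mod(6))
```

In fact, in \(\mathbb{Z}/n\) every nonzero class is either a unit or a zerodivisor, according to whether it is coprime to \(n\).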

Definition 8.2.11

A ring \(R\) is an integral domain, often shortened to domain, if \(R\) is commutative and has no zerodivisors.

Remark 8.2.12

If one allows the zero ring, then in the definition of a domain we should explicitly require \(1 \neq 0\). Moreover, if one allows for nonunital rings, then we should also require all domains to be unital.

Remark 8.2.13

Any domain \(R\) satisfies what is known as the cancellation rule: given any nonzero element \(a \in R\),

\[ab = ac \implies b = c.\]
Indeed, \(ab = ac\) implies
\[a(b-c) = 0,\]
and since \(a\) is not a zerodivisor we must have \(b-c = 0\).

The cancellation rule does not hold if \(R\) is not a domain: if \(a\) and \(b\) are nonzero and \(ab=0\), then \(ab=a \cdot 0\) even though \(b \neq 0\).

Corollary 8.2.14

Every field is a domain.

Proof (of Corollary 8.2.14)

If \(R\) is a field, then every nonzero \(r \in R\) is a unit, and thus by Lemma 8.2.10 \(r\) is not a zerodivisor. Thus \(R\) has no zerodivisors, and must be a domain.

In contrast, not every domain must be a field.

Example 8.2.15

The ring \(\mathbb{Z}\) is a domain but not a field.

Example 8.2.16

Fix an integer \(n \geqslant 2\) and consider the ring \(\mathbb{Z}/n\). If \(n\) is composite, say \(n = ab\) with \(1 < a, b < n\), then \([a]_n[b]_n = 0\) in \(\mathbb{Z}/n\). In particular, \([a]_n\) and \([b]_n\) are zerodivisors and \(\mathbb{Z}/n\) is not a domain.

In contrast, if \(n\) is prime then \(\mathbb{Z}/n\) is a field, and thus in particular it is a domain. Putting all this together, we see that \[\mathbb{Z}/n \text{ is a domain } \iff n \text{ is prime } \iff \mathbb{Z}/n \text{ is a field.}\]

This is a special case of a more general fact:

Exercise 8.2.17

Show that every finite domain is a field.

Definition 8.2.18

An element \(a\) in a ring \(R\) is nilpotent if \(a^n = 0\) for some integer \(n \geqslant 1\).

Exercise 8.2.19

Show that if \(a\) is a nonzero nilpotent element, then \(a\) is a zerodivisor.

Thus there are no nontrivial nilpotent elements in a domain.

Exercise 8.2.20

Show that if \(a\) is a nilpotent element in a ring \(R\), then \(1-a\) is a unit.

Exercise 8.2.21

Given an integer \(n \geqslant 1\), describe all the nilpotent elements in \(\mathbb{Z}/n\).
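Exercises 8.2.20 and 8.2.21 can both be explored by brute force in \(\mathbb{Z}/n\). The following Python sketch is an informal aside; the helper name `nilpotents_mod` is just an illustrative choice, and the inverse of \(1-a\) is computed from the terminating geometric series \(1 + a + a^2 + \cdots\), one standard way to prove Exercise 8.2.20.

```python
def nilpotents_mod(n):
    """Classes [a] in Z/n with a^k = 0 mod n for some k >= 1."""
    return [a for a in range(n)
            if any(pow(a, k, n) == 0 for k in range(1, n + 1))]

assert nilpotents_mod(12) == [0, 6]      # 6^2 = 36 = 0 mod 12
assert nilpotents_mod(8) == [0, 2, 4, 6]

# If a is nilpotent, 1 - a is a unit (Exercise 8.2.20): since a^k = 0,
# the geometric series 1 + a + ... + a^{k-1} is a finite sum inverting 1 - a.
n, a = 8, 2                                       # 2^3 = 0 in Z/8
inv = sum(pow(a, i, n) for i in range(3)) % n     # 1 + 2 + 4 = 7
assert ((1 - a) * inv) % n == 1
```

Comparing `nilpotents_mod(12)` and `nilpotents_mod(8)` already suggests the answer to Exercise 8.2.21: the nilpotents of \(\mathbb{Z}/n\) are the classes of multiples of the product of the primes dividing \(n\).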

Definition 8.2.22

An element \(a\) in a ring \(R\) is idempotent if \(a^2 = a\).

Exercise 8.2.23

Show that if \(e\) is an idempotent element in a ring \(R\), then \(1-e\) is also an idempotent element.

Exercise 8.2.24

Show that if \(F\) is a field, then \(0\) and \(1\) are the only idempotent elements.
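Idempotents are also easy to enumerate in \(\mathbb{Z}/n\), which gives a feel for Exercises 8.2.23 and 8.2.24. This Python sketch is an informal aside; the helper name `idempotents_mod` is just an illustrative choice.

```python
def idempotents_mod(n):
    """Classes [e] in Z/n with e^2 = e."""
    return [e for e in range(n) if (e * e) % n == e]

assert idempotents_mod(6) == [0, 1, 3, 4]   # 3^2 = 9 = 3 and 4^2 = 16 = 4 mod 6
assert idempotents_mod(7) == [0, 1]         # Z/7 is a field, cf. Exercise 8.2.24

# If e is idempotent then so is 1 - e (Exercise 8.2.23):
assert all(((1 - e) ** 2) % 6 == (1 - e) % 6 for e in idempotents_mod(6))
```

The nontrivial idempotents of \(\mathbb{Z}/6\) reflect the decomposition \(\mathbb{Z}/6 \cong \mathbb{Z}/2 \times \mathbb{Z}/3\), where \((1,0)\) and \((0,1)\) are idempotent.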

8.3 Subrings

Definition 8.3.1

A subring of a ring \(R\) is a subset \(S \subseteq R\) such that \(S\) is a ring under the operations of \(R\) and \(1_S = 1_R\). When \(R\) is a field, a subring of \(R\) that is also a field is called a subfield of \(R\).

Some authors do not include the condition that \(1_S = 1_R\) in their definition of subring. However, we think of the identity as part of the basic data of the ring, and thus it is desirable for it to be shared with any subring. As we will see later when we define ideals, this will make our definition of ideal quite different in practice from what we would get if we allowed a subring to not be unital, or not share the multiplicative identity with the original ring.

Exercise 8.3.2

Prove that for a ring \(R\), a subset \(S\) of \(R\) is a subring if and only if \(1_R \in S\) and for all \(x,y \in S\) we have \(x-y \in S\) and \(xy \in S\).

Exercise 8.3.3

Any subring of a commutative ring is a commutative ring. Any subring of a domain is a domain.

Exercise 8.3.4

Prove that the set of \(\mathbb{R}\)-linear combinations of \[ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},\quad \begin{bmatrix} \sqrt{-1} & 0 \\ 0 & -\sqrt{-1} \end{bmatrix},\quad \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix},\quad \begin{bmatrix} 0 & \sqrt{-1} \\ \sqrt{-1} & 0 \end{bmatrix} \] forms a subring of \(\mathrm{Mat}_2(\mathbb{C})\).

We will later define what it means for two rings to be isomorphic. The ring in Exercise 8.3.4 is isomorphic to the quaternions ring \(\mathcal{H}\).

Remark 8.3.5

Let \(F\) be a ring and \(R = \mathrm{Mat}_n(F)\) with \(n \geqslant 2\). Let \(S\) be the subset of \(R\) consisting of matrices whose only nonzero entry is in the upper left corner. Then \(S\) is a ring under the same operations as \(R\), and in fact \(S \cong F\), but \(S\) is not a subring of \(R\) according to our definition, since \(1_S \neq 1_R\).

Example 8.3.6

  • The following is a chain of subrings: \[ \mathbb{Z} \subseteq \mathbb{Q} \subseteq \mathbb{R} \subseteq \mathbb{C} \subseteq \mathcal{H}. \] In the last containment, we think of \(\mathbb{C}\) as those elements \(a + bi + cj + dk\) of \(\mathcal{H}\) with \(c=d=0\).
  • For any ring \(R\) and integer \(n \geqslant 1\), the set of scalar matrices \[ \{ rI_n \mid r \in R \} \] is a subring of \(\mathrm{Mat}_n(R)\).
  • For any ring \(R\) and integer \(n \geqslant 1\), the set of all diagonal matrices is a subring of \(\mathrm{Mat}_n(R)\).
  • The set \[ \mathbb{Z}[i] = \{ a + bi \mid a,b \in \mathbb{Z} \} \] is a subring of \(\mathbb{C}\) called the ring of Gaussian integers.

Definition 8.3.7

The center of a ring \(R\) is the set \[ \mathrm{Z}(R) = \{ z \in R \mid zr = rz \text{ for all } r \in R \}. \] An element in \(R\) is called central if it is in the center of \(R\).

Exercise 8.3.8

Show that the center \(\mathrm{Z}(R)\) is a subring of \(R\).

Example 8.3.9

If \(R\) is commutative, then \(\mathrm{Z}(R) = R\).

The center measures how far \(R\) is from being commutative.

Exercise 8.3.10

Show that the center of \(\mathcal{H}\) is \(\mathbb{R}\).

Exercise 8.3.11

Show that for any commutative ring \(R\), the center of \(\mathrm{Mat}_n(R)\) is the collection of scalar matrices.

Lemma 8.3.12

Let \(d\) be a squarefree integer, meaning that the prime factorization of \(d\) has no repeated primes. Then \[ \mathbb{Q}(\sqrt{d}) = \{ a + b\sqrt{d} \mid a,b \in \mathbb{Q} \} \] is a subfield of the field \(\mathbb{C}\). Moreover, \[ \mathbb{Z}[\sqrt{d}] = \{ a + b\sqrt{d} \mid a,b \in \mathbb{Z} \} \] is a subring of \(\mathbb{Q}(\sqrt{d})\).

Proof (of Lemma 8.3.12)

We leave it as an exercise to prove that \(\mathbb{Q}(\sqrt{d})\) and \(\mathbb{Z}[\sqrt{d}]\) are closed under subtraction and products and contain \(1\), and thus are subrings of \(\mathbb{C}\) by the Subring Test (Exercise 8.3.2).

It remains to show that \(\mathbb{Q}(\sqrt{d})\) is a field, which amounts to the claim that it is closed under taking inverses of nonzero elements. Suppose \(r + q\sqrt{d} \neq 0\). Then its inverse in \(\mathbb{C}\) is \[ (r + q\sqrt{d})^{-1} = \frac{r - q\sqrt{d}}{r^2 - dq^2} \in \mathbb{Q}(\sqrt{d}). \] To see that this expression makes sense, note that if \(r^2 - dq^2 = 0\), then either \(r=q=0\) or \(d=(r/q)^2\). But \(r=q=0\) contradicts the assumption that \(r + q\sqrt{d} \neq 0\), and \(d=(r/q)^2\) would make \(d\) a perfect square, contradicting that \(d\) is squarefree.
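The arithmetic of \(\mathbb{Q}(\sqrt{d})\), including the inverse formula in the proof above, can be checked with exact rational arithmetic. This Python sketch is an informal aside: it represents \(a + b\sqrt{D}\) as a pair \((a,b)\), and the names `mul` and `inv` are just illustrative choices.

```python
from fractions import Fraction as Q

D = 2   # a squarefree integer; a pair (a, b) stands for a + b*sqrt(D)

def mul(x, y):
    """(a + b sqrt(D))(c + d sqrt(D)) = (ac + bdD) + (ad + bc) sqrt(D)."""
    a, b = x
    c, d = y
    return (a * c + b * d * D, a * d + b * c)

def inv(x):
    """Inverse via the conjugate formula from the proof of Lemma 8.3.12."""
    a, b = x
    n = a * a - D * b * b    # nonzero for x != 0, since D is squarefree
    return (a / n, -b / n)

x = (Q(3), Q(1))                    # the element 3 + sqrt(2)
assert mul(x, inv(x)) == (1, 0)     # x * x^{-1} = 1
```

Using `Fraction` keeps every computation inside \(\mathbb{Q}\), so the check is exact rather than a floating-point approximation.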

Remark 8.3.13

In Lemma 8.3.12, note that we allow \(d\) to be negative. For instance, it applies to \(\mathbb{Q}(\sqrt{-5})\) and \(\mathbb{Z}[\sqrt{-5}]\). Indeed, this is a classic example, as \(\mathbb{Z}[\sqrt{-5}]\) is a ring that is not a unique factorization domain (UFD), something we will discuss later.

It also makes sense to speak of \(\mathbb{Q}(\sqrt{d})\) and \(\mathbb{Z}[\sqrt{d}]\) when \(d\) has repeated prime factors, but it just leads to redundant examples. For instance, if \(d = 12\), then \(\mathbb{Q}(\sqrt{12}) = \mathbb{Q}(\sqrt{3})\) and \(\mathbb{Z}[\sqrt{12}] = \mathbb{Z}[\sqrt{3}]\).

Example 8.3.14

The ring \(\mathbb{Z}[\sqrt{d}]\) is an integral domain: it is a subring of \(\mathbb{C}\), and \(\mathbb{C}\) is a field and thus a domain.

Remark 8.3.15

The difference in notation (more precisely, in the parentheses) between \(\mathbb{Z}[\sqrt{d}]\) and \(\mathbb{Q}(\sqrt{d})\) will be explained next semester. In short, if \(R\) is a subring of \(S\) and \(s \in S\), then \(R[s]\) is the smallest subring of \(S\) that contains both \(R\) and \(s\); similarly, for a subfield \(F\) of a field \(L\) and an element \(a \in L\), \(F(a)\) denotes the smallest subfield of \(L\) containing \(F\) and \(a\). In this case, the sets \(\mathbb{Z}[\sqrt{d}]\) and \(\mathbb{Q}(\sqrt{d})\) happen to look very similar.

8.4 Ideals

Notation 8.4.1

Given a ring \(R\) and a subset \(S \subseteq R\), we write

\[ RS := \{ ra \mid a \in S,\; r \in R \} \qquad\text{and}\qquad SR := \{ ar \mid a \in S,\; r \in R \}. \]

If \(S = \{ a \}\), then we write \(Ra\) instead of \(R\{ a \}\) and \(aR\) instead of \(\{ a \}R\). Finally, given \(a,b \in R\), we write

\[ Ra + R b := \{ ra + sb \mid r,s \in R \}. \]

Definition 8.4.2

For a ring \(R\), an ideal (or a two sided ideal) of \(R\) is a nonempty subset \(I\) such that

  • Closure under addition: \((I,+)\) is a subgroup of \((R,+)\).
  • Absorption: for all \(r \in R\) and \(a \in I\), we have \(ra \in I\) and \(ar \in I\). More concisely: \(RI \subseteq I\) and \(IR \subseteq I\).

For noncommutative rings, one speaks also about left ideals and right ideals.

Definition 8.4.3

A left ideal of a ring \(R\) is a subgroup \(I\) of \((R,+)\) which satisfies \(RI \subseteq I\), while a right ideal is a subgroup \(I\) of \((R,+)\) which satisfies \(IR \subseteq I\).

Remark 8.4.4

Our definition of rings, or more precisely our insistence that all rings have \(1\), makes ideals and subrings very different beasts. If an ideal \(I\) contains \(1\), then by the absorption property we must have \(I = R\), since for all \(a \in R\) we have \(a = a \cdot 1 \in I\). Thus the only subset of \(R\) that is both an ideal and a subring is \(R\) itself.

Example 8.4.5

  1. Every ring \(R\) has at least two ideals: \(\{0\}\) and \(R\) itself.
  2. The ideals of \(\mathbb{Z}\) are of the form \(\mathbb{Z} \cdot n\) for various \(n\) (we will prove this later). One can show (exercise!) that
    \[ \mathbb{Z} \cdot 6 + \mathbb{Z} \cdot 10 = \{\, m \cdot 6 + n \cdot 10 \mid m,n \in \mathbb{Z} \,\} = \mathbb{Z} \cdot 2. \]
  3. The sets
    \[ R_i=\left\{ \begin{bmatrix} 0 & 0 & \cdots & 0\\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{in}\\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}\right\}, \qquad L_j=\left\{ \begin{bmatrix} 0 & \cdots & a_{1j} & \cdots & 0\\ \vdots & & \vdots & & \vdots\\ 0 & \cdots & a_{2j} & \cdots & 0\\ \vdots & & \vdots & & \vdots\\ 0 & \cdots & a_{nj} & \cdots & 0 \end{bmatrix}\right\}, \]
    consisting of the matrices whose only nonzero entries lie in row \(i\), respectively in column \(j\), are a right ideal and a left ideal of \(\mathrm{Mat}_n(R)\) respectively. Neither of these is a two-sided ideal if \(n \geqslant 2\).
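The computation \(\mathbb{Z} \cdot 6 + \mathbb{Z} \cdot 10 = \mathbb{Z} \cdot 2\) in item 2 above is an instance of Bézout's identity, which the extended Euclidean algorithm makes effective. The following Python sketch is an informal aside; the helper name `ext_gcd` is just an illustrative choice.

```python
from math import gcd

def ext_gcd(a, b):
    """Extended Euclidean algorithm: return (g, x, y) with a*x + b*y = g = gcd(a, b)."""
    if b == 0:
        return (a, 1, 0)
    g, x, y = ext_gcd(b, a % b)
    return (g, y, x - (a // b) * y)

g, m, n = ext_gcd(6, 10)
assert g == gcd(6, 10) == 2
assert 6 * m + 10 * n == 2   # so 2 lies in Z*6 + Z*10, hence Z*2 is contained in it
# Conversely, every integer of the form 6m + 10n is even, so Z*6 + Z*10 = Z*2.
```

The same argument shows in general that \(\mathbb{Z} \cdot a + \mathbb{Z} \cdot b = \mathbb{Z} \cdot \gcd(a,b)\).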

Definition 8.4.6

An ideal \(I\) in a ring \(R\) is a proper ideal if \(I \neq R\), and nontrivial if \(I \neq \{0\}\).

Some authors say an ideal is nontrivial to mean that it is both proper and nonzero.

Exercise 8.4.7

Prove that an ideal \(I\) is proper if and only if \(I\) contains no units.

Exercise 8.4.8

Let \(R\) be a commutative ring. Show that \(R\) is a field if and only if \(R\) has only two ideals, \(\{0\}\) and \(R\).

Definition 8.4.9

A ring \(R\) is a simple ring if it has no proper nontrivial ideals, meaning that the only ideals of \(R\) are \(R\) and \(\{0\}\).

Exercise 8.4.10

If \(F\) is a field or, more generally, a division ring, and \(n \geqslant 1\) is an integer, prove that \(\mathrm{Mat}_{n \times n}(F)\) is a simple ring.

Here are some operations that one can perform with ideals.

Lemma 8.4.11

Let \(R\) be a ring and let \(I\) and \(J\) be ideals of \(R\). Then

  1. The sum of ideals
    \[ I + J := \{ a + b \mid a \in I,\; b \in J \} \]
    is an ideal.
  2. The intersection of ideals is an ideal: \(I \cap J\) is an ideal, and more generally the intersection
    \[ \bigcap_{\alpha \in A} I_\alpha \]
    of any collection \(\{ I_\alpha \}_{\alpha \in A}\) of ideals of \(R\) is an ideal.
  3. The product of ideals is an ideal:
    \[ IJ := \left\{ \sum_{i=1}^n a_i b_i \,\middle|\, n \geqslant 0,\; a_i \in I,\; b_i \in J \right\} \]
    is an ideal such that \(IJ \subseteq I \cap J\).

The set of all ideals of a ring \(R\) is a lattice with respect to the partial order given by containment. In this lattice, the supremum of a pair of ideals \(I\) and \(J\) is \(I+J\) and the infimum is \(I \cap J\).

Exercise 8.4.12

Prove Lemma 8.4.11.

Remark 8.4.13

However, the union of ideals is typically not an ideal. For example, in \(\mathbb{Z}\), the sets of even integers \(I = 2\mathbb{Z}\) and multiples of \(3\), \(J = 3\mathbb{Z}\), are both ideals, but \(I \cup J\) is not an ideal since it contains \(2\) and \(3\) but it does not contain \(1 = 3 - 2\).

However, the union of nested ideals is an ideal.

Exercise 8.4.14

Let \(\{ I_\lambda \}_{\lambda \in \Lambda}\) be a chain of ideals, meaning that for all \(\alpha, \beta \in \Lambda\) we have \(I_{\alpha} \subseteq I_{\beta}\) or \(I_{\beta} \subseteq I_{\alpha}\). Show that

\[ \bigcup_{\lambda \in \Lambda} I_{\lambda} \]

is an ideal.

Definition 8.4.15

Let \(R\) be a ring and consider a subset \(S \subseteq R\). The ideal generated by \(S\), denoted \((S)\), is the intersection of all the ideals of \(R\) that contain \(S\). When \(S = \{ a_1, \ldots, a_n \}\), we may write \((a_1, \ldots, a_n)\) instead of \((\{ a_1, \ldots, a_n \})\).

Remark 8.4.16

Let \(S\) be a subset of a ring \(R\). By Lemma 8.4.11, the ideal generated by \(S\) is indeed an ideal.

The ideal generated by \(S\) is the smallest ideal of \(R\) that contains \(S\).

Exercise 8.4.17

Let \(A\) be any subset of a ring \(R\). The ideal generated by \(A\) is given by

\[ (A)=\left\lbrace \sum_{i=1}^n x_i a_i y_i \,\middle\vert\, n \geqslant 0,\; a_i \in A,\; x_i, y_i \in R \right\rbrace. \]

If \(R\) is commutative and \(A\) is any subset, then we can simplify this to

\[ (A)=\left\{\sum_{i=1}^n r_i a_i \mid n \geqslant 0,\; r_i \in R,\; a_i \in A\right\}. \]

Definition 8.4.18

Let \(R\) be a ring. Given an ideal \(I\) and a subset \(S\) of \(R\), we say that \(S\) generates \(I\) if \((S) = I\), and we call the elements of \(S\) generators of \(I\).

Remark 8.4.19

Suppose that \(R\) is a commutative ring. Given generators for \(I\) and \(J\), say \(I = (S)\) and \(J = (T)\), the set \(\{ st \mid s \in S,\; t \in T \}\) generates \(IJ\), while the set \(S \cup T\) generates \(I + J\).

Definition 8.4.20

We say an ideal \(I\) is finitely generated if \(I = (S)\) for some finite subset \(S\) of \(R\).

Remark 8.4.21

Note that if \(A = \{a_1, \dots, a_n \}\) and \(R\) is commutative, then

\[ (a_1, \ldots, a_n) = Ra_1 + \cdots + Ra_n = \{ r_1 a_1 + \cdots + r_n a_n \mid r_i \in R \}. \]

Definition 8.4.22

An ideal of \(R\) is principal if it can be generated by one element, meaning that \(I = (a)\) for some \(a \in R\).

Example 8.4.23

In \(R = \mathbb{Z}[x]\), we have

\[ I = (2 , x) = \{\, 2f(x) + xg(x) \mid f(x), g(x) \in \mathbb{Z}[x] \,\}. \]

Thus \(I\) is the set of polynomials with integer coefficients whose constant term is even. One can show that this ideal cannot be generated by a single element, so it is not a principal ideal.

We will primarily use this notion when \(R\) is commutative.

Remark 8.4.24

Note that if \(R\) is commutative and \(I = (a)\), then \(I = Ra = \{ ra \mid r \in R \}\) by Exercise 8.4.17, since an expression of the form \(r_1 a + \cdots + r_m a\) can be rewritten as \(ra\) with \(r = r_1 + \cdots + r_m\). Note, however, that this does not work for noncommutative rings.

Example 8.4.25

  1. We will later show that every ideal of \(\mathbb{Z}\) is principal, so all ideals in \(\mathbb{Z}\) are of the form \(I=(n) = \mathbb{Z} \cdot n\) for some \(n\in \mathbb{Z}\).
  2. We will later show that for any field \(F\), every ideal of \(F[x]\) is principal.
  3. For any field \(F\), every ideal in \(F[x_1,\ldots,x_n]\) is finitely generated, but not necessarily principal when \(n \geqslant 2\). This fact is the Hilbert Basis Theorem, a fundamental result in Commutative Algebra which we will not prove in this class.

8.5 Homomorphisms

A homomorphism of rings is a function between two rings that preserves the ring structure: the addition, multiplication, and \(1\).

Definition 8.5.1

For rings \(R\) and \(S\), a ring homomorphism (also called a ring map) from \(R\) to \(S\) is a function \(f\!: R \to S\) that satisfies the following properties:

  1. \(f(x + y) = f(x) + f(y)\) for all \(x,y \in R\);
  2. \(f(x \cdot y) = f(x) \cdot f(y)\) for all \(x,y \in R\);
  3. \(f(1_R) = 1_S\).

Remark 8.5.2

Equivalently, \(f\) is a ring homomorphism if \(f\) is both a homomorphism of abelian groups \((R, +) \longrightarrow (S,+)\) and a homomorphism of monoids from \((R, \cdot)\) to \((S, \cdot)\). By definition, a homomorphism of monoids preserves the binary operation and sends the identity to the identity.

We really must require \(f(1_R) = 1_S\), since this is not a consequence of the first two conditions.

Example 8.5.3

The map from \(\mathbb{R}\) to \(\mathrm{Mat}_{2}(\mathbb{R})\) sending

\[ r \longmapsto \begin{bmatrix} r & 0 \\ 0 & 0 \end{bmatrix} \]

preserves addition and multiplication, but it does not send \(1\) to \(1\).

Example 8.5.4

The map \(\mathbb{R} \to \mathrm{Mat}_{n \times n}(\mathbb{R})\) sending \(r\) to \(rI_n\) is a ring homomorphism.

Exercise 8.5.5 (\(\mathbb{Z}\) is an initial object)

Prove that for any ring \(S\) there is a unique ring homomorphism \(f\!: \mathbb{Z} \to S\); it is given by sending \(n\) to \(n \cdot 1_S\).

Example 8.5.6

Fix a commutative ring \(R\), an element \(a \in R\), and an indeterminate \(x\). The evaluation at \(a\) map is the function \(f\!: R[x] \to R\) given by

\[ f\!\left(\sum_i r_i x^i\right) = \sum_i r_i a^i. \]

This is a ring homomorphism.

Exercise 8.5.7

Prove that for any commutative ring \(R\) and any element \(a \in R\), there is a unique ring homomorphism \(\mathbb{Z}[x] \to R\) that sends \(x\) to \(a\).

Definition 8.5.8

Let \(f\!: R \longrightarrow S\) be a ring homomorphism. The kernel of \(f\) is

\[ \ker(f) := \{ x \in R \mid f(x) = 0 \}. \]

Lemma 8.5.9

If \(f\!: R \to S\) is a ring homomorphism, then:

  1. \(f(0_R) = 0_S\);
  2. \(f(-x) = -f(x)\);
  3. If \(u \in R^\times\) then \(f(u) \in S^\times\) and \(f(u^{-1}) = f(u)^{-1}\);
  4. The image \(\mathrm{im}(f)\) is a subring of \(S\);
  5. The kernel \(\ker(f)\) is an ideal of \(R\);
  6. \(f\) is injective if and only if \(\ker(f) = \{0\}\).

Proof (of Lemma 8.5.9)

By definition, \(f\) is a homomorphism of additive groups, and thus \(f(0_R) = 0_S\) and \(f(-x) = -f(x)\).

The fact that units are sent to units follows since \(f(1_R)=1_S\):

\[ 1 = f(1) = f(u u^{-1}) = f(u)f(u^{-1}), \quad \text{and similarly } f(u^{-1})f(u)=1. \]

Hence \(f(u^{-1}) = f(u)^{-1}\).

To show that \(\mathrm{im}(f)\) is a subring, note \(1_S=f(1_R)\in\mathrm{im}(f)\). For \(a=f(x)\) and \(b=f(y)\) we have

\[ a-b=f(x-y)\in\mathrm{im}(f), \qquad ab=f(xy)\in\mathrm{im}(f). \]

Thus, by the Subring Test, \(\mathrm{im}(f)\) is a subring of \(S\).

The kernel \(\ker(f)\) is a subgroup under \(+\) and is closed under multiplication by all elements of \(R\): if \(a\in\ker(f)\) and \(r\in R\), then \(f(ra)=f(r)f(a)=f(r)\cdot0=0\), so \(ra\in\ker(f)\), and likewise \(ar\in\ker(f)\).

Finally, \(f\) is injective if and only if \(\ker(f)=\{0\}\), by the standard argument for group homomorphisms.

Remark 8.5.10

In fact, a subset \(I\) of a ring \(R\) is an ideal if and only if it is the kernel of some ring homomorphism with source \(R\).

Definition 8.5.11

Given rings \(R\) and \(S\), a ring isomorphism from \(R\) to \(S\) is a ring homomorphism \(f\!: R \to S\) such that there exists a ring homomorphism \(g\!: S \to R\) with

\[ f \circ g = \mathrm{id}_S, \qquad g \circ f = \mathrm{id}_R. \]

In that case, we write \(f^{-1}\) to denote the homomorphism \(g\). Two rings \(R\) and \(S\) are isomorphic, written \(R \cong S\), if there exists such an isomorphism.

Exercise 8.5.12

Show that if \(f\!: R \to S\) is a bijective ring homomorphism, then \(f\) is an isomorphism. Moreover, show that the composition of two ring homomorphisms (respectively, isomorphisms) is again a ring homomorphism (respectively, isomorphism).

Exercise 8.5.13

Fix a ring \(R\) and integer \(n \geqslant 1\). Recall that the collection \(S\) of all diagonal matrices in \(\mathrm{Mat}_{n}(R)\) is a subring of \(\mathrm{Mat}_{n}(R)\). Prove that

\[ S \cong \underset{n\text{ times}}{\underbrace{R \times \cdots \times R}}. \]

Exercise 8.5.14

Show that the following are ring isomorphism invariants:

  1. All group isomorphism invariants of the additive group, including the isomorphism class (i.e. if \(R \cong S\) then \((R,+) \cong (S,+)\)).
  2. The properties of being commutative, a division ring, a field, or an integral domain.
  3. The cardinality of the set of zerodivisors.
  4. All group isomorphism invariants of the group of units (i.e. if \(R \cong S\) then \((R^\times, \cdot) \cong (S^\times, \cdot)\)).
  5. The isomorphism type of the center: if \(R \cong S\) then \(\mathrm{Z}(R) \cong \mathrm{Z}(S)\).

Exercise 8.5.15

Let \(f\!: R \to S\) be a ring homomorphism. Show the following:

  1. If \(I\) is an ideal in \(R\), then \(f(I)\) is an ideal of \(f(R)\).
  2. If \(I\) is an ideal of \(S\), then \(f^{-1}(I)\) is an ideal of \(R\).

Warning! The image of an ideal under a ring homomorphism is not necessarily an ideal of the target ring.

Example 8.5.16

Let \(k\) be a field and \(x\) an indeterminate. Consider the subring of \(S = k[x]\) consisting of polynomials where all terms have even degree, given by

\[ R = k[x^2] := \{ r_0 + r_1 x^2 + \cdots + r_n x^{2n} \mid r_i \in k \}. \]

The inclusion map \(i\!: R \to S\) is a ring homomorphism. Consider the ideal \(I = (x^2)\) of \(R\). Its image \(J = i(I)\) under \(i\) is not an ideal of \(S\): indeed, \(x^2 \in J\) but \(x \cdot x^2 = x^3 \notin J\).

Definition 8.5.17

Let \(R\) and \(S\) be commutative rings. Given a ring homomorphism \(f\!: R \to S\) and an ideal \(I\) in \(R\), the expansion of \(I\) into \(S\) is the ideal of \(S\) given by \(S f(I)\), sometimes denoted \(SI\).

8.6 Quotient Rings

We should think of a two-sided ideal as analogous to a normal subgroup of a group, for two related reasons:

  • They are the things that occur as kernels of homomorphisms.
  • They are the things you are allowed to mod out by.

Suppose \(I\) is a proper ideal of a ring \(R\). Recall this includes the fact that \(I\) is a subgroup of \((R,+)\), and hence it is a normal subgroup since \((R,+)\) is abelian. Thus, \(R/I\) is an abelian group under \(+\). Since we use additive notation, a typical element of this group is of the form \(r+I\) for \(r\in R\), and

\[ a + I = b + I \iff a - b \in I. \]

This quotient group also inherits a ring structure from \(R\):

Theorem 8.6.1

If \(R\) is a ring and \(I\) is a proper (two-sided) ideal, then the binary operation

\[ (r + I)\cdot(s + I) := rs + I \]

on \(R/I\) is well-defined and makes \((R/I, +, \cdot)\) into a ring, where \(+\) is the operation induced by addition on \(R\). The one in this ring is \(1+I\). Moreover, the map \(\pi\!: R \to R/I\) with \(\pi(r)=r+I\) is a ring homomorphism.

Proof (of Theorem 8.6.1)

The main point is the well-definedness of the operation. Suppose \(r+I = r'+I\) and \(s+I = s'+I\). Then \(r = r' + a\) and \(s = s' + b\) for some \(a,b\in I\), and hence

\[ rs = r's' + r'b + a s' + a b. \]

Since \(I\) is a two-sided ideal, \(r'b\), \(as'\), and \(ab\) all belong to \(I\), and so does their sum. It follows that \(rs + I = r's' + I\), proving that the operation is well-defined.

To show that \(R/I\) is a ring, note that it is already an abelian group under addition. Associativity of multiplication follows from that in \(R\). Moreover, \(1+I\) acts as a multiplicative identity since \(1\) does in \(R\), and the distributive laws descend directly from those in \(R\).

Finally, \(\pi(1)=1+I\) by definition, and \(\pi\) preserves products by construction, hence \(\pi\) is a ring homomorphism.

Definition 8.6.2

The ring \(R/I\) with the operations \(+\) and \(\cdot\) induced from \(R\) is the quotient ring of \(R\) modulo \(I\). The ring homomorphism \(\pi\!: R \to R/I\) sending \(r\) to \(r+I\) is called the canonical surjection, canonical map, or quotient map.

Remark 8.6.3

In the quotient ring \(R/I\), the zero element is \(0+I\) and the one is \(1+I\).

Example 8.6.4

Given an ideal \(I=(n)\) in the ring \(\mathbb{Z}\), the quotient ring \(\mathbb{Z}/(n)\) is the familiar ring \(\mathbb{Z}/n\).
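The well-definedness established in Theorem 8.6.1 is easy to see concretely in \(\mathbb{Z}/(6)\): here is a small Python illustration (not part of the formal development) showing that the product of cosets does not depend on the chosen representatives.

```python
# Illustration: arithmetic in Z/(6) does not depend on coset representatives.
n = 6
r, s = 5, 4                  # one choice of representatives
r2, s2 = r + 3 * n, s - 7 * n  # different representatives of the same cosets

# r + I = r2 + I and s + I = s2 + I:
assert (r - r2) % n == 0 and (s - s2) % n == 0
# ...and the induced product agrees: both products represent rs + I.
assert (r * s) % n == (r2 * s2) % n
```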

Example 8.6.5

Let \(R = \mathbb{R}[x]\) and \(I = (x^2 + 1)\). Then we may form the quotient ring

\[ R/I = \mathbb{R}[x]/(x^2 + 1). \]

Intuitively, we are starting with \(\mathbb{R}\), adjoining an element \(x\), and imposing that \(x^2 = -1\), so we should obtain \(\mathbb{C}\). We will prove this carefully in Example 8.7.3.

Example 8.6.6

More generally, let \(R\) be any commutative ring, \(x\) an indeterminate, and suppose \(f(x)\) is a monic polynomial, say

\[ f(x) = x^n + r_{n-1} x^{n-1} + \cdots + r_1 x + r_0 \]

for some \(r_0, \ldots, r_{n-1} \in R\). Set \(S = R[x]/(f(x))\). One should think of this as adjoining a new element \(\overline{x}\) to \(R\) and imposing the relation given by \(f\):

\[ \overline{x}^n = -r_{n-1}\overline{x}^{n-1} - \cdots - r_1 \overline{x} - r_0. \]

In fact, the elements of \(S\) are in bijective correspondence with polynomials of degree at most \(n-1\):

\[ \{ a_0 + \cdots + a_{n-1} x^{n-1} \mid a_i \in R \} \longrightarrow S, \quad g \mapsto g + (f(x)). \]

For instance, the ring

\[ S = \mathbb{Q}[x]/(x^4 + x^3 + x^2 + x + 1) \]

can be thought of as taking \(\mathbb{Q}\) and adjoining an element \(\zeta_5\) such that

\[ \zeta_5^4 + \zeta_5^3 + \zeta_5^2 + \zeta_5 + 1 = 0 \implies -\zeta_5(\zeta_5^3 + \zeta_5^2 + \zeta_5 + 1) = 1. \]

Thus this new element \(\zeta_5\) is invertible; in fact, one can show that \(S\) is a field and is isomorphic to \(\mathbb{Q}(\zeta_5)\), the smallest subfield of \(\mathbb{C}\) containing both \(\mathbb{Q}\) and \(\zeta_5 = e^{2\pi i/5}\).
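One can verify the displayed identity computationally, for instance with sympy (a sanity check, not part of the argument): the class of \(-(x^3+x^2+x+1)\) is inverse to the class of \(x\) modulo \(x^4+x^3+x^2+x+1\).

```python
# Check that -(x^3 + x^2 + x + 1) represents the inverse of the class of x
# in Q[x]/(x^4 + x^3 + x^2 + x + 1), by computing a remainder.
from sympy import symbols, rem, expand

x = symbols('x')
f = x**4 + x**3 + x**2 + x + 1
inv = -(x**3 + x**2 + x + 1)

# The remainder of x * inv on division by f is 1,
# i.e. x * inv = 1 in the quotient ring.
assert rem(expand(x * inv), f, x) == 1
```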

Example 8.6.7

Many rings of interest in commutative algebra arise from the construction

\[ F[x_1, \dots, x_n]/I \]

for some field \(F\), integer \(n \geqslant 1\), and ideal \(I \subseteq F[x_1, \dots, x_n]\). By the Hilbert Basis Theorem, every such ideal is finitely generated, so the ring has the form

\[ F[x_1, \dots, x_n]/(f_1, \dots, f_m), \]

where each \(f_j\) is a polynomial in \(x_1, \dots, x_n\). You should think of this as starting with \(F\), adjoining \(n\) new elements, and then imposing \(m\) relations among them. In commutative rings, relations involve both addition and multiplication.

8.7 The Isomorphism Theorems for Rings

Theorem 8.7.1 (Universal Mapping Property for Quotient Rings)

Let \(R\) be a ring and \(I\) a (two-sided) ideal in \(R\), and let \(\pi: R \to R/I\) be the canonical surjection. If \(f\!: R \to S\) is a ring homomorphism such that \(I \subseteq \ker(f)\), there exists a unique ring homomorphism \(\overline{f}: R/I \to S\) such that the following diagram commutes: \[ \begin{CD} R @>{f}>> S \\ @V{\pi}VV @| \\ R/I @>>{\overline{f}}> S \end{CD} \]

Proof (of Theorem 8.7.1)

Ignoring the multiplication operation, we already know from the Universal Mapping Property for quotient groups that there is a unique group homomorphism \(\overline{f}\) of abelian groups from \((R/I, +)\) to \((S, +)\) such that \(\overline{f} \circ \pi = f\). It remains only to check that \(\overline{f}\) preserves multiplication and sends \(1\) to \(1\). Given elements \(r + I, s + I \in R/I\), we have

\[ \overline{f}((r+I)(s + I)) = \overline{f}(rs + I) = f(rs) = f(r)f(s) = \overline{f}(r + I)\, \overline{f}(s + I), \]

since \(f\) preserves multiplication. Finally,

\[ \overline{f}(1_{R/I}) = \overline{f}(1_R + I) = f(1_R) = 1_S \]

since \(f\) sends \(1_R\) to \(1_S\).

Theorem 8.7.2 (First Isomorphism Theorem for Rings)

If \(f\!: R \to S\) is a ring homomorphism, there is an isomorphism

\[ \begin{array}{rcl} \overline{f}: R/\ker(f) & \xrightarrow{\ \cong\ } & \mathrm{im}(f) \\[4pt] r+\ker(f) & \longmapsto & f(r) \end{array} \]

In particular, if \(f\) is surjective, then

\[ R/\ker(f)\cong S. \]

Proof (of Theorem 8.7.2)

Taking \(I = \ker(f)\) in the UMP for quotient rings, we have a ring homomorphism \(\overline{f}: R/\ker(f) \to S\). By the formula for \(\overline{f}\) we immediately get that \(\mathrm{im}(\overline{f}) = \mathrm{im}(f)\). Its kernel is

\[ \{r + I \mid f(r) = 0\} = \{0_{R/I}\} \]

and hence \(\overline{f}\) is injective. The result follows.

Here is a nice application of the First Isomorphism Theorem:

Example 8.7.3

Recall that \(\mathbb{R}[x]/(x^2+1)\) ought to be \(\mathbb{C}\). To prove this, we define a map

\[ \phi\!: \mathbb{R}[x] \longrightarrow \mathbb{C} \]

sending \(f(x)\) to \(f(i)\), the evaluation of \(f\) at \(i\). It is easy to check that \(\phi\) is a ring homomorphism; we leave the details as an exercise. This map is surjective, since the elements \(a + bx\) of the source already map onto all complex numbers \(a + bi\) under \(\phi\).

We claim the kernel of \(\phi\) is \((x^2 +1)\). Note that

\[ x^2 + 1 \in \ker(\phi) \]

and it follows that

\[ (x^2 + 1) \subseteq \ker(\phi), \]

since \(\ker(\phi)\) is a two-sided ideal.

Suppose \(\phi(f(x)) = 0\). By the Division Algorithm in the polynomial ring \(\mathbb{R}[x]\), which we will cover in more detail later, we can write

\[ f(x) = (x^2 + 1)q(x) + r(x) \]

with the degree of \(r(x)\) at most \(1\). So \(r(x) = a + bx\) for real numbers \(a\) and \(b\). If \(r(x) \neq 0\), so that at least one of \(a\) or \(b\) is nonzero, then

\[ r(i) = a + bi \neq 0 \]

since a complex number is \(0\) only if both components are, which would contradict the fact that \(f(i) = 0\). So we must have \(r(x) = 0\) and hence \(f(x) \in (x^2 +1)\).

Applying the First Isomorphism Theorem for rings, we get

\[ \mathbb{R}[x]/(x^2+1) \cong \mathbb{C} \]

via the map sending \(f(x) + (x^2 + 1)\) to \(f(i)\).
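The two key points of this argument, that polynomials in \((x^2+1)\) vanish at \(i\) and that \(\phi(f) = f(i)\) equals \(r(i)\) for the remainder \(r\) from the Division Algorithm, can be illustrated with sympy (an informal check, not part of the proof):

```python
# Illustration of Example 8.7.3 with sympy.
from sympy import symbols, I, div, expand

x = symbols('x')

# A polynomial that lies in (x^2 + 1) vanishes at i:
g = expand((x**2 + 1) * (x**3 - 2))
assert g.subs(x, I) == 0
q, r = div(g, x**2 + 1, x)   # division algorithm: g = (x^2+1) q + r
assert r == 0

# For a general polynomial, phi(f) = f(i) equals r(i), where r is the
# remainder of f on division by x^2 + 1:
f = x**3 + x + 5
q, r = div(f, x**2 + 1, x)
assert r == 5                # so f + (x^2+1) = 5 + (x^2+1)
assert f.subs(x, I) == 5     # and indeed f(i) = 5
```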

Example 8.7.4

Similarly, we may define \(\phi\!: \mathbb{Q}[x] \to \mathbb{C}\) by \(\phi(p(x)) = p(\zeta_5)\). We will skip the details, but its image is \(\mathbb{Q}(\zeta_5)\) and its kernel is \((x^4+x^3+x^2+x+1)\), and hence \(\mathbb{Q}[x]/(x^4+x^3+x^2+x+1) \cong \mathbb{Q}(\zeta_5)\).

Exercise 8.7.5

(Second Isomorphism Theorem setup) Let \(S\) be a subring of a ring \(R\) and let \(I\) be an ideal of \(R\). Show that

\[ S + I = \{s + i \mid s \in S, i \in I\} \]

is a subring of \(R\) and \(S \cap I\) is an ideal of \(S\).

Theorem 8.7.6 (Second Isomorphism Theorem for rings)

Let \(S\) be a subring of a ring \(R\) and let \(I\) be an ideal of \(R\). Then

\[ S + I = \{s + i \mid s \in S, i \in I\} \]

is a subring of \(R\), \(S \cap I\) is an ideal of \(S\), and

\[ \frac{S+I}{I}\cong \frac{S}{S\cap I}. \]

Proof (of Theorem 8.7.6)

The first two facts are the exercise above. The map \(f\!: S \to \frac{S+I}{I}\) sending \(s\) to \(s+I\) is a ring homomorphism, since it is the composition of the subring inclusion \(S \hookrightarrow S+I\) with the canonical quotient map. It is surjective, since every coset \((s+i)+I\) with \(s \in S\) and \(i \in I\) equals \(s+I\), and its kernel is

\[ \ker(f) = \{ s \in S \mid s+I = I\} = S \cap I. \]

The result now follows from the First Isomorphism Theorem for rings.

Theorem 8.7.7 (Cancelling Isomorphism Theorem for rings)

If \(R\) is a ring and \(I \subseteq J\) are two ideals of \(R\), then \(J/I\) is an ideal of \(R/I\) and

\[ \frac{R/I}{J/I} \cong R/J \quad \text{ via } \quad (r + I) + J/I \longmapsto r + J. \]

Proof (of Theorem 8.7.7)

If we ignore multiplication, we know that \((J/I,+)\) is a subgroup of \((R/I,+)\) and that there is an isomorphism of abelian groups

\[ (R/I)/(J/I) \cong R/J \]

given by

\[ (r + I) + J/I \mapsto r + J. \]

One just needs to check that \(J/I\) is a two-sided ideal of \(R/I\) and the indicated bijection preserves multiplication, which we leave as an elementary exercise.

The following will be helpful in discussing some interesting examples:

Exercise 8.7.8 (Reduction homomorphism)

Given a ring map \(\phi\!: R \to S\) between commutative rings, there is an induced ring map

\[ \rho\!: R[x] \to S[x] \quad \text{ given by } \quad \rho\left(\sum_i r_i x^i\right)=\sum_i \phi(r_i) x^i. \]

That is, \(\rho\) is given by applying \(\phi\) to the coefficients of each polynomial.

The proof is just a tedious check of the axioms, and so we leave it as an exercise.

Example 8.7.9

In particular, for \(I\) an ideal of \(R\), taking \(S = R/I\) and \(\phi\) to be the canonical homomorphism, the exercise implies that there is a ring homomorphism

\[ \rho\!: R[x] \to \frac{R}{I}[x] \]

given by

\[ \rho\left(\sum_i r_i x^i\right) = \sum_i (r_i + I) x^i. \]

Thus \(\rho\) is given by modding out the coefficients by \(I\). In this case, the kernel of \(\rho\) is the collection of polynomials with coefficients in \(I\), which we denote by \(I[x]\). By the First Isomorphism Theorem for rings, we conclude that

\[ \frac{R[x]}{I[x]} \cong \frac{R}{I}[x]. \]
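A minimal computational model of the reduction homomorphism (an informal sketch, storing polynomials as coefficient lists, with the hypothetical helpers `poly_mul` and `rho`) illustrates why \(\rho\) preserves products:

```python
# A minimal model of the reduction map rho: Z[x] -> (Z/n)[x], with a
# polynomial c_0 + c_1 x + ... stored as the coefficient list [c_0, c_1, ...].

def poly_mul(f, g, n=None):
    """Multiply coefficient lists; reduce mod n if n is given."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return [c % n for c in h] if n else h

def rho(f, n):
    """Reduce each coefficient mod n (the map of Example 8.7.9)."""
    return [c % n for c in f]

f, g, n = [4, 3, 2], [1, 5], 3   # f = 2x^2 + 3x + 4, g = 5x + 1, mod 3
# rho is multiplicative: rho(f g) = rho(f) rho(g) in (Z/3)[x].
assert rho(poly_mul(f, g), n) == poly_mul(rho(f, n), rho(g, n), n)
```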

Example 8.7.10

Consider the ideal \(J = (2, x^2 + x + 1)\) of \(\mathbb{Z}[x]\). Explicitly, we have

\[ J =\{ p(x)\cdot 2 + q(x)(x^2 + x+ 1) \mid p(x),q(x)\in\mathbb{Z}[x]\}. \]

Suppose we want to understand \(\mathbb{Z}[x]/J\). Then the Cancelling Isomorphism Theorem is our friend. Set \(I = (2) = \mathbb{Z}[x] \cdot 2\) and note that \(I \subseteq J\), and so by the Cancelling Isomorphism Theorem we have

\[ \frac{\mathbb{Z}[x]}{J} \cong \frac{\mathbb{Z}[x]/I}{J/I}. \]

By the example above,

\[ \frac{\mathbb{Z}[x]}{I} \cong \frac{\mathbb{Z}}{2}[x]. \]

As we did for groups, we will write \(J/I\) to denote the image of \(J\) under the quotient map \(\pi:\mathbb{Z}[x]\to \mathbb{Z}[x]/I\). Since \(J\) is generated by \(2\) and \(x^2+x+1\) and \(I\) is generated by \(2\), one can show that \(J/(2)\) is the principal ideal of \(\mathbb{Z}[x]/(2)\) generated by the coset represented by \(x^2+x+1\). Under the identification

\[ \mathbb{Z}[x]/(2) \cong (\mathbb{Z}/2)[x], \]

this ideal \(J/(2)\) corresponds to the principal ideal of \((\mathbb{Z}/2)[x]\) generated by \(x^2+x+1 \in (\mathbb{Z}/2)[x]\). We obtain a ring isomorphism

\[ \mathbb{Z}[x]/J \cong \frac{(\mathbb{Z}/2) [x]}{(x^2+x+1)}. \]

Looking ahead a bit, we note that the quadratic polynomial \(x^2+x+1\) has no roots in the field \(\mathbb{Z}/2\), as the only possibilities are \(0\) and \(1\), and neither is a root. As we will prove soon, this implies \((\mathbb{Z}/2) [x]/(x^2+x+1)\) is a field, and thus \(\mathbb{Z}[x]/J\) is a field.
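One can also check the field claim directly by brute force, since this quotient has only four elements. The following Python sketch (an informal check, not part of the notes' argument) encodes the class of \(a + b\overline{x}\) as the pair \((a,b)\) and verifies that every nonzero element is a unit.

```python
# Sanity check (not a proof): (Z/2)[x]/(x^2+x+1) has 4 elements, and every
# nonzero one is a unit, so it is a field. Encode a + b*xbar as the pair (a, b).

def mul(p, q):
    a, b = p
    c, d = q
    # (a + bx)(c + dx) = ac + (ad + bc)x + bd x^2, and x^2 = x + 1 here
    # (since x^2 + x + 1 = 0 and -1 = 1 mod 2).
    return ((a * c + b * d) % 2, (a * d + b * c + b * d) % 2)

elements = [(a, b) for a in (0, 1) for b in (0, 1)]
nonzero = [p for p in elements if p != (0, 0)]
# Every nonzero element has a multiplicative inverse:
assert all(any(mul(p, q) == (1, 0) for q in nonzero) for p in nonzero)
```

For instance, the class of \(x\) is inverted by the class of \(x+1\), since \(x(x+1) = x^2 + x = 1\) in this ring.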

As discussed before, the set of all ideals in a ring \(R\) is a partially ordered set with respect to the order given by containment.

Theorem 8.7.11 (Lattice Theorem for Quotient Rings)

Suppose \(R\) is a ring and \(I\) is a two-sided ideal of \(R\), and write \(\pi\!: R \to R/I\) for the quotient map. There is a bijection

\[ \begin{array}{rcl} \{\text{ideals of } R \text{ containing } I\} & \longleftrightarrow & \{\text{ideals of } R/I\} \\[6pt] J & \longmapsto & \pi(J) = J/I \\[6pt] \pi^{-1}(L) & \longmapsfrom & L \end{array} \]

Proof (of Theorem 8.7.11)

By the Lattice Isomorphism Theorem, we know that there is a bijection between the subgroups (under \(+\)) of \(R\) that contain \(I\) and the subgroups of \(R/I\), given by these formulas. It remains to prove that this correspondence preserves the property of being an ideal, which we leave as an exercise.

Example 8.7.12

We claimed above that \(\mathbb{Z}[x]/(2, x^2+x+1)\) is a field. Since a field has only two ideals, \(\{0\}\) and the field itself, we deduce, using the Lattice Isomorphism Theorem, that there are only two ideals in \(\mathbb{Z}[x]\) that contain \((2, x^2+ x+1)\), namely

\[ (2, x^2+ x+1)=\pi^{-1}(0) \quad \text{ and } \quad \mathbb{Z}[x]=\pi^{-1}(F), \] where \(F\) denotes the field \(\mathbb{Z}[x]/(2, x^2+x+1)\).

8.8 Prime and maximal ideals in commutative rings

Definition 8.8.1

A maximal ideal of a ring \(R\) is an ideal that is maximal with respect to containment among all proper ideals of \(R\). More precisely, an ideal \(M\) is maximal if \(M \neq R\) and for all ideals \(I\) in \(R\), \[ M \subseteq I \implies M = I \text{ or } I = R. \] Thus the only ideals of \(R\) containing \(M\) are \(M\) and \(R\).

Let \(R\) be a commutative ring. A prime ideal of \(R\) is a proper ideal \(P\) such that \[ xy \in P \implies x \in P \text{ or } y \in P. \]

Example 8.8.2

In \(\mathbb{Z}\), the prime ideals are \((0)\) and the ideals \((p)\) generated by prime integers \(p\). The maximal ideals are exactly the ideals \((p)\). In particular, \((0)\) is prime but not maximal.

Example 8.8.3

In \(\mathbb{Z}[i]\), we claim that the ideal \((13)\) is not prime. On the one hand, \[ 13=(3+2i)(3-2i)\in(13) \] but we claim that \[ 3+2i \notin (13) \quad \text{ and } \quad 3-2i \notin (13). \] To see this, let \(N\) be the square of the complex norm function, meaning that \(N(a+bi) = a^2 + b^2\) for any \(a, b \in \mathbb{R}\). Now note that if \(3\pm 2i = 13 \alpha\) for some \(\alpha \in \mathbb{Z}[i]\), then \[ N(3\pm 2i)=N(13)N(\alpha), \] so it would follow that \[ 13= N(3\pm 2i) = 13^2 N(\alpha) \] with \(N(\alpha) \in \mathbb{Z}\), which is impossible.
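The norm computations in this example are easy to reproduce with Python's built-in complex numbers (an informal check of the arithmetic, not of the ideal-theoretic claim itself):

```python
# Verifying the computations in the Gaussian integer example.

def N(z):
    """Square of the complex norm: N(a + bi) = a^2 + b^2."""
    return z.real**2 + z.imag**2

z = complex(3, 2)
assert z * z.conjugate() == 13   # 13 = (3+2i)(3-2i)
assert N(z) == 13                # N(3+2i) = 13

# If 3+2i were 13*alpha, then 13 = N(3+2i) = 169*N(alpha), which is
# impossible since N(alpha) would be a positive integer:
assert all(13 != 169 * k for k in range(1, 14))
```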

Theorem 8.8.4

Let \(R\) be a commutative ring and let \(Q\) be an ideal of \(R\).

  1. The ideal \(Q\) is maximal if and only if \(R/Q\) is a field.
  2. The ideal \(Q\) is prime if and only if \(R/Q\) is a domain.
  3. Every maximal ideal of \(R\) is prime.

Proof (of Theorem 8.8.4)

By the Lattice Isomorphism Theorem, the ideals of \(R/Q\) are of the form \(I/Q\), where \(I\) is an ideal in \(R\) containing \(Q\).

By earlier work, \(R/Q\) is a field if and only if \(R/Q\) has only two ideals, \(\{ 0 \} = Q/Q\) and \(R/Q\). Thus \(R/Q\) is a field if and only if the only ideals that contain \(Q\) are \(Q\) and \(R\).

Now suppose \(Q\) is prime. If \[ (r + Q)(r' + Q) = 0 + Q, \] then \(rr' \in Q\) and hence either \(r \in Q\) or \(r' \in Q\), so that either \[ r + Q = 0 \quad \text{or} \quad r'+ Q = 0. \] Since \(R\) is commutative, \(R/Q\) is also commutative, and since \(Q\) is proper, \(R/Q\) is not the zero ring. This proves that \(R/Q\) is a domain.

Conversely, suppose that \(R/Q\) is a domain. Since \(R/Q\) is not the zero ring, \(Q\) is proper. If \(x,y \in R\) satisfy \(xy \in Q\), then \[ (x + Q)(y + Q) = 0 \] in \(R/Q\), and hence either \(x+ Q = 0\) or \(y + Q = 0\). It follows that \(x \in Q\) or \(y \in Q\). This proves that \(Q\) is prime.

If \(Q\) is maximal, then \(R/Q\) is a field, which in particular implies that \(R/Q\) is a domain, and thus \(Q\) is prime.
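Theorem 8.8.4 is easy to illustrate in \(\mathbb{Z}\) with a brute-force Python check (informal, and only for small moduli): \(\mathbb{Z}/5\) is a field, so \((5)\) is maximal, while \(\mathbb{Z}/6\) has zero divisors, so \((6)\) is not even prime.

```python
# Illustrating Theorem 8.8.4 in Z: (5) is maximal (Z/5 is a field), while
# (6) is not prime (Z/6 has zero divisors).

def zero_divisors(n):
    """Nonzero classes a in Z/n with ab = 0 for some nonzero class b."""
    return [a for a in range(1, n) if any((a * b) % n == 0 for b in range(1, n))]

def units(n):
    """Classes a in Z/n with a multiplicative inverse."""
    return [a for a in range(1, n) if any((a * b) % n == 1 for b in range(n))]

assert zero_divisors(5) == []    # Z/5 is a domain, so (5) is prime
assert units(5) == [1, 2, 3, 4]  # in fact Z/5 is a field, so (5) is maximal
assert 2 in zero_divisors(6)     # 2 * 3 = 0 in Z/6, so (6) is not prime
```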

Exercise 8.8.5

Show that the ideal \((2,x)\) in \(\mathbb{Z}[x]\) is maximal (and thus prime). In contrast, the ideals \((2)\) and \((x)\) are prime but not maximal.

Example 8.8.6

For a field \(F\), the ideal \(I = (x_1 - a_1, \dots, x_n - a_n)\) of the polynomial ring \(F[x_1, \dots, x_n]\) is maximal. This holds because \(I\) is the kernel of the surjective ring homomorphism \(F[x_1, \dots, x_n] \to F\) given by evaluating polynomials at \((a_1, \dots, a_n)\).

Exercise 8.8.7

Show that if \(f\!: R \longrightarrow S\) is a ring homomorphism and \(S\) is a domain, then \(\ker(f)\) is a prime ideal.

Theorem 8.8.8

Every nonzero commutative ring has a maximal ideal.

Fun fact: this is actually equivalent to the Axiom of Choice. We will prove it (but not its equivalence to the Axiom of Choice!) using Zorn's Lemma, another equivalent version of the Axiom of Choice. Zorn's Lemma is a statement about partially ordered sets. Given a partially ordered set \(S\), a chain in \(S\) is a totally ordered subset of \(S\).

Theorem 8.8.9 (Zorn's Lemma)

Let \(S\) be a nonempty partially ordered set such that every chain in \(S\) has an upper bound in \(S\). Then \(S\) contains at least one maximal element.

We can now prove every nonzero commutative ring has a maximal ideal; in fact, we will prove something stronger:

Theorem 8.8.10

Given a commutative ring \(R\), every proper ideal \(I \neq R\) is contained in some maximal ideal of \(R\).