Chapter 8: The Channel Model
8.1 A New Kind of Problem
Everything in Part II was about sources: how much information a source produces, how to represent that information efficiently, how to compress it toward the entropy limit. The enemy was redundancy — wasted bits that a good encoder could eliminate.
Part III is about something different. The enemy is no longer redundancy. The enemy is noise.
When you send data across a network, write to a disk, transmit a radio signal, or burn a CD, the physical medium introduces errors. Bits flip. Packets corrupt. Signals attenuate. The question is no longer “how do we represent information efficiently?” but “how do we communicate information reliably when the channel between sender and receiver is imperfect?”
This question seems fundamentally different from compression. But Shannon showed in 1948 that both questions have the same mathematical foundation — and that the answer to the noisy channel question is, in its own way, just as surprising as the source coding theorem.
The surprise: you can communicate with arbitrarily small error probability over any noisy channel, as long as you transmit below a certain rate. That rate is called the channel capacity, and it depends only on the statistics of the channel’s noise — not on the specific messages you want to send.
This chapter builds the mathematical machinery to understand that claim. We will construct the channel model, define capacity, compute it for several important channels, and understand what the theorem says — and, crucially, what it does not say.
8.2 What Is a Channel?
A channel is an abstraction of any physical medium that transmits information. It has:
- An input alphabet X — the set of symbols the sender can transmit.
- An output alphabet Y — the set of symbols the receiver observes.
- A transition probability p(y|x) — the probability of observing output y when input x was sent.
The transition probability is the channel’s noise model. It captures everything about how the channel distorts the input. A noiseless channel has p(y|x) = 1 when y = x and 0 otherwise. A completely random channel has p(y|x) = 1/|Y| regardless of x — the output is independent of the input.
import math
import numpy as np
from collections import defaultdict
class Channel:
"""
A discrete memoryless channel (DMC).
Defined by input alphabet, output alphabet, and transition matrix.
"""
def __init__(self, inputs: list, outputs: list,
transition: dict):
"""
transition: dict mapping (x, y) -> p(y|x)
Must satisfy: sum_y p(y|x) = 1 for all x.
"""
self.inputs = inputs
self.outputs = outputs
self.transition = transition
self._validate()
def _validate(self):
for x in self.inputs:
total = sum(self.transition.get((x, y), 0)
for y in self.outputs)
if abs(total - 1.0) > 1e-9:
raise ValueError(
f"Transition probabilities for input '{x}' "
f"sum to {total}, not 1.0"
)
def p(self, x, y) -> float:
"""P(output=y | input=x)"""
return self.transition.get((x, y), 0.0)
def transmit(self, x) -> str:
"""Simulate transmitting symbol x through the channel."""
import random
outputs = self.outputs
probs = [self.p(x, y) for y in outputs]
return random.choices(outputs, weights=probs)[0]
def transition_matrix(self) -> np.ndarray:
"""Return the transition matrix as a numpy array."""
return np.array([
[self.p(x, y) for y in self.outputs]
for x in self.inputs
    ])

Let’s define some standard channels:
def binary_symmetric_channel(p_error: float) -> Channel:
"""
BSC(p): flips each bit independently with probability p_error.
The most studied channel in information theory.
"""
return Channel(
inputs = ['0', '1'],
outputs = ['0', '1'],
transition = {
('0', '0'): 1 - p_error,
('0', '1'): p_error,
('1', '0'): p_error,
('1', '1'): 1 - p_error,
}
)
def binary_erasure_channel(p_erase: float) -> Channel:
"""
BEC(p): erases each bit with probability p_erase,
replacing it with '?' (erasure symbol).
Models packet loss networks.
"""
return Channel(
inputs = ['0', '1'],
outputs = ['0', '1', '?'],
transition = {
('0', '0'): 1 - p_erase,
('0', '1'): 0.0,
('0', '?'): p_erase,
('1', '0'): 0.0,
('1', '1'): 1 - p_erase,
('1', '?'): p_erase,
}
)
def z_channel() -> Channel:
"""
Z-channel: 0 is received perfectly, 1 may be received as 0.
Models some optical communication systems.
"""
return Channel(
inputs = ['0', '1'],
outputs = ['0', '1'],
transition = {
('0', '0'): 1.0,
('0', '1'): 0.0,
('1', '0'): 0.5,
('1', '1'): 0.5,
}
)
# Visualize the BSC
bsc = binary_symmetric_channel(0.1)
print("Binary Symmetric Channel (p=0.1)")
print("Transition matrix P(Y|X):")
print(f" {'0':>8} {'1':>8}")
for x in bsc.inputs:
row = [f"{bsc.p(x, y):>8.3f}" for y in bsc.outputs]
    print(f" X={x}: {''.join(row)}")

Output:
Binary Symmetric Channel (p=0.1)
Transition matrix P(Y|X):
0 1
X=0: 0.900 0.100
X=1: 0.100 0.900
8.3 Simulating a Channel
Before computing capacity abstractly, let’s get a feel for what channel noise actually does to data.
def simulate_transmission(message: str, channel: Channel,
n_trials: int = 1) -> list:
"""
Transmit a message through a channel n_trials times.
Returns list of received messages.
"""
results = []
for _ in range(n_trials):
received = ''.join(channel.transmit(bit) for bit in message)
results.append(received)
return results
def bit_error_rate(sent: str, received: str) -> float:
"""Fraction of bits that were corrupted (ignoring erasures)."""
errors = sum(1 for s, r in zip(sent, received)
if r != '?' and s != r)
bits = sum(1 for r in received if r != '?')
return errors / bits if bits > 0 else 0.0
# Demonstrate different channels
message = '1' * 100 # 100 ones
print(f"Original message (first 20): {message[:20]}")
print()
for name, channel in [
("BSC(0.01)", binary_symmetric_channel(0.01)),
("BSC(0.10)", binary_symmetric_channel(0.10)),
("BSC(0.30)", binary_symmetric_channel(0.30)),
("BEC(0.20)", binary_erasure_channel(0.20)),
]:
received = simulate_transmission(message, channel)[0]
ber = bit_error_rate(message, received)
    print(f"{name:<12} received: {received[:20]} BER={ber:.3f}")

Output (random, will vary):
Original message (first 20): 11111111111111111111
BSC(0.01) received: 11111111111111111111 BER=0.010
BSC(0.10) received: 11101111101110111111 BER=0.090
BSC(0.30) received: 01011101001110011101 BER=0.290
BEC(0.20) received: 1?111?1?1?11111?1111 BER=0.000
Notice: the BSC flips bits randomly; the BEC erases bits (replacing them with ?) but never corrupts the bits it does pass through. These different failure modes lead to different strategies for error correction.
8.4 Mutual Information: The Information That Gets Through
The central question for a channel is: how much information about the input X does the output Y carry? We need to measure the reduction in uncertainty about X that observing Y provides.
This is exactly mutual information — the concept we previewed in Chapter 2.
The mutual information between input X and output Y is:
I(X; Y) = H(X) - H(X|Y)
= H(Y) - H(Y|X)
= H(X) + H(Y) - H(X, Y)
It measures the reduction in uncertainty about X when Y is observed — equivalently, the information the channel successfully transmits.
def mutual_information(channel: Channel,
input_dist: dict) -> float:
"""
Compute I(X;Y) for a channel with given input distribution.
input_dist: dict mapping input symbols to probabilities.
Returns mutual information in bits.
"""
# Compute joint distribution P(X, Y)
joint = {}
for x in channel.inputs:
for y in channel.outputs:
joint[(x, y)] = input_dist.get(x, 0) * channel.p(x, y)
# Marginal P(Y)
p_y = defaultdict(float)
for (x, y), p in joint.items():
p_y[y] += p
# Marginal P(X)
p_x = input_dist
# I(X;Y) = sum p(x,y) log [p(x,y) / (p(x) p(y))]
mi = 0.0
for (x, y), p_xy in joint.items():
if p_xy > 0:
px = p_x.get(x, 0)
py = p_y.get(y, 0)
if px > 0 and py > 0:
mi += p_xy * math.log2(p_xy / (px * py))
return mi
# Compute mutual information for BSC at different error rates
print("Mutual Information I(X;Y) for BSC")
print("with uniform input distribution P(X=0) = P(X=1) = 0.5\n")
print(f"{'Error rate p':>14} {'I(X;Y) bits':>14}")
print("-" * 30)
uniform_binary = {'0': 0.5, '1': 0.5}
for p in [0.0, 0.05, 0.10, 0.20, 0.30, 0.50]:
bsc = binary_symmetric_channel(p)
mi = mutual_information(bsc, uniform_binary)
    print(f"{p:>14.2f} {mi:>14.4f}")

Output:
Mutual Information I(X;Y) for BSC
with uniform input distribution P(X=0) = P(X=1) = 0.5
Error rate p I(X;Y) bits
------------------------------
0.00 1.0000
0.05 0.7136
0.10 0.5310
0.20 0.2780
0.30 0.1187
0.50 0.0000
At p = 0 (no noise), the channel transmits exactly 1 bit per use — perfect transmission. At p = 0.5 (completely random), the output is independent of the input: I(X;Y) = 0. In between, mutual information decreases monotonically with noise.
Notice that mutual information depends on the input distribution we chose. With a different distribution — say, always sending 0 — the mutual information would be zero regardless of channel noise, because there is no uncertainty in the input for the channel to transmit.
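This claim is easy to check directly. The sketch below recomputes I(X;Y) for the BSC from scratch rather than reusing the chapter's `mutual_information` helper (the function name `bsc_mutual_information` is ours, introduced only for this check):

```python
import math

def bsc_mutual_information(p_flip: float, p_one: float) -> float:
    """I(X;Y) for a BSC with crossover p_flip and input P(X=1) = p_one."""
    mi = 0.0
    for x, px in (('0', 1 - p_one), ('1', p_one)):
        for y in ('0', '1'):
            p_y_given_x = (1 - p_flip) if x == y else p_flip
            p_xy = px * p_y_given_x
            # Marginal P(Y=y), summed over both inputs
            py = ((1 - p_one) * ((1 - p_flip) if y == '0' else p_flip)
                  + p_one * ((1 - p_flip) if y == '1' else p_flip))
            if p_xy > 0 and py > 0:
                mi += p_xy * math.log2(p_xy / (px * py))
    return mi

print(bsc_mutual_information(0.1, 0.5))  # uniform input: ~0.531 bits
print(bsc_mutual_information(0.1, 0.0))  # always send 0: 0.0 bits
```

With a degenerate input distribution, every log term collapses to log₂(1) = 0: the channel transmits nothing because there is nothing to transmit.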
This observation leads directly to the definition of channel capacity.
8.5 Channel Capacity
The channel capacity is the maximum mutual information over all possible input distributions:
C = max_{P(X)} I(X; Y)
Capacity is a property of the channel alone — it is the best we can do, optimizing over our choice of input distribution. It is measured in bits per channel use.
def channel_capacity_binary(channel: Channel,
n_points: int = 1000) -> tuple:
"""
Compute channel capacity for a binary-input channel by
searching over input distributions P(X=1) = p, P(X=0) = 1-p.
Returns (capacity, optimal_p).
"""
best_mi = 0.0
best_p = 0.5
for i in range(n_points + 1):
p = i / n_points
dist = {'0': 1 - p, '1': p}
mi = mutual_information(channel, dist)
if mi > best_mi:
best_mi = mi
best_p = p
return best_mi, best_p
# Compute capacity for each channel
channels = {
'BSC(0.00)': binary_symmetric_channel(0.00),
'BSC(0.05)': binary_symmetric_channel(0.05),
'BSC(0.10)': binary_symmetric_channel(0.10),
'BSC(0.20)': binary_symmetric_channel(0.20),
'BSC(0.50)': binary_symmetric_channel(0.50),
'BEC(0.20)': binary_erasure_channel(0.20),
'BEC(0.50)': binary_erasure_channel(0.50),
'Z-channel': z_channel(),
}
print(f"{'Channel':<14} {'Capacity (bits)':>16} {'Optimal P(X=1)':>16}")
print("-" * 50)
for name, ch in channels.items():
C, p_opt = channel_capacity_binary(ch)
    print(f"{name:<14} {C:>16.4f} {p_opt:>16.4f}")

Output:
Channel Capacity (bits) Optimal P(X=1)
--------------------------------------------------
BSC(0.00) 1.0000 0.5000
BSC(0.05) 0.7136 0.5000
BSC(0.10) 0.5310 0.5000
BSC(0.20) 0.2780 0.5000
BSC(0.50) 0.0000 0.5000
BEC(0.20) 0.8000 0.5000
BEC(0.50) 0.5000 0.5000
Z-channel                0.3219           0.4000
Several results here deserve comment.
BSC capacity formula: For the binary symmetric channel with crossover probability p, the capacity has a closed form:
C_BSC = 1 - H_b(p)
where H_b(p) = -p log₂(p) - (1-p) log₂(1-p) is the binary entropy function. At p = 0.1, C = 1 - H_b(0.1) = 1 - 0.469 = 0.531 bits — matching our computation.
BEC capacity formula: For the binary erasure channel with erasure probability ε:
C_BEC = 1 - ε
This is elegantly simple. If 20% of bits are erased, the channel delivers 80% of its maximum capacity. Unlike the BSC, the BEC is perfect on the bits it does pass — you lose capacity linearly with erasure probability, but what gets through is uncorrupted.
Z-channel: The optimal input distribution is not uniform (P(X=1) = 0.4, not 0.5). Asymmetric channels have asymmetric optimal input distributions.
def bsc_capacity(p: float) -> float:
    """Closed-form capacity of BSC(p): 1 - H_b(p)."""
    if p == 0 or p == 1:
        return 1.0  # both are deterministic (p=1 is an invertible flip)
    return 1 + p * math.log2(p) + (1 - p) * math.log2(1 - p)
def bec_capacity(epsilon: float) -> float:
"""Closed-form capacity of BEC(epsilon)."""
return 1 - epsilon
print("Closed-form vs numerical BSC capacity:")
print(f"{'p':>6} {'Numerical':>12} {'Formula':>12} {'Match':>8}")
print("-" * 44)
for p in [0.0, 0.1, 0.2, 0.3, 0.5]:
bsc = binary_symmetric_channel(p)
C_num, _ = channel_capacity_binary(bsc)
C_form = bsc_capacity(p)
match = abs(C_num - C_form) < 0.002
print(f"{p:>6.1f} {C_num:>12.4f} {C_form:>12.4f} "
          f"{'YES' if match else 'NO':>8}")

Output:
Closed-form vs numerical BSC capacity:
p Numerical Formula Match
--------------------------------------------
0.0 1.0000 1.0000 YES
0.1 0.5310 0.5310 YES
0.2 0.2780 0.2780 YES
0.3 0.1187 0.1187 YES
0.5 0.0000 0.0000 YES
8.6 The Channel Coding Theorem
We have defined capacity. Now comes the theorem that makes it matter.
Shannon’s Channel Coding Theorem (1948):
For any discrete memoryless channel with capacity C, and any rate R < C, there exists a sequence of codes with:
- Block length n → ∞
- Rate approaching R bits per channel use
- Probability of error → 0
Furthermore, for any rate R > C, the probability of error is bounded away from zero for any sequence of codes.
In plain language:
- Below capacity: You can communicate reliably. Perfect reliability is achievable in the limit.
- Above capacity: You cannot communicate reliably. No code can drive the error rate to zero.
This is one of the most surprising results in the history of science. Before Shannon, the conventional wisdom was that noisy channels imposed a fundamental tradeoff: to reduce errors, you had to slow down. Shannon showed this was wrong. You do not have to slow down arbitrarily — you only have to stay below capacity. And below capacity, you can have both high rate and zero error.
def channel_coding_theorem_illustration():
"""
Illustrate the channel coding theorem by showing:
1. Below capacity: error rate can be driven to zero with longer codes
2. Above capacity: error rate stays bounded away from zero
"""
import random
# Simple repetition code for BSC(0.1)
# Capacity = 0.531 bits per channel use
p_error = 0.1
bsc = binary_symmetric_channel(p_error)
def repetition_encode(bit: str, n: int) -> str:
"""Encode by repeating n times."""
return bit * n
def repetition_decode(received: str) -> str:
"""Decode by majority vote."""
ones = received.count('1')
zeros = received.count('0')
return '1' if ones > zeros else '0'
def simulate_repetition_code(n_reps: int,
n_trials: int = 10000) -> float:
"""Measure block error rate for n-repetition code."""
errors = 0
for _ in range(n_trials):
bit = random.choice(['0', '1'])
encoded = repetition_encode(bit, n_reps)
received = ''.join(bsc.transmit(b) for b in encoded)
decoded = repetition_decode(received)
if decoded != bit:
errors += 1
return errors / n_trials
print("Repetition code on BSC(0.1)")
print(f"Channel capacity: {bsc_capacity(p_error):.4f} bits/use\n")
print(f"{'Reps':>6} {'Rate (bits/use)':>18} {'Error rate':>12}")
print("-" * 40)
for n in [1, 3, 5, 9, 15, 25, 51]:
rate = 1 / n # 1 bit per n channel uses
err_rate = simulate_repetition_code(n)
below = "< C" if rate < bsc_capacity(p_error) else "> C"
print(f"{n:>6} {rate:>18.4f} {err_rate:>12.4f} {below}")
channel_coding_theorem_illustration()

Output (approximate):
Repetition code on BSC(0.1)
Channel capacity: 0.5310 bits/use
Reps Rate (bits/use) Error rate
----------------------------------------
1 1.0000 0.1000 > C
3 0.3333 0.0280 < C
5 0.2000 0.0086 < C
9 0.1111 0.0009 < C
15 0.0667 0.0000 < C
25 0.0400 0.0000 < C
51 0.0196 0.0000 < C
This demonstrates the theorem’s first part: below capacity, longer codes drive the error rate toward zero. The 51-repetition code achieves essentially zero errors — at the cost of transmitting at only 1/51 ≈ 0.02 bits per channel use.
But this is a terrible code. The capacity is 0.531 bits per use, and we are achieving only 0.02. The channel coding theorem guarantees that better codes exist — codes that achieve rates close to 0.531 with near-zero error. Finding those codes efficiently is the subject of Chapter 9.
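To preview where Chapter 9 is headed, with an example of our own rather than the chapter's: the classic (7,4) Hamming code corrects any single-bit error per 7-bit block while spending only 3 check bits on 4 data bits, for rate 4/7 ≈ 0.571, versus rate 1/3 for a 3-repetition code that also corrects one error per block. A minimal sketch:

```python
from itertools import product

def hamming74_encode(d):
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(cw):
    """Correct up to one flipped bit via the syndrome; return 4 data bits."""
    s1 = cw[0] ^ cw[2] ^ cw[4] ^ cw[6]
    s2 = cw[1] ^ cw[2] ^ cw[5] ^ cw[6]
    s3 = cw[3] ^ cw[4] ^ cw[5] ^ cw[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 0 = no error; else the error position
    cw = list(cw)
    if syndrome:
        cw[syndrome - 1] ^= 1
    return [cw[2], cw[4], cw[5], cw[6]]

# Every single-bit error in every codeword is corrected:
for data in product([0, 1], repeat=4):
    cw = hamming74_encode(list(data))
    for i in range(7):
        corrupted = cw.copy()
        corrupted[i] ^= 1
        assert hamming74_decode(corrupted) == list(data)
```

On BSC(0.01), a block of 7 contains two or more flips with probability about 21p²(1-p)⁵ ≈ 0.2%, so this code trades a 1% raw bit error rate for roughly a 0.2% block error rate at better than half the channel's capacity-normalized rate budget. Codes like this, and their far more powerful descendants, are the subject of Chapter 9.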
8.7 The Converse: Above Capacity Is Impossible
The theorem has two parts. The achievability part says good codes exist below capacity. The converse says nothing works above capacity. Let’s verify this for the repetition code at rate 1 (one bit per channel use, no redundancy):
def converse_illustration():
    """
    Show that at rate 1.0 > C, the error rate stays bounded away from zero.
    """
    import random
    results = []
    for p in [0.01, 0.05, 0.10, 0.20, 0.30, 0.50]:
        bsc = binary_symmetric_channel(p)
        capacity = bsc_capacity(p)
        # Rate = 1.0 (no encoding, raw transmission):
        # send a random bit and check whether it arrives intact
        n_trials = 10000
        errors = 0
        for _ in range(n_trials):
            bit = random.choice(['0', '1'])
            received = bsc.transmit(bit)
            if received != bit:
                errors += 1
        results.append((p, capacity, errors / n_trials))
    print("Raw transmission (rate=1.0) on BSC(p)")
    print(f"{'p':>6} {'Capacity':>10} {'Error rate':>12} {'Rate > C?':>10}")
    print("-" * 44)
    for p, C, err in results:
        print(f"{p:>6.2f} {C:>10.4f} {err:>12.4f} "
              f"{'YES' if 1.0 > C else 'NO':>10}")

converse_illustration()

Output (approximate):
Raw transmission (rate=1.0) on BSC(p)
p Capacity Error rate Rate > C?
--------------------------------------------
0.01 0.9192 0.0100 YES
0.05 0.7136 0.0500 YES
0.10 0.5310 0.1000 YES
0.20 0.2780 0.2000 YES
0.30 0.1187 0.3000 YES
0.50 0.0000 0.5000 YES
At rate 1.0 (always above capacity for p > 0), the error rate equals exactly p — the raw channel error rate. You cannot do better without adding redundancy.
8.8 The Gaussian Channel and Shannon-Hartley
The channels we have studied so far are discrete — both input and output are symbols from a finite alphabet. Real communication systems — Wi-Fi, Ethernet, phone lines, fiber optics — are analog. They transmit continuous-valued signals over a channel corrupted by Gaussian (thermal) noise.
The additive white Gaussian noise (AWGN) channel model is:
Y = X + Z
where X is the transmitted signal, Z is Gaussian noise with variance N (written Z ~ N(0, N)), and Y is the received signal. The sender is constrained to signals with average power at most P.
The capacity of this channel is given by the Shannon-Hartley theorem:
C = (1/2) log₂(1 + P/N) [bits per channel use]
Or for a channel with bandwidth W (in Hz):
C = W · log₂(1 + P/N) [bits per second]
where P/N is now the signal-to-noise ratio (SNR).
def shannon_hartley_capacity(bandwidth_hz: float,
snr_linear: float) -> float:
"""
Shannon-Hartley theorem: capacity of AWGN channel.
bandwidth_hz: channel bandwidth in Hz
snr_linear: signal-to-noise ratio (not in dB)
Returns capacity in bits per second.
"""
return bandwidth_hz * math.log2(1 + snr_linear)
def db_to_linear(snr_db: float) -> float:
return 10 ** (snr_db / 10)
# Real-world examples
print("Shannon-Hartley capacity for real-world channels\n")
scenarios = [
("Phone line (POTS)", 3400, 30), # 3.4 kHz, 30 dB SNR
("DSL (ADSL2+)", 2.2e6, 40), # 2.2 MHz, 40 dB SNR
("WiFi 802.11n (2.4GHz)", 20e6, 25), # 20 MHz, 25 dB SNR
("WiFi 802.11ac (5GHz)", 80e6, 30), # 80 MHz, 30 dB SNR
("5G NR (sub-6GHz)", 100e6, 20), # 100 MHz, 20 dB SNR
("Fiber optic (single λ)", 10e9, 40), # 10 GHz, 40 dB SNR
]
print(f"{'Channel':<28} {'BW (Hz)':>12} {'SNR (dB)':>10} "
f"{'Capacity':>14}")
print("-" * 68)
for name, bw, snr_db in scenarios:
snr = db_to_linear(snr_db)
C = shannon_hartley_capacity(bw, snr)
if C >= 1e9:
c_str = f"{C/1e9:.2f} Gbps"
elif C >= 1e6:
c_str = f"{C/1e6:.2f} Mbps"
elif C >= 1e3:
c_str = f"{C/1e3:.2f} kbps"
else:
c_str = f"{C:.2f} bps"
    print(f"{name:<28} {bw:>12.0f} {snr_db:>10} {c_str:>14}")

Output:

Shannon-Hartley capacity for real-world channels

Channel                           BW (Hz)   SNR (dB)       Capacity
--------------------------------------------------------------------
Phone line (POTS)                    3400         30     33.89 kbps
DSL (ADSL2+)                      2200000         40     29.23 Mbps
WiFi 802.11n (2.4GHz)            20000000         25    166.19 Mbps
WiFi 802.11ac (5GHz)             80000000         30    797.38 Mbps
5G NR (sub-6GHz)                100000000         20    665.82 Mbps
Fiber optic (single λ)        10000000000         40    132.88 Gbps
These are the theoretical ceilings. Real systems achieve a fraction of these numbers because of practical constraints: overhead from headers and protocols, hardware limitations, interference, and the gap between ideal codes and real codes.
Let’s explore the SNR-capacity tradeoff:
def snr_capacity_tradeoff():
"""
Show how capacity scales with SNR and bandwidth separately.
Key insight: capacity grows logarithmically with SNR
but linearly with bandwidth.
"""
bandwidth = 1e6 # Fixed at 1 MHz
snrs_db = list(range(-10, 51, 5))
snrs_lin = [db_to_linear(s) for s in snrs_db]
caps = [shannon_hartley_capacity(bandwidth, snr)
for snr in snrs_lin]
print("Capacity vs SNR (bandwidth = 1 MHz fixed)")
print(f"{'SNR (dB)':>10} {'SNR (linear)':>14} {'Capacity (Mbps)':>16}")
print("-" * 44)
for snr_db, snr_lin, cap in zip(snrs_db, snrs_lin, caps):
print(f"{snr_db:>10} {snr_lin:>14.1f} {cap/1e6:>16.4f}")
print("\nKey observation:")
print("Doubling SNR (adding ~3 dB) adds only 1 bit/s/Hz to capacity.")
print("Doubling bandwidth doubles capacity exactly.")
print("=> Bandwidth is more valuable than SNR in the high-SNR regime.")
snr_capacity_tradeoff()

Output:

Capacity vs SNR (bandwidth = 1 MHz fixed)
  SNR (dB)   SNR (linear)  Capacity (Mbps)
--------------------------------------------
       -10            0.1           0.1375
        -5            0.3           0.3964
         0            1.0           1.0000
         5            3.2           2.0574
        10           10.0           3.4594
        15           31.6           5.0281
        20          100.0           6.6582
        25          316.2           8.3094
        30         1000.0           9.9672
        35         3162.3          11.6272
        40        10000.0          13.2879
        45        31622.8          14.9487
        50       100000.0          16.6097
Key observation:
Doubling SNR (adding ~3 dB) adds only 1 bit/s/Hz to capacity.
Doubling bandwidth doubles capacity exactly.
=> Bandwidth is more valuable than SNR in the high-SNR regime.
This logarithmic relationship is why wireless engineers obsess over bandwidth (the denominator in MHz and GHz of spectrum auctions) rather than just cranking up transmitter power. At high SNR, adding more power yields diminishing returns — each doubling of power adds only 1 bit per second per hertz. But doubling bandwidth doubles capacity exactly.
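To put numbers on that tradeoff, here is a small sketch of our own (the scenario values are illustrative, and `awgn_capacity` is just the Shannon-Hartley formula): starting from a 20 MHz link at 20 dB SNR, compare doubling the bandwidth at fixed SNR against doubling capacity by raising power at fixed bandwidth.

```python
import math

def awgn_capacity(bw_hz: float, snr: float) -> float:
    """Shannon-Hartley capacity in bits per second."""
    return bw_hz * math.log2(1 + snr)

bw, snr = 20e6, 100.0                 # 20 MHz at 20 dB SNR
c0 = awgn_capacity(bw, snr)           # ~133.2 Mbps

# Option A: double the bandwidth (SNR held fixed) -> capacity doubles
c_double_bw = awgn_capacity(2 * bw, snr)

# Option B: double capacity at fixed bandwidth.
# Requires log2(1 + snr) to double, i.e. snr' = (1 + snr)^2 - 1
snr_needed = (1 + snr) ** 2 - 1       # 10200, i.e. ~40 dB
c_double_snr = awgn_capacity(bw, snr_needed)

print(f"baseline:            {c0 / 1e6:.1f} Mbps")
print(f"2x bandwidth:        {c_double_bw / 1e6:.1f} Mbps")
print(f"SNR for 2x capacity: {10 * math.log10(snr_needed):.1f} dB "
      f"(roughly a 100x power increase)")
```

Doubling bandwidth costs spectrum; doubling capacity through power alone costs about two orders of magnitude more transmit power in this regime.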
8.9 The Capacity of Common Systems
Let’s compute how close real systems come to the Shannon limit:
def spectral_efficiency_analysis():
"""
Compare real system throughput to Shannon capacity limit.
Spectral efficiency = actual_rate / bandwidth (bits/s/Hz)
Shannon limit = log2(1 + SNR)
"""
systems = [
# (name, bandwidth_hz, snr_db, actual_rate_bps)
("56k modem", 3400, 30, 56e3),
("ADSL2+", 2.2e6, 35, 24e6),
("VDSL2", 17e6, 40, 100e6),
("802.11g", 20e6, 20, 54e6),
("802.11n MIMO", 40e6, 25, 300e6),
("802.11ac 4x4", 80e6, 30, 1300e6),
("LTE Cat 6", 20e6, 20, 300e6),
("5G NR mmWave", 400e6, 15, 4000e6),
]
print(f"{'System':<20} {'Shannon limit':>15} "
f"{'Actual rate':>14} {'Efficiency':>12}")
print("-" * 65)
for name, bw, snr_db, rate in systems:
snr = db_to_linear(snr_db)
limit = shannon_hartley_capacity(bw, snr)
eff = rate / limit
        limit_str = (f"{limit/1e6:.1f} Mbps" if limit >= 1e6
                     else f"{limit/1e3:.1f} kbps")
        rate_str = (f"{rate/1e6:.1f} Mbps" if rate >= 1e6
                    else f"{rate/1e3:.1f} kbps")
        print(f"{name:<20} {limit_str:>15} {rate_str:>14} {eff:>11.1%}")

spectral_efficiency_analysis()

Output (approximate):

System                 Shannon limit    Actual rate   Efficiency
-----------------------------------------------------------------
56k modem                 33.9 kbps      56.0 kbps       165.2%
ADSL2+                    25.6 Mbps      24.0 Mbps        93.8%
VDSL2                    225.9 Mbps     100.0 Mbps        44.3%
802.11g                  133.2 Mbps      54.0 Mbps        40.6%
802.11n MIMO             332.4 Mbps     300.0 Mbps        90.3%
802.11ac 4x4             797.4 Mbps    1300.0 Mbps       163.0%
LTE Cat 6                133.2 Mbps     300.0 Mbps       225.3%
5G NR mmWave            2011.2 Mbps    4000.0 Mbps       198.9%
Some efficiencies exceed 100%, which seems to violate Shannon's theorem. The reason: the Shannon-Hartley formula applies to a single channel. MIMO (multiple-input, multiple-output) systems use multiple antennas to create several parallel spatial streams, and cellular systems like LTE Cat 6 aggregate multiple carriers; a correct comparison would account for those extra channels. The 56k modem is also a special case: V.90's downstream path is digital end-to-end except for the subscriber's local loop, so the naive 3.4 kHz analog model understates its true channel.

For genuinely single-channel links, modern systems reach a substantial fraction of the Shannon limit, the fruit of decades of coding and signal processing research. The gap that remains comes from protocol and header overhead, hardware limitations, interference, and the distance between ideal codes and practical ones.
8.10 Discrete Memoryless Channels: The General Framework
Everything we have computed so far applies to a specific class of channels called discrete memoryless channels (DMCs). The term has two parts:
Discrete: Input and output alphabets are finite. (The AWGN channel is an exception — it is continuous, but we handle it separately via the Shannon-Hartley formula.)
Memoryless: Each channel use is independent. The noise on bit n does not depend on what happened at bits 1 through n-1. In mathematical notation:
p(y₁, y₂, ..., yₙ | x₁, x₂, ..., xₙ) = ∏ p(yᵢ | xᵢ)
The memoryless assumption is an idealization. Real channels have memory: burst errors on Ethernet, fading correlations in wireless channels, write-leveling patterns on SSDs. Handling channels with memory requires more sophisticated models (Markov channels, inter-symbol interference), but the DMC framework captures the essential ideas and applies surprisingly broadly.
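What a channel with memory looks like can be sketched with the classic Gilbert-Elliott burst-noise model, a two-state Markov channel. This example is ours, and the parameter values are illustrative:

```python
import random

def gilbert_elliott(bits: str, p_good_to_bad: float = 0.02,
                    p_bad_to_good: float = 0.25,
                    p_err_good: float = 0.001, p_err_bad: float = 0.3,
                    seed: int = 1) -> str:
    """
    Two-state burst-noise channel: a 'good' state with rare flips and a
    'bad' state with frequent flips. Because the state persists across
    channel uses, errors cluster into bursts -- the channel has memory.
    """
    rng = random.Random(seed)
    state_err = {'good': p_err_good, 'bad': p_err_bad}
    state, out = 'good', []
    for b in bits:
        flipped = '1' if b == '0' else '0'
        out.append(flipped if rng.random() < state_err[state] else b)
        # The state transition is where the memory lives
        if state == 'good' and rng.random() < p_good_to_bad:
            state = 'bad'
        elif state == 'bad' and rng.random() < p_bad_to_good:
            state = 'good'
    return ''.join(out)

received = gilbert_elliott('0' * 60)
print(received)  # any flips tend to arrive in clusters, not spread uniformly
```

A channel like this would fail the independence test below: consecutive outputs are correlated through the hidden state, exactly the property the DMC assumption rules out.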
def is_memoryless_approximation_valid(channel: Channel,
n_samples: int = 10000) -> dict:
"""
Empirically test whether a channel appears memoryless by checking
whether successive outputs are independent.
"""
import random
from scipy.stats import chi2_contingency
# Transmit a long sequence and collect pairs of consecutive outputs
inputs = [random.choice(channel.inputs) for _ in range(n_samples)]
outputs = [channel.transmit(x) for x in inputs]
# Build contingency table of (output_i, output_{i+1}) pairs
pairs = [(outputs[i], outputs[i+1])
for i in range(len(outputs)-1)]
unique_y = sorted(set(channel.outputs))
table = [[sum(1 for p in pairs if p == (y1, y2))
for y2 in unique_y]
for y1 in unique_y]
# Chi-squared test for independence
chi2, p_value, dof, expected = chi2_contingency(table)
return {
'chi2': chi2,
'p_value': p_value,
'independent': p_value > 0.05,
'verdict': 'Memoryless (outputs independent)'
if p_value > 0.05
else 'Has memory (outputs correlated)',
}
# Test our DMC channels -- they should be memoryless by construction
for name, ch in [("BSC(0.1)", binary_symmetric_channel(0.1)),
("BEC(0.2)", binary_erasure_channel(0.2))]:
result = is_memoryless_approximation_valid(ch)
print(f"{name}: p-value={result['p_value']:.3f} "
          f"{result['verdict']}")

Output (approximate):
BSC(0.1): p-value=0.847 Memoryless (outputs independent)
BEC(0.2): p-value=0.612 Memoryless (outputs independent)
Our simulated channels pass the independence test — as expected, since we constructed them to be memoryless.
8.11 What the Channel Coding Theorem Does Not Say
Shannon’s theorem is often misunderstood. Let us be precise about what it guarantees and what it does not.
It guarantees existence, not construction. Shannon proved that good codes exist — but his original proof was non-constructive. It did not tell you how to build them. Finding explicit codes that approach capacity efficiently took decades of subsequent work (turbo codes in 1993, LDPC codes revisited in the 1990s, polar codes in 2009). We will examine these in Chapter 9.
It assumes unlimited block length. The theorem achieves zero error only in the limit of infinitely long codes. Real systems must use finite block lengths, which means a nonzero floor on error probability. The gap between finite-length performance and the asymptotic limit is an active research area called finite blocklength information theory.
It assumes the channel is known. Capacity is defined for a specific channel transition matrix. In practice the channel changes — fading in wireless, aging in storage media. Codes designed for one channel may perform poorly on another. Universal codes and adaptive coding schemes address this.
It says nothing about complexity. A code might achieve capacity but require exponential time to encode or decode. Practical codes must balance performance and computational cost.
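On the channel-uncertainty point, a small analytic illustration of our own shows how sharply a code degrades when the channel is worse than the code was designed for. The helper computes the exact majority-vote failure probability of a repetition code on BSC(p):

```python
from math import comb

def repetition_block_error(p: float, n: int) -> float:
    """Exact majority-vote failure probability for an n-repetition code on BSC(p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))

# Designed for BSC(0.1); deployed on a channel that has drifted:
for p in (0.1, 0.2, 0.3):
    print(f"BSC({p}): 5-repetition block error = "
          f"{repetition_block_error(p, 5):.4f}")
```

The residual error grows from under 1% at the design point to over 16% at p = 0.3, more than an order of magnitude, even though the code itself is unchanged.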
def what_the_theorem_says():
"""Summarize the theorem's guarantees precisely."""
print("Shannon's Channel Coding Theorem guarantees:")
print()
print(" FOR ANY rate R < C:")
print(" EXISTS a sequence of codes with:")
print(" - block length n -> infinity")
print(" - rate -> R bits per channel use")
print(" - error probability -> 0")
print()
print(" FOR ANY rate R > C:")
print(" FOR ALL codes:")
print(" error probability >= some epsilon > 0")
print()
print("The theorem does NOT guarantee:")
print(" - How to construct the good codes")
print(" - How fast the codes can be encoded/decoded")
print(" - Performance at finite block lengths")
print(" - Robustness to channel uncertainty")
what_the_theorem_says()

8.12 Capacity as a Design Constraint
Understanding channel capacity changes how you think about system design. Every communication system — network protocol, storage format, wireless link — operates against a Shannon limit. Knowing where that limit is tells you how much headroom you have and where optimization efforts will pay off.
def capacity_headroom_analysis():
"""
Practical capacity headroom for a system design.
"""
# A practical example: designing a storage system
# SSD NAND flash has noise characteristics we can model
print("Storage channel capacity analysis")
print("(Simplified NAND flash model)\n")
# MLC NAND flash: 2 bits per cell, but with noise
# Model as a channel with 4 input levels and noise
# (Highly simplified -- real NAND models are more complex)
for snr_db, technology in [
(30, "SLC NAND (1 bit/cell)"),
(20, "MLC NAND (2 bits/cell)"),
(15, "TLC NAND (3 bits/cell)"),
(10, "QLC NAND (4 bits/cell)"),
]:
snr = db_to_linear(snr_db)
# Effective capacity per cell (simplified)
cap = math.log2(1 + snr)
print(f"{technology}")
print(f" SNR (approx): {snr_db} dB")
print(f" Shannon limit: {cap:.2f} bits/cell")
print(f" Nominal bits/cell: "
f"{1 if 'SLC' in technology else 2 if 'MLC' in technology else 3 if 'TLC' in technology else 4}")
print()
capacity_headroom_analysis()

Output:
Storage channel capacity analysis
(Simplified NAND flash model)
SLC NAND (1 bit/cell)
SNR (approx): 30 dB
Shannon limit: 9.97 bits/cell
Nominal bits/cell: 1
MLC NAND (2 bits/cell)
SNR (approx): 20 dB
Shannon limit: 6.66 bits/cell
Nominal bits/cell: 2
TLC NAND (3 bits/cell)
SNR (approx): 15 dB
Shannon limit: 5.03 bits/cell
Nominal bits/cell: 3
QLC NAND (4 bits/cell)
SNR (approx): 10 dB
Shannon limit: 3.46 bits/cell
Nominal bits/cell: 4
For SLC through TLC, the nominal bits per cell sit well below this (highly simplified) Shannon limit, and that headroom is exactly what the error correction codes built into modern SSDs (LDPC codes) exploit to deliver reliable storage from inherently noisy cells. QLC is the tightest fit: under this simplified model its nominal 4 bits per cell even exceeds the 10 dB estimate, which is one reason QLC demands the strongest error correction and gives up the largest share of raw capacity to redundancy.
8.13 Summary
- A channel is defined by input alphabet X, output alphabet Y, and transition probability p(y|x). The discrete memoryless channel (DMC) is the fundamental model.
- Mutual information I(X;Y) measures how much information about the input X is preserved in the output Y. It depends on both the channel and the input distribution.
- Channel capacity C = max_{P(X)} I(X;Y) is the maximum mutual information over all input distributions. It is a property of the channel alone.
- The BSC(p) has capacity 1 - H_b(p) bits. The BEC(ε) has capacity 1 - ε bits. Both are achieved by the uniform input distribution.
- Shannon’s channel coding theorem: for any R < C, reliable communication is achievable. For any R > C, it is not. Capacity is the sharp dividing line.
- The Shannon-Hartley theorem gives AWGN channel capacity: C = W log₂(1 + SNR). Capacity grows logarithmically with SNR but linearly with bandwidth.
- The theorem guarantees existence of good codes but says nothing about how to construct them, their computational complexity, or their finite-length performance.
- Real single-channel systems achieve a large fraction of the Shannon limit, the result of decades of coding and signal processing research. MIMO and other multi-antenna techniques exceed the single-channel Shannon limit by creating multiple parallel channels.
8.14 Exercises
8.1 Implement a general channel_capacity(channel) function that works for non-binary channels by performing gradient ascent over the input distribution simplex. Test it on the BSC and BEC against the known closed-form results. Then use it to compute the capacity of a ternary symmetric channel (3 inputs, 3 outputs, crossover probability p split equally among non-matching outputs).
8.2 The symmetric capacity of a channel is the mutual information achieved by the uniform input distribution. For which channels does the symmetric capacity equal the true capacity? Prove that this is the case for the BSC and BEC, and find a channel where it is not (the Z-channel is a hint).
8.3 Simulate the binary erasure channel at several erasure probabilities and verify empirically that the capacity formula C = 1 - ε holds, by measuring how much information can be transmitted reliably using a simple repetition code. At what code rate does the error rate floor above zero?
8.4 The Shannon-Hartley theorem assumes Gaussian noise. Real-world noise can be impulsive (heavy-tailed). Research the capacity of a channel with Laplacian noise (Z ~ Laplace(0, b)) under a power constraint. How does it compare to the Gaussian case at the same noise power?
8.5 Compute the capacity of a cascade of two BSC channels: BSC(p₁) followed by BSC(p₂). The combined channel is a BSC with crossover probability p₁(1-p₂) + p₂(1-p₁). Verify this formula and plot the combined capacity as a function of p₁ for fixed p₂ = 0.05.
8.6 (Challenge) Implement the Blahut-Arimoto algorithm for computing channel capacity numerically. The algorithm iterates between optimizing the input distribution and computing mutual information, and converges to the true capacity for any DMC. Verify it matches the closed-form results for BSC and BEC, then use it to compute the capacity of a 4-input, 4-output channel with a randomly generated (but valid) transition matrix.
In Chapter 9, we finally answer the question the channel coding theorem raises: how do you actually build codes that approach capacity? We explore parity checks, Hamming codes, turbo codes, and LDPC codes — the error-correcting machinery that silently protects every bit you store or transmit.