Shannon's noiseless coding theorem
From Wikipedia, the free encyclopedia
In information theory, Shannon's noiseless coding theorem places an upper and a lower bound on the minimal possible expected length of codewords as a function of the entropy of the input word (which is viewed as a random variable) and of the size of the target alphabet.
Shannon's statement
Let X be a random variable taking values in some finite alphabet Σ1 and let f be a decipherable code from Σ1* to Σ2*, where |Σ2| = a. Let S denote the resulting wordlength of f(X).
If f is optimal in the sense that it has the minimal expected wordlength for X, then

    H(X) / log2 a  ≤  E[S]  <  H(X) / log2 a + 1
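As a quick numeric sanity check of the two bounds, here is a short Python sketch; the distribution p is an illustrative example (not from the article), and the code lengths are the Shannon lengths ⌈−log_a p_i⌉ used later in the proof:

```python
import math

# Illustrative dyadic source distribution over a 4-symbol alphabet (assumption)
p = [0.5, 0.25, 0.125, 0.125]
a = 2  # size of the target alphabet Σ2 (binary codewords)

# Entropy of X in bits
H = -sum(pi * math.log2(pi) for pi in p)

# Shannon code lengths s_i = ceil(-log_a p_i)
s = [math.ceil(-math.log(pi, a)) for pi in p]

# Expected wordlength E[S]
ES = sum(pi * si for pi, si in zip(p, s))

# The theorem: H(X)/log2(a) <= E[S] < H(X)/log2(a) + 1
assert H / math.log2(a) <= ES < H / math.log2(a) + 1
```

For this dyadic distribution the lower bound is met with equality (E[S] = H(X) = 1.75 bits), which is the case where the code is exactly optimal.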
Proof
Let s_i denote the wordlength of each possible x_i (1 ≤ i ≤ n). Define q_i = a^(−s_i) / C, where C is chosen so that Σ_{i=1}^n q_i = 1; that is, C = Σ_{i=1}^n a^(−s_i).
Then

    H(X) = − Σ_{i=1}^n p_i log2 p_i
         ≤ − Σ_{i=1}^n p_i log2 q_i
         = Σ_{i=1}^n p_i s_i log2 a + log2 C
         ≤ E[S] log2 a

where the second line follows from Gibbs' inequality and the third line follows from Kraft's inequality: C = Σ_{i=1}^n a^(−s_i) ≤ 1, so log2 C ≤ 0. Hence E[S] ≥ H(X) / log2 a.
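The lower-bound argument can be checked numerically. The sketch below uses an illustrative distribution p and candidate lengths s (both assumptions, not from the article), forms the q_i of the proof, and verifies Gibbs' inequality and the resulting bound:

```python
import math

# Illustrative source distribution and candidate codeword lengths (assumptions)
p = [0.5, 0.3, 0.2]
s = [1, 2, 3]
a = 2

# Kraft sum C = Σ a^(-s_i); C <= 1 for a decipherable code
C = sum(a ** -si for si in s)
# The auxiliary distribution q_i = a^(-s_i) / C from the proof (sums to 1)
q = [a ** -si / C for si in s]

H = -sum(pi * math.log2(pi) for pi in p)               # entropy H(X)
cross = -sum(pi * math.log2(qi) for pi, qi in zip(p, q))  # -Σ p_i log2 q_i
ES = sum(pi * si for pi, si in zip(p, s))               # E[S]

# Gibbs' inequality: H(X) <= -Σ p_i log2 q_i
assert H <= cross
# Expanding q_i: -Σ p_i log2 q_i = E[S] log2(a) + log2(C)
assert abs(cross - (ES * math.log2(a) + math.log2(C))) < 1e-9
# Since log2(C) <= 0 by Kraft, H(X) <= E[S] log2(a)
assert H <= ES * math.log2(a)
```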
For the second inequality we may set

    s_i = ⌈− log_a p_i⌉

so that

    − log_a p_i ≤ s_i < − log_a p_i + 1

and so

    a^(−s_i) ≤ p_i

and

    Σ_{i=1}^n a^(−s_i) ≤ Σ_{i=1}^n p_i = 1

and so by Kraft's inequality there exists a prefix-free code having those wordlengths. Thus the minimal S satisfies

    E[S] = Σ_{i=1}^n p_i s_i < Σ_{i=1}^n p_i (− log_a p_i + 1) = H(X) / log2 a + 1.
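The upper-bound construction can likewise be verified numerically. The sketch below uses an illustrative non-dyadic distribution (an assumption, not from the article), builds the lengths s_i = ⌈−log_a p_i⌉, and checks that Kraft's inequality holds and that the expected length stays strictly below H(X)/log2(a) + 1:

```python
import math

# Illustrative non-dyadic source distribution (assumption)
p = [0.4, 0.3, 0.2, 0.1]
a = 2

# The construction from the proof: s_i = ceil(-log_a p_i)
s = [math.ceil(-math.log(pi, a)) for pi in p]

# Kraft's inequality holds, so a prefix-free code with these lengths exists
assert sum(a ** -si for si in s) <= 1

# ... and the expected wordlength meets the strict upper bound
H = -sum(pi * math.log2(pi) for pi in p)
ES = sum(pi * si for pi, si in zip(p, s))
assert ES < H / math.log2(a) + 1
```

Because the p_i here are not powers of a, the ceilings round every length up and E[S] sits strictly between the two bounds of the theorem.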