Theoretical Review
Zipf's law is most easily observed by plotting the data on a log-log graph, with the axes being log (rank order) and log (frequency). For example, the word "the" (as described above) would appear at x = log(1), y = log(69971). The data conform to Zipf's law to the extent that the plot is linear.
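A minimal Python sketch of this log-log check, assuming the data arrive as a plain list of tokens. The function names and the toy corpus are hypothetical. It fits a least-squares line to the points (log rank, log count); under Zipf's law the slope should be close to -s. Ordinary least squares on a log-log plot is only a rough diagnostic, not a rigorous estimator of s.

```python
import math
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, count) pairs, where rank 1 is the most frequent token."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(counts, start=1))

def loglog_slope(pairs):
    """Least-squares slope of log(count) against log(rank).

    If the data follow Zipf's law with exponent s, the points lie near a
    straight line of slope -s.
    """
    xs = [math.log(r) for r, _ in pairs]
    ys = [math.log(c) for _, c in pairs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical toy corpus: the word of rank k appears exactly 60 / k times.
counts = {"w%d" % k: 60 // k for k in range(1, 7)}
tokens = [w for w, c in counts.items() for _ in range(c)]
pairs = rank_frequency(tokens)
print(pairs)                          # [(1, 60), (2, 30), (3, 20), (4, 15), (5, 12), (6, 10)]
print(round(loglog_slope(pairs), 3))  # -1.0, i.e. s = 1
```

On real text the points scatter around the line, especially in the low-frequency tail, so the fitted slope is best read as a quick indication of s.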
Formally, let:
- N be the number of elements;
- k be their rank;
- s be the value of the exponent characterizing the distribution.
Zipf's law then predicts that, out of a population of N elements, the frequency of elements of rank k, f(k;s,N), is:
$$f(k;s,N) = \frac{1/k^s}{\sum_{n=1}^{N} (1/n^s)}$$
Zipf's law holds if the numbers of occurrences of the elements are independent and identically distributed random variables with the power-law distribution $p(f) = \alpha f^{-1-1/s}$.
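A heuristic sketch of how the rank-frequency form connects to a power-law density for the counts. It uses a continuous approximation and ignores ties and integer effects, so it is an illustration rather than a rigorous proof:

```latex
% Rank-frequency form: the element of rank k has frequency f \propto k^{-s},
% so an element with frequency f has rank k(f) \propto f^{-1/s}.
% That rank equals the number of elements whose frequency is at least f:
N \, P(F \ge f) = k(f) \propto f^{-1/s}
% Differentiating the tail probability gives the density of the counts:
p(f) = -\frac{d}{df} P(F \ge f) \propto \frac{1}{s} f^{-1/s - 1} \propto f^{-1-1/s}
```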
For word frequencies in English, N is the number of words in the language and, in the classic version of Zipf's law, the exponent s is 1. Then f(k;s,N) is the fraction of the time the kth most common word occurs.
The law may also be written:
$$f(k;s,N) = \frac{1}{k^s H_{N,s}}$$
where $H_{N,s} = \sum_{n=1}^{N} 1/n^s$ is the Nth generalized harmonic number.
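A short Python sketch of the normalized law, assuming H_{N,s} is computed directly as the finite sum of 1/n^s for n = 1..N. The function names and the choice N = 10 are illustrative.

```python
def generalized_harmonic(n, s):
    """H_{N,s}: the sum of 1 / m**s for m = 1..N."""
    return sum(1.0 / m ** s for m in range(1, n + 1))

def zipf_pmf(k, s, n):
    """f(k; s, N) = 1 / (k**s * H_{N,s}) for ranks k = 1..N."""
    if not 1 <= k <= n:
        raise ValueError("rank k must satisfy 1 <= k <= N")
    return 1.0 / (k ** s * generalized_harmonic(n, s))

# Classic exponent s = 1 with a hypothetical population of N = 10 elements.
probs = [zipf_pmf(k, 1, 10) for k in range(1, 11)]
print(round(sum(probs), 12))              # 1.0: the frequencies are normalized
print(round(probs[0] / probs[1], 12))     # 2.0: rank 1 is twice as frequent as rank 2
```

The sketch recomputes H_{N,s} on every call for clarity; for large N one would compute it once and reuse it.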
The simplest case of Zipf's law is a "1⁄f function". Given a set of Zipfian distributed frequencies, sorted from most common to least common, the second most common frequency will occur ½ as often as the first. The third most common frequency will occur ⅓ as often as the first. The nth most common frequency will occur 1⁄n as often as the first. However, this cannot hold exactly, because items must occur an integer number of times; there cannot be 2.5 occurrences of a word. Nevertheless, over fairly wide ranges, and to a fairly good approximation, many natural phenomena obey Zipf's law.
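As a toy illustration of the integer problem, take a hypothetical most common item that occurs 10 times: an exact 1⁄f law would then demand fractional counts for some ranks. The sketch uses Python's fractions module so the predicted counts stay exact; the function name is illustrative.

```python
from fractions import Fraction

def ideal_counts(top_count, ranks):
    """Counts an exact 1/f law would predict: rank n gets top_count / n."""
    return [Fraction(top_count, n) for n in range(1, ranks + 1)]

counts = ideal_counts(10, 5)
print([str(c) for c in counts])            # ['10', '5', '10/3', '5/2', '2']
print([c.denominator == 1 for c in counts])  # [True, True, False, False, True]
```

Rank 4 would need 5/2 = 2.5 occurrences, which no real count can match; observed data can only approximate the law.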
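Mathematically, for s = 1 the sum of all relative frequencies (1, ½, ⅓, …) in a Zipf distribution is the harmonic series, which diverges:
$$\sum_{n=1}^{\infty} \frac{1}{n} = \infty$$
Consequently, with s = 1 the law can hold only for a finite number of elements N.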
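The divergence of the harmonic series is slow but unbounded: its partial sums grow roughly like ln n + γ, where γ ≈ 0.5772 is the Euler-Mascheroni constant. A minimal Python sketch that prints both side by side (the function name is illustrative):

```python
import math

def harmonic(n):
    """Partial sum H_n = 1 + 1/2 + ... + 1/n of the harmonic series."""
    return sum(1.0 / k for k in range(1, n + 1))

# The partial sums track ln(n) + 0.5772 and keep growing without bound,
# so the infinite series has no finite sum.
for n in (10, 1_000, 100_000):
    print(n, round(harmonic(n), 4), round(math.log(n) + 0.5772, 4))
```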
In human languages, word frequencies have a very heavy-tailed distribution, and can therefore be modeled reasonably well by a Zipf distribution with an s close to 1.
As long as the exponent s exceeds 1, it is possible for such a law to hold with infinitely many words, since if s > 1 then
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} < \infty$$
where ζ is Riemann's zeta function.
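For contrast with the harmonic series, the partial sums for s = 2 settle down to ζ(2) = π²/6 (the Basel problem), so the normalizing constant is finite. A minimal Python sketch with an illustrative function name:

```python
import math

def zeta_partial(s, n):
    """Partial sum of the Riemann zeta series: sum of 1 / k**s for k = 1..n."""
    return sum(1.0 / k ** s for k in range(1, n + 1))

# For s = 2 the partial sums approach pi**2 / 6 (about 1.6449); the gap
# shrinks roughly like 1/n.
for n in (10, 1_000, 100_000):
    print(n, zeta_partial(2, n), math.pi ** 2 / 6)
```

Dividing 1/k^s by ζ(s) then gives a normalized distribution over infinitely many ranks.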