
The variation of Zipf’s law in human language

  • Statistical Physics and Biological Information
The European Physical Journal B - Condensed Matter and Complex Systems

Abstract.

Word frequencies in human language follow the so-called Zipf's law. More precisely, the word frequency spectrum follows a power function, P(f) ~ f^(−β), where P(f) is the proportion of words that occur with frequency f. The typical exponent is β≈2, but significant variations are found. We hypothesize that the full range of variation reflects our ability to balance the goal of communication, i.e. maximizing information transfer, against the cost of communication, which is imposed by the limitations of the human brain. We show that the more weight is given to satisfying the goal of communication, the higher the exponent. Here, assuming that words are used according to their meaning, we explain why variation in β should be limited to a particular domain. On the one hand, we explain a non-trivial lower bound at about β=1.6 for communication systems that neglect the goal of communication. On the other hand, we find a sudden divergence of β if a certain critical balance is crossed. At the same time, a sharp transition to maximum information transfer and, unfortunately, maximum communication cost is found. Consistent with the upper bound of real exponents, the maximum finite value predicted is about β=2.4. It is advantageous for human language not to cross the transition and to remain in a domain where information transfer is high but comes at a reasonable cost. Therefore, only a particular range of exponents should be found in human speakers. The exponent β thus contains information about the balance between cost and communicative efficiency.
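The central quantity above is the exponent β of the frequency spectrum. As a minimal sketch of how β might be measured on a text (the abstract does not describe the paper's own estimation procedure, so this is a swapped-in technique), the Python code below counts how many distinct words occur exactly f times and fits β with the approximate maximum-likelihood estimator for discrete power laws given by Clauset, Shalizi and Newman (2009). It also converts β to the exponent α of the more familiar rank-frequency form of Zipf's law, f(r) ~ r^(−α), via the standard relation β = 1 + 1/α. Under that relation β≈2 corresponds to α≈1, while the bounds β≈1.6 and β≈2.4 correspond to α≈1.67 and α≈0.71. The function names and the cut-off parameter `f_min` are hypothetical, illustrative choices, not part of the original work.

```python
import math
from collections import Counter


def frequency_spectrum(tokens):
    """Return a Counter mapping each frequency f to the number of distinct
    words that occur exactly f times in `tokens`."""
    word_counts = Counter(tokens)            # word -> frequency f
    return Counter(word_counts.values())     # f -> number of words with frequency f


def estimate_beta(frequencies, f_min=1):
    """Approximate maximum-likelihood estimate of beta in P(f) ~ f^(-beta).

    `frequencies` holds one frequency per distinct word. Uses the continuous
    approximation for discrete power laws (Clauset, Shalizi & Newman, 2009):
        beta_hat = 1 + n / sum_i ln(f_i / (f_min - 0.5)),
    summed over the n frequencies f_i >= f_min. This is an illustrative
    estimator (an assumption here), not necessarily the one used in the paper.
    """
    if f_min < 1:
        raise ValueError("f_min must be at least 1")
    tail = [f for f in frequencies if f >= f_min]
    if not tail:
        raise ValueError("no frequencies at or above f_min")
    log_sum = sum(math.log(f / (f_min - 0.5)) for f in tail)
    return 1.0 + len(tail) / log_sum


def rank_exponent(beta):
    """Convert the spectrum exponent beta to the Zipf rank exponent alpha,
    using the relation beta = 1 + 1/alpha."""
    if beta <= 1:
        raise ValueError("beta must exceed 1")
    return 1.0 / (beta - 1.0)


if __name__ == "__main__":
    tokens = "the cat saw the dog and the cat ran".split()
    word_freqs = list(Counter(tokens).values())   # one frequency per distinct word
    beta = estimate_beta(word_freqs)
    print("spectrum:", dict(frequency_spectrum(tokens)))
    print("beta ~", round(beta, 2), "alpha ~", round(rank_exponent(beta), 2))
```

For the toy sentence, the spectrum maps 1 to 4, 2 to 1 and 3 to 1, since four words occur once, "cat" occurs twice and "the" three times. On real corpora, `f_min` should be chosen so that the power law plausibly holds above it; the continuous approximation is least accurate for very small `f_min`.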

Author information

Correspondence to R. Ferrer i Cancho.

About this article

Cite this article

Ferrer i Cancho, R. The variation of Zipf’s law in human language. Eur. Phys. J. B 44, 249–257 (2005). https://doi.org/10.1140/epjb/e2005-00121-8

  • DOI: https://doi.org/10.1140/epjb/e2005-00121-8
