
Modification of Zipf-Mandelbrot Law for Text Analysis using Linear Regression

Affiliations

  • Hindustan Institute of Technology and Science, Chennai – 603103, Tamil Nadu, India

Abstract


Background: Zipf’s law is applied widely in linguistics and other fields. Mandelbrot proposed a modification known as the Zipf-Mandelbrot (ZM) law, and an enhanced form of the ZM law has since been proposed. Methods: In this paper, we approximate the logarithmic form of the ZM law by a linear regression model of arbitrary order in the inverse of the Zipfian rank of the words in a text. The maximum likelihood solution of the regression model is given in closed form, in contrast to the complex search for the optimum solution required by the enhanced ZM models. Findings: The proposed model is shown to compare favorably with the ZM law and other existing models under the Chi-Square goodness-of-fit test. Improvements: The present work mainly addresses the lower ranks; we propose to extend it to higher-order ranks using an LNRE model in the future.
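To make the Methods statement concrete, the following is a minimal sketch of one plausible reading of the approach, not the paper's exact model. It assumes the ZM form f(r) = C / (r + b)^a, whose logarithm expands (for b/r < 1) into a constant, a log r term, and powers of 1/r, so that log-frequency can be regressed linearly on those terms. Under an assumed Gaussian noise model the maximum likelihood estimate coincides with ordinary least squares, computed here with NumPy's `numpy.linalg.lstsq`. The function names, the inclusion of the log r column, and the default order are illustrative assumptions.

```python
import numpy as np


def design_matrix(ranks, order):
    """Columns: 1, log r, r^-1, ..., r^-order (assumed regressor set)."""
    r = np.asarray(ranks, dtype=float)
    columns = [np.ones_like(r), np.log(r)]
    columns += [r ** (-k) for k in range(1, order + 1)]
    return np.column_stack(columns)


def fit_log_frequency(ranks, freqs, order=2):
    """Least-squares fit of log frequency on the assumed regressors.

    This equals the closed-form solution (X^T X)^-1 X^T y whenever
    X^T X is invertible; lstsq is used for numerical stability.
    """
    X = design_matrix(ranks, order)
    y = np.log(np.asarray(freqs, dtype=float))
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights


def predict_frequency(ranks, weights):
    """Map fitted log-frequency weights back to predicted frequencies."""
    order = len(weights) - 2  # subtract the constant and log r columns
    return np.exp(design_matrix(ranks, order) @ weights)


def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E."""
    o = np.asarray(observed, dtype=float)
    e = np.asarray(expected, dtype=float)
    return float(np.sum((o - e) ** 2 / e))
```

A typical use would be to count word frequencies in a text, sort them in descending order, assign ranks 1, 2, 3, ..., call `fit_log_frequency(ranks, freqs, order=m)` for a chosen order `m`, and compare the output of `predict_frequency` with the observed counts through `chi_square_statistic`. Raising the order adds more inverse-rank terms, which mainly affects the fit at the lowest ranks, where 1/r is largest.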

Keywords

Goodness of Fit, Linear Regression, Quantitative Linguistics, Zipf-Mandelbrot Law.





Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.