“Information Theory and Statistical Mechanics” is the title of a paper that E. T. Jaynes published about 60 years ago. (The official journal page, with a downloadable PDF, is at https://journals.aps.org/pr/abstract/10.1103/PhysRev.106.620 )
Anyone who knows both information theory and statistical mechanics will recognize that the same form, “p log(p)”, appears in both fields. In statistical mechanics, “p_i” is the probability of a “system” being in state “i”, and the entropy of the system is −Σ p_i log(p_i), summed over all possible states. In Shannon’s information theory, the same expression −Σ p_i log(p_i) is the information (the amount of uncertainty reduced) when one receives the ith symbol from a set of symbols with probability distribution p_i.
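To make the shared formula concrete, here is a minimal sketch of the Shannon entropy of a discrete distribution (the function name `shannon_entropy` is my own choice, not from Jaynes’ paper; I use log base 2 so the answer comes out in bits, while statistical mechanics conventionally uses the natural log and multiplies by Boltzmann’s constant):

```python
import math

def shannon_entropy(p):
    """H = -sum_i p_i * log2(p_i), in bits.

    Terms with p_i == 0 are skipped, since p*log(p) -> 0 as p -> 0.
    """
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# A fair coin carries exactly 1 bit of uncertainty:
print(shannon_entropy([0.5, 0.5]))              # 1.0
# A uniform choice among 4 symbols carries 2 bits:
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
```

The same function, up to the choice of logarithm base and an overall constant, computes the statistical-mechanical entropy of a system given its state probabilities.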
Does the same formula indicate some “deep” connection between information theory and statistical physics? Well, I guess the answer is “yes”. But we need to be careful about equating things just because they share the same form. As Jaynes wrote in the Introduction: “The mere fact that the same mathematical expression −Σ p_i log p_i occurs both in statistical mechanics and in information theory does not in itself establish any connection between these fields”.
Physicists typically treat the underlying microscopic objects as reality, and entropy as a quantity that can be computed directly from that underlying reality. In contrast, Shannon’s information is a construction based on how humans pass coded messages through a communication channel. One we typically consider “objective”, the other “subjective”. However, what Jaynes argued in the paper was that, given the observed macroscopic physical quantities as constraints, we can “infer” other related physical quantities by applying the “maximum entropy principle.” The line between “objectivity” and “subjectivity” becomes blurred.
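The procedure can be sketched numerically. Maximizing the entropy −Σ p_i log p_i subject to a fixed mean energy Σ p_i E_i yields (via Lagrange multipliers) the Gibbs/Boltzmann form p_i ∝ exp(−β E_i). The sketch below assumes a small discrete set of states and solves for β by simple bisection; the function name and the bracketing interval for β are my own choices, not anything from Jaynes’ paper:

```python
import math

def max_entropy_dist(energies, mean_energy, tol=1e-10):
    """Maximum-entropy distribution over discrete states with the given
    energies, constrained to reproduce the observed mean energy.

    The constrained maximization gives the Gibbs form p_i ~ exp(-beta*E_i);
    we find beta numerically by bisection.
    """
    def mean_at(beta):
        weights = [math.exp(-beta * e) for e in energies]
        z = sum(weights)  # partition function
        return sum(w * e for w, e in zip(weights, energies)) / z

    # Mean energy is monotonically decreasing in beta, so bisect.
    lo, hi = -50.0, 50.0  # assumed bracket; widen if the constraint is extreme
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_at(mid) > mean_energy:
            lo = mid  # mean too high -> need larger beta
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

# Three states with energies 0, 1, 2 and an observed mean energy of 0.5:
p = max_entropy_dist([0.0, 1.0, 2.0], 0.5)
print(p)  # lower-energy states get higher probability
```

The point of the exercise is Jaynes’: the only input is the observed constraint (the mean energy), and the familiar Boltzmann distribution falls out as the least-biased inference consistent with it, rather than being postulated from microscopic dynamics.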
In the natural sciences, scientists do want to maintain “objectivity”. We, as human scientists, hope to remove ourselves from how “natural laws” work; a “subjective” way of describing how nature works is generally unfavorable. In that sense, one might think Jaynes’ method of using the “maximum entropy principle” to derive and infer physical quantities is more a useful “trick” than a fundamental principle. There is some interesting discussion of this on Stack Exchange: http://physics.stackexchange.com/questions/26821/what-are-some-critiques-of-jaynes-approach-to-statistical-mechanics
In some of the machine learning literature, we can find shadows of the language and methods of statistical mechanics. In the context of machine learning, I think whether the interpretation is “objective” or “subjective” is no longer important. To some degree, Jaynes’ paper actually lays the foundation for why some methods from statistical mechanics become really useful for certain types of machine learning techniques. We will come back to this later.