From Boltzmann’s Atoms to “Knowledge Atoms”

Jason Chin
5 min read · Mar 11, 2017


“In 1904 at a physics conference in St. Louis most physicists seemed to reject atoms and he was not even invited to the physics section.” — Wikipedia page about Ludwig Boltzmann

Lucky for us, we live in an era in which we can “see” atoms with an atomic force microscope. And, fortunately for me, I am personally involved in developing technologies to read DNA molecule by molecule to assemble whole human or gorilla genomes. Without the theory of atoms, many modern technological miracles would be impossible. Imagine living in the 19th century, when the existence of atoms was still a “rough theory”: how would one prove that atoms exist with instruments that were only capable of macroscopic measurements?

Boltzmann first formulated the Boltzmann distribution, P(s) ~ exp(-E_s/kT), when he studied the statistical mechanics of gases in equilibrium. He was also a pioneer of modern atomic theory. In his later life, he defended atomic theory in debates against other eminent physicists of the period. Some believe his tragic suicide in 1906 was related to depression caused by this long fight with his rival scientists. (I think this could be an interesting book to read: Boltzmann’s Atom: The Great Debate That Launched A Revolution In Physics.)

The Boltzmann distribution P(s) ~ exp(-E_s/kT) gives the probability of finding a system in a state s with energy E_s. The state s is typically determined by some collective pattern of microscopic physical entities, e.g., a particular configuration of a set of spins in up or down states. This probability distribution links the microscopic world to macroscopic observables. For example, one can derive the ideal gas law (PV = nRT) from the assumption that an ideal gas is made of simple atoms whose energy states follow the Boltzmann distribution.
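As a minimal sketch (not from the original post, and with made-up energies for illustration), the Boltzmann distribution is just an exponential weighting of states by energy, normalized by the partition function:

```python
import math

def boltzmann_probabilities(energies, kT=1.0):
    """P(s) ~ exp(-E_s / kT), normalized over all states s."""
    weights = [math.exp(-e / kT) for e in energies]
    z = sum(weights)  # the partition function Z
    return [w / z for w in weights]

# Two hypothetical states with energies 0 and 1 (in units of kT):
# the lower-energy state is more probable, but not certain.
probs = boltzmann_probabilities([0.0, 1.0], kT=1.0)
```

Raising kT flattens the distribution (all states become nearly equally likely); lowering it concentrates probability on the lowest-energy states.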

The Boltzmann Machine (BM) and Restricted Boltzmann Machine (RBM) were introduced in the mid-1980s as artificial neural network models that could learn from a set of training patterns and “recreate” those patterns autonomously as a “generative model”. To some degree, what D. Ackley, G. Hinton, T. Sejnowski and others showed was that it is possible to build a physical system whose Boltzmann distribution P(s) reflects the probability distribution of the training set.

The energy model used in BMs and RBMs is closely related to the “spin glass” model in statistical mechanics. It is a very simple model. You don’t need to know Dirac’s original explanation of the existence of “spin” from relativity and quantum mechanics. In most of our context, a “spin” is just like a “bit”: some elementary unit s_i, at position i, that has two states, 1/0, “+/-” or “up/down”. “Glass” means the system typically gets stuck in a state that has some internal “entanglement” between those elementary units rather than reaching the lowest possible energy state. In fact, it is because such a system can have complicated tangled state structures that using it for machine learning becomes possible.

The energy of a “spin glass” system is defined by E = -\sum_{ij} W_{ij} s_i s_j. It is a deceptively simple equation that has inspired thousands of research papers studying the properties of such systems in the physics and machine learning communities. While the equation is easy to write down, the complexity comes from the fact that the “connection weights” W_{ij} can take complicated forms.
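A minimal sketch of this energy function, assuming spins in {-1, +1} and the usual sign convention where positive couplings reward aligned spins (the toy coupling matrix is my own example, not from the post):

```python
import itertools

def energy(W, s):
    """E = -sum over pairs i<j of W[i][j] * s[i] * s[j]."""
    n = len(s)
    return -sum(W[i][j] * s[i] * s[j]
                for i in range(n) for j in range(i + 1, n))

# Three spins with all-positive (ferromagnetic) couplings:
W = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]

# Brute-force the ground state over all 2^3 spin configurations.
states = list(itertools.product([-1, 1], repeat=3))
ground = min(states, key=lambda s: energy(W, s))
```

With all-positive weights the ground states are simply the fully aligned configurations (all up or all down), which is why that case is easy.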

When the connection weights W_{ij} are all positive, the problem is relatively simple. On the other hand, when the connection weights W_{ij} can be both positive and negative, finding the “low energy states” suddenly becomes a non-trivial problem. I would like to come back to this some time in the future. However, it is exactly such non-trivial behavior that makes the machine useful for machine learning.

Back to BMs and RBMs: when I started to study the original papers introducing these physics-inspired machine-learning models, I was surprised to find the term “knowledge atoms” used in Paul Smolensky’s paper that introduced the RBM to explain the “nature of knowledge”.

“Point 5. Knowledge atoms are fragments of representations that accumulate with experience.” — Smolensky, Information Processing in Dynamical Systems: Foundations of Harmony Theory, 1986

I think, like Boltzmann, Smolensky was probably thinking about how a mind could “simplify” the complicated macroscopic world, the way physicists explained all the different phases of matter with a simple atomic theory. A physical model like the RBM provides a mechanism that can “summarize” the explicit information (representation features / training data) with the hidden units (knowledge atoms). Smolensky borrowed a couple of useful concepts from statistical mechanics, e.g., simulated annealing, phase transitions and symmetry breaking, to explain how such a system can model how these “knowledge atoms” work.
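A minimal sketch of how an RBM’s hidden units can act as such summarizers, using the standard conditional P(h_j = 1 | v) = sigmoid(c_j + sum_i v_i W_ij). The toy weights below are my own illustration (hand-wired, not learned), chosen so that one hidden unit responds to a specific visible pattern:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_hidden(v, W, c):
    """Sample hidden units given visible units v.

    probs[j] = sigmoid(c[j] + sum_i v[i] * W[i][j]); each hidden
    'knowledge atom' turns on when its weighted pattern is present.
    """
    probs = [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
             for j in range(len(c))]
    samples = [1 if random.random() < p else 0 for p in probs]
    return samples, probs

# Hypothetical toy RBM: 4 visible units, 2 hidden units. Hidden unit 0
# is wired to detect the fragment (1, 1, *, *); hidden unit 1 detects (*, *, 1, 1).
W = [[4, 0],
     [4, 0],
     [0, 4],
     [0, 4]]
c = [-4, -4]  # hidden biases

_, probs = sample_hidden([1, 1, 0, 0], W, c)
```

Presented with the visible pattern (1, 1, 0, 0), the first hidden unit activates with high probability and the second stays off: a fragment of the input has been “summarized” into one hidden unit. In a trained RBM these weights are learned (e.g., by contrastive divergence) rather than wired by hand.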

Before reading Smolensky’s paper, I had never heard anyone use the term “knowledge atoms” (4,540 hits vs. 81,200 hits for “Restricted Boltzmann Machine” on a Google search). Maybe Smolensky was a little too ambitious in coining the term “knowledge atoms” back then. But if we take what happened in the second half of the 19th century as a guide, maybe we just don’t have the right “instruments” to “measure and confirm” such “knowledge atoms” yet. I think this is another exciting era for discovery, given the scale of computation available and the continuous progress of scientific instrument development today.

Whether we can identify such “knowledge atoms” beyond some sort of abstract concept remains to be seen. The recent progress of deep learning has shown great utility for solving real-world problems even ahead of theory. To some degree, with all the different DL toolsets, we might be like those “engineers” who designed (real thermal) “engines” in the 19th century. We will build better AI machines even without a full understanding of how they work. But through that process, we may eventually learn what such “knowledge atoms” are, and that could lead to a better understanding of how we ourselves think.


