## February 14, 2010

### Algorithmic Thermodynamics

#### Posted by John Baez

Mike Stay and I would love comments on this paper:

We got into writing this because Mike has worked a lot on algorithmic information theory. Gregory Chaitin has described this subject as “the result of putting Shannon’s information theory and Turing’s computability theory into a cocktail shaker and shaking vigorously.” In particular, this theory has revealed a deep link between ‘algorithmic entropy’ and the notion of entropy used in information theory and thermodynamics.

The algorithmic entropy of a string of bits is a measure of how much information that string contains. It’s a bit subtler than the length of the shortest program that prints out that string, but it’s close: the difference between these is bounded by a constant. Anyway, the idea is that a random string like this:

$1010001011010111010010010101101110101000101010000101110$

has more algorithmic entropy than one with a simple pattern, like this:

$0000000000000000000000000000000000000000000000000000000$
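Kolmogorov complexity is uncomputable, so no program measures algorithmic entropy exactly. But a general-purpose compressor gives a crude, computable upper bound, enough to see the contrast between the two strings above. (Using zlib as a stand-in is purely my illustration here, not anything from the paper.)

```python
import zlib

# A string with a simple pattern compresses far better than a
# "random-looking" one, mimicking the difference in algorithmic entropy.
random_looking = "1010001011010111010010010101101110101000101010000101110"
patterned = "0" * len(random_looking)

def compressed_size(s: str) -> int:
    """Length in bytes of the zlib-compressed string: a rough upper
    bound on its information content."""
    return len(zlib.compress(s.encode("ascii"), level=9))

print(compressed_size(random_looking))  # larger
print(compressed_size(patterned))       # smaller
```

Of course a compressor is only a proxy: it bounds the true complexity from above but can badly overestimate it for strings whose pattern it fails to spot.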

On the other hand, in thermodynamics we define the Gibbs entropy of a probability measure $p$ on a countable set $X$ to be

$S(p) = - \sum_{x \in X} p(x) \ln p(x)$

If we think of $X$ as a set of possible outcomes of an experiment, the entropy says how much information we gain — on average — by learning the result of the experiment, if beforehand all we knew was that each outcome $x$ happened with probability $p(x)$.
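Here is a minimal sketch of the Gibbs entropy formula in code, with a dictionary of outcome probabilities; the example distributions are invented:

```python
from math import log

def gibbs_entropy(p):
    """S(p) = -sum_x p(x) ln p(x), in nats; terms with p(x) = 0 contribute 0."""
    return -sum(px * log(px) for px in p.values() if px > 0)

# Uniform measure on 4 outcomes: learning the result gains ln 4 nats.
uniform = {x: 0.25 for x in "abcd"}
print(gibbs_entropy(uniform))  # ln 4 ≈ 1.386

# A certain outcome carries no information at all.
certain = {"a": 1.0}
print(gibbs_entropy(certain))  # 0.0
```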

If the set of outcomes is a set of bit strings, or some other set of ‘signals’, Gibbs entropy has another name: Shannon entropy. Gibbs was interested in chemistry and steam engines. Shannon was interested in communication, and measuring the amount of information in a signal. But the math is the same. This is a great story in itself! If you don’t know this story, try the links.

On the other hand, algorithmic entropy seems really different. Why? Both algorithmic entropy and Shannon entropy can be used to measure the amount of information in bit strings. But algorithmic entropy can measure the information in a single bit string, while Shannon entropy applies only to probability measures on bit strings.

Nonetheless, there are deep relationships between these two kinds of entropy. So, when I started talking about thermodynamics as part of a big set of analogies between different theories of physics in week289, Mike naturally wondered if algorithmic information theory should become part of this story. And that’s how this paper was born!

Here’s the idea:

Abstract: Algorithmic entropy can be seen as a special case of entropy as studied in statistical mechanics. This viewpoint allows us to apply many techniques developed for use in thermodynamics to the subject of algorithmic information theory. In particular, suppose we fix a universal prefix-free Turing machine and let $X$ be the set of programs that halt for this machine. Then we can regard $X$ as a set of ‘microstates’, and treat any function on $X$ as an ‘observable’. For any collection of observables, we can study the Gibbs ensemble that maximizes entropy subject to constraints on expected values of these observables. We illustrate this by taking the log runtime, length, and output of a program as observables analogous to the energy $E$, volume $V$ and number of molecules $N$ in a container of gas. The conjugate variables of these observables allow us to define quantities which we call the ‘algorithmic temperature’ $T$, ‘algorithmic pressure’ $P$ and ‘algorithmic potential’ $\mu$, since they are analogous to the temperature, pressure and chemical potential. We derive an analogue of the fundamental thermodynamic relation

$d E = T d S - P d V + \mu d N,$

and use it to study thermodynamic cycles analogous to those for heat engines. We also investigate the values of $T, P$ and $\mu$ for which the partition function converges. At some points on the boundary of this domain of convergence, the partition function becomes uncomputable. Indeed, at these points the partition function itself has nontrivial algorithmic entropy.
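To get a feel for the convergence question at the end of the abstract, here is a deliberately crude toy model (not the machine in the paper): pretend, as an upper bound, that every bit string of length $n$ were a halting program, and weight each by $2^{-n/T}$ as in a Gibbs ensemble over program length. The partition function is then a geometric series with ratio $2^{1 - 1/T}$, which converges only for $T < 1$:

```python
def partial_Z(T: float, n_max: int) -> float:
    """Partial sum of Z(T) = sum_n 2^n * 2^{-n/T}: 2^n strings of
    length n, each weighted by 2^{-n/T}."""
    return sum(2.0 ** (n * (1.0 - 1.0 / T)) for n in range(n_max + 1))

print(partial_Z(0.5, 1000))  # converges: approaches 2.0
print(partial_Z(2.0, 60))    # grows without bound as n_max increases
```

In the paper the set of halting programs is of course much sparser than "all strings", so this only caricatures why a domain of convergence in $T$ appears at all.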

As the first sentence hints, one of the fun things we noticed is that algorithmic entropy is a special case of Gibbs entropy — but only if we generalize a bit and use relative entropy. They say “everything is relative”. I don’t know if that’s true, but it’s sure true for entropy. Here’s the entropy of the probability measure $p$ relative to the probability measure $q$:

$S(p,q) = - \sum_{x \in X} p(x) \ln \left(\frac{p(x)}{q(x)}\right)$

This says how much information we gain when we learn that the outcomes of an experiment are distributed according to $p$, if beforehand we had thought they were distributed according to $q$.
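A small numerical sketch of this definition, keeping the sign convention used above (which is minus the usual Kullback-Leibler divergence); the example distributions are invented:

```python
from math import log

def relative_entropy(p, q):
    """S(p, q) = -sum_x p(x) ln(p(x)/q(x)), following the sign
    convention in the post."""
    return -sum(p[x] * log(p[x] / q[x]) for x in p if p[x] > 0)

p = {"heads": 0.9, "tails": 0.1}  # what we learn the distribution is
q = {"heads": 0.5, "tails": 0.5}  # what we believed beforehand

print(relative_entropy(p, q))  # ≈ -0.368 with this sign convention
print(relative_entropy(p, p))  # 0.0: no update, nothing learned
```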

Another fun thing about this paper is how it turns around an old idea of Charles Babbage. His Analytical Engine was a theoretical design for a computer powered by a steam engine. We describe a theoretical design for a heat engine powered by programs!

Posted at February 14, 2010 6:37 PM UTC


### Re: Algorithmic Thermodynamics

Quoting from the paper:

“Charles Babbage described a computer powered by a steam engine; we describe a heat engine powered by programs! We admit that the signiﬁcance of this line of thinking remains a bit mysterious. However, we hope it points the way toward a further synthesis of algorithmic information theory and thermodynamics. We call this hoped-for synthesis ‘algorithmic thermodynamics’.”

Though programs can be appreciated abstractly, I think people lose sight of the physical implementation of a program: the unfolding of a stored electromagnetic configuration on a hard drive. I think it was Bennett who said that a computable operation is no longer reversible after a release of heat has occurred.

This reminded me of the other paper you guys collaborated on, something like: Does the HUP imply Godelian Incompleteness?

Posted by: Stephen Harris on February 15, 2010 5:46 AM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

Does the HUP imply Godelian Incompleteness?

That was actually Cris Calude and me.

In this latest paper, John and I don’t try to connect physical thermodynamics to algorithmic thermodynamics; I certainly intend to look into it, though!

Posted by: Mike Stay on February 15, 2010 6:15 AM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

Li and Vitanyi talk about an algorithmic approach to physical thermodynamics by considering the shortest program that outputs a description of the physical state. See chapter 8 of their book we cite.

Posted by: Mike Stay on February 15, 2010 6:31 AM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

Apparently John’s having trouble posting, so he asked me to post this for him:

Mike wrote:

In this latest paper, John and I don’t try to connect physical thermodynamics to algorithmic thermodynamics…

People could find this confusing. After all, we show that the math of algorithmic thermodynamics is formally identical to the math of physical thermodynamics! So we do connect them in that way.

But I think you mean this: nothing in our paper has anything to do with the physical implementation of programs on hardware, or the application of thermodynamics to the study of hardware. So in that sense, algorithmic thermodynamics remains unconnected to the physical thermodynamics of computers.

Which is indeed something that would be good to fix, someday….

Posted by: Mike Stay on February 15, 2010 8:04 AM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

Yes, I did read too much into the paper to arrive at a stronger physical connection. Perhaps I also over-interpreted the Wikipedia article on Shannon entropy and Information Theory. I also remember reading somewhere that Shannon constructed his entropy to look like Boltzmann entropy.

“But, at a multidisciplinary level, connections can be made between thermodynamic and informational entropy, although it took many years in the development of the theories of statistical mechanics and information theory to make the relationship fully apparent. In fact, in the view of Jaynes (1957), thermodynamics should be seen as an application of Shannon’s information theory: the thermodynamic entropy is interpreted as being an estimate of the amount of further Shannon information needed to define the detailed microscopic state of the system, that remains uncommunicated by a description solely in terms of the macroscopic variables of classical thermodynamics. For example, adding heat to a system increases its thermodynamic entropy because it increases the number of possible microscopic states that it could be in, thus making any complete state description longer.”

Introduction to Algorithmic Information Theory by Nick Szabo

“Recent discoveries have unified the fields of computer science and information theory into the field of algorithmic information theory. This field is also known by its main result, Kolmogorov complexity. Kolmogorov complexity gives us a new way to grasp the mathematics of information, which is used to describe the structures of the world.”

SH: I guess I jumped to my conclusion because I’ve read some descriptions not worded precisely enough.

Posted by: Stephen Harris on February 15, 2010 9:18 AM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

An aside…

I am a big fan of Jaynes, so I’m happy whenever I see his name pop up.

Posted by: Eric Forgy on February 15, 2010 9:32 AM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

Hi,

I haven’t read the paper yet, but while I have a passing moment here, I hope to quickly relate this to a recent challenge I posed in a series of articles:

Information is physical and the ability to transmit wireless information is related to the ability to transmit electromagnetic waves. It would be interesting to study information theory in relation to electromagnetic theory.

Time’s up…

Posted by: Eric Forgy on February 15, 2010 7:03 AM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

Nice paper!

When talking about “programs”, you do not distinguish executable code and data - it is customary not to make this distinction in the present context, but maybe some of your readers will be thankful if you explain this in the introduction :-)

A prefix-free Turing machine is one whose halting programs form a prefix-free set.

There is something here that I don’t get: If the machine halts if fed with a program X, it is supposed not to halt for every other program that starts with X?
(That sounds strange to me, maybe it’s easier for me to think of equivalence classes of programs instead, I’ll try that).

Some misprints:

p.8, double “the”: We use μ/T to stand for the the conjugate variable of N.

p.10, verb should denote plural: Then simple calculations, familiar from statistical mechanics [17], shows…

p.10, missing “to” in “useful to think”: To build intuition, it is useful think of the entropy S

p.10: One “the” too much: The algorithmic temperature, T , is the roughly the number of

[John Baez: Thanks for all the corrections! For some reason I’m not being allowed to post comments. Until this gets fixed, I will have to manifest myself as ‘the ghost in the machine’.

You write:

When talking about “programs”, you do not distinguish executable code and data…

Yeah, sorry. Our way of speaking would probably be understood as convenient slang among people who work on algorithmic information theory. But since we’re hoping to expand our audience beyond those folks, we should be clearer.

If everyone blogged about their papers before publishing them, everyone would see how many obstacles to understanding their papers contain!

There is something here that I don’t get: If the machine halts if fed with a program X, it is supposed not to halt for every other program that starts with X?

The issue of ‘halting’ is not really the main point here. The key idea is that a program should not contain another as an initial segment. So intuitively, just imagine that every program must end with a string that says

THE END

written out in binary.

As explained in the quote by Mike here, this is just a convenient trick for making

$\sum_X 2^{-|X|}$

converge when we sum over all programs X. This lets us define a probability measure on programs — or more precisely, on the set of things we’re allowed to feed our universal Turing machine! That’s very convenient.
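The convergence here is the Kraft inequality: for any prefix-free set of codewords, the weights $2^{-|X|}$ sum to at most 1, so they can serve as (sub)probabilities. A quick check on a small hand-picked code:

```python
def is_prefix_free(codes):
    """True iff no codeword is a proper prefix of another."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

codes = ["0", "10", "110", "111"]  # a prefix-free code
assert is_prefix_free(codes)

kraft_sum = sum(2.0 ** -len(x) for x in codes)
print(kraft_sum)  # 1.0: this particular code is "complete"

bad = ["0", "01"]  # "0" is a prefix of "01", so not prefix-free
assert not is_prefix_free(bad)
```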

Mike’s thesis attributes this trick to Chaitin, while the Wikipedia article says “There are several variants of Kolmogorov complexity or algorithmic information; the most widely used one is based on self-delimiting programs and is mainly due to Leonid Levin (1974).”]

Posted by: Tim van Beek on February 15, 2010 9:41 AM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

When you show the two strings above and say that the uppermost has more algorithmic entropy, my first thought is that this is just a new name for Kolmogorov complexity, an impression that sticks with me for a while as I’m reading. On page 6 you make the connection to Kolmogorov (algorithmic information), but I wonder if it wouldn’t have been more intuitive to some if Kolmogorov was there from page 1.

This is the problem with uniting ideas in different fields I guess. What’s “intuitive” depends on your vantage point :-)

[John Baez: For some reason I’m not being allowed to post comments. But I can still edit other people’s comments. So, I’ll manifest myself in this unconventional way. Sorry.

You’re right: ‘Algorithmic entropy’ is almost the same as ‘Kolmogorov complexity’, and we should mention this.

The phrases ‘descriptive complexity’, ‘Kolmogorov-Chaitin complexity’, ‘Solomonoff-Kolmogorov-Chaitin complexity’, ‘stochastic complexity’, and ‘program-size complexity’ are other near-synonyms. Unfortunately, I don’t think everyone agrees on what all these terms mean! And in our work the details matter. Mike’s thesis summarizes the story:

In the mid-1960’s, Kolmogorov, Solomonoff, and Chaitin independently proposed the idea of using programs to describe the complexity of strings. This gave birth to the field of algorithmic information theory (AIT). The Kolmogorov complexity KU(s) of a string s is the length of the shortest program in the programming language U whose output is s. Solomonoff proposed a weighted sum over the programs, but it was dominated by the shortest program, so his approach and Kolmogorov’s are roughly equivalent. Chaitin added the restriction that the programs themselves must be codewords in an instantaneous code, giving rise to prefix-free AIT. In this model, complexities become true probabilities, and Shannon’s information theory applies directly.

In our paper, we fix a universal prefix-free Turing machine $U$, and define the algorithmic entropy of a natural number $n$ to be

$- \ln \left( \sum_{U(x) = n} 2^{-|x|} \right)$

Here we are summing over all bit strings $x$ that give $n$ as output, and we use $|x|$ to mean the length of the bit string $x$.

If one program x having n as output is a lot shorter than all the rest, then the algorithmic entropy of n is about (ln2) |x|. And in this case its Kolmogorov complexity is just |x|.

More generally, the Kolmogorov complexity of $n$ is the length of the shortest program having $n$ as output, and there’s a bound on the difference between the Kolmogorov complexity and the algorithmic entropy divided by $\ln 2$. The $\ln 2$ is no big deal: it shows up merely because physicists prefer natural logarithms, while computer scientists prefer base 2.

To make the above ideas precise, let me define the concept of a ‘universal prefix-free Turing machine’. This idea plays a basic role in modern algorithmic information theory.

Let me use string to mean a bit string, that is, a finite, possibly empty, list of 0’s and 1’s. If x and y are strings, let xy be the concatenation of x and y. A prefix of a string z is a substring beginning with the first letter, that is, a string x such that z = xy for some y.

A prefix-free set of strings is one in which no element is a prefix of any other. The domain dom(M) of a Turing machine M is the set of strings that cause M to eventually halt. We call the strings in dom(M) programs. We assume that when M halts on the program x, it outputs a natural number M(x). Thus we may think of the machine M as giving a function from dom(M) to the natural numbers.

A prefix-free Turing machine is one whose halting programs form a prefix-free set. A prefix-free machine U is universal if for any prefix-free Turing machine M there exists a constant c such that for each string x, there exists a string y with

U(y) = M(x)

and

|y| ≤ |x| + c.

This gobbledygook says that U can simulate M.
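To make these definitions concrete, here is a toy finite stand-in for a prefix-free machine (certainly not universal, purely illustrative), together with the algorithmic entropy of an output as defined earlier in this comment:

```python
from math import log

# A toy "machine": a finite map from halting programs to natural-number
# outputs, whose domain is a prefix-free set of bit strings.
M = {"0": 1, "10": 2, "110": 1, "111": 3}

def check_prefix_free(machine):
    dom = list(machine)
    assert not any(a != b and b.startswith(a) for a in dom for b in dom)

check_prefix_free(M)

def algorithmic_entropy(machine, n: int) -> float:
    """-ln( sum of 2^{-|x|} over programs x with output n )."""
    total = sum(2.0 ** -len(x) for x, out in machine.items() if out == n)
    return -log(total)

# Two programs ("0" and "110") output 1, so its total weight is 1/2 + 1/8:
print(algorithmic_entropy(M, 1))  # -ln(0.625) ≈ 0.47
```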

All this stuff and more is in our paper!

Challenge for category theorists: create a category where a ‘universal’ Turing machine actually has some universal property. If the Turing machine M can simulate the Turing machine N, we should have a morphism from M to N. Then a universal Turing machine should be something like a weak initial object: it has a morphism, not necessarily unique, to every other Turing machine.]

Posted by: M on February 15, 2010 12:15 PM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

cf. What’s “explicit” depends on your vantage point. :-) Which leads to controversies such as: my explicit is more explicit than your explicit. For a more reasoned treatment, see Barry Mazur’s ‘Visions, Dreams and Mathematics’, available from his web page.
www.math.harvard.edu/~mazur/
www.math.harvard.edu/~mazur/

Posted by: jim stasheff on February 15, 2010 1:00 PM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

M wrote: “When you show the two strings above and say that the uppermost has more algorithmic entropy, my first thought is that this is just a new name for kolmogorov complexity, an impression that sticks with me for a while as I’m reading. On page 6 you make the connection to kolmogorov (algorithmic information) but I wonder if it wouldn’t have been more intuitive to some, if kolmogorov was there from page 1.”

SH: Your intuition is correct, at least as far as concerning what is usually meant by algorithmic entropy.

In algorithmic information theory (a subfield of computer science), the Kolmogorov complexity (also known as descriptive complexity, Kolmogorov-Chaitin complexity, stochastic complexity, *algorithmic entropy*, or program-size complexity) of an object such as a piece of text is a measure of the computational resources needed to specify the object.

Posted by: Stephen Harris on February 15, 2010 8:30 PM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

When I started my post (above) which covers part of JB’s reply, JB’s post had not yet appeared or I wouldn’t have duplicated his superior effort.

Posted by: Stephen Harris on February 15, 2010 8:46 PM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

The ghost wrote:

So intuitively, just imagine that every program must end with a string that says THE END written out in binary.

Ok, I think I get what you mean - frankly, that is something I already suspected - and I thought about allowing a program to continue after the end marker: let the machine ignore everything after the end marker, consider two programs as equivalent iff the parts before the end marker are equal, and work with these equivalence classes.

Somehow that seemed more “natural” to me than demanding that the machine is prefix-free (a matter of taste, I guess).

Everything after the end marker would be called “dead code” by a practitioner; maybe I see too much of that every day :-)

Posted by: Tim van Beek on February 16, 2010 6:18 AM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

Challenge for category theorists: create a category where a ‘universal’ Turing machine actually has some universal property. If the Turing machine M can simulate the Turing machine N, we should have a morphism from M to N. Then a universal Turing machine should be something like a weak initial object: it has a morphism, not necessarily unique, to every other Turing machine.

This is a challenge you guys solved a long, long time ago! :)

Universal computers actually need a stronger property than the one you’ve described. Since programs can treat data as programs, we don’t just want morphisms from TMs to TMs, we want the category to have a universal object – that is, it should have an object $U$ such that every object $A$ is a retract of $U$.

Dana Scott exhibited such a model for the untyped lambda calculus way back in the late 60s (the famous $D_ \infty$ construction, which gives a $D$ such that $D \simeq D \to D$). This model construction applies to Turing machines, too, because of Kleene’s $s-m-n$ theorem from recursion theory, which basically says that Turing machines/Godel codes/whatever form a partial combinatory algebra, which is a fancy way of saying that they can model the S and K combinators of the untyped lambda calculus.

John Longley has a nice paper, “Universal Types And What They Are Good For”, which is available at

http://homepages.inf.ed.ac.uk/jrl/Research/universal.pdf

Posted by: Neel Krishnaswami on February 16, 2010 9:33 AM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

1. I think the term “cross-entropy” is much better established than “relative entropy”. (Let’s pass over “Kullback-Leibler distance”.)

2. Would it be meaningful to ask what a phase transition means, on the algorithmic side?

[John Baez: Sorry, I’ll have to reply this way.

1. Cross-entropy may be more commonly discussed than relative entropy, but unfortunately it means something else. Given probability measures p and q on a countable set X, the cross-entropy of p with respect to q is:

$- \sum_{x \in X} p(x) \ln q(x)$

while the relative entropy of p with respect to q is:

$- \sum_{x \in X} p(x) \ln \left( \frac{p(x)}{q(x)} \right)$

They’re closely related: with the sign conventions above, the relative entropy of p with respect to q is the entropy of p minus the cross-entropy of p with respect to q. I like relative entropy better because I know how to define it for any pair of probability measures p and q on a measure space:

$- \int_X \ln \left( \frac{d p}{d q} \right) \, d p$

as long as p is absolutely continuous with respect to q, so the Radon-Nikodym derivative dp/dq is well-defined.
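A quick numerical check of how these three quantities fit together under the sign conventions displayed above (the distributions are invented for illustration):

```python
from math import log, isclose

def entropy(p):
    return -sum(px * log(px) for px in p.values() if px > 0)

def cross_entropy(p, q):
    return -sum(p[x] * log(q[x]) for x in p if p[x] > 0)

def relative_entropy(p, q):
    return -sum(p[x] * log(p[x] / q[x]) for x in p if p[x] > 0)

p = {"a": 0.7, "b": 0.3}
q = {"a": 0.4, "b": 0.6}

# With these sign conventions, relative entropy = entropy - cross-entropy.
assert isclose(relative_entropy(p, q), entropy(p) - cross_entropy(p, q))
```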

But some good may come of all this terminological nonsense: maybe some of our formulas would simplify if we used cross-entropy.

2. I’m definitely interested in finding and studying critical points in algorithmic thermodynamics. Manin did something similar in the papers Zoran mentioned here. Unfortunately Manin’s functions are all uncomputable! Our partition function is computable except in the T → ∞ limit: the ‘high-temperature limit’, where programs with long runtimes count a lot so the halting problem kicks in. I would love to extract interesting information from studying the behavior of the partition function as T → ∞, but we haven’t figured out how yet.]

Posted by: Allen Knutson on February 15, 2010 1:48 PM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

Some ideas on connections between computation/information and statistical physics/thermodynamics are contained in 3 recent papers of Yuri Manin

* Yuri Manin, Renormalization and computation I: motivation and background, arXiv:0904.4921; Renormalization and Computation II: Time Cut-off and the Halting Problem, arXiv:0908.3430

* Yuri Manin, Matilde Marcolli, Error-correcting codes and phase transitions, arXiv:0910.5135

Posted by: Zoran Skoda on February 15, 2010 1:49 PM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

I was wondering what you get when you restrict the possible programs to cellular automata: programs that change pieces of data only according to “neighbour” data.

Gerard

Posted by: Gerard Westendorp on February 15, 2010 6:02 PM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

Since Stephen Wolfram proved (or got very, very close to proving) that some 1D cellular automata are Turing machines, what you suggest wouldn’t change much, as far as I can see.

Posted by: M on February 16, 2010 11:47 AM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

I haven’t had time yet to think about this deeply, but it seems to me that cellular automata do have some features that make computation look more like physics than more general programs do.
For example, the ‘algorithmic volume’ associated with a cellular automaton is just a constant times the number of cells.
I don’t know how you should define the halting time of a cellular automaton. You could say a cellular automaton always goes on forever, even if the data turns extremely boring after a while. On the other hand, you could also say that the program always halts after just one step: it processes a state (t) into a state (t+1). If you want to know state (t+2), you have to run the program again.

Gerard

Posted by: Gerard Westendorp on February 17, 2010 10:43 PM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

You can have one cell of the CA be special; if it ever turns on, then the CA stops.
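Mike’s suggestion can be sketched directly; everything below (rule 110, the ring size, which cell is special) is an arbitrary choice for illustration:

```python
import random

def step(cells, rule=110):
    """One step of an elementary CA on a ring, by the standard rule-number
    convention: the neighborhood (l, c, r) indexes a bit of `rule`."""
    n = len(cells)
    return [
        (rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

def run_until_halt(cells, halt_cell=0, max_steps=1000):
    """Steps until the designated cell turns on ("halting"), returning the
    number of steps taken, or None if it never turns on within max_steps."""
    for t in range(max_steps):
        if cells[halt_cell] == 1:
            return t
        cells = step(cells)
    return None

random.seed(1)
cells = [0] + [random.randint(0, 1) for _ in range(31)]  # halt cell starts off
print(run_until_halt(cells))
```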

Posted by: Mike Stay on February 18, 2010 2:19 AM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

This seems a bit arbitrary for the equivalent of energy. Also, the program
For i = 1 to N: Next i
has a halting time that is proportional to N. But that seems equally unsatisfying for the equivalent of energy.
If you fill in random points in Conway’s Game of Life, you often see an initial stage in which things get tidied up. After a while, you often get a ‘boring’ state. If you could somehow quantify this, it would seem to be a better analog of energy.

Gerard

Posted by: Gerard Westendorp on February 18, 2010 7:32 AM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

Wolfram’s conjecture that the “Rule 110” CA is capable of universal computation was proved by Matthew Cook. The politics and drama surrounding that proof are, I’d say, beyond the scope of this comment thread.

Posted by: Blake Stacey on February 16, 2010 10:05 PM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

Do you find analogues to all the usual laws of thermodynamics? I wonder what the equivalent of a perpetual motion machine would be.

Posted by: David Corfield on February 16, 2010 9:05 AM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

Sorry for my exceedingly naive thoughts: Let’s take a universal prefix-free Turing machine U, as in the paper.

If the set X of programs that make U halt is the analog of the set of microstates, we could take “U executes program x” as the analog of “an ideal gas is in microstate y”.

We would then need a process that takes our system from one microstate to the next. Naive idea: We could feed U the output of the program that it executed (assuming that input and output are bit strings, i.e. when the paper says “the output is a natural number” I assume that the output is a bit string interpreted as a natural number by a fixed encoding).

If the energy of the system is the log of the runtime of the program it executes, a perpetuum mobile of the first kind would be a $U$ and a program $x_0$ such that, if $x_{n+1}$ is the output of $U(x_n)$ for n = 0, 1, 2, …, then the execution time of $U(x_{n+1})$ is bigger than that of $U(x_n)$.

(Sorry for my lack of proficiency in tex).

After googling a bit I discovered that Michael Stay already wrote a paper about simple concrete models of prefix-free Turing machines (it comes up as the first and second hit).

I’ll try to read it, I promise! But do the models allow the situation I described?

Posted by: Tim van Beek on February 16, 2010 5:33 PM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

Sorry for my exceedingly naive thoughts: Let’s take a universal prefix-free Turing machine U, as in the paper.

If the set X of programs that make U halt is the analog of the set of microstates, we could take “U executes program x” as the analog of “an ideal gas is in microstate y”.

Yep, that’s the analogy.

We would then need a process that takes our system from one microstate to the next.

Not in order to do thermodynamics; one just assumes that all accessible states (i.e. within some delta of the given $E, V, N$) are equiprobable. Then the entropy is just the log of the number of accessible states.

We do need an equation of state; that’s given by the particular choice of universal Turing machine.
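That microcanonical recipe is easy to spell out on a toy table of states; the observables and numbers below are invented purely for illustration:

```python
from math import log

# Toy table of "programs" with observables (E, V, N) =
# (log runtime, length, output).  All values are made up.
states = [
    {"E": 2.1, "V": 5, "N": 3},
    {"E": 2.0, "V": 5, "N": 3},
    {"E": 2.2, "V": 5, "N": 3},
    {"E": 9.0, "V": 7, "N": 1},
]

def microcanonical_entropy(states, E, V, N, delta=0.5):
    """Log of the number of states within delta of the target E,
    at exactly the target V and N (assumes at least one such state)."""
    accessible = [s for s in states
                  if abs(s["E"] - E) <= delta and s["V"] == V and s["N"] == N]
    return log(len(accessible))

print(microcanonical_entropy(states, E=2.0, V=5, N=3))  # ln 3 ≈ 1.099
```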

After googling a bit I discovered that Michael Stay already wrote a paper about simple concrete models of prefix-free Turing machines (comes up as the first and second hit),

The languages I describe there are useful for doing things like computing exactly how many programs of a given length halt, for small lengths. At one point I considered putting an explicit calculation alongside the description of the thermodynamic cycle in the new paper.

Posted by: Mike Stay on February 16, 2010 10:02 PM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

Mike wrote:

At one point I considered putting an explicit calculation alongside the description of the thermodynamic cycle in the new paper.

Even if you don’t, you could add a reference to your paper and note that it explains prefix-free Turing machines in more detail; that small amount of immodest cross-referencing will be forgiven :-) (at least by people like me who did not know this notion).

Or if there is a more appropriate reference, it would be useful to have it somewhere between “To make this precise, we recall the concept of a universal prefix-free Turing machine.” and “Then we can define some probability measures on X = dom(U) as follows.” on page 6.

Posted by: Tim van Beek on February 17, 2010 8:52 AM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

David Corfield wrote:

Do you find analogues to all the usual laws of thermodynamics?

Interesting question. Before I answer it, though, I should emphasize this:

In practice what really matters for a lot of thermodynamics is not those famous numbered ‘laws’, but the way we can use the formula for the Gibbs ensemble to derive a host of formulas relating the entropy, the partition function, various observables, their mean values, variances and higher moments, and the conjugate variables of these observables! That is what we study in our paper.

But anyway, now that you mention it, I guess a lot of people will be curious about those laws. Here’s the story:

The zeroth law is true in our setup, but we haven’t mentioned that in our paper yet, since it’s mainly concerned with a single system in equilibrium — a single ensemble of programs — rather than two in contact.

The first law is conservation of energy. We don’t have that, since we don’t have any concept of time in our setup. We could remedy this by treating computers as systems that evolve in time with time evolution generated by a Hamiltonian.

Nonetheless we do derive an analogue of the fundamental thermodynamic relation, which is closely related to energy conservation, namely

$d E = T d S - P d V$

or the fancier version in my blog entry above. This says that the change in energy of our ensemble of programs is the change in heat minus the work done by this ensemble.

We also lack the second law, namely the idea that entropy always increases, again because we have no concept of time.

I haven’t checked, but I believe we might be able to derive the third law, namely that entropy reaches a minimum at absolute zero.

Posted by: John Baez on February 17, 2010 7:44 PM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

John wrote:

…since we don’t have any concept of time in our setup.

Surely I’m missing something obvious again, but wouldn’t the analog of the ideal gas be this: after the Turing machine finishes with one program, let it pick the next program x that it executes according to the probability p(x) given by the Gibbs ensemble?

This provides you with a discrete notion of time without specifying microdynamics, and the analog of the coupling to a heat bath would be the provider of programs that lets the Turing machine pick the next program. (Too easy to be true, but I nevertheless don’t see what’s wrong with the picture.)

Posted by: Tim van Beek on February 18, 2010 4:03 PM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

Tim van Beek writes:

Surely I’m missing something obvious again, but wouldn’t the analog to the ideal gas be this: After the Turing machine finishes with one program, let it pick the next program $x$ that it executes according to the probability $p(x)$ given by the Gibbs ensemble?

You can do this for any Gibbs ensemble, or indeed any probability measure whatsoever. But this is not how the dynamics of an ideal gas actually works. The molecules in the gas do not jump discretely from one state to another in a random manner: they move continuously following some differential equations describing the laws of physics, either classical or quantum. It’s these underlying dynamical laws that are based on the concept of energy, and automatically yield conservation of energy. In a random jump dynamics there would be no conservation of energy except in an average statistical sense.

In short: inventing a random jump dynamics based on the probabilistic description is too easy to be much fun, and it doesn’t get you much. A deeper analogy between algorithmic thermodynamics and ordinary thermodynamics might involve treating a piece of hardware as an actual physical system obeying some differential equations. Then there would really be a god-given concept of energy, time evolution, energy conservation and so on.

But anyway, that’s not what we’re studying in this paper.

Posted by: John Baez on February 18, 2010 9:04 PM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

John wrote:

A deeper analogy between algorithmic thermodynamics and ordinary thermodynamics might involve treating a piece of hardware as an actual physical system obeying some differential equations.

These are the dots I did not connect: I thought it was all about describing the Turing machine by some sort of microdynamics, which would have to be a discrete-time system, assuming that the set of atomic operations it can perform is finite (equivalently, that there is a program $x$ where the runtime of $U(x)$ attains its minimum). That is the part where the analogy to the ideal gas breaks down.

Sorry for being slow.

Posted by: Tim van Beek on February 19, 2010 11:54 AM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

You don’t really need the concept of time to derive the second law. All you need is that the phase space volume is the same at the two times.

Posted by: tytung on February 25, 2010 4:14 PM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

Thank you for writing this article, as I am bent on penetrating what has been written in *A New Kind of Science* by Stephen Wolfram.

While I appreciate the use of formulas here as well as at this link as a way to comprehend what Wolfram writes, I also would note that this author claims: “Above 1D no systematic method seems to exist for finding exact formulas for entropies” (p.959) and “So the fact that traditional science and mathematics tends to concentrate on equations that operate like constraints provides yet another reason for their failure to identify the fundamental phenomenon of complexity that I discuss in this book.” (p.221)

For the 1D cellular automata, Wolfram suggests the closest analog to be `Limit[Sum[UnitStep[p[i]], {i, k^n}]/n, n → ∞]`

Do you recognize this notation or does a 1D entropy not apply significantly when using formulas such as the one you write about?

Posted by: Brenda Tucker on February 16, 2010 7:15 PM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

The formula shown in the link below looks like the standardish formula I often see.

Posted by: Stephen Harris on February 16, 2010 9:15 PM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

Wolfram’s NKS is probably not the place you should start from when trying to learn mathematical physics. Try this list of books instead.

Posted by: Mike Stay on February 17, 2010 3:37 AM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

Speaking of good books, everybody has heard of “The Feynman Lectures on Physics”. I chanced upon another volume dedicated to Richard Feynman, with contributions by a bunch of prestigious guys:

*Feynman and Computation: Exploring the Limits of Computers*, edited by Anthony J. G. Hey, which contained:

ALGORITHMIC RANDOMNESS, PHYSICAL ENTROPY, MEASUREMENTS, AND THE DEMON OF CHOICE
W. H. Zurek [SH: The first two sentences are quoted because I found them very funny.]

The cost of erasure eventually offsets whatever thermodynamic advantages the demon’s information gain might offer. This point (which has come to be known as “Landauer’s principle”) is now widely recognized as a key ingredient of thermodynamic demonology.

The ultimate analysis of Maxwell’s demon must involve a definition of intelligence, a characteristic which has been all too consistently banished from discussions of demons carried out by physicists.

Using the Church Turing thesis as a point of departure, the present author has demonstrated that even this intelligent threat to the second law can be eliminated — the original “smart” Maxwell’s demon can be exorcized. This is easiest to establish when one recognizes that the net ability of demons to extract useful work from systems depends on the sum of measures of two distinct aspects of disorder:

Physical entropy is the sum of the statistical entropy and of the algorithmic information content: $S(p) = H(p) + K(p)$

Besides your and JB’s splendid paper, it is surprising how much interest and how many well-written papers this topic provides.
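Zurek’s decomposition can at least be illustrated numerically. The sketch below uses zlib compression as a crude, computable upper-bound proxy for the uncomputable algorithmic term $K$; the data are invented:

```python
import math
import zlib

def shannon_bits(p):
    """Shannon (statistical) entropy in bits of a probability distribution."""
    return -sum(px * math.log2(px) for px in p if px > 0)

def compressed_bits(data):
    """Length in bits of zlib-compressed data: a crude, computable upper
    bound standing in for the uncomputable algorithmic term K."""
    return 8 * len(zlib.compress(data, 9))

# Statistical part: a fair coin carries exactly 1 bit per flip.
H = shannon_bits([0.5, 0.5])

# Algorithmic part: a patterned string compresses far below its raw length,
# while typical random data would not.
patterned = b"01" * 5000
assert compressed_bits(patterned) < 8 * len(patterned)
```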

Posted by: Stephen Harris on February 17, 2010 5:22 AM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

The Feynman Lectures on Computation, to which that volume is a sequel, give a conceptual construction for “a heat engine powered by programs” — or, in Feynman’s case, a data tape. The gist of the argument, as I recall: Each cell of the “tape” contains an atom bouncing around on one side of a partition; if one knew which side of the partition the atom was on, one could position a piston properly and extract energy from the cell. Re-inserting the partition after the energy extraction leaves the cell in a randomized state. If the tape is randomized, there’s no way to know how to configure the piston; a random tape has no fuel value.
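A back-of-the-envelope version of the “fuel value” bound, assuming the standard Szilard-engine figure of $k_B T \ln 2$ of extractable work per known bit (the tape sizes below are illustrative):

```python
import math

k_B = 1.380649e-23  # Boltzmann constant, J/K
T = 300.0           # room temperature, K

def fuel_value(n_bits, n_known):
    """Maximum work in joules extractable from a tape of n_bits cells when
    n_known of them are in a known state: k_B * T * ln(2) per known bit."""
    assert 0 <= n_known <= n_bits
    return n_known * k_B * T * math.log(2)

# A fully known 1000-bit tape versus a fully random one:
ordered = fuel_value(1000, 1000)   # about 2.9e-18 J
random_tape = fuel_value(1000, 0)  # 0.0: a random tape has no fuel value
```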

Posted by: Blake Stacey on February 17, 2010 5:37 PM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

I carefully read the paper many times and would just ask these two questions:

1. What’s the difference in your mind between the # of programs which can be doubled in Algorithmic Temperature and the Mean Length of the programs? It seems as if the length of a program is relative to the number of programs on initial conditions.

2. Why does Tadaki categorize your algorithmic pressure as his algorithmic temperature (or why do you elect to discuss algorithmic temperature as something that he calls pressure?) Does it make little difference?

In Wolfram’s NKS he shows mechanical work is possible due to the orderly cyclical repetitions of his 2-dimensional CA, as long as no object is present to disrupt the simple repetitions (p.446).

Posted by: Brenda Tucker on February 17, 2010 1:16 AM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

Brenda wrote:

What’s the difference in your mind between the # of programs which can be doubled in Algorithmic Temperature and the Mean Length of the programs?

I don’t understand what you mean by “the number of programs which can be doubled in algorithmic temperature”. That phrase doesn’t quite parse.

So, I’ll just say some stuff:

In thermodynamics we have:

$T = d E / d S$

In words: temperature $T$ is the rate of increase of energy $E$ as you increase the Gibbs entropy $S$. In many situations the Gibbs entropy is approximately the logarithm of the number of microstates with a given energy. For an explanation of this approximation see the Wikipedia article on Boltzmann’s entropy formula. Boltzmann came before Gibbs and his formula for entropy is now seen as an approximation to Gibbs’ exact formula.
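The approximation is exact in the simplest case: if all $W$ microstates at a given energy are equally probable, the Gibbs formula reduces to Boltzmann’s $\ln W$. A quick numerical check, with a toy value of $W$:

```python
import math

W = 1024           # number of microstates at the given energy (toy value)
p = [1.0 / W] * W  # uniform, i.e. microcanonical, distribution

gibbs = -sum(px * math.log(px) for px in p)  # Gibbs entropy
boltzmann = math.log(W)                      # Boltzmann entropy ln W

assert math.isclose(gibbs, boltzmann)
```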

If we make this approximation, the above equation says roughly that “temperature is how much we need to increase the energy of a system to multiply the number of microstates having this energy by a factor of $e$.”

Or if we don’t mind being off by a factor of $\ln 2$: “temperature is how much we need to increase the energy of a system to double the number of microstates having this energy.”

In the introduction of our paper we make this approximation. But we’re talking about an analogy where we treat:

• a program as analogous to a microstate,
• the logarithm of the runtime of a program as analogous to the energy of a microstate,
• the ‘algorithmic temperature’ as analogous to the temperature.

So, we may say: “the algorithmic temperature is how much we need to increase the log runtime of a program to double the number of programs having this log runtime.”

Or, again ignoring a factor of $\ln 2$, which actually compensates for the one we ignored before: “the algorithmic temperature is how many times we need to double the runtime in order to double the number of programs having this runtime.”

You can think of this sentence as a rough verbal definition of ‘algorithmic temperature’. The actual definition is $T = d E /d S$.
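A toy check of that verbal definition, using an invented density of states in which the number of programs doubles with each unit of log runtime:

```python
import math

# Invented density of states: the number of programs with log runtime E
# doubles with each unit increase in E (not a claim from the paper).
def omega(E):
    return 2 ** E

def S(E):
    return math.log(omega(E))  # Boltzmann-style entropy ln(omega)

# Finite-difference version of T = dE/dS:
dE = 1
dS = S(5) - S(4)  # equals ln 2, since the count doubles
T = dE / dS

# Doubling the count per unit of E gives T = 1/ln 2:
assert math.isclose(T, 1 / math.log(2))
```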

Why does Tadaki categorize your algorithmic pressure as his algorithmic temperature (or why do you elect to discuss algorithmic temperature as something that he calls pressure?) Does it make little difference?

So far it seems to make little difference. As we say in our paper:

Before proceeding, we wish to emphasize that the analogies here were chosen somewhat arbitrarily. They are merely meant to illustrate the application of thermodynamics to the study of algorithms. There may or may not be a specific ‘best’ mapping between observables for programs and observables for a container of gas! Indeed, Tadaki has explored another analogy, where length rather than log run time is treated as the analogue of energy. There is nothing wrong with this. However, he did not introduce enough other observables to see the whole structure of thermodynamics, as developed in Sections 4.1–4.2 below.

I would like to know which analogy is ‘best’, but so far I don’t see a strong argument for one being ‘best’.

Posted by: John Baez on February 19, 2010 2:01 AM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

Also, I have to ask because it is starting to bug me.

How do you know there are two relative probability measures (p,q)? Since it is one dimensional, is it even possible to use the relative part of the equation? Maybe there is only one variable for information to be learned?

Posted by: Brenda Tucker on February 17, 2010 1:37 AM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

You say ‘since it is one dimensional’, but you don’t say what ‘it’ is. So, I don’t understand what you’re saying.

Posted by: John Baez on February 18, 2010 9:06 PM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

Interesting idea. But wouldn’t a better name be “Thermodynamic Algorithms”?

Posted by: tytung on February 25, 2010 3:58 PM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

The fact that the energy (and the other variables, for that matter) is entirely arbitrary makes me rather leery of any strong connection to thermodynamics. Saying instead “statistical mechanics” alleviates some of this, as that truly is maximization of Shannon entropy relative to some prior measure, consistent with observed constraints.

The first question that springs to my mind is: “what are reasons to consider particular constraints as useful?”. Physically, there are certain macroscopic variables we care about, and the ones you pick are ones we do tend to care about in CS. The second is: “what are good priors to consider?” In physics, of course conservation of phase-space volume gives us that prior, but anything analogous is much harder to think about without some notion of time to give us conserved quantities.
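For what it’s worth, the maximum-entropy picture is easy to demo numerically: with a toy state space and a prescribed mean “energy”, the entropy-maximizing distribution is the Gibbs one, and the matching $\beta$ can be found by bisection (all the values below are made up):

```python
import math

# Toy state space with invented "energies".  The maximum-entropy
# distribution with a prescribed mean energy is the Gibbs distribution
# p(x) proportional to exp(-beta * E(x)).
energies = [0.0, 1.0, 2.0, 3.0]
target_mean = 1.2

def mean_energy(beta):
    weights = [math.exp(-beta * E) for E in energies]
    Z = sum(weights)
    return sum(E * w for E, w in zip(energies, weights)) / Z

# mean_energy is strictly decreasing in beta, so bisect to match the
# observed constraint.
lo, hi = -50.0, 50.0
for _ in range(200):
    mid = (lo + hi) / 2
    if mean_energy(mid) > target_mean:
        lo = mid
    else:
        hi = mid
beta = (lo + hi) / 2
```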

Posted by: Aaron Denney on February 26, 2010 6:05 AM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

Aaron writes:

The fact that the energy (and the other variables, for that matter) is entirely arbitrary makes me rather leery of any strong connection to thermodynamics. Saying instead “statistical mechanics” alleviates some of this…

That’s true; we say ‘thermodynamics’ merely because we expect that relations like $d E = T d S - P d V$ and the Maxwell relations are likely to be familiar to most people from classes on thermodynamics. But it’s important to emphasize that such relations arise automatically from any collection of observables. I thought we made this clear — but now I think it should be made even clearer.

Posted by: John Baez on February 26, 2010 8:29 PM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

I haven’t yet read the paper, but I prefer a two column text layout, as opposed to one column with 2 inch margins on each side.

Posted by: Greg Buchholz on February 26, 2010 8:02 PM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

The TeX file is here and you’re free to reformat it, but you’ll need tikz to get the diagrams, and you’ll also need this eps file to get the picture of the piston.

Posted by: John Baez on February 26, 2010 8:21 PM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

The kind of margins that make sense for printed output generally don’t make sense for online reading (amongst other things, fit-to-window-width doesn’t do what you want it to). I wrote a program which renders the pages of a pdf to bitmap images, figures out the margins, and re-renders the bitmap at screen size (designed to be used for those cases where the author doesn’t make the raw source available, unlike here), so that you can use an image viewer to display them for reading. It’s a real hack job (for Linux/BSD), written primarily for me to be able to read papers on a 7 inch tablet device whilst on the bus. I don’t have a web-space at the moment, but if anyone thinks it’d be any use I can try and figure out a way to make it available.

[I did look at the pdf spec to see if there was a way to modify a pdf file directly to achieve this (amongst other things, the bitmap images are much larger than pdfs). Unfortunately the spec doesn’t even make listing a bounding box for the actual content mandatory (in addition to the bounding box giving the paper size), and so most pdf creation programs don’t bother to provide it. That was even before considering how to move the content, so I quickly gave up on the idea of tweaking the pdf. This is one of those problems that really ought to have been foreseen, but clearly the pdf specification writers cared more about cool things like being able to embed active content into pdf documents.]

Posted by: bane on March 2, 2010 1:05 PM | Permalink | Reply to this

### Re: Algorithmic Thermodynamics

A revised version of our paper on algorithmic thermodynamics is now available on the arXiv, and I’ve written an introduction to it here.

Posted by: John Baez on October 12, 2010 1:32 PM | Permalink | Reply to this

Post a New Comment