Kolmogorov-Chaitin Complexity

by Nikos A. Salingaros

Trying to measure the complexity of a system is not straightforward. The simplest measure of a system’s complexity reflects not so much its intrinsic complexity, or the complexity of the process that generated it, but the complexity of the system’s description. As such, this approach is not without its limitations.

We can measure the complexity of a one-dimensional system by using a very simple notion: the length of its description in some programming language. This measure is known as the Kolmogorov-Chaitin Complexity. Analyzing a string of letters or numbers might reveal a pattern that repeats, and in that case, its description becomes shorter. I’m going to use binary entries of 0 and 1 in 70-character strings, yet the discussion is valid for any letters or digits. For example, it is obvious that this information string is not complex at all:

1111111111111111111111111111111111111111111111111111111111111111111111

Its generative algorithm is: repeat the digit 1 70 times, or “1 x 70”. On the other hand, the more complex string

1010101010101010101010101010101010101010101010101010101010101010101010

still has only minimal complexity, since its algorithm is a short one: “10 x 35”. But it has slightly more complexity than the first trivial string. The next orderly sequence is very different, yet it also possesses low complexity:

1111111111111111111111111111111111100000000000000000000000000000000000

It has the generating algorithm “1 x 35, 0 x 35”. Something non-trivial occurs only at the 36th position. For genuinely complex strings of digits, we face two distinct types of complexity. First, there is organized or coherent complexity, such as the following binary sequence:

1000110101100011010110001101011000110101100011010110001101011000110101

Analyzing this information string for patterns, we discover hidden symmetries, and those allow us to reduce its generating code to alternating 10001 with 10101 7 times, or “10001 alternate 10101 x 7”. We do have a complex string, yet internal patterns reduce the coding length of its generating algorithm.

The second type of complexity represents random, or disorganized complexity, as for example the following random sequence:

0010001111010101100001011100101111110101111111101001110001101000100101

It has maximal complexity, because its shortest description is exactly as long as the string itself. There is no possible compression, such as hidden patterns in its descriptive rule. We can be sure of that, since we generated this string by using a random-number generator.

Probability arguments produce one complication, however: it is quite possible to obtain internal patterns in a randomly-generated string. But you cannot count on it, and might have to try an astronomical number of times before you come up with a regular pattern.

The key mechanism revealed by these examples is the presence or not of patterns, which then permit compression of the informational string. Total compression is equivalent to informational collapse, and this occurs in cases of extremely low complexity. In examples of strings with non-trivial complexity, the ability to compress them reduces their complexity measure, but never to near zero.

I will now compute the length of the description of each algorithm for the informational strings presented as examples. This is a simple character count of the description in English, and I obtain the following values for all the strings:

1 x 70 → 4 characters
10 x 35 → 5 characters
1 x 35, 0 x 35 → 9 characters
10001 alternate 10101 x 7 → 21 characters
[Random] → 70 characters

Since we partially depend upon the English language for our descriptions, these algorithms are not expressed in the most efficient way; hence their length is not unique. The shortest description length obviously varies with the programming language used. But that doesn’t matter here, as the model serves to illustrate one-dimensional complexity in a very simple manner. For example, the above complex string with internal patterns could be generated equally by the description “1000110101 x 7”, which has a character count of only 12. This number is different from 21, yet still falls between 9 and 70; that is, between a trivially-simple string and a random string, which is the key point.

Another issue is that human perception plays an important role in describing complexity, even though this has nothing to do with the actual algorithm. Informationally-minimal strings do indeed look trivially simple to us. They are of no interest, as they do not communicate a message. Thus, extreme simplicity is not useful, and we require complexity above some threshold to transmit a message. Humans pay attention to ordered information, but too much information overwhelms our cognitive system. And complex strings of very different complexity may at first appear to be equally complex. It is only after we “see” a pattern that we can recognize the relative complexity of an informational string.

The Kolmogorov-Chaitin complexity has obvious limitations because it ignores the degree of internal symmetries, such as repetition and nesting, yet those directly affect our cognition. For example, the string referred to above is generated by the two “words” 10001 and 10101, each one of which is internally symmetric. This important feature is not measured by the character count. In fact, the algorithm in terms of the symmetric words gives us a higher character count (21) than the simpler repetition that takes no symmetry into account (12), which is counter-intuitive. We are most interested in, and wish to investigate, systems of information with complex ordered patterns.

Another important phenomenon not described by this model is when a very simple algorithm generates complexity iteratively throughout the entire system. Here, a simple rule such as a cellular automaton computes configurations in which every entry is constantly changing. With very few exceptions, there is no final state for a cellular automaton: the generated pattern continues to grow and transform. Any one of these states could be irreducibly complex, i.e. not compressible. When we try to find a description of such a system in terms of an algorithm that generates the information string sequentially (not in the way it was actually generated), it appears highly complex.

My interest in this topic arises from the description of design in general, and architectural styles, or “form languages”, in particular. Each form language is a description of how a building looks like and also how it is constructed, and I wish to measure the relative complexity of different form languages. Traditional form languages adapt to climate, local materials, human needs and uses, and culture, and are therefore irreducibly complex. They have many forces and constraints to adapt to, and have evolved step-by-step over time. Even though the buildings that distinct form languages generate obviously look very different, all of them (languages and resulting designs) have an equivalent measure of complexity.

A word count of a form language’s verbal description is a very rough measure of its Kolmogorov-Chaitin complexity. I ask my architecture students to fill out a rather standard checklist for a form language, and use their word processor to obtain a word count. Then students are able to compare their assigned form languages by comparing these numbers. This exercise separates minimalist from more complex buildings. Nevertheless, I don’t know how to solve the problem of distinguishing evolved form languages with high complexity, from randomly-generated and thus non-adaptive form languages.

Some architects introduce deliberate randomness for the sake of an innovative “look”. Nothing has evolved here, since no adaptation is involved. The design method is performed as an instantaneous artistic act. Those architects often use a “shortcut” to design their building, which, while not generating any adaptations, could be very complex indeed. This case is analogous to a simple rule (that we don’t know) generating a random string. It would be useful to understand when a form language is simple, and to distinguish the two cases when it is complex due to adaptation, or complex because it is generated randomly.
Perhaps a future essay could discuss these matters.

REFERENCES

“Kolmogorov complexity”, entry from Wikipedia.

A. Klinger & N. A. Salingaros (2000) “A Pattern Measure”, Environment and Planning B: Planning and Design, volume 27, pages 537-547. Republished by the Cornell University Library Arxiv, 28 August 2011.