Towards a stable definition of Algorithmic Randomness



DESCRIPTION

Although information content is invariant up to an additive constant, the range of possible additive constants applicable to programming languages is so large that in practice it plays a major role in the actual evaluation of K(s), the Kolmogorov complexity of a string s. We present a summary of the approach we've developed to overcome the problem by calculating its algorithmic probability and evaluating the algorithmic complexity via the coding theorem, thereby providing a stable framework for Kolmogorov complexity even for short strings. We also show that reasonable formalisms produce reasonable complexity classifications.

TRANSCRIPT

Page 1: Towards a stable definition of Algorithmic Randomness

Towards a stable definition of Algorithmic Randomness

Hector Zenil (hector.zenil@lifl.fr)

Laboratoire d’Informatique Fondamentale de Lille (CNRS) and Institut d’Histoire et de Philosophie des Sciences et des Techniques (Paris 1 Panthéon-Sorbonne/ENS Ulm/CNRS)

Paris Diderot Philmaths Seminar, May 17, 2011

Université Paris Diderot - Paris 7, SPHERE-REHSEIS

Hector Zenil (LIFL and IHPST) Towards a stable definition of AR Paris Diderot 1 / 25

Page 2: Towards a stable definition of Algorithmic Randomness

Classical Probability

If the process generating bitstrings of length k is uniformly random, the probability of producing a particular string is exactly 1/2^k, the same as for any other string of the same length.

Example

Let s1 and s2 be as follows: s1 = ’01010101010101’ and s2 = ’10110110001010’. Both have probability P(s1) = P(s2) = 1/2^14 of being chosen at random among the 2^14 binary strings of length k = 14.

Yet s1 looks less random than s2. How can we quantify and characterize this intuition?
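One way to make the intuition concrete is to compare how well the two kinds of strings compress. Compressed size is only a crude, tool-dependent proxy for the notions developed in the following slides, and 14-bit strings are below a compressor's overhead, so this sketch scales the patterns up; the seeded random stream is a hypothetical stand-in for a random-looking string:

```python
import random
import zlib

s1 = "01" * 500                                    # periodic, like s1 above
random.seed(0)                                     # deterministic "random-looking" bits
s2 = "".join(random.choice("01") for _ in range(1000))

c1 = len(zlib.compress(s1.encode()))
c2 = len(zlib.compress(s2.encode()))
# Both strings are equally probable under uniform sampling,
# yet the periodic one compresses far better than the random-looking one.
print(c1, c2)
```

Equal probability, very different compressibility: that gap is what the rest of the talk formalizes.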


Page 3: Towards a stable definition of Algorithmic Randomness

Algorithmic Complexity

Basic notion

A string’s complexity (or simplicity) is the difference in length between the string and its shortest description.

The description of an object depends on a language. The theory of computation is the framework of algorithmic complexity:

Basic notion

description ⇐⇒ computer program

A string of low algorithmic complexity is highly compressible, as the information it contains can be encoded in an algorithm much shorter than the string itself.


Page 4: Towards a stable definition of Algorithmic Randomness


Page 5: Towards a stable definition of Algorithmic Randomness

Algorithmic Complexity (cont.)

Definition

(Kolmogorov, Chaitin) The algorithmic complexity K(s) of a string s is the length (in bits) of the shortest program p that produces s running on a universal Turing machine M.

K(s) = min{|p| : M(p) = s}


Page 6: Towards a stable definition of Algorithmic Randomness

Algorithmic Randomness

Example

The string ’010101010101010101...’ has low algorithmic complexity because it can be described as k times ’01’: no matter how long, the description grows only by ∼ log(k) while the string itself grows linearly with k.
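The ∼ log(k) growth can be made concrete by taking a short Python expression as the "description" (a sketch; expression length is only a proxy for program length):

```python
# The string '01' repeated k times has length 2*k, yet a describing expression
# such as '01'*k grows only with the number of digits of k, i.e. ~log(k).
for k in (10, 1000, 10**6):
    s = "01" * k
    description = f"'01'*{k}"        # a Python expression evaluating to s
    assert eval(description) == s
    print(len(s), len(description))
```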

Example

The string ’010010110110001010...’ may have high algorithmic complexity because it doesn’t seem to allow a description shorter than the string itself, so a shorter description may not exist.


Page 7: Towards a stable definition of Algorithmic Randomness

Infinite vs. Finite Randomness

Definition

Given a natural number c and a sequence s, s is c-incompressible if

K(s) ≥ |s| − c

Example

A string s is random if the shortest program producing s is no shorter than s itself.

Definition

An infinite sequence s is Martin-Löf random if and only if there is a constant c such that all initial segments (prefixes) of s are c-incompressible.


Page 8: Towards a stable definition of Algorithmic Randomness

Infinite vs. Finite Randomness (cont.)

No finite string can be declared random; a string s can only look random, because it can always be part of a longer non-random sequence.


Page 9: Towards a stable definition of Algorithmic Randomness

Convergence of Definitions

There are three mathematical approaches to randomness, each capturing different intuitive features:

Incompressibility (program-size)

Unpredictability (effective martingales)

Typicalness (effective statistical tests)

Important (Chaitin, Schnorr)

Incompressibility ⇐⇒ Unpredictability ⇐⇒ Typicality

(albeit some technicalities)


Page 10: Towards a stable definition of Algorithmic Randomness

The Choice of M Matters

A major criticism brought forward against K is its dependence on the choice of programming language (or M, the universal Turing machine). From the definition:

K(s) = min{|p| : M(p) = s}

It may turn out that:

K_M1(s) ≠ K_M2(s) when evaluated using M1 and M2 respectively.

Basic notion

This dependency is particularly troubling for short strings, shorter than, for example, the length of the universal Turing machine on which K of the string is evaluated (hundreds of bits).


Page 11: Towards a stable definition of Algorithmic Randomness

The Invariance Theorem

A theorem guarantees that in the long term algorithmic complexity evaluations converge, diverging only by some fixed constant value.

Theorem

Invariance theorem. If M1 and M2 are two (universal) Turing machines, and K_M1(s) and K_M2(s) the algorithmic complexity of a binary string s when M1 or M2 is used respectively, then there exists a constant c such that for all binary strings s:

|K_M1(s) − K_M2(s)| < c

(think of a compiler between 2 programming languages)
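As a loose analogy only (general-purpose compressors are not universal machines, and no invariance constant is guaranteed for them), two different compressors can play the role of two different reference machines, each giving its own "complexity" value for the same string:

```python
import bz2
import zlib

s = ("0011" * 256).encode()       # a highly regular 1024-byte string
k_zlib = len(zlib.compress(s))    # "K" as measured by machine 1
k_bz2 = len(bz2.compress(s))      # "K" as measured by machine 2
# The two values differ, but both are far smaller than |s| for this
# regular string: the choice of "machine" shifts the value, not the verdict.
print(k_zlib, k_bz2, len(s))
```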


Page 12: Towards a stable definition of Algorithmic Randomness

Evaluating K

Let’s take an example, namely the following program A:

1 n := 0

2 Print n

3 n := n + 1 mod 2

4 Goto 2

which generates the output string “01010101...”. The length of A (in bits) is an upper bound on K.

Predictability

The program A trivially allows a shortcut to the value of an arbitrary digit through the following function f(n):

f(n) = 1 if n is even (n = 2m for some m), and f(n) = 0 otherwise.
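A direct Python rendering of program A, with the infinite loop bounded to k digits, and the shortcut f(n) alongside (a sketch; digit positions are taken to be 1-indexed):

```python
def program_a(k):
    """Bounded version of program A: emit the first k digits of 01010101..."""
    out = []
    n = 0
    for _ in range(k):            # the original 'Goto 2' loops forever
        out.append(str(n))
        n = (n + 1) % 2
    return "".join(out)

def f(n):
    """Shortcut: the n-th digit (1-indexed) without generating the prefix."""
    return 1 if n % 2 == 0 else 0

print(program_a(8))               # prints 01010101
```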


Page 13: Towards a stable definition of Algorithmic Randomness

Miscellaneous Facts

Most strings are random. There are exactly 2^n bit strings of length n, but only 2^0 + 2^1 + 2^2 + . . . + 2^(n−1) = 2^n − 1 bit strings of fewer bits. So one cannot pair up all length-n strings with programs of shorter length (there simply aren’t enough short strings to encode all longer strings).

There are examples of infinite random sequences. For example, Chaitin’s Ω numbers are algorithmically random.

Most real numbers are algorithmically random.

There is a deep connection between algorithmic randomness and the field of computability (Turing degrees): random number → noncomputable number.
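The counting argument in the first fact above can be checked mechanically (a sketch):

```python
# There are 2**n strings of length n but only 2**n - 1 strings of any shorter
# length, so some length-n string has no shorter program/description.
for n in range(1, 20):
    shorter = sum(2**k for k in range(n))   # 2**0 + 2**1 + ... + 2**(n-1)
    assert shorter == 2**n - 1              # one too few to cover all 2**n strings
```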


Page 14: Towards a stable definition of Algorithmic Randomness

Noncomputability of K

Important Result

No algorithm can tell whether a program generating s is the shortest possible (due to the undecidability of the halting problem for Turing machines).

Basic notion

One may not be able to prove that a program generating s is the shortest, but one can exhibit a program generating s that is (much) shorter than s itself. So even though one cannot tell that a string is random (it may have a short generating program one has not found), one can find a short program and thereby tell that the string is definitely not random.

Basic notion

One can find upper bounds approaching K by finding short programs, for example using compression algorithms.


Page 15: Towards a stable definition of Algorithmic Randomness

Algorithmic Probability

There is a distribution that describes the expected output of picking a program at random and running it on a universal Turing machine. According to algorithmic probability, the simpler a string, the more likely it is to be produced by a short program. The idea formalizes the concept of Occam’s razor.

Definition

(Levin) m(s) = Σ_{p : M(p)=s} 1/2^|p|, i.e. the sum over all programs p for which M outputs the string s and halts.

M is a prefix-free universal Turing machine¹.

¹ The set of valid programs forms a prefix-free set, that is, no element is a prefix of any other, a property necessary to keep 0 < m(s) < 1.
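A toy illustration of how the sum is formed, with a made-up program table (the machine, its programs and their outputs are all hypothetical, chosen only so the arithmetic is visible):

```python
# Hypothetical machine: program -> output. The programs ('0', '10', '110')
# form a prefix-free set, so the weights 2**-|p| satisfy the Kraft inequality.
PROGRAMS = {
    "0":   "0101",
    "10":  "0101",
    "110": "1101",
}

def m(s):
    """m(s) = sum of 2**-|p| over programs p that output s and halt."""
    return sum(2.0 ** -len(p) for p, out in PROGRAMS.items() if out == s)

print(m("0101"))   # 1/2 + 1/4 = 0.75
print(m("1101"))   # 1/8 = 0.125
```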


Page 16: Towards a stable definition of Algorithmic Randomness

AP Metaphors

Basic notion

It is unlikely that a Rube Goldberg machine produces a string if the string can be produced by a much simpler process.

The immediate consequence of AP is simple but powerful (and surprising):

Basic notion

Monkeys on a typewriter (Borel): garbage in → garbage out

Programmer monkeys (Chaitin, Lloyd): garbage in → interesting out


Page 17: Towards a stable definition of Algorithmic Randomness

Algorithmic Probability (cont.)

The chances of producing π are greater when typing a program that produces the digits of π than when typing the digits of π directly. Monkeys are just a representation of a random source.

m is related to algorithmic complexity in that m(s) is at least the maximum term in the summation over programs, i.e. m(s) ≥ 2^−K(s). So one can actually write:

Theorem

(Levin, Chaitin)

− log2 m(s) = K(s) + c

Algorithmic probability defines a prior distribution on produced strings based on algorithmic complexity.
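Numerically, the coding theorem says a string with algorithmic probability m costs about −log2 m bits, up to the additive constant c. Plugging in a few frequency values of the kind reported later in the talk (a sketch):

```python
import math

# -log2(m) gives the complexity estimate, in bits, up to an additive constant:
# high-probability strings are simple, low-probability strings are complex.
for m_s in (0.5, 0.328, 5.4447e-10):
    print(m_s, -math.log2(m_s))
```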


Page 18: Towards a stable definition of Algorithmic Randomness

Main Idea

Using m(s) to evaluate K (s):

Observation (Zenil, Delahaye)

To approach K(s) one can calculate m(s).

Motivation

m is more stable than K(s) because one makes fewer arbitrary choices of M.

As m is defined in terms of K, m is also noncomputable and only approachable from below (hence called a semi-computable measure, or simply a semi-measure).


Page 19: Towards a stable definition of Algorithmic Randomness

Calculating m

Definition

(Zenil, Delahaye) D(n) = the function that assigns to every finite binary string s the quotient: (# of times that a machine in (n,2) produces s) / (# of machines in (n,2)).

i.e. D(n) is the probability distribution of the strings produced by all halting 2-symbol Turing machines with n states.

Examples for n = 1, n = 2

D(1) = 0 → 0.5; 1 → 0.5
D(2) = 0 → 0.328; 1 → 0.328; 00 → 0.0834; . . .
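D(1) is small enough to reproduce by brute force. The sketch below assumes a Busy-Beaver-style (n,2) formalism — rules (state, symbol) → (write, move, next state), with state 0 meaning halt, and the output read off the cells the head acted on; the conventions in the actual papers may differ in detail:

```python
from itertools import product

def run(machine, limit=100):
    """Run a (state, symbol) -> (write, move, next) machine from a blank tape."""
    tape, pos, state = {}, 0, 1
    lo = hi = 0
    for _ in range(limit):
        lo, hi = min(lo, pos), max(hi, pos)
        write, move, nxt = machine[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        state = nxt
        if state == 0:  # halted: output = tape over the cells the head acted on
            return "".join(str(tape.get(i, 0)) for i in range(lo, hi + 1))
    return None  # treated as non-halting within the step limit

rules = list(product((0, 1), (-1, 1), (0, 1)))   # write, move, next (0 = halt)
counts, halting = {}, 0
for r0, r1 in product(rules, rules):             # all 64 one-state machines
    out = run({(1, 0): r0, (1, 1): r1})
    if out is not None:
        halting += 1
        counts[out] = counts.get(out, 0) + 1

D1 = {s: c / halting for s, c in counts.items()}
print(D1)   # {'0': 0.5, '1': 0.5}
```

Under these conventions exactly 32 of the 64 machines halt, recovering D(1) = 0 → 0.5; 1 → 0.5 as above.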


Page 20: Towards a stable definition of Algorithmic Randomness

Calculating m (cont.)

Given that the Busy Beaver function values are known for n-state 2-symbol Turing machines for n = 2, 3, 4, we could compute D(n) for n = 2, 3, 4.

Following techniques from (Wolfram), we ran all 22 039 921 152 two-way tape Turing machines, starting with a tape filled with 0s and 1s, in order to calculate D(4)².

Theorem

D(n) is noncomputable (by reduction to Radó’s Busy Beaver problem (Radó)).

² A 9-day calculation on a single 2.26 GHz Core Duo Intel CPU.

Page 21: Towards a stable definition of Algorithmic Randomness

Complexity Tables

Table: The 22 bit-strings in D(2), from the 6 088 (2,2)-Turing machines that halt. (Zenil, Delahaye)

0 → .328      010 → .00065
1 → .328      101 → .00065
00 → .0834    111 → .00065
01 → .0834    0000 → .00032
10 → .0834    0010 → .00032
11 → .0834    0100 → .00032
001 → .00098  0110 → .00032
011 → .00098  1001 → .00032
100 → .00098  1011 → .00032
110 → .00098  1101 → .00032
000 → .00065  1111 → .00032

Solving degenerate cases

Does ’0’ have high Kolmogorov complexity? AP says it is not random; it is actually the simplest string (together with ’1’) according to D.


Page 22: Towards a stable definition of Algorithmic Randomness

From a Prior to an Empirical Distribution

We see algorithmic complexity emerging:

1 The classification accords with our intuition of what complexity should be.

2 Strings are almost always classified by length, except in cases where intuition justifies they should not be, e.g. 0101010 is ranked better (less complex) than e.g. 11001101.

Full tables are available online: www.algorithmicnature.org

From m to D

Unlike m, D is an empirical distribution and no longer a prior. D experimentally confirms Solomonoff and Levin’s AP measure.


Page 23: Towards a stable definition of Algorithmic Randomness

Miscellaneous Facts from D

There are 5 970 768 960 machines that halt among the 22 039 921 152 in (4,2). That is a fraction of 0.27.

A total of 1 824 strings are produced in (4,2).

The following are the most random-looking strings according to D: 1101010101010101, 1101010100010101, 1010101010101011 and 1010100010101011, each with probability 5.4447 × 10^−10.

(4,2) produces all strings up to length 8; the number of strings longer than 8 then decreases.

As in D(3), where we reported that one string group (0101010 and its reversal) climbed out of its length group, in D(4) 399 strings climbed to the top and were not sorted among their length groups.

In D(4), string length was no longer determinant of string position. For example, between positions 780 and 790, string lengths are: 11, 10, 10, 11, 9, 10, 9, 9, 9, 10 and 9 bits.

D(4) preserves the string order of D(3) except in 17 places out of the 128 strings in D(3), ordered from highest to lowest string frequency.


Page 24: Towards a stable definition of Algorithmic Randomness

Method Limitations

One cannot continue calculating D(n) for arbitrary n because of the noncomputability of D (the Busy Beaver values for n = 5 are lacking), but one can proceed either by sampling or by a partitioning technique, that is, cutting a longer string into shorter strings whose complexity is known.
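The partitioning idea can be sketched as follows. The per-block complexity table here is made up purely for illustration; in practice such values would come from a computed D(n) distribution:

```python
# Hypothetical per-block complexity values (in bits), e.g. as one might derive
# from -log2 of computed D(n) frequencies; the numbers below are illustrative only.
BLOCK_COMPLEXITY = {"00": 2.25, "01": 2.5, "10": 2.5, "11": 2.25}

def partitioned_complexity(s, k=2):
    """Approximate the complexity of s by summing values of its k-bit blocks."""
    assert len(s) % k == 0
    return sum(BLOCK_COMPLEXITY[s[i:i + k]] for i in range(0, len(s), k))

print(partitioned_complexity("01100100"))   # blocks 01,10,01,00 -> 2.5+2.5+2.5+2.25
```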

Given that the procedure is, computationally speaking, very expensive, for longer strings one can instead use compression algorithms, which work well for longer strings.

Media Coverage

“Pour La Science” (Scientific American in French) has featured my research in its July 2011 issue. Available online: http://www.mathrix.org/zenil/PLSZenilDelahaye.pdf


Page 25: Towards a stable definition of Algorithmic Randomness

Further Discussion

1 How to extend the results of D to larger strings?

2 How stable is D to other computing frameworks?

3 How stable is D(n) for growing n?

4 Is convergence of D(n), in order or in values, possible?

5 How to formally reconnect D to m?

We answer some of these questions positively. We have shown that reasonable formalisms of computation produce reasonable (and compatible) complexity classifications. We have also shown that D(n) is strongly stable, at least for n ≤ 5.
