Causal Inference 101
THE LINK BETWEEN ENTROPY AND INTELLIGENCE. A SIMPLE EXPLANATION
Subjects included: causal
information, the origin of algorithms, adaptive behaviors, preset goals, and
maximum and minimum entropy methods.
1. START WITH AN EMPTY BRAIN
I'll try to explain the
entropy-intelligence link in the light of my theory of causality in a simple
way. Imagine an empty "brain". I know about bootstrapping, DNA, etc., but let's
leave that out for the moment. We need to understand how entropy works before we
can argue about bootstrapping. So we have an empty brain with no information in
it, more like a substrate, that can do two things: store information received
from sensors in an autobiographical memory, and remove entropy from it. This is
also known as a "host" in the host-guest model of the brain. Note that the term
"sensor" as a source of information is very general. A sensor can be just about
anything. And I have chosen the brain because the brain is the only known
example of an intelligent system.
Information coming from
sensors is causal. It consists of ordered (cause, effect) pairs, the cause is
the signal that activated the sensor, the effect is the signal the sensor
outputs to indicate detection. The elements in the pairs correspond to
"neurons", the relation in each pair is a dendritic connection. The pairs chain
together whenever they share a common element, for example (a, b) and (b, c)
would form a chain of two pairs. More generally, the pairs form an acyclic
digraph, a directed graph with no cycles. A path in the graph formed by chained pairs
is known as a trajectory. In a big graph there can be an enormously large number
of trajectories. The graph, or rather the collection of distinct elements and
the collection of ordered pairs, is known as a causal set.
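As a concrete sketch (the (cause, effect) pair notation is from the text; the helper names and the example pairs are mine), a small causal set can be represented as a collection of ordered pairs, with trajectories enumerated as paths in the resulting acyclic digraph:

```python
from collections import defaultdict

# A causal set: ordered (cause, effect) pairs acquired from sensors.
pairs = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "c")]

# Adjacency view of the acyclic digraph formed by chaining the pairs.
graph = defaultdict(list)
for cause, effect in pairs:
    graph[cause].append(effect)

def trajectories(node, path=()):
    """Enumerate every path (trajectory) starting at `node`."""
    path = path + (node,)
    if node not in graph:          # no outgoing pairs: trajectory ends
        yield path
        return
    for nxt in graph[node]:
        yield from trajectories(nxt, path)

print(sorted(trajectories("a")))
# [('a', 'b', 'c'), ('a', 'd', 'c')]
```

Even this four-pair set already has two distinct trajectories; the combinatorial growth the text mentions comes from exactly this enumeration.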
Causal pairs are executable. Every causal set is an algorithm, a computer
program. Every trajectory in the causal set is a possible execution path. If we
write a statement such as A = f(B, C), we are saying that we need to have the
values of B and C before A can be calculated. That's a causal relation. We are
not saying how A is calculated, we are just expressing the causal relationships.
In this case, there are two pairs: (B, A) and (C, A), meaning that both B and C
must exist in order for A to exist. When working with the host-guest model, this
algorithm is the guest.
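A minimal sketch of this point (the function body and the concrete values are mine, chosen only for illustration): the pairs (B, A) and (C, A) record *that* A depends on B and C, not *how* A is computed, and A can only be evaluated once both causes are available:

```python
# Causal pairs (cause, effect): B and C must exist before A can exist.
pairs = [("B", "A"), ("C", "A")]

# A hypothetical computation attached to the effect; the causal set
# itself only records the dependency, not this function.
def f(b, c):
    return b + c

values = {"B": 2, "C": 3}   # the given inputs
if all(cause in values for cause, effect in pairs):
    values["A"] = f(values["B"], values["C"])

print(values["A"])  # 5
```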
2. NOW THE BRAIN HAS A PROGRAM
Now the brain has a causal set obtained by chaining the pairs that
have been arriving from the sensors in all possible ways. The causal set is an
executable algorithm, and every trajectory in the causal set is an execution
path. Of course, every causal set has a collection of "effects" that have no
known causes (they have causes, but the causes are outside the brain and are
unknown). They are considered given, and are the input for the causal set. Every
causal set also has a collection of "causes" with unknown or non-existing
effects. These are the outputs.
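Under these definitions, the inputs are the elements that never appear as an effect inside the set, and the outputs are the elements that never appear as a cause; a sketch (the example pairs and names are mine):

```python
pairs = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "c")]

causes = {c for c, _ in pairs}
effects = {e for _, e in pairs}
nodes = causes | effects

inputs = nodes - effects    # effects whose causes lie outside the brain
outputs = nodes - causes    # causes with unknown or non-existing effects

print(sorted(inputs), sorted(outputs))  # ['a'] ['c']
```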
This works. Nothing else is
necessary. You have a whole program: you can enter input data to initialize
some or all of the inputs and execute the chains that start there, reaching
some or all of the outputs. This is how algorithms originate. Where else could
they have originated from? Recall the brain was assumed to be initially empty,
there is no other source for the algorithms other than the causal input itself.
I have argued this conclusion in more detail in my JAGI paper and will not
repeat that here. The important part is that algorithms, or behaviors, come from
causal pairs acquired by sensors from the environment and chained together. They
are not "created" in some magical sense or by some secret process in the human
brain. They come from observation of the world. More information can be coming
from the sensors in a constant flow, and it will be stored and chained into a
larger and larger program with a growing number of execution paths. In the real
brain, this guest process is conscious.
With that kind of organization, you already have many goals. They are the
subsets of outputs that are causally accessible from the inputs. They are all
preset by the information coming from outside, and, in this sense, they
constitute what we usually call adaptive behavior. They are the information,
only better organized. They represent the possible behaviors of the system in
response to external stimuli, and they appear adaptive precisely because they
respond to those stimuli. There are behaviors at this point, but there is still
no semantics, no meaning. For that,
we need entropy.
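Putting the last two ideas together (example pairs and helper names are mine): the preset goals are the outputs that are causally accessible from the inputs you chose to initialize, which is a plain reachability check on the causal set:

```python
from collections import defaultdict, deque

pairs = [("a", "b"), ("b", "c"), ("x", "y")]

graph = defaultdict(list)
for cause, effect in pairs:
    graph[cause].append(effect)

nodes = {n for p in pairs for n in p}
outputs = nodes - {c for c, _ in pairs}

initialized = {"a"}   # initialize only some of the inputs

# Breadth-first search: which elements are causally accessible?
reachable, queue = set(initialized), deque(initialized)
while queue:
    for nxt in graph[queue.popleft()]:
        if nxt not in reachable:
            reachable.add(nxt)
            queue.append(nxt)

goals = outputs & reachable
print(sorted(goals))  # ['c']  -- 'y' becomes a goal only if 'x' is initialized
```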
3. ENTROPY
Now it gets a little more sophisticated. I hope you know how to code
in some language, at least in general terms. So imagine yourself coding the
program in the brain as-is. You can't code each trajectory by itself, it would
be gigantic. Two trajectories can differ by as little as a few pairs. So you
have to reuse portions of trajectories to save space, and connect them with many
IF's and GOTO's. What will you get? Spaghetti code. The brain has spaghetti
code, the kind of code that is correct and works properly but only its author
can understand. This code is highly disorganized, can't be easily modularized,
or built-upon, or maintained, or integrated with other products. I don't know
very much about brain pathologies, but I can imagine that a person who learns by
heart but is unable to draw conclusions or use the "knowledge" intelligently may
have this kind of brain. Maybe this situation is related to autism. I just
can't say, I am only suggesting to look into it.
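A toy illustration of that reuse (the segment names and dispatch logic are mine): two trajectories that share a middle segment can store the segment once and branch into and out of it, which is exactly the IF-and-GOTO style of reuse described above:

```python
# Two trajectories that share a common middle segment.
t1 = ("a", "m1", "m2", "m3", "z")
t2 = ("b", "m1", "m2", "m3", "w")

shared = ("m1", "m2", "m3")   # stored once, referenced by both

def run(start):
    # Spaghetti-style dispatch: branch into and out of the shared segment.
    path = (start,) + shared
    path += ("z",) if start == "a" else ("w",)
    return path

print(run("a"))  # ('a', 'm1', 'm2', 'm3', 'z')
```

With thousands of trajectories and many shared segments, this dispatch logic multiplies into the correct-but-unreadable code the text describes.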
What would you do if your boss asked you to make the code "better", more
"understandable", but still correct? You would create classes and objects and
inheritance relationships and hierarchies of classes and methods that use the
objects. This is exactly what entropy does for you. Recall that the host is
assumed to be able to extract entropy from the information, and hence make it
less uncertain. When entropy is removed, the code gets refactored, similar
elements of information get associated (binding!), and similar functionalities
coalesce together and form the classes and the objects and the methods. In the
brain, this phenomenon is observed as the formation of neural cliques made of
neurons and neural cliques made of other neural cliques, and, at a global level,
as the overall partition of the human brain into functionally specialized parts.
In the mind, as Joaquin Fuster explains, cognits arise made of elements of
information and cognits made of other cognits, all of them interconnected in a
complex network that associates the elements at all levels. In the actual brain,
this process of extracting entropy and causing the code to self-organize is
entirely unconscious. But the result, once the process has completed, is a
behavior, an algorithm, which is delivered to our cognition at the time it
completes and often causes surprise. We acknowledge this when we say "I had an
idea." The new code is understandable; it has meaning, it has a semantics,
all of which has been created by the entropic process, known as causal
inference.
4. PRESET GOALS
The two processes, the conscious one and the unconscious one, run concurrently.
The unconscious process is of a thermodynamic nature, and the conscious one is
algorithmic and constitutes our behaviors. They both happen "in place", the
whole substrate, or host, or brain, is running both processes at the same time
and at the same place, as neurons adjust their connections to minimum entropy
but do not affect the information kept in memory. Information acquired by
interaction with the environment is "hot" and highly uncertain. Yet, it is
algorithmic in nature and it admits of an input and an output. As heat and
energy are removed from the information, algorithmic paths known as trajectories
are formed in large numbers that connect the inputs to the outputs. The outputs
are the preset goals. They represent all the goals that can be achieved with the
currently available information. There are frequently many trajectories leading
to the same goals, and to each trajectory there corresponds a value of the
action as determined by the action functional; some trajectories have high
action, others minimal action. The entropy of the system is very high because of all the
uncertainty associated with the multiplicity of trajectories. This is known as
the combinatorial explosion. The removal of entropy eliminates the high-action
trajectories associated with each goal, and leaves only the least-action
trajectories for that goal. The goal is not affected, only the trajectories that
lead to it are affected.
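A toy sketch of that pruning step (the trajectories, goals, and action values are mine; the text only specifies that entropy removal discards the high-action trajectories for each goal while leaving every goal in place):

```python
# Hypothetical trajectories, each tagged with its goal and an action value.
trajectories = [
    (("a", "b", "c"), "c", 3.0),
    (("a", "d", "c"), "c", 1.5),   # least action for goal 'c'
    (("a", "e"),      "e", 2.0),
]

# Entropy removal: for each goal keep only the least-action trajectory.
best = {}
for path, goal, action in trajectories:
    if goal not in best or action < best[goal][1]:
        best[goal] = (path, action)

print(best)
# {'c': (('a', 'd', 'c'), 1.5), 'e': (('a', 'e'), 2.0)}
```

Note that both goals 'c' and 'e' survive; only the high-action route to 'c' has been eliminated, mirroring the claim that the goal is unaffected.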
Incoming information is "hot", high entropy, but the rest of the brain is
already organized, "cold", certain. The incoming information gets organized and
integrated locally, very fast, causing very little perturbation to the rest of
the memory. All goals are preserved, and more goals keep being created as more
and more information is acquired. The result is a brain that is constantly well
organized, has no uncertainty, and constantly knows all the possible behaviors
and goals corresponding to the history of information. There is no need for a
high-entropy search for goals, as some propose. The high entropy occurs only
locally at the point or points where incoming information is acquired.